High Availability Behavior When the Heartbeat is Lost in a XenServer Pool

Article ID: CTX129721

Description

This article describes High Availability (HA) behavior in situations when the heartbeat is lost on XenServer 5.x.

HA does not check the link state of the interfaces. Instead, it works on the basis of HA host groups and network partitions. The HA daemon chooses the larger of the network partitions to take over and fences the servers in the smaller partition.

See Peeking Under the Hood of High Availability for more information.

When there are three hosts in the pool and one loses connection, the HA daemon sees a larger network partition with two hosts and a smaller network partition with one host. The host in the smaller partition is fenced and rebooted.

When there are only two hosts in a pool and one of them loses connection to the other, both partitions are the same size, so there is no larger partition to choose. The HA daemon has no way to know which of the two hosts has had its network cable pulled. It defaults to keeping the host with the lowest UUID, even when that host is the master with the bad network connection. In other words, the host with the higher host UUID is fenced and reboots.

Environment

  • Two XenServer hosts in the pool
  • Dedicated management network (only for management traffic)
  • Dedicated storage network or Fibre Channel for storage or local storage
  • Dedicated network for VM traffic

Instructions

To observe this behavior, unplug the cable from the management interface on one of the hosts in the pool. The different scenarios are described below:

XenServer Pool with Three Hosts

HA maintains the state of all three hosts. If one host (Host 3) loses its connection or heartbeat to the others, HA determines that Hosts 1 and 2 can still communicate with each other; they form the larger host group (the larger network partition). Host 3, which forms the smaller host group (the smaller network partition), is therefore fenced.
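
The partition-size rule can be sketched in a few lines of Python. This is an illustrative model only, not the HA daemon's actual code; the function name `hosts_to_fence` and the host labels are hypothetical. It assumes a single largest partition exists and refuses to decide when two partitions tie for largest.

```python
from typing import List, Set


def hosts_to_fence(partitions: List[Set[str]]) -> Set[str]:
    """Return the hosts outside the single largest partition.

    Illustrative model only: raises ValueError when two partitions tie
    for largest, because size alone cannot decide that case.
    """
    if not partitions:
        return set()
    ranked = sorted(partitions, key=len, reverse=True)
    if len(ranked) > 1 and len(ranked[0]) == len(ranked[1]):
        raise ValueError("partitions are the same size; size alone cannot decide")
    fenced: Set[str] = set()
    for part in ranked[1:]:  # every partition except the largest one
        fenced |= part
    return fenced


# Host 3 loses its heartbeat; Hosts 1 and 2 still see each other.
print(hosts_to_fence([{"host1", "host2"}, {"host3"}]))  # {'host3'}
```

With three hosts there is always a strict majority after a single host drops out, which is why the tie case never arises in this scenario.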

XenServer Pool with Two Hosts

The preceding example exposes a problem with having only two servers in the pool: after Host 1 and Host 2 lose connection to each other, the two network partitions are the same size. In this case, the host with the higher host UUID is fenced and rebooted.
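
The tie-break for equal-size partitions can be sketched as follows. This is a hedged model, not the daemon's implementation: the function name `fenced_on_tie` and both UUIDs are hypothetical placeholders, and it assumes UUIDs are compared as plain lowercase strings, which the article does not confirm.

```python
from typing import List, Set


def fenced_on_tie(partitions: List[Set[str]]) -> Set[str]:
    """For equal-size partitions, keep the one holding the lowest UUID.

    Assumption: UUIDs are compared as plain strings.
    """
    all_hosts = set().union(*partitions)
    lowest = min(all_hosts)  # the host whose partition survives
    return {uuid for part in partitions if lowest not in part for uuid in part}


# Hypothetical UUIDs, for illustration only.
HOST_A = "0a000000-0000-0000-0000-000000000001"
HOST_B = "0b000000-0000-0000-0000-000000000002"

# Each host ends up alone in its own partition once the heartbeat is lost.
print(fenced_on_tie([{HOST_A}, {HOST_B}]))
```

Nothing in this rule looks at which host actually lost its link, so the surviving host can be the one with the faulty connection.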

The following table shows the HA action for each combination of pool master, host with the unplugged cable, and host running the VM.

MASTER                                     | SLAVE                                      | Which Server Fenced?
-------------------------------------------|--------------------------------------------|---------------------
Host1 (446…) [cable unplugged], VM running | Host2 (586…)                               | Host2 (Higher UUID)
Host1 (446…), VM running                   | Host2 (586…) [cable unplugged]             | Host2 (Higher UUID)
Host1 (446…)                               | Host2 (586…) [cable unplugged], VM running | Host2 (Higher UUID)
Host1 (446…) [cable unplugged]             | Host2 (586…), VM running                   | Host2 (Higher UUID)
Host2 (586…) [cable unplugged], VM running | Host1 (446…)                               | Host2 (Higher UUID)
Host2 (586…), VM running                   | Host1 (446…) [cable unplugged]             | Host2 (Higher UUID)
Host2 (586…)                               | Host1 (446…) [cable unplugged], VM running | Host2 (Higher UUID)
Host2 (586…) [cable unplugged]             | Host1 (446…), VM running                   | Host2 (Higher UUID)
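
The eight rows of the table cover every combination of master, unplugged host, and VM location. A short sketch can enumerate them and confirm that the outcome never changes. It uses only the truncated UUID prefixes shown in the table (the full UUIDs are not given) and assumes plain string comparison; the function name `fenced` is hypothetical.

```python
from itertools import product

# Truncated UUID prefixes exactly as shown in the table; the full UUIDs
# are not given in the article.
HOSTS = {"Host1": "446", "Host2": "586"}


def fenced(master: str, unplugged: str, vm_host: str) -> str:
    """Return the name of the fenced host in a two-host pool.

    Pulling either cable leaves two one-host partitions, so the three
    arguments do not affect the result: the higher UUID is fenced.
    """
    return max(HOSTS, key=HOSTS.get)


# 2 x 2 x 2 = 8 combinations, matching the eight table rows.
for master, unplugged, vm_host in product(HOSTS, repeat=3):
    print(f"master={master} unplugged={unplugged} vm={vm_host} "
          f"-> fenced={fenced(master, unplugged, vm_host)}")
```

Every line of output names Host2 as the fenced host, regardless of which host is master, which cable was pulled, or where the VM runs.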

It is recommended to increase the number of hosts in the pool to three to provide proper failover.

Additional Information

CTX119717 - XenServer 6.x High Availability