This article describes High Availability (HA) behavior in situations when the heartbeat is lost on XenServer 5.x.
HA does not check the link state of the interfaces. Instead, it works on the basis of HA host groups and network partitions. The HA daemon chooses the larger of the network partitions to take over and fences the servers in the smaller partition.
See Peeking Under the Hood of High Availability for more information.
When there are three hosts in the pool and one loses its connection, the HA daemon keeps the larger network partition (two hosts) running. The host in the smaller partition (one host) is fenced and rebooted.
When there are only two hosts in an HA pool and one of them loses connection to the other, there is no larger partition to choose, and the HA daemon has no way of knowing which of the two hosts has had its network cable pulled. It defaults to keeping the host with the lowest UUID, which may be the master with the bad network connection. In other words, the host with the higher host UUID reboots.
To observe this behavior, unplug the cable from the management interface on one of the hosts in the pool. The different scenarios are as follows:
HA maintains the state of all three hosts. If one host (Host 3) loses its connection or heartbeat to the others, HA determines that Host 1 and Host 2 can still communicate (the larger host group, or larger network partition). Therefore, Host 3 (the smaller host group, or smaller network partition) is fenced.
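The three-host case can be sketched as a small model. This is a simplified illustration of the behavior described above, not the HA daemon's actual implementation: the function names `partitions` and `fenced_hosts`, the host labels, and the representation of heartbeats as a list of working links are all assumptions made for the example.

```python
from collections import deque


def partitions(hosts, links):
    """Group hosts into network partitions (connected components).

    hosts: list of host names (hypothetical labels).
    links: list of (host_a, host_b) pairs that can still exchange heartbeats.
    """
    neighbours = {h: set() for h in hosts}
    for a, b in links:
        neighbours[a].add(b)
        neighbours[b].add(a)

    seen, groups = set(), []
    for start in hosts:
        if start in seen:
            continue
        # Breadth-first search collects every host reachable from `start`.
        group, queue = set(), deque([start])
        seen.add(start)
        while queue:
            host = queue.popleft()
            group.add(host)
            for other in neighbours[host] - seen:
                seen.add(other)
                queue.append(other)
        groups.append(group)
    return groups


def fenced_hosts(hosts, links):
    """Return the hosts that fall outside the largest partition.

    Assumes one partition is strictly larger than the others;
    equal-sized partitions are not handled in this sketch.
    """
    largest = max(partitions(hosts, links), key=len)
    return set(hosts) - largest


hosts = ["host1", "host2", "host3"]
links = [("host1", "host2")]  # host3 has lost its heartbeat to the others
print(fenced_hosts(hosts, links))  # {'host3'}
```

With only the link between `host1` and `host2` intact, the two of them form the larger partition and `host3` is the one reported as fenced.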
The preceding example highlights a problem with having only two servers in the pool: after Host 1 and Host 2 lose their connection to each other, the two network partitions are the same size. In this case, the host with the higher host UUID is fenced and rebooted.
The following table shows the HA action for several scenarios in which the cable is unplugged on a particular host.
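The tie-break for equal-sized partitions can be sketched as follows. This is a minimal model of the rule described above, not the HA daemon's code. The function name `choose_survivor` and the UUID strings are hypothetical, and comparing UUIDs as plain strings is an assumption, since the article only says "lowest UUID".

```python
def choose_survivor(partition_a, partition_b):
    """Pick the partition that stays up; the other one is fenced.

    Each partition is a non-empty set of host UUID strings.
    """
    # A strictly larger partition always wins.
    if len(partition_a) != len(partition_b):
        return max(partition_a, partition_b, key=len)
    # Equal sizes: the partition holding the lowest UUID wins.
    # Assumption: UUIDs are compared as plain strings.
    return min(partition_a, partition_b, key=min)


# Hypothetical UUIDs, for illustration only.
host1 = {"aaaa-host1"}
host2 = {"bbbb-host2"}

# The result does not depend on which host is the master or on which
# host had its cable unplugged: the lower UUID survives either way.
print(choose_survivor(host1, host2))  # {'aaaa-host1'}
print(choose_survivor(host2, host1))  # {'aaaa-host1'}
```

Because the decision uses only partition size and UUID order, the host with the higher UUID is fenced in every two-host scenario, regardless of where the fault actually is.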
| Master | Slave | Which Server Is Fenced? |
| --- | --- | --- |
| Host1 (446…) [cable unplugged] | Host2 (586…) | Host2 (higher UUID) |
| Host1 (446…) | Host2 (586…) [cable unplugged] | Host2 (higher UUID) |
| Host1 (446…) | Host2 (586…) [cable unplugged] | Host2 (higher UUID) |
| Host1 (446…) [cable unplugged] | Host2 (586…) | Host2 (higher UUID) |
| Host2 (586…) [cable unplugged] | Host1 (446…) | Host2 (higher UUID) |
| Host2 (586…) | Host1 (446…) [cable unplugged] | Host2 (higher UUID) |
| Host2 (586…) | Host1 (446…) [cable unplugged] | Host2 (higher UUID) |
| Host2 (586…) [cable unplugged] | Host1 (446…) | Host2 (higher UUID) |
It is recommended to increase the number of hosts in the pool to three to provide proper failover.