How to Adjust the Bond Balance Interval in XenServer 6.x

Article ID: CTX134947

Description

XenServer 6.1.0 introduces a change to the NIC bonding load balancing algorithm for active-active bonds in the vSwitch network stack. In past releases, active-active bonds were set to rebalance load every 10 seconds. However, in XenServer 6.1.0 active-active bonds on the vSwitch rebalance load every 30 minutes. This article describes the new rebalancing interval, the rationale for the change, and how and when users should change the interval.

Source-load Balancing

NIC bonding is a mechanism that allows two or more network interface cards (NICs) to be used as if they were one physical NIC. When an active-active bond is created, outgoing traffic can use all bonded NICs. Active-active bonds use source-load balancing (SLB), which employs the following method to determine which NIC to use for a particular packet:

  • The source MAC address is extracted, and a hashing algorithm is used to map it to a hash number 0-255.

  • Each hash is assigned to one of the NICs on the bond, which means packets with the same hash are always sent through the same NIC.

  • If a new hash is found, it is assigned to the NIC that currently has the lowest utilization.

In practice, this means that when virtual machines (VMs) are set up on a bond, packets from one VM (with the same source MAC) will always be sent through the same NIC.
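The assignment rule can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the Open vSwitch implementation: the hash is a stand-in (the sum of the MAC octets modulo 256), "utilization" is a simple byte counter, and the function name `slb_assign` and the NIC names are hypothetical.

```shell
# Illustrative sketch of SLB hash-to-NIC assignment; NOT the Open vSwitch code.
# Input:  lines of "<source_mac> <bytes>" on stdin; arguments: NIC names.
# Output: one line per packet, "<source_mac> <hash> <nic>".
slb_assign() {
    awk -v nics="$*" '
        # Stand-in hash (an assumption): sum of the MAC octets, modulo 256.
        function mac_hash(mac,    parts, k, i, sum, hi, lo) {
            k = split(tolower(mac), parts, ":")
            sum = 0
            for (i = 1; i <= k; i++) {
                hi = index(hex, substr(parts[i], 1, 1)) - 1
                lo = index(hex, substr(parts[i], 2, 1)) - 1
                sum += 16 * hi + lo
            }
            return sum % 256
        }
        BEGIN { n = split(nics, name, " "); hex = "0123456789abcdef" }
        {
            h = mac_hash($1)
            if (!(h in owner)) {
                # New hash: assign it to the NIC with the lowest utilization.
                best = 1
                for (i = 2; i <= n; i++)
                    if (load[i] + 0 < load[best] + 0) best = i
                owner[h] = best
            }
            # Known hash: always reuse the NIC it was assigned to.
            load[owner[h]] += $2
            print $1, h, name[owner[h]]
        }'
}

# Hypothetical traffic from three VMs; the first MAC sends twice.
printf '%s\n' \
    '00:16:3e:00:00:01 1000' \
    '00:16:3e:00:00:02 400' \
    '00:16:3e:00:00:01 1000' \
    '00:16:3e:00:00:03 100' | slb_assign eth0 eth1
```

In this run, both packets from the first MAC are reported on the same NIC, while the second and third MACs produce new hashes and are placed on whichever NIC has carried fewer bytes at that moment.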

Rebalancing

To ensure the load on the NICs is balanced, rebalancing occurs at a regular interval. The amount of traffic each hash generated recently is measured. The hashes are then redistributed between the NICs to ensure all NICs are utilized to approximately the same extent.
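One rebalancing pass can be sketched as follows. The greedy strategy used here (place the busiest hashes first, each on the NIC with the least traffic assigned so far) is an assumption chosen for illustration and is not necessarily what the vSwitch does internally; the function name `rebalance` and the traffic figures are hypothetical.

```shell
# Sketch of one rebalancing pass; NOT the Open vSwitch algorithm.
# Input:  lines of "<hash> <recent_bytes>" on stdin; arguments: NIC names.
# Output: lines of "<hash> <nic>" giving the new assignment.
rebalance() {
    # Busiest hashes first; ties are broken by hash number.
    sort -k2,2nr -k1,1n | awk -v nics="$*" '
        BEGIN { n = split(nics, name, " ") }
        {
            # Pick the NIC with the least traffic assigned so far.
            best = 1
            for (i = 2; i <= n; i++)
                if (load[i] + 0 < load[best] + 0) best = i
            load[best] += $2
            print $1, name[best]
        }'
}

# Hypothetical measurements: hash 17 is busy, the others are lighter.
printf '%s\n' '17 900' '42 300' '99 250' '200 200' | rebalance eth0 eth1
```

With these figures, hash 17 ends up alone on eth0 (900 bytes) and the three lighter hashes share eth1 (750 bytes), so the two links carry roughly the same load.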

In previous releases (up to and including XenServer 6.0.2), the rebalancing interval was 10 seconds (for both vSwitch and the Linux bridge network stack).

Frequent Rebalancing Issues

The source-load balancing algorithm works on the assumption that if a hash is moved between NICs in the bond, the switch will update its MAC table and will correctly redirect all traffic to the new NIC. However, some switches do not cope well with frequent rebalancing, which results in a condition known as MAC flapping.

MAC flapping occurs when the switch receives packets with the same source MAC address on different ports within short intervals of time. It can cause issues such as MAC table corruption or high CPU usage. Issues due to MAC flapping are more likely to occur when bonded NICs are connected to non-stacked switches. Previously, for some configurations, the only remedy was to switch to active-passive bonding mode, which offers failover functionality but does not use more than one link at the same time.

Change in XenServer 6.1.0

In XenServer 6.1.0, the rebalancing interval for the active-active bonding mode was increased from 10 seconds to 30 minutes. This is intended to eliminate problems related to MAC flapping while still providing load balancing. Consequently, for traffic levels that vary significantly over short periods of time, the traffic distribution may be less even than in previous XenServer versions.


Instructions

To change the rebalancing interval, complete the following procedure:

  1. If you experience uneven load distribution on active-active bonds, you might consider lowering the default value of 30 minutes, provided the switches used can tolerate more frequent MAC shifting between links. The following example illustrates how to obtain the bond's PIF uuid and device name and then how to verify the current rebalancing interval.
    xe bond-list params=master
    master ( RO) : 86e1c0c4-62fa-4964-7a77-e8cdfbea1b08
    xe pif-param-get uuid=86e1c0c4-62fa-4964-7a77-e8cdfbea1b08 param-name=device
    bond0
    ovs-vsctl list port bond0 | grep other_config
    other_config : {bond-detect-mode=carrier, bond-miimon-interval="100", bond-rebalance-interval="1800000"}

  2. The bond-rebalance-interval is expressed in milliseconds. A value of 0 indicates that automatic rebalancing will not occur (that is, each hash will remain assigned to the same NIC). If you want to change the rebalancing interval back to 10 seconds (10000 milliseconds), execute the following commands:
    xe pif-param-set other-config:bond-rebalance-interval=10000 uuid=<pif_uuid>
    xe pif-plug uuid=<pif_uuid> 

    Note that the other-config PIF parameter will be passed to the vSwitch layer only after replugging the PIF (or rebooting the server).

  3. A low rebalancing interval is known to cause issues on certain switches. Apply this change only if you are sure that your switches will cope well with MACs shifting frequently between ports (for example, if in the past you have successfully used active-active bonds on the same setup). If you experience any issues, revert to the initial settings.
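Because the interval is set in milliseconds, it is easy to be off by a factor of 1000. The helpers below are a minimal sketch, with hypothetical function names, that converts seconds to milliseconds, extracts the current value from an other_config line like the one shown in step 1, and prints (without running) the two commands from step 2.

```shell
# Helpers around bond-rebalance-interval. Function names are hypothetical.

# Convert seconds to the milliseconds value the setting expects.
seconds_to_ms() {
    echo $(( $1 * 1000 ))
}

# Read an other_config line on stdin and print the interval in milliseconds.
# The surrounding double quotes in the ovs-vsctl output are treated as optional.
parse_rebalance_interval() {
    sed -n 's/.*bond-rebalance-interval="\{0,1\}\([0-9][0-9]*\)"\{0,1\}.*/\1/p'
}

# Print, but do not execute, the commands from step 2.
# $1 = PIF uuid, $2 = desired interval in seconds.
print_set_commands() {
    echo "xe pif-param-set other-config:bond-rebalance-interval=$(seconds_to_ms "$2") uuid=$1"
    echo "xe pif-plug uuid=$1"
}

# Sample line copied from the output in step 1.
line='other_config : {bond-detect-mode=carrier, bond-miimon-interval="100", bond-rebalance-interval="1800000"}'
ms=$(echo "$line" | parse_rebalance_interval)
echo "current interval: $ms ms ($(( ms / 60000 )) minutes)"

# Commands for a 10-second interval, using the example uuid from step 1.
print_set_commands 86e1c0c4-62fa-4964-7a77-e8cdfbea1b08 10
```

Printing the commands first allows the uuid and the millisecond value to be reviewed before anything is changed on the host.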

Issue/Introduction

This article describes how to adjust the Bond Balance Interval in XenServer 6.x.

Additional Information

If you experience any difficulties, contact Citrix Technical Support.

For a review of all supported bonding modes, refer to CTX134585 – XenServer 6.1.0 Administrator's Guide.

For additional information about bonding in the vSwitch network stack, refer to the Open vSwitch documentation. XenServer 6.1.0 uses Open vSwitch v1.4.2.

For further information about XenServer 6.1.0, refer to CTX134582 – XenServer 6.1.0 Release Notes.