This article explains how the new Heterogeneous CPU Pooling feature in XenServer works and how to leverage it to extend your XenServer host pooling capabilities.
To ensure successful live virtual machine migrations, XenServer 5.5 and earlier hosts were only allowed to join a pool if they had identical CPU vendor, model number, family, and feature flag values. However, most system vendors add and discontinue CPU offerings within the life cycle of a server model, making it difficult to purchase servers with identical CPUs over time.
XenServer (starting with version 5.6) contains two changes to simplify adding hosts to pools over time:
When joining a host to a pool, only the features exposed by the CPU are considered to determine CPU compatibility.
Added support for Intel (FlexMigration) and AMD (Extended Migration) technologies that provide CPU "masking" or "leveling".
These CPU masking features allow a CPU to be configured to appear as providing different features than it actually does, enabling CPU models with different features to appear identical.
This combination allows disparate host hardware to be joined into a single resource pool, known as a heterogeneous resource pool.
There are four types of heterogeneous pools:
1. Adding more capable host hardware to a less capable pool
2. Adding less capable host hardware to a more capable pool
3. Combining different and mutually exclusive host hardware into a pool
4. Combining different CPU models that have identical features
Type 1 requires applying a CPU mask on the joining server to make it compatible with the existing pool, which remains unmasked. This type is supported automatically using XenCenter.
Type 2 requires applying a CPU mask on the existing pool hosts to make them compatible with the joining host.
Type 3 requires applying a common CPU mask on both the joining host and the existing pool hosts.
Types 2 and 3 are supported using XenAPI and the xe CLI.
Type 4 represents CPUs with different marketing model names but identical model number, family, and feature flag attributes. Because they have identical attributes, these combinations have always been supported and do not require a mask, but their compatibility has not been obvious when comparing the marketing model names.
An additional nuance is that a newer CPU is not always more capable, nor an older CPU less capable. New CPU models often discontinue features that are present in older CPUs, which can result in mutually exclusive feature sets. As a result, applying a mask to a host with a newer CPU is not guaranteed to be sufficient to join it to a pool containing older CPU models.
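The four types can be read as a set relation between the feature values of the joining host and the pool. The following Python sketch illustrates that idea only: the function names are hypothetical, the feature values are treated as four 32-bit hexadecimal words, and the masking capability of each CPU and the pool's feature mask are deliberately ignored. It does not reproduce how XenCenter or the compare-cpu script is implemented.

```python
def parse_features(value):
    """Split 'xxxxxxxx-xxxxxxxx-xxxxxxxx-xxxxxxxx' into four integers."""
    return [int(word, 16) for word in value.split("-")]


def pool_type(joining, pool):
    """Return the heterogeneous pool type (1-4) for a joining host.

    Assumption: only the feature bits are compared; whether the CPUs
    can actually be masked is not checked here.
    """
    j = parse_features(joining)
    p = parse_features(pool)
    if j == p:
        return 4  # identical features, no mask needed
    if all((a & b) == b for a, b in zip(j, p)):
        return 1  # joining host has a superset: mask the joining host
    if all((a & b) == a for a, b in zip(j, p)):
        return 2  # joining host has a subset: mask the pool hosts
    return 3      # mutually exclusive features: common mask on all hosts


print(pool_type("009ce3bd-bfebfbff-00000001-28100800",
                "000ce3bd-bfebfbff-00000001-20000800"))
```

Swapping the two arguments gives type 2, because the relation between the feature sets is reversed.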
The heterogeneous pool types that require applying a CPU mask to hosts in an existing pool (types 2 and 3) also require that all existing virtual machines in the pool be shut down until every host in the pool has the new CPU configuration in effect. A rolling approach, in which virtual machines are consolidated through migration while hosts are rebooted in turn, cannot be used because virtual machine migration is not supported across hosts with disparate CPU configurations.
The attributes of a XenServer 5.6+ host CPU can be viewed using the xe host-cpu-info CLI command. Output from that command running on hosts with Intel E5502 and X3353 CPUs looks like:
[host_a] # xe host-cpu-info
cpu_count : 4
vendor: GenuineIntel
speed: 1866.734
modelname: Intel(R) Xeon(R) CPU E5502 @ 1.87GHz
family: 6
model: 26
stepping: 5
flags: fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht nx constant_tsc pni vmx est ssse3 sse4_1 sse4_2 popcnt
features: 009ce3bd-bfebfbff-00000001-28100800
features_after_reboot: 009ce3bd-bfebfbff-00000001-28100800
physical_features: 009ce3bd-bfebfbff-00000001-28100800
maskable: full

[host_b] # xe host-cpu-info
cpu_count : 4
vendor: GenuineIntel
speed: 2666.668
modelname: Intel(R) Xeon(R) CPU X3353 @ 2.66GHz
family: 6
model: 23
stepping: 6
flags: fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht constant_tsc pni vmx est ssse3 sse4_1
features: 000ce3bd-bfebfbff-00000001-20000800
features_after_reboot: 000ce3bd-bfebfbff-00000001-20000800
physical_features: 000ce3bd-bfebfbff-00000001-20000800
maskable: base
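When comparing several hosts, it can help to load this output into a dictionary. The sketch below assumes the output has been saved as text with one "name: value" field per line; the function name is hypothetical and the sample is a subset of the host_a output shown above.

```python
def parse_host_cpu_info(text):
    """Turn saved 'xe host-cpu-info' output into a dict of strings.

    Assumption: one 'name: value' pair per line. Lines without a
    colon, such as the shell prompt, are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


SAMPLE = """\
[host_a] # xe host-cpu-info
cpu_count : 4
vendor: GenuineIntel
modelname: Intel(R) Xeon(R) CPU E5502 @ 1.87GHz
features: 009ce3bd-bfebfbff-00000001-28100800
physical_features: 009ce3bd-bfebfbff-00000001-28100800
maskable: full
"""

info = parse_host_cpu_info(SAMPLE)
print(info["modelname"], info["features"], info["maskable"])
```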
In XenServer 5.5 and earlier, hosts were only allowed to join a pool if they had identical vendor, model number, family, and feature flag values¹. With heterogeneous pool support in XenServer 5.6, only the set of features exposed by the joining host's CPU must be identical to that of the pool master², allowing the use of CPU masking features to configure identical sets of features.
¹ In XenServer 5.5, the “est” flag was ignored to ensure compatibility with XenServer 5.0.
² See "The pool.other-config:cpuid_feature_mask setting".
After restarting the host with the masked E5502 CPU, its CPU configuration has a features value identical to that of the X3353, allowing the E5502 host to successfully join a pool of X3353 hosts. After a mask is set, the CPU's unmasked features are retained in the physical_features parameter:
# xe host-cpu-info
cpu_count : 4
vendor: GenuineIntel
speed: 1866.734
modelname: Intel(R) Xeon(R) CPU E5502 @ 1.87GHz
family: 6
model: 26
stepping: 5
flags: fpu de tsc msr pae mce cx8 apic sep mtrr mca cmov pat clflush acpi mmx fxsr sse sse2 ss ht nx constant_tsc pni vmx est ssse3 sse4_1 sse4_2 popcnt
features: 000ce3bd-bfebfbff-00000001-20000800
features_after_reboot: 000ce3bd-bfebfbff-00000001-20000800
physical_features: 009ce3bd-bfebfbff-00000001-28100800
maskable: full
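The three feature fields can be compared to tell whether a host is currently masked and whether a change is still waiting for a reboot. The sketch below is an interpretation based on the field names and the output above, not documented behavior: it assumes features is the value in effect now, features_after_reboot is the value that takes effect at the next restart, and physical_features is the unmasked hardware value. The function name is hypothetical.

```python
def mask_state(info):
    """Summarize the masking state from host-cpu-info fields.

    Assumption: 'info' is a dict holding the three feature fields
    as strings, as printed by 'xe host-cpu-info'.
    """
    return {
        "masked_now": info["features"] != info["physical_features"],
        "change_pending": info["features"] != info["features_after_reboot"],
    }


masked_e5502 = {
    "features": "000ce3bd-bfebfbff-00000001-20000800",
    "features_after_reboot": "000ce3bd-bfebfbff-00000001-20000800",
    "physical_features": "009ce3bd-bfebfbff-00000001-28100800",
}

state = mask_state(masked_e5502)
print(state)
```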
There are several methods to determine CPU compatibility.
Use the XenServer HCL. If you do not have a host that contains the CPU in question, such as when considering purchase of a new host to add to an existing pool, use the XenServer HCL (Hardware Compatibility List) to verify if the CPU model in the host being considered for purchase has been certified as compatible with the CPU model in the pool master host.
Attempt to join the host using XenCenter. If you already have the host to be added, attempt the pool join using XenCenter. When XenCenter detects that a joining host's CPU has a different features value than the pool master, it evaluates the two values to determine if type 1 applies and CPU masking can be used. If all of the following conditions are met, XenCenter offers to automatically calculate and apply the appropriate mask on the joining host; if any of them is not met, the pool join fails with a "This server's hardware is incompatible with the master's" message:
The existing pool and joining host must both have an Advanced, Enterprise or Platinum license
The joining host's CPU has FlexMigration or Extended Migration support
The CPU vendor (Intel/AMD) of the joining host is the same as the pool master
The features of the joining host's CPU are a superset of the pool master host's CPU features
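The four conditions above can be written as a single check. This is only a sketch: the function names are hypothetical, the license condition is passed in as a plain boolean, and treating a maskable value of "base" or "full" as evidence of FlexMigration or Extended Migration support is an assumption drawn from the host-cpu-info output shown earlier.

```python
def words(value):
    """Split a features string into four integers."""
    return [int(word, 16) for word in value.split("-")]


def is_superset(candidate, reference):
    """True if every feature bit in 'reference' is also in 'candidate'."""
    return all((c & r) == r for c, r in zip(words(candidate), words(reference)))


def type1_conditions_met(joining, master, both_licensed):
    """Check the conditions listed above for an automatic type 1 mask."""
    return (
        both_licensed
        and joining["maskable"] in ("base", "full")  # assumed meaning
        and joining["vendor"] == master["vendor"]
        and is_superset(joining["features"], master["features"])
    )


e5502 = {"vendor": "GenuineIntel", "maskable": "full",
         "features": "009ce3bd-bfebfbff-00000001-28100800"}
x3353 = {"vendor": "GenuineIntel", "maskable": "base",
         "features": "000ce3bd-bfebfbff-00000001-20000800"}

print(type1_conditions_met(e5502, x3353, both_licensed=True))
```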
Use the compare-cpu script. The compare-cpu script (included in the Heterogeneous CPU Pool self-test kit) uses the output of the xe host-cpu-info command from the joining host and the existing pool master host to compare the feature values and masking capabilities, and returns which type applies. With the examples above saved as E5502.txt and X3353.txt respectively, compare-cpu provides the following output:
# ./compare-cpu E5502.txt X3353.txt -v
file1: E5502.txt
file2: X3353.txt
pool_mask: ffffff7f-ffffffff-ffffffff-ffffffff
CPU 1:
model name: Intel(R) Xeon(R) CPU E5502 @ 1.87GHz
features: 009ce3bd-bfebfbff-00000001-28100800
masking level: full
CPU 2:
model name: Intel(R) Xeon(R) CPU X3353 @ 2.66GHz
features: 000ce3bd-bfebfbff-00000001-20000800
masking level: base
Result: CPU 1 and CPU 2 are compatible for masking
Mask type: 1
CPU 1 has a superset of features to CPU 2
Mask: 000ce3bd-bfebfbff-00000001-20000800

# ./compare-cpu X3353.txt E5502.txt -v
file1: X3353.txt
file2: E5502.txt
pool_mask: ffffff7f-ffffffff-ffffffff-ffffffff
CPU 1:
model name: Intel(R) Xeon(R) CPU X3353 @ 2.66GHz
features: 000ce3bd-bfebfbff-00000001-20000800
masking level: base
CPU 2:
model name: Intel(R) Xeon(R) CPU E5502 @ 1.87GHz
features: 009ce3bd-bfebfbff-00000001-28100800
masking level: full
Result: CPU 1 and CPU 2 are compatible for masking
Mask type: 2
CPU 1 has a subset of features to CPU 2
Mask: 000ce3bd-bfebfbff-00000001-20000800
Joining the E5502 to a pool of X3353s represents type 1. The reverse (joining an X3353 to a pool of E5502s) therefore represents type 2, and in that case the mask required on the E5502 hosts for compatibility with the X3353 is simply the X3353's features value.
It is also possible to have CPU combinations with mutually exclusive features (type 3):
# ./compare-cpu X5560.txt E5420.txt -v
file1: X5560.txt
file2: E5420.txt
pool_mask: ffffff7f-ffffffff-ffffffff-ffffffff
CPU 1:
model name: Intel(R) Xeon(R) CPU X5560 @ 2.80GHz
features: 009ce3bd-bfebfbff-00000001-28100800
masking level: full
CPU 2:
model name: Intel(R) Xeon(R) CPU E5420 @ 2.50GHz
features: 040ce3bd-bfebfbff-00000001-20100800
masking level: base
Result: CPU 1 and CPU 2 are compatible for masking
Mask type: 3
CPU 1 and CPU 2 have a mutually exclusive set of features but support a common mask
Mask: 000ce3bd-bfebfbff-00000001-20100800
The common mask must be applied to all hosts to address the mutually exclusive differences. As with any mask applied to hosts in an existing pool, all virtual machines in the pool must be shut down until every host has the new CPU configuration in effect. A rolling approach, in which virtual machines are consolidated through migration while hosts are rebooted in turn, cannot be used because virtual machine migration is not supported across hosts with disparate CPU configurations.
Manually: There are two sets of CPU features: base features and extended features. Both sets are separated into two halves, known as ecx and edx. Determining the masking options for a given pair of CPUs requires comparing the feature values in combination with the relative masking capability of each CPU. XenServer stores the feature bits in hexadecimal for brevity.
The type 3 examples above have the following features and masking support:
|| CPU model || base_ecx || base_edx || ext_ecx || ext_edx || Masking level ||
| X5560 | 009ce3bd | bfebfbff | 00000001 | 28100800 | full |
| E5420 | 040ce3bd | bfebfbff | 00000001 | 20100800 | base |
The differences can be observed to be in base_ecx and ext_edx. Converting to binary shows the specific variance in supported feature bits:
|| CPU model || base_ecx (bin) || ext_edx (bin) ||
| X5560 | 000100111001110001110111101 | 101000000100000000100000000000 |
| E5420 | 100000011001110001110111101 | 100000000100000000100000000000 |
|       | x  x  x                     |   x                            |
|       | 26 23 20                  0 |   27                         0 |
In base_ecx, bits 20 and 23 are present in the X5560 but not in the E5420, and bit 26 is present in the E5420 but not in the X5560. In ext_edx, bit 27 is present in the X5560 but not in the E5420. Because both CPUs support base masking and the X5560 supports full masking (base and extended), a joint mask is possible. The joint mask can be calculated by performing a bitwise AND to turn off the mutually exclusive feature bits:
|| CPU model || base_ecx (bin) || ext_edx (bin) ||
| X5560 | 000100111001110001110111101 | 101000000100000000100000000000 |
| E5420 | 100000011001110001110111101 | 100000000100000000100000000000 |
|       | x  x  x                     |   x                            |
| Joint | 000000011001110001110111101 | 100000000100000000100000000000 |
Converting the joint base_ecx and ext_edx values back to hexadecimal and padding to eight digits gives:
|| base_ecx || ext_edx ||
| 000ce3bd | 20100800 |
Combining those values with the unchanged base_edx and ext_ecx values provides the joint mask, which matches the mask reported by compare-cpu for this pair:
000ce3bd-bfebfbff-00000001-20100800
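The manual steps above (finding the bits that only one CPU has, then turning them off with a bitwise AND) can be sketched in Python. The function names are hypothetical, and the sketch does not check whether each CPU's masking level (base or full) actually allows the differing bits to be masked.

```python
from functools import reduce


def words(value):
    """Split a features string into [base_ecx, base_edx, ext_ecx, ext_edx]."""
    return [int(word, 16) for word in value.split("-")]


def set_bits(word):
    """Bit positions that are set in a 32-bit word, highest first."""
    return [bit for bit in range(31, -1, -1) if (word >> bit) & 1]


def exclusive_bits(a, b):
    """For each word: (bits only in a, bits only in b)."""
    return [(set_bits(x & ~y), set_bits(y & ~x))
            for x, y in zip(words(a), words(b))]


def joint_mask(*values):
    """Bitwise AND of the given feature values, word by word."""
    columns = zip(*(words(v) for v in values))
    return "-".join(f"{reduce(lambda x, y: x & y, col):08x}" for col in columns)


X5560 = "009ce3bd-bfebfbff-00000001-28100800"
E5420 = "040ce3bd-bfebfbff-00000001-20100800"

print(exclusive_bits(X5560, E5420))  # differing bits per word
print(joint_mask(X5560, E5420))      # 000ce3bd-bfebfbff-00000001-20100800
```

Because joint_mask accepts any number of values, the same call can be used when a pool contains more than two distinct CPU models.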
Each XenServer (starting with version 5.6) pool contains a pool.other-config setting that is used during the evaluation of CPU compatibility. The cpuid_feature_mask value represents a set of feature bits to ignore while comparing CPU features. By default, this value is ffffff7f-ffffffff-ffffffff-ffffffff, which, after converting to binary, shows that only base_ecx bit 7 (the "EST" feature flag) is ignored to provide compatibility with XenServer 5.0 and 5.5.
Modifications to the cpuid_feature_mask should be made with great caution because the setting allows hosts with different features to be joined within a pool. If a virtual machine’s operating system or application detects and relies upon the presence of a specific feature, it might become unstable if migrated from a host that has the feature to one that does not.
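The effect of the cpuid_feature_mask can be illustrated by comparing two feature values only on the bits that the mask leaves set. This is a sketch of the comparison as described above, not XenServer's actual code; the function names are hypothetical, and the second feature value in the example is made up by clearing bit 7 of base_ecx.

```python
DEFAULT_POOL_MASK = "ffffff7f-ffffffff-ffffffff-ffffffff"


def words(value):
    """Split a features or mask string into four integers."""
    return [int(word, 16) for word in value.split("-")]


def features_match(a, b, pool_mask=DEFAULT_POOL_MASK):
    """Compare two feature values, ignoring bits that are 0 in pool_mask."""
    return all((x & m) == (y & m)
               for x, y, m in zip(words(a), words(b), words(pool_mask)))


with_est = "000ce3bd-bfebfbff-00000001-20000800"
without_est = "000ce33d-bfebfbff-00000001-20000800"  # illustrative: bit 7 cleared

print(features_match(with_est, without_est))
print(features_match(with_est, without_est,
                     pool_mask="ffffffff-ffffffff-ffffffff-ffffffff"))
```

With the default mask, the two values are treated as identical because they differ only in the ignored bit. With an all-ones mask, every bit is compared and the values no longer match.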
Additional CPU combinations can be certified using the XenServer Server CPU Pooling Self-Test kit. Download the self-test kit from here. The kit includes details on testing requirements and how to submit results.