This article describes how to diagnose an unresponsive XenServer or Citrix Hypervisor server.
If you cannot connect to your Citrix Hypervisor server from XenCenter or another orchestration tool, you can use the steps in this article to attempt to regain access to the unresponsive server and gather diagnostics.
As part of these steps, you might be directed to gather the following types of diagnostic information:
The xen-bugtool command collates the Xen dmesg output, details of the hardware configuration of your machine, information about the build of Xen that you are using, plus, if you allow it, various logs.
If a Citrix Hypervisor server crashes, the running kernel can migrate into a special memory area. This mechanism is based on kexec functions, which load a special kernel without the need for a cold restart of the server. This new kernel attempts to gather as many details about the crash as possible (including stack traces, various structures, and the kernel ring buffer from memory) and save them into the /var/crash directory.
However, as the server is in an exceptional state when it attempts this process, production of the crashdump is not guaranteed.
Depending on your server hardware, the Baseboard Management Controller (BMC) might gather information about system and network watchdogs, error logs, and sensor data into logs that you can access and use to diagnose potential hardware and firmware issues.
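Where the BMC supports IPMI, its event log and sensor records can often be saved to files for later analysis. The following is a minimal sketch, assuming the ipmitool utility is installed and can reach the BMC. The function name collect_bmc_logs and the output file names are hypothetical, and your hardware vendor might provide a different tool.

```shell
#!/bin/sh
# Sketch: save the BMC System Event Log and sensor data records to files.
# Assumption: ipmitool is installed and can talk to the BMC. Any arguments
# after the output directory (for example -I lanplus -H <bmc-address> -U <user>)
# are passed straight through to ipmitool.
collect_bmc_logs() {
    [ "$#" -ge 1 ] || { echo "usage: collect_bmc_logs OUT_DIR [ipmitool options]" >&2; return 2; }
    out_dir=$1
    shift
    command -v ipmitool >/dev/null 2>&1 || {
        echo "ipmitool not found" >&2
        return 1
    }
    mkdir -p "$out_dir" || return 1
    # System Event Log: hardware errors, watchdog events, and similar entries.
    ipmitool "$@" sel list > "$out_dir/bmc-sel.txt" || return 1
    # Sensor Data Records: temperatures, voltages, fan speeds.
    ipmitool "$@" sdr list > "$out_dir/bmc-sdr.txt"
}

# Example (local BMC access from dom0):
#   collect_bmc_logs /tmp/bmc-logs
```

Passing the connection options through unchanged keeps the same function usable both locally and against a remote BMC.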
Citrix Hypervisor server or XenServer host that is configured to use serial console (any supported version). For more information, see CTX121442 - How to Configure Serial Console Access on XenServer.
Serial console access. This must be a real serial console, not one emulated by the CPU running Xen. However, a serial console emulated by the BMC (for example, serial over LAN or the various other protocols) can be used for this procedure.
If you need to go as far as step 2, your serial console access must either be direct or use a different network from the one that carries SSH or XenCenter traffic.
If the Citrix Hypervisor server is not responsive to XenCenter, it might still be running but management functionality has failed.
xen-bugtool --yestoall
This command creates a status report file on the Citrix Hypervisor server. The command prints the location of this file.
Copy your status report file from the Citrix Hypervisor server to another system to retain it.
If you are successful in this step, you do not need to progress to the next step.
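If SSH still works, generating the status report and copying it off the server can be done in one pass from another machine. The sketch below is illustrative only. It assumes root SSH access, and it assumes that xen-bugtool prints the report location as an absolute path ending in .tar, .tar.bz2, .tar.gz, or .zip. The function names extract_report_path and fetch_status_report are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read xen-bugtool output on stdin and print the last
# absolute path that looks like a report archive.
# Assumption: the report path ends in .tar, .tar.bz2, .tar.gz or .zip.
extract_report_path() {
    grep -oE '/[^[:space:]]+\.(tar\.bz2|tar\.gz|tar|zip)' | tail -n 1
}

# fetch_status_report HOST DEST_DIR
# Runs xen-bugtool on HOST over SSH, then copies the report into DEST_DIR.
fetch_status_report() {
    host=$1
    dest=$2
    report=$(ssh "root@$host" 'xen-bugtool --yestoall' | extract_report_path)
    [ -n "$report" ] || { echo "no report path found in xen-bugtool output" >&2; return 1; }
    scp "root@$host:$report" "$dest/"
}

# Example (host name is a placeholder):
#   fetch_status_report xhost.example.com /tmp
```

If the path is not detected, run xen-bugtool interactively and copy the file that it reports by hand.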
If you are unable to ping or SSH to the Citrix Hypervisor server, the issue might be with either your network infrastructure or the networking stack on the Citrix Hypervisor server.
xen-bugtool --yestoall
This command creates a status report file on the Citrix Hypervisor server. The command prints the location of this file.
You can copy your status report file from the Citrix Hypervisor server to a physical USB drive to retain it.
If you are successful in this step, you do not need to progress to the next step.
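When the report is copied to removable media from a server in an uncertain state, it is worth confirming that the copy is intact before unplugging the drive. A minimal sketch follows, assuming the drive is already mounted and that sha256sum is available. The function name copy_and_verify and the mount point /mnt/usb are hypothetical.

```shell
#!/bin/sh
# copy_and_verify SRC DEST_DIR
# Copies SRC into DEST_DIR, flushes buffers to disk, and compares checksums.
# Returns 0 only if the copy exists and matches the original.
copy_and_verify() {
    src=$1
    dest_dir=$2
    cp -- "$src" "$dest_dir/" || return 1
    # Flush pending writes so the data is really on the removable drive.
    sync
    src_sum=$(sha256sum < "$src" | cut -d' ' -f1)
    dst_sum=$(sha256sum < "$dest_dir/$(basename -- "$src")" | cut -d' ' -f1)
    [ "$src_sum" = "$dst_sum" ]
}

# Example (paths are placeholders):
#   copy_and_verify /path/to/status-report.tar.bz2 /mnt/usb && umount /mnt/usb
```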
In response to an NMI, Citrix Hypervisor attempts to crash and gather crashdump information. However, as the server is in an exceptional state, production of the crashdump is not guaranteed.
Alternatively, you can log on to the dom0 console and check for the new crash dump on the server:
[root@xhost ~]# xe host-crashdump-list
uuid ( RO)         : 82e7e92a-d0b3-28e8-4cd8-3a8aea131775
    timestamp ( RO): 20090323T15:05:28Z
         size ( RO): 16874464
[root@xhost ~]# ls /var/crash/
20090323-145416-GMT
If you are successful in this step, you do not need to progress to the next step.
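The check for a new crash dump can be reduced to finding the most recent entry under /var/crash. The sketch below assumes that entries are named by timestamp in the form shown in the example output above (20090323-145416-GMT), so that sorting by name also sorts by time. The function name latest_crash_dir is hypothetical.

```shell
#!/bin/sh
# latest_crash_dir [CRASH_DIR]
# Prints the newest entry under CRASH_DIR (default /var/crash).
# Assumption: entry names start with YYYYMMDD-HHMMSS, so a plain name sort
# puts the most recent entry last. Returns 1 if the directory is empty.
latest_crash_dir() {
    crash_dir=${1:-/var/crash}
    latest=$(ls -1 "$crash_dir" 2>/dev/null | sort | tail -n 1)
    [ -n "$latest" ] || return 1
    printf '%s\n' "$crash_dir/$latest"
}

# Example:
#   latest_crash_dir || echo "no crash dump found"
```

Compare the timestamp in the printed name with the time at which you triggered the NMI to confirm that the dump is the new one.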
If the NMI button doesn't work, you can attempt to trigger a crash from the Xen hypervisor console. This is a best-effort process for creating a crashdump. However, as the server is in an exceptional state, production of the crashdump is not guaranteed.
Connect to the Citrix Hypervisor server through the serial console.
Press Ctrl + A three times. This shortcut switches you to the Xen hypervisor menu.
Press h to display all available operations.
Press Shift + C to trigger a crashdump.
Wait until the Citrix Hypervisor server restarts, then check to see if XenCenter can now connect to the server.
Alternatively, you can log on to the dom0 console and check for the new crash dump on the server:
[root@xhost ~]# xe host-crashdump-list
uuid ( RO)         : 82e7e92a-d0b3-28e8-4cd8-3a8aea131775
    timestamp ( RO): 20090323T15:05:28Z
         size ( RO): 16874464
[root@xhost ~]# ls /var/crash/
20090323-145416-GMT
Copy the crashdump information from the Citrix Hypervisor server to another system to retain it.
If you are successful in this step, you do not need to progress to the next step.
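Because a crash dump is a directory rather than a single file, packing it into one archive can make the copy to another system simpler. This is a minimal sketch, assuming tar with gzip support is available in dom0. The function name archive_crash_dir and the archive naming scheme are hypothetical.

```shell
#!/bin/sh
# archive_crash_dir SRC_DIR OUT_DIR
# Packs one crash dump directory into OUT_DIR/crash-<name>.tar.gz and prints
# the archive path. The archive stores paths relative to the parent of SRC_DIR.
archive_crash_dir() {
    src=${1%/}
    out_dir=$2
    [ -d "$src" ] || { echo "not a directory: $src" >&2; return 1; }
    name=$(basename -- "$src")
    archive="$out_dir/crash-$name.tar.gz"
    tar -czf "$archive" -C "$(dirname -- "$src")" "$name" || return 1
    printf '%s\n' "$archive"
}

# Example (remote host name is a placeholder):
#   archive=$(archive_crash_dir /var/crash/20090323-145416-GMT /tmp)
#   scp "$archive" user@other-host.example.com:/tmp/
```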
If none of the previous steps have been successful in gathering diagnostics and recovering the server, you must power the server off and on to attempt to recover the server to a working state. When you complete this step, no crashdump is produced.
If you are able to reproduce the issue that caused the server to become unresponsive, you can edit the file syslog.conf to log more information. However, we advise that you gather as much information as possible from the unresponsive server before rebooting it to edit syslog.conf and attempting to reproduce the issue.
Edit the file /etc/syslog.conf and remove the comment symbol (#) from the following line:
kern.* /dev/console
[root@rxen1 ~]# service syslog restart
[root@rxen1 ~]# service rsyslog restart
Shutting down kernel logger: [ OK ]
Shutting down system logger: [ OK ]
Starting system logger: [ OK ]
Starting kernel logger: [ OK ]
Restart the Citrix Hypervisor server and start it from the xe-serial configuration.
After the boot: prompt appears, type xe-serial (or choose the "serial" option) and restart the server.
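The syslog.conf edit described above can also be scripted, which helps when the same change is needed on several servers. The following is a sketch only. It assumes the commented line has the form "#kern.* /dev/console" (with any amount of whitespace), and the function name enable_kernel_console_logging is hypothetical. The file path is a parameter because the location of the syslog configuration file can differ between versions.

```shell
#!/bin/sh
# enable_kernel_console_logging [CONF_FILE]
# Uncomments the "kern.* /dev/console" line in CONF_FILE
# (default /etc/syslog.conf). A backup is kept as CONF_FILE.bak.
enable_kernel_console_logging() {
    conf=${1:-/etc/syslog.conf}
    cp -p -- "$conf" "$conf.bak" || return 1
    # Strip a leading "#" (and surrounding whitespace) only from the line
    # that sends kernel messages to /dev/console; other lines are untouched.
    sed 's|^[[:space:]]*#[[:space:]]*\(kern\.\*[[:space:]]\{1,\}/dev/console\)|\1|' \
        "$conf.bak" > "$conf"
}

# Example:
#   enable_kernel_console_logging /etc/syslog.conf
```

After running the function, restart the syslog service as shown earlier so that the change takes effect.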