Diagnosing an unresponsive Citrix Hypervisor server

Article ID: CTX120540

Description

This article describes how to diagnose an unresponsive XenServer or Citrix Hypervisor server. 

If you cannot connect to your Citrix Hypervisor server from XenCenter or another orchestration tool, you can use the steps in this article to attempt to regain access to the unresponsive server and gather diagnostics.

As part of these steps, you might be directed to gather the following types of diagnostic information:

Bugtool

The xen-bugtool command collates the Xen dmesg output, details of the hardware configuration of your machine, information about the build of Xen that you are using, plus, if you allow it, various logs.

 

Crashdump

If a Citrix Hypervisor server crashes, control can pass to a special crash kernel that is loaded through kexec into a reserved memory area, without the need for a cold restart of the server. This new kernel attempts to gather as much detail about the crash as possible (including stack traces, various structures, and the kernel ring buffer from memory) and saves it in the /var/crash directory.

However, as the server is in an exceptional state when it attempts this process, production of the crashdump is not guaranteed.
 

BMC logs

Depending on your server hardware, the Baseboard Management Controller (BMC) might gather information about system and network watchdogs, error logs, and sensor data into logs that you can access and use to diagnose potential hardware and firmware issues.
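On servers whose BMC supports IPMI, one possible way to collect these logs is the ipmitool utility. This article does not name a tool, so the use of ipmitool, the BMC address, the user name, and the output directory below are all assumptions. A minimal sketch, run from a separate workstation:

```shell
# Sketch: save the BMC event log and sensor readings over IPMI-over-LAN.
# Assumptions: the BMC supports IPMI 2.0 (lanplus), ipmitool is installed,
# and the BMC password is exported in the IPMI_PASSWORD environment variable.
collect_bmc_logs() {
    bmc_host="$1"   # BMC address (placeholder)
    bmc_user="$2"   # BMC user name (placeholder)
    out_dir="$3"    # local directory for the saved logs

    mkdir -p "$out_dir" || return 1
    # -E makes ipmitool read the password from IPMI_PASSWORD.
    ipmitool -I lanplus -H "$bmc_host" -U "$bmc_user" -E sel elist \
        > "$out_dir/bmc-sel.txt" || return 1
    ipmitool -I lanplus -H "$bmc_host" -U "$bmc_user" -E sdr elist \
        > "$out_dir/bmc-sensors.txt" || return 1
    echo "$out_dir"
}
```

For example, collect_bmc_logs 192.0.2.10 admin ./bmc-logs writes the two files into ./bmc-logs. Many vendors also provide their own BMC web interface or tooling, which might expose more detail than IPMI does.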

Requirements

  • Citrix Hypervisor server or XenServer host that is configured to use a serial console (any supported version). For more information, see CTX121442 - How to Configure Serial Console Access on XenServer.

  • Serial console access. This must be a real serial console, not one emulated by the CPU running Xen. However, a serial console emulated by the BMC (for example, serial over LAN or the various other protocols) can be used for this procedure.
    If you need to go as far as step 2, your serial console access must either be direct or use a different network from the one that carries SSH or XenCenter traffic.


Instructions

1. Ping the server

If the Citrix Hypervisor server is not responsive to XenCenter, the server might still be running even though its management functionality has failed.

  1. Ping the server.
    • If the server ping receives no response, move on to the next section in this article.
  2. SSH into the Citrix Hypervisor server.
    • If you are running your Citrix Hypervisor server in Common Criteria mode or have SSH access explicitly disabled, you cannot complete this step and must move on to the next.
  3. If your password is rejected or the login attempt times out, try to log in as a local root user.
    • If you can log in as local root, but not with your user name, there might be issues with external AD authentication.
  4. Gather bugtool information to diagnose why the server is not communicating with XenCenter. In the dom0 console, run the following command:
    xen-bugtool --yestoall
    This command creates a status report file on the Citrix Hypervisor server and prints the location of that file.
  5. Copy your status report file from the Citrix Hypervisor server to another system to retain it.

If you are successful in this step, you do not need to progress to the next step.
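The checks in this section can be sketched as a small script run from a workstation that can reach the server. The host name and destination directory are placeholders, and the pattern used to find the report path in the xen-bugtool output is an assumption (a path ending in .tar.bz2), not something this article specifies:

```shell
# Sketch of steps 1-5, run from a workstation that can reach the server.
# The host name is a placeholder; SSH asks for the local root password.

# Print how far the checks get: no-ping, no-ssh, or ssh-ok.
triage_host() {
    host="$1"
    if ! ping -c 3 "$host" > /dev/null 2>&1; then
        echo "no-ping"; return 1
    fi
    if ! ssh -o ConnectTimeout=10 "root@$host" true; then
        echo "no-ssh"; return 2
    fi
    echo "ssh-ok"
}

# Run xen-bugtool remotely and copy the status report to dest_dir.
# Assumption: the report path is the first token in the command output that
# starts with "/" and ends in ".tar.bz2"; adjust the pattern if yours differs.
fetch_bugtool() {
    host="$1"; dest_dir="$2"
    report=$(ssh "root@$host" xen-bugtool --yestoall |
        grep -o '/[^ ]*\.tar\.bz2' | head -n 1)
    [ -n "$report" ] || return 1
    scp "root@$host:$report" "$dest_dir/"
}
```

A result of no-ping or no-ssh from triage_host corresponds to moving on to the next section. If SSH is disabled or the server runs in Common Criteria mode, the SSH check fails in the same way.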

2. Log in to the dom0 console through the serial console

If you are unable to ping or SSH to the Citrix Hypervisor server, the issue might be with either your network infrastructure or the networking stack on the Citrix Hypervisor server.

  1. If possible, check your network infrastructure before continuing diagnostic steps in Citrix Hypervisor.
  2. Connect to the Citrix Hypervisor server through the serial console.
    • If a login prompt is not available at the serial console, move on to the next section in this article.
  3. Log in to the dom0 console with the local root user and password.
    • If the login prompt does not respond to your login attempt, move on to the next section in this article.
  4. Gather bugtool information to diagnose any networking issues with the Citrix Hypervisor server. In the dom0 console, run the following command:
    xen-bugtool --yestoall
    This command creates a status report file on the Citrix Hypervisor server and prints the location of that file.
  5. You can copy your status report file from the Citrix Hypervisor server to a physical USB drive to retain it.

  6. Restart the Citrix Hypervisor server to see if this restores networking capability.

If you are successful in this step, you do not need to progress to the next step.
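Copying the status report to a USB drive from the dom0 console might look like the following sketch. The device name and mount point are assumptions that you pass in; confirm which device the drive appears as before mounting it, and note that the drive must hold a filesystem that dom0 can mount:

```shell
# Sketch of step 5: copy a status report to a USB drive from the dom0 console.
# Assumptions: the drive's partition appears as the device you pass in (for
# example /dev/sdb1, a placeholder) and the mount point directory is unused.
save_report_to_usb() {
    report="$1"      # path printed by xen-bugtool
    usb_dev="$2"     # USB partition device (placeholder)
    mnt="$3"         # empty directory to use as the mount point

    mkdir -p "$mnt" || return 1
    mount "$usb_dev" "$mnt" || return 1
    # Flush writes to the drive before it is unmounted.
    cp "$report" "$mnt/" && sync
    rc=$?
    umount "$mnt"
    return $rc
}
```

The function returns a non-zero status if the mount or the copy fails, so you can check the result before you restart the server in step 6.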

3. Use a physical or virtual NMI button

In response to an NMI, Citrix Hypervisor attempts to crash and gather crashdump information. However, as the server is in an exceptional state, production of the crashdump is not guaranteed.

  1. Use a physical or virtual NMI button to trigger a crash.
    • If the NMI button does not trigger a crash or is not available, move on to the next section in this article.
  2. Wait until the Citrix Hypervisor server restarts, then check to see if XenCenter can now connect to the server.
    • If you can access the server through XenCenter, it shows if a new crash dump is available.
    • Alternatively, you can log on to the dom0 console and check for the new crash dump on the server:

      [root@xhost ~]# xe host-crashdump-list
      
      uuid ( RO)          : 82e7e92a-d0b3-28e8-4cd8-3a8aea131775
      timestamp ( RO): 20090323T15:05:28Z
      size ( RO): 16874464
      [root@xhost ~]# ls /var/crash/
      20090323-145416-GMT 
  3. Copy the crashdump information from the Citrix Hypervisor server to another system to retain it.
  4. If available, gather BMC firmware log information. 

If you are successful in this step, you do not need to progress to the next step.
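Where the BMC supports IPMI, a virtual NMI can often be sent with ipmitool instead of a physical button. This is an assumption about your hardware rather than something this article prescribes; the addresses, user name, and retry values below are placeholders. A minimal sketch, run from a workstation:

```shell
# Sketch: send an NMI through an IPMI-capable BMC, then wait for the server
# to answer ping again. Assumptions: ipmitool is installed on a workstation,
# the BMC password is in IPMI_PASSWORD, and all addresses are placeholders.
send_nmi() {
    bmc_host="$1"; bmc_user="$2"
    # "chassis power diag" pulses a diagnostic interrupt (NMI) to the CPUs.
    ipmitool -I lanplus -H "$bmc_host" -U "$bmc_user" -E chassis power diag
}

# Poll with ping until the host responds or max_tries attempts have failed.
wait_for_host() {
    host="$1"; max_tries="$2"; delay="${3:-30}"
    tries=0
    while [ "$tries" -lt "$max_tries" ]; do
        if ping -c 1 "$host" > /dev/null 2>&1; then
            return 0
        fi
        tries=$((tries + 1))
        sleep "$delay"
    done
    return 1
}
```

For example, send_nmi 192.0.2.10 admin followed by wait_for_host xs1.example.com 20 waits up to about ten minutes for the server to return. A successful ping only shows that the server restarted; it does not confirm that a crashdump was written.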

4. Access the Xen hypervisor console through the serial console

If the NMI button doesn't work, you can attempt to trigger a crash from the Xen hypervisor console. This is a best-effort process for creating a crashdump. However, as the server is in an exceptional state, production of the crashdump is not guaranteed.

  1. Connect to the Citrix Hypervisor server through the serial console.

  2. Press Ctrl + A three times. This shortcut switches you to the Xen hypervisor menu.

    • If the serial console does not respond to Ctrl + A, move on to the next section in this article.
  3. Press h to display all available operations.

  4. Press Shift + C to trigger a crashdump.

  5. Wait until the Citrix Hypervisor server restarts, then check to see if XenCenter can now connect to the server.

    • If you can access the server through XenCenter, it shows if a new crash dump is available.
    • Alternatively, you can log on to the dom0 console and check for the new crash dump on the server:

      [root@xhost ~]# xe host-crashdump-list
      
      uuid ( RO)          : 82e7e92a-d0b3-28e8-4cd8-3a8aea131775
      timestamp ( RO): 20090323T15:05:28Z
      size ( RO): 16874464
      [root@xhost ~]# ls /var/crash/
      20090323-145416-GMT 
  6. Copy the crashdump information from the Citrix Hypervisor server to another system to retain it.

  7. If available, gather BMC firmware log information. 

If you are successful in this step, you do not need to progress to the next step.
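Packaging the crashdump for copying to another system might be sketched as follows, run in the dom0 console after the server restarts. The /var/crash location comes from this article; the archive name and the assumption that every entry in the directory has a timestamped name like the one shown above are illustrative:

```shell
# Sketch of steps 5-6, run in the dom0 console after the server restarts.

# Print the name of the most recent entry in the crash directory.
# Relies on the timestamped names (for example 20090323-145416-GMT)
# sorting chronologically; other files in the directory would break this.
latest_crash() {
    crash_dir="${1:-/var/crash}"
    ls -1 "$crash_dir" 2> /dev/null | sort | tail -n 1
}

# Pack one crash entry into a compressed archive for copying elsewhere.
archive_crash() {
    crash_dir="$1"; name="$2"; archive="$3"
    tar -czf "$archive" -C "$crash_dir" "$name"
}
```

For example, name=$(latest_crash) followed by archive_crash /var/crash "$name" "/tmp/$name.tar.gz" produces a single file that you can then copy off the server with scp or to a USB drive.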

5. Power cycle the server

If none of the previous steps has succeeded in gathering diagnostics and recovering the server, you must power the server off and on to attempt to return it to a working state. This step does not produce a crashdump.

  1. Power the server off then on again.
  2. If available, gather BMC firmware log information. 
  3. Gather any other available diagnostics and log data.

 

(Optional) Gathering more information on reproduction

If you are able to reproduce the issue that caused the server to become unresponsive, you can edit the file syslog.conf to log more information. However, we advise that you gather as much information as possible from the unresponsive server before rebooting it to edit syslog.conf and attempting to reproduce the issue.

  1. Edit the file /etc/syslog.conf and remove the comment symbol (#) from the following line:
    kern.*                         /dev/console

  2. Run the following command to restart the syslog daemon. Depending on your version of Citrix Hypervisor or XenServer the command is one of the following:
    • [root@rxen1 ~]# service syslog restart
    • [root@rxen1 ~]# service rsyslog restart

      The command gives the following output:
      Shutting down kernel logger:                      [  OK  ]
      Shutting down system logger:                      [  OK  ]
      Starting system logger:                           [  OK  ]
      Starting kernel logger:                           [  OK  ]
  3. Restart the Citrix Hypervisor server and start it from the xe-serial configuration.

  4. After the boot: prompt appears, type xe-serial (or choose the "serial" option) and restart the server.
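Step 1 of this procedure can also be done non-interactively. The following sketch uncomments the kern.* line with sed and keeps a backup copy; the function name and the .bak suffix are illustrative choices, and the pattern assumes the line is commented out with a single leading # as described in step 1:

```shell
# Sketch of step 1: uncomment the kern.* console line in a syslog config.
# A backup of the original is kept next to it with a .bak suffix.
enable_kernel_console_log() {
    conf="${1:-/etc/syslog.conf}"
    # Strip a leading "#" (and any spaces after it) from the kern.* line only.
    sed -i.bak \
        's|^#[[:space:]]*\(kern\.\*[[:space:]][[:space:]]*/dev/console\)|\1|' \
        "$conf"
}
```

After running enable_kernel_console_log, restart the syslog daemon as described in step 2 so that the change takes effect.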

Issue/Introduction

This article describes how to generate a crashdump from an unresponsive XenServer.

Additional Information

CTX121442 - How to Configure Serial Console Access on XenServer

CTX125372 - How to Collect Diagnostic Information for XenServer