Troubleshooting NetScaler High Availability (HA) Issues

Article ID: CTX109013

Description

When troubleshooting issues related to the high availability feature of a NetScaler appliance, consider the following points:

Avoid NetScaler software mismatch

Citrix recommends that you do not install different NetScaler software releases or builds on the appliances that form the high availability setup. Such a mismatch can lead to undesired behavior, such as unexpected failovers and missing configuration.

If a mismatch cannot be avoided, you can prevent unwanted failovers by using the stay primary and stay secondary commands on the respective appliances. For more information, refer to Citrix Documentation - Forcing the Primary Node to Stay Primary and Forcing the Secondary Node to Stay Secondary.
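
As a minimal sketch, the stay primary and stay secondary states correspond to values of the -hastatus parameter of the set ha node CLI command. The shell helper below only prints the CLI command for a given role so that it can be reviewed before it is run on the appliance. The helper name ha_pin_cmd is hypothetical and is not part of NetScaler.

```shell
#!/bin/sh
# ha_pin_cmd (hypothetical helper): print the NetScaler CLI command that pins
# or releases the HA role of the local node. It only prints the command and
# does not contact any appliance.
ha_pin_cmd() {
    case "$1" in
        primary)   echo "set ha node -hastatus STAYPRIMARY" ;;
        secondary) echo "set ha node -hastatus STAYSECONDARY" ;;
        release)   echo "set ha node -hastatus ENABLED" ;;
        *)         echo "usage: ha_pin_cmd primary|secondary|release" >&2
                   return 1 ;;
    esac
}

ha_pin_cmd primary
ha_pin_cmd secondary
```

Run the printed command in the CLI of the respective appliance, and run the release variant on both appliances once they are on the same build so that normal failover is possible again.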

Restore configuration from a backup

If necessary, you can retrieve the original primary appliance configuration from a backup copy on the hard disk of the appliance. The appliance saves the last five copies of the ns.conf file in the /nsconfig directory. These files are named ns.conf.0 through ns.conf.4, and ns.conf.0 contains the most recently saved configuration. For more information on the NetScaler configuration file, refer to Citrix Documentation.

To retrieve and update the configuration of the appliance from a backup, complete the following procedure:

  1. Log on to the NetScaler CLI and run the following command to switch to the shell prompt of the NetScaler appliance:
    > shell

  2. Run the following command to determine the name of the latest backup copy of the ns.conf file:
    root@ns# ls -ltr /nsconfig/ns.conf.?
    -rw------- 1 root wheel 4671 Feb 28 20:54 /nsconfig/ns.conf.0
    -rw------- 1 root wheel 4671 Feb 28 20:42 /nsconfig/ns.conf.2
    -rw------- 1 root wheel 4671 Feb 28 20:41 /nsconfig/ns.conf.1
    -rw------- 1 root wheel 4671 Feb 28 20:40 /nsconfig/ns.conf.4
    root@ns#
    In the output of the preceding command, notice the date and time stamp of each ns.conf file.

  3. Run the following command to make a copy of the latest ns.conf backup file:
    root@ns# cp /nsconfig/ns.conf.0 /nsconfig/copyns.conf

  4. Run the following command to return to the command line interface of the appliance:
    root@ns# exit

  5. Run the following command to read the contents of the copyns.conf file and run each line as an individual command:
    > batch -filename /nsconfig/copyns.conf

  6. Run the following command to save the running configuration to the /nsconfig/ns.conf file:
    > save config

The nsroot password should match on NetScaler appliances

The nsroot password must be the same on the primary and secondary NetScaler appliances in the high availability setup. When you change the password of the nsroot user account on either appliance, make the same change on the peer appliance so that the passwords remain synchronized between the appliances.

For more information refer to Citrix Documentation - Resetting the Default Administrator (nsroot) Password.


Troubleshooting synchronization issues

Synchronization is a process of duplicating the configuration of the primary node on the secondary node. The purpose of synchronization is to ensure that there is no loss of configuration information between the primary and the secondary nodes, regardless of the number of failovers that occur. Synchronization uses port 3010.

The ha_err_sync_failure counter increments each time a NetScaler high availability synchronization failure is detected. It tracks the number of times the primary and secondary appliances failed to synchronize the configuration since the last transition. A synchronization failure results in a mismatched configuration. For a complete list of NetScaler high availability counters, refer to CTX131802 - NetScaler High Availability Counters.

If there are synchronization issues, verify the following information:

  • Run the sync ha files all command twice and examine the results. An issue can occur when the file synchronization does not finish within a minute or when you run the sync ha files command manually at the same time as the periodic synchronization. To confirm this issue, stop the periodic synchronization by commenting out the appropriate line in the crontab file.
  • Ensure that the primary and secondary appliances can communicate with each other. The management and heartbeat packets are sent at Layer 2. The Layer 2 connectivity between the two appliances in the high availability setup must allow the heartbeat packets to be received within 3 seconds on port 3003.
  • Verify that port 22 is not blocked between the primary and the secondary appliance by ACL or firewall policies. Port 22 is used by the rsync process. For detailed information on all ports that should be open, refer to CTX101810 - Communication Ports Used by Citrix Technologies.
  • Ensure that any configured ACLs permit communication between the pair.
  • Ensure that the nsconf, nsfsyncd, and nssync processes are running.
  • Ensure that SSL files are not missing on the secondary appliance.
  • Ensure that there is no temporary loss of network connectivity between the primary and secondary appliances.
  • Run the following command to verify that the nsnetsvc process is running:
    root@GA-NS4# ps auxw | grep -i nsnetsvc | grep -v grep
    root 256 0.0 0.2 18568 5668 ?? Ss Wed05PM 0:14.33 /netscaler/nsnetsvc
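
The process checks in the list above can be combined into one pass over the ps output. The following is a minimal sketch, not a Citrix-provided tool. The helper name missing_procs is hypothetical, and the process names passed to it are the ones named in this article.

```shell
#!/bin/sh
# missing_procs (hypothetical helper): read `ps auxw` output on stdin and
# print each process name given as an argument that does not appear in it.
missing_procs() {
    # Read the whole listing first so that the grep calls below are not
    # part of the process list being searched.
    ps_out="$(cat)"
    for name in "$@"; do
        # -w matches the name as a whole word, e.g. in /netscaler/nsnetsvc.
        if ! printf '%s\n' "$ps_out" | grep -qw -- "$name"; then
            echo "$name"
        fi
    done
}

# On the appliance shell:
#   ps auxw | missing_procs nsfsyncd nsnetsvc
```

An empty result means that every named process was found in the listing. Any name that is printed should be investigated on that appliance.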
    

Complete the following procedures to resolve file synchronization problems by analyzing the network trace to verify the communication between the appliances:

  1. Log on to the NetScaler appliances by using an SSH utility, such as PuTTY, and specifying the NetScaler IP (NSIP) address. Use the nsroot credentials to log on to each appliance.
  2. Terminate the nsfsyncd process on both the primary and the secondary appliances and restart it with the following command. Ensure that the process is running on both appliances:
    /netscaler/nsfsyncd -d
  3. Ensure that Keep Alive is enabled in the TCP parameters on both appliances. If this parameter is disabled, the nsfsyncd process is occasionally terminated. Run the following command to verify whether Keep Alive is enabled:
    > show tcpparam | grep KA
  4. Comment out the nsfsyncd -p line in the /etc/crontab file. To do so, log on to the NetScaler appliance through an SFTP client, such as WinSCP, and edit the file with a text editor.
  5. Disable high availability synchronization on the secondary appliance:
    > set ha node -hasync DISABLED
    Note: Synchronization is disabled so that the file synchronization ioctl and its operations can be clearly identified.
  6. Run a trace on both appliances and then run the sync ha files all command locally on the secondary and the primary appliance.
    Note: You can use this trace to analyze or verify the communication between the appliances.
  7. After you collect the trace, uncomment the nsfsyncd -p line in the /etc/crontab file and re-enable high availability synchronization.
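
As an alternative to editing the file through an SFTP client, the crontab change in steps 4 and 7 can be previewed from the shell prompt with sed. This is a minimal sketch. The helper names are hypothetical, and the exact form of the nsfsyncd -p entry in /etc/crontab can differ between builds, so review the output before replacing the file.

```shell
#!/bin/sh
# comment_nsfsyncd (hypothetical helper): print the given crontab file with
# every "nsfsyncd -p" line that does not already start with "#" commented
# out. The file itself is not modified.
comment_nsfsyncd() {
    sed '/nsfsyncd -p/ s/^[^#]/#&/' "$1"
}

# uncomment_nsfsyncd (hypothetical helper): print the file with the leading
# "#" removed again from every "nsfsyncd -p" line.
uncomment_nsfsyncd() {
    sed '/nsfsyncd -p/ s/^#//' "$1"
}

# Example (writes the preview to a scratch file for review):
#   comment_nsfsyncd /etc/crontab > /var/tmp/crontab.preview
```

Printing to standard output instead of editing in place keeps the original /etc/crontab untouched until the change has been checked.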

For more information on troubleshooting this issue, refer to CTX138748 - File Synchronization in NetScaler High Availability Setup and Citrix Documentation - Configuring Synchronization.


Be aware of configuration file synchronization exceptions

The configuration file of the primary and secondary NetScaler appliances is synchronized with the following exceptions:

  • The primary and secondary NetScaler appliances must be configured with unique NetScaler IP (NSIP) addresses. Therefore, this information is not synchronized between the appliances. The information about the interfaces is also omitted.
  • On each NetScaler appliance, the other appliance is configured as the peer high availability node. The node ID and associated IP address must therefore reflect the node ID and IP address of the peer node. For example, NetScaler1 is configured with a unique node ID and the IP address of NetScaler2, and NetScaler2 is configured with a unique node ID and the IP address of NetScaler1.


Common configuration or certificate files on NetScalers

Depending on the deployment, both NetScaler appliances in the high availability setup might need a set of common configuration or certificate files. If so, these files must be present in the same location on both appliances and must be synchronized manually.
For example, if SSL offload is enabled, the SSL certificates must be copied to the same location on both appliances. Similarly, the vsr.html file for Sure Connect, any manually customized files, and any batch files containing configuration commands should be synchronized manually. You can use a secure file transfer utility, such as WinSCP, to transfer the files. You can also run the following command on the primary appliance to synchronize files with the secondary appliance:
sync HA files ALL

For all the options of the sync HA files command, refer to Citrix Documentation.


RPC node password in GSLB setup 

Ensure that the RPC node password is the same on both NetScaler appliances.
If you have configured Global Server Load Balancing (GSLB), configure the RPC node passwords on the high availability NetScaler appliances for additional security. Otherwise, the default password is used. Initially, all NetScaler appliances are configured with the same default RPC node password.

Note: In NetScaler 11.0, the hash value or encrypted string for the RPC node password looks different even when the passwords are configured to be the same. This is by design.

For more information refer to Citrix Documentation - Creating or Changing an RPC Node Password.


Check the state of all interfaces

On new NetScaler appliances, all interfaces are enabled by default. Ensure that only the interfaces in use are enabled. Run the disable interface <Interface_Number> command to disable any unused interface so that failover can occur when required.
Disable monitoring for the interfaces whose failure should not cause a failover in the high availability setup by running the following command from the command line interface of the appliance: set interface <Interface_Number> -hamonitor OFF.
Note: Repeat this step for each appliance interface that is used and whose failure should not cause a failover, such as the management-only interface.

For more information, refer to Citrix Documentation - Configuring Network Interfaces.
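
When several interfaces need the same change, the CLI commands can be generated rather than typed one by one. This is a minimal sketch. The helper name hamon_off_cmds is hypothetical, and the interface IDs in the example are placeholders that must be replaced with the interfaces of your appliance.

```shell
#!/bin/sh
# hamon_off_cmds (hypothetical helper): print one NetScaler CLI command per
# interface ID to turn off HA monitoring for that interface. It only prints
# the commands and does not contact any appliance.
hamon_off_cmds() {
    for ifnum in "$@"; do
        echo "set interface $ifnum -haMonitor OFF"
    done
}

# Example with placeholder interface IDs:
hamon_off_cmds 0/1 1/4
```

The printed lines can be pasted into the CLI of each appliance, or saved to a file and run with the batch command described earlier in this article.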


Ensure that the MIP address is the same on both the NetScaler appliances

Mapped IP addresses (MIP) are used for server-side connections. A MIP can be considered a default Subnet IP (SNIP) address, because MIPs are used when a SNIP is not available or Use SNIP (USNIP) mode is disabled.
For more information refer to Citrix Documentation - Configuring Mapped IP Addresses (MIPs).


NetScaler appliances failing over unexpectedly

If the NetScaler appliances are failing over unexpectedly, run the nsconmsg -d event command from the shell prompt to display the current events that might be causing the failover.

The following are the possible causes:

  • Interface is down.
  • SSL acceleration card is down.
  • System stopped responding.

NetScaler command propagation fails 

If command propagation fails, the cause can be connectivity issues, duplex mismatches, packet drops, or the /netscaler/nsnetsvc process not running.
For more information refer to Citrix Documentation - Configuring Command Propagation and Troubleshooting Command Propagation.

Do not use cross-over cables 

Do not connect the appliances with a cross-over cable, because it can result in a bridging loop.

Check for worn-out/damaged cables 

Examine the cables on both appliances and replace them if required. Worn-out or damaged cables can cause failovers.

HA MAC flaps on the interfaces that are part of the HA configuration 

You can examine the nic_cur_ha_MAC counter on the primary NetScaler in the HA pair to see which interface has the HA MAC address. The NetScaler sends an ARP request for the MAC address of the peer interface before sending heartbeats. In general, the MAC address of the 0/1 interface of the secondary should be learned by the 0/1 interface of the primary, and vice versa.

HA MAC flapping occurs when an interface MAC address of one appliance is learned on multiple interfaces of the other appliance in the HA pair.

For example, the 0/1 and 10/1 interfaces are in use on the primary (NS1) and secondary (NS2) NetScalers, where the 0/1 interface is used for management and the 10/1 interface is used for data traffic.

If NS1 learns the MAC address of the 0/1 interface of NS2, it sends all heartbeats only to the 0/1 interface of NS2. As a result, no heartbeats are seen on the 10/1 interface of NS2.

You can address such issues by separating the interfaces using VLANs. For example, use VLAN100 for the 0/1 interface and VLAN200 for the 10/1 interface. Make sure that you create L3 VLANs.

Incorrect VLAN Tagging

Ensure that the VLANs are tagged accurately. For more information refer to CTX122921 - Citrix NetScaler Interface Tagging and Flow of High Availability Packets and CTX214033 - NetScaler Networking and VLAN Best Practices.

Point to Note

  • The secondary NetScaler appliance drops all traffic except for the high availability management and heartbeat packets. As a result, the secondary appliance displays constantly increasing packet drops on all ports. This is the expected behavior of the secondary appliance.

Issue/Introduction

This article contains information about troubleshooting issues related to the high availability (HA) feature of NetScaler appliance.

Additional Information