Summary
This document describes how to troubleshoot NetScaler High Availability issues.
Procedure
When NetScaler systems are configured for High Availability (HA) mode, consider the following points when troubleshooting issues.
1. It is not recommended to run an HA pair with different versions of operating system code and builds. The NetScaler device with the lower version of code will become Primary.
For example, suppose HA is set up between “NS1” and “Faulty-NS2” (version 5.2 build 50.16), and a new device “NewNS2” (version 5.2 build 50.9) is issued as a replacement for the faulty “Faulty-NS2.” Keep in mind that “NewNS2” has no configuration. Once “NewNS2” is installed, it will take over as Primary (because of the code version mismatch between build 50.16 and build 50.9) and erase the configuration on the existing Primary “NS1.”
2. You can retrieve the original Primary device configuration from a backup copy present on the system disk. The system saves the last four copies of the ns.conf file in the /nsconfig directory (for NetScaler version 6.X). These are named ns.conf.0, ns.conf.1, and so on. The ns.conf.0 file contains the latest configuration. To retrieve the system’s configuration, proceed as follows:
a. Exit from the CLI to the FreeBSD shell by entering this CLI command: > shell
b. Enter the following command to determine the name of the latest backup copy. The listing is sorted by timestamp, newest first; in this example ns.conf.0 has the latest timestamp and is therefore the latest backup:
root@ns# ls -lt /nsconfig/ns.conf.?
-rw------- 1 root wheel 4671 Feb 28 20:54 /nsconfig/ns.conf.0
-rw------- 1 root wheel 4671 Feb 28 20:43 /nsconfig/ns.conf.3
-rw------- 1 root wheel 4671 Feb 28 20:42 /nsconfig/ns.conf.2
-rw------- 1 root wheel 4671 Feb 28 20:41 /nsconfig/ns.conf.1
-rw------- 1 root wheel 4671 Feb 28 20:40 /nsconfig/ns.conf.4
root@ns#
c. Copy the latest backup file to /nsconfig/copyns.conf by entering the command: root@ns# cp /nsconfig/ns.conf.0 /nsconfig/copyns.conf
d. Use the batch command to read the contents of the copyns.conf file and execute each line as a separate CLI command. Lines starting with # are considered comments. Execute the batch command from the NetScaler CLI by typing the following:
root@ns# exit
> batch -f /nsconfig/copyns.conf
> save config (this writes the config from running memory to /nsconfig/ns.conf)
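Sub-steps b and c can be combined in a small shell helper. This is a minimal sketch, not a supported procedure: the function name latest_backup and its optional directory argument are illustrative and not part of NetScaler. Only the /nsconfig/ns.conf.? naming comes from the steps above.

```shell
# Hypothetical helper: print the most recently modified ns.conf.N backup.
# The directory defaults to /nsconfig, as described in step 2.
latest_backup() {
    dir="${1:-/nsconfig}"
    # ls -t sorts newest first; head keeps only the first entry.
    # If no backup exists, ls fails quietly and nothing is printed.
    ls -t "$dir"/ns.conf.? 2>/dev/null | head -n 1
}
```

From the root@ns# prompt, the copy in sub-step c could then be written as cp "$(latest_backup)" /nsconfig/copyns.conf. Check that the helper printed a file name before copying, then continue with sub-step d.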
3. In HA mode, the nsroot password must be the same on the Primary and Secondary NetScaler systems. When the password of the nsroot user account is changed on either system, the same change must also be made on the peer so that the passwords stay synchronized.
4. The configuration file (ns.conf) on the Primary NetScaler system and the configuration file (ns.conf) on the Secondary NetScaler system will be synchronized with the following exceptions:
5. Depending on the deployment, both units in an HA setup may need a common set of configuration or certificate files. If so, specific files may need to be manually synchronized (present in the same location on both nodes of the HA pair). For example, if SSL offload is enabled, the SSL certificates must be copied to the same location (directory) on both NetScaler units. Similar examples include vsr.html (for Sure Connect), any manually customized files, and any batch files containing configuration commands. You can use “scp” for secure file transfer. Starting with release 7.0, a command was introduced that automatically copies all of the above-mentioned files. The following commands can be entered on the Primary to synchronize files to the Secondary:
• sync files (release 7.0)
• sync HA files (release 8.0)
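Where the sync commands are not available, the manual scp approach can be sketched as follows. This is only an illustration: the helper name print_sync_commands, the peer address 192.0.2.2, the nsroot login, and the file names under /nsconfig/ssl are all placeholders, not values from this article. The helper prints the commands instead of running them, so they can be reviewed first.

```shell
# Hypothetical helper: print one scp command per file so that each file
# lands in the same directory on the peer node. Nothing is copied here.
print_sync_commands() {
    peer="$1"; shift
    for f in "$@"; do
        # dirname keeps the destination directory identical to the source.
        echo "scp $f nsroot@$peer:$(dirname "$f")/"
    done
}

# Example with placeholder values (peer IP and file names are assumptions).
print_sync_commands 192.0.2.2 /nsconfig/ssl/example.cert /nsconfig/ssl/example.key
```

Each printed line copies one file to the same directory on the peer, which is the requirement stated in this step.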
6. If you are running GSLB, configure the RPC node passwords on the HA systems for added security; otherwise the default password remains in effect. Initially, all NetScaler systems are configured with the same default RPC node password.
7. Check the state of all interfaces. On new NetScaler devices, all interfaces are enabled by default. Ensure that only the interfaces in use are enabled. Use the > disable interface <ifnum> command to disable any unused interfaces so that failover can occur when needed. For interfaces whose failure should not cause a failover in HA mode, disable monitoring by typing the following command in the CLI:
set interface <ifnum> -hamonitor OFF
Where ifnum is the interface number of the NetScaler.
Note: Repeat this step for each system interface that will be used and whose failure should not cause failover, for example a management-only interface.
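When several interfaces need the same setting, the repeated command can be generated rather than typed by hand. The sketch below is illustrative only: the helper name print_hamonitor_off and the interface numbers 0/1 and 1/4 are placeholders. It prints the CLI command shown above once per interface and changes nothing on the system.

```shell
# Hypothetical helper: print the CLI command from step 7 for each interface.
print_hamonitor_off() {
    for ifnum in "$@"; do
        echo "set interface $ifnum -hamonitor OFF"
    done
}

# Placeholder interface numbers; substitute the ones on your system.
print_hamonitor_off 0/1 1/4
```

The printed lines can be pasted into the NetScaler CLI, or saved to a file and run with the batch command described in step 2.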
8. Ensure the MIP is the same on both the Primary and the Secondary NetScaler systems.
9. If you are experiencing synchronization issues, check the following:
a. Check if the Primary and Secondary nodes can see each other. Management and heartbeat messages are sent via L2. L2 connectivity between the two HA nodes must allow the heartbeat to be received within 3 seconds.
b. Ensure any configured ACLs permit communication between the pair.
c. Check inetd.conf file to ensure the /netscaler/nsnetsvc process is not commented out. Enter the following command:
root@ns# more /etc/inetd.conf
#Netscaler /etc/inetd.conf
#
# This file is present in the memory filesystem by default, and any changes
# to this file will be lost following a reboot. If changes to this file
# require persistence between reboots, copy this file to the /nsconfig directory
# and make the required changes to that file.
#
# Warning: This method of altering available network services may not be
# supported in the future.
#
# The FTP and Telnet protocols are insecure. Consider using ssh,scp, or sftp.
#ftp stream tcp nowait root /usr/libexec/ftpd ftpd -l
#telnet stream tcp nowait root /usr/libexec/telnetd telnetd
# NetScaler network service
# Warning: Do not remove this line
nsnetsvc stream tcp nowait root /netscaler/nsnetsvc nsnetsvc
root@ns#
d. Ensure the nsnetsvc stream tcp nowait root /netscaler/nsnetsvc nsnetsvc line is not commented out.
e. Check the /tmp/ns_com_cfg.conf file on the Secondary system and its permissions by typing the following command:
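Separately from sub-step e, the inetd.conf check in sub-steps c and d can be done without reading the file by eye. The following is a minimal sketch; the function name nsnetsvc_enabled is illustrative and not a NetScaler utility, and the file path defaults to the /etc/inetd.conf location shown above.

```shell
# Hypothetical helper: succeed only if an uncommented nsnetsvc entry exists.
nsnetsvc_enabled() {
    file="${1:-/etc/inetd.conf}"
    # Match lines that begin with "nsnetsvc" (optionally indented);
    # lines beginning with "#" are comments and do not match.
    grep -Eq '^[[:space:]]*nsnetsvc[[:space:]]' "$file"
}
```

From the root@ns# prompt, a call such as nsnetsvc_enabled && echo "nsnetsvc entry is active" reports whether the line is present and not commented out.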
10. If the NetScaler systems are failing over unexpectedly, run the nsconmsg -d event command from the shell (root@ns# prompt) to view current events that may be causing the failover. Possible causes can include:
11. If command propagation failures occur, this can be a result of connectivity issues, duplex mismatches, packet drops, or the /netscaler/nsnetsvc process not running (outlined in step 9).
12. Do not connect two devices by a cross-over cable, because this may cause a bridging loop.
13. The Secondary NetScaler device will drop all traffic except for HA management and heartbeat messages. Evidence of this can be seen in the constantly increasing packet drops that the Secondary device displays on all ports. This is expected behavior.