[ADC] HA config sync fails with error "command failed on secondary node but succeeded on primary node"

Article ID: CTX261775

Description

1: HA sync is failing between the NetScaler devices. Running a force sync returns the error:
"command failed on secondary node but succeeded on primary node"
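
The failing step can be reproduced from the primary's CLI (a sketch; the prompt and exact output formatting vary by release):

```
> force ha sync
command failed on secondary node but succeeded on primary node
```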

2: exec: show HA node
1)      Node ID:      0
        IP:    10.10.10.12
        Node State: UP
        Master State: Primary
        Fail-Safe Mode: OFF
        INC State: DISABLED
        Sync State: ENABLED
        <snip>
2)      Node ID:      1
        IP:    10.10.10.51
        Node State: UP
        Master State: Secondary
        Fail-Safe Mode: OFF
        INC State: DISABLED
        Sync State: FAILED          <<<<<<<<<<<<< look here
        <snip>

3: Comparing the root directory listings on the primary and secondary shows a large difference between the two nodes:

Secondary:
 more ../../shell/ls_lRtrp.out
total 17079
drwxr-xr-x    2 root  wheel   512B Aug 25  2018 mnt/
drwxr-xr-x    2 root  wheel   512B Aug 25  2018 home/
drwxrwxr-x    2 root  5       512B Aug 25  2018 .snap/
drwxr-xr-x    2 root  wheel   1.0k Aug 25  2018 bin/
drwxr-xr-x    2 root  wheel   2.0k Aug 25  2018 sbin/
drwxr-xr-x    2 root  wheel   512B Aug 25  2018 libexec/
drwxr-xr-x    3 root  wheel   2.0k Aug 25  2018 lib/
drwxr-xr-x    3 root  wheel   512B Aug 25  2018 compat/
drwxr-xr-x   10 root  wheel   512B Aug 25  2018 usr/
drwxr-xr-x   14 root  wheel   5.0k Aug 25  2018 netscaler/
drwxr-xr-x   38 root  wheel   1.0k Oct  4  2018 var/
dr-xr-xr-x    7 root  wheel   512B Jul 27 22:05 dev/
lrwxr-xr-x    1 root  wheel    16B Jul 27 22:05 nsconfig -> /flash/nsconfig/
drwxr-xr-x  260 root  wheel   3.5k Jul 27 22:05 nscache/
lrwxr-xr-x    1 root  wheel    33B Jul 27 22:05 optional -> /netscaler/portal/themes/optional
lrwxr-xr-x    1 root  wheel    15B Jul 27 22:05 configdb -> /flash/configdb
lrwxr-xr-x    1 root  wheel    33B Jul 27 22:05 colorful -> /netscaler/portal/themes/colorful
drwxr-xr-x    4 root  wheel   512B Aug  8 01:44 root/
drwxr-xr-x    6 root  wheel   512B Aug 12 11:40 flash/
drwxr-xr-x    9 root  wheel   1.5k Aug 13 06:19 etc/
-rw-r--r--    1 root  wheel    16M Aug 13 06:32 (null)  <<<<<<<<< 16M here seems to be our problem / the difference between the two nodes.
drwxrwxrwt    3 root  wheel   1.5k Aug 13 06:32 tmp/
dr-xr-xr-x    1 root  wheel     0B Aug 13 06:32 proc/


Primary:
/shell/ls_lRtrp.out
total 41
drwxr-xr-x    2 root  wheel   512B Aug 25  2018 mnt/
drwxr-xr-x    2 root  wheel   512B Aug 25  2018 home/
drwxrwxr-x    2 root  5       512B Aug 25  2018 .snap/
drwxr-xr-x    2 root  wheel   1.0k Aug 25  2018 bin/
drwxr-xr-x    2 root  wheel   2.0k Aug 25  2018 sbin/
drwxr-xr-x    2 root  wheel   512B Aug 25  2018 libexec/
drwxr-xr-x    3 root  wheel   2.0k Aug 25  2018 lib/
drwxr-xr-x    3 root  wheel   512B Aug 25  2018 compat/
drwxr-xr-x   10 root  wheel   512B Aug 25  2018 usr/
drwxr-xr-x   14 root  wheel   5.0k Aug 25  2018 netscaler/
drwxr-xr-x   38 root  wheel   1.0k Oct  4  2018 var/
dr-xr-xr-x    7 root  wheel   512B Jul 27 22:12 dev/
lrwxr-xr-x    1 root  wheel    16B Jul 27 22:12 nsconfig -> /flash/nsconfig/
drwxr-xr-x  260 root  wheel   3.5k Jul 27 22:12 nscache/
lrwxr-xr-x    1 root  wheel    33B Jul 27 22:12 optional -> /netscaler/portal/themes/optional
lrwxr-xr-x    1 root  wheel    15B Jul 27 22:12 configdb -> /flash/configdb
lrwxr-xr-x    1 root  wheel    33B Jul 27 22:12 colorful -> /netscaler/portal/themes/colorful
-rw-r--r--    1 root  wheel     2k Jul 27 22:13 (null)                                                <<<<<<<<<<<<<look here
drwxr-xr-x    4 root  wheel   512B Aug  8 01:45 root/
drwxr-xr-x    6 root  wheel   512B Aug 12 11:40 flash/
drwxr-xr-x    9 root  wheel   1.5k Aug 13 06:30 etc/
drwxrwxrwt    3 root  wheel   1.5k Aug 13 06:34 tmp/
dr-xr-xr-x    1 root  wheel     0B Aug 13 06:34 proc/
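
The "filesystem full" condition can also be confirmed directly from the appliance shell. A minimal sketch using standard tools (run on the affected node; the 16M stray file above would surface at the top of a size-sorted listing):

```shell
# Check usage of the root filesystem (on NetScaler this is the ramdisk, md0);
# 100% use here matches the kernel "filesystem full" messages below.
df -h /

# List the largest entries at / to locate the oversized "(null)" file.
ls -lhS / | head -n 5
```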


4: The 'messages' file contains a huge number of the following log entries:

 Aug 13 06:32:40 <local0.alert> 10.10.140.51 08/13/2019:06:32:40 GMT VMP01-HS 0-PPE-0 : default EVENT STATECHANGE 202 0 :  Device "self node 10.10.140.51" - State "SYNC Failure - Save remote config failed"
Aug 13 06:32:41 <local0.err> VMP01-HS nssync: NSAPI_AUTOSYNC_FILES issued failed: Synchronization failed, please try again
Aug 13 06:32:45 <kern.err> VMP01-HS kernel: pid 96836 (rsync), uid 0 inumber 2517 on /: filesystem full
Aug 13 06:32:46 <kern.err> VMP01-HS kernel: pid 96849 (nscli), uid 0 inumber 42200 on /: filesystem full
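
The volume of these entries can be gauged with a quick grep. A sketch using a small sample file in place of the real log (on the appliance, point the grep at the actual messages file instead):

```shell
# Write two sample lines standing in for the messages log.
cat > /tmp/messages.sample <<'EOF'
Aug 13 06:32:41 <local0.err> VMP01-HS nssync: NSAPI_AUTOSYNC_FILES issued failed: Synchronization failed, please try again
Aug 13 06:32:45 <kern.err> VMP01-HS kernel: pid 96836 (rsync), uid 0 inumber 2517 on /: filesystem full
EOF

# Count occurrences of each failure signature.
grep -c 'filesystem full' /tmp/messages.sample        # prints 1 for this sample
grep -c 'Synchronization failed' /tmp/messages.sample # prints 1 for this sample
```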


So rsync was failing, and the growing (null) file was filling the ramdisk.
Checking the auth.log on the primary indicated that nsroot was unable to log in.
 

Resolution


- Reboot the secondary node only. This clears the entire ramdisk, removing the stray (null) file.
- Once the file is removed, enable the internal user on both nodes:
        set ns param -internaluserlogin ENABLED
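
After the reboot and parameter change, the sync can be re-triggered and verified from the primary's CLI (a sketch; on a healthy pair the secondary's Sync State no longer shows FAILED):

```
> force ha sync
> show ha node
```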

 

Problem Cause


Logs indicate that sync is failing due to a disk usage issue on the ramdisk (md0). Because the internal system user is unable to log in, saving the remote configuration fails, which in turn causes the sync failures on the secondary appliance.