[ADC] HA config sync fails with error "command failed on secondary node but succeeded on primary node"

Article ID: CTX261775

Description

1: HA sync is failing between the NetScaler devices. Running a force sync returns the error:
"command failed on secondary node but succeeded on primary node"
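
The failing step can be reproduced from the primary's CLI (a sketch; the prompt and exact output formatting vary by release):

```
> force ha sync
command failed on secondary node but succeeded on primary node
```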

2: exec: show HA node
1)      Node ID:      0
        IP:    10.10.10.12
        Node State: UP
        Master State: Primary
        Fail-Safe Mode: OFF
        INC State: DISABLED
        Sync State: ENABLED
        <snip>
2)      Node ID:      1
        IP:    10.10.10.51
        Node State: UP
        Master State: Secondary
        Fail-Safe Mode: OFF
        INC State: DISABLED
        Sync State: FAILED          <<<<<<<<<<<<< look here
        <snip>

3: Comparing the root directory listings on the primary and secondary shows a large difference between the two nodes:

Secondary:
 more ../../shell/ls_lRtrp.out
total 17079
drwxr-xr-x    2 root  wheel   512B Aug 25  2018 mnt/
drwxr-xr-x    2 root  wheel   512B Aug 25  2018 home/
drwxrwxr-x    2 root  5       512B Aug 25  2018 .snap/
drwxr-xr-x    2 root  wheel   1.0k Aug 25  2018 bin/
drwxr-xr-x    2 root  wheel   2.0k Aug 25  2018 sbin/
drwxr-xr-x    2 root  wheel   512B Aug 25  2018 libexec/
drwxr-xr-x    3 root  wheel   2.0k Aug 25  2018 lib/
drwxr-xr-x    3 root  wheel   512B Aug 25  2018 compat/
drwxr-xr-x   10 root  wheel   512B Aug 25  2018 usr/
drwxr-xr-x   14 root  wheel   5.0k Aug 25  2018 netscaler/
drwxr-xr-x   38 root  wheel   1.0k Oct  4  2018 var/
dr-xr-xr-x    7 root  wheel   512B Jul 27 22:05 dev/
lrwxr-xr-x    1 root  wheel    16B Jul 27 22:05 nsconfig -> /flash/nsconfig/
drwxr-xr-x  260 root  wheel   3.5k Jul 27 22:05 nscache/
lrwxr-xr-x    1 root  wheel    33B Jul 27 22:05 optional -> /netscaler/portal/themes/optional
lrwxr-xr-x    1 root  wheel    15B Jul 27 22:05 configdb -> /flash/configdb
lrwxr-xr-x    1 root  wheel    33B Jul 27 22:05 colorful -> /netscaler/portal/themes/colorful
drwxr-xr-x    4 root  wheel   512B Aug  8 01:44 root/
drwxr-xr-x    6 root  wheel   512B Aug 12 11:40 flash/
drwxr-xr-x    9 root  wheel   1.5k Aug 13 06:19 etc/
-rw-r--r--    1 root  wheel    16M Aug 13 06:32 (null)  <<<<<<<<< 16M here seems to be our problem / the difference between the two nodes.
drwxrwxrwt    3 root  wheel   1.5k Aug 13 06:32 tmp/
dr-xr-xr-x    1 root  wheel     0B Aug 13 06:32 proc/


Primary:
/shell/ls_lRtrp.out
total 41
drwxr-xr-x    2 root  wheel   512B Aug 25  2018 mnt/
drwxr-xr-x    2 root  wheel   512B Aug 25  2018 home/
drwxrwxr-x    2 root  5       512B Aug 25  2018 .snap/
drwxr-xr-x    2 root  wheel   1.0k Aug 25  2018 bin/
drwxr-xr-x    2 root  wheel   2.0k Aug 25  2018 sbin/
drwxr-xr-x    2 root  wheel   512B Aug 25  2018 libexec/
drwxr-xr-x    3 root  wheel   2.0k Aug 25  2018 lib/
drwxr-xr-x    3 root  wheel   512B Aug 25  2018 compat/
drwxr-xr-x   10 root  wheel   512B Aug 25  2018 usr/
drwxr-xr-x   14 root  wheel   5.0k Aug 25  2018 netscaler/
drwxr-xr-x   38 root  wheel   1.0k Oct  4  2018 var/
dr-xr-xr-x    7 root  wheel   512B Jul 27 22:12 dev/
lrwxr-xr-x    1 root  wheel    16B Jul 27 22:12 nsconfig -> /flash/nsconfig/
drwxr-xr-x  260 root  wheel   3.5k Jul 27 22:12 nscache/
lrwxr-xr-x    1 root  wheel    33B Jul 27 22:12 optional -> /netscaler/portal/themes/optional
lrwxr-xr-x    1 root  wheel    15B Jul 27 22:12 configdb -> /flash/configdb
lrwxr-xr-x    1 root  wheel    33B Jul 27 22:12 colorful -> /netscaler/portal/themes/colorful
-rw-r--r--    1 root  wheel     2k Jul 27 22:13 (null)                                                <<<<<<<<<<<<<look here
drwxr-xr-x    4 root  wheel   512B Aug  8 01:45 root/
drwxr-xr-x    6 root  wheel   512B Aug 12 11:40 flash/
drwxr-xr-x    9 root  wheel   1.5k Aug 13 06:30 etc/
drwxrwxrwt    3 root  wheel   1.5k Aug 13 06:34 tmp/
dr-xr-xr-x    1 root  wheel     0B Aug 13 06:34 proc/
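
The "filesystem full" condition can also be confirmed directly from the appliance shell. A minimal sketch using standard tools (run on the affected node; the 16M stray file above would surface at the top of a size-sorted listing):

```shell
# Check usage of the root filesystem (on NetScaler this is the ramdisk, md0);
# 100% use here matches the kernel "filesystem full" messages below.
df -h /

# List the largest entries at / to locate the oversized "(null)" file.
ls -lhS / | head -n 5
```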


4: The 'messages' file contains a huge number of the following log entries:

 Aug 13 06:32:40 <local0.alert> 10.10.140.51 08/13/2019:06:32:40 GMT VMP01-HS 0-PPE-0 : default EVENT STATECHANGE 202 0 :  Device "self node 10.10.140.51" - State "SYNC Failure - Save remote config failed"
Aug 13 06:32:41 <local0.err> VMP01-HS nssync: NSAPI_AUTOSYNC_FILES issued failed: Synchronization failed, please try again
Aug 13 06:32:45 <kern.err> VMP01-HS kernel: pid 96836 (rsync), uid 0 inumber 2517 on /: filesystem full
Aug 13 06:32:46 <kern.err> VMP01-HS kernel: pid 96849 (nscli), uid 0 inumber 42200 on /: filesystem full
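
The volume of these entries can be gauged with a quick grep. A sketch using a small sample file in place of the real log (on the appliance, point the grep at the actual messages file instead):

```shell
# Write two sample lines standing in for the messages log.
cat > /tmp/messages.sample <<'EOF'
Aug 13 06:32:41 <local0.err> VMP01-HS nssync: NSAPI_AUTOSYNC_FILES issued failed: Synchronization failed, please try again
Aug 13 06:32:45 <kern.err> VMP01-HS kernel: pid 96836 (rsync), uid 0 inumber 2517 on /: filesystem full
EOF

# Count occurrences of each failure signature.
grep -c 'filesystem full' /tmp/messages.sample        # prints 1 for this sample
grep -c 'Synchronization failed' /tmp/messages.sample # prints 1 for this sample
```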


So rsync was failing, and the growing (null) file was filling the ramdisk.
Checking the auth.log on the primary indicated that nsroot was unable to log in.
 

Resolution


- Reboot the secondary node only. This clears the entire ramdisk, removing the stray (null) file.
- Once the file is removed, enable the internal user on both nodes:
        set ns param -internaluserlogin ENABLED
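
After the reboot and parameter change, the sync can be re-triggered and verified from the primary's CLI (a sketch; on a healthy pair the secondary's Sync State no longer shows FAILED):

```
> force ha sync
> show ha node
```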

 

Problem Cause


Logs indicate that sync is failing due to a disk usage issue on the ramdisk (md0). Because the internal system user is unable to log in, saving the remote configuration fails, which in turn causes the sync failures on the secondary appliance.