In a NetScaler HA (High Availability) setup, you may observe the following issues:
The issue occurs when all of the following conditions are met:
There are four options to resolve the issue:
Option 1:
Set "HA Sync VLAN" to the VLAN that is expected to carry HA packets.
This makes the VLAN part of the HA-related configuration, so it is not cleared during a full sync.
Restricting High-Availability Synchronization Traffic to a VLAN
Option 2:
Configure NSVLAN. NSVLAN is an L3 VLAN bound to the NSIP, and it is not cleared during a full sync.
Configuring NSVLAN
Option 3:
Disable "tagall" on the NetScaler and configure the peer switch to allow the untagged (native) VLAN.
Native VLAN traffic is untagged, so the switch does not drop the packets if the NetScaler's native VLAN changes.
Option 4:
Upgrade to NetScaler 13.1 Build 52.19 or later.
During an HA full sync, NetScaler temporarily increases the HA dead interval to 21 seconds to mitigate the node DOWN issue. When the sync completes, the dead interval returns to 3 seconds.
Note: If SyncVLAN is configured, the dead interval is not changed. Configuring SyncVLAN is strongly recommended.
[NSHELP-35628]
During an HA full sync, before applying the configuration synced from the primary node, the secondary node first clears all non-HA-related configuration, including VLANs. As a result, heartbeats sent from the secondary node fall back to VLAN 1 and are sent tagged (-tagall). The switch may reject these packets, or forward them on VLAN 1, where they cannot reach the primary node.
If the time between clearing the VLAN and re-adding it exceeds the 3-second dead interval, the node is marked DOWN.
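The timing condition can be illustrated with a small Python model. This is only a sketch of the behavior described in this article, not NetScaler code: the function names and the has_fix flag (standing for a build of 13.1 52.19 or later) are hypothetical, and the model assumes heartbeats are lost for the whole interval between clearing and re-adding the VLAN.

```python
DEFAULT_DEAD_INTERVAL_S = 3.0    # normal dead interval stated in this article
EXTENDED_DEAD_INTERVAL_S = 21.0  # temporary value used during full sync on fixed builds


def effective_dead_interval(full_sync_in_progress, has_fix, syncvlan_configured):
    """Return the dead interval (seconds) this simplified model assumes is in force."""
    # The interval is extended only during a full sync, only on a fixed build,
    # and only when SyncVLAN is not configured (with SyncVLAN it is left unchanged).
    if full_sync_in_progress and has_fix and not syncvlan_configured:
        return EXTENDED_DEAD_INTERVAL_S
    return DEFAULT_DEAD_INTERVAL_S


def node_marked_down(vlan_gap_s, dead_interval_s):
    """True if heartbeats are missing for longer than the dead interval."""
    return vlan_gap_s > dead_interval_s


# Example: the VLAN is missing for 5 seconds during a full sync.
gap = 5.0
before_fix = node_marked_down(gap, effective_dead_interval(True, False, False))
after_fix = node_marked_down(gap, effective_dead_interval(True, True, False))
print(before_fix, after_fix)
```

In this model a 5-second gap exceeds the 3-second interval but not the extended 21-second one, which is why the upgrade mitigates the issue without removing the underlying gap.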
If a node DOWN event occurs during an HA full sync, NetScaler does not update the incarnation number when the sync finishes. Consequently, the next time a command is executed on the primary node and propagated to the secondary node, the incarnation mismatch (out-of-sync) triggers another HA full sync.
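The repeated full sync can be pictured with a toy Python model. It is a hypothetical simplification for illustration only: the Node class, the single integer counter, and the rule that each command advances the counter by one are assumptions, not NetScaler's actual internals.

```python
from dataclasses import dataclass


@dataclass
class Node:
    incarnation: int = 0  # hypothetical stand-in for the HA incarnation number


def full_sync(primary, secondary, node_down_during_sync):
    """Model a full sync: the secondary's incarnation is updated only if no node DOWN occurred."""
    if not node_down_during_sync:
        secondary.incarnation = primary.incarnation


def run_command(primary, secondary, node_down_during_sync=False):
    """Run one command on the primary; return True if it triggered a full sync."""
    primary.incarnation += 1
    if secondary.incarnation == primary.incarnation - 1:
        # Nodes were in sync, so the command is simply propagated.
        secondary.incarnation = primary.incarnation
        return False
    # Incarnation mismatch: fall back to a full sync.
    full_sync(primary, secondary, node_down_during_sync)
    return True


primary, secondary = Node(5), Node(3)  # nodes start out of sync
first = run_command(primary, secondary, node_down_during_sync=True)
second = run_command(primary, secondary, node_down_during_sync=True)
third = run_command(primary, secondary, node_down_during_sync=False)
fourth = run_command(primary, secondary)
print(first, second, third, fourth)
```

As long as each full sync is interrupted by a node DOWN event, every new command triggers another full sync in this model. Once a sync completes cleanly, later commands propagate normally.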