[NetScaler] HA Full Sync May Cause Missing Heartbeats

Article ID: CTX494496

Description

In a NetScaler HA (High Availability) setup, you may observe the following issues:

An HA full sync can cause both nodes to report the peer node as DOWN because heartbeats are missed.
Propagation of a new command can trigger another HA full sync, which again causes a node to be reported DOWN.

The issue occurs when all of the following conditions are met:

All interfaces/channels that carry HA heartbeats have their native VLAN changed from VLAN 1.
All interfaces/channels that carry HA heartbeats have -tagall enabled.
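
The two trigger conditions can be expressed as a simple check. This is a minimal sketch, assuming a hypothetical HeartbeatInterface record for each interface or channel that carries heartbeats; it does not read a live NetScaler configuration, so the values would first have to be collected by hand (for example from the output of show interface and show vlan).

```python
from dataclasses import dataclass


@dataclass
class HeartbeatInterface:
    """Hypothetical model of one interface or channel that carries HA heartbeats."""
    name: str          # e.g. "1/1" or "LA/1"
    native_vlan: int   # VLAN the interface is bound to untagged
    tagall: bool       # True when -tagall is enabled on the interface


def full_sync_heartbeat_risk(interfaces: list[HeartbeatInterface]) -> bool:
    """Return True only when EVERY heartbeat interface meets BOTH trigger conditions."""
    if not interfaces:
        return False
    # Native VLAN moved away from VLAN 1 AND -tagall enabled, on all of them.
    return all(i.native_vlan != 1 and i.tagall for i in interfaces)


if __name__ == "__main__":
    affected = [HeartbeatInterface("1/1", 20, True), HeartbeatInterface("1/2", 20, True)]
    mixed = [HeartbeatInterface("1/1", 20, True), HeartbeatInterface("1/2", 1, False)]
    print(full_sync_heartbeat_risk(affected))  # True
    print(full_sync_heartbeat_risk(mixed))     # False
```

In the second example, one heartbeat interface still uses VLAN 1 without tagall, so not all interfaces match and the check returns False.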

Resolution

There are four options to resolve the issue:

Option 1:
Configure the "HA Sync VLAN" (SyncVLAN) as the VLAN that is expected to carry HA packets.
This makes the VLAN part of the HA-related configuration, so it is not cleared during a full sync.
Restricting High-Availability Synchronization Traffic to a VLAN

Option 2:
Configure an NSVLAN. The NSVLAN is a Layer 3 VLAN bound to the NSIP address, and it is not cleared during a full sync.
Configuring NSVLAN

Option 3:
Disable "tagall" on the NetScaler interfaces and configure the peer switch to allow the untagged (native) VLAN.
Native VLAN traffic is then sent untagged, so the switch does not drop the heartbeat packets even if the NetScaler native VLAN changes.
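
Options 1 to 3 are configuration changes. The sketch below only builds command strings; it shows the NetScaler CLI commands these options correspond to as we understand the CLI reference: set ha node -syncvlan for Option 1, set ns config -nsvlan ... -ifnum ... for Option 2, and set interface ... -tagall OFF for Option 3. The helper mitigation_commands and its parameters are hypothetical, the VLAN ID and interface names are placeholders, and the syntax should be confirmed against the documentation pages referenced above for your release. An NSVLAN change is documented as requiring a saved configuration and an appliance restart, and Option 3 also needs the matching native VLAN setting on the peer switch, which this sketch does not cover.

```python
def mitigation_commands(option: int, vlan_id: int, interfaces: list[str]) -> list[str]:
    """Build NetScaler CLI command strings for Options 1-3 (hypothetical helper)."""
    if option == 1:
        # Option 1: make vlan_id the HA SyncVLAN so it survives a full sync.
        return [f"set ha node -syncvlan {vlan_id}"]
    if option == 2:
        # Option 2: make vlan_id the NSVLAN on the heartbeat interfaces.
        # Assumption: the change is saved, then the appliance is restarted.
        ifnums = " ".join(interfaces)
        return [f"set ns config -nsvlan {vlan_id} -ifnum {ifnums}", "save ns config"]
    if option == 3:
        # Option 3: stop tagging native VLAN traffic on each heartbeat interface.
        # vlan_id is unused here; the switch-side native VLAN change is separate.
        return [f"set interface {ifnum} -tagall OFF" for ifnum in interfaces]
    raise ValueError("only options 1-3 are configuration changes")


if __name__ == "__main__":
    for opt in (1, 2, 3):
        for cmd in mitigation_commands(opt, 20, ["1/1", "LA/1"]):
            print(f"Option {opt}: {cmd}")
```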

Option 4:
Upgrade to NetScaler 13.1 Build 52.19 or later.
In these builds, NetScaler temporarily increases the HA dead interval to 21 seconds during an HA full sync to mitigate the node DOWN issue. When the sync completes, the dead interval returns to 3 seconds.
Note: If SyncVLAN is configured, the dead interval is not changed. Configuring SyncVLAN is strongly recommended.
[NSHELP-35628]
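
The Option 4 behavior can be summarized as a small decision function. This is an illustrative model of the text above, not NetScaler code; it assumes the default 3-second dead interval, and effective_dead_interval is a hypothetical name.

```python
DEFAULT_DEAD_INTERVAL_S = 3      # normal HA dead interval, per this article
FULL_SYNC_DEAD_INTERVAL_S = 21   # temporary value during full sync (13.1 Build 52.19+)


def effective_dead_interval(full_sync_in_progress: bool, sync_vlan_configured: bool) -> int:
    """Dead interval in seconds under the Option 4 behavior (illustrative model)."""
    # The interval is raised only during a full sync, and only when no SyncVLAN exists.
    if full_sync_in_progress and not sync_vlan_configured:
        return FULL_SYNC_DEAD_INTERVAL_S
    return DEFAULT_DEAD_INTERVAL_S


if __name__ == "__main__":
    print(effective_dead_interval(True, False))   # 21: full sync, no SyncVLAN
    print(effective_dead_interval(True, True))    # 3: SyncVLAN configured, unchanged
    print(effective_dead_interval(False, False))  # 3: no sync in progress
```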

Problem Cause

During an HA full sync, before applying the configuration synced from the primary node, the secondary node first clears its non-HA-related configuration, including VLANs. As a result, heartbeats sent from the secondary node fall back to VLAN 1 and, because -tagall is enabled, are sent tagged. The switch can reject these frames or forward them into VLAN 1, from which they cannot reach the primary node.

If the time between clearing the VLANs and re-adding them exceeds the 3-second HA dead interval, the node is reported DOWN.
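
The failure sequence can be modeled in two steps: which VLAN and tag a heartbeat leaves the secondary node with, and whether the gap before the VLANs are re-added exceeds the dead interval. This is a simplified sketch assuming the default 3-second dead interval and treating the whole clear-to-re-add window as one period in which no heartbeat reaches the primary node; the function names are hypothetical and switch behavior is not modeled.

```python
from typing import NamedTuple


class HeartbeatFrame(NamedTuple):
    vlan: int
    tagged: bool


def secondary_heartbeat_frame(vlans_cleared: bool, configured_native_vlan: int,
                              tagall: bool) -> HeartbeatFrame:
    """VLAN and tagging of a heartbeat leaving the secondary node (simplified)."""
    # Clearing the non-HA configuration removes the VLAN binding: fall back to VLAN 1.
    vlan = 1 if vlans_cleared else configured_native_vlan
    # With -tagall enabled, even native VLAN traffic carries an 802.1Q tag.
    return HeartbeatFrame(vlan=vlan, tagged=tagall)


def node_marked_down(clear_time_s: float, readd_time_s: float,
                     dead_interval_s: float = 3.0) -> bool:
    """True if heartbeats are missing for longer than the dead interval."""
    if readd_time_s < clear_time_s:
        raise ValueError("VLANs cannot be re-added before they are cleared")
    return (readd_time_s - clear_time_s) > dead_interval_s


if __name__ == "__main__":
    print(secondary_heartbeat_frame(True, 20, True))  # HeartbeatFrame(vlan=1, tagged=True)
    print(node_marked_down(0.0, 5.0))                 # True: 5 s gap > 3 s
    print(node_marked_down(0.0, 2.5))                 # False: within the dead interval
```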

When a node goes DOWN during a full HA sync, NetScaler does not update the incarnation number when the sync finishes. Consequently, when a new command is later executed on the primary node and propagated to the secondary node, the incarnation numbers do not match (out of sync), and another HA full sync is triggered.
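
The resulting loop can be illustrated with a toy model. HaPair and its methods are hypothetical and greatly simplified: NetScaler's incarnation tracking is internal, and this sketch reproduces only the two rules stated above (no incarnation update if a node went DOWN during the sync, and a full sync whenever the numbers differ at command propagation).

```python
from dataclasses import dataclass


@dataclass
class HaPair:
    """Toy model of the HA incarnation check (hypothetical names)."""
    primary_incarnation: int
    secondary_incarnation: int

    def full_sync(self, node_went_down: bool) -> None:
        # The incarnation number is updated only if no node went DOWN during the sync.
        if not node_went_down:
            self.secondary_incarnation = self.primary_incarnation

    def propagate_command(self, node_goes_down_during_sync: bool) -> bool:
        """Return True if propagating a new command triggers another full sync."""
        if self.secondary_incarnation != self.primary_incarnation:
            self.full_sync(node_goes_down_during_sync)
            return True
        return False


if __name__ == "__main__":
    pair = HaPair(primary_incarnation=7, secondary_incarnation=6)
    pair.full_sync(node_went_down=True)    # sync hit the DOWN issue; still 6
    print(pair.propagate_command(True))    # True: mismatch, full sync again
    print(pair.propagate_command(False))   # True: full sync, now succeeds
    print(pair.propagate_command(False))   # False: numbers match
```

As long as each full sync hits the DOWN issue, the mismatch persists and every propagated command triggers another full sync; any of the resolution options above breaks the loop by keeping heartbeats flowing during the sync.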

Issue/Introduction

Full HA sync and node DOWN issue

Additional Information

NetScaler Interface Tagging and Flow of High Availability Packets Examples
