NetScaler memory usage spikes if HA sync is disabled or the firmware versions do not match

Article ID: CTX229369

Description

In-use memory spikes to more than 90%, as shown in the log snippet below.

The log snippet also shows memory allocation failures for the MEM_CONN and MEM_TBUF pools. Because of these failures, new connections to the NetScaler cannot be established, and the management GUI becomes inaccessible.

Log snippet: 
==========


1) Node ID: 0 
IP: XX.XX.XX.XX
Node State: UP 
Master State: Secondary 
Fail-Safe Mode: OFF 
INC State: DISABLED 
Sync State: DISABLED 
Propagation: DISABLED 
Enabled Interfaces : 0/1 1/1 
Disabled Interfaces : None 
HA MON ON Interfaces : None 
Interfaces on which heartbeats are not seen : None 
Interfaces causing Partial Failure: None 
SSL Card Status: NOT PRESENT 
Hello Interval: 200 msecs 
Dead Interval: 3 secs 
Node in this Master State for: 0:0:5:35 (days:hrs:min:sec) 

2) Node ID: 1 
IP: XX.XX.XX.XX
Node State: UP 
Master State: Primary 
Fail-Safe Mode: OFF 
INC State: DISABLED 
Sync State: DISABLED 
Propagation: DISABLED 
Enabled Interfaces : 0/1 1/1 
Disabled Interfaces : None 
HA MON ON Interfaces : None 
Interfaces on which heartbeats are not seen : None 
Interfaces causing Partial Failure: None 
SSL Card Status: NOT PRESENT 

reltime:mili second between two records Tue Aug 29 20:39:13 2017 
Index rtime totalcount-val delta rate/sec symbol-name&device-no 
0 2499657 1772710688 70600 10081 dht_err_pcb_link_err 
1 6997 1772779087 68399 9775 dht_err_pcb_link_err 
2 7001 1772845684 66597 9512 dht_err_pcb_link_err 
3 7005 1772911986 66302 9464 dht_err_pcb_link_err
4 7000 1772978787 66801 9543 dht_err_pcb_link_err 
5 7003 1773046187 67400 9624 dht_err_pcb_link_err 
6 7003 1773113785 67598 9652 dht_err_pcb_link_err 

[hariharana@sjanalysis-2 /upload/ftp/73930687/collector_S_X.X.X.X_30Aug2017_09_42/var/nslog]$ nsconmsg -K newnslog.32 -d oldconmsg -s time=29AUG2017:20:39:00 -s ConMEM=3 | more 

current time is Tue Aug 29 20:39:13 2017 
---------------------------------------------------------------------------------------------------------------------- 
TotalMEM: (3857944320/3871342592) Allocated: 3839426368(99.18%) ActualInUse: 3638151004(93.98%) Free: 31916224 

MEMPOOL MaxAllowd CurAlloc ErrLmtFailed ErrAllocFailed ErrFreeFailed 
Bytes (Own%)(Overall%) 
---------------------------------------------------------------------------------------------------------------------- 
MEM_IOH 5242880 0(0.00% 0.00%) 0 0 0 
MEM_LOGGING 4294967295 65600(0.00% 0.00%) 0 0 0 
MEM_CONN 4294967295 2015414016(46.93% 52.06%) 0 4387107 
MEM_TBUF 4294967295 1530920960(35.64% 39.54%) 0 175


CONN_POOL_MEMBERS: 
Name CurAllocd CurFree PgAllocd PgAllocFailed 
---------------------------------------------------------------------------------- 
NSB 33852 32084 31 (1.7%) 0 
JUMBO_NSB 5310 4286 6 (0.3%) 0 
PCB 13104 13009 6 (0.3%) 0 
NATPCB 5461 5460 1 (0.1%) 0 
B64 9797632 13441 299 (16.2%) 0 
B128 9781248 0 597 (32.3%) 4387107 


TBUF_POOL_MEMBERS: 
Name CurAllocd CurFree PgAllocd PgAllocFailed 
---------------------------------------------------------------------------------- 
BM16 131072 130596 1 (0.1%) 0 
BM32 131072 125617 2 (0.1%) 0 
BM64 1507328 666734 46 (2.5%) 0 
BM128 10256384 466 626 (33.9%) 175 
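
The TotalMEM line in the output above can be checked automatically. The following is a minimal Python sketch, not a Citrix tool: the function names and the 90% threshold parameter are assumptions made for illustration, and the regular expression is written only against the single sample line shown in this article.

```python
import re

# Sample line copied from the nsconmsg output in this article.
SAMPLE = ("TotalMEM: (3857944320/3871342592) Allocated: 3839426368(99.18%) "
          "ActualInUse: 3638151004(93.98%) Free: 31916224")

# Pattern assumed from the sample line only; other builds may print it differently.
TOTALMEM_RE = re.compile(
    r"TotalMEM:\s*\((\d+)/(\d+)\)\s*"
    r"Allocated:\s*(\d+)\(([\d.]+)%\)\s*"
    r"ActualInUse:\s*(\d+)\(([\d.]+)%\)\s*"
    r"Free:\s*(\d+)"
)

def parse_totalmem(line):
    """Return the allocated, in-use, and free figures, or None if the line does not match."""
    m = TOTALMEM_RE.search(line)
    if m is None:
        return None
    return {
        "allocated_bytes": int(m.group(3)),
        "allocated_pct": float(m.group(4)),
        "in_use_bytes": int(m.group(5)),
        "in_use_pct": float(m.group(6)),
        "free_bytes": int(m.group(7)),
    }

def memory_is_critical(line, threshold_pct=90.0):
    """True when ActualInUse exceeds the (hypothetical) threshold percentage."""
    stats = parse_totalmem(line)
    return stats is not None and stats["in_use_pct"] > threshold_pct
```

Calling memory_is_critical(SAMPLE) flags the sample line, since its ActualInUse value is above 90%.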


 

Resolution


Workaround: Enable HA synchronization on both nodes.

Permanent fix: Upgrade the appliance to the latest 11.1 or 12.0 build.
The issue is fixed in the following builds and later:
10.5  64.1+
11.0  69.1001+
11.1  49.6001+
12.0  9+
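
To check an appliance against the list above, the comparison can be sketched in Python. This is an illustration under stated assumptions: the release and build are passed in as separate strings (for example "11.1" and "49.16"), build numbers are compared numerically part by part, and the function names are hypothetical. Numeric comparison may not reflect how special builds are ordered, so confirm the result against the release notes.

```python
# Minimum fixed build per release, copied from the list in this article.
FIXED_BUILDS = {
    "10.5": (64, 1),
    "11.0": (69, 1001),
    "11.1": (49, 6001),
    "12.0": (9,),
}

def parse_build(build):
    """Turn a build string such as '49.16' into a tuple of integers (49, 16)."""
    return tuple(int(part) for part in build.split("."))

def has_fix(release, build):
    """Return True/False for a listed release, or None if the release is not listed."""
    minimum = FIXED_BUILDS.get(release)
    if minimum is None:
        return None
    # Assumption: builds order numerically, part by part.
    return parse_build(build) >= minimum
```

For example, has_fix("11.0", "68.10") returns False under these assumptions, while has_fix("12.0", "41.16") returns True.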

Problem Cause


The memory spike occurs because HA sync is disabled on the NetScaler appliances.

Root cause: A NetScaler Application Firewall appliance in a high availability configuration runs out of memory because firewall sessions are not cleaned up when sync or propagation is disabled, or when the software versions running on the two nodes do not match. This happens because DHT cannot clean up session entries properly.

Refer to Issue ID 0646293 in the release notes for 11.0 build 68.10, or for the latest 11.1 or 12.0 build.
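
The triggering condition described here (sync or propagation disabled on a node) can be spotted in saved HA node status text like the one in the log snippet. The following Python sketch is an illustration only: find_ha_risks is a hypothetical helper, and it assumes the "Node ID:", "Sync State:", and "Propagation:" field labels appear exactly as they do in this article's output.

```python
import re

# Shortened copy of the HA node status text from this article.
SAMPLE = """\
1) Node ID: 0
Master State: Secondary
Sync State: DISABLED
Propagation: DISABLED

2) Node ID: 1
Master State: Primary
Sync State: DISABLED
Propagation: DISABLED
"""

def find_ha_risks(text):
    """Return (node_id, field) pairs for every Sync State or Propagation set to DISABLED."""
    risks = []
    node_id = None
    for raw in text.splitlines():
        line = raw.strip()
        m = re.search(r"Node ID:\s*(\d+)", line)
        if m:
            node_id = int(m.group(1))  # remember which node the next fields belong to
            continue
        key, sep, value = line.partition(":")
        if sep and key.strip() in ("Sync State", "Propagation"):
            if value.strip().upper() == "DISABLED":
                risks.append((node_id, key.strip()))
    return risks
```

An empty result means neither field is reported as DISABLED in the text. It does not check whether the software versions on the two nodes match.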