AES-GCM ciphers cause a memory leak on NetScaler VPX devices. The VIP that has the cipher group bound goes down due to high memory usage.
$ nsconmsg -K newnslog.154 -g mem_err -d statswt0
Displaying current counter value excluding counters with 0 value information
NetScaler V20 Performance Data
NetScaler NS11.0: Build 67.12.nc, Date: Jul 13 2016, 18:59:25
reltime:mili second between two records Thu Oct 6 14:17:22 2016
Index reltime counter-value symbol-name&device-no
1 0 2931 mem_err_alloc_failed MEM_TBUF
3 0 2931 mem_err_alloc_failed mempool_SYSTEM(SYSTEM)
5 0 6146752512 mem_err_alloc_failed_size MEM_TBUF
7 0 6146752512 mem_err_alloc_failed_size mempool_SYSTEM(SYSTEM)
9 0 2931 mem_err_allocfailed
11 0 6146752512 mem_err_allocfailsize
13 0 2931 mem_err_bm1024pages_allocfailed
Done.
$ nsconmsg -K newnslog.154 -d oldconmsg -sConMEM=1 | grep -i tbuf | tail -5
MEM_TBUF 4294967295 2292187136(53.37% 0.00%) 0 3668 0
MEM_TBUF 4294967295 2292187136(53.37% 0.00%) 0 3671 0
$ nsconmsg -K newnslog.153 -d memstats | grep -i tbuf
MEM_TBUF 4294967295 2155872256(50.20% 0.00%) 0 0 0
$ nsconmsg -K newnslog.154 -d memstats | grep -i tbuf
MEM_TBUF 4294967295 2292187136(53.37% 0.00%) 0 2931 0
As shown above, MEM_TBUF usage is high, and the logs indicate a gradual increase: from 50.20% in newnslog.153 to 53.37% in newnslog.154, with 2931 allocation failures recorded in the later file. Every handshake that uses an AES-GCM cipher causes MEM_TBUF to grow.
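To compare MEM_TBUF across several newnslog files without reading the numbers by eye, the memstats lines above can be parsed with a short script. This is a minimal sketch: the column meanings (maximum allowed, current allocation, allocation-failure count) are assumptions inferred from the sample output, where 2931 matches the mem_err_alloc_failed counter, and the function name is hypothetical.

```python
import re

# Assumed layout of a memstats line, inferred from the sample output:
# <pool> <max> <current>(<pct>% <pct>%) <count> <alloc failures> <count>
MEMSTATS_LINE = re.compile(
    r"^(\S+)\s+(\d+)\s+(\d+)\(\s*[\d.]+%\s+[\d.]+%\)\s+(\d+)\s+(\d+)\s+(\d+)\s*$"
)

def parse_memstats(line):
    """Parse one 'nsconmsg -d memstats' line into a dict of integers."""
    match = MEMSTATS_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognized memstats line: %r" % line)
    pool = match.group(1)
    max_allowed, current, _, alloc_failed, _ = (int(g) for g in match.groups()[1:])
    return {
        "pool": pool,
        "max_allowed": max_allowed,
        "current": current,
        # Recomputed here rather than trusted from the text in parentheses.
        "percent_of_max": round(100.0 * current / max_allowed, 2),
        "alloc_failed": alloc_failed,  # assumption: fifth numeric column
    }

older = parse_memstats("MEM_TBUF 4294967295 2155872256(50.20% 0.00%) 0 0 0")
newer = parse_memstats("MEM_TBUF 4294967295 2292187136(53.37% 0.00%) 0 2931 0")

# Growth of the pool between the two log files, in the unit nsconmsg reports.
growth = newer["current"] - older["current"]
print(older["percent_of_max"], newer["percent_of_max"], growth)
```

A steady rise in the current allocation from one log file to the next, together with a nonzero failure count, matches the pattern described above.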
This issue has been reported before, and the fix is available in NetScaler 11.0 build 68.12 and later, and in NetScaler 11.1 build 49 and later.
This is also mentioned in the release notes of NetScaler 11.0 build 68.12 - https://www.citrix.com/content/dam/citrix/en_us/documents/downloads/netscaler-adc/NS_11_0_68_12.html
The following is an excerpt from the release notes:
NetScaler virtual appliance sometimes fails because of a memory leak if you use GCM-based ciphers on a VPX platform. The ciphers can eventually exhaust memory, causing the appliance to fail if the memory exhaustion error is not gracefully handled.
[# 652477, 654559, 656035, 657343]
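Whether a given appliance already carries the fix can be checked against the build string that nsconmsg prints in its header. The sketch below encodes only the two fixed builds named in this article; the function name is hypothetical, treating "11.1 build 49" as 49.0 or later is an assumption, and releases other than 11.0 and 11.1 are reported as unknown rather than guessed.

```python
import re

# Matches banners such as "NetScaler NS11.0: Build 67.12.nc, Date: ..."
BANNER = re.compile(r"NS(\d+)\.(\d+): Build (\d+)\.(\d+)")

# First fixed build per release, taken from this article.
# Assumption: "11.1 build 49" means any 49.x build or later.
FIXED_FROM = {(11, 0): (68, 12), (11, 1): (49, 0)}

def has_gcm_leak_fix(banner):
    """Return True/False for 11.0 and 11.1, None for other releases."""
    match = BANNER.search(banner)
    if match is None:
        raise ValueError("no NetScaler build string found in: %r" % banner)
    major, minor, build, sub = (int(g) for g in match.groups())
    first_fixed = FIXED_FROM.get((major, minor))
    if first_fixed is None:
        return None  # release not covered by this article
    return (build, sub) >= first_fixed

print(has_gcm_leak_fix("NetScaler NS11.0: Build 67.12.nc, Date: Jul 13 2016, 18:59:25"))
```

For the appliance in the logs above (11.0 build 67.12) the check reports that the fix is not present.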
To work around the issue, unbind all cipher groups that contain GCM ciphers.
For example:
> show run | grep GCM
bind ssl cipher gcm-cipher -cipherName TLS1.2-ECDHE-RSA-AES256-GCM-SHA384
bind ssl cipher gcm-cipher -cipherName TLS1.2-ECDHE-RSA-AES128-GCM-SHA256
bind ssl cipher gcm-cipher -cipherName TLS1.2-DHE-RSA-AES256-GCM-SHA384
bind ssl cipher gcm-cipher -cipherName TLS1.2-DHE-RSA-AES128-GCM-SHA256
If you see any of the above ciphers, do not use them, to avoid memory issues. If you need these ciphers, upgrade to the latest 11.1 build.
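On a configuration with many cipher groups, the output of the grep above can be summarized per group to see which groups need attention. The following is a rough sketch that only parses lines of the form shown in the example; the function name is hypothetical, and it does not issue any NetScaler commands.

```python
def gcm_cipher_groups(show_run_output):
    """Map each cipher group name to the GCM ciphers bound to it."""
    groups = {}
    for line in show_run_output.splitlines():
        parts = line.split()
        # Expected shape: bind ssl cipher <group> -cipherName <cipher>
        if (
            len(parts) == 6
            and parts[:3] == ["bind", "ssl", "cipher"]
            and parts[4] == "-cipherName"
            and "GCM" in parts[5].upper()
        ):
            groups.setdefault(parts[3], []).append(parts[5])
    return groups

sample = """\
bind ssl cipher gcm-cipher -cipherName TLS1.2-ECDHE-RSA-AES256-GCM-SHA384
bind ssl cipher gcm-cipher -cipherName TLS1.2-ECDHE-RSA-AES128-GCM-SHA256
bind ssl cipher gcm-cipher -cipherName TLS1.2-DHE-RSA-AES256-GCM-SHA384
bind ssl cipher gcm-cipher -cipherName TLS1.2-DHE-RSA-AES128-GCM-SHA256
"""

for group, ciphers in gcm_cipher_groups(sample).items():
    print(group, len(ciphers))
```

Each group listed is a candidate for unbinding until the appliance is upgraded.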
A few things to keep in mind regarding memory leak issues: