"Aanalytics_afdecoder goes down frequently and "swap_pager_getswapspace(9): failed" messages seen in NetScaler Insight Center

"Aanalytics_afdecoder goes down frequently and "swap_pager_getswapspace(9): failed" messages seen in NetScaler Insight Center

book

Article ID: CTX205249

calendar_today

Updated On:

Description

Netscaler Insight Center - Subsystem down error messages:

From logs you will see below:
===
Nov 16 09:02:09 <local0.crit> insight-center analytics_event: 10.247.191.141 16/11/2015:09:02:08 GMT : SNMP TRAP_SENT : 127.0.0.1:SubSystemState:analytics_afdecoder - SubSystem Down: analytics_afdecoder
Nov 16 09:02:09 <local0.crit> insight-center analytics_event: 10.247.191.141 16/11/2015:09:02:08 GMT : EVENT SUBSYSTEMDOWN : 127.0.0.1:SubSystemState:analytics_afdecoder - SubSystem Down: analytics_afdecoder
 

 

Resolution

From the Insight logs we see below:
===

Messages logs shows below errors which makes us to believe that Insight is running out of memory:

Nov 16 09:02:09 <local0.crit> insight-center analytics_event: 10.247.191.141 16/11/2015:09:02:08 GMT : SNMP TRAP_SENT : 127.0.0.1:SubSystemState:analytics_afdecoder - SubSystem Down: analytics_afdecoder
Nov 16 09:02:09 <local0.crit> insight-center analytics_event: 10.247.191.141 16/11/2015:09:02:08 GMT : EVENT SUBSYSTEMDOWN : 127.0.0.1:SubSystemState:analytics_afdecoder - SubSystem Down: analytics_afdecoder

Nov 16 09:02:04 <kern.crit> insight-center kernel: swap_pager_getswapspace(9): failed
Nov 16 09:02:04 <kern.crit> insight-center kernel: swap_pager_getswapspace(16): failed
Nov 16 09:02:04 <kern.crit> insight-center kernel: swap_pager_getswapspace(9): failed
Nov 16 09:02:04 <kern.crit> insight-center kernel: swap_pager_getswapspace(6): failed
Nov 16 09:02:04 <kern.crit> insight-center kernel: swap_pager_getswapspace(9): failed
Nov 16 09:02:04 <kern.crit> insight-center kernel: swap_pager_getswapspace(4): failed
Nov 16 09:02:04 <kern.err> insight-center kernel: pid 11720 (analytics_afdecoder), uid 0, was killed: out of swap space

Insight decoder process in getting chocked with this data record. Within 5 min Insight receives 1024064 data records.
 
  PID USERNAME    PRI NICE   SIZE    RES STATE   C   TIME   WCPU COMMAND
28052 root        108    0  3770M  3656M CPU3    3   5:30 53.96% analytics_afdecoder{HDXInsightWorker}
28052 root         76    0  3770M  3656M RUN     0  18:06 42.97% analytics_afdecoder{AfPacketHandler}
28052 root        105    0  3770M  3656M RUN     0   4:05 41.99% analytics_afdecoder{analytics_afdeco}
 
cat mps_afdecoder.log.17 | grep "Data Record: ica_session_state" | wc -l
1024064


Insight memory is getting full:
==
 
93123 Tuesday, 5 Jan 16 10:02:19 +0000 [Debug] For AFProto: Soft limit crossed. Packet will be dropped after Hard limit crossed.
  93124 Tuesday, 5 Jan 16 10:02:55 +0000 [Debug] HDXInsightDataProcessor::handle_ica_session_update_data: for session guid is: 4e62fa71-5635-0048-3980-8b56abcd0800 cha
nnel_id_4 is  0, has not valid value.
  93125 Tuesday, 5 Jan 16 10:03:00 +0000 [Debug] ....................... start of the process ....................... 

From Insight mps_afdecoder logs you will see flood of below 
data record with 0000000 session-guid and the frequency of this data record is very high. :

Tuesday, 5 Jan 16 09:57:56 +0000 [Debug] Data Record: ica_session_state, Values:->       exporter_id: 10.80.29.10, flags: 524288, error_code: 120, session_guid: 00000000000000000000000000000000, session_guid_size: 16, dev_serial_num: 0, client_cookie: 00000000000000000000000000000000, client_cookie_size: 16, connection_chain_id: 00000000000000000000000000000000, connection_chain_id_size: 16, clientIPAddress: 2887172505, serverIPAddress: 183943126, server_port: 1494, seq_number: 0, acr_reconnect: 0, connection_priority: 0
Tuesday, 5 Jan 16 09:59:08 +0000 [Debug] Data Record: ica_session_state, Values:->       exporter_id: 10.80.29.10, flags: 524288, error_code: 120, session_guid: 00000000000000000000000000000000, session_guid_size: 16, dev_serial_num: 0, client_cookie: 00000000000000000000000000000000, client_cookie_size: 16, connection_chain_id: 00000000000000000000000000000000, connection_chain_id_size: 16, clientIPAddress: 2887026803, serverIPAddress: 184008626, server_port: 1494, seq_number: 0, acr_reconnect: 0, connection_priority: 0
Tuesday, 5 Jan 16 09:59:08 +0000 [Debug] Data Record: ica_session_state, Values:->       exporter_id: 10.80.29.10, flags: 524288, error_code: 120, session_guid: 00000000000000000000000000000000, session_guid_size: 16, dev_serial_num: 0, client_cookie: 00000000000000000000000000000000, client_cookie_size: 16, connection_chain_id: 00000000000000000000000000000000, connection_chain_id_size: 16, clientIPAddress: 2887172521, serverIPAddress: 183943094, server_port: 1494, seq_number: 0, acr_reconnect: 0, connection_priority: 0
Tuesday, 5 Jan 16 09:59:08 +0000 [Debug] Data Record: ica_session_state, Values:->       exporter_id: 10.80.29.10, flags: 524288, error_code: 120, session_guid: 00000000000000000000000000000000, session_guid_size: 16, dev_serial_num: 0, client_cookie: 00000000000000000000000000000000, client_cookie_size: 16, connection_chain_id: 00000000000000000000000000000000, connection_chain_id_size: 16, clientIPAddress: 2887165612, serverIPAddress: 184008566, server_port: 1494, seq_number: 0, acr_reconnect: 0, connection_priority: 0

The issue is similar to an internally reported issue and is fixed in 10.5.58+ builds:
 

Problem Cause

NetScaler is sending lot of skipflow data record.  These skipflow records are received whenever Netscaler is not able to parse the appflow packet of ICA connection
 
 
 

Additional Information