While doing the force failover, the secondary node is undergoing a reboot

Article ID: CTX230126

Description

During a force failover, the secondary node undergoes a soft reboot and the NetScaler (NS) does not fail over.
Core files are generated every time a failover is attempted.

The /var/log/ns.log file shows the following records:
Nov 13 17:32:30 <local0.alert> NSAUSL nsppe: PE 0 (pid 1085) got signal 11; signal mask is 0x0 0x0 0x0 0x0
Nov 13 17:32:30 <local0.info> NSAUSL [27]: proc NSPPE-00 (1085) SIGNALED
Nov 13 17:32:30 <local0.alert> NSAUSL [27]: pitboss Mon Nov 13 17:32:30 2017 NSPPE-00 (1085) unexpectedly died due to receiving signal
Nov 13 17:32:30 <local0.alert> NSAUSL [27]: pitboss Mon Nov 13 17:32:30 2017 There may be a delay restarting process while collecting core dump on NSPPE-00 (1085)
Nov 13 17:32:30 <local0.alert> NSAUSL [27]: pitboss Mon Nov 13 17:32:30 2017 proc NSPPE-00 (1085) failure. Therefore initiating nCore NetScaler restart according to policy setting (0x29b4)
Nov 13 17:32:30 <local0.alert> NSAUSL [27]: pitboss Mon Nov 13 17:32:30 2017 NetScaler restart may be delayed if collecting core dump for NSPPE-00 (1085)
Nov 13 17:32:30 <local0.info> NSAUSL [27]: pitboss Mon Nov 13 17:32:30 2017 Nsshutdown sysctl lock remains free ...
Nov 13 17:32:30 <local0.alert> NSAUSL [27]: pitboss Mon Nov 13 17:32:30 2017 Pitboss declaring system failure: NSPPE-00 (1085) exited
Nov 13 17:32:30 <local0.debug> NSAUSL nsprofmon: Write failed because of broken pipe in nsnet_sendreq; FD=4 errno=32 peerPort=3010 peerProcess=nsnetsvc
Nov 13 17:32:30 <local0.info> NSAUSL restart[1576]: Attempting intial cleanup of netscaler daemons 
Nov 13 17:32:36 <local0.info> NSAUSL restart[1576]: Cleanup completed 
Nov 13 17:32:36 <local0.alert> NSAUSL restart[1576]: Nsshutdown lock released !

Resolution

Workaround: Disable the AppQoE and AppFlow features. Disabling AppFlow or AppQoE may cause issues with other entities that use them, such as NMAS/Insight, so verify the environment before disabling either feature.
Fix: The issue is fixed in release 11.0 build 72 and later. Customers can also upgrade to an 11.1 or 12.0 GA build to obtain the fix.
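The fix statement can be expressed as a simple version check. The sketch below is illustrative only: the helper names parse_build and has_fix are hypothetical, the input format follows the "11.0 71.18" notation used in this article, and it assumes that any release from 11.1 onward carries the fix. Releases older than 11.0 are not covered by this article, so the check returns None for them.

```python
import re


def parse_build(version):
    """Parse a string such as '11.0 71.18' into (major, minor, build, sub)."""
    match = re.fullmatch(r"\s*(\d+)\.(\d+)\s+(\d+)(?:\.(\d+))?\s*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor, build = (int(match.group(i)) for i in (1, 2, 3))
    sub = int(match.group(4) or 0)
    return major, minor, build, sub


def has_fix(version):
    """Return True/False per the fix statement, or None if not covered."""
    major, minor, build, _ = parse_build(version)
    if (major, minor) == (11, 0):
        # Reading "11.0 72+" as build 72 or later on the 11.0 release.
        return build >= 72
    if (major, minor) >= (11, 1):
        # Assumption: 11.1, 12.0, and later releases include the fix.
        return True
    # Releases before 11.0 are not discussed in this article.
    return None


if __name__ == "__main__":
    print(has_fix("11.0 71.18"))  # the build on which the issue was reported
```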

Problem Cause

#697709: The packet engine (PE) crashes during the SSL handshake while performing ECDH key computation.

Issue/Introduction

The issue was reported on build 11.0 71.18. After an upgrade to 11.0 71.18, a NetScaler in an HA pair reboots and generates crash files when a force failover is performed. In some cases both NetScaler appliances in the HA pair crash after the upgrade. The issue may also cause several failovers in the HA environment.