The primary does not restart, but an HA failover occurs.
The old primary node logged the following event. It shows that the failover request was initiated by the other node, the old secondary.
34895 207 PPE-0 self node 172.16.28.10: INIT due to REQUEST from HA peer node Thu Mar 1 08:40:16 2018
The old secondary node logged the following events. They show that the old secondary requested the failover because it had missed 15 heartbeats.
21655 0 PPE-1 interface(LA/1): No HA heartbeats (Last received: Thu Mar 1 08:40:13 2018 ; Missed 15 heartbeats) Thu Mar 1 08:40:16 2018
21656 0 PPE-1 remote node 172.16.28.10: DOWN Thu Mar 1 08:40:16 2018
21657 0 PPE-1 self node 172.16.28.11: Claiming Thu Mar 1 08:40:16 2018
21658 0 PPE-1 self node 172.16.28.11: Primary Thu Mar 1 08:40:16 2018
21659 0 PPE-1 interface(LA/1): HA heartbeats received Thu Mar 1 08:40:16 2018
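As a quick consistency check on the "No HA heartbeats" event, the gap between the last received heartbeat and the time the event was logged can be compared with the reported miss count. The sketch below is only illustrative: the regular expression and the helper name heartbeat_gap are assumptions made for this note, not part of any NetScaler tool, and it assumes the default rate of 5 heartbeats per second.

```python
import re
from datetime import datetime

# Event line copied from the old secondary's log above.
LINE = ("21655 0 PPE-1 interface(LA/1): No HA heartbeats "
        "(Last received: Thu Mar 1 08:40:13 2018 ; Missed 15 heartbeats) "
        "Thu Mar 1 08:40:16 2018")

# Hypothetical pattern written for this one message format.
PATTERN = re.compile(
    r"\(Last received: (?P<last>.+?) ; Missed (?P<missed>\d+) heartbeats\) "
    r"(?P<logged>.+)$"
)
TIME_FORMAT = "%a %b %d %H:%M:%S %Y"


def heartbeat_gap(line, rate_per_sec=5):
    """Return (gap in seconds, reported misses, misses implied by the gap).

    rate_per_sec=5 is the assumed default heartbeat rate.
    """
    match = PATTERN.search(line)
    if match is None:
        raise ValueError("not a 'No HA heartbeats' event line")
    last = datetime.strptime(match.group("last"), TIME_FORMAT)
    logged = datetime.strptime(match.group("logged"), TIME_FORMAT)
    gap_seconds = (logged - last).total_seconds()
    reported = int(match.group("missed"))
    return gap_seconds, reported, gap_seconds * rate_per_sec


gap, reported, implied = heartbeat_gap(LINE)
print(f"gap={gap:.0f}s reported={reported} implied={implied:.0f}")
```

For this event the gap is 3 seconds, which at 5 heartbeats per second corresponds to the 15 missed heartbeats in the log.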
At this point there are two possible causes:
Network issue, or
Hardware issue.
The newnslog, however, gives a clue to the specific cause.
nsconmsg -K newnslog -d current -g ha_tot_pkt_tx -s time=01Mar2018:08:39 -s disptime=1 |more
reltime:mili second between two records Thu Mar 1 08:39:09 2018
Index rtime totalcount-val delta rate/sec symbol-name&device-no&time
7 7000 16066592 35 5 ha_tot_pkt_tx Thu Mar 1 08:39:58 2018
8 7000 16066627 35 5 ha_tot_pkt_tx Thu Mar 1 08:40:05 2018
9 8113 16066666 39 4 ha_tot_pkt_tx Thu Mar 1 08:40:13 2018
10 10190 16066701 35 3 ha_tot_pkt_tx Thu Mar 1 08:40:23 2018
11 7000 16066736 35 5 ha_tot_pkt_tx Thu Mar 1 08:40:30 2018
NetScaler logged two consecutive records at 08:40:13 and 08:40:23.
The interval between them is about 10 seconds (rtime 10190 ms), so NetScaler should have generated about 50 heartbeats (by default it generates 5 heartbeats per second). The log shows that only 35 heartbeats were generated and sent.
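The same arithmetic can be applied to every record in the nsconmsg output. The sketch below is a minimal illustration: the rtime and delta values are copied from the output above, the rate of 5 heartbeats per second is the default mentioned in this note, and the tolerance of 2 packets is an arbitrary assumption chosen only to ignore small rounding differences.

```python
HEARTBEATS_PER_SEC = 5  # default rate stated in this note

# (timestamp, rtime in ms, delta) copied from the ha_tot_pkt_tx records above.
RECORDS = [
    ("08:39:58", 7000, 35),
    ("08:40:05", 7000, 35),
    ("08:40:13", 8113, 39),
    ("08:40:23", 10190, 35),
    ("08:40:30", 7000, 35),
]


def expected_heartbeats(rtime_ms, rate=HEARTBEATS_PER_SEC):
    """Heartbeats that should have been sent during rtime_ms milliseconds."""
    return rtime_ms * rate // 1000


def shortfalls(records, tolerance=2):
    """Return (timestamp, missing) for records that sent too few heartbeats.

    tolerance is a hypothetical threshold, not a NetScaler setting.
    """
    flagged = []
    for timestamp, rtime_ms, delta in records:
        missing = expected_heartbeats(rtime_ms) - delta
        if missing > tolerance:
            flagged.append((timestamp, missing))
    return flagged


for timestamp, missing in shortfalls(RECORDS):
    print(f"{timestamp}: about {missing} heartbeats short")
```

Only the record logged at 08:40:23 is flagged, with a shortfall of about 15 heartbeats, the same number the old secondary reported as missed.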
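Because the heartbeats were never sent in the first place, rather than sent and lost in transit, a hardware failure is more likely than a network issue.
At the same time, ns_hw_err.bash showed HDD errors, so we can identify the cause as a hardware failure and request an RMA.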