Issues with Interfaces flapping and causing HA failovers

Article ID: CTX238578

Description

Issues with VIP access and interface flaps.
 

For the LACP flap event:
 
4238     0 PPE-0 'interface(10/1)' DOWN                Fri Apr 13 11:00:29 2018
4239     0 PPE-0 interface(10/1): INIT                 Fri Apr 13 11:00:29 2018
4240     0 PPE-0 interface(10/1): UNSELECTED           Fri Apr 13 11:00:29 2018
4241     0 PPE-0 interface(10/1): ATTACHED             Fri Apr 13 11:00:29 2018
4242     0 PPE-0 'interface(10/1)' migrated            Fri Apr 13 11:00:29 2018
4243     0 PPE-0 interface(10/1): DETACHED             Fri Apr 13 11:00:29 2018
4244     0 PPE-0 'interface(10/1)' migrated            Fri Apr 13 11:00:29 2018
4245     0 PPE-0 interface(10/1): PORT_DISABLED        Fri Apr 13 11:00:29 2018
4246     0 PPE-0 interface(10/1): LACP_DISABLED        Fri Apr 13 11:00:29 2018
4247     0 PPE-0 'interface(10/2)' DOWN                Fri Apr 13 11:00:29 2018
4248     0 PPE-0 interface(10/2): INIT                 Fri Apr 13 11:00:29 2018
4249     0 PPE-0 interface(10/2): UNSELECTED           Fri Apr 13 11:00:29 2018
4250     0 PPE-0 interface(10/2): COLLECTING           Fri Apr 13 11:00:29 2018
4251     0 PPE-0 interface(10/2): ATTACHED             Fri Apr 13 11:00:29 2018
4252     0 PPE-0 'interface(10/2)' migrated            Fri Apr 13 11:00:29 2018
4253     0 PPE-0 interface(10/2): DETACHED             Fri Apr 13 11:00:29 2018
4254     0 PPE-0 'interface(10/2)' migrated            Fri Apr 13 11:00:29 2018
 
rtime: Relative time between two records in milliseconds
seqno rtime event-message                         event-time
4255     0 PPE-0 interface(10/2): PORT_DISABLED        Fri Apr 13 11:00:29 2018
4256     0 PPE-0 interface(10/2): LACP_DISABLED        Fri Apr 13 11:00:29 2018
4257     0 PPE-0 remote node 10.204.73.49: Primary     Fri Apr 13 11:00:29 2018
4258     0 PPE-0 'interface(10/3)' DOWN                Fri Apr 13 11:00:29 2018
4259     0 PPE-0 interface(10/3): INIT                 Fri Apr 13 11:00:29 2018
.
.
.
4522   119 PPE-1 'interface(10/1)' UP                  Fri Apr 13 11:02:50 2018
 
rtime: Relative time between two records in milliseconds
seqno rtime event-message                         event-time
4693     0 PPE-0 'interface(10/1)' UP                  Fri Apr 13 11:02:51 2018
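The length of the link-down window can be read off the events above by pairing each DOWN record with the next UP record for the same interface. The following is a minimal sketch of that calculation, not a Citrix tool: the regular expression, the function name link_down_windows, and the EVENT_SAMPLE list are assumptions made for illustration, and the sample lines are copied from the event log above.

```python
import re
from datetime import datetime

# Assumed layout of a link-state event line, based on the log above:
# seqno, rtime, PPE id, quoted interface name, UP/DOWN, event time.
EVENT_RE = re.compile(
    r"^\s*(?P<seqno>\d+)\s+(?P<rtime>\d+)\s+(?P<ppe>PPE-\d+)\s+"
    r"'interface\((?P<iface>\d+/\d+)\)'\s+(?P<state>UP|DOWN)\s+"
    r"(?P<when>\w{3} \w{3}\s+\d+ \d{2}:\d{2}:\d{2} \d{4})\s*$"
)


def link_down_windows(lines, iface):
    """Return (down_time, up_time, seconds) for each DOWN -> UP pair of iface."""
    windows = []
    down_at = None
    for line in lines:
        m = EVENT_RE.match(line)
        if not m or m.group("iface") != iface:
            continue  # LACP state lines and other interfaces are skipped
        when = datetime.strptime(m.group("when"), "%a %b %d %H:%M:%S %Y")
        if m.group("state") == "DOWN":
            if down_at is None:
                down_at = when
        elif down_at is not None:
            # First UP after a DOWN closes the window; later UPs are ignored.
            seconds = int((when - down_at).total_seconds())
            windows.append((down_at, when, seconds))
            down_at = None
    return windows


# Hypothetical sample holder; the lines themselves come from the log above.
EVENT_SAMPLE = """\
4238     0 PPE-0 'interface(10/1)' DOWN                Fri Apr 13 11:00:29 2018
4239     0 PPE-0 interface(10/1): INIT                 Fri Apr 13 11:00:29 2018
4247     0 PPE-0 'interface(10/2)' DOWN                Fri Apr 13 11:00:29 2018
4522   119 PPE-1 'interface(10/1)' UP                  Fri Apr 13 11:02:50 2018
4693     0 PPE-0 'interface(10/1)' UP                  Fri Apr 13 11:02:51 2018
""".splitlines()

if __name__ == "__main__":
    for down, up, secs in link_down_windows(EVENT_SAMPLE, "10/1"):
        print(f"10/1 down {down:%H:%M:%S} -> up {up:%H:%M:%S} ({secs} s)")
```

For the sample lines, interface 10/1 is reported DOWN at 11:00:29 and first reported UP at 11:02:50, a window of 141 seconds.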
 
We can see that on interface 10/1 no LACPDUs are received for the instance for ~90 seconds (the long timeout), but this could be because the link is down.
 
/var/nslog]$ nsconmsg105 -K newnslog -g lacpdus -s disptime=1 -d current | grep '10/1' | less
 
      2    7000            189          1        0 nic_tot_rx_lacpdus interface(10/1) Fri Apr 13 11:00:15 2018
      4    7500            190          1        0 nic_tot_rx_lacpdus interface(10/1) Fri Apr 13 11:00:29 2018
     18    7000            195          5        0 nic_tot_rx_lacpdus interface(10/1) Fri Apr 13 11:02:54 2018
     22    7000            196          1        0 nic_tot_rx_lacpdus interface(10/1) Fri Apr 13 11:03:29 2018
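To check the LACPDU gap against the ~90 second long timeout, the time between consecutive counter rows can be computed directly. The sketch below is illustrative only. It assumes the columns are index, rtime, total count, delta, rate per second, counter name, interface, and timestamp (the header row is not shown in the output above), and that a row is printed only when the counter changes. The names lacpdu_gaps, COUNTER_SAMPLE, and LONG_TIMEOUT_S are hypothetical.

```python
import re
from datetime import datetime

# Assumed column layout of the counter rows shown above.
COUNTER_RE = re.compile(
    r"^\s*(?P<index>\d+)\s+(?P<rtime>\d+)\s+(?P<total>\d+)\s+(?P<delta>-?\d+)\s+"
    r"(?P<rate>-?\d+)\s+(?P<name>\S+)\s+interface\((?P<iface>\d+/\d+)\)\s+"
    r"(?P<when>\w{3} \w{3}\s+\d+ \d{2}:\d{2}:\d{2} \d{4})\s*$"
)

LONG_TIMEOUT_S = 90  # long timeout value quoted in the article


def lacpdu_gaps(lines, iface, counter="nic_tot_rx_lacpdus"):
    """Return (start, end, gap_seconds, lacpdus_received) per consecutive row pair."""
    samples = []
    for line in lines:
        m = COUNTER_RE.match(line)
        if m and m.group("iface") == iface and m.group("name") == counter:
            when = datetime.strptime(m.group("when"), "%a %b %d %H:%M:%S %Y")
            samples.append((when, int(m.group("total"))))
    gaps = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        gaps.append((t0, t1, int((t1 - t0).total_seconds()), c1 - c0))
    return gaps


# Hypothetical sample holder; the rows are copied from the output above.
COUNTER_SAMPLE = """\
      2    7000            189          1        0 nic_tot_rx_lacpdus interface(10/1) Fri Apr 13 11:00:15 2018
      4    7500            190          1        0 nic_tot_rx_lacpdus interface(10/1) Fri Apr 13 11:00:29 2018
     18    7000            195          5        0 nic_tot_rx_lacpdus interface(10/1) Fri Apr 13 11:02:54 2018
     22    7000            196          1        0 nic_tot_rx_lacpdus interface(10/1) Fri Apr 13 11:03:29 2018
""".splitlines()

if __name__ == "__main__":
    for start, end, secs, received in lacpdu_gaps(COUNTER_SAMPLE, "10/1"):
        flag = "exceeds long timeout" if secs > LONG_TIMEOUT_S else "ok"
        print(f"{start:%H:%M:%S} -> {end:%H:%M:%S}: {secs} s, +{received} LACPDUs ({flag})")
```

On these four rows, the only gap longer than 90 seconds is the 145 second stretch from 11:00:29 to 11:02:54. That stretch spans the period between the DOWN event at 11:00:29 and the UP event at 11:02:50 in the event log, so it fits the caveat that the missing LACPDUs may simply reflect the link being down.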

Resolution

Upgrade to version 11.1 build 56.19 or version 12.0 build 59..3.

Problem Cause


Root cause: In software RSS, a jumbo NSB becomes non-jumbo when its size is <= 1514 bytes. The IS_JUMBO_NSB assumptions fail as a result, and buflen is not handled in the non-jumbo case.