Network Connection Failure with "STA ALERT LINK" at LCD Display

Article ID: CTX213082

Description

The NetScaler network connection goes down suddenly. A reboot restores it, but the connection goes down again several days later. The NetScaler LCD display shows "STA ALERT LINK".

Resolution

Request an RMA for the NetScaler device.

Problem Cause

The logs show that the NIC Tx stalls repeatedly:

During the first Tx stall there is a huge increase in rx_nobufs (nic_err_rx_nobufs), but by the NIC logic this should not cause a stall:
 
$ nsconmsg105 -K newnslog.114 -d current -g stalls -s disptime=1
NetScaler NS10.5: Build 55.8.nc, Date: Jan 25 2015, 23:55:26
 
reltime:mili second between two records Wed May  4 01:56:28 2016
  Index   rtime totalcount-val      delta rate/sec symbol-name&device-no&time
      0  776999              1          1        0 nic_err_link_tx_stalls interface(0/1) Wed May  4 01:56:28 2016
      1    7000              2          1        0 nic_err_link_tx_stalls interface(0/1) Wed May  4 01:56:35 2016
      2    7000              4          2        0 nic_err_link_tx_stalls interface(0/1) Wed May  4 01:56:42 2016
 
 
    959    7000         239630         38        5 nic_err_rx_nobufs interface(0/1) Wed May  4 01:56:14 2016
    960    7000         239642         12        1 nic_err_rx_nobufs interface(0/1) Wed May  4 01:56:21 2016
    961    7000       22390940   22151298  3164471 nic_err_rx_nobufs interface(0/1) Wed May  4 01:56:28 2016
    962       0              1          1        0 nic_err_link_tx_stalls interface(0/1) Wed May  4 01:56:28 2016
    963    7000       22390942          2        0 nic_err_rx_nobufs interface(0/1) Wed May  4 01:56:35 2016
    964       0              2          1        0 nic_err_link_tx_stalls interface(0/1) Wed May  4 01:56:35 2016
    965    7000       22390943          1        0 nic_err_rx_nobufs interface(0/1) Wed May  4 01:56:42 2016
    966       0              4          2        0 nic_err_link_tx_stalls interface(0/1) Wed May  4 01:56:42 2016
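
The jump in nic_err_rx_nobufs can also be located programmatically. The following is a minimal Python sketch, not a Citrix tool. It assumes the column layout shown in the nsconmsg output above; the names `parse_records` and `find_spikes` and the 1000x threshold are illustrative assumptions.

```python
import re
from typing import List, NamedTuple


class Record(NamedTuple):
    index: int
    rtime_ms: int
    total: int
    delta: int
    rate: int
    symbol: str
    device: str
    timestamp: str


# Assumed layout: Index rtime totalcount-val delta rate/sec symbol device time
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\S+)\s+(\S+)\s+(.+?)\s*$"
)


def parse_records(text: str) -> List[Record]:
    """Parse counter lines; header and banner lines are skipped."""
    records = []
    for line in text.splitlines():
        m = LINE_RE.match(line)
        if m:
            idx, rtime, total, delta, rate, symbol, device, ts = m.groups()
            records.append(Record(int(idx), int(rtime), int(total),
                                  int(delta), int(rate), symbol, device, ts))
    return records


def find_spikes(records: List[Record], symbol: str, factor: int = 1000) -> List[Record]:
    """Return records whose delta is at least `factor` times the previous
    delta of the same symbol on the same device (threshold is arbitrary)."""
    spikes = []
    previous = {}
    for r in records:
        if r.symbol != symbol:
            continue
        last = previous.get(r.device)
        if last is not None and last > 0 and r.delta >= factor * last:
            spikes.append(r)
        previous[r.device] = r.delta
    return spikes


# Lines copied from the newnslog.114 output above.
SAMPLE = """\
    959    7000         239630         38        5 nic_err_rx_nobufs interface(0/1) Wed May  4 01:56:14 2016
    960    7000         239642         12        1 nic_err_rx_nobufs interface(0/1) Wed May  4 01:56:21 2016
    961    7000       22390940   22151298  3164471 nic_err_rx_nobufs interface(0/1) Wed May  4 01:56:28 2016
    962       0              1          1        0 nic_err_link_tx_stalls interface(0/1) Wed May  4 01:56:28 2016
    963    7000       22390942          2        0 nic_err_rx_nobufs interface(0/1) Wed May  4 01:56:35 2016
    964       0              2          1        0 nic_err_link_tx_stalls interface(0/1) Wed May  4 01:56:35 2016
    965    7000       22390943          1        0 nic_err_rx_nobufs interface(0/1) Wed May  4 01:56:42 2016
    966       0              4          2        0 nic_err_link_tx_stalls interface(0/1) Wed May  4 01:56:42 2016
"""

if __name__ == "__main__":
    for spike in find_spikes(parse_records(SAMPLE), "nic_err_rx_nobufs"):
        print(spike.index, spike.delta, spike.timestamp)
```

On the sample above, the only record flagged is index 961, which carries the same timestamp as the first nic_err_link_tx_stalls record.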
 
 
Six packets were pending about 7 seconds before the stall hit (tx h/t=501/507):
 
**************** Tx Ring *************************
0/1: tx h/t=501/507 (0x1f5/0x1fb)
0/1: tx cur/dirty=181/180 (0xb5/0xb4)
0/1: tx ring at 0xaf340000/0x9c85528
0/1: tx ring size tx/dev_tx 512/512
XXX cur_eop = 507 dirty_eop = 501 ctx = 507 dtx = 500 dev_cur_tx = 181 dev_dirty_tx = 180
 
 
**************** Tx Ring *************************
0/1: tx h/t=0/17 (0x0/0x11)
0/1: tx cur/dirty=17/0 (0x11/0x0)
0/1: tx ring at 0xaf340000/0x9c85528
0/1: tx ring size tx/dev_tx 512/512
XXX cur_eop = 17 dirty_eop = 0 ctx = 17 dtx = 0 dev_cur_tx = 17 dev_dirty_tx = 0
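
The pending count follows from the head and tail indices in the dump. A minimal Python sketch of that arithmetic is shown below. It assumes the usual descriptor-ring convention, where entries from head up to (but not including) tail are still waiting for the hardware and indices wrap at the ring size; the function name `pending_descriptors` is hypothetical.

```python
def pending_descriptors(head: int, tail: int, ring_size: int = 512) -> int:
    """Number of descriptors between head and tail, allowing for wraparound.

    Assumes head is the next descriptor the hardware will process and tail
    is the next slot the driver will fill (an assumption, not taken from
    NetScaler source code).
    """
    if not (0 <= head < ring_size and 0 <= tail < ring_size):
        raise ValueError("head and tail must lie inside the ring")
    return (tail - head) % ring_size


if __name__ == "__main__":
    # Values taken from the "tx h/t=" lines in the Tx ring dumps.
    print(pending_descriptors(501, 507))  # 6
    print(pending_descriptors(0, 17))     # 17
    # Hypothetical wraparound case: tail has wrapped past the end of the ring.
    print(pending_descriptors(510, 2))    # 4
```

The modulo keeps the result correct when the tail index has wrapped around to the start of the 512-entry ring while the head has not.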
 
 
Some congestion drops were also seen at a different time:
 
    422       0           1571       1571      224 nic_err_congested_pkts_dropped interface(0/1) Wed May  4 11:32:19 2016
    423       0          20457      20457     2922 nic_err_congestionlimit_pkts_dropped interface(0/1) Wed May  4 11:32:19 2016
    421       0          20457      20457     2922 nic_err_tx_dropped interface(0/1) Wed May  4 11:32:19 2016
    420       0          65034      65034     9290 nic_err_tx_overflow interface(0/1) Wed May  4 11:32:19 2016
 
 
 
During the second stall, there is no jump in rx_nobufs:
 
$ nsconmsg105 -K newnslog.117 -d current -g stalls -s disptime=1
 
NetScaler NS10.5: Build 55.8.nc, Date: Jan 25 2015, 23:55:26
 
 
reltime:mili second between two records Sat May  7 09:18:04 2016
  Index   rtime totalcount-val      delta rate/sec symbol-name&device-no&time
      0 1910999              1          1        0 nic_err_link_tx_stalls interface(0/1) Sat May  7 09:18:04 2016
      1    7000              2          1        0 nic_err_link_tx_stalls interface(0/1) Sat May  7 09:18:11 2016
      2    7000              4          2        0 nic_err_link_tx_stalls interface(0/1) Sat May  7 09:18:18 2016
 
 
    999    7000          33318        223       31 nic_err_rx_nobufs interface(0/1) Sat May  7 09:09:26 2016
   1000  497000          33461        143       20 nic_err_rx_nobufs interface(0/1) Sat May  7 09:17:43 2016
   1001    7000          33797        336       48 nic_err_rx_nobufs interface(0/1) Sat May  7 09:17:50 2016
   1002    7000          34181        384       54 nic_err_rx_nobufs interface(0/1) Sat May  7 09:17:57 2016
   1003    7000          34254         73       10 nic_err_rx_nobufs interface(0/1) Sat May  7 09:18:04 2016
   1004       0              1          1        0 nic_err_link_tx_stalls interface(0/1) Sat May  7 09:18:04 2016
   1005    7000          34256          2        0 nic_err_rx_nobufs interface(0/1) Sat May  7 09:18:11 2016
   1006       0              2          1        0 nic_err_link_tx_stalls interface(0/1) Sat May  7 09:18:11 2016
   1007    7000          34257          1        0 nic_err_rx_nobufs interface(0/1) Sat May  7 09:18:18 2016
   1008       0              4          2        0 nic_err_link_tx_stalls interface(0/1) Sat May  7 09:18:18 2016
 
 
 
This time 5 packets were pending:
 
**************** Tx Ring *************************
0/1: tx h/t=63/68 (0x3f/0x44)
0/1: tx cur/dirty=480/476 (0x1e0/0x1dc)
0/1: tx ring at 0xaf340000/0x9c85528
0/1: tx ring size tx/dev_tx 512/512
XXX cur_eop = 68 dirty_eop = 63 ctx = 68 dtx = 62 dev_cur_tx = 480 dev_dirty_tx = 476
 
 
  E1000_TDH(i)    0x3810  0x3f(63)
  E1000_TDT(i)    0x3818  0x44(68)
 
 
**************** Tx Ring *************************
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx cur/dirty=14/0 (0xe/0x0)
0/1: tx ring at 0xaf340000/0x9c85528
0/1: tx ring size tx/dev_tx 512/512
XXX cur_eop = 14 dirty_eop = 0 ctx = 14 dtx = 0 dev_cur_tx = 14 dev_dirty_tx = 0
 
$ grep "tx h/t" nicdata.117
0/1: tx h/t=63/68 (0x3f/0x44)
0/1: tx h/t=63/68 (0x3f/0x44)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/12 (0x0/0xc)
0/1: tx h/t=0/12 (0x0/0xc)
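
To see how long the ring stayed in each state, the grep output can be collapsed into runs of identical head/tail pairs. This is a minimal Python sketch under the assumption that the lines appear in capture order; `parse_snapshots` and `collapse_runs` are hypothetical helper names, and the interval between snapshots is not known from the output.

```python
import re
from itertools import groupby
from typing import List, Tuple

HT_RE = re.compile(r"tx h/t=(\d+)/(\d+)")


def parse_snapshots(text: str) -> List[Tuple[int, int]]:
    """Extract (head, tail) pairs from 'tx h/t=H/T' lines, in order."""
    return [(int(m.group(1)), int(m.group(2))) for m in HT_RE.finditer(text)]


def collapse_runs(snapshots: List[Tuple[int, int]]) -> List[Tuple[Tuple[int, int], int]]:
    """Group consecutive identical snapshots into ((head, tail), count)."""
    return [(pair, len(list(group))) for pair, group in groupby(snapshots)]


# Output of: grep "tx h/t" nicdata.117 (copied from above).
GREP_OUTPUT = """\
0/1: tx h/t=63/68 (0x3f/0x44)
0/1: tx h/t=63/68 (0x3f/0x44)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/12 (0x0/0xc)
0/1: tx h/t=0/12 (0x0/0xc)
"""

if __name__ == "__main__":
    for (head, tail), count in collapse_runs(parse_snapshots(GREP_OUTPUT)):
        print(f"head={head} tail={tail} seen {count} time(s) in a row")
```

A head index that does not move across consecutive snapshots while the tail is ahead of it is consistent with descriptors not being consumed, although the snapshots alone do not prove it.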
 
 
 
The tx/rx proc loop was still being entered normally during the issue:
 
  47061    7000        9919119         28        4 vc_tot_rx_proc_enter VC(12) Sat May  7 09:17:29 2016
  47062       0       15631791         41        5 vc_tot_tx_proc_enter VC(12) Sat May  7 09:17:29 2016
  47063       0      159962632       3913      559 vc_tot_rx_proc_enter VC(13) Sat May  7 09:17:29 2016
  47064    7000        9919308        189       27 vc_tot_rx_proc_enter VC(12) Sat May  7 09:17:36 2016
  47065       0       15632002        211       30 vc_tot_tx_proc_enter VC(12) Sat May  7 09:17:36 2016
  47066       0      159966580       3948      564 vc_tot_rx_proc_enter VC(13) Sat May  7 09:17:36 2016
  47067    7000        9921593       2285      326 vc_tot_rx_proc_enter VC(12) Sat May  7 09:17:43 2016
  47068       0       15633670       1668      238 vc_tot_tx_proc_enter VC(12) Sat May  7 09:17:43 2016
  47069       0      159970707       4127      589 vc_tot_rx_proc_enter VC(13) Sat May  7 09:17:43 2016
  47070    7000        9927166       5573      796 vc_tot_rx_proc_enter VC(12) Sat May  7 09:17:50 2016
  47071       0       15637720       4050      578 vc_tot_tx_proc_enter VC(12) Sat May  7 09:17:50 2016
  47072       0      159974611       3904      557 vc_tot_rx_proc_enter VC(13) Sat May  7 09:17:50 2016
  47073    7000        9932626       5460      780 vc_tot_rx_proc_enter VC(12) Sat May  7 09:17:57 2016
  47074       0       15641695       3975      567 vc_tot_tx_proc_enter VC(12) Sat May  7 09:17:57 2016
  47075       0      159978537       3926      560 vc_tot_rx_proc_enter VC(13) Sat May  7 09:17:57 2016
  47076    7000        9933473        847      121 vc_tot_rx_proc_enter VC(12) Sat May  7 09:18:04 2016
  47077       0       15642871       1176      168 vc_tot_tx_proc_enter VC(12) Sat May  7 09:18:04 2016
  47078       0      159982327       3790      541 vc_tot_rx_proc_enter VC(13) Sat May  7 09:18:04 2016
  47079    7000       15643608        737      105 vc_tot_tx_proc_enter VC(12) Sat May  7 09:18:11 2016
  47080       0      159986280       3953      564 vc_tot_rx_proc_enter VC(13) Sat May  7 09:18:11 2016
  47081    7000       15644086        478       68 vc_tot_tx_proc_enter VC(12) Sat May  7 09:18:18 2016
  47082       0      159990053       3773      539 vc_tot_rx_proc_enter VC(13) Sat May  7 09:18:18 2016
  47083    7000      159993939       3886      555 vc_tot_rx_proc_enter VC(13) Sat May  7 09:18:25 2016
  47084    7000      159997835       3896      556 vc_tot_rx_proc_enter VC(13) Sat May  7 09:18:32 2016
  47085    7000      160001911       4076      582 vc_tot_rx_proc_enter VC(13) Sat May  7 09:18:39 2016
  47086    7000      160005795       3884      554 vc_tot_rx_proc_enter VC(13) Sat May  7 09:18:46 2016
  47087    7000      160009694       3899      557 vc_tot_rx_proc_enter VC(13) Sat May  7 09:18:53 2016