Network Connection Failure with "STA ALERT LINK" at LCD Display
Article ID: CTX213082
Description
The NetScaler network connection goes down suddenly. A reboot restores it, but the connection goes down again several days later. The NetScaler LCD display shows "STA ALERT LINK".
Resolution
Replace the NetScaler device through the RMA process.
Problem Cause
The logs show that the NIC Tx stalls repeatedly.
At the time of the first Tx stall there is a huge increase in nic_err_rx_nobufs, but as per the logic this should not cause the NIC to stall:
$ nsconmsg105 -K newnslog.114 -d current -g stalls -s disptime=1
NetScaler NS10.5: Build 55.8.nc, Date: Jan 25 2015, 23:55:26
reltime:mili second between two records Wed May 4 01:56:28 2016
Index rtime totalcount-val delta rate/sec symbol-name&device-no&time
0 776999 1 1 0 nic_err_link_tx_stalls interface(0/1) Wed May 4 01:56:28 2016
1 7000 2 1 0 nic_err_link_tx_stalls interface(0/1) Wed May 4 01:56:35 2016
2 7000 4 2 0 nic_err_link_tx_stalls interface(0/1) Wed May 4 01:56:42 2016
959 7000 239630 38 5 nic_err_rx_nobufs interface(0/1) Wed May 4 01:56:14 2016
960 7000 239642 12 1 nic_err_rx_nobufs interface(0/1) Wed May 4 01:56:21 2016
961 7000 22390940 22151298 3164471 nic_err_rx_nobufs interface(0/1) Wed May 4 01:56:28 2016
962 0 1 1 0 nic_err_link_tx_stalls interface(0/1) Wed May 4 01:56:28 2016
963 7000 22390942 2 0 nic_err_rx_nobufs interface(0/1) Wed May 4 01:56:35 2016
964 0 2 1 0 nic_err_link_tx_stalls interface(0/1) Wed May 4 01:56:35 2016
965 7000 22390943 1 0 nic_err_rx_nobufs interface(0/1) Wed May 4 01:56:42 2016
966 0 4 2 0 nic_err_link_tx_stalls interface(0/1) Wed May 4 01:56:42 2016
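The correlation above can be checked mechanically. The following is a minimal Python sketch, not a Citrix tool: it assumes the whitespace-separated column layout shown in the header line (index, rtime, total, delta, rate/sec, symbol, device, timestamp), and the names `Record`, `parse_record` and `first_stall_and_nobuf_peak` are hypothetical helpers introduced only for illustration.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    index: int
    rtime_ms: int
    total: int
    delta: int
    rate: int
    symbol: str
    device: str
    timestamp: str


def parse_record(line: str) -> Optional[Record]:
    """Parse one counter line; return None for headers and other text."""
    parts = line.split()
    if len(parts) < 8 or not parts[0].isdigit():
        return None
    try:
        index, rtime, total, delta, rate = (int(p) for p in parts[:5])
    except ValueError:
        return None
    return Record(index, rtime, total, delta, rate,
                  parts[5], parts[6], " ".join(parts[7:]))


def first_stall_and_nobuf_peak(text: str):
    """Return the first Tx stall record and the rx_nobufs record with the largest delta."""
    records = [r for r in map(parse_record, text.splitlines()) if r]
    stalls = [r for r in records if r.symbol == "nic_err_link_tx_stalls"]
    nobufs = [r for r in records if r.symbol == "nic_err_rx_nobufs"]
    first_stall = stalls[0] if stalls else None
    peak = max(nobufs, key=lambda r: r.delta) if nobufs else None
    return first_stall, peak


# A few lines copied from the output above.
SAMPLE = """\
959 7000 239630 38 5 nic_err_rx_nobufs interface(0/1) Wed May 4 01:56:14 2016
960 7000 239642 12 1 nic_err_rx_nobufs interface(0/1) Wed May 4 01:56:21 2016
961 7000 22390940 22151298 3164471 nic_err_rx_nobufs interface(0/1) Wed May 4 01:56:28 2016
962 0 1 1 0 nic_err_link_tx_stalls interface(0/1) Wed May 4 01:56:28 2016
963 7000 22390942 2 0 nic_err_rx_nobufs interface(0/1) Wed May 4 01:56:35 2016
"""

stall, peak = first_stall_and_nobuf_peak(SAMPLE)
print("first stall:", stall.timestamp)
print("rx_nobufs peak:", peak.timestamp, "delta", peak.delta)
```

On these sample lines the largest rx_nobufs delta falls in the same sample as the first stall, which is the coincidence described above.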
Six packets were pending about 7 seconds before the stall hit (tx head/tail = 501/507):
**************** Tx Ring *************************
0/1: tx h/t=501/507 (0x1f5/0x1fb)
0/1: tx cur/dirty=181/180 (0xb5/0xb4)
0/1: tx ring at 0xaf340000/0x9c85528
0/1: tx ring size tx/dev_tx 512/512
XXX cur_eop = 507 dirty_eop = 501 ctx = 507 dtx = 500 dev_cur_tx = 181 dev_dirty_tx = 180
**************** Tx Ring *************************
0/1: tx h/t=0/17 (0x0/0x11)
0/1: tx cur/dirty=17/0 (0x11/0x0)
0/1: tx ring at 0xaf340000/0x9c85528
0/1: tx ring size tx/dev_tx 512/512
XXX cur_eop = 17 dirty_eop = 0 ctx = 17 dtx = 0 dev_cur_tx = 17 dev_dirty_tx = 0
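The pending count quoted for the first dump matches the distance between the head and tail indices in the "tx h/t" line. A minimal Python sketch of that arithmetic follows. It assumes the usual descriptor-ring convention (entries from head up to, but not including, tail are still queued) and the ring size of 512 shown in the dump; `pending_descriptors` is a hypothetical helper name.

```python
def pending_descriptors(head: int, tail: int, ring_size: int = 512) -> int:
    """Number of descriptors between head and tail, allowing for wrap-around."""
    return (tail - head) % ring_size


# First dump above: tx h/t=501/507
print(pending_descriptors(501, 507))  # 6

# Hypothetical wrapped case: head near the end of the ring, tail past index 0
print(pending_descriptors(510, 2))    # 4
```

The modulo keeps the result correct when the tail has wrapped past the end of the ring while the head has not.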
Congestion drops were also seen at a different time:
422 0 1571 1571 224 nic_err_congested_pkts_dropped interface(0/1) Wed May 4 11:32:19 2016
422 0 1571 1571 224 nic_err_congested_pkts_dropped interface(0/1) Wed May 4 11:32:19 2016
423 0 20457 20457 2922 nic_err_congestionlimit_pkts_dropped interface(0/1) Wed May 4 11:32:19 2016
423 0 20457 20457 2922 nic_err_congestionlimit_pkts_dropped interface(0/1) Wed May 4 11:32:19 2016
421 0 20457 20457 2922 nic_err_tx_dropped interface(0/1) Wed May 4 11:32:19 2016
421 0 20457 20457 2922 nic_err_tx_dropped interface(0/1) Wed May 4 11:32:19 2016
420 0 65034 65034 9290 nic_err_tx_overflow interface(0/1) Wed May 4 11:32:19 2016
420 0 65034 65034 9290 nic_err_tx_overflow interface(0/1) Wed May 4 11:32:19 2016
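When reading these rows, note that the rate/sec column in the samples shown in this article is consistent with the delta divided by 7 and rounded down. The Python sketch below checks that for the four counters above. The 7-second interval is an assumption inferred from the 7000 ms spacing between records, not a documented constant, and `expected_rate` is a hypothetical helper.

```python
SAMPLE_INTERVAL_S = 7  # assumption: inferred from the 7000 ms record spacing

# (symbol, delta, rate/sec) copied from the rows above
rows = [
    ("nic_err_congested_pkts_dropped", 1571, 224),
    ("nic_err_congestionlimit_pkts_dropped", 20457, 2922),
    ("nic_err_tx_dropped", 20457, 2922),
    ("nic_err_tx_overflow", 65034, 9290),
]


def expected_rate(delta: int, interval_s: int = SAMPLE_INTERVAL_S) -> int:
    """Per-second rate as integer division of the delta by the sampling interval."""
    return delta // interval_s


for symbol, delta, rate in rows:
    status = "matches" if expected_rate(delta) == rate else "differs"
    print(f"{symbol}: delta={delta} computed={expected_rate(delta)} logged={rate} ({status})")
```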
During the second stall, there is no jump in nic_err_rx_nobufs:
$ nsconmsg105 -K newnslog.117 -d current -g stalls -s disptime=1
NetScaler NS10.5: Build 55.8.nc, Date: Jan 25 2015, 23:55:26
reltime:mili second between two records Sat May 7 09:18:04 2016
Index rtime totalcount-val delta rate/sec symbol-name&device-no&time
0 1910999 1 1 0 nic_err_link_tx_stalls interface(0/1) Sat May 7 09:18:04 2016
1 7000 2 1 0 nic_err_link_tx_stalls interface(0/1) Sat May 7 09:18:11 2016
2 7000 4 2 0 nic_err_link_tx_stalls interface(0/1) Sat May 7 09:18:18 2016
999 7000 33318 223 31 nic_err_rx_nobufs interface(0/1) Sat May 7 09:09:26 2016
1000 497000 33461 143 20 nic_err_rx_nobufs interface(0/1) Sat May 7 09:17:43 2016
1001 7000 33797 336 48 nic_err_rx_nobufs interface(0/1) Sat May 7 09:17:50 2016
1002 7000 34181 384 54 nic_err_rx_nobufs interface(0/1) Sat May 7 09:17:57 2016
1003 7000 34254 73 10 nic_err_rx_nobufs interface(0/1) Sat May 7 09:18:04 2016
1004 0 1 1 0 nic_err_link_tx_stalls interface(0/1) Sat May 7 09:18:04 2016
1005 7000 34256 2 0 nic_err_rx_nobufs interface(0/1) Sat May 7 09:18:11 2016
1006 0 2 1 0 nic_err_link_tx_stalls interface(0/1) Sat May 7 09:18:11 2016
1007 7000 34257 1 0 nic_err_rx_nobufs interface(0/1) Sat May 7 09:18:18 2016
1008 0 4 2 0 nic_err_link_tx_stalls interface(0/1) Sat May 7 09:18:18 2016
This time, five packets were pending (tx head/tail = 63/68):
**************** Tx Ring *************************
0/1: tx h/t=63/68 (0x3f/0x44)
0/1: tx cur/dirty=480/476 (0x1e0/0x1dc)
0/1: tx ring at 0xaf340000/0x9c85528
0/1: tx ring size tx/dev_tx 512/512
XXX cur_eop = 68 dirty_eop = 63 ctx = 68 dtx = 62 dev_cur_tx = 480 dev_dirty_tx = 476
E1000_TDH(i) 0x3810 0x3f(63)
E1000_TDT(i) 0x3818 0x44(68)
**************** Tx Ring *************************
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx cur/dirty=14/0 (0xe/0x0)
0/1: tx ring at 0xaf340000/0x9c85528
0/1: tx ring size tx/dev_tx 512/512
XXX cur_eop = 14 dirty_eop = 0 ctx = 14 dtx = 0 dev_cur_tx = 14 dev_dirty_tx = 0
$ grep "tx h/t" nicdata.117
0/1: tx h/t=63/68 (0x3f/0x44)
0/1: tx h/t=63/68 (0x3f/0x44)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/12 (0x0/0xc)
0/1: tx h/t=0/12 (0x0/0xc)
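The grep output can also be scanned for a head index that stays put while descriptors are queued, which is what a Tx stall would be expected to look like. The following Python sketch is illustrative only: it assumes each "tx h/t" line is a separate sample of the same ring and that the ring has 512 entries, and `head_tail_samples` and `stuck_head_runs` are hypothetical helper names.

```python
import re

HT_RE = re.compile(r"tx h/t=(\d+)/(\d+)")


def head_tail_samples(text: str):
    """Extract (head, tail) pairs from 'tx h/t=H/T' lines."""
    return [(int(m.group(1)), int(m.group(2))) for m in HT_RE.finditer(text)]


def stuck_head_runs(samples, ring_size: int = 512):
    """Group consecutive samples with the same head index.

    Returns (head, sample_count, max_pending) for every run that is longer
    than one sample and has at least one descriptor still queued.
    """
    runs = []
    for head, tail in samples:
        pending = (tail - head) % ring_size
        if runs and runs[-1][0] == head:
            h, count, max_pending = runs[-1]
            runs[-1] = (h, count + 1, max(max_pending, pending))
        else:
            runs.append((head, 1, pending))
    return [run for run in runs if run[1] > 1 and run[2] > 0]


# The grep output shown above.
GREP_OUTPUT = """\
0/1: tx h/t=63/68 (0x3f/0x44)
0/1: tx h/t=63/68 (0x3f/0x44)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/14 (0x0/0xe)
0/1: tx h/t=0/12 (0x0/0xc)
0/1: tx h/t=0/12 (0x0/0xc)
"""

for head, count, pending in stuck_head_runs(head_tail_samples(GREP_OUTPUT)):
    print(f"head={head} unchanged for {count} samples, up to {pending} descriptors queued")
```

Because the time between dumps is not shown in the grep output, a repeated head value is only a hint and should be read together with the nic_err_link_tx_stalls counter.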
The tx/rx proc loops were still being entered without any problem during the issue:
47061 7000 9919119 28 4 vc_tot_rx_proc_enter VC(12) Sat May 7 09:17:29 2016
47062 0 15631791 41 5 vc_tot_tx_proc_enter VC(12) Sat May 7 09:17:29 2016
47063 0 159962632 3913 559 vc_tot_rx_proc_enter VC(13) Sat May 7 09:17:29 2016
47064 7000 9919308 189 27 vc_tot_rx_proc_enter VC(12) Sat May 7 09:17:36 2016
47065 0 15632002 211 30 vc_tot_tx_proc_enter VC(12) Sat May 7 09:17:36 2016
47066 0 159966580 3948 564 vc_tot_rx_proc_enter VC(13) Sat May 7 09:17:36 2016
47067 7000 9921593 2285 326 vc_tot_rx_proc_enter VC(12) Sat May 7 09:17:43 2016
47068 0 15633670 1668 238 vc_tot_tx_proc_enter VC(12) Sat May 7 09:17:43 2016
47069 0 159970707 4127 589 vc_tot_rx_proc_enter VC(13) Sat May 7 09:17:43 2016
47070 7000 9927166 5573 796 vc_tot_rx_proc_enter VC(12) Sat May 7 09:17:50 2016
47071 0 15637720 4050 578 vc_tot_tx_proc_enter VC(12) Sat May 7 09:17:50 2016
47072 0 159974611 3904 557 vc_tot_rx_proc_enter VC(13) Sat May 7 09:17:50 2016
47073 7000 9932626 5460 780 vc_tot_rx_proc_enter VC(12) Sat May 7 09:17:57 2016
47074 0 15641695 3975 567 vc_tot_tx_proc_enter VC(12) Sat May 7 09:17:57 2016
47075 0 159978537 3926 560 vc_tot_rx_proc_enter VC(13) Sat May 7 09:17:57 2016
47076 7000 9933473 847 121 vc_tot_rx_proc_enter VC(12) Sat May 7 09:18:04 2016
47077 0 15642871 1176 168 vc_tot_tx_proc_enter VC(12) Sat May 7 09:18:04 2016
47078 0 159982327 3790 541 vc_tot_rx_proc_enter VC(13) Sat May 7 09:18:04 2016
47079 7000 15643608 737 105 vc_tot_tx_proc_enter VC(12) Sat May 7 09:18:11 2016
47080 0 159986280 3953 564 vc_tot_rx_proc_enter VC(13) Sat May 7 09:18:11 2016
47081 7000 15644086 478 68 vc_tot_tx_proc_enter VC(12) Sat May 7 09:18:18 2016
47082 0 159990053 3773 539 vc_tot_rx_proc_enter VC(13) Sat May 7 09:18:18 2016
47083 7000 159993939 3886 555 vc_tot_rx_proc_enter VC(13) Sat May 7 09:18:25 2016
47084 7000 159997835 3896 556 vc_tot_rx_proc_enter VC(13) Sat May 7 09:18:32 2016
47085 7000 160001911 4076 582 vc_tot_rx_proc_enter VC(13) Sat May 7 09:18:39 2016
47086 7000 160005795 3884 554 vc_tot_rx_proc_enter VC(13) Sat May 7 09:18:46 2016
47087 7000 160009694 3899 557 vc_tot_rx_proc_enter VC(13) Sat May 7 09:18:53 2016