Error:nsmap[1384]: Working socket got closed with the error: Connection reset by peer

Article ID: CTX209480

Description

Issue:
The customer is running NetScaler build 10.1 132.x and continuously observes the following entries in ns.log. If the location file is removed, the error does not appear for some time. Note that no errors are reported for nsmap with the location entry. The reason for this behavior needs to be identified. The CPE team has recommended OB to move the investigation forward.


Environment:
show hardware

        Platform: NSMPX-11500 12*CPU+8*IX+4*E1K+2*E1K+2*CVM N3 1400210
        Manufactured on: 12/17/2014
        CPU: 2400MHZ
        Host Id: 872841350
        Serial no: D1DH52EZ3U
        Encoded serial no: D1DH52EZ3U


Configuration in place:
add locationFile "/var/netscaler/locdb/GeoIPCountryWhois.csv" -format geoip-country
add cs policy mapi-CN-switch -rule "CLIENT.IP.SRC.MATCHES_LOCATION(\"*.CN.*.*.*.*\") || CLIENT.IP.SRC.MATCHES_LOCATION(\"*.TW.*.*.*.*\")"
add cs policy www-CN-switch -rule "CLIENT.IP.SRC.MATCHES_LOCATION(\"*.CN.*.*.*.*\") || CLIENT.IP.SRC.MATCHES_LOCATION(\"*.TW.*.*.*.*\")"

ns.log:Dec 25 10:17:35 <local0.info> L41 nsmap[1384]: Working socket got closed with the error: Connection reset by peer
ns.log:Dec 25 10:26:59 <local0.info> L41 nsmap[1384]: Connection to PPE 4 closed on timeout. Idle time: 622909816

Temporary workaround:
------------------------------
1. Remove the location file entry (rm locationfile), delete GeoIPCountryWhois.csv from the device, upload a new version of the CSV file, then add the location file again (add locationfile).
2. Test the location file (nsmap -d -t / sh location para).
3. Reboot the MPX.

This works without errors for a few days, after which the error returns:

ns.log:Dec 25 10:17:35 <local0.info> L41 nsmap[1384]: Working socket got closed with the error: Connection reset by peer
ns.log:Dec 25 10:26:59 <local0.info> L41 nsmap[1384]: Connection to PPE 4 closed on timeout. Idle time: 622909816
ns.log:Dec 25 10:55:59 <local0.info> L41 nsmap[1384]: Connection to PPE 2 closed on timeout. Idle time: 637019591
ns.log:Dec 25 10:57:59 <local0.info> L41 nsmap[1384]: Connection to PPE 5 closed on timeout. Idle time: 645399764
ns.log:Dec 25 11:26:20 <local0.info> L41 nsmap[1384]: Working socket got closed with the error: Connection reset by peer
ns.log:Dec 25 11:45:59 <local0.info> L41 nsmap[1384]: Connection to PPE 5 closed on timeout. Idle time: 617648437
ns.log:Dec 25 11:56:59 <local0.info> L41 nsmap[1384]: Connection to PPE 0 closed on timeout. Idle time: 639441617
ns.log:Dec 25 12:04:01 <local0.info> L41 nsmap[1384]: Working socket got closed with the error: Connection reset by peer
ns.log:Dec 25 12:09:59 <local0.info> L41 nsmap[1384]: Connection to PPE 0 closed on timeout. Idle time: 612231470

Resolution

Although this behavior is as per design, an enhancement request has been filed to address the issue:
     ENH0629590      NSMAP: Implement keep alive mechanism between NSMAP to PPE communication
 
 

Problem Cause

As per the current design, a PE does not close the connection once a request is served, so the connection can be reused for subsequent search requests. If the connection stays idle for 10 minutes, it is closed by the PCB zombie cleanup routine.
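As a rough cross-check of the 10-minute figure against the timeout entries in ns.log, the following sketch converts the logged idle time to minutes. It assumes the "Idle time" value is in microseconds; the article does not state the unit, and the helper name `idle_minutes` is hypothetical.

```python
import re

# Matches the nsmap timeout entries quoted from ns.log in this article.
TIMEOUT_RE = re.compile(
    r"Connection to PPE (\d+) closed on timeout\. Idle time: (\d+)"
)

def idle_minutes(line, units_per_second=1_000_000):
    """Return (ppe, idle_minutes) for a timeout entry, or None if no match.

    units_per_second=1_000_000 ASSUMES the counter is in microseconds.
    """
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    ppe = int(m.group(1))
    idle = int(m.group(2))
    return ppe, idle / units_per_second / 60

sample = (
    "ns.log:Dec 25 10:26:59 <local0.info> L41 nsmap[1384]: "
    "Connection to PPE 4 closed on timeout. Idle time: 622909816"
)
ppe, minutes = idle_minutes(sample)
print(f"PPE {ppe}: idle for about {minutes:.1f} minutes")
```

Under that assumption, the logged value for PPE 4 works out to a little over 10 minutes, which is consistent with the idle timeout described above.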

In this case, PE-3 alone makes most of the NSMAP requests; on all other PEs the request frequency is very low, so their connections are the ones that reach the idle timeout. One point to note is that location entries have an owner, and only the owner PE makes requests to NSMAP. Requests from non-owner PEs must go via the owner PE. Ownership is decided by a hash of the IP address. For that reason, we suspect that most of the requests come from one IP range whose ownership falls on PE-3.
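To illustrate how hash-based ownership can concentrate requests on a single PE, here is a toy sketch. The hash used (the /24 prefix of the address modulo the number of PEs) is purely hypothetical and is not NetScaler's actual algorithm, which this article does not describe; the client addresses come from a documentation range.

```python
import ipaddress
from collections import Counter

NUM_PES = 7  # the article states that this MPX box has 7 PEs

def owner_pe(ip, num_pes=NUM_PES):
    """Hypothetical owner selection based on the /24 prefix of the address.

    This is NOT the real NetScaler hash; it only illustrates the idea that
    clients from one IP range can all map to the same owner PE.
    """
    prefix = int(ipaddress.IPv4Address(ip)) >> 8
    return prefix % num_pes

# All clients come from a single /24 range (illustrative addresses).
clients = [f"203.0.113.{host}" for host in range(1, 255)]
owners = Counter(owner_pe(ip) for ip in clients)
print(owners)  # a single owner PE receives every lookup
```

With such a scheme, traffic can arrive on every PE while all location lookups are still issued by one owner PE, which matches the pattern suspected here.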


 
The customer is using an MPX box with 7 PEs.

NetScaler is receiving traffic on all PEs (the pcb_hits counter provides the supporting proof), but only PE-3 is receiving all the DB requests.
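The comparison can be reproduced from the counter records listed further down in this article. The sketch below parses a few of those records and computes PE-3's share of each counter; the regular expression and the function name `latest_totals` are assumptions made for illustration, not part of any NetScaler tool.

```python
import re
from collections import defaultdict

# index, rtime, total, delta, rate/sec, counter name, timestamp, device
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(\d+)\s+(.+?)\s+"
    r"\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4} \((PE-\d+|Aggr)\)\s*$"
)

def latest_totals(lines):
    """Map counter name -> device -> most recent total value."""
    totals = defaultdict(dict)
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip headers and anything that is not a record
        name, device = m.group(6), m.group(7)
        totals[name][device] = int(m.group(3))  # later records overwrite
    return totals

# Records copied from the counter output in this article.
sample = [
    "    388       0       30304784        136       19 gslb_tot_sp_db_req_search  Mon Dec 28 00:01:51 2015 (PE-3)",
    "    395       0       30315122        136       19 gslb_tot_sp_db_req_search  Mon Dec 28 00:01:51 2015 (Aggr)",
    "    389       0         166056          2        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:51 2015 (PE-3)",
    "    394       0        1118389         15        2 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:51 2015 (Aggr)",
]

totals = latest_totals(sample)
db = totals["gslb_tot_sp_db_req_search"]
hits = totals["pcb_hits cs_pol(www-CN-switch)(tmon-www-80)"]
db_share = db["PE-3"] / db["Aggr"]
hit_share = hits["PE-3"] / hits["Aggr"]
print(f"PE-3 share of DB search requests: {db_share:.2%}")
print(f"PE-3 share of policy hits: {hit_share:.2%}")
```

For these records, PE-3 accounts for nearly all of the DB search requests but only about one seventh of the policy hits, which is the imbalance described above.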
 
There are two possible reasons for PE-3 to receive all the DB requests:
  1. All the source/client IPs hash to PE-3.
  2. The requests themselves are landing on PE-3.

We believe all the client traffic comes from the same IP range.
 
reltime:mili second between two records Mon Dec 28 00:01:44 2015
  Index   rtime totalcount-val      delta rate/sec symbol-name&device-no&time
    378       0         166054          1        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:44 2015 (PE-3)
    379       0         166048          2        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:44 2015 (PE-4)
    380       0         166239          2        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:44 2015 (PE-5)
    381       0         165744          1        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:44 2015 (PE-6)
    382       0        1118374          7        1 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:44 2015 (Aggr)
    383       0         434367          1        0 pcb_hits cs_pol(mapi-CN-switch)(tmon-mapi-80) Mon Dec 28 00:01:44 2015 (Aggr)
    384       0       30314986        158       22 gslb_tot_sp_db_req_search  Mon Dec 28 00:01:44 2015 (Aggr)
    385    7000         122385          2        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:51 2015 (PE-0)
    386       0         165614          4        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:51 2015 (PE-1)
    387       0         166297          1        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:51 2015 (PE-2)
    388       0       30304784        136       19 gslb_tot_sp_db_req_search  Mon Dec 28 00:01:51 2015 (PE-3)
    389       0         166056          2        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:51 2015 (PE-3)
    390       0         166051          3        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:51 2015 (PE-4)
    391       0          64547          2        0 pcb_hits cs_pol(mapi-CN-switch)(tmon-mapi-80) Mon Dec 28 00:01:51 2015 (PE-4)
    392       0         166240          1        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:51 2015 (PE-5)
    393       0         165746          2        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:51 2015 (PE-6)
    394       0        1118389         15        2 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:51 2015 (Aggr)
    395       0       30315122        136       19 gslb_tot_sp_db_req_search  Mon Dec 28 00:01:51 2015 (Aggr)
    396       0         434369          2        0 pcb_hits cs_pol(mapi-CN-switch)(tmon-mapi-80) Mon Dec 28 00:01:51 2015 (Aggr)
    397    7000         122387          2        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:58 2015 (PE-0)
    398       0         165615          1        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:58 2015 (PE-1)
    399       0         166299          2        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:58 2015 (PE-2)
    400       0          64576          1        0 pcb_hits cs_pol(mapi-CN-switch)(tmon-mapi-80) Mon Dec 28 00:01:58 2015 (PE-2)
    401       0       30304946        162       23 gslb_tot_sp_db_req_search  Mon Dec 28 00:01:58 2015 (PE-3)
    402       0         166057          1        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:58 2015 (PE-3)
    403       0         166054          3        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:58 2015 (PE-4)
    404       0         166243          3        0 pcb_hits cs_pol(www-CN-switch)(tmon-www-80) Mon Dec 28 00:01:58 2015 (PE-5)