NetScaler StoreFront Monitor Probe Fails on StoreFront 3.5

Article ID: CTX208406

Description

When we bind the StoreFront monitor to our StoreFront 3.5 servers, an entry reporting "Failure - Probe failed" appears on the dashboard and in the system log every hour.

Resolution

Changing the monitor's interval between successive probes to 10 seconds and its response time-out to 5 seconds fixed the issue.

This tells us that the monitor script is working and that the NetScaler can reach the StoreFront servers, but the response does not arrive in time for the server to be marked UP.
This explains why we see the monitor flap between DOWN and UP within such a short period. A longer response time-out gives the server more time to answer the probe.

To understand how to monitor Citrix StoreFront, you can refer to: http://docs.citrix.com/en-us/netscaler/11/traffic-management/load-balancing/load-balancing-builtin-monitors/monitor-citrix-sf-services.html


Problem Cause

Looking through the newnslog messages, we see the following entries:

2155 14856 PPE-0 MonServiceBinding_SRV-BE-DI-0134.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0134_SSL): DOWN; Last response: Failure - Probe failed. Tue Mar  1 17:41:53 2016
2165    21 PPE-0 MonServiceBinding_SRV-BE-DI-0135.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0135_SSL): DOWN; Last response: Failure - Probe failed. Tue Mar  1 17:42:19 2016
2195  2032 PPE-0 MonServiceBinding_SRV-BE-DI-0134.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0134_SSL): DOWN; Last response: Failure - Probe failed. Tue Mar  1 18:19:15 2016
2202     7 PPE-0 MonServiceBinding_SRV-BE-DI-0134.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0134_SSL): DOWN; Last response: Failure - Probe failed. Tue Mar  1 18:23:20 2016
2203     7 PPE-0 DBSMonServiceBinding_SRV-BE-DI-0134.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0134_SSL): DOWN; Last response: Failure - Probe failed. Tue Mar  1 18:23:25 2016
2210   231 PPE-0 MonServiceBinding_SRV-BE-DI-0135.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0135_SSL): DOWN; Last response: Failure - Probe failed. Tue Mar  1 18:27:18 2016
2222     0 PPE-0 MonServiceBinding_SRV-BE-DI-0135.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0135_SSL): DOWN; Last response: Failure - Probe failed. Tue Mar  1 18:46:43 2016
2223    14 PPE-0 DBSMonServiceBinding_SRV-BE-DI-0135.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0135_SSL): DOWN; Last response: Failure - Probe failed. Tue Mar  1 18:46:53 2016
2275    28 PPE-0 MonServiceBinding_SRV-BE-DI-0135.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0135_SSL): DOWN; Last response: Failure - Probe failed. Tue Mar  1 18:49:28 2016
2333   168 PPE-0 MonServiceBinding_SRV-BE-DI-0135.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0135_SSL): DOWN; Last response: Failure - Probe failed. Tue Mar  1 19:04:02 2016
2335   196 PPE-0 MonServiceBinding_SRV-BE-DI-0135.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0135_SSL): UP; Last response: Success - Probe succeeded. Tue Mar  1 19:07:20 2016
2337    70 PPE-0 MonServiceBinding_SRV-BE-DI-0135.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0135_SSL): DOWN; Last response: Failure - Probe failed. Tue Mar  1 19:08:30 2016
2349     7 PPE-0 MonServiceBinding_SRV-BE-DI-0134.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0134_SSL): DOWN; Last response: Failure - Probe failed. Tue Mar  1 19:08:47 2016
2350    28 PPE-0 MonServiceBinding_SRV-BE-DI-0135.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0135_SSL): UP; Last response: Success - Probe succeeded. Tue Mar  1 19:09:12 2016
2352     7 PPE-0 MonServiceBinding_SRV-BE-DI-0134.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0134_SSL): DOWN; Last response: Failure - Probe failed. Tue Mar  1 19:09:22 2016
2375    14 PPE-0 MonServiceBinding_SRV-BE-DI-0134.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0134_SSL): DOWN; Last response: Failure - Probe failed. Tue Mar  1 19:11:43 2016
2377   315 PPE-0 MonServiceBinding_SRV-BE-DI-0134.XXXXXXX.COM:443_(mon_storefront)(svc_srv-be-di-0134_SSL): UP; Last response: Success - Probe succeeded. Tue Mar  1 19:16:54 2016

As these entries show, the probes fail more often than once an hour. However, the newnslog entries do not give a reason for the failures.
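To check how often the probes actually fail, the newnslog entries above can be summarized with a short script. The following is a minimal Python sketch, not a Citrix tool: the regular expression is inferred from the sample lines only, the function names parse_events and down_gaps_seconds are hypothetical, and skipping the DBSMonServiceBinding lines (to avoid counting what looks like the same event twice) is an assumption.

```python
import re
from datetime import datetime

# Field layout inferred from the sample entries above, not from a documented
# format. The leading \b skips "DBSMonServiceBinding" lines (an assumption,
# made so that an event is not counted twice).
LINE_RE = re.compile(
    r"\bMonServiceBinding_(?P<binding>\S+?)_\((?P<monitor>[^)]+)\)"
    r"\((?P<service>[^)]+)\):\s+(?P<state>UP|DOWN);.*?"
    r"(?P<ts>\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2} \d{4})\s*$"
)


def parse_events(lines):
    """Return (service, state, timestamp) tuples for matching log lines."""
    events = []
    for line in lines:
        match = LINE_RE.search(line)
        if not match:
            continue
        # Collapse the double space before single-digit days ("Mar  1").
        # Day and month names are parsed with the current locale (English assumed).
        stamp = " ".join(match.group("ts").split())
        when = datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y")
        events.append((match.group("service"), match.group("state"), when))
    return events


def down_gaps_seconds(events):
    """Seconds between consecutive DOWN entries, grouped by service."""
    last_down = {}
    gaps = {}
    for service, state, when in events:
        if state != "DOWN":
            continue
        if service in last_down:
            delta = int((when - last_down[service]).total_seconds())
            gaps.setdefault(service, []).append(delta)
        last_down[service] = when
    return gaps
```

Passing the exported log lines to parse_events and then to down_gaps_seconds gives, for each service, the time in seconds between consecutive DOWN entries, which makes it easy to see whether the failures really follow an hourly pattern.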

We also checked nsumond.log, which contains many entries showing the nssf.pl script failing for different reasons:

Wed Mar  2 20:54:52 2016: /netscaler/monitors/nssf.pl Script failed. Exit code : 1 (Partition ID: 0)
Wed Mar  2 20:54:52 2016: /netscaler/monitors/nssf.pl Exit Reason : (404 Not Found) (Partition ID: 0)
 
Wed Mar  2 21:05:22 2016: /netscaler/monitors/nssf.pl Script failed. Exit code : 1 (Partition ID: 0)
Wed Mar  2 21:05:22 2016: /netscaler/monitors/nssf.pl Exit Reason : (Citrix Peer Resolution Service CitrixConfigurationReplication CitrixCredentialWallet CitrixDefaultDomainService CitrixSubscriptionsStore WAS W3SVC stopped running.Degraded Services.) (Partition ID: 0)
 
Fri Mar  4 14:15:23 2016: /netscaler/monitors/nssf.pl Script failed. Exit code : 1 (Partition ID: 0)
Fri Mar  4 14:15:23 2016: /netscaler/monitors/nssf.pl Exit Reason : (200 OK) (Partition ID: 0)

However, the main reason for the failures is the following:

Wed Mar  2 19:34:38 2016: /netscaler/monitors/nssf.pl Script failed. Exit code : 1 (Partition ID: 0)
Wed Mar  2 19:34:38 2016: /netscaler/monitors/nssf.pl Exit Reason : (500 Can't connect to 192.168.200.135:443) (Partition ID: 0)
Wed Mar  2 19:34:43 2016: /netscaler/monitors/nssf.pl Script failed. Exit code : 1 (Partition ID: 0)
Wed Mar  2 19:34:43 2016: /netscaler/monitors/nssf.pl Exit Reason : (500 Can't connect to 192.168.200.135:443) (Partition ID: 0)
Wed Mar  2 19:34:58 2016: /netscaler/monitors/nssf.pl Script failed. Exit code : 1 (Partition ID: 0)
Wed Mar  2 19:34:58 2016: /netscaler/monitors/nssf.pl Exit Reason : (500 Can't connect to 192.168.200.134:443) (Partition ID: 0)
Wed Mar  2 20:07:49 2016: /netscaler/monitors/nssf.pl Script failed. Exit code : 1 (Partition ID: 0)
Wed Mar  2 20:07:49 2016: /netscaler/monitors/nssf.pl Exit Reason : (500 Can't connect to 192.168.200.135:443) (Partition ID: 0)
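To confirm which exit reason dominates, the nsumond.log entries can be tallied. The following is a minimal Python sketch: the pattern is inferred from the sample lines above, the function name tally_exit_reasons is hypothetical, and stripping the trailing IP:port so that connection failures to different servers are counted together is an assumption made for illustration.

```python
import re
from collections import Counter

# Pattern inferred from the "Exit Reason" lines shown above.
REASON_RE = re.compile(
    r"Exit Reason : \((?P<reason>.*)\) \(Partition ID: \d+\)\s*$"
)
# Trailing "IP:port" target, e.g. " 192.168.200.135:443".
TARGET_RE = re.compile(r"\s+\d{1,3}(?:\.\d{1,3}){3}:\d+$")


def tally_exit_reasons(lines):
    """Count nssf.pl exit reasons, ignoring which server was probed."""
    counts = Counter()
    for line in lines:
        match = REASON_RE.search(line)
        if match:
            # Drop the target address so all "Can't connect" entries group together.
            reason = TARGET_RE.sub("", match.group("reason"))
            counts[reason] += 1
    return counts
```

Calling tally_exit_reasons on the lines of nsumond.log and then most_common() on the result lists the reasons by frequency, so a dominant "500 Can't connect to" count stands out against occasional 404 or service-related failures.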
