NetScaler GSLB Deployment: GSLB vserver falling back to RR LB instead of configured RTT LB and LDNS entries being flushed
Article ID: CTX223269
Description
Problem Description:
In a multi-site GSLB deployment, the GSLB vserver is configured with the RTT load balancing method but still serves requests in round robin (RR) fashion. Because RTT-based load balancing is not applied, DNS lookups do not land consistently in one geography, which breaks customer applications that depend on low latency and geographic proximity. As a side effect, the LDNS entries are flushed and are absent on the site.
Resolution
Workaround:
Short-term Workaround
- As soon as the MEP goes down for one site, the GSLB services associated with that site need to be disabled on all other sites so that they are not considered for RTT selection. The remaining GSLB services can then still take part in the RTT selection.
- If the services are disabled on only one site, a GSLB sync must be performed to all the other GSLB sites so that the configuration is consistent across all sites.
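The sync step above can be sketched as follows. It assumes the commands are run on the site where the services were disabled, since the site that initiates the sync pushes its configuration to the others. The preview run lists the pending differences without applying them.

```
sync gslb config -preview
sync gslb config
```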
Long Term Design Solution
The GSLB RTT method can work with NetScaler-based load balancers as well as non-NetScaler load balancers. The current design needs to be adjusted so that the state of the GSLB services and the RTT information are sent via MEP. This keeps the state consistent across sites, rather than having each site derive a possibly different state from its own external monitor probes for the participating GSLB services.
This can be achieved by utilizing the following configuration flow:
- Configure a load balancing vserver on the local site GSLB layer NetScalers, with the services that are configured on the other load balancers added as external services. If more than one port needs to be monitored, for example ports 80 and 443, two monitor probes can be bound to the services, which in turn are bound to the load balancing virtual server. The LB VIP is only a placeholder VIP and can have an internal IP. This needs to be done on the local site for the corresponding services.
- The IPs hosted on the non-NetScaler load balancers are bound as services to the LB VIP on the local site GSLB layer NetScalers.
- The GSLB service configured on the local GSLB site references the LB vserver created in the first step, but its public IP is the same as the IP hosted on the non-NetScaler load balancer. This ensures that the non-NetScaler load balancer IP is returned in the GSLB decision.
- The monitor probes are bound only to the services behind the LB VIP on the local site, and not to the GSLB services on the other GSLB layer NetScalers. This also reduces the number of monitoring probes that are sent out, because probes originate only from the local site.
- In this scenario, if the MEP goes down, the state of the corresponding LB VIP, and hence of the GSLB service, is also treated as down and the service is not considered for RTT selection. The GSLB services belonging to the other sites still take part in the RTT selection, and the GSLB method does not default to round robin. When the GSLB site comes back up, the MEP comes back up immediately, the state is communicated to the remaining sites, and the service should start taking part in the GSLB decision again. Internally this should be instantaneous; the time taken to propagate to all sites depends on the RTT between the nodes. The information is sent to all nodes simultaneously, because they have MEP communication established.
Configuration on the local site
- Create an LB vserver on the local site and bind the non-NetScaler layer IPs to it as services. The GSLB service on the local site refers to the LB vserver but has a public IP that is the same as the SLB layer IP.
add lb vserver dummy-vip HTTP xxx.xxx.xxx.xxx 80
add gslb service gslb_svc1 xxx.xxx.xxx.xxx HTTP 80 -publicIP x.x.x.x -publicPort 80 -maxClient 0 -healthMonitor NO -siteName LOCAL -cltTimeout 180 -svrTimeout 360 -downStateFlush DISABLED -appflowLog DISABLED
add service externalsvc x.x.x.x HTTP 80
bind lb vserver dummy-vip externalsvc
bind service externalsvc -monitorname http
bind service externalsvc -monitorname https
- The external monitoring configuration needs to be removed from all GSLB services. The monitors will be configured only on the services that are bound to the LB entity created on the local site that in turn is referenced by the GSLB service.
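As a hedged sketch of this cleanup step, the commands below assume that monitors named http and https were previously bound directly to the GSLB service gslb_svc1; substitute the monitor names actually listed by the first command. The last command confirms that the monitors remain bound to the external service behind the LB vserver.

```
show gslb service gslb_svc1
unbind gslb service gslb_svc1 -monitorName http
unbind gslb service gslb_svc1 -monitorName https
show service externalsvc
```

Repeat the unbind on every site where the GSLB service carries external monitors.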
Problem Cause
Root cause:
- The gslb_tot_mep_ldns_entries_received counter is incremented, indicating that MEP is UP and that the local site is receiving LDNS entries from the other sites.
- One of the conditions leading to this issue is that at least one of the multiple sites is down, while all services for that site typically still show as UP because the corresponding monitors are UP.
- One of these services from the remote down site is bound to the vserver on the local site that is doing RR LB instead of the configured RTT LB.
- The vsvr_gslb_rr_contribs counter is incremented to a non-zero value on the local site vserver that is doing RR LB instead of the configured RTT LB. This counter represents the number of services that are contributing to deferred round robin LB because of the MEP failure. That many services therefore affect the vserver, preventing it from applying the configured RTT LB method and causing it to fall back to RR LB.
- The short-term remedy is to disable these services from the remote down site on the local site where the vserver is falling back to RR LB.
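The counters and states named above can be checked on the local site roughly as follows. This is a sketch that assumes the default log location /var/nslog/newnslog; nsconmsg is run from the shell, and the show commands from the NetScaler CLI.

```
shell
nsconmsg -K /var/nslog/newnslog -d current -g vsvr_gslb_rr_contribs
nsconmsg -K /var/nslog/newnslog -d current -g gslb_tot_mep_ldns_entries_received
exit
show gslb site
show gslb ldnsentries
```

A non-zero vsvr_gslb_rr_contribs value, together with a site whose metric exchange status is down in the show gslb site output, matches the condition described in this section.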
Additional Information
Recommended upgrade procedure(s):
An upgrade is not applicable, because no NetScaler code change is required for this behavior. The workaround is a configuration change.