Case Study: Authentication Fails Periodically

Document ID: CTX114008   /   Created On: Aug 30, 2007   /   Updated On: Oct 12, 2007

Problem Definition

The customer reported that every few days RADIUS users were unable to authenticate. When this happened, the customer examined the traffic arriving at the RADIUS server and saw that no authentication requests were coming from the NetScaler. Restarting the NetScaler resolved the issue for about a week, and unbinding the secondary and then rebinding it resolved it for about a day. The problem was reported on 7.0 build 49.5, which had been upgraded from 43.1.

Environment

  • NetScaler 7.0 build 49.5
  • Authenticating with a RADIUS server

Troubleshooting Methodology

When the problem started occurring, the output of aaad.debug.log was examined to determine whether nsaaad was trying to contact the RADIUS server. At the same time, a trace was taken to look at the traffic being sent from the NetScaler to the RADIUS server, and newnslog was examined for hints about the behavior of the nsaaad process. The trace showed that no packets were being sent from the NetScaler to the RADIUS server.

Engineering confirmed that the symptoms matched a known problem documented in BUG25686. The bug describes a condition in which the NetScaler runs out of connections to the process that communicates with the RADIUS server for authentication. When the NetScaler enters dialogue mode and the client never completes the conversation, one aaad connection is left in a state where it cannot be reused. Once all 10 connections (the maximum) are in this state, communication with the RADIUS server is lost.
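The exhaustion described above can be illustrated with a small model. This is only a sketch of the reported behavior, not NetScaler code: the class name AaadPool and its methods are hypothetical, and the only figure taken from the article is the limit of 10 connections.

```python
MAX_AAAD_CONNECTIONS = 10  # connection limit stated in the article


class AaadPool:
    """Toy model of the fixed set of connections to the nsaaad process."""

    def __init__(self, size=MAX_AAAD_CONNECTIONS):
        self.size = size
        self.stuck = 0  # connections stranded by abandoned dialogues

    def authenticate(self, dialogue_abandoned=False):
        """Return True if a request could be forwarded to the RADIUS server."""
        if self.stuck >= self.size:
            # No usable connection is left, so nothing reaches the RADIUS server.
            return False
        if dialogue_abandoned:
            # The client never finished the dialogue; the connection is not released.
            self.stuck += 1
        return True


pool = AaadPool()
# Ten clients start a dialogue and never complete it.
results = [pool.authenticate(dialogue_abandoned=True) for _ in range(10)]
# The next request, even a well-behaved one, can no longer be forwarded.
after_exhaustion = pool.authenticate()
```

In this model the first ten requests are still forwarded, which matches the slow onset the customer saw: the failure only appears once the last usable connection has been stranded.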

Resolution

Workaround

The customer already knew that restarting the NetScaler would resolve the authentication failures for a short time. Restarting only the nsaaad process with the appropriate kill command would also free the stuck connections and get the process working again. Because it took some time to reach the failed condition, a cron job could be set up to kill the nsaaad process on a regular basis. The system was stable for at least a day after the connections were cleared, so restarting the process nightly would be sufficient to avoid the hung communications. After the nsaaad process is killed, PITBOSS restarts it quickly, which clears the condition. A script was provided so that the customer could implement this regular restart and plan the eventual fix without having to rush. Although the affected customer never reported back on this workaround, it is believed to have been effective in keeping the authentication process working properly.
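The script given to the customer is not included in this article. The following is a minimal sketch of the same idea, under several assumptions: Python may not be available on the appliance, the signal is assumed to be SIGTERM (the article only says "the appropriate kill command"), and the process listing is assumed to have been reduced to "PID COMMAND" lines. The function names and the sample path are hypothetical.

```python
import os
import signal

PROCESS_NAME = "nsaaad"  # process name taken from the article


def find_pids(ps_output, name=PROCESS_NAME):
    """Return the PIDs whose command basename equals `name`.

    Expects lines of the form "<pid> <command ...>"; the header line and
    any malformed lines are skipped.
    """
    pids = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2 or not parts[0].isdigit():
            continue
        command = parts[1].split()[0]
        if os.path.basename(command) == name:
            pids.append(int(parts[0]))
    return pids


def restart(ps_output, kill=os.kill):
    """Signal every matching process and return the PIDs that were signalled.

    The signal choice is an assumption. The process monitor (PITBOSS in the
    article) is expected to start a fresh nsaaad afterwards.
    """
    pids = find_pids(ps_output)
    for pid in pids:
        kill(pid, signal.SIGTERM)
    return pids


# Made-up listing for illustration only; the path is not from the article.
sample = "  PID COMMAND\n  312 /netscaler/nsaaad\n  400 /usr/sbin/cron\n"
```

Passing the kill function as a parameter allows the logic to be checked without signalling a real process. Run nightly from a scheduler such as cron, a helper of this kind would clear the stranded connections before all 10 are used up.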

Final Resolution

Engineering isolated the problem to dialogue mode: when the client never completed the conversation, the connection was left in a state where it could not be reused. Over time, all 10 connections would become unusable. To resolve this, KeepAlive, which had been enabled to prevent idle connections from being closed, was disabled for dialogue mode connections, and the idle timeout for those connections was reduced to 2 minutes, compared with 6 minutes for non-dialogue mode connections. The fix was integrated into 7.0 build 50 and 8.0 build 30.
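The effect of the fix can be sketched as a simple rule. This is an illustrative model rather than the actual implementation: the function name is_reclaimed is hypothetical, and it assumes that an active KeepAlive keeps an idle connection from ever being closed. The 2-minute and 6-minute values come from the article.

```python
DIALOGUE_IDLE_TIMEOUT = 2 * 60  # seconds, dialogue mode connections after the fix
DEFAULT_IDLE_TIMEOUT = 6 * 60   # seconds, non-dialogue mode connections


def is_reclaimed(idle_seconds, dialogue_mode, keepalive):
    """Return True if the idle timer would close the connection for reuse."""
    if keepalive:
        # Assumption: KeepAlive prevents the idle connection from being closed.
        return False
    timeout = DIALOGUE_IDLE_TIMEOUT if dialogue_mode else DEFAULT_IDLE_TIMEOUT
    return idle_seconds >= timeout


# Before the fix: KeepAlive on, so an abandoned dialogue is never reclaimed.
before_fix = is_reclaimed(idle_seconds=3600, dialogue_mode=True, keepalive=True)
# After the fix: KeepAlive off, so the same connection is reclaimed at 2 minutes.
after_fix = is_reclaimed(idle_seconds=120, dialogue_mode=True, keepalive=False)
```

With KeepAlive disabled for dialogue mode, an abandoned conversation can hold a connection for at most the 2-minute idle timeout in this model, so the pool of 10 recovers on its own.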

Additional Information

Analysis of the NSTrace showed that no packets were being sent to the RADIUS server while the problem was occurring. The newnslog did not show any particularly unusual events, because the nsaaad process was hung rather than stopped. Similarly, the aaad.debug.log file was not helpful in determining the status of the process.
