Citrix ADC : Radius authentication failures when Accounting and Authentication are configured on the same port

Article ID: CTX261680

Description

When Radius authentication and accounting are configured on the same port on Citrix ADC, authentication requests to the Radius server can start failing over time, and CPU usage can rise.

Resolution

Workaround: Kill the AAAD process on the ADC.

Fix:
- Use separate Radius actions for accounting and for authentication.
- Put authentication and accounting connections on two different ports, 1812 and 1813 (the RFC standard ports), so that the authentication action is not blocked by the accounting action. Any two ports can be used as long as they match the server configuration; the choice is not limited to 1812 and 1813.
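
To find existing actions that send both kinds of traffic to one port, a saved configuration can be scanned for radiusAction lines. The following is a minimal Python sketch, not a Citrix tool. It assumes each action is defined on a single `add authentication radiusAction` line, that action names contain no spaces, and that an omitted option falls back to `-authentication ON`, `-accounting OFF`, and `-serverPort 1812`; check these defaults against the documentation for your firmware. The function names and sample values are hypothetical.

```python
from collections import defaultdict


def parse_radius_actions(config_text):
    """Return {action_name: {option_lowercase: value}} for each radiusAction line."""
    actions = {}
    for line in config_text.splitlines():
        tokens = line.split()
        if len(tokens) < 4 or tokens[:3] != ["add", "authentication", "radiusAction"]:
            continue
        name, rest = tokens[3], tokens[4:]
        opts = {}
        i = 0
        while i < len(rest):
            # Options are assumed to come as "-name value" pairs.
            if rest[i].startswith("-") and i + 1 < len(rest):
                opts[rest[i].lstrip("-").lower()] = rest[i + 1]
                i += 2
            else:
                i += 1
        actions[name] = opts
    return actions


def shared_port_conflicts(actions):
    """Return endpoints (server, port) that receive both auth and accounting."""
    roles = defaultdict(lambda: {"auth": set(), "acct": set()})
    for name, opts in actions.items():
        server = opts.get("serverip") or opts.get("servername")
        # Assumed defaults when an option is omitted: port 1812, auth ON, accounting OFF.
        endpoint = (server, opts.get("serverport", "1812"))
        if opts.get("authentication", "ON").upper() == "ON":
            roles[endpoint]["auth"].add(name)
        if opts.get("accounting", "OFF").upper() == "ON":
            roles[endpoint]["acct"].add(name)
    return {ep: r for ep, r in roles.items() if r["auth"] and r["acct"]}


if __name__ == "__main__":
    # Hypothetical configuration: one action does both jobs on port 1812.
    sample = (
        "add authentication radiusAction Combined -serverIP 192.0.2.10 "
        "-serverPort 1812 -radKey XXX -authentication ON -accounting ON\n"
    )
    for endpoint, r in shared_port_conflicts(parse_radius_actions(sample)).items():
        print(endpoint, "auth:", sorted(r["auth"]), "accounting:", sorted(r["acct"]))
```

Any endpoint the sketch reports is a candidate for splitting into an authentication-only action and an accounting-only action on separate ports.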

Sample policies for Radius Server

add authentication radiusAction Authserver -serverIP <x.x.x.x> -serverPort 1812 -authTimeout <x> -radKey XXX -authentication ON -accounting OFF -authServRetry <y> 
add authentication radiusPolicy AuthPol ns_true Authserver
 
add authentication radiusAction AccountingServer -serverIP <x.x.x.x> -serverPort 1813 -authTimeout 1 -radKey XXX -authentication OFF -accounting ON -authServRetry 1
add authentication radiusPolicy AccountingPol ns_true AccountingServer
 
Notes:
- The first policy is for authentication only; the second is for accounting only.
- 1812 is the standard port for Radius authentication, and 1813 is the standard port for Radius accounting.

Accounting in NetScaler works on a best-effort basis: there is no guarantee that an accounting operation succeeds. If the environment generates a large number of accounting requests, tune the following parameters to optimize accounting:

authTimeout: This can be set to 1, because NetScaler does not act on the server's response to an accounting request.
authServRetry: Because accounting is best effort, many retries are not needed. This can be set to 1.
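
The effect of lowering these two values can be illustrated with rough arithmetic. The Python sketch below is a simplified model, not a description of AAAD internals. It assumes that an unanswered request stays pending for about authTimeout seconds per attempt, that authServRetry counts total attempts, and that the Radius server is not answering at all. The request rate of 50 per second and the "untuned" values of 3 and 3 are hypothetical.

```python
def worst_case_hold_seconds(auth_timeout_s, attempts):
    """Upper bound on how long one unanswered request stays pending (assumed model)."""
    return auth_timeout_s * attempts


def pending_requests(rate_per_s, hold_s):
    """Little's law: average requests in flight = arrival rate x time in system."""
    return rate_per_s * hold_s


# Hypothetical comparison at 50 accounting requests per second.
for label, timeout_s, attempts in [("untuned", 3, 3), ("tuned", 1, 1)]:
    hold = worst_case_hold_seconds(timeout_s, attempts)
    print(f"{label}: up to {hold} s per request, "
          f"about {pending_requests(50, hold)} requests pending")
```

Under this model, the pending backlog scales with the product of the two parameters, which is why setting both to 1 is suggested for accounting.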


 

Problem Cause

- Because of the high number of accounting requests on the same port that is used for authentication, authentication requests can be held up in the AAAD surge queue and time out, causing authentication failures.


nsconmsg -K newnslog -g aaad -s time=27Mar2019:03:14:00 -s disptime=1 -d current | more
 
      6       0          68221          1        0 aaa_tot_newconn_aaad  Wed Mar 27 03:14:14 2019 
      7       0          68043          1        0 aaa_tot_cpcb_in_aaad_surgeQ  Wed Mar 27 03:14:14 2019 
      8    7000       11416661      37515     5359 aaa_tot_aaad_protocol_error  Wed Mar 27 03:14:21 2019 
      9    7000       11454084      37423     5346 aaa_tot_aaad_protocol_error  Wed Mar 27 03:14:28 2019 
     10    7001       11491575      37491     5355 aaa_tot_aaad_protocol_error  Wed Mar 27 03:14:35 2019 
     11    7000       11529002      37427     5346 aaa_tot_aaad_protocol_error  Wed Mar 27 03:14:42 2019 
     12    7000          68232         11        1 aaa_tot_newconn_aaad  Wed Mar 27 03:14:49 2019 
     13       0       11566127      37125     5303 aaa_tot_aaad_protocol_error  Wed Mar 27 03:14:49 2019 
     14       0        2216314          2        0 aaa_tot_aaad_replace_conn  Wed Mar 27 03:14:49 2019 
     15       0          15464          2        0 aaa_tot_aaad_fin  Wed Mar 27 03:14:49 2019 
     16       0          68045          2        0 aaa_tot_cpcb_in_aaad_surgeQ  Wed Mar 27 03:14:49 2019 
     17    7000          68247         15        2 aaa_tot_newconn_aaad  Wed Mar 27 03:14:56 2019 
     18       0       11579675      13548     1935 aaa_tot_aaad_protocol_error  Wed Mar 27 03:14:56 2019 
     19       0        2216328         14        2 aaa_tot_aaad_replace_conn  Wed Mar 27 03:14:56 2019 
     20       0          15477         13        1 aaa_tot_aaad_fin  Wed Mar 27 03:14:56 2019 
     21       0          68051          6        0 aaa_tot_cpcb_in_aaad_surgeQ  Wed Mar 27 03:14:56 2019 


- aaa_tot_aaad_protocol_error may also increment rapidly, which means that AAAD is not sending ACKs to the Packet Engine (PE).
- There is a limit on the number of connections the PE can maintain with AAAD at any point in time. This limit is being hit, as shown by the aaa_tot_cpcb_in_aaad_surgeQ counter, so it *can* affect other authentication types too. It does not always do so, because the surge queue can drain in time (before the “other” authentication times out).
- Accounting, by its nature, consumes a lot of these PE <-> AAAD connections, especially if its policy is configured with the ns_true expression.
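
When the nsconmsg output is long, the counters can be summarized with a short script. The following Python sketch assumes the column order shown in the output above (index, relative time in ms, total count, delta, rate per second, counter name, timestamp); confirm it against the header that nsconmsg prints on your build. The function names and the rate threshold are hypothetical.

```python
import re
from collections import defaultdict

# index, rtime(ms), total, delta, rate/sec, counter name, timestamp
LINE_RE = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(-?\d+)\s+(\S+)\s+(.*\S)\s*$"
)


def summarize(output_text):
    """Aggregate nsconmsg 'current' lines per counter name."""
    stats = defaultdict(
        lambda: {"samples": 0, "delta_sum": 0, "max_rate": 0, "last_total": 0}
    )
    for line in output_text.splitlines():
        match = LINE_RE.match(line)
        if not match:
            continue  # skip the command line, headers, and blank lines
        _idx, _rtime, total, delta, rate, name, _ts = match.groups()
        entry = stats[name]
        entry["samples"] += 1
        entry["delta_sum"] += int(delta)
        entry["max_rate"] = max(entry["max_rate"], int(rate))
        entry["last_total"] = int(total)
    return dict(stats)


def counters_above_rate(stats, rate_threshold):
    """Return counter names whose peak rate/sec meets the (hypothetical) threshold."""
    return sorted(n for n, s in stats.items() if s["max_rate"] >= rate_threshold)
```

Calling `summarize()` on the saved output and then `counters_above_rate(stats, 1000)` would single out aaa_tot_aaad_protocol_error in the capture above, while the samples and delta sum for aaa_tot_cpcb_in_aaad_surgeQ show how often requests entered the surge queue.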