Workaround: Kill the AAAD process on the ADC.
Fix: Use separate RADIUS actions for accounting and for authentication.
- Send authentication and accounting traffic to two different ports, 1812 and 1813 (the standard RADIUS ports from RFC 2865 and RFC 2866), so that the authentication action is not blocked by the accounting action. Any two ports supported by the server configuration can be used; they do not have to be 1812 and 1813.
Sample policies for the RADIUS server:
add authentication radiusAction Authserver -serverIP <x.x.x.x> -serverPort 1812 -authTimeout <x> -radKey XXX -authentication ON -accounting OFF -authServRetry <y>
add authentication radiusPolicy AuthPol ns_true Authserver
add authentication radiusAction AccountingServer -serverIP <x.x.x.x> -serverPort 1813 -authTimeout 1 -radKey XXX -authentication OFF -accounting ON -authServRetry 1
add authentication radiusPolicy AccountingPol ns_true AccountingServer

Notes:
- The first policy is for authentication only; the second is for accounting only.
- 1812 is the standard port for RADIUS authentication, and 1813 is the standard port for RADIUS accounting.
- Accounting on NetScaler works on a best-effort basis: there is no guarantee that an accounting operation succeeds. If the environment generates a large number of accounting requests, tune the following parameters to optimize accounting:
authTimeout: Can be set to 1, because NetScaler does not act on the server's response to an accounting request anyway.
authServRetry: Because accounting works on a best-effort basis, many retries are unnecessary. This can be set to 1.
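The benefit of lowering these two values can be estimated with a rough sizing sketch. It assumes (this is not confirmed NetScaler behavior) that an unanswered request is sent once plus authServRetry retries, with each attempt waiting the full authTimeout, and then applies Little's law (average concurrent connections = arrival rate × time each request holds a connection). The rate of 500 accounting requests per second and the function names are hypothetical, chosen only for illustration.

```python
# Hypothetical sizing sketch: how authTimeout and authServRetry bound the time an
# unanswered RADIUS accounting request can hold a PE <-> AAAD connection.
# ASSUMPTION: the request is sent once plus authServRetry retries, and each
# attempt waits the full authTimeout. Actual NetScaler retry semantics may differ.


def worst_case_hold_seconds(auth_timeout: int, auth_serv_retry: int) -> int:
    """Upper bound on how long one unanswered accounting request holds a connection."""
    attempts = 1 + auth_serv_retry
    return attempts * auth_timeout


def connections_in_use(requests_per_second: float, hold_seconds: float) -> float:
    """Little's law: average concurrent connections = arrival rate x time held."""
    return requests_per_second * hold_seconds


if __name__ == "__main__":
    rate = 500.0  # hypothetical accounting requests per second
    # Compare a slower setting with the tuned values recommended above.
    for timeout, retry in [(3, 3), (1, 1)]:
        hold = worst_case_hold_seconds(timeout, retry)
        busy = connections_in_use(rate, hold)
        print(f"authTimeout={timeout} authServRetry={retry}: "
              f"hold<={hold}s, ~{busy:.0f} connections busy")
```

Under these assumptions, if the server stops answering, the (3, 3) setting keeps each request for up to 12 s (about 6000 busy connections at 500 requests per second), while the tuned (1, 1) setting keeps it for up to 2 s (about 1000). The absolute numbers are illustrative; the point is that occupancy scales linearly with both settings.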
Problem Cause
- When a high number of accounting requests share the port used for authentication, authentication requests can be held up in the AAAD surge queue and time out, causing authentication failures.
nsconmsg -K newnslog -g aaad -s time=27Mar2019:03:14:00 -s disptime=1 -d current | more
6 0 68221 1 0 aaa_tot_newconn_aaad Wed Mar 27 03:14:14 2019
7 0 68043 1 0 aaa_tot_cpcb_in_aaad_surgeQ Wed Mar 27 03:14:14 2019
8 7000 11416661 37515 5359 aaa_tot_aaad_protocol_error Wed Mar 27 03:14:21 2019
9 7000 11454084 37423 5346 aaa_tot_aaad_protocol_error Wed Mar 27 03:14:28 2019
10 7001 11491575 37491 5355 aaa_tot_aaad_protocol_error Wed Mar 27 03:14:35 2019
11 7000 11529002 37427 5346 aaa_tot_aaad_protocol_error Wed Mar 27 03:14:42 2019
12 7000 68232 11 1 aaa_tot_newconn_aaad Wed Mar 27 03:14:49 2019
13 0 11566127 37125 5303 aaa_tot_aaad_protocol_error Wed Mar 27 03:14:49 2019
14 0 2216314 2 0 aaa_tot_aaad_replace_conn Wed Mar 27 03:14:49 2019
15 0 15464 2 0 aaa_tot_aaad_fin Wed Mar 27 03:14:49 2019
16 0 68045 2 0 aaa_tot_cpcb_in_aaad_surgeQ Wed Mar 27 03:14:49 2019
17 7000 68247 15 2 aaa_tot_newconn_aaad Wed Mar 27 03:14:56 2019
18 0 11579675 13548 1935 aaa_tot_aaad_protocol_error Wed Mar 27 03:14:56 2019
19 0 2216328 14 2 aaa_tot_aaad_replace_conn Wed Mar 27 03:14:56 2019
20 0 15477 13 1 aaa_tot_aaad_fin Wed Mar 27 03:14:56 2019
21 0 68051 6 0 aaa_tot_cpcb_in_aaad_surgeQ Wed Mar 27 03:14:56 2019
- aaa_tot_aaad_protocol_error may also increment rapidly (roughly 5,300 per second in the output above), which means that AAAD is not acknowledging (ACK-ing) the Packet Engine (PE).
- There is a limit on the number of connections the PE can maintain with AAAD at any point in time. The aaa_tot_cpcb_in_aaad_surgeQ counter shows that this limit is being hit, so the problem *can* affect other authentication types too. It does not always affect them, because the surge queue may drain before the other authentication requests time out.
- Accounting, by its nature, consumes many of these PE <-> AAAD connections, especially when its policy uses the ns_true expression, which matches every request.
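To scan a long capture for these symptoms, the nsconmsg output can be saved to a file and filtered offline. The sketch below assumes the usual `-d current` column order (index, rtime in milliseconds, total count, delta, rate per second, counter name, timestamp), which matches the sample rows above but is not stated in this article. The 1000-per-second protocol-error threshold and all function names are hypothetical.

```python
import re
import sys
from dataclasses import dataclass

# One data row of `nsconmsg ... -d current` output, e.g.
# "8 7000 11416661 37515 5359 aaa_tot_aaad_protocol_error Wed Mar 27 03:14:21 2019"
# ASSUMED columns: index, rtime, total, delta, rate/sec, counter, timestamp.
ROW = re.compile(
    r"^\s*(\d+)\s+(\d+)\s+(\d+)\s+(-?\d+)\s+(-?\d+)\s+(\S+)\s+(.+?)\s*$"
)


@dataclass
class Sample:
    counter: str
    total: int
    delta: int
    rate: int
    time: str


def parse(lines):
    """Keep only lines that look like counter rows; skip headers and blanks."""
    samples = []
    for line in lines:
        m = ROW.match(line)
        if m:
            _idx, _rtime, total, delta, rate, counter, when = m.groups()
            samples.append(Sample(counter, int(total), int(delta), int(rate), when))
    return samples


def flag(samples, error_rate_threshold=1000):
    """Report surge-queue growth and high AAAD protocol-error rates."""
    alerts = []
    for s in samples:
        if s.counter == "aaa_tot_cpcb_in_aaad_surgeQ" and s.delta > 0:
            alerts.append(f"{s.time}: {s.delta} connection(s) queued in AAAD surge queue")
        elif s.counter == "aaa_tot_aaad_protocol_error" and s.rate >= error_rate_threshold:
            alerts.append(f"{s.time}: {s.rate}/s AAAD protocol errors")
    return alerts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 flag_aaad.py saved_nsconmsg_output.txt  (hypothetical file names)
    with open(sys.argv[1]) as f:
        for alert in flag(parse(f)):
            print(alert)
```

Run against the sample output above, this flags each 7-second interval where protocol errors exceed the threshold and each interval where aaa_tot_cpcb_in_aaad_surgeQ grows, which is the pattern that points to accounting traffic starving authentication.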