Users are encountering extended ICA connection interruptions during NetScaler High Availability (HA) failover events within the Azure environment.
There are two sources of extended interruption:
(1) the Azure SLB health probe interval;
(2) the exponential interval of TCP SYN retransmission.
If the Azure SLB health probe interval is 5s, then according to information from Microsoft, in the worst case it takes 10 seconds for the ALB to determine the active/passive state of the VPX.
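The 10-second figure can be reproduced with simple arithmetic. The sketch below assumes the load balancer marks a backend as down after a fixed number of consecutive failed probes. The threshold of 2 is an assumption chosen because it matches the 10s worst case quoted above; it is not stated in this note, and the function name is hypothetical.

```python
def worst_case_detection_seconds(probe_interval_s: float, unhealthy_threshold: int) -> float:
    """Rough upper bound on the time a probe-based load balancer needs
    to notice a state change: one full interval per required failed probe."""
    return probe_interval_s * unhealthy_threshold


# Assumed values: 5s interval (from this note), threshold of 2 (assumption).
detection_s = worst_case_detection_seconds(5, 2)
print(detection_s)
```

With a shorter probe interval or a lower threshold, the same estimate shrinks proportionally.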
By default, when the client receives a reset from the ADC, it immediately attempts to reconnect, then retransmits the SYN at successive intervals of 3s, 6s, and 12s. In the earlier cases of long ICA disconnection, the retransmissions after the 3s and 6s intervals did not receive any reply, while the retransmission after the 12s interval got a reply every time. This is consistent with the estimate above that the ALB might take 10s to detect the HA failover in the worst case.
In the case of smooth switching, the client received a correct reply when retransmitting after the 3s or 6s interval.
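The interaction between the retransmission schedule and the detection delay can be sketched as follows. This is a minimal model, assuming an initial retransmission interval of 3s that doubles on each retry and an outage measured from the first SYN; the function names are hypothetical and the model ignores network latency.

```python
from itertools import accumulate


def syn_send_offsets(initial_interval_s=3.0, retries=3):
    """Seconds after the first SYN at which each attempt is sent,
    with the interval doubling on every retry (3s, 6s, 12s)."""
    intervals = [initial_interval_s * 2 ** i for i in range(retries)]
    return [0.0] + list(accumulate(intervals))


def first_answered_attempt(outage_s, offsets):
    """First attempt sent once the outage is over, or None if all
    attempts fall inside the outage window."""
    for offset in offsets:
        if offset >= outage_s:
            return offset
    return None


offsets = syn_send_offsets()                      # [0.0, 3.0, 9.0, 21.0]
slow_case = first_answered_attempt(10, offsets)   # worst-case 10s detection
smooth_case = first_answered_attempt(5, offsets)  # faster detection
print(offsets, slow_case, smooth_case)
```

Under these assumptions, a 10s detection delay falls just after the second retransmission (9s after the first SYN), so the client has to wait for the third one at 21s. Any detection delay of 9s or less is answered by the first or second retransmission, which corresponds to the smooth case.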
Nstrace analysis:
(1) According to the packet capture on the client side, the HA failover happened at 16:18:14 and the client received a TCP reset from the ADC (Frame No. 4619). Then at 16:18:15, the client sent a TCP SYN (Frame No. 4718) to recover the connection. The client sent TCP retransmissions at 16:18:18 and 16:18:24 and got no response (Frame No. 4865 and 4946). Only 12s later, at 16:18:36, was the TCP connection established successfully.
The behavior of the client is reasonable because it complies with the SYN retransmission mechanism of the TCP protocol defined in RFC 1122.
Reference: https://www.rfc-editor.org/rfc/rfc1122#page-96
(2) In the nstrace on the VPX, the first TCP SYN is received at 16:18:36, which means the previous two TCP retransmissions were lost on the way: