The customer installed a Repeater 8540 at the data center and two Branch Repeater 300 appliances at two branch offices for a Proof of Concept (POC) test. After installation, users in the branch offices reported that ICA sessions stopped responding from time to time. An affected session does not respond to keystrokes or mouse movements. However, the session continues to update items such as the clock and new emails.
The problem is not reported at other branch offices, where a Branch Repeater 300 appliance is not installed and the ICA traffic does not traverse the Data Center Repeater 8540.
The troubleshooting tasks eliminated the following factors as the cause of the ICA unresponsiveness problem:
Session reliability: The ICA session stopped responding regardless of whether session reliability was enabled or disabled. Also, the unresponsive period lasted for more than 10 minutes, which exceeds the default session reliability timeout of 3 minutes.
Multi-stream ICA: The ICA session stopped responding regardless of whether multi-stream ICA support was enabled or disabled on the Repeater appliance.
Initially, the WAN link was assumed to be overdriven. The WAN link was tuned with Quality of Service (QoS), after which no more packet retransmission was observed on it. However, the ICA unresponsiveness still persisted.
The Technical Support engineers observed multiple fast RTO (Retransmission Timeout) events on the LAN interface. The RTOs occurred for requests over protocols such as ICA, CIFS, and HTTP. An RTO indicates that packets were dropped for an extended period. The engineers compared the times of the RTOs with the times of the ICA hang incidents reported by the users, and they matched.
To verify whether the issue was in the communication path between the Repeater 8540 and the XenApp Server farm, the engineers asked the customer to capture a network packet trace in PCAP format at the XenServer virtual interface (1) bound to the POC XenApp server. The trace showed the following warning message for the packets that were dropped:
“Checksum: 0xffff [should be 0x0000 (see RFC 1624)]”
By default, the XenServer hypervisor offloads TCP checksum calculation to the hardware NICs, and the Repeater 8540 likewise offloads it to its own NICs. The customer uses the Cisco UCS platform, and the ICA sessions stopped responding because of a compatibility issue between the Repeater hardware and the Cisco UCS hardware. Because the customer environment was already defined, Citrix decided to replace the Repeater with another appliance that is RFC1624 compliant. After the Repeater 8540 was replaced by a CloudBridge 2000 in the customer Data Center, the ICA sessions responded without any issues.
The value of the TCP checksum in the packet is calculated as zero. This zero can be represented as either 0xffff using the RFC1071 method or 0x0000 using the RFC1624 method.
The Repeater (8500 and 8800 series appliances) follows RFC1071 to compute the “tx checksum” value, using the representation “Zero = 0xffff”.
The customer environment follows the more recent RFC1624 to verify the “rx checksum” value, using the representation “Zero = 0x0000”. The 0xffff value is not allowed, and the packet is dropped.
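The two representations of zero can be reproduced with the worked example from Section 3 of RFC 1624, which updates a checksum incrementally after one 16-bit field changes. The sketch below is illustrative only: the function names are hypothetical, and it does not represent the appliance firmware. The older update rule (Eqn. 2 in RFC 1624) yields 0xffff, while the corrected rule (Eqn. 3) yields 0x0000 for the same packet.

```python
MASK = 0xFFFF

def oc_add(a, b):
    """Add two 16-bit words in ones' complement (end-around carry)."""
    s = a + b
    return (s & MASK) + (s >> 16)

def oc_not(x):
    """Ones' complement (bitwise NOT) of a 16-bit word."""
    return ~x & MASK

def update_old(hc, m, m_new):
    """Older incremental update, RFC 1624 Eqn. 2: HC' = HC + m + ~m'."""
    return oc_add(oc_add(hc, m), oc_not(m_new))

def update_rfc1624(hc, m, m_new):
    """Corrected update, RFC 1624 Eqn. 3: HC' = ~(~HC + ~m + m')."""
    return oc_not(oc_add(oc_add(oc_not(hc), oc_not(m)), m_new))

# Values from the RFC 1624 Section 3 example:
# old checksum HC = 0xDD2F, field m = 0x5555 changes to m' = 0x3285.
hc, m, m_new = 0xDD2F, 0x5555, 0x3285
old_result = update_old(hc, m, m_new)
new_result = update_rfc1624(hc, m, m_new)
print(f"{old_result:#06x} {new_result:#06x}")  # prints "0xffff 0x0000"
```

Both results denote zero in ones' complement arithmetic (negative zero and positive zero), which is why the difference only matters to a receiver that compares the field bit for bit.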
After the packet is dropped, it is not retransmitted by the ICA client because it has already been acknowledged by the Branch Repeater appliance at the client side (4).
Instead, the packet is cached and retransmitted by Repeater 8540 (3) at the server side.
The retransmitted packet is dropped again because it contains the same TCP checksum value, 0xffff. This packet drop and retransmission cycle causes the ICA session to stop responding.
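The cycle described above can be modeled with a toy sketch. It assumes a receiver that rejects any segment whose checksum field is 0xffff and a sender that retransmits its cached copy unchanged; the function names and the retry limit are hypothetical and chosen only for illustration.

```python
def strict_receiver_accepts(checksum):
    """Assumed receiver behavior: a checksum field of 0xffff is rejected."""
    return checksum != 0xFFFF

def deliver(cached_checksum, max_retransmits):
    """Send a cached segment, retransmitting the same bytes after each drop.

    Returns (delivered, attempts). The checksum is never recomputed,
    so a segment that is dropped once is dropped on every retry.
    """
    attempts = 0
    while attempts <= max_retransmits:
        attempts += 1
        if strict_receiver_accepts(cached_checksum):
            return True, attempts
    return False, attempts

print(deliver(0x0000, 5))  # (True, 1): accepted on the first attempt
print(deliver(0xFFFF, 5))  # (False, 6): original send plus 5 retransmits, all dropped
```

Because the retransmitted bytes are identical, raising the retry limit does not help; only a different checksum representation or a different verification method breaks the loop.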
The following links provide more information on TCP checksum Interoperability between RFC1624 and RFC1071:
The checksum interoperability problem occurs only when the receiving system verifies the “rx checksum” as per the RFC1624 method. As explained in RFC1624 Section 5, this problem does not occur if the system verifies the “rx checksum” as per the RFC1071 method. RFC1624 only requires a system to compute the incremental “tx checksum” as per the RFC1624 method. CloudBridge 2000 and 3000 appliances follow these criteria.
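The difference between the two verification styles can be sketched as follows. This is a minimal illustration, not the actual NIC or hypervisor logic: `verify_by_sum` models the RFC1071-style check (sum every word including the checksum field and expect 0xffff), and `verify_by_recompute` models a strict receiver that recomputes the checksum from scratch and compares it with the received field. The data words reuse the example values from RFC 1624 Section 3, and all function names are hypothetical.

```python
MASK = 0xFFFF

def oc_sum(words):
    """Ones' complement sum of 16-bit words, folding the carry each step."""
    total = 0
    for w in words:
        total += w
        total = (total & MASK) + (total >> 16)
    return total

def verify_by_sum(data_words, received_checksum):
    """RFC1071-style check: the sum including the checksum must be 0xffff."""
    return oc_sum(data_words + [received_checksum]) == MASK

def verify_by_recompute(data_words, received_checksum):
    """Strict check: recompute from scratch and compare the field exactly."""
    return (~oc_sum(data_words) & MASK) == received_checksum

# Data whose ones' complement sum is 0xffff, so the checksum is "zero".
data = [0xCD7A, 0x3285]
results = {
    checksum: (verify_by_sum(data, checksum), verify_by_recompute(data, checksum))
    for checksum in (0x0000, 0xFFFF)
}
for checksum, (by_sum, by_recompute) in results.items():
    print(f"{checksum:#06x} sum-check={by_sum} recompute-check={by_recompute}")
```

In this sketch the sum-based check accepts both 0x0000 and 0xffff, while the recompute-and-compare check accepts only 0x0000, which matches the behavior described above.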