Cloudbridge3000: Very high ICA latency and poor performance observed due to high Inbound Queue Service Time

Article ID: CTX214125


Description

Poor performance for ICA users and high ICA latency are observed on CB3000 units. A spike in Inbound Queue Service Time can be seen when the issue occurs.

(Screenshots: Inbound_latency, LatencyonCB)

Resolution

The engineering team has released hotfix 7.4.5.1002 for this issue. Customers reporting "High Inbound Latency" should enable profiling after upgrading to the 7.4.5.1002 hotfix.
This will help identify slowness issues other than memory fragmentation.



The profiling feature is controlled through the following CB parameters (a sketch of how they interact follows the list):

1. Oprofile.Enabled
   Default value: ON
   Description: Enables/disables the profiling feature. Works only on Neopolis platforms (400, 800, 2K, 3K, 1K).
   Recommendation: Do not change this unless recommended by Dev or you are sure that the CB has an issue caused by this feature.

2. Oprofile.InboundWaitThreshHold
   Default value: 500000 (500 ms)
   Description: Profiling starts once Inbound Latency reaches this threshold.
   Recommendation: Change this value if the customer is reporting inbound queue latency of less than 500 ms and you feel profiling should start earlier. For example, if the customer is reporting 300 ms inbound latency, change this value to 200 ms to trigger profiling before the latency reaches its peak (300 ms).

3. Oprofile.Start
   Default value: -
   Description: Starts monitoring Inbound Latency until it reaches Oprofile.InboundWaitThreshHold. Once Oprofile.InboundWaitThreshHold is reached, profiling starts automatically and stops when Inbound Latency falls below Oprofile.InboundWaitThreshHold or Oprofile.RunTimeOut expires, whichever occurs earlier.
   Recommendation: -

4. Oprofile.RunTimeOut
   Default value: 120 seconds
   Description: Each profile sample is collected for a minimum of this value, or until Inbound Latency is again below Oprofile.InboundWaitThreshHold.
   Recommendation: Do not change this unless recommended by Dev.

5. Oprofile.TotalSample
   Default value: 4
   Description: After running Oprofile.Start, Oprofile.TotalSample samples are collected. Profiling then stops and will not resume until the Oprofile.Start command is issued again.
   Recommendation: Change this value if you feel the default number of samples will not be sufficient to analyze the issue.

6. Oprofile.SleepTime
   Default value: 2 seconds
   Description: Internal use.
   Recommendation: Do not change this unless recommended by Dev.

7. Oprofile.CommandSleepInterval
   Default value: 5 seconds
   Description: Internal use.
   Recommendation: Do not change this unless recommended by Dev.
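For reference only, the following Python sketch models how Oprofile.InboundWaitThreshHold, Oprofile.RunTimeOut, Oprofile.TotalSample and Oprofile.SleepTime interact according to the descriptions above. It is not the CB3000 implementation; the read_latency_us callback and the simulated readings are hypothetical stand-ins for the appliance's internal Inbound Queue Service Time counter.

import itertools
import time

def oprofile_start(read_latency_us,
                   inbound_wait_threshold_us=500_000,  # Oprofile.InboundWaitThreshHold (500 ms)
                   run_timeout_s=120,                   # Oprofile.RunTimeOut
                   total_samples=4,                     # Oprofile.TotalSample
                   sleep_time_s=2):                     # Oprofile.SleepTime (internal)
    """Illustrative model of the behaviour documented for Oprofile.Start."""
    collected = 0
    while collected < total_samples:
        # Idle until Inbound Latency crosses the configured threshold.
        while read_latency_us() < inbound_wait_threshold_us:
            time.sleep(sleep_time_s)
        # One sample: profile until latency drops back below the threshold
        # or Oprofile.RunTimeOut expires, whichever occurs earlier.
        started = time.monotonic()
        while (read_latency_us() >= inbound_wait_threshold_us
               and time.monotonic() - started < run_timeout_s):
            time.sleep(sleep_time_s)
        collected += 1
    # Profiling stops here and does not resume until Oprofile.Start is run again.
    return collected

if __name__ == "__main__":
    # Simulated latency readings in microseconds, purely for demonstration.
    readings = itertools.cycle([100_000, 600_000, 700_000, 200_000])
    print(oprofile_start(lambda: next(readings), sleep_time_s=0))  # prints 4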

Steps to follow:
1) Change default values only if needed, as suggested above.
2) Run the command Oprofile.Start.
3) Monitor the Inbound Latency. Once the latency has crossed the set threshold value multiple times, or the maximum Oprofile.TotalSample (4) samples have been collected, trigger STS collection (see the sketch after these steps).
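Step 3 amounts to counting how often the inbound latency rises above the configured threshold. As a minimal illustration only (not a CB command or tool), the Python helper below counts rising edges across the threshold and reports when enough crossings have been seen to justify STS collection; the function name and the sample values are assumptions.

def should_collect_sts(latency_samples_us,
                       threshold_us=500_000,  # matches the Oprofile.InboundWaitThreshHold default
                       min_crossings=4):      # matches the Oprofile.TotalSample default
    """Return True once latency has crossed the threshold min_crossings times."""
    crossings = 0
    previously_below = True
    for latency in latency_samples_us:
        above = latency >= threshold_us
        if above and previously_below:
            crossings += 1  # count each rising edge across the threshold
            if crossings >= min_crossings:
                return True
        previously_below = not above
    return False

# Example: four spikes above 500 ms, so STS collection should be triggered.
samples = [120_000, 610_000, 300_000, 550_000, 200_000, 700_000, 150_000, 520_000]
print(should_collect_sts(samples))  # True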

 

Problem Cause

Stalled connections can be seen on the device. An example for reference:
7409::7408 Thu Oct 22 10:49:58 2015 TCP RST by Local Side , 172.18.124.156:49741(SLOW) (RTT = 516.820 mSec) <- CB -> 172.18.98.182:2598(FAST) (RTT = 6.606 mSec). LAN RX, TX: (278032B, 68625B). WAN RX, TX: (81374B, 239856B). Connection Active for: 586.887 Sec. SLOW last RX, TX Delta: (1.380 Sec,49.972 mSec). FAST last RX, TX Delta: (522.199 mSec,1.380 Sec). Slow, Fast RTOs: (14,0). Slow, Fast TCP Zero Win: (0,0) , Reset Cause: Connection 172.18.124.156:49741<->172.18.98.182:2598 has stalled, endpoint 172.18.124.156:49741 is no longer responding.. SC: Name = ICA ,Compression = DBC. FI: CPSProtocol = CGP.
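When many such resets need to be triaged, the interesting fields can be extracted programmatically. The Python snippet below is a hypothetical helper, not a Citrix tool; its regular expression is tuned to the single example line above, and other log lines may use a different layout.

import re

# Regex tuned to the example reset line shown above; adjust for other formats.
PATTERN = re.compile(
    r"(?P<slow>[\d.]+:\d+)\(SLOW\) \(RTT = (?P<slow_rtt>[\d.]+) mSec\) "
    r"<- CB -> "
    r"(?P<fast>[\d.]+:\d+)\(FAST\) \(RTT = (?P<fast_rtt>[\d.]+) mSec\)"
    r".*Reset Cause: (?P<cause>.+?)\.\."
)

def parse_reset_line(line):
    """Extract the slow/fast endpoints, their RTTs and the reset cause."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "slow_endpoint": match.group("slow"),
        "slow_rtt_ms": float(match.group("slow_rtt")),
        "fast_endpoint": match.group("fast"),
        "fast_rtt_ms": float(match.group("fast_rtt")),
        "reset_cause": match.group("cause").strip(),
    }

line = ("TCP RST by Local Side , 172.18.124.156:49741(SLOW) (RTT = 516.820 mSec) "
        "<- CB -> 172.18.98.182:2598(FAST) (RTT = 6.606 mSec). , Reset Cause: "
        "Connection 172.18.124.156:49741<->172.18.98.182:2598 has stalled, "
        "endpoint 172.18.124.156:49741 is no longer responding..")
print(parse_reset_line(line))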

The engineering team has created a hotfix that allows profiling to be enabled in customer deployments, so that additional debug information can be collected at the time of the issue to identify the exact root cause.