Citrix Provisioning Target Poor Performance And Slow Boot

Article ID: CTX583998

Description

Target Devices boot slowly and then hang or remain stuck at a black screen shortly after power on. Devices that are already up and running may be found frozen. These machines may show as "unregistered" in Studio, and you cannot connect to them through RDP or the console. PVS servers may report DB Offline events, and overall performance is slow.

If the Target Device event logs were redirected so that they persist, they should show events 155, 156, 157 and 158.
PVS Servers may show DB Offline events under streamprocess in the event viewer.

Resolution

Target Devices should be running version 1912CU6 or 2203CU3 or any Current Release version beyond this. 
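
The version requirement above can be sketched as a small check. This is only an illustration: the version string format (`1912CU6`, `2308`) and the function names are assumptions, and "beyond this" is read here as any release numbered after 2203.

```python
import re

# Minimum cumulative update per LTSR release, taken from the article text.
MIN_LTSR_CU = {1912: 6, 2203: 3}


def parse_version(text):
    """Split a string such as '1912CU6' or '2308' into (release, cu).

    The string format is an assumption made for this sketch.
    """
    match = re.fullmatch(r"\s*(\d{4})\s*(?:CU\s*(\d+))?\s*", text, re.IGNORECASE)
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    release = int(match.group(1))
    cu = int(match.group(2)) if match.group(2) else 0
    return release, cu


def has_reconnect_fix(text):
    """Return True if the target version meets the stated minimum."""
    release, cu = parse_version(text)
    if release in MIN_LTSR_CU:
        return cu >= MIN_LTSR_CU[release]
    # Assumption: any release numbered after 2203 counts as "beyond" it.
    return release > 2203


if __name__ == "__main__":
    for version in ("1912CU5", "1912CU6", "2203CU3", "2109", "2308"):
        print(version, has_reconnect_fix(version))
```

Confirm the actual installed version from the target software itself; the string parsing here is only a convenience for an inventory list.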

Find the machines that are currently booted and hung, or stuck in the middle of a slow boot, and power them down. This eliminates the constant HA_Login requests and allows PVS to catch up with IO requests for running devices. DB Offline should return to a normal state, and boot performance should return to expected levels.

The PVS software should be upgraded to avoid this slow performance. Over several versions of the client, the Target Device reconnected inconsistently as a result of ongoing Microsoft changes in this area.

Problem Cause

The Target Device client software cannot reconnect to its local socket in order to establish the vDisk connection. Prior to PVS 1912CU6 / 2203CU3, the target client software did not consistently connect back to its local socket during HA. A Target can enter HA for many reasons: too many requests for the same block of data, a break in network connectivity with pending IO requests (such as during DRS), or the dynamic or manual rebalancing of devices among servers.

HA is driven by the Target Device (bnistack.sys), whose job is to send a HA_Logon request to the list of servers it found in its bootstrap. The Target sends a request every second until it reconnects or is power cycled. The PVS server must query SQL each time a request comes in, which puts pressure on SQL. The more devices that are stuck unable to reconnect, the more traffic is driven toward the servers. If the failing devices are not first mitigated (powered down, rebooted, and so on), this can result in SQL deadlocks, DB Offline events and continued slowness in IO replies.
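
To give a sense of scale, the load described above can be estimated with simple arithmetic. The sketch below assumes one request per stuck device per interval and one SQL query per request; the function name and parameters are hypothetical, and real traffic depends on the environment.

```python
def ha_request_load(stuck_devices, window_seconds, interval_seconds=1.0):
    """Estimate HA request load from devices that cannot reconnect.

    Returns (requests_per_second, total_requests_in_window).
    Assumes each stuck device sends one request per interval and that
    each request causes one SQL query on the PVS server side.
    """
    if stuck_devices < 0 or window_seconds < 0 or interval_seconds <= 0:
        raise ValueError("inputs must be non-negative, interval must be positive")
    rate = stuck_devices / interval_seconds
    return rate, rate * window_seconds


if __name__ == "__main__":
    # 200 stuck devices left running for 10 minutes.
    rate, total = ha_request_load(stuck_devices=200, window_seconds=600)
    print(f"{rate:.0f} requests/s, {total:.0f} requests in the window")
```

Under these assumptions, 200 stuck devices left running for ten minutes generate 120,000 requests, which is why powering down the hung devices comes first.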

Event 155 is completely normal; it indicates that HA is in progress. PVS typically handles HA seamlessly, and this event is not usually followed by anything.

This issue does not fail in a consistent way. You may see any combination of events, always starting with event 155 and ending with event 158. Events 156 and 157 may or may not be present in the failure.

https://support.citrix.com/article/CTX126407/windows-event-codes-generated-by-provisioning-services-bnistack

155 Informational [IosReconnectHA] HA Reconnect in progress
156 Informational [IosReconnectHA] Invalid socket error, trying another socket
157 Informational [IosReconnectHA] Failed to connect to the server pausing before retry
158 Error [MIoProcessIosReadTransaction] Invalid reconnection handle returned
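
The event patterns described above can be sketched as a simple classifier over the bnistack event IDs collected from one Target Device, oldest first. The function name and the category labels are hypothetical, and the "retrying" category (155 followed by 156 or 157 without 158) is an assumption added for illustration.

```python
HA_EVENT_IDS = (155, 156, 157, 158)


def classify_ha_events(event_ids):
    """Classify a device's event IDs (oldest first) by HA outcome."""
    relevant = [e for e in event_ids if e in HA_EVENT_IDS]
    if 155 not in relevant:
        return "no-ha"
    # Only events logged after the first 155 belong to the HA sequence.
    after_start = relevant[relevant.index(155) + 1:]
    if 158 in after_start:
        # 155 ... 158 is the failure signature described in the article.
        return "failed-reconnect"
    if any(e in (156, 157) for e in after_start):
        # Assumption: socket or connect retries without a final error yet.
        return "retrying"
    return "normal-ha"


if __name__ == "__main__":
    print(classify_ha_events([155]))
    print(classify_ha_events([155, 157, 158]))
```

Devices classified as "failed-reconnect" are the candidates to power down first.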

Additional Information

This problem is specific to how the PVS Target software reconnects to the Windows socket. It can also occur when an aggressive security client or AV component prevents the connection. The continued HA_Login requests can have a significant impact on streamprocess performance, which in turn leads to high CPU on the PVS server. It is important to apply a version of the target software that contains the fix (1912CU6, 2203CU3 or later), or to eliminate the additional security impact, and then reboot all devices in order to fully mitigate this behavior.