Why are user logins with Elastic Layers enabled slower than normal logins on a non-EL image? Why are logins slower depending on how many Elastic Layer assignments a user has?
Background:
When you define an Image Template in the Layering Management Console (LMC), you have to select the Elastic Layering mode. The options are None, App Layers, App Layers plus Office 365 User Layers, and App Layers plus full User Layers.
With "None", the App Layering minifilter and registry virtualization drivers are not installed and do not run, so the boot times should be exactly the same as a non-layered machine. Use this as your baseline for determining how fast your image can allow a user to log in.
When you enable any of the Elastic Layering modes, an additional partition is added to the boot disk to capture file writes, and an additional registry hive (HKLM\RSD_P1) is attached to capture any registry modifications. So whether or not you actually have any Elastic Layer assignments, just enabling the feature starts our virtualization services. When you boot a machine with Elastic Layers turned on, we have to enable the filter driver immediately to capture locally modified files, and the registry virtualization driver to capture registry changes.
The boot partition and the registry hives on it are always at the bottom of the stack, and we need a place to preserve modified files and keys no matter what other layers are attached. So the second, writable partition on the boot disk and the extra registry hive are kept at the top of the stack. Any Elastic Layers are inserted into the stack between the boot data and the writable data. The writable data must always be at the top of the stack.
Layer disks are plain NTFS filesystems in VHD files. They have specific names, which identify the layer by ID number and must not be changed. The virtualized registry hives are normal registry hive files that are mounted under HKLM as RSD_P<layer ID>. RSD stands for "Registry Splitter Driver", but any reference you see to RSD likely refers to the App Layering registry virtualization system as a whole.
What actually happens during login
App Layering has two separate impacts on the login time: we are part of the chain of services that must be notified of a login event, and we are part of the file and registry access path once the login is allowed to proceed.
When a user begins to log in, Windows has a list of services that must be notified. The login process pauses until all notified services have returned. When the Citrix Layering Agent is notified, we perform the following steps:
- Identify the AD identifier of the user logging in, as well as the AD identifier of the machine itself (because either might have Elastically Assigned Layers)
- Using the user account of the person logging in, attach to the Network File Share
- Read the JSON files that identify what layers are available to what users, groups and machine accounts
- Query AD to find out if the user belongs to any of the AD groups
- Call Disk Management to attach any Elastically Assigned Layers that apply
- Connect (and create/format if necessary) the User Layer disk, if User Layers are enabled
- Wait for the filesystems to surface in Disk Management
- Mount the RSD hives under HKLM
- Alert our minifilter driver and registry virtualization driver that new layers have been attached
That may seem like a lot of steps, but it usually takes at most a few seconds to run through all of it. It can never be instantaneous. Its actual speed depends on the speed of your network, your file server, the local CPU and boot disk storage, and any security policies on the vDisk or elsewhere in the environment that might slow down any of these steps. Also, any policies that explicitly block removable disk devices might block our access to the Elastic Application and User Layer disks.
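The layer-selection part of the steps above (read the assignment files, then match the user, the user's AD groups and the machine account) can be sketched roughly as follows. This is only an illustration: the JSON schema, the "user:"/"group:"/"machine:" prefixes, and the function name layers_for_login are all assumptions made up for this example, not the actual format used on the file share.

```python
import json

# HYPOTHETICAL assignment data. The real JSON files on the share are not
# documented here, so this schema is invented purely for illustration.
ASSIGNMENTS_JSON = """
[
  {"layer_id": 12, "assigned_to": ["group:Finance"]},
  {"layer_id": 7,  "assigned_to": ["user:alice", "machine:VDI-001$"]},
  {"layer_id": 30, "assigned_to": ["group:Engineering"]}
]
"""

def layers_for_login(assignments, user, machine, groups):
    """Return the sorted layer IDs that apply to this login.

    A layer applies if it is assigned to the user, to the machine account,
    or to any AD group the user belongs to.
    """
    principals = {f"user:{user}", f"machine:{machine}"}
    principals.update(f"group:{g}" for g in groups)
    selected = {
        entry["layer_id"]
        for entry in assignments
        if principals & set(entry["assigned_to"])
    }
    return sorted(selected)

assignments = json.loads(ASSIGNMENTS_JSON)

# alice logs in on a machine with no assignments of its own; she is in Finance.
print(layers_for_login(assignments, "alice", "VDI-002$", ["Finance"]))
```

Every layer selected this way still has to be attached, surfaced as a filesystem, and have its hive mounted, which is why the time spent in this phase grows with the number of assignments.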
Once those steps are complete, our driver returns control to Windows, which allows the rest of the login to complete. Our impact on the rest of the login process is separate, and it is not zero, because we are now virtualizing the filesystem and the registry.
When a request comes in from any program to open a file, the App Layering minifilter driver intercepts the call and directs it to a specific disk. It checks every attached layer disk (which it identifies based on the filesystem label) to see which disks contain the file, and then determines the highest-priority disk, and redirects the File Open call to that disk. The writable disk (whether it's the second boot disk partition or the User Layer VHD) is always the highest priority, the boot partition is always the lowest priority, and the Elastic Layers sit between in numerical order based on layer ID. Once we have redirected the File Open to point to a specific disk, we step out of the way and allow the program to talk directly through Windows to the disk. Our overhead is only incurred when the file is opened. We have no direct impact on actually reading or writing the data. Note that your network speed to the file server, and the storage speed of the file server, definitely can have a measurable impact if it's slow enough. But the process of determining which disk a file should be opened on is nearly instantaneous.
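The priority rule described above can be modeled with a short sketch. It is not the minifilter's implementation, only a toy model of the ordering: the LayerDisk class, the labels, and the file lists are hypothetical, and the code assumes that among Elastic Layers a higher layer ID means higher priority (the text says only that they are ordered numerically by layer ID).

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class LayerDisk:
    label: str                       # hypothetical label, not the real naming scheme
    layer_id: Optional[int] = None   # None for the boot and writable disks
    files: set = field(default_factory=set)

def build_stack(boot, elastic, writable):
    """Return the disks ordered from lowest to highest priority."""
    # ASSUMPTION: higher layer ID = higher priority among Elastic Layers.
    return [boot] + sorted(elastic, key=lambda d: d.layer_id) + [writable]

def resolve_open(stack, path):
    """Return the label of the highest-priority disk that contains path.

    Walking from the top of the stack and stopping at the first hit gives
    the same answer as checking every disk and then picking the highest.
    """
    for disk in reversed(stack):
        if path in disk.files:
            return disk.label
    return None

boot = LayerDisk("boot", files={r"C:\Windows\notepad.exe", r"C:\App\app.ini"})
layer7 = LayerDisk("layer-7", 7, {r"C:\App\app.exe", r"C:\App\app.ini"})
layer12 = LayerDisk("layer-12", 12, {r"C:\App\app.ini"})
writable = LayerDisk("writable", files={r"C:\Users\alice\notes.txt"})

stack = build_stack(boot, [layer12, layer7], writable)
for path in (r"C:\App\app.ini", r"C:\App\app.exe", r"C:\Windows\notepad.exe"):
    print(path, "->", resolve_open(stack, path))
```

Only the open is resolved this way. Once a disk has been chosen, reads and writes go straight to it, which matches the point above that the overhead is incurred only when the file is opened.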
For virtualizing the registry, we had to build, effectively, the same kind of system that the minifilter driver system provides us for files. Every layer has its own registry hive (called "RSD"), and we have to intercept every call into the registry and check all the mounted RSD hives. That process is considerably slower than filesystem virtualization, and registry virtualization is where the bulk of the extra login time comes from. Every registry read and write is slower as a result, but most of the time you don't notice. You really notice when users are logging in, because that involves a huge amount of registry activity. You may also notice transient slowness when specific programs (like Visual Studio) that do a lot of registry actions start up. But normally, registry access is a pretty small part of your operations, and you don't notice the slowdown at all.
Both filesystem and registry virtualization performance depend on the number of currently attached layers; the more layers that have to be checked, the longer the check takes. Again, filesystem checks in general take much less time than registry key checks. Even so, users with many layers may see slower login performance than users with fewer layers.
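To see why the layer count matters most during login, here is a deliberately simple cost model. It only counts how many hives are consulted for a burst of registry lookups. It is not how the registry virtualization driver is implemented, and the hive contents, key names, and burst size are invented for the example. The hive names follow the RSD_P<layer ID> pattern described earlier.

```python
def lookup(hives, key):
    """Look up key across all mounted hives.

    hives is a list of (name, dict) pairs, highest priority first.
    Returns (name of the winning hive or None, number of hives checked).
    In this model every mounted hive is consulted on every call.
    """
    winner, checks = None, 0
    for name, values in hives:
        checks += 1
        if winner is None and key in values:
            winner = name
    return winner, checks

def make_hives(layer_ids):
    # Invented contents: each hive holds a single key named after its layer.
    return [(f"RSD_P{i}", {f"Software\\App{i}": 1}) for i in layer_ids]

few = make_hives([1, 7])                    # writable hive plus one layer
many = make_hives([1, 7, 12, 30, 31, 44])   # writable hive plus five layers

# A login-like burst of 3000 registry lookups (the size is arbitrary).
burst = [f"Software\\App{i}" for i in (7, 1, 99)] * 1000

for label, hives in (("2 hives", few), ("6 hives", many)):
    total_checks = sum(lookup(hives, key)[1] for key in burst)
    print(label, "->", total_checks, "hive checks")
```

In this model the work per lookup grows linearly with the number of mounted hives, so a user with three times as many hives pays roughly three times the lookup cost during the registry-heavy login phase.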
Does the PVS Cache Disk have any impact on this?
No. The second partition, which is used to capture writes, is still on the same disk as the boot partition. It has no impact on login times, and it also has no impact on the utilization of the PVS cache disk. CTX227454 helps explain this a little.
https://support.citrix.com/article/CTX227454
The reason App Layering doesn't impact the PVS cache disk is that it doesn't cause any additional writes to happen. When a file is written to in PVS, blocks are modified on the boot disk and those blocks are redirected into the cache disk. When a file is written in App Layering with EL, even though the file is written to the second partition, it's still the same: blocks are modified on the boot disk and those blocks are redirected into the cache disk. There is no additional write that happens, no additional I/O.
What can be done about slow logins?
Probably nothing. There are no tunable parameters in the Citrix Layering Agent or unifltr.sys or unirsd.sys. We work hard to be as fast as possible in our virtualization, and while it's possible we may figure out a way to be even faster in the future, there is nothing you can do to alter our process. All you can do is investigate whether some piece of software (like an antivirus) in the base image is causing a measurable impact, or if changing to a new hypervisor or datastore or network file share helps. Certainly make sure you are using a fast network, fast storage, and even caching on your file server. Check to see if some GPO policy or other setting or software pushed into the machine from the network has an impact. You may be able to make marginal improvements by investigating infrastructure like that. But there's nothing that App Layering offers directly in the way of tunable performance parameters.