Troubleshooting Replication Status in Citrix Provisioning Server


Article ID: CTX200804


Description

NOTE: Provisioning Services does not perform file replication. Replication can be completed using one of the following methods:

  1. Manual Copy
  2. Robocopy
  3. DFS Replication

Most enterprise environments use automated software to replicate multiple files across servers on demand or on a schedule. For troubleshooting, however, it is often beneficial to use a manual method such as Copy/Paste in Windows, which you can observe to make sure it completes successfully. Here are some pros and cons of all three methods:

1. Manual Copy (Copy/Paste)

PROS: Generally very reliable; the administrator can fully observe the process from the Windows user interface making sure that all the right files are being replicated.

CONS: Time and resource consuming, inefficient for large environments.

2. Robocopy

PROS: Extremely reliable command-line utility; full control over copying process via switches; wildcards are available for multiple selection; events are logged to a robocopy log.

CONS: Requires knowledge of robocopy syntax.

3. DFS Replication

PROS: Enterprise-ready, automated solution for file replication and synchronization; can be used on a scheduled basis.

CONS: Limited ability to troubleshoot without 3rd party intervention; requires continued maintenance and patching from Microsoft.

Whichever method you choose, test it before using it in production to find out which one suits your needs.

As stated above, Provisioning Services (PVS) does not perform vDisk replication; it only reports the replication status, that is, what PVS believes the current state of the files to be. Every time you check replication status in the Provisioning Services Console, the Inventory process in PVS sends a UDP packet to the remote servers to check for the presence and integrity of the .VHD/.AVHD files for the specified vDisk. After the Inventory tables are synchronized and file integrity is verified, the replication status dialog box appears and should look like this:
User-added image

The two blue circles mean that PVS was able to contact the remote servers and the vDisk files are up to date. If the virtual disk is versioned and correctly replicated, replication status would look like this:

User-added image

The first time you create a new version (AVHD) in a distributed HA environment where each PVS server has locally attached storage containing the vDisk, replication status will look like this:

User-added image

As observed above, when you hover your mouse over the remote server’s warning triangle, it displays a "File is missing" message notifying the user that the other server does not have the AVHD file for Version #1. In this case, you can boot your target device from PVS01 in maintenance mode, make the necessary changes to the version, promote it to production, and then replicate the file to the rest of the servers. The exception here is shared storage: if both Provisioning Servers point to the same store (NFS/CIFS, etc.) through a UNC path, copying files between stores is not necessary.

NOTE: The document Provisioning Server Failover states, "Provisioning Services does not support High Availability of vDisks on local storage that are in Private Image mode or that are currently in maintenance (read/write enabled)."

Important: Before AVHD (versioning) files are copied from one store to another, they must be exported from the PVS Console to generate a manifest.xml file containing timestamp information. Without the XML file, versions cannot be reimported into the Console.

To export a vDisk, open the Provisioning Services Console and navigate to Farm => Sites => Site => vDisk Pools, right-click the vDisk you wish to export, and choose "Export vDisk…"

User-added image

Any time you get a "File is missing" message in replication status, use Windows Explorer to navigate to the vDisk store of that server and ensure that all VHD, AVHD, PVP, and XML files pertinent to that virtual disk are present. If they are not, copy them over manually from their original location.
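The manual check above can also be scripted. The following is a minimal Python sketch, not a Citrix tool: it assumes each store is a flat folder reachable from where the script runs, and it only compares file names with the extensions listed above. The function names are illustrative.

```python
from pathlib import Path

# File types the article lists as belonging to a vDisk.
VDISK_EXTENSIONS = {".vhd", ".avhd", ".pvp", ".xml"}


def vdisk_files(store):
    """Return lower-cased names of vDisk-related files directly inside a store folder."""
    return {
        entry.name.lower()
        for entry in Path(store).iterdir()
        if entry.is_file() and entry.suffix.lower() in VDISK_EXTENSIONS
    }


def missing_files(source_store, target_store):
    """List files present in source_store but absent from target_store, sorted by name."""
    return sorted(vdisk_files(source_store) - vdisk_files(target_store))
```

For example, calling `missing_files(r"\\PVS01\Store", r"\\PVS02\Store")` (hypothetical UNC paths) would list the files that still need to be copied to PVS02. Names are compared case-insensitively because Windows file names are case-insensitive; file contents and timestamps are not compared.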

Another type of warning you may receive during replication check is "File is out of date" or "Properties are out of date." What this means is that the timestamp of a vDisk file has been changed due to a modification on one PVS server but has not been replicated over to the other:

User-added image

If you experience this issue, the fastest way to recover is to synchronize your files by copying them manually from PVS01 to PVS02. According to the Microsoft TechNet document "DFS Replication: Frequently Asked Questions (FAQ)", the DFS Replication utility does not detect timestamp changes and will therefore not re-sync vDisks automatically unless the actual files change. Fortunately, Provisioning Services has a PowerShell/MCLI command that you can use to see the exact timestamp that is out of sync. Here is example output from this command:

PS F:\> MCLI Get DiskInfo -p siteName=SOC,storeName=STORE1_DFS-R_Prod,diskLocatorName=vDisk_PVS61_XA65_socxa65ncap -f diskLocatorId
Executing: Get DiskInfo
Get succeeded. 1 record returned.
Record #1
diskLocatorId: 95f8c606-6156-4f1f-b68a-f73bc044930f

PS F:\> MCLI Get DiskInventory -p diskLocatorId=95f8c606-6156-4f1f-b68a-f73bc044930f,version=3 -f ServerName,version,fileTime,state
Executing: Get DiskInventory
Get succeeded. 4 record(s) returned.

Record #1
serverName: SOCCTXPVS01
version: 3
fileTime: 2014-01-15 09:29
state: 2

Record #2
serverName: SOCCTXPVS03
version: 3
fileTime: 2014-01-15 09:29
state: 2

Record #3
serverName: SOCCTXPVS04
version: 3
fileTime: 2014-01-15 09:47
state: 0

Record #4
serverName: SOCCTXPVS05
version: 3
fileTime: 2014-01-15 09:29
state: 2

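In the above example, we first obtain the diskLocatorId of the vDisk we are checking and then use it to get the Inventory information containing the timestamps of the file. In this output, SOCCTXPVS04 reports a fileTime of 09:47 while the other three servers report 09:29, which identifies it as the copy that is out of sync.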
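When many servers are involved, the comparison can be automated. The sketch below is a hypothetical helper, not part of MCLI: it assumes the text output has the "Record #N" and "field: value" layout shown above, and it assumes that the timestamp reported by most servers is the reference one, which may not hold if the modified copy is the one you intend to keep.

```python
from collections import Counter


def parse_records(mcli_output):
    """Split 'Record #N' blocks of MCLI text output into dicts of field -> value."""
    records = []
    current = None
    for raw_line in mcli_output.splitlines():
        line = raw_line.strip()
        if line.startswith("Record #"):
            current = {}
            records.append(current)
        elif current is not None and ": " in line:
            # Split on the first ": " only, so values such as "09:29" stay intact.
            key, value = line.split(": ", 1)
            current[key] = value.strip()
    return records


def out_of_sync_servers(records):
    """Return serverName values whose fileTime differs from the most common fileTime."""
    times = [r.get("fileTime") for r in records]
    if not times:
        return []
    majority_time, _ = Counter(times).most_common(1)[0]
    return [r.get("serverName") for r in records if r.get("fileTime") != majority_time]
```

Applied to the DiskInventory output above, `out_of_sync_servers(parse_records(text))` would return only SOCCTXPVS04.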

The third type of warning you may receive upon checking replication status is "Server is not reachable."

User-added image

What this message means is that the Inventory.exe process is unable to contact remote servers to validate the presence of a VHD file. The vast majority of cases have shown evidence that this is a network/connectivity/DNS-related problem. So how do we troubleshoot it?

First we verify basic network connectivity between Provisioning Servers by leveraging the Ping utility in Windows Command Prompt:

ping <ServerIP>

If a simple ping from PVS01 returns no response from PVS02, then the customer needs to restore the network connection between the two hosts. Very often this is the result of a firewall issue, and the way to correct it is to turn on network discovery and put exclusions in place for all PVS-related ports (see Information about the Inter-Process Communication for Provisioning Services 6.x and 7.x or List of Provisioning Management Ports). Temporarily disabling the firewall altogether is a useful test to rule out this component as a possible culprit. You can use telnet from the command line to test connectivity to individual ports:

telnet <servername> <port>

If telnet fails to open a connection to the specified port (for example, 6895 for Inventory), then the corresponding process in PVS will not be able to use it.
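Where the telnet client is not installed, an equivalent TCP connection test can be written in a few lines. This is a minimal Python sketch with an illustrative function name. Like telnet, it only tests TCP connectivity, so it does not exercise the UDP traffic that Inventory sends during a replication status check.

```python
import socket


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution failures.
        return False
```

For example, `tcp_port_open("PVS02", 6895)` (with a hypothetical server name) returns False when the port is filtered, closed, or the name cannot be resolved.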

Second, we verify that the Provisioning Servers are able to resolve each other's names by using the Nslookup utility at the Windows command line:

nslookup <servername>

If nslookup fails to resolve the name or FQDN of the remote server, Notifier.exe will likely not be able to resolve it either and replication status will fail. In this case, the customer needs to add the necessary records in DNS for the PVS servers to enable name resolution. Refer to Importing a New vDisk Allows only the First Provisioning Services Server that Imported the vDisk to Allow Connections to it for a snippet of a failing name resolution in the Notifier log.
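Name resolution can also be checked through the operating system resolver, which is a slightly different test from nslookup: nslookup queries the DNS server directly, whereas the resolver also consults the local hosts file. The following Python sketch uses an illustrative function name and is only an approximation of how a Windows service would resolve the name.

```python
import socket


def resolve(name):
    """Return the sorted, de-duplicated addresses the OS resolver gives for name, or [] on failure."""
    try:
        infos = socket.getaddrinfo(name, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the address.
    return sorted({info[4][0] for info in infos})
```

Running `resolve()` against the name or FQDN of each PVS server from every other server shows which addresses are actually returned, which is useful in multi-homed setups where a name may resolve to an unexpected interface.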

The problem gets more complicated in multi-homed environments with multiple NICs on the servers because of the way PVS processes use the network adapter order. In PVS 6.x, when you run the Provisioning Services Configuration Wizard, you can only specify a single IP address for PVS to use (also known as the Streaming IP). However, replication status might fail if the corresponding network interface is not at the top of the NIC binding order. For that reason, employ the following two techniques to resolve this issue:

  1. In Windows Control Panel, go to Network and Internet => View network status and tasks => Change adapter settings => press F10 to bring up the Advanced menu and select Advanced Settings… => Adapters and Bindings. Finally, click the up arrow to move the PVS/Streaming adapter to the top of the list:

User-added image

  2. Navigate to C:\Windows\system32\drivers\etc and modify the hosts file to add the PVS/Streaming IP and FQDN of every PVS server in the farm, as well as of the SQL server, as follows:

<IP(PVS01)> <FQDN(PVS01)>
<IP(PVS02)> <FQDN(PVS02)>
<IP(SQL)> <FQDN(SQL)>

Save the file and flush the DNS resolver cache by running the following command at a command prompt:

ipconfig /flushdns

Due to architecture changes in PVS 7.x, the Provisioning Services Configuration Wizard now lets you specify both a Management IP and a Streaming IP to separate PVS traffic across two interfaces. Because Inventory traffic in PVS 7.x is sent over the Management interface, in a multi-homed environment where replication status is failing, modify the NIC binding order and the hosts records as shown above. NOTE: Specify the Management interface IP instead of the Streaming adapter IP.

Sometimes the problem can be related to the Inventory process itself, if it fails to send communication to the remote servers during the replication status check. How do we validate this? Take a Process Monitor trace as an Administrator while checking replication status in the PVS Console (the ProcMon tool is available for download from Windows Sysinternals). Open the trace and apply the following filters:

Process Name <is> Inventory.exe

Operation <is> UDP Send

Operation <is> UDP Receive

A working scenario will yield the following results from Inventory:

User-added image

We observe a successful UDP Send and Receive by Inventory from PVS01 to PVS02 over port 6895. If the process is failing to send those packets to the remote servers, they will be absent from the ProcMon trace.

Another way to validate whether an Inventory packet is ever sent from the local host is to run a network trace using Wireshark or Netmon on PVS01 while checking replication status and apply the following Wireshark display filter:

(ip.addr eq <IP(PVS01)> and ip.addr eq <IP(PVS02)>) and (udp.port eq 6895)

A successful exchange of Inventory traffic between PVS servers will look like this:

User-added image

An empty trace after applying the display filter means the Inventory process is not sending traffic successfully.

When these checks show that Inventory is not sending UDP traffic during the replication status check and all previous troubleshooting steps have been completed, refer to the latest Provisioning Services Hot Sheets, sent monthly to the #Support – Provisioning Hot Sheets distribution list in Outlook and available at http://kb.citrite.net/newsletters, and scan them for any private fixes available for your issue.

As of 02/26/2014, there is a private fix available for PVS 6.1.19 and 7.1 for a random Inventory communication issue known to result in replication status failure. The defect reference is LA5762, and the fix can be requested from Escalation if all the troubleshooting requirements above have been met.

Ultimately, if all regular troubleshooting is exhausted and no hotfixes are available for this issue, collect PVSDataTools output along with a CDF trace, a Wireshark trace, and a ProcMon trace of the issue, and engage an Escalation Engineer via your Technical Lead for expert advice or case handoff.

Issue/Introduction

The purpose of this paper is to provide general guidance on how to troubleshoot issues with vDisk replication status in PVS.