Case Study: Application Launch Failures In a WAN Multi-zone Environment

Document ID: CTX111883   /   Created On: Jan 9, 2007   /   Updated On: Nov 2, 2007

Problem Definition

Users traveling to regional offices were unable to access applications published on servers in their home regions. Intermittently, they received the following error message:

“An error occurred when connecting to the MetaFrame server to launch the application. Please make sure that the MetaFrame server is running and that the network is functioning.” 

Environment

Client Access: Web Interface 4.0

MetaFrame: Presentation Server 3.0 with Service Pack 2005.04

Server: Windows 2003, Service Pack 1

Data Store: SQL Server 2000 Service Pack 3, MDAC 2.8

Zones: 3 geographically separated zones, 2 data collectors per zone (1 PDC/1 BDC)

Hotfixes: PSE400W2K3R01

This environment consists of three zones spanning North America, Europe, and Asia, with approximately 600 servers in the New York office, 200 in London, and 100 in Tokyo. The farm runs Presentation Server 3.0 with Service Pack 2005.04 and R01, and users connect through Web Interface 4.0. Users open the Web Interface site and, based on their location, are routed through a VIP (virtual IP address) to a local Web Interface server. Load sharing between the zones has not been enabled.

Troubleshooting Methodology

Launch.ica file: Attempts to examine the launch.ica file also failed. Saving it with “Save file as” either hung or produced a similar error.

CDF Tracing: CDF (Citrix Diagnostic Facility) tracing was performed on the Zone Data Collectors, which in this environment also act as the XML brokers for Web Interface. Trace modules were selected to capture Dynamic Store and XML data.

Details

Tracing was run with tracelog.exe from batch files. The following pre-configured files were provided:

Runtrace.cmd contains:

    tracelog -start tracesession -guid ZDC_trace.ctl -flag 0xffffff -level 16 -cir 50 -f c:\ZDC.etl

Stoptrace.cmd contains:

    tracelog -stop tracesession

ZDC_trace.ctl contains:

    7460B365-C1D2-495E-843E-5C88865CA6F1 MF_Driver_Wdica
    5DF7C852-3BB0-49A2-A2DD-3D63D0B143DB MF_DLL_Wsxica
    7EB18582-343E-4239-95DD-F34F6C1D60BC MF_Services_ServerFTA
    04979F4A-470E-4569-97A9-A0D6FB785872 MF_Service_CtxXmlSS
    28FB3FE7-B3E9-4E46-B462-D8AAB4AC3E0E MF_SDK_MfcomExe
    0432987E-F918-4598-8AA0-50657FDFE334 IMA_Sals_MfServer
    08265143-ADBE-4578-8EDB-987FD15F3104 IMA_Subsystems_Browser
    5BD888D6-6540-462C-A011-9B0D2C205B3B IMA_Subsystems_MfServer
    5AD82332-790A-4B1C-941C-AADF4AC4BB25 IMA_System_System
    50381E5B-C32A-455B-B3C2-9735570677F2 IMA_Runtime_DynamicStore
    4AEAF09B-6997-4CF3-96F4-F823A46510DC IMA_Runtime_ZoneManager
    5D452398-2CE7-4A5B-955E-F907A86BC5F7 IMA_Runtime_HostResolver
    3A02EF43-BC8F-4D76-BE63-2CE5EAFE7126 IMA_Runtime_PersistentStore
    F2F8EC10-BDDF-4D92-9015-A07D3D2B97B8 IMA_Runtime_Runtime
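If the same capture has to be repeated on several Zone Data Collectors, the control file and the tracelog command lines can be generated instead of copied by hand. The following is a minimal Python sketch, not a Citrix tool. It assumes the control-file layout shown above (one GUID and module name per line), lists only three of the modules for brevity, and uses hypothetical helper names (ctl_text, start_args, stop_args).

```python
import uuid

# Subset of the modules listed in ZDC_trace.ctl above: (control GUID, module name).
MODULES = [
    ("04979F4A-470E-4569-97A9-A0D6FB785872", "MF_Service_CtxXmlSS"),
    ("50381E5B-C32A-455B-B3C2-9735570677F2", "IMA_Runtime_DynamicStore"),
    ("4AEAF09B-6997-4CF3-96F4-F823A46510DC", "IMA_Runtime_ZoneManager"),
]


def ctl_text(modules):
    """Render a control file with one 'GUID name' pair per line.

    uuid.UUID() raises ValueError on a mistyped GUID, so typos are caught
    before the file is copied to a Zone Data Collector.
    """
    lines = []
    for guid, name in modules:
        uuid.UUID(guid)  # validation only; the original spelling is kept
        lines.append(f"{guid} {name}\n")
    return "".join(lines)


def start_args(ctl_path, etl_path, session="tracesession"):
    """Argument list matching Runtrace.cmd (50 MB circular log file)."""
    return ["tracelog", "-start", session, "-guid", str(ctl_path),
            "-flag", "0xffffff", "-level", "16",
            "-cir", "50", "-f", str(etl_path)]


def stop_args(session="tracesession"):
    """Argument list matching Stoptrace.cmd."""
    return ["tracelog", "-stop", session]


if __name__ == "__main__":
    print(ctl_text(MODULES), end="")
    print(" ".join(start_args("ZDC_trace.ctl", r"c:\ZDC.etl")))
    print(" ".join(stop_args()))
```

On the Zone Data Collector itself, the argument lists could be passed to subprocess.run(..., check=True) with tracelog.exe on the PATH; building them as lists avoids shell-quoting problems with paths.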

QFarm: The output of QFARM /OFFLINE showed whether the local Zone Data Collector considered servers belonging to the remote zones to be offline.

Technical Analysis

In a multi-zone environment, each Zone Data Collector must establish a “gateway” to every remote Zone Data Collector in order to share dynamic data between zones. Gateways are built automatically. Once a gateway has been established, Dynamic Store information can be replicated between the remote and local zones.

To validate gateway communication, Zone Data Collectors send heartbeat pings to each other and wait for a response. When a response is received, the failed-ping counter (pingFailedCount in the trace) is reset.

Example

Sending PING to host [Server02TS].

Ping Succeeded and pingFailedCount has been reset

If a ping times out while waiting for a response, the Zone Data Collector increments the counter by 1 and sends another ping. After 5 consecutive timeouts, the gateway is torn down and all data from the remote zone is purged from the local Dynamic Store. The gateway is then recreated.
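The heartbeat behavior described above amounts to a small counter-based state machine. The toy Python model below only illustrates that description and is not Citrix's implementation; the class and method names are hypothetical, the local Dynamic Store is stood in for by a plain dictionary keyed by zone, and the threshold of five comes from the text.

```python
from dataclasses import dataclass

MAX_FAILED_PINGS = 5  # from the trace: "No Pings received for 5 ping times."


@dataclass
class Gateway:
    """Toy model of one Zone Data Collector's gateway to a remote zone."""
    remote_zone: str
    ping_failed_count: int = 0
    rebuilds: int = 0

    def on_ping(self, responded, local_store):
        """Update state after one heartbeat ping and report what happened."""
        if responded:
            # "Ping Succeeded and pingFailedCount has been reset"
            self.ping_failed_count = 0
            return "ok"
        self.ping_failed_count += 1
        if self.ping_failed_count < MAX_FAILED_PINGS:
            return "retry"  # send another ping
        # Fifth consecutive timeout: tear down the gateway, purge the remote
        # zone's data from the local Dynamic Store, then recreate the gateway.
        local_store.pop(self.remote_zone, None)
        self.ping_failed_count = 0
        self.rebuilds += 1
        return "rebuilt"


if __name__ == "__main__":
    store = {"Europe": {"applications": {"Notepad": ["LON-PS01"]}}}
    gw = Gateway("Europe")
    # Four timeouts followed by a reply: the counter resets, nothing is purged.
    print([gw.on_ping(r, store) for r in [False] * 4 + [True]])
    # Five timeouts in a row: the Europe data is purged and the gateway rebuilt.
    print([gw.on_ping(False, store) for _ in range(5)], "Europe" in store)
```

The model makes the key property visible: a single successful reply resets the count, so only an unbroken run of five timeouts, which is more likely on a high-latency link, triggers the purge.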

Example (from CDF trace on Tokyo Zone Data Collector / XML broker):

When ping communication fails, the trace shows:

No Pings received for 5 ping times.

Creating all gateways

At this point, Citrix Engineering was able to narrow the problem down to inter-zone communication, which affects multi-zone environments in which zones are separated by high-latency networks. The trace data shows the Tokyo Zone Data Collector destroying its gateways to the North America (New York) and Europe zones.

Example (from CDF trace on Tokyo Zone Data Collector):

[0]1518.1448::09/08/2006-16:52:12.194 [ds2]Destroying gateway for zone [NorthAmerica]

[0]1518.1448::09/08/2006-16:52:12.194 [ds2]Destroying gateway for zone [Europe]

The end user then experiences application resolution or launch failures:

[0]1518.1448::09/08/2006-16:53:11.068 [mfserver]MFServer::ResolveAppInZones : do not have a zone preference list or resolution failed in those zones
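When a trace covers several hours, it helps to extract gateway teardowns and resolution failures from the formatted CDF output and line them up in time. The following is a minimal Python sketch. It assumes the line layout seen in the excerpts above ([cpu]pid.tid::timestamp [module]message) and a month/day/year date; the function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed layout of a formatted CDF line: [cpu]pid.tid::timestamp [module]message
LINE_RE = re.compile(
    r"^\[(?P<cpu>\d+)\](?P<pid>[0-9A-Fa-f]+)\.(?P<tid>[0-9A-Fa-f]+)::"
    r"(?P<ts>\S+) \[(?P<module>[^\]]+)\](?P<msg>.*)$"
)
TEARDOWN_RE = re.compile(r"Destroying gateway for zone \[(?P<zone>[^\]]+)\]")


def parse(line):
    """Return (timestamp, module, message) or None for non-CDF lines."""
    m = LINE_RE.match(line.strip())
    if not m:
        return None
    # Date order assumed to be month/day/year.
    ts = datetime.strptime(m["ts"], "%m/%d/%Y-%H:%M:%S.%f")
    return ts, m["module"], m["msg"]


def scan(lines):
    """Collect gateway teardowns (timestamp, zone) and resolution failures."""
    teardowns, failures = [], []
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        ts, _, msg = rec
        hit = TEARDOWN_RE.search(msg)
        if hit:
            teardowns.append((ts, hit["zone"]))
        elif "ResolveAppInZones" in msg and "resolution failed" in msg:
            failures.append(ts)
    return teardowns, failures


if __name__ == "__main__":
    excerpt = [
        "[0]1518.1448::09/08/2006-16:52:12.194 [ds2]Destroying gateway for zone [NorthAmerica]",
        "[0]1518.1448::09/08/2006-16:52:12.194 [ds2]Destroying gateway for zone [Europe]",
        "[0]1518.1448::09/08/2006-16:53:11.068 [mfserver]MFServer::ResolveAppInZones : "
        "do not have a zone preference list or resolution failed in those zones",
    ]
    teardowns, failures = scan(excerpt)
    for ts, zone in teardowns:
        print(ts.time(), "gateway destroyed:", zone)
    for ts in failures:
        delay = (ts - teardowns[0][0]).total_seconds()
        print(ts.time(), "resolution failed", delay, "s after first teardown")
```

On the three excerpt lines above, scan reports teardowns for NorthAmerica and Europe and one resolution failure roughly 59 seconds after the first teardown.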

The local Zone Data Collector then pings the remote zone again. Once the gateway to the remote zone is rebuilt, all of the Dynamic Store data tables (applications, servers, users, and so on) are imported and rewritten to the local Dynamic Store. While this process is taking place, application and server information is missing, and the launch.ica file cannot be created.
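The effect of a rebuild on application resolution can be illustrated with a toy model of the local Dynamic Store in which a launch needs both the application record and a record for a server that publishes it. This is a simplified Python sketch under those assumptions. The table contents, zone, and server names are hypothetical, and the real import order is not documented in this article.

```python
def resolve_app(local_store, app, zone_preference):
    """Return (zone, server) for the first zone that can host the app, else None.

    A launch needs both the application record and a record for a server
    that publishes it; if either table is still missing, resolution fails.
    """
    for zone in zone_preference:
        tables = local_store.get(zone, {})
        for server in tables.get("applications", {}).get(app, []):
            if server in tables.get("servers", {}):
                return zone, server
    return None  # cf. "resolution failed in those zones"


def rebuild(local_store, zone, remote_tables):
    """Re-import a remote zone's tables one at a time after a gateway rebuild.

    Yields the table name after each import so callers can observe the
    partially converged state in between.
    """
    local_store[zone] = {}
    for name, rows in remote_tables.items():
        local_store[zone][name] = dict(rows)
        yield name


if __name__ == "__main__":
    remote = {  # hypothetical contents of the remote zone
        "applications": {"Outlook": ["NYC-PS01"]},
        "servers": {"NYC-PS01": "online"},
        "users": {"alice": "NYC-PS01"},
    }
    store = {}  # remote data already purged when the gateway was torn down
    print("before import:", resolve_app(store, "Outlook", ["NorthAmerica"]))
    for table in rebuild(store, "NorthAmerica", remote):
        print("after", table + ":", resolve_app(store, "Outlook", ["NorthAmerica"]))
```

In this model, resolution keeps failing after the applications table arrives because the hosting server is not yet known, which mirrors the window in which launch.ica cannot be created.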

Sporadically, QFARM /OFFLINE may also list the servers belonging to the remote zones.

Resolution / Recommendation

1. PSE400R01W2K3034 [CPR#128340]

The tearing down of gateways is addressed in hotfix PSE400R01W2K3034 (since superseded by PSE400R01W2K3076 and PSE400W2K3R02). CDF tracing confirmed that the behavior also occurred on the test servers. Hotfix 34 was applied to the test environment, and further CDF tracing confirmed the fix. Hotfix 34 was then deployed to the customer’s production environment; however, it took approximately 2 hours for the zone data to fully converge.

2. PSE400R02W2K3003 [CPR#143478]

Hotfix 34 reduces how often gateways are torn down; however, all zones still needed a significant amount of time to converge their data, for example when Zone Data Collectors are changed or intentionally restarted. With this customer’s configuration, convergence took 2 hours.

Hotfix PSE400R02W2K3003 was applied to the Zone Data Collectors. This reduced the convergence time from 2 hours to less than half an hour.

Note: Hotfix PSE400R02W2K3003 is not yet publicly available.

3. Recommendation to remove the GatewayValidationInterval registry value.

Caution! This fix requires you to edit the registry. Using Registry Editor incorrectly can cause serious problems that may require you to reinstall your operating system. Citrix cannot guarantee that problems resulting from the incorrect use of Registry Editor can be solved. Use Registry Editor at your own risk. Be sure to back up the registry before you edit it.

The following registry value had previously been added to the Zone Data Collectors in this environment:

    HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\IMA\RUNTIME
    GatewayValidationInterval (DWORD)
    Value: 0x00007530 (hex)
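Before removing the value, it may help to check which Zone Data Collectors still carry it. The following Python sketch scans a registry export (.reg file) for the value under the key above and returns it as an integer; the function name is hypothetical. The value 0x00007530 shown above is 30000 in decimal. Regedit writes .reg exports as UTF-16, so such a file would be read with open(path, encoding="utf-16").

```python
import re

KEY = r"HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\IMA\RUNTIME"
VALUE_RE = re.compile(r'"GatewayValidationInterval"=dword:([0-9a-fA-F]{8})')


def find_gateway_interval(reg_text):
    """Return GatewayValidationInterval as an int if set under KEY, else None.

    reg_text is the decoded text of a regedit export (.reg file).
    """
    in_key = False
    for raw in reg_text.splitlines():
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            # Registry key names are case-insensitive.
            in_key = line[1:-1].lower() == KEY.lower()
        elif in_key:
            match = VALUE_RE.fullmatch(line)
            if match:
                return int(match.group(1), 16)
    return None


if __name__ == "__main__":
    sample = r'''Windows Registry Editor Version 5.00

[HKEY_LOCAL_MACHINE\SOFTWARE\Citrix\IMA\RUNTIME]
"GatewayValidationInterval"=dword:00007530
'''
    print(find_gateway_interval(sample))  # → 30000
```

On a live server, the standard-library winreg module could read the value directly instead of parsing an export; parsing exports has the advantage of working from a central machine across many collectors.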

Reducing the GatewayValidationInterval registry value can have an adverse effect in high-latency environments. This setting was previously recommended in the Advanced Concepts Guide. Citrix has since revised that recommendation for high-latency environments and published the article referenced below.

Additional Information

CTX111103 – Gateway Validation Interval Registry Value Introduces Application Launch Failures in High Latency Environments

CTX107059 – Advanced Concepts Guide


This document applies to:

  • Presentation Server 4.0 for Microsoft Windows 2000
  • MetaFrame Presentation Server 3.0 for Microsoft Windows 2003
  • Web Interface for Presentation Server 4.0
  • Presentation Server 4.0 for Microsoft Windows 2003
  • Web Interface 4.2
  • MetaFrame Presentation Server 3.0 for Microsoft Windows 2000
  • Presentation Server 4.0 for Solaris