This article provides an overview of the primary factors that affect video quality when using a XenApp-hosted or XenDesktop-hosted Skype for Business client.
Citrix and Microsoft co-developed a solution that optimizes the delivery of Skype for Business from XenApp and XenDesktop. It redirects audio-video processing to the user device whenever possible and delivers audio-video traffic out-of-band from the Citrix ICA protocol so that the traffic does not “hairpin” through the XenApp or XenDesktop server. The joint solution consists of (1) updated versions of the Skype for Business client that include the Media Manager API and (2) the Citrix HDX RealTime Optimization Pack (RTOP) 2.x for Skype for Business. In optimized mode, media processing is performed by the HDX RealTime Media Engine (RTME), a Citrix Receiver plug-in that runs on the user device. If there is no HDX RTME on the user device, the solution can fall back to server-side media processing and audio-video delivery over ICA using Citrix-developed and supported technologies.
The key factors affecting video quality in optimized mode are:
Most Skype for Business customers use the Microsoft-provided Audio-Video Conferencing Server as their conference bridge (also known as a multipoint control unit [MCU]). The Skype for Business A-V Conferencing Server is a media relay bridge. Unlike MCUs from some other vendors (e.g. Pexip, Cisco) which may also be used in Skype for Business implementations, the Microsoft A-V Conferencing Server does not perform transcoding.
During a video conference call, the HDX RealTime Media Engine on each device negotiates video quality with the Skype for Business A-V Conferencing Server. Likewise, native Skype for Business clients running outside of the Citrix XenApp or XenDesktop environment negotiate video quality.
HDX RealTime Optimization Pack 2.3 introduced support for Simulcast Video (multiple concurrent video streams). Simulcast allows an endpoint to send more than one video stream at a time when incoming Video Source Requests ask for more than one resolution. Many factors determine the number of video streams and their resolution, frame rate, and bit rate, including endpoint capabilities, available bandwidth, and encoding and decoding capabilities.
HDX RealTime Optimization Pack 2.2.x and earlier supported only a single video stream (unicast). Consequently, the lowest video quality negotiated by any participant in the video conference dictated the video quality for all participants.
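The difference between single-stream and simulcast delivery can be illustrated with a small sketch. This is not the RTME negotiation logic. The quality ladder, the function names, and the participant names are all hypothetical and exist only to show why one low-quality participant limited everyone in the single-stream case.

```python
# Hypothetical quality ladder, lowest to highest (not actual RTME values).
LADDER = ["180p", "360p", "720p", "1080p"]


def single_stream(requests):
    """One stream for all receivers: everyone gets the lowest requested quality."""
    lowest = min(requests.values(), key=LADDER.index)
    return {name: lowest for name in requests}


def simulcast(requests, sender_max="1080p"):
    """Each receiver gets the quality it asked for, capped at what the sender can encode."""
    cap = LADDER.index(sender_max)
    return {
        name: LADDER[min(LADDER.index(res), cap)]
        for name, res in requests.items()
    }


# Hypothetical participants and the resolution each one requests.
requests = {"alice": "1080p", "bob": "720p", "carol": "180p"}
print(single_stream(requests))
print(simulcast(requests, sender_max="720p"))
```

In the single-stream case every participant receives 180p because of one low request. With simulcast, only the participant that asked for 180p receives it, and the others are limited only by what the sender can encode.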
The native Skype for Business client requests a video resolution based on the size of the video window on the user’s screen. HDX RealTime Optimization Pack 2.3 introduced an equivalent capability, whereby the RealTime Media Engine requests video resolution based on the size of the user’s video window. This feature minimizes network and CPU load without sacrificing quality.
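As a rough illustration of window-based resolution requests, the sketch below picks the smallest resolution that still covers the video window. The resolution list and the selection rule are assumptions made for this example; the source does not describe the actual algorithm used by the RealTime Media Engine.

```python
# Hypothetical resolution ladder as (width, height), smallest first.
RESOLUTIONS = [(320, 180), (640, 360), (1280, 720), (1920, 1080)]


def resolution_for_window(win_w, win_h):
    """Return the smallest ladder entry that covers the window.

    If the window is larger than every entry, return the largest entry.
    """
    for w, h in RESOLUTIONS:
        if w >= win_w and h >= win_h:
            return (w, h)
    return RESOLUTIONS[-1]


print(resolution_for_window(400, 225))    # small window: low resolution is enough
print(resolution_for_window(1280, 720))   # window matches a ladder entry exactly
print(resolution_for_window(2560, 1440))  # larger than any entry: use the largest
```

Requesting no more pixels than the window can display is what saves bandwidth and decoding effort without a visible loss in quality.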
Onboard H.264 hardware encoding leverages the chipset of the user device. The RTOP can also leverage outboard hardware encoding in UVC 1.1/1.5 compliant webcams (e.g. Logitech C930e).
On Windows devices, the RTME supports both onboard and outboard hardware encoding. On Linux devices, the RTME supports outboard encoding but not onboard encoding. On the HDX Ready Pi, onboard encoding is performed using the Broadcom chip, but outboard hardware encoding is not supported. H.264 hardware acceleration is not currently used on Apple Mac OS X devices because these devices offer higher-end processing speeds.
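The per-platform support described above can be summarized as a lookup table. The sketch below encodes that table and a hypothetical selection function. The platform keys, the function name, and the preference for onboard over outboard encoding are assumptions made for illustration, not documented RTME behavior.

```python
# Hardware H.264 encoding support per platform, as described in the text.
HW_ENCODING = {
    "windows": {"onboard": True, "outboard": True},
    "linux": {"onboard": False, "outboard": True},
    "hdx_ready_pi": {"onboard": True, "outboard": False},
    "mac": {"onboard": False, "outboard": False},
}


def pick_encoder(platform, device_has_hw_encoder, webcam_is_uvc_h264):
    """Choose an encoding path for a device (hypothetical helper).

    Assumption: onboard hardware is tried first, then a UVC webcam,
    then CPU-based software encoding.
    """
    support = HW_ENCODING[platform]
    if support["onboard"] and device_has_hw_encoder:
        return "onboard hardware"
    if support["outboard"] and webcam_is_uvc_h264:
        return "outboard hardware (UVC webcam)"
    return "software (CPU)"


print(pick_encoder("linux", device_has_hw_encoder=True, webcam_is_uvc_h264=True))
print(pick_encoder("mac", device_has_hw_encoder=True, webcam_is_uvc_h264=True))
```

A Linux device with a UVC H.264 webcam uses the webcam's encoder even when the chipset has one, while a Mac always encodes in software.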
In addition to H.264, Skype for Business also supports the previous generation Microsoft-proprietary RT-Video codec. This is a software codec; no hardware acceleration is available.
The minimum standard for HDX Premium thin clients as of RTOP 2.4 is 720p resolution.
Video quality is highly dependent on what the user’s video camera can deliver. To state the obvious, HD video can only be delivered by an HD-capable webcam. The built-in cameras on mobile devices (laptops, notebooks, tablets) are often of lower quality than modern external webcams.
Although a Skype for Business qualified webcam (and likewise a Skype for Business qualified audio device) is not required, using one is highly recommended. For a list of Skype for Business qualified devices, see Phones and Devices for Skype for Business on Microsoft TechNet.
When the user device does not support onboard H.264 hardware encoding, a UVC 1.1 or 1.5 compliant webcam provides outboard H.264 hardware encoding that can often deliver a superior experience to CPU-based encoding on the device, especially on thin clients that have slower CPUs than typical PCs. UVC-based hardware encoding is supported on point-to-point and conference calls.
This factor is one of the easier ones to understand. If bandwidth is low, video resolution and frame rate will be reduced. If packet loss is high, Forward Error Correction may not be able to overcome it and video will be degraded. Inter-packet arrival jitter will also affect video quality. A wired connection can often provide more bandwidth and stability and, if available, should be used in preference to a Wi-Fi connection.
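For readers who want to quantify inter-packet arrival jitter, the sketch below implements the running estimator defined for RTP in RFC 3550, section 6.4.1. Whether the RTME reports jitter in exactly this way is not stated in the source; the function name and sample timestamps are illustrative only.

```python
def interarrival_jitter(send_times, recv_times):
    """Running jitter estimate following RFC 3550, section 6.4.1.

    Both lists hold per-packet timestamps in the same unit (e.g. milliseconds).
    """
    jitter = 0.0
    for i in range(1, len(send_times)):
        # Difference between receive spacing and send spacing for consecutive packets.
        d = (recv_times[i] - recv_times[i - 1]) - (send_times[i] - send_times[i - 1])
        # Smooth with a gain of 1/16, as the RFC specifies.
        jitter += (abs(d) - jitter) / 16.0
    return jitter


# Packets sent every 20 ms; the third packet arrives 10 ms late.
send = [0, 20, 40, 60]
recv_steady = [5, 25, 45, 65]
recv_late = [5, 25, 55, 65]
print(interarrival_jitter(send, recv_steady))
print(interarrival_jitter(send, recv_late))
```

A steady stream yields zero jitter regardless of the fixed network delay. Only variation in delay raises the estimate, and it is this variation that forces the receiver to buffer or drop video frames.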
On congested networks, Quality of Service (QoS) should be implemented to allow real-time audio and video packets to be prioritized over other network traffic.
Like the native Skype for Business client, the HDX RTME observes the bandwidth policy settings established on the Skype for Business server. Many customers deliberately constrict bandwidth on video calls to protect their network. In such cases, HD video will not be possible.
Third-party technologies that repacketize real-time audio-video, change the sequence in which packets are delivered, or delay packet delivery can negatively impact the quality of video calls.
When there is no HDX RTME on a user’s device, the RTOP supports optional fallback to server-side media processing (audio-only or audio plus video). In this scenario, the HDX RTME within the server-side HDX RealTime Connector is used for audio-video processing, and the media traffic is delivered over the ICA protocol. The following ICA virtual channels are used: