s9884 user experience is key to vdi success, color … · 2019-03-29 · s9884 user experience is...
TRANSCRIPT
Nachiket Karmakar – Sr. Performance Engineer - NVIDIA
S9884 USER EXPERIENCE IS KEY TO VDI SUCCESS, COLOR ACCURACY IS THE KEY TO USER EXPERIENCE
2
SESSION TARGET
• CITRIX PROTOCOL OVERVIEW
• PROTOCOL/CODEC USAGE SCENARIOS
• IMAGE QUALITY HUMAN EYE & SSIM MEASUREMENT FOR H.264
• BANDWIDTH COMPARISON FOR VIDEO USE CASE
• VDI ON SCALE TESTING
• WRAP-UP
Why is it key to choose the right protocol to get the best user experience
3
PROTOCOL & CODECS
Video Codec Policy | Region | Visual Quality | Codecs Used | HW ENC*
Do Not Use | Region optimized | Medium | Static: JPEG (90) + 2D/MDRLE; Video: Adaptive JPEG (10-65) | No
For Entire Screen | Entire Screen | Medium | H.264 4:2:0 | Yes
For act. changing regions | Region optimized | Medium | Static: JPEG (90) + 2D/MDRLE; Video: H.264 4:2:0 | Yes
H.264+TextOptimization** | Entire Screen | Medium | H.264 4:2:0 + Lossless Text | No
For Entire Screen | Entire Screen | Build To Lossless | H.264 4:2:0 during activity, 2D/MDRLE when stationary | Yes
For Entire Screen | Entire Screen | Visual lossless: Medium | H.264 4:4:4 | Yes
For Entire Screen (H.265) | Entire Screen | Medium | H.265 4:2:0 | Yes
For act. changing regions (H.265) | Region optimized | Medium | Static: JPEG (90) + 2D/MDRLE; Video: H.265 4:2:0 | Yes
For act. changing regions (H.265) | Entire Screen | Build To Lossless | H.265 4:2:0 during activity, 2D/MDRLE when stationary | Yes
Citrix XenDesktop 7.18
* video codec (H.264/H.265) part via NVENC
** no policy available for TextOpt
4
CODECS & USE CASE
Bitmap (JPG, RLE)
• 2DRLE/MDRLE for text/crisp areas, JPEG for photographic imagery
• "Build to Lossless" and "Always Lossless" policies for pixel-perfect quality
• Many compression policies (image quality, color depth, etc.)
• Can utilize client-side bitmap cache
• No hardware encoding (NVENC)
• Very bandwidth efficient for static content
What to use when: Office VDI usage
H.264, YUV 4:2:0
• Good compression and visual quality
• Hardware encoding (NVENC)
• Chroma subsampling yields blurred text
• Bandwidth efficient for video/moving images
What to use when: 3D VDI usage
H.264, YUV 4:4:4
• Very good visual quality
• Hardware encoding (NVENC)
• No chroma subsampling
• Great for sharp graphics as well as text
• Increase in bandwidth
What to use when: 3D VDI usage with high color accuracy requirements
H.265, YUV 4:2:0
• Better compression at same visual quality, or same quality at lower bandwidth (compared to H.264)
• Requires hardware encoding (NVENC)
• No CPU encoding, as it would be too cost intensive (~8x CPU load compared to H.264)
• Requires specific endpoint capabilities to decode H.265. Use 3rd party tools like DXVAChecker to see if your endpoint is capable
What to use when: 3D VDI usage in low bandwidth scenarios
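The point that chroma subsampling yields blurred text can be illustrated with a small sketch. It converts a patch of one-pixel-wide red/blue stripes (a stand-in for thin colored text strokes) to YCbCr, averages chroma over each 2x2 block as 4:2:0 does, and converts back. The BT.601 full-range constants, the box-filter averaging, and all function names are assumptions made for illustration; a real H.264/H.265 encoder also quantizes and may filter differently, so this isolates the subsampling step only.

```python
# Illustrative sketch only: isolates 4:2:0 chroma subsampling on a sharp
# color edge. Assumes BT.601 full-range conversion and simple 2x2 averaging;
# no quantization or clamping, so it is not an encoder model.

def rgb_to_ycbcr(r, g, b):
    """Convert one RGB pixel to (Y, Cb, Cr) as unclamped floats."""
    y = 0.299 * r + 0.587 * g + 0.114 * b
    cb = 128.0 + (b - y) / 1.772
    cr = 128.0 + (r - y) / 1.402
    return y, cb, cr

def ycbcr_to_rgb(y, cb, cr):
    """Exact inverse of rgb_to_ycbcr (up to floating-point rounding)."""
    r = y + 1.402 * (cr - 128.0)
    b = y + 1.772 * (cb - 128.0)
    g = (y - 0.299 * r - 0.114 * b) / 0.587
    return r, g, b

def roundtrip_444(img):
    """4:4:4: every pixel keeps its own chroma, so nothing is lost here."""
    return [[ycbcr_to_rgb(*rgb_to_ycbcr(*px)) for px in row] for row in img]

def roundtrip_420(img):
    """4:2:0: one shared (Cb, Cr) pair per 2x2 block, luma kept per pixel."""
    h, w = len(img), len(img[0])
    ycc = [[rgb_to_ycbcr(*px) for px in row] for row in img]
    out = [[None] * w for _ in range(h)]
    for by in range(0, h, 2):
        for bx in range(0, w, 2):
            block = [(by + dy, bx + dx) for dy in (0, 1) for dx in (0, 1)]
            cb = sum(ycc[y][x][1] for y, x in block) / 4.0
            cr = sum(ycc[y][x][2] for y, x in block) / 4.0
            for y, x in block:
                out[y][x] = ycbcr_to_rgb(ycc[y][x][0], cb, cr)
    return out

def max_abs_error(a, b):
    """Largest per-channel difference between two images of equal size."""
    return max(
        abs(ca - cb_)
        for row_a, row_b in zip(a, b)
        for px_a, px_b in zip(row_a, row_b)
        for ca, cb_ in zip(px_a, px_b)
    )

if __name__ == "__main__":
    RED, BLUE = (255, 0, 0), (0, 0, 255)
    # 4x4 patch of alternating one-pixel-wide red and blue columns.
    stripes = [[RED if x % 2 == 0 else BLUE for x in range(4)] for _ in range(4)]
    print("max error 4:4:4:", max_abs_error(stripes, roundtrip_444(stripes)))
    print("max error 4:2:0:", max_abs_error(stripes, roundtrip_420(stripes)))
```

With 4:4:4 the round trip is lossless up to rounding, while 4:2:0 mixes the red and blue chroma inside each block, which is the color bleeding seen around fine text and wireframe lines.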
5
CODECS & USE CASE
Mixed Mode (Video and Bitmap)
Adaptive Display / Selective H.264/H.265
• "Hybrid": Use the best available codec for a specific screen "region"
• Leverages hardware encoding H.264/H.265 (NVENC) for video regions (a.k.a. "Selective H.264"). If HW encoding is not available, software H.264 encoding is used.
• Very good image quality for static content (Bitmap) and low bandwidth requirement for moving images/video (H.264/H.265)
What to use when: Office VDI usage with multimedia content
H.264/H.265 / Build to Lossless (NEW with 7.18)
• Hardware encoding (NVENC) for video codec usage
• "Sharpening" effect when changing from moving to static content, but pixel-perfect quality
• Chroma subsampling less problematic as it is used only for moving images/video
What to use when: 3D VDI usage with high color accuracy requirements and low bandwidth
6
IMAGE QUALITY COMPARISON
7
COMPARISON H.264 YUV 4:2:0 and YUV 4:4:4 (Reference Image)
8
COMPARISON H.264 YUV 4:2:0 and YUV 4:4:4
Citrix YUV420 Citrix YUV444
9
H.264 (STATIC TEXT)
YUV 4:2:0 YUV 4:4:4
11
IMAGE QUALITY: Static Text
Chart: Image Quality (Static Text), y-axis: SSIM (0.5 to 1)
Configuration | SSIM (Static Text)
H.264 YUV 4:2:0 (Entire Screen) | 0.83086
H.264 YUV 4:4:4 (Entire Screen) | 0.98362
Bitmap MDRLE | 0.99995
H.264 YUV 4:2:0 (Active Regions) | 0.99994
H.264 YUV 4:2:0 (Entire Screen, VQ: BTL) | 0.9999
H.264 YUV 4:2:0 (TextOptimization) | 0.99111
H.265 YUV 4:2:0 (Entire Screen) | 0.83118
H.265 YUV 4:2:0 (Entire Screen, VQ: BTL) | 0.99872
H.265 YUV 4:2:0 (Active Regions) | 0.99993
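For readers unfamiliar with the metric, the following is a minimal sketch of how an SSIM score between a reference capture and a remoted capture can be computed. The slides do not say which tool or window size produced the values, so this uses the simplified single-window (global) form of SSIM over grayscale pixel lists, with the commonly used constants K1 = 0.01 and K2 = 0.03; the function name and input layout are assumptions. Standard implementations average SSIM over local windows and will give different numbers.

```python
# Minimal sketch: global (single-window) SSIM between two grayscale images
# given as flat lists of pixel values. This is a simplification; the usual
# metric averages SSIM over small local windows.

def global_ssim(reference, test, dynamic_range=255.0):
    """Return the single-window SSIM of two equally sized pixel lists."""
    if len(reference) != len(test) or not reference:
        raise ValueError("images must be non-empty and of equal size")
    n = len(reference)
    mean_r = sum(reference) / n
    mean_t = sum(test) / n
    var_r = sum((p - mean_r) ** 2 for p in reference) / n
    var_t = sum((p - mean_t) ** 2 for p in test) / n
    cov = sum((a - mean_r) * (b - mean_t) for a, b in zip(reference, test)) / n
    # Stabilizing constants from the usual SSIM definition.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mean_r * mean_t + c1) * (2 * cov + c2)
    denominator = (mean_r ** 2 + mean_t ** 2 + c1) * (var_r + var_t + c2)
    return numerator / denominator

if __name__ == "__main__":
    # Black/white one-pixel pattern, a crude stand-in for sharp text.
    sharp = [0, 255] * 8
    # Fully smeared version: all structure lost.
    smeared = [128] * 16
    print("identical:", global_ssim(sharp, sharp))
    print("smeared:  ", global_ssim(sharp, smeared))
```

A score of 1 means the two images are structurally identical; the further a codec smears edges, the further the score drops below 1.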
12
IMAGE QUALITY: Heatmaps
Heatmap panels:
H.264 YUV 4:2:0 (Entire Screen)
H.264 YUV 4:2:0 (BTL)
H.264 YUV 4:2:0 (TextOptimization)
H.264 YUV 4:4:4 (Entire Screen)
H.265 YUV 4:2:0 (Entire Screen)
Bitmap Encoding (JPEG/RLE)
13
COMPARISON H.264 YUV 4:2:0 and YUV 4:4:4 (Reference Image)
14
COMPARISON H.264 YUV 4:2:0 and YUV 4:4:4
15
H.264 (WIREFRAME)
YUV 4:2:0 YUV 4:4:4
17
IMAGE QUALITY: Wireframe
Chart: Image Quality (Wireframe), y-axis: SSIM (0.5 to 1)
Configuration | SSIM (Wireframe)
H.264 YUV 4:2:0 (Entire Screen) | 0.99083
H.264 YUV 4:4:4 (Entire Screen) | 0.99738
Bitmap MDRLE | 0.99158
H.264 YUV 4:2:0 (Active Regions) | 0.98559
H.264 YUV 4:2:0 (Entire Screen, VQ: BTL) | 0.99992
H.264 YUV 4:2:0 (TextOptimization) | 0.9915
H.265 YUV 4:2:0 (Entire Screen) | 0.99162
H.265 YUV 4:2:0 (Entire Screen, VQ: BTL) | 0.99994
H.265 YUV 4:2:0 (Active Regions) | 0.99144
18
BANDWIDTH COMPARISON (VIDEO)
19
BANDWIDTH COMPARISON: Video playback scenario
141408x592 window size
2:30min duration
Win10 with 1920x1200 resolution, [email protected], P40-1B profile
20
BANDWIDTH COMPARISON: Video playback @ 30fps
Codec | Visual Quality | Encoder CPU | Total FPS | MB transferred
Bitmap JPG/RLE | Medium | 7% | 3693 | 355 MB
H.264 YUV420 | Medium | 2% | 3736 | 220 MB
H.264 YUV444 | Medium | 3% | 3728 | 655 MB
H.264/Bitmap* | Medium | 7% | 3698 | 205 MB
H.264 | Build To Lossless | 5% | 3642 | 195 MB
H.264 TextOpt | Medium | 23% | 3448 | 160 MB
H.265 YUV420 | Medium | 2% | 3766 | 180 MB
H.265/Bitmap* | Medium | 8% | 3721 | 185 MB
H.265 | Build To Lossless | 5% | 3796 | 175 MB
*Adaptive Display (active changing regions)
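To put the transferred volumes in more familiar terms, the sketch below converts a few of the Medium-quality totals into an average bitrate over the 2:30 min playback and into a relative change against H.264 YUV420. It assumes that 1 MB means 10^6 bytes and that the whole volume was sent during the 150 seconds of playback; neither is stated on the slides, and the function names are illustrative.

```python
# Sketch under stated assumptions: 1 MB = 10**6 bytes, traffic spread evenly
# over the 2:30 min (150 s) playback. Values are copied from the Medium table.

DURATION_S = 150  # 2:30 min playback, from the scenario slide

MEDIUM_MB = {
    "Bitmap JPG/RLE": 355,
    "H.264 YUV420": 220,
    "H.264 YUV444": 655,
    "H.264 Build To Lossless": 195,
    "H.265 YUV420": 180,
}

def avg_mbit_per_s(megabytes, duration_s=DURATION_S):
    """Average bitrate in Mbit/s for a volume given in MB (10**6 bytes)."""
    return megabytes * 8 / duration_s

def percent_change(baseline_mb, other_mb):
    """Relative change of other_mb versus baseline_mb, in percent."""
    return (other_mb - baseline_mb) / baseline_mb * 100

if __name__ == "__main__":
    baseline = MEDIUM_MB["H.264 YUV420"]
    for codec, mb in MEDIUM_MB.items():
        print(f"{codec:26s} {avg_mbit_per_s(mb):6.2f} Mbit/s "
              f"{percent_change(baseline, mb):+7.1f}% vs H.264 YUV420")
```

Under these assumptions H.264 YUV420 averages roughly 11.7 Mbit/s, YUV444 costs about three times that, and H.265 YUV420 saves about 18 percent.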
21
BANDWIDTH COMPARISON: Video playback @ 30fps
Codec | Visual Quality | Encoder CPU | Total FPS | MB transferred
Bitmap JPG/RLE | High | 8% | 3633 | 610 MB
H.264 YUV420 | High | 2% | 3719 | 210 MB
H.264 YUV444 | High | 4% | 3716 | 690 MB
H.264/Bitmap* | High | 5% | 3671 | 215 MB
H.264 | Build To Lossless | 5% | 3642 | 195 MB
H.264 TextOpt | High | 22% | 3508 | 160 MB
H.265 YUV420 | High | 3% | 3780 | 185 MB
H.265/Bitmap* | High | 7% | 3627 | 175 MB
H.265 | Build To Lossless | 5% | 3796 | 175 MB
*Adaptive Display (active changing regions)
22
VDI ON SCALE TESTING: 24 VMS ON 1 TESLA P40
23
TEST SYSTEM: Configuration Details
Host Configuration
• Cisco UCS C240 M5
• Intel Xeon Gold 6154 @ 3.00 GHz
• VMware ESXi 6.7
• Number of CPUs: 36 (2 x 18)
• Memory: 768 GB
• Storage: All-Flash SAN (iSCSI)
• Hyperthreading, Turbo boost
• Power Setting: High Performance
• GPU: 1 x P40
• GPU Scheduling Policy: Best Effort
• NVIDIA vGPU Driver 6.2 390.72
VDI Configuration
• vCPU: 4
• vRAM: 4096 MB
• NIC: 1 (E1000)
• Hard Disk: 40 GB
• vGPU: P40-1B
• Virtual Hardware: vmx-14
• FRL enabled: Yes
• VDI agent: CITRIX XenDesktop 7.18
• CITRIX HDX
• Number of Screens: 2
• Screen Resolution: 1920 x 1080
Workload: Cirrus Knowledge Worker Workload (Excel, Word, PowerPoint, Chrome, Media Player, PDF)
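The figure of 24 VMs on one Tesla P40 follows from the vGPU profile size. The sketch below reproduces that arithmetic and also derives the vCPU overcommit for this host. The frame buffer sizes (24 GB for the P40, 1 GB per P40-1B instance) are not printed on the slides and are filled in here as assumptions from the product specifications; the function names are illustrative.

```python
# Sketch of the density arithmetic behind "24 VMs on 1 Tesla P40".
# Frame buffer sizes are assumptions taken from product specifications,
# not from the slides; host and VM values come from the configuration slide.

P40_FRAME_BUFFER_MB = 24 * 1024  # assumed: Tesla P40 has 24 GB of frame buffer
P40_1B_PROFILE_MB = 1024         # assumed: P40-1B assigns 1 GB per VM

PHYSICAL_CORES = 36              # 2 x 18 cores (Xeon Gold 6154)
VCPUS_PER_VM = 4

def max_vms_per_gpu(gpu_fb_mb, profile_fb_mb):
    """Number of fixed-size vGPU instances that fit into one physical GPU."""
    return gpu_fb_mb // profile_fb_mb

def vcpu_overcommit(vm_count, vcpus_per_vm, physical_cores):
    """Ratio of assigned vCPUs to physical cores."""
    return vm_count * vcpus_per_vm / physical_cores

if __name__ == "__main__":
    vms = max_vms_per_gpu(P40_FRAME_BUFFER_MB, P40_1B_PROFILE_MB)
    print("VMs per P40 with the 1B profile:", vms)
    print("vCPU to physical core ratio:",
          round(vcpu_overcommit(vms, VCPUS_PER_VM, PHYSICAL_CORES), 2))
```

Frame buffer is statically partitioned per VM, so the profile size sets the VM count per GPU; GPU compute time is shared under the Best Effort scheduling policy listed above.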
24
END USER LATENCY (CLICK TO PHOTON)
Chart: End User Latency, y-axis: Milliseconds (0 to 250)
Configuration | End User Latency (ms)
H.264 YUV 4:2:0 (Entire Screen) | 115
H.264 YUV 4:4:4 (Entire Screen) | 166
Bitmap JPG/RLE | 199
H.264 YUV 4:2:0 (Active Regions) | 199
H.264 YUV 4:2:0 (Entire Screen, VQ: BTL) | 116
H.264 YUV 4:2:0 (TextOptimization) | 132
H.265 YUV 4:2:0 (Entire Screen) | 115
H.265 YUV 4:2:0 (Entire Screen, VQ: BTL) | 132
H.265 YUV 4:2:0 (Active Regions) | 201
25
TOTAL REMOTED FRAMES
Chart: Remoted Frames (0 to 25000)
Configuration | Total FPS
H.264 YUV 4:2:0 (Entire Screen) | 11684.33333
H.264 YUV 4:4:4 (Entire Screen) | 11799.08333
Bitmap MDRLE | 13347.625
H.264 YUV 4:2:0 (Active Regions) | 13165.20833
H.264 YUV 4:2:0 (Entire Screen, VQ: BTL) | 20278.20833
H.264 YUV 4:2:0 (TextOptimization) | 11608.41667
H.265 YUV 4:2:0 (Entire Screen) | 11564.33333
H.265 YUV 4:2:0 (Entire Screen, VQ: BTL) | 20006.45833
H.265 YUV 4:2:0 (Active Regions) | 13220.5
26
BANDWIDTH H.264
Chart: ESX Server - Transmitted Bandwidth (Cumulative Mbits), y-axis: Mbits (0 to 25000)
Series: H.264 YUV 4:2:0 (Entire Screen); H.264 YUV 4:4:4 (Entire Screen); Bitmap JPG/RLE; H.264 YUV 4:2:0 (Active Regions); H.264 YUV 4:2:0 (Entire Screen, VQ: BTL)
27
BANDWIDTH H.265
Chart: ESX Server - Transmitted Bandwidth (Cumulative Mbits), y-axis: Mbits (0 to 25000)
Series: Bitmap JPG/RLE; H.265 YUV 4:2:0 (Entire Screen); H.265 YUV 4:2:0 (Entire Screen, VQ: BTL); H.265 YUV 4:2:0 (Active Regions)
28
WRAP-UP
Analyzing the data led to the following:
• H.264 BTL is a very interesting addition for different use cases. If users get used to the "sharpening" effect in their session, this is the best possible compromise between visual quality, performance, and bandwidth consumption, which ultimately leads to the best achievable USER EXPERIENCE
• Bitmap (Thinwire+) is still a good solution for the pure office VDI use case; the same applies to Mixed Mode (Adaptive Display)
• H.265 leads to slightly reduced bandwidth consumption and is therefore interesting for 3D use cases with limited bandwidth
29
USEFUL TECHNICAL RESOURCES
• http://sschaber.de/blog/
• https://www.nvidia.com/object/better-ux.html
• https://www.nvidia.com/object/quantifying-impact-of-vgpu-whitepaper.html
• https://www.nvidia.com/en-us/design-visualization/solutions/virtualization/resources/
Blogs, white papers and everything vGPU
30
NVIDIA VIRTUAL GPU RESOURCES
Virtual GPU Test Drive: https://www.nvidia.com/tryvgpu
NVIDIA Virtual GPU Website: www.nvidia.com/virtualgpu
NVIDIA Virtual GPU YouTube Channel: http://tinyurl.com/gridvideos
Questions? Ask on our Forums: https://gridforums.nvidia.com
NVIDIA Virtual GPU on LinkedIn: http://linkd.in/QG4A6u
Follow us on Twitter: @NVIDIAVirt
31
Q & A