TRANSCRIPT
ERIK BOHNHORST, SR. GRID SOLUTION ARCHITECT, NVIDIA
RONALD GRASS, SR. SYSTEMS ENGINEER, CITRIX SYSTEMS
S5393 - EVOLUTION OF AN NVIDIA GRID™ DEPLOYMENT
Who implemented NVIDIA GRID with Citrix XenDesktop?
Why did they want to move to a remote desktop solution?
How did they evaluate and implement NVIDIA GRID?
Sales pitch & TechDemo
Proof of concept
Production environment
Challenges and learnings
How will they move forward
What we will cover
Manufacturing vertical
NVIDIA QUADRO customer
Competitive market
Wide range of CAD/CAE applications
Experienced with remote desktop solutions
Who are we talking about
Growing globalization within the company
Enabling remote sites across the globe
Increasing competition to hire the best
Allowing employees, partners and contractors to work from anywhere
Increasing competition to design and build faster with better quality
Increasing productivity and flexibility
Enable collaboration between internal and external teams
Increasing security breaches
Increasing the security and compliance
German law (“Arbeitnehmerüberlassung” – temporary agency work)
Enabling contractors to work off premise
Business Drivers and initiatives
Wouldn’t it be great if….
…we could work on any device from anywhere
…with better collaboration
…and increased productivity
…with increased security & compliance
…on less redundant infrastructure
Project Start – early 2013
Evaluation of multiple remote solutions
Interest in HP Blades due to the high density of GPUs
Customer received a sales pitch on NVIDIA vGPU & XenDesktop
Overall plan was to evaluate NVIDIA vGPU in early beta under NDA and compare NVIDIA vGPU vs. GPU Passthrough
Once upon a time ... when the customer started
Citrix & NVIDIA partnership since 2008
NVIDIA GRID announced during the NVIDIA GTC keynote, May 2012
Citrix vGPU announced during the Synergy keynote, May 2013
NVIDIA RTM, Sep 2013
vGPU Tech Preview, Oct 2013
vGPU General Availability, Dec 2013
Somewhere in between
[root@SM01 ~]# xe vm-list name-label=Win7-vGPU-01
uuid ( RO)           : 831ab2f3-8e23-e876-d92a-16810a85499e
     name-label ( RW): Win7-vGPU-01
    power-state ( RO): halted
[root@SM01 ~]# xe vgpu-create vm-uuid=831ab2f3-8e23-e876-d92a-16810a85499e gpu-group-uuid=d840caad-2ce0-6395-78a5-9ac984667412 vgpu-type-uuid=5514073f-6d7b-90c6-6648-2335ad1cc81a
23908c99-eecb-835e-fd46-5936e0a3bf652
That’s our vGPU object as seen by the hypervisor (XenServer 6.2)
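Since vGPU creation was CLI-only at this stage, the xe calls above lend themselves to scripting. A minimal sketch, assuming the three UUIDs were looked up beforehand (e.g. with `xe vm-list` and `xe vgpu-type-list`); the helper name is ours, and actually running the command of course requires a XenServer host:

```python
# Sketch: assemble the "xe vgpu-create" command shown above.
# The UUIDs are the ones from the slide; build_vgpu_create_cmd is a
# hypothetical helper, not part of any XenServer tooling.

def build_vgpu_create_cmd(vm_uuid, gpu_group_uuid, vgpu_type_uuid):
    """Return the argv binding a vGPU type to a VM via 'xe vgpu-create'."""
    return [
        "xe", "vgpu-create",
        f"vm-uuid={vm_uuid}",
        f"gpu-group-uuid={gpu_group_uuid}",
        f"vgpu-type-uuid={vgpu_type_uuid}",
    ]

cmd = build_vgpu_create_cmd(
    "831ab2f3-8e23-e876-d92a-16810a85499e",  # Win7-vGPU-01, from the slide
    "d840caad-2ce0-6395-78a5-9ac984667412",  # GPU group
    "5514073f-6d7b-90c6-6648-2335ad1cc81a",  # vGPU type (profile)
)
print(" ".join(cmd))  # paste into the XenServer console, or run via SSH
```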
Evolution of NVIDIA GRID / vGPU: 2013 vGPU beta
Only 5 vGPU profiles + passthrough available: K100, K140Q, K200, K240Q, K260Q, passthrough
Limited to Windows 7 only
Creating passthrough or vGPU objects was possible through the CLI only :(
No way to use passthrough and vGPU VMs at the same time
XenServer 6.2 only (specially patched)
Very limited hardware available
Evolution of NVIDIA GRID / vGPU: 2013 RTM
Same 5 vGPU profiles + passthrough available: K100, K140Q, K200, K240Q, K260Q, passthrough
Creation of vGPUs and monitoring of pGPUs through CLI or XenCenter (GUI)
Mass creation of vGPU-enabled VMs through Desktop Studio (XenDesktop 7.1 or later)
Passthrough and vGPU VMs can run simultaneously
XenServer 6.2 SP1 with 64 vGPUs per host
w0rk5 f0r m3 ... s0 ch3ck the uuids, bl00dy n00b !!
I can‘t get it to work :-(
Lifecycle of a successful GRID implementation
Phase 1 (TechDemo): Conduct a TechDemo for CAD/CAM stakeholders / engineers that leads to a “WOW” effect.
Phase 2 (Assessment & small, focused PoC)
Phase 3 (widened PoC based on feedback)
Phase 4 (Implementation/User Acceptance/Production)
Phase 5 (Maintenance / Update / Daily Use)
Sales pitch & TechDemo – create the “WOW” effect
We did a sales pitch on NVIDIA GRID and a very convincing TechDemo of Citrix XenDesktop with vGPU on XenServer to create the WOW effect
Demo applications like NVIDIA Hair, NVIDIA FaceWorks, Design Garage, Blender, VRRender, Autodesk 30-day trials or JT2Go were used because we lacked licenses and deep CATIA / Solid Edge / Siemens NX knowledge
Demonstrated access from mobile platforms (Android – Galaxy Tab 10.1 and iOS – iPad)
We used a cloud-hosted demo center, which proved the solution works over WAN as well
Focused on user experience and used peripherals (e.g. SpacePilot)
From WOW to HOW ? Next steps
Phase 1 (TechDemo)
Phase 2 (Assessment and very focused PoC): Start with a strictly defined use case (LAN only, specific applications, small user group)
Collect feedback on user experience and the network
Phase 3 (widened PoC based on feedback): Evaluate user feedback
Widen the use cases, e.g. remote access (WAN)
Use more complex drawings / models and higher-end use cases (Engineer vs. Viewer only)
Phase 4 (Implementation/User Acceptance/Production)
Phase 5 (Maintenance / Update / Daily Use)
Components involved
CAD application: Dassault CATIA, Siemens NX, Autodesk products, PTC Creo, JT2Go
Remoting stack: Citrix XenDesktop 7.1 or 7.5 with the Citrix Virtual Desktop Agent in the guest
Graphics driver: NVIDIA Display Driver 332.83 & corresponding NVIDIA vGPU Manager version
Hypervisor: Citrix XenServer 6.2 SP1
Hardware: dual-socket server – 2x Intel E5-2690 v2, 256 GB RAM, SSDs, 2x NVIDIA GRID K2
POC – Define virtual workstations

| User Segment | OS        | vCPUs | Virtual GPU  | Frame Buffer (MB) | GPU Mode    | Remoting Stack    | VMs per host (2x GRID K2) |
|--------------|-----------|-------|--------------|-------------------|-------------|-------------------|---------------------------|
| Entry        | Windows 7 | 4     | GRID K220Q   | 512               | NVIDIA vGPU | Citrix XenDesktop | 32                        |
| Medium       | Windows 7 | 4     | GRID K240Q   | 1024              | NVIDIA vGPU | Citrix XenDesktop | 16                        |
| Advanced     | Windows 7 | 4     | GRID K260Q   | 2048              | NVIDIA vGPU | Citrix XenDesktop | 8                         |
| Expert       | Windows 7 | 4     | GRID “K280Q” | 4096              | Passthrough | Citrix XenDesktop | 4                         |
| Medium       | Linux     | 4     | GRID K2      | 4096              | Passthrough | NICE DCV, HP RGS  | 4                         |
| Expert       | Linux     | 4     | GRID K2      | 4096              | Passthrough | NICE DCV, HP RGS  | 4                         |
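The density column follows directly from frame buffer arithmetic: a GRID K2 is a dual-GPU board, so two cards expose four physical GPUs with 4096 MB each. A small sketch (profile sizes as in the table) reproduces the per-host numbers:

```python
# Sketch: derive VMs-per-host from vGPU frame buffer partitioning.
# Assumes 2x GRID K2 = 4 physical GPUs with 4096 MB frame buffer each;
# each VM statically owns one vGPU profile's worth of frame buffer.

PHYSICAL_GPUS = 4        # 2x GRID K2, two GPUs per board
FB_PER_GPU_MB = 4096     # frame buffer per physical GPU

def vms_per_host(profile_fb_mb):
    """How many VMs fit when each VM gets profile_fb_mb of frame buffer."""
    return (FB_PER_GPU_MB // profile_fb_mb) * PHYSICAL_GPUS

for profile, fb in [("K220Q", 512), ("K240Q", 1024),
                    ("K260Q", 2048), ("passthrough", 4096)]:
    print(f"{profile}: {vms_per_host(fb)} VMs per host")
# Matches the table: 32, 16, 8 and 4 VMs per host respectively.
```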
Technical challenges
Physical laws (latency, bandwidth,packet loss)
Matching workstation-like user experience
Server / Client side rendered mouse cursor
Endpoint devices & endpoint performance (e.g. thin clients)
High screen resolution – lots of data (UHD/4K)
Framerate / Low bandwidth / Graphics quality
API support
Distributed locations
Peripheral devices
Bandwidth, Latency, Network Quality
Quality and performance are closely tied to the available network bandwidth and to distance (latency)
Average User ~1-2 Mbps *
Expert User ~4-5 Mbps *
20 Mbps for ~15 CAD/CAM Engineers *
Influencing parameters
Windows size and number of monitors
Screen resolution
Size of models, different usage patterns (VR, CAD, DMU, 3D-Viewing, etc.)
Individual perception / level of acceptance (User Experience)
* average measurements
Source: Customer presentation
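Those per-user averages make WAN link sizing a back-of-the-envelope calculation; a sketch using the customer's measured averages (the per-user figures are averages from the slide, not guarantees, and real usage varies with resolution, model size and usage pattern):

```python
# Sketch: estimate concurrent CAD users per WAN link from the average
# per-user bandwidth figures above (~1-2 Mbps average, ~4-5 Mbps expert).

def concurrent_users(link_mbps, per_user_mbps):
    """Rough count of users a link supports at a given average bitrate."""
    return int(link_mbps // per_user_mbps)

print(concurrent_users(20, 1.3))  # average users on a 20 Mbps link -> ~15,
                                  # consistent with the "~15 engineers" figure
print(concurrent_users(20, 4.5))  # expert users on the same link -> ~4
```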
Technical Pitfalls we experienced
64bit hardware (MMIO - BAR Mapping)
Server and GRID Card BIOSes
NUMA – Server architecture
Endpoint devices & performance (e.g. thin clients + supported protocols)
Framebuffer grabbing (NVFBC / Monterey API)
POC – End user feedback
Source: Customer presentation
POC – IT administrator evaluation
Too little GPU frame buffer and not enough CPU resources
Great performance but doesn’t build the business case
Great performance and great scalability for most users
Great performance and good scalability for many users
POC – Sizing learnings
[Diagram] NVIDIA QUADRO (dedicated frame buffer & 3D engine per user) vs. NVIDIA GRID vGPU (dedicated frame buffer per VM, time-scheduled 3D engine)
Time scheduling allows highest densities without compromising performance
Customers need to understand the GPU requirements of their applications
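The time-scheduling point can be illustrated with a toy round-robin model: frame buffer is statically partitioned per VM, while the single 3D engine is handed out in time slices only to VMs that actually have work queued, which is why idle VMs do not hurt density. This is purely illustrative; the real scheduler lives inside the NVIDIA vGPU Manager:

```python
# Toy model of time-sharing one 3D engine among vGPU-backed VMs:
# VMs with no queued work never receive a slice; busy VMs share evenly.
from collections import deque

def schedule_engine(pending, slices):
    """Hand out 'slices' engine time slices round-robin among busy VMs.

    pending maps VM name -> number of queued render jobs.
    Returns the order in which VMs got the engine.
    """
    queue = deque(vm for vm, jobs in pending.items() if jobs > 0)
    work = dict(pending)
    order = []
    for _ in range(slices):
        if not queue:
            break            # everyone idle: engine goes unused
        vm = queue.popleft()
        order.append(vm)
        work[vm] -= 1
        if work[vm] > 0:
            queue.append(vm)  # still busy: back of the queue
    return order

# vm3 is idle and never scheduled; vm1 and vm2 alternate fairly.
print(schedule_engine({"vm1": 2, "vm2": 2, "vm3": 0}, 4))
# -> ['vm1', 'vm2', 'vm1', 'vm2']
```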
POC – Organizational challenges
Source: Customer presentation
Clarification of support by the software vendors
Decision on license model for CAx-Applications on virtual machines - international usage - usage by external partners, etc.
Adjusting applications or the associated environment for an optimal use of the applications on virtual machines
Support model for company internal and external users
Targets & project schedule
The project result must be a validated technical solution that is provided to the customer’s internal departments and their external development partners as an IT service
Lifecycle of a successful GRID implementation
Phase 1 (TechDemo)
Phase 2 (Assessment & small and focussed PoC)
Phase 3 (widened PoC based on feedback)
Phase 4 (Implementation/User Acceptance/Production): Educate support engineers / introduce a support matrix
Implement daily management processes such as provisioning new VMs and patching existing ones
Phase 5 (Maintenance / Update / Daily Use)
Meanwhile things changed …
Evolution of NVIDIA GRID / vGPU: 2014 vGPU 1.1
Introduced 2 additional vGPU profiles: K100, K120Q, K140Q, K200, K220Q, K240Q, K260Q, passthrough
PowerShell interface available
nView and NVWMI supported on all vGPUs
Windows 8.1 and Windows Server 2012 R2 signed drivers are included
Various bug fixes
Expanded certified server and certified application lists
Evolution of NVIDIA GRID / vGPU: 2015 vGPU 1.2
Introduced 3 additional vGPU profiles: K100, K120Q, K140Q, K160Q, K180Q, K200, K220Q, K240Q, K260Q, K280Q, passthrough
XenServer 6.2 SP1 and XenServer 6.5
96 vGPUs per host on XenServer 6.5
...and more to come ... stay connected
Many customers are now in full production
Did we succeed?
How can we improve further?
Growing globalization within the company
Enabling remote sites across the globe
Increasing competition to hire the best
Allowing employees, partners and contractors to work from anywhere
Increasing competition to design and build faster with better quality
Increasing productivity and flexibility
Enable collaboration between internal and external teams
Increasing security breaches
Increasing the security and compliance
German law (“Arbeitnehmerüberlassung” – temporary agency work)
Enabling contractors to work off premise
Have we been successful?
✓ Achieved for each of the business drivers listed above
Lifecycle of a successful GRID implementation
Phase 1 (TechDemo)
Phase 2 (Assessment & small and focussed PoC)
Phase 3 (widened PoC based on feedback)
Phase 4 (Implementation/User Acceptance/Production)
Phase 5 (Maintenance / Update / Daily Use)
Maintenance / Update / Daily Use
Upgrade to XenServer 6.5
Upgrade to XenDesktop 7.6
Upgrade to the new GRID vGPU Manager and in-guest drivers
Lifecycle of applications, VMs & associated Baseimage
Educate the Nvidia/Citrix partners
Higher density (use H.264 Hardware encoding)
User Experience in hostile network environments (Framehawk)
Provide Linux-based VMs for CAE, etc.
Collaboration features
Best practices and application specific whitepaper
Room for improvement
Q & A
Summary – How successful projects lift off
Familiarize yourself with GRID (self-paced learning, demo/test system)
Do a proper assessment of existing workstations (real GPU usage)
Leverage or build a close relationship with the vendors (Citrix, NVIDIA, etc.)
Set the right expectations
Find a sponsor with a need to change the traditional workplace
Involve ALL people (IT, CAD/CAM department, end users, decision makers, experienced virtualization partner)
Leverage partners who are familiar with desktop virtualization
Specify phases (TechDemo, PoC, Implementation, Production)
Continuously listen to end-user feedback