TRANSCRIPT
ERIK BOHNHORST, SR. GRID SOLUTION ARCHITECT, NVIDIA
RONALD GRASS, SR. SYSTEMS ENGINEER, CITRIX SYSTEMS
S5393 - EVOLUTION OF AN NVIDIA GRID™ DEPLOYMENT
Who implemented NVIDIA GRID with Citrix XenDesktop?
Why did they want to move to a remote desktop solution?
How did they evaluate and implement NVIDIA GRID?
Sales pitch & TechDemo
Proof of concept
Production environment
Challenges and learnings
How will they move forward
What we will cover
Manufacturing vertical
NVIDIA QUADRO customer
Competitive market
Wide range of CAD/CAE applications
Experienced with remote desktop solutions
Who are we talking about
Growing globalization within the company
Enabling remote sites across the globe
Increasing competition to hire the best
Allowing employees, partners and contractors to work from anywhere
Increasing competition to design and build faster with better quality
Increasing productivity and flexibility
Enable collaboration between internal and external teams
Increasing security breaches
Increasing the security and compliance
German law (“Arbeitnehmerüberlassung” – temporary agency work)
Enabling contractors to work off premise
Business Drivers and initiatives
Wouldn’t it be great if….
…we could work on any device from anywhere
…with better collaboration
…and increased productivity
…with increased security & compliance
…on less redundant infrastructure
Project Start – early 2013
Evaluation of multiple remote solutions
Interest in HP Blades due to the high density of GPUs
Customer received a sales pitch on NVIDIA vGPU & XenDesktop
Overall plan was to evaluate NVIDIA vGPU in early beta under NDA and compare NVIDIA vGPU vs. GPU Passthrough
Once upon a time ... when the customer started
Citrix & NVIDIA partnership since 2008
NVIDIA GRID announced during the NVIDIA GTC keynote, May 2012
Citrix vGPU announced during the Synergy keynote, May 2013
NVIDIA RTM, Sep 2013
vGPU Tech Preview, Oct 2013
vGPU General Availability, Dec 2013
Somewhere in between
[root@SM01 ~]# xe vm-list name-label=Win7-vGPU-01
uuid ( RO)           : 831ab2f3-8e23-e876-d92a-16810a85499e
     name-label ( RW): Win7-vGPU-01
    power-state ( RO): halted
[root@SM01 ~]# xe vgpu-create vm-uuid=831ab2f3-8e23-e876-d92a-16810a85499e gpu-group-uuid=d840caad-2ce0-6395-78a5-9ac984667412 vgpu-type-uuid=5514073f-6d7b-90c6-6648-2335ad1cc81a
23908c99-eecb-835e-fd46-5936e0a3bf652
That’s our vGPU object as seen by the hypervisor (XenServer 6.2)
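Since vGPU creation was CLI-only at this stage, the xe calls above lend themselves to scripting. A minimal sketch, assuming the three UUIDs were looked up beforehand (e.g. with `xe vm-list` and `xe vgpu-type-list`); the helper name is ours, and actually running the command of course requires a XenServer host:

```python
# Sketch: assemble the "xe vgpu-create" command shown above.
# The UUIDs are the ones from the slide; build_vgpu_create_cmd is a
# hypothetical helper, not part of any XenServer tooling.

def build_vgpu_create_cmd(vm_uuid, gpu_group_uuid, vgpu_type_uuid):
    """Return the argv binding a vGPU type to a VM via 'xe vgpu-create'."""
    return [
        "xe", "vgpu-create",
        f"vm-uuid={vm_uuid}",
        f"gpu-group-uuid={gpu_group_uuid}",
        f"vgpu-type-uuid={vgpu_type_uuid}",
    ]

cmd = build_vgpu_create_cmd(
    "831ab2f3-8e23-e876-d92a-16810a85499e",  # Win7-vGPU-01, from the slide
    "d840caad-2ce0-6395-78a5-9ac984667412",  # GPU group
    "5514073f-6d7b-90c6-6648-2335ad1cc81a",  # vGPU type (profile)
)
print(" ".join(cmd))  # paste into the XenServer console, or run via SSH
```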
Evolution of NVIDIA GRID / vGPU: 2013 vGPU beta
Only 5 vGPU profiles + passthrough available: K100, K140Q, K200, K240Q, K260Q, passthrough
Limited to Windows 7 only
Creating passthrough or vGPU objects was possible through the CLI only :(
No way to use passthrough and vGPU VMs at the same time
XenServer 6.2 only (specially patched)
Very limited hardware available
Evolution of NVIDIA GRID / vGPU: 2013 RTM
Same 5 vGPU profiles + passthrough available: K100, K140Q, K200, K240Q, K260Q, passthrough
Creation of vGPUs and monitoring of pGPUs through CLI or XenCenter (GUI)
Mass creation of vGPU-enabled VMs through Desktop Studio (XenDesktop 7.1 or later)
Passthrough and vGPU VMs can run simultaneously
XenServer 6.2 SP1 with 64 vGPUs per host
w0rk5 f0r m3 ... s0 ch3ck the uuids, bl00dy n00b !!
I can‘t get it to work :-(
Lifecycle of a successful GRID implementation
Phase 1 (TechDemo): Conduct a TechDemo for CAD/CAM stakeholders / engineers that leads to a “WOW” effect.
Phase 2 (Assessment & small, focused PoC)
Phase 3 (widened PoC based on feedback)
Phase 4 (Implementation/User Acceptance/Production)
Phase 5 (Maintenance / Update / Daily Use)
Sales pitch & TechDemo – create the “WOW” effect
We did a sales pitch on NVIDIA GRID and a very convincing TechDemo of Citrix XenDesktop with vGPU on XenServer to create the WOW effect
Demo applications like NVIDIA Hair, NVIDIA FaceWorks, Design Garage, Blender, VRRender, Autodesk 30-day trials or JT2Go were used because we lacked licenses and deep CATIA / Solid Edge / Siemens NX knowledge
Demonstrated access from mobile platforms (Android – Galaxy Tab 10.1 and iOS – iPad)
We used a cloud-hosted demo center, which proved the solution works over WAN as well
Focused on user experience and used peripherals (e.g. SpacePilot)
From WOW to HOW ? Next steps
Phase 1 (TechDemo)
Phase 2 (Assessment and very focused PoC): Start with a strictly defined use case (LAN only, specific applications, small user group)
Collect feedback on user experience and the network
Phase 3 (widened PoC based on feedback): Evaluate user feedback
Widen the use cases, e.g. remote access (WAN)
Use more complex drawings / models and higher-end use cases (Engineer vs. Viewer only)
Phase 4 (Implementation/User Acceptance/Production)
Phase 5 (Maintenance / Update / Daily Use)
Components involved
CAD application: Dassault CATIA, Siemens NX, Autodesk products, PTC Creo, JT2Go
Remoting stack: Citrix XenDesktop 7.1 or 7.5 with the Citrix Virtual Desktop Agent in the guest
Graphics driver: NVIDIA Display Driver 332.83 & corresponding NVIDIA vGPU Manager version
Hypervisor: Citrix XenServer 6.2 SP1
Hardware: dual-socket server – 2x Intel E5-2690 v2, 256 GB RAM, SSDs, 2x NVIDIA GRID K2
POC – Define virtual workstations

| User Segment | OS        | vCPUs | Virtual GPU  | Frame Buffer (MB) | GPU Mode    | Remoting Stack    | VMs per host (2x GRID K2) |
|--------------|-----------|-------|--------------|-------------------|-------------|-------------------|---------------------------|
| Entry        | Windows 7 | 4     | GRID K220Q   | 512               | NVIDIA vGPU | Citrix XenDesktop | 32                        |
| Medium       | Windows 7 | 4     | GRID K240Q   | 1024              | NVIDIA vGPU | Citrix XenDesktop | 16                        |
| Advanced     | Windows 7 | 4     | GRID K260Q   | 2048              | NVIDIA vGPU | Citrix XenDesktop | 8                         |
| Expert       | Windows 7 | 4     | GRID “K280Q” | 4096              | Passthrough | Citrix XenDesktop | 4                         |
| Medium       | Linux     | 4     | GRID K2      | 4096              | Passthrough | NICE DCV, HP RGS  | 4                         |
| Expert       | Linux     | 4     | GRID K2      | 4096              | Passthrough | NICE DCV, HP RGS  | 4                         |
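The density column follows directly from frame buffer arithmetic: a GRID K2 is a dual-GPU board, so two cards expose four physical GPUs with 4096 MB each. A small sketch (profile sizes as in the table) reproduces the per-host numbers:

```python
# Sketch: derive VMs-per-host from vGPU frame buffer partitioning.
# Assumes 2x GRID K2 = 4 physical GPUs with 4096 MB frame buffer each;
# each VM statically owns one vGPU profile's worth of frame buffer.

PHYSICAL_GPUS = 4        # 2x GRID K2, two GPUs per board
FB_PER_GPU_MB = 4096     # frame buffer per physical GPU

def vms_per_host(profile_fb_mb):
    """How many VMs fit when each VM gets profile_fb_mb of frame buffer."""
    return (FB_PER_GPU_MB // profile_fb_mb) * PHYSICAL_GPUS

for profile, fb in [("K220Q", 512), ("K240Q", 1024),
                    ("K260Q", 2048), ("passthrough", 4096)]:
    print(f"{profile}: {vms_per_host(fb)} VMs per host")
# Matches the table: 32, 16, 8 and 4 VMs per host respectively.
```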
Technical challenges
Physical laws (latency, bandwidth,packet loss)
Matching workstation-like user experience
Server / Client side rendered mouse cursor
Endpoint devices & endpoint performance (e.g. thin clients)
High screen resolution – lots of data (UHD/4K)
Framerate / Low bandwidth / Graphics quality
API support
Distributed locations
Peripheral devices
Bandwidth, Latency, Network Quality
Quality and performance are closely tied to the available network bandwidth and to distance (latency)
Average User ~1-2 Mbps *
Expert User ~4-5 Mbps *
20 Mbps for ~15 CAD/CAM Engineers *
Influencing parameters
Windows size and number of monitors
Screen resolution
Size of models, different usage patterns (VR, CAD, DMU, 3D-Viewing, etc.)
Individual perception / level of acceptance (User Experience)
* average measurements
Source: Customer presentation
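Those per-user averages make WAN link sizing a back-of-the-envelope calculation; a sketch using the customer's measured averages (the per-user figures are averages from the slide, not guarantees, and real usage varies with resolution, model size and usage pattern):

```python
# Sketch: estimate concurrent CAD users per WAN link from the average
# per-user bandwidth figures above (~1-2 Mbps average, ~4-5 Mbps expert).

def concurrent_users(link_mbps, per_user_mbps):
    """Rough count of users a link supports at a given average bitrate."""
    return int(link_mbps // per_user_mbps)

print(concurrent_users(20, 1.3))  # average users on a 20 Mbps link -> ~15,
                                  # consistent with the "~15 engineers" figure
print(concurrent_users(20, 4.5))  # expert users on the same link -> ~4
```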
Technical Pitfalls we experienced
64bit hardware (MMIO - BAR Mapping)
Server and GRID Card BIOSes
NUMA – Server architecture
Endpoint devices & performance (e.g. thin clients + supported protocols)
Framebuffer grabbing (NVFBC / Monterey API)
POC – End user feedback
Source: Customer presentation
POC – IT administrator evaluation
Too little GPU frame buffer and not enough CPU resources
Great performance but doesn’t build the business case
Great performance and great scalability for most users
Great performance and good scalability for many users
POC – Sizing learnings
[Diagram] NVIDIA QUADRO (dedicated frame buffer & 3D engine per user) vs. NVIDIA GRID vGPU (dedicated frame buffer per VM, time-scheduled 3D engine)
Time scheduling allows highest densities without compromising performance
Customers need to understand the GPU requirements of their applications
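The time-scheduling point can be illustrated with a toy round-robin model: frame buffer is statically partitioned per VM, while the single 3D engine is handed out in time slices only to VMs that actually have work queued, which is why idle VMs do not hurt density. This is purely illustrative; the real scheduler lives inside the NVIDIA vGPU Manager:

```python
# Toy model of time-sharing one 3D engine among vGPU-backed VMs:
# VMs with no queued work never receive a slice; busy VMs share evenly.
from collections import deque

def schedule_engine(pending, slices):
    """Hand out 'slices' engine time slices round-robin among busy VMs.

    pending maps VM name -> number of queued render jobs.
    Returns the order in which VMs got the engine.
    """
    queue = deque(vm for vm, jobs in pending.items() if jobs > 0)
    work = dict(pending)
    order = []
    for _ in range(slices):
        if not queue:
            break            # everyone idle: engine goes unused
        vm = queue.popleft()
        order.append(vm)
        work[vm] -= 1
        if work[vm] > 0:
            queue.append(vm)  # still busy: back of the queue
    return order

# vm3 is idle and never scheduled; vm1 and vm2 alternate fairly.
print(schedule_engine({"vm1": 2, "vm2": 2, "vm3": 0}, 4))
# -> ['vm1', 'vm2', 'vm1', 'vm2']
```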
POC – Organizational challenges
Source: Customer presentation
Clarification of support by the software vendors
Decision on license model for CAx-Applications on virtual machines - international usage - usage by external partners, etc.
Adjusting applications or the associated environment for an optimal use of the applications on virtual machines
Support model for company internal and external users
Targets & project schedule
The project result must be a validated technical solution that is provided to the customer’s internal departments and their external development partners as an IT service
Lifecycle of a successful GRID implementation
Phase 1 (TechDemo)
Phase 2 (Assessment & small and focussed PoC)
Phase 3 (widened PoC based on feedback)
Phase 4 (Implementation/User Acceptance/Production): Educate support engineers / introduce a support matrix
Implement daily management processes such as provisioning new VMs and patching existing ones
Phase 5 (Maintenance / Update / Daily Use)
Meanwhile things changed …
Evolution of NVIDIA GRID / vGPU: 2014 vGPU 1.1
Introduced 2 additional vGPU profiles: K100, K120Q, K140Q, K200, K220Q, K240Q, K260Q, passthrough
PowerShell interface available
nView and NVWMI supported on all vGPUs
Windows 8.1 and Windows Server 2012 R2 signed drivers are included
Various bug fixes
Expanded certified server and certified application lists
Evolution of NVIDIA GRID / vGPU: 2015 vGPU 1.2
Introduced 3 additional vGPU profiles: K100, K120Q, K140Q, K160Q, K180Q, K200, K220Q, K240Q, K260Q, K280Q, passthrough
XenServer 6.2 SP1 and XenServer 6.5
96 vGPUs per host on XenServer 6.5
...and more to come ... stay connected
Many customers are now in full production
Did we succeed?
How can we improve further?
Growing globalization within the company
Enabling remote sites across the globe
Increasing competition to hire the best
Allowing employees, partners and contractors to work from anywhere
Increasing competition to design and build faster with better quality
Increasing productivity and flexibility
Enable collaboration between internal and external teams
Increasing security breaches
Increasing the security and compliance
German law (“Arbeitnehmerüberlassung” – temporary agency work)
Enabling contractors to work off premise
Have we been successful?
✓ Achieved for each of the business drivers listed above
Lifecycle of a successful GRID implementation
Phase 1 (TechDemo)
Phase 2 (Assessment & small and focussed PoC)
Phase 3 (widened PoC based on feedback)
Phase 4 (Implementation/User Acceptance/Production)
Phase 5 (Maintenance / Update / Daily Use)
Maintenance / Update / Daily Use
Upgrade to XenServer 6.5
Upgrade to XenDesktop 7.6
Upgrade to the new GRID vGPU Manager and in-guest drivers
Lifecycle of applications, VMs & associated Baseimage
Educate the Nvidia/Citrix partners
Higher density (use H.264 Hardware encoding)
User Experience in hostile network environments (Framehawk)
Provide Linux-based VMs for CAE, etc.
Collaboration features
Best practices and application specific whitepaper
Room for improvement
Q & A
Summary – How successful projects lift off
Familiarize yourself with GRID (self-paced learning, demo/test system)
Do a proper assessment of existing workstations (real GPU usage)
Leverage or build a close relationship with the vendors (Citrix, NVIDIA, etc.)
Set the right expectations
Find a sponsor with a need to change the traditional workplace
Involve ALL people (IT, CAD/CAM department, end users, decision makers, experienced virtualization partner)
Leverage partners who are familiar with desktop virtualization
Specify phases (TechDemo, PoC, Implementation, Production)
Continuously listen to end-user feedback