GPU VirtualizationYiying Zhang
A Full GPU Virtualization Solution with Mediated
Pass-Through
Kun Tian, Yaozu Dong, David Cowperthwaite [email protected],
GPUvm:'Why'Not'Virtualizing'GPUs'at'the'Hypervisor?�
Yusuke'Suzuki*'in'collaboraBon'with'
Shinpei'Kato**,'Hiroshi'Yamada***,'Kenji'Kono*''
*'Keio'University'**'Nagoya'University'
***'Tokyo'University'of'Agriculture'and'Technology'�
Graphic'Processing'Unit'(GPU)�• GPUs'are'used'for'dataMparallel'computaBons'– Composed'of'thousands'of'cores'– Peak'doubleMprecision'performance'exceeds'1'TFLOPS'– PerformanceMperMwaT'of'GPUs'outperforms'CPUs'
• GPGPU'is'widely'accepted'for'various'uses'– Network'Systems'[Jang&et&al.&’11],'FS'[Silberstein&et&al.&’13]&
[Sun&et&al.&’12],'DBMS'[He&et&al.&’08]&etc.'
L2'Cache�L1� L1� L1� L1� L1� L1� L1�
Video'Memory�Main'Memory�CPU�
NVIDIA/GPU�
MoBvaBon�• GPU'is'not'the'firstMclass'ciBzen'of'cloud'compuBng'environment'– Can'not'mulBplex'GPGPU'among'virtual'machines'(VM)'– Can'not'consolidate'VMs'that'run'GPGPU'applicaBons'
• GPU'virtualizaBon'is'necessary'– VirtualizaBon'is'the'norms'in'the'clouds'
VM�
GPU�
VM� VM�
Hypervisor�Share'a'single'GPU'among'VMs' Physical'
Machine�
VirtualizaBon'Approaches'
• Categorized'into'three'approaches'1. I/O'passMthrough'2. API'remoBng'3. ParaMvirtualizaBon'
I/O'passMthrough'
• Amazon'EC2'GPU'instance,'Intel'VTMd&
– Assign'physical'GPUs'to'VMs'directly'
– MulBplexing'is'impossible'
VM�
GPU�
VM�
GPU�
VM�
Hypervisor�
…�
Assign'GPUs'
to'VMs'
directly'
GPU�
API'remoBng'• GViM'[Gupta&et&al.&’09],'rCUDA'[Duato&et&al&’10],'VMGL'[Largar?Cavilla&et&al.&’07]'etc.'– Forward'API'calls'from'VMs'to'the'host’s'GPUs'– API'and'its'version'compaBbility'problem'– Enlarge'the'trusted'compuBng'base'(TCB)'
Wrapper'Library'v4�
VM�
Hypervisor�
GPU�Forwarding'API'calls�
…�Wrapper'Library'v5�
VM�
Driver�
Host�Library'v4�Library'v4�
ParaMvirtualizaBon'
• VMWare'SVGA2'[Dowty&’09]'LoGV'[GoEschalk&et&al.&’10]'– Expose'an'ideal'GPU'device'model'to'VMs'– Guest'device'driver'must'be'modified'or'rewriTen'
Driver�
Host�
Hypervisor�
GPU�Hypercalls�
…�Library�VM�
PV'Driver�
Library�VM�
PV'Driver�
gVirt
� Full GPU virtualization
� Mediated Pass-through � Pass-through performance critical operations � Trap-and-emulate privileged operations
7
Full-featured vGPU
Up to 95% native performance
Scale up to 7 VMs
Run native graphics driver in VM
GPU Virtualization Approaches
8
API Forwarding
Direct Pass-Through
Performance
Feature
Sharing
Performance
Feature
Sharing
Full GPU Virtualization
Performance
Feature
Sharing
gVirt
� Open source implementation � GPL/BSD dual-license � Current based on Xen (codename as XenGT) � KVM support is coming
� Support Intel® Processor Graphics built into 4th
generation Intel® Core™ processors � Principles apply to different GPUs
� Trademarked as Intel® GVT-g
� Intel® Graphics Virtualization Technology for virtual GPU
9
Challenges
� Complexity in virtualizing a modern GPU
� Efficiency when sharing the GPU
� Secure isolation among the VMs
10
Architecture of Intel Processor Graphics
gVirt Architecture• gVirt stub
• Extends Xen vMMU, selectively present/hide address ranges to VMs
• Mediator
• Emulates vGPUs for privileged resources
• Context switches vGPUs
• Native driver in VM
• Directly access a portion of perf-critical resource
• QEM for legacy VGA mode
GPU Sharing
• Render engine scheduling
• Time slides of 16ms are assigned to VMs (vGPUs)
• Waits until the guest ring buffer to become idle before switching
• Render context switch
• Save/restore internal pipeline and I/O register states, and cache/TLB flush
Pass-Through Accesses• Graphics memory resource partition
• Each VM gets a (fixed) portion of the real graphics memory => perf impact
• Needs to translate between guest and host view => perf impact
• Translation can be avoided by adding fake (ballooned) guest address ranges
GPU Page Table Virtualization
GPU Page Table Virtualization
Command Protection• Command buffers
• The primary buffer is a statically allocated ring buffers
• Batch buffers are pages allocated on demand
• gVirt audits guest command buffers when commands are submitted
• to guarantee no unauthorized address references
• But what about the window after commands are submitted (and audited) to when they are actually executed?
• What if a malicious VM modifies its commands during this window?
Smart Shadowing
� Utilize specific programming model
20
Ring Buffer
Batch Buffer
Statically allocated
Limited page number Lazy
Shadowing
Allocated on-demand
Rare access after submission Write
Protection
Lazy Shadowing
21
VM Graphics
Driver
Mediator
GPU
Submit
Submit
Execute complete
Copy &
Audit
complete
Write-Protection
22
VM Graphics
Driver
Mediator
GPU
Submit
Audit &
Write-Protection on
Submit
Execute
complete &
Write-Protection off
Linux VM Performance
24
• 3D Benchmark: Phoronix Test Suite • LightsMark, OpenArena, UrbanTerror, Nexuiz
• 2D Benchmark: Cairo-perf-trace • Firefox-asteroids, firefox-scrolling, midori-zommed, gnome-system-monitor
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark* and MobileMark*, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the performance of that product when combined with other products. For more information go to http://www.intel.com/performance.
Summary
� Full GPU virtualization + mediated pass-through � Run native graphics driver in VM � Good balance for performance, feature and
sharing capability � Publicly available patches
� https://github.com/01org/XenGT-Preview-xen � https://github.com/01org/XenGT-Preview-kernel � https://github.com/01org/XenGT-Preview-qemu
30
GPUvm:'Why'Not'Virtualizing'GPUs'at'the'Hypervisor?�
Yusuke'Suzuki*'in'collaboraBon'with'
Shinpei'Kato**,'Hiroshi'Yamada***,'Kenji'Kono*''
*'Keio'University'**'Nagoya'University'
***'Tokyo'University'of'Agriculture'and'Technology'�
GPU'Internals�• PCIe'connected'discrete'GPU'(NVIDIA,'AMD'GPU)'• Driver'accesses'to'GPU'w/'MMIO'through'PCIe'BARs'• Three'major'components'– GPU&compuJng&cores,'GPU&channel&and'GPU&memory&
MMIO�
…�…� GPU'CompuBng'Cores�
GPU�
Driver,'Apps'(CPU)�
PCIe'BARs�GPU'
Channel�…�GPU'
Channel� GPU'Channels�
GPU'Memory�
GPU'Channel'&'CompuBng'Cores�• GPU'channel'is'a'hardware'unit'to'submit'commands'to'GPU'compuBng'cores'
• The'number'of'GPU'channels'is'fixed'• MulBple'channels'can'be'acBve'at'a'Bme'
App� App�
GPU'Channel�
GPU'Channel�
…�
GPU'Commands�
…�…�
GPU'CompuBng'Cores�
GPU�
Commands'are'executed'on'compuBng'cores�
CompuBng�
GPU'Memory�
• Memory'accesses'from'compuBng'cores'are'confined'by'GPU'page'tables'App� App�
GPU'Channel�
GPU'Channel� …�
GPU'Commands�
…�…�
GPU'CompuBng'Cores�
GPU�
GPU'Memory�
Pointer'to'GPU'Page'Table�
GPU'Virtual'Address�
GPU'Physical'Address�
GPU'Page'Table'
GPU'Page'Table'
Unified'Address'Space�
• GPU'and'CPU'memory'spaces'are'unified'– GPU'virtual'address'(GVA)'is'translated'CPU'physical'addresses'as'well'as'GPU'physical'addresses'(GPA)'
…�…�
GPU'CompuBng'Cores�
App�
GPU'Channel� …�
GPU'Commands�
GPU�
GPU'Memory�
GVA�
GPA�CPU'Memory�
CPU'physical'address�
Unified'Address'Space�
GPU'Page'Table'
GPUvm'overview�
• Isolate'GPU'channel,'compuBng'cores'&'memory'
GPU�GPU'Channel�
GPU'Channel�
…
Assigned'to'VM1�
GPU'Channel�
GPU'Channel�
…
Assigned'to'VM2�…�
GPU'Memory�
Assigned'to'VM1� …�Assigned'
to'VM2�
…�
VM1�
…�…�
GPU'CompuBng'Cores�
Time'Sharing�
…� Virtual'GPU�…�
VM2�
…� Virtual'GPU�…�
GPUvm'components�
1. GPU'shadow'page'table'– Isolate'GPU'memory'
2. GPU'shadow'channel'– Isolate'GPU'channels'
3. GPU'fairMshare'scheduler'– Isolate'GPU'Bme'using'GPU'compuBng'cores'
Conclusion�
• GPUvm'shows'the'design'of'full'GPU'virtualizaBon'– GPU'shadow'page'table'– GPU'shadow'channel'– GPU'fairMshare'scheduler'
• FullMvirtualizaBon'exhibits'nonMtrivial'overhead'– MMIO'handling'
• Intercept'TLB'flush'and'scan'page'table'– OpBmizaBons'and'paraMvirtualizaBon'reduce'this'overhead'
– However'sBll'2M3'Bmes'slower'