windows display driver model (wddm) v2 and beyond steve pronovost, microsoft henry moreton, nvidia...
TRANSCRIPT
Windows Display Driver Model (WDDM) v2 And Beyond
Steve Pronovost, Microsoft
Henry Moreton, NVIDIA
Tim Kelley, ATI
Outline
Introduction
Trends in use of GPU(s)
WDDM v1.0 overview
WDDM v2.x overview
Scenarios that benefit
Trends In Use Of GPU
Windows XP: Single client at a time
GDI desktop
Video decoding
Full screen game
CAD/Workstation applications
GPUs getting more flexible
Direct3D pushing increased programmability, precision and performance
Massive processing power, not fully utilized today
Trends In Use Of GPU
Windows Vista: Multiple clients together
Desktop window manager
WinFX APIs based on Direct3D 9
Picture, video playback, capture, encode, transcode, edit leverage GPUs
In-box games
Emerging General-Purpose GPU trend
Physics, image processing, etc.
WDDM v1.0
Designed to work on existing GPUs
Increase stability, robustness and security
GPU scheduling
Virtualized video memory
Resource virtualization seamless across legacy API
Ddraw, dx3, dx5, dx6, dx7, dx8, dx9, OGL
Use new API to take full advantage of resource virtualization
Direct3D 9Ex, Direct3D 10
WDDM v2.0
New generation of GPUs designed for multi-tasking
Mid command buffer preemption
Demand faulting of resources
Surface fault (preferred mode for v2.0)
Page fault (stall the GPU)
Per process page tables
Better multi-tasking than WDDM v1.0, still some client cooperation required
WDDM v2.1
Everything WDDM v2.0 GPU can do
Fine grained context switching
Can preempt mid pixel
Doesn’t stall GPU on page fault
True preemptive multi-tasking
Ultimate flexibility for the GPU
GPU can be used for any scenarios without impact on the desktop
WDDM Cheat Sheet
                    WDDM v1.0             WDDM v2.0              WDDM v2.1
Scheduling          Packet                RunList                RunList
Preemption          Packet                Mid Packet             Mid Pixel
Demand faulting     Not supported         Surface/Page (STALL)   Page
Memory Management   Physical/Contiguous   Virtual/Page table     Virtual/Page table
Multi-tasking       Cooperative           Mostly Preemptive      Truly Preemptive
WDDM 2.x Scheduling, Performance And Multi-GPU Support
Henry Moreton
NVIDIA
GPUs On The Desktop
The power of the GPU is finally tapped
Graphics
Video
Bandwidth and floating point (GPGPU)
Applications are vying for this powerful resource
The Vista Desktop Window Manager (DWM)
Photo editing
Video feeds
Personal Video Recorder
GPU Management Is Crucial
Applications naturally see the processor as their own
Great GPU tasks really exploit the power
But...
Some GPU operations are so massive they take non-trivial time
Some GPU operations are time sensitive
Management of the GPU is crucial to success (a happy user)
A Typical Situation (For Me)
Watching The Daily Show©
Doodling with photos
I find a great program for creating panoramas...
Today
I set it up with twelve, 6 mega-pixel images
Press go and wait... a long time (minutes)
Soon, with GPU acceleration, I press go and wait a second or two
But A Second Or Two Is A Long Time
Managed as a shared resource the GPU
Renders my video unaffected
Builds my panorama in no time...
Unmanaged
The Daily Show risks being a slide show...
So Scheduling Is Important
How does scheduling vary across
WDDM v1.0
WDDM v2.0
WDDM v2.1
What are the mechanics?
What is the context switch behavior?
What is expected performance?
With varying numbers of active contexts...
WDDM v2.x – The Care And Feeding Of The GPU
User Mode Driver (UMD)
Creates DMA buffer of commands
Kernel Mode Driver (KMD)
Appends DMA buffer to GPU context’s queue
The GPU Scheduler schedules contexts
A Run List of contexts each with its own ring buffer of DMA buffers
Run Lists
List of contexts (box)
GPU processes a context until
Context is completed (get new run list)
Scheduler pre-empts
Page fault – WDDM v2.1
Protection fault
Synchronization event
Multiple contexts per Run List
Hide latency
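The run list idea above can be pictured with a small model. This is a minimal sketch, not the WDDM scheduler: the Context class, the run() function and the per-turn budget are hypothetical names introduced for illustration, and only two of the listed switch reasons (context completed, scheduler pre-empts) are modelled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Context:
    """Hypothetical GPU context; dma_buffers stands in for its ring buffer."""
    name: str
    dma_buffers: deque = field(default_factory=deque)


def run(run_list, budget_per_turn):
    """Round-robin over the run list.

    Each turn executes at most budget_per_turn DMA buffers, after which the
    (simulated) scheduler pre-empts. Returns the execution order.
    """
    order = []
    pending = deque(run_list)
    while pending:
        ctx = pending.popleft()
        for _ in range(budget_per_turn):
            if not ctx.dma_buffers:
                break  # context completed
            order.append((ctx.name, ctx.dma_buffers.popleft()))
        if ctx.dma_buffers:
            pending.append(ctx)  # pre-empted: back onto the run list
    return order


video = Context("video", deque(["v0", "v1"]))
panorama = Context("panorama", deque(["p0", "p1", "p2"]))
for step in run([video, panorama], budget_per_turn=1):
    print(step)
```

With a budget of one buffer per turn, the video context's work is interleaved with the panorama's rather than waiting behind it, which is the behavior the talk is arguing for.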
How Nimble Is Context Switching?
XP
All Q’d DP2 buffers must complete (very coarse)
WDDM v1.0 – Basic scheduling
Current DMA buffer must complete (coarse)
WDDM v2.0
Switch on command/triangle (fine)
WDDM v2.1
Switch “immediately” (very fine)
Context Switch Guarantees
Pre WDDM v2.1 (XP, v1.0, v2.0)
No guarantee
VERY long shader, VERY large triangle slow to switch
Expected performance
Relatively coarse switching for XP and v1.0
V2.0: Good average/typical switch time
WDDM v2.1
Guaranteed to context switch
Same average/typical switch time as v2.0
Much better switch time on applications with long shaders
Context Switch Challenge
Because GPUs are heavily threaded there is much more state than on a CPU
Consider rendering @ 60 fps
17 millisecond frame time
With a context switch time of 100µs
Three concurrent applications see a ~2% context switch overhead
Fast GPU context switching is important and challenging!
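The ~2% figure can be reproduced with a short calculation. The sketch below assumes each of the three applications is switched in once per frame (three switches per frame); that reading is an assumption, since the slide does not state how the number was derived, and the function name is illustrative.

```python
def context_switch_overhead(fps, switch_time_us, num_contexts):
    """Fraction of a frame spent switching, assuming one switch per context per frame."""
    frame_time_us = 1_000_000 / fps  # 60 fps -> about 16,667 microseconds
    return (num_contexts * switch_time_us) / frame_time_us


overhead = context_switch_overhead(fps=60, switch_time_us=100, num_contexts=3)
print(f"{overhead:.1%}")  # prints 1.8%
```

Under the same assumption the overhead grows linearly with the number of active contexts, so ten contexts at 100µs each would cost about 6% of every frame.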
WDDM v2.x Efficiencies
WDDM v1.0
User Mode Driver (UMD) creates GPU-specific command buffer
KMD patches addresses
Copies to GPU visible DMA buffer
WDDM v2.0 and 2.1
UMD creates DMA buffer directly in GPU memory
No copy, no patch, fast and efficient
Performance – Memory Footprint
WDDM v1.0
No demand fault (page or surface)
Entire surfaces resident – coarse grained
OS must guarantee residence – CPU overhead
WDDM v2.0
Surface fault – supports load on bind
GPU switches to new context, no stalling
Fault and stall – permits partial eviction
GPU stalls waiting for missing page
WDDM v2.1
Page fault – permits partial eviction/residence
GPU switches to new context, no stalling
Multi-Engine, Multi-GPU Support
GPUs are composed of nodes of engines
Homogeneous nodes
3D nodes
Video nodes
Copy, etc.
RunList per engine
GPU Device-common address space
Multiple GPU Contexts (per engine)
Synchronization
Fence, Trap, Wait, Signal
[Figure: GPU containing nodes labelled 3D, 3D and video]
Multi-GPU
Linked Adapter
Single logical adapter
Multiple physical adapters
Memory
Mirrored or instanced
Broadcast – multiple DMA buffer references
Split Frame Rendering
WDDM v2.x Memory Management And Robustness
Tim Kelley
ATI
WDDM v1.0 Surface Mgmt
All allocations (surfaces) referenced in DMA buffer must be resident at GPU submit
Driver tracks every allocation reference in the DMA buffer
Contiguous memory for each allocation
DMA buffers patched with physical addresses once surfaces are resident
Driver defines DMA split points to identify minimal working set
Significant risk of graphics memory thrashing
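The v1.0 patching step can be pictured as follows. This is a minimal sketch under stated assumptions: the patch list of (slot, allocation handle) pairs, the residency map and the function name are all hypothetical and do not come from the WDDM driver interface.

```python
def patch_dma_buffer(commands, patch_list, physical_address_of):
    """Return a copy of the DMA buffer with allocation references patched.

    commands            -- list of command words; patched slots hold placeholders
    patch_list          -- (slot_index, allocation_handle) pairs tracked by the driver
    physical_address_of -- map from handle to physical address for resident allocations
    """
    patched = list(commands)
    for slot, handle in patch_list:
        if handle not in physical_address_of:
            # In the v1.0 model every referenced allocation must be resident at submit.
            raise LookupError(f"allocation {handle!r} is not resident")
        patched[slot] = physical_address_of[handle]
    return patched


commands = [0x10, 0, 0x20, 0]                  # slots 1 and 3 are placeholders
patch_list = [(1, "texture"), (3, "render_target")]
resident = {"texture": 0x8000, "render_target": 0xC000}
print([hex(word) for word in patch_list and patch_dma_buffer(commands, patch_list, resident)])
```

The sketch shows why the driver has to track every allocation reference: a single non-resident allocation blocks the whole submission.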
WDDM v2.0 Surface Faulting
A step in the right direction
GPU supports per process virtual memory
Two faulting behaviors
Surface fault and context switch
Page fault and stall
In surface faulting, GPU probes first page of surface
On probe of non-resident surface
GPU faults
GPU context switches to next run list entry
Context switch is coarse grained; graphics pipeline drains
OS VidMm issues paging requests
WDDM v2.0 Page Fault And Stall
Even if surface probe succeeds, entire surface may not be resident
GPU must still support page faulting
On access to a non-resident page
GPU faults and stalls
Driver informs OS of missing pages
OS VidMm issues paging requests
Driver restarts GPU once pages are resident
Entire working set doesn’t have to be resident simultaneously
WDDM v2.1 Page Faulting
Finally, full fledged page faulting with context switching!
GPUs support general page faulting and virtual memory per process
On a page fault, GPU context switches to next run list entry
Context switch is “immediate”
OS can partially populate allocations to reduce an app’s working set
GPU faults on non-resident page access
GPU context switches to next run list entry
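The difference between stalling on a page fault (v2.0) and switching on a page fault (v2.1) can be illustrated with a toy timing model. This is a sketch under simplifying assumptions: two contexts, a single fault at the very start of context A, no context switch cost, and arbitrary time units. The function name and parameters are illustrative only.

```python
def time_to_finish(work_a, work_b, paging_latency, switch_on_fault):
    """Time until contexts A and B both finish when A faults immediately.

    switch_on_fault=False models fault-and-stall: the GPU idles during paging.
    switch_on_fault=True models fault-and-switch: B runs while A's pages load.
    """
    if switch_on_fault:
        # B overlaps with the paging; A resumes once paging is done and the GPU is free.
        return max(work_b, paging_latency) + work_a
    # The GPU stalls for the whole paging latency, then runs A, then B.
    return paging_latency + work_a + work_b


print(time_to_finish(10, 5, 4, switch_on_fault=False))  # prints 19
print(time_to_finish(10, 5, 4, switch_on_fault=True))   # prints 15
```

In this model the paging latency is hidden entirely whenever the other context has at least that much work queued.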
Dedicated Paging Engine
Addition of high bandwidth copy engine for paging
Operates in parallel to 3D engine
GPU can perform paging operations for one context in parallel with 3D rendering for another context
Paging Determination
GPU reports faulting address
GPU/Driver determine set of pages needed to make further progress
GPU maintains a set of page access bits
OS VidMm uses the above to determine appropriate paging operations (including evictions)
Additionally, OS uses heuristics to preload pages
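One way to picture how page access bits could feed a paging decision is sketched below. This is not VidMm's algorithm, which the slides do not describe; the function, its parameters and the policy of evicting unaccessed pages first are assumptions made for illustration.

```python
def plan_paging(resident, accessed, needed, capacity):
    """Return (pages_to_evict, pages_to_load) for a toy paging policy.

    resident -- set of pages currently in graphics memory
    accessed -- set of resident pages whose access bit is set
    needed   -- pages reported as required to make further progress
    capacity -- total number of pages that fit in graphics memory
    """
    to_load = [p for p in needed if p not in resident]
    free_slots = capacity - len(resident)
    shortfall = max(0, len(to_load) - free_slots)
    # Never evict a needed page; prefer pages whose access bit is clear.
    cold = sorted(p for p in resident if p not in needed and p not in accessed)
    warm = sorted(p for p in resident if p not in needed and p in accessed)
    to_evict = (cold + warm)[:shortfall]
    return to_evict, to_load


evict, load = plan_paging(resident={1, 2, 3, 4}, accessed={1, 3},
                          needed=[3, 5, 6], capacity=4)
print(evict, load)  # prints [2, 4] [5, 6]
```

Here pages 2 and 4 are chosen for eviction because their access bits are clear, while the recently accessed page 1 stays resident.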
Efficient Memory Management
Steady state residency of surface data for applications
No texture thrashing for apps whose working set fits into graphics memory
No need for entire surface to be resident
Apps with large surfaces run fast in smaller local memory if working set fits
Page access info guides VidMm eviction and promotion
Reduced minimum physical memory requirements
WDDM v2.x Robustness
WDDM v2.x increases OS robustness
GPU uses virtual addressing instead of physical
Kernel mode driver (KMD) no longer patches DMA buffers with physical addresses
User Mode Driver (UMD) builds DMA buffer
KMD no longer validates command buffer
KMD no longer copies cmd buffer to DMA buffer
No DMA buffer splitting
UMD no longer identifies split points
OS no longer splits DMA buffers to fit resources
WDDM v2.1 Robustness
Guaranteed sub-triangle context switching
Driver processing on fault essentially eliminated
No application can hog GPU
Better application responsiveness
Applications with arbitrarily complex GPU processing do not hinder other applications
E.g., complex GPGPU number crunching alongside glitch free video
Security
Per-process virtual memory
Protection moved to GPU
Patching eliminated from driver
Privileged Operations
Privileged memory
More secure platform for future premium content protection
Privileged Operations
DMA buffers created in user mode cannot compromise the system
Can’t access memory belonging to other processes
Can’t interfere with correct and robust operation
Certain GPU operations are privileged and only available to KMD-built DMA buffers; examples include
Display settings
GPU configuration
Context switching controls
UMD-created DMA buffers cannot perform privileged operations
Privileged Memory
Provides secure location for page tables, ring buffers, and other allocations that should be protected
Malicious apps cannot compromise system security
GPU maintains per-page privilege setting (in page table)
Fault occurs on GPU access to privileged memory from limited DMA buffers constructed by UMD
GPU access allowed for privileged DMA buffers constructed by KMD
[Figure: diagram with labels "Page Table", "Bad DMA Buffer", "V2.1 GPU" and "Process Ring Buffer"]
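The per-page privilege check described on this slide can be sketched as a lookup against a page table. This is a minimal model under stated assumptions: the PageTableEntry fields, the ProtectionFault exception and the check_access function are hypothetical names, and real hardware performs this check in the GPU's memory management unit rather than in software.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageTableEntry:
    """Hypothetical page table entry carrying a per-page privilege bit."""
    resident: bool
    privileged: bool


class ProtectionFault(Exception):
    """Raised when a limited (UMD-built) DMA buffer touches privileged memory."""


def check_access(page_table, page, buffer_is_privileged):
    """Return the entry if the access is allowed, otherwise raise ProtectionFault."""
    entry = page_table.get(page)
    if entry is None:
        raise ProtectionFault(f"page {page} is not mapped for this process")
    if entry.privileged and not buffer_is_privileged:
        raise ProtectionFault(f"limited DMA buffer touched privileged page {page}")
    return entry


page_table = {
    0: PageTableEntry(resident=True, privileged=False),  # ordinary surface data
    1: PageTableEntry(resident=True, privileged=True),   # e.g. a ring buffer
}
check_access(page_table, 1, buffer_is_privileged=True)   # KMD-built buffer: allowed
try:
    check_access(page_table, 1, buffer_is_privileged=False)
except ProtectionFault as fault:
    print("fault:", fault)
```

Because the check is keyed on the page table rather than on the contents of the DMA buffer, a malformed user-mode buffer faults at the point of access instead of needing to be validated in advance.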
WDDM Future And Conclusion
Steve Pronovost
Microsoft
Future: WDDM 3.x
All the features of WDDM v2.1
Better support for content streaming
Virtual machine support
Call To Action
Invest in WDDM v2.x GPU
Find new interesting ways to use the GPU
Questions Or Feedback?
Send e-mail to DirectX @ microsoft.com
© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions,
it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.