windows display driver model (wddm) v2 and beyond steve pronovost, microsoft henry moreton, nvidia...

Windows Display Driver Model (WDDM) v2 And Beyond

Steve Pronovost, Microsoft

Henry Moreton, NVIDIA

Tim Kelley, ATI

Outline

Introduction

Trends in use of GPU(s)

WDDM v1.0 overview

WDDM v2.x overview

Scenarios that benefit

Trends In Use Of GPU

Windows XP: Single client at a time

GDI desktop

Video decoding

Full-screen game

CAD/workstation applications

GPUs getting more flexible

Direct3D pushing increased programmability, precision and performance

Massive processing power, not fully utilized today

Trends In Use Of GPU

Windows Vista: Multiple clients together

Desktop Window Manager

WinFX APIs based on Direct3D 9

Picture and video playback, capture, encode, transcode and edit leverage GPUs

In-box games

Emerging general-purpose GPU (GPGPU) trend

Physics, image processing, etc.

WDDM v1.0

Designed to work on existing GPUs

Increase stability, robustness and security

GPU scheduling

Virtualized video memory

Resource virtualization seamless across legacy APIs

DirectDraw, DX3, DX5, DX6, DX7, DX8, DX9, OpenGL

Use new APIs to take full advantage of resource virtualization

Direct3D 9Ex, Direct3D 10

WDDM v2.0

New generation of GPUs designed for multi-tasking

Mid-command-buffer preemption

Demand faulting of resources

Surface fault (preferred mode for v2.0)

Page fault (stalls the GPU)

Per-process page tables

Better multi-tasking than WDDM v1.0; some client cooperation still required

WDDM v2.1

Everything a WDDM v2.0 GPU can do

Fine-grained context switching

Can preempt mid-pixel

Doesn’t stall the GPU on a page fault

True preemptive multi-tasking

Ultimate flexibility for the GPU

GPU can be used for any scenario without impact on the desktop

WDDM Cheat Sheet

Feature           | WDDM v1.0             | WDDM v2.0              | WDDM v2.1
Scheduling        | Packet                | RunList                | RunList
Preemption        | Packet                | Mid-packet             | Mid-pixel
Demand faulting   | Not supported         | Surface / Page (stall) | Page
Memory management | Physical / contiguous | Virtual / page table   | Virtual / page table
Multi-tasking     | Cooperative           | Mostly preemptive      | Truly preemptive

WDDM v2.x Scheduling, Performance And Multi-GPU Support

Henry Moreton, NVIDIA

GPUs On The Desktop

The power of the GPU is finally tapped

Graphics

Video

Bandwidth and floating point (GPGPU)

Applications are vying for this powerful resource

The Vista Desktop Window Manager (DWM)

Photo editing

Video feeds

Personal Video Recorder

GPU Management Is Crucial

Applications naturally see the processor as their own

Great GPU tasks really exploit the power

But...

Some GPU operations are so massive they take non-trivial time

Some GPU operations are time sensitive

Management of the GPU is crucial to success (a happy user)

Watching The Daily Show©

Doodling with photos

I find a great program for creating panoramas...

Today

I set it up with twelve 6-megapixel images

Press go and wait... a long time (minutes)

Soon, with GPU acceleration, I press go and wait a second or two

A Typical Situation (For Me)

But A Second Or Two Is A Long Time

Managed as a shared resource, the GPU

Renders my video unaffected

Builds my panorama in no time...

Unmanaged

The Daily Show risks being a slide show...

So Scheduling Is Important

How does scheduling vary across

WDDM v1.0

WDDM v2.0

WDDM v2.1

What are the mechanics?

What is the context switch behavior?

What is the expected performance?

With varying numbers of active contexts...

WDDM v2.x – The Care And Feeding Of The GPU

User Mode Driver (UMD)

Creates DMA buffer of commands

Kernel Mode Driver (KMD)

Appends DMA buffer to GPU context’s queue

The GPU Scheduler schedules contexts

A Run List of contexts, each with its own ring buffer of DMA buffers
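The submission path above can be made concrete with a per-context ring buffer. The KMD appends DMA buffers at the tail, and the GPU consumes them from the head. This is a minimal Python sketch. The class and method names (`ContextRing`, `append`, `consume`) and the fixed capacity are illustrative assumptions, not WDDM DDIs.

```python
class ContextRing:
    """Hypothetical per-context ring buffer of queued DMA buffers."""

    def __init__(self, capacity: int):
        self.slots = [None] * capacity
        self.head = 0    # next entry the GPU will consume
        self.tail = 0    # next free entry the KMD will fill
        self.count = 0   # number of queued DMA buffers

    def append(self, dma_buffer) -> bool:
        """KMD side: queue a DMA buffer built by the UMD; False if the ring is full."""
        if self.count == len(self.slots):
            return False
        self.slots[self.tail] = dma_buffer
        self.tail = (self.tail + 1) % len(self.slots)
        self.count += 1
        return True

    def consume(self):
        """GPU side: dequeue the oldest DMA buffer, or None if the ring is empty."""
        if self.count == 0:
            return None
        dma_buffer = self.slots[self.head]
        self.slots[self.head] = None
        self.head = (self.head + 1) % len(self.slots)
        self.count -= 1
        return dma_buffer


# The run list: one ring per GPU context, in scheduling order.
run_list = {"dwm": ContextRing(4), "game": ContextRing(4)}
run_list["dwm"].append("compose-frame-1")
run_list["game"].append("draw-scene-1")
run_list["game"].append("draw-scene-2")
```

Because each context owns its own ring, the UMD and KMD of one application never need to touch another application's queue. Choosing among the rings is left to the scheduler.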

Run Lists

List of contexts

GPU processes a context until

Context is completed (get new run list)

Scheduler preempts

Page fault – WDDM v2.1

Protection fault

Synchronization event

Multiple contexts per Run List

Hide latency
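The run list behavior can be sketched as a small round-robin simulation. This is an illustrative model, not how any GPU implements scheduling. It makes three assumptions:

- Scheduler preemption is modeled as the end of a fixed `time_slice`.
- Page faults (WDDM v2.1) and synchronization waits are modeled as markers in a context's work.
- Protection faults are not modeled.

```python
from collections import deque

# Reasons a context leaves the GPU, mirroring the list above.
COMPLETED, PREEMPTED, PAGE_FAULT, SYNC_WAIT = "completed", "preempted", "page_fault", "sync_wait"


def run(run_list: deque, time_slice: int):
    """Round-robin over a run list of (name, steps) entries.

    Each step is either an int (units of GPU work) or a reason string
    (PAGE_FAULT, SYNC_WAIT) that forces a switch to the next entry.
    Returns a trace of (context, units_executed, reason) tuples.
    """
    trace = []
    while run_list:
        name, steps = run_list.popleft()
        executed, reason = 0, COMPLETED
        while steps:
            step = steps[0]
            if isinstance(step, str):          # fault or sync event: switch away
                steps.pop(0)
                reason = step
                break
            chunk = min(step, time_slice - executed)
            executed += chunk
            if chunk == step:
                steps.pop(0)
            else:
                steps[0] = step - chunk        # unfinished work stays queued
            if executed == time_slice and steps:
                reason = PREEMPTED             # scheduler preempts at slice end
                break
        trace.append((name, executed, reason))
        if steps:                              # unfinished: back of the run list
            run_list.append((name, steps))
    return trace


trace = run(deque([("video", [2]),
                   ("panorama", [10]),
                   ("photo", [1, PAGE_FAULT, 1])]), time_slice=4)
```

In this trace, the photo context's page fault lets the panorama context run instead of idling the GPU. That is the latency hiding that multiple contexts per run list provide.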

How Nimble Is Context Switching?

XP

All queued DP2 buffers must complete (very coarse)

WDDM v1.0 – Basic scheduling

Current DMA buffer must complete (coarse)

WDDM v2.0

Switch on command/triangle (fine)

WDDM v2.1

Switch “immediately” (very fine)

Context Switch Guarantees

Pre-WDDM v2.1 (XP, v1.0, v2.0)

No guarantee

VERY long shaders and VERY large triangles are slow to switch

Expected performance

Relatively coarse switching for XP and v1.0

v2.0: Good average/typical switch time

WDDM v2.1

Guaranteed to context switch

Same average/typical switch time as v2.0

Much better switch time for applications with long shaders

Context Switch Challenge

Because GPUs are heavily threaded, there is much more state than on a CPU

Consider rendering @ 60 fps

~17 millisecond frame time

With a context switch time of 100 µs

Three concurrent applications see a ~2% context switch overhead

Fast GPU context switching is important and challenging!
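The ~2% figure follows from simple arithmetic, assuming each of the three applications is switched in once per frame. Three switches of 100 µs cost 0.3 ms of every 16.7 ms frame, or about 1.8%. The helper below is hypothetical and only restates that calculation.

```python
def switch_overhead(fps: float, switch_time_s: float, contexts: int) -> float:
    """Fraction of each frame spent context switching, assuming each of
    `contexts` applications is switched in exactly once per frame."""
    frame_time_s = 1.0 / fps
    return contexts * switch_time_s / frame_time_s


overhead = switch_overhead(fps=60, switch_time_s=100e-6, contexts=3)
print(f"{overhead:.1%}")  # prints 1.8%
```

The overhead grows linearly with both the number of active contexts and the switch time. This is why switch time matters more as more applications share the GPU.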

WDDM v2.x Efficiencies

WDDM v1.0

User Mode Driver (UMD) creates GPU-specific command buffer

KMD patches addresses

Copies to GPU-visible DMA buffer

WDDM v2.0 and 2.1

UMD creates DMA buffer directly in GPU memory

No copy, no patch, fast and efficient
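Here is a simplified Python contrast of the two submission paths. The command-buffer layout, the opcodes, and the `(word_index, allocation_handle)` patch-list format are assumptions for illustration; real drivers use their own formats.

```python
def v1_submit(cmd_words, patch_list, physical_address_of):
    """WDDM v1.0-style submission (simplified).

    cmd_words: UMD-built command buffer as a list of integers
    patch_list: [(word_index, allocation_handle), ...] locations to patch
    physical_address_of: handle -> physical address of the resident,
        contiguous allocation
    """
    dma_buffer = list(cmd_words)                           # KMD copies...
    for index, handle in patch_list:
        dma_buffer[index] = physical_address_of[handle]    # ...and patches
    return dma_buffer


def v2_submit(dma_buffer):
    """WDDM v2.x-style submission: the UMD already wrote GPU virtual
    addresses into a GPU-visible buffer, so it is used as-is."""
    return dma_buffer


# Hypothetical opcodes 0x10/0x20, each followed by an address operand.
cmd = [0x10, 0, 0x20, 0]
patched = v1_submit(cmd, [(1, "texture"), (3, "render_target")],
                    {"texture": 0x8000_0000, "render_target": 0x9000_0000})

va_buffer = [0x10, 0x0001_0000, 0x20, 0x0002_0000]   # GPU virtual addresses
submitted = v2_submit(va_buffer)
```

The v1.0 path does work proportional to buffer size and patch count on every submission. The v2.x path hands over the same object untouched.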

Performance – Memory Footprint

WDDM v1.0

No demand fault (page or surface)

Entire surfaces resident – coarse grained

OS must guarantee residency – CPU overhead

WDDM v2.0

Surface fault – supports load on bind

GPU switches to new context, no stalling

Fault and stall – permits partial eviction

GPU stalls waiting for missing page

WDDM v2.1

Page fault – permits partial eviction/residency

GPU switches to new context, no stalling

Multi-Engine, Multi-GPU Support

GPUs are composed of nodes of engines

Homogeneous nodes

3D nodes

Video nodes

Copy, etc.

RunList per engine

GPU device-common address space

Multiple GPU contexts (per engine)

Synchronization

Fence, Trap, Wait, Signal

[Figure: a GPU composed of 3D and video engine nodes]
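The fence/wait/signal synchronization can be illustrated with a toy model in which the 3D engine waits for the copy engine to finish an upload. `Fence`, `run_engines`, and the command tuples are hypothetical, and "trap" is not modeled.

```python
class Fence:
    """Hypothetical fence: a monotonically increasing value shared by engines."""

    def __init__(self):
        self.value = 0

    def signal(self, value: int):
        if value < self.value:
            raise ValueError("fence values must not decrease")
        self.value = value


def run_engines(queues: dict, fence: Fence):
    """Step every engine queue one command at a time.

    Commands are ("work", label), ("signal", value) or ("wait", value).
    An engine whose wait is unsatisfied is skipped; the others keep running.
    """
    queues = {engine: list(cmds) for engine, cmds in queues.items()}
    log = []
    while any(queues.values()):
        progressed = False
        for engine, cmds in queues.items():
            if not cmds:
                continue
            op, arg = cmds[0]
            if op == "wait" and fence.value < arg:
                continue                      # blocked on the fence
            cmds.pop(0)
            progressed = True
            if op == "signal":
                fence.signal(arg)
            elif op == "work":
                log.append((engine, arg))
        if not progressed:
            raise RuntimeError("deadlock: every engine is waiting")
    return log


upload_done = Fence()
log = run_engines({"copy": [("work", "upload texture"), ("signal", 1)],
                   "3d":   [("wait", 1), ("work", "draw with texture")]},
                  upload_done)
```

The draw is guaranteed to follow the upload even though the two engines run independently. Only the 3D queue is held back, and only until the fence reaches 1.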

Multi-GPU

Linked Adapter

Single logical adapter

Multiple physical adapters

Memory

Mirrored or instanced

Broadcast – multiple DMA buffer references

Split Frame Rendering
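Split frame rendering divides each frame among the linked GPUs. For illustration, this sketch assumes the simplest partition: each GPU gets an equal band of scanlines. Practical implementations may rebalance the bands by load. The function name `split_frame` is hypothetical.

```python
def split_frame(height: int, gpu_count: int):
    """Assign contiguous bands of scanlines to each linked GPU.

    Returns [(first_row, end_row), ...] with end_row exclusive; any
    remainder rows go to the first GPUs.
    """
    base, extra = divmod(height, gpu_count)
    bands, start = [], 0
    for gpu in range(gpu_count):
        rows = base + (1 if gpu < extra else 0)
        bands.append((start, start + rows))
        start += rows
    return bands


bands = split_frame(1080, 2)   # [(0, 540), (540, 1080)]
```

Combined with broadcast, one DMA buffer could be referenced by every GPU, each restricted to its own band. That arrangement is an assumption of this sketch, not something the slide specifies.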

WDDM v2.x Memory Management And Robustness

Tim Kelley, ATI

WDDM v1.0 Surface Management

All allocations (surfaces) referenced in a DMA buffer must be resident at GPU submit

Driver tracks every allocation reference in the DMA buffer

Contiguous memory for each allocation

DMA buffers patched with physical addresses once surfaces are resident

Driver defines DMA split points to identify minimal working set

Significant risk of graphics memory thrashing

WDDM v2.0 Surface Faulting

A step in the right direction

GPU supports per-process virtual memory

Two faulting behaviors

Surface fault and context switch

Page fault and stall

In surface faulting, the GPU probes the first page of the surface

On probe of a non-resident surface

GPU faults

GPU context switches to next run list entry

Context switch is coarse grained; graphics pipeline drains

OS VidMm issues paging requests

WDDM v2.0 Page Fault And Stall

Even if the surface probe succeeds, the entire surface may not be resident

GPU must still support page faulting

On access to a non-resident page

GPU faults and stalls

Driver informs OS of missing pages

OS VidMm issues paging requests

Driver restarts GPU once pages are resident

Entire working set doesn’t have to be resident simultaneously
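The two WDDM v2.0 behaviors can be modeled per surface:

- A miss on the probed first page raises a surface fault and a context switch.
- A miss on any later page faults and stalls the GPU.

This Python sketch assumes, for simplicity, that VidMm makes the whole surface resident after a surface fault. The function and event names are hypothetical.

```python
def v20_access(surface_pages, touched_pages, resident):
    """Model WDDM v2.0 faulting for one surface.

    surface_pages: all virtual pages of the surface (the first is probed)
    touched_pages: pages the GPU actually reads or writes, in order
    resident: set of resident pages (updated in place)
    Returns a list of (event, page) tuples.
    """
    events = []
    first = surface_pages[0]
    if first not in resident:
        # Surface fault: the context switches out and VidMm pages the
        # surface in; the touched pages are accessed after rescheduling.
        events.append(("surface_fault_switch", first))
        resident.update(surface_pages)
        return events
    for page in touched_pages:
        if page not in resident:
            # Probe passed, but this page was evicted: fault and stall
            # while VidMm pages it in, then the driver restarts the GPU.
            events.append(("page_fault_stall", page))
            resident.add(page)
    return events


surface = [0, 1, 2, 3]
cold = v20_access(surface, [0, 2], set())            # nothing resident
partial = v20_access(surface, [0, 2, 3], {0, 1})     # partially evicted
```

The second case is the one the slide warns about. The probe succeeds, yet each missing page still costs a stall.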

WDDM v2.1 Page Faulting

Finally, full-fledged page faulting with context switching!

GPUs support general page faulting and virtual memory per process

On a page fault, GPU context switches to next run list entry

Context switch is “immediate”

OS can partially populate allocations to reduce an app’s working set

GPU faults on non-resident page access

GPU context switches to next run list entry
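Under WDDM v2.1 every miss is handled the same way, and the OS only needs to populate the pages actually touched. This is a minimal Python sketch with hypothetical names. The OS preload heuristics are not modeled.

```python
def v21_access(touched_pages, resident):
    """Model WDDM v2.1 page faulting for one context.

    touched_pages: pages the GPU accesses, in order
    resident: set of resident pages (updated in place)
    Returns (events, paged_in). Every miss faults and switches the GPU to
    the next run list entry; only the missing page is made resident.
    """
    events, paged_in = [], []
    for page in touched_pages:
        if page not in resident:
            events.append(("page_fault_switch", page))
            resident.add(page)      # VidMm pages it in; the context resumes later
            paged_in.append(page)
    return events, paged_in


# A 1024-page allocation of which the GPU touches only three pages.
resident = set()
events, paged_in = v21_access([5, 6, 5, 900], resident)
```

Only 3 of the 1024 pages become resident. This is the partial population that shrinks an application's working set. The faulting context never stalls the GPU, because the GPU moves on to the next run list entry.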

Dedicated Paging Engine

Addition of high-bandwidth copy engine for paging

Operates in parallel with 3D engine

GPU can perform paging operations for one context in parallel with 3D rendering for another context

Paging Determination

GPU reports faulting address

GPU/driver determine set of pages needed to make further progress

GPU maintains a set of page access bits

OS VidMm uses the above to determine appropriate paging operations (including evictions)

Additionally, OS uses heuristics to preload pages
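The slides do not specify how VidMm turns the access bits into eviction decisions. One plausible policy is a second-chance (clock-style) scan, shown here purely as an assumption. The scan evicts pages whose access bit is clear and clears the bit on pages it spares. `choose_evictions` is a hypothetical name.

```python
def choose_evictions(resident, accessed, needed):
    """Pick `needed` pages to evict, preferring pages the GPU has not
    accessed recently (second-chance scan over the access bits).

    resident: resident pages, oldest first
    accessed: pages whose access bit is set; bits are cleared as scanned
    """
    victims, queue = [], list(resident)
    while len(victims) < needed and queue:
        page = queue.pop(0)
        if page in accessed:
            accessed.discard(page)   # clear the bit and give a second chance
            queue.append(page)
        else:
            victims.append(page)
    return victims


access_bits = {1, 3}
victims = choose_evictions([1, 2, 3, 4], access_bits, needed=2)
```

Pages 1 and 3 were recently used, so pages 2 and 4 are evicted instead. If every page has its bit set, the first pass clears the bits and the second pass falls back to oldest-first.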

Efficient Memory Management

Steady-state residency of surface data for applications

No texture thrashing for apps whose working set fits into graphics memory

No need for entire surface to be resident

Apps with large surfaces run fast in smaller local memory if working set fits

Page access info guides VidMm eviction and promotion

Reduced minimum physical memory requirements

WDDM v2.x Robustness

WDDM v2.x increases OS robustness

GPU uses virtual addressing instead of physical

Kernel Mode Driver (KMD) no longer patches DMA buffers with physical addresses

User Mode Driver (UMD) builds DMA buffer

KMD no longer validates command buffer

KMD no longer copies command buffer to DMA buffer

No DMA buffer splitting

UMD no longer identifies split points

OS no longer splits DMA buffers to fit resources

WDDM v2.1 Robustness

Guaranteed sub-triangle context switching

Driver processing on fault essentially eliminated

No application can hog GPU

Better application responsiveness

Applications with arbitrarily complex GPU processing do not hinder other applications

E.g., complex GPGPU number crunching alongside glitch-free video

Security

Per-process virtual memory

Protection moved to GPU

Patching eliminated from driver

Privileged Operations

Privileged memory

More secure platform for future premium content protection

Privileged Operations

DMA buffers created in user mode cannot compromise the system

Can’t access memory belonging to other processes

Can’t interfere with correct and robust operation

Certain GPU operations are privileged and only available to KMD-built DMA buffers; examples include:

Display settings

GPU configuration

Context switching controls

UMD-created DMA buffers cannot perform privileged operations
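The rule that UMD-built DMA buffers cannot issue privileged operations can be sketched as a check made while the buffer executes. The opcode names and the `execute` function are hypothetical. The model assumes the GPU enforces the check per command, using only whether the buffer was built by the KMD.

```python
# Hypothetical opcode names for the privileged examples listed above.
PRIVILEGED_OPS = {"set_display_mode", "configure_gpu", "context_switch_control"}


def execute(dma_buffer, built_by_kmd: bool):
    """Hypothetical GPU command processor.

    Returns (executed_ops, faulting_op). A privileged op in a UMD-built
    (limited) DMA buffer stops execution with a fault; KMD-built
    (privileged) buffers may use every op.
    """
    executed = []
    for op in dma_buffer:
        if op in PRIVILEGED_OPS and not built_by_kmd:
            return executed, op          # fault: op is not executed
        executed.append(op)
    return executed, None


umd_result = execute(["draw", "set_display_mode"], built_by_kmd=False)
kmd_result = execute(["draw", "set_display_mode"], built_by_kmd=True)
```

The KMD never has to inspect the UMD's buffer. The same sequence of commands is simply rejected or accepted depending on who built it.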

Privileged Memory

Provides secure location for page tables, ring buffers, and other allocations that should be protected

Malicious apps cannot compromise system security

GPU maintains per-page privilege setting (in page table)

Fault occurs on GPU access to privileged memory from limited DMA buffers constructed by UMD

GPU access allowed for privileged DMA buffers constructed by KMD

[Figure: v2.1 GPU, process ring buffer, bad DMA buffer, page table]
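The per-page privilege setting can be sketched as a check during GPU address translation. The 4 KB page size, the `PTE` layout, and `translate` are assumptions for illustration. Two accesses fault in this model: a limited DMA buffer touching a privileged page, and any access to a page absent from the process's own page table.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size


@dataclass
class PTE:
    physical_page: int
    privileged: bool = False  # hypothetical per-page privilege bit


def translate(page_table, virtual_address, privileged_buffer):
    """Translate a GPU virtual address for one process.

    Returns ("ok", physical_address) or ("fault", reason). A page missing
    from the process's page table (for example, another process's memory)
    faults, and a limited (UMD-built) DMA buffer touching a privileged
    page faults.
    """
    vpn, offset = divmod(virtual_address, PAGE_SIZE)
    pte = page_table.get(vpn)
    if pte is None:
        return ("fault", "not_mapped")
    if pte.privileged and not privileged_buffer:
        return ("fault", "privilege")
    return ("ok", pte.physical_page * PAGE_SIZE + offset)


# Page 0: application data; page 1: the process's ring buffer (privileged).
page_table = {0: PTE(10), 1: PTE(11, privileged=True)}
```

A "bad" UMD-built DMA buffer that tries to overwrite the ring buffer at page 1 gets a fault. The same address is reachable from a KMD-built buffer.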

WDDM Future And Conclusion

Steve Pronovost, Microsoft

Future: WDDM v3.x

All the features of WDDM v2.1

Better support for content streaming

Virtual machine support

Call To Action

Invest in WDDM v2.x GPUs

Find new interesting ways to use the GPU

Questions Or Feedback?

Send e-mail to DirectX @ microsoft.com

© 2006 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
