![Page 1: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/1.jpg)
GPU Performance Tools
Sébastien Dominé
Manager of Developer Technology Tools
NVIDIA Corporation
![Page 2: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/2.jpg)
Agenda
• Tools of Today
  – NVPerfHUD 2.0
  – NVShaderPerf
  – FX Composer 1.1
• Tools of Tomorrow
  – Instrumented Driver
  – FX Composer 1.5
• Conclusion and Q&A
![Page 3: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/3.jpg)
The Tools of Today
![Page 4: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/4.jpg)
NVPerfHUD 2.0
• Overlay graph that displays stats from:
  – Direct3D9 API interception layer
  – Direct3D Driver
• Able to bypass and inject some API calls to assist with performance analysis
Image courtesy of FutureMark Corp.
![Page 5: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/5.jpg)
• Frame rate
• Number of triangles/frame
• Elapsed time in the session
Current memory footprint:
• AGP
• Video Memory
Driver Instrumentation:
• Time spent in Frame
• Time spent in Driver
• Driver waiting for GPU (Spin)
• GPU Idle Performance Counter
Histogram of Draw Primitives Batches
Total number of Draw Primitives Batches:
• Current
• Value over time
Image courtesy of FutureMark Corp.
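The driver instrumentation counters listed above are typically read together: a GPU that sits idle suggests the CPU is not feeding it fast enough, while a driver that spins waiting for the GPU suggests the opposite. The sketch below is one possible way to turn those readings into a rough label. The field names, the 10% threshold, and the three labels are assumptions made for illustration and are not part of NVPerfHUD.

```python
from dataclasses import dataclass


@dataclass
class FrameSample:
    """One frame of timings, in milliseconds (hypothetical field names)."""
    frame_ms: float        # time spent in the frame
    driver_ms: float       # time spent in the driver
    driver_wait_ms: float  # driver waiting for the GPU (spin)
    gpu_idle_ms: float     # GPU idle time


def classify(sample, threshold=0.10):
    """Return a rough bottleneck label for one frame.

    The threshold is an arbitrary illustrative cut-off, expressed as a
    fraction of the frame time.
    """
    if sample.frame_ms <= 0:
        raise ValueError("frame_ms must be positive")
    wait = sample.driver_wait_ms / sample.frame_ms
    idle = sample.gpu_idle_ms / sample.frame_ms
    if idle > threshold and idle >= wait:
        return "CPU-limited"   # the GPU is starved for work
    if wait > threshold:
        return "GPU-limited"   # the CPU is stalled behind the GPU
    return "balanced"
```

For example, `classify(FrameSample(16.0, 4.0, 0.2, 6.0))` reports a CPU-limited frame because the GPU is idle for more than a third of it.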
![Page 6: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/6.jpg)
What’s new in 2.0?
[Diagram: the stack 3D Application, DirectX Runtime, DirectX Driver, HW shown twice, once labeled NVPerfHUD 1.0 and once labeled NVPerfHUD 2.0; a "stats" label appears with the NVPerfHUD 2.0 stack]
![Page 7: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/7.jpg)
• DrawPrimitives/DrawIndexPrimitives Histogram
[Figure: histogram; x-axis: # of triangles (ticks 0, 100, 500, 1000); y-axis: # of DrawPrimitives (ticks 1000, 2000)]
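A batch-size histogram like the one on this slide can be approximated in application code by bucketing each draw call by its triangle count. The following is a minimal sketch under stated assumptions: the bucket edges reuse the x-axis ticks (0, 100, 500, 1000), and the function name and labels are hypothetical rather than anything NVPerfHUD exposes.

```python
from bisect import bisect_right
from collections import Counter

# Lower edge of each bucket, assumed from the x-axis ticks of the figure.
BUCKET_EDGES = [0, 100, 500, 1000]


def batch_histogram(triangle_counts, edges=BUCKET_EDGES):
    """Count draw calls per triangle-count bucket."""
    hist = Counter()
    for n in triangle_counts:
        if n < edges[0]:
            raise ValueError("triangle count below the first bucket edge")
        i = bisect_right(edges, n) - 1          # index of the bucket holding n
        lo = edges[i]
        if i + 1 < len(edges):
            label = f"{lo}-{edges[i + 1] - 1}"  # closed range, e.g. "0-99"
        else:
            label = f"{lo}+"                    # open-ended last bucket
        hist[label] += 1
    return dict(hist)
```

A tall leftmost bucket means many small batches, which is the pattern this view is meant to expose.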
![Page 8: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/8.jpg)
Texture Stage States
Image courtesy of FutureMark Corp.
![Page 9: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/9.jpg)
Pixel Shaders 1.x
Image courtesy of FutureMark Corp.
![Page 10: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/10.jpg)
Pixel Shaders 2.0
Image courtesy of FutureMark Corp.
![Page 11: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/11.jpg)
2x2 Texture replacement
Image courtesy of FutureMark Corp.
![Page 12: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/12.jpg)
Null DrawPrimitive mode
Image courtesy of FutureMark Corp.
![Page 13: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/13.jpg)
NVPerfHUD - Overhead
• NVPerfHUD is fairly lean but...
• The overlay graph and interception alone can cost up to 1.3%
• Driver instrumentation can cost up to 6%
• Upper bound for total cost: 7%
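To see what these overhead figures mean for a measurement, the sketch below brackets the frame rate an application might reach without the tool attached. It assumes that "cost" means a fractional loss of frame rate, which the slide does not state, and the function name is hypothetical.

```python
OVERLAY_MAX_COST = 0.013  # overlay graph and interception only
DRIVER_MAX_COST = 0.06    # driver instrumentation
TOTAL_MAX_COST = 0.07     # upper bound quoted on the slide


def uninstrumented_fps_range(measured_fps, max_cost=TOTAL_MAX_COST):
    """Return (low, high) bounds on the frame rate without the tool.

    Assumes measured_fps = true_fps * (1 - cost) with 0 <= cost <= max_cost.
    """
    if measured_fps <= 0:
        raise ValueError("measured_fps must be positive")
    if not 0.0 <= max_cost < 1.0:
        raise ValueError("max_cost must be in [0, 1)")
    return measured_fps, measured_fps / (1.0 - max_cost)
```

Under that assumption, a reading of 93 fps with everything enabled corresponds to at most about 100 fps without the tool.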
![Page 14: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/14.jpg)
NVPerfHUD - Demo
Demo running FutureMark’s 3DMark2003
![Page 15: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/15.jpg)
v2f BumpReflectVS(a2v IN,
                  uniform float4x4 WorldViewProj,
                  uniform float4x4 World,
                  uniform float4x4 ViewIT)
{
    v2f OUT;
    // Position in screen space.
    OUT.Position = mul(IN.Position, WorldViewProj);
    // pass texture coordinates for fetching the normal map
    OUT.TexCoord.xyz = IN.TexCoord;
    OUT.TexCoord.w = 1.0;
    // compute the 3x3 transform from tangent space to object space
    float3x3 TangentToObjSpace;
    // first rows are the tangent and binormal scaled by the bump scale
    TangentToObjSpace[0] = float3(IN.Tangent.x, IN.Binormal.x, IN.Normal.x);
    TangentToObjSpace[1] = float3(IN.Tangent.y, IN.Binormal.y, IN.Normal.y);
    TangentToObjSpace[2] = float3(IN.Tangent.z, IN.Binormal.z, IN.Normal.z);
    OUT.TexCoord1.x = dot(World[0].xyz, TangentToObjSpace[0]);
    OUT.TexCoord1.y = dot(World[1].xyz, TangentToObjSpace[0]);
    OUT.TexCoord1.z = dot(World[2].xyz, TangentToObjSpace[0]);
    OUT.TexCoord2.x = dot(World[0].xyz, TangentToObjSpace[1]);
    OUT.TexCoord2.y = dot(World[1].xyz, TangentToObjSpace[1]);
    OUT.TexCoord2.z = dot(World[2].xyz, TangentToObjSpace[1]);
    OUT.TexCoord3.x = dot(World[0].xyz, TangentToObjSpace[2]);
    OUT.TexCoord3.y = dot(World[1].xyz, TangentToObjSpace[2]);
    OUT.TexCoord3.z = dot(World[2].xyz, TangentToObjSpace[2]);
    float4 worldPos = mul(IN.Position, World);
    // compute the eye vector (going from shaded point to eye) in cube space
    float4 eyeVector = worldPos - ViewIT[3]; // view inv. transpose contains eye position in world space in last row.
    OUT.TexCoord1.w = eyeVector.x;
    OUT.TexCoord2.w = eyeVector.y;
    OUT.TexCoord3.w = eyeVector.z;
    return OUT;
}
///////////////// pixel shader //////////////////
float4 BumpReflectPS(v2f IN,
                     uniform sampler2D NormalMap,
                     uniform samplerCUBE EnvironmentMap,
                     uniform float BumpScale) : COLOR
{
    // fetch the bump normal from the normal map
    float3 normal = tex2D(NormalMap, IN.TexCoord.xy).xyz * 2.0 - 1.0;
    normal = normalize(float3(normal.x * BumpScale, normal.y * BumpScale, normal.z));
    // transform the bump normal into cube space
    // then use the transformed normal and eye vector to compute a reflection vector
    // used to fetch the cube map
    // (we multiply by 2 only to increase brightness)
    float3 eyevec = float3(IN.TexCoord1.w, IN.TexCoord2.w, IN.TexCoord3.w);
    float3 worldNorm;
    worldNorm.x = dot(IN.TexCoord1.xyz, normal);
    worldNorm.y = dot(IN.TexCoord2.xyz, normal);
    worldNorm.z = dot(IN.TexCoord3.xyz, normal);
    float3 lookup = reflect(eyevec, worldNorm);
    return texCUBE(EnvironmentMap, lookup);
}
NVShaderPerf
Inputs:
• HLSL
• !!FP1.0
• !!ARBfp1.0
• PS1.x
• PS2.x
NVShaderPerf
GPU Arch:
• NV3X
• ...and more
![Page 16: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/16.jpg)
NVShaderPerf
[Diagram: the stack Direct3D Application, DirectX Runtime, DirectX Driver, HW, shown with the Unified Compiler; data labels: HLSL, Direct3D shader op-codes, API agnostic shader op-codes, HW Binary]
![Page 17: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/17.jpg)
NVShaderPerf - Demo
![Page 18: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/18.jpg)
FX Composer 1.1
• IDE for HLSL authoring, debugging and optimization
• Pixel Shader scheduling
• Direct3D9 VS/PS op-code disassembly
• Advanced texture generation for baking Look Up Tables
![Page 19: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/19.jpg)
FX Composer – Shader Perf
•Target GPU
•Driver version match
•Number of Cycles
•Number of Registers
•GPU Utilization
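A cycle count per pixel can be turned into a rough throughput ceiling with simple arithmetic. The sketch below assumes an idealized model in which every pixel pipeline retires one pixel every `cycles_per_pixel` clocks; real hardware will fall short of this. The clock speed, pipeline count, and function names are hypothetical values chosen for illustration, not figures reported by FX Composer.

```python
def peak_pixels_per_second(clock_hz, pixel_pipes, cycles_per_pixel):
    """Idealized shading rate: clock * pipes / cycles per pixel."""
    if cycles_per_pixel <= 0:
        raise ValueError("cycles_per_pixel must be positive")
    return clock_hz * pixel_pipes / cycles_per_pixel


def full_screen_passes_per_second(clock_hz, pixel_pipes, cycles_per_pixel,
                                  width, height):
    """How many full-screen passes per second the ceiling would allow."""
    rate = peak_pixels_per_second(clock_hz, pixel_pipes, cycles_per_pixel)
    return rate / (width * height)


# Hypothetical GPU: 400 MHz, 4 pixel pipelines, a 10-cycle shader.
rate = peak_pixels_per_second(400e6, 4, 10)
passes = full_screen_passes_per_second(400e6, 4, 10, 1024, 768)
```

Halving the cycle count doubles this ceiling, which is why the cycle count is the first number to watch when optimizing a shader.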
![Page 20: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/20.jpg)
FX Composer - Disassembly
•Vertex Shader
•Pixel Shader 1.x, 2.x
![Page 21: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/21.jpg)
FX Composer – LUT Optimization
• Bake your own texture for function look up:
  – Normalization cubemaps
  – Lighting computation
  – Expensive math
  – Functions that can be artist controlled...
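The idea behind baking a look-up texture can be sketched on the CPU: evaluate an expensive function once at evenly spaced inputs, then replace later evaluations with a filtered fetch. The example below is a minimal 1D illustration, not FX Composer's implementation. The function names, the 256-entry size, and the specular power of 32 are assumptions, and the linear interpolation with clamping stands in for bilinear texture filtering with clamp addressing.

```python
def bake_lut(func, size=256, lo=0.0, hi=1.0):
    """Evaluate func at `size` evenly spaced points in [lo, hi]."""
    if size < 2:
        raise ValueError("size must be at least 2")
    step = (hi - lo) / (size - 1)
    return [func(lo + i * step) for i in range(size)]


def sample_lut(lut, x, lo=0.0, hi=1.0):
    """Fetch from the table with clamping and linear interpolation."""
    clamped = min(max(x, lo), hi)
    t = (clamped - lo) / (hi - lo) * (len(lut) - 1)
    i = min(int(t), len(lut) - 2)   # left neighbour, kept in range
    frac = t - i
    return lut[i] * (1.0 - frac) + lut[i + 1] * frac


# Example: a specular falloff, pow(N.H, 32), baked once.
specular_lut = bake_lut(lambda n_dot_h: n_dot_h ** 32)
approx = sample_lut(specular_lut, 0.9)
```

The trade-off is the usual one for look-up tables: a texture fetch and some memory in exchange for arithmetic, with accuracy limited by table size and filtering.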
![Page 22: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/22.jpg)
FX Composer - Demo
Overview of a performance tutorial
![Page 23: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/23.jpg)
The Tools of Tomorrow
![Page 24: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/24.jpg)
Instrumented Driver
• Perfect companion for Intel VTune, MSFT PIX for Windows, Perfmon, etc...
• Allows 3D applications to monitor:
  – Resources available (AGP, etc...)
  – Driver counters (Spins, etc...)
  – Hardware counters (bottlenecks, etc...)
![Page 25: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/25.jpg)
Driver Instrumentation Architecture
Windows Performance Data Helper (PDH)
NVPMAPI.DLL
VTune
PIX for Windows
Game Engine
OpenGL Driver
Direct3D Driver
NVIDIA Developer Control Panel
![Page 26: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/26.jpg)
NV Dev Control Panel
![Page 27: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/27.jpg)
Instrumented Driver - Demo
• Direct3D9
• OpenGL
• HW Counters
![Page 28: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/28.jpg)
FX Composer 1.5
• Vertex Shader Scheduling
• Texture Anisotropic Wizardry
• Support for next generation GPU
• …and much more to come!
![Page 29: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/29.jpg)
Conclusion
• API Performance
• Unified Compiler Performance
• Driver Performance
• HW Performance
• What else do you need?
![Page 30: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/30.jpg)
Other talks of interest...
• Wed: Practical Performance Analysis and Tuning [4:00pm – 5:00pm] by Ashu Rege and Clint Brewer
![Page 31: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/31.jpg)
Credits
• FutureMark for letting us use 3DMark2003
• Special thanks to Raul Aguaviva, Jeffrey Kiel and Christopher Maughan for making these tools!
![Page 33: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/33.jpg)
Graphics Pipeline
[Diagram: pipeline blocks CPU, Geometry Storage, Geometry Processor, Rasterizer, Fragment Processor, Texture Storage + Filtering, Framebuffer; data labels Vertices and Pixels; regions labeled CPU/Bus and GPU; numbered markers 1, 3, 2]
![Page 34: GPU Performance Tools - http.download.nvidia.comhttp.download.nvidia.com/developer/presentations/GDC_2004/gdc_2004...• Overlay graph and interception only can costs up to 1.3%](https://reader030.vdocument.in/reader030/viewer/2022021510/5aa739af7f8b9a294b8bcdab/html5/thumbnails/34.jpg)
PC Driver Model
[Diagram: blocks 3D Application, DirectX Runtime, DirectX Driver, OpenGL, OpenGL Driver, Unified Compiler, HW]