Source: stanley.gatech.edu/.../466/2016/07/gpulecture07su16_underthehood.pdf
GPUs – Under the Hood
Prof. Aaron Lanterman
School of Electrical and Computer Engineering
Georgia Institute of Technology
Bandwidth – Gravity of modern computer systems
• The bandwidth between key components ultimately dictates system performance
  – Especially true for massively parallel systems processing massive amounts of data
  – Tricks like buffering, reordering, and caching can temporarily defy the rules in some cases
  – Ultimately, the performance falls back to what the “feeds and speeds” dictate
  – PCIe replaced AGP (Accelerated Graphics Port)
Slide by David Kirk/NVIDIA and Wen-mei W. Hwu, 2007, from UIUC ECE498 Lecture 6, Fall 2007; used with permission. See http://courses.engr.illinois.edu/ece498/al
3D buzzwords
• Fill Rate – how fast the GPU can generate pixels, often a strong predictor for application frame rate
• Performance Metrics
  – Mtris/sec - Triangle Rate
  – Mverts/sec - Vertex Rate
  – Mpixels/sec - Pixel Fill (Write) Rate
  – Mtexels/sec - Texture Fill (Read) Rate
  – Msamples/sec - Antialiasing Fill (Write) Rate
Slide by David Kirk/NVIDIA and Wen-mei W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission. See http://courses.engr.illinois.edu/ece498/al
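To see why pixel fill rate is a strong predictor of frame rate, here is a rough back-of-the-envelope sketch. The resolution, overdraw factor, and the 1000 Mpixels/sec figure are illustrative assumptions, not numbers from the slides, and the function names are made up for this example.

```python
def required_fill_rate_mpixels(width, height, overdraw, fps):
    """Pixel writes per second (in Mpixels/sec) needed to sustain fps."""
    return width * height * overdraw * fps / 1e6


def max_frame_rate(fill_rate_mpixels, width, height, overdraw):
    """Upper bound on frame rate if pixel fill were the only limit."""
    return fill_rate_mpixels * 1e6 / (width * height * overdraw)


# Assumed scenario: 1920x1080, each pixel written about 3 times per frame.
needed = required_fill_rate_mpixels(1920, 1080, overdraw=3.0, fps=60)
bound = max_frame_rate(1000.0, 1920, 1080, overdraw=3.0)
print(f"needed: {needed:.1f} Mpixels/sec; bound at 1000 Mpixels/sec: {bound:.1f} fps")
```

The bound ignores vertex, texture, and bandwidth limits, so a real application would land below it.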
Adding programmability to the pipeline
[Diagram: the programmable graphics pipeline. Stages, in order: 3D Application or Game → 3D API (OpenGL or Direct3D) → GPU Front End → Programmable Vertex Processor → Primitive Assembly → Rasterization & Interpolation → Programmable Fragment Processor → Raster Operations → Framebuffer. Data passed between stages: 3D API Commands; GPU Command & Data Stream (crossing the CPU – GPU Boundary); Vertex Index Stream; Pre-transformed Vertices; Transformed Vertices; Assembled Polygons, Lines, and Points; Pixel Location Stream; Rasterized Pre-transformed Fragments; Transformed Fragments; Pixel Updates.]
Slide by David Kirk/NVIDIA and Wen-mei W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission. See http://courses.engr.illinois.edu/ece498/al
Shader data
• Typically floats, and vectors/matrices of floats
• Fixed size arrays
• Three main types:
  – Per-instance data, e.g., per-vertex position
  – Per-pixel interpolated data, e.g., texture coordinates
  – Per-batch data, e.g., light position
• Data are tightly bound to the GPU
Shader flow control
• Very simple
• No recursion
• Fixed size loops for Shader Model 2.0 or earlier
• Simple if-then-else statements allowed in the latest APIs
• Texkill (asm) or clip (HLSL) or discard (GLSL) allows you to abort a write to a pixel (form of flow control)
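As a rough illustration of aborting a pixel write, the following Python sketch mimics an alpha-test shader. It is a CPU-side toy model, not shader code: returning `None` stands in for texkill/clip/discard, and the 0.5 threshold and function names are assumptions made for this example.

```python
def alpha_test_shader(texel_rgba, threshold=0.5):
    """Return the color to write, or None to signal a discarded fragment."""
    if texel_rgba[3] < threshold:
        return None  # stands in for discard: nothing is written for this pixel
    return texel_rgba


def shade_row(framebuffer_row, texels):
    """Run the shader on each fragment; discarded fragments leave the pixel as-is."""
    out = list(framebuffer_row)
    for x, texel in enumerate(texels):
        color = alpha_test_shader(texel)
        if color is not None:
            out[x] = color
    return out


background = [(0.0, 0.0, 0.0, 1.0)] * 3
texels = [(1.0, 0.0, 0.0, 1.0), (0.0, 1.0, 0.0, 0.2), (0.0, 0.0, 1.0, 0.9)]
result = shade_row(background, texels)
print(result)  # the middle pixel keeps the background color
```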
Specialized instructions (GeForce 6)
• Dot products
• Exponential instructions:
  – EXP, LOG
  – LIT (Blinn specular lighting model calculation!)
• Reciprocal instructions:
  – RCP (reciprocal)
  – RSQ (reciprocal square root!)
• Trigonometric functions
  – SIN, COS
• Swizzling (swapping xyzw), write masking (only some xyzw get assigned), and negation are “free”
From GPU Gems 2, p. 484
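The swizzle, write-mask, and negation operand modifiers can be modeled in a few lines of Python. This is only a sketch of the semantics on 4-component tuples; the helper names are invented for this example, and nothing here reflects how the hardware implements them.

```python
_COMPONENT = {"x": 0, "y": 1, "z": 2, "w": 3, "r": 0, "g": 1, "b": 2, "a": 3}


def swizzle(v, pattern):
    """Rearrange or replicate components, e.g. swizzle(v, 'zzzz')."""
    return tuple(v[_COMPONENT[c]] for c in pattern)


def negate(v):
    """Negate every component of a source operand."""
    return tuple(-c for c in v)


def masked_write(dest, src, mask):
    """Assign only the components named in mask; the rest of dest is kept."""
    out = list(dest)
    for c in mask:
        out[_COMPONENT[c]] = src[_COMPONENT[c]]
    return tuple(out)


r1 = (1.0, 2.0, 3.0, 4.0)
print(swizzle(r1, "zzzz"))                              # replicate z
print(swizzle(r1, "xxyz"))                              # arbitrary reorder
print(masked_write((0.0, 0.0, 0.0, 0.0), negate(r1), "xz"))  # write only x and z
```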
Vertex shader
• Transform to clip space
• Inputs:
  – Common inputs:
    • Vertex position (x, y, z, w)
    • Texture coordinate
    • Vertex colors
    • Constant inputs
  – Output to a pixel (fragment) shader
• Vertex shader is executed once per vertex, so usually less expensive than pixel shader
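A minimal sketch of what a vertex shader does with these inputs, written in Python rather than a shader language. It assumes a row-major 4x4 matrix applied to a column vector and uses a plain translation as a stand-in for a real model-view-projection matrix; all names are hypothetical.

```python
def mat4_mul_vec4(m, v):
    """Multiply a row-major 4x4 matrix by a 4-component column vector."""
    return tuple(sum(m[row][col] * v[col] for col in range(4)) for row in range(4))


def vertex_shader(position, texcoord, color, model_view_proj):
    """Runs once per vertex: transform the position, pass the rest through."""
    clip_position = mat4_mul_vec4(model_view_proj, position)
    return clip_position, texcoord, color


# Stand-in for a model-view-projection matrix: translate by (2, 3, 4).
translate = [
    [1.0, 0.0, 0.0, 2.0],
    [0.0, 1.0, 0.0, 3.0],
    [0.0, 0.0, 1.0, 4.0],
    [0.0, 0.0, 0.0, 1.0],
]
out_pos, out_uv, out_color = vertex_shader(
    (1.0, 1.0, 1.0, 1.0), (0.5, 0.5), (1.0, 1.0, 1.0, 1.0), translate
)
print(out_pos)
```

The matrix plays the role of a constant input: it is set once per batch and shared by every vertex.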
Vertex shader data flow (3.0)
[Diagram: the vertex stream feeds 16 vertex data registers (v0–v15). The vertex shader also reads constant float registers (C0–Cn, at least 256), 16 constant integer registers, 32 temporary registers (r0–r31), the address register a0, and the loop register aL. It writes 12 output registers: oPos (position), oTn (texture), oFog (fog), oD0 (diffuse color), oD1 (specular color), and oPts (output point size). Each register is a 4-component vector register except aL.]
Vertex shader: logical view
[Diagram: a Vertex Processing Unit takes per-vertex input data and produces per-vertex output data (transformed and lit vertices). Inside it: a register file (r0, r1, r2, r3, ...), a swizzle/mask unit (.rgba, .xyzw, .zzzz, .xxyz, ...), a math/logic unit (cosine, log, sine, sub, add, ...), and a sampler unit. Shader resources bound by the application: shader start address, bound textures, bound samplers, bound constants. Memory holds texture memory, shader constants, and state information. Legend: input data, output data, architectural state, control logic.]
Some uses of vertex shaders
• Transform vertices to clip space
• Pass normal, texture coordinates to the pixel shader (PS)
• Transform vectors to other spaces (e.g., texture space)
• Calculate per-vertex lighting (e.g., Gouraud shading)
• Distort geometry (waves)
Adapted from Mart Slot’s presentation
Easy cross products and normalization
From Stanford CS448A: Real-Time Graphics Architectures See graphics.stanford.edu/courses/cs448a-01-fall
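The slide image is not preserved in this transcript, so the following is a sketch of the commonly used technique rather than the slide's exact code: with free swizzles, a cross product takes one multiply plus one multiply-add, and normalization takes a DP3, an RSQ, and a MUL. The Python helper names are invented for illustration.

```python
import math


def sw(v, pattern):
    """Swizzle a 3-component vector, e.g. sw(v, 'yzx')."""
    index = {"x": 0, "y": 1, "z": 2}
    return tuple(v[index[c]] for c in pattern)


def cross(a, b):
    # MUL tmp, a.zxy, b.yzx
    tmp = tuple(p * q for p, q in zip(sw(a, "zxy"), sw(b, "yzx")))
    # MAD result, a.yzx, b.zxy, -tmp   (negation of tmp is a free modifier)
    return tuple(p * q - t for p, q, t in zip(sw(a, "yzx"), sw(b, "zxy"), tmp))


def normalize(v):
    d = sum(c * c for c in v)       # DP3: dot product of v with itself
    r = 1.0 / math.sqrt(d)          # RSQ: reciprocal square root
    return tuple(r * c for c in v)  # MUL: scale each component


print(cross((1, 0, 0), (0, 1, 0)))
print(normalize((3.0, 0.0, 4.0)))
```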
Blinn lighting in “one” instruction
From Stanford CS448A: Real-Time Graphics Architectures See graphics.stanford.edu/courses/cs448a-01-fall
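The slide image is not preserved here. As a hedged sketch, the LIT instruction as defined for OpenGL ARB vertex programs takes N·L, N·H, and a specular exponent and returns the ambient, diffuse, and specular coefficients in one step. The Python model below follows that definition but omits the exponent clamping the specification applies; the function name and argument order are choices made for this example.

```python
def lit(n_dot_l, n_dot_h, power):
    """Model of LIT: returns (1, diffuse, specular, 1) lighting coefficients."""
    diffuse = max(n_dot_l, 0.0)
    # Specular is only lit when the surface faces the light.
    specular = max(n_dot_h, 0.0) ** power if n_dot_l > 0.0 else 0.0
    return (1.0, diffuse, specular, 1.0)


print(lit(0.5, 0.5, 2.0))   # facing the light
print(lit(-0.2, 0.9, 8.0))  # facing away: no diffuse, no specular
```

The quotes around "one" are apt: the dot products that feed LIT still have to be computed by earlier instructions.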
Simple graphics pipeline
From Stanford CS448A: Real-Time Graphics Architectures See graphics.stanford.edu/courses/cs448a-01-fall
Pixel (or fragment) shader (1)
• Determine each fragment’s color
  – Custom (sophisticated) pixel operations
  – Texture sampling
• Inputs
  – Interpolated output from vertex shader
  – Typically vertex position, vertex normals, texture coordinates, etc.
  – These registers could be reused for other purposes
• Output
  – Color (including alpha)
  – Depth value (optional)
Pixel (or fragment) shader (2)
• Executed once per pixel, hence typically executed many more times than a vertex shader
• Where possible, it is advantageous to move computations to a per-vertex basis to improve performance
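A small sketch of the trade-off: a value computed at the three vertices can be interpolated across the triangle instead of being recomputed at every pixel. The barycentric interpolation below ignores perspective correction, and the function names and pixel count are illustrative assumptions.

```python
def interpolate(a0, a1, a2, weights):
    """Blend three per-vertex attributes with barycentric weights (w0 + w1 + w2 = 1)."""
    w0, w1, w2 = weights
    return tuple(w0 * x + w1 * y + w2 * z for x, y, z in zip(a0, a1, a2))


def lighting_evaluations(num_vertices, num_pixels, per_pixel):
    """How many times the lighting math runs under each strategy."""
    return num_pixels if per_pixel else num_vertices


# Per-vertex colors, interpolated for one pixel inside the triangle.
pixel_color = interpolate((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0),
                          (0.25, 0.25, 0.5))
print(pixel_color)

# One triangle covering an assumed 5000 pixels.
print(lighting_evaluations(3, 5000, per_pixel=False),
      lighting_evaluations(3, 5000, per_pixel=True))
```

The saving comes at a quality cost: interpolated per-vertex lighting misses detail that varies within a triangle, such as tight specular highlights.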
Pixel shader data flow (3.0)
[Diagram: the pixel stream feeds the color (diffuse/specular) and texture coordinate registers (v0–v9). The pixel shader also reads constant registers (C0–Cn; 16 INT, 224 Float), temporary registers (r0–r31), and sampler registers (s0–s15; up to 16 texture surfaces can be read in a single pass). It writes oC0 (color) and oDepth (depth).]
Pixel shader: logical view
[Diagram: a Pixel Processing Unit takes per-pixel input data from the interpolator and produces per-pixel output data: pixel color, depth info, and stencil info, written to the color buffer, depth buffer, and stencil buffer. Inside it: a register file (r0, r1, r2, r3, ...), a swizzle/mask unit (.rgba, .xyzw, .zzzz, .xxyz, ...), a math/logic unit (cosine, log, sine, sub, add, ...), and a sampler unit. Shader resources bound by the application: shader start address, bound textures, bound samplers, bound constants. Memory holds texture memory, shader constants, and state information. Legend: input data, output data, architectural state, control logic.]
Some uses of pixel shaders
• Texturing objects
• Per-pixel lighting (e.g., Phong shading)
• Normal mapping (each pixel has its own normal)
• Shadows (determine whether a pixel is shadowed or not)
• Environment mapping
Adapted from Mart Slot’s presentation
Old GeForce graphics pipeline
[Diagram: Host → Vertex Control (with Vertex Cache) → VS/T&L → Triangle Setup → Raster → Shader (with Texture Cache) → ROP → FBI → Frame Buffer Memory]
Slide by David Kirk/NVIDIA and Wen-mei W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission. See http://courses.engr.illinois.edu/ece498/al
Vertex cache
• Reusing vertices between primitives saves PCIe bus bandwidth and GPU computational resources
• A vertex cache attempts to exploit “commonality” between triangles to generate vertex reuse
• Unfortunately, many applications do not use efficient triangular ordering
[Diagram: old GeForce graphics pipeline]
Slide by David Kirk/NVIDIA and Wen-mei W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission. See http://courses.engr.illinois.edu/ece498/al
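To make the effect of triangle ordering concrete, here is a toy simulation. It assumes a small FIFO cache of already-transformed vertices, which is a simplification chosen for this sketch; the slides do not state the real cache size or replacement policy.

```python
from collections import deque


def count_vertex_shader_runs(indices, cache_size):
    """Count cache misses (vertex shader runs) for an indexed triangle list."""
    cache = deque(maxlen=cache_size)  # FIFO: the oldest entry falls out first
    runs = 0
    for idx in indices:
        if idx not in cache:
            runs += 1
            cache.append(idx)
    return runs


# The same three triangles, submitted in two different orders.
poor_order = [0, 1, 2, 3, 4, 5, 2, 1, 3]
good_order = [0, 1, 2, 2, 1, 3, 3, 4, 5]
print(count_vertex_shader_runs(poor_order, cache_size=3))  # every index misses
print(count_vertex_shader_runs(good_order, cache_size=3))  # shared vertices hit
```

Keeping triangles that share vertices adjacent in the index stream is what lets the cache turn repeats into hits.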
Texture cache
• Stores temporally local texel values to reduce bandwidth requirements
• Due to the nature of texture filtering, high degrees of efficiency are possible (75% or better hit rates)
• Reduces texture (memory) bandwidth by a factor of four for bilinear filtering
[Diagram: old GeForce graphics pipeline]
Slide by David Kirk/NVIDIA and Wen-mei W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission. See http://courses.engr.illinois.edu/ece498/al
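The factor of four can be reproduced with a simple counting sketch. It assumes an idealized cache that never evicts and a one-to-one mapping between pixels and texels, neither of which comes from the slides: each bilinear sample reads a 2x2 texel footprint, and neighboring pixels share most of those texels.

```python
import math


def bilinear_footprint(u, v):
    """The four texel coordinates a bilinear sample at (u, v) reads."""
    x0, y0 = math.floor(u), math.floor(v)
    return [(x0, y0), (x0 + 1, y0), (x0, y0 + 1), (x0 + 1, y0 + 1)]


def ideal_hit_rate(n):
    """Hit rate for an n x n block of pixels, one sample per pixel."""
    requested = 0
    fetched = set()  # texels that had to come from memory at least once
    for py in range(n):
        for px in range(n):
            for texel in bilinear_footprint(px + 0.5, py + 0.5):
                requested += 1
                fetched.add(texel)
    return 1.0 - len(fetched) / requested


print(ideal_hit_rate(8))   # small block: edge texels lower the rate
print(ideal_hit_rate(64))  # larger block: approaches 0.75
```

As the block grows, texel requests approach four per unique texel, which is where both the 75% hit rate and the fourfold bandwidth reduction come from in this idealized model.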
Built-in texture filtering (GeForce 6)
• Pixel texturing
  – Hardware supports 2D, 3D, and cube map
  – Non power-of-2 textures OK
  – Hardware handles addressing and interpolation
    • Bilinear, trilinear (3D or mipmap), anisotropic
• Vertex texturing
  – Vertex processors can access texture memory too
  – Only nearest-neighbor filtering supported in G60 hardware
ROP (Raster Operations)
• C-ROP performs frame buffer blending
  – Combinations of colors and transparency
  – Antialiasing
  – Read/Modify/Write the Color Buffer
• Z-ROP performs the Z operations
  – Determine the visible pixels
  – Discard the occluded pixels
  – Read/Modify/Write the Z-Buffer
• ROP on GeForce also performs
  – “Coalescing” of transactions
  – Z-Buffer compression/decompression
[Diagram: old GeForce graphics pipeline]
Slide by David Kirk/NVIDIA and Wen-mei W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission. See http://courses.engr.illinois.edu/ece498/al
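The two read/modify/write paths can be sketched in Python as follows. The "less than" depth test and source-over alpha blending are assumed choices for illustration (real ROPs support many configurable modes), and the function name is hypothetical.

```python
def rop_write(color_buffer, z_buffer, x, y, src_rgb, src_alpha, src_z):
    """Apply a depth test, then blend the fragment into the color buffer."""
    # Z-ROP: read the stored depth; discard the fragment if it is occluded.
    if src_z >= z_buffer[y][x]:
        return False
    z_buffer[y][x] = src_z  # write the new, closer depth
    # C-ROP: read the stored color, blend with the source, write it back.
    dst_rgb = color_buffer[y][x]
    color_buffer[y][x] = tuple(
        src_alpha * s + (1.0 - src_alpha) * d for s, d in zip(src_rgb, dst_rgb)
    )
    return True


color = [[(0.0, 0.0, 0.0)]]  # 1x1 color buffer, cleared to black
depth = [[1.0]]              # 1x1 Z-buffer, cleared to the far plane
first = rop_write(color, depth, 0, 0, (1.0, 0.0, 0.0), 0.5, 0.5)   # visible
second = rop_write(color, depth, 0, 0, (0.0, 1.0, 0.0), 1.0, 0.8)  # occluded
print(first, second, color[0][0], depth[0][0])
```

Every accepted fragment costs a read and a write on both buffers, which is why these units are tied so closely to frame buffer bandwidth.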
The frame buffer
• The primary determinant of graphics performance other than the GPU
• The most expensive component of a graphics product other than the GPU
• Memory bandwidth is the key
• Frame buffer size also determines
  – Local texture storage
  – Maximum resolutions
  – Antialiasing resolution limits
[Diagram: old GeForce graphics pipeline]
Slide by David Kirk/NVIDIA and Wen-mei W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission. See http://courses.engr.illinois.edu/ece498/al
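A rough sizing sketch shows how resolution and antialiasing eat into frame buffer memory. The 4-byte color and 4-byte depth/stencil formats, the 1600x1200 resolution, and the assumption that multisampling stores every sample in full are all illustrative choices, not figures from the slides.

```python
def frame_buffer_bytes(width, height, samples_per_pixel=1,
                       color_bytes=4, depth_bytes=4):
    """Bytes for one color buffer plus one depth buffer at the given settings."""
    return width * height * samples_per_pixel * (color_bytes + depth_bytes)


no_aa = frame_buffer_bytes(1600, 1200)
aa_4x = frame_buffer_bytes(1600, 1200, samples_per_pixel=4)
print(f"no AA: {no_aa / 2**20:.1f} MiB, 4x AA: {aa_4x / 2**20:.1f} MiB")
```

Whatever these buffers consume is no longer available for local texture storage, which is the coupling the slide points out.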
Frame Buffer Interface (FBI)
• Manages reading from and writing to frame buffer
• Perhaps the most performance-critical component of a GPU
• GeForce’s FBI is a crossbar
• Independent memory controllers for 4+ independent memory banks for more efficient access to frame buffer
[Diagram: old GeForce graphics pipeline, here also showing a Surface Engine block]
Slide by David Kirk/NVIDIA and Wen-mei W. Hwu, 2007, from UIUC ECE498 Lecture 5, Fall 2007; used with permission. See http://courses.engr.illinois.edu/ece498/al
GeForce 7800 GTX board details
[Board photo with labels:]
• 256MB/256-bit DDR3, 600 MHz, 8 pieces of 8Mx32
• 16x PCI-Express
• SLI Connector
• DVI x 2
• sVideo TV Out
• Single slot cooling
Slide by David Kirk/NVIDIA and Wen-mei W. Hwu, 2007, from UIUC ECE498 Lecture 6, Fall 2007; used with permission. See http://courses.engr.illinois.edu/ece498/al
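The memory figures on this board can be turned into a peak bandwidth estimate. The sketch assumes that 600 MHz is the memory clock and that DDR transfers data twice per clock; the function name is made up for this example.

```python
def memory_bandwidth_gb_per_s(bus_width_bits, clock_hz, transfers_per_clock=2):
    """Peak bandwidth in GB/s: bytes per transfer times transfers per second."""
    return bus_width_bits / 8 * clock_hz * transfers_per_clock / 1e9


bandwidth = memory_bandwidth_gb_per_s(256, 600e6)
print(f"peak memory bandwidth: {bandwidth:.1f} GB/s")

# Consistency check on the part count: 8 chips, each 8M words x 32 bits.
chips, words_per_chip, bits_per_word = 8, 8 * 2**20, 32
capacity_bytes = chips * words_per_chip * bits_per_word // 8
bus_width = chips * bits_per_word
print(capacity_bytes // 2**20, "MB on a", bus_width, "bit bus")
```

Eight 32-bit chips side by side give both the 256-bit bus and the 256 MB capacity listed on the slide.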
NVIDIA 7800 GTX – G70 Architecture
[Diagram: G70 block diagram with labeled Vertex Processors, Pixel Processors, and ROPs (Raster Op. Units)]
From www.xbitlabs.com/articles/video/display/g70-indepth.html
NVIDIA 7800 GTX – Vertex processors
[Diagram: G70 vertex processor]
• 7800 GTX has 8 of these
NVIDIA 7800 GTX – Pixel processors
[Diagram: G70 pixel processor]
• 8 MADD (multiply/add) instructions in a single cycle
• 7800 GTX has 24 of these
From http://www.xbitlabs.com/articles/video/display/g70-indepth_3.html
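Combining the two figures above gives a peak arithmetic rate for the pixel processors. The 430 MHz core clock used below is an assumption (it is the commonly quoted figure for this board but does not appear in the slides), and counting each MADD as two floating-point operations is a convention.

```python
def peak_madds_per_cycle(pixel_processors=24, madds_per_processor=8):
    """MADD operations all pixel processors can issue together in one cycle."""
    return pixel_processors * madds_per_processor


def peak_gflops(core_clock_hz, pixel_processors=24, madds_per_processor=8):
    """Peak GFLOPS, counting one MADD as a multiply plus an add."""
    madds = peak_madds_per_cycle(pixel_processors, madds_per_processor)
    return madds * 2 * core_clock_hz / 1e9


print(peak_madds_per_cycle())         # 24 processors x 8 MADDs each
print(f"{peak_gflops(430e6):.2f}")    # assumes a 430 MHz core clock
```

This is a ceiling for the pixel processors alone; sustained rates depend on instruction mix and on memory bandwidth.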
Modern GPUs: unified design
[Diagram labels: NVIDIA 7800 GTX, Vertex Processors, G70 Architecture]
Slide by David Luebke from http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
GeForce 8 architecture
Slide by David Luebke from http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
Why unify? (1)
Slide by David Luebke from http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
Why unify? (2)
Slide by David Luebke from http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
Dynamic load balancing – Company of Heroes
Slide by David Luebke from http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf
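The slide images for the unification argument are not preserved in this transcript, so here is a toy model of the idea. It compares a fixed split of 8 vertex and 24 pixel units (the 7800 GTX counts from the earlier slides) against 32 interchangeable units; the workload numbers and the assumption of perfectly divisible work are invented for illustration.

```python
import math


def frame_cycles_fixed(vertex_work, pixel_work, vertex_units=8, pixel_units=24):
    """Cycles per frame when each unit type can only do its own kind of work."""
    return max(math.ceil(vertex_work / vertex_units),
               math.ceil(pixel_work / pixel_units))


def frame_cycles_unified(vertex_work, pixel_work, units=32):
    """Cycles per frame when any unit can take vertex or pixel work."""
    return math.ceil((vertex_work + pixel_work) / units)


# Two assumed frames with the same total work but a different mix.
for name, v, p in [("vertex-heavy", 2400, 800), ("pixel-heavy", 320, 2880)]:
    print(name, frame_cycles_fixed(v, p), frame_cycles_unified(v, p))
```

In the fixed design the slower side sets the frame time while the other units sit idle; the unified pool finishes both frames in the same number of cycles regardless of the mix.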
Motivation for shader languages
• Programming powerful hardware with assembly code is hard
• Programmers need the benefits of a high-level language:
  – Easier programming
  – Easier code reuse
  – Easier debugging
  – Portability

Assembly:
    …
    DP3 R0, c[11].xyzx, c[11].xyzx;
    RSQ R0, R0.x;
    MUL R0, R0.x, c[11].xyzx;
    MOV R1, c[3];
    MUL R1, R1.x, c[0].xyzx;
    DP3 R2, R1.xyzx, R1.xyzx;
    RSQ R2, R2.x;
    MUL R1, R2.x, R1.xyzx;
    ADD R2, R0.xyzx, R1.xyzx;
    DP3 R3, R2.xyzx, R2.xyzx;
    RSQ R3, R3.x;
    MUL R2, R3.x, R2.xyzx;
    DP3 R2, R1.xyzx, R2.xyzx;
    MAX R2, c[3].z, R2.x;
    MOV R2.z, c[3].y;
    MOV R2.w, c[3].y;
    LIT R2, R2;
    ...

High-level language:
    float3 cSpecular = pow(max(0, dot(Nf, H)), phongExp).xxx;
    float3 cPlastic = Cd * (cAmbient + cDiffuse) + Cs * cSpecular;

From The Cg Tutorial
Shader languages
• HLSL/Cg most common
  – Both are more-or-less compatible
• Other alternatives:
  – GLSL (for OpenGL)
  – Assembly? (not anymore…)