course roadmap evolution of the programmable graphics pipelinecis565/lectures2011/lecture2.pdf ·...


Page 1: Course Roadmap Evolution of the Programmable Graphics Pipelinecis565/Lectures2011/Lecture2.pdf · Pipeline Lecture 2 Original Slides by: Suresh Venkatasubramanian Updates by Joseph

Evolution of the Programmable Graphics Pipeline

Lecture 2

Original Slides by: Suresh Venkatasubramanian
Updates by Joseph Kider

Course Roadmap

► Graphics Pipeline (GLSL)
► GPGPU (GLSL)
    Briefly
► GPU Computing (CUDA, OpenCL)
► Choose your own adventure
    Student Presentation
    Final Project
► Goal: Prepare you for your presentation and project

Lecture Outline

► A historical perspective on the graphics pipeline
    Dimensions of innovation.
    Where we are today
    Fixed-function vs programmable pipelines
► A closer look at the fixed-function pipeline
    Walk through the sequence of operations
    Reinterpret these as stream operations
► We can program the fixed-function pipeline!
    Some examples
► What constitutes data and memory, and how access affects program design.

The evolution of the pipeline

Elements of the graphics pipeline:

1. A scene description: vertices, triangles, colors, lighting

2. Transformations that map the scene to a camera viewpoint

3. “Effects”: texturing, shadow mapping, lighting calculations

4. Rasterizing: converting geometry into pixels

5. Pixel processing: depth tests, stencil tests, and other per-pixel operations.

Parameters controlling design of the pipeline:

1. Where is the boundary between CPU and GPU?

2. What transfer method is used?

3. What resources are provided at each step?

4. What units can access which GPU memory elements?


Generation I: 3dfx Voodoo (1996)

http://accelenation.com/?ac.id.123.2

• One of the first true 3D game cards

• Worked by supplementing a standard 2D video card.

• Did not do vertex transformations: these were done on the CPU

• Did do texture mapping and z-buffering.

[Pipeline diagram. CPU: Vertex Transforms. Bus: PCI. GPU: Primitive Assembly, Rasterization and Interpolation, Raster Operations, Frame Buffer.]

Aside: Mario Kart 64

Image from: http://www.gamespot.com/users/my_shoe/

► High fragment load / low vertex load

Aside: Mario Kart Wii

High fragment load / low vertex load?

Image from: http://wii.ign.com/dor/objects/949580/mario-kart-wii/images/

Generation II: GeForce/Radeon 7500 (1998)

http://accelenation.com/?ac.id.123.5

• Main innovation: shifting the transformation and lighting calculations to the GPU

• Allowed multi-texturing, enabling bump maps, light maps, and other effects.

• Faster AGP bus instead of PCI

[Pipeline diagram. Bus: AGP. GPU: Vertex Transforms, Primitive Assembly, Rasterization and Interpolation, Raster Operations, Frame Buffer.]


Generation III: GeForce3/Radeon 8500 (2001)

http://accelenation.com/?ac.id.123.7

• For the first time, allowed a limited amount of programmability in the vertex pipeline

• Also allowed volume texturing and multi-sampling (for antialiasing)

[Pipeline diagram. Bus: AGP. GPU: Vertex Transforms (small vertex shaders), Primitive Assembly, Rasterization and Interpolation, Raster Operations, Frame Buffer.]

Generation IV: Radeon 9700/GeForce FX (2002)

http://accelenation.com/?ac.id.123.8

• This generation is the first generation of fully-programmable graphics cards

• Different versions have different resource limits on fragment/vertex programs

[Pipeline diagram. Bus: AGP. Labels: Vertex Transforms, Programmable Vertex Shader, Primitive Assembly, Rasterization and Interpolation, Programmable Fragment Processor, Raster Operations, Frame Buffer, Texture Memory.]

Generation IV.V: GeForce6/X800 (2004)

Slide adapted from Suresh Venkatasubramanian and Joe Kider

► Simultaneous rendering to multiple buffers
► True conditionals and loops
► PCIe bus
► Vertex texture fetch

[Pipeline diagram. Bus: PCIe. Labels: Vertex Transforms, Programmable Vertex Shader, Primitive Assembly, Rasterization and Interpolation, Programmable Fragment Processor, Raster Operations, Frame Buffer, Texture Memory (two blocks).]

NVIDIA NV40 Architecture

Image from GPU Gems 2: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter30.html

[Architecture diagram. Callouts: 6 vertex shader units; 16 fragment shader units; Vertex Texture Fetch.]


D3D 10 Pipeline

Image from David Blythe : http://download.microsoft.com/download/f/2/d/f2d5ee2c-b7ba-4cd0-9686-b6508b5479a1/direct3d10_web.pdf

Generation IV.V: GeForce6/X800 (2004)

Not exactly a quantum leap, but…
► Simultaneous rendering to multiple buffers
► True conditionals and loops
► Higher-precision throughput in the pipeline (64 bits end-to-end, compared to 32 bits earlier)
► PCIe bus
► More memory/program length/texture accesses
► Texture access by vertex shader

[Pipeline diagram. Bus label: AGP. Labels: Vertex Transforms, Programmable Vertex Shader, Primitive Assembly, Rasterization and Interpolation, Programmable Fragment Processor, Raster Operations, Frame Buffer, Texture Memory (two blocks).]

Generation V: GeForce8800/HD2900 (2006)

Complete quantum leap
► Ground-up rewrite of GPU
► Support for DirectX 10, and all it implies (more on this later)
► Geometry Shader
► Support for General GPU programming
► Shared Memory (NVIDIA only)

[Pipeline diagram. Bus label: AGP. Labels: Input Assembler, Programmable Vertex Shader, Programmable Geometry Shader, Programmable Pixel Shader, Raster Operations, Output Merger.]

Fixed-function pipeline

[Pipeline diagram. Stages: 3D Application or Game; 3D API (OpenGL or Direct3D); CPU-GPU Boundary (AGP/PCIe); GPU Front End; Programmable Vertex Processor; Primitive Assembly; Rasterization and Interpolation; Programmable Fragment Processor; Raster Operations; Frame Buffer. Data-flow labels: 3D API Commands; GPU Command & Data Stream; Vertex Index Stream; Pre-transformed Vertices; Transformed Vertices; Assembled Primitives; Pixel Location Stream; Pre-transformed Fragments; Transformed Fragments; Pixel Updates.]


Geometry Shaders: Point Sprites

Geometry Shaders

Image from David Blythe : http://download.microsoft.com/download/f/2/d/f2d5ee2c-b7ba-4cd0-9686-b6508b5479a1/direct3d10_web.pdf

NVIDIA G80 Architecture

Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf


Why Unify Shader Processors?

Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf

Unified Shader Processors

Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf


Terminology

Shader Model   Direct3D   OpenGL   Video card example
2              9          2.x      NVIDIA GeForce 6800, ATI Radeon X800
3              10.x       3.x      NVIDIA GeForce 8800, ATI Radeon HD 2900
4              11.x       4.x      NVIDIA GeForce GTX 480, ATI Radeon HD 5870

Shader Capabilities

Table courtesy of A K Peters, Ltd. http://www.realtimerendering.com/

Evolution of the Programmable Graphics Pipeline

Slide from Mike Houston: http://s09.idav.ucdavis.edu/talks/01-BPS-SIGGRAPH09-mhouston.pdf

► Not covered today:
    SM 5 / D3D 11 / GL 4
    Tessellation shaders
► *cough* student presentation *cough*

Later this semester: NVIDIA Fermi
► Dual warp scheduler
► Configurable L1 / shared memory
► Double precision
► …

Evolution of the Programmable Graphics Pipeline

New Tool: AMD System Monitor
► Released 01/04/2011
► http://support.amd.com/us/kbarticles/Pages/AMDSystemMonitor.aspx

A closer look at the fixed-function pipeline


Pipeline Input

Vertex:
    (x, y, z)        position
    (r, g, b, a)     color
    (Nx, Ny, Nz)     normal
    (tx, ty, [tz])   texture coordinates
    (tx, ty)         texture coordinates
    (tx, ty)         texture coordinates
    Material properties*

Image: F(x, y) = (r, g, b, a)

ModelView Transformation

► Vertices mapped from object space to world space
► M = model transformation (scene)
► V = view transformation (camera)

$$\begin{pmatrix} X' \\ Y' \\ Z' \\ W' \end{pmatrix} = M \cdot V \cdot \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$

Each matrix transform is applied to each vertex in the input stream. Think of this as a kernel operator.
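
The "kernel operator" reading can be sketched in a few lines of Python. This is only an illustration under assumptions: the helper names (`mat_vec`, `translation`, `transform_stream`) are hypothetical, matrices are plain row-major lists, and a simple translation stands in for whatever matrix the pipeline would really apply.

```python
from typing import List, Sequence

Matrix = List[List[float]]   # 4x4, row-major (assumed layout)
Vertex = Sequence[float]     # homogeneous (x, y, z, w)


def mat_vec(m: Matrix, v: Vertex) -> List[float]:
    """Multiply a 4x4 matrix by a homogeneous column vector."""
    return [sum(m[r][c] * v[c] for c in range(4)) for r in range(4)]


def translation(tx: float, ty: float, tz: float) -> Matrix:
    """A stand-in transform; any 4x4 matrix could be used instead."""
    return [[1.0, 0.0, 0.0, tx],
            [0.0, 1.0, 0.0, ty],
            [0.0, 0.0, 1.0, tz],
            [0.0, 0.0, 0.0, 1.0]]


def transform_stream(m: Matrix, vertices) -> List[List[float]]:
    """The kernel: the same matrix is applied independently to every vertex."""
    return [mat_vec(m, v) for v in vertices]


stream = [(0.0, 0.0, 0.0, 1.0), (1.0, 2.0, 3.0, 1.0)]
moved = transform_stream(translation(10.0, 0.0, -5.0), stream)
```

No vertex reads another vertex's result, which is what makes the operation a stream kernel rather than a general loop.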

Lighting

Lighting information is combined with normals and other parameters at each vertex in order to create new colors.

Color(v) = emissive + ambient + diffuse + specular

Each term in the right hand side is a function of the vertex color, position, normal and material properties.
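
As a rough sketch of how the four terms might be evaluated per vertex, the Python below assumes one light, a Lambert diffuse term, and a Blinn-Phong specular term. The slide does not specify these models; they and every name here (`light_vertex`, the `material` keys) are assumptions chosen for illustration.

```python
import math


def normalize(v):
    """Return v scaled to unit length (zero vectors are returned unchanged)."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(v) if n == 0.0 else tuple(c / n for c in v)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def light_vertex(normal, to_light, to_eye, material, light_color):
    """Color(v) = emissive + ambient + diffuse + specular, per RGB channel."""
    n, l, e = normalize(normal), normalize(to_light), normalize(to_eye)
    h = normalize(tuple(a + b for a, b in zip(l, e)))   # half vector (Blinn-Phong)
    n_dot_l = max(dot(n, l), 0.0)
    spec = max(dot(n, h), 0.0) ** material["shininess"] if n_dot_l > 0.0 else 0.0
    color = []
    for i in range(3):
        c = (material["emissive"][i]
             + material["ambient"][i] * light_color[i]
             + material["diffuse"][i] * light_color[i] * n_dot_l
             + material["specular"][i] * light_color[i] * spec)
        color.append(min(c, 1.0))                       # clamp to displayable range
    return tuple(color)


material = {"emissive": (0.0, 0.0, 0.0), "ambient": (0.1, 0.1, 0.1),
            "diffuse": (0.5, 0.0, 0.0), "specular": (0.2, 0.2, 0.2),
            "shininess": 8.0}
lit = light_vertex((0, 0, 1), (0, 0, 1), (0, 0, 1), material, (1.0, 1.0, 1.0))
unlit = light_vertex((0, 0, 1), (0, 0, -1), (0, 0, 1), material, (1.0, 1.0, 1.0))
```

With the light behind the surface (`unlit`), only the emissive and ambient terms contribute.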

Clipping/Projection/Viewport (3D)

► More matrix transformations that operate on a vertex to transform it into the viewport space.
► Note that a vertex may be eliminated from the input stream (if it is clipped).
► The viewport is two-dimensional; however, the vertex z-value is retained for depth testing.

The clip test is the first example of a conditional in the pipeline.

However, it is not a fully general conditional. Why?
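
One way to picture the clip test as a stream operation is the filter below. It is a deliberate simplification: real hardware clips whole primitives and may generate new vertices, whereas this sketch only drops vertices that fall outside an assumed clip volume (-w <= x, y, z <= w with w > 0). The function names are hypothetical.

```python
def inside_clip_volume(v):
    """Fixed predicate: is the homogeneous vertex inside the clip volume?"""
    x, y, z, w = v
    return w > 0 and all(-w <= c <= w for c in (x, y, z))


def clip_stream(vertices):
    """Drop vertices that fail the clip test; survivors pass through unchanged."""
    return [v for v in vertices if inside_clip_volume(v)]


kept = clip_stream([(0, 0, 0, 1), (2, 0, 0, 1), (0.5, -0.5, 1, 1)])
```

Note that the predicate is fixed by the pipeline and the only outcome is keep or discard, which is one possible way to approach the question the slide poses.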


Rasterizing + Interpolation

► All primitives are now converted to fragments.
► Data type change! Vertices to fragments

Fragment attributes: (r, g, b, a), (x, y, z, w), (tx, ty), …

Texture coordinates are interpolated from texture coordinates of vertices.

This gives us a linear interpolation operator for free. VERY USEFUL!
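
The "free" interpolation can be illustrated on a single horizontal span between two vertices. This is a minimal sketch assuming plain linear interpolation in screen space (no perspective correction), with hypothetical names `lerp` and `rasterize_span`.

```python
def lerp(a, b, t):
    """Linearly interpolate between attribute tuples a and b."""
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def rasterize_span(x0, attr0, x1, attr1):
    """Emit one fragment per integer pixel in [x0, x1] with interpolated attributes."""
    fragments = []
    for x in range(x0, x1 + 1):
        t = 0.0 if x1 == x0 else (x - x0) / (x1 - x0)
        fragments.append((x, lerp(attr0, attr1, t)))
    return fragments


# Two vertices with texture coordinates (0, 0) and (1, 2), five pixels apart.
span = rasterize_span(0, (0.0, 0.0), 4, (1.0, 2.0))
```

Any per-vertex quantity placed in a texture coordinate or color gets interpolated this way, which is why the operator is useful for general computation.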

Per-fragment operations

► The rasterizer produces a stream of fragments.
► Each fragment undergoes a series of tests with increasing complexity.

Test 1: Scissor

If (fragment lies in fixed rectangle) let it pass, else discard it.

Test 2: Alpha

If (fragment.a >= constant) let it pass, else discard it.

The scissor test is analogous to the clipping operation, in fragment space instead of vertex space.

The alpha test is a slightly more general conditional. Why?
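
The two tests above can be written as filters over a fragment stream. The sketch assumes a scissor rectangle given as (x, y, width, height) and a "greater or equal" alpha comparison as on the slide; the `Fragment` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    x: int
    y: int
    rgba: tuple


def scissor_test(frag, rect):
    """Pass if the fragment lies inside the fixed rectangle (x, y, width, height)."""
    x0, y0, width, height = rect
    return x0 <= frag.x < x0 + width and y0 <= frag.y < y0 + height


def alpha_test(frag, reference):
    """Pass if the fragment's own alpha value is at least the reference constant."""
    return frag.rgba[3] >= reference


def run_tests(fragments, rect, reference):
    """Apply the tests in pipeline order; a failed test discards the fragment."""
    return [f for f in fragments if scissor_test(f, rect) and alpha_test(f, reference)]


frags = [Fragment(1, 1, (1.0, 0.0, 0.0, 1.0)),   # passes both tests
         Fragment(5, 1, (0.0, 1.0, 0.0, 1.0)),   # outside the scissor rectangle
         Fragment(2, 2, (0.0, 0.0, 1.0, 0.2))]   # alpha below the reference
survivors = run_tests(frags, rect=(0, 0, 4, 4), reference=0.5)
```

The scissor test depends only on position, while the alpha test depends on a data value carried by the fragment.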

Per-fragment operations

► Stencil test: S(x, y) is the stencil buffer value for the fragment with coordinates (x, y).
► If f(S(x, y)), let the pixel pass, else kill it. Update S(x, y) conditionally depending on f(S(x, y)) and g(D(x, y)).
► Depth test: D(x, y) is the depth buffer value.
► If g(D(x, y)), let the pixel pass, else kill it. Update D(x, y) conditionally.

Per-fragment operations

► Stencil and depth tests are more general conditionals. Why?
► These are the only tests that can change the state of internal storage (stencil buffer, depth buffer).
► One of the update operations for the stencil buffer is a “count” operation. Remember this!
► Unfortunately, stencil and depth buffers have lower precision (8 and 24 bits, respectively).
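
A small software model may help show what "tests that change state" means. The sketch assumes one particular configuration: the stencil function always passes, the depth function is "less", and the stencil is incremented (saturating at the 8-bit maximum) whenever the depth test passes. Buffer layout and function names are illustrative assumptions.

```python
STENCIL_MAX = 255  # an 8-bit stencil buffer saturates here


def make_buffers(width, height):
    depth = [[1.0] * width for _ in range(height)]    # cleared to the far plane
    stencil = [[0] * width for _ in range(height)]    # cleared to zero
    return depth, stencil


def process_fragment(depth, stencil, x, y, z):
    """Depth test 'less'; on pass, update D(x, y) and count in S(x, y)."""
    if z < depth[y][x]:
        depth[y][x] = z                                       # conditional depth update
        stencil[y][x] = min(stencil[y][x] + 1, STENCIL_MAX)   # the "count" operation
        return True
    return False


depth, stencil = make_buffers(4, 4)
results = [process_fragment(depth, stencil, 1, 1, z) for z in (0.5, 0.7, 0.3)]
```

Each fragment can only touch the buffer entries at its own (x, y), which is the access restriction the later slides return to.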


Post-processing

► Blending: pixels are accumulated into final framebuffer storage

    new-val = old-val op pixel-value

    If op is +, we can sum all the (say) red components of pixels that pass all tests.

Problem: In generations <= IV, blending can only be done in 8-bit channels (the channels sent to the video card); precision is limited.

We could use accumulation buffers, but they are very slow.
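
The precision problem is easy to reproduce in software. The sketch below models additive blending into a single 8-bit channel that clamps at 255; the function names are hypothetical and the integer model is an assumption about how such a channel behaves.

```python
def blend_add_8bit(old_val, pixel_value):
    """new-val = old-val + pixel-value, clamped to what an 8-bit channel can hold."""
    return min(old_val + pixel_value, 255)


def accumulate(values):
    """Blend a sequence of fragment values into one initially cleared channel."""
    acc = 0
    for v in values:
        acc = blend_add_8bit(acc, v)
    return acc


blended = accumulate([100, 100, 100])   # saturates
exact = sum([100, 100, 100])            # what the computation wanted
```

Once the channel saturates, further contributions are lost, so sums computed by blending are only trustworthy while they stay within the channel's range.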

Quick Review: Buffers

► Color Buffers
    Front-left
    Front-right
    Back-left
    Back-right
► Depth Buffer (z-buffer)
► Stencil Buffer
► Accumulation Buffer

Quick Review: Tests

► Scissor Test
    If (fragment exists inside rectangle)
        keep
    Else
        delete
► Alpha Test – Compare fragment’s alpha value against reference value
► Stencil Test – Compare fragment against stencil map
► Depth Test – Compare a fragment’s depth to the depth value already present in the depth buffer
    Never
    Always
    Less
    Less-Equal
    Greater-Equal
    Greater
    Not-Equal

Readback = Feedback

What is the output of a “computation”?
1. Display on screen.
2. Render to buffer and retrieve values (readback)

Readbacks are VERY slow!

PCI and AGP buses are asymmetric: DMA enables fast transfer TO the graphics card. Reverse transfer has traditionally not been required, and is much slower. PCIe is symmetric but still very slow compared to GPU speeds.

This motivates the idea of a “pass” being an atomic “unit cost” operation.

What options do we have?

1. Render to off-screen buffers like the accumulation buffer

2. Copy from framebuffer to texture memory?

3. Render directly to a texture?


Time for a puzzle…

An Example: Voronoi Diagrams.

Definition

► You are given n sites (p_1, p_2, p_3, …, p_n) in the plane (think of each site as having a color).
► Any point p in the plane is closest to some site p_j. Color p with color j.
► Compute this colored map on the plane. In other words, compute the nearest-neighbour diagram of the sites.

Example

So how do we do this on the graphics card?
Note, this does not use any programmable features of the card.


Hint: Think in one dimension higher

The lower envelope of “cones” centered at the points is the Voronoi diagram of this set of points.

The Procedure

► In order to compute the lower envelope, we need to determine, at each pixel, the fragment having the smallest depth value.
► This can be done with a simple depth test. Allow a fragment to pass only if its depth is smaller than the current depth buffer value, and update the buffer accordingly.
► The fragment that survives has the correct color.
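
The procedure can be mimicked with a software depth buffer. In this sketch each site "renders" a cone whose depth at a pixel equals the Euclidean distance to the site, and a "less" depth test keeps the nearest one. The function name and the brute-force loop over all pixels are assumptions for illustration; a GPU would rasterize the cones instead.

```python
import math


def voronoi_by_depth_test(sites, width, height):
    """Return a grid of site indices: the color surviving the depth test per pixel."""
    depth = [[math.inf] * width for _ in range(height)]
    color = [[-1] * width for _ in range(height)]
    for site_id, (sx, sy) in enumerate(sites):      # one cone per site
        for y in range(height):
            for x in range(width):
                z = math.hypot(x - sx, y - sy)      # cone height above pixel (x, y)
                if z < depth[y][x]:                 # depth test: less
                    depth[y][x] = z                 # update the depth buffer
                    color[y][x] = site_id           # surviving fragment's color
    return color


diagram = voronoi_by_depth_test([(0, 0), (3, 0)], width=4, height=1)
```

Because the test is strict, a pixel equidistant from two sites keeps the color of the site drawn first.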

Let’s make this more complicated

► The 1-median of a set of sites is a point q* that minimizes the sum of distances from all sites to itself.

$q^* = \arg\min_q \sum_p d(p, q)$

[Figure: two examples, labeled WRONG! and RIGHT!]

A First Step

Can we compute, for each pixel q, the value

$F(q) = \sum_p d(p, q)$ ?

We can use the cone trick from before, and instead of computing the minimum depth value, compute the sum of all depth values using blending.

What’s the catch?


We can’t blend depth values!

► Using texture interpolation helps here.
► Instead of drawing a single cone, we draw a shaded cone, with an appropriately constructed texture map.
► Then, a fragment having depth z has color component 1.0 * z.
► Now we can blend the colors.
► OpenGL has an aggregation operator that will return the overall min.

Warning: we are ignoring issues of precision.
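
The blended-cone idea can be sketched in software as follows. Each site adds its distance (the cone's "color") into an accumulation grid, which plays the role of additive blending, and the pixel with the smallest sum approximates the 1-median. Floating-point accumulation sidesteps the precision warning above, and Python's `min` stands in for the aggregation operator mentioned on the slide; all names are hypothetical.

```python
import math


def distance_sum_image(sites, width, height):
    """F(q) for every pixel q: one additive 'blend' per site."""
    acc = [[0.0] * width for _ in range(height)]
    for sx, sy in sites:
        for y in range(height):
            for x in range(width):
                acc[y][x] += math.hypot(x - sx, y - sy)   # blend op '+'
    return acc


def one_median_pixel(sites, width, height):
    """Return the pixel (x, y) that minimizes the blended sum of distances."""
    acc = distance_sum_image(sites, width, height)
    pixels = ((x, y) for y in range(height) for x in range(width))
    return min(pixels, key=lambda q: acc[q[1]][q[0]])


sites = [(0, 0), (4, 0), (2, 0)]
image = distance_sum_image(sites, width=5, height=1)
median = one_median_pixel(sites, width=5, height=1)
```

The result is only the best pixel on the grid, not the exact continuous 1-median.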

Now we apply a streaming perspective…

Two kinds of data

► Stream data (data associated with vertices and fragments)
    Color/position/texture coordinates.
    Functionally similar to member variables in a C++ object.
    Can be used for limited message passing: I modify an object state and send it to you.
► “Persistent” data (associated with buffers).
    Depth, stencil, textures.
► Can be modified by multiple fragments in a single pass.
► Functionally similar to a global array BUT each fragment only gets one location to change.
► Can be used to communicate across passes.

Who has access?

► Memory “connectivity” in the graphics use of a GPU is tricky.
► In a traditional C program, all global variables can be written by all routines.
► In the fixed-function pipeline, certain data is private.
    A fragment cannot change a depth or stencil value of a location different from its own.
    The framebuffer can be copied to a texture; a depth buffer cannot be copied in this way, and neither can a stencil buffer.
    Only a stencil buffer can count (efficiently).
► In the fixed-function pipeline, depth and stencil buffers can be used in a multi-pass computation only via readbacks.
► A texture cannot be written directly.
► In programmable GPUs, the memory connectivity becomes more open, but there are still constraints.

Understanding access constraints and memory “connectivity” is a key step in programming the GPU.


How does this relate to stream programs?

► The most important question to ask when programming the GPU is: What can I do in one pass?
► Limitations on memory connectivity mean that a step in a computation may often have to be deferred to a new pass.
► For example, when computing the second smallest element, we could not store the current minimum in read/write memory.
► Thus, the “communication” of this value has to happen across a pass.
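
One possible reading of the second-smallest example is the two-pass sketch below, written for a single pixel. Each pass has exactly one read/write location (the depth value) and the result of the first pass is handed to the second pass as a read-only input. This is an illustrative assumption about how the passes could be organized, not a description of a specific implementation; the function names are hypothetical.

```python
import math


def min_pass(fragments, reject_at_or_below=-math.inf):
    """One pass: the depth value is the only read/write state available."""
    depth = math.inf
    for z in fragments:
        if z <= reject_at_or_below:   # read-only value carried over from the previous pass
            continue
        if z < depth:                 # depth test: less
            depth = z
    return depth


def second_smallest(fragments):
    smallest = min_pass(fragments)                             # pass 1
    return min_pass(fragments, reject_at_or_below=smallest)    # pass 2


result = second_smallest([0.7, 0.2, 0.9, 0.4])
```

With repeated values the sketch returns the second smallest distinct value, since everything at or below the first-pass minimum is rejected.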

Graphics pipeline

[Pipeline diagram. Stages: 3D Application or Game; 3D API (OpenGL or Direct3D); CPU-GPU Boundary; GPU Front End; Primitive Assembly; Rasterization and Interpolation; Raster Operations; Frame Buffer. Data-flow labels: 3D API Commands; GPU Command & Data Stream; Vertex Index Stream; Assembled Primitives; Pixel Location Stream; Pixel Updates. Regions: Vertex pipeline; Fragment pipeline.]