Evolution of the Programmable Graphics Pipeline, Lecture 2. Original Slides by: Suresh Venkatasubramanian. Updates by Joseph Kider.

Post on 23-Dec-2015


Page 1: Evolution of the Programmable Graphics Pipeline

Lecture 2

Original Slides by: Suresh Venkatasubramanian

Updates by Joseph Kider

Page 2: Course Roadmap

► Graphics Pipeline (GLSL)
► GPGPU (GLSL)
    Briefly
► GPU Computing (CUDA, OpenCL)
► Choose your own adventure
    Student Presentation
    Final Project
► Goal: Prepare you for your presentation and project

Page 3: Lecture Outline

► A historical perspective on the graphics pipeline
    Dimensions of innovation
    Where we are today
    Fixed-function vs. programmable pipelines
► A closer look at the fixed-function pipeline
    Walk through the sequence of operations
    Reinterpret these as stream operations
► We can program the fixed-function pipeline!
    Some examples
► What constitutes data and memory, and how access affects program design

Page 4: The evolution of the pipeline

Elements of the graphics pipeline:

1. A scene description: vertices, triangles, colors, lighting

2. Transformations that map the scene to a camera viewpoint

3. “Effects”: texturing, shadow mapping, lighting calculations

4. Rasterizing: converting geometry into pixels

5. Pixel processing: depth tests, stencil tests, and other per-pixel operations.

Parameters controlling the design of the pipeline:

1. Where is the boundary between CPU and GPU?

2. What transfer method is used?

3. What resources are provided at each step?

4. What units can access which GPU memory elements?

Page 5: Generation I: 3dfx Voodoo (1996)

http://accelenation.com/?ac.id.123.2

• One of the first true 3D game cards

• Worked by supplementing a standard 2D video card

• Did not do vertex transformations: these were done on the CPU

• Did do texture mapping and z-buffering

[Diagram: pipeline stages Vertex Transforms, Primitive Assembly, Rasterization and Interpolation, Raster Operations, Frame Buffer; CPU and GPU connected over PCI]

Page 6: Aside: Mario Kart 64

Image from: http://www.gamespot.com/users/my_shoe/

► High fragment load / low vertex load

Page 7: Aside: Mario Kart Wii

High fragment load / low vertex load?

Image from: http://wii.ign.com/dor/objects/949580/mario-kart-wii/images/

Page 8: Generation II: GeForce/Radeon 7500 (1998)

http://accelenation.com/?ac.id.123.5

• Main innovation: shifting the transformation and lighting calculations to the GPU

• Allowed multi-texturing, enabling bump maps, light maps, and others

• Faster AGP bus instead of PCI

[Diagram: pipeline stages Vertex Transforms, Primitive Assembly, Rasterization and Interpolation, Raster Operations, Frame Buffer; GPU attached over AGP]

Page 9: Generation III: GeForce3/Radeon 8500 (2001)

http://accelenation.com/?ac.id.123.7

• For the first time, allowed a limited amount of programmability in the vertex pipeline

• Also allowed volume texturing and multi-sampling (for antialiasing)

[Diagram: pipeline stages Vertex Transforms (small vertex shaders), Primitive Assembly, Rasterization and Interpolation, Raster Operations, Frame Buffer; GPU attached over AGP]

Page 10: Generation IV: Radeon 9700/GeForce FX (2002)

http://accelenation.com/?ac.id.123.8

• This generation is the first generation of fully-programmable graphics cards

• Different versions have different resource limits on fragment/vertex programs

[Diagram: pipeline stages Vertex Transforms (programmable vertex shader), Primitive Assembly, Rasterization and Interpolation, Programmable Fragment Processor, Raster Operations, Frame Buffer; Texture Memory; GPU attached over AGP]

Page 11: Generation IV.V: GeForce6/X800 (2004)

Slide adapted from Suresh Venkatasubramanian and Joe Kider

► Simultaneous rendering to multiple buffers
► True conditionals and loops
► PCIe bus
► Vertex texture fetch

[Diagram: pipeline stages Vertex Transforms (programmable vertex shader), Primitive Assembly, Rasterization and Interpolation, Programmable Fragment Processor, Raster Operations, Frame Buffer; Texture Memory; GPU attached over PCIe]

Page 12: NVIDIA NV40 Architecture

Image from GPU Gems 2: http://http.developer.nvidia.com/GPUGems2/gpugems2_chapter30.html

[Diagram labels: 6 vertex shader units; 16 fragment shader units; Vertex Texture Fetch]

Page 13: D3D 10 Pipeline

Image from David Blythe : http://download.microsoft.com/download/f/2/d/f2d5ee2c-b7ba-4cd0-9686-b6508b5479a1/direct3d10_web.pdf

Page 14: Generation IV.V: GeForce6/X800 (2004)

Not exactly a quantum leap, but…

► Simultaneous rendering to multiple buffers
► True conditionals and loops
► Higher precision throughput in the pipeline (64 bits end-to-end, compared to 32 bits earlier)
► PCIe bus
► More memory/program length/texture accesses
► Texture access by vertex shader

[Diagram: pipeline stages Vertex Transforms (programmable vertex shader), Primitive Assembly, Rasterization and Interpolation, Programmable Fragment Processor, Raster Operations, Frame Buffer; Texture Memory; bus labeled AGP]

Page 15: Generation V: GeForce8800/HD2900 (2006)

Complete quantum leap

► Ground-up rewrite of GPU
► Support for DirectX 10, and all it implies (more on this later)
► Geometry Shader
► Support for General GPU programming
► Shared Memory (NVIDIA only)

[Diagram: pipeline stages Input Assembler, Programmable Vertex Shader, Programmable Geometry Shader, Programmable Pixel Shader, Raster Operations, Output Merger; bus labeled AGP]

Page 16: Fixed-function pipeline

[Diagram: 3D Application or Game → (3D API commands) → 3D API: OpenGL or Direct3D → CPU-GPU Boundary (AGP/PCIe), GPU command & data stream → GPU Front End → (vertex index stream, pre-transformed vertices) → Programmable Vertex Processor → (transformed vertices) → Primitive Assembly → (assembled primitives) → Rasterization and Interpolation → (pixel location stream, pre-transformed fragments) → Programmable Fragment Processor → (transformed fragments) → Raster Operations → (pixel updates) → Frame Buffer]

Page 17: Geometry Shaders: Point Sprites

Page 18: Geometry Shaders: Point Sprites

Page 19: Geometry Shaders

Image from David Blythe : http://download.microsoft.com/download/f/2/d/f2d5ee2c-b7ba-4cd0-9686-b6508b5479a1/direct3d10_web.pdf

Page 20: NVIDIA G80 Architecture

Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf

Page 21: NVIDIA G80 Architecture

Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf

Page 22: Why Unify Shader Processors?

Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf

Page 23: Why Unify Shader Processors?

Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf

Page 24: Unified Shader Processors

Slide from David Luebke: http://s08.idav.ucdavis.edu/luebke-nvidia-gpu-architecture.pdf

Page 25: Terminology

Shader Model | Direct3D | OpenGL | Video card example
2            | 9        | 2.x    | NVIDIA GeForce 6800, ATI Radeon X800
3            | 10.x     | 3.x    | NVIDIA GeForce 8800, ATI Radeon HD 2900
4            | 11.x     | 4.x    | NVIDIA GeForce GTX 480, ATI Radeon HD 5870

Page 26: Shader Capabilities

Table courtesy of A K Peters, Ltd. http://www.realtimerendering.com/

Page 27: Shader Capabilities

Table courtesy of A K Peters, Ltd. http://www.realtimerendering.com/

Page 28: Evolution of the Programmable Graphics Pipeline

Slide from Mike Houston: http://s09.idav.ucdavis.edu/talks/01-BPS-SIGGRAPH09-mhouston.pdf

Page 29: Evolution of the Programmable Graphics Pipeline

Slide from Mike Houston: http://s09.idav.ucdavis.edu/talks/01-BPS-SIGGRAPH09-mhouston.pdf

Page 30: Evolution of the Programmable Graphics Pipeline

► Not covered today:
    SM 5 / D3D 11 / GL 4
    Tessellation shaders
► *cough* student presentation *cough*

Later this semester: NVIDIA Fermi
► Dual warp scheduler
► Configurable L1 / shared memory
► Double precision
► …

Page 31: New Tool: AMD System Monitor

► Released 01/04/2011
► http://support.amd.com/us/kbarticles/Pages/AMDSystemMonitor.aspx

Page 32: A closer look at the fixed-function pipeline

Page 33: Pipeline Input

[Diagram: pipeline input. Vertex: position (x, y, z), color (r, g, b, a), normal (Nx, Ny, Nz), texture coordinates (tx, ty, [tz]), (tx, ty), (tx, ty). Image: F(x, y) = (r, g, b, a). Material properties.]

Page 34: ModelView Transformation

► Vertices mapped from object space to world space
► M = model transformation (scene)
► V = view transformation (camera)

(X', Y', Z', W')^T = M * V * (X, Y, Z, 1)^T

Each matrix transform is applied to each vertex in the input stream. Think of this as a kernel operator.
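The kernel view of this stage can be sketched in a few lines of Python. This is only an illustration, not the hardware's implementation: the helper names (`translation`, `transform`, `transform_stream`) are hypothetical, matrices are plain 4x4 nested lists, and each vertex is assumed to pass through the model transform first and the view transform second.

```python
# Sketch: the ModelView stage as one kernel mapped over a vertex stream.
# All helper names are hypothetical; matrices are 4x4 nested lists and
# vertices are homogeneous (x, y, z, 1) tuples.

def translation(tx, ty, tz):
    """Build a 4x4 translation matrix."""
    return [[1, 0, 0, tx],
            [0, 1, 0, ty],
            [0, 0, 1, tz],
            [0, 0, 0, 1]]

def transform(matrix, vertex):
    """Apply one 4x4 matrix to one homogeneous vertex."""
    return tuple(sum(matrix[i][k] * vertex[k] for k in range(4))
                 for i in range(4))

def transform_stream(model, view, vertices):
    """Run the same kernel on every vertex: model transform, then view."""
    return [transform(view, transform(model, (x, y, z, 1)))
            for (x, y, z) in vertices]

model = translation(1, 0, 0)    # place the object in the scene
view = translation(0, 0, -5)    # move the scene in front of the camera
out = transform_stream(model, view, [(0, 0, 0), (1, 2, 3)])
print(out)
```

No vertex reads or writes any other vertex, which is what makes the stage a stream operation.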

Page 35: Lighting

Lighting information is combined with normals and other parameters at each vertex in order to create new colors.

Color(v) = emissive + ambient + diffuse + specular

Each term on the right-hand side is a function of the vertex color, position, normal and material properties.
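A rough per-vertex sketch of the sum follows. The slide only gives the four-term formula; the specific diffuse (Lambert) and specular (half-vector) expressions below, the single light, and all function and key names are assumptions chosen for illustration.

```python
import math

def normalize(v):
    """Scale a 3-vector to unit length (zero vectors are returned as-is)."""
    length = math.sqrt(sum(c * c for c in v))
    return v if length == 0 else tuple(c / length for c in v)

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def vertex_color(material, normal, to_light, to_eye, light_color):
    """Color(v) = emissive + ambient + diffuse + specular, for one light."""
    n, l, e = normalize(normal), normalize(to_light), normalize(to_eye)
    n_dot_l = max(dot(n, l), 0.0)                       # Lambert factor (assumed model)
    half = normalize(tuple(a + b for a, b in zip(l, e)))
    # Assumed half-vector specular; no highlight when facing away from the light.
    spec = max(dot(n, half), 0.0) ** material["shininess"] if n_dot_l > 0 else 0.0
    color = []
    for i in range(3):
        emissive = material["emissive"][i]
        ambient = material["ambient"][i] * light_color[i]
        diffuse = material["diffuse"][i] * light_color[i] * n_dot_l
        specular = material["specular"][i] * light_color[i] * spec
        color.append(min(emissive + ambient + diffuse + specular, 1.0))
    return tuple(color)

material = {"emissive": (0.0, 0.0, 0.0), "ambient": (0.1, 0.1, 0.1),
            "diffuse": (0.5, 0.0, 0.0), "specular": (0.2, 0.2, 0.2),
            "shininess": 8}
color = vertex_color(material, (0, 0, 1), (0, 0, 1), (0, 0, 1), (1.0, 1.0, 1.0))
print(color)
```

Like the transform stage, this is a kernel: the output color depends only on that vertex's own attributes and on constants (material, light).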

Page 36: Clipping/Projection/Viewport (3D)

► More matrix transformations that operate on a vertex to transform it into the viewport space.
► Note that a vertex may be eliminated from the input stream (if it is clipped).
► The viewport is two-dimensional; however, the vertex z-value is retained for depth testing.

The clip test is the first example of a conditional in the pipeline.

However, it is not a fully general conditional. Why?
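As a simplified illustration of the clip test as a stream filter, the sketch below drops any vertex outside the canonical clip volume. It assumes OpenGL-style clip coordinates (-w <= x, y, z <= w) and tests vertices in isolation; real hardware clips whole primitives and may generate new vertices, which is not modeled here. The function names are hypothetical.

```python
def inside_clip_volume(vertex):
    """Assumed clip rule: -w <= x, y, z <= w in homogeneous clip coordinates."""
    x, y, z, w = vertex
    return all(-w <= c <= w for c in (x, y, z))

def clip_stream(vertices):
    """Filter the stream: clipped vertices are removed, the rest pass unchanged."""
    return [v for v in vertices if inside_clip_volume(v)]

kept = clip_stream([(0, 0, 0, 1), (2, 0, 0, 1), (0.5, -0.5, 1, 1)])
print(kept)
```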

Page 37: Rasterizing+Interpolation

Fragment attributes: (r,g,b,a), (x,y,z,w), (tx,ty), …

► All primitives are now converted to fragments.
► Data type change! Vertices to fragments

Texture coordinates are interpolated from texture coordinates of vertices.

This gives us a linear interpolation operator for free. VERY USEFUL!
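The "free" interpolation can be imitated in software with barycentric weights. The sketch below is a toy rasterizer, not how any particular GPU does it: it assumes 2D screen-space triangles, samples at pixel centers, and ignores perspective correction. All names are hypothetical.

```python
def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p with respect to triangle (a, b, c)."""
    (px, py), (ax, ay), (bx, by), (cx, cy) = p, a, b, c
    area = (bx - ax) * (cy - ay) - (cx - ax) * (by - ay)
    w_b = ((px - ax) * (cy - ay) - (cx - ax) * (py - ay)) / area
    w_c = ((bx - ax) * (py - ay) - (px - ax) * (by - ay)) / area
    return 1.0 - w_b - w_c, w_b, w_c

def rasterize(triangle, width, height):
    """triangle: three vertices, each ((x, y), (tx, ty)).
    Returns fragments as ((pixel_x, pixel_y), interpolated_texcoord)."""
    (a, ta), (b, tb), (c, tc) = triangle
    fragments = []
    for y in range(height):
        for x in range(width):
            w = barycentric((x + 0.5, y + 0.5), a, b, c)  # sample at pixel center
            if min(w) < 0:
                continue                                   # pixel is outside the triangle
            tex = tuple(w[0] * ta[i] + w[1] * tb[i] + w[2] * tc[i]
                        for i in range(2))
            fragments.append(((x, y), tex))
    return fragments

triangle = [((0, 0), (0, 0)), ((4, 0), (1, 0)), ((0, 4), (0, 1))]
frags = rasterize(triangle, 4, 4)
print(len(frags), frags[0])
```

Any per-vertex quantity placed in a texture coordinate or color is interpolated this way, which is why the operator is useful for general computation.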

Page 38: Per-fragment operations

► The rasterizer produces a stream of fragments.
► Each fragment undergoes a series of tests with increasing complexity.

Test 1: Scissor

If (fragment lies in fixed rectangle) let it pass else discard it.

Test 2: Alpha

If (fragment.a >= <constant>) let it pass else discard it.

The scissor test is analogous to the clipping operation, in fragment space instead of vertex space.

The alpha test is a slightly more general conditional. Why?
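The two tests can be written as predicates applied to each fragment in the stream. This is a minimal sketch: fragments are assumed to be dictionaries with `x`, `y` and `a` keys, the rectangle is half-open, and the function names are made up for illustration.

```python
def scissor_test(fragment, rect):
    """Pass if the fragment lies in the fixed rectangle (x0, y0, x1, y1), half-open."""
    x0, y0, x1, y1 = rect
    return x0 <= fragment["x"] < x1 and y0 <= fragment["y"] < y1

def alpha_test(fragment, reference):
    """Pass if fragment.a >= <constant>; the test reads the fragment's own data."""
    return fragment["a"] >= reference

def run_tests(fragments, rect, alpha_reference):
    """Apply both tests in order; a fragment that fails either one is discarded."""
    return [f for f in fragments
            if scissor_test(f, rect) and alpha_test(f, alpha_reference)]

fragments = [
    {"x": 1, "y": 1, "a": 0.9},   # passes both tests
    {"x": 9, "y": 1, "a": 0.9},   # outside the scissor rectangle
    {"x": 2, "y": 2, "a": 0.1},   # alpha below the reference
]
survivors = run_tests(fragments, rect=(0, 0, 4, 4), alpha_reference=0.5)
print(survivors)
```

The scissor test depends only on position, while the alpha test depends on a value carried by the fragment.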

Page 39: Per-fragment operations

► Stencil test: S(x, y) is the stencil buffer value for the fragment with coordinates (x, y).
► If f(S(x, y)), let the pixel pass, else kill it. Update S(x, y) conditionally depending on f(S(x, y)) and g(D(x, y)).
► Depth test: D(x, y) is the depth buffer value.
► If g(D(x, y)), let the pixel pass, else kill it. Update D(x, y) conditionally.
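A software sketch of these two tests is shown below. The predicates f and g are parameters; the defaults (stencil always passes, depth passes on "less") and the update rule (increment the stencil value when both tests pass) are assumptions picked for illustration, not the only configuration a real pipeline offers.

```python
def per_fragment_ops(fragment, stencil, depth, color,
                     f=lambda s: True,          # assumed stencil predicate f(S(x, y))
                     g=lambda z, d: z < d):     # assumed depth predicate ("less")
    """Run the stencil test, then the depth test, then the conditional updates."""
    x, y = fragment["x"], fragment["y"]
    if not f(stencil[y][x]):
        return False                 # killed by the stencil test
    if not g(fragment["z"], depth[y][x]):
        return False                 # killed by the depth test
    stencil[y][x] += 1               # assumed stencil update: count passing fragments
    depth[y][x] = fragment["z"]      # depth update
    color[y][x] = fragment["rgba"]
    return True

# A 1x1 "screen" is enough to follow the state changes.
stencil, depth, color = [[0]], [[1.0]], [[None]]
stream = [
    {"x": 0, "y": 0, "z": 0.5, "rgba": "red"},
    {"x": 0, "y": 0, "z": 0.8, "rgba": "blue"},    # fails the depth test
    {"x": 0, "y": 0, "z": 0.2, "rgba": "green"},
]
results = [per_fragment_ops(frag, stencil, depth, color) for frag in stream]
print(results, stencil, depth, color)
```

Each fragment reads and writes buffer state only at its own (x, y) location.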

Page 40: Per-fragment operations

► Stencil and depth tests are more general conditionals. Why?
► These are the only tests that can change the state of internal storage (stencil buffer, depth buffer).
► One of the update operations for the stencil buffer is a “count” operation. Remember this!
► Unfortunately, stencil and depth buffers have lower precision (8 and 24 bits, respectively).

Page 41: Post-processing

► Blending: pixels are accumulated into final framebuffer storage

new-val = old-val op pixel-value

If op is +, we can sum all the (say) red components of pixels that pass all tests.

Problem: in generations <= IV, blending can only be done in 8-bit channels (the channels sent to the video card); precision is limited.

We could use accumulation buffers, but they are very slow.
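The precision problem is easy to reproduce in a small simulation. The sketch below assumes additive blending into an 8-bit channel that saturates at 255, a simplified stand-in for the framebuffer behaviour described above; the function names are hypothetical.

```python
def to_8bit(value):
    """Quantize a channel value in [0, 1] to an 8-bit integer."""
    return max(0, min(255, round(value * 255)))

def blend_add_8bit(old, new):
    """new-val = old-val + pixel-value, saturating at the channel maximum."""
    return min(old + new, 255)

def accumulate(pixel_values):
    """Blend a sequence of pixel values into one 8-bit channel."""
    channel = 0
    for value in pixel_values:
        channel = blend_add_8bit(channel, to_8bit(value))
    return channel

values = [0.5, 0.4, 0.3]
exact_sum = sum(values)            # the sum we wanted
blended = accumulate(values)       # what the 8-bit channel holds
print(exact_sum, blended, blended / 255)
```

Here the true sum exceeds 1.0 but the channel cannot represent it, so the result is clamped. Even without saturation, each term is rounded to one of 256 levels.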

Page 42: Quick Review: Buffers

► Color Buffers
    Front-left
    Front-right
    Back-left
    Back-right
► Depth Buffer (z-buffer)
► Stencil Buffer
► Accumulation Buffer

Page 43: Quick Review: Tests

► Scissor Test
    If (fragment exists inside rectangle) keep
    Else delete
► Alpha Test – Compare fragment’s alpha value against reference value
► Stencil Test – Compare fragment against stencil map
► Depth Test – Compare a fragment’s depth to the depth value already present in the depth buffer
    Never
    Always
    Less
    Less-Equal
    Greater-Equal
    Greater
    Not-Equal

Page 44: Readback = Feedback

What is the output of a “computation”?

1. Display on screen.

2. Render to buffer and retrieve values (readback)

Readbacks are VERY slow!

PCI and AGP buses are asymmetric: DMA enables fast transfer TO the graphics card. Reverse transfer has traditionally not been required, and is much slower. PCIe is symmetric but still very slow compared to GPU speeds.

This motivates the idea of a “pass” being an atomic “unit cost” operation.

What options do we have?

1. Render to off-screen buffers like the accumulation buffer

2. Copy from framebuffer to texture memory?

3. Render directly to a texture?

Page 45: Time for a puzzle…

Page 46: An Example: Voronoi Diagrams

Page 47: Definition

► You are given n sites (p_1, p_2, p_3, …, p_n) in the plane (think of each site as having a color).
► Any point p in the plane is closest to some site p_j. Color p with color j.
► Compute this colored map on the plane. In other words, compute the nearest-neighbour diagram of the sites.

Page 48: Example

So how do we do this on the graphics card?

Note: this does not use any programmable features of the card.

Page 49: Hint: Think in one dimension higher

The lower envelope of “cones” centered at the points is the Voronoi diagram of this set of points.

Page 50: The Procedure

► In order to compute the lower envelope, we need to determine, at each pixel, the fragment having the smallest depth value.
► This can be done with a simple depth test.
    Allow a fragment to pass only if its depth is smaller than the current depth buffer value, and update the buffer accordingly.
► The fragment that survives has the correct color.
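The procedure can be emulated on the CPU with an explicit depth buffer. In this sketch each site's cone is "drawn" by computing, at every pixel, the cone height (the Euclidean distance to the site) and applying a "less" depth test. The grid size, site format and function name are assumptions for illustration.

```python
import math

def voronoi(sites, width, height):
    """sites: list of ((sx, sy), color). Returns a height x width grid of colors."""
    depth = [[math.inf] * width for _ in range(height)]   # cleared depth buffer
    color = [[None] * width for _ in range(height)]       # cleared color buffer
    for (sx, sy), site_color in sites:                    # draw one cone per site
        for y in range(height):
            for x in range(width):
                z = math.hypot(x - sx, y - sy)            # cone height over pixel (x, y)
                if z < depth[y][x]:                       # depth test: pass if smaller
                    depth[y][x] = z                       # update the depth buffer
                    color[y][x] = site_color              # surviving fragment's color
    return color

diagram = voronoi([((0, 0), "red"), ((3, 0), "blue")], width=4, height=1)
print(diagram)
```

On the graphics card the two inner loops are what rasterization and the depth test do for free, so drawing n cones yields the diagram without any programmable stage.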

Page 51: Let’s make this more complicated

► The 1-median of a set of sites is a point q* that minimizes the sum of distances from all sites to itself.

q* = arg min_q Σ_p d(p, q)

[Figure labels: WRONG! / RIGHT!]

Page 52: A First Step

Can we compute, for each pixel q, the value

F(q) = Σ_p d(p, q) ?

We can use the cone trick from before, and instead of computing the minimum depth value, compute the sum of all depth values using blending.

What’s the catch?

Page 53: We can’t blend depth values!

► Using texture interpolation helps here.
► Instead of drawing a single cone, we draw a shaded cone, with an appropriately constructed texture map.
► Then, a fragment having depth z has color component 1.0 * z.
► Now we can blend the colors.
► OpenGL has an aggregation operator that will return the overall min.

Warning: we are ignoring issues of precision.
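A CPU emulation of the idea is sketched below: each site contributes a shaded cone whose color equals the distance to the site, additive blending sums the contributions per pixel, and a final min over the buffer picks the best pixel. It uses floating-point accumulation, so it sidesteps the precision issue the slide warns about, and it restricts the answer to pixel centers. The function names are hypothetical.

```python
import math

def distance_sum_field(sites, width, height):
    """Blend one shaded cone per site: F(q) = sum over sites p of d(p, q)."""
    total = [[0.0] * width for _ in range(height)]        # cleared color buffer
    for sx, sy in sites:
        for y in range(height):
            for x in range(width):
                total[y][x] += math.hypot(x - sx, y - sy)  # additive blending
    return total

def one_median(sites, width, height):
    """Return the pixel minimizing F(q) (the 'overall min' aggregation step)."""
    total = distance_sum_field(sites, width, height)
    best_value, best_pixel = min((total[y][x], (x, y))
                                 for y in range(height) for x in range(width))
    return best_pixel

median = one_median([(0, 0), (2, 0), (4, 0)], width=5, height=1)
print(median)
```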

Page 54: Now we apply a streaming perspective…

Page 55: Two kinds of data

► Stream data (data associated with vertices and fragments)
    Color/position/texture coordinates.
    Functionally similar to member variables in a C++ object.
    Can be used for limited message passing: I modify an object state and send it to you.
► “Persistent” data (associated with buffers)
    Depth, stencil, textures.
    Can be modified by multiple fragments in a single pass.
    Functionally similar to a global array, BUT each fragment only gets one location to change.
    Can be used to communicate across passes.

Page 56: Who has access?

► Memory “connectivity” in the graphics use of a GPU is tricky.
► In a traditional C program, all global variables can be written by all routines.
► In the fixed-function pipeline, certain data is private.
    A fragment cannot change a depth or stencil value of a location different from its own.
    The framebuffer can be copied to a texture; a depth buffer cannot be copied in this way, and neither can a stencil buffer.
    Only a stencil buffer can count (efficiently).
► In the fixed-function pipeline, depth and stencil buffers can be used in a multi-pass computation only via readbacks.
► A texture cannot be written directly.
► In programmable GPUs, the memory connectivity becomes more open, but there are still constraints.

Understanding access constraints and memory “connectivity” is a key step in programming the GPU.

Page 57: How does this relate to stream programs?

► The most important question to ask when programming the GPU is:
    What can I do in one pass?
► Limitations on memory connectivity mean that a step in a computation may often have to be deferred to a new pass.
► For example, when computing the second smallest element, we could not store the current minimum in read/write memory.
► Thus, the “communication” of this value has to happen across a pass.
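One way to picture the second-smallest example is as two passes, each a plain minimum computed with a depth-test-style comparison. In this sketch the result of pass 1 is handed to pass 2 as a read-only constant, standing in for the communication across passes. The function names, and the choice to treat "second smallest" as the second smallest distinct value, are assumptions.

```python
import math

def min_pass(values, discard_at_or_below=None):
    """One pass: keep the smallest value seen, like a 'less' depth test.
    Values at or below the read-only threshold are discarded first."""
    current = math.inf                       # plays the role of the depth buffer
    for value in values:
        if discard_at_or_below is not None and value <= discard_at_or_below:
            continue                         # fragment discarded before the depth test
        if value < current:
            current = value
    return current

def second_smallest(values):
    smallest = min_pass(values)                              # pass 1
    return min_pass(values, discard_at_or_below=smallest)    # pass 2 reads pass 1's result

result = second_smallest([5, 3, 8, 3, 4])
print(result)
```

Within a single pass no element can both read and overwrite a shared "current minimum", so the value is fixed at the end of pass 1 and only read in pass 2.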