Rasterization
Kurt Akeley
CS248 Lecture 5
9 October 2007
http://graphics.stanford.edu/courses/cs248-07/
CS248 Lecture 5 Kurt Akeley, Fall 2007
The vertex pipeline
Pipeline stages, in order: Application, Vertex assembly, Vertex operations, Primitive assembly, Primitive operations, Rasterization, Fragment operations, Frame buffer, Display
Data types passed between stages:
struct { float x,y,z,w; float r,g,b,a; } vertex;
struct { vertex v0,v1,v2; } triangle;
struct { short int x,y; float depth; float r,g,b,a; } fragment;
struct { int depth; byte r,g,b,a; } pixel;
Screen coordinates (input to rasterization)
Screen coordinates
Ideal screen coordinates are continuous
Not integer pixel locations!
Implementations always use discrete math
Fixed-point or floating-point
Always with substantial sub-pixel precision
Fixed-point illustrated in the pre-filter antialiasing lecture
A pixel is a ‘big’ thing
Spatial resolution can approach # of pixels on screen
Data resolution can be large too
Multiple copies of pixel data structure
SGI RealityEngine frame buffer was deeper than wide or tall
struct { float x,y,z; float r,g,b,a;} vertex;
Key facts about perspective projection
Straight lines project to straight lines (on a plane)
Only vertexes need to be transformed
That’s why we’re interested in lines and polygons
Parameterizations (e.g., distance) are warped: equal steps along a line in 3-D do not project to equal steps on screen
More on projection in later lectures …
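As a small worked example of the warping, assume (this setup is not from the slides) a 2-D eye space in which a point (x, z) projects to the screen coordinate x_s = x / z. The midpoint of a segment in 3-D then does not land on the midpoint of its projection:

```latex
% Assumed projection for illustration only: x_s = x / z
\[
P_0 = (x, z) = (-1, 1), \qquad P_1 = (1, 3)
\]
\[
x_s(P_0) = -1, \qquad x_s(P_1) = \tfrac{1}{3}
\]
\[
\text{3-D midpoint: } \tfrac{1}{2}(P_0 + P_1) = (0, 2) \;\Rightarrow\; x_s = 0
\]
\[
\text{screen midpoint: } \tfrac{1}{2}\left(-1 + \tfrac{1}{3}\right) = -\tfrac{1}{3} \neq 0
\]
```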
Two fundamental operations
Fragment selection
Identify pixels for which fragments are to be generated
<x,y> ‘attributes’ are special
Must be exact for aliased rendering, and a superset for antialiased rendering
Should be efficient, for performance
Attribute assignment
Assign attribute values to each fragment
E.g., color, depth, …
struct { short int x,y; float depth; float r,g,b,a;} fragment;
Fragment selection
Generate one fragment for each pixel that is intersected (or covered) by the primitive
Intersected could mean that the primitive’s area intersects the pixel’s:
Center point, or
Square region, or
Filter-function (in area-sampling terms)
Some examples …
Point-sampled fragment selection
Generate fragment if pixel center is inside triangle
Implements point-sampled aliased rasterization
Point-sampled fragment selection
Pixels along shared edges should have exactly one fragment selected for them
Must handle on-edge/on-vertex sample points consistently
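One way to get exactly one fragment per pixel along a shared edge is to test the sample point against three edge functions and apply a fixed tie-break rule to samples that fall exactly on an edge. The sketch below is an illustration under stated assumptions, not the method used by any system named in this lecture: sample points sit at integer coordinates, and the names `Pt`, `Tri`, `edge`, `owns_edge`, `inside` and `coverage_count` are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { int32_t x, y; } Pt;   /* sample point or vertex (integer grid assumed) */
typedef struct { Pt v0, v1, v2; } Tri;

/* Twice the signed area of (a, b, p); the sign says which side of a->b holds p. */
static int64_t edge(Pt a, Pt b, Pt p) {
    return (int64_t)(b.x - a.x) * (p.y - a.y) - (int64_t)(b.y - a.y) * (p.x - a.x);
}

/* Tie-break: a directed edge "owns" samples lying exactly on it only when it
   points in one of two half-planes of directions. Reversing the edge flips
   the answer, so two triangles sharing an edge never both claim a sample. */
static bool owns_edge(Pt a, Pt b) {
    int32_t dx = b.x - a.x, dy = b.y - a.y;
    return dy > 0 || (dy == 0 && dx > 0);
}

/* Point-sampled fragment selection for one sample point. */
bool inside(Tri t, Pt p) {
    int64_t area = edge(t.v0, t.v1, t.v2);
    if (area == 0) return false;                              /* degenerate triangle */
    if (area < 0) { Pt tmp = t.v1; t.v1 = t.v2; t.v2 = tmp; } /* force one winding */
    Pt a[3] = { t.v0, t.v1, t.v2 };
    Pt b[3] = { t.v1, t.v2, t.v0 };
    for (int i = 0; i < 3; ++i) {
        int64_t w = edge(a[i], b[i], p);
        if (w < 0) return false;                              /* strictly outside */
        if (w == 0 && !owns_edge(a[i], b[i])) return false;   /* on edge, not owned */
    }
    return true;
}

/* Number of triangles in tris[0..n-1] that select sample p. */
int coverage_count(const Tri *tris, int n, Pt p) {
    assert(n >= 0);
    int count = 0;
    for (int i = 0; i < n; ++i) count += inside(tris[i], p) ? 1 : 0;
    return count;
}
```

For two triangles that split a square along its diagonal, every sample on the diagonal is selected by exactly one of them. Samples on the outer boundary of the square may be selected by neither, which is what lets a neighboring primitive claim them.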
Tiled fragment selection
Generate fragment if unit square intersects triangle
Implements multisample and tiled rasterizations
Tiled fragment selection
Multisample rasterization
4x4 samples per pixel
Tiled fragment selection
Tiled rasterization
4x4 pixels per tile
Antialiased fragment selection
Generate fragment if filter function intersects triangle
Implements pre-filtered antialiasing
Fragment selection (continued)
What if the primitive doesn’t have a geometric area?
Delta-function points and lines don’t
Three choices:
Rule-based approach
n x n pixel point
Bresenham line (details later in this lecture)
Pre-filter
Bandlimited, infinite spatial extent
Assign a screen-space geometry
Circle for point
Rectangle for line
Geometry-based attribute assignment
(Assumes vertex-specified geometry, i.e., polygons)
Two steps
Parameterize the attribute: fit a function (surface) to the vertex values
Point-sample this parameterization
Which parameterization?
Constant (aka flat shading): no continuity at shared edges
Bilinear (planar surface): value continuity at shared edges
Cubic (non-planar surface): slope continuity at shared edges
Mach banding – value discontinuities
Flat shaded, but appeared ‘scalloped’
Geometry-based attribute assignment (continued)
A fourth parameterization choice: Gouraud (hybrid)
Gouraud shaded quad
Fragment selection
Walk (iterate along) edges
Change edges at vertexes
Attribute assignment
Loop-in-a-loop algorithm:
Iterate linearly along edges
Iterate linearly edge-to-edge
Outer loop is complex, e.g., either 2 or 3 regions
Parameterization is a function of screen orientation and choice of spans
Problems with quads / polygons
“All” projected quadrilaterals are non-planar
Due to discrete coordinate precision
What if quadrilateral is concave?
Concave is complex (split spans; see example)
Non-planar concave for some view
What if quadrilateral intersects itself?
A real mess (no vertex to signal the change; see example)
Non-planar “bowtie” for some view
All polygons are triangles (or should be)
Triangle is always convex
Regardless of arithmetic precision
Simplifies rasterization—no special cases
Three points define a plane
All triangles are planar
All parameterizations are (or can be) planar
Modern GPUs decompose polygons to triangles
SGI switched in 1990 with the VGX product
OpenGL is designed to allow triangulation
Optimized quadrilateral decomposition developed
Complex polygons
There are algorithms to rasterize
Self-intersecting polygons
Polygons with holes
…
These polygons have applications in 2-D rendering
But they are not useful for 3-D rendering
Too slow to render
Don’t have meaningful attribute parameterizations
So we will ignore them
Normal-based quad decomposition
Compute A•C and B•D
Connect vertex pair with the greater dot product
Avoid connecting the ‘stirrups’
Bottom line: decomposition matters!
[Figure: quadrilateral with vertex normals A, B, C, D]
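The dot-product rule above can be sketched in a few lines. This is an illustration only: it assumes that A, C and B, D are the two pairs of opposite corners, that the normals are unit length, and that ties go to the A-C diagonal. The names `Vec3`, `QuadSplit` and `choose_quad_split` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;

typedef enum { SPLIT_AC, SPLIT_BD } QuadSplit;

static float dot3(Vec3 a, Vec3 b) {
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

/* normals[0..3] hold the vertex normals at A, B, C, D.
   Assumption: A/C and B/D are the two diagonals of the quad.
   Returns which diagonal to connect when splitting into two triangles:
   the pair whose normals agree more (greater dot product). */
QuadSplit choose_quad_split(const Vec3 normals[4]) {
    assert(normals != NULL);
    float ac = dot3(normals[0], normals[2]);
    float bd = dot3(normals[1], normals[3]);
    return (ac >= bd) ? SPLIT_AC : SPLIT_BD;
}
```

Connecting the pair with the more similar normals keeps the new interior edge where the surface bends least.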
Iteration vs. direct evaluation
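Iteration:
Along edges: $y_{n+1} = y_n + \mathit{dydx}$, $x_{n-1} = x_n - \mathit{dxdy}$, $a_{n+1} = a_n + \mathit{dady}$
Between adjacent pixels: $a_{x+1,y} = a_{x,y} + \mathit{dadx}$, $a_{x,y-1} = a_{x,y} - \mathit{dady}$
Direct evaluation:
$y_n = y_0 + x \times \mathit{dydx}$, $a_{x,y} = a_0 + x \times \mathit{dadx} + y \times \mathit{dady}$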
Iteration vs. direct evaluation
Iteration
Is less numerically intensive (no multiplication)
Direct evaluation
Is more precise (no accumulated error)
Parallelizes better (no sequence presumption)
DDA iteration
Digital Differential Analyzer (DDA)
Implements iteration in fixed-point representation
E.g., iiiiiiii.ffff (8.4) or siiiiiii.ffff (s7.4)
Repeatedly adds delta value to accumulated value
Loses ½ LSB precision per iteration step
Requires log2(n) fraction bits for n steps, to reach the correct extreme values
Dimensions of the rendering space determine the maximum number of steps; this may differ from the size of the frame buffer
2-D iteration requires an extra bit
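The precision rule above can be checked with a small fixed-point experiment. The sketch below is illustrative rather than a description of any particular hardware: it assumes non-negative values, a slope given as a rational `num/den`, and round-to-nearest when converting to fixed point. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert the non-negative rational num/den to fixed point with
   frac_bits fraction bits, rounding to nearest. */
int64_t to_fixed(int64_t num, int64_t den, int frac_bits) {
    assert(num >= 0 && den > 0 && frac_bits >= 0 && frac_bits < 31);
    return ((num << frac_bits) + den / 2) / den;
}

/* DDA iteration: repeatedly add a pre-rounded delta to the accumulator.
   The rounding error in delta_fx is added once per step. */
int64_t dda_iterate(int64_t start_fx, int64_t delta_fx, int steps) {
    assert(steps >= 0);
    int64_t acc = start_fx;
    for (int i = 0; i < steps; ++i) acc += delta_fx;
    return acc;
}

/* Direct evaluation: one multiply, then a single rounding at the end. */
int64_t direct_eval(int64_t start_fx, int64_t num, int64_t den,
                    int steps, int frac_bits) {
    assert(steps >= 0);
    return start_fx + to_fixed(num * steps, den, frac_bits);
}
```

With a slope of 1/3 over 48 steps in an x.4 format, the delta rounds to 5/16 and `dda_iterate(0, to_fixed(1, 3, 4), 48)` ends at 240, i.e. 15.0 instead of the exact 16.0. `direct_eval(0, 1, 3, 48, 4)` gives 256, i.e. 16.0. Carrying 6 more fraction bits during iteration (log2(48) rounded up) and rounding back to 4 bits at the end also recovers 16.0.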
Triangle rasterization examples
Gouraud shaded (GTX)
Edge walk, planar parameterization (VGX)
Barycentric direct evaluation (InfiniteReality)
Small tiles (Bali – proposed)
Per-pixel evaluation (Pixel Planes 4)
Algorithm properties
Setup and execution costs
Setup: constant per triangle
Execution: relative to triangle’s projected area
Ability to parallelize
Ability to cull to a rectangular screen region
To support tiling
To support “scissoring”
[Figure: a triangle to be rasterized, overlapping a rectangular scissor region]
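A minimal sketch of culling to a rectangular region, assuming the triangle is represented by its integer bounding box and that rectangles are half-open (`xmin` inclusive, `xmax` exclusive). The names `Rect` and `scissor_cull` are hypothetical, and real rasterizers may cull in other ways.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Half-open rectangle: covers [xmin, xmax) x [ymin, ymax). */
typedef struct { int xmin, ymin, xmax, ymax; } Rect;

static int imin(int a, int b) { return a < b ? a : b; }
static int imax(int a, int b) { return a > b ? a : b; }

/* Intersect a triangle's bounding box with the scissor (or tile) rectangle.
   Writes the region that still needs to be visited to *out and returns
   false when the intersection is empty, i.e. the triangle can be skipped. */
bool scissor_cull(Rect bbox, Rect scissor, Rect *out) {
    assert(out != NULL);
    out->xmin = imax(bbox.xmin, scissor.xmin);
    out->ymin = imax(bbox.ymin, scissor.ymin);
    out->xmax = imin(bbox.xmax, scissor.xmax);
    out->ymax = imin(bbox.ymax, scissor.ymax);
    return out->xmin < out->xmax && out->ymin < out->ymax;
}
```

The same test serves tiling: pass the tile's rectangle as `scissor`.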
Gouraud shaded (GTX)
Two-stage algorithm
DDA edge walk: fragment selection and attribute assignment
DDA scan-line walk: attribute assignment only
Requires expensive scan-line setup
Location of first sample is a non-unit distance from the edge
Parallelizes in two stages (e.g., GTX)
Cannot scissor efficiently
Works on quadrilaterals
[Figure: scan-line step labeled dadx]
Edge walk, planar evaluation (VGX)
Hybrid algorithm
Edge DDA walk for fragment selection
Efficient generation of conservative fragment set
Pixel-center DDA walk for attribute assignment
Never steps off the sample grid, so the sub-pixel adjustment is made just once, rather than for each scan-line
Scissor cull possible
Adds complexity to edge walk
Easy for attribute evaluation
Parallelizes similarly to Gouraud
DDA can operate out-of-range
MSBs beyond desired range don’t influence result
Carry chain flows up, not down
Can handle arbitrarily large slopes
Can iterate outside the triangle’s area
Must not clamp (range limit) intermediate results!
Doesn’t work for floating point!
[Figure: DDA adder; Accum + Delta feeds back into Accum]
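The reason intermediate results must not be clamped can be seen with an 8-bit accumulator. In this sketch (an illustration with hypothetical function names, using unsigned 8-bit arithmetic as a stand-in for a fixed-point DDA register), the value steps out of range and back again. Wrapping arithmetic returns to the correct value, while clamping each intermediate result does not.

```c
#include <assert.h>
#include <stdint.h>

/* Step forward n_fwd times and back n_back times, letting the 8-bit
   accumulator wrap modulo 256 on every addition and subtraction. */
uint8_t walk_wrapping(uint8_t start, uint8_t delta, int n_fwd, int n_back) {
    assert(n_fwd >= 0 && n_back >= 0);
    uint8_t acc = start;
    for (int i = 0; i < n_fwd; ++i)  acc = (uint8_t)(acc + delta);
    for (int i = 0; i < n_back; ++i) acc = (uint8_t)(acc - delta);
    return acc;
}

/* Same walk, but clamping every intermediate result to [0, 255]. */
uint8_t walk_clamping(uint8_t start, uint8_t delta, int n_fwd, int n_back) {
    assert(n_fwd >= 0 && n_back >= 0);
    int acc = start;
    for (int i = 0; i < n_fwd; ++i)  { acc += delta; if (acc > 255) acc = 255; }
    for (int i = 0; i < n_back; ++i) { acc -= delta; if (acc < 0)   acc = 0; }
    return (uint8_t)acc;
}
```

Starting at 200 with a delta of 100, one step out and one step back gives 200 with wrapping, because the lost carry never affects the low bits. With clamping the result is 155.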
Wrapping
Value  Binary
3      11
2      10
1      01
0      00
Problem: overflow or underflow of iterated value
Integer arithmetic “wraps”
Maximum value overflows to zero
Zero underflows to maximum value
A minor iteration error becomes a huge value error
Guard bits
Solution: extend range to detect “wrapped” values
Add one or two “guard” MSBs
Non-zero guard bit(s) indicate an out-of-range value
‘Clamp’ out-of-range values to the nearer of zero or max
Guard-bit example
2-bit value, 1 guard bit
Value    Binary  Clamped value  Clamped binary  Note
5        101     3              11              overflow
4        100     3              11              overflow
3        011     3              11
2        010     2              10
1        001     1              01
0        000     0              00
-1 (7)   111     0              00              underflow
-2 (6)   110     0              00              underflow
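The table can be reproduced by a short clamp routine. This is one plausible software equivalent, assuming a single guard bit directly above an n-bit value and the convention that a wrapped value just above the range has a clear top value bit while one just below zero has it set. The name `guard_clamp` is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* raw holds an (n+1)-bit iterated value: n value bits plus one guard MSB.
   Returns the n-bit result, clamped to 0 or to the maximum when the
   guard bit shows that the value left the range. */
uint32_t guard_clamp(uint32_t raw, int n) {
    assert(n >= 1 && n <= 30);
    uint32_t max   = (1u << n) - 1u;
    uint32_t guard = (raw >> n) & 1u;
    if (!guard) return raw & max;            /* in range: pass through */
    uint32_t msb = (raw >> (n - 1)) & 1u;    /* top value bit picks the nearer end */
    return msb ? 0u : max;                   /* 11x: underflow, 10x: overflow */
}
```

For n = 2 this maps 4 and 5 to 3, maps 6 and 7 (that is, -2 and -1) to 0, and leaves 0 through 3 unchanged.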
Guard-bit implementation (n-bit)
[Figure: clamp circuit with inputs In0, In1, …, In(n-1) and In(guard), and outputs Out0, Out1, …, Out(n-1)]
DDA bit-assignment examples
Edge walk in 4k x 4k rendering space (35 bits): Guard (1), Pixel (12), Subpixel (10), Iteration (12)
Depth walk in 4k x 4k rendering space (39 bits): Guard (2), Depth (24), Iteration (13, one extra for diagonal)
Barycentric (InfiniteReality)
Hybrid algorithm
Approximate edge walk for fragment selection
Pineda edge functions used to generate AA masks
Direct barycentric evaluation for attribute assignment
Minimizes setup cost
Additional computational complexity accepted
Handles small triangles well
Scissor cull implemented
Barycentric attribute evaluation
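[Figure: triangle with vertexes (x0, y0, v0), (x1, y1, v1), (x2, y2, v2); the interior point (x, y, v) divides it into sub-triangles a0, a1, a2]
$$v = \frac{a_0 v_0 + a_1 v_1 + a_2 v_2}{a}$$
$$a_0 = (y x_1 - x y_1) + (y_1 x_2 - x_1 y_2) + (y_2 x - x_2 y)$$
$$a_1 = \ldots$$
$$a_2 = \ldots$$
$$a = (y_0 x_1 - x_0 y_1) + (y_1 x_2 - x_1 y_2) + (y_2 x_0 - x_2 y_0)$$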
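A sketch of direct barycentric evaluation built on the area expression for a. The slide gives a0 explicitly and elides a1 and a2; the code below assumes they follow the same pattern, with (x, y) substituted for vertex 1 and vertex 2 respectively. That substitution, and the names `Vtx`, `area_term` and `barycentric_eval`, are assumptions of this example.

```c
#include <assert.h>

typedef struct { double x, y, v; } Vtx;   /* screen position plus one attribute */

/* The slide's area expression for the triangle (A, B, C):
   (yA*xB - xA*yB) + (yB*xC - xB*yC) + (yC*xA - xC*yA). */
static double area_term(double xa, double ya, double xb, double yb,
                        double xc, double yc) {
    return (ya * xb - xa * yb) + (yb * xc - xb * yc) + (yc * xa - xc * ya);
}

/* Evaluate the attribute at (x, y) as an area-weighted sum of vertex values. */
double barycentric_eval(Vtx p0, Vtx p1, Vtx p2, double x, double y) {
    double a = area_term(p0.x, p0.y, p1.x, p1.y, p2.x, p2.y);
    assert(a != 0.0);                                          /* degenerate triangle */
    double a0 = area_term(x, y, p1.x, p1.y, p2.x, p2.y);       /* as on the slide */
    double a1 = area_term(p0.x, p0.y, x, y, p2.x, p2.y);       /* assumed by analogy */
    double a2 = area_term(p0.x, p0.y, p1.x, p1.y, x, y);       /* assumed by analogy */
    return (a0 * p0.v + a1 * p1.v + a2 * p2.v) / a;
}
```

For the triangle (0,0), (4,0), (0,4) with vertex values 1, 2 and 3, evaluating at (1,1) gives weights 1/2, 1/4 and 1/4, and therefore 1.75. No per-triangle slopes are needed, which fits the low setup cost noted above.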
Small tiles (Bali – proposed)
Frame buffer tiled into nxn (16x16) regions
Each tile is owned by one of k separate engines
Two-level rasterization:
Tile selection (avoid broadcast, conservative)
Fragment selection and attribute assignment
Parallelizes well
Handles small triangles well
Scissors well
At tile selection stage
Engine per pixel (Pixel Planes 4)
Image courtesy of Anselmo Lastra, University of North Carolina at Chapel Hill
Engine-per-pixel (Pixel Planes 4)
Individual direct-evaluation engine at every pixel!
Solves edge equations to determine inclusion
Solves attribute equations to determine values
Setup involves computation of plane and edge slopes
Execution is constant-time
Clever evaluation tree makes this possible
Extremely fast for large triangles, but
Extremely inefficient for small triangles
Effectively generates a fragment for every pixel
Scissor culling is a non-issue
Pixel Planes 4 fragment selection
Image courtesy of Anselmo Lastra, University of North Carolina at Chapel Hill
Pixel Planes 4 attribute evaluation
Image courtesy of Anselmo Lastra, University of North Carolina at Chapel Hill
Other approaches
Homogeneous recursive descent
Rasterizes unprojected, unclipped geometry
Used by NVIDIA GPUs
Read Olano and Greer, Graphics Hardware 1997
Scan-line rasterization
Keep sorted list of primitives per scanline
Generate image directly (no frame buffer)
Ray tracing
…
Bresenham lines
Developed by Jack Bresenham at IBM for pen plotters
Evolved over time, however
Like DDA, but with no division required for setup
In a sense the division is done bit-by-bit as the line is generated.
X- or Y-major iteration
Limitations:
Original version does not handle subpixel vertexes
Error term cannot be used for pre-filter AA
Defining property: one pixel per iteration step
Diagonal lines are ‘less bright’
DDA can be used to adjust this
Bresenham lines
[Figure: a Y-major line and an X-major line]
Symmetric (or nearly so)
Always one pixel per iteration along the major axis
Bresenham line pseudo-code
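// first octant, x-major (assumes dx > 0 and 0 <= dy <= dx)
int dx = x1-x0;
int dy = y1-y0;
int x = x0;
int y = y0;
int error = dx>>1; // division by 2
while (x <= x1) {
    DrawFragment(x,y);   // one fragment per step along the major (x) axis
    x += 1;
    error -= dy;
    if (error < 0) {     // crossed the midpoint: step the minor (y) axis
        y += 1;
        error += dx;
    }
}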
Revisit pre-filtered antialiased lines
[Figure: 1M x 8 lookup table. Address inputs: line slope (6 bits), Xscreen frac bits (4), Yscreen frac bits (4), pixel index (3 bits), line width (3 bits). Output: fragment alpha (8 bits)]
For x-major line
Line slope from DDA delta fraction bits
Xscreen frac bits needed for end-points only
Yscreen frac bits from DDA accumulation fraction bits
Summary
Screen coordinates are continuous, not pixel addresses
Rasterization converts primitives to fragments
Fragment selection: identify ‘covered’ pixels
Attribute evaluation: determine color, depth, …
Modern 3-D graphics systems …
Point sample for fragment selection and attribute evaluation
Decompose polygons and quads to triangles
Prefer direct evaluation over iteration
Prefer floating-point to fixed-point representations
Assignments
Before next Tuesday’s class, read
Paul Haeberli and Kurt Akeley, The Accumulation Buffer: Hardware Support for High-Quality Rendering, Proceedings of SIGGRAPH, pp. 309-318, 1990.
Kurt Akeley, RealityEngine Graphics, Proceedings of SIGGRAPH, pp. 109-116, 1993.
Optional: Marc Olano and Trey Greer, Triangle Scan Conversion Using 2D Homogeneous Coordinates, Proceedings of Graphics Hardware, pp. 89-96, 1997.
Project 1:
Demos tomorrow!
Sign up for a slot today