GDC’99
Subdivision Surfaces with the Pentium® III Processor
Mike Bargeron, Senior Software Developer, Intel
[email protected], (480) 552-3256
March 18, 1999
Agenda
Pros & Cons of Subdivision Surfaces
Content Creation
Run-time Subdivision
Pentium® III Processor Optimizations
Platform Performance Trend
[Figure: platform performance in MHz (axis 0 to 600) versus time from 1989 to 2000, rising from the current processor to the next generation]
Developer's Dilemma
High end for the best impression, or low end for more customers?
Scalable algorithms are the answer
Scalable Algorithms?
[Figure: end-user experience versus platform performance; with a scalable algorithm the experience keeps improving as performance rises ("Looks, sounds & feels better!"), unlike a fixed algorithm]
Subdivision Surfaces
Procedural detail enhancement
Closely related to B-spline surfaces, generalized to meshes of arbitrary topology
Piecewise smooth variant first used for "reverse engineering" (surface reconstruction)
Not a progressive mesh technology
"Pros" Checklist
+ Good for arbitrary curves and edges
+ Lower memory requirement (only the control mesh is stored)
+ Transform fewer points (transform the control mesh, then subdivide)
"Cons" Checklist
- 4x polygons per iteration
- Limited at ultra-low level of detail
- Increased artist overhead
- CPU-intensive
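To see how fast the 4x growth compounds, here is a minimal sketch that assumes a closed triangle mesh and the one-to-four split used later in this talk: every edge gains a vertex (V' = V + E), every edge splits in two and every triangle adds three interior edges (E' = 2E + 3F), and every triangle becomes four (F' = 4F). The MeshCounts type and counts_after function are hypothetical.

```c
/* Hypothetical element counts for a closed triangle mesh. */
typedef struct {
    long vertices;
    long edges;
    long triangles;
} MeshCounts;

/* Counts after `levels` rounds of 1-to-4 triangle subdivision. */
static MeshCounts counts_after(MeshCounts m, int levels) {
    for (int k = 0; k < levels; ++k) {
        MeshCounts next;
        next.vertices  = m.vertices + m.edges;           /* one new vertex per edge */
        next.edges     = 2 * m.edges + 3 * m.triangles;  /* split edges + 3 inner edges */
        next.triangles = 4 * m.triangles;                /* 4x polygons per iteration */
        m = next;
    }
    return m;
}
```

Starting from an icosahedron (12 vertices, 30 edges, 20 triangles), two iterations give 162 vertices, 480 edges, and 320 triangles; V - E + F stays 2 at every level.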
Subdivision Final Grade
Subdivision surfaces can be a great, scalable technology
Agenda
Pros & Cons of Subdivision Surfaces
Content Creation
Run-time Subdivision
Pentium® III Processor Optimizations
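Buzz Words
Attributes: smooth or crease tags on edges and vertices
Connectivity: which vertices, edges, and polygons are adjacent
Subdivide: split each triangle into four and reposition the vertices
Control Mesh: the coarse mesh the artist draws
Edge: the segment joining two vertices, shared by up to two polygons
Valence: the number of edges incident on a vertex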
Subdivision Surfaces Example
[Figure panels: control mesh; after one subdivision; after multiple subdivisions]
Content Creation
• Draw control mesh
• Generate connectivity
• Assign edge attributes (smooth or crease)
• Derive vertex attributes
[Figure: a triangle with vertices V1, V2, V3 and edges E1, E2, E3, with edges tagged smooth or crease]
Drawing the Control Mesh
[Figure: a hairline fracture in the control mesh (vertices that coincide but are not shared) opens into a visible crack after subdivision]
Design only continuous meshes
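A hairline fracture can be caught before export. The sketch below assumes a consistently wound, indexed triangle list (a hypothetical input format): in a continuous, closed mesh every directed edge (a, b) is matched by the reverse edge (b, a) in the neighboring triangle, so any unmatched edge marks a crack or hole. The function name is hypothetical and the quadratic search favors clarity over speed.

```c
/* Hypothetical check: counts directed triangle edges with no reverse partner.
   tris holds 3 vertex indices per triangle, wound consistently.
   A result of 0 means the mesh is closed, with no hairline fractures. */
static int count_open_edges(const int (*tris)[3], int tri_count) {
    int open = 0;
    for (int t = 0; t < tri_count; ++t) {
        for (int k = 0; k < 3; ++k) {
            int a = tris[t][k], b = tris[t][(k + 1) % 3];
            int matched = 0;
            /* Look for the same edge traversed in the opposite direction. */
            for (int u = 0; u < tri_count && !matched; ++u) {
                for (int j = 0; j < 3; ++j) {
                    if (tris[u][j] == b && tris[u][(j + 1) % 3] == a) {
                        matched = 1;
                        break;
                    }
                }
            }
            if (!matched) ++open;
        }
    }
    return open;
}
```

A tetrahedron wound as {0,2,1}, {0,1,3}, {1,2,3}, {0,3,2} reports 0. Replacing vertex 3 in the last triangle with a coincident copy, vertex 4, reports 4 open edges: exactly the fracture that subdivision would pull apart.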
Edge Attributes
Attribute    At limit of subdivision
Smooth       Continuous surface normals
Crease       Discontinuous surface normals
[Figure: mesh edges tagged smooth and crease]
Vertex Attributes
Vertex attribute   # incident crease edges
Smooth             0
Dart               1
Crease             2 (regular or irregular)
Corner             3 or more
Conical            Special case
[Figure: vertices classified by their incident crease edges]
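Deriving vertex attributes is mechanical once the artist has tagged the edges. A minimal sketch, assuming a hypothetical TaggedEdge record with a crease flag: count the crease edges incident on each vertex and map 0, 1, 2, and 3 or more to smooth, dart, crease, and corner. Conical vertices and the regular/irregular crease distinction are not handled here.

```c
/* Hypothetical vertex classes derived from incident crease edges. */
typedef enum { VERTEX_SMOOTH, VERTEX_DART, VERTEX_CREASE, VERTEX_CORNER } VertexType;

/* Hypothetical edge record: endpoints plus the artist-assigned crease flag. */
typedef struct { int a, b; int is_crease; } TaggedEdge;

static VertexType classify_by_creases(int n) {
    if (n == 0) return VERTEX_SMOOTH;
    if (n == 1) return VERTEX_DART;
    if (n == 2) return VERTEX_CREASE;
    return VERTEX_CORNER;                  /* 3 or more crease edges */
}

/* Derives a type for every vertex by counting incident crease edges. */
static void derive_vertex_types(const TaggedEdge *edges, int edge_count,
                                int vertex_count, int *crease_count,
                                VertexType *types) {
    for (int v = 0; v < vertex_count; ++v) crease_count[v] = 0;
    for (int e = 0; e < edge_count; ++e) {
        if (!edges[e].is_crease) continue;
        crease_count[edges[e].a]++;
        crease_count[edges[e].b]++;
    }
    for (int v = 0; v < vertex_count; ++v)
        types[v] = classify_by_creases(crease_count[v]);
}
```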
Too Much Information
Edge Connectivity
Index  Type  VertexA  VertexB  PolygonA  PolygonB
0      0     0        1        1         2
Vertex Connectivity
Index  Valence  Neighbors   Polygons    Edges
0      4        1, 3, 6, 7  2, 3, 5, 6  0, 1, 10, 11
Polygon Connectivity
Index  EdgeA  EdgeB  EdgeC
0      4      6      10
Necessary Information
Output a "winged-edge" data structure to reduce overhead
[Figure: edge AB with the opposite vertices C and D of its two adjacent triangles]
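For the run-time rules, each edge needs its endpoints A and B and the opposite vertices C and D of its two triangles, as in the figure. A hypothetical sketch of building such winged-edge-style records from an indexed triangle list follows; the record layout, the -1 sentinel for a missing triangle, and the linear search are assumptions, not the talk's exporter.

```c
/* Hypothetical winged-edge record: endpoints A and B, plus the opposite
   vertices C and D of the two triangles sharing the edge (-1 if absent). */
typedef struct {
    int a, b;
    int c, d;
} WingedEdge;

/* Builds one record per undirected edge. tri_edges[t][k] receives the index
   of the edge from corner k to corner k+1 of triangle t.
   Returns the number of edges written (at most 3 * tri_count). */
static int build_winged_edges(const int (*tris)[3], int tri_count,
                              WingedEdge *edges, int (*tri_edges)[3]) {
    int count = 0;
    for (int t = 0; t < tri_count; ++t) {
        for (int k = 0; k < 3; ++k) {
            int u = tris[t][k];
            int v = tris[t][(k + 1) % 3];
            int opposite = tris[t][(k + 2) % 3];
            int found = -1;
            /* Linear search for an existing record with the same endpoints. */
            for (int e = 0; e < count; ++e) {
                if ((edges[e].a == u && edges[e].b == v) ||
                    (edges[e].a == v && edges[e].b == u)) {
                    found = e;
                    break;
                }
            }
            if (found < 0) {
                edges[count] = (WingedEdge){u, v, opposite, -1};
                found = count++;
            } else {
                edges[found].d = opposite;  /* second triangle on this edge */
            }
            tri_edges[t][k] = found;
        }
    }
    return count;
}
```

For two triangles {0,1,2} and {0,2,3} it produces five edges; the shared edge (2, 0) gets C = 1 and D = 3, and tri_edges maps each triangle corner to its edge for later mesh generation.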
Agenda
Pros & Cons of Subdivision Surfaces
Content Creation
Run-time Subdivision
Pentium® III Processor Optimizations
Run-time Subdivision
• Bisect edges
• Reposition original vertices
• Generate new mesh
Bisecting an Edge
Coefficients {a, b, c, d} for edge subdivision:
• AB is smooth, or A or B is a dart: {3, 3, 1, 1}
• AB is a crease, one endpoint is a regular crease vertex and the other is an irregular crease or corner vertex: {5, 3, 0, 0}, with the 5 on the regular crease vertex
• Otherwise: {1, 1, 0, 0}
$V_{new} = \frac{aA + bB + cC + dD}{a + b + c + d}$; for a smooth edge, $V_{new} = \frac{3A + 3B + C + D}{8}$
[Figure: edge AB with opposite vertices C and D]
Weighted average of edge neighbors
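The table translates directly into code. A minimal sketch, assuming hypothetical Vec3 and VClass types supplied by the caller: edge_weights picks {a, b, c, d} from the edge and endpoint attributes, placing the 5 on the regular crease vertex (our reading of the table), and bisect_edge forms the normalized weighted average.

```c
/* Hypothetical 3-component position. */
typedef struct { float x, y, z; } Vec3;

/* Hypothetical vertex classes used by the edge rule. */
typedef enum { V_SMOOTH, V_DART, V_REGULAR_CREASE, V_IRREGULAR_CREASE, V_CORNER } VClass;

/* Picks {a, b, c, d} for edge AB following the coefficient table.
   Assumption: the 5 goes on the regular crease endpoint. */
static void edge_weights(int edge_is_crease, VClass A, VClass B, int w[4]) {
    int smooth_mask = !edge_is_crease || A == V_DART || B == V_DART;
    int a = 1, b = 1, c = 0, d = 0;                 /* default {1,1,0,0} */
    if (smooth_mask) {
        a = 3; b = 3; c = 1; d = 1;                 /* {3,3,1,1} */
    } else if (A == V_REGULAR_CREASE && (B == V_IRREGULAR_CREASE || B == V_CORNER)) {
        a = 5; b = 3;                               /* {5,3,0,0} */
    } else if (B == V_REGULAR_CREASE && (A == V_IRREGULAR_CREASE || A == V_CORNER)) {
        a = 3; b = 5;                               /* same rule, endpoints swapped */
    }
    w[0] = a; w[1] = b; w[2] = c; w[3] = d;
}

/* Vnew = (aA + bB + cC + dD) / (a + b + c + d) */
static Vec3 bisect_edge(Vec3 A, Vec3 B, Vec3 C, Vec3 D, const int w[4]) {
    float s = (float)(w[0] + w[1] + w[2] + w[3]);
    Vec3 r;
    r.x = (w[0] * A.x + w[1] * B.x + w[2] * C.x + w[3] * D.x) / s;
    r.y = (w[0] * A.y + w[1] * B.y + w[2] * C.y + w[3] * D.y) / s;
    r.z = (w[0] * A.z + w[1] * B.z + w[2] * C.z + w[3] * D.z) / s;
    return r;
}
```

With A = (0,0,0), B = (8,0,0), C = (4,8,0), D = (4,0,0), a smooth edge yields (4,1,0), while a crease edge from a regular crease vertex A to a corner B yields (3,0,0).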
Repositioning a Vertex
4 simple steps:
• Compute c0 for the vertex you are repositioning
• Calculate ci for each neighbor
• Multiply each vertex by its coefficient
• Add the results and divide by the sum of the ci's
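The four steps reduce to one normalized weighted sum. A minimal sketch, assuming the caller has already gathered v0's neighbor positions and their coefficients; the Vec3 type and reposition_vertex name are hypothetical.

```c
/* Hypothetical 3-component position. */
typedef struct { float x, y, z; } Vec3;

/* Repositions v0 as sum(c_i * v_i) / sum(c_i), i = 0..n, where c0 weights
   the vertex itself and ci[i] weights neighbor nbr[i]. */
static Vec3 reposition_vertex(Vec3 v0, float c0,
                              const Vec3 *nbr, const float *ci, int n) {
    /* Steps 1 and 3 for the vertex itself. */
    Vec3 acc = { c0 * v0.x, c0 * v0.y, c0 * v0.z };
    float sum = c0;
    /* Steps 2 and 3: weight each neighbor and accumulate. */
    for (int i = 0; i < n; ++i) {
        acc.x += ci[i] * nbr[i].x;
        acc.y += ci[i] * nbr[i].y;
        acc.z += ci[i] * nbr[i].z;
        sum += ci[i];
    }
    /* Step 4: divide by the sum of the coefficients. */
    Vec3 r = { acc.x / sum, acc.y / sum, acc.z / sum };
    return r;
}
```

For a crease vertex (c0 = 6) at (8,0,0) whose crease neighbors are (0,0,0) and (16,8,0), with a third, non-crease neighbor weighted 0, the result is (8,1,0).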
Repositioning a Vertex
$v'_0 = \frac{\sum_{i=0}^{n} c_i v_i}{\sum_{i=0}^{n} c_i}$
Sigma coefficients for vertex repositioning:
Vertex type      c0              ci
smooth or dart   n / a(n) - n    1
crease           6               δi (1 if vi shares a crease edge with v0, else 0)
corner           1               0
conical          b(n)            1
Coefficient constants:
$a(n) = \frac{5}{8} - \frac{(3 + 2\cos(2\pi/n))^2}{64}$
$b(n) = \frac{3 + 2\cos(2\pi/n)}{8}$
[Figure: vertex v0 with neighbors v1 through v5]
Repositioning a Vertex
Weighted average of vertex neighbors (example: smooth vertex with valence n = 5)
$a(5) = \frac{5}{8} - \frac{(3 + 2\cos(2\pi/5))^2}{64} \approx 0.4205$
$c_0 = \frac{5}{a(5)} - 5 \approx \frac{5}{0.4205} - 5 \approx 6.892$
$c_1 = c_2 = c_3 = c_4 = c_5 = 1$
$v'_0 = \frac{6.892\,v_0 + v_1 + v_2 + v_3 + v_4 + v_5}{6.892 + 1 + 1 + 1 + 1 + 1}$
Generating a New Mesh
[Figure: triangles 1 and 2 share edge AB (opposite vertices C and D); each is split into sub-triangles labeled A1 to F1 and A2 to F2]
Preserve polygon winding rules
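Winding is preserved by listing every child triangle's vertices in the same cyclic order as its parent's. A hypothetical sketch, assuming per-triangle edge indices (such as a winged-edge builder would produce) and an output-buffer layout in which the bisected vertex of edge j is stored at index Vcount + j:

```c
/* Hypothetical helper: splits each triangle (v0, v1, v2) into four,
   preserving its winding. tri_edges[t][k] is the index of the edge from
   corner k to corner k+1; the bisected vertex of edge j is assumed to live
   at index vcount + j. out must hold 4 * tri_count triangles. */
static void split_triangles(const int (*tris)[3], const int (*tri_edges)[3],
                            int tri_count, int vcount, int (*out)[3]) {
    for (int t = 0; t < tri_count; ++t) {
        int v0 = tris[t][0], v1 = tris[t][1], v2 = tris[t][2];
        int m01 = vcount + tri_edges[t][0];  /* new vertex on edge v0-v1 */
        int m12 = vcount + tri_edges[t][1];  /* new vertex on edge v1-v2 */
        int m20 = vcount + tri_edges[t][2];  /* new vertex on edge v2-v0 */
        int (*o)[3] = out + 4 * t;
        /* Three corner triangles, each in the parent's cyclic order. */
        o[0][0] = v0;  o[0][1] = m01; o[0][2] = m20;
        o[1][0] = m01; o[1][1] = v1;  o[1][2] = m12;
        o[2][0] = m20; o[2][1] = m12; o[2][2] = v2;
        /* Center triangle, same orientation as the parent. */
        o[3][0] = m01; o[3][1] = m12; o[3][2] = m20;
    }
}
```

For a single triangle {0,1,2} whose edges are 0, 1, 2 and Vcount = 3, the children are {0,3,5}, {3,1,4}, {5,4,2}, and {3,4,5}, all wound like the parent.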
Output Buffer (Reposition)
Input vertices: v0 v1 v2 ... v(Vcount-1)
Repositioning function
Output vertices: v'0 v'1 v'2 ... v'(Vcount-1)
Output Buffer (Bisection)
Input edges: e0 e1 e2 ... e(Ecount-1)
Subdivision function
Output buffer: v'0 ... v'(Vcount-1) (repositioned vertices), then v'Vcount ... v'(Vcount+Ecount-1) (bisected vertices; edge ej produces v'(Vcount+j))
The original vertex count marks the boundary between the two regions.
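A bisection pass that fills the tail of the output buffer might look like the sketch below. It assumes hypothetical Vec3 and EdgeRec types, applies only the smooth {3,3,1,1} rule to interior edges, and treats edges with a single triangle as midpoints (an assumption; crease handling is omitted). It reads the original positions, never the repositioned ones, so the two passes are independent.

```c
/* Hypothetical 3-component position. */
typedef struct { float x, y, z; } Vec3;

/* Hypothetical edge record: endpoints a, b and opposite vertices c, d
   (d == -1 when the edge has only one adjacent triangle). */
typedef struct { int a, b, c, d; } EdgeRec;

/* Writes one bisected vertex per edge into out[vcount + j].
   Reads only the ORIGINAL positions in `in`, so it does not depend on the
   repositioning pass that fills out[0 .. vcount-1]. */
static void bisect_all(const Vec3 *in, int vcount,
                       const EdgeRec *edges, int ecount, Vec3 *out) {
    for (int j = 0; j < ecount; ++j) {
        const EdgeRec *e = &edges[j];
        Vec3 A = in[e->a], B = in[e->b];
        Vec3 r;
        if (e->c >= 0 && e->d >= 0) {
            /* Interior edge: smooth mask (3A + 3B + C + D) / 8. */
            Vec3 C = in[e->c], D = in[e->d];
            r.x = (3 * A.x + 3 * B.x + C.x + D.x) / 8;
            r.y = (3 * A.y + 3 * B.y + C.y + D.y) / 8;
            r.z = (3 * A.z + 3 * B.z + C.z + D.z) / 8;
        } else {
            /* Open edge (assumption): plain midpoint, like {1,1,0,0}. */
            r.x = (A.x + B.x) / 2;
            r.y = (A.y + B.y) / 2;
            r.z = (A.z + B.z) / 2;
        }
        out[vcount + j] = r;
    }
}
```

For the square (0,0,0), (8,0,0), (8,8,0), (0,8,0) split along its diagonal, the diagonal's new vertex lands at out[Vcount + 2] = (4,4,0) and the boundary edges get their midpoints.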
Agenda
Pros & Cons of Subdivision Surfaces
Content Creation
Run-time Subdivision
Pentium® III Processor Optimizations
Cache Line Split Stalls
[Figure: packed X Y Z data (12 bytes per vertex) straddles 32-byte boundaries, causing cache line splits; padded X Y Z A data (16 bytes per vertex) never splits]
Many wasted CPU clocks!
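The cost of a given vertex size can be estimated by counting elements that cross a 32-byte line. A minimal sketch, assuming the array starts on a 32-byte boundary; count_line_splits is a hypothetical name.

```c
/* Hypothetical helper: counts how many of `count` elements of `stride`
   bytes, packed from a 32-byte-aligned base, straddle a 32-byte cache
   line boundary. */
static int count_line_splits(int stride, int count) {
    const int line = 32;  /* cache line size in bytes */
    int splits = 0;
    for (int i = 0; i < count; ++i) {
        int first = i * stride;          /* first byte of element i */
        int last = first + stride - 1;   /* last byte of element i */
        if (first / line != last / line) ++splits;
    }
    return splits;
}
```

Packed X Y Z data (12 bytes) splits 2 of every 8 vertices, padded X Y Z A data (16 bytes) never splits, and the 24-byte x, y, z, nx, ny, nz structure shown next splits 100 of 200 vertices.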
Conventional Data Structures
"Array of Structures" (AoS)
– Ordinary interleaved data elements
– Data typically not 32-byte aligned
– Risk of cache line split stalls
struct {
    float x, y, z, nx, ny, nz;
} AoS_xyz_abc[200];
Better Order for Subdivision?
Structure of Arrays (SoA, planar)
– Data in good order for SIMD
– Cache only data that gets used
– Avoids cache line splits
struct {
    float x[200], y[200], z[200];
    float nx[200], ny[200], nz[200];
} SoA_xyz_abc;
Ideal Order for Subdivision?
Hybrid SoA (array of SoA blocks)
– Easy to access memory sequentially
– Address with one register plus offsets
– Avoid multiple-page issues
struct {
    float x_nx[8], y_ny[8], z_nz[8];
} Hybrid[50];
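The hybrid order mirrors the Hybrid[50] declaration above; a sketch of indexing into it follows. The HybridBlock typedef, the C11 _Alignas(32) qualifier, and the accessor names are assumptions; x and nx are interleaved as in the next slide, so vertex i's (x, nx) pair sits at block i / 4, slot 2 * (i % 4).

```c
/* Hypothetical typedef for one block of the hybrid layout: four vertices,
   with x and nx interleaved as X0 NX0 X1 NX1 X2 NX2 X3 NX3 (likewise for
   y/ny and z/nz). Aligned so each 32-byte member fills one cache line. */
typedef struct {
    _Alignas(32) float x_nx[8];
    float y_ny[8];
    float z_nz[8];
} HybridBlock;

static HybridBlock Hybrid[50];  /* 50 blocks * 4 vertices = 200 vertices */

/* Stores position and normal of vertex i. */
static void hybrid_store(int i, float x, float y, float z,
                         float nx, float ny, float nz) {
    HybridBlock *blk = &Hybrid[i / 4];
    int slot = 2 * (i % 4);
    blk->x_nx[slot] = x;  blk->x_nx[slot + 1] = nx;
    blk->y_ny[slot] = y;  blk->y_ny[slot + 1] = ny;
    blk->z_nz[slot] = z;  blk->z_nz[slot + 1] = nz;
}

/* Returns a pointer to the (x, nx) pair of vertex i, e.g. for one movlps. */
static const float *hybrid_x_nx(int i) {
    return &Hybrid[i / 4].x_nx[2 * (i % 4)];
}
```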
Loading Vertex Data
Array-of-Structures: X0 Y0 Z0 NX0 NY0 NZ0 X1 Y1 ...
Structure-of-Arrays: X0 NX0 X1 NX1 ... / Y0 NY0 Y1 NY1 ... / Z0 NZ0 Z1 NZ1 ...
Hybrid Structure: X0 NX0 X1 NX1 ... Y0 NY0 Y1 NY1 ... Z0 NZ0 ...
Register contents (lanes listed low to high):
xmm0 = X0 NX0 X1 NX1
xmm1 = X7 NX7 X3 NX3
xmm2 = X2 NX2 X8 NX8
Exploiting the Streaming SIMD Extensions™
(lanes listed low to high)
movlps + movhps: {X0, NX0, X1, NX1} (vertices 0 and 1)
movlps + movhps: {c0(0), c0(0), c0(1), c0(1)}
movlps + movhps: {X7, NX7, X3, NX3} (neighbor 7 of vertex 0, neighbor 3 of vertex 1)
movlps + movhps: {ci7, ci7, ci3, ci3}
mulps: {CX0, CNX0, CX1, CNX1}
mulps: {CX7, CNX7, CX3, CNX3}
Exploiting the Streaming SIMD Extensions™
addps: {CX0, CNX0, CX1, CNX1} + {CX7, CNX7, CX3, CNX3} (accumulate weighted neighbors)
addps: {c0(0), c0(0), c0(1), c0(1)} + {ci7, ci7, ci3, ci3} (accumulate coefficient sums C0, C1)
divps: {X0, NX0, X1, NX1} (accumulated sums) / {C0, C0, C1, C1} = {X0', NX0', X1', NX1'}
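The same data flow can be written with SSE intrinsics in C. This is a hedged sketch, assuming an x86 compiler with <xmmintrin.h> and that both vertices have the same number of neighbors; reposition_two and load_two_pairs are hypothetical names. _mm_loadl_pi and _mm_loadh_pi compile to movlps and movhps, and the loop performs the mulps/addps accumulation before a final divps.

```c
#include <xmmintrin.h>

/* Hypothetical helper: loads two (x, nx) float pairs into one register,
   as movlps/movhps do: lanes 0-1 = p[0], p[1]; lanes 2-3 = q[0], q[1]. */
static __m128 load_two_pairs(const float *p, const float *q) {
    __m128 r = _mm_setzero_ps();
    r = _mm_loadl_pi(r, (const __m64 *)p);   /* movlps */
    r = _mm_loadh_pi(r, (const __m64 *)q);   /* movhps */
    return r;
}

/* Hypothetical helper: repositions the (x, nx) pair of vertices a and b at
   once. Assumes both have n neighbors; nbr_a[i] and nbr_b[i] point at the
   neighbors' (x, nx) pairs, ci_a[i] and ci_b[i] are their coefficients.
   out receives {x_a', nx_a', x_b', nx_b'}. */
static void reposition_two(const float *v_a, const float *v_b,
                           float c0_a, float c0_b,
                           const float *const *nbr_a, const float *ci_a,
                           const float *const *nbr_b, const float *ci_b,
                           int n, float out[4]) {
    __m128 sums = _mm_setr_ps(c0_a, c0_a, c0_b, c0_b);
    __m128 acc = _mm_mul_ps(load_two_pairs(v_a, v_b), sums);        /* mulps */
    for (int i = 0; i < n; ++i) {
        __m128 w = _mm_setr_ps(ci_a[i], ci_a[i], ci_b[i], ci_b[i]);
        __m128 nb = load_two_pairs(nbr_a[i], nbr_b[i]);
        acc = _mm_add_ps(acc, _mm_mul_ps(nb, w));                   /* mulps, addps */
        sums = _mm_add_ps(sums, w);                                 /* addps */
    }
    _mm_storeu_ps(out, _mm_div_ps(acc, sums));                      /* divps */
}
```

With vertex a at (x 8, nx 1), c0 = 6, and two neighbors (0, 0) and (16, 2) weighted 1, plus a corner vertex b at (0, 0) with c0 = 1 and neighbor weights 0, the output is {8, 1, 0, 0}.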
Approximation Instructions
Instruction   Latency (clocks)   Pipelined?
sqrtps        ~35                No
divps         ~35                No
rsqrtps       2                  Yes
rcpps         2                  Yes
rcpps and rsqrtps give 11 bits of precision; refine with Newton-Raphson:
– rcp_a = 2 * rcp(a) - a * rcp(a)^2
– rsqrt_a = 0.5 * rsqrt(a) * (3 - a * rsqrt(a)^2)
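A sketch of the refinement with SSE intrinsics, assuming an x86 compiler with <xmmintrin.h>; rcp_nr, rsqrt_nr, and div_fast are hypothetical names. Each applies one Newton-Raphson step to the rcpps or rsqrtps estimate, and div_fast shows the divps replacement from the next slide: multiply by the refined reciprocal.

```c
#include <xmmintrin.h>

/* Hypothetical helper: rcpps estimate refined by one Newton-Raphson step,
   r' = 2r - a*r^2. */
static __m128 rcp_nr(__m128 a) {
    __m128 r = _mm_rcp_ps(a);
    return _mm_sub_ps(_mm_add_ps(r, r), _mm_mul_ps(a, _mm_mul_ps(r, r)));
}

/* Hypothetical helper: rsqrtps estimate refined by one step,
   r' = 0.5 * r * (3 - a*r^2). */
static __m128 rsqrt_nr(__m128 a) {
    __m128 r = _mm_rsqrt_ps(a);
    __m128 ar2 = _mm_mul_ps(a, _mm_mul_ps(r, r));
    return _mm_mul_ps(_mm_mul_ps(_mm_set1_ps(0.5f), r),
                      _mm_sub_ps(_mm_set1_ps(3.0f), ar2));
}

/* Hypothetical divps replacement: sums / coeffs as sums * (1 / coeffs). */
static __m128 div_fast(__m128 sums, __m128 coeffs) {
    return _mm_mul_ps(sums, rcp_nr(coeffs));
}
```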
Approximation Instructions
rcpps: {C0, C0, C1, C1} → {1/C0, 1/C0, 1/C1, 1/C1}
mulps: {X0, NX0, X1, NX1} × {1/C0, 1/C0, 1/C1, 1/C1} = {X0', NX0', X1', NX1'}
Optimizations
Some subdivision can be done in real time on the Pentium® III processor
In Summary
Great scalable technology
Continuous, winged-edge meshes
Weighted averages of neighbors
Optimize for the Pentium® III processor for real-time subdivision
References
Piecewise Smooth Surface Reconstruction, Hugues Hoppe, Tony DeRose, Tom Duchamp, Hubert Jin, Mark Halstead, John McDonald, Jean Schweitzer, Werner Stuetzle, pp. 295-302 in Siggraph 94 Conference Proceedings, July 1994.
Edge-Based Data Structures for Solid Modeling in Curved-Surface Environments, K. Weiler, IEEE Computer Graphics and Applications, Vol. 5, No. 1, January 1985.
Fast Rendering of Subdivision Surfaces, K. Pulli and M. Segal, Siggraph 96 Conference Proceedings, 1996.
Smooth Subdivision Surfaces Based on Triangles, C. Loop, University of Utah Master's Thesis, 1987.