Subdivision Surfaces with the Pentium® III Processor
Mike Bargeron, Senior Software Developer, Intel Corporation
[email protected], (480) 552-3256
GDC’99, March 18, 1999


Agenda

• Pros & Cons of Subdivision Surfaces
• Content Creation
• Run-time Subdivision
• Pentium® III Processor Optimizations

Platform Performance Trend

[Chart: platform performance (MHz, axis from 0 to 600 and beyond) over time, 1989 to 2000, marking the current processor and the next generation.]

Developer's Dilemma

• High end for best impression
• Low end for more customers

Scalable algorithms are the answer

Scalable Algorithms?

[Chart: end-user experience versus platform performance for a fixed algorithm and a scalable algorithm. Annotation: "Looks, sounds & feels better!"]

Subdivision Surfaces

• Procedural detail-enhancement
• Subset of Bezier surfaces (B-Splines)
• First used for "reverse engineering"
• Not a progressive mesh technology

"Pros" Checklist

+ Good for arbitrary curves and edges
+ Lower memory requirement
+ Transform fewer points

"Cons" Checklist

- 4x polygons per iteration
- Limited at ultra-low level of detail
- Increased artist overhead
- CPU-intensive

Subdivision Final Grade

Subdivision surfaces can be a great, scalable technology

Agenda

• Pros & Cons of Subdivision Surfaces
• Content Creation
• Run-time Subdivision
• Pentium® III Processor Optimizations

Buzz Words

• Attributes
• Connectivity
• Subdivide
• Control Mesh
• Edge
• Valence

Subdivision Surfaces Example

[Figure: the control mesh, the mesh after one subdivision, and the mesh after multiple subdivisions.]

Content Creation

• Draw control mesh
• Assign edge attributes (smooth or crease)
• Derive vertex attributes
• Generate connectivity

[Figure: a control mesh with edges labeled smooth or crease, vertices labeled V1 to V3, and edges labeled E1 to E3.]

Drawing the Control Mesh

[Figure: a mesh before and after subdivision, with a hairline fracture marked.]

Design only continuous meshes

Edge Attributes

Attribute   At limit of subdivision
Smooth      Continuous surface normals
Crease      Discontinuous surface normals

[Figure: edges labeled smooth and crease.]

Vertex Attributes

Vertex Attribute   # Incident crease edges
Smooth             0
Dart               1
Crease             2 (reg. or irreg.)
Corner             3
Conical            Special case

[Figure: a vertex with incident edges labeled crease.]
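The table above maps a crease-edge count to a vertex attribute, which can be sketched in C as follows. The enum and function names are hypothetical, and treating counts above 3 as corners is an assumption rather than something the slide states. The conical case is never returned because it cannot be derived from a count.

```c
/* Vertex attributes from the table. VERTEX_CONICAL is a special case that
   must be flagged some other way; it is never returned below. */
typedef enum {
    VERTEX_SMOOTH,
    VERTEX_DART,
    VERTEX_CREASE,
    VERTEX_CORNER,
    VERTEX_CONICAL
} VertexAttr;

/* Derive a vertex attribute from its number of incident crease edges.
   Assumption: any count above 3 is also classified as a corner. */
VertexAttr classify_vertex(unsigned crease_edges)
{
    switch (crease_edges) {
    case 0:  return VERTEX_SMOOTH;
    case 1:  return VERTEX_DART;
    case 2:  return VERTEX_CREASE;
    default: return VERTEX_CORNER;
    }
}
```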

Too Much Information

Edge Connectivity
Index   Type   VertexA   VertexB   PolygonA   PolygonB
0       0      0         1         1          2

Vertex Connectivity
Index   Valence   Neighbors    Polygons     Edges
0       4         1, 3, 6, 7   2, 3, 5, 6   0, 1, 10, 11

Polygon Connectivity
Index   EdgeA   EdgeB   EdgeC
0       4       6       10

Necessary Information

Output a "Winged-Edge" data structure to reduce overhead

[Figure: an edge and its neighboring vertices, labeled A, B, C and D.]
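A reduced per-edge record along the lines of the figure might look like the C sketch below. It assumes A and B are the edge endpoints and C and D the vertices opposite the edge in the two adjacent triangles. All names, and the NO_VERTEX sentinel for a missing wing, are hypothetical. A full winged-edge structure stores more than this (neighboring edges and faces).

```c
#include <stdint.h>

#define NO_VERTEX UINT32_MAX   /* hypothetical marker for a missing wing */

typedef enum { EDGE_SMOOTH = 0, EDGE_CREASE = 1 } EdgeType;

/* One record per edge AB. C and D are the vertices opposite the edge in the
   two triangles that share it. */
typedef struct {
    uint32_t a, b;    /* endpoint vertex indices */
    uint32_t c, d;    /* opposite vertex indices; d == NO_VERTEX if absent */
    EdgeType type;    /* smooth or crease */
} WingedEdge;

/* Fill in one edge record. */
WingedEdge make_winged_edge(uint32_t a, uint32_t b, uint32_t c, uint32_t d,
                            EdgeType type)
{
    WingedEdge e = { a, b, c, d, type };
    return e;
}

/* Nonzero when the edge has only one adjacent triangle. */
int edge_is_boundary(const WingedEdge *e)
{
    return e->d == NO_VERTEX;
}
```

With one such record per edge, the bisection step can read all four vertices it needs without consulting separate vertex and polygon tables.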

Agenda

• Pros & Cons of Subdivision Surfaces
• Content Creation
• Run-time Subdivision
• Pentium® III Processor Optimizations

Subdivision Surfaces Example

[Figure: the control mesh, the mesh after one subdivision, and the mesh after multiple subdivisions.]

Run-time Subdivision

• Bisect edges
• Reposition original vertices
• Generate new mesh
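The cost of one pass through these steps follows from simple counting on a triangle mesh: every edge contributes one new vertex and splits in two, and every triangle becomes four, which adds three interior edges. A small C sketch of that arithmetic, with a hypothetical struct and function name:

```c
typedef struct {
    unsigned long vertices;
    unsigned long edges;
    unsigned long faces;
} MeshCounts;

/* Element counts after one subdivision pass over a triangle mesh. */
MeshCounts subdivide_counts(MeshCounts m)
{
    MeshCounts out;
    out.vertices = m.vertices + m.edges;       /* one new vertex per edge */
    out.edges    = 2 * m.edges + 3 * m.faces;  /* split edges + interior edges */
    out.faces    = 4 * m.faces;                /* each triangle becomes four */
    return out;
}
```

For a tetrahedron (4 vertices, 6 edges, 4 faces), one pass gives 10 vertices, 24 edges and 16 faces.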

Bisecting an edge

Weighted average of edge neighbors

Coefficients for edge subdivision

Condition                                                                    {a,b,c,d}
AB = smooth, or A or B = dart                                                {3,3,1,1}
AB = crease, with one of A, B a regular crease and the other irreg. or corner   {5,3,0,0}
Else                                                                         {1,1,0,0}

For the first row: $V_{new} = \frac{3A + 3B + C + D}{8}$

[Figure: edge AB with neighboring vertices C and D.]
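The coefficient table can be applied with one small routine, sketched below in C. The names are hypothetical. Dividing by the sum of the coefficients (8, 8 and 2 for the three rows) is an assumption generalized from the {3,3,1,1} formula. The slide also does not say which endpoint receives the weight 5 in the second row; this sketch gives it to A.

```c
typedef struct { float x, y, z; } Vec3;

/* Row of the coefficient table to apply. */
typedef enum {
    MASK_SMOOTH,        /* {3,3,1,1} */
    MASK_CREASE_MIXED,  /* {5,3,0,0}; assumption: A takes the 5 */
    MASK_MIDPOINT       /* {1,1,0,0} */
} EdgeMask;

/* New vertex on edge AB as a weighted average of A, B and the two
   opposite vertices C and D, normalized by the sum of the weights. */
Vec3 bisect_edge(Vec3 A, Vec3 B, Vec3 C, Vec3 D, EdgeMask mask)
{
    static const float coeff[3][4] = {
        { 3.0f, 3.0f, 1.0f, 1.0f },
        { 5.0f, 3.0f, 0.0f, 0.0f },
        { 1.0f, 1.0f, 0.0f, 0.0f },
    };
    const float *w = coeff[mask];
    float sum = w[0] + w[1] + w[2] + w[3];
    Vec3 v;
    v.x = (w[0] * A.x + w[1] * B.x + w[2] * C.x + w[3] * D.x) / sum;
    v.y = (w[0] * A.y + w[1] * B.y + w[2] * C.y + w[3] * D.y) / sum;
    v.z = (w[0] * A.z + w[1] * B.z + w[2] * C.z + w[3] * D.z) / sum;
    return v;
}
```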

Repositioning a vertex

4 Simple Steps:
• Compute c0 for the vertex you are repositioning
• Calculate ci for each neighbor
• Multiply each vertex by its coefficient
• Add results and divide by sum of ci's

Repositioning a vertex

$v'_0 = \sum_{i=0}^{n} c_i v_i \Big/ \sum_{i=0}^{n} c_i$

Sigma coefficients for vertex repositioning

Vertex Type      c0             ci
smooth or dart   n / a(n) - n   1
crease           6              oi
corner           1              0
conical          n / b(n) - n   1

Coefficient constants

$a(n) = \frac{5}{8} - \frac{(3 + 2\cos(2\pi/n))^2}{64}$
$b(n) = \frac{3 + 2\cos(2\pi/n)}{8}$

[Figure: vertex v0 surrounded by its neighbors v1 to v5.]

Repositioning a vertex

Weighted average of vertex neighbors

Example for a smooth or dart vertex with n = 5 neighbors:

$a(5) = \frac{5}{8} - \frac{(3 + 2\cos(2\pi/5))^2}{64} = 0.4205$
$c_0 = \frac{5}{a(5)} - 5 = \frac{5}{0.4205} - 5 = 6.892$
$c_1 = c_2 = c_3 = c_4 = c_5 = 1$
$v'_0 = \frac{6.892\,v_0 + v_1 + v_2 + v_3 + v_4 + v_5}{6.892 + 1 + 1 + 1 + 1 + 1}$

Generating a New Mesh

[Figure: triangles 1 and 2 sharing an edge, with vertices A, B, C and D; after subdivision their vertices are labeled A1 to F1 and A2 to F2.]

Preserve polygon winding rules
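One way to keep the winding consistent when a triangle is split into four is sketched below in C. The Triangle type and the midpoint parameter names are hypothetical, and the indices are assumed to refer to an already-filled vertex buffer. Each output triangle lists its vertices in the same rotational order as the parent.

```c
#include <stdint.h>

typedef struct { uint32_t v[3]; } Triangle;

/* Split triangle (A, B, C) into four, given the indices of the vertices
   that bisect edges AB, BC and CA. Each corner triangle starts at or
   rotates around the parent's corner in the parent's order, and the
   center triangle visits the midpoints in that same order, so all four
   keep the parent's winding. */
void split_triangle(Triangle t, uint32_t mAB, uint32_t mBC, uint32_t mCA,
                    Triangle out[4])
{
    uint32_t A = t.v[0], B = t.v[1], C = t.v[2];
    out[0] = (Triangle){ { A,   mAB, mCA } };  /* corner at A */
    out[1] = (Triangle){ { mAB, B,   mBC } };  /* corner at B */
    out[2] = (Triangle){ { mCA, mBC, C   } };  /* corner at C */
    out[3] = (Triangle){ { mAB, mBC, mCA } };  /* center */
}
```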

Output Buffer (Reposition)

Input:    v0   v1   v2   v3   v4   v5   v6   v7   v8   ...  vVcount
          (repositioning function)
Output:   v'0  v'1  v'2  v'3  v'4  v'5  v'6  v'7  v'8  ...  v'Vcount

Output Buffer (Bisection)

Input:    e0  e1  e2  e3  e4  e5  ...  eEcount-1
          (subdivision function)
Output:   v'0  v'1  ...  v'Vcount                            (repositioned vertices)
          v'Vcount+1  v'Vcount+2  ...  v'Vcount+Ecount       (bisected vertices)

The original boundary of the buffer lies between the two groups.
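The layout above means a new vertex index can be computed directly from an edge index. A minimal C sketch follows. It assumes zero-based indexing, so repositioned vertices occupy slots 0 to Vcount-1 and the vertex that bisects edge e lands at Vcount + e. The function names are hypothetical.

```c
#include <stddef.h>

/* Slot of repositioned vertex i: same index as in the input buffer. */
size_t repositioned_index(size_t i)
{
    return i;
}

/* Slot of the vertex that bisects edge e, appended after the
   repositioned vertices (zero-based indexing assumed). */
size_t bisected_index(size_t vcount, size_t e)
{
    return vcount + e;
}

/* Total number of vertices in the output buffer. */
size_t output_vertex_count(size_t vcount, size_t ecount)
{
    return vcount + ecount;
}
```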

Agenda

• Pros & Cons of Subdivision Surfaces
• Content Creation
• Run-time Subdivision
• Pentium® III Processor Optimizations

Cache Line Split Stalls

32-byte boundaries:
Cache line split:   ... X0 Y0 Z0 X1 Y1 Z1 X2 Y2 Z2 X3 ...
No split:           ... X0 Y0 Z0 A0 X1 Y1 Z1 A1 X2 Y2 ...

Many wasted CPU clocks!
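Whether a given access straddles a 32-byte boundary is a one-line check, sketched here in C with a hypothetical function name. With tightly packed 12-byte XYZ vertices, the third vertex starts at byte 24 and spills into the next line, while 16-byte padded vertices never straddle a boundary.

```c
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_BYTES 32u   /* line size shown on the slide */

/* Nonzero if an access of `size` bytes starting at `address`
   crosses a cache line boundary. */
int splits_cache_line(uintptr_t address, size_t size)
{
    return (address % CACHE_LINE_BYTES) + size > CACHE_LINE_BYTES;
}
```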

Conventional Data Structures

"Array of Structures" - AoS
– Ordinary interleaved data elements
– Data typically not 32B aligned
– Risk of cache line split stalls

struct {
    float x, y, z, nx, ny, nz;
} AoS_xyz_abc[200];

Better Order for Subdivision?

Structure of Arrays - SoA (Planar)
– Data in good order for SIMD
– Cache only data that gets used
– Avoids cache line splits

struct {
    float x[200], y[200], z[200];
    float nx[200], ny[200], nz[200];
} SoA_xyz_abc;

Ideal Order for Subdivision?

Hybrid SoA - Array of SoA order
– Easy to access memory sequentially
– Address with one register plus offsets
– Avoid multiple page issues

struct {
    float x_nx[8], y_ny[8], z_nz[8];
} Hybrid[50];
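Converting conventional vertex data into the hybrid order might look like the C sketch below. It assumes each 8-float array holds four vertices with the position component and the matching normal component interleaved (X0, NX0, X1, NX1, ...), which is one reading of the x_nx naming. The type and function names are hypothetical.

```c
#include <stddef.h>

typedef struct { float x, y, z, nx, ny, nz; } VertexAoS;

/* Four vertices per block. Assumed layout of each array:
   x_nx = { X0, NX0, X1, NX1, X2, NX2, X3, NX3 }, likewise y_ny and z_nz. */
typedef struct { float x_nx[8], y_ny[8], z_nz[8]; } HybridBlock;

/* Copy `count` AoS vertices into hybrid blocks. `out` must hold at least
   (count + 3) / 4 blocks; unused slots in the last block are left as-is. */
void aos_to_hybrid(const VertexAoS *in, size_t count, HybridBlock *out)
{
    for (size_t i = 0; i < count; ++i) {
        HybridBlock *b = &out[i / 4];
        size_t k = 2 * (i % 4);        /* slot of the position component */
        b->x_nx[k] = in[i].x;  b->x_nx[k + 1] = in[i].nx;
        b->y_ny[k] = in[i].y;  b->y_ny[k + 1] = in[i].ny;
        b->z_nz[k] = in[i].z;  b->z_nz[k + 1] = in[i].nz;
    }
}
```

With this layout a single 8-byte load fetches a position component together with its normal component for one vertex, which is the pairing the SIMD slides rely on.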

Loading Vertex Data

Array-of-Structures:   X0 Y0 Z0 NX0 NY0 NZ0 X1 Y1 ...
Structure-of-Arrays:   X0 NX0 X1 NX1 ...
                       Y0 NY0 Y1 NY1 ...
                       Z0 NZ0 Z1 NZ1 ...
Hybrid Structure:      X0 NX0 X1 NX1 ... Y0 NY0 Y1 NY1 ... Z0 NZ0 ...

Registers after loading (each holds X and NX for two vertices):
xmm0:   X0 NX0 X1 NX1
xmm1:   X7 NX7 X3 NX3
xmm2:   X2 NX2 X8 NX8

Exploiting the Streaming SIMD Extensions™

Load (movlps fills the low half, movhps fills the high half):
xmm0:   movlps NX0 X0          movhps NX1 X1
xmm1:   movlps c0(0) c0(0)     movhps c0(1) c0(1)
xmm2:   movlps NX7 X7          movhps NX3 X3
xmm3:   movlps ci7 ci7         movhps ci3 ci3

mulps (vertex data × c0):     CNX0 CX0 CNX1 CX1
mulps (neighbor data × ci):   CNX7 CX7 CNX3 CX3

Exploiting the Streaming SIMD Extensions™

addps (coefficients):   c0(0) c0(0) c0(1) c0(1)  +  ci7 ci7 ci3 ci3      ->  C0 C0 C1 C1
addps (products):       CNX0 CX0 CNX1 CX1  +  CNX7 CX7 CNX3 CX3          ->  NX0 X0 NX1 X1
divps:                  NX0 X0 NX1 X1  /  C0 C0 C1 C1                    ->  NX0' X0' NX1' X1'
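The load, multiply, add and divide sequence from these two slides can be written with compiler intrinsics, as in the C sketch below. `_mm_loadl_pi` and `_mm_loadh_pi` compile to movlps and movhps, and `_mm_mul_ps`, `_mm_add_ps` and `_mm_div_ps` to mulps, addps and divps. The function name and parameter layout are hypothetical, and only one neighbor per vertex is accumulated to keep the example short.

```c
#include <xmmintrin.h>

/* Weighted average for two vertices at once. Each pointer addresses an
   {X, NX} pair of floats. Vertex "a" uses the low half of each register,
   vertex "b" the high half. One neighbor term per vertex is shown. */
void weighted_pair_sse(const float *v0_a, const float *v0_b,
                       const float *nbr_a, const float *nbr_b,
                       float c0_a, float c0_b, float ci_a, float ci_b,
                       float *out_a, float *out_b)
{
    __m128 v = _mm_setzero_ps();
    v = _mm_loadl_pi(v, (const __m64 *)v0_a);    /* movlps */
    v = _mm_loadh_pi(v, (const __m64 *)v0_b);    /* movhps */

    __m128 n = _mm_setzero_ps();
    n = _mm_loadl_pi(n, (const __m64 *)nbr_a);   /* movlps */
    n = _mm_loadh_pi(n, (const __m64 *)nbr_b);   /* movhps */

    /* Each coefficient is duplicated so it scales both X and NX. */
    __m128 c0 = _mm_set_ps(c0_b, c0_b, c0_a, c0_a);
    __m128 ci = _mm_set_ps(ci_b, ci_b, ci_a, ci_a);

    __m128 sum  = _mm_add_ps(_mm_mul_ps(v, c0), _mm_mul_ps(n, ci));
    __m128 csum = _mm_add_ps(c0, ci);
    __m128 r    = _mm_div_ps(sum, csum);         /* divps */

    _mm_storel_pi((__m64 *)out_a, r);
    _mm_storeh_pi((__m64 *)out_b, r);
}
```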

Approximation Instructions

Instruction   Latency   Pipelined?
sqrtps        ~35       No
divps         ~35       No
rsqrtps       2         Yes
rcpps         2         Yes

11 bits of precision; refine with Newton-Raphson:
– rcp_a = 2 * rcp(a) - a * rcp(a)^2
– rsqrt_a = 0.5 * rsqrt(a) * (3 - a * rsqrt(a)^2)
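One Newton-Raphson step on top of the approximation instructions can be sketched with intrinsics as follows. `_mm_rcp_ps` and `_mm_rsqrt_ps` compile to rcpps and rsqrtps. The function names are hypothetical, and the refinement formulas are the standard ones: 2r - a r^2 for the reciprocal and 0.5 r (3 - a r^2) for the reciprocal square root.

```c
#include <xmmintrin.h>

/* Refined reciprocal: r' = 2*r - a*r*r, where r = rcpps(a). */
__m128 rcp_nr(__m128 a)
{
    __m128 r = _mm_rcp_ps(a);
    return _mm_sub_ps(_mm_add_ps(r, r),
                      _mm_mul_ps(a, _mm_mul_ps(r, r)));
}

/* Refined reciprocal square root: r' = 0.5*r*(3 - a*r*r),
   where r = rsqrtps(a). */
__m128 rsqrt_nr(__m128 a)
{
    __m128 r     = _mm_rsqrt_ps(a);
    __m128 half  = _mm_set1_ps(0.5f);
    __m128 three = _mm_set1_ps(3.0f);
    __m128 arr   = _mm_mul_ps(a, _mm_mul_ps(r, r));
    return _mm_mul_ps(_mm_mul_ps(half, r), _mm_sub_ps(three, arr));
}
```

Each step roughly doubles the number of correct bits, so a single iteration brings the approximation close to single-precision accuracy at a fraction of the divps or sqrtps latency.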

Approximation Instructions

rcpps:   C0 C0 C1 C1                                ->  1/C0 1/C0 1/C1 1/C1
mulps:   NX0 X0 NX1 X1  ×  1/C0 1/C0 1/C1 1/C1      ->  NX0' X0' NX1' X1'

Optimizations

Some subdivision can be done real-time on Pentium® III processor

In Summary

• Great scalable technology
• Continuous, winged-edge meshes
• Weighted averages of neighbors
• Optimize for the Pentium® III processor for real-time subdivision

Q & A

References

Piecewise Smooth Surface Reconstruction, Hugues Hoppe, Tony DeRose, Tom Duchamp, Hubert Jin, Mark Halstead, John McDonald, Jean Schweitzer, Werner Stuetzle, pp. 295-302 in Siggraph 94 Conference Proceedings, July 1994.

Edge Based Data Structures for Solid Modeling in Curved-Surface Environments, K. Weiler, IEEE Computer Graphics and Applications, Vol. 5, No. 1, January 1985.

Fast Rendering of Subdivision Surfaces, K. Pulli and M. Segal, Siggraph 96 Conference Proceedings, 1996.

Smooth Subdivision Surfaces Based on Triangles, C. Loop, University of Utah Master's Thesis, 1987.