
Page 1: Tessellation in a Low Poly World Nicolas Thibieroz AMD Graphics Products Group nicolas.thibieroz@amd.com Original materials from Bill Bilodeau 1 15/01/2014

Tessellation in a Low Poly World

Nicolas Thibieroz

AMD Graphics Products Group

nicolas.thibieroz@amd.com

Original materials from Bill Bilodeau

GDC Paris 2008


What is Tessellation?

Tessellation is the process of adding new primitives into an existing model

Triangle counts can be “dialed in” by adjusting the tessellation level

[Figure: the same model at Low, Medium, and High tessellation levels]


AMD Hardware Tessellator

[Diagram: graphics pipeline stages Input Assembler, Tessellator, Vertex Shader, Rasterizer, Pixel Shader, and Output Merger, with the shader stages connected to Memory / Resources]


Hardware tessellation allows you to render more polygons for better silhouettes

Initial concept artwork from Bay Raitt, Valve


Surface control cages are easier to work with than individual triangles

Artists prefer to create models this way

Animations are simpler on a control cage

Control cage can be animated on the GPU, then tessellated in a second pass

[Diagram: Animated Control Cage, then Vertex Shader and Pixel Shader, then R2VB, then Tessellator, then Vertex Shader and Pixel Shader]


Hardware tessellation is a form of compression

Smaller footprint – you only need to store the control cage and possibly a displacement map

Improved bandwidth – less data to transfer from memory to GPU


Three types of primitives, or “superprims”, are supported

Triangles

Quads

Lines


There are two tessellation modes

- Continuous

- Adaptive


Continuous Tessellation

Specify floating point tessellation level per-draw call

– Tessellation levels range from 1.0 to 14.99

Eliminates popping as vertices are added through tessellation

[Figure: the same mesh at Level = 1.0, 1.1, 1.3, 1.7, and 2.0]
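
A minimal sketch of how a per-draw level might be chosen, assuming a simple linear falloff with camera distance. The helper tessLevelForDistance and its nearDist/farDist parameters are hypothetical and not part of the tessellation library; only the 1.0 to 14.99 range comes from the slide. The returned value is the kind of number an application would hand to the library as its tessellation level.

```cpp
#include <algorithm>
#include <cassert>

// Range quoted on the slide for the continuous tessellation level.
constexpr float kMinTessLevel = 1.0f;
constexpr float kMaxTessLevel = 14.99f;

// Hypothetical helper: maps camera distance to a tessellation level.
// Objects at or closer than nearDist get the maximum level, objects at or
// beyond farDist get the minimum, with a linear ramp in between.
float tessLevelForDistance(float distance, float nearDist, float farDist)
{
    assert(farDist > nearDist);
    float t = (distance - nearDist) / (farDist - nearDist);
    t = std::max(0.0f, std::min(1.0f, t));  // clamp to [0, 1]
    return kMaxTessLevel * (1.0f - t) + kMinTessLevel * t;
}
```

Because the level is a float, moving the camera changes it gradually, which is what lets new vertices slide in without popping.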


Adaptive allows different levels of tessellation within the same mesh

[Figure: a mesh whose edges carry different edge tessellation factors (3.x, 5.x, and 7.x)]
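
One way to pick per-edge factors, sketched under the assumption that the factor falls off linearly with the distance from the camera to the edge midpoint. edgeTessFactor and its parameters are hypothetical names, not library functions. Using only the edge's own endpoints means two triangles sharing an edge compute the same factor for it, which is a common way to avoid cracks.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

struct Vec3 { float x, y, z; };

// Hypothetical helper: tessellation factor for one edge, derived only from
// the edge endpoints and the eye position. At the eye the factor is
// maxLevel; at falloffDist or beyond it is minLevel.
float edgeTessFactor(const Vec3& a, const Vec3& b, const Vec3& eye,
                     float minLevel, float maxLevel, float falloffDist)
{
    assert(falloffDist > 0.0f && maxLevel >= minLevel);
    const Vec3 mid{ (a.x + b.x) * 0.5f, (a.y + b.y) * 0.5f, (a.z + b.z) * 0.5f };
    const float dx = mid.x - eye.x;
    const float dy = mid.y - eye.y;
    const float dz = mid.z - eye.z;
    const float dist = std::sqrt(dx * dx + dy * dy + dz * dz);
    const float t = std::min(dist / falloffDist, 1.0f);
    return maxLevel * (1.0f - t) + minLevel * t;
}
```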


Adaptive tessellation can be done in real-time using multiple passes

[Diagram: multi-pass adaptive tessellation. Labels shown: Superprim Mesh, Vertex Shader, Pixel Shader, R2VB, Transformed Superprim Mesh, Tessellation Factors, Stream 0, Stream 1, Sampler, Tessellator]


Code Example: Continuous Tessellation

// Enable tessellation:
TSSetTessellationMode( pd3dDevice, TSMD_ENABLE_CONTINUOUS );

// Set tessellation level:
TSSetMaxTessellationLevel( pd3dDevice, sg_fMaxTessellationLevel );

// Select appropriate technique to render our tessellated objects:
sg_pEffect->SetTechnique( "RenderTessellatedDisplacedScene" );

// Render all passes with tessellation
V( sg_pEffect->Begin( &cPasses, 0 ) );
for ( iPass = 0; iPass < cPasses; iPass++ )
{
    V( sg_pEffect->BeginPass( iPass ) );
    V( TSDrawMeshSubset( sg_pMesh, 0 ) );
    V( sg_pEffect->EndPass() );
}
V( sg_pEffect->End() );

// Disable tessellation:
TSSetTessellationMode( pd3dDevice, TSMD_DISABLE );


The vertex shader is used as an evaluation shader

[Diagram: Super-prim Mesh, then Tessellator, then Tessellated Mesh, then Vertex Shader (Evaluation Shader), which reads the Displacement Map through a Sampler, then Tessellated and Displaced Mesh]


Example Code: Evaluation Vertex Shader

struct VsInputTessellated
{
    // Barycentric weights for this vertex
    float3 vBarycentric   : BLENDWEIGHT0;

    // Data from superprim vertex 0:
    float4 vPositionVert0 : POSITION0;
    float2 vTexCoordVert0 : TEXCOORD0;
    float3 vNormalVert0   : NORMAL0;

    // Data from superprim vertex 1:
    float4 vPositionVert1 : POSITION4;
    float2 vTexCoordVert1 : TEXCOORD4;
    float3 vNormalVert1   : NORMAL4;

    // Data from superprim vertex 2:
    float4 vPositionVert2 : POSITION8;
    float2 vTexCoordVert2 : TEXCOORD8;
    float3 vNormalVert2   : NORMAL8;
};


Example Code: Evaluation Vertex Shader

VsOutputTessellated VSRenderTessellatedDisplaced( VsInputTessellated i )
{
    VsOutputTessellated o;

    // Compute new position based on the barycentric coordinates:
    float3 vPosTessOS = i.vPositionVert0.xyz * i.vBarycentric.x
                      + i.vPositionVert1.xyz * i.vBarycentric.y
                      + i.vPositionVert2.xyz * i.vBarycentric.z;

    // Output world-space position:
    o.vPositionWS = vPosTessOS;

    // Compute new normal vector for the tessellated vertex:
    o.vNormalWS = i.vNormalVert0.xyz * i.vBarycentric.x
                + i.vNormalVert1.xyz * i.vBarycentric.y
                + i.vNormalVert2.xyz * i.vBarycentric.z;

    // Compute new texture coordinates based on the barycentric coordinates:
    o.vTexCoord = i.vTexCoordVert0.xy * i.vBarycentric.x
                + i.vTexCoordVert1.xy * i.vBarycentric.y
                + i.vTexCoordVert2.xy * i.vBarycentric.z;

    // Displace the tessellated vertex (sample the displacement map)
    o.vPositionWS = DisplaceVertex( vPosTessOS, o.vTexCoord, o.vNormalWS );

    // Transform position to clip space:
    o.vPosCS = mul( float4( o.vPositionWS, 1.0 ), g_mWorldViewProjection );

    return o;
} // End of VsOutputTessellated VSRenderTessellatedDisplaced(..)
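
For checking results offline, the same evaluation can be mirrored on the CPU. This is only a sketch: the slides do not show the body of DisplaceVertex(), so displaceAlongNormal below assumes the usual form (move the point along its normal by a sampled height times a scale), and all names here are hypothetical.

```cpp
#include <cassert>

struct Vec3 { float x, y, z; };

Vec3 operator*(const Vec3& v, float s) { return { v.x * s, v.y * s, v.z * s }; }
Vec3 operator+(const Vec3& a, const Vec3& b) { return { a.x + b.x, a.y + b.y, a.z + b.z }; }

// Weighted sum of the three superprim attributes, as in the shader:
// v0 * bary.x + v1 * bary.y + v2 * bary.z.
Vec3 baryLerp(const Vec3& v0, const Vec3& v1, const Vec3& v2, const Vec3& bary)
{
    return v0 * bary.x + v1 * bary.y + v2 * bary.z;
}

// Assumed form of the displacement step: push the interpolated position
// along the interpolated normal by (height * scale). The height would come
// from the displacement map sample in the real shader.
Vec3 displaceAlongNormal(const Vec3& pos, const Vec3& normal, float height, float scale)
{
    return pos + normal * (height * scale);
}
```

The same baryLerp call works for positions, normals, and (with z ignored) texture coordinates, which is why the shader repeats one expression three times.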


What if you want to do more?

DirectX 9 has a limit of 15 float4 vertex input components

– High order surfaces need more inputs

TSToggleIndicesRetrieval() allows you to fetch the super-prim data from a vertex texture

[Diagram: the Tessellator passes (u,v) to the Vertex Shader, which fetches the Bezier Control Points P0,0, P0,1 … P3,3 through a Sampler]
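
As an illustration of what such an evaluation shader computes once it has the 16 control points and the (u,v) from the tessellator, here is a sketch of bicubic Bezier patch evaluation. The function names and the choice that the first index follows u and the second follows v are assumptions made for this example.

```cpp
#include <array>
#include <cassert>

struct Vec3 { float x, y, z; };

// Cubic Bernstein basis weights at parameter t in [0, 1].
std::array<float, 4> bernstein3(float t)
{
    const float s = 1.0f - t;
    return { s * s * s, 3.0f * s * s * t, 3.0f * s * t * t, t * t * t };
}

// Evaluates a bicubic Bezier patch at (u, v).
// cp[i][j] holds control point P(i,j); i is weighted by u, j by v.
Vec3 evalBezierPatch(const std::array<std::array<Vec3, 4>, 4>& cp, float u, float v)
{
    const std::array<float, 4> bu = bernstein3(u);
    const std::array<float, 4> bv = bernstein3(v);
    Vec3 p{ 0.0f, 0.0f, 0.0f };
    for (int i = 0; i < 4; ++i) {
        for (int j = 0; j < 4; ++j) {
            const float w = bu[i] * bv[j];
            p.x += w * cp[i][j].x;
            p.y += w * cp[i][j].y;
            p.z += w * cp[i][j].z;
        }
    }
    return p;
}
```

A patch needs 16 positions per superprim, which is already more than the vertex input limit allows, hence the fetch from a vertex texture.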


Other Tessellation Library Functions

TSDrawIndexed(…)

– Analogous to DrawIndexedPrimitive(…)

TSDrawNonIndexed(…)

– Needed for adaptive tessellation, since every edge needs its own tessellation level

TSSetMinTessellationLevel(…)

– Sets the minimum tessellation level for adaptive tessellation

TSComputeNumTessellatedPrimitives(…)

– Calculates the number of tessellated primitives that will be generated by the tessellator


Displacement mapping alters tangent space

To do normal mapping we need to rotate tangent space

Alternatively, use model space normal maps

– Doesn’t work with animation or tiling


Displacement map lighting

Use the displacement map to calculate the per-pixel normal

Central differencing with neighboring displacements can approximate the derivative

Light with the computed normal

No need to use a normal map
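
The central-differencing step can be sketched on the CPU as follows. This assumes a single-channel height map with one-texel spacing, clamped addressing at the borders, and a normal expressed in the map's own frame with z pointing out of the surface; HeightMap and normalFromDisplacement are hypothetical names. A pixel shader would do the same arithmetic with four extra texture samples.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec3 { float x, y, z; };

// Row-major single-channel displacement map with clamped addressing.
struct HeightMap {
    int width;
    int height;
    std::vector<float> texels;

    float at(int x, int y) const
    {
        x = std::max(0, std::min(width - 1, x));
        y = std::max(0, std::min(height - 1, y));
        return texels[static_cast<std::size_t>(y) * width + x];
    }
};

// Approximates the surface normal at texel (x, y).
// Central differences estimate dh/dx and dh/dy; for a surface z = h(x, y)
// the unnormalized normal is (-dh/dx, -dh/dy, 1).
Vec3 normalFromDisplacement(const HeightMap& h, int x, int y, float scale)
{
    const float dhdx = (h.at(x + 1, y) - h.at(x - 1, y)) * 0.5f * scale;
    const float dhdy = (h.at(x, y + 1) - h.at(x, y - 1)) * 0.5f * scale;
    const float len = std::sqrt(dhdx * dhdx + dhdy * dhdy + 1.0f);
    return { -dhdx / len, -dhdy / len, 1.0f / len };
}
```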


Terrain Rendering: Performance Results

Both use the same displacement map (2K x 2K) and identical pixel shaders

                                               | Low Resolution with Tessellation | High Resolution, No Tessellation
On-disk model polygon count (pre-tessellation) | 840 triangles                    | 1,280,038 triangles
Original model rendering cost                  | 1210 fps (0.83 ms)               |
Actual rendered model polygon count            | 1,008,038 triangles              | 1,280,038 triangles
VRAM vertex buffer size                        | 70 KB                            | 31 MB
VRAM index buffer size                         | 23 KB                            | 14 MB
Rendering time                                 | 821.41 fps (1.22 ms)             | 301 fps (3.32 ms)

Rendering with tessellation is > 6X faster (subtracting the cost of shading) and provides memory savings of over 44 MB!
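
The two headline numbers can be reproduced from the figures in the table. This sketch assumes that the "cost of shading" being subtracted is the 0.83 ms original model rendering cost and that 1 MB is 1024 KB; both are readings of the slide, not statements from it, and the function names are hypothetical.

```cpp
#include <cassert>

// Speed-up of the fast path over the slow path after removing a fixed
// cost (in ms) that both measurements are assumed to share.
double speedupExcludingCommonCost(double fastMs, double slowMs, double commonMs)
{
    assert(fastMs > commonMs && slowMs > commonMs);
    return (slowMs - commonMs) / (fastMs - commonMs);
}

// Vertex plus index buffer memory saved, in KB.
double bufferSavingsKB(double hiVertexKB, double hiIndexKB,
                       double loVertexKB, double loIndexKB)
{
    return (hiVertexKB + hiIndexKB) - (loVertexKB + loIndexKB);
}
```

With the table's values, speedupExcludingCommonCost(1.22, 3.32, 0.83) is (3.32 - 0.83) / (1.22 - 0.83), roughly 6.4, and bufferSavingsKB(31 * 1024, 14 * 1024, 70, 23) comes to about 44.9 MB, consistent with the slide's "> 6X" and "over 44 MB".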


Terrain Tessellation Sample


AMD GPU MeshMapper

New tool for generating normal, displacement, and ambient occlusion maps from hi-res and low-res mesh pairs


Advantages of the Tessellator

• Saves memory bandwidth and reduces memory footprint

• Flexible support for displacement mapping and many kinds of high order surfaces

• Easier content creation – artists and animators only need to work with low resolution geometry

• Continuous LOD avoids unnecessary triangles

• The tessellator is available now on the Xbox 360 and the latest ATI Radeon and FireGL graphics cards

• Public availability of tessellation SDK very soon


Harnessing the Power of Multiple GPUs

Nicolas Thibieroz

AMD Graphics Products Group

nicolas.thibieroz@amd.com

Original materials from Jon Story & Holger Grün

GDC Paris 2008


Why MGPU?

MGPUs can be used to dramatically increase performance and visual quality

– At higher screen resolutions

– Especially with increased use of MSAA

Many applications become GPU limited at higher screen resolutions

– High resolution monitors => mainstream affordability

Achieve next generation performance on today’s HW

– Prototype your next engine

Provides an upgrade path for mainstream parts


Multiple Boards

An increasing number of motherboards can accept 2 or more discrete video cards

Connected by high speed crossover cables

Now possible to fit 4 Radeon HD3850 boards to a single motherboard

CrossFireX technology allows you to harness that performance


Multiple GPUs per Board

The Radeon HD3870 X2 is a single-board multi-GPU architecture

– AFR is on by default

Heavy peer to peer communication

– Bi-directional 16x lane pipe connecting the 2 GPUs

CrossFireX supports 2 HD3870 X2 boards for Quad GPU performance


Hybrid Crossfire

Combination of integrated and discrete graphics

3D graphics performance boost

– Laptops

– Mainstream desktop PCs

Use less power during non-taxing graphical tasks


CrossFire Rendering Modes

Split Frame Rendering / Scissor

– Screen is divided into as many regions as there are GPUs

– Dynamic load balancing

Alternate Frame Rendering

– GPUs take alternate frames

– Vertex processing not duplicated

– Highest performing mode


How does AFR Work?

[Diagram: the CPU issues a stream of commands; the commands for Frame N are consumed by GPU0 and the commands for Frame N+1 by GPU1]


Hardware Considerations

Current MGPU setups are not shared memory architectures

– Resources placed in local video memory are duplicated for each GPU

Driver initiates peer to peer (P2P) copies to keep resources in sync

– On some chipsets this may involve the CPU

– Synchronizes all GPUs

– Very heavy impact on performance that can even result in negative scaling


Driver Modes

Compatible AFR Mode

– Default mode

– Driver checks for AFR unfriendly behaviour

– Will P2P copy stale resources

Full AFR Mode (Application Profile)

– Driver recognises EXE name

– Use a unique name and don’t change it

– Behaviour fully guided by profile

– Best performance – no checking

– Rename EXE to “AFR-FriendlyD3D.exe”

– Use “AFR-FriendlyOGL.exe” for OpenGL

– No checking: speed and compatibility test


Detecting the Number of GPUs

Visit http://ati.amd.com/developer

– Download project called “CrossFire Detect”

Statically link to:

– “atimgpud_s_x86.lib” 32 bit version

– “atimgpud_s_x64.lib” 64 bit version

Include header file:

– “atimgpud.h”

Call this function:

– INT count = AtiMultiGPUAdapters();


Common Pitfalls & Solutions


Pitfall: Dependencies Between Frames

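[Diagram: timelines for GPU0 (Frame N) and GPU1 (Frame N+1), each holding its own copy of resource A. Steps shown: Present (N-1), Draw using A, Update resource A, Present (N), Draw using A, Update resource A, Present (N+1). Frame N+1 draws with contents written during Frame N, so a P2P copy from GPU0 to GPU1 is required]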


Solution: Resources that Change Every Frame

[Diagram: timelines for GPU0 (Frame N) and GPU1 (Frame N+1), each holding its own copy of resource A. After Present (N-1), each frame performs Update resource A, then Draw using A, then Present]


Solution: Resources that Change Every Few Frames

[Diagram: timelines for GPU0 (Frame N) and GPU1 (Frame N+1), each holding its own copy of resource A. Frames N and N+1 each perform Update resource A before Draw using A; frames N+2, N+3, and N+4 only Draw using A before Present]
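
The pattern for resources that change only occasionally can be sketched as a small counter: when the data changes, the update is re-applied for as many consecutive frames as there are GPUs, so every GPU's copy is refreshed without a P2P copy. AfrResourceUpdater is a hypothetical class, and the sketch assumes AFR hands consecutive frames to different GPUs in turn.

```cpp
#include <cassert>

// Tracks how many more consecutive frames must re-apply the latest update
// so that each GPU's private copy of the resource ends up current.
class AfrResourceUpdater {
public:
    explicit AfrResourceUpdater(int gpuCount) : gpuCount_(gpuCount)
    {
        assert(gpuCount >= 1);
    }

    // Call when the application changes the resource contents.
    void markDirty() { framesLeft_ = gpuCount_; }

    // Call once per frame before drawing. Returns true if this frame must
    // perform "Update resource A" before "Draw using A".
    bool shouldUpdateThisFrame()
    {
        if (framesLeft_ == 0) {
            return false;
        }
        --framesLeft_;
        return true;
    }

private:
    int gpuCount_;
    int framesLeft_ = 0;
};
```

With two GPUs, a change at frame N causes updates in frames N and N+1 and none in N+2 onward, which matches the sequence in the diagram. A resource that changes every frame is simply marked dirty every frame.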


Pitfalls: In DX10 there are Other Ways to Update Resources...

Drawing to vertex/index buffers

Stream Out

CopyResource() calls

CopySubresourceRegion() calls

GenerateMips() calls

ResolveSubresource() calls


Pitfall: Waiting on Queries

[Diagram: the CPU issues commands to GPU0 (Frame N) and GPU1 (Frame N+1), then stalls: “Waiting for Query Result!!!”]


Solution: Queries

Avoid using queries whenever possible

- For occlusion queries consider a CPU-based approach

Avoid waiting on query results

- Pick up the result of a query at least N-GPU frames after it was issued

For queries issued every frame

- Create additional query objects for each GPU

- Cycle through them
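
The "cycle through them" advice can be sketched as a ring of query slots. QuerySlot stands in for a real API query object and QueryRing is a hypothetical helper; the ring size of gpuCount + 1 is a choice made for this example so that the slot being read is never the one being issued in the same frame.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for an API query object (hypothetical). A real application
// would keep its occlusion or event query handle here.
struct QuerySlot {
    long long issuedFrame = -1;
};

// Ring of query slots. A query issued in frame F is only read back in
// frame F + gpuCount, by which time the GPU that ran frame F has had a
// full AFR cycle to finish it.
class QueryRing {
public:
    explicit QueryRing(int gpuCount)
        : lag_(gpuCount), slots_(static_cast<std::size_t>(gpuCount) + 1)
    {
        assert(gpuCount >= 1);
    }

    // Slot to issue this frame's query into.
    QuerySlot& issue(long long frame)
    {
        QuerySlot& slot = slots_[indexFor(frame)];
        slot.issuedFrame = frame;
        return slot;
    }

    // Slot whose result may be collected this frame, or nullptr if no
    // query is old enough yet.
    const QuerySlot* readable(long long frame) const
    {
        const long long target = frame - lag_;
        if (target < 0) {
            return nullptr;
        }
        const QuerySlot& slot = slots_[indexFor(target)];
        return slot.issuedFrame == target ? &slot : nullptr;
    }

private:
    std::size_t indexFor(long long frame) const
    {
        return static_cast<std::size_t>(frame % static_cast<long long>(slots_.size()));
    }

    int lag_;
    std::vector<QuerySlot> slots_;
};
```

The application then uses results that are gpuCount frames old, which is usually acceptable for things like occlusion tests and avoids stalling the CPU.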


Pitfall: CPU Access to a Renderable Resource

When the CPU locks a renderable resource it must wait for all GPUs to finish using the resource before acquiring the pointer

All GPUs now have to wait until the CPU unlocks the resource pointer

After the unlock the driver has to update the resource on each GPU via P2P copies

Just don’t do this – it destroys performance even on a single GPU setup, and is catastrophic for MGPUs


Solutions: Locks / Maps

In DX10 stream to and copy from STAGING textures

In DX9 StretchRect() is always better than Lock()

At resource creation time use the appropriate flags from:

– D3D10_USAGE

– D3D10_CPU_ACCESS_FLAG

In DX9 never lock static Vertex/Index Buffers because it will cause P2P copies


Concluding Pitfalls & Solutions

Drivers take a conservative approach

– Performs checks on resource synchronization

– P2P copy if necessary

You know the application best

– Determine if a P2P copy is necessary

– Talk to us about a profile


AFR-Friendly SDK Sample

Part of the ATI developer SDK

– http://ati.amd.com/developer

Detects the number of GPUs

Correctly deals with textures used as render targets

Provides a solution for dealing with mouse cursor lag

Go and take a look!!


Call to Action

• MGPUs provide demonstrable performance gains

• MGPUs boost visual quality

• Plan from day one to make your rendering scale

• Detect the number of GPUs

• Regularly check for AFR unfriendly behavior

• Talk to us...


QUESTIONS?

nicolas.thibieroz@amd.com