optimalpartitionforoptimal partition for parallel...

17
Optimal Partition for Optimal Partition for Parallel Rendering Jongwook Jin K Wh Kwangyun Wohn VR Lab. CS Dept. KAIST Culture Technology Research Center Culture Technology Research Center MAR.06.2008

Upload: others

Post on 18-Aug-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Optimal Partition forOptimal Partition forParallel Renderingg

Jongwook JinK W hKwangyun Wohn

VR Lab. CS Dept. KAISTCulture Technology Research CenterCulture Technology Research Center

MAR.06.2008

Page 2: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Optimal Partition for Parallel RenderingOptimal Partition for Parallel Rendering

• Load balanced partition for rendering – Adhoc and Scan-like Tries – No Optimal Approaches for reducing Parallel

rendering Hazardsrendering Hazards

Partition algorithm for Parallel Rendering• Partition algorithm for Parallel Rendering– Load Balance with Minimization Parallel

d i dRendering Hazards– Realtime Partitioning Performanceg

Page 3: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Parallel Graphics HazardsParallel Graphics Hazards

Processing Hazard Management Hazard

• Unbalanced Graphics Processing

• vertices and pixel processing

• Unbalanced Graphics Processing

• vertices and pixel processing

• Unbalanced Management

Processing

• Unbalanced Management

Processinge t ces a d p e p ocess g

• overlap primitives

between different partitions

e t ces a d p e p ocess g

• overlap primitives

between different partitions

• frame coherency

between each processor’s

• frame coherency

between each processor’s

partitions of animate frame se

quence

t f d i

partitions of animate frame se

quence

t f d i• out of core rendering

resources management

• out of core rendering

resources management

Page 4: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Optimal Partition Generation forpParallel Graphics

T i l L d b l i f N L d b l i fTypical Load balancing for

Parallel processing

New Load balancing for

Parallel Realtime Graphics

• Preprocessed, Static Partition • Preprocessed, Static Partition • Runtime, Dynamic Partition

with computing time

• Runtime, Dynamic Partition

with computing time

• Load Balance with Minimization

Cutting Edge between Partitions

• Load Balance with Minimization

Cutting Edge between Partitions

constraints

• Load Balance with minimizing

Parallel Graphics Hazards

constraints

• Load Balance with minimizing

Parallel Graphics HazardsCutting Edge between Partitions

(for communication minimization)

Cutting Edge between Partitions

(for communication minimization)Parallel Graphics HazardsParallel Graphics Hazards

Page 5: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Optimal Partition Generation forParallel Graphics

Visibility Culling Graph Modeling Graph Partitioning

Vi f t C lli ith Ad ti h i f H d Mi i i tiView frustum Culling with

Hierarchical Structure

bl

Adapting graph size for

realtime partition algorithm

Hazards Minimization

within load balance condition

F Effi i d S l blProblem set

per each image frame3D or 2D Graph

for problem set

Fast, Efficient and Scalable

Graph Partition Algorithm with

realtime constraints

Graphics Hazards Modeling via

Weighted LaplacianOptimal graph partition for

Assignments to graphic

processors

Page 6: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Graph Partition Algorithm -p

gOverview of Spectral Bisection

kl d h l• A Workload Graph expresses as a Laplacian Matrix.• Rayleigh quotient of Assignment Vector and laplacian

t i t f tti d t b tmatrix represents sum of cutting edge cost between two partitions.

Laplacian Matrix⎪

⎪⎨

⎧=

∈−= ji

EvvdL

ji

iij if),(if1

Discrete }Ninnodesto-Ninnodesconnectingedges#{4 +×=t Lxx

⎪⎩ else0

Discrete

Optimization

Continues

1}Nin nodestoNin nodesconnectingedges#{4

±=+×=

ixLxx

∑t LxxF 2)()(Continues

Relaxation∑

−=⋅

=Eji

ji xxxx

xF),(

2)()(

Page 7: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Graph Partition Algorithm -p

gOverview of Spectral Bisection

l l f l l d ll• Spectral analysis of laplacian matrix; Second smallest eigenvector minimizes Rayleigh Quotient. O ti l i t t i d i ti t• Optimal assignment vector is a good approximation to load balanced graph partition problem with minimal communication edges between partitionscommunication edges between partitions.

Spectral ∑n

tSpectral

Analysis ∑=

=i

tiii zzL

1λ 1i11 and,0 +<== iez λλλ

Optimal

Assignment Vector22

00

,over minimize λ=⋅

=⇒⋅

=≠ xx

Lxxzxxxx

Lxx tt

xex

t

Page 8: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Graph Partition Algorithm –g

gAlgorithm Selection

Avoidance of computing full spectral analysis;

just need is second smallest Eigen vector.j g

Modification of conventional optimization method;Modification of conventional optimization method;

Conjugate Gradient Method

• Shows High convergence speed to

second smallest Eigen vector of Laplacian matrix

• Meets Load balance constraint

• Requires Low memoryq y

Page 9: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Conjugate gradient method for the spectral partitioning of graphs

,)(xx

LxxxFt

⋅= ))((2)( xxFxL

xxxF −⋅

⋅=∇

0000 ),( ghxFg =−∇=Step 1:

Step 2: Fletch Reeves Conjugate gradient method

Initialize and gradient vector0x

11 )(min −−

+=

+= kkk

hxx

hxF

α

ααα

Step 2: Fletch-Reeves Conjugate gradient method

11

)(−−

⋅+=

−∇=+=

kk

kk

kkkk

hgggh

xFghxx α

111

−−− ⋅

+= kkk

kk hgg

gh

Step 3: Check for convergence. If convergence has not been reached go to Step 2.

kxx =

g g p

Step 4: Accept optimal vector

Page 10: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Experiment for feasibility studyExperiment for feasibility study

Visibility Culling Graph Modeling Graph Partitioning

Vi f t C lli ith U S ifi H d dView frustum Culling with

Hierarchical Structure

i ibl O ll

3D 6 Grid Vertex Graph

for problem set

User Specific Hazard and

Load balance thresholds

F Effi i d S l blVisible Octree cell set

per each image frameRendering Primitive Cost

Modeling via Weighted

Laplacian

Fast, Efficient and Scalable

Graph Partition Algorithm with

realtime constraints

Optimal graph partition for

Assignments to graphic

processors

Page 11: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Load Balanced Partition GenerationLoad Balanced Partition Generation

Urban Scene (Digital Media City)

323863 triangles

[262 visible octree cells]

Positive Set

164524

[49]

Negative Set

159339

[213]

Imbalance

5185 / 323863 = 1.60%5185 / 323863 1.60%

Page 12: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Optimal Bisection for a image frame

18 000 009

Optimal Bisection for a image frame

14 00

16.00

18.00

0.007

0.008

0.009

10 00

12.00

14.00

0.005

0.006

6.00

8.00

10.00

0.003

0.004

2.00

4.00

0.000

0.001

0.002

0.00-0.001

1 8 15 22 29 36 43 50 57 64 71 78 85 92 99 106

113

120

127

134

141

148

155

162

169

176

183

Objective imbalance

182 iteration for partitioning weighted graph of 262 octree cells

Page 13: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Optimal Bisection for animation frames

Camera Closing to urban ground and Partitions

Page 14: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Dynamic Workload

Graph Size and Imbalance

by camera movement

0.080

0.090

900

1000

Graph Size and Imbalance

0.060

0.070

600

700

800

0.030

0.040

0.050

400

500

0.010

0.020

100

200

300

0.000 0

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29

VertexCount imbalanceVertexCount imbalance

With Objective function threshold 0.0002, Imbalance threshold 2%

Page 15: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Graph Size and Iteration Count

Time Complexity and Execution Time

800

1000

Graph Size and Iteration Count

200

400

600

0

200

1 2 3 4 5 6 7 8 9 1011121314151617181920212223242526272829

351000

Graph Size and Execution Time

VertexCount IterCount

20

25

30

600

800

0

5

10

15

0

200

400

Intel Pentium 2.4Ghz3Giga Memory00

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28

VertexCount MilliSec

3Giga Memory

Page 16: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

ConclusionConclusion

O i l i i ld b l d i l• Optimal partition could be solved in real-time for parallel graphics and simulations.

• Future works– Graphics Hazards Modeling

• for reducing parallel graphics hazards

– Valley Constraint Optimization • for partition’s frame coherency

– Recursive partitioning • for K-way processors

Page 17: OptimalPartitionforOptimal Partition for Parallel …kr.nvidia.com/content/cudazone/download/showcase/kr/...Optimal Partition Generation for Parallel Graphicsp TilLdbl i fTypical Load

Thanks a lot.