Optimal Partition for Parallel Rendering
Jongwook Jin, Kwangyun Wohn
VR Lab., CS Dept., KAIST
Culture Technology Research Center
MAR.06.2008
Optimal Partition for Parallel Rendering
• Load balanced partition for rendering
  – Ad hoc and scan-like tries
  – No optimal approaches for reducing parallel rendering hazards
• Partition algorithm for parallel rendering
  – Load balance with minimization of parallel rendering hazards
  – Realtime partitioning performance
Parallel Graphics Hazards
Processing Hazard
• Unbalanced graphics processing
• vertices and pixel processing
• overlap primitives between different partitions
Management Hazard
• Unbalanced management processing
• frame coherency between each processor's partitions of an animated frame sequence
• out-of-core rendering resources management
Optimal Partition Generation for Parallel Graphics
Typical load balancing for parallel processing
• Preprocessed, static partition
• Load balance with minimization of cutting edges between partitions (for communication minimization)
New load balancing for parallel realtime graphics
• Runtime, dynamic partition with computing time constraints
• Load balance with minimizing parallel graphics hazards
Optimal Partition Generation for Parallel Graphics
Visibility Culling
• View frustum culling with hierarchical structure
• Problem set per each image frame
Graph Modeling
• Adapting graph size for realtime partition algorithm
• 3D or 2D graph for problem set
• Graphics hazards modeling via weighted Laplacian
Graph Partitioning
• Hazards minimization within load balance condition
• Fast, efficient and scalable graph partition algorithm with realtime constraints
• Optimal graph partition for assignments to graphic processors
Graph Partition Algorithm - Overview of Spectral Bisection
• A workload graph is expressed as a Laplacian matrix.
• The Rayleigh quotient of the assignment vector and the Laplacian matrix represents the sum of cutting edge costs between two partitions.
Laplacian Matrix
$$L_{ij} = \begin{cases} d_i & \text{if } i = j \\ -1 & \text{if } (v_i, v_j) \in E \\ 0 & \text{else} \end{cases}$$
Discrete Optimization
$$x^t L x = 4 \times \#\{\text{edges connecting nodes in } N_- \text{ to nodes in } N_+\}, \qquad x_i = \pm 1$$
Continuous Relaxation
$$F(x) = \frac{x^t L x}{x \cdot x} = \frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{x \cdot x}$$
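The identity between the quadratic form and the cut size can be checked numerically. The following is a minimal sketch, assuming an unweighted graph; the six-node example graph and the helper names `laplacian`, `cut_edges` and `rayleigh` are hypothetical and not taken from the slides.

```python
import numpy as np

def laplacian(n, edges):
    """Unweighted Laplacian: L[i, i] = degree of node i, L[i, j] = -1 per edge."""
    L = np.zeros((n, n))
    for i, j in edges:
        L[i, i] += 1
        L[j, j] += 1
        L[i, j] -= 1
        L[j, i] -= 1
    return L

def cut_edges(x, edges):
    """Number of edges whose endpoints carry different signs in x."""
    return sum(1 for i, j in edges if x[i] != x[j])

def rayleigh(L, x):
    """Rayleigh quotient F(x) = x^t L x / (x . x)."""
    return (x @ L @ x) / (x @ x)

# Hypothetical example: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
L = laplacian(6, edges)

# Assignment vector: +1 for the first triangle, -1 for the second.
x = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
quad = x @ L @ x
print(quad, cut_edges(x, edges), rayleigh(L, x))
```

For a ±1 assignment vector every cut edge contributes (x_i - x_j)^2 = 4 and every uncut edge contributes 0, which is where the factor 4 comes from.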
Graph Partition Algorithm - Overview of Spectral Bisection
• Spectral analysis of the Laplacian matrix: the eigenvector of the second-smallest eigenvalue minimizes the Rayleigh quotient.
• The optimal assignment vector is a good approximation to the load balanced graph partition problem with minimal communication edges between partitions.
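Spectral Analysis
$$L = \sum_{i=1}^{n} \lambda_i z_i z_i^t, \qquad \lambda_1 = 0,\ z_1 = e,\ \text{and } \lambda_i \le \lambda_{i+1}$$
Optimal Assignment Vector
$$\text{minimize } \frac{x^t L x}{x \cdot x} \text{ over } x \ne 0,\ x^t e = 0 \quad \Rightarrow \quad x = z_2,\ \frac{x^t L x}{x \cdot x} = \lambda_2$$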
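As an illustration of the spectral result, the sketch below computes the full eigendecomposition with `numpy.linalg.eigh` and splits the nodes by the sign of the second eigenvector. It assumes a small dense, unweighted graph; the six-node path graph and the helper names are hypothetical. A dense eigensolver is used here only for clarity.

```python
import numpy as np

def path_laplacian(n):
    """Laplacian of a path graph 0 - 1 - ... - (n-1) (hypothetical test graph)."""
    L = np.zeros((n, n))
    for i in range(n - 1):
        L[i, i] += 1
        L[i + 1, i + 1] += 1
        L[i, i + 1] -= 1
        L[i + 1, i] -= 1
    return L

def spectral_bisection(L):
    """Return all eigenvalues, the second eigenvector z2, and the sign split."""
    # eigh returns eigenvalues in ascending order for a symmetric matrix.
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    z2 = eigenvectors[:, 1]
    return eigenvalues, z2, z2 >= 0

L = path_laplacian(6)
eigenvalues, z2, side = spectral_bisection(L)
print(eigenvalues[:2])   # lambda_1 is 0 up to rounding, lambda_2 is the minimum of F
print(side)              # one half of the path on each side
```

Because z2 is orthogonal to the all-ones vector e, its entries sum to zero, which is what pushes the sign split toward a balanced partition.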
Graph Partition Algorithm – Algorithm Selection
Avoidance of computing the full spectral analysis; only the second-smallest eigenvector is needed.
Modification of a conventional optimization method: the Conjugate Gradient Method
• Shows high convergence speed to the second-smallest eigenvector of the Laplacian matrix
• Meets the load balance constraint
• Requires low memory
Conjugate gradient method for the spectral partitioning of graphs
$$F(x) = \frac{x^t L x}{x \cdot x}, \qquad \nabla F(x) = \frac{2\,(L x - F(x)\,x)}{x \cdot x}$$
Step 1: Initialize $x_0$ and gradient vector
$$g_0 = -\nabla F(x_0), \qquad h_0 = g_0$$
Step 2: Fletcher-Reeves conjugate gradient method
$$\min_{\alpha} F(x_{k-1} + \alpha h_{k-1})$$
$$x_k = x_{k-1} + \alpha h_{k-1}, \qquad g_k = -\nabla F(x_k), \qquad h_k = g_k + \frac{g_k \cdot g_k}{g_{k-1} \cdot g_{k-1}}\, h_{k-1}$$
Step 3: Check for convergence. If convergence has not been reached, go to Step 2.
Step 4: Accept optimal vector $x = x_k$
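The four steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the line search solves the stationarity condition of F along the search direction as a quadratic in α, convergence is tested on the gradient norm, and the mean is subtracted from x to keep x orthogonal to e. The 2 x 6 grid graph, the tolerance and all function names are hypothetical.

```python
import numpy as np

def rayleigh(L, x):
    return (x @ L @ x) / (x @ x)

def neg_gradient(L, x):
    # g = -grad F(x) = -2 (L x - F(x) x) / (x . x)
    return -2.0 * (L @ x - rayleigh(L, x) * x) / (x @ x)

def line_search(L, x, h):
    """Alpha minimizing F(x + alpha h): dF/dalpha = 0 reduces to a quadratic."""
    Lx, Lh = L @ x, L @ h
    a, b, c = x @ Lx, x @ Lh, h @ Lh
    p, q, r = x @ x, x @ h, h @ h
    roots = np.roots([c * q - b * r, c * p - a * r, b * p - a * q]).real
    if roots.size == 0:
        return 0.0
    return min(roots, key=lambda t: rayleigh(L, x + t * h))

def fiedler_cg(L, x0, tol=1e-8, max_iter=1000):
    x = x0 - x0.mean()                    # Step 1: start orthogonal to e
    g = neg_gradient(L, x)
    h = g.copy()
    for k in range(1, max_iter + 1):      # Step 2: Fletcher-Reeves update
        alpha = line_search(L, x, h)
        x = x + alpha * h
        x -= x.mean()                     # assumption: remove rounding drift along e
        g_new = neg_gradient(L, x)
        if np.linalg.norm(g_new) < tol:   # Step 3: convergence check
            break
        h = g_new + (g_new @ g_new) / (g @ g) * h
        g = g_new
    return x / np.linalg.norm(x), k       # Step 4: accept x_k

def grid_laplacian(rows, cols):
    """Laplacian of a rows x cols 4-neighbor grid (hypothetical test graph)."""
    n = rows * cols
    L = np.zeros((n, n))
    for r in range(rows):
        for c in range(cols):
            i = r * cols + c
            for rr, cc in ((r + 1, c), (r, c + 1)):
                if rr < rows and cc < cols:
                    j = rr * cols + cc
                    L[i, i] += 1
                    L[j, j] += 1
                    L[i, j] -= 1
                    L[j, i] -= 1
    return L

L = grid_laplacian(2, 6)
x0 = np.random.default_rng(0).standard_normal(12)
x, iters = fiedler_cg(L, x0)
positive = x > 0
print(rayleigh(L, x), int(positive.sum()), iters)
```

Only matrix-vector products with L and a handful of vectors are needed, which is consistent with the low-memory property claimed for the method.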
Experiment for feasibility study
Visibility Culling
• View frustum culling with hierarchical structure
• Visible octree cell set per each image frame
Graph Modeling
• 3D 6 Grid Vertex Graph for problem set
• Rendering primitive cost modeling via weighted Laplacian
Graph Partitioning
• User specific hazard and load balance thresholds
• Fast, efficient and scalable graph partition algorithm with realtime constraints
• Optimal graph partition for assignments to graphic processors
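One way to picture the graph modeling stage is sketched below: visible octree cells become vertices of a 3D grid graph with up to six neighbors each, and a weighted Laplacian is assembled from per-cell rendering costs. The slides do not state how costs map to edge weights, so the averaging rule used here is purely an assumption, as are the cell coordinates, the triangle counts and the function name `weighted_laplacian`.

```python
import numpy as np

# Offsets toward +x, +y, +z; the three opposite neighbors follow by symmetry.
NEIGHBOR_OFFSETS = ((1, 0, 0), (0, 1, 0), (0, 0, 1))

def weighted_laplacian(cell_costs):
    """Build L = D - W for cells keyed by integer grid coordinates (i, j, k)."""
    cells = sorted(cell_costs)
    index = {cell: n for n, cell in enumerate(cells)}
    L = np.zeros((len(cells), len(cells)))
    for (i, j, k), a in index.items():
        for di, dj, dk in NEIGHBOR_OFFSETS:
            b = index.get((i + di, j + dj, k + dk))
            if b is None:
                continue  # neighbor cell is not visible in this frame
            # Hypothetical weighting: mean rendering cost of the two cells.
            w = 0.5 * (cell_costs[cells[a]] + cell_costs[cells[b]])
            L[a, a] += w
            L[b, b] += w
            L[a, b] -= w
            L[b, a] -= w
    return cells, L

# Hypothetical visible cells with triangle counts as rendering cost.
cell_costs = {(0, 0, 0): 100, (1, 0, 0): 300, (0, 1, 0): 50, (2, 2, 2): 10}
cells, L = weighted_laplacian(cell_costs)
print(cells)
print(L)
```

Whatever weighting is chosen, each row of the result sums to zero, so the all-ones vector stays in the null space and the spectral bisection formulas keep applying.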
Load Balanced Partition Generation
Urban Scene (Digital Media City)
323863 triangles
[262 visible octree cells]
Positive Set
164524
[49]
Negative Set
159339
[213]
Imbalance
5185 / 323863 = 1.60%
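The imbalance figure follows directly from the two triangle counts. A small sketch of the arithmetic, with `imbalance` as a hypothetical helper name:

```python
def imbalance(positive, negative):
    """Absolute difference of the two set sizes relative to the total."""
    return abs(positive - negative) / (positive + negative)

pos, neg = 164524, 159339          # triangle counts of the two sets
total = pos + neg
ratio = imbalance(pos, neg)
print(f"{total} triangles, difference {abs(pos - neg)}, imbalance {ratio:.2%}")
```

The split is uneven in cell count (49 versus 213 octree cells) yet nearly even in triangle count, which is the quantity being balanced.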
Optimal Bisection for an image frame
[Figure: objective value and imbalance plotted against iteration number, 1 to 183]
182 iterations for partitioning the weighted graph of 262 octree cells
Optimal Bisection for animation frames
Camera closing to urban ground and partitions
Dynamic workload: graph size and imbalance by camera movement
[Figure: Graph Size and Imbalance; series VertexCount and imbalance, horizontal axis 1 to 29]
With objective function threshold 0.0002, imbalance threshold 2%
Time Complexity and Execution Time
[Figure: Graph Size and Iteration Count; series VertexCount and IterCount, horizontal axis 1 to 29]
[Figure: Graph Size and Execution Time; series VertexCount and MilliSec, horizontal axis 1 to 28]
Test machine: Intel Pentium 2.4 GHz, 3 GB memory
Conclusion
• Optimal partition could be solved in real-time for parallel graphics and simulations.
• Future works
  – Graphics Hazards Modeling
    • for reducing parallel graphics hazards
  – Valley Constraint Optimization
    • for partition's frame coherency
  – Recursive partitioning
    • for K-way processors
Thanks a lot.