Parallel Performance of Hierarchical Multipole Algorithms for
Inductance Extraction
Ananth Grama, Purdue University
Vivek Sarin, Texas A&M University
Hemant Mahawar, Texas A&M University
Acknowledgements: National Science Foundation.
Outline
• Inductance Extraction
• Underlying Linear System
• The Solenoidal Basis Method
• Hierarchical Algorithms
• Parallel Formulations
• Experimental Results
Inductance Extraction
• Inductance
  • Property of an electric circuit to oppose a change in its current; an electromotive force (emf) is induced
  • Self inductance, and mutual inductance between conductors
• Inductance extraction
  • Signal delays in circuits depend on parasitic R, L, C
  • At high frequency, signal delays are dominated by parasitic inductance
  • Requires accurate estimation of inductive coupling between circuit components
(Image credit: oea.com)
Inductance Extraction …
• Inductance extraction
  • For a set of s conductors, compute the s × s impedance matrix Z
  • Z captures the self and mutual impedance among the conductors
  • Conductors are discretized with a uniform two-dimensional mesh for accurate impedance calculation
Constraints
• Current density at a point:
  \[ \frac{J(\mathbf{r})}{\sigma} + \frac{j\omega\mu}{4\pi} \int_{V'} \frac{J(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, dV' = -\nabla \Phi(\mathbf{r}) \]
• Voltage drop across filaments in terms of filament current and voltage:
  \[ (R + j\omega L)\, I_f = V_f \]
• Kirchhoff's current law at the nodes:
  \[ B^T I_f = I_s \]
• Potential difference in terms of node voltages:
  \[ V_f = B\, V_n \]
• Inductance matrix, a function of 1/r:
  \[ L_{kl} = \frac{\mu}{4\pi\, a_k a_l} \int_{V_k} \int_{V_l} \frac{\mathbf{u}_k \cdot \mathbf{u}_l}{|\mathbf{r}_k - \mathbf{r}_l|} \, dV_k \, dV_l \]
Linear System
• System matrix:
  \[ \begin{bmatrix} R + j\omega L & -B \\ B^T & 0 \end{bmatrix} \begin{bmatrix} I_f \\ V_n \end{bmatrix} = \begin{bmatrix} 0 \\ I_s \end{bmatrix} \]
• Characteristics: R is diagonal, B is sparse, L is dense (illustrated in the sketch below)
• Solution method
  • Iterative methods (GMRES); dense matrix-vector product with L
  • Hierarchical methods, matrix-free approach
• Challenge: effective preconditioning in the absence of an explicitly formed system matrix
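To make the block structure concrete, here is a minimal assembly sketch (not from the talk): a hypothetical four-filament loop with illustrative resistances and a 1/r-style stand-in for the dense inductance matrix.

```python
import numpy as np

# Hypothetical toy problem: 4 filaments forming a closed loop through 4 nodes.
f, n = 4, 4
omega = 2 * np.pi * 1e9                       # assumed operating frequency
R = np.diag(np.full(f, 0.1))                  # diagonal filament resistances
# Stand-in for the dense inductance matrix with 1/r-like decay:
Lmat = 1.0 / (np.abs(np.subtract.outer(np.arange(f), np.arange(f))) + 1.0)
B = np.zeros((f, n))                          # sparse filament-node incidence
for k in range(f):                            # filament k runs node k -> node (k+1) % n
    B[k, k], B[k, (k + 1) % n] = -1.0, 1.0

# System matrix: R diagonal, B sparse, L dense, as noted above.
A = np.block([[R + 1j * omega * Lmat, -B],
              [B.T, np.zeros((n, n))]])
rhs = np.concatenate([np.zeros(f), [1.0, -1.0, 0.0, 0.0]])  # I_s: unit current in/out
print(A.shape)                                # (8, 8): f + n unknowns
```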
Solenoidal Basis Method
• Linear system with modified right-hand side:
  \[ \begin{bmatrix} R + j\omega L & -B \\ B^T & 0 \end{bmatrix} \begin{bmatrix} I_f \\ V_n \end{bmatrix} = \begin{bmatrix} F \\ 0 \end{bmatrix} \]
• Solenoidal basis
  • Automatically satisfies the conservation law (Kirchhoff's current law)
  • Mesh currents form a basis for the filament currents
  • Solenoidal basis matrix P satisfies B^T P = 0 (see the construction sketched below)
  • With I_f = P x, the current obeys Kirchhoff's law: B^T I_f = B^T P x = 0
• Reduced system:
  \[ P^T (R + j\omega L) P\, x = P^T F \]
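As an illustration of how such a basis can be built, the following sketch (a reconstruction under stated assumptions, not ParIS code) assembles B and P for an m × m grid of nodes, one unit loop current per mesh cell, and checks that B^T P = 0.

```python
import numpy as np

def grid_incidence_and_basis(m):
    """Filament-node incidence B and solenoidal basis P for an m x m node grid."""
    n_h = m * (m - 1)                 # horizontal filaments: (i,j) -> (i,j+1)
    n_f = n_h + (m - 1) * m           # plus vertical filaments: (i,j) -> (i+1,j)

    def node(i, j): return i * m + j
    def h(i, j): return i * (m - 1) + j
    def v(i, j): return n_h + i * m + j

    B = np.zeros((n_f, m * m))
    for i in range(m):
        for j in range(m - 1):
            B[h(i, j), node(i, j)], B[h(i, j), node(i, j + 1)] = -1.0, 1.0
    for i in range(m - 1):
        for j in range(m):
            B[v(i, j), node(i, j)], B[v(i, j), node(i + 1, j)] = -1.0, 1.0

    # One basis vector per mesh cell: a unit current circulating around the cell.
    P = np.zeros((n_f, (m - 1) ** 2))
    for i in range(m - 1):
        for j in range(m - 1):
            c = i * (m - 1) + j
            P[h(i, j), c], P[v(i, j + 1), c] = 1.0, 1.0     # top, right: with orientation
            P[h(i + 1, j), c], P[v(i, j), c] = -1.0, -1.0   # bottom, left: against
    return B, P

B, P = grid_incidence_and_basis(5)
assert np.allclose(B.T @ P, 0.0)      # any I_f = P x satisfies Kirchhoff's current law
```

For m = 33 this construction yields the 2,112 filaments and 1,024 mesh-current functions listed on the next slide.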
Problem Size
Number of unknowns for the ground plane problem. For an m × m mesh of nodes there are m² potential nodes, 2m(m−1) current filaments, m² + 2m(m−1) unknowns in the linear system, and (m−1)² solenoidal (mesh-current) functions; a quick check appears below.

Mesh      Potential  Current    Linear    Solenoidal
          Nodes      Filaments  System    Functions
33x33     1,089      2,112      3,201     1,024
65x65     4,225      8,320      12,545    4,096
129x129   16,641     33,024     49,665    16,384
257x257   66,049     131,584    197,633   65,536
513x513   263,169    525,312    788,481   262,144
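A trivial script reproducing the table from these formulas:

```python
# Print mesh size, nodes, filaments, linear-system unknowns, solenoidal functions.
for m in (33, 65, 129, 257, 513):
    nodes, filaments = m * m, 2 * m * (m - 1)
    print(m, nodes, filaments, nodes + filaments, (m - 1) ** 2)
```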
Hierarchical Methods
• Matrix-vector product with an n × n matrix costs O(n^2)
• Faster matrix-vector products via a matrix-free approach
  • Appel's algorithm, Barnes-Hut method: particle-cluster interactions, O(n lg n)
  • Fast Multipole Method: cluster-cluster interactions, O(n)
• Hierarchical refinement of the underlying domain: quad-tree in 2-D, oct-tree in 3-D
• Rely on decaying 1/r kernel functions (see the toy example below)
• Compute an approximate matrix-vector product at the cost of some accuracy
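A toy example (not from the talk) of the decaying-kernel idea: the 1/r sums over a distant cluster are well approximated by even a degree-0 (single-point) expansion.

```python
import numpy as np

rng = np.random.default_rng(0)
cluster = rng.random((100, 2))           # source points in the unit box
weights = rng.random(100)
target = np.array([10.0, 10.0])          # far-away evaluation point

# Direct 1/r sum: O(n) per target, O(n^2) over all pairs.
direct = np.sum(weights / np.linalg.norm(cluster - target, axis=1))

# Degree-0 multipole: replace the cluster by one weighted point at its center.
center = np.average(cluster, axis=0, weights=weights)
approx = weights.sum() / np.linalg.norm(center - target)

print(direct, approx)    # close, because the cluster is small relative to the distance
```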
Hierarchical Methods …
• Fast Multipole Method (FMM)
  • Divides the domain recursively into eight sub-domains
  • Up-traversal: computes multipole coefficients that give the effect of all points inside a node at a far-away point
  • Down-traversal: computes local coefficients that gather the effect of all far-away points at the points inside a node
  • Direct interactions for nearby points
  • Computational complexity: O((d+1)^4 N), where d is the multipole degree
Hierarchical Methods …
• Hierarchical Multipole Method (HMM)
  • An augmented Barnes-Hut method, or a variant of FMM
  • Up-traversal: same as FMM
  • For each particle:
    • Multipole acceptance criterion (MAC): the ratio of the particle's distance from the center of a box to the dimension of the box
    • The MAC determines whether the box's multipole coefficients may be used to obtain the effect of all its far-away points (a minimal version of this test is sketched below)
  • Direct interactions for nearby points
  • Computational complexity: O((d+1)^2 N lg N)
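A minimal sketch of the acceptance test; the function name and the sample boxes are hypothetical. The threshold direction follows the timings reported later in the talk, where a larger α means more direct interactions (slower but more accurate).

```python
import numpy as np

def use_multipole(particle, box_center, box_size, alpha):
    """MAC: approximate a box by its multipole expansion only when the particle
    is at least alpha box-widths away from the box center."""
    dist = np.linalg.norm(np.asarray(particle) - np.asarray(box_center))
    return dist / box_size > alpha

# Larger alpha forces more direct interactions: slower but more accurate.
print(use_multipole([5.0, 5.0], [0.5, 0.5], box_size=1.0, alpha=1.5))   # True: far away
print(use_multipole([1.2, 0.5], [0.5, 0.5], box_size=1.0, alpha=1.5))   # False: too close
```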
ParIS: Parallel Solver
• Application: inductance extraction
• Solves the reduced system with a preconditioned iterative method
• Iterative method: GMRES
  • Dense matrix-vector products with the preconditioner and the coefficient matrix
  • The dense matrix-vector product dominates the computational cost of the algorithm
  • Hierarchical methods are used to compute the potential (the inductive effect) on the filaments
• Vector inner products: negligible computation and communication cost
• Reduced system (a matrix-free version is sketched below):
  \[ P^T (R + j\omega L) P\, x = P^T F \]
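A minimal matrix-free sketch of this solve, with synthetic stand-ins for R, L, and P; in ParIS the dense product with L would be performed by the FMM/HMM evaluation rather than by an explicit matrix.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

f, k = 40, 16                              # filaments and mesh-current unknowns (toy sizes)
omega = 2 * np.pi * 1e9
R = np.full(f, 0.1)                        # diagonal of R
Lmat = 1.0 / (np.abs(np.subtract.outer(np.arange(f), np.arange(f))) + 1.0)
P = np.random.default_rng(1).standard_normal((f, k))   # stand-in for the solenoidal basis

def reduced_matvec(x):
    v = P @ x
    # The dense product Lmat @ v dominates the cost; hierarchical methods
    # replace it with an approximate O(N) or O(N lg N) evaluation.
    return P.T @ (R * v + 1j * omega * (Lmat @ v))

A = LinearOperator((k, k), matvec=reduced_matvec, dtype=complex)
x, info = gmres(A, P.T @ np.ones(f, dtype=complex))
print(info)                                # 0 on convergence
```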
Parallelization Scheme
• Two-tier parallelization
  • Each conductor has its own filaments and associated oct-tree
  • Conductors are distributed across MPI processes (see the sketch below)
  • Within a conductor, OpenMP threads are used
• Pruning of the tree to obtain sub-trees
  • Computation at the top few levels of the tree is sequential
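A skeletal sketch of the first tier, assuming mpi4py is available; traverse_conductor_tree is a hypothetical stand-in for the per-conductor tree computation, which in ParIS forms the second, OpenMP-threaded tier.

```python
from mpi4py import MPI

def traverse_conductor_tree(conductor_id):
    # Hypothetical stand-in for the per-conductor oct-tree traversal;
    # in ParIS this second tier is parallelized with OpenMP threads.
    return float(conductor_id)

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

conductors = list(range(8))                    # hypothetical conductor ids
mine = conductors[rank::size]                  # tier 1: conductors across MPI processes
local = sum(traverse_conductor_tree(c) for c in mine)
total = comm.allreduce(local, op=MPI.SUM)      # combine partial results
```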
Experiments
• Experiments on the interconnect crossover problem
  • Conductors 2 cm long, 2 mm wide
  • Distance between conductors: 0.3 mm within a layer, 3 mm across layers
  • Non-uniform distribution of conductors
• Comparison between FMM and HMM
• Parallel platform: Beowulf cluster at Texas A&M University
  • 64-bit AMD Opteron, 1.4 GHz, 128 dual-processor nodes, Gigabit Ethernet
  • LAM/MPI on SuSE Linux with GNU compilers
Cross-Over Interconnects
[Figure: the cross-over interconnect test structure]
Parameters
• d: multipole degree
• α: multipole acceptance criterion
• s: number of particles per leaf node of the tree
• Since d and α influence the accuracy of the matrix-vector product, impedance errors are kept comparable – within 1% of a reference value computed by FMM with d = 8
• Scaled efficiency E = BOPS / p
  • BOPS: average number of base operations per second
  • p: number of processors used
Experimental Results
Effect of multipole degree (d) for different choices of s
[Bar charts: FMM code (left) and HMM code (right); x-axis: number of particles per leaf node (s = 2, 8, 32, 128); y-axis: time (sec); series d = 1, 2, 4. Data tabulated on the next slide.]
Experimental Results
Effect of multipole degree (d) for different choices of s

Time (sec):
d    FMM code                        HMM code
     s=2     s=8    s=32   s=128     s=2    s=8   s=32  s=128
1    49.5    18.3   12.7   29.9      25.7   21.5  21.3  34.8
2    225.8   62.5   25.3   32.8      46.8   36.5  31.3  41.9
4    1513.3  398.2  110.8  50.7      110.8  84.5  63.0  61.9
Experimental Results
Effect of MAC on HMM for different choices of s and d
[Bar charts: varying s (left; x-axis: number of particles per leaf node, s = 2, 8, 32) and varying d (right; x-axis: multipole degree, d = 1, 2, 4); y-axis: time (sec); series α = 1, 1.5. Data tabulated on the next slide.]
Experimental Results …
Effect of MAC on HMM for different choices of s and d

Time (sec), varying d (s = 8):
α     d=1   d=2   d=4
1     21.5  36.5  84.5
1.5   40.1  70.6  158.2

Time (sec), varying s (d = 2):
α     s=2   s=8   s=32
1     46.8  36.5  31.3
1.5   89.3  70.6  59.5
Experimental Results
Effect of multipole degree (d) on the HMM code on p processors for two different choices of s
[Bar charts: s = 8 (left) and s = 32 (right); x-axis: processors (p = 1, 2, 4, 8); y-axis: time (sec); series d = 1, 2, 4. Data tabulated on the next slide.]
Experimental Results
Effect of multipole degree (d) on the HMM code on p processors for two different choices of s

Time (sec):
d    s = 8                        s = 32
     p=1   p=2    p=4    p=8      p=1   p=2   p=4    p=8
1    21.5  26.5   50.9   105.8    21.3  24.4  48.8   94.1
2    36.5  46.5   96.5   184.3    31.3  38.3  77.9   157.5
4    84.5  101.9  220.9  436.8    63.0  78.2  169.6  347.9
Experimental Results
Effect of multipole degree (d) on the FMM code on p processors for two different choices of s
[Bar charts: s = 8 (left) and s = 32 (right); x-axis: processors (p = 1, 2, 4, 8); y-axis: time (sec); series d = 1, 2, 4. Data tabulated on the next slide.]
Experimental Results
Effect of multipole degree (d) on the FMM code on p processors for two different choices of s

Time (sec):
d    s = 8                        s = 32
     p=1    p=2    p=4    p=8     p=1    p=2    p=4    p=8
1    18.3   25.7   34.5   59.2    12.7   13.9   40.4   94.4
2    62.5   72.5   87.5   131.3   25.3   26.6   58.0   126.3
4    398.2  431.4  470.9  683.3   110.8  113.4  165.7  277.8
Experimental Results …
Parallel efficiency of the extraction codes for different choices of d
[Bar charts: FMM code (left) and HMM code (right); x-axis: processors (p = 1, 2, 4, 8); y-axis: efficiency; series d = 1, 2, 4. Data tabulated on the next slide.]
Experimental Results
Parallel efficiency of the extraction codes for different choices of d

d    FMM code                  HMM code
     p=1   p=2   p=4   p=8     p=1   p=2   p=4   p=8
1    0.99  0.93  0.94  0.86    0.98  0.74  0.87  0.87
2    1.00  0.92  0.90  0.92    0.99  0.86  0.97  0.98
4    1.00  0.98  0.93  0.94    1.00  0.93  1.04  0.98
Experimental Results …
Ratio of execution time of FMM to HMM code on p processors for different choices of d
[Bar charts: s = 8 (left) and s = 32 (right); x-axis: processors (p = 1, 2, 4, 8); y-axis: ratio of FMM time to HMM time; series d = 1, 2, 4. Data tabulated on the next slide.]
Experimental Results
Ratio of execution time of FMM to HMM code on p processors for different choices of d

d    s = 8                  s = 32
     p=1  p=2  p=4  p=8     p=1  p=2  p=4  p=8
1    0.9  1.0  0.7  0.6     0.6  0.6  0.8  1.0
2    1.7  1.6  0.9  0.7     0.8  0.7  0.7  0.8
4    4.7  4.2  2.1  1.6     1.8  1.4  1.0  0.8
Concluding Remarks
• FMM execution time: O((d+1)^4 N)
• HMM execution time: O((d+1)^2 N lg N)
• For HMM, increasing the MAC parameter (α) increases both the time and the accuracy of the matrix-vector product
• FMM achieves higher parallel efficiency for large d
• When the number of particles per leaf node (s) is small, HMM outperforms FMM in execution time
• The parallel implementation, ParIS, is scalable and achieves high parallel efficiency
Thank You !!