Parallel Performance of Hierarchical Multipole Algorithms for
Inductance Extraction
Ananth Grama, Purdue University
Vivek Sarin, Texas A&M University
Hemant Mahawar, Texas A&M University
Acknowledgements: National Science Foundation.
Outline
• Inductance Extraction
• Underlying Linear System
• The Solenoidal Basis Method
• Hierarchical Algorithms
• Parallel Formulations
• Experimental Results
Inductance Extraction
• Inductance
  • Property of an electric circuit to oppose a change in its current; an electromotive force (emf) is induced
  • Self inductance, and mutual inductance between conductors
• Inductance extraction
  • Signal delays in circuits depend on parasitic R, L, C
  • At high frequency, signal delays are dominated by parasitic inductance
  • Requires accurate estimation of inductive coupling between circuit components
(Image credit: oea.com)
Inductance Extraction …
• Inductance extraction
  • For a set of s conductors, compute the s × s impedance matrix Z
  • Z captures the self and mutual impedance among the conductors
  • Conductors are discretized with a uniform two-dimensional mesh for accurate impedance calculation
Constraints
• Current density at a point:
  \[ \frac{J(\mathbf{r})}{\sigma} + \frac{j\omega\mu}{4\pi} \int_{V'} \frac{J(\mathbf{r}')}{|\mathbf{r} - \mathbf{r}'|} \, dV' = -\nabla \Phi(\mathbf{r}) \]
• Voltage drop across filaments in terms of filament current and voltage:
  \[ (R + j\omega L)\, I_f = V_f \]
• Kirchhoff's current law at the nodes:
  \[ B^T I_f = I_s \]
• Potential difference in terms of node voltages:
  \[ V_f = B\, V_n \]
• Inductance matrix, a function of 1/r:
  \[ L_{kl} = \frac{\mu}{4\pi\, a_k a_l} \int_{V_k} \int_{V_l} \frac{\mathbf{u}_k \cdot \mathbf{u}_l}{|\mathbf{r}_k - \mathbf{r}_l|} \, dV_k \, dV_l \]
Linear System
• System matrix:
  \[ \begin{bmatrix} R + j\omega L & -B \\ B^T & 0 \end{bmatrix} \begin{bmatrix} I_f \\ V_n \end{bmatrix} = \begin{bmatrix} 0 \\ I_s \end{bmatrix} \]
• Characteristics: R is diagonal, B is sparse, L is dense (illustrated in the sketch below)
• Solution method
  • Iterative methods (GMRES); dense matrix-vector product with L
  • Hierarchical methods, matrix-free approach
• Challenge: effective preconditioning in the absence of an explicitly formed system matrix
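To make the block structure concrete, here is a minimal assembly sketch (not from the talk): a hypothetical four-filament loop with illustrative resistances and a 1/r-style stand-in for the dense inductance matrix.

```python
import numpy as np

# Hypothetical toy problem: 4 filaments forming a closed loop through 4 nodes.
f, n = 4, 4
omega = 2 * np.pi * 1e9                       # assumed operating frequency
R = np.diag(np.full(f, 0.1))                  # diagonal filament resistances
# Stand-in for the dense inductance matrix with 1/r-like decay:
Lmat = 1.0 / (np.abs(np.subtract.outer(np.arange(f), np.arange(f))) + 1.0)
B = np.zeros((f, n))                          # sparse filament-node incidence
for k in range(f):                            # filament k runs node k -> node (k+1) % n
    B[k, k], B[k, (k + 1) % n] = -1.0, 1.0

# System matrix: R diagonal, B sparse, L dense, as noted above.
A = np.block([[R + 1j * omega * Lmat, -B],
              [B.T, np.zeros((n, n))]])
rhs = np.concatenate([np.zeros(f), [1.0, -1.0, 0.0, 0.0]])  # I_s: unit current in/out
print(A.shape)                                # (8, 8): f + n unknowns
```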
Solenoidal Basis Method
• Linear system with modified right-hand side:
  \[ \begin{bmatrix} R + j\omega L & -B \\ B^T & 0 \end{bmatrix} \begin{bmatrix} I_f \\ V_n \end{bmatrix} = \begin{bmatrix} F \\ 0 \end{bmatrix} \]
• Solenoidal basis
  • Automatically satisfies the conservation law (Kirchhoff's current law)
  • Mesh currents form a basis for the filament currents
  • Solenoidal basis matrix P satisfies B^T P = 0 (see the construction sketched below)
  • With I_f = P x, the current obeys Kirchhoff's law: B^T I_f = B^T P x = 0
• Reduced system:
  \[ P^T (R + j\omega L) P\, x = P^T F \]
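As an illustration of how such a basis can be built, the following sketch (a reconstruction under stated assumptions, not ParIS code) assembles B and P for an m × m grid of nodes, one unit loop current per mesh cell, and checks that B^T P = 0.

```python
import numpy as np

def grid_incidence_and_basis(m):
    """Filament-node incidence B and solenoidal basis P for an m x m node grid."""
    n_h = m * (m - 1)                 # horizontal filaments: (i,j) -> (i,j+1)
    n_f = n_h + (m - 1) * m           # plus vertical filaments: (i,j) -> (i+1,j)

    def node(i, j): return i * m + j
    def h(i, j): return i * (m - 1) + j
    def v(i, j): return n_h + i * m + j

    B = np.zeros((n_f, m * m))
    for i in range(m):
        for j in range(m - 1):
            B[h(i, j), node(i, j)], B[h(i, j), node(i, j + 1)] = -1.0, 1.0
    for i in range(m - 1):
        for j in range(m):
            B[v(i, j), node(i, j)], B[v(i, j), node(i + 1, j)] = -1.0, 1.0

    # One basis vector per mesh cell: a unit current circulating around the cell.
    P = np.zeros((n_f, (m - 1) ** 2))
    for i in range(m - 1):
        for j in range(m - 1):
            c = i * (m - 1) + j
            P[h(i, j), c], P[v(i, j + 1), c] = 1.0, 1.0     # top, right: with orientation
            P[h(i + 1, j), c], P[v(i, j), c] = -1.0, -1.0   # bottom, left: against
    return B, P

B, P = grid_incidence_and_basis(5)
assert np.allclose(B.T @ P, 0.0)      # any I_f = P x satisfies Kirchhoff's current law
```

For m = 33 this construction yields the 2,112 filaments and 1,024 mesh-current functions listed on the next slide.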
Problem Size
Number of unknowns for the ground plane problem. For an m × m mesh of nodes there are m² potential nodes, 2m(m−1) current filaments, m² + 2m(m−1) unknowns in the linear system, and (m−1)² solenoidal (mesh-current) functions; a quick check appears below.

Mesh      Potential  Current    Linear    Solenoidal
          Nodes      Filaments  System    Functions
33x33     1,089      2,112      3,201     1,024
65x65     4,225      8,320      12,545    4,096
129x129   16,641     33,024     49,665    16,384
257x257   66,049     131,584    197,633   65,536
513x513   263,169    525,312    788,481   262,144
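A trivial script reproducing the table from these formulas:

```python
# Print mesh size, nodes, filaments, linear-system unknowns, solenoidal functions.
for m in (33, 65, 129, 257, 513):
    nodes, filaments = m * m, 2 * m * (m - 1)
    print(m, nodes, filaments, nodes + filaments, (m - 1) ** 2)
```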
Hierarchical Methods
• Matrix-vector product with an n × n matrix costs O(n^2)
• Faster matrix-vector products via a matrix-free approach
  • Appel's algorithm, Barnes-Hut method: particle-cluster interactions, O(n lg n)
  • Fast Multipole Method: cluster-cluster interactions, O(n)
• Hierarchical refinement of the underlying domain: quad-tree in 2-D, oct-tree in 3-D
• Rely on decaying 1/r kernel functions (see the toy example below)
• Compute an approximate matrix-vector product at the cost of some accuracy
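A toy example (not from the talk) of the decaying-kernel idea: the 1/r sums over a distant cluster are well approximated by even a degree-0 (single-point) expansion.

```python
import numpy as np

rng = np.random.default_rng(0)
cluster = rng.random((100, 2))           # source points in the unit box
weights = rng.random(100)
target = np.array([10.0, 10.0])          # far-away evaluation point

# Direct 1/r sum: O(n) per target, O(n^2) over all pairs.
direct = np.sum(weights / np.linalg.norm(cluster - target, axis=1))

# Degree-0 multipole: replace the cluster by one weighted point at its center.
center = np.average(cluster, axis=0, weights=weights)
approx = weights.sum() / np.linalg.norm(center - target)

print(direct, approx)    # close, because the cluster is small relative to the distance
```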
Hierarchical Methods …
• Fast Multipole Method (FMM)
  • Divides the domain recursively into eight sub-domains
  • Up-traversal: computes multipole coefficients that give the effect of all points inside a node at a far-away point
  • Down-traversal: computes local coefficients that gather the effect of all far-away points at the points inside a node
  • Direct interactions for nearby points
  • Computational complexity: O((d+1)^4 N), where d is the multipole degree
Hierarchical Methods …
• Hierarchical Multipole Method (HMM)
  • An augmented Barnes-Hut method, or a variant of FMM
  • Up-traversal: same as FMM
  • For each particle:
    • Multipole acceptance criterion (MAC): the ratio of the particle's distance from the center of a box to the dimension of the box
    • The MAC determines whether the box's multipole coefficients may be used to obtain the effect of all its far-away points (a minimal version of this test is sketched below)
  • Direct interactions for nearby points
  • Computational complexity: O((d+1)^2 N lg N)
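A minimal sketch of the acceptance test; the function name and the sample boxes are hypothetical. The threshold direction follows the timings reported later in the talk, where a larger α means more direct interactions (slower but more accurate).

```python
import numpy as np

def use_multipole(particle, box_center, box_size, alpha):
    """MAC: approximate a box by its multipole expansion only when the particle
    is at least alpha box-widths away from the box center."""
    dist = np.linalg.norm(np.asarray(particle) - np.asarray(box_center))
    return dist / box_size > alpha

# Larger alpha forces more direct interactions: slower but more accurate.
print(use_multipole([5.0, 5.0], [0.5, 0.5], box_size=1.0, alpha=1.5))   # True: far away
print(use_multipole([1.2, 0.5], [0.5, 0.5], box_size=1.0, alpha=1.5))   # False: too close
```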
ParIS: Parallel Solver
• Application: inductance extraction
• Solves the reduced system with a preconditioned iterative method
• Iterative method: GMRES
  • Dense matrix-vector products with the preconditioner and the coefficient matrix
  • The dense matrix-vector product dominates the computational cost of the algorithm
  • Hierarchical methods are used to compute the potential (the inductive effect) on the filaments
• Vector inner products: negligible computation and communication cost
• Reduced system (a matrix-free version is sketched below):
  \[ P^T (R + j\omega L) P\, x = P^T F \]
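A minimal matrix-free sketch of this solve, with synthetic stand-ins for R, L, and P; in ParIS the dense product with L would be performed by the FMM/HMM evaluation rather than by an explicit matrix.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

f, k = 40, 16                              # filaments and mesh-current unknowns (toy sizes)
omega = 2 * np.pi * 1e9
R = np.full(f, 0.1)                        # diagonal of R
Lmat = 1.0 / (np.abs(np.subtract.outer(np.arange(f), np.arange(f))) + 1.0)
P = np.random.default_rng(1).standard_normal((f, k))   # stand-in for the solenoidal basis

def reduced_matvec(x):
    v = P @ x
    # The dense product Lmat @ v dominates the cost; hierarchical methods
    # replace it with an approximate O(N) or O(N lg N) evaluation.
    return P.T @ (R * v + 1j * omega * (Lmat @ v))

A = LinearOperator((k, k), matvec=reduced_matvec, dtype=complex)
x, info = gmres(A, P.T @ np.ones(f, dtype=complex))
print(info)                                # 0 on convergence
```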
Parallelization Scheme
• Two-tier parallelization
  • Each conductor has its own filaments and associated oct-tree
  • Conductors are distributed across MPI processes (see the sketch below)
  • Within a conductor, OpenMP threads are used
• Pruning of the tree to obtain sub-trees
  • Computation at the top few levels of the tree is sequential
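A skeletal sketch of the first tier, assuming mpi4py is available; traverse_conductor_tree is a hypothetical stand-in for the per-conductor tree computation, which in ParIS forms the second, OpenMP-threaded tier.

```python
from mpi4py import MPI

def traverse_conductor_tree(conductor_id):
    # Hypothetical stand-in for the per-conductor oct-tree traversal;
    # in ParIS this second tier is parallelized with OpenMP threads.
    return float(conductor_id)

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

conductors = list(range(8))                    # hypothetical conductor ids
mine = conductors[rank::size]                  # tier 1: conductors across MPI processes
local = sum(traverse_conductor_tree(c) for c in mine)
total = comm.allreduce(local, op=MPI.SUM)      # combine partial results
```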
Experiments
• Experiments on the interconnect crossover problem
  • Conductors 2 cm long, 2 mm wide
  • Distance between conductors: 0.3 mm within a layer, 3 mm across layers
  • Non-uniform distribution of conductors
• Comparison between FMM and HMM
• Parallel platform: Beowulf cluster at Texas A&M University
  • 64-bit AMD Opteron, 1.4 GHz, 128 dual-processor nodes, Gigabit Ethernet
  • LAM/MPI on SuSE Linux with GNU compilers
Cross-Over Interconnects
[Figure: the cross-over interconnect test structure]
Parameters
• d: multipole degree
• α: multipole acceptance criterion
• s: number of particles per leaf node of the tree
• Since d and α influence the accuracy of the matrix-vector product, impedance errors are kept comparable – within 1% of a reference value computed by FMM with d = 8
• Scaled efficiency E = BOPS / p
  • BOPS: average number of base operations per second
  • p: number of processors used
Experimental Results
Effect of multipole degree (d) for different choices of s
[Bar charts: FMM code (left) and HMM code (right); x-axis: number of particles per leaf node (s = 2, 8, 32, 128); y-axis: time (sec); series d = 1, 2, 4. Data tabulated on the next slide.]
Experimental Results
Effect of multipole degree (d) for different choices of s

Time (sec):
d    FMM code                        HMM code
     s=2     s=8    s=32   s=128     s=2    s=8   s=32  s=128
1    49.5    18.3   12.7   29.9      25.7   21.5  21.3  34.8
2    225.8   62.5   25.3   32.8      46.8   36.5  31.3  41.9
4    1513.3  398.2  110.8  50.7      110.8  84.5  63.0  61.9
Experimental Results
Effect of MAC on HMM for different choices of s and d
[Bar charts: varying s (left; x-axis: number of particles per leaf node, s = 2, 8, 32) and varying d (right; x-axis: multipole degree, d = 1, 2, 4); y-axis: time (sec); series α = 1, 1.5. Data tabulated on the next slide.]
Experimental Results …
Effect of MAC on HMM for different choices of s and d

Time (sec), varying d (s = 8):
α     d=1   d=2   d=4
1     21.5  36.5  84.5
1.5   40.1  70.6  158.2

Time (sec), varying s (d = 2):
α     s=2   s=8   s=32
1     46.8  36.5  31.3
1.5   89.3  70.6  59.5
Experimental Results
Effect of multipole degree (d) on the HMM code on p processors for two different choices of s
[Bar charts: s = 8 (left) and s = 32 (right); x-axis: processors (p = 1, 2, 4, 8); y-axis: time (sec); series d = 1, 2, 4. Data tabulated on the next slide.]
Experimental Results
Effect of multipole degree (d) on the HMM code on p processors for two different choices of s

Time (sec):
d    s = 8                        s = 32
     p=1   p=2    p=4    p=8      p=1   p=2   p=4    p=8
1    21.5  26.5   50.9   105.8    21.3  24.4  48.8   94.1
2    36.5  46.5   96.5   184.3    31.3  38.3  77.9   157.5
4    84.5  101.9  220.9  436.8    63.0  78.2  169.6  347.9
Experimental Results
Effect of multipole degree (d) on the FMM code on p processors for two different choices of s
[Bar charts: s = 8 (left) and s = 32 (right); x-axis: processors (p = 1, 2, 4, 8); y-axis: time (sec); series d = 1, 2, 4. Data tabulated on the next slide.]
Experimental Results
Effect of multipole degree (d) on the FMM code on p processors for two different choices of s

Time (sec):
d    s = 8                        s = 32
     p=1    p=2    p=4    p=8     p=1    p=2    p=4    p=8
1    18.3   25.7   34.5   59.2    12.7   13.9   40.4   94.4
2    62.5   72.5   87.5   131.3   25.3   26.6   58.0   126.3
4    398.2  431.4  470.9  683.3   110.8  113.4  165.7  277.8
Experimental Results …
Parallel efficiency of the extraction codes for different choices of d
[Bar charts: FMM code (left) and HMM code (right); x-axis: processors (p = 1, 2, 4, 8); y-axis: efficiency; series d = 1, 2, 4. Data tabulated on the next slide.]
Experimental Results
Parallel efficiency of the extraction codes for different choices of d

d    FMM code                  HMM code
     p=1   p=2   p=4   p=8     p=1   p=2   p=4   p=8
1    0.99  0.93  0.94  0.86    0.98  0.74  0.87  0.87
2    1.00  0.92  0.90  0.92    0.99  0.86  0.97  0.98
4    1.00  0.98  0.93  0.94    1.00  0.93  1.04  0.98
Experimental Results …
Ratio of execution time of FMM to HMM code on p processors for different choices of d
[Bar charts: s = 8 (left) and s = 32 (right); x-axis: processors (p = 1, 2, 4, 8); y-axis: ratio of FMM time to HMM time; series d = 1, 2, 4. Data tabulated on the next slide.]
Experimental Results
Ratio of execution time of FMM to HMM code on p processors for different choices of d

d    s = 8                  s = 32
     p=1  p=2  p=4  p=8     p=1  p=2  p=4  p=8
1    0.9  1.0  0.7  0.6     0.6  0.6  0.8  1.0
2    1.7  1.6  0.9  0.7     0.8  0.7  0.7  0.8
4    4.7  4.2  2.1  1.6     1.8  1.4  1.0  0.8
Concluding Remarks
• FMM execution time: O((d+1)^4 N)
• HMM execution time: O((d+1)^2 N lg N)
• For HMM, increasing the MAC parameter (α) increases both the time and the accuracy of the matrix-vector product
• FMM achieves higher parallel efficiency for large d
• When the number of particles per leaf node (s) is small, HMM outperforms FMM in execution time
• The parallel implementation, ParIS, is scalable and achieves high parallel efficiency
Thank You !!