a fast multifrontal solver for dei - university of …geppo/papers/iccs04-poster.pdf · a fast...

1
A Fast Multifrontal Solver for Non-Linear Multi-Physics Problems [A. Bertoldo, M. Bianco, G. Pucci] {cyberto, bianco1, geppo} @dei.unipd.it ACG ADVANCED COMPUTING GROUP DEI - UNIVERSITY OF PADOVA The multifrontal approach Unifrontal Sequential assembly strategy Recursive assembly strategy in a bottom-up fashion Multifrontal June 6-9, 2004 Krakow, Poland Regions Mesh region E An elementary region A composite region At each node: 1) Assemble the region 2) Eliminate fully-summed variables The assembly tree Leaves: The assembly phase gets elements from the FEM formulation At the root we have fully decomposed the linear system into LU factors Strip phase III. Strip factors away for subsequent use in the final backward and forward substitution Elimination phase II. Eliminate FS variables using a blocked UL decomposition BLAS Level 3 0.059 0.6279 1.154 7.136 0.06 0.99 2.072 23.264 0.114 0.6802 1.357 12.599 0.0539 1.062 4.584 33.194 0 5 10 15 20 25 30 35 100 400 625 2500 Number of elements Execution times (s) PMMS Unifrontal MUMPS SuperLU 312 494 518 666 325 584 642 846 450 553 606 860 74 284 317 354 0 100 200 300 400 500 600 700 800 900 1000 100 400 625 2500 Number of elements Flops rate (MFLOPS) PMMS Unifrontal MUMPS SuperLU Execution time Flops PMMS is faster than Unifrontal solver, but they have the same solving kernel The multifrontal assembly scheme is more efficient PMMS is faster than both SuperLU and MUMPS for all significant problem sizes The super-assembly phase and the use of BLAS boosts the computation The larger is the problem size, the faster is PMMS with respect to the other solvers For larger test cases we expect a bigger performance improvement MUMPS and Unifrontal solver exhibit larger flop rates than PMMS does They are better tuned but their algorithm has higher complexity Performance Results Algorithm of super-assembly phase MIUR Center for Science and Application of Advanced Computing Paradigms at the University of Padova, Italy Consorzio Roma Ricerche Roma, Italy International Centre for Mechanical Sciences, Udine, Italy Ente per le Nuove tecnologie, l’Energia e l’Ambiente, Roma, Italy Department of Information Engineering University of Padova, Italy ? IBM Power3@375MHz with 4 GB mem ? HPM Toolkit for performance measurements ? FE square meshes with 100, 400, 625, and 2500 square 8-node elements ? PMMS is our multifrontal solver ? SuperLU version 3.0 ? MUMPS version 4.3 Assembly phase Ia. Merge the two reduced components into a new composite region Swap phase Ib. Pack FS rows and columns at the bottom-right corner of the non-reduced region Copy phase Ic. Copy FS blocks into temporary buffers + + = Super-assembly phase Computation properties Symbolic analysis Symbolic data These data are computed once at the beginning of the computation, but used at each iteration to perform the super-assembly phase. Assembly tree topology Finite element mesh data Simulation of porous media under high temperature Area of interest Non-linear coupled multi-physics problem Large non-linear systems of PDEs LARGE LINEAR SYSTEMS FEM Physical model Mathematical model Linearization Our Goal Main features of our solver Other specific features: F Multifrontal assembly/elimination strategy F Implicit minimum degree pivoting F Symbolic preprocessing phase F Super-assembly phase F Blocked LU decomposition for elimination + Frontal solvers are direct methods since they first transform the system using Gaussian elimination or LU decomposition, then get the final solution using forward and backward substitution. + They do not operate on the completely assembled linear system, but rather interleave assembly phases with elimination phases. + They require low memory space and can exploit efficient dense linear algebra kernels. Regions can be both elementary and composite. The former are obtained from the finite element formulation. The latter are unions of two component regions from an assembly phase. The “spalling” phenomenon in a concrete-made pillar after a simulated fire. The FE simulation of the “spalling” phenomenon in a (section of) concrete-made pillar in case of fire

Upload: lyngoc

Post on 17-May-2018

214 views

Category:

Documents


0 download

TRANSCRIPT

A Fast Multifrontal Solver for Non-Linear Multi-Physics Problems

[A. Bertoldo, M. Bianco, G. Pucci] {cyberto, bianco1, geppo} @dei.unipd.it

ACGADVANCED COMPUTING GROUP

DEI - UNIVERSITY OF PADOVA

The multifrontal approachUnifrontal

Sequential assemblystrategy

Recursive assembly strategyin a bottom-up fashion

Multifrontal

June 6-9, 2004Krakow, Poland

Regions

Mesh region E

An elementaryregion

A compositeregion

At each node:1) Assemble the region2) Eliminate fully-summed variables

The assembly tree

Leaves: The assembly phase gets elements from the FEM formulation

At the root we havefully decomposed the linear system into LU factors

Strip phaseIII.Strip factors away for

subsequent use in the finalbackward and forward substitution

Elimination phaseII.Eliminate FS variables

using a blocked UL decomposition

BLASLevel 3

0.0

59

0.6

27

9

1.1

54

7.1

36

0.0

6

0.9

9

2.0

72

23

.26

4

0.1

14

0.6

80

2

1.3

57

12

.59

9

0.0

53

9

1.0

62 4.5

84

33

.19

4

0

5

10

15

20

25

30

35

100 400 625 2500

Number of elements

Ex

ec

uti

on

tim

es

(s

)

PMMS

Unifrontal

MUMPS

SuperLU

31

2

49

4

51

8

66

6

32

5

58

4 64

2

84

6

45

0

55

3 60

6

86

0

74

28

4

31

7 35

4

0

100

200

300

400

500

600

700

800

900

1000

100 400 625 2500

Number of elements

Flo

ps

ra

te (

MF

LO

PS

)

PMMS

Unifrontal

MUMPS

SuperLU

Execution time Flops

PMMS is faster than Unifrontal solver,but they have the same solving kernel

The multifrontal assembly scheme is more efficient

PMMS is faster than both SuperLU andMUMPS for all significant problem sizes

The super-assembly phase and theuse of BLAS boosts the computation

The larger is the problem size, the faster is PMMS with respect to the other solvers

For larger test cases we expect abigger performance improvement

MUMPS and Unifrontal solver exhibitlarger flop rates than PMMS does

They are better tuned but theiralgorithm has higher complexity

Performance ResultsAlgorithm of super-assembly phase

MIUR Center for Science and Applicationof Advanced Computing Paradigms at the University of Padova, Italy

Consorzio Roma RicercheRoma, ItalyInternational Centre

for Mechanical Sciences,Udine, Italy

Ente per le Nuove tecnologie,l’Energia e l’Ambiente, Roma, Italy

Department of InformationEngineering University of Padova, Italy

? IBM Power3@375MHz with 4 GB mem? HPM Toolkit for performance measurements? FE square meshes with 100, 400, 625, and 2500 square

8-node elements

? PMMS is our multifrontal solver? SuperLU version 3.0? MUMPS version 4.3

Assembly phaseIa.

Merge the two reduced components into a new

composite region

Swap phaseIb.

Pack FS rows and columnsat the bottom-right cornerof the non-reduced region

Copy phaseIc.

Copy FS blocks intotemporary buffers

+ +

= Super-assembly phase

Computation properties

Symbolicanalysis

Symbolic dataThese data are computed once at the beginning of

the computation, but used at each iteration to performthe super-assembly phase.

Assembly treetopology

Finite elementmesh data

Simulation of porous mediaunder high temperature

Area of interest

Non-linear coupledmulti-physics problem

Large non-linearsystems of PDEs

LARGE LINEAR SYSTEMS

FEM

Physical model

Mathematical model

Linearization

Our Goal

Main features of our solver

Other specific features:F Multifrontal assembly/elimination strategyF Implicit minimum degree pivotingF Symbolic preprocessing phaseF Super-assembly phaseF Blocked LU decomposition for elimination

+Frontal solvers are direct methods since they first transform the system using Gaussian elimination or LU decomposition, then get the final solution using forward and backward substitution.+They do not operate on the completely

assembled linear system, but rather interleave assembly phases with elimination phases.+They require low memory space and can exploit

efficient dense linear algebra kernels.

Regions can be both elementary and composite. The former are obtained from the finite element formulation. The latter are unions of two component regions from an assembly phase.

The “spalling” phenomenon in a concrete-made pillar

after a simulated fire.

The FE simulation of the “spalling”phenomenon in a (section of)

concrete-made pillar in case of fire