parallel computing with elmer

Elmer – Open source finite element software for multiphysical problems

Elmer

Parallel computing with Elmer

Elmer Team CSC – IT Center for Science Ltd. Elmer Course CSC, 9-10.1.2012

Outline

Parallel computing concepts

Parallel computing with Elmer– Preprocessing with ElmerGrid

– Parallel solver ElmerSolver_mpi

– Postprocessing with ElmerGrid and ElmerPost

Introductory example: Flow through pipe junction


Parallel computation means executing tasks concurrently.− A task encapsulates a sequential program and local data, and its interface

to its environment.

– Data of those other tasks is remote.

Data dependency means that the computation of one task requires data from an another task in order to proceed.

Parallel computers

Shared memory

− All cores can access the whole memory.

Distributed memory

– All cores have their own memory.

– Communication between cores is needed in order to access the memory of other cores.

Current supercomputers combine the distributed and shared memory approaches.

Parallel programming models

Message passing (OpenMPI)– Can be used both in distributed and shared memory computers.

– Programming model allows good parallel scalability.

– Programming is quite explicit.

Threads (pthreads, OpenMP)– Can be used only in shared memory computers.

– Limited parallel scalability.

– Simpler or less explicit programming.

Execution model

Parallel program is launched as a set of independent, identical processes– The same program code and instructions.

Can reside in different computation nodes.

Or even in different computers.

Current CPU's in your workstations− Six cores (AMD Opteron Shanghai)

Multi-threading

− For example, OpenMP

High performance Computing (HPC)

− Message passing, for example OpenMPI

General remarks about parallel computing

The size of the problem remains constant.

Execution time decreases in proportion to the increase in the number of cores.

Strong parallel scaling

Increasing the size of the problem.

Execution time remains constant, when number of cores increases in proportion to the problem size.

Weak parallel scaling

Domain decomposition.

Additional pre-processing step called mesh partitioning.

Every domain is running its own ElmerSolver.

ElmerSolver_mpi parallel process communication– MPI

Recombination of results to ElmerPost output.


Only selected linear algebra methods are available in parallel.

– Iterative solvers (Krylov subspace methods) are basically the same.

– From direct solvers, only MUMPS exists in parallel (uses OpenMP)

– With the additional HYPRE package, some linear methods are available only in parallel.


Preconditioners required by iterative methods are not the same as serial.

– For example, ILUn.

– May deteriorate parallel performance.

Diagonal preconditioner is the same as parallel and hence exhibits exactly the same behavior in parallel as in serial.


Preprocessing− Developing of automated

meshing algorithms.

Postprocessing

− Parallel postprocessing

− Paraview

Grand challenges

Parallel workflow

Scaling of wall clock time with dofs in the cavity lid case using GMRES+ILU0. Simulation by Juha Ruokolainen, CSC and visualization by Matti Gröhn, CSC .

Examples of parallel scaling of Elmer

Examples of parallel scaling of Elmer

0 16 32 48 64 80 96 112 128 1440

500

1000

1500

2000

2500

3000

3500

4000

Cores

Tim

e /

s

Serial mesh structure of Elmer− Header file contains general dimensions: mesh.header.

− Node file contains coordinate and ownership of nodes: mesh.nodes.

− Elements file contains compositions of bulk elements and ownerships (bodies): mesh.elements.

− Boundary file contains compositions of elements and ownerships (boundaries) and dependencies (parents) boundary elements: mesh.boundary.

Parallel preprocessing with Elmer

Parallel mesh structure: /partitioning.n/− Header file: part.1.header, part.2.header, … part.n.header

− Nodes: part.1.nodes, part.2.nodes, … part.n.nodes

− Elements: part.1.elements, part.2.elements, … part.n.elements

− Boundary elements: part.1.boundary, part.2.boundary, … part.n.boundary

− Shared nodes between partitions: part.1.shared, part.2.shared, … part.n.shared


The best way to partition− Serial mesh → ElmerGrid → parallel mesh

General syntax

− ElmerGrid 2 2 existing.mesh [partoption]

Two principal partitioning techniques

− Along cartesian axis (simple geometries or topologies)

− METIS library


Minimizes communication between computation nodes.

Minimizes the relative number of mesh elements between computation nodes.

Equal load between computation nodes.

Ideal mesh partitioning

Directional decomposition− ElmerGrid 2 2 dir -partition Nx Ny Nz F

-partition 2 2 1 0 -partition 2 2 1 1

element-wisenodal


Directional decomposition− ElmerGrid 2 2 dir -partition Nx Ny Nz F -partoder nx ny nz

− Defines the ordering direction (components of a vector).


Using METIS library− ElmerGrid 2 2 dir -metis N Method

-metis 4 0

PartMeshDualPartMeshNodal

-metis 4 1


METIS− ElmerGrid 2 2 dir -metis N Method

-metis 4 2

PartGraphKwayPartGraphRecursive

-metis 4 3


METIS ElmerGrid 2 2 dir -metis N Method

-metis 4 4

PartGraphPKway


Halo-elements− ElmerGrid 2 2 dir -metis N Method

-halo

− Necessary, if using discontinuous Galerkin method.

− Puts “ghost cell” on each side of the partition boundary.


More parallel options in ElmerGrid -indirect: creates indirect connections. -periodic Fx Fy Fz: declares the periodic coordinate directions for

parallel meshes. -partoptim: aggressive optimization to node sharing. -partbw minimize the bandwidth of partiotion-partition couplings.


Mpirun -np N ElmerSolver_mpi.

Might change on other platforms.

Might need a hostfile.

Needs an N-partitioned mesh.

Needs ELMERSOLVER_STARTINFO, which contains the name of the command file.

Optional libraries− HYPRE

− MUMPS

Parallel version of ElmerSolver

Different behaviour of ILU preconditioner.− Not available parts at partition

boundaries.

− Sometimes works. If not,use HYPRE

Linear System Use Hypre = Logical True


Alternative preconditioners in HYPRE− ParaSails (sparse approximate inverse preconditioner)

Linear System Preconditioning = String “ParaSails”

− BoomerAMG (Algebraic multigrid)

Linear System Preconditioning = String “BoomerAMG”


Alternative solvers− BoomerAMG (Algebraic Multigrid)

Linear System Solver = “Iterative”

Linear System Iterative Method = “BoomerAMG”

− Multifrontal parallel direct solver (MUMPS)

Linear System Solver = “Direct”

Linear System Direct Method = Mumps


Elmer writes results in partitionwise.name.0.ep, name.1.ep, ..., name.(n-1).ep

ElmerPost fuses resultfiles into one.

Elmergrid 15 3 name

− Fuses all time steps (also non-existing) into a single file called name.ep (overwritten, if exists).

Special option for only partial fuse

-saveinterval start end step

− The first, last and the (time)step of fusing parallel data.

Parallel postprocessing

Introductory example: Flow through a pipe junction

Flow through pipe junction.Boundary conditions:

1. vin

= 1 cm/s

2. None

3. vin = 1 cm/s

4. and 5. no-slip (u

i = 0 m/s) on walls.

Description of the problem

1

3

2

5

4

Partition of mesh with ElmerGrid.ElmerGrid 2 2 mesh -out part_mesh -scale

0.01 0.01 0.01 -metis 4 2

Scales the problem from cm → m.

Creates a mesh with 4 partitions by using PartGraphRecursive option of METIS-library.

Preprocessing

Header

Mesh DB "." "flow"

End

Simulation

Coordinate System = "Cartesian 3D"

Simulation Type ="Steady"

Output Intervals = Integer 1

Post File = File "parallel_flow.ep"

Output File = File "parallel_flow.result"

max output level = Integer 4

End

Solver input file

Solver 1

Equation = "Navier-Stokes"

Optimize Bandwidth = Logical True

Linear System Solver = Iterative

Linear System Direct Solver = Mumps

Stabilization Method = Stabilized

Nonlinear System Convergence Tolerance = Real 1.0E-03

Nonlinear System Max Iterations = Integer 30

Nonlinear System Newton After Iterations = Integer 1

Nonlinear System Newton After Tolerance = Real 1.0E-03

End

Solver input file

Body 1

Name = "fluid"

Equation = 1

Material = 1

Body Force = 1

Initial Condition = 1

End

Equation 1

Active Solvers(1) = 1

Convection = Computed

End

Solver input file

Initial Condition 1

Velocity 1 = Real 0.0



Pressure = Real 0.0

End

Body Force 1

Flow BodyForce 1 = Real 0.0



End

Solver input file

Material 1

Density = Real 1000.0

Viscosity = Real 1.0

End

Boundary Condition 1

Name = "largeinflow"

Target Boundaries = 1

Normal-Tangential Velocity = True

Velocity 1 = Real -0.01



End

Solver input file

Outward pointing normal


Name = "largeoutflow"





End


Name = "smallinflow"



Velocity 1 = Real -0.01



End

Solver input file

2

3


Name = "pipewalls"

Target Boundaries(2) = 4 5

Normal-Tangential Velocity = False




End

Solver input file

4

5

Save the sif-file with name parallel_flow.sif

Write the name of the sif-file into ELMERSOLVER_STARTINFO

Commands for vuori.csc.fi:

Parallel run

module switch PrgEnv-pgi PrgEnv-gnu

module load elmer/latest

sallocate -n 4 --ntasks-per-node=4 --mem-per-cpu=1000 -t 00:10:00 –p interactive

srun ElmerSolver_mpi

On an usual MPI platform:

mpirun –np 4 ElmerSolver_mpi

Change into the mesh directory.

Run ElmerGrid to combine results.ElmerGrid 15 3 parallel_flow

Launch ElmerPost.

Load parallel_flow.ep.

Combining the results

Adding heat transfer

Solver 2

Exec Solver = Always

Equation = "Heat Equation"

Procedure = "HeatSolve" "HeatSolver"

Steady State Convergence Tolerance = Real 3.0E-03

Nonlinear System Max Iterations = Integer 1

Nonlinear System Convergence Tolerance = Real 1.0e-6

Nonlinear System Newton After Iterations = Integer 1

Nonlinear System Newton After Tolerance = Real 1.0e-2

Linear System Solver = Iterative

Linear System Max Iterations = Integer 500

Linear System Convergence Tolerance = Real 1.0e-6

Stabilization Method = Stabilized


Linear System Use Hypre = Logical True

Linear System Iterative Method = BoomerAMG

BoomerAMG Max Levels = Integer 25

BoomerAMG Coarsen Type = Integer 0

BoomerAMG Num Functions = Integer 1

BoomerAMG Relax Type = Integer 3

BoomerAMG Num Sweeps = Integer 1

BoomerAMG Interpolation Type = Integer 0

BoomerAMG Smooth Type = Integer 6

BoomerAMG Cycle Type = Integer 1

End


Material 1

…

Heat Capacity = Real 1000.0

Heat Conductivity = Real 1.0

End


…

Temperature = Real 10.0

End


…

Temperature = Real 90.0

End

ParaView is an open-source, multi-platform data analysis and visualization application by Kitware.

Distributed under ParaView License Version 1.2

Elmer's ResultOutputSolve module provides output as VTK UnstructuredGrid files.

− vtu

− pvtu (parallel)

Postprocessing with ParaView

Output for ParaView

Solver 3

Equation = "Result Output"

Procedure = "ResultOutputSolve" "ResultOutputSolver"

Exec Solver = After Saving

Output File Name = String "flowtemp"

Output Format = Vtu

Show Variables = Logical True

Scalar Field 1 = Pressure

Scalar Field 2 = Temperature

Vector Field 1 = Velocity

End

ParaView

parallel computing with elmer

Documents

parallel performance

parallel computingthe

parallel uses openmp

good parallel scalability

limited parallel scalability

distributed memory

shared memory computers

number of cores