
Page 1: Parallel Computing With Elmer

Elmer – Open source finite element software for multiphysical problems

Page 2: Parallel Computing With Elmer

Elmer

Parallel computing with Elmer

Elmer Team, CSC – IT Center for Science Ltd.
Elmer Course, CSC, 9-10.1.2012

Page 3: Parallel Computing With Elmer

Outline

Parallel computing concepts

Parallel computing with Elmer
– Preprocessing with ElmerGrid

– Parallel solver ElmerSolver_mpi

– Postprocessing with ElmerGrid and ElmerPost

Introductory example: Flow through a pipe junction

Page 4: Parallel Computing With Elmer

Parallel computing concepts

Page 5: Parallel Computing With Elmer

Parallel computing concepts

Parallel computation means executing tasks concurrently.
– A task encapsulates a sequential program and local data, and its interface to its environment.
– A task can access only its own local data; the data of the other tasks is remote.

Data dependency means that the computation of one task requires data from another task in order to proceed.

Page 6: Parallel Computing With Elmer

Parallel computers

Shared memory

− All cores can access the whole memory.

Distributed memory

– All cores have their own memory.

– Communication between cores is needed in order to access the memory of other cores.

Current supercomputers combine the distributed and shared memory approaches.

Page 7: Parallel Computing With Elmer

Parallel programming models

Message passing (MPI, e.g. OpenMPI)
– Can be used both in distributed and shared memory computers.

– Programming model allows good parallel scalability.

– Programming is quite explicit.

Threads (pthreads, OpenMP)
– Can be used only in shared memory computers.

– Limited parallel scalability.

– Simpler or less explicit programming.

Page 8: Parallel Computing With Elmer

Execution model

A parallel program is launched as a set of independent, identical processes.
– The same program code and instructions.
– The processes can reside in different computation nodes, or even in different computers.

Page 9: Parallel Computing With Elmer

Current CPUs in your workstations
− Six cores (AMD Opteron Shanghai)

Multi-threading
− For example, OpenMP

High Performance Computing (HPC)
− Message passing, for example OpenMPI

General remarks about parallel computing

Page 10: Parallel Computing With Elmer

The size of the problem remains constant.

Execution time decreases in proportion to the increase in the number of cores.

Strong parallel scaling

Page 11: Parallel Computing With Elmer

The size of the problem increases.

Execution time remains constant when the number of cores increases in proportion to the problem size.

Weak parallel scaling
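These two regimes are commonly quantified with the textbook speedup and efficiency measures (standard definitions, not taken from the slides):

$$S(p) = \frac{T(1)}{T(p)}, \qquad E_{\mathrm{strong}}(p) = \frac{S(p)}{p}, \qquad E_{\mathrm{weak}}(p) = \frac{T(1)}{T(p)} \ \text{(problem size scaled with } p\text{)},$$

where $T(p)$ is the wall clock time on $p$ cores. Ideal strong scaling means $E_{\mathrm{strong}} = 1$; ideal weak scaling means the run time stays constant, i.e. $E_{\mathrm{weak}} = 1$.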

Page 12: Parallel Computing With Elmer

Parallel computing with Elmer

Page 13: Parallel Computing With Elmer

Domain decomposition: an additional pre-processing step called mesh partitioning.

Every domain runs its own ElmerSolver process (ElmerSolver_mpi); the parallel processes communicate via MPI.

Recombination of the results into ElmerPost output.

Parallel computing with Elmer

Page 14: Parallel Computing With Elmer

Only selected linear algebra methods are available in parallel.

– Iterative solvers (Krylov subspace methods) are basically the same.

– Of the direct solvers, only MUMPS exists in parallel (it uses MPI).

– The additional HYPRE package provides some linear methods that are available only in parallel.

Parallel computing with Elmer

Page 15: Parallel Computing With Elmer

Preconditioners required by the iterative methods are not the same in parallel as in serial.

– For example, ILUn.

– This may deteriorate parallel performance.

The diagonal preconditioner is the same in parallel and hence exhibits exactly the same behavior in parallel as in serial.

Parallel computing with Elmer

Page 16: Parallel Computing With Elmer

Preprocessing
− Development of automated meshing algorithms.

Postprocessing
− Parallel postprocessing.
− ParaView.

Grand challenges

Page 17: Parallel Computing With Elmer

Parallel workflow

Page 18: Parallel Computing With Elmer

Scaling of wall clock time with DOFs in the cavity lid case using GMRES+ILU0. Simulation by Juha Ruokolainen, CSC, and visualization by Matti Gröhn, CSC.

Examples of parallel scaling of Elmer

Page 19: Parallel Computing With Elmer

Examples of parallel scaling of Elmer

[Figure: wall clock time (s) as a function of the number of cores (0 to 144).]

Page 20: Parallel Computing With Elmer

Serial mesh structure of Elmer
− Header file contains the general dimensions: mesh.header.
− Node file contains the coordinates and ownership of the nodes: mesh.nodes.
− Elements file contains the compositions and ownerships (bodies) of the bulk elements: mesh.elements.
− Boundary file contains the compositions, ownerships (boundaries) and dependencies (parents) of the boundary elements: mesh.boundary.

Parallel preprocessing with Elmer
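As an illustration (a sketch, not taken from the slides), mesh.header of a small tetrahedral mesh lists the counts of nodes, bulk elements and boundary elements, then the number of element types used, and one "type code, count" pair per type (504 = linear tetrahedron, 303 = linear triangle; all numbers below are hypothetical):

  300 1200 240
  2
  504 1200
  303 240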

Page 21: Parallel Computing With Elmer

Parallel mesh structure: partitioning.n/
− Header files: part.1.header, part.2.header, … part.n.header

− Nodes: part.1.nodes, part.2.nodes, … part.n.nodes

− Elements: part.1.elements, part.2.elements, … part.n.elements

− Boundary elements: part.1.boundary, part.2.boundary, … part.n.boundary

− Shared nodes between partitions: part.1.shared, part.2.shared, … part.n.shared

Parallel preprocessing with Elmer

Page 22: Parallel Computing With Elmer

The best way to partition
− Serial mesh → ElmerGrid → parallel mesh

General syntax

− ElmerGrid 2 2 existing.mesh [partoption]

Two principal partitioning techniques

− Along Cartesian axes (simple geometries or topologies)

− METIS library

Parallel preprocessing with Elmer
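For example (a sketch; mymesh is a hypothetical serial mesh directory), a four-way METIS partitioning with the syntax above could be done as

ElmerGrid 2 2 mymesh -metis 4 2

which writes the partitioned mesh into mymesh/partitioning.4/ as part.1.*, …, part.4.* files; with -out another output directory may be given, as on the preprocessing slide of the pipe junction example later on.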

Page 23: Parallel Computing With Elmer

Minimizes communication between computation nodes.

Minimizes the relative number of mesh elements shared between computation nodes.

Equal load between computation nodes.

Ideal mesh partitioning

Page 24: Parallel Computing With Elmer

Directional decomposition
− ElmerGrid 2 2 dir -partition Nx Ny Nz F

[Figure: decompositions produced by -partition 2 2 1 0 (element-wise) and -partition 2 2 1 1 (nodal)]

Parallel computing with Elmer

Page 25: Parallel Computing With Elmer

Directional decomposition
− ElmerGrid 2 2 dir -partition Nx Ny Nz F -partorder nx ny nz
− Defines the ordering direction (the components of a vector).
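For instance (a sketch with the hypothetical mesh directory mymesh), ordering a 2 x 2 x 1 element-wise decomposition along the x-axis:

ElmerGrid 2 2 mymesh -partition 2 2 1 0 -partorder 1 0 0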

Parallel computing with Elmer

Page 26: Parallel Computing With Elmer

Using the METIS library
− ElmerGrid 2 2 dir -metis N Method

[Figure: partitionings produced by -metis 4 0 and -metis 4 1 (PartMeshDual / PartMeshNodal)]

Parallel computing with Elmer

Page 27: Parallel Computing With Elmer

METIS
− ElmerGrid 2 2 dir -metis N Method

[Figure: partitionings produced by -metis 4 2 and -metis 4 3 (PartGraphKway / PartGraphRecursive)]

Parallel computing with Elmer

Page 28: Parallel Computing With Elmer

METIS
− ElmerGrid 2 2 dir -metis N Method

[Figure: partitioning produced by -metis 4 4 (PartGraphPKway)]

Parallel computing with Elmer

Page 29: Parallel Computing With Elmer

Halo elements
− ElmerGrid 2 2 dir -metis N Method -halo
− Necessary if using the discontinuous Galerkin method.
− Puts “ghost cells” on each side of the partition boundary.

Parallel computing with Elmer
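For example (a sketch; mymesh is a hypothetical mesh directory), a four-way METIS partitioning with halo elements for a discontinuous Galerkin run:

ElmerGrid 2 2 mymesh -metis 4 2 -halo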

Page 30: Parallel Computing With Elmer

More parallel options in ElmerGrid
-indirect: creates indirect connections.
-periodic Fx Fy Fz: declares the periodic coordinate directions for parallel meshes.
-partoptim: aggressive optimization of node sharing.
-partbw: minimizes the bandwidth of partition-partition couplings.

Parallel computing with Elmer

Page 31: Parallel Computing With Elmer

Launched with mpirun -np N ElmerSolver_mpi.
– The exact command might change on other platforms.
– Might need a hostfile.

Needs an N-partitioned mesh.

Needs ELMERSOLVER_STARTINFO, which contains the name of the command file.

Optional libraries
− HYPRE

− MUMPS

Parallel version of ElmerSolver
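A minimal generic launch, assuming the command file is named case.sif (a hypothetical name) and the mesh is already partitioned into N parts:

echo case.sif > ELMERSOLVER_STARTINFO
mpirun -np N ElmerSolver_mpi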

Page 32: Parallel Computing With Elmer

Different behaviour of the ILU preconditioner.
− The parts of the matrix at the partition boundaries are not available to the factorization.
− Sometimes this works; if not, use HYPRE:

Linear System Use Hypre = Logical True

Parallel version of ElmerSolver

Page 33: Parallel Computing With Elmer

Alternative preconditioners in HYPRE
− ParaSails (sparse approximate inverse preconditioner)

Linear System Preconditioning = String "ParaSails"

− BoomerAMG (algebraic multigrid)

Linear System Preconditioning = String "BoomerAMG"

Parallel version of ElmerSolver
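A sketch of a solver section that switches the preconditioning to HYPRE's ParaSails; the iterative method and the tolerances below are typical values, not taken from the slides:

Solver 1
  Equation = "Navier-Stokes"
  Linear System Solver = Iterative
  Linear System Iterative Method = BiCGStab
  Linear System Max Iterations = Integer 500
  Linear System Convergence Tolerance = Real 1.0e-8
  Linear System Use Hypre = Logical True
  Linear System Preconditioning = String "ParaSails"
End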

Page 34: Parallel Computing With Elmer

Alternative solvers
− BoomerAMG (algebraic multigrid)

Linear System Solver = "Iterative"
Linear System Iterative Method = "BoomerAMG"

− Multifrontal parallel direct solver (MUMPS)

Linear System Solver = "Direct"
Linear System Direct Method = Mumps

Parallel version of ElmerSolver

Page 35: Parallel Computing With Elmer

Elmer writes the results partition-wise: name.0.ep, name.1.ep, ..., name.(n-1).ep

For ElmerPost the result files are fused into one with ElmerGrid:

ElmerGrid 15 3 name

− Fuses all time steps (also non-existing ones) into a single file called name.ep (overwritten if it exists).

Special option for only a partial fuse:

-saveinterval start end step

− The first, the last and the (time)step of fusing the parallel data.

Parallel postprocessing
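For example (a sketch; parallel_flow is the case name used later in this course), fusing only every second time step of the first ten:

ElmerGrid 15 3 parallel_flow -saveinterval 1 10 2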

Page 36: Parallel Computing With Elmer

Introductory example: Flow through a pipe junction

Page 37: Parallel Computing With Elmer

Flow through a pipe junction. Boundary conditions:

1. v_in = 1 cm/s
2. none
3. v_in = 1 cm/s
4. and 5. no-slip (u_i = 0 m/s) on the walls.

Description of the problem

[Figure: pipe junction geometry with the numbered boundaries 1–5]

Page 38: Parallel Computing With Elmer

Partition the mesh with ElmerGrid:

ElmerGrid 2 2 mesh -out part_mesh -scale 0.01 0.01 0.01 -metis 4 2

Scales the problem from cm → m.

Creates a mesh with 4 partitions using the PartGraphRecursive option of the METIS library.

Preprocessing

Page 39: Parallel Computing With Elmer

Header

Mesh DB "." "flow"

End

Simulation

Coordinate System = "Cartesian 3D"

Simulation Type = "Steady"

Output Intervals = Integer 1

Post File = File "parallel_flow.ep"

Output File = File "parallel_flow.result"

max output level = Integer 4

End

Solver input file

Page 40: Parallel Computing With Elmer

Solver 1

Equation = "Navier-Stokes"

Optimize Bandwidth = Logical True

Linear System Solver = Iterative

Linear System Direct Method = Mumps

Stabilization Method = Stabilized

Nonlinear System Convergence Tolerance = Real 1.0E-03

Nonlinear System Max Iterations = Integer 30

Nonlinear System Newton After Iterations = Integer 1

Nonlinear System Newton After Tolerance = Real 1.0E-03

End

Solver input file

Page 41: Parallel Computing With Elmer

Body 1

Name = "fluid"

Equation = 1

Material = 1

Body Force = 1

Initial Condition = 1

End

Equation 1

Active Solvers(1) = 1

Convection = Computed

End

Solver input file

Page 42: Parallel Computing With Elmer

Initial Condition 1

Velocity 1 = Real 0.0

Velocity 2 = Real 0.0

Velocity 3 = Real 0.0

Pressure = Real 0.0

End

Body Force 1

Flow BodyForce 1 = Real 0.0

Flow BodyForce 2 = Real 0.0

Flow BodyForce 3 = Real 0.0

End

Solver input file

Page 43: Parallel Computing With Elmer

Material 1

Density = Real 1000.0

Viscosity = Real 1.0

End

Boundary Condition 1

Name = "largeinflow"

Target Boundaries = 1

Normal-Tangential Velocity = True

Velocity 1 = Real -0.01

Velocity 2 = Real 0.0

Velocity 3 = Real 0.0

End

Solver input file

Note: the boundary normal points outward, hence the inflow is prescribed as a negative normal velocity (Velocity 1 = -0.01).

Page 44: Parallel Computing With Elmer

Boundary Condition 2

Name = "largeoutflow"

Target Boundaries = 2

Normal-Tangential Velocity = True

Velocity 2 = Real 0.0

Velocity 3 = Real 0.0

End

Boundary Condition 3

Name = "smallinflow"

Target Boundaries = 3

Normal-Tangential Velocity = True

Velocity 1 = Real -0.01

Velocity 2 = Real 0.0

Velocity 3 = Real 0.0

End

Solver input file


Page 45: Parallel Computing With Elmer

Boundary Condition 4

Name = "pipewalls"

Target Boundaries(2) = 4 5

Normal-Tangential Velocity = False

Velocity 1 = Real 0.0

Velocity 2 = Real 0.0

Velocity 3 = Real 0.0

End

Solver input file


Page 46: Parallel Computing With Elmer

Save the sif file with the name parallel_flow.sif.

Write the name of the sif file into ELMERSOLVER_STARTINFO.

Commands for vuori.csc.fi:

module switch PrgEnv-pgi PrgEnv-gnu
module load elmer/latest
salloc -n 4 --ntasks-per-node=4 --mem-per-cpu=1000 -t 00:10:00 -p interactive
srun ElmerSolver_mpi

On a usual MPI platform:

mpirun -np 4 ElmerSolver_mpi

Parallel run

Page 47: Parallel Computing With Elmer

Change into the mesh directory.

Run ElmerGrid to combine the results:

ElmerGrid 15 3 parallel_flow

Launch ElmerPost.

Load parallel_flow.ep.

Combining the results
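Collecting the commands from the preceding slides, the whole parallel workflow of this example is roughly the following sketch (on vuori.csc.fi, replace the mpirun line with the salloc/srun commands shown earlier; the Mesh DB entry of the sif file must point at the directory that holds the partitioning.4 subdirectory):

ElmerGrid 2 2 mesh -out part_mesh -scale 0.01 0.01 0.01 -metis 4 2
echo parallel_flow.sif > ELMERSOLVER_STARTINFO
mpirun -np 4 ElmerSolver_mpi
ElmerGrid 15 3 parallel_flow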

Page 48: Parallel Computing With Elmer

Adding heat transfer

Solver 2

Exec Solver = Always

Equation = "Heat Equation"

Procedure = "HeatSolve" "HeatSolver"

Steady State Convergence Tolerance = Real 3.0E-03

Nonlinear System Max Iterations = Integer 1

Nonlinear System Convergence Tolerance = Real 1.0e-6

Nonlinear System Newton After Iterations = Integer 1

Nonlinear System Newton After Tolerance = Real 1.0e-2

Linear System Solver = Iterative

Linear System Max Iterations = Integer 500

Linear System Convergence Tolerance = Real 1.0e-6

Stabilization Method = Stabilized

Page 49: Parallel Computing With Elmer

Adding heat transfer

Linear System Use Hypre = Logical True

Linear System Iterative Method = BoomerAMG

BoomerAMG Max Levels = Integer 25

BoomerAMG Coarsen Type = Integer 0

BoomerAMG Num Functions = Integer 1

BoomerAMG Relax Type = Integer 3

BoomerAMG Num Sweeps = Integer 1

BoomerAMG Interpolation Type = Integer 0

BoomerAMG Smooth Type = Integer 6

BoomerAMG Cycle Type = Integer 1

End

Page 50: Parallel Computing With Elmer

Adding heat transfer

Material 1

Heat Capacity = Real 1000.0

Heat Conductivity = Real 1.0

End

Boundary Condition 1

Temperature = Real 10.0

End

Boundary Condition 3

Temperature = Real 90.0

End

Page 51: Parallel Computing With Elmer

ParaView is an open-source, multi-platform data analysis and visualization application by Kitware.

Distributed under ParaView License Version 1.2

Elmer's ResultOutputSolve module provides output as VTK UnstructuredGrid files.

− vtu

− pvtu (parallel)

Postprocessing with ParaView

Page 52: Parallel Computing With Elmer

Output for ParaView

Solver 3

Equation = "Result Output"

Procedure = "ResultOutputSolve" "ResultOutputSolver"

Exec Solver = After Saving

Output File Name = String "flowtemp"

Output Format = Vtu

Show Variables = Logical True

Scalar Field 1 = Pressure

Scalar Field 2 = Temperature

Vector Field 1 = Velocity

End

Page 53: Parallel Computing With Elmer

ParaView

Page 54: Parallel Computing With Elmer

ParaView