parallel computing with elmer
DESCRIPTION
Parallel Computing With ElmerTRANSCRIPT
Elmer – Open source finite element software for multiphysical problems
Elmer
Parallel computing with Elmer
Elmer Team CSC – IT Center for Science Ltd. Elmer Course CSC, 9-10.1.2012
Outline
Parallel computing concepts
Parallel computing with Elmer– Preprocessing with ElmerGrid
– Parallel solver ElmerSolver_mpi
– Postprocessing with ElmerGrid and ElmerPost
Introductory example: Flow through pipe junction
Parallel computing concepts
Parallel computing concepts
Parallel computation means executing tasks concurrently.− A task encapsulates a sequential program and local data, and its interface
to its environment.
– Data of those other tasks is remote.
Data dependency means that the computation of one task requires data from an another task in order to proceed.
Parallel computers
Shared memory
− All cores can access the whole memory.
Distributed memory
– All cores have their own memory.
– Communication between cores is needed in order to access the memory of other cores.
Current supercomputers combine the distributed and shared memory approaches.
Parallel programming models
Message passing (OpenMPI)– Can be used both in distributed and shared memory computers.
– Programming model allows good parallel scalability.
– Programming is quite explicit.
Threads (pthreads, OpenMP)– Can be used only in shared memory computers.
– Limited parallel scalability.
– Simpler or less explicit programming.
Execution model
Parallel program is launched as a set of independent, identical processes– The same program code and instructions.
Can reside in different computation nodes.
Or even in different computers.
Current CPU's in your workstations− Six cores (AMD Opteron Shanghai)
Multi-threading
− For example, OpenMP
High performance Computing (HPC)
− Message passing, for example OpenMPI
General remarks about parallel computing
The size of the problem remains constant.
Execution time decreases in proportion to the increase in the number of cores.
Strong parallel scaling
Increasing the size of the problem.
Execution time remains constant, when number of cores increases in proportion to the problem size.
Weak parallel scaling
Parallel computing with Elmer
Domain decomposition.
Additional pre-processing step called mesh partitioning.
Every domain is running its own ElmerSolver.
ElmerSolver_mpi parallel process communication– MPI
Recombination of results to ElmerPost output.
Parallel computing with Elmer
Only selected linear algebra methods are available in parallel.
– Iterative solvers (Krylov subspace methods) are basically the same.
– From direct solvers, only MUMPS exists in parallel (uses OpenMP)
– With the additional HYPRE package, some linear methods are available only in parallel.
Parallel computing with Elmer
Preconditioners required by iterative methods are not the same as serial.
– For example, ILUn.
– May deteriorate parallel performance.
Diagonal preconditioner is the same as parallel and hence exhibits exactly the same behavior in parallel as in serial.
Parallel computing with Elmer
Preprocessing− Developing of automated
meshing algorithms.
Postprocessing
− Parallel postprocessing
− Paraview
Grand challenges
Parallel workflow
Scaling of wall clock time with dofs in the cavity lid case using GMRES+ILU0. Simulation by Juha Ruokolainen, CSC and visualization by Matti Gröhn, CSC .
Examples of parallel scaling of Elmer
Examples of parallel scaling of Elmer
0 16 32 48 64 80 96 112 128 1440
500
1000
1500
2000
2500
3000
3500
4000
Cores
Tim
e /
s
Serial mesh structure of Elmer− Header file contains general dimensions: mesh.header.
− Node file contains coordinate and ownership of nodes: mesh.nodes.
− Elements file contains compositions of bulk elements and ownerships (bodies): mesh.elements.
− Boundary file contains compositions of elements and ownerships (boundaries) and dependencies (parents) boundary elements: mesh.boundary.
Parallel preprocessing with Elmer
Parallel mesh structure: /partitioning.n/− Header file: part.1.header, part.2.header, … part.n.header
− Nodes: part.1.nodes, part.2.nodes, … part.n.nodes
− Elements: part.1.elements, part.2.elements, … part.n.elements
− Boundary elements: part.1.boundary, part.2.boundary, … part.n.boundary
− Shared nodes between partitions: part.1.shared, part.2.shared, … part.n.shared
Parallel preprocessing with Elmer
The best way to partition− Serial mesh → ElmerGrid → parallel mesh
General syntax
− ElmerGrid 2 2 existing.mesh [partoption]
Two principal partitioning techniques
− Along cartesian axis (simple geometries or topologies)
− METIS library
Parallel preprocessing with Elmer
Minimizes communication between computation nodes.
Minimizes the relative number of mesh elements between computation nodes.
Equal load between computation nodes.
Ideal mesh partitioning
Directional decomposition− ElmerGrid 2 2 dir -partition Nx Ny Nz F
-partition 2 2 1 0 -partition 2 2 1 1
element-wisenodal
Parallel computing with Elmer
Directional decomposition− ElmerGrid 2 2 dir -partition Nx Ny Nz F -partoder nx ny nz
− Defines the ordering direction (components of a vector).
Parallel computing with Elmer
Using METIS library− ElmerGrid 2 2 dir -metis N Method
-metis 4 0
PartMeshDualPartMeshNodal
-metis 4 1
Parallel computing with Elmer
METIS− ElmerGrid 2 2 dir -metis N Method
-metis 4 2
PartGraphKwayPartGraphRecursive
-metis 4 3
Parallel computing with Elmer
METIS ElmerGrid 2 2 dir -metis N Method
-metis 4 4
PartGraphPKway
Parallel computing with Elmer
Halo-elements− ElmerGrid 2 2 dir -metis N Method
-halo
− Necessary, if using discontinuous Galerkin method.
− Puts “ghost cell” on each side of the partition boundary.
Parallel computing with Elmer
More parallel options in ElmerGrid -indirect: creates indirect connections. -periodic Fx Fy Fz: declares the periodic coordinate directions for
parallel meshes. -partoptim: aggressive optimization to node sharing. -partbw minimize the bandwidth of partiotion-partition couplings.
Parallel computing with Elmer
Mpirun -np N ElmerSolver_mpi.
Might change on other platforms.
Might need a hostfile.
Needs an N-partitioned mesh.
Needs ELMERSOLVER_STARTINFO, which contains the name of the command file.
Optional libraries− HYPRE
− MUMPS
Parallel version of ElmerSolver
Different behaviour of ILU preconditioner.− Not available parts at partition
boundaries.
− Sometimes works. If not,use HYPRE
Linear System Use Hypre = Logical True
Parallel version of ElmerSolver
Alternative preconditioners in HYPRE− ParaSails (sparse approximate inverse preconditioner)
Linear System Preconditioning = String “ParaSails”
− BoomerAMG (Algebraic multigrid)
Linear System Preconditioning = String “BoomerAMG”
Parallel version of ElmerSolver
Alternative solvers− BoomerAMG (Algebraic Multigrid)
Linear System Solver = “Iterative”
Linear System Iterative Method = “BoomerAMG”
− Multifrontal parallel direct solver (MUMPS)
Linear System Solver = “Direct”
Linear System Direct Method = Mumps
Parallel version of ElmerSolver
Elmer writes results in partitionwise.name.0.ep, name.1.ep, ..., name.(n-1).ep
ElmerPost fuses resultfiles into one.
Elmergrid 15 3 name
− Fuses all time steps (also non-existing) into a single file called name.ep (overwritten, if exists).
Special option for only partial fuse
-saveinterval start end step
− The first, last and the (time)step of fusing parallel data.
Parallel postprocessing
Introductory example: Flow through a pipe junction
Flow through pipe junction.Boundary conditions:
1. vin
= 1 cm/s
2. None
3. vin = 1 cm/s
4. and 5. no-slip (u
i = 0 m/s) on walls.
Description of the problem
1
3
2
5
4
Partition of mesh with ElmerGrid.ElmerGrid 2 2 mesh -out part_mesh -scale
0.01 0.01 0.01 -metis 4 2
Scales the problem from cm → m.
Creates a mesh with 4 partitions by using PartGraphRecursive option of METIS-library.
Preprocessing
Header
Mesh DB "." "flow"
End
Simulation
Coordinate System = "Cartesian 3D"
Simulation Type ="Steady"
Output Intervals = Integer 1
Post File = File "parallel_flow.ep"
Output File = File "parallel_flow.result"
max output level = Integer 4
End
Solver input file
Solver 1
Equation = "Navier-Stokes"
Optimize Bandwidth = Logical True
Linear System Solver = Iterative
Linear System Direct Solver = Mumps
Stabilization Method = Stabilized
Nonlinear System Convergence Tolerance = Real 1.0E-03
Nonlinear System Max Iterations = Integer 30
Nonlinear System Newton After Iterations = Integer 1
Nonlinear System Newton After Tolerance = Real 1.0E-03
End
Solver input file
Body 1
Name = "fluid"
Equation = 1
Material = 1
Body Force = 1
Initial Condition = 1
End
Equation 1
Active Solvers(1) = 1
Convection = Computed
End
Solver input file
Initial Condition 1
Velocity 1 = Real 0.0
Velocity 2 = Real 0.0
Velocity 3 = Real 0.0
Pressure = Real 0.0
End
Body Force 1
Flow BodyForce 1 = Real 0.0
Flow BodyForce 2 = Real 0.0
Flow BodyForce 3 = Real 0.0
End
Solver input file
Material 1
Density = Real 1000.0
Viscosity = Real 1.0
End
Boundary Condition 1
Name = "largeinflow"
Target Boundaries = 1
Normal-Tangential Velocity = True
Velocity 1 = Real -0.01
Velocity 2 = Real 0.0
Velocity 3 = Real 0.0
End
Solver input file
Outward pointing normal
Boundary Condition 2
Name = "largeoutflow"
Target Boundaries = 2
Normal-Tangential Velocity = True
Velocity 2 = Real 0.0
Velocity 3 = Real 0.0
End
Boundary Condition 3
Name = "smallinflow"
Target Boundaries = 3
Normal-Tangential Velocity = True
Velocity 1 = Real -0.01
Velocity 2 = Real 0.0
Velocity 3 = Real 0.0
End
Solver input file
2
3
Boundary Condition 4
Name = "pipewalls"
Target Boundaries(2) = 4 5
Normal-Tangential Velocity = False
Velocity 1 = Real 0.0
Velocity 2 = Real 0.0
Velocity 3 = Real 0.0
End
Solver input file
4
5
Save the sif-file with name parallel_flow.sif
Write the name of the sif-file into ELMERSOLVER_STARTINFO
Commands for vuori.csc.fi:
Parallel run
module switch PrgEnv-pgi PrgEnv-gnu
module load elmer/latest
sallocate -n 4 --ntasks-per-node=4 --mem-per-cpu=1000 -t 00:10:00 –p interactive
srun ElmerSolver_mpi
On an usual MPI platform:
mpirun –np 4 ElmerSolver_mpi
Change into the mesh directory.
Run ElmerGrid to combine results.ElmerGrid 15 3 parallel_flow
Launch ElmerPost.
Load parallel_flow.ep.
Combining the results
Adding heat transfer
Solver 2
Exec Solver = Always
Equation = "Heat Equation"
Procedure = "HeatSolve" "HeatSolver"
Steady State Convergence Tolerance = Real 3.0E-03
Nonlinear System Max Iterations = Integer 1
Nonlinear System Convergence Tolerance = Real 1.0e-6
Nonlinear System Newton After Iterations = Integer 1
Nonlinear System Newton After Tolerance = Real 1.0e-2
Linear System Solver = Iterative
Linear System Max Iterations = Integer 500
Linear System Convergence Tolerance = Real 1.0e-6
Stabilization Method = Stabilized
Adding heat transfer
Linear System Use Hypre = Logical True
Linear System Iterative Method = BoomerAMG
BoomerAMG Max Levels = Integer 25
BoomerAMG Coarsen Type = Integer 0
BoomerAMG Num Functions = Integer 1
BoomerAMG Relax Type = Integer 3
BoomerAMG Num Sweeps = Integer 1
BoomerAMG Interpolation Type = Integer 0
BoomerAMG Smooth Type = Integer 6
BoomerAMG Cycle Type = Integer 1
End
Adding heat transfer
Material 1
…
Heat Capacity = Real 1000.0
Heat Conductivity = Real 1.0
End
Boundary Condition 1
…
Temperature = Real 10.0
End
Boundary Condition 3
…
Temperature = Real 90.0
End
ParaView is an open-source, multi-platform data analysis and visualization application by Kitware.
Distributed under ParaView License Version 1.2
Elmer's ResultOutputSolve module provides output as VTK UnstructuredGrid files.
− vtu
− pvtu (parallel)
Postprocessing with ParaView
Output for ParaView
Solver 3
Equation = "Result Output"
Procedure = "ResultOutputSolve" "ResultOutputSolver"
Exec Solver = After Saving
Output File Name = String "flowtemp"
Output Format = Vtu
Show Variables = Logical True
Scalar Field 1 = Pressure
Scalar Field 2 = Temperature
Vector Field 1 = Velocity
End
ParaView
ParaView