Parallel Computation of the 2D Laminar Axisymmetric Coflow Nonpremixed Flames
Qingan Andy Zhang
PhD Candidate, Department of Mechanical and Industrial Engineering, University of Toronto
ECE 1747 Parallel Programming, Course Project, Dec. 2006
Outline
- Introduction
- Motivation
- Objective
- Methodology
- Results
- Conclusion
- Future Improvement
- Work in Progress
Introduction
Multi-dimensional laminar flames:
- Easy to model
- Computationally tractable with detailed sub-models such as chemistry, transport, etc.
- Lots of experimental data available
- Resemble turbulent flames in some cases (e.g., the flamelet regime)
(Figure: flow configuration)
Motivation
The run time is expected to be long if the problem involves:
- A complex chemical mechanism: the Appel et al. (2000) mechanism has 101 species and 543 reactions
- A complex geometry: a large 2D coflow laminar flame (1,000 x 500 = 500,000 grid points) or a 3D laminar flame (1,000 x 500 x 100 = 50,000,000 grid points)
- A complex physical problem: soot formation, multi-phase problems
Objective
To develop a parallel flame code based on the sequential flame code, with good:
- Speedup
- Feasibility
- Accuracy
- Flexibility
Methodology -- Options
- Shared memory: OpenMP, Pthreads
- Distributed memory: MPI
- Distributed shared memory: Munin, TreadMarks
MPI was chosen because it is widely used for scientific computation, relatively easy to program, and the cluster is a distributed-memory system.
Methodology -- Preparation
- Linux OS
- Programming tools (Fortran, Make, IDE)
- Parallel computation concepts
- MPI commands
- Network (SSH, queuing system)
Methodology -- Sequential Code
- Sequential code analysis: algorithm, dependencies
- Data I/O
- CPU time breakdown
The sequential code is the backbone for parallelization!
Methodology
Governing equations, solved with CFD and parallel computation:
- Continuity equation
- Momentum equations
- Gas species equations
- Energy equation
plus constitutive relations, initial conditions, and boundary conditions.
(Figure: flow configuration and computational domain)
Methodology -- CFD
- Finite volume method
- Iterative process on a staggered grid
Quantities solved (primitive variables): U, V, P', Y_i (i = 1, ..., KK), T
- Y_i: mass fraction of the i-th gas species
- KK: total number of gas species
(Figure: flow configuration and computational domain)
If KK = 100, then 3 + 100 + 1 = 104 equations must be solved at each grid point.
If the mesh is 1000 x 500, then 104 x 1000 x 500 = 52,000,000 equations must be solved in each iteration.
If 3,000 iterations are required to reach a converged solution, a total of 52,000,000 x 3,000 = 156,000,000,000 equations must be solved.
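The equation counts above follow from simple arithmetic; a short sketch that reproduces them:

```python
# Count of equations solved, following the estimates on this slide.
KK = 100               # number of gas species
mesh = 1000 * 500      # 2D mesh points
iterations = 3000      # iterations to reach convergence

eqs_per_point = 3 + KK + 1        # U, V, P' (3) + Y_i (KK) + T (1)
eqs_per_iteration = eqs_per_point * mesh
eqs_total = eqs_per_iteration * iterations

print(eqs_per_point)       # 104
print(eqs_per_iteration)   # 52000000
print(eqs_total)           # 156000000000
```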
General Transport Equation
Unsteady term + Convection term = Diffusion term + Source term
- Unsteady: time-variant term
- Convection: caused by flow motion
- Diffusion: for species, molecular diffusion and thermal diffusion
- Source: for species, chemical reaction
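In symbols, this is the standard convection-diffusion form for a transported scalar \(\phi\) (here \(\Gamma_\phi\) is the diffusion coefficient of \(\phi\) and \(S_\phi\) its source term):

\[ \underbrace{\frac{\partial(\rho\phi)}{\partial t}}_{\text{unsteady}} + \underbrace{\nabla\cdot(\rho\mathbf{v}\phi)}_{\text{convection}} = \underbrace{\nabla\cdot\left(\Gamma_\phi\nabla\phi\right)}_{\text{diffusion}} + \underbrace{S_\phi}_{\text{source}} \]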
Mass and Momentum equations

Mass:
\[ \frac{1}{r}\frac{\partial(\rho r v)}{\partial r} + \frac{\partial(\rho u)}{\partial z} = 0 \]

Axial momentum:
\[ \rho v \frac{\partial u}{\partial r} + \rho u \frac{\partial u}{\partial z} = -\frac{\partial p}{\partial z} + \frac{1}{r}\frac{\partial}{\partial r}\left[ r\mu\left(\frac{\partial u}{\partial r} + \frac{\partial v}{\partial z}\right)\right] + 2\frac{\partial}{\partial z}\left(\mu\frac{\partial u}{\partial z}\right) - \frac{2}{3}\frac{\partial}{\partial z}\left[\mu\left(\frac{1}{r}\frac{\partial (rv)}{\partial r} + \frac{\partial u}{\partial z}\right)\right] + \rho g_z \]

Radial momentum:
\[ \rho v \frac{\partial v}{\partial r} + \rho u \frac{\partial v}{\partial z} = -\frac{\partial p}{\partial r} + \frac{2}{r}\frac{\partial}{\partial r}\left(r\mu\frac{\partial v}{\partial r}\right) + \frac{\partial}{\partial z}\left[\mu\left(\frac{\partial v}{\partial z} + \frac{\partial u}{\partial r}\right)\right] - \frac{2\mu v}{r^2} - \frac{2}{3}\frac{1}{r}\frac{\partial}{\partial r}\left[r\mu\left(\frac{1}{r}\frac{\partial (rv)}{\partial r} + \frac{\partial u}{\partial z}\right)\right] + \frac{2}{3}\frac{\mu}{r}\left(\frac{1}{r}\frac{\partial (rv)}{\partial r} + \frac{\partial u}{\partial z}\right) \]
Species and Energy equations

Species (k = 1, 2, ..., KK):
\[ \rho v \frac{\partial Y_k}{\partial r} + \rho u \frac{\partial Y_k}{\partial z} = -\frac{1}{r}\frac{\partial}{\partial r}\left(r \rho Y_k V_{k,r}\right) - \frac{\partial}{\partial z}\left(\rho Y_k V_{k,z}\right) + W_k \dot{\omega}_k \]

Energy:
\[ \rho c_p \left( v \frac{\partial T}{\partial r} + u \frac{\partial T}{\partial z} \right) = \frac{1}{r}\frac{\partial}{\partial r}\left(\lambda r \frac{\partial T}{\partial r}\right) + \frac{\partial}{\partial z}\left(\lambda \frac{\partial T}{\partial z}\right) - \sum_{k=1}^{KK} \rho Y_k c_{p,k} \left( V_{k,r}\frac{\partial T}{\partial r} + V_{k,z}\frac{\partial T}{\partial z} \right) - \sum_{k=1}^{KK} h_k W_k \dot{\omega}_k + Q_r \]

Here the \(V_{k,r}\), \(V_{k,z}\) terms represent the diffusion of species, the \(W_k \dot{\omega}_k\) terms the chemical reaction, and \(Q_r\) the radiation heat transfer.
Methodology -- Sequential Code
Start the iteration from scratch or from a continued job. Within one iteration:
1. Iteration starts
2. Discretization: compute AP(I,J) and CON(I,J)
3. Solve with TDMA or point-by-point Gauss elimination
4. Get the new values and update the F(I,J,NF) array
5. Repeat for the other equations
6. Iteration ends
End the iteration loop when convergence is reached.
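Step 3's TDMA is the Thomas algorithm for tridiagonal systems. The flame code itself is Fortran; the following is a minimal Python sketch of the algorithm (the names `tdma`, `a`, `b`, `c`, `d` are illustrative, not identifiers from the flame code):

```python
def tdma(a, b, c, d):
    """Thomas algorithm: solve a[i]*x[i-1] + b[i]*x[i] + c[i]*x[i+1] = d[i].

    a[0] and c[-1] are unused (no neighbours outside the line).
    """
    n = len(d)
    cp = [0.0] * n  # modified upper-diagonal coefficients
    dp = [0.0] * n  # modified right-hand side
    cp[0] = c[0] / b[0]
    dp[0] = d[0] / b[0]
    # Forward sweep: eliminate the sub-diagonal.
    for i in range(1, n):
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    # Back substitution.
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x
```

In the flame code, a solver like this is applied line by line across the 2D grid, once per equation per iteration.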
Methodology -- Sequential Code
Fig. 1 CPU time for each sub-code, summarized after one iteration with radiation included
The most time-consuming part is the evaluation of the species Jacobian matrix DSDY(K1,K2,I,J).
Are there dependencies?
Methodology -- Parallelization
Domain decomposition method (DDM) with Message Passing Interface (MPI) programming.
Six processes are used to decompose the computational domain of 206 x 102 staggered grid points.
(Figure: decomposed domain; R, V radial axis, Z, U axial axis)
Ghost points are placed at the subdomain boundaries to reduce communication among processes!
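The role of the ghost points can be illustrated with a small Python sketch of a 1D halo exchange, with plain list copies standing in for the MPI send/receive pairs (the function names are illustrative, not from the flame code):

```python
# Hypothetical sketch of ghost-point (halo) exchange for a 1D domain
# decomposition. Each subdomain keeps one ghost point per side; after an
# exchange, each ghost mirrors the neighbour's adjacent interior point.

def decompose(field, nprocs):
    """Split a 1D field into nprocs chunks, each padded with ghost points."""
    n = len(field) // nprocs
    chunks = []
    for p in range(nprocs):
        interior = field[p * n:(p + 1) * n]
        chunks.append([0.0] + interior + [0.0])  # [ghost, interior..., ghost]
    return chunks

def exchange_ghosts(chunks):
    """Fill each ghost point from the neighbouring subdomain (this loop
    stands in for MPI send/receive pairs between adjacent ranks)."""
    for p in range(len(chunks)):
        if p > 0:
            chunks[p][0] = chunks[p - 1][-2]   # left ghost <- left neighbour
        if p < len(chunks) - 1:
            chunks[p][-1] = chunks[p + 1][1]   # right ghost <- right neighbour
```

Each process then needs only its neighbours' boundary values once per iteration, rather than access to the whole domain.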
Cluster Information
- Cluster location: icpet.nrc.ca in Ottawa
- 40 nodes connected by Ethernet
- AMD Opteron 250 (2.4 GHz) with 5 GB memory per node
- Red Hat Enterprise Linux 4.0
- Batch-queuing system: Sun Grid Engine (SGE)
- Portland Group compilers (v6.2) + MPICH2
n1-5 n2-5 n3-5 n4-5 | n5-5 n6-5 n7-5 n8-5
n1-4 n2-4 n3-4 n4-4 | n5-4 n6-4 n7-4 n8-4
n1-3 n2-3 n3-3 n4-3 | n5-3 n6-3 n7-3 n8-3
n1-2 n2-2 n3-2 n4-2 | n5-2 n6-2 n7-2 n8-2
n1-1 n2-1 n3-1 n4-1 | n5-1 n6-1 n7-1 n8-1
Results -- Speedup
Table 1 CPU time and speedup for 50 iterations with the Appel et al. (2000) mechanism

Processes      Sequential   4 processes   6 processes   12 processes
CPU time (s)   51313        15254         10596         5253
Speedup        1            3.36          4.84          9.77

(1) The speedup is good.
(2) The CPU time spent on 50 iterations of the original sequential code is 51,313 seconds, i.e. 14.26 hours. Too long!
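The speedups in Table 1 correspond to parallel efficiencies above 80%; a short sketch that recomputes both from the table's timings:

```python
# Speedup and parallel efficiency from the Table 1 timings.
sequential_time = 51313.0                        # seconds, 50 iterations
timings = {4: 15254.0, 6: 10596.0, 12: 5253.0}   # processes -> seconds

for nproc, t in timings.items():
    speedup = sequential_time / t
    efficiency = speedup / nproc
    print(f"{nproc:2d} processes: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```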
Results -- Speedup
(Fig. 3 Speedup obtained with different numbers of processes: 1 process, 1.00; 4 processes, 3.36; 6 processes, 4.84; 12 processes, 9.77)
Results -- Real Application
Flame field calculation using the parallel code (Appel et al. 2000 mechanism)
(Figures: temperature field in K; OH, benzene, and pyrene fields in mole fraction)
The trends are well predicted!
Conclusion
- The sequential flame code was parallelized with DDM.
- The speedup is good.
- The parallel code was applied to model a flame using a detailed mechanism.
- Flexibility is good: the geometry and/or number of processes can be changed easily.
Future Improvement
- Optimized DDM
- Species line solver
Work in Progress
Fixed sectional soot model: adds 70 equations to the original system of equations.
Experience
- Keep communication down
- Choose the parallelization method wisely
- Debugging is hard
- I/O
Thanks
Questions?