outline - dogra.csis.oita-u.ac.jpdogra.csis.oita-u.ac.jp/tkm/study/slides/20120719-csc12.pdf · 03!...
TRANSCRIPT
Jul. 16-19, 2012 CSC’12, Las Vegas
Dependent Iterations in Scientific CalculationsParallelized by the Parareal-in-Time Algorithm
Toshiya Takami, Kyushu University, JapanIn collaboration with: K. Fukazawa, H. Honda, Y. Inadomi, R. Susukita, T. Kobayashi, and T. Nanri
1
Outline
! 1. Introduction! Scientific Computing on Massively Parallel Machines! Dependent Calculations in Scientific Computing
! 2. “Parareal” Algorithm! Parareal-in-Time as Higher-Order Perturbation
! 3. How to Implement “Parareal” Calculations! Template class: Parareal<T>! Non-blocking Communications for Further Speed-up
! 4. Conclusion2
2
1. Introduction
! Massively Parallel Computers! Almost all in the Top500 List
! Large problems with many independent components! SPMD, Master-Worker, Many Task Computing, etc.! Dependency can be resolved by communications.
! Relatively small problems! Small systems, small matrices ==> Low performance! Sequencial calculations ==> Not parallelize
3
3
Parallel Application Programs
! Roughly speaking, applications are usually configured in these types.
1. Introduction
SPMD-type Configuration
Application Program
Parallelized Numerical Library
(Linear, FFT, ...)
Master-Worker type
WorkerWorkerWorkerWorker
MasterApplication
with Domain Decomposition
Numerical Library (Linear, FFT, ...)
Para
llel
Reso
urce
Para
llel
Reso
urce
Para
llel
Reso
urce
Para
llel
Reso
urce
4
4
Dependent Calculationsin Scientific Computing! Dependent calculations:
! Time evolutions in the initial value problem:! Simulations by explicit/implicit solvers
! Convergent sequence of state vectors:! Iterative calculations for linear/nonlinear equations! Krylov subspace methods for linear problems
1. Introduction
5
x0
t = 0
・・・1 2
x1 x2 x3
3
F1 F2
・・・
・・・ xk−1 xk
k − 1 k
Fk−2F0 Fk−1F3 Fk−3Fk−4F4 F5
4 5 k − 2k − 3
xk−3 xk−2x4 x5
Domain Decomposition in Time Direction
xk+1 = Fk(xk)
5
2. “Parareal-in-Time” Algorithm
! Even strictly dependent calculations, ,can be parallelized by the use of the Parareal iterations,
! : a state vector at k-th time step! : r-th approximation of the exact state! : the exact (fine) solver (iterator)! : an approximate (coarse) solver to
! Lions, et al., showed that real-time acceleration is possibleif the coarse solver is fast enough and the iteration rapidly converges to the exact solution.
xk+1 = Fk(xk)
x(r+1)k+1 = Gk(x(r+1)
k ) + Fk(x(r)k )− Gk(x(r)
k )
J-L. Lions, Y. Maday, and G. Turinici, C. R. Acad. Sci., Ser. I: Math. 232, 661–668 (2001).
Gk(x)
xk
x(r)k
Fk(x)xk
Fk(x)
6
6
What is Parareal-in-Time?
! “Its main goal concerns real time problems, hence proposed terminology of ‘parareal’ algorithm”
! “Parareal is not the first algorithm to propose the solution of evolution problems in a time-parallel fashion”
! Multiple Shooting Methods with Newton’s Iteration! Space-Time Multigrid Method
J-L. Lions, Y. Maday, and G. Turinici, C. R. Acad. Sci., Ser. I: Math. 232, 661–668 (2001).
M. J. Gander and S. Vandewalle, SIAM J. Sci. Comput. 29, 556-578 (2007).
2. Parareal Algorithm & Applications
7
7
Parareal Algorithmas a Newton’s Iteration (1)
! Consider the procedure as a Newton’s iteration! Consider a nonlinear equation
! Newton’s iteration for this problem:
xk+1 = Fk(xk) x(r+1)k+1 = Gk(x(r+1)
k ) + Fk(x(r)k )− Gk(x(r)
k )
M. J. Gander and S. Vandewalle, SIAM J. Sci. Comput. 29, 556-578 (2007).
!(Original Problem) (Parareal-in-Time Algorithm)
f �(X(r)) =
1 0F �
0(x(r)0 ) 1
F �1(x
(r)1 )
. . .
. . . 10 F �
k−1(x(r)k−1) 1
f(X(r)) = 0
X(r+1) = X(r) −�f �(X(r))
�−1f(X(r))
f(X(r)) ≡
x(r)0 − x0
x(r)1 − F0(x
(r)0 )
...x(r)
k − Fk−1(x(r)k−1)
2. Parareal Algorithm & Applications
8
8
Parareal Algorithmas a Newton’s Iteration (2)! After some calculations, we obtain multiple shooting
relations,
! Substitute the Jacobian term by the coarse solver,
we obtain the Parareal-in-Time iteration
x(r)0 = x0
x(r+1)k+1 = Fk(x(r)
k ) + F �k(x(r)
k )�x(r+1)
k − x(r)k
�
F �k(x(r)
k )�x(r+1)
k − x(r)k
�≈ Gk(x(r+1)
k )− Gk(x(r)k )
x(r+1)k+1 = Gk(x(r+1)
k ) + Fk(x(r)k )− Gk(x(r)
k )
2. Parareal Algorithm & Applications
9
9
Applications in Early Years
! Various time-evolution problems in scientific computing are parallelized by this method! PDE with a fluid/structure coupling:
! MD with quantum interactions:
! Quantum optimal control problem:
! The coarse solver by varying the width of time-steps:
L. Baffico, S. Bernard, Y. Maday, G. Turinici, and G. Zérah, Phys. Rev. E66, 057701 (2002).
C. Farhat and M. Chandesris, Int. J. Numer. Meth. Engng. 58, 1397-1434 (2003).
Y. Maday, G. Turinici, Int. J. Quant. Chem. 93, 223-228 (2003).
2. Parareal Algorithm & Applications
10x(t + ∆t) = f∆t(x(t)), Fk = [f∆t]n, Gk ≡ fn∆t
10
Parareal Algorithmas a Higher-Order Perturbation! ‘Perturbation’ is a basic concept in science:
! Unperturbed description + complicated perturbation
! For linear problems, we write
! If we introduce perturbed sequence , we obtainthe ‘Parareal relation’upto r-th order.
xk+1 = Fkxk
= [Gk + (Fk − Gk)]xk
x(r+1)k+1 = [Gk + (Fk − Gk)]x(r+1)
k
= Gkx(r+1)k + (Fk − Gk)x(r+1)
k
≈ Gkx(r+1)k + (Fk − Gk)x(r)
k
Simple, Fast Accurate but time consuming
{x(r)k }
2. Parareal Algorithm & Applications
11
11
Scientific Computing2. Parareal Algorithm & Applications
12
Short-range MD + Ewald or FMM
xk+1 = A0 + Bkxk
FC=SC! (Hartree-Fock-Roothaan Equation)
|ψ� = exp�
∆t
i� H
�|ψ� =
�I −
�exp
�∆t
i� H
�− I
��|ψ�
! Expensive interactions are introduced as perturbation! Molecular Dynamics with long-range interactions
! Quantum systems (driven by external fields)
! Non time-evolution problems! Dependent linear calculus: general linear transformations
! Convergent sequence: Nonlinear SCF iterations
12
Speed-up Ratioin Parallel Computing! Expected speed-up:
(dashed line)
! Colored lines:! Measured values of S(r, K, P)
for matrix multiplications.! For large problem (N=8192):
! Linear speed-up! For small problem (N=1024):
! Saturated
(b)
Spee
dup
Rat
io
Total Number of Rank
Linear Speed
up
r=10r=20r=40
N=8192
N=1024
0.2
0.5
1
2
5
10
20
50
100
1 10 100 1000
S(r, K, P ) =P
r + TP +P (P − 1)
Kt
13T. Takami and A. Nishida, Adv. Par. Comp. 22, 437-444 (2012).
13
Parareal-in-Time Algorithm
3. How to Implement “Parareal”
! This algorithm should be parallelized in relatively higher layer of the software stack.
! Useful frameworks or interfaces are required.Standard
Application
Numerical Library
Communication Library
xk+1 = Fk(xk) x(r+1)k+1 = Gk(x(r+1)
k ) + Fk(x(r)k )− Gk(x(r)
k )
Standard Application
Numerical Library
Communication Library
Standard Application
Numerical Library
Communication Library
Standard Application
Numerical Library
Communication Library
Standard Application
Numerical Library
Communication Library
Dependent Dependent Dependent
14
14
C++ template class: Parareal <T>
! To support implementation of your Parareal Program:! T is a class to represent or ! Implement methods in abstract classes:
! PararealElement<T>! PararealSequence<T>
! Instantiate! Parareal<T>
! Use the following methods:! T* pararealSequence(Seq, Niter)! T* exactSequence(Seq)
3. How to Implement Parareal
15
xk x(r)k
15
Actual Codes of Parareal<T>
! T* pararealSequence(T& seq, int Niter)
16
x(r+1)k+1 = Gk(x(r+1)
k ) + Fk(x(r)k )− Gk(x(r)
k )
Independent
Dependent
3. How to Implement Parareal
16
Parallel Implementation������,.
���� ���� ��� *������ �2 03%)$�,.�����"/03!���-(�+#�03!&�+#�
G G G
G G G
G G G
G G
G
G
G
G
F
G
Process 1:
Process 2:
Process 3:
Process 4:
Iterate r times
P resources
KTg + Tc(P − 1) rK(Tf + Tg)/P
F F
F F F
F F F
F F FComputational Time
K(Tg + Tf )
KTg + Tc(P − 1) + rK(Tg + Tf )/P
E. Aubanel, Parallel Computing 37, 172-182 (2011).
+#1'03!
3. How to Implement Parareal
! Pipeline-type parallel executions are available, since taking barrier over the whole nodes is not necessary.
Communication
17
17
Speedup Ratio
! The ideal speedup of this algorithm is 1/T.
! From the previous figure, the actual speedup ratio will be
whereand
! Schematic behaviour is likethe right figure.
S(r, K, P ) =K(Tg + Tf )
KTg + Tc(P − 1) +rK(Tf + Tg)
P
=P
r + TP +P (P − 1)
Kt
T ≡ Tg/(Tg + Tf )t ≡ Tc/(Tg + Tf )
Spee
dup
Rat
io
Total Number of Ranks (P)
Limit of Speedup
K>>P>>1 K~P
Max
. Effi
cien
cy (1
/r)
0
1/T
0 K
3. How to Implement Parareal
18
18
Further Speedup
! For large P (≈K), actual speedup ratios become lower than the ideal value 1/T. <== Communication time
! How can we reducethe communication time?
! It is difficult to reduce the timeitself, but we can overlap thecalculations and communica-tions, which leads to recoverthe ideal speed-up.
3. How to Implement Parareal
Spee
dup
Rat
io
Total Number of Ranks (P)
Limit of Speedup
K>>P>>1 K~P
Max
. Effi
cien
cy (1
/r)
0
1/T
0 K
19
������,.
���� ���� ��� *������ �2 03%)$�,.�����"/03!���-(�+#�03!&�+#�
G G G
G G G
G G G
G G
G
G
G
G
F
G
Process 1:
Process 2:
Process 3:
Process 4:
Iterate r times
P resources
KTg + Tc(P − 1) rK(Tf + Tg)/P
F F
F F F
F F F
F F FComputational Time
K(Tg + Tf )
KTg + Tc(P − 1) + rK(Tg + Tf )/P
E. Aubanel, Parallel Computing 37, 172-182 (2011).
+#1'03!
19
Pipeline Pattern andNon-Blocking Communications! As a future work, non-blocking communications are used
to improve total performance of the Parareal-in-Time.! “Pipeline Interface and Library” will be written by sparse-
collective communications with reduction calculations.
3. How to Implement Parareal
20
work1
work1
work1
work2
work2
work2
resource1
resource2
resource3
20
4. Conclusion
! Through perturbative description of the “Parareal”,! We showed that various dependent iterations in scientific
problems are in the range of the “Parareal-in-Time” method.! Template class Parareal<T> for C++ is available to
implement “Parareal-in-Time” calculations.! Since, in general, it is difficult to parallelize in an upper layer,
we provide a useful library for programming.! Future Work:
! non-blocking communications can be used to hide the communication time and improve total performance
21
21
As the Third Scheme forMassively Parallel Computing! As one of the typical schemes for massively parallel computing?
SPMD-type Master-Worker type
WorkerWorkerWorkerWorker
MasterApplication
with Domain Decomposition
Numerical Library (Linear, FFT, ...)
Para
llel
Com
pone
nt
Para
llel
Com
pone
nt
22
Pipeline type
wor
k 3
wor
k 6
wor
k 2
wor
k 5
wor
k 8
wor
k 1
wor
k 4
wor
k 7
22
Thank you for your attention
! This work is supported by JST, CREST.
23
23