
Page 1: Outline (dogra.csis.oita-u.ac.jp/tkm/study/slides/20120719-CSC12.pdf)

Jul. 16-19, 2012 CSC’12, Las Vegas

Dependent Iterations in Scientific Calculations Parallelized by the Parareal-in-Time Algorithm

Toshiya Takami, Kyushu University, Japan
In collaboration with: K. Fukazawa, H. Honda, Y. Inadomi, R. Susukita, T. Kobayashi, and T. Nanri

Outline

- 1. Introduction
  - Scientific Computing on Massively Parallel Machines
  - Dependent Calculations in Scientific Computing
- 2. “Parareal” Algorithm
  - Parareal-in-Time as Higher-Order Perturbation
- 3. How to Implement “Parareal” Calculations
  - Template class: Parareal<T>
  - Non-blocking Communications for Further Speed-up
- 4. Conclusion

1. Introduction

- Massively Parallel Computers
  - Almost all machines in the Top500 List
- Large problems with many independent components
  - SPMD, Master-Worker, Many Task Computing, etc.
  - Dependency can be resolved by communications.
- Relatively small problems
  - Small systems, small matrices ==> low performance
  - Sequential calculations ==> cannot be parallelized

Parallel Application Programs

1. Introduction

- Roughly speaking, applications are usually configured in one of these types:
  - SPMD-type configuration: an application program built on a parallelized numerical library (linear algebra, FFT, ...).
  - Master-Worker type: a master application with domain decomposition distributing work to workers, each using a numerical library (linear algebra, FFT, ...).

[Figure: the SPMD-type and Master-Worker configurations, each running on a set of parallel resources.]


Dependent Calculations in Scientific Computing

1. Introduction

- Dependent calculations:
  - Time evolutions in the initial value problem: simulations by explicit/implicit solvers
  - Convergent sequences of state vectors: iterative calculations for linear/nonlinear equations; Krylov subspace methods for linear problems

[Figure: domain decomposition in the time direction; starting from x_0 at t = 0, the chain x_0 -> x_1 -> ... -> x_k is generated step by step, x_{k+1} = F_k(x_k), by the solvers F_0, F_1, ..., F_{k-1}.]

2. “Parareal-in-Time” Algorithm

- Even strictly dependent calculations, x_{k+1} = F_k(x_k), can be parallelized by the use of the Parareal iteration

  x^{(r+1)}_{k+1} = G_k(x^{(r+1)}_k) + F_k(x^{(r)}_k) - G_k(x^{(r)}_k)

  - x_k: the state vector at the k-th time step
  - x^{(r)}_k: the r-th approximation of the exact state x_k
  - F_k(x): the exact (fine) solver (iterator)
  - G_k(x): an approximate (coarse) solver for F_k(x)
- Lions et al. showed that real-time acceleration is possible if the coarse solver is fast enough and the iteration rapidly converges to the exact solution.

J-L. Lions, Y. Maday, and G. Turinici, C. R. Acad. Sci., Ser. I: Math. 332, 661-668 (2001).

What is Parareal-in-Time?

2. Parareal Algorithm & Applications

- “Its main goal concerns real time problems, hence proposed terminology of ‘parareal’ algorithm”
- “Parareal is not the first algorithm to propose the solution of evolution problems in a time-parallel fashion”:
  - Multiple Shooting Methods with Newton’s Iteration
  - Space-Time Multigrid Method

J-L. Lions, Y. Maday, and G. Turinici, C. R. Acad. Sci., Ser. I: Math. 332, 661-668 (2001).
M. J. Gander and S. Vandewalle, SIAM J. Sci. Comput. 29, 556-578 (2007).

Parareal Algorithm as a Newton’s Iteration (1)

2. Parareal Algorithm & Applications

- Consider the procedure as a Newton’s iteration: the original problem x_{k+1} = F_k(x_k) is turned into the Parareal-in-Time algorithm x^{(r+1)}_{k+1} = G_k(x^{(r+1)}_k) + F_k(x^{(r)}_k) - G_k(x^{(r)}_k).
- Consider the nonlinear equation f(X^{(r)}) = 0, where

  f(X^{(r)}) \equiv \begin{pmatrix} x^{(r)}_0 - x_0 \\ x^{(r)}_1 - F_0(x^{(r)}_0) \\ \vdots \\ x^{(r)}_k - F_{k-1}(x^{(r)}_{k-1}) \end{pmatrix}

- Newton’s iteration for this problem:

  X^{(r+1)} = X^{(r)} - [f'(X^{(r)})]^{-1} f(X^{(r)}),

  with the lower-bidiagonal Jacobian

  f'(X^{(r)}) = \begin{pmatrix} 1 & & & \\ -F'_0(x^{(r)}_0) & 1 & & \\ & \ddots & \ddots & \\ & & -F'_{k-1}(x^{(r)}_{k-1}) & 1 \end{pmatrix}

M. J. Gander and S. Vandewalle, SIAM J. Sci. Comput. 29, 556-578 (2007).


Parareal Algorithm as a Newton’s Iteration (2)

2. Parareal Algorithm & Applications

- After some calculations, we obtain the multiple-shooting relations

  x^{(r)}_0 = x_0,
  x^{(r+1)}_{k+1} = F_k(x^{(r)}_k) + F'_k(x^{(r)}_k) (x^{(r+1)}_k - x^{(r)}_k).

- Substituting the Jacobian term by the coarse solver,

  F'_k(x^{(r)}_k) (x^{(r+1)}_k - x^{(r)}_k) ≈ G_k(x^{(r+1)}_k) - G_k(x^{(r)}_k),

  we obtain the Parareal-in-Time iteration

  x^{(r+1)}_{k+1} = G_k(x^{(r+1)}_k) + F_k(x^{(r)}_k) - G_k(x^{(r)}_k).

Applications in Early Years

2. Parareal Algorithm & Applications

- Various time-evolution problems in scientific computing have been parallelized by this method:
  - PDE with a fluid/structure coupling: C. Farhat and M. Chandesris, Int. J. Numer. Meth. Engng. 58, 1397-1434 (2003).
  - MD with quantum interactions: L. Baffico, S. Bernard, Y. Maday, G. Turinici, and G. Zérah, Phys. Rev. E 66, 057701 (2002).
  - Quantum optimal control problems: Y. Maday and G. Turinici, Int. J. Quant. Chem. 93, 223-228 (2003).
- The coarse solver is built by widening the time step: for x(t + Δt) = f_Δt(x(t)), take F_k = [f_Δt]^n and G_k ≡ f_{nΔt}.

Parareal Algorithm as a Higher-Order Perturbation

2. Parareal Algorithm & Applications

- ‘Perturbation’ is a basic concept in science: an unperturbed description (simple, fast: G_k) plus a complicated perturbation (accurate but time-consuming: F_k - G_k).
- For linear problems we write

  x_{k+1} = F_k x_k = [G_k + (F_k - G_k)] x_k.

- If we introduce the perturbed sequence {x^{(r)}_k}, we obtain the ‘Parareal relation’, valid up to r-th order:

  x^{(r+1)}_{k+1} = [G_k + (F_k - G_k)] x^{(r+1)}_k
                  = G_k x^{(r+1)}_k + (F_k - G_k) x^{(r+1)}_k
                  ≈ G_k x^{(r+1)}_k + (F_k - G_k) x^{(r)}_k.

Scientific Computing

2. Parareal Algorithm & Applications

- Expensive interactions are introduced as perturbations:
  - Molecular Dynamics with long-range interactions: short-range MD + Ewald or FMM.
  - Quantum systems (driven by external fields):

    |ψ'> = exp(Δt H / (iħ)) |ψ> = [ I + ( exp(Δt H / (iħ)) - I ) ] |ψ>,

    with the identity as the unperturbed part.
- Non time-evolution problems:
  - Dependent linear calculus: general linear transformations, x_{k+1} = A_0 + B_k x_k.
  - Convergent sequences: nonlinear SCF iterations, FC = SCε (Hartree-Fock-Roothaan equation).


Speed-up Ratio in Parallel Computing

- Expected speed-up (dashed line in the figure):

  S(r, K, P) = P / ( r + T P + P (P - 1) t / K )

- Colored lines: measured values of S(r, K, P) for matrix multiplications, at r = 10, 20, 40.
  - For the large problem (N = 8192): linear speed-up.
  - For the small problem (N = 1024): saturated.

[Figure: speed-up ratio vs. total number of ranks (log-log axes, 1 to 1000 ranks, ratios 0.2 to 100), for N = 1024 and N = 8192, against the linear speed-up line.]

T. Takami and A. Nishida, Adv. Par. Comp. 22, 437-444 (2012).

Parareal-in-Time Algorithm

3. How to Implement “Parareal”

- This algorithm should be parallelized in a relatively high layer of the software stack: the dependent iteration x_{k+1} = F_k(x_k) is replaced by x^{(r+1)}_{k+1} = G_k(x^{(r+1)}_k) + F_k(x^{(r)}_k) - G_k(x^{(r)}_k).
- Useful frameworks or interfaces are required.

[Figure: a standard application stack (Application / Numerical Library / Communication Library) replicated across processes, with dependencies between neighboring replicas.]

C++ template class: Parareal<T>

3. How to Implement Parareal

- To support implementation of your Parareal program:
  - T is a class representing x_k or x^{(r)}_k.
  - Implement the methods of the abstract classes PararealElement<T> and PararealSequence<T>.
  - Instantiate Parareal<T>.
  - Use the following methods:
    - T* pararealSequence(Seq, Niter)
    - T* exactSequence(Seq)

Actual Codes of Parareal<T>

3. How to Implement Parareal

- T* pararealSequence(T& seq, int Niter) implements the iteration

  x^{(r+1)}_{k+1} = G_k(x^{(r+1)}_k) + F_k(x^{(r)}_k) - G_k(x^{(r)}_k),

  where the F_k(x^{(r)}_k) terms are independent and the G_k(x^{(r+1)}_k) terms are dependent.


Parallel Implementation

3. How to Implement Parareal

- Pipeline-type parallel executions are available, since taking a barrier over the whole set of nodes is not necessary.

[Figure: pipeline schedule over processes 1-4; each process runs coarse steps G, communicates its result to the next process, and applies the fine solver F. The startup phase costs K Tg + Tc (P - 1), the r Parareal iterations cost r K (Tf + Tg) / P, so the total computational time is K Tg + Tc (P - 1) + r K (Tg + Tf) / P, against the sequential time K (Tg + Tf) on P resources.]

E. Aubanel, Parallel Computing 37, 172-182 (2011).

Speedup Ratio

3. How to Implement Parareal

- The ideal speedup of this algorithm is 1/T.
- From the previous figure, the actual speedup ratio will be

  S(r, K, P) = K (Tg + Tf) / ( K Tg + Tc (P - 1) + r K (Tf + Tg) / P )
             = P / ( r + T P + P (P - 1) t / K ),

  where T ≡ Tg / (Tg + Tf) and t ≡ Tc / (Tg + Tf).
- The schematic behaviour is shown in the figure.

[Figure: speedup ratio vs. total number of ranks P; for K >> P >> 1 the efficiency approaches the maximum 1/r, while for P ~ K the ratio saturates below the limit of speedup 1/T.]

Further Speedup

3. How to Implement Parareal

- For large P (≈ K), actual speedup ratios become lower than the ideal value 1/T, because of the communication time.
- How can we reduce the communication time? It is difficult to reduce the time itself, but we can overlap the calculations and communications, which recovers the ideal speed-up.

[Figure: the same speedup plot and pipeline schedule as on the previous slides.]

E. Aubanel, Parallel Computing 37, 172-182 (2011).

Pipeline Pattern and Non-Blocking Communications

3. How to Implement Parareal

- As future work, non-blocking communications will be used to improve the total performance of Parareal-in-Time.
- A “Pipeline Interface and Library” will be written with sparse collective communications and reduction calculations.

[Figure: works 1 and 2 pipelined across resources 1-3.]


4. Conclusion

- Through the perturbative description of the “Parareal”:
  - We showed that various dependent iterations in scientific problems are within the range of the “Parareal-in-Time” method.
  - The template class Parareal<T> for C++ is available to implement “Parareal-in-Time” calculations.
  - Since, in general, it is difficult to parallelize in an upper layer, we provide a useful library for programming.
- Future work: non-blocking communications can be used to hide the communication time and improve total performance.

As the Third Scheme for Massively Parallel Computing

- Is the pipeline type one of the typical schemes for massively parallel computing, alongside the SPMD type and the Master-Worker type?

[Figure: SPMD-type (parallel components over a numerical library), Master-Worker type (a master with domain decomposition and workers), and Pipeline type (works 1-8 staged across resources).]

Thank you for your attention

- This work is supported by JST CREST.