On Numerical Properties of Accelerated Multiple Precision Implicit Runge-Kutta Methods Shizuoka Institute of Science and Technology Tomonori Kouya http://na-inet.jp/na/birk/ SciCADE2013 in Valladolid, SPAIN 2013-09-16(Mon) – 20(Fri)

Upload: shizuoka-inst-science-and-tech

Post on 07-Jul-2015


DESCRIPTION

We have implemented a multiple precision ODE solver based on high-order fully implicit Runge-Kutta (IRK) methods. The solver supports Gauss-type formulas of any order and is accelerated by (1) using MPFR as the multiple precision floating-point arithmetic library, (2) the real tridiagonalization supported in SPARK3 for the linear equations solved in the simplified Newton method (the inner iteration), (3) the mixed precision iterative refinement method (cf. Buttari et al., 2007), (4) parallelization with OpenMP, and (5) embedded formulas for IRK methods. In this talk, we explain why we adopted these accelerations and show the efficiency of the ODE solver through numerical experiments such as the Kuramoto-Sivashinsky equation.

TRANSCRIPT

Page 1: Talk at SciCADE2013 about "Accelerated Multiple Precision ODE solver base on Fully Implicit Runge-Kutta Methods"

On Numerical Properties of Accelerated Multiple Precision Implicit Runge-Kutta Methods

Shizuoka Institute of Science and Technology
Tomonori Kouya

http://na-inet.jp/na/birk/

SciCADE2013 in Valladolid, SPAIN
2013-09-16 (Mon) – 20 (Fri)

Abstract

Abstract

Motivation

IRK method with simplified Newton method

Acceleration of inner iteration and stepsize selection

Performance check by solving linear ODE

Numerical experiments of Evolutionary PDEs

Conclusion and Future work

Motivation

BNCpack

▶ provides double and multiple precision numerical algorithms based on MPFR/GMP.

▶ has simple explicit and implicit Runge-Kutta (IRK) methods and extrapolation methods for solving ODEs.

At SciCADE 2007, a gentleman suggested to us that the Kuramoto-Sivashinsky (K-S) equation is well suited to our multiple precision ODE solvers because it is a chaotic, stiff and large-scale example of an ODE system.

Accelerated multiple precision IRK methods based on MPFR/GMP are necessary to solve it.

The Features of Accelerated IRK methods

1. It uses the Gauss formula, which has order 2m for m stages, is A- and P-stable, and is a symplectic method.

2. Supporting the mixed precision iterative refinement method in the simplified Newton iteration of the IRK process can drastically reduce computational time.

3. Parallelization with OpenMP can raise performance further.

IVP of the n-dimensional ODE to be solved

$$
\begin{cases}
\dfrac{dy}{dt} = f(t, y) \in \mathbb{R}^n \\
y(t_0) = y_0
\end{cases}
\qquad \text{Integration interval: } [t_0, \alpha] \tag{1}
$$

We suppose that the above ODE has a unique solution, so there exists a Lipschitz constant $L > 0$ such that

$$
\|f(t, v) - f(t, w)\| \le L \|v - w\| \tag{2}
$$

for all $v, w \in \mathbb{R}^n$ and all $t \in [t_0, \alpha]$.

The 1D Brusselator problem and the K-S equation have a large $L \gg 1$, so they are called "stiff problems."

Skeleton of m-stage IRK methods

Discretization: $t_0,\ t_1 := t_0 + h_0,\ \ldots,\ t_{k+1} := t_k + h_k,\ \ldots$

When we calculate the approximation $y_{k+1} \approx y(t_{k+1})$ from the previous $y_k \approx y(t_k)$, the following two steps are executed:

(A) Inner iteration: Solve the nonlinear equation for the unknown $Y = [Y_1\ \ldots\ Y_m]^T \in \mathbb{R}^{mn}$.

$$
\left\{
\begin{aligned}
Y_1 &= y_k + h_k \sum_{j=1}^{m} a_{1j} f(t_k + c_j h_k, Y_j) \\
&\ \ \vdots \\
Y_m &= y_k + h_k \sum_{j=1}^{m} a_{mj} f(t_k + c_j h_k, Y_j)
\end{aligned}
\right.
\iff F(Y) = 0 \tag{3}
$$

(B) Calculate the next approximation $y_{k+1}$ with the above $Y$.

$$
y_{k+1} := y_k + h_k \sum_{j=1}^{m} b_j f(t_k + c_j h_k, Y_j)
$$
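The two steps (A) and (B) can be sketched for a scalar ODE as follows. This is only a minimal illustration in double precision: it assumes the 2-stage Gauss formula (order 4) and, for brevity, replaces the inner solver by a plain fixed-point iteration, which is not the simplified Newton method used in the talk and converges only for non-stiff problems with small $h_k$. The function name `irk_step` and the parameter `sweeps` are hypothetical.

```python
import math

S3 = math.sqrt(3.0)
# 2-stage Gauss coefficients (order 4); chosen here only for illustration.
A = [[0.25, 0.25 - S3 / 6], [0.25 + S3 / 6, 0.25]]
b = [0.5, 0.5]
c = [0.5 - S3 / 6, 0.5 + S3 / 6]


def irk_step(f, t, y, h, sweeps=50):
    """One IRK step for a scalar ODE y' = f(t, y)."""
    m = len(b)
    Y = [y] * m  # initial guess for the stage values
    for _ in range(sweeps):
        # (A) fixed-point sweep: Y_i <- y_k + h * sum_j a_ij f(t_k + c_j h, Y_j)
        fY = [f(t + c[j] * h, Y[j]) for j in range(m)]
        Y = [y + h * sum(A[i][j] * fY[j] for j in range(m)) for i in range(m)]
    # (B) y_{k+1} := y_k + h * sum_j b_j f(t_k + c_j h, Y_j)
    fY = [f(t + c[j] * h, Y[j]) for j in range(m)]
    return y + h * sum(b[j] * fY[j] for j in range(m))


y1 = irk_step(lambda t, y: -y, 0.0, 1.0, 0.1)
print(y1, math.exp(-0.1))
```

For a stiff problem the fixed-point sweep diverges unless $h_k$ is very small, which is the reason a Newton-type inner iteration is used instead.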

Coefficients of the m-stage Runge-Kutta method

We use IRK coefficients such as:

$$
\begin{array}{c|ccc}
c_1 & a_{11} & \cdots & a_{1m} \\
\vdots & \vdots & & \vdots \\
c_m & a_{m1} & \cdots & a_{mm} \\
\hline
 & b_1 & \cdots & b_m
\end{array}
=
\begin{array}{c|c}
c & A \\
\hline
 & b^T
\end{array}
\tag{4}
$$

Our IRK solver uses only the Gauss formula family, which is one of the fully implicit RK formulas ($a_{ij} \neq 0$ for $i \le j$).

Simplified Newton method as inner iteration of the IRK method

RADAU5 (by Hairer) and SPARK3 (by Jay) use the simplified Newton method as the inner iteration to solve the nonlinear equation (3).

Simplified Newton Method:

$$
Y_{l+1} := Y_l - (I_m \otimes I_n - h_k A \otimes J)^{-1} F(Y_l) \tag{5}
$$

where $I_n$ and $I_m$ are the $n \times n$ and $m \times m$ identity matrices, respectively, and $J = \partial f / \partial y\,(t_k, y_k) \in \mathbb{R}^{n \times n}$ is the Jacobian matrix of $f$.

⇒ We must solve the following linear equation in each iteration of the simplified Newton method (5):

$$
(I_m \otimes I_n - h_k A \otimes J) Z = -F(Y_l) \tag{6}
$$

and then, with the solution $Z$, calculate $Y_{l+1} := Y_l + Z$.
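A minimal double precision sketch of iteration (5)-(6) for a scalar ODE ($n = 1$) is shown below, so that $I_m \otimes I_n - h_k A \otimes J$ reduces to the $2 \times 2$ matrix $I_2 - h_k J A$. It assumes the 2-stage Gauss formula and solves (6) by Cramer's rule; the function name `simplified_newton_step` and its tolerance parameters are hypothetical and not taken from BNCpack.

```python
import math

S3 = math.sqrt(3.0)
# 2-stage Gauss coefficients (assumed for this illustration).
A = [[0.25, 0.25 - S3 / 6], [0.25 + S3 / 6, 0.25]]
b = [0.5, 0.5]
c = [0.5 - S3 / 6, 0.5 + S3 / 6]


def simplified_newton_step(f, dfdy, t, y, h, tol=1e-14, max_iter=50):
    """One IRK step for scalar y' = f(t, y) with a frozen Jacobian."""
    J = dfdy(t, y)  # Jacobian evaluated once at (t_k, y_k)
    # M = I_2 - h * J * A (Kronecker products collapse because n = 1)
    M = [[1 - h * J * A[0][0], -h * J * A[0][1]],
         [-h * J * A[1][0], 1 - h * J * A[1][1]]]
    det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
    Y = [y, y]
    for _ in range(max_iter):
        fY = [f(t + c[j] * h, Y[j]) for j in range(2)]
        # Residual F(Y)_i = Y_i - y_k - h * sum_j a_ij f(t_k + c_j h, Y_j)
        F = [Y[i] - y - h * sum(A[i][j] * fY[j] for j in range(2))
             for i in range(2)]
        # Solve M Z = -F by Cramer's rule, then update Y := Y + Z
        Z0 = (-F[0] * M[1][1] + F[1] * M[0][1]) / det
        Z1 = (-F[1] * M[0][0] + F[0] * M[1][0]) / det
        Y = [Y[0] + Z0, Y[1] + Z1]
        if max(abs(Z0), abs(Z1)) <= tol:
            break
    fY = [f(t + c[j] * h, Y[j]) for j in range(2)]
    return y + h * sum(b[j] * fY[j] for j in range(2))


# Stiff linear test problem y' = -1000 y with a step far beyond the
# stability limit of explicit methods.
y_stiff = simplified_newton_step(lambda t, y: -1000.0 * y,
                                 lambda t, y: -1000.0, 0.0, 1.0, 0.1)
print(y_stiff)
```

Because the matrix is built once per step and reused in every iteration, a real implementation would factorize it once; the reductions discussed next aim at making that factorization cheap.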

Why do we select SPARK3 reduction, not RADAU5?

RADAU5: Complex diagonalization of $A$ by a complex similarity transformation matrix $S$

$$
(S \otimes I_n)(I_m \otimes I_n - hA \otimes J)(S^{-1} \otimes I_n) = I_m \otimes I_n - h\Lambda \otimes J =
\begin{bmatrix}
I_n - h\lambda_1 J & & \\
 & \ddots & \\
 & & I_n - h\lambda_m J
\end{bmatrix}.
$$

SPARK3: Real tridiagonalization of $A$ by a real similarity transformation matrix $W$

$$
X = W^T B A W =
\begin{bmatrix}
1/2 & -\zeta_1 & & & \\
\zeta_1 & 0 & \ddots & & \\
 & \ddots & \ddots & -\zeta_{m-2} & \\
 & & \zeta_{m-2} & 0 & -\zeta_{m-1} \\
 & & & \zeta_{m-1} & 0
\end{bmatrix}
$$

where

$$
W = [w_{ij}] = [P_{j-1}(c_i)] \quad (i, j = 1, 2, \ldots, m), \qquad
\zeta_i = \left(2\sqrt{4i^2 - 1}\right)^{-1} \quad (i = 1, 2, \ldots, m-1),
$$

$$
B = \mathrm{diag}(b), \qquad I_m = W^T B W = \mathrm{diag}(1\ 1\ \cdots\ 1).
$$
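The tridiagonal form can be checked numerically for the smallest case. The sketch below assumes $m = 2$ (2-stage Gauss formula) and takes $P_k$ to be the orthonormal shifted Legendre polynomials on $[0, 1]$, i.e. $P_k(x) = \sqrt{2k+1}\,\mathcal{L}_k(2x - 1)$ with $\mathcal{L}_k$ the classical Legendre polynomial; the slide does not define $P_k$, so this choice is an assumption, as are all helper names.

```python
import math


def shifted_legendre(k, x):
    """Orthonormal shifted Legendre polynomial P_k on [0, 1] (assumed P_k)."""
    s = 2.0 * x - 1.0
    if k == 0:
        return 1.0
    prev, cur = 1.0, s  # classical Legendre L_0(s), L_1(s)
    for i in range(1, k):
        # three-term recurrence (i+1) L_{i+1} = (2i+1) s L_i - i L_{i-1}
        prev, cur = cur, ((2 * i + 1) * s * cur - i * prev) / (i + 1)
    return math.sqrt(2 * k + 1) * cur


def matmul(P, Q):
    return [[sum(P[i][k] * Q[k][j] for k in range(len(Q)))
             for j in range(len(Q[0]))] for i in range(len(P))]


def transpose(P):
    return [list(row) for row in zip(*P)]


s3 = math.sqrt(3.0)
c = [0.5 - s3 / 6, 0.5 + s3 / 6]
b = [0.5, 0.5]
A = [[0.25, 0.25 - s3 / 6], [0.25 + s3 / 6, 0.25]]
m = len(c)

W = [[shifted_legendre(j, c[i]) for j in range(m)] for i in range(m)]
B = [[b[i] if i == j else 0.0 for j in range(m)] for i in range(m)]
WtB = matmul(transpose(W), B)

X = matmul(matmul(WtB, A), W)   # should be [[1/2, -zeta_1], [zeta_1, 0]]
I2 = matmul(WtB, W)             # should be the identity matrix
zeta1 = 1.0 / (2.0 * math.sqrt(4 * 1 ** 2 - 1))
print(X, zeta1)
```

Since $W^T B W = I_m$, the inverse of $W$ is simply $W^T B$, so no matrix inversion is needed to apply the transformation.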

Condition numbers of the two kinds of similarity transformation matrices

m         3      5      10        15        20         50
κ∞(S)     22.0   388    3.×10^5   3.×10^8   2.×10^11   4.×10^28
κ∞(W)     3.24   6.27   16.4      29.3      44.5       172

▶ The condition number of RADAU5's S, κ∞(S) = ∥S∥∞∥S^{-1}∥∞, grows rapidly with the number of stages of the IRK formula.

▶ The condition number of SPARK3's W, κ∞(W) = ∥W∥∞∥W^{-1}∥∞, grows only mildly.

=⇒ SPARK3 reduction is the only practical choice for IRK formulas with many stages.
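As a small cross-check of the κ∞(W) row, the $m = 3$ entry can be recomputed in double precision. The sketch assumes the 3-stage Gauss nodes and weights, takes $P_k$ to be the orthonormal shifted Legendre polynomials (an assumption, since the slides do not define them), and uses $W^{-1} = W^T B$, which follows from $W^T B W = I_m$. Helper names are hypothetical.

```python
import math


def shifted_legendre(k, x):
    """Orthonormal shifted Legendre polynomial P_k on [0, 1] (assumed P_k)."""
    s = 2.0 * x - 1.0
    if k == 0:
        return 1.0
    prev, cur = 1.0, s
    for i in range(1, k):
        prev, cur = cur, ((2 * i + 1) * s * cur - i * prev) / (i + 1)
    return math.sqrt(2 * k + 1) * cur


def norm_inf(M):
    """Maximum absolute row sum."""
    return max(sum(abs(v) for v in row) for row in M)


# 3-stage Gauss nodes and weights on [0, 1]
s15 = math.sqrt(15.0)
c = [0.5 - s15 / 10, 0.5, 0.5 + s15 / 10]
b = [5 / 18, 4 / 9, 5 / 18]
m = len(c)

W = [[shifted_legendre(j, c[i]) for j in range(m)] for i in range(m)]
# W^{-1} = W^T B, i.e. entry (j, i) equals W[i][j] * b[i]
W_inv = [[W[i][j] * b[i] for i in range(m)] for j in range(m)]

kappa = norm_inf(W) * norm_inf(W_inv)
print(round(kappa, 2))
```

Under these assumptions the result rounds to the tabulated value 3.24 for $m = 3$.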

SPARK3 Reduction (1/3)

The coefficient matrix of the linear equation for SPARK3 reduction is:

$$
(W^T B \otimes I_n)(I_m \otimes I_n - h_k A \otimes J)(W \otimes I_n)
= I_m \otimes I_n - h_k X \otimes J =
\begin{bmatrix}
E_1 & F_1 & & & \\
G_1 & E_2 & F_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & G_{m-2} & E_{m-1} & F_{m-1} \\
 & & & G_{m-1} & E_m
\end{bmatrix}
$$

where

$$
E_1 = I_n - \frac{1}{2} h_k J, \qquad E_2 = \cdots = E_m = I_n,
$$

$$
F_i = h_k \zeta_i J, \qquad G_i = -h_k \zeta_i J \quad (i = 1, 2, \ldots, m-1).
$$

SPARK3 Reduction (2/3)

Jay proposed a left preconditioner matrix $P$ for the linear solver such as:

$$
P =
\begin{bmatrix}
E_1 & F_1 & & & \\
G_1 & E_2 & F_2 & & \\
 & \ddots & \ddots & \ddots & \\
 & & G_{m-2} & E_{m-1} & F_{m-1} \\
 & & & G_{m-1} & E_m
\end{bmatrix}
\approx I_m \otimes I_n - h_k X \otimes J
$$

so the preconditioned linear equation to be solved for $Z$ is

$$
P^{-1}(I_m \otimes I_n - h_k X \otimes J) Z = P^{-1}(W^T B \otimes I_n)(-F(Y)).
$$

SPARK3 Reduction (3/3)

We use the LU-decomposed $P$ such as

$$
P =
\begin{bmatrix}
I_n & & & & \\
G_1 H_1^{-1} & I_n & & & \\
 & \ddots & \ddots & & \\
 & & G_{m-2} H_{m-2}^{-1} & I_n & \\
 & & & G_{m-1} H_{m-1}^{-1} & I_n
\end{bmatrix}
\times
\begin{bmatrix}
H_1 & F_1 & & & \\
 & H_2 & F_2 & & \\
 & & \ddots & \ddots & \\
 & & & H_{m-1} & F_{m-1} \\
 & & & & H_m
\end{bmatrix}
$$

where

$$
H_i := I_n - (2(2i - 1))^{-1} h_k J \quad (i = 1, 2, \ldots, m).
$$

(cf.) "A Parallelizable Preconditioner for the Iterative Solution of Implicit Runge-Kutta-type Methods", Journal of Computational and Applied Mathematics 111 (1999), pp. 63-76.

Mixed precision iterative refinement method

The mixed precision iterative refinement method reduces computational cost by combining short S-digit arithmetic with long L-digit arithmetic (S << L).

The linear equation to be solved: $Cx = d$, $C \in \mathbb{R}^{N \times N}$, $d, x \in \mathbb{R}^N$

=⇒

(S) Solve $C x_0 = d$ for $x_0$.

For $\nu = 0, 1, 2, \ldots$

  (L) $r_\nu := d - C x_\nu$
  (S) $r'_\nu := r_\nu / \|r_\nu\|$
  (S) Solve $C z = r'_\nu$ for $z$.
  (L) $x_{\nu+1} := x_\nu + \|r_\nu\| z$
  Check convergence.

=⇒ $x := x_{\nu_{stop}}$

(cf.) Buttari, Alfredo, et al. International Journal of High Performance Computing Applications 21.4 (2007): 457-466.
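The refinement loop can be sketched with Python's standard library, using `float` (double precision) as the short arithmetic and `decimal.Decimal` with 50 digits as a stand-in for the long MPFR arithmetic. This is only an illustrative sketch: the function names, the tolerance, the maximum norm and the small test system are assumptions, and the short-precision solve repeats a full Gaussian elimination instead of reusing an LU factorization as a practical solver would.

```python
from decimal import Decimal, getcontext

getcontext().prec = 50  # "long" arithmetic: 50 decimal digits


def solve_double(C, d):
    """Gaussian elimination with partial pivoting in double precision (S)."""
    n = len(d)
    M = [[float(v) for v in C[i]] + [float(d[i])] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda i: abs(M[i][k]))
        M[k], M[p] = M[p], M[k]
        for i in range(k + 1, n):
            factor = M[i][k] / M[k][k]
            for j in range(k, n + 1):
                M[i][j] -= factor * M[k][j]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = s / M[i][i]
    return x


def residual(C, d, x):
    """Residual d - C x evaluated in long precision (L)."""
    n = len(d)
    return [d[i] - sum(C[i][j] * x[j] for j in range(n)) for i in range(n)]


def refine(C, d, tol=Decimal("1e-45"), max_iter=20):
    n = len(d)
    x = [Decimal(v) for v in solve_double(C, d)]      # (S) initial solve
    for it in range(max_iter):
        r = residual(C, d, x)                          # (L) residual
        norm = max(abs(v) for v in r)
        if norm <= tol:
            return x, it
        r_scaled = [float(v / norm) for v in r]        # (S) normalized residual
        z = solve_double(C, r_scaled)                  # (S) correction
        x = [x[i] + norm * Decimal(z[i]) for i in range(n)]  # (L) update
    return x, max_iter


C = [[Decimal(4), Decimal(1), Decimal(0)],
     [Decimal(1), Decimal(3), Decimal(1)],
     [Decimal(0), Decimal(1), Decimal(2)]]
d = [Decimal(1), Decimal(2), Decimal(3)]
x, iters = refine(C, d)
print(iters, x)
```

Normalizing the residual before converting it to `float` keeps it within the double precision exponent range even when $\|r_\nu\|$ is far below the smallest representable double.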

The whole algorithm of the accelerated IRK method

Initial guess: $Y_{-1} \in \mathbb{R}^{mn}$

For $l = 0, 1, 2, \ldots$ (simplified Newton iteration)

  (1) $Y_l := [Y_1^{(l)}\ Y_2^{(l)}\ \ldots\ Y_m^{(l)}]^T$
  (2) $C := I_m \otimes I_n - h_k X \otimes J$; compute $\|C\|_F$
  (3) $d := (W^T B \otimes I_n)(-F(Y_l))$
  (4) Solve $C x_0 = d$ for $x_0$ (S)

  For $\nu = 0, 1, 2, \ldots$ (mixed precision iterative refinement)

    (5) $r_\nu := d - C x_\nu$
    (6-1) $r'_\nu := r_\nu / \|r_\nu\|$ (S)
    (6-2) Solve $C z = r'_\nu$ for $z$ (S)
    (6-3) $x_{\nu+1} := x_\nu + \|r_\nu\| z$
    (6-4) Check convergence $\Rightarrow x_{\nu_{stop}}$

  (7) $Y_{l+1} := Y_l + (W \otimes I_n) x_{\nu_{stop}}$

  Check convergence $\Rightarrow Y_{l_{stop}}$

$Y := Y_{l_{stop}} = [Y_1\ Y_2\ \ldots\ Y_m]^T$

$$
y_{k+1} := y_k + h_k \sum_{j=1}^{m} b_j f(t_k + c_j h_k, Y_j)
$$

Computational environment

H/W  Intel Core i7 3820 (4 cores) 3.6GHz + 64GB RAM
OS   Scientific Linux 6.3 x86_64
S/W  Intel C++ 13.0.1, MPFR 3.1.1/GMP 5.1.1, BNCpack 0.8

▶ OpenMP as provided by Intel C++.

▶ Block parallelization of the parallelizable parts of the IRK methods,

▶ except the left preconditioning and the direct method.

Performance check by solving a 128-dimensional constant linear ODE (50 decimal digits)

[Figure: relative error and computational time (s) plotted against the number of stages m = 3, ..., 12. Series: Iter.Ref-DM, W-Trans., W-Iter.Ref-MM, W-Iter.Ref-DM, Max.Rel.Err.]

Iter.Ref-DM: No reduction + quasi-Newton + double precision (DP)-multiple precision (MP) mixed precision iterative refinement method (based on direct method)

W-Trans.: SPARK3 reduction + MP direct method

W-Iter.Ref-MM: SPARK3 + MP (S = L/2)-MP iterative refinement

W-Iter.Ref-DM: SPARK3 + DP-MP iterative refinement

Stepsize selection by embedded formula (1/2)

Embedded formula for IRK methods (by Hairer): the following $(m+1)$-stage IRK formula for a given constant $\gamma_0$:

$$
\begin{array}{c|cc}
0 & 0 & 0^T \\
c & 0 & A \\
\hline
 & \gamma_0 & \hat{b}^T
\end{array}
$$

To enlarge the A-stability region, we select $\gamma_0 = 1/8$, where $\hat{b} = [\hat{b}_1\ \cdots\ \hat{b}_m]^T$ is obtained by solving the following linear equation so that the simplifying assumption $B(m)$ is satisfied:

$$
\begin{bmatrix}
1 & \cdots & 1 \\
c_1 & \cdots & c_m \\
\vdots & & \vdots \\
c_1^{m-1} & \cdots & c_m^{m-1}
\end{bmatrix}
\begin{bmatrix}
\hat{b}_1 \\ \hat{b}_2 \\ \vdots \\ \hat{b}_m
\end{bmatrix}
=
\begin{bmatrix}
1 - \gamma_0 \\ 1/2 \\ \vdots \\ 1/m
\end{bmatrix}.
$$
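For the smallest case the embedded weights can be computed by hand. The sketch below assumes $m = 2$ (2-stage Gauss nodes) and $\gamma_0 = 1/8$, and solves the $2 \times 2$ Vandermonde system by Cramer's rule in double precision; the variable name `b_hat` is hypothetical.

```python
import math

gamma0 = 1.0 / 8.0
s3 = math.sqrt(3.0)
c = [0.5 - s3 / 6, 0.5 + s3 / 6]   # 2-stage Gauss nodes (assumed example)

# Solve  [ 1    1  ] [b_hat_1]   [1 - gamma0]
#        [ c_1  c_2] [b_hat_2] = [   1/2    ]
rhs = [1.0 - gamma0, 0.5]
det = c[1] - c[0]
b_hat = [(rhs[0] * c[1] - rhs[1]) / det,
         (rhs[1] - c[0] * rhs[0]) / det]
print(b_hat)
```

The closed form is $(7 \mp \sqrt{3})/16$, so the embedded weights differ from the Gauss weights $1/2, 1/2$ and sum to $1 - \gamma_0$ rather than to 1.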

Stepsize selection by embedded formula (2/2)

By using this embedded formula, we obtain $\hat{y}_{k+1}$ as follows:

$$
\hat{y}_{k+1} := y_k + h_k \gamma_0 f(t_k, y_k) + h_k \sum_{j=1}^{m} \hat{b}_j f(t_k + c_j h_k, Y_j).
$$

We use this $\hat{y}_{k+1}$ in the following local error estimator $err_k$:

$$
\|err_k\| = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \frac{|\hat{y}_i^{(k+1)} - y_i^{(k+1)}|}{ATOL + RTOL \max(|y_i^{(k)}|, |y_i^{(k+1)}|)} \right)^2}
$$

where ATOL is the absolute tolerance and RTOL is the relative tolerance given by the user.

This estimator is used to predict the next stepsize $h_{k+1}$ as follows:

$$
h_{k+1} := 0.9\, \|err_k\|^{-1/(m+1)}\, h_k
$$
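The estimator and the stepsize prediction can be sketched as below. The exponent in the transcript is garbled, so this sketch assumes the usual controller form $h_{k+1} = 0.9\,\|err_k\|^{-1/(m+1)} h_k$; that exponent, the function names and the sample values are assumptions rather than the author's implementation.

```python
import math


def err_norm(y_old, y_new, y_emb, atol, rtol):
    """Scaled root-mean-square difference between the embedded solution
    y_emb and the Gauss solution y_new."""
    n = len(y_new)
    total = 0.0
    for i in range(n):
        scale = atol + rtol * max(abs(y_old[i]), abs(y_new[i]))
        total += ((y_emb[i] - y_new[i]) / scale) ** 2
    return math.sqrt(total / n)


def next_stepsize(h, err, m, safety=0.9):
    """Predicted stepsize; the exponent -1/(m+1) is an assumption.
    err must be positive (a real code would guard against err == 0)."""
    return safety * err ** (-1.0 / (m + 1)) * h


err = err_norm([1.0, 1.0], [1.0, 1.0], [1.0 + 2e-6, 1.0], 1e-6, 1e-6)
print(err, next_stepsize(0.1, err, m=3))
```

With this form, a step whose scaled error is below 1 is followed by a larger step only when the error is small enough to overcome the safety factor 0.9, and an error above 1 always shrinks the step.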

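Numerical experiments of Evolutionary PDEs

▶ 1D Brusselator problem (omitted in this talk)

$$
\begin{cases}
\dfrac{\partial u}{\partial t} = 1 + u^2 v - 4u + 0.02 \cdot \dfrac{\partial^2 u}{\partial x^2} \\
\dfrac{\partial v}{\partial t} = 3u - u^2 v + 0.02 \cdot \dfrac{\partial^2 v}{\partial x^2}
\end{cases}
\tag{7}
$$

▶ 1D Kuramoto-Sivashinsky (K-S) equation

$$
\frac{\partial U}{\partial t} = -\frac{\partial^2 U}{\partial x^2} - \frac{\partial^4 U}{\partial x^4} - \frac{1}{2} \frac{\partial U^2}{\partial x} \tag{8}
$$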

1D Kuramoto-Sivashinsky Equation: Discretization method (1/2)

cf. Hairer & Wanner, Solving ODE II, Chap. IV, pp. 148-149.

$$
\frac{\partial U}{\partial t} = -\frac{\partial^2 U}{\partial x^2} - \frac{\partial^4 U}{\partial x^4} - \frac{1}{2} \frac{\partial U^2}{\partial x}
$$

Periodic boundary condition: $U(x + L, t) = U(x, t)$

Initial value:

$$
U(x, 0) = 16 \max\bigl(0,\ \min(x/L,\ 0.1 - x/L),\ 20(x/L - 0.2)(0.3 - x/L),\ \min(x/L - 0.6,\ 0.7 - x/L),\ \min(x/L - 0.9,\ 1 - x/L)\bigr)
$$

Parameters: $L = 2\pi/q$, $q = 0.025$
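The initial profile is a direct transcription of the formula above into code. The sketch below is in double precision, and the function name `ks_initial` is hypothetical.

```python
import math

q = 0.025
L = 2.0 * math.pi / q   # period of the solution


def ks_initial(x, L):
    """Initial value U(x, 0) of the K-S problem for 0 <= x <= L."""
    s = x / L
    return 16.0 * max(0.0,
                      min(s, 0.1 - s),
                      20.0 * (s - 0.2) * (0.3 - s),
                      min(s - 0.6, 0.7 - s),
                      min(s - 0.9, 1.0 - s))


# Sample the profile on a coarse grid
for k in range(0, 21):
    x = k * L / 20
    print(f"{x / L:4.2f}  {ks_initial(x, L):.4f}")
```

The profile consists of four separate bumps of height 0.8 (three triangular, one parabolic) on an otherwise zero background.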

1D Kuramoto-Sivashinsky Equation: Discretization method (2/2)

⇓ Discretization by using the pseudospectral method

$$
\hat{U}_j(t) = \frac{1}{L} \int_0^L U(x, t) \exp(-iqjx)\,dx, \qquad
U(x, t) = \sum_{j \in \mathbb{Z}} \hat{U}_j(t) \exp(iqjx)
$$

$$
\frac{d\hat{U}_j}{dt} = \left((qj)^2 - (qj)^4\right) \hat{U}_j - \frac{iqj}{2} (\hat{U} \cdot \hat{U})_j \quad (j \in \mathbb{Z})
$$

⇓ Truncating at $N = 1024$, we obtain the ODE for $y(t) = \{y_j(t)\}$:

$$
\frac{dy_j}{dt} = \left((qj)^2 - (qj)^4\right) y_j - \frac{iqj}{2} F_N\!\left(F_N^{-1} y \cdot F_N^{-1} y\right) \quad (j = 1, 2, \ldots, N/2 - 1)
$$

where $F_N$ and $F_N^{-1}$ denote the FFT and the inverse FFT, respectively.

⋆ The multiple precision real FFT and inverse real FFT routines are derived from Ooura's double precision C routines. http://www.kurims.kyoto-u.ac.jp/~ooura/fftman/ftmn2_12.htm.
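The right-hand side can be illustrated with a small double precision sketch. To stay self-contained it uses a naive $O(N^2)$ discrete Fourier transform in place of an FFT and a tiny $N$, and it stores all $N$ complex coefficients in standard DFT order instead of exploiting the real-data symmetry; all function names are hypothetical and this is not the BNCpack implementation.

```python
import cmath
import math


def dft(u):
    """Coefficients U_k = (1/N) * sum_n u_n * exp(-2*pi*i*k*n/N)."""
    N = len(u)
    return [sum(u[n] * cmath.exp(-2j * math.pi * k * n / N)
                for n in range(N)) / N for k in range(N)]


def idft(U):
    """Grid values u_n = sum_k U_k * exp(2*pi*i*k*n/N)."""
    N = len(U)
    return [sum(U[k] * cmath.exp(2j * math.pi * k * n / N)
                for k in range(N)) for n in range(N)]


def ks_rhs(U, q):
    """Right-hand side of the truncated K-S system in Fourier space."""
    N = len(U)
    u = idft(U)                       # back to physical space
    UU = dft([v * v for v in u])      # Fourier coefficients of U^2
    rhs = []
    for idx in range(N):
        k = idx if idx < N // 2 else idx - N   # signed wavenumber index
        linear = (q * k) ** 2 - (q * k) ** 4
        rhs.append(linear * U[idx] - 0.5j * q * k * UU[idx])
    return rhs


# Check with U(x) = cos(q x), i.e. coefficients 1/2 at wavenumbers +1 and -1
N, q = 16, 0.025
U = [0j] * N
U[1] = U[N - 1] = 0.5
rhs = ks_rhs(U, q)
print(rhs[1], rhs[2])
```

For this input $U^2 = 1/2 + \cos(2qx)/2$, so the nonlinear term only excites the wavenumbers $\pm 2$, while the wavenumbers $\pm 1$ see only the linear term.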

K-S eq.: Numerical values by multiple precision and RADAU5 (double precision)


K-S eq. : Relative Errors of RADAU5 (Double precision)

K-S eq.: Computational times using IRK formulas with various numbers of stages

4 threads; the result of the 80-stage formula in 100 decimal digits is taken as the true solution; t = 10

80 decimal digits, RTOL = ATOL = 10^-60

# stages (m)       20         30         40         50
Comp. Time (s)     130165.4   160601.8   133541.0   190131.4
# steps            6911       2667       1103       856
Average (s)        18.8       60.2       121.1      222.1
Max. Rel. Error    4.2E-38    2.7E-38    1.4E-38    2.0E-36
Min. Rel. Error    1.1E-54    1.8E-50    1.7E-52    1.0E-63

RTOL = ATOL = 10^-70

# stages (m)       20         30         40         50
Comp. Time (s)     100695.2   86331.4    137232.9   200454.8
# steps            6978       1738       1175       918
Average (s)        14.4       49.7       116.8      218.4
Max. Rel. Error    4.4E-48    1.9E-49    2.4E-47    5.1E-47
Min. Rel. Error    2.7E-68    5.9E-68    1.3E-62    1.2E-68

Conclusion

▶ We can implement accelerated multiple precision IRK methods with the DP-MP mixed precision iterative refinement method and SPARK3 reduction in the inner simplified Newton iteration.

▶ Parallelization can reduce the computational cost.

▶ Our ODE solver can be used to solve complex evolutionary PDEs such as the Brusselator problem or the 1D Kuramoto-Sivashinsky equation.

Future work

We plan to:

1. Seek a higher performance ODE solver in massively parallel computation environments such as GPGPU or Intel MIC.

2. Implement stable double precision linear solvers such as GMRES(m) or other stable Krylov subspace methods.

3. Solve many other problems with our ODE solver.

A part of our ODE solver is published as BIRK (extended Bncpack for Implicit Runge-Kutta methods) on our web site.

http://na-inet.jp/na/birk/