RESERVOIR SIMULATION

Large-Scale Reservoir Simulation on GPU

Song Yu, Hui Liu

Advisor: Dr. Zhangxing (John) Chen

University of Calgary

Outline

•  Introduction

•  GPU-based Linear Solver

•  GPU-based Reservoir Simulation

•  Numerical Experiments

•  Conclusions

Introduction

•  Numerical methods: FDM, FEM, FVM → matrix system

•  The system matrix arising from simulation is sparse, highly nonsymmetric and ill-conditioned.

•  The general choice: Krylov subspace solvers with preconditioners.

•  In large-scale reservoir simulation, 80%-90% of the run time is spent in the linear solver.

•  Speeding up the linear solver → speeding up the reservoir simulation

GPU Architecture (Tesla)

[Figure: block diagram of the Tesla GPU. Labels: HOST I/F, GigaThread, six DRAM I/F blocks, L2, GPU SM.]

GPU-based Linear Solver Package

GMRES

An iterative algorithm for solving linear systems of the form Ax = b.

For an n × n matrix, GMRES is guaranteed to reach the exact solution within n iterations (in exact arithmetic). In practice n is very large, so we use restarted GMRES(m), which restarts after every m iterations.

GMRES converges in a small number of iterations when it is used with a good preconditioner.

Main computational kernels:

•  BLAS operations:
   –  vector scale: $y = \alpha x$
   –  dot product: $z = x^T y$
   –  saxpy: $y = \alpha x + y$

•  Matrix-vector product: $y = Ax$

•  Preconditioning operation: $Mr = b$
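To show how these kernels fit together, here is a minimal sketch of left-preconditioned restarted GMRES(m) in plain Python. It is an illustration under stated assumptions, not the package's GPU implementation: the matrix is stored densely, the function names (`matvec`, `dot`, `axpy`, `gmres_restarted`) and the `apply_prec` callback are hypothetical, and the least-squares problem is handled with Givens rotations, which is one common choice.

```python
import math

def matvec(A, x):
    """y = A x (dense stand-in for the sparse matrix-vector product)."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def dot(x, y):
    """z = x^T y."""
    return sum(a * b for a, b in zip(x, y))

def axpy(alpha, x, y):
    """Return alpha * x + y."""
    return [alpha * a + b for a, b in zip(x, y)]

def gmres_restarted(A, b, m=20, tol=1e-10, max_restarts=50, apply_prec=None):
    """Solve A x = b with GMRES(m); apply_prec(v) should return M^{-1} v."""
    prec = apply_prec or (lambda v: list(v))   # assumption: identity if omitted
    x = [0.0] * len(b)
    pb = prec(b)
    b_norm = math.sqrt(dot(pb, pb)) or 1.0
    for _ in range(max_restarts):
        r = prec(axpy(-1.0, matvec(A, x), b))  # preconditioned residual
        beta = math.sqrt(dot(r, r))
        if beta / b_norm < tol:
            break
        V = [[ri / beta for ri in r]]          # orthonormal Krylov basis
        R, cs, sn, g = [], [], [], [beta]      # rotated Hessenberg columns, rotations, rhs
        for j in range(m):
            w = prec(matvec(A, V[j]))
            h = []
            for v in V:                        # modified Gram-Schmidt
                hij = dot(w, v)
                h.append(hij)
                w = axpy(-hij, v, w)
            h_next = math.sqrt(dot(w, w))
            for i in range(j):                 # apply earlier Givens rotations
                h[i], h[i + 1] = (cs[i] * h[i] + sn[i] * h[i + 1],
                                  -sn[i] * h[i] + cs[i] * h[i + 1])
            d = math.hypot(h[j], h_next)       # new rotation zeroes h_next
            c, s = (h[j] / d, h_next / d) if d else (1.0, 0.0)
            cs.append(c)
            sn.append(s)
            h[j] = c * h[j] + s * h_next
            g.append(-s * g[j])
            g[j] = c * g[j]
            R.append(h)
            if abs(g[j + 1]) / b_norm < tol or h_next == 0.0:
                break
            V.append([wi / h_next for wi in w])
        k = len(R)
        y = [0.0] * k
        for i in reversed(range(k)):           # back substitution on the triangular system
            acc = g[i] - sum(R[col][i] * y[col] for col in range(i + 1, k))
            y[i] = acc / R[i][i]
        for i in range(k):                     # x = x + V y
            x = axpy(y[i], V[i], x)
    return x
```

Calling `gmres_restarted(A, b, m=3)` on a small nonsingular 3 × 3 system recovers the solution within one restart cycle, which matches the statement that GMRES terminates within n iterations. Passing a diagonal scaling as `apply_prec` gives a simple Jacobi-preconditioned variant.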

Preconditioner

The convergence rate of iterative linear solvers depends strongly on the condition number of the matrix. Preconditioners are used to reduce the condition number and speed up the convergence of iterative solvers.

Ax = b → M^{-1}Ax = M^{-1}b, with M ≈ A ≈ LU

Two criteria for choosing M:
1: M is a good approximation of A
2: M^{-1} is easy to compute, or Mx = b is easy to solve

•  ILU is one of the most popular preconditioner families. Some non-zero elements in the L and U factors are dropped to reduce the cost and the number of fill-ins. ILU has many variants, distinguished by the level of fill-in.
1. No fill-in: ILU(0) is the simplest variant. Its lower and upper triangular factors keep non-zero elements only at positions where the original matrix is non-zero.
2. With fill-in: ILUT, with a numerical drop threshold, and ILU(k), with fill-in level k.
The more fill-in, the longer the factorization takes. It is a trade-off between accuracy and speed.
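The no-fill-in rule for ILU(0) can be made concrete with a small sketch. The code below is an illustration only: it stores the matrix densely for readability (a real implementation works directly on a sparse format), and the names `ilu0` and `ilu0_solve` are hypothetical, not taken from the package.

```python
def ilu0(A):
    """Return the ILU(0) factors of A packed in one matrix.

    The strict lower triangle holds L (unit diagonal implied) and the upper
    triangle holds U. Entries are updated only where A itself is non-zero,
    so no fill-in is created.
    """
    n = len(A)
    pattern = [[A[i][j] != 0.0 for j in range(n)] for i in range(n)]
    LU = [row[:] for row in A]
    for i in range(1, n):
        for k in range(i):
            if not pattern[i][k]:
                continue                      # outside the sparsity pattern
            LU[i][k] /= LU[k][k]
            for j in range(k + 1, n):
                if pattern[i][j]:             # drop updates that would be fill-in
                    LU[i][j] -= LU[i][k] * LU[k][j]
    return LU

def ilu0_solve(LU, b):
    """Apply the preconditioner: solve (L U) x = b by two triangular solves."""
    n = len(b)
    y = b[:]
    for i in range(n):                        # forward solve, L has unit diagonal
        y[i] -= sum(LU[i][j] * y[j] for j in range(i))
    x = y[:]
    for i in reversed(range(n)):              # backward solve with U
        x[i] -= sum(LU[i][j] * x[j] for j in range(i + 1, n))
        x[i] /= LU[i][i]
    return x
```

For a tridiagonal matrix the exact LU factors have no fill-in, so ILU(0) coincides with the exact factorization and `ilu0_solve` returns the exact solution. For an "arrow" matrix such as `[[4, 1, 1], [1, 4, 0], [1, 0, 4]]`, exact LU would fill positions (1, 2) and (2, 1), while ILU(0) leaves them at zero.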

Sparse matrix vector multiplication

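•  Matrix: HEC format, a hybrid of the ELL and CSR formats

[Figure: HEC storage layout. ELL part: column-index array J and value array V. CSR part: row-pointer array Ap (entries i and i+1 delimit row i), column-index array J and value array V.]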
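A minimal sketch of a hybrid ELL + CSR matrix-vector product follows. The slide only states that HEC combines the two formats, so the details here are assumptions: each row puts its first `ell_width` entries into the ELL part (stored column-major, padded with column index -1) and any remaining entries into the CSR part. The names `to_hec` and `hec_spmv` are hypothetical.

```python
def to_hec(rows, ell_width):
    """Split a sparse matrix into an ELL part and a CSR part.

    rows[i] is a list of (column, value) pairs for row i.
    """
    n = len(rows)
    ell_J = [[-1] * n for _ in range(ell_width)]    # column-major, -1 marks padding
    ell_V = [[0.0] * n for _ in range(ell_width)]
    Ap, J, V = [0], [], []                          # CSR arrays for the overflow entries
    for i, entries in enumerate(rows):
        for k, (col, val) in enumerate(entries):
            if k < ell_width:
                ell_J[k][i] = col
                ell_V[k][i] = val
            else:
                J.append(col)
                V.append(val)
        Ap.append(len(J))
    return (ell_J, ell_V), (Ap, J, V)

def hec_spmv(ell, csr, x):
    """Compute y = A x; on a GPU each row i would map to one thread."""
    (ell_J, ell_V), (Ap, J, V) = ell, csr
    n = len(Ap) - 1
    y = [0.0] * n
    for i in range(n):
        for k in range(len(ell_J)):                 # regular ELL part
            col = ell_J[k][i]
            if col >= 0:
                y[i] += ell_V[k][i] * x[col]
        for p in range(Ap[i], Ap[i + 1]):           # irregular CSR remainder
            y[i] += V[p] * x[J[p]]
    return y
```

The column-major ELL arrays let consecutive rows read consecutive memory locations, which is the usual motivation for ELL on GPUs, while the CSR remainder absorbs the few long rows without padding every row to the maximum length.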

GPU-based Reservoir Simulation

•  Conservation equations
   –  Material conservation
   –  Energy conservation

•  Solvers
   –  Linear solver, e.g. GMRES, BiCGSTAB, ORTHOMIN
   –  Nonlinear (Newton) solver

Jacobian Matrix Example

$$
\begin{pmatrix}
J_{11} & J_{12} & 0 \\
J_{21} & J_{22} & J_{23} \\
0 & J_{32} & J_{33}
\end{pmatrix}
\begin{pmatrix}
\Delta x_1 \\ \Delta x_2 \\ \Delta x_3
\end{pmatrix}
= -
\begin{pmatrix}
R_1 \\ R_2 \\ R_3
\end{pmatrix}^{n}
$$

where each block is 3 × 3 and, for indices i, j = 1, 2, 3,

$$
J_{ij} =
\begin{pmatrix}
\dfrac{\partial R_{o,i}}{\partial p_j} & \dfrac{\partial R_{o,i}}{\partial s_j} & \dfrac{\partial R_{o,i}}{\partial T_j} \\
\dfrac{\partial R_{w,i}}{\partial p_j} & \dfrac{\partial R_{w,i}}{\partial s_j} & \dfrac{\partial R_{w,i}}{\partial T_j} \\
\dfrac{\partial R_{e,i}}{\partial p_j} & \dfrac{\partial R_{e,i}}{\partial s_j} & \dfrac{\partial R_{e,i}}{\partial T_j}
\end{pmatrix},
\qquad
\Delta x_j =
\begin{pmatrix}
\Delta p_j \\ \Delta s_j \\ \Delta T_j
\end{pmatrix},
\qquad
R_i =
\begin{pmatrix}
R_{o,i} \\ R_{w,i} \\ R_{e,i}
\end{pmatrix}.
$$

The full matrix is 9 × 9 and block tridiagonal: the blocks $J_{13}$ and $J_{31}$ are zero.

GPU-based Reservoir Simulation

[Flowchart: CPU-based simulation]

•  Data input
•  Initialization
•  Start time step loop
   –  Start Newton iteration
   –  Build Jacobian & r.h.s.
   –  Solve matrix equation
   –  Converged? No: next Newton iteration. Yes: update and I/O.
•  All time steps done? No: next time step. Yes: end.

[Flowchart: GPU-based simulation]

•  Data input
•  Initialization
•  Start time step loop
   –  Start Newton iteration
   –  Build Jacobian & r.h.s.
   –  Solve matrix equation on GPU
   –  Converged? No: next Newton iteration. Yes: update and I/O.
•  Time to end? No: next time step. Yes: end.

Matrix solver:
1. Matrix preprocess on CPU
2. Generate preconditioner M on CPU
3. Transfer data to GPU
4. Solve Ax = b on GPU
5. Transfer x back to CPU
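The control flow of the workflow above (time step loop, Newton iteration, pluggable matrix solver) can be sketched on a toy problem. This is not the reservoir simulator: the model equation u' = -u^2 with backward Euler time stepping, the function names and the dense Gaussian-elimination `dense_solve` are all stand-ins chosen for illustration. The `linear_solve` argument marks the place where a GPU solver would be plugged in.

```python
def dense_solve(A, b):
    """Stand-in linear solver: Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def newton_solve(residual, jacobian, x0, linear_solve, tol=1e-10, max_iter=20):
    """Newton iteration: build J and r, solve J dx = -r, update, test convergence."""
    x = list(x0)
    for _ in range(max_iter):
        r = residual(x)
        if max(abs(ri) for ri in r) < tol:      # "Converged?"
            return x
        dx = linear_solve(jacobian(x), [-ri for ri in r])
        x = [xi + di for xi, di in zip(x, dx)]
    raise RuntimeError("Newton iteration did not converge")

def simulate(u0, dt, n_steps, linear_solve=dense_solve):
    """Time step loop for u' = -u^2 discretized with backward Euler."""
    u = list(u0)
    for _ in range(n_steps):
        u_old = u
        residual = lambda v: [v[0] - u_old[0] + dt * v[0] ** 2]
        jacobian = lambda v: [[1.0 + 2.0 * dt * v[0]]]
        u = newton_solve(residual, jacobian, u_old, linear_solve)
    return u
```

With `simulate([1.0], 0.01, 100)` the result approximates the exact value u(1) = 0.5 to within the first-order time discretization error. Swapping `dense_solve` for another function with the same `(A, b) -> x` signature changes only the "Solve matrix equation" box, which is the design the GPU flowchart describes.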

Numerical Experiments

•  CPU: Intel Xeon X5570, 8 MB cache, 2.93 GHz, 32 GB memory
•  GPU: NVIDIA Tesla C2050/C2070, 3 GB/6 GB memory, ECC disabled
•  Environment: Linux (Fedora 13 x86_64, kernel 2.6.34.7-61), CUDA Toolkit 4.0, GCC 4.4.5
•  Compiler options: -arch=sm_20 -Xcompiler "-Wall" -O3

Numerical Experiments

Case 1: Testing 4 preconditioners and 3 solvers.

Case 2: Testing the effect of the number of blocks on the speedup of BILU(0) and BILUT.

Case 3: Testing the speedup of the whole simulation process.

Case description

Matrix     N           NNZ          NNZ/ROW
SPE10-1    2,188,851   29,915,573   13.7
SPE10-2    2,188,851   29,915,573   13.7

Case 1

Case description

Matrix     N           NNZ          NNZ/ROW
SPE10-1    2,188,851   29,915,573   13.7

Experimental parameters

Relative tolerance         1E-3
Restart m                  40
Neumann polynomial order   16
METIS partition            8

Performance comparison

Solver   PC             Iteration   CPU time   GPU time   Speedup
GMRES    Neumann Poly   30          1620.5     125.9      12.9
         ILU(0)         18          263.8      27.9       9.5
         BILU(0)        20          307.8      27.5       11.2

All the preconditioners give a speedup of about 10x. BILU(0) and ILU(0) both converge fast.

Performance comparison

Solver     PC             Iteration   CPU time   GPU time   Speedup
BiCGSTAB   Neumann Poly   359         740.7      64.7       11.4
           ILU(0)         260/249     84.3       11.7       7.2
           BILU(0)        243         85.6       9          9.5

Solver     PC             Iteration   CPU time   GPU time   Speedup
ORTHOMIN   Neumann Poly   543         1449.9     114.1      12.7
           ILU(0)         392         284.8      30.1       9.5
           BILU(0)        400         283        27.6       10.3

The speedup is around 10x. BiCGSTAB with ILU(0) and BILU(0) solves faster than GMRES and ORTHOMIN.

Case 2

GMRES(20) + block ILU(0)

Blks   Iteration   CPU time   GPU time   Speedup
1      21          121.1      15         8.14
4      23          124.33     15         8.27
8      23          126.40     15.32      8.23
16     29          180.06     19.05      9.44

GMRES(20) + block ILUT

Blks   Iteration   CPU time   GPU time   Speedup
1      5           34.20      11.70      2.92
4      7           44.67      10.35      4.30
8      7           45.78      9.58       4.76
16     10          63.13      12.43      5.07

Case 3: GPU-based Reservoir Simulation

•  The SPE 10 Comparative Solution Project
•  Fine grid (60 × 220 × 85)
•  Highly heterogeneous

Relative tolerance         1e-3
Restart m                  60
Neumann polynomial order   16
Number of blocks           8

Solver     PC             CPU time    GPU time   Speedup
GMRES      Neumann Poly   4h49m23s    29m43s     9.7
           ILU(0)         1h30m16s    17m18s     5.2
           BILU(0)        2h37m02s    20m18s     7.7
BiCGSTAB   Neumann Poly   4h14m57s    36m13s     7.0
           ILU(0)         1h0m40s     31m42s     1.9
           BILU(0)        1h7m22s     34m28s     2.0
ORTHOMIN   Neumann Poly   7h57m11s    56m27s     8.5
           ILU(0)         2h25m48s    37m58s     3.8
           BILU(0)        2h37m23s    41m22s     3.8
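The speedup column is the ratio of CPU time to GPU time. As a small worked check, the sketch below converts the `XhYmZs` strings used in the table to seconds and recomputes the ratio. The helper names `to_seconds` and `speedup` are hypothetical.

```python
import re

def to_seconds(text):
    """Convert a duration such as '4h49m23s' or '29m43s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if match is None:
        raise ValueError(f"unrecognized duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return 3600 * hours + 60 * minutes + seconds

def speedup(cpu_time, gpu_time):
    """Speedup = CPU wall time divided by GPU wall time."""
    return to_seconds(cpu_time) / to_seconds(gpu_time)
```

For the GMRES + Neumann polynomial row, `speedup("4h49m23s", "29m43s")` is 17363 s / 1783 s, which rounds to the tabulated 9.7.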

Conclusions

•  Implemented a GPU-based linear solver package that includes BLAS operations, linear solvers, preconditioners and several preprocessing methods.

•  Compared the speedups of different linear solvers and preconditioners, and achieved around 10x speedup for the SPE10 matrix.

•  Coupled our GPU-based linear solver package with an in-house black oil reservoir simulator to speed up the SPE10 simulation problem. With GMRES, we achieve speedups of 5x-10x across the different preconditioners.

All publications can be accessed at:

•  http://sites.google.com/site/monramax/publication

THANK YOU