RESERVOIR SIMULATION Large-Scale Reservoir Simulation on GPU Song Yu, Hui Liu Advisor: Dr. Zhangxing (John) Chen University of Calgary


  • RESERVOIR SIMULATION

    Large-Scale Reservoir Simulation on GPU

    Song Yu, Hui Liu

    Advisor: Dr. Zhangxing (John) Chen

    University of Calgary

  • RESERVOIR SIMULATION

    Outline

    •  Introduction

    •  GPU-based Linear Solver

    •  GPU-based Reservoir Simulation

    •  Numerical Experiments

    •  Conclusions

  • RESERVOIR SIMULATION

    Introduction

    •  Numerical methods: FDM, FEM, FVM → matrix system

    •  A system matrix arising from simulation is sparse, highly nonsymmetric and ill-conditioned.

    •  The general choice: Krylov subspace solvers with preconditioners.

    •  In large-scale reservoir simulation, 80%-90% of the run time is spent in the solver.

    •  Speed up linear solvers → speed up reservoir simulation
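    The link between solver share and overall gain can be made concrete with Amdahl's law, which is not on the slide and is used here only as an illustration. The sketch assumes the solver takes 80% or 90% of the run time and a hypothetical 10x solver speedup; the function name is made up for this example.

```python
def overall_speedup(solver_fraction, solver_speedup):
    """Amdahl's law: only the solver's share of the run time gets faster."""
    return 1.0 / ((1.0 - solver_fraction) + solver_fraction / solver_speedup)


# Solver shares quoted on the slide; the 10x solver speedup is an assumed input.
for fraction in (0.8, 0.9):
    gain = overall_speedup(fraction, 10.0)
    print(f"solver share {fraction:.0%}: overall speedup {gain:.2f}x")
```

    The larger the solver's share of the run time, the closer the overall gain gets to the solver speedup itself.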

  • RESERVOIR SIMULATION

    GPU Architecture (Tesla)

    [Figure: block diagram of the Tesla GPU, with labels Host I/F, GigaThread, six DRAM I/F blocks, L2, and GPU SM.]

  • RESERVOIR SIMULATION

    GPU-based Linear Solver Package

  • RESERVOIR SIMULATION

    GMRES

    An iterative algorithm for solving linear systems of the form $Ax = b$. For an $n \times n$ matrix, GMRES converges to the exact solution within $n$ iterations in exact arithmetic. In practice $n$ is very large, so restarted GMRES(m) is used, which restarts after every $m$ iterations. With a good preconditioner, GMRES converges after a small number of iterations.

    Main computational kernels:

    •  BLAS operations:
       vector scale: $y = \alpha x$
       dot product: $z = x^T y$
       saxpy: $y = \alpha x + y$

    •  Matrix-vector product: $y = Ax$

    •  Preconditioning operation: $Mr = b$
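    A minimal sketch of restarted GMRES(m), without preconditioning, shows where the kernels listed above appear: one matrix-vector product per inner iteration, plus dot products and saxpy-style updates in the orthogonalisation. This is a plain-Python illustration under the assumption that A is nonsingular, not the GPU package from the talk; the function names and the 4 x 4 test matrix are made up for the example.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gmres_restarted(matvec, b, m=20, tol=1e-10, max_restarts=50):
    """Solve A x = b with restarted GMRES(m); matvec(v) must return A v."""
    x = [0.0] * len(b)
    b_norm = math.sqrt(dot(b, b)) or 1.0
    for _ in range(max_restarts):
        r = [bi - ai for bi, ai in zip(b, matvec(x))]
        beta = math.sqrt(dot(r, r))
        if beta / b_norm < tol:            # true residual is small enough
            break
        V = [[ri / beta for ri in r]]      # orthonormal Krylov basis
        R = []                             # columns of the triangularised Hessenberg matrix
        cs, sn = [], []                    # Givens rotation coefficients
        g = [beta]                         # rotated right-hand side beta * e1
        for j in range(m):
            w = matvec(V[j])               # matrix-vector product
            h = []
            for v in V:                    # modified Gram-Schmidt: dot + saxpy
                hij = dot(w, v)
                h.append(hij)
                w = [wi - hij * vi for wi, vi in zip(w, v)]
            h_next = math.sqrt(dot(w, w))
            for i in range(j):             # apply earlier rotations to the new column
                h[i], h[i + 1] = (cs[i] * h[i] + sn[i] * h[i + 1],
                                  -sn[i] * h[i] + cs[i] * h[i + 1])
            rho = math.hypot(h[j], h_next)
            c, s = h[j] / rho, h_next / rho
            cs.append(c)
            sn.append(s)
            h[j] = rho                     # the new rotation zeroes out h_next
            g.append(-s * g[j])
            g[j] = c * g[j]
            R.append(h)
            if abs(g[j + 1]) / b_norm < tol or h_next == 0.0:
                break
            V.append([wi / h_next for wi in w])
        k = len(R)
        y = [0.0] * k
        for i in range(k - 1, -1, -1):     # back substitution on the triangular system
            acc = g[i] - sum(R[col][i] * y[col] for col in range(i + 1, k))
            y[i] = acc / R[i][i]
        x = [xi + sum(y[col] * V[col][i] for col in range(k))
             for i, xi in enumerate(x)]
    return x


# Small nonsymmetric test system (illustrative only).
A = [[4.0, -1.0, 0.0, 0.0],
     [-2.0, 4.0, -1.0, 0.0],
     [0.0, -2.0, 4.0, -1.0],
     [0.0, 0.0, -2.0, 4.0]]
b = [2.0, 3.0, 4.0, 10.0]                  # right-hand side for the solution [1, 2, 3, 4]
x = gmres_restarted(lambda v: [dot(row, v) for row in A], b, m=2)
print([round(xi, 6) for xi in x])
```

    The restart length m=2 is deliberately small so that several restart cycles are exercised on the tiny system.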

  • RESERVOIR SIMULATION

    Preconditioner

    The convergence rate of iterative linear solvers depends strongly on the condition number of the matrix. Preconditioners reduce the condition number and speed up the convergence of iterative solvers:

    $Ax = b \;\rightarrow\; M^{-1}Ax = M^{-1}b$, with $M \approx A \approx LU$

    Two criteria for choosing $M$:
    1. $M$ is a good approximation of $A$.
    2. $M^{-1}$ is easy to compute, or $Mx = b$ is easy to solve.

    •  ILU is one of the most popular preconditioner families. Some nonzero elements of the L and U factors are dropped to reduce the cost and the amount of fill-in. ILU variants differ in the level of fill-in:
       1. No fill-in: ILU(0), the simplest variant. The lower and upper triangular factors keep nonzero elements only at positions where the original matrix has nonzeros.
       2. With fill-in: ILUT, which uses a numerical threshold, and ILU(k), which uses fill-in level k.
       The more fill-in, the longer the factorization takes. It is a trade-off between accuracy and speed.
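    The ILU(0) rule described above (keep only positions that are nonzero in the original matrix) can be sketched as follows. For brevity the matrix is stored densely as a list of rows, which is an assumption of this example; a real implementation would work on a sparse format. The function names are hypothetical.

```python
def ilu0(A):
    """ILU(0) of a dense-stored matrix: L (unit diagonal) and U share one array."""
    n = len(A)
    LU = [row[:] for row in A]
    for i in range(1, n):
        for k in range(i):
            if A[i][k] == 0.0:              # position outside the sparsity pattern
                continue
            LU[i][k] /= LU[k][k]            # multiplier, stored in the L part
            for j in range(k + 1, n):
                if A[i][j] != 0.0:          # update only existing nonzeros: no fill-in
                    LU[i][j] -= LU[i][k] * LU[k][j]
    return LU


def ilu_solve(LU, r):
    """Apply the preconditioner: solve (L U) z = r by two triangular sweeps."""
    n = len(r)
    z = r[:]
    for i in range(n):                      # forward substitution with unit-diagonal L
        for k in range(i):
            z[i] -= LU[i][k] * z[k]
    for i in range(n - 1, -1, -1):          # backward substitution with U
        for k in range(i + 1, n):
            z[i] -= LU[i][k] * z[k]
        z[i] /= LU[i][i]
    return z


# An "arrow" matrix: exact LU would create fill-in at positions (1, 2) and (2, 1).
arrow = [[4.0, 1.0, 1.0],
         [1.0, 4.0, 0.0],
         [1.0, 0.0, 4.0]]
print(ilu0(arrow))
```

    For a tridiagonal matrix exact LU produces no fill-in, so ILU(0) coincides with the exact factorization; for the arrow matrix the two zero positions stay zero and the factorization is only approximate.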

  • RESERVOIR SIMULATION

    Sparse matrix vector multiplication

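    •  Matrix: HEC format, a hybrid of the ELL and CSR formats

    [Figure: HEC storage layout. The ELL part holds column indices J and values V; the CSR part holds row pointers Ap, column indices J and values V, with row i delimited by entries i and i+1 of Ap.]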
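    A sketch of a hybrid ELL + CSR layout and the corresponding matrix-vector product is given below. It assumes a square matrix and that the first `ell_width` nonzeros of each row go to the ELL part while the remainder spills into CSR; how the actual HEC format splits the rows is not stated on the slide. The array names J, V and Ap follow the figure labels, everything else is hypothetical.

```python
def to_hec(rows, ell_width):
    """rows[i] is a list of (column, value) pairs for row i of a square matrix."""
    n = len(rows)
    # ELL part, stored column by column; padding uses value 0.0 and a valid index.
    ell_J = [list(range(n)) for _ in range(ell_width)]
    ell_V = [[0.0] * n for _ in range(ell_width)]
    # CSR part for the entries that do not fit into the ELL part.
    csr_Ap, csr_J, csr_V = [0], [], []
    for i, row in enumerate(rows):
        for k, (j, v) in enumerate(row):
            if k < ell_width:
                ell_J[k][i], ell_V[k][i] = j, v
            else:
                csr_J.append(j)
                csr_V.append(v)
        csr_Ap.append(len(csr_J))
    return ell_J, ell_V, csr_Ap, csr_J, csr_V


def hec_spmv(hec, x):
    """y = A x; on a GPU each iteration of the outer loop would be one thread."""
    ell_J, ell_V, csr_Ap, csr_J, csr_V = hec
    n = len(csr_Ap) - 1
    y = [0.0] * n
    for i in range(n):
        acc = 0.0
        for k in range(len(ell_J)):                    # regular ELL part
            acc += ell_V[k][i] * x[ell_J[k][i]]
        for p in range(csr_Ap[i], csr_Ap[i + 1]):      # irregular CSR remainder
            acc += csr_V[p] * x[csr_J[p]]
        y[i] = acc
    return y


rows = [[(0, 4.0), (1, -1.0)],
        [(0, -2.0), (1, 4.0), (2, -1.0), (3, 5.0)],
        [(2, 3.0)],
        [(0, 1.0), (1, 1.0), (2, 1.0), (3, 1.0)]]
hec = to_hec(rows, ell_width=2)
print(hec_spmv(hec, [1.0, 2.0, 3.0, 4.0]))
```

    The ELL part gives every row the same number of slots, which suits regular memory access, while the CSR part absorbs the few long rows without padding every row to the maximum length.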

  • RESERVOIR SIMULATION

    GPU-based Linear Solver Package

  • RESERVOIR SIMULATION

    GPU-based Reservoir Simulation

    •  Conservation Equations
       –  Material Conservation
       –  Energy Conservation

    •  Solvers
       –  Linear solver, e.g. GMRES, BiCGSTAB, ORTHOMIN
       –  Nonlinear (Newton) solver

  • RESERVOIR SIMULATION

    Jacobian Matrix Example

    Three grid cells, each with residuals $R_o$, $R_w$, $R_e$ and unknowns $p$, $s$, $T$. The Jacobian is block tridiagonal, with two $3 \times 3$ zero blocks:

    $$
    \begin{pmatrix}
    J_{11} & J_{12} & 0 \\
    J_{21} & J_{22} & J_{23} \\
    0 & J_{32} & J_{33}
    \end{pmatrix}
    \begin{pmatrix} \delta_1 \\ \delta_2 \\ \delta_3 \end{pmatrix}
    = -
    \begin{pmatrix} R_1 \\ R_2 \\ R_3 \end{pmatrix}^{n}
    $$

    $$
    J_{ij} =
    \begin{pmatrix}
    \dfrac{\partial R_{o,i}}{\partial p_j} & \dfrac{\partial R_{o,i}}{\partial s_j} & \dfrac{\partial R_{o,i}}{\partial T_j} \\[2ex]
    \dfrac{\partial R_{w,i}}{\partial p_j} & \dfrac{\partial R_{w,i}}{\partial s_j} & \dfrac{\partial R_{w,i}}{\partial T_j} \\[2ex]
    \dfrac{\partial R_{e,i}}{\partial p_j} & \dfrac{\partial R_{e,i}}{\partial s_j} & \dfrac{\partial R_{e,i}}{\partial T_j}
    \end{pmatrix},
    \qquad
    \delta_j = \begin{pmatrix} \Delta p_j \\ \Delta s_j \\ \Delta T_j \end{pmatrix},
    \qquad
    R_i = \begin{pmatrix} R_{o,i} \\ R_{w,i} \\ R_{e,i} \end{pmatrix}
    $$

  • RESERVOIR SIMULATION

    GPU-based Reservoir Simulation

    Simulation workflow:

    1. Data input
    2. Initialization
    3. Start timestep loop
    4. Start Newton iteration
    5. Build Jacobian & r.h.s.
    6. Solve matrix equation
    7. Converged? No: return to step 4. Yes: continue.
    8. Update and I/O
    9. All timesteps done? No: return to step 3. Yes: End.

    GPU-based workflow: identical, except that step 6 becomes "Solve matrix equation on GPU" and step 9 reads "Time to end?".

    Matrix solver:

    1. Matrix preprocess on CPU
    2. Generate PC M on CPU
    3. Transfer data to GPU
    4. Solve Ax = b on GPU
    5. Transfer x back to CPU
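    The nesting of the timestep loop and the Newton iteration can be sketched on a toy problem. The two-cell model below (implicit Euler for a small nonlinear system) is entirely hypothetical and stands in for the reservoir equations; `solve_linear` marks the place where a simulator would hand the matrix equation to the linear solver, for example on the GPU.

```python
def residual(u, u_old, dt):
    """Implicit-Euler residual of a toy two-cell problem (hypothetical model)."""
    return [u[0] - u_old[0] + dt * (u[0] ** 3 + (u[0] - u[1])),
            u[1] - u_old[1] + dt * (u[1] ** 3 + (u[1] - u[0]))]


def jacobian(u, dt):
    """Analytic Jacobian of the residual with respect to u."""
    return [[1.0 + dt * (3.0 * u[0] ** 2 + 1.0), -dt],
            [-dt, 1.0 + dt * (3.0 * u[1] ** 2 + 1.0)]]


def solve_linear(J, rhs):
    """Stand-in for the linear solver: Cramer's rule for a 2x2 system."""
    det = J[0][0] * J[1][1] - J[0][1] * J[1][0]
    return [(rhs[0] * J[1][1] - J[0][1] * rhs[1]) / det,
            (J[0][0] * rhs[1] - rhs[0] * J[1][0]) / det]


def simulate(u0, dt, n_steps, tol=1e-10, max_newton=20):
    u = list(u0)                                     # initialization
    newton_counts = []
    for _ in range(n_steps):                         # timestep loop
        u_old = u[:]
        for iteration in range(max_newton + 1):      # Newton iteration
            R = residual(u, u_old, dt)               # build r.h.s.
            if max(abs(r) for r in R) < tol:         # converged?
                break
            if iteration == max_newton:
                raise RuntimeError("Newton iteration did not converge")
            J = jacobian(u, dt)                      # build Jacobian
            delta = solve_linear(J, [-r for r in R]) # solve J * delta = -R
            u = [ui + di for ui, di in zip(u, delta)]
        newton_counts.append(iteration)              # Newton updates used in this step
    return u, newton_counts


state, counts = simulate([1.0, 0.5], dt=0.1, n_steps=10)
print(state, counts)
```

    Every Newton update triggers one linear solve, so the linear solver is called many times per timestep, which is why it dominates the run time.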

  • RESERVOIR SIMULATION

    Numerical Experiments

    •  CPU: Intel Xeon X5570, 8 MB cache, 2.93 GHz, 32 GB memory

    •  GPU: NVIDIA Tesla C2050/C2070, 3 GB/6 GB memory, ECC disabled

    •  Environment: Linux (Fedora 13 x86_64, kernel 2.6.34.7-61), CUDA Toolkit 4.0, GCC 4.4.5

    •  Compiler options: -arch=sm_20 -Xcompiler "-Wall" -O3

  • RESERVOIR SIMULATION

    Numerical Experiments

    Case 1: Testing 4 preconditioners and 3 solvers.

    Case 2: Testing the effect of the number of blocks on the speedup of BILU(0) and BILU(T).

    Case 3: Testing the speedup of the whole simulation process.

    Case description

    Matrix   N          NNZ         NNZ/ROW
    SPE10-1  2,188,851  29,915,573  13.7
    SPE10-2  2,188,851  29,915,573  13.7

  • RESERVOIR SIMULATION

    Case 1

    Case description

    Matrix   N          NNZ         NNZ/ROW
    SPE10-1  2,188,851  29,915,573  13.7

    Experimental parameters

    Relative tolerance        1E-3
    Restart m                 40
    Neumann polynomial order  16
    METIS partitions          8
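    The slides do not spell out the Neumann polynomial preconditioner. One common form, assumed here, approximates the inverse by a truncated Neumann series with diagonal scaling, $M^{-1} = \sum_{k=0}^{p} (I - D^{-1}A)^k D^{-1}$, where $D$ is the diagonal of $A$ and $p$ is the polynomial order. Applying it needs only matrix-vector products and vector updates. The function name and test matrix below are made up for the example.

```python
def neumann_apply(A, r, order):
    """Return z = sum_{k=0..order} (I - D^-1 A)^k D^-1 r, evaluated Horner-style."""
    n = len(r)
    diag = [A[i][i] for i in range(n)]
    z0 = [r[i] / diag[i] for i in range(n)]            # D^-1 r
    z = z0[:]
    for _ in range(order):
        Az = [sum(A[i][j] * z[j] for j in range(n)) for i in range(n)]
        # z <- z0 + (I - D^-1 A) z, using one matrix-vector product
        z = [z0[i] + z[i] - Az[i] / diag[i] for i in range(n)]
    return z


# Diagonally dominant test matrix (illustrative only).
A = [[4.0, -1.0, 0.0, 0.0],
     [-2.0, 4.0, -1.0, 0.0],
     [0.0, -2.0, 4.0, -1.0],
     [0.0, 0.0, -2.0, 4.0]]
r = [2.0, 3.0, 4.0, 10.0]            # A times [1, 2, 3, 4]
for order in (0, 4, 16):
    print(order, [round(v, 4) for v in neumann_apply(A, r, order)])
```

    The series converges only when the spectral radius of $I - D^{-1}A$ is below 1, which holds for the diagonally dominant example; with increasing order the result approaches the exact solve, at the price of one extra matrix-vector product per order.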

  • RESERVOIR SIMULATION

    Performance comparison

    Solver  Preconditioner  Iterations  CPU time  GPU time  Speedup
    GMRES   Neumann Poly    30          1620.5    125.9     12.9
    GMRES   ILU(0)          18          263.8     27.9      9.5
    GMRES   BILU(0)         20          307.8     27.5      11.2

    All preconditioners give a speedup of about 10x. BILU(0) and ILU(0) both converge fast.

  • RESERVOIR SIMULATION

    Performance comparison

    Solver    Preconditioner  Iterations  CPU time  GPU time  Speedup
    BiCGSTAB  Neumann Poly    359         740.7     64.7      11.4
    BiCGSTAB  ILU(0)          260/249     84.3      11.7      7.2
    BiCGSTAB  BILU(0)         243         85.6      9         9.5

    Solver    Preconditioner  Iterations  CPU time  GPU time  Speedup
    ORTHOMIN  Neumann Poly    543         1449.9    114.1     12.7
    ORTHOMIN  ILU(0)          392         284.8     30.1      9.5
    ORTHOMIN  BILU(0)         400         283       27.6      10.3

    The speedup is about 10x. BiCGSTAB with ILU(0) and BILU(0) solved faster than GMRES and ORTHOMIN.

  • RESERVOIR SIMULATION

    Case 2

    GMRES(20) + block ILU(0)

    Blocks  Iterations  CPU time  GPU time  Speedup
    1       21          121.1     15        8.14
    4       23          124.33    15        8.27
    8       23          126.40    15.32     8.23
    16      29          180.06    19.05     9.44

    GMRES(20) + block ILUT

    Blocks  Iterations  CPU time  GPU time  Speedup
    1       5           34.20     11.70     2.92
    4       7           44.67     10.35     4.30
    8       7           45.78     9.58      4.76
    16      10          63.13     12.43     5.07

  • RESERVOIR SIMULATION

    Case 3: GPU-based Reservoir Simulation

    •  The SPE 10 Comparative Solution Project

    •  Fine grid (60 × 220 × 85)

    •  Highly heterogeneous

    Relative tolerance        1e-3
    Restart m                 60
    Neumann polynomial order  16
    Number of blocks          8

  • RESERVOIR SIMULATION

    Solver    Preconditioner  CPU time  GPU time  Speedup
    GMRES     Neumann Poly    4h49m23s  29m43s    9.7
    GMRES     ILU(0)          1h30m16s  17m18s    5.2
    GMRES     BILU(0)         2h37m02s  20m18s    7.7
    BiCGSTAB  Neumann Poly    4h14m57s  36m13s    7
    BiCGSTAB  ILU(0)          1h0m40s   31m42s    1.9
    BiCGSTAB  BILU(0)         1h7m22s   34m28s    2
    ORTHOMIN  Neumann Poly    7h57m11s  56m27s    8.5
    ORTHOMIN  ILU(0)          2h25m48s  37m58s    3.8
    ORTHOMIN  BILUK(0)        2h37m23s  41m22s    3.8
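    The speedup column is the ratio of CPU to GPU wall-clock time. A small helper, written only for this illustration, converts the time strings from the table to seconds and recomputes the ratio for a few rows.

```python
import re


def to_seconds(text):
    """Convert a duration such as '4h49m23s' or '29m43s' to seconds."""
    match = re.fullmatch(r"(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?", text)
    if not text or match is None:
        raise ValueError(f"unrecognised duration: {text!r}")
    hours, minutes, seconds = (int(g) if g else 0 for g in match.groups())
    return 3600 * hours + 60 * minutes + seconds


def speedup(cpu_time, gpu_time):
    return to_seconds(cpu_time) / to_seconds(gpu_time)


# Rows taken from the table above: (solver, preconditioner, CPU time, GPU time).
rows = [("GMRES", "Neumann Poly", "4h49m23s", "29m43s"),
        ("GMRES", "ILU(0)", "1h30m16s", "17m18s"),
        ("BiCGSTAB", "ILU(0)", "1h0m40s", "31m42s")]
for solver, pc, cpu, gpu in rows:
    print(f"{solver:9s} {pc:13s} {speedup(cpu, gpu):.1f}")
```

    Recomputed this way, the three rows give 9.7, 5.2 and 1.9, matching the table.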

  • RESERVOIR SIMULATION

    Conclusions

    •  Implemented a GPU-based linear solver package, including BLAS operations, linear solvers, preconditioners and several preprocessing methods.

    •  Compared the speedup of different linear solvers and preconditioners, achieving around 10x speedup for the SPE10 matrix.

    •  Coupled the GPU-based linear solver package with an in-house black oil reservoir simulator to speed up the SPE10 simulation problem; with GMRES, the speedup is 5-10x depending on the preconditioner.

    All publications can be accessed at:

    •  http://sites.google.com/site/monramax/publication

  • RESERVOIR SIMULATION

    THANK YOU