
Page 1: Large Scale Reservoir Simulation Utilizing Multiple GPUs...Large Scale Reservoir Simulation Utilizing Multiple GPUs Author: Garfield Bowen Subject: Reservoir simulation has a long

Innovative Technology for Reservoir Engineers Ridgeway Kite

Large Scale Reservoir Simulation utilizing multiple GPUs

Garf Bowen

25th March 2014


Summary

• Introduce

– RKS

– Reservoir Simulation

• HPC goals

• Implementation

• Large scale simulations

• Results & future


RKS

• Start-up (April 2013)

– Long history in Reservoir Simulation

– Sister company, NITEC – consulting

• Differentiators

– Massively Parallel Code

– Multiple Realizations

– “Unconventional”

– Coupled surface network


Reservoir Simulation

• Finite Volume

• Unstructured (features)

• Implicit

$\mathbf{R} = \Delta\mathbf{M} - \mathbf{F} = \mathbf{0}$
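One way to read the residual equation, sketched under assumptions: R is taken as the per-cell residual, ΔM as the change in cell mass over a timestep, and F as the net flow into the cell over the same step. The `Connection` struct, the sign convention, and the function name `residual` are illustrative only and do not come from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical connection between two cells. The flow is assumed positive
// when mass moves from cell `from` to cell `to` during the timestep.
struct Connection { std::size_t from; std::size_t to; double flow; };

// R_i = (M_i_new - M_i_old) - F_i, where F_i is the net inflow to cell i.
std::vector<double> residual(const std::vector<double>& mOld,
                             const std::vector<double>& mNew,
                             const std::vector<Connection>& conns)
{
    assert(mOld.size() == mNew.size());
    std::vector<double> r(mOld.size());
    for (std::size_t i = 0; i < r.size(); ++i) {
        r[i] = mNew[i] - mOld[i];          // accumulation term, Delta M
    }
    for (const Connection& c : conns) {
        assert(c.from < r.size() && c.to < r.size());
        r[c.from] += c.flow;               // outflow lowers the net inflow of `from`
        r[c.to]   -= c.flow;               // inflow raises the net inflow of `to`
    }
    return r;
}
```

In an implicit scheme the flows depend on the unknown end-of-step state, so R = 0 is a nonlinear system that is typically solved iteratively; a zero residual in every cell means mass is conserved over the step.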


Driving from London to Manchester…

Check the Ferrari or the traffic jam?

Lot of code that all needs to go fast

Challenge is often “not to go slow”

Can’t just focus on “hot spots”


HPC goals

• “not to go slow”

• Portability CPU/GPU (+clusters)

– Want to be future proof

• Simplification

– (massive) parallelization is an opportunity

– Developer efficiency

– Same result on any platform


Shuffle Calculate Pattern

Calculate “one-to-one”

Shuffle

Scatter I/O from node zero

Gather output

• All data is on the GPU

• Calculations are embarrassingly parallel

• No indirect addressing

• Ability to time separately
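The shuffle-calculate pattern can be sketched on the CPU as two separate steps. This is a minimal illustration, not the XPL implementation: the function names `shuffleGather` and `calculate` and the use of `std::vector` are assumptions made for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Shuffle: gather src[map[k]] into out[k]. All indirect addressing is
// confined to this step.
std::vector<double> shuffleGather(const std::vector<double>& src,
                                  const std::vector<std::size_t>& map)
{
    std::vector<double> out(map.size());
    for (std::size_t k = 0; k < map.size(); ++k) {
        assert(map[k] < src.size());
        out[k] = src[map[k]];
    }
    return out;
}

// Calculate: out[k] depends only on a[k] and b[k] ("one-to-one"), so every
// k can be processed independently and without index lookups.
template <typename Op>
std::vector<double> calculate(const std::vector<double>& a,
                              const std::vector<double>& b, Op op)
{
    assert(a.size() == b.size());
    std::vector<double> out(a.size());
    for (std::size_t k = 0; k < a.size(); ++k) {
        out[k] = op(a[k], b[k]);
    }
    return out;
}
```

For example, gathering cell pressures to the two ends of each connection and then calling `calculate` with a subtraction gives one pressure difference per connection. Because the two steps are separate functions, each can be timed on its own.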


Example – calculate flows

More flows than cells

One cell involved in multiple flows

One flow, two cells

Different flow, same cell

Multiple copies – “slots”
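A possible reading of the "slots" idea, sketched under assumptions: each flow contribution destined for a cell is written to its own copy (slot) of that cell, so no two writes collide, and the copies are summed afterwards in a one-to-one step. The `MapEntry` fields mirror the Src/Dest/Slot columns on the Maps & MPI slide; the function names and buffer layout are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One entry of a hypothetical scatter map: the value at index `src` goes to
// cell `dest`, into that cell's private copy number `slot`.
struct MapEntry { std::size_t src; std::size_t dest; std::size_t slot; };

// Shuffle: scatter contributions into a [cell][slot] buffer. Each
// (dest, slot) pair is assumed to appear at most once in the map.
std::vector<double> scatterToSlots(const std::vector<double>& values,
                                   const std::vector<MapEntry>& map,
                                   std::size_t nCells, std::size_t nSlots)
{
    std::vector<double> slots(nCells * nSlots, 0.0);
    for (const MapEntry& e : map) {
        assert(e.src < values.size() && e.dest < nCells && e.slot < nSlots);
        slots[e.dest * nSlots + e.slot] = values[e.src];
    }
    return slots;
}

// Calculate: each cell sums its own slots, independently of other cells.
std::vector<double> sumSlots(const std::vector<double>& slots,
                             std::size_t nCells, std::size_t nSlots)
{
    assert(slots.size() == nCells * nSlots);
    std::vector<double> total(nCells, 0.0);
    for (std::size_t c = 0; c < nCells; ++c) {
        for (std::size_t s = 0; s < nSlots; ++s) {
            total[c] += slots[c * nSlots + s];
        }
    }
    return total;
}
```

With three cells in a chain and two flows of 3 units each, the signed contributions {-3, +3, -3, +3} scatter so that the middle cell receives +3 in slot 0 and -3 in slot 1, giving a net of zero.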


Simplicity Returns?

“One code” kernel, many (independent) calls

Split to run MPI distributed

Underlying system - XPL

• Takes care of running

  – Different modes

  – Different architectures

Code looks serial again


Maps & MPI

Src   Dest  Slot
i1    j1    0
i2    j2    1
i3    j3    0
i4    j4    1
…     …     …

Maps are defined in “serial” space

Not recommended

test.exe -cpu

test.exe -gpu

mpirun -np 16 test.exe
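The slides do not describe how XPL distributes a map defined in "serial" (global) index space across MPI ranks. Purely as an illustration, the sketch below assumes a contiguous block partition and splits a global map into the entries a given rank can apply locally and those whose source lives on another rank. The names `ownerOf`, `SplitMap` and `splitForRank` are hypothetical.

```cpp
#include <cassert>
#include <vector>

// Map entry in global ("serial") indices, as in the Src/Dest/Slot table.
struct MapEntry { int src; int dest; int slot; };

// Assumed block partition: rank r owns global indices
// [r * chunk, (r + 1) * chunk), with chunk = ceil(nGlobal / nRanks).
int ownerOf(int globalIndex, int nGlobal, int nRanks)
{
    assert(nRanks > 0 && globalIndex >= 0 && globalIndex < nGlobal);
    const int chunk = (nGlobal + nRanks - 1) / nRanks;
    return globalIndex / chunk;
}

struct SplitMap {
    std::vector<MapEntry> local;   // source and destination on this rank
    std::vector<MapEntry> remote;  // destination here, source elsewhere
};

// Keep the entries whose destination this rank owns, and classify them by
// whether the source value would need to be communicated.
SplitMap splitForRank(const std::vector<MapEntry>& serialMap,
                      int rank, int nGlobal, int nRanks)
{
    SplitMap out;
    for (const MapEntry& e : serialMap) {
        if (ownerOf(e.dest, nGlobal, nRanks) != rank) {
            continue;              // another rank handles this entry
        }
        if (ownerOf(e.src, nGlobal, nRanks) == rank) {
            out.local.push_back(e);
        } else {
            out.remote.push_back(e);
        }
    }
    return out;
}
```

The point of the sketch is that the user-facing map never changes between `-cpu`, `-gpu` and `mpirun` runs; only the runtime's view of it does.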


Simple Example

$x_i = A_i^{-1} r_i \quad \forall i$

A - n*n small dense matrix

~millions of i’s

LU factorization (partial pivoting)

template<typename KP>
struct Testinv
{
    __host__ __device__
    Testinv(Args* inArgs, int index, int N)
    {
        int ia=0;
        mat<double,KP> a(inArgs,ia++,index);   // A_i
        vec<double,KP> r(inArgs,ia++,index);   // r_i
        vec<double,KP> x(inArgs,ia++,index);   // x_i (result)
        mat<double,KP> w(inArgs,ia++,index);   // work copy of A_i
        w = a;
        w.inv();
        x.zero();
        w.mult(r,x);
    }
};

case rks::TestKernels::TEST_INV:
    calc(inArgs, gpu<Testinv<kp> >, cpu<Testinv<kp> >);
    break;

[Figure: Scaling. Log time (secs) against log n for CPU and GPU, with fitted trend lines y = 2.35x + 2.31 and y = 2.23x + 1.20.]
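The `Testinv` kernel follows a "work for one index in the constructor" style. The sketch below imitates that style in plain C++ to show why the calling code can look serial. It is not the XPL API: `Args`, `ScaleAdd` and `runOnCpu` are hypothetical names, and the arithmetic is a placeholder for the per-index matrix solve.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical argument pack: three arrays of equal length.
struct Args {
    std::vector<double> a;
    std::vector<double> b;
    std::vector<double> out;
};

// A kernel in the style of the slide: all work for one index happens in the
// constructor, and different indices never write the same output element.
struct ScaleAdd {
    ScaleAdd(Args* args, std::size_t index)
    {
        args->out[index] = 2.0 * args->a[index] + args->b[index];
    }
};

// Serial CPU driver: one independent kernel call per index. A GPU driver
// could instead run the same kernel with one thread per index.
template <typename Kernel>
void runOnCpu(Args* args, std::size_t count)
{
    assert(args->a.size() >= count);
    assert(args->b.size() >= count);
    assert(args->out.size() >= count);
    for (std::size_t index = 0; index < count; ++index) {
        Kernel kernel(args, index);
        (void)kernel;  // the constructor has already done the work
    }
}
```

Because the kernel body only ever sees one index, the same source can be dispatched to different back ends without change, which is presumably what the `calc(inArgs, gpu<...>, cpu<...>)` call on the slide selects between.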


Now add complexity

well     --                   40    8.4
jac      --                   40   19.1
mass     -- --                40    1.9
flow     -- --                40   16.5
flow_    -- -- --           4640   16.0
norm     --                   40    0.4
lin      --                   30   52.7   52.5
ling     -- --                30    2.0    2.0
lins     -- --                30   50.0
orth-it  -- -- --             30   49.9
norm     -- -- -- --         219    0.1
precon   -- -- -- --         189   48.1
pressure -- -- -- -- --      189   46.9

====================================================
Comparison between:
cpu 1243.630 and gpu 147.960
====================================================

well     --                  1.0    0.08
jac      --                  1.0   12.62
mass     -- --               1.0   17.93
flow     -- --               1.0   11.66
flow_    -- -- --            1.0   11.84
norm     --                  1.0    2.19
lin      --                  1.0    9.87
ling     -- --               1.0    1.70
lins     -- --               1.0   10.08
orth-it  -- -- --            1.0   10.10
norm     -- -- -- --         1.0   48.40
precon   -- -- -- --         1.0    9.17
pressure -- -- -- -- --      1.0    8.24
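As a worked check, and assuming the two totals in the comparison banner are run times in the same units (the slide does not state the units), the overall CPU-to-GPU ratio is:

```latex
\frac{t_{\text{cpu}}}{t_{\text{gpu}}} = \frac{1243.630}{147.960} \approx 8.41
```

The last column of the second table appears to give the same kind of ratio per timer, which would put the linear solver (`lin`, 9.87) slightly above the overall figure and the well calculation (`well`, 0.08) well below it. That reading is an interpretation of the output, not something the slide states.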


Linear Solver Strategy

• Linear Solver

  – Important

  – Communication Mechanism

  – Challenge in parallel environments

• …but we’re only a small company

  – And don’t really want to be linear solver experts

• Like getting “the same” results

  – If we can implement a solver in XPL, then we get this for free

• Home grown

  – May not be competitive

• Using Nvidia’s AmgX

  – Lose the “same” algorithm

  – Performing


Linear Solver

• Home Grown

  – Massively helpful for development

    • Same results for all configurations

  – Challenged algorithmically on difficult problems

• AmgX

  – Many options (pre-coded)

  – Single GPU working well

  – Focussed our effort here

    • MPI programming becomes important


Strategy as problem size increases

• Tesla C2070

– 6 GB memory

– Black Oil model, 1 million cells (SPE10: 1.2e6 cells)

• Little incentive to utilize >1 GPU

• noting people will often run multiple realizations

• Larger model -> cluster

– Memory constrained


Scaling Test

• Based on SPE10 benchmark

  – Refined model

  – 5 wells

  – ~1 million cells

• We can fit:

  – Base case on one GPU

  – 4 (connected) copies on 4 GPUs

    • Actually require 8 GPUs

      – Extra memory

  – 16 copies on 16/32 GPUs

• Less challenging scaling than refinement


Memory & Performance

[Figure: Memory. Memory (Mb) against processors (1 to 8) for three cases: 4E6 - 8 GPUs, 1E6 - 2 GPUs, 1E6 - 1 GPU.]

[Figure: Example Performance. Wall clock time (secs) for the cases "1E6-1GPU", "1E6-2GPU" and "4E6-8GPU".]

Lessons:

• Very variable timings

• Instrumentation vital

Future:

• Still working on the 32-way case

• Classical MPI optimization step


Summary & Conclusions

• Shuffle-Calculate pattern

– Works for us, so far

– Portable

– Allowing us to exploit the GPU

– Using AmgX we’re able to tackle realistic cases requiring multiple GPUs

• Full system

– Commercial offering early next year


Acknowledgements

• Co-authors: Bachar Zineddin & Tommy Miller

• Jeremy Appleyard, Nvidia

• “The authors would like to acknowledge the work presented here made use of the IRIDIS*/EMERALD* HPC facility provided by the Centre for Innovation.”

• Nvidia for AmgX beta access


Questions?


Backup#1 – LU code example

//
// Main elimination loop
//
for (int j=0; j<m_xdim; j++)
{
    //
    // Sum: entries of column j above the diagonal
    //
    for (int i=0; i<j; i++)
    {
        double sum = (*this)(i,j);
        for (int k=0; k<i; k++)
        {
            sum = sum - (*this)(i,k)*(*this)(k,j);
        }
        (*this)(i,j) = sum;
    }
    //
    // Max: entries on and below the diagonal, and search for the
    // largest scaled pivot candidate
    //
    aamax = 0.0;
    for (int i=j; i<m_xdim; i++)
    {
        double sum = (*this)(i,j);
        for (int k=0; k<j; k++)
        {
            sum = sum - (*this)(i,k)*(*this)(k,j);
        }
        (*this)(i,j) = sum;
        if ( std::fabs(vv[i]*sum)>=aamax )
        {
            imax = i;
            aamax = std::fabs(vv[i]*sum);
        }
    }
    //
    // Swap: exchange rows j and imax
    //
    if (j!=imax)
    {
        for (int k=0; k<m_xdim; k++)
        {
            double dum = (*this)(imax,k);   // save (imax,k) before it is overwritten
            (*this)(imax,k) = (*this)(j,k);
            (*this)(j,k) = dum;
        }
        vv[imax] = vv[j];
    }
    //
    // Store: record the pivot row and guard against a zero pivot
    //
    piv[j] = imax;
    if ( (*this)(j,j)==0.0 )
    {
        (*this)(j,j) = 1e-20;
    }
    //
    // Set: divide the sub-diagonal part of column j by the pivot
    // (the last column has no sub-diagonal entries)
    //
    if (j!=m_xdim-1)
    {
        double dum = 1.0/(*this)(j,j);
        for (int i=j+1; i<m_xdim; i++)
        {
            (*this)(i,j) = (*this)(i,j)*dum;
        }
    }
}
//------ End lu step ----
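The fragment above depends on class members (`m_xdim`, `vv`, `piv`) that are not shown. As a self-contained point of comparison, the sketch below factorises and solves a small dense system with partial pivoting. It is a deliberately simpler variant, not the slide's code: it uses plain (unscaled) row pivoting and row-wise elimination rather than the column-by-column ordering with scale factors `vv`, and the `Dense` type, `luFactor` and `luSolve` are names invented for the example.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Small dense row-major matrix; a stand-in for the slide's mat<> type.
struct Dense {
    std::size_t n;
    std::vector<double> v;
    explicit Dense(std::size_t size) : n(size), v(size * size, 0.0) {}
    double& operator()(std::size_t i, std::size_t j) { return v[i * n + j]; }
    double operator()(std::size_t i, std::size_t j) const { return v[i * n + j]; }
};

// In-place LU factorisation with partial pivoting, P*A = L*U. L (unit
// diagonal) is stored below the diagonal and U on and above it. piv[j] is
// the row swapped with row j at step j. Returns false on a zero pivot.
bool luFactor(Dense& a, std::vector<std::size_t>& piv)
{
    const std::size_t n = a.n;
    piv.assign(n, 0);
    for (std::size_t j = 0; j < n; ++j) {
        std::size_t imax = j;  // largest entry in column j, rows j..n-1
        for (std::size_t i = j + 1; i < n; ++i) {
            if (std::fabs(a(i, j)) > std::fabs(a(imax, j))) imax = i;
        }
        if (a(imax, j) == 0.0) return false;
        if (imax != j) {
            for (std::size_t k = 0; k < n; ++k) std::swap(a(j, k), a(imax, k));
        }
        piv[j] = imax;
        for (std::size_t i = j + 1; i < n; ++i) {
            a(i, j) /= a(j, j);  // multiplier, stored in place of the zero
            for (std::size_t k = j + 1; k < n; ++k) a(i, k) -= a(i, j) * a(j, k);
        }
    }
    return true;
}

// Solve A*x = r given the factors and pivots produced by luFactor.
std::vector<double> luSolve(const Dense& lu, const std::vector<std::size_t>& piv,
                            std::vector<double> x)
{
    const std::size_t n = lu.n;
    assert(x.size() == n && piv.size() == n);
    for (std::size_t j = 0; j < n; ++j) std::swap(x[j], x[piv[j]]);  // P*r
    for (std::size_t i = 0; i < n; ++i) {                            // L*y = P*r
        for (std::size_t k = 0; k < i; ++k) x[i] -= lu(i, k) * x[k];
    }
    for (std::size_t i = n; i-- > 0;) {                              // U*x = y
        for (std::size_t k = i + 1; k < n; ++k) x[i] -= lu(i, k) * x[k];
        x[i] /= lu(i, i);
    }
    return x;
}
```

Unlike the slide's code, which replaces a zero pivot with 1e-20 and carries on, this sketch reports a singular matrix to the caller; which behaviour is preferable inside a GPU kernel is a design choice the slides do not discuss.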


Backup#2 – Home Grown Solver

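$$\begin{pmatrix} A_{ww} & A_{wb} \\ A_{bw} & A_{bb} \end{pmatrix} \begin{pmatrix} x_w \\ x_b \end{pmatrix} = \begin{pmatrix} R_w \\ R_b \end{pmatrix}$$

$$\begin{pmatrix} A_{ww} & 0 \\ A_{bw} & A_{bb}^* \end{pmatrix} \begin{pmatrix} I & A_{wb}^* \\ 0 & I \end{pmatrix} \begin{pmatrix} x_w \\ x_b \end{pmatrix} = \begin{pmatrix} R_w \\ R_b \end{pmatrix}$$

$$A_{bb}^* = A_{bb} - A_{bw} A_{ww}^{-1} A_{wb}$$

Note:

$$(1 - x)^{-1} = 1 + x + x^2 + x^3 + \dots$$

With:

$$x = A_{bw} A_{ww}^{-1} A_{wb} A_{bb}^{-1}$$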
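The link between the series and the block factorisation is not spelled out on the slide. A possible derivation, using only the symbols defined there and the standard Neumann series (which converges when the spectral radius of x is below one), is:

```latex
\begin{aligned}
A_{bb}^* &= A_{bb} - A_{bw} A_{ww}^{-1} A_{wb}
          = \left(I - A_{bw} A_{ww}^{-1} A_{wb} A_{bb}^{-1}\right) A_{bb}
          = (I - x)\, A_{bb}, \\
\left(A_{bb}^*\right)^{-1} &= A_{bb}^{-1} (I - x)^{-1}
          = A_{bb}^{-1}\left(I + x + x^2 + x^3 + \dots\right)
          \quad \text{if } \rho(x) < 1, \\
A_{wb}^* &= A_{ww}^{-1} A_{wb}
          \quad \text{(matching the top-right block of the factored product).}
\end{aligned}
```

Truncating the series after a few terms would give an approximate inverse of the modified block that needs only products with the existing blocks and solves with $A_{ww}$ and $A_{bb}$. Whether the home-grown solver truncates the series in this way is an assumption; the slide gives only the identities.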