Innovative Technology for Reservoir Engineers Ridgeway Kite
Large Scale Reservoir Simulation utilizing multiple GPUs
Garf Bowen
25th March 2014
Summary
• Introduce
– RKS
– Reservoir Simulation
• HPC goals
• Implementation
• Large scale simulations
• Results & future
RKS
• Start-up (April 2013)
– Long history in Reservoir Simulation
– Sister company, NITEC – consulting
• Differentiators
– Massively Parallel Code
– Multiple Realizations
– “Unconventional”
– Coupled surface network
Reservoir Simulation
• Finite Volume
• Unstructured (features)
• Implicit
$R = \Delta M - F = 0$
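As a rough illustration of the residual equation, the sketch below evaluates R = ΔM − F cell by cell for a single conserved quantity. Everything in it is an assumption made for illustration: the `Connection` struct, the `residual` function, and the sign convention (F is the net inflow into a cell over the step, and `flow[f]` is the amount moved from `from` to `to`). It is not taken from the RKS code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical connection between two cells of an unstructured grid.
struct Connection { std::size_t from; std::size_t to; };

// Per-cell residual R = dM - F, where dM is the change in mass over the step
// and F is the net inflow into the cell (assumed sign convention).
std::vector<double> residual(const std::vector<double>& massOld,
                             const std::vector<double>& massNew,
                             const std::vector<Connection>& conns,
                             const std::vector<double>& flow)
{
    std::vector<double> r(massOld.size());
    for (std::size_t c = 0; c < r.size(); ++c)
        r[c] = massNew[c] - massOld[c];          // dM
    for (std::size_t f = 0; f < conns.size(); ++f) {
        r[conns[f].from] += flow[f];             // outflow: F is negative here
        r[conns[f].to]   -= flow[f];             // inflow: F is positive here
    }
    return r;
}
```

A mass-conserving update drives every entry to zero; an implicit scheme solves for the new state that makes this hold.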
Driving from London to Manchester…
Check the Ferrari or the traffic jam?
• Lot of code that all needs to go fast
• Challenge is often “not to go slow”
• Can’t just focus on “hot spots”
HPC goals
• “not to go slow”
• Portability CPU/GPU (+clusters)
– Want to be future proof
• Simplification
– (massive) parallelization is an opportunity
– Developer efficiency
– Same result on any platform
Shuffle Calculate Pattern
Calculate “one-to-one”
Shuffle Scatter I/O from node zero
Gather output
• All data is on the GPU
• Calculations are embarrassingly parallel
• No indirect addressing
• Ability to time separately
Example – calculate flows
More flows than cells
One cell involved in multiple flows
One flow, two cells
Different flow, same cell
Multiple copies – “slots”
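A minimal sketch of how the flow example might split into a shuffle step and a calculate step, assuming two slots per flow (one per neighbouring cell). The names `MapEntry`, `shuffle`, `calculateFlows` and the transmissibility-times-pressure-difference formula are illustrative assumptions, not the XPL API.

```cpp
#include <cstddef>
#include <vector>

constexpr std::size_t kSlots = 2;  // assumed: each flow sees two cells

// Hypothetical map row: copy cell `src` into slot `slot` of flow `dest`.
struct MapEntry { std::size_t src; std::size_t dest; std::size_t slot; };

// Shuffle: all indirect addressing is confined to this copy step.
std::vector<double> shuffle(const std::vector<double>& cell,
                            const std::vector<MapEntry>& map,
                            std::size_t nFlows)
{
    std::vector<double> slots(nFlows * kSlots, 0.0);
    for (const MapEntry& e : map)
        slots[e.dest * kSlots + e.slot] = cell[e.src];
    return slots;
}

// Calculate: "one-to-one". Flow f reads only its own slots, so every
// iteration is independent and could be one GPU thread.
std::vector<double> calculateFlows(const std::vector<double>& pSlots,
                                   const std::vector<double>& trans)
{
    std::vector<double> flow(trans.size());
    for (std::size_t f = 0; f < trans.size(); ++f)
        flow[f] = trans[f] * (pSlots[f * kSlots] - pSlots[f * kSlots + 1]);
    return flow;
}
```

A cell that takes part in several flows simply appears in several map rows, so its value is duplicated into each flow's slots. That duplication is what keeps the calculate step free of indirect addressing.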
Simplicity Returns?
• “One code” kernel, many (independent) calls
• Split to run MPI distributed
• Underlying system: XPL
– Takes care of running
• Different modes
• Different architectures
• Code looks serial again
Maps & MPI
Src Dest Slot
i1 j1 0
i2 j2 1
i3 j3 0
i4 j4 1
… … …
Maps are defined in “serial” space
Not recommended
test.exe –cpu
test.exe –gpu
mpirun -np 16 test.exe
Simple Example
$x_i = A_i^{-1} r_i \quad \forall i$
• A: n×n small dense matrix
• ~millions of i’s
• LU factorization (partial pivoting)
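template<typename KP>
struct Testinv
{
    __host__ __device__
    Testinv(Args* inArgs, int index, int N)
    {
        int ia=0;
        // Bind the entries for this index in each argument array
        mat<double,KP> a(inArgs,ia++,index);
        vec<double,KP> r(inArgs,ia++,index);
        vec<double,KP> x(inArgs,ia++,index);
        mat<double,KP> w(inArgs,ia++,index);
        w = a;        // work on a copy so that a is preserved
        w.inv();      // invert the copy
        x.zero();
        w.mult(r,x);  // x now holds inverse(a) times r
    }
};

// Dispatch (excerpt from the calling switch statement)
case rks::TestKernels::TEST_INV:
    calc(inArgs, gpu<Testinv<kp> >, cpu<Testinv<kp> >);
    break;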
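The slide does not show how `cpu<...>` and `gpu<...>` run a kernel, so the following is only a guess at the CPU side: a runner that constructs the kernel once per index. The `Args` layout, the `Scale` kernel and the body of `cpu` are hypothetical stand-ins, not the XPL implementation. A GPU runner would presumably launch one thread per index instead of looping.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical argument pack: one raw pointer per array.
struct Args { std::vector<double*> arrays; };

// A kernel in the style of the slide: the constructor does all the work
// and touches only entry `index` of each array.
struct Scale
{
    Scale(Args* inArgs, int index, int /*N*/)
    {
        const double* in = inArgs->arrays[0];
        double* out = inArgs->arrays[1];
        out[index] = 2.0 * in[index];
    }
};

// Hypothetical CPU runner: N independent kernel calls in a serial loop.
template <typename Kernel>
void cpu(Args* inArgs, int N)
{
    for (int index = 0; index < N; ++index) {
        Kernel kernel(inArgs, index, N);  // construction runs the kernel body
        (void)kernel;
    }
}
```

Because each call depends only on its own index, the same kernel source can be driven by a loop, by MPI ranks each owning a range of indices, or by GPU threads.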
[Figure: Scaling. Log time (secs) against log n for CPU and GPU, with fitted lines y = 2.35x + 2.31 and y = 2.23x + 1.20.]
Now add complexity
well -- 40 8.4
jac -- 40 19.1
mass -- -- 40 1.9
flow -- -- 40 16.5
flow_ -- -- -- 4640 16.0
norm -- 40 0.4
lin -- 30 52.7 52.5
ling -- -- 30 2.0 2.0
lins -- -- 30 50.0
orth-it -- -- -- 30 49.9
norm -- -- -- -- 219 0.1
precon -- -- -- -- 189 48.1
pressure -- -- -- -- -- 189 46.9
====================================================
Comparison between:
cpu 1243.630 and gpu 147.960
====================================================
well -- 1.0 0.08
jac -- 1.0 12.62
mass -- -- 1.0 17.93
flow -- -- 1.0 11.66
flow_ -- -- -- 1.0 11.84
norm -- 1.0 2.19
lin -- 1.0 9.87
ling -- -- 1.0 1.70
lins -- -- 1.0 10.08
orth-it -- -- -- 1.0 10.10
norm -- -- -- -- 1.0 48.40
precon -- -- -- -- 1.0 9.17
pressure -- -- -- -- -- 1.0 8.24
Linear Solver Strategy
• Linear solver: important
• Communication mechanism: a challenge in parallel environments
• …but we’re only a small company, and don’t really want to be linear solver experts
• We like getting “the same” results: if we can implement a solver in XPL, then we get this for free
• Home grown: may not be competitive
• Using Nvidia’s AmgX: lose the “same” algorithm; performing
Linear Solver
• Home Grown
– Massively helpful for development
• Same results for all configurations
– Challenged algorithmically on difficult problems
• AmgX
– Many options (pre-coded)
– Single GPU working well
– Focussed our effort here
• MPI programming becomes important
Strategy as problem size increases
• Tesla C2070
– 6 GB memory
– Black Oil model, 1 million cells (SPE10: 1.2e6 cells)
• Little incentive to utilize >1 GPU
• Noting that people will often run multiple realizations
• Larger model -> cluster
– Memory constrained
Scaling Test
• Based on SPE10 benchmark
– Refined model
– 5 wells
– ~1 million cells
• We can fit:
– Base case on one GPU
– 4 (connected) copies on 4 GPUs
• Actually require 8 GPUs
– Extra memory
– 16 copies on 16/32 GPUs
• Less challenging scaling than refinement
Memory & Performance
[Figure: Memory. Memory (Mb) against processors (1 to 8) for three cases: 4E6 - 8 GPUs, 1E6 - 2 GPUs, 1E6 - 1 GPU.]
[Figure: Example Performance. Wall clock time (secs) for "1E6-1GPU", "1E6-2GPU" and "4E6-8GPU".]
Lessons:
• Very variable timings
• Instrumentation vital
Future:
• Still working on the 32-way case
• Classical MPI optimization step
Summary & Conclusions
• Shuffle-Calculate pattern
– Works for us, so far
– Portable
– Allowing us to exploit the GPU
– Using AmgX, we’re able to tackle realistic cases requiring multiple GPUs
• Full system
– Commercial offering early next year
Acknowledgements
• Co-authors: Bachar Zineddin & Tommy Miller
• Jeremy Appleyard, Nvidia
• “The authors would like to acknowledge the work presented here made use of the IRIDIS*/EMERALD* HPC facility provided by the Centre for Innovation.”
• Nvidia for AmgX beta access
Questions?
Backup#1 – LU code example
//
// Main elimination loop
//
for (int j=0; j<m_xdim; j++)
{
    //
    // Sum
    //
    for (int i=0; i<j; i++)
    {
        double sum = (*this)(i,j);
        for (int k=0; k<i; k++)
        {
            sum = sum - (*this)(i,k)*(*this)(k,j);
        }
        (*this)(i,j) = sum;
    }
    //
    // Max
    //
    aamax = 0.0;
    for (int i=j; i<m_xdim; i++)
    {
        double sum = (*this)(i,j);
        for (int k=0; k<j; k++)
        {
            sum = sum - (*this)(i,k)*(*this)(k,j);
        }
        (*this)(i,j) = sum;
        if ( std::fabs(vv[i]*sum)>=aamax )
        {
            imax = i;
            aamax = std::fabs(vv[i]*sum);
        }
    }
    //
    // Swap
    //
    if (j!=imax)
    {
        // Exchange rows imax and j, one column at a time
        for (int k=0; k<m_xdim; k++)
        {
            double dum = (*this)(imax,k);
            (*this)(imax,k) = (*this)(j,k);
            (*this)(j,k) = dum;
        }
        vv[imax] = vv[j];
    }
    //
    // Store
    //
    piv[j] = imax;
    if ( (*this)(j,j)==0.0 )
    {
        // Replace an exactly zero pivot with a tiny value
        (*this)(j,j) = 1e-20;
    }
    //
    // Set
    //
    // Divide by the pivot; the last column has nothing below the diagonal
    if (j!=m_xdim-1)
    {
        double dum = 1.0/(*this)(j,j);
        for (int i=j+1; i<m_xdim; i++)
        {
            (*this)(i,j) = (*this)(i,j)*dum;
        }
    }
}
//------ End lu step ----
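The excerpt above covers only the factorization step. For context, here is a self-contained sketch of a complete solve of one small dense system A x = r. It swaps in plain row-oriented Gaussian elimination with partial pivoting and does not use the implicit row scaling (`vv`) of the slide code. The function name `luSolve` and the row-major `std::vector` storage are assumptions for illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// Solve A x = r for one n*n system stored row-major in `a`.
// Works on copies of a and r. Returns false if a pivot is exactly zero.
bool luSolve(std::vector<double> a, std::vector<double> r,
             std::size_t n, std::vector<double>& x)
{
    for (std::size_t j = 0; j < n; ++j) {
        // Partial pivoting: pick the largest entry in column j, rows j..n-1.
        std::size_t p = j;
        for (std::size_t i = j + 1; i < n; ++i)
            if (std::fabs(a[i * n + j]) > std::fabs(a[p * n + j])) p = i;
        if (a[p * n + j] == 0.0) return false;
        if (p != j) {
            for (std::size_t k = 0; k < n; ++k)
                std::swap(a[p * n + k], a[j * n + k]);
            std::swap(r[p], r[j]);
        }
        // Eliminate column j below the pivot, updating the right-hand side.
        for (std::size_t i = j + 1; i < n; ++i) {
            const double m = a[i * n + j] / a[j * n + j];
            for (std::size_t k = j + 1; k < n; ++k)
                a[i * n + k] -= m * a[j * n + k];
            r[i] -= m * r[j];
        }
    }
    // Back substitution on the upper triangle.
    x.assign(n, 0.0);
    for (std::size_t i = n; i-- > 0;) {
        double sum = r[i];
        for (std::size_t k = i + 1; k < n; ++k)
            sum -= a[i * n + k] * x[k];
        x[i] = sum / a[i * n + i];
    }
    return true;
}
```

In the setting of the "Simple Example" slide, a routine like this would be called once per index i, each call independent of the others.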
Backup#2 – Home Grown Solver
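$$\begin{pmatrix} A_{ww} & A_{wb} \\ A_{bw} & A_{bb} \end{pmatrix} \begin{pmatrix} x_w \\ x_b \end{pmatrix} = \begin{pmatrix} R_w \\ R_b \end{pmatrix}$$

$$\begin{pmatrix} A_{ww} & 0 \\ A_{bw} & A_{bb}^{*} \end{pmatrix} \begin{pmatrix} I & A_{wb}^{*} \\ 0 & I \end{pmatrix} \begin{pmatrix} x_w \\ x_b \end{pmatrix} = \begin{pmatrix} R_w \\ R_b \end{pmatrix}$$

$$A_{bb}^{*} = A_{bb} - A_{bw} A_{ww}^{-1} A_{wb}$$

Note:
$$(1 - x)^{-1} = 1 + x + x^2 + x^3 + \dots$$

With:
$$x = A_{bw} A_{ww}^{-1} A_{wb} A_{bb}^{-1}$$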
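One plausible reading of the note, offered as a sketch and not as the authors' stated method: the series gives an approximate inverse of the modified block $A_{bb}^{*}$ without forming it explicitly. The convergence condition on the spectral radius $\rho(x)$ is an assumption added here.

```latex
\begin{align*}
A_{bb}^{*} &= A_{bb} - A_{bw} A_{ww}^{-1} A_{wb}
            = (I - x)\, A_{bb},
            \qquad x = A_{bw} A_{ww}^{-1} A_{wb} A_{bb}^{-1}, \\
\left(A_{bb}^{*}\right)^{-1} &= A_{bb}^{-1} (I - x)^{-1}
            = A_{bb}^{-1}\left(I + x + x^{2} + x^{3} + \dots\right),
            \qquad \rho(x) < 1 .
\end{align*}
```

Truncating the series after a few terms would then need only products with $A_{bb}^{-1}$, $A_{ww}^{-1}$ and the two coupling blocks.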