Seismic Imaging on NVIDIA GPUs
Scott Morton
Hess Corporation

Seismic Imaging: Outline
• Seismic data & imaging
• NVIDIA GPUs + CUDA
  – Why?
  – How?
• Three imaging methods
  – Algorithm
  – Challenges
  – Performance

Seismic Imaging: Data
[Figure: subsurface cross-section with labels oil, gas and H2O]

Seismic Imaging: Data
[Figure: recorded seismic data; axes: Receiver, Time (ms)]

Seismic Imaging: An iterative process
• Construct initial earth model
• Perform imaging
• Image & model consistent?
  – Yes: done
  – No: update earth model, then perform imaging again
• Computation scales as size of data and image/model
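
The flowchart above can be sketched as a short loop. This is only an illustration: `iterate_imaging`, `perform_imaging`, `is_consistent` and `update_model` are hypothetical names, and the toy usage replaces real imaging with a one-parameter velocity model and made-up numbers.

```python
def iterate_imaging(model, perform_imaging, is_consistent, update_model, max_iter=50):
    """Repeat imaging and model updates until image and model agree."""
    for iteration in range(1, max_iter + 1):
        image = perform_imaging(model)
        if is_consistent(image, model):
            return model, image, iteration
        model = update_model(model, image)
    raise RuntimeError("image and model did not become consistent")


# Toy stand-in (assumption): the "model" is one velocity, the "image" is one reflector depth.
TWO_WAY_TIME = 1.0    # s, observed reflection time (made-up value)
KNOWN_DEPTH = 1000.0  # m, depth known independently, e.g. from a well (made-up value)


def perform_imaging(velocity):
    return velocity * TWO_WAY_TIME / 2.0          # imaged reflector depth


def is_consistent(depth, velocity, tol=1.0):
    return abs(depth - KNOWN_DEPTH) < tol


def update_model(velocity, depth):
    target = velocity * KNOWN_DEPTH / depth       # velocity that would image the right depth
    return velocity + 0.5 * (target - velocity)   # damped update


model, image, iterations = iterate_imaging(1500.0, perform_imaging, is_consistent, update_model)
print(model, image, iterations)
```

Every pass through the loop repeats the full imaging step, which is why the cost of a single imaging run matters so much.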

Seismic Imaging: Why GPUs?
• Price-to-performance ratio improvement
  – Want 10X to change platforms
    • Payback must more than cover effort & risk
    • Got 10X ten years ago in switching from supercomputers to PC clusters
  – Several years ago there were indicators we could get 10X or more on GPUs
    • Peak performance
    • Benchmarks
    • Simple prototype kernels

Seismic Imaging: Why CUDA & NVIDIA GPUs?
• Ease of programming
  – Must be able to port, maintain & modify production codes (relatively) easily
    • These costs must be included
  – Have tried Cg, Brook and PeakStream
    • All lacking in some aspect
  – CUDA programming model straightforward
    • SIMD-like thread-based parallelism
    • In 1.5 days
      – Took “intro to CUDA” class
      – Wrote a working 2-D seismic modeling code
    • Programming memory hierarchy for optimization is the biggest challenge

Seismic Imaging: How to port a code?
• Design GPU algorithm
  – Optimize for memory hierarchy
  – Keep main data structures in GPU memory
• Create prototype GPU kernel
  – Include main computational characteristics
  – Test performance against CPU kernel
  – Iteratively refine prototype
• Port full kernel & compare with CPU kernel
  – Verify numerical results
  – Compare performance results
• Incorporate into production code & system

Seismic Imaging: Imaging methods
(listed in order of increasing computational cost)
• Kirchhoff imaging
  – High-frequency propagation
  – Ray or eikonal travel-times
• “Wave-equation” imaging
  – One-way propagation: z ~ t
  – Frequency-domain method
  – ADI (alternating direction implicit) finite difference
• “Reverse-time” imaging
  – Two-way propagation
  – Time-domain
  – Explicit finite-difference

Kirchhoff Imaging: Physical algorithm
• Based on the Kirchhoff integral
  – Pre-compute coarse travel-times for propagation from surface locations to image points: $T(\vec{s}, \vec{x})$
  – 4-D surface integral through a 5-D data set:

$$I(\vec{x}) = \iint d^2s \, d^2r \; D\big(\vec{s}, \vec{r}, t = T(\vec{s}, \vec{x}) + T(\vec{x}, \vec{r})\big)$$

  – Computational complexity:
    • $N_I \sim 10^9$ is the number of output image points
    • $N_D \sim 10^8$ is the number of input data traces
    • $f \sim 10$ is the number of cycles/point/trace
    • $f N_I N_D \sim 10^{18}$ cycles ~ 10 CPU-years
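
The complexity estimate above is easy to check numerically. The sketch below assumes a 3 GHz CPU clock, which is not stated on the slide; `kirchhoff_cost` is a hypothetical helper name.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def kirchhoff_cost(n_image, n_data, cycles_per_contribution, clock_hz):
    """Return (total cycles, CPU-years) for f * N_I * N_D migration contributions."""
    cycles = cycles_per_contribution * n_image * n_data
    cpu_years = cycles / clock_hz / SECONDS_PER_YEAR
    return cycles, cpu_years


# N_I ~ 1e9, N_D ~ 1e8, f ~ 10 are from the slide; the 3 GHz clock is an assumption.
cycles, cpu_years = kirchhoff_cost(1e9, 1e8, 10, 3e9)
print(f"{cycles:.0e} cycles, about {cpu_years:.1f} CPU-years")
```

Under that clock-rate assumption the result lands close to the quoted 10 CPU-years.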

Kirchhoff Imaging: Computational kernel
[Figure: source s and receiver r on the surface, image point x at depth, axes x, y, z. T_S and T_R are the travel-times from source and receiver to the image point. The data-trace sample at t = T_S + T_R is added to the image trace; each such addition is 1 migration contribution.]

$$I(\vec{x}) = \sum_{\vec{s}, \vec{r}} D\big(\vec{s}, \vec{r}, t = T(\vec{s}, \vec{x}) + T(\vec{x}, \vec{r})\big)$$
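
A minimal sketch of the kernel for one data trace, written as plain sequential Python rather than CUDA. The name `migrate_trace`, the flat list layout and the nearest-sample lookup are assumptions for illustration; the production code works from coarse travel-time tables, which this sketch skips by taking a travel-time per image point as given.

```python
def migrate_trace(image, trace, dt, t_src, t_rcv):
    """Add one data trace into the image; return the number of contributions.

    image : list of image-point values, updated in place
    trace : list of data samples with sampling interval dt (seconds)
    t_src : travel-time from the source to each image point (T_S)
    t_rcv : travel-time from the receiver to each image point (T_R)
    """
    contributions = 0
    for i in range(len(image)):
        t = t_src[i] + t_rcv[i]        # t = T_S + T_R
        sample = int(round(t / dt))    # nearest sample, no interpolation
        if 0 <= sample < len(trace):
            image[i] += trace[sample]  # 1 migration contribution
            contributions += 1
    return contributions


# Toy usage: a trace with a single spike at sample 5 (t = 20 ms).
trace = [0.0] * 10
trace[5] = 1.0
image = [0.0, 0.0, 0.0]
n = migrate_trace(image, trace, 0.004, [0.008, 0.010, 0.012], [0.008, 0.010, 0.012])
print(image, n)
```

Only the image point whose total travel-time matches the spike picks it up. The loop body is independent for each image point, which is what makes the kernel a natural fit for one GPU thread per image point.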

Kirchhoff Imaging: CUDA kernel
[Figure: CUDA kernel on the GPU. Labels: data traces, in texture, get cached.]

Kirchhoff Imaging: Kernel optimization
Code versions:
0 – Initial Kernel
1 – Used Texture Memory
2 – Used Shared Memory
3 – Global Memory Coalescing
4 – Decreased Data Trace Shared Memory Use
5 – Optimized Use of Shared Memory
6 – Consolidated “if” Statements, Eliminated or Substituted Some Math Operations
7 – Removed an “if” and “for”
8 – Used Texture Memory for Data-Trace Fetch
[Figure: Performance in billions of migration contributions per second (0 to 10) versus code version (0 to 9), for GPU and CPU]

Kirchhoff Imaging: Kernel performance
[Figure: GPU-to-CPU performance ratio. GPU speed-up (0 to 100) versus image points per travel-time cell in x or y (2 to 18). Legend: CUDA 2D Tex and CUDA Linear Tex, each with nix=4 or niy=4, each as plain, G2 and G2 PIN variants. Annotations: G80, GT200, typical parameter range.]

Kirchhoff Imaging: Production status
• GPU kernel incorporated into production code
  – Large kernel speed-ups result in “CPU overhead” for task setup dominating GPU production runs
• Further optimizations
  – Create GPU kernels for most “overhead” components
  – Optimize left-over CPU code (which also helps the CPU version)

| Time (hr) | Set-up | Kernel | Total | Speed-up |
|---|---|---|---|---|
| Original CPU code | 5 | 20 | 25 | |
| Main GPU kernel | 5 | 0.5 | 5.5 | 5 |
| Further optimizations | 0.5 | 0.5 | 1 | 25 |
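
The table is a direct instance of Amdahl's law: once the kernel is fast, the set-up time that was not accelerated dominates. The sketch below recomputes the totals and speed-ups from the set-up and kernel hours in the table; `total_and_speedup` is a hypothetical helper name.

```python
def total_and_speedup(setup_hr, kernel_hr, baseline_total_hr):
    """Return (total run time, speed-up relative to the baseline total)."""
    total = setup_hr + kernel_hr
    return total, baseline_total_hr / total


baseline = 5 + 20  # original CPU code: set-up + kernel hours
rows = [("Main GPU kernel", 5, 0.5), ("Further optimizations", 0.5, 0.5)]
for label, setup, kernel in rows:
    total, speedup = total_and_speedup(setup, kernel, baseline)
    print(f"{label}: total {total} hr, speed-up {speedup:.1f}x")

print(f"Kernel alone: {20 / 0.5:.0f}x")
```

The kernel by itself is 40 times faster, yet the whole run gains only about 4.5 times (rounded to 5 in the table) until the set-up stage is optimized too.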

“Wave-equation” Imaging: One-way propagation
• Based on scalar wave equation

$$\frac{1}{V^2(\vec{x})} \frac{\partial^2 P}{\partial t^2} = \nabla^2 P$$

• Frequency-domain
• Preferred direction of propagation: z ~ t
  – Evolution in depth

$$\frac{\partial P}{\partial z} = \pm \, i \, \frac{\omega}{V(\vec{x})} \sqrt{1 + \frac{V^2(\vec{x})}{\omega^2} \left( \frac{\partial^2}{\partial x^2} + \frac{\partial^2}{\partial y^2} \right)} \; P$$

[Figure: x, y, z axes]

“Wave-equation” Imaging: One-way propagation
• Evolution equation uses
  – Continued fractions
  – Operator splitting
  – ADI finite difference
• Each depth step requires applying four operators
  – Along x
  – Along y
  – Along x + y
  – Along x – y
[Figure: x–y plane]

“Wave-equation” Imaging: Implicit complex tri-diagonal linear systems
[Figure: a tri-diagonal matrix applied to the wavefield $P^{z+\Delta z}$ equals a tri-diagonal matrix applied to $P^{z}$, shown in the x–z plane between depths z and z + Δz]

$$A P^{z+\Delta z}_{x-\Delta x, y} + (1 - 2A) P^{z+\Delta z}_{x, y} + A P^{z+\Delta z}_{x+\Delta x, y} = A^{*} P^{z}_{x-\Delta x, y} + (1 - 2A^{*}) P^{z}_{x, y} + A^{*} P^{z}_{x+\Delta x, y}$$

The evaluation and solution of these complex tri-diagonal systems dominate the computational cost of our wave-equation imaging code.
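
One standard way to solve such a system is the Thomas algorithm (tri-diagonal Gaussian elimination). The slides do not say which solver the production code uses, so the sketch below is an assumption, as are the names `solve_tridiagonal` and `depth_step`, the constant coefficient A and the zero boundary values.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a (complex) tri-diagonal system.

    lower[0] and upper[-1] are ignored.
    """
    n = len(diag)
    c = [0j] * n  # modified upper diagonal
    d = [0j] * n  # modified right-hand side
    c[0] = upper[0] / diag[0]
    d[0] = rhs[0] / diag[0]
    for i in range(1, n):  # forward elimination
        denom = diag[i] - lower[i] * c[i - 1]
        c[i] = upper[i] / denom
        d[i] = (rhs[i] - lower[i] * d[i - 1]) / denom
    x = d[:]
    for i in range(n - 2, -1, -1):  # back substitution
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def depth_step(p, a):
    """Advance one line of the wavefield from z to z + dz along x.

    Left side:  A, (1 - 2A), A applied to P at z + dz
    Right side: A*, (1 - 2A*), A* applied to P at z
    """
    n = len(p)
    ac = a.conjugate()
    rhs = []
    for i in range(n):
        left = p[i - 1] if i > 0 else 0j
        right = p[i + 1] if i < n - 1 else 0j
        rhs.append(ac * left + (1 - 2 * ac) * p[i] + ac * right)
    return solve_tridiagonal([a] * n, [1 - 2 * a] * n, [a] * n, rhs)


p_next = depth_step([0j, 1 + 0j, 0j, 0j], 0.1 + 0.2j)
print(p_next)
```

Both loops in the solver are sequential along the line, so a single system offers little parallelism by itself; the parallelism has to come from solving many independent systems at once.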

“Wave-equation” Imaging: Low level parallelism
• Common work between shot-records
  – Calculating the coefficients of the matrices
    • Dependent on frequency & local velocity
  – Part of the solving of the tri-diagonal system
• Parallelize over shot-records in the kernel
[Figure: one tri-diagonal matrix at depth z + Δz applied to the wavefields P1, P2, …, Pn of n shot-records equals one tri-diagonal matrix at depth z applied to P1, P2, …, Pn]
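
The shared work can be made concrete with a tri-diagonal solve that is split into a matrix-only part and a per-shot part. This is a sequential Python sketch under assumptions: the function names are hypothetical, and the production kernels may divide the work differently.

```python
def factor_tridiagonal(lower, diag, upper):
    """Elimination factors that depend only on the matrix, not on the right-hand side."""
    n = len(diag)
    inv_denom = [0j] * n
    c = [0j] * n
    inv_denom[0] = 1 / diag[0]
    c[0] = upper[0] * inv_denom[0]
    for i in range(1, n):
        inv_denom[i] = 1 / (diag[i] - lower[i] * c[i - 1])
        c[i] = upper[i] * inv_denom[i]
    return inv_denom, c


def solve_with_factors(lower, inv_denom, c, rhs):
    """Per-shot part of the solve: forward sweep on rhs, then back substitution."""
    n = len(rhs)
    d = [0j] * n
    d[0] = rhs[0] * inv_denom[0]
    for i in range(1, n):
        d[i] = (rhs[i] - lower[i] * d[i - 1]) * inv_denom[i]
    x = d[:]
    for i in range(n - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x


def solve_all_shots(lower, diag, upper, rhs_per_shot):
    inv_denom, c = factor_tridiagonal(lower, diag, upper)  # shared work, done once
    # On a GPU this loop over shot-records would be the thread-parallel dimension.
    return [solve_with_factors(lower, inv_denom, c, rhs) for rhs in rhs_per_shot]


solutions = solve_all_shots([1, 1, 1], [2, 2, 2], [1, 1, 1], [[4, 8, 8], [2, 1, 0]])
print(solutions)
```

Because every shot-record at the same frequency and location sees the same matrix, the factorization cost is amortized over all of them.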

“Wave-equation” Imaging: CUDA kernels
• Separate kernels for each operator
  – x, y, x+y and x-y

“Wave-equation” Imaging: Performance
• Production CPU kernel
  – Performance: 15 – 50 Mpoints/sec
• Prototype CUDA kernel
  – Single tri-diagonal system
  – Constant coefficients
  – Performance: 700 Mpoints/sec
• Production CUDA kernels
  – Single kernel handles x, y, x+y & x-y operators
  – Several kernels calculate coefficients
  – Performance: 300-500 Mpoints/sec
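
The slide gives rates rather than speed-ups. The sketch below bounds the implied speed-up from the quoted ranges; it assumes nothing about which CPU rate pairs with which GPU rate, so it reports only the most pessimistic and most optimistic ratios. `speedup_range` is a hypothetical helper name.

```python
def speedup_range(gpu_rate_range, cpu_rate_range):
    """Return (lowest, highest) possible speed-up given two rate ranges."""
    gpu_lo, gpu_hi = gpu_rate_range
    cpu_lo, cpu_hi = cpu_rate_range
    return gpu_lo / cpu_hi, gpu_hi / cpu_lo


cpu = (15, 50)                                   # Mpoints/sec, production CPU kernel
production = speedup_range((300, 500), cpu)      # production CUDA kernels
prototype = speedup_range((700, 700), cpu)       # prototype CUDA kernel
print(f"production: {production[0]:.0f}x to {production[1]:.0f}x")
print(f"prototype:  {prototype[0]:.0f}x to {prototype[1]:.0f}x")
```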

“Reverse-time” Imaging: Two-way propagation
• Based on the scalar wave equation

$$\frac{1}{V^2(\vec{x})} \frac{\partial^2 P}{\partial t^2} = \nabla^2 P$$

• Explicit finite-difference scheme
  – 2nd order in time
  – Variable order in space: 6th - 16th
• Most of the computation
  – Bandwidth
    • Read P(x,t), P(x,t-dt) & V(x) (plus halo!)
    • Write P(x,t+dt)
    • Max performance is 4+ Gpt/s
[Figure: 8th-order finite-difference stencil]
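
A minimal 1-D sketch of one explicit time step, 2nd order in time and 8th order in space. The update follows from the wave equation as P(x,t+dt) = 2 P(x,t) − P(x,t−dt) + V² dt² ∇²P. The names `second_derivative` and `time_step` are hypothetical, and the reduction to one dimension is an assumption made to keep the example short.

```python
# Standard 8th-order central-difference weights for the second derivative.
C8 = [-205 / 72, 8 / 5, -1 / 5, 8 / 315, -1 / 560]


def second_derivative(p, i, dx):
    """8th-order approximation of d2P/dx2 at index i (needs 4 neighbours on each side)."""
    acc = C8[0] * p[i]
    for k in range(1, 5):
        acc += C8[k] * (p[i - k] + p[i + k])
    return acc / (dx * dx)


def time_step(p_prev, p_cur, vel, dt, dx):
    """Return P(x, t+dt) from P(x, t-dt), P(x, t) and V(x)."""
    n = len(p_cur)
    p_next = [0.0] * n  # the 4-point halo at each end is left at zero
    for i in range(4, n - 4):
        lap = second_derivative(p_cur, i, dx)
        p_next[i] = 2 * p_cur[i] - p_prev[i] + (vel[i] * dt) ** 2 * lap
    return p_next


# Check the stencil on P = x^2, whose second derivative is exactly 2.
dx = 0.5
parabola = [(j * dx) ** 2 for j in range(11)]
print(second_derivative(parabola, 5, dx))
```

Each output point reads three values and writes one. Assuming 4-byte floats (not stated on the slide), that is 16 bytes per point, so 4 Gpt/s corresponds to about 64 GB/s of memory traffic before counting the halo.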

“Reverse-time” Imaging: CUDA algorithm
• Paulius’s algorithm
  – Each thread specifies an (x,y) point, marching in z.
  – Each thread block handles a 2-D rectangle.
  – Each 2-D slice + halo is read into shared memory.
  – Threads in a block re-use these values.
[Figure: 2-D thread-block tile indexed by threadIdx.x and threadIdx.y, showing the computed Laplacian region and the required halo]

“Reverse-time” Imaging: CUDA algorithm
• Paulius’s algorithm
  – Threads in a block march in z, storing values “in-front” & “behind” in registers.
  – Number of registers limits the block size and the core-to-halo ratio.
  – GPU performance is predictable: 2.5 – 3 Gpt/s.
[Figure: computed Laplacian region, required halo and thread marching direction]
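
The marching scheme can be emulated sequentially for a single (x,y) "thread". In this sketch a `deque` stands in for the registers holding the values behind and in front, and the current z-slice stands in for shared memory. The name `laplacian_column`, the unit grid spacing and the 2nd-order weights are assumptions chosen to keep the example short.

```python
from collections import deque

C2 = [-2.0, 1.0]  # 2nd-order weights for brevity; the slides use 6th to 16th order


def laplacian_column(p, x, y, coeffs=C2):
    """March one (x, y) point through z and return the Laplacian along that column.

    p is indexed as p[z][y][x]; x and y must be at least the stencil radius from the edges.
    """
    r = len(coeffs) - 1
    nz = len(p)
    out = [0.0] * nz
    # This column's values: r behind, the current one, r in front.
    window = deque((p[z][y][x] for z in range(2 * r + 1)), maxlen=2 * r + 1)
    for z in range(r, nz - r):
        plane = p[z]  # stands in for the 2-D slice held in shared memory
        acc = 3 * coeffs[0] * window[r]
        for k in range(1, r + 1):
            acc += coeffs[k] * (window[r - k] + window[r + k])      # z neighbours: "registers"
            acc += coeffs[k] * (plane[y][x - k] + plane[y][x + k])  # x neighbours: shared slice
            acc += coeffs[k] * (plane[y - k][x] + plane[y + k][x])  # y neighbours: shared slice
        out[z] = acc
        if z + r + 1 < nz:
            window.append(p[z + r + 1][y][x])  # advance: one new column read per step
    return out


# P = x^2 + y^2 + z^2 has a Laplacian of exactly 6 everywhere.
grid = [[[float(x * x + y * y + z * z) for x in range(5)] for y in range(5)] for z in range(5)]
print(laplacian_column(grid, 2, 2))
```

Each step in z brings in only one new value per column. A larger stencil radius lengthens the window, which on the GPU means more registers per thread.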

“Reverse-time” Imaging: Kernel performance
[Figure: 8th order, RTM. Speedup (0 to 30) versus grid size (128x128x200, 256x96x200, 96x256x200, 256x256x200, 480x480x200, 512x512x512, 640x640x200, 640x640x8, 960x704x200, 704x960x200). Legend: const, 16x16; shared, 16x16; const, 16x32; shared, 16x32.]

“Reverse-time” Imaging: Inter-GPU communication
• High frequency requires
  – Dense sampling
  – Large memory
  – Multiple GPUs
  – Halo exchange
  – Inter-GPU communication
• Device ↔ Host
  – Use pinned memory
  – PCIe bus predictably yields ~ 5 GB/s
  – ~ 10 % of kernel time
  – Easily hidden
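
A back-of-the-envelope check of the transfer cost relative to kernel time. The 5 GB/s PCIe rate and 2.5 Gpt/s kernel rate are from the slides; the 512x512x512 sub-grid, the split along z with two neighbours, the 8th-order stencil (4 halo planes per face) and 4-byte floats are all assumptions. `halo_exchange_fraction` is a hypothetical helper name.

```python
def halo_exchange_fraction(nx, ny, nz, radius, kernel_pts_per_s, pcie_bytes_per_s,
                           bytes_per_value=4):
    """Return (kernel time, transfer time, ratio) for one time step on one GPU."""
    kernel_time = nx * ny * nz / kernel_pts_per_s
    halo_values = 2 * radius * nx * ny                  # `radius` planes on each of two z faces
    transfer_bytes = 2 * halo_values * bytes_per_value  # device to host plus host to device
    transfer_time = transfer_bytes / pcie_bytes_per_s
    return kernel_time, transfer_time, transfer_time / kernel_time


kernel_t, transfer_t, fraction = halo_exchange_fraction(512, 512, 512, 4, 2.5e9, 5e9)
print(f"kernel {kernel_t * 1e3:.1f} ms, transfer {transfer_t * 1e3:.1f} ms, "
      f"ratio {fraction:.1%}")
```

Under these assumptions the transfer costs a few percent of the kernel time, the same order as the ~10 % quoted above, and small enough to overlap with computation.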

“Reverse-time” Imaging: Inter-GPU communication
• CPU process ↔ process
  – Currently using MPI
    • From legacy code
  – Performance variable
  – Comparable to kernel time
  – Solutions
    • OpenMP?
    • Single controlling process?
• Node ↔ node
  – Currently Gigabit Ethernet
  – Solution? InfiniBand? 10-GigE?

Seismic Imaging: Summary
• Seismic imaging CUDA codes
  – All 3 main codes written & verified
    • Two in production
    • One in production testing/optimization
  – All done with about two man-years of effort
  – Kernel speed-ups vary from 10 – 80 X on GT200
  – 456-GPU cluster out-performs 3000-CPU cluster
• GPU cluster
  – Jan 2008: bought 32 nodes (128 G80 GPUs)
  – Dec 2008: upgraded & expanded to 456 GT200s
  – Nov 2009: expanding to 1200 GT200s

Seismic Imaging: Acknowledgements
• Code co-authors/collaborators
  – Thomas Cullison (Hess & Colorado School of Mines)
  – Paulius Micikevicius (NVIDIA)
  – Igor Terentyev (Hess & Rice University)
• Hess GPU systems
  – Jeff Davis, Mac McCalla
• NVIDIA support & management
  – Ty Mckercher, Paul Holzhauer, Jeff Saunders, Philip Nenon
• Hess management
  – Jacques Leveille, Vic Forsyth, Jim Sherman