


GPU-accelerated element-free reverse-time migration with Gauss points partition

Zhen Zhou1,2, Xiaofeng Jia1,3 and Xiaodong Qiang1

1 Laboratory of Seismology and Physics of Earth's Interior, School of Earth and Space Sciences, University of Science and Technology of China, 96 Jinzhai Rd., Hefei, Anhui 230026, People's Republic of China
2 Agrosphere IBG-3, Forschungszentrum Jülich GmbH, D-52425 Jülich, Germany

E-mail: [email protected]

Received 30 June 2017, revised 17 November 2017
Accepted for publication 11 December 2017
Published 26 February 2018

Abstract
An element-free method (EFM) has been demonstrated successfully in elasticity, heat conduction and fatigue crack growth problems. We present the theory of EFM and its numerical applications in seismic modelling and reverse time migration (RTM). Compared with the finite difference method and the finite element method, the EFM has unique advantages: (1) independence of grids in computation and (2) lower expense and more flexibility (because only the information of the nodes and the boundary of the concerned area is required). However, due to improper computation and storage of some large sparse matrices, such as the mass matrix and the stiffness matrix, EFM is difficult to apply to seismic modelling and RTM for a large velocity model. To solve the problem of storage and computational efficiency, we propose a concept of Gauss points partition and utilise the graphics processing unit to improve the computational efficiency. We employ the compressed sparse row format to compress the intermediate large sparse matrices and simplify the operations by solving the linear equations with the CULA solver. To improve the computational efficiency further, we introduce the concept of the lumped mass matrix. Numerical experiments indicate that the proposed method is accurate and more efficient than the regular EFM.

Keywords: element-free method, GPU, Gauss points partition, lumped mass matrix, reverse time migration

(Some figures may appear in colour only in the online journal)

1. Introduction

Currently, many numerical strategies, such as the finite difference method (FDM) and the finite element method (FEM) (Cohen et al 1992, Ichimura et al 2007), have been developed to solve the seismic wave equations (Claerbout 1971, Mullen and Belytschko 1982). Both methods, in practice, have certain shortcomings in either accuracy or computational cost. For example, FEM is restricted by traditional finite element grids when dealing with large deformation, the simulation of structural damage, and high or transient stress gradients. FDM has the following deficiencies: (1) the conservation of the discrete equation is difficult to guarantee and (2) the adaptability of the method to irregular regions is poor.

The element-free method (EFM) (Belytschko et al 1994, Lu et al 1995, Masafumi et al 2009) provides a possible solution to the above issues for seismic modelling and imaging. In fact, EFM has been demonstrated successfully in elasticity, heat conduction and fatigue crack growth problems (Belytschko et al 1994). EFM was first introduced to solve the seismic wave equation and was successfully applied to seismic modelling and imaging by Jia and Hu (2006). However, Jia's work can address at most 81×81 nodes in a 2 GB memory computational environment because of improper storage of the mass matrix (M) and the stiffness matrix (K). Fan and Jia (2013) compressed the sparse matrices and simplified the operations by solving linear equations instead of inverting sparse matrices. Their method uses a stack data structure to store the nonzero entries when computing K and M, which solves the storage problem at the cost of long computational time.

Journal of Geophysics and Engineering

J. Geophys. Eng. 15 (2018) 718–728 (11pp) https://doi.org/10.1088/1742-2140/aaa0a9

3 Author to whom any correspondence should be addressed.

1742-2132/18/030718+11$33.00 © 2018 Sinopec Geophysical Research Institute Printed in the UK


In EFM, the number of Gauss points should be consistent with the number of model nodes; otherwise, the accuracy of the intermediate coefficient matrices is degraded (Fan and Jia 2013). Thus, when the number of nodes in the velocity model increases for higher resolution, the memory requirement becomes a severe bottleneck. The decomposed element-free Galerkin method (Marfurt 1984) was presented to overcome the memory limit, and the node pair-wise approach (Karatarakis et al 2013) was developed for its amenability to parallelism. To solve the problems of storage and computational efficiency, we propose the concept of Gauss points partition (GPP) and utilise the graphics processing unit (GPU) to accelerate the computation of K and M (Liu et al 2008). We employ the compressed sparse row (CSR) format to compress the intermediate sparse matrices and simplify the operations by solving the linear equations with the Conjugate Gradient (CG) solver of CULA Sparse (Xu et al 2010) instead of the linear sparse solver 'PARDISO' (Gould et al 2005). In this paper, NVIDIA's Compute Unified Device Architecture 5.0 (CUDA 5.0) is used, and the graphics card is an NVIDIA Tesla K20 with 5 GB of memory and a 5.2 GHz memory clock.

Due to the characteristics of the Gauss points, the GPP method does not influence the propagation of seismic waves in the velocity model. In this paper, the GPP method includes four points: (1) Gauss points correspond with GPU threads one by one and participate in parallel computation (Fu et al 2010). (2) The global model search is replaced by a local search within the influence domain of each Gauss point. (3) Relative coordinates of the model nodes are used within the influence domain of each Gauss point. (4) The summation is performed within the influence domains of different Gauss points that contain the same model node.

In the process of time iteration, we employ the lumped mass matrix (LMM) (Wang et al 2004, Wu 2006) to obtain the inverses of the coefficient matrices. For different scales of the velocity model, we can either solve the linear equations using CULA or simply invert the LMM. For reverse time migration (RTM), both the zero-lag cross-correlation imaging condition (Claerbout 1985, Sava and Fomel 2006, Sava 2007) and the source compensation imaging condition (Kaelin and Guitton 2006) are used. To verify these methods, a simple three-layer velocity model and the Marmousi model are employed. The final imaging results are relatively accurate compared with other numerical methods, and the numerical examples demonstrate that our methods are efficient.

2. Construction of the discrete equations

Consider the following two-dimensional (2D) scalar wave propagation problem:

$$
\begin{cases}
\dfrac{\partial^2 u}{\partial t^2}=D(\mathbf{x})\left(\dfrac{\partial^2 u}{\partial x^2}+\dfrac{\partial^2 u}{\partial y^2}\right), & \mathbf{x}\in\Omega,\; t>0,\\[1mm]
u|_{t=0}=u_0(\mathbf{x}), & \mathbf{x}\in\Omega,\\[1mm]
\left.\dfrac{\partial u}{\partial t}\right|_{t=0}=v_0(\mathbf{x}), & \mathbf{x}\in\Omega,\\[1mm]
u|_{\Gamma}=\bar{u}(t), & \text{on } \Gamma,
\end{cases}\tag{1}
$$

where u is the displacement field; t, x and y denote the temporal and spatial coordinates; D(x) is the square of the wave velocity in the media. To solve the equation easily, we define the moving least squares approximant as

$$
u^h(\mathbf{x})=\sum_{j}^{m}p_j(\mathbf{x})\,a_j(\mathbf{x})\equiv\mathbf{p}^{\mathrm T}(\mathbf{x})\,\mathbf{a}(\mathbf{x}),\tag{2}
$$

$$
\mathbf{p}^{\mathrm T}(\mathbf{x})=[1,x,y]\quad\text{or}\quad[1,x,y,x^2,xy,y^2],\tag{3}
$$

where m is the dimension of the basis vector p(x). To obtain a(x) in equation (2), we minimise the following expression:

$$
J=\sum_{I}^{N_{\inf}}w(\mathbf{x}-\mathbf{x}_I)\left(\mathbf{p}^{\mathrm T}(\mathbf{x}_I)\mathbf{a}(\mathbf{x})-u_I\right)^2,\tag{4}
$$

where w(x − x_I) is the weight function defined by the influence domain of x, in which w(x − x_I) > 0. For a node x_I outside the influence domain of x, w(x − x_I) = 0. N_inf is the number of nodes in the influence domain of x, and u_I is the nodal value at x_I. We minimise the above norm J to get a(x):

$$
\frac{\partial J}{\partial\mathbf{a}(\mathbf{x})}=0.\tag{5}
$$

We obtain

$$
\mathbf{a}(\mathbf{x})=\mathbf{A}^{-1}(\mathbf{x})\,\mathbf{B}(\mathbf{x})\,\mathbf{U},\tag{6}
$$

where

$$
\mathbf{A}(\mathbf{x})=\sum_{I}^{N_{\inf}}w(\mathbf{x}-\mathbf{x}_I)\,\mathbf{p}(\mathbf{x}_I)\,\mathbf{p}^{\mathrm T}(\mathbf{x}_I),\tag{7}
$$

$$
\mathbf{B}(\mathbf{x})=\left[w(\mathbf{x}-\mathbf{x}_1)\mathbf{p}(\mathbf{x}_1),\;w(\mathbf{x}-\mathbf{x}_2)\mathbf{p}(\mathbf{x}_2),\;\ldots,\;w(\mathbf{x}-\mathbf{x}_{N_{\inf}})\mathbf{p}(\mathbf{x}_{N_{\inf}})\right],\tag{8}
$$

$$
\mathbf{U}=[u_1,u_2,\ldots,u_{N_{\inf}}]^{\mathrm T}.\tag{9}
$$

From equations (7) and (8), we have

$$
\mathbf{A}(\mathbf{x})=\mathbf{B}(\mathbf{x})\,\mathbf{P},\qquad\mathbf{P}=[\mathbf{p}(\mathbf{x}_1),\ldots,\mathbf{p}(\mathbf{x}_{N_{\inf}})]^{\mathrm T}.\tag{10}
$$

Substituting a(x) into equation (2) yields

$$
u^h(\mathbf{x})=\mathbf{p}^{\mathrm T}(\mathbf{x})\,\mathbf{A}^{-1}(\mathbf{x})\,\mathbf{B}(\mathbf{x})\,\mathbf{U}\equiv\boldsymbol{\varphi}(\mathbf{x})\,\mathbf{U},\tag{11}
$$

where φ(x) is the shape function.
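As a concrete illustration of the construction in equations (2)-(11), the following NumPy sketch evaluates the MLS shape functions at one point. The regular node layout, the function name and the cubic spline weight are assumptions for illustration; the paper does not specify its weight function.

```python
import numpy as np

def mls_shape_functions(x, nodes, radius):
    """Moving least squares shape functions phi(x) of equation (11),
    with the linear basis p = [1, x, y] (equation (3)) and a cubic
    spline weight (an assumption; the paper does not state its weight).
    `nodes` is an (N, 2) array of model nodes and `radius` the radius
    of the circular influence domain."""
    d = np.linalg.norm(nodes - x, axis=1)
    inside = d < radius                     # nodes in the influence domain
    xi = nodes[inside]
    r = d[inside] / radius
    # compactly supported cubic spline weight w(x - x_I)
    w = np.where(r <= 0.5,
                 2/3 - 4*r**2 + 4*r**3,
                 4/3 - 4*r + 4*r**2 - (4/3)*r**3)
    P = np.column_stack([np.ones(len(xi)), xi[:, 0], xi[:, 1]])  # rows p^T(x_I)
    A = (P * w[:, None]).T @ P              # A(x), equation (7)
    B = P.T * w                             # B(x), equation (8)
    p_x = np.array([1.0, x[0], x[1]])
    phi = p_x @ np.linalg.solve(A, B)       # p^T A^{-1} B, equation (11)
    return phi, inside
```

The resulting shape functions form a partition of unity and reproduce the linear basis exactly, which is the consistency property the element-free discretisation relies on.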

Using the variational principle for equation (1), it is equivalent to finding the minimum of the following functional:

$$
J_a(u)=\iint_{\Omega}\left[\frac{1}{2}D(\mathbf{x})\left(\frac{\partial u}{\partial x}\right)^{2}+\frac{1}{2}D(\mathbf{x})\left(\frac{\partial u}{\partial y}\right)^{2}+\ddot{u}\,u\right]\mathrm{d}x\,\mathrm{d}y+\frac{a}{2}\int_{\Gamma}(u-\bar{u})^{2}\,\mathrm{d}\Gamma,\tag{12}
$$

where a is the penalty factor. Applying equation (11) to this principle, we have

$$
\mathbf{K}\mathbf{U}+\mathbf{M}\ddot{\mathbf{U}}+\mathbf{F}=0,\tag{13}
$$

in which K is the stiffness matrix, M is the mass matrix, and F is the equivalent load vector. When the spatial derivative of the velocity is ignored, these large and sparse


matrices are defined by

$$
\begin{cases}
\mathbf{K}=\displaystyle\iint_{\Omega}\left[D\left(\frac{\partial\boldsymbol{\varphi}^{\mathrm T}}{\partial x}\right)\left(\frac{\partial\boldsymbol{\varphi}}{\partial x}\right)+D\left(\frac{\partial\boldsymbol{\varphi}^{\mathrm T}}{\partial y}\right)\left(\frac{\partial\boldsymbol{\varphi}}{\partial y}\right)\right]\mathrm{d}x\,\mathrm{d}y+a\int_{\Gamma}\boldsymbol{\varphi}^{\mathrm T}\boldsymbol{\varphi}\,\mathrm{d}\Gamma,\\[2mm]
\mathbf{M}=\displaystyle\iint_{\Omega}\boldsymbol{\varphi}^{\mathrm T}\boldsymbol{\varphi}\,\mathrm{d}x\,\mathrm{d}y,\\[2mm]
\mathbf{F}=-a\displaystyle\int_{\Gamma}\boldsymbol{\varphi}^{\mathrm T}\bar{u}\,\mathrm{d}\Gamma.
\end{cases}\tag{14}
$$

The stiffness matrix and the mass matrix are usually calculated by Gauss quadratures that gather the contribution of all Gauss points. Thus, ignoring the boundary conditions and the spatial derivative of the velocity in equation (14), new expressions for K and M can be obtained:

$$
\begin{cases}
\mathbf{K}=\displaystyle\sum_{DG}\sum_{GN}\mathbf{K}_{GN}\equiv\sum_{DG}\sum_{GN}\left[D\left(\frac{\partial\boldsymbol{\varphi}_{GN}^{\mathrm T}}{\partial x}\right)\left(\frac{\partial\boldsymbol{\varphi}_{GN}}{\partial x}\right)+D\left(\frac{\partial\boldsymbol{\varphi}_{GN}^{\mathrm T}}{\partial y}\right)\left(\frac{\partial\boldsymbol{\varphi}_{GN}}{\partial y}\right)\right],\\[2mm]
\mathbf{M}=\displaystyle\sum_{DG}\sum_{GN}\mathbf{M}_{GN}\equiv\sum_{DG}\sum_{GN}\boldsymbol{\varphi}_{GN}^{\mathrm T}\boldsymbol{\varphi}_{GN},
\end{cases}\tag{15}
$$

where K_GN and M_GN are computed at the corresponding model nodes within the influence domain of each Gauss point. The summation is performed within the influence domains of different Gauss points that contain the same model node. The curly braces denote the collection of the calculation, which represents the final results assembled by the GPP method.

When the spatial derivative of the velocity is considered, equation (15) should be modified to the following equation (16):

$$
\begin{cases}
\mathbf{K}=\displaystyle\sum_{DG}\sum_{GN}\mathbf{K}_{GN}\equiv\sum_{DG}\sum_{GN}\left[D\left(\frac{\partial\boldsymbol{\varphi}_{GN}^{\mathrm T}}{\partial x}\right)\left(\frac{\partial\boldsymbol{\varphi}_{GN}}{\partial x}\right)+D\left(\frac{\partial\boldsymbol{\varphi}_{GN}^{\mathrm T}}{\partial y}\right)\left(\frac{\partial\boldsymbol{\varphi}_{GN}}{\partial y}\right)+\left(\frac{\partial D}{\partial x}\right)\boldsymbol{\varphi}_{GN}^{\mathrm T}\left(\frac{\partial\boldsymbol{\varphi}_{GN}}{\partial x}\right)+\left(\frac{\partial D}{\partial y}\right)\boldsymbol{\varphi}_{GN}^{\mathrm T}\left(\frac{\partial\boldsymbol{\varphi}_{GN}}{\partial y}\right)\right],\\[2mm]
\mathbf{M}=\displaystyle\sum_{DG}\sum_{GN}\mathbf{M}_{GN}\equiv\sum_{DG}\sum_{GN}\boldsymbol{\varphi}_{GN}^{\mathrm T}\boldsymbol{\varphi}_{GN}.
\end{cases}\tag{16}
$$

The system (13) is actually semi-discrete because it contains the acceleration Ü. The time recursion relations can be obtained by integrating Ü using the average acceleration algorithm:

$$
\begin{cases}
\mathbf{U}^{n+1}=\mathbf{U}^{n}+\Delta t\,\dot{\mathbf{U}}^{n}+\dfrac{\Delta t^{2}}{4}\left(\ddot{\mathbf{U}}^{n+1}+\ddot{\mathbf{U}}^{n}\right),\\[1mm]
\dot{\mathbf{U}}^{n+1}=\dot{\mathbf{U}}^{n}+\dfrac{\Delta t}{2}\left(\ddot{\mathbf{U}}^{n+1}+\ddot{\mathbf{U}}^{n}\right).
\end{cases}\tag{17}
$$

From equation (13), we obtain the following equation:

$$
\begin{cases}
\mathbf{K}\mathbf{U}^{n+1}+\mathbf{M}\ddot{\mathbf{U}}^{n+1}+\mathbf{F}^{n+1}=0,\\
\mathbf{K}\mathbf{U}^{n}+\mathbf{M}\ddot{\mathbf{U}}^{n}+\mathbf{F}^{n}=0.
\end{cases}\tag{18}
$$

Applying the above equation to the average acceleration algorithm, we obtain the final time recursion equation as

$$
\begin{cases}
\mathbf{U}^{n+1}=\left(\mathbf{M}+\dfrac{\Delta t^{2}}{4}\mathbf{K}\right)^{-1}\left(\mathbf{M}-\dfrac{\Delta t^{2}}{4}\mathbf{K}\right)\mathbf{U}^{n}+\Delta t\left(\mathbf{M}+\dfrac{\Delta t^{2}}{4}\mathbf{K}\right)^{-1}\mathbf{M}\dot{\mathbf{U}}^{n}-\dfrac{\Delta t^{2}}{4}\left(\mathbf{M}+\dfrac{\Delta t^{2}}{4}\mathbf{K}\right)^{-1}\left(\mathbf{F}^{n+1}+\mathbf{F}^{n}\right),\\[2mm]
\dot{\mathbf{U}}^{n+1}=\dot{\mathbf{U}}^{n}-\dfrac{\Delta t}{2}\mathbf{M}^{-1}\mathbf{K}\left(\mathbf{U}^{n+1}+\mathbf{U}^{n}\right)-\dfrac{\Delta t}{2}\mathbf{M}^{-1}\left(\mathbf{F}^{n+1}+\mathbf{F}^{n}\right),
\end{cases}\tag{19}
$$

where n denotes the time step. Equation (19) can be modified as

$$
\begin{cases}
\left(\mathbf{M}+\dfrac{\Delta t^{2}}{4}\mathbf{K}\right)\mathbf{U}^{n+1}=\left(\mathbf{M}-\dfrac{\Delta t^{2}}{4}\mathbf{K}\right)\mathbf{U}^{n}+\Delta t\,\mathbf{M}\dot{\mathbf{U}}^{n}-\dfrac{\Delta t^{2}}{4}\left(\mathbf{F}^{n+1}+\mathbf{F}^{n}\right),\\[2mm]
\mathbf{M}\dot{\mathbf{U}}^{n+1}=\mathbf{M}\dot{\mathbf{U}}^{n}-\dfrac{\Delta t}{2}\mathbf{K}\left(\mathbf{U}^{n+1}+\mathbf{U}^{n}\right)-\dfrac{\Delta t}{2}\left(\mathbf{F}^{n+1}+\mathbf{F}^{n}\right).
\end{cases}\tag{20}
$$

In this way, the time recursion relations become linear equations. To enhance the computational efficiency, we employ the CG iterative solver and the Jacobi preconditioner of CULA Sparse. The configuration for CULA Sparse is a relative tolerance of 10⁻⁵ and a maximum of 10⁴ iterations (Zhang et al 2010).
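A minimal sketch of one time step of equation (20), with SciPy's conjugate gradient solver and a Jacobi (diagonal) preconditioner standing in for the CULA Sparse solver used in the paper; the function name and arguments are illustrative assumptions.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def step(K, M, U, V, Fn, Fn1, dt):
    """One average-acceleration time step of equation (20). K and M are
    sparse matrices; U and V are the displacement and velocity at step
    n; Fn and Fn1 are the load vectors at steps n and n+1. The two
    linear systems are solved by CG with a Jacobi preconditioner,
    mirroring the CG/Jacobi configuration described in the text."""
    A = (M + (dt**2 / 4) * K).tocsr()
    P_A = sp.diags(1.0 / A.diagonal())          # Jacobi preconditioner
    rhs_u = (M - (dt**2 / 4) * K) @ U + dt * (M @ V) \
            - (dt**2 / 4) * (Fn1 + Fn)
    U1, _ = spla.cg(A, rhs_u, M=P_A)            # first system of (20)
    P_M = sp.diags(1.0 / M.diagonal())
    rhs_v = M @ V - (dt / 2) * (K @ (U1 + U)) - (dt / 2) * (Fn1 + Fn)
    V1, _ = spla.cg(M.tocsr(), rhs_v, M=P_M)    # second system of (20)
    return U1, V1
```

For a diagonal test system each degree of freedom behaves as an independent harmonic oscillator, which makes the scheme easy to check against the exact cosine solution.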

3. Computations of K and M utilising GPP

Note that the calculations of the coefficient matrices K and M require much computer time. To improve the computational efficiency, we employ a GPU when computing K and M. Because of the limited memory of the GPU, the GPU is unable to handle the whole large velocity model at once. To overcome this limitation, the GPP method is used. From equations (15) and (16), we take two steps to calculate the coefficient matrices K and M: the first step is to compute K_GN and M_GN, and the second step is to sum K_GN and M_GN.

Considering that K and M are usually sparse matrices, a more efficient way is to store and operate on only the non-zero elements of a sparse matrix. In this paper, we use the CSR format (Fan and Jia 2013).
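A small illustration of the CSR layout used for K and M; SciPy is used here merely to display the three CSR arrays, while the paper's own implementation is in C/CUDA.

```python
import numpy as np
from scipy.sparse import csr_matrix

# A small banded matrix with the sparsity pattern typical of K and M.
dense = np.array([[4., 1., 0., 0.],
                  [1., 4., 1., 0.],
                  [0., 1., 4., 1.],
                  [0., 0., 1., 4.]])
A = csr_matrix(dense)
# CSR keeps three arrays: the non-zero values, the column index of
# each value, and row pointers marking where each row starts.
print(A.data)     # the 10 non-zero values
print(A.indices)  # their column indices
print(A.indptr)   # row pointers: [0 2 5 8 10]
```

The storage cost is proportional to the number of non-zeros rather than to the full matrix size, which is what makes large K and M tractable.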

Since each Gauss point is independent, we can allocate Gauss points to GPU threads. Due to the limited memory of the GPU, we cannot allocate all Gauss points to GPU threads at once. In practice, a 2D velocity model usually has more grid points in the horizontal direction than in depth.


To facilitate the summation of K_GN and M_GN in the next step, we partition the Gauss points only in the horizontal direction (figure 2(b)). In the computation, the loops over Gauss points and model nodes must follow the principle that the horizontal direction is the outer loop.

We allocate the proper number of Gauss points to the kernel's threads according to the GPU memory size (Weiss and Shragge 2013) after the GPP. Unlike the outer-loop Gauss points, which participate in parallel computation via GPU threads (Goedel et al 2009), the inner-loop model nodes are computed sequentially on the GPU. A strategy of using a local search to replace the global search in the model node loop is employed: the inner-loop model nodes can only be located near the influence domain of each corresponding Gauss point. Because the influence domain of a Gauss point is circular in the 2D case, we take the outer tangent square of the circle, so that the inner model node loop only covers the square area. The inner loop is thus only related to the number of model nodes in the influence domain of each Gauss point, and its size does not grow as the model size increases. Because the number of nodes in the influence domain of each Gauss point is limited (figure 1), very little memory is required for the computation. The threads on the GPU are organised using the following settings:

Block size = Dim3(16, 16, 1),
Grid size = Dim3(GP_x/16, GP_y/16, 1),

where GP_x and GP_y represent the numbers of Gauss points allocated to the kernel's threads in the horizontal and vertical directions, respectively.
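The local search inside the outer tangent square of each influence domain can be sketched as follows; the regular node grid, the spacing parameter and the function name are assumptions for illustration, not the paper's code.

```python
import numpy as np

def nodes_in_influence(gauss_pt, origin, h, nx, ny, radius):
    """Local node search for one Gauss point (section 3): instead of a
    global search over all model nodes, only nodes inside the outer
    tangent square of the circular influence domain are tested. Nodes
    lie on a regular nx-by-ny grid with spacing h."""
    gx, gy = gauss_pt
    # index bounds of the outer tangent square, clipped to the model
    i0 = max(int(np.floor((gx - radius - origin[0]) / h)), 0)
    i1 = min(int(np.ceil((gx + radius - origin[0]) / h)), nx - 1)
    j0 = max(int(np.floor((gy - radius - origin[1]) / h)), 0)
    j1 = min(int(np.ceil((gy + radius - origin[1]) / h)), ny - 1)
    found = []
    for i in range(i0, i1 + 1):            # loop only over the square
        for j in range(j0, j1 + 1):
            x = origin[0] + i * h
            y = origin[1] + j * h
            if (x - gx) ** 2 + (y - gy) ** 2 < radius ** 2:
                found.append((i, j))
    return found
```

Because the square only ever contains the few nodes of one influence domain, the cost of this loop is independent of the model size, which is the point of the local-search strategy.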

It is necessary to use relative coordinates instead of absolute coordinates when we compute model node values within the influence domain of each Gauss point. As indicated in equations (7), (8), (10), (11), (15) and (16), to generate the stiffness matrix and the mass matrix using the GPU, we only compute P(x), B(x), ∂B(x)/∂x and ∂B(x)/∂y. Their relative-coordinate results are saved in three-dimensional arrays, which are formed according to the relative coordinates of the model nodes and Gauss points. The maximum relative coordinate of the model nodes equals the number of model nodes within the influence domain of each Gauss point. After transferring the results from the GPU to the CPU, we convert the small 3D arrays to 2D arrays for each Gauss point. To compute K_GN and M_GN, the absolute coordinates of the model nodes based on the entire velocity model should be recovered for each Gauss point. For details on the data transfer between the CPU and the GPU in computing K_GN and M_GN, refer to figure 1.

When summing K_GN and M_GN in the influence domains of different Gauss points that contain the same model nodes (figure 2(a)) based on the global velocity model, we encounter a problem caused by the fixed storage of the CPU. Although we could perform the above summation on a partitioned velocity model, partitioning the velocity model itself would affect the propagation of the seismic wave. To overcome this conflict, we employ the GPP method once again. Note that the model nodes involved in the summation are located in the overlapping influence domains of neighbouring Gauss points. Therefore, when we partition the Gauss points, the corresponding model nodes that must be summed are also partitioned. The summation is computed until no influence domains of Gauss points contain the same model nodes. To suit the memory of the CPU (Davis and Chung 2012), we partition the Gauss points and make the corresponding coordinate index subscripts of the model nodes (MS) satisfy max(MS) − min(MS) ≤ M, where M is an integer related to the memory size (e.g., M = 6561 for 2 GB of memory).

4. Solving linear equations using CULA and the LMM

To avoid matrix inversion and multiplication, we solve the linear equation (20). We employ the iterative method of the CULA Sparse library, taking advantage of the CG iterative solver and the Jacobi preconditioner on the GPU platform. The linear equations Ax = b represent the extreme condition of the corresponding quadratic function. By using the CG iterative formula (Hestenes and Stiefel 1952), the linear equations are solved.

From equation (20), three matrix-vector multiplications must be performed and two sets of linear equations must be solved. The three matrices involved in the matrix-vector multiplications are (M − (Δt²/4)K), M and K; the two matrices involved in solving the linear equations are (M + (Δt²/4)K) and M.

In FEM, when the size of the finite element is much smaller than the seismic wavelength, we can replace the sparse symmetric mass matrix with a diagonal matrix, i.e., the LMM. Although the computational accuracy will be slightly lower (Mullen and Belytschko 1982), finite element theory (Koketsu et al 2004, Komatitsch et al 2010a) has proved that the LMM and the consistent mass matrix (CMM) give errors of the same order of magnitude (Fried and Malkus 1975). The approximate imaging results of the LMM can serve as starting imaging results, since they avoid solving linear equations, which is helpful as a reference for other, more complex methods.

The influence domain in EFM is similar to the finite element in FEM. Because the influence domain usually contains dozens of nodes, it can hardly satisfy the principle that its size is much smaller than the seismic wavelength. While using the LMM to avoid solving linear equations, we employ the CMM in the matrix-vector multiplications. According to equation (14), the LMM can be expressed as follows:

$$
M_{ij}^{L}=\delta_{ij}\iint_{\Omega}\varphi_i\,\mathrm{d}x\,\mathrm{d}y,\tag{21}
$$

where δ_ij = 1 if i = j, and δ_ij = 0 otherwise.
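Because the MLS shape functions form a partition of unity (Σ_j φ_j = 1), the diagonal entry ∫φ_i of equation (21) equals the row sum of the consistent mass matrix, Σ_j ∫φ_i φ_j. The following sketch shows that row-sum equivalence; it is an illustration of the identity, not the paper's implementation.

```python
import numpy as np

def lump_mass(M_consistent):
    """Diagonal lumped mass matrix of equation (21), obtained as the
    row sums of the consistent mass matrix. This is equivalent to
    delta_ij * integral(phi_i) whenever the shape functions form a
    partition of unity."""
    return np.diag(M_consistent.sum(axis=1))
```

Row-sum lumping preserves the total mass (the sum of all entries), which is why it is a safe diagonal approximation.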


Figure 1. Data transfer between the CPU and the GPU in EFM (GPP-GPU-CUDA).

Figure 2. (a) Influence domains of different Gauss points (×). The red points represent the model nodes. As an example, the arrow indicates the node on which K_GN and M_GN must be summed. (b) Gauss points are partitioned in the horizontal orientation.


Consider one of the matrix inversions in equation (19):

$$
\left(\mathbf{M}^{L}+\dfrac{\Delta t^{2}}{4}\mathbf{K}\right)^{-1}=\left(\mathbf{I}+(\mathbf{M}^{L})^{-1}\dfrac{\Delta t^{2}}{4}\mathbf{K}\right)^{-1}(\mathbf{M}^{L})^{-1}.\tag{22}
$$

Using a binomial expansion for the right-hand side of equation (22), we have

$$
\left(\mathbf{M}^{L}+\dfrac{\Delta t^{2}}{4}\mathbf{K}\right)^{-1}=\left(\mathbf{I}-(\mathbf{M}^{L})^{-1}\dfrac{\Delta t^{2}}{4}\mathbf{K}+\left((\mathbf{M}^{L})^{-1}\dfrac{\Delta t^{2}}{4}\mathbf{K}\right)^{2}-\cdots\right)(\mathbf{M}^{L})^{-1},\tag{23}
$$

because the eigenvalues of −(M^L)⁻¹(Δt²/4)K lie between −1 and 0, so the expansion in equation (23) is convergent. The numerical examples indicate that, in most cases, it is sufficient to retain only the first two terms of the expansion. The imaging accuracy of the LMM improves when more terms on the right-hand side of equation (23) are retained, at the cost of increased computational time. Of course, a deviation between the LMM and FDM results remains no matter how many terms are kept.
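A sketch of the truncated expansion of equation (23); the two-term default mirrors the text, and the comparison against the exact inverse in the usage below is purely illustrative.

```python
import numpy as np

def approx_inverse(ML_diag, K, dt, terms=2):
    """Truncated binomial (Neumann) expansion of equation (23):
    (M^L + (dt^2/4) K)^{-1} ~= (I - C + C^2 - ...) (M^L)^{-1},
    where C = (M^L)^{-1} (dt^2/4) K. The series converges when the
    eigenvalues of C have magnitude below 1; the text keeps two
    terms by default."""
    Minv = np.diag(1.0 / ML_diag)          # trivial inverse of the LMM
    C = (dt ** 2 / 4) * (Minv @ K)
    partial = np.zeros_like(C)
    power = np.eye(len(ML_diag))           # (-C)^k, starting at k = 0
    for _ in range(terms):
        partial = partial + power
        power = -C @ power
    return partial @ Minv
```

Retaining more terms shrinks the truncation error geometrically, consistent with the text's observation that accuracy improves, at extra cost, as terms are added.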

It is feasible to invert matrices with the LMM instead of solving linear equations for small velocity models, for example, the Marmousi model. However, for large models, the accuracy of the method decreases significantly. This reduced accuracy is caused by the large vertical size of the model, which makes the row values of the matrix quite scattered and different from the diagonal values of the lumped matrix. In this case, dropping the high-order terms in equation (23) leads to large errors. For large models we therefore solve the linear equations with the CMM only, using the CULA CG iterative solver.

5. Numerical examples

In this section, we test several numerical examples of modelling and prestack RTM (Buske et al 2009). The zero-lag cross-correlation imaging condition has the form

$$
I(\mathbf{x})=\int_{0}^{t_{\max}}S(\mathbf{x},t)\,R(\mathbf{x},t_{\max}-t)\,\mathrm{d}t,\tag{24}
$$

where t_max is the maximum recording time, S(x, t) is the seismic wavefield, R(x, t_max − t) is the receiver wavefield, and I(x) is the image result at location x.

To eliminate the footprints of the shots near the surface and improve the accuracy of imaging, we employ the source compensation imaging condition

$$
I(\mathbf{x})=\frac{\sum_{t}S(\mathbf{x},t)\,R(\mathbf{x},t)}{\sum_{t}S^{2}(\mathbf{x},t)}.\tag{25}
$$

The source compensation imaging condition is more capable of representing the true amplitude of the reflection and of balancing the energy of the whole imaging result. At the same time, this imaging condition weakens the influence of strong source-related artefacts.
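The two imaging conditions, equations (24) and (25), can be sketched with discrete time sums as follows; the wavefield arrays are assumed already time-aligned, and the small stabilisation term `eps` in the denominator is an assumption not present in the paper.

```python
import numpy as np

def zero_lag_image(S, R):
    """Zero-lag cross-correlation imaging condition, equation (24),
    with the integral replaced by a discrete time sum. S and R are
    (nt, nx) arrays of the source and back-propagated receiver
    wavefields, assumed already aligned in time."""
    return np.sum(S * R, axis=0)

def source_compensated_image(S, R, eps=1e-12):
    """Source compensation imaging condition, equation (25). The small
    eps stabilises the division where source illumination is weak
    (a common safeguard; an assumption, not in the paper)."""
    return np.sum(S * R, axis=0) / (np.sum(S * S, axis=0) + eps)
```

Dividing by the source illumination Σ S² normalises the cross-correlation, which is why the compensated image better reflects the reflection coefficient and suppresses shot footprints.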

First, we design a three-layer velocity model, as shown in figure 3. The length of the model is 7.4375 km, and the depth

Figure 3. The three-layer velocity model; the velocities are 2.5, 3.5 and 4.5 km s−1 from top to bottom.

Figure 4. The stacked image from 20 single-shot images obtained by EFM combined with GPU-GPP and CULA-CG. K and M are computed based on equation (15).

Figure 5. The stacked image from 20 single-shot images obtained by EFM combined with GPU-GPP and CULA-CG. K and M are computed based on equation (16).

Figure 6. Marmousi velocity model.


is 2.99 km. From shallow to deep, the velocities are 2.5, 3.5 and 4.5 km s−1. We discretise the model with 596×300 nodes. For Gauss quadrature, 300×300 Gauss cells with 3×3 Gauss points in each cell are used. The recording time is 3 s, with a time step of 1 ms, and the dominant frequency of the source wavelet is 25 Hz. We conduct RTM on this model. Figures 4 and 5 show the 20-shot stacked migration results for this model by EFM using GPU-GPP and CULA-CG, in which K and M are computed based on equations (15) and (16), respectively. The zero-lag cross-correlation imaging condition is employed.

Comparing figure 4 with figure 5, we find some differences caused by the spatial derivative of the velocity for the three-layer velocity model under the zero-lag cross-correlation imaging condition. Figure 5 shows higher imaging resolution, especially in the areas close to the surface and the shallow velocity boundary, and exhibits less noise than figure 4. This means the spatial derivative of the velocity affects the final imaging results when the velocity varies strongly.

The second model tested is the Marmousi model shown in figure 6. The length of the model is 7.425 km, and the depth is 2.99 km. We discretise the model with 595×298 nodes, 300×300 Gauss cells and 3×3 Gauss points in each cell. The total recording time is 3 s, with a time step of 1 ms. The dominant frequency of the source wavelet is 25 Hz.

Figure 7. Snapshots simulated by EFM (GPU-GPP and CULA-CG) in the Marmousi model. (a) Snapshot at 0.3 s. (b) Snapshot at 0.6 s. (c) Snapshot at 0.8 s. (d) Snapshot at 1.0 s. (e) Snapshot at 1.3 s. (f) Snapshot at 1.7 s.


Figure 7 shows snapshots simulated by EFM using GPU-GPP and CULA-CG in the Marmousi model, in which K and M are computed based on equation (16). The source is located at the midpoint of the surface.

Figures 8 and 9 show the images generated by the element-free RTM using GPU-GPP and CULA-CG, in which K and M are both computed based on equation (15). We utilise the zero-lag cross-correlation imaging condition in figure 8 and the source compensation imaging condition in figure 9. Figure 8 contains many artefacts near the surface, caused theoretically by the two-way wave equation used in RTM. Figure 9 indicates that the source compensation imaging condition can improve the accuracy of imaging by eliminating the footprints of the shots near the surface. Therefore, without considering the spatial derivative of the velocity, we compare figure 8 with figure 9 to find the better imaging condition. From the imaging results, the source compensation imaging condition is the better choice, mainly because the amplitude it recovers is more closely connected with the reflection coefficient.

Figure 10 shows the image generated by the element-free RTM using GPU-GPP and LMM. The source compensation imaging condition is applied. Compared with figure 9, figure 10 shows that, despite several approximations, LMM inversion can obtain appropriately accurate imaging results.

Figure 11 shows the image generated by the element-free RTM using GPU-GPP and CULA-CG when the spatial derivative of the velocity is considered in the computation

Figure 8. The stacked image from 18 single-shot images obtained by EFM (GPU-GPP and CULA-CG) with the zero-lag cross-correlation imaging condition, in which K and M are computed based on equation (15).

Figure 9. The stacked image from 18 single-shot images obtained by EFM (GPU-GPP and CULA-CG) with the source compensation imaging condition, in which K and M are computed based on equation (15).

Figure 10. The stacked image from 18 single-shot images obtained by EFM (GPU-GPP and LMM) with the source compensation imaging condition, in which K and M are computed based on equation (15).

Figure 11. The stacked image from 18 single-shot images obtainedby EFM (GPU-GPP and CULA-CG) with the zero-lag cross-correlation imaging condition RTM, in which K and M arecomputed based on equation (16).

Figure 12. The stacked images from 18 single-shot images obtainedby the finite difference RTM.


J. Geophys. Eng. 15 (2018) 718 Z Zhou et al


of K and M and employing the zero-lag cross-correlation imaging condition. Compared with figure 8, figure 11 has higher resolution, with more detail, clearer deep imaging and fewer artefacts near the surface.

Figure 12 shows the classical eighth-order finite difference RTM result. Compared to figure 12, the method that generated figure 11 also obtains clear images of the three large faults and the deep anticline in the model. For the image of the deep anticline in particular, our method obtains better results than the finite difference RTM. When the partial derivative of velocity is considered in the computation of K and M, the imaging results of LMM and the source compensation imaging condition are poor because of the approximations they contain.

Therefore, the partial derivative of velocity can be ignored (especially when the true velocity is unknown); when it is considered, it is necessary to solve the linear equations with the CG method and to use the zero-lag cross-correlation imaging condition.
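The CG method referenced here is the classic conjugate-gradient iteration of Hestenes and Stiefel (1952) for symmetric positive definite systems such as M a = b. A minimal dense NumPy sketch is shown below for illustration only; the paper uses CULA Sparse's CG solver on the GPU with CSR-stored matrices.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10, max_iter=1000):
    """Minimal conjugate-gradient solver for a symmetric positive
    definite system A x = b. Dense NumPy stands in for the sparse
    mass/stiffness systems solved with CULA's CG in the paper."""
    x = np.zeros_like(b)
    r = b - A @ x          # initial residual
    p = r.copy()           # initial search direction
    rs = r @ r
    for _ in range(max_iter):
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = conjugate_gradient(A, b)
```

For an n-by-n SPD system, CG converges in at most n iterations in exact arithmetic, which is why it suits the large sparse systems arising at every time step.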

Table 1 shows that a great improvement in computational efficiency is achieved by our method. As clearly shown in the table, compared to CULA Sparse host (using the CPU), CULA Sparse CUDA (using the GPU) doubles the computational efficiency of the RTM. LMM inversion is the most efficient method, at the cost of approximations in accuracy.

Figure 13 compares the computational efficiency of the RTM for the different methods.

To obtain higher resolution, we discretise the Marmousi model with 5395×956 nodes, 2800×600 Gauss cells and 3×3 Gauss points in each cell. Note that we distribute 2800 Gauss cells in the horizontal direction. Such large Gauss-cell indices produce abnormal results in the computation of K and M, because with large cell indices the 3×3 order becomes deficient in the computation of the Gauss integral. To avoid these abnormal results while still using the 3×3 order, we employ the following strategy: partition the model in half in the horizontal direction and shift the right part, which has the larger Gauss-cell indices, to the left so that its indices start from zero. An overlap region is retained between the two parts, and its width must be no smaller than the diameter of the influence domain in the EFM.
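The partition-and-shift strategy can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the function name, the two-part split and the influence-domain diameter of 8 cells are all assumptions for the example.

```python
def partition_with_overlap(n_cells_x, n_parts=2, influence_diameter=8):
    """Split the horizontal Gauss-cell index range into parts whose
    overlap is no smaller than the influence-domain diameter, so that
    nodes near a cut still see every Gauss cell inside their support.
    Each part's cells are re-indexed from zero (local = global - lo),
    which is the 'shift to the left' that keeps indices small."""
    width = n_cells_x // n_parts
    parts = []
    for i in range(n_parts):
        lo = max(0, i * width - influence_diameter)
        hi = min(n_cells_x, (i + 1) * width + influence_diameter)
        parts.append((lo, hi))  # global cell range covered by this part
    return parts

# 2800 horizontal Gauss cells split in half with an 8-cell influence diameter
parts = partition_with_overlap(2800, n_parts=2, influence_diameter=8)
```

With these assumed numbers the two parts cover cells (0, 1408) and (1392, 2800), sharing a 16-cell overlap that exceeds the required diameter.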

For large velocity models, the storage of the wavefield at each time step in RTM places a significant burden on computer resources. To reduce it, we employ checkpointing storage for the wavefield. According to the time-domain sampling law, a continuous signal f(t) with frequency band F can be represented by a series of discrete samples f(t1), f(t1 + Δt), f(t1 + 2Δt), ..., provided the sampling interval satisfies Δt ≤ 1/(2F). In this paper, the dominant frequency of the source wavelet is 25 Hz and the time step is 1 ms; therefore we can store the wavefield at intervals of 12 ms.
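The sampling bound Δt ≤ 1/(2F) translates directly into a maximum number of simulation steps between stored snapshots. The helper below is an illustrative sketch (the function name is assumed); for F = 25 Hz and a 1 ms step the bound is 20 steps, so the 12 ms storage interval used in the paper sits safely inside it.

```python
def max_checkpoint_steps(f_max_hz, dt_s):
    """Largest number of simulation time steps allowed between stored
    wavefield snapshots, from the sampling requirement
    dt_store <= 1 / (2 * F)."""
    dt_store_max = 1.0 / (2.0 * f_max_hz)   # e.g. 20 ms for F = 25 Hz
    return int(round(dt_store_max / dt_s))

steps = max_checkpoint_steps(25.0, 1e-3)
```

Storing one snapshot every 12 steps instead of every step therefore cuts wavefield storage by a factor of 12 while remaining below the 20-step Nyquist limit.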

6. Conclusions

In this paper, we proposed the concept of GPP and utilised the GPU to improve the computational efficiency of the intermediate matrices. To improve the computational efficiency for large coefficient matrices, we employed the CSR format to compress the stiffness matrix and the mass matrix, and simplified the operations by solving the linear equations with the CULA Sparse CG solver. Under certain conditions, the LMM is able to obtain acceptable imaging results despite several approximations. In the GPP method, we use a local search to replace the global search in the loop over model nodes. The concept of relative coordinates is used to store the indices of the intermediate matrices, resolving the shortcomings in memory and computational efficiency.
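The CSR compression mentioned above stores only the nonzeros of K and M in three arrays: values, column indices and row pointers. The following pure-Python sketch is illustrative only (the real matrices are assembled directly in CSR on the GPU, not converted from dense form).

```python
import numpy as np

def dense_to_csr(A):
    """Build the three CSR arrays (values, column indices, row pointers)
    from a dense matrix. For the banded stiffness and mass matrices of
    the EFM this cuts storage from O(n^2) to O(nnz)."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # running count of stored nonzeros
    return np.array(values), np.array(col_idx), np.array(row_ptr)

def csr_matvec(values, col_idx, row_ptr, x):
    """Sparse matrix-vector product y = A x using the CSR arrays."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_idx[k]]
    return y

A = np.array([[4.0, 1.0, 0.0],
              [1.0, 4.0, 1.0],
              [0.0, 1.0, 4.0]])
vals, cols, rows = dense_to_csr(A)
y = csr_matvec(vals, cols, rows, np.array([1.0, 1.0, 1.0]))
```

The same three-array layout is what a CSR-based GPU solver such as the one used here consumes directly, which is why CSR is the natural interchange format for the sparse CG solve.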

Using these strategies, the problems of limited storage and time-consuming computation are solved. While the computational efficiency is greatly improved, the accuracy of modelling and imaging is not lowered. Our method can easily be applied to other similar numerical methods, such as FEM.

The arrays require quite limited memory storage, which makes the method promising for dealing with large-scale 3D problems. Moreover, the wavelet transform can be implemented for compression and storage of the coefficient matrices. Combined with space-domain parallel computation, the efficiency of this method can be increased further.

Table 1. Comparison of the computational time before and after using the improved approach.

Method                                    | Computing K and M | Modelling | RTM
The conventional EFM                      | 420 min           | 520 min   | 620 min
The GPP method (only using CPU Pardiso)   | 55 min            | 155 min   | 255 min
GPU-GPP-host (CULA Sparse host)           | 3.5 min           | 53 min    | 102 min
GPU-GPP-CUDA (CULA Sparse CUDA)           | 3.5 min           | 28.5 min  | 53.5 min
GPU-GPP-MI (using inverse MI)             | 3.5 min           | 12.5 min  | 21 min
Ratio (conventional EFM : GPU-GPP-MI)     | 120               | 28.5      | 29.5
Ratio (GPP : GPU-GPP-MI)                  | 15.7              | 12.4      | 12.1
Ratio (GPU-GPP-host : GPU-GPP-MI)         | 1                 | 4.24      | 4.86
Ratio (GPU-GPP-CUDA : GPU-GPP-MI)         | 1                 | 2.28      | 2.54


Acknowledgments

The authors gratefully acknowledge critical reviews by two anonymous reviewers. We would like to thank Bo Chen, Quanli Li, Jinyin Hu and Xiaolin Hu for their fruitful discussions. We are especially grateful to Qihua Li for providing the finite difference reverse time migration result of the Marmousi model. This study received support from the National Natural Science Foundation of China (41774121 and 41374006).

References

Belytschko T, Lu Y Y and Gu L 1994 Element-free Galerkin methods Int. J. Numer. Methods Eng. 37 229–56

Buske S et al 2009 Imaging and inversion—introduction Geophysics 74 WCA1–WCA4

Claerbout J 1971 Toward a unified theory of reflector mapping Geophysics 36 467–81

Claerbout J 1985 Imaging the Earth's Interior (Oxford: Blackwell) p 398

Cohen L D and Cohen I 1992 Finite element methods for active contour models and balloons from 2D to 3D Proc. Conf. on Computer Vision and Pattern Recognition pp 592–8

Davis J D and Chung E S 2012 SpMV: a memory-bound application on the GPU stuck between a rock and a hard place Technical Report 14 Microsoft Research Silicon Valley

Fan Z and Jia X 2013 Element-free method and its efficiency improvement in seismic modeling and reverse time migration J. Geophys. Eng. 10 025002

Fried I and Malkus D S 1975 Finite element mass matrix lumping by numerical integration with no convergence rate loss Int. J. Solids Struct. 11 461–6

Fu H, Clapp R G and Lindtjorn O 2010 Revisiting convolution and FFT on parallel computation platforms Expand. Abstr. of 80th Annu. Int. SEG Meeting pp 3071–75

Figure 13. Comparison of the computational efficiency of the RTM.


Goedel N, Warburton T and Clemens M 2009 GPU accelerated discontinuous Galerkin FEM for electromagnetic radio frequency problems IEEE Antennas and Propagation Society Int. Symp. pp 1–4

Gould N I M, Scott J A and Hu Y 2005 A numerical evaluation of sparse direct solvers for the solution of large sparse symmetric linear systems of equations ACM Trans. Math. Softw. 101236463

Hestenes M R and Stiefel E 1952 Methods of conjugate gradients for solving linear systems J. Res. Natl Bur. Stand. 49 409–36

Ichimura T, Hori M and Kuwamoto H 2007 Earthquake motion simulation with multiscale finite-element analysis on hybrid grid Bull. Seismol. Soc. Am. 97 1133–43

Jia X and Hu T 2006 Element-free precise integration method and its applications in seismic modelling and imaging Geophys. J. Int. 166 349–72

Kaelin B and Guitton A 2006 Imaging condition for reverse time migration Expand. Abstr. of 76th Annu. Int. SEG Meeting pp 2594–98

Karatarakis A, Metsis P and Papadrakakis M 2013 GPU-acceleration of stiffness matrix calculation and efficient initialization of EFG meshless methods Comput. Methods Appl. Mech. Eng. 258 63–80

Koketsu K, Fujiwara H and Ikegami Y 2004 Finite-element simulation of seismic ground motion with a voxel mesh Pure Appl. Geophys. 161 2183–98

Komatitsch D et al 2010a High-order finite element seismic wave propagation modeling with MPI on a large GPU cluster J. Comput. Phys. 229 7692–714

Liu Y, Jiao S, Wu W and De S 2008 GPU accelerated fast FEM deformation simulation Circuits and Systems (APCCAS) (Piscataway, NJ: IEEE) pp 606–9

Lu Y Y, Belytschko T and Tabbara M 1995 Element-free Galerkin method for wave propagation and dynamic fracture Comput. Methods Appl. Mech. Eng. 126 131–53

Marfurt K J 1984 Accuracy of finite-difference and finite-element modeling of the scalar and elastic wave equations Geophysics 49 533–49

Masafumi K, Toshifumi M, Hitoshi M and Yoshinori S 2009 Decomposed element-free Galerkin method compared with finite-difference method for elastic wave propagation Geophysics 74 13–25

Mullen R and Belytschko T 1982 Dispersion analysis of finite element semidiscretizations of the two-dimensional wave equation Int. J. Numer. Methods Eng. 18 11–29

Sava P 2007 Stereographic imaging condition for wave-equation migration Geophysics 72 A87–91

Sava P and Fomel S 2006 Time-shift imaging condition in seismic migration Geophysics 71 209–17

Wang G et al 2004 Lumped-mass method for the study of band structure in two-dimensional phononic crystals Phys. Rev. B 69 184302

Weiss R M and Shragge J 2013 Solving 3D anisotropic elastic wave equations on parallel GPU devices Geophysics 78 F7–15

Wu S R 2006 Lumped mass matrix in explicit finite element method for transient dynamics of elasticity Comput. Methods Appl. Mech. Eng. 195 5983–94

Xu K, Zhou B and McMechan G A 2010 Implementation of prestack reverse time migration using frequency-domain extrapolation Geophysics 75 S61–72

Zhang B et al 2010 The CUBLAS and CULA based GPU acceleration of adaptive finite element framework for bioluminescence tomography Opt. Express 18 20201–14
