research article horizon detection in seismic data: an application … · 2016-08-02 · seismic...

11
Research Article Horizon Detection in Seismic Data: An Application of Linked Feature Detection from Multiple Time Series Robert G. Aykroyd 1 and Fathi M. O. Hamed 2 1 Department of Statistics, University of Leeds, Leeds LS2 9JT, UK 2 Department of Statistics, University of Benghazi, P.O. Box 1308, Benghazi, Libya Correspondence should be addressed to Robert G. Aykroyd; [email protected] Received 6 May 2014; Revised 30 July 2014; Accepted 12 August 2014; Published 9 September 2014 Academic Editor: Shuo-Jye Wu Copyright © 2014 R. G. Aykroyd and F. M. O. Hamed. is is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. Seismic studies are a key stage in the search for large scale underground features such as water reserves, gas pockets, or oil fields. Sound waves, generated on the earth’s surface, travel through the ground before being partially reflected at interfaces between regions with high contrast in acoustic properties such as between liquid and solid. Aſter returning to the surface, the reflected signals are recorded by acoustic sensors. Importantly, reflections from different depths return at different times, and hence the data contain depth information as well as position. A strong reflecting interface, called a horizon, indicates a stratigraphic boundary between two different regions, and it is the location of these horizons which is of key importance. is paper proposes a simple approach for the automatic identification of horizons, which avoids computationally complex and time consuming 3D reconstruction. e new approach combines nonparametric smoothing and classification techniques which are applied directly to the seismic data, with novel graphical representations of the intermediate steps introduced. For each sensor position, potential horizon locations are identified along the corresponding time-series traces. ese candidate locations are then examined across all traces and when consistent patterns occur the points are linked together to form coherent horizons. 1. Introduction ere are many applications with multiple time-series data recorded at high frequency, such as environmental science, geophysics, financial trading, and internet marketing. Oſten the aim of the analysis is to identify events which occur across many data series but not necessarily at the same time. Rapid acquisition means that a full analysis may be impractical and so simple yet reliable methods are needed. An exemplary application is geophysical surveying where large datasets are obtained but only limited questions need answering, such as the following: is there an oil field? And if so, what is its location? In such cases, there is no need to perform a full 3D reconstruction of the study area and then interpret the reconstruction. Instead, it is possible to produce an answer directly from the data. A seismic study will form motivation for the proposed approach, but the method could be equally applied to other situations where the aim is to identify coherent features across multiple time series. A seismic dataset consists of records of reflected signals measured by a number of seismic sensors placed at the nodes of a lattice on the earth surface. Figure 1 presents a diagram of the forming of reflected seismic signals. For details and a review of seismology, see, for example, Shearer [1], Sheriff and Geldart [2], Waters [3], and Hatton et al. [4]. e data can be treated as adjacent time series, called traces, indicating the arrival time of artificially generated sound waves reflected by underground features. An incident sound wave is reflected primarily by boundaries between layers with differing physical properties. Strong reflections are named horizons and indicate boundaries between distinct layers, or strata. e main purpose of the analysis is to identify geometric features automatically, which represent strong reflections, and describe the horizons. e tracking of horizons is not easy in many cases, for example, when dealing with noisy measurements or across faults. A fault is a vertical crack in the rock which is recognized in seismic data by discontinuities in Hindawi Publishing Corporation Advances in Statistics Volume 2014, Article ID 548070, 10 pages http://dx.doi.org/10.1155/2014/548070

Upload: others

Post on 07-Jul-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Research Article Horizon Detection in Seismic Data: An Application … · 2016-08-02 · Seismic studies are a key stage in the search for large scale underground features such as

Research ArticleHorizon Detection in Seismic Data An Application of LinkedFeature Detection from Multiple Time Series

Robert G Aykroyd1 and Fathi M O Hamed2

1 Department of Statistics University of Leeds Leeds LS2 9JT UK2Department of Statistics University of Benghazi PO Box 1308 Benghazi Libya

Correspondence should be addressed to Robert G Aykroyd rgaykroydleedsacuk

Received 6 May 2014 Revised 30 July 2014 Accepted 12 August 2014 Published 9 September 2014

Academic Editor Shuo-Jye Wu

Copyright copy 2014 R G Aykroyd and F M O Hamed This is an open access article distributed under the Creative CommonsAttribution License which permits unrestricted use distribution and reproduction in any medium provided the original work isproperly cited

Seismic studies are a key stage in the search for large scale underground features such as water reserves gas pockets or oil fieldsSound waves generated on the earthrsquos surface travel through the ground before being partially reflected at interfaces betweenregionswith high contrast in acoustic properties such as between liquid and solid After returning to the surface the reflected signalsare recorded by acoustic sensors Importantly reflections from different depths return at different times and hence the data containdepth information as well as position A strong reflecting interface called a horizon indicates a stratigraphic boundary betweentwo different regions and it is the location of these horizons which is of key importance This paper proposes a simple approachfor the automatic identification of horizons which avoids computationally complex and time consuming 3D reconstruction Thenew approach combines nonparametric smoothing and classification techniques which are applied directly to the seismic datawith novel graphical representations of the intermediate steps introduced For each sensor position potential horizon locationsare identified along the corresponding time-series traces These candidate locations are then examined across all traces and whenconsistent patterns occur the points are linked together to form coherent horizons

1 Introduction

There are many applications with multiple time-series datarecorded at high frequency such as environmental sciencegeophysics financial trading and internet marketing Oftenthe aim of the analysis is to identify events which occuracross many data series but not necessarily at the sametime Rapid acquisition means that a full analysis may beimpractical and so simple yet reliable methods are neededAn exemplary application is geophysical surveying wherelarge datasets are obtained but only limited questions needanswering such as the following is there an oil field Andif so what is its location In such cases there is no need toperform a full 3D reconstruction of the study area and theninterpret the reconstruction Instead it is possible to producean answer directly from the data A seismic study will formmotivation for the proposed approach but the method couldbe equally applied to other situations where the aim is toidentify coherent features across multiple time series

A seismic dataset consists of records of reflected signalsmeasured by a number of seismic sensors placed at thenodes of a lattice on the earth surface Figure 1 presents adiagramof the forming of reflected seismic signals For detailsand a review of seismology see for example Shearer [1]Sheriff and Geldart [2] Waters [3] and Hatton et al [4]The data can be treated as adjacent time series called tracesindicating the arrival time of artificially generated soundwaves reflected by underground features An incident soundwave is reflected primarily by boundaries between layers withdiffering physical properties Strong reflections are namedhorizons and indicate boundaries between distinct layers orstrata

The main purpose of the analysis is to identify geometricfeatures automatically which represent strong reflectionsand describe the horizons The tracking of horizons is noteasy in many cases for example when dealing with noisymeasurements or across faults A fault is a vertical crack in therock which is recognized in seismic data by discontinuities in

Hindawi Publishing CorporationAdvances in StatisticsVolume 2014 Article ID 548070 10 pageshttpdxdoiorg1011552014548070

2 Advances in Statistics

seismic pulseSource of the

Underground reflecting surfaces

Seismic sensors

lowast

Figure 1 Diagram showing the formation of the reflected seismicsignals

the horizonThe location of faults is an important task for thestructural interpretation of seismic data

There are several approaches and many contributionsthat have been made to solve seismological problems Oneapproach is to perform a full 3D reconstruction of the sub-surface from the surface measurementsThis type of problemis well known and is referred to as an ill-posed inverseproblem which often requires substantial regularization Ageneral background to inverse problems can be found inRibes and Schmitt [5] and in a mathematical discussionin Stuart [6] Even in 2D the reconstruction process isusually mathematically and computationally challengingand performing a full 3D reconstruction for a large scaleseismic problem (possibly involving terabytes of data) maybe impractical and ineffectivemdashthis is in contrast to othergeophysical data collection methods (see eg the reviewpaper of [7]) Also the reconstructed distribution wouldstill need postprocessing to identify features and a majordrawback of this two-stage approach is that the regularizationrequired in the first step may mask the features of interestInstead other types of analysis have beenmorewidely studiedlooking to produce surface maps to indicate what is beneaththe surface Lendzionowski et al [8] and van der Baan andPaul [9] use pattern recognition methods in seismologyIn particular they compare the traces against a library ofpatterns from known features with the aim of identifyingsimilar features The key to success is the construction ofa varied library as otherwise features of interest could beignored Walden [10] uses simple summary measures of thetraces followed by projection pursuit and thresholding toclassify the traces into homogeneous groups This howeverwas performed on the entire trace and hence no depthinformation is retained Also there is no spatial smoothingmeaning that some locations can be classified differently toneighbouring locations simply due to the effects of noise

The rest of the paper is organized as follows In Section 2a model is presented which describes the full multistageprocess leading to a seismic dataset This model will be used

to generate synthetic data but it will not be used for dataanalysis Instead the method of data analysis is described inSection 3 along with examples Finally there is a conclusionsrsquosection followed by references

2 A Model for Seismic Data

In this section the standard model for seismic data isdescribed In this paper the model has been used for datageneration but it is not used in the proposed method ofanalysis The use of a realistic seismic model will mean thatsynthetic data mimic real data If the aim of an analysisis full 3D subsurface reconstruction then such a modelwould also be used in analysis however in this work theproposed method detailed in the next section interprets thedata directly

Suppose that acoustic sensors are placed on the earthrsquossurface at 119899 distinct locations forming a regular grid acrosssome study regionsAt time 119905

0= 0 a seismic pulse is generated

and data recording starts The energy from the pulse spreadsthrough the ground with some proportion being reflectedback towards the surface The sensors record this reflectedenergy at 119899 equally spaced time points T = 119905

119894 119894 =

1 2 119899 spread over several seconds before data collectionstops Hence at each location a time series y = 119910

119894 119894 =

1 2 119899 is recorded Let the 119899 sensor locations be denotedby S = s

119895 119895 = 1 2 119898 Collecting the multiple time

series together produces the full seismic dataset Y = 119910119894119895

119894 = 1 2 119899 119895 = 1 2 119898 where 119910119894119895

represents thereflected energy measured at time 119905

119894and location s

119895

The amount of reflected energy depends on the acousticimpedance distribution of the subsurface as well as proper-ties of the initial seismic pulse A common model used inreflection seismology is that of a stack of horizontal layers(see [4 11]) Suppose that there are 119870 distinct layers with thebottom of the layers at depthsD = 119889

119896 119896 = 1 2 119870minus1mdash

note that the lowest layer is assumed to stretch to infinity andhence there is no value for 119889

119870mdashand that the corresponding

acoustic impedances are W = 120596119896 119896 = 1 2 119870 In

turn the acoustic impedance is determined by the product ofthe density 120588 and the acoustic velocity V of the material inthe layer Typical values for density (in gcm3) and acousticvelocity (kmsec) are water 120588 = 10 V = 14 oil 120588 = 08V = 13 and rock 120588 = 15ndash30 V = 30ndash60 This meansthat it is possible for two materials to have different densityand acoustic velocity but to have indistinguishable acousticimpedance values

Where two layers touch there is an acoustic impedancecontrast described by the reflection coefficientThis is definedas the difference between the acoustic impedances of the twolayers divided by their sum So the reflection coefficient 120574

119896

at the bottom of layer 119896 is given by

120574119896=

120596119896+1

minus 120596119896

120596119896+1

+ 120596119896

119896 = 1 2 119870 minus 1 (1)

In a regionwithmultiple layers therewill be several reflectingsurfaces at different depths A reflection coefficient seriesor spike series records the depth and value of the various

Advances in Statistics 3

00

02

04

06

08

10

000 002 004minus004 minus002

(a)

00

02

04

06

08

10

000 002 004minus004 minus002

(b)

00

02

04

06

08

10

000 002 004minus004 minus002

(c)

Figure 2 Ricker wavelet of varying dominant frequency (a) 25Hz (b) 50Hz (c) 100Hz

reflection coefficients and can be written as 120574 = 120574119896 119896 =

1 2 119870 minus 1 at depths D = 119889119896 119896 = 1 2 119870 minus 1

This series therefore encodes the information about thehorizons in the sense that the arrival time of a reflectionis directly proportional to the location of the spike 119889

119896 and

the amplitude of the spike represents the strength of thereflection 120574

119896

The model of recorded seismic data is completed by theconvolution of the reflection coefficient series with a transferfunction which is often taken as the Ricker wavelet [12] TheRicker wavelet is defined as the scaled second derivative ofthe Gaussian function written as

119908 (119889 119905) = (1 minus 212058721198912

119877(119905 minus

119889

V)

2

) 119890minus12058721198912

119877(119905minus119889V)2

(2)

where 119889 is the depth of the reflecting surface 119905 is the timedelay for the reflection to return to the surface 119891

119877is the

dominant frequency of the sound wave and V is the averagevelocity of the sound wave along its path The transferfunction relates the time taken for the energy pulse to covera particular distance Examples of the Ricker wavelet forthree different dominant frequencies are shown in Figure 2The Ricker wavelet is wide for low frequencies and morecompact for higher dominant frequencies Although thismeans that higher frequency soundwaves have the advantageof greater time resolution suchwaves have the overwhelmingdisadvantage that they penetrate less deeply into the groundThen the discrete convolution model can be written as

119891 (119905119894) =

119870minus1

sum

119896=1

119908 (119889119896 119905119894) times 120574119896 119894 = 1 2 119899 (3)

and including the effects of measurement error gives the finaldata equation

119910119894= 119891 (119905

119894) + 120598119894 119894 = 1 2 119899 (4)

where 120598119894 119894 = 1 2 119899 are independent measurement

errors modelled by a Gaussian distribution Figure 3 showsa diagrammatic representation of the formation of a seismicsignal which results from the convolution of the reflectioncoefficient and the Ricker wavelet The horizon locationscan be seen as spikes in the reflection coefficient series

Noise-free traceRicker waveletReflection coefficient

=times

(a) (b) (c)

Figure 3 Diagram illustrating data production (a) reflectioncoefficients (b) Ricker wavelet (c) seismic trace

corresponding to contrasting acoustic impedance valuesTheconvolution of this series with the Ricker wavelet leads to ablurred version of the reflection coefficient series In practicethe reflection coefficient series will be contaminated byminoracoustic impedance contrasts due to natural variation and byrandom noise describing other sources of error

Now consider an example of the simulation of a syntheticdataset using the above model Consider a sedimentarygeological regionwhich consists of several rock layers beddedhorizontally upon each other with each layer having a char-acteristic but naturally varying acoustic impedance Thislatter point will create weak reflections due to low-contrastchanges in acoustic impedance within a layer creating clutterwhich may partially obscure the true horizon locations Thiswill have the effect of contaminating the reflection coefficientseries which will be transformed into correlated errors in theseismic tracemdashin addition to the independent measurementerrors Further suppose there is a fault causing one part ofthe region to be deeper relative to the surrounding areaAs a particular example Figure 4 shows the seismogram oftraces along a transectmdashin these traces moderate naturalvariation and measurement noise have been included Thedataset consists of 21 traces each having 361 readings at equaltime intervals Three horizons can easily be identified onecurving up towards the top-right and the other two almosthorizontal in the bottomhalf of the pictureThe discontinuityin the horizons is caused by the presence of a fault towards

4 Advances in Statistics

Figure 4 Seismogram showing multiple traces in a syntheticdataset

the right-hand side There is also a gap in the lowest horizontowards the left-hand side

3 Method of Analysis

31 General Strategy The method proposed here takes asits starting point the seismogram Although this may havealready been preprocessed it is assumed that this involvedlittle or no smoothing Hence the method starts by consider-ing smoothing to remove the effects of noise before movingon to identify key features within the data

The overall objective of the analysis is to identify thestratigraphic structure in the seismogram that is long linearfeatures which are called horizons These represent lines ofhigh acoustic impedance contrast which are associated withimportant changes in subsurface structure In the presenceof low noise the points along a strong contrast horizoncan easily be identified as turning points in the individualtracesmdashsee Figure 4 Even in this case however as thenumber of horizons is unknown it might not be obvioushow many turning points to take Also when the contrast islower relative to the noise level identification will be furthercomplicated as turning points caused by the noise and featuremight sometimes have similar signal values

In the proposed method first each time series trace isconsidered separately and candidate horizon locations areidentified as local maxima and minima after smoothingThen these candidate points are classified into groups wheremembers of each class have similar signals Again noisemay mean that some points are incorrectly classified Theidentified classes are defined as structural points Finallyselected points in each class are linked to identify thehorizons The diagram in Figure 5 illustrates the steps of theanalysis strategy

32 Extracting Candidate Point Locations Figure 6(a) showsa single trace from themiddle of the earlier seismogramwith

local minima and maxima shown as open and filled circlesrespectivelyThe sets of the locations of the local maxima andminima are simply determined as

119871max0

= 119905119894 119910119894minus 119910119894minus1

gt 0 119910119894+1

minus 119910119894lt 0

119871min0

= 119905119894 119910119894minus 119910119894minus1

lt 0 119910119894+1

minus 119910119894gt 0

(5)

In total there are 61 turning points and although the threehorizon locations can easily be seen the remaining 58 arecaused by the noise In situations where there are highernoise levels it will not be so easy to separate the extremepoints created by horizons from those due to noise This willmean that some horizons are lost and that spurious pointscontaminate the true locations To reduce the influence ofnoise the traces are smoothed

For a given trace consider the 119899 paired data points(119905119894 119910119894) 119894 = 1 119899 related by an unknown nonlinear

relationship 119891(sdot) with additive noise that is

119910119894= 119891 (119905

119894) + 120598119894 119894 = 1 119899 (6)

where the noise 120598119894 119894 = 1 119899 are independent and

identically distributed Gaussian random variables with zeromean and constant variance 1205902

To reduce the effects of noise the unknown functioncan be estimated using smoothing splines (see eg [13 14])which minimizes the objective function

119869 (120582) =

1

119899

119899

sum

119894=1

(119910119894minus 119891(119905119894))2+ 120582int (119891

10158401015840(119905))

2

119889119905 (7)

subject to the constraint that the fitted curve is twice differ-entiable That is

119891120582(119905) = argmin

119891

1

119899

119899

sum

119894=1

(119910119894minus 119891(119905119894))2+ 120582int (119891

10158401015840(119905))

2

119889119905

(8)

In these the integral is over the interval [119905(1) 119905(119899)] that is the

minimum and maximum of the data collection times Thefirst term a residual mean-square measures the discrepancybetween the data and the fitted function and the secondan integrated squared curvature measures the roughnessof the fitted function and 120582 is a smoothing or bandwidthparameter To make the residuals zero the fitted functionmust interpolate the data In general this will lead to aquickly varying function which will have a high curvatureIn contrast to make the curvature zero the fitted func-tion must be linear which is unlikely to fit the data wellmaking the residuals large The smoothing parameter 120582controls the balance between these two measures Figures6(b) and 6(c) show the results of spline smoothing (usingthe R function smoothspline [15]) for small andmoderatevalues of 120582 respectively In each the major structure is stillevident and some undesirable points are removed Howevercare is needed to avoid oversmoothingwhichmight eliminateimportant features In the extreme however even the mainstructural features would be lost

Advances in Statistics 5

Smoothing HorizonsLink

structural pointscandidate pointsClassify the Extract

candidate pointsdataInput

Figure 5 Diagram of the data analysis strategy

(a) (b) (c)

Figure 6 A seismic trace after increasing levels of smoothing (a)low (b) moderate and (c) high smoothing Also shown are thelocations of local minima and maxima

Figures 6(b) and 6(c) also show the locations of the localturning points of the smoothed function estimates which canbe defined as

119871max120582

= 119905119894119891120582(119905119894) minus

119891120582(119905119894minus1) gt 0

119891120582(119905119894+1) minus

119891120582(119905119894) lt 0

119871min120582

= 119905119894119891120582(119905119894) minus

119891120582(119905119894minus1) lt 0

119891120582(119905119894+1) minus

119891120582(119905119894) gt 0

(9)

Here and later a scale-space approach (see eg [16 17]) willbe considered The idea is that one should not try to focuson a single smoothing level but consider several levels simul-taneously with the aim of identifying any dominant scalesFor any value of the smoothing parameter 120582 minimizing (7)will yield a fitted function 119891

120582 and hence this can be repeated

for a range of values of the smoothing parameter Figure 7shows a set of smoothed traces As the level of smoothingincreases from left to right the resulting smoothed tracesbecome smoother and smoother until they are a straight lineon the far right The best smoothed trace will be somewherewithin this set

Recall that the aim here is not to produce a ldquonicelysmoothed tracerdquo but instead to accurately locate the turningpoints which correspond to the horizons Hence the choiceof smoothing parameter focuses on this output rather thanon a more usual global goodness-of-fit measure To helpwith the analysis a novel turning-point tree is proposedwith an example shown in Figure 8 where the locations of

Figure 7 A single seismic trace after increasing levels of smoothing

all minima and maxima are shown as the smoothing levelchanges That is the tree shows a graphical representationof the sets 119871max

120582and 119871

min120582

as functions of 120582 The bottomedge of Figure 8 corresponds to no smoothing and everyturning point in the data appears As the smoothing increasesmoving upwardswithin the graph the turning point locationschange and sometimes turning points disappear Note thatas the smoothing increases pairs of neighbouring maximaand minima move towards each other before disappearingGradually the number of turning points decreases until at thetop of the graph only the three principal turning points areseen and eventually even these will disappear The horizontaldotted lines correspond to the low moderate and highsmoothing situations shown in Figure 6Overall it seems thatturning points associatedwith true horizons remain for a verywide range of levels of smoothing and hence themethod doesnot seem to be too sensitive to the exact choice

To summarize the threshold tree in Figure 8 considerthe number of turning points 119873

120582 for each value of the

smoothing parameter as shown in Figure 9(a) For smallvalues of 120582 there is little smoothing and hence there aremany turning points but as 120582 increases the number ofturning points reduces Turning points due to noise shouldbe removed quickly but true turning points due to thehorizons should remain To select an appropriate value of 120582it is proposed to use the value corresponding to the greatestchange in the number of turning points This derivative isshown in Figure 9(b) with the maximum (absolute) valueshown as a vertical line at about 120582lowast = 062 Figure 6(b)shows a smoothed trace using the corresponding smoothingparameter value Figure 10(a) shows the full dataset after

6 Advances in Statistics

Figure 8 Threshold tree showing the location of the turning points (blue minima red maxima) for varying levels of smoothing

00 02 04 06 08 10 12Smoothing

Num

ber o

f tur

ning

poi

nts

0

10

20

30

40

(a)

minus140

minus100

minus60

minus20

Der

ivat

ive

00 02 04 06 08 10 12Smoothing

(b)

Figure 9 Graphs of (a) the number of turning points as a functionof bandwidth and (b) the corresponding derivative and point ofmaximum absolute derivative

smoothing using the automatically chosen bandwidth andFigure 10(b) shows the corresponding minima and maximaas open and filled circles It is worth noting that this haslocated many more candidate points than the number of trueturning pointsThis is important as at this stage it is vital thattrue horizon locations are not lostThe next stage will identifywhich candidate points form coherent features

33 Candidate Point Classification and Structural Point Link-ing The second stage in the procedure is to classify thecandidate points into groups where members of a groupform a single horizon It is assumed that points along ahorizon will have a distinct acoustic impedance contrastgiving rise to a distinct reflectance coefficient and hence adistinct measured signal Since each horizon is expected tohave a different characteristic acoustic impedance contrastthe measured signals will be a mixture of an unknownnumber of distributions each of an unknown shape Alsothe various sources of systematic and random variability maymean that measured signals from one horizon vary and maybe similar to those from other horizons It is impossibleto obtain a parametric form for this distribution and so akernel density approach is used to provide a nonparametricsmoothed version of the signal strength distribution of thecandidate points For background to kernel density meth-ods see for example Wand and Jones [18] and Silverman[19]

Suppose that 120578 = 119873120582lowast turning points have been found

that their signal strengths are denoted by 119906119894 119894 = 1

120578 and that their locations along the traces are V119894 119894 =

1 120578 then the kernel density estimate of the signalstrength distribution is

119892120575(119906) =

120578

sum

119894=1

119870(

119906119895minus 119906

120575

) minusinfin lt 119906 lt infin (10)

Advances in Statistics 7

(a) (b)

Figure 10 Multiple smoothed traces in (a) with corresponding candidate points in (b) where open and filled circles represent minima andmaxima respectively

(a) (b) (c)

Figure 11 The curves represent kernel density with (a) low smoothing (b) moderate smoothing (c) high smoothing

where 119870(sdot) is the kernel function and 120575 is a bandwidthparameter A popular choice of kernel function is the stan-dard Gaussian density Despite the method being generallysuccessful at finding structure kernel density estimationcan also ignore significant structure via oversmoothing orretain insignificant structure via undersmoothing and hencethe correct choice of 120575 is important Figure 11 shows asimple bimodal dataset after varying levels of smoothingusing the R function density [15] We might guess that (a)is undersmoothed retaining unimportant random variationand that (c) is oversmoothed leaving little useful informationThe central graph (b) uses the default bandwidth based onSilvermanrsquos rule of thumb [19] which seems to summarize thedistribution well

The number of modes in the kernel density estimate isregarded as the number of distinct classes and hence thenumber of horizons The locations of the minima betweenthe modes can be taken as thresholds which can then be usedto classify the candidate points Considering the three levelsof smoothing used in Figure 11 would however lead to verydifferent results Rather than attempting to estimate a singlelevel of smoothing again a scale-space approach will be used(again see eg [16 17]) In this approach many levels of

smoothing are considered allowing features at different scalesto appear and allowing a natural scale to be identified

Let the set of locations of the local minima which are tobe used as threshold be defined as

120591120575= 119866

min120575

= 119906119894 119892120575(119906119894) minus 119892120575(119906119894minus1) lt 0 119892

120575(119906119894+1) minus 119892120575(119906119894gt 0)

(11)

Suppose that 119870 thresholds are found in this way hencedefining 119870 + 1 groups For a particular bandwidth 120575 let thethresholds be denoted by 120591

120575= 120591119896 119896 = 1 2 119870 which

allows the members of the classes to be defined as

119862119896

120575=

119906119894 119906119894lt 1205911 for 119896 = 1

119906119894 120591119896minus1

lt 119906119894lt 120591119896 for 2 le 119896 le 119870

119906119894 119906119894gt 120591119870 for 119896 = 119870 + 1

(12)

and the number in each class as119873119896120575for 119896 = 1 2 119870

Figure 12(a) shows the relation between the locations ofthe thresholds and the level of smoothing defined in terms

8 Advances in Statistics

000 002 004 006 008Bandwidth

minus06

minus04

minus020Thre

shol

d

00

02

04

(a)

Den

sity

0

1

2

3

4

5

Signal strengthminus08 minus04 00 02 04

(b) (c)

Figure 12 (a) Threshold tree (b) histogram of kernel density and (c) turning point locations

of the bandwidth 120575 The change in the bandwidth leads toa change in the number and value of the thresholds andhence in the number of classes As the bandwidth increasesthe number of thresholds decreases and when the curvescombine into a single line there is only a single thresholdwhich means in turn that the number of groups is twoThe procedure is stopped when the smoothed distributionbecomes unimodal In the proposed method the best clas-sification is defined to be when the number of thresholds isstable for the longest

In this example three thresholds persist the longest abandwidth interval indicated by the vertical dotted linesand hence suggests four clusters The optimum smoothingbandwidth 120575lowast is then taken as the smallest value in thisintervalThe corresponding kernel density estimate is plottedin Figure 12(b) the vertical dotted lines show the position ofthe thresholds It is now important to realise that the clustersurrounding signal strength zero includes turning pointsgenerated by the random noise and should be excluded Thiswill be considered now before moving to the final linkingstep

The observed signal strength distribution will be a mix-ture of background values and values due to true horizonsOf course the various means and the variance of thesecomponents are unknown To describe the distribution of thesignal values from the background the following approachis proposed The centre of the background signal strengthdistribution is estimated using the median of the full data

120583 = median119895

119906119895 (13)

and the variance estimate is based on the median absolutedeviation (MAD)

2=

MADΦminus1(34)

with MAD = median119895

10038161003816100381610038161003816119906119895minus 120583

10038161003816100381610038161003816 (14)

where the value 1Φminus1(34) asymp 148 is chosen so that whenthe values follow a Gaussian distribution then 119864[

2] =

1205902 These estimators are chosen to be robust to outliers

caused by the signal strengths from the horizons The solidhorizontal line near the bottom of Figure 12(b) representsa 95 confidence interval for background signal strengthThe calculation assumes a Gaussian distribution and uses theabovemean and variance estimatesThis interval encloses themajority of this zero-centred mode and confirms that it isreasonable to exclude the whole group Before moving onit is worth noting that for datasets containing no horizonsall turning points will form a single group which will beidentified as a background group and hence the procedurewould not identify any horizons

Figure 12(c) shows all the minima and maxima as openand filled circles respectively The grey circles correspond tothe background group and the black circles to the horizongroupsThe algorithm achieves good results in extracting andclassifying the structural points which make the horizonsbut notice that although the horizons can be clearly seenthere are three extra points near the top and one missingalong the central horizon Also although not visible thereare six points misclassified to the wrong group This meansthat linking would not be perfect with some crossing of theproposed horizons This suggests that further criteria mustbe included in this final linking process As a simple examplehere isolated structural point will be excluded from thelinking In particular if the distance to the nearest other pointin the group is greater than say twice the average within-group interpoint distance then is excluded

Let the ordered set of all locations along the traces ofmember of class 119896 be denoted by V

119894 1 2 119873

119896

120575lowast Then

consider the median distance between neighbouring pointsin the class

119889119896= median

119894

1003816100381610038161003816V119894+1

minus V119894

1003816100381610038161003816 (15)

and the distance from a point to the nearest other point in theclass

119898119896

119894= min119895

10038161003816100381610038161003816V119894minus V119895

10038161003816100381610038161003816 (16)

Advances in Statistics 9

Figure 13 Linked structural points with modified method

This allows the following modified rule for class membershipto be defined

119862119896

120575lowast =

V119894 V119894lt 1205911 119898119896

119894lt 2 times 119889

119896

for 119896 = 1

V119894 120591119896minus1

lt V119894lt 120591119896 119898119896

119894lt 2 times 119889

119896

for 2 le 119896 le 119870

V119894 V119894gt 120591119870 119898119896

119894lt 2 times 119889

119896

for 119896 = 119870 + 1

(17)

Applying this final rule to the previous example producesthree groups of structural points forming clear and coherenthorizons as shown in Figure 13 Although not shown hereit has also been applied to examples with higher noise levelswith equal success

4 Conclusions

Seismic surveys play an important role in locating under-ground features but the partial reflection and natural varia-tion in acoustic properties mean that the recorded acousticsignals provide a confusing representation of the studyarea Full 3D datasets can be enormous covering severalsquare kilometres on the surface and representing severalkilometres in depth with high frequency temporal samplingThis suggests that rapid analysis is needed even if interestingexamples require further analysis or examination by expertsThis also suggests that simple techniques are preferablewith full reconstruction reserved for follow-up analysis Rawdata contain systematic distortions and significant noiseand typically they undergo substantial preprocessing beforeanalysis is started The preprocessing aims at reducing thedistortion and reducing noise but can introduce artefacts

The approach proposed here is based on simple statisticalprocedures which aim at smoothing out noise and usingcontextual information to identify coherent horizons In

particular smoothing splines are used to smooth individualtraces before localminima andmaxima are located It appearsthat greater smoothing can be applied to the traces withoutoverwhelming the major features than would be the caseif the aim was to estimate the signal Once a smaller set ofcandidate points is found they can be classified into groupswhere each group represents points along a horizon Clas-sification is performed by thresholding the signal strengthdistribution so that each mode represents a class Againsmoothing is needed to help select the correct number ofmodes and hence classes in the distributionThis time kerneldensity methods are used Here the choice of bandwidth ismuch more important as even small changes can lead to asubstantial change in the smoothed density A scale-spaceapproach is adopted which first allows all levels of smoothingto be considered The best level of smoothing and hencethe number of clusters and the number of horizons is thendetermined as the value for which the number of clusterspersists the longest The cluster associated with the randomvariation in the background is identified and excluded fromhorizon linkage Locationswithin each cluster are then linkedto form horizons In some high noise examples furtherinformation is used in this linking stage to exclude pointswhich are distant from the main cluster and hence unlikelyto be part of the horizon This does not prevent howeverhorizons from being linked across gaps and across faultsClearly it would be possible to supplement this fine tuningwith other criteria as necessary Based on the experimentsconducted here the proposed method shows great promiseIt provides a simple and quick approach to locate and linkhorizons even in relatively noisy datasets It is not affected bygaps in horizons nor by discontinuities caused by faults

Finally it is important to note that the proposed methodis not specific to seismic studies but can easily be appliedto other problems where the aim is to identify coherentstructure within 2D or 3D data Also the use of smoothingand threshold trees gives a simple graphical tool for anyclassification problem and the use of the most persistentthresholding is applicable to many other situations

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors are grateful to the three anonymous referees fortheir helpful comments which have substantially improvedthis paper

References

[1] PM Shearer Introduction to Seismology Cambridge UniversityPress Cambridge UK 1999

[2] R E Sheriff and L P Geldart Exploration Seismology Cam-bridge University Press Cambridge Mass USA 2nd edition1995

10 Advances in Statistics

[3] K H Waters Reflection Seismology John Wiley amp Sons NewYork NY USA 1978

[4] L Hatton M Worthington and J Makin Seismic Data Pro-cessing Theory and Practice Blackwell Scientific PublicationsLondon UK 1986

[5] A Ribes and F Schmitt ldquoLinear inverse problems in imagingan introductory surveyrdquo IEEE Signal Processing Magazine vol25 no 4 pp 84ndash99 2008

[6] A M Stuart ldquoInverse problems a Bayesian perspectiverdquo ActaNumerica vol 19 pp 451ndash559 2010

[7] W W Symes ldquoThe seismic reflection inverse problemrdquo InverseProblems vol 25 no 12 Article ID 123008 39 pages 2009

[8] V Lendzionowski A TWalden andR EWhite ldquoSeismic char-actermapping over reservoir intervalsrdquoGeophysical Prospectingvol 38 no 8 pp 951ndash969 1990

[9] M van der Baan and A Paul ldquoRecognition and reconstructionof coherent energy with application to deep seismic reflectiondatardquo Geophysics vol 65 no 2 pp 656ndash667 2000

[10] A T Walden ldquoSpatial clustering using simple summaries ofdata to find the edge of an oil-fieldrdquo Applied Statistics vol 43pp 385ndash395 1994

[11] D W Oldenburg T Scheuer and S Levy ldquoRecovery of theacoustic impedance from reflection seismogramsrdquo Geophysicsvol 48 no 10 pp 1318ndash1337 1983

[12] N Ricker ldquoThe form and nature of seismic waves and thestructure of seismogramsrdquoGeophysics vol 5 pp 348ndash366 1940

[13] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[14] P J Green and B W Silverman Nonparametric Regressionand Generalized Linear Models A Roughness Penalty ApproachChapman and Hall 1994

[15] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2013 httpwwwR-projectorg

[16] T Lindeberg ldquoScale-space theory a basic tool for analyzingstructures at different scalesrdquo Journal of Applied Statistics vol21 pp 225ndash270 1994

[17] A P Witkin ldquoScale-space filteringrdquo in Proceedings of 8thInternational Joint Conference on Artificial Intelligence pp 1019ndash1022 Karlsruhe Germany 1983

[18] M P Wand and M C Jones Kernel Smoothing Chapman ampHallCRC 1995

[19] B SilvermanDensity Estimation for Statistics andDataAnalysisChapman amp HallCRC 1998

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 2: Research Article Horizon Detection in Seismic Data: An Application … · 2016-08-02 · Seismic studies are a key stage in the search for large scale underground features such as

2 Advances in Statistics

seismic pulseSource of the

Underground reflecting surfaces

Seismic sensors

lowast

Figure 1 Diagram showing the formation of the reflected seismicsignals

the horizonThe location of faults is an important task for thestructural interpretation of seismic data

There are several approaches and many contributionsthat have been made to solve seismological problems Oneapproach is to perform a full 3D reconstruction of the sub-surface from the surface measurementsThis type of problemis well known and is referred to as an ill-posed inverseproblem which often requires substantial regularization Ageneral background to inverse problems can be found inRibes and Schmitt [5] and in a mathematical discussionin Stuart [6] Even in 2D the reconstruction process isusually mathematically and computationally challengingand performing a full 3D reconstruction for a large scaleseismic problem (possibly involving terabytes of data) maybe impractical and ineffectivemdashthis is in contrast to othergeophysical data collection methods (see eg the reviewpaper of [7]) Also the reconstructed distribution wouldstill need postprocessing to identify features and a majordrawback of this two-stage approach is that the regularizationrequired in the first step may mask the features of interestInstead other types of analysis have beenmorewidely studiedlooking to produce surface maps to indicate what is beneaththe surface Lendzionowski et al [8] and van der Baan andPaul [9] use pattern recognition methods in seismologyIn particular they compare the traces against a library ofpatterns from known features with the aim of identifyingsimilar features The key to success is the construction ofa varied library as otherwise features of interest could beignored Walden [10] uses simple summary measures of thetraces followed by projection pursuit and thresholding toclassify the traces into homogeneous groups This howeverwas performed on the entire trace and hence no depthinformation is retained Also there is no spatial smoothingmeaning that some locations can be classified differently toneighbouring locations simply due to the effects of noise

The rest of the paper is organized as follows In Section 2a model is presented which describes the full multistageprocess leading to a seismic dataset This model will be used

to generate synthetic data but it will not be used for dataanalysis Instead the method of data analysis is described inSection 3 along with examples Finally there is a conclusionsrsquosection followed by references

2 A Model for Seismic Data

In this section the standard model for seismic data isdescribed In this paper the model has been used for datageneration but it is not used in the proposed method ofanalysis The use of a realistic seismic model will mean thatsynthetic data mimic real data If the aim of an analysisis full 3D subsurface reconstruction then such a modelwould also be used in analysis however in this work theproposed method detailed in the next section interprets thedata directly

Suppose that acoustic sensors are placed on the earthrsquossurface at 119899 distinct locations forming a regular grid acrosssome study regionsAt time 119905

0= 0 a seismic pulse is generated

and data recording starts The energy from the pulse spreadsthrough the ground with some proportion being reflectedback towards the surface The sensors record this reflectedenergy at 119899 equally spaced time points T = 119905

119894 119894 =

1 2 119899 spread over several seconds before data collectionstops Hence at each location a time series y = 119910

119894 119894 =

1 2 119899 is recorded Let the 119899 sensor locations be denotedby S = s

119895 119895 = 1 2 119898 Collecting the multiple time

series together produces the full seismic dataset Y = 119910119894119895

119894 = 1 2 119899 119895 = 1 2 119898 where 119910119894119895

represents thereflected energy measured at time 119905

119894and location s

119895

The amount of reflected energy depends on the acousticimpedance distribution of the subsurface as well as proper-ties of the initial seismic pulse A common model used inreflection seismology is that of a stack of horizontal layers(see [4 11]) Suppose that there are 119870 distinct layers with thebottom of the layers at depthsD = 119889

119896 119896 = 1 2 119870minus1mdash

note that the lowest layer is assumed to stretch to infinity andhence there is no value for 119889

119870mdashand that the corresponding

acoustic impedances are W = 120596119896 119896 = 1 2 119870 In

turn the acoustic impedance is determined by the product ofthe density 120588 and the acoustic velocity V of the material inthe layer Typical values for density (in gcm3) and acousticvelocity (kmsec) are water 120588 = 10 V = 14 oil 120588 = 08V = 13 and rock 120588 = 15ndash30 V = 30ndash60 This meansthat it is possible for two materials to have different densityand acoustic velocity but to have indistinguishable acousticimpedance values

Where two layers touch there is an acoustic impedancecontrast described by the reflection coefficientThis is definedas the difference between the acoustic impedances of the twolayers divided by their sum So the reflection coefficient 120574

119896

at the bottom of layer 119896 is given by

120574119896=

120596119896+1

minus 120596119896

120596119896+1

+ 120596119896

119896 = 1 2 119870 minus 1 (1)

In a regionwithmultiple layers therewill be several reflectingsurfaces at different depths A reflection coefficient seriesor spike series records the depth and value of the various

Advances in Statistics 3

00

02

04

06

08

10

000 002 004minus004 minus002

(a)

00

02

04

06

08

10

000 002 004minus004 minus002

(b)

00

02

04

06

08

10

000 002 004minus004 minus002

(c)

Figure 2 Ricker wavelet of varying dominant frequency (a) 25Hz (b) 50Hz (c) 100Hz

reflection coefficients and can be written as 120574 = 120574119896 119896 =

1 2 119870 minus 1 at depths D = 119889119896 119896 = 1 2 119870 minus 1

This series therefore encodes the information about thehorizons in the sense that the arrival time of a reflectionis directly proportional to the location of the spike 119889

119896 and

the amplitude of the spike represents the strength of thereflection 120574

119896

The model of recorded seismic data is completed by theconvolution of the reflection coefficient series with a transferfunction which is often taken as the Ricker wavelet [12] TheRicker wavelet is defined as the scaled second derivative ofthe Gaussian function written as

119908 (119889 119905) = (1 minus 212058721198912

119877(119905 minus

119889

V)

2

) 119890minus12058721198912

119877(119905minus119889V)2

(2)

where 119889 is the depth of the reflecting surface 119905 is the timedelay for the reflection to return to the surface 119891

119877is the

dominant frequency of the sound wave and V is the averagevelocity of the sound wave along its path The transferfunction relates the time taken for the energy pulse to covera particular distance Examples of the Ricker wavelet forthree different dominant frequencies are shown in Figure 2The Ricker wavelet is wide for low frequencies and morecompact for higher dominant frequencies Although thismeans that higher frequency soundwaves have the advantageof greater time resolution suchwaves have the overwhelmingdisadvantage that they penetrate less deeply into the groundThen the discrete convolution model can be written as

119891 (119905119894) =

119870minus1

sum

119896=1

119908 (119889119896 119905119894) times 120574119896 119894 = 1 2 119899 (3)

and including the effects of measurement error gives the finaldata equation

119910119894= 119891 (119905

119894) + 120598119894 119894 = 1 2 119899 (4)

where 120598119894 119894 = 1 2 119899 are independent measurement

errors modelled by a Gaussian distribution Figure 3 showsa diagrammatic representation of the formation of a seismicsignal which results from the convolution of the reflectioncoefficient and the Ricker wavelet The horizon locationscan be seen as spikes in the reflection coefficient series

Noise-free traceRicker waveletReflection coefficient

=times

(a) (b) (c)

Figure 3 Diagram illustrating data production (a) reflectioncoefficients (b) Ricker wavelet (c) seismic trace

corresponding to contrasting acoustic impedance valuesTheconvolution of this series with the Ricker wavelet leads to ablurred version of the reflection coefficient series In practicethe reflection coefficient series will be contaminated byminoracoustic impedance contrasts due to natural variation and byrandom noise describing other sources of error

Now consider an example of the simulation of a syntheticdataset using the above model Consider a sedimentarygeological regionwhich consists of several rock layers beddedhorizontally upon each other with each layer having a char-acteristic but naturally varying acoustic impedance Thislatter point will create weak reflections due to low-contrastchanges in acoustic impedance within a layer creating clutterwhich may partially obscure the true horizon locations Thiswill have the effect of contaminating the reflection coefficientseries which will be transformed into correlated errors in theseismic tracemdashin addition to the independent measurementerrors Further suppose there is a fault causing one part ofthe region to be deeper relative to the surrounding areaAs a particular example Figure 4 shows the seismogram oftraces along a transectmdashin these traces moderate naturalvariation and measurement noise have been included Thedataset consists of 21 traces each having 361 readings at equaltime intervals Three horizons can easily be identified onecurving up towards the top-right and the other two almosthorizontal in the bottomhalf of the pictureThe discontinuityin the horizons is caused by the presence of a fault towards

4 Advances in Statistics

Figure 4 Seismogram showing multiple traces in a syntheticdataset

the right-hand side There is also a gap in the lowest horizontowards the left-hand side

3 Method of Analysis

31 General Strategy The method proposed here takes asits starting point the seismogram Although this may havealready been preprocessed it is assumed that this involvedlittle or no smoothing Hence the method starts by consider-ing smoothing to remove the effects of noise before movingon to identify key features within the data

The overall objective of the analysis is to identify thestratigraphic structure in the seismogram that is long linearfeatures which are called horizons These represent lines ofhigh acoustic impedance contrast which are associated withimportant changes in subsurface structure In the presenceof low noise the points along a strong contrast horizoncan easily be identified as turning points in the individualtracesmdashsee Figure 4 Even in this case however as thenumber of horizons is unknown it might not be obvioushow many turning points to take Also when the contrast islower relative to the noise level identification will be furthercomplicated as turning points caused by the noise and featuremight sometimes have similar signal values

In the proposed method first each time series trace isconsidered separately and candidate horizon locations areidentified as local maxima and minima after smoothingThen these candidate points are classified into groups wheremembers of each class have similar signals Again noisemay mean that some points are incorrectly classified Theidentified classes are defined as structural points Finallyselected points in each class are linked to identify thehorizons The diagram in Figure 5 illustrates the steps of theanalysis strategy

32 Extracting Candidate Point Locations Figure 6(a) showsa single trace from themiddle of the earlier seismogramwith

local minima and maxima shown as open and filled circlesrespectivelyThe sets of the locations of the local maxima andminima are simply determined as

119871max0

= 119905119894 119910119894minus 119910119894minus1

gt 0 119910119894+1

minus 119910119894lt 0

119871min0

= 119905119894 119910119894minus 119910119894minus1

lt 0 119910119894+1

minus 119910119894gt 0

(5)

In total there are 61 turning points and although the threehorizon locations can easily be seen the remaining 58 arecaused by the noise In situations where there are highernoise levels it will not be so easy to separate the extremepoints created by horizons from those due to noise This willmean that some horizons are lost and that spurious pointscontaminate the true locations To reduce the influence ofnoise the traces are smoothed

For a given trace consider the 119899 paired data points(119905119894 119910119894) 119894 = 1 119899 related by an unknown nonlinear

relationship 119891(sdot) with additive noise that is

119910119894= 119891 (119905

119894) + 120598119894 119894 = 1 119899 (6)

where the noise 120598119894 119894 = 1 119899 are independent and

identically distributed Gaussian random variables with zeromean and constant variance 1205902

To reduce the effects of noise the unknown functioncan be estimated using smoothing splines (see eg [13 14])which minimizes the objective function

119869 (120582) =

1

119899

119899

sum

119894=1

(119910119894minus 119891(119905119894))2+ 120582int (119891

10158401015840(119905))

2

119889119905 (7)

subject to the constraint that the fitted curve is twice differ-entiable That is

119891120582(119905) = argmin

119891

1

119899

119899

sum

119894=1

(119910119894minus 119891(119905119894))2+ 120582int (119891

10158401015840(119905))

2

119889119905

(8)

In these the integral is over the interval [119905(1) 119905(119899)] that is the

minimum and maximum of the data collection times Thefirst term a residual mean-square measures the discrepancybetween the data and the fitted function and the secondan integrated squared curvature measures the roughnessof the fitted function and 120582 is a smoothing or bandwidthparameter To make the residuals zero the fitted functionmust interpolate the data In general this will lead to aquickly varying function which will have a high curvatureIn contrast to make the curvature zero the fitted func-tion must be linear which is unlikely to fit the data wellmaking the residuals large The smoothing parameter 120582controls the balance between these two measures Figures6(b) and 6(c) show the results of spline smoothing (usingthe R function smoothspline [15]) for small andmoderatevalues of 120582 respectively In each the major structure is stillevident and some undesirable points are removed Howevercare is needed to avoid oversmoothingwhichmight eliminateimportant features In the extreme however even the mainstructural features would be lost

Advances in Statistics 5

Smoothing HorizonsLink

structural pointscandidate pointsClassify the Extract

candidate pointsdataInput

Figure 5 Diagram of the data analysis strategy

(a) (b) (c)

Figure 6 A seismic trace after increasing levels of smoothing (a)low (b) moderate and (c) high smoothing Also shown are thelocations of local minima and maxima

Figures 6(b) and 6(c) also show the locations of the localturning points of the smoothed function estimates which canbe defined as

119871max120582

= 119905119894119891120582(119905119894) minus

119891120582(119905119894minus1) gt 0

119891120582(119905119894+1) minus

119891120582(119905119894) lt 0

119871min120582

= 119905119894119891120582(119905119894) minus

119891120582(119905119894minus1) lt 0

119891120582(119905119894+1) minus

119891120582(119905119894) gt 0

(9)

Here and later a scale-space approach (see eg [16 17]) willbe considered The idea is that one should not try to focuson a single smoothing level but consider several levels simul-taneously with the aim of identifying any dominant scalesFor any value of the smoothing parameter 120582 minimizing (7)will yield a fitted function 119891

120582 and hence this can be repeated

for a range of values of the smoothing parameter Figure 7shows a set of smoothed traces As the level of smoothingincreases from left to right the resulting smoothed tracesbecome smoother and smoother until they are a straight lineon the far right The best smoothed trace will be somewherewithin this set

Recall that the aim here is not to produce a ldquonicelysmoothed tracerdquo but instead to accurately locate the turningpoints which correspond to the horizons Hence the choiceof smoothing parameter focuses on this output rather thanon a more usual global goodness-of-fit measure To helpwith the analysis a novel turning-point tree is proposedwith an example shown in Figure 8 where the locations of

Figure 7 A single seismic trace after increasing levels of smoothing

all minima and maxima are shown as the smoothing levelchanges That is the tree shows a graphical representationof the sets 119871max

120582and 119871

min120582

as functions of 120582 The bottomedge of Figure 8 corresponds to no smoothing and everyturning point in the data appears As the smoothing increasesmoving upwardswithin the graph the turning point locationschange and sometimes turning points disappear Note thatas the smoothing increases pairs of neighbouring maximaand minima move towards each other before disappearingGradually the number of turning points decreases until at thetop of the graph only the three principal turning points areseen and eventually even these will disappear The horizontaldotted lines correspond to the low moderate and highsmoothing situations shown in Figure 6Overall it seems thatturning points associatedwith true horizons remain for a verywide range of levels of smoothing and hence themethod doesnot seem to be too sensitive to the exact choice

To summarize the threshold tree in Figure 8 considerthe number of turning points 119873

120582 for each value of the

smoothing parameter as shown in Figure 9(a) For smallvalues of 120582 there is little smoothing and hence there aremany turning points but as 120582 increases the number ofturning points reduces Turning points due to noise shouldbe removed quickly but true turning points due to thehorizons should remain To select an appropriate value of 120582it is proposed to use the value corresponding to the greatestchange in the number of turning points This derivative isshown in Figure 9(b) with the maximum (absolute) valueshown as a vertical line at about 120582lowast = 062 Figure 6(b)shows a smoothed trace using the corresponding smoothingparameter value Figure 10(a) shows the full dataset after

6 Advances in Statistics

Figure 8 Threshold tree showing the location of the turning points (blue minima red maxima) for varying levels of smoothing

00 02 04 06 08 10 12Smoothing

Num

ber o

f tur

ning

poi

nts

0

10

20

30

40

(a)

minus140

minus100

minus60

minus20

Der

ivat

ive

00 02 04 06 08 10 12Smoothing

(b)

Figure 9 Graphs of (a) the number of turning points as a functionof bandwidth and (b) the corresponding derivative and point ofmaximum absolute derivative

smoothing using the automatically chosen bandwidth andFigure 10(b) shows the corresponding minima and maximaas open and filled circles It is worth noting that this haslocated many more candidate points than the number of trueturning pointsThis is important as at this stage it is vital thattrue horizon locations are not lostThe next stage will identifywhich candidate points form coherent features

33 Candidate Point Classification and Structural Point Link-ing The second stage in the procedure is to classify thecandidate points into groups where members of a groupform a single horizon It is assumed that points along ahorizon will have a distinct acoustic impedance contrastgiving rise to a distinct reflectance coefficient and hence adistinct measured signal Since each horizon is expected tohave a different characteristic acoustic impedance contrastthe measured signals will be a mixture of an unknownnumber of distributions each of an unknown shape Alsothe various sources of systematic and random variability maymean that measured signals from one horizon vary and maybe similar to those from other horizons It is impossibleto obtain a parametric form for this distribution and so akernel density approach is used to provide a nonparametricsmoothed version of the signal strength distribution of thecandidate points For background to kernel density meth-ods see for example Wand and Jones [18] and Silverman[19]

Suppose that 120578 = 119873120582lowast turning points have been found

that their signal strengths are denoted by 119906119894 119894 = 1

120578 and that their locations along the traces are V119894 119894 =

1 120578 then the kernel density estimate of the signalstrength distribution is

119892120575(119906) =

120578

sum

119894=1

119870(

119906119895minus 119906

120575

) minusinfin lt 119906 lt infin (10)

Advances in Statistics 7

(a) (b)

Figure 10 Multiple smoothed traces in (a) with corresponding candidate points in (b) where open and filled circles represent minima andmaxima respectively

(a) (b) (c)

Figure 11 The curves represent kernel density with (a) low smoothing (b) moderate smoothing (c) high smoothing

where 119870(sdot) is the kernel function and 120575 is a bandwidthparameter A popular choice of kernel function is the stan-dard Gaussian density Despite the method being generallysuccessful at finding structure kernel density estimationcan also ignore significant structure via oversmoothing orretain insignificant structure via undersmoothing and hencethe correct choice of 120575 is important Figure 11 shows asimple bimodal dataset after varying levels of smoothingusing the R function density [15] We might guess that (a)is undersmoothed retaining unimportant random variationand that (c) is oversmoothed leaving little useful informationThe central graph (b) uses the default bandwidth based onSilvermanrsquos rule of thumb [19] which seems to summarize thedistribution well

The number of modes in the kernel density estimate isregarded as the number of distinct classes and hence thenumber of horizons The locations of the minima betweenthe modes can be taken as thresholds which can then be usedto classify the candidate points Considering the three levelsof smoothing used in Figure 11 would however lead to verydifferent results Rather than attempting to estimate a singlelevel of smoothing again a scale-space approach will be used(again see eg [16 17]) In this approach many levels of

smoothing are considered allowing features at different scalesto appear and allowing a natural scale to be identified

Let the set of locations of the local minima which are tobe used as threshold be defined as

120591120575= 119866

min120575

= 119906119894 119892120575(119906119894) minus 119892120575(119906119894minus1) lt 0 119892

120575(119906119894+1) minus 119892120575(119906119894gt 0)

(11)

Suppose that 119870 thresholds are found in this way hencedefining 119870 + 1 groups For a particular bandwidth 120575 let thethresholds be denoted by 120591

120575= 120591119896 119896 = 1 2 119870 which

allows the members of the classes to be defined as

119862119896

120575=

119906119894 119906119894lt 1205911 for 119896 = 1

119906119894 120591119896minus1

lt 119906119894lt 120591119896 for 2 le 119896 le 119870

119906119894 119906119894gt 120591119870 for 119896 = 119870 + 1

(12)

and the number in each class as119873119896120575for 119896 = 1 2 119870

Figure 12(a) shows the relation between the locations ofthe thresholds and the level of smoothing defined in terms

8 Advances in Statistics

000 002 004 006 008Bandwidth

minus06

minus04

minus020Thre

shol

d

00

02

04

(a)

Den

sity

0

1

2

3

4

5

Signal strengthminus08 minus04 00 02 04

(b) (c)

Figure 12 (a) Threshold tree (b) histogram of kernel density and (c) turning point locations

of the bandwidth 120575 The change in the bandwidth leads toa change in the number and value of the thresholds andhence in the number of classes As the bandwidth increasesthe number of thresholds decreases and when the curvescombine into a single line there is only a single thresholdwhich means in turn that the number of groups is twoThe procedure is stopped when the smoothed distributionbecomes unimodal In the proposed method the best clas-sification is defined to be when the number of thresholds isstable for the longest

In this example three thresholds persist the longest abandwidth interval indicated by the vertical dotted linesand hence suggests four clusters The optimum smoothingbandwidth 120575lowast is then taken as the smallest value in thisintervalThe corresponding kernel density estimate is plottedin Figure 12(b) the vertical dotted lines show the position ofthe thresholds It is now important to realise that the clustersurrounding signal strength zero includes turning pointsgenerated by the random noise and should be excluded Thiswill be considered now before moving to the final linkingstep

The observed signal strength distribution will be a mix-ture of background values and values due to true horizonsOf course the various means and the variance of thesecomponents are unknown To describe the distribution of thesignal values from the background the following approachis proposed The centre of the background signal strengthdistribution is estimated using the median of the full data

120583 = median119895

119906119895 (13)

and the variance estimate is based on the median absolutedeviation (MAD)

2=

MADΦminus1(34)

with MAD = median119895

10038161003816100381610038161003816119906119895minus 120583

10038161003816100381610038161003816 (14)

where the value 1Φminus1(34) asymp 148 is chosen so that whenthe values follow a Gaussian distribution then 119864[

2] =

1205902 These estimators are chosen to be robust to outliers

caused by the signal strengths from the horizons The solidhorizontal line near the bottom of Figure 12(b) representsa 95 confidence interval for background signal strengthThe calculation assumes a Gaussian distribution and uses theabovemean and variance estimatesThis interval encloses themajority of this zero-centred mode and confirms that it isreasonable to exclude the whole group Before moving onit is worth noting that for datasets containing no horizonsall turning points will form a single group which will beidentified as a background group and hence the procedurewould not identify any horizons

Figure 12(c) shows all the minima and maxima as openand filled circles respectively The grey circles correspond tothe background group and the black circles to the horizongroupsThe algorithm achieves good results in extracting andclassifying the structural points which make the horizonsbut notice that although the horizons can be clearly seenthere are three extra points near the top and one missingalong the central horizon Also although not visible thereare six points misclassified to the wrong group This meansthat linking would not be perfect with some crossing of theproposed horizons This suggests that further criteria mustbe included in this final linking process As a simple examplehere isolated structural point will be excluded from thelinking In particular if the distance to the nearest other pointin the group is greater than say twice the average within-group interpoint distance then is excluded

Let the ordered set of all locations along the traces ofmember of class 119896 be denoted by V

119894 1 2 119873

119896

120575lowast Then

consider the median distance between neighbouring pointsin the class

119889119896= median

119894

1003816100381610038161003816V119894+1

minus V119894

1003816100381610038161003816 (15)

and the distance from a point to the nearest other point in theclass

119898119896

119894= min119895

10038161003816100381610038161003816V119894minus V119895

10038161003816100381610038161003816 (16)

Advances in Statistics 9

Figure 13 Linked structural points with modified method

This allows the following modified rule for class membershipto be defined

119862119896

120575lowast =

V119894 V119894lt 1205911 119898119896

119894lt 2 times 119889

119896

for 119896 = 1

V119894 120591119896minus1

lt V119894lt 120591119896 119898119896

119894lt 2 times 119889

119896

for 2 le 119896 le 119870

V119894 V119894gt 120591119870 119898119896

119894lt 2 times 119889

119896

for 119896 = 119870 + 1

(17)

Applying this final rule to the previous example producesthree groups of structural points forming clear and coherenthorizons as shown in Figure 13 Although not shown hereit has also been applied to examples with higher noise levelswith equal success

4 Conclusions

Seismic surveys play an important role in locating under-ground features but the partial reflection and natural varia-tion in acoustic properties mean that the recorded acousticsignals provide a confusing representation of the studyarea Full 3D datasets can be enormous covering severalsquare kilometres on the surface and representing severalkilometres in depth with high frequency temporal samplingThis suggests that rapid analysis is needed even if interestingexamples require further analysis or examination by expertsThis also suggests that simple techniques are preferablewith full reconstruction reserved for follow-up analysis Rawdata contain systematic distortions and significant noiseand typically they undergo substantial preprocessing beforeanalysis is started The preprocessing aims at reducing thedistortion and reducing noise but can introduce artefacts

The approach proposed here is based on simple statisticalprocedures which aim at smoothing out noise and usingcontextual information to identify coherent horizons In

particular smoothing splines are used to smooth individualtraces before localminima andmaxima are located It appearsthat greater smoothing can be applied to the traces withoutoverwhelming the major features than would be the caseif the aim was to estimate the signal Once a smaller set ofcandidate points is found they can be classified into groupswhere each group represents points along a horizon Clas-sification is performed by thresholding the signal strengthdistribution so that each mode represents a class Againsmoothing is needed to help select the correct number ofmodes and hence classes in the distributionThis time kerneldensity methods are used Here the choice of bandwidth ismuch more important as even small changes can lead to asubstantial change in the smoothed density A scale-spaceapproach is adopted which first allows all levels of smoothingto be considered The best level of smoothing and hencethe number of clusters and the number of horizons is thendetermined as the value for which the number of clusterspersists the longest The cluster associated with the randomvariation in the background is identified and excluded fromhorizon linkage Locationswithin each cluster are then linkedto form horizons In some high noise examples furtherinformation is used in this linking stage to exclude pointswhich are distant from the main cluster and hence unlikelyto be part of the horizon This does not prevent howeverhorizons from being linked across gaps and across faultsClearly it would be possible to supplement this fine tuningwith other criteria as necessary Based on the experimentsconducted here the proposed method shows great promiseIt provides a simple and quick approach to locate and linkhorizons even in relatively noisy datasets It is not affected bygaps in horizons nor by discontinuities caused by faults

Finally it is important to note that the proposed methodis not specific to seismic studies but can easily be appliedto other problems where the aim is to identify coherentstructure within 2D or 3D data Also the use of smoothingand threshold trees gives a simple graphical tool for anyclassification problem and the use of the most persistentthresholding is applicable to many other situations

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors are grateful to the three anonymous referees fortheir helpful comments which have substantially improvedthis paper

References

[1] PM Shearer Introduction to Seismology Cambridge UniversityPress Cambridge UK 1999

[2] R E Sheriff and L P Geldart Exploration Seismology Cam-bridge University Press Cambridge Mass USA 2nd edition1995

10 Advances in Statistics

[3] K H Waters Reflection Seismology John Wiley amp Sons NewYork NY USA 1978

[4] L Hatton M Worthington and J Makin Seismic Data Pro-cessing Theory and Practice Blackwell Scientific PublicationsLondon UK 1986

[5] A Ribes and F Schmitt ldquoLinear inverse problems in imagingan introductory surveyrdquo IEEE Signal Processing Magazine vol25 no 4 pp 84ndash99 2008

[6] A M Stuart ldquoInverse problems a Bayesian perspectiverdquo ActaNumerica vol 19 pp 451ndash559 2010

[7] W W Symes ldquoThe seismic reflection inverse problemrdquo InverseProblems vol 25 no 12 Article ID 123008 39 pages 2009

[8] V Lendzionowski A TWalden andR EWhite ldquoSeismic char-actermapping over reservoir intervalsrdquoGeophysical Prospectingvol 38 no 8 pp 951ndash969 1990

[9] M van der Baan and A Paul ldquoRecognition and reconstructionof coherent energy with application to deep seismic reflectiondatardquo Geophysics vol 65 no 2 pp 656ndash667 2000

[10] A T Walden ldquoSpatial clustering using simple summaries ofdata to find the edge of an oil-fieldrdquo Applied Statistics vol 43pp 385ndash395 1994

[11] D W Oldenburg T Scheuer and S Levy ldquoRecovery of theacoustic impedance from reflection seismogramsrdquo Geophysicsvol 48 no 10 pp 1318ndash1337 1983

[12] N Ricker ldquoThe form and nature of seismic waves and thestructure of seismogramsrdquoGeophysics vol 5 pp 348ndash366 1940

[13] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[14] P J Green and B W Silverman Nonparametric Regressionand Generalized Linear Models A Roughness Penalty ApproachChapman and Hall 1994

[15] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2013 httpwwwR-projectorg

[16] T Lindeberg ldquoScale-space theory a basic tool for analyzingstructures at different scalesrdquo Journal of Applied Statistics vol21 pp 225ndash270 1994

[17] A P Witkin ldquoScale-space filteringrdquo in Proceedings of 8thInternational Joint Conference on Artificial Intelligence pp 1019ndash1022 Karlsruhe Germany 1983

[18] M P Wand and M C Jones Kernel Smoothing Chapman ampHallCRC 1995

[19] B SilvermanDensity Estimation for Statistics andDataAnalysisChapman amp HallCRC 1998

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 3: Research Article Horizon Detection in Seismic Data: An Application … · 2016-08-02 · Seismic studies are a key stage in the search for large scale underground features such as

Advances in Statistics 3

00

02

04

06

08

10

000 002 004minus004 minus002

(a)

00

02

04

06

08

10

000 002 004minus004 minus002

(b)

00

02

04

06

08

10

000 002 004minus004 minus002

(c)

Figure 2 Ricker wavelet of varying dominant frequency (a) 25Hz (b) 50Hz (c) 100Hz

reflection coefficients and can be written as 120574 = 120574119896 119896 =

1 2 119870 minus 1 at depths D = 119889119896 119896 = 1 2 119870 minus 1

This series therefore encodes the information about thehorizons in the sense that the arrival time of a reflectionis directly proportional to the location of the spike 119889

119896 and

the amplitude of the spike represents the strength of thereflection 120574

119896

The model of recorded seismic data is completed by theconvolution of the reflection coefficient series with a transferfunction which is often taken as the Ricker wavelet [12] TheRicker wavelet is defined as the scaled second derivative ofthe Gaussian function written as

119908 (119889 119905) = (1 minus 212058721198912

119877(119905 minus

119889

V)

2

) 119890minus12058721198912

119877(119905minus119889V)2

(2)

where 119889 is the depth of the reflecting surface 119905 is the timedelay for the reflection to return to the surface 119891

119877is the

dominant frequency of the sound wave and V is the averagevelocity of the sound wave along its path The transferfunction relates the time taken for the energy pulse to covera particular distance Examples of the Ricker wavelet forthree different dominant frequencies are shown in Figure 2The Ricker wavelet is wide for low frequencies and morecompact for higher dominant frequencies Although thismeans that higher frequency soundwaves have the advantageof greater time resolution suchwaves have the overwhelmingdisadvantage that they penetrate less deeply into the groundThen the discrete convolution model can be written as

119891 (119905119894) =

119870minus1

sum

119896=1

119908 (119889119896 119905119894) times 120574119896 119894 = 1 2 119899 (3)

and including the effects of measurement error gives the finaldata equation

119910119894= 119891 (119905

119894) + 120598119894 119894 = 1 2 119899 (4)

where 120598119894 119894 = 1 2 119899 are independent measurement

errors modelled by a Gaussian distribution Figure 3 showsa diagrammatic representation of the formation of a seismicsignal which results from the convolution of the reflectioncoefficient and the Ricker wavelet The horizon locationscan be seen as spikes in the reflection coefficient series

Noise-free traceRicker waveletReflection coefficient

=times

(a) (b) (c)

Figure 3 Diagram illustrating data production (a) reflectioncoefficients (b) Ricker wavelet (c) seismic trace

corresponding to contrasting acoustic impedance valuesTheconvolution of this series with the Ricker wavelet leads to ablurred version of the reflection coefficient series In practicethe reflection coefficient series will be contaminated byminoracoustic impedance contrasts due to natural variation and byrandom noise describing other sources of error

Now consider an example of the simulation of a syntheticdataset using the above model Consider a sedimentarygeological regionwhich consists of several rock layers beddedhorizontally upon each other with each layer having a char-acteristic but naturally varying acoustic impedance Thislatter point will create weak reflections due to low-contrastchanges in acoustic impedance within a layer creating clutterwhich may partially obscure the true horizon locations Thiswill have the effect of contaminating the reflection coefficientseries which will be transformed into correlated errors in theseismic tracemdashin addition to the independent measurementerrors Further suppose there is a fault causing one part ofthe region to be deeper relative to the surrounding areaAs a particular example Figure 4 shows the seismogram oftraces along a transectmdashin these traces moderate naturalvariation and measurement noise have been included Thedataset consists of 21 traces each having 361 readings at equaltime intervals Three horizons can easily be identified onecurving up towards the top-right and the other two almosthorizontal in the bottomhalf of the pictureThe discontinuityin the horizons is caused by the presence of a fault towards

4 Advances in Statistics

Figure 4 Seismogram showing multiple traces in a syntheticdataset

the right-hand side There is also a gap in the lowest horizontowards the left-hand side

3 Method of Analysis

31 General Strategy The method proposed here takes asits starting point the seismogram Although this may havealready been preprocessed it is assumed that this involvedlittle or no smoothing Hence the method starts by consider-ing smoothing to remove the effects of noise before movingon to identify key features within the data

The overall objective of the analysis is to identify thestratigraphic structure in the seismogram that is long linearfeatures which are called horizons These represent lines ofhigh acoustic impedance contrast which are associated withimportant changes in subsurface structure In the presenceof low noise the points along a strong contrast horizoncan easily be identified as turning points in the individualtracesmdashsee Figure 4 Even in this case however as thenumber of horizons is unknown it might not be obvioushow many turning points to take Also when the contrast islower relative to the noise level identification will be furthercomplicated as turning points caused by the noise and featuremight sometimes have similar signal values

In the proposed method first each time series trace isconsidered separately and candidate horizon locations areidentified as local maxima and minima after smoothingThen these candidate points are classified into groups wheremembers of each class have similar signals Again noisemay mean that some points are incorrectly classified Theidentified classes are defined as structural points Finallyselected points in each class are linked to identify thehorizons The diagram in Figure 5 illustrates the steps of theanalysis strategy

32 Extracting Candidate Point Locations Figure 6(a) showsa single trace from themiddle of the earlier seismogramwith

local minima and maxima shown as open and filled circlesrespectivelyThe sets of the locations of the local maxima andminima are simply determined as

119871max0

= 119905119894 119910119894minus 119910119894minus1

gt 0 119910119894+1

minus 119910119894lt 0

119871min0

= 119905119894 119910119894minus 119910119894minus1

lt 0 119910119894+1

minus 119910119894gt 0

(5)

In total there are 61 turning points and although the threehorizon locations can easily be seen the remaining 58 arecaused by the noise In situations where there are highernoise levels it will not be so easy to separate the extremepoints created by horizons from those due to noise This willmean that some horizons are lost and that spurious pointscontaminate the true locations To reduce the influence ofnoise the traces are smoothed

For a given trace consider the 119899 paired data points(119905119894 119910119894) 119894 = 1 119899 related by an unknown nonlinear

relationship 119891(sdot) with additive noise that is

119910119894= 119891 (119905

119894) + 120598119894 119894 = 1 119899 (6)

where the noise 120598119894 119894 = 1 119899 are independent and

identically distributed Gaussian random variables with zeromean and constant variance 1205902

To reduce the effects of noise the unknown functioncan be estimated using smoothing splines (see eg [13 14])which minimizes the objective function

119869 (120582) =

1

119899

119899

sum

119894=1

(119910119894minus 119891(119905119894))2+ 120582int (119891

10158401015840(119905))

2

119889119905 (7)

subject to the constraint that the fitted curve is twice differ-entiable That is

119891120582(119905) = argmin

119891

1

119899

119899

sum

119894=1

(119910119894minus 119891(119905119894))2+ 120582int (119891

10158401015840(119905))

2

119889119905

(8)

In these the integral is over the interval [119905(1) 119905(119899)] that is the

minimum and maximum of the data collection times Thefirst term a residual mean-square measures the discrepancybetween the data and the fitted function and the secondan integrated squared curvature measures the roughnessof the fitted function and 120582 is a smoothing or bandwidthparameter To make the residuals zero the fitted functionmust interpolate the data In general this will lead to aquickly varying function which will have a high curvatureIn contrast to make the curvature zero the fitted func-tion must be linear which is unlikely to fit the data wellmaking the residuals large The smoothing parameter 120582controls the balance between these two measures Figures6(b) and 6(c) show the results of spline smoothing (usingthe R function smoothspline [15]) for small andmoderatevalues of 120582 respectively In each the major structure is stillevident and some undesirable points are removed Howevercare is needed to avoid oversmoothingwhichmight eliminateimportant features In the extreme however even the mainstructural features would be lost

Advances in Statistics 5

Smoothing HorizonsLink

structural pointscandidate pointsClassify the Extract

candidate pointsdataInput

Figure 5 Diagram of the data analysis strategy

(a) (b) (c)

Figure 6 A seismic trace after increasing levels of smoothing (a)low (b) moderate and (c) high smoothing Also shown are thelocations of local minima and maxima

Figures 6(b) and 6(c) also show the locations of the localturning points of the smoothed function estimates which canbe defined as

119871max120582

= 119905119894119891120582(119905119894) minus

119891120582(119905119894minus1) gt 0

119891120582(119905119894+1) minus

119891120582(119905119894) lt 0

119871min120582

= 119905119894119891120582(119905119894) minus

119891120582(119905119894minus1) lt 0

119891120582(119905119894+1) minus

119891120582(119905119894) gt 0

(9)

Here and later a scale-space approach (see eg [16 17]) willbe considered The idea is that one should not try to focuson a single smoothing level but consider several levels simul-taneously with the aim of identifying any dominant scalesFor any value of the smoothing parameter 120582 minimizing (7)will yield a fitted function 119891

120582 and hence this can be repeated

for a range of values of the smoothing parameter Figure 7shows a set of smoothed traces As the level of smoothingincreases from left to right the resulting smoothed tracesbecome smoother and smoother until they are a straight lineon the far right The best smoothed trace will be somewherewithin this set

Recall that the aim here is not to produce a ldquonicelysmoothed tracerdquo but instead to accurately locate the turningpoints which correspond to the horizons Hence the choiceof smoothing parameter focuses on this output rather thanon a more usual global goodness-of-fit measure To helpwith the analysis a novel turning-point tree is proposedwith an example shown in Figure 8 where the locations of

Figure 7 A single seismic trace after increasing levels of smoothing

all minima and maxima are shown as the smoothing levelchanges That is the tree shows a graphical representationof the sets 119871max

120582and 119871

min120582

as functions of 120582 The bottomedge of Figure 8 corresponds to no smoothing and everyturning point in the data appears As the smoothing increasesmoving upwardswithin the graph the turning point locationschange and sometimes turning points disappear Note thatas the smoothing increases pairs of neighbouring maximaand minima move towards each other before disappearingGradually the number of turning points decreases until at thetop of the graph only the three principal turning points areseen and eventually even these will disappear The horizontaldotted lines correspond to the low moderate and highsmoothing situations shown in Figure 6Overall it seems thatturning points associatedwith true horizons remain for a verywide range of levels of smoothing and hence themethod doesnot seem to be too sensitive to the exact choice

To summarize the threshold tree in Figure 8 considerthe number of turning points 119873

120582 for each value of the

smoothing parameter as shown in Figure 9(a) For smallvalues of 120582 there is little smoothing and hence there aremany turning points but as 120582 increases the number ofturning points reduces Turning points due to noise shouldbe removed quickly but true turning points due to thehorizons should remain To select an appropriate value of 120582it is proposed to use the value corresponding to the greatestchange in the number of turning points This derivative isshown in Figure 9(b) with the maximum (absolute) valueshown as a vertical line at about 120582lowast = 062 Figure 6(b)shows a smoothed trace using the corresponding smoothingparameter value Figure 10(a) shows the full dataset after

6 Advances in Statistics

Figure 8 Threshold tree showing the location of the turning points (blue minima red maxima) for varying levels of smoothing

00 02 04 06 08 10 12Smoothing

Num

ber o

f tur

ning

poi

nts

0

10

20

30

40

(a)

minus140

minus100

minus60

minus20

Der

ivat

ive

00 02 04 06 08 10 12Smoothing

(b)

Figure 9 Graphs of (a) the number of turning points as a functionof bandwidth and (b) the corresponding derivative and point ofmaximum absolute derivative

smoothing using the automatically chosen bandwidth andFigure 10(b) shows the corresponding minima and maximaas open and filled circles It is worth noting that this haslocated many more candidate points than the number of trueturning pointsThis is important as at this stage it is vital thattrue horizon locations are not lostThe next stage will identifywhich candidate points form coherent features

33 Candidate Point Classification and Structural Point Link-ing The second stage in the procedure is to classify thecandidate points into groups where members of a groupform a single horizon It is assumed that points along ahorizon will have a distinct acoustic impedance contrastgiving rise to a distinct reflectance coefficient and hence adistinct measured signal Since each horizon is expected tohave a different characteristic acoustic impedance contrastthe measured signals will be a mixture of an unknownnumber of distributions each of an unknown shape Alsothe various sources of systematic and random variability maymean that measured signals from one horizon vary and maybe similar to those from other horizons It is impossibleto obtain a parametric form for this distribution and so akernel density approach is used to provide a nonparametricsmoothed version of the signal strength distribution of thecandidate points For background to kernel density meth-ods see for example Wand and Jones [18] and Silverman[19]

Suppose that 120578 = 119873120582lowast turning points have been found

that their signal strengths are denoted by 119906119894 119894 = 1

120578 and that their locations along the traces are V119894 119894 =

1 120578 then the kernel density estimate of the signalstrength distribution is

119892120575(119906) =

120578

sum

119894=1

119870(

119906119895minus 119906

120575

) minusinfin lt 119906 lt infin (10)

Advances in Statistics 7

(a) (b)

Figure 10 Multiple smoothed traces in (a) with corresponding candidate points in (b) where open and filled circles represent minima andmaxima respectively

(a) (b) (c)

Figure 11 The curves represent kernel density with (a) low smoothing (b) moderate smoothing (c) high smoothing

where 119870(sdot) is the kernel function and 120575 is a bandwidthparameter A popular choice of kernel function is the stan-dard Gaussian density Despite the method being generallysuccessful at finding structure kernel density estimationcan also ignore significant structure via oversmoothing orretain insignificant structure via undersmoothing and hencethe correct choice of 120575 is important Figure 11 shows asimple bimodal dataset after varying levels of smoothingusing the R function density [15] We might guess that (a)is undersmoothed retaining unimportant random variationand that (c) is oversmoothed leaving little useful informationThe central graph (b) uses the default bandwidth based onSilvermanrsquos rule of thumb [19] which seems to summarize thedistribution well

The number of modes in the kernel density estimate isregarded as the number of distinct classes and hence thenumber of horizons The locations of the minima betweenthe modes can be taken as thresholds which can then be usedto classify the candidate points Considering the three levelsof smoothing used in Figure 11 would however lead to verydifferent results Rather than attempting to estimate a singlelevel of smoothing again a scale-space approach will be used(again see eg [16 17]) In this approach many levels of

smoothing are considered allowing features at different scalesto appear and allowing a natural scale to be identified

Let the set of locations of the local minima which are tobe used as threshold be defined as

120591120575= 119866

min120575

= 119906119894 119892120575(119906119894) minus 119892120575(119906119894minus1) lt 0 119892

120575(119906119894+1) minus 119892120575(119906119894gt 0)

(11)

Suppose that 119870 thresholds are found in this way hencedefining 119870 + 1 groups For a particular bandwidth 120575 let thethresholds be denoted by 120591

120575= 120591119896 119896 = 1 2 119870 which

allows the members of the classes to be defined as

119862119896

120575=

119906119894 119906119894lt 1205911 for 119896 = 1

119906119894 120591119896minus1

lt 119906119894lt 120591119896 for 2 le 119896 le 119870

119906119894 119906119894gt 120591119870 for 119896 = 119870 + 1

(12)

and the number in each class as119873119896120575for 119896 = 1 2 119870

Figure 12(a) shows the relation between the locations ofthe thresholds and the level of smoothing defined in terms

8 Advances in Statistics

000 002 004 006 008Bandwidth

minus06

minus04

minus020Thre

shol

d

00

02

04

(a)

Den

sity

0

1

2

3

4

5

Signal strengthminus08 minus04 00 02 04

(b) (c)

Figure 12 (a) Threshold tree (b) histogram of kernel density and (c) turning point locations

of the bandwidth 120575 The change in the bandwidth leads toa change in the number and value of the thresholds andhence in the number of classes As the bandwidth increasesthe number of thresholds decreases and when the curvescombine into a single line there is only a single thresholdwhich means in turn that the number of groups is twoThe procedure is stopped when the smoothed distributionbecomes unimodal In the proposed method the best clas-sification is defined to be when the number of thresholds isstable for the longest

In this example three thresholds persist the longest abandwidth interval indicated by the vertical dotted linesand hence suggests four clusters The optimum smoothingbandwidth 120575lowast is then taken as the smallest value in thisintervalThe corresponding kernel density estimate is plottedin Figure 12(b) the vertical dotted lines show the position ofthe thresholds It is now important to realise that the clustersurrounding signal strength zero includes turning pointsgenerated by the random noise and should be excluded Thiswill be considered now before moving to the final linkingstep

The observed signal strength distribution will be a mix-ture of background values and values due to true horizonsOf course the various means and the variance of thesecomponents are unknown To describe the distribution of thesignal values from the background the following approachis proposed The centre of the background signal strengthdistribution is estimated using the median of the full data

120583 = median119895

119906119895 (13)

and the variance estimate is based on the median absolutedeviation (MAD)

2=

MADΦminus1(34)

with MAD = median119895

10038161003816100381610038161003816119906119895minus 120583

10038161003816100381610038161003816 (14)

where the value 1Φminus1(34) asymp 148 is chosen so that whenthe values follow a Gaussian distribution then 119864[

2] =

1205902 These estimators are chosen to be robust to outliers

caused by the signal strengths from the horizons The solidhorizontal line near the bottom of Figure 12(b) representsa 95 confidence interval for background signal strengthThe calculation assumes a Gaussian distribution and uses theabovemean and variance estimatesThis interval encloses themajority of this zero-centred mode and confirms that it isreasonable to exclude the whole group Before moving onit is worth noting that for datasets containing no horizonsall turning points will form a single group which will beidentified as a background group and hence the procedurewould not identify any horizons

Figure 12(c) shows all the minima and maxima as openand filled circles respectively The grey circles correspond tothe background group and the black circles to the horizongroupsThe algorithm achieves good results in extracting andclassifying the structural points which make the horizonsbut notice that although the horizons can be clearly seenthere are three extra points near the top and one missingalong the central horizon Also although not visible thereare six points misclassified to the wrong group This meansthat linking would not be perfect with some crossing of theproposed horizons This suggests that further criteria mustbe included in this final linking process As a simple examplehere isolated structural point will be excluded from thelinking In particular if the distance to the nearest other pointin the group is greater than say twice the average within-group interpoint distance then is excluded

Let the ordered set of all locations along the traces ofmember of class 119896 be denoted by V

119894 1 2 119873

119896

120575lowast Then

consider the median distance between neighbouring pointsin the class

119889119896= median

119894

1003816100381610038161003816V119894+1

minus V119894

1003816100381610038161003816 (15)

and the distance from a point to the nearest other point in theclass

119898119896

119894= min119895

10038161003816100381610038161003816V119894minus V119895

10038161003816100381610038161003816 (16)

Advances in Statistics 9

Figure 13 Linked structural points with modified method

This allows the following modified rule for class membershipto be defined

119862119896

120575lowast =

V119894 V119894lt 1205911 119898119896

119894lt 2 times 119889

119896

for 119896 = 1

V119894 120591119896minus1

lt V119894lt 120591119896 119898119896

119894lt 2 times 119889

119896

for 2 le 119896 le 119870

V119894 V119894gt 120591119870 119898119896

119894lt 2 times 119889

119896

for 119896 = 119870 + 1

(17)

Applying this final rule to the previous example producesthree groups of structural points forming clear and coherenthorizons as shown in Figure 13 Although not shown hereit has also been applied to examples with higher noise levelswith equal success

4 Conclusions

Seismic surveys play an important role in locating under-ground features but the partial reflection and natural varia-tion in acoustic properties mean that the recorded acousticsignals provide a confusing representation of the studyarea Full 3D datasets can be enormous covering severalsquare kilometres on the surface and representing severalkilometres in depth with high frequency temporal samplingThis suggests that rapid analysis is needed even if interestingexamples require further analysis or examination by expertsThis also suggests that simple techniques are preferablewith full reconstruction reserved for follow-up analysis Rawdata contain systematic distortions and significant noiseand typically they undergo substantial preprocessing beforeanalysis is started The preprocessing aims at reducing thedistortion and reducing noise but can introduce artefacts

The approach proposed here is based on simple statisticalprocedures which aim at smoothing out noise and usingcontextual information to identify coherent horizons In

particular smoothing splines are used to smooth individualtraces before localminima andmaxima are located It appearsthat greater smoothing can be applied to the traces withoutoverwhelming the major features than would be the caseif the aim was to estimate the signal Once a smaller set ofcandidate points is found they can be classified into groupswhere each group represents points along a horizon Clas-sification is performed by thresholding the signal strengthdistribution so that each mode represents a class Againsmoothing is needed to help select the correct number ofmodes and hence classes in the distributionThis time kerneldensity methods are used Here the choice of bandwidth ismuch more important as even small changes can lead to asubstantial change in the smoothed density A scale-spaceapproach is adopted which first allows all levels of smoothingto be considered The best level of smoothing and hencethe number of clusters and the number of horizons is thendetermined as the value for which the number of clusterspersists the longest The cluster associated with the randomvariation in the background is identified and excluded fromhorizon linkage Locationswithin each cluster are then linkedto form horizons In some high noise examples furtherinformation is used in this linking stage to exclude pointswhich are distant from the main cluster and hence unlikelyto be part of the horizon This does not prevent howeverhorizons from being linked across gaps and across faultsClearly it would be possible to supplement this fine tuningwith other criteria as necessary Based on the experimentsconducted here the proposed method shows great promiseIt provides a simple and quick approach to locate and linkhorizons even in relatively noisy datasets It is not affected bygaps in horizons nor by discontinuities caused by faults

Finally it is important to note that the proposed methodis not specific to seismic studies but can easily be appliedto other problems where the aim is to identify coherentstructure within 2D or 3D data Also the use of smoothingand threshold trees gives a simple graphical tool for anyclassification problem and the use of the most persistentthresholding is applicable to many other situations

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors are grateful to the three anonymous referees fortheir helpful comments which have substantially improvedthis paper

References

[1] PM Shearer Introduction to Seismology Cambridge UniversityPress Cambridge UK 1999

[2] R E Sheriff and L P Geldart Exploration Seismology Cam-bridge University Press Cambridge Mass USA 2nd edition1995

10 Advances in Statistics

[3] K H Waters Reflection Seismology John Wiley amp Sons NewYork NY USA 1978

[4] L Hatton M Worthington and J Makin Seismic Data Pro-cessing Theory and Practice Blackwell Scientific PublicationsLondon UK 1986

[5] A Ribes and F Schmitt ldquoLinear inverse problems in imagingan introductory surveyrdquo IEEE Signal Processing Magazine vol25 no 4 pp 84ndash99 2008

[6] A M Stuart ldquoInverse problems a Bayesian perspectiverdquo ActaNumerica vol 19 pp 451ndash559 2010

[7] W W Symes ldquoThe seismic reflection inverse problemrdquo InverseProblems vol 25 no 12 Article ID 123008 39 pages 2009

[8] V Lendzionowski A TWalden andR EWhite ldquoSeismic char-actermapping over reservoir intervalsrdquoGeophysical Prospectingvol 38 no 8 pp 951ndash969 1990

[9] M van der Baan and A Paul ldquoRecognition and reconstructionof coherent energy with application to deep seismic reflectiondatardquo Geophysics vol 65 no 2 pp 656ndash667 2000

[10] A T Walden ldquoSpatial clustering using simple summaries ofdata to find the edge of an oil-fieldrdquo Applied Statistics vol 43pp 385ndash395 1994

[11] D W Oldenburg T Scheuer and S Levy ldquoRecovery of theacoustic impedance from reflection seismogramsrdquo Geophysicsvol 48 no 10 pp 1318ndash1337 1983

[12] N Ricker ldquoThe form and nature of seismic waves and thestructure of seismogramsrdquoGeophysics vol 5 pp 348ndash366 1940

[13] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[14] P J Green and B W Silverman Nonparametric Regressionand Generalized Linear Models A Roughness Penalty ApproachChapman and Hall 1994

[15] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2013 httpwwwR-projectorg

[16] T Lindeberg ldquoScale-space theory a basic tool for analyzingstructures at different scalesrdquo Journal of Applied Statistics vol21 pp 225ndash270 1994

[17] A P Witkin ldquoScale-space filteringrdquo in Proceedings of 8thInternational Joint Conference on Artificial Intelligence pp 1019ndash1022 Karlsruhe Germany 1983

[18] M P Wand and M C Jones Kernel Smoothing Chapman ampHallCRC 1995

[19] B SilvermanDensity Estimation for Statistics andDataAnalysisChapman amp HallCRC 1998

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 4: Research Article Horizon Detection in Seismic Data: An Application … · 2016-08-02 · Seismic studies are a key stage in the search for large scale underground features such as

4 Advances in Statistics

Figure 4 Seismogram showing multiple traces in a syntheticdataset

the right-hand side There is also a gap in the lowest horizontowards the left-hand side

3 Method of Analysis

31 General Strategy The method proposed here takes asits starting point the seismogram Although this may havealready been preprocessed it is assumed that this involvedlittle or no smoothing Hence the method starts by consider-ing smoothing to remove the effects of noise before movingon to identify key features within the data

The overall objective of the analysis is to identify thestratigraphic structure in the seismogram that is long linearfeatures which are called horizons These represent lines ofhigh acoustic impedance contrast which are associated withimportant changes in subsurface structure In the presenceof low noise the points along a strong contrast horizoncan easily be identified as turning points in the individualtracesmdashsee Figure 4 Even in this case however as thenumber of horizons is unknown it might not be obvioushow many turning points to take Also when the contrast islower relative to the noise level identification will be furthercomplicated as turning points caused by the noise and featuremight sometimes have similar signal values

In the proposed method first each time series trace isconsidered separately and candidate horizon locations areidentified as local maxima and minima after smoothingThen these candidate points are classified into groups wheremembers of each class have similar signals Again noisemay mean that some points are incorrectly classified Theidentified classes are defined as structural points Finallyselected points in each class are linked to identify thehorizons The diagram in Figure 5 illustrates the steps of theanalysis strategy

32 Extracting Candidate Point Locations Figure 6(a) showsa single trace from themiddle of the earlier seismogramwith

local minima and maxima shown as open and filled circlesrespectivelyThe sets of the locations of the local maxima andminima are simply determined as

119871max0

= 119905119894 119910119894minus 119910119894minus1

gt 0 119910119894+1

minus 119910119894lt 0

119871min0

= 119905119894 119910119894minus 119910119894minus1

lt 0 119910119894+1

minus 119910119894gt 0

(5)

In total there are 61 turning points and although the threehorizon locations can easily be seen the remaining 58 arecaused by the noise In situations where there are highernoise levels it will not be so easy to separate the extremepoints created by horizons from those due to noise This willmean that some horizons are lost and that spurious pointscontaminate the true locations To reduce the influence ofnoise the traces are smoothed

For a given trace consider the 119899 paired data points(119905119894 119910119894) 119894 = 1 119899 related by an unknown nonlinear

relationship 119891(sdot) with additive noise that is

119910119894= 119891 (119905

119894) + 120598119894 119894 = 1 119899 (6)

where the noise 120598119894 119894 = 1 119899 are independent and

identically distributed Gaussian random variables with zeromean and constant variance 1205902

To reduce the effects of noise the unknown functioncan be estimated using smoothing splines (see eg [13 14])which minimizes the objective function

119869 (120582) =

1

119899

119899

sum

119894=1

(119910119894minus 119891(119905119894))2+ 120582int (119891

10158401015840(119905))

2

119889119905 (7)

subject to the constraint that the fitted curve is twice differ-entiable That is

119891120582(119905) = argmin

119891

1

119899

119899

sum

119894=1

(119910119894minus 119891(119905119894))2+ 120582int (119891

10158401015840(119905))

2

119889119905

(8)

In these the integral is over the interval [119905(1) 119905(119899)] that is the

minimum and maximum of the data collection times Thefirst term a residual mean-square measures the discrepancybetween the data and the fitted function and the secondan integrated squared curvature measures the roughnessof the fitted function and 120582 is a smoothing or bandwidthparameter To make the residuals zero the fitted functionmust interpolate the data In general this will lead to aquickly varying function which will have a high curvatureIn contrast to make the curvature zero the fitted func-tion must be linear which is unlikely to fit the data wellmaking the residuals large The smoothing parameter 120582controls the balance between these two measures Figures6(b) and 6(c) show the results of spline smoothing (usingthe R function smoothspline [15]) for small andmoderatevalues of 120582 respectively In each the major structure is stillevident and some undesirable points are removed Howevercare is needed to avoid oversmoothingwhichmight eliminateimportant features In the extreme however even the mainstructural features would be lost

Advances in Statistics 5

Smoothing HorizonsLink

structural pointscandidate pointsClassify the Extract

candidate pointsdataInput

Figure 5 Diagram of the data analysis strategy

(a) (b) (c)

Figure 6 A seismic trace after increasing levels of smoothing (a)low (b) moderate and (c) high smoothing Also shown are thelocations of local minima and maxima

Figures 6(b) and 6(c) also show the locations of the localturning points of the smoothed function estimates which canbe defined as

119871max120582

= 119905119894119891120582(119905119894) minus

119891120582(119905119894minus1) gt 0

119891120582(119905119894+1) minus

119891120582(119905119894) lt 0

119871min120582

= 119905119894119891120582(119905119894) minus

119891120582(119905119894minus1) lt 0

119891120582(119905119894+1) minus

119891120582(119905119894) gt 0

(9)

Here and later a scale-space approach (see eg [16 17]) willbe considered The idea is that one should not try to focuson a single smoothing level but consider several levels simul-taneously with the aim of identifying any dominant scalesFor any value of the smoothing parameter 120582 minimizing (7)will yield a fitted function 119891

120582 and hence this can be repeated

for a range of values of the smoothing parameter Figure 7shows a set of smoothed traces As the level of smoothingincreases from left to right the resulting smoothed tracesbecome smoother and smoother until they are a straight lineon the far right The best smoothed trace will be somewherewithin this set

Recall that the aim here is not to produce a ldquonicelysmoothed tracerdquo but instead to accurately locate the turningpoints which correspond to the horizons Hence the choiceof smoothing parameter focuses on this output rather thanon a more usual global goodness-of-fit measure To helpwith the analysis a novel turning-point tree is proposedwith an example shown in Figure 8 where the locations of

Figure 7 A single seismic trace after increasing levels of smoothing

all minima and maxima are shown as the smoothing levelchanges That is the tree shows a graphical representationof the sets 119871max

120582and 119871

min120582

as functions of 120582 The bottomedge of Figure 8 corresponds to no smoothing and everyturning point in the data appears As the smoothing increasesmoving upwardswithin the graph the turning point locationschange and sometimes turning points disappear Note thatas the smoothing increases pairs of neighbouring maximaand minima move towards each other before disappearingGradually the number of turning points decreases until at thetop of the graph only the three principal turning points areseen and eventually even these will disappear The horizontaldotted lines correspond to the low moderate and highsmoothing situations shown in Figure 6Overall it seems thatturning points associatedwith true horizons remain for a verywide range of levels of smoothing and hence themethod doesnot seem to be too sensitive to the exact choice

To summarize the threshold tree in Figure 8 considerthe number of turning points 119873

120582 for each value of the

smoothing parameter as shown in Figure 9(a) For smallvalues of 120582 there is little smoothing and hence there aremany turning points but as 120582 increases the number ofturning points reduces Turning points due to noise shouldbe removed quickly but true turning points due to thehorizons should remain To select an appropriate value of 120582it is proposed to use the value corresponding to the greatestchange in the number of turning points This derivative isshown in Figure 9(b) with the maximum (absolute) valueshown as a vertical line at about 120582lowast = 062 Figure 6(b)shows a smoothed trace using the corresponding smoothingparameter value Figure 10(a) shows the full dataset after

6 Advances in Statistics

Figure 8 Threshold tree showing the location of the turning points (blue minima red maxima) for varying levels of smoothing

00 02 04 06 08 10 12Smoothing

Num

ber o

f tur

ning

poi

nts

0

10

20

30

40

(a)

minus140

minus100

minus60

minus20

Der

ivat

ive

00 02 04 06 08 10 12Smoothing

(b)

Figure 9 Graphs of (a) the number of turning points as a functionof bandwidth and (b) the corresponding derivative and point ofmaximum absolute derivative

smoothing using the automatically chosen bandwidth andFigure 10(b) shows the corresponding minima and maximaas open and filled circles It is worth noting that this haslocated many more candidate points than the number of trueturning pointsThis is important as at this stage it is vital thattrue horizon locations are not lostThe next stage will identifywhich candidate points form coherent features

33 Candidate Point Classification and Structural Point Link-ing The second stage in the procedure is to classify thecandidate points into groups where members of a groupform a single horizon It is assumed that points along ahorizon will have a distinct acoustic impedance contrastgiving rise to a distinct reflectance coefficient and hence adistinct measured signal Since each horizon is expected tohave a different characteristic acoustic impedance contrastthe measured signals will be a mixture of an unknownnumber of distributions each of an unknown shape Alsothe various sources of systematic and random variability maymean that measured signals from one horizon vary and maybe similar to those from other horizons It is impossibleto obtain a parametric form for this distribution and so akernel density approach is used to provide a nonparametricsmoothed version of the signal strength distribution of thecandidate points For background to kernel density meth-ods see for example Wand and Jones [18] and Silverman[19]

Suppose that 120578 = 119873120582lowast turning points have been found

that their signal strengths are denoted by 119906119894 119894 = 1

120578 and that their locations along the traces are V119894 119894 =

1 120578 then the kernel density estimate of the signalstrength distribution is

119892120575(119906) =

120578

sum

119894=1

119870(

119906119895minus 119906

120575

) minusinfin lt 119906 lt infin (10)

Advances in Statistics 7

(a) (b)

Figure 10 Multiple smoothed traces in (a) with corresponding candidate points in (b) where open and filled circles represent minima andmaxima respectively

(a) (b) (c)

Figure 11 The curves represent kernel density with (a) low smoothing (b) moderate smoothing (c) high smoothing

where 119870(sdot) is the kernel function and 120575 is a bandwidthparameter A popular choice of kernel function is the stan-dard Gaussian density Despite the method being generallysuccessful at finding structure kernel density estimationcan also ignore significant structure via oversmoothing orretain insignificant structure via undersmoothing and hencethe correct choice of 120575 is important Figure 11 shows asimple bimodal dataset after varying levels of smoothingusing the R function density [15] We might guess that (a)is undersmoothed retaining unimportant random variationand that (c) is oversmoothed leaving little useful informationThe central graph (b) uses the default bandwidth based onSilvermanrsquos rule of thumb [19] which seems to summarize thedistribution well

The number of modes in the kernel density estimate isregarded as the number of distinct classes and hence thenumber of horizons The locations of the minima betweenthe modes can be taken as thresholds which can then be usedto classify the candidate points Considering the three levelsof smoothing used in Figure 11 would however lead to verydifferent results Rather than attempting to estimate a singlelevel of smoothing again a scale-space approach will be used(again see eg [16 17]) In this approach many levels of

smoothing are considered allowing features at different scalesto appear and allowing a natural scale to be identified

Let the set of locations of the local minima which are tobe used as threshold be defined as

120591120575= 119866

min120575

= 119906119894 119892120575(119906119894) minus 119892120575(119906119894minus1) lt 0 119892

120575(119906119894+1) minus 119892120575(119906119894gt 0)

(11)

Suppose that 119870 thresholds are found in this way hencedefining 119870 + 1 groups For a particular bandwidth 120575 let thethresholds be denoted by 120591

120575= 120591119896 119896 = 1 2 119870 which

allows the members of the classes to be defined as

119862119896

120575=

119906119894 119906119894lt 1205911 for 119896 = 1

119906119894 120591119896minus1

lt 119906119894lt 120591119896 for 2 le 119896 le 119870

119906119894 119906119894gt 120591119870 for 119896 = 119870 + 1

(12)

and the number in each class as119873119896120575for 119896 = 1 2 119870

Figure 12(a) shows the relation between the locations ofthe thresholds and the level of smoothing defined in terms

8 Advances in Statistics

000 002 004 006 008Bandwidth

minus06

minus04

minus020Thre

shol

d

00

02

04

(a)

Den

sity

0

1

2

3

4

5

Signal strengthminus08 minus04 00 02 04

(b) (c)

Figure 12 (a) Threshold tree (b) histogram of kernel density and (c) turning point locations

of the bandwidth 120575 The change in the bandwidth leads toa change in the number and value of the thresholds andhence in the number of classes As the bandwidth increasesthe number of thresholds decreases and when the curvescombine into a single line there is only a single thresholdwhich means in turn that the number of groups is twoThe procedure is stopped when the smoothed distributionbecomes unimodal In the proposed method the best clas-sification is defined to be when the number of thresholds isstable for the longest

In this example three thresholds persist the longest abandwidth interval indicated by the vertical dotted linesand hence suggests four clusters The optimum smoothingbandwidth 120575lowast is then taken as the smallest value in thisintervalThe corresponding kernel density estimate is plottedin Figure 12(b) the vertical dotted lines show the position ofthe thresholds It is now important to realise that the clustersurrounding signal strength zero includes turning pointsgenerated by the random noise and should be excluded Thiswill be considered now before moving to the final linkingstep

The observed signal strength distribution will be a mix-ture of background values and values due to true horizonsOf course the various means and the variance of thesecomponents are unknown To describe the distribution of thesignal values from the background the following approachis proposed The centre of the background signal strengthdistribution is estimated using the median of the full data

120583 = median119895

119906119895 (13)

and the variance estimate is based on the median absolutedeviation (MAD)

2=

MADΦminus1(34)

with MAD = median119895

10038161003816100381610038161003816119906119895minus 120583

10038161003816100381610038161003816 (14)

where the value 1Φminus1(34) asymp 148 is chosen so that whenthe values follow a Gaussian distribution then 119864[

2] =

1205902 These estimators are chosen to be robust to outliers

caused by the signal strengths from the horizons The solidhorizontal line near the bottom of Figure 12(b) representsa 95 confidence interval for background signal strengthThe calculation assumes a Gaussian distribution and uses theabovemean and variance estimatesThis interval encloses themajority of this zero-centred mode and confirms that it isreasonable to exclude the whole group Before moving onit is worth noting that for datasets containing no horizonsall turning points will form a single group which will beidentified as a background group and hence the procedurewould not identify any horizons

Figure 12(c) shows all the minima and maxima as openand filled circles respectively The grey circles correspond tothe background group and the black circles to the horizongroupsThe algorithm achieves good results in extracting andclassifying the structural points which make the horizonsbut notice that although the horizons can be clearly seenthere are three extra points near the top and one missingalong the central horizon Also although not visible thereare six points misclassified to the wrong group This meansthat linking would not be perfect with some crossing of theproposed horizons This suggests that further criteria mustbe included in this final linking process As a simple examplehere isolated structural point will be excluded from thelinking In particular if the distance to the nearest other pointin the group is greater than say twice the average within-group interpoint distance then is excluded

Let the ordered set of all locations along the traces ofmember of class 119896 be denoted by V

119894 1 2 119873

119896

120575lowast Then

consider the median distance between neighbouring pointsin the class

119889119896= median

119894

1003816100381610038161003816V119894+1

minus V119894

1003816100381610038161003816 (15)

and the distance from a point to the nearest other point in theclass

119898119896

119894= min119895

10038161003816100381610038161003816V119894minus V119895

10038161003816100381610038161003816 (16)

Advances in Statistics 9

Figure 13 Linked structural points with modified method

This allows the following modified rule for class membershipto be defined

119862119896

120575lowast =

V119894 V119894lt 1205911 119898119896

119894lt 2 times 119889

119896

for 119896 = 1

V119894 120591119896minus1

lt V119894lt 120591119896 119898119896

119894lt 2 times 119889

119896

for 2 le 119896 le 119870

V119894 V119894gt 120591119870 119898119896

119894lt 2 times 119889

119896

for 119896 = 119870 + 1

(17)

Applying this final rule to the previous example producesthree groups of structural points forming clear and coherenthorizons as shown in Figure 13 Although not shown hereit has also been applied to examples with higher noise levelswith equal success

4 Conclusions

Seismic surveys play an important role in locating under-ground features but the partial reflection and natural varia-tion in acoustic properties mean that the recorded acousticsignals provide a confusing representation of the studyarea Full 3D datasets can be enormous covering severalsquare kilometres on the surface and representing severalkilometres in depth with high frequency temporal samplingThis suggests that rapid analysis is needed even if interestingexamples require further analysis or examination by expertsThis also suggests that simple techniques are preferablewith full reconstruction reserved for follow-up analysis Rawdata contain systematic distortions and significant noiseand typically they undergo substantial preprocessing beforeanalysis is started The preprocessing aims at reducing thedistortion and reducing noise but can introduce artefacts

The approach proposed here is based on simple statisticalprocedures which aim at smoothing out noise and usingcontextual information to identify coherent horizons In

particular smoothing splines are used to smooth individualtraces before localminima andmaxima are located It appearsthat greater smoothing can be applied to the traces withoutoverwhelming the major features than would be the caseif the aim was to estimate the signal Once a smaller set ofcandidate points is found they can be classified into groupswhere each group represents points along a horizon Clas-sification is performed by thresholding the signal strengthdistribution so that each mode represents a class Againsmoothing is needed to help select the correct number ofmodes and hence classes in the distributionThis time kerneldensity methods are used Here the choice of bandwidth ismuch more important as even small changes can lead to asubstantial change in the smoothed density A scale-spaceapproach is adopted which first allows all levels of smoothingto be considered The best level of smoothing and hencethe number of clusters and the number of horizons is thendetermined as the value for which the number of clusterspersists the longest The cluster associated with the randomvariation in the background is identified and excluded fromhorizon linkage Locationswithin each cluster are then linkedto form horizons In some high noise examples furtherinformation is used in this linking stage to exclude pointswhich are distant from the main cluster and hence unlikelyto be part of the horizon This does not prevent howeverhorizons from being linked across gaps and across faultsClearly it would be possible to supplement this fine tuningwith other criteria as necessary Based on the experimentsconducted here the proposed method shows great promiseIt provides a simple and quick approach to locate and linkhorizons even in relatively noisy datasets It is not affected bygaps in horizons nor by discontinuities caused by faults

Finally it is important to note that the proposed methodis not specific to seismic studies but can easily be appliedto other problems where the aim is to identify coherentstructure within 2D or 3D data Also the use of smoothingand threshold trees gives a simple graphical tool for anyclassification problem and the use of the most persistentthresholding is applicable to many other situations

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors are grateful to the three anonymous referees fortheir helpful comments which have substantially improvedthis paper

References

[1] PM Shearer Introduction to Seismology Cambridge UniversityPress Cambridge UK 1999

[2] R E Sheriff and L P Geldart Exploration Seismology Cam-bridge University Press Cambridge Mass USA 2nd edition1995

10 Advances in Statistics

[3] K H Waters Reflection Seismology John Wiley amp Sons NewYork NY USA 1978

[4] L Hatton M Worthington and J Makin Seismic Data Pro-cessing Theory and Practice Blackwell Scientific PublicationsLondon UK 1986

[5] A Ribes and F Schmitt ldquoLinear inverse problems in imagingan introductory surveyrdquo IEEE Signal Processing Magazine vol25 no 4 pp 84ndash99 2008

[6] A M Stuart ldquoInverse problems a Bayesian perspectiverdquo ActaNumerica vol 19 pp 451ndash559 2010

[7] W W Symes ldquoThe seismic reflection inverse problemrdquo InverseProblems vol 25 no 12 Article ID 123008 39 pages 2009

[8] V Lendzionowski A TWalden andR EWhite ldquoSeismic char-actermapping over reservoir intervalsrdquoGeophysical Prospectingvol 38 no 8 pp 951ndash969 1990

[9] M van der Baan and A Paul ldquoRecognition and reconstructionof coherent energy with application to deep seismic reflectiondatardquo Geophysics vol 65 no 2 pp 656ndash667 2000

[10] A T Walden ldquoSpatial clustering using simple summaries ofdata to find the edge of an oil-fieldrdquo Applied Statistics vol 43pp 385ndash395 1994

[11] D W Oldenburg T Scheuer and S Levy ldquoRecovery of theacoustic impedance from reflection seismogramsrdquo Geophysicsvol 48 no 10 pp 1318ndash1337 1983

[12] N Ricker ldquoThe form and nature of seismic waves and thestructure of seismogramsrdquoGeophysics vol 5 pp 348ndash366 1940

[13] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[14] P J Green and B W Silverman Nonparametric Regressionand Generalized Linear Models A Roughness Penalty ApproachChapman and Hall 1994

[15] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2013 httpwwwR-projectorg

[16] T Lindeberg ldquoScale-space theory a basic tool for analyzingstructures at different scalesrdquo Journal of Applied Statistics vol21 pp 225ndash270 1994

[17] A P Witkin ldquoScale-space filteringrdquo in Proceedings of 8thInternational Joint Conference on Artificial Intelligence pp 1019ndash1022 Karlsruhe Germany 1983

[18] M P Wand and M C Jones Kernel Smoothing Chapman ampHallCRC 1995

[19] B SilvermanDensity Estimation for Statistics andDataAnalysisChapman amp HallCRC 1998

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 5: Research Article Horizon Detection in Seismic Data: An Application … · 2016-08-02 · Seismic studies are a key stage in the search for large scale underground features such as

Advances in Statistics 5

Smoothing HorizonsLink

structural pointscandidate pointsClassify the Extract

candidate pointsdataInput

Figure 5 Diagram of the data analysis strategy

(a) (b) (c)

Figure 6 A seismic trace after increasing levels of smoothing (a)low (b) moderate and (c) high smoothing Also shown are thelocations of local minima and maxima

Figures 6(b) and 6(c) also show the locations of the localturning points of the smoothed function estimates which canbe defined as

119871max120582

= 119905119894119891120582(119905119894) minus

119891120582(119905119894minus1) gt 0

119891120582(119905119894+1) minus

119891120582(119905119894) lt 0

119871min120582

= 119905119894119891120582(119905119894) minus

119891120582(119905119894minus1) lt 0

119891120582(119905119894+1) minus

119891120582(119905119894) gt 0

(9)

Here and later a scale-space approach (see eg [16 17]) willbe considered The idea is that one should not try to focuson a single smoothing level but consider several levels simul-taneously with the aim of identifying any dominant scalesFor any value of the smoothing parameter 120582 minimizing (7)will yield a fitted function 119891

120582 and hence this can be repeated

for a range of values of the smoothing parameter Figure 7shows a set of smoothed traces As the level of smoothingincreases from left to right the resulting smoothed tracesbecome smoother and smoother until they are a straight lineon the far right The best smoothed trace will be somewherewithin this set

Recall that the aim here is not to produce a ldquonicelysmoothed tracerdquo but instead to accurately locate the turningpoints which correspond to the horizons Hence the choiceof smoothing parameter focuses on this output rather thanon a more usual global goodness-of-fit measure To helpwith the analysis a novel turning-point tree is proposedwith an example shown in Figure 8 where the locations of

Figure 7 A single seismic trace after increasing levels of smoothing

all minima and maxima are shown as the smoothing levelchanges That is the tree shows a graphical representationof the sets 119871max

120582and 119871

min120582

as functions of 120582 The bottomedge of Figure 8 corresponds to no smoothing and everyturning point in the data appears As the smoothing increasesmoving upwardswithin the graph the turning point locationschange and sometimes turning points disappear Note thatas the smoothing increases pairs of neighbouring maximaand minima move towards each other before disappearingGradually the number of turning points decreases until at thetop of the graph only the three principal turning points areseen and eventually even these will disappear The horizontaldotted lines correspond to the low moderate and highsmoothing situations shown in Figure 6Overall it seems thatturning points associatedwith true horizons remain for a verywide range of levels of smoothing and hence themethod doesnot seem to be too sensitive to the exact choice

To summarize the threshold tree in Figure 8 considerthe number of turning points 119873

120582 for each value of the

smoothing parameter as shown in Figure 9(a) For smallvalues of 120582 there is little smoothing and hence there aremany turning points but as 120582 increases the number ofturning points reduces Turning points due to noise shouldbe removed quickly but true turning points due to thehorizons should remain To select an appropriate value of 120582it is proposed to use the value corresponding to the greatestchange in the number of turning points This derivative isshown in Figure 9(b) with the maximum (absolute) valueshown as a vertical line at about 120582lowast = 062 Figure 6(b)shows a smoothed trace using the corresponding smoothingparameter value Figure 10(a) shows the full dataset after

6 Advances in Statistics

Figure 8 Threshold tree showing the location of the turning points (blue minima red maxima) for varying levels of smoothing

00 02 04 06 08 10 12Smoothing

Num

ber o

f tur

ning

poi

nts

0

10

20

30

40

(a)

minus140

minus100

minus60

minus20

Der

ivat

ive

00 02 04 06 08 10 12Smoothing

(b)

Figure 9 Graphs of (a) the number of turning points as a functionof bandwidth and (b) the corresponding derivative and point ofmaximum absolute derivative

smoothing using the automatically chosen bandwidth andFigure 10(b) shows the corresponding minima and maximaas open and filled circles It is worth noting that this haslocated many more candidate points than the number of trueturning pointsThis is important as at this stage it is vital thattrue horizon locations are not lostThe next stage will identifywhich candidate points form coherent features

33 Candidate Point Classification and Structural Point Link-ing The second stage in the procedure is to classify thecandidate points into groups where members of a groupform a single horizon It is assumed that points along ahorizon will have a distinct acoustic impedance contrastgiving rise to a distinct reflectance coefficient and hence adistinct measured signal Since each horizon is expected tohave a different characteristic acoustic impedance contrastthe measured signals will be a mixture of an unknownnumber of distributions each of an unknown shape Alsothe various sources of systematic and random variability maymean that measured signals from one horizon vary and maybe similar to those from other horizons It is impossibleto obtain a parametric form for this distribution and so akernel density approach is used to provide a nonparametricsmoothed version of the signal strength distribution of thecandidate points For background to kernel density meth-ods see for example Wand and Jones [18] and Silverman[19]

Suppose that 120578 = 119873120582lowast turning points have been found

that their signal strengths are denoted by 119906119894 119894 = 1

120578 and that their locations along the traces are V119894 119894 =

1 120578 then the kernel density estimate of the signalstrength distribution is

119892120575(119906) =

120578

sum

119894=1

119870(

119906119895minus 119906

120575

) minusinfin lt 119906 lt infin (10)

Advances in Statistics 7

(a) (b)

Figure 10 Multiple smoothed traces in (a) with corresponding candidate points in (b) where open and filled circles represent minima andmaxima respectively

(a) (b) (c)

Figure 11 The curves represent kernel density with (a) low smoothing (b) moderate smoothing (c) high smoothing

where 119870(sdot) is the kernel function and 120575 is a bandwidthparameter A popular choice of kernel function is the stan-dard Gaussian density Despite the method being generallysuccessful at finding structure kernel density estimationcan also ignore significant structure via oversmoothing orretain insignificant structure via undersmoothing and hencethe correct choice of 120575 is important Figure 11 shows asimple bimodal dataset after varying levels of smoothingusing the R function density [15] We might guess that (a)is undersmoothed retaining unimportant random variationand that (c) is oversmoothed leaving little useful informationThe central graph (b) uses the default bandwidth based onSilvermanrsquos rule of thumb [19] which seems to summarize thedistribution well

The number of modes in the kernel density estimate isregarded as the number of distinct classes and hence thenumber of horizons The locations of the minima betweenthe modes can be taken as thresholds which can then be usedto classify the candidate points Considering the three levelsof smoothing used in Figure 11 would however lead to verydifferent results Rather than attempting to estimate a singlelevel of smoothing again a scale-space approach will be used(again see eg [16 17]) In this approach many levels of

smoothing are considered allowing features at different scalesto appear and allowing a natural scale to be identified

Let the set of locations of the local minima which are tobe used as threshold be defined as

120591120575= 119866

min120575

= 119906119894 119892120575(119906119894) minus 119892120575(119906119894minus1) lt 0 119892

120575(119906119894+1) minus 119892120575(119906119894gt 0)

(11)

Suppose that 119870 thresholds are found in this way hencedefining 119870 + 1 groups For a particular bandwidth 120575 let thethresholds be denoted by 120591

120575= 120591119896 119896 = 1 2 119870 which

allows the members of the classes to be defined as

119862119896

120575=

119906119894 119906119894lt 1205911 for 119896 = 1

119906119894 120591119896minus1

lt 119906119894lt 120591119896 for 2 le 119896 le 119870

119906119894 119906119894gt 120591119870 for 119896 = 119870 + 1

(12)

and the number in each class as119873119896120575for 119896 = 1 2 119870

Figure 12(a) shows the relation between the locations ofthe thresholds and the level of smoothing defined in terms

8 Advances in Statistics

000 002 004 006 008Bandwidth

minus06

minus04

minus020Thre

shol

d

00

02

04

(a)

Den

sity

0

1

2

3

4

5

Signal strengthminus08 minus04 00 02 04

(b) (c)

Figure 12 (a) Threshold tree (b) histogram of kernel density and (c) turning point locations

of the bandwidth 120575 The change in the bandwidth leads toa change in the number and value of the thresholds andhence in the number of classes As the bandwidth increasesthe number of thresholds decreases and when the curvescombine into a single line there is only a single thresholdwhich means in turn that the number of groups is twoThe procedure is stopped when the smoothed distributionbecomes unimodal In the proposed method the best clas-sification is defined to be when the number of thresholds isstable for the longest

In this example three thresholds persist the longest abandwidth interval indicated by the vertical dotted linesand hence suggests four clusters The optimum smoothingbandwidth 120575lowast is then taken as the smallest value in thisintervalThe corresponding kernel density estimate is plottedin Figure 12(b) the vertical dotted lines show the position ofthe thresholds It is now important to realise that the clustersurrounding signal strength zero includes turning pointsgenerated by the random noise and should be excluded Thiswill be considered now before moving to the final linkingstep

The observed signal strength distribution will be a mix-ture of background values and values due to true horizonsOf course the various means and the variance of thesecomponents are unknown To describe the distribution of thesignal values from the background the following approachis proposed The centre of the background signal strengthdistribution is estimated using the median of the full data

120583 = median119895

119906119895 (13)

and the variance estimate is based on the median absolutedeviation (MAD)

2=

MADΦminus1(34)

with MAD = median119895

10038161003816100381610038161003816119906119895minus 120583

10038161003816100381610038161003816 (14)

where the value 1Φminus1(34) asymp 148 is chosen so that whenthe values follow a Gaussian distribution then 119864[

2] =

1205902 These estimators are chosen to be robust to outliers

caused by the signal strengths from the horizons The solidhorizontal line near the bottom of Figure 12(b) representsa 95 confidence interval for background signal strengthThe calculation assumes a Gaussian distribution and uses theabovemean and variance estimatesThis interval encloses themajority of this zero-centred mode and confirms that it isreasonable to exclude the whole group Before moving onit is worth noting that for datasets containing no horizonsall turning points will form a single group which will beidentified as a background group and hence the procedurewould not identify any horizons

Figure 12(c) shows all the minima and maxima as openand filled circles respectively The grey circles correspond tothe background group and the black circles to the horizongroupsThe algorithm achieves good results in extracting andclassifying the structural points which make the horizonsbut notice that although the horizons can be clearly seenthere are three extra points near the top and one missingalong the central horizon Also although not visible thereare six points misclassified to the wrong group This meansthat linking would not be perfect with some crossing of theproposed horizons This suggests that further criteria mustbe included in this final linking process As a simple examplehere isolated structural point will be excluded from thelinking In particular if the distance to the nearest other pointin the group is greater than say twice the average within-group interpoint distance then is excluded

Let the ordered set of all locations along the traces ofmember of class 119896 be denoted by V

119894 1 2 119873

119896

120575lowast Then

consider the median distance between neighbouring pointsin the class

119889119896= median

119894

1003816100381610038161003816V119894+1

minus V119894

1003816100381610038161003816 (15)

and the distance from a point to the nearest other point in theclass

119898119896

119894= min119895

10038161003816100381610038161003816V119894minus V119895

10038161003816100381610038161003816 (16)

Advances in Statistics 9

Figure 13 Linked structural points with modified method

This allows the following modified rule for class membershipto be defined

119862119896

120575lowast =

V119894 V119894lt 1205911 119898119896

119894lt 2 times 119889

119896

for 119896 = 1

V119894 120591119896minus1

lt V119894lt 120591119896 119898119896

119894lt 2 times 119889

119896

for 2 le 119896 le 119870

V119894 V119894gt 120591119870 119898119896

119894lt 2 times 119889

119896

for 119896 = 119870 + 1

(17)

Applying this final rule to the previous example producesthree groups of structural points forming clear and coherenthorizons as shown in Figure 13 Although not shown hereit has also been applied to examples with higher noise levelswith equal success

4 Conclusions

Seismic surveys play an important role in locating under-ground features but the partial reflection and natural varia-tion in acoustic properties mean that the recorded acousticsignals provide a confusing representation of the studyarea Full 3D datasets can be enormous covering severalsquare kilometres on the surface and representing severalkilometres in depth with high frequency temporal samplingThis suggests that rapid analysis is needed even if interestingexamples require further analysis or examination by expertsThis also suggests that simple techniques are preferablewith full reconstruction reserved for follow-up analysis Rawdata contain systematic distortions and significant noiseand typically they undergo substantial preprocessing beforeanalysis is started The preprocessing aims at reducing thedistortion and reducing noise but can introduce artefacts

The approach proposed here is based on simple statisticalprocedures which aim at smoothing out noise and usingcontextual information to identify coherent horizons In

particular smoothing splines are used to smooth individualtraces before localminima andmaxima are located It appearsthat greater smoothing can be applied to the traces withoutoverwhelming the major features than would be the caseif the aim was to estimate the signal Once a smaller set ofcandidate points is found they can be classified into groupswhere each group represents points along a horizon Clas-sification is performed by thresholding the signal strengthdistribution so that each mode represents a class Againsmoothing is needed to help select the correct number ofmodes and hence classes in the distributionThis time kerneldensity methods are used Here the choice of bandwidth ismuch more important as even small changes can lead to asubstantial change in the smoothed density A scale-spaceapproach is adopted which first allows all levels of smoothingto be considered The best level of smoothing and hencethe number of clusters and the number of horizons is thendetermined as the value for which the number of clusterspersists the longest The cluster associated with the randomvariation in the background is identified and excluded fromhorizon linkage Locationswithin each cluster are then linkedto form horizons In some high noise examples furtherinformation is used in this linking stage to exclude pointswhich are distant from the main cluster and hence unlikelyto be part of the horizon This does not prevent howeverhorizons from being linked across gaps and across faultsClearly it would be possible to supplement this fine tuningwith other criteria as necessary Based on the experimentsconducted here the proposed method shows great promiseIt provides a simple and quick approach to locate and linkhorizons even in relatively noisy datasets It is not affected bygaps in horizons nor by discontinuities caused by faults

Finally it is important to note that the proposed methodis not specific to seismic studies but can easily be appliedto other problems where the aim is to identify coherentstructure within 2D or 3D data Also the use of smoothingand threshold trees gives a simple graphical tool for anyclassification problem and the use of the most persistentthresholding is applicable to many other situations

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors are grateful to the three anonymous referees fortheir helpful comments which have substantially improvedthis paper

References

[1] PM Shearer Introduction to Seismology Cambridge UniversityPress Cambridge UK 1999

[2] R E Sheriff and L P Geldart Exploration Seismology Cam-bridge University Press Cambridge Mass USA 2nd edition1995

10 Advances in Statistics

[3] K H Waters Reflection Seismology John Wiley amp Sons NewYork NY USA 1978

[4] L Hatton M Worthington and J Makin Seismic Data Pro-cessing Theory and Practice Blackwell Scientific PublicationsLondon UK 1986

[5] A Ribes and F Schmitt ldquoLinear inverse problems in imagingan introductory surveyrdquo IEEE Signal Processing Magazine vol25 no 4 pp 84ndash99 2008

[6] A M Stuart ldquoInverse problems a Bayesian perspectiverdquo ActaNumerica vol 19 pp 451ndash559 2010

[7] W W Symes ldquoThe seismic reflection inverse problemrdquo InverseProblems vol 25 no 12 Article ID 123008 39 pages 2009

[8] V Lendzionowski A TWalden andR EWhite ldquoSeismic char-actermapping over reservoir intervalsrdquoGeophysical Prospectingvol 38 no 8 pp 951ndash969 1990

[9] M van der Baan and A Paul ldquoRecognition and reconstructionof coherent energy with application to deep seismic reflectiondatardquo Geophysics vol 65 no 2 pp 656ndash667 2000

[10] A T Walden ldquoSpatial clustering using simple summaries ofdata to find the edge of an oil-fieldrdquo Applied Statistics vol 43pp 385ndash395 1994

[11] D W Oldenburg T Scheuer and S Levy ldquoRecovery of theacoustic impedance from reflection seismogramsrdquo Geophysicsvol 48 no 10 pp 1318ndash1337 1983

[12] N Ricker ldquoThe form and nature of seismic waves and thestructure of seismogramsrdquoGeophysics vol 5 pp 348ndash366 1940

[13] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[14] P J Green and B W Silverman Nonparametric Regressionand Generalized Linear Models A Roughness Penalty ApproachChapman and Hall 1994

[15] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2013 httpwwwR-projectorg

[16] T Lindeberg ldquoScale-space theory a basic tool for analyzingstructures at different scalesrdquo Journal of Applied Statistics vol21 pp 225ndash270 1994

[17] A P Witkin ldquoScale-space filteringrdquo in Proceedings of 8thInternational Joint Conference on Artificial Intelligence pp 1019ndash1022 Karlsruhe Germany 1983

[18] M P Wand and M C Jones Kernel Smoothing Chapman ampHallCRC 1995

[19] B SilvermanDensity Estimation for Statistics andDataAnalysisChapman amp HallCRC 1998

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 6: Research Article Horizon Detection in Seismic Data: An Application … · 2016-08-02 · Seismic studies are a key stage in the search for large scale underground features such as

6 Advances in Statistics

Figure 8 Threshold tree showing the location of the turning points (blue minima red maxima) for varying levels of smoothing

00 02 04 06 08 10 12Smoothing

Num

ber o

f tur

ning

poi

nts

0

10

20

30

40

(a)

minus140

minus100

minus60

minus20

Der

ivat

ive

00 02 04 06 08 10 12Smoothing

(b)

Figure 9 Graphs of (a) the number of turning points as a functionof bandwidth and (b) the corresponding derivative and point ofmaximum absolute derivative

smoothing using the automatically chosen bandwidth andFigure 10(b) shows the corresponding minima and maximaas open and filled circles It is worth noting that this haslocated many more candidate points than the number of trueturning pointsThis is important as at this stage it is vital thattrue horizon locations are not lostThe next stage will identifywhich candidate points form coherent features

33 Candidate Point Classification and Structural Point Link-ing The second stage in the procedure is to classify thecandidate points into groups where members of a groupform a single horizon It is assumed that points along ahorizon will have a distinct acoustic impedance contrastgiving rise to a distinct reflectance coefficient and hence adistinct measured signal Since each horizon is expected tohave a different characteristic acoustic impedance contrastthe measured signals will be a mixture of an unknownnumber of distributions each of an unknown shape Alsothe various sources of systematic and random variability maymean that measured signals from one horizon vary and maybe similar to those from other horizons It is impossibleto obtain a parametric form for this distribution and so akernel density approach is used to provide a nonparametricsmoothed version of the signal strength distribution of thecandidate points For background to kernel density meth-ods see for example Wand and Jones [18] and Silverman[19]

Suppose that 120578 = 119873120582lowast turning points have been found

that their signal strengths are denoted by 119906119894 119894 = 1

120578 and that their locations along the traces are V119894 119894 =

1 120578 then the kernel density estimate of the signalstrength distribution is

119892120575(119906) =

120578

sum

119894=1

119870(

119906119895minus 119906

120575

) minusinfin lt 119906 lt infin (10)

Advances in Statistics 7

(a) (b)

Figure 10 Multiple smoothed traces in (a) with corresponding candidate points in (b) where open and filled circles represent minima andmaxima respectively

(a) (b) (c)

Figure 11 The curves represent kernel density with (a) low smoothing (b) moderate smoothing (c) high smoothing

where 119870(sdot) is the kernel function and 120575 is a bandwidthparameter A popular choice of kernel function is the stan-dard Gaussian density Despite the method being generallysuccessful at finding structure kernel density estimationcan also ignore significant structure via oversmoothing orretain insignificant structure via undersmoothing and hencethe correct choice of 120575 is important Figure 11 shows asimple bimodal dataset after varying levels of smoothingusing the R function density [15] We might guess that (a)is undersmoothed retaining unimportant random variationand that (c) is oversmoothed leaving little useful informationThe central graph (b) uses the default bandwidth based onSilvermanrsquos rule of thumb [19] which seems to summarize thedistribution well

The number of modes in the kernel density estimate isregarded as the number of distinct classes and hence thenumber of horizons The locations of the minima betweenthe modes can be taken as thresholds which can then be usedto classify the candidate points Considering the three levelsof smoothing used in Figure 11 would however lead to verydifferent results Rather than attempting to estimate a singlelevel of smoothing again a scale-space approach will be used(again see eg [16 17]) In this approach many levels of

smoothing are considered allowing features at different scalesto appear and allowing a natural scale to be identified

Let the set of locations of the local minima which are tobe used as threshold be defined as

120591120575= 119866

min120575

= 119906119894 119892120575(119906119894) minus 119892120575(119906119894minus1) lt 0 119892

120575(119906119894+1) minus 119892120575(119906119894gt 0)

(11)

Suppose that 119870 thresholds are found in this way hencedefining 119870 + 1 groups For a particular bandwidth 120575 let thethresholds be denoted by 120591

120575= 120591119896 119896 = 1 2 119870 which

allows the members of the classes to be defined as

119862119896

120575=

119906119894 119906119894lt 1205911 for 119896 = 1

119906119894 120591119896minus1

lt 119906119894lt 120591119896 for 2 le 119896 le 119870

119906119894 119906119894gt 120591119870 for 119896 = 119870 + 1

(12)

and the number in each class as119873119896120575for 119896 = 1 2 119870

Figure 12(a) shows the relation between the locations ofthe thresholds and the level of smoothing defined in terms

8 Advances in Statistics

000 002 004 006 008Bandwidth

minus06

minus04

minus020Thre

shol

d

00

02

04

(a)

Den

sity

0

1

2

3

4

5

Signal strengthminus08 minus04 00 02 04

(b) (c)

Figure 12 (a) Threshold tree (b) histogram of kernel density and (c) turning point locations

of the bandwidth 120575 The change in the bandwidth leads toa change in the number and value of the thresholds andhence in the number of classes As the bandwidth increasesthe number of thresholds decreases and when the curvescombine into a single line there is only a single thresholdwhich means in turn that the number of groups is twoThe procedure is stopped when the smoothed distributionbecomes unimodal In the proposed method the best clas-sification is defined to be when the number of thresholds isstable for the longest

In this example three thresholds persist the longest abandwidth interval indicated by the vertical dotted linesand hence suggests four clusters The optimum smoothingbandwidth 120575lowast is then taken as the smallest value in thisintervalThe corresponding kernel density estimate is plottedin Figure 12(b) the vertical dotted lines show the position ofthe thresholds It is now important to realise that the clustersurrounding signal strength zero includes turning pointsgenerated by the random noise and should be excluded Thiswill be considered now before moving to the final linkingstep

The observed signal strength distribution will be a mix-ture of background values and values due to true horizonsOf course the various means and the variance of thesecomponents are unknown To describe the distribution of thesignal values from the background the following approachis proposed The centre of the background signal strengthdistribution is estimated using the median of the full data

120583 = median119895

119906119895 (13)

and the variance estimate is based on the median absolutedeviation (MAD)

2=

MADΦminus1(34)

with MAD = median119895

10038161003816100381610038161003816119906119895minus 120583

10038161003816100381610038161003816 (14)

where the value 1Φminus1(34) asymp 148 is chosen so that whenthe values follow a Gaussian distribution then 119864[

2] =

1205902 These estimators are chosen to be robust to outliers

caused by the signal strengths from the horizons The solidhorizontal line near the bottom of Figure 12(b) representsa 95 confidence interval for background signal strengthThe calculation assumes a Gaussian distribution and uses theabovemean and variance estimatesThis interval encloses themajority of this zero-centred mode and confirms that it isreasonable to exclude the whole group Before moving onit is worth noting that for datasets containing no horizonsall turning points will form a single group which will beidentified as a background group and hence the procedurewould not identify any horizons

Figure 12(c) shows all the minima and maxima as openand filled circles respectively The grey circles correspond tothe background group and the black circles to the horizongroupsThe algorithm achieves good results in extracting andclassifying the structural points which make the horizonsbut notice that although the horizons can be clearly seenthere are three extra points near the top and one missingalong the central horizon Also although not visible thereare six points misclassified to the wrong group This meansthat linking would not be perfect with some crossing of theproposed horizons This suggests that further criteria mustbe included in this final linking process As a simple examplehere isolated structural point will be excluded from thelinking In particular if the distance to the nearest other pointin the group is greater than say twice the average within-group interpoint distance then is excluded

Let the ordered set of all locations along the traces ofmember of class 119896 be denoted by V

119894 1 2 119873

119896

120575lowast Then

consider the median distance between neighbouring pointsin the class

119889119896= median

119894

1003816100381610038161003816V119894+1

minus V119894

1003816100381610038161003816 (15)

and the distance from a point to the nearest other point in theclass

119898119896

119894= min119895

10038161003816100381610038161003816V119894minus V119895

10038161003816100381610038161003816 (16)

Advances in Statistics 9

Figure 13 Linked structural points with modified method

This allows the following modified rule for class membershipto be defined

119862119896

120575lowast =

V119894 V119894lt 1205911 119898119896

119894lt 2 times 119889

119896

for 119896 = 1

V119894 120591119896minus1

lt V119894lt 120591119896 119898119896

119894lt 2 times 119889

119896

for 2 le 119896 le 119870

V119894 V119894gt 120591119870 119898119896

119894lt 2 times 119889

119896

for 119896 = 119870 + 1

(17)

Applying this final rule to the previous example producesthree groups of structural points forming clear and coherenthorizons as shown in Figure 13 Although not shown hereit has also been applied to examples with higher noise levelswith equal success

4 Conclusions

Seismic surveys play an important role in locating under-ground features but the partial reflection and natural varia-tion in acoustic properties mean that the recorded acousticsignals provide a confusing representation of the studyarea Full 3D datasets can be enormous covering severalsquare kilometres on the surface and representing severalkilometres in depth with high frequency temporal samplingThis suggests that rapid analysis is needed even if interestingexamples require further analysis or examination by expertsThis also suggests that simple techniques are preferablewith full reconstruction reserved for follow-up analysis Rawdata contain systematic distortions and significant noiseand typically they undergo substantial preprocessing beforeanalysis is started The preprocessing aims at reducing thedistortion and reducing noise but can introduce artefacts

The approach proposed here is based on simple statisticalprocedures which aim at smoothing out noise and usingcontextual information to identify coherent horizons In

particular smoothing splines are used to smooth individualtraces before localminima andmaxima are located It appearsthat greater smoothing can be applied to the traces withoutoverwhelming the major features than would be the caseif the aim was to estimate the signal Once a smaller set ofcandidate points is found they can be classified into groupswhere each group represents points along a horizon Clas-sification is performed by thresholding the signal strengthdistribution so that each mode represents a class Againsmoothing is needed to help select the correct number ofmodes and hence classes in the distributionThis time kerneldensity methods are used Here the choice of bandwidth ismuch more important as even small changes can lead to asubstantial change in the smoothed density A scale-spaceapproach is adopted which first allows all levels of smoothingto be considered The best level of smoothing and hencethe number of clusters and the number of horizons is thendetermined as the value for which the number of clusterspersists the longest The cluster associated with the randomvariation in the background is identified and excluded fromhorizon linkage Locationswithin each cluster are then linkedto form horizons In some high noise examples furtherinformation is used in this linking stage to exclude pointswhich are distant from the main cluster and hence unlikelyto be part of the horizon This does not prevent howeverhorizons from being linked across gaps and across faultsClearly it would be possible to supplement this fine tuningwith other criteria as necessary Based on the experimentsconducted here the proposed method shows great promiseIt provides a simple and quick approach to locate and linkhorizons even in relatively noisy datasets It is not affected bygaps in horizons nor by discontinuities caused by faults

Finally it is important to note that the proposed methodis not specific to seismic studies but can easily be appliedto other problems where the aim is to identify coherentstructure within 2D or 3D data Also the use of smoothingand threshold trees gives a simple graphical tool for anyclassification problem and the use of the most persistentthresholding is applicable to many other situations

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors are grateful to the three anonymous referees fortheir helpful comments which have substantially improvedthis paper

References

[1] PM Shearer Introduction to Seismology Cambridge UniversityPress Cambridge UK 1999

[2] R E Sheriff and L P Geldart Exploration Seismology Cam-bridge University Press Cambridge Mass USA 2nd edition1995

10 Advances in Statistics

[3] K H Waters Reflection Seismology John Wiley amp Sons NewYork NY USA 1978

[4] L Hatton M Worthington and J Makin Seismic Data Pro-cessing Theory and Practice Blackwell Scientific PublicationsLondon UK 1986

[5] A Ribes and F Schmitt ldquoLinear inverse problems in imagingan introductory surveyrdquo IEEE Signal Processing Magazine vol25 no 4 pp 84ndash99 2008

[6] A M Stuart ldquoInverse problems a Bayesian perspectiverdquo ActaNumerica vol 19 pp 451ndash559 2010

[7] W W Symes ldquoThe seismic reflection inverse problemrdquo InverseProblems vol 25 no 12 Article ID 123008 39 pages 2009

[8] V Lendzionowski A TWalden andR EWhite ldquoSeismic char-actermapping over reservoir intervalsrdquoGeophysical Prospectingvol 38 no 8 pp 951ndash969 1990

[9] M van der Baan and A Paul ldquoRecognition and reconstructionof coherent energy with application to deep seismic reflectiondatardquo Geophysics vol 65 no 2 pp 656ndash667 2000

[10] A T Walden ldquoSpatial clustering using simple summaries ofdata to find the edge of an oil-fieldrdquo Applied Statistics vol 43pp 385ndash395 1994

[11] D W Oldenburg T Scheuer and S Levy ldquoRecovery of theacoustic impedance from reflection seismogramsrdquo Geophysicsvol 48 no 10 pp 1318ndash1337 1983

[12] N Ricker ldquoThe form and nature of seismic waves and thestructure of seismogramsrdquoGeophysics vol 5 pp 348ndash366 1940

[13] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[14] P J Green and B W Silverman Nonparametric Regressionand Generalized Linear Models A Roughness Penalty ApproachChapman and Hall 1994

[15] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2013 httpwwwR-projectorg

[16] T Lindeberg ldquoScale-space theory a basic tool for analyzingstructures at different scalesrdquo Journal of Applied Statistics vol21 pp 225ndash270 1994

[17] A P Witkin ldquoScale-space filteringrdquo in Proceedings of 8thInternational Joint Conference on Artificial Intelligence pp 1019ndash1022 Karlsruhe Germany 1983

[18] M P Wand and M C Jones Kernel Smoothing Chapman ampHallCRC 1995

[19] B SilvermanDensity Estimation for Statistics andDataAnalysisChapman amp HallCRC 1998

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 7: Research Article Horizon Detection in Seismic Data: An Application … · 2016-08-02 · Seismic studies are a key stage in the search for large scale underground features such as

Advances in Statistics 7

(a) (b)

Figure 10 Multiple smoothed traces in (a) with corresponding candidate points in (b) where open and filled circles represent minima andmaxima respectively

(a) (b) (c)

Figure 11 The curves represent kernel density with (a) low smoothing (b) moderate smoothing (c) high smoothing

where 119870(sdot) is the kernel function and 120575 is a bandwidthparameter A popular choice of kernel function is the stan-dard Gaussian density Despite the method being generallysuccessful at finding structure kernel density estimationcan also ignore significant structure via oversmoothing orretain insignificant structure via undersmoothing and hencethe correct choice of 120575 is important Figure 11 shows asimple bimodal dataset after varying levels of smoothingusing the R function density [15] We might guess that (a)is undersmoothed retaining unimportant random variationand that (c) is oversmoothed leaving little useful informationThe central graph (b) uses the default bandwidth based onSilvermanrsquos rule of thumb [19] which seems to summarize thedistribution well

The number of modes in the kernel density estimate isregarded as the number of distinct classes and hence thenumber of horizons The locations of the minima betweenthe modes can be taken as thresholds which can then be usedto classify the candidate points Considering the three levelsof smoothing used in Figure 11 would however lead to verydifferent results Rather than attempting to estimate a singlelevel of smoothing again a scale-space approach will be used(again see eg [16 17]) In this approach many levels of

smoothing are considered allowing features at different scalesto appear and allowing a natural scale to be identified

Let the set of locations of the local minima which are tobe used as threshold be defined as

120591120575= 119866

min120575

= 119906119894 119892120575(119906119894) minus 119892120575(119906119894minus1) lt 0 119892

120575(119906119894+1) minus 119892120575(119906119894gt 0)

(11)

Suppose that 119870 thresholds are found in this way hencedefining 119870 + 1 groups For a particular bandwidth 120575 let thethresholds be denoted by 120591

120575= 120591119896 119896 = 1 2 119870 which

allows the members of the classes to be defined as

119862119896

120575=

119906119894 119906119894lt 1205911 for 119896 = 1

119906119894 120591119896minus1

lt 119906119894lt 120591119896 for 2 le 119896 le 119870

119906119894 119906119894gt 120591119870 for 119896 = 119870 + 1

(12)

and the number in each class as119873119896120575for 119896 = 1 2 119870

Figure 12(a) shows the relation between the locations ofthe thresholds and the level of smoothing defined in terms

8 Advances in Statistics

000 002 004 006 008Bandwidth

minus06

minus04

minus020Thre

shol

d

00

02

04

(a)

Den

sity

0

1

2

3

4

5

Signal strengthminus08 minus04 00 02 04

(b) (c)

Figure 12 (a) Threshold tree (b) histogram of kernel density and (c) turning point locations

of the bandwidth 120575 The change in the bandwidth leads toa change in the number and value of the thresholds andhence in the number of classes As the bandwidth increasesthe number of thresholds decreases and when the curvescombine into a single line there is only a single thresholdwhich means in turn that the number of groups is twoThe procedure is stopped when the smoothed distributionbecomes unimodal In the proposed method the best clas-sification is defined to be when the number of thresholds isstable for the longest

In this example three thresholds persist the longest abandwidth interval indicated by the vertical dotted linesand hence suggests four clusters The optimum smoothingbandwidth 120575lowast is then taken as the smallest value in thisintervalThe corresponding kernel density estimate is plottedin Figure 12(b) the vertical dotted lines show the position ofthe thresholds It is now important to realise that the clustersurrounding signal strength zero includes turning pointsgenerated by the random noise and should be excluded Thiswill be considered now before moving to the final linkingstep

The observed signal strength distribution will be a mix-ture of background values and values due to true horizonsOf course the various means and the variance of thesecomponents are unknown To describe the distribution of thesignal values from the background the following approachis proposed The centre of the background signal strengthdistribution is estimated using the median of the full data

120583 = median119895

119906119895 (13)

and the variance estimate is based on the median absolutedeviation (MAD)

2=

MADΦminus1(34)

with MAD = median119895

10038161003816100381610038161003816119906119895minus 120583

10038161003816100381610038161003816 (14)

where the value 1Φminus1(34) asymp 148 is chosen so that whenthe values follow a Gaussian distribution then 119864[

2] =

1205902 These estimators are chosen to be robust to outliers

caused by the signal strengths from the horizons The solidhorizontal line near the bottom of Figure 12(b) representsa 95 confidence interval for background signal strengthThe calculation assumes a Gaussian distribution and uses theabovemean and variance estimatesThis interval encloses themajority of this zero-centred mode and confirms that it isreasonable to exclude the whole group Before moving onit is worth noting that for datasets containing no horizonsall turning points will form a single group which will beidentified as a background group and hence the procedurewould not identify any horizons

Figure 12(c) shows all the minima and maxima as openand filled circles respectively The grey circles correspond tothe background group and the black circles to the horizongroupsThe algorithm achieves good results in extracting andclassifying the structural points which make the horizonsbut notice that although the horizons can be clearly seenthere are three extra points near the top and one missingalong the central horizon Also although not visible thereare six points misclassified to the wrong group This meansthat linking would not be perfect with some crossing of theproposed horizons This suggests that further criteria mustbe included in this final linking process As a simple examplehere isolated structural point will be excluded from thelinking In particular if the distance to the nearest other pointin the group is greater than say twice the average within-group interpoint distance then is excluded

Let the ordered set of all locations along the traces ofmember of class 119896 be denoted by V

119894 1 2 119873

119896

120575lowast Then

consider the median distance between neighbouring pointsin the class

119889119896= median

119894

1003816100381610038161003816V119894+1

minus V119894

1003816100381610038161003816 (15)

and the distance from a point to the nearest other point in theclass

119898119896

119894= min119895

10038161003816100381610038161003816V119894minus V119895

10038161003816100381610038161003816 (16)

Advances in Statistics 9

Figure 13 Linked structural points with modified method

This allows the following modified rule for class membershipto be defined

119862119896

120575lowast =

V119894 V119894lt 1205911 119898119896

119894lt 2 times 119889

119896

for 119896 = 1

V119894 120591119896minus1

lt V119894lt 120591119896 119898119896

119894lt 2 times 119889

119896

for 2 le 119896 le 119870

V119894 V119894gt 120591119870 119898119896

119894lt 2 times 119889

119896

for 119896 = 119870 + 1

(17)

Applying this final rule to the previous example producesthree groups of structural points forming clear and coherenthorizons as shown in Figure 13 Although not shown hereit has also been applied to examples with higher noise levelswith equal success

4 Conclusions

Seismic surveys play an important role in locating under-ground features but the partial reflection and natural varia-tion in acoustic properties mean that the recorded acousticsignals provide a confusing representation of the studyarea Full 3D datasets can be enormous covering severalsquare kilometres on the surface and representing severalkilometres in depth with high frequency temporal samplingThis suggests that rapid analysis is needed even if interestingexamples require further analysis or examination by expertsThis also suggests that simple techniques are preferablewith full reconstruction reserved for follow-up analysis Rawdata contain systematic distortions and significant noiseand typically they undergo substantial preprocessing beforeanalysis is started The preprocessing aims at reducing thedistortion and reducing noise but can introduce artefacts

The approach proposed here is based on simple statisticalprocedures which aim at smoothing out noise and usingcontextual information to identify coherent horizons In

particular smoothing splines are used to smooth individualtraces before localminima andmaxima are located It appearsthat greater smoothing can be applied to the traces withoutoverwhelming the major features than would be the caseif the aim was to estimate the signal Once a smaller set ofcandidate points is found they can be classified into groupswhere each group represents points along a horizon Clas-sification is performed by thresholding the signal strengthdistribution so that each mode represents a class Againsmoothing is needed to help select the correct number ofmodes and hence classes in the distributionThis time kerneldensity methods are used Here the choice of bandwidth ismuch more important as even small changes can lead to asubstantial change in the smoothed density A scale-spaceapproach is adopted which first allows all levels of smoothingto be considered The best level of smoothing and hencethe number of clusters and the number of horizons is thendetermined as the value for which the number of clusterspersists the longest The cluster associated with the randomvariation in the background is identified and excluded fromhorizon linkage Locationswithin each cluster are then linkedto form horizons In some high noise examples furtherinformation is used in this linking stage to exclude pointswhich are distant from the main cluster and hence unlikelyto be part of the horizon This does not prevent howeverhorizons from being linked across gaps and across faultsClearly it would be possible to supplement this fine tuningwith other criteria as necessary Based on the experimentsconducted here the proposed method shows great promiseIt provides a simple and quick approach to locate and linkhorizons even in relatively noisy datasets It is not affected bygaps in horizons nor by discontinuities caused by faults

Finally it is important to note that the proposed methodis not specific to seismic studies but can easily be appliedto other problems where the aim is to identify coherentstructure within 2D or 3D data Also the use of smoothingand threshold trees gives a simple graphical tool for anyclassification problem and the use of the most persistentthresholding is applicable to many other situations

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors are grateful to the three anonymous referees fortheir helpful comments which have substantially improvedthis paper

References

[1] PM Shearer Introduction to Seismology Cambridge UniversityPress Cambridge UK 1999

[2] R E Sheriff and L P Geldart Exploration Seismology Cam-bridge University Press Cambridge Mass USA 2nd edition1995

10 Advances in Statistics

[3] K H Waters Reflection Seismology John Wiley amp Sons NewYork NY USA 1978

[4] L Hatton M Worthington and J Makin Seismic Data Pro-cessing Theory and Practice Blackwell Scientific PublicationsLondon UK 1986

[5] A Ribes and F Schmitt ldquoLinear inverse problems in imagingan introductory surveyrdquo IEEE Signal Processing Magazine vol25 no 4 pp 84ndash99 2008

[6] A M Stuart ldquoInverse problems a Bayesian perspectiverdquo ActaNumerica vol 19 pp 451ndash559 2010

[7] W W Symes ldquoThe seismic reflection inverse problemrdquo InverseProblems vol 25 no 12 Article ID 123008 39 pages 2009

[8] V Lendzionowski A TWalden andR EWhite ldquoSeismic char-actermapping over reservoir intervalsrdquoGeophysical Prospectingvol 38 no 8 pp 951ndash969 1990

[9] M van der Baan and A Paul ldquoRecognition and reconstructionof coherent energy with application to deep seismic reflectiondatardquo Geophysics vol 65 no 2 pp 656ndash667 2000

[10] A T Walden ldquoSpatial clustering using simple summaries ofdata to find the edge of an oil-fieldrdquo Applied Statistics vol 43pp 385ndash395 1994

[11] D W Oldenburg T Scheuer and S Levy ldquoRecovery of theacoustic impedance from reflection seismogramsrdquo Geophysicsvol 48 no 10 pp 1318ndash1337 1983

[12] N Ricker ldquoThe form and nature of seismic waves and thestructure of seismogramsrdquoGeophysics vol 5 pp 348ndash366 1940

[13] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[14] P J Green and B W Silverman Nonparametric Regressionand Generalized Linear Models A Roughness Penalty ApproachChapman and Hall 1994

[15] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2013 httpwwwR-projectorg

[16] T Lindeberg ldquoScale-space theory a basic tool for analyzingstructures at different scalesrdquo Journal of Applied Statistics vol21 pp 225ndash270 1994

[17] A P Witkin ldquoScale-space filteringrdquo in Proceedings of 8thInternational Joint Conference on Artificial Intelligence pp 1019ndash1022 Karlsruhe Germany 1983

[18] M P Wand and M C Jones Kernel Smoothing Chapman ampHallCRC 1995

[19] B SilvermanDensity Estimation for Statistics andDataAnalysisChapman amp HallCRC 1998

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 8: Research Article Horizon Detection in Seismic Data: An Application … · 2016-08-02 · Seismic studies are a key stage in the search for large scale underground features such as

8 Advances in Statistics

000 002 004 006 008Bandwidth

minus06

minus04

minus020Thre

shol

d

00

02

04

(a)

Den

sity

0

1

2

3

4

5

Signal strengthminus08 minus04 00 02 04

(b) (c)

Figure 12 (a) Threshold tree (b) histogram of kernel density and (c) turning point locations

of the bandwidth 120575 The change in the bandwidth leads toa change in the number and value of the thresholds andhence in the number of classes As the bandwidth increasesthe number of thresholds decreases and when the curvescombine into a single line there is only a single thresholdwhich means in turn that the number of groups is twoThe procedure is stopped when the smoothed distributionbecomes unimodal In the proposed method the best clas-sification is defined to be when the number of thresholds isstable for the longest

In this example three thresholds persist the longest abandwidth interval indicated by the vertical dotted linesand hence suggests four clusters The optimum smoothingbandwidth 120575lowast is then taken as the smallest value in thisintervalThe corresponding kernel density estimate is plottedin Figure 12(b) the vertical dotted lines show the position ofthe thresholds It is now important to realise that the clustersurrounding signal strength zero includes turning pointsgenerated by the random noise and should be excluded Thiswill be considered now before moving to the final linkingstep

The observed signal strength distribution will be a mix-ture of background values and values due to true horizonsOf course the various means and the variance of thesecomponents are unknown To describe the distribution of thesignal values from the background the following approachis proposed The centre of the background signal strengthdistribution is estimated using the median of the full data

120583 = median119895

119906119895 (13)

and the variance estimate is based on the median absolutedeviation (MAD)

2=

MADΦminus1(34)

with MAD = median119895

10038161003816100381610038161003816119906119895minus 120583

10038161003816100381610038161003816 (14)

where the value 1Φminus1(34) asymp 148 is chosen so that whenthe values follow a Gaussian distribution then 119864[

2] =

1205902 These estimators are chosen to be robust to outliers

caused by the signal strengths from the horizons The solidhorizontal line near the bottom of Figure 12(b) representsa 95 confidence interval for background signal strengthThe calculation assumes a Gaussian distribution and uses theabovemean and variance estimatesThis interval encloses themajority of this zero-centred mode and confirms that it isreasonable to exclude the whole group Before moving onit is worth noting that for datasets containing no horizonsall turning points will form a single group which will beidentified as a background group and hence the procedurewould not identify any horizons

Figure 12(c) shows all the minima and maxima as openand filled circles respectively The grey circles correspond tothe background group and the black circles to the horizongroupsThe algorithm achieves good results in extracting andclassifying the structural points which make the horizonsbut notice that although the horizons can be clearly seenthere are three extra points near the top and one missingalong the central horizon Also although not visible thereare six points misclassified to the wrong group This meansthat linking would not be perfect with some crossing of theproposed horizons This suggests that further criteria mustbe included in this final linking process As a simple examplehere isolated structural point will be excluded from thelinking In particular if the distance to the nearest other pointin the group is greater than say twice the average within-group interpoint distance then is excluded

Let the ordered set of all locations along the traces ofmember of class 119896 be denoted by V

119894 1 2 119873

119896

120575lowast Then

consider the median distance between neighbouring pointsin the class

119889119896= median

119894

1003816100381610038161003816V119894+1

minus V119894

1003816100381610038161003816 (15)

and the distance from a point to the nearest other point in theclass

119898119896

119894= min119895

10038161003816100381610038161003816V119894minus V119895

10038161003816100381610038161003816 (16)

Advances in Statistics 9

Figure 13 Linked structural points with modified method

This allows the following modified rule for class membershipto be defined

119862119896

120575lowast =

V119894 V119894lt 1205911 119898119896

119894lt 2 times 119889

119896

for 119896 = 1

V119894 120591119896minus1

lt V119894lt 120591119896 119898119896

119894lt 2 times 119889

119896

for 2 le 119896 le 119870

V119894 V119894gt 120591119870 119898119896

119894lt 2 times 119889

119896

for 119896 = 119870 + 1

(17)

Applying this final rule to the previous example producesthree groups of structural points forming clear and coherenthorizons as shown in Figure 13 Although not shown hereit has also been applied to examples with higher noise levelswith equal success

4 Conclusions

Seismic surveys play an important role in locating under-ground features but the partial reflection and natural varia-tion in acoustic properties mean that the recorded acousticsignals provide a confusing representation of the studyarea Full 3D datasets can be enormous covering severalsquare kilometres on the surface and representing severalkilometres in depth with high frequency temporal samplingThis suggests that rapid analysis is needed even if interestingexamples require further analysis or examination by expertsThis also suggests that simple techniques are preferablewith full reconstruction reserved for follow-up analysis Rawdata contain systematic distortions and significant noiseand typically they undergo substantial preprocessing beforeanalysis is started The preprocessing aims at reducing thedistortion and reducing noise but can introduce artefacts

The approach proposed here is based on simple statisticalprocedures which aim at smoothing out noise and usingcontextual information to identify coherent horizons In

particular smoothing splines are used to smooth individualtraces before localminima andmaxima are located It appearsthat greater smoothing can be applied to the traces withoutoverwhelming the major features than would be the caseif the aim was to estimate the signal Once a smaller set ofcandidate points is found they can be classified into groupswhere each group represents points along a horizon Clas-sification is performed by thresholding the signal strengthdistribution so that each mode represents a class Againsmoothing is needed to help select the correct number ofmodes and hence classes in the distributionThis time kerneldensity methods are used Here the choice of bandwidth ismuch more important as even small changes can lead to asubstantial change in the smoothed density A scale-spaceapproach is adopted which first allows all levels of smoothingto be considered The best level of smoothing and hencethe number of clusters and the number of horizons is thendetermined as the value for which the number of clusterspersists the longest The cluster associated with the randomvariation in the background is identified and excluded fromhorizon linkage Locationswithin each cluster are then linkedto form horizons In some high noise examples furtherinformation is used in this linking stage to exclude pointswhich are distant from the main cluster and hence unlikelyto be part of the horizon This does not prevent howeverhorizons from being linked across gaps and across faultsClearly it would be possible to supplement this fine tuningwith other criteria as necessary Based on the experimentsconducted here the proposed method shows great promiseIt provides a simple and quick approach to locate and linkhorizons even in relatively noisy datasets It is not affected bygaps in horizons nor by discontinuities caused by faults

Finally it is important to note that the proposed methodis not specific to seismic studies but can easily be appliedto other problems where the aim is to identify coherentstructure within 2D or 3D data Also the use of smoothingand threshold trees gives a simple graphical tool for anyclassification problem and the use of the most persistentthresholding is applicable to many other situations

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors are grateful to the three anonymous referees fortheir helpful comments which have substantially improvedthis paper

References

[1] PM Shearer Introduction to Seismology Cambridge UniversityPress Cambridge UK 1999

[2] R E Sheriff and L P Geldart Exploration Seismology Cam-bridge University Press Cambridge Mass USA 2nd edition1995

10 Advances in Statistics

[3] K H Waters Reflection Seismology John Wiley amp Sons NewYork NY USA 1978

[4] L Hatton M Worthington and J Makin Seismic Data Pro-cessing Theory and Practice Blackwell Scientific PublicationsLondon UK 1986

[5] A Ribes and F Schmitt ldquoLinear inverse problems in imagingan introductory surveyrdquo IEEE Signal Processing Magazine vol25 no 4 pp 84ndash99 2008

[6] A M Stuart ldquoInverse problems a Bayesian perspectiverdquo ActaNumerica vol 19 pp 451ndash559 2010

[7] W W Symes ldquoThe seismic reflection inverse problemrdquo InverseProblems vol 25 no 12 Article ID 123008 39 pages 2009

[8] V Lendzionowski A TWalden andR EWhite ldquoSeismic char-actermapping over reservoir intervalsrdquoGeophysical Prospectingvol 38 no 8 pp 951ndash969 1990

[9] M van der Baan and A Paul ldquoRecognition and reconstructionof coherent energy with application to deep seismic reflectiondatardquo Geophysics vol 65 no 2 pp 656ndash667 2000

[10] A T Walden ldquoSpatial clustering using simple summaries ofdata to find the edge of an oil-fieldrdquo Applied Statistics vol 43pp 385ndash395 1994

[11] D W Oldenburg T Scheuer and S Levy ldquoRecovery of theacoustic impedance from reflection seismogramsrdquo Geophysicsvol 48 no 10 pp 1318ndash1337 1983

[12] N Ricker ldquoThe form and nature of seismic waves and thestructure of seismogramsrdquoGeophysics vol 5 pp 348ndash366 1940

[13] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[14] P J Green and B W Silverman Nonparametric Regressionand Generalized Linear Models A Roughness Penalty ApproachChapman and Hall 1994

[15] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2013 httpwwwR-projectorg

[16] T Lindeberg ldquoScale-space theory a basic tool for analyzingstructures at different scalesrdquo Journal of Applied Statistics vol21 pp 225ndash270 1994

[17] A P Witkin ldquoScale-space filteringrdquo in Proceedings of 8thInternational Joint Conference on Artificial Intelligence pp 1019ndash1022 Karlsruhe Germany 1983

[18] M P Wand and M C Jones Kernel Smoothing Chapman ampHallCRC 1995

[19] B SilvermanDensity Estimation for Statistics andDataAnalysisChapman amp HallCRC 1998

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 9: Research Article Horizon Detection in Seismic Data: An Application … · 2016-08-02 · Seismic studies are a key stage in the search for large scale underground features such as

Advances in Statistics 9

Figure 13 Linked structural points with modified method

This allows the following modified rule for class membershipto be defined

119862119896

120575lowast =

V119894 V119894lt 1205911 119898119896

119894lt 2 times 119889

119896

for 119896 = 1

V119894 120591119896minus1

lt V119894lt 120591119896 119898119896

119894lt 2 times 119889

119896

for 2 le 119896 le 119870

V119894 V119894gt 120591119870 119898119896

119894lt 2 times 119889

119896

for 119896 = 119870 + 1

(17)

Applying this final rule to the previous example producesthree groups of structural points forming clear and coherenthorizons as shown in Figure 13 Although not shown hereit has also been applied to examples with higher noise levelswith equal success

4 Conclusions

Seismic surveys play an important role in locating under-ground features but the partial reflection and natural varia-tion in acoustic properties mean that the recorded acousticsignals provide a confusing representation of the studyarea Full 3D datasets can be enormous covering severalsquare kilometres on the surface and representing severalkilometres in depth with high frequency temporal samplingThis suggests that rapid analysis is needed even if interestingexamples require further analysis or examination by expertsThis also suggests that simple techniques are preferablewith full reconstruction reserved for follow-up analysis Rawdata contain systematic distortions and significant noiseand typically they undergo substantial preprocessing beforeanalysis is started The preprocessing aims at reducing thedistortion and reducing noise but can introduce artefacts

The approach proposed here is based on simple statisticalprocedures which aim at smoothing out noise and usingcontextual information to identify coherent horizons In

particular smoothing splines are used to smooth individualtraces before localminima andmaxima are located It appearsthat greater smoothing can be applied to the traces withoutoverwhelming the major features than would be the caseif the aim was to estimate the signal Once a smaller set ofcandidate points is found they can be classified into groupswhere each group represents points along a horizon Clas-sification is performed by thresholding the signal strengthdistribution so that each mode represents a class Againsmoothing is needed to help select the correct number ofmodes and hence classes in the distributionThis time kerneldensity methods are used Here the choice of bandwidth ismuch more important as even small changes can lead to asubstantial change in the smoothed density A scale-spaceapproach is adopted which first allows all levels of smoothingto be considered The best level of smoothing and hencethe number of clusters and the number of horizons is thendetermined as the value for which the number of clusterspersists the longest The cluster associated with the randomvariation in the background is identified and excluded fromhorizon linkage Locationswithin each cluster are then linkedto form horizons In some high noise examples furtherinformation is used in this linking stage to exclude pointswhich are distant from the main cluster and hence unlikelyto be part of the horizon This does not prevent howeverhorizons from being linked across gaps and across faultsClearly it would be possible to supplement this fine tuningwith other criteria as necessary Based on the experimentsconducted here the proposed method shows great promiseIt provides a simple and quick approach to locate and linkhorizons even in relatively noisy datasets It is not affected bygaps in horizons nor by discontinuities caused by faults

Finally it is important to note that the proposed methodis not specific to seismic studies but can easily be appliedto other problems where the aim is to identify coherentstructure within 2D or 3D data Also the use of smoothingand threshold trees gives a simple graphical tool for anyclassification problem and the use of the most persistentthresholding is applicable to many other situations

Conflict of Interests

The authors declare that there is no conflict of interestsregarding the publication of this paper

Acknowledgment

The authors are grateful to the three anonymous referees fortheir helpful comments which have substantially improvedthis paper

References

[1] PM Shearer Introduction to Seismology Cambridge UniversityPress Cambridge UK 1999

[2] R E Sheriff and L P Geldart Exploration Seismology Cam-bridge University Press Cambridge Mass USA 2nd edition1995

10 Advances in Statistics

[3] K H Waters Reflection Seismology John Wiley amp Sons NewYork NY USA 1978

[4] L Hatton M Worthington and J Makin Seismic Data Pro-cessing Theory and Practice Blackwell Scientific PublicationsLondon UK 1986

[5] A Ribes and F Schmitt ldquoLinear inverse problems in imagingan introductory surveyrdquo IEEE Signal Processing Magazine vol25 no 4 pp 84ndash99 2008

[6] A M Stuart ldquoInverse problems a Bayesian perspectiverdquo ActaNumerica vol 19 pp 451ndash559 2010

[7] W W Symes ldquoThe seismic reflection inverse problemrdquo InverseProblems vol 25 no 12 Article ID 123008 39 pages 2009

[8] V Lendzionowski A TWalden andR EWhite ldquoSeismic char-actermapping over reservoir intervalsrdquoGeophysical Prospectingvol 38 no 8 pp 951ndash969 1990

[9] M van der Baan and A Paul ldquoRecognition and reconstructionof coherent energy with application to deep seismic reflectiondatardquo Geophysics vol 65 no 2 pp 656ndash667 2000

[10] A T Walden ldquoSpatial clustering using simple summaries ofdata to find the edge of an oil-fieldrdquo Applied Statistics vol 43pp 385ndash395 1994

[11] D W Oldenburg T Scheuer and S Levy ldquoRecovery of theacoustic impedance from reflection seismogramsrdquo Geophysicsvol 48 no 10 pp 1318ndash1337 1983

[12] N Ricker ldquoThe form and nature of seismic waves and thestructure of seismogramsrdquoGeophysics vol 5 pp 348ndash366 1940

[13] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[14] P J Green and B W Silverman Nonparametric Regressionand Generalized Linear Models A Roughness Penalty ApproachChapman and Hall 1994

[15] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2013 httpwwwR-projectorg

[16] T Lindeberg ldquoScale-space theory a basic tool for analyzingstructures at different scalesrdquo Journal of Applied Statistics vol21 pp 225ndash270 1994

[17] A P Witkin ldquoScale-space filteringrdquo in Proceedings of 8thInternational Joint Conference on Artificial Intelligence pp 1019ndash1022 Karlsruhe Germany 1983

[18] M P Wand and M C Jones Kernel Smoothing Chapman ampHallCRC 1995

[19] B SilvermanDensity Estimation for Statistics andDataAnalysisChapman amp HallCRC 1998

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 10: Research Article Horizon Detection in Seismic Data: An Application … · 2016-08-02 · Seismic studies are a key stage in the search for large scale underground features such as

10 Advances in Statistics

[3] K H Waters Reflection Seismology John Wiley amp Sons NewYork NY USA 1978

[4] L Hatton M Worthington and J Makin Seismic Data Pro-cessing Theory and Practice Blackwell Scientific PublicationsLondon UK 1986

[5] A Ribes and F Schmitt ldquoLinear inverse problems in imagingan introductory surveyrdquo IEEE Signal Processing Magazine vol25 no 4 pp 84ndash99 2008

[6] A M Stuart ldquoInverse problems a Bayesian perspectiverdquo ActaNumerica vol 19 pp 451ndash559 2010

[7] W W Symes ldquoThe seismic reflection inverse problemrdquo InverseProblems vol 25 no 12 Article ID 123008 39 pages 2009

[8] V Lendzionowski A TWalden andR EWhite ldquoSeismic char-actermapping over reservoir intervalsrdquoGeophysical Prospectingvol 38 no 8 pp 951ndash969 1990

[9] M van der Baan and A Paul ldquoRecognition and reconstructionof coherent energy with application to deep seismic reflectiondatardquo Geophysics vol 65 no 2 pp 656ndash667 2000

[10] A T Walden ldquoSpatial clustering using simple summaries ofdata to find the edge of an oil-fieldrdquo Applied Statistics vol 43pp 385ndash395 1994

[11] D W Oldenburg T Scheuer and S Levy ldquoRecovery of theacoustic impedance from reflection seismogramsrdquo Geophysicsvol 48 no 10 pp 1318ndash1337 1983

[12] N Ricker ldquoThe form and nature of seismic waves and thestructure of seismogramsrdquoGeophysics vol 5 pp 348ndash366 1940

[13] C de BoorAPractical Guide to Splines Springer NewYork NYUSA 2001

[14] P J Green and B W Silverman Nonparametric Regressionand Generalized Linear Models A Roughness Penalty ApproachChapman and Hall 1994

[15] R Core Team R A Language and Environment for StatisticalComputing R Foundation for Statistical Computing ViennaAustria 2013 httpwwwR-projectorg

[16] T Lindeberg ldquoScale-space theory a basic tool for analyzingstructures at different scalesrdquo Journal of Applied Statistics vol21 pp 225ndash270 1994

[17] A P Witkin ldquoScale-space filteringrdquo in Proceedings of 8thInternational Joint Conference on Artificial Intelligence pp 1019ndash1022 Karlsruhe Germany 1983

[18] M P Wand and M C Jones Kernel Smoothing Chapman ampHallCRC 1995

[19] B SilvermanDensity Estimation for Statistics andDataAnalysisChapman amp HallCRC 1998

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of

Page 11: Research Article Horizon Detection in Seismic Data: An Application … · 2016-08-02 · Seismic studies are a key stage in the search for large scale underground features such as

Submit your manuscripts athttpwwwhindawicom

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical Problems in Engineering

Hindawi Publishing Corporationhttpwwwhindawicom

Differential EquationsInternational Journal of

Volume 2014

Applied MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Probability and StatisticsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Mathematical PhysicsAdvances in

Complex AnalysisJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

OptimizationJournal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

CombinatoricsHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Operations ResearchAdvances in

Journal of

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Function Spaces

Abstract and Applied AnalysisHindawi Publishing Corporationhttpwwwhindawicom Volume 2014

International Journal of Mathematics and Mathematical Sciences

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

The Scientific World JournalHindawi Publishing Corporation httpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Algebra

Discrete Dynamics in Nature and Society

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Decision SciencesAdvances in

Discrete MathematicsJournal of

Hindawi Publishing Corporationhttpwwwhindawicom

Volume 2014 Hindawi Publishing Corporationhttpwwwhindawicom Volume 2014

Stochastic AnalysisInternational Journal of