
A comparison of classification techniques for seismic facies recognition

Tao Zhao¹, Vikram Jayaram², Atish Roy³, and Kurt J. Marfurt¹

Abstract

During the past decade, the size of 3D seismic data volumes and the number of seismic attributes have increased to the extent that it is difficult, if not impossible, for interpreters to examine every seismic line and time slice. To address this problem, several seismic facies classification algorithms, including k-means, self-organizing maps, generative topographic mapping, support vector machines, Gaussian mixture models, and artificial neural networks, have been successfully used to extract features of geologic interest from multiple volumes. Although well documented in the literature, the terminology and complexity of these algorithms may bewilder the average seismic interpreter, and few papers have applied these competing methods to the same data volume. We have reviewed six commonly used algorithms and applied them to a single 3D seismic data volume acquired over the Canterbury Basin, offshore New Zealand, where one of the main objectives was to differentiate the architectural elements of a turbidite system. Not surprisingly, the most important parameter in this analysis was the choice of the correct input attributes, which in turn depended on careful pattern recognition by the interpreter. We found that supervised learning methods provided accurate estimates of the desired seismic facies, whereas unsupervised learning methods also highlighted features that might otherwise be overlooked.

Introduction

In 2015, pattern recognition has become part of everyday life. Amazon or Alibaba analyzes the clothes you buy, Google analyzes your driving routine, and your local grocery store knows the kind of cereal you eat in the morning. "Big data" and "deep learning algorithms" are being analyzed by big companies and big government, attempting to identify patterns in our spending habits and the people with whom we associate.

Successful seismic interpreters are experts at pattern recognition: identifying features such as channels, mass transport complexes, and collapse features, where our engineering colleagues only see wiggles. Our challenge as interpreters is that the data volumes we need to analyze keep growing in size and dimensionality, whereas the number of experienced interpreters has remained relatively constant. One solution to this dilemma is for these experienced interpreters to teach their skills to the next generation of geologists and geophysicists, either through traditional or on-the-job training. An alternative and complementary solution is for these experienced interpreters to teach their skills to a machine. Turing (1950), whose scientific contributions and life have recently been popularized in a movie, asks whether machines can think. Whether machines will ever be able to think is a question for scientists and philosophers to answer (e.g., Eagleman, 2012), but machines can be taught to perform repetitive tasks, and even to unravel the relationships that underlie repetitive patterns, in an area called machine learning.

Twenty-five years ago, skilled interpreters delineated seismic facies on a suite of 2D lines by visually examining seismic waveforms, frequency, amplitude, phase, and geometric configurations. Facies would then be posted on a map and hand contoured to generate a seismic facies map. With the introduction of 3D seismic data and volumetric attributes, such analysis has become more quantitative and more automated. In this tutorial, we focus on classification (also called clustering) of large 3D seismic data, whereby like patterns in the seismic response (seismic facies) are assigned similar values. Much of the same technology can be used to define specific rock properties, such as brittleness, total organic content, or porosity. Pattern recognition and clustering are common to many industries, from using cameras to identify knotholes in plywood production, to tracking cell phone communications, to identifying potential narcotics traffickers. The workflow, summarized in Figure 1, follows the classic textbook by Duda et al. (2001).

¹ The University of Oklahoma, ConocoPhillips School of Geology and Geophysics, Norman, Oklahoma, USA. E-mail: [email protected]; [email protected].
² Oklahoma Geological Survey, Norman, Oklahoma, USA. E-mail: [email protected].
³ BP America, Houston, Texas, USA. E-mail: [email protected].
Manuscript received by the Editor 19 February 2015; revised manuscript received 2 May 2015; published online 14 August 2015. This paper appears in Interpretation, Vol. 3, No. 4 (November 2015); p. SAE29–SAE58, 30 figs., 3 tables. http://dx.doi.org/10.1190/INT-2015-0044.1. © 2015 Society of Exploration Geophysicists and American Association of Petroleum Geologists. All rights reserved.


Special section: Pattern recognition and machine learning



In this figure, "sensing" consists of seismic, well log, completion, and production measurements. For interpreters, "segmentation" will usually mean focusing on a given stratigraphic formation or suite of formations. Seismic data lose their temporal and lateral resolution with depth, such that a given seismic facies changes its appearance, or is nonstationary, as we go deeper in the section. The number of potential facies also increases as we analyze larger vertical windows incorporating different depositional environments, making classification more difficult. For computer-assisted facies classification, "feature extraction" means attributes, be they simple measurements of amplitude and frequency; geometric attributes that measure reflector configurations; or more quantitative measurements of lithology, fractures, or geomechanical properties provided by prestack inversion and azimuthal anisotropy analysis. "Classification" assigns each voxel to one of a finite number of classes (also called clusters), each of which represents a seismic facies that may or may not correspond to a geologic facies. Finally, using validation data, the interpreter makes a "decision" that determines whether a given cluster represents a unique seismic facies, whether it should be lumped in with other clusters having a somewhat similar attribute expression, or whether it should be further subdivided, perhaps through the introduction of additional attributes.

Pattern recognition and classification of seismic features is fundamental to human-based interpretation, where our job may be as "simple" as identifying and picking horizons and faults, or more advanced, such as the delineation of channels, mass transport complexes, carbonate buildups, or potential gas accumulations. The use of computer-assisted classification began soon after the development of seismic attributes in the 1970s (Balch, 1971; Taner et al., 1979), with the work by Sonneland (1983) and Justice et al. (1985) being two of the first. The k-means algorithm (Forgy, 1965; Jancey, 1966) was one of the earliest clustering algorithms developed; it was quickly adopted by service companies, and today it is common to almost all interpretation software packages. The k-means is an unsupervised learning algorithm, in that the interpreter provides no prior information other than the selection of attributes and the number of desired clusters.

Barnes and Laughlin (2002) review several unsupervised clustering techniques, including k-means, fuzzy clustering, and self-organizing maps (SOMs). Their primary finding was that the clustering algorithm used was less important than the choice of attributes used. Among the clustering algorithms, they favor SOM because there is a topologically ordered mapping of the clusters, with similar clusters lying adjacent to each other on a manifold and in the associated latent space. In our examples, a "manifold" is a deformed 2D surface that best fits the distribution of n attributes lying in an N-dimensional data space. The clusters are then mapped to a simpler 2D rectangular "latent" (Latin for "hidden") space, upon which the interpreter can either interactively define clusters or map the projections onto a 2D color bar. A properly chosen latent space can help to identify data properties that are otherwise difficult to observe in the original input space. Coleou et al.'s (2003) seismic "waveform classification" algorithm is implemented using SOM, where the "attributes" are seismic amplitudes that lie on a suite of 16 phantom horizon slices. Each (x, y) location in the analysis window provides a 16D vector of amplitudes. When plotted one element after the other, the mean of each cluster in 16D space looks like a waveform. These waveforms lie along a 1D deformed string (the manifold) that lies in 16D. This 1D string is then mapped to a 1D line (the latent space), which in turn is mapped against a 1D continuous color bar. The proximity of such waveforms to each other on the manifold and latent spaces results in similar seismic facies appearing as similar colors. Coleou et al. (2003) generalize their algorithm to attributes other than seismic amplitude, constructing vectors of dip magnitude, coherence, and reflector parallelism.

Figure 1. Classification as applied to the interpretation of seismic facies (modified from Duda et al., 2001).


Strecker and Uden (2002) are perhaps the first to use 2D manifolds and 2D latent spaces with geophysical data, using multidimensional attribute volumes to form N-dimensional vectors at each seismic sample point. Typical attributes included envelope, bandwidth, impedance, amplitude variation with offset (AVO) slope and intercept, dip magnitude, and coherence. These attributes were projected onto a 2D latent space, and the results were plotted against a 2D color table. Gao (2007) applies a 1D SOM to gray-level co-occurrence matrix (GLCM) texture attributes to map seismic facies offshore Angola. Overdefining the clusters with 256 prototype vectors, he then uses 3D visualization and his knowledge of the depositional environment to map the "natural" clusters. These natural clusters were then calibrated using well control, giving rise to what is called a posteriori supervision. Roy et al. (2013) build on these concepts and develop an SOM classification workflow for multiple seismic attributes computed over a deepwater depositional system. They calibrate the clusters a posteriori using classical principles of seismic stratigraphy on a subset of vertical slices through the seismic amplitude. A simple but very important innovation was to project the clusters onto a 2D nonlinear Sammon space (Sammon, 1969). This projection was then colored using a gradational 2D color scale like that of Matos et al. (2009), thus facilitating the interpretation. Roy et al. (2013) introduce a Euclidean distance measure to correlate predefined unsupervised clusters to average data vectors about interpreter-defined well-log facies.

Generative topographic mapping (GTM) is a more recent unsupervised classification innovation, providing a probabilistic representation of the data vectors in the latent space (Bishop et al., 1998). There has been very little work on the application of the GTM technique to seismic data and exploration problems. Wallet et al. (2009) are probably the first to apply the GTM technique to seismic data, using a suite of phantom horizon slices through a seismic amplitude volume to generate a waveform classification. Although generating excellent images, Roy (2013) and Roy et al. (2014) find the introduction of well control to SOM classification to be somewhat limited, and they instead apply GTM to a Mississippian tripolitic chert reservoir in the midcontinent USA and a carbonate wash play in the Sierra Madre Oriental of Mexico. They find that GTM provides not only the most likely cluster associated with a given voxel, but also the probability that the voxel belongs to each of the clusters, providing a measure of confidence or risk in the prediction.

The k-means, SOM, and GTM are all unsupervised learning techniques, where the clustering is driven only by the choice of input attributes and the number of desired clusters. If we wish to teach the computer to mimic the facies identification previously chosen by a skilled interpreter, or to link seismic facies to electrofacies interpreted using wireline logs, we need to introduce "supervision" or external control to the clustering algorithm. The most popular means of supervised learning classification are based on artificial neural networks (ANNs). Meldahl et al. (1999) use seismic energy and coherence attributes coupled with interpreter control (picked seed points) to train a neural network to identify hydrocarbon chimneys. West et al. (2002) use a similar workflow, in which the objective is seismic facies analysis of a channel system and the input attributes are textures. Corradi et al. (2009) use GLCM textures and an ANN, with controls based on wells and skilled interpretation of some key 2D vertical slices, to map sand, evaporite, and sealing versus nonsealing shale facies offshore west Africa.

The support vector machine (SVM, where the word "machine" is due to Turing's [1950] mechanical decryption machine) is a more recent introduction (e.g., Kuzma and Rector, 2004, 2005, 2007; Li and Castagna, 2004; Zhao et al., 2005; Al-Anazi and Gates, 2010). Originating from maximum margin classifiers, SVMs have gained great popularity for solving pattern classification and regression problems since the concept of a "soft margin" was first introduced by Cortes and Vapnik (1995). SVMs map the N-dimensional input data into a higher-dimensional latent (often called feature) space, where clusters can be linearly separated by hyperplanes. Detailed descriptions of SVMs can be found in Cortes and Vapnik (1995), Cristianini and Shawe-Taylor (2000), and Schölkopf and Smola (2002). Li and Castagna (2004) use SVM to discriminate alternative AVO responses, whereas Zhao et al. (2014) and Zhang et al. (2015) use a variation of SVM with mineralogy logs and seismic attributes to predict lithology and brittleness in a shale resource play.

We begin our paper by providing a summary of the more common clustering techniques used in seismic facies classification, emphasizing their similarities and differences. We start from the unsupervised learning k-means algorithm, progress through projections onto principal component hyperplanes, and end with projections onto SOM and GTM manifolds, which are topological spaces that resemble Euclidean space near each point. Next, we provide a summary of supervised learning techniques, including ANNs and SVMs. Given these definitions, we apply each of these methods to identify seismic facies in the same data volume acquired in the Canterbury Basin of New Zealand. We conclude with a discussion on the advantages and limitations of each method and areas for future algorithm development and workflow refinement. At the very end, we provide an appendix containing some of the mathematical details to better quantify how each algorithm works.

Review of unsupervised learning classification techniques

Crossplotting

Crossplotting one or more attributes against each other is an interactive, and perhaps the most common, clustering technique. In its simplest implementation, one computes and then displays a 2D histogram of two attributes. In most software packages, the interpreter then identifies a cluster of interest and draws a polygon around it.


Although several software packages allow crossplotting of up to three attributes, crossplotting more than three attributes quickly becomes intractable. One workflow to address this visualization limitation is to first project a high number of attributes onto the first two or three eigenvectors, and then crossplot the principal components. Principal components will be discussed later in the section on projection methods.
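For readers who wish to experiment, a minimal sketch of this crossplot-and-polygon workflow is given below. It is our own illustration in Python (using numpy and matplotlib, not any software described in this paper); the attribute values and polygon vertices are invented for the example.

```python
import numpy as np
from matplotlib.path import Path

rng = np.random.default_rng(0)

# Two hypothetical attribute volumes, flattened to one value per voxel.
peak_frequency = rng.normal(40.0, 8.0, 10000)
glcm_homogeneity = rng.normal(0.5, 0.1, 10000)

# The 2D histogram an interpreter would display as a crossplot.
counts, f_edges, h_edges = np.histogram2d(peak_frequency,
                                          glcm_homogeneity, bins=64)

# An interpreter-drawn polygon (vertices in attribute space, invented here).
polygon = Path([(30.0, 0.45), (45.0, 0.45), (45.0, 0.65), (30.0, 0.65)])

# Label every voxel whose attribute pair falls inside the polygon.
pairs = np.column_stack([peak_frequency, glcm_homogeneity])
inside = polygon.contains_points(pairs)
print(f"{inside.sum()} of {inside.size} voxels fall inside the polygon")
```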

k-means classification

The k-means (MacQueen, 1967) is perhaps the simplest clustering algorithm and is widely available in commercial interpretation software packages. The method is summarized in the cartoons shown in Figure 2. One drawback of the method is that the interpreter needs to define how many clusters reside in the data. Once the number of clusters is defined, the cluster means or centers are defined either on a grid or randomly to begin the iteration loop. Because attributes have different units of measurement (e.g., hertz for peak frequency, 1/km for curvature, and millivolts for the root-mean-square amplitude), the distance of each data point to the current means is computed by scaling the data by the inverse of the covariance matrix, giving us the "Mahalanobis distance" (see Appendix A). Each data point is then assigned to the cluster to whose mean it is closest. Once assigned, new cluster means are computed from the newly assigned data clusters, and the process is repeated. If there are Q clusters, the process will converge in about Q iterations.
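A minimal sketch of this iteration, written by us in Python/numpy rather than taken from the authors' implementation, follows; it scales distances by the inverse of the global attribute covariance, which is one common reading of the Mahalanobis weighting summarized in Appendix A.

```python
import numpy as np

def kmeans_mahalanobis(X, n_clusters, n_iter=20, seed=0):
    """k-means in which distances are scaled by the inverse covariance
    of the attributes (a Mahalanobis distance). X is (n_samples, n_attrs)."""
    rng = np.random.default_rng(seed)
    cov_inv = np.linalg.inv(np.cov(X, rowvar=False))
    means = X[rng.choice(len(X), n_clusters, replace=False)]
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iter):
        # Squared Mahalanobis distance from every vector to every mean.
        diff = X[:, None, :] - means[None, :, :]
        d2 = np.einsum('nkd,de,nke->nk', diff, cov_inv, diff)
        labels = d2.argmin(axis=1)          # assign to the closest cluster
        for k in range(n_clusters):
            if np.any(labels == k):         # recompute each cluster mean
                means[k] = X[labels == k].mean(axis=0)
    return labels, means
```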

The k-means is fast and easy to implement. Unfortunately, the clustering has no structure, such that there is no relationship between the cluster numbering (and therefore coloring) and the proximity of one cluster to another. This lack of organization can result in similar facies appearing in totally different colors, confusing the interpretation. Tuning the number of clusters to force similar facies into the same cluster is a somewhat tedious procedure that also decreases the resolution of the facies map.

Projection techniques

Although not defined this way in the pattern recognition literature, because this is a tutorial we will lump the following methods, principal component analysis (PCA), SOM, and generative topographic mapping, together and call them projection techniques. Projection techniques project data residing in a higher-dimensional space (for example, a 5D space defined by five attributes) onto a lower dimensional space (for example, a 2D plane or deformed 2D surface). Once projected, the data can be clustered in that space by the algorithm (such as SOM) or interactively clustered by the interpreter by drawing polygons (routine for PCA, and our preferred analysis technique for SOM and GTM).

Principal component analysis

PCA is widely used to reduce the redundancy and excess dimensionality of the input attribute data. Such reduction is based on the assumption that most of the signal is preserved in the first few principal components (eigenvectors), whereas the last principal components contain uncorrelated noise. In this tutorial, we will use PCA as the first iteration of the SOM and GTM algorithms. Many workers use PCA to reduce redundant attributes into "meta attributes" to simplify the computation. The first eigenvector is a vector in N-dimensional attribute space that best represents the attribute patterns in the data. Crosscorrelating (projecting) the N-dimensional data against the first eigenvector at each voxel gives us the first principal component volume. If we scale the first eigenvector by the first principal component and subtract it from the original data vector, we obtain a residual data vector. The second eigenvector is the vector that best represents the attribute patterns in this residual. Crosscorrelating (projecting) the second eigenvector against either the original data or the residual data vector at each voxel gives us the second principal component volume.

Figure 2. Cartoon illustration of a k-means classification of three clusters. (a) Select three random or equally spaced, but distinct, seed points, which serve as the initial estimate of the vector means of each cluster. Next, compute the Mahalanobis distance between each data vector and each cluster mean. Then, color code or otherwise label each data vector to belong to the cluster that has the smallest Mahalanobis distance. (b) Recompute the means of each cluster from the previously defined data vectors. (c) Recalculate the Mahalanobis distance from each vector to the new cluster means. Assign each vector to the cluster that has the smallest distance. (d) The process continues until the changes in the means converge to their final locations. If we now add a new (yellow) point, we will use a Bayesian classifier to determine into which cluster it falls (figure courtesy of S. Pickford).


This process continues for all N dimensions, resulting in N eigenvectors and N principal components. In this paper, we will limit ourselves to the first two eigenvectors, which thus define the plane that least-squares fits the N-dimensional attribute data. Figure 3c shows a numerical example of the first two principal components defining a plane in a 3D data space.
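The following sketch (ours, in numpy) computes the first two principal components by eigendecomposition of the attribute covariance matrix, which is equivalent to the projection just described; the attribute matrix X is assumed to hold one N-dimensional vector per voxel.

```python
import numpy as np

def first_two_principal_components(X):
    """Project N-dimensional attribute vectors onto the plane spanned
    by the first two eigenvectors of the covariance matrix.
    X is (n_samples, n_attrs); returns (n_samples, 2) projections."""
    Xc = X - X.mean(axis=0)                      # center each attribute
    evals, evecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    order = np.argsort(evals)[::-1]              # largest variance first
    plane = evecs[:, order[:2]]                  # first two eigenvectors
    return Xc @ plane                            # principal components 1 and 2
```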

Self-organizing maps

Although many workers (e.g., Coleou et al., 2003) describe the SOM as a type of neural network, for the purposes of this tutorial, we prefer to describe the SOM as a manifold projection technique (Kohonen, 1982). SOM, originally developed for gene pattern recognition, is one of the most popular classification techniques, and it has been implemented in at least four commercial software packages for seismic facies classification. The major advantage of SOM over k-means is that the clusters residing on the deformed manifold in N-dimensional data space are directly mapped to a rectilinear or otherwise regularly gridded latent space. We provide a brief summary of the mathematical formulations of the SOM implementation used in this study in Appendix A.
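Because the paper's own formulation is deferred to Appendix A, the sketch below is only a generic SOM training loop of our own (in numpy), with an assumed Gaussian neighborhood function and linearly decaying radius and learning rate; it is not the authors' implementation.

```python
import numpy as np

def train_som(X, grid=(16, 16), n_iter=5000, seed=0):
    """Generic SOM sketch: prototype vectors tied to a 2D latent grid are
    pulled toward randomly drawn data vectors, with a Gaussian
    neighborhood whose radius shrinks over the iterations."""
    rng = np.random.default_rng(seed)
    rows, cols = np.meshgrid(range(grid[0]), range(grid[1]), indexing='ij')
    coords = np.column_stack([rows.ravel(), cols.ravel()]).astype(float)
    protos = X[rng.choice(len(X), grid[0] * grid[1], replace=False)].copy()
    for t in range(n_iter):
        frac = t / n_iter
        radius = max(grid) / 2 * (1 - frac) + 1e-3   # shrinking neighborhood
        lr = 0.5 * (1 - frac) + 0.01                 # decaying learning rate
        x = X[rng.integers(len(X))]                  # random training vector
        bmu = np.argmin(((protos - x) ** 2).sum(axis=1))  # best-matching unit
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)    # latent-grid distance
        h = np.exp(-d2 / (2 * radius ** 2))               # neighborhood weight
        protos += lr * h[:, None] * (x - protos)          # update prototypes
    return protos, coords
```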

Although SOM is one of the most popular classification techniques, there are several limitations to the SOM algorithm. First, the choice of neighborhood function at each iteration is subjective, with different choices resulting in different solutions. Second, the absence of a quantitative error measure does not let us know whether the solution has converged to an acceptable level, which would provide confidence in the resulting analysis. Third, although we find the most likely cluster for a given data vector, we have no quantitative measure of confidence in the facies classification, and no indication of whether the vector could be nearly as well represented by other facies.

Generative topographic mapping

GTM is a nonlinear dimensionality reduction technique that provides a probabilistic representation of the data vectors on a lower L-dimensional deformed manifold, which is in turn mapped to an L-dimensional latent space. Whereas SOM seeks the node or prototype vector that is closest to a randomly chosen vector from the training or input data set, in GTM, each of the nodes lying on the lower dimensional manifold provides some mathematical support to the data and is considered to be to some degree "responsible" for the data vector (Figure 4). The level of support or "responsibility" is modeled with a constrained mixture of Gaussians. The model parameter estimates are determined by maximum likelihood using the expectation maximization (EM) algorithm (Bishop et al., 1998).

Because GTM theory is deeply rooted in probability, it can also be used in modern risk analysis. We can extend the GTM application in seismic exploration by projecting the mean posterior probabilities of a particular window of multiattribute data (i.e., about a producing well) onto the 2D latent space. By projecting the data vector at any given voxel onto the latent space, we obtain a probability estimate of whether it falls into the same category (Roy et al., 2014). We thus have a probabilistic estimate of how similar any data vector is to the attribute behavior (and hence facies) about a producing or nonproducing well of interest.

Other unsupervised learning methods

There are many other unsupervised learning techniques, several of which were evaluated by Barnes and Laughlin (2002).

Figure 3. (a) A distribution of data points in 3D attribute space. The statistics of this distribution can be defined by the covariance matrix. (b) k-means will cluster data into a user-defined number of distributions (four in this example) based on the Mahalanobis distance measure. (c) The plane that best fits these data is defined by the first two eigenvectors of the covariance matrix. The projection of the 3D data onto this plane provides the first two principal components of the data, as well as the initial model for our SOM and GTM algorithms. (d) SOM and GTM deform the initial 2D plane into a 2D manifold that better fits the data. Each point on the deformed 2D manifold is in turn mapped to a 2D rectangular latent space. Clusters are color coded or interactively defined on this latent space.


We do not currently have access to software to apply independent component analysis (ICA) and Gaussian mixture models (GMMs) to our seismic facies classification problem, but we mention them as possible candidates.

Independent component analysis

Like PCA, ICA is a statistical technique used to project a set of N-dimensional vectors onto a smaller L-dimensional space. Unlike PCA, which is based on Gaussian statistics, whereby the first eigenvector best represents the variance in the multidimensional data, ICA attempts to project data onto subspaces that result in non-Gaussian distributions, which are then easier to separate and visualize. Honório et al. (2014) successfully apply ICA to multiple spectral components to delineate the architectural elements of an offshore Brazil carbonate terrain. PCA and ICA are commonly used to reduce a redundant set of attributes to form a smaller set of independent meta-attributes (e.g., Gao, 2007).
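As a hedged illustration of how such a projection might be computed, scikit-learn's FastICA (our choice of tool, not software used by the authors) can project a hypothetical attribute matrix onto two maximally non-Gaussian components:

```python
import numpy as np
from sklearn.decomposition import FastICA

# Hypothetical attribute matrix: one 5D attribute vector per voxel.
X = np.random.default_rng(0).normal(size=(10000, 5))

# Project onto two maximally non-Gaussian components (compare with PCA,
# which would instead maximize variance along orthogonal directions).
ica = FastICA(n_components=2, random_state=0)
components = ica.fit_transform(X)    # (10000, 2), one pair per voxel
```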

Gaussian mixture models

GMMs are parametric models of probability distributions, which can provide greater flexibility and precision in modeling than traditional unsupervised clustering algorithms. Lubo et al. (2014) apply this technique to a suite of well logs acquired over Horseshoe Atoll, west Texas, to identify different lithologies. These GMM lithologies are then used to calibrate 3D seismic prestack inversion results to generate a 3D rock property model. At present, we do not know of any GMM algorithms applied to seismic facies classification using seismic attributes as input data.
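Although GMM software was not available for this study, a sketch of how a GMM could be fit to attribute vectors using scikit-learn (our assumption, not part of this study) would look like the following; note that, unlike k-means, it returns soft cluster memberships.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical attribute matrix: one 4D attribute vector per voxel.
X = np.random.default_rng(0).normal(size=(10000, 4))

gmm = GaussianMixture(n_components=4, covariance_type='full',
                      random_state=0).fit(X)
labels = gmm.predict(X)          # most likely facies per voxel
probs = gmm.predict_proba(X)     # soft membership in each of the 4 clusters
```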

Review of supervised learning classification techniques

Artificial neural networks

ANNs can be used in unsupervised and supervised multiattribute analysis (van der Baan and Jutten, 2000). The multilayer perceptron and the radial basis function (RBF) network are two popular types of neural networks used in supervised learning. The probabilistic neural network (PNN), which also uses RBFs, forms the basis of additional neural network geophysical applications. In terms of network architecture, the supervised algorithms are feed-forward networks. In contrast, the unsupervised SOM algorithm described earlier is a recurrent (or feed-backward) network. An advantage of feed-forward networks over SOMs is the ability to predict continuous values (such as porosity), as well as discrete values (such as the facies class number). Applications of neural networks can be found in seismic inversion (Röth and Tarantola, 1994), well-log prediction from other logs (Huang et al., 1996; Lim, 2005), waveform recognition (Murat and Rudman, 1992), seismic facies analysis (West et al., 2002), and reservoir property prediction using seismic attributes (Yu et al., 2008; Zhao and Ramachandran, 2013).

Figure 4. (a) The K grid points u_k defined on an L-dimensional latent space grid are mapped to K grid points m_k lying on a non-Euclidean manifold in N-dimensional data space. In this paper, L = 2, and the latent space will be mapped against a 2D color bar. The Gaussian mapping functions are initialized to be equally spaced on the plane defined by the first two eigenvectors. (b) Schematic showing the training of the latent space grid points to a data vector a_j lying near the GTM manifold using an EM algorithm. The posterior probability of each data vector is calculated for all Gaussian centroids m_k and assigned to the respective latent space grid points u_k. Grid points with high probabilities are displayed as bright colors. All variables are discussed in Appendix A.


For the last application listed above, however, due to the resolution difference between seismic data and well logs, the structural and lithologic variation at interwell points, and the highly nonlinear relationship between these two domains, achieving a convincing prediction result can be challenging. In this case, geostatistical methods, such as Bayesian analysis, can be used jointly to provide a probability index, giving interpreters an estimate of how much confidence they should have in the prediction.

ANNs are routinely used in the exploration and production industry. An ANN provides a means to correlate well measurements, such as gamma-ray logs, to seismic attributes (e.g., Verma et al., 2012), where the underlying relationship is a function of rock properties, the depositional environment, and diagenetic alteration. Although the ANN has produced reliable classifications in many applications, defects such as convergence to local minima and difficulty in parameterization are not negligible. In industrial and scientific applications, we prefer a classifier that is consistent and robust once the training vectors and model parameters have been determined. This leads to a more recent supervised learning technique developed in the late 20th century, the SVM.
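The ANN in this paper is implemented with the MATLAB toolbox; purely as a stand-in illustration, the scikit-learn sketch below trains a small feed-forward network on 437 hypothetical training vectors (437 being the training-set size listed in Table 2) and predicts a discrete facies class per voxel. All data here are fabricated, and the network architecture is our assumption.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X_train = rng.normal(size=(437, 4))      # stand-in interpreter-picked vectors
y_train = rng.integers(0, 4, size=437)   # four hypothetical facies labels

# A small feed-forward network with one hidden layer of 20 units.
net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000,
                    random_state=0).fit(X_train, y_train)

X_all = rng.normal(size=(100000, 4))     # stand-in for the full volume
facies = net.predict(X_all)              # discrete facies class per voxel
```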

Support vector machines

The basic idea of SVMs is straightforward. First, we transform the training data vectors into a still higher-dimensional "feature" space using a nonlinear mapping. Then, we find a hyperplane in this feature space that separates the data into two classes with an optimal "margin." The margin is defined to be the smallest distance between the separation hyperplane (commonly called a decision boundary) and the training vectors (Bishop, 2006) (Figure 5). An optimal margin balances two criteria: maximizing the margin, thereby giving the classifier the best generalization, and minimizing the number of misclassified training vectors if the training data are not linearly separable. The margin can also be described as the distance between the decision boundary and the two hyperplanes defined by the data vectors that have the smallest distance to the decision boundary. These two hyperplanes are called the plus plane and the minus plane. The vectors that lie exactly on these two hyperplanes mathematically define or "support" them and are called support vectors. Tong and Koller (2001) show that the decision boundary depends solely on the support vectors, resulting in the name SVM.
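A brief illustration of a soft-margin SVM classifier follows; it uses scikit-learn's SVC (our stand-in, not the PSVM implementation used later in this paper), with fabricated training vectors and labels.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X_train = rng.normal(size=(437, 4))      # hypothetical training attributes
y_train = rng.integers(0, 4, size=437)   # four hypothetical facies labels

# Soft-margin SVM with a Gaussian (RBF) kernel; C trades margin width
# against the number of misclassified training vectors.
clf = SVC(kernel='rbf', C=1.0, gamma='scale').fit(X_train, y_train)
print(clf.support_vectors_.shape)        # only these vectors define the boundary
```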

SVMs can be used in either a supervised or a semisupervised learning mode. In contrast to supervised learning, semisupervised learning defines a learning process that uses both labeled and unlabeled vectors. When there are a limited number of interpreter-classified data vectors, the classifier may not perform well due to insufficient training. In semisupervised training, some of the nearby unclassified data vectors are automatically selected and classified based on a distance measurement during the training step, as in an unsupervised learning process. These vectors are then used as additional training vectors (Figure 6), resulting in a classifier that will perform better for the specific problem. Some generalization power is sacrificed by using unlabeled data. In this tutorial, we focus on SVM; however, the future of semisupervised SVM in geophysical applications is quite promising.
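The sketch below illustrates one possible self-training loop of this kind; it is our own construction, not an algorithm specified in this paper, and it uses decision-function confidence as a stand-in for the distance measurement described above. All names and parameters are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

def self_training_svm(X_lab, y_lab, X_unlab, n_rounds=5, n_add=50):
    """Semisupervised sketch: at each round, the classifier labels the
    unlabeled vectors it is most confident about and adds them to the
    training set, in the spirit of Figure 6."""
    X, y = X_lab.copy(), y_lab.copy()
    clf = SVC(kernel='rbf', gamma='scale').fit(X, y)
    for _ in range(n_rounds):
        if len(X_unlab) == 0:
            break
        conf = np.abs(clf.decision_function(X_unlab))
        if conf.ndim > 1:                    # multiclass: best-class score
            conf = conf.max(axis=1)
        pick = np.argsort(conf)[-n_add:]     # most confidently classified
        X = np.vstack([X, X_unlab[pick]])
        y = np.concatenate([y, clf.predict(X_unlab[pick])])
        X_unlab = np.delete(X_unlab, pick, axis=0)
        clf = SVC(kernel='rbf', gamma='scale').fit(X, y)  # retrain
    return clf
```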

Figure 5. Cartoon of a linear SVM classifier separating black from white data vectors. The two dashed lines are the margins defined by the support vector data points. The red decision boundary falls midway between the margins, separating the two clusters. If the data clusters overlap, no margins can be drawn. In this situation, the data vectors will be mapped to a higher-dimensional space, where they can be separated.

Figure 6. Cartoon describing semisupervised learning. Blue squares and red triangles indicate two different interpreter-defined classes. Black dots indicate unclassified points. In semisupervised learning, unclassified data vectors 1 and 2 are classified as class "A," whereas data vector 3 is classified as class "B" during the training process.


Proximal support vector machines

The proximal support vector machine (PSVM) (Fung and Mangasarian, 2001, 2005) is a recent variant of SVM which, instead of looking for a separating plane directly, builds two parallel planes that approximate the two data classes; the decision boundary then falls between these two planes (Figure 7). Other researchers have found that PSVM provides classification accuracy comparable to standard SVM, but at considerable computational savings (Fung and Mangasarian, 2001, 2005; Mangasarian and Wild, 2006). In this tutorial, we use PSVM as our implementation of SVM. Details on the PSVM algorithm are provided in Appendix A.
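Under our reading of Fung and Mangasarian (2001), the linear PSVM reduces to a single linear solve rather than a quadratic program; the sketch below (numpy, ours; the implementation actually used in this study is described in Appendix A and may differ) illustrates this closed form on fabricated two-class data.

```python
import numpy as np

def psvm_train(A, d, nu=1.0):
    """Linear PSVM sketch after Fung and Mangasarian (2001): the classifier
    sign(x.w - gamma) comes from one (n+1) x (n+1) linear system.
    A: (m, n) training vectors; d: (m,) labels in {-1, +1}; nu: trade-off."""
    m, n = A.shape
    E = np.hstack([A, -np.ones((m, 1))])          # E = [A, -e]
    rhs = E.T @ d                                 # E^T D e, with D = diag(d)
    z = np.linalg.solve(np.eye(n + 1) / nu + E.T @ E, rhs)
    return z[:n], z[n]                            # w, gamma

# Fabricated two-class training set in a 4D attribute space.
rng = np.random.default_rng(0)
A = np.vstack([rng.normal(-1.0, 1.0, (50, 4)), rng.normal(1.0, 1.0, (50, 4))])
d = np.hstack([-np.ones(50), np.ones(50)])
w, gamma = psvm_train(A, d)
pred = np.sign(A @ w - gamma)                     # training-set prediction
```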

We may face problems in seismic interpretation that are linearly inseparable in the original input multidimensional attribute space. In SVM, we map the data vectors into a higher-dimensional space where they become linearly separable (Figure 8); however, the increase in dimensionality may result in significantly increased computational cost. Instead of using an explicit mapping function to map the input data into a higher-dimensional space, PSVM achieves the same goal by manipulating a kernel function in the input attribute space. In our implementation, we use a Gaussian kernel function, but in principle, many other functions can be used (Shawe-Taylor and Cristianini, 2004).
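For concreteness, a Gaussian kernel matrix can be sketched as follows (our illustration; the kernel parameter mu is hypothetical). Working with this matrix replaces the explicit map to a higher-dimensional feature space, because only inner products in that space are ever needed.

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-mu * ||A_i - B_j||^2)
    between the rows of A (m, d) and the rows of B (n, d)."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-mu * d2)
```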

SVM can be used either as a classifier or as a regression operator. Used as a regression operator, SVM is capable of predicting petrophysical properties, such as porosity (Wong et al., 2005); VP, VS, and density (Kuzma and Rector, 2004); as well as permeability (Al-Anazi and Gates, 2010; Nazari et al., 2011). In all such applications, SVM shows performance comparable or superior to neural networks with respect to prediction error and training cost. When used as a classifier, SVM is suitable for predicting lithofacies (Al-Anazi and Gates, 2010; Torres and Reveron, 2013; Wang et al., 2014; Zhao et al., 2014) or pseudo rock properties (Zhang et al., 2015), either from well-log data, core data, or seismic attributes.

Geologic setting

In this tutorial, we used the Waka 3D seismic survey acquired over the Canterbury Basin, offshore New Zealand, generously made public by New Zealand Petroleum and Minerals. Readers can request this data set through their website for research purposes. Figure 9 shows the location of this survey, where the red rectangle corresponds to the time slices shown in subsequent figures. The study area lies in the transition zone between the continental slope and rise, with an abundance of paleocanyons and turbidite deposits of Cretaceous and Tertiary age. These sediments were deposited in a single, tectonically driven transgressive-regressive cycle (Uruski, 2010). Because the basin is a very recent and underexplored prospect, publicly available comprehensive studies of the Canterbury Basin are somewhat limited. The modern seafloor canyons shown in Figure 9 are good analogs of the deeper paleocanyons illuminated by the 3D seismic amplitude and attribute data.

Attribute selection

In their comparison of alternative unsupervised learning techniques, Barnes and Laughlin (2002) conclude that the appropriate choice of attributes was the most critical component of computer-assisted seismic facies identification.

Figure 7. (a) Cartoon showing a two-class PSVM in 2D space. Classes A and B are approximated by two parallel lines that have been pushed as far apart as possible, forming the cluster "margins." The red decision boundary lies midway between the two margins. Maximizing the margin is equivalent to minimizing (ω^T ω + γ^2)^(1/2). (b) A two-class PSVM in 3D space. In this case, the decision boundary and margins are 2D planes.


Although interpreters are skilled at identifying facies, such recognition is often subconscious and hard to define (see Eagleman's [2012] discussion on differentiating male from female chicks and identifying military aircraft from silhouettes). In supervised learning, the software does some of the work during the training process, although we must always be wary of false correlations if we provide too many attributes (Kalkomey, 1997). For the prediction of continuous data, such as porosity, Russell (1997) suggests that one begin with exploratory data analysis, whereby one crosscorrelates a candidate attribute with the desired property at the well. Such crosscorrelation does not work well when trying to identify seismic facies, which are labeled with an integer number or an alphanumeric name.

Figure 9. A map showing the location of the 3D seismic survey acquired over the Canterbury Basin, offshore New Zealand. The black rectangle denotes the limits of the Waka 3D survey, whereas the smaller red rectangle denotes the part of the survey shown in subsequent figures. The colors represent the relative depth of the current seafloor, warm colors being shallower and cold colors being deeper. The current seafloor canyons delineated in this map are good analogs for the paleocanyons of Cretaceous and Tertiary age (modified from Mitchell and Neil, 2012).

Figure 8. Cartoon showing how an SVM can map a linearly inseparable problem into a higher-dimensional space, in which the classes can be separated. (a) Circular classes A and B in a 2D space cannot be separated by a linear decision boundary (a line). (b) Mapping the same data into a higher 3D feature space using the given projection allows the two classes to be separated by the green plane.


Figure 10. Time slice at t = 1.88 s through the seismic amplitude volume. White arrows indicate potential channel/canyon features. The yellow arrow indicates a high-amplitude feature. Red arrows indicate a relatively low-energy, gently dipping area. AA′ denotes the cross section shown in Figure 14.

Table 1. Attribute expressions of seismic facies. Each facies is listed with its appearance to the interpreter, followed by the corresponding attribute expression.

Levee
  Structurally high: stronger dome- or ridge-shaped structural components
  Locally continuous: higher GLCM homogeneity and lower GLCM entropy
  Higher amplitude: dome- or ridge-shaped component
  Possibly thicker: lower peak spectral frequency

Channel thalwegs
  Shale filled with negative compaction: stronger bowl- or valley-shaped structural components and higher peak spectral frequency
  Sand filled with positive compaction: stronger dome- or ridge-shaped structural components and lower peak spectral frequency

Channel flanks
  Onlap onto incisement and canyon edges: higher reflector convergence magnitude

Gas-charged sands
  High amplitude and continuous reflections: higher GLCM homogeneity, lower GLCM entropy, and high peak magnitude

Incised floodplain
  Erosional truncation: higher reflector convergence magnitude and higher curvedness

Floodplain
  Lower amplitude: lower spectral magnitude
  Higher frequency: higher peak spectral frequency
  Continuous: higher GLCM homogeneity and lower GLCM entropy
  Near-planar events: lower-amplitude structural shape components and lower reflector convergence magnitude

Slumps
  Chaotic reflectivity: higher reflector convergence magnitude, higher spectral frequency, lower GLCM homogeneity, and higher GLCM entropy


Table 1 summarizes how we four interpreters perceive each of the seismic facies of interest. Once we enumerate the seismic expression, the quantification using attribute expression is relatively straightforward. In general, amplitude and frequency attributes are lithology indicators and may provide direct hydrocarbon detection in conventional reservoirs; geometric attributes delineate reflector morphology, such as dip, curvature, rotation, and convergence; whereas statistical and texture attributes provide information about the data distribution that quantifies subtle patterns that are hard to define (Chopra and Marfurt, 2007). Attributes such as coherence provide images of the edges of seismic facies rather than a measure of the facies themselves, although slumps often appear as a suite of closely spaced faults separating rotated fault blocks. Finally, what we see as interpreters and what our clustering algorithms see can be quite different. Although we may see a slump feature as exhibiting a high number of faults per kilometer, our clustering algorithms are applied voxel by voxel and see only the local behavior. Extending the clustering to see such large-scale textures requires the development of new texture attributes.

The number of attributes should be as small as possible to discriminate the facies of interest, and each attribute should be mathematically independent of the others. Although it may be fairly easy to represent three attributes with a deformed 2D manifold, increasing the dimensionality results in increased deformation, such that our manifold may fold on itself or may not accurately represent the increased data variability. Because the Waka 3D survey is new to all four authors, we tested numerous attributes that we thought might highlight different facies in the turbidite system. Among these attributes, we find the shape index to be good for visual classification, but it dominates the unsupervised classifications with valley and ridge features across the survey. After such analysis, we chose four attributes that are mathematically independent but should be coupled through the underlying geology: peak spectral frequency, peak spectral magnitude, GLCM homogeneity, and curvedness. The peak spectral frequency and peak spectral magnitude form an attribute pair that crudely represents the spectral response. The peak frequency of spectrally whitened data is sensitive to the tuning thickness, whereas the peak magnitude is a function of the tuning thickness and the impedance contrast. GLCM homogeneity is a texture attribute that has a high value for adjacent traces with similar (high or low) amplitudes and measures the continuity of a seismic facies. Curvedness defines the magnitude of reflector structural or stratigraphic deformation, with dome-, ridge-, saddle-, valley-, and bowl-shaped features exhibiting high curvedness and planar features exhibiting zero curvedness.

Figure 11. Time slice at t = 1.88 s through peak spectral frequency corendered with peak spectral magnitude, emphasizing the relative thickness and reflectivity of the turbidite system and the surrounding slope fan sediments into which it was incised. The two attributes are computed using a continuous wavelet transform algorithm. The edges of the channels are delineated by Sobel filter similarity.


Figure 10 shows a time slice at t = 1.88 s through the seismic amplitude volume, on which we identify channels (white arrows), high-amplitude deposits (yellow arrows), and slope fans (red arrows). Figure 11 shows an equivalent time slice through the peak spectral frequency corendered with the peak spectral magnitude, which emphasizes the relative thickness and reflectivity of the turbidite system, as well as the surrounding slope fan sediments into which it was incised. The edges of the channels are delineated by Sobel filter similarity. We show equivalent time slices through GLCM homogeneity (Figure 12) and corendered shape index and curvedness (Figure 13). In Figure 14, we show a representative vertical slice along line AA′ in Figure 10 cutting through the channels, through (Figure 14a) seismic amplitude, (Figure 14b) seismic amplitude corendered with peak spectral magnitude and peak spectral frequency, (Figure 14c) seismic amplitude corendered with GLCM homogeneity, and (Figure 14d) seismic amplitude corendered with shape index and curvedness. White arrows indicate incised valleys, yellow arrows indicate high-amplitude deposits, and red arrows indicate a slope fan. We note that several of the incised valleys are visible at time slice t = 1.88 s.

In a conventional interpretation workflow, the geoscientist would examine each of these attribute images and integrate them within a depositional framework. Such interpretation takes time and may be impractical for extremely large data volumes. In contrast, in seismic facies classification, the computer either attempts to classify what it sees as distinct seismic facies (in unsupervised learning) or attempts to emulate the interpreter's classification made on a finite number of vertical sections, time slices, and/or horizon slices and apply the same classification to the full 3D volume (in supervised learning). In both cases, the interpreter needs to validate the final classification to determine whether the clusters represent seismic facies of interest. In our example, we will use Sobel filter similarity to separate the facies, and we then evaluate how they fit within our understanding of a turbidite system.

Application

Given these four attributes, we now construct 4D attribute vectors as input to the previously described classification algorithms. To better illustrate the performance of each algorithm, we summarize the data size, number of computational processors, and runtime in Table 2. All of the algorithms were developed by the authors except the ANN, which is implemented using the MATLAB toolbox.
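As an illustration of assembling such 4D vectors, the sketch below (ours; the random stand-in arrays replace the actual attribute volumes) stacks the four attributes and z-scores each one so that differing physical units do not dominate the distance computations that follow.

```python
import numpy as np

rng = np.random.default_rng(0)
n_voxels = 1_000_000            # stand-in for the survey's sample count

# Stand-ins for the four flattened attribute volumes (one value per voxel);
# in practice, these would be read from the computed attribute files.
peak_frequency   = rng.normal(40.0, 10.0, n_voxels)
peak_magnitude   = rng.lognormal(0.0, 1.0, n_voxels)
glcm_homogeneity = rng.uniform(0.0, 1.0, n_voxels)
curvedness       = rng.lognormal(-2.0, 1.0, n_voxels)

# Stack into (n_voxels, 4) attribute vectors and z-score each attribute
# so that differing units do not let one attribute dominate distances.
X = np.column_stack([peak_frequency, peak_magnitude,
                     glcm_homogeneity, curvedness])
X = (X - X.mean(axis=0)) / X.std(axis=0)
```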

Figure 12. Time slice at t = 1.88 s through the GLCM homogeneity attribute corendered with Sobel filter similarity. Bright colors highlight areas with potential fan sand deposits.


Figure 13. Time slice at t = 1.88 s through the corendered shape index, curvedness, and Sobel filter similarity. The shape index highlights incisement, channel flanks, and levees, providing an excellent image for interactive interpreter-driven classification. However, the shape index dominates the unsupervised classifications, highlighting valley and ridge features and minimizing more planar features of interest in the survey.

Figure 14. Vertical slices along line AA′ (location shown in Figure 10) through (a) seismic amplitude, (b) seismic amplitude corendered with peak spectral magnitude and peak spectral frequency, (c) seismic amplitude corendered with GLCM homogeneity, and (d) seismic amplitude corendered with shape index and curvedness. White arrows indicate incised channel and canyon features. The yellow arrow indicates a high-amplitude reflector. Red arrows indicate relatively low-amplitude, gently dipping areas.


We begin with the k-means. As previously discussed, a limitation of k-means is the lack of any structure in the cluster number selection process. We illustrate this limitation by computing k-means with 16 (Figure 15) and 256 (Figure 16) clusters. In Figure 15, we can identify high-amplitude overbank deposits (yellow arrows), channels (white arrows), and slope fan deposits (red arrows). A main limitation of k-means is that there is no structure linking the clusters, which leads to a somewhat random choice of color assignment to the clusters. This problem becomes more serious when more clusters are selected: The result with 256 clusters (Figure 16) is so chaotic that we can rarely separate the overbank high-amplitude deposits (yellow arrows) from the slope fan deposits (red arrows) that were easily separable in Figure 15. For this reason, modern k-means applications focus on estimating the correct number of clusters in the data.

Table 2. Classification settings and runtimes.

Algorithm   Classes   MPI processors*   Training samples   Total samples    Training (s)   Applying to entire data set (s)   Total (s)
k-means     16        50                809,600            101,200,000      65             20                                85
k-means     256       50                809,600            101,200,000      1060           70                                1130
SOM         256       1                 809,600            101,200,000      4125           6035                              10,160
GTM         -         50                809,600            101,200,000      9582           1025                              10,607
ANN**       4         1                 437                101,200,000      2              304                               306
SVM         4         50                437                101,200,000      24             12,092                            12,116

* SOM is not run under MPI in our implementation. ANN is run using MATLAB and is not under MPI. The other three are run under MPI when applying the model to the entire data set.
** ANN is implemented using the MATLAB toolbox.

Figure 15. Time slice at t = 1.88 s through a k-means classification volume with K = 16. White arrows indicate channel-like features. Yellow arrows indicate high-amplitude overbank deposits. Red arrows indicate possible slope fans. The edges of the channels are delineated by Sobel filter similarity.


Figure 16. Time slice at t = 1.88 s through a k-means classification volume with K = 256. The classification result follows the same pattern as K = 16, but it is more chaotic because the classes are computed independently and are not constrained to fall on a lower dimensional manifold. Note the similarity between clusters of high-amplitude overbank (yellow arrows) and slope fan deposits (red arrows), which were separable in Figure 15.

Figure 17. Time slice at t = 1.88 s of the first two principal components plotted against a 2D color bar. These two principal components serve as the initial model for the SOM and GTM images that follow. With each iteration, the SOM and GTM manifolds will deform to better fit the natural clusters in the input data.


Figure 18. Time slice at t = 1.88 s through an SOM classification volume using 256 clusters. White arrows indicate channel-like features. Combined with vertical sections through seismic amplitude, we interpret overbank deposits (yellow arrows), crevasse splays (orange arrows), and slope fan deposits (red arrows). The data are mapped to a 2D manifold initialized by the first two principal components and are somewhat more organized than the k-means images shown in the previous figures.

Figure 19. Time slice at t = 1.88 s through crossplotted GTM projections 1 and 2 using a 2D color bar. White arrows indicate channel-like features, yellow arrows indicate overbank deposits, and red arrows indicate slope fan deposits. The blue arrow indicates a braided channel system that can be seen on PCA but cannot be identified from k-means or SOM classification maps. The color indicates the location of the mean probability of each data vector mapped into the 2D latent space.



In contrast to k-means, SOM restricts the cluster centers to lie on a deformed 2D manifold. Although clusters may move closer or further apart, they still form (in our implementation) a deformed quadrilateral mesh, which maps to a rectangular mesh on the 2D latent space. Mapping the latent space to a continuous 1D (Coleou et al., 2003) or 2D color bar (Strecker and Uden, 2002) reduces the sensitivity to the number of clusters chosen. We follow Gao (2007) and avoid guessing at the number of clusters necessary to represent the data by overdefining the number of prototype vectors to be 256 (the limit of color levels in our commercial display software). These 256 prototype vectors (potential clusters) reduce to only three or four distinct natural clusters through the SOM neighborhood training criteria. The 2D SOM manifold is initialized using the first two principal components, defining a plane through the N-dimensional attribute space (Figure 17). The algorithm then deforms the manifold to better fit the data. Overdefining the number of prototype vectors results in clumping into a smaller number of natural clusters. These clumped prototype vectors project onto adjacent locations in the latent space and, therefore, appear as subtle shades of the same color, as indicated by the limited palette of 256 colors shown in Figure 18.

In the classification result shown in Figure 18, we can clearly identify the green-colored spillover deposits (yellow arrows). The difference between channel fill (white arrows) and slope fans (red arrows) is insignificant. However, by corendering with similarity, the channels are delineated nicely, allowing us to visually distinguish channel fills from the surrounding slope fans. We can also identify some purple clusters (orange arrows), which we interpret to be crevasse splays.

Next, we apply GTM to the same four attributes. We compute two "orthogonal" projections of the data onto the manifold and then onto the two dimensions of the latent space. Rather than define explicit clusters, we project the mean a posteriori probability distribution onto the 2D latent space and then export the projection onto the two latent space axes. We crossplot the projections along axes 1 and 2 and map them against a 2D color bar (Figure 19). In this slice, we see channels delineated by purple colors (white arrows), point bar and crevasse splays in pinkish colors (yellow arrows), and slope fans in lime green colors (red arrows). We can also identify some thin, braided channels at the south end of the survey (blue arrow). As in the SOM result, similarity separates the incised valleys from the slope fans.

Figure 20. The same time slice through the GTM projections shown in the previous image but now displayed as four seismic facies. To do so, we first create two GTM "components" aligned with the original first two principal components. We then pick four colored polygons representing four seismic facies on the histogram generated using a commercial crossplot tool. This histogram is a map of the GTM posterior probability distribution in the latent space. The yellow polygon represents overbank deposits, the blue polygon represents channels/canyons, the green polygon represents slope fan deposits, and the red polygon represents everything else.


However, the geologic meaning of the orange-colored facies is somewhat vague. This is the nature of unsupervised learning techniques, in that the clusters represent topological differences in the input data vectors, which are not necessarily the facies differences we wish to delineate. We can ameliorate this shortcoming by adding a posteriori supervision to the GTM manifold. The simplest way to add supervision is to compute the average attribute vector about a given seismic facies and map it to the GTM crossplot. Then, the interpreter can manually define clusters on the 2D histogram by constructing one or more polygons (Figure 20), where we cluster the data into four facies: multistoried channels (blue), high-energy point bar and crevasse splay deposits (yellow), slope fans (green), and "everything else" (red). A more quantitative methodology is to mathematically project these average clusters onto the manifold and then cross-multiply the probability distribution of the control vectors against the probability distribution function of each data vector, thereby forming the Bhattacharyya distance (Roy et al., 2013, 2014). Such measures then provide a probability ranging between 0% and 100% as to whether the data vector at any seismic sample point is like the data vectors about well control (Roy et al., 2013, 2014) or like the average data vector within a facies picked by the interpreter.
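Under our reading of Roy et al. (2013, 2014), such a probability can be sketched as a Bhattacharyya coefficient between two discrete posterior distributions over the K latent-space grid points; the two distributions below are random placeholders.

    # Sketch of Bhattacharyya-based supervision (our reading of Roy et al.,
    # 2013, 2014); the two posterior distributions are random placeholders.
    import numpy as np

    def bhattacharyya_coefficient(p, q):
        # p, q: discrete probabilities over the K latent-space grid points.
        # Returns a similarity between 0 (disjoint) and 1 (identical).
        return np.sum(np.sqrt(p * q))

    K = 256
    rng = np.random.default_rng(1)
    p = rng.random(K); p /= p.sum()   # posterior of one data vector
    q = rng.random(K); q /= q.sum()   # posterior of the facies-average vector

    similarity = bhattacharyya_coefficient(p, q)   # 0-1, i.e., 0%-100%
    distance = -np.log(similarity)                 # Bhattacharyya distance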

The a posteriori supervision added to GTM is the critical prior supervision necessary for supervised classification, such as ANN and SVM. In this study, we used the same four attributes as input for the unsupervised and supervised learning techniques.

Figure 21. Time slice at t = 1.88 s through the corendered peak spectral frequency, peak spectral magnitude, and Sobel filter similarity volumes. Seed points (training data) are shown with colors for the four picked facies: blue, multistoried channels; yellow, point bars and crevasse splays; red, channel flanks; and green, slope fans. Attribute vectors at these seed points are used as training data in supervised classification.

Figure 22. PNN errors through the training epochs. The neural network reaches its best performance at epoch 42.


Our supervision consists of picked seed points for the three main facies previously delineated using the unsupervised classification results (multistoried channels, point bar and crevasse splay deposits, and slope fans), plus an additional channel flank facies. The seed points are shown in Figure 21. Seed points should be picked with great caution to correctly represent the corresponding facies; any false pick (a seed point that does not belong to the intended facies) will greatly compromise the classification result. We then compute averages of the four input attributes within a 7 trace × 7 trace × 24 ms window about each seed point to generate a training table, which consists of 4D input attribute vectors and 1D targets (the labeled facies).
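A minimal sketch of this training-table construction follows; the attribute volume, seed picks, sampling interval, and window half-widths are hypothetical stand-ins.

    # Sketch of the training-table construction (hypothetical volume, picks,
    # and 2 ms sampling; the window mimics 7 x 7 traces x 24 ms).
    import numpy as np

    rng = np.random.default_rng(2)
    volume = rng.normal(size=(4, 200, 200, 500))  # (attribute, inline, xline, sample)

    # Each pick: (inline, crossline, time sample, facies label 0-3).
    seed_points = [(50, 60, 120, 0), (80, 90, 200, 1), (120, 40, 300, 2)]
    half_il, half_xl, half_t = 3, 3, 6            # +/-3 traces, +/-6 samples

    X, y = [], []
    for il, xl, t, label in seed_points:
        window = volume[:, il - half_il:il + half_il + 1,
                           xl - half_xl:xl + half_xl + 1,
                           t - half_t:t + half_t + 1]
        X.append(window.reshape(4, -1).mean(axis=1))  # 4D mean attribute vector
        y.append(label)                               # 1D facies target
    X, y = np.array(X), np.array(y)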

For our ANN application, we used the neural networks toolbox in MATLAB and generated a PNN composed of 20 neurons. Because of the relatively small size of the training data, the training process took only a second or so; however, because a PNN may converge to a local minimum, we are not confident that our first trained network has the best performance. Our workflow is therefore to rerun the training process 50 times and choose the network exhibiting the lowest training and cross-validation errors. Figures 22 and 23 show the PNN performance during training, whereas Figure 24 shows the PNN classification result.

Figure 23. Confusion tables for the same PNN shown in Figure 22. From these tables, we find the training correctness to be 90% and the testing and cross-validation correctness to be 86% and 91%, respectively, warranting a reliable prediction.

Figure 24. Time slice at t = 1.88 s through the ANN classification result. White arrows indicate channels/canyons. Yellow arrows indicate point bars and crevasse splays.


We notice that the training, testing, and cross-validation performances are all acceptable, with the training and cross-validation correctness being approximately 90% and the testing correctness being more than 86%. We identify blue channel stories within the relatively larger scale incised valleys (white arrows) and yellow point bars and crevasse splays (yellow arrows). However, many of the slope fan deposits are now classified as channel flanks or multistoried channels (blue arrows), which need to be further calibrated with well-log data. Nevertheless, as a supervised learning technique, ANN provides classification with explicit geologic meaning, which is its primary advantage over unsupervised learning techniques.
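The PNN itself was trained with MATLAB's neural networks toolbox; purely to illustrate the multi-restart strategy described above, the sketch below retrains a small scikit-learn network 50 times and keeps the run with the best validation score. The synthetic 437-sample table and the network type are stand-ins, not our MATLAB PNN.

    # Multi-restart training sketch (illustrative; the paper used MATLAB's
    # PNN, whereas this uses a small scikit-learn network on synthetic data).
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.neural_network import MLPClassifier

    # Stand-in for the 437-sample, 4-attribute, 4-facies training table.
    X, y = make_classification(n_samples=437, n_features=4, n_informative=4,
                               n_redundant=0, n_classes=4, random_state=0)
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2,
                                                      random_state=0)

    best_net, best_score = None, -1.0
    for seed in range(50):                 # 50 restarts, as described above
        net = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000,
                            random_state=seed).fit(X_train, y_train)
        score = net.score(X_val, y_val)    # keep the best validation run
        if score > best_score:
            best_net, best_score = net, score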

Finally, we cluster our 4D input data using SVM, using the same training data (interpreter picks) as for ANN. The workflow is similar to that of ANN, in that we ran 20 passes of training, varying the Gaussian kernel standard deviation σ and the misclassification tolerance ε for each pass. These parameter choices are easier than selecting the number of neurons for ANN because the SVM algorithm solves a convex optimization problem that converges to a global minimum. The training and cross-validation performance is comparable with that of the ANN, with roughly 92% training correctness and 85% cross-validation correctness. Figure 25 shows the SVM classification result at time t = 1.88 s.

Figure 25. Time slice at t = 1.88 s through the SVM classification result. White arrows indicate more correctly classified slope fans. The yellow arrow indicates crevasse splays. Red arrows show misclassifications due to a possible acquisition footprint.

Figure 26. Time slice at t = 1.88 s through the inline dip component of reflector dip. The inline dip magnitude provides a photolike image of the paleocanyons.


The SVM map follows the same pattern as we have seen on the ANN map, but it is generally cleaner, with some differences in detail. Compared with ANN, SVM successfully mapped more of the slope fans (white arrows), but it missed some crevasse splays that were correctly picked by ANN (yellow arrow). We also see a great amount of facies variation within the incised valleys, which is reasonable because the multiple course changes of a paleochannel during its deposition result in multiple channel stories. Finally, we note some red lines following a northwest–southeast direction (red arrows), which correspond to the acquisition footprint.
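For illustration, a rough scikit-learn analogue of the SVM parameter scan follows; SVC's RBF gamma and regularization C only loosely correspond to the σ and ε named above, and the 4 × 5 grid of values is hypothetical.

    # SVM parameter-scan sketch (illustrative; SVC's gamma and C only loosely
    # correspond to the sigma and epsilon of the text).
    from sklearn.datasets import make_classification
    from sklearn.model_selection import GridSearchCV
    from sklearn.svm import SVC

    # Stand-in for the 437-sample seed-point training table.
    X_train, y_train = make_classification(n_samples=437, n_features=4,
                                           n_informative=4, n_redundant=0,
                                           n_classes=4, random_state=1)

    # A 4 x 5 grid gives 20 parameter combinations, echoing our 20 passes.
    param_grid = {"gamma": [0.01, 0.1, 1.0, 10.0],
                  "C": [0.1, 1.0, 10.0, 100.0, 1000.0]}
    search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
    search.fit(X_train, y_train)
    best_svm = search.best_estimator_   # then apply to the entire volume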

Conclusions

In this paper, we have compared and contrasted

some of the more important multiattribute facies classification tools, including four unsupervised (PCA, k-means, SOM, and GTM) and two supervised (ANN and SVM) learning techniques. In addition to highlighting the differences in assumptions and implementation, we have applied each method to the same Canterbury Basin survey, with the goal of delineating seismic facies in a turbidite system, to demonstrate the effectiveness and weaknesses of each method. The k-means and SOM move the user-defined number of cluster centers toward the input data vectors. PCA is the simplest manifold method, where the data variability in our examples is approximated by a 2D plane defined by the first two eigenvectors. GTM is more accurately described as a mapping technique, like PCA, where the clusters are formed either in the human brain as part of visualization or through crossplotting and the construction of polygons. The SOM and GTM manifolds deform to fit the N-dimensional data. In SOM, the cluster centers (prototype vectors) move along the manifold toward the data vectors, forming true clusters. In all four methods, any labeling of a given cluster to a given facies happens after the process is completed. In contrast, ANN and SVM build a specific relation between the input data vectors and a subset of user-labeled input training data vectors, thereby explicitly labeling the output clusters to the desired facies. Supervised learning is constructed from a limited group of training samples (usually at certain well locations or manually picked seed points), which generally are insufficient to represent all the lithologic and stratigraphic variations within a relatively large seismic data volume. A pitfall of supervised learning is that unforeseen clusters will be misclassified as clusters that have been chosen.

For this reason, unsupervised classification products can be used to construct not only an initial estimate of the number of classes, but also a validation tool to determine if separate clusters have been incorrectly lumped together. We advise computing unsupervised SOM or GTM prior to picking seed points for subsequent supervised learning, to clarify the topological differences mapped by our choice of attributes. Such mapping will greatly improve the picking confidence because the seed points are then confirmed by both human experience and mathematical statistics.

The choice of the correct suite of attributes is critical. Specifically, images that are ideal for multiattribute visualization may be suboptimal for clustering. We made several poor choices in previous iterations of writing this paper. The image of the inline (southwest–northeast) structural dip illustrates this problem directly. Although a skilled interpreter sees a great deal of detail in Figure 26, there is no clear facies difference between positive and negative dips, such that this component of vector dip cannot be used to differentiate them. A better choice would be dip magnitude, except that a long-wavelength overprint (such as descending into the basin) would again bias our clustering in a manner that is unrelated to facies. Therefore, we tried to use relative changes in dip: curvedness and shape indices, which measure lateral changes in dip, and reflector convergence, which differentiates conformal from nonconformal reflectors.

Certain attributes should never be used in clustering. Phase, azimuth, and strike have circular distributions, in which a phase value of −180 indicates the same value as +180; no trend can be found. Although the shape index s is not circular, ranging between −1 and +1, its histogram has a peak about the ridge (s = +0.5) and about the valley (s = −0.5). We speculate that shape components may be more amenable to classification. Reflector convergence follows the same pattern as curvedness. For this reason, we used only curvedness as a representative of these three attributes. The addition of this choice improved our clustering.

Edge attributes such as Sobel filter similarity and coherence are not useful for the example shown here; instead, we have visually added them as an edge "cluster" corendered with the images shown in Figures 15–21, 24, and 25. In contrast, when analyzing more chaotic features, such as salt domes and karst collapse, coherence is a good input to clustering algorithms. We do wish to provide an estimate of continuity and randomness to our clustering; to do so, we use GLCM homogeneity as an input attribute.

Theoretically, no one technique is superior to all the others in every aspect; each technique has its inherent advantages and defects. The k-means algorithm with a relatively small number of clusters is the easiest to implement, and it provides rapid interpretation, but it lacks any relation among clusters. SOM provides a generally more "interpreter-friendly" clustering result with topological connections among clusters, but it is computationally more demanding than k-means. GTM relies on probability theory and enables the interpreter to add a posteriori supervision by manipulating the data's posterior probability distribution; however, it is not widely accessible to the exploration geophysics community. Rather than displaying the conventional cluster numbers (or labels), we suggest displaying the cluster coordinates projected onto the 2D SOM and GTM latent space axes. Doing so not only provides greater flexibility in constructing a 2D color bar, but it also provides data that can be further manipulated using 2D crossplot tools.



For the two supervised learning techniques, ANN suffers from convergence problems and requires expertise to achieve optimal performance, although its computational cost is relatively low. SVM is mathematically more robust and easier to train, but it is more computationally demanding.

Practically, if no software limitations are imposed, we can make suggestions on how an interpreter can incorporate these techniques to facilitate seismic facies interpretation at different exploration and development stages. To identify the main features in a recently acquired 3D seismic survey on which limited to no traditional structural interpretation has been done, k-means is a good candidate for exploratory classification, starting with a small K (typically K = 4) and gradually increasing the number of classes. As more data are acquired (e.g., well-log data and production data) and detailed structural interpretation has been performed, SOM or GTM focusing on the target formations will provide a more refined classification, which needs to be calibrated with wells. In the development stage, when most of the data have been acquired, ANN and SVM with proper training provide targeted products, characterizing the reservoir by mimicking the interpreter's behavior. Generally, SVM provides superior classification to ANN but at a considerably higher computational cost, so choosing between these two requires balancing performance against runtime. As a practical matter, no given interpretation software platform provides all five of these clustering techniques, such that many of the choices are based on software availability.

Because we wish this paper to serve as an inspiration for interpreters, we do want to reveal one drawback of our work: All the classifications are performed volumetrically rather than along a certain formation. Such classification may be biased by the bounding formations above and below the target formation (if we do have a target formation), thereby contaminating the facies map. However, we want to make the point that such classification can happen at a very early stage of interpretation, when structural interpretation and well logs are very limited. Even in such a situation, we can still use classification techniques to generate facies volumes to assist subsequent interpretation.

In the 1970s and 1980s, much of the geophysical innovation in seismic processing and interpretation was facilitated by the rapid evolution of computer technology, from mainframes to minicomputers to workstations to distributed processing. We believe that similar advances in facies analysis will be facilitated by the rapid innovation in "big data" analysis, driven by the needs of marketing and security. Although we may not answer Turing's question, "Can machines think?" we will certainly be able to teach them how to emulate a skilled human interpreter.

Acknowledgments

We thank New Zealand Petroleum and Minerals for providing the Waka 3D seismic data to the public. Financial support for this effort and for the development of our own k-means, PCA, SOM, GTM, and SVM algorithms was provided by the industry sponsors of the Attribute-Assisted Seismic Processing and Interpretation Consortium at the University of Oklahoma. The ANN examples were run using MATLAB. We thank our colleagues F. Li, S. Verma, L. Infante, and B. Wallet for their valuable input and suggestions. All the 3D seismic displays were made using licenses to Petrel, provided to the University of Oklahoma for research and education courtesy of Schlumberger. Finally, we want to express our great respect to the people who have contributed to the development of pattern recognition techniques in the exploration geophysics field.

Appendix A

Mathematical details

In this appendix, we summarize many of the mathematical details defining the various algorithm implementations. Although insufficient to allow a straightforward implementation of each algorithm, we hope to more quantitatively illustrate the algorithmic assumptions, as well as algorithmic similarities and differences. Because k-means and ANNs have been widely studied, in this appendix we give only some principal statistical background and brief reviews of the SOM, GTM, and PSVM algorithms involved in this tutorial. We begin this appendix by giving statistical formulations of the covariance matrix, principal components, and the Mahalanobis distance when applied to seismic attributes. We further illustrate the formulations and some necessary theory for SOM, GTM, ANN, and PSVM. Because of the extensive use of mathematical symbols and notations, a table of shared mathematical notations is given in Table A-1. All other symbols are defined in the text.

Covariance matrix, principal components, and the Mahalanobis distance

Given a suite of N attributes, the covariance matrix is defined as

C_{mn} = \frac{1}{J} \sum_{j=1}^{J} \left[ a_{jm}(t_j, x_j, y_j) - \mu_m \right] \left[ a_{jn}(t_j, x_j, y_j) - \mu_n \right], \quad (A-1)

where a_{jm} and a_{jn} are the mth and nth attributes, J is the total number of data vectors, and where

\mu_n = \frac{1}{J} \sum_{j=1}^{J} a_{jn}(t_j, x_j, y_j) \quad (A-2)

is the mean of the nth attribute.


If we compute the eigenvalues λ_i and eigenvectors v_i of the real, symmetric covariance matrix C, the ith principal component at data vector j is defined as

p_{ji} = \sum_{n=1}^{N} a_{jn}(t_j, x_j, y_j) \, v_{ni}, \quad (A-3)

where v_{ni} indicates the nth attribute component of the ith eigenvector. In this paper, the first two eigenvectors and eigenvalues are also used to construct an initial model in the SOM and GTM algorithms.

The Mahalanobis distance r_{jq} of the jth sample from the qth cluster center θ_q is defined as

r_{jq}^2 = \sum_{n=1}^{N} \sum_{m=1}^{N} (a_{jn} - \theta_{nq}) \, C_{nm}^{-1} \, (a_{jm} - \theta_{mq}), \quad (A-4)

where the inversion of the covariance matrix C takes place prior to extracting the mnth element.
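A compact numerical sketch of equations A-1 through A-4 follows, assuming a hypothetical attribute matrix with one row per voxel.

    # Numerical sketch of equations A-1 to A-4 (hypothetical attribute matrix).
    import numpy as np

    rng = np.random.default_rng(3)
    a = rng.normal(size=(10000, 4))            # J voxels x N attributes

    mu = a.mean(axis=0)                        # equation A-2
    C = (a - mu).T @ (a - mu) / a.shape[0]     # equation A-1

    # Principal components (equation A-3): project onto the eigenvectors of C.
    lam, v = np.linalg.eigh(C)                 # eigenvalues in ascending order
    p = a @ v[:, ::-1]                         # column 0 = first component

    # Squared Mahalanobis distance (equation A-4) to a cluster center theta.
    theta = a[:100].mean(axis=0)               # hypothetical cluster center
    d = a - theta
    r2 = np.einsum("jn,nm,jm->j", d, np.linalg.inv(C), d)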

Self-organizing maps

Rather than computing the Mahalanobis distance, the SOM and GTM first normalize the data using a Z-scale. If the data exhibit an approximately Gaussian distribution, the Z-scale of the nth attribute is obtained by subtracting the mean and dividing by the standard deviation (the square root of the diagonal of the covariance matrix, C_{nn}). To Z-scale non-Gaussian distributed data, such as coherence, one needs to first transform the data using histograms so that they approximate a Gaussian. The objective of the SOM algorithm is to map the input seismic attributes onto a geometric manifold called the self-organized map. The SOM manifold is defined by a suite of prototype vectors m_k lying on a lower dimensional (in our case, 2D) surface, which fit the N-dimensional attribute data. The prototype vectors m_k are typically arranged in 2D hexagonal or rectangular structured maps that preserve their original neighborhood relationship, such that neighboring prototype vectors represent similar data vectors. The number of prototype vectors in the 2D map determines the effectiveness and generalization of the algorithm. One strategy is to estimate the number of initial clusters and then to either divide or join clusters based on distance criteria. In our case, we follow Gao (2007) and overdefine the number of clusters to be the maximum number of colors supported by our visualization software. Interpreters then either use their color perception or construct polygons on 2D histograms to define a smaller number of clusters.

Our implementation of the SOM algorithm is summarized in Figure A-1. After computing Z-scores of the input data, the initial manifold is defined to be a plane spanned by the first two principal components. Prototype vectors m_k are defined on a rectangular grid aligned with the first two eigenvectors, ranging between ±2λ_1^{1/2} and ±2λ_2^{1/2}. The seismic attribute data are then compared with each of the prototype vectors to find the nearest one. This prototype vector and its nearest neighbors (those that fall within a range σ, defining a Gaussian perturbation) are moved toward the data point. After all the training vectors have been examined, the neighborhood radius σ is reduced. Iterations continue until σ approaches the distance between the original prototype vectors. Given this background, Kohonen (2001) defines the SOM training algorithm using the following five steps:

1) Randomly choose a previously Z-scored input attribute vector a_j from the set of input vectors.

2) Compute the Euclidean distance between this vector a_j and all prototype vectors m_k, k = 1, 2, ..., K. The prototype vector that has the minimum distance to the input vector a_j is defined to be the "winner," or the best matching unit, m_b:

\| a_j - m_b \| = \min_k \{ \| a_j - m_k \| \}. \quad (A-5)

Table A-1. List of shared mathematical symbols.

n and N: attribute index and number of attributes
j and J: voxel (attribute vector) index and number of voxels
k and K: manifold index and number of grid points
a_j: the jth attribute data vector
p: matrix of principal components
C: attribute covariance matrix
μ_n: mean of the nth attribute
λ_m and v_m: the mth eigenvalue and eigenvector pair
m_k: the kth grid point lying on the manifold (prototype vector for SOM, or Gaussian center for GTM)
u_k: the kth grid point lying on the latent space
r_jk: the Mahalanobis distance between the jth data vector and the kth cluster center or manifold grid point
I: identity matrix of dimension defined in the text


3) Update the winner prototype vector and its neighbors. The updating rule for the weight of the kth prototype vector inside and outside the neighborhood radius σ(t) is given by

m_k(t+1) = \begin{cases} m_k(t) + \alpha(t) \, h_{bk}(t) \, [a_j - m_k(t)], & \text{if } \| r_k - r_b \| \le \sigma(t), \\ m_k(t), & \text{if } \| r_k - r_b \| > \sigma(t), \end{cases} \quad (A-6)

where the neighborhood radius σ(t) is predefined for a problem and decreases with each iteration t. Here, r_b and r_k are the position vectors of the winner prototype vector m_b and the kth prototype vector m_k, respectively. We also define the neighborhood function h_{bk}(t), the exponential learning function α(t), and the length of training T. Both h_{bk}(t) and α(t) decrease with each iteration in the learning process and are defined as

h_{bk}(t) = e^{-\| r_b - r_k \|^2 / 2\sigma^2(t)} \quad (A-7)

and

\alpha(t) = \alpha_0 \left( \frac{0.005}{\alpha_0} \right)^{t/T}. \quad (A-8)

4) Iterate through each learning step (steps 1-3) until the convergence criterion (which depends on the predefined lowest neighborhood radius and the minimum distance between the prototype vectors in the latent space) is reached.

5) Project the prototype vectors onto the first two principal components and color code using a 2D color bar (Matos et al., 2009). A minimal numerical sketch of this training loop follows these steps.
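The sketch below implements steps 1 through 4 for a rectangular map, transcribed from equations A-5 through A-8; the grid size, schedules, and random initialization (rather than the PCA plane described above) are simplifying assumptions, and the projection of step 5 is omitted.

    # Minimal SOM training loop following equations A-5 to A-8 (random
    # initialization and schedules are simplifications; step 5 is omitted).
    import numpy as np

    rng = np.random.default_rng(4)
    a = rng.normal(size=(5000, 4))             # Z-scored attribute vectors

    side = 16                                  # 16 x 16 = 256 prototypes
    grid = np.array([(i, j) for i in range(side) for j in range(side)], float)
    m = rng.normal(size=(side * side, a.shape[1]))  # prototype vectors m_k

    T, alpha0, sigma = 20000, 0.5, side / 2.0
    for t in range(T):
        aj = a[rng.integers(len(a))]                    # step 1: random vector
        b = np.argmin(np.sum((m - aj) ** 2, axis=1))    # step 2: winner (A-5)
        dist2 = np.sum((grid - grid[b]) ** 2, axis=1)   # latent-space distances
        h = np.exp(-dist2 / (2.0 * sigma ** 2))         # neighborhood (A-7)
        alpha = alpha0 * (0.005 / alpha0) ** (t / T)    # learning rate (A-8)
        near = dist2 <= sigma ** 2                      # update radius (A-6)
        m[near] += alpha * h[near, None] * (aj - m[near])
        sigma = max(side / 2.0 * (1.0 - t / T), 1.0)    # step 4: shrink radius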

Generative topographic mapping

In GTM, the grid points of our 2D deformed manifold in N-dimensional attribute space define the centers, m_k, of Gaussian distributions of variance σ² = β⁻¹. These centers m_k are in turn tied to a 2D latent space, defined by a grid of nodes u_k, through nonlinear basis functions Φ:

m_k = \sum_{m=1}^{M} W_{km} \Phi_m(u_k), \quad (A-9)

where W is a K × M matrix of unknown weights, Φ_m(u_k) is a set of M nonlinear basis functions, the m_k are vectors defining the deformed manifold in the N-dimensional data space, and k = 1, 2, ..., K indexes the grid points arranged on a lower L-dimensional latent space (in our case, L = 2). A noise model (the probability of the existence of a particular data vector a_j given weights W and inverse variance β) is introduced for each measured data vector. The probability density function p is represented by a suite of K radially symmetric N-dimensional Gaussian functions centered about m_k with variance 1/β:

p(a_j | W, \beta) = \sum_{k=1}^{K} \frac{1}{K} \left( \frac{\beta}{2\pi} \right)^{N/2} e^{-\frac{\beta}{2} \| m_k - a_j \|^2}. \quad (A-10)

The prior probabilities of each of these components are assumed to be equal, with a value of 1/K for all data vectors a_j. Figure 4 illustrates the GTM mapping from an L = 2 latent space to the 3D data space.

The probability density model (GTM model) is fit to the data a_j to find the parameters W and β using maximum likelihood estimation. One popular technique used in such parameter estimation is the EM algorithm.

Figure A-1. The SOM workflow.


Using Bayes' theorem and the current values of the GTM model parameters W and β, we calculate the J × K posterior probability, or responsibility, R_{jk}, of each of the K components in latent space for each data vector:

R_{jk} = \frac{ e^{-\frac{\beta}{2} \| m_k - a_j \|^2} }{ \sum_i e^{-\frac{\beta}{2} \| m_i - a_j \|^2} }. \quad (A-11)

Equation A-11 forms the "E-step," or expectation step, in the EM algorithm. The E-step is followed by the maximization, or "M-step," which uses these responsibilities to update the model with a new weight matrix W by solving a set of linear equations (Dempster et al., 1977):

\left( \Phi^T G \Phi + \frac{\alpha}{\beta} I \right) W_{new}^T = \Phi^T R X, \quad (A-12)

where G_{kk} = \sum_{j=1}^{J} R_{jk} are the nonzero elements of the K × K diagonal matrix G, Φ is a K × M matrix with elements Φ_{km} = Φ_m(u_k), X is the matrix of input data vectors a_j, α is a regularization constant to avoid division by zero, and I is the M × M identity matrix.

The updated value of β is given by

\frac{1}{\beta_{new}} = \frac{1}{JN} \sum_{j=1}^{J} \sum_{k=1}^{K} R_{jk} \left\| \sum_{m=1}^{M} W_{km}^{new} \Phi_m(u_k) - a_j \right\|^2. \quad (A-13)

The initialization of W is done so that the initial GTM model approximates the principal components (largest eigenvectors) of the input data a_j. The value of β⁻¹ is initialized from the (L+1)th eigenvalue of the PCA, where L is the dimension of the latent space. In Figure 4, L = 2, such that we initialize β⁻¹ to the third eigenvalue. Figure A-2 summarizes this workflow.
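A single EM iteration of equations A-11 through A-13 can be sketched as follows. Array shapes here follow the standard GTM formulation of Bishop et al. (1998), so the weight matrix in this sketch is M × N; the random basis matrix and initialization are placeholders for the PCA-based start described above.

    # One GTM EM iteration following equations A-11 to A-13. Shapes follow
    # Bishop et al. (1998), so W here is M x N; the basis matrix and the
    # initialization are placeholders for the PCA-based start described above.
    import numpy as np

    rng = np.random.default_rng(5)
    X = rng.normal(size=(2000, 4))            # J x N data matrix (rows = a_j)
    J, N = X.shape
    K, M, alpha, beta = 256, 25, 0.01, 1.0

    Phi = rng.normal(size=(K, M))             # Phi[k, m] = Phi_m(u_k)
    W = 0.1 * rng.normal(size=(M, N))         # weights; centers m_k = (Phi W)[k]

    # E-step (equation A-11): responsibilities of each center for each voxel.
    m = Phi @ W
    d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=-1)   # J x K
    R = np.exp(-0.5 * beta * d2)
    R /= R.sum(axis=1, keepdims=True)

    # M-step (equation A-12): solve the regularized linear system for W.
    G = np.diag(R.sum(axis=0))
    W = np.linalg.solve(Phi.T @ G @ Phi + (alpha / beta) * np.eye(M),
                        Phi.T @ R.T @ X)

    # Update the inverse variance (equation A-13).
    d2 = ((X[:, None, :] - (Phi @ W)[None, :, :]) ** 2).sum(axis=-1)
    beta = J * N / (R * d2).sum()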

Artificial neural networks

ANNs are a class of pattern-recognition algorithms that were derived separately in different fields, such as statistics and artificial intelligence. ANNs are easily accessible to most geophysical interpreters, so we provide only a general workflow of applying an ANN to seismic facies classification, for completeness of this tutorial. The workflow is shown in Figure A-3.

Proximal support vector machines

Because SVMs were originally developed to solve binary classification problems, we begin with a summary of the arithmetic describing a binary PSVM classifier.

Similar to SVM, a PSVM decision condition is defined as (Figure 7)

x^T \omega - \gamma \; \begin{cases} > 0, & x \in X^+, \\ = 0, & x \in X^+ \text{ or } X^-, \\ < 0, & x \in X^-, \end{cases} \quad (A-14)

where x is an N-dimensional attribute vector to be classified, ω is an N × 1 vector that implicitly defines the normal of the decision boundary in the higher-dimensional space, γ defines the location of the decision boundary, and X^+ and X^- indicate the two classes of the binary classification. PSVM solves an optimization problem of the form (Fung and Mangasarian, 2001)

\min_{\omega, \gamma, y} \; \frac{\varepsilon}{2} \| y \|^2 + \frac{1}{2} (\omega^T \omega + \gamma^2), \quad (A-15)

subject to

Figure A-2. The GTM workflow.


D(a\omega - e\gamma) + y = e. \quad (A-16)

In this optimization problem, y is a J × 1 error variable and a is a J × N sample matrix composed of J attribute vectors, which can be divided into two classes, X^+ and X^-. D is a J × J diagonal matrix of labels, with a diagonal composed of +1 for X^+ and -1 for X^-. The ε is a nonnegative parameter. Finally, e is a J × 1 column vector of ones. This optimization problem can be solved via a J × 1 Lagrangian multiplier t:

L(\omega, \gamma, y, t) = \frac{\varepsilon}{2} \| y \|^2 + \frac{1}{2} (\omega^T \omega + \gamma^2) - t^T \left[ D(a\omega - e\gamma) + y - e \right]. \quad (A-17)

By setting the gradients of L to zero, we obtain expressions for ω, γ, and y explicitly in terms of the knowns and t, where t can further be represented by a, D, and ε. Then, by substituting ω in equations A-15 and A-16 with its dual equivalent ω = a^T D t, we arrive at (Fung and Mangasarian, 2001)

\min_{t, \gamma, y} \; \frac{\varepsilon}{2} \| y \|^2 + \frac{1}{2} (t^T t + \gamma^2), \quad (A-18)

subject to

D(a a^T D t - e\gamma) + y = e. \quad (A-19)

Equations A-18 and A-19 provide a more desirable version of the optimization problem because one can now insert kernel methods to solve nonlinear classification problems, made possible by the term a a^T in equation A-19. Using the Lagrangian multiplier again (this time we denote the multiplier as τ), we can minimize the new optimization problem against t, γ, y, and τ.

By setting the gradients with respect to these four variables to zero, we can express t, γ, and y explicitly in terms of τ and other knowns, where τ depends solely on the data matrices. Then, for an N-dimensional attribute vector x, we write the decision conditions as

x^T a^T D t - \gamma \; \begin{cases} > 0, & x \in X^+, \\ = 0, & x \in X^+ \text{ or } X^-, \\ < 0, & x \in X^-, \end{cases} \quad (A-20)

with

t = D K^T D \left( \frac{I}{\varepsilon} + G G^T \right)^{-1} e, \quad (A-21)

\gamma = e^T D \left( \frac{I}{\varepsilon} + G G^T \right)^{-1} e, \quad (A-22)

and

G = D \left[ K \;\; -e \right]. \quad (A-23)

Instead of a, we have K in equations A-21 and A-23, which is a Gaussian kernel function of a and a^T of the form

K(a, a^T)_{ij} = \exp(-\sigma \| a_{i\bullet}^T - a_{j\bullet}^T \|^2), \quad i, j \in [1, J], \quad (A-24)

where σ is a scalar parameter and a_{i\bullet} denotes the ith row of a. Finally, by replacing x^T a^T with its corresponding kernel expression, the decision condition can be written as

K(x^T, a^T) D t - \gamma \; \begin{cases} > 0, & x \in X^+, \\ = 0, & x \in X^+ \text{ or } X^-, \\ < 0, & x \in X^-, \end{cases} \quad (A-25)

Figure A-3. The ANN workflow.


and

K(x^T, a^T)_i = \exp(-\sigma \| x - a_{i\bullet}^T \|^2), \quad i \in [1, J]. \quad (A-26)

The formulations above represent a nonlinear PSVM classifier.
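The sketch below is a direct transcription of equations A-21 through A-26 on a toy two-class data set; the values of ε and σ are arbitrary illustrations.

    # Direct transcription of equations A-21 to A-26 on toy two-class data
    # (epsilon and sigma values are arbitrary illustrations).
    import numpy as np

    def gaussian_kernel(A, B, sigma):
        # K_ij = exp(-sigma * ||A_i - B_j||^2), per equations A-24 and A-26.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-sigma * d2)

    rng = np.random.default_rng(6)
    a = np.vstack([rng.normal(0, 1, (50, 4)), rng.normal(3, 1, (50, 4))])
    labels = np.array([1.0] * 50 + [-1.0] * 50)     # +1 for X+, -1 for X-

    eps, sigma = 1.0, 0.1
    J = len(a)
    D = np.diag(labels)
    e = np.ones((J, 1))
    Kmat = gaussian_kernel(a, a, sigma)
    G = D @ np.hstack([Kmat, -e])                   # equation A-23
    S = np.linalg.inv(np.eye(J) / eps + G @ G.T)
    t = D @ Kmat.T @ D @ S @ e                      # equation A-21
    gamma = float(e.T @ D @ S @ e)                  # equation A-22

    x = rng.normal(0, 1, (1, 4))                    # an unknown sample
    decision = gaussian_kernel(x, a, sigma) @ D @ t - gamma   # equation A-25
    predicted_class = 1.0 if decision.item() > 0 else -1.0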

To extend this binary classifier to handle multiclass classification problems, several strategies have been developed, which generally fall into three categories: "one-versus-all," "one-versus-one," and "all-together." For Q classes, the former two strategies build a suite of binary classifiers individually (Q(Q-1)/2 for the one-versus-one and Q for the one-versus-all algorithm), and we then use these classifiers to construct the final classification decision. The all-together strategy attempts to solve the multiclass problem in one step. Hsu and Lin (2002) find the one-versus-one method to be superior for large problems. There are two particular algorithms for the one-versus-one strategy, namely, the "max wins" (Kreßel, 1999) and directed acyclic graph (Platt et al., 2000) algorithms. Both provide comparable results while surpassing the one-versus-all method in accuracy and computational efficiency.

Our approach uses a classification factor table to assign classes to unknown samples (Figure A-4). The classification factor of an unknown sample point for a certain pilot class "X" is the normalized distance to the binary decision boundary between X and the other class used when generating that binary decision boundary. An example of a classification factor table is shown in Figure A-4; based on this table, the unknown sample point belongs to class "D."
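As a simplified stand-in for the classification factor table, the sketch below applies max-wins voting (Kreßel, 1999) over one-versus-one classifiers, using scikit-learn's SVC in place of the PSVM described above and simple vote counting in place of normalized boundary distances.

    # Max-wins voting over one-versus-one classifiers (Kressel, 1999); a
    # simplified stand-in for our classification factor table, with SVC
    # replacing the PSVM of the text.
    import itertools
    import numpy as np
    from sklearn.svm import SVC

    def one_versus_one_predict(X_train, y_train, x, classes):
        votes = {c: 0 for c in classes}
        for ci, cj in itertools.combinations(classes, 2):  # Q(Q-1)/2 pairs
            mask = np.isin(y_train, [ci, cj])
            clf = SVC(kernel="rbf").fit(X_train[mask], y_train[mask])
            votes[clf.predict(x.reshape(1, -1))[0]] += 1   # winner gets a vote
        return max(votes, key=votes.get)                   # "max wins"

    # Example, using the stand-in training table from the SVM sketch above:
    # facies = one_versus_one_predict(X_train, y_train, X_train[0], [0, 1, 2, 3])

For Q = 4 facies, this trains the six binary classifiers implied by Q(Q-1)/2 above.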

References

Al-Anazi, A., and I. D. Gates, 2010, A support vector machine algorithm to classify lithofacies and model permeability in heterogeneous reservoirs: Engineering Geology, 114, 267–277, doi: 10.1016/j.enggeo.2010.05.005.

Balch, A. H., 1971, Color sonagrams: A new dimension in seismic data interpretation: Geophysics, 36, 1074–1098, doi: 10.1190/1.1440233.

Barnes, A. E., and K. J. Laughlin, 2002, Investigation of methods for unsupervised classification of seismic data: 72nd Annual International Meeting, SEG, Expanded Abstracts, 2221–2224.

Bennett, K. P., and A. Demiriz, 1999, Semi-supervised support vector machines, in M. S. Kearns, S. A. Solla, and D. A. Cohn, eds., Advances in neural information processing systems 11: Proceedings of the 1998 conference: MIT Press, 368–374.

Bishop, C. M., 2006, Pattern recognition and machine learning: Springer.

Bishop, C. M., M. Svensen, and C. K. I. Williams, 1998, The generative topographic mapping: Neural Computation, 10, 215–234, doi: 10.1162/089976698300017953.

Chopra, S., and K. J. Marfurt, 2007, Seismic attributes for prospect identification and reservoir characterization: SEG.

Coleou, T., M. Poupon, and K. Azbel, 2003, Unsupervised seismic facies classification: A review and comparison of techniques and implementation: The Leading Edge, 22, 942–953, doi: 10.1190/1.1623635.

Corradi, A., P. Ruffo, A. Corrao, and C. Visentin, 2009, 3D hydrocarbon migration by percolation technique in an alternative sand-shale environment described by a seismic facies classification volume: Marine and Petroleum Geology, 26, 495–503, doi: 10.1016/j.marpetgeo.2009.01.002.

Cortes, C., and V. Vapnik, 1995, Support-vector networks: Machine Learning, 20, 273–297, doi: 10.1023/A:1022627411411.

Figure A-4. The PSVM workflow.


Cristianini, N., and J. Shawe-Taylor, 2000, An introduction to support vector machines and other kernel-based learning methods: Cambridge University Press.

Dempster, A. P., N. M. Laird, and D. B. Rubin, 1977, Maximum likelihood from incomplete data via the EM algorithm: Journal of the Royal Statistical Society, Series B, 39, 1–38.

Duda, R. O., P. E. Hart, and D. G. Stork, 2001, Pattern classification, 2nd ed.: John Wiley & Sons.

Eagleman, D., 2012, Incognito: The secret lives of the brain: Pantheon Books.

Forgy, E. W., 1965, Cluster analysis of multivariate data: Efficiency vs interpretability of classifications: Biometrics, 21, 768–769.

Fung, G., and O. L. Mangasarian, 2001, Proximal support vector machine classifiers: Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, ACM, 77–86.

Fung, G. M., and O. L. Mangasarian, 2005, Multicategory proximal support vector machine classifiers: Machine Learning, 59, 77–97, doi: 10.1007/s10994-005-0463-6.

Gao, D., 2007, Application of three-dimensional seismic texture analysis with special reference to deep-marine facies discrimination and interpretation: An example from offshore Angola, West Africa: AAPG Bulletin, 91, 1665–1683, doi: 10.1306/08020706101.

Honório, B. C. Z., A. C. Sanchetta, E. P. Leite, and A. C. Vidal, 2014, Independent component spectral analysis: Interpretation, 2, no. 1, SA21–SA29, doi: 10.1190/INT-2013-0074.1.

Hsu, C., and C. Lin, 2002, A comparison of methods for multiclass support vector machines: IEEE Transactions on Neural Networks, 13, 732–744, doi: 10.1109/TNN.2002.1000139.

Huang, Z., J. Shimeld, M. Williamson, and J. Katsube, 1996, Permeability prediction with artificial neural network modeling in the Ventura gas field, offshore eastern Canada: Geophysics, 61, 422–436, doi: 10.1190/1.1443970.

Jancey, R. C., 1966, Multidimensional group analysis: Australian Journal of Botany, 14, 127–130, doi: 10.1071/BT9660127.

Justice, J. H., D. J. Hawkins, and D. J. Wong, 1985, Multidimensional attribute analysis and pattern recognition for seismic interpretation: Pattern Recognition, 18, 391–399, doi: 10.1016/0031-3203(85)90010-X.

Kalkomey, C. T., T. Zhao, X. Jin, and K. J. Marfurt, 1997, Potential risks when using seismic attributes as predictors of reservoir properties: The Leading Edge, 16, 247–251, doi: 10.1190/1.1437610.

Kohonen, T., 1982, Self-organized formation of topologically correct feature maps: Biological Cybernetics, 43, 59–69, doi: 10.1007/BF00337288.

Kohonen, T., 2001, Self-organizing maps, 3rd ed.: Springer-Verlag.

Kreßel, U., 1999, Pairwise classification and support vector machines, in B. Schölkopf, C. J. C. Burges, and A. J. Smola, eds., Advances in kernel methods — Support vector learning: MIT Press, 255–268.

Kuzma, H. A., and J. W. Rector, 2004, Nonlinear AVO inversion using support vector machines: 74th Annual International Meeting, SEG, Expanded Abstracts, 203–206.

Kuzma, H. A., and J. W. Rector, 2005, The Zoeppritz equations, information theory, and support vector machines: 75th Annual International Meeting, SEG, Expanded Abstracts, 1701–1704.

Kuzma, H. A., and J. W. Rector, 2007, Support vector machines implemented on a graphics processing unit: 77th Annual International Meeting, SEG, Expanded Abstracts, 2089–2092.

Li, J., and J. Castagna, 2004, Support vector machine (SVM) pattern recognition to AVO classification: Geophysical Research Letters, 31, L02609, doi: 10.1029/2003GL019044.

Lim, J., 2005, Reservoir properties determination using fuzzy logic and neural networks from well data in offshore Korea: Journal of Petroleum Science and Engineering, 49, 182–192, doi: 10.1016/j.petrol.2005.05.005.

Lubo, D., K. Marfurt, and V. Jayaram, 2014, Statistical characterization and geological correlation of wells using automatic learning Gaussian mixture models: Unconventional Resources Technology Conference, Extended Abstracts, 774–783.

MacQueen, J., 1967, Some methods for classification and analysis of multivariate observations, in L. M. Le Cam, and J. Neyman, eds., Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability: University of California Press, 281–297.

Mangasarian, O. L., and E. W. Wild, 2006, Multisurface proximal support vector machine classification via generalized eigenvalues: IEEE Transactions on Pattern Analysis and Machine Intelligence, 28, 69–74, doi: 10.1109/TPAMI.2006.17.

Matos, M. C., K. J. Marfurt, and P. R. S. Johann, 2009, Seismic color self-organizing maps: Presented at the 11th International Congress of the Brazilian Geophysical Society, Extended Abstracts.

Meldahl, P., R. Heggland, B. Bril, and P. de Groot, 1999, The chimney cube, an example of semi-automated detection of seismic objects by directive attributes and neural networks. Part I: Methodology: 69th Annual International Meeting, SEG, Expanded Abstracts, 931–934.

Mitchell, J., and H. L. Neil, 2012, OS20/20 Canterbury — Great South Basin TAN1209 voyage report: National Institute of Water and Atmospheric Research Ltd (NIWA).

Murat, M. E., and A. J. Rudman, 1992, Automated first arrival picking: A neural network approach: Geophysical Prospecting, 40, 587–604, doi: 10.1111/j.1365-2478.1992.tb00543.x.


Nazari, S., H. A. Kuzma, and J. W. Rector, 2011, Predicting permeability from well log data and core measurements using support vector machines: 81st Annual International Meeting, SEG, Expanded Abstracts, 2004–2007.

Platt, J. C., N. Cristianini, and J. Shawe-Taylor, 2000, Large margin DAGs for multiclass classification: Advances in Neural Information Processing Systems, 12, 547–553.

Röth, G., and A. Tarantola, 1994, Neural networks and inversion of seismic data: Journal of Geophysical Research, 99, 6753–6768, doi: 10.1029/93JB01563.

Roy, A., 2013, Latent space classification of seismic facies: Ph.D. dissertation, The University of Oklahoma.

Roy, A., S. R. Araceli, J. T. Kwiatkowski, and K. J. Marfurt, 2014, Generative topographic mapping for seismic facies estimation of a carbonate wash, Veracruz Basin, southern Mexico: Interpretation, 2, no. 1, SA31–SA47, doi: 10.1190/INT-2013-0077.1.

Roy, A., B. L. Dowdell, and K. J. Marfurt, 2013, Characterizing a Mississippian tripolitic chert reservoir using 3D unsupervised and supervised multiattribute seismic facies analysis: An example from Osage County, Oklahoma: Interpretation, 1, no. 2, SB109–SB124, doi: 10.1190/INT-2013-0023.1.

Russell, B., D. Hampson, J. Schuelke, and J. Quirein, 1997, Multiattribute seismic analysis: The Leading Edge, 16, 1439–1443, doi: 10.1190/1.1437486.

Sammon, W. J., 1969, A nonlinear mapping for data structure analysis: IEEE Transactions on Computers, C-18, 401–409.

Schölkopf, B., and A. J. Smola, 2002, Learning with kernels: Support vector machines, regularization, optimization, and beyond: MIT Press.

Shawe-Taylor, J., and N. Cristianini, 2004, Kernel methods for pattern analysis: Cambridge University Press.

Sonneland, L., 1983, Computer aided interpretation of seismic data: 53rd Annual International Meeting, SEG, Expanded Abstracts, 546–549.

Strecker, U., and R. Uden, 2002, Data mining of 3D poststack attribute volumes using Kohonen self-organizing maps: The Leading Edge, 21, 1032–1037, doi: 10.1190/1.1518442.

Taner, M. T., F. Koehler, and R. E. Sheriff, 1979, Complex seismic trace analysis: Geophysics, 44, 1041–1063, doi: 10.1190/1.1440994.

Tong, S., and D. Koller, 2001, Support vector machine active learning with applications to text classification: Journal of Machine Learning Research, 2, 45–66.

Torres, A., and J. Reveron, 2013, Lithofacies discrimination using support vector machines, rock physics and simultaneous seismic inversion in clastic reservoirs in the Orinoco Oil Belt, Venezuela: 83rd Annual International Meeting, SEG, Expanded Abstracts, 2578–2582.

Turing, A. M., 1950, Computing machinery and intelligence: Mind, 59, 433–460, doi: 10.1093/mind/LIX.236.433.

Uruski, C. I., 2010, New Zealand's deepwater frontier: Marine and Petroleum Geology, 27, 2005–2026, doi: 10.1016/j.marpetgeo.2010.05.010.

van der Baan, M., and C. Jutten, 2000, Neural networks in geophysical applications: Geophysics, 65, 1032–1047, doi: 10.1190/1.1444797.

Verma, S., A. Roy, R. Perez, and K. J. Marfurt, 2012, Mapping high frackability and high TOC zones in the Barnett Shale: Supervised probabilistic neural network vs. unsupervised multi-attribute Kohonen SOM: 82nd Annual International Meeting, SEG, Expanded Abstracts, doi: 10.1190/segam2012-1494.1.

Wallet, C. B., M. C. Matos, and J. T. Kwiatkowski, 2009, Latent space modeling of seismic data: An overview: The Leading Edge, 28, 1454–1459, doi: 10.1190/1.3272700.

Wang, G., T. R. Carr, Y. Ju, and C. Li, 2014, Identifying organic-rich Marcellus Shale lithofacies by support vector machine classifier in the Appalachian basin: Computers & Geosciences, 64, 52–60, doi: 10.1016/j.cageo.2013.12.002.

West, P. B., R. S. May, E. J. Eastwood, and C. Rossen, 2002, Interactive seismic facies classification using textural attributes and neural networks: The Leading Edge, 21, 1042–1049, doi: 10.1190/1.1518444.

Wong, K. W., Y. S. Ong, T. D. Gedeon, and C. C. Fung, 2005, Reservoir characterization using support vector machines: Presented at the International Conference on Computational Intelligence for Modelling, Control and Automation, and International Conference on Intelligent Agents, Web Technologies and Internet Commerce, 354–359.

Yu, S., K. Zhu, and F. Diao, 2008, A dynamic all parameters adaptive BP neural networks model and its application on oil reservoir prediction: Applied Mathematics and Computation, 195, 66–75, doi: 10.1016/j.amc.2007.04.088.

Zhang, B., T. Zhao, X. Jin, and K. J. Marfurt, 2015, Brittleness evaluation of resource plays by integrating petrophysical and seismic data analysis: Interpretation, 3, no. 2, T81–T92, doi: 10.1190/INT-2014-0144.1.

Zhao, B., H. Zhou, and F. Hilterman, 2005, Fizz and gas separation with SVM classification: 75th Annual International Meeting, SEG, Expanded Abstracts, 297–300.

Zhao, T., V. Jayaram, K. J. Marfurt, and H. Zhou, 2014, Lithofacies classification in Barnett Shale using proximal support vector machines: 84th Annual International Meeting, SEG, Expanded Abstracts, 1491–1495.

Zhao, T., and K. Ramachandran, 2013, Performance evaluation of complex neural networks in reservoir characterization: Applied to Boonsville 3-D seismic data: 83rd Annual International Meeting, SEG, Expanded Abstracts, 2621–2624.



Tao Zhao received a B.S. (2011) in exploration geophysics from China University of Petroleum and an M.S. (2013) in geophysics from the University of Tulsa. He is currently pursuing a Ph.D. in geophysics at the University of Oklahoma as a member of the Attribute-Assisted Seismic Processing and Interpretation Consortium. His current research interests include developing and applying pattern recognition and machine learning techniques to seismic data, unconventional shale resource play characterization, and seismic attribute development.

Vikram Jayaram received an M.S. (2004) and a Ph.D. (2009) in electrical engineering from the University of Texas at El Paso. Prior to joining the University of Oklahoma (OU), he was a postdoctoral fellow in computational radionuclide imaging at the University of Texas M.D. Anderson, working on image reconstruction techniques for nuclear imaging. He was awarded the prestigious NASA Earth System Science doctoral fellowship and NGA and Texas Instruments foundation scholarships to support his graduate studies. More recently, he has worked as a senior research geophysicist with Global Geophysical Services and as co-PI of an academic-industry consortium at OU and the Oklahoma Geological Survey. While a graduate student, he also interned with Kodak Research Labs. He has authored more than 25 research papers and is currently serving as an assistant editor for the SEG/AAPG journal Interpretation. He is a member of the IEEE Signal Processing Society, AGU, SEG, EAGE, and AAPG, and a technical reviewer for several international conferences and journals in signal processing, machine learning, and geophysics. His research interests include computer learning models for mathematical and physical exploitation in imaging, signal/image enhancement and interpretation, and big data analytics.

Atish Roy received a B.S. (2005) in physics from the University of Calcutta and an M.S. (2007) in applied geophysics from the Indian Institute of Technology, Bombay. After working for two years at Paradigm Geophysical, India, as a processing geophysicist, he joined the University of Oklahoma in 2009 to pursue a Ph.D. degree. There, he did research in the Attribute-Assisted Seismic Processing and Interpretation research consortium led by Kurt J. Marfurt. In his research, he developed algorithms and workflows for unsupervised, supervised, and probabilistic seismic facies classification using multiattribute data. He received a Ph.D. in geophysics in the fall of 2013. Currently, he is working at BP as a geophysical analyst in the technology group. His research interests include pattern recognition, machine learning, seismic facies analysis, seismic attribute analysis, reservoir characterization, and subsurface imaging.

Kurt J. Marfurt received a Ph.D. (1978) in applied geophysics from Columbia University's Henry Krumb School of Mines in New York, where he also served as an assistant professor for four years. He joined the University of Oklahoma (OU) in 2007, where he serves as the Frank and Henrietta Schultz Professor of Geophysics within the ConocoPhillips School of Geology and Geophysics. His primary research interest is in the development and calibration of new seismic attributes to aid in seismic processing, seismic interpretation, and reservoir characterization. His recent work has focused on applying coherence, spectral decomposition, structure-oriented filtering, and volumetric curvature to mapping fractures and karst, with a particular focus on resource plays. He worked for 18 years on a wide range of research projects at Amoco's Tulsa Research Center, after which he joined the University of Houston for eight years as a professor of geophysics and the director of the Allied Geophysics Lab. He has received the SEG best paper award (for coherence), the SEG best presentation award (for seismic modeling), the best SEG poster award as a coauthor with Satinder Chopra (for curvature), and the best AAPG technical presentation award. He also served as the EAGE/SEG Distinguished Short Course Instructor in 2006 (on seismic attributes). In addition to his teaching and research duties at OU, he leads short courses on attributes for SEG and AAPG.
