Functional outlier detection in grain-size distribution curves of detrital sediments



Sedimentary Geology 297 (2013) 31–37

Contents lists available at ScienceDirect

Sedimentary Geology

journal homepage: www.elsevier.com/locate/sedgeo

Functional outlier detection in grain-size distribution curves of detrital sediments

Carlos Sierra, Celestino Ordóñez, José Luis Rodríguez Gallego ⁎

Department of Mining Exploitation and Prospecting, Polytechnic School of Mieres, University of Oviedo, Campus de Mieres, Gonzalo Gutiérrez, s/n, 33600 Mieres, Asturias, Spain

⁎ Corresponding author. E-mail address: [email protected] (J.L.R. Gallego).

0037-0738/$ – see front matter © 2013 Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.sedgeo.2013.09.006

Article info

Article history:
Received 22 June 2013
Received in revised form 13 September 2013
Accepted 16 September 2013
Available online 25 September 2013

Editor: J. Knight

Keywords:
Coastal sediments
Grain-size curves
Outlier detection
Functional bagplot
Functional high density region (HDR) boxplot

Abstract

This article introduces functional outlier detection as a mathematical tool for the recognition of outliers in grain-size distribution curves. Two methods, namely the functional high density region (HDR) boxplot and the functional bagplot, were applied for outlier detection in detrital sediment grain-size curves. The results of these two approaches were compared with those obtained with a classical modified z-score method. While both the functional HDR boxplot and the functional bagplot flagged a significant number of curves as outliers, the former showed superior sensitivity. Despite visually apparent differences between the curves, the classical method was unable to detect outliers on the basis of a single characteristic parameter of the curves (the median in our case). None of the sedimentary structures addressed (eolian and tidal) was detected as an outlier by the algorithms; these structures were therefore treated as part of the natural variability. The results suggest that the functional HDR boxplot and the functional bagplot could be introduced as a preliminary outlier detection step in geochemical, sedimentological and coastal studies.

© 2013 Elsevier B.V. All rights reserved.

1. Introduction

Given its effects on surface area, packing density and colloidal properties, among others, the grain-size distribution of bulk particles is a key physical characteristic for many industrial processes, including mineral dressing, cement production, and the manufacture of pharmaceutical and cosmetic products (Fischmeister and Zahn, 1966; Peterson and Small, 1993; Merkus, 2008). Particle-size distribution is also essential in studies of sediment since correlations can be established with its origin, transport conditions, and depositional environment (Tanner, 1986), and to a great extent with its chemical composition (Weltje and Eynatten, 2004; Yalcin, 2009) and hydraulic conductivity (Cronican and Gribb, 2004).

Outlier determination, that is, the identification of observations that deviate markedly from the other samples of the dataset and thus suggest a different generating mechanism (Hawkins, 1980), can be hindered by the inherent variability of sediments. This variability arises from the many agents involved in sedimentation: gravity, wind, and hydraulic forces, as well as frictional forces. In this regard, least squares regression lines (commonly used in these kinds of studies) can be affected by the presence of a small number of unusual samples in the dataset that are more severely contaminated or that derive from areas with underlying geology that differs from that of the remaining samples. Such unusual samples diminish the capacity of the study to represent the dominant background population (Szava-Kovats, 2002). For these reasons, it is of interest to develop methods for outlier detection that do not flag this natural variability as anomalous.

The granulometric characteristics of a sample are usually described by means of discrete parameters of the granulometric curve. Thus, the mean and median grain size (Viscosi-Shirley et al., 2003), quartile (d75, d50, and d25) and decile (d10, d50, d90) diameters, or the percentages of sand, silt and clay are frequently used for this purpose (Blott and Pye, 2001).
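As an illustration of how such discrete parameters are read off a curve, the following minimal sketch interpolates percentile diameters on the cumulative volume curve. The function name `percentile_diameter`, the use of linear interpolation, and the convention that each size value marks the upper bound of its class are assumptions made here for illustration; they are not taken from the study.

```python
from bisect import bisect_left

def percentile_diameter(sizes_um, volume_pct, p):
    """Diameter below which p % of the total volume lies (hypothetical helper).

    sizes_um:   increasing size values, each taken as the upper bound of its class
    volume_pct: volume percentage measured in each class
    p:          percentile, e.g. 10, 50 or 90
    """
    total = sum(volume_pct)
    cumulative = []
    running = 0.0
    for v in volume_pct:
        running += v
        cumulative.append(100.0 * running / total)
    # First class whose cumulative percentage reaches p.
    j = bisect_left(cumulative, p)
    if j == 0:
        return sizes_um[0]
    if j >= len(sizes_um):
        return sizes_um[-1]
    # Linear interpolation between the two bracketing classes (an assumption).
    c0, c1 = cumulative[j - 1], cumulative[j]
    t = (p - c0) / (c1 - c0)
    return sizes_um[j - 1] + t * (sizes_um[j] - sizes_um[j - 1])

sizes = [100, 200, 300, 400, 500]   # illustrative values, not measured data
volume = [10, 20, 40, 20, 10]
d10, d50, d90 = (percentile_diameter(sizes, volume, p) for p in (10, 50, 90))
```

Each sample is thereby reduced to a handful of numbers, which is why two curves with the same d50 can still differ markedly in shape.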

Based on these parameters, samples have been classed as outliers when the parameter value exceeds the third quartile (Q3) plus 1.5 times the inter-quartile range (IQR) (Matias, 2006; Matias et al., 2010). Nevertheless, an anomalous value of one or more of these parameters does not imply that the whole granulometric distribution is an outlier. Moreover, many outlying curves cannot be detected by means of discrete parameters, which in many cases are obtained by procedures (such as the method of moments) that are themselves affected by the outliers (Friedman and Johnson, 1982; Blott and Pye, 2001).
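The Q3 + 1.5 × IQR rule described above can be sketched as follows. This is a minimal illustration, not the procedure of the cited works: the function name `upper_fence_outliers`, the "inclusive" quantile convention, and the sample values are assumptions.

```python
import statistics

def upper_fence_outliers(values, k=1.5):
    """Return the upper fence Q3 + k * IQR and the indices of values above it."""
    # Quartiles with the "inclusive" convention (an assumption; other
    # conventions give slightly different fences for small samples).
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = q3 + k * (q3 - q1)
    flagged = [i for i, v in enumerate(values) if v > fence]
    return fence, flagged

# Hypothetical median diameters (in μm) for eight samples.
d50_values = [210, 220, 225, 230, 235, 240, 250, 400]
fence, flagged = upper_fence_outliers(d50_values)
```

Because the rule looks at a single scalar per sample, a curve with an unusual shape but an ordinary d50 passes unnoticed.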

As the grain-size distribution of a sample is represented by a curve (its granulometric curve), we propose a functional data analysis approach to identify outliers in a set of grain-size distribution curves that represent many samples. This strategy is put forward as an alternative to the common statistical techniques used with scalar or vector data. From a mathematical point of view, the curve of a sample is considered an outlier when it has been generated by a stochastic process with a different distribution from that of the majority of curves (Febrero et al., 2008). Outlying curves can be classified as magnitude outliers when they lie outside the range of the vast majority of the data, or as shape outliers when they have a very different shape from the other curves. Outliers can also exhibit a combination of these features (Hyndman and Shang, 2008; Shang and Hyndman, 2008).

Following the preceding considerations, the main aims of the current study were: (a) to identify grain-size outliers on the basis of granulometric curves instead of discrete grain-size parameters, using two mathematical procedures, the functional bagplot and the functional high density region (HDR) boxplot; (b) to compare these methods with a classical approach, namely the modified z-score; and (c) to evaluate how these procedures are influenced by the natural variability of the sediments caused by physical transportation or biological processes.

Within this framework, the grain-size distribution of detrital sediments was studied in the estuarine and coastal area of Avilés (Asturias, northern Spain), a zone that shows great diversity in sediment origin and nature.

2. Materials and methods

2.1. Data collection and grain-size analyses

The city of Avilés, on the coast of the Bay of Biscay, is a significant industrial area and maritime port. The geology of the area consists of post-Ordovician soft sedimentary rocks such as Triassic siltstone, Jurassic limestone and dolomite (Gijón Formation), and Middle Jurassic siliceous conglomerate (Julivert et al., 1973). The estuary formed through the fragmentation of these rocks during the Alpine orogeny, which produced a complex of fault systems (the Ventanielles fault being the most significant) (Flor-Blanco et al., 2013), and the later development of a river system, which was filled during the Late Pleistocene–Holocene transgression.

Beaches in the area are sandy, except the small ones formed in the estuary margins, in which clasts are predominant. In addition, a significant number of sedimentary structures, namely dunes, swash marks, wave and eolian ripples, and also stratification profiles (reverse grading) are common throughout these areas (e.g., Flor, 1979, 1981, 1986; Flor-Blanco et al., 2013).

Fig. 1. Location of the study area and spatial distribution of the samples.

The area has a vast metallurgical industry (zinc hydrometallurgy, aluminum electrolytic production, and siderurgy) and, as a consequence, several spoil heaps and open-pit quarries. All of these activities have had a significant impact on soil and sediment quality (Gallego et al., 2002). In recent years, dredging in the estuary has led to the recession of the beaches and dune systems (López and Flor, 2008).

In this context, a set of samples was collected at four locations (Fig. 1), namely: 1) El Espartal beach, the largest eolian dune system in Asturias; 2) Xagó Beach, with similar characteristics, located to the north of the estuary; 3) the estuary margins, areas governed by predominantly mesotidal and semidiurnal tide dynamics (López and Flor, 2008); and 4) Llodero Cove, an area at the mouth of the creek of Vioño on the right side of the estuary and occupied by intertidal mud flats.

In total, we collected 99 surface (0–20 cm) sediment samples of approximately 100 g each: 23, 16, 8, and 52 at the first, second, third and fourth sites, respectively. At Llodero Cove, samples were collected at low tide. Fig. 1 shows the spatial distribution of the sampling points.

The samples were packed and sealed in pre-washed polyethylene bags, dried at room temperature, and passed through a 2 mm sieve. They were then split into two sub-samples in order to determine composition and grain-size distribution. Organic matter, when present, was removed with hydrogen peroxide (Gee and Bauder, 1996).

The grain-size distribution was obtained using laser diffraction spectroscopy. For this purpose, samples were first disaggregated with two dispersants (sodium hexametaphosphate and sodium carbonate) and then analyzed by means of the Aqueous Liquid Module of the LS 13 320 MW model (Beckman Coulter Inc.), applying the Fraunhofer theory of light scattering. Compared with other methods, the procedure has the advantage of covering a wide particle-size range (0.017–2000 μm) and of being a fast, reproducible, and well-established technique (ISO 13320, 2009).

The identification of the phases present in the sediments was performed by means of stereomicroscopic observations using a Nikon Stereoscopic Zoom Microscope SMZ1000 coupled to a Nikon DS-Ri1 color high resolution camera and NIS-Elements Software BR (Nikon Instruments, Inc.).

2.2. Mathematical model

Here we propose the use of functional bagplots and HDR (high density region) boxplots to detect outliers in the set of grain-size curves.

2.2.1. Functional bagplot

Functional bagplots are an extension of the bivariate bagplots of Rousseeuw et al. (1999). They are applied to the first two robust principal component scores. These bagplots have an advantage over other functional methods that detect outliers, such as those based on the concept of functional depth (Febrero et al., 2008), in that they can identify both magnitude and shape outliers.

A bivariate bagplot is a generalization of the univariate boxplot. It is constructed on the basis of the halfspace location depth of a point relative to a bivariate dataset (Tukey, 1975). The halfspace location depth $\mathrm{ldepth}(\theta, Z)$ of a point $\theta \in \mathbb{R}^2$ relative to a bivariate dataset $Z = \{z_1, z_2, \ldots, z_n\}$ is the smallest number of $z_i$ contained in any closed half-plane with a boundary line through $\theta$. The depth region $D_k$ is the set of all $\theta$ with $\mathrm{ldepth}(\theta, Z) \ge k$, $k > 0$. The depth regions are convex polygons and are nested: $D_{k+1} \subset D_k$. The depth median of $Z$ is defined as the $\theta$ with the highest $\mathrm{ldepth}(\theta, Z)$.
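The definition of the halfspace location depth can be made concrete with a brute-force sketch. The function name `halfspace_depth` and the angular-sweep strategy are choices made here for illustration, not the algorithm of the cited works; dedicated bagplot implementations use faster methods. The sketch relies on the fact that the minimum count is attained for a boundary line that passes through no data point other than those coinciding with θ, so it is enough to test one direction inside each arc between consecutive critical angles.

```python
import math

def halfspace_depth(theta, points, tol=1e-9):
    """Smallest number of points in any closed half-plane whose boundary passes through theta."""
    tx, ty = theta
    rel = [(x - tx, y - ty) for x, y in points]
    # Points coinciding with theta lie on every boundary line, so they always count.
    coincident = sum(1 for dx, dy in rel if dx == 0 and dy == 0)
    others = [(dx, dy) for dx, dy in rel if not (dx == 0 and dy == 0)]
    if not others:
        return coincident
    two_pi = 2.0 * math.pi
    # Normal directions at which some data point lies exactly on the boundary line.
    critical = sorted({(math.atan2(dy, dx) + s * math.pi / 2.0) % two_pi
                       for dx, dy in others for s in (1, -1)})
    best = len(others)
    for i, start in enumerate(critical):
        end = critical[(i + 1) % len(critical)]
        gap = (end - start) % two_pi
        if gap < tol:
            continue  # numerically duplicated critical angle, no arc in between
        mid = start + gap / 2.0
        nx, ny = math.cos(mid), math.sin(mid)
        count = sum(1 for dx, dy in others if nx * dx + ny * dy > 0)
        best = min(best, count)
    return best + coincident

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
centre_depth = halfspace_depth((0.5, 0.5), square)   # deep point
outside_depth = halfspace_depth((2, 2), square)      # point outside the data cloud
```

Points near the middle of the cloud receive a high depth and points outside it a depth of zero, which is what allows the scores to be ranked from the centre outwards.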

The bivariate bagplot is similar to a univariate boxplot in that it has a central point (the depth median), an inner region (the bag), and an outer region (the fence) beyond which outliers are shown as individual points. In a bagplot, the depth median (the point with the highest halfspace depth) lies in the center and is surrounded by the bag, which contains 50% of the observations with the greatest depth (analogous to the inter-quartile range in a classical boxplot). The fence is obtained by magnifying the bag by a factor ρ; observations outside the fence are flagged as outliers. It is also possible to visualize a confidence region for the depth median in the bagplot. In addition to the outliers, the bagplot allows observation of the characteristics of the data, such as location, spread, correlation, skewness, and tails.

Fig. 2. Grain-size curves (volume percentage against grain size in μm) for the four study areas. From top right to bottom left, a) El Espartal–Salinas area, b) Xagó Beach, c) Estuary margin, and d) Llodero Cove. Colors represent different samples at each site.

The idea behind functional bagplots is that, given a set of observed curves $\{y_i(x)\}$, $i = 1, \ldots, n$, it is possible to order them by applying the bagplot to the first two scores of a robust principal component decomposition (Hyndman and Ullah, 2007) of $y_i$:

$$y_i(x) = \mu(x) + \sum_{k=1}^{2} z_{i,k}\,\phi_k(x) \qquad (1)$$

where $\mu(x)$ is the mean curve, $\{\phi_k(x)\}$ are the principal components and $z_i = (z_{i,1}, z_{i,2})$ are the first two principal component scores (Shang and Hyndman, 2008). Ordering via the first two principal components is a useful strategy since outliers are usually more visible in the principal component space than in the original functional space (Filzmoser et al., 2008). Also, according to Hall et al. (2007), the first two principal components commonly hold the main modes of variation. Another advantage of the functional bagplot is that it can be applied even when the number of variables is significantly greater than the number of observations, as is the case with finely discretized curves.
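The decomposition in Eq. (1) can be sketched for curves sampled on a common grid. Note the substitution: the study relies on a robust principal component decomposition, whereas the sketch below uses classical (non-robust) PCA computed by power iteration on the n × n Gram matrix, purely to show how each curve is reduced to a pair of scores. The function name `first_two_pc_scores` and the toy input are hypothetical.

```python
import math

def first_two_pc_scores(curves, iters=500):
    """Return (mean_curve, scores), where scores[i] = (z_i1, z_i2) for curve i."""
    n, p = len(curves), len(curves[0])
    mean = [sum(c[j] for c in curves) / n for j in range(p)]
    centred = [[c[j] - mean[j] for j in range(p)] for c in curves]
    # n x n Gram matrix of the centred curves; this stays small even when the
    # curves are finely discretized (p much larger than n).
    gram = [[sum(x * y for x, y in zip(ci, cj)) for cj in centred] for ci in centred]
    columns = []
    for _component in range(2):
        u = [float(i + 1) for i in range(n)]  # fixed, arbitrary start vector
        for _step in range(iters):
            v = [sum(g * x for g, x in zip(row, u)) for row in gram]
            norm = math.sqrt(sum(x * x for x in v))
            if norm == 0.0:
                break
            u = [x / norm for x in v]
        # Rayleigh quotient gives the eigenvalue; scores are sqrt(eigenvalue) * u.
        lam = sum(ui * sum(g * x for g, x in zip(row, u)) for ui, row in zip(u, gram))
        columns.append([math.sqrt(max(lam, 0.0)) * ui for ui in u])
        # Deflation: remove the component just found before extracting the next one.
        gram = [[gram[i][j] - lam * u[i] * u[j] for j in range(n)] for i in range(n)]
    return mean, list(zip(columns[0], columns[1]))

# Toy "curves" with two sampling points each (illustrative only).
mean_curve, scores = first_two_pc_scores([[1, 0], [-1, 0], [0, 0.5], [0, -0.5]])
```

The sign of each score column is arbitrary, as in any principal component analysis. The resulting pairs are the points to which the bagplot or the HDR boxplot is then applied.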

The functional bagplot is the functional counterpart of the bivariate bagplot of the first two principal component scores. It displays the median curve (the deepest location), the 95% confidence intervals for the median, and the 50% (bag) and 95% (fence) surrounding curves ranked by depth. Any curve beyond the 95% convex hull (corresponding to an inflating factor ρ = 2.57) is flagged as a functional outlier. By constructing functional bagplots, it is possible to see the position of the curves cataloged as outliers relative to the fence.
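The fence rule can be illustrated with a short sketch that takes the bag as given. The depth median and the bag polygon are assumed to have been computed already (they are hypothetical inputs here), and the bag is assumed to be a convex polygon with vertices listed counter-clockwise; the function names are illustrative.

```python
def inflate_polygon(center, vertices, rho=2.57):
    """Scale a polygon about `center` by the factor rho (bag -> fence)."""
    cx, cy = center
    return [(cx + rho * (x - cx), cy + rho * (y - cy)) for x, y in vertices]

def inside_convex_polygon(point, vertices):
    """True if point is inside or on a convex polygon given counter-clockwise."""
    px, py = point
    n = len(vertices)
    for i in range(n):
        x0, y0 = vertices[i]
        x1, y1 = vertices[(i + 1) % n]
        # A negative cross product means the point is to the right of this edge.
        if (x1 - x0) * (py - y0) - (y1 - y0) * (px - x0) < 0:
            return False
    return True

def bagplot_outliers(scores, depth_median, bag, rho=2.57):
    """Indices of score points lying outside the fence."""
    fence = inflate_polygon(depth_median, bag, rho)
    return [i for i, z in enumerate(scores) if not inside_convex_polygon(z, fence)]

# Hypothetical bag (a square) and four score points.
bag = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
flagged = bagplot_outliers([(0, 0), (2.5, 0), (3, 0), (-2.6, 2)], (0, 0), bag)
```

Each flagged index corresponds to one curve, so the outlying curves can be highlighted in the original grain-size plot.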


Fig. 3. Functional bagplot (left) and functional HDR boxplot (right) analyses of the grain-size distribution curves of the four locations: a) El Espartal–Salinas area, b) Estuary margin, c) Xagó Beach, and d) Llodero Cove. Axes: volume percentage against grain size (μm).

2.2.2. Functional high density region (HDR) boxplot

A variant of the functional boxplot, the functional HDR boxplot (Shang and Hyndman, 2008) orders the scores $z_i = (z_{i,1}, z_{i,2})$ by means of a bivariate kernel density estimate.

For a bivariate random sample $\{z_i;\ i = 1, \ldots, n\}$ drawn from a density $f$, the bivariate version of the Parzen–Rosenblatt kernel density estimate $\hat{f}(w; a, b)$ is defined (Scott, 1992) as:

$$\hat{f}(w; a, b) = \frac{1}{n\,a\,b} \sum_{i=1}^{n} k\!\left(\frac{w_1 - z_{i,1}}{a}\right) k\!\left(\frac{w_2 - z_{i,2}}{b}\right) \qquad (2)$$

where $w = (w_1, w_2)'$, $k(\cdot)$ is a symmetric univariate kernel function such that $\int k(u)\,du = 1$, and $(a, b)$ is a bivariate bandwidth parameter, with $a > 0$ and $b > 0$, which converges slowly to zero as $n$ goes to infinity. The bandwidth determines the degree of smoothness of the estimator.
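A direct transcription of Eq. (2) is sketched below. The Gaussian kernel is an assumption (the text only requires a symmetric kernel that integrates to one), and the function names and bandwidth values are illustrative.

```python
import math

def gaussian_kernel(u):
    """Standard normal density: symmetric and integrating to one."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def kde2d(w, scores, a, b):
    """Product-kernel density estimate of Eq. (2) at the point w = (w1, w2)."""
    w1, w2 = w
    total = sum(gaussian_kernel((w1 - z1) / a) * gaussian_kernel((w2 - z2) / b)
                for z1, z2 in scores)
    return total / (len(scores) * a * b)

# Hypothetical scores and bandwidths, for illustration only.
scores = [(0.0, 0.0), (0.5, -0.2), (-0.4, 0.3), (3.0, 3.0)]
density_near_cluster = kde2d((0.0, 0.0), scores, a=0.5, b=0.5)
density_near_isolated = kde2d((3.0, 3.0), scores, a=0.5, b=0.5)
```

Scores in sparsely populated parts of the plane receive a low density estimate, which is the quantity the HDR boxplot ranks on.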

The high density region (HDR) with coverage $1 - \alpha$ is defined as:

$$R_\alpha = \left\{ z : \hat{f}(z; a, b) \ge f_\alpha \right\} \qquad (3)$$

where $f_\alpha$ is the largest constant such that $\Pr(Z \in R_\alpha) \ge 1 - \alpha$. The region has probability coverage $1 - \alpha$, and every point within it has a density estimate at least as high as that of any point outside it. One of the most distinctive properties of the HDR is that, of all possible regions with probability coverage $1 - \alpha$, it is the smallest possible region in the sample space. Furthermore, the mode is contained in the HDR.

Fig. 4. Spatial distribution of the outliers calculated by functional HDR boxplot (blue dots).

The functional HDR boxplot allows observation of functional outliers by showing the mode, defined as $\arg\sup_z \hat{f}(z; a, b)$, along with the inner and outer regions. The inner region is defined as the region bounded by all curves corresponding to points inside the 50% bivariate HDR. This region is analogous to the inter-quartile range, and it provides an indication of the spread of the central 50% of the curves. Similarly, the outer region is defined as the region bounded by all curves corresponding to points inside the 95% bivariate HDR. All points not included in the 95% HDR are shown as outliers.
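The outlier rule based on Eq. (3) can be sketched as follows. The threshold $f_\alpha$ is estimated here by a sample quantile of the density values at the observed scores; this estimate, the Gaussian kernel, and all names and numbers are assumptions made for illustration rather than the settings used in the study.

```python
import math

def kde2d(w, scores, a, b):
    """Gaussian product-kernel density estimate at w (kernel choice is an assumption)."""
    w1, w2 = w
    total = sum(math.exp(-0.5 * (((w1 - z1) / a) ** 2 + ((w2 - z2) / b) ** 2))
                for z1, z2 in scores)
    return total / (2.0 * math.pi * len(scores) * a * b)

def hdr_outliers(scores, a, b, alpha=0.05):
    """Indices of scores falling outside the (1 - alpha) high density region."""
    dens = [kde2d(z, scores, a, b) for z in scores]
    ranked = sorted(dens)
    # f_alpha estimated as a sample quantile of the densities at the observations:
    # at most floor(alpha * n) points may fall strictly below it.
    f_alpha = ranked[int(math.floor(alpha * len(scores)))]
    return [i for i, d in enumerate(dens) if d < f_alpha]

# Hypothetical scores: a 3 x 3 cluster plus one isolated point.
cluster = [(x, y) for x in (-1, 0, 1) for y in (-1, 0, 1)]
flagged = hdr_outliers(cluster + [(10, 10)], a=1.0, b=1.0, alpha=0.1)
```

Setting `alpha=0.5` in the same function identifies the points of the inner (50%) region, whose curves bound the central band of the functional HDR boxplot.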

Table 1
Summary of the outliers detected by the three procedures for outlier detection.

Area | Univariate analysis: modified Z-score | Functional detection: functional bagplot | Functional detection: HDR
El Espartal area | – | – | V2, V16
Xagó Beach | – | – | V13
Estuary margin | – | – | V8
Llodero Cove | – | V6, V15, V16, V25 | V6, V16, V25

2.2.3. Univariate data analysis: modified Z-score method

In order to compare the outlier detection based on functional data analysis with other methods that use univariate data, we also performed the outlier analysis over all the samples by means of the modified Z-score method (Iglewicz and Hoaglin, 1993). The modified Z-score is computed as:

$$M_i = \frac{0.6745\,\lvert \tilde{x}_i - \tilde{x} \rvert}{\mathrm{MAD}}, \qquad \mathrm{MAD} = \operatorname{median}\left\{ \lvert \tilde{x}_i - \tilde{x} \rvert \right\} \qquad (4)$$

where $\tilde{x}_i$ is the sample median for each curve and $\tilde{x}$ the sample median for the whole dataset. Observations with $M_i > 3.5$ are flagged as outliers.

This method is suitable for data with an approximately normal distribution, like grain-size curves.
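Eq. (4) translates into a few lines of code. In this minimal sketch the input values stand for the per-curve medians and are hypothetical; the function names are illustrative.

```python
import statistics

def modified_z_scores(values):
    """Modified Z-scores of Eq. (4): 0.6745 * |x_i - median| / MAD."""
    med = statistics.median(values)
    mad = statistics.median(abs(v - med) for v in values)
    if mad == 0:
        raise ValueError("MAD is zero; the modified Z-score is undefined")
    return [0.6745 * abs(v - med) / mad for v in values]

def z_score_outliers(values, threshold=3.5):
    """Indices of observations whose modified Z-score exceeds the threshold."""
    return [i for i, m in enumerate(modified_z_scores(values)) if m > threshold]

# Hypothetical per-curve median diameters (in μm).
flagged = z_score_outliers([10, 11, 12, 13, 14, 50])
```

As with the quartile rule, only one number per curve enters the calculation, so differences in curve shape cannot influence the result.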

3. Results and discussion

The methods discussed in the previous section were applied to the dataset collected in the field. Fig. 2 shows the grain-size curves obtained by laser diffraction of sediment samples from the four locations shown in Fig. 1.

Fig. 3 depicts the functional bagplot and HDR boxplot corresponding to these samples. Some curves do not appear to agree in position and/or size with the majority. The functional bagplot and HDR boxplot allowed us to define the presence of outliers from a statistical point of view, that is to say, curves that do not fall inside the 95% confidence bands (represented in light gray) shown in Fig. 3. The functional bagplot detected outliers only in the Llodero Cove samples, while the HDR boxplot identified outliers in all four locations. The spatial distribution of the outliers is shown in Fig. 4.

On visual inspection, the outliers detected in the Espartal–Salinas location are not evident from the curves; in fact, the functional bagplot did not detect any outliers. In contrast, the HDR boxplot identified only two outliers; the first one (V2) was associated with a grain-size curve with a prevalence of a larger grain size than that of the rest of the curves, while the second outlier (V16) was less clear and seemed to be a shape outlier, as the left side of the curve is convex while the rest of the curves have a concave left side. Apart from this observation, very little can be said regarding the origin of these outliers.

Similarly, for the Xagó Beach area (Fig. 3c), only one outlier (V13) was identified by the HDR boxplot. As in the previous case, it was associated with a grain-size curve with a prevalence of a larger grain size than that of the rest of the curves. The coarser grain-size anomalies in these samples could be a consequence of the presence of floating wooden fragments (not attacked by hydrogen peroxide), shells, or even waste products. In both procedures, finer grain-size outliers could have been derived from the accumulation of sands, silts, and clays within the root systems of the marshes present in the area.

The outliers in the areas of the estuary margin corresponded to curves shifted to the right in relation to the rest of the curves. This observation implies that these samples have a coarser grain size than the underlying distribution. This could be attributable to the presence of placers or to the mixing of sediments of fluvial provenance with those of marine origin, as reported by Dalrymple and Choi (2007) for this kind of environment.

The highest number of outliers was found in the Llodero Cove area. These outliers were well-defined (V6, V15, V16, V25) and presented several typologies: abnormal multimodal shape (V6), smaller grain size with lower abundance of the major classes (V15, V16) (note that outlier V15 was not detected by the HDR boxplot), and greater grain size with similar shape (V25). These outliers may have been caused by measurement errors (V6) or by the presence of a clay layer 30 cm beneath the surface deposited by a nearby stream (V15, V16). Alternatively, they may be the result of marine biogenic activities in the form of shells (V25), as described by Newell (2004) in these kinds of environments, and as attested by the stereomicroscopic observations.

We found that the HDR procedure had the capacity (summarized in Table 1) to detect grain-size variations such as the presence of placers as well as shells, litter, and floating wooden fragments. In contrast, tidal structures, such as runnels, rill marks and ripples (quite common in the Llodero Cove area), and eolian structures, such as wind shadows on dunes (with a strong presence in the El Espartal and Xagó Beach samples), were not detected as outliers by the algorithm and were thus included in the natural variability of the sediments, even though both types show a significant difference in granulometric distribution (see for example Folk, 1971; Tsoar, 1978; Stauble, 1992). This discrimination keeps the results representative of the dataset, since treating natural intra-sample variability as anomalous would have caused many more samples to be considered outliers.

4. Conclusions

In sedimentary studies, a few samples that deviate from common or expected characteristics can hinder the determination of the grain-size parameters required for sediment characterization. To date, the common criteria for the determination of outliers have been based either on visual identification or on the use of selected discrete parameters of the granulometric curve.

In the light of our results, the functional bagplot and the HDR boxplot proved to be more effective at detecting outliers than a comparable method for univariate data (the modified Z-score). Despite clear visual differences between the curves, the classical method was not able to detect outliers on the basis of just one characteristic parameter of the curve (the median in our case). Moreover, of the two functional outlier detection procedures tested, the HDR boxplot proved more suitable than the functional bagplot, since it was more sensitive, identifying a greater number of outlier curves while not flagging natural intra-sample variability.

The mathematical determination of these outliers will allow more accurate grain-size normalizations as a preliminary step in geochemical and sedimentological analyses.

Acknowledgments

Carlos Sierra obtained a grant from the “Severo Ochoa” program (Ficyt, Asturias, Spain). The authors thank Dr. Teresa Albuquerque, at the Polytechnic Institute of Castelo Branco (Portugal), for the assistance provided in the preparation of this paper.

References

Blott, S.J., Pye, K., 2001. GRADISTAT: a grain size distribution and statistics packagefor the analysis of unconsolidated sediments. Earth Surf. Process. Landforms26, 1237–1248.

Cronican, A.E., Gribb, M.M., 2004. Literature review: equations for predicting hydraulicconductivity based on grain-size data. Supplement to Technical Note entitled:hydraulic conductivity prediction for sandy soils. Ground Water 42, 459–464.

Dalrymple, R.W., Choi, K., 2007. Morphologic and facies trends through the fluvial–marinetransition in tide-dominated depositional systems: a schematic framework for envi-ronmental and sequence-stratigraphic interpretation. Earth Sci. Rev. 81, 135–174.

Febrero, M., Galeano, P., González-Manteiga,W., 2008. Outlier detection in functional databy depth measures, with application to identify abnormal NOx levels. Environmetrics19, 331–345.

Filzmoser, P., Maronna, R., Werner, M., 2008. Outlier identification in high dimensions.Comput. Stat. Data Anal. 52, 1694–1711.

Fischmeister, H.F., Zahn, R., 1966. Modern Developments in Powder Metallurgy. H.H.Hausner Plenum Press, New York.

Flor, G., 1979. Depósitos arenosos de las playas de la región de Cabo Peñas (Asturias):sedimentología y dinámica. PhD Thesis Department of Geology, University of Oviedo,Spain.

Flor, G., 1981. Las dunas eólicas costeras de la playa de Xagó (Asturias). Trab. Geol. 11, 61–71.Flor, G., 1986. Sedimentología de una duna lingüiforme en la playa de Xagó (Asturias). IX

Congreso Nacional de Sedimentología.Universidad de Salamanca 317–328.Flor-Blanco, G., Flor, G., Pando, L., 2013. Evolution of the Salinas–El Espartal and Xagó beach/

dune systems in north-western Spain over recent decades: evidence for responses tonatural processes and anthropogenic interventions. Geo-Mar. Lett. 33, 143–157.

Folk, R.L., 1971. Longitudinal dunes of the northwestern edge of the Simpson desert, Northern Territory, Australia. Geomorphology and grain size relationships. Sedimentology 16, 5–54.

Friedman, G.M., Johnson, K.G., 1982. Exercises in Sedimentology. Wiley, New York.

Gallego, J.R., Ordóñez, A., Loredo, J., 2002. Investigation of trace element sources from an industrialized area (Avilés, northern Spain) using multivariate statistical methods. Environ. Int. 27, 589–596.

Gee, G.W., Bauder, J.W., 1996. Particle size analysis. In: Klute, A. (Ed.), Methods of Soil Analysis. American Society of Agronomy, Madison, WI, pp. 383–411.

Hall, P.G., Lee, Y., Park, B., 2007. A method for projecting functional data onto a low-dimensional space. J. Comput. Graph. Stat. 16, 799–812.

Hawkins, D., 1980. Identification of Outliers. Chapman and Hall, London, UK.

Hyndman, R.J., Shang, H.L., 2008. Bagplots, boxplots and outlier detection for functional data. Functional and Operatorial Statistics. Springer, Heidelberg, pp. 201–207.

Hyndman, R.J., Ullah, M.S., 2007. Robust forecasting of mortality and fertility rates: a functional data approach. Comput. Stat. Data Anal. 51, 4942–4968.

Iglewicz, B., Hoaglin, D., 1993. How to detect and handle outliers. Particle Size Analysis. Laser Diffraction Methods. ASQC Quality Press, Milwaukee, WI (ISO 13320:2009).

Julivert, M., Truyols, J., Marcos, A., Arboleya, M.L., 1973. MAGNA 50 (2ª Serie). Hoja 13 – Avilés. Instituto Geológico y Minero de España.

López, J., Flor, G., 2008. Evolución ambiental del estuario de Avilés. Trab. Geol. 28, 119–135.

Matias, A., 2006. Overwash Sedimentary Dynamics in the Ria Formosa Barrier Islands. Ph.D. thesis Universidade do Algarve, Portugal.

Matias, A., Vila-Concejo, A., Ferreira, O., Morris, B., Dias, J.A., 2010. Sediment dynamics of barriers with frequent overwash. J. Coast. Res. 25, 768–780.

Merkus, H.G., 2008. Particle Size Measurements: Fundamentals, Practice, Quality Particle Technology. Springer-Verlag, New York.

Newell, R.I.E., 2004. Ecosystem influences of natural and cultivated populations of suspension-feeding bivalve molluscs: a review. J. Shellfish. Res. 23, 51–62.

Peterson, E., Small, W.M., 1993. Physical behavior of water-atomized iron powder: particle size distribution and apparent density. Int. J. Powder Metall. 29, 131–137.

Rousseeuw, P.J., Ruts, I., Tukey, J.W., 1999. The bagplot: a bivariate boxplot. Am. Stat. 53, 382–387.

Scott, D.W., 1992. Multivariate density estimation. Theory, Practice and Visualization. John Wiley & Sons, New York.

Shang, H.L., Hyndman, R.J., 2008. Boxplot and Outlier Detection for Functional Data. First International Workshop on Functional and Operatorial Statistics, Toulouse 19–21.

Stauble, D.K., 1992. Long term profile and sediment morphodynamics: field research facility case history. Coastal and Hydraulics Laboratory Technical Report CERC-92-7. U.S. Army Engineer Research and Development Center.

Szava-Kovats, R.C., 2002. Outlier-resistant errors-in-variables regression: anomaly recognition and grain-size correction in stream sediments. Appl. Geochem. 17, 1149–1157.

Tanner, W.F., 1986. Inherited and mixed traits in the grain size distribution. Proceedings of the 9th Symposium on Coastal Sedimentology, Tallahassee, FL, Department of Geology. Florida State University, pp. 9–21.

Tsoar, H., 1978. The Dynamics of Longitudinal Dunes. Final Technical Report DA-ERO76-G-072. European Research Office, U.S. Army, London.

Tukey, J.W., 1975. Mathematics and the picturing of data. Proceedings of the International Congress of Mathematicians, 2, pp. 523–531 (Vancouver).

Viscosi-Shirley, C., Mammone, K., Pisias, N., Dymond, J., 2003. Clay mineralogy and multi-element chemistry of surface sediments on the Siberian–Arctic shelf: implications for sediment provenance and grain size sorting. Cont. Shelf Res. 23, 1175–1200.

Weltje, G.J., von Eynatten, H., 2004. Quantitative provenance analysis of sediments: review and outlook. Sediment. Geol. 171, 1–11.

Yalcin, M.G., 2009. Heavy mineral distribution as related to environmental conditions for modern beach sediments from the Susanoglu (Atakent, Mersin, Turkey). Environ. Geol. 58, 119–129.