

Estuarine, Coastal and Shelf Science 94 (2011) 222–233

journal homepage: www.elsevier.com/locate/ecss

A diagnostic modelling framework to construct indices of biotic integrity: A case study of fish in the Zeeschelde estuary (Belgium)

Paul Quataert a,*, Pieter Verschelde a, Jan Breine a, Geert Verbeke b, Els Goetghebeur c, Frans Ollevier d

a Research Institute for Nature and Forest (INBO), Kliniekstraat 25, B-1070 Brussels, Belgium
b Leuven Biostatistics & Statistical Bioinformatics Centre, Katholieke Universiteit Leuven, U.Z. Sint-Rafael, Kapucijnenvoer 35, B-3000 Leuven, Belgium
c Department of Applied Mathematics and Computer Sciences, Ghent University, Krijgslaan 281 S9, Belgium
d Laboratory of Aquatic Ecology and Evolutionary Biology, Katholieke Universiteit Leuven, Charles Deberiotstraat 32, B-3000 Leuven, Belgium

Article info

Article history:
Received 13 October 2010
Accepted 18 June 2011
Available online 24 June 2011

Keywords: fish; estuaries; environmental assessment; statistical models; cross-validation; ROC curve; Belgium; Zeeschelde estuary

* Corresponding author.

0272-7714/$ - see front matter © 2011 Elsevier Ltd. doi:10.1016/j.ecss.2011.06.014

Abstract

We propose a coherent regression model building framework to construct fish-based indices. More specifically, we concentrate on the selection of an optimal set of metrics, which remains a difficult problem. The paper departs from the observation that an index of biotic integrity (IBI) is analogous to a diagnostic model in medicine assessing the health condition of a patient from a series of biomarkers. In the same vein, an IBI is a diagnostic model predicting the ecosystem condition of a site from a set of (scored) metrics. Metrics are community attributes sensitive to anthropogenic pressure and their scores express the “distance to target” to a reference condition. In a medical context, Receiver Operating Characteristic (ROC) curves are commonly used to assess the diagnostic accuracy of laboratory tests. An ROC curve plots the sensitivity of a test (Se; the capacity to detect a disease or degradation) as a function of its false positive fraction (FPF), which is the complement of the specificity (Sp = 1 - FPF; the capacity to recognise a healthy person or a reference condition). The ROC curve represents the strength of the index to discriminate between degraded and reference sites. Higher curves correspond to stronger tests, as a higher sensitivity can then be combined with a lower false positive fraction. Hence, it is intuitively clear to use summary statistics of the ROC curve as criteria to optimise medical tests or biotic indices. In this paper, we illustrate the value of this modelling framework with a case study in the Zeeschelde estuary in Belgium. In essence, a “traditional” IBI is an average of metrics scoring relevant properties of the ecosystem. We demonstrate that this average score model (AVG) is a special member of the more flexible predictive logistic model (PLM) family. The selection of a set of metrics becomes equivalent to variable selection in statistical model building. We apply model building techniques such as best subsets regression to facilitate the search for an optimal suite of metrics from a candidate set and use cross-validation to avoid overfitting. The results show that a few metrics suffice to discriminate between most-impacted and least-impacted sites.

© 2011 Elsevier Ltd. All rights reserved.

1. Introduction

The value of using fish as a biotic quality element to monitor the ecological status of rivers is well accepted (Karr, 1981; Hughes and Oberdorff, 1999; Schmutz et al., 2007). In estuaries, Thompson and Fitzhugh (1986) were among the first to integrate several fish metrics into a multi-metric index, a work continued by many others in the United States (Deegan et al., 1997; Roth et al., 1998). In South Africa, a multi-metric approach was developed by Harrison and Whitfield (2004, 2006). More recently in Europe, several fish-based estuarine indices were developed in the context of the Water Framework Directive (Araújo et al., 2000; Borja et al., 2004, 2009a; Breine et al., 2007; Coates et al., 2007; Breine et al., 2010; Delpech et al., 2010).

Karr’s proposal on how to compose and construct multi-metric indices of biotic integrity (Karr, 1981; Karr et al., 1986) started a rich research tradition of researchers working at the boundary between science and policy making (Huitema and Turnhout, 2009). These boundary workers gradually refined and extended the original concept to cover a broad range of situations (Hughes and Oberdorff, 1999). Guideline papers appeared standardising and consolidating the construction of biotic indices (Karr and Chu, 1997; Hughes et al., 1998; Hering et al., 2006; Roset et al., 2007; Southerland et al., 2007; Stoddard et al., 2008). However, none of these papers explicitly uses a statistical model building framework to select the optimal number of metrics. This situation is in sharp contrast with diagnostic models in medicine assessing in a cost-effective way the health condition of a patient based on a set of biomarkers (Zhou et al., 2002; Pepe, 2003). For instance, to distinguish healthy from diseased persons, it is common practice to develop a logistic model controlling for two possible types of error: false positives (a healthy person is predicted to be diseased) and false negatives (a diseased person is found to be healthy).

Fig. 1. The Zeeschelde estuary and sites surveyed during the period 1995–2008.

This paper departs from the observation that index development is very similar, if not equivalent, to the construction of a diagnostic model in a medical context. Tuning a diagnostic model in medicine starts from an independent, preferably ‘gold standard’ assessment (“preclassification”) of the health condition of the patients, based on intensive clinical investigations and expert judgement of clinicians (Zhou et al., 2002). The next step is the search for an optimal suite of predictors from a candidate set of plausible diagnostic criteria (“metrics”) to predict with minimal misclassification error the health status of the patients. For this optimisation, Receiver Operating Characteristic curves (ROC curves) play a central role, a concept (and tool) borrowed from statistical decision theory (Swets, 1988).

In this paper, we use a case study in the Zeeschelde estuary in Belgium to illustrate that a diagnostic modelling approach facilitates the construction of a fish-based index. We demonstrate the close link between the classical index format and a logistic regression model. In addition, we show how to use statistical model building techniques such as best subsets regression in combination with cross-validation to find the optimal suite of metrics from a candidate set. Finally, we estimate the diagnostic accuracy. More specifically, we explain how the ROC concept offers a coherent framework both for the optimisation criterion of the model building and for the choice of the decision boundary to assess the ecological condition of a site.

2. Materials and methods

2.1. The study area, typology and the sampling sites

2.1.1. The study area and stratification by salinity zone

The study area was the Zeeschelde estuary, the Belgian part of the estuary of the River Schelde which enters the North Sea (The Netherlands) (Fig. 1). The Zeeschelde is a single-channel, macrotidal estuary with intertidal area (Baeyens et al., 1998). As salinity is an important factor for the aquatic community (McLusky and Elliott, 2004), we distinguished three salinity zones according to the Venice system (1959): a mesohaline zone between Zandvliet (Dutch/Belgian border) and Antwerpen; an oligohaline zone from Antwerpen to Temse, including the Rupel tributary; and a freshwater zone further upstream until Gent, including the Durme tributary. Data were collected from 31 different sites over a period of nearly fifteen years (1995–2008) (Table 1).

Table 1. Number of fishing occasions per year in the different salinity zones of the Zeeschelde estuary retained for model building. Note: there are no observations in 1996 and 2000.

Zone         1995  1997  1998  1999  2001  2002  2003  2004  2005  2006  2007  2008  Total
Mesohaline      7     1     8     3     7     9     8     8     1     2     8     -     62
Oligohaline     3     7    15     -     6     2     2    14    13     5    22     6     95
Freshwater      -    13     8     -     3    17     5     6     7    10    26     7    102
Total          10    21    31     3    16    28    15    28    21    17    56    13    259

2.1.2. Data collection and selection

Trained fish biologists used a standardised protocol. One or two double fyke nets were positioned at low tide and emptied the next day. The fish were identified in the field to the species level. Occasional cross-examination in the field and laboratory ensured the identification quality. The catches were standardised as number of fish per fyke per day (CPUE). We excluded the winter period (December–February) to moderate seasonal variations (Baeyens et al., 1998). Some sites had a very high number of fishing occasions, so to reduce their impact we took at random one measurement per month. To reduce correlation, the algorithm restricted the selection to observations separated by at least 15 days. The final database retained 259 observations: 62 in the mesohaline zone, 95 in the oligohaline zone and 102 in the freshwater zone.

2.2. The model format (how to calculate an index from the catch data?)

Although there exist many variants, in essence, most biotic indices have the same four layer format (Breine et al., 2004) translating the catch data (or more generally the community data) step by step into an (ordinal) class variable assessing the ecosystem condition. Fig. 2 makes the statistical model explicit for the binary classification to distinguish between reference (‘R’) and degraded (‘D’) sites or between least-impacted and most-impacted as for this paper.

Fig. 2. Index model. Index values: EQM = ecological quality measure; EQC(b) = (binary) ecological quality class; HIC(b) = (binary) human impact class (preclassification) with “R” = reference (least-impacted) and “D” = degraded (most-impacted). Data: C = catch data; Mj = value for metric j (j = 1, 2, ..., J); Sj = scored metric value; pj(Mj) = position with respect to the septiles of the reference distribution of metric j. Model types: AVG = average score model; AVGw = weighted AVG; PLM = predictive logistic model. Model parameters: B = decision boundary; wj = weights; bj = regression coefficients (b0 = intercept).

2.2.1. The generic format of a multi-metric index of biotic integrity (IBI)

(i) The calculation starts with deriving from the catch data a set of metrics reflecting a series of ecologically relevant attributes of the ecosystem (composition, pattern or function) which are sensitive to the human alterations of the environment. (ii) Subsequently, the metrics are scored, to express how (dis)similar the metric values are relative to type-specific or site-specific reference conditions. (iii) The third step combines the individual scores (traditionally by summing or averaging) into one single test variable or yardstick assessing the overall impact, which we call the ecological quality measure (EQM). (iv) The fourth and final step compares the EQM with decision boundaries (cut-off values) defining the ecological quality class (EQC), an ordinal class variable expressing the degree of degradation of the ecosystem (or, expressed positively, the level of biotic integrity).

2.2.2. The candidate metrics

From the metrics identified by Breine et al. (2007), we selected seven to eight candidates based on their ecological relevance for each salinity zone (Table 2). All metrics are proportions expressing the fraction of specimens belonging to a certain ecological group or guild. Positive metrics are positively correlated with ecological quality and are hypothesized to decrease with increasing degradation; for negative metrics the reverse holds. We constructed two sets of candidate metrics, each set connected with a different hypothesis (Bailey et al., 2004). The first set reflects general hypotheses about the relation between human stress and changes in the ecological community (Karr and Chu, 1997). With increasing anthropogenic disturbance, species having narrow habitat or biotic requirements are expected to become less abundant: benthivores, invertivores, piscivores, rheophilic species, specialised spawners (= positive metrics) and, conversely, generalists become more dominant (omnivores = negative metric). The second set refers to specific functions of the estuary (Elliott et al., 2007): diadromous species, estuarine residents, marine juvenile migrants and marine seasonal migrants. We also included freshwater residents to investigate the freshwater–oligohaline interface.

Table 2. Overview of metrics per salinity zone: [+] = positive metric; [-] = negative metric. X = candidate metric selected for a salinity zone; AVG: included in optimal average score model; PLM: included in optimal predictive logistic model. Mesohaline columns.

Abbr.  Definition                      Candidate  Optimal

Generic metrics testing for the global functioning of an aquatic ecosystem
Ben    Benthivores [+]                 X          PLM/AVG
Inv    Invertivores [+]
Omn    Omnivores [-]
Pis    Piscivores [+]                  X
Rhb    Rheophilic fish [+]
Spa    Specialised spawner [+]         X

Specific metrics testing for functions of the estuary
Dia    Diadromous species [+]          X
Ers    Estuarine residents [+]         X
Fws    Freshwater residents [+]
Mjs    Marine juvenile migrants [+]    X          PLM/AVG
Mss    Marine seasonal migrants [+]    X
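Since every candidate metric is a proportion of specimens belonging to a guild, the metric step can be sketched in a few lines of Python. This is only an illustration: the species names, the guild memberships in `GUILDS` and the function name `guild_proportions` are hypothetical placeholders, not the assignments used in the study.

```python
from collections import Counter

# Hypothetical guild membership for three placeholder species; real
# assignments would follow the guild lists of Breine et al. (2007).
GUILDS = {
    "species_a": {"Ben", "Mjs"},
    "species_b": {"Pis", "Dia"},
    "species_c": {"Omn", "Fws"},
}

def guild_proportions(catch, guilds):
    """Fraction of caught specimens belonging to each guild."""
    total = sum(catch.values())
    if total == 0:
        return {}
    counts = Counter()
    for species, n in catch.items():
        # A species may belong to several guilds, so it can count more than once.
        for guild in guilds.get(species, ()):
            counts[guild] += n
    return {guild: n / total for guild, n in counts.items()}

# Illustrative catch: number of fish per species for one fishing occasion.
catch = {"species_a": 6, "species_b": 2, "species_c": 12}
metrics = guild_proportions(catch, GUILDS)
```

Because guilds overlap, the proportions need not sum to one; each metric is read on its own.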

2.2.3. Scoring the metrics with respect to a type-specific reference distribution

As no pristine sites were available, we chose the best available, least-impacted sites of each salinity zone to estimate the septiles of type-specific reference distributions to score the metrics. The septiles determine the position p of a metric value; e.g. p = 2, if and only if the metric value is larger than the first septile, but smaller than or equal to the second one. Then, for positive metrics, the score was set equal to (p - 1)/6. For negative metrics, the complement was used: 1 - (p - 1)/6 = (7 - p)/6 such that also in this case the scores are positively associated with ecological quality. The scores range from 0 (lowest quality) to 1 (highest quality) in steps of 1/6. This approach only slightly differs from the classical five class approach but allows for more combinations (e.g. with two metrics, there are 7 × 7 = 49 values possible for septiles compared to 5 × 5 = 25 for quintiles).
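The scoring rule can be written down compactly. The sketch below is a minimal illustration, assuming the septiles are estimated with Python's `statistics.quantiles` (the quantile estimator used in the study is not stated); the function names `septile_cuts`, `position` and `score` are introduced here for illustration only.

```python
from bisect import bisect_left
from fractions import Fraction
from statistics import quantiles

def septile_cuts(reference_values):
    """Six cut points dividing the reference distribution into seven classes."""
    # statistics.quantiles uses its default 'exclusive' method here; the
    # quantile rule used in the paper is not stated, so this is an assumption.
    return quantiles(reference_values, n=7)

def position(value, cuts):
    """p in 1..7; p = 2 iff first septile < value <= second septile."""
    # bisect_left counts the cut points strictly below the value.
    return bisect_left(cuts, value) + 1

def score(value, cuts, positive=True):
    """(p - 1)/6 for positive metrics, (7 - p)/6 for negative metrics."""
    p = position(value, cuts)
    return Fraction(p - 1, 6) if positive else Fraction(7 - p, 6)
```

Using `Fraction` keeps the seven possible scores exact (0, 1/6, ..., 1); converting to `float` is equally valid if exactness is not needed.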

2.2.4. The predictor function: the model equation to assess the ecological quality measure (EQM)

The first model is simply the average of the scored metrics (Sj). This average score model (AVG) is the traditional approach most often found in literature. A first straightforward extension is a weighted average (AVGw) giving the metrics a different weight to be fixed in advance (expert judgement) or to be estimated. A third variant makes the connection to a predictive logistic model (PLM) (Hosmer and Lemeshow, 2000) by introduction of the logit link (McCullagh and Nelder, 1989). Because the logit-transformation is a monotone function not changing the ranking of the scores, PLM is equivalent to AVGw, only the scale differs. However, we can use logistic regression software to fit the model. The intercept (b0) can be dropped. The term does not contribute to the ranking as each site gets the same additional constant. Centring the scores (Sj - 0.5) results in EQM = 0.5 if all Sj = 0.5, which facilitates the (graphical) comparison with AVG for which the same relation holds (EQM = 0.5 if all Sj = 0.5).
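A small Python sketch may help to see why PLM and AVGw rank sites identically. It assumes the usual logistic form, EQM = 1 / (1 + exp(-eta)) with eta = b0 + sum of bj (Sj - 0.5); the function names and the coefficient values are illustrative assumptions, not fitted values from the study.

```python
import math

def eqm_avg(scores):
    """Average score model (AVG)."""
    return sum(scores) / len(scores)

def eqm_avgw(scores, weights):
    """Weighted average score model (AVGw)."""
    return sum(w * s for w, s in zip(weights, scores)) / sum(weights)

def eqm_plm(scores, betas, intercept=0.0):
    """Inverse logit of a linear predictor in the centred scores (PLM)."""
    eta = intercept + sum(b * (s - 0.5) for b, s in zip(betas, scores))
    return 1.0 / (1.0 + math.exp(-eta))

# Four illustrative sites scored on two metrics, with assumed coefficients.
sites = [(0.0, 1.0), (1 / 6, 5 / 6), (1.0, 0.0), (0.5, 0.5)]
betas = (3.0, 1.0)

rank_avgw = sorted(range(len(sites)), key=lambda i: eqm_avgw(sites[i], betas))
rank_plm = sorted(range(len(sites)), key=lambda i: eqm_plm(sites[i], betas))
```

With positive coefficients, `rank_avgw` and `rank_plm` coincide because the inverse logit is monotone; only the scale of the EQM differs.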

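Table 2 (continued). Oligohaline and freshwater columns. X = candidate metric selected for a salinity zone; AVG: included in optimal average score model; PLM: included in optimal predictive logistic model. Entries per row in the order Oligohaline (Candidate, Optimal), Freshwater (Candidate, Optimal); rows without entries are omitted.

X AVG X AVG
X X
X X
X PLM/AVG X

X
X X PLM/AVG

X X
X
X PLM X PLM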

Table 3. Habitat indicators and threshold values used to preclassify the sites. Scores range from high (1) to bad (5) quality (after Aubry and Elliott, 2006 and further adapted from Breine et al., 2010).

Exposure indicators (anthropogenic state & habitat alterations)

Parameter                          Score 1  Score 2    Score 3    Score 4    Score 5
Minimum dissolved oxygen (DO) (%)  >80      ≤80 & >70  ≤70 & >50  ≤50 & >30  ≤30
Benthos                            Classification explained in Brys et al. (2005) and Speybroeck et al. (2008)
Intertidal area loss (%)           0        <20        ≥20 & <30  ≥30 & <50  ≥50
Land reclamation (%)               0        <5         ≥5 & <40   ≥40 & <60  ≥60

Stressor indicators (anthropogenic activities & pressures), scored from 1 to 5

Parameter                                    Levels
Port & marina activities (absence/presence)  No; Yes
Industrial activities (expert judgement)     Low; Moderate; High
Dredging activities (absence/presence)       No; Yes

Thresholds to derive human impact class

Human Impact Score        7         8–14  15–21     22–28  29–35
Human Impact Level        Very low  Low   Moderate  High   Very high
Human Impact Class (HIC)  1         2     3         4      5

2.2.5. The decision rule to assess the ecological quality class

The ecological quality class (EQC) is an ordinal class variable reflecting the ecological integrity, for instance, from category 1 (very high) to 5 (very low). For our case study, the classification is binary and discriminates between least-impacted (reference) and most-impacted (degraded) sites. For both AVG and PLM, the same decision rule applies: compare EQM with a decision boundary (B). If EQM ≤ B, decide the site is degraded (“D”). Conversely, if EQM > B, decide the site is reference (“R”).

Table 4. Pressure status per salinity zone of the Zeeschelde estuary.

                                Mesohaline zone  Oligohaline zone  Freshwater zone
Human impact class (HIC)
2 (Low impact)                  51               -                 31
3 (Moderate impact)             11               53                50
4 (High impact)                 -                42                21

Binary classification (HICb)
R (least-impacted / Reference)  51               53                31
D (most-impacted / Degraded)    11               42                71
Total                           62               95                102

2.3. The strategy to calibrate the index model

2.3.1. The (binary) preclassification

To calibrate and fit the above model, i.e. to select the appropriate metrics and determine the unknown parameters, it is necessary to have a sample of catch data of sites for which the ecological condition is preclassified independently from the community data. For this objective, we adapted a procedure from Aubry and Elliott (2006) who provided a coherent and comprehensive set of indicators scoring the ecological impact of anthropogenic pressures and activities in estuaries and coasts from 1 (very low impact) to 5 (very high impact). The sum of these scores results in a one-dimensional ranking of the sites reflecting the (total) human impact (HIC). To compose a balanced set, we selected two complementary groups of indicators (Table 3): exposure and stressor indicators (Bedoya et al., 2009).

The exposure indicators address factual anthropogenic environment alterations impacting directly the ecological community. The first two variables in Table 3 (dissolved oxygen and benthos) are factors supporting fish life (Turnpenny et al., 2006); the next two variables express the loss of habitat availability for fish (Madon, 2008). For oxygen, the yearly median of monthly measurements was used for scoring. Benthos was scored on a yearly basis (Brys et al., 2005). Intertidal area loss (%) and land reclamation (%) were determined with respect to the intertidal surface in 1960 and old maps from 1890 respectively. There is overlap between both variables resulting in double counting. However, as land reclamation is more linked to industrial development, we gave it extra weight.

The stressor indicators make an inventory of anthropogenic activities deteriorating the ecosystem. Boat traffic and construction activities have a negative effect on fish life (Tull, 2006). The presence of marinas was assessed with aerial photographs. Industrial activities (e.g. bank reinforcement) decrease habitat diversity and occasional pollution has a negative impact on fish assemblages (Wheeler, 1969; Sindilariu et al., 2006). The degree of industrial activity (low, moderate or high) was provided by experts. Dredging negatively influences benthic communities, a principal food resource for some estuarine fish species (Elliott et al., 1998; Gard, 2002; Kennish, 2002). The Maritime Access Division of the Flemish Ministry provided data about the channel dredging activities.

A complication was that the human impact was strongly associated with the salinity zone (Table 4). In the mesohaline zone, no sites of high human impact (HIC = 4) were found, while in the oligohaline zone, sites of low human impact (HIC = 2) were absent. This confounding of the salinity and pressure gradient precluded constructing one global model for all zones simultaneously. Therefore, we stratified by salinity zone and developed three separate models contrasting the least-impacted sites (denoted as reference sites: ‘R’) with the most-impacted sites (denoted as degraded sites: ‘D’).
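The preclassification can be sketched as follows. The mapping from the summed score to the human impact class uses the bands of Table 3; the assumption that exactly seven indicators are each scored from 1 to 5 is inferred from the 7–35 score range, and the rule "lowest HIC observed in a zone is reference" is one reading that reproduces the counts of Table 4, not a rule stated by the authors. Any extra weighting of individual indicators is not reproduced, and the function names are hypothetical.

```python
from bisect import bisect_left

# Upper bounds of the Human Impact Score bands in Table 3 (7, 8-14, ..., 29-35).
UPPER_BOUNDS = [7, 14, 21, 28, 35]

def human_impact_class(indicator_scores):
    """Map the sum of seven indicator scores (each 1..5) to HIC 1..5."""
    if len(indicator_scores) != 7:
        raise ValueError("expected seven indicator scores")
    if not all(1 <= s <= 5 for s in indicator_scores):
        raise ValueError("each indicator score must lie between 1 and 5")
    # Index of the first band whose upper bound is >= the total score.
    return bisect_left(UPPER_BOUNDS, sum(indicator_scores)) + 1

def binary_preclass(hic_values):
    """Within one salinity zone: lowest observed HIC -> 'R', all others -> 'D'."""
    least = min(hic_values)
    return ["R" if h == least else "D" for h in hic_values]
```

Applied per zone, `binary_preclass` gives 'R' to HIC = 2 in the mesohaline and freshwater zones and to HIC = 3 in the oligohaline zone, in line with Table 4.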

2.3.2. Diagnostic accuracy as the optimisation criterion

The index is optimised to match as closely as possible the preclassification, which is the response variable of the model. In a binary classification, we should distinguish between two types of errors (Quataert et al., 2007). If a reference site (with respect to the preclassification) is misclassified as degraded (by the index), we have a false positive error (FP); a false negative error (FN) occurs when a degraded site is misclassified as reference. When optimising an index, we should pay attention to each error separately as they have different implications. For instance, if we use the index to decide about restoration, an FP implies that a reference site is restored unnecessarily, depleting management resources and risking causing harm because of a wrong treatment. Conversely, with an FN a degraded site is not treated, thus continuing an unfavourable situation. In this respect, a logical optimisation criterion for the index development is to minimise both the false positive fraction (FPF) and false negative fraction (FNF) or, equivalently, to maximise the specificity (Sp) and the sensitivity (Se). The specificity or true negative fraction (TNF) expresses the capacity of the index to recognise reference sites and is equal to the complement of FPF (Sp = TNF = 1 - FPF). Conversely, the sensitivity or the true positive fraction (TPF) is the capacity of the index to detect a degraded site and is the complement of FNF (Se = TPF = 1 - FNF).
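The four fractions follow directly from comparing the index classification with the preclassification. The sketch below assumes the decision rule of Section 2.2.5 (a site is degraded when EQM ≤ B); the function names and the example data are illustrative only.

```python
def classify(eqm, boundary):
    """Decision rule: degraded ('D') if EQM <= B, reference ('R') otherwise."""
    return "D" if eqm <= boundary else "R"

def error_fractions(eqm_values, preclass, boundary):
    """Return FPF, FNF, Sp and Se for a given decision boundary."""
    fp = fn = n_ref = n_deg = 0
    for eqm, truth in zip(eqm_values, preclass):
        predicted = classify(eqm, boundary)
        if truth == "R":
            n_ref += 1
            fp += predicted == "D"   # reference site flagged as degraded
        else:
            n_deg += 1
            fn += predicted == "R"   # degraded site passed as reference
    fpf = fp / n_ref
    fnf = fn / n_deg
    return {"FPF": fpf, "FNF": fnf, "Sp": 1 - fpf, "Se": 1 - fnf}

# Four illustrative sites with a boundary of 0.5.
result = error_fractions([0.2, 0.4, 0.6, 0.8], ["D", "R", "D", "R"], 0.5)
```

In this toy example one of the two reference sites and one of the two degraded sites are misclassified, so all four fractions equal 0.5.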

2.3.3. The ROC curve

A complication of the above optimisation criteria is that sensitivity and specificity depend on the decision boundary B used as a cut-off to classify the sites (see decision rule in Fig. 2). The Receiver Operating Characteristic (ROC) curve gives a typical graphical representation of how the sensitivity changes as a function of the complement of the specificity for a changing decision threshold (Zhou et al., 2002; Pepe, 2003). Fig. 3 gives examples for three different indices with increasing strength. We can understand an ROC curve as follows. We start from any point on the curve corresponding to a certain B. Assuming that the ecological quality measure (EQM) is positively associated with ecosystem quality (the same reasoning applies when the converse is true), it is logical to decide a site is impacted when EQM < B. Setting B higher, we sooner conclude a site is degraded. For the degraded sites, this more liberal policy is beneficial: the sensitivity (Se) will increase. In contrast, for reference sites, this is disadvantageous: the false positive fraction (FPF) will increase or the specificity (Sp = 1 - FPF) will decrease. We move upwards on the curve. The opposite occurs when lowering B and we move downwards. Interestingly, from ROC curves, we can graphically judge at one glance which index is superior: the higher the curve, the better. For instance, with a fixed specificity (dotted line), the corresponding sensitivity is superior: a″ > a′ > a; or when balancing sensitivity and specificity: b″ > b′ > b. Hence, index construction comes down to optimising the (height of the) ROC curve. As it is not evident to optimise a curve as a whole, summary statistics are often used. Since it was considered important to keep both errors small, we chose to minimise max(FPF, FNF) to keep the errors as balanced as possible, which corresponds to the points b in Fig. 3.

Fig. 3. The receiver operating characteristic curves (ROC) and the choice of the decision boundary: a, a′, a″ = false positive fraction or specificity fixed (dotted line); b, b′, b″ = balanced errors (or specificity = sensitivity). Axes: 1 - specificity (horizontal) against sensitivity (vertical), both from 0 to 1.
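The balanced-error criterion of Section 2.3.3 can be sketched by scanning the observed EQM values as candidate boundaries. This is a minimal illustration assuming the rule "degraded when EQM ≤ B"; the function names `roc_points` and `balanced_boundary` and the example data are hypothetical.

```python
def roc_points(eqm_values, preclass):
    """(B, FPF, Se) for each candidate boundary B; a site is 'D' when EQM <= B."""
    n_ref = preclass.count("R")
    n_deg = preclass.count("D")
    # -inf classifies every site as reference: the (0, 0) corner of the curve.
    candidates = [float("-inf")] + sorted(set(eqm_values))
    points = []
    for b in candidates:
        fp = sum(1 for e, t in zip(eqm_values, preclass) if t == "R" and e <= b)
        tp = sum(1 for e, t in zip(eqm_values, preclass) if t == "D" and e <= b)
        points.append((b, fp / n_ref, tp / n_deg))
    return points

def balanced_boundary(eqm_values, preclass):
    """ROC point minimising max(FPF, FNF), with FNF = 1 - Se."""
    return min(roc_points(eqm_values, preclass),
               key=lambda p: max(p[1], 1 - p[2]))

# Seven illustrative sites: three degraded, four reference.
eqm = [0.1, 0.3, 0.35, 0.5, 0.6, 0.7, 0.9]
pre = ["D", "D", "R", "D", "R", "R", "R"]
b, fpf, se = balanced_boundary(eqm, pre)
```

Plotting the `(FPF, Se)` pairs returned by `roc_points` gives the empirical ROC curve; the selected point plays the role of b in Fig. 3.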

2.3.4. Best subsets regression and cross-validation

As stepwise modelling can miss the optimal model, we fitted all possible subsets (Kutner et al., 2005) of the candidate metrics up to four metrics and determined the optimal number of metrics by boxplots displaying the trend of the false positive and false negative fraction as a function of the number of metrics in the model. To select the optimal model, we did not investigate the best model only, but also interpreted close competitors, for a deeper understanding of the results (Claeskens and Hjort, 2008). To obtain an unbiased estimate of the diagnostic accuracy, we used a guided tenfold cross-validation. The dataset was split at random in ten parts but guided such that each part mirrored the proportion of least- and most-impacted sites in the original dataset. In turn, each part was set aside for validation of the model calibrated by the nine other parts, i.e. estimating the diagnostic accuracy from the predicted values of the calibrated model. Averaging over these ten estimates resulted in a global cross-validated estimate of diagnostic accuracy.
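The combination of an exhaustive subset search with a class-balanced ("guided") tenfold cross-validation can be sketched as below. To stay self-contained, the sketch uses only the average score model (AVG) and ranks subsets by the cross-validated max(FPF, FNF); fitting the PLM would additionally need a logistic regression routine. All function names are hypothetical, and the way folds are dealt out is one possible way to mirror the class proportions, not necessarily the authors' procedure.

```python
import random
from itertools import combinations

def error_pair(eqm, labels, boundary):
    """Return (FPF, FNF) for the rule: degraded ('D') when EQM <= boundary."""
    n_ref = labels.count("R")
    n_deg = labels.count("D")
    fp = sum(1 for e, t in zip(eqm, labels) if t == "R" and e <= boundary)
    fn = sum(1 for e, t in zip(eqm, labels) if t == "D" and e > boundary)
    return fp / n_ref, fn / n_deg

def calibrate_boundary(eqm, labels):
    """Pick the boundary that minimises max(FPF, FNF) on the calibration data."""
    return min(sorted(set(eqm)), key=lambda b: max(error_pair(eqm, labels, b)))

def avg_eqm(scores, subset, sites):
    """Average score model restricted to a subset of metrics."""
    return [sum(scores[m][i] for m in subset) / len(subset) for i in sites]

def stratified_folds(labels, n_folds=10, seed=0):
    """Shuffle each class, then deal its sites round-robin over the folds."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in ("R", "D"):
        members = [i for i, t in enumerate(labels) if t == cls]
        if len(members) < n_folds:
            raise ValueError("each class needs at least n_folds sites")
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % n_folds].append(i)
    return folds

def cv_error(scores, labels, subset, n_folds=10, seed=0):
    """Cross-validated max(FPF, FNF), averaged over the folds."""
    estimates = []
    for held_out in stratified_folds(labels, n_folds, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        b = calibrate_boundary(avg_eqm(scores, subset, train),
                               [labels[i] for i in train])
        estimates.append(max(error_pair(avg_eqm(scores, subset, held_out),
                                        [labels[i] for i in held_out], b)))
    return sum(estimates) / len(estimates)

def best_subsets(scores, labels, max_size=4, n_folds=10, seed=0):
    """Rank every metric subset of size 1..max_size by cross-validated error."""
    ranked = []
    for k in range(1, max_size + 1):
        for subset in combinations(sorted(scores), k):
            ranked.append((cv_error(scores, labels, subset, n_folds, seed), subset))
    return sorted(ranked)
```

Here `scores` maps each metric abbreviation to a list of scored values per site and `labels` holds the preclassification ('R' or 'D'). Inspecting the first few entries of the ranked list, rather than only the top one, mirrors the advice to interpret close competitors.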

3. Results

3.1. Best subset regression

Boxplots visualize the evolution of the false positive and false negative fraction of all possible metric subsets as a function of the number of metrics (Fig. 4). Going from one to two metrics in the model improved the balance between the two error types for both models (AVG and PLM). The trend diverged when a third metric was added to the model. For AVG the boxplots widened and the balance between false positive and false negative misclassification disappeared. For PLM, the boxplots narrowed further (implying that many competing models exist) but the diagnostic accuracy of the best models did not improve substantially.

3.2. The best models

Table 5 lists the best subsets for each number of metrics and gives close competitors when the model consists of two metrics. The table confirms the boxplot pattern, i.e. using more than two metrics does not improve the diagnostic accuracy. With two metrics in the model, 25%–30% of the sites were misclassified and the balance between false positives and false negatives is in most situations good. To improve the comparison with AVG, we rescaled the regression coefficients of PLM such that their sum equals one. This transformation does not alter the classification performance of the index as only the ranking of the sites with respect to the boundary threshold is important.
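The invariance under rescaling can be checked with a small sketch. The coefficients and metric scores below are made up for illustration, and the argument assumes that the sum of the slope coefficients is positive, so that dividing by it preserves the ordering of the sites.

```python
def rescale(coefs):
    """Divide the slope coefficients by their sum so the weights add up to one."""
    total = sum(coefs)
    return [c / total for c in coefs]


def score(weights, metrics):
    """Weighted sum of the scored metrics of one site."""
    return sum(w * m for w, m in zip(weights, metrics))


# Hypothetical fitted slopes and scored metrics for four sites.
coefs = [2.5, 0.5]
sites = [(0.9, 0.2), (0.4, 0.8), (0.1, 0.3), (0.6, 0.6)]

weights = rescale(coefs)
rank_raw = sorted(range(len(sites)), key=lambda i: score(coefs, sites[i]))
rank_scaled = sorted(range(len(sites)), key=lambda i: score(weights, sites[i]))
print(weights, rank_raw == rank_scaled)
```

Because every score is divided by the same positive constant, the decision boundary is rescaled by that constant as well and each site stays on the same side of it.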

3.3. The estuarine biotic index (EBI)

Although there was no metric that was clearly the best, we selected (see discussion) in each zone a best subset with two metrics (marked with AVG and PLM in Table 2). The boxplots in Fig. 5 compare the scores of the degraded and reference sites (centred around 0.5 by construction) and show the decision boundary making sensitivity and specificity equal. The graphs give a visual representation of the discriminative capacity of the biotic indices to contrast least-impacted and most-impacted sites.


[Figure 4: six boxplot panels, PLM and AVG for each of the mesohaline, oligohaline and freshwater zones. Horizontal axis: number of metrics in the model (1 to 4), with N = 7, 21, 35, 35 in the mesohaline zone and N = 8, 28, 56, 70 in the oligohaline and freshwater zones. Vertical axis: false positive fraction (FPF) and false negative fraction (FNF), from 0.0 to 0.8.]

Fig. 4. Boxplots of the false positive fraction (FPF) (left) and false negative fraction (FNF) (right) of all possible combinations of candidate metrics for the three salinity zones (mesohaline, oligohaline, freshwater) as a function of the number of metrics in the model. PLM: predictive logistic model; AVG: average model. N = number of models fitted.

Table 5
The optimal models (metrics included + regression coefficients (PLM) or weights (AVG)) as a function of the number of metrics in the model (J) and their diagnostic accuracy (DA%: false positive fraction (left)/false negative fraction (right)). For J = 2 competing models are shown (see text). For the definition of the metrics, see Table 3. For each zone, a final model is proposed (see text).

| J | Predictive logistic model (PLM) | DA% | Average score model (AVG) | DA% |
|---|---|---|---|---|
| Mesohaline zone | | | | |
| 1 | Mjs | 27/18 | Mjs | 16/27 |
| 2 | 0.83 Mjs + 0.17 Ben | 24/27 | (Mjs + Ben)/2 | 25/9 |
| 2 | 0.77 Mjs + 0.33 Pis | 24/27 | (Mjs + Pis)/2 | 18/36 |
| 3 | 0.66 Mjs + 0.19 Pis + 0.15 Ben | 25/27 | (Mjs + Ben + Pis)/3 | 12/36 |
| Final model | (4 Mjs + Ben + Pis)/6 | | | |
| Oligohaline zone | | | | |
| 1 | Pis | 28/21 | Pis | 15/38 |
| 2 | 0.60 Pis + 0.40 Fws | 23/21 | (Pis + Fws)/2 | 19/33 |
| 2 | 0.67 Pis + 0.33 Ben | 28/26 | (Pis + Ben)/2 | 25/29 |
| 3 | 0.52 Pis + 0.25 Fws + 0.23 Ben | 25/24 | (Pis + Dia + Fws)/3 | 30/26 |
| 4 | 0.50 Pis + 0.28 Fws + 0.24 Ben - 0.04 Dia | 26/26 | (Pis + Dia + Fws + Ben)/4 | 26/31 |
| Final model | (2 Pis + Fws + Ben)/2 | | | |
| Freshwater zone | | | | |
| 1 | Fws | 29/30 | Pis | 42/38 |
| 2 | 0.63 Fws + 0.47 Spa | 29/28 | (Spa + Ben)/2 | 29/27 |
| 2 | 0.60 Omn + 0.40 Fws | 26/30 | (Spa + Fws)/2 | 26/30 |
| 2 | 0.55 Omn + 0.35 Ben | 29/32 | (Spa + Omn)/2 | 29/28 |
| 3 | 0.42 Fws + 0.26 Spa + 0.33 Omn | 29/30 | (Ben + Spa + Omn)/3 | 32/31 |
| 4 | 0.41 Fws + 0.25 Spa + 0.32 Omn - 0.02 Pis | 29/31 | (Ben + Spa + Omn + Fws)/4 | 26/39 |
| Final model | (Ben + Spa + Omn)/3 | | | |

P. Quataert et al. / Estuarine, Coastal and Shelf Science 94 (2011) 222–233

4. Discussion

4.1. The diagnostic modelling framework and model building strategy

4.1.1. The link between index development and statistical model building

In most situations, a multi-metric index is simply an average of scored metrics (AVG-model). Fig. 2 links and extends this model to the more flexible logistic regression model (PLM-model), giving a different weight to each metric (estimable by standard statistical software). The link shows that the index construction is very close, if not equivalent, to statistical model building. From a candidate set of explanatory variables (scored metrics) as potential predictors of the response variable (anthropogenic impact on the ecosystem), the optimal metric suite is searched according to some preset quality criterion (diagnostic accuracy of the index). Embedding the index construction in a model building framework broadens its theoretical support. It contributes to a more integrated approach providing a clear and meaningful optimisation criterion and allows an automated investigation of all possible subsets and cross-validation to estimate unbiasedly the diagnostic accuracy. The connection of AVG with PLM is in itself interesting because it does not exclude the traditional approach and makes it possible to evaluate whether a more complex approach is warranted. In our case study, both model types had a similar quality. However, the logistic modelling has a greater potential. As will be discussed in section 4.5, it allows the exploration of whether a different weighting of the metrics can improve the model.
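The relation between the two model types can be written out as follows. This is a sketch in assumed notation (not copied from Fig. 2): $M_{ij}$ is the score of metric $j$ at site $i$, $J$ the number of metrics in the model, $\pi_i$ the modelled probability that site $i$ is in reference condition, and $\beta_0, \beta_1, \dots, \beta_J$ the regression coefficients.

```latex
\begin{align}
\text{PLM:}\quad & \operatorname{logit}(\pi_i) = \beta_0 + \sum_{j=1}^{J} \beta_j M_{ij}, \\
\text{AVG:}\quad & \mathrm{score}_i = \frac{1}{J} \sum_{j=1}^{J} M_{ij}.
\end{align}
```

Under this notation, the AVG score is the PLM linear predictor with all slopes fixed to the same value, $\beta_1 = \dots = \beta_J$, and rescaled so that the weights sum to one. Only the decision boundary remains to be chosen, whereas the PLM additionally estimates a separate weight for each metric.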

4.1.2. The optimisation criterion

The diagnostic accuracy is a logical and meaningful optimisation criterion. It is very informative for the end user to have an idea about the discriminative capacity of the biotic index. An important insight is that the end users should differentiate between the false positive and false negative error because both types of error imply a different cost (Quataert et al., 2007). A similar idea was expressed by Murtaugh (1996) who advocated the use of methods coming from signal detection theory (Swets, 1988) to assess the usefulness of ecological indicators which are cheap proxies for an otherwise complex and/or hard to measure reality. An ROC curve shows how the sensitivity of an index increases as a function of the complement of the specificity, i.e. the false positive fraction. The decision boundary fixes a point on this line, but it has no influence on the line. The curve as a whole represents the intrinsic capacity of a diagnostic test or model to discriminate between the different states of a condition (Zhou et al., 2002). Quataert et al. (2007) presented a similar approach to evaluate the European Fish Index developed by the European FAME project (Schmutz et al., 2007). Their error curve is equivalent to the ROC curve, but plots the false negative fraction as a function of the false positive fraction. Here, we used the ROC because it is a more generally accepted framework.

Fig. 5. Boxplots of the prediction of the final models. The lines are the boundaries discriminating between reference (best available) and degraded sites. PLM-score = EQM as calculated by the predictive logistic model; AVG-score = EQM as calculated by the average score model. R = reference (least-impacted), D = degraded (most-impacted).

The index construction comes down to searching the subset of metrics optimising the ROC curve. As it is not evident to optimise a curve, summary characteristics are used as the optimisation criterion. Breine et al. (2007) used the area under the curve (AUC). Although a valid criterion, it has the drawback that the full range is used including values of the sensitivity or specificity which are too low to be of any practical value. A first alternative is optimising the sensitivity at a given false positive fraction (e.g. 20%) or a range of it (e.g. 10–30%), i.e. a partial AUC (Dodd and Pepe, 2003). A second alternative, used here, is to optimise the ROC at the point where false positive fraction and false negative fraction (or, totally equivalently, their complements specificity and sensitivity) are (about) equal. This optimisation also signals whether it is possible to find a decision boundary balancing both types of error.
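The two summary characteristics contrasted above, the full AUC and the point where both error fractions are equal, can be computed from an empirical ROC curve as in the following sketch. It assumes that low index scores flag degradation; the function names and the example data are hypothetical.

```python
def roc_points(scores, degraded):
    """Empirical ROC points (FPF, Se), one per cut-off, starting at (0, 0)."""
    ref = [s for s, d in zip(scores, degraded) if not d]
    deg = [s for s, d in zip(scores, degraded) if d]
    points = [(0.0, 0.0)]
    for t in sorted(set(scores)):
        fpf = sum(s <= t for s in ref) / len(ref)   # reference sites flagged
        se = sum(s <= t for s in deg) / len(deg)    # degraded sites detected
        points.append((fpf, se))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def balanced_point(points):
    """ROC point where FPF and FNF = 1 - Se are closest to equal."""
    return min(points, key=lambda p: abs(p[0] - (1 - p[1])))


# Hypothetical index scores (higher = closer to reference condition).
scores = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]
degraded = [False, False, False, False, True, True, True, True]

pts = roc_points(scores, degraded)
print(auc(pts), balanced_point(pts))
```

The AUC integrates over all cut-offs, including those with an impractically low sensitivity or specificity, whereas the balanced point reports the performance at a single usable decision boundary.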

4.1.3. All possible subsets up to four metrics

We fitted models for all possible subsets of the candidate metrics (up to four metrics) and plotted the diagnostic accuracy as a function of the number of metrics in the subset (Fig. 4). For each number of metrics, the optimal model was selected. For the subsets with two metrics, we also explored models with a diagnostic accuracy close to the optimal model (Table 5). This best subset exploration improves the stepwise approach explained in Breine et al. (2007) which can miss the best model as the first metric entered is not necessarily best in combination with others (Kutner et al., 2005). For instance, the optimal AVG model in the freshwater zone with a subset of two metrics did not include Piscivores which is the optimal choice in the model with one metric (Table 5).

Investigation of all possible subsets provided more assurance about the optimal number of metrics in the model. The boxplots show that once there are two metrics in the model, there is no substantial improvement possible anymore. From three metrics on, the diagnostic quality of the AVG-models starts to deteriorate and the balance between false positives and negatives disappears. With PLM, the diagnostic quality still improves but slightly. The PLM-subsets with four metrics become very compact which implies that it is possible to construct a good model with nearly any combination of four metrics, but some of the coefficients become negative. This is a signal of overfitting (Harrell et al., 1996): the model fits particularities of the sample, instead of modelling the underlying pattern.
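The size of an all-subsets search is easy to enumerate. The sketch below counts the subsets of one to four metrics and selects the best subset per size for an arbitrary criterion. The counts for 7 and 8 candidates match the N values printed under the panels of Fig. 4, which suggests (as an inference, not a statement from the text) that 7 candidate metrics were used in the mesohaline zone and 8 in the other two zones. The helper names are hypothetical.

```python
from itertools import combinations
from math import comb

# Number of subsets with 1..4 metrics for 7 and 8 candidate metrics.
counts_7 = [comb(7, j) for j in range(1, 5)]
counts_8 = [comb(8, j) for j in range(1, 5)]


def all_subsets(metrics, max_size=4):
    """Yield every subset of the candidate metrics with 1..max_size members."""
    for j in range(1, max_size + 1):
        yield from combinations(metrics, j)


def best_per_size(metrics, criterion, max_size=4):
    """For each subset size, keep the subset with the lowest criterion value."""
    best = {}
    for subset in all_subsets(metrics, max_size):
        value = criterion(subset)       # e.g. a cross-validated max(FPF, FNF)
        j = len(subset)
        if j not in best or value < best[j][1]:
            best[j] = (subset, value)
    return best


print(counts_7, counts_8)
```

Keeping the full list of criterion values per size, rather than only the minimum, would give the data behind boxplots such as those in Fig. 4 and make close competitors visible.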

Finally, screening of all possible subsets offers information about the vicinity of the best model. Many models have about the same optimal diagnostic accuracy; hence the choice of an optimal combination of metrics is rather relative. With a slightly different set of fish data, another combination of metrics would have been selected. There was no clear winning combination. Hence, the label ‘optimal model’ should be interpreted with care. It is important to get insight in how close competing models are (McCullagh and Nelder, 1989). Data seldom point unequivocally to one single model, and ignoring this ambiguity can result in a wrong interpretation of the data or miss an important relation (Kutner et al., 2005).

4.1.4. Cross-validation and external validation

Cross-validation offers a protection against both overfitting (including too many metrics) and biased estimation of the diagnostic accuracy (Hosmer and Lemeshow, 2000). Setting aside subsets that are not used to fit the model allows the assessment of whether an additional variable really improves the prediction quality or just models particularities of the sample. By the same mechanism the diagnostic accuracy is estimated unbiasedly because the subsets set aside do not interfere in the model step (Harrell et al., 1996). Cross-validation is more efficient than splitting the dataset once in a calibration and a validation set, especially for smaller samples (Steyerberg et al., 2001).

For our case study, cross-validation indicated a sensitivity and specificity of about 75% (70% in the freshwater zone). It is important to recognise that cross-validation is an internal validation procedure: a flaw in the study design, a different context or wrong data collection will remain unnoticed. External validation is necessary with follow-up studies (Hosmer and Lemeshow, 2000). Paul et al. (2001) reported that the sensitivity of their index decreased from 90% to 80%. We can expect a similar decrease in quality. In this respect, application of the index is instrumental for further improvement of the index; for instance by combining management with an experimental setting (Underwood, 1995) in the follow-up of restoration plans of the Zeeschelde (Van den Bergh et al., 2005).

4.2. Design and data quality issues

4.2.1. The stratification in salinity zones

As salinity has an important impact on the aquatic community (McLusky and Elliott, 2004), we distinguished three salinity zones (mesohaline, oligohaline and freshwater) to develop a type-specific (Kelly-Quin et al., 2009) biotic index better controlling for natural variability (Roset et al., 2007). With the introduction of the salinity zones, the salinity gradient does not disappear totally. A study of Greenwood (2007) revealed that nekton communities changed slowly (but steadily) in the middle (ecocline), while both ends of the estuarine gradient appeared to be areas of rapid change (ecotones). In addition, some authors argue that the key factor to explain the community composition in high tidal estuaries is not the average salinity but the salinity variability (Attrill and Rundle, 2002; Attrill, 2002). Still, in spite of these possible criticisms, we found the simple stratification to be successful. In a preliminary investigation compiling all data to construct a global model, it was not possible to find a consistent pattern in the data (results not shown). With the introduction of salinity zones, contrasting least-impacted and most-impacted sites was more successful. This indicates that anthropogenic factors dominate once the major salinity gradient is removed.

4.2.2. The preclassification

The preclassification is the Achilles’ heel of index development. Although decisive, even in medicine the composition of a gold standard is the hardest part of the exercise (Zhou et al., 2002). In our case, as in many other instances, the preclassification is derived from a combination of scores expressing the effect of human activities and pressures (Van Stickle and Paulsen, 2008). This approach involves rather strong assumptions (Yuan and Norton, 2004) including knowledge of dose–response curves (how much is the ecosystem affected by the human pressures/activities) and the additivity of these pressures (ignoring differences in impact and interaction). By no means is this a perfect system. To put in perspective what is achievable, Falcone et al. (2010) tested the capacity of an a priori ranking of watersheds with an extensive set of GIS-variables. They found that the diagnostic accuracy of the classification in least- and most-degraded sites was about two-thirds. However, the preclassification is not a purpose in itself, but a device to rank the sites in a reasonable way with respect to anthropogenic pressure enabling biotic index construction. Also, for the optimisation, only the dichotomy between least- and most-impacted sites is used. Further standardisation is surely needed in this area. In this respect, we advocate the use of validated schemes (Borja et al., 2009b) compatible with empirical studies about the relation between human activities and changes in the community structure (Tong, 2001; Vasconcelos et al., 2007). Hence, we derived our preclassification (Breine et al., 2007) from a framework specifically designed for an estuary context, and resulting from an extensive collaboration of field experts (Aubry and Elliott, 2006).

4.2.3. Confounding salinity and anthropogenic stress

Anthropogenic stress was strongly associated with the salinity gradient. In the mesohaline zone, the sites range from low to moderate impact, while in the oligohaline zone, they range from moderate to high impact (Table 4). Because of this confounding feature (Hosmer and Lemeshow, 2000), it was impossible to disentangle the salinity gradient from anthropogenic stress factors. Therefore, we developed a separate binary biotic index for each salinity zone. Each index makes a contrast between the least-impacted (“reference”) and most-impacted (“degraded”) sites. In each zone, the binary contrast has a different meaning which limits the generality and interpretability of the index. A way to overcome confounding is to study similar estuaries on a larger (e.g. European) scale. Then, all combinations of anthropogenic stress and salinity are available allowing to model the impact of salinity as demonstrated in Delpech et al. (2010) for 13 estuaries in France. Pooling information and data over many (similar) estuaries would also considerably broaden the empirical basis.

4.3. The starting model and candidate metrics

Subject matter considerations should guide the choice of the candidate metrics. A crucial preliminary step in any statistical model building process is the composition of the candidate explanatory variables (Hosmer and Lemeshow, 2000). It is naïve to hope that statistical techniques can replace critical thinking and will be able to adjust ill-conceived starting models (Rothman, 1990).

4.3.1. Selection of the candidate metrics and the Estuarine Quality Paradox

Candidate metrics should reflect ecological hypotheses about the response of the biota to ecosystem impairment (Bailey et al., 2004). Fore et al. (1996), comparing different bioassessment techniques, concluded that biotic indices incorporating ecological information are more suitable for biomonitoring than methods such as PCA relying on statistical algorithms. The challenge is to find characteristics of the community that are sensitive to a broad spectrum of pressures (Noble et al., 2007). In this respect, as guilds are related to the ecosystem functioning (Wilson, 1999), the guild approach offers a consistent framework to construct metrics expressing functional and structural symptoms of aquatic ecosystems (Aarts and Nienhuis, 2003). For these reasons, we composed a set of candidate metrics from two broad categories (Table 2) each linked with a different ecological hypothesis (Bailey et al., 2004; Olden et al., 2006): the “classical” metrics testing the global functioning of an (any) aquatic ecosystem (Karr and Chu, 1999) and “estuarine” metrics expressing functions of the estuary (Elliott et al., 2007; Franco et al., 2008). The underlying hypothesis of the classical metrics is that, with increasing anthropogenic disturbance, species with narrow habitat or environmental requirements will become less abundant, and, conversely, generalists will become more dominant. However, an estuary is a naturally stressed ecosystem with characteristics very similar to ecosystems subjected to human disturbance (McLusky and Elliott, 2004; Martinho et al., 2008). Hence, these generic metrics risk being not very responsive. To cope with this difficulty, known as the Estuarine Quality Paradox (Dauvin and Ruellet, 2009), we complemented the classical metrics with estuarine metrics specifically evaluating the functioning of the estuary (Elliott and Quintino, 2007).

4.3.2. The selection of the scale of the metrics (proportions)

All our metrics are proportions assessing the fraction of individuals belonging to a certain ecological group or guild. Although a combination of different scales (abundance, number of species) can be successful, we preferred to work only with proportions. An IBI is a bioassessment at the community level (Attrill and Depledge, 1997). Alterations in the community structure are linked with the ecosystem health through changes in the food web, competition, predation, etc. If a function of the ecosystem is impaired, because of the causal chain, the disturbance will be reflected in the community composition and especially shifts in the proportion of specific species groups are informative (Poff, 1997). Proportions are also capable of detecting unexpected changes in species not included in the index. If for some reason the abundance of a species increases manifold, all proportions will change, changing the index value. For the same reason, metrics are messengers, not necessarily revealing the cause. As we also argue further on, for a causal analysis other complementary information is necessary.
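The sensitivity of proportion metrics to species outside the index can be shown with a toy calculation. The species names, counts and guild memberships below are purely hypothetical and serve only to illustrate the arithmetic.

```python
def guild_proportions(catch, guilds):
    """Fraction of individuals belonging to each guild (a species may be in several)."""
    total = sum(catch.values())
    return {g: sum(n for sp, n in catch.items() if sp in members) / total
            for g, members in guilds.items()}


# Hypothetical catch (individuals per species) and guild membership.
catch = {"herring": 40, "flounder": 30, "pikeperch": 10, "roach": 20}
guilds = {"Mjs": {"herring", "flounder"}, "Pis": {"pikeperch"}, "Ben": {"flounder"}}

before = guild_proportions(catch, guilds)
# Roach belongs to none of the guilds above, yet a tenfold increase in its
# abundance lowers every proportion.
after = guild_proportions({**catch, "roach": 200}, guilds)
print(before, after)
```

The shift signals that something changed in the community, but not what caused it, which is the sense in which the metrics act as messengers.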

4.4. How many metrics?

An important challenge in statistical model building is the selection of the appropriate combination of explanatory variables from a candidate set (Miller, 2002; Johnson and Omland, 2004). A specific problem is overfitting, i.e. selecting more explanatory variables than necessary, which can decrease the precision of the model (Zucchini, 2000).

4.4.1. Redundancy and robustness

Our results suggest that two to three metrics suffice to construct an IBI of optimal diagnostic accuracy. A similar result is reported by others. For instance, Paul et al. (2001) found three metrics to be sufficient to attain a specificity of 88% and a sensitivity of 81% with a new (!) dataset (external validation). This is in contrast with the advice to incorporate a broad spectrum of metrics in the biotic index to uncover a broad range of pressures (Karr et al., 1986). As an example, Alden et al. (2002) reported that single metrics performed as well as multi-metric indices. Yet they preferred redundancy because in their opinion it increased the confidence in the conclusions. Similarly, Roth et al. (1998) added metrics to a three metric index because not all metric groups were present in their index although the diagnostic accuracy decreased. Again their argument was a better interpretation and a higher robustness of the index.

The general policy is to drop some of the highly correlated metrics (Roset et al., 2007), but to allow for some redundancy to increase robustness of the index. To our knowledge, robustness is nowhere explicitly defined or measured, but the connotation is that a robust index is sensitive to pressures not expected in advance. By including many metrics, the hope is to increase robustness. Although this argument is sensible from an ecological perspective, the question is whether this is good scientific practice. Why should we believe a theoretical construct, if the empirical analysis suggests that the model does not improve or even gets worse? Yet, it is important to find sufficient metrics. To achieve this on empirical grounds, we suggest two alternatives.

4.4.2. Improving the design by probability-based sampling

An IBI has the potential and ambition to cover a broad range of impacts on the ecosystem. However, if the sample available for its construction is not representative for the impacts in the region and/or does not cover the full gradient of pressures, there is little hope that the relevant metrics sensitive to the pressures will be selected. In this respect, an ongoing discussion is about whether the sample should be probability-based. Fore (2003) showed that a probability sample (Overton and Stehman, 1995) was superior in covering a broad spectrum of pressures, against the intuition of many researchers involved in the project, who preferred a sample based on specific targeted criteria. However, a random sample assures an unbiased picture not only of the ecological condition, but also of relations between variables (Kish, 1987). In practice, a random sample is hard to achieve and can seem very expensive, but in the long term the investment is rewarding (Hughes et al., 2000; Dauer and Llansó, 2003; Llansó et al., 2003; Paul et al., 2008; Southerland et al., 2009). Evidently, all other aspects of the design are equally important including the preclassification, the typology of the strata (Kelly-Quin et al., 2009), the candidate metrics, and the sample size. Considering all these points, Southerland et al. (2007) were successful in substantially improving the Maryland fish and benthic macroinvertebrate index.

4.4.3. Better separation of diagnosis and causal analysis

The second proposal is to separate the assessment of human impact (diagnostic modelling) and the investigation of the causes of impairment (causal analysis). Many authors (implicitly) tend to give two functions to an IBI: signalling whether something is wrong, and judging the cause from the metrics. Our results, confirmed by other research (Roth et al., 1998; Paul et al., 2001), suggest that to detect impairment a limited set of metrics can be sufficient. In contrast, for a causal analysis additional information is necessary including extra metrics (not part of the index), species level information (e.g. which species are missing or present indicative for environmental pollution), concentration of toxic substances, physical and chemical characteristics together with their ecologically relevant threshold values, and habitat characteristics at local or landscape level. It can be a more cost-effective strategy to distinguish both functions and to start a more in-depth investigation once the IBI signals a problem.

4.5. The final models

Data seldom point unequivocally to one single model. Ignoring this ambiguity can result in a wrong interpretation or in missing important facts (Kutner et al., 2005). In their seminal book about generalized linear models, McCullagh and Nelder (1989) advocate “not to fall in love with one single model to the exclusion of alternatives”. Although this advice is at first sight not very helpful for the end users requiring an unequivocal index for decision making (Karr and Yoder, 2004), acknowledging the uncertainties helps to improve the empirical foundation of an index (Nõges et al., 2009; Hering et al., 2010). In the following we concisely discuss the lessons learnt from the alternative models of the best subset regression.

4.5.1. The mesohaline zone

Both PLM and AVG select marine juveniles (Mjs) as the first and most important metric. In PLM, the regression coefficient of Mjs is several times larger than the next metric in the model. Franco et al. (2008) speak about marine migrants: they spawn at sea but enter the estuary seeking refuge. Food and habitat diversity (Peterson et al., 2000) and water quality (Costa and Cabral, 1999) influence their presence. For the second metric in the model, we can choose between two generic metrics: Benthos (Ben) or Piscivores (Pis). Both metrics are present in the estuarine index of Coates et al. (2007). Piscivores are at the top of the food web and register changes on the food web (Harrison and Whitfield, 2004). Benthos (Ben) is indicative for the habitat quality for bottom dwelling species (Deegan et al., 1997; McCormick et al., 2001). Oberdorff and Hughes (1992) found benthic species to be mainly sensitive to benthic oxygen depletion.

4.5.2. The oligohaline zone

In the oligohaline zone, Piscivores (first) and Benthos were selected. Together these metrics compose one of the two optimal models for AVG and PLM. Somewhat unexpectedly, the combination of Piscivores with Resident freshwater species (Fws) also gave good results. It could be an artefact, but a possible interpretation is that freshwater fish species also penetrate in the oligohaline zone (Attrill and Rundle, 2002) and that these fish species can cope with the ecological conditions if the ecosystem is not impacted too much. On the other hand, the oligohaline zone in the Zeeschelde has a long history of pollution. In the 1990s, as the water quality improved, the first fish to come back were freshwater species (Maes et al., 2005). At present, the abundance of freshwater species is still more important than that of estuarine or marine species in this salinity zone.

4.5.3. The freshwater zone

Traditionally, estuarine indices do not consider the freshwater zone. Yet, for the Schelde River, the last part of the freshwater zone (seen from the origin) is tidal and the management of this freshwater zone is important (Greenwood, 2007). No clear model emerged from the index building as there are many close alternatives (see Table 5). Among the optimal models, over and above resident freshwater species (Fws), specialised spawners (Spa) and Omnivores (Omn) were often selected. Omnivores were recognized as an important metric from the very beginning of the IBI as a good indicator for disturbance (Karr, 1981). The metric Spa includes nest guarders and/or species having special demands for spawning substrate (Kestemont et al., 2000). This metric specifically assesses the degradation of the spawning habitat due to canalisation (reducing riparian vegetation) and dredging (change in bottom substrate texture). The metric also includes diadromous species, so it indirectly measures the connectivity of the estuary.

4.5.4. The final models

To make a definite choice between the different models, we take into account the lessons learnt from the subset investigation and merge the results of AVG (simplicity) and PLM (different weighting). In the mesohaline zone, the Marine juvenile species (Mjs) is a strong metric followed by Benthic species and Piscivores. Inspired by the third logistic model in Table 5, we can incorporate both additional metrics by the following weighting scheme: 4/6 Mjs + 1/6 Ben + 1/6 Pis. Similarly, in the oligohaline zone we obtain ½ Pis + ¼ Fws + ¼ Ben. For the freshwater zone no consistent model appeared. Apart from Freshwater species, Specialised spawners and Omnivores were often selected. As the regression coefficients of the logistic model with three variables are close, we propose equal weighting: (Fws + Spa + Omn)/3.
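As a worked illustration of the weighting schemes stated in this section, the sketch below computes an index value as a weighted average of scored metrics. The weights are those given in the text above; the metric scores in the example, the dictionary layout and the function name are hypothetical, and each scored metric is assumed to lie between 0 and 1.

```python
from fractions import Fraction as F

# Weights as stated in the text of this section for the three salinity zones.
FINAL_MODELS = {
    "mesohaline": {"Mjs": F(4, 6), "Ben": F(1, 6), "Pis": F(1, 6)},
    "oligohaline": {"Pis": F(1, 2), "Fws": F(1, 4), "Ben": F(1, 4)},
    "freshwater": {"Fws": F(1, 3), "Spa": F(1, 3), "Omn": F(1, 3)},
}


def index_score(zone, metric_scores):
    """Weighted average of the scored metrics of one site in the given zone."""
    weights = FINAL_MODELS[zone]
    return sum(float(w) * metric_scores[m] for m, w in weights.items())


# Hypothetical scored metrics for one mesohaline site.
site = {"Mjs": 0.6, "Ben": 0.3, "Pis": 0.9}
print(index_score("mesohaline", site))
```

Exact fractions make it easy to confirm that the weights of each zone sum to one, so the index stays on the same 0 to 1 scale as the individual scored metrics.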

5. Conclusions

The main purpose of this paper was to improve the construction of an IBI by adopting a statistical model building perspective. We start by observing that an IBI is analogous to a diagnostic model in medicine assessing the health of a patient based on a set of biomarkers. In contrast to these models in medicine, the development of an IBI has remained a rather isolated strategy with its own rules. A plausible reason is that an IBI does not “feel” like a statistical model as it is often defined as a rating system based on an average of scores. However, even this commonly used AVG model is a member of the family of regression models, but with fixed regression coefficients. Embedding index construction in the more global framework of statistical model building provides a deeper theoretical foundation and a wealth of practical techniques. In this paper, we demonstrated that this approach facilitates the search for an optimal combination of metrics by using best subset regression and cross-validation for a case study of the Zeeschelde estuary. This modelling approach also provides insight for a better design of index calibration studies. We emphasize the importance of probability-based sampling along with the use of a validated preclassification procedure to uncover the response metrics to the full spectrum of anthropogenic pressures in the region.

References

Aarts, B.G.W., Nienhuis, P.H., 2003. Fish zonations and guilds as the basis for assessment of ecological integrity of large rivers. Hydrobiologia 500, 157–178.

Alden, R.W., Dauer, D.M., Ranasinghe, J.A., Scott, J.M., Llansó, R.J., 2002. Statistical verification of the Chesapeake Bay benthic index of biotic integrity. Environmetrics 13, 473–498.

Araújo, F.G., Williams, W.P., Bailey, W.P., 2000. Fish assemblages as indicators of water quality in the Middle Thames Estuary, England (1980–1989). Estuaries 23, 305–317.

Attrill, M.J., 2002. A testable linear model for diversity trends in estuaries. Journal of Animal Ecology 71, 262–269.

Attrill, M.J., Depledge, M.H., 1997. Community and population indicators of ecosystem health: targeting links between levels of biological organisation. Aquatic Toxicology 38, 183–197.

Attrill, M.J., Rundle, S.D., 2002. Ecotone and ecocline: ecological boundaries in estuaries. Estuarine, Coastal and Shelf Science 55, 929–936.

Aubry, A., Elliott, M., 2006. The use of environmental integrative indicators to assess seabed disturbance in estuaries and coasts: application to the Humber estuary, UK. Marine Pollution Bulletin 53, 175–185.

Baeyens, W., van Eck, B., Lambert, C., Wollast, R., Goeyens, L., 1998. General description of the Scheldt estuary. Hydrobiologia 366, 1–14.

Bailey, R.C., Norris, R.H., Reynoldson, T.B., 2004. Bioassessment of Freshwater Ecosystems: Using the Reference Condition Approach. Springer, 170 pp.

Bedoya, D., Novotny, V., Manolakos, E.S., 2009. Instream and offstream environmental conditions and stream biotic integrity: importance of scale and site similarities for learning and prediction. Ecological Modelling 220, 2393–2406.

Borja, Á., Bald, J., Franco, J., Laretta, A., Muxika, I., Revilla, M., Rodríguez, J.G., Solaun, O., Uriarte, A., Valencia, V., 2009a. Using multiple ecosystem components in assessing ecological status in Spanish (Basque Country) Atlantic marine waters. Marine Pollution Bulletin 59, 65–71.

Borja, Á., Franco, J., Valencia, V., Bald, J., Muxika, I., Belzunce, M.J., Solaun, O., 2004. Implementation of the European water framework directive from the Basque country (northern Spain): a methodological approach. Marine Pollution Bulletin 48, 209–218.

Borja, Á., Ranasinghe, A., Weisberg, S.B., 2009b. Editorial. Assessing ecological integrity in marine waters, using multiple indices and ecosystem components: challenges for the future. Marine Pollution Bulletin 59, 1–4.

Breine, J., Maes, J., Quataert, P., Van den Bergh, E., Simoens, I., Van Thuyne, G., Belpaire, C., 2007. A fish-based assessment tool for the ecological quality of the brackish Schelde estuary in Flanders (Belgium). Hydrobiologia 575, 141–159.

Breine, J., Quataert, P., Stevens, M., Ollevier, F., Volckaert, F.A.M., Van den Bergh, E., Maes, J., 2010. A zone-specific fish-based biotic index as a management tool for the Zeeschelde estuary (Belgium). Marine Pollution Bulletin 60, 1099–1122.

Breine, J., Simoens, I., Goethals, P., Quataert, P., Ercken, D., Van Liefferinghe, C., Belpaire, C., 2004. A fish-based index of biotic integrity for upstream brooks in Flanders (Belgium). Hydrobiologia 522, 133–148.

Brys, R., Ysebaert, T., Escaravage, V., Van Damme, S., Van Braeckel, A., Vandevoorde, B., Van den Bergh, E., 2005. Afstemmen van referentiecondities en evaluatiesystemen in functie van de KRW: afleiden en beschrijven van systeemeigen referentieomstandigheden en/of maximaal ecologisch potentieel in elk Vlaams waterlichaamtype, vanuit de - overeenkomstig de Kaderrichtlijn Water - ontwikkelde beoordelingssystemen voor biologische kwaliteitselementen. Instituut voor Natuurbehoud, Brussel, Belgium.

Claeskens, G., Hjort, N.L., 2008. Model Selection and Model Averaging, Cambridge, 320 pp.

Coates, S., Waugh, A., Anwar, A., Robson, M., 2007. Efficacy of a multi-metric fish index as an analysis tool for the transitional fish component of the water framework directive. Marine Pollution Bulletin 55 (Spec. Issue 1–6), 225–240.

Costa, M.J., Cabral, H.N., 1999. Changes in the Tagus nursery function for commercial fish species: some perspectives for management. Aquatic Ecology 33, 287–292.

Dauer, D.M., Llansó, R.J., 2003. Spatial scales and probability based sampling in determining levels of benthic community degradation in the Chesapeake Bay. Environmental Monitoring and Assessment 81, 175–186.

Dauvin, J.-C., Ruellet, T., 2009. The estuarine quality paradox: is it possible to define an ecological quality status for specific modified and naturally stressed estuarine ecosystems? Marine Pollution Bulletin 59, 38–47.

Deegan, L.A., Finn, J.T., Ayvazian, S.G., Ryder-Kiefer, C.A., Buonaccorsi, J., 1997. Development and validation of an estuarine biotic integrity index. Estuaries 20, 601–617.

Delpech, C., Courrat, A., Pasquaud, S., Lobry, J., Le Pape, O., Nicolas, D., Boet, P., Girardin, M., Lepage, M., 2010. Development of a fish-based index to assess the ecological quality of transitional waters: the case of French estuaries. Marine Pollution Bulletin 60, 908–918.

Dodd, L.E., Pepe, M.S., 2003. Partial AUC estimation and regression. Biometrics 59, 614–623.

Elliott, M., Nedwell, S., Jones, N.V., Read, S.J., Cutts, N.D., Hemingway, K.L., 1998. Intertidal Sand and Mudflats & Subtidal Mobile Sandbanks (Volume II). An Overview of Dynamic and Sensitivity Characteristics for Conservation Management of Marine SACs. Scottish Association for Marine Science (UK Marine SACs project), 151 pp.

Elliott, M., Quintino, V., 2007. Viewpoint. The estuarine quality paradox, environmental homeostasis and the difficulty of detecting anthropogenic stress in natural stressed areas. Marine Pollution Bulletin 54, 640–645.

Elliott, M., Whitfield, A.K., Potter, I.C., Blaber, S.J.M., Cyrus, D.P., Nordlie, F.G., Harrison, T.D., 2007. The guild approach to categorizing estuarine fish assemblages: a global review. Fish and Fisheries 8, 241–268.

Falcone, J.A., Carlisle, D.M., Weber, L.C., 2010. Quantifying human disturbance in watersheds: variable selection and performance of a GIS-based disturbance index for predicting the biological condition of perennial streams. Ecological Indicators 10, 264–273.

Fore, L.S., 2003. Developing Biological Indicators: Lessons Learned from Mid-Atlantic Streams, 42 pp.

Fore, L.S., Karr, J.R., Wisseman, R.W., 1996. Assessing invertebrate responses to human activities: evaluating alternative approaches. Journal of the North American Benthological Society 15, 212–231.

Franco, A., Elliott, M., Franzoi, P., Torricelli, P., 2008. Life strategies of fishes in European estuaries: the functional guild approach. Marine Ecology Progress Series 354, 219–228.

Gard, M.F., 2002. Effects of sediment loads on the fish and invertebrates of a Sierra Nevada river, California. Journal of Aquatic Ecosystems Stress and Recovery 9, 227–238.

Greenwood, M.F.D., 2007. Nekton community change along estuarine salinity gradients. Can salinity zones be defined? Estuaries and Coasts 30, 537–542.

Harrell, F.E., Lee, K.L., Mark, D.B., 1996. Tutorial in biostatistics: multivariable prognostic models. Issues in developing models, evaluating assumptions and adequacy, and measuring and reducing errors. Statistics in Medicine 15, 361–387.

Harrison, T.D., Whitfield, A.K., 2004. A multi-metric fish index to assess the environmental condition of estuaries. Journal of Fish Biology 65, 683–710.

Harrison, T.D., Whitfield, A.K., 2006. Applications of a multi-metric fish index to assess the environmental condition of South African estuaries. Estuaries and Coasts 29, 1108–1120.

Hering, D., Borja, Á., Carstensen, J., Carvalho, L., Elliott, M., Feld, C.K., Heiskanen, A.-S., Johnson, R.K., Moe, J., Pont, D., Solheim, A.L., Van de Bund, W., 2010. The European water framework directive at the age of 10: a critical review of the achievements with recommendations for the future. Science of the Total Environment 408, 4007–4019.

Hering, D., Feld, C., Moog, O., Ofenböck, T., 2006. Cookbook for the development of a multimetric index for biological condition of aquatic ecosystems: experiences from the European AQEM and STAR projects and related initiatives. Hydrobiologia 566, 311–324.

Hosmer, D.W., Lemeshow, S., 2000. Applied Logistic Regression. John Wiley & Sons, Inc., New York, 375 pp.

Hughes, R.M., Kaufmann, P.R., Herlihy, A.T., Kincaid, T.M., Reynolds, L., Larsen, D.P., 1998. A process for developing and evaluating indices of fish assemblage integrity. Canadian Journal of Fisheries and Aquatic Sciences 55, 1618–1631.

Hughes, R.M., Oberdorff, T., 1999. Applications of IBI concepts and metrics to water outside the United States and Canada. In: Simon, T.P. (Ed.), Assessing the Sustainability and Biological Integrity of Water Resources Using Fish Communities. CRC Press, Washington, DC, pp. 62–74.

Hughes, R.M., Paulsen, S.G., Stoddard, J.L., 2000. EMAP-surface waters: a multi-assemblage, probability survey of ecological integrity in the U.S.A. Hydrobiologia 422/423, 429–443.

Huitema, D., Turnhout, E., 2009. Working at the science-policy interface: a discursive analysis of boundary work at the Netherlands environmental assessment agency. Environmental Politics 18, 576–594.

Johnson, D.H., Omland, K.S., 2004. Model selection in ecology and evolution. Trends in Ecology and Evolution 19, 101–108.

Karr, J.R., 1981. Assessment of biotic integrity using fish communities. Fisheries 6, 21–27.

Karr, J.R., Chu, E.W., 1999. Restoring Life in Running Waters: Better Biological Monitoring. Island Press, Washington; California, 206 pp.

Karr, J.R., Chu, E.W., 1997. Biological monitoring: essential foundation for ecological risk assessment. Human and Ecological Risk Assessment 3, 993–1004.

Karr, J.R., Fausch, K.D., Angermeier, P.L., Yant, P.R., Schlosser, I.J., 1986. Assessing biological integrity in running waters. A Method and Its Rationale, 25.

Karr, J.R., Yoder, C.O., 2004. Biological assessment and criteria improve total maximum daily load decision making. Journal of Environmental Engineering 130, 594–604.

Kelly-Quinn, M., Tierney, D., McGarrigle, M., 2009. Progress and challenges in the selection of type-specific reference conditions for Irish rivers and lakes. Biology and Environment: Proceedings of the Royal Irish Academy 109B, 151–160.

Kennish, M.J., 2002. Environmental threats and environmental future of estuaries. Environmental Conservation 29, 78–107.

Kestemont, P., Didier, J., Depiereux, E., Micha, J.C., 2000. Selecting ichthyological metrics to assess river basin ecological quality. Archiv für Hydrobiologie Supplementband Monographic Studies 121, 321–348.

Kish, L., 1987. Statistical Design for Research. John Wiley, New York.

Kutner, M.H., Nachtsheim, C.J., Neter, J., Li, W., 2005. Applied Linear Statistical Models. McGraw-Hill/Irwin, New York, 1396 pp.

Llansó, R.J., Dauer, D.M., Vølstad, J.H., Scott, L.C., 2003. Application of the benthic index of biotic integrity to environmental monitoring in Chesapeake Bay. Environmental Monitoring and Assessment 81, 163–174.

Madon, S.P., 2008. Fish community responses to ecosystem stressors in coastal estuarine wetlands: a functional basis for wetlands management and restoration. Wetlands Ecology and Management 16, 219–236.

Maes, J., Stevens, M., Ollevier, F., 2005. The composition and community structure of the ichthyofauna of the upper Scheldt estuary: a synthesis of a 10-year collection (1991–2001). Journal of Applied Ichthyology 20, 1–8.

Martinho, F., Viegas, I., Dolbeth, M., Leitão, R., Cabral, H.N., Pardal, M.A., 2008. Assessing estuarine environmental quality using fish-based indices: performance evaluation under climatic instability. Marine Pollution Bulletin 56, 1834–1843.

McCormick, F.H., Hughes, R.M., Kaufmann, P.R., Peck, D.V., Stoddard, J.L., Herlihy, A.T., 2001. Development of an index of biotic integrity for the Mid-Atlantic Highlands region. Transactions of the American Fisheries Society 130, 857–877.

McCullagh, P., Nelder, J.A., 1989. Generalized Linear Models. Chapman and Hall, Boca Raton, 511 pp.

McLusky, D.S., Elliott, M., 2004. The Estuarine Ecosystem: Ecology, Threats and Management. Oxford University Press, Oxford, 216 pp.

Miller, A.J., 2002. Subset Selection in Regression. Chapman and Hall, Boca Raton, 238 pp.

Murtaugh, P.A., 1996. The statistical evaluation of ecological indicators. Ecological Applications 6, 132–139.

Noble, R.A.A., Cowx, I.G., Goffaux, D., Kestemont, P., 2007. Assessing the health of European rivers using functional ecological guilds of fish communities: standardising species classification and approaches to metric selection. Fisheries Management and Ecology 14, 381–392.

Nõges, P., Van de Bund, W., Cardoso, A.C., Solimini, A.G., Heiskanen, A.-S., 2009. Assessment of the ecological status of European surface waters: a work in progress (review paper). Hydrobiologia 633, 197–211.

Oberdorff, T., Hughes, R.M., 1992. Modification of an index of biotic integrity based on fish assemblages to characterise rivers of the Seine Basin, France. Hydrobiologia 228, 117–130.

Olden, J.D., Poff, N.L., Bledsoe, B.P., 2006. Incorporating ecological knowledge into ecoinformatics: an example of modeling hierarchically structured aquatic communities with neural networks. Ecological Informatics 1, 33–42.

Overton, W.S., Stehman, S.V., 1995. The Horvitz-Thompson theorem as a unifying perspective for probability sampling with examples from natural resource sampling. American Statistician 49, 261–268.

Paul, J.F., Scott, K.J., Campbell, D.E., Gentile, J.H., Strobel, C.S., Valente, R.M., Weisberg, S.B., Holland, A.F., Ranasinghe, J.A., 2001. Developing and applying a benthic index of estuarine condition for the Virginian Biogeographic Province. Ecological Indicators 1, 83–99.

Paul, J.F., Walker, H.A., Galloway, W., Pesch, G., Cobb, D., Strobel, C.J., Summers, J.K., Charpentier, M., Heltshe, J.F., 2008. Combining existing monitoring sites with a probability survey design - examples from U.S. EPA’s national coastal assessment. The Open Environmental & Biological Monitoring Journal 1, 25.

Pepe, M.S., 2003. The Statistical Evaluation of Medical Tests for Classification and Prediction. Oxford University Press, Oxford, 302 pp.

Peterson, M.S., Comyns, B.H., Hendon, J.R., Bond, P.J., Duff, G.A., 2000. Habitat use by early life-history stages of fishes and crustaceans along a changing estuarine landscape: differences between natural and altered shoreline sites. Wetlands Ecology and Management 8, 209–219.

Poff, N.L., 1997. Landscape filters and species traits. Towards mechanistic understanding and prediction in stream ecology. Journal of the North American Benthological Society 16, 391–409.

Quataert, P., Breine, J., Simoens, I., 2007. Evaluation of the European fish index: false-positive and false-negative error rate to detect disturbance and consistency with alternative fish indices. Fisheries Management and Ecology 14, 465–472.

Roset, N., Grenouillet, G., Goffaux, D., Pont, D., Kestemont, P., 2007. A review of existing fish assemblage indicators and methodologies. Fisheries Management and Ecology 14, 393–405.

Roth, N.E., Southerland, M.T., Chaillou, J., Klauda, R., Kazyak, P., Stranko, S., Weisberg, S., Hall, L., Morgan, R., 1998. Maryland biological stream survey: development of a fish index of biotic integrity. Environmental Monitoring and Assessment 51, 89–106.

Rothman, K.J., 1990. A sobering start for the cluster busters’ conference. American Journal of Epidemiology 132 (1), S6–13.

Schmutz, S., Pont, D., Haidvogl, G., Cowx, I.G., 2007. Preface to the special issue - fish-based methods for assessing European running waters (FAME). Fisheries Management and Ecology 14, 367.

Sindilariu, P., Freyhof, J., Wolter, C., 2006. Habitat use of juvenile fish in the lower Danube and the Danube delta: implications for ecotone connectivity. Hydrobiologia 571, 51–61.

Southerland, M.T., Rogers, G.M., Kline, M.J., Morgan, R.P., Boward, D.M., Kazyak, P.F., Klauda, R.J., Stranko, S.A., 2007. Improving biological indicators to better assess the condition of streams. Ecological Indicators 7, 751–767.

Southerland, M.T., Vølstad, J.H., Weber, E.D., Klauda, R.J., Poukish, C.A., Rowe, M.C., 2009. Application of the probability-based Maryland biological stream survey to the state’s assessment of water quality standards. Environmental Monitoring and Assessment 150, 65–73.

Speybroeck, J., Breine, J., Vandevoorde, B., Van Wichelen, J., Van Braeckel, A., Van Burm, E., Van den Bergh, E., Van Thuyne, G., Vyverman, W., 2008. KRW doelstellingen in Vlaamse getijrivieren: afleiden en beschrijven van typespecifiek maximaal ecologisch potentieel en goed ecologisch potentieel in een aantal Vlaamse getijrivier-waterlichamen vanuit de - overeenkomstig de Kaderrichtlijn Water - ontwikkelde relevante beoordelingssystemen voor een aantal biologische kwaliteitselementen. Rapporten van het Instituut voor Natuur- en Bosonderzoek, 56. Instituut voor Natuur- en Bosonderzoek, Brussel, Belgium (in Dutch), 152.

Steyerberg, E.W., Harrell, F.E., Borsboom, G.J.J.M., Eijkemans, M.J.C., Vergouwe, Y., Habbema, J.D.F., 2001. Internal validation of predictive models: efficiency of some procedures for logistic regression analysis. Journal of Clinical Epidemiology 54, 774–781.

Stoddard, J.L., Herlihy, A.T., Peck, D.V., Hughes, R.M., Whittier, T.R., Tarquinio, E., 2008. A process for creating multimetric indices for large-scale aquatic surveys. Journal of the North American Benthological Society 27, 878–891.

Swets, J.A., 1988. Measuring the accuracy of diagnostic systems. Science 240, 1285–1293.

Thompson, B.A., Fitzhugh, G.R., 1986. A Use Attainability Study: An Evaluation of Fish and Macroinvertebrate Assemblages of the Lower Calcasieu River, Louisiana. Center for Wetland Resources, Coastal Fisheries Institute, Louisiana State University, Baton Rouge, Louisiana, 143 pp.

Tong, S., 2001. An integrated exploratory approach to examining the relationships of environmental stressors and fish responses. Journal of Aquatic Ecosystem Stress and Recovery 9, 1–19.

Tull, M., 2006. The Environmental Impact of Ports: an Australian Case Study. Paper presented at the XIV International Economic History Congress (session No. 58), Helsinki, Finland, 21–25 August 2006, 24 pp.

Turnpenny, A.W.H., Coughlan, J., Liney, K.E., 2006. Review of temperature and dissolved oxygen effects on fish in transitional waters. Jacobs Babtie, 81 pp.

Underwood, A.J., 1995. Ecological research and (and research into) environmental management. Ecological Applications 5, 232–247.

Van den Bergh, E., Van Damme, S., Graveland, J., de Jong, D., Baten, I., Meire, P.M., 2005. Ecological rehabilitation of the Schelde estuary (the Netherlands - Belgium; northwest Europe): linking ecology, safety against floods, and accessibility for port development. Restoration Ecology 13, 104–214.

Van Sickle, J., Paulsen, S.G., 2008. Assessing the attributable risks, relative risks, and regional extents of aquatic stressors. Journal of the North American Benthological Society 27, 920–931.

Vasconcelos, R.P., Reis-Santos, P., Fonseca, V., Maia, A., Ruano, M., França, S., Vinagre, C., Costa, M.J., Cabral, H., 2007. Assessing anthropogenic pressures on estuarine fish nurseries along the Portuguese coast: a multi-metric index and conceptual approach. Science of the Total Environment 374, 199–215.

Venice system, 1959. The final resolution of the symposium on the classification of brackish waters. Archives of Oceanographic Limnology 11 (suppl.), 243–248.

Wheeler, A.C., 1969. Fish-life and pollution in the lower Thames: a review and preliminary report. Biological Conservation 2, 25–30.

Wilson, J.B., 1999. Guilds, functional types and ecological groups. Oikos 86, 507–532.

Yuan, L.L., Norton, S.B., 2004. Assessing the relative severity of stressors at a watershed scale. Environmental Monitoring and Assessment 98, 323–349.

Zhou, X.H., Obuchowski, N.A., McClish, D.K., 2002. Statistical Methods in Diagnostic Medicine. Wiley Interscience, New York, 437 pp.

Zucchini, W., 2000. An introduction to model selection. Journal of Mathematical Psychology 44, 41–61.