Designing observational biologging studies to assess
the causal effect of instrumentation
Matthieu Authier1*, Clara Péron1, Alain Mante2, Patrick Vidal3 and David Grémillet1,4
1Centre d'Écologie Fonctionnelle et Évolutive, CEFE-CNRS UMR 5175, 1919 Route de Mende, Montpellier Cedex 5, 34 293, France; 2Conservatoire d'Espaces Naturels de Provence-Alpes-Côte d'Azur, Réserve Naturelle Nationale de l'Archipel de Riou, 166 avenue de Hambourg – Immeuble le Sud, Marseille, 13008, France; 3Sémaphore de Pomègues – Le Frioul, Parc Maritime des Iles du Frioul, Marseille, 13001, France; and 4DST-NRF Centre of Excellence, FitzPatrick Institute, University of Cape Town, Rondebosch, 7701, South Africa
Summary
1. Biologging has improved ecological knowledge on an increasing number of species for more than 2 decades.
Most studies looking at the incidence of tags on behavioural, physiological or demographic parameters rely on
‘control’ individuals chosen randomly within the population, assuming that they will be comparable with
equipped individuals. This assumption is usually untestable and untenable since biologging studies are more
observational than experimental, and often involve small sample sizes. Notably, background characteristics of
wild animals are, most of the time, unknown. Consequently, investigating any causal effect of instrumentation is
a difficult task, subject to hidden biases.
2. We describe the counterfactual model to causal inference which was implicit in early biologging studies. We
adopted methods developed in social and political sciences to construct a posteriori an appropriate control group.
Using biologging data collected on Scopoli’s shearwaters (Calonectris diomedea) from a small Mediterranean
island, we used this method to achieve objective causal inference on the effect of instrumentation on breeding per-
formance and divorce.
3. Our method revealed that the sample of instrumented birds was nonrandom. After identification of a relevant
control group, we found no carry-over effects of instrumentation on breeding performance (taking into account
imperfect detection probability) or divorce rate in Scopoli’s shearwaters.
4. Randomly chosen control groups can be both counterproductive and ethically dubious via unnecessary addi-
tional disturbance of populations. The counterfactual approach, which can correct for selection bias, has wide
applicability to biologging within long-term studies.
Introduction
There is no controversy over the beneficial impact of the bio-
logging revolution for wildlife ecology (Ropert-Coudert et al.
2012, but see Hebblewhite & Haydon 2010). Biologging is the
‘use of miniaturized animal-attached tags for logging and/or
relaying data about an animal’s movements, behaviour, physiol-
ogy and/or environment’ (Rutz & Hays 2009). Knowledge on
the ecology of elusive animals, in particular marine species,
greatly increased over the last two decades, as epitomized by
seabird research (Wilson et al. 2002). Seabirds have been the
topic of a sustained wealth of biologging studies (Vandenabe-
ele, Wilson & Grogan 2011). The percentage of publications
addressing the potential for detrimental effects of tags on
seabirds, however, did not increase over the same period
(Vandenabeele, Wilson & Grogan 2011).
Although guidelines and suggestions on instrumentation
and animal welfare have been issued over the years (Wilson,
Grant & Duffy 1986; Phillips, Xavier & Croxall 2003; Hawkins
2004; Wilson & McMahon 2006; Casper 2009), a shortcoming
of impact studies is often the control group. Biologging is pla-
gued by a catch-22 effect (Barron, Brawn & Weatherhead
2010): behaviours upon which we expect an adverse effect from
tags may not be observable without the latter. Wilson, Grant
& Duffy (1986) identified this issue and proposed to use linear
regression to infer the behaviour of untagged animals. If the
underlying statistical model is correct, this approach may pre-
dict values for the same animal, as if it had not been equipped.
The conditional tense betrays a counterfactual in the terminol-
ogy of causal inference (Rubin 2006). Indeed, causal inference
concerns what would happen following an intervention,
hypothetical or real (Gelman & Hill 2007).
Sample size is another issue. Random sampling guarantees
that background characteristics of animals will be balanced on
average between a control group and an instrumented group.
Yet, any specific study, especially small sample-sized ones, will
have some bias due to imbalance (Gelman & Hill 2007, p. 172).
A small sample cannot reproduce all the essential features of
the target population, although belief in the contrary is
widespread (Tversky & Kahneman 1971). Because only a small
number of expensive tags can typically be deployed
*Correspondence author. E-mail: authierm@gmail.com
© 2013 The Authors. Methods in Ecology and Evolution © 2013 British Ecological Society
Methods in Ecology and Evolution 2013, 4, 802–810 doi: 10.1111/2041-210X.12075
(Hebblewhite & Haydon 2010), the assumption of no selection
bias is strong. The healthy-looking bird with shiny feathers is
more likely to be instrumented with an expensive tag than an
emaciated, shabby-plumaged one. Ethics and animal welfare
considerations actually forbid the second bird to be instrumen-
ted. Assessing the impact of instrumentation demands a mean-
ingful control sample, which is a group of birds that could
have been equipped but were not. Causal questions on instru-
mentation can only be unambiguously addressed if such a con-
trol group exists. In general, a potent threat to causal inference
is selection bias, that is, bias due to inadequate choice of a con-
trol sample.
Some studies of the impact of instrumentation reported no
short or long-term effects, for example on large animals
(McMahon et al. 2008). A recent meta-analysis on birds
reported an overall negative impact but also that breeding
success and survival were larger for birds equipped with
larger tags (Barron, Brawn & Weatherhead 2010). Similarly,
Grémillet et al. (2005) found that the resighting rate of Arctic
Great Cormorants (Phalacrocorax carbo) 2 years after instru-
mentation was higher for birds which had been equipped with
internal heart-rate data loggers. It is difficult to believe the
causal interpretation of these results, which may rather be
statistical artefacts, such as Simpson’s or Lord’s paradox (see
Appendix S1).
Simpson’s paradox occurs whenever the relationship
between two categorical variables differs depending upon
whether subgroups are accounted for in an analysis or not. Its
resolution often lies in causal reasoning: only variables that are
unaffected by the treatment should be accounted for (see
Appendix S1). This truism stems from the definition of a cause
for an effect: (i) the putative cause happened before the effect
(spatio-temporal contiguity); (ii) putative cause and effect co-
vary; and (iii) other potential putative causes that may affect
the phenomenon are ruled implausible (confounders are neu-
tralized). Lord’s paradox is more subtle (see Appendix S1,
Lord 1967; Holland & Rubin 1983), but illustrates the impor-
tance of defining relevant comparisons and clearly stating all
assumptions underlying estimates implied to be causal (Rubin,
Stuart & Zanutto 2004). Lord’s paradox occurs whenever a
control group is missing: conclusions on the cause of effects
may result more from seemingly innocuous statistical assump-
tions rather than data (King & Zeng 2007; Arah 2008). Causal
inference aims at predicting what would have happened, had a
treated unit been left as a control. That is, inference proceeds
by estimating some unobserved outcomes, either implicitly
(classical approach) or explicitly (Rubin, Stuart & Zanutto
2004; Rubin 2006). In both cases, assumptions and definitions
are required to avoid paradoxical results.
Methods
Our focus is two-fold: (i) how to find a control group in an observa-
tional study; and (ii) how to ensure that the control group is adequate
for causal inference. These steps will help assess whether unambigu-
ous causal inference is possible. Observational studies and randomized
experiments have often been opposed in ecology (Sagarin & Pauchard
2010). Rather than a dichotomy, they represent two ends on a contin-
uum of suitability to infer the cause of effects (Rubin 2007). In a ran-
domized control trial (RCT), treatment is randomly allocated to units
such that both known and unknown confounders are evenly distrib-
uted between control and treated units: each unit has a nonzero proba-
bility of receiving each treatment, independently of other units. This is
usually not the case in an observational study, but the latter may be
conceptualized as a ‘broken RCT’ (Rubin 2006). Below, we detail how to mend a posteriori an observational study as if it were an RCT, drawing from methods developed in the political and social sciences (Rubin 2006, 2007, 2008; Sekhon 2009; Austin 2011; Sekhon 2011). Methods are summarized in Fig. 1.
TREATMENT
The first step in any causal analysis is to define the treatment of interest.
The aim of causal inference is to investigate what would happen to an
outcome variable following a potential intervention or manipulation.
A clearly defined treatment enables the identification of appropriate
comparisons, irrespective of the technical methods used to estimate the
effects of a cause on an outcome (Rubin, Stuart &Zanutto 2004).
POTENTIAL OUTCOMES
Causal inference has a fundamental problem (Rubin 1978): a unit (individual) is either treated (T_i = 1) or not (T_i = 0) but cannot be both. Its observed response is as follows:

y_i,obs = T_i × y_i(1) + (1 − T_i) × y_i(0)   eqn 1

y_i(1) and y_i(0) are potential outcomes, of which only one will effectively materialize. Either y_i(1) is observed and y_i(0) becomes the counterfactual, or y_i(0) is observed and y_i(1) becomes the counterfactual. The counterfactual model (Eqn 1, Fig. 2a) has two core characteristics (Rubin 1978): first, it defines a causal effect as a comparison of potential outcomes on a common set of units. The causal effect for a unit can be the difference y_i(1) − y_i(0), and the average causal effect is E[y(1) − y(0)]. Second, the counterfactual model stresses the importance of study design by insisting on the assignment mechanism. The assignment mechanism is the hypothetical or real rule that guided the decision whether to treat a unit (Fig. 2a). It describes which potential outcome is observed: y_i(1) or y_i(0). Causal inference in an observational study is a doubly missing data problem, with both the assignment mechanism and one of the potential outcomes missing (Fig. 2b).
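The bookkeeping behind Eqn 1 can be sketched in a few lines of Python. The data are simulated and purely illustrative (the latent ‘quality’ covariate, the −0·05 tag effect and the quality-driven assignment rule are all assumptions), but they show how a non-random assignment mechanism biases the naive treated-minus-control difference away from the true average causal effect:

```python
import math
import random

random.seed(42)

# Purely illustrative: each unit carries BOTH potential outcomes y(0), y(1),
# but only one is observed, selected by the (non-random) assignment T.
n = 10000
units = []
for _ in range(n):
    quality = random.gauss(0, 1)            # latent body condition (invented)
    y0 = 0.6 + 0.1 * quality                # outcome if left untagged
    y1 = y0 - 0.05                          # assumed true tag effect: -0.05
    # healthier-looking birds are more likely to be tagged (selection bias)
    t = 1 if random.random() < 1 / (1 + math.exp(-quality)) else 0
    y_obs = t * y1 + (1 - t) * y0           # Eqn 1
    units.append((t, y0, y1, y_obs))

n_treated = sum(t for t, _, _, _ in units)
true_att = sum(y1 - y0 for t, y0, y1, _ in units if t == 1) / n_treated
naive = (sum(y for t, _, _, y in units if t == 1) / n_treated
         - sum(y for t, _, _, y in units if t == 0) / (n - n_treated))

print(round(true_att, 3))   # -0.05 by construction
print(round(naive, 3))      # pushed upwards by selection on quality
```

The naive comparison of observed group means does not recover −0·05 because the tagged group was healthier to begin with.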
STABLE UNIT TREATMENT-VALUE ASSUMPTION
Data y_obs are assumed fixed and randomness stems from the assign-
ment mechanism: this is the Stable Unit Treatment-Value Assumption
(SUTVA, Gelman et al. 2003, p. 201). SUTVA entails (i) only one ver-
sion of the treatment and (ii) no interference between units: the poten-
tial outcome observed for a given unit is independent of the treatment
assignment for other units (Sekhon 2009). If SUTVA is violated, there
are more than two potential outcomes, which complicates the identifi-
cation of causes (Fig. S1). Instances where SUTVA does not hold are
beyond the scope of this study (see Chapter 6 of Gelman et al. 2003).
Potential outcomes stress the importance of time for causal infer-
ence: before exposure to a treatment, two outcomes are possible. The
familiar notation y_obs eclipses the assignment mechanism. Familiar
regression modelling implicitly relies on counterfactuals, but does not
necessarily correct for selection bias (Gelman & Hill 2007; King & Zeng 2007). Counterfactual outcomes may be predicted with this approach, but may also be very sensitive to modelling assumptions (see Appendix S1, Gelman & Hill 2007; King & Zeng 2007).
PROPENSITY SCORE
Designing an observational studymeans reconstructing the assignment
mechanism with a probability model for treatment Ti given covariates
(Xi). Covariates are any variables unaffected by instrumentation and
include pretreatment variables such as age or sex. If intermediate out-
comes are included, Simpson’s paradox may arise. Assuming that no
confounder is omitted:
e_i = Pr(T_i | X_i)   eqn 2
where ei is the propensity score or the probability of a unit receiving
treatment as a function of observed covariates (Rubin 2008).
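As a sketch of Eqn 2, propensity scores can be estimated with any binary regression of treatment on covariates. The paper itself uses a robit link with shrinkage priors; the plain logistic fit below, by batch gradient ascent on invented covariates (standardized mass and sex), only conveys the idea:

```python
import math
import random

random.seed(1)

# Invented data: tagging probability depends on standardized mass and sex.
def simulate(n=400):
    data = []
    for _ in range(n):
        mass, sex = random.gauss(0, 1), random.randint(0, 1)
        p = 1 / (1 + math.exp(-(-1.5 + 1.0 * mass + 0.5 * sex)))
        data.append(([1.0, mass, sex], 1 if random.random() < p else 0))
    return data

def fit_logistic(data, lr=0.5, epochs=1500):
    """Maximum likelihood by batch gradient ascent (no shrinkage prior)."""
    beta = [0.0, 0.0, 0.0]
    n = len(data)
    for _ in range(epochs):
        grad = [0.0, 0.0, 0.0]
        for x, t in data:
            p = 1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, x))))
            for j in range(3):
                grad[j] += (t - p) * x[j]
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta

data = simulate()
beta = fit_logistic(data)
# estimated propensity scores e_i = Pr(T_i | X_i)
scores = [1 / (1 + math.exp(-sum(b * v for b, v in zip(beta, x))))
          for x, _ in data]
print(0.0 < min(scores) and max(scores) < 1.0)   # True: scores lie in (0, 1)
```

Units with estimated scores pushed towards 0 or 1 would be the ones for which no realistic counterfactual exists.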
MATCHING ON PROPENSITY SCORE
The propensity score is a balancing score, such that the (statistical) distribution of covariates for a given value of e_i is the same whether a unit received treatment or not (Rubin 2006). The propensity score is the coarsest many-to-one balancing score, meaning that covariate balance between control and treatment can be achieved by matching solely on e_i (Rubin 2007). Given a single value of e_i, a suitable control for a treated individual is simply an untreated one with a similar value of e_i.
While e_i is known in an RCT, e_i is missing in an observational study and must be estimated (ê_i). In biologging, animals that had no chance to be instrumented (e_i = 0) or were bound to be equipped (e_i = 1) cannot be used for causal inference. Realistic counterfactuals entail 0 < e_i < 1. For example, in burrow-nesting seabirds, nest accessibility (burrow depth) affects trap-ability. An additional complication stems
[Figure 1: flowchart — 1) Define Treatment; 2) Define Outcome(s); 3) SUTVA: Is the causal effect identifiable? What are the required assumptions?; 4) Estimate Propensity Scores; 5) Matching; 6) Placebo Tests; 7) Unveil Outcomes; 8) Estimate ATT. Untenable assumptions, no relevant background variable, no suitable match or always-positive placebo tests all lead to ‘data uninformative for causal inference’.]
Fig. 1. Flowchart for designing an observational study to mimic a randomized control trial. Dotted arrows symbolize feedback loops. ATT stands for ‘Average Treatment effect on the Treated’ and is the causal effect of interest.
from imperfect detection: some animals may be more trappable than
others, and such unobserved heterogeneity may give rise to Simpson’s
paradox.
PLACEBO TESTS
Propensity score matching mimics randomization after data collection,
but still assumes no hidden bias. Matching methods have the potential
to correct for selection bias, but may nevertheless fail. A painful, but
potentially valid, conclusionmay be that the data at hand are not infor-
mative on some relevant causal effects. In order to check that matching
has indeed corrected bias, one may test that no important variable is
omitted in Eqn 2 by comparing control and treated samples for a dif-
ference that should be null by design. Thus, the strong ignorability
assumption behind propensity score estimation may be checked with
placebo tests (Sekhon 2009). Positive placebos may suggest that further
assumptions are required, to which results may be sensitive (see Appen-
dix S1).
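A placebo test can be as simple as a likelihood-ratio (G) test that a pre-treatment proportion, such as detection the year before deployment, is equal between the matched groups. A minimal sketch, with invented detection counts (the paper's own counts are not reported here):

```python
import math

# Hypothetical placebo test: compare a PRE-treatment proportion between
# matched equipped and control samples. Counts below are invented.
def lrt_two_proportions(k1, n1, k2, n2):
    """Likelihood ratio (G) test for equality of two binomial proportions."""
    def loglik(k, n, p):
        if p <= 0 or p >= 1:
            # boundary estimate (k = 0 or k = n): likelihood is 1, loglik 0
            return 0.0 if k in (0, n) else float("-inf")
        return k * math.log(p) + (n - k) * math.log(1 - p)
    p_pool = (k1 + k2) / (n1 + n2)
    g = 2 * (loglik(k1, n1, k1 / n1) + loglik(k2, n2, k2 / n2)
             - loglik(k1, n1, p_pool) - loglik(k2, n2, p_pool))
    # chi-square survival function with 1 df: Pr(X > g) = erfc(sqrt(g / 2))
    p_value = math.erfc(math.sqrt(g / 2))
    return g, p_value

g, p = lrt_two_proportions(24, 29, 18, 29)   # invented detection counts
print(round(g, 2), round(p, 2))
```

A clearly non-null difference on such a by-design-null quantity would signal that the propensity score model omitted a relevant variable.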
OBSERVED OUTCOMES
One crucial aspect of an RCT is that outcomes are unavailable
when the study is implemented. To mimic this important feature,
outcomes of interest should be withheld until a suitable control
group is identified to avoid data snooping (Rubin 2008). Matching
does not require knowledge of the observed outcomes (see exam-
ples in Sekhon 2011).
CAUSAL EFFECT
Causal effects are average treatment effects on the treated (ATT):

ATT = E[y(1) − y(0) | T = 1] = E[y(1) | T = 1] − E[y(0) | T = 1]   eqn 3

With a suitable control group, a consistent estimate of the counterfactual E[y(0) | T = 1] is E[y(0) | T = 0]. Alternatively, a model can be used to predict counterfactuals ŷ_i,pred = y_i(0), which are then compared with observed values y_i,obs = y_i(1) (Rubin 1978; Gelman & Hill 2007).
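Once each treated unit has a matched control, Eqn 3 reduces to a mean of pairwise differences. A toy sketch with invented 0/1 outcomes:

```python
# Invented 0/1 outcomes (e.g. breeding success) for 10 matched pairs:
# treated birds supply y(1); their matched controls stand in for y(0).
treated_y = [1, 1, 0, 1, 1, 0, 1, 1, 1, 0]
control_y = [1, 0, 0, 1, 1, 1, 1, 0, 1, 1]

# ATT = E[y(1) | T = 1] - E[y(0) | T = 1], estimated over matched pairs
att = sum(t - c for t, c in zip(treated_y, control_y)) / len(treated_y)
print(att)   # 0.0: no estimated effect in this toy sample
```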
We will now illustrate propensity score matching to correct for selec-
tion bias with an investigation on the impact of tags on Scopoli’s shear-
waters (Calonectris diomedea). Managers and scientists may be
concerned whether instrumenting seabirds causes divorce or interferes
with breeding performance the following year. Divorce, a potentially
costly event (Choudhury 1995), is defined as a bird pairing with a new
partner (at time t + 1) in spite of its former mate (at time t) being simul-
taneously present and alive (Choudhury 1995).
Material
FIELD WORK
Deployments were carried out between mid-July and mid-September
2011 on Riou Island (43°10′34″ N, 5°23′10″ E), off Marseille, France.
Thirty-four GPS were deployed on one partner from 34 different active
nests of Scopoli’s shearwaters. Population size is estimated at 280–300
breeding pairs (Anselme & Durand 2012). Rats (but not cats) are present on the island, but subject to a regulation program. Breeding
activity was determined as part of the long-term demographic monitor-
ing program run since 1976 by the Conservatoire d’Espaces Naturels
de Provence-Alpes-Côte d’Azur (CEN-PACA).
Birds were caught inside their underground burrows, at night, after
chick feeding. GPS were attached to back feathers using Tesa® tape (Tesa s.a.s., Savigny le Temple, France). Total weight of a GPS was 20 g (4·0 cm × 2·2 cm × 0·8 cm), corresponding to 3·1% and 3·6% of average body mass for males and females, respectively. Equipped
birds were weighed with a spring scale at deployment. In addition to
GPS, time-depth recorders (TDR, ⌀ 8 mm × 11 mm, weighing 2·7 g) were attached with Tesa® tape on tail feathers on all but six birds. The average body mass of equipped males was 630 g (range 580–760 g) and 550 g (range 490–600 g) for equipped females. Out of the 34 deployed
GPS, 31 were subsequently recovered, usually within 4 days after at
[Figure 2: schematic of the Rubin Causal Model — for unit (bird) i, the assignment mechanism allocates treatment (Tag, T_i = 1) or control (No Tag, T_i = 0), each leading to the potential outcomes Divorce or No divorce: y_i(1) = 1, y_i(1) = 0, y_i(0) = 1, y_i(0) = 0.]
Fig. 2. The Rubin Causal Model or counterfactual model. (a) A randomized control trial detailing the two steps of (i) assigning units to either the control or treatment conditions before (ii) recording outcomes of causal interest. At the design stage, both potential outcomes are still possible. (b) An observational study: the missing red arrow emphasizes that the assignment mechanism is missing and must be inferred to construct a valid control group for causal inference.
least one foraging trip at sea (maximum: 4 trips). Upon recapture, GPS
and TDR were retrieved, and the tip of two primary feathers was
clipped for isotopic analyses. Subsequently, 21 (out of 31) birds were
equipped with a geolocator (GLS, ⌀ 8 mm × 35 mm, weighing 3·6 g)
mounted on a plastic ring. Handling time was kept ≤10 minutes.
Among the initial 34 equipped birds, three individuals were not metal-banded and were excluded from the analysis.
Instrumentation had no obvious short-term impacts on birds;
they all performed foraging trips and returned to their nesting bur-
rows at night. We did not assess short-term impacts because of
the following: (i) such studies already exist (Igual et al. 2005; Vil-
lard, Bonenfant & Bretagnolle 2011); and (ii) they are potentially
biased by inadequate choice of control birds. Also, (iii) using a
control group is both logistically and ethically challenging when
working on small, vulnerable populations because it doubles the
disturbance of animals. Finally, we already examined the data at
the end of the 2011 field season: no instrumented bird was a failed
breeder. Because outcomes for the 2012 breeding season (divorce,
breeding decision and success) were unknown, we could objectively
design an observational study.
From the CEN-PACA database, we extracted the life histories of all
shearwaters breeding on Riou Island in 2011. Sex was behaviourally
determined from calls. Most birds were ringed as adults, and only one
instrumented bird was ringed as a chick. Body masses correspond to
the average adult mass of birds across all resighting events before 2011.
Birds with missing sex or adult body mass information were excluded,
yielding a total of 183 birds with no missing information.
PROPENSITY SCORE ESTIMATION
To control for trap-ability, we modelled the propensity score (the probability of equipping a bird with tags) as:

robit(e_i) = Intercept + b_1 × Sex_i + b_2 × Ringed as Chick_i + b_3 × Mass_i + b_4 × Nb. Prev. Breedings_i + b_5 × Nb. Prev. Captures_i   eqn 4

Because instrumentation is a rare event, we used the cdf of a Student t distribution of location 0, scale 1·5484 and 7 degrees of freedom as a robust link function (Fig. S2; Liu 2004). Using data augmentation
(Albert & Chib 1993), we used shrinkage regression with a horseshoe
prior (Carvalho, Polson & Scott 2010) to achieve automatic variable
selection in propensity score estimation (Greenland 2008). Balance was
graphically assessed (see Table S1 and Fig. S3 for results without
shrinkage). Model fitting was performed with WINBUGS (Lunn et al.
2000) called from R (R Development Core Team 2012). Prior specifica-
tions are available as Supporting Information.
MATCHING
We matched individuals (without replacement) according to their estimated linear propensity scores [robit(e_i)] with the R package MATCHING (Sekhon 2011): we used Mahalanobis metric matching within a propensity score caliper (Rubin 2006). Caliper width was set to 1/4 of the standard deviation in estimated linear propensity scores (s_e) (Sekhon 2011). Only birds whose linear propensity score satisfied robit(e_t) − (s_e/4) ≤ robit(e_i) ≤ robit(e_t) + (s_e/4) were considered suitable matches for a bird equipped with tags (denoted with the subscript t). We excluded as potential match any partner of an equipped bird.
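The caliper rule above can be re-implemented in a few lines. This is a hypothetical greedy nearest-neighbour sketch with invented scores, not the actual analysis (which used the R package MATCHING); with a single score, the Mahalanobis metric reduces to an absolute difference:

```python
import statistics

# Invented linear propensity scores for tagged and untagged birds.
tagged   = {"t1": -0.2, "t2": 0.4, "t3": 1.8, "t4": 0.1}
untagged = {"c1": -0.15, "c2": 0.5, "c3": 0.35, "c4": -1.2, "c5": 0.05}

s_e = statistics.pstdev(list(tagged.values()) + list(untagged.values()))
caliper = s_e / 4   # 1/4 of the standard deviation of the scores

pairs = {}
available = dict(untagged)
# greedy matching without replacement, processing treated birds in fixed order
for t_id, t_score in sorted(tagged.items()):
    candidates = [(abs(c_score - t_score), c_id)
                  for c_id, c_score in available.items()
                  if abs(c_score - t_score) <= caliper]
    if candidates:
        _, best = min(candidates)   # nearest control within the caliper
        pairs[t_id] = best
        del available[best]         # without replacement

print(pairs)   # t3 (score 1.8) finds no control within the caliper
```

Birds such as t3, with no untagged bird inside the caliper, stay unmatched and drop out of the causal estimate, exactly as happened for four instrumented birds in the study.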
Initially, 29 of the 31 equipped birds could be matched. No matches
were found for the two individuals corresponding to the two rightmost
red strips on Fig. 3. In order to check that the matching procedure
worked, we performed a placebo test (Sekhon 2011). A placebo tests
for a causal effect that is null by definition (Fig. 1). We assessed
whether the probability of detecting a bird in 2010, that is the year prior
to tag deployment, was different between equipped and control birds.
This placebo revealed that equipped birds were more likely to be detected in 2010 than control birds (difference in proportions: 0·20 ± 0·13, Likelihood Ratio Test = 2·4, P = 0·12). Although not statistically significant at the 5% level, this result suggested a biased sample due to (unobserved) trap-ability.
To remedy this, we required equipped birds that were not detected in 2012 to be matched with control birds that were also not detected in 2012, and likewise for detected individuals. Detection in 2012 was not our out-
come of interest. Moreover, this covariate did not enter the estimation
of propensity scores since, when birds were instrumented in 2011, it was
impossible to tell whether they would be detected in 2012. We think that conditioning on detection in 2012 is adequate because (i) it is not one of our outcomes of interest; and (ii) the monitoring of the Riou population is performed by a dedicated field team independent of our research team. Any conscious or unconscious bias that we may have had in looking for previously equipped birds did not affect data collection in 2012.
With this constraint, 27 equipped birds were matched (lower panel
of Fig. 3). The placebo test revealed no obvious bias (difference in
proportions: 0·12 ± 0·14, Likelihood Ratio Test = 0·8, P = 0·37). Covariate balance between equipped and control samples was satisfactory (Fig. 4).
OBSERVED OUTCOMES
With this suitable control group, we addressed two causal questions:
1. Does instrumentation affect the breeding performance of a bird the
following year?
2. Does instrumentation cause a bird to change partner the following
year?
Imperfect detectability of individuals remains an issue. We used a
multistate capture–recapture model to predict counterfactuals. Our
sample consists of 54 birds that were alive in 2011. Their life histories
spanned 2004–2012. We assumed that all birds survived in 2012.
Because survival is perfect until 2011 by design, death can only occur
in 2012, but is confounded by imperfect detection. For any year, a
bird could be either (i) nonbreeding, (ii) a failed breeder, (iii) a suc-
cessful breeder or (iv) not seen. There are thus three different states
and nine possible transitions (Fig. S4). A bird was considered a
successful breeder if its chick fledged, a failed breeder if it failed to
do so after laying an egg. Birds caught on the colony, but for
which no egg or chick was found in the nest, were assumed to be
nonbreeding.
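The state coding can be sketched by tallying year-to-year transitions among the three states from life-history strings. The histories below are invented; ‘0’ marks ‘not seen’, which is an observation rather than a state:

```python
from collections import Counter

# Three true states: N = nonbreeding, F = failed breeder, S = successful
# breeder; '0' = not seen. Life histories below are invented.
states = ("N", "F", "S")
histories = [
    "NSS0S",    # bird detected in 4 of 5 years
    "FSSSN",
    "0NFSS",
]

transitions = Counter()
for h in histories:
    seen = [(year, s) for year, s in enumerate(h) if s != "0"]
    for (y1, s1), (y2, s2) in zip(seen, seen[1:]):
        if y2 == y1 + 1:            # only count year-to-year transitions
            transitions[(s1, s2)] += 1

# 3 states -> at most 9 distinct transition types (Fig. S4)
print(len(states) ** 2)             # 9
print(transitions[("S", "S")])      # 4 in these toy histories
```

In the real model, gaps caused by non-detection are not discarded as here but handled probabilistically through the detection process.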
We deleted all observations in 2012 for equipped birds to predict them from the model. These predictions correspond to what would have been observed if these birds had not been equipped. We compared predicted (ŷ_i,pred = y_i(0)) and observed (y_i,obs = y_i(1)) values with both χ² and likelihood ratio tests:

P_value = Pr(χ²_pred > χ²_obs)   eqn 5

A Bayesian P_value (Gelman, Meng & Stern 1996) close to 0·5 reflects no causal effect of instrumentation on breeding performance the following year: observed data are similar to predicted counterfactuals. A P_value close to either 0 or 1 betrays model misfit, suggesting a causal effect.
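Eqn 5 is a posterior predictive check. A minimal sketch, using an invented binomial ‘breeding success’ model with a uniform prior rather than the paper's multistate model:

```python
import random

random.seed(7)

# Posterior predictive (Bayesian) P-value: for each posterior draw of the
# parameter, simulate replicated data and compare discrepancies. Here the
# 'model' is a simple binomial breeding success; counts are invented.
k_obs, n = 18, 27            # e.g. 18 of 27 birds bred successfully

def chisq(k, n, p):
    """Chi-square discrepancy of a binomial count against its expectation."""
    e1, e0 = n * p, n * (1 - p)
    return (k - e1) ** 2 / e1 + ((n - k) - e0) ** 2 / e0

draws = 4000
exceed = 0
for _ in range(draws):
    # posterior draw from Beta(1 + k_obs, 1 + n - k_obs) (uniform prior)
    p = random.betavariate(1 + k_obs, 1 + n - k_obs)
    k_rep = sum(1 for _ in range(n) if random.random() < p)
    if chisq(k_rep, n, p) > chisq(k_obs, n, p):
        exceed += 1

p_value = exceed / draws
print(round(p_value, 2))     # a value near 0 or 1 would betray misfit
```

Because the simulated model matches the data-generating process, the P-value lands in the unremarkable middle of (0, 1), the pattern reported for the shearwater analysis.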
We checked the Goodness-of-fit (GOF) of the multistate model with U-CARE (Choquet et al. 2009), which was adequate (global test, GOF = 9·4, P = 0·97, df = 21). Model fitting was performed with WINBUGS (Lunn et al. 2000) called from R (R Development Core Team 2012). Prior specifications are available as Supporting Information.
Finally, we compared the proportion of control birds which changed
partner in 2012 to that of equipped birds.
Results
There was no causal effect of instrumenting a bird with tags on
its breeding performance the following year. Results from the
multistate capture–recapture model are summarized in
Tables S2, S3 and Fig. S6. Bayesian P_values were 0·7 and 0·5 for the χ² and likelihood ratio test, respectively: observed outcomes were not different from their predicted counterfactuals.
Among the 13 control birds with a known partner in 2012,
two changed partners compared with 2011. Among the 12
equipped birds with a known partner in 2012, none changed
partners compared with 2011. In the latter case, the zero
numerator is problematic for classical inference (Winkler,
Smith & Fryback 2002) but informative priors offer a solution
(Seaman, Seaman & Stamey 2012). Using data from Swatschek, Ristow & Wink (1994) from a colony of Scopoli’s shearwaters in Crete, we elicited an informative prior. Divorce rate in this Cretan population was between 3·6% (perfect detection scenario) and 18·8% (conservative scenario). In contrast, divorce rate on Lavezzi Island, Corsica, where black rats were present, was 23·1% (Thibault 1994). Black rats also occur on Riou, yet we could not determine whether they were also present in the study population of Swatschek, Ristow & Wink (1994). We elicited an informative Beta prior by matching its first quartile with the value 3·6% and its third quartile with the value 18·8%. The resulting prior is an informative Beta(0·86, 5·77) distribution (Fig. 5) with an effective sample size of 7 (0·86 + 5·77).
Posterior mean divorce rates among equipped birds and control birds were, respectively, 0·03 (95% credible interval 0·00–0·17) and 0·13 (0·03–0·33) (means are bracketed by a 95% credible interval following Louis & Zeger 2009). The difference was −0·09 (−0·29 to 0·07), indicating no causal effect of instrumentation on divorce rate the following year (Fig. 5).
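The quartile-matching elicitation can be checked by Monte Carlo with the standard library alone. Assuming the Beta(0·86, 5·77) prior above, its empirical quartiles should fall near the target divorce rates of 3·6% and 18·8%:

```python
import random

random.seed(2013)

# Monte Carlo check of the elicited prior: draw from Beta(0.86, 5.77)
# and compare empirical quartiles with the elicitation targets.
n = 100_000
samples = sorted(random.betavariate(0.86, 5.77) for _ in range(n))
q1, q3 = samples[n // 4], samples[3 * n // 4]

print(round(q1, 3), round(q3, 3))   # should sit near 0.036 and 0.188
print(round(0.86 + 5.77))           # effective sample size, ~7
```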
Discussion
After explicitly correcting for selection bias, we found no effect
on mate fidelity of instrumenting Scopoli’s shearwaters from
Riou Island with tags for 3–10 days during the chick-rearing
period. Taking into account imperfect detection probability,
we found no effect on breeding performance one year after
instrumentation either.
[Figure 3: paired histograms of linear propensity scores (x-axis, roughly −4 to 2) against frequency, for instrumented (top) and not-instrumented/control (bottom) birds.]
Fig. 3. Results from the robit regression for propensity score estimation of the 183 breeding Scopoli’s shearwaters detected in 2011 on Riou Island. Red strips correspond to birds that were instrumented in 2011 and light-coloured strips to birds that were not tagged. Individuals represented as light-coloured bands on the leftmost part correspond to birds breeding in 2011 that had an extremely small probability of being equipped and for which there is no similar equipped bird. Likewise, the four rightmost red bands correspond to equipped birds that had the largest probability of being equipped and for which no matches were available (lower panel).
[Figure 4: Tukey plots of standardized covariates (panels: Sex, Chick, Mass, Prev. Breed, Prev. Obs and Prop. score) for the All, Tag and Control groups.]
Fig. 4. One-to-one matching with Mahalanobis metric within propensity score caliper (Rubin 2006). Covariate balance is illustrated by means of Tukey plots for the 183 breeding birds in 2011, the 27 equipped birds and the corresponding 27 identified controls. The point represents the median, and the thick line the interquartile range. Thin-lined fences were computed as in Dümbgen & Riedwyl (2007) to illustrate asymmetry. Covariates were standardized.
ASSUMPTIONS AND LIMITS
Four instrumented birds could not be matched (Fig. 3): our
causal estimate does not cover the whole possible range of
observations, even if none of these four birds divorced. We also assumed that GPS instrumentation, not geolocators or clipping two feather tips, is the sole differential treatment, which conforms with the Stable Unit Treatment-Value Assumption (SUTVA). Without SUTVA, there are more than two potential outcomes, which complicates the identification of causes. We deployed several tags on single individuals, a potential SUTVA violation. Because all tags were externally attached and the GPS, which was systematically fitted, was the largest device, we assumed that SUTVA holds. Our study is not atypical with respect to other published ones. Our estimated ATT
was imprecise because of small sample size, another character-
istic of biologging (Hebblewhite & Haydon 2010). To increase
precision, many-to-one propensity score matching may be
used, although it may also cause further attrition in the sample
if k matches are unavailable for each treated unit. Matching
with replacement is another possibility (Sekhon 2011), but
beyond the scope of the present study.
Both Igual et al. (2005) and Villard, Bonenfant &
Bretagnolle (2011) investigated the impact of instrumentation
on Scopoli’s shearwaters. Their causal effect was the impact of
instrumenting at least one mate of a breeding pair with tags.
SUTVA does not hold since the probability of equipping a bird may depend on whether its mate was instrumented or not (Fig. S1A). As John Tukey famously declared, ‘an approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem’. We should
nevertheless strive to provide precise answers to ethics commit-
tees andmanagers for them tomake the best possible decisions
(Wilson&McMahon 2006).
Biologging studies can have several goals (studying foraging and the impact of tags), raising the possibility that none of them can be attained satisfactorily. Learning about the effects of tags and about the foraging ecology of animals simultaneously may not be possible with the same data. A neat distinction between the numerical representation of a phenomenon (the ATT) and its empirical representation (a bird seen with a different mate following its instrumentation) is essential for a fruitful discussion between scientists and managers. Defining the aim of the study and the causal effect of interest before examining the data is paramount. To further guarantee objectivity, outcomes of interest must be kept hidden from the analyst until a suitable control group has been found. Designing observational studies as if they were randomized controlled trials is important for the credibility of researchers with ethics committees and wildlife managers.
DESIGN VERSUS ANALYSIS
Scientists deploy expensive telemetric tags to collect data on the ecology and physiology of wild animals in their natural environment. The sample of equipped animals may be unconsciously biased towards good-quality or easily recapturable individuals. Valid inferences may be drawn from this sample, but extrapolation to the larger population involves additional assumptions (Gelman & Hill 2007). Instrumentation is a rare event, concerning a potentially nonrepresentative fraction of the population. In our study of Scopoli's shearwaters, the dual goals of estimating representative demographic rates and estimating a causal effect were not attainable because of selection bias. The low precision of the demographic estimates (Tables S2 and S3) makes them of little use. Suppose, for example, that in a capture–recapture study <5% of animals were instrumented. A multistate model is fitted to the observed data, with an indicator variable (or a stratum) for instrumented individuals. Suppose further that the model is deemed acceptable if it accommodates 95% of the life histories. If the 5% of misfits are precisely the instrumented animals, the estimated vital rates are still reasonable, but it is risky to give a causal interpretation to the regression coefficient for instrumentation because the model does not provide an adequate fit to these animals.
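The precision cost of such a rare treatment can be shown with a back-of-the-envelope calculation. The numbers below are hypothetical (a cohort of 200 birds with a divorce rate of 0.15 in both groups), not taken from the study:

```python
import math

def wald_se_diff(p1, n1, p0, n0):
    """Wald standard error of a difference between two proportions."""
    return math.sqrt(p1 * (1 - p1) / n1 + p0 * (1 - p0) / n0)

# 5% instrumented (10 vs. 190) versus a balanced design (100 vs. 100):
se_rare = wald_se_diff(0.15, 10, 0.15, 190)
se_balanced = wald_se_diff(0.15, 100, 0.15, 100)
print(round(se_rare, 3), round(se_balanced, 3))
```

With only 10 instrumented animals, the standard error of the estimated tag effect is more than twice that of the balanced design, so even a sizeable effect of instrumentation would be poorly resolved.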
Our aim with capture–recapture modelling was to account for imperfect detection probability among instrumentable birds. Because the estimated causal effect concerns 'instrumentable' animals, we cannot generalize results to the Riou population, nor determine the causal effect of instrumentation on a 'typical' individual without first defining 'typical'.
ETHICAL IMPLICATIONS
Fig. 5. Posterior distributions of the divorce rate for control and equipped birds. The informative prior used in the analysis is depicted in grey. Points symbolize the median, thick lines a 50% credible interval and thin lines a 95% credible interval.

The ethical issue raised by our work is whether it is worth assessing the causal effect of instrumentation by sampling 'control' individuals, when 'control' is a strong and untestable assumption. Our study highlights the necessity of finding a suitable control group before collecting punctual data (data that are not part of a systematic monitoring effort) on random individuals if the aim is to test for instrumentation effects. One must explicitly spell out an assignment mechanism before carrying out the study (for example, tossing a fair coin). In the case of punctual instrumentation within a long-term monitoring study, background characteristics can also be used prior to deployment to define a set of similar individuals which will either be equipped or serve as controls. Causal inference is then straightforward because the assignment mechanism is specified a priori. A power analysis should also be carried out to assess whether meaningful effects can be detected given the planned number of tag deployments (Igual et al. 2005).
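Such a prospective power analysis can be sketched with a normal approximation for a two-group comparison of proportions. The effect size below (tagging raising the divorce rate from 0.10 to 0.20) is purely hypothetical, chosen only to show how power scales with the planned number of deployments:

```python
import math

def norm_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def power_two_proportions(p0, p1, n_per_group, z_crit=1.96):
    """Approximate power of a two-sided z-test for a difference
    in proportions, with equal group sizes."""
    se = math.sqrt((p0 * (1 - p0) + p1 * (1 - p1)) / n_per_group)
    return norm_cdf(abs(p1 - p0) / se - z_crit)

# Power for 20, 50 and 200 birds per group:
for n in (20, 50, 200):
    print(n, round(power_two_proportions(0.10, 0.20, n), 2))
```

With the small group sizes typical of biologging studies (tens of birds per group), power to detect even a doubling of the divorce rate stays well below the conventional 0.8 target, which argues for planning sample sizes before deployment rather than hoping to rescue the comparison afterwards.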
The scope for correcting selection bias after the experiment is limited without detailed knowledge of the animals' background characteristics. Propensity score methods are intrinsically a posteriori: they may only be useful within long-term studies. In the case of a punctual biologging study on a population of unknown characteristics, propensity scores cannot be used to reconstruct the assignment mechanism. Collecting data on wild animals must be scientifically and ethically justified. Collecting data on a random control group may be unjustified when causal inference is not guaranteed: an ill-defined control group may imply unnecessary disturbance of particularly vulnerable individuals. Randomly sampling a control group may nonetheless be useful to control for large detrimental effects which may occur during fieldwork (a significant increase in trip duration or mass loss, for instance). Fortunately, with the miniaturization of data loggers, large effects are becoming less likely.
Conclusion
We detailed how to assess the causal impact of biologging on instrumented animals by trying to recover a posteriori a suitable control group. The grim picture painted by the limited research on the impact of biologging (Vandenabeele, Wilson & Grogan 2011) may partly result from the lack of guidelines for identifying meaningful control groups. Vandenabeele, Wilson & Grogan (2011) and Wilson & McMahon (2006) briefly mentioned this issue but offered no guidelines. Our incremental contribution is to suggest existing methods to fill that gap.

Fig. 1 details how to design an observational study to explicitly assess the impact of biologging on animals. Propensity score matching is, however, not a panacea: it assumes no hidden bias, and it cannot easily be used in catch-22 cases such as the study of foraging efficiency or heart-rate frequency, where different modelling approaches (hydrodynamics, flight mechanics) may be more appropriate (Hazekamp, Mayer & Osinga 2010). A pluralistic approach is clearly needed, within which the counterfactual model should be seriously considered.
Acknowledgements
The long-term monitoring study of Scopoli's shearwaters on Riou Island is approved by the Centre de Recherches par le Baguage des Populations d'Oiseaux (CRBPO, Paris). Access to protected areas and tag deployments were approved by the ethics board of the Conservatoire d'Espace Naturels de Provence-Alpes-Cote d'Azur. Bird instrumentation was carried out under personal animal experimentation permits #34-369 (D. Grémillet) and #34-505 (C. Péron) delivered by the Direction Départementale de la Protection des Populations. We thank CEN-PACA staff in charge of the long-term demographic monitoring of Scopoli's shearwaters on Riou Island: Jean Patrick Durand, Célia Pastorelli, Nicolas Bazin, Timothée Cuchet and Lorraine Anselme. We thank Pierrick Giraudet and Léo Martin for tag deployment, Emmanuelle Cam for the multistate models' goodness-of-fit tests and Olivier Gimenez for the multistate capture–recapture model BUGS code. Emmanuelle Cam, Olivier Gimenez and Christophe Barbraud offered suggestions on an early version of the manuscript. We thank Jarrod Hadfield and two anonymous reviewers for helpful and constructive comments. The authors declare no conflict of interest.
References
Albert, J. & Chib, S. (1993) Bayesian analysis of binary and polychotomous response data. Journal of the American Statistical Association, 88, 669–679.
Anselme, L. & Durand, J. (2012) The Cory's Shearwater Calonectris diomedea diomedea, updated state of knowledge and conservation of the nesting populations of the small Mediterranean Islands. Monography Initiative PIM, Conservatoire d'Espaces Naturels de Provence Alpes Cotes d'Azur.
Arah, O. (2008) The role of causal reasoning in understanding Simpson's Paradox, Lord's Paradox, and the suppression effect: covariate selection in the analysis of observational studies. Emerging Themes in Epidemiology, 5, 5.
Austin, P. (2011) An introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46, 399–424.
Barron, D., Brawn, J. & Weatherhead, P. (2010) Meta-analysis of transmitter effects on avian behaviour and ecology. Methods in Ecology and Evolution, 1, 180–187.
Carvalho, C., Polson, N. & Scott, J. (2010) The horseshoe estimator for sparse signals. Biometrika, 97, 465–480.
Casper, R. (2009) Guidelines for the instrumentation of wild birds and mammals. Animal Behaviour, 78, 1477–1483.
Choquet, R., Lebreton, J., Gimenez, O., Reboulet, A. & Pradel, R. (2009) U-CARE: utilities for performing goodness-of-fit tests and manipulating CApture–REcapture data. Ecography, 32, 1071–1074.
Choudhury, S. (1995) Divorce in birds: a review of the hypotheses. Animal Behaviour, 50, 413–429.
Dümbgen, L. & Riedwyl, H. (2007) On fences and asymmetry in box-and-whiskers plots. American Statistician, 61, 356–359.
Gelman, A. & Hill, J. (2007) Data Analysis Using Regression and Multilevel/Hierarchical Models, 1st edn. Cambridge University Press, Cambridge, UK.
Gelman, A., Meng, X.L. & Stern, H. (1996) Posterior predictive assessment of model fitness via realized discrepancies. Statistica Sinica, 6, 733–807.
Gelman, A., Carlin, J., Stern, H. & Rubin, D. (2003) Bayesian Data Analysis, 2nd edn. Chapman & Hall/CRC, Boca Raton, Florida, USA.
Greenland, S. (2008) Invited commentary: variable selection versus shrinkage in the control of multiple confounders. American Journal of Epidemiology, 167, 523–529.
Grémillet, D., Kuntz, G., Woakes, A.J., Gilbert, C., Robin, J., Le Maho, Y. & Butler, P. (2005) Year-round recordings of behavioural and physiological parameters reveal the survival strategy of a poorly insulated diving endotherm during the arctic winter. Journal of Experimental Biology, 208, 4231–4241.
Hawkins, P. (2004) Bio-logging and animal welfare: practical refinements. Memoirs of the National Institute for Polar Research, 58, 58–68.
Hazekamp, A., Mayer, R. & Osinga, N. (2010) Flow simulation along a seal: the impact of an external device. European Journal of Wildlife Research, 56, 131–140.
Hebblewhite, M. & Haydon, D. (2010) Distinguishing technology from biology: a critical review of the use of GPS telemetry data in ecology. Philosophical Transactions of the Royal Society of London Series B, 365, 2303–2312.
Holland, P. & Rubin, D. (1983) On Lord's Paradox. Principals of Modern Psychological Measurement: A Festschrift for Frederic M. Lord, pp. 3–26. Lawrence Erlbaum Associates Inc., Hillsdale, New Jersey.
Igual, J., Forero, M., Tavecchia, G., González-Solís, J., Martínez Abraín, A., Hobson, K., Ruiz, A. & Oro, D. (2005) Short-term effects of data-loggers on Cory's Shearwater (Calonectris diomedea). Marine Biology, 146, 619–624.
King, G. & Zeng, L. (2007) When can history be our guide? The pitfalls of counterfactual inference. International Studies Quarterly, 51, 183–210.
Liu, C. (2004) Robit regression: a simple robust alternative to logistic and probit regression. Applied Bayesian Modeling and Causal Inference from Incomplete-Data Perspectives, chapter 21, pp. 227–238. John Wiley and Sons Ltd, New York.
Lord, F. (1967) A paradox in the interpretation of group comparisons. Psychological Bulletin, 68, 304–305.
Louis, T. & Zeger, S. (2009) Effective communication of standard error and confidence interval. Biostatistics, 10, 1–2.
Lunn, D., Thomas, A., Best, N. & Spiegelhalter, D. (2000) WinBUGS – a Bayesian modelling framework: concepts, structure, and extensibility. Statistics and Computing, 10, 325–337.
McMahon, C., Field, I., Bradshaw, C., White, G. & Hindell, M. (2008) Tracking and data-logging devices attached to elephant seals do not affect individual mass gain or survival. Journal of Experimental Marine Biology and Ecology, 360, 71–77.
Phillips, R., Xavier, J. & Croxall, J. (2003) Effects of satellite transmitters on albatrosses and petrels. Auk, 120, 1082–1090.
R Development Core Team (2012) R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-0.
Ropert-Coudert, Y., Kato, A., Grémillet, D. & Crenner, F. (2012) Biologging: recording the ecophysiology and behaviour of animals moving freely in their environment. Sensors for Ecology: Towards Integrated Knowledge of Ecosystems, chapter 1, pp. 17–42. CNRS. ISBN 978-2-9541683-0-2.
Rubin, D. (1978) Bayesian inference for causal effects: the role of randomization. The Annals of Statistics, 6, 34–58.
Rubin, D. (2006) Matched Sampling for Causal Effects, 1st edn. Cambridge University Press, New York, NY, USA.
Rubin, D. (2007) The design versus the analysis of observational studies for causal effects: parallels with the design of randomized trials. Statistics in Medicine, 26, 20–36.
Rubin, D. (2008) For objective causal inference, design trumps analysis. Annals of Applied Statistics, 2, 808–840.
Rubin, D., Stuart, E. & Zanutto, E. (2004) A potential outcomes view of value-added assessment in education. Journal of Educational and Behavioral Statistics, 29, 103–116.
Rutz, C. & Hays, G. (2009) New frontiers in biologging science. Biology Letters, 5, 289–292.
Sagarin, R. & Pauchard, A. (2010) Observational approaches in ecology open new ground in a changing world. Frontiers in Ecology and the Environment, 8, 379–386.
Seaman III, J., Seaman Jr, J. & Stamey, J. (2012) Hidden dangers of specifying noninformative priors. The American Statistician, 66, 77–84.
Sekhon, J. (2009) Opiates for the matches: matching methods for causal inference. Annual Review of Political Science, 12, 487–508.
Sekhon, J. (2011) Multivariate and propensity score matching software with automated balance optimization: the Matching package for R. Journal of Statistical Software, 42, 1–52.
Swatschek, I., Ristow, D. & Wink, M. (1994) Mate fidelity and parentage in Cory's Shearwater Calonectris diomedea – field studies and DNA fingerprinting. Molecular Ecology, 3, 259–262.
Thibault, J. (1994) Nest-site tenacity and mate fidelity in relation to breeding success in Cory's Shearwater Calonectris diomedea. Bird Study, 41, 25–28.
Tversky, A. & Kahneman, D. (1971) Belief in the law of small numbers. Psychological Bulletin, 76, 105–110.
Vandenabeele, S., Wilson, R. & Grogan, A. (2011) Tags on seabirds: how seriously are instrument-induced behaviours considered? Animal Welfare, 20, 559–571.
Villard, P., Bonenfant, C. & Bretagnolle, V. (2011) Effects of satellite transmitters fitted to breeding Cory's Shearwaters. The Journal of Wildlife Management, 75, 709–714.
Wilson, R. & McMahon, C. (2006) Measuring devices on wild animals: what constitutes acceptable practice? Frontiers in Ecology and the Environment, 4, 147–154.
Wilson, R., Grant, W. & Duffy, D. (1986) Recording devices on free-ranging marine animals: does measurement affect foraging performance? Ecology, 67, 1091–1093.
Wilson, R., Grémillet, D., Syder, J., Kierspel, M., Garthe, S., Weimerskirch, H., Schäfer-Neth, C., Scolaro, J., Bost, C.A., Plötz, J. & Nel, D. (2002) Remote-sensing systems and seabirds: their use, abuse and potential for measuring marine environmental variables. Marine Ecology Progress Series, 228, 241–261.
Winkler, R., Smith, J. & Fryback, D. (2002) The role of informative priors in zero-numerator problems: being conservative versus being candid. The American Statistician, 56, 1–4.
Received 29 April 2013; accepted 22 May 2013
Handling Editor: Dr Jarrod Hadfield
Supporting Information
Additional Supporting Information may be found in the online version
of this article.
Table S1. Estimated regression coefficients for the propensity score
model.
Table S2. Estimated transition and detection probabilities from the multistate capture–recapture models (propensity scores estimated with shrinkage).
Table S3. Estimated transition and detection probabilities from the multistate capture–recapture models (propensity scores estimated without shrinkage).
Fig. S1. Violations of the Stable Unit Treatment Value Assumption (SUTVA).
Fig. S2. Cumulative distribution function (CDF) of a standard logistic distribution and a Student-t distribution with 7 degrees of freedom and scale set to 1.5484.
Fig. S3. Graphical display of covariate balance after matching on propensity scores (estimated without shrinkage).
Fig. S4. Graphical representation of the multistate capture–recapture model used to estimate counterfactual outcomes for equipped birds.
Fig. S5. Graphical representation of the Student-t priors with location 0, scale 10 and 7 df on a logit scale used for detection probabilities p.
Fig. S6. Comparison between predicted and observed breeding performance in 2012 for birds equipped with tags in 2011.
Appendix S1. Simpson's and Lord's Paradoxes.
Data S1. Data and R code to reproduce the analyses.