What can Statistics do for me?
Marian Scott, Dept of Statistics, University of Glasgow
Statistics course, August 2009
Outline of presentation
Why would or indeed should an environmental scientist need to know any statistics?
Illustration: environmental change, one of the most enduring features, with
– links to research, policy, policy effectiveness evaluation, and management
Why quantify?
Quantification is an essential part of most scientific activities
For the environment, quantification must account
– for inherent variability of the process or
– for lack of precise knowledge of it
and is needed for resolving many of the environmental issues of today
Decision making: which areas should be restricted?
Prediction: what is the trend in temperature? What will its level be in 2050?
Decision making: is it safe to eat fish?
Regulatory: have emission control agreements reduced air pollutants?
Understanding: when did things happen in the past?
Some examples of current environmental issues…
– Climate change
– Biodiversity
– Arctic ice cover
– Water quality
– Extreme weather
Direct Observations of Recent Climate Change
Global mean temperature
Global average sea level
Northern hemisphere snow cover
Trends in seasons over Europe (Global Change Biology, 2006)
21 countries, 125,000 studies, 542 plant and 19 animal species, 1971-2000
Spring is on average 6 to 8 days earlier than it was 30 years ago
Analysis of 254 national time series: the pattern of observed change in spring matches measured national warming (correlation coefficient –0.69, P<0.001)
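As a rough illustration of what a correlation coefficient such as the –0.69 quoted above measures, the sketch below computes a Pearson correlation by hand. The numbers are hypothetical and are not the study's data; the names `pearson_r`, `warming` and `onset_shift` are made up for this example. A negative value means that more warming goes with an earlier spring.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sums of cross-products and squares of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values (assumption, for illustration only):
# warming in degrees C, and change in spring onset in days (negative = earlier).
warming = [0.2, 0.5, 0.8, 1.0, 1.3, 1.6]
onset_shift = [-1.0, -3.5, -2.0, -6.0, -5.5, -9.0]

r = pearson_r(warming, onset_shift)
print(round(r, 2))
```

The coefficient always lies between -1 and 1; the P-value on the slide comes from a separate significance test that this sketch does not attempt.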
Observed temperature trend in Europe (EEA signals 2004).
Global average temperature increased by 0.7 ± 0.2°C over the past 100 years
Change in different periods of the year may have different effects,
– start of the growing season determined by spring and autumn temps,
– changes in winter important for species survival.
Spatial patterns of change
Spatial patterns of change may be important
Changes in the start and end of the growing season between two years (1961, 2004)
– heterogeneous
Example: are atmospheric SO2 concentrations declining?
Measurements made at a monitoring station over a 20 year period
A complex statistical model was developed to describe the pattern; the model partitions the variation into ‘trend’, seasonality and residual variation
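The split into trend, seasonality and residual variation can be sketched very simply. The example below is not the model used for the SO2 data (the slides do not give it); it assumes a straight-line trend and a fixed monthly pattern, and it runs on a synthetic series. The function names and the synthetic numbers are made up for illustration.

```python
import math
import random


def fit_line(t, y):
    """Least-squares intercept and slope of y against t."""
    n = len(t)
    mean_t = sum(t) / n
    mean_y = sum(y) / n
    slope = (sum((a - mean_t) * (b - mean_y) for a, b in zip(t, y))
             / sum((a - mean_t) ** 2 for a in t))
    return mean_y - slope * mean_t, slope


def decompose(y, period=12):
    """Split a series into linear trend + periodic seasonal effect + residual."""
    t = list(range(len(y)))
    intercept, slope = fit_line(t, y)
    trend = [intercept + slope * i for i in t]
    detrended = [obs - tr for obs, tr in zip(y, trend)]
    # Seasonal effect: the average detrended value for each position in the cycle.
    seasonal_means = []
    for m in range(period):
        values = detrended[m::period]
        seasonal_means.append(sum(values) / len(values))
    seasonal = [seasonal_means[i % period] for i in t]
    residual = [obs - tr - se for obs, tr, se in zip(y, trend, seasonal)]
    return trend, seasonal, residual, slope


# Synthetic monthly series over 20 years (assumption, not monitoring data):
# a slow decline, an annual cycle, and random noise.
rng = random.Random(0)
series = [8.0 - 0.02 * i + 1.5 * math.cos(2 * math.pi * i / 12) + rng.gauss(0, 0.5)
          for i in range(240)]

trend, seasonal, residual, slope = decompose(series)
print("estimated change per month:", round(slope, 4))
```

A negative slope suggests a decline, but judging whether it is real needs the size of the residual variation, which is why the uncertainty questions on the following slides matter.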
Quantification is model and observation based
Questions about the model
– Is it valid? Are the assumptions reasonable?
– Does the model make sense based on best scientific knowledge?
– Is the model credible? Do the model predictions match the observed data?
– How uncertain are the results?
Questions we ask about data
– Do they result from observational or designed studies, from laboratory or field experiments?
– What scale are they collected over (time and space)?
– Are they representative? Are they qualitative or quantitative?
– How are they connected to processes, and how well understood are these connections?
– How uncertain are they?
[Figure: SO2 monitored in GB02; SO2 plotted against observation number]
[Figure: Plot of SO2 against time (months, 1980 to 1995), monitored in GB02. Lines = Model 3]
Comments on the issue
Statistical theme: time series modelling, trend detection
– Lots of variation
– Variation may make the pattern more difficult to see (signal to noise ratio)
– There may be small numbers of unusual observations
– There may be distinct changes (discontinuities)
Example 2: water quality
Catchment modelling
WFD requires basin management plans: the measurement series covers 20 years, including a variety of biological, chemical and hydrological indicators, but is irregular in time. Stations appear and disappear
Joint work with David O’Donnell, Mark Hallard (SEPA), Adrian Bowman
Spatial patterns of change
Spatial patterns of change may be important
The circles represent the stations on the network, which are clearly not spatially representative
Spatial patterns of change
Spatial patterns of change may be important
interpolation over the entire network from the stations is possible, but needs a spatial model
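To make the idea of interpolating from stations concrete, the sketch below uses inverse-distance weighting. This is a deliberately simple stand-in and not a spatial model in the sense meant above: it gives no measure of uncertainty and ignores how unevenly the stations are placed. The station coordinates and values are hypothetical, and `idw` is a name chosen for this example.

```python
import math


def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at (x, y).

    stations is a list of (x, y, value) tuples; closer stations get more weight.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for sx, sy, value in stations:
        distance = math.hypot(x - sx, y - sy)
        if distance == 0.0:
            # Exactly at a station: return its observed value.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Hypothetical stations (assumption): coordinates in km, values in arbitrary units.
stations = [(0.0, 0.0, 10.0), (10.0, 0.0, 20.0), (0.0, 10.0, 30.0)]

estimate = idw(stations, 5.0, 0.0)
print(round(estimate, 2))
```

The estimate is always a weighted average of the station values, so it can never fall outside their range. That is one reason a proper spatial model is preferred when the network is not representative.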
Example: how is Cs-137 distributed over a large area of SW Scotland?
Aerial survey of the area (detectors mounted in helicopters)
How to design the flight pattern (straight lines separated by 250m)?
How to match and then calibrate the results to ground based measurements?
137Cs deposition maps in SW Scotland prepared by different European teams (ECCOMAGS, 2002)
Lochs in area Y
comments on examples
Statistical themes: where to sample, and whether the sample is representative; spatial modelling
Aerial survey: how to design the flight pattern (straight lines separated by 250m)?
How to match and then calibrate the results to ground based measurements?
Comments
– Spatial variation is clear
– There is variation amongst the measurement techniques
– There are many ways of exploring the important spatial features
– There is uncertainty about the spatial extent
Example-Bathing water quality
All bathing water sites are classified as either ‘Excellent’, ‘Good’, ‘Sufficient’ or ‘Poor’ in terms of the quantities of 2 different microbiological indicator bacteria:
– Faecal Streptococci (FS)
– Faecal Coliforms (FC)
‘Sufficient’ is the minimum standard that bathing water sites are required to meet
Classification for each site is based on the 90th & 95th percentiles of samples over the most recent 4 bathing seasons
joint work with Ruth Haggarty, Claire Ferguson
Boxplots show the distribution of FS with respect to guideline limits
– Green line represents the EC Directive threshold for ‘Excellent’ (95th percentile evaluation)
– Red line represents the EC Directive threshold for ‘Good’ (90th percentile evaluation)
– Blue line represents the EC Directive threshold for ‘Sufficient’ (90th percentile evaluation)
[Figure: Boxplots of FS by year (2004 to 2007), one panel per site: Portobello Central (SEPA location code 4593), Sandyhills (114567), Saltcoats/Ardrossan (124673), Irvine (124688), Troon (124706), Prestwick (124714), Ayr (124725), Brighouse Bay (124793), Ettrick Bay (124817), Aberdeen (233616)]
Bimodality
Evidence of bimodality at some sites
This can result in the four year 95th percentile appearing greater than the maximum value within a single year
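A small numerical sketch shows how this can happen. The counts below are hypothetical, not SEPA data, and the percentile is computed empirically from the pooled samples rather than by the Directive's own method. When some years contain a second cluster of high values, the pooled 95th percentile lands in that cluster and so sits above the maximum of the years that have none.

```python
import statistics

# Hypothetical FS counts per 100 ml (assumption): mostly low values, with a
# second cluster of high values in two of the four years.
counts_by_year = {
    2004: [5, 8, 10, 12, 15, 20, 22, 25, 30, 35],
    2005: [4, 6, 9, 11, 14, 18, 21, 400, 450, 500],
    2006: [3, 7, 10, 13, 16, 19, 24, 28, 33, 40],
    2007: [6, 9, 12, 15, 17, 23, 26, 380, 420, 480],
}

# Pool all four years, as the classification does.
pooled = [value for values in counts_by_year.values() for value in values]

# statistics.quantiles with n=100 returns the 99 percentile cut points,
# so index 94 is the 95th percentile.
pooled_p95 = statistics.quantiles(pooled, n=100, method="inclusive")[94]

for year, values in counts_by_year.items():
    note = "maximum is below the pooled 95th percentile" if max(values) < pooled_p95 else ""
    print(year, max(values), note)
```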
[Figure: Dotplot of FS (per 100ml) by year, 2004 to 2007, for Sandyhills (SEPA location code 114567), with the 90th and 95th percentiles marked]
Assessment of Distribution
It is believed that the samples at each site have come from a log10-normal population. The Directive gives directions for calculating percentiles on the assumption that the data follow a log10-normal distribution
Assumption of log-normality is needed for accurate calculation of percentiles and consequently accurate compliance classification of sites
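The sketch below shows what a percentile calculated under the log10-normal assumption looks like next to an empirical percentile taken directly from the data. The Directive's exact formula and constants are not reproduced in these slides, so the multiplier here is simply the standard normal quantile. The counts are hypothetical and must all be positive, and the function names are made up for this example.

```python
import math
import statistics


def lognormal_percentile(counts, prob):
    """Percentile assuming log10(counts) is normally distributed.

    Computed as 10 ** (mean + z * sd) of the log10 values, where z is the
    standard normal quantile for prob. Counts must be strictly positive.
    """
    logs = [math.log10(c) for c in counts]
    mu = statistics.mean(logs)
    sigma = statistics.stdev(logs)
    z = statistics.NormalDist().inv_cdf(prob)
    return 10 ** (mu + z * sigma)


def empirical_percentile(counts, prob):
    """Percentile read directly from the data, with no distributional assumption."""
    cut_points = statistics.quantiles(counts, n=100, method="inclusive")
    return cut_points[round(prob * 100) - 1]


# Hypothetical FS counts per 100 ml (assumption, not SEPA data).
counts = [2, 5, 8, 10, 12, 15, 20, 25, 30, 40, 55, 80, 120, 200]

for prob in (0.90, 0.95):
    print(prob,
          round(lognormal_percentile(counts, prob), 1),
          round(empirical_percentile(counts, prob), 1))
```

If the two values differ markedly at a site, that is a sign that the log-normal assumption may not hold there, and the compliance classification could change depending on which is used.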
[Figure: Histogram of FS (density against FS/100ml) with normal Q-Q plot (sample quantiles against theoretical quantiles), SEPA location code 4556]
[Figure: Histogram of log10(FS) (density against log10(FS)/100ml) with normal Q-Q plot (sample quantiles against theoretical quantiles), SEPA location code 4556]
[Figure: Fn(x) against x for FS, Site 233613. Theoretical percentile (log10 scale): 2.63]
[Figure: Fn(x) against x for FS, Site 235334. Theoretical percentile (log10 scale): 1.48]
                   Theoretical Percentile   Empirical Percentile
Log scale          1.48                     1.52
Directive Scale    30.2                     33.11

                   Theoretical Percentile   Empirical Percentile
Log scale          2.62                     2.62
Directive Scale    416.9                    416.9
comments on example
statistical themes: distributional assumptions to be tested, extreme value modelling
considerable variation both within sites over years and across sites
unusual observations appear
NERC priorities
– the climate system
– biodiversity
– environment, pollution and human health
– sustainable use of natural resources
– earth system science
Goals include responding to climate change and predicting impacts of environmental change. Some of the fundamental research questions associated with each of these priorities require quantitative skills involving:
Statistics might be needed where?
– designing and evaluating monitoring and sampling networks; sampling strategies
– the analysis of observational records (e.g. past climate indicators, water quality, pollutant trends); trends, spatio-temporal modelling, dealing with variation
– the study and modelling of extreme events (e.g. sea levels, flood prediction) for prediction and management of future occurrences; extremes, risk modelling, uncertainty
– evaluating the state of the environment; trends, uncertainty, prediction
Statistics might be needed where?
the use of complex computer models to simulate the whole earth system (e.g. climate change and the carbon cycle); uncertainty, model evaluation
the evaluation and quantification of risk and uncertainty (e.g. volcanic or earthquake prediction); uncertainty, prediction
Statistics and the environment
Appropriate statistical models can
– give added value to routine monitoring data,
– give better descriptions of complex change behaviour,
– begin to tease out climate change driven effects in environmental quality,
– handle natural variation.
Greater, innovative statistical analysis needed for environmental science
Statistics and the environment
As environmental scientists, we need to try to ensure that
– data are gathered under good statistical principles, and
– they are not left in the filing cabinet.
We need to ensure that good environmental science is served by good statistical science.
Environmental science should be “Data and information rich”
Statistics training
we have chosen a number of key statistical topics to cover- there are many others
each topic will be covered in a general sense but will also have practical examples for you to work through with guidance
the main software tool will be R, which is freely available
there should be lots of opportunities to ask questions