
Statistical trends and time series

a recap

July 2012

Marian Scott and Adrian Bowman


Measurement and assessment of change

• Two topics to consider

– Regression modelling in general

– Time series

• Leading to

– Trends (combining time series and regression ideas)

• Meet some examples

• Cover some of the ideas

• Apply them


An example of some typical environmental time series data

[Figure: Log TOC Concentrations Across the Years of 10 River Sites. x-axis: Years (1985 to 2010); y-axis: Log TOC Concentration (mg/l). Sites: R. Lui - Lui, Callater Burn, Dubh Loch, R. Quoich, R. Freshie - Allt Amharcaidh, Bervie Water, Cowie Water - Stonhaven, Culter Burn - Peterculter, Water of Feugh, Sheeoch Burn.]


Trends and change

• In time (SNIFFER, 2006)

– A linear regression equation was calculated for each dataset and then the trend was calculated from the gradient parameter (i.e. the rate of change) multiplied by the length of the data period to provide a clear change value since the start of the period.

• “the significance of trends was tested using the non-parametric Mann-Kendall tau test (Sneyers, 1990). Linear trends with the Mann-Kendall significance test are widely used in the analysis of climate trends”
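The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a short hypothetical annual series with no tied values; the function names and data are not from the SNIFFER report, the "length of the data period" is taken here as last year minus first year, and the p-value uses the usual large-sample normal approximation for the Mann-Kendall statistic.

```python
import math

def ls_slope(t, y):
    """Least-squares gradient of y regressed on t."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    sxy = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    sxx = sum((ti - t_bar) ** 2 for ti in t)
    return sxy / sxx

def mann_kendall(y):
    """Return Mann-Kendall S, Kendall's tau and a two-sided p-value (no ties assumed)."""
    n = len(y)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = y[j] - y[i]
            s += (diff > 0) - (diff < 0)      # +1 if later value is larger, -1 if smaller
    tau = s / (n * (n - 1) / 2)
    var_s = n * (n - 1) * (2 * n + 5) / 18    # variance of S when there are no ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)        # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return s, tau, p_value

# Hypothetical annual values, for illustration only
years = list(range(2000, 2010))
values = [3.1, 3.4, 3.3, 3.8, 3.9, 4.1, 4.0, 4.4, 4.6, 4.5]

gradient = ls_slope(years, values)
change = gradient * (years[-1] - years[0])    # rate of change x length of period (assumed definition)
s, tau, p_value = mann_kendall(values)
print(f"gradient per year: {gradient:.4f}, change over period: {change:.3f}")
print(f"Mann-Kendall S: {s}, tau: {tau:.3f}, p-value: {p_value:.5f}")
```

The gradient gives the size of the change while the Mann-Kendall test judges its significance without assuming linearity or normality, which is why the two are often reported together.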


Joint Nature Conservation Committee definition of trend

• a trend is a measurement of change derived from a comparison of the results of two or more statistics.

• A trend relates to a range of dates spanning the statistics from which it is derived, e.g. 1996 - 2000. A trend will generally be expressed as a percentage change (+ for an increase, - for a decrease) or as an index.

 


Statistical definition of trend

• What is a statistical trend?

– A long-term change in the mean level (Chatfield, 1996)

– Long-term movement (Kendall and Ord, 1990)

– The non-random function $\mu(t) = E(Y(t))$ (Diggle, 1990)

• Trend is the long-term behaviour of the process; trends in mean, variance and extremes may be of interest (Chandler, 2002)

• Environmental change often but not always means a statistical trend

• Not restricted to linear (or even monotonic) trends


Statistical tools for exploring and quantifying trend

• Exploratory tools

– Scatterplots, time series plots, smoothed trends over time (are the series equally spaced, with no missing data?)

• More formal tools

– Can you assume monotonicity? Is the trend linear?

– Non-parametric estimation and testing (classic tests)

– Semi-parametric and non-parametric additive models (for irregularly spaced data)

• What is monotonic? Steadily increasing or decreasing


Simple Regression Model

• The basic regression model assumes:

– The average value of the response y is linearly related to the explanatory variable x,

– The spread of the response y about the average is the SAME for all values of x,

– The VARIABILITY of the response y about the average follows a NORMAL distribution for each value of x.
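Written as an equation, these assumptions amount to the model below. The symbols $\alpha$, $\beta$, $\sigma$ and $\varepsilon_i$ are notation assumed here for illustration rather than taken from the slides.

```latex
y_i = \alpha + \beta x_i + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2), \qquad i = 1, \ldots, n
```

The straight line $\alpha + \beta x$ is the average response, the single constant $\sigma^2$ expresses the common spread, and the normal distribution of $\varepsilon_i$ describes the variability about the line. The errors are also usually taken to be independent of one another, an assumption that needs care when x is time.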


Simple Regression Model

• The model is typically fitted using least squares

• Goodness of fit of the model is assessed using the residual sum of squares and $R^2$

• Assumptions checked using residual plots

• Inference about model parameters

• For the water quality data, the response would be TOC and the explanatory variable would be year
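As a rough sketch of what least squares fitting involves, the Python below computes the slope, intercept, residual sum of squares and $R^2$ directly from their definitions. The function name and the five (year, log TOC) pairs are hypothetical and are not the river data shown earlier.

```python
def fit_line(x, y):
    """Fit y = intercept + slope * x by least squares; return slope, intercept, RSS and R^2."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual sum of squares
    tss = sum((yi - y_bar) ** 2 for yi in y)                 # total sum of squares
    r_squared = 1 - rss / tss
    return slope, intercept, rss, r_squared

# Hypothetical data: response is log TOC, explanatory variable is year
year = [1990, 1995, 2000, 2005, 2010]
log_toc = [0.8, 1.1, 1.0, 1.4, 1.5]

slope, intercept, rss, r_squared = fit_line(year, log_toc)
print(f"slope = {slope:.4f} per year, intercept = {intercept:.2f}")
print(f"RSS = {rss:.4f}, R-squared = {r_squared:.3f}")
```

In practice a statistics package would do this and also report standard errors for the coefficients, which are needed for the inference step.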


Chlorophyll and nitrogen relationship

[Figure: Scatterplot of chloro vs N. x-axis: N (0.5 to 3.5); y-axis: chloro (10 to 100).]


Regression Output

The regression equation is

chloro = - 1.7 + 28.8 N

Predictor     Coef     StDev      T       P
Constant     -1.69     10.14    -0.17   0.869
N            28.808     4.171    6.91   0.000

S = 15.19   R-Sq = 67.5%   R-Sq(adj) = 66.1%


Conclusions

• The equation for the best-fit straight line has an intercept of -1.7 and a slope of 28.8. Thus for every unit increase in N, the chloro measure increases by 28.8 on average.

• The $R^2$(adj) value is 66.1%, so we have explained 66% of the variation in chloro by its relationship to N. The S value is 15.19, which describes the variation in the points around this fitted line.
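The numbers in the regression output can be checked by hand. The short Python sketch below reproduces the T column (each coefficient divided by its standard deviation) and uses the fitted equation for a prediction; the value N = 2.0 is an arbitrary choice inside the plotted range, and the variable names are illustrative.

```python
# Coefficients and standard deviations copied from the regression output for chloro on N
coef = {"Constant": -1.69, "N": 28.808}
st_dev = {"Constant": 10.14, "N": 4.171}

# The T column is the coefficient divided by its standard deviation
t_ratio = {name: coef[name] / st_dev[name] for name in coef}

def predict_chloro(n_value):
    """Fitted average chloro at a given N, using the reported equation."""
    return coef["Constant"] + coef["N"] * n_value

predicted = predict_chloro(2.0)   # N = 2.0 chosen arbitrarily for illustration

for name, t in t_ratio.items():
    print(f"{name}: T = {t:.2f}")
print(f"fitted chloro at N = 2.0: {predicted:.1f}")
```

The small T for the constant matches its large p-value (0.869), while the large T for N matches the strong evidence of a relationship.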


Checking assumptions

• Usually based on residuals

• Residuals are the differences between each observation and the corresponding model fitted value

• They can be positive or negative but should be on average zero.

• Residual plots are common model assessment tools (scatterplot of residuals vs fitted values)
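To make the definition concrete, the sketch below fits a straight line by least squares to a small hypothetical data set and computes the residuals. The data and function name are assumptions for illustration. For a least squares fit with an intercept, the residuals sum to zero and are uncorrelated with the fitted values, which is why any visible pattern in a residuals-versus-fits plot points to a problem with the model.

```python
def residuals_and_fits(x, y):
    """Least-squares straight line; return (fitted values, residuals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    fitted = [intercept + slope * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]   # observation minus fitted value
    return fitted, residuals

# Hypothetical data, for illustration only
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
y = [12.0, 30.0, 38.0, 61.0, 65.0, 90.0]

fitted, residuals = residuals_and_fits(x, y)
for f, r in zip(fitted, residuals):
    print(f"fitted = {f:6.2f}   residual = {r:6.2f}")
print(f"sum of residuals = {sum(residuals):.2e}")
```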


Confidence and prediction intervals

[Figure: Fitted Line Plot, chloro = - 1.69 + 28.81 N, showing the regression line with 95% CI and 95% PI bands. x-axis: N (0.5 to 3.5); y-axis: chloro (-40 to 140). S = 15.1864, R-Sq = 67.5%, R-Sq(adj) = 66.1%.]
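The two kinds of band in a fitted line plot come from the standard interval formulas for simple linear regression, sketched here in notation assumed for illustration: $\hat{y}_0$ is the fitted value at a chosen $x_0$, $s$ is the residual standard deviation (S in the output), and $t^{*}$ is the 97.5% point of the $t$ distribution on $n - 2$ degrees of freedom.

```latex
\text{95\% CI for the mean response:}\quad
\hat{y}_0 \pm t^{*}\, s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}

\text{95\% PI for a new observation:}\quad
\hat{y}_0 \pm t^{*}\, s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}}
```

The extra 1 under the square root is why the prediction interval is always wider than the confidence interval: it allows for the scatter of an individual observation about the line as well as the uncertainty in the line itself.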


A straight line model for the Nile

• Annual river flow from ~1870

• Straight line is a relatively poor fit, lots of variation.

[Figure: Fitted Line Plot, vol = 6132 - 2.714 year. x-axis: year (1860 to 1980); y-axis: volume (500 to 1400). S = 150.552, R-Sq = 21.7%, R-Sq(adj) = 20.9%.]


A straight line model for the Nile

• Relatively poor fit, lots of variation.

• Any pattern in the residuals?

[Figure: Versus Fits (response is volume). x-axis: Fitted Value (800 to 1050); y-axis: Standardized Residual (-4 to 3).]


A quadratic model for the Nile

• better fit, still lots of variation

• Gives a smooth change, not abrupt

• Any pattern in the residuals?

[Figure: Fitted Line Plot, C1 = 281394 - 289.4 C2 + 0.07465 C2**2. x-axis: time (1860 to 1980); y-axis: volume (500 to 1400). S = 140.392, R-Sq = 32.6%, R-Sq(adj) = 31.2%.]


a non-parametric model for the Nile

• a smooth function (LOESS) or non-parametric regression model

• OK?

• In later sessions, you will see some more flexible modelling tools


Regression examples?

• In practical3final.txt, some R commands to complete some analyses

• Example 1: Loch Lomond, plots and simple regression


what is a time series?

• a time series is a sequence of measurements made over time.

• notationally, this would commonly be written as $y_1, y_2, \ldots, y_i, \ldots, y_T$

• the index i denotes the position in the sequence of observations

• often we will assume that the data are equally spaced, so that i is truly an index, but in many environmental time series the observations are not equally spaced.


how to plot the data

a time series plot

• choice of the x-axis scale

– occasionally, each observation is indexed by its position in the sequence (OK if equally spaced)

– alternatively, we may use the actual timescale (e.g. years for an annual series, or days 1-365 for a daily series)

– or we may regard time on a continuous scale (time might be recorded in decimal form, e.g. 1986.5, which would be the end of June 1986); this is often the preferred form for statistical modelling (time is then a continuous variable)
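Converting calendar dates to decimal years is straightforward. The Python sketch below is one possible convention, assumed here for illustration: the fraction is the number of days elapsed since 1 January divided by the number of days in that year. The function name and sample dates are hypothetical.

```python
from datetime import date

def decimal_year(d):
    """Convert a calendar date to a decimal year (fraction = days elapsed / days in year)."""
    start = date(d.year, 1, 1)
    end = date(d.year + 1, 1, 1)
    return d.year + (d - start).days / (end - start).days

# Hypothetical, irregularly spaced sampling dates
sample_dates = [date(1986, 1, 1), date(1986, 7, 2), date(1988, 12, 31)]
decimal_times = [decimal_year(d) for d in sample_dates]

for d, t in zip(sample_dates, decimal_times):
    print(f"{d.isoformat()} -> {t:.3f}")
```

Because the result is a continuous variable, irregular spacing and gaps need no special treatment when it is used as the explanatory variable in a regression.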


How is biodiversity changing (EEA CSI 009)

• Populations of common and widespread farmland bird species in 2003 are only 71% of their 1980 levels.

• an annual indicator


How is biodiversity changing (kittiwakes) (JNCC DEFRA)

• the UK index of kittiwake abundance has declined rapidly since the early 1990s, such that by 2009 the index was just 50% of that in 1986, the lowest value in the 24 years of monitoring.

• Notice the uncertainty bands


Water quality- freshwater

• Concentrations of P generally decreased

• Nitrate concentrations decreasing

• What are the rates of change and are they significant?


Another example: monthly mean CO2 levels


[Figure: two panels, daily mean temperature and daily minima temperature. x-axis: days (01/01/1973 to 12/31/2000); y-axis: degree Celsius.]

Example: a time series plot (daily values)

the x-axis shows the actual date


Loch Leven (NERC-CEH)

[Figure: six time series panels plotted against Years (1970 to 2000): SRP (µg/l), TP (µg/l), Secchi (metres), Daphnia (individuals/l), Chlorophyll (µg/l), Water Temperature (°C).]


[Figure: SO2 monitored in AT02. x-axis: observation number (0 to 8000); y-axis: ug S/m3 (0 to 100).]

Example: air quality, monitored through time (from the EMEP programme)

Note the gaps and the rather extreme values; one strategy is to take logs.

These are daily data.


Data


Observed temperature anomalies in Europe.

• Change in different periods of the year may have different effects:

– the start of the growing season is determined by spring and autumn temperatures,

– changes in winter are important for species survival,

– note that the presentation shows winter and summer separately


Nitrate in the Clyde sea area in different seasons

[Figure: River Clyde, four panels, one per Season. x-axis: Depth (0 to 40); y-axis: NO3 concentration level (0 to 50).]


Loch Leven

[Figure: Temperature 1968 to 2001. x-axis: Years (0 to 30); y-axis: 5 to 15. Series: Winter Water Temp, Spring Water Temp, Summer Water Temp, Autumn Water Temp.]


Environmental time series data features

• patterns over time (both short and long term)

• often missing data, which may cause problems for statistical analysis

• variation, which may not be constant over time, so we may need to consider transformations (log)
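A small numerical sketch shows why a log transformation helps when the spread grows with the level of the series. The three groups of values below are hypothetical and built so that the variation is multiplicative; on the raw scale the standard deviation rises with the level, while on the log scale it is the same in each group.

```python
import math
import statistics

# Hypothetical values at three levels; within each group the values are
# half, equal to, and double a central level (multiplicative variation)
groups = {
    "low": [0.5, 1.0, 2.0],
    "middle": [5.0, 10.0, 20.0],
    "high": [50.0, 100.0, 200.0],
}

raw_sd = [statistics.pstdev(values) for values in groups.values()]
log_sd = [statistics.pstdev([math.log(v) for v in values]) for values in groups.values()]

for name, raw, logged in zip(groups, raw_sd, log_sd):
    print(f"{name:>6}: sd on raw scale = {raw:8.3f}   sd on log scale = {logged:.3f}")
```

Real series are rarely this tidy, so a plot of the transformed data is still needed to judge whether the variation has been stabilised.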


Seasonal patterns (cycles)

• in many environmental time series, we could imagine some periodicity (e.g. a monthly pattern in temperature)

• so it is common to produce a “seasonality plot”. The index (x-axis scale) depends on the period over which the cycle repeats itself (monthly, daily)

• We will need to include a term in any model to describe these features
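The numbers behind a seasonality plot can be obtained by grouping the observations by their position in the cycle. The Python sketch below averages hypothetical temperature records by calendar month, pooling across years; the function name and data are assumptions for illustration, and months with no observations are simply absent from the result.

```python
from collections import defaultdict

def monthly_means(months, values):
    """Average the values recorded in each calendar month, pooling over years."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for month, value in zip(months, values):
        totals[month] += value
        counts[month] += 1
    return {month: totals[month] / counts[month] for month in sorted(totals)}

# Hypothetical (month, temperature) records from two years; October is sampled only once
months = [1, 4, 7, 10, 1, 4, 7]
temps = [3.0, 8.0, 16.0, 10.0, 5.0, 9.0, 18.0]

profile = monthly_means(months, temps)
for month, mean_temp in profile.items():
    print(f"month {month:2d}: mean temperature = {mean_temp:.1f}")
```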


[Figure: six panels plotted against Month (1 to 12): Log SRP (µg/l), Log TP (µg/l), Log Chlorophyll (µg/l), Log Daphnia (individuals/l), Log Secchi (metres), Water Temperature (°C).]

Example: Loch Leven, monthly data. The data are plotted over the months of the year (Lowess smooth included).


what are the questions of interest?

• we want to know about trends, where a trend is defined to be:

– the long-term sweep of the data.

• we want to know about possible seasonality (or cycles)

– The seasonal component of a time series describes a regular fluctuation which has a period. (The period is the time interval between consecutive peaks or troughs.)


Regression examples?

• In practical3final.txt, some R commands to complete some analyses

• Qn 1b): Loch Lomond, plots and simple regression, with an investigation of seasonality

• Qn 2: dissolved oxygen in the Clyde, simple and multiple regression, with year, temperature and salinity as explanatory variables


a descriptive model

• A useful descriptive model for a time series consists of 3 components:

• X = Trend + Seasonal Component + Irregular Component, or X = T + S + I

• I is the irregular component, which is left over when the trend and seasonal components are all accounted for. It is an irregular or random fluctuation (like residuals in regression).
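One classical way to estimate the three components is sketched below in Python: a centred moving average over one full period estimates T, the average detrended value at each position in the cycle estimates S, and I is what remains. This is a minimal illustration on a synthetic, equally spaced series with no missing values; the function names and data are assumptions, and the series must span several full periods.

```python
def centred_moving_average(x, period):
    """Trend estimate: moving average spanning one full period, centred on each point."""
    n = len(x)
    half = period // 2
    trend = [None] * n                      # undefined near the two ends
    for t in range(half, n - half):
        window = x[t - half : t + half + 1]
        if period % 2 == 0:
            # Even period: the two end points occupy the same position in the
            # cycle, so each is given half weight.
            total = sum(window) - 0.5 * (window[0] + window[-1])
        else:
            total = sum(window)
        trend[t] = total / period
    return trend

def decompose(x, period):
    """Additive decomposition X = T + S + I; returns (trend, seasonal, irregular)."""
    n = len(x)
    trend = centred_moving_average(x, period)
    sums = [0.0] * period
    counts = [0] * period
    for t in range(n):
        if trend[t] is not None:
            sums[t % period] += x[t] - trend[t]
            counts[t % period] += 1
    effects = [s / c for s, c in zip(sums, counts)]
    centre = sum(effects) / period
    effects = [e - centre for e in effects]  # seasonal effects sum to zero over a cycle
    seasonal = [effects[t % period] for t in range(n)]
    irregular = [x[t] - trend[t] - seasonal[t] if trend[t] is not None else None
                 for t in range(n)]
    return trend, seasonal, irregular

# Synthetic quarterly series: linear trend plus a fixed seasonal pattern, no noise
pattern = [2.0, -1.0, -3.0, 2.0]
series = [10.0 + 0.5 * t + pattern[t % 4] for t in range(24)]

trend, seasonal, irregular = decompose(series, period=4)
print("seasonal effects:", [round(s, 3) for s in seasonal[:4]])
```

Because the synthetic series contains no noise, the irregular component here is zero wherever the trend is defined; with real data it plays the role of the residuals.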


smoothing a time series

• In many time series, the seasonal variation can be so strong that it obscures any trend or cyclical component. However, for understanding the process being observed (and forecasting future values of the series), trends and cycles are of prime importance. Smoothing is a process designed to remove seasonality so that the long-term movements in a time series can be seen more clearly


[Figure: two panels of ln(ug S/m3) against days (0 to 8030): a) smoothing of the logarithm of SO2, bandwidth = 30; b) smoothing of the logarithm of SO2, bandwidth = 800.]

Example: smoothing with different bandwidths applied to air quality data (that have been logged)
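The effect of the bandwidth can be reproduced with a simple kernel smoother. The Python sketch below uses a Nadaraya-Watson estimator with a Gaussian kernel, which is an assumed stand-in since the slides do not say which smoother was used or how its bandwidth is defined. The series is synthetic: an annual cycle plus a slow decline, sampled every 10 days.

```python
import math

def kernel_smooth(t, y, bandwidth):
    """Nadaraya-Watson smoother: Gaussian-weighted local average at each time point."""
    smooth = []
    for t0 in t:
        weights = [math.exp(-0.5 * ((ti - t0) / bandwidth) ** 2) for ti in t]
        smooth.append(sum(w * yi for w, yi in zip(weights, y)) / sum(weights))
    return smooth

def roughness(values):
    """Sum of squared second differences: small for a nearly straight curve."""
    return sum((values[i + 1] - 2 * values[i] + values[i - 1]) ** 2
               for i in range(1, len(values) - 1))

# Synthetic series: annual cycle plus a slow downward trend (illustration only)
days = list(range(0, 3650, 10))
series = [math.sin(2 * math.pi * d / 365) - 0.0002 * d for d in days]

light = kernel_smooth(days, series, bandwidth=30)    # follows the seasonal cycle
heavy = kernel_smooth(days, series, bandwidth=800)   # averages the cycle away

print(f"roughness with bandwidth 30:  {roughness(light):.5f}")
print(f"roughness with bandwidth 800: {roughness(heavy):.5f}")
```

A small bandwidth keeps the seasonal pattern visible, while a bandwidth spanning more than a full cycle removes it and leaves the long-term movement.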


Example : water quality in the River Clyde

• A very complex regression model is of the form

– $y_i = \beta_0(x_i) + \beta_1(x_i)\cos(2\pi x_i - \theta(x_i)) + \varepsilon_i, \quad i = 1, \ldots, n$

– This includes a mean trend term and seasonal variation, where $x_i$ is the year in decimal form

– It includes smooth terms $\beta_0$ and $\beta_1$ and a varying-coefficient seasonal term (modelled parametrically) using cosines

– This can be simplified by setting some parameters to be constant
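One such simplification holds the mean level, the seasonal amplitude and the phase constant. Writing these constants as $b_0$, $A$ and $\theta$ (names assumed here), the identity $A\cos(2\pi x - \theta) = a\cos(2\pi x) + b\sin(2\pi x)$, with $a = A\cos\theta$ and $b = A\sin\theta$, turns the seasonal term into an ordinary linear regression on a cosine and a sine. The Python sketch below fits that regression by least squares to synthetic monthly data, not the Clyde data, and recovers the amplitude and phase.

```python
import math

def solve(a, b):
    """Solve the linear system a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        partial = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - partial) / m[r][r]
    return coef

def fit_harmonic(x, y):
    """Least-squares fit of y = b0 + a*cos(2*pi*x) + b*sin(2*pi*x); x is in decimal years."""
    design = [[1.0, math.cos(2 * math.pi * xi), math.sin(2 * math.pi * xi)] for xi in x]
    p = 3
    xtx = [[sum(row[j] * row[k] for row in design) for k in range(p)] for j in range(p)]
    xty = [sum(row[j] * yi for row, yi in zip(design, y)) for j in range(p)]
    b0, a, b = solve(xtx, xty)               # normal equations
    amplitude = math.hypot(a, b)             # A = sqrt(a^2 + b^2)
    phase = math.atan2(b, a)                 # theta, in radians
    return b0, amplitude, phase

# Synthetic monthly data over 10 years: mean 5, amplitude 2, phase 1 radian, no noise
x = [2000 + k / 12 for k in range(120)]
y = [5.0 + 2.0 * math.cos(2 * math.pi * xi - 1.0) for xi in x]

mean_level, amplitude, phase = fit_harmonic(x, y)
print(f"mean = {mean_level:.3f}, amplitude = {amplitude:.3f}, phase = {phase:.3f}")
```

Letting the mean level, amplitude or phase vary smoothly with $x$ instead gives back the more flexible form of the model.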


Seasonality: River Clyde


Example: Loch Leven, trends correcting for covariates

• Loch Leven is a key loch for the Water Framework Directive; the environmental effect of interest is eutrophication.

• The measurement series covers 30 years and includes a variety of biological, chemical and hydrological indicators, but it is irregular in time.

• Substantial improvement in the loch water quality.


[Figure: Loch Leven, six panels plotted against Years (1970 to 2000): Log SRP (µg/l), Log TP (µg/l), Log Secchi (metres), Log Daphnia (individuals/l), Log Chlorophyll (µg/l), Water Temperature (°C).]


other examples to try

• Qn 3 in practical3final.txt

• Qn 3 asks whether DO is different before and after an upgrade to the Shieldhall sewage works. To do this in a regression framework we need to introduce a FACTOR (a variable that takes only two values, identifying before and after 1985).
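The idea of a two-level factor can be sketched in Python (the practical itself uses R). The data below are hypothetical dissolved oxygen values, not the Clyde measurements, and coding years after 1985 as 1 is an assumed convention. Regressing the response on the 0/1 indicator gives an intercept equal to the "before" mean and a slope equal to the difference between the "after" and "before" means.

```python
def fit_line(x, y):
    """Least-squares fit of y = intercept + slope * x; returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - slope * x_bar, slope

# Hypothetical annual dissolved oxygen values either side of 1985
years = [1980, 1981, 1982, 1983, 1984, 1986, 1987, 1988, 1989, 1990]
do_values = [4.1, 3.8, 4.4, 4.0, 4.2, 5.9, 6.3, 6.0, 6.4, 6.1]

# The factor: 0 = before the upgrade, 1 = after (assumed coding)
after = [1 if year > 1985 else 0 for year in years]

intercept, effect = fit_line(after, do_values)

before_vals = [v for v, a in zip(do_values, after) if a == 0]
after_vals = [v for v, a in zip(do_values, after) if a == 1]
mean_before = sum(before_vals) / len(before_vals)
mean_after = sum(after_vals) / len(after_vals)

print(f"intercept (mean before) = {intercept:.2f}")
print(f"factor effect (after minus before) = {effect:.2f}")
```

The advantage of the regression form is that other explanatory variables, such as year or temperature, can be added alongside the factor.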


When time is the explanatory variable

• in many situations, we expect successive observations to be correlated (most likely more strongly the closer the time points are); the strength of dependence usually depends on the time separation, or lag

• for regularly spaced data, we typically make use of the autocorrelation function (ACF) to assess how strong this correlation is

• We have not considered this in the earlier examples but.....
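For a regularly spaced series, the sample ACF at lag k is the sum of products of deviations from the mean k steps apart, divided by the total sum of squared deviations. The Python sketch below computes it from that definition; the function name and the short series are hypothetical.

```python
def acf(y, max_lag):
    """Sample autocorrelation at lags 0..max_lag for an equally spaced series."""
    n = len(y)
    y_bar = sum(y) / n
    dev = [value - y_bar for value in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

# Hypothetical annual series with a slow drift, so neighbouring values are similar
series = [5.1, 5.3, 5.2, 5.6, 5.8, 5.7, 6.0, 6.3, 6.1, 6.4]

for lag, r in enumerate(acf(series, max_lag=3)):
    print(f"lag {lag}: autocorrelation = {r:.3f}")
```

Applied to the residuals from a regression on time, a sample ACF that is clearly non-zero at short lags suggests the independence assumption is doubtful, in which case the usual standard errors and p-values may be misleading.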