time series modelling and statistical trends

Time series modelling and statistical trends Marian Scott and Adrian Bowman SEPA, July 2012

Page 1: Time series modelling and statistical trends

Time series modelling and statistical trends

Marian Scott and Adrian Bowman

SEPA, July 2012

Page 2: Time series modelling and statistical trends

smoothing a time series

• In many time series, the seasonal variation can be so strong that it obscures any trend or cyclical component.

• For understanding the process being observed (and for forecasting future values of the series), trends and cycles are of prime importance.

• Smoothing is a process designed to remove seasonality so that the long-term movements in a time series can be seen more clearly.

Page 3: Time series modelling and statistical trends

smoothing a time series

• one of the most commonly used smoothing techniques is the moving average.

• difficult choice: the window over which to smooth

• smooth series: $\tilde{Y}_i = \sum_k w_k Y_{i+k}$, where the sum runs over the window and the weights $w_k$ sum to 1

• other (more modern) smoothing methods in common use include Lowess

Page 4: Time series modelling and statistical trends

smoothing a time series

• We have data $X_1, \dots, X_N$, where $X_t$ = number of bus passengers on the t'th day. Since the periodic variation is repeated every 7 days, a 7-period moving average ($M_t$) is used to smooth the series, where:

• $M_t = \frac{1}{7}\{X_{t-3} + X_{t-2} + \dots + X_{t+3}\}$

• This averages out the seasonality, since each $M_t$ is an average over 7 different 'seasons' (days of the week). Note, though, that $M_t$ is only defined for t = 4, 5, ..., N-3.

• other smoothing methods (more modern) commonly used include Lowess
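The 7-period moving average can be sketched in a few lines of Python. This is a minimal illustration, not code from the course: the function name and the passenger counts are made up, and positions where the full window does not fit (the first and last three days) are returned as None.

```python
def centred_moving_average(x, window=7):
    """Centred moving average of x; None where the full window does not fit."""
    if window % 2 == 0:
        raise ValueError("window must be odd for a centred average")
    half = window // 2
    smoothed = [None] * len(x)
    for t in range(half, len(x) - half):
        # average of the observations from t-half to t+half inclusive
        smoothed[t] = sum(x[t - half:t + half + 1]) / window
    return smoothed

# Hypothetical daily passenger counts: a weekly pattern plus a slow upward drift.
weekly_pattern = [120, 115, 118, 122, 130, 80, 60]
passengers = [weekly_pattern[day % 7] + 0.5 * day for day in range(28)]

m = centred_moving_average(passengers, window=7)
```

Because every window covers each day of the week exactly once, the weekly pattern cancels and the smoothed values rise steadily with the underlying drift.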

Page 5: Time series modelling and statistical trends

smoothing a time series

• one of the most commonly used smoothing techniques is the moving average.

• smooth series: $\tilde{Y}_i = \sum_k w_k Y_{i+k}$

• 3-point, 5-point, 7-point moving average example

• window may be chosen to reflect the periodicity of the data series

• other smoothing methods (more modern) commonly used include Lowess

Page 6: Time series modelling and statistical trends

smoothing a time series

• LOESS (or LOWESS) is a method also known as locally weighted polynomial regression. At each point in the data set a low-degree polynomial is fitted to a subset of the data, using explanatory variable values near the point whose response is being estimated. The polynomial is fitted by weighted least squares, giving more weight to points near the point whose response is being estimated and less weight to points further away.

• Many of the details of this method, such as the degree of the polynomial model and the weights, are flexible.
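As a rough illustration of the idea, the sketch below fits a weighted straight line around each point. It is a simplified stand-in and not the full LOWESS algorithm: the degree (1), the tricube weight function, the nearest-neighbour bandwidth and the name local_linear_smooth are all choices made here for illustration, and there are no robustness iterations.

```python
def local_linear_smooth(x, y, frac=0.5):
    """Locally weighted straight-line fit at each x value (tricube weights)."""
    n = len(x)
    k = min(n, max(3, int(round(frac * n))))  # points used in each local fit
    fitted = []
    for x0 in x:
        # bandwidth: distance from x0 to its k-th nearest point
        dists = sorted(abs(xj - x0) for xj in x)
        h = dists[k - 1] or 1.0
        # tricube weights: close to 1 near x0, falling to 0 at distance h
        w = [(1 - min(abs(xj - x0) / h, 1.0) ** 3) ** 3 for xj in x]
        # weighted least squares for a straight line
        sw = sum(w)
        mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
        my = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
        slope = sxy / sxx if sxx > 0 else 0.0
        fitted.append(my + slope * (x0 - mx))
    return fitted

# Hypothetical series: a gentle trend with an alternating wiggle on top.
x = list(range(40))
y = [0.1 * t + (1 if t % 2 else -1) for t in x]
smooth = local_linear_smooth(x, y, frac=0.5)
```

A larger frac uses more of the data in each local fit and gives a smoother curve; a smaller frac follows the data more closely.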

Page 7: Time series modelling and statistical trends

water surface temperature from Jan 1981 to Feb 1992 (Piegorsch), with lowess curve

[Figure: Time Series Plot of temp. x-axis: date (20/01/1981 to 19/12/1991); y-axis: temp (0 to 30)]

Page 8: Time series modelling and statistical trends

Example: different smoothing technique applied to air quality data (that have been logged)

[Figure, two panels: a) smoothing of the logarithm of SO2, bandwidth = 30; b) smoothing of the logarithm of SO2, bandwidth = 800. x-axis: days (0 to 8030); y-axis: ln(ug S/m3) (-4 to 5)]

Page 9: Time series modelling and statistical trends

harmonic regression

• another way of a) describing and b) hence being able to remove the periodic component is to use what is called harmonic regression

• remember sin and cos from school?

• This allows us to capture the regular repeat pattern in each year – the seasonal effect

Page 10: Time series modelling and statistical trends

harmonic regression

• build a regression model using the sine function. $\sin(\theta)$ lies between -1 and +1, where $\theta$ is measured in radians.

• for a periodic time series $Y_i$ we can build a regression model

• $Y_i = \beta_0 + \gamma \sin(2\pi[t_i - \phi]/p) + \varepsilon_i$

• to make this simpler, if we assume that the period p is known, this can be written as a simple multiple linear regression model

Page 11: Time series modelling and statistical trends

harmonic regression

• for a periodic time series Yi we can build a regression model

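• $Y_i = \beta_0 + \gamma \sin(2\pi[t_i - \phi]/p) + \varepsilon_i$

• to make this simpler,

• $Y_i = \beta_0 + \beta_1 c_i + \beta_2 s_i + \varepsilon_i$

• where $c_i = \cos(2\pi t_i/p)$ and $s_i = \sin(2\pi t_i/p)$

• So this is a multiple linear regression model with explanatory variables $c_i$ and $s_i$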
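To make the linear form concrete, here is a minimal least-squares sketch in plain Python. It assumes the period is known, uses made-up data, and solves the normal equations directly; the function names are illustrative and not from the course material. The amplitude of the fitted wave is recovered as the square root of the sum of the squared cosine and sine coefficients.

```python
import math

def solve_linear_system(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        z[r] = (m[r][n] - sum(m[r][c] * z[c] for c in range(r + 1, n))) / m[r][r]
    return z

def fit_harmonic(t, y, period):
    """Least-squares (b0, b1, b2) in y = b0 + b1*cos(2*pi*t/p) + b2*sin(2*pi*t/p)."""
    rows = [[1.0,
             math.cos(2 * math.pi * ti / period),
             math.sin(2 * math.pi * ti / period)] for ti in t]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)

# Hypothetical monthly series: mean 10, amplitude 3, period 12, shifted by 2 months.
t = list(range(48))
y = [10 + 3 * math.sin(2 * math.pi * (ti - 2) / 12) for ti in t]

b0, b1, b2 = fit_harmonic(t, y, period=12)
amplitude = math.hypot(b1, b2)  # size of the seasonal swing
```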

Page 12: Time series modelling and statistical trends

[Figure: ln(SO2) in GB02 against fine grid, Model 2. x-axis: weeks (1980 to 2000); y-axis: ln(ug S/m3) (-2 to 3)]

Example: red curve shows the harmonic pattern (superimposed on a declining trend).

Page 13: Time series modelling and statistical trends

Example to try

• Qn 4 in practical3final.txt

• The script shows how we can create the new explanatory variables; doy is a new variable that records where in the year (which day out of 366) the sample was taken.
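The practical script itself is not reproduced here. As a hedged Python sketch of the same idea, the day of year can be taken from a date and turned into the cosine and sine explanatory variables; the function name and the choice of 366 as the period (following the text above) are assumptions for illustration.

```python
import math
from datetime import date

def harmonic_covariates(sample_date, period=366):
    """Return (doy, cos term, sin term) for a sampling date."""
    doy = sample_date.timetuple().tm_yday  # day of year, 1 to 366
    angle = 2 * math.pi * doy / period
    return doy, math.cos(angle), math.sin(angle)

# Hypothetical sampling date in a leap year, halfway through the 366-day cycle.
doy, c, s = harmonic_covariates(date(2012, 7, 1))
```

The two returned terms would then enter a regression as ordinary explanatory variables alongside any trend term.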

Page 14: Time series modelling and statistical trends

seasonal indices and de-seasonalisation

• The reason for giving the seasonally-adjusted data is to make trends and cycles more apparent.

• seasonal adjustment is best explained in steps:

– step 1: define $Y_t = X_t - M_t$ (actual minus smoothed)

– step 2: average all the $Y_t$ values for each 'season' to give the seasonal index S for that season (e.g. for quarterly data there would be 4 values)

– step 3: the seasonally adjusted data are $X_t - S$, using the index S for the season in which t falls
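The three steps can be sketched as follows. This is an illustrative version only: it assumes an odd period so that a simple centred moving average can be used, the series must be long enough for every season to appear among the smoothed values, and the data and function name are invented.

```python
def seasonal_adjust(x, period=7):
    """Return (seasonal indices, seasonally adjusted series) for an odd period."""
    if period % 2 == 0:
        raise ValueError("this sketch assumes an odd period")
    half = period // 2
    n = len(x)
    # step 1: actual minus smoothed, wherever the moving average is defined
    detrended = {}
    for t in range(half, n - half):
        m_t = sum(x[t - half:t + half + 1]) / period
        detrended[t] = x[t] - m_t
    # step 2: average the detrended values belonging to each season
    indices = []
    for season in range(period):
        values = [y for t, y in detrended.items() if t % period == season]
        indices.append(sum(values) / len(values))
    # step 3: subtract the matching seasonal index from every observation
    adjusted = [x[t] - indices[t % period] for t in range(n)]
    return indices, adjusted

# Hypothetical daily counts: a weekly pattern plus a slow upward drift.
weekly_pattern = [120, 115, 118, 122, 130, 80, 60]
counts = [weekly_pattern[day % 7] + 0.5 * day for day in range(28)]

indices, adjusted = seasonal_adjust(counts, period=7)
```

In this made-up example the adjusted series is left with only the steady drift, which is what makes the trend easier to see.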

Page 15: Time series modelling and statistical trends

correlation through time

• in many situations, we expect observations at adjacent time points to be correlated; the strength of dependence usually depends on the time separation or lag (most likely stronger the closer the time points are)

• for regularly spaced data, we typically make use of the autocorrelation function (ACF)

• Data are NOT independent

Page 16: Time series modelling and statistical trends

correlation through time

• for regularly spaced time series, with no missing data, we define the sample mean in the usual way

• then the sample autocorrelation coefficient at lag k (k ≥ 0), r(k), is the correlation between the original series and a version shifted back k time units

• on a plot of the ACF, horizontal lines show approximate 95% confidence intervals for individual coefficients.
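A minimal sketch of the sample ACF is given below, using the usual estimator in which the lag-k cross-products of deviations from the mean are divided by the total sum of squared deviations. The limits of plus or minus 1.96 divided by the square root of n are a common rule of thumb for a series with no autocorrelation; whether the plots in these slides use exactly that rule is an assumption. The data are made up.

```python
import math

def sample_acf(y, max_lag):
    """Sample autocorrelation coefficients r(0), ..., r(max_lag)."""
    n = len(y)
    mean = sum(y) / n
    dev = [v - mean for v in y]
    denom = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / denom
            for k in range(max_lag + 1)]

# Hypothetical monthly series with a 12-month cycle and no trend.
y = [10 + 5 * math.sin(2 * math.pi * t / 12) for t in range(120)]

r = sample_acf(y, max_lag=24)
limit = 1.96 / math.sqrt(len(y))  # rule-of-thumb 95% limits, plus and minus
```

For a strongly seasonal series like this one, the coefficients are large and positive at lags 12 and 24 and large and negative at lags 6 and 18, which is the cyclical pattern described in the slides.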

Page 17: Time series modelling and statistical trends

Example: ACF of raw water temperature data

[Figure: Autocorrelation Function for temp (with 5% significance limits for the autocorrelations). x-axis: Lag (1 to 60); y-axis: Autocorrelation (-1.0 to 1.0)]

Page 18: Time series modelling and statistical trends

correlation through time

• ACF shows a very marked cyclical pattern

• interpretation of the ACF

– we need to have removed both trend and seasonality

– we hope (for simplicity in subsequent modelling) that only a few correlation coefficients (at small lags) will be significant.

• ACF is an important diagnostic tool for time series modelling (formal models called ARIMA).

• how should we remove the seasonal pattern or the trend?

Page 19: Time series modelling and statistical trends

differencing

• a common way of removing a simple trend (e.g. linear) is by differencing

• define a new series

• $Z_t = Y_t - Y_{t-1}$

• a common way of removing seasonality (if we know the period to be p) is to take pth differences

• $Z_t = Y_t - Y_{t-p}$
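Both kinds of differencing come down to one subtraction, as this small sketch shows. The series are invented for illustration: one is a pure straight line, the other a straight line plus a repeating 12-point pattern.

```python
def difference(y, lag=1):
    """Return the lagged differences y[t] - y[t - lag]."""
    return [y[t] - y[t - lag] for t in range(lag, len(y))]

# A pure linear trend: first differences are (approximately) the constant slope 0.1.
trend_only = [3.0 + 0.1 * t for t in range(10)]
z_trend = difference(trend_only)

# Trend plus a period-12 pattern: 12th differences remove the pattern,
# leaving (approximately) 12 * 0.1 = 1.2 at every time point.
pattern = [0, 2, 5, 9, 12, 15, 17, 16, 13, 9, 4, 1]
y = [0.1 * t + pattern[t % 12] for t in range(48)]
z_seasonal = difference(y, lag=12)
```

Note that the differenced series is shorter than the original by the lag used.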

Page 20: Time series modelling and statistical trends

Example: ACF of water temperature data

[Figure: Autocorrelation Function for monthlymean (with 5% significance limits for the autocorrelations). x-axis: Lag (1 to 30); y-axis: Autocorrelation (-1.0 to 1.0)]

Page 21: Time series modelling and statistical trends

Example 1: ACF of water temperature data- difference order 12

[Figure: Autocorrelation Function for 12 difference of monthly mean (with 5% significance limits for the autocorrelations). x-axis: Lag (2 to 30); y-axis: Autocorrelation (-1.0 to 1.0)]

Page 22: Time series modelling and statistical trends

Examples to try

• In practical4.txt

• Exercises 1 and 2

• Why is correlation important?

• How good is the ACF as a diagnostic?

• Exercise 2 shows the output from a single command, stl (which gives a decomposition of the data series into trend, seasonal component and residual)

Page 23: Time series modelling and statistical trends

simple algorithm

• obtain a rough estimate of the trend (by smoothing, but with a smoother that is not affected by seasonality)

• subtract the estimated trend

• estimate the seasonal cycle from the detrended series

• what is left is the irregular component

• good alternative: STL (seasonal-trend decomposition using loess)

Page 24: Time series modelling and statistical trends

An example for you to try

• Exercise 3, Central England temperature

– obtain the acf

– use the stl() command

– look at monthly data

Page 25: Time series modelling and statistical trends

A different type of change

• Change can be

• Abrupt

• As a result of an intervention

• So we might like to consider a slightly different form of model

Page 26: Time series modelling and statistical trends

Nile flow

• relatively poor fit of straight line model, lots of variation.

• some pattern in the residuals

[Figure: Fitted Line Plot, vol = 6132 - 2.714 year (S = 150.552, R-Sq = 21.7%, R-Sq(adj) = 20.9%). x-axis: year (1860 to 1980); y-axis: volume (500 to 1400)]

Page 27: Time series modelling and statistical trends

A straight line model for the Nile

• relatively poor fit, lot of variation.

• any pattern in the residuals?

• this residual plot is against order of the observations

[Figure: Standardized residuals versus observation order (response is C1). x-axis: Observation Order (1 to 100); y-axis: Standardized Residual (-4 to 3)]

Page 28: Time series modelling and statistical trends

a non-parametric model for the Nile

• a smooth function (LOESS) or non-parametric regression model

• Seems OK?

• any suggestion that there may be a change-point?

Page 29: Time series modelling and statistical trends

A different type of change

• So we might like to consider a slightly different form of model: the river Nile was dammed in the late 1800s

• So there may be a changepoint (a shift in the mean flow level) and, if so, can we see it?

Page 30: Time series modelling and statistical trends

the smooths

• Two smooth curves are fitted and we identify the biggest discrepancy between them

• adding confidence bands helps identify the location of the change

• Where is the biggest discrepancy?

Page 31: Time series modelling and statistical trends

An alternative model for the Nile

• two smooth sections, broken at roughly 1900.

• different mean levels in the two periods

• so modelling the two periods separately
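A much simpler stand-in for the idea of two periods with different mean levels is to try every possible break and keep the one that minimises the combined within-period sum of squares. This is not the smooth-curve comparison with confidence bands described in the slides; it is a piecewise-constant sketch with made-up data (not the Nile flows), and the function name and the min_segment parameter are illustrative.

```python
def best_mean_shift(y, min_segment=5):
    """Return (break index, mean before, mean after) for the best single break."""
    if len(y) < 2 * min_segment:
        raise ValueError("series too short for the requested segment length")

    def sse(values):
        m = sum(values) / len(values)
        return sum((v - m) ** 2 for v in values)

    best_total, best_k = None, None
    for k in range(min_segment, len(y) - min_segment + 1):
        total = sse(y[:k]) + sse(y[k:])  # fit a separate mean to each period
        if best_total is None or total < best_total:
            best_total, best_k = total, k
    before = sum(y[:best_k]) / best_k
    after = sum(y[best_k:]) / (len(y) - best_k)
    return best_k, before, after

# Hypothetical annual flows: the level drops from about 1100 to about 850
# after the 30th value, with a small deterministic wiggle around each level.
flows = ([1100 + 20 * ((i * 7) % 5 - 2) for i in range(30)]
         + [850 + 20 * ((i * 7) % 5 - 2) for i in range(40)])

k, before, after = best_mean_shift(flows)
```

Here k is the position of the first observation in the second period, so the estimated change lies between observations k-1 and k (counting from zero).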

Page 32: Time series modelling and statistical trends

The moral of this example

• Trends can be challenging to identify

• Modelling needs to be flexible

• We need to be mindful of the assumptions

Page 33: Time series modelling and statistical trends

An example

• Haddock: this is an example about fish stocks, where we can try fitting some very simple time series regression models.

• We might want to predict what fish stocks might be several years in the future