introduction to regression - trinity college dublin 3/lecture 3.pdf · introduction to regression...


Introduction to Regression

• Using Mult Lin Regression

– Derived variables

– Many alternative models

• Which model to choose?

– Model Criticism

• Modelling Objective

• Model Details

• Data and Residuals

• Assumptions

20/01/2015 Cert in Statistics; Intro to Regression Week 3


Data Like This

• Values of coefficients

• Sampling Distributions

• Standard Errors

• 95% Confidence Intervals

• 95% Prediction Intervals

• ANOVA etc


Derived variables

• General

– Logs

– Proportions and Ratios

– Indicator variables – categorical data

• Time series applications

– Indicator variables – eg seasonal effects

– Lagged variables

– Differences

– Logs and Rate of Return

Too many (derived) variables

– Redundancy

– Many versions of same model

Gas Consumption vs Temp

[Figure: two fitted line plots of Gas against Temperature, one per period]

Period 1: Gas = 6.854 - 0.3932 Temperature
S = 0.281334, R-Sq = 94.4%, R-Sq(adj) = 94.1%

Period 2: Gas = 4.724 - 0.2779 Temperature
S = 0.354848, R-Sq = 81.3%, R-Sq(adj) = 80.6%

Weekly gas consumption (in 1000 cubic feet) and the average outside temperature (in degrees Celsius) at one house in south-east England for two "heating seasons", one of 26 weeks before, and one of 30 weeks after cavity-wall insulation was installed. The object of the exercise was to assess the effect of the insulation on gas consumption. The house thermostat was set at 20°C throughout.

Comparative


Objective

• Nominal focus on prediction

– Predict gas consumption in future for this house

– Knowing temp and whether or not insulated

• Actual interest

– Does insulation make a difference?

• At all temps?

• How much?

– Slope? Intercept?

– SEs? Data Like This


Using an Indicator variable

One stacked data set

Week  Insulation  Temperature  Gas
22    0            7.6         3.5
23    0            8.0         4.0
24    0            8.5         3.6
25    0            9.1         3.1
26    0           10.2         2.6
27    1           -0.7         4.8
28    1            0.8         4.6
29    1            1.0         4.7
etc

Mr Derek Whiteside of the UK Building Research Station recorded the weekly gas consumption (in 1000 cubic feet) and the average outside temperature (in degrees Celsius) at his own house in south-east England for two "heating seasons", one of 26 weeks before, and one of 30 weeks after cavity-wall insulation was installed. The object of the exercise was to assess the effect of the insulation on gas consumption. The house thermostat was set at 20°C throughout.

Two parallel data sets

Insulated  Week  Temperature  Gas     Insulated  Week  Temperature  Gas
0          1     -0.8         7.2     1          27    -0.7         4.8
0          2     -0.7         6.9     1          28     0.8         4.6
0          3      0.4         6.4     1          29     1.0         4.7
0          4      2.5         6.0     1          30     1.4         4.0
etc                                   etc
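A minimal sketch of how two parallel data sets can be turned into one stacked data set with a 0/1 indicator column. The rows are the eight shown in the parallel table on this slide; the function name `stack` is purely illustrative and not part of the lecture.

```python
# Rows copied from the "two parallel data sets" excerpt on this slide:
# (week, temperature in deg C, gas in 1000 cubic feet).
before = [(1, -0.8, 7.2), (2, -0.7, 6.9), (3, 0.4, 6.4), (4, 2.5, 6.0)]
after = [(27, -0.7, 4.8), (28, 0.8, 4.6), (29, 1.0, 4.7), (30, 1.4, 4.0)]


def stack(before_rows, after_rows):
    """Return one list of (insulated, week, temperature, gas) rows."""
    stacked = [(0, *row) for row in before_rows]   # indicator 0: not insulated
    stacked += [(1, *row) for row in after_rows]   # indicator 1: insulated
    return stacked


stacked = stack(before, after)
for row in stacked:
    print(row)
```

The indicator column is what lets a single regression use both periods at once.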


Simple Regression & Indicator Variable

Gas vs Insulated

• Insulated = 0

– Avg Gas = 4.750

• Insulated = 1

– Avg Gas = 3.483

• Diff = -1.267

Temp vs Insulated

Coeff: Unit Increase

Random Error

Design Implications

[Figure: fitted line plot of Gas against Insulated]
Gas = 4.750 - 1.267 Insulated
S = 0.987577, R-Sq = 29.8%, R-Sq(adj) = 28.5%

[Figure: fitted line plot of Temperature against Insulated]
Temperature = 5.350 - 0.8867 Insulated
S = 2.73812, R-Sq = 2.6%, R-Sq(adj) = 0.8%

SLR with indicator var & T-test

Two-sample T for Gas

Insulated  N   Mean   StDev  SE Mean
0          26  4.75   1.16   0.23
1          30  3.483  0.806  0.15

Difference = μ (0) - μ (1)

T-Value = 4.79  P-Value = 0.000  DF = 54

Using Pooled StDev = 0.9876

[Figure: fitted line plot of Gas against Insulated, repeated from the previous slide]
Gas = 4.750 - 1.267 Insulated
S = 0.987577, R-Sq = 29.8%, R-Sq(adj) = 28.5%

Regression Analysis: Gas versus Insulated

S         R-sq    R-sq(adj)  R-sq(pred)
0.987577  29.79%  28.49%     24.35%

Coefficients

Term       Coef    SE Coef  T-Value  P-Value
Constant    4.750  0.194    24.53    0.000
Insulated  -1.267  0.265    -4.79    0.000
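The two outputs on this slide describe the same calculation. The sketch below recomputes the pooled two-sample quantities from the summary statistics printed above and compares them with the regression row for Insulated. Because the inputs are the rounded means and standard deviations from the slide, the results should agree with the printed values only up to rounding.

```python
import math

# Summary statistics from the two-sample T output above.
n0, mean0, sd0 = 26, 4.75, 1.16     # Insulated = 0
n1, mean1, sd1 = 30, 3.483, 0.806   # Insulated = 1

# Pooled standard deviation: the two-sample counterpart of regression S.
df = n0 + n1 - 2
pooled_sd = math.sqrt(((n0 - 1) * sd0**2 + (n1 - 1) * sd1**2) / df)

# The slope on a 0/1 indicator is the difference in group means ...
slope = mean1 - mean0
# ... and its standard error is the pooled two-sample standard error.
se_slope = pooled_sd * math.sqrt(1 / n0 + 1 / n1)
t_value = slope / se_slope

print(f"df = {df}, pooled SD = {pooled_sd:.3f}")
print(f"slope = {slope:.3f}, SE = {se_slope:.3f}, t = {t_value:.2f}")
```

The sign of t differs from the T-test output only because the T-test reports μ(0) - μ(1) while the regression coefficient is μ(1) - μ(0).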


Indicator Variables in Regression

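Response variable: $Y$ = Gas

Predictors: $x_1$ = Temp, $x_2$ = Insulated (0/1)

Statistical Model

\[ Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon; \qquad \varepsilon \sim N(0, \sigma^2) \]

When $x_2 = 0$: \[ Y = \beta_0 + \beta_1 x_1 + \varepsilon \]

When $x_2 = 1$: \[ Y = (\beta_0 + \beta_2) + \beta_1 x_1 + \varepsilon \]

$\beta_1$: Common Slopes

$\beta_2$: Diff bet Int'cpts

No interaction

Binary Indicator Variable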


Multiple Regression Output

Regression Analysis: Gas versus Temperature, Insulated

The regression equation is
Gas = 6.55 - 0.337 Temperature - 1.57 Insulated

Predictor    Coef     SE Coef
Constant      6.5513  0.1181
Temperature  -0.3367  0.0177
Insulated    -1.5652  0.0970

\[ \hat{\beta}_2 = -1.565, \qquad SE(\hat{\beta}_2) = 0.097 \]

Rough 95% CI: \[ -1.57 \pm 2(0.097) = (-1.76, -1.37) \]

Prev: Mean Diff \[ -1.27 \pm 2(0.274) \]

Parallel lines
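The rough interval is just "estimate plus or minus two standard errors". A small sketch of that arithmetic, using the Insulated coefficient and standard error from the output above; `rough_ci` is an illustrative helper name, not a Minitab function.

```python
def rough_ci(estimate, se, multiplier=2.0):
    """Rough 95% interval: estimate +/- 2 standard errors."""
    half_width = multiplier * se
    return estimate - half_width, estimate + half_width


# Insulated coefficient from the multiple regression output.
lo, hi = rough_ci(-1.5652, 0.0970)
print(f"Insulated effect: ({lo:.2f}, {hi:.2f})")

# The earlier simple mean difference, for comparison of interval widths.
prev_lo, prev_hi = rough_ci(-1.27, 0.274)
print(f"Previous mean diff: ({prev_lo:.2f}, {prev_hi:.2f})")
```

Once Temperature is in the model, the interval for the insulation effect is much narrower than the one based on the raw mean difference.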


Implementation: Categorical Variable


Regression Output: Categorical Var

Regression Analysis: Gas versus Temperature, Insulated

Categorical predictor coding (1, 0)

Model Summary

S         R-sq
0.357412  90.97%

Coefficients

Term         Coef     SE Coef  T-Value  P-Value
Constant      6.551   0.118     55.48   0.000
Temperature  -0.3367  0.0178   -18.95   0.000
Insulated
  1          -1.5652  0.0971   -16.13   0.000

Regression Equation

Insulated
0  Gas = 6.551 - 0.3367 Temperature
1  Gas = 4.986 - 0.3367 Temperature
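The two regression equations are the single fitted model evaluated at Insulated = 0 and Insulated = 1. A short sketch using the coefficients printed above; the function names are illustrative only.

```python
# Coefficients from the "Regression Output: Categorical Var" table.
B0, B_TEMP, B_INS = 6.551, -0.3367, -1.5652


def predict_gas(temperature, insulated):
    """Fitted gas consumption for a given temperature and 0/1 insulation status."""
    return B0 + B_TEMP * temperature + B_INS * insulated


def line_for(insulated):
    """(intercept, slope) of the Gas-vs-Temperature line for one group."""
    return B0 + B_INS * insulated, B_TEMP


for status in (0, 1):
    intercept, slope = line_for(status)
    print(f"Insulated = {status}: Gas = {intercept:.3f} + ({slope}) Temperature")
```

The gap between the two lines is the Insulated coefficient at every temperature, which is what "parallel lines" means here.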


Aside: Omitted predictors

Hidden/Lurking variables

Subset of data (used in exam)

Knowing insulation status: slopes negative. On avg, gas consumption decreases with temp.

Uninformed by insulation status: slope positive. On avg, gas consumption increases with temp!

Interaction?

Refine the question

Different slopes as well?

Indicator Variables in Regression

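Response variable: $Y$ = Gas

Predictors: $x_1$ = Temp, $x_2$ = Insulated (0/1), $x_3$ = Temp $\times\, x_2$

Combined statistical model

\[ Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \varepsilon; \qquad \varepsilon \sim N(0, \sigma^2) \]

When $x_2 = 0$: \[ Y = \beta_0 + \beta_1 x_1 + \varepsilon \]

When $x_2 = 1$: \[ Y = (\beta_0 + \beta_2) + (\beta_1 + \beta_3) x_1 + \varepsilon \]

$\beta_2$: diff in intercepts; $\beta_3$: diff in slopes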


New Derived Variable


Modelling two regression lines

Regression Analysis: Gas versus Temperature, Insulated, Ins X Temp

Gas = 6.85 - 0.393 Temperature - 2.13 Insulated + 0.115 Ins X Temp

Predictor    Coef      SE Coef
Constant      6.8538   0.1360
Temperature  -0.39324  0.02249
Insulated    -2.1300   0.1801
Ins X Temp    0.11530  0.03211

S = 0.323004  R-Sq = 92.8%  R-Sq(adj) = 92.4%

Which coeff is most fundamental to the theory of heat loss?
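With the interaction term, the one model encodes two lines with different intercepts and slopes. The sketch below recovers both lines from the coefficients printed above; the insulated line can be compared with the Period 2 fitted line plot quoted earlier in the lecture (Gas = 4.724 - 0.2779 Temperature). The crossing temperature is simple arithmetic on these estimates, included as an aid to thinking about the question on the slide rather than as a result from the lecture.

```python
# Coefficients from the interaction model output.
B0, B_TEMP, B_INS, B_INT = 6.8538, -0.39324, -2.1300, 0.11530


def line_for(insulated):
    """(intercept, slope) of the Gas-vs-Temperature line for one group."""
    intercept = B0 + B_INS * insulated
    slope = B_TEMP + B_INT * insulated
    return intercept, slope


for status in (0, 1):
    intercept, slope = line_for(status)
    print(f"Insulated = {status}: Gas = {intercept:.4f} + ({slope:.5f}) Temperature")

# Temperature at which the two fitted lines meet:
# B_INS + B_INT * T = 0  =>  T = -B_INS / B_INT
crossing_temp = -B_INS / B_INT
print(f"Fitted lines cross at about {crossing_temp:.1f} deg C")
```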


Alt Models of two regression lines

Nearly equivalent: two sep lin regs, Gas vs Temp

Exercise: compare coeff ests and 95% ints for

a) One model, w interaction

b) Two sep models

Response variable: $Y$ = Gas

Predictors: $x_1$ = Temp, $x_2$ = Insulated (0/1)

Two Statistical Models

\[ x_2 = 0: \quad Y = \alpha_{NoIns} + \beta_{NoIns}\, x_1 + \varepsilon; \qquad \varepsilon \sim N(0, \sigma^2_{NoIns}) \]

\[ x_2 = 1: \quad Y = \alpha_{Ins} + \beta_{Ins}\, x_1 + \varepsilon; \qquad \varepsilon \sim N(0, \sigma^2_{Ins}) \]

Multiple indicator variables

Will also meet

• Redundancy

• Multiple formulations of same model


Housing Completions, quarterly, 1978 to 2000

Quarter 1978 1979 1980 1981 1982 1983 1984 1985

Q1 5777 7276 3538 6642 5981 4859 5129 4947

Q2 4772 4510 6001 4710 4883 5862 4671 5188

Q3 4579 4278 5879 5570 5354 4663 4947 3930

Q4 4243 4274 6383 6314 4894 4564 3195 3360

Quarter 1986 1987 1988 1989 1990 1991 1992 1993

Q1 5186 4144 3682 3554 4296 4692 4155 3684

Q2 3719 3363 3298 3985 4477 3898 5603 4487

Q3 4533 4391 3747 5277 5011 4600 5919 5121

Q4 3726 3478 3477 4484 4752 5282 5305 6009

Quarter 1994 1995 1996 1997 1998 1999 2000

Q1 4291 5770 6582 7434 8010 9930 10302

Q2 5266 6149 7203 8799 9506 10227 11590

Q3 6871 6806 7634 9140 10103 10788 11892

Q4 7160 7879 8713 10081 11474 12079 12873


Figure 1.30 Housing Completions, quarterly, 1978 to 2000

[Figure: Time Series Plot of Completions against Year and Quarter, 1978 to 2000, points marked by Quarter (Q1 to Q4)]

Take objective: forecast one quarter ahead

Aside: Cubic/Quadratic Regression

[Figure: fitted line plot of Comps against time, with regression curve and 95% PI]

Comps = - 1.44E+10 + 21783340 time - 10988 time**2 + 1.848 time**3
S = 822.624, R-Sq = 88.3%, R-Sq(adj) = 87.9%

Fitted Line plot Options: Log, Quadratic, Cubic

Modelling Options

• Focus on stable linear structure post 1993

– Assume this structure will continue

– Exploit structure extension of Indicator Vars

– Disadvantage: smaller data set

• One model for entire data set

– Note: structure has changed; might change again

– Exploit weaker structure

• Use Lagged variables

– Advantage: use all data.


Comps, quarterly, 1993 to 2000

Option 1 – work since 1993

[Figure: Time Series Plot of Completions against Year and Quarter, 1993 to 2000, points marked by Quarter (Q1 to Q4)]

Target is 2001 Q1

Use Q1 data only? OR Use all 1993-2000 data?

4 parallel lines more efficient. Why/What sense?

Completions Q1 only

[Figure: fitted line plot of Completions against year, Q1 values only, 1993 to 2000]

Completions = - 1945191 + 977.8 year
S = 316.477, R-Sq = 98.5%, R-Sq(adj) = 98.3%

Pred = -1945191 + 977.8 × 2001.00 ± 2(316.5) = (9795, 11061)
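The rough prediction interval is "fitted value plus or minus 2S". The sketch below evaluates it with the rounded coefficients printed on the plot. With those coefficients the line gives about 11387 at year = 2001 and about 10409 at year = 2000, whereas the interval quoted on the slide is centred near 10428, so it is worth checking which year value the quoted interval corresponds to. `rough_pi` is an illustrative name.

```python
# Rounded coefficients and S as printed on the Q1-only fitted line plot.
INTERCEPT, SLOPE, S = -1945191.0, 977.8, 316.5


def rough_pi(year, multiplier=2.0):
    """Return (low, centre, high) for a rough 95% prediction interval."""
    centre = INTERCEPT + SLOPE * year
    return centre - multiplier * S, centre, centre + multiplier * S


for year in (2000, 2001):
    low, centre, high = rough_pi(year)
    print(f"year = {year}: centre = {centre:.0f}, interval = ({low:.0f}, {high:.0f})")
```

Because the year values are near 2000, rounding the slope to one decimal place alone can move the fitted value by roughly 100 units, so small discrepancies from software output are expected.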

Other Qs; 4 sep lines

Later, use Time since 1978; changes intercept only

Linear in Time plus Quarterly Ind Vars

Create set of binary variables Q1, Q2, Q3, Q4

\[ \text{Comps} = \lambda_1 Q_1 + \lambda_2 Q_2 + \lambda_3 Q_3 + \lambda_4 Q_4 + \beta\,\text{Time} + \varepsilon \]

Year.Quarter  time     Time since 1978  Comps  Q1  Q2  Q3  Q4
1993 Q1       1993     15.00            3684   1   0   0   0
1993 Q2       1993.25  15.25            4487   0   1   0   0
1993 Q3       1993.5   15.50            5089   0   0   1   0
1993 Q4       1993.75  15.75            6041   0   0   0   1
1994 Q1       1994     16.00            4291   1   0   0   0
1994 Q2       1994.25  16.25            5266   0   1   0   0
1994 Q3       1994.5   16.50            6835   0   0   1   0
1994 Q4       1994.75  16.75            7196   0   0   0   1
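A minimal sketch of how the derived columns in this table can be built from a year and a quarter number. The function and key names are illustrative; the base year 1978 and the quarter spacing of 0.25 are read off the table.

```python
def quarter_row(year, quarter, base_year=1978):
    """Derived variables for one observation: time, time since base year, Q1..Q4."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3 or 4")
    time = year + (quarter - 1) / 4          # e.g. 1993 Q2 -> 1993.25
    indicators = [1 if quarter == q else 0 for q in (1, 2, 3, 4)]
    return {
        "time": time,
        "time_since": time - base_year,      # e.g. 1993 Q2 -> 15.25
        "Q": indicators,
    }


for year in (1993, 1994):
    for quarter in (1, 2, 3, 4):
        row = quarter_row(year, quarter)
        print(year, f"Q{quarter}", row["time"], row["time_since"], *row["Q"])
```

In every row exactly one indicator equals 1, so Q1 + Q2 + Q3 + Q4 is always 1, the same column a constant term would contribute.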


Multiple Indicator Vars: Tech Issue

Regression Analysis: Comps versus Time since 1978, Q1, Q2, Q3, Q4

* Q4 is highly correlated with other X variables

* Q4 has been removed from the equation.

The regression equation is

Comps = - 9452 + 986 Time since 1978 - 1792 Q1 - 1139 Q2 - 758 Q3

Redundancy

\[ Y = \alpha + \lambda_1 Q_1 + \lambda_2 Q_2 + \lambda_3 Q_3 + \lambda_4 Q_4 + \beta t + \varepsilon \]

Interp of $\alpha = 0$ and $\lambda_i = 0$

Alternatives

• $\alpha = 0$: No Constant

• Use 3 indicator variables only

• Enter "Quarter" as categorical variable (equiv)

Multiple Indicator Vars: Tech Issue

Regression Analysis: Comps versus Time since 1978, Q1, Q2, Q3, Q4

* Q4 is highly correlated with other X variables

* Q4 has been removed from the equation.

Comps = - 9452 + 986 Time since 1978 - 1792 Q1 - 1139 Q2 - 758 Q3

S = 297.382

OR

Regression Analysis: Comps versus Time since 1978, Q1, Q2, Q3, Q4

No constant option

Comps = 986 Time since 1978 - 11244 Q1 - 10592 Q2 - 10210 Q3 - 9452 Q4

S = 297.382

Note: -11244 = -9452 - 1792; -9452 = -9452 + 0; etc


Categorical Variable approach

Model Summary

S        R-sq
297.382  98.76%

Coefficients

Term             Coef    SE Coef
Constant         -11244  437
time since 1978  986.5   22.9
Quarter
  Q2             653     149
  Q3             1034    149
  Q4             1792    150

Regression Equations

Quarter
Q1  Comps = -11244 + 986.5 t
Q2  Comps = -10592 + 986.5 t
Q3  Comps = -10210 + 986.5 t
Q4  Comps = -9452 + 986.5 t

Consider Q2 – Q1 at t = 0
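The three outputs (constant plus Q1 to Q3, no constant with Q1 to Q4, and Quarter as a categorical variable) are re-expressions of the same four parallel lines. A sketch of the conversions, starting from the rounded coefficients of the first output; results can differ from the printed values by 1 because of that rounding.

```python
# Output 1: constant plus Q1, Q2, Q3 (Q4 dropped, so Q4 is the baseline).
const_q4_baseline = -9452
effects_vs_q4 = {"Q1": -1792, "Q2": -1139, "Q3": -758, "Q4": 0}

# Output 2 ("no constant"): one intercept per quarter.
intercepts = {q: const_q4_baseline + e for q, e in effects_vs_q4.items()}

# Output 3 (categorical, Q1 as baseline): constant is the Q1 intercept,
# and each listed coefficient is that quarter's intercept minus Q1's.
const_q1_baseline = intercepts["Q1"]
effects_vs_q1 = {q: intercepts[q] - const_q1_baseline for q in ("Q2", "Q3", "Q4")}

print("Intercepts by quarter:", intercepts)
print("Constant (Q1 baseline):", const_q1_baseline)
print("Effects relative to Q1:", effects_vs_q1)
```

In particular, Q2 – Q1 at t = 0 is the difference of the two intercepts, which is the Q2 coefficient in the categorical output.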


Derived variables and Transforms in Time Series

Lags

Differences

Rates of Return

Log scale


All Comps, quarterly, 1978 to 2000

Option 2 – use all data, but diff model

[Figure: Time Series Plot of Completions against Year and Quarter, 1978 to 2000, points marked by Quarter (Q1 to Q4)]

Auto-Regression for Time Series

Basic idea – next value ‘like’ last value (Lag1)

[Figure: fitted line plot of Comps against Lag1Comp]

Comps = 564.6 + 0.9171 Lag1Comp
S = 1167.61, R-Sq = 76.1%, R-Sq(adj) = 75.8%

Auto-Regression for Time Series

Basic idea – next value ‘like’ last value (Lag1)

Auto Regression

\[ Y_t = \beta_0 + \beta_1 \cdot \mathrm{lag}_1(Y_t) + \varepsilon_t \]

\[ Y_t = \beta_0 + \beta_1 \cdot \mathrm{lag}_1(Y_t) + \beta_4 \cdot \mathrm{lag}_4(Y_t) + \varepsilon_t \]

Year.Quarter  Comps  Lag1Comp  Lag4Comp
1978 Q1       5777
1978 Q2       4772   5777
1978 Q3       4588   4772
1978 Q4       4234   4588
1979 Q1       7276   4234      5777
1979 Q2       4513   7276      4772
1979 Q3       4284   4513      4588
1979 Q4       4257   4284      4234
1980 Q1       7738   4257      7276
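Lagged columns are the original series shifted down by one or four rows. A small sketch, using the nine Comps values from the table on this slide; `lag` is an illustrative helper and `None` marks the rows where no lagged value exists.

```python
def lag(series, k):
    """Shift a series down by k periods; the first k entries have no lagged value."""
    if k <= 0:
        raise ValueError("k must be a positive number of periods")
    return [None] * k + list(series[:-k])


# Comps values as listed in the lag table on this slide (1978 Q1 to 1980 Q1).
comps = [5777, 4772, 4588, 4234, 7276, 4513, 4284, 4257, 7738]
lag1 = lag(comps, 1)   # previous quarter
lag4 = lag(comps, 4)   # same quarter last year

for row in zip(comps, lag1, lag4):
    print(row)
```

Rows with a missing lag cannot be used in the regression, so a model with Lag4Comp loses the first four quarters.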


Using two lagged variables

Regression Analysis: Comps versus Lag1Comp, Lag4Comp

The regression equation is

Comps = - 387 + 0.328 Lag1Comp + 0.782 Lag4Comp :

S = 780.7

Comp Q4 2000 = 12873, Comp Q1 2000 = 10302

95% Pred Int Comp Q1 2001 = 11892 ± 2(780.7)= (10330, 13453)
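The prediction for 2001 Q1 plugs the last observed quarter (Lag1) and the same quarter a year earlier (Lag4) into the fitted equation. A sketch of that arithmetic with the coefficients above; the function name is illustrative.

```python
# Coefficients and S from "Comps versus Lag1Comp, Lag4Comp".
B0, B_LAG1, B_LAG4, S = -387.0, 0.328, 0.782, 780.7


def forecast(lag1, lag4):
    """One-step-ahead forecast from the two lagged values."""
    return B0 + B_LAG1 * lag1 + B_LAG4 * lag4


# For 2001 Q1: Lag1 is Comp Q4 2000, Lag4 is Comp Q1 2000.
point = forecast(lag1=12873, lag4=10302)
low, high = point - 2 * S, point + 2 * S
print(f"Forecast = {point:.0f}, rough 95% PI = ({low:.0f}, {high:.0f})")
```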


Using Lagged Variables: Basic Idea

Current Quarter ‘like’

– prev quarter

– same Q last year

[Figure: Matrix Plot of Completions, Lag1Comp, Lag4Comp]

Comparison

Year  Time since 1978  Quarter  Comps  Lag 1  Lag 4  Q1  Q2  Q3  Lin in time + Q inds  Lag 1 and lag 4
2000  22               Q1       10302  12079   9930  1   0   0   10451                 11340.17
2000  22.25            Q2       11590  10302  10227  0   1   0   11347.5               10989.57
2000  22.5             Q3       11892  11590  10788  0   0   1   11945                 11850.74
2000  22.75            Q4       12873  11892  12079  0   0   0   12979.5               12959.35
2001  23               Q1              12873  10302  1   0   0   11437                 11891.51
2001  23.25            Q2              ?      11590  0   1   0   12333.5               ?
2001  23.5             Q3              ?      11892  0   0   1   12931
2001  23.75            Q4              ?      12873  0   0   0   13965.5
2002  24               Q1              ?      ?      1   0   0   12423

[Figure: "Forecasting models" chart, 1994 to 2001, showing Comps, the Linear in Time with quarter indicators forecasts, and the Lag1 and Lag 4 forecasts]

Modelling Options

1 Parallel Linear Regressions

\[ Y_t = \lambda_1 Q_1 + \lambda_2 Q_2 + \lambda_3 Q_3 + \lambda_4 Q_4 + \beta t + \varepsilon_t \]

2 Seasonal Auto Regression

\[ Y_t = \beta_1 Y_{t-1} + \beta_4 Y_{t-4} + \varepsilon_t \]

More efficient for prediction

Fewer modelling assumptions

Different modelling strategy

Model Criticism

• Criticism

– Does it make sense?

– Are there outliers?

• Choice amongst alternatives

– R²

– SE


Extra: Logs lags and differences

Financial data IBM share price

Natural language “%age change”

MINITAB language logs


Financial Series- IBM Prices daily

Simple Reg on Time


Log IBM Prices

Log(Yt) vs t

[Figure: fitted line plot of Logprice against t, with 95% PI]
Logprice = 1.364 + 0.000561 t
S = 0.0393343, R-Sq = 94.4%, R-Sq(adj) = 94.4%

Log(Yt) vs log(Yt-1)

[Figure: fitted line plot of Logprice against lag1logprice, with 95% PI]
Logprice = 0.002264 + 0.9990 lag1logprice
S = 0.0080199, R-Sq = 99.8%, R-Sq(adj) = 99.8%

Modelled in Log Scale, presented in original units

Log(Yt) vs t

[Figure: price against t, with fitted curve and 95% PI]
log10(price) = 1.364 + 0.000561 t
S = 0.0393343, R-Sq = 94.4%, R-Sq(adj) = 94.4%

Log(Yt) vs log(Yt-1)

[Figure: price against lag1price, with fitted curve and 95% PI]
log10(price) = 0.002264 + 0.9990 log10(lag1price)
S = 0.0080199, R-Sq = 99.8%, R-Sq(adj) = 99.8%

Differences/ Ratios

• First Differences: Today – Yesterday

• Seasonal Diffs: This Q – same Q last year

• Ratio: Y(t) / Y(t-1)

• Rate of Return: 100 x (Y(t) – Y(t-1)) / Y(t-1) = 100 x (Ratio - 1)

• Log(Ratio): Log( Y(t) ) – Log( Y(t-1) )
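The definitions above are all simple functions of a value and its predecessor. A sketch that computes them for one pair of consecutive values; the numbers used are the 2000 Q3 and Q4 housing completions from the earlier table, chosen only as a convenient example, and base-10 logs are used to match the log10 scale of the IBM plots.

```python
import math


def derived(prev, curr):
    """First difference, ratio, rate of return (%) and log ratio for one step."""
    ratio = curr / prev
    return {
        "diff": curr - prev,
        "ratio": ratio,
        "rate_of_return": 100 * (curr - prev) / prev,
        "log_ratio": math.log10(curr) - math.log10(prev),
    }


d = derived(prev=11892, curr=12873)
for name, value in d.items():
    print(f"{name}: {value:.5f}")
```

The rate of return and the log ratio both measure relative change; the log ratio has the advantage that changes over several periods simply add.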


Financial Series- IBM Prices daily

Simple Regression of Daily Diffs vs Time

[Figure: fitted line plot of Lag1diff against t, with 95% PI]
Lag1diff = 0.01424 + 0.000109 t
S = 0.951260, R-Sq = 0.1%, R-Sq(adj) = 0.0%

Financial Series- IBM Prices daily

Simple Regression of First Diffs of LogPrice vs Time

[Figure: fitted line plot of Lag1difflog against t, with 95% PI]
Lag1difflog = 0.000568 + 0.000000 t
S = 0.0080216, R-Sq = 0.0%, R-Sq(adj) = 0.0%

Financial Series- IBM Prices daily

[Figure: fitted line plot of Lag1difflog against t, with 95% PI, repeated from the previous slide]
Lag1difflog = 0.000568 + 0.000000 t
S = 0.0080216, R-Sq = 0.0%, R-Sq(adj) = 0.0%

Interpretation

\[ \log P_t - \log P_{t-1} \approx 0.00057 + 0 \times \text{time} \]

\[ \log P_t - \log P_{t-1} = 0.00057 \ \text{or in}\ (0.00057 - 0.016,\ 0.00057 + 0.016), \ \text{i.e. in}\ (-0.0154,\ 0.0166) \]

\[ \frac{P_t}{P_{t-1}} = 10^{0.00057} \ \text{or in}\ \left(10^{-0.016},\ 10^{0.016}\right) \]

\[ \frac{P_t}{P_{t-1}} = 1.0013 \ \text{or in}\ (0.96,\ 1.04) \]

In summary: Rate of return 0.13% per day ± 4%

Financial Series

Day to day changes most naturally expressed as % change

price tomorrow = price today × small change

Log(price t+1) = Log(price t) + Log(small change)

Average drift per day (for logs) is 0.00057, ie about 0.13% growth pd = 61% pa


Financial Series

Confidence in future prediction

            pt est   lo      hi
log10       0.0006  -0.015   0.0166
10^ Factor  1.0013   0.9652  1.0390

Eg initial capital 1000

Day  pt est  lo    hi
1    1001.3  965   1039
2    1002.6  932   1079
3    1003.9  899   1122
4    1005.3  868   1165
5    1006.6  838   1211
364  1612.4  0.0   infinity
365  1614.5  0.0   infinity

61% per annum ??
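The table compounds a daily factor of 10 raised to the daily change in log10(price). A sketch of that calculation, using the drift 0.00057 and the limits -0.0154 and 0.0166 from the interpretation slide; `value_after` is an illustrative name.

```python
# Daily change in log10(price): point estimate and rough 95% limits.
DRIFT, LOW, HIGH = 0.00057, -0.0154, 0.0166


def value_after(days, log10_change, capital=1000.0):
    """Capital after compounding the same daily log10 change for `days` days."""
    return capital * 10 ** (log10_change * days)


for day in (1, 2, 3, 4, 5, 365):
    print(
        f"day {day:>3}: pt est {value_after(day, DRIFT):8.1f}"
        f"  lo {value_after(day, LOW):10.3g}"
        f"  hi {value_after(day, HIGH):10.3g}"
    )
```

Compounding the daily limits assumes every single day lands on the same extreme, which is why the lo and hi columns collapse towards 0 and explode so quickly; that is one way to read the "??" after the 61% per annum figure.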


Derived Variables

• Why use derived variables?

– Adding extra variables gives more options

– Challenge

• Is there a ‘cost’?

• Which is ‘best’?

– “Scientific” insight can give a powerful & simple analysis