regression models


Upload: kareem-buckley

Post on 31-Dec-2015



TRANSCRIPT

Page 1: Regression Models

Regression Models

Population Deterministic Regression Model

$$Y_i = \beta_0 + \beta_1 X_i$$

$Y_i$ depends only on the value of $X_i$; no other factor can affect $Y_i$.

Population Probabilistic Regression Model

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, 2, \ldots, n.$$

$$E(Y \mid X_i) = \beta_0 + \beta_1 X_i,$$

That is,

$$Y_{ij} = E(Y \mid X_i) + \varepsilon_{ij} = \beta_0 + \beta_1 X_i + \varepsilon_{ij}, \quad i = 1, 2, \ldots, n;\ j = 1, 2, \ldots, N.$$

$\beta_0$ and $\beta_1$ are population parameters.

$\beta_0$ and $\beta_1$ are estimated by the sample statistics $b_0$ and $b_1$.

Sample Model:

$$Y_i = b_0 + b_1 X_i + e_i, \quad \text{where } e_i = Y_i - \hat{Y}_i \text{ is the residual.}$$

Page 2: Regression Models


Assumptions Underlying Linear Regression (for Y)

• For each value of X, there is a group of Y values, and these Y values are normally distributed.

• The means of these normal distributions of Y values all lie on the straight line of regression.

• The error variances of these normal distributions are equal (homoscedasticity). If the error variances are not constant, the condition is called heteroscedasticity.

• The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values.
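The assumptions above can be made concrete by simulating data that satisfies them. The following is a minimal sketch, not part of the original slides: the function name `simulate_sample` and the parameter values are illustrative assumptions. Each Y is drawn independently from a normal distribution whose mean lies on the line $\beta_0 + \beta_1 X$ and whose variance is the same for every X.

```python
import random

def simulate_sample(beta0, beta1, sigma, x_values, seed=None):
    """Draw one Y per X from Y_i = beta0 + beta1*X_i + eps_i.

    eps_i are independent N(0, sigma^2) draws, so the simulated data is
    normal around the line, homoscedastic, and independent across X values.
    """
    rng = random.Random(seed)
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in x_values]

# Hypothetical parameter values, chosen only for illustration.
xs = [1, 2, 3, 4, 5]
ys = simulate_sample(beta0=2.0, beta1=0.5, sigma=1.0, x_values=xs, seed=0)
print(list(zip(xs, ys)))
```

Setting `sigma=0.0` removes the error term and recovers the deterministic model, in which every point lies exactly on the line.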

Page 3: Regression Models

Equation of the Simple Regression Line

$$\hat{Y} = b_0 + b_1 X$$

where:

$b_0$ = the sample intercept

$b_1$ = the sample slope

$\hat{Y}$ = the predicted value of $Y$

Page 4: Regression Models

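Ordinary Least Squares (OLS) Analysis

$$\min_{b_0, b_1} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \min_{b_0, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$

The first-order conditions for minimization are:

$$b_0:\ \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0, \quad (1)$$

and

$$b_1:\ \sum_{i=1}^{n} X_i (Y_i - b_0 - b_1 X_i) = 0. \quad (2)$$

By (1), we have

$$b_0 = \frac{\sum_{i=1}^{n} Y_i}{n} - b_1 \frac{\sum_{i=1}^{n} X_i}{n}, \quad (3)$$

and substitute it into (2) to obtain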

Page 5: Regression Models

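$$b_1 = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sum X^2 - \dfrac{(\sum X)^2}{n}} = \frac{\sum (X - \bar{X})(Y - \bar{Y})}{\sum (X - \bar{X})^2} = \frac{\sum xy}{\sum x^2} = \frac{\sum (X - \bar{X})(Y - \bar{Y})/n}{\sum (X - \bar{X})^2/n} = \frac{\text{Covariance}(X, Y)}{\text{Variance}(X)}$$

where $x = X - \bar{X}$ and $y = Y - \bar{Y}$.

$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$$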
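As a quick numerical check of the slope and intercept formulas, the sketch below computes $b_1$ both from raw sums and from deviations, then confirms that the residuals satisfy the two first-order conditions. The five data points and the function names are illustrative assumptions, not data from the slides.

```python
def ols_fit(xs, ys):
    """Return (b0, b1) for the least-squares line Y-hat = b0 + b1*X."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Deviation form: b1 = sum(x*y) / sum(x^2), with x = X - X-bar, y = Y - Y-bar.
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = ss_xy / ss_xx
    b0 = y_bar - b1 * x_bar
    return b0, b1

def slope_from_raw_sums(xs, ys):
    """Same slope, computed from the raw-sum form of the formula."""
    n = len(xs)
    num = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    den = sum(x * x for x in xs) - sum(xs) ** 2 / n
    return num / den

# Made-up sample used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = ols_fit(xs, ys)
residuals = [y - b0 - b1 * x for x, y in zip(xs, ys)]

print(b0, b1)                                          # intercept and slope
print(sum(residuals))                                  # condition (1): ~0
print(sum(x * e for x, e in zip(xs, residuals)))       # condition (2): ~0
```

For this sample the fitted line is approximately $\hat{Y} = 2.2 + 0.6X$, and both sums are zero up to floating-point rounding.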

Page 6: Regression Models

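Least Squares Analysis

$$SS_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{\sum X \sum Y}{n}$$

$$SS_{XX} = \sum (X - \bar{X})^2 = \sum X^2 - \frac{(\sum X)^2}{n}$$

$$b_1 = \frac{SS_{XY}}{SS_{XX}}$$

$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n}$$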

Page 7: Regression Models

Standard Error of the Estimate

Sum of Squares Error:

$$SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY$$

Standard Error of the Estimate:

$$S_e = \sqrt{\frac{SSE}{n - 2}}$$
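The two expressions for SSE can be compared numerically. This is a small sketch on made-up data (the sample is an assumption, not taken from the slides); it computes SSE directly from the residuals and again from the shortcut formula, then the standard error of the estimate.

```python
import math

# Made-up sample used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

# Least-squares coefficients.
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar

# SSE from the definition: sum of squared residuals.
sse_direct = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# SSE from the shortcut: sum(Y^2) - b0*sum(Y) - b1*sum(XY).
sse_shortcut = (sum(y * y for y in ys)
                - b0 * sum(ys)
                - b1 * sum(x * y for x, y in zip(xs, ys)))

# Standard error of the estimate, with n - 2 degrees of freedom.
s_e = math.sqrt(sse_direct / (n - 2))

print(sse_direct, sse_shortcut, s_e)
```

The shortcut is convenient by hand but subtracts large, nearly equal quantities, so in floating-point work the direct residual form is usually the safer choice.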

Page 8: Regression Models

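Proof: Standard Error of the Estimate

Sum of Squares Error:

$$SSE = \sum e^2 = \sum (Y - \hat{Y})^2 = \sum (Y - \hat{Y})Y - \sum (Y - \hat{Y})\hat{Y} = \sum eY - \sum e\,(b_0 + b_1 X)$$

The normal equations give $\sum e = 0$ and $\sum eX = 0$, so the second term vanishes and

$$SSE = \sum eY = \sum (Y - b_0 - b_1 X)\,Y = \sum Y^2 - b_0 \sum Y - b_1 \sum XY.$$

Standard Error of the Estimate:

$$S_e = \sqrt{\frac{SSE}{n - 2}}$$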

Page 9: Regression Models

Coefficient of Determination

• The Coefficient of Determination, $r^2$: the proportion of the total variation in the dependent variable Y that is explained or accounted for by the variation in the independent variable X.
– The coefficient of determination is the square of the coefficient of correlation, and ranges from 0 to 1.

Page 10: Regression Models

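Analysis of Variance (ANOVA)

$$Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})$$

$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 + \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2$$

$$SST = SSE + SSR$$

$$r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}$$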
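The decomposition can be verified on a small sample. The sketch below uses made-up data (an assumption for illustration only), computes the three sums of squares separately, and checks that they add up and that both expressions for $r^2$ agree.

```python
# Made-up sample used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

# Least-squares fit.
x_bar = sum(xs) / n
y_bar = sum(ys) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
fitted = [b0 + b1 * x for x in xs]

# Total, unexplained (error), and explained (regression) sums of squares.
sst = sum((y - y_bar) ** 2 for y in ys)
sse = sum((y - f) ** 2 for y, f in zip(ys, fitted))
ssr = sum((f - y_bar) ** 2 for f in fitted)

r2_from_ssr = ssr / sst
r2_from_sse = 1 - sse / sst

print(sst, sse, ssr)              # SST equals SSE + SSR up to rounding
print(r2_from_ssr, r2_from_sse)   # the two forms of r^2 agree
```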

Page 11: Regression Models

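Figure: Measures of variation in regression

[Figure not recovered. Surviving labels: the total deviation $(Y_i - \bar{Y})$, the explained deviation $(\hat{Y}_i - \bar{Y})$, the unexplained deviation $(Y_i - \hat{Y}_i)$, and their squares.]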

Page 12: Regression Models

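Expectation of b1

With $x_i = X_i - \bar{X}$ and $y_i = Y_i - \bar{Y}$:

$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} = \frac{\sum_{i=1}^{n} x_i (Y_i - \bar{Y})}{\sum_{i=1}^{n} x_i^2} = \frac{\sum_{i=1}^{n} x_i Y_i - \bar{Y} \sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2} = \frac{\sum_{i=1}^{n} x_i Y_i}{\sum_{i=1}^{n} x_i^2}$$

$$= \frac{\sum_{i=1}^{n} x_i (\beta_0 + \beta_1 X_i + \varepsilon_i)}{\sum_{i=1}^{n} x_i^2} = \beta_0 \frac{\sum_{i=1}^{n} x_i}{\sum_{i=1}^{n} x_i^2} + \beta_1 \frac{\sum_{i=1}^{n} x_i X_i}{\sum_{i=1}^{n} x_i^2} + \frac{\sum_{i=1}^{n} x_i \varepsilon_i}{\sum_{i=1}^{n} x_i^2} = \beta_1 + \frac{\sum_{i=1}^{n} x_i \varepsilon_i}{\sum_{i=1}^{n} x_i^2},$$

since $\sum x_i = 0$ and $\sum x_i X_i = \sum x_i^2$. Therefore

$$E(b_1) = \beta_1 + \frac{\sum_{i=1}^{n} x_i E(\varepsilon_i)}{\sum_{i=1}^{n} x_i^2} = \beta_1.$$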

Page 13: Regression Models

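Variance of b1

$$V(b_1) = E(b_1 - \beta_1)^2 = E\left(\frac{\sum_{i=1}^{n} x_i \varepsilon_i}{\sum_{i=1}^{n} x_i^2}\right)^2$$

$$= \frac{1}{\left(\sum_{i=1}^{n} x_i^2\right)^2} E\left(x_1^2 \varepsilon_1^2 + x_2^2 \varepsilon_2^2 + \cdots + x_n^2 \varepsilon_n^2 + 2 x_1 x_2 \varepsilon_1 \varepsilon_2 + \cdots + 2 x_{n-1} x_n \varepsilon_{n-1} \varepsilon_n\right)$$

$$= \frac{1}{\left(\sum_{i=1}^{n} x_i^2\right)^2} \left(x_1^2 \sigma^2 + x_2^2 \sigma^2 + \cdots + x_n^2 \sigma^2\right) = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{\left(\sum_{i=1}^{n} x_i^2\right)^2} = \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2}.$$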
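The results $E(b_1) = \beta_1$ and $V(b_1) = \sigma^2 / \sum x_i^2$ can be illustrated by simulation. The following is a rough Monte Carlo sketch under assumed values: the parameters, the fixed X values, the replication count, and the function names are all hypothetical choices for illustration.

```python
import random

def slope(xs, ys):
    """Least-squares slope b1 = sum(x*y) / sum(x^2) in deviation form."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = sum((x - x_bar) ** 2 for x in xs)
    return num / den

def simulate_slopes(beta0, beta1, sigma, xs, reps, seed):
    """Refit the slope on many samples drawn with the same fixed X values."""
    rng = random.Random(seed)
    slopes = []
    for _ in range(reps):
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        slopes.append(slope(xs, ys))
    return slopes

# Hypothetical population parameters and design points.
xs = [1, 2, 3, 4, 5]
beta0, beta1, sigma = 2.0, 0.5, 1.0

slopes = simulate_slopes(beta0, beta1, sigma, xs, reps=20000, seed=1)
mean_b1 = sum(slopes) / len(slopes)
var_b1 = sum((b - mean_b1) ** 2 for b in slopes) / len(slopes)

x_bar = sum(xs) / len(xs)
theory_var = sigma ** 2 / sum((x - x_bar) ** 2 for x in xs)

print(mean_b1, beta1)        # simulated mean of b1 vs. beta1
print(var_b1, theory_var)    # simulated variance vs. sigma^2 / sum(x^2)
```

The simulated mean and variance should land close to the theoretical values, with the gap shrinking as the number of replications grows.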

Page 14: Regression Models

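Expectation of b0

$$E(b_0) = E(\bar{Y} - b_1 \bar{X}) = E(\bar{Y}) - \bar{X} E(b_1) = \frac{1}{n} \sum_{i=1}^{n} E(Y_i) - \beta_1 \bar{X} = \frac{1}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 X_i) - \beta_1 \bar{X} = \beta_0 + \beta_1 \bar{X} - \beta_1 \bar{X} = \beta_0.$$

$$b_0 = \bar{Y} - b_1 \bar{X} = \frac{1}{n} \sum_{i=1}^{n} Y_i - \bar{X}\, \frac{\sum_{i=1}^{n} x_i Y_i}{\sum_{i=1}^{n} x_i^2} = \sum_{i=1}^{n} \left(\frac{1}{n} - \frac{x_i \bar{X}}{\sum_{i=1}^{n} x_i^2}\right) Y_i = \sum_{i=1}^{n} z_i Y_i,$$

$$\text{where } z_i = \frac{1}{n} - \frac{x_i \bar{X}}{\sum_{i=1}^{n} x_i^2}.$$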

Page 15: Regression Models

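Variance of b0

Let $z_i = \dfrac{1}{n} - \dfrac{x_i \bar{X}}{\sum_{i=1}^{n} x_i^2}$, where $x_i = X_i - \bar{X}$.

$$V(b_0) = V(\bar{Y} - b_1 \bar{X}) = V\left(\sum_{i=1}^{n} z_i Y_i\right) = V(z_1 Y_1 + \cdots + z_n Y_n) = z_1^2 V(Y_1) + \cdots + z_n^2 V(Y_n) = \sigma^2 \sum_{i=1}^{n} z_i^2$$

$$= \sigma^2 \sum_{i=1}^{n} \left(\frac{1}{n^2} - \frac{2 x_i \bar{X}}{n \sum_{i=1}^{n} x_i^2} + \frac{x_i^2 \bar{X}^2}{\left(\sum_{i=1}^{n} x_i^2\right)^2}\right) = \sigma^2 \left(\frac{1}{n} - \frac{2 \bar{X} \sum_{i=1}^{n} x_i}{n \sum_{i=1}^{n} x_i^2} + \frac{\bar{X}^2 \sum_{i=1}^{n} x_i^2}{\left(\sum_{i=1}^{n} x_i^2\right)^2}\right)$$

$$= \sigma^2 \left(\frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n} x_i^2}\right) = \sigma^2\, \frac{\sum_{i=1}^{n} x_i^2 + n \bar{X}^2}{n \sum_{i=1}^{n} x_i^2} = \sigma^2\, \frac{\sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} x_i^2}.$$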

Page 16: Regression Models

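Covariance of b0 and b1

With $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$, $\bar{Y} = \frac{1}{n}\sum_{i=1}^{n} Y_i$ and $\bar{\varepsilon} = \frac{1}{n}\sum_{i=1}^{n} \varepsilon_i$, we have $b_0 = \bar{Y} - b_1 \bar{X}$ and $\bar{Y} = \beta_0 + \beta_1 \bar{X} + \bar{\varepsilon}$, so

$$b_0 - \beta_0 = -\bar{X}(b_1 - \beta_1) + \bar{\varepsilon}.$$

$$Cov(b_0, b_1) = E[(b_0 - E(b_0))(b_1 - E(b_1))] = E[(b_0 - \beta_0)(b_1 - \beta_1)]$$

$$= E[(-\bar{X}(b_1 - \beta_1) + \bar{\varepsilon})(b_1 - \beta_1)] = -\bar{X} E[(b_1 - \beta_1)^2] + E[\bar{\varepsilon}(b_1 - \beta_1)]$$

$$= -\bar{X}\, \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2} \quad (\because E[\bar{\varepsilon}(b_1 - \beta_1)] = 0).$$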

Page 17: Regression Models

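$E(\hat{Y}_0)$ and $V(\hat{Y}_0)$

$$\hat{Y}_0 = b_0 + b_1 X_0, \qquad E(\hat{Y}_0) = E(b_0) + X_0 E(b_1) = \beta_0 + \beta_1 X_0 = E(Y \mid X_0).$$

$$V(\hat{Y}_0) = V(b_0 + b_1 X_0) = V(b_0) + X_0^2 V(b_1) + 2 X_0\, Cov(b_0, b_1)$$

$$= \frac{\sigma^2 \sum_{i=1}^{n} X_i^2}{n \sum_{i=1}^{n} x_i^2} + \frac{X_0^2 \sigma^2}{\sum_{i=1}^{n} x_i^2} - \frac{2 X_0 \bar{X} \sigma^2}{\sum_{i=1}^{n} x_i^2} = \frac{\sigma^2}{n \sum_{i=1}^{n} x_i^2} \left[\sum_{i=1}^{n} X_i^2 + n X_0^2 - 2 n X_0 \bar{X}\right]$$

$$= \frac{\sigma^2}{n \sum_{i=1}^{n} x_i^2} \left[\sum_{i=1}^{n} x_i^2 + n \bar{X}^2 + n X_0^2 - 2 n X_0 \bar{X}\right] = \frac{\sigma^2}{n \sum_{i=1}^{n} x_i^2} \left[\sum_{i=1}^{n} x_i^2 + n (X_0 - \bar{X})^2\right]$$

$$= \sigma^2 \left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_{i=1}^{n} x_i^2}\right] = \sigma^2 \left[\frac{1}{n} + \frac{x_0^2}{\sum_{i=1}^{n} x_i^2}\right], \quad \text{where } x_0 = X_0 - \bar{X}.$$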

Page 18: Regression Models

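Confidence Interval for the Mean of Y (p. 483)

• The confidence interval for the mean value of Y for a given value of X, $\mu_{Y \mid X_0} = E(Y \mid X_0)$, is given by:

$$\hat{Y}_0 \pm z_{\alpha/2}\, \sigma_{\hat{Y}_0} \quad \text{when } \sigma \text{ is known,}$$

$$\hat{Y}_0 \pm t_{\alpha/2,\, n-2}\, S_{\hat{Y}_0} \quad \text{when } \sigma \text{ is unknown,}$$

where

$$\sigma_{\hat{Y}_0} = \sigma \sqrt{\frac{1}{n} + \frac{x_0^2}{\sum_{i=1}^{n} x_i^2}}, \qquad S_{\hat{Y}_0} = S_{YX} \sqrt{\frac{1}{n} + \frac{x_0^2}{\sum_{i=1}^{n} x_i^2}},$$

$x_0 = X_0 - \bar{X}$, and $S_{YX}$ is the standard error of the estimate.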

Page 19: Regression Models

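Prediction of Y0

Since $Y_0 = \beta_0 + \beta_1 X_0 + \varepsilon_0$, we have to predict $Y_0$ by using $\hat{Y}_0$.

$Y_0$ and $\hat{Y}_0$ are both normally distributed, so $(Y_0 - \hat{Y}_0)$ is normally distributed.

$$E(Y_0 - \hat{Y}_0) = E(Y_0) - E(\hat{Y}_0) = 0,$$

$$V(Y_0 - \hat{Y}_0) = V(Y_0) + V(\hat{Y}_0) = \sigma^2 + \sigma^2 \left[\frac{1}{n} + \frac{x_0^2}{\sum_{i=1}^{n} x_i^2}\right] = \sigma^2 \left[1 + \frac{1}{n} + \frac{x_0^2}{\sum_{i=1}^{n} x_i^2}\right],$$

where

$$\sigma_{Y_0 - \hat{Y}_0} = \sigma \sqrt{1 + \frac{1}{n} + \frac{x_0^2}{\sum_{i=1}^{n} x_i^2}}, \qquad S_{Y_0 - \hat{Y}_0} = S_{YX} \sqrt{1 + \frac{1}{n} + \frac{x_0^2}{\sum_{i=1}^{n} x_i^2}}.$$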

Page 20: Regression Models

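Prediction Interval of an individual value of Y0 (p. 484)

• The prediction interval for an individual value of Y for a given value of X is given by:

$$\hat{Y}_0 \pm z_{\alpha/2}\, \sigma_{Y_0 - \hat{Y}_0} \quad \text{when } \sigma \text{ is known,}$$

$$\hat{Y}_0 \pm t_{\alpha/2,\, n-2}\, S_{Y_0 - \hat{Y}_0} \quad \text{when } \sigma \text{ is unknown,}$$

where

$$\sigma_{Y_0 - \hat{Y}_0} = \sigma \sqrt{1 + \frac{1}{n} + \frac{x_0^2}{\sum_{i=1}^{n} x_i^2}}, \qquad S_{Y_0 - \hat{Y}_0} = S_{YX} \sqrt{1 + \frac{1}{n} + \frac{x_0^2}{\sum_{i=1}^{n} x_i^2}}.$$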
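To see how the interval for the mean of Y and the interval for an individual Y differ in practice, the sketch below computes both half-widths for the unknown-$\sigma$ case. The data, the function name `interval_half_widths`, and the choice of $X_0 = 4$ are illustrative assumptions; the critical value 3.182 is the tabulated $t_{0.025}$ for 3 degrees of freedom and is passed in rather than computed, to keep the example free of third-party libraries.

```python
import math

def interval_half_widths(xs, ys, x0, t_crit):
    """Return (y0_hat, ci_half, pi_half) at X = x0 when sigma is unknown.

    ci_half: half-width of the interval for the mean E(Y | X0).
    pi_half: half-width of the interval for an individual Y0.
    t_crit:  t critical value with n - 2 degrees of freedom (from a table).
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / ss_xx
    b0 = y_bar - b1 * x_bar

    # Standard error of the estimate.
    sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
    s_e = math.sqrt(sse / (n - 2))

    # 1/n + x0^2 / sum(x^2), with x0 measured as a deviation from X-bar.
    mean_term = 1 / n + (x0 - x_bar) ** 2 / ss_xx

    y0_hat = b0 + b1 * x0
    ci_half = t_crit * s_e * math.sqrt(mean_term)
    pi_half = t_crit * s_e * math.sqrt(1 + mean_term)
    return y0_hat, ci_half, pi_half

# Made-up sample; n = 5, so n - 2 = 3 degrees of freedom.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
T_CRIT = 3.182  # tabulated t value for a 95% interval with 3 df

y0_hat, ci_half, pi_half = interval_half_widths(xs, ys, x0=4, t_crit=T_CRIT)
print(y0_hat - ci_half, y0_hat + ci_half)   # interval for the mean of Y
print(y0_hat - pi_half, y0_hat + pi_half)   # interval for an individual Y
```

The extra 1 under the square root makes the prediction interval wider than the confidence interval at every $X_0$, and both widen as $X_0$ moves away from $\bar{X}$.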

Page 21: Regression Models

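Figure: Confidence Intervals for Estimation

[Figure not recovered. Surviving labels: vertical axis Y, a marked point at X = 6.5, "Confidence Intervals for $Y \mid X$", and "Confidence Intervals for $E(Y \mid X)$".]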

Page 22: Regression Models

The Coefficient of Correlation, r

• The Coefficient of Correlation (r) is a measure of the strength of the relationship between two variables.
– It requires interval or ratio-scaled data (variables).
– It can range from -1.00 to 1.00.
– Values of -1.00 or 1.00 indicate perfect and strong correlation.
– Values close to 0.0 indicate weak correlation.
– Negative values indicate an inverse relationship and positive values indicate a direct relationship.

Page 23: Regression Models

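(Pearson Product-Moment) Correlation Coefficient (p. 489)

For sample:

$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{\sum XY - \dfrac{\sum X \sum Y}{n}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{n}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{n}\right]}}$$

$$= \frac{\dfrac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\dfrac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \cdot \dfrac{1}{n-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{\text{covar}(x, y)}{\sqrt{\text{var}(x)\, \text{var}(y)}}, \qquad -1 \le r \le 1.$$

For population:

$$\rho = \frac{\sum_{i=1}^{N} (X_i - \mu_X)(Y_i - \mu_Y)}{\sqrt{\sum_{i=1}^{N} (X_i - \mu_X)^2 \sum_{i=1}^{N} (Y_i - \mu_Y)^2}} = \frac{\sum XY - \dfrac{\sum X \sum Y}{N}}{\sqrt{\left[\sum X^2 - \dfrac{(\sum X)^2}{N}\right]\left[\sum Y^2 - \dfrac{(\sum Y)^2}{N}\right]}}$$

$$= \frac{\dfrac{1}{N} \sum_{i=1}^{N} (X_i - \mu_X)(Y_i - \mu_Y)}{\sqrt{\dfrac{1}{N} \sum_{i=1}^{N} (X_i - \mu_X)^2 \cdot \dfrac{1}{N} \sum_{i=1}^{N} (Y_i - \mu_Y)^2}} = \frac{\text{covar}(x, y)}{\sqrt{\text{var}(x)\, \text{var}(y)}}, \qquad -1 \le \rho \le 1.$$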
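The equivalent forms of the sample correlation coefficient can be checked against each other. The sketch below is illustrative only: the data and function names are assumptions, and it uses the deviation form and the raw-sum form of r.

```python
import math

def pearson_r_deviations(xs, ys):
    """r from deviations about the sample means."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    num = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - x_bar) ** 2 for x in xs)
                    * sum((y - y_bar) ** 2 for y in ys))
    return num / den

def pearson_r_raw_sums(xs, ys):
    """r from raw sums, the form convenient for hand computation."""
    n = len(xs)
    num = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    den = math.sqrt((sum(x * x for x in xs) - sum(xs) ** 2 / n)
                    * (sum(y * y for y in ys) - sum(ys) ** 2 / n))
    return num / den

# Made-up sample used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r_dev = pearson_r_deviations(xs, ys)
r_raw = pearson_r_raw_sums(xs, ys)
print(r_dev, r_raw)
```

Because the $1/(n-1)$ factors cancel between numerator and denominator, the covariance-over-standard-deviations form gives the same value.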

Page 24: Regression Models

Covariance (p. 493)

$$\sigma_{XY} = \frac{\sum (X - \mu_X)(Y - \mu_Y)}{N} = \frac{\sum XY - \dfrac{\sum X \sum Y}{N}}{N} = \frac{SS_{XY}}{N}$$

Page 25: Regression Models

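Coefficient of regression and correlation

With $x_i = X_i - \bar{X}$, $y_i = Y_i - \bar{Y}$ and $\hat{y}_i = \hat{Y}_i - \bar{Y} = b_1 x_i$:

$$R^2 = \frac{SSR}{SST} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{\sum \hat{y}_i^2}{\sum y_i^2} = \frac{b_1^2 \sum x_i^2}{\sum y_i^2} = \frac{\left(\sum x_i y_i\right)^2 \sum x_i^2}{\left(\sum x_i^2\right)^2 \sum y_i^2} = \frac{\left(\sum x_i y_i\right)^2}{\sum x_i^2 \sum y_i^2} = r^2$$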

Page 26: Regression Models

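F and t statistics

$$F = \frac{MSR}{MSE} = \frac{\sum \hat{y}_i^2}{\sum e_i^2/(n-2)} = \frac{b_1^2 \sum x_i^2}{S_e^2} = \frac{b_1^2}{S_e^2 / \sum x_i^2} = \frac{b_1^2}{S_{b_1}^2} = \left(\frac{b_1}{S_{b_1}}\right)^2 = t^2 \quad (\text{for testing } H_0: \beta_1 = 0)$$

$$F = \frac{MSR}{MSE} = \frac{\sum \hat{y}_i^2}{\sum e_i^2/(n-2)} = \frac{\sum \hat{y}_i^2 / SST}{\left[\sum e_i^2 / SST\right]/(n-2)} = \frac{r^2}{(1 - r^2)/(n-2)},$$

$$t = \frac{r}{\sqrt{(1 - r^2)/(n-2)}} \quad \text{for testing the coefficient of regression.}$$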
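The identities $F = t^2$ and $t = r / \sqrt{(1 - r^2)/(n-2)}$ can be confirmed numerically. The sketch below uses made-up data (an assumption for illustration) and computes the statistics by each route.

```python
import math

# Made-up sample used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
n = len(xs)

x_bar = sum(xs) / n
y_bar = sum(ys) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_yy = sum((y - y_bar) ** 2 for y in ys)           # SST
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

b1 = ss_xy / ss_xx
b0 = y_bar - b1 * x_bar

sse = sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys))
ssr = ss_yy - sse

# F from the ANOVA mean squares (regression has 1 degree of freedom).
mse = sse / (n - 2)
f_stat = (ssr / 1) / mse

# t from the slope and its standard error S_b1 = S_e / sqrt(sum(x^2)).
s_b1 = math.sqrt(mse / ss_xx)
t_from_slope = b1 / s_b1

# t from the correlation coefficient.
r = ss_xy / math.sqrt(ss_xx * ss_yy)
t_from_r = r / math.sqrt((1 - r ** 2) / (n - 2))

print(f_stat, t_from_slope ** 2)   # F equals t^2
print(t_from_slope, t_from_r)      # both routes give the same t
```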

Page 27: Regression Models

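The Simple Regression Model: Matrix Form

Sample model:

$$Y_i = b_0 + b_1 X_i + e_i, \quad i = 1, \ldots, n,$$

$$\hat{Y}_i = b_0 + b_1 X_i, \quad i = 1, \ldots, n, \qquad e_i = Y_i - \hat{Y}_i, \quad i = 1, \ldots, n.$$

Written out:

$$Y_1 = b_0 + b_1 X_1 + e_1, \quad Y_2 = b_0 + b_1 X_2 + e_2, \quad \ldots, \quad Y_n = b_0 + b_1 X_n + e_n.$$

Denote

$$Y_{n \times 1} = \begin{pmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{pmatrix}, \quad X_{n \times 2} = \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}, \quad b_{2 \times 1} = \begin{pmatrix} b_0 \\ b_1 \end{pmatrix}, \quad e_{n \times 1} = \begin{pmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{pmatrix}.$$

Then

$$Y = Xb + e, \qquad \hat{Y} = Xb.$$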

Page 28: Regression Models

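$$e = Y - \hat{Y} = Y - Xb$$

$$SSE = \sum_{i=1}^{n} e_i^2 = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = e'e = (Y - Xb)'(Y - Xb) = Y'Y - b'X'Y - Y'Xb + b'X'Xb = Y'Y - 2b'X'Y + b'X'Xb$$

$$\min_b SSE: \quad \frac{\partial SSE}{\partial b} = -2X'Y + 2X'Xb = 0$$

$$X'Xb = X'Y \quad \Rightarrow \quad (X'X)^{-1}X'Xb = (X'X)^{-1}X'Y \quad \Rightarrow \quad b = (X'X)^{-1}X'Y$$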
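For the simple regression case, $X'X$ is only 2 x 2, so the matrix solution $b = (X'X)^{-1}X'Y$ can be worked out without a linear algebra library. The following is a minimal sketch on made-up data; the function name and the sample are assumptions for illustration, and a real application would normally use a numerical library instead of an explicit inverse.

```python
def ols_matrix_form(xs, ys):
    """Solve b = (X'X)^{-1} X'Y for a design matrix with rows (1, X_i)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)

    # X'X = [[n, sum X], [sum X, sum X^2]] and X'Y = [sum Y, sum XY].
    xty = [sum(ys), sum(x * y for x, y in zip(xs, ys))]

    # Inverse of a 2x2 matrix [[a, b], [c, d]] is (1/det) [[d, -b], [-c, a]].
    det = n * sum_x2 - sum_x * sum_x
    inv = [[sum_x2 / det, -sum_x / det],
           [-sum_x / det, n / det]]

    b0 = inv[0][0] * xty[0] + inv[0][1] * xty[1]
    b1 = inv[1][0] * xty[0] + inv[1][1] * xty[1]
    return b0, b1

# Made-up sample used only for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = ols_matrix_form(xs, ys)
print(b0, b1)
```

The result matches the scalar formulas $b_1 = \sum xy / \sum x^2$ and $b_0 = \bar{Y} - b_1 \bar{X}$, as it should, since both solve the same normal equations.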

Page 29: Regression Models

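Population

Population model:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \quad i = 1, \ldots, n,$$

that is,

$$Y_1 = \beta_0 + \beta_1 X_1 + \varepsilon_1, \quad \ldots, \quad Y_n = \beta_0 + \beta_1 X_n + \varepsilon_n.$$

In matrix form this becomes

$$Y = X\beta + \varepsilon, \quad \text{where } X = \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix}, \quad \beta = \begin{pmatrix} \beta_0 \\ \beta_1 \end{pmatrix}, \quad \varepsilon = \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{pmatrix}. \quad (1)$$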

Page 30: Regression Models

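By OLS from the sample model,

$$b = (X'X)^{-1}X'Y = (X'X)^{-1}X'(X\beta + \varepsilon) = \beta + (X'X)^{-1}X'\varepsilon. \quad (2)$$

$$E(b) = \beta + (X'X)^{-1}X'E(\varepsilon) = \beta, \quad \text{by assuming } E(\varepsilon) = 0 \text{ (with } X \text{ treated as fixed).}$$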

Page 31: Regression Models

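$$b - \beta = (X'X)^{-1}X'\varepsilon$$

$$E[(b - \beta)(b - \beta)'] = E[(X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1}] = (X'X)^{-1}X'E(\varepsilon\varepsilon')X(X'X)^{-1}$$

$$= (X'X)^{-1}X'\,\sigma^2 I_n\, X(X'X)^{-1} = \sigma^2 (X'X)^{-1}X'X(X'X)^{-1} = \sigma^2 (X'X)^{-1}. \quad (3)$$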

Page 32: Regression Models

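$$X'X = \begin{pmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{pmatrix} \begin{pmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{pmatrix} = \begin{pmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{pmatrix} = \begin{pmatrix} n & n\bar{X} \\ n\bar{X} & \sum X_i^2 \end{pmatrix},$$

$$|X'X| = n \sum X_i^2 - n^2 \bar{X}^2 = n\left(\sum X_i^2 - n\bar{X}^2\right) = n \sum x_i^2, \quad \text{where } \sum x_i^2 = \sum (X_i - \bar{X})^2 = \sum X_i^2 - n\bar{X}^2.$$

$$(X'X)^{-1} = \frac{1}{n \sum x_i^2} \begin{pmatrix} \sum X_i^2 & -n\bar{X} \\ -n\bar{X} & n \end{pmatrix} = \begin{pmatrix} \dfrac{\sum X_i^2}{n \sum x_i^2} & \dfrac{-\bar{X}}{\sum x_i^2} \\ \dfrac{-\bar{X}}{\sum x_i^2} & \dfrac{1}{\sum x_i^2} \end{pmatrix}. \quad (4)$$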

Page 33: Regression Models

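From (3), using the inverse in (4):

$$E[(b - \beta)(b - \beta)'] = \begin{pmatrix} E(b_0 - \beta_0)^2 & E(b_0 - \beta_0)(b_1 - \beta_1) \\ E(b_1 - \beta_1)(b_0 - \beta_0) & E(b_1 - \beta_1)^2 \end{pmatrix} = \begin{pmatrix} Var(b_0) & Cov(b_0, b_1) \\ Cov(b_0, b_1) & Var(b_1) \end{pmatrix}$$

$$= \sigma^2 (X'X)^{-1} = \sigma^2 \begin{pmatrix} \dfrac{\sum X_i^2}{n \sum x_i^2} & \dfrac{-\bar{X}}{\sum x_i^2} \\ \dfrac{-\bar{X}}{\sum x_i^2} & \dfrac{1}{\sum x_i^2} \end{pmatrix}.$$

Hence

$$Var(b_0) = \frac{\sigma^2 \sum X_i^2}{n \sum x_i^2}, \qquad Var(b_1) = \frac{\sigma^2}{\sum x_i^2}, \qquad Cov(b_0, b_1) = \frac{-\bar{X} \sigma^2}{\sum x_i^2}.$$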
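The entries of $(X'X)^{-1}$ can be compared with the scalar variance and covariance formulas derived earlier. The sketch below does this for a made-up set of X values with $\sigma^2$ set to 1; the function name and the numbers are illustrative assumptions.

```python
def xtx_inverse(xs):
    """Return (X'X)^{-1} as a 2x2 nested list for rows (1, X_i)."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    det = n * sum_x2 - sum_x * sum_x          # equals n * sum(x_i^2)
    return [[sum_x2 / det, -sum_x / det],
            [-sum_x / det, n / det]]

# Made-up design points; sigma^2 is taken as 1 for illustration.
xs = [1, 2, 3, 4, 5]
n = len(xs)
x_bar = sum(xs) / n
ss_xx = sum((x - x_bar) ** 2 for x in xs)     # sum of squared deviations
sigma2 = 1.0

inv = xtx_inverse(xs)
cov_matrix = [[sigma2 * v for v in row] for row in inv]

# Scalar formulas from the earlier derivations.
var_b0 = sigma2 * sum(x * x for x in xs) / (n * ss_xx)
var_b1 = sigma2 / ss_xx
cov_b0_b1 = -x_bar * sigma2 / ss_xx

print(cov_matrix)
print(var_b0, var_b1, cov_b0_b1)
```

The diagonal of the matrix reproduces $Var(b_0)$ and $Var(b_1)$, and the off-diagonal entry reproduces $Cov(b_0, b_1)$, which is negative whenever $\bar{X}$ is positive.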