regression models
Regression Models

Population Deterministic Regression Model
$$Y_i = \beta_0 + \beta_1 X_i$$
$Y_i$ depends only on the value of $X_i$, and no other factor can affect $Y_i$.

Population Probabilistic Regression Model
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, 2, \ldots, n,$$
$$E(Y \mid X_i) = \beta_0 + \beta_1 X_i.$$
That is,
$$Y_{ij} = E(Y \mid X_i) + \varepsilon_{ij} = \beta_0 + \beta_1 X_i + \varepsilon_{ij}, \qquad i = 1, 2, \ldots, n;\; j = 1, 2, \ldots, N.$$
$\beta_0$ and $\beta_1$ are population parameters. They are estimated by the sample statistics $b_0$ and $b_1$.

Sample Model:
$$Y_i = b_0 + b_1 X_i + e_i,$$
where $e_i$ is the residual.
Assumptions Underlying Linear Regression (for Y)
• For each value of X, there is a group of Y values, and these Y values are normally distributed.
• The means of these normal distributions of Y values all lie on the straight line of regression.
• The error variances of these normal distributions are equal (homoscedasticity). If the error variances are not constant, the condition is called heteroscedasticity.
• The Y values are statistically independent. This means that in the selection of a sample, the Y values chosen for a particular X value do not depend on the Y values for any other X values.
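The assumptions above can be made concrete by simulating data that satisfy them. The following is a minimal sketch under several assumptions: normal errors, one common standard deviation `sigma` for every X (homoscedasticity), and independent draws. The function `simulate` and all parameter values are hypothetical illustrations.

```python
import random


def simulate(beta0, beta1, xs, sigma, seed=0):
    """Draw Y_i = beta0 + beta1 * X_i + eps_i with eps_i ~ N(0, sigma^2).

    Hypothetical helper: every X shares the same sigma (equal error
    variances), and each eps_i is an independent draw (independent Y values).
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    return [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]


xs = [1, 2, 3, 4, 5]
ys = simulate(2.0, 0.5, xs, sigma=1.0, seed=42)

# With sigma = 0 the probabilistic model reduces to the deterministic
# model Y_i = beta0 + beta1 * X_i.
exact = simulate(2.0, 0.5, xs, sigma=0.0)
print(exact)  # [2.5, 3.0, 3.5, 4.0, 4.5]
```

Setting `sigma` to zero shows that the deterministic model is the special case of the probabilistic one in which no other factor affects Y.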
Equation of the Simple Regression Line
$$\hat{Y} = b_0 + b_1 X$$
where
$b_0$ = the sample intercept,
$b_1$ = the sample slope,
$\hat{Y}$ = the predicted value of Y.
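Ordinary Least Squares (OLS) Analysis
$$\min_{b_0, b_1} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \min_{b_0, b_1} \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)^2$$
The first-order conditions for the minimization are:
$$b_0:\quad \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i) = 0, \qquad (1)$$
$$b_1:\quad \sum_{i=1}^{n} (Y_i - b_0 - b_1 X_i)\, X_i = 0. \qquad (2)$$
By (1), we have
$$b_0 = \frac{\sum_{i=1}^{n} Y_i}{n} - b_1 \frac{\sum_{i=1}^{n} X_i}{n} = \bar{Y} - b_1 \bar{X}, \qquad (3)$$
and substituting (3) into (2) gives $b_1$.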
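$$b_1 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2} = \frac{\sum x_i y_i}{\sum x_i^2} = \frac{\sum X_i Y_i - \left(\sum X_i\right)\left(\sum Y_i\right)/n}{\sum X_i^2 - \left(\sum X_i\right)^2/n} = \frac{\operatorname{Cov}(X, Y)}{\operatorname{Var}(X)},$$
where $x_i = X_i - \bar{X}$, $y_i = Y_i - \bar{Y}$, $\operatorname{Cov}(X, Y) = \sum x_i y_i / n$ and $\operatorname{Var}(X) = \sum x_i^2 / n$;
$$b_0 = \frac{\sum Y_i}{n} - b_1 \frac{\sum X_i}{n} = \bar{Y} - b_1 \bar{X}.$$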
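As a check on these formulas, the sketch below does three things:

- computes $b_1$ from the deviation form,
- computes $b_0$ from equation (3),
- confirms that the residuals satisfy the first-order conditions (1) and (2).

The helper name `ols_fit` and the five data points are illustrative assumptions, not data from the slides.

```python
def ols_fit(xs, ys):
    """Return (b0, b1) for the simple regression of ys on xs (hypothetical helper)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # b1 = sum(x_i * y_i) / sum(x_i^2), where x_i, y_i are deviations from the means
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar  # equation (3)
    return b0, b1


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = ols_fit(xs, ys)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# First-order conditions (1) and (2): both sums are zero up to rounding error.
foc1 = sum(residuals)
foc2 = sum(x * e for x, e in zip(xs, residuals))
print(round(b0, 6), round(b1, 6))  # 2.2 0.6
```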
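Least Squares Analysis
$$SS_{XY} = \sum (X - \bar{X})(Y - \bar{Y}) = \sum XY - \frac{\left(\sum X\right)\left(\sum Y\right)}{n}$$
$$SS_{XX} = \sum (X - \bar{X})^2 = \sum X^2 - \frac{\left(\sum X\right)^2}{n}$$
$$b_1 = \frac{SS_{XY}}{SS_{XX}}, \qquad b_0 = \frac{\sum Y}{n} - b_1 \frac{\sum X}{n} = \bar{Y} - b_1 \bar{X}$$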
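The computational forms of $SS_{XY}$ and $SS_{XX}$ need only the raw sums $\sum X$, $\sum Y$, $\sum XY$ and $\sum X^2$. The sketch below reuses a hypothetical five-point data set; `ls_coefficients` is an illustrative name.

```python
def ls_coefficients(xs, ys):
    """Return SS_XY, SS_XX, b0, b1 using the raw-sum (computational) formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    ss_xx = sum(x * x for x in xs) - sum_x ** 2 / n
    b1 = ss_xy / ss_xx
    b0 = sum_y / n - b1 * sum_x / n
    return ss_xy, ss_xx, b0, b1


ss_xy, ss_xx, b0, b1 = ls_coefficients([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(ss_xy, ss_xx)  # 6.0 10.0
```

The raw-sum form is convenient by hand. In floating point, however, it subtracts two large, nearly equal numbers when the data have a large mean relative to their spread. For that reason the deviation form is usually the numerically safer choice in code.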
Standard Error of the Estimate
Sum of Squares Error:
$$SSE = \sum (Y - \hat{Y})^2 = \sum Y^2 - b_0 \sum Y - b_1 \sum XY$$
Standard Error of the Estimate:
$$S_e = \sqrt{\frac{SSE}{n - 2}}$$
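Proof: Standard Error of the Estimate
Since $\hat{Y} = b_0 + b_1 X$ and $Y = \hat{Y} + e = b_0 + b_1 X + e$, the Sum of Squares Error is
$$SSE = \sum e^2 = \sum (Y - \hat{Y})^2 = \sum (Y - \hat{Y})\,Y - \sum (Y - \hat{Y})\,\hat{Y}.$$
The second sum vanishes, because by the first-order conditions (1) and (2)
$$\sum (Y - \hat{Y})\,\hat{Y} = \sum e\,(b_0 + b_1 X) = b_0 \sum e + b_1 \sum e X = 0.$$
Hence
$$SSE = \sum (Y - b_0 - b_1 X)\,Y = \sum Y^2 - b_0 \sum Y - b_1 \sum XY,$$
and the Standard Error of the Estimate is
$$S_e = \sqrt{\frac{SSE}{n - 2}}.$$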
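The identity just proved can be checked numerically: the shortcut formula and the direct sum of squared residuals give the same SSE when $b_0$ and $b_1$ are the least-squares estimates. This is a sketch on a hypothetical data set. The values 2.2 and 0.6 are the least-squares estimates for that data, and `sse_two_ways` is an illustrative name.

```python
import math


def sse_two_ways(xs, ys, b0, b1):
    """Return (SSE from residuals, SSE from the shortcut formula, S_e)."""
    n = len(xs)
    sse_direct = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))
    # Shortcut: SSE = sum(Y^2) - b0 * sum(Y) - b1 * sum(XY); valid only
    # when b0 and b1 are the least-squares estimates.
    sse_short = (sum(y * y for y in ys) - b0 * sum(ys)
                 - b1 * sum(x * y for x, y in zip(xs, ys)))
    s_e = math.sqrt(sse_direct / (n - 2))  # standard error of the estimate
    return sse_direct, sse_short, s_e


sse_direct, sse_short, s_e = sse_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], 2.2, 0.6)
```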
Coefficient of Determination
• The coefficient of determination, r², is the proportion of the total variation in the dependent variable Y that is explained or accounted for by the variation in the independent variable X.
  – The coefficient of determination is the square of the coefficient of correlation, and it ranges from 0 to 1.
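Analysis of Variance (ANOVA)
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} \left[(\hat{Y}_i - \bar{Y}) + (Y_i - \hat{Y}_i)\right]^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2,$$
where the cross-product term $2\sum (\hat{Y}_i - \bar{Y})(Y_i - \hat{Y}_i)$ is zero by the first-order conditions (1) and (2). That is,
$$SST = SSR + SSE, \qquad r^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}.$$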
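The decomposition $SST = SSR + SSE$ and the ratio $r^2 = SSR / SST$ can be reproduced in a few lines. This is a sketch on the same hypothetical five-point data; `anova_table` is an illustrative name and does not come from any library.

```python
def anova_table(xs, ys):
    """Return (SST, SSR, SSE, r^2) for a simple linear regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)               # total variation
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained by X
    sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # unexplained
    return sst, ssr, sse, ssr / sst


sst, ssr, sse, r2 = anova_table([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(sst, 6), round(ssr, 6), round(sse, 6), round(r2, 6))  # 6.0 3.6 2.4 0.6
```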
Figure: Measures of variation in regression, showing $\sum (Y_i - \bar{Y})^2$, $\sum (\hat{Y}_i - \bar{Y})^2$ and $\sum (Y_i - \hat{Y}_i)^2$.
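Expectation of b1
With $x_i = X_i - \bar{X}$, $y_i = Y_i - \bar{Y}$ and $\sum x_i = 0$,
$$b_1 = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2} = \frac{\sum x_i (Y_i - \bar{Y})}{\sum x_i^2} = \frac{\sum x_i Y_i - \bar{Y} \sum x_i}{\sum x_i^2} = \frac{\sum x_i Y_i}{\sum x_i^2}.$$
Substituting $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$, and using $\sum x_i X_i = \sum x_i (x_i + \bar{X}) = \sum x_i^2$,
$$b_1 = \frac{\beta_0 \sum x_i + \beta_1 \sum x_i X_i + \sum x_i \varepsilon_i}{\sum x_i^2} = \beta_1 + \frac{\sum x_i \varepsilon_i}{\sum x_i^2}.$$
Since $E(\varepsilon_i) = 0$,
$$E(b_1) = \beta_1 + \frac{\sum x_i E(\varepsilon_i)}{\sum x_i^2} = \beta_1.$$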
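Variance of b1
With $E(\varepsilon_i^2) = \sigma^2$ and $E(\varepsilon_i \varepsilon_j) = 0$ for $i \neq j$,
$$V(b_1) = E\left[(b_1 - E(b_1))^2\right] = E\left[\left(\frac{\sum x_i \varepsilon_i}{\sum x_i^2}\right)^2\right] = \frac{1}{\left(\sum x_i^2\right)^2} E\left[(x_1 \varepsilon_1 + x_2 \varepsilon_2 + \cdots + x_n \varepsilon_n)^2\right]$$
$$= \frac{1}{\left(\sum x_i^2\right)^2} \left[x_1^2 E(\varepsilon_1^2) + \cdots + x_n^2 E(\varepsilon_n^2) + 2 x_1 x_2 E(\varepsilon_1 \varepsilon_2) + \cdots + 2 x_{n-1} x_n E(\varepsilon_{n-1} \varepsilon_n)\right]$$
$$= \frac{1}{\left(\sum x_i^2\right)^2} \left(x_1^2 \sigma^2 + \cdots + x_n^2 \sigma^2\right) = \frac{\sigma^2}{\sum x_i^2}.$$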
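A Monte Carlo check of $E(b_1) = \beta_1$ and $V(b_1) = \sigma^2 / \sum x_i^2$ works as follows. Hold X fixed, redraw the errors many times, refit each time, and compare the empirical mean and variance of $b_1$ with the formulas.

This sketch assumes normal, independent, homoscedastic errors. The values $\beta_0 = 3$, $\beta_1 = 0.5$, $\sigma = 1$ and $X = 1, \ldots, 10$ are hypothetical choices, as are the helper names. Agreement is only approximate, with sampling error shrinking as `reps` grows.

```python
import random


def slope(xs, ys):
    """Least-squares slope b1 = sum(x*y) / sum(x^2) in deviation form."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
            / sum((x - x_bar) ** 2 for x in xs))


def simulate_b1(beta0, beta1, sigma, xs, reps=20000, seed=1):
    """Refit b1 on `reps` fresh samples with X held fixed; return its mean and variance."""
    rng = random.Random(seed)
    draws = []
    for _ in range(reps):
        ys = [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs]
        draws.append(slope(xs, ys))
    mean = sum(draws) / reps
    var = sum((b - mean) ** 2 for b in draws) / (reps - 1)
    return mean, var


xs = list(range(1, 11))
x_bar = sum(xs) / len(xs)
sigma = 1.0
theory_var = sigma ** 2 / sum((x - x_bar) ** 2 for x in xs)  # sigma^2 / sum(x_i^2)
mean_b1, var_b1 = simulate_b1(beta0=3.0, beta1=0.5, sigma=sigma, xs=xs)
```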
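Expectation of b0
$$E(b_0) = E(\bar{Y} - b_1 \bar{X}) = E(\bar{Y}) - \bar{X} E(b_1) = \frac{1}{n} \sum_{i=1}^{n} E(Y_i) - \beta_1 \bar{X} = \frac{1}{n} \sum_{i=1}^{n} (\beta_0 + \beta_1 X_i) - \beta_1 \bar{X} = \beta_0 + \beta_1 \bar{X} - \beta_1 \bar{X} = \beta_0.$$
Equivalently, $b_0$ is a linear combination of the $Y_i$:
$$b_0 = \bar{Y} - b_1 \bar{X} = \sum_{i=1}^{n} \frac{Y_i}{n} - \bar{X} \frac{\sum x_i Y_i}{\sum x_i^2} = \sum_{i=1}^{n} z_i Y_i, \qquad \text{where } z_i = \frac{1}{n} - \frac{x_i \bar{X}}{\sum x_i^2}.$$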
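The weights $z_i$ make the unbiasedness of $b_0$ easy to see, because $\sum z_i = 1$ and $\sum z_i X_i = 0$. The sketch below checks both facts and confirms that $\sum z_i Y_i$ reproduces $b_0$. The data and the name `z_weights` are hypothetical.

```python
def z_weights(xs):
    """Weights z_i = 1/n - x_i * Xbar / sum(x_i^2), so that b0 = sum(z_i * Y_i)."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    return [1 / n - (x - x_bar) * x_bar / sxx for x in xs]


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
z = z_weights(xs)
b0_linear = sum(zi * y for zi, y in zip(z, ys))

# sum(z_i) = 1 and sum(z_i * X_i) = 0, so
# E(b0) = sum(z_i * (beta0 + beta1 * X_i)) = beta0 * 1 + beta1 * 0 = beta0.
sum_z = sum(z)
sum_zx = sum(zi * x for zi, x in zip(z, xs))
```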
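Variance of b0
Let $z_i = \dfrac{1}{n} - \dfrac{x_i \bar{X}}{\sum x_i^2}$ with $x_i = X_i - \bar{X}$. Because the $Y_i$ are independent with $V(Y_i) = \sigma^2$,
$$V(b_0) = V\left(\sum z_i Y_i\right) = V(z_1 Y_1 + z_2 Y_2 + \cdots + z_n Y_n) = \sum z_i^2 V(Y_i) = \sigma^2 \sum z_i^2$$
$$= \sigma^2 \sum \left(\frac{1}{n^2} - \frac{2 x_i \bar{X}}{n \sum x_i^2} + \frac{x_i^2 \bar{X}^2}{\left(\sum x_i^2\right)^2}\right) = \sigma^2 \left(\frac{1}{n} + \frac{\bar{X}^2}{\sum x_i^2}\right) = \sigma^2 \frac{\sum x_i^2 + n \bar{X}^2}{n \sum x_i^2} = \frac{\sigma^2 \sum X_i^2}{n \sum x_i^2},$$
since the middle term sums to zero ($\sum x_i = 0$) and $\sum x_i^2 + n \bar{X}^2 = \sum X_i^2$.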
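The final simplification $\sigma^2 \sum z_i^2 = \sigma^2 \sum X_i^2 / (n \sum x_i^2)$ can be confirmed numerically. This is a sketch assuming $\sigma^2 = 1$ and hypothetical X values; `var_b0_two_ways` is an illustrative name.

```python
def var_b0_two_ways(xs, sigma2=1.0):
    """Compare sigma^2 * sum(z_i^2) with sigma^2 * sum(X_i^2) / (n * sum(x_i^2))."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    z = [1 / n - (x - x_bar) * x_bar / sxx for x in xs]
    via_weights = sigma2 * sum(zi ** 2 for zi in z)
    closed_form = sigma2 * sum(x * x for x in xs) / (n * sxx)
    return via_weights, closed_form


via_weights, closed_form = var_b0_two_ways([1, 2, 3, 4, 5])
```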
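Covariance of b0 and b1
Since $\bar{Y} = \beta_0 + \beta_1 \bar{X} + \bar{\varepsilon}$ and $b_0 = \bar{Y} - b_1 \bar{X}$, we have $b_0 - \beta_0 = \bar{\varepsilon} - \bar{X}(b_1 - \beta_1)$. Therefore
$$\operatorname{Cov}(b_0, b_1) = E\left[(b_0 - E(b_0))(b_1 - E(b_1))\right] = E\left[(b_0 - \beta_0)(b_1 - \beta_1)\right] = E\left[\bar{\varepsilon}(b_1 - \beta_1)\right] - \bar{X} E\left[(b_1 - \beta_1)^2\right].$$
The first term is zero, because
$$E\left[\bar{\varepsilon}(b_1 - \beta_1)\right] = \frac{1}{n \sum x_i^2} \sum_i \sum_j x_j E(\varepsilon_i \varepsilon_j) = \frac{\sigma^2 \sum x_j}{n \sum x_i^2} = 0,$$
so
$$\operatorname{Cov}(b_0, b_1) = -\bar{X} V(b_1) = -\frac{\bar{X} \sigma^2}{\sum x_i^2}.$$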
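The same simulation idea can check $V(b_0) = \sigma^2 \sum X_i^2 / (n \sum x_i^2)$ and $\operatorname{Cov}(b_0, b_1) = -\bar{X}\sigma^2 / \sum x_i^2$. This is a sketch assuming normal, independent, homoscedastic errors with $\sigma = 1$. The parameter values and helper names are hypothetical, and the agreement is approximate.

```python
import random


def fit(xs, ys):
    """Return (b0, b1) by least squares in deviation form."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    return y_bar - b1 * x_bar, b1


def simulate_b0_b1(beta0, beta1, sigma, xs, reps=20000, seed=7):
    """Empirical V(b0) and Cov(b0, b1) over repeated samples with X fixed."""
    rng = random.Random(seed)
    pairs = [fit(xs, [beta0 + beta1 * x + rng.gauss(0.0, sigma) for x in xs])
             for _ in range(reps)]
    m0 = sum(p[0] for p in pairs) / reps
    m1 = sum(p[1] for p in pairs) / reps
    var_b0 = sum((p[0] - m0) ** 2 for p in pairs) / (reps - 1)
    cov = sum((p[0] - m0) * (p[1] - m1) for p in pairs) / (reps - 1)
    return var_b0, cov


xs = list(range(1, 11))
n = len(xs)
x_bar = sum(xs) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
var_b0, cov_b0b1 = simulate_b0_b1(3.0, 0.5, 1.0, xs)
theory_var_b0 = sum(x * x for x in xs) / (n * sxx)  # sigma = 1
theory_cov = -x_bar / sxx
```

The negative covariance reflects the geometry of the fit. When $\bar{X} > 0$, a fitted line that happens to be too steep must cross the Y axis lower, so it still passes through $(\bar{X}, \bar{Y})$.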
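Expectation and Variance of $\hat{Y}_0$
For a given value $X_0$, $\hat{Y}_0 = b_0 + b_1 X_0$, so
$$E(\hat{Y}_0) = E(b_0) + X_0 E(b_1) = \beta_0 + \beta_1 X_0 = E(Y \mid X_0),$$
$$V(\hat{Y}_0) = V(b_0) + X_0^2 V(b_1) + 2 X_0 \operatorname{Cov}(b_0, b_1) = \frac{\sigma^2 \sum X_i^2}{n \sum x_i^2} + \frac{X_0^2 \sigma^2}{\sum x_i^2} - \frac{2 X_0 \bar{X} \sigma^2}{\sum x_i^2}$$
$$= \frac{\sigma^2}{n \sum x_i^2}\left[\sum x_i^2 + n \bar{X}^2 + n X_0^2 - 2 n X_0 \bar{X}\right] = \frac{\sigma^2}{n \sum x_i^2}\left[\sum x_i^2 + n (X_0 - \bar{X})^2\right] = \sigma^2 \left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2}\right].$$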
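Combining $V(b_0)$, $V(b_1)$ and $\operatorname{Cov}(b_0, b_1)$ should give the closed form $\sigma^2[1/n + (X_0 - \bar{X})^2 / \sum x_i^2]$. The sketch below checks this at two values of $X_0$, assuming $\sigma^2 = 1$ and hypothetical X values; `var_yhat0_two_ways` is an illustrative name.

```python
def var_yhat0_two_ways(xs, x0, sigma2=1.0):
    """V(Y_hat_0) from the component variances vs. the simplified closed form."""
    n = len(xs)
    x_bar = sum(xs) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    var_b0 = sigma2 * sum(x * x for x in xs) / (n * sxx)
    var_b1 = sigma2 / sxx
    cov_b0b1 = -x_bar * sigma2 / sxx
    combined = var_b0 + x0 ** 2 * var_b1 + 2 * x0 * cov_b0b1
    closed_form = sigma2 * (1 / n + (x0 - x_bar) ** 2 / sxx)
    return combined, closed_form


for x0 in (3, 6):
    print(x0, var_yhat0_two_ways([1, 2, 3, 4, 5], x0))
```

The variance is smallest at $X_0 = \bar{X}$ (here 3) and grows with $(X_0 - \bar{X})^2$. This is why estimates far from the centre of the data are less precise.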
Confidence Interval for the Mean Value of Y
• The confidence interval for the mean value of Y, $E(Y \mid X_0)$, at a given value $X_0$ of X is given by (p. 483):
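$$\hat{Y}_0 - z_{\alpha/2}\,\sigma_{\hat{Y}_0} \le E(Y \mid X_0) \le \hat{Y}_0 + z_{\alpha/2}\,\sigma_{\hat{Y}_0} \qquad \text{when } \sigma \text{ is known,}$$
$$\hat{Y}_0 - t_{\alpha/2,\,n-2}\,S_{\hat{Y}_0} \le E(Y \mid X_0) \le \hat{Y}_0 + t_{\alpha/2,\,n-2}\,S_{\hat{Y}_0} \qquad \text{when } \sigma \text{ is unknown,}$$
where
$$\sigma_{\hat{Y}_0} = \sigma \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_{i=1}^{n} x_i^2}}, \qquad S_{\hat{Y}_0} = S_e \sqrt{\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_{i=1}^{n} x_i^2}},$$
and $E(Y \mid X_0) = \mu_{Y \mid X_0}$.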
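A minimal sketch of the interval for $E(Y \mid X_0)$ when $\sigma$ is unknown. Python's standard library has no t-quantile function, so the critical value $t_{\alpha/2,\,n-2}$ is passed in. The value 3.182 is the tabulated $t_{0.025,\,3}$ for a 95% interval with $n - 2 = 3$ degrees of freedom. The data and the name `mean_response_ci` are hypothetical.

```python
import math


def mean_response_ci(xs, ys, x0, t_crit):
    """CI for E(Y | X0) when sigma is unknown; t_crit = t_{alpha/2, n-2}."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    s_e = math.sqrt(sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    y_hat0 = b0 + b1 * x0
    half_width = t_crit * s_e * math.sqrt(1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat0 - half_width, y_hat0 + half_width


# 3.182 is the tabulated t value for alpha/2 = 0.025 with n - 2 = 3 degrees of freedom.
low, high = mean_response_ci([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3, t_crit=3.182)
```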
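Prediction of Y0
Since $Y_0 = \beta_0 + \beta_1 X_0 + \varepsilon_0$ is unknown, we have to predict $Y_0$ by using $\hat{Y}_0$. $Y_0$ and $\hat{Y}_0$ are both normally distributed, so $Y_0 - \hat{Y}_0$ is also normally distributed, with
$$E(Y_0 - \hat{Y}_0) = E(Y_0) - E(\hat{Y}_0) = 0,$$
$$V(Y_0 - \hat{Y}_0) = V(Y_0) + V(\hat{Y}_0) = \sigma^2 + \sigma^2\left[\frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2}\right] = \sigma^2\left[1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2}\right],$$
where no covariance term appears because the new error $\varepsilon_0$ is independent of the sample used to compute $\hat{Y}_0$. Hence
$$\sigma_{Y_0 - \hat{Y}_0} = \sigma \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2}}, \qquad S_{Y_0 - \hat{Y}_0} = S_e \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum x_i^2}}.$$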
Prediction Interval of an Individual Value of Y0
• The prediction interval for an individual value of Y, $Y_0$, at a given value $X_0$ of X is given by (p. 484):
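$$\hat{Y}_0 - z_{\alpha/2}\,\sigma_{Y_0 - \hat{Y}_0} \le Y_0 \le \hat{Y}_0 + z_{\alpha/2}\,\sigma_{Y_0 - \hat{Y}_0} \qquad \text{when } \sigma \text{ is known,}$$
$$\hat{Y}_0 - t_{\alpha/2,\,n-2}\,S_{Y_0 - \hat{Y}_0} \le Y_0 \le \hat{Y}_0 + t_{\alpha/2,\,n-2}\,S_{Y_0 - \hat{Y}_0} \qquad \text{when } \sigma \text{ is unknown,}$$
where
$$\sigma_{Y_0 - \hat{Y}_0} = \sigma \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_{i=1}^{n} x_i^2}}, \qquad S_{Y_0 - \hat{Y}_0} = S_e \sqrt{1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum_{i=1}^{n} x_i^2}}.$$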
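The prediction interval differs from the confidence interval for the mean only by the extra "1 +" under the square root, so it is always wider. The sketch below has the same assumptions as a mean-response interval: the tabulated critical value $t_{0.025,\,3} = 3.182$ is passed in, and the data and the name `prediction_interval` are hypothetical.

```python
import math


def prediction_interval(xs, ys, x0, t_crit):
    """Interval for an individual Y0 at X0 when sigma is unknown; t_crit = t_{alpha/2, n-2}."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    b0 = y_bar - b1 * x_bar
    s_e = math.sqrt(sum((y - b0 - b1 * x) ** 2 for x, y in zip(xs, ys)) / (n - 2))
    y_hat0 = b0 + b1 * x0
    # The leading 1 accounts for the variance of the new error term eps_0.
    half_width = t_crit * s_e * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat0 - half_width, y_hat0 + half_width


low, high = prediction_interval([1, 2, 3, 4, 5], [2, 4, 5, 4, 5], x0=3, t_crit=3.182)
```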
Figure: Confidence Intervals for Estimation (intervals for Y | X and for E(Y | X), shown at X = 6.5)
The Coefficient of Correlation, r
• The coefficient of correlation (r) is a measure of the strength of the linear relationship between two variables.
  – It requires interval- or ratio-scaled data (variables).
  – It can range from -1.00 to 1.00.
  – Values of -1.00 or 1.00 indicate perfect correlation.
  – Values close to 0.0 indicate weak correlation.
  – Negative values indicate an inverse relationship, and positive values indicate a direct relationship.
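(Pearson Product-Moment) Correlation Coefficient
For a sample:
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2 \sum_{i=1}^{n} (Y_i - \bar{Y})^2}} = \frac{\sum X_i Y_i - \frac{\left(\sum X_i\right)\left(\sum Y_i\right)}{n}}{\sqrt{\left[\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{n}\right]\left[\sum Y_i^2 - \frac{\left(\sum Y_i\right)^2}{n}\right]}} = \frac{\operatorname{covar}(x, y)}{\sqrt{\operatorname{var}(x)\operatorname{var}(y)}}, \qquad -1 \le r \le 1.$$
For a population:
$$\rho = \frac{\sum_{i=1}^{N} (X_i - \mu_X)(Y_i - \mu_Y)}{\sqrt{\sum_{i=1}^{N} (X_i - \mu_X)^2 \sum_{i=1}^{N} (Y_i - \mu_Y)^2}} = \frac{\sum X_i Y_i - \frac{\left(\sum X_i\right)\left(\sum Y_i\right)}{N}}{\sqrt{\left[\sum X_i^2 - \frac{\left(\sum X_i\right)^2}{N}\right]\left[\sum Y_i^2 - \frac{\left(\sum Y_i\right)^2}{N}\right]}} = \frac{\operatorname{covar}(x, y)}{\sqrt{\operatorname{var}(x)\operatorname{var}(y)}}, \qquad -1 \le \rho \le 1.$$
(p. 489)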
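The deviation form and the raw-sum form of the sample correlation coefficient are algebraically identical. The sketch below computes both on hypothetical data. The function names are illustrative and do not come from any library.

```python
import math


def pearson_r(xs, ys):
    """Sample r from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def pearson_r_raw(xs, ys):
    """Sample r from the raw sums (computational formula)."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    num = sum(x * y for x, y in zip(xs, ys)) - sx * sy / n
    den = math.sqrt((sum(x * x for x in xs) - sx ** 2 / n)
                    * (sum(y * y for y in ys) - sy ** 2 / n))
    return num / den


xs, ys = [1, 2, 3, 4, 5], [2, 4, 5, 4, 5]
r_dev = pearson_r(xs, ys)
r_raw = pearson_r_raw(xs, ys)
```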
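Covariance
$$\sigma_{XY} = \frac{\sum (X - \mu_X)(Y - \mu_Y)}{N} = \frac{\sum XY - \frac{\left(\sum X\right)\left(\sum Y\right)}{N}}{N} = \frac{SS_{XY}}{N}$$
(p. 493)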
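The definitional and computational forms of the population covariance agree. The sketch below treats five hypothetical points as an entire population, so N = 5 and the divisor is N rather than n − 1. `population_covariance` is an illustrative name.

```python
def population_covariance(xs, ys):
    """Return sigma_XY computed two ways (divisor N, i.e. a full population)."""
    n = len(xs)
    mu_x, mu_y = sum(xs) / n, sum(ys) / n
    definitional = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    computational = ss_xy / n
    return definitional, computational


definitional, computational = population_covariance([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```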
Coefficient of regression and correlation
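$$R^2 = \frac{SSR}{SST} = \frac{\sum (\hat{Y}_i - \bar{Y})^2}{\sum (Y_i - \bar{Y})^2} = \frac{\sum \hat{y}_i^2}{\sum y_i^2} = \frac{b_1^2 \sum x_i^2}{\sum y_i^2} = \frac{\left(\sum x_i y_i\right)^2}{\sum x_i^2 \sum y_i^2} = r_{yx}^2,$$
where $x_i = X_i - \bar{X}$, $y_i = Y_i - \bar{Y}$, and $\hat{y}_i = \hat{Y}_i - \bar{Y} = b_1 x_i$.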
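The chain of equalities ends with $R^2 = r^2$. The sketch below computes $b_1^2 \sum x_i^2 / \sum y_i^2$ and $(\sum x_i y_i)^2 / (\sum x_i^2 \sum y_i^2)$ separately on hypothetical data. `r_squared_two_ways` is an illustrative name.

```python
def r_squared_two_ways(xs, ys):
    """Return (b1^2 * sum(x^2) / sum(y^2), r^2) using deviations x, y."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    b1 = sxy / sxx
    via_slope = b1 ** 2 * sxx / syy            # sum(yhat_i^2) / sum(y_i^2)
    via_correlation = sxy ** 2 / (sxx * syy)   # square of Pearson r
    return via_slope, via_correlation


via_slope, via_correlation = r_squared_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```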
F and t statistics
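$$F = \frac{MSR}{MSE} = \frac{\sum \hat{y}_i^2 / 1}{\sum e_i^2 / (n - 2)} = \frac{b_1^2 \sum x_i^2}{S_e^2} = \left(\frac{b_1}{S_e / \sqrt{\sum x_i^2}}\right)^2 = \left(\frac{b_1}{S_{b_1}}\right)^2 = t^2 \qquad (\text{for testing } H_0\colon \beta_1 = 0),$$
where $S_{b_1} = S_e / \sqrt{\sum x_i^2}$ and $t = b_1 / S_{b_1}$. Since $\sum \hat{y}_i^2 = r^2\, SST$ and $\sum e_i^2 = (1 - r^2)\, SST$ with $SST = \sum y_i^2$,
$$F = \frac{r^2 / 1}{(1 - r^2)/(n - 2)}, \qquad t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}} \qquad (\text{for testing the coefficient of regression}).$$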
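In simple regression the F statistic equals the square of the t statistic for $H_0\colon \beta_1 = 0$, and t can equally be computed from r. The sketch below checks both relationships on hypothetical data. It uses $SSE = SST - SSR$ from the ANOVA decomposition, and `f_and_t` is an illustrative name.

```python
import math


def f_and_t(xs, ys):
    """Return (F, t, t computed from r) for H0: beta1 = 0 in simple regression."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)   # SST
    b1 = sxy / sxx
    ssr = b1 ** 2 * sxx                        # sum of yhat_i^2
    mse = (syy - ssr) / (n - 2)                # SSE / (n - 2) = S_e^2
    f = (ssr / 1) / mse                        # MSR / MSE with 1 df for regression
    t = b1 / math.sqrt(mse / sxx)              # b1 / S_b1
    r = sxy / math.sqrt(sxx * syy)
    t_from_r = r / math.sqrt((1 - r ** 2) / (n - 2))
    return f, t, t_from_r


f, t, t_from_r = f_and_t([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
```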
The Simple Regression Model-Matrix
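Sample model:
$$Y_i = b_0 + b_1 X_i + e_i, \qquad \hat{Y}_i = b_0 + b_1 X_i, \qquad e_i = Y_i - \hat{Y}_i, \qquad i = 1, \ldots, n.$$
That is,
$$Y_1 = b_0 + b_1 X_1 + e_1,\quad Y_2 = b_0 + b_1 X_2 + e_2,\quad \ldots,\quad Y_n = b_0 + b_1 X_n + e_n.$$
Denote
$$\mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix}, \quad \mathbf{X} = \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix}, \quad \mathbf{b} = \begin{bmatrix} b_0 \\ b_1 \end{bmatrix}, \quad \mathbf{e} = \begin{bmatrix} e_1 \\ e_2 \\ \vdots \\ e_n \end{bmatrix},$$
so that
$$\mathbf{Y} = \mathbf{X}\mathbf{b} + \mathbf{e}, \qquad \hat{\mathbf{Y}} = \mathbf{X}\mathbf{b}.$$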
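$$\min_{\mathbf{b}} SSE, \qquad SSE = \sum_{i=1}^{n} e_i^2 = \mathbf{e}'\mathbf{e} = (\mathbf{Y} - \mathbf{X}\mathbf{b})'(\mathbf{Y} - \mathbf{X}\mathbf{b}) = \mathbf{Y}'\mathbf{Y} - 2\mathbf{b}'\mathbf{X}'\mathbf{Y} + \mathbf{b}'\mathbf{X}'\mathbf{X}\mathbf{b}$$
$$\frac{\partial SSE}{\partial \mathbf{b}} = -2\mathbf{X}'\mathbf{Y} + 2\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{0} \;\Rightarrow\; \mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y} \;\Rightarrow\; \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$
$$\hat{\mathbf{Y}} = \mathbf{X}\mathbf{b}, \qquad \mathbf{e} = \mathbf{Y} - \hat{\mathbf{Y}} = \mathbf{Y} - \mathbf{X}\mathbf{b}$$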
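The matrix solution $\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$ can be evaluated literally with plain lists. The sketch below builds $\mathbf{X}$ with a column of ones, as in the definition, and inverts the 2×2 matrix $\mathbf{X}'\mathbf{X}$ explicitly. The helpers `transpose`, `matmul` and `inverse_2x2` and the data are hypothetical illustrations.

```python
def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


def matmul(a, b):
    """Multiply matrices a (p x q) and b (q x r) stored as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def inverse_2x2(m):
    """Inverse of a 2 x 2 matrix via the adjugate formula."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det], [-m[1][0] / det, m[0][0] / det]]


xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
X = [[1, x] for x in xs]        # n x 2 design matrix: column of ones, then X_i
Y = [[y] for y in ys]           # n x 1 response vector
Xt = transpose(X)
b = matmul(matmul(inverse_2x2(matmul(Xt, X)), Xt), Y)  # (X'X)^{-1} X'Y, a 2 x 1 column
e = [Y[i][0] - (X[i][0] * b[0][0] + X[i][1] * b[1][0]) for i in range(len(xs))]
```

Forming the explicit inverse is fine for a 2×2 illustration. Numerical software usually solves the normal equations $\mathbf{X}'\mathbf{X}\mathbf{b} = \mathbf{X}'\mathbf{Y}$ with a linear solver or a QR decomposition instead.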
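(1) Population model:
$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i, \qquad i = 1, \ldots, n.$$
That is,
$$\begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix} = \begin{bmatrix} 1 & X_1 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \vdots \\ \varepsilon_n \end{bmatrix},$$
which in matrix form becomes
$$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}.$$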
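(2) By OLS on the sample model,
$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon}) = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon},$$
so, assuming $E(\boldsymbol{\varepsilon}) = \mathbf{0}$ with $\mathbf{X}$ fixed,
$$E(\mathbf{b}) = \boldsymbol{\beta} + (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\boldsymbol{\varepsilon}) = \boldsymbol{\beta}.$$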
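(3) Assuming $E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}') = \sigma^2 \mathbf{I}_n$,
$$V(\mathbf{b}) = E\left[(\mathbf{b} - \boldsymbol{\beta})(\mathbf{b} - \boldsymbol{\beta})'\right] = E\left[(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}\right] = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'E(\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}')\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1}$$
$$= (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'(\sigma^2 \mathbf{I}_n)\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{X}(\mathbf{X}'\mathbf{X})^{-1} = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1}.$$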
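(4)
$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} 1 & 1 & \cdots & 1 \\ X_1 & X_2 & \cdots & X_n \end{bmatrix} \begin{bmatrix} 1 & X_1 \\ 1 & X_2 \\ \vdots & \vdots \\ 1 & X_n \end{bmatrix} = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix},$$
$$(\mathbf{X}'\mathbf{X})^{-1} = \frac{1}{n \sum X_i^2 - \left(\sum X_i\right)^2} \begin{bmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & n \end{bmatrix} = \frac{1}{n \sum x_i^2} \begin{bmatrix} \sum X_i^2 & -n\bar{X} \\ -n\bar{X} & n \end{bmatrix} = \begin{bmatrix} \dfrac{\sum X_i^2}{n \sum x_i^2} & -\dfrac{\bar{X}}{\sum x_i^2} \\ -\dfrac{\bar{X}}{\sum x_i^2} & \dfrac{1}{\sum x_i^2} \end{bmatrix},$$
where $x_i = X_i - \bar{X}$ and $n \sum X_i^2 - \left(\sum X_i\right)^2 = n \sum x_i^2$.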
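From (3) and (4),
$$\begin{bmatrix} V(b_0) & \operatorname{Cov}(b_0, b_1) \\ \operatorname{Cov}(b_1, b_0) & V(b_1) \end{bmatrix} = \begin{bmatrix} E(b_0 - \beta_0)^2 & E[(b_0 - \beta_0)(b_1 - \beta_1)] \\ E[(b_1 - \beta_1)(b_0 - \beta_0)] & E(b_1 - \beta_1)^2 \end{bmatrix} = \sigma^2 (\mathbf{X}'\mathbf{X})^{-1},$$
so
$$V(b_0) = \frac{\sigma^2 \sum X_i^2}{n \sum x_i^2}, \qquad V(b_1) = \frac{\sigma^2}{\sum x_i^2}, \qquad \operatorname{Cov}(b_0, b_1) = -\frac{\bar{X} \sigma^2}{\sum x_i^2}.$$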
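The matrix result $\sigma^2(\mathbf{X}'\mathbf{X})^{-1}$ should reproduce, entry by entry, the scalar formulas derived earlier for $V(b_0)$, $V(b_1)$ and $\operatorname{Cov}(b_0, b_1)$. This is a sketch assuming $\sigma^2 = 1$ and hypothetical X values; `coefficient_covariance` is an illustrative name.

```python
def coefficient_covariance(xs, sigma2):
    """sigma^2 * (X'X)^{-1} for the simple-regression design matrix [1, X_i]."""
    n = len(xs)
    sum_x = sum(xs)
    sum_x2 = sum(x * x for x in xs)
    det = n * sum_x2 - sum_x ** 2  # equals n * sum(x_i^2), with x_i = X_i - Xbar
    return [[sigma2 * sum_x2 / det, -sigma2 * sum_x / det],
            [-sigma2 * sum_x / det, sigma2 * n / det]]


xs = [1, 2, 3, 4, 5]
sigma2 = 1.0
V = coefficient_covariance(xs, sigma2)

n = len(xs)
x_bar = sum(xs) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
scalar_forms = (sigma2 * sum(x * x for x in xs) / (n * sxx),  # V(b0)
                sigma2 / sxx,                                 # V(b1)
                -x_bar * sigma2 / sxx)                        # Cov(b0, b1)
```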