Linear Regression
Machine Learning – CSE546, Kevin Jamieson, University of Washington
Oct 5, 2017
©2017 Kevin Jamieson
The regression problem

[Scatter plot: Sale Price vs. # square feet]

Given past sales data on zillow.com, predict y = house sale price from x = {# sq. ft., zip code, date of sale, etc.}.

Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$
The regression problem

[Scatter plot: Sale Price vs. # square feet, with the best linear fit overlaid]

Given past sales data on zillow.com, predict y = house sale price from x = {# sq. ft., zip code, date of sale, etc.}.

Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$

Hypothesis: linear, $y_i \approx x_i^T w$

Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$
The regression problem in matrix notation

$y = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}, \qquad X = \begin{bmatrix} x_1^T \\ \vdots \\ x_n^T \end{bmatrix}$

$\widehat{w}_{LS} = \arg\min_w \sum_{i=1}^n (y_i - x_i^T w)^2 = \arg\min_w (y - Xw)^T (y - Xw)$
Equivalently, $\widehat{w}_{LS} = \arg\min_w \|y - Xw\|_2^2$
Setting the gradient to zero gives the closed form $\widehat{w}_{LS} = (X^T X)^{-1} X^T y$.

What about an offset?

$\widehat{w}_{LS}, \widehat{b}_{LS} = \arg\min_{w,b} \sum_{i=1}^n \left(y_i - (x_i^T w + b)\right)^2 = \arg\min_{w,b} \|y - (Xw + \mathbf{1}b)\|_2^2$
Dealing with an offset

$\widehat{w}_{LS}, \widehat{b}_{LS} = \arg\min_{w,b} \|y - (Xw + \mathbf{1}b)\|_2^2$
Setting the gradients with respect to w and b to zero gives the normal equations

$X^T X \widehat{w}_{LS} + \widehat{b}_{LS} X^T \mathbf{1} = X^T y$
$\mathbf{1}^T X \widehat{w}_{LS} + \widehat{b}_{LS} \mathbf{1}^T \mathbf{1} = \mathbf{1}^T y$

If $X^T \mathbf{1} = 0$ (i.e., if each feature is mean-zero), these decouple:

$\widehat{w}_{LS} = (X^T X)^{-1} X^T y, \qquad \widehat{b}_{LS} = \frac{1}{n} \sum_{i=1}^n y_i$
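The centering trick above can be sketched in a few lines of numpy (synthetic data; the variable names are mine, not the slides'): center each feature, solve for w, then recover the offset, and cross-check against solving jointly with an explicit all-ones column.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 200, 2
X = rng.normal(loc=5.0, size=(n, d))          # features, deliberately not mean-zero
y = X @ np.array([1.5, -2.0]) + 3.0 + 0.1 * rng.normal(size=n)

# Center each feature so that X_c^T 1 = 0, then solve for w and b separately.
X_c = X - X.mean(axis=0)
w_hat = np.linalg.solve(X_c.T @ X_c, X_c.T @ y)
b_hat = y.mean() - X.mean(axis=0) @ w_hat     # in centered coordinates the offset is mean(y)

# Cross-check: solve jointly with an explicit all-ones column for the offset.
X_aug = np.hstack([X, np.ones((n, 1))])
wb, *_ = np.linalg.lstsq(X_aug, y, rcond=None)
print(np.allclose(w_hat, wb[:d]), np.isclose(b_hat, wb[d]))  # True True
```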
But why least squares?

Consider $y_i = x_i^T w + \epsilon_i$ where $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$. Then

$P(y \mid x, w, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(y - x^T w)^2 / (2\sigma^2)}$
Maximizing the log-likelihood

Maximize:

$\log P(D \mid w, \sigma) = \log \left[ \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n} \prod_{i=1}^n e^{-(y_i - x_i^T w)^2 / (2\sigma^2)} \right] = n \log \frac{1}{\sqrt{2\pi}\,\sigma} - \frac{1}{2\sigma^2} \sum_{i=1}^n (y_i - x_i^T w)^2$

The first term does not depend on w, so maximizing the log-likelihood over w is the same as minimizing the sum of squared residuals.
MLE is LS under the linear model

If $y_i = x_i^T w + \epsilon_i$ with $\epsilon_i \overset{i.i.d.}{\sim} \mathcal{N}(0, \sigma^2)$, then

$\widehat{w}_{LS} = \arg\min_w \sum_{i=1}^n (y_i - x_i^T w)^2, \qquad \widehat{w}_{MLE} = \arg\max_w P(D \mid w, \sigma)$

$\widehat{w}_{LS} = \widehat{w}_{MLE} = (X^T X)^{-1} X^T y$
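A quick numerical sanity check of this equivalence (my own sketch, not from the slides): since the negative log-likelihood in w is the squared error up to scale, gradient descent on the squared error should recover the closed-form least-squares solution.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 150, 3
X = rng.normal(size=(n, d))
y = X @ np.array([1.0, -0.5, 2.0]) + 0.2 * rng.normal(size=n)

w_ls = np.linalg.solve(X.T @ X, X.T @ y)      # closed-form least squares

# Gradient of -log P(D | w, sigma) in w is X^T (X w - y) / sigma^2; the sigma^2
# factor only rescales the step size, so we descend on the squared-error gradient.
w = np.zeros(d)
step = 1.0 / np.linalg.eigvalsh(X.T @ X).max()
for _ in range(5000):
    w -= step * (X.T @ (X @ w - y))
print(np.allclose(w, w_ls, atol=1e-6))  # True
```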
The regression problem

[Scatter plot: Sale Price vs. date of sale, with the best linear fit overlaid]

Given past sales data on zillow.com, predict y = house sale price from x = {# sq. ft., zip code, date of sale, etc.}.

Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$

Hypothesis: linear, $y_i \approx x_i^T w$

Loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$
The regression problem

Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$

Hypothesis: linear, $y_i \approx x_i^T w$; loss: least squares, $\min_w \sum_{i=1}^n (y_i - x_i^T w)^2$

Transformed data: $h : \mathbb{R}^d \to \mathbb{R}^p$ maps the original features to a rich, possibly high-dimensional space. In d = 1:

$h(x) = \begin{bmatrix} h_1(x) \\ h_2(x) \\ \vdots \\ h_p(x) \end{bmatrix} = \begin{bmatrix} x \\ x^2 \\ \vdots \\ x^p \end{bmatrix}$

For d > 1, generate $\{u_j\}_{j=1}^p \subset \mathbb{R}^d$ and use, e.g.,

$h_j(x) = \frac{1}{1 + \exp(u_j^T x)}, \qquad h_j(x) = (u_j^T x)^2, \qquad h_j(x) = \cos(u_j^T x)$
The regression problem

Training data: $\{(x_i, y_i)\}_{i=1}^n$ with $x_i \in \mathbb{R}^d$, $y_i \in \mathbb{R}$; transformed data: $h(x) = [h_1(x), h_2(x), \ldots, h_p(x)]^T$

Hypothesis: linear in the transformed features, $y_i \approx h(x_i)^T w$ with $w \in \mathbb{R}^p$

Loss: least squares, $\min_w \sum_{i=1}^n \left(y_i - h(x_i)^T w\right)^2$
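Lifting a scalar input through the monomial map $h(x) = [x, x^2, \ldots, x^p]$ and running ordinary least squares on the transformed features can be sketched as follows (synthetic data; the cubic ground truth and all names are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
x = rng.uniform(-1, 1, size=n)
y = -2.0 * x + 0.5 * x**3 + 0.05 * rng.normal(size=n)  # cubic ground truth, no offset

def h(x, p):
    """Map scalars x to the monomial features [x, x^2, ..., x^p]."""
    return np.vander(x, N=p + 1, increasing=True)[:, 1:]

H = h(x, p)                                    # n x p design matrix
w_hat = np.linalg.solve(H.T @ H, H.T @ y)      # least squares in the lifted space
train_err = np.mean((y - H @ w_hat) ** 2)
print(w_hat, train_err)
```

The fit is still linear in w; only the features are nonlinear in x, which is why the same closed form applies.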
The regression problem

Hypothesis: $y_i \approx h(x_i)^T w$, $w \in \mathbb{R}^p$; loss: least squares, $\min_w \sum_{i=1}^n (y_i - h(x_i)^T w)^2$

[Three scatter plots of Sale Price vs. date of sale: the best linear fit, a small-p fit, and a large-p fit that wiggles through every data point]

What's going on here?
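The "large p" behavior can be reproduced in a few lines (a sketch on synthetic data; the sine ground truth, seed, and sample size are my assumptions): with p = n - 1 monomial features plus a constant, the least-squares fit can interpolate the training points exactly, whatever the noise.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 12
x = np.linspace(-1, 1, n)
y = np.sin(np.pi * x) + 0.3 * rng.normal(size=n)

def train_error(p):
    H = np.vander(x, N=p + 1, increasing=True)   # features [1, x, ..., x^p]
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    return np.mean((y - H @ w) ** 2)

# Train error shrinks as p grows; p = n - 1 interpolates the data.
errs = [train_error(p) for p in (1, 3, n - 1)]
print(errs)
```

Driving the training error to zero says nothing about how the fit behaves between or beyond the data points, which is the question the next section takes up.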
Bias-Variance Tradeoff
Statistical Learning

[Scatter plot of (x, y) pairs drawn from a joint density $P_{XY}(X = x, Y = y)$, with the conditional densities $P_{XY}(Y = y \mid X = x_0)$ and $P_{XY}(Y = y \mid X = x_1)$ highlighted at two points $x_0, x_1$]

Ideally, we want to find $\eta(x) = \mathbb{E}_{XY}[Y \mid X = x]$.

But we only have samples $(x_i, y_i) \overset{i.i.d.}{\sim} P_{XY}$ for $i = 1, \ldots, n$, and are restricted to a function class $\mathcal{F}$ (e.g., linear), so we compute:

$\widehat{f} = \arg\min_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n (y_i - f(x_i))^2$

We care about future predictions: $\mathbb{E}_{XY}[(Y - \widehat{f}(X))^2]$

Each draw $D = \{(x_i, y_i)\}_{i=1}^n$ results in a different $\widehat{f}$; averaging over draws gives $\mathbb{E}_D[\widehat{f}(x)]$.
Bias-Variance Tradeoff

Add and subtract $\eta(x)$ inside the square:

$\mathbb{E}_{Y|X=x}\big[\mathbb{E}_D[(Y - \widehat{f}_D(x))^2]\big] = \mathbb{E}_{Y|X=x}\big[\mathbb{E}_D[(Y - \eta(x) + \eta(x) - \widehat{f}_D(x))^2]\big]$
Expanding the square, the cross term vanishes because, conditioned on $X = x$, $\mathbb{E}[Y - \eta(x)] = 0$ and $Y$ is independent of $D$:

$\mathbb{E}_{XY}\big[\mathbb{E}_D[(Y - \widehat{f}_D(x))^2] \,\big|\, X = x\big] = \mathbb{E}_{XY}\big[\mathbb{E}_D[(Y - \eta(x))^2 + 2(Y - \eta(x))(\eta(x) - \widehat{f}_D(x)) + (\eta(x) - \widehat{f}_D(x))^2] \,\big|\, X = x\big]$
$\qquad = \mathbb{E}_{XY}\big[(Y - \eta(x))^2 \,\big|\, X = x\big] + \mathbb{E}_D\big[(\eta(x) - \widehat{f}_D(x))^2\big]$

The first term is the irreducible error, caused by stochastic label noise. The second is the learning error, caused by either using too "simple" a model or not having enough data to learn the model accurately.
Bias-Variance Tradeoff

Decompose the learning error the same way, this time around $\mathbb{E}_D[\widehat{f}_D(x)]$:

$\mathbb{E}_D[(\eta(x) - \widehat{f}_D(x))^2] = \mathbb{E}_D\big[(\eta(x) - \mathbb{E}_D[\widehat{f}_D(x)] + \mathbb{E}_D[\widehat{f}_D(x)] - \widehat{f}_D(x))^2\big]$
Again the cross term vanishes, since $\mathbb{E}_D\big[\mathbb{E}_D[\widehat{f}_D(x)] - \widehat{f}_D(x)\big] = 0$:

$= \mathbb{E}_D\big[(\eta(x) - \mathbb{E}_D[\widehat{f}_D(x)])^2 + 2(\eta(x) - \mathbb{E}_D[\widehat{f}_D(x)])(\mathbb{E}_D[\widehat{f}_D(x)] - \widehat{f}_D(x)) + (\mathbb{E}_D[\widehat{f}_D(x)] - \widehat{f}_D(x))^2\big]$
$= \underbrace{(\eta(x) - \mathbb{E}_D[\widehat{f}_D(x)])^2}_{\text{bias squared}} + \underbrace{\mathbb{E}_D\big[(\mathbb{E}_D[\widehat{f}_D(x)] - \widehat{f}_D(x))^2\big]}_{\text{variance}}$
Bias-Variance Tradeoff

$\mathbb{E}_{XY}\big[\mathbb{E}_D[(Y - \widehat{f}_D(x))^2] \,\big|\, X = x\big] = \underbrace{\mathbb{E}_{XY}\big[(Y - \eta(x))^2 \,\big|\, X = x\big]}_{\text{irreducible error}} + \underbrace{(\eta(x) - \mathbb{E}_D[\widehat{f}_D(x)])^2}_{\text{bias squared}} + \underbrace{\mathbb{E}_D\big[(\mathbb{E}_D[\widehat{f}_D(x)] - \widehat{f}_D(x))^2\big]}_{\text{variance}}$

Model too simple → high bias, cannot fit well to the data.
Model too complex → high variance, small changes in the data change the learned function a lot.
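The learning-error part of the decomposition can be verified by Monte Carlo (my own sketch; the sine ground truth, fixed test point, and linear model class are assumptions): draw many training sets D, fit a line each time, and check at a fixed $x_0$ that $\mathbb{E}_D[(\eta(x_0) - \widehat{f}_D(x_0))^2]$ equals bias squared plus variance.

```python
import numpy as np

rng = np.random.default_rng(5)
eta = lambda x: np.sin(np.pi * x)             # true regression function eta(x)
x0, n, trials = 0.5, 20, 2000

preds = np.empty(trials)
for t in range(trials):
    x = rng.uniform(-1, 1, size=n)
    y = eta(x) + 0.2 * rng.normal(size=n)     # one draw of D
    H = np.vander(x, N=2, increasing=True)    # fit f(x) = w0 + w1 * x
    w, *_ = np.linalg.lstsq(H, y, rcond=None)
    preds[t] = w[0] + w[1] * x0               # f_D(x0)

learning_err = np.mean((eta(x0) - preds) ** 2)
bias_sq = (eta(x0) - preds.mean()) ** 2
variance = preds.var()
print(np.isclose(learning_err, bias_sq + variance))  # True
```

Here the bias term is large because a line cannot represent the sine; adding capacity would shrink the bias at the price of variance.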
Overfitting
Bias-Variance Tradeoff

■ The choice of hypothesis class introduces a learning bias: a more complex class means less bias, but also more variance.
■ But what happens in practice?
■ Earlier we saw how enlarging the feature space increases the complexity of the learned estimator. With nested classes $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \mathcal{F}_3 \subset \ldots$, complexity grows as k grows:

$\widehat{f}^{(k)}_D = \arg\min_{f \in \mathcal{F}_k} \frac{1}{|D|} \sum_{(x_i, y_i) \in D} (y_i - f(x_i))^2$
Training set error as a function of model complexity

With $D \overset{i.i.d.}{\sim} P_{XY}$ and nested classes $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \mathcal{F}_3 \subset \ldots$:

$\widehat{f}^{(k)}_D = \arg\min_{f \in \mathcal{F}_k} \frac{1}{|D|} \sum_{(x_i, y_i) \in D} (y_i - f(x_i))^2$

TRAIN error: $\frac{1}{|D|} \sum_{(x_i, y_i) \in D} \big(y_i - \widehat{f}^{(k)}_D(x_i)\big)^2$

TRUE error: $\mathbb{E}_{XY}\big[(Y - \widehat{f}^{(k)}_D(X))^2\big]$
Now draw a second sample $T \overset{i.i.d.}{\sim} P_{XY}$ with $D \cap T = \emptyset$ (important!), and define the

TEST error: $\frac{1}{|T|} \sum_{(x_i, y_i) \in T} \big(y_i - \widehat{f}^{(k)}_D(x_i)\big)^2$

[Plot: TRAIN, TEST, and TRUE error vs. complexity k]
[Plot from Hastie et al.: train and test error curves vs. model complexity; each line is an i.i.d. draw of D or T]
Training set error as a function of model complexity

TRAIN error is optimistically biased because it is evaluated on the data the model was trained on. TEST error is unbiased only if T is never used to train the model or even to pick the complexity k.
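The train/test curves above can be reproduced with nested polynomial classes $\mathcal{F}_1 \subset \mathcal{F}_2 \subset \ldots$ (a sketch on synthetic data; the sine ground truth, sample sizes, and degree range are assumptions): fit on D and evaluate on a held-out T drawn from the same distribution.

```python
import numpy as np

rng = np.random.default_rng(6)

def sample(m):
    x = rng.uniform(-1, 1, size=m)
    return x, np.sin(np.pi * x) + 0.3 * rng.normal(size=m)

x_tr, y_tr = sample(30)    # D
x_te, y_te = sample(1000)  # T, a disjoint draw from P_XY

train_err, test_err = [], []
for k in range(1, 16):     # F_k = polynomials of degree k
    H = np.vander(x_tr, N=k + 1, increasing=True)
    w, *_ = np.linalg.lstsq(H, y_tr, rcond=None)
    train_err.append(np.mean((y_tr - H @ w) ** 2))
    H_te = np.vander(x_te, N=k + 1, increasing=True)
    test_err.append(np.mean((y_te - H_te @ w) ** 2))

# Train error only decreases with k; the overfit model's test error stays
# above its (optimistic) train error.
print(train_err[0] > train_err[-1], test_err[-1] > train_err[-1])  # True True
```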
Test set error

■ Given a dataset, randomly split it into two parts: training data D and test data T, with $D \cap T = \emptyset$ (important!).
■ Use the training data to learn the predictor, e.g., by minimizing $\frac{1}{|D|} \sum_{(x_i, y_i) \in D} \big(y_i - \widehat{f}^{(k)}_D(x_i)\big)^2$; also use the training data to pick the complexity k (next lecture).
■ Use the test data to report predicted performance: $\frac{1}{|T|} \sum_{(x_i, y_i) \in T} \big(y_i - \widehat{f}^{(k)}_D(x_i)\big)^2$
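The split itself is simple to get right (a minimal sketch; the 80/20 ratio and all names are illustrative): permute the indices once, carve off a test set, and never let T touch training or model selection.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 100
X = rng.normal(size=(n, 3))
y = rng.normal(size=n)

perm = rng.permutation(n)
n_test = n // 5                       # e.g. an 80/20 split
test_idx, train_idx = perm[:n_test], perm[n_test:]

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]

# D and T are disjoint and together cover the data exactly once.
print(len(train_idx), len(test_idx))              # 80 20
print(set(train_idx).isdisjoint(set(test_idx)))   # True
```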
Overfitting

■ Overfitting: a learning algorithm overfits the training data if it outputs a solution w when there exists another solution w′ such that w has smaller training error than w′, yet w′ has smaller true error than w.
How many points do I use for training/testing?

■ A very hard question to answer! With too few training points the learned model is bad; with too few test points you never know whether you have reached a good solution.
■ Bounds such as Hoeffding's inequality can help: for a loss bounded in [0, 1], with probability at least $1 - \delta$ the test error of a fixed predictor deviates from its true error by at most $\sqrt{\log(2/\delta) / (2|T|)}$.
■ More on this later this quarter, but it is still hard to answer. Typically: if you have a reasonable amount of data, 90/10 splits are common; if you have little data, you need to get fancy (e.g., bootstrapping).
Recap

■ Learning is…
  Collect some data, e.g., housing info and sale prices.
  Randomly split the dataset into TRAIN and TEST, e.g., 80% and 20%, respectively.
  Choose a hypothesis class or model, e.g., linear.
  Choose a loss function, e.g., least squares.
  Choose an optimization procedure, e.g., set the derivative to zero to obtain the estimator.
  Justify the accuracy of the estimate, e.g., report TEST error.