Simple Linear Regression (single variable)
Introduction to Machine Learning
Marek Petrik
January 31, 2017
Some of the figures in this presentation are taken from "An Introduction to Statistical Learning, with Applications in R" (Springer, 2013) with permission from the authors: G. James, D. Witten, T. Hastie and R. Tibshirani
Last Class
1. Basic machine learning framework
Y = f(X)
2. Prediction vs inference: predict Y vs understand f
3. Parametric vs non-parametric: linear regression vs k-NN
4. Classification vs regression: k-NN vs linear regression
5. Why we need a test set: overfitting
What is Machine Learning

- Discover the unknown function f:

Y = f(X)

- X = set of features, or inputs
- Y = target, or response
[Figure: scatterplots of Sales against TV, Radio, and Newspaper advertising budgets]

Sales = f(TV, Radio, Newspaper)
Errors in Machine Learning: World is Noisy
- World is too complex to model precisely
- Many features are not captured in data sets
- Need to allow for errors ε in f:
Y = f(X) + ε
How Good are Predictions?
- Learned function f
- Test data: (x_1, y_1), (x_2, y_2), . . .
- Mean Squared Error (MSE):

MSE = (1/n) ∑_{i=1}^n (y_i − f(x_i))²

- This is an estimate of:

MSE = E[(Y − f(X))²] = (1/|Ω|) ∑_{ω∈Ω} (Y(ω) − f(X(ω)))²

- Important: the samples x_i are i.i.d.
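The test-set MSE formula above can be computed directly. The model and test points below are made-up numbers for illustration, not data from the slides:

```python
import numpy as np

# Hypothetical learned model and a tiny test set (illustrative only)
def f(x):
    return 2.0 * x + 1.0

x_test = np.array([0.0, 1.0, 2.0, 3.0])
y_test = np.array([1.0, 3.5, 4.5, 7.0])

# MSE = (1/n) * sum_i (y_i - f(x_i))^2
mse = np.mean((y_test - f(x_test)) ** 2)
print(mse)  # 0.125
```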
Do We Need Test Data?

- Why not just test on the training data?
[Figure: left, data (X, Y) with polynomial fits of increasing flexibility; right, Mean Squared Error versus Flexibility]
- Flexibility is the degree of the polynomial being fit
- Gray line: training error; red line: testing error
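The same effect shows up in a small simulation (synthetic data and degrees chosen here for illustration): training MSE keeps falling as the polynomial degree grows, while test MSE eventually rises again.

```python
import numpy as np

rng = np.random.default_rng(0)

def truth(x):
    return np.sin(x)

# Synthetic train/test data: a smooth signal plus noise
x_train = rng.uniform(0, 6, 30)
y_train = truth(x_train) + rng.normal(0, 0.3, 30)
x_test = rng.uniform(0, 6, 200)
y_test = truth(x_test) + rng.normal(0, 0.3, 200)

def mse(y, y_hat):
    return np.mean((y - y_hat) ** 2)

train_errs, test_errs = [], []
for degree in (1, 3, 10):  # increasing flexibility
    coefs = np.polyfit(x_train, y_train, degree)
    train_errs.append(mse(y_train, np.polyval(coefs, x_train)))
    test_errs.append(mse(y_test, np.polyval(coefs, x_test)))

print(train_errs)  # non-increasing with degree
print(test_errs)
```

Because the polynomial families are nested, the training error can only go down with degree; the test error is the one that reveals overfitting.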
Types of Function f
Regression: continuous target
f : X → R
[Figure: regression surface of Income over Years of Education and Seniority]
Classification: discrete target
f : X → {1, 2, 3, . . . , k}
[Figure: classification example with points in features X1 and X2]
Bias-Variance Decomposition
Y = f(X) + ε
Mean Squared Error can be decomposed as:
MSE = E[(Y − f̂(X))²] = Var(f̂(X)) + (E[f̂(X)] − f(X))² + Var(ε)

where the first term is the variance of the estimate f̂ and the second is its squared bias.

- Bias: how well the method would work with infinite data
- Variance: how much the output changes with different data sets
Today
- Basics of linear regression
- Why linear regression
- How to compute it
- Why compute it
Simple Linear Regression

- We have only one feature:

Y ≈ β0 + β1X        Y = β0 + β1X + ε

- Example:
[Figure: scatterplot of Sales versus TV advertising]

Sales ≈ β0 + β1 × TV
How To Estimate Coefficients

- No line will fit the data x_i without errors
- Prediction:

ŷ_i = β0 + β1x_i

- Errors (y_i are the true values):

e_i = y_i − ŷ_i
[Figure: Sales versus TV with the fitted line and residuals]
Residual Sum of Squares

- Residual Sum of Squares:

RSS = e_1² + e_2² + e_3² + · · · + e_n² = ∑_{i=1}^n e_i²

- Equivalently:

RSS = ∑_{i=1}^n (y_i − β0 − β1x_i)²
Minimizing Residual Sum of Squares

min_{β0,β1} RSS = min_{β0,β1} ∑_{i=1}^n e_i² = min_{β0,β1} ∑_{i=1}^n (y_i − β0 − β1x_i)²

[Figure: RSS as a surface over β0 and β1]
Solving for Minimal RSS

min_{β0,β1} ∑_{i=1}^n (y_i − β0 − β1x_i)²

- RSS is a convex function of β0, β1
- Minimum achieved when (recall the chain rule):

∂RSS/∂β0 = −2 ∑_{i=1}^n (y_i − β0 − β1x_i) = 0

∂RSS/∂β1 = −2 ∑_{i=1}^n x_i(y_i − β0 − β1x_i) = 0
Linear Regression Coefficients

min_{β0,β1} ∑_{i=1}^n (y_i − β0 − β1x_i)²

Solution:

β0 = ȳ − β1x̄

β1 = ∑_{i=1}^n (x_i − x̄)(y_i − ȳ) / ∑_{i=1}^n (x_i − x̄)² = ∑_{i=1}^n x_i(y_i − ȳ) / ∑_{i=1}^n x_i(x_i − x̄)

where

x̄ = (1/n) ∑_{i=1}^n x_i        ȳ = (1/n) ∑_{i=1}^n y_i
Why Minimize RSS
1. Maximizes likelihood when Y = β0 + β1X + ε with ε ∼ N(0, σ²)
2. Best Linear Unbiased Estimator (BLUE): Gauss-Markov Theorem (ESL 3.2.2)
3. It is convenient: can be solved in closed form
Bias in Estimation

- Assume a true value µ*
- An estimate µ̂ is unbiased when E[µ̂] = µ*
- The standard mean estimate is unbiased (e.g. X ∼ N(0, 1)):

E[(1/n) ∑_{i=1}^n X_i] = 0

- The standard variance estimate is biased (e.g. X ∼ N(0, 1)):

E[(1/n) ∑_{i=1}^n (X_i − X̄)²] ≠ 1
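A quick simulation (sample size and trial count are arbitrary choices) makes the bias visible: with σ² = 1 and n = 5, the 1/n variance estimator averages near (n − 1)/n = 0.8 rather than 1.

```python
import numpy as np

rng = np.random.default_rng(1)
n, trials = 5, 200_000

# Many independent samples of size n from N(0, 1)
samples = rng.normal(0.0, 1.0, size=(trials, n))
x_bar = samples.mean(axis=1, keepdims=True)

# Average of the (1/n) sum (X_i - Xbar)^2 estimator over all trials
biased_var = np.mean(np.mean((samples - x_bar) ** 2, axis=1))
print(biased_var)  # close to (n - 1) / n = 0.8, not 1
```

Dividing by n − 1 instead of n removes this bias, which is why the usual sample variance uses n − 1.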
Linear Regression is Unbiased
[Figure: two samples of (X, Y) data with their least-squares fits]
Gauss-Markov Theorem (ESL 3.2.2)
Solution of Linear Regression
[Figure: least-squares fit of Sales versus TV]
How Good is the Fit
- How well is linear regression predicting the training data?
- Can we be sure that TV advertising really influences the sales?
- What is the probability that we just got lucky?
R² Statistic

R² = 1 − RSS/TSS = 1 − ∑_{i=1}^n (y_i − ŷ_i)² / ∑_{i=1}^n (y_i − ȳ)²

- RSS: residual sum of squares; TSS: total sum of squares
- R² measures the goodness of the fit as a proportion
- Proportion of the data variance explained by the model
- Extreme values:
  0: model does not explain the data
  1: model explains the data perfectly
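R² follows directly from RSS and TSS. The data below are toy numbers for illustration, not the advertising data from the slides:

```python
import numpy as np

# Toy data and its least-squares line (illustrative numbers)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 11.0, 13.8])
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
y_hat = beta0 + beta1 * x

rss = np.sum((y - y_hat) ** 2)       # residual sum of squares
tss = np.sum((y - y.mean()) ** 2)    # total sum of squares
r2 = 1 - rss / tss
print(r2)  # close to 1: the line explains almost all the variance
```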
Example: TV Impact on Sales
[Figure: Sales versus TV with fitted line]
R2 = 0.61
Example: Radio Impact on Sales
[Figure: Sales versus Radio with fitted line]
R2 = 0.33
Example: Newspaper Impact on Sales
[Figure: Sales versus Newspaper with fitted line]
R2 = 0.05
Correlation Coefficient

- Measures dependence between two random variables X and Y:

r = Cov(X, Y) / (√Var(X) √Var(Y))

- The correlation coefficient r lies in [−1, 1]:
  0: variables are not related
  1: variables are perfectly related (same)
  −1: variables are negatively related (opposite)
- R² = r²
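For simple linear regression the identity R² = r² can be verified numerically. Any illustrative data with nonzero variance will do:

```python
import numpy as np

# Illustrative data (made-up numbers)
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 4.9, 8.2, 11.0, 13.8])

# Sample correlation r = Cov(X, Y) / (sd(X) sd(Y))
r = np.corrcoef(x, y)[0, 1]

# R^2 of the least-squares line on the same data
beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
rss = np.sum((y - (beta0 + beta1 * x)) ** 2)
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss

print(abs(r ** 2 - r2))  # essentially zero
```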
Hypothesis Testing

- Null hypothesis H0: there is no relationship between X and Y

β1 = 0

- Alternative hypothesis H1: there is some relationship between X and Y

β1 ≠ 0

- Seek to reject hypothesis H0 with a small "probability" (p-value) of making a mistake
- Important topic, but beyond the scope of this course
Multiple Linear Regression
- Usually more than one feature is available:

sales = β0 + β1 × TV + β2 × radio + β3 × newspaper + ε

- In general:

Y = β0 + ∑_{j=1}^p β_j X_j
Multiple Linear Regression
[Figure: regression plane for Y over features X1 and X2]
Estimating Coefficients

- Prediction:

ŷ_i = β0 + ∑_{j=1}^p β_j x_{ij}

- Errors (y_i are the true values):

e_i = y_i − ŷ_i

- Residual Sum of Squares:

RSS = e_1² + e_2² + e_3² + · · · + e_n² = ∑_{i=1}^n e_i²

- How to minimize RSS? Linear algebra!
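The linear-algebra route can be sketched with synthetic data (the true coefficients below are made up): stack a column of ones with the features and solve the least-squares problem.

```python
import numpy as np

rng = np.random.default_rng(2)

# Synthetic data: y = 1 + 2*x1 - 3*x2 + small noise (illustrative)
n = 100
X = rng.normal(size=(n, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + rng.normal(0, 0.1, n)

# Design matrix: a column of ones makes beta0 the intercept
A = np.column_stack([np.ones(n), X])

# Least-squares solution minimizing RSS = ||y - A @ beta||^2
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # approximately [1, 2, -3]
```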
Inference from Linear Regression
1. Are the predictors X_1, X_2, . . . , X_p really predicting Y?
2. Is only a subset of the predictors useful?
3. How well does the linear model fit the data?
4. What Y should we predict, and how accurate is it?
Inference 1
"Are the predictors X_1, X_2, . . . , X_p really predicting Y?"

- Null hypothesis H0: there is no relationship between X and Y

β1 = 0

- Alternative hypothesis H1: there is some relationship between X and Y

β1 ≠ 0

- Seek to reject hypothesis H0 with a small "probability" (p-value) of making a mistake
- See ISL 3.2.2 on how to compute the F-statistic and reject H0
Inference 2
"Is only a subset of the predictors useful?"

- Compare prediction accuracy with only a subset of the features
- RSS always decreases with more features!
- Other measures control for the number of variables:
  1. Mallows C_p
  2. Akaike information criterion
  3. Bayesian information criterion
  4. Adjusted R²
- Testing all subsets of features is impractical: 2^p options!
- More on how to do this later
Inference 3
"How well does the linear model fit the data?"

- R² also always increases with more features (like RSS)
- Is the model linear? Plot it:

[Figure: nonlinear response surface of Sales over TV and Radio]

- More on this later
Inference 4

"What Y should we predict, and how accurate is it?"

- The linear model is used to make predictions:

y_predicted = β0 + β1 x_new

- Can also predict a confidence interval (based on an estimate of ε)
- Example: spend $100,000 on TV and $20,000 on Radio advertising
- Confidence interval: predict f(X) (the average response):

f(x) ∈ [10.985, 11.528]

- Prediction interval: predict f(X) + ε (response plus possible noise):

f(x) ∈ [7.930, 14.580]
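The gap between the two intervals can be sketched with the standard simple-regression formulas. The data are toy numbers, and 1.96 is a normal approximation to the t quantile, so these are only approximate 95% intervals:

```python
import numpy as np

# Illustrative data, not from the slides
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([1.2, 3.1, 4.8, 7.2, 8.9, 11.1, 12.8, 15.2])
n = len(x)

beta1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
beta0 = y.mean() - beta1 * x.mean()
resid = y - (beta0 + beta1 * x)
s2 = np.sum(resid ** 2) / (n - 2)  # estimate of Var(eps)

x0 = 3.5
y0 = beta0 + beta1 * x0
sxx = np.sum((x - x.mean()) ** 2)

# Standard error of the mean response f(x0) vs. of a new observation
se_conf = np.sqrt(s2 * (1 / n + (x0 - x.mean()) ** 2 / sxx))
se_pred = np.sqrt(s2 * (1 + 1 / n + (x0 - x.mean()) ** 2 / sxx))

conf_int = (y0 - 1.96 * se_conf, y0 + 1.96 * se_conf)
pred_int = (y0 - 1.96 * se_pred, y0 + 1.96 * se_pred)
print(conf_int)
print(pred_int)  # always wider: it also covers the noise eps
```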
R notebook