Machine Learning and Data Mining
Linear regression
(adapted from) Prof. Alexander Ihler
Supervised learning
• Notation
  – Features x
  – Targets y
  – Predictions ŷ
  – Parameters θ
Program (“Learner”): characterized by some “parameters” θ; a procedure (using θ) that outputs a prediction.
Training data (examples): features plus feedback / target values.
Learning algorithm: scores performance (“cost function”) and changes θ to improve performance.
Linear regression
• Define form of function f(x) explicitly
• Find a good f(x) within that family
[Figure: target y (0–40) plotted against feature x (0–20), with a fitted line]
“Predictor”: evaluate the line; return r
(c) Alexander Ihler
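The “evaluate the line; return r” predictor can be sketched in Python/NumPy (the deck's own snippets are Matlab); the θ values below are illustrative, not fitted:

```python
import numpy as np

def predict(x, theta0, theta1):
    """Evaluate the line: return r = theta0 + theta1 * x."""
    return theta0 + theta1 * np.asarray(x)

x = np.array([0.0, 10.0, 20.0])
yhat = predict(x, theta0=1.0, theta1=2.0)  # the line y = 1 + 2x
```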
More dimensions?
[Figure: two 3-D views of target y plotted over features x1 and x2, with a fitted plane]
Notation
Define “feature” x0 = 1 (constant). Then:
ŷ(x) = θ0 x0 + θ1 x1 + … + θn xn = θ xᵀ
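The x0 = 1 trick amounts to prepending a column of ones to the data matrix, so the intercept θ0 becomes just another weight (NumPy sketch with made-up numbers):

```python
import numpy as np

X = np.array([[3.0], [7.0], [11.0]])            # m examples, one feature each
X1 = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend constant feature x0 = 1
theta = np.array([2.0, 0.5])                    # [theta0, theta1]
yhat = X1 @ theta                               # theta0*1 + theta1*x for every row
```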
Measuring error
[Figure: a data point (observation), the fitted line (prediction), and the vertical gap between them: the error or “residual”]
Mean squared error
• How can we quantify the error?
  J(θ) = (1/m) Σ_i ( y(i) − ŷ(x(i)) )²
• Could choose something else, of course…
  – Computationally convenient (more later)
  – Measures the variance of the residuals
  – Corresponds to likelihood under a Gaussian model of the “noise”
MSE cost function
• Rewrite using matrix form:
  J(θ) = (1/m) (yᵀ − θXᵀ)(yᵀ − θXᵀ)ᵀ
(Matlab) >> e = y' - th*X'; J = e*e'/m;
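The Matlab one-liner has a direct NumPy analogue; X, y, and th below are toy values, with one example per row of X and the constant feature x0 = 1 in the first column:

```python
import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])  # rows = examples, x0 = 1
y = np.array([0.0, 1.0, 3.0])
th = np.array([0.0, 1.0])        # candidate parameters (row vector, like th)

e = y - X @ th                   # residuals, like  e = y' - th*X'
J = (e @ e) / len(y)             # mean squared error, like  J = e*e'/m
```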
Visualizing the cost function
[Figure: cost J(θ) over parameters θ0 and θ1, shown as a 3-D surface and as contours]
Supervised learning
• Notation
  – Features x
  – Targets y
  – Predictions ŷ
  – Parameters θ
Program (“Learner”): characterized by some “parameters” θ; a procedure (using θ) that outputs a prediction.
Training data (examples): features plus feedback / target values.
Learning algorithm: scores performance (“cost function”) and changes θ to improve performance.
Finding good parameters
• Want to find parameters which minimize our error…
• Think of a cost “surface”: the error residual for each θ…
Machine Learning and Data Mining
Linear regression: direct minimization
(adapted from) Prof. Alexander Ihler
MSE Minimum
• Consider a simple problem
  – One feature, two data points
  – Two unknowns: θ0, θ1
  – Two equations:
    y(1) = θ0 + θ1 x(1)
    y(2) = θ0 + θ1 x(2)
• Can solve this system directly:
  θ1 = ( y(2) − y(1) ) / ( x(2) − x(1) ),  θ0 = y(1) − θ1 x(1)
• However, most of the time, m > n
  – There may be no linear function that hits all the data exactly
  – Instead, solve directly for the minimum of the MSE function
SSE Minimum
• Setting the gradient to zero and reordering, we have
  θ (XᵀX) = yᵀX  ⇒  θ = yᵀ X (XᵀX)⁻¹
• X (XᵀX)⁻¹ is called the “pseudo-inverse”
• If X is square and its columns independent, this is just the inverse (the fit is exact)
• If m > n: overdetermined; gives the minimum-MSE fit
Matlab SSE
• This is easy to solve in Matlab…

% y = [y1 ; … ; ym]
% X = [x1_0 … x1_n ; x2_0 … x2_n ; …]   (one example per row)

% Solution 1: “manual”
th = y' * X * inv(X' * X);

% Solution 2: “mrdivide”
th = y' / X';   % th*X' = y'  =>  th = y'/X'
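The same two solutions can be sketched in NumPy on toy data; np.linalg.lstsq plays the role of mrdivide here, and in practice it is preferred over forming inv(X'X) explicitly:

```python
import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # x0 = 1 column
y = np.array([1.0, 2.9, 5.1, 7.0])                              # roughly y = 1 + 2x

# Solution 1: "manual" normal equations, th = y' X (X'X)^-1
th_manual = y @ X @ np.linalg.inv(X.T @ X)

# Solution 2: a least-squares solver (numerically more stable)
th_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
```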
Effects of MSE choice
• Sensitivity to outliers
[Figure: a line fit pulled toward one outlying datum, which alone contributes 16² cost; the quadratic loss curve shows the heavy penalty for large errors]
L1 error
J(θ) = (1/m) Σ_i | y(i) − θ x(i)ᵀ |
[Figure: fitted lines compared: L2 on original data; L1 on original data; L1 on outlier data]
Cost functions for regression
J(θ) = (1/m) Σ_i ( y(i) − θ x(i)ᵀ )²   (MSE)
J(θ) = (1/m) Σ_i | y(i) − θ x(i)ᵀ |   (MAE)
Something else entirely… (???)
“Arbitrary” cost functions can’t be solved in closed form…
– use gradient descent
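For MSE the gradient is available in closed form, so plain batch gradient descent can stand in for the closed-form solve. A sketch on toy data; the step size and iteration count are arbitrary choices:

```python
import numpy as np

X = np.array([[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])  # x0 = 1 column
y = np.array([1.0, 3.0, 5.0, 7.0])      # exactly y = 1 + 2x
th = np.zeros(2)                        # start from theta = 0
alpha = 0.05                            # step size (arbitrary)

for _ in range(5000):
    e = X @ th - y                      # residuals at current theta
    th -= alpha * (2.0 / len(y)) * (X.T @ e)   # gradient of the MSE
```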
Machine Learning and Data Mining
Linear regression: nonlinear features
(adapted from) Prof. Alexander Ihler
Nonlinear functions
• What if our hypotheses are not lines?
  – Ex: higher-order polynomials
[Figure: order-1 and order-3 polynomial fits to the same data]
Nonlinear functions
• Single feature x, predict target y:
  ŷ = θ0 + θ1 x + θ2 x² + θ3 x³
• Sometimes useful to think of a “feature transform” Φ
  Add features: Φ(x) = [1, x, x², x³]
  Linear regression in the new features: ŷ = θ Φ(x)ᵀ
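The feature-transform view can be sketched by building the [1, x, x², x³] columns explicitly and running ordinary least squares on them (NumPy sketch on a toy cubic, so an order-3 fit recovers the coefficients):

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 - x + 0.5 * x**3                    # a cubic: order-3 features suffice

Phi = np.vander(x, N=4, increasing=True)    # columns [1, x, x^2, x^3]
theta, *_ = np.linalg.lstsq(Phi, y, rcond=None)   # linear regression in Phi
```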
Higher-order polynomials
• Fit in the same way
• More “features”
[Figure: order-1, order-2, and order-3 polynomial fits to the same data]
Features
• In general, can use any features we think are useful
• Other information about the problem
  – Sq. footage, location, age, …
• Polynomial functions
  – Features [1, x, x², x³, …]
• Other functions
  – 1/x, sqrt(x), x1 * x2, …
• “Linear regression” = linear in the parameters
  – Features we can make as complex as we want!
Higher-order polynomials
• Are more features better?
• “Nested” hypotheses
  – 2nd order is more general than 1st,
  – 3rd order is more general than 2nd, …
• Fits the observed data better
Overfitting and complexity
• More complex models will always fit the training data better
• But they may “overfit” the training data, learning complex relationships that are not really present
[Figure: the same (X, Y) data fit by a complex model and by a simple model]
Test data
• After training the model
• Go out and get more data from the world
  – New observations (x, y)
• How well does our model perform?
[Figure: training data and new, “test” data]
Training versus test error
• Plot MSE as a function of model complexity (polynomial order)
• Training error decreases: a more complex function fits the training data better
• What about new data?
  – 0th to 1st order: error decreases (underfitting)
  – Higher order: error increases (overfitting)
[Figure: mean squared error versus polynomial order, for training data and for new, “test” data]
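The training-versus-test picture can be reproduced with synthetic data: fit polynomials of increasing order to noisy samples of a line and measure MSE on held-out points (the noise level and random seed below are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: 1.0 + 2.0 * x                       # true underlying line
x_tr = np.linspace(0.0, 5.0, 10)                  # small training set
y_tr = f(x_tr) + rng.normal(0.0, 1.0, x_tr.size)
x_te = np.linspace(0.0, 5.0, 50)                  # held-out "test" points
y_te = f(x_te) + rng.normal(0.0, 1.0, x_te.size)

def errors(order):
    """Train and test MSE for a polynomial fit of the given order."""
    th = np.polyfit(x_tr, y_tr, order)
    tr = np.mean((np.polyval(th, x_tr) - y_tr) ** 2)
    te = np.mean((np.polyval(th, x_te) - y_te) ** 2)
    return tr, te

train_err = [errors(k)[0] for k in range(7)]
test_err = [errors(k)[1] for k in range(7)]
# Training error can only go down as the order grows; test error
# typically falls (underfitting region) and then rises again (overfitting).
```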