Classification and Prediction: Regression Analysis
Bamshad Mobasher, DePaul University
(transcript of lecture slides)
What Is Numerical Prediction (a.k.a. Estimation, Forecasting)?

- (Numerical) prediction is similar to classification:
  - construct a model
  - use the model to predict a continuous or ordered value for a given input
- Prediction is different from classification:
  - classification predicts a categorical class label
  - prediction models continuous-valued functions
- The major method for prediction is regression:
  - model the relationship between one or more independent (predictor) variables and a dependent (response) variable
- Regression analysis includes:
  - linear and multiple regression
  - non-linear regression
  - other regression methods: generalized linear models, Poisson regression, log-linear models, regression trees
Linear Regression

- Linear regression involves a response variable y and a single predictor variable x:
  y = w0 + w1 x
- Goal: using the data, estimate the weights (parameters) w0 and w1 for the line such that the prediction error is minimized
Linear Regression

(Figure: scatter plot of y vs. x with the fitted line ŷ = w0 + w1 x. For a given xi, the vertical distance ei between the observed and predicted values of y is the error; w0 is the intercept and w1 is the slope.)
Linear Regression

- Linear regression involves a response variable y and a single predictor variable x:
  y = w0 + w1 x
- The weights w0 (y-intercept) and w1 (slope) are the regression coefficients
- Method of least squares: estimates the best-fitting straight line. w0 and w1 are obtained by minimizing the sum of the squared errors (a.k.a. residuals):

  $$\mathrm{SSE} = \sum_i e_i^2 = \sum_i (y_i - \hat{y}_i)^2 = \sum_i \big(y_i - (w_0 + w_1 x_i)\big)^2$$

- w1 can be obtained by setting the partial derivative of the SSE to 0 and solving for w1, ultimately resulting in:

  $$w_1 = \frac{\sum_i (x_i - \bar{x})(y_i - \bar{y})}{\sum_i (x_i - \bar{x})^2}, \qquad w_0 = \bar{y} - w_1 \bar{x}$$
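As a quick sketch, the closed-form estimates above can be computed directly in plain Python (the function name and data here are illustrative, not from the slides):

```python
def fit_simple_linear(xs, ys):
    """Least-squares fit of y = w0 + w1*x; returns (w0, w1)."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # w1 = sum_i (xi - x_mean)(yi - y_mean) / sum_i (xi - x_mean)^2
    num = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    den = sum((x - x_mean) ** 2 for x in xs)
    w1 = num / den
    w0 = y_mean - w1 * x_mean
    return w0, w1

# Points lying exactly on y = 2 + 3x recover w0 = 2, w1 = 3.
w0, w1 = fit_simple_linear([0, 1, 2, 3], [2, 5, 8, 11])
```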
Multiple Linear Regression

- Multiple linear regression involves more than one predictor variable
  - Training data is of the form (X1, y1), (X2, y2), …, (X|D|, y|D|)
  - Ex.: for 2-D data we may have y = w0 + w1 x1 + w2 x2
  - Solvable by an extension of the least squares method
  - Many nonlinear functions can be transformed into the above
Least Squares Generalization

- Simple least squares:
  - determine the linear coefficients β0, β1 that minimize the sum of squared errors (SSE)
  - use standard (multivariate) differential calculus:
    - differentiate SSE with respect to β0, β1
    - find the zeros of each partial derivative
    - solve for β0, β1
- One dimension (N = number of samples; x̄, ȳ = means of the training data):

  $$\mathrm{SSE} = \frac{1}{N}\sum_{j=1}^{N}\big(y_j - (\beta_0 + \beta_1 x_j)\big)^2$$

  $$\beta_1 = \frac{\mathrm{cov}[x, y]}{\mathrm{var}[x]}, \qquad \beta_0 = \bar{y} - \beta_1 \bar{x}$$

  $$\hat{y}_t = \beta_0 + \beta_1 x_t \quad \text{for a test sample } x_t$$
Least Squares Generalization

- Multiple dimensions
  - To simplify notation and derivation, treat the intercept as coefficient β0 and add a new feature x0 = 1 to the feature vector x:

    $$\hat{y} = \beta_0 \cdot 1 + \sum_{i=1}^{d} \beta_i x_i = \boldsymbol{\beta}^{\mathsf{T}}\mathbf{x}$$

  - Equivalently, the data matrix gains a leading column of ones (x0 = 1 for every sample)
Least Squares Generalization

- Multiple dimensions
  - Calculate the SSE and determine β:

    $$\mathrm{SSE} = (\mathbf{y} - X\boldsymbol{\beta})^{\mathsf{T}}(\mathbf{y} - X\boldsymbol{\beta})$$

    where y = vector of all training responses and X = matrix of all training samples (each row is one sample, with x0 = 1)
  - Setting the gradient with respect to β to zero yields the normal-equations solution:

    $$\boldsymbol{\beta} = (X^{\mathsf{T}} X)^{-1} X^{\mathsf{T}} \mathbf{y}$$

  - For a test sample xt, the prediction is ŷt = βᵀxt
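Assuming NumPy is available, the normal-equations solution can be sketched as follows (the data here is made up for illustration):

```python
import numpy as np

# Toy training data: 4 samples, 2 features; responses generated from
# assumed true coefficients beta = (1, 2, -1) with no noise.
X_raw = np.array([[1.0, 2.0],
                  [2.0, 0.0],
                  [3.0, 1.0],
                  [4.0, 3.0]])
y = 1.0 + 2.0 * X_raw[:, 0] - 1.0 * X_raw[:, 1]

# Prepend the constant feature x0 = 1 so beta[0] plays the role of the intercept.
X = np.column_stack([np.ones(len(X_raw)), X_raw])

# Solve the normal equations X^T X beta = X^T y (numerically preferable
# to forming the explicit inverse (X^T X)^{-1}).
beta = np.linalg.solve(X.T @ X, X.T @ y)

y_hat = X @ beta  # predictions for the training samples
```

With noise-free data the true coefficients are recovered exactly (up to floating-point error).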
Extending the Application of Linear Regression

- The inputs X for linear regression can be:
  - original quantitative inputs
  - transformations of quantitative inputs, e.g. log, exp, square root, square, etc.
  - polynomial transformations
    - example: y = β0 + β1 x + β2 x² + β3 x³
  - dummy coding of categorical inputs
  - interactions between variables
    - example: x3 = x1 · x2
- This allows linear regression techniques to fit much more complicated, non-linear datasets.
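For instance, a cubic can be fit with an ordinary linear solver once the design matrix contains the powers of x. A sketch assuming NumPy, with invented data:

```python
import numpy as np

# Sample a cubic y = 1 - 2x + 0.5x^2 + 0.25x^3 at six distinct points.
x = np.array([-2.0, -1.0, 0.0, 1.0, 2.0, 3.0])
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.25 * x**3

# Design matrix [1, x, x^2, x^3]: the model is non-linear in x but
# still linear in the coefficients, so least squares applies unchanged.
X = np.column_stack([x**p for p in range(4)])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```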
(Figure: example of fitting a polynomial curve with a linear model)
Regularization

- Complex models (lots of parameters) are often prone to overfitting
- Overfitting can be reduced by imposing a constraint on the overall magnitude of the parameters (i.e., by including the coefficients in the optimization objective)
- Two common types of regularization in linear regression:
  - L2 regularization (a.k.a. ridge regression). Find the β which minimizes:

    $$\sum_{j=1}^{N}\Big(y_j - \beta_0 - \sum_{i=1}^{d} \beta_i x_{ij}\Big)^2 + \lambda \sum_{i=1}^{d} \beta_i^2$$

    λ is the regularization parameter: a bigger λ imposes more constraint
  - L1 regularization (a.k.a. the lasso). Find the β which minimizes:

    $$\sum_{j=1}^{N}\Big(y_j - \beta_0 - \sum_{i=1}^{d} \beta_i x_{ij}\Big)^2 + \lambda \sum_{i=1}^{d} |\beta_i|$$
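The ridge objective has a closed-form minimizer, β = (XᵀX + λI)⁻¹ Xᵀy. A minimal sketch assuming NumPy; note that unlike the slide's formula, this toy version penalizes every coefficient including the intercept, purely for brevity:

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: beta = (X^T X + lam*I)^{-1} X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Invented data: 50 samples, 3 features, small Gaussian noise.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 0.1 * rng.normal(size=50)

beta_ols = ridge_fit(X, y, lam=0.0)       # lam = 0 reduces to least squares
beta_shrunk = ridge_fit(X, y, lam=1000.0) # a large lam shrinks beta toward 0
```

The lasso has no such closed form (the |βi| terms are not differentiable at 0) and is typically fit with iterative methods such as coordinate descent.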
Example: Web Traffic Data
1D Poly Fit

(Example of too much "bias": underfitting)
Example: 1D and 2D Poly Fit
Example: 1D Poly Fit

(Example of too much "variance": overfitting)
Bias-Variance Tradeoff
Bias-Variance Tradeoff

- Possible ways of dealing with high bias:
  - get additional features
  - use a more complex model (e.g., adding polynomial terms such as x1², x2², x1·x2, etc.)
  - use a smaller regularization coefficient λ
  - note: getting more training data won't necessarily help in this case
- Possible ways of dealing with high variance:
  - use more training instances
  - reduce the number of features
  - use simpler models
  - use a larger regularization coefficient λ
Other Regression-Based Models

- Generalized linear models
  - the foundation on which linear regression can be applied to modeling categorical response variables
  - the variance of y is a function of the mean value of y, not a constant
  - logistic regression: models the probability of some event occurring as a linear function of a set of predictor variables
  - Poisson regression: models data that exhibit a Poisson distribution
- Log-linear models (for categorical data)
  - approximate discrete multidimensional probability distributions
  - also useful for data compression and smoothing
- Regression trees and model trees
  - trees that predict continuous values rather than class labels
Regression Trees and Model Trees

- Regression tree: proposed in the CART system (Breiman et al. 1984)
  - CART: Classification And Regression Trees
  - each leaf stores a continuous-valued prediction
  - the prediction is the average value of the predicted attribute for the training instances that reach the leaf
- Model tree: proposed by Quinlan (1992)
  - each leaf holds a regression model, a multivariate linear equation for the predicted attribute
  - a more general case than the regression tree
- Regression and model trees tend to be more accurate than linear regression when the instances are not represented well by simple linear models
Evaluating Numeric Prediction

- Prediction accuracy
  - the difference between the predicted scores and the actual results (from an evaluation set)
  - typically the accuracy of the model is measured as the average of the squared differences (the mean squared error)
- Common metrics (pi = predicted target value for test instance i, ai = actual target value for instance i):
  - Mean Absolute Error: the average absolute loss over the test set

    $$\mathrm{MAE} = \frac{|p_1 - a_1| + \ldots + |p_n - a_n|}{n}$$

  - Root Mean Squared Error: the square root of the mean of the squared differences between predicted and actual values

    $$\mathrm{RMSE} = \sqrt{\frac{(p_1 - a_1)^2 + \ldots + (p_n - a_n)^2}{n}}$$