COGS109 Modeling & Data Analysis
TRANSCRIPT
COGS109 Modeling & Data Analysis
Ch 7. Nonlinear regression
Lecture outline
● Peer review due tonight
● Quiz 5 due Wednesday midnight
● Quiz 4 review
● PCA and demo
● Compare and contrast model selection methods
● Nonlinear regression: polynomial regression
Quiz 4 review
Menti.com → voting code: 5982 5990
● Will the model parameters I estimate when performing 2-fold CV have higher, or lower, variance compared with 10-fold CV?
● For each fold, will my estimate of the model’s mean squared error be more accurate with 2-fold or with 10-fold CV?
Principal component analysis (PCA)
PCA finds a low-dimensional representation of the data that preserves as much of the information as possible.
● Information = variation: how much the observations vary along each dimension.
See Ch10.2 for a more detailed PCA explanation
The first PC, Z1, of a set of predictors, X1, X2, … Xp is a normalized linear combination of the features that has the largest variance
● The coefficients, 𝝓1, …, 𝝓p, are called the loadings of the first PC; they are normalized so that their squares sum to 1 (Σ𝝓j² = 1).
The second PC, Z2 is the linear combination of X1, X2, …, Xp, that has maximal variance out of all linear combinations that are uncorrelated with Z1.
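The definitions above can be sketched with scikit-learn on toy data (the data here is made up for illustration). Note that the loadings are normalized (squared loadings sum to 1) and that the scores Z1 and Z2 come out uncorrelated:

```python
import numpy as np
from sklearn.decomposition import PCA

# Toy data: 100 observations, 3 predictors, two of them highly correlated
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
X = np.column_stack([x1, x1 + 0.1 * rng.normal(size=100),
                     rng.normal(size=100)])

pca = PCA(n_components=2)
Z = pca.fit_transform(X)            # Z[:, 0] is Z1, Z[:, 1] is Z2

phi1 = pca.components_[0]           # loadings of the first PC
print(np.sum(phi1 ** 2))            # normalized: squared loadings sum to 1
print(np.corrcoef(Z[:, 0], Z[:, 1])[0, 1])  # Z2 is uncorrelated with Z1
```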
PCA related questions
PCA implementation
● Standardize the predictor matrix before you perform PCA
● If PCA needs to be performed with cross-validation, standardize the predictors within the training data only
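One way to keep the standardization inside the training folds is to wrap it in a scikit-learn Pipeline, as in this sketch of principal components regression (the data is simulated for illustration):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5)) * [1, 10, 100, 1, 1]  # predictors on very different scales
y = X[:, 0] + 0.1 * rng.normal(size=100)

# The Pipeline refits the scaler (and PCA) on each training fold only,
# so the held-out fold never leaks into the standardization.
pcr = Pipeline([
    ("scale", StandardScaler()),
    ("pca", PCA(n_components=2)),
    ("reg", LinearRegression()),
])
scores = cross_val_score(pcr, X, y, cv=5, scoring="neg_mean_squared_error")
print(scores.mean())
```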
Why do principal components maximally preserve the variability of our original data?
Explanation and visualization:
https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues
What do the PCA loadings represent?
PCA demo
Compare and contrast model selection techniques
● What are the advantages of subset selection, ridge, LASSO, and PC regression?
● What are the disadvantages of subset selection, ridge, LASSO, and PC regression?
● For each of the following situations, which method would be more appropriate?
○ What if we want to reduce the computational burden?
○ What if we assume that many predictors contribute to the outcome?
○ What if we assume that only a few predictors are related to the outcome?
○ What if we want unbiased parameter estimates?
● Under what situation would it be better to use subset selection?
● Under what situation would it be better to use ridge?
● Under what situation would it be better to use LASSO?
● Under what situation would it be better to use PC regression?
Breakout room activity
Learning outcomes
● What’s the motivation behind implementing polynomial regression?
● How does polynomial regression model nonlinear relationships?
● How do we implement polynomial regression?
● What are the limitations of polynomial regression?
Non-linear regression methods
Polynomial regression
● Motivation: linear assumption is sometimes a poor approximation of the true relationship.
● What’s an example of a nonlinear relationship in real life?
Polynomial regression
● Motivation: linear assumption is sometimes a poor approximation of the true relationship.
● Solution: raise each original predictor to a power to capture the nonlinear relationship
Linear model: Y = β0 + β1X + ε

2nd degree polynomial model (quadratic equation): Y = β0 + β1X + β2X² + ε

This is still considered a linear model: the coefficients/weights associated with the features enter linearly, and X² is just another feature. The curve we are fitting, however, is quadratic.

When to use polynomial regression?
1. When a nonlinear relationship is hypothesized
2. When exploratory data analysis suggests a nonlinear fit may be better

nth degree polynomial model: Y = β0 + β1X + β2X² + … + βnXⁿ + ε
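The "still linear" point can be seen in code: build the features X and X², then fit with ordinary least squares exactly as in linear regression (a minimal sketch on simulated data):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

# Simulated data from a known quadratic: y = 1 + 2x - 0.5x^2 + noise
rng = np.random.default_rng(2)
x = rng.uniform(-3, 3, size=200)
y = 1.0 + 2.0 * x - 0.5 * x**2 + 0.1 * rng.normal(size=200)

# PolynomialFeatures expands x into the design matrix [x, x^2];
# the fit itself is ordinary least squares -- linear in the coefficients.
X = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x[:, None])
model = LinearRegression().fit(X, y)
print(model.intercept_, model.coef_)   # estimates close to (1.0, [2.0, -0.5])
```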
Polynomial regression
Simulated data example:
● X = age (3 - 20)
● Y = reaction time to visual stimuli
Visual inspection suggests that the relationship is nonlinear.
A straight line (simple linear regression) is unable to capture the pattern → underfitting.
Polynomial regression
Simulated data example:
● X = age (3 - 20)
● Y = reaction time to visual stimuli
A second degree polynomial (quadratic model) is able to fit the data better.
A third degree polynomial (cubic model) fit passes through even more data points.
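A sketch of the comparison, using a made-up data-generating process in place of the slide's simulated age/reaction-time data (the true curve here is assumed to be quadratic):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical stand-in for the slide's example: reaction time falls
# steeply in childhood, then levels off (a curved relationship).
rng = np.random.default_rng(3)
age = rng.uniform(3, 20, size=150)
rt = 600 - 40 * age + 1.2 * age**2 + rng.normal(0, 10, size=150)

def poly_fit_mse(degree):
    """Training MSE of a polynomial fit of the given degree."""
    X = PolynomialFeatures(degree, include_bias=False).fit_transform(age[:, None])
    pred = LinearRegression().fit(X, rt).predict(X)
    return mean_squared_error(rt, pred)

mse1, mse2 = poly_fit_mse(1), poly_fit_mse(2)
print(mse1, mse2)   # the straight line underfits; the quadratic fits far better
```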
Polynomial regression: summary
● Polynomial regression: Raise each predictor to a power to provide a nonlinear fit to the data
● Avoid overfitting → use model selection (CV and test-set MSE) to select the optimal polynomial terms.
○ To perform model selection systematically, we can use forward and backward selection
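Selecting the degree by cross-validated test MSE can be sketched as follows (simulated data; a truly quadratic process is assumed, so low degrees should win):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(4)
x = rng.uniform(3, 20, size=150)[:, None]
y = 600 - 40 * x[:, 0] + 1.2 * x[:, 0]**2 + rng.normal(0, 10, size=150)

# Cross-validated test MSE for each candidate degree; pick the minimum.
cv_mse = {}
for degree in range(1, 6):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores = cross_val_score(model, x, y, cv=5,
                             scoring="neg_mean_squared_error")
    cv_mse[degree] = -scores.mean()

best = min(cv_mse, key=cv_mse.get)
print(best, cv_mse[best])
```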
Limitations of polynomial regression
Polynomial regression imposes a global structure on the nonlinear relationship between X and Y.
A 2nd degree polynomial has 1 turning point (a U shape)
A 3rd degree polynomial has up to 2 turning points and can look like an S
A 4th degree polynomial has up to 3 turning points and can look like a W
Limitations of polynomial regression
Polynomial regression imposes a global structure on the nonlinear relationship between X and Y.
If we want to model an even more flexible or locally nonlinear pattern, we can break the range of X into bins, and fit a different model for each piece of X.
Piecewise functions that model local changes
● Step functions
● Piecewise polynomials
● Splines
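The simplest of these, a step function, can be sketched by cutting X into bins and fitting a constant within each bin (the bin edges and the jump in the simulated data are assumptions for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Simulated data with a local jump at x = 5
rng = np.random.default_rng(5)
x = rng.uniform(0, 10, size=300)
y = np.where(x < 5, 1.0, 4.0) + rng.normal(0, 0.3, size=300)

# Step function: cut the range of x into bins and use the bin indicators
# as dummy predictors; a separate constant is then fit within each bin.
bins = np.array([0, 2.5, 5, 7.5, 10])
idx = np.digitize(x, bins[1:-1])          # bin index (0..3) for each observation
D = np.eye(len(bins) - 1)[idx]            # one-hot indicator matrix
model = LinearRegression(fit_intercept=False).fit(D, y)
print(model.coef_)                        # one fitted level per bin
```

Each coefficient is just the mean of y within its bin, so the fit adapts locally instead of imposing one global polynomial.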