
Page 1: COGS109 Modeling & Data Analysis

COGS109 Modeling & Data Analysis

Ch 7. Nonlinear regression

Page 2: COGS109 Modeling & Data Analysis

Lecture outline

● Peer review due tonight
● Quiz 5 due Wednesday midnight
● Quiz 4 review
● PCA and demo
● Compare and contrast model selection methods
● Nonlinear regression - Polynomial regression

Page 3: COGS109 Modeling & Data Analysis

Quiz 4 review

Menti.com → voting code: 5982 5990

● Will the model parameters I estimate when performing 2-fold CV have higher, or lower, variance compared with 10-fold CV?

● For each fold, will my estimate of the model’s mean squared error be more accurate with 2-fold or with 10-fold CV?

Page 4: COGS109 Modeling & Data Analysis

Principal component analysis (PCA)

PCA finds a low-dimensional representation of the data that preserves as much of the information as possible.

● Information = variation: how much the observations vary along each dimension.

See Ch10.2 for a more detailed PCA explanation

The first PC, Z1, of a set of predictors X1, X2, …, Xp is the normalized linear combination of the features that has the largest variance:

Z1 = 𝝓1X1 + 𝝓2X2 + … + 𝝓pXp

● The coefficients 𝝓1, …, 𝝓p are called the loadings of the first PC. They are normalized so that their squares sum to 1 (𝝓1² + … + 𝝓p² = 1).

The second PC, Z2, is the linear combination of X1, X2, …, Xp that has maximal variance out of all linear combinations that are uncorrelated with Z1.

Page 5: COGS109 Modeling & Data Analysis

PCA related questions

PCA implementation

● Standardize the predictor matrix before you perform PCA.
● If PCA needs to be performed within cross-validation, standardize the predictors using means and standard deviations computed from the training data only.

Why do principal components maximally preserve the variability of our original data?

Explanation and visualization:

https://stats.stackexchange.com/questions/2691/making-sense-of-principal-component-analysis-eigenvectors-eigenvalues

PCA loadings?
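To connect the linked explanation to the loadings question above, here is a minimal numpy sketch (my own illustration, not from the lecture): the first PC's loading vector is the top eigenvector of the covariance matrix, so projecting the standardized data onto it yields the largest achievable variance, namely the top eigenvalue.

    import numpy as np

    rng = np.random.default_rng(0)
    A = np.array([[2.0, 0.5, 0.1],
                  [0.0, 1.0, 0.3],
                  [0.0, 0.0, 0.2]])
    X = rng.normal(size=(200, 3)) @ A              # made-up correlated data

    Xs = (X - X.mean(axis=0)) / X.std(axis=0)      # standardize first
    cov = np.cov(Xs, rowvar=False)                 # p x p covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # sort descending by variance

    phi1 = eigvecs[:, order[0]]                    # loadings of the first PC
    print(np.sum(phi1**2))                         # 1.0: the loading vector is normalized
    Z1 = Xs @ phi1                                 # scores on the first PC
    print(np.var(Z1, ddof=1), eigvals[order[0]])   # projected variance = top eigenvalue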

Page 6: COGS109 Modeling & Data Analysis

PCA demo
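The demo itself was not captured in the transcript. Below is a minimal sketch of the kind of demo described, using scikit-learn on made-up data: standardize, fit PCA, then inspect the explained variance ratios and the first PC's loadings.

    import numpy as np
    from sklearn.preprocessing import StandardScaler
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(109)
    X = rng.normal(size=(100, 5))
    X[:, 1] = 0.9 * X[:, 0] + rng.normal(scale=0.3, size=100)  # make two predictors correlated

    Xs = StandardScaler().fit_transform(X)   # standardize before PCA
    pca = PCA().fit(Xs)

    print(pca.explained_variance_ratio_)     # share of variance captured by each PC
    print(pca.components_[0])                # loadings of the first PC
    Z = pca.transform(Xs)                    # principal component scores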

Page 7: COGS109 Modeling & Data Analysis

Compare and contrast model selection techniques

● What are the advantages of subset selection, ridge, LASSO, and PC regression?
● What are the disadvantages of subset selection, ridge, LASSO, and PC regression?
● For each of the following situations, which method would be more appropriate to use?
  ○ What if we want to reduce the computational burden?
  ○ What if we assume that many predictors contribute to the outcome?
  ○ What if we assume that only a few predictors are related to the outcome?
  ○ What if we want to have unbiased parameter estimates?
● Under what situation would it be better to use subset selection?
● Under what situation would it be better to use ridge?
● Under what situation would it be better to use LASSO?
● Under what situation would it be better to use PC regression?

Breakout room activity

Page 8: COGS109 Modeling & Data Analysis

Learning outcomes

● What's the motivation behind implementing polynomial regression?
● How does polynomial regression model nonlinear relationships?
● How do we implement polynomial regression?
● What are the limitations of polynomial regression?

Page 9: COGS109 Modeling & Data Analysis

Non-linear regression methods

Page 10: COGS109 Modeling & Data Analysis

Polynomial regression

● Motivation: the linearity assumption is sometimes a poor approximation of the true relationship.

● What’s an example of a nonlinear relationship in real life?

Page 11: COGS109 Modeling & Data Analysis

Polynomial regression

● Motivation: the linearity assumption is sometimes a poor approximation of the true relationship.

● Solution: raise each original predictor to a power to capture a nonlinear fit.

Linear model: y = β0 + β1x + ε

2nd degree polynomial model (quadratic equation): y = β0 + β1x + β2x² + ε

This is still considered a linear model, because it is linear in the coefficients/weights associated with the features: x² is just another feature. The curve that we are fitting, however, is quadratic in nature (see the sketch at the end of this slide).

When to use polynomial regression?

1. When a nonlinear relationship is hypothesized
2. When exploratory data analysis suggests a nonlinear fit may be better

nth degree polynomial model: y = β0 + β1x + β2x² + … + βnxⁿ + ε
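A minimal numpy sketch (my own, not from the slides) of the point above: a quadratic fit is just ordinary least squares on a design matrix whose columns are powers of x.

    import numpy as np

    rng = np.random.default_rng(1)
    x = np.linspace(0, 4, 50)
    y = 1.0 + 2.0 * x - 0.5 * x**2 + rng.normal(scale=0.3, size=x.size)  # toy data

    # Quadratic regression is linear regression on the features [1, x, x^2]
    X = np.column_stack([np.ones_like(x), x, x**2])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # ordinary least squares
    print(beta)                                    # estimates of beta0, beta1, beta2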

Page 12: COGS109 Modeling & Data Analysis

Polynomial regression

Simulated data example:
● X = age (3 - 20)
● Y = reaction time to visual stimuli

Visual inspection suggests that the relationship is nonlinear.

A straight line (simple linear regression) is unable to capture the pattern: underfitting.

Page 13: COGS109 Modeling & Data Analysis

Polynomial regression

Simulated data example:
● X = age (3 - 20)
● Y = reaction time to visual stimuli

A second degree polynomial (quadratic model) is able to fit the data better.

A third degree polynomial (cubic model) passes through even more data points, at the risk of overfitting.
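The slides show plots of these fits. As a stand-in, here is a sketch that simulates comparable data and fits degree-1, 2, and 3 polynomials with scikit-learn. The simulated curve (reaction time falling with age, then leveling off) is my assumption, since the exact simulation is not in the transcript.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.metrics import mean_squared_error

    rng = np.random.default_rng(109)
    age = rng.uniform(3, 20, size=100)
    rt = 600 / age + 150 + rng.normal(scale=20, size=age.size)  # assumed shape: RT drops, then plateaus
    X = age.reshape(-1, 1)

    for degree in (1, 2, 3):
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(X, rt)
        print(degree, mean_squared_error(rt, model.predict(X)))  # training MSE per degree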

Page 14: COGS109 Modeling & Data Analysis

Polynomial regression: summary

● Polynomial regression: Raise each predictor to a power to provide a nonlinear fit to the data

● Avoid overfitting → use model selection (CV & test-set MSE) to select the optimal polynomial terms (a minimal CV sketch follows below).
  ○ To perform model selection systematically, we can use forward and backward selection.
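A minimal sketch of degree selection by cross-validation, reusing the hypothetical age/reaction-time data from the previous sketch: fit each candidate degree, score it by 10-fold CV MSE, and keep the degree with the smallest held-out error.

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import PolynomialFeatures
    from sklearn.linear_model import LinearRegression
    from sklearn.model_selection import cross_val_score

    rng = np.random.default_rng(109)
    age = rng.uniform(3, 20, size=100)
    rt = 600 / age + 150 + rng.normal(scale=20, size=age.size)  # same assumed toy data as before
    X = age.reshape(-1, 1)

    cv_mse = {}
    for degree in range(1, 7):
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        scores = cross_val_score(model, X, rt, cv=10, scoring="neg_mean_squared_error")
        cv_mse[degree] = -scores.mean()      # average held-out MSE across the 10 folds

    best = min(cv_mse, key=cv_mse.get)       # degree with the smallest CV error
    print(cv_mse, "-> chosen degree:", best)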

Page 15: COGS109 Modeling & Data Analysis

Limitations of polynomial regression

Polynomial regression imposes a global structure on the nonlinear relationship between X and Y.

A 2nd degree polynomial has 1 turning point (a single U shape).

A 3rd degree polynomial has up to 2 turning points and can look like an S.

A 4th degree polynomial has up to 3 turning points and can look like a W.

(These bends are turning points, i.e. local maxima and minima, rather than inflection points: a parabola has no inflection point.)

Page 16: COGS109 Modeling & Data Analysis

Limitations of polynomial regression

Polynomial regression imposes a global structure on the nonlinear relationship between X and Y.

If we want to model an even more flexible or locally nonlinear pattern, we can break the range of X into bins, and fit a different model for each piece of X.

Piecewise functions that model local changes:

● Step functions (a minimal sketch follows this list)
● Piecewise polynomials
● Splines
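A minimal sketch of the step-function idea (the bin count and data are my own toy choices, not from the slides): cut the range of x into bins and fit a constant, the mean of y, within each bin.

    import numpy as np

    rng = np.random.default_rng(2)
    x = np.sort(rng.uniform(0, 10, size=200))
    y = np.sin(x) + rng.normal(scale=0.2, size=x.size)  # locally varying toy signal

    edges = np.linspace(0, 10, 6)            # 5 equal-width bins over the range of x
    which = np.digitize(x, edges[1:-1])      # bin index (0-4) for each observation
    bin_means = np.array([y[which == b].mean() for b in range(5)])
    step_fit = bin_means[which]              # piecewise-constant prediction: one mean per bin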