logistic regression linear, polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... ·...

24
Linear, Polynomial, and Logistic Regression Supervised Learning

Upload: others

Post on 21-Jul-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Linear, Polynomial, and Logistic Regression

Supervised Learning

Page 2: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Regressions in Machine Learning

Regression problems● Linear regression with one variable● Linear regression with multiple variables● Polynomial regression

Classification problems ● Logistic regression

Page 3: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Linear Regression with one variableSupervised Learning

Page 4: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Representation

A straight line with variable intercept and offset

Page 5: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Cost function

Mean squared error

-

Page 6: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Cost function with θ1 only

Let θ0 = 0 to simplify. (line passes through origin)

θ1

Cost fxn

J(θ1)

Page 7: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Cost function with θ0 and θ1

3 dimensional “bow shaped” cost function

Page 8: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Gradient Descent

● Can minimize J ● Used widely in machine learning● Can minimize large number of parametersGeneral procedure: 1) Start with some θ0 and θ1 (typically 0,0)2) Change θ0 and θ1 to reduce J(θ0 , θ1)3) Repeat 2 until at the minimum

Page 9: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Definition of Gradient Descent

Partial derivative (or gradient) of the cost function

Page 10: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Applying Gradient Descent to Linear Regression

● Combine the definition of gradient descent and cost function● Have to do partial derivative of the cost function

Final equations:

Page 11: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Properties of gradient descent

● For linear regression always finds global minimum

● Can become unstable if learning rate is too large or too slow if learning rate is too small

Page 12: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Linear Regression with Multiple Variables

Supervised Learning

Page 13: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

General form of linear regression

Linear regression can work with more than two parameters

By letting X0 = 1 we can write general form

Page 14: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Applying gradient descent

Algorithm: Repeat until convergence:

Cost function:

Final form:

Page 15: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Feature scaling

● If features are different in sizes, gradient descent might take long to converge

● Features can be scaled to be in same range (in approximately -1<x<1 range)

● Number of ways to scale such as mean normalization

Page 16: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Learning rate

● Plot number of iterations vs. J(θ). Should see decrease until convergence

● If learning rate is too large gradient descent might become unstable, and never converge

Page 17: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Polynomial regression

Minimized in the same way as linear regressionFor example cubic fit with one feature x:

Generate new feature by squaring cubing the original feature

Page 18: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Logistic regression

● Binary classification algorithm ● Modify the linear regression to fit logistic

function.● Output is probability of given class

Page 19: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Applying Machine Learning

Supervised and Unsupervised Learning

Page 20: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General
Page 21: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Classifier is only a part of ML system

DATAPre

processingFeature

extraction Classifier Post processing

-What kind of data?-Understanding the data

-Normalization-Scaling-Filtering -Outliers-Missing data

- Spectral analysis- Statistical features-

- Type of classifier- Tuning parameters- Training data

- Voting - Human readable form-Time out

Page 22: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

Designing ML system is iterative process

DATAPre

processingFeature

extraction Classifier Post processing

HUMAN PROGRAMMER

Page 23: Logistic Regression Linear, Polynomial, andresponsive.media.mit.edu/wp-content/uploads/sites/... · Used widely in machine learning Can minimize large number of parameters General

General advices

● It is important to have clean training data● If human expert can’t classify the data,

machine can’t also● You can’t get something from nothing● Prototype simple system first