machine learning and pattern recognitionrocha/teaching/2018s1/mo444/classes/2018... · machine...

60
Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk of slides kindly provided by Prof. Sandra Avila) Institute of Computing (IC/Unicamp) MC886/MO444 REC D reasoning for complex data

Upload: others

Post on 27-May-2020

5 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Machine Learning andPattern Recognition

A High Level Overview

Prof. Anderson Rocha(Main bulk of slides kindly provided by Prof. Sandra Avila)

Institute of Computing (IC/Unicamp)

MC886/MO444

REC Dreasoning for complex data

Page 2: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Today’s Agenda

● Regularization○ The Problem of Overfitting

○ Diagnosing Bias vs. Variance

○ Cost Function

○ Regularized Linear Regression

○ Regularized Logistic Regression

Page 3: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

The Problem of Overfitting

Page 4: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Example: Linear Regression

+ + + + + + +

Page 5: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Example: Linear Regression

+ + + + + + +

Page 6: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Example: Linear Regression

+ + + + + + +

UnderfittingHigh bias

Page 7: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Example: Linear Regression

+ + + + + + +

UnderfittingHigh bias

Page 8: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Example: Linear Regression

+ + + + + + +

UnderfittingHigh bias

Page 9: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Example: Linear Regression

+ + + + + + +

UnderfittingHigh bias

OverfittingHigh variance

Page 10: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

x2

x1

Example: Logistic Regression

x2 x2

x1 x1

×××× ××××× × ××

×××

×××× ××××× × ××

×××

×××× ××××× × ××

×××

+ + + + +++ +

+ ++

++ ++

+

× × ×

Page 11: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

x2

x1

Example: Logistic Regression

x2 x2

x1 x1

×××× ××××× × ××

×××

×××× ××××× × ××

×××

×××× ××××× × ××

×××

+ + + + +++ +

+ ++

++ ++

+

UnderfittingHigh bias

× × ×

Page 12: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

x2

x1

Example: Logistic Regression

x2 x2

x1 x1

×××× ××××× × ××

×××

×××× ××××× × ××

×××

×××× ××××× × ××

×××

+ + + + +++ +

+ ++

++ ++

+

UnderfittingHigh bias

× × ×

Page 13: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

x2

x1

Example: Logistic Regression

x2 x2

x1 x1

×××× ××××× × ××

×××

×××× ××××× × ××

×××

×××× ××××× × ××

×××

+ + + + +++ +

+ ++

++ ++

+

UnderfittingHigh bias

OverfittingHigh variance

× × ×

Page 14: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

A model’s generalization error can be expressed as the sum of three very different errors:

● Bias● Variance● Irreducible error

The Bias/Variance Tradeoff

Page 15: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

A model’s generalization error can be expressed as the sum of three very different errors:

● Bias○ Due to wrong assumptions, such as assuming that the data is linear

when it is actually quadratic. ○ A high-bias model is most likely to underfit the training data.

● Variance● Irreducible error

The Bias/Variance Tradeoff

Page 16: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

A model’s generalization error can be expressed as the sum of three very different errors:

● Bias● Variance

○ Due to the model’s excessive sensitivity to small variations in the training data.

○ A model with many degrees of freedom is likely to have high variance, and thus to overfit the training data.

● Irreducible error

The Bias/Variance Tradeoff

Page 17: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

A model’s generalization error can be expressed as the sum of three very different errors:

● Bias● Variance● Irreducible error

○ Due to the noisiness of the data itself. ○ The only way to reduce this part of the error is to clean up the data.

The Bias/Variance Tradeoff

Page 18: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Increasing a model’s complexity will typically increase its variance and reduce its bias.

Reducing a model’s complexity increases its bias and reduces its variance.

This is why it is called a tradeoff.

The Bias/Variance Tradeoff

Page 19: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

DiagnosingBias vs. Variance

Page 20: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Bias/Variance

+ + + + + + +

UnderfittingHigh bias

OverfittingHigh variance

Page 21: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Bias/Variance

+ + + + + + +

UnderfittingHigh bias

OverfittingHigh variance

d = 1 d = 2 d = 4

Page 22: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Training error:

Cross-validation error:

Bias/Variance

=

= −

erro

r

degree of polynomial d

Page 23: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Training error:

Cross-validation error:

Bias/Variance

=

= −

erro

r

degree of polynomial d

Page 24: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Training error:

Cross-validation error:

Bias/Variance

=

= −

erro

r

degree of polynomial d

Page 25: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Diagnosing Bias vs. Varianceer

ror

degree of polynomial d

Suppose your learning algorithm is performing less well than you were hoping: is high. Is it a bias problem or a variance problem?

Page 26: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Diagnosing Bias vs. Varianceer

ror

degree of polynomial d

Suppose your learning algorithm is performing less well than you were hoping: is high. Is it a bias problem or a variance problem?

Page 27: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Diagnosing Bias vs. Varianceer

ror

degree of polynomial d

Bias (underfit):

Variance (overfit):Bias

Suppose your learning algorithm is performing less well than you were hoping: is high. Is it a bias problem or a variance problem?

Page 28: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Diagnosing Bias vs. Varianceer

ror

degree of polynomial d

Bias (underfit):

Variance (overfit):

will be high≈

Suppose your learning algorithm is performing less well than you were hoping: is high. Is it a bias problem or a variance problem?

Page 29: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Diagnosing Bias vs. Varianceer

ror

degree of polynomial d

Bias (underfit):

Variance (overfit):

will be high≈

will be low≫

Suppose your learning algorithm is performing less well than you were hoping: is high. Is it a bias problem or a variance problem?

Page 30: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Cost Function

Page 31: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Intuition

+ + + + + +

Page 32: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Intuition

+ + + + + +

Suppose we penalize and make 𝜃3, 𝜃4 really small.

+ +−

Page 33: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Intuition

+ + + + + +

Suppose we penalize and make 𝜃3, 𝜃4 really small.

+ +−

Page 34: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Pric

e

×Size

×× × ×

Pric

e

×Size

×× × ×

Intuition

+ + + + + +

Suppose we penalize and make 𝜃3, 𝜃4 really small.

+ +−

𝜃3 ≈ 0𝜃4 ≈ 0

Page 35: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Regularization

Small values for parameters 𝜃0, 𝜃1, ...,𝜃n ● “Simpler” hypothesis● Less prone to overfitting

Page 36: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Regularization

Small values for parameters 𝜃0, 𝜃1, ...,𝜃n ● “Simpler” hypothesis● Less prone to overfitting

Housing ● Features: x0, x1, ..., x100

● Parameters: 𝜃0, 𝜃1, 𝜃2, ..., 𝜃100

+− +=

Page 37: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Regularization

Small values for parameters 𝜃0, 𝜃1, ...,𝜃n ● “Simpler” hypothesis● Less prone to overfitting

Housing ● Features: x0, x1, ..., x100

● Parameters: 𝜃0, 𝜃1, 𝜃2, ..., 𝜃100

+−=

Page 38: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Regularization

Regularization parameter

+−=

to fit the training data well

to keep the parameters small

Page 39: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

+−=

In regularized linear regression, we choose 𝜃 to minimize

What if 𝜆 is set to an extremely large value (perhaps for too large for our problem, say 𝜆 = 1010)?

Pric

e

×Size

×× × ×

+ + + +

Page 40: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

+−=

In regularized linear regression, we choose 𝜃 to minimize

What if 𝜆 is set to an extremely large value (perhaps for too large for our problem, say 𝜆 = 1010)?

Pric

e

×Size

×× × ×

+ + + +

Page 41: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

+−=

In regularized linear regression, we choose 𝜃 to minimize

What if 𝜆 is set to an extremely large value (perhaps for too large for our problem, say 𝜆 = 1010)?

Pric

e

×Size

×× × ×

+ + + +

Page 42: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Regularized Linear Function

Page 43: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Gradient Descent

repeat {

}

− −− −

(simultaneously update for = )

Page 44: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Gradient Descent

repeat {

}

− −

(simultaneously update for = )

− −

Page 45: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Gradient Descent

repeat {

}

− − +

(simultaneously update for = )

− −

Page 46: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Gradient Descent

repeat {

}

− − +

(simultaneously update for = )

− −

− − −

Page 47: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Gradient Descent

repeat {

}

− − +

(simultaneously update for = )

− −

− − −

Page 48: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

X = = =

Normal Equation

Page 49: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

X = = =

Normal Equation

= 𝜆+

-1

Page 50: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

X = = =

Normal Equation

= 𝜆+

-1

Page 51: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

http://melvincabatuan.github.io/Machine-Learning-Activity-4/

Page 52: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

http://melvincabatuan.github.io/Machine-Learning-Activity-4/

Page 53: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

http://melvincabatuan.github.io/Machine-Learning-Activity-4/

Page 54: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Regularized Logistic Function

Page 55: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Gradient Descent

repeat {

}

− − +

(simultaneously update for = )

− −

− − −

Page 56: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

Gradient Descent

repeat {

}

− − +

(simultaneously update for = )

− −

− − −

= +=

Page 57: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

http://melvincabatuan.github.io/Machine-Learning-Activity-4/

Page 58: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

http://melvincabatuan.github.io/Machine-Learning-Activity-4/

Page 59: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

http://melvincabatuan.github.io/Machine-Learning-Activity-4/

Page 60: Machine Learning and Pattern Recognitionrocha/teaching/2018s1/mo444/classes/2018... · Machine Learning and Pattern Recognition A High Level Overview Prof. Anderson Rocha (Main bulk

References

Machine Learning Books

● Hands-On Machine Learning with Scikit-Learn and TensorFlow, Chap. 4

● Pattern Recognition and Machine Learning, Chap. 3

Machine Learning Courses

● https://www.coursera.org/learn/machine-learning, Week 3 & 6