introduction to machine learning - rutgers universityelgammal/classes/cs536/... · introduction to...

33
INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 [email protected] http://www.cmpe.boun.edu.tr/~ethem/i2ml Lecture Slides for

Upload: others

Post on 20-May-2020

33 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

INTRODUCTION TO

Machine Learning

ETHEM ALPAYDIN© The MIT Press, 2004

[email protected]://www.cmpe.boun.edu.tr/~ethem/i2ml

Lecture Slides for

Page 2: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

CHAPTER 10:

Linear Discrimination

Page 3: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

3

Likelihood- vs. Discriminant-based Classification

Likelihood-based: Assume a model for p(x|Ci), use Bayes’ rule to calculate P(Ci|x)

gi(x) = log P(Ci|x)

Discriminant-based: Assume a model for gi(x|Φi); no density estimation

Estimating the boundaries is enough; no need to accurately estimate the densities inside the boundaries

Page 4: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

4

Linear Discriminant

Linear discriminant:

Advantages:Simple: O(d) space/computation

Knowledge extraction: Weighted sum of attributes; positive/negative weights, magnitudes (credit scoring)Optimal when p(x|Ci) are Gaussian with shared cov matrix; useful when classes are (almost) linearly separable

Page 5: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

5

Generalized Linear Model

Quadratic discriminant:

Higher-order (product) terms:

Map from x to z using nonlinear basis functions and use a linear discriminant in z-space

Page 6: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

6

Two Classes

Page 7: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

7

Geometry

Page 8: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

8

Multiple Classes

Classes arelinearly separable

Page 9: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

9

Pairwise Separation

Page 10: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

10

From Discriminants to Posteriors

When p (x | Ci ) ~ N ( µi , ∑)

Page 11: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

11

Page 12: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

12

Sigmoid (Logistic) Function

Page 13: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

13

Gradient-Descent

E(w|X) is error with parameters w on sample X

Gradient

Gradient-descent:

Starts from random w and updates w iteratively in the negative direction of gradient

Page 14: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

14

Gradient-Descent

wt wt+1

η

E (wt)

E (wt+1)

Page 15: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

15

Logistic Discrimination

Two classes: Assume log likelihood ratio is linear

Page 16: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

16

Training: Two Classes

Page 17: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

17

Training: Gradient-Descent

Page 18: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

18

Page 19: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

19

10

100 1000

Page 20: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

20

K>2 Classessoftmax

Page 21: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

21

Page 22: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

22

Example

Page 23: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

23

Generalizing the Linear Model

Quadratic:

Sum of basis functions:

where φ(x) are basis functions Kernels in SVMHidden units in neural networks

Page 24: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

24

Discrimination by Regression

Classes are NOT mutually exclusive and exhaustive

Page 25: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

25

Optimal Separating Hyperplane

(Cortes and Vapnik, 1995; Vapnik, 1995)

Page 26: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

26

Margin

Distance from the discriminant to the closest instances on either side

Distance of x to the hyperplane is

We require

For a unique sol’n, fix and to max margin

Page 27: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

27

Page 28: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

28

Page 29: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

29

Most αt are 0 and only a small number have αt>0; they are the support vectors

Page 30: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

30

Soft Margin Hyperplane

Not linearly separable

Soft error

New primal is

Page 31: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

31

Kernel Machines

Preprocess input x by basis functions

The SVM solution

Page 32: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

32

Kernel Functions

Polynomials of degree q:

Radial-basis functions:

Sigmoidal functions:

(Cherkassky and Mulier, 1998)

Page 33: INTRODUCTION TO Machine Learning - Rutgers Universityelgammal/classes/cs536/... · INTRODUCTION TO Machine Learning ETHEM ALPAYDIN © The MIT Press, 2004 alpaydin@boun.edu.tr ethem/i2ml

Lecture Notes for E Alpaydın 2004 Introduction to Machine Learning © The MIT Press (V1.0)

33

SVM for Regression

Use a linear model (possibly kernelized)

Use the є-sensitive error function