Lecture Slides for
INTRODUCTION TO MACHINE LEARNING, 3RD EDITION
ETHEM ALPAYDIN
Modified by Prof. Carolina Ruiz © The MIT Press, 2014 for CS539 Machine Learning at WPI
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml3e



Page 2:

CHAPTER 5:

MULTIVARIATE METHODS

Page 3:

Multivariate Data

Multiple measurements (sensors); d inputs/features/attributes: d-variate; N instances/observations/examples:

$$\mathbf{X} = \begin{bmatrix} X_1^1 & X_2^1 & \cdots & X_d^1 \\ X_1^2 & X_2^2 & \cdots & X_d^2 \\ \vdots & \vdots & & \vdots \\ X_1^N & X_2^N & \cdots & X_d^N \end{bmatrix}$$

Page 4:

Multivariate Parameters

Mean: $E[\mathbf{x}] = \boldsymbol{\mu} = [\mu_1, \ldots, \mu_d]^T$

Covariance: $\sigma_{ij} \equiv \mathrm{Cov}(X_i, X_j) = E\left[(X_i - \mu_i)(X_j - \mu_j)\right] = E[X_i X_j] - \mu_i \mu_j$

Correlation: $\mathrm{Corr}(X_i, X_j) \equiv \rho_{ij} = \dfrac{\sigma_{ij}}{\sigma_i \sigma_j}$

Covariance matrix:

$$\boldsymbol{\Sigma} \equiv \mathrm{Cov}(\mathbf{X}) = E\left[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T\right] = \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \cdots & \sigma_{1d} \\ \sigma_{21} & \sigma_2^2 & \cdots & \sigma_{2d} \\ \vdots & \vdots & & \vdots \\ \sigma_{d1} & \sigma_{d2} & \cdots & \sigma_d^2 \end{bmatrix}$$

Page 5:

Parameter Estimation from data sample X

Sample mean $\mathbf{m}$: $\quad m_i = \dfrac{\sum_{t=1}^{N} x_i^t}{N}, \quad i = 1, \ldots, d$

Covariance matrix $\mathbf{S}$: $\quad s_{ij} = \dfrac{\sum_{t=1}^{N} (x_i^t - m_i)(x_j^t - m_j)}{N}$

Correlation matrix $\mathbf{R}$: $\quad r_{ij} = \dfrac{s_{ij}}{s_i s_j}$
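A minimal NumPy sketch of these estimators, on a made-up toy data matrix (all values illustrative):

```python
import numpy as np

# Toy data matrix X: N=5 instances, d=2 features (illustrative values).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [3.0, 4.0],
              [4.0, 3.0],
              [5.0, 5.0]])
N, d = X.shape

m = X.mean(axis=0)            # sample mean m, one entry per feature
Xc = X - m                    # centered data
S = Xc.T @ Xc / N             # covariance matrix S (1/N convention, as on the slide)
s = np.sqrt(np.diag(S))       # per-feature standard deviations s_i
R = S / np.outer(s, s)        # correlation matrix r_ij = s_ij / (s_i s_j)
```

Note the 1/N normalization matches the slide; NumPy's `np.cov` uses 1/(N-1) unless `bias=True` is passed.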

Page 6:

Estimation of Missing Values

What to do if certain instances have missing attribute values?

Ignore those instances: not a good idea if the sample is small.

Use 'missing' as an attribute value: it may convey information.

Imputation: fill in the missing value.
  Mean imputation: use the most likely value (e.g., the mean).
  Imputation by regression: predict the missing value from the other attributes.
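Mean imputation can be sketched in a few lines of NumPy; the data values here are made up for illustration:

```python
import numpy as np

# Toy data with a missing value (np.nan) in feature 0 (illustrative).
X = np.array([[1.0,    2.0],
              [np.nan, 1.0],
              [3.0,    4.0],
              [4.0,    3.0]])

# Mean imputation: replace each missing entry by that feature's mean
# computed over the observed values only.
X_imp = X.copy()
col_means = np.nanmean(X_imp, axis=0)   # per-feature means, ignoring NaNs
idx = np.where(np.isnan(X_imp))         # (row, column) indices of missing entries
X_imp[idx] = np.take(col_means, idx[1])
# Imputation by regression would instead fit a model of feature 0 on the
# other attributes and predict the missing entry from them.
```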

Page 7:

Multivariate Normal Distribution

$$\mathbf{x} \sim \mathcal{N}_d(\boldsymbol{\mu}, \boldsymbol{\Sigma})$$

$$p(\mathbf{x}) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}|^{1/2}} \exp\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu})\right]$$
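The density formula above translates directly into NumPy; this is a bare sketch, not a numerically hardened implementation (for large d one would use a Cholesky factorization rather than `inv` and `det`):

```python
import numpy as np

def mvn_pdf(x, mu, Sigma):
    """Density of N_d(mu, Sigma) at x, computed directly from the formula."""
    d = len(mu)
    diff = x - mu
    maha = diff @ np.linalg.inv(Sigma) @ diff        # (x-mu)^T Sigma^-1 (x-mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * maha) / norm

# Standard bivariate normal at the origin: density is 1 / (2*pi).
p = mvn_pdf(np.zeros(2), np.zeros(2), np.eye(2))
```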

Page 8:

Multivariate Normal Distribution

Mahalanobis distance: (x – μ)T ∑–1 (x – μ) measures the distance from x to μ in terms of ∑ (normalizes for difference in variances and correlations)

Bivariate: d = 2

$$p(x_1, x_2) = \frac{1}{2\pi \sigma_1 \sigma_2 \sqrt{1 - \rho^2}} \exp\left[-\frac{1}{2(1 - \rho^2)} \left(z_1^2 - 2\rho z_1 z_2 + z_2^2\right)\right]$$

where $z_i = \dfrac{x_i - \mu_i}{\sigma_i}$ (z-normalization).
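A small sketch of the Mahalanobis distance, with illustrative inputs, showing how it normalizes for differences in variance:

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance (x - mu)^T Sigma^-1 (x - mu)."""
    diff = x - mu
    return float(diff @ np.linalg.solve(Sigma, diff))

x = np.array([2.0, 0.0])
mu = np.zeros(2)

# With Sigma = I, the Mahalanobis distance equals the Euclidean distance.
d_id = mahalanobis_sq(x, mu, np.eye(2))

# A larger variance along the first axis shrinks distance in that direction.
d_sc = mahalanobis_sq(x, mu, np.diag([4.0, 1.0]))
```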

Page 9:

Bivariate Normal: isoprobability contour plots [i.e., $(\mathbf{x} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{x} - \boldsymbol{\mu}) = c^2$]

when covariance is 0, ellipsoid axes are parallel to coordinate axes

Page 10:

Page 11:

Independent Inputs: Naive Bayes

If the $x_i$ are independent, the off-diagonal values of $\boldsymbol{\Sigma}$ are 0, and the Mahalanobis distance reduces to a weighted (by $1/\sigma_i$) Euclidean distance:

$$p(\mathbf{x}) = \prod_{i=1}^{d} p_i(x_i) = \frac{1}{(2\pi)^{d/2} \prod_{i=1}^{d} \sigma_i} \exp\left[-\frac{1}{2} \sum_{i=1}^{d} \left(\frac{x_i - \mu_i}{\sigma_i}\right)^2\right]$$

If the variances are also equal, it reduces to the Euclidean distance.

Note: the use of the term "Naïve Bayes" in this chapter is somewhat imprecise. Naïve Bayes assumes independence in the probabilistic sense, not in the linear-algebra sense.
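The factorization above can be checked numerically: with a diagonal $\boldsymbol{\Sigma}$, the multivariate density equals the product of the univariate densities. The parameter values here are made up for illustration:

```python
import numpy as np

mu = np.array([0.0, 1.0])
sigma = np.array([1.0, 2.0])      # per-feature standard deviations (illustrative)
x = np.array([0.5, -1.0])

def univariate(x, m, s):
    """Univariate normal density, applied elementwise."""
    return np.exp(-0.5 * ((x - m) / s) ** 2) / (np.sqrt(2 * np.pi) * s)

# Product of the d univariate densities.
p_prod = np.prod(univariate(x, mu, sigma))

# Full multivariate density with Sigma = diag(sigma^2).
Sigma = np.diag(sigma ** 2)
diff = x - mu
p_full = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) / (
    (2 * np.pi) ** (len(mu) / 2) * np.sqrt(np.linalg.det(Sigma)))
```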

Page 12:

Parametric Classification

If $p(\mathbf{x} \mid C_i) \sim \mathcal{N}(\boldsymbol{\mu}_i, \boldsymbol{\Sigma}_i)$:

$$p(\mathbf{x} \mid C_i) = \frac{1}{(2\pi)^{d/2} |\boldsymbol{\Sigma}_i|^{1/2}} \exp\left[-\frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i)\right]$$

Discriminant functions:

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = -\frac{d}{2}\log 2\pi - \frac{1}{2}\log|\boldsymbol{\Sigma}_i| - \frac{1}{2} (\mathbf{x} - \boldsymbol{\mu}_i)^T \boldsymbol{\Sigma}_i^{-1} (\mathbf{x} - \boldsymbol{\mu}_i) + \log P(C_i)$$

Page 13:

Estimation of Parameters from data sample X

where $r_i^t = 1$ if $\mathbf{x}^t \in C_i$ and 0 otherwise:

$$\hat{P}(C_i) = \frac{\sum_t r_i^t}{N}$$

$$\mathbf{m}_i = \frac{\sum_t r_i^t \mathbf{x}^t}{\sum_t r_i^t}$$

$$\mathbf{S}_i = \frac{\sum_t r_i^t (\mathbf{x}^t - \mathbf{m}_i)(\mathbf{x}^t - \mathbf{m}_i)^T}{\sum_t r_i^t}$$

$$g_i(\mathbf{x}) = -\frac{1}{2}\log|\mathbf{S}_i| - \frac{1}{2} (\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}_i^{-1} (\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i)$$
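These estimators and the resulting discriminant can be sketched as follows; the labeled sample is made up for illustration:

```python
import numpy as np

# Toy labeled sample: x^t in R^2 with class labels y^t in {0, 1} (illustrative).
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],
              [5.0, 5.0], [6.0, 5.5], [5.5, 6.0]])
y = np.array([0, 0, 0, 1, 1, 1])
N = len(y)

priors, means, covs = {}, {}, {}
for c in np.unique(y):
    Xc = X[y == c]                   # instances with r_i^t = 1
    priors[c] = len(Xc) / N          # P_hat(C_i)
    means[c] = Xc.mean(axis=0)       # m_i
    D = Xc - means[c]
    covs[c] = D.T @ D / len(Xc)      # S_i (1/N_i convention, as on the slide)

def g(x, c):
    """Discriminant g_i(x) with a class-specific covariance S_i."""
    diff = x - means[c]
    return (-0.5 * np.log(np.linalg.det(covs[c]))
            - 0.5 * diff @ np.linalg.inv(covs[c]) @ diff
            + np.log(priors[c]))

# Classify a query point by the largest discriminant value.
pred = max(priors, key=lambda c: g(np.array([1.5, 1.5]), c))
```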

Page 14:

Assuming a different Si for each Ci

Quadratic discriminant. Expanding the formula on previous slide:

This has the form of a quadratic function of $\mathbf{x}$.

See figure on next slide

$$g_i(\mathbf{x}) = \mathbf{x}^T \mathbf{W}_i \mathbf{x} + \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

where

$$\mathbf{W}_i = -\frac{1}{2}\mathbf{S}_i^{-1}$$

$$\mathbf{w}_i = \mathbf{S}_i^{-1} \mathbf{m}_i$$

$$w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}_i^{-1} \mathbf{m}_i - \frac{1}{2}\log|\mathbf{S}_i| + \log \hat{P}(C_i)$$
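The expansion can be verified numerically: the quadratic form with $\mathbf{W}_i$, $\mathbf{w}_i$, $w_{i0}$ gives the same value as the direct discriminant. The parameters below are illustrative:

```python
import numpy as np

# Illustrative class parameters m_i, S_i, P_hat(C_i).
m_i = np.array([1.0, 2.0])
S_i = np.array([[2.0, 0.5], [0.5, 1.0]])
P_i = 0.4
S_inv = np.linalg.inv(S_i)

# Coefficients of the expanded quadratic discriminant.
W_i = -0.5 * S_inv
w_i = S_inv @ m_i
w_i0 = (-0.5 * m_i @ S_inv @ m_i
        - 0.5 * np.log(np.linalg.det(S_i))
        + np.log(P_i))

x = np.array([0.3, -1.2])
g_quad = x @ W_i @ x + w_i @ x + w_i0

# Direct form from the previous slide (dropping the shared -d/2 log 2*pi term).
g_direct = (-0.5 * np.log(np.linalg.det(S_i))
            - 0.5 * (x - m_i) @ S_inv @ (x - m_i)
            + np.log(P_i))
```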

Page 15:

(figure: likelihoods, posterior for $C_1$, and the discriminant $P(C_1 \mid \mathbf{x}) = 0.5$)

Page 16:

Assuming Common Covariance Matrix S

Shared common sample covariance S:

$$\mathbf{S} = \sum_i \hat{P}(C_i)\, \mathbf{S}_i$$

The discriminant reduces to

$$g_i(\mathbf{x}) = -\frac{1}{2} (\mathbf{x} - \mathbf{m}_i)^T \mathbf{S}^{-1} (\mathbf{x} - \mathbf{m}_i) + \log \hat{P}(C_i)$$

which is a linear discriminant:

$$g_i(\mathbf{x}) = \mathbf{w}_i^T \mathbf{x} + w_{i0}$$

where

$$\mathbf{w}_i = \mathbf{S}^{-1}\mathbf{m}_i, \qquad w_{i0} = -\frac{1}{2}\mathbf{m}_i^T \mathbf{S}^{-1} \mathbf{m}_i + \log \hat{P}(C_i)$$
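A sketch of the resulting linear discriminant, with illustrative means, priors, and shared covariance:

```python
import numpy as np

# Illustrative class means, priors, and shared covariance S.
means = {0: np.array([1.0, 1.0]), 1: np.array([4.0, 4.0])}
priors = {0: 0.5, 1: 0.5}
S = np.array([[1.0, 0.2], [0.2, 1.0]])
S_inv = np.linalg.inv(S)

def g_lin(x, c):
    """Linear discriminant w_i^T x + w_i0 with shared covariance."""
    w = S_inv @ means[c]
    w0 = -0.5 * means[c] @ S_inv @ means[c] + np.log(priors[c])
    return w @ x + w0

# The boundary between classes is linear; a point near m_0 goes to class 0.
pred = max(priors, key=lambda c: g_lin(np.array([1.2, 0.9]), c))
```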

Page 17:

Common Covariance Matrix S

Arbitrary covariances but shared by classes

Page 18:

Assuming Common Covariance Matrix S is Diagonal

When the $x_j$, $j = 1, \ldots, d$, are independent, $\boldsymbol{\Sigma}$ is diagonal:

$$p(\mathbf{x} \mid C_i) = \prod_j p(x_j \mid C_i) \quad \text{(Naive Bayes' assumption)}$$

$$g_i(\mathbf{x}) = -\frac{1}{2}\sum_{j=1}^{d} \left(\frac{x_j^t - m_{ij}}{s_j}\right)^2 + \log \hat{P}(C_i)$$

Classify based on weighted Euclidean distance (in sj units) to the nearest mean

Page 19:

Assuming Common Covariance Matrix S is Diagonal

variances may be different

Covariances are 0, so ellipsoid axes are parallel to coordinate axes

Page 20:

Assuming Common Covariance Matrix S is Diagonal and variances are equal

$$g_i(\mathbf{x}) = -\frac{\|\mathbf{x} - \mathbf{m}_i\|^2}{2s^2} + \log \hat{P}(C_i) = -\frac{1}{2s^2}\sum_{j=1}^{d} (x_j^t - m_{ij})^2 + \log \hat{P}(C_i)$$

Nearest mean classifier: Classify based on Euclidean distance to the nearest mean

Each mean can be considered a prototype or template and this is template matching
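A nearest-mean (template-matching) classifier is a few lines; the class prototypes here are made up for illustration:

```python
import numpy as np

# Each class is represented by its mean vector (prototype/template).
means = np.array([[1.0, 1.0],     # prototype for class 0 (illustrative)
                  [5.0, 5.0]])    # prototype for class 1 (illustrative)

def nearest_mean(x):
    """Classify by Euclidean distance to the nearest class mean."""
    dists = np.linalg.norm(means - x, axis=1)
    return int(np.argmin(dists))

label = nearest_mean(np.array([1.5, 2.0]))
```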

Page 21:

Assuming Common Covariance Matrix S is Diagonal and variances are equal


Covariances are 0, so ellipsoid axes are parallel to coordinate axes. Variances are the same, so ellipsoids become circles.

Classifier looks for nearest mean

Page 22:

Model Selection

Assumption                  | Covariance matrix      | No. of parameters
Shared, hyperspheric        | S_i = S = s^2 I        | 1
Shared, axis-aligned        | S_i = S, with s_ij = 0 | d
Shared, hyperellipsoidal    | S_i = S                | d(d+1)/2
Different, hyperellipsoidal | S_i                    | K · d(d+1)/2

As we increase complexity (less restricted S), bias decreases and variance increases

Assume simple models (allow some bias) to control variance (regularization)

Page 23:

Different cases of covariance matrices fitted to the same data lead to different decision boundaries.

Page 24:

Discrete Features

Binary features: $p_{ij} \equiv p(x_j = 1 \mid C_i)$

If the $x_j$ are independent (Naive Bayes'):

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} p_{ij}^{x_j} (1 - p_{ij})^{1 - x_j}$$

The discriminant is linear:

$$g_i(\mathbf{x}) = \log p(\mathbf{x} \mid C_i) + \log P(C_i) = \sum_j \left[x_j \log p_{ij} + (1 - x_j) \log(1 - p_{ij})\right] + \log P(C_i)$$

Estimated parameters:

$$\hat{p}_{ij} = \frac{\sum_t x_j^t r_i^t}{\sum_t r_i^t}$$
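A minimal sketch of this binary-feature naive Bayes, using made-up data and the $\hat{p}_{ij}$ estimator above; the clipping constant is an assumption added here to avoid $\log 0$ (Laplace smoothing would be the more standard fix):

```python
import numpy as np

# Toy data (illustrative): N=6 instances, d=2 binary features, 2 classes.
X = np.array([[1, 1], [1, 0], [1, 1],
              [0, 0], [0, 1], [0, 0]])
y = np.array([0, 0, 0, 1, 1, 1])

classes = np.unique(y)
priors = np.array([np.mean(y == c) for c in classes])
# p_hat[i, j] = fraction of class-i instances with x_j = 1.
p_hat = np.array([X[y == c].mean(axis=0) for c in classes])
p_hat = np.clip(p_hat, 1e-9, 1 - 1e-9)   # avoid log(0) on this toy data

def g(x, i):
    """Linear discriminant for binary features."""
    return (np.sum(x * np.log(p_hat[i]) + (1 - x) * np.log(1 - p_hat[i]))
            + np.log(priors[i]))

pred = int(np.argmax([g(np.array([1, 1]), i) for i in classes]))
```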

Page 25:

Discrete Features

Multinomial (1-of-$n_j$) features: $x_j \in \{v_1, v_2, \ldots, v_{n_j}\}$

$$p_{ijk} \equiv p(z_{jk} = 1 \mid C_i) = p(x_j = v_k \mid C_i)$$

where $z_{jk} = 1$ if $x_j = v_k$, and 0 otherwise. If the $x_j$ are independent:

$$p(\mathbf{x} \mid C_i) = \prod_{j=1}^{d} \prod_{k=1}^{n_j} p_{ijk}^{z_{jk}}$$

$$g_i(\mathbf{x}) = \sum_j \sum_k z_{jk} \log p_{ijk} + \log P(C_i)$$

Estimated parameters:

$$\hat{p}_{ijk} = \frac{\sum_t z_{jk}^t r_i^t}{\sum_t r_i^t}$$

Page 26:

Multivariate Regression

$$r^t = g(\mathbf{x}^t \mid w_0, w_1, \ldots, w_d) + \varepsilon$$

Multivariate linear model:

$$g(\mathbf{x}^t) = w_0 + w_1 x_1^t + w_2 x_2^t + \cdots + w_d x_d^t$$

$$E(w_0, w_1, \ldots, w_d \mid \mathcal{X}) = \frac{1}{2}\sum_t \left(r^t - w_0 - w_1 x_1^t - w_2 x_2^t - \cdots - w_d x_d^t\right)^2$$

Multivariate polynomial model: define new higher-order variables

$$z_1 = x_1, \quad z_2 = x_2, \quad z_3 = x_1^2, \quad z_4 = x_2^2, \quad z_5 = x_1 x_2$$

and use the linear model in this new $\mathbf{z}$ space (basis functions, kernel trick: Chapter 13).
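Minimizing $E$ is an ordinary least-squares problem; a sketch with targets generated from known weights (all values illustrative) so the fit can be checked:

```python
import numpy as np

# Toy data generated from known weights (illustrative, noise-free).
rng = np.random.default_rng(0)
N, d = 50, 3
X = rng.normal(size=(N, d))
w_true = np.array([2.0, -1.0, 0.5])
w0_true = 1.5
r = w0_true + X @ w_true

# Augment with a column of ones so w_0 is estimated along with w_1..w_d,
# then solve the least-squares problem min_w ||A w - r||^2.
A = np.hstack([np.ones((N, 1)), X])
w_hat, *_ = np.linalg.lstsq(A, r, rcond=None)
```

The polynomial model is fit the same way after building the z columns (e.g., appending `x1**2`, `x2**2`, `x1*x2` to the design matrix).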