Upload: john-james

Posted on 13-Dec-2015

Chapter 3: Maximum-Likelihood Parameter Estimation

• Introduction
• Maximum-Likelihood Estimation
• Multivariate Case: unknown μ, known Σ
• Univariate Case: unknown μ and unknown σ²
• Bias
• Appendix: Maximum-Likelihood Problem Statement

Pattern Classification, Chapter 3

Introduction

Data availability in a Bayesian framework

• We could design an optimal classifier if we knew:
  – P(ωi) (priors)
  – P(x | ωi) (class-conditional densities)
• Unfortunately, we rarely have this complete information!
• Design a classifier from a training sample:
  – Prior estimation poses no problem
  – Samples are often too small for class-conditional estimation (large dimension of feature space!)

A priori information about the problem

• Normality of P(x | ωi):
  P(x | ωi) ~ N(μi, Σi)
  – Characterized by the parameters μi and Σi
• Estimation techniques
  – Maximum-Likelihood and Bayesian estimation
  – Results are nearly identical, but the approaches are different
  – We will not cover Bayesian estimation details

• Parameters in Maximum-Likelihood estimation are fixed but unknown!
  – The best parameters are obtained by maximizing the probability of obtaining the samples observed
• Bayesian methods view the parameters as random variables having some known distribution
• In either approach, we use P(ωi | x) for our classification rule!

Maximum-Likelihood Estimation

• Has good convergence properties as the sample size increases
• Simpler than alternative techniques

General principle

• Assume we have c classes and
  P(x | ωj) ~ N(μj, Σj)
  P(x | ωj) ≡ P(x | ωj, θj), where:

$$\theta_j = (\mu_j, \Sigma_j) = \left(\mu_j^{1}, \mu_j^{2}, \ldots, \sigma_j^{11}, \sigma_j^{22}, \ldots, \mathrm{cov}(x_j^{m}, x_j^{n}), \ldots\right)$$

• Use the information provided by the training samples to estimate θ = (θ1, θ2, …, θc), where each θi (i = 1, 2, …, c) is associated with category ωi
• Suppose that D contains n samples, x1, x2, …, xn
• The ML estimate of θ is, by definition, the value θ̂ that maximizes P(D | θ):
  "It is the value of θ that best agrees with the actually observed training sample"

$$P(D \mid \theta) = \prod_{k=1}^{n} P(x_k \mid \theta) = F(\theta)$$

P(D | θ) is called the likelihood of θ with respect to the set of samples.
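
The likelihood is a product of n per-sample densities. The following minimal Python sketch assumes i.i.d. univariate Gaussian samples with σ known. The data D and the helper names `gaussian_pdf`, `likelihood` and `log_likelihood` are illustrative and do not come from the slides. The sketch checks that P(D | θ) equals exp(l(θ)), then shows why the log form is preferred numerically.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Univariate normal density N(mu, sigma^2) evaluated at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (math.sqrt(2 * math.pi) * sigma)

def likelihood(samples, mu, sigma):
    """P(D | theta) = product over k of P(x_k | theta), with theta = mu and sigma known."""
    p = 1.0
    for x in samples:
        p *= gaussian_pdf(x, mu, sigma)
    return p

def log_likelihood(samples, mu, sigma):
    """l(theta) = ln P(D | theta) = sum over k of ln P(x_k | theta)."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
               for x in samples)

D = [1.2, 0.7, 2.1, 1.5]              # illustrative training sample
print(likelihood(D, 1.0, 1.0), math.exp(log_likelihood(D, 1.0, 1.0)))  # equal up to rounding

# Repeating the sample 500 times (n = 2000): the product underflows to 0.0,
# while the log-likelihood remains a finite (large negative) number.
big_D = D * 500
print(likelihood(big_D, 1.0, 1.0), log_likelihood(big_D, 1.0, 1.0))
```

Because ln is monotonic, maximizing l(θ) selects the same θ as maximizing P(D | θ). The sum of logs simply avoids floating-point underflow.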

[Figure: likelihood and log-likelihood plotted as functions of the unknown parameter θ, with σ fixed]

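Optimal estimation

• Let θ = (θ1, θ2, …, θp)^t and let ∇θ be the gradient operator:

$$\nabla_\theta = \left[\frac{\partial}{\partial \theta_1}, \frac{\partial}{\partial \theta_2}, \ldots, \frac{\partial}{\partial \theta_p}\right]^t$$

• We define l(θ) as the log-likelihood function:
  l(θ) = ln P(D | θ)
• New problem statement: determine the θ that maximizes the log-likelihood:

$$\hat{\theta} = \arg\max_{\theta}\, l(\theta)$$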
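
The definition θ̂ = arg max l(θ) can be illustrated numerically. The sketch below uses a univariate Gaussian with σ known, so θ = μ. The brute-force grid search and its bounds are illustrative choices, not a method from the slides. The maximizer it finds lies next to the sample mean of D.

```python
import math

def log_likelihood(samples, mu, sigma):
    """l(theta) for i.i.d. N(mu, sigma^2) samples; only theta = mu is unknown."""
    return sum(-0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)
               for x in samples)

def grid_argmax(samples, sigma, lo, hi, steps):
    """Approximate argmax_theta l(theta) by evaluating l on steps + 1 evenly spaced points."""
    best_theta, best_l = lo, -math.inf
    for i in range(steps + 1):
        theta = lo + (hi - lo) * i / steps
        l = log_likelihood(samples, theta, sigma)
        if l > best_l:
            best_theta, best_l = theta, l
    return best_theta

D = [1.2, 0.7, 2.1, 1.5]
theta_hat = grid_argmax(D, sigma=1.0, lo=-5.0, hi=5.0, steps=10_000)
print(theta_hat, sum(D) / len(D))   # grid maximizer vs. sample mean
```

A grid search only scales to one or two parameters. It is shown here because it makes the "arg max" literal.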

The set of necessary conditions for an optimum is:

$$\nabla_\theta l = \sum_{k=1}^{n} \nabla_\theta \ln P(x_k \mid \theta) = 0$$

where n = number of training samples.
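
The necessary condition ∇θ l = 0 can also be reached iteratively. This minimal sketch assumes a univariate Gaussian with known σ and θ = μ. Each per-sample gradient is then ∂/∂θ ln P(xk | θ) = (xk - θ)/σ². Plain gradient ascent with a fixed step size (an illustrative choice, not from the slides) sums those per-sample gradients, exactly as in the condition above.

```python
def grad_log_likelihood(samples, theta, sigma):
    """grad_theta l = sum over k of grad_theta ln P(x_k | theta).
    For N(theta, sigma^2) with sigma known, each term is (x_k - theta) / sigma^2."""
    return sum((x - theta) / sigma ** 2 for x in samples)

def gradient_ascent(samples, sigma, theta0=0.0, step=0.05, tol=1e-10, max_iter=10_000):
    """Climb l(theta) until the gradient (nearly) vanishes.
    For this quadratic l, the iteration converges when step < 2 * sigma^2 / n."""
    theta = theta0
    for _ in range(max_iter):
        g = grad_log_likelihood(samples, theta, sigma)
        if abs(g) < tol:          # necessary condition for an optimum: grad l = 0
            break
        theta += step * g
    return theta

D = [1.2, 0.7, 2.1, 1.5]
print(gradient_ascent(D, sigma=1.0))   # converges to the sample mean of D
```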

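Multivariate Gaussian: unknown μ, known Σ

• Samples are drawn from a multivariate Gaussian population:
  P(xi | μ) ~ N(μ, Σ)
• θ = μ, therefore:

$$\ln P(x_k \mid \mu) = -\frac{1}{2}\ln\left[(2\pi)^d\,|\Sigma|\right] - \frac{1}{2}(x_k - \mu)^t\,\Sigma^{-1}(x_k - \mu)$$

  and

$$\nabla_\mu \ln P(x_k \mid \mu) = \Sigma^{-1}(x_k - \mu)$$

• The ML estimate for μ must satisfy:

$$\sum_{k=1}^{n} \Sigma^{-1}(x_k - \hat{\mu}) = 0$$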
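
The log-density and gradient formulas can be checked numerically. This minimal NumPy sketch uses arbitrary illustrative values for the covariance, mean and sample point. It evaluates ln P(xk | μ) and compares the analytic gradient Σ⁻¹(xk - μ) with central finite differences.

```python
import numpy as np

def log_gaussian(x, mu, Sigma):
    """ln P(x | mu) for N(mu, Sigma): -1/2 ln[(2 pi)^d |Sigma|] - 1/2 (x-mu)^t Sigma^{-1} (x-mu)."""
    d = len(mu)
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)          # log|Sigma|, numerically stable
    quad = diff @ np.linalg.solve(Sigma, diff)    # (x-mu)^t Sigma^{-1} (x-mu)
    return -0.5 * (d * np.log(2 * np.pi) + logdet) - 0.5 * quad

def grad_mu(x, mu, Sigma):
    """Analytic gradient with respect to mu: Sigma^{-1} (x - mu)."""
    return np.linalg.solve(Sigma, x - mu)

def numeric_grad_mu(x, mu, Sigma, h=1e-6):
    """Central finite differences, used only to check grad_mu."""
    g = np.zeros_like(mu)
    for i in range(len(mu)):
        e = np.zeros_like(mu)
        e[i] = h
        g[i] = (log_gaussian(x, mu + e, Sigma) - log_gaussian(x, mu - e, Sigma)) / (2 * h)
    return g

Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
mu = np.array([0.5, -1.0])
x = np.array([1.5, 0.2])
print(grad_mu(x, mu, Sigma), numeric_grad_mu(x, mu, Sigma))
```

Using `np.linalg.solve` instead of forming Σ⁻¹ explicitly is a common numerical choice. Both yield the same gradient here.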

• Multiplying by Σ and rearranging, we obtain:

$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k$$

  Just the arithmetic average of the training samples!
• Conclusion: If P(xk | ωj) (j = 1, 2, …, c) is assumed to be Gaussian in a d-dimensional feature space, then we can estimate the vector θ = (θ1, θ2, …, θc)^t and perform an optimal classification!
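
Here is a quick numerical check of this result. The sketch draws synthetic data from an assumed Gaussian; the seed, mean, covariance and sample size are arbitrary. With μ̂ set to the sample mean, the ML condition Σk Σ⁻¹(xk - μ̂) = 0 holds up to rounding error.

```python
import numpy as np

def ml_mean(X):
    """ML estimate of mu for Gaussian data with known covariance: (1/n) sum_k x_k.
    X holds one sample per row (shape n x d)."""
    return X.mean(axis=0)

rng = np.random.default_rng(0)
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])        # known covariance (illustrative)
X = rng.multivariate_normal([0.5, -1.0], Sigma, size=200)

mu_hat = ml_mean(X)
# Left-hand side of the ML condition: sum over k of Sigma^{-1} (x_k - mu_hat)
residual = np.linalg.solve(Sigma, (X - mu_hat).T).sum(axis=1)
print(mu_hat, residual)   # residual is zero up to floating-point error
```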

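Univariate Gaussian: unknown μ, unknown σ²

• Samples are drawn from a univariate Gaussian population:
  P(xi | μ, σ²) ~ N(μ, σ²), θ = (θ1, θ2) = (μ, σ²)

$$l = \ln P(x_k \mid \theta) = -\frac{1}{2}\ln(2\pi\theta_2) - \frac{1}{2\theta_2}(x_k - \theta_1)^2$$

$$\nabla_\theta l = \begin{pmatrix} \dfrac{\partial}{\partial\theta_1}\ln P(x_k \mid \theta) \\[2ex] \dfrac{\partial}{\partial\theta_2}\ln P(x_k \mid \theta) \end{pmatrix} = \begin{pmatrix} \dfrac{1}{\theta_2}(x_k - \theta_1) \\[2ex] -\dfrac{1}{2\theta_2} + \dfrac{(x_k - \theta_1)^2}{2\theta_2^2} \end{pmatrix}$$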

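Summation over the n samples: set ∇θ l = 0, multiply the second component by 2, and write θ̂1 = μ̂ and θ̂2 = σ̂². This gives:

$$\sum_{k=1}^{n} \frac{1}{\hat{\sigma}^2}(x_k - \hat{\mu}) = 0 \qquad (1)$$

$$-\sum_{k=1}^{n} \frac{1}{\hat{\sigma}^2} + \sum_{k=1}^{n} \frac{(x_k - \hat{\mu})^2}{\hat{\sigma}^4} = 0 \qquad (2)$$

Combining (1) and (2), one obtains:

$$\hat{\mu} = \frac{1}{n}\sum_{k=1}^{n} x_k\,; \qquad \hat{\sigma}^2 = \frac{1}{n}\sum_{k=1}^{n} (x_k - \hat{\mu})^2$$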
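
As a sketch with illustrative data D, the closed-form estimates can be computed directly and substituted back into conditions (1) and (2). Both left-hand sides evaluate to zero up to rounding.

```python
def ml_univariate(samples):
    """ML estimates (mu_hat, sigma2_hat) for N(mu, sigma^2) with both parameters unknown."""
    n = len(samples)
    mu_hat = sum(samples) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in samples) / n   # divides by n, not n - 1
    return mu_hat, sigma2_hat

D = [1.2, 0.7, 2.1, 1.5]
mu_hat, sigma2_hat = ml_univariate(D)

# Left-hand sides of conditions (1) and (2), evaluated at the estimates
cond1 = sum((x - mu_hat) / sigma2_hat for x in D)
cond2 = -sum(1 / sigma2_hat for _ in D) + sum((x - mu_hat) ** 2 / sigma2_hat ** 2 for x in D)
print(mu_hat, sigma2_hat, cond1, cond2)
```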

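Bias

• The Maximum-Likelihood estimate for σ² is biased:

$$E\left[\frac{1}{n}\sum_{i=1}^{n}(x_i - \bar{x})^2\right] = \frac{n-1}{n}\,\sigma^2 \neq \sigma^2$$

• An elementary unbiased estimator for Σ is the sample covariance matrix:

$$C = \frac{1}{n-1}\sum_{k=1}^{n}(x_k - \hat{\mu})(x_k - \hat{\mu})^t$$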
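
The bias can be seen in a small Monte Carlo experiment. This minimal NumPy sketch assumes data drawn from N(0, σ²) with σ² = 4 and n = 5; the trial count and seed are arbitrary. Averaged over many data sets, the ML variance approaches (n - 1)/n · σ² = 3.2, while the 1/(n - 1) version approaches σ² = 4. The last lines build C as defined above and compare it with `np.cov`, which also normalizes by n - 1 by default.

```python
import numpy as np

rng = np.random.default_rng(1)
n, sigma2, trials = 5, 4.0, 200_000

# `trials` independent data sets, each with n samples from N(0, sigma2)
X = rng.normal(0.0, np.sqrt(sigma2), size=(trials, n))
ml_var = X.var(axis=1, ddof=0)         # (1/n) sum_i (x_i - x_bar)^2: the ML estimate
unbiased_var = X.var(axis=1, ddof=1)   # 1/(n-1) normalization
print(ml_var.mean(), (n - 1) / n * sigma2)
print(unbiased_var.mean(), sigma2)

# Sample covariance matrix C = 1/(n-1) sum_k (x_k - mu_hat)(x_k - mu_hat)^t
Y = rng.normal(size=(50, 3))            # 50 samples in d = 3 dimensions
mu_hat = Y.mean(axis=0)
C = (Y - mu_hat).T @ (Y - mu_hat) / (len(Y) - 1)
print(np.allclose(C, np.cov(Y, rowvar=False)))
```

The bias factor (n - 1)/n matters most for small n. It tends to 1 as the sample grows.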

Appendix: Maximum-Likelihood Problem Statement

• Let D = {x1, x2, …, xn}

$$P(x_1, \ldots, x_n \mid \theta) = \prod_{k=1}^{n} P(x_k \mid \theta); \qquad |D| = n$$

• Our goal is to determine θ̂ (the value of θ that makes this sample the most representative!)

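[Figure: the sample set D (|D| = n, samples x1, x2, …, xn) partitioned into class subsets D1, …, Dk, …, Dc. The samples of each subset are drawn from that class's Gaussian density N(μj, Σj), shown as P(xj | ω1), …, P(xj | ωk)]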

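θ = (θ1, θ2, …, θc)

Problem: find θ̂ such that:

$$\max_{\theta} P(D \mid \theta) = \max_{\theta} P(x_1, \ldots, x_n \mid \theta) = \max_{\theta} \prod_{k=1}^{n} P(x_k \mid \theta)$$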
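
In practice, the maximization is usually split by class. This minimal NumPy sketch assumes labeled data and, as is standard in this setting, that samples in Dj carry no information about θi for i ≠ j. Each θj = (μj, Σj) is therefore estimated from its own subset. The function name `ml_per_class` and the toy data are hypothetical.

```python
import numpy as np

def ml_per_class(X, y):
    """ML estimates theta_j = (mu_j, Sigma_j) for each class j, using only its subset D_j.
    X: (n, d) array of samples; y: (n,) array of class labels.
    Returns a dict mapping each label to (mu_hat, Sigma_hat)."""
    params = {}
    for label in np.unique(y):
        Dj = X[y == label]                 # samples belonging to class j
        mu = Dj.mean(axis=0)               # ML mean
        diff = Dj - mu
        Sigma = diff.T @ diff / len(Dj)    # ML covariance (1/n_j normalization)
        params[label] = (mu, Sigma)
    return params

X = np.array([[0., 0.], [2., 0.], [0., 2.], [2., 2.],
              [10., 10.], [12., 10.], [10., 12.], [12., 12.]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
for label, (mu, Sigma) in ml_per_class(X, y).items():
    print(label, mu, Sigma.tolist())
```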

Sources of final-system classification error (sec 3.5.1)

• Bayes error: error due to overlapping densities for different classes (inherent error, never eliminated)
• Model error: error due to having an incorrect model
• Estimation error: error from estimating parameters from a finite sample

Problems of Dimensionality (sec 3.7)

Accuracy, Dimension, and Training Sample Size

• Classification accuracy depends upon the dimensionality and the amount of training data
• Case of two classes, multivariate normal with the same covariance:

$$P(\text{error}) = \frac{1}{\sqrt{2\pi}}\int_{r/2}^{\infty} e^{-u^2/2}\,du$$

  where

$$r^2 = (\mu_1 - \mu_2)^t\,\Sigma^{-1}(\mu_1 - \mu_2)$$

$$\lim_{r \to \infty} P(\text{error}) = 0$$
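
The error formula is a standard normal tail probability, so it can be evaluated with the complementary error function, using (1/√(2π))∫_z^∞ e^(-u²/2) du = ½ erfc(z/√2). The minimal sketch below assumes equal prior probabilities for the two classes, the setting in which this expression holds. The function name is hypothetical.

```python
import math
import numpy as np

def error_rate_equal_cov(mu1, mu2, Sigma):
    """P(error) for two equiprobable Gaussian classes sharing covariance Sigma.
    r is the Mahalanobis distance between the means; the result is the
    standard normal tail integral from r/2 to infinity."""
    diff = np.asarray(mu1, dtype=float) - np.asarray(mu2, dtype=float)
    r = math.sqrt(diff @ np.linalg.solve(Sigma, diff))
    return 0.5 * math.erfc((r / 2) / math.sqrt(2))

Sigma = np.eye(2)
for gap in [0.0, 1.0, 2.0, 4.0, 8.0]:
    # Moving the means apart increases r and drives P(error) toward 0
    print(gap, error_rate_equal_cov([0.0, 0.0], [gap, 0.0], Sigma))
```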

• If features are independent, then:

$$\Sigma = \mathrm{diag}(\sigma_1^2, \sigma_2^2, \ldots, \sigma_d^2), \qquad r^2 = \sum_{i=1}^{d}\left(\frac{\mu_{i1} - \mu_{i2}}{\sigma_i}\right)^2$$

• The most useful features are the ones for which the difference between the means is large relative to the standard deviation
• It appears that adding new features improves accuracy, since each new feature adds a nonnegative term to r², so P(error) cannot increase
• It has frequently been observed in practice that, beyond a certain point, the inclusion of additional features leads to worse rather than better performance: we have the wrong model!
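
Under the independence assumption, r² is a sum of one nonnegative term per feature, which is why adding features appears to help. Here is a minimal sketch; the means and standard deviations are made-up illustrative values, and the function name is hypothetical.

```python
def r_squared_independent(mu1, mu2, sigmas):
    """r^2 = sum_i ((mu_i1 - mu_i2) / sigma_i)^2 for Sigma = diag(sigma_1^2, ..., sigma_d^2)."""
    return sum(((a - b) / s) ** 2 for a, b, s in zip(mu1, mu2, sigmas))

mu1 = [0.0, 0.0, 0.0]
mu2 = [1.0, 0.5, 0.1]
sigmas = [1.0, 1.0, 1.0]
for d in range(1, 4):
    # Using only the first d features: r^2 never decreases as d grows
    print(d, r_squared_independent(mu1[:d], mu2[:d], sigmas[:d]))
```

This reflects only the model's prediction with the true parameters. With parameters estimated from finite samples, or with the wrong model, extra features can eventually hurt.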

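Computational Complexity

Maximum-Likelihood Estimation

• Gaussian class-conditional densities in d dimensions, with n training samples for each of c classes
• For each category, we have to compute the discriminant function:

$$g(x) = -\frac{1}{2}(x - \hat{\mu})^t\,\hat{\Sigma}^{-1}(x - \hat{\mu}) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\hat{\Sigma}| + \ln P(\omega)$$

  Costs of the terms: O(d·n) for computing μ̂, O(n·d²) for Σ̂ in the quadratic term, O(1) for (d/2) ln 2π, O(d²·n) for ln|Σ̂|, and O(n) for ln P(ω)
• Total = O(d²·n); total for c classes = O(c·d²·n) ≅ O(d²·n)
• The cost increases when d and n are large!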
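
Here is a minimal NumPy sketch of the per-class discriminant g(x). It assumes full-covariance ML estimates and a known prior P(ω); the helper name `make_discriminant` and the toy data are hypothetical. Fitting passes over the samples once for μ̂ (O(d·n)) and once more for Σ̂ (O(n·d²)). Inverting Σ̂ and taking its log-determinant adds a cost that does not depend on n.

```python
import numpy as np

def make_discriminant(X_class, prior):
    """Fit mu_hat and Sigma_hat for one class by ML and return
    g(x) = -1/2 (x - mu)^t Sigma^{-1} (x - mu) - d/2 ln 2pi - 1/2 ln|Sigma| + ln P(omega)."""
    n, d = X_class.shape
    mu = X_class.mean(axis=0)                     # O(d n)
    diff = X_class - mu
    Sigma = diff.T @ diff / n                     # O(n d^2)
    Sigma_inv = np.linalg.inv(Sigma)
    _, logdet = np.linalg.slogdet(Sigma)
    const = -0.5 * d * np.log(2 * np.pi) - 0.5 * logdet + np.log(prior)

    def g(x):
        v = np.asarray(x, dtype=float) - mu
        return -0.5 * v @ Sigma_inv @ v + const

    return g

X0 = np.array([[0., 0.], [2., 0.], [0., 2.], [2., 2.]])
X1 = X0 + 10.0
g0, g1 = make_discriminant(X0, 0.5), make_discriminant(X1, 0.5)
x = np.array([1.5, 0.5])
print("class", 0 if g0(x) > g1(x) else 1)   # prints: class 0
```

Once fitted, each evaluation of g(x) costs O(d²), independent of n.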

Overfitting

• The number of training samples n can be inadequate for estimating the parameters
• What to do?
  – Simplify the model: reduce the number of parameters
    – Assume all classes have the same covariance matrix
    – Assume statistical independence
  – Reduce the number of features d (Principal Component Analysis, etc.)
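
The benefit of simplifying the model can be quantified by counting free parameters. This minimal sketch assumes Gaussian class-conditional densities with c classes in d dimensions; the function name and the three model labels are hypothetical. A full covariance matrix has d(d + 1)/2 free entries. A shared covariance is estimated once for all classes, and statistical independence reduces each covariance to d variances.

```python
def gaussian_param_count(d, c, model="full"):
    """Free parameters in c Gaussian class-conditional densities in d dimensions."""
    cov_entries = d * (d + 1) // 2            # symmetric d x d covariance matrix
    if model == "full":                        # per-class mean and per-class covariance
        return c * (d + cov_entries)
    if model == "shared":                      # per-class means, one common covariance
        return c * d + cov_entries
    if model == "diagonal":                    # independent features: per-class means and variances
        return c * 2 * d
    raise ValueError(f"unknown model: {model}")

for model in ["full", "shared", "diagonal"]:
    print(model, gaussian_param_count(d=10, c=3, model=model))
```

Fewer parameters means each one is estimated from relatively more data. This is the motivation behind both remedies listed above.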
