Post on 21-Dec-2015
Maximum-Likelihood estimation
Consider as usual a random sample x = (x1, … , xn) from a distribution with p.d.f. f(x; θ) (and c.d.f. F(x; θ)).

The maximum likelihood point estimator of θ is the value of θ that maximizes L(θ; x) or, equivalently, maximizes l(θ; x) = ln L(θ; x).

Useful notation:

θ̂_ML = argmax_θ L(θ; x)

With a k-dimensional parameter θ = (θ1, … , θk):

θ̂_ML = argmax_θ L(θ; x)
Complete sample case:

If all sample values are explicitly known, then

θ̂_ML = argmax_θ ∏_{i=1}^{n} f(x_i; θ) = argmax_θ Σ_{i=1}^{n} ln f(x_i; θ)

Censored data case:

If some (say n_c) of the sample values are censored, e.g. only known as x_i < k1 or x_i > k2, then

θ̂_ML = argmax_θ ∏_{i=1}^{n−n_c} f(x_i; θ) · [Pr(X < k1)]^{n_lc} · [Pr(X > k2)]^{n_uc}

where

n_lc = number of values only known as being below k1 (left-censored)
n_uc = number of values only known as being above k2 (right-censored)
n_c = n_lc + n_uc
When the sample comes from a continuous distribution the censored data case can be written

θ̂_ML = argmax_θ ∏_{i=1}^{n−n_c} f(x_i; θ) · [F(k1; θ)]^{n_lc} · [1 − F(k2; θ)]^{n_uc}

In the case the distribution is discrete the use of F is also possible: if k1 and k2 are values that can be attained by the random variables, then we may write

θ̂_ML = argmax_θ ∏_{i=1}^{n−n_c} f(x_i; θ) · [F(k1′; θ)]^{n_lc} · [1 − F(k2; θ)]^{n_uc}

where

k1′ is not k1 itself but the attainable value closest to the left of k1 (so that F(k1′; θ) = Pr(X < k1)), while for the upper tail 1 − F(k2; θ) = Pr(X > k2) holds directly since k2 is attainable.
Example:

x1, … , xn random sample (R.S.) from the Rayleigh distribution with p.d.f.

f(x; α) = (x/α) · e^{−x²/(2α)} , x > 0, α > 0

l(α; x) = ln L(α; x) = Σ_{i=1}^{n} ln f(x_i; α) = Σ_{i=1}^{n} ln x_i − n·ln α − (1/(2α)) Σ_{i=1}^{n} x_i²

dl/dα = −n/α + (1/(2α²)) Σ_{i=1}^{n} x_i² = 0  ⟹  α = (1/(2n)) Σ_{i=1}^{n} x_i²  (case α = 0 excluded)

d²l/dα² = n/α² − (1/α³) Σ_{i=1}^{n} x_i² ; at α = (1/(2n)) Σ x_i² this equals n/α² − 2n/α² = −n/α² < 0

⟹ α̂_ML = (1/(2n)) Σ_{i=1}^{n} x_i² defines a (local) maximum.
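As a quick numerical check of this closed-form estimator, the sketch below (assuming the parameterization f(x; α) = (x/α)e^{−x²/(2α)} reconstructed above, with a made-up α and simulated data) draws a Rayleigh sample by inverse-CDF sampling and compares α̂ = Σx_i²/(2n) with the true value.

```python
import math
import random

random.seed(1)

alpha_true = 2.0  # hypothetical parameter of f(x; a) = (x/a)*exp(-x^2/(2a))
n = 100_000

# Inverse-CDF sampling: F(x; a) = 1 - exp(-x^2/(2a)), so X = sqrt(-2a * ln U)
xs = [math.sqrt(-2.0 * alpha_true * math.log(1.0 - random.random()))
      for _ in range(n)]

# Closed-form MLE derived above: alpha_hat = (1/(2n)) * sum(x_i^2)
alpha_hat = sum(x * x for x in xs) / (2 * n)
print(alpha_hat)  # close to alpha_true
```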
Example:

4, 3, 5, 3, "5" R.S. from the Poisson distribution with p.d.f. (mass function)

f(x; λ) = (λ^x / x!) · e^{−λ} , x = 0, 1, 2, …

NB! One of the sample values is right-censored: "5" means the value is only known to satisfy x ≥ 5.

λ̂_ML = argmax_λ ∏_{i=1}^{4} f(x_i; λ) · Pr(X ≥ 5)

= argmax_λ ( λ^{4+3+5+3} · e^{−4λ} / (4!·3!·5!·3!) ) · ( 1 − Σ_{y=0}^{4} (λ^y / y!) · e^{−λ} )
Solution must be numerically found
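A minimal numerical sketch of that maximization, using a plain grid search over λ (the observed values 4, 3, 5, 3 and the censoring point 5 are taken from the example; in practice a Newton-type routine would be preferred):

```python
import math

data = [4, 3, 5, 3]   # fully observed values
censor_at = 5         # one value only known to satisfy x >= 5

def log_lik(lam):
    # log of prod_i f(x_i; lam) * Pr(X >= 5)
    ll = sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)
    p_tail = 1.0 - sum(math.exp(-lam) * lam**y / math.factorial(y)
                       for y in range(censor_at))
    return ll + math.log(p_tail)

# Grid search over lambda in [0.5, 5.0] with step 0.01
lams = [0.5 + 0.01 * k for k in range(451)]
lam_hat = max(lams, key=log_lik)
print(round(lam_hat, 2))  # larger than the uncensored mean 15/4 = 3.75
```

Note that the censored observation pulls the estimate above the mean of the four fully observed values, since Pr(X ≥ 5) is increasing in λ.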
For the exponential family of distributions:

Use the canonical form (natural parameterization):

f(x; θ) = exp{ Σ_{j=1}^{k} θ_j·B_j(x) + C(x) + D(θ) }

Let

T_j = Σ_{i=1}^{n} B_j(X_i) , j = 1, … , k , and assume E(T_j) = τ_j(θ) , j = 1, … , k

Then the maximum likelihood estimators (MLEs) of θ1, … , θk are found by solving the system of equations

E(T_j) = t_j(x) , j = 1, … , k
Example:

x1, … , xn R.S. from the Poisson distribution with

f(x; λ) = (λ^x / x!) · e^{−λ} = exp{ x·ln λ − λ − ln x! } , i.e. the canonical form with θ = ln λ, B(x) = x, C(x) = −ln x!, D(θ) = −e^θ

T = Σ_{i=1}^{n} B(X_i) = Σ_{i=1}^{n} X_i

E(X) = Σ_{x=0}^{∞} x · (λ^x / x!) · e^{−λ} = λ·e^{−λ} · Σ_{x=1}^{∞} λ^{x−1}/(x−1)! = λ·e^{−λ}·e^{λ} = λ = e^θ

⟹ E(T) = n·e^θ

θ̂_ML is found by solving n·e^θ = Σ_{i=1}^{n} x_i , i.e.

θ̂_ML = ln( (1/n) Σ_{i=1}^{n} x_i ) = ln x̄  (and by invariance λ̂_ML = e^{θ̂_ML} = x̄)
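A small sketch of the recipe E(T) = t for this Poisson case, with hypothetical count data; solving n·e^θ = Σx_i gives θ̂ = ln x̄ directly:

```python
import math

xs = [2, 4, 1, 3, 0, 5, 2, 3]  # hypothetical Poisson counts

# Canonical form: f(x; theta) = exp(theta*x - e^theta - ln x!), theta = ln(lambda)
# T = sum(X_i), E(T) = n * e^theta, so solving E(T) = t gives:
t = sum(xs)
theta_hat = math.log(t / len(xs))  # = ln(x-bar)
lam_hat = math.exp(theta_hat)      # = x-bar, by invariance

print(lam_hat)  # 2.5
```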
Computational aspects
When the MLEs can be found by evaluating

∂l(θ; x)/∂θ = 0

numerical routines for solving the generic equation g(θ) = 0 can be used.

• Newton-Raphson method
• Fisher's method of scoring (makes use of the fact that under regularity conditions:

E( ∂²l(θ; X)/∂θ_i ∂θ_j ) = −E( (∂l(θ; X)/∂θ_i) · (∂l(θ; X)/∂θ_j) ) )

This is the multidimensional analogue of Lemma 2.1 (see page 17).
When the MLEs cannot be found the above way other numerical routines must be used:
• Simplex method
• EM-algorithm
For description of the numerical routines see textbook.
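As an illustration of the Newton-Raphson idea for the score equation g(θ) = 0, the sketch below (hypothetical data, reusing the Rayleigh log-likelihood from the earlier example so the root is known in closed form) iterates θ_{m+1} = θ_m − g(θ_m)/g′(θ_m):

```python
# Newton-Raphson on the score equation g(alpha) = dl/dalpha = 0 for the
# Rayleigh model, where the root is known: alpha_hat = sum(x_i^2)/(2n).
xs = [1.2, 0.8, 2.1, 1.5, 0.9, 1.7]  # hypothetical sample
n = len(xs)
s2 = sum(x * x for x in xs)

def g(a):        # score function dl/dalpha
    return -n / a + s2 / (2 * a * a)

def g_prime(a):  # second derivative d2l/dalpha2
    return n / (a * a) - s2 / a**3

a = 1.0  # starting value
for _ in range(50):
    a = a - g(a) / g_prime(a)

print(a, s2 / (2 * n))  # the iterate agrees with the closed-form root
```

Fisher's method of scoring would replace the observed −g′(α) by the expected information; otherwise the iteration is identical.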
Maximum likelihood estimation comes into natural use not for handling the standard case, i.e. a complete random sample from a distribution within the exponential family, but for finding estimators in more non-standard and complex situations.
Example:

x1, … , xn R.S. from U(a, b) with p.d.f.

f(x; a, b) = 1/(b − a) , a ≤ x ≤ b ; 0 otherwise

L(a, b; x) = (b − a)^{−n} if a ≤ x_i ≤ b for all i ; 0 otherwise

No local maxima or minima exist. L(a, b; x) is as largest when b = a (degenerate case), and with respect to the sample it is maximized when b − a is as small as possible.

⟹ Choose the largest possible approximation of a and the smallest possible approximation of b from the values in the sample:

â_ML = min(x1, … , xn) = x_(1) and b̂_ML = max(x1, … , xn) = x_(n)
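The resulting estimators are just the sample extremes; a two-line sketch with hypothetical values:

```python
xs = [2.3, 4.1, 3.7, 2.9, 4.8, 3.2]  # hypothetical sample from U(a, b)

# The likelihood (b - a)^(-n) on a <= min(x), b >= max(x) is maximized by
# shrinking the interval as far as the sample allows:
a_hat = min(xs)
b_hat = max(xs)
print(a_hat, b_hat)  # 2.3 4.8
```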
Properties of MLEs
Invariance:

If θ and φ represent two alternative parameterizations, and φ = g(θ) where g is a one-to-one function of θ, then if θ̂_ML is the MLE of θ, φ̂_ML = g(θ̂_ML) is the MLE of φ.

Consistency:

Under some weak regularity conditions all MLEs are consistent.

Efficiency:

Under the usual regularity conditions:

θ̂_ML is asymptotically distributed as N(θ, I(θ)^{−1}), where I(θ) is the Fisher information.

(Asymptotically efficient and normally distributed)

Sufficiency:

θ̂_ML (when unique) is a function of the minimal sufficient statistic for θ.
Example:

x1, … , xn R.S. from N(μ, σ²) with p.d.f.

f(x; μ, σ²) = (1/√(2πσ²)) · e^{−(x−μ)²/(2σ²)} = exp{ (μ/σ²)·x − (1/(2σ²))·x² − μ²/(2σ²) − (1/2)·ln(2πσ²) }

i.e. the canonical form with B1(x) = x and B2(x) = x²

⟹ T1 = Σ_{i=1}^{n} X_i and T2 = Σ_{i=1}^{n} X_i²

E(T1) = n·E(X) = nμ and E(T2) = n·E(X²) = n(σ² + μ²)

μ̂_ML and σ̂²_ML are thus obtained by solving

nμ = Σ_{i=1}^{n} x_i and n(σ² + μ²) = Σ_{i=1}^{n} x_i²

⟹ μ̂_ML = x̄ and σ̂²_ML = (1/n) Σ x_i² − x̄² = (1/n) Σ (x_i − x̄)²

((μ, σ²) has a one-to-one relationship with the natural parameters, as the system of relating equations has a unique solution.)

Invariance property: σ̂_ML = √(σ̂²_ML)

NB! E(σ̂²_ML) − σ² ≠ 0 (bias), but → 0 as n → ∞.

For the asymptotic distribution:

l(μ, σ²; x) = −(n/2)·ln(2πσ²) − (1/(2σ²)) Σ (x_i − μ)²

∂l/∂μ = (1/σ²) Σ (x_i − μ)
∂l/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ (x_i − μ)²

∂²l/∂μ² = −n/σ²
∂²l/∂μ∂σ² = −(1/σ⁴) Σ (x_i − μ)
∂²l/∂(σ²)² = n/(2σ⁴) − (1/σ⁶) Σ (x_i − μ)²

With E(X_i − μ) = 0 and E(X_i − μ)² = σ²:

E(∂²l/∂μ²) = −n/σ² ; E(∂²l/∂μ∂σ²) = 0 ; E(∂²l/∂(σ²)²) = n/(2σ⁴) − nσ²/σ⁶ = −n/(2σ⁴)

⟹ I(μ, σ²) = ( n/σ²  0 ; 0  n/(2σ⁴) ) and I(μ, σ²)^{−1} = ( σ²/n  0 ; 0  2σ⁴/n )

⟹ (μ̂_ML, σ̂²_ML) is asymptotically distributed as N2( (μ, σ²), ( σ²/n  0 ; 0  2σ⁴/n ) )
i.e. the two MLEs are asymptotically uncorrelated (and by the normal distribution independent)
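A simulation sketch of these asymptotic variances (arbitrary choices μ = 1, σ² = 4, n = 200, 2000 replicates): the empirical variances of μ̂ and σ̂² should come out close to σ²/n = 0.02 and 2σ⁴/n = 0.16.

```python
import math
import random

random.seed(2)
mu, sigma2 = 1.0, 4.0   # hypothetical true parameters
n, reps = 200, 2000

mu_hats, s2_hats = [], []
for _ in range(reps):
    xs = [random.gauss(mu, math.sqrt(sigma2)) for _ in range(n)]
    m = sum(xs) / n
    mu_hats.append(m)                                   # MLE of mu
    s2_hats.append(sum((x - m) ** 2 for x in xs) / n)   # MLE of sigma^2

def var(v):
    mean = sum(v) / len(v)
    return sum((t - mean) ** 2 for t in v) / len(v)

print(var(mu_hats), sigma2 / n)           # approx 0.02
print(var(s2_hats), 2 * sigma2**2 / n)    # approx 0.16
```

The empirical correlation between the two estimator series is likewise close to zero, in line with the asymptotic independence noted above.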
Modifications and extensions
Ancillarity and conditional sufficiency:
With θ = (θ1, θ2) and T = (T1, T2) a minimal sufficient statistic for θ = (θ1, θ2):

If

a) f(t1; t2, θ1, θ2) (the conditional density of T1 given T2 = t2) depends on θ1 but not on θ2, and
b) f(t2; θ1, θ2) depends on θ2 but not on θ1,

then T2 is said to be an ancillary statistic for θ1 and T1 is said to be conditionally independent of θ2.
Profile likelihood:
With θ = (θ1, θ2) and L(θ1, θ2; x): if θ̂2.1 is the MLE of θ2 for a given value of θ1, then L(θ1, θ̂2.1; x) is called the profile likelihood for θ1.

This concept has its main use in cases where θ1 contains the parameters of "interest" and θ2 contains nuisance parameters.

The same ML point estimator for θ1 is obtained by maximizing the profile likelihood as by maximizing the full likelihood function.
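A small sketch for the N(μ, σ²) case (hypothetical data), with σ² as the nuisance parameter: for fixed μ its MLE is σ̂²(μ) = (1/n)Σ(x_i − μ)², so the profile log-likelihood is −(n/2)·ln σ̂²(μ) up to constants, and its maximizer coincides with the full MLE x̄.

```python
import math

xs = [1.1, 2.3, 0.7, 1.9, 1.4, 2.0]  # hypothetical sample
n = len(xs)

def profile_loglik(mu):
    # sigma^2 is profiled out: its MLE for fixed mu is mean((x - mu)^2)
    s2_mu = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * math.log(s2_mu)  # additive constants dropped

grid = [k / 1000 for k in range(500, 3000)]
mu_hat = max(grid, key=profile_loglik)
print(mu_hat, sum(xs) / n)  # profile maximizer matches x-bar (up to grid step)
```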
Marginal and conditional likelihood:
L(θ1, θ2; x) is equivalent to the joint p.d.f. of the sample x, i.e. f(x; θ1, θ2).

If (u, v) is a partitioning of x, then f(x; θ1, θ2) = f(u, v; θ1, θ2).

Now, if f(u, v; θ1, θ2) can be factorized as f(u; θ1) · f(v; u, θ1, θ2), and f(v; u, θ1, θ2) does not depend on θ1, then inferences about θ1 can be based solely on f(u; θ1), the marginal likelihood for θ1.

If f(u, v; θ1, θ2) can be factorized as f(u; v, θ1) · f(v; θ1, θ2), then inferences about θ1 can be based solely on f(u; v, θ1), the conditional likelihood for θ1.

Again, these concepts have their main use in cases where θ1 contains the parameters of "interest" and θ2 contains nuisance parameters.
Penalized likelihood:
MLEs can be derived subject to some criterion of smoothness.

In particular this is applicable when the parameter is no longer a single value (one- or multidimensional), but a function such as an unknown density function or a regression curve.

The penalized log-likelihood function is written

l_P(θ; x) = l(θ; x) − λ·R(θ)

where R is the penalty function and λ is a fixed parameter controlling the influence of R.

λ is thus not estimated by maximizing l_P(θ; x), but can be estimated by so-called cross-validation techniques (see ch. 9).

Note that x is here not the usual random sample, but can be a set of independent but non-identically distributed values.
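An illustrative sketch (an assumed setup, not from the text): observations x_i ~ N(θ_i, 1) with a squared-difference roughness penalty R(θ) = Σ(θ_{i+1} − θ_i)², maximizing l_P by gradient ascent; the fitted sequence comes out smoother than the raw data.

```python
# Assumed model: x_i ~ N(theta_i, 1), so l(theta) = -0.5 * sum (x_i - theta_i)^2
# plus constants; penalty R(theta) = sum (theta_{i+1} - theta_i)^2.
xs = [0.0, 0.2, 1.8, 2.1, 1.9, 0.1, 0.3, 2.2, 2.0, 0.2]  # hypothetical data
lam = 2.0        # fixed smoothing parameter (would be chosen by CV)
theta = list(xs)
n = len(xs)

for _ in range(2000):
    grad = []
    for i in range(n):
        g = xs[i] - theta[i]  # gradient of the log-likelihood term
        if i > 0:
            g += 2 * lam * (theta[i - 1] - theta[i])  # penalty gradient
        if i < n - 1:
            g += 2 * lam * (theta[i + 1] - theta[i])
        grad.append(g)
    theta = [t + 0.05 * g for t, g in zip(theta, grad)]

rough_raw = sum((xs[i + 1] - xs[i]) ** 2 for i in range(n - 1))
rough_fit = sum((theta[i + 1] - theta[i]) ** 2 for i in range(n - 1))
print(rough_fit < rough_raw)  # the penalty produces a smoother fit
```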
Method of moments estimation (MM)

For a random variable X:

The r-th moment about the origin is μ_r′ = E(X^r)
The r-th moment about the mean (r-th central moment) is μ_r = E[(X − E(X))^r]

For a random sample x1, … , xn:

The r-th sample moment about the origin is m_r′ = (1/n) Σ_{i=1}^{n} x_i^r
The r-th sample moment about the mean is m_r = (1/n) Σ_{i=1}^{n} (x_i − x̄)^r

The method of moments point estimator of θ = (θ1, … , θk) is obtained by solving for θ1, … , θk the system of equations

m_r′ = μ_r′(θ) , r = 1, … , k
or m_r = μ_r(θ) , r = 1, … , k
or a mixture between these two.
Example:

x1, … , xn R.S. from U(a, b).

First moment about the origin: μ1′ = E(X) = ∫_a^b x/(b − a) dx = (a + b)/2

Second central moment: μ2 = Var(X) = E(X²) − [E(X)]² = (a² + ab + b²)/3 − (a + b)²/4 = (b − a)²/12

Solve for a, b the system of equations

x̄ = (a + b)/2
m2 = (b − a)²/12 , where m2 = (1/n) Σ_{i=1}^{n} (x_i − x̄)²

⟹ b − a = ±√(12·m2) = ±2√(3·m2) (the negative root is not possible, as b > a)

⟹ â_MM = x̄ − √(3·m2) and b̂_MM = x̄ + √(3·m2)
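A quick numerical check of â_MM = x̄ − √(3·m2), b̂_MM = x̄ + √(3·m2) on simulated U(2, 5) data (the endpoints 2 and 5 are arbitrary choices):

```python
import random

random.seed(3)
a_true, b_true = 2.0, 5.0  # hypothetical true endpoints
n = 50_000
xs = [random.uniform(a_true, b_true) for _ in range(n)]

xbar = sum(xs) / n
m2 = sum((x - xbar) ** 2 for x in xs) / n  # second sample central moment

# Solving xbar = (a+b)/2 and m2 = (b-a)^2/12:
half_width = (3 * m2) ** 0.5
a_mm = xbar - half_width
b_mm = xbar + half_width
print(a_mm, b_mm)  # close to (2, 5)
```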
Method of Least Squares (LS)
First principles:
Assume a sample x where the random variable X_i can be written

X_i = m(θ) + ε_i

where m(θ) is the mean value (a function involving θ) and ε_i is a random variable with zero mean and constant variance σ².

The least-squares estimator of θ is the value of θ that minimizes Σ_{i=1}^{n} (x_i − m(θ))², i.e.

θ̂_LS = argmin_θ Σ_{i=1}^{n} (x_i − m(θ))²
A more general approach:
Assume the sample can be written (x, z), where x_i represents the random variable of interest (endogenous variable) and z_i represents either an auxiliary random variable (exogenous) or a given constant for sample point i. Then

X_i = m(z_i; θ) + ε_i

where E(ε_i) = 0, Var(ε_i) = σ_i² and Cov(ε_i, ε_j) = c_{ij}, with σ_i², c_{ij} possibly functions of z_i, z_j.

The least squares estimator of θ is then

θ̂_LS = argmin_θ (x − m(z; θ))ᵀ W^{−1} (x − m(z; θ))

where m(z; θ) = (m(z1; θ), … , m(zn; θ))ᵀ and W is the variance-covariance matrix of ε = (ε1, … , εn)ᵀ.
Special cases:
The ordinary linear regression model:

X_i = β0 + β1·z_{i,1} + … + βp·z_{i,p} + ε_i , i = 1, … , n , with W = σ²·I and Z considered to be a constant matrix. Then

β̂_LS = argmin_β Σ_{i=1}^{n} (x_i − β0 − β1·z_{i,1} − … − βp·z_{i,p})² = argmin_β (x − Zβ)ᵀ(x − Zβ) = (ZᵀZ)^{−1} Zᵀ x

The heteroscedastic regression model:

W ≠ σ²·I , and

β̂_LS = (Zᵀ W^{−1} Z)^{−1} Zᵀ W^{−1} x
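A minimal sketch of the ordinary-least-squares special case with one explanatory variable (hypothetical data): the normal equations (ZᵀZ)β = Zᵀx are solved with the 2×2 inverse written out.

```python
# OLS for x_i = b0 + b1*z_i + e_i via the normal equations (Z^T Z) beta = Z^T x,
# where Z has columns (1, z_i); the 2x2 system is inverted explicitly.
zs = [0.0, 1.0, 2.0, 3.0, 4.0]
xs = [1.1, 2.9, 5.2, 6.8, 9.1]  # hypothetical responses

n = len(zs)
sz = sum(zs)
szz = sum(z * z for z in zs)
sx = sum(xs)
szx = sum(z * x for z, x in zip(zs, xs))

det = n * szz - sz * sz
b0 = (szz * sx - sz * szx) / det
b1 = (n * szx - sz * sx) / det
print(b0, b1)  # intercept and slope
```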
The first-order auto-regressive model:
X_t = φ·X_{t−1} + ε_t , t = 2, … , n , with W = σ²·I

Let x* = (x2, … , xn) and z = (x1, x2, … , x_{n−1}), i.e. for the first sample point (the first time-point) z is not available.

The conditional least-squares estimator of φ (given x1) is

φ̂_CLS = argmin_φ (x* − φ·z)ᵀ(x* − φ·z) = argmin_φ Σ_{t=2}^{n} (x_t − φ·x_{t−1})² = Σ_{t=2}^{n} x_t·x_{t−1} / Σ_{t=2}^{n} x_{t−1}²

where ε = (ε2, … , εn)ᵀ has mean 0, the (n − 1)-dimensional vector of zeros.
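A simulation sketch of the conditional least-squares estimator for an AR(1) series (arbitrary φ = 0.6, standard normal errors): φ̂_CLS = Σ x_t·x_{t−1} / Σ x_{t−1}².

```python
import random

random.seed(4)
phi_true = 0.6  # hypothetical autoregression coefficient
n = 20_000

# Simulate an AR(1) series x_t = phi * x_{t-1} + e_t
x = [0.0]
for _ in range(n - 1):
    x.append(phi_true * x[-1] + random.gauss(0.0, 1.0))

# Conditional least squares given x_1: minimize sum (x_t - phi*x_{t-1})^2
num = sum(x[t] * x[t - 1] for t in range(1, n))
den = sum(x[t - 1] ** 2 for t in range(1, n))
phi_hat = num / den
print(phi_hat)  # close to 0.6
```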