Maximum-Likelihood estimation
Consider as usual a random sample $\mathbf{x} = x_1, \dots, x_n$ from a distribution with p.d.f. $f(x;\theta)$ (and c.d.f. $F(x;\theta)$).

The maximum-likelihood point estimator of $\theta$ is the value of $\theta$ that maximizes $L(\theta;\mathbf{x})$, or equivalently maximizes $l(\theta;\mathbf{x}) = \ln L(\theta;\mathbf{x})$.

Useful notation:

$\hat\theta_{ML} = \arg\max_{\theta} L(\theta;\mathbf{x})$

With a k-dimensional parameter:

$\hat{\boldsymbol\theta}_{ML} = \arg\max_{\boldsymbol\theta} L(\boldsymbol\theta;\mathbf{x})$
Complete sample case:

If all sample values are explicitly known, then

$\hat\theta_{ML} = \arg\max_{\theta} \prod_{i=1}^{n} f(x_i;\theta) = \arg\max_{\theta} \sum_{i=1}^{n} \ln f(x_i;\theta)$

Censored data case:

If some (say $n_c$) of the sample values are censored, e.g. known only as $x_i < k_1$ or $x_i > k_2$, then

$\hat\theta_{ML} = \arg\max_{\theta} \prod_{i=1}^{n-n_c} f(x_i;\theta)\cdot\left[\Pr(X < k_1)\right]^{n_{c,l}}\cdot\left[\Pr(X > k_2)\right]^{n_{c,u}}$

where

$n_{c,l}$ = number of values known only as being below $k_1$ (left-censored)
$n_{c,u}$ = number of values known only as being above $k_2$ (right-censored)
$n_c = n_{c,l} + n_{c,u}$
When the sample comes from a continuous distribution, the censored data case can be written

$\hat\theta_{ML} = \arg\max_{\theta} \prod_{i=1}^{n-n_c} f(x_i;\theta)\cdot\left[F(k_1;\theta)\right]^{n_{c,l}}\cdot\left[1 - F(k_2;\theta)\right]^{n_{c,u}}$

In the case the distribution is discrete, the use of $F$ is also possible: if $k_1$ and $k_2$ are values that can be attained by the random variable, then we may write

$\hat\theta_{ML} = \arg\max_{\theta} \prod_{i=1}^{n-n_c} f(x_i;\theta)\cdot\left[F(k_1^{*};\theta)\right]^{n_{c,l}}\cdot\left[1 - F(k_2^{*};\theta)\right]^{n_{c,u}}$

where

$k_1^{*}$ is a value $\ne k_1$, but the attainable value closest to the left of $k_1$
$k_2^{*}$ is a value $\ne k_2$, but the attainable value closest to the right of $k_2$
Example:

$x_1, \dots, x_n$ random sample (R.S.) from the Rayleigh distribution with p.d.f.

$f(x;\theta) = \dfrac{x}{\theta}\,e^{-x^2/(2\theta)}, \quad x > 0,\ \theta > 0$

$l(\theta;\mathbf{x}) = \ln\prod_{i=1}^{n} \frac{x_i}{\theta}e^{-x_i^2/(2\theta)} = \sum_{i=1}^{n}\ln x_i - n\ln\theta - \frac{1}{2\theta}\sum_{i=1}^{n} x_i^2$

$\dfrac{dl}{d\theta} = -\dfrac{n}{\theta} + \dfrac{1}{2\theta^2}\sum_{i=1}^{n} x_i^2 = 0 \;\Leftrightarrow\; \theta = \dfrac{1}{2n}\sum_{i=1}^{n} x_i^2 \quad (\text{case } \theta = 0 \text{ excluded})$

$\dfrac{d^2l}{d\theta^2} = \dfrac{n}{\theta^2} - \dfrac{1}{\theta^3}\sum_{i=1}^{n} x_i^2$, which at $\theta = \frac{1}{2n}\sum x_i^2$ equals $-\dfrac{n}{\theta^2} < 0$, i.e. the solution defines a (local) maximum

$\Rightarrow\ \hat\theta_{ML} = \dfrac{1}{2n}\sum_{i=1}^{n} x_i^2$
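The closed-form estimator above can be checked numerically. A minimal sketch, assuming the parameterization $f(x;\theta) = (x/\theta)e^{-x^2/(2\theta)}$ of the example; the inverse-transform simulation of Rayleigh data is an addition of mine, not part of the slides:

```python
import math
import random

def rayleigh_loglik(theta, xs):
    # l(theta; x) = sum(ln x_i) - n*ln(theta) - sum(x_i^2) / (2*theta)
    n = len(xs)
    return (sum(math.log(x) for x in xs)
            - n * math.log(theta)
            - sum(x * x for x in xs) / (2 * theta))

random.seed(1)
theta_true = 4.0
# inverse-transform sampling: F(x) = 1 - exp(-x^2 / (2*theta))
xs = [math.sqrt(-2 * theta_true * math.log(1 - random.random()))
      for _ in range(10_000)]

# closed-form MLE from the example: (1 / 2n) * sum(x_i^2)
theta_hat = sum(x * x for x in xs) / (2 * len(xs))

# the closed form should dominate nearby parameter values
assert rayleigh_loglik(theta_hat, xs) >= rayleigh_loglik(1.05 * theta_hat, xs)
assert rayleigh_loglik(theta_hat, xs) >= rayleigh_loglik(0.95 * theta_hat, xs)
```

With 10,000 simulated values, the estimate lands close to the true value 4.0.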
Example:

4, 3, 5, 3, "5" is a R.S. from the Poisson distribution with p.d.f. (mass function)

$f(x;\lambda) = \dfrac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots$

Note! One of the sample values is right-censored: "5" is known only as $x \ge 5$.

$\hat\lambda_{ML} = \arg\max_{\lambda} \prod_{i=1}^{4} f(x_i;\lambda)\cdot\Pr(X \ge 5;\lambda) = \arg\max_{\lambda}\left[\sum_{i=1}^{4}\left(x_i\ln\lambda - \lambda - \ln x_i!\right) + \ln\left(1 - \sum_{y=0}^{4}\frac{\lambda^y e^{-\lambda}}{y!}\right)\right]$

The solution must be found numerically.
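The numerical maximization can be sketched in a few lines. This uses the four complete observations 4, 3, 5, 3 plus one value right-censored at 5; the golden-section search is my choice of routine here, not something prescribed by the slides:

```python
import math

obs = [4, 3, 5, 3]   # fully observed counts
CENSOR_AT = 5        # one further value is known only as x >= 5

def loglik(lam):
    # complete-data terms: x*ln(lam) - lam - ln(x!)
    ll = sum(x * math.log(lam) - lam - math.log(math.factorial(x))
             for x in obs)
    # right-censored contribution: ln Pr(X >= 5) = ln(1 - Pr(X <= 4))
    p_below = sum(math.exp(-lam) * lam**y / math.factorial(y)
                  for y in range(CENSOR_AT))
    return ll + math.log(1.0 - p_below)

# golden-section search for the maximizer on a bracketing interval
lo, hi = 0.1, 20.0
g = (math.sqrt(5.0) - 1.0) / 2.0
for _ in range(100):
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    if loglik(a) < loglik(b):
        lo = a
    else:
        hi = b
lam_hat = (lo + hi) / 2.0
```

The censored term pulls the estimate above the mean of the four complete observations (15/4 = 3.75), since the fifth value is known to be at least 5.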
For the exponential family of distributions:

Use the canonical form (natural parameterization):

$f(x;\boldsymbol\theta) = \exp\left\{\sum_{j=1}^{k} \theta_j B_j(x) + C(x) + D(\boldsymbol\theta)\right\}$

Let

$T_j = \sum_{i=1}^{n} B_j(X_i), \quad j = 1, \dots, k, \quad$ and assume the observed values $\ t_j = \sum_{i=1}^{n} B_j(x_i), \quad j = 1, \dots, k$

Then the maximum likelihood estimators (MLEs) of $\theta_1, \dots, \theta_k$ are found by solving the system of equations

$E_{\boldsymbol\theta}\left[T_j(\mathbf{X})\right] = t_j(\mathbf{x}), \quad j = 1, \dots, k$
Example:

$x_1, \dots, x_n$ R.S. from the Poisson distribution

$f(x;\lambda) = \dfrac{\lambda^x e^{-\lambda}}{x!} = \exp\{x\ln\lambda - \lambda - \ln x!\}$, i.e. $\theta = \ln\lambda,\ B(x) = x,\ C(x) = -\ln x!,\ D(\theta) = -e^{\theta}$

$T = \sum_{i=1}^{n} B(X_i) = \sum_{i=1}^{n} X_i$

$E(X) = \sum_{x=0}^{\infty} x\,\frac{\lambda^x e^{-\lambda}}{x!} = \lambda e^{-\lambda}\sum_{x=1}^{\infty}\frac{\lambda^{x-1}}{(x-1)!} = \lambda e^{-\lambda}\sum_{y=0}^{\infty}\frac{\lambda^{y}}{y!} = \lambda e^{-\lambda}e^{\lambda} = \lambda = e^{\theta}$

$\Rightarrow\ E(T) = n\,e^{\theta}$, and the MLE is found by solving

$n\,e^{\theta} = \sum_{i=1}^{n} x_i \;\Leftrightarrow\; \hat\theta_{ML} = \ln\bar x \;\left(\Leftrightarrow\; \hat\lambda_{ML} = \bar x\right)$
Computational aspects

When the MLEs can be found by solving

$\dfrac{\partial}{\partial\boldsymbol\theta}\, l(\boldsymbol\theta;\mathbf{x}) = \mathbf{0}$

numerical routines for solving the generic equation g(θ) = 0 can be used:

• Newton-Raphson method
• Fisher's method of scoring (makes use of the fact that under regularity conditions:

$E\left[\dfrac{\partial^2}{\partial\theta_i\,\partial\theta_j}\, l(\boldsymbol\theta;\mathbf{x})\right] = -\,E\left[\dfrac{\partial l(\boldsymbol\theta;\mathbf{x})}{\partial\theta_i}\cdot\dfrac{\partial l(\boldsymbol\theta;\mathbf{x})}{\partial\theta_j}\right]$

This is the multidimensional analogue of Lemma 2.1, see page 17.)

When the MLEs cannot be found the above way, other numerical routines must be used:

• Simplex method
• EM-algorithm

For a description of these numerical routines, see the textbook.
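As a sketch of the first bullet, here is Newton-Raphson applied to a score equation with a known answer. The exponential distribution is my choice of test case (it is not in the slides): there $l'(\lambda) = n/\lambda - \sum x_i$, so the iteration can be checked against the closed-form MLE $1/\bar x$:

```python
def newton_raphson(score, hess, theta0, tol=1e-10, max_iter=100):
    """Solve score(theta) = 0 by the update theta <- theta - score/hess."""
    theta = theta0
    for _ in range(max_iter):
        step = score(theta) / hess(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Exp(lambda), f(x; lam) = lam * exp(-lam * x):
# l'(lam) = n/lam - sum(x),  l''(lam) = -n/lam^2
xs = [0.5, 1.2, 0.3, 2.0, 0.9]
n, s = len(xs), sum(xs)
lam_hat = newton_raphson(score=lambda t: n / t - s,
                         hess=lambda t: -n / t**2,
                         theta0=1.0)
```

The iteration converges in a handful of steps to n/Σx, the reciprocal of the sample mean.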
Maximum-likelihood estimation comes into natural use not for handling the standard case, i.e. a complete random sample from a distribution within the exponential family, but for finding estimators in more non-standard and complex situations.
Example:

$x_1, \dots, x_n$ R.S. from $U(a, b)$

$f(x;a,b) = \begin{cases}\dfrac{1}{b-a}, & a \le x \le b\\[4pt] 0, & \text{otherwise}\end{cases}$

$L(a,b;\mathbf{x}) = \begin{cases}\dfrac{1}{(b-a)^n}, & a \le x_{(1)} \le x_{(n)} \le b\\[4pt] 0, & \text{otherwise}\end{cases}$

No local maxima or minima exist. $L(a,b;\mathbf{x})$ is maximized with respect to the sample when $b - a$ is as small as possible ($L$ is at its largest when $b = a$, a degenerate case).

Choose the largest possible value of $a$ and the smallest possible value of $b$ consistent with the values in the sample:

$\hat a_{ML} = x_{(1)} = \min_i x_i \quad\text{and}\quad \hat b_{ML} = x_{(n)} = \max_i x_i$
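The behaviour of this likelihood is easy to verify directly; a small sketch with hypothetical data:

```python
xs = [2.3, 4.1, 3.7, 2.9, 4.8]

a_hat, b_hat = min(xs), max(xs)   # MLEs for U(a, b)

def likelihood(a, b, xs):
    # (b - a)^(-n) if all observations lie in [a, b], else 0
    if a <= min(xs) and max(xs) <= b:
        return (b - a) ** (-len(xs))
    return 0.0

# widening the interval beyond [min, max] can only lower the likelihood
assert likelihood(a_hat, b_hat, xs) > likelihood(a_hat - 0.5, b_hat + 0.5, xs)
# shrinking it below the sample range makes the likelihood zero
assert likelihood(a_hat + 0.1, b_hat, xs) == 0.0
```

Since the likelihood jumps to zero at the sample extremes, there is no stationary point: the optimum sits on the boundary, which is why differentiation cannot be used here.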
Properties of MLEs

Invariance:

If $\theta$ and $\varphi$ represent two alternative parameterizations and $g$ is a one-to-one function: $\varphi = g(\theta)$, then if $\hat\theta_{ML}$ is the MLE of $\theta$, $\hat\varphi_{ML} = g(\hat\theta_{ML})$ is the MLE of $\varphi$.

Consistency:

Under some weak regularity conditions all MLEs are consistent.

Efficiency:

Under the usual regularity conditions:

$\hat{\boldsymbol\theta}_{ML}$ is asymptotically distributed as $N\!\left(\boldsymbol\theta,\ I(\boldsymbol\theta)^{-1}\right)$ as $n \to \infty$

(asymptotically efficient and normally distributed)

Sufficiency:

$\hat{\boldsymbol\theta}_{ML}$, if unique, is a function of the minimal sufficient statistic for $\boldsymbol\theta$.
Example:

$x_1, \dots, x_n$ R.S. from $N(\mu, \sigma^2)$

$f(x;\mu,\sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}} = e^{-\frac{\mu^2}{2\sigma^2}-\frac12\ln(2\pi\sigma^2)}\cdot e^{\frac{\mu}{\sigma^2}x \,-\, \frac{1}{2\sigma^2}x^2}$

i.e. canonical parameters $\theta_1 = \mu/\sigma^2$ and $\theta_2 = -1/(2\sigma^2)$, with $B_1(x) = x$ and $B_2(x) = x^2$, so that

$T_1 = \sum_{i=1}^{n} X_i \quad\text{and}\quad T_2 = \sum_{i=1}^{n} X_i^2$

$E(T_1) = n\,E(X) = n\mu \quad\text{and}\quad E(T_2) = n\,E(X^2) = n(\sigma^2 + \mu^2)$

$\hat\mu_{ML}$ and $\hat\sigma^2_{ML}$ are obtained by solving

$n\hat\mu = \sum_{i=1}^{n} x_i \quad\text{and}\quad n(\hat\sigma^2 + \hat\mu^2) = \sum_{i=1}^{n} x_i^2$

($(\mu, \sigma^2)$ has a one-to-one relationship with $(\theta_1, \theta_2)$, as the system of relating equations has a unique solution):

$\hat\mu_{ML} = \bar x \quad\text{and}\quad \hat\sigma^2_{ML} = \frac1n\sum_{i=1}^{n} x_i^2 - \bar x^2 = \frac1n\sum_{i=1}^{n}(x_i - \bar x)^2$

Invariance property: $\hat\sigma_{ML} = \sqrt{\hat\sigma^2_{ML}}$

Note! $E\left(\hat\sigma^2_{ML}\right) - \sigma^2 = -\dfrac{\sigma^2}{n} \ne 0$ (bias), but $\to 0$ as $n \to \infty$.

The log-likelihood is

$l(\mu,\sigma^2;\mathbf{x}) = -\frac n2\ln(2\pi) - \frac n2\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$

with partial derivatives

$\dfrac{\partial l}{\partial\mu} = \dfrac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu), \qquad \dfrac{\partial l}{\partial\sigma^2} = -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2$

$\dfrac{\partial^2 l}{\partial\mu^2} = -\dfrac{n}{\sigma^2}, \qquad \dfrac{\partial^2 l}{\partial\mu\,\partial\sigma^2} = -\dfrac{1}{\sigma^4}\sum_{i=1}^{n}(x_i-\mu), \qquad \dfrac{\partial^2 l}{\partial(\sigma^2)^2} = \dfrac{n}{2\sigma^4} - \dfrac{1}{\sigma^6}\sum_{i=1}^{n}(x_i-\mu)^2$

Taking expectations (with $E(X_i - \mu) = 0$ and $E\left[(X_i-\mu)^2\right] = \sigma^2$) gives the information matrix

$I(\mu,\sigma^2) = -E\begin{pmatrix}\partial^2 l/\partial\mu^2 & \partial^2 l/\partial\mu\,\partial\sigma^2\\ \partial^2 l/\partial\sigma^2\,\partial\mu & \partial^2 l/\partial(\sigma^2)^2\end{pmatrix} = \begin{pmatrix} n/\sigma^2 & 0\\ 0 & n/(2\sigma^4)\end{pmatrix}$

$\Rightarrow\ I(\mu,\sigma^2)^{-1} = \begin{pmatrix}\sigma^2/n & 0\\ 0 & 2\sigma^4/n\end{pmatrix}$

Hence $\left(\hat\mu_{ML},\, \hat\sigma^2_{ML}\right)$ is asymptotically distributed as

$N\!\left(\begin{pmatrix}\mu\\ \sigma^2\end{pmatrix},\ \begin{pmatrix}\sigma^2/n & 0\\ 0 & 2\sigma^4/n\end{pmatrix}\right)$
i.e. the two MLEs are asymptotically uncorrelated (and by the normal distribution independent)
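A quick simulation sketch of these results, with hypothetical true values; the plug-in asymptotic variances come from the inverse Fisher information of the normal model:

```python
import math
import random

random.seed(2)
mu, sigma2, n = 10.0, 4.0, 5000
xs = [random.gauss(mu, math.sqrt(sigma2)) for _ in range(n)]

mu_hat = sum(xs) / n                                  # MLE of mu
sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n   # MLE of sigma^2 (bias -sigma^2/n)

# plug-in asymptotic variances from the inverse information matrix
var_mu_hat = sigma2_hat / n               # sigma^2 / n
var_sigma2_hat = 2 * sigma2_hat**2 / n    # 2 * sigma^4 / n
```

With n = 5000 both estimates land close to the true values, and the estimated asymptotic standard errors (roughly 0.028 for μ̂ and 0.08 for σ̂²) quantify the remaining spread.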
Modifications and extensions
Ancillarity and conditional sufficiency:
With $\boldsymbol\theta = (\theta_1, \theta_2)$ and $(T_1, T_2)$ a minimal sufficient statistic for $\boldsymbol\theta$, suppose

a) the marginal distribution $f_{T_2}(t_2;\boldsymbol\theta)$ depends on $\theta_2$ but not on $\theta_1$

b) the conditional distribution $f_{T_1\mid T_2 = t_2}(t_1;\boldsymbol\theta)$ depends on $\theta_1$ but not on $\theta_2$

Then $T_2$ is said to be an ancillary statistic for $\theta_1$, and $T_1$ is said to be conditionally sufficient for $\theta_1$.
Profile likelihood:

With $\boldsymbol\theta = (\theta_1, \theta_2)$ and $L(\theta_1, \theta_2;\mathbf{x})$: if $\hat\theta_{2.1}$ is the MLE of $\theta_2$ for a given value of $\theta_1$, then

$L_P(\theta_1) = L\left(\theta_1, \hat\theta_{2.1};\mathbf{x}\right)$

is called the profile likelihood for $\theta_1$.

This concept has its main use in cases where $\theta_1$ contains the parameters of "interest" and $\theta_2$ contains nuisance parameters.

The same ML point estimator for $\theta_1$ is obtained by maximizing the profile likelihood as by maximizing the full likelihood function.
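For the normal model the profile likelihood for the mean can be written down explicitly: for fixed μ, the MLE of σ² is $\frac1n\sum(x_i-\mu)^2$, and plugging it back gives $l_P(\mu) = -\frac n2\left(\ln(2\pi\,s^2(\mu)) + 1\right)$. A small sketch with hypothetical data, checking that the profile maximizer is the full MLE $\bar x$:

```python
import math

xs = [3.1, 2.4, 3.8, 2.9, 3.3, 2.7]
n = len(xs)

def profile_loglik(mu):
    # for fixed mu, the MLE of sigma^2 is s2(mu) = mean((x - mu)^2);
    # substituting it back yields the profile log-likelihood for mu
    s2 = sum((x - mu) ** 2 for x in xs) / n
    return -n / 2 * (math.log(2 * math.pi * s2) + 1)

xbar = sum(xs) / n
# the profile likelihood is maximized at the full MLE, x-bar
assert profile_loglik(xbar) >= profile_loglik(xbar + 0.1)
assert profile_loglik(xbar) >= profile_loglik(xbar - 0.1)
```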
Marginal and conditional likelihood:

$L(\boldsymbol\theta;\mathbf{x})$ is equivalent to the joint p.d.f. of the sample: if $(\mathbf{u}, \mathbf{v})$ is a partitioning of $\mathbf{x}$, then $L(\theta_1, \theta_2;\mathbf{x}) = f(\mathbf{u}, \mathbf{v};\theta_1, \theta_2)$.

If $f(\mathbf{u}, \mathbf{v};\theta_1, \theta_2)$ can be factorized as $f_1(\mathbf{u};\theta_1)\cdot f_2(\mathbf{v};\mathbf{u}, \theta_1, \theta_2)$, then inferences about $\theta_1$ can be based solely on $f_1(\mathbf{u};\theta_1)$, the marginal likelihood for $\theta_1$.

Now, if $f(\mathbf{u}, \mathbf{v};\theta_1, \theta_2)$ can be factorized as $f_1(\mathbf{u};\mathbf{v}, \theta_1)\cdot f_2(\mathbf{v};\theta_1, \theta_2)$, and $f_2(\mathbf{v};\theta_1, \theta_2)$ does not depend on $\theta_1$, then inferences about $\theta_1$ can be based solely on $f_1(\mathbf{u};\mathbf{v}, \theta_1)$, the conditional likelihood for $\theta_1$.
Again, these concepts have their main use in cases where $\theta_1$ contains the parameters of "interest" and $\theta_2$ contains nuisance parameters.
Penalized likelihood:

MLEs can be derived subject to some criteria of smoothness. In particular this is applicable when the parameter is no longer a single value (one- or multidimensional), but a function such as an unknown density function or a regression curve.

The penalized log-likelihood function is written

$l_P(\boldsymbol\theta;\mathbf{x}) = l(\boldsymbol\theta;\mathbf{x}) - \lambda R(\boldsymbol\theta)$

where $R$ is the penalty function and $\lambda$ is a fixed parameter controlling the influence of $R$.

$\lambda$ is thus not estimated by minimizing $l_P(\boldsymbol\theta;\mathbf{x})$, but can be estimated by so-called cross-validation techniques (see ch. 9).

Note that $\mathbf{x}$ is not the usual random sample here, but can be a set of independent but non-identically distributed values.
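The idea can be sketched in a deliberately simple setting: a normal mean θ with the toy quadratic penalty R(θ) = θ², where the penalized maximizer has a closed form (the sample mean shrunk towards zero). This penalty and the data are my illustration, not the smoothness functional of the slides:

```python
xs = [1.8, 2.2, 2.6, 1.9, 2.5]
n = len(xs)
sigma2 = 1.0   # variance treated as known, for simplicity
lam = 3.0      # fixed penalty weight (would be chosen by cross-validation)

def penalized_loglik(theta):
    # l_P(theta) = l(theta) - lam * R(theta), here with R(theta) = theta^2
    ll = -sum((x - theta) ** 2 for x in xs) / (2 * sigma2)
    return ll - lam * theta ** 2

# closed-form maximizer: sum(x) / (n + 2 * lam * sigma2),
# i.e. the sample mean shrunk towards zero by the penalty
theta_pen = sum(xs) / (n + 2 * lam * sigma2)

assert penalized_loglik(theta_pen) >= penalized_loglik(theta_pen + 0.05)
assert penalized_loglik(theta_pen) >= penalized_loglik(theta_pen - 0.05)
```

Setting lam = 0 recovers the unpenalized MLE (the sample mean); larger lam pulls the estimate further towards zero.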
Method of moments estimation (MM)

For a random variable $X$:
The $r$th (population) moment about the origin is $\mu_r' = E(X^r)$
The $r$th (population) moment about the mean ($r$th central moment) is $\mu_r = E\left[(X - \mu)^r\right]$

For a random sample $x_1, \dots, x_n$:
The $r$th sample moment about the origin is $m_r' = \dfrac1n\sum_{i=1}^{n} x_i^r$
The $r$th sample moment about the mean is $m_r = \dfrac1n\sum_{i=1}^{n}(x_i - \bar x)^r$

The method of moments point estimator of $\boldsymbol\theta = (\theta_1, \dots, \theta_k)$ is obtained by solving for $\theta_1, \dots, \theta_k$ the system of equations

$m_r' = \mu_r'(\boldsymbol\theta), \quad r = 1, \dots, k$

or

$m_r = \mu_r(\boldsymbol\theta), \quad r = 1, \dots, k$

or a mixture of these two.
Example:

$x_1, \dots, x_n$ R.S. from $U(a, b)$.

First moment about the origin:

$\mu_1' = E(X) = \int_a^b \frac{x}{b-a}\,dx = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}$

Second central moment:

$\mu_2 = E\left[\left(X - \tfrac{a+b}{2}\right)^2\right] = \int_a^b \frac{\left(x - \frac{a+b}{2}\right)^2}{b-a}\,dx = \frac{(b-a)^2}{12}$

Solve for $a$ and $b$ the system of equations

$\bar x = \frac{\hat a + \hat b}{2} \quad\text{and}\quad m_2 = \frac{(\hat b - \hat a)^2}{12}, \quad\text{where } m_2 = \frac1n\sum_{i=1}^{n}(x_i - \bar x)^2$

$\Rightarrow\ \hat b - \hat a = \sqrt{12\,m_2} = 2\sqrt{3\,m_2}$ (the root $-2\sqrt{3\,m_2}$ is not possible, as $a < b$)

$\Rightarrow\ \hat a_{MM} = \bar x - \sqrt{3\,m_2} \quad\text{and}\quad \hat b_{MM} = \bar x + \sqrt{3\,m_2}$
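The two MM estimators can be computed directly; a sketch on simulated data with hypothetical true endpoints:

```python
import math
import random

random.seed(3)
a_true, b_true = 2.0, 8.0
xs = [random.uniform(a_true, b_true) for _ in range(20_000)]

n = len(xs)
xbar = sum(xs) / n
m2 = sum((x - xbar) ** 2 for x in xs) / n   # second central sample moment

# solve xbar = (a + b)/2 and m2 = (b - a)^2 / 12 (positive root, since a < b)
a_mm = xbar - math.sqrt(3 * m2)
b_mm = xbar + math.sqrt(3 * m2)
```

With 20,000 observations the moment equations pin the endpoints down close to the true values 2 and 8.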
Method of Least Squares (LS)

First principles:

Assume a sample $\mathbf{x}$ where the random variable $X_i$ can be written

$X_i = m_i(\theta) + \varepsilon_i$

where $m_i$ is the mean value (function) involving $\theta$, and $\varepsilon_i$ is a random variable with zero mean and constant variance $\sigma^2$.

The least-squares estimator of $\theta$ is the value of $\theta$ that minimizes $\sum_{i=1}^{n}\left(x_i - m_i(\theta)\right)^2$, i.e.

$\hat\theta_{LS} = \arg\min_{\theta} \sum_{i=1}^{n}\left(x_i - m_i(\theta)\right)^2$

A more general approach:

Assume the sample can be written $(\mathbf{x}, \mathbf{z})$, where $x_i$ represents the random variable of interest (endogenous variable) and $z_i$ represents either an auxiliary random variable (exogenous) or a given constant for sample point $i$:

$X_i = m(z_i;\boldsymbol\theta) + \varepsilon_i$

where $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma_i^2$ and $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = c_{ij}$, with $\sigma_i^2$ and $c_{ij}$ possibly functions of $z_i$, $z_j$.

The least-squares estimator of $\boldsymbol\theta$ is then

$\hat{\boldsymbol\theta}_{LS} = \arg\min_{\boldsymbol\theta}\ \left(\mathbf{x} - \mathbf{m}(\boldsymbol\theta)\right)^T W^{-1}\left(\mathbf{x} - \mathbf{m}(\boldsymbol\theta)\right)$

where $\mathbf{m}(\boldsymbol\theta) = \left(m(z_1;\boldsymbol\theta), \dots, m(z_n;\boldsymbol\theta)\right)^T$ and $W$ is the variance-covariance matrix of $\boldsymbol\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)^T$.
Special cases:

The ordinary linear regression model:

$X_i = \beta_0 + \beta_1 z_{i,1} + \dots + \beta_p z_{i,p} + \varepsilon_i, \quad i = 1, \dots, n$

with $W = \sigma^2 I$ and $Z$ (the matrix with rows $(1, z_{i,1}, \dots, z_{i,p})$) considered to be a constant matrix:

$\hat{\boldsymbol\beta}_{LS} = \arg\min_{\boldsymbol\beta} \sum_{i=1}^{n}\left(x_i - \left(\beta_0 + \beta_1 z_{i,1} + \dots + \beta_p z_{i,p}\right)\right)^2 = \left(Z^T Z\right)^{-1} Z^T\mathbf{x}$

The heteroscedastic regression model:

$W \ne \sigma^2 I$, and

$\hat{\boldsymbol\beta}_{LS} = \left(Z^T W^{-1} Z\right)^{-1} Z^T W^{-1}\mathbf{x}$
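Both special cases can be sketched with one function: the weighted normal equations $(Z^T W^{-1} Z)\,\beta = Z^T W^{-1}\mathbf{x}$ for a single regressor with intercept, solved by hand as a 2×2 system. Equal weights reduce it to ordinary least squares; the data values are hypothetical:

```python
def wls_fit(z, x, w):
    """Weighted least squares for x_i = b0 + b1*z_i + eps_i,
    with W = diag(w) the error covariance matrix; solves the 2x2
    normal equations (Z' W^-1 Z) beta = Z' W^-1 x directly."""
    inv_w = [1.0 / wi for wi in w]
    s0 = sum(inv_w)                                          # sum 1/w_i
    s1 = sum(iw * zi for iw, zi in zip(inv_w, z))            # sum z_i/w_i
    s2 = sum(iw * zi * zi for iw, zi in zip(inv_w, z))       # sum z_i^2/w_i
    t0 = sum(iw * xi for iw, xi in zip(inv_w, x))            # sum x_i/w_i
    t1 = sum(iw * zi * xi for iw, zi, xi in zip(inv_w, z, x))
    det = s0 * s2 - s1 * s1
    b0 = (s2 * t0 - s1 * t1) / det
    b1 = (s0 * t1 - s1 * t0) / det
    return b0, b1

z = [0.0, 1.0, 2.0, 3.0, 4.0]
x = [1.0, 3.1, 4.9, 7.2, 8.8]          # roughly x = 1 + 2z plus noise
b0, b1 = wls_fit(z, x, w=[1.0] * 5)    # equal weights -> ordinary LS
```

Passing unequal weights (larger `w` for noisier observations) gives the heteroscedastic estimator with the same code path.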
The first-order auto-regressive model:

$X_t = \theta z_t + \varepsilon_t$, where $z_t = x_{t-1},\quad t = 2, \dots, n$

Let $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and $\mathbf{z} = (*, x_1, \dots, x_{n-1})$, i.e. $z$ is not available for the first sample point (first time-point), and $W = \sigma^2 I$ (of dimension $n-1$).

The conditional least-squares estimator of $\theta$ (given $\mathbf{z}$) is

$\hat\theta_{CLS} = \arg\min_{\theta} \sum_{t=2}^{n}\left(x_t - \theta x_{t-1}\right)^2 = \dfrac{\sum_{t=2}^{n} x_t x_{t-1}}{\sum_{t=2}^{n} x_{t-1}^2}$
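A conditional-least-squares sketch on a simulated AR(1) series; the true θ and the unit noise scale are hypothetical choices for the illustration:

```python
import random

# simulate an AR(1) series x_t = theta * x_{t-1} + eps_t
random.seed(4)
theta_true = 0.6
x = [0.0]
for _ in range(5000):
    x.append(theta_true * x[-1] + random.gauss(0.0, 1.0))

# conditional least squares: regress x_t on x_{t-1} for t = 2, ..., n;
# the first time-point has no predecessor and is conditioned on
num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
theta_cls = num / den
```

With 5000 time-points the ratio of sums recovers θ to within a few hundredths.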