Maximum-Likelihood estimation
Consider as usual a random sample $\mathbf{x} = x_1, \dots, x_n$ from a distribution with p.d.f. $f(x;\theta)$ (and c.d.f. $F(x;\theta)$).

The maximum-likelihood point estimator of $\theta$ is the value of $\theta$ that maximizes $L(\theta;\mathbf{x})$, or equivalently maximizes $l(\theta;\mathbf{x}) = \ln L(\theta;\mathbf{x})$.

Useful notation:

$\hat\theta_{ML} = \arg\max_{\theta} L(\theta;\mathbf{x})$

With a k-dimensional parameter:

$\hat{\boldsymbol\theta}_{ML} = \arg\max_{\boldsymbol\theta} L(\boldsymbol\theta;\mathbf{x})$
Complete sample case:

If all sample values are explicitly known, then

$\hat\theta_{ML} = \arg\max_{\theta} \prod_{i=1}^{n} f(x_i;\theta) = \arg\max_{\theta} \sum_{i=1}^{n} \ln f(x_i;\theta)$

Censored data case:

If some (say $n_c$) of the sample values are censored, e.g. known only as $x_i < k_1$ or $x_i > k_2$, then

$\hat\theta_{ML} = \arg\max_{\theta} \prod_{i=1}^{n-n_c} f(x_i;\theta)\cdot\left[\Pr(X < k_1)\right]^{n_{c,l}}\cdot\left[\Pr(X > k_2)\right]^{n_{c,u}}$

where

$n_{c,l}$ = number of values known only as being below $k_1$ (left-censored)
$n_{c,u}$ = number of values known only as being above $k_2$ (right-censored)
$n_c = n_{c,l} + n_{c,u}$
When the sample comes from a continuous distribution, the censored data case can be written

$\hat\theta_{ML} = \arg\max_{\theta} \prod_{i=1}^{n-n_c} f(x_i;\theta)\cdot\left[F(k_1;\theta)\right]^{n_{c,l}}\cdot\left[1 - F(k_2;\theta)\right]^{n_{c,u}}$

In the case the distribution is discrete, the use of $F$ is also possible: if $k_1$ and $k_2$ are values that can be attained by the random variable, then we may write

$\hat\theta_{ML} = \arg\max_{\theta} \prod_{i=1}^{n-n_c} f(x_i;\theta)\cdot\left[F(k_1^{*};\theta)\right]^{n_{c,l}}\cdot\left[1 - F(k_2^{*};\theta)\right]^{n_{c,u}}$

where

$k_1^{*}$ is a value $\ne k_1$, but the attainable value closest to the left of $k_1$
$k_2^{*}$ is a value $\ne k_2$, but the attainable value closest to the right of $k_2$
Example:

$x_1, \dots, x_n$ random sample (R.S.) from the Rayleigh distribution with p.d.f.

$f(x;\theta) = \dfrac{x}{\theta}\,e^{-x^2/(2\theta)}, \quad x > 0,\ \theta > 0$

$l(\theta;\mathbf{x}) = \ln\prod_{i=1}^{n} \frac{x_i}{\theta}e^{-x_i^2/(2\theta)} = \sum_{i=1}^{n}\ln x_i - n\ln\theta - \frac{1}{2\theta}\sum_{i=1}^{n} x_i^2$

$\dfrac{dl}{d\theta} = -\dfrac{n}{\theta} + \dfrac{1}{2\theta^2}\sum_{i=1}^{n} x_i^2 = 0 \;\Leftrightarrow\; \theta = \dfrac{1}{2n}\sum_{i=1}^{n} x_i^2 \quad (\text{case } \theta = 0 \text{ excluded})$

$\dfrac{d^2l}{d\theta^2} = \dfrac{n}{\theta^2} - \dfrac{1}{\theta^3}\sum_{i=1}^{n} x_i^2$, which at $\theta = \frac{1}{2n}\sum x_i^2$ equals $-\dfrac{n}{\theta^2} < 0$, i.e. the solution defines a (local) maximum

$\Rightarrow\ \hat\theta_{ML} = \dfrac{1}{2n}\sum_{i=1}^{n} x_i^2$
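The closed-form estimator above can be checked numerically. A minimal sketch, assuming the parameterization $f(x;\theta) = (x/\theta)e^{-x^2/(2\theta)}$ of the example; the inverse-transform simulation of Rayleigh data is an addition of mine, not part of the slides:

```python
import math
import random

def rayleigh_loglik(theta, xs):
    # l(theta; x) = sum(ln x_i) - n*ln(theta) - sum(x_i^2) / (2*theta)
    n = len(xs)
    return (sum(math.log(x) for x in xs)
            - n * math.log(theta)
            - sum(x * x for x in xs) / (2 * theta))

random.seed(1)
theta_true = 4.0
# inverse-transform sampling: F(x) = 1 - exp(-x^2 / (2*theta))
xs = [math.sqrt(-2 * theta_true * math.log(1 - random.random()))
      for _ in range(10_000)]

# closed-form MLE from the example: (1 / 2n) * sum(x_i^2)
theta_hat = sum(x * x for x in xs) / (2 * len(xs))

# the closed form should dominate nearby parameter values
assert rayleigh_loglik(theta_hat, xs) >= rayleigh_loglik(1.05 * theta_hat, xs)
assert rayleigh_loglik(theta_hat, xs) >= rayleigh_loglik(0.95 * theta_hat, xs)
```

With 10,000 simulated values, the estimate lands close to the true value 4.0.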
Example:

4, 3, 5, 3, "5" is a R.S. from the Poisson distribution with p.d.f. (mass function)

$f(x;\lambda) = \dfrac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots$

Note! One of the sample values is right-censored: "5" is known only as $x \ge 5$.

$\hat\lambda_{ML} = \arg\max_{\lambda} \prod_{i=1}^{4} f(x_i;\lambda)\cdot\Pr(X \ge 5;\lambda) = \arg\max_{\lambda}\left[\sum_{i=1}^{4}\left(x_i\ln\lambda - \lambda - \ln x_i!\right) + \ln\left(1 - \sum_{y=0}^{4}\frac{\lambda^y e^{-\lambda}}{y!}\right)\right]$

The solution must be found numerically.
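The numerical maximization can be sketched in a few lines. This uses the four complete observations 4, 3, 5, 3 plus one value right-censored at 5; the golden-section search is my choice of routine here, not something prescribed by the slides:

```python
import math

obs = [4, 3, 5, 3]   # fully observed counts
CENSOR_AT = 5        # one further value is known only as x >= 5

def loglik(lam):
    # complete-data terms: x*ln(lam) - lam - ln(x!)
    ll = sum(x * math.log(lam) - lam - math.log(math.factorial(x))
             for x in obs)
    # right-censored contribution: ln Pr(X >= 5) = ln(1 - Pr(X <= 4))
    p_below = sum(math.exp(-lam) * lam**y / math.factorial(y)
                  for y in range(CENSOR_AT))
    return ll + math.log(1.0 - p_below)

# golden-section search for the maximizer on a bracketing interval
lo, hi = 0.1, 20.0
g = (math.sqrt(5.0) - 1.0) / 2.0
for _ in range(100):
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    if loglik(a) < loglik(b):
        lo = a
    else:
        hi = b
lam_hat = (lo + hi) / 2.0
```

The censored term pulls the estimate above the mean of the four complete observations (15/4 = 3.75), since the fifth value is known to be at least 5.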
For the exponential family of distributions:

Use the canonical form (natural parameterization):

$f(x;\boldsymbol\theta) = \exp\left\{\sum_{j=1}^{k} \theta_j B_j(x) + C(x) + D(\boldsymbol\theta)\right\}$

Let

$T_j = \sum_{i=1}^{n} B_j(X_i), \quad j = 1, \dots, k, \quad$ and assume the observed values $\ t_j = \sum_{i=1}^{n} B_j(x_i), \quad j = 1, \dots, k$

Then the maximum likelihood estimators (MLEs) of $\theta_1, \dots, \theta_k$ are found by solving the system of equations

$E_{\boldsymbol\theta}\left[T_j(\mathbf{X})\right] = t_j(\mathbf{x}), \quad j = 1, \dots, k$
Example:

$x_1, \dots, x_n$ R.S. from the Poisson distribution

$f(x;\lambda) = \dfrac{\lambda^x e^{-\lambda}}{x!} = \exp\{x\ln\lambda - \lambda - \ln x!\}$, i.e. $\theta = \ln\lambda,\ B(x) = x,\ C(x) = -\ln x!,\ D(\theta) = -e^{\theta}$

$T = \sum_{i=1}^{n} B(X_i) = \sum_{i=1}^{n} X_i$

$E(X) = \sum_{x=0}^{\infty} x\,\frac{\lambda^x e^{-\lambda}}{x!} = \lambda e^{-\lambda}\sum_{x=1}^{\infty}\frac{\lambda^{x-1}}{(x-1)!} = \lambda e^{-\lambda}\sum_{y=0}^{\infty}\frac{\lambda^{y}}{y!} = \lambda e^{-\lambda}e^{\lambda} = \lambda = e^{\theta}$

$\Rightarrow\ E(T) = n\,e^{\theta}$, and the MLE is found by solving

$n\,e^{\theta} = \sum_{i=1}^{n} x_i \;\Leftrightarrow\; \hat\theta_{ML} = \ln\bar x \;\left(\Leftrightarrow\; \hat\lambda_{ML} = \bar x\right)$
Computational aspects

When the MLEs can be found by solving

$\dfrac{\partial}{\partial\boldsymbol\theta}\, l(\boldsymbol\theta;\mathbf{x}) = \mathbf{0}$

numerical routines for solving the generic equation g(θ) = 0 can be used:

• Newton-Raphson method
• Fisher's method of scoring (makes use of the fact that under regularity conditions:

$E\left[\dfrac{\partial^2}{\partial\theta_i\,\partial\theta_j}\, l(\boldsymbol\theta;\mathbf{x})\right] = -\,E\left[\dfrac{\partial l(\boldsymbol\theta;\mathbf{x})}{\partial\theta_i}\cdot\dfrac{\partial l(\boldsymbol\theta;\mathbf{x})}{\partial\theta_j}\right]$

This is the multidimensional analogue of Lemma 2.1, see page 17.)

When the MLEs cannot be found the above way, other numerical routines must be used:

• Simplex method
• EM-algorithm

For a description of these numerical routines, see the textbook.
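As a sketch of the first bullet, here is Newton-Raphson applied to a score equation with a known answer. The exponential distribution is my choice of test case (it is not in the slides): there $l'(\lambda) = n/\lambda - \sum x_i$, so the iteration can be checked against the closed-form MLE $1/\bar x$:

```python
def newton_raphson(score, hess, theta0, tol=1e-10, max_iter=100):
    """Solve score(theta) = 0 by the update theta <- theta - score/hess."""
    theta = theta0
    for _ in range(max_iter):
        step = score(theta) / hess(theta)
        theta -= step
        if abs(step) < tol:
            break
    return theta

# Exp(lambda), f(x; lam) = lam * exp(-lam * x):
# l'(lam) = n/lam - sum(x),  l''(lam) = -n/lam^2
xs = [0.5, 1.2, 0.3, 2.0, 0.9]
n, s = len(xs), sum(xs)
lam_hat = newton_raphson(score=lambda t: n / t - s,
                         hess=lambda t: -n / t**2,
                         theta0=1.0)
```

The iteration converges in a handful of steps to n/Σx, the reciprocal of the sample mean.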
Maximum-likelihood estimation comes into natural use not for handling the standard case, i.e. a complete random sample from a distribution within the exponential family, but for finding estimators in more non-standard and complex situations.
Example:

$x_1, \dots, x_n$ R.S. from $U(a, b)$

$f(x;a,b) = \begin{cases}\dfrac{1}{b-a}, & a \le x \le b\\[4pt] 0, & \text{otherwise}\end{cases}$

$L(a,b;\mathbf{x}) = \begin{cases}\dfrac{1}{(b-a)^n}, & a \le x_{(1)} \le x_{(n)} \le b\\[4pt] 0, & \text{otherwise}\end{cases}$

No local maxima or minima exist. $L(a,b;\mathbf{x})$ is maximized with respect to the sample when $b - a$ is as small as possible ($L$ is at its largest when $b = a$, a degenerate case).

Choose the largest possible value of $a$ and the smallest possible value of $b$ consistent with the values in the sample:

$\hat a_{ML} = x_{(1)} = \min_i x_i \quad\text{and}\quad \hat b_{ML} = x_{(n)} = \max_i x_i$
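The behaviour of this likelihood is easy to verify directly; a small sketch with hypothetical data:

```python
xs = [2.3, 4.1, 3.7, 2.9, 4.8]

a_hat, b_hat = min(xs), max(xs)   # MLEs for U(a, b)

def likelihood(a, b, xs):
    # (b - a)^(-n) if all observations lie in [a, b], else 0
    if a <= min(xs) and max(xs) <= b:
        return (b - a) ** (-len(xs))
    return 0.0

# widening the interval beyond [min, max] can only lower the likelihood
assert likelihood(a_hat, b_hat, xs) > likelihood(a_hat - 0.5, b_hat + 0.5, xs)
# shrinking it below the sample range makes the likelihood zero
assert likelihood(a_hat + 0.1, b_hat, xs) == 0.0
```

Since the likelihood jumps to zero at the sample extremes, there is no stationary point: the optimum sits on the boundary, which is why differentiation cannot be used here.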
Properties of MLEs

Invariance:

If $\theta$ and $\varphi$ represent two alternative parameterizations and $g$ is a one-to-one function: $\varphi = g(\theta)$, then if $\hat\theta_{ML}$ is the MLE of $\theta$, $\hat\varphi_{ML} = g(\hat\theta_{ML})$ is the MLE of $\varphi$.

Consistency:

Under some weak regularity conditions all MLEs are consistent.

Efficiency:

Under the usual regularity conditions:

$\hat{\boldsymbol\theta}_{ML}$ is asymptotically distributed as $N\!\left(\boldsymbol\theta,\ I(\boldsymbol\theta)^{-1}\right)$ as $n \to \infty$

(asymptotically efficient and normally distributed)

Sufficiency:

$\hat{\boldsymbol\theta}_{ML}$, if unique, is a function of the minimal sufficient statistic for $\boldsymbol\theta$.
Example:

$x_1, \dots, x_n$ R.S. from $N(\mu, \sigma^2)$

$f(x;\mu,\sigma^2) = \dfrac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{(x-\mu)^2}{2\sigma^2}} = e^{-\frac{\mu^2}{2\sigma^2}-\frac12\ln(2\pi\sigma^2)}\cdot e^{\frac{\mu}{\sigma^2}x \,-\, \frac{1}{2\sigma^2}x^2}$

i.e. canonical parameters $\theta_1 = \mu/\sigma^2$ and $\theta_2 = -1/(2\sigma^2)$, with $B_1(x) = x$ and $B_2(x) = x^2$, so that

$T_1 = \sum_{i=1}^{n} X_i \quad\text{and}\quad T_2 = \sum_{i=1}^{n} X_i^2$

$E(T_1) = n\,E(X) = n\mu \quad\text{and}\quad E(T_2) = n\,E(X^2) = n(\sigma^2 + \mu^2)$

$\hat\mu_{ML}$ and $\hat\sigma^2_{ML}$ are obtained by solving

$n\hat\mu = \sum_{i=1}^{n} x_i \quad\text{and}\quad n(\hat\sigma^2 + \hat\mu^2) = \sum_{i=1}^{n} x_i^2$

($(\mu, \sigma^2)$ has a one-to-one relationship with $(\theta_1, \theta_2)$, as the system of relating equations has a unique solution):

$\hat\mu_{ML} = \bar x \quad\text{and}\quad \hat\sigma^2_{ML} = \frac1n\sum_{i=1}^{n} x_i^2 - \bar x^2 = \frac1n\sum_{i=1}^{n}(x_i - \bar x)^2$

Invariance property: $\hat\sigma_{ML} = \sqrt{\hat\sigma^2_{ML}}$

Note! $E\left(\hat\sigma^2_{ML}\right) - \sigma^2 = -\dfrac{\sigma^2}{n} \ne 0$ (bias), but $\to 0$ as $n \to \infty$.

The log-likelihood is

$l(\mu,\sigma^2;\mathbf{x}) = -\frac n2\ln(2\pi) - \frac n2\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2$

with partial derivatives

$\dfrac{\partial l}{\partial\mu} = \dfrac{1}{\sigma^2}\sum_{i=1}^{n}(x_i-\mu), \qquad \dfrac{\partial l}{\partial\sigma^2} = -\dfrac{n}{2\sigma^2} + \dfrac{1}{2\sigma^4}\sum_{i=1}^{n}(x_i-\mu)^2$

$\dfrac{\partial^2 l}{\partial\mu^2} = -\dfrac{n}{\sigma^2}, \qquad \dfrac{\partial^2 l}{\partial\mu\,\partial\sigma^2} = -\dfrac{1}{\sigma^4}\sum_{i=1}^{n}(x_i-\mu), \qquad \dfrac{\partial^2 l}{\partial(\sigma^2)^2} = \dfrac{n}{2\sigma^4} - \dfrac{1}{\sigma^6}\sum_{i=1}^{n}(x_i-\mu)^2$

Taking expectations (with $E(X_i - \mu) = 0$ and $E\left[(X_i-\mu)^2\right] = \sigma^2$) gives the information matrix

$I(\mu,\sigma^2) = -E\begin{pmatrix}\partial^2 l/\partial\mu^2 & \partial^2 l/\partial\mu\,\partial\sigma^2\\ \partial^2 l/\partial\sigma^2\,\partial\mu & \partial^2 l/\partial(\sigma^2)^2\end{pmatrix} = \begin{pmatrix} n/\sigma^2 & 0\\ 0 & n/(2\sigma^4)\end{pmatrix}$

$\Rightarrow\ I(\mu,\sigma^2)^{-1} = \begin{pmatrix}\sigma^2/n & 0\\ 0 & 2\sigma^4/n\end{pmatrix}$

Hence $\left(\hat\mu_{ML},\, \hat\sigma^2_{ML}\right)$ is asymptotically distributed as

$N\!\left(\begin{pmatrix}\mu\\ \sigma^2\end{pmatrix},\ \begin{pmatrix}\sigma^2/n & 0\\ 0 & 2\sigma^4/n\end{pmatrix}\right)$
i.e. the two MLEs are asymptotically uncorrelated (and by the normal distribution independent)
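A quick simulation sketch of these results, with hypothetical true values; the plug-in asymptotic variances come from the inverse Fisher information of the normal model:

```python
import math
import random

random.seed(2)
mu, sigma2, n = 10.0, 4.0, 5000
xs = [random.gauss(mu, math.sqrt(sigma2)) for _ in range(n)]

mu_hat = sum(xs) / n                                  # MLE of mu
sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n   # MLE of sigma^2 (bias -sigma^2/n)

# plug-in asymptotic variances from the inverse information matrix
var_mu_hat = sigma2_hat / n               # sigma^2 / n
var_sigma2_hat = 2 * sigma2_hat**2 / n    # 2 * sigma^4 / n
```

With n = 5000 both estimates land close to the true values, and the estimated asymptotic standard errors (roughly 0.028 for μ̂ and 0.08 for σ̂²) quantify the remaining spread.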
Modifications and extensions
Ancillarity and conditional sufficiency:
With $\boldsymbol\theta = (\theta_1, \theta_2)$ and $(T_1, T_2)$ a minimal sufficient statistic for $\boldsymbol\theta$, suppose

a) the marginal distribution $f_{T_2}(t_2;\boldsymbol\theta)$ depends on $\theta_2$ but not on $\theta_1$

b) the conditional distribution $f_{T_1\mid T_2 = t_2}(t_1;\boldsymbol\theta)$ depends on $\theta_1$ but not on $\theta_2$

Then $T_2$ is said to be an ancillary statistic for $\theta_1$, and $T_1$ is said to be conditionally sufficient for $\theta_1$.
Profile likelihood:

With $\boldsymbol\theta = (\theta_1, \theta_2)$ and $L(\theta_1, \theta_2;\mathbf{x})$: if $\hat\theta_{2.1}$ is the MLE of $\theta_2$ for a given value of $\theta_1$, then

$L_P(\theta_1) = L\left(\theta_1, \hat\theta_{2.1};\mathbf{x}\right)$

is called the profile likelihood for $\theta_1$.

This concept has its main use in cases where $\theta_1$ contains the parameters of "interest" and $\theta_2$ contains nuisance parameters.

The same ML point estimator for $\theta_1$ is obtained by maximizing the profile likelihood as by maximizing the full likelihood function.
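For the normal model the profile likelihood for the mean can be written down explicitly: for fixed μ, the MLE of σ² is $\frac1n\sum(x_i-\mu)^2$, and plugging it back gives $l_P(\mu) = -\frac n2\left(\ln(2\pi\,s^2(\mu)) + 1\right)$. A small sketch with hypothetical data, checking that the profile maximizer is the full MLE $\bar x$:

```python
import math

xs = [3.1, 2.4, 3.8, 2.9, 3.3, 2.7]
n = len(xs)

def profile_loglik(mu):
    # for fixed mu, the MLE of sigma^2 is s2(mu) = mean((x - mu)^2);
    # substituting it back yields the profile log-likelihood for mu
    s2 = sum((x - mu) ** 2 for x in xs) / n
    return -n / 2 * (math.log(2 * math.pi * s2) + 1)

xbar = sum(xs) / n
# the profile likelihood is maximized at the full MLE, x-bar
assert profile_loglik(xbar) >= profile_loglik(xbar + 0.1)
assert profile_loglik(xbar) >= profile_loglik(xbar - 0.1)
```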
Marginal and conditional likelihood:

$L(\boldsymbol\theta;\mathbf{x})$ is equivalent to the joint p.d.f. of the sample: if $(\mathbf{u}, \mathbf{v})$ is a partitioning of $\mathbf{x}$, then $L(\theta_1, \theta_2;\mathbf{x}) = f(\mathbf{u}, \mathbf{v};\theta_1, \theta_2)$.

If $f(\mathbf{u}, \mathbf{v};\theta_1, \theta_2)$ can be factorized as $f_1(\mathbf{u};\theta_1)\cdot f_2(\mathbf{v};\mathbf{u}, \theta_1, \theta_2)$, then inferences about $\theta_1$ can be based solely on $f_1(\mathbf{u};\theta_1)$, the marginal likelihood for $\theta_1$.

Now, if $f(\mathbf{u}, \mathbf{v};\theta_1, \theta_2)$ can be factorized as $f_1(\mathbf{u};\mathbf{v}, \theta_1)\cdot f_2(\mathbf{v};\theta_1, \theta_2)$, and $f_2(\mathbf{v};\theta_1, \theta_2)$ does not depend on $\theta_1$, then inferences about $\theta_1$ can be based solely on $f_1(\mathbf{u};\mathbf{v}, \theta_1)$, the conditional likelihood for $\theta_1$.
Again, these concepts have their main use in cases where $\theta_1$ contains the parameters of "interest" and $\theta_2$ contains nuisance parameters.
Penalized likelihood:

MLEs can be derived subject to some criteria of smoothness. In particular this is applicable when the parameter is no longer a single value (one- or multidimensional), but a function such as an unknown density function or a regression curve.

The penalized log-likelihood function is written

$l_P(\boldsymbol\theta;\mathbf{x}) = l(\boldsymbol\theta;\mathbf{x}) - \lambda R(\boldsymbol\theta)$

where $R$ is the penalty function and $\lambda$ is a fixed parameter controlling the influence of $R$.

$\lambda$ is thus not estimated by minimizing $l_P(\boldsymbol\theta;\mathbf{x})$, but can be estimated by so-called cross-validation techniques (see ch. 9).

Note that $\mathbf{x}$ is not the usual random sample here, but can be a set of independent but non-identically distributed values.
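The idea can be sketched in a deliberately simple setting: a normal mean θ with the toy quadratic penalty R(θ) = θ², where the penalized maximizer has a closed form (the sample mean shrunk towards zero). This penalty and the data are my illustration, not the smoothness functional of the slides:

```python
xs = [1.8, 2.2, 2.6, 1.9, 2.5]
n = len(xs)
sigma2 = 1.0   # variance treated as known, for simplicity
lam = 3.0      # fixed penalty weight (would be chosen by cross-validation)

def penalized_loglik(theta):
    # l_P(theta) = l(theta) - lam * R(theta), here with R(theta) = theta^2
    ll = -sum((x - theta) ** 2 for x in xs) / (2 * sigma2)
    return ll - lam * theta ** 2

# closed-form maximizer: sum(x) / (n + 2 * lam * sigma2),
# i.e. the sample mean shrunk towards zero by the penalty
theta_pen = sum(xs) / (n + 2 * lam * sigma2)

assert penalized_loglik(theta_pen) >= penalized_loglik(theta_pen + 0.05)
assert penalized_loglik(theta_pen) >= penalized_loglik(theta_pen - 0.05)
```

Setting lam = 0 recovers the unpenalized MLE (the sample mean); larger lam pulls the estimate further towards zero.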
Method of moments estimation (MM)

For a random variable $X$:
The $r$th (population) moment about the origin is $\mu_r' = E(X^r)$
The $r$th (population) moment about the mean ($r$th central moment) is $\mu_r = E\left[(X - \mu)^r\right]$

For a random sample $x_1, \dots, x_n$:
The $r$th sample moment about the origin is $m_r' = \dfrac1n\sum_{i=1}^{n} x_i^r$
The $r$th sample moment about the mean is $m_r = \dfrac1n\sum_{i=1}^{n}(x_i - \bar x)^r$

The method of moments point estimator of $\boldsymbol\theta = (\theta_1, \dots, \theta_k)$ is obtained by solving for $\theta_1, \dots, \theta_k$ the system of equations

$m_r' = \mu_r'(\boldsymbol\theta), \quad r = 1, \dots, k$

or

$m_r = \mu_r(\boldsymbol\theta), \quad r = 1, \dots, k$

or a mixture of these two.
Example:

$x_1, \dots, x_n$ R.S. from $U(a, b)$.

First moment about the origin:

$\mu_1' = E(X) = \int_a^b \frac{x}{b-a}\,dx = \frac{b^2 - a^2}{2(b-a)} = \frac{a+b}{2}$

Second central moment:

$\mu_2 = E\left[\left(X - \tfrac{a+b}{2}\right)^2\right] = \int_a^b \frac{\left(x - \frac{a+b}{2}\right)^2}{b-a}\,dx = \frac{(b-a)^2}{12}$

Solve for $a$ and $b$ the system of equations

$\bar x = \frac{\hat a + \hat b}{2} \quad\text{and}\quad m_2 = \frac{(\hat b - \hat a)^2}{12}, \quad\text{where } m_2 = \frac1n\sum_{i=1}^{n}(x_i - \bar x)^2$

$\Rightarrow\ \hat b - \hat a = \sqrt{12\,m_2} = 2\sqrt{3\,m_2}$ (the root $-2\sqrt{3\,m_2}$ is not possible, as $a < b$)

$\Rightarrow\ \hat a_{MM} = \bar x - \sqrt{3\,m_2} \quad\text{and}\quad \hat b_{MM} = \bar x + \sqrt{3\,m_2}$
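The two MM estimators can be computed directly; a sketch on simulated data with hypothetical true endpoints:

```python
import math
import random

random.seed(3)
a_true, b_true = 2.0, 8.0
xs = [random.uniform(a_true, b_true) for _ in range(20_000)]

n = len(xs)
xbar = sum(xs) / n
m2 = sum((x - xbar) ** 2 for x in xs) / n   # second central sample moment

# solve xbar = (a + b)/2 and m2 = (b - a)^2 / 12 (positive root, since a < b)
a_mm = xbar - math.sqrt(3 * m2)
b_mm = xbar + math.sqrt(3 * m2)
```

With 20,000 observations the moment equations pin the endpoints down close to the true values 2 and 8.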
Method of Least Squares (LS)

First principles:

Assume a sample $\mathbf{x}$ where the random variable $X_i$ can be written

$X_i = m_i(\theta) + \varepsilon_i$

where $m_i$ is the mean value (function) involving $\theta$, and $\varepsilon_i$ is a random variable with zero mean and constant variance $\sigma^2$.

The least-squares estimator of $\theta$ is the value of $\theta$ that minimizes $\sum_{i=1}^{n}\left(x_i - m_i(\theta)\right)^2$, i.e.

$\hat\theta_{LS} = \arg\min_{\theta} \sum_{i=1}^{n}\left(x_i - m_i(\theta)\right)^2$

A more general approach:

Assume the sample can be written $(\mathbf{x}, \mathbf{z})$, where $x_i$ represents the random variable of interest (endogenous variable) and $z_i$ represents either an auxiliary random variable (exogenous) or a given constant for sample point $i$:

$X_i = m(z_i;\boldsymbol\theta) + \varepsilon_i$

where $E(\varepsilon_i) = 0$, $\mathrm{Var}(\varepsilon_i) = \sigma_i^2$ and $\mathrm{Cov}(\varepsilon_i, \varepsilon_j) = c_{ij}$, with $\sigma_i^2$ and $c_{ij}$ possibly functions of $z_i$, $z_j$.

The least-squares estimator of $\boldsymbol\theta$ is then

$\hat{\boldsymbol\theta}_{LS} = \arg\min_{\boldsymbol\theta}\ \left(\mathbf{x} - \mathbf{m}(\boldsymbol\theta)\right)^T W^{-1}\left(\mathbf{x} - \mathbf{m}(\boldsymbol\theta)\right)$

where $\mathbf{m}(\boldsymbol\theta) = \left(m(z_1;\boldsymbol\theta), \dots, m(z_n;\boldsymbol\theta)\right)^T$ and $W$ is the variance-covariance matrix of $\boldsymbol\varepsilon = (\varepsilon_1, \dots, \varepsilon_n)^T$.
Special cases:

The ordinary linear regression model:

$X_i = \beta_0 + \beta_1 z_{i,1} + \dots + \beta_p z_{i,p} + \varepsilon_i, \quad i = 1, \dots, n$

with $W = \sigma^2 I$ and $Z$ (the matrix with rows $(1, z_{i,1}, \dots, z_{i,p})$) considered to be a constant matrix:

$\hat{\boldsymbol\beta}_{LS} = \arg\min_{\boldsymbol\beta} \sum_{i=1}^{n}\left(x_i - \left(\beta_0 + \beta_1 z_{i,1} + \dots + \beta_p z_{i,p}\right)\right)^2 = \left(Z^T Z\right)^{-1} Z^T\mathbf{x}$

The heteroscedastic regression model:

$W \ne \sigma^2 I$, and

$\hat{\boldsymbol\beta}_{LS} = \left(Z^T W^{-1} Z\right)^{-1} Z^T W^{-1}\mathbf{x}$
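Both special cases can be sketched with one function: the weighted normal equations $(Z^T W^{-1} Z)\,\beta = Z^T W^{-1}\mathbf{x}$ for a single regressor with intercept, solved by hand as a 2×2 system. Equal weights reduce it to ordinary least squares; the data values are hypothetical:

```python
def wls_fit(z, x, w):
    """Weighted least squares for x_i = b0 + b1*z_i + eps_i,
    with W = diag(w) the error covariance matrix; solves the 2x2
    normal equations (Z' W^-1 Z) beta = Z' W^-1 x directly."""
    inv_w = [1.0 / wi for wi in w]
    s0 = sum(inv_w)                                          # sum 1/w_i
    s1 = sum(iw * zi for iw, zi in zip(inv_w, z))            # sum z_i/w_i
    s2 = sum(iw * zi * zi for iw, zi in zip(inv_w, z))       # sum z_i^2/w_i
    t0 = sum(iw * xi for iw, xi in zip(inv_w, x))            # sum x_i/w_i
    t1 = sum(iw * zi * xi for iw, zi, xi in zip(inv_w, z, x))
    det = s0 * s2 - s1 * s1
    b0 = (s2 * t0 - s1 * t1) / det
    b1 = (s0 * t1 - s1 * t0) / det
    return b0, b1

z = [0.0, 1.0, 2.0, 3.0, 4.0]
x = [1.0, 3.1, 4.9, 7.2, 8.8]          # roughly x = 1 + 2z plus noise
b0, b1 = wls_fit(z, x, w=[1.0] * 5)    # equal weights -> ordinary LS
```

Passing unequal weights (larger `w` for noisier observations) gives the heteroscedastic estimator with the same code path.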
The first-order auto-regressive model:

$X_t = \theta z_t + \varepsilon_t$, where $z_t = x_{t-1},\quad t = 2, \dots, n$

Let $\mathbf{x} = (x_1, x_2, \dots, x_n)$ and $\mathbf{z} = (*, x_1, \dots, x_{n-1})$, i.e. $z$ is not available for the first sample point (first time-point), and $W = \sigma^2 I$ (of dimension $n-1$).

The conditional least-squares estimator of $\theta$ (given $\mathbf{z}$) is

$\hat\theta_{CLS} = \arg\min_{\theta} \sum_{t=2}^{n}\left(x_t - \theta x_{t-1}\right)^2 = \dfrac{\sum_{t=2}^{n} x_t x_{t-1}}{\sum_{t=2}^{n} x_{t-1}^2}$
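A conditional-least-squares sketch on a simulated AR(1) series; the true θ and the unit noise scale are hypothetical choices for the illustration:

```python
import random

# simulate an AR(1) series x_t = theta * x_{t-1} + eps_t
random.seed(4)
theta_true = 0.6
x = [0.0]
for _ in range(5000):
    x.append(theta_true * x[-1] + random.gauss(0.0, 1.0))

# conditional least squares: regress x_t on x_{t-1} for t = 2, ..., n;
# the first time-point has no predecessor and is conditioned on
num = sum(x[t] * x[t - 1] for t in range(1, len(x)))
den = sum(x[t - 1] ** 2 for t in range(1, len(x)))
theta_cls = num / den
```

With 5000 time-points the ratio of sums recovers θ to within a few hundredths.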