Post on 21-Dec-2015
Maximum-Likelihood estimation
Consider as usual a random sample x = (x1, … , xn) from a distribution with p.d.f. f(x; θ) (and c.d.f. F(x; θ)).

The maximum likelihood point estimator of θ is the value of θ that maximizes L(θ; x) or, equivalently, maximizes l(θ; x) = ln L(θ; x).

Useful notation:

θ̂_ML = argmax_θ L(θ; x)

With a k-dimensional parameter θ = (θ1, … , θk):

θ̂_ML = argmax_θ L(θ; x)
Complete sample case:

If all sample values are explicitly known, then

θ̂_ML = argmax_θ ∏_{i=1}^{n} f(x_i; θ) = argmax_θ Σ_{i=1}^{n} ln f(x_i; θ)

Censored data case:

If some (say n_c) of the sample values are censored, e.g. only known as x_i < k1 or x_i > k2, then

θ̂_ML = argmax_θ ∏_{i=1}^{n−n_c} f(x_i; θ) · [Pr(X < k1)]^{n_lc} · [Pr(X > k2)]^{n_uc}

where

n_lc = number of values only known as being below k1 (left-censored)
n_uc = number of values only known as being above k2 (right-censored)
n_c = n_lc + n_uc
When the sample comes from a continuous distribution the censored data case can be written

θ̂_ML = argmax_θ ∏_{i=1}^{n−n_c} f(x_i; θ) · [F(k1; θ)]^{n_lc} · [1 − F(k2; θ)]^{n_uc}

In the case the distribution is discrete the use of F is also possible: if k1 and k2 are values that can be attained by the random variables, then we may write

θ̂_ML = argmax_θ ∏_{i=1}^{n−n_c} f(x_i; θ) · [F(k1′; θ)]^{n_lc} · [1 − F(k2; θ)]^{n_uc}

where

k1′ is not k1 itself but the attainable value closest to the left of k1 (so that F(k1′; θ) = Pr(X < k1)), while for the upper tail 1 − F(k2; θ) = Pr(X > k2) holds directly since k2 is attainable.
Example:

x1, … , xn random sample (R.S.) from the Rayleigh distribution with p.d.f.

f(x; α) = (x/α) · e^{−x²/(2α)} , x > 0, α > 0

l(α; x) = ln L(α; x) = Σ_{i=1}^{n} ln f(x_i; α) = Σ_{i=1}^{n} ln x_i − n·ln α − (1/(2α)) Σ_{i=1}^{n} x_i²

dl/dα = −n/α + (1/(2α²)) Σ_{i=1}^{n} x_i² = 0  ⟹  α = (1/(2n)) Σ_{i=1}^{n} x_i²  (case α = 0 excluded)

d²l/dα² = n/α² − (1/α³) Σ_{i=1}^{n} x_i² ; at α = (1/(2n)) Σ x_i² this equals n/α² − 2n/α² = −n/α² < 0

⟹ α̂_ML = (1/(2n)) Σ_{i=1}^{n} x_i² defines a (local) maximum.
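As a quick numerical check of this closed-form estimator, the sketch below (assuming the parameterization f(x; α) = (x/α)e^{−x²/(2α)} reconstructed above, with a made-up α and simulated data) draws a Rayleigh sample by inverse-CDF sampling and compares α̂ = Σx_i²/(2n) with the true value.

```python
import math
import random

random.seed(1)

alpha_true = 2.0  # hypothetical parameter of f(x; a) = (x/a)*exp(-x^2/(2a))
n = 100_000

# Inverse-CDF sampling: F(x; a) = 1 - exp(-x^2/(2a)), so X = sqrt(-2a * ln U)
xs = [math.sqrt(-2.0 * alpha_true * math.log(1.0 - random.random()))
      for _ in range(n)]

# Closed-form MLE derived above: alpha_hat = (1/(2n)) * sum(x_i^2)
alpha_hat = sum(x * x for x in xs) / (2 * n)
print(alpha_hat)  # close to alpha_true
```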
Example:

4, 3, 5, 3, "5" R.S. from the Poisson distribution with p.d.f. (mass function)

f(x; λ) = (λ^x / x!) · e^{−λ} , x = 0, 1, 2, …

NB! One of the sample values is right-censored: "5" means the value is only known to satisfy x ≥ 5.

λ̂_ML = argmax_λ ∏_{i=1}^{4} f(x_i; λ) · Pr(X ≥ 5)

= argmax_λ ( λ^{4+3+5+3} · e^{−4λ} / (4!·3!·5!·3!) ) · ( 1 − Σ_{y=0}^{4} (λ^y / y!) · e^{−λ} )
Solution must be numerically found
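A minimal numerical sketch of that maximization, using a plain grid search over λ (the observed values 4, 3, 5, 3 and the censoring point 5 are taken from the example; in practice a Newton-type routine would be preferred):

```python
import math

data = [4, 3, 5, 3]   # fully observed values
censor_at = 5         # one value only known to satisfy x >= 5

def log_lik(lam):
    # log of prod_i f(x_i; lam) * Pr(X >= 5)
    ll = sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)
    p_tail = 1.0 - sum(math.exp(-lam) * lam**y / math.factorial(y)
                       for y in range(censor_at))
    return ll + math.log(p_tail)

# Grid search over lambda in [0.5, 5.0] with step 0.01
lams = [0.5 + 0.01 * k for k in range(451)]
lam_hat = max(lams, key=log_lik)
print(round(lam_hat, 2))  # larger than the uncensored mean 15/4 = 3.75
```

Note that the censored observation pulls the estimate above the mean of the four fully observed values, since Pr(X ≥ 5) is increasing in λ.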
For the exponential family of distributions:

Use the canonical form (natural parameterization):

f(x; θ) = exp{ Σ_{j=1}^{k} θ_j·B_j(x) + C(x) + D(θ) }

Let

T_j = Σ_{i=1}^{n} B_j(X_i) , j = 1, … , k , and assume E(T_j) = τ_j(θ) , j = 1, … , k

Then the maximum likelihood estimators (MLEs) of θ1, … , θk are found by solving the system of equations

E(T_j) = t_j(x) , j = 1, … , k
Example:

x1, … , xn R.S. from the Poisson distribution with

f(x; λ) = (λ^x / x!) · e^{−λ} = exp{ x·ln λ − λ − ln x! } , i.e. the canonical form with θ = ln λ, B(x) = x, C(x) = −ln x!, D(θ) = −e^θ

T = Σ_{i=1}^{n} B(X_i) = Σ_{i=1}^{n} X_i

E(X) = Σ_{x=0}^{∞} x · (λ^x / x!) · e^{−λ} = λ·e^{−λ} · Σ_{x=1}^{∞} λ^{x−1}/(x−1)! = λ·e^{−λ}·e^{λ} = λ = e^θ

⟹ E(T) = n·e^θ

θ̂_ML is found by solving n·e^θ = Σ_{i=1}^{n} x_i , i.e.

θ̂_ML = ln( (1/n) Σ_{i=1}^{n} x_i ) = ln x̄  (and by invariance λ̂_ML = e^{θ̂_ML} = x̄)
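A small sketch of the recipe E(T) = t for this Poisson case, with hypothetical count data; solving n·e^θ = Σx_i gives θ̂ = ln x̄ directly:

```python
import math

xs = [2, 4, 1, 3, 0, 5, 2, 3]  # hypothetical Poisson counts

# Canonical form: f(x; theta) = exp(theta*x - e^theta - ln x!), theta = ln(lambda)
# T = sum(X_i), E(T) = n * e^theta, so solving E(T) = t gives:
t = sum(xs)
theta_hat = math.log(t / len(xs))  # = ln(x-bar)
lam_hat = math.exp(theta_hat)      # = x-bar, by invariance

print(lam_hat)  # 2.5
```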
Computational aspects
When the MLEs can be found by evaluating

∂l(θ; x)/∂θ = 0

numerical routines for solving the generic equation g(θ) = 0 can be used.

• Newton-Raphson method
• Fisher's method of scoring (makes use of the fact that under regularity conditions:

E( ∂²l(θ; X)/∂θ_i ∂θ_j ) = −E( (∂l(θ; X)/∂θ_i) · (∂l(θ; X)/∂θ_j) ) )

This is the multidimensional analogue of Lemma 2.1 (see page 17).
When the MLEs cannot be found the above way other numerical routines must be used:
• Simplex method
• EM-algorithm
For description of the numerical routines see textbook.
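As an illustration of the Newton-Raphson idea for the score equation g(θ) = 0, the sketch below (hypothetical data, reusing the Rayleigh log-likelihood from the earlier example so the root is known in closed form) iterates θ_{m+1} = θ_m − g(θ_m)/g′(θ_m):

```python
# Newton-Raphson on the score equation g(alpha) = dl/dalpha = 0 for the
# Rayleigh model, where the root is known: alpha_hat = sum(x_i^2)/(2n).
xs = [1.2, 0.8, 2.1, 1.5, 0.9, 1.7]  # hypothetical sample
n = len(xs)
s2 = sum(x * x for x in xs)

def g(a):        # score function dl/dalpha
    return -n / a + s2 / (2 * a * a)

def g_prime(a):  # second derivative d2l/dalpha2
    return n / (a * a) - s2 / a**3

a = 1.0  # starting value
for _ in range(50):
    a = a - g(a) / g_prime(a)

print(a, s2 / (2 * n))  # the iterate agrees with the closed-form root
```

Fisher's method of scoring would replace the observed −g′(α) by the expected information; otherwise the iteration is identical.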
Maximum likelihood estimation comes into natural use not for handling the standard case, i.e. a complete random sample from a distribution within the exponential family, but for finding estimators in more non-standard and complex situations.
Example:

x1, … , xn R.S. from U(a, b) with p.d.f.

f(x; a, b) = 1/(b − a) , a ≤ x ≤ b ; 0 otherwise

L(a, b; x) = (b − a)^{−n} if a ≤ x_i ≤ b for all i ; 0 otherwise

No local maxima or minima exist. L(a, b; x) is as largest when b = a (degenerate case), and with respect to the sample it is maximized when b − a is as small as possible.

⟹ Choose the largest possible approximation of a and the smallest possible approximation of b from the values in the sample:

â_ML = min(x1, … , xn) = x_(1) and b̂_ML = max(x1, … , xn) = x_(n)
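The resulting estimators are just the sample extremes; a two-line sketch with hypothetical values:

```python
xs = [2.3, 4.1, 3.7, 2.9, 4.8, 3.2]  # hypothetical sample from U(a, b)

# The likelihood (b - a)^(-n) on a <= min(x), b >= max(x) is maximized by
# shrinking the interval as far as the sample allows:
a_hat = min(xs)
b_hat = max(xs)
print(a_hat, b_hat)  # 2.3 4.8
```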
Properties of MLEs
Invariance:

If θ and φ represent two alternative parameterizations, and φ = g(θ) where g is a one-to-one function of θ, then if θ̂_ML is the MLE of θ, φ̂_ML = g(θ̂_ML) is the MLE of φ.

Consistency:

Under some weak regularity conditions all MLEs are consistent.

Efficiency:

Under the usual regularity conditions:

θ̂_ML is asymptotically distributed as N(θ, I(θ)^{−1}), where I(θ) is the Fisher information.

(Asymptotically efficient and normally distributed)

Sufficiency:

θ̂_ML (when unique) is a function of the minimal sufficient statistic for θ.
Example:

x1, … , xn R.S. from N(μ, σ²) with p.d.f.

f(x; μ, σ²) = (1/√(2πσ²)) · e^{−(x−μ)²/(2σ²)} = exp{ (μ/σ²)·x − (1/(2σ²))·x² − μ²/(2σ²) − (1/2)·ln(2πσ²) }

i.e. the canonical form with B1(x) = x and B2(x) = x²

⟹ T1 = Σ_{i=1}^{n} X_i and T2 = Σ_{i=1}^{n} X_i²

E(T1) = n·E(X) = nμ and E(T2) = n·E(X²) = n(σ² + μ²)

μ̂_ML and σ̂²_ML are thus obtained by solving

nμ = Σ_{i=1}^{n} x_i and n(σ² + μ²) = Σ_{i=1}^{n} x_i²

⟹ μ̂_ML = x̄ and σ̂²_ML = (1/n) Σ x_i² − x̄² = (1/n) Σ (x_i − x̄)²

((μ, σ²) has a one-to-one relationship with the natural parameters, as the system of relating equations has a unique solution.)

Invariance property: σ̂_ML = √(σ̂²_ML)

NB! E(σ̂²_ML) − σ² ≠ 0 (bias), but → 0 as n → ∞.

For the asymptotic distribution:

l(μ, σ²; x) = −(n/2)·ln(2πσ²) − (1/(2σ²)) Σ (x_i − μ)²

∂l/∂μ = (1/σ²) Σ (x_i − μ)
∂l/∂σ² = −n/(2σ²) + (1/(2σ⁴)) Σ (x_i − μ)²

∂²l/∂μ² = −n/σ²
∂²l/∂μ∂σ² = −(1/σ⁴) Σ (x_i − μ)
∂²l/∂(σ²)² = n/(2σ⁴) − (1/σ⁶) Σ (x_i − μ)²

With E(X_i − μ) = 0 and E(X_i − μ)² = σ²:

E(∂²l/∂μ²) = −n/σ² ; E(∂²l/∂μ∂σ²) = 0 ; E(∂²l/∂(σ²)²) = n/(2σ⁴) − nσ²/σ⁶ = −n/(2σ⁴)

⟹ I(μ, σ²) = ( n/σ²  0 ; 0  n/(2σ⁴) ) and I(μ, σ²)^{−1} = ( σ²/n  0 ; 0  2σ⁴/n )

⟹ (μ̂_ML, σ̂²_ML) is asymptotically distributed as N2( (μ, σ²), ( σ²/n  0 ; 0  2σ⁴/n ) )
i.e. the two MLEs are asymptotically uncorrelated (and by the normal distribution independent)
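A simulation sketch of these asymptotic variances (arbitrary choices μ = 1, σ² = 4, n = 200, 2000 replicates): the empirical variances of μ̂ and σ̂² should come out close to σ²/n = 0.02 and 2σ⁴/n = 0.16.

```python
import math
import random

random.seed(2)
mu, sigma2 = 1.0, 4.0   # hypothetical true parameters
n, reps = 200, 2000

mu_hats, s2_hats = [], []
for _ in range(reps):
    xs = [random.gauss(mu, math.sqrt(sigma2)) for _ in range(n)]
    m = sum(xs) / n
    mu_hats.append(m)                                   # MLE of mu
    s2_hats.append(sum((x - m) ** 2 for x in xs) / n)   # MLE of sigma^2

def var(v):
    mean = sum(v) / len(v)
    return sum((t - mean) ** 2 for t in v) / len(v)

print(var(mu_hats), sigma2 / n)           # approx 0.02
print(var(s2_hats), 2 * sigma2**2 / n)    # approx 0.16
```

The empirical correlation between the two estimator series is likewise close to zero, in line with the asymptotic independence noted above.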
Modifications and extensions
Ancillarity and conditional sufficiency:
With θ = (θ1, θ2) and T = (T1, T2) a minimal sufficient statistic for θ = (θ1, θ2):

If

a) f(t1; t2, θ1, θ2) (the conditional density of T1 given T2 = t2) depends on θ1 but not on θ2, and
b) f(t2; θ1, θ2) depends on θ2 but not on θ1,

then T2 is said to be an ancillary statistic for θ1 and T1 is said to be conditionally independent of θ2.
Profile likelihood:
With θ = (θ1, θ2) and L(θ1, θ2; x): if θ̂2.1 is the MLE of θ2 for a given value of θ1, then L(θ1, θ̂2.1; x) is called the profile likelihood for θ1.

This concept has its main use in cases where θ1 contains the parameters of "interest" and θ2 contains nuisance parameters.

The same ML point estimator for θ1 is obtained by maximizing the profile likelihood as by maximizing the full likelihood function.
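A small sketch for the N(μ, σ²) case (hypothetical data), with σ² as the nuisance parameter: for fixed μ its MLE is σ̂²(μ) = (1/n)Σ(x_i − μ)², so the profile log-likelihood is −(n/2)·ln σ̂²(μ) up to constants, and its maximizer coincides with the full MLE x̄.

```python
import math

xs = [1.1, 2.3, 0.7, 1.9, 1.4, 2.0]  # hypothetical sample
n = len(xs)

def profile_loglik(mu):
    # sigma^2 is profiled out: its MLE for fixed mu is mean((x - mu)^2)
    s2_mu = sum((x - mu) ** 2 for x in xs) / n
    return -0.5 * n * math.log(s2_mu)  # additive constants dropped

grid = [k / 1000 for k in range(500, 3000)]
mu_hat = max(grid, key=profile_loglik)
print(mu_hat, sum(xs) / n)  # profile maximizer matches x-bar (up to grid step)
```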
Marginal and conditional likelihood:
L(θ1, θ2; x) is equivalent to the joint p.d.f. of the sample x, i.e. f(x; θ1, θ2).

If (u, v) is a partitioning of x, then f(x; θ1, θ2) = f(u, v; θ1, θ2).

Now, if f(u, v; θ1, θ2) can be factorized as f(u; θ1) · f(v; u, θ1, θ2), and f(v; u, θ1, θ2) does not depend on θ1, then inferences about θ1 can be based solely on f(u; θ1), the marginal likelihood for θ1.

If f(u, v; θ1, θ2) can be factorized as f(u; v, θ1) · f(v; θ1, θ2), then inferences about θ1 can be based solely on f(u; v, θ1), the conditional likelihood for θ1.

Again, these concepts have their main use in cases where θ1 contains the parameters of "interest" and θ2 contains nuisance parameters.
Penalized likelihood:
MLEs can be derived subject to some criterion of smoothness.

In particular this is applicable when the parameter is no longer a single value (one- or multidimensional), but a function such as an unknown density function or a regression curve.

The penalized log-likelihood function is written

l_P(θ; x) = l(θ; x) − λ·R(θ)

where R is the penalty function and λ is a fixed parameter controlling the influence of R.

λ is thus not estimated by maximizing l_P(θ; x), but can be estimated by so-called cross-validation techniques (see ch. 9).

Note that x is here not the usual random sample, but can be a set of independent but non-identically distributed values.
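An illustrative sketch (an assumed setup, not from the text): observations x_i ~ N(θ_i, 1) with a squared-difference roughness penalty R(θ) = Σ(θ_{i+1} − θ_i)², maximizing l_P by gradient ascent; the fitted sequence comes out smoother than the raw data.

```python
# Assumed model: x_i ~ N(theta_i, 1), so l(theta) = -0.5 * sum (x_i - theta_i)^2
# plus constants; penalty R(theta) = sum (theta_{i+1} - theta_i)^2.
xs = [0.0, 0.2, 1.8, 2.1, 1.9, 0.1, 0.3, 2.2, 2.0, 0.2]  # hypothetical data
lam = 2.0        # fixed smoothing parameter (would be chosen by CV)
theta = list(xs)
n = len(xs)

for _ in range(2000):
    grad = []
    for i in range(n):
        g = xs[i] - theta[i]  # gradient of the log-likelihood term
        if i > 0:
            g += 2 * lam * (theta[i - 1] - theta[i])  # penalty gradient
        if i < n - 1:
            g += 2 * lam * (theta[i + 1] - theta[i])
        grad.append(g)
    theta = [t + 0.05 * g for t, g in zip(theta, grad)]

rough_raw = sum((xs[i + 1] - xs[i]) ** 2 for i in range(n - 1))
rough_fit = sum((theta[i + 1] - theta[i]) ** 2 for i in range(n - 1))
print(rough_fit < rough_raw)  # the penalty produces a smoother fit
```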
Method of moments estimation (MM)

For a random variable X:

The r-th moment about the origin is μ_r′ = E(X^r)
The r-th moment about the mean (r-th central moment) is μ_r = E[(X − E(X))^r]

For a random sample x1, … , xn:

The r-th sample moment about the origin is m_r′ = (1/n) Σ_{i=1}^{n} x_i^r
The r-th sample moment about the mean is m_r = (1/n) Σ_{i=1}^{n} (x_i − x̄)^r

The method of moments point estimator of θ = (θ1, … , θk) is obtained by solving for θ1, … , θk the system of equations

m_r′ = μ_r′(θ) , r = 1, … , k
or m_r = μ_r(θ) , r = 1, … , k
or a mixture between these two.
Example:

x1, … , xn R.S. from U(a, b).

First moment about the origin: μ1′ = E(X) = ∫_a^b x/(b − a) dx = (a + b)/2

Second central moment: μ2 = Var(X) = E(X²) − [E(X)]² = (a² + ab + b²)/3 − (a + b)²/4 = (b − a)²/12

Solve for a, b the system of equations

x̄ = (a + b)/2
m2 = (b − a)²/12 , where m2 = (1/n) Σ_{i=1}^{n} (x_i − x̄)²

⟹ b − a = ±√(12·m2) = ±2√(3·m2) (the negative root is not possible, as b > a)

⟹ â_MM = x̄ − √(3·m2) and b̂_MM = x̄ + √(3·m2)
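A quick numerical check of â_MM = x̄ − √(3·m2), b̂_MM = x̄ + √(3·m2) on simulated U(2, 5) data (the endpoints 2 and 5 are arbitrary choices):

```python
import random

random.seed(3)
a_true, b_true = 2.0, 5.0  # hypothetical true endpoints
n = 50_000
xs = [random.uniform(a_true, b_true) for _ in range(n)]

xbar = sum(xs) / n
m2 = sum((x - xbar) ** 2 for x in xs) / n  # second sample central moment

# Solving xbar = (a+b)/2 and m2 = (b-a)^2/12:
half_width = (3 * m2) ** 0.5
a_mm = xbar - half_width
b_mm = xbar + half_width
print(a_mm, b_mm)  # close to (2, 5)
```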
Method of Least Squares (LS)
First principles:
Assume a sample x where the random variable X_i can be written

X_i = m(θ) + ε_i

where m(θ) is the mean value (a function involving θ) and ε_i is a random variable with zero mean and constant variance σ².

The least-squares estimator of θ is the value of θ that minimizes Σ_{i=1}^{n} (x_i − m(θ))², i.e.

θ̂_LS = argmin_θ Σ_{i=1}^{n} (x_i − m(θ))²
A more general approach:
Assume the sample can be written (x, z), where x_i represents the random variable of interest (endogenous variable) and z_i represents either an auxiliary random variable (exogenous) or a given constant for sample point i. Then

X_i = m(z_i; θ) + ε_i

where E(ε_i) = 0, Var(ε_i) = σ_i² and Cov(ε_i, ε_j) = c_{ij}, with σ_i², c_{ij} possibly functions of z_i, z_j.

The least squares estimator of θ is then

θ̂_LS = argmin_θ (x − m(z; θ))ᵀ W^{−1} (x − m(z; θ))

where m(z; θ) = (m(z1; θ), … , m(zn; θ))ᵀ and W is the variance-covariance matrix of ε = (ε1, … , εn)ᵀ.
Special cases:
The ordinary linear regression model:

X_i = β0 + β1·z_{i,1} + … + βp·z_{i,p} + ε_i , i = 1, … , n , with W = σ²·I and Z considered to be a constant matrix. Then

β̂_LS = argmin_β Σ_{i=1}^{n} (x_i − β0 − β1·z_{i,1} − … − βp·z_{i,p})² = argmin_β (x − Zβ)ᵀ(x − Zβ) = (ZᵀZ)^{−1} Zᵀ x

The heteroscedastic regression model:

W ≠ σ²·I , and

β̂_LS = (Zᵀ W^{−1} Z)^{−1} Zᵀ W^{−1} x
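A minimal sketch of the ordinary-least-squares special case with one explanatory variable (hypothetical data): the normal equations (ZᵀZ)β = Zᵀx are solved with the 2×2 inverse written out.

```python
# OLS for x_i = b0 + b1*z_i + e_i via the normal equations (Z^T Z) beta = Z^T x,
# where Z has columns (1, z_i); the 2x2 system is inverted explicitly.
zs = [0.0, 1.0, 2.0, 3.0, 4.0]
xs = [1.1, 2.9, 5.2, 6.8, 9.1]  # hypothetical responses

n = len(zs)
sz = sum(zs)
szz = sum(z * z for z in zs)
sx = sum(xs)
szx = sum(z * x for z, x in zip(zs, xs))

det = n * szz - sz * sz
b0 = (szz * sx - sz * szx) / det
b1 = (n * szx - sz * sx) / det
print(b0, b1)  # intercept and slope
```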
The first-order auto-regressive model:
X_t = φ·X_{t−1} + ε_t , t = 2, … , n , with W = σ²·I

Let x* = (x2, … , xn) and z = (x1, x2, … , x_{n−1}), i.e. for the first sample point (the first time-point) z is not available.

The conditional least-squares estimator of φ (given x1) is

φ̂_CLS = argmin_φ (x* − φ·z)ᵀ(x* − φ·z) = argmin_φ Σ_{t=2}^{n} (x_t − φ·x_{t−1})² = Σ_{t=2}^{n} x_t·x_{t−1} / Σ_{t=2}^{n} x_{t−1}²

where ε = (ε2, … , εn)ᵀ has mean 0, the (n − 1)-dimensional vector of zeros.
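A simulation sketch of the conditional least-squares estimator for an AR(1) series (arbitrary φ = 0.6, standard normal errors): φ̂_CLS = Σ x_t·x_{t−1} / Σ x_{t−1}².

```python
import random

random.seed(4)
phi_true = 0.6  # hypothetical autoregression coefficient
n = 20_000

# Simulate an AR(1) series x_t = phi * x_{t-1} + e_t
x = [0.0]
for _ in range(n - 1):
    x.append(phi_true * x[-1] + random.gauss(0.0, 1.0))

# Conditional least squares given x_1: minimize sum (x_t - phi*x_{t-1})^2
num = sum(x[t] * x[t - 1] for t in range(1, n))
den = sum(x[t - 1] ** 2 for t in range(1, n))
phi_hat = num / den
print(phi_hat)  # close to 0.6
```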