Lecture Notes of Statistics 517, Chapter 6
Chapter 6: Maximum Likelihood Methods
Tonglin Zhang
Tonglin Zhang, Department of Statistics, Purdue University Chapter 6: Maximum Likelihood Methods
6.1 Maximum Likelihood Estimation
Definition (Maximum likelihood estimator). The joint PDF or PMF of a random vector X, viewed as a function of the parameter θ, is the likelihood function. Usually, we denote the likelihood function by

L(θ) = L(θ; X),

and its logarithm is ℓ(θ) = log L(θ).
The maximum likelihood estimator (MLE) of θ is the value of θ that maximizes the likelihood function, i.e.,

θ̂ = argmax_θ L(θ).
Theorem (Jensen's inequality). Let g(x) be a smooth function. If g(x) is concave (i.e., g[(x1 + x2)/2] ≥ [g(x1) + g(x2)]/2 for any x1, x2 ∈ R), then

g[E(X)] ≥ E[g(X)].

If g(x) is strictly concave (i.e., g[(x1 + x2)/2] > [g(x1) + g(x2)]/2 for any distinct x1, x2 ∈ R), then

g[E(X)] ≥ E[g(X)],

with equality iff X is degenerate (i.e., P(X = a) = 1 for some a ∈ R).
Proposition. If g′′(x) < 0, then g is strictly concave.
Example: Let g(x) = log(x) for x > 0. Then g′(x) = 1/x and g′′(x) = −1/x² < 0. Thus, log(x) is strictly concave. Therefore,

log[E(X)] ≥ E[log(X)].

Let X be a random variable with PDF or PMF fθ(x) (written f(x, θ) in the textbook), where θ is the parameter. In the following, we write

Eθ[g(X)] = ∫R g(x)fθ(x)dx

for the expected value of g(X) under θ, using the continuous case (sums replace integrals in the discrete case).
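As a quick numerical aside (not in the original slides), the concavity inequality log[E(X)] ≥ E[log(X)] can be checked by simulation; the Exp(1) distribution below is an arbitrary choice, for which E(X) = 1 and E[log(X)] = −γ ≈ −0.577.

```python
import math
import random

random.seed(0)
xs = [random.expovariate(1.0) for _ in range(100_000)]  # X ~ Exp(1), so E(X) = 1

log_of_mean = math.log(sum(xs) / len(xs))                 # log[E(X)]
mean_of_log = sum(math.log(x) for x in xs) / len(xs)      # E[log(X)]

# log is strictly concave and X is not degenerate, so the inequality is strict.
assert log_of_mean > mean_of_log
```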
Theorem. Let θ0 be the true parameter, so that the true PDF or PMF of X is fθ0(x). Then,

Eθ0 [log(fθ(X)/fθ0(X))] ≤ 0,

and the equality holds iff fθ(x) = fθ0(x) almost surely.
Proof. Note that ∫R fθ(x)dx = 1 for all θ. By Jensen's inequality applied to the strictly concave function log, we have

Eθ0 [log(fθ(X)/fθ0(X))] = ∫R [log(fθ(x)/fθ0(x))] fθ0(x)dx
≤ log ∫R [fθ(x)/fθ0(x)] fθ0(x)dx
= log ∫R fθ(x)dx
= 0,

and the equality holds iff fθ(x) = fθ0(x) almost surely.
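The quantity Eθ0[log(fθ(X)/fθ0(X))] is the negative Kullback–Leibler divergence, and the theorem can be checked by numerical integration. A sketch, using two normal densities (an arbitrary choice), where the integral equals −(µ − µ0)²/2 exactly:

```python
import math

def normal_pdf(x, mu, sigma):
    return math.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (math.sqrt(2 * math.pi) * sigma)

def expected_log_ratio(mu0, mu, sigma=1.0, lo=-12.0, hi=12.0, n=200_000):
    """Midpoint Riemann sum for E_{theta0}[log(f_theta(X)/f_theta0(X))]."""
    h = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        f0 = normal_pdf(x, mu0, sigma)
        f1 = normal_pdf(x, mu, sigma)
        total += math.log(f1 / f0) * f0 * h
    return total

# For N(0, 1) as the truth and N(1.5, 1) as the candidate, the value is -1.5^2/2.
val = expected_log_ratio(mu0=0.0, mu=1.5)
assert val <= 0.0
assert abs(val - (-(1.5 ** 2) / 2)) < 1e-5
```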
Definition (Almost surely). For any functions g(x) and h(x), we say g(x) = h(x) almost surely, denoted

g(x) a.s.= h(x),

if P({x : g(x) ≠ h(x)}) = 0.
Note: Suppose that both g(x) and h(x) are continuous and the underlying distribution has a positive density on R. Then g(x) a.s.= h(x) implies g(x) = h(x) for all x ∈ R.
Corollary. Let X1, · · · , Xn be iid with PDF or PMF fθ(x), and write ℓ(θ) for the log-likelihood of the sample. If fθ(x) a.s.= fθ0(x) iff θ = θ0 (identifiability), then

Eθ0 [ℓ(θ)] ≤ Eθ0 [ℓ(θ0)],

and the equality holds iff θ = θ0.
Proof. Straightforward:

Eθ0 [ℓ(θ)] − Eθ0 [ℓ(θ0)] = Σ_{i=1}^n Eθ0 [log(fθ(Xi)/fθ0(Xi))] ≤ 0,

and the equality holds iff θ = θ0.
Theorem (Consistency of the MLE). Let X1, · · · , Xn be iid with density fθ(x) for θ ∈ Θ, and let θ0 be the true value of θ. Suppose that
(1) Θ is compact,
(2) fθ(x) is continuous in θ for all x,
(3) there exists a function K(x) such that Eθ0 |K(X)| < ∞ and log fθ(x) − log fθ0(x) ≤ K(x) for all x and θ,
(4) for all θ ∈ Θ and sufficiently small ρ > 0, the expression sup_{|θ′−θ|<ρ} log fθ′(x) is measurable in x.
Then the MLE θ̂ P→ θ0.
▶ The conditions of the above theorem are weak, but the proof is beyond the scope of these notes; proving consistency of an estimator is usually quite hard. Once consistency is assumed, however, the asymptotic normality of the MLE can be derived by Taylor expansion, and we prove this below.
▶ In practice, we mostly rely on the asymptotic results for the MLE. A key ingredient is the Fisher information (a scalar or a matrix). The main theorem follows.
Theorem (Main result for the MLE). Let θ̂ be the MLE of θ obtained by solving ℓ̇(θ) = 0, and let θ0 be an interior point of Θ. Suppose that all conditions of the previous theorem hold, together with the following additional conditions:
(R3) fθ(x) is twice differentiable in θ;
(R4) the integral ∫R fθ(x)dx can be differentiated twice in θ under the integral sign;
(R5) log fθ(x) is three times differentiable in θ, and there exist a function M(x) and a constant c > 0 such that Eθ0 M(X) < ∞ and

|∂³ log fθ(x)/∂θ³| ≤ M(x) for all θ0 − c < θ < θ0 + c.
Then,

√n(θ̂ − θ0) D→ N(0, I⁻¹(θ0)),

where I(θ) is the Fisher information, given by

I(θ) = −Eθ[∂² log fθ(X)/∂θ²]
= −∫R [∂² log fθ(x)/∂θ²] fθ(x)dx
= ∫R [∂ log fθ(x)/∂θ]² fθ(x)dx.
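As an aside, the last two expressions for I(θ) can be compared numerically. A minimal sketch, using the Exp(θ) density fθ(x) = θe^{−θx} (treated later in these notes), for which both forms equal 1/θ²:

```python
import math

theta = 2.0
lo, hi, n = 0.0, 40.0, 200_000
h = (hi - lo) / n

i_score_sq = 0.0   # integral of (d log f/d theta)^2 * f
i_neg_hess = 0.0   # integral of -(d^2 log f/d theta^2) * f
for k in range(n):
    x = lo + (k + 0.5) * h
    f = theta * math.exp(-theta * x)      # f_theta(x) for Exp(theta)
    d1 = 1.0 / theta - x                  # d log f_theta(x)/d theta
    d2 = -1.0 / theta ** 2                # d^2 log f_theta(x)/d theta^2
    i_score_sq += d1 ** 2 * f * h
    i_neg_hess += -d2 * f * h

# Both forms should agree with I(theta) = 1/theta^2 = 0.25.
assert abs(i_score_sq - 0.25) < 1e-3
assert abs(i_neg_hess - 0.25) < 1e-3
```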
Proof. We only provide the proof for the univariate case; the multivariate case can be shown similarly. The loglikelihood function is

ℓ(θ) = Σ_{i=1}^n log fθ(Xi).

Its derivative is

ℓ̇(θ) = Σ_{i=1}^n ∂ log fθ(Xi)/∂θ = Σ_{i=1}^n (1/fθ(Xi)) ∂fθ(Xi)/∂θ.
Taking the expected value, we obtain

E[ℓ̇(θ)] = nE[(1/fθ(Xi)) ∂fθ(Xi)/∂θ] = n ∫R (1/fθ(x)) (∂fθ(x)/∂θ) fθ0(x)dx.

If θ = θ0, then

E[ℓ̇(θ0)] = n ∫R ∂fθ(x)/∂θ|θ=θ0 dx = n (∂/∂θ)[∫R fθ(x)dx]|θ=θ0 = 0.
Since ℓ̇(θ̂) = 0, the Taylor expansion of ℓ̇(θ̂) around θ0 gives

0 = ℓ̇(θ̂) = ℓ̇(θ0) + ℓ̈(θ0)(θ̂ − θ0) + (1/2)ℓ′′′(θ∗)(θ̂ − θ0)²,

where θ∗ is between θ̂ and θ0. The last term can be ignored because θ̂ − θ0 a.s.→ 0. We have

θ̂ − θ0 ≈ −ℓ̇(θ0)/ℓ̈(θ0)
⇒ √n(θ̂ − θ0) ≈ (−n/ℓ̈(θ0)) [ℓ̇(θ0)/√n].
Observing the forms of ℓ̇(θ0) and ℓ̈(θ0), we find that both are sums of iid random variables. In particular, we have

ℓ̇(θ0) = Σ_{i=1}^n ∂ log fθ(Xi)/∂θ|θ=θ0

and

ℓ̈(θ0) = Σ_{i=1}^n ∂² log fθ(Xi)/∂θ²|θ=θ0.

We can therefore use the SLLN and the CLT for iid random variables. Applying the CLT to ℓ̇(θ0) gives
√n [ℓ̇(θ0)/n − Eθ0(∂ log fθ(X)/∂θ|θ=θ0)] D→ N[0, Vθ0(∂ log fθ(X)/∂θ|θ=θ0)] = N[0, I(θ0)].

The Fisher information is given by

I(θ0) = Vθ0(∂ log fθ(X)/∂θ|θ=θ0)
= Eθ0[(∂ log fθ(X)/∂θ|θ=θ0)²]
= −Eθ0[∂² log fθ(X)/∂θ²|θ=θ0].
To show this, we use

∫R (∂ log fθ(x)/∂θ) fθ(x)dx = 0.

Differentiating in θ, we obtain

(∂/∂θ) ∫R (∂ log fθ(x)/∂θ) fθ(x)dx = 0.

Then,

∫R [∂² log fθ(x)/∂θ²] fθ(x)dx + ∫R [∂ log fθ(x)/∂θ]² fθ(x)dx = 0.
Since

Eθ0(∂ log fθ(X)/∂θ|θ=θ0) = 0,

the above becomes

ℓ̇(θ0)/√n D→ N(0, I(θ0)).

By the SLLN, we obtain

ℓ̈(θ0)/n P→ Eθ0[∂² log fθ(X)/∂θ²|θ=θ0] = −I(θ0).

Thus

√n(θ̂ − θ0) D→ (1/I(θ0)) N[0, I(θ0)] = N[0, I⁻¹(θ0)].
This is the conclusion.
Example. Let X1, · · · , Xn be iid N(µ, σ²). Denote θ = (µ, σ²). Find the limiting distribution of θ̂.
Solution: The PDF of N(µ, σ²) is

fθ(x) = (1/(√(2π)σ)) e^{−(x−µ)²/(2σ²)}.

Its logarithm is

log fθ(x) = −(1/2) log(2π) − (1/2) log σ² − (x − µ)²/(2σ²).
The first-order partial derivatives are

∂ log fθ(x)/∂µ = (x − µ)/σ²,
∂ log fθ(x)/∂σ² = −1/(2σ²) + (x − µ)²/(2σ⁴).

The second-order partial derivatives are

∂² log fθ(x)/∂µ² = −1/σ²,
∂² log fθ(x)/∂(σ²)² = 1/(2σ⁴) − (x − µ)²/σ⁶,
∂² log fθ(x)/∂µ∂σ² = −(x − µ)/σ⁴.
Note that

E(∂² log fθ(X)/∂µ²) = −1/σ²,
E(∂² log fθ(X)/∂(σ²)²) = −1/(2σ⁴),
E(∂² log fθ(X)/∂µ∂σ²) = 0.

The Fisher information matrix is

I(θ) = ( 1/σ²   0
         0      1/(2σ⁴) ).
Using

I⁻¹(θ) = ( σ²   0
           0    2σ⁴ ),

we obtain the asymptotic distribution of the MLE as

√n [( X̄, (1/n)Σ_{i=1}^n(Xi − X̄)² )⊤ − ( µ, σ² )⊤] D→ N[( 0, 0 )⊤, ( σ²  0
                                                                     0   2σ⁴ )].
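A Monte Carlo check of this bivariate limit (an aside; µ, σ², n, and the number of replications below are arbitrary choices): the sample variances of √n(X̄ − µ) and √n(σ̂² − σ²) should be close to σ² and 2σ⁴, respectively.

```python
import random
import statistics

random.seed(1)
mu, sigma2, n, reps = 2.0, 4.0, 400, 4000
z_mu, z_s2 = [], []
for _ in range(reps):
    xs = [random.gauss(mu, sigma2 ** 0.5) for _ in range(n)]
    xbar = sum(xs) / n
    s2 = sum((x - xbar) ** 2 for x in xs) / n     # MLE of sigma^2 (divisor n)
    z_mu.append(n ** 0.5 * (xbar - mu))
    z_s2.append(n ** 0.5 * (s2 - sigma2))

# Asymptotic variances: sigma^2 = 4 and 2*sigma^4 = 32.
assert abs(statistics.pvariance(z_mu) - 4.0) / 4.0 < 0.15
assert abs(statistics.pvariance(z_s2) - 32.0) / 32.0 < 0.15
```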
Next, we use the ∆-method to find the asymptotic distribution of the MLE of η = µ/σ, called the signal-to-noise ratio. By the invariance of the MLE,

η̂ = X̄/[(1/n)Σ_{i=1}^n(Xi − X̄)²]^{1/2}.

Let g(z1, z2) = z1/√z2. Then g(µ̂, σ̂²) = η̂ and g(µ, σ²) = η. The partial derivatives are

∂g(z1, z2)/∂z1 = 1/√z2,
∂g(z1, z2)/∂z2 = −z1/(2z2^{3/2}).
Thus,

ġ(µ, σ²) = ( 1/σ, −µ/(2σ³) )⊤.

We obtain

ġ⊤(µ, σ²) I⁻¹(θ) ġ(µ, σ²) = (1/σ)²σ² + [−µ/(2σ³)]²(2σ⁴) = 1 + µ²/(2σ²).

Thus,

√n(η̂ − η) D→ N(0, 1 + µ²/(2σ²)).
Then, the approximate 95% confidence interval for µ/σ is

µ̂/σ̂ ± 1.96 √[(1/n)(1 + µ̂²/(2σ̂²))]
= X̄/√[Σ_{i=1}^n(Xi − X̄)²/n] ± 1.96 √[(1/n)(1 + X̄²/(2Σ_{i=1}^n(Xi − X̄)²/n))].

In addition, we can compute the asymptotic distributions of √n(µ̂² − µ²), √n(σ̂ − σ), and many others.
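In code, on a simulated sample (a sketch; the data are randomly generated, so the numerical interval itself is illustrative only):

```python
import math
import random

random.seed(2)
mu, sigma, n = 3.0, 2.0, 500            # true eta = mu/sigma = 1.5
xs = [random.gauss(mu, sigma) for _ in range(n)]

xbar = sum(xs) / n
s2 = sum((x - xbar) ** 2 for x in xs) / n          # MLE of sigma^2
eta_hat = xbar / math.sqrt(s2)                     # MLE of mu/sigma

half_width = 1.96 * math.sqrt((1 + eta_hat ** 2 / 2) / n)  # delta-method half-width
lo, hi = eta_hat - half_width, eta_hat + half_width
print(f"95% CI for mu/sigma: ({lo:.3f}, {hi:.3f})")
assert lo < eta_hat < hi
```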
Example. Let X1, · · · , Xn be iid Bernoulli(θ). Find the limiting distribution of the MLE of θ.
Solution: The PMF is

fθ(x) = θ^x(1 − θ)^{1−x}.

Its logarithm is

log fθ(x) = x log θ + (1 − x) log(1 − θ).

Its partial derivative is

∂ log fθ(x)/∂θ = x/θ − (1 − x)/(1 − θ).
The second-order partial derivative is

∂² log fθ(x)/∂θ² = −x/θ² − (1 − x)/(1 − θ)².

The Fisher information is

I(θ) = −E[∂² log fθ(X)/∂θ²] = 1/θ + 1/(1 − θ) = 1/(θ(1 − θ)).

Thus, the asymptotic distribution of the MLE θ̂ = X̄ is

√n(X̄ − θ) D→ N(0, θ(1 − θ)).
Let η = log[θ/(1 − θ)], the log-odds, where θ/(1 − θ) is called the odds. We use g(z) = log[z/(1 − z)] and obtain g′(z) = 1/[z(1 − z)]. Therefore,

√n (log[X̄/(1 − X̄)] − log[θ/(1 − θ)]) D→ N(0, 1/(θ(1 − θ))) = N(0, 1/θ + 1/(1 − θ)).

This is also a famous formula.
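A simulation check of this limit (an aside; θ, n, and the replication count below are arbitrary): the sample variance of √n[logit(X̄) − logit(θ)] should be close to 1/θ + 1/(1 − θ).

```python
import math
import random
import statistics

random.seed(3)
theta, n, reps = 0.3, 1000, 3000
logit = lambda p: math.log(p / (1 - p))

zs = []
for _ in range(reps):
    x = sum(random.random() < theta for _ in range(n))   # Binomial(n, theta) draw
    zs.append(n ** 0.5 * (logit(x / n) - logit(theta)))

target = 1 / (theta * (1 - theta))   # 1/0.3 + 1/0.7
assert abs(statistics.pvariance(zs) - target) / target < 0.15
```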
Example. Let X1, · · · , Xn be iid Poisson(θ). Find the limiting distribution of the MLE of θ.
Solution: The logarithm of the PMF is

log fθ(x) = −log x! + x log θ − θ.

Its partial derivative is

∂ log fθ(x)/∂θ = x/θ − 1.
Its second-order partial derivative is

∂² log fθ(x)/∂θ² = −x/θ².

The Fisher information is

I(θ) = −E[∂² log fθ(X)/∂θ²] = 1/θ.

Thus, the asymptotic distribution of the MLE is

√n(X̄ − θ) D→ N(0, θ).
Example. Let X1, · · · , Xn be iid Exp(θ) with rate θ, i.e., fθ(x) = θe^{−θx}. Find the limiting distribution of the MLE of θ.
Solution: The logarithm of the PDF is

log fθ(x) = log θ − θx.

Its first-order partial derivative is

∂ log fθ(x)/∂θ = 1/θ − x.
Its second-order partial derivative is

∂² log fθ(x)/∂θ² = −1/θ².

Thus, the Fisher information is

I(θ) = 1/θ².

The MLE is θ̂ = 1/X̄, and its asymptotic distribution is

√n(X̄⁻¹ − θ) D→ N(0, θ²).
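A simulation check (an aside; θ, n, and the replication count below are arbitrary): the sample variance of √n(1/X̄ − θ) should be close to θ².

```python
import random
import statistics

random.seed(4)
theta, n, reps = 2.0, 1000, 4000
zs = []
for _ in range(reps):
    xs = [random.expovariate(theta) for _ in range(n)]
    mle = n / sum(xs)                       # MLE: 1 / X-bar
    zs.append(n ** 0.5 * (mle - theta))

# Asymptotic variance theta^2 = 4.
assert abs(statistics.pvariance(zs) - theta ** 2) / theta ** 2 < 0.15
```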
Example. Let X1, · · · , Xn be iid with common PDF fθ(x) = (θ + 1)x^θ for x ∈ (0, 1) and θ > −1. Find the limiting distribution of the MLE of θ.
Solution: The logarithm of the PDF is

log fθ(x) = log(1 + θ) + θ log x.

Its first-order partial derivative is

∂ log fθ(x)/∂θ = 1/(1 + θ) + log x.
Its second-order partial derivative is

∂² log fθ(x)/∂θ² = −1/(1 + θ)².

Thus, the Fisher information is

I(θ) = 1/(1 + θ)².

Solving ℓ̇(θ) = 0 gives the MLE θ̂ = −1 − n/Σ_{i=1}^n log Xi, and its asymptotic distribution is

√n(θ̂ − θ) D→ N(0, (1 + θ)²).
Example. Let X1, · · · , Xn be iid Uniform(0, θ). The MLE of θ is θ̂ = X(n). The CDF of θ̂ is

Fn(x) = (x/θ)ⁿ for 0 ≤ x ≤ θ,

and the PDF of θ̂ is

fn(x) = nx^{n−1}/θⁿ for 0 ≤ x ≤ θ,

which is not normal, even asymptotically. This is an irregular case: the support of fθ depends on θ, so the regularity conditions fail.
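As an aside not on the slides, the limit here is exponential rather than normal: from Fn, P(n(θ − X(n)) > t) = (1 − t/(nθ))ⁿ → e^{−t/θ}, so n(θ − X(n)) converges to an exponential distribution with mean θ. A quick simulation (θ, n, and the replication count are arbitrary):

```python
import random
import statistics

random.seed(5)
theta, n, reps = 3.0, 1000, 4000
# n * (theta - max of n Uniform(0, theta) draws), repeated reps times
zs = [n * (theta - max(random.uniform(0, theta) for _ in range(n)))
      for _ in range(reps)]

# Exponential limit with mean theta: mean and sd both close to 3.
assert abs(statistics.mean(zs) - theta) / theta < 0.1
assert abs(statistics.pstdev(zs) - theta) / theta < 0.1
```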
6.2 Rao-Cramér Lower Bound and Efficiency
Theorem. Let X1, · · · , Xn be iid random variables. Assume that all of the previous conditions hold. Let Y = Y(X1, · · · , Xn) be an unbiased estimator of k(θ), where k is a smooth function. Then,

Vθ(Y) ≥ [k′(θ)]²/(nI(θ)).

Proof. We have

k(θ) = Eθ(Y) = ∫_{Rⁿ} y(x1, · · · , xn) [Π_{i=1}^n fθ(xi)] dx1 · · · dxn.
Taking the derivative with respect to θ, we have

k′(θ) = ∫_{Rⁿ} y(x1, · · · , xn) (∂/∂θ) exp{Σ_{i=1}^n log fθ(xi)} dx1 · · · dxn
= Cov[Y(X1, · · · , Xn), Σ_{i=1}^n (1/fθ(Xi)) ∂fθ(Xi)/∂θ]
≤ V^{1/2}[Y(X1, · · · , Xn)] V^{1/2}[Σ_{i=1}^n (1/fθ(Xi)) ∂fθ(Xi)/∂θ]
= V^{1/2}(Y)[nI(θ)]^{1/2},

where the second equality uses the fact that the score has mean zero, and the inequality is the Cauchy–Schwarz inequality. Squaring both sides, we draw the conclusion.
Let θ̂ be the MLE of θ. Then, k(θ̂) is the MLE of k(θ) (invariance of the MLE). By the theorem, we have

√n(θ̂ − θ0) D→ N[0, 1/I(θ0)].

By the ∆-method, we have

√n[k(θ̂) − k(θ0)] D→ N(0, [k′(θ0)]²/I(θ0)).

Therefore,

Vθ0[k(θ̂)] ≈ [k′(θ0)]²/(nI(θ0)),

which is less than or equal to the variance of any unbiased estimator of k(θ). Thus, the MLE is asymptotically the most efficient estimator. This is also a reason why the MLE is so popular.
In summary, we have
▶ The MLE is transformation invariant.
▶ The variance of the MLE is asymptotically minimal.
6.3 Likelihood Ratio Test
Suppose we have an iid sample X1, · · · , Xn with common PDF or PMF fθ(x). Then, the likelihood function is

L(θ) = Π_{i=1}^n fθ(Xi).

Consider a test

H0 : θ ∈ Θ0 ↔ H1 : θ ∈ Θ1

with Θ0 ∩ Θ1 = ∅ and Θ0 ∪ Θ1 = Θ. Suppose the MLE of θ over θ ∈ Θ is θ̂ and the MLE of θ over θ ∈ Θ0 is θ̂0. Then, the likelihood ratio statistic is

Λ = sup_{θ∈Θ0} L(θ)/sup_{θ∈Θ} L(θ) = L(θ̂0)/L(θ̂).
▶ Note that Λ ≤ 1. We should accept H0 if Λ is close to 1. Thus, we reject H0 if Λ is small (e.g., Λ < c for a constant c). This is called the likelihood ratio test.
▶ In general, a likelihood ratio test is given by:
(a) reject H0 if Λ < c;
(b) accept H0 if Λ > c;
(c) reject H0 with probability 0 < γ < 1 if Λ = c.
Theorem: Let θ ∈ R be a real parameter. Consider the two-sided test

H0 : θ = θ0 ↔ H1 : θ ≠ θ0.

If θ0 is the true value, then

−2 log Λ D→ χ²₁.
Proof. Since ℓ̇(θ̂) = 0, the Taylor expansion of ℓ(θ0) at θ̂ gives

ℓ(θ0) − ℓ(θ̂) ≈ ℓ̇(θ̂)(θ0 − θ̂) + (1/2)ℓ̈(θ̂)(θ0 − θ̂)²
= (1/2)ℓ̈(θ̂)(θ0 − θ̂)² ≈ (1/2)ℓ̈(θ0)(θ0 − θ̂)²
⇒ 2[ℓ(θ0) − ℓ(θ̂)] ≈ (ℓ̈(θ0)/n)[√n(θ̂ − θ0)]².
By

ℓ̈(θ0)/n P→ −I(θ0),

we obtain

−2 log Λ = 2[ℓ(θ̂) − ℓ(θ0)] ≈ [I^{1/2}(θ0)√n(θ̂ − θ0)]².

Since I^{1/2}(θ0)√n(θ̂ − θ0) D→ N(0, 1), we draw the conclusion by the Continuous Mapping Theorem.
Example: Let X1, · · · , Xn ∼iid Exp(θ), where the PDF is fθ(x) = θe^{−θx}. Derive the likelihood ratio test for H0 : θ = 1 versus H1 : θ ≠ 1.
Solution: The likelihood function is

L(θ) = Π_{i=1}^n θe^{−θXi} = θⁿe^{−θΣ_{i=1}^n Xi} = θⁿe^{−nθX̄}.

The MLE of θ is θ̂ = 1/X̄. Taking θ0 = 1, we have

Λ = L(1)/L(θ̂) = e^{−nX̄}/(θ̂ⁿe^{−nθ̂X̄}) = X̄ⁿe^{n(1−X̄)}.
We have

−2 log Λ = 2n[X̄ − 1 − log(X̄)] D→ χ²₁.

With significance level α = 0.05, we reject H0 if −2 log Λ > χ²₀.₀₅,₁ = 3.84.
We can also look at this statistic via the Taylor expansion of g(x) = x − 1 − log(x) at 1. We have g(1) = 0, g′(1) = 0 and g′′(1) = 1. Thus,

−2 log Λ ≈ 2n(g′′(1)/2)(X̄ − 1)² = n(X̄ − 1)².

Since √n(X̄ − 1) D→ N(0, 1) under H0, we also conclude −2 log Λ D→ χ²₁.
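A sketch of the test in code (the data are simulated under an alternative θ = 1.6, an arbitrary choice, so the test should reject):

```python
import math
import random

random.seed(6)
n, true_theta = 200, 1.6                        # data generated under H1
xs = [random.expovariate(true_theta) for _ in range(n)]
xbar = sum(xs) / n

stat = 2 * n * (xbar - 1 - math.log(xbar))      # -2 log Lambda for H0: theta = 1
reject = stat > 3.84                            # chi^2_{0.05,1} critical value
print(f"-2 log Lambda = {stat:.2f}, reject H0: {reject}")
assert reject
```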
Example: Derive the likelihood ratio test for H0 : p = p0 versus H1 : p ≠ p0 if X ∼ Bin(n, p).
Solution: The likelihood function is

L(p) = C(n, X) p^X(1 − p)^{n−X}.

With the MLE p̂ = X/n, we have

Λ = [C(n, X) p0^X(1 − p0)^{n−X}] / [C(n, X) p̂^X(1 − p̂)^{n−X}] = (p0/p̂)^X ((1 − p0)/(1 − p̂))^{n−X}.
Thus, we have

−2 log Λ = 2X log(p̂/p0) + 2(n − X) log((1 − p̂)/(1 − p0))
= [2X log X + 2(n − X) log(n − X)] − {2X log(np0) + 2(n − X) log[n(1 − p0)]}.

This is identical to the well-known deviance goodness-of-fit statistic.
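The equality of the two forms can be checked numerically (the values of n, X, and p0 below are arbitrary):

```python
import math

n, X, p0 = 50, 18, 0.25
p_hat = X / n

# Form 1: -2 log Lambda written with p-hat and p0.
form1 = 2 * X * math.log(p_hat / p0) + 2 * (n - X) * math.log((1 - p_hat) / (1 - p0))
# Form 2: the expanded "deviance" form.
form2 = (2 * X * math.log(X) + 2 * (n - X) * math.log(n - X)
         - 2 * X * math.log(n * p0) - 2 * (n - X) * math.log(n * (1 - p0)))

assert abs(form1 - form2) < 1e-9
```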
6.5 Multiparameter Cases
Suppose we consider the test

H0 : θ ∈ Θ0 ↔ H1 : θ ∉ Θ0,

where Θ ⊆ Rᵖ and Θ0 is a (p − q)-dimensional subset of Θ. Then, the likelihood ratio statistic is

Λ = sup_{θ∈Θ0} L(θ)/sup_{θ∈Θ} L(θ) = L(θ̂0)/L(θ̂),

where θ̂0 is the MLE under H0 and θ̂ is the MLE over the whole parameter space Θ. We reject H0 if Λ is small.
Theorem 6.5.1. Under some regularity conditions, we have

−2 log Λ D→ χ²_q.

Proof. See Theorem 22 (pages 144–146) in Ferguson, T. S. (1996). A Course in Large Sample Theory. CRC Press. ISBN 0-412-04371-8.
Example: Suppose two questions, each answered yes or no, are asked of n individuals. Consider the following 2 × 2 table,

            Column
Row     Yes    No
Yes     n11    n12
No      n21    n22

where nij is the observed value of Nij.
Suppose the Nij are independent Poisson(λij). Let

λ++ = λ11 + λ12 + λ21 + λ22.

Then, conditional on

n = n11 + n12 + n21 + n22,

we have

(N11, N12, N21, N22) ∼ Multinomial(n, p),

where

p = (p11, p12, p21, p22) = (λ11/λ++, λ12/λ++, λ21/λ++, λ22/λ++).
We propose a likelihood ratio test for independence between Row and Column. To understand this concept, we look at the probability table

              Column
Row       Yes    No     Marginal
Yes       p11    p12    p1+ = p11 + p12
No        p21    p22    p2+ = p21 + p22
Marginal  p+1 = p11 + p21    p+2 = p12 + p22    1
If rows and columns are independent, then pij = pi+p+j. Thus, we test

H0 : pij = pi+p+j ⇔ λij = λi+λ+j/λ++ for all i, j,

where

λi+ = Σ_{j=1}^2 λij,  λ+j = Σ_{i=1}^2 λij,  λ++ = Σ_{i=1}^2 Σ_{j=1}^2 λij.
To derive the likelihood ratio test, we need to estimate the cell means λij (equivalently the pij) with and without H0. Let θ = (λ11, λ12, λ21, λ22)⊤. In the general case, the likelihood function is

L(θ) = Π_{i=1}^2 Π_{j=1}^2 (λij^{nij}/nij!) e^{−λij}.

Maximizing this, we obtain λ̂ij = nij. Under H0, the likelihood function is

L_{H0}(θ) = Π_{i=1}^2 Π_{j=1}^2 [(λi+λ+j/λ++)^{nij}/nij!] e^{−λi+λ+j/λ++}.
The loglikelihood function under H0 is

ℓ(θ) = −Σ_{i=1}^2 Σ_{j=1}^2 log nij! + Σ_{i=1}^2 Σ_{j=1}^2 nij(log λi+ + log λ+j − log λ++) − (1/λ++) Σ_{i=1}^2 Σ_{j=1}^2 λi+λ+j
= −Σ_{i=1}^2 Σ_{j=1}^2 log nij! + Σ_{i=1}^2 Σ_{j=1}^2 nij(log λi+ + log λ+j − log λ++) − λ++,

since Σ_i Σ_j λi+λ+j = (Σ_i λi+)(Σ_j λ+j) = λ++².
Maximizing the above (details omitted), we obtain λ̂i+ = ni+ and λ̂+j = n+j (defined similarly to λi+ and λ+j). Thus,

−2 log Λ = 2 Σ_{i=1}^2 Σ_{j=1}^2 nij log(nij/n̂ij),

where n̂ij = ni+n+j/n++. For an I × J table, this becomes

G² = −2 log Λ = 2 Σ_{i=1}^I Σ_{j=1}^J nij log(nij/n̂ij).

This is the definition of the deviance goodness-of-fit statistic, which is very famous in statistics.
Example: Oral contraceptive practice by myocardial infarction. We want to know whether oral contraceptive use and myocardial infarction are related. The data are given in the following table.

Oral Contraceptive      Myocardial Infarction
Practice        Yes    No     Total
Used            23     34     57
Never Used      35     132    167
Total           58     166    224
We can construct a probability table as

Oral Contraceptive      Myocardial Infarction
Practice        Yes    No     Marginal
Used            p11    p12    p1+
Never Used      p21    p22    p2+
Marginal        p+1    p+2    1

If oral contraceptive use and myocardial infarction are not related, then

H0 : pij = pi+p+j for all i, j = 1, 2.
Under H0, we have

n̂11 = 57 × 58/224 = 14.76,
n̂12 = 57 × 166/224 = 42.24,
n̂21 = 167 × 58/224 = 43.24,
n̂22 = 167 × 166/224 = 123.76.
▶ Thus, the value of −2 log Λ is

G² = 2 Σ_{i=1}^2 Σ_{j=1}^2 nij log(nij/n̂ij)
= 2[23 log(23/14.76) + 34 log(34/42.24) + 35 log(35/43.24) + 132 log(132/123.76)]
= 7.87.

▶ Under H0, we have θ = (p1+, p+1). Without H0, we have θ = (p11, p12, p21). Thus, DF = 3 − 2 = 1, implying that c = χ²₀.₀₅,₁ = 3.84.
▶ Because G² = 7.87 > 3.84, we reject H0.
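The same computation in code (using the observed counts from the data table; note that the (Never Used, No) cell is n22 = 132):

```python
import math

observed = [[23, 34], [35, 132]]
row = [sum(r) for r in observed]            # row totals: 57, 167
col = [sum(c) for c in zip(*observed)]      # column totals: 58, 166
total = sum(row)                            # 224

# G^2 = 2 * sum n_ij * log(n_ij / n-hat_ij) with n-hat_ij = row_i * col_j / total
g2 = 2 * sum(
    observed[i][j] * math.log(observed[i][j] * total / (row[i] * col[j]))
    for i in range(2) for j in range(2)
)
print(f"G^2 = {g2:.2f}")   # about 7.87, which exceeds 3.84
assert g2 > 3.84
```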
▶ We only provided an example for a 2 × 2 table.
▶ For an I × J table, we have DF = (I − 1)(J − 1). For example, for a 3 × 3 table, DF = 2 × 2 = 4, so c = χ²₀.₀₅,₄ = 9.49.
▶ There are many other extensions. The main issue is how to compute n̂ij.
▶ Determining the DF is a major issue in general. You will learn more about it in other courses.