
  • Lecture Notes of Statistics 517, Chapter 6

    Chapter 6: Maximum Likelihood Methods

    Tonglin Zhang, Department of Statistics, Purdue University

  • 6.1 Maximum Likelihood Estimation


(Definition of maximum likelihood estimator). The joint PDF or PMF of a random vector X, viewed as a function of the parameter θ, is the likelihood function. Usually, we denote the likelihood function by

    L(θ) = L(θ; X)

    and its logarithm is ℓ(θ) = log L(θ).

    The maximum likelihood estimator (MLE) of θ is the value of θ that maximizes the likelihood function, i.e.

    θ̂ = argmax_θ L(θ).
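
    In practice, the maximization is often carried out numerically on the logarithm ℓ(θ). A minimal sketch in Python (assuming NumPy and SciPy; the Exp(θ) sample with true θ = 2 is purely illustrative):

        import numpy as np
        from scipy.optimize import minimize_scalar

        rng = np.random.default_rng(0)
        x = rng.exponential(scale=1 / 2.0, size=200)  # iid Exp(theta) data with true theta = 2

        def neg_loglik(theta):
            # minus loglikelihood: -[n log(theta) - theta * sum(x)] for the Exp(theta) density
            return -(len(x) * np.log(theta) - theta * x.sum())

        res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")
        print(res.x, 1 / x.mean())  # numerical MLE versus the closed form theta_hat = 1/xbar

    The numerical and closed-form answers agree to optimizer tolerance.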


  • 6.1 Maximum Likelihood Estimation

Theorem (Jensen's inequality). Let g(x) be a smooth function. If g(x) is concave (i.e., g[(x1 + x2)/2] ≥ [g(x1) + g(x2)]/2 for any x1, x2 ∈ R), then

    g[E(X)] ≥ E[g(X)].

    If g(x) is strictly concave (i.e., g[(x1 + x2)/2] > [g(x1) + g(x2)]/2 for any x1 ≠ x2 in R), then the same inequality holds, and it is an equality iff X is degenerate (i.e. P(X = a) = 1 for some a ∈ R).

    Proposition. If g″(x) < 0 for all x, then g is strictly concave.


  • 6.1 Maximum Likelihood Estimation

Example: Let g(x) = log(x) for x > 0. Then, g′(x) = 1/x and g″(x) = −1/x² < 0. Thus, log(x) is strictly concave. Therefore,

    log[E(X)] ≥ E[log(X)].

    Let X be a random variable with PDF or PMF fθ(x) (f(x, θ) in the notation of the textbook), where θ is the parameter. In the following, we write

    Eθ[g(X)] = ∫_R g(x)fθ(x)dx

    as the expected value of g(X) under θ, where we display the continuous case.


  • 6.1 Maximum Likelihood Estimation

Theorem. Let θ0 be the true parameter, so that the true PDF or PMF of X is fθ0(x). Then,

    Eθ0[log(fθ(X)/fθ0(X))] ≤ 0,

    and the equality holds iff fθ(x) a.s.= fθ0(x).


  • 6.1 Maximum Likelihood Estimation

Proof. Note that

    ∫_R fθ(x)dx = 1

    for all θ. By Jensen's inequality applied to the strictly concave function log,

    Eθ0[log(fθ(X)/fθ0(X))] = ∫_R [log(fθ(x)/fθ0(x))] fθ0(x)dx
                           ≤ log ∫_R [fθ(x)/fθ0(x)] fθ0(x)dx
                           = log ∫_R fθ(x)dx
                           = 0,

    and the equality holds iff fθ(x) a.s.= fθ0(x).
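
    A quick Monte Carlo check of this inequality (a sketch; the two normal densities and the sample size are illustrative):

        import numpy as np
        from scipy.stats import norm

        rng = np.random.default_rng(1)
        x = rng.normal(loc=0.0, scale=1.0, size=100_000)  # draws from f_{theta0} = N(0, 1)

        # E_{theta0}[log(f_theta(X)/f_{theta0}(X))] with f_theta = N(0.5, 1)
        log_ratio = norm.logpdf(x, loc=0.5) - norm.logpdf(x, loc=0.0)
        print(log_ratio.mean())  # near -0.125 (= -mu^2/2 here), and never positive in expectation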


  • 6.1 Maximum Likelihood Estimation

Definition (Almost surely). In the following, we need the concept called almost surely. For any functions g(x) and h(x), we say g(x) = h(x) almost surely, denoted

    g(x) a.s.= h(x),

    if P({x : g(x) ≠ h(x)}) = 0.

    Note: Suppose that both g(x) and h(x) are continuous. Then, g(x) a.s.= h(x) implies g(x) = h(x) for all x ∈ R.


  • 6.1 Maximum Likelihood Estimation

Corollary. Let ℓθ = ∑_{i=1}^n log fθ(Xi) denote the loglikelihood of an iid sample X1, · · · , Xn. If fθ(x) a.s.= fθ0(x) iff θ = θ0, then

    Eθ0[ℓθ] ≤ Eθ0[ℓθ0]

    and the equality holds iff θ = θ0.

    Proof. Straightforwardly,

    Eθ0[ℓθ] − Eθ0[ℓθ0] = ∑_{i=1}^n Eθ0[log(fθ(Xi)/fθ0(Xi))] ≤ 0,

    and the equality holds iff θ = θ0.


  • 6.1 Maximum Likelihood Estimation

Theorem. (Consistency of the MLE). Let X1, · · · , Xn be iid with density fθ(x) for θ ∈ Θ, and let θ0 be the true value of θ. Assume:

    (1) Θ is compact;

    (2) fθ(x) is continuous in θ for all x;

    (3) there exists a function K(x) such that Eθ0|K(X)| < ∞ and log fθ(x) − log fθ0(x) ≤ K(x) for all x and θ;

    (4) for all θ ∈ Θ and sufficiently small ρ > 0, the expression sup_{|θ′−θ|<ρ} fθ′(x) is measurable in x.

    Then the MLE θ̂ converges to θ0 almost surely as n → ∞ (see Ferguson, 1996, Theorem 17).

  • 6.1 Maximum Likelihood Estimation

▶ The conditions given by the above theorem are weak, but we do not display the proof: proofs of consistency of an estimator are usually very hard. The conclusion is basic; once an estimator is assumed (or known) to be consistent, the derivation of its asymptotic normality is typically based on Taylor expansions, and this part we can prove.

    ▶ In practice, we often use the asymptotic results of the MLE. An important ingredient is the Fisher information (either a value or a matrix). We provide the main theorem below.


  • 6.1 Maximum Likelihood Estimation

Theorem. (Main result for the MLE). Let θ̂ be the MLE of θ derived by solving ℓ̇(θ) = 0, and let θ0 be an interior point of Θ. Suppose that all conditions of the previous theorem hold, together with the following additional conditions:

    (R3) fθ(x) is twice differentiable in θ;

    (R4) the integral

    ∫_{−∞}^{∞} fθ(x)dx

    can be differentiated twice in θ under the integral sign;

    (R5) log fθ(x) is three times differentiable in θ. In addition, there exist a function M(x) and a constant c such that Eθ[M(X)] < ∞ for all θ0 − c < θ < θ0 + c and

    |∂³ log fθ(x)/∂θ³| ≤ M(x).


  • 6.1 Maximum Likelihood Estimation

Then,

    √n(θ̂ − θ0) →D N(0, I⁻¹(θ0)),

    where I(θ) is the Fisher information, given by

    I(θ) = −Eθ[∂² log fθ(X)/∂θ²]
         = −∫_R [∂² log fθ(x)/∂θ²] fθ(x)dx
         = ∫_R [∂ log fθ(x)/∂θ]² fθ(x)dx.
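
    A simulation sketch of this theorem for Poisson(θ0), where θ̂ = X̄ and I(θ0) = 1/θ0 (sample size, replications, and seed are illustrative):

        import numpy as np

        rng = np.random.default_rng(2)
        theta0, n, reps = 3.0, 500, 5000
        xbar = rng.poisson(theta0, size=(reps, n)).mean(axis=1)  # MLE in each replication
        z = np.sqrt(n) * (xbar - theta0)   # should be approximately N(0, I^{-1}(theta0))
        print(z.std(), np.sqrt(theta0))    # empirical sd versus sqrt(theta0), here about 1.73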


  • 6.1 Maximum Likelihood Estimation

Proof. I only provide the proof for the univariate case. The multivariate case can be shown similarly. The loglikelihood function is

    ℓ(θ) = ∑_{i=1}^n log fθ(Xi).

    Its derivative is

    ℓ̇(θ) = ∑_{i=1}^n ∂ log fθ(Xi)/∂θ = ∑_{i=1}^n [1/fθ(Xi)] [∂fθ(Xi)/∂θ].


  • 6.1 Maximum Likelihood Estimation

Taking the expected value, we obtain

    E[ℓ̇(θ)] = n E{[1/fθ(Xi)] [∂fθ(Xi)/∂θ]} = n ∫_{−∞}^{∞} [1/fθ(x)] [∂fθ(x)/∂θ] fθ0(x)dx.

    If θ = θ0, then

    E[ℓ̇(θ0)] = n ∫_{−∞}^{∞} ∂fθ(x)/∂θ|_{θ=θ0} dx
              = n {∂/∂θ ∫_{−∞}^{∞} fθ(x)dx}|_{θ=θ0}
              = 0.


  • 6.1 Maximum Likelihood Estimation

The Taylor expansion of ℓ̇(θ̂) around θ0 is

    ℓ̇(θ̂) = ℓ̇(θ0) + ℓ̈(θ0)(θ̂ − θ0) + (1/2)ℓ′′′(θ∗)(θ̂ − θ0)²,

    where θ∗ is between θ̂ and θ0. The last term can be ignored because θ̂ − θ0 a.s.→ 0. Since ℓ̇(θ̂) = 0, we have

    θ̂ − θ0 ≈ −ℓ̇(θ0)/ℓ̈(θ0)
    ⇒ √n(θ̂ − θ0) = [−n/ℓ̈(θ0)][ℓ̇(θ0)/√n].


  • 6.1 Maximum Likelihood Estimation

Observing the forms of ℓ̇(θ0) and ℓ̈(θ0), we find that they are sums of iid random variables. In particular, we have

    ℓ̇(θ0) = ∑_{i=1}^n ∂ log fθ(Xi)/∂θ|_{θ=θ0}

    and

    ℓ̈(θ0) = ∑_{i=1}^n ∂² log fθ(Xi)/∂θ²|_{θ=θ0}.

    We can therefore use the SLLN and the CLT for iid random variables. Applying the CLT to ℓ̇(θ0), we obtain


  • 6.1 Maximum Likelihood Estimation

√n{ℓ̇(θ0)/n − Eθ0[∂ log fθ(X)/∂θ|_{θ=θ0}]} →D N{0, Vθ0[∂ log fθ(X)/∂θ|_{θ=θ0}]} = N[0, I(θ0)].

    The Fisher information is given by

    I(θ0) = Vθ0[∂ log fθ(X)/∂θ|_{θ=θ0}]
          = Eθ0{[∂ log fθ(X)/∂θ|_{θ=θ0}]²}
          = −Eθ0[∂² log fθ(X)/∂θ²|_{θ=θ0}].


  • 6.1 Maximum Likelihood Estimation

To show this, we use

    ∫_{−∞}^{∞} [∂ log fθ(x)/∂θ] fθ(x)dx = 0.

    Differentiating in θ under the integral sign, we obtain

    ∂/∂θ ∫_{−∞}^{∞} [∂ log fθ(x)/∂θ] fθ(x)dx = 0.

    Then,

    ∫_{−∞}^{∞} [∂² log fθ(x)/∂θ²] fθ(x)dx + ∫_{−∞}^{∞} [∂ log fθ(x)/∂θ]² fθ(x)dx = 0.
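
    This identity can be checked numerically for a concrete model. A sketch for N(µ, 1) with θ = µ (the value µ = 1.3 is illustrative):

        import numpy as np
        from scipy.integrate import quad
        from scipy.stats import norm

        mu = 1.3  # illustrative parameter value

        # For N(mu, 1): d log f/d mu = (x - mu) and d^2 log f/d mu^2 = -1
        lhs = quad(lambda x: -1.0 * norm.pdf(x, loc=mu), -np.inf, np.inf)[0]           # E[d^2 log f]
        rhs = quad(lambda x: (x - mu) ** 2 * norm.pdf(x, loc=mu), -np.inf, np.inf)[0]  # E[score^2]
        print(lhs + rhs)  # approximately 0, as the identity requires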


  • 6.1 Maximum Likelihood Estimation

Since

    Eθ0[∂ log fθ(X)/∂θ|_{θ=θ0}] = 0,

    the above becomes

    ℓ̇(θ0)/√n →D N(0, I(θ0)).

    By the SLLN, we obtain

    ℓ̈(θ0)/n →P Eθ0[∂² log fθ(X)/∂θ²|_{θ=θ0}] = −I(θ0).

    Thus

    √n(θ̂ − θ0) →D [1/I(θ0)] N[0, I(θ0)] = N[0, I⁻¹(θ0)].

This is the conclusion.

  • 6.1 Maximum Likelihood Estimation

Example. Let X1, · · · , Xn be iid N(µ, σ²). Denote θ = (µ, σ²). Find the limiting distribution of θ̂.

    Solution: The PDF of N(µ, σ²) is

    fθ(x) = [1/(√(2π) σ)] e^{−(x−µ)²/(2σ²)}.

    Its logarithm is

    log fθ(x) = −(1/2) log(2π) − (1/2) log σ² − (x − µ)²/(2σ²).


  • 6.1 Maximum Likelihood Estimation

The first-order partial derivatives are

    ∂ log fθ(x)/∂µ = (x − µ)/σ²,
    ∂ log fθ(x)/∂σ² = −1/(2σ²) + (x − µ)²/(2σ⁴).

    The second-order partial derivatives are

    ∂² log fθ(x)/∂µ² = −1/σ²,
    ∂² log fθ(x)/∂(σ²)² = 1/(2σ⁴) − (x − µ)²/σ⁶,
    ∂² log fθ(x)/∂µ∂σ² = −(x − µ)/σ⁴.


  • 6.1 Maximum Likelihood Estimation

Note that

    E[∂² log fθ(X)/∂µ²] = −1/σ²,
    E[∂² log fθ(X)/∂(σ²)²] = −1/(2σ⁴),
    E[∂² log fθ(X)/∂µ∂σ²] = 0.

    The Fisher information matrix is

    I(θ) = ( 1/σ²    0
             0       1/(2σ⁴) ).


  • 6.1 Maximum Likelihood Estimation

Using

    I⁻¹(θ) = ( σ²   0
               0    2σ⁴ ),

    we obtain the asymptotic distribution of the MLE θ̂ = (X̄, σ̂²)⊤, with σ̂² = (1/n)∑_{i=1}^n (Xi − X̄)², as

    √n[(X̄, σ̂²)⊤ − (µ, σ²)⊤] →D N[(0, 0)⊤, ( σ²   0
                                              0    2σ⁴ )].
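
    A simulation sketch of this bivariate result (parameter values and sample sizes are illustrative):

        import numpy as np

        rng = np.random.default_rng(3)
        mu, sigma2, n, reps = 1.0, 4.0, 400, 4000
        x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
        zm = np.sqrt(n) * (x.mean(axis=1) - mu)      # variance should be near sigma^2
        zs = np.sqrt(n) * (x.var(axis=1) - sigma2)   # np.var divides by n, i.e. the MLE
        print(zm.var(), sigma2)                      # near 4
        print(zs.var(), 2 * sigma2**2)               # near 32
        print(np.corrcoef(zm, zs)[0, 1])             # near 0, matching the diagonal I(theta)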


  • 6.1 Maximum Likelihood Estimation

Next, we want to use the ∆-method to find the asymptotic distribution of the MLE of η = µ/σ, called the signal-to-noise ratio. Clearly, the MLE is

    η̂ = X̄/[(1/n)∑_{i=1}^n (Xi − X̄)²]^{1/2}.

    Let g(z1, z2) = z1/√z2. Then

    g(µ̂, σ̂²) = η̂ and g(µ, σ²) = η.

    Then,

    ∂g(z1, z2)/∂z1 = 1/√z2,
    ∂g(z1, z2)/∂z2 = −z1/(2 z2^{3/2}).


  • 6.1 Maximum Likelihood Estimation

Thus,

    ġ(µ, σ²) = (1/σ, −µ/(2σ³))⊤.

    We obtain

    ġ⊤(µ, σ²) I⁻¹(θ) ġ(µ, σ²) = (1/σ, −µ/(2σ³)) ( σ²   0
                                                   0    2σ⁴ ) (1/σ, −µ/(2σ³))⊤ = 1 + µ²/(2σ²).

    Thus,

    √n(η̂ − η) →D N(0, 1 + µ²/(2σ²)).


  • 6.1 Maximum Likelihood Estimation

Then, the approximate 95% confidence interval for µ/σ is

    µ̂/σ̂ ± 1.96 √{(1/n)[1 + µ̂²/(2σ̂²)]}
    = X̄/√[∑_{i=1}^n (Xi − X̄)²/n] ± 1.96 √{(1/n)[1 + X̄²/(2∑_{i=1}^n (Xi − X̄)²/n)]}.

    In addition, we can compute the asymptotic distributions of √n(µ̂² − µ²), √n(σ̂ − σ), and many others.
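
    A sketch of this interval in code (the simulated sample is illustrative):

        import numpy as np

        rng = np.random.default_rng(4)
        x = rng.normal(2.0, 1.5, size=300)  # illustrative N(mu, sigma^2) sample

        n = len(x)
        mu_hat = x.mean()
        sigma2_hat = x.var()                         # MLE of sigma^2 (divides by n)
        eta_hat = mu_hat / np.sqrt(sigma2_hat)       # MLE of the signal-to-noise ratio
        se = np.sqrt((1 + mu_hat**2 / (2 * sigma2_hat)) / n)  # delta-method standard error
        print(eta_hat - 1.96 * se, eta_hat + 1.96 * se)       # approximate 95% CI for mu/sigma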


  • 6.1 Maximum Likelihood Estimation

Example. Let X1, · · · , Xn be iid Bernoulli(θ). Find the limiting distribution of the MLE of θ.

    Solution: The PMF is

    fθ(x) = θ^x (1 − θ)^{1−x}.

    Its logarithm is

    log fθ(x) = x log θ + (1 − x) log(1 − θ).

    Its partial derivative is

    ∂ log fθ(x)/∂θ = x/θ − (1 − x)/(1 − θ).


  • 6.1 Maximum Likelihood Estimation

The second-order partial derivative is

    ∂² log fθ(x)/∂θ² = −x/θ² − (1 − x)/(1 − θ)².

    The Fisher information is

    I(θ) = −E[∂² log fθ(X)/∂θ²] = 1/θ + 1/(1 − θ) = 1/[θ(1 − θ)].

    Thus, the asymptotic distribution of the MLE is

    √n(X̄ − θ) →D N(0, θ(1 − θ)).


  • 6.1 Maximum Likelihood Estimation

Let η = log[θ/(1 − θ)], where θ/(1 − θ) is called the odds. We use g(z) = log[z/(1 − z)] and obtain g′(z) = 1/[z(1 − z)]. Therefore, by the ∆-method,

    √n{log[X̄/(1 − X̄)] − log[θ/(1 − θ)]} →D N(0, 1/[θ(1 − θ)]) = N(0, 1/θ + 1/(1 − θ)).

    This is also a famous formula.
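
    For instance, a delta-method confidence interval for the log odds (a sketch; the counts are illustrative):

        import numpy as np

        x_sum, n = 37, 100                                   # illustrative: 37 successes in 100 trials
        theta_hat = x_sum / n
        eta_hat = np.log(theta_hat / (1 - theta_hat))        # sample log odds
        se = np.sqrt(1 / (n * theta_hat * (1 - theta_hat)))  # sqrt of (1/n)/[theta(1-theta)]
        print(eta_hat - 1.96 * se, eta_hat + 1.96 * se)      # approximate 95% CI for the log odds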


  • 6.1 Maximum Likelihood Estimation

Example. Let X1, · · · , Xn be iid Poisson(θ). Find the limiting distribution of the MLE of θ.

    Solution: The logarithm of the PMF is

    log fθ(x) = −log x! + x log θ − θ.

    Its partial derivative is

    ∂ log fθ(x)/∂θ = x/θ − 1.


  • 6.1 Maximum Likelihood Estimation

Its second-order partial derivative is

    ∂² log fθ(x)/∂θ² = −x/θ².

    The Fisher information is

    I(θ) = −E[∂² log fθ(X)/∂θ²] = 1/θ.

    Thus, the asymptotic distribution of the MLE is

    √n(X̄ − θ) →D N(0, θ).


  • 6.1 Maximum Likelihood Estimation

Example. Let X1, · · · , Xn be iid Exp(θ). Find the limiting distribution of the MLE of θ.

    Solution: The logarithm of the PDF is

    log fθ(x) = log θ − θx.

    Its first-order partial derivative is

    ∂ log fθ(x)/∂θ = 1/θ − x.


  • 6.1 Maximum Likelihood Estimation

Its second-order partial derivative is

    ∂² log fθ(x)/∂θ² = −1/θ².

    Thus, the Fisher information is

    I(θ) = 1/θ².

    The asymptotic distribution of the MLE θ̂ = 1/X̄ is

    √n(X̄⁻¹ − θ) →D N(0, θ²).


  • 6.1 Maximum Likelihood Estimation

Example. Let X1, · · · , Xn be iid with common PDF fθ(x) = (θ + 1)x^θ for x ∈ (0, 1) and θ > −1. Find the limiting distribution of the MLE of θ.

    Solution: The logarithm of the PDF is

    log fθ(x) = log(1 + θ) + θ log x.

    Its first-order partial derivative is

    ∂ log fθ(x)/∂θ = 1/(1 + θ) + log x.


  • 6.1 Maximum Likelihood Estimation

Its second-order partial derivative is

    ∂² log fθ(x)/∂θ² = −1/(1 + θ)².

    Thus, the Fisher information is

    I(θ) = 1/(1 + θ)².

    Solving ℓ̇(θ) = 0 gives the MLE θ̂ = −1 − n/∑_{i=1}^n log Xi, and its asymptotic distribution is

    √n(−1 − n/∑_{i=1}^n log Xi − θ) →D N(0, (1 + θ)²).
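
    A simulation sketch for this model, sampling via the inverse CDF X = U^{1/(θ+1)} since F(x) = x^{θ+1} on (0, 1) (parameter values are illustrative):

        import numpy as np

        rng = np.random.default_rng(5)
        theta0, n, reps = 2.0, 400, 4000
        u = rng.uniform(size=(reps, n))
        x = u ** (1 / (theta0 + 1))                 # inverse-CDF draws from (theta+1) x^theta
        theta_hat = -1 - n / np.log(x).sum(axis=1)  # the MLE derived above
        z = np.sqrt(n) * (theta_hat - theta0)
        print(z.std(), 1 + theta0)                  # empirical sd versus (1 + theta0) = 3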


  • 6.1 Maximum Likelihood Estimation

Example. Let X1, · · · , Xn be iid Uniform(0, θ). The MLE of θ is θ̂ = X_{(n)}. The CDF of θ̂ is

    Fn(x) = (x/θ)^n

    for 0 ≤ x ≤ θ. The PDF of θ̂ is

    fn(x) = n x^{n−1}/θ^n

    for 0 ≤ x ≤ θ. Its limiting distribution is not normal; this is an irregular case.


  • 6.2 Rao-Cramér Lower Bound and Efficiency


Theorem. Let X1, · · · , Xn be iid random variables. Assume that all of the previous conditions hold. Let Y = Y(X1, · · · , Xn) be an unbiased estimator of k(θ), where k is a smooth function. Then,

    Vθ(Y) ≥ [k′(θ)]²/[nI(θ)].

    Proof. We have

    k(θ) = Eθ(Y) = ∫_{R^n} y(x1, · · · , xn) [∏_{i=1}^n fθ(xi)] dx1 · · · dxn.


  • 6.2 Rao-Cramér Lower Bound and Efficiency

Taking the derivative with respect to θ, we have

    k′(θ) = ∫_{R^n} y(x1, · · · , xn) (∂/∂θ) exp{∑_{i=1}^n log fθ(xi)} dx1 · · · dxn
          = Cov[Y(X1, · · · , Xn), ∑_{i=1}^n (1/fθ(Xi)) ∂fθ(Xi)/∂θ]
          ≤ V^{1/2}[Y(X1, · · · , Xn)] V^{1/2}[∑_{i=1}^n (1/fθ(Xi)) ∂fθ(Xi)/∂θ]
          = V^{1/2}(Y)[nI(θ)]^{1/2},

    where the second equality uses E[ℓ̇(θ)] = 0 and the inequality holds by the Cauchy-Schwarz inequality. Then, we draw the conclusion.


  • 6.2 Rao-Cramér Lower Bound and Efficiency

Let θ̂ be the MLE of θ. Then, k(θ̂) is the MLE of k(θ). By the theorem above, we have

    √n(θ̂ − θ0) →D N[0, 1/I(θ0)].

    By the ∆-theorem, we have

    √n[k(θ̂) − k(θ0)] →D N(0, [k′(θ0)]²/I(θ0)).

    Therefore,

    Vθ0[k(θ̂)] ≈ [k′(θ0)]²/[nI(θ0)],

    which is less than or equal to the variance of any unbiased estimator of k(θ). Thus, the MLE is asymptotically the most efficient estimator. This is also the reason why the MLE is so popular.
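
    As a numerical illustration (a sketch): for Exp(θ) with k(θ) = 1/θ = E(X), the unbiased estimator X̄ attains the bound [k′(θ)]²/[nI(θ)] = 1/(nθ²) exactly:

        import numpy as np

        rng = np.random.default_rng(6)
        theta, n, reps = 2.0, 100, 20000
        x = rng.exponential(scale=1 / theta, size=(reps, n))
        var_xbar = x.mean(axis=1).var()              # Monte Carlo variance of X-bar
        crlb = (1 / theta**2) ** 2 / (n / theta**2)  # [k'(theta)]^2/[n I(theta)], k(theta) = 1/theta
        print(var_xbar, crlb)                        # both near 1/(n theta^2) = 0.0025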


  • 6.2 Rao-Cramér Lower Bound and Efficiency

In summary, we have:

    ▶ The MLE is transformation invariant.
    ▶ The variance of the MLE is asymptotically minimal.


  • 6.3 Likelihood Ratio Test


Suppose we have an iid sample X1, · · · , Xn with common PDF or PMF fθ(x). Then, the likelihood function is

    L(θ) = ∏_{i=1}^n fθ(Xi).

    Consider a test

    H0 : θ ∈ Θ0 ↔ H1 : θ ∈ Θ1

    with Θ0 ∩ Θ1 = ∅ and Θ0 ∪ Θ1 = Θ. Suppose the MLE of θ over Θ is θ̂ and the MLE of θ over Θ0 is θ̂0. Then, the likelihood ratio statistic is

    Λ = sup_{θ∈Θ0} L(θ)/sup_{θ∈Θ} L(θ) = L(θ̂0)/L(θ̂).


  • 6.3 Likelihood Ratio Test

▶ Note that Λ ≤ 1. We should accept H0 if Λ is close to 1. Thus, we reject H0 if Λ is small (e.g. Λ < c for a constant c). This is called the likelihood ratio test.

    ▶ In general, a likelihood ratio test is given by: (a) reject H0 if Λ < c; (b) accept H0 if Λ > c; (c) reject H0 with probability 0 < γ < 1 if Λ = c.


  • 6.3 Likelihood Ratio Test

Theorem: Let θ ∈ R be a real parameter. Consider the two-sided test

    H0 : θ = θ0 ↔ H1 : θ ≠ θ0.

    If θ0 is the true value, then

    −2 log Λ →D χ²_1.


  • 6.3 Likelihood Ratio Test

Proof. This can be shown by the Taylor expansion of ℓ(θ0) at θ̂:

    ℓ(θ0) − ℓ(θ̂) ≈ ℓ̇(θ̂)(θ0 − θ̂) + (1/2)ℓ̈(θ̂)(θ0 − θ̂)²
                 = (1/2)ℓ̈(θ̂)(θ0 − θ̂)², since ℓ̇(θ̂) = 0,
                 ≈ (1/2)ℓ̈(θ0)(θ0 − θ̂)²
    ⇒ 2[ℓ(θ0) − ℓ(θ̂)] ≈ [ℓ̈(θ0)/n][√n(θ̂ − θ0)]².


  • 6.3 Likelihood Ratio Test

By

    ℓ̈(θ0)/n →P −I(θ0),

    we obtain

    −2 log Λ ≈ [I^{1/2}(θ0) √n(θ̂ − θ0)]².

    Since I^{1/2}(θ0) √n(θ̂ − θ0) →D N(0, 1), we draw the conclusion (by the continuous mapping theorem).


  • 6.3 Likelihood Ratio Test

Example: Let X1, · · · , Xn ∼iid Exp(θ), where the PDF is fθ(x) = θe^{−θx}. Derive the likelihood ratio test for H0 : θ = 1 versus H1 : θ ≠ 1.

    Solution: The likelihood function is

    L(θ) = ∏_{i=1}^n θe^{−θXi} = θ^n e^{−θ∑_{i=1}^n Xi} = θ^n e^{−nθX̄}.

    The MLE of θ is θ̂ = 1/X̄. Taking θ0 = 1, we have

    Λ = L(1)/L(θ̂) = e^{−nX̄}/(θ̂^n e^{−nθ̂X̄}) = X̄^n e^{n(1−X̄)}.


  • 6.3 Likelihood Ratio Test

We have

    −2 log Λ = 2n[X̄ − 1 − log(X̄)] →D χ²_1.

    Taking significance level α = 0.05, we reject H0 if −2 log Λ > χ²_{0.05,1} = 3.84.

    We can also look at this problem via the Taylor expansion of g(x) = x − 1 − log(x) at 1. We have g(1) = 0, g′(1) = 0 and g″(1) = 1. Thus,

    −2 log Λ ≈ 2n[g″(1)/2](X̄ − 1)² = n(X̄ − 1)².

    Since √n(X̄ − 1) →D N(0, 1) under H0, we also conclude −2 log Λ →D χ²_1.
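
    A sketch of this test in code (scipy.stats.chi2 gives the reference distribution; the simulated data are illustrative):

        import numpy as np
        from scipy.stats import chi2

        rng = np.random.default_rng(7)
        x = rng.exponential(scale=1 / 1.3, size=80)  # illustrative sample with true theta = 1.3

        n, xbar = len(x), x.mean()
        stat = 2 * n * (xbar - 1 - np.log(xbar))  # -2 log Lambda for H0: theta = 1
        print(stat, chi2.sf(stat, df=1))          # statistic and asymptotic p-value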


  • 6.3 Likelihood Ratio Test

Example: Derive the likelihood ratio test for H0 : p = p0 versus H1 : p ≠ p0 if X ∼ Bin(n, p).

    Solution: The likelihood function is

    L(p) = (n choose X) p^X (1 − p)^{n−X}.

    With p̂ = X/n, we have

    Λ = [(n choose X) p0^X (1 − p0)^{n−X}]/[(n choose X) p̂^X (1 − p̂)^{n−X}] = (p0/p̂)^X [(1 − p0)/(1 − p̂)]^{n−X}.


  • 6.3 Likelihood Ratio Test

Thus, we have

    −2 log Λ = 2X log(p̂/p0) + 2(n − X) log[(1 − p̂)/(1 − p0)]
             = [2X log X + 2(n − X) log(n − X)] − {2X log(np0) + 2(n − X) log[n(1 − p0)]}.

    It is identical to the well-known deviance goodness-of-fit statistic.
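
    A sketch of this computation (the counts and p0 are illustrative):

        import numpy as np
        from scipy.stats import chi2

        x_obs, n, p0 = 62, 100, 0.5  # illustrative: 62 successes in 100 trials, testing p0 = 0.5
        p_hat = x_obs / n
        stat = (2 * x_obs * np.log(p_hat / p0)
                + 2 * (n - x_obs) * np.log((1 - p_hat) / (1 - p0)))  # -2 log Lambda
        print(stat, chi2.sf(stat, df=1))  # deviance statistic and asymptotic p-value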


  • 6.5 Multiparameter Cases


Suppose we consider the test

    H0 : θ ∈ Θ0 ↔ H1 : θ ∉ Θ0,

    where Θ0 is a (p − q)-dimensional subset of Θ ⊆ R^p. Then, the likelihood ratio statistic is

    Λ = sup_{θ∈Θ0} L(θ)/sup_{θ∈Θ} L(θ) = L(θ̂0)/L(θ̂),

    where θ̂0 is the MLE under H0 and θ̂ is the MLE under H0 ∪ H1. We reject H0 if Λ is small.


  • 6.5 Multiparameter Cases

Theorem 6.5.1. Under some regularity conditions, we have

    −2 log Λ →D χ²_q.

    Proof. See Theorem 22 of Chapter 22 (pages 144-146) in Ferguson, T. S. (1996). A Course in Large Sample Theory. CRC Press. ISBN 0-412-04371-8.


  • 6.5 Multiparameter Cases

Example: Suppose there are two questions, each answered yes or no by n individuals. Consider the following 2 × 2 table:

                    Column
    Row        Yes      No
    Yes        n11      n12
    No         n21      n22

    where nij is the observed value of Nij.


  • 6.5 Multiparameter Cases

Suppose the Nij are independent Poisson(λij). Let

    λ++ = λ11 + λ12 + λ21 + λ22.

    Then, conditional on

    n = n11 + n12 + n21 + n22,

    we have

    (N11, N12, N21, N22) ∼ Multinomial(n, p),

    where

    p = (p11, p12, p21, p22) = (λ11/λ++, λ12/λ++, λ21/λ++, λ22/λ++).


  • 6.5 Multiparameter Cases

We propose a likelihood ratio test for the independence between Row and Column. To understand this concept, we look at the probability table:

                    Column
    Row        Yes                    No                     Marginal
    Yes        p11                    p12                    p1+ = p11 + p12
    No         p21                    p22                    p2+ = p21 + p22
    Marginal   p+1 = p11 + p21        p+2 = p12 + p22        1


  • 6.5 Multiparameter Cases

If rows and columns are independent, then pij = pi+ p+j. Thus, we test

    H0 : pij = pi+ p+j for all i, j ⇒ λij = λi+ λ+j/λ++ for all i, j,

    where

    λi+ = ∑_{j=1}^2 λij,  λ+j = ∑_{i=1}^2 λij,  λ++ = ∑_{i=1}^2 ∑_{j=1}^2 λij.


  • 6.5 Multiparameter Cases

To derive the likelihood ratio test, we need to estimate the pij with and without H0, respectively. Let θ = (λ11, λ12, λ21, λ22)⊤. In the general case, the likelihood function is

    L(θ) = ∏_{i=1}^2 ∏_{j=1}^2 (λij^{nij}/nij!) e^{−λij}.

    Maximizing the above, we obtain λ̂ij = nij. Under H0, the likelihood function is

    LH0(θ) = ∏_{i=1}^2 ∏_{j=1}^2 [(λi+ λ+j/λ++)^{nij}/nij!] e^{−λi+λ+j/λ++}.


  • 6.5 Multiparameter Cases

The loglikelihood function under H0 is

    ℓH0(θ) = −∑_{i=1}^2 ∑_{j=1}^2 log nij! + ∑_{i=1}^2 ∑_{j=1}^2 nij(log λi+ + log λ+j − log λ++) − (1/λ++) ∑_{i=1}^2 ∑_{j=1}^2 λi+ λ+j
           = −∑_{i=1}^2 ∑_{j=1}^2 log nij! + ∑_{i=1}^2 ∑_{j=1}^2 nij(log λi+ + log λ+j − log λ++) − λ++,

    since ∑_{i=1}^2 ∑_{j=1}^2 λi+ λ+j = λ++².


  • 6.5 Multiparameter Cases

Maximizing the above (the details are omitted), we obtain λ̂i+ = ni+ and λ̂+j = n+j (defined similarly to λi+ and λ+j). Thus,

    −2 log Λ = 2 ∑_{i=1}^2 ∑_{j=1}^2 nij log(nij/n̂ij),

    where n̂ij = ni+ n+j/n++ and n++ = n. For an I × J table, this becomes

    G² = −2 log Λ = 2 ∑_{i=1}^I ∑_{j=1}^J nij log(nij/n̂ij).

    This is the definition of the deviance goodness-of-fit statistic, which is very famous in statistics.
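
    A small sketch computing G² for a general I × J table of positive counts (NumPy assumed):

        import numpy as np

        def deviance_g2(table):
            # G^2 = 2 * sum_ij n_ij log(n_ij/nhat_ij), with nhat_ij = n_i+ n_+j / n
            table = np.asarray(table, dtype=float)
            expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
            return 2 * np.sum(table * np.log(table / expected))

        print(deviance_g2([[10, 20], [30, 40]]))  # compare to chi-square with (I-1)(J-1) DF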


  • 6.5 Multiparameter Cases

Example: Oral contraceptive practice by myocardial infarction. We want to know whether oral contraceptive practice and myocardial infarction are related. The data are given in the following table:

    Oral Contraceptive     Myocardial Infarction
    Practice               Yes     No      Total
    Used                   23      34      57
    Never Used             35      132     167
    Total                  58      166     224


  • 6.5 Multiparameter Cases

We can construct a probability table as

    Oral Contraceptive     Myocardial Infarction
    Practice               Yes     No      Marginal
    Used                   p11     p12     p1+
    Never Used             p21     p22     p2+
    Marginal               p+1     p+2     1

    If oral contraceptive practice and myocardial infarction are not related, then

    H0 : pij = pi+ p+j

    for all i, j = 1, 2.


  • 6.5 Multiparameter Cases

Under H0, we have

    n̂11 = 57 × 58/224 = 14.76,
    n̂12 = 57 × 166/224 = 42.24,
    n̂21 = 167 × 58/224 = 43.24,
    n̂22 = 167 × 166/224 = 123.76.


  • 6.5 Multiparameter Cases

▶ Thus, the value of −2 log Λ is

    G² = 2 ∑_{i=1}^2 ∑_{j=1}^2 nij log(nij/n̂ij)
       = 2[23 log(23/14.76) + 34 log(34/42.24) + 35 log(35/43.24) + 132 log(132/123.76)]
       = 7.87.

    ▶ Under H0, we have θ = (p1+, p+1). Without H0, we have θ = (p11, p12, p21). Thus, DF = 3 − 2 = 1, implying that c = χ²_{0.05,1} = 3.84.

    ▶ Because G² = 7.87 > 3.84, we reject H0.
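
    The value can be verified directly; note that the (2, 2) cell is n22 = 132, not the column total 166:

        import numpy as np

        table = np.array([[23.0, 34.0], [35.0, 132.0]])
        expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
        g2 = 2 * np.sum(table * np.log(table / expected))
        print(expected)  # [[14.76, 42.24], [43.24, 123.76]]
        print(g2)        # about 7.87 > 3.84, so H0 is rejected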


  • 6.5 Multiparameter Cases

▶ We only provide an example for a 2 × 2 table.
    ▶ For an I × J table, we have DF = (I − 1)(J − 1). For example, for a 3 × 3 table, we have DF = 2 × 2 = 4. Thus, c = χ²_{0.05,4} = 9.49.
    ▶ There are many other extensions. The main issue is how to compute n̂ij.
    ▶ The DF can be a subtle issue. You will learn more about it in other courses.
