
  • Lecture Notes of Statistics 517, Chapter 6

    Chapter 6: Maximum Likelihood Methods

    Tonglin Zhang, Department of Statistics, Purdue University

  • 6.1 Maximum Likelihood Estimation


(Definition of maximum likelihood estimator). The joint PDF or PMF of a random vector X, viewed as a function of the parameter θ, is the likelihood function. Usually, we denote the likelihood function by

    L(θ) = L(θ; X)

    and its logarithm is ℓ(θ) = log L(θ).

    The maximum likelihood estimator (MLE) of θ is the value of θ that maximizes the likelihood function, i.e.

    θ̂ = argmax_θ L(θ).
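
    In practice, the maximization is often carried out numerically on the logarithm ℓ(θ). A minimal sketch in Python (assuming NumPy and SciPy; the Exp(θ) sample with true θ = 2 is purely illustrative):

        import numpy as np
        from scipy.optimize import minimize_scalar

        rng = np.random.default_rng(0)
        x = rng.exponential(scale=1 / 2.0, size=200)  # iid Exp(theta) data with true theta = 2

        def neg_loglik(theta):
            # minus loglikelihood: -[n log(theta) - theta * sum(x)] for the Exp(theta) density
            return -(len(x) * np.log(theta) - theta * x.sum())

        res = minimize_scalar(neg_loglik, bounds=(1e-6, 50.0), method="bounded")
        print(res.x, 1 / x.mean())  # numerical MLE versus the closed form theta_hat = 1/xbar

    The numerical and closed-form answers agree to optimizer tolerance.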


  • 6.1 Maximum Likelihood Estimation

Theorem (Jensen's inequality). Let g(x) be a smooth function. If g(x) is concave (i.e., g[(x1 + x2)/2] ≥ [g(x1) + g(x2)]/2 for any x1, x2 ∈ R), then

    g[E(X)] ≥ E[g(X)].

    If g(x) is strictly concave (i.e., g[(x1 + x2)/2] > [g(x1) + g(x2)]/2 for any x1 ≠ x2 in R), then the same inequality holds, and it is an equality iff X is degenerate (i.e. P(X = a) = 1 for some a ∈ R).

    Proposition. If g″(x) < 0 for all x, then g is strictly concave.


  • 6.1 Maximum Likelihood Estimation

Example: Let g(x) = log(x) for x > 0. Then, g′(x) = 1/x and g″(x) = −1/x² < 0. Thus, log(x) is strictly concave. Therefore,

    log[E(X)] ≥ E[log(X)].

    Let X be a random variable with PDF or PMF fθ(x) (f(x, θ) in the notation of the textbook), where θ is the parameter. In the following, we write

    Eθ[g(X)] = ∫_R g(x)fθ(x)dx

    as the expected value of g(X) under θ, where we display the continuous case.


  • 6.1 Maximum Likelihood Estimation

Theorem. Let θ0 be the true parameter, so that the true PDF or PMF of X is fθ0(x). Then,

    Eθ0[log(fθ(X)/fθ0(X))] ≤ 0,

    and the equality holds iff fθ(x) a.s.= fθ0(x).


  • 6.1 Maximum Likelihood Estimation

Proof. Note that

    ∫_R fθ(x)dx = 1

    for all θ. By Jensen's inequality applied to the strictly concave function log,

    Eθ0[log(fθ(X)/fθ0(X))] = ∫_R [log(fθ(x)/fθ0(x))] fθ0(x)dx
                           ≤ log ∫_R [fθ(x)/fθ0(x)] fθ0(x)dx
                           = log ∫_R fθ(x)dx
                           = 0,

    and the equality holds iff fθ(x) a.s.= fθ0(x).
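
    A quick Monte Carlo check of this inequality (a sketch; the two normal densities and the sample size are illustrative):

        import numpy as np
        from scipy.stats import norm

        rng = np.random.default_rng(1)
        x = rng.normal(loc=0.0, scale=1.0, size=100_000)  # draws from f_{theta0} = N(0, 1)

        # E_{theta0}[log(f_theta(X)/f_{theta0}(X))] with f_theta = N(0.5, 1)
        log_ratio = norm.logpdf(x, loc=0.5) - norm.logpdf(x, loc=0.0)
        print(log_ratio.mean())  # near -0.125 (= -mu^2/2 here), and never positive in expectation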


  • 6.1 Maximum Likelihood Estimation

Definition (Almost surely). In the following, we need the concept called almost surely. For any functions g(x) and h(x), we say g(x) = h(x) almost surely, denoted

    g(x) a.s.= h(x),

    if P({x : g(x) ≠ h(x)}) = 0.

    Note: Suppose that both g(x) and h(x) are continuous. Then, g(x) a.s.= h(x) implies g(x) = h(x) for all x ∈ R.


  • 6.1 Maximum Likelihood Estimation

Corollary. Let ℓθ = ∑_{i=1}^n log fθ(Xi) denote the loglikelihood of an iid sample X1, · · · , Xn. If fθ(x) a.s.= fθ0(x) iff θ = θ0, then

    Eθ0[ℓθ] ≤ Eθ0[ℓθ0]

    and the equality holds iff θ = θ0.

    Proof. Straightforwardly,

    Eθ0[ℓθ] − Eθ0[ℓθ0] = ∑_{i=1}^n Eθ0[log(fθ(Xi)/fθ0(Xi))] ≤ 0,

    and the equality holds iff θ = θ0.


  • 6.1 Maximum Likelihood Estimation

Theorem. (Consistency of the MLE). Let X1, · · · , Xn be iid with density fθ(x) for θ ∈ Θ, and let θ0 be the true value of θ. Assume:

    (1) Θ is compact;

    (2) fθ(x) is continuous in θ for all x;

    (3) there exists a function K(x) such that Eθ0|K(X)| < ∞ and log fθ(x) − log fθ0(x) ≤ K(x) for all x and θ;

    (4) for all θ ∈ Θ and sufficiently small ρ > 0, the expression sup_{|θ′−θ|<ρ} fθ′(x) is measurable in x.

    Then the MLE θ̂ converges to θ0 almost surely as n → ∞ (see Ferguson, 1996, Theorem 17).

  • 6.1 Maximum Likelihood Estimation

▶ The conditions given by the above theorem are weak, but we do not display the proof: proofs of consistency of an estimator are usually very hard. The conclusion is basic; once an estimator is assumed (or known) to be consistent, the derivation of its asymptotic normality is typically based on Taylor expansions, and this part we can prove.

    ▶ In practice, we often use the asymptotic results of the MLE. An important ingredient is the Fisher information (either a value or a matrix). We provide the main theorem below.


  • 6.1 Maximum Likelihood Estimation

Theorem. (Main result for the MLE). Let θ̂ be the MLE of θ derived by solving ℓ̇(θ) = 0, and let θ0 be an interior point of Θ. Suppose that all conditions of the previous theorem hold, together with the following additional conditions:

    (R3) fθ(x) is twice differentiable in θ;

    (R4) the integral

    ∫_{−∞}^{∞} fθ(x)dx

    can be differentiated twice in θ under the integral sign;

    (R5) log fθ(x) is three times differentiable in θ. In addition, there exist a function M(x) and a constant c such that Eθ[M(X)] < ∞ for all θ0 − c < θ < θ0 + c and

    |∂³ log fθ(x)/∂θ³| ≤ M(x).


  • 6.1 Maximum Likelihood Estimation

Then,

    √n(θ̂ − θ0) →D N(0, I⁻¹(θ0)),

    where I(θ) is the Fisher information, given by

    I(θ) = −Eθ[∂² log fθ(X)/∂θ²]
         = −∫_R [∂² log fθ(x)/∂θ²] fθ(x)dx
         = ∫_R [∂ log fθ(x)/∂θ]² fθ(x)dx.
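
    A simulation sketch of this theorem for Poisson(θ0), where θ̂ = X̄ and I(θ0) = 1/θ0 (sample size, replications, and seed are illustrative):

        import numpy as np

        rng = np.random.default_rng(2)
        theta0, n, reps = 3.0, 500, 5000
        xbar = rng.poisson(theta0, size=(reps, n)).mean(axis=1)  # MLE in each replication
        z = np.sqrt(n) * (xbar - theta0)   # should be approximately N(0, I^{-1}(theta0))
        print(z.std(), np.sqrt(theta0))    # empirical sd versus sqrt(theta0), here about 1.73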


  • 6.1 Maximum Likelihood Estimation

Proof. I only provide the proof for the univariate case. The multivariate case can be shown similarly. The loglikelihood function is

    ℓ(θ) = ∑_{i=1}^n log fθ(Xi).

    Its derivative is

    ℓ̇(θ) = ∑_{i=1}^n ∂ log fθ(Xi)/∂θ = ∑_{i=1}^n [1/fθ(Xi)] [∂fθ(Xi)/∂θ].


  • 6.1 Maximum Likelihood Estimation

Taking the expected value, we obtain

    E[ℓ̇(θ)] = n E{[1/fθ(Xi)] [∂fθ(Xi)/∂θ]} = n ∫_{−∞}^{∞} [1/fθ(x)] [∂fθ(x)/∂θ] fθ0(x)dx.

    If θ = θ0, then

    E[ℓ̇(θ0)] = n ∫_{−∞}^{∞} ∂fθ(x)/∂θ|_{θ=θ0} dx
              = n {∂/∂θ ∫_{−∞}^{∞} fθ(x)dx}|_{θ=θ0}
              = 0.


  • 6.1 Maximum Likelihood Estimation

The Taylor expansion of ℓ̇(θ̂) around θ0 is

    ℓ̇(θ̂) = ℓ̇(θ0) + ℓ̈(θ0)(θ̂ − θ0) + (1/2)ℓ′′′(θ∗)(θ̂ − θ0)²,

    where θ∗ is between θ̂ and θ0. The last term can be ignored because θ̂ − θ0 a.s.→ 0. Since ℓ̇(θ̂) = 0, we have

    θ̂ − θ0 ≈ −ℓ̇(θ0)/ℓ̈(θ0)
    ⇒ √n(θ̂ − θ0) = [−n/ℓ̈(θ0)][ℓ̇(θ0)/√n].


  • 6.1 Maximum Likelihood Estimation

Observing the forms of ℓ̇(θ0) and ℓ̈(θ0), we find that they are sums of iid random variables. In particular, we have

    ℓ̇(θ0) = ∑_{i=1}^n ∂ log fθ(Xi)/∂θ|_{θ=θ0}

    and

    ℓ̈(θ0) = ∑_{i=1}^n ∂² log fθ(Xi)/∂θ²|_{θ=θ0}.

    We can therefore use the SLLN and the CLT for iid random variables. Applying the CLT to ℓ̇(θ0), we obtain


  • 6.1 Maximum Likelihood Estimation

√n{ℓ̇(θ0)/n − Eθ0[∂ log fθ(X)/∂θ|_{θ=θ0}]} →D N{0, Vθ0[∂ log fθ(X)/∂θ|_{θ=θ0}]} = N[0, I(θ0)].

    The Fisher information is given by

    I(θ0) = Vθ0[∂ log fθ(X)/∂θ|_{θ=θ0}]
          = Eθ0{[∂ log fθ(X)/∂θ|_{θ=θ0}]²}
          = −Eθ0[∂² log fθ(X)/∂θ²|_{θ=θ0}].


  • 6.1 Maximum Likelihood Estimation

To show this, we use

    ∫_{−∞}^{∞} [∂ log fθ(x)/∂θ] fθ(x)dx = 0.

    Differentiating in θ under the integral sign, we obtain

    ∂/∂θ ∫_{−∞}^{∞} [∂ log fθ(x)/∂θ] fθ(x)dx = 0.

    Then,

    ∫_{−∞}^{∞} [∂² log fθ(x)/∂θ²] fθ(x)dx + ∫_{−∞}^{∞} [∂ log fθ(x)/∂θ]² fθ(x)dx = 0.
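
    This identity can be checked numerically for a concrete model. A sketch for N(µ, 1) with θ = µ (the value µ = 1.3 is illustrative):

        import numpy as np
        from scipy.integrate import quad
        from scipy.stats import norm

        mu = 1.3  # illustrative parameter value

        # For N(mu, 1): d log f/d mu = (x - mu) and d^2 log f/d mu^2 = -1
        lhs = quad(lambda x: -1.0 * norm.pdf(x, loc=mu), -np.inf, np.inf)[0]           # E[d^2 log f]
        rhs = quad(lambda x: (x - mu) ** 2 * norm.pdf(x, loc=mu), -np.inf, np.inf)[0]  # E[score^2]
        print(lhs + rhs)  # approximately 0, as the identity requires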


  • 6.1 Maximum Likelihood Estimation

Since

    Eθ0[∂ log fθ(X)/∂θ|_{θ=θ0}] = 0,

    the above becomes

    ℓ̇(θ0)/√n →D N(0, I(θ0)).

    By the SLLN, we obtain

    ℓ̈(θ0)/n →P Eθ0[∂² log fθ(X)/∂θ²|_{θ=θ0}] = −I(θ0).

    Thus

    √n(θ̂ − θ0) →D [1/I(θ0)] N[0, I(θ0)] = N[0, I⁻¹(θ0)].

This is the conclusion.

  • 6.1 Maximum Likelihood Estimation

Example. Let X1, · · · , Xn be iid N(µ, σ²). Denote θ = (µ, σ²). Find the limiting distribution of θ̂.

    Solution: The PDF of N(µ, σ²) is

    fθ(x) = [1/(√(2π) σ)] e^{−(x−µ)²/(2σ²)}.

    Its logarithm is

    log fθ(x) = −(1/2) log(2π) − (1/2) log σ² − (x − µ)²/(2σ²).


  • 6.1 Maximum Likelihood Estimation

The first-order partial derivatives are

    ∂ log fθ(x)/∂µ = (x − µ)/σ²,
    ∂ log fθ(x)/∂σ² = −1/(2σ²) + (x − µ)²/(2σ⁴).

    The second-order partial derivatives are

    ∂² log fθ(x)/∂µ² = −1/σ²,
    ∂² log fθ(x)/∂(σ²)² = 1/(2σ⁴) − (x − µ)²/σ⁶,
    ∂² log fθ(x)/∂µ∂σ² = −(x − µ)/σ⁴.


  • 6.1 Maximum Likelihood Estimation

Note that

    E[∂² log fθ(X)/∂µ²] = −1/σ²,
    E[∂² log fθ(X)/∂(σ²)²] = −1/(2σ⁴),
    E[∂² log fθ(X)/∂µ∂σ²] = 0.

    The Fisher information matrix is

    I(θ) = ( 1/σ²    0
             0       1/(2σ⁴) ).


  • 6.1 Maximum Likelihood Estimation

Using

    I⁻¹(θ) = ( σ²   0
               0    2σ⁴ ),

    we obtain the asymptotic distribution of the MLE θ̂ = (X̄, σ̂²)⊤, with σ̂² = (1/n)∑_{i=1}^n (Xi − X̄)², as

    √n[(X̄, σ̂²)⊤ − (µ, σ²)⊤] →D N[(0, 0)⊤, ( σ²   0
                                              0    2σ⁴ )].
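
    A simulation sketch of this bivariate result (parameter values and sample sizes are illustrative):

        import numpy as np

        rng = np.random.default_rng(3)
        mu, sigma2, n, reps = 1.0, 4.0, 400, 4000
        x = rng.normal(mu, np.sqrt(sigma2), size=(reps, n))
        zm = np.sqrt(n) * (x.mean(axis=1) - mu)      # variance should be near sigma^2
        zs = np.sqrt(n) * (x.var(axis=1) - sigma2)   # np.var divides by n, i.e. the MLE
        print(zm.var(), sigma2)                      # near 4
        print(zs.var(), 2 * sigma2**2)               # near 32
        print(np.corrcoef(zm, zs)[0, 1])             # near 0, matching the diagonal I(theta)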


  • 6.1 Maximum Likelihood Estimation

Next, we want to use the ∆-method to find the asymptotic distribution of the MLE of η = µ/σ, called the signal-to-noise ratio. Clearly, the MLE is

    η̂ = X̄/[(1/n)∑_{i=1}^n (Xi − X̄)²]^{1/2}.

    Let g(z1, z2) = z1/√z2. Then

    g(µ̂, σ̂²) = η̂ and g(µ, σ²) = η.

    Then,

    ∂g(z1, z2)/∂z1 = 1/√z2,
    ∂g(z1, z2)/∂z2 = −z1/(2 z2^{3/2}).


  • 6.1 Maximum Likelihood Estimation

Thus,

    ġ(µ, σ²) = (1/σ, −µ/(2σ³))⊤.

    We obtain

    ġ⊤(µ, σ²) I⁻¹(θ) ġ(µ, σ²) = (1/σ, −µ/(2σ³)) ( σ²   0
                                                   0    2σ⁴ ) (1/σ, −µ/(2σ³))⊤ = 1 + µ²/(2σ²).

    Thus,

    √n(η̂ − η) →D N(0, 1 + µ²/(2σ²)).


  • 6.1 Maximum Likelihood Estimation

Then, the approximate 95% confidence interval for µ/σ is

    µ̂/σ̂ ± 1.96 √{(1/n)[1 + µ̂²/(2σ̂²)]}
    = X̄/√[∑_{i=1}^n (Xi − X̄)²/n] ± 1.96 √{(1/n)[1 + X̄²/(2∑_{i=1}^n (Xi − X̄)²/n)]}.

    In addition, we can compute the asymptotic distributions of √n(µ̂² − µ²), √n(σ̂ − σ), and many others.
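
    A sketch of this interval in code (the simulated sample is illustrative):

        import numpy as np

        rng = np.random.default_rng(4)
        x = rng.normal(2.0, 1.5, size=300)  # illustrative N(mu, sigma^2) sample

        n = len(x)
        mu_hat = x.mean()
        sigma2_hat = x.var()                         # MLE of sigma^2 (divides by n)
        eta_hat = mu_hat / np.sqrt(sigma2_hat)       # MLE of the signal-to-noise ratio
        se = np.sqrt((1 + mu_hat**2 / (2 * sigma2_hat)) / n)  # delta-method standard error
        print(eta_hat - 1.96 * se, eta_hat + 1.96 * se)       # approximate 95% CI for mu/sigma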


  • 6.1 Maximum Likelihood Estimation

Example. Let X1, · · · , Xn be iid Bernoulli(θ). Find the limiting distribution of the MLE of θ.

    Solution: The PMF is

    fθ(x) = θ^x (1 − θ)^{1−x}.

    Its logarithm is

    log fθ(x) = x log θ + (1 − x) log(1 − θ).

    Its partial derivative is

    ∂ log fθ(x)/∂θ = x/θ − (1 − x)/(1 − θ).


  • 6.1 Maximum Likelihood Estimation

The second-order partial derivative is

    ∂² log fθ(x)/∂θ² = −x/θ² − (1 − x)/(1 − θ)².

    The Fisher information is

    I(θ) = −E[∂² log fθ(X)/∂θ²] = 1/θ + 1/(1 − θ) = 1/[θ(1 − θ)].

    Thus, the asymptotic distribution of the MLE is

    √n(X̄ − θ) →D N(0, θ(1 − θ)).


  • 6.1 Maximum Likelihood Estimation

Let η = log[θ/(1 − θ)], where θ/(1 − θ) is called the odds. We use g(z) = log[z/(1 − z)] and obtain g′(z) = 1/[z(1 − z)]. Therefore, by the ∆-method,

    √n{log[X̄/(1 − X̄)] − log[θ/(1 − θ)]} →D N(0, 1/[θ(1 − θ)]) = N(0, 1/θ + 1/(1 − θ)).

    This is also a famous formula.
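
    For instance, a delta-method confidence interval for the log odds (a sketch; the counts are illustrative):

        import numpy as np

        x_sum, n = 37, 100                                   # illustrative: 37 successes in 100 trials
        theta_hat = x_sum / n
        eta_hat = np.log(theta_hat / (1 - theta_hat))        # sample log odds
        se = np.sqrt(1 / (n * theta_hat * (1 - theta_hat)))  # sqrt of (1/n)/[theta(1-theta)]
        print(eta_hat - 1.96 * se, eta_hat + 1.96 * se)      # approximate 95% CI for the log odds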


  • 6.1 Maximum Likelihood Estimation

Example. Let X1, · · · , Xn be iid Poisson(θ). Find the limiting distribution of the MLE of θ.

    Solution: The logarithm of the PMF is

    log fθ(x) = −log x! + x log θ − θ.

    Its partial derivative is

    ∂ log fθ(x)/∂θ = x/θ − 1.


  • 6.1 Maximum Likelihood Estimation

Its second-order partial derivative is

    ∂² log fθ(x)/∂θ² = −x/θ².

    The Fisher information is

    I(θ) = −E[∂² log fθ(X)/∂θ²] = 1/θ.

    Thus, the asymptotic distribution of the MLE is

    √n(X̄ − θ) →D N(0, θ).


  • 6.1 Maximum Likelihood Estimation

Example. Let X1, · · · , Xn be iid Exp(θ). Find the limiting distribution of the MLE of θ.

    Solution: The logarithm of the PDF is

    log fθ(x) = log θ − θx.

    Its first-order partial derivative is

    ∂ log fθ(x)/∂θ = 1/θ − x.


  • 6.1 Maximum Likelihood Estimation

Its second-order partial derivative is

    ∂² log fθ(x)/∂θ² = −1/θ².

    Thus, the Fisher information is

    I(θ) = 1/θ².

    The asymptotic distribution of the MLE θ̂ = 1/X̄ is

    √n(X̄⁻¹ − θ) →D N(0, θ²).


  • 6.1 Maximum Likelihood Estimation

Example. Let X1, · · · , Xn be iid with common PDF fθ(x) = (θ + 1)x^θ for x ∈ (0, 1) and θ > −1. Find the limiting distribution of the MLE of θ.

    Solution: The logarithm of the PDF is

    log fθ(x) = log(1 + θ) + θ log x.

    Its first-order partial derivative is

    ∂ log fθ(x)/∂θ = 1/(1 + θ) + log x.


  • 6.1 Maximum Likelihood Estimation

Its second-order partial derivative is

    ∂² log fθ(x)/∂θ² = −1/(1 + θ)².

    Thus, the Fisher information is

    I(θ) = 1/(1 + θ)².

    Solving ℓ̇(θ) = 0 gives the MLE θ̂ = −1 − n/∑_{i=1}^n log Xi, and its asymptotic distribution is

    √n(−1 − n/∑_{i=1}^n log Xi − θ) →D N(0, (1 + θ)²).
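
    A simulation sketch for this model, sampling via the inverse CDF X = U^{1/(θ+1)} since F(x) = x^{θ+1} on (0, 1) (parameter values are illustrative):

        import numpy as np

        rng = np.random.default_rng(5)
        theta0, n, reps = 2.0, 400, 4000
        u = rng.uniform(size=(reps, n))
        x = u ** (1 / (theta0 + 1))                 # inverse-CDF draws from (theta+1) x^theta
        theta_hat = -1 - n / np.log(x).sum(axis=1)  # the MLE derived above
        z = np.sqrt(n) * (theta_hat - theta0)
        print(z.std(), 1 + theta0)                  # empirical sd versus (1 + theta0) = 3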


  • 6.1 Maximum Likelihood Estimation

Example. Let X1, · · · , Xn be iid Uniform(0, θ). The MLE of θ is θ̂ = X_{(n)}. The CDF of θ̂ is

    Fn(x) = (x/θ)^n

    for 0 ≤ x ≤ θ. The PDF of θ̂ is

    fn(x) = n x^{n−1}/θ^n

    for 0 ≤ x ≤ θ. Its limiting distribution is not normal; this is an irregular case.


  • 6.2 Rao-Cramér Lower Bound and Efficiency


Theorem. Let X1, · · · , Xn be iid random variables. Assume that all of the previous conditions hold. Let Y = Y(X1, · · · , Xn) be an unbiased estimator of k(θ), where k is a smooth function. Then,

    Vθ(Y) ≥ [k′(θ)]²/[nI(θ)].

    Proof. We have

    k(θ) = Eθ(Y) = ∫_{R^n} y(x1, · · · , xn) [∏_{i=1}^n fθ(xi)] dx1 · · · dxn.


  • 6.2 Rao-Cramér Lower Bound and Efficiency

Taking the derivative with respect to θ, we have

    k′(θ) = ∫_{R^n} y(x1, · · · , xn) (∂/∂θ) exp{∑_{i=1}^n log fθ(xi)} dx1 · · · dxn
          = Cov[Y(X1, · · · , Xn), ∑_{i=1}^n (1/fθ(Xi)) ∂fθ(Xi)/∂θ]
          ≤ V^{1/2}[Y(X1, · · · , Xn)] V^{1/2}[∑_{i=1}^n (1/fθ(Xi)) ∂fθ(Xi)/∂θ]
          = V^{1/2}(Y)[nI(θ)]^{1/2},

    where the second equality uses E[ℓ̇(θ)] = 0 and the inequality holds by the Cauchy-Schwarz inequality. Then, we draw the conclusion.


  • 6.2 Rao-Cramér Lower Bound and Efficiency

Let θ̂ be the MLE of θ. Then, k(θ̂) is the MLE of k(θ). By the theorem above, we have

    √n(θ̂ − θ0) →D N[0, 1/I(θ0)].

    By the ∆-theorem, we have

    √n[k(θ̂) − k(θ0)] →D N(0, [k′(θ0)]²/I(θ0)).

    Therefore,

    Vθ0[k(θ̂)] ≈ [k′(θ0)]²/[nI(θ0)],

    which is less than or equal to the variance of any unbiased estimator of k(θ). Thus, the MLE is asymptotically the most efficient estimator. This is also the reason why the MLE is so popular.
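
    As a numerical illustration (a sketch): for Exp(θ) with k(θ) = 1/θ = E(X), the unbiased estimator X̄ attains the bound [k′(θ)]²/[nI(θ)] = 1/(nθ²) exactly:

        import numpy as np

        rng = np.random.default_rng(6)
        theta, n, reps = 2.0, 100, 20000
        x = rng.exponential(scale=1 / theta, size=(reps, n))
        var_xbar = x.mean(axis=1).var()              # Monte Carlo variance of X-bar
        crlb = (1 / theta**2) ** 2 / (n / theta**2)  # [k'(theta)]^2/[n I(theta)], k(theta) = 1/theta
        print(var_xbar, crlb)                        # both near 1/(n theta^2) = 0.0025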


  • 6.2 Rao-Cramér Lower Bound and Efficiency

In summary, we have:

    ▶ The MLE is transformation invariant.
    ▶ The variance of the MLE is asymptotically minimal.


  • 6.3 Likelihood Ratio Test


Suppose we have an iid sample X1, · · · , Xn with common PDF or PMF fθ(x). Then, the likelihood function is

    L(θ) = ∏_{i=1}^n fθ(Xi).

    Consider a test

    H0 : θ ∈ Θ0 ↔ H1 : θ ∈ Θ1

    with Θ0 ∩ Θ1 = ∅ and Θ0 ∪ Θ1 = Θ. Suppose the MLE of θ over Θ is θ̂ and the MLE of θ over Θ0 is θ̂0. Then, the likelihood ratio statistic is

    Λ = sup_{θ∈Θ0} L(θ)/sup_{θ∈Θ} L(θ) = L(θ̂0)/L(θ̂).


  • 6.3 Likelihood Ratio Test

▶ Note that Λ ≤ 1. We should accept H0 if Λ is close to 1. Thus, we reject H0 if Λ is small (e.g. Λ < c for a constant c). This is called the likelihood ratio test.

    ▶ In general, a likelihood ratio test is given by: (a) reject H0 if Λ < c; (b) accept H0 if Λ > c; (c) reject H0 with probability 0 < γ < 1 if Λ = c.


  • 6.3 Likelihood Ratio Test

Theorem: Let θ ∈ R be a real parameter. Consider the two-sided test

    H0 : θ = θ0 ↔ H1 : θ ≠ θ0.

    If θ0 is the true value, then

    −2 log Λ →D χ²_1.


  • 6.3 Likelihood Ratio Test

Proof. This can be shown by the Taylor expansion of ℓ(θ0) at θ̂:

    ℓ(θ0) − ℓ(θ̂) ≈ ℓ̇(θ̂)(θ0 − θ̂) + (1/2)ℓ̈(θ̂)(θ0 − θ̂)²
                 = (1/2)ℓ̈(θ̂)(θ0 − θ̂)², since ℓ̇(θ̂) = 0,
                 ≈ (1/2)ℓ̈(θ0)(θ0 − θ̂)²
    ⇒ 2[ℓ(θ0) − ℓ(θ̂)] ≈ [ℓ̈(θ0)/n][√n(θ̂ − θ0)]².


  • 6.3 Likelihood Ratio Test

By

    ℓ̈(θ0)/n →P −I(θ0),

    we obtain

    −2 log Λ ≈ [I^{1/2}(θ0) √n(θ̂ − θ0)]².

    Since I^{1/2}(θ0) √n(θ̂ − θ0) →D N(0, 1), we draw the conclusion (by the continuous mapping theorem).


  • 6.3 Likelihood Ratio Test

Example: Let X1, · · · , Xn ∼iid Exp(θ), where the PDF is fθ(x) = θe^{−θx}. Derive the likelihood ratio test for H0 : θ = 1 versus H1 : θ ≠ 1.

    Solution: The likelihood function is

    L(θ) = ∏_{i=1}^n θe^{−θXi} = θ^n e^{−θ∑_{i=1}^n Xi} = θ^n e^{−nθX̄}.

    The MLE of θ is θ̂ = 1/X̄. Taking θ0 = 1, we have

    Λ = L(1)/L(θ̂) = e^{−nX̄}/(θ̂^n e^{−nθ̂X̄}) = X̄^n e^{n(1−X̄)}.


  • 6.3 Likelihood Ratio Test

We have

    −2 log Λ = 2n[X̄ − 1 − log(X̄)] →D χ²_1.

    Taking significance level α = 0.05, we reject H0 if −2 log Λ > χ²_{0.05,1} = 3.84.

    We can also look at this problem via the Taylor expansion of g(x) = x − 1 − log(x) at 1. We have g(1) = 0, g′(1) = 0 and g″(1) = 1. Thus,

    −2 log Λ ≈ 2n[g″(1)/2](X̄ − 1)² = n(X̄ − 1)².

    Since √n(X̄ − 1) →D N(0, 1) under H0, we also conclude −2 log Λ →D χ²_1.
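
    A sketch of this test in code (scipy.stats.chi2 gives the reference distribution; the simulated data are illustrative):

        import numpy as np
        from scipy.stats import chi2

        rng = np.random.default_rng(7)
        x = rng.exponential(scale=1 / 1.3, size=80)  # illustrative sample with true theta = 1.3

        n, xbar = len(x), x.mean()
        stat = 2 * n * (xbar - 1 - np.log(xbar))  # -2 log Lambda for H0: theta = 1
        print(stat, chi2.sf(stat, df=1))          # statistic and asymptotic p-value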


  • 6.3 Likelihood Ratio Test

Example: Derive the likelihood ratio test for H0 : p = p0 versus H1 : p ≠ p0 if X ∼ Bin(n, p).

    Solution: The likelihood function is

    L(p) = (n choose X) p^X (1 − p)^{n−X}.

    With p̂ = X/n, we have

    Λ = [(n choose X) p0^X (1 − p0)^{n−X}]/[(n choose X) p̂^X (1 − p̂)^{n−X}] = (p0/p̂)^X [(1 − p0)/(1 − p̂)]^{n−X}.


  • 6.3 Likelihood Ratio Test

Thus, we have

    −2 log Λ = 2X log(p̂/p0) + 2(n − X) log[(1 − p̂)/(1 − p0)]
             = [2X log X + 2(n − X) log(n − X)] − {2X log(np0) + 2(n − X) log[n(1 − p0)]}.

    It is identical to the well-known deviance goodness-of-fit statistic.
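
    A sketch of this computation (the counts and p0 are illustrative):

        import numpy as np
        from scipy.stats import chi2

        x_obs, n, p0 = 62, 100, 0.5  # illustrative: 62 successes in 100 trials, testing p0 = 0.5
        p_hat = x_obs / n
        stat = (2 * x_obs * np.log(p_hat / p0)
                + 2 * (n - x_obs) * np.log((1 - p_hat) / (1 - p0)))  # -2 log Lambda
        print(stat, chi2.sf(stat, df=1))  # deviance statistic and asymptotic p-value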


  • 6.5 Multiparameter Cases


Suppose we consider the test

    H0 : θ ∈ Θ0 ↔ H1 : θ ∉ Θ0,

    where Θ0 is a (p − q)-dimensional subset of Θ ⊆ R^p. Then, the likelihood ratio statistic is

    Λ = sup_{θ∈Θ0} L(θ)/sup_{θ∈Θ} L(θ) = L(θ̂0)/L(θ̂),

    where θ̂0 is the MLE under H0 and θ̂ is the MLE under H0 ∪ H1. We reject H0 if Λ is small.


  • 6.5 Multiparameter Cases

Theorem 6.5.1. Under some regularity conditions, we have

    −2 log Λ →D χ²_q.

    Proof. See Theorem 22 of Chapter 22 (pages 144-146) in Ferguson, T. S. (1996). A Course in Large Sample Theory. CRC Press. ISBN 0-412-04371-8.


  • 6.5 Multiparameter Cases

Example: Suppose there are two questions, each answered yes or no by n individuals. Consider the following 2 × 2 table:

                    Column
    Row        Yes      No
    Yes        n11      n12
    No         n21      n22

    where nij is the observed value of Nij.


  • 6.5 Multiparameter Cases

Suppose the Nij are independent Poisson(λij). Let

    λ++ = λ11 + λ12 + λ21 + λ22.

    Then, conditional on

    n = n11 + n12 + n21 + n22,

    we have

    (N11, N12, N21, N22) ∼ Multinomial(n, p),

    where

    p = (p11, p12, p21, p22) = (λ11/λ++, λ12/λ++, λ21/λ++, λ22/λ++).


  • 6.5 Multiparameter Cases

We propose a likelihood ratio test for the independence between Row and Column. To understand this concept, we look at the probability table:

                    Column
    Row        Yes                    No                     Marginal
    Yes        p11                    p12                    p1+ = p11 + p12
    No         p21                    p22                    p2+ = p21 + p22
    Marginal   p+1 = p11 + p21        p+2 = p12 + p22        1


  • 6.5 Multiparameter Cases

If rows and columns are independent, then pij = pi+ p+j. Thus, we test

    H0 : pij = pi+ p+j for all i, j ⇒ λij = λi+ λ+j/λ++ for all i, j,

    where

    λi+ = ∑_{j=1}^2 λij,  λ+j = ∑_{i=1}^2 λij,  λ++ = ∑_{i=1}^2 ∑_{j=1}^2 λij.


  • 6.5 Multiparameter Cases

To derive the likelihood ratio test, we need to estimate the pij with and without H0, respectively. Let θ = (λ11, λ12, λ21, λ22)⊤. In the general case, the likelihood function is

    L(θ) = ∏_{i=1}^2 ∏_{j=1}^2 (λij^{nij}/nij!) e^{−λij}.

    Maximizing the above, we obtain λ̂ij = nij. Under H0, the likelihood function is

    LH0(θ) = ∏_{i=1}^2 ∏_{j=1}^2 [(λi+ λ+j/λ++)^{nij}/nij!] e^{−λi+λ+j/λ++}.


  • 6.5 Multiparameter Cases

The loglikelihood function under H0 is

    ℓH0(θ) = −∑_{i=1}^2 ∑_{j=1}^2 log nij! + ∑_{i=1}^2 ∑_{j=1}^2 nij(log λi+ + log λ+j − log λ++) − (1/λ++) ∑_{i=1}^2 ∑_{j=1}^2 λi+ λ+j
           = −∑_{i=1}^2 ∑_{j=1}^2 log nij! + ∑_{i=1}^2 ∑_{j=1}^2 nij(log λi+ + log λ+j − log λ++) − λ++,

    since ∑_{i=1}^2 ∑_{j=1}^2 λi+ λ+j = λ++².


  • 6.5 Multiparameter Cases

Maximizing the above (the details are omitted), we obtain λ̂i+ = ni+ and λ̂+j = n+j (defined similarly to λi+ and λ+j). Thus,

    −2 log Λ = 2 ∑_{i=1}^2 ∑_{j=1}^2 nij log(nij/n̂ij),

    where n̂ij = ni+ n+j/n++ and n++ = n. For an I × J table, this becomes

    G² = −2 log Λ = 2 ∑_{i=1}^I ∑_{j=1}^J nij log(nij/n̂ij).

    This is the definition of the deviance goodness-of-fit statistic, which is very famous in statistics.
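
    A small sketch computing G² for a general I × J table of positive counts (NumPy assumed):

        import numpy as np

        def deviance_g2(table):
            # G^2 = 2 * sum_ij n_ij log(n_ij/nhat_ij), with nhat_ij = n_i+ n_+j / n
            table = np.asarray(table, dtype=float)
            expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
            return 2 * np.sum(table * np.log(table / expected))

        print(deviance_g2([[10, 20], [30, 40]]))  # compare to chi-square with (I-1)(J-1) DF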


  • 6.5 Multiparameter Cases

Example: Oral contraceptive practice by myocardial infarction. We want to know whether oral contraceptive practice and myocardial infarction are related. The data are given in the following table:

    Oral Contraceptive     Myocardial Infarction
    Practice               Yes     No      Total
    Used                   23      34      57
    Never Used             35      132     167
    Total                  58      166     224


  • 6.5 Multiparameter Cases

We can construct a probability table as

    Oral Contraceptive     Myocardial Infarction
    Practice               Yes     No      Marginal
    Used                   p11     p12     p1+
    Never Used             p21     p22     p2+
    Marginal               p+1     p+2     1

    If oral contraceptive practice and myocardial infarction are not related, then

    H0 : pij = pi+ p+j

    for all i, j = 1, 2.


  • 6.5 Multiparameter Cases

Under H0, we have

    n̂11 = 57 × 58/224 = 14.76,
    n̂12 = 57 × 166/224 = 42.24,
    n̂21 = 167 × 58/224 = 43.24,
    n̂22 = 167 × 166/224 = 123.76.


  • 6.5 Multiparameter Cases

▶ Thus, the value of −2 log Λ is

    G² = 2 ∑_{i=1}^2 ∑_{j=1}^2 nij log(nij/n̂ij)
       = 2[23 log(23/14.76) + 34 log(34/42.24) + 35 log(35/43.24) + 132 log(132/123.76)]
       = 7.87.

    ▶ Under H0, we have θ = (p1+, p+1). Without H0, we have θ = (p11, p12, p21). Thus, DF = 3 − 2 = 1, implying that c = χ²_{0.05,1} = 3.84.

    ▶ Because G² = 7.87 > 3.84, we reject H0.
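
    The value can be verified directly; note that the (2, 2) cell is n22 = 132, not the column total 166:

        import numpy as np

        table = np.array([[23.0, 34.0], [35.0, 132.0]])
        expected = np.outer(table.sum(axis=1), table.sum(axis=0)) / table.sum()
        g2 = 2 * np.sum(table * np.log(table / expected))
        print(expected)  # [[14.76, 42.24], [43.24, 123.76]]
        print(g2)        # about 7.87 > 3.84, so H0 is rejected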


  • 6.5 Multiparameter Cases

▶ We only provide an example for a 2 × 2 table.
    ▶ For an I × J table, we have DF = (I − 1)(J − 1). For example, for a 3 × 3 table, we have DF = 2 × 2 = 4. Thus, c = χ²_{0.05,4} = 9.49.
    ▶ There are many other extensions. The main issue is how to compute n̂ij.
    ▶ The DF can be a subtle issue. You will learn more about it in other courses.
