
Ordered Probit

Econ 674

Purdue University

March 9, 2009

Justin L. Tobias (Purdue) Ordered Probit March 9, 2009 1 / 25

In some cases, the variable to be modeled has a natural ordinal interpretation.

Some examples include:

1 Education, measured categorically (e.g. 1 = less than HS, 2 = HS, 3 = some college, etc.).

2 Income, also measured categorically.

3 Survey responses, coded as a degree of opinion (e.g. 1 = Strongly Disagree, 2 = Disagree, 3 = Agree, 4 = Strongly Agree).

4 These are in contrast to other choices, such as type of insurance or selected mode of transportation, that are not ordered.

In this lecture we discuss ordinal choice models, focusing on the ordered probit in particular.


The Ordered Probit Model

Suppose that the variable to be modeled, y, takes on J different values, which are naturally ordered:

y_i ∈ {1, 2, . . . , J}, i = 1, 2, . . . , n.

As with the probit model, we assume that the observed y is generated by a latent variable y∗, where

y∗_i = x_i β + ε_i, ε_i | x_i ∼ iid N(0, 1).

The link between the latent and observed data is given as follows:

y_i = j if α_j < y∗_i ≤ α_{j+1}, j = 1, 2, . . . , J.
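To make the latent-variable link concrete, here is a short simulation sketch in Python; the coefficient values, cutpoints, and sample size are arbitrary illustrations, not part of the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, purely for reproducibility

n = 1000
beta = np.array([0.5, -1.0])     # hypothetical coefficients
cutpoints = [0.0, 1.0]           # alpha_2 = 0, alpha_3 = 1 (J = 3)

X = rng.normal(size=(n, 2))
y_star = X @ beta + rng.normal(size=n)   # latent index with N(0, 1) errors

# y_i = j  iff  alpha_j < y*_i <= alpha_{j+1}, with alpha_1 = -inf, alpha_4 = +inf
y = np.searchsorted(cutpoints, y_star, side="left") + 1
```

Each observation falls in exactly one of the J = 3 categories, so the empirical frequencies of y estimate the outcome probabilities.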


The Ordered Probit Model

The α_j are called cutpoints or threshold parameters. They are estimated from the data and help to match the probabilities associated with each discrete outcome.

Without any additional structure, the model is not identified. In particular, there are too many cutpoints, and some restrictions are required. The most common way to achieve identification is to set

α_1 = −∞, α_2 = 0, α_{J+1} = ∞,

and retain an intercept parameter in the model.

What happens, for example, when J = 2 and these restrictions are imposed?


The Ordered Probit Model

The likelihood for the ordered probit is simply the product of the probabilities associated with each discrete outcome:

L(β, α) = ∏_{i=1}^{n} Pr(y_i = j | x_i),

where α = [α_3 α_4 · · · α_J].

The i-th observation's contribution to the likelihood is

Pr(y_i = j | x_i) = Φ(α_{j+1} − x_i β) − Φ(α_j − x_i β), for j = y_i.


The Ordered Probit Model

Therefore,

L(β, α) = ∏_{i=1}^{n} [Φ(α_{y_i+1} − x_i β) − Φ(α_{y_i} − x_i β)]

and

log L(β, α) = ∑_{i=1}^{n} log[Φ(α_{y_i+1} − x_i β) − Φ(α_{y_i} − x_i β)].

For purposes of computing the MLE, it can be useful to define

z_ij = I(y_i = j).

Thus, we can write:

log L(β, α) = ∑_{i=1}^{n} ∑_{j=1}^{J} z_ij log[Φ(α_{j+1} − x_i β) − Φ(α_j − x_i β)].

(Some textbooks present the material this way, though we will not make use of this here.)
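The indicator form of the log-likelihood translates directly into code. A minimal sketch (the function name and data layout are my own; only numpy and the standard library are used):

```python
import math
import numpy as np

def Phi(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def ordered_probit_loglik(beta, free_cutpoints, X, y):
    """log L(beta, alpha) = sum_i log[Phi(a_{y_i+1} - x_i b) - Phi(a_{y_i} - x_i b)].

    `free_cutpoints` holds (alpha_3, ..., alpha_J); identification imposes
    alpha_1 = -inf, alpha_2 = 0, alpha_{J+1} = +inf.
    """
    cut = np.concatenate(([-np.inf, 0.0], free_cutpoints, [np.inf]))
    ll = 0.0
    for xi, yi in zip(X, y):
        xb = xi @ beta
        ll += math.log(Phi(cut[yi] - xb) - Phi(cut[yi - 1] - xb))
    return ll
```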


The Ordered Probit Model

This yields the score for the parameter vector β:

L_β(β, α) = ∑_{i=1}^{n} x_i′ [φ(α_{y_i} − x_i β) − φ(α_{y_i+1} − x_i β)] / [Φ(α_{y_i+1} − x_i β) − Φ(α_{y_i} − x_i β)] = 0,

and likewise, we obtain the FOC for α_k, k = 3, 4, . . . , J:

L_{α_k}(β, α) = ∑_{i: y_i = k} [−φ(α_k − x_i β)] / [Φ(α_{k+1} − x_i β) − Φ(α_k − x_i β)] + ∑_{i: y_i = k−1} [φ(α_k − x_i β)] / [Φ(α_k − x_i β) − Φ(α_{k−1} − x_i β)] = 0.

We do not report the Hessian here, as the expressions are rather lengthy. Nonetheless, standard MLE methods can be applied.
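One way to sanity-check the FOC for α_k is to compare the analytic expression against a finite-difference derivative of the log-likelihood. A sketch for J = 3, so the only free cutpoint is α_3 (the data and parameter values below are simulated purely for the check):

```python
import math
import numpy as np

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def loglik(beta, a3, X, y):
    # J = 3: cutpoints are (-inf, 0, a3, +inf)
    cut = [-np.inf, 0.0, a3, np.inf]
    return sum(math.log(Phi(cut[yi] - xi @ beta) - Phi(cut[yi - 1] - xi @ beta))
               for xi, yi in zip(X, y))

def score_a3(beta, a3, X, y):
    # analytic derivative of the log-likelihood with respect to alpha_3
    s = 0.0
    for xi, yi in zip(X, y):
        xb = xi @ beta
        if yi == 3:    # alpha_3 enters as the lower limit (alpha_4 = +inf)
            s += -phi(a3 - xb) / (1.0 - Phi(a3 - xb))
        elif yi == 2:  # alpha_3 enters as the upper limit (alpha_2 = 0)
            s += phi(a3 - xb) / (Phi(a3 - xb) - Phi(-xb))
    return s

# simulated data (arbitrary values, for the derivative check only)
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
beta, a3 = np.array([0.4, -0.2]), 0.7
ystar = X @ beta + rng.normal(size=50)
y = 1 + (ystar > 0).astype(int) + (ystar > a3).astype(int)
```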


Marginal Effects

To fix ideas, consider the case of an ordered probit model with J = 3, in which case we have (writing α ≡ α_3):

Pr(y_i = 1 | x_i) = Φ(−x_i β)
Pr(y_i = 2 | x_i) = Φ(α − x_i β) − Φ(−x_i β)
Pr(y_i = 3 | x_i) = 1 − Φ(α − x_i β).

From these, we obtain the category-specific marginal effects:

∂Pr(y_i = 1 | x_i)/∂x_ik = −φ(−x_i β) β_k
∂Pr(y_i = 2 | x_i)/∂x_ik = [φ(−x_i β) − φ(α − x_i β)] β_k
∂Pr(y_i = 3 | x_i)/∂x_ik = φ(α − x_i β) β_k.


Marginal Effects

What do we learn from this simple model?

1 Like the probit, the marginal effects depend on x. We can evaluate these at sample means, or take a sample average of the marginal effects.

2 Unlike the probit, the signs of the “interior” marginal effects are unknown and not completely determined by the sign of β_k.

3 We can, however, sign the effects for the lowest and highest categories based on β_k. The others cannot be known by the reader simply by looking at a table of point estimates.
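The three marginal-effect expressions for the J = 3 case can be coded directly; the function name and evaluation point below are illustrative only:

```python
import math
import numpy as np

phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def marginal_effects_j3(beta, alpha, x):
    """d Pr(y = j)/d x at covariate vector x, for J = 3 with cutpoints (0, alpha)."""
    xb = x @ beta
    me1 = -phi(-xb) * beta                      # lowest category: opposite sign of beta_k
    me2 = (phi(-xb) - phi(alpha - xb)) * beta   # interior category: sign ambiguous
    me3 = phi(alpha - xb) * beta                # highest category: same sign as beta_k
    return me1, me2, me3

# evaluate at an arbitrary point
me1, me2, me3 = marginal_effects_j3(np.array([0.5, -0.3]), 1.0, np.array([1.0, 2.0]))
```

Since the three probabilities sum to one, the marginal effects sum to zero across categories, which is a useful numerical check.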


Interpretation

Continue to consider the case with J = 3 and suppose there are no covariates; only an intercept parameter is included. In this case we have

Pr(y_i = 1) = 1 − Φ(β)
Pr(y_i = 2) = Φ(α − β) − [1 − Φ(β)] = Φ(α − β) − Φ(−β)
Pr(y_i = 3) = 1 − Φ(α − β).

What do you think will happen in terms of the MLEs? The likelihood is:

L(α, β) = ∏_{i: y_i = 1} [1 − Φ(β)] ∏_{i: y_i = 2} [Φ(α − β) − Φ(−β)] ∏_{i: y_i = 3} [1 − Φ(α − β)],

which reduces to

L(α, β) = [1 − Φ(β)]^{n_1} [Φ(α − β) − Φ(−β)]^{n_2} [1 − Φ(α − β)]^{n_3}.


Interpretation

In the last slide, we defined n_j as the number of observations for which y_i = j, and we note that n_1 + n_2 + n_3 = n. Thus we obtain the log-likelihood

log L(α, β) = n_1 log[1 − Φ(β)] + n_2 log[Φ(α − β) − Φ(−β)] + n_3 log[1 − Φ(α − β)].

The α FOC gives:

n_2 φ(α̂ − β̂) / P̂_2 − n_3 φ(α̂ − β̂) / P̂_3 = 0, i.e., n_2 / P̂_2 = n_3 / P̂_3,

with P̂_j denoting the fitted probability for category j.


Interpretation

Likewise, we get an FOC for β:

−n_1 φ(β̂) / [1 − Φ(β̂)] + n_2 [φ(β̂) − φ(α̂ − β̂)] / [Φ(α̂ − β̂) − Φ(−β̂)] + n_3 φ(α̂ − β̂) / [1 − Φ(α̂ − β̂)] = 0.

Grouping terms, and using our α FOC, the β FOC can be shown to imply:

n_2 / P̂_2 = n_1 / P̂_1.


Interpretation

Noting that P̂_1 + P̂_2 + P̂_3 = 1 and n_1 + n_2 + n_3 = n, these two FOCs can be manipulated to yield:

P̂_j = n_j / n, j = 1, 2, 3.

That is, the parameters will be selected so that the fitted probability of each category exactly matches the observed frequency of outcomes in that category. Note that this generalizes to any J and any link function! For β̂, for example, we obtain:

1 − Φ(β̂) = n_1 / n,

or

β̂ = Φ^{−1}((n − n_1) / n).
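The intercept-only result is easy to verify numerically. In the sketch below, Φ⁻¹ is computed by bisection so that only the standard library is needed; the cell counts are made up for illustration:

```python
import math

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def Phi_inv(p):
    """Invert the standard normal CDF by bisection (illustration only)."""
    lo, hi = -10.0, 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if Phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# hypothetical category counts
n1, n2, n3 = 30, 50, 20
n = n1 + n2 + n3

beta_hat = Phi_inv((n - n1) / n)               # from 1 - Phi(beta) = n1/n
alpha_hat = Phi_inv((n1 + n2) / n) + beta_hat  # from 1 - Phi(alpha - beta) = n3/n

# the fitted probabilities reproduce the observed frequencies exactly
P1 = 1.0 - Phi(beta_hat)
P2 = Phi(alpha_hat - beta_hat) - Phi(-beta_hat)
P3 = 1.0 - Phi(alpha_hat - beta_hat)
```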


[Figure: the curves 1 − Φ(β) and Φ(α − β) plotted over [−2.5, 2.5], with the regions they delimit labeled Pr(y = 1), Pr(y = 2), and Pr(y = 3).]


Ordered Probit and the EM Algorithm

Suppose that J = 3 and consider the following model:

y∗_i = x_i β + ε_i, ε_i | x_i ∼ iid N(0, 1)

and

y_i =
  1 if y∗_i ≤ 0
  2 if 0 < y∗_i ≤ α
  3 if y∗_i > α.

Let

z∗_i ≡ y∗_i / α, δ ≡ β / α, σ ≡ 1 / α.


Ordered Probit and the EM Algorithm

This reparameterization defines an equivalent model:

z∗_i = x_i δ + v_i, v_i | x_i ∼ iid N(0, σ²),

where

y_i = 1 if z∗_i ≤ 0, y_i = 2 if 0 < z∗_i ≤ 1, and y_i = 3 if z∗_i > 1.

Note that, in this representation, there are no unknown cutpoints.


Ordered Probit and the EM Algorithm

Step 1: E-Step. Note that the complete-data log-likelihood is

log L(δ, σ²; z∗) = constant − (n/2) log(σ²) − (1/(2σ²)) (z∗ − Xδ)′(z∗ − Xδ).

The E-step is completed by taking expectations over z∗ | θ = θ_t, y:

E[log L(δ, σ²; z∗)] = constant − (n/2) log(σ²) − (1/(2σ²)) E_{z∗|θ=θ_t, y}[(z∗ − Xδ)′(z∗ − Xδ)].


Ordered Probit and the EM Algorithm

Step 2: M-Step. To implement the M-step, we must evaluate this expectation and then maximize over δ and σ².

You will probably recognize the δ-part of this exercise. It follows similarly to the probit, where:

δ_{t+1} = (X′X)^{−1} X′ ẑ∗,

with

ẑ∗_i ≡ μ(δ_t, σ²_t, y_i) = E(z∗_i | δ_t, σ²_t, y_i).


To evaluate this mean, suppose W ∼ N(μ, σ²), truncated to the interval (a, b), and we seek E(W | a < W < b). That is,

E(W | a < W < b) = μ + σ [φ((a − μ)/σ) − φ((b − μ)/σ)] / [Φ((b − μ)/σ) − Φ((a − μ)/σ)].


Ordered Probit and the EM Algorithm

Applying this result, we can evaluate μ(δ_t, σ²_t, y_i):

μ(δ_t, σ²_t, y_i) =
  x_i δ_t − σ_t φ(x_i δ_t / σ_t) / Φ(−x_i δ_t / σ_t)   if y_i = 1
  x_i δ_t + σ_t [φ(x_i δ_t / σ_t) − φ((1 − x_i δ_t)/σ_t)] / [Φ((1 − x_i δ_t)/σ_t) − Φ(−x_i δ_t / σ_t)]   if y_i = 2
  x_i δ_t + σ_t φ((1 − x_i δ_t)/σ_t) / [1 − Φ((1 − x_i δ_t)/σ_t)]   if y_i = 3.

Thus, the parameters δ are easily updated.
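A quick way to check the three branches of μ(δ_t, σ²_t, y_i) is the identity ∑_j Pr(y_i = j) E(z∗_i | y_i = j) = E(z∗_i) = x_i δ_t. A sketch (the evaluation point is arbitrary):

```python
import math

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def mu(xd, sigma, y):
    """E(z* | y) for z* ~ N(xd, sigma^2) with cutpoints 0 and 1 (J = 3)."""
    a, b = -xd / sigma, (1.0 - xd) / sigma   # standardized cutpoints
    if y == 1:
        return xd - sigma * phi(a) / Phi(a)
    if y == 2:
        return xd + sigma * (phi(a) - phi(b)) / (Phi(b) - Phi(a))
    return xd + sigma * phi(b) / (1.0 - Phi(b))

xd, sigma = 0.3, 0.8                          # arbitrary illustrative values
a, b = -xd / sigma, (1.0 - xd) / sigma
P = [Phi(a), Phi(b) - Phi(a), 1.0 - Phi(b)]   # Pr(y = 1), Pr(y = 2), Pr(y = 3)
recovered_mean = sum(P[j] * mu(xd, sigma, j + 1) for j in range(3))
```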


Ordered Probit and the EM Algorithm

As for the updating of σ², note that it will be obtained as:

σ²_{t+1} = (1/n) ∑_i E_{z∗_i | δ=δ_t, σ²=σ²_t, y_i} [(z∗_i − x_i δ_{t+1})²].

Expanding this out, we obtain (dropping the subscript on the expectation for simplicity):

σ²_{t+1} = (1/n) ∑_i [ E([z∗_i]²) − 2 μ(δ_t, σ²_t, y_i) x_i δ_{t+1} + (x_i δ_{t+1})² ].

Only the first term in the summation above requires further evaluation. We first note that

E([z∗_i]²) = (x_i δ_t)² + 2 x_i δ_t E(v_i | δ_t, σ²_t, y_i) + E(v²_i | δ_t, σ²_t, y_i).

We then recognize that

E(v_i | θ_t, y_i) = μ(δ_t, σ²_t, y_i) − x_i δ_t.


Ordered Probit and the EM Algorithm

As for the E(v²_i | δ_t, σ²_t, y_i) term, a little work gives (shown here for the y_i = 2 case; the other cases are analogous):

E(v²_i | δ_t, σ²_t, y_i) = σ²_t [ 1 + ( (−x_i δ_t / σ_t) φ(x_i δ_t / σ_t) − ((1 − x_i δ_t)/σ_t) φ((1 − x_i δ_t)/σ_t) ) / ( Φ((1 − x_i δ_t)/σ_t) − Φ(−x_i δ_t / σ_t) ) ].


Ordered Probit and the EM Algorithm

So, we have everything we need to implement the EM algorithm. It would proceed as follows:

1 Pick some starting values.

2 Calculate μ(δ_t, σ²_t, y_i) using the formula provided. Use this to update δ_t to δ_{t+1}.

3 Calculate E(v²_i | δ_t, σ²_t, y_i) ∀i using the formula provided, and use it [together with δ_{t+1} and μ(δ_t, σ²_t, y_i)] to update σ²_t to σ²_{t+1}.

4 Iterate to convergence.

5 Transform back by setting

α̂ = σ̂^{−1}, β̂ = δ̂ α̂.
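Putting the pieces together, the full EM loop for the J = 3 case can be sketched as follows. This is my own illustrative implementation, not the code used for the law school example; it uses only numpy and the standard library, and a fixed iteration count stands in for a convergence check:

```python
import math
import numpy as np

Phi = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
phi = lambda x: math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def em_ordered_probit(X, y, iters=1000):
    """EM for the reparameterized J = 3 ordered probit:
    z* = X delta + v, v ~ N(0, sigma^2), cutpoints fixed at 0 and 1.
    Returns (beta_hat, alpha_hat) after transforming back."""
    n, k = X.shape
    delta, sig2 = np.zeros(k), 1.0
    XtX_inv = np.linalg.inv(X.T @ X)
    for _ in range(iters):
        s = math.sqrt(sig2)
        m = np.empty(n)     # mu(delta_t, sig2_t, y_i) = E(z*_i | y_i)
        Ez2 = np.empty(n)   # E([z*_i]^2 | y_i)
        for i in range(n):
            xd = float(X[i] @ delta)
            a, b = -xd / s, (1.0 - xd) / s    # standardized cutpoints
            if y[i] == 1:                     # v_i truncated to (-inf, -xd]
                Ev = -s * phi(a) / Phi(a)
                Ev2 = sig2 * (1.0 - a * phi(a) / Phi(a))
            elif y[i] == 2:                   # v_i truncated to (-xd, 1 - xd]
                d = Phi(b) - Phi(a)
                Ev = s * (phi(a) - phi(b)) / d
                Ev2 = sig2 * (1.0 + (a * phi(a) - b * phi(b)) / d)
            else:                             # v_i truncated to (1 - xd, inf)
                Ev = s * phi(b) / (1.0 - Phi(b))
                Ev2 = sig2 * (1.0 + b * phi(b) / (1.0 - Phi(b)))
            m[i] = xd + Ev
            Ez2[i] = xd * xd + 2.0 * xd * Ev + Ev2
        delta = XtX_inv @ (X.T @ m)           # M-step: update delta
        xd_new = X @ delta
        sig2 = float(np.mean(Ez2 - 2.0 * m * xd_new + xd_new ** 2))
    alpha_hat = 1.0 / math.sqrt(sig2)   # alpha = 1/sigma
    beta_hat = delta * alpha_hat        # beta = delta * alpha
    return beta_hat, alpha_hat
```

For an intercept-only model, the fitted category probabilities implied by (β̂, α̂) should reproduce the observed frequencies (the closed-form result from the earlier slides), which makes a convenient correctness check.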


Ordered Probit and the EM Algorithm

We use n = 139 law school applications from 1985.

The dependent variable is the rank of the law school, with y = 1 if the rank is less than or equal to 25, y = 2 if the rank is between 25 and 50, and y = 3 if the rank exceeds 50.

The independent variables include the applicants' LSAT scores, GPA, and the student/faculty ratio (the latter is rather questionable).

In the following slides, we present the EM ordered probit estimates (which matched STATA's exactly and were obtained faster!). We report some statistics evaluated at the sample mean of the x's and also setting LSAT and GPA to their maximum sample values.


Ordered Probit and the EM Algorithm

β̂ = [49.1  −.24  −2.73  −.01],  α̂ = 1.00.

                Fitted Probability          Marginal Effect
Category        x̄         x_max          LSAT / x̄   LSAT / x_max   GPA / x̄
y = 1           .04        .99              .02         .003           .25
y = 2           .20        .01              .05        −.003           .59
y = 3           .76        .00             −.08         .000          −.85
