multinomial logistic regression - facultywestfall.ba.ttu.edu/isqs5349/mlr_16.pdf · introduction...
TRANSCRIPT
Multinomial Logistic Regression
Andrea Arias Ll. Luis Sandoval-Mej́ıa Yangmei (Emily) Wang
Texas Tech UniversityISQS 5349: Regression Analysis
April 28, 2016
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Agenda
1 Introduction
2 Multinomial Logistic RegressionMultinomial logit modelModel assumptionsParameter estimation: MLE
3 Example in REstimated probabilities
4 Simulation in RAccounting example
5 References
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Introduction
Let’s consider a data set
A data set with n observations where the response variable can take oneof several discrete values (1,2,...,J)
Let Y be program type1:
GeneralAcademicVocation
1IDRE, 2016
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Some data analysis
Probability of choosing one program by SES in general
0.400.46
0.72
0.34
0.210.16
0.260.33
0.12
0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
low middle high
academic general voca:on
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Some data analysis (cont.)
Probability of choosing one program by SES at the 1st
quartile of writing score
0.31 0.33
0.57
0.38
0.23 0.22
0.31
0.44
0.21
0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
0.90
ses_low ses_middle ses_high
Academic General Voca?onal
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Some data analysis (cont.)
Probability of choosing one program by SES at the meanwriting score
0.440.48
0.70
0.36
0.230.180.20
0.29
0.12
0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
0.90
ses_low ses_middle ses_high
Academic General Voca?onal
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Some data analysis (cont.)
Probability of choosing one program by SES at the 3rd
quartile of writing score
0.580.63
0.81
0.31
0.200.130.11
0.17
0.06
0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
0.90
ses_low ses_middle ses_high
Academic General Voca?onal
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Some data analysis (cont.)
Probability of choosing one program by writing score at SESlow
0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
31323334353637383940414243444546474849505152535455565758596061626364656667
Wri0ngscore
Academic General Voca0onal
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Some data analysis (cont.)
Probability of choosing one program by writing score at SESmiddle
0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
31323334353637383940414243444546474849505152535455565758596061626364656667
Wri0ngscore
Academic General Voca0onal
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Some data analysis (cont.)
Probability of choosing one program by writing score at SEShigh
0.00
0.10
0.20
0.30
0.40
0.50
0.60
0.70
0.80
0.90
1.00
31323334353637383940414243444546474849505152535455565758596061626364656667
Wri0ngscore
Academic General Voca0onal
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
When do we use Multinomial Logit?
When we need to model nominal (unordered choices) out-come variables!
The individual is choosing between more than two alternatives.
Undergraduate major choiceFood choicesCar manufacturer choice
Important!! The multinomial logit model applies only whenthe predictor variables are individual specific.
Income.Age.Social economic status.... and so on.
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
The Multinomial Distribution
Consider a random variable Yi that may take one of severaldiscrete values (1,2,...,J)
Let:
Pr{Yi = j} = πij i = 1, .., n j = 1, .., J
Where πij is the probability that the i-th individual choosethe j-th category2.
And
J∑j=1
πij = 1 i = 1, .., n
2Rodriguez,2016
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Multinomial logit modelModel assumptionsParameter estimation: MLE
The Multinomial Logit Model
A model for the probabilities where the probabilities dependon a vector Xi.
Nominate one of the response categories as baseline.
Calculate the logits for all other categories.with respect to the baseline.
Let the logits be a linear function of the predictors.
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Multinomial logit modelModel assumptionsParameter estimation: MLE
The Multinomial Logit model (cont.)
log
(πijπiJ
)= X
′iβj j = 1, .., J − 1
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Multinomial logit modelModel assumptionsParameter estimation: MLE
The Multinomial Logit model (cont.)
Modeling the probabilities3:
Prob(Yi = j|Xi) = πij =exp(X
′
iβj)
1 +∑J−1
j=1 exp(X′iβj)
, j = 1, 2, ..., J − 1
Prob(Yi = J |Xi) = πiJ =1
1 +∑J−1
j=1 exp(X′iβj)
,
Note: The binomial logit model is a special case when J = 2.
3Greene, 2012
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Multinomial logit modelModel assumptionsParameter estimation: MLE
Model assumptions
Linearity
Independence of observations
Independence of irrelevant alternatives(Not as important as the first two!!)
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Multinomial logit modelModel assumptionsParameter estimation: MLE
Maximum Likelihood Estimation
Recall that the Likelihood function for a parameter vector θ resultingfrom an independent sample is4:
L(θ|y1, y2, ..., yn) = p(y1|θ)× p(y2|θ)× ...× p(yn|θ)
In our case, since for observation i the probability is given by:
Pr{Yi = j} = πij
We have that the Likelihood function is the following:
L(π|data) =∏
i∈choice1
πi1∏
i∈choice2
πi2...∏
i∈choiceJ
πiJ
We choose the β̂ that maximizes
L(π|data)
4Westfall and Henning, 2013
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Estimated probabilities
Estimated probabilities
Once we have estimated the parameters (β̂), we can estimate the prob-abilities for each particular cohort.
Pr (Y = O0|X1 = x1, X2 = x2) =1
1 + e(β̂10+β̂11x1+β̂12x2) + e(β̂20+β̂21x1+β̂22x2)
Pr (Y = O1|X1 = x1, X2 = x2) =e(β̂10+β̂11x1+β̂12x2)
1 + e(β̂10+β̂11x1+β̂12x2) + e(β̂20+β̂21x1+β̂22x2)
Pr (Y = O2|X1 = x1, X2 = x2) =e(β̂20+β̂21x1+β̂22x2)
1 + e(β̂10+β̂11x1+β̂12x2) + e(β̂20+β̂21x1+β̂22x2)
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Accounting example
SimulationAccounting example
Response variable: Tax service provider.Three nominal choices:
B - Big4N - Non Big4S - Self preparer
Predictor variable: Company’s tax preparation budget.(lbudget)
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
Accounting example
SimulationAccounting example
The Model:
log
(Pr(Y = N |lbudget)Pr(Y = B|lbudget)
)= β01 + β11lbudget
log
(Pr(Y = S|lbudget)Pr(Y = B|lbudget)
)= β02 + β12lbudget
The probabilities of choosing each of the three choices:
Pr (Y = B|lbudget) =1
1 + e(β01+β11lbudget) + e(β02+β12lbudget)
Pr (Y = N |lbudget) =e(β01+β11lbudget)
1 + e(β01+β11lbudget) + e(β02+β12lbudget)
Pr (Y = S|lbudget) =e(β02+β12lbudget)
1 + e(β01+β11lbudget) + e(β02+β12lbudget)
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
References
McKee, D. 2015. An intuitive introduction to the multinomial logit.Available at: https://www.youtube.com/watch?v=n7zHXjuE6PE&t=2732s
Institute for Digital Research and Education, UCLA. R Data analysisexamples: Multinomial Logistic Regression. Available at: http://www.
ats.ucla.edu/stat/r/dae/mlogit.htm
Greene, W, H. 2012. Econometric Analysis, 7th Edition. Prentice Hall,NJ.
Rodriguez, G. 2016. A note on interpreting multinomial logit coeffi-cients. Princeton University. Available at: http://data.princeton.
edu/wws509/stata/mlogit.html
Westfall, P., Henning, K. S. (2013). Understanding advanced statisticalmethods. CRC Press.
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression
IntroductionMultinomial Logistic Regression
Example in RSimulation in R
References
THANK YOU !!!
Questions?
Arias Ll., Sandoval-Mej́ıa, Wang Multinomial Logistic Regression