
Soc504: Generalized Linear Models

Brandon Stewart¹

Princeton

February 22 - March 15, 2017

¹ These slides are heavily influenced by Gary King, with some material from Teppei Yamamoto, Patrick Lam, and Yuri Zhukov. Some individual vignettes are built from the collective effort of generations of teaching fellows for Gov2001 at Harvard.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 1 / 242


Followup

Questions?

Replication Stories?

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 2 / 242


1 Binary Outcome Models

2 Quantities of Interest: An Example with Code; Predicted Values; First Differences; General Algorithms

3 Model Diagnostics for Binary Outcome Models

4 Ordered Categorical

5 Unordered Categorical

6 Event Count Models: Poisson; Overdispersion; Binomial for Known Trials

7 Duration Models: Exponential Model; Weibull Model; Cox Proportional Hazards Model

8 Duration-Logit Correspondence

9 Appendix: Multinomial Models

10 Appendix: More on Overdispersed Poisson

11 Appendix: More on Binomial Models

12 Appendix: Gamma Regression

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 3 / 242


Binary Outcomes

Binary outcome variable:

Yi ∈ {0, 1}

Examples in social science: numerous!

▸ Turnout (1 = vote; 0 = abstain)

▸ Education (1 = completed HS; 0 = dropped out)

▸ Conflict (1 = civil war; 0 = no civil war)

▸ Eviction (1 = evicted; 0 = not evicted)

▸ etc. etc.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 4 / 242


How Do We Model a Binary Outcome Variable?

For a continuous outcome variable, we model the conditional expectation function with predictors Xi:

E(Yi | Xi) = Xiᵀβ   (linear regression)

When Yi is binary, E(Yi | Xi) = Pr(Yi = 1 | Xi)

Thus, we model the conditional probability of Yi = 1:

Pr(Yi = 1 | Xi) = g(Xiᵀβ)

There are many possible binary outcome models, depending on the choice of g(·)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 5 / 242
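The identity E(Yi | Xi) = Pr(Yi = 1 | Xi) can be checked by simulation — a minimal sketch with a made-up probability, where the sample mean of Bernoulli draws recovers Pr(Y = 1):

```python
import numpy as np

rng = np.random.default_rng(0)

# For a 0/1 outcome, the conditional mean IS the conditional probability:
# E(Y) of a Bernoulli variable equals Pr(Y = 1).
pi = 0.3                               # hypothetical Pr(Yi = 1)
y = rng.binomial(1, pi, size=100_000)  # Bernoulli draws

print(y.mean())  # sample mean estimates Pr(Y = 1); close to 0.3
```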


Linear Probability Model (LPM)

The simplest choice: g(Xiᵀβ) = Xiᵀβ

This gives the linear probability model (LPM):

Pr(Yi = 1 | Xi) = Xiᵀβ

or equivalently

Yi ∼ Bernoulli(πi)

πi = Xiᵀβ

Advantages:

▸ Easy to estimate: regress Yi on Xi

▸ Easy to interpret: β = ATE if Xi ∈ {0, 1} and exogenous

Disadvantages:

▸ Estimated probability can go outside of [0, 1]

▸ Always heteroskedastic

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 6 / 242
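The first disadvantage is easy to see by simulation. A minimal sketch on hypothetical data (true probabilities generated from a steep logistic curve) where the OLS fit of the LPM produces fitted "probabilities" outside [0, 1]:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data: Pr(Yi = 1 | xi) follows a steep logistic curve, so the
# best-fitting straight line must leave [0, 1] for extreme xi.
n = 5000
x = rng.normal(0, 2, n)
p = 1 / (1 + np.exp(-2 * x))   # true probabilities
y = rng.binomial(1, p)

# LPM: ordinary least squares of Yi on Xi.
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]

fitted = X @ beta
print(fitted.min(), fitted.max())  # fitted values escape [0, 1] at extreme x
```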


Logit and Probit Models

Want: 0 ≤ g(Xiᵀβ) ≤ 1 for any Xi

Solution: Use a CDF

πi = g(Xiᵀβ) = F(Xiᵀβ)

Note: F is not the CDF of Yi, which is Bernoulli (using a CDF is just a convenient way to ensure 0 ≤ πi ≤ 1)

Logit: Logistic CDF (a.k.a. inverse logit function)

πi = logit⁻¹(Xiᵀβ) ≡ exp(Xiᵀβ) / (1 + exp(Xiᵀβ)) = 1 / (1 + exp(−Xiᵀβ))

Probit: Standard normal CDF

πi = Φ(Xiᵀβ)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 7 / 242


Binary Variable Regression Models

The logistic regression (or “logit”) model:

1 Stochastic component:

Yi ∼ fBern(yi | πi) = πi^yi (1 − πi)^(1−yi) =  { πi      for yi = 1
                                               { 1 − πi  for yi = 0

2 Systematic Component:

Pr(Yi = 1 | β) ≡ E(Yi) ≡ πi = 1 / (1 + e^(−xiβ))

3 Yi and Yj are independent ∀ i ≠ j, conditional on X

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 8 / 242


Binary Variable Regression Models

The probability density of all the data:

P(y | π) = ∏_{i=1}^n πi^yi (1 − πi)^(1−yi)

The log-likelihood:

ln L(π | y) = Σ_{i=1}^n { yi ln πi + (1 − yi) ln(1 − πi) }

            = Σ_{i=1}^n { −yi ln(1 + e^(−xiβ)) + (1 − yi) ln(1 − 1/(1 + e^(−xiβ))) }

            = −Σ_{i=1}^n ln(1 + e^((1−2yi) xiβ)).

What do we do with this?

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 9 / 242
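We maximize it. One standard route is Newton-Raphson on the score and Hessian of the logit log-likelihood — a sketch on simulated data (the β values are made up for illustration), which also checks numerically that the expanded and compact forms of ln L agree:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical data drawn from a logit with beta = (0.5, -1.0).
n = 5000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, -1.0])
pi_true = 1 / (1 + np.exp(-X @ beta_true))
y = rng.binomial(1, pi_true)

# Check the algebra above: expanded and compact log-likelihoods agree.
ll_long = np.sum(y * np.log(pi_true) + (1 - y) * np.log(1 - pi_true))
ll_compact = -np.sum(np.log1p(np.exp((1 - 2 * y) * (X @ beta_true))))
assert np.isclose(ll_long, ll_compact)

# Maximize ln L by Newton-Raphson:
#   score   = X'(y - pi)
#   Hessian = -X' diag(pi (1 - pi)) X
beta = np.zeros(2)
for _ in range(25):
    pi = 1 / (1 + np.exp(-X @ beta))
    score = X.T @ (y - pi)
    hessian = -(X * (pi * (1 - pi))[:, None]).T @ X
    beta = beta - np.linalg.solve(hessian, score)

print(beta)  # close to beta_true
```

Because the logit log-likelihood is globally concave, Newton-Raphson from any start converges to the unique MLE (this is essentially what `glm` routines do via iteratively reweighted least squares).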


Interpreting Functional Forms

Running Example is logit:

πi = 1 / (1 + e^(−xiβ))

Methods:

1. Graphs.

(a) Can use desired instead of observed X’s
(b) Can try entire surface plot for a small number of X’s
(c) Marginal effects: can hold “other variables” constant at their means, a typical value, or at their observed values
(d) Average effects: compute effects for every observation and average

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 10 / 242
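Because the logit is nonlinear, methods (c) and (d) generally give different numbers. A sketch contrasting the two for the effect of a binary regressor, with made-up data and made-up coefficient values standing in for estimates:

```python
import numpy as np

rng = np.random.default_rng(3)

def inv_logit(z):
    return 1 / (1 + np.exp(-z))

# Hypothetical design: intercept, a binary x1, a continuous x2.
n = 1000
X = np.column_stack([np.ones(n), rng.binomial(1, 0.5, n), rng.normal(size=n)])
beta = np.array([-0.5, 1.0, 0.8])  # assumed coefficient estimates, for illustration

# (c) Effect of x1 = 0 -> 1, holding other variables at their means:
xbar = X.mean(axis=0)
hi, lo = xbar.copy(), xbar.copy()
hi[1], lo[1] = 1, 0
effect_at_means = inv_logit(hi @ beta) - inv_logit(lo @ beta)

# (d) Average effect: compute the effect for every observation, then average.
X1, X0 = X.copy(), X.copy()
X1[:, 1], X0[:, 1] = 1, 0
avg_effect = np.mean(inv_logit(X1 @ beta) - inv_logit(X0 @ beta))

print(effect_at_means, avg_effect)  # similar here, but not identical in general
```

The gap between the two grows with how far the observations sit from the means and how curved the link is over the relevant range.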


Interpreting Functional Forms

2. Fitted values for selected combinations of X's, or "typical" people or types:

   Sex      Age   Home            Income    Pr(vote)
   Male     20    Chicago         $33,000   0.20
   Female   27    New York City   $43,000   0.28
   Male     50    Madison, WI     $55,000   0.72
   ...

We may also want to include uncertainty (both fundamental and estimation uncertainty).

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 11 / 242
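Computing such a table is just evaluating the fitted logit curve at chosen covariate profiles. A sketch with made-up numbers (the coding of sex/age/income and the β values below are assumptions for illustration, not the coefficients behind the table above):

```python
import numpy as np

# hypothetical coefficients: (intercept, female, age, income in $10k units)
beta = np.array([-2.0, 0.3, 0.04, 0.05])

profiles = np.array([
    [1, 0, 20, 3.3],   # male, age 20, $33,000
    [1, 1, 27, 4.3],   # female, age 27, $43,000
    [1, 0, 50, 5.5],   # male, age 50, $55,000
])

pr_vote = 1 / (1 + np.exp(-profiles @ beta))  # one fitted probability per row
```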


Interpreting Functional Forms

3. First Differences (called Risk Differences in epidemiology)

   (a) Define X_s (starting point) and X_e (ending point) as k × 1 vectors of values of X. Usually all values are the same but one.
   (b) First difference = g(X_e, β) − g(X_s, β)
   (c) \[ D = \frac{1}{1 + e^{-X_e\beta}} - \frac{1}{1 + e^{-X_s\beta}} \]
   (d) Better (and necessary to compute se's): do it by simulation (we'll repeat the details soon)

   Variable   From        To              First Difference
   Sex        Male      → Female           .05
   Age        65        → 75              −.10
   Home       NYC       → Madison, WI      .26
   Income     $35,000   → $75,000          .14

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 12 / 242
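Point (d) in miniature: draw coefficient vectors from the estimated sampling distribution, push each draw through the first-difference formula, and summarize. The point estimate and covariance below are made-up stand-ins for real estimation output:

```python
import numpy as np

rng = np.random.default_rng(2)

# hypothetical MLE and covariance matrix for (intercept, age)
beta_hat = np.array([-1.0, 0.03])
V_hat = np.array([[0.04,  -0.001],
                  [-0.001, 0.0001]])

x_s = np.array([1.0, 65.0])  # starting profile: age 65
x_e = np.array([1.0, 75.0])  # ending profile:   age 75

# draw from the approximate sampling distribution of beta, then compute
# D = g(x_e, b) - g(x_s, b) for every draw
beta_sims = rng.multivariate_normal(beta_hat, V_hat, size=5000)
fd = 1 / (1 + np.exp(-beta_sims @ x_e)) - 1 / (1 + np.exp(-beta_sims @ x_s))

fd_mean = fd.mean()                     # simulation estimate of the first difference
fd_se = fd.std()                        # simulation-based standard error
fd_ci = np.percentile(fd, [2.5, 97.5])  # 95% interval
```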


Interpreting Functional Forms

4. Derivatives (i.e., a source of heuristics for talks)

\[
\frac{\partial \pi_i}{\partial X_j} = \frac{\partial}{\partial X_j}\,\frac{1}{1 + e^{-X\beta}} = \beta_j \, \pi_i (1 - \pi_i)
\]

   (a) Max value of the logit derivative: β × 0.5(1 − 0.5) = β/4
   (b) Max value of the probit [π_i = Φ(X_iβ)] derivative: β × 0.4

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 13 / 242
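Both heuristics can be checked numerically (β = 0.8 below is an arbitrary choice):

```python
import numpy as np
from scipy.stats import norm

beta_j = 0.8  # arbitrary coefficient for the check

# logit: the derivative beta_j * pi * (1 - pi) is largest at pi = 0.5 -> beta_j / 4
pi = np.linspace(0.001, 0.999, 9999)
logit_max = np.max(beta_j * pi * (1 - pi))

# probit: the derivative beta_j * phi(X beta) is largest at X beta = 0
probit_max = beta_j * norm.pdf(0.0)  # = beta_j / sqrt(2 pi) ~= 0.399 * beta_j
```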


Latent Variable Interpretation

Logit models can also be interpreted in terms of a latent variable Y*_i.

Let

\[
Y_i = \begin{cases} 1 & \text{if } Y_i^* > 0 \\ 0 & \text{if } Y_i^* \le 0 \end{cases}
\qquad
Y_i^* = X_i^\top \beta + \varepsilon_i, \text{ where } E(\varepsilon_i) = 0
\]

This is also called a random utility model, where

- Y*_i: utility from choosing Y_i = 1 instead of Y_i = 0
- X_i^⊤β: systematic component of utility
- ε_i: stochastic (random) component of utility

Make distributional assumptions about ε_i:

- ε_i i.i.d. ∼ Logistic ⟹ Logit

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 14 / 242
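The claim that logistic errors imply logit can be checked by simulation: generate the latent variable, threshold it, and compare the empirical choice frequency with the logit formula. The coefficients and window below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200_000
x = rng.normal(size=n)
beta = np.array([0.5, 1.0])  # hypothetical (intercept, slope)

# latent utility with standard logistic noise, observed through the threshold
y_star = beta[0] + beta[1] * x + rng.logistic(size=n)
y = (y_star > 0).astype(int)

# near x = 0, the empirical Pr(Y = 1) should match the logit formula at x = 0
near0 = np.abs(x) < 0.05
p_hat = y[near0].mean()
p_logit = 1 / (1 + np.exp(-beta[0]))
```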


Latent Variable: Walkthrough

Let Y* be a continuous unobserved variable: health, propensity to vote, etc.

A model:

\[
Y_i^* \sim P(y_i^* \mid \mu_i), \qquad \mu_i = x_i\beta
\]

with observation mechanism:

\[
y_i = \begin{cases} 1 & y^* \le \tau \quad \text{(if } i \text{ is alive)} \\ 0 & y^* > \tau \quad \text{(if } i \text{ is dead)} \end{cases}
\]

Since Y* is unobserved anyway, define the threshold as τ = 0. (Plus the same independence assumption, which from now on is assumed implicit.)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 15 / 242


Same Logit Model, Different Justification and Interpretation

1. If Y* is observed and P(·) is normal, this is a regression.

2. If only y_i is observed, and Y* is standardized logistic (which looks close to the normal),

\[
P(y_i^* \mid \mu_i) = \mathrm{STL}(y^* \mid \mu_i) = \frac{\exp(y_i^* - \mu_i)}{\left[1 + \exp(y_i^* - \mu_i)\right]^2}
\]

then we get a logit model.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 16 / 242


Same Logit Model, Different Justification and Interpretation

3. The derivation:

\[
\begin{aligned}
\Pr(Y_i = 1 \mid \mu_i) &= \Pr(Y_i^* \le 0) \\
&= \int_{-\infty}^{0} \mathrm{STL}(y_i^* \mid \mu_i)\, dy_i^* \\
&= F_{\mathrm{stl}}(0 \mid \mu_i) \quad \text{[the CDF of the STL]} \\
&= \left[1 + \exp(-X_i\beta)\right]^{-1}
\end{aligned}
\]

The same functional form!

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 17 / 242
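The integration step in the derivation above can be verified numerically: integrating the STL density up to the threshold reproduces the STL's CDF at 0 (μ = 0.7 is an arbitrary choice; composed with the slides' sign convention for μ_i, this is what delivers the [1 + exp(−X_iβ)]^{−1} form):

```python
import numpy as np
from scipy.integrate import quad

mu = 0.7  # arbitrary value of mu_i for the check

def stl(y_star, mu):
    # standardized logistic density with location mu
    e = np.exp(y_star - mu)
    return e / (1 + e) ** 2

area, _ = quad(stl, -np.inf, 0, args=(mu,))   # integral of the density up to 0
cdf_at_0 = 1 / (1 + np.exp(mu))               # logistic CDF F_stl(0 | mu), closed form
```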


The Probit Model

4. For the Probit Model, we modify:

\[
P(y_i^* \mid \mu_i) = N(y_i^* \mid \mu_i, 1)
\]

with the same observation mechanism, implying

\[
\Pr(Y_i = 1 \mid \mu) = \int_{-\infty}^{0} N(y_i^* \mid \mu_i, 1)\, dy_i^* = \Phi(X_i\beta)
\]

5. ⟹ interpret β as regression coefficients of Y* on X: β_1 is what happens to Y* on average (or to μ_i) when X_1 goes up by one unit, holding the other explanatory variables constant (and conditional on the model). In probit, one unit of Y* is one standard deviation.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 18 / 242
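The probit integral admits the same numerical check as the logit case: the area under the N(y* | μ, 1) density below the threshold is a normal CDF (μ = 0.7 is again an arbitrary choice):

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

mu = 0.7  # arbitrary mu_i for the check

# integrate the N(y* | mu, 1) density below the threshold tau = 0
area, _ = quad(lambda y: norm.pdf(y, loc=mu, scale=1.0), -np.inf, 0)

# the matching closed form: a normal CDF evaluated at the threshold
phi_at_0 = norm.cdf(0.0, loc=mu, scale=1.0)
```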

Page 102: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

The Probit Model

4. For the Probit Model, we modify:

P(y∗i |µi ) = N(y∗i |µi , 1)

with the same observation mechanism, implying

Pr(Yi = 1|µ) =

∫ 0

−∞N(y∗i |µi , 1)dy∗i = Φ(Xiβ)

5. =⇒ interpret β as regression coefficients of Y ∗ on X : β1 is whathappens to Y ∗ on average (or µi ) when X1 goes up by one unit,holding constant the other explanatory variables (and conditional onthe model). In probit, one unit of Y ∗ is one standard deviation.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 18 / 242

Page 103: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

The Probit Model

4. For the Probit Model, we modify:

P(y∗i |µi ) = N(y∗i |µi , 1)

with the same observation mechanism, implying

Pr(Yi = 1|µ) =

∫ 0

−∞N(y∗i |µi , 1)dy∗i = Φ(Xiβ)

5. =⇒ interpret β as regression coefficients of Y ∗ on X : β1 is whathappens to Y ∗ on average (or µi ) when X1 goes up by one unit,holding constant the other explanatory variables (and conditional onthe model). In probit, one unit of Y ∗ is one standard deviation.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 18 / 242


An Econometric Interpretation: Utility Maximization

Let U_i^D be the utility for the Democratic candidate, and U_i^R be the utility for the Republican candidate.

Assume U_i^D and U_i^R are independent.

Assume U_i^k ∼ P(U_i^k | η_i^k) for k ∈ {D, R}.

Let Y* ≡ U_i^D − U_i^R and apply the same interpretation as above: if y* > 0, choose the Democrat; otherwise, choose the Republican.

If P(·) is normal, we get a probit model.

If P(·) is generalized extreme value, we get logit.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 19 / 242
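The latent-utility story can be simulated directly. A small sketch under assumed systematic utilities (eta_D and eta_R are hypothetical values, not from the slides): with independent N(η, 1) utilities, Y* = U_D − U_R ∼ N(η_D − η_R, 2), so the choice share should match Φ((η_D − η_R)/√2).

```python
import math
import random

random.seed(504)
eta_D, eta_R = 0.5, 0.0   # hypothetical systematic utilities for illustration
n = 100_000

# Each voter draws independent N(eta, 1) utilities and chooses the larger one.
dem_share = sum(random.gauss(eta_D, 1.0) > random.gauss(eta_R, 1.0)
                for _ in range(n)) / n

# Y* = U_D - U_R ~ N(eta_D - eta_R, 2), so Pr(choose D) = Phi((eta_D - eta_R)/sqrt(2))
z = (eta_D - eta_R) / math.sqrt(2.0)
implied = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
print(round(dem_share, 3), round(implied, 3))
```

The simulated share and the implied probit probability agree to Monte Carlo error, which is the sense in which normal latent utilities "give" a probit model.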


Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 19 / 242

Page 113: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

Comparing LPM, Logit, and Probit

●● ●

●●

● ●

●●

●●

●●

●● ●

●●

●●

●●

●●

●●● ●

●●

●●

● ●

●●

● ●

●●●

●●●

● ●

●●

●● ●●●●●

●●

●●●

●●

●●

●●

● ●

●●●

−3 −2 −1 0 1 2 3

0.0

0.2

0.4

0.6

0.8

1.0

Xi

π i

LPM goes outside of [0, 1] for extreme values of Xi

LPM underestimates the marginal effect near center and overpredicts nearextremes

Logit has slightly fatter tails than probit, but no practical difference

Note that β are completely different between the models

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 20 / 242
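The "completely different β" point is a scale difference, not a substantive one. A sketch using the slide's fitted slopes (with a zero intercept assumed for simplicity) shows the two link functions imply nearly identical probability curves:

```python
import math

def logit_prob(xb):
    return 1.0 / (1.0 + math.exp(-xb))

def probit_prob(xb):
    return 0.5 * (1.0 + math.erf(xb / math.sqrt(2.0)))

# Logit beta = 1.06 and probit beta = 0.63 (from the slide's legend) imply
# almost the same Pr(Y = 1) at every X, even though the coefficients differ.
for x in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(x, round(logit_prob(1.06 * x), 3), round(probit_prob(0.63 * x), 3))
```

The two columns never differ by more than about 0.01 on this grid, which is why logit and probit coefficients are routinely compared only after rescaling by roughly 1.6 to 1.8.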


1 Binary Outcome Models

2 Quantities of Interest
  An Example with Code
  Predicted Values
  First Differences
  General Algorithms

3 Model Diagnostics for Binary Outcome Models

4 Ordered Categorical

5 Unordered Categorical

6 Event Count Models
  Poisson
  Overdispersion
  Binomial for Known Trials

7 Duration Models
  Exponential Model
  Weibull Model
  Cox Proportional Hazards Model

8 Duration-Logit Correspondence

9 Appendix: Multinomial Models

10 Appendix: More on Overdispersed Poisson

11 Appendix: More on Binomial Models

12 Appendix: Gamma Regression

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 21 / 242



How Not to Present Statistical Results

1. This one is typical of current practice, not that unusual.

2. What do these numbers mean?

3. Why so much whitespace? Can you connect cols A and B?

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 22 / 242



How Not to Present Statistical Results

4. What does the star-gazing add?

5. Can any be interpreted as causal estimates?

6. Can you compute a quantity of interest from these numbers?

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 23 / 242


Interpretation and Presentation

1. Statistical presentations should
(a) convey numerically precise estimates of the quantities of substantive interest,
(b) include reasonable measures of uncertainty about those estimates,
(c) require little specialized knowledge to understand, and
(d) include no superfluous information: long lists of coefficients no one understands, star gazing, etc.

2. For example: other things being equal, an additional year of education would increase your annual income by $1,500 on average, plus or minus about $500.

3. Your work should satisfy a reader who hasn't taken this course.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 24 / 242


Reading

King, Tomz, and Wittenberg (2000). "Making the Most of Statistical Analyses: Improving Interpretation and Presentation." American Journal of Political Science 44(2): 341-355.

Hanmer and Kalkan (2013). "Behind the Curve: Clarifying the Best Approach to Calculating Predicted Probabilities and Marginal Effects from Limited Dependent Variable Models." American Journal of Political Science.

Greenhill, Ward, and Sacks (2011). "The Separation Plot: A New Visual Method for Evaluating the Fit of Binary Models." American Journal of Political Science.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 25 / 242

Quantities of Interest

How do we interpret β in binary outcome models?

- In the LPM, β is the marginal effect (βj = ∂E[Yi | X]/∂Xij).

- In logit, βj is the log odds ratio:

  βj = log{ [Pr(Yi = 1 | Xij = 1) / Pr(Yi = 0 | Xij = 1)] / [Pr(Yi = 1 | Xij = 0) / Pr(Yi = 0 | Xij = 0)] }

- In probit, β has no direct substantive interpretation.

- In general, it is bad practice to present only a coefficient table!

- Instead, always try to present your results in terms of an easy-to-interpret quantity.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 26 / 242
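The log-odds-ratio reading of a logit coefficient can be verified directly. A minimal sketch with a hypothetical coefficient (the value 1.06 is illustrative, and the intercept is set to zero for simplicity):

```python
import math

def logit_prob(xb):
    return 1.0 / (1.0 + math.exp(-xb))

beta = 1.06                 # hypothetical logit coefficient
p1 = logit_prob(beta * 1)   # Pr(Y = 1 | X = 1)
p0 = logit_prob(beta * 0)   # Pr(Y = 1 | X = 0)

# The log of the ratio of odds recovers beta exactly under the logit link.
log_odds_ratio = math.log((p1 / (1 - p1)) / (p0 / (1 - p0)))
print(round(log_odds_ratio, 6))
```

Equivalently, exp(βj) is the odds ratio for a one-unit change in Xj, which is the most common verbal interpretation of a logit coefficient.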


Analytic Quantities of Interest

1. Predicted probability when Xi = x:

   Pr(Yi = 1 | Xi = x) = π(x)

2. Average treatment effect (ATE):
- Given the model, εi is i.i.d. ⇒ Ti is conditionally ignorable given Wi.
- Thus, the ATE can be identified as

  τ = E[Pr(Yi = 1 | Ti = 1, Wi) − Pr(Yi = 1 | Ti = 0, Wi)]
    = E[π(Ti = 1, Wi) − π(Ti = 0, Wi)],

  where the expectation is taken with respect to both ε and W.

3. Marginal effects: for a continuous predictor Xij,

  ∂E[Yi | X]/∂Xij = βj · logit⁻¹(Xi⊤β)(1 − logit⁻¹(Xi⊤β))   (for logit)
  ∂E[Yi | X]/∂Xij = βj · φ(Xi⊤β)                             (for probit)

  Note: this depends on all of Xi, so we must pick a particular value.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 27 / 242
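The logit marginal-effect formula above is largest at the middle of the curve and shrinks toward zero in the tails, which is why a particular Xi must be chosen. A quick sketch (function names are ours):

```python
import math

def logit_prob(xb):
    return 1.0 / (1.0 + math.exp(-xb))

def logit_marginal_effect(xb, beta_j):
    """d E[Y | X] / d X_j = beta_j * p * (1 - p) at the given linear predictor."""
    p = logit_prob(xb)
    return beta_j * p * (1.0 - p)

# Largest where Pr(Y = 1) = 0.5 (i.e. X beta = 0), small in the tails.
print(round(logit_marginal_effect(0.0, 1.0), 3),
      round(logit_marginal_effect(3.0, 1.0), 3))
```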


Example: Civil Conflict and Political Instability

Fearon & Laitin (2003):

Yi: civil conflict
Ti: political instability
Wi: geography (log % mountainous)

Estimated model:

  Pr(Yi = 1 | Ti, Wi) = logit⁻¹(−2.84 + 0.91 Ti + 0.35 Wi)

Predicted probability:

  π(Ti = 1, Wi = 3.10) = 0.299

ATE:

  τ = (1/n) Σ_{i=1}^{n} {π(1, Wi) − π(0, Wi)} = 0.127

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 28 / 242
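The predicted probability can be reproduced from the reported coefficients; the ATE would then average π(1, Wi) − π(0, Wi) over the observed Wi (the data themselves are not reproduced here):

```python
import math

def inv_logit(xb):
    return 1.0 / (1.0 + math.exp(-xb))

def pr_conflict(T, W):
    """Fitted model from the slide: logit^{-1}(-2.84 + 0.91*T + 0.35*W)."""
    return inv_logit(-2.84 + 0.91 * T + 0.35 * W)

# Predicted probability at instability T = 1 and W = 3.10 (log % mountainous);
# about 0.30, matching the slide's 0.299 up to coefficient rounding.
print(round(pr_conflict(1, 3.10), 3))
```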


●●●

●●

●●●●●

●●●

●●●●●●

●●●

●●

●●●

●●●● ●

●●●●●

●●●●

●●

●●

●●●●●

●●

●●

●●●

●●

●●●

●●●

●●

●●●●

●●

●●

●●●●●

●●●●

●●

●●●●

●●●

●●

●●●

●●●●●

●●

● ●●

●●

●●

●●

●●●●●

●●●●●●

●●

●●

●●

●●●●●●

●●

●●●

●●

●●●●

●●●

●●

●●●●●

●●

●●

●●

●●●

●●●

●●●●

●●

●●●

●●●●●●

●●

●●●●●●

●●●●●●●

●●●●

●●●●●●

●●●●

●●

●●

●●●●●

●●

●●

●●●●●●●

●●●

●●

●●●●●●●●●●●

●●●●

●●●

●●●

●●

●●

●●●●

●●

●●

●●●

●●

●●

●●●

●●

●●

●●

●●

●●●

●●●

●●●●●

●●●

●●

●●

●●●●

●●●

●●●●

●●●

●●

●●●●●●

●●

●●●●●

●●

●●●●●

●●●●

●●●●

●●

●●●

●●●●

●●●

●●●

●●●

●●

●●●

●●

●●

●●●●●●●

●●

●●

●●

●●

●●●

●●

●●●

●●●●●●

●●●

●●●●●●●

●●●

●●●

●●●●●●●

●●●

●●

●●●

●●

●●

●●

●●●●

●●●●●●

●●

●●

●●

●●●

●●

●●●●●●

●●●●●

●●

●●●

●●●●●●●

●●●●

●●●●

●●

●●

●●●

●●

●●

●●

●●

● ●●

●●●●●

● ●

●●●

●●●

●●●

●●

●●●

●●

●●

●●●●●●

●●

●●●

●●●●●

●●●

●●●

●●●●

●●●

●●●

●●●

●●●●

●●● ●●

●●●●●●●

●●

●●

●●

●●

●●

●●

●●●●●●

●●●●

●●

●●●

●●

●●

●●

●●●●●

●●●●●●

●●

●●

●●●

●●●●●●●●●●

●●

●●●●●●

●●●

●●

●●

●●●●

●●

●●●●●●

●●

●●●●●●●●

●●●●●

●●●●●●●●●●

●●

●●●

●●●

●●●●●●●

●●

●●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●●

●●●●

●●●

●●

●●●●●

●●

●●

●●●

●●●●●

●●●●

●●●●●

●●●●●●●

●●

●●

●●

●●●

●●

●●●●●●●●

●●●●

●●●

●●

●●

●●

●●●●

●●●●

●●●●●

●●

●●●

●●

●●●

●●

●●

●●

●●●

●●●●

●●

●●●●●●●●●●

●●

●●●

●●●● ●

●●●●●●●●

●●●●●●●●

●●

●●

●●

●●●

●●●

●●●●●

●●●●

●●

●●

●●●

●●●

●●

●●

●●●●

●●

●●

●●

●●●

●●●●●●

●●●●●●●●●●

●●●●●●●●

●●

●●●●

●●●●●

●●●

●●●

●●

●●

●●

●●

●●

●●●●●●●

●●

●●●●

●●●●

●●●●

●●

●●

●●●●●●

●●●●●●●●●

●●●

●●●

●●●

●●●●●

●●●

●●●

●●●

●●●

●●●

●●●

●●●●

●●

●●●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●

●●●●●●

●●

●●●●

●●●●

●●

●●

●●●●

●●●●●●●●●●

●●

●●●●

●●

●●●

●●●

●●

●●

●●

●●●●●●●

●●●●●●

●●●

●●

●●

●●●

●●●●

●●

●●●●

●●

●●●●●●

●●●●

●●●●●

●●

●●●●●

●●

●●●

●●●

●●●

●●●●

●●

●●

●●●●●

●●●

●●

●●

●●●

●●

●●●●

●●●●

●●●

●●●

●●●

●●●

●●●●

●●

●● ●

●●

●●

●●●●

●●●●

●●

●●

●●●

●●●●

●●

●●●

●●

●●

●●

●●●

●●●

●●

●●●●

●●●

●●

●●

●●●

●●●●

●●●●

●●

●●

●●

●●●●●

●●

●●

●●●●●●

●●

●●

●●●●●●

●●

●●

●●

●●●●●●●●●●●●

●●

●●

●●●

●●●

●●

●●●●●

●●●●●●●●

●●●●●

●●

●●

●●

●●●

●●

●●●

●●

●●●

●●

●●●

●●●

●●

●●●●●●●●

●●

●●●

●●●

●●●●

●●

●●

●●

●●●●

●●●

●●●

●●●

●●●●

●●●

●●

●●

●●

●●

●●

●●●●●●

●●●●

●●●●

●●

●●

●●●●

●●●●

●●

●●

●●●●●

●●●●●●

●●

●●●

●●

●●

●●●●●●

●●

●●

●●

●●●●●

●●●●

●●

●●●●●●●●

●●●●

●●●

●●●●

●●

●●●

●●●

●●

●●●●●●

●●●

●●

●●●●

●●●●●●

●●●

●●●●●

●●

●●

●●●

●●

●●●●●●

●●

●●●●●

●●●

●●

●●

●●●●

●●

●●

●●

●●●

●●

●●

●●●●●

●●●

●●●●●●●●●

●●●

●●●●●●●●●

●●●●●

●●●

●●

●●●●●●●

●●●●

●●●

●●

●●

●●

●●

●●●

● ●

●●●

●●●●

●●●●

●●●●

●●

●●●

●●●●●●●●●●●

●●●●●●●

●●●●●●

●●●●

●●●

●●●

●●●

●●

●●●

●●

●●●●●

●●●

●●●●●

●●●●●●●●

●●●●● ●●

●●

●●

●●●●●●●

●●●

●●

●●

●●●●●●●●

●●

●●●

●●●

●●●

●●●●

●●

●●●●●

●●●●

●●●●●●

●●

● ●●

●●

●●●

●●●

●●●

●●

●●●

●●●●

●●●●●

●●●●

●●

●●●●●

●●●●

●●

●●●●●

●●●●

●●

●●●●●

●●

●●

●●●

●●

●●

●●●●

●●

●●

●●●

●●●●●

●●●●●●●

●●●

●●●

●●●

●●●

●●

●●

●●

●●●●●

●●

●●

●●●

●●●●●

●●

●●

●●●

●●●

●●●

●●●

●●

●●

●●

●●

●●●

●●

●●●●●●

●●

●●

●●

●●

●●

●●

●●

●●●

●●

●●●●

●●

●●●●●●

●●●●●

●●●●●

●●

●●

●●●

●●

●●●●

●●

●●

●●

●●●●

●●●●

●●●

●●●●●●

●●●●●●

●●●●

●●●

●●●●●

●●

●●●

●●

●●●

●●

●●●

●●

●●●

●●

●●●

●●●

●●●●

●●●●●

●●●●

●●●●●

●●●●●

●●

●●

●●●

●●●

●●

●●

●●●●

●●

●●●

●●

●●

●●

●●●

●●

●●●

●●●

●●●●

●●●

●●●●

● ●●●●●●

●●●

●●●●●

●●●●●

●●●●●●

●●

●●●●●

●●●

●●

●●●

●●

●●●●

●●

●●●●●●●●

●●●●

●●

●●●●●

●●●●●●●

●●

●●

●●●

●●

●●●

●●●●●

●●●●●●

●●●

●●

●●

●●●

●●●●●

●●●●

● ●

●●

●●

●●

●●●●●●

●●

●●●

●●

●●●●●

●●

●●●

●●

●●

●●●

●●●●●

●●●

●●

●●

●●●

●●●●

●●

●●●

●●

●●

●●●●

●●●●●●●●●●

●●

●●●●●

●●

●●

●●

●●●

●●

●●

●●●●

●●●

●●

●●●●●

●●

●●●●

●●●●●

●●●●●●●

●●

●●●●●

●●

●●

●●●●

●●●●●●●●●●●●●

●●

●●●

●●●

●●●

●●

●●●

●●

●●●●●●

●●

●●

●●●●

●●●●

●●●●●

●●

●●●●

●●

●●

●●

●●

●●●●●●●●

●●●

●●

●●●

●●●

●●

●●

●●●●●●●●

●●

●●

●●

● ●

●●●●●●●

●●●

●●●●

●●●

●●

●●

●●●●●

●●

●●●●●

●●●●

●●

●●

●●

●●

●●

●●

●●●

●●●●●

●●●●

●●●

●●●●

●●

●●

●●

●●●●●●●●●

●●●●

●●●●

●●●●●●

●●●●●●●●

●●●●●●●

●●

●●●

●●

●●●●●●●●●

●●

●●

●●

●●

●●

●●

●●

●●●●

●●●●●

●●●●●

●●●●

●●●

●●●

●●

●●

●●●●

●●●

●●

●●●●

●●

●●●

●●

●●

●●●

●●

●●

●●●●●●●●●

●●● ●

●●

●●

●●●●●●●

●●●●

●●

●●●●●●●

●●●●●

●●●●

●●●

●●

●●

●●

●●●●●●

●●●

●●●●

●●

●●

●●

●●●●

●●●

●●

●●

●●●●●●

●●●●●●

●●

●●●

●●●●

●●●●●

●●●

●●●●

●●

●●●●●●●●

●●

●●●

●●●●

●●

●●●

●●●●●●

●●

●●●

●●

●●●

●●

●●●●●●●

●●●

●●●

●●●

●●

●●●●●

●●●●

●●●●

●●●

●●●

●●●●●●

●●●●●●

●●●

●●●●

●●●●●●

●●●●●●●

●●●●

●●

●●●

●●

●●

●●●●●●

●●●●●●

●●

●●

●●

●●●●●●●●●●●

●●●

●●●●●

●●●●

●●●

●●●●

●●●

●●●●

●●

●●

●●●●●

●●●

●●

●●

●●●

●●●

●●

●●●

●●●●

●●●

●●●●●●●●●●●●●●●

●●●●●●

●●●

●●●

●●

●●

●●

●●

●●

●●●

●●●

●●

●●

●●●●●

●●●

●●●

●●●●●

●●●●

●●●

●●

●●

●●●

●●●

●●●●●

●●●●●●●●

●●

●●

●●

●●

●●●

●●●●●

●●●

●●●

●●●●●

●●●●●

●●

●●

●●●●

●●●●●●●

●●●●●●●

●●●●●

●●●●●

●●

●●

●●

●●●

●●●●

●●

●●

●●●

●●

●●

●●

●●

●●●●●

●●●●●●●●●

●●●●●●

●●●●●

●●

●●●●

●●

●●●

●●●

●●

●●

●●●

●●

●●●●●●

●●●

●●●●

●●●

●●●●●

●●●

●●●●

●●●●●●●

●●●●●●●●

●●

●●●●●●

●●●

●●●●

●●●

●●

●●●●●

●●

●●

●●●●

●●

●●●●

●●

●●●

●●●

●●●

●●

●●

●●●

●●

●●●●●●●

●●●

●●●

●●

●●●●●●●

●●●●

●●

●●●●

●●

●●

●●●●

●●

●●●

●●●●

0 1 2 3 4

0.0

0.2

0.4

0.6

0.8

1.0

Wi

π(T

i,Wi)

Ti = 1

Ti = 0

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 28 / 242

[Figure overlay: a single highlighted observation, labeled "PHILIPPI , 1982", marked at πi = 0.299.]

[Figure overlay: the average predicted probabilities E[Ŷi(1)] and E[Ŷi(0)] are marked; their difference gives τ = 0.127.]



Variance Estimation

The variance estimates can be used to calculate confidence intervals for the logit/probit β: [β̂MLE − zα/2 · s.e., β̂MLE + zα/2 · s.e.]

But β itself is (typically) not of direct substantive interest

Predicted probability (logit): π(x) = exp(x⊤β̂MLE) / (1 + exp(x⊤β̂MLE))

ATE (logit): τ = (1/n) ∑ᵢ₌₁ⁿ [ exp(βT + Wᵢ⊤βW) / (1 + exp(βT + Wᵢ⊤βW)) − exp(Wᵢ⊤βW) / (1 + exp(Wᵢ⊤βW)) ]

How do we compute standard errors for quantities like π and τ?

Three approaches:

1. Analytical approximation: the Delta method
2. Simulating from sampling distributions
3. Resampling: the bootstrap (parametric or nonparametric)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 29 / 242
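The plug-in quantities themselves are simple to compute once β̂MLE is in hand. A minimal sketch, assuming hypothetical coefficient values (the course's own examples use R; NumPy is used here purely for the arithmetic):

```python
import numpy as np

def logistic(z):
    # inverse logit: exp(z) / (1 + exp(z))
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical logit MLEs: treatment coefficient and covariate coefficients
beta_T = 0.8
beta_W = np.array([-1.0, 0.5])  # intercept and one covariate (made up)

# Predicted probability pi(x) at a treated covariate profile, W = 3.10
x = np.array([1.0, 3.10])
pi_hat = logistic(beta_T + x @ beta_W)

# Plug-in ATE: average the treated/untreated difference over the sample
W = np.column_stack([np.ones(4), np.array([0.5, 1.2, 2.0, 3.1])])  # toy data
tau_hat = np.mean(logistic(beta_T + W @ beta_W) - logistic(W @ beta_W))
```

Because beta_T > 0 here, each unit's treated probability exceeds its untreated one, so tau_hat is positive; attaching standard errors to pi_hat and tau_hat is what the three listed approaches address.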



Monte Carlo Approximation

For MLE, we know that θ̂ ∼ N(θ, V(θ)) approximately

We can simulate this distribution by sampling from N(θ̂, V̂(θ̂))

For each draw θ∗, compute f(θ∗) by plugging in

Variance in the distribution of θ∗ should transfer to f(θ∗)

This leads to the algorithm of King, Tomz and Wittenberg (2000):

1. Draw R copies of θr from N(θ̂, V̂(θ̂))
2. For each θr, compute f(θr)
3a. To obtain the s.e. of f(θ̂), use the sample standard deviation of {f(θ1), ..., f(θR)}
3b. To compute a 95% CI, use the 2.5/97.5 percentiles of {f(θ1), ..., f(θR)} as the lower/upper bounds

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 30 / 242
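The King, Tomz and Wittenberg algorithm takes only a few lines. A sketch with made-up values for the MLE point estimate and variance matrix (the course uses R; this is plain NumPy, with f(θ) taken to be a logit predicted probability):

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical MLE output for a logit: point estimate and variance matrix
beta_hat = np.array([-0.5, 0.9])
V_hat = np.array([[0.04, 0.01],
                  [0.01, 0.09]])

x = np.array([1.0, 2.0])  # covariate profile of interest (hypothetical)
R = 10_000

# 1. Draw R copies of beta from N(beta_hat, V_hat)
draws = rng.multivariate_normal(beta_hat, V_hat, size=R)

# 2. For each draw, compute the quantity of interest f(beta) = pi(x)
pi_draws = logistic(draws @ x)

# 3a. Simulation-based standard error
se = pi_draws.std(ddof=1)

# 3b. Percentile-based 95% confidence interval
lo, hi = np.percentile(pi_draws, [2.5, 97.5])
```

This is the procedure that Clarify (Stata) and Zelig (R) automate; the hand-rolled version makes clear that nothing beyond multivariate normal draws and a plug-in function is involved.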



Example: Civil Conflict and Political Instability

Confidence intervals for π(Ti = 1, Wi = 3.10):

[Figure: comparison of 95% confidence intervals for πi from the Delta method, Clarify, the parametric bootstrap, and the nonparametric bootstrap; all intervals fall roughly between 0.25 and 0.35.]

                       Delta Method   Clarify   Nonpara. B.   Para. B.
Normal Approximation   Yes            Yes       No            No
Simulations            No             Yes       Yes           Yes
Derivation-Free        No             Yes       Yes           No
Computation Speed      Instant        Fast      Very Slow     Slow

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 31 / 242


Simulation from any model must reflect all uncertainty

Yi ∼ f(θi, α)   stochastic

θi = g(xi, β)   systematic

Must simulate anything with uncertainty:

1. Estimation uncertainty: lack of knowledge of β and α. (Due to inadequacies in your research design: n is not infinite.)
2. Fundamental uncertainty: represented by the stochastic component. (Due to the nature of nature!)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 32 / 242
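In simulation terms, expected values average away the fundamental uncertainty, while predicted values carry both sources. A sketch assuming hypothetical logit estimates (Python/NumPy rather than the course's R):

```python
import numpy as np

rng = np.random.default_rng(1)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical logit estimates and their estimated variance
beta_hat = np.array([0.2, 0.6])
V_hat = np.diag([0.05, 0.02])
x = np.array([1.0, 1.5])
R = 5000

# Estimation uncertainty: simulate beta from its approximate sampling dist.
betas = rng.multivariate_normal(beta_hat, V_hat, size=R)
pis = logistic(betas @ x)  # systematic component evaluated at each draw

# Fundamental uncertainty: also draw Y from the stochastic component
ys = rng.binomial(1, pis)  # one Bernoulli outcome per simulated pi

ev = pis.mean()  # expected value: estimation uncertainty only
pv = ys          # predicted values: both sources of uncertainty
```

The draws in `pv` are noticeably more variable than those in `pis`, which is exactly the difference between the two kinds of uncertainty.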



Strategy for Simulating from Generalized Linear Models

All of the models we've talked about so far (and for the next few weeks) belong to the class of generalized linear models (GLMs).

Three elements of a GLM:

1. A distribution for Y (stochastic component)
2. A linear predictor Xβ (systematic component)
3. A link function that relates the linear predictor to the mean of the distribution (systematic component)

(Note: the language is slightly different for the latent variable with observation mechanism, but the result is the same.)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 33 / 242
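The three elements can be made concrete by simulating from one. A hypothetical Poisson GLM with a log link (values made up; the course works in R, NumPy used here for brevity):

```python
import numpy as np

rng = np.random.default_rng(2)

n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + covariate
beta = np.array([0.5, 0.3])                            # hypothetical true values

eta = X @ beta       # systematic component: the linear predictor X*beta
mu = np.exp(eta)     # inverse of the log link: relates eta to the mean of Y
Y = rng.poisson(mu)  # stochastic component: a distribution for Y
```

Swapping `rng.poisson` for `rng.binomial` and `np.exp` for the inverse logit gives the logit model in exactly the same three steps, which is why one simulation strategy covers the whole GLM class.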



Complete Recipe

1. Specify a distribution for Y
2. Specify a linear predictor
3. Specify a link function
4. Estimate parameters via maximum likelihood
5. Simulate or calculate quantities of interest

Let's do this together for a particular example.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 34 / 242
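The five steps can be run end to end on simulated data. A self-contained sketch (Python/NumPy rather than the course's R, with hand-rolled Newton-Raphson standing in for a canned optimizer like glm(); all data and values are simulated for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Steps 1-3: Bernoulli distribution for Y, linear predictor X*beta, logit link
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([-0.5, 1.0])
Y = rng.binomial(1, logistic(X @ beta_true))

# Step 4: maximum likelihood via Newton-Raphson
beta = np.zeros(2)
for _ in range(25):
    p = logistic(X @ beta)
    score = X.T @ (Y - p)                   # gradient of the log-likelihood
    H = X.T @ (X * (p * (1 - p))[:, None])  # observed information matrix
    beta = beta + np.linalg.solve(H, score)

# Step 5: a quantity of interest, e.g. pi at the mean covariate profile
pi_hat = logistic(X.mean(axis=0) @ beta)
```

With n = 2000 the estimates land close to beta_true, and the inverse of `H` at convergence is the variance matrix that feeds the Delta-method and simulation-based standard errors discussed above.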



The Data: Political Assassinations

Taken from Olken and Jones (2009), ”Hit or Miss? The Effect ofAssassinations on Institutions and War”, American Economic Journal:Macroeconomics.

Dataframe is called as and contains information on assassinationattempts, success or failure, and various covariates.

> as[as$country == "United States" & as$year == "1975",]

        country year leadername age tenure attempt
  United States 1975       Ford  62    510    TRUE

  survived result dem_score civil_war war
         1     24        10         0   0

     pop  energy solo weapon
  215973 2208506    1    gun

Observations are country-year-leaders, so some country-years have multiple observations. Let's try to predict assassination attempts with some of our covariates.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 35 / 242

Page 197:

1. Specify a distribution for Y

Assume our data was generated from some distribution.

Examples:

Continuous and Unbounded: Normal

Binary: Bernoulli

Event Count: Poisson

Duration: Exponential

Ordered Categories: Normal with observation mechanism

Unordered Categories: Multinomial

What fits our application?

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 36 / 242

Page 213:

2. Specify a linear predictor

We are interested in allowing some parameter of the distribution θ to vary as a (linear) function of covariates, so we specify a linear predictor.

Xβ = β0 + x1β1 + x2β2 + · · · + xkβk

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 37 / 242
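
As a concrete illustration, the linear predictor is just a matrix-vector product; the design matrix carries a leading column of ones for the intercept. The numbers here are made up (loosely echoing an intercept, age, and tenure), not estimates from the data.

```python
import numpy as np

# Hypothetical design matrix: intercept, age, tenure (values are made up)
X = np.array([[1.0, 62.0, 510.0],
              [1.0, 45.0,  30.0]])
beta = np.array([0.5, -0.01, 0.002])

# X beta = beta0 + x1*beta1 + x2*beta2 for every row at once
eta = X @ beta
```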

Page 215:

What’s in our model?

We wish to predict assassination attempts for country-year-leaders.

tenure: number of days in office

age: age of leader, in years

dem_score: Polity score, -10 to 10

civil_war: is there currently a civil war?

war: is the country in an international conflict?

pop: the country’s population, in thousands

energy: energy usage

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 38 / 242

Page 217:

3. Specify a link function

The link function relates the linear predictor to some parameter θ of the distribution for Y (usually the mean).

Let g(·) be the link function and let E(Y) = θ be the mean of the distribution for Y.

g(θ) = Xβ

θ = g^(−1)(Xβ)

Note that we usually use the inverse link function g^(−1)(Xβ) rather than the link function.

Together with the linear predictor, this forms the systematic component that we've been talking about all along.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 39 / 242
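
For the probit case picked up later, g is the standard normal quantile Φ^(−1) and g^(−1) is the normal CDF Φ. A quick sketch of the round trip, with an arbitrary value standing in for Xβ:

```python
from scipy.stats import norm

eta = 0.5                # an arbitrary value of the linear predictor X beta
pi = norm.cdf(eta)       # theta = g^{-1}(X beta): maps the real line into (0, 1)
eta_back = norm.ppf(pi)  # g(theta) = X beta: the link recovers the linear predictor
```

The inverse link does the modeling work (it maps the unbounded linear predictor into the parameter's legal range), which is why it is the form we usually write down.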

Page 224:

Example Link Functions

Identity:

Link: µ = Xβ

Inverse Link: µ = Xβ

Inverse:

Link: λ^(−1) = Xβ

Inverse Link: λ = (Xβ)^(−1)

Logit:

Link: ln(π/(1 − π)) = Xβ

Inverse Link: π = 1/(1 + e^(−Xβ))

Probit:

Link: Φ^(−1)(π) = Xβ

Inverse Link: π = Φ(Xβ)

Log:

Link: ln(λ) = Xβ

Inverse Link: λ = exp(Xβ)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 40 / 242
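
One way to internalize the table above is to evaluate each inverse link numerically and check that applying the link recovers Xβ. A small sketch (the grid of η values is arbitrary):

```python
import numpy as np
from scipy.stats import norm

eta = np.array([-1.0, 0.5, 2.0])  # arbitrary linear-predictor values

inv = {
    "identity": eta,                         # mu = X beta (unbounded)
    "inverse":  1.0 / eta,                   # lambda = (X beta)^(-1)
    "logit":    1.0 / (1.0 + np.exp(-eta)),  # pi in (0, 1)
    "probit":   norm.cdf(eta),               # pi in (0, 1)
    "log":      np.exp(eta),                 # lambda > 0
}

# Each link undoes its inverse link, returning the linear predictor:
assert np.allclose(np.log(inv["logit"] / (1 - inv["logit"])), eta)
assert np.allclose(norm.ppf(inv["probit"]), eta)
assert np.allclose(np.log(inv["log"]), eta)
```

Note how each inverse link maps the unbounded linear predictor into the natural range of its parameter: (0, 1) for a probability, (0, ∞) for a rate.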

Page 239:

Logit or Probit?

“The question of which distribution to use is a natural one... There are practical reasons for favoring one or the other in some cases for mathematical convenience, but it is difficult to justify the choice of one distribution or another on theoretical grounds... [A]s a general proposition, the question is unresolved. In most applications, the choice between these two seems not to make much difference.” (Greene, Econometric Analysis, p. 774)

Let’s do probit. Why? Mostly to avoid giving away the problem set.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 41 / 242
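
Greene's point can be checked numerically: after rescaling the logistic CDF so its variance matches the standard normal's, the two inverse links are nearly indistinguishable. A sketch:

```python
import numpy as np
from scipy.stats import norm

# The logistic distribution with scale s has sd s*pi/sqrt(3), so
# s = sqrt(3)/pi matches the standard normal's sd of 1
s = np.sqrt(3) / np.pi
x = np.linspace(-4, 4, 801)
logistic_cdf = 1.0 / (1.0 + np.exp(-x / s))

# Maximum gap between the two CDFs over the grid: small, which is why
# the logit-vs-probit choice rarely matters much in practice
gap = np.max(np.abs(norm.cdf(x) - logistic_cdf))
```

The same rescaling is why logit coefficients come out roughly 1.6 to 1.8 times larger than probit coefficients fit to the same data.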

Page 244:

4. Estimate Parameters via ML

a. Write down the likelihood

b. Estimate all the parameters by maximizing the likelihood.

- In this case it would be the coefficients β
- In the regression case it would be θ = {β, γ}, where γ is a reparametrization of the variance.

c. Obtain an estimate of the variance by inverting the negative Hessian

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 42 / 242
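
Steps a–c can be sketched for the probit case on simulated data (hypothetical, not the assassinations data). The BFGS optimizer conveniently returns an approximation to the inverse Hessian of the objective, which here is exactly the inverted negative Hessian of the log-likelihood:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(1)

# Hypothetical probit data: intercept plus one covariate
n = 3000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.3, -0.6])
y = rng.binomial(1, norm.cdf(X @ beta_true))

# a. the log-likelihood (negated, since scipy minimizes)
def negloglik(beta):
    p = np.clip(norm.cdf(X @ beta), 1e-10, 1 - 1e-10)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# b. maximize the likelihood over beta
fit = minimize(negloglik, x0=np.zeros(2), method="BFGS")
beta_hat = fit.x

# c. variance estimate: invert the negative Hessian of the log-likelihood
#    (fit.hess_inv is BFGS's approximation to exactly that inverse)
vcov = fit.hess_inv
se = np.sqrt(np.diag(vcov))
```

In practice one would use an exact Hessian (analytic or numerically differentiated) rather than the BFGS approximation, but the logic of inverting the negative Hessian is the same.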


c. Obtain an estimate of the variance by inverting the negative Hessian

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 42 / 242

Page 250: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

Step 4a: Write Down the Likelihood

The model:

1. Yi ∼ fbern(yi |πi ).

2. πi = Φ(Xiβ) where Φ is the CDF of the standard normal distribution.

3. Yi and Yj are independent for all i 6= j .

Like all CDFs, Φ has range 0 to 1, so it bounds our π_i to the correct space:

Φ(z) = ∫_{−∞}^{z} (1/√(2π)) exp(−t²/2) dt.


Step 4a: Write Down the Likelihood

We can then derive the log-likelihood for β:

L(β|y) ∝ ∏_{i=1}^{n} f_bern(y_i | π_i)

       = ∏_{i=1}^{n} π_i^{y_i} (1 − π_i)^{(1 − y_i)}

Therefore:

ln L(β|y) ∝ ∑_{i=1}^{n} [ y_i ln(π_i) + (1 − y_i) ln(1 − π_i) ]

          = ∑_{i=1}^{n} [ y_i ln(Φ(X_i β)) + (1 − y_i) ln(1 − Φ(X_i β)) ]
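As a numeric sanity check on the derivation, the product form of the likelihood and the exponentiated log-likelihood should agree. A Python sketch with hypothetical X, y, and β values (any would do):

```python
import math

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# hypothetical tiny data set: intercept plus one covariate
X = [(1.0, -0.5), (1.0, 0.2), (1.0, 1.3)]
y = [0, 1, 1]
beta = (0.1, 0.8)

# likelihood as a product of Bernoulli pmfs with pi_i = Phi(X_i beta) ...
L = 1.0
for xi, yi in zip(X, y):
    pi = Phi(sum(b * x for b, x in zip(beta, xi)))
    L *= pi**yi * (1 - pi)**(1 - yi)

# ... matches the exponentiated log-likelihood from the slide
ll = 0.0
for xi, yi in zip(X, y):
    pi = Phi(sum(b * x for b, x in zip(beta, xi)))
    ll += yi * math.log(pi) + (1 - yi) * math.log(1 - pi)

print(abs(L - math.exp(ll)) < 1e-12)  # True
```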


Step 4b: Maximize the Likelihood

First implement a function for the log-likelihood:

ll.probit <- function(beta, y, X){
  phi <- pnorm(X %*% beta, log = TRUE)
  opp.phi <- pnorm(X %*% beta, log = TRUE, lower.tail = FALSE)
  logl <- sum(y * phi + (1 - y) * opp.phi)
  return(logl)
}

Notes:

1. the standard normal CDF is evaluated with pnorm; R's built-in log of the CDF (log = TRUE) has greater range than log(pnorm(z)) (try z = -50).

2. lower.tail = FALSE gives Pr(Z ≥ z).

3. a fuller implementation would use a logical test to check that an intercept column has been added.
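The numerical point in note 1 is worth seeing concretely. In double precision the standard normal CDF underflows to exactly 0 far in the left tail, so taking the log of the CDF value fails, while a log-space computation stays finite. A Python sketch, using the leading Mills-ratio term as a stand-in for R's log = TRUE computation:

```python
import math

def Phi(z):
    # standard normal CDF via the error function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

z = -50.0
print(Phi(z) == 0.0)  # True: the CDF underflows to exactly 0 in doubles,
                      # so math.log(Phi(z)) would raise a domain error

# Working in log space avoids this. The leading term of the Mills-ratio
# expansion, log Phi(z) ~ -z^2/2 - log(-z * sqrt(2*pi)) for z << 0,
# stays a perfectly usable finite number (about -1254.8 here).
log_phi_tail = -0.5 * z * z - math.log(-z * math.sqrt(2.0 * math.pi))
print(round(log_phi_tail, 1))
```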


Step 4b: Maximize the Likelihood

y <- as$attempt
X <- as[, c("tenure", "age", "dem_score", "civil_war",
            "war", "pop", "energy")]
yX <- na.omit(cbind(y, X))
y <- yX[, 1]; X <- cbind(1, yX[, -1])

opt <- optim(par = rep(0, ncol(X)), fn = ll.probit, y = y, X = X,
             method = "BFGS",
             control = list(fnscale = -1, maxit = 1000),
             hessian = TRUE)

opt$par
[1] -1.836166916  0.029141779 -0.005751652 -0.012433873  0.066287021  0.317451778  0.040863289
[8]  0.026859441


Step 4c: Estimate the Variance-Covariance Matrix

vcov <- solve(-opt$hessian)

Now we can draw from the approximate sampling distribution of β:

MASS::mvrnorm(n=1, mu=opt$par, Sigma=vcov)

[1] -1.819984146 -0.001830225 -0.005933452 -0.012464456 0.122449059 0.380434336 0.072157828

[8] 0.008879418

This is stochastic so we do it again and get a different answer:

MASS::mvrnorm(n=1, mu=opt$par, Sigma=vcov)

[1] -1.792636772 0.081477117 -0.006457063 -0.013436530 0.019081307 0.255634394 0.036075468

[8] 0.073840550
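Under the hood, a multivariate normal sampler like mvrnorm draws z ~ N(0, I) and maps it through a factor of Σ. A Python sketch of that idea for a hypothetical 2×2 mean and covariance, with a hand-rolled Cholesky factor:

```python
import math
import random
import statistics

random.seed(7)

# hypothetical mean vector and covariance matrix (two parameters)
mu = [1.0, -2.0]
Sigma = [[0.5, 0.2],
         [0.2, 0.3]]

# Cholesky factor of a 2x2 covariance: Sigma = L L^T
l11 = math.sqrt(Sigma[0][0])
l21 = Sigma[1][0] / l11
l22 = math.sqrt(Sigma[1][1] - l21**2)

# each draw maps independent standard normals through mu + L z
draws = []
for _ in range(20000):
    z1, z2 = random.gauss(0.0, 1.0), random.gauss(0.0, 1.0)
    draws.append((mu[0] + l11 * z1,
                  mu[1] + l21 * z1 + l22 * z2))

m1 = statistics.mean(d[0] for d in draws)
v1 = statistics.variance([d[0] for d in draws])
print(round(m1, 1), round(v1, 1))  # close to mu[0] = 1.0 and Sigma[0][0] = 0.5
```

With enough draws, the sample mean and variance of each coordinate recover mu and the diagonal of Sigma, which is exactly what the repeated mvrnorm calls above exploit.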


5. Quantities of Interest

What not to do...

            est      SE
Intercept  -1.8362  1.4012
tenure      0.0291  0.2937
age        -0.0058  0.0251
dem_score  -0.0124  0.0445
civil_war   0.0663  0.8918
war         0.3175  1.0141
pop         0.0409  0.2368
energy      0.0269  0.2432

ses <- sqrt(diag(solve(-opt$hessian)))

table.dat <- cbind(opt$par, ses)

rownames(table.dat) <- colnames(X)

xtable::xtable(table.dat, digits = 4)


5. Quantities of Interest

1. Simulate parameters from the multivariate normal.

2. Run Xβ through the inverse link function to get the original parameter (typically the distribution mean).

3. Draw from the distribution of Y for predicted values.


5. Quantities of Interest

General considerations:

a. Incorporating estimation uncertainty.

b. Incorporating fundamental uncertainty when making predictions.

c. Establishing appropriate baseline values for QOI, and considering plausible changes in those values.
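Considerations (a) and (b) have a concrete consequence: predicted values are far more dispersed than expected values, because they add fundamental uncertainty on top of estimation uncertainty. A Python sketch with a hypothetical scalar probit estimate and standard error (made-up numbers, not from the slides' model):

```python
import math
import random

random.seed(3)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# hypothetical scalar point estimate and standard error for a probit index
beta_hat, se = -1.84, 0.30

evs, pvs = [], []
for _ in range(5000):
    b = random.gauss(beta_hat, se)               # (a) estimation uncertainty
    p = Phi(b)
    evs.append(p)                                # expected value: pi itself
    pvs.append(1 if random.random() < p else 0)  # (b) plus fundamental uncertainty

mean_ev = sum(evs) / len(evs)
mean_pv = sum(pvs) / len(pvs)
var_ev = sum((e - mean_ev) ** 2 for e in evs) / len(evs)
var_pv = sum((v - mean_pv) ** 2 for v in pvs) / len(pvs)
print(var_pv > var_ev)  # True: predicted values carry much more spread
```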


Expected Values

For this model we will be interested in estimating the predicted probability of an assassination attempt at some setting of the covariate values. In general, E[y|X].

Let's consider a potentially high-risk situation (we'll call it “highrisk”, X_HR); then we can manipulate the risk factors:

Var.        Value
tenure      -0.30
age         54.00
dem_score   -3.00
civil_war    0.00
war          0.00
pop         -0.18
energy      -0.23


Expected Values

What's the estimated probability of an assassination attempt at X_HR?

Draw β:

beta.draws <- MASS::mvrnorm(10000, mu = opt$par, Sigma = vcov)
dim(beta.draws)
[1] 10000     8

Now we simulate the outcome (warning: inefficient code!):

nsims <- 10000
nsims2 <- 10000  # outcome draws per beta draw
p.ests <- vector(length = nrow(beta.draws))
for(i in 1:nsims){
  p.ass.att <- pnorm(highrisk %*% beta.draws[i, ])
  outcomes <- rbinom(nsims2, 1, p.ass.att)
  p.ests[i] <- mean(outcomes)
}

> mean(p.ests)
[1] 0.0166266
> quantile(p.ests, .025); quantile(p.ests, .975)
  2.5%  97.5%
0.0134 0.0201
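The same simulation scheme can be written in any language. A Python sketch with hypothetical scalar stand-ins for opt$par and its standard error (the covariate profile X_HR is folded into the single index, so the numbers will not match the slide's output):

```python
import math
import random

random.seed(42)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# hypothetical scalar stand-ins for opt$par and its standard error
beta_hat, se = -1.84, 0.30
nsims, nsims2 = 2000, 2000

p_ests = []
for _ in range(nsims):
    b = random.gauss(beta_hat, se)                 # draw beta
    p = Phi(b)                                     # inverse link
    successes = sum(random.random() < p for _ in range(nsims2))
    p_ests.append(successes / nsims2)              # one simulated E[y]

p_ests.sort()
mean_est = sum(p_ests) / nsims
lo, hi = p_ests[int(0.025 * nsims)], p_ests[int(0.975 * nsims)]
print(round(mean_est, 3), round(lo, 3), round(hi, 3))
```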


Expected Values

What are the steps that I just took?

1. simulate from the estimated sampling distribution of β to incorporate estimation uncertainty.

2. start our for-loop which will do steps 3-5 each time.

3. combine one β draw with X_HR as X_HR β, then plug into Φ() to get the probability of an attempt for that β draw.

4. draw a bunch of outcomes from the Bernoulli(Φ(X_HR β)).

5. average over those draws to get one simulated E[y|X_HR].

6. return to step 3.

[Figure: Density of Expected Values at High Risk (N = 10000, Bandwidth = 0.000275)]


Expected Values: A Shortcut

There is a shorter way to come up with the same answer, but it requires some care in its application.

beta.draws <- MASS::mvrnorm(10000, mu = opt$par,
                            Sigma = solve(-opt$hessian))
p.ests2 <- pnorm(highrisk %*% t(beta.draws))

> mean(p.ests2)
[1] 0.01659705
> quantile(p.ests2, .025); quantile(p.ests2, .975)
      2.5%      97.5%
0.01395935 0.01955867

This shortcut works because E[y|X_HR] = π_HR; i.e., the parameter is the expected value of the outcome.


When wouldn’t this work?

Ex.: suppose that y_i ∼ Expo(λ_i) where λ_i = exp(X_i β). We could find our likelihood, insert our parameterization of λ_i for each i, and then maximize to find β as usual.

Thus, for some baseline set of covariates X_BL, we now have a simulated sampling distribution for λ_BL, which has a mean at E[exp(X_BL β)].

It's not too hard to show that if y ∼ Expo(λ), then E[y] = 1/λ. The temptation is then to declare that because E[λ_BL] = E[exp(X_BL β)], then E[y] = 1/E[exp(X_BL β)].

It turns out this is not the case because E[1/λ] ≠ 1/E[λ]. The first averages over the sampling distribution of the means of y. The second averages over the sampling distribution of λ and then plugs into the formula for the mean of y.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 55 / 242
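This pitfall is easy to check numerically. The Python sketch below uses a hypothetical sampling distribution for λ_BL (lognormal, chosen purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)
# hypothetical simulated sampling distribution of lambda_BL
lam_draws = rng.lognormal(mean=0.0, sigma=0.5, size=100_000)

avg_of_inverse = np.mean(1.0 / lam_draws)  # averages draws of E[y] = 1/lambda
inverse_of_avg = 1.0 / np.mean(lam_draws)  # plugs the average lambda into 1/lambda

print(avg_of_inverse, inverse_of_avg)
```

Because 1/x is convex, Jensen's inequality guarantees the first quantity exceeds the second; the first one, averaging over draws of the mean of y, is the right object to report.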


When wouldn’t this work?

Why this annoying wrinkle? Jensen's inequality: for a random variable X, E[g(X)] ≠ g(E[X]) in general (E[g(X)] ≥ g(E[X]) if g(·) is convex, and ≤ if g(·) is concave).

Why can we use our shortcut with the probit model? If Y ∼ Bern(π), then E[Y] = π. Our guess would then be that E[Y] = E[Φ(X_HR β)], which is fine because 1 · E[Φ(X_HR β)] = E[1 · Φ(X_HR β)].

Rule of thumb: if E[Y] = θ, you are safe taking the shortcut.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 56 / 242
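The rule of thumb can also be verified by simulation. The Python sketch below uses hypothetical draws of π_HR (a Beta distribution, for illustration only) and compares the shortcut to the full simulation that additionally draws Y from the stochastic component:

```python
import numpy as np

rng = np.random.default_rng(2)
pi_draws = rng.beta(2, 50, size=200_000)   # hypothetical draws of pi_HR
y_draws = rng.binomial(1, pi_draws)        # full simulation: one Y per pi draw

shortcut = pi_draws.mean()   # average the parameter draws directly
full_sim = y_draws.mean()    # average the simulated outcomes
print(shortcut, full_sim)
```

The two agree up to Monte Carlo error because E[Y] = π is linear in the parameter, so there is no Jensen's-inequality gap.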


More Expected Values

What if I want a bunch of these, to see how expected values change with some variable?

dem.rng <- -10:10

p.ests <- matrix(data = NA, ncol = length(dem.rng), nrow = 10000)

for (j in 1:length(dem.rng)) {
  highrisk.dem <- highrisk
  highrisk.dem["dem_score"] <- dem.rng[j]
  p.ests[, j] <- pnorm(highrisk.dem %*% t(beta.draws))
}

plot(dem.rng, apply(p.ests, 2, mean), ylim = c(0, .028))

segments(x0 = dem.rng, x1 = dem.rng,
         y0 = apply(p.ests, 2, quantile, .025),
         y1 = apply(p.ests, 2, quantile, .975))

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 57 / 242


[Figure: simulated expected probability of an assassination attempt (y-axis, roughly 0.000 to 0.020) plotted against Dem Score (x-axis, −10 to 10), with 95% interval bars at each value.]

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 58 / 242


Predicted Values

What if someone asks me to predict whether an assassination will take place? If I were to simulate from the distribution of β, and then draw one value from the stochastic component for each simulation, I would get a predictive distribution for y|X.

Q: What will it look like?

There is no need to actually conduct the simulation, though. The simulated outcomes will be Bern(E[y|X]) = Bern(.0166). How is this different from the linear regression case?

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 59 / 242


First Differences

Compare expected values of the outcome for two different scenarios, usually with all predictors held constant but one. Recall from our regression results that war seemed to have a big positive effect on the probability of an assassination attempt.

So let's find: E[y | X_War] − E[y | X_Nowar].

Each of these is just a fitted value for the probability parameter, with all covariates at the highrisk values except war, which we control.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 60 / 242


First Differences

highrisk.war <- highrisk
highrisk.war["war"] <- 1
highrisk.nowar <- highrisk
highrisk.nowar["war"] <- 0

fd.ests <- pnorm(highrisk.war %*% t(beta.draws)) -
  pnorm(highrisk.nowar %*% t(beta.draws))

> mean(fd.ests)
[1] 0.01891

> quantile(fd.ests, .025); quantile(fd.ests, .975)
 2.5%  97.5%
0.00578 0.03609

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 61 / 242


Simulating (Parameter) Estimation Uncertainty

To take one random draw of all the parameters γ = (β, α) from their “sampling distribution” (or “posterior distribution” with a flat prior):

1. Estimate the model by maximizing the likelihood function; record the point estimates γ̂ and the variance matrix V̂(γ̂).

2. Draw the vector γ̃ from the multivariate normal distribution:

γ̃ ∼ N(γ̂, V̂(γ̂))

Denote the draw γ̃ = vec(β̃, α̃), which has k elements.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 62 / 242
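A minimal Python sketch of these two steps, with hypothetical numbers standing in for the MLE output (gamma_hat for the point estimates, V_hat for the inverse of the negative Hessian):

```python
import numpy as np

rng = np.random.default_rng(5)
gamma_hat = np.array([0.5, -1.2, 0.03])        # step 1: point estimates
V_hat = np.array([[0.04, 0.00, 0.00],          # step 1: variance matrix
                  [0.00, 0.09, 0.01],
                  [0.00, 0.01, 0.02]])

# step 2: one draw from the approximate sampling distribution
gamma_tilde = rng.multivariate_normal(gamma_hat, V_hat)
print(gamma_tilde)
```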


Simulating the Distribution of Predicted Values, Ỹ

Predicted values can be for:

1. Forecasts: about the future

2. Farcasts: about some area for which you have no y

3. Nowcasts: about the current data (perhaps to reproduce it to see whether it fits)

To simulate one predicted value, follow these steps:

1. Draw one value of γ̃ = vec(β̃, α̃).

2. Choose a predicted value to compute, defined by one value for each explanatory variable, collected as the vector X_c.

3. Extract the simulated β̃ from γ̃; compute θ̃_c = g(X_c, β̃) (from the systematic component).

4. Simulate the outcome variable Ỹ_c ∼ f(θ̃_c, α̃) (from the stochastic component).

Repeat the algorithm, say, M = 1000 times to produce 1000 predicted values. Use these to compute a histogram of the full posterior, the average, the variance, percentile values, or other summaries.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 63 / 242
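The four steps above can be sketched in Python for a probit model. All estimates are hypothetical; Phi (the standard normal CDF) plays the role of the systematic component g, and the Bernoulli draw is the stochastic component f:

```python
import math
import numpy as np

rng = np.random.default_rng(3)
gamma_hat = np.array([-2.0, 0.4])   # hypothetical point estimates
V_hat = np.diag([0.02, 0.01])       # hypothetical variance matrix
x_c = np.array([1.0, 1.0])          # step 2: chosen covariate profile X_c

M = 1000
y_c = np.empty(M)
for m in range(M):
    beta = rng.multivariate_normal(gamma_hat, V_hat)           # step 1: draw parameters
    theta = 0.5 * (1 + math.erf((beta @ x_c) / math.sqrt(2)))  # step 3: theta_c = Phi(X_c beta)
    y_c[m] = rng.binomial(1, theta)                            # step 4: draw the outcome

print(y_c.mean())  # summarize the M predicted values
```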


The Distribution of Expected v. Predicted Values

1. Predicted values: draws of Y that are or could be observed

2. Expected values: draws of fixed features of the distribution of Y, such as E(Y).

3. Predicted values: include estimation and fundamental uncertainty.

4. Expected values: average away fundamental uncertainty

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 64 / 242


5. The variance of expected values (but not predicted values) goes to 0 as n gets large.

6. Example use of the predicted value distribution: the probability of temperature colder than 32° tomorrow. (Predicted temperature is uncertain because we have to estimate it and because of natural fluctuations.)

7. Example use of the expected value distribution: the probability that the average temperature on days like tomorrow will be colder than 32°. (Expected temperature is only uncertain because we have to estimate it; natural fluctuations in temperature don't affect the average.)

8. Which to use for causal effects & first differences?

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 65 / 242

Simulating the Distribution of Expected Values: An Algorithm

1. Draw one value of γ = vec(β, α).

2. Choose one value for each explanatory variable (Xc is a vector).

3. Taking the one set of simulated β from γ, compute θc = g(Xc, β) (from the systematic component).

4. Draw m values of the outcome variable Y_c^(k) (k = 1, ..., m) from the stochastic component f(θc, α). (This step simulates fundamental uncertainty.)

5. Average over the fundamental uncertainty by calculating the mean of the m simulations to yield one simulated expected value: E(Yc) = (1/m) Σ_{k=1}^m Y_c^(k).

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 66 / 242
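The algorithm above can be sketched for a logit model. This is a minimal illustration, not the slides' own code: the coefficient estimates (beta_hat), their variance-covariance matrix (vcov), and the covariate profile (Xc) are all hypothetical values standing in for an actual MLE fit.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical logit estimates: beta_hat and its estimated
# variance-covariance matrix (in practice, from an MLE fit).
beta_hat = np.array([-1.0, 0.5])
vcov = np.array([[0.04, -0.01],
                 [-0.01, 0.02]])

Xc = np.array([1.0, 2.0])  # chosen covariate profile (intercept, x)
m = 1000                   # draws over fundamental uncertainty

# Step 1: one draw of the parameters (estimation uncertainty).
beta_tilde = rng.multivariate_normal(beta_hat, vcov)

# Step 3: systematic component, here the inverse logit.
pi_c = 1.0 / (1.0 + np.exp(-Xc @ beta_tilde))

# Step 4: m draws from the stochastic component (Bernoulli).
y_draws = rng.binomial(1, pi_c, size=m)

# Step 5: average away fundamental uncertainty.
expected_value = y_draws.mean()
print(expected_value)
```

Running the whole block once yields a single simulated expected value; repeating it M times (next slide) traces out the distribution of expected values.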

Simulating Expected Values: Notes

1. When m = 1, this algorithm produces predicted values.

2. With large m, this algorithm better represents and averages over the fundamental uncertainty.

3. Repeat the entire algorithm M times (say 1000), with results differing only due to estimation uncertainty.

4. Use to compute a histogram, average, standard error, confidence interval, etc.

5. When E(Yc) = θc, we can skip the last two steps. E.g., in the logit model, once we simulate πi, we don't need to draw Y and then average to get back to πi. (If you're unsure, do it anyway!)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 67 / 242
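Repeating the algorithm M times and exploiting the shortcut in note 5 (for the logit, E(Yc) = πc) might look like the sketch below; the estimates are again hypothetical placeholders for a real fit.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical logit estimates (assumed, not from the slides' data).
beta_hat = np.array([-1.0, 0.5])
vcov = np.array([[0.04, -0.01],
                 [-0.01, 0.02]])
Xc = np.array([1.0, 2.0])

M = 1000  # repeats over estimation uncertainty

# For the logit, E(Yc) = pi_c, so skip drawing Y and averaging:
# each row of beta_tilde is one draw of the parameters.
beta_tilde = rng.multivariate_normal(beta_hat, vcov, size=M)
ev_draws = 1.0 / (1.0 + np.exp(-(beta_tilde @ Xc)))

# Summaries: point estimate, standard error, 95% confidence interval.
point = ev_draws.mean()
se = ev_draws.std(ddof=1)
lo, hi = np.percentile(ev_draws, [2.5, 97.5])
print(point, se, lo, hi)
```

The M draws in ev_draws can feed directly into a histogram or table.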

Simulating First Differences

To draw one simulated first difference:

1. Choose vectors Xs, the starting point, and Xe, the ending point.

2. Apply the expected value algorithm twice, once for Xs and once for Xe (but reuse the random draws).

3. Take the difference in the two expected values.

4. (To save computation time, and improve the approximation, use the same simulated β in each.)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 68 / 242
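A sketch of the first-difference recipe, with hypothetical logit estimates and two illustrative covariate profiles; note how the same simulated β is reused for both profiles, per step 4.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical logit estimates (assumed for illustration).
beta_hat = np.array([-1.0, 0.5])
vcov = np.array([[0.04, -0.01],
                 [-0.01, 0.02]])

Xs = np.array([1.0, 0.0])  # starting covariate profile
Xe = np.array([1.0, 1.0])  # ending covariate profile

M = 1000
# Reuse the same simulated betas for both profiles: this cancels
# shared estimation noise and sharpens the first-difference estimate.
beta_tilde = rng.multivariate_normal(beta_hat, vcov, size=M)

ev_start = 1.0 / (1.0 + np.exp(-(beta_tilde @ Xs)))
ev_end = 1.0 / (1.0 + np.exp(-(beta_tilde @ Xe)))
first_diff = ev_end - ev_start
print(first_diff.mean(), np.percentile(first_diff, [2.5, 97.5]))
```

Each element of first_diff is one simulated first difference; summarize the M of them for a point estimate and interval.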

Tricks for Simulating Parameters

1. Simulate all parameters (in γ), including ancillary parameters, together, unless you know they are orthogonal.

2. Reparameterize to an unbounded scale to

- make γ converge more quickly in n (and so work better with small n) to a multivariate normal. (MLEs don't change, but the posteriors do.)

- make the maximization algorithm work faster, without constraints.

3. To do this, all estimated parameters should be unbounded and logically symmetric. E.g.,

- σ² = e^η (i.e., wherever you see σ² in your log-likelihood function, replace it with e^η).

- For a probability, π = [1 + e^(−η)]^(−1) (a logit transformation).

- For −1 ≤ ρ ≤ 1, use ρ = (e^(2η) − 1)/(e^(2η) + 1) (Fisher's Z transformation).

In all 3 cases, η is unbounded: estimate it, simulate from it, and reparameterize back to the scale you care about.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 69 / 242
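The three reparameterizations can be sketched as inverse maps from the unbounded η back to the bounded scale; the estimate and standard error for η below are made-up numbers, chosen only to show that every simulated draw lands inside the logical bounds.

```python
import numpy as np

rng = np.random.default_rng(2)

# The three reparameterizations from the slide, as inverse maps
# from the unbounded eta back to the bounded scale.
def to_sigma2(eta):     # sigma^2 = e^eta > 0
    return np.exp(eta)

def to_pi(eta):         # pi = [1 + e^(-eta)]^(-1), in (0, 1)
    return 1.0 / (1.0 + np.exp(-eta))

def to_rho(eta):        # rho = (e^(2 eta) - 1)/(e^(2 eta) + 1), in (-1, 1)
    return np.tanh(eta) # algebraically identical to the Fisher Z inverse

# Simulate eta on the unbounded scale (hypothetical estimate and SE),
# then map the draws back: every draw respects the logical bounds.
eta_draws = rng.normal(loc=0.3, scale=0.5, size=1000)
sigma2 = to_sigma2(eta_draws)
pi = to_pi(eta_draws)
rho = to_rho(eta_draws)
print(sigma2.min() > 0, pi.min() > 0, pi.max() < 1, np.abs(rho).max() < 1)
```

Simulating on the η scale and transforming back is what keeps, e.g., a simulated variance from going negative even when its standard error is large.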

Tricks for Simulating Quantities of Interest

1. Unless you're sure, always compute simulations of Y and use that as a basis for creating simulations of other quantities. (This will get all information from the model into the simulations.)

2. Simulating functions of Y

(a) If some function of Y, such as ln(Y), is used, simulate ln(Y) and then apply the inverse function exp(ln(Y)) to reveal Y.

(b) The usual, but wrong, way: regress ln(Y) on X, compute the predicted value of ln(Y), and exponentiate.

(c) It's wrong because the regression estimates E[ln(Y)], but E[ln(Y)] ≠ ln[E(Y)], so exp(E[ln(Y)]) ≠ E(Y).

(d) More generally, E(g[Y]) ≠ g[E(Y)], unless g[·] is linear.

3. Check the approximation error of your simulation algorithm: run it twice and check the number of digits of precision that don't change. If that's not enough for your tables, increase M (or m) and try again.

4. Analytical calculations and other tricks can speed simulation, or improve precision.

5. Canned software options: Clarify in Stata, Zelig in R.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 70 / 242
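Point 2 can be checked numerically. In this sketch, ln(Y) is assumed to be Normal(μ, σ²), as after a log-linear regression; μ and σ are made-up values. Exponentiating the average of ln(Y) (the "usual, but wrong, way") systematically undershoots the average obtained by inverting draw-by-draw first, because exp is convex.

```python
import numpy as np

rng = np.random.default_rng(3)

# Suppose ln(Y) ~ Normal(mu, sigma^2), as after regressing ln(Y) on X.
mu, sigma = 1.0, 0.8
log_y = rng.normal(mu, sigma, size=100_000)

# Wrong: exponentiate the expected value of ln(Y).
wrong = np.exp(log_y.mean())   # estimates exp(E[ln(Y)]) = e^mu

# Right: simulate ln(Y), invert draw-by-draw, then average.
right = np.exp(log_y).mean()   # estimates E(Y) = e^(mu + sigma^2 / 2)

print(wrong, right)
```

The gap between the two (a factor of e^(σ²/2) for the lognormal) is exactly the E(g[Y]) ≠ g[E(Y)] problem from point 2(d).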

Page 369: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

Tricks for Simulating Quantities of Interest

1. Unless you’re sure, always compute simulations of Y and use that as a basisfor creating simulations of other quantities. (This will get all information fromthe model in the simulations.)

2. Simulating functions of Y

(a) If some function of Y , such as ln(Y ), is used, simulate ln(Y ) and thenapply the inverse function exp(ln(Y )) to reveal Y .

(b) The usual, but wrong way: Regress ln(Y ) on X , compute predicted value

ln(Y ) and exponentiate.(c) Its wrong because the regression estimates E [ln(Y )], but

E [ln(Y )] 6= ln[E (Y )], so exp(E [ln(Y )]) 6= Y(d) More generally, E (g [Y ]) 6= g [E (Y )], unless g [·] is linear.

3. Check the approximation error of your simulation algorithm: Run it twice,check the number of digits of precision that don’t change. If its not enoughfor your tables, increase M (or m) and try again.

4. Analytical calculations and other tricks can speed simulation, or precision.

5. Canned Software Options: Clarify in Stata, Zelig in R

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 70 / 242

Replication of Rosenstone and Hansen, from King, Tomz and Wittenberg (2000)

1. Logit of reported turnout on Age, Age², Education, Income, and Race

2. Quantity of Interest: (nonlinear) effect of age on Pr(vote|X), holding constant Income and Race.

3. Use M = 1000 and compute 99% CI:

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 71 / 242

To create this graph, simulate:

1. Set age=24, education=high school, income=average, Race=white

2. Run logistic regression

3. Simulate 1000 β’s

4. Compute the 1000 corresponding πᵢ = [1 + exp(−xᵢβ)]⁻¹

5. Sort in numerical order

6. Take 5th and 995th values as the 99% confidence interval

7. Plot a vertical line on the graph at age=24 representing the CI.

8. Repeat for other ages and for college degree.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 72 / 242
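The steps above can be sketched as follows (Python rather than the course's R). The coefficient vector and variance matrix here are hypothetical stand-ins; in practice they would come from the fitted logit:

```python
import numpy as np

# Sketch of the simulation algorithm for a 99% CI on Pr(vote | x).
rng = np.random.default_rng(1)
beta_hat = np.array([-4.0, 0.12, -0.001, 0.25])   # intercept, age, age^2, educ (made up)
V = np.diag([0.25, 0.001, 1e-8, 0.01])            # hypothetical Var(beta_hat)

# Step 3: simulate 1000 betas from their asymptotic sampling distribution.
betas = rng.multivariate_normal(beta_hat, V, size=1000)

# Steps 1 and 4: fix a covariate profile (age = 24, high school) and compute
# pi = [1 + exp(-x beta)]^{-1} for each simulated beta.
x = np.array([1.0, 24.0, 24.0**2, 1.0])
pis = 1.0 / (1.0 + np.exp(-(betas @ x)))

# Steps 5-6: sort and take the 5th and 995th values as the 99% interval.
pis.sort()
lo, hi = pis[4], pis[994]
print(round(lo, 3), round(hi, 3))
```

Repeating this loop over a grid of ages (and again for the college-degree profile) produces the vertical CI bars in the figure.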

Replication of Garrett (King, Tomz and Wittenberg 2000)

Dependent variable: Government Spending as % of GDP
Key explanatory variable: left-labor power (high = solid line; low = dashed)
Garrett used only point estimates to distinguish the eight quantities represented above. What new information do we learn with this approach? Left-labor power only has a clear effect when exposure to trade or capital mobility is high.

See Last Semester’s Slides

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 73 / 242

1 Binary Outcome Models

2 Quantities of Interest
    An Example with Code
    Predicted Values
    First Differences
    General Algorithms

3 Model Diagnostics for Binary Outcome Models

4 Ordered Categorical

5 Unordered Categorical

6 Event Count Models
    Poisson
    Overdispersion
    Binomial for Known Trials

7 Duration Models
    Exponential Model
    Weibull Model
    Cox Proportional Hazards Model

8 Duration-Logit Correspondence

9 Appendix: Multinomial Models

10 Appendix: More on Overdispersed Poisson

11 Appendix: More on Binomial Models

12 Appendix: Gamma Regression

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 74 / 242

Model Diagnostics for Binary Outcome Models

How do you quantify the goodness of fit of your model?

Note: Goodness of fit may or may not be very important

A model that fits well is likely to be a good predictive model

But it does not guarantee that the model is a good causal model

Pseudo-R²: a generalization of R² beyond the linear world

R² = 1 − ℓ(β̂MLE) / ℓ(ȳ) ∈ [0, 1]

ℓ(ȳ): log-likelihood of the null model, which sets πᵢ = ȳ for all i

This one is due to McFadden (1974); many other variants exist

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 75 / 242
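A minimal sketch of computing McFadden's pseudo-R² on toy data (Python rather than the course's R; for simplicity the sketch plugs in the true probabilities where an actual analysis would use the MLE fitted values):

```python
import numpy as np

# Toy logit data, made up for illustration.
rng = np.random.default_rng(2)
n = 2000
x = rng.normal(size=n)
p_true = 1.0 / (1.0 + np.exp(-(0.5 + 1.5 * x)))
y = rng.binomial(1, p_true)

def bernoulli_loglik(y, p):
    # Sum of log Bernoulli densities.
    return np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

# Null model: pi_i = ybar for all i.
ll_null = bernoulli_loglik(y, np.full(n, y.mean()))

# "Fitted" model: true probabilities stand in for the MLE fit here.
ll_model = bernoulli_loglik(y, p_true)

pseudo_r2 = 1.0 - ll_model / ll_null
print(round(pseudo_r2, 2))  # between 0 and 1; higher means better fit
```

Since both log-likelihoods are negative and the fitted model's is closer to zero, the ratio lies in (0, 1] and the pseudo-R² lands in [0, 1).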

How do you know which model is better?

1. Out-of-sample forecasts (or farcasts)

(a) Your job: find the underlying (persistent) structure, not the idiosyncratic features of any one data set.

(b) Set aside some (test) data.

(c) Fit your model to the rest (the training data).

(d) Make predictions from the model fit on the training set; compare to the test set.

(e) Compare to both the average prediction and the full distribution.

(f) E.g., if a set of predictions has Pr(y = 1) = 0.2, then 20% of those observations in the test set should be 1s.

(g) The best test sets are really out of sample, not even available yet.

(h) If the world changes, an otherwise good model will fail. But it’s still the right test.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 76 / 242
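The calibration check in (f) can be sketched as follows (Python rather than the course's R), using simulated stand-ins for test-set predictions and outcomes:

```python
import numpy as np

# Among test observations whose predicted Pr(y = 1) is near 0.2, about 20%
# should actually be 1s. Toy data stand in for real train/test predictions.
rng = np.random.default_rng(3)
p_hat = rng.uniform(0.05, 0.95, size=50_000)   # pretend test-set predictions
y_test = rng.binomial(1, p_hat)                # outcomes consistent with them

# Group predictions into bins and compare each bin's mean prediction to the
# observed share of 1s in that bin.
bins = np.linspace(0, 1, 11)
which = np.digitize(p_hat, bins)
for b in range(1, 11):
    mask = which == b
    print(f"predicted ~{p_hat[mask].mean():.2f}  observed {y_test[mask].mean():.2f}")
```

A well-calibrated model shows the two columns tracking each other across bins; systematic gaps indicate the model is over- or under-confident in that range.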

Page 403: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

How do you know which model is better?

1. Out-of-sample forecasts (or farcasts)

(a) Your job: find the underlying (persistent) structure, not the idiosyncraticfeatures of any one data set.

(b) Set aside some (test) data.(c) Fit your model to the rest (the training data).(d) Make predictions with training set; compare to the test set.(e) Comparisons to average prediction and full distribution.(f) E.g., if a set of predictions have Pr(y = 1) = 0.2, then 20% of these

observations in the test set should be 1s.(g) The best test sets are really out of sample, not even available yet.(h) If the world changes, an otherwise good model will fail. But it’s still the

right test.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 76 / 242

Page 404: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

How do you know which model is better?

1. Out-of-sample forecasts (or farcasts)

(a) Your job: find the underlying (persistent) structure, not the idiosyncraticfeatures of any one data set.

(b) Set aside some (test) data.(c) Fit your model to the rest (the training data).(d) Make predictions with training set; compare to the test set.(e) Comparisons to average prediction and full distribution.(f) E.g., if a set of predictions have Pr(y = 1) = 0.2, then 20% of these

observations in the test set should be 1s.(g) The best test sets are really out of sample, not even available yet.(h) If the world changes, an otherwise good model will fail. But it’s still the

right test.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 76 / 242

Page 405: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

How do you know which model is better?

1. Out-of-sample forecasts (or farcasts)

(a) Your job: find the underlying (persistent) structure, not the idiosyncraticfeatures of any one data set.

(b) Set aside some (test) data.

(c) Fit your model to the rest (the training data).(d) Make predictions with training set; compare to the test set.(e) Comparisons to average prediction and full distribution.(f) E.g., if a set of predictions have Pr(y = 1) = 0.2, then 20% of these

observations in the test set should be 1s.(g) The best test sets are really out of sample, not even available yet.(h) If the world changes, an otherwise good model will fail. But it’s still the

right test.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 76 / 242

Page 406: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

How do you know which model is better?

1. Out-of-sample forecasts (or farcasts)

(a) Your job: find the underlying (persistent) structure, not the idiosyncraticfeatures of any one data set.

(b) Set aside some (test) data.(c) Fit your model to the rest (the training data).

(d) Make predictions with training set; compare to the test set.(e) Comparisons to average prediction and full distribution.(f) E.g., if a set of predictions have Pr(y = 1) = 0.2, then 20% of these

observations in the test set should be 1s.(g) The best test sets are really out of sample, not even available yet.(h) If the world changes, an otherwise good model will fail. But it’s still the

right test.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 76 / 242

Page 407: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

How do you know which model is better?

1. Out-of-sample forecasts (or farcasts)

(a) Your job: find the underlying (persistent) structure, not the idiosyncraticfeatures of any one data set.

(b) Set aside some (test) data.(c) Fit your model to the rest (the training data).(d) Make predictions with training set; compare to the test set.

(e) Comparisons to average prediction and full distribution.(f) E.g., if a set of predictions have Pr(y = 1) = 0.2, then 20% of these

observations in the test set should be 1s.(g) The best test sets are really out of sample, not even available yet.(h) If the world changes, an otherwise good model will fail. But it’s still the

right test.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 76 / 242


(See Trevor Hastie et al. 2001. The Elements of Statistical Learning, Springer, Chapter 7: Fig 7.1.)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 77 / 242

(i) Binary variable predictions require a normative decision.

- Let C be the number of times more costly misclassifying a 1 is than a 0.
- C must be chosen independently of the data.
- C could come from your philosophical justification, a survey of policy makers, a review of the literature, etc.
- People often choose C = 1, but without justification.
- Decision theory: choose Y = 1 when π > 1/(1 + C) and 0 otherwise.
  • If C = 1, predict y = 1 when π > 0.5
  • If C = 2, predict y = 1 when π > 1/3
- Only with C chosen can we compute (a) the % of 1s correctly predicted, (b) the % of 0s correctly predicted, and (c) patterns in errors in different subsets of the data or forecasts.

(j) If you can't justify a choice for C, use ROC (receiver operating characteristic) curves.

- Compute the %1s and %0s correctly predicted for every possible value of C.
- Plot %1s by %0s.
- Overlay curves for several models on the same graph.
- If one curve is above another the whole way, then that model dominates the other: it's better no matter your normative decision (about C).
- Otherwise, one model is better than the other only in specified ranges of C (i.e., for only some normative perspectives).

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 78 / 242
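The threshold rule and the ROC construction above can be sketched as follows. This is a minimal illustration under the slide's definitions; the function names and the toy data are my own, and a real ROC curve would be traced over a finer grid of cutoffs.

```python
def threshold(C):
    """Decision-theoretic cutoff: predict y = 1 when pi > 1 / (1 + C),
    where C is how many times more costly misclassifying a 1 is than a 0."""
    return 1.0 / (1.0 + C)

def roc_points(probs, outcomes):
    """For each candidate cutoff, compute (% of 0s correct, % of 1s correct).
    Overlaying these point sets for several models compares them across
    every possible C at once."""
    ones = [p for p, y in zip(probs, outcomes) if y == 1]
    zeros = [p for p, y in zip(probs, outcomes) if y == 0]
    points = []
    for t in sorted(set(probs)):
        pct_1s = sum(p > t for p in ones) / len(ones)     # 1s correctly predicted
        pct_0s = sum(p <= t for p in zeros) / len(zeros)  # 0s correctly predicted
        points.append((pct_0s, pct_1s))
    return points

print(threshold(1))  # C = 1 gives the familiar 0.5 cutoff
print(threshold(2))  # C = 2 gives a cutoff of 1/3
```

A model whose curve lies above another's at every point dominates it regardless of the C a reader prefers.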

In-sample ROC, on left (from Gary King and Langche Zeng, "Improving Forecasts of State Failure," World Politics, Vol. 53, No. 4 (July 2001): 623-58).

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 79 / 242

4. Cross-validation

(a) The idea: set aside k observations as the "test set"; evaluate; then set aside another set of k observations. Repeat multiple times; report performance averaged over the subsets.
(b) Useful for smaller data sets; real test sets are better.

5. Fit, in general: look for all possible observable implications of a model, and compare them to observations. (Think. Be creative here!)

6. Fit: continuous variables

(a) The usual regression diagnostics
(b) E.g., plots of e = y − ŷ by X, Y, or ŷ
(c) Check more than the means. E.g., plot e by ŷ and draw lines at 0 and at ±1 and ±2 standard errors; 66% and 95% of the observations should fall between the lines.
(d) For graphics:

- transform bounded variables
- transform heteroskedastic results
- highlight key results; label everything

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 80 / 242
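The cross-validation idea in step 4 can be sketched as a generic k-fold loop. This is a hedged sketch, not the course's code: `fit` and `score` stand in for whatever model and evaluation metric you use, and the toy example below uses the training mean as the "model" and mean squared error as the score.

```python
def k_fold_cv(data, k, fit, score):
    """Split data into k held-out 'test sets'; fit on the rest, score on the
    held-out fold, and report performance averaged over the folds."""
    fold_size = len(data) // k
    scores = []
    for i in range(k):
        test = data[i * fold_size:(i + 1) * fold_size]          # held-out fold
        train = data[:i * fold_size] + data[(i + 1) * fold_size:]  # the rest
        model = fit(train)
        scores.append(score(model, test))
    return sum(scores) / k

# Toy example: "model" = training mean, "score" = mean squared error.
fit = lambda train: sum(train) / len(train)
score = lambda m, test: sum((y - m) ** 2 for y in test) / len(test)
print(k_fold_cv([1.0, 2.0, 3.0, 4.0], 2, fit, score))  # prints 4.25
```

As the slide notes, this is most useful when the data set is too small to set aside a genuine test set.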

Page 430: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

4. Cross-validation

(a) The idea: set aside k observations as the “test set”: evaluate; and thenset aside another set of k observations. Repeat multiple times; reportperformance averaged over subsets

(b) Useful for smaller data sets; real test sets are better.

5. Fit, in general: Look for all possible observable implications of a model,and compare to observations. (Think. Be creative here!)

6. Fit: continuous variables

(a) The usual regression diagnostics(b) E.G., plots of e = y − y by X , Y or y(c) Check more than the means. E.g., plot e by y and draw a line at 0 and at±1, 2 se’s. 66%, 95% of the observations should fall between the lines.

(d) For graphics:

F transform bounded variablesF transform heteroskedastic resultsF highlight key results; label everything

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 80 / 242

Page 431: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

4. Cross-validation

(a) The idea: set aside k observations as the “test set”: evaluate; and thenset aside another set of k observations. Repeat multiple times; reportperformance averaged over subsets

(b) Useful for smaller data sets; real test sets are better.

5. Fit, in general: Look for all possible observable implications of a model,and compare to observations. (Think. Be creative here!)

6. Fit: continuous variables

(a) The usual regression diagnostics(b) E.G., plots of e = y − y by X , Y or y(c) Check more than the means. E.g., plot e by y and draw a line at 0 and at±1, 2 se’s. 66%, 95% of the observations should fall between the lines.

(d) For graphics:

F transform bounded variablesF transform heteroskedastic resultsF highlight key results; label everything

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 80 / 242

Page 432: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

4. Cross-validation

(a) The idea: set aside k observations as the “test set”: evaluate; and thenset aside another set of k observations. Repeat multiple times; reportperformance averaged over subsets

(b) Useful for smaller data sets; real test sets are better.

5. Fit, in general: Look for all possible observable implications of a model,and compare to observations. (Think. Be creative here!)

6. Fit: continuous variables

(a) The usual regression diagnostics(b) E.G., plots of e = y − y by X , Y or y(c) Check more than the means. E.g., plot e by y and draw a line at 0 and at±1, 2 se’s. 66%, 95% of the observations should fall between the lines.

(d) For graphics:

F transform bounded variablesF transform heteroskedastic resultsF highlight key results; label everything

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 80 / 242

Page 433: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

4. Cross-validation

(a) The idea: set aside k observations as the “test set”: evaluate; and thenset aside another set of k observations. Repeat multiple times; reportperformance averaged over subsets

(b) Useful for smaller data sets; real test sets are better.

5. Fit, in general: Look for all possible observable implications of a model,and compare to observations. (Think. Be creative here!)

6. Fit: continuous variables

(a) The usual regression diagnostics(b) E.G., plots of e = y − y by X , Y or y(c) Check more than the means. E.g., plot e by y and draw a line at 0 and at±1, 2 se’s. 66%, 95% of the observations should fall between the lines.

(d) For graphics:

F transform bounded variablesF transform heteroskedastic resultsF highlight key results; label everything

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 80 / 242

Page 434: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

4. Cross-validation

(a) The idea: set aside k observations as the “test set”: evaluate; and thenset aside another set of k observations. Repeat multiple times; reportperformance averaged over subsets

(b) Useful for smaller data sets; real test sets are better.

5. Fit, in general: Look for all possible observable implications of a model,and compare to observations. (Think. Be creative here!)

6. Fit: continuous variables

(a) The usual regression diagnostics

(b) E.G., plots of e = y − y by X , Y or y(c) Check more than the means. E.g., plot e by y and draw a line at 0 and at±1, 2 se’s. 66%, 95% of the observations should fall between the lines.

(d) For graphics:

F transform bounded variablesF transform heteroskedastic resultsF highlight key results; label everything

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 80 / 242


Page 441: Soc504: Generalized Linear Models - Princeton University

7. Fit: dichotomous variables

(a) Sort estimated probabilities into bins of, say, 0.1 width: [0, 0.1), [0.1, 0.2), ..., [0.9, 1].

(b) From the observations in each bin, compute (a) the mean prediction (probably near 0.05, 0.15, etc.) and (b) the average fraction of 1s.

(c) Plot (a) by (b) and look for systematic deviation from the 45° line.

In-sample calibration graph on right (from Gary King and Langche Zeng, “Improving Forecasts of State Failure,” World Politics, Vol. 53, No. 4 (July 2001): 623–658).

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 81 / 242
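The binning procedure in point 7 is straightforward to compute by hand. A minimal Python sketch (the helper name and toy data are invented; plotting is omitted):

```python
def calibration_table(p_hat, y, width=0.1):
    """Bin predicted probabilities into [0, width), [width, 2*width), ...
    and return (mean prediction, fraction of 1s, count) per nonempty bin."""
    n_bins = int(round(1 / width))
    bins = [[] for _ in range(n_bins)]
    for p, yi in zip(p_hat, y):
        j = min(int(p / width), n_bins - 1)  # place p == 1.0 in the last bin
        bins[j].append((p, yi))
    table = []
    for contents in bins:
        if not contents:
            continue
        mean_p = sum(p for p, _ in contents) / len(contents)
        frac_1 = sum(yi for _, yi in contents) / len(contents)
        table.append((mean_p, frac_1, len(contents)))
    return table  # plot mean_p against frac_1 and compare to the 45-degree line

# Perfectly calibrated toy data: within each bin, the fraction of 1s matches p.
p_hat = [0.25, 0.25, 0.25, 0.25, 0.75, 0.75, 0.75, 0.75]
y     = [1, 0, 0, 0, 1, 1, 1, 0]
table = calibration_table(p_hat, y)
```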


Page 446: Soc504: Generalized Linear Models - Princeton University

Out-of-sample calibration graph on right.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 82 / 242

Page 447: Soc504: Generalized Linear Models - Princeton University

Out-of-Sample with Cross-Validation

[Figure, two panels. Left: “Fitted Probabilities vs. Actual Outcomes” — mean fitted Prob(Y=1) (x-axis, 0–1) against the fraction of Y=1 (y-axis, 0–1). Right: “ROC Curve for the Fearon and Laitin (2003) Data” — proportion of 1's correctly classified against proportion of 0's correctly classified.]

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 83 / 242
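The ROC curve in the right panel plots, across classification thresholds, the proportion of 1's correctly classified against the proportion of 0's correctly classified. A minimal sketch of how such points might be computed (toy data, not the Fearon and Laitin data):

```python
def roc_points(p_hat, y):
    """For each candidate threshold, return
    (proportion of 1s correctly classified, proportion of 0s correctly classified)."""
    ones = [p for p, yi in zip(p_hat, y) if yi == 1]
    zeros = [p for p, yi in zip(p_hat, y) if yi == 0]
    points = []
    for t in sorted(set(p_hat)):
        sens = sum(p >= t for p in ones) / len(ones)   # 1s classified as 1
        spec = sum(p < t for p in zeros) / len(zeros)  # 0s classified as 0
        points.append((sens, spec))
    return points

p_hat = [0.1, 0.4, 0.6, 0.9]
y     = [0,   0,   1,   1]
pts = roc_points(p_hat, y)  # a perfectly separating model reaches (1.0, 1.0)
```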

Page 448: Soc504: Generalized Linear Models - Princeton University

New Developments: Separation plots

Greenhill, Ward and Sacks (2011)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 84 / 242
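A separation plot orders observations by fitted probability and paints each actual outcome in that order, so a well-fitting model pushes the 1's to the high-probability end. A toy text-mode sketch of the idea (not Greenhill, Ward and Sacks' own implementation):

```python
def separation_string(p_hat, y):
    """Sort cases by fitted probability and render the actual outcomes in
    that order: '#' for y=1, '.' for y=0. Good fit -> '#'s cluster right."""
    order = sorted(range(len(y)), key=lambda i: p_hat[i])
    return "".join("#" if y[i] == 1 else "." for i in order)

p_hat = [0.9, 0.2, 0.7, 0.1, 0.8, 0.3]
y     = [1,   0,   1,   0,   1,   0]
s = separation_string(p_hat, y)  # "...###" for this well-separated toy model
```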


Page 455: Soc504: Generalized Linear Models - Princeton University

New Developments

Hanmer and Kalkan distinguish between:

the average-case approach (most common)

the observed-value approach (they argue it is better)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 85 / 242


Page 458: Soc504: Generalized Linear Models - Princeton University

The Case for Observed Case

Consider the case of voting for Bush in 2004. The average case is:

a white

48-year-old woman

who identifies as independent and a political moderate

with an associate's degree

who believes economic performance has been constant

who disapproves of the Iraq war, but not strongly

with an income between $45K and $50K

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 86 / 242


Page 467: Soc504: Generalized Linear Models - Princeton University

The Case for Observed Case

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 87 / 242

Page 468: Soc504: Generalized Linear Models - Princeton University

The Case for Observed Case

Try to come up with an argument for why the average-case method will tend to produce bigger changes than the observed-case method.

Average cases are likely to be in the “middle” of the data, where the predicted probabilities are changing the fastest. Think about Bill O'Reilly and Rachel Maddow: they will show up in the observed-case method but not in the average-case method.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 88 / 242
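This intuition can be checked numerically with a toy logit: cases far out in the tails (the O'Reillys and Maddows) contribute almost nothing to a first difference, pulling the observed-value average below the average-case change. The coefficient and data below are invented for illustration.

```python
import math

def p(x, beta=2.0):
    """Toy logit: Pr(Y = 1) = logistic(beta * x)."""
    return 1 / (1 + math.exp(-beta * x))

xs = [-4.0, -3.5, 0.0, 3.5, 4.0]  # mostly extreme cases; mean is 0
dx = 0.5                           # a first difference of +0.5 in x

# Average-case: the change for a single hypothetical case held at mean(x).
x_bar = sum(xs) / len(xs)
avg_case = p(x_bar + dx) - p(x_bar)

# Observed-value: the change averaged over the actual cases.
obs_value = sum(p(x + dx) - p(x) for x in xs) / len(xs)
```

Here `avg_case` sits on the steep middle of the logistic curve while most observed cases sit on its flat tails, so `avg_case > obs_value`.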


Page 470: Soc504: Generalized Linear Models - Princeton University

The Case for Observed Case

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 89 / 242

Page 471: Soc504: Generalized Linear Models - Princeton University

UPM: remainder of Chapter 5.

Optionally: Greenhill et al. 2011; Hanmer and Kalkan 2013

Also helpful: Mood 2010; Berry, DeMeritt and Esarey 2010

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 90 / 242

Page 472: Soc504: Generalized Linear Models - Princeton University

1 Binary Outcome Models

2 Quantities of Interest
  An Example with Code
  Predicted Values
  First Differences
  General Algorithms

3 Model Diagnostics for Binary Outcome Models

4 Ordered Categorical

5 Unordered Categorical

6 Event Count Models
  Poisson
  Overdispersion
  Binomial for Known Trials

7 Duration Models
  Exponential Model
  Weibull Model
  Cox Proportional Hazards Model

8 Duration-Logit Correspondence

9 Appendix: Multinomial Models

10 Appendix: More on Overdispersed Poisson

11 Appendix: More on Binomial Models

12 Appendix: Gamma Regression

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 91 / 242


Page 474: Soc504: Generalized Linear Models - Princeton University

Modeling Ordered Outcomes

Suppose that we have an outcome which is one of J choices that are ordered in a substantively meaningful way.

Examples:
I “Likert scale” in survey questions (“strongly agree”, “agree”, etc.)
I Party positions (extreme left, center left, center, right, extreme right)
I Levels of democracy (autocracy, anocracy, democracy)
I Health status (healthy, sick, dying, dead)

Why not use continuous outcome models?
−→ Don't want to assume equal distances between levels.

Why not use categorical outcome models?
−→ Don't want to waste information about the ordering.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 92 / 242


Page 484: Soc504: Generalized Linear Models - Princeton University

Ordered Dependent Variable Models

The model:

Y*_i ∼ STN(y*_i | µ_i)
µ_i = x_i β

Observation mechanism:

y_ij = { 1 if τ_{j−1,i} ≤ y*_i ≤ τ_{j,i}
       { 0 otherwise

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 93 / 242
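With a standard-normal latent error (the ordered-probit case), the probability of observing category j is Φ(τ_j − µ_i) − Φ(τ_{j−1} − µ_i), with τ_0 = −∞ and τ_J = +∞. A minimal sketch of these category probabilities (the cutpoints and µ below are invented):

```python
import math

def Phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def ordered_probs(mu, taus):
    """Category probabilities for an ordered probit with linear predictor mu
    and interior cutpoints taus (tau_0 = -inf and tau_J = +inf are implied)."""
    cuts = [-math.inf] + list(taus) + [math.inf]
    return [Phi(cuts[j + 1] - mu) - Phi(cuts[j] - mu)
            for j in range(len(cuts) - 1)]

# Four ordered categories from three invented cutpoints.
probs = ordered_probs(mu=0.3, taus=[-1.0, 0.0, 1.0])
```

Because the cutpoints partition the real line, the category probabilities sum to one by construction.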


Ordered Logit and Probit Models

Again, the latent variable representation: Y∗_i = X_i⊤β + ε_i

Assume that Y∗_i gives rise to Y_i based on the following scheme:

    Y_i = 1 if −∞ (= ψ_0) < Y∗_i ≤ ψ_1,
    Y_i = 2 if ψ_1 < Y∗_i ≤ ψ_2,
    ...
    Y_i = J if ψ_{J−1} < Y∗_i ≤ ∞ (= ψ_J)

where ψ_1, ..., ψ_{J−1} are the threshold parameters to be estimated

If X_i contains an intercept, one of the ψ's must be fixed for identifiability (typically ψ_1 = 0)

ε_i i.i.d. ∼ logistic ⇒ the ordered logit model:

    Pr(Y_i ≤ j | X_i) = exp(ψ_j − X_i⊤β) / (1 + exp(ψ_j − X_i⊤β))

ε_i i.i.d. ∼ N(0, 1) ⇒ the ordered probit model:

    Pr(Y_i ≤ j | X_i) = Φ(ψ_j − X_i⊤β)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 94 / 242
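The two cumulative-probability formulas differ only in the CDF applied to ψ_j − X_i⊤β. A short pure-Python sketch (the thresholds ψ and linear predictor here are hypothetical illustration values):

```python
import math

def logistic_cdf(z):
    # CDF of the standard logistic: exp(z) / (1 + exp(z)) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + math.exp(-z))

def normal_cdf(z):
    # Phi(z) via the error function (avoids needing scipy)
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pr_leq(j, xb, psi, link="logit"):
    """Pr(Y_i <= j | X_i): the chosen CDF evaluated at psi_j - X_i'beta."""
    F = logistic_cdf if link == "logit" else normal_cdf
    return F(psi[j - 1] - xb)

psi = [0.0, 1.0, 2.5]      # hypothetical thresholds, psi_1 fixed at 0
xb = 0.7                   # hypothetical linear predictor X_i'beta
cum_logit = [pr_leq(j, xb, psi, "logit") for j in (1, 2, 3)]
cum_probit = [pr_leq(j, xb, psi, "probit") for j in (1, 2, 3)]
print([round(p, 3) for p in cum_logit])
print([round(p, 3) for p in cum_probit])
```

Because the ψ_j are increasing, the cumulative probabilities are increasing in j under either link.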


Ordered Logit and Probit Models

[Figure: the CDF F(Y∗_i) and density f(Y∗_i) of the latent variable Y∗_i ∼ N(X_i⊤β, 1). The thresholds ψ_1, ψ_2, ψ_3 partition the latent scale, so the areas under the density between them give Pr(Y_i = 1), Pr(Y_i = 2), Pr(Y_i = 3), and the CDF evaluated at ψ_j gives Pr(Y_i ≤ j).]

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 95 / 242

Ordered Dependent Variable Models: Connections

1. Y∗_i ∼ STL(y∗_i | µ_i) → ordinal logit; Y∗_i ∼ STN(y∗_i | µ_i) → ordinal probit

2. Alternate representation: a dichotomous variable Y_ji for each category j, only one of which is 1; the others are 0.

3. If Y∗_i is observed, the probit version is a linear-normal regression model

4. If a dichotomous realization of Y∗ is observed, it's a logit/probit model

5. This is the same model, and the same parameters are being estimated; only the observation mechanism differs.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 96 / 242


Deriving the likelihood function

First the probability of each observation, then the joint probability.

    Pr(Y_ij = 1) = Pr(τ_{j−1} ≤ Y∗_i ≤ τ_j)
                 = ∫_{τ_{j−1}}^{τ_j} STN(y∗_i | µ_i) dy∗_i
                 = F_stn(τ_j | µ_i) − F_stn(τ_{j−1} | µ_i)
                 = F_stn(τ_j | x_i β) − F_stn(τ_{j−1} | x_i β)

The joint probability is then:

    P(Y) = ∏_{i=1}^{n} ∏_{j=1}^{J} Pr(Y_ij = 1)^{y_ij}

The inner product over j has only one active component for each i, since y_ij = 1 for exactly one category.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 97 / 242
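The per-observation probabilities are just differences of the latent CDF at consecutive cutpoints, with τ_0 = −∞ and τ_J = +∞. A minimal probit-case sketch (µ and τ are hypothetical values):

```python
import math

def normal_cdf(z):
    # Phi(z) via the error function; erf handles +/-inf correctly
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def category_probs(mu, tau):
    """Pr(Y_ij = 1) = F_stn(tau_j | mu_i) - F_stn(tau_{j-1} | mu_i),
    with tau_0 = -inf and tau_J = +inf."""
    cuts = [float("-inf")] + list(tau) + [float("inf")]
    return [normal_cdf(cuts[j] - mu) - normal_cdf(cuts[j - 1] - mu)
            for j in range(1, len(cuts))]

p = category_probs(mu=0.4, tau=[-1.0, 0.0, 1.2])   # hypothetical values
print([round(pj, 3) for pj in p], round(sum(p), 6))
```

Because the bins partition the real line, the J category probabilities always sum to 1.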


Deriving the likelihood function

The log-likelihood:

    ln L(β, τ | y) = Σ_{i=1}^{n} Σ_{j=1}^{J} y_ij ln Pr(Y_ij = 1)
                   = Σ_{i=1}^{n} Σ_{j=1}^{J} y_ij ln [F_stn(τ_j | x_i β) − F_stn(τ_{j−1} | x_i β)]

(Constraints during optimization make this more complicated to do from scratch: τ_{j−1} < τ_j, ∀ j)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 98 / 242
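One common way to handle the τ_{j−1} < τ_j constraint is to reparameterize: estimate the first cutpoint directly and the gaps on the log scale, so an unconstrained optimizer can be used. A sketch for the probit case, with made-up toy data and parameter values:

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def cutpoints(theta):
    """tau_1 = theta_1 and tau_j = tau_{j-1} + exp(theta_j): the exp()
    keeps tau_{j-1} < tau_j automatically, so the optimizer can search
    over unconstrained theta instead of ordered tau."""
    tau = [theta[0]]
    for t in theta[1:]:
        tau.append(tau[-1] + math.exp(t))
    return tau

def loglik(beta, theta, x, y):
    """ln L(beta, tau | y) for an ordered probit with scalar x_i."""
    cuts = [float("-inf")] + cutpoints(theta) + [float("inf")]
    return sum(math.log(normal_cdf(cuts[yi] - xi * beta)
                        - normal_cdf(cuts[yi - 1] - xi * beta))
               for xi, yi in zip(x, y))

x, y = [-1.2, 0.3, 0.9, 2.0], [1, 2, 3, 4]   # hypothetical toy data
ll = loglik(0.8, [-1.0, 0.0, 0.2], x, y)
print(round(ll, 3))
```

Passing `loglik` (negated) to a generic optimizer over (β, θ) then maximizes the likelihood without any explicit inequality constraints.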


Interpretation: Ordinal Probit

1. Coefficients are the linear effect of X on Y∗ in standard deviation units

2. Predictions from the model are J probabilities that sum to 1.

3. One first difference has an effect on all J probabilities.

4. When one probability goes up, at least one of the others must go down.

5. Can use ternary diagrams if J = 3

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 99 / 242
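Points 2–4 can be checked numerically: shift X⊤β once and every category probability moves, with the changes summing to zero. A small probit sketch with hypothetical thresholds:

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def probs(xb, psi):
    # J category probabilities as differences of Phi at the thresholds
    cuts = [float("-inf")] + list(psi) + [float("inf")]
    return [normal_cdf(cuts[j] - xb) - normal_cdf(cuts[j - 1] - xb)
            for j in range(1, len(cuts))]

psi = [-1.0, 0.0, 1.2]                       # hypothetical thresholds
p0, p1 = probs(0.0, psi), probs(0.5, psi)    # before/after a shift in X'beta
fd = [b - a for a, b in zip(p0, p1)]         # first difference, per category
print([round(d, 3) for d in fd], round(sum(fd), 6))
```

Since both probability vectors sum to 1, the first differences must sum to 0: any probability that rises is offset by others falling.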


Representing 3 variables, with Y_j ∈ [0, 1] and Σ_{j=1}^{3} Y_j = 1

[Figure: ternary diagram with vertices C, L, and A; the axes V_iC, V_iA, and V_iL each run from 0 to 1, and each plotted point is one observation's three probabilities.]

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 100 / 242

Calculating Quantities of Interest

Predicted probability:

    π_ij(X_i) ≡ Pr(Y_i = j | X_i) = Pr(Y_i ≤ j | X_i) − Pr(Y_i ≤ j − 1 | X_i)

    = exp(ψ_j − X_i⊤β) / (1 + exp(ψ_j − X_i⊤β)) − exp(ψ_{j−1} − X_i⊤β) / (1 + exp(ψ_{j−1} − X_i⊤β))   for logit

    = Φ(ψ_j − X_i⊤β) − Φ(ψ_{j−1} − X_i⊤β)   for probit

ATE (APE): τ_j = E[π_j(T_i = 1, W_i) − π_j(T_i = 0, W_i)]

Estimate β and ψ via MLE, plug the estimates in, replace E with (1/n)Σ, and compute the CI by the delta method, Monte Carlo, or the bootstrap.

Note that X_i⊤β appears both before and after the minus sign in π_ij
→ The direction of the effect of X_i on Y_ij is ambiguous (except for the top and bottom categories)
→ Again, calculate quantities of interest, not just coefficients

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 101 / 242
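The plug-in ATE replaces E with a sample average of first differences in π_j. A probit sketch of that step; the "estimates" (ψ, treatment and covariate coefficients) and the W_i values are hypothetical stand-ins for MLE output:

```python
import math

def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pi_j(j, xb, psi):
    """pi_ij = Phi(psi_j - X'beta) - Phi(psi_{j-1} - X'beta) (probit case)."""
    cuts = [float("-inf")] + list(psi) + [float("inf")]
    return normal_cdf(cuts[j] - xb) - normal_cdf(cuts[j - 1] - xb)

# hypothetical MLE plug-ins: thresholds, treatment and covariate coefficients
psi, b_t, b_w = [-1.0, 0.0, 1.2], 0.9, 0.01
w = [20, 35, 50, 65]                 # hypothetical covariate values W_i
# (1/n) sum_i [ pi_j(T=1, W_i) - pi_j(T=0, W_i) ], one ATE per category j
ate = [sum(pi_j(j, b_t + b_w * wi, psi) - pi_j(j, b_w * wi, psi)
           for wi in w) / len(w)
       for j in (1, 2, 3, 4)]
print([round(t, 3) for t in ate], round(sum(ate), 6))
```

A Monte Carlo CI would wrap this in draws of (β, ψ) from their estimated sampling distribution; as with first differences, the J category ATEs sum to zero.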


Page 534: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

Example: Immigration and Media PrimingBrader, Valentino and Suhay (2008):

Yi : Ordinal response to question about increasing immigration

T1i ,T2i : Media cues (immigrant ethnicity × story tone)

Wi : Respondent age and income

Estimated coefficients:

Coefficients:

Value s.e. t

tone 0.27 0.32 0.85

eth -0.33 0.32 -1.02

ppage 0.01 0.02 1.40

ppincimp 0.00 0.03 0.06

tone:eth 0.90 0.46 2.16

Intercepts:

Value s.e. t

1|2 -1.93 0.58 -3.32

2|3 -0.12 0.55 -0.21

3|4 1.12 0.56 2.01

ATE:

[Figure: average treatment effect (European/Positive vs. Latino/Negative cue, y-axis from −0.4 to 0.4) on each response category — Increased a Lot, Increased a Little, Decreased a Little, Decreased a Lot — for "Do you think the number of immigrants from foreign countries should be increased or decreased?"]

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 102 / 242
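The slide's estimates can be plugged in directly. This sketch assumes the polr-style cumulative logit Pr(Y ≤ j) = logistic(ζ_j − η); the coding of the cues (tone = 1 for a negative story, eth = 1 for a Latino immigrant) and the age/income values are assumptions made for illustration, not taken from the paper.

```python
import numpy as np

# Coefficients and intercepts copied from the slide's output.
beta = {"tone": 0.27, "eth": -0.33, "ppage": 0.01, "ppincimp": 0.00,
        "tone:eth": 0.90}
cuts = np.array([-1.93, -0.12, 1.12])   # intercepts 1|2, 2|3, 3|4

def probs(tone, eth, age=45.0, inc=8.0):   # age/income values are illustrative
    eta = (beta["tone"] * tone + beta["eth"] * eth + beta["ppage"] * age
           + beta["ppincimp"] * inc + beta["tone:eth"] * tone * eth)
    gamma = 1.0 / (1.0 + np.exp(-(cuts - eta)))   # Pr(Y <= j)
    gamma = np.concatenate([[0.0], gamma, [1.0]])
    return np.diff(gamma)                          # 4 category probabilities

# First difference between the assumed cue conditions, per response category.
diff = probs(1, 1) - probs(0, 0)
print(diff)
```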

Example: Peer Bereavement

Andersen, Silver, Koperwas, Stewart and Kirschbaum (2013):

Study of responses by college students to a sequence of 14 peer deaths in one academic year.

Yi : Severity of acute reaction (1-5 scale)

Xi : gender, number of peers known, media exposure

                  Coef  SE    CI            P-value  RR Strong (95% CI)  RR Extreme (95% CI)
Female            0.89  0.23  (0.44, 1.35)  0        2.71 (1.51, 4.82)   13.69 (2.84, 45.55)
Num. Peers Known  0.54  0.16  (0.23, 0.85)  0        1.43 (1.15, 1.82)   3.53 (1.60, 7.25)
Media Exposure    0.25  0.06  (0.13, 0.36)  0        1.17 (1.08, 1.30)   1.73 (1.31, 2.38)

The risk ratio measures the relative probability of being in the outcome category at different values of the independent variable. Thus the RR for the Strong Reaction category for Female can be understood as

RR_Strong = Pr(Strong Reaction | Female) / Pr(Strong Reaction | Male)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 103 / 242
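A risk ratio of this form is just a ratio of two plug-in category probabilities. The sketch below uses the slide's Female coefficient (0.89) but made-up cutpoints and sets the other covariates to zero, so it illustrates the calculation rather than reproducing the 2.71 in the table.

```python
import numpy as np

def ordered_logit_probs(eta, psi):
    """Category probabilities for one observation with linear predictor eta."""
    gamma = 1.0 / (1.0 + np.exp(-(np.asarray(psi) - eta)))  # Pr(Y <= j)
    gamma = np.concatenate([[0.0], gamma, [1.0]])
    return np.diff(gamma)

# Hypothetical cutpoints for the 5-category severity scale.
psi = [-0.5, 0.5, 1.5, 2.5]
p_female = ordered_logit_probs(0.89, psi)   # Female coefficient from the slide
p_male = ordered_logit_probs(0.0, psi)

STRONG = 3   # index of the "Strong" category
rr_strong = p_female[STRONG] / p_male[STRONG]
print(rr_strong)   # relative probability of a Strong reaction, Female vs. Male
```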

Example: Peer Bereavement

[Figure omitted: two panels of expected probability of reaction by Severity of Acute Reaction — No Reaction, Slight, Moderate, Strong, Extreme]

Figure: This plot shows the expected probabilities of being in each category of reaction given gender (left) and knowing 1 to 4 people (right) with 95% confidence intervals.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 104 / 242

Ordered Categorical Conclusions

Straightforward to derive from latent variable representation

Ordered probit is often easier to work with; ordered logit has a nice interpretation as a proportional odds model:

γ_ij / (1 − γ_ij) = λ_j exp(x_i β)

where γ_ij is the cumulative probability and λ_j is the baseline odds. Covariates raise or lower the odds of a response in category j or below.

Visualization and appropriate quantities of interest can be tricky. Let the substance guide you.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 105 / 242
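The proportional-odds property can be checked numerically: under an ordered logit, the ratio of cumulative odds between two covariate values is the same at every cutpoint. The coefficient and cutpoints here are illustrative, not from any fitted model.

```python
import math

beta = 0.7
psi = [-1.0, 0.0, 1.5]   # hypothetical cutpoints

def cum_odds(x, psi_j):
    """Cumulative odds gamma_j / (1 - gamma_j) at covariate value x."""
    gamma = 1.0 / (1.0 + math.exp(-(psi_j - beta * x)))
    return gamma / (1.0 - gamma)

# Odds ratio comparing x = 1 to x = 0 at each cutpoint.
ratios = [cum_odds(1.0, pj) / cum_odds(0.0, pj) for pj in psi]
print(ratios)   # identical across cutpoints: exp(-beta) = exp(-0.7)
```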

1 Binary Outcome Models

2 Quantities of Interest
   An Example with Code
   Predicted Values
   First Differences
   General Algorithms

3 Model Diagnostics for Binary Outcome Models

4 Ordered Categorical

5 Unordered Categorical

6 Event Count Models
   Poisson
   Overdispersion
   Binomial for Known Trials

7 Duration Models
   Exponential Model
   Weibull Model
   Cox Proportional Hazards Model

8 Duration-Logit Correspondence

9 Appendix: Multinomial Models

10 Appendix: More on Overdispersed Poisson

11 Appendix: More on Binomial Models

12 Appendix: Gamma Regression

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 106 / 242

Multinomial Logit

Sometimes we encounter unordered categories (e.g. choosing a Ph.D. program: Sociology, Politics, Psychology, Statistics).

We can generalize the binary logit model to get the multinomial logit model

π_ij = Pr(Yi = j | Xi) = exp(X_i⊤β_j) / ∑_{k=1}^{J} exp(X_i⊤β_k),

where Xi = individual-specific characteristics of unit i and each β_j is a category-specific set of coefficients (one category omitted for identification).

Multinomial logit also has a latent variable interpretation: make the choice with the greatest utility Y*_ij. When the stochastic component of the utility is ε_ij, i.i.d. from the type I extreme value distribution, multinomial logit is implied.

Coefficients are relative to a baseline category, so again we want to compute quantities of interest for interpretation.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 107 / 242
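The probability formula above is a softmax over category scores. A minimal sketch, with hypothetical coefficients and the first category's coefficient vector fixed at zero for identification:

```python
import numpy as np

def mnl_probs(x, betas):
    """Multinomial logit probabilities via softmax over category scores.

    betas: (J, p) array, one row per category; the first row is fixed at
    zero (the omitted baseline category needed for identification).
    """
    scores = betas @ x        # X_i' beta_j for each category j
    scores -= scores.max()    # subtract max for numerical stability
    e = np.exp(scores)
    return e / e.sum()

# Hypothetical coefficients: J = 3 categories, p = 2 covariates.
betas = np.array([[0.0, 0.0],    # baseline category, beta_1 = 0
                  [0.5, -0.2],
                  [-0.3, 0.8]])
pi = mnl_probs(np.array([1.0, 2.0]), betas)
print(pi)   # J probabilities summing to 1
```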

Independence of Irrelevant Alternatives

Multinomial logit assumes i.i.d. errors in the latent utility model, which implies that unobserved factors affecting Y*_ij are unrelated to those affecting Y*_ik.

MNL makes the Independence of Irrelevant Alternatives (IIA) assumption:

Pr(Choose j | j or k) / Pr(Choose k | j or k) = Pr(Choose j | j or k or l) / Pr(Choose k | j or k or l)   for any l ∈ {1, ..., J}

A classical example of an IIA violation: the red bus-blue bus problem

That is, the multinomial choice reduces to a series of independent pairwise comparisons

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 108 / 242
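The IIA equality is easy to verify numerically: under MNL, the j:k probability ratio depends only on the j and k scores, whatever else is in the choice set. The utility values below are hypothetical numbers for illustration.

```python
import math

# Hypothetical latent-utility scores x' beta_j for three alternatives.
scores = {"j": 1.0, "k": 0.2, "l": 0.6}

def choice_prob(option, choice_set):
    """MNL choice probability of `option` within the given choice set."""
    denom = sum(math.exp(scores[c]) for c in choice_set)
    return math.exp(scores[option]) / denom

# The j:k ratio is the same with or without alternative l in the set.
ratio_two = choice_prob("j", ["j", "k"]) / choice_prob("k", ["j", "k"])
ratio_three = choice_prob("j", ["j", "k", "l"]) / choice_prob("k", ["j", "k", "l"])
print(ratio_two, ratio_three)   # both equal exp(1.0 - 0.2)
```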

Relaxing IIA with Multinomial Probit

To relax IIA, we need to allow the stochastic component of the utility ε_ij to be correlated across choices j for each voter.

Multinomial probit model (MNP):

Y*_i = X_i⊤β + ε_i,   where ε_i ~ i.i.d. N(0, Σ_J)

Y*_i = [Y*_i1 · · · Y*_iJ]⊤,   Xi = [Xi1 · · · XiJ]⊤

Restrictions on the level and scale of Y*_i are needed for identification.

Computation is difficult because the integral is intractable.

Moreover, the number of parameters in Σ_J increases as J gets large, but the data contain little information about Σ_J:

J                           3   4   5   6   7
# of elements in Σ_J        6  10  15  21  28
# of parameters identified  2   5   9  14  20

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 109 / 242
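The table's counts follow from the standard identification argument (stated here as an assumption about how the slide's numbers arise): Σ_J has J(J+1)/2 free elements, while differencing utilities against a base alternative leaves a (J−1)×(J−1) covariance and fixing one variance for scale removes one more parameter, giving J(J−1)/2 − 1 identified parameters.

```python
# Reproduce the slide's table: free elements of Sigma_J vs. parameters
# identified after differencing and fixing the scale.
table = [(J, J * (J + 1) // 2, J * (J - 1) // 2 - 1) for J in range(3, 8)]
for J, elements, identified in table:
    print(J, elements, identified)
```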

1 Binary Outcome Models

2 Quantities of Interest
   An Example with Code
   Predicted Values
   First Differences
   General Algorithms

3 Model Diagnostics for Binary Outcome Models

4 Ordered Categorical

5 Unordered Categorical

6 Event Count Models
   Poisson
   Overdispersion
   Binomial for Known Trials

7 Duration Models
   Exponential Model
   Weibull Model
   Cox Proportional Hazards Model

8 Duration-Logit Correspondence

9 Appendix: Multinomial Models

10 Appendix: More on Overdispersed Poisson

11 Appendix: More on Binomial Models

12 Appendix: Gamma Regression

The Poisson Distribution

It’s a discrete probability distribution which gives the probability that some number of events will occur in a fixed period of time.

Examples:

1. number of terrorist attacks in a given year
2. number of publications by a Professor in a career
3. number of days absent from school for High School Sophomores
4. logo for the Stata Press: [logo image not reproduced]

Poisson distribution’s first principles:

1. Begin with an observation period and count point:
   [Timeline: the observation period runs up to the point where the total count is observed]
2. Assumptions are about: events occurring between start and count observation. The process of event generation is not observed.
3. 0 events occur at the start of the period
4. Observe only: number of events at end of the period
5. No 2 events can occur at the same time
6. Pr(event at time t | all events up to time t − 1) is constant for all t.

The Poisson Distribution

Here is the probability density function (PDF) for a random variable Y that is distributed Pois(λ):

Pr(Y = y) = (λ^y / y!) e^{−λ}.

Suppose Y ∼ Pois(3). What’s Pr(Y = 4)?

Pr(Y = 4) = (3^4 / 4!) e^{−3} = 0.168.

[Figure: the Pois(3) probability mass function, Pr(Y = y) plotted against y = 0, …, 10.]
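The arithmetic in the example is easy to verify numerically; a quick check (Python here, though in the course's R workflow `dpois(4, 3)` does the same thing):

```python
import math

def pois_pmf(y, lam):
    # Pr(Y = y) = (lam**y / y!) * exp(-lam)
    return lam ** y / math.factorial(y) * math.exp(-lam)

print(round(pois_pmf(4, 3.0), 3))  # 0.168, matching the slide
```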

The Poisson Distribution

Recall the PDF for a random variable Y that is distributed Pois(λ):

Pr(Y = y) = (λ^y / y!) e^{−λ}.

Using a little bit of series trickery, it isn’t too hard to show that

E[Y] = ∑_{y=0}^∞ y · (λ^y / y!) e^{−λ} = λ.

It also turns out that Var(Y) = λ, a feature of the model we will discuss later on.
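Both moments can also be checked numerically by summing over the (effectively finite) support; a small sketch:

```python
import math

lam = 3.0
pmf = lambda y: lam ** y / math.factorial(y) * math.exp(-lam)

# Pois(3) puts essentially all of its mass below y = 60, so truncated sums suffice
ys = range(60)
mean = sum(y * pmf(y) for y in ys)
var = sum((y - mean) ** 2 * pmf(y) for y in ys)
print(round(mean, 6), round(var, 6))  # both equal lam = 3
```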

The Poisson Distribution

Poisson data arise when there is some discrete event which occurs (possibly multiple times) at a constant rate for some fixed time period.

This constant rate assumption could be restated: the probability of an event occurring at any moment is independent of whether an event has occurred at any other moment.

Derivation of the distribution has some other technical first principles, but the above is the most important.

Connections to Distributions We Have Seen

Take Binom(n, p) and let n → ∞ and p → 0 holding np = µ constant: the distribution converges to Pois(µ).

If the number of arrivals in the time interval [0, t] follows a Poisson(λt), then the wait times between arrivals are distributed Exponential with mean 1/λ.

If Yj | (X = k) ∼ Multinom(k, pj) with X ∼ Pois(λ), then marginally each Yj ∼ Pois(λpj).

If Xi ∼ Pois(λi) for i = 1, . . . , n independent, then Y = ∑_{i=1}^n Xi ∼ Pois(∑_{i=1}^n λi).
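The binomial-to-Poisson limit in the first bullet can be seen numerically: holding np = µ fixed, the Binom(n, p) PMF approaches the Pois(µ) PMF as n grows. A quick check:

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def pois_pmf(k, lam):
    return lam ** k * math.exp(-lam) / math.factorial(k)

mu = 2.0
gaps = []
for n in (10, 100, 10000):
    p = mu / n                      # hold np = mu constant as n grows
    gap = max(abs(binom_pmf(k, n, p) - pois_pmf(k, mu)) for k in range(10))
    gaps.append(gap)
    print(n, round(gap, 5))         # maximum pointwise gap shrinks with n
```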

The Poisson regression model:

Yi ∼ Poisson(yi | λi)
λi = exp(xiβ)

and, as usual, Yi and Yj are independent ∀ i ≠ j, conditional on X.

The probability density of all the data:

P(y | λ) = ∏_{i=1}^n e^{−λi} λi^{yi} / yi!

The log-likelihood:

ln L(β | y) = ∑_{i=1}^n { yi ln(λi) − λi − ln(yi!) }
            = ∑_{i=1}^n { (xiβ)yi − exp(xiβ) − ln yi! }
            ≐ ∑_{i=1}^n { (xiβ)yi − exp(xiβ) }
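The log-likelihood above is straightforward to code directly. A minimal sketch (Python for illustration; the toy data are hypothetical) which also confirms that for an intercept-only model the MLE puts λ at the sample mean:

```python
import math
import statistics

def pois_loglik(beta, X, y):
    # ln L(beta | y) = sum_i { (x_i beta) y_i - exp(x_i beta) - ln(y_i!) }
    ll = 0.0
    for xi, yi in zip(X, y):
        eta = sum(b * x for b, x in zip(beta, xi))            # linear predictor x_i beta
        ll += yi * eta - math.exp(eta) - math.lgamma(yi + 1)  # lgamma(y+1) = ln(y!)
    return ll

# hypothetical toy data: intercept-only model with counts 2, 3, 4
X = [[1.0], [1.0], [1.0]]
y = [2, 3, 4]

# the intercept-only Poisson MLE is lambda-hat = mean(y), so the
# log-likelihood should peak at beta0 = ln(mean(y))
b_hat = math.log(statistics.mean(y))
for eps in (-0.1, 0.1):
    assert pois_loglik([b_hat], X, y) > pois_loglik([b_hat + eps], X, y)
print(round(b_hat, 4))  # ln(3) = 1.0986
```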

Comparing with the Linear Model

Example: Civil Conflict in Northern Ireland

Background: a conflict largely along religious lines about the status of Northern Ireland within the United Kingdom, and the division of resources and political power between Northern Ireland’s Protestant (mainly Unionist) and Catholic (mainly Republican) communities.

The data: the number of Republican deaths for every month from 1969, the beginning of sustained violence, to 2001 (at which point, most organized violence had subsided). Also, the unemployment rates in the two main religious communities.

Example: Civil Conflict in Northern Ireland

The model: Let Yi = # of Republican deaths in a month. Our sole predictor for the moment will be U^C = the unemployment rate among Northern Ireland’s Catholics.

Our model is then:

Yi ∼ Pois(λi)

and

λi = E[Yi | U^C_i] = exp(β0 + β1 · U^C_i).

Estimate (just as we have all along!)

mod <- glm(repdeaths ~ cathunemp,
           data = troubles, family = poisson(link = "log"))

> summary(mod)$coefficients
             Estimate Std. Error  z value     Pr(>|z|)
(Intercept) 1.295875  0.1805327 7.178064 7.070547e-13
cathunemp   1.406498  0.6689819 2.102445 3.551432e-02

Our fitted model

λi = E[Yi | U^C_i] = exp(1.296 + 1.407 · U^C_i).

[Figure: scatterplot of repdeaths (0 to 50) against cathunemp (roughly 0.10 to 0.40).]

Some fitted and predicted values

Suppose U^C is equal to .2.

library(MASS)  # provides mvrnorm
mod.coef <- coef(mod); mod.vcov <- vcov(mod)
beta.draws <- mvrnorm(10000, mod.coef, mod.vcov)
lambda.draws <- exp(beta.draws[,1] + .2*beta.draws[,2])
outcome.draws <- rpois(10000, lambda.draws)

[Figure: left, density of the expected values E[Y | U] (roughly 4.4 to 5.2); right, histogram of the predicted values Y | U (0 to 15).]
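The same parametric-bootstrap logic can be sketched from the printed coefficient table alone. Since the slide does not show the off-diagonal of vcov(mod), this Python illustration assumes the two coefficients are uncorrelated (a diagonal covariance) — purely a simplifying assumption, which the R code above avoids by using the full vcov:

```python
import math
import random

random.seed(42)

# Coefficients and standard errors copied from the slide's summary(mod);
# the off-diagonal of vcov(mod) is not printed, so we assume the two
# coefficients are uncorrelated (an illustrative simplification).
b0, se0 = 1.295875, 0.1805327   # (Intercept)
b1, se1 = 1.406498, 0.6689819   # cathunemp
uc = 0.2                        # Catholic unemployment rate of interest

lambda_draws = []
for _ in range(20000):
    d0 = random.gauss(b0, se0)
    d1 = random.gauss(b1, se1)
    lambda_draws.append(math.exp(d0 + uc * d1))

mean_lambda = sum(lambda_draws) / len(lambda_draws)
print(round(mean_lambda, 1))  # centered near exp(1.296 + 0.2 * 1.406), about 5 deaths/month
```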

Overdispersion

36% of observations lie outside the 2.5% or 97.5% quantile of the Poisson distribution that we are alleging generated them.

[Figure: the same scatterplot of repdeaths (0 to 50) against cathunemp (0.10 to 0.40).]

Overdispersion in Poisson Model

The Poisson model assumes E(Yi | Xi ) = V(Yi | Xi )

But for many count data, E(Yi | Xi ) < V(Yi | Xi )

Potential sources of overdispersion:
1. unobserved heterogeneity
2. clustering
3. contagion or diffusion
4. (classical) measurement error

Underdispersion can also occur, but it is rare

One solution to this is to modify the Poisson model by assuming:

E(Yi | Xi) = µi = exp(Xi⊤β) and V(Yi | Xi) = Vi = φµi

This is called the overdispersed Poisson regression model

When φ > 1, this corresponds to a type of the negative binomial regression model (more on this later)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 126 / 242
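A common way to estimate φ in practice (the approach behind quasi-Poisson fits) is the Pearson statistic divided by the residual degrees of freedom. A hedged sketch, with simulated counts and stand-in fitted means rather than an actual GLM fit:

```python
# Pearson dispersion estimate: phi-hat = sum((y - mu)^2 / mu) / (n - k).
# mu is a stand-in for fitted means exp(X beta); the data come from a
# gamma-Poisson mixture, so phi-hat should come out well above 1.
import numpy as np

rng = np.random.default_rng(1)
n, k = 500, 2                                  # observations, parameters
mu = np.exp(1.0 + rng.normal(0.0, 0.3, n))     # stand-in fitted means
y = rng.poisson(mu * rng.gamma(1.0, 1.0, n))   # mixture with theta = 1

phi_hat = np.sum((y - mu) ** 2 / mu) / (n - k)
print(phi_hat)
```

Under a correct Poisson model each term (y − µ)²/µ has expectation 1, so φ̂ ≈ 1; values well above 1 signal V(Yi | Xi) = φµi with φ > 1.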


Derivation as a Gamma-Poisson Mixture

Here’s the new stochastic component:

Yi | ςi ∼ Poisson(ςiλi)

ςi ∼ (1/θ) Gamma(θ)

Note that Gamma(θ) implicitly has scale parameter 1, so its mean is θ. This means that (1/θ) Gamma(θ) has mean 1, and so Poisson(ςiλi) has mean λi.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 127 / 242


Derivation as a Gamma-Poisson Mixture

Using a similar approach to that described in UPM pgs. 51-52, we can derive the marginal distribution of Y as

Yi ∼ Negbin(λi , θ)

where

fnb(yi | λi, θ) = [Γ(θ + yi) / (yi! Γ(θ))] · [λi^yi θ^θ / (λi + θ)^(θ + yi)]

Notes:

1. E[Yi] = λi and Var(Yi) = λi + λi²/θ. What values of θ would be evidence against overdispersion?

2. we still have the same old systematic component: λi = exp(Xiβ).

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 128 / 242
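The stated moments can be checked by simulating the mixture directly; a minimal Python sketch:

```python
# Monte Carlo check of the negative binomial moments: with
# Y | s ~ Poisson(s * lam) and s ~ (1/theta) * Gamma(theta),
# the marginal mean is lam and the variance is lam + lam^2 / theta.
import numpy as np

rng = np.random.default_rng(2)
lam, theta, n = 5.0, 2.0, 200_000
s = rng.gamma(theta, 1.0 / theta, n)  # mean 1, variance 1/theta
y = rng.poisson(s * lam)

print(y.mean())  # close to lam = 5
print(y.var())   # close to lam + lam**2 / theta = 17.5
```

This also answers the question on the slide: as θ → ∞ the extra term λ²/θ vanishes and the mixture collapses back to the plain Poisson, so very large θ̂ is evidence against overdispersion.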


Estimates

mod <- zelig(repdeaths ~ cathunemp, data = troubles,

model = "negbin")

summary(mod)

Coefficients:

Estimate Std. Error z value Pr(>|z|)

(Intercept) 1.2959 0.1805 7.178 7.07e-13 ***

cathunemp 1.4065 0.6690 2.102 0.0355 *

---

Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

...

Theta: 0.8551

Std. Err.: 0.0754

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 129 / 242


Overdispersion Handled!

5.68% of observations lie at or above the 95% quantile of the Negative Binomial distribution that we are alleging generated them.

[Scatterplot: repdeaths vs. cathunemp]

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 130 / 242


Binomial Regression Model

Sometimes count data have a known upper bound Mi

Examples:

I # of votes for a third party candidate in precinct with population Mi

I # of children who drop out of high school in a family with Mi children

If Mi “trials” are all independent, we have the binomial distribution:

p(Yi | Mi, πi) = (Mi choose Yi) · πi^Yi · (1 − πi)^(Mi − Yi)

An exponential family with E(Yi ) = Miπi and V(Yi ) = Miπi (1− πi )

We can thus consider a GLM, the binomial regression model, by setting πi = g⁻¹(Xi⊤β)

Common links: logit (canonical), probit, cloglog

Note that if Mi = 1 for all i , this reduces to a binary outcome model

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 131 / 242


Estimation and Overdispersion

The log-likelihood:

ℓ(β | X) = Σ_{i=1}^{N} { Yi log(πi / (1 − πi)) + Mi log(1 − πi) + log (Mi choose Yi) }

Use the standard MLE/GLM machinery to estimate β and calculate quantities of interest

Data are often overdispersed due to dependence between trials

Modify the variance function by including a dispersion parameter:

E(Yi | Xi ) = Miπi and V(Yi | Xi ) = φMiπi (1− πi )

Estimate β and φ via QMLE

This is a GLM, so we have the same robustness properties as the Poisson case

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 132 / 242
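The expansion above is just an algebraic rearrangement of the binomial log-pmf; a quick Python check (the M, π, and Y values here are simulated, not from any real dataset):

```python
# Verify that Y*log(pi/(1-pi)) + M*log(1-pi) + log C(M, Y), summed over
# observations, equals the sum of binomial log-pmfs.
import numpy as np
from scipy.stats import binom
from scipy.special import comb

rng = np.random.default_rng(3)
n = 50
M = rng.integers(1, 30, n)                               # known trial counts
pi = 1.0 / (1.0 + np.exp(-rng.normal(0.2, 1.0, n)))      # inverse-logit means
Y = rng.binomial(M, pi)

expanded = np.sum(Y * np.log(pi / (1 - pi))
                  + M * np.log(1 - pi)
                  + np.log(comb(M, Y)))
direct = binom.logpmf(Y, M, pi).sum()
print(bool(np.isclose(expanded, direct)))  # True
```

Writing the log-likelihood this way makes the logit term Yi log(πi/(1 − πi)) explicit, which is why the logit is the canonical link for this family.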


Example: Butterfly Ballot in 2000 Presidential Election

Wand et al. (2001): Did the butterfly ballot give the election to Bush?

Yi : Number of votes cast for Buchanan in county i

Xi : Past Republican & third-party vote shares, demographic covariates

Wand et al. examine residuals to see how aberrant the vote share was in Palm Beach

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 133 / 242


Fitting GLMs in R

Family          Canonical Link (Default)   Variance      Model
gaussian        identity                   φ (= σ²)      normal linear
binomial        logit                      µ(1 − µ)      logit, probit, binomial
poisson         log                        µ             Poisson
quasibinomial   logit                      φµ(1 − µ)     overdispersed binomial
quasipoisson    log                        φµ            overdispersed Poisson

Other choices not covered in this course: Gamma, inverse.gaussian

You can roll your own GLM using the quasi family

The negative binomial regression (NB2) can be fitted via the glm.nb function in MASS

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 134 / 242


Other Models

Note that there are many other count models for different types of situations:

Generalized Event Count (GEC) Model

Zero-Inflated Poisson

Zero-Inflated Negative Binomial

Zero-Truncated Models

Hurdle Models

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 135 / 242

Page 661: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

Other Models

Note that there are many other count models for different types ofsituations:

Generalized Event Count (GEC) Model

Zero-Inflated Poisson

Zero-Inflated Negative Binomial

Zero-Truncated Models

Hurdle Models

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 135 / 242



1 Binary Outcome Models

2 Quantities of Interest (An Example with Code; Predicted Values; First Differences; General Algorithms)

3 Model Diagnostics for Binary Outcome Models

4 Ordered Categorical

5 Unordered Categorical

6 Event Count Models (Poisson; Overdispersion; Binomial for Known Trials)

7 Duration Models (Exponential Model; Weibull Model; Cox Proportional Hazards Model)

8 Duration-Logit Correspondence

9 Appendix: Multinomial Models

10 Appendix: More on Overdispersed Poisson

11 Appendix: More on Binomial Models

12 Appendix: Gamma Regression

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 136 / 242



What are duration models used for?

Survival models = duration models = event history models

Dependent variable Y is the duration of time that observations spend in some state before experiencing an event (aka failure, death)

Used in biostatistics and engineering, e.g., how long until a patient dies

Models the relationship between duration and covariates (how does an increase in X affect the duration Y?)

In social science, used for questions such as how long a coalition government lasts, how long until someone gets a job, how a program extends life expectancy

Observations should be measured in the same (temporal) units, i.e., don't have some units' durations measured in days and others in months

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 137 / 242



Why not just use OLS?

Three reasons:

1. The normal linear model assumes Y is Normal, but duration dependent variables are always positive (number of years, etc.)

2. Duration models can handle censoring

[Figure "Observation 3 is Censored": duration lines for four observations (x-axis: time, 0 to 4; y-axis: observations)]

Observation 3 is censored in that it has not experienced the event at the time we stop collecting data, so we don't know its true duration

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 138 / 242
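The censoring problem above can be made concrete with a small simulation. This sketch (not from the slides; the Exponential durations, the study cutoff at t = 3, and the sample size are all illustrative assumptions) shows that treating censored durations as if they were complete, as a naive mean or OLS fit on Y would, biases the estimated duration downward:

```python
import random

random.seed(1)
true_mean = 2.0   # E[T] for the Exponential durations (illustrative)
cutoff = 3.0      # we stop collecting data at t = 3

# Simulate true durations, then record what we would actually observe:
# censored observations are cut off at the end of the study window.
t_true = [random.expovariate(1.0 / true_mean) for _ in range(20000)]
t_obs = [min(t, cutoff) for t in t_true]

mean_true = sum(t_true) / len(t_true)
mean_obs = sum(t_obs) / len(t_obs)

# Treating censored durations as complete biases the mean downward.
print(mean_true)  # close to 2.0
print(mean_obs)   # noticeably below 2.0
```

The gap between the two means is exactly the information the censored observations carry but the naive estimator throws away.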



Why not use OLS?

3. Duration models can handle time-varying covariates

- If Y is the duration of a regime, GDP may change during the duration of the regime
- OLS cannot handle multiple values of GDP per observation
- You can set up the data in a special way with duration models such that you can accommodate time-varying covariates

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 139 / 242



Duration/Survival Model Jargon

Let T denote a continuous positive random variable representing the duration/survival times (T = Y)

T has a probability density function f (t)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 140 / 242



Duration/Survival Model Jargon

F(t): the CDF of f(t), F(t) = ∫₀ᵗ f(u) du = P(T ≤ t), which is the probability of an event occurring before (or at exactly) time t

Survivor function: the probability of surviving (i.e., no event occurring) until at least time t: S(t) = 1 − F(t) = P(T > t)

Eye of the Tiger: 1982 album by the band Survivor, which reached number 2 on the US Billboard 200 chart.

[Figure: left panel, CDF F(t) rising from 0 toward 1 over t in [0, 6]; right panel, survivor function S(t) falling from 1 toward 0 over the same range]

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 141 / 242
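The definitions above can be made concrete for a specific f(t). This is a minimal sketch assuming an Exponential(rate = lam) duration, an illustrative choice not taken from the slides, where F(t) = 1 − exp(−lam·t) and S(t) = exp(−lam·t):

```python
import math

# CDF and survivor function for an Exponential(rate = lam) duration:
#   F(t) = 1 - exp(-lam * t)          probability of an event by time t
#   S(t) = 1 - F(t) = exp(-lam * t)   probability of surviving past t
def exp_cdf(t, lam):
    return 1.0 - math.exp(-lam * t)

def exp_survivor(t, lam):
    return math.exp(-lam * t)

# At t = 0 nothing has happened yet, so S(0) = 1;
# and F(t) + S(t) = 1 at every t, since the two events partition outcomes.
print(exp_survivor(0.0, 0.5))                      # 1.0
print(exp_cdf(2.0, 0.5) + exp_survivor(2.0, 0.5))  # 1.0 (up to rounding)
```

Any other parametric duration model (Weibull, log-normal, ...) plugs into the same two definitions.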



Duration/Survival Model Jargon

Hazard rate (or hazard function): h(t) is roughly the probability of an event at time t given survival up to time t

h(t) = P(t ≤ T < t + τ | T ≥ t)
     = P(event at t | survival up to t)
     = P(survival up to t | event at t) · P(event at t) / P(survival up to t)
     = P(event at t) / P(survival up to t)
     = f(t) / S(t)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 142 / 242
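The two forms of the hazard above can be checked numerically. This sketch again assumes an illustrative Exponential(rate = lam) duration: it compares the closed form f(t)/S(t) with the conditional probability over a small window τ, scaled by τ (the "roughly" in the definition hides this scaling and the limit τ → 0):

```python
import math

lam = 0.5  # illustrative Exponential rate

def f(t):  # density
    return lam * math.exp(-lam * t)

def S(t):  # survivor function
    return math.exp(-lam * t)

def hazard_closed(t):
    return f(t) / S(t)

def hazard_approx(t, tau=1e-6):
    # P(t <= T < t + tau | T >= t) / tau, which approaches h(t) as tau -> 0
    return (S(t) - S(t + tau)) / (tau * S(t))

# The Exponential is memoryless: its hazard is constant at lam for every t.
print(hazard_closed(3.0))           # 0.5
print(round(hazard_approx(3.0), 4)) # 0.5
```

A constant hazard is exactly what makes the Exponential restrictive; the Weibull model relaxes it.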



Relating the Density, Survival, and Hazard Functions

h(t) = f(t) / S(t)

implies

f(t) = h(t) · S(t)

(density function = hazard function · survival function)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 143 / 242
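The identity f(t) = h(t) · S(t) can be verified for a distribution with a non-constant hazard. This sketch assumes an illustrative Weibull(shape k, scale) duration, where S(t) = exp(−(t/scale)^k) and h(t) = (k/scale)(t/scale)^(k−1):

```python
import math

k, scale = 1.5, 2.0  # illustrative Weibull shape and scale

def S(t):  # survivor function: exp(-(t/scale)^k)
    return math.exp(-((t / scale) ** k))

def h(t):  # hazard: (k/scale) * (t/scale)^(k-1), rising in t when k > 1
    return (k / scale) * (t / scale) ** (k - 1)

def f(t):  # Weibull density, written out directly
    return (k / scale) * (t / scale) ** (k - 1) * math.exp(-((t / scale) ** k))

# Check f(t) = h(t) * S(t) at several time points
checks = [abs(f(t) - h(t) * S(t)) for t in (0.5, 1.0, 3.0)]
print(max(checks))  # ~0: the identity holds to rounding error
```

Knowing any one of f, S, or h therefore pins down the other two.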


Modeling with Covariates

We can model the mean of the duration times as a function of covariates via a link function g(·)

g(E[Ti]) = Xiβ

and estimate β via maximum likelihood.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 144 / 242
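With a log link, a common choice for positive durations, g(E[Ti]) = log(E[Ti]) = Xiβ, so E[Ti] = exp(Xiβ). A minimal sketch (the coefficient values and covariates are hypothetical, chosen only for illustration):

```python
import math

# Hypothetical coefficients and covariates; with a log link,
# g(E[T_i]) = log(E[T_i]) = X_i beta, so E[T_i] = exp(X_i beta).
beta = [0.2, 0.8]                 # (intercept, slope): illustrative values
X = [[1.0, 0.0], [1.0, 1.0]]      # two observations (intercept + one covariate)

def expected_duration(x, b):
    eta = sum(xj * bj for xj, bj in zip(x, b))  # linear predictor X_i beta
    return math.exp(eta)                        # apply the inverse link

mus = [expected_duration(x, beta) for x in X]
print(mus)  # [exp(0.2), exp(1.0)]
```

The exponential inverse link guarantees every fitted mean duration is positive, which is exactly why OLS's identity link is a poor fit here.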



How to estimate parametric survival models

They might seem fancy and complicated, but we estimate these models the same as every other model!

1 Make an assumption that Ti follows a specific distribution f(t) (i.e., choose the stochastic component).

2 Model the hazard rate with covariates (i.e., specify the systematic component).

3 Estimate via maximum likelihood.

4 Interpret quantities of interest (hazard ratios, expected survival times).

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 145 / 242
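Steps 1 and 3 above can be sketched in the simplest possible case: Exponential durations with no covariates and no censoring (all choices here, including the rate, sample size, and grid search in place of a proper optimizer, are illustrative):

```python
import math
import random

random.seed(0)
true_lam = 0.5

# Step 1 (stochastic component): assume T_i ~ Exponential(true_lam)
t = [random.expovariate(true_lam) for _ in range(5000)]

# Step 3: maximize the log-likelihood; for the Exponential,
# log L(lam) = sum_i [log(lam) - lam * t_i].
# A grid search stands in for a numerical optimizer.
def loglik(lam):
    return sum(math.log(lam) - lam * ti for ti in t)

grid = [0.05 * i for i in range(1, 40)]   # candidate rates 0.05 .. 1.95
lam_hat = max(grid, key=loglik)

# The analytic MLE (no covariates) is 1 / mean(t); the grid should agree.
print(lam_hat, 1.0 / (sum(t) / len(t)))
```

Adding covariates just replaces the single rate with exp(Xiβ) inside the same likelihood, and step 2 is where that substitution happens.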



What’s Special About Survival Models?

Censoring:

... it makes modeling a little tricky. But not too tricky.

[Figure "Observation 3 is Censored": duration lines for four observations (x-axis: time, 0 to 4; y-axis: observations)]

Observation 3 is censored because it had not experienced the event when we collected the data, so we don't know its true duration.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 146 / 242


Page 715: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

Censoring

Observations that are censored give us information about how long they survive.

For censored observations, we know that they survived at least until some observed time, tc, and that the true duration, t, is greater than or equal to tc.

For each observation, let’s create a censoring indicator variable, ci, such that

ci = { 1 if not censored
     { 0 if censored

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 147 / 242

Censoring

We can incorporate the information from the censored observations into the likelihood function.

L = ∏_(i=1)^n [f(ti)]^ci [P(Ti ≥ tci)]^(1−ci)
        (uncensored)      (censored)

  = ∏_(i=1)^n [f(ti)]^ci [1 − F(ti)]^(1−ci)

  = ∏_(i=1)^n [f(ti)]^ci [S(ti)]^(1−ci)

So uncensored observations contribute to the density function and censored observations contribute to the survivor function in the likelihood.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 148 / 242
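The censored likelihood above can be checked numerically. Here is a minimal sketch for the exponential case, with hypothetical toy durations and a hypothetical scale θ = 2 (the data and parameter value are illustrative, not from the slides):

```python
import numpy as np

# Durations and censoring indicators (c_i = 1 if the event was observed,
# c_i = 0 if censored) -- hypothetical toy data.
t = np.array([1.2, 0.7, 3.0, 2.4])
c = np.array([1.0, 1.0, 0.0, 1.0])  # third observation is censored

theta = 2.0  # hypothetical exponential scale parameter

# Exponential density and survivor function evaluated at each duration.
f = (1.0 / theta) * np.exp(-t / theta)
S = np.exp(-t / theta)

# log L = sum_i  c_i log f(t_i) + (1 - c_i) log S(t_i)
loglik = np.sum(c * np.log(f) + (1 - c) * np.log(S))
print(loglik)
```

Note that the censored observation enters only through log S(t), while the uncensored ones enter through log f(t), exactly as in the product above.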

The Poisson Process

Popular example of a stochastic process

Principles of a Poisson process:
  I Independent increments: the numbers of events occurring in two disjoint intervals are independent
  I Stationary increments: the probability distribution of the number of occurrences depends only on the time length of the interval (because of the common rate)

Events occur at rate λ (expected occurrences per unit of time)

Nτ = number of arrivals in a time period of length τ
  I Nτ ∼ Poisson(λτ)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 149 / 242
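The claim Nτ ∼ Poisson(λτ) can be seen in a quick Monte Carlo sketch: draw exponential interarrival gaps and count arrivals before τ (the values λ = 2, τ = 5, and the simulation size are hypothetical choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, tau, n_sims = 2.0, 5.0, 20_000

# Simulate the process by accumulating Expo(lam) interarrival gaps
# and counting how many arrivals land before time tau.
counts = np.empty(n_sims)
for s in range(n_sims):
    t, n = 0.0, 0
    while True:
        t += rng.exponential(scale=1.0 / lam)  # next interarrival gap
        if t > tau:
            break
        n += 1
    counts[s] = n

# For Poisson(lam * tau) the mean and variance should both be near 10.
print(counts.mean(), counts.var())
```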

The Poisson Process

The exponential distribution measures the times between events in a Poisson process

T = time to wait until the next event in a Poisson process with rate λ

T ∼ Expo(λ)

Memorylessness property: how much you have waited already is irrelevant

P(T > t + k | T > t) = P(T > k)

P(T > 3 + 5 | T > 3) = P(T > 5)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 150 / 242
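The memorylessness identity can be verified directly from the exponential survival function, since P(T > t + k | T > t) = P(T > t + k) / P(T > t). A short check with scipy (rate λ = 1 is a hypothetical choice; scipy parameterizes by scale = 1/λ):

```python
from scipy.stats import expon

lam = 1.0                    # rate; scipy's expon takes scale = 1/lam
T = expon(scale=1.0 / lam)

# P(T > t + k | T > t) = P(T > t + k) / P(T > t), which should equal P(T > k).
t, k = 3.0, 5.0
lhs = T.sf(t + k) / T.sf(t)  # conditional survival probability
rhs = T.sf(k)
print(lhs, rhs)
```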

Two Possible Parameterizations of the Exponential Model

λi > 0 is the rate parameter:

Ti ∼ Exponential(λi)

f(ti) = λi e^(−λi ti)

E(Ti) = 1/λi

θi > 0 is the scale parameter (θi = 1/λi):

Ti ∼ Exponential(θi)

f(ti) = (1/θi) e^(−ti/θi)

E(Ti) = θi

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 151 / 242
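The two parameterizations describe the same distribution, so the densities coincide and the mean equals the scale. A small sketch (the values λ = 0.5 and t = 1.7 are arbitrary illustrations; numpy's exponential sampler takes the scale):

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 0.5            # rate parameterization
theta = 1.0 / lam    # equivalent scale parameterization

# The two density formulas agree at any t.
t = 1.7
f_rate = lam * np.exp(-lam * t)
f_scale = (1.0 / theta) * np.exp(-t / theta)

# E(T) = theta = 1/lam = 2; the sample mean should be close to it.
draws = rng.exponential(scale=theta, size=200_000)
print(f_rate, f_scale, draws.mean())
```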

The Exponential Model

[Figure: density of the Exponential(1) distribution plotted against t, with the mean E(T) = 1 marked.]

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 152 / 242

Link Functions

If you use the rate parameterization with λi:

E(Ti) = 1/λi = 1/exp(xiβ)

Positive β implies that the expected duration decreases as x increases.

If you use the scale parameterization with θi:

E(Ti) = θi = exp(xiβ)

Positive β implies that the expected duration increases as x increases.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 153 / 242
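The opposite signs under the two links are easy to see numerically. In this sketch the coefficient β = 0.8 and the covariate grid are hypothetical values chosen only to show the direction of each effect:

```python
import numpy as np

beta = 0.8                       # hypothetical positive coefficient
x = np.array([0.0, 1.0, 2.0])    # increasing covariate values

# Rate link: E(T) = 1 / exp(x * beta), decreasing in x for beta > 0.
rate_mean = 1.0 / np.exp(x * beta)
# Scale link: E(T) = exp(x * beta), increasing in x for beta > 0.
scale_mean = np.exp(x * beta)

print(rate_mean)   # decreasing sequence
print(scale_mean)  # increasing sequence
```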

Hazard Function for Rate Parametrization

For Ti ∼ Exponential(λi):

f(t) = λi e^(−λi t)

S(t) = 1 − F(t)
     = 1 − (1 − e^(−λi t))
     = e^(−λi t)

h(t) = f(t)/S(t)
     = λi e^(−λi t) / e^(−λi t)
     = λi

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 154 / 242
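The algebra above says the exponential hazard f(t)/S(t) is constant in t. A quick numerical confirmation over a grid of times (the rate λ = 1.5 and the grid are arbitrary illustrations):

```python
import numpy as np

lam = 1.5                        # hypothetical rate
t = np.linspace(0.1, 6.0, 50)    # grid of times

f = lam * np.exp(-lam * t)       # density f(t)
S = np.exp(-lam * t)             # survivor function S(t)
h = f / S                        # hazard h(t) = f(t)/S(t)

print(h.min(), h.max())          # both equal lam: the hazard is flat
```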

Hazard Function for Scale Parametrization

For Ti ∼ Exponential(θi):

f(t) = (1/θi) exp[−t/θi]

S(t) = 1 − F(t)
     = 1 − (1 − exp[−t/θi])
     = exp[−t/θi]

h(t) = f(t)/S(t)
     = (1/θi) exp[−t/θi] / exp[−t/θi]
     = 1/θi

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 155 / 242

Let’s work with the scale parametrization

Note that h(t) = 1/θi, which does not depend on t!

  I The exponential model thus assumes a flat hazard: every unit/individual has its own hazard rate, but that rate does not change over time
  I Connected to the memorylessness property of the exponential distribution

Modeling h(t) with covariates:

h(t) = 1/θi = exp[−xiβ]

Positive β implies that the hazard decreases and the average survival time increases as x increases.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 156 / 242

Page 768: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

Let’s work with the scale parametrization

Note that h(t) = 1θi

, which does not depend on t!

I The exponential model thus assume a flat hazard: Every unit /individual has their own hazard rate, but it does not change over time

I Connected to memorylessness property of the exponentialdistribution

Modeling h(t) with covariates:

h(t) =1

θi= exp[−xiβ]

Positive β implies that hazard decreases and average survival timeincreases as x increases.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 156 / 242

Page 769: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

Let’s work with the scale parametrization

Note that h(t) = 1θi

, which does not depend on t!

I The exponential model thus assume a flat hazard: Every unit /individual has their own hazard rate, but it does not change over time

I Connected to memorylessness property of the exponentialdistribution

Modeling h(t) with covariates:

h(t) =1

θi= exp[−xiβ]

Positive β implies that hazard decreases and average survival timeincreases as x increases.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 156 / 242

Page 773: Soc504: Generalized Linear Models

Estimation via ML (where c_i = 1 if observation i is censored):

L = ∏_{i=1}^n [f(t_i)]^{1−c_i} [1 − F(t_i)]^{c_i}

  = ∏_{i=1}^n [(1/θ_i) e^{−t_i/θ_i}]^{1−c_i} [e^{−t_i/θ_i}]^{c_i}

ℓ = ∑_{i=1}^n (1 − c_i)(ln(1/θ_i) − t_i/θ_i) + c_i(−t_i/θ_i)

  = ∑_{i=1}^n (1 − c_i)(ln e^{−x_iβ} − e^{−x_iβ} t_i) + c_i(−e^{−x_iβ} t_i)

  = ∑_{i=1}^n (1 − c_i)(−x_iβ − e^{−x_iβ} t_i) − c_i e^{−x_iβ} t_i

  = ∑_{i=1}^n (1 − c_i)(−x_iβ) − e^{−x_iβ} t_i

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 157 / 242
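The final line of the log-likelihood is easy to maximize numerically. A minimal sketch (in Python, with simulated data, a single dummy covariate, and a hypothetical fixed censoring time — all assumptions, not the course's data):

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n = 2000
x = np.column_stack([np.ones(n), rng.binomial(1, 0.5, n)])  # intercept + dummy
beta_true = np.array([2.5, 0.7])
t_latent = rng.exponential(np.exp(x @ beta_true))           # theta_i = exp(x_i beta)
cens_time = 30.0
c = (t_latent > cens_time).astype(float)                    # c_i = 1 if censored
t = np.minimum(t_latent, cens_time)                         # observed duration

def neg_loglik(beta):
    # ell = sum_i (1 - c_i)(-x_i beta) - exp(-x_i beta) t_i
    xb = x @ beta
    return -np.sum((1 - c) * (-xb) - np.exp(-xb) * t)

fit = minimize(neg_loglik, x0=np.zeros(2), method="BFGS")
print(fit.x)  # close to beta_true at this sample size
```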

Page 780: Soc504: Generalized Linear Models

Quantities of interest

Suppose our outcome variable is how long a parliamentary government lasts, and we're interested in the effect of majority versus minority governments. We could calculate:

- The hazard ratio of majority to minority governments
- Expected survival times for majority and minority governments
- Predicted survival times for majority and minority governments
- First differences in expected survival times between majority and minority governments

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 158 / 242

Page 786: Soc504: Generalized Linear Models

Hazard Ratios

HR = h(t|x_maj) / h(t|x_min)

   = e^{−x_maj β} / e^{−x_min β}

   = (e^{−β_0} e^{−x_1β_1} e^{−x_2β_2} e^{−x_3β_3} e^{−x_maj β_4} e^{−x_5β_5}) / (e^{−β_0} e^{−x_1β_1} e^{−x_2β_2} e^{−x_3β_3} e^{−x_min β_4} e^{−x_5β_5})

   = e^{−x_maj β_4} / e^{−x_min β_4}

   = e^{−β_4}

(using x_maj = 1 and x_min = 0 for the majority dummy)

A hazard ratio greater than 1 would imply that majority governments fall faster (shorter survival time) than minority governments.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 159 / 242
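Numerically (a sketch with a made-up coefficient): setting the majority dummy to 1 versus 0 cancels every other term, leaving HR = e^{−β_4}.

```python
import numpy as np

beta4 = 0.55  # hypothetical coefficient on the majority-government dummy
hr = np.exp(-1 * beta4) / np.exp(-0 * beta4)  # e^{-x_maj b4} / e^{-x_min b4}
assert abs(hr - np.exp(-beta4)) < 1e-12
print(hr)  # below 1: majority governments have the lower hazard
```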

Page 793: Soc504: Generalized Linear Models

[Figure: Distribution of Hazard Ratios — density of simulated hazard ratios (roughly 0.4 to 1.0), with the mean hazard ratio marked.]

Majority governments survive longer than minority governments.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 160 / 242

Page 795: Soc504: Generalized Linear Models

Expected (average) Survival Time

E(T|x_i) = θ_i = exp[x_iβ]

[Figure: Distribution of Expected Duration — densities of expected duration in months (roughly 10 to 40) for majority and minority governments.]

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 161 / 242
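The distribution of expected durations comes from simulating β from its estimated sampling distribution and pushing each draw through E(T|x) = exp(xβ). A sketch with entirely hypothetical estimates and variance-covariance matrix:

```python
import numpy as np

rng = np.random.default_rng(1)
beta_hat = np.array([2.4, 0.55])        # hypothetical (intercept, majority dummy)
V = np.array([[0.01, -0.005],
              [-0.005, 0.02]])          # hypothetical variance-covariance matrix
draws = rng.multivariate_normal(beta_hat, V, size=5000)

x_min = np.array([1.0, 0.0])            # minority-government profile
x_maj = np.array([1.0, 1.0])            # majority-government profile
ev_min = np.exp(draws @ x_min)          # expected durations, minority
ev_maj = np.exp(draws @ x_maj)          # expected durations, majority
print(ev_min.mean(), ev_maj.mean())     # majority mean is larger
```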

Page 799: Soc504: Generalized Linear Models

Predicted Survival Time

Draw predicted values from the exponential distribution.

[Figure: Distribution of Predicted Duration — densities of predicted duration in months (0 to 150) for majority and minority governments.]

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 162 / 242
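Predicted values add the fundamental exponential variability on top of estimation uncertainty: for each β draw, draw one T from Exponential(scale = exp(xβ)). A sketch continuing the same hypothetical estimates:

```python
import numpy as np

rng = np.random.default_rng(2)
beta_hat = np.array([2.4, 0.55])        # hypothetical estimates
V = np.array([[0.01, -0.005], [-0.005, 0.02]])
draws = rng.multivariate_normal(beta_hat, V, size=5000)

x_maj = np.array([1.0, 1.0])
theta = np.exp(draws @ x_maj)           # expected duration per beta draw
pred = rng.exponential(theta)           # one predicted duration per beta draw
print(pred.std(), theta.std())          # predictions are far more dispersed
```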

Page 802: Soc504: Generalized Linear Models

First Differences

E(T|x_maj) − E(T|x_min)

[Figure: Distribution of First Differences — density of the first difference in months (0 to 25), with the mean first difference marked.]

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 163 / 242
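The first difference is computed draw-by-draw, so estimation uncertainty propagates into the difference itself. A sketch with the same hypothetical estimates:

```python
import numpy as np

rng = np.random.default_rng(3)
beta_hat = np.array([2.4, 0.55])        # hypothetical estimates
V = np.array([[0.01, -0.005], [-0.005, 0.02]])
draws = rng.multivariate_normal(beta_hat, V, size=5000)

# E(T|x_maj) - E(T|x_min), per draw of beta
fd = np.exp(draws @ np.array([1.0, 1.0])) - np.exp(draws @ np.array([1.0, 0.0]))
lo, hi = np.percentile(fd, [2.5, 97.5])
print(fd.mean(), (lo, hi))              # mean first difference and a 95% interval
```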

Page 805: Soc504: Generalized Linear Models

Quantities of Interest in Zelig

x.min <- setx(z.out, numst2 = 0)  # covariate profile: minority government
x.maj <- setx(z.out, numst2 = 1)  # covariate profile: majority government
s.out <- sim(z.out, x = x.min, x1 = x.maj)  # simulate quantities of interest
summary(s.out)
plot(s.out)

(Here z.out is the fitted zelig() model object from earlier.)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 164 / 242

Page 806: Soc504: Generalized Linear Models

[Figure: Zelig simulation output — panels for Predicted Values Y|X and Y|X1, Expected Values E(Y|X) and E(Y|X1), First Differences E(Y|X1) − E(Y|X), and comparisons of Y|X vs. Y|X1 and E(Y|X) vs. E(Y|X1).]

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 165 / 242

Page 807: Soc504: Generalized Linear Models

The exponential model is nice and simple, but the assumption of a flat hazard may be too restrictive.

What if we want to loosen that restriction by assuming a monotonic hazard?

We can use the Weibull model.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 166 / 242

Page 810: Soc504: Generalized Linear Models

The Weibull Model

Similar to how we generalized the Poisson into a Negative Binomial by adding a parameter, we can do the same with the Exponential by turning it into a Weibull:

T_i ∼ Weibull(θ_i, α)

E(T_i) = θ_i Γ(1 + 1/α)

θ_i > 0 is the scale parameter and α > 0 is the shape parameter.

[Figure: Weibull densities — left panel "Changing Shape" (λ = 1; k = 0.5, 1, 1.5, 5), right panel "Changing Scale" (k = 1; λ = 0.5, 1, 1.5, 5).]

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 167 / 242
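A quick numeric check of the mean formula (a Python sketch with made-up θ and α; scipy's weibull_min uses the same shape/scale parametrization, with shape c = α and scale = θ):

```python
from scipy.special import gamma
from scipy.stats import weibull_min

theta, alpha = 2.0, 1.5                 # hypothetical scale and shape
analytic = theta * gamma(1 + 1 / alpha)  # E(T) = theta * Gamma(1 + 1/alpha)
assert abs(weibull_min(c=alpha, scale=theta).mean() - analytic) < 1e-8
```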

Page 815: Soc504: Generalized Linear Models

The Weibull Model

f(t_i) = (α/θ_i^α) t_i^{α−1} exp[−(t_i/θ_i)^α]

Model θ_i with covariates in the systematic component:

θ_i = exp(x_iβ)

Positive β implies that expected duration increases as x increases.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 168 / 242

Page 818: Soc504: Generalized Linear Models

f(t_i) = (α/θ_i^α) t_i^{α−1} exp[−(t_i/θ_i)^α]

S(t_i) = 1 − F(t_i)

       = 1 − (1 − e^{−(t_i/θ_i)^α})

       = e^{−(t_i/θ_i)^α}

h(t_i) = f(t_i) / S(t_i)

       = (α/θ_i^α) t_i^{α−1} exp[−(t_i/θ_i)^α] / e^{−(t_i/θ_i)^α}

       = (α/θ_i)(t_i/θ_i)^{α−1}

       = (α/θ_i^α) t_i^{α−1}

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 169 / 242
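A numeric check of the derivation (a sketch with made-up θ and α): f(t)/S(t) should match the closed-form hazard at any t.

```python
from scipy.stats import weibull_min

theta, alpha = 3.0, 2.2                 # hypothetical scale and shape
T = weibull_min(c=alpha, scale=theta)
for t in [0.5, 1.0, 4.0]:
    h_ratio = T.pdf(t) / T.sf(t)                       # f(t)/S(t)
    h_closed = (alpha / theta**alpha) * t**(alpha - 1)  # (alpha/theta^alpha) t^(alpha-1)
    assert abs(h_ratio - h_closed) < 1e-9
```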

Page 825: Soc504: Generalized Linear Models

Hazard monotonicity assumption

h(t_i) is modeled with both θ_i and α and is a function of t_i. Thus, the Weibull model assumes a monotonic hazard.

- If α = 1, h(t_i) is flat and the model is the exponential model.
- If α > 1, h(t_i) is monotonically increasing.
- If α < 1, h(t_i) is monotonically decreasing.

[Figure: Weibull Hazards — h(t) over t for shape values p = 0.5, 1, 2.]

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 170 / 242
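The three cases can be checked numerically from the closed-form hazard (a sketch with a hypothetical θ and the figure's shape values):

```python
import numpy as np

def weibull_hazard(t, theta, alpha):
    # h(t) = (alpha / theta**alpha) * t**(alpha - 1)
    return (alpha / theta**alpha) * t**(alpha - 1)

t = np.linspace(0.1, 5.0, 200)
theta = 1.0  # hypothetical scale
assert np.all(np.diff(weibull_hazard(t, theta, 0.5)) < 0)     # alpha < 1: decreasing
assert np.allclose(np.diff(weibull_hazard(t, theta, 1.0)), 0)  # alpha = 1: flat
assert np.all(np.diff(weibull_hazard(t, theta, 2.0)) > 0)     # alpha > 1: increasing
```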

Page 830: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

The shape parameter α for the Weibull distribution is the reciprocal of the scale parameter given by survreg().

The scale parameter given by survreg() is NOT the same as the scale parameter in the Weibull distribution, which is θ_i = e^{x_i β}.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 171 / 242
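The conversion stated above can be sketched numerically. This is an illustrative Python snippet (the slides use R); the survreg()-style scale and coefficient values are made up for illustration:

```python
import math

# Hypothetical survreg()-style output (values made up for illustration)
survreg_scale = 0.5        # what survreg() reports as "Scale"
beta = [2.0, 0.3]          # intercept and one covariate coefficient

# Weibull shape is the reciprocal of the reported scale
alpha = 1.0 / survreg_scale

# Weibull scale for observation i: theta_i = e^{x_i beta}
x_i = [1.0, 2.0]           # covariate row, including the intercept
theta_i = math.exp(sum(b * x for b, x in zip(beta, x_i)))

print(alpha, theta_i)
```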

Hazard Ratios

One quantity of interest is the hazard ratio:

HR = \frac{h(t \mid x = 1)}{h(t \mid x = 0)}

With the Weibull model we make a proportional hazards assumption: the hazard ratio does not depend on t.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 172 / 242
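The proportional hazards property follows because t^(α−1) cancels in the ratio: with h(t) = αλt^(α−1) and λ depending on x, HR = λ_1/λ_0 at every t. A minimal numeric check (Python for illustration; the rates are made up):

```python
def weibull_hazard(t, lam, alpha):
    # h(t) = alpha * lam * t^(alpha - 1)
    return alpha * lam * t ** (alpha - 1)

lam0, lam1, alpha = 1.0, 1.8, 1.5   # made-up rates for x = 0 and x = 1
ratios = [weibull_hazard(t, lam1, alpha) / weibull_hazard(t, lam0, alpha)
          for t in (0.5, 1.0, 2.0, 10.0)]
print(ratios)  # every ratio equals lam1 / lam0 = 1.8: HR does not depend on t
```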

Other Parametric Models

Gompertz model: monotonic hazard

Log-logistic or log-normal model: nonmonotonic hazard

Generalized gamma model: nests the exponential, Weibull, log-normal, and gamma models with an extra parameter (see appendix slides)

But what if we don't want to make an assumption about the shape of the hazard?

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 173 / 242

The Cox Proportional Hazards Model

Often described as a semi-parametric model.

Pros:

Makes no restrictive assumption about the shape of the hazard.

A better choice if you only want the effects of the covariates and the nature of the time dependence is unimportant.

Cons:

The only quantities of interest are hazard ratios.

Can be subject to overfitting.

The shape of the hazard is unknown (although there are semi-parametric ways to derive the hazard and survivor functions).

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 174 / 242

1 Reconceptualize each t_i as a discrete event time rather than a duration or survival time (non-censored observations only).

   t_i = 5: an event occurred at month 5, rather than observation i surviving for 5 months.

2 Assume there are no tied event times in the data.

   No two events can occur at the same instant; it only seems that way because our unit of measurement is not precise enough. There are ways to adjust the likelihood to take into account observed ties.

3 Assume no events can happen between event times.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 175 / 242

We know that exactly one event occurred at each t_i for all non-censored i.

Define a risk set R_i as the set of all observations at risk of an event at time t_i.

What observations belong in R_i?

All observations (censored and non-censored) j such that t_j ≥ t_i.

For example, if t_i = 5 months, then all observations that have neither experienced the event nor been censored before 5 months are at risk.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 176 / 242
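The risk-set definition above can be sketched directly in code. An illustrative Python snippet (the slides use R) with made-up data; event = 0 marks a censored observation:

```python
# Made-up survival data: (time, event indicator); event = 0 means censored
data = [(2, 1), (5, 1), (5, 0), (8, 0), (11, 1)]

def risk_set(i, data):
    """Indices j with t_j >= t_i: observations still at risk at time t_i."""
    t_i = data[i][0]
    return [j for j, (t_j, _) in enumerate(data) if t_j >= t_i]

print(risk_set(1, data))  # t_1 = 5: observations 1, 2, 3, 4 are still at risk
```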

We can then create a partial likelihood function:

L = \prod_{i=1}^{n} \left[ P(\text{event occurred in } i \mid \text{event occurred in } R_i) \right]^{c_i}

  = \prod_{i=1}^{n} \left[ \frac{P(\text{event occurred in } i)}{P(\text{event occurred in } R_i)} \right]^{c_i}

  = \prod_{i=1}^{n} \left[ \frac{h(t_i)}{\sum_{j \in R_i} h(t_j)} \right]^{c_i}

  = \prod_{i=1}^{n} \left[ \frac{h_0(t)\, h_i(t_i)}{\sum_{j \in R_i} h_0(t)\, h_j(t_j)} \right]^{c_i}

  = \prod_{i=1}^{n} \left[ \frac{h_i(t_i)}{\sum_{j \in R_i} h_j(t_j)} \right]^{c_i}

h_0(t) is the baseline hazard, which is the same for all observations, so it cancels out.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 177 / 242
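The final line of the derivation can be written as a partial log-likelihood. An illustrative Python sketch (the slides use R), assuming h_i(t_i) = e^{x_i β} with a single covariate and made-up data; it ignores tied event times:

```python
import math

def cox_partial_loglik(beta, times, events, xs):
    """Sum over events of x_i * beta - log(sum_{j in R_i} e^{x_j * beta})."""
    ll = 0.0
    for t_i, c_i, x_i in zip(times, events, xs):
        if c_i == 0:
            continue  # censored observations contribute no factor (c_i = 0)
        risk = [math.exp(x_j * beta) for t_j, x_j in zip(times, xs) if t_j >= t_i]
        ll += x_i * beta - math.log(sum(risk))
    return ll

# Made-up data: times, censoring indicators, one covariate
times = [2.0, 5.0, 5.0, 8.0, 11.0]
events = [1, 1, 0, 0, 1]
xs = [1.0, 0.0, 1.0, 0.0, 1.0]
print(cox_partial_loglik(0.5, times, events, xs))
```

At β = 0 every term reduces to −log |R_i|, which gives a quick sanity check on the risk sets.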

Like in parametric models, h(t) is modeled with covariates:

h_i(t_i) = e^{x_i \beta}

Note that a positive β now suggests that an increase in x increases the hazard and decreases survival time.

L = \prod_{i=1}^{n} \left[ \frac{e^{x_i \beta}}{\sum_{j \in R_i} e^{x_j \beta}} \right]^{c_i}

There is no β_0 term estimated. This implies that the shape of the baseline hazard is left unmodeled.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 178 / 242
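The sign interpretation can be checked with a constant (exponential) baseline hazard, where expected survival time is 1/rate. An illustrative Python snippet; the coefficient value is made up:

```python
import math

beta = 0.7  # made-up positive coefficient for illustration

def hazard(x):
    # h_i proportional to e^{x_i * beta}, constant-in-t (exponential) baseline
    return math.exp(x * beta)

def expected_survival(x):
    # for a constant hazard, expected survival time is 1 / rate
    return 1.0 / hazard(x)

print(expected_survival(0), expected_survival(1))
# positive beta: higher hazard at x = 1, hence shorter expected survival
```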

Pros:

Makes no restrictive assumption about the shape of the hazard.

A better choice if you only want the effects of the covariates and the nature of the time dependence is unimportant.

Cons:

The only quantities of interest are hazard ratios.

Can be subject to overfitting.

The shape of the hazard is unknown (although there are semi-parametric ways to derive the hazard and survivor functions).

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 179 / 242

Page 875: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

Pros:

Makes no restrictive assumption about the shape of the hazard.

A better choice if you want the effects of the covariates and thenature of the time dependence is unimportant.

Cons:

Only quantities of interest are hazard ratios.

Can be subject to overfitting

Shape of hazard is unknown (although there are semi-parametric waysto derive the hazard and survivor functions)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 179 / 242


How do I run a Cox proportional hazards model in R?

Use the coxph() function in the survival package (also in the Design and Zelig packages).

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 180 / 242
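The slides point to R's coxph() for fitting. As a hedged illustration of what that function maximizes, here is a minimal Python sketch of the Cox partial log-likelihood for a toy dataset with one covariate and no tied event times (the data and variable names are invented for this example):

```python
import math

def cox_partial_loglik(beta, times, events, x):
    """Cox partial log-likelihood with no ties: sum over event times of
    x_i * beta - log(sum of exp(x_j * beta) over the risk set at t_i)."""
    ll = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 0:  # censored observations contribute only through risk sets
            continue
        risk = [math.exp(x[j] * beta) for j in range(len(times)) if times[j] >= t_i]
        ll += x[i] * beta - math.log(sum(risk))
    return ll

# Toy data: 4 subjects; event indicator 1 = failure, 0 = censored
times  = [2.0, 3.0, 5.0, 7.0]
events = [1, 0, 1, 1]
x      = [0.5, -1.0, 1.2, 0.0]

# At beta = 0 every risk-set member contributes exp(0) = 1, so the
# partial log-likelihood reduces to -sum(log(risk set size)) over events.
print(cox_partial_loglik(0.0, times, events, x))  # -(log 4 + log 2) ~ -2.0794
```

In practice you would maximize this over beta numerically; coxph() does that for you (and also handles ties, stratification, and standard errors).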



Alternatives

Survival models are cool . . . but hard.

There are other things you can model:

- Perhaps some observations are more likely to fail than others: frailty models
- Perhaps some observations you don't expect to fail at all: split population models
- Perhaps there can be more than one type of event: competing risks models

If you encounter survival data, think carefully about the process and then choose a corresponding model.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 181 / 242


References:

Box-Steffensmeier, Janet M. and Bradford S. Jones. 2004. Event History Modeling. Cambridge University Press.

Therneau, Terry M., and Patricia M. Grambsch. 2013. Modeling Survival Data: Extending the Cox Model. Springer Science & Business Media.

Andersen, Per Kragh, et al. 2012. Statistical Models Based on Counting Processes. Springer Science & Business Media.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 182 / 242

1 Binary Outcome Models

2 Quantities of Interest: An Example with Code; Predicted Values; First Differences; General Algorithms

3 Model Diagnostics for Binary Outcome Models

4 Ordered Categorical

5 Unordered Categorical

6 Event Count Models: Poisson; Overdispersion; Binomial for Known Trials

7 Duration Models: Exponential Model; Weibull Model; Cox Proportional Hazards Model

8 Duration-Logit Correspondence

9 Appendix: Multinomial Models

10 Appendix: More on Overdispersed Poisson

11 Appendix: More on Binomial Models

12 Appendix: Gamma Regression

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 183 / 242


How Do Survival Models Relate to Duration Dependence in a Logit Model?

Based on Beck, Katz, and Tucker (1998)

Suppose we have time-series cross-sectional data with a binary dependent variable.

- For example, we might have data on country dyads over 50 years, with the dependent variable being whether there was a war between the two countries in each year.

Not all observations are independent. We may see some duration dependence.

- Perhaps countries that have been at peace for 100 years are less likely to go to war than countries that have been at peace for only 2 years.

How can we account for this duration dependence in a logit model?

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 184 / 242


Think of the observations as grouped duration data:

Year   tk   Dyad      Yi   Ti
1992    1   US-Iraq    0
1993    2   US-Iraq    0
1994    3   US-Iraq    0
1995    4   US-Iraq    0
1996    5   US-Iraq    0
1997    6   US-Iraq    0
1998    7   US-Iraq    0   12
1999    8   US-Iraq    0
2000    9   US-Iraq    0
2001   10   US-Iraq    0
2002   11   US-Iraq    0
2003   12   US-Iraq    1

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 185 / 242
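The grouped-duration rows above are easy to construct from spell data. A minimal Python sketch (the function and column names are invented for illustration):

```python
def spell_to_rows(dyad, start_year, duration):
    """Expand one spell into grouped duration rows: one row per year,
    y = 0 in every year until the spell ends, y = 1 in the failure year."""
    rows = []
    for k in range(1, duration + 1):
        rows.append({
            "year": start_year + k - 1,
            "t_k": k,                      # years elapsed within the spell
            "dyad": dyad,
            "y": 1 if k == duration else 0,
            "T": duration,                 # total spell length
        })
    return rows

rows = spell_to_rows("US-Iraq", 1992, 12)
print(rows[0])    # first year of the spell (1992), y = 0
print(rows[-1])   # failure year (2003), y = 1
```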


Then we end up with:

P(y_{i,t_k} = 1 \mid x_{i,t_k}) = h(t_k \mid x_{i,t_k}) = 1 - P(\text{surviving beyond } t_k \mid \text{survival up to } t_{k-1})

It can be shown in general that

S(t) = e^{-\int_0^t h(u)\,du}

So then we get

P(y_{i,t_k} = 1 \mid x_{i,t_k}) = 1 - e^{-\int_{t_{k-1}}^{t_k} h(u)\,du}

where we take the integral from t_{k-1} to t_k in order to get the conditional survival.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 186 / 242
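As a hedged numeric sanity check of these relations: with a constant hazard h(u) = lambda, the survivor function is S(t) = exp(-lambda * t), and the conditional event probability over (t_{k-1}, t_k] computed as 1 - S(t_k)/S(t_{k-1}) matches 1 minus the exponentiated negative integral of the hazard over that interval (the numbers are illustrative):

```python
import math

lam = 0.3                 # constant hazard rate (illustrative value)
t_prev, t_k = 4.0, 5.0    # interval endpoints (illustrative values)

def survivor(t):
    # S(t) = exp(-integral_0^t h(u) du) = exp(-lam * t) for a constant hazard
    return math.exp(-lam * t)

# P(event in (t_prev, t_k] | survived to t_prev) two ways:
cond_prob = 1 - survivor(t_k) / survivor(t_prev)          # ratio of survivors
direct    = 1 - math.exp(-lam * (t_k - t_prev))           # integral over the interval
print(cond_prob, direct)  # the two expressions agree
```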


P(y_{i,t_k} = 1 \mid x_{i,t_k}) = 1 - \exp\left(-\int_{t_{k-1}}^{t_k} h(u)\,du\right)

= 1 - \exp\left(-\int_{t_{k-1}}^{t_k} e^{x_{i,t_k}\beta} h_0(u)\,du\right)

= 1 - \exp\left(-e^{x_{i,t_k}\beta} \int_{t_{k-1}}^{t_k} h_0(u)\,du\right)

= 1 - \exp\left(-e^{x_{i,t_k}\beta} \alpha_{t_k}\right)

= 1 - \exp\left(-e^{x_{i,t_k}\beta + \kappa_{t_k}}\right)

This is equivalent to a model with a complementary log-log (cloglog) link and time dummies \kappa_{t_k} = \ln \alpha_{t_k}.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 187 / 242
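A hedged numeric check of this equivalence: if p = 1 - exp(-exp(eta)) with eta = x*beta + kappa, then applying the cloglog link log(-log(1 - p)) recovers eta exactly (the values of the linear predictor below are illustrative):

```python
import math

def inv_cloglog(eta):
    # inverse complementary log-log link: p = 1 - exp(-exp(eta))
    return 1 - math.exp(-math.exp(eta))

def cloglog(p):
    # complementary log-log link: log(-log(1 - p))
    return math.log(-math.log(1 - p))

xb, kappa = 0.4, -1.3    # illustrative covariate effect and time dummy
eta = xb + kappa
p = inv_cloglog(eta)
print(p, cloglog(p))     # cloglog(p) recovers eta = xb + kappa
```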


BKT suggest using a logit link instead of a cloglog link because logit is more widely used (and nobody knows what a cloglog link is except you guys!).

As long as the probability of an event does not exceed 50 percent, the logit and cloglog links are very similar.

The use of time dummies means that we are imposing no structure on the nature of duration dependence (the structure of the hazard).

If we don't use time dummies, we are assuming no duration dependence (a flat hazard).

Using a variable such as "number of years at peace" instead of time dummies imposes a monotonic hazard.

The use of time dummies may use up a lot of degrees of freedom, so BKT suggest using restricted cubic splines.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 188 / 242
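The similarity of the two links at low event probabilities is easy to see numerically; a hedged sketch comparing the inverse logit and inverse cloglog over a few values of the linear predictor (chosen for illustration):

```python
import math

def inv_logit(eta):
    # inverse logit link: p = 1 / (1 + exp(-eta))
    return 1 / (1 + math.exp(-eta))

def inv_cloglog(eta):
    # inverse complementary log-log link: p = 1 - exp(-exp(eta))
    return 1 - math.exp(-math.exp(eta))

# For very negative eta (rare events) the two nearly coincide;
# as probabilities approach 0.5 and beyond they diverge.
for eta in [-5, -3, -1, 0, 1]:
    print(f"eta={eta:+d}  logit={inv_logit(eta):.4f}  cloglog={inv_cloglog(eta):.4f}")
```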


Possible complications:

Multiple events
- Assumes that multiple events are independent (the independence-of-observations assumption in a survival model).

Left censoring
- Countries may have been at peace long before we start observing data, and we don't know when that "peace duration" began.

Variables that do not vary across units
- May be collinear with time dummies.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 189 / 242


First Half of Course Summary

In the first six weeks we have covered maximum likelihood estimation, generalized linear models, and a general approach to quantities of interest.

We touched at least briefly on standard models for most types of data you will encounter:

- continuous: normal linear model
- binary: logit, probit
- count: Poisson, negative binomial
- ordered: ordinal probit, ordinal logit
- categorical: multinomial logit, multinomial probit
- duration: exponential, Weibull, Cox proportional hazards

Each model followed a similar pattern:

1. define a model with a stochastic and systematic component
2. derive the log-likelihood
3. estimate via MLE
4. interpret quantities of interest

You can now interpret these models and learn new ones.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 190 / 242
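The four-step pattern above can be sketched end to end for the simplest case: a slope-only logit fit by Newton-Raphson. The tiny dataset and the single-parameter specification below are hypothetical, chosen only to keep the example self-contained:

```python
import math

# Toy data (hypothetical, for illustration only)
x = [1.0, 2.0, 3.0, 4.0]
y = [0, 0, 1, 1]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Step 1: stochastic component Y_i ~ Bernoulli(pi_i);
#         systematic component pi_i = logit^{-1}(beta * x_i).
# Step 2: derive the log-likelihood.
def loglik(beta):
    ll = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(beta * xi)
        ll += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return ll

# Step 3: estimate via MLE (Newton-Raphson on the score function).
def mle(beta=0.0, iters=50):
    for _ in range(iters):
        score = sum(xi * (yi - sigmoid(beta * xi)) for xi, yi in zip(x, y))
        probs = [sigmoid(beta * xi) for xi in x]
        hess = -sum(xi ** 2 * p * (1 - p) for xi, p in zip(x, probs))
        beta -= score / hess
    return beta

beta_hat = mle()

# Step 4: a quantity of interest -- the predicted probability at x = 2.5.
print(round(sigmoid(beta_hat * 2.5), 3))
```

The same four comments would reappear, with a different stochastic component and link, for every model in the list above; only the log-likelihood changes.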

Preview

Weeks 7-8: Missing Data
I Mixture Models and the Expectation-Maximization algorithm
I Missing Data and Multiple Imputation

Weeks 9-10: Causal Inference
I Model Dependence and Matching
I Explanation in Causal Inference with Moderation/Mediation

Weeks 11-12: Hierarchical Models
I Regularization and Hierarchical Models
I More Hierarchical Models and Wrap-up

These topics are a long-term bet on things that will be important in your career. They are also a short case study in reading into a new statistical literature.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 191 / 242

1 Binary Outcome Models

2 Quantities of Interest
  An Example with Code
  Predicted Values
  First Differences
  General Algorithms

3 Model Diagnostics for Binary Outcome Models

4 Ordered Categorical

5 Unordered Categorical

6 Event Count Models
  Poisson
  Overdispersion
  Binomial for Known Trials

7 Duration Models
  Exponential Model
  Weibull Model
  Cox Proportional Hazards Model

8 Duration-Logit Correspondence

9 Appendix: Multinomial Models

10 Appendix: More on Overdispersed Poisson

11 Appendix: More on Binomial Models

12 Appendix: Gamma Regression

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 192 / 242

Identification

Reading: Unifying Political Methodology, Chapter 8

Definition of "identification"
I Qualitative: we can learn the parameter with infinite draws
I Mathematical: each parameter value produces a unique likelihood value
I Graphical: a likelihood without a plateau at the maximum

Partially identified models: the likelihood is informative, but not about a single point.

Non-identified models: include those that make little sense, even if that is hard to tell.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 193 / 242

Example 1: Flat Likelihoods

A (dumb) model:

    Yi ∼ fp(yi | λi)
    λi = 1 + 0β

What do we know about β (from the likelihood perspective)? The Poisson likelihood is

    L(λ|y) = ∏_{i=1}^{n} e^{−λi} λi^{yi} / yi!

and the log-likelihood, with (1 + 0β) substituted for λi (dropping the − ln yi! constant, which does not involve β):

    ln L(β|y) = ∑_{i=1}^{n} [−(0β + 1) + yi ln(0β + 1)]
              = ∑_{i=1}^{n} (−1)
              = −n

Since 0β + 1 = 1 and ln 1 = 0, every value of β yields the same log-likelihood: the likelihood is completely flat in β.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 194 / 242
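A quick numerical check makes the flatness concrete. The count data below are hypothetical, used only to evaluate the log-likelihood of the dumb model at wildly different values of β:

```python
import math

# Toy count data (hypothetical, for illustration only)
y = [0, 2, 1, 3, 1]

# Poisson log-likelihood for the (dumb) model lambda_i = 1 + 0*beta,
# dropping the additive constant -ln(y_i!) which does not involve beta.
def loglik(beta):
    lam = 1 + 0 * beta               # lambda equals 1 no matter what beta is
    return sum(-lam + yi * math.log(lam) for yi in y)

# The log-likelihood is identical at every beta: a perfectly flat
# plateau, so no unique MLE for beta exists.
values = [loglik(b) for b in (-10.0, 0.0, 3.7, 1e6)]
print(values)  # prints [-5.0, -5.0, -5.0, -5.0], i.e. -n everywhere
```

Any optimizer handed this function will stop wherever it starts, which is exactly the graphical "plateau" symptom of non-identification.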

Example 1: Flat Likelihoods

1. An identified likelihood has a unique maximum.

2. A likelihood function with a flat region or plateau at the maximum is not identified.

3. A likelihood with a plateau can still be informative, but a unique MLE does not exist.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 195 / 242

Example 2: Non-unique Reparameterization

A model:

    Yi ∼ fN(yi | μi, σ²)
    μi = x1i β1 + x2i β2 + x3i β3,  where x2i = x3i
       = x1i β1 + x2i (β2 + β3)

What is the (unique) MLE of β2 and β3? There is none: different parameter values lead to the same values of μ and thus the same likelihood values:

    μi = x1i β1 + x2i (5 + 3)
    μi = x1i β1 + x2i (3 + 5)
    μi = x1i β1 + x2i (7 + 1)

So {β2 = 5, β3 = 3} gives the same likelihood as {β2 = 3, β3 = 5}; only the sum β2 + β3 is identified.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 196 / 242
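This, too, is easy to verify numerically. With a small hypothetical dataset in which x2 and x3 are identical columns, any (β2, β3) pair with the same sum produces exactly the same normal log-likelihood:

```python
import math

# Tiny toy dataset (hypothetical): x2 and x3 are identical columns.
x1 = [1.0, 2.0, 3.0]
x2 = [0.5, 1.5, 2.5]
x3 = x2[:]                       # perfect collinearity: x2i = x3i
y  = [2.0, 5.0, 8.0]
sigma2 = 1.0

# Normal log-likelihood for mu_i = x1i*b1 + x2i*b2 + x3i*b3.
def loglik(b1, b2, b3):
    ll = 0.0
    for x1i, x2i, x3i, yi in zip(x1, x2, x3, y):
        mu = x1i * b1 + x2i * b2 + x3i * b3
        ll += -0.5 * math.log(2 * math.pi * sigma2) - (yi - mu) ** 2 / (2 * sigma2)
    return ll

# The three (b2, b3) pairs from the slide all sum to 8, so they give
# identical mu's and identical likelihoods: b2 and b3 are not
# separately identified, only their sum is.
print(loglik(1.0, 5.0, 3.0) == loglik(1.0, 3.0, 5.0) == loglik(1.0, 7.0, 1.0))  # prints True
```

In practice most regression software detects this exact collinearity and drops one of the two columns rather than reporting an arbitrary split of the sum.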

Page 978: Soc504: Generalized Linear Models

Introduction to Multiple Equation Models

1. Let Yi be an N × 1 vector for each i (i = 1, . . . , n)

2. Elements of Yi are jointly distributed

Y_i ∼ f(θ_i, α), where Y_i and θ_i are N × 1 and α is N × N

3. Systematic components:

θ1i = g1(x1i , β1)

θ2i = g2(x2i , β2)

...

θNi = gN(xNi , βN)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 197 / 242

Page 986: Soc504: Generalized Linear Models

When are Multiple Equation Models different from N separate equation-by-equation models?

When the elements of Yi are (conditional on X ),

1. Stochastically dependent, or

2. Parametrically dependent (shared parameters)

Example and proof:

Suppose no ancillary parameters, and N = 2. The joint density:

f(y | θ) = ∏_{i=1}^n f(y_1i, y_2i | θ_1i, θ_2i)

(BTW, you now know how to form the likelihood for multiple equation models!)

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 198 / 242

Page 993: Soc504: Generalized Linear Models

Assuming stochastic independence lets us factor f:

P(y | θ) = ∏_{i=1}^n f(y_1i, y_2i | θ_1i, θ_2i)
         = ∏_{i=1}^n f(y_1i | θ_1i) f(y_2i | θ_2i)

with log-likelihood

ln L(θ_1, θ_2 | y) = Σ_{i=1}^n ln f(y_1i | θ_1i) + Σ_{i=1}^n ln f(y_2i | θ_2i)

Also assume parametric independence, and you can estimate the equations separately.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 199 / 242
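A small sketch of the additive separability (toy data, made-up values, σ fixed at 1): under stochastic and parametric independence the joint log-likelihood of two normal outcomes is a sum of one term per equation, so maximizing jointly is the same as maximizing each equation alone, and each MLE is its own per-equation sample mean.

```python
import math

def normal_loglik(y, mu, sigma=1.0):
    """Gaussian log-likelihood with a common mean mu and known sigma."""
    return sum(-0.5 * math.log(2 * math.pi * sigma**2)
               - (yi - mu)**2 / (2 * sigma**2) for yi in y)

y1 = [1.0, 1.5, 0.5]   # outcome for equation 1
y2 = [3.0, 2.5, 3.5]   # outcome for equation 2

# Under stochastic + parametric independence the joint log-likelihood
# is additively separable across equations:
def joint_loglik(t1, t2):
    return normal_loglik(y1, t1) + normal_loglik(y2, t2)

# So the joint maximizer is just the pair of per-equation MLEs (sample means)
t1_hat = sum(y1) / len(y1)   # 1.0
t2_hat = sum(y2) / len(y2)   # 3.0
print(joint_loglik(t1_hat, t2_hat) >= joint_loglik(1.2, 3.0))  # True
```

If either independence assumption fails (e.g. a shared parameter across the two equations), this separation breaks and the equations must be estimated jointly.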

Page 999: Soc504: Generalized Linear Models

Example: 1992 U.S. Presidential Election

Alvarez and Nagler (1995):

Y_i: Vote choice in the 1992 U.S. presidential election (1 = Clinton, 2 = Bush, 3 = Perot)

[Bar chart: 1992 Presidential Election Vote Choice (ANES, n = 909): Clinton 46%, Bush 34%, Perot 20%]

Two types of predictors:

- Voter-specific (V_i): age, gender, education, party, opinions, etc.

- Candidate-varying (X_ij): ideological distance between voter i and candidate j

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 200 / 242

Page 1002: Soc504: Generalized Linear Models

Multinomial Logit Model

Generalize the logit model to more than two choices

The multinomial logit model (MNL):

π_ij = Pr(Y_i = j | V_i) = exp(V_i^⊤ δ_j) / Σ_{k=1}^J exp(V_i^⊤ δ_k),

where V_i = individual-specific characteristics of unit i (and an intercept)

Note that Σ_{j=1}^J π_ij = 1

Need to set a base category for identifiability: δ_1 = 0

δ_j represents how the characteristics of voter i are associated with the probability of voting for candidate j

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 201 / 242
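The MNL formula above is a softmax over category-specific scores. A minimal sketch (the covariate values and δ coefficients here are made up for illustration, with the base category's δ fixed at zero):

```python
import math

def mnl_probs(v, deltas):
    """Multinomial logit: v is the voter's covariate vector (incl. intercept),
    deltas is one coefficient vector per category, with the base category's
    delta fixed at the zero vector for identifiability."""
    scores = [sum(vk * dk for vk, dk in zip(v, d)) for d in deltas]
    m = max(scores)                         # subtract max for numerical stability
    exp_s = [math.exp(s - m) for s in scores]
    total = sum(exp_s)
    return [e / total for e in exp_s]

# Hypothetical voter: v = (intercept, women, educ); categories Clinton, Bush, Perot
v = [1.0, 1.0, 3.0]
deltas = [[0.0, 0.0, 0.0],      # Clinton = base category, delta_1 = 0
          [-0.5, 0.2, 0.1],     # Bush (made-up coefficients)
          [-1.0, -0.3, 0.2]]    # Perot (made-up coefficients)
pi = mnl_probs(v, deltas)
print(round(sum(pi), 10))  # 1.0 -- probabilities sum to one across categories
```

Adding the same constant to every score leaves the probabilities unchanged, which is exactly why one δ_j must be pinned to zero.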

Page 1007: Soc504: Generalized Linear Models

Conditional Logit Model

We can also incorporate alternative-varying predictors Xij

The conditional logit (CL) model:

π_ij = Pr(Y_i = j | X_ij) = exp(X_ij^⊤ β) / Σ_{k=1}^J exp(X_ik^⊤ β)

β represents how characteristics of candidate j for voter i are associated with voting probabilities

X_ij does not have to vary across voters (e.g. whether candidate j is an incumbent)

In that case we suppress the subscript to Xj

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 202 / 242

Page 1012: Soc504: Generalized Linear Models

MNL as a Special Case of CL

Mathematically, MNL can be subsumed under CL using a set of artificial alternative-varying regressors for each V_i:

X_i1 = (V_i^⊤, 0, …, 0)^⊤,  X_i2 = (0, V_i^⊤, …, 0)^⊤,  ⋯,  X_iJ = (0, 0, …, V_i^⊤)^⊤

Set the elements of β corresponding to block j to δ_j and you get the MNL model

δ1 must be set to zero for identifiability

Thus we can write both models (and their mixture) simply as CL:

π_ij = Pr(Y_i = j | X) = exp(X_ij^⊤ β) / Σ_{k=1}^J exp(X_ik^⊤ β)

We use the names CL and MNL interchangeably from here on

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 203 / 242
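The block-regressor construction above can be verified numerically. A hedged sketch (all covariate and coefficient values are made up): build X_ij by placing V_i in block j and zeros elsewhere, stack the δ_j into one β, and check that the CL formula reproduces the MNL probabilities exactly.

```python
import math

def softmax(scores):
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    t = sum(e)
    return [x / t for x in e]

# MNL directly: score_j = V_i . delta_j, with delta_1 = 0 (base category)
v = [1.0, 0.0, 2.0]
deltas = [[0.0, 0.0, 0.0], [0.4, -0.2, 0.1], [-0.3, 0.5, 0.2]]
mnl = softmax([sum(a * b for a, b in zip(v, d)) for d in deltas])

# CL with artificial regressors: X_ij puts V_i in block j, zeros elsewhere,
# and beta stacks (delta_1, delta_2, delta_3)
J, K = len(deltas), len(v)
beta = [d_k for d in deltas for d_k in d]          # stacked coefficient vector
X = []
for j in range(J):
    xij = [0.0] * (J * K)
    xij[j * K:(j + 1) * K] = v                     # V_i in block j
    X.append(xij)
cl = softmax([sum(a * b for a, b in zip(xij, beta)) for xij in X])

print(all(abs(p - q) < 1e-12 for p, q in zip(mnl, cl)))  # True
```

Since X_ij^⊤ β = V_i^⊤ δ_j by construction, the two score vectors coincide term by term, so the two sets of probabilities are identical.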

Page 1018: Soc504: Generalized Linear Models - Princeton UniversitySoc504: Generalized Linear Models Brandon Stewart1 Princeton February 22 - March 15, 2017 1These slides are heavily in uenced

Predictor Types and Data Formats

Discrete choice data usually come in one of two formats:

1. Wide format: N rows, #V + J·#X predictors

   choice    women  educ  idist.Clinton  idist.Bush  idist.Perot
   Bush          1     3         4.0804      0.1024       0.2601
   Bush          1     4         4.0804      0.1024       0.2601
   Clinton       1     2         1.0404      1.7424       0.2401
   Bush          0     6         0.0004      5.3824       2.2201
   Clinton       1     3         0.9604     11.0220       6.2001
   ...

2. Long format: NJ rows, #V + #X predictors

   chid  alt      choice  women  educ   idist
   1     Bush     TRUE        1     3  0.1024
   1     Clinton  FALSE       1     3  4.0804
   1     Perot    FALSE       1     3  0.2601
   2     Bush     TRUE        1     4  0.1024
   2     Clinton  FALSE       1     4  4.0804
   2     Perot    FALSE       1     4  0.2601
   ...

Use reshape to change between wide and long

Some estimation functions (e.g. mlogit) can take both formats
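In R this reshaping is typically done with reshape (or mlogit's own data utilities). As a sketch of what the wide-to-long transformation actually does, the following toy function (Python for concreteness; the record values are taken from the table above, the function name is hypothetical) unstacks one wide record into one long row per alternative:

```python
# One wide-format record: choice + individual-specific (women, educ)
# + alternative-varying idist.* columns (values from the slide's table).
wide = {"choice": "Bush", "women": 1, "educ": 3,
        "idist.Clinton": 4.0804, "idist.Bush": 0.1024, "idist.Perot": 0.2601}

alternatives = ["Bush", "Clinton", "Perot"]

def wide_to_long(row, chid, alts, varying=("idist",)):
    """Turn one wide record into one long-format row per alternative."""
    long_rows = []
    for alt in alts:
        r = {"chid": chid, "alt": alt, "choice": row["choice"] == alt,
             "women": row["women"], "educ": row["educ"]}
        # Alternative-varying predictors: pull the column matching this alternative.
        for v in varying:
            r[v] = row[f"{v}.{alt}"]
        long_rows.append(r)
    return long_rows

for r in wide_to_long(wide, chid=1, alts=alternatives):
    print(r)
```

Individual-specific predictors (women, educ) are simply repeated across the J rows, while each alternative-varying column lands in the row for its own alternative.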

Latent Variable Interpretation

Recall the random utility model:

$$Y_{ij}^* = X_{ij}^\top \beta + \varepsilon_{ij},$$

where $Y_{ij}^*$ is the latent utility voter $i$ derives from choosing $j$, and $\varepsilon_{ij}$ is the stochastic component of that utility.

Assume that each voter chooses the most preferred candidate, i.e.,

$$Y_i = j \quad \text{if} \quad Y_{ij}^* \geq Y_{ij'}^* \text{ for all } j' \in \{1, \ldots, J\}$$

Assuming $\varepsilon_{ij} \overset{\text{i.i.d.}}{\sim}$ type I extreme value distribution, this setup implies MNL (McFadden 1974).

Proof for $J = 2$ (using the fact that the difference of two independent type I extreme value variables follows a standard logistic distribution):

$$\begin{aligned}
\Pr(Y_i = 1 \mid X) &= \Pr(Y_{i1}^* \geq Y_{i2}^* \mid X) \\
&= \Pr\left(\varepsilon_{i2} - \varepsilon_{i1} \leq (X_{i1} - X_{i2})^\top \beta\right) \\
&= \frac{\exp\left((X_{i1} - X_{i2})^\top \beta\right)}{1 + \exp\left((X_{i1} - X_{i2})^\top \beta\right)} \\
&= \frac{\exp(X_{i1}^\top \beta)}{\exp(X_{i1}^\top \beta) + \exp(X_{i2}^\top \beta)}
\end{aligned}$$
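McFadden's result can also be checked by simulation for $J = 3$: draw i.i.d. type I extreme value (Gumbel) errors, pick the utility-maximizing alternative, and compare the resulting choice frequencies with the closed-form softmax probabilities. A Monte Carlo sketch with made-up linear predictors:

```python
import math, random

random.seed(0)
eta = [0.4, -0.2, 1.1]   # hypothetical linear predictors X_ij' beta, J = 3

# Closed-form MNL probabilities.
Z = sum(math.exp(e) for e in eta)
mnl = [math.exp(e) / Z for e in eta]

def gumbel():
    # Standard type I extreme value draw via the inverse CDF.
    u = random.random()
    return -math.log(-math.log(u))

n = 200_000
counts = [0, 0, 0]
for _ in range(n):
    utilities = [e + gumbel() for e in eta]
    counts[utilities.index(max(utilities))] += 1
freq = [c / n for c in counts]

print(mnl)
print(freq)  # agrees with mnl up to Monte Carlo error
```

With 200,000 draws the simulated frequencies match the analytic probabilities to roughly three decimal places.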

Estimation and Inference

Estimation via MLE.

Likelihood for a random sample of size $n$:

$$L(\beta \mid Y, X) = \prod_{i=1}^n \prod_{j=1}^J \pi_{ij}^{\mathbb{1}\{Y_i = j\}}$$

It can be shown that the log-likelihood is globally concave
$\Rightarrow$ guaranteed convergence to the global (not merely a local) MLE.
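Because the log-likelihood is globally concave, even plain gradient ascent converges. A minimal sketch (tiny synthetic data with one scalar alternative-varying predictor; all values hypothetical, and a real analysis would use an optimizer such as R's mlogit rather than this hand-rolled loop):

```python
import math

# Tiny synthetic dataset: x[i][j] = alternative-varying predictor, y[i] = chosen j.
x = [[0.0, 1.0, 2.0], [1.0, 0.0, 2.0], [2.0, 1.0, 0.0], [0.0, 2.0, 1.0]]
y = [2, 0, 0, 1]

def probs(xi, beta):
    # Stable CL probabilities for one observation with scalar beta.
    m = max(beta * v for v in xi)
    e = [math.exp(beta * v - m) for v in xi]
    s = sum(e)
    return [v / s for v in e]

def loglik(beta):
    return sum(math.log(probs(xi, beta)[yi]) for xi, yi in zip(x, y))

def score(beta):
    # dl/dbeta = sum_i ( x_{i,y_i} - sum_j pi_ij x_ij )
    g = 0.0
    for xi, yi in zip(x, y):
        pi = probs(xi, beta)
        g += xi[yi] - sum(p * v for p, v in zip(pi, xi))
    return g

beta = 0.0
for _ in range(500):
    beta += 0.1 * score(beta)  # fixed step; fine for a concave 1-D problem

print(beta, loglik(beta), score(beta))  # score is ~0 at the MLE
```

At convergence the score is numerically zero and the log-likelihood exceeds its value at the starting point, as concavity guarantees.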

Interpreting MNL/CL Coefficients

In MNL/CL, $\beta$ itself is not necessarily informative about the effect of $X$:

1. The coefficients are all relative to the baseline category
   $\rightarrow$ testing $\beta_j = 0$ does not generally make sense (unless comparison to the baseline is the goal)

2. Changing $X_{ij}$ also affects $\Pr(Y_i = k \mid X)$ for $k \neq j$:
   - For individual-specific characteristics ($V_i$), even the sign of $\delta_j$ may not agree with the direction of the change in the response probability for $j$
   - For alternative-varying characteristics ($X_{ij}$), the sign of $\beta$ does indicate the direction of the effect, but the magnitude is hard to interpret

Compute a quantity that has a clear substantive interpretation!
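The first caveat under point 2 can be made concrete: the marginal effect of an individual-specific covariate on $\pi_j$ is $\pi_j(\delta_j - \sum_k \pi_k \delta_k)$, so a positive $\delta_j$ can coexist with a falling $\pi_j$ whenever $\delta_j$ is below the probability-weighted average of the coefficients. A numeric sketch with hypothetical coefficients:

```python
import math

def mnl_probs(v, deltas):
    # Individual-specific scalar covariate v; score_j = delta_j * v.
    m = max(d * v for d in deltas)
    e = [math.exp(d * v - m) for d in deltas]
    s = sum(e)
    return [x / s for x in e]

deltas = [0.0, 1.0, 3.0]  # delta_2 = 1 > 0, but smaller than delta_3 = 3
p_lo = mnl_probs(1.0, deltas)
p_hi = mnl_probs(1.5, deltas)

# Although delta_2 is positive, raising v LOWERS the probability of
# alternative 2, because alternative 3's score grows even faster.
print(p_lo[1], p_hi[1])
```

So the sign of a single $\delta_j$ says nothing by itself about how $\pi_j$ moves, which is exactly why the slides recommend computing substantive quantities instead.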

Calculating Quantities of Interest

1. Choice probability:

   $$\pi_j(x) = \Pr(Y_i = j \mid X = x)$$

   e.g. How likely is a female, college-educated, conservative Republican voter to vote for Perot?

2. Predicted vote share:

   $$p_j(x_1) \equiv \mathbb{E}\left[\mathbb{1}\{\pi_j(X_{i1} = x_1, X_{i2}) \geq \pi_k(X_{i1} = x_1, X_{i2}) \text{ for all } k\}\right]$$

   where $X_{i1}$ is the predictor(s) of interest and $X_{i2}$ is all other predictors.

   e.g. What would Perot's vote share be if all voters supported abortion?

3. Average partial (treatment) effects:

   $$\tau_{jk} = \mathbb{E}\left[\pi_j(T_{ik} = 1, T_{i*}, W_i) - \pi_j(T_{ik} = 0, T_{i*}, W_i)\right]$$

   where $T_{ik}$ is the treatment on candidate $k$, $T_{i*}$ is the treatment on the other candidates, and $W_i$ is the pre-treatment covariates.

   - "Direct effect" if $j = k$; "indirect effect" if $j \neq k$
   - If $T$ is individual-specific, $\tau_j = \mathbb{E}[\pi_j(T_i = 1, W_i) - \pi_j(T_i = 0, W_i)]$

Estimate by plugging in sample analogues (e.g. $\pi_j \to \hat{\pi}_j$, $\mathbb{E} \to \frac{1}{n}\sum$)
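The plug-in recipe for the individual-specific case of quantity 3 can be sketched directly: take (estimated or assumed) coefficient values, then average the fitted probabilities over the sample with the treatment toggled on and off. The coefficients and covariate values below are made up for illustration:

```python
import math

def probs(t, w, delta_t, delta_w):
    # MNL with J = 3; score_j = delta_t[j]*t + delta_w[j]*w, baseline deltas = 0.
    scores = [dt * t + dw * w for dt, dw in zip(delta_t, delta_w)]
    m = max(scores)
    e = [math.exp(s - m) for s in scores]
    z = sum(e)
    return [x / z for x in e]

# Hypothetical coefficients (first alternative is the baseline) and covariates.
delta_t = [0.0, 0.8, -0.4]
delta_w = [0.0, 0.2, 0.5]
W = [-1.0, 0.0, 0.5, 1.0, 2.0]   # sample of pre-treatment covariates

# Average partial effect of T on the probability of alternative j = 1:
# tau_j = (1/n) sum_i [ pi_j(T=1, W_i) - pi_j(T=0, W_i) ]
j = 1
tau_j = sum(probs(1, w, delta_t, delta_w)[j] - probs(0, w, delta_t, delta_w)[j]
            for w in W) / len(W)
print(tau_j)
```

In practice the coefficients would come from the fitted model, and simulation from their sampling distribution would be used to attach uncertainty to $\hat{\tau}_j$.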

Example: 1992 U.S. Presidential Election

Model specification (Alvarez and Nagler 1995):

$$\pi_{ij} = \frac{\exp(X_{ij}^\top \beta + V_i^\top \delta_j)}{\sum_{k=1}^J \exp(X_{ik}^\top \beta + V_i^\top \delta_k)}$$

where

  $X_{ij}$ = {ideological distance}
  $V_i$ = {1, issue opinions, party, gender, education, age, ...}

Estimated coefficients (standard errors in parentheses):

  $\hat{\beta} = -0.11 \; (0.02)$

  $\hat{\delta} = [\hat{\delta}_{\text{Bush}} \;\; \hat{\delta}_{\text{Clinton}}]$:

                        Bush           Clinton
  (intercept)           0.67 (0.94)   -0.41 (0.45)
  (support abortion)   -0.52 (0.11)   -0.02 (0.12)
  (female)              0.54 (0.23)    0.30 (0.22)
  ...

Example: 1992 U.S. Presidential Election

Estimated choice probabilities for a "typical" voter from the South:

  [Figure: estimated choice probabilities (0 to 1) for Bush, Clinton, and Perot]

Predicted vote shares if everyone opposed abortion:

  [Figure: predicted vote shares (0 to 1) for Bush, Clinton, and Perot]

Assumptions in Multinomial Logit Models

Recall that MNL assumes $\varepsilon_{ij}$ is i.i.d.

In particular, $\varepsilon_{ij} \perp\!\!\!\perp \varepsilon_{ik}$ for $j \neq k$.

This implies that unobserved factors affecting $Y_{ij}^*$ are unrelated to those affecting $Y_{ik}^*$.

When is this assumption plausible?

Example: a multiparty election with parties R, L1 and L2. Do voters' unobserved ideological preferences affect $\Pr(Y_i = \text{L1})$ independently of their effect on $\Pr(Y_i = \text{L2})$? Probably not.

Independence of Irrelevant Alternatives

MNL makes the independence of irrelevant alternatives (IIA) assumption:

$$\frac{\Pr(\text{Choose } j \mid j \text{ or } k)}{\Pr(\text{Choose } k \mid j \text{ or } k)} = \frac{\Pr(\text{Choose } j \mid j, k, \text{ or } l)}{\Pr(\text{Choose } k \mid j, k, \text{ or } l)} \quad \text{for any } l \in \{1, \ldots, J\}$$

A classic example of an IIA violation: the red bus-blue bus problem.

The relative risk of $j$ over $k$ does not depend on the other alternatives:

$$\frac{\Pr(Y_i = j \mid X_i)}{\Pr(Y_i = k \mid X_i)} = \exp\{(X_{ij} - X_{ik})^\top \beta\}$$

That is, the multinomial choice reduces to a series of independent pairwise comparisons.
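The invariance of the relative risk is mechanical under MNL and can be verified directly: with the same $\beta$, the $j$-to-$k$ probability ratio is unchanged when alternative $l$ is added to the choice set (all predictor values below are hypothetical):

```python
import math

beta = -0.5
x = {"j": 1.0, "k": 2.0, "l": 0.2}  # alternative-varying predictor values

def choice_probs(alts):
    # MNL probabilities restricted to the given choice set.
    scores = {a: beta * x[a] for a in alts}
    z = sum(math.exp(s) for s in scores.values())
    return {a: math.exp(s) / z for a, s in scores.items()}

p2 = choice_probs(["j", "k"])           # binary choice set
p3 = choice_probs(["j", "k", "l"])      # third alternative added

ratio2 = p2["j"] / p2["k"]
ratio3 = p3["j"] / p3["k"]
# Both ratios equal exp{(x_j - x_k) * beta}, regardless of l.
print(ratio2, ratio3, math.exp(beta * (x["j"] - x["k"])))
```

The normalizing constant cancels in the ratio, which is exactly why adding a "red bus" to a choice set draws probability proportionally from every alternative, however implausible that is substantively.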

Multinomial Probit Model

How can we relax the IIA assumption?

Instead of assuming ε_ij to be i.i.d. across alternatives j, we allow ε_ij to be correlated across j within each voter i

Multinomial probit model (MNP):

Y*_i = X_i'β + ε_i,   where ε_i ~ i.i.d. N(0, Σ_J)

Y*_i = [Y*_i1 · · · Y*_iJ]',   X_i = [X_i1 · · · X_iJ]'

Restrictions on the model for identifiability:
- The (absolute) level of Y*_i shouldn't matter → subtract the 1st equation from all the other equations and work with a system of J − 1 equations with ε̃_i ~ i.i.d. N(0, Σ̃_{J−1})
- The scale of Y*_i also shouldn't matter → Σ̃_(1,1) = 1

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 213 / 242

Multinomial Probit Walkthrough

U*_i ~ N(u*_i | μ_i, Σ)
μ_ij = x_ij β_j

with observation mechanism:

Y_ij = 1 if U*_ij > U*_ij′ for all j′ ≠ j
Y_ij = 0 otherwise

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 214 / 242
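This data-generating process is straightforward to simulate. A sketch with illustrative values for β and Σ (not taken from the slides): draw correlated latent utilities per voter, then apply the max-utility observation mechanism:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative setup: n voters, J = 3 alternatives, one alternative-specific
# covariate x_ij with coefficient beta_j
n, J = 1000, 3
beta = np.array([0.5, -0.3, 0.8])
X = rng.normal(size=(n, J))
mu = X * beta  # mu_ij = x_ij * beta_j

# Errors correlated across alternatives within each voter (this is what MNP allows)
Sigma = np.array([[1.0, 0.5, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
eps = rng.multivariate_normal(np.zeros(J), Sigma, size=n)

U = mu + eps          # latent utilities U*_ij
Y = U.argmax(axis=1)  # observation mechanism: Y_ij = 1 for the max-utility j
print(np.bincount(Y, minlength=J) / n)  # empirical choice shares
```

Here alternatives 1 and 2 are substitutes (error correlation 0.5), which is exactly the pattern MNL's i.i.d. errors rule out.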

The stochastic component:

Pr(Y_ij = 1) = π_ij,   s.t. Σ_{j=1}^J π_ij = 1   for i = 1, ..., n

Systematic component. Let Y*_ij = U*_ij − U*_ij′, so the observation mechanism is

Y_ij = 1 if Y*_ij > 0
Y_ij = 0 otherwise

π_ij = Pr(Y_ij = 1)
     = Pr(Y*_i1 ≤ 0, ..., Y*_ij > 0, ..., Y*_iJ ≤ 0)
     = ∫_{−∞}^{0} · · · ∫_{0}^{∞} · · · ∫_{−∞}^{0} N(y | μ_i, Σ) dy_i1 · · · dy_ij · · · dy_iJ

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 215 / 242
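This orthant probability has no convenient closed form, so in practice π_ij is approximated by simulation. A crude frequency-simulator sketch (illustrative μ_i and Σ, not from the slides): draw latent utilities and record how often each alternative has the maximum:

```python
import numpy as np

def mnp_choice_probs(mu, Sigma, draws=200_000, seed=0):
    """Frequency-simulator approximation of the MNP probabilities pi_ij:
    draw latent utilities and count how often each alternative is the maximum."""
    rng = np.random.default_rng(seed)
    U = rng.multivariate_normal(mu, Sigma, size=draws)
    return np.bincount(U.argmax(axis=1), minlength=len(mu)) / draws

# Illustrative mean utilities and covariance for one observation i
mu_i = np.array([0.3, 0.0, -0.2])
Sigma = np.array([[1.0, 0.4, 0.0],
                  [0.4, 1.0, 0.0],
                  [0.0, 0.0, 1.0]])
pi = mnp_choice_probs(mu_i, Sigma)
print(pi)  # the pi_ij sum to one by construction
```

Production MNP estimators use smoother simulators (e.g. GHK) for the same integral; the frequency simulator above just makes the definition of π_ij concrete.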

Computational and Estimation Issues

No analytical solution is known for the integral

Moreover, the number of parameters in Σ_J increases as J gets large, but the data contain little information about Σ_J:

J                            3    4    5    6    7
# of elements in Σ_J         6   10   15   21   28
# of parameters identified   2    5    9   14   20

Consequently, MNP is only feasible when J is small

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 216 / 242
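The counts in the table follow from Σ_J being a J × J symmetric matrix (J(J+1)/2 free elements), with the location normalization (differencing against a base alternative) and the scale normalization removing the rest. A quick arithmetic check:

```python
# Reproducing the table: Sigma_J has J(J+1)/2 free elements; differencing
# against a base alternative and fixing one variance leave J(J-1)/2 - 1
# identified covariance parameters.
for J in range(3, 8):
    elements = J * (J + 1) // 2
    identified = J * (J - 1) // 2 - 1
    print(J, elements, identified)
```

The gap between the two rows widens with J, which is why the slide concludes MNP is only feasible for small choice sets.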

1 Binary Outcome Models

2 Quantities of Interest
   An Example with Code
   Predicted Values
   First Differences
   General Algorithms

3 Model Diagnostics for Binary Outcome Models

4 Ordered Categorical

5 Unordered Categorical

6 Event Count Models
   Poisson
   Overdispersion
   Binomial for Known Trials

7 Duration Models
   Exponential Model
   Weibull Model
   Cox Proportional Hazards Model

8 Duration-Logit Correspondence

9 Appendix: Multinomial Models

10 Appendix: More on Overdispersed Poisson

11 Appendix: More on Binomial Models

12 Appendix: Gamma Regression

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 217 / 242

Estimating the Dispersion Parameter

In GLMs, φ is typically estimated sequentially after β̂ is obtained by MLE. Two common methods for the common case of a(φ) = φ/ω_i:

1. Calculate the (unscaled) deviance:

   D(Y; μ̂) ≡ φ D*(Y; μ̂) = 2 Σ_{i=1}^n ω_i { Y_i(θ̃_i − θ̂_i) − (b(θ̃_i) − b(θ̂_i)) }

   which approximately follows φ · χ²_{n−k} (because D*(Y; μ̂) ~ approx. χ²_{n−k})

   Then estimate φ by φ̂_D = D(Y; μ̂)/(n − k) (because E[χ²_{n−k}] = n − k)

2. Calculate the generalized Pearson χ² statistic:

   X² ≡ Σ_{i=1}^n ω_i (Y_i − μ̂_i)² / b″(θ̂_i) = φ Σ_{i=1}^n (Y_i − μ̂_i)² / V̂_i

   which also approximately follows φ · χ²_{n−k}

   Then estimate φ by φ̂_P = X²/(n − k) (by the same logic)

When Y_i ~ N, D = X² ~ φχ²_{n−k} exactly, and φ̂_D and φ̂_P are identical and the MLE

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 218 / 242
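The Pearson-based estimator φ̂_P is easy to compute by hand. A sketch for the simplest case, an intercept-only Poisson fit (μ̂_i = ȳ, k = 1, V̂_i = μ̂_i) on simulated overdispersed counts, with illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated overdispersed counts: negative binomial draws with mean 5 and
# variance 10, so the true dispersion relative to Poisson is phi = 2
n = 5000
y = rng.negative_binomial(n=5, p=0.5, size=n)

# Intercept-only Poisson "fit": mu_hat_i = ybar for all i, k = 1 parameter
mu_hat = y.mean()
k = 1

# Generalized Pearson statistic with V_i = mu_hat (Poisson variance function)
X2 = np.sum((y - mu_hat) ** 2 / mu_hat)
phi_hat = X2 / (n - k)
print(phi_hat)  # close to 2, i.e. well above 1: overdispersion
```

A value of φ̂_P near 1 is consistent with the nominal Poisson variance; values well above 1, as here, signal overdispersion.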

Relaxing the Distributional Assumption

Note that in the overdispersed Poisson model, Y_i cannot be Poisson distributed

How can this seemingly arbitrary modification be justified?

In GLMs, we can replace the distributional assumption with the variance function assumption:

1. Systematic component: η_i = X_i'β
2. Link function: X_i'β = g(μ_i), where μ_i ≡ E(Y_i | X)
3. Variance function: V(Y_i | X) = φψ(μ_i)

That is, we specify the mean and variance but remain agnostic about the rest of f(Y) (i.e. the likelihood)

With this reduced set of assumptions, what can we learn?

Bottom line: We lose nothing, thanks to the properties of the exponential family

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 219 / 242

Negative Binomial Regression Models

The overdispersed Poisson model is also called the negative binomial 1 (NB1) model

An alternative parameterization for allowing overdispersion:

E(Y_i | X_i) = μ_i   and   V(Y_i | X_i) = μ_i + μ_i²/γ > E(Y_i | X_i)

This is called the negative binomial 2 (NB2) model

The NB2 model corresponds to the following PMF:

p(Y_i | μ_i, γ) = [Γ(Y_i + γ) / (Y_i! Γ(γ))] · (μ_i / (μ_i + γ))^{Y_i} · (γ / (μ_i + γ))^{γ},   where μ_i = exp(X_i'β)

(But keep in mind, you don't need to exactly assume this PMF!)

Estimation via (Q)MLE

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 220 / 242
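The NB2 mean-variance relationship can be verified by simulation. numpy's negative_binomial(n, p) sampler matches the PMF above under n = γ and p = γ/(μ_i + γ); a sketch with illustrative values of μ and γ:

```python
import numpy as np

rng = np.random.default_rng(2)

# numpy's negative_binomial(n, p) matches the NB2 PMF above with
# n = gamma and p = gamma / (mu + gamma); mu and gamma are illustrative
mu, gamma = 4.0, 2.0
y = rng.negative_binomial(n=gamma, p=gamma / (mu + gamma), size=200_000)

print(y.mean())  # approx. E(Y) = mu = 4
print(y.var())   # approx. V(Y) = mu + mu**2 / gamma = 12 > E(Y)
```

The simulated variance sits near μ + μ²/γ rather than μ, which is precisely the quadratic (NB2) overdispersion pattern, as opposed to NB1's variance proportional to the mean.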

1 Binary Outcome Models

2 Quantities of Interest
  An Example with Code
  Predicted Values
  First Differences
  General Algorithms

3 Model Diagnostics for Binary Outcome Models

4 Ordered Categorical

5 Unordered Categorical

6 Event Count Models
  Poisson
  Overdispersion
  Binomial for Known Trials

7 Duration Models
  Exponential Model
  Weibull Model
  Cox Proportional Hazards Model

8 Duration-Logit Correspondence

9 Appendix: Multinomial Models

10 Appendix: More on Overdispersed Poisson

11 Appendix: More on Binomial Models

12 Appendix: Gamma Regression

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 221 / 242


Grouped Uncorrelated Binary Variables

Same model as binary logit, but we only observe sums of iid groups of Bernoulli trials, e.g., the number of times you voted out of the last 5 elections.
\[
Y_i \sim \text{Binomial}(y_i \mid \pi_i) = \binom{N_i}{y_i} \pi_i^{y_i} (1 - \pi_i)^{N_i - y_i}
\]
where
\[
\pi_i = [1 + e^{-x_i\beta}]^{-1}
\]
which implies
\[
E(Y_i) \equiv \mu_i = N_i \pi_i = N_i [1 + e^{-x_i\beta}]^{-1}
\]
and a likelihood of
\[
L(\pi \mid y) \propto \prod_{i=1}^{n} \text{Binomial}(y_i \mid \pi_i) = \prod_{i=1}^{n} \binom{N_i}{y_i} \pi_i^{y_i} (1 - \pi_i)^{N_i - y_i}
\]
Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 222 / 242
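The likelihood above drops straight into code. A minimal sketch (the function name and the four-person toy data are invented for illustration) evaluates the grouped-logit log-likelihood at a candidate β:

```python
import numpy as np
from scipy.stats import binom

def grouped_logit_loglik(beta, y, N, X):
    """Sum of Binomial(y_i | N_i, pi_i) log-PMFs with pi_i = 1/(1 + exp(-x_i beta))."""
    pi = 1.0 / (1.0 + np.exp(-X @ beta))
    return binom.logpmf(y, N, pi).sum()

# Toy data: 4 people, each observed over N_i = 5 elections.
X = np.array([[1.0, -1.0], [1.0, 0.0], [1.0, 1.0], [1.0, 2.0]])
y = np.array([1, 2, 4, 5])
N = np.array([5, 5, 5, 5])
ll = grouped_logit_loglik(np.array([0.2, 0.8]), y, N, X)
```

Passing this function (negated) to a numerical optimizer yields the MLE, exactly as with binary logit.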


Grouped Uncorrelated Binary Variables

The log-likelihood is then:
\[
\ln L(\pi \mid y) = \sum_{i=1}^{n} \left\{ \ln \binom{N_i}{y_i} + y_i \ln \pi_i + (N_i - y_i) \ln(1 - \pi_i) \right\}
\]
and after substituting in the systematic component:
\begin{align*}
\ln L(\beta \mid y) &\doteq \sum_{i=1}^{n} \left\{ -y_i \ln[1 + e^{-x_i\beta}] + (N_i - y_i) \ln\!\left(1 - [1 + e^{-x_i\beta}]^{-1}\right) \right\} \\
&= -\sum_{i=1}^{n} \left\{ (N_i - y_i) \ln(1 + e^{x_i\beta}) + y_i \ln(1 + e^{-x_i\beta}) \right\}
\end{align*}
where \(\doteq\) drops the constant \(\ln \binom{N_i}{y_i}\), which does not involve \(\beta\).

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 223 / 242
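The simplification can be verified numerically: the compact form should equal the full binomial log-likelihood minus the constant \(\sum_i \ln \binom{N_i}{y_i}\). A quick check on made-up data:

```python
import numpy as np
from scipy.special import gammaln

X = np.array([[1.0, -1.0], [1.0, 0.5], [1.0, 2.0]])
y = np.array([1, 3, 5])
N = np.array([5, 5, 5])
beta = np.array([0.3, -0.7])

eta = X @ beta
pi = 1.0 / (1.0 + np.exp(-eta))

# Full binomial log-likelihood, including the binomial coefficients.
full = np.sum(gammaln(N + 1) - gammaln(y + 1) - gammaln(N - y + 1)
              + y * np.log(pi) + (N - y) * np.log(1 - pi))
# The constant term that does not involve beta.
const = np.sum(gammaln(N + 1) - gammaln(y + 1) - gammaln(N - y + 1))
# The simplified form from the slide.
simplified = -np.sum((N - y) * np.log1p(np.exp(eta)) + y * np.log1p(np.exp(-eta)))
```

The two expressions agree to floating-point precision, so either can be handed to the optimizer.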


Grouped Uncorrelated Binary Variables

Notes:

1. Similar log-likelihood to binary logit
2. All inference is about the same \(\pi\) as in binary logit
3. How to simulate and compute quantities of interest?
   (a) Run optim, and get \(\hat\beta\) and the variance matrix.
   (b) Draw many values of \(\tilde\beta\) from the multivariate normal with mean vector \(\hat\beta\) and the variance matrix that come from optim.
   (c) Set X to your choice of values, \(X_c\).
   (d) Calculate simulations of the probability that any of the component binary variables is a one: \(\tilde\pi_c = [1 + e^{-X_c \tilde\beta}]^{-1}\)
   (e) If \(\pi\) is of interest, summarize with mean, SD, CIs, or histograms as needed.
   (f) If simulations of y are needed, go one more step and draw y from \(\text{Binomial}(y_i \mid \tilde\pi_i)\).

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 224 / 242
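Steps (a)-(f) above can be sketched in a few lines. Everything here is hypothetical: `beta_hat` and `V_hat` stand in for the MLE and variance matrix you would actually take from your own optimizer run, and the covariate profile is made up.

```python
import numpy as np

rng = np.random.default_rng(1)

# (a) Pretend these came out of the optimizer; in practice use the MLE
# and the inverse (negative) Hessian from your own fit.
beta_hat = np.array([0.2, 0.9])
V_hat = np.array([[0.04, 0.01], [0.01, 0.09]])

# (b) Draw many parameter vectors; (c) pick a covariate profile X_c.
beta_sims = rng.multivariate_normal(beta_hat, V_hat, size=10_000)
x_c = np.array([1.0, 0.5])

# (d) Simulations of pi_c = [1 + exp(-X_c beta)]^{-1}.
pi_sims = 1.0 / (1.0 + np.exp(-beta_sims @ x_c))

# (e) Summaries of estimation uncertainty in pi_c.
pi_mean, pi_sd = pi_sims.mean(), pi_sims.std()
pi_ci = np.quantile(pi_sims, [0.025, 0.975])

# (f) If predicted counts are needed, add the fundamental (binomial) variability.
y_sims = rng.binomial(5, pi_sims)
```

Note the division of labor: `pi_sims` carries only estimation uncertainty, while `y_sims` layers the binomial variability on top.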


Grouped Correlated Binary Variables

In the binomial-logit model, \(V(Y) = \pi_i(1 - \pi_i)/N_i\), with no \(\sigma^2\)-like parameter to take up slack. The beta-binomial (or extended BB) adds this extra parameter. The model:
\[
Y_i \sim f_{ebb}(y_i \mid \pi_i, \gamma)
\]
where, recall,
\[
f_{ebb}(y_i \mid \pi_i, \gamma) = \Pr(Y_i = y_i \mid \pi_i, \gamma, N)
= \frac{N!}{y_i!(N - y_i)!} \prod_{j=0}^{y_i - 1} (\pi_i + \gamma j) \prod_{j=0}^{N - y_i - 1} (1 - \pi_i + \gamma j) \Bigg/ \prod_{j=0}^{N-1} (1 + \gamma j)
\]
and
\[
\pi_i = \frac{1}{1 + e^{-x_i\beta}}
\]
Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 225 / 242
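The EBB PMF above is easy to sanity-check: it should sum to one over y = 0..N, and at γ = 0 it should collapse to the ordinary binomial. A sketch (`febb_pmf` is an illustrative helper; N, π, γ are made-up values):

```python
import numpy as np
from math import comb
from scipy.stats import binom

def febb_pmf(y, N, pi, gamma):
    """Pr(Y=y) = C(N,y) prod_{j<y}(pi + g j) prod_{j<N-y}(1-pi + g j) / prod_{j<N}(1 + g j)."""
    num1 = np.prod([pi + gamma * j for j in range(y)])          # empty product = 1
    num2 = np.prod([1 - pi + gamma * j for j in range(N - y)])
    den = np.prod([1 + gamma * j for j in range(N)])
    return comb(N, y) * num1 * num2 / den

N, pi, gamma = 5, 0.3, 0.2
p_ebb = np.array([febb_pmf(y, N, pi, gamma) for y in range(N + 1)])
p_bin = np.array([febb_pmf(y, N, pi, 0.0) for y in range(N + 1)])
```

The γ = 0 check makes the nesting concrete: the binomial-logit model is the special case with no extra-binomial correlation.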


The probability model of all the data:
\[
\Pr(Y = y \mid \beta, \gamma; N) = \prod_{i=1}^{n} \left(\frac{N!}{y_i!(N - y_i)!}\right)
\times \prod_{j=0}^{y_i - 1} \left\{ [1 + \exp(-x_i\beta)]^{-1} + \gamma j \right\}
\times \prod_{j=0}^{N - y_i - 1} \left\{ [1 + \exp(x_i\beta)]^{-1} + \gamma j \right\} \Bigg/ \prod_{j=0}^{N-1} (1 + \gamma j)
\]
\begin{align*}
\ln L(\beta, \gamma \mid y) &= \sum_{i=1}^{n} \Bigg\{ \ln\!\left(\frac{N!}{y_i!(N - y_i)!}\right)
+ \sum_{j=0}^{y_i - 1} \ln\!\left\{ [1 + \exp(-x_i\beta)]^{-1} + \gamma j \right\}
+ \sum_{j=0}^{N - y_i - 1} \ln\!\left\{ [1 + \exp(x_i\beta)]^{-1} + \gamma j \right\}
- \sum_{j=0}^{N-1} \ln(1 + \gamma j) \Bigg\} \\
&\doteq \sum_{i=1}^{n} \Bigg\{ \sum_{j=0}^{y_i - 1} \ln\!\left\{ [1 + \exp(-x_i\beta)]^{-1} + \gamma j \right\}
+ \sum_{j=0}^{N - y_i - 1} \ln\!\left\{ [1 + \exp(x_i\beta)]^{-1} + \gamma j \right\}
- \sum_{j=0}^{N-1} \ln(1 + \gamma j) \Bigg\}
\end{align*}
Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 226 / 242


Notes:

1. The math looks complicated.
2. The use of this model is simple.
3. \(\gamma\) soaks up binomial misspecification.
4. Assuming binomial when EBB is the right model causes SEs to be wrong.
5. How to simulate to compute quantities of interest?
   (a) Run optim, and get \(\hat\beta\), \(\hat\gamma\) and the variance matrix.
   (b) Draw many values of \(\tilde\beta\) and \(\tilde\gamma\) from the multivariate normal with mean vector \((\hat\beta, \hat\gamma)\) and the variance matrix that come from optim.
   (c) Set X to your choice of values, \(X_c\).
   (d) Calculate simulations of the probability that any of the component binary variables is a one: \(\tilde\pi_c = [1 + e^{-X_c \tilde\beta}]^{-1}\)
   (e) If \(\pi\) is of interest, summarize with mean, SD, CIs, or histograms as needed.
   (f) If simulations of y are needed, go one more step and draw y from \(f_{ebb}(y_i \mid \tilde\pi_i, \tilde\gamma)\).

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 227 / 242
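The full EBB pipeline can be sketched end to end. All data and starting values below are made up; `scipy.optimize.minimize` stands in for R's optim, and one deliberate deviation from the recipe above is that γ is optimized on the log scale to keep it positive (the slide draws (β, γ) directly).

```python
import numpy as np
from math import log
from scipy.optimize import minimize

rng = np.random.default_rng(2)

# Made-up grouped data from a beta-binomial with gamma = 0.3.
n, N = 200, 5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true, gamma_true = np.array([0.3, 0.8]), 0.3
pi_true = 1 / (1 + np.exp(-X @ beta_true))
p_latent = rng.beta(pi_true / gamma_true, (1 - pi_true) / gamma_true)
y = rng.binomial(N, p_latent)

def negloglik(theta):
    # Parameterize gamma = exp(theta[2]) for positivity (a deviation from
    # the slide, which works with gamma directly).
    beta, gamma = theta[:2], np.exp(theta[2])
    ll = 0.0
    pi = 1 / (1 + np.exp(-X @ beta))
    for yi, pii in zip(y, pi):
        ll += sum(log(pii + gamma * j) for j in range(yi))
        ll += sum(log(1 - pii + gamma * j) for j in range(N - yi))
        ll -= sum(log(1 + gamma * j) for j in range(N))
    return -ll

# (a) Optimize; BFGS's inverse-Hessian approximation stands in for the
# variance matrix you would compute from optim's hessian.
res = minimize(negloglik, x0=np.array([0.0, 0.0, np.log(0.2)]), method="BFGS")
theta_hat = res.x
V_hat = (res.hess_inv + res.hess_inv.T) / 2  # symmetrize

# (b)-(d) Draw parameter vectors, set a covariate profile, simulate pi_c.
sims = rng.multivariate_normal(theta_hat, V_hat, size=5000)
x_c = np.array([1.0, 1.0])
pi_c = 1 / (1 + np.exp(-sims[:, :2] @ x_c))
```

Note that steps (c)-(e) are identical to the binomial case: only the estimation step, and the final draw of y in step (f), involve γ.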


1 Binary Outcome Models

2 Quantities of Interest (An Example with Code; Predicted Values; First Differences; General Algorithms)

3 Model Diagnostics for Binary Outcome Models

4 Ordered Categorical

5 Unordered Categorical

6 Event Count Models (Poisson; Overdispersion; Binomial for Known Trials)

7 Duration Models (Exponential Model; Weibull Model; Cox Proportional Hazards Model)

8 Duration-Logit Correspondence

9 Appendix: Multinomial Models

10 Appendix: More on Overdispersed Poisson

11 Appendix: More on Binomial Models

12 Appendix: Gamma Regression


Generalized Gamma Distribution

The Weibull and Exponential distributions are special cases of the Generalized Gamma distribution, Y ∼ GGamma(ν, λ, p):

fY(y) = [p λ^(pν) / Γ(ν)] y^(pν−1) exp(−(λy)^p)

When p = 1, Y ∼ Gamma(ν, λ):

fY(y) = [λ^ν / Γ(ν)] y^(ν−1) exp(−λy)

When ν = 1, Y ∼ Weibull(1/λ, p):

fY(y) = p λ^p y^(p−1) exp(−(λy)^p)

When p = 1 and ν = 1, Y ∼ Expo(λ):

fY(y) = λ exp(−λy)
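These reductions are easy to check numerically. A minimal sketch with the densities written out as above; the parameter values are arbitrary:

```python
import math

def ggamma_pdf(y, nu, lam, p):
    # f(y) = p * lam^(p*nu) / Gamma(nu) * y^(p*nu - 1) * exp(-(lam*y)^p)
    return (p * lam ** (p * nu) / math.gamma(nu)) * y ** (p * nu - 1) * math.exp(-((lam * y) ** p))

def gamma_pdf(y, nu, lam):
    return lam ** nu / math.gamma(nu) * y ** (nu - 1) * math.exp(-lam * y)

def weibull_pdf(y, lam, p):
    return p * lam ** p * y ** (p - 1) * math.exp(-((lam * y) ** p))

def expo_pdf(y, lam):
    return lam * math.exp(-lam * y)

# arbitrary test point and parameters
y, nu, lam, p = 1.7, 2.5, 0.8, 1.9
print(math.isclose(ggamma_pdf(y, nu, lam, 1), gamma_pdf(y, nu, lam)))  # p = 1: Gamma
print(math.isclose(ggamma_pdf(y, 1, lam, p), weibull_pdf(y, lam, p)))  # nu = 1: Weibull
print(math.isclose(ggamma_pdf(y, 1, lam, 1), expo_pdf(y, lam)))        # both: Exponential
```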


When would a Gamma regression be appropriate?

For positive random variables with a skewed distribution, the variance often increases with the mean.

Poisson random variable: Var(Y) = E[Y] = µ.

Another case occurs when the standard deviation increases linearly with the mean:

√Var(Y) ∝ E(Y)

In this case, the coefficient of variation (the ratio of standard deviation to expectation) is constant:

c.v. = √Var(Y) / E(Y)

The Gamma distribution has this property.
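A quick simulation makes the constant coefficient of variation concrete (the shape and means below are illustrative): holding the shape ν fixed while the mean moves, c.v. stays at 1/√ν.

```python
import numpy as np

rng = np.random.default_rng(1)
nu = 4.0                                  # fixed shape => c.v. = 1/sqrt(nu) = 0.5
cvs = []
for mean in (2.0, 10.0, 50.0):
    scale = mean / nu                     # numpy's Gamma mean = shape * scale
    y = rng.gamma(nu, scale, size=200_000)
    cvs.append(y.std() / y.mean())
print([round(c, 2) for c in cvs])         # roughly [0.5, 0.5, 0.5] at every mean
```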


When would a Gamma regression be appropriate?

Example: Mean duration of the developmental period in Drosophila melanogaster (McCullagh & Nelder, 1989).

[Figure: mean duration (hrs.) against temperature (Celsius).]

[Figure: batch standard deviation against mean duration (hrs.).]

Gamma shapes

ν is the shape parameter and λ the rate parameter.

f(y) = [λ^ν / Γ(ν)] y^(ν−1) exp(−λy)

[Figure: Gamma densities f(y) for ν = 0.5, 1, 2, 5, with λ = 1.]

Special cases:

ν = 1 ⟹ Exponential

ν → ∞ ⟹ Normal

Gamma as an EDF

fY(y) = [λ^ν / Γ(ν)] y^(ν−1) exp(−λy)

      = exp{ [−(λ/ν)y + ln(λ/ν)] / ν^(−1) + ν ln(νy) − ln(y) − ln Γ(ν) }

where

θ = −λ/ν

φ = ν^(−1) = σ²

b(θ) = −ln(λ/ν) = −ln(−θ)

E[Y] = b′(θ) = ν/λ = µ

Var(Y) = φ b″(θ) = (1/ν)(ν²/λ²) = σ²µ²
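The mean and variance identities can be sanity-checked by differentiating the cumulant function b(θ) = −ln(−θ) numerically; ν and λ below are arbitrary:

```python
import math

nu, lam = 3.0, 1.5
theta = -lam / nu                         # canonical parameter
phi = 1.0 / nu                            # dispersion

def b(t):                                 # cumulant function b(theta) = -ln(-theta)
    return -math.log(-t)

# central finite differences for b'(theta) and b''(theta)
h1, h2 = 1e-5, 1e-4
b_prime = (b(theta + h1) - b(theta - h1)) / (2 * h1)
b_double = (b(theta + h2) - 2 * b(theta) + b(theta - h2)) / h2 ** 2

print(b_prime, nu / lam)                         # E[Y] = b'(theta) = nu/lambda
print(phi * b_double, (1 / nu) * (nu / lam) ** 2)  # Var(Y) = phi * b''(theta)
```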


Link functions

Canonical link:

η = θ = −1/µ

The reciprocal transformation does not map the range of µ onto the whole real line.

The requirement that µ > 0 places restrictions on the β's.

The canonical link is rarely used.

Link functions

Inverse polynomial, linear:

η = µ^(−1) = β0 + β1/x

Inverse polynomial, quadratic:

η = µ^(−1) = β0 + β1 x + β2/x

Inverse polynomials have the appealing property that η is everywhere positive and bounded.

Application: sometimes used in plant density experiments, where yield per plant (yi) varies inversely with plant density (xi).
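A small numeric check of that boundedness claim (the coefficients here are made up): with β0, β1 > 0, the linear inverse polynomial keeps µ strictly between 0 and 1/β0 for all positive x.

```python
import numpy as np

b0, b1 = 0.05, 0.4                       # hypothetical positive coefficients
x = np.linspace(0.1, 100.0, 1000)        # e.g. plant density
eta = b0 + b1 / x                        # linear inverse polynomial link
mu = 1.0 / eta                           # implied mean yield per plant
print(mu.min() > 0, mu.max() < 1 / b0)   # positive and bounded by 1/b0
```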

Link functions

Log link:

η = ln(µ) = β0 + β1 x

η = ln(µ) = β0 + β1 x + β2/x

Application: useful for describing functions that have turning points but are noticeably asymmetric around that point.

Link functions

Identity link:

η = µ = β0 + β1 x

Application: used for modeling variance components.

Maximum Likelihood Estimation

L = ∏_{i=1}^{n} [λ^ν / Γ(ν)] yi^(ν−1) exp(−λ yi)

ln L = Σ_{i=1}^{n} ln{ [λ^ν / Γ(ν)] yi^(ν−1) exp(−λ yi) }

     = Σ_{i=1}^{n} [ν ln λ − ln Γ(ν) + (ν − 1) ln yi − λ yi]
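As a stand-in for handing ln L to an optimizer such as optim, a crude grid search on simulated data recovers the parameters; the true values ν = 3, λ = 2 are chosen arbitrarily:

```python
import math
import numpy as np

rng = np.random.default_rng(2)
nu_true, lam_true = 3.0, 2.0
y = rng.gamma(nu_true, 1.0 / lam_true, size=50_000)   # numpy's scale = 1/lambda

# sufficient statistics for the Gamma log-likelihood
n, sum_logy, sum_y = y.size, np.log(y).sum(), y.sum()

def loglik(nu, lam):
    # sum_i [ nu ln(lam) - ln Gamma(nu) + (nu - 1) ln y_i - lam y_i ]
    return n * (nu * math.log(lam) - math.lgamma(nu)) + (nu - 1) * sum_logy - lam * sum_y

# crude grid search standing in for a numerical optimizer
nus = np.linspace(2.0, 4.0, 81)
lams = np.linspace(1.0, 3.0, 81)
ll = np.array([[loglik(a, b) for b in lams] for a in nus])
i, j = np.unravel_index(ll.argmax(), ll.shape)
print(nus[i], lams[j])                                # near (3.0, 2.0)
```

In practice a gradient-based optimizer replaces the grid, but the objective is exactly the log-likelihood written above.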

Gamma regression with weights

Suppose your data consist of n observations, each from a separate group i ∈ {1, . . . , n}, where group i contains ni individuals.

Example: Yi is the total duration of the embryonic period across the ni fruit flies in batch i:

Yi = Σ_{j=1}^{ni} Yij,  Yij = duration for the j-th embryo in the i-th batch

Ȳi = Yi / ni = average duration in the i-th batch

If the Yij ∼ Gamma(λi, ν) are independent, with λi = ν/µi:

E[Ȳi] = (1/ni) Σ_{j=1}^{ni} E[Yij] = (1/ni) Σ_{j=1}^{ni} µi = µi

Var(Ȳi) = (1/ni²) Var(Yi) = σ² µi² / ni  ⟹ weights = ni
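A simulation check of the weighting argument (the shape, group mean, and batch size below are invented): the variance of a batch mean shrinks to σ²µi²/ni.

```python
import numpy as np

rng = np.random.default_rng(3)
nu, mu_i, n_i = 4.0, 10.0, 25            # shape, group mean, batch size (invented)
lam_i = nu / mu_i                        # lambda_i = nu / mu_i
sigma2 = 1.0 / nu                        # dispersion

# simulate 100,000 batches of n_i durations and average within each batch
yij = rng.gamma(nu, 1.0 / lam_i, size=(100_000, n_i))
ybar = yij.mean(axis=1)

print(round(ybar.mean(), 1))             # ~ mu_i = 10.0
print(round(ybar.var(), 2))              # ~ sigma2 * mu_i**2 / n_i = 1.0
```

Because the batch-mean variance is inversely proportional to ni, weighting each batch by ni restores a constant-information fit.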


Application: Drosophila melanogaster

Four models estimated:

1 log(Durationi) = β0 + β1 Tempi

2 log(Durationi) = β0 + β1 Tempi + β2/Tempi

3 log(Durationi) = β0 + β1 Tempi + β2/Tempi (weighted by batch size)

4 log(Durationi) = β0 + β1 Tempi + β2/(Tempi − δ) (weighted by batch size)


Application: Drosophila melanogaster

[Four panels of fitted curves: mean duration (hrs.) against temperature (Celsius).]

Panel 1: log(µ) = β0 + β1x. Min. duration at 35 degrees.

Panel 2: log(µ) = β0 + β1x + β2/x. Min. duration at 35 degrees.

Panel 3: log(µ) = β0 + β1x + β2/x (weighted). Min. duration at 35 degrees.

Panel 4: log(µ) = β0 + β1x + β2/(x − δ) (weighted). Min. duration at 29.95 degrees.

Stewart (Princeton) GLMs Feb 22 - Mar 15, 2017 242 / 242