Event History Analysis 1 Sociology 8811 Lecture 14 Copyright © 2007 by Evan Schofer Do not copy or distribute without permission

Event History Analysis 1

Sociology 8811 Lecture 14

Copyright © 2007 by Evan Schofer

Do not copy or distribute without permission

Announcements

• Paper #1 due on Thursday!
  • Questions?

• New Topic: Event History Analysis

Regression and EHA: Examples

• Medical Research on Drug Efficacy

• Question #1: Do patients with larger doses of a drug have lower cholesterol?

• Approach: OLS Regression
  • If assumptions are met, OLS is appropriate
  • Independent Variable = dosage (“level” of drug)
  • Dependent Variable = cholesterol (“level”)

Regression Example: Cholesterol

• Relationship between level of X and Y is modeled as a linear function: Y = a + bX + e

[Figure: Cholesterol Level (100 to 300) plotted against Drug Dosage (mg), 0 to 70]
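As a rough illustration of the linear model Y = a + bX + e, the sketch below fits a and b by ordinary least squares using the closed-form formulas. The dosage and cholesterol values are invented for illustration and are not data from the lecture; `ols_fit` is a hypothetical helper name.

```python
# Closed-form OLS for one predictor: Y = a + bX + e.
# All numbers below are made up for illustration only.

def ols_fit(x, y):
    """Return (a, b): the intercept and slope that minimize squared error."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope b = sum of cross-deviations / sum of squared deviations of X
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b

dosage = [0, 10, 20, 30, 40, 50, 60, 70]                # mg (hypothetical)
cholesterol = [290, 270, 265, 240, 230, 205, 200, 180]  # hypothetical levels

a, b = ols_fit(dosage, cholesterol)
print(f"cholesterol = {a:.1f} + ({b:.2f}) * dosage")
```

A negative slope b would correspond to lower cholesterol at higher doses.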

Example 2: Drug & Mortality

• Suppose a different question:

• Does increased drug dosage reduce the incidence of mortality among patients?

• The dependent variable has a different character

• 1. Whereas cholesterol is measured as a “level” (continuously), mortality is “discrete”

• Either the patient lives or they don’t (not a “level”)

• 2. Also, TIMING is an issue
  • Not just if a patient survives, but how long
  • A drug that extends life is good, even if patients die

Logit/Probit Strategies

• Research strategies to address this problem:

• 1. Use a non-linear regression model for discrete outcomes: Logit, Probit, etc.
  • Dependent variable is a dummy for patient mortality
  • Look for relationship between dosage and mortality

• Benefit: Easy. An analog of regression

• Limitation: Doesn’t take timing into account
  • All patients that die have the same influence on the model (whether they live 5 days or 20 years due to the drug dosage).

Logit/Probit Strategy: Visual

• Relationship between level of X and the discrete variable Y is modeled as a non-linear function

[Figure: Mortality (No/Yes) plotted against Drug Dosage (mg), 0 to 70]

Drug & Mortality: OLS Regression

• Option #2: Use OLS regression to model the time elapsed (duration) until mortality
  – Rather than ask “did they live or die?” (logit/probit), you ask “how long did they live?”
  • Compute a variable that reflects the time until mortality (in relevant time units, e.g., months since drug therapy is started)
  • Model time as the dependent variable
  • Observe: Do patients with high drug doses die later than ones with low doses?

OLS Duration Strategy: Visual

[Figure: Months Until Mortality (0 to 80) plotted against Drug Dosage (mg), 0 to 70]

• Q: Where do you put individuals who were alive at the end of the study?

Drug & Mortality: OLS Regression

• Problem #1: What about patients who don’t experience mortality during study?

  • This is called “censored data”
  • If study is 80 months, you know that Y>80…
    – But, you don’t have an exact value

• What do you do?
  – Treat them as experiencing mortality at the very end of the study? Or approximate time of mortality?
  – Exclude them? NO! That selects on the dependent variable!

• Possible solution: Use models for censored data
  – Ex: tobit model.
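To see why neither workaround is satisfactory, here is a small simulation sketch. It assumes exponentially distributed survival times with a mean of 60 months (an arbitrary choice, not from the lecture) and a study that ends at 80 months. Under these assumptions, both treating censored patients as dying at month 80 and excluding them should give average durations below the true mean.

```python
import random

random.seed(0)
STUDY_END = 80   # months, as in the slide's example
TRUE_MEAN = 60   # assumed mean survival time (hypothetical)

# Simulated "true" survival times; a real study never sees all of these.
true_times = [random.expovariate(1 / TRUE_MEAN) for _ in range(10_000)]

# What the researcher observes: time is cut off at the end of the study.
observed = [min(t, STUDY_END) for t in true_times]
died = [t <= STUDY_END for t in true_times]  # False means right-censored

def mean(values):
    return sum(values) / len(values)

true_mean = mean(true_times)
# Workaround 1: pretend censored patients died at the end of the study.
mean_treat_as_dead = mean(observed)
# Workaround 2: drop censored patients (selects on the dependent variable).
mean_excluded = mean([t for t, d in zip(observed, died) if d])

print(f"true mean duration:        {true_mean:.1f}")
print(f"censored treated as dead:  {mean_treat_as_dead:.1f}")
print(f"censored cases excluded:   {mean_excluded:.1f}")
```

Excluding the censored cases removes exactly the longest-lived patients, which is why it tends to be the more biased of the two.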

Drug & Mortality: OLS Regression

• Problem #2: Temporal data often violates normality assumption of OLS regression

  • Often violations are quite bad
  • “Censored” data is a surmountable problem, but normality violation is usually not
  • So, we shouldn’t typically use OLS!

Drug and Mortality: EHA Strategy

• Event History Analysis (EHA) provides purchase on this exact type of problem
  • And others, as well

• In essence, EHA models a dependent variable that reflects both:
  – 1. Whether or not a patient experiences mortality (like logit), and…
  – 2. When it occurs (like an OLS regression of duration)

• Note: This information is typically encoded in 2 or more variables

Drug and Mortality: EHA Strategy

• Moreover: EHA is very flexible and can address various situations:

• 1. EHA can address “repeated” events
  • Mortality can only occur once per patient.
  • But, heart attack can occur repeatedly, at different points in time, further confounding OLS or probit

• 2. EHA can address different time-clocks
  • Durations could be coded in a number of contexts:
  • From start of study. Age of patient. Historical time.

• And even more complex issues

EHA: Overview and Terminology

• EHA is referred to as “dynamic” modeling
  • i.e., addresses the timing of outcomes: rates

• Dependent variable is best conceptualized as a rate of some occurrence
  • Not a “level” or “amount” as in OLS regression
  • Think: “How fast?” “How often?”

• The “occurrence” may be something that can occur only once for each case: e.g., mortality

• Or, it may be repeatable: e.g., marriages, strategic alliances.

EHA: Overview

• EHA involves both descriptive and parametric analysis of data
  • Just like regression
  • Scatterplots, partial plots = descriptive
  • OLS model/hypothesis tests = parametric

• Descriptive analyses/plots
  • Allow description of the overall rate of some outcome
  • For all cases, or for various subgroups

• Parametric Models
  • Allow hypothesis testing about variables that affect rate (and can include control variables).

EHA: Types of Questions

• Some types of questions EHA can address:

• 1. Mortality: Does drug dosage reduce rates?
  • Does “rate” decrease with larger doses?
  • Also: control for race, gender, treatment options, etc.

• 2. Life stage transitions: timing of marriage
  • Is rate affected by gender, class, religion?

• 3. Organizational mortality
  • Is rate affected by size, historical era, competition?

• 4. Civil war
  • Is rate affected by economic, political factors?

EHA Terminology: States & Events

• EHA has evolved its own terminology:

• “State” = the “state of being” of a case
  • Conceptualized in terms of discrete phenomena
  • e.g., alive vs. dead

• “State space” = the set of all possible states
  • Can be complex: Single, married, divorced, widowed

• “Event” = Occurrence of the outcome of interest
  • Shift from “alive” to “dead”, “single” to “married”
  • Occurs at a specific, known point in time

Terminology: Risk & Spells

• “Risk Set” = the set of all cases capable of experiencing the event

  • e.g., those “at risk” of experiencing mortality
  • Note: the risk set changes over time…

• “Spell” = A chunk of time that a case experiences, bounded by: events, and/or the start or end of the study
  • As in “I’m gonna sit here for a spell…”
  • EHA is, in essence, an analysis of a set of spells (experienced by a given sample of cases).

States, Spells, & Events: Visually

• If we assign numeric values to states, it is easy to graph cases over time

• As they experience 1 or more spells

• Example: drug & mortality study

• States:
  • Alive = 0
  • Dead = 1

• Time = measured in months
  • Starting at zero, when the study begins
  • Ending at 60 months, when study ends (5 years).

States, Spells, & Events: Visually

• Example of mortality at month 33

[Figure: State (0 or 1) plotted against Time (Months), 0 to 60. Spell #1 (state 0) ends with the Event; Spell #2 (state 1) runs until End of Study]

• Note: It takes 2 spells to describe this case
  – But, we may only be interested in the first spell. (Because there is no possibility of change after transition to state = 1)

States, Spells, & Events: Visually

• Example of a patient who is cured
  – Doesn’t experience mortality during study

[Figure: State (0 or 1) plotted against Time (Months), 0 to 60. Spell #1 stays in state 0 until End of Study]

• Note: Only 1 spell is needed
  – The spell indicates a consistent state (0), for the period of time in which we have information
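The two patient histories in this example (mortality at month 33, and survival through the end of the 60-month study) can be encoded as a duration plus an event indicator. The sketch below is one minimal way to do that; the `Spell` class and `encode_patient` function are hypothetical names chosen for illustration, not part of any EHA package.

```python
from dataclasses import dataclass
from typing import Optional

STUDY_END = 60  # months; the study in the slides runs for 5 years

@dataclass
class Spell:
    duration: float  # months spent in state 0 (alive)
    event: int       # 1 = mortality observed, 0 = no event observed by study end

def encode_patient(death_month: Optional[float],
                   study_end: float = STUDY_END) -> Spell:
    """Encode the first spell of one patient as (duration, event)."""
    if death_month is not None and death_month <= study_end:
        return Spell(duration=death_month, event=1)
    # No death observed inside the study window: the spell ends at study end.
    return Spell(duration=study_end, event=0)

patient_a = encode_patient(33)    # experiences mortality at month 33
patient_b = encode_patient(None)  # still alive when the study ends

print(patient_a)
print(patient_b)
```

Only the first spell is encoded here, since nothing can change after the transition to state 1.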

More Terminology: Censoring

• Note: In both cases, data runs out after month 60

• Even if the patient is still alive

• In temporal analysis, we rarely have data for all relevant time for all cases

• “Censored” = indicates the absence of data before or after a certain point in time

• As in: “data on cases is censored at 60 months”

• “Right Censored” = no data after a time point

• “Left Censored” = no data before a time point

States, Spells, & Events: Visually

• A more complex state space: partnership
  • 0 = single, 1 = married, 2 = divorced, 3 = widowed

• Individual history:
  • Married at 20, divorced at 27, remarried at 33

[Figure: State (0 to 3) plotted against Age (Years), 16 to 44. Spells #1 through #4; the last spell is Right Censored at 45]

Measuring States and Times

• EHA, in short, is the analysis of spells
  • It takes into account the duration of spells, and whether or not there was a change of state at the end

• States at start and end of spell are measured by assigning pre-defined values to a variable

• Much like logit/probit or multinomial logit

• Times at the start and end of spell must also be measured

• Time Unit = The time metric in the study
  • e.g., minutes, hours, days, months, years, etc.

Time Clock

• Time Clock = time reference of the analysis

• Possibilities:
  • Duration since start of study
  • Chronological age of case (person, firm, country)
  • Duration since end of last spell
    – i.e., clock is set to zero at start of each spell
  • Historical time: the actual calendar date

• The choice of time-clock can radically change the analysis and meaning of results

• It is crucial to choose a clock that makes sense for the hypotheses you wish to test

Time Clocks Visually: Age

[Figure: State (0 to 3) plotted against Age (Years), 16 to 44. Spells #1 through #4, with the last spell running to End of Study]

• EHA examines rate of transitions as a function of a person’s age

Time Clocks Visually: Duration

• Single from 16-20 (4 years), married from 20-27 (7 years), divorced from 27-33 (6 yrs), remarried at 33-45 (12 yrs)

[Figure: State (0 to 3) plotted against Duration (Years), showing Spells #1 through #4]

• EHA examines rate of transitions as a function of a person’s duration in their current state
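The difference between the age clock and the duration clock can be made concrete with the partnership history from the slides (single from 16, married at 20, divorced at 27, remarried at 33, right censored at 45). The sketch below is a minimal illustration; `to_spells` and the dictionary field names are hypothetical.

```python
# Partnership history from the slides, recorded on the age clock.
# States: 0 = single, 1 = married, 2 = divorced, 3 = widowed
history = [(16, 0), (20, 1), (27, 2), (33, 1)]  # (age at entry, state)
CENSOR_AGE = 45                                 # right censored at 45

def to_spells(history, censor_age):
    """Turn (entry age, state) pairs into spells carrying both time clocks."""
    spells = []
    for i, (start_age, state) in enumerate(history):
        is_last = i == len(history) - 1
        end_age = censor_age if is_last else history[i + 1][0]
        spells.append({
            "state": state,
            "start_age": start_age,           # age clock
            "end_age": end_age,
            "duration": end_age - start_age,  # duration clock, reset each spell
            "event": 0 if is_last else 1,     # last spell ends by censoring
        })
    return spells

spells = to_spells(history, CENSOR_AGE)
for s in spells:
    print(s)
```

The same four spells are described either by their start and end ages or by their durations of 4, 7, 6, and 12 years.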

Time Clocks: General Advice

• Different time-clocks have different strengths
  • We’ll discuss this more…

• Chronological Age = good for processes clearly linked to age
  • Biological things: fertility, mortality
  • Liability of newness

• Historical time = useful for examining the impact of historical change on ongoing phenomena

• E.g., effects of changing regulatory regimes on rates of strategic alliances

Moving Toward Analyses: Example

• Example: Employee retention
  • How long after hiring before employees quit?

• Data: Sample of 12 employees at McDonalds

• Time-Clock/Time Unit: duration of employment from time of hiring (measured in days)

• 2 Possible states:
  • Employed & No longer employed

• We are uninterested in subsequent hires
  • Therefore, we focus on initial spell, ending in quitting.

Example: Employee Retention

• Visually – red line indicates length of employment spell for each case:

[Figure: one line per case (Cases) plotted against Time (days), 0 to 120; cases still employed at the end are marked Right Censored]

Simple EHA Descriptives

• Question: What simple things can we do to describe this sample of 12 employees?

• 1. Average duration of employment
  • Only works if all (or nearly all) have quit
  • Many censored cases make “average” meaningless
    – This is a fairly useful summary statistic
  • Gives a sense of overall speed of events
  • Especially useful when broken down by sub-groups
  • e.g., average by gender or compensation plan.

Descriptives: Average Duration

• Simply calculate the mean time-to-quitting

[Figure: one line per case (Cases) plotted against Time (days), 0 to 120, with Right Censored cases marked]

Average = 33.4 days

Simple EHA Descriptives

• Question: What simple things can we do to describe this sample of 12 employees?

• 2. Compute “Half Life” of employee tenure
  • Determine time at which attrition equals 50%
  • Also highlights the overall turnover rate
  • Note: Exact value is calculable, even if there are censored cases
  • Again, computing for sub-groups is useful

Descriptives: Half Life

• Determine time when ½ of sample has had event

[Figure: one line per case (Cases) plotted against Time (days), 0 to 120, with Right Censored cases marked]

Half Life = 23 days
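A half life can be computed directly from durations and quit indicators. The sketch below uses invented durations for 12 employees (the lecture's underlying data are not given), chosen so that the answer matches the 23 days shown above. It assumes every censored case was observed for at least as long as the half life, as when all censoring happens at the end of the study. `half_life` is a hypothetical helper name.

```python
def half_life(durations, quit_flags):
    """Earliest time by which at least half of the sample has quit.

    Assumes censored cases (quit flag 0) were all observed at least as
    long as the returned time. Returns None if fewer than half have quit.
    """
    n = len(durations)
    quit_times = sorted(t for t, q in zip(durations, quit_flags) if q)
    needed = -(-n // 2)  # ceil(n / 2) using integer arithmetic
    if len(quit_times) < needed:
        return None
    return quit_times[needed - 1]

# Hypothetical data: 9 employees quit, 3 are right censored at day 100.
durations = [5, 9, 12, 15, 18, 23, 31, 52, 70, 100, 100, 100]
quit_flags = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

print(half_life(durations, quit_flags))
```

Unlike the mean, this value does not change if the three censored employees stay for another year, which is why it remains usable with censored cases.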

Simple EHA Descriptives

• Question: What simple things can we do to describe this sample of 12 employees?

• 3. Tabulate (or plot) quitters in different time-periods: e.g., 1-20 days, 21-40 days, etc.

  • Absolute numbers of “quitters” or “stayers”
    – or
  • Numbers of quitters as a proportion of “stayers”
  • Or look at number (or proportion) who have “survived” (i.e., not quit)

Descriptives: Tables

• For each period, determine number or proportion quitting/staying

[Figure: one line per case (Cases) plotted against Time (days), 0 to 120, divided into periods: Day 1-20, 20-40, 40-60, 60-80, 80-100]

EHA Descriptives: Tables

Period  Time Range  Quitters: Total #, %                  # staying
1       Day 1-20    5 quit, 42% of all, 42% of remaining  7 left, 58% of all
2       Day 21-40   2 quit, 17% of all, 29% of remaining  5 left, 42% of all
3       Day 41-60   1 quit, 8% of all, 20% of remaining   4 left, 33% of all
4       Day 61-80   1 quit, 8% of all, 25% of remaining   3 left, 25% of all
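The table's columns follow mechanically from the number of quitters per period. As a sketch, the function below rebuilds them from the counts in the table (12 employees; 5, 2, 1, and 1 quitting in successive periods). `life_table` and its field names are hypothetical.

```python
def life_table(n_start, quits_per_period):
    """Build per-period rows: quits, shares, and cases remaining."""
    rows = []
    at_risk = n_start  # still employed at the start of the period
    for period, quits in enumerate(quits_per_period, start=1):
        remaining = at_risk - quits
        rows.append({
            "period": period,
            "quit": quits,
            "pct_of_all": 100 * quits / n_start,
            "pct_of_remaining": 100 * quits / at_risk,
            "left": remaining,
            "pct_left": 100 * remaining / n_start,
        })
        at_risk = remaining  # the risk set shrinks for the next period
    return rows

table = life_table(12, [5, 2, 1, 1])
for row in table:
    print(f"Period {row['period']}: {row['quit']} quit, "
          f"{row['pct_of_all']:.0f}% of all, "
          f"{row['pct_of_remaining']:.0f}% of remaining; "
          f"{row['left']} left ({row['pct_left']:.0f}% of all)")
```

The denominator for "% of remaining" is the number still employed at the start of each period, which is why 1 quitter out of 4 counts as 25% while 1 out of 12 counts as only 8%.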

EHA Descriptives: Tables

• Remarks on EHA tables:

• 1. Results of tables change depending on time-ranges chosen (like a histogram)

• E.g., comparing 20-day ranges vs. 10-day ranges

• 2. % quitters vs. % quitters as a proportion of those still employed

• Absolute % can be misleading since the number of people left in the risk set tends to decrease

• A low # of quitters can actually correspond to a very high rate of quitting for those remaining in the firm

• Typically, these ratios are more socially meaningful than raw percentages.

EHA Descriptives: Plots

• We can also plot tabular information:

[Figure: Percent (0 to 100) plotted against Time Period (0 to 5), with two series: % Quit (of Remaining) and % Remaining]

The Survivor Function

• A more sophisticated version of % remaining
  • Calculated based on continuous time (calculus), rather than based on some arbitrary interval (e.g., day 1-20)

• Survivor Function – S(t): The probability (at time = t) of not having the event prior to time t.
  • Always equal to 1 at time = 0 (when no events can have happened yet)
  • Decreases as more cases experience the event
  • When graphed, it is typically a decreasing curve
  • Looks a lot like % remaining
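One standard way to estimate a survivor function from data with right-censored cases is the Kaplan-Meier estimator. The sketch below is a minimal pure-Python version; the 12 durations are invented for illustration and `kaplan_meier` is a hypothetical function name, not a call from any particular package.

```python
def kaplan_meier(durations, events):
    """Return [(t, S(t))] at each distinct event time.

    durations: time each case was observed
    events:    1 if the event occurred at that time, 0 if right censored
    """
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    survival = 1.0  # S(t) starts at 1 before any event has happened
    curve = []
    for t in event_times:
        # Risk set: cases still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        n_events = sum(1 for d, e in zip(durations, events)
                       if d == t and e == 1)
        survival *= 1 - n_events / at_risk
        curve.append((t, survival))
    return curve

# Hypothetical data: 9 employees quit, 3 are right censored at day 100.
durations = [5, 9, 12, 15, 18, 23, 31, 52, 70, 100, 100, 100]
events = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

curve = kaplan_meier(durations, events)
for t, s in curve:
    print(f"day {t:3d}: S(t) = {s:.3f}")
```

With no censoring before the last event, as in this toy data, the estimate reduces to the plain proportion of cases remaining.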

Survivor Function

• McDonald’s Example:

[Figure: “Survivor Function: McDonalds Employees”. S(t) (0 to 1) plotted against Time (0 to 120). Annotation: Steep decreases indicate lots of quitting at around 20 days]

The Hazard Function

• A more sophisticated version of # events divided by # remaining

• Hazard Function – h(t) = The probability of an event occurring at a given point in time, given that it hasn’t already occurred

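• Formula:

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t + \Delta t > T \ge t \mid T \ge t)}{\Delta t}$$

  • Here T is the time at which the event occurs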

• Think of it as: the rate of events occurring for those at risk of experiencing the event
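For a continuous event time T, the hazard can also be expressed through the survivor function S(t). The density f(t) of event times is notation assumed here; it does not appear in the slides.

```latex
h(t) = \frac{f(t)}{S(t)} = -\frac{d}{dt}\,\ln S(t)
```

Read this way, the hazard is the rate at which the survivor function is shrinking, scaled by the share of cases still at risk.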

The Hazard Function

• Example:

[Figure: “McDonalds Employees: Hazard Rate”. h(t) (0.00 to 0.12) plotted against Time (0 to 80). Annotation: High (and wide) peaks indicate lots of quitting]

Cumulative Hazard Function

• Problem: the Hazard Function is often very spiky and hard to read/interpret

• Alternative #1: “Smooth” the hazard function (using a smoothing algorithm)

• Alternative #2: The “cumulative” or “integrated” hazard

  • Use calculus to “integrate” the hazard function
  • Recall: An integral represents the area under the curve of another function between 0 and t.
  • Integrated hazard functions always increase (opposite of the survivor function).
  • Big growth indicates that the hazard is high.
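In practice the integrated hazard is usually estimated rather than integrated by hand. A common choice is the Nelson-Aalen estimator, which adds up (events / cases at risk) at each event time. The sketch below is a minimal pure-Python version using invented durations for 12 employees; `nelson_aalen` is a hypothetical function name.

```python
import math

def nelson_aalen(durations, events):
    """Return [(t, H(t))]: cumulative hazard at each distinct event time."""
    event_times = sorted({t for t, e in zip(durations, events) if e == 1})
    cumulative = 0.0
    curve = []
    for t in event_times:
        # Risk set: cases still under observation just before time t.
        at_risk = sum(1 for d in durations if d >= t)
        n_events = sum(1 for d, e in zip(durations, events)
                       if d == t and e == 1)
        cumulative += n_events / at_risk  # hazard increment at time t
        curve.append((t, cumulative))
    return curve

# Hypothetical data: 9 employees quit, 3 are right censored at day 100.
durations = [5, 9, 12, 15, 18, 23, 31, 52, 70, 100, 100, 100]
events = [1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

cum_hazard = nelson_aalen(durations, events)
for t, h in cum_hazard:
    # exp(-H(t)) gives an approximate survivor value for comparison.
    print(f"day {t:3d}: H(t) = {h:.3f}, exp(-H) = {math.exp(-h):.3f}")
```

The increments grow as the risk set shrinks: one quit among 12 employees adds about 0.08, while one quit among the last 4 adds 0.25.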

Integrated Hazard Function

• Example:

[Figure: “McDonalds Employees: Integrated Hazard”. Integrated Hazard (0 to 1.8) plotted against Time (0 to 100). Annotations: Steep increases indicate peaks in hazard rate; “Flat” areas indicate low hazard rate]

Descriptive EHA: Marriage

• Example: Event = Marriage
  • Time Clock: Person’s Age
  • Data Source: NORC General Social Survey
  • Sample: 29,000 individuals

Survivor: Marriage

• Compare survivor for women, men:

[Figure: “Kaplan-Meier survival estimates, by dfem”. Survival (0.00 to 1.00) plotted against analysis time (0 to 100), with series dfem 0 and dfem 1. Annotations: Survivor plot for Men (declines later); Survivor plot for Women (declines earlier)]

Integrated Hazard: Marriage

• Compare Integrated Hazard for women, men:

[Figure: “Nelson-Aalen cumulative hazard estimates, by dfem”. Cumulative hazard (0.00 to 3.00) plotted against analysis time (0 to 100), with series dfem 0 and dfem 1. Annotation: Integrated Hazard for men increases slower (and remains lower) than women]

Hazard Plot: Marriage

• Hazard Rate: Full Sample

[Figure 3. Estimated hazard rate of entry into first marriage for entire sample. Estimated Hazard Rate (0 to .2) plotted against Age in Years (12 to 80)]

Survivor Plot: Pros/Cons

• Benefits:
  • 1. Clear, simple interpretation
  • 2. Useful for comparing subgroups in data

• Limitations:
  • 1. Mainly useful for a fixed risk set with a single non-repeating event (e.g., Drug trials/mortality)
    – If events recur frequently, the survivor drops to zero (and becomes uninterpretable)
  • 2. If the risk set fluctuates a lot, the survivor function becomes harder to interpret.

Hazard Plot Pros/Cons

• Benefits:
  • Directly shows the rate over time
    – This is the actual dependent variable modeled
  • Works well for repeating events

• Limitations:
  • Can be difficult to interpret; requires practice
  • Spikes make it hard to get a clear picture of trend
    – Pay close attention to width of spikes, not just height!
  • Choice of smoothing algorithms can affect results
  • Hard to compare groups (due to spikiness).

Integrated Hazard Plot Pros/Cons

• Benefits:
  • Closely related to the dependent variable that you’ll be modeling
  • Very good for comparing groups
  • Works for repeating events

• Limitations:
  • Not as intuitive as the actual hazard rate
  • Still takes some practice to interpret.