Event History Analysis 4 Sociology 8811 Lecture 18 Copyright © 2007 by Evan Schofer Do not copy or distribute without permission




Announcements

• Class topic: EHA data structures
• Later: More details on models, diagnostics

• Paper Assignment #2 coming soon…


Spell Data

• Background Concepts:

• In EHA, each line of data may not represent an entire case

• Rather, it describes a case for some span of time
• Either a spell, or part of a spell…
• Often called “multiple-record” data

• The complete specification of a spell entails:
• Start and end times of the spell
• State at the start and end of the spell
• But, simple analyses don’t always require all this info

– For instance, the start state is almost always zero… and thus does not need to be specified.


Spell Data: Example 1

• Example: Student completion of Grad School

• Research Question: What attributes of students lead to faster completion of PhDs?

• Time clock options:
• Age, duration, historical time, etc.

• Which is best for this research question?

• Answer:
• Duration. All we care about is time until graduation
• Age/historical era may be useful predictors, but aren’t the main substantive concern.


Spell Data: Example 1

• Independent variables of interest:

• Science field (vs. humanities/soc sci) (1=yes)
• Perhaps humanities/soc sci PhD takes longer

• Married at start of grad school (1=yes)
• Perhaps family puts pressure to get done with school

• Years since completion of undergrad
• Perhaps ‘taking a break’ makes you more focused

• Note: All variables are constant for the case
• None are “time-varying” covariates
• i.e., they don’t change from spell to spell.


Spell Data: Example 1

• Full specification of spell data:

ID   start time   end time   start state   end state   science?   married?   yrs after ugrad
1    0            5.6 yrs    0             1           1          1          0
2    0            9 yrs      0             0           0          0          0
3    0            6.8 yrs    0             1           0          0          2
4    0            2.2 yrs    0             0           1          0          0
5    0            7.2 yrs    0             1           0          1          4
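As a sketch of how these records might be held in code, the Python below stores the Example 1 table as one record per case. The `Spell` class and its field names are hypothetical, chosen to mirror the table's columns; times are in years.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Spell:
    """One record of spell data (hypothetical layout mirroring the table)."""
    case_id: int
    start: float            # start time of the spell (years)
    end: float              # end time of the spell (years)
    start_state: int        # state at start: 0 = no PhD yet
    end_state: int          # state at end: 1 = PhD completed, 0 = censored
    science: int            # 1 = science field
    married: int            # 1 = married at start of grad school
    yrs_after_ugrad: float  # years between undergrad and grad school

    @property
    def duration(self) -> float:
        return self.end - self.start


# Example 1: one record per student
example1 = [
    Spell(1, 0, 5.6, 0, 1, 1, 1, 0),
    Spell(2, 0, 9.0, 0, 0, 0, 0, 0),
    Spell(3, 0, 6.8, 0, 1, 0, 0, 2),
    Spell(4, 0, 2.2, 0, 0, 1, 0, 0),
    Spell(5, 0, 7.2, 0, 1, 0, 1, 4),
]

events = sum(s.end_state == 1 for s in example1)  # spells ending in a PhD
censored = len(example1) - events                  # spells ending without one
```

Every record starts at time 0 in state 0, so those two fields carry no information here. That is why they are often omitted in duration datasets.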


Spell Data: Example 1

• Notes on example 1:

• 1. Start time and start state are always zero
• Typical of duration datasets
• Often, specification of those variables can be omitted

• 2. A single line of data (one “record”) was sufficient to describe an entire case

• Except if an individual got 2 PhDs in the study

• 3. Equivalent information could be coded in multiple records per individual

• Total time is what matters, not # of lines of data.


Spell Data: Example 1

• The info could be coded in multiple spells:

ID   start time   end time   start state   end state   science?   married?   yrs after ugrad
1    0            5.6 yrs    0             1           1          1          0

• Can also be represented as:

1    0            2 yrs      0             0           1          1          0
1    2            3 yrs      0             0           1          1          0
1    3            5.6 yrs    0             1           1          1          0
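The equivalence between the one-record and three-record versions can be checked with a short Python sketch. `split_spell` is a hypothetical helper, not part of any package. It cuts a spell at chosen times, gives the original end state (the event) only to the last piece, and leaves total time at risk unchanged.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Record:
    case_id: int
    start: float
    end: float
    start_state: int
    end_state: int


def split_spell(rec, cuts):
    """Split one spell into consecutive sub-spells at the given cut times.

    Cut times outside (start, end) are ignored. Earlier pieces end in the
    start state (censored at the cut); only the last piece keeps the
    original end state.
    """
    inner = sorted(t for t in cuts if rec.start < t < rec.end)
    bounds = [rec.start, *inner, rec.end]
    pieces = []
    for lo, hi in zip(bounds, bounds[1:]):
        is_last = hi == rec.end
        pieces.append(replace(rec, start=lo, end=hi,
                              end_state=rec.end_state if is_last else rec.start_state))
    return pieces


# Case 1 from Example 1, split at 2 and 3 years as on the slide
pieces = split_spell(Record(1, 0.0, 5.6, 0, 1), [2.0, 3.0])
total = sum(p.end - p.start for p in pieces)  # still 5.6 years (up to rounding)
```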


Spell Data: Example 1

• Advantage of splitting a single spell into many:

• You can change values of independent variables over time

• Referred to as “time-varying” covariates

• Strategies:
• 1. Create a new record any time a covariate changes
• 2. Or, if variables change on a regular basis (e.g., yearly, daily, etc.), split the records by that unit
– Create a “yearly spell file” or “hourly spell file”.
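Strategy 1 can be sketched in Python as follows. The function name and the input format are assumptions for illustration. The input is a sorted list of hypothetical (change time, value) pairs for a single time-varying covariate. Each output record carries the value in force during its span, and only the final record can carry the event.

```python
def episodes_from_changes(case_id, start, end, event, changes):
    """Create one record per interval over which a covariate is constant.

    changes: (time, value) pairs sorted by time, first time <= start.
    event:   1 if the spell ends in the event at `end`, 0 if censored.
    Returns (case_id, start, end, covariate, event_at_end) tuples.
    """
    records = []
    for i, (t, value) in enumerate(changes):
        next_t = changes[i + 1][0] if i + 1 < len(changes) else end
        lo, hi = max(t, start), min(next_t, end)
        if lo < hi:  # skip intervals that fall outside the spell
            records.append((case_id, lo, hi, value, event if hi == end else 0))
    return records


# Hypothetical: a covariate switches from 0 to 1 at 2.5 years; event at 5.6 years
rows = episodes_from_changes(1, 0.0, 5.6, 1, [(0.0, 0), (2.5, 1)])
```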


Spell Data: Example 2

• Research Question #2: How has NSF funding affected the production of PhDs since 1970?

• Time Clock: Historical Time
• We are interested in the hazard rate per year for all universities, not the duration for individuals

• Independent variables of interest:
• Department funding from NSF (varies yearly)
• Department size (varies yearly)
• Individual age, gender, marital status.


Spell Data: Example 2

• Yearly split-spell historical time data:

ID   start time   end time   start state   end state   Funding $   Dept size   Gender
83   1971         1972       0             0           800k        65          0
83   1972         1973       0             0           600k        71          0
83   1973         1974       0             0           500k        69          0
83   1974         1975       0             1           900k        76          0
84   1988         1989       0             0           2.5m        106         1


EHA in STATA

• Stata requires that you specify the way your data is organized

• Stata command: stset
• “Survival Time Setup”
• No analysis commands will work until this is used

• Options vary depending on type of data


STATA: stset

• Simple data setup: stset timevar, failure(failvar==1)

• Specifies “timevar” as the variable holding the end time of each spell

• Specifies the variable defining an event (vs. censoring)
• And the condition representing an event

• Assumes the following:
• Start time is always 0
• Start state is always zero
• failvar = 0 represents censoring

• Example: stset endtime, failure(endstate==1)
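The following Python code roughly mimics what this setup produces for single-record data. It illustrates the behavior under the assumptions listed above and is not Stata's implementation. The output keys borrow the names of the variables stset creates (`_t0`, `_t`, `_d`), and the input rows are a hypothetical subset of Example 1.

```python
def simple_stset(rows, time_var, fail_var, fail_value=1):
    """Mimic single-record survival setup: each spell runs from 0 to time_var.

    _t0 = 0 (start of span), _t = end of span,
    _d  = 1 if fail_var equals fail_value (event), else 0 (censored).
    """
    out = []
    for row in rows:
        out.append({**row,
                    "_t0": 0.0,
                    "_t": float(row[time_var]),
                    "_d": int(row[fail_var] == fail_value)})
    return out


data = [
    {"id": 1, "endtime": 5.6, "endstate": 1},
    {"id": 2, "endtime": 9.0, "endstate": 0},
    {"id": 4, "endtime": 2.2, "endstate": 0},
]
st = simple_stset(data, "endtime", "endstate")  # analogue of: stset endtime, failure(endstate==1)
```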


STATA: stset options

• id(varname)

• Specifies a case ID
• Tells Stata that multiple records belong to a particular case
• Ex: stset mardate, failure(mar==1) id(caseid)

• Necessary for datasets in which cases have multiple records:

ID   month   dosage   cured?
1    1       250      0
1    2       300      0
1    3       300      1
2    1       0        0
2    2       0        0
…


STATA: stset options

• origin(expression)

• Specifies “zero” time: when cases become “at risk”
• If origin is 1970, then 1971 = 1, 1972 = 2, etc.
• Also, origin can vary from case to case

– e.g., so all cases start at zero when a person gets married

• Ex: origin(time yearmarried) starts the clock at the time of marriage

• Ex: origin(dmarried == 1) starts the clock at the earliest time at which dmarried equals 1

– i.e., at the end of the first time span in which dmarried became 1


STATA: stset options

• Left censoring: the condition in which cases are unobserved PRIOR to a point in time
• enter(expression)

• Specifies when cases come under observation
– Even though they have been “at risk” since time = 0

• If entry time is not specified, Stata assumes that cases are under observation at all times in the dataset

• Ex: enter(time enteredstudy) brings each case under observation at the time given by the variable “enteredstudy”

• Ex: enter(underobs == 1) brings each case under observation at the first time at which the variable “underobs” equals 1

• NOTE: entry occurs at END time!!!
– So the case doesn’t really enter until the NEXT record in the data…
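A small Python sketch of `enter(underobs == 1)` for a single case illustrates the “entry occurs at END time” rule. It is a simplified imitation, not Stata's code. The record layout (dicts with `start`, `end`, and a flag) is hypothetical. The sketch also assumes that a case whose flag never equals 1 never enters.

```python
def apply_enter(records, flag):
    """Mimic enter(flag == 1) for one case's records, sorted by time.

    Entry time = END time of the first record where flag == 1. Spans ending
    at or before entry are excluded (_st = 0). Assumption of this sketch: a
    case whose flag never equals 1 never enters.
    Returns (start, end, _st) tuples.
    """
    entry = next((r["end"] for r in records if r[flag] == 1), None)
    out = []
    for r in records:
        if entry is None or r["end"] <= entry:
            out.append((r["start"], r["end"], 0))
        else:
            out.append((max(r["start"], entry), r["end"], 1))
    return out


case = [
    {"start": 1, "end": 2, "underobs": 0},
    {"start": 2, "end": 3, "underobs": 1},  # flag first equals 1 here...
    {"start": 3, "end": 4, "underobs": 1},  # ...so the case enters with this record
    {"start": 4, "end": 5, "underobs": 1},
]
spans = apply_enter(case, "underobs")
```

The second record, where `underobs` first equals 1, is itself excluded. The case contributes time only from the third record onward.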


STATA: stset options

• When do subjects leave the analysis?

• Stata defaults:
– When there is no more data
– When the case has an event (even if there is more data)

• exit(expression)
• Specifies when cases exit the analysis
• Ex: exit(lefthospital == 1) removes cases at the time that patients leave the hospital
• Ex: exit(time yrdivorced)

– Also: necessary for data with multiple events per case
• Overrides Stata defaults: keeps all cases in the analysis, even after the first event
• Ex: exit(time .)


STATA: stset options

• Issue: In multiple-record data, Stata assumes there are no gaps
• If a case’s records end at times 5, 10…

– Stata assumes that the first spell is 0 – 5 and the second is 5 – 10
– The second spell begins right when the first ends…

• But, sometimes you have gaps in your data
– Cases leave the analysis or are censored

• time0(varname)
• Ex: time0(starttime)
• Allows you to specify when cases enter the data (rather than relying on Stata’s assumptions)
• Allows you to indicate that cases entered LATER than the end of the last spell.
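The difference between the no-gaps default and time0() can be sketched in Python. The function and record layout are hypothetical. The sketch reproduces only the span boundaries, not everything stset does.

```python
def spans_for(records, use_time0=False):
    """Build (start, end) spans for one case's records, sorted by end time.

    Default: the no-gaps assumption. The first span starts at 0 and each
    later span starts where the previous one ended.
    use_time0=True: each span starts at the record's own 'start' value,
    as time0(start) would specify.
    """
    out, prev_end = [], 0
    for r in records:
        start = r["start"] if use_time0 else prev_end
        out.append((start, r["end"]))
        prev_end = r["end"]
    return out


# Hypothetical case: observed 0-5, unobserved 5-8, observed again 8-10
case = [{"start": 0, "end": 5}, {"start": 8, "end": 10}]
default_spans = spans_for(case)
time0_spans = spans_for(case, use_time0=True)
```

Under the no-gaps default, the case is counted as at risk for 10 years. With the start times supplied, the 5 to 8 gap drops out and exposure is 7 years.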


EHA Commands: stset

• It is easy to make mistakes with stset!

• Diagnostic strategies:
– 1. Look at the stset output

• Stata identifies possible errors
– It isn’t always right… but it is a start

– 2. Use the “stdes” command
• Generates summary statistics
• Shows total # of cases, spells, events
• Plus minimum / maximum entry & exit times

– Also: “stvary” command if you have multiple-record data

– 3. Examine Stata’s time variables…
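The Python sketch below computes totals similar to those a summary like stdes reports, using records that hold (id, _t0, _t, _d). The function and its output layout are our own illustration, not Stata's. The data are Example 1's case 1 in its three-record form plus case 2.

```python
def describe(records):
    """Summarize survival records given as (id, _t0, _t, _d) tuples.

    A rough imitation of the kind of totals stdes reports; the output
    layout here is our own, not Stata's.
    """
    return {
        "subjects": len({rid for rid, _, _, _ in records}),
        "records": len(records),
        "failures": sum(d for _, _, _, d in records),
        "time_at_risk": sum(t - t0 for _, t0, t, _ in records),
        "first_entry": min(t0 for _, t0, _, _ in records),
        "last_exit": max(t for _, _, t, _ in records),
    }


# Example 1, case 1 split into three records, plus case 2 (censored at 9 years)
records = [(1, 0, 2, 0), (1, 2, 3, 0), (1, 3, 5.6, 1), (2, 0, 9, 0)]
summary = describe(records)
```

Comparing these totals with what you expect from the raw data (here, 2 subjects, 1 event, and about 14.6 years at risk) is a quick way to catch a mis-specified stset.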


EHA Commands: stset

• stset creates a series of new variables that Stata uses in the analysis:

• _t0 and _t: define the start and end of a time span
• _d: defines whether an event occurs at the end of the span
• _st: determines whether the time span should be included in the analysis
– Versus being excluded because the case hadn’t entered yet, or had already exited…

– If you are having problems with stset, look at these variables to see what is going wrong…

• Ex: list yrmarried dmarried _t0 _t _d _st
• Or just look in the data editor/viewer.


Event History Example

• What factors affect how soon a country passes an environmental protection law?

• In this analysis it is possible to use time-varying data… aren’t you excited?

• What is the “state” space?
• “Law” vs. “no law”

• What is the “event”?
• Passing an environmental law in a given year


Event History Example

• What is the “risk set”?
• Every country that has not passed an environmental protection law

• What is the duration of interest?
• Time from countries becoming “at risk” to adoption of law

• What is an appropriate “time clock”?
• Option 1: The number of years between independence and when the law was written
• Option 2: Historical time, based on an origin time at which countries become “at risk”


Example: Environmental Laws

• Cross-national time series dataset of nearly 100 countries
• Event: when a country writes its first comprehensive environmental law (e.g., EPA)

• Data taken from various sources
• Independent variables: GDP, population, democracy, degradation, education, domestic and international NGOs

• Time duration: analyses are from 1970-1998
• In other words, countries enter the “risk set” in 1970, or when they become independent

• Total sample of 97 countries
• 73 countries have an event between 1970 and 1998.


Time-Varying Data Structure

• In the previous example, each row of data was a separate survey respondent

• Because survey respondents were not tracked over multiple years, this data was not “time-varying”

• In the current example, we have the advantage of time-varying data

• Each row of data is a country-year
• Our independent variables may change over time.


States, Spells, and Events

• Example (India):

[Figure: State (0 = no law, 1 = law) by Year, 1970 … 1998. Spell #1 runs in state 0 until the law is written; Spell #2 follows in state 1.]


States, Spells, and Events

• Example (Iran):

[Figure: State (0 = no law, 1 = law) by Year, 1970 … 1998. A single spell (Spell #1) in state 0; no law written as of 1998.]


Time-Varying Data Structure

• Example:

newname2   newid3   year   law   eventnum   start   end    ss   es   pop
INDIA      1119     1978   0     1          1978    1979   0    0    656941
INDIA      1119     1979   0     1          1979    1980   0    0    672021
INDIA      1119     1980   0     1          1980    1981   0    0    687332
INDIA      1119     1981   0     1          1981    1982   0    0    702821
INDIA      1119     1982   0     1          1982    1983   0    0    718426
INDIA      1119     1983   0     1          1983    1984   0    0    734072
INDIA      1119     1984   0     1          1984    1985   0    0    749677
INDIA      1119     1985   0     1          1985    1986   0    0    765147
INDIA      1119     1986   1     1          1986    1987   0    1    781893
INDIA      1119     1987   0     1          1987    1988   1    1    798680
INDIA      1119     1988   0     1          1988    1989   1    1    815590
INDIA      1119     1989   0     1          1989    1990   1    1    832535
INDIA      1119     1990   0     1          1990    1991   1    1    849515
INDIA      1119     1991   0     1          1991    1992   1    1    866530

Slide annotations: “Law written” (law), “Spell State” (ss, es), “Population” (pop).


Time-Varying Data Structure


• Stset command: stset end, failure(es==1) origin(1970) id(newid3)

Note: It is common to drop cases that are not at risk (e.g., if start state = 1). BUT, it is not necessary if stset is done correctly… Once id() identifies each country’s records, Stata drops records after the first event by default.
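A simplified Python sketch traces what this setup is meant to produce for India. The sketch makes three assumptions. Records are grouped by country (as `id()` would do), span starts are read from the `start` column, and excluded spans are marked with `None`. It imitates the logic described on the slide rather than reproducing Stata's code.

```python
def stset_case(rows, origin, exit_on_first_failure=True):
    """Simplified survival setup for one case's yearly records, sorted by year.

    rows: (year, start, end, es) tuples, where es == 1 marks an event at `end`.
    Analysis time is calendar time minus `origin`. After the first event,
    spans are excluded (_st = 0) unless exit_on_first_failure is False.
    """
    out, failed = [], False
    for year, start, end, es in rows:
        if failed and exit_on_first_failure:
            # excluded span; this sketch marks its time variables as None
            out.append({"year": year, "_t0": None, "_t": None, "_d": None, "_st": 0})
        else:
            out.append({"year": year, "_t0": start - origin, "_t": end - origin,
                        "_d": int(es == 1), "_st": 1})
        failed = failed or es == 1
    return out


# India rows from the table: es = 1 from 1986 onward (state stays 1 after the law)
india = [(y, y, y + 1, 1 if y >= 1986 else 0) for y in range(1978, 1992)]
result = stset_case(india, origin=1970)
used = [r for r in result if r["_st"] == 1]
```

Without the exclusion rule, the five post-1986 records (which also have es = 1) would each count as another failure.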


Time-Varying Data Structure

• What if countries pass multiple laws?
• Called “repeated events”
• 1. The start state could be reset to zero
• 2. We can override the Stata default of removing cases after the first event occurs: exit(time .)

newname2   newid3   year   law   eventnum   start   end    ss   es   pop
INDIA      1119     1978   0     1          1978    1979   0    0    656941
INDIA      1119     1979   0     1          1979    1980   0    0    672021
INDIA      1119     1980   0     1          1980    1981   0    0    687332
INDIA      1119     1981   0     1          1981    1982   0    0    702821
INDIA      1119     1982   0     1          1982    1983   0    0    718426
INDIA      1119     1983   0     1          1983    1984   0    0    734072
INDIA      1119     1984   0     1          1984    1985   0    0    749677
INDIA      1119     1985   0     1          1985    1986   0    0    765147
INDIA      1119     1986   1     1          1986    1987   0    1    781893
INDIA      1119     1987   0     1          1987    1988   0    0    798680
INDIA      1119     1988   0     1          1988    1989   0    0    815590
INDIA      1119     1989   0     1          1989    1990   0    0    832535
INDIA      1119     1990   0     1          1990    1991   0    1    849515
INDIA      1119     1991   0     1          1991    1992   0    0    866530
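A hypothetical helper, `spell_numbers` (not a Stata command), shows how one case's records can be grouped into successive spells once the start state is reset after each event. With `exit(time .)` keeping all records, both of India's events in this table would count.

```python
def spell_numbers(es_values):
    """Number the spells of one case with repeated events.

    Each record belongs to the current spell; after a record that ends in
    an event (es == 1), the next record starts a new spell (start state reset).
    """
    numbers, spell = [], 1
    for es in es_values:
        numbers.append(spell)
        if es == 1:
            spell += 1
    return numbers


# es column for India in the repeated-events table, 1978 through 1991
india_es = [0] * 8 + [1, 0, 0, 0, 1, 0]
spells = spell_numbers(india_es)
failures = sum(india_es)  # events counted when all records are kept
```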


Cumulative Survivor Function

[Figure: Kaplan-Meier survival estimate; y-axis 0.00 to 1.00, x-axis analysis time, 1970 to 2000.]
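The curve plotted here is the Kaplan-Meier estimate. At each event time t_j, it multiplies in the share of the n_j cases at risk that do not have one of the d_j events there: S(t) = ∏_{t_j ≤ t} (1 − d_j / n_j). A minimal Python sketch follows. To keep it small, it uses the five Example 1 students rather than the country data, which the slides do not reproduce.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survivor function estimate.

    times:  follow-up time for each case
    events: 1 if the case had the event at that time, 0 if censored
    Returns (t_j, S(t_j)) pairs at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve = 1.0, []
    i = 0
    while i < len(data):
        t = data[i][0]
        d = c = 0
        while i < len(data) and data[i][0] == t:  # group ties at time t
            if data[i][1]:
                d += 1
            else:
                c += 1
            i += 1
        if d:
            surv *= 1 - d / n_at_risk
            curve.append((t, surv))
        n_at_risk -= d + c  # events and censorings at t leave the risk set
    return curve


# The five Example 1 students: years in program and whether they finished
curve = kaplan_meier([5.6, 9.0, 6.8, 2.2, 7.2], [1, 0, 1, 0, 1])
```

The student who leaves at 2.2 years without finishing is censored. That student shrinks the risk set from 5 to 4 without lowering the curve, which is why the first drop is to 0.75 rather than 0.8.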


Cumulative Survivor Function by Region

[Figure: Kaplan-Meier survival estimates, by gregion (industrialized west, central and south america, asia, middle east, africa); y-axis 0.00 to 1.00, x-axis analysis time, 1970 to 2000.]


Cumulative Survivor Function: West vs. non-West

[Figure: Kaplan-Meier survival estimates, by west (west = 0 vs. west = 1); y-axis 0.00 to 1.00, x-axis analysis time, 1970 to 2000.]


Smoothed Hazard Function: West vs. non-West

[Figure: Smoothed hazard estimates, by west (west = 0 vs. west = 1); y-axis 0 to .15, x-axis analysis time, 1970 to 2000.]


Constant Rate Model: Example

• Simple one-variable model comparing west vs. non-west

Exponential regression -- log relative-hazard form

No. of subjects      = 97            Number of obs = 2047
No. of failures      = 81
Time at risk         = 2047          Wald chi2(1)  = 12.10
Log pseudolikelihood = 275.49924     Prob > chi2   = 0.0005

(Std. Err. adjusted for 97 clusters in newid3)
              |             Robust
_t            | Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
--------------+------------------------------------------------------------------
west          | .6931146    .1992638    3.48     0.001   .3025648    1.083664
_cons         | -3.34054    .0807514    -41.37   0.000   -3.49881    -3.18227
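In an exponential (constant-rate) model, the hazard h = exp(b0 + b1·x) does not change over time. The short Python sketch below turns the coefficients above into implied yearly hazards. The mean waiting time 1/h is a property of the exponential distribution and is applied here only as an illustration.

```python
import math

# Estimates from the exponential model above (log relative-hazard form)
b_cons = -3.34054
b_west = 0.6931146


def hazard(west):
    """Constant yearly hazard implied by the model: h = exp(b_cons + b_west * west)."""
    return math.exp(b_cons + b_west * west)


h_nonwest = hazard(0)
h_west = hazard(1)
hazard_ratio = h_west / h_nonwest   # equals exp(b_west)
mean_wait_nonwest = 1 / h_nonwest   # exponential model: mean time to event = 1 / h
mean_wait_west = 1 / h_west
```

exp(.6931146) is almost exactly 2. Under this model, Western countries therefore pass a first environmental law at roughly twice the yearly rate of non-Western countries. Equivalently, their implied mean waiting time is about half as long.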


Constant Rate Model: Example

• Model with time-varying covariates

No. of subjects      = 92            Number of obs = 1938
No. of failures      = 77
Time at risk         = 1938          Wald chi2(6)  = 94.29
Log pseudolikelihood = 282.11796     Prob > chi2   = 0.0000

(Std. Err. adjusted for 92 clusters in newid3)
              |             Robust
_t            | Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
--------------+------------------------------------------------------------------
gdp           | -.044568    .1842564    -0.24    0.809   -.4057039   .3165679
degradation   | -.4766958   .1044108    -4.57    0.000   -.6813372   -.2720543
education     | .0377531    .0130314    2.90     0.004   .0122121    .0632942
democracy     | .2295392    .0959669    2.39     0.017   .0414475    .417631
ngo           | .4258148    .1576803    2.70     0.007   .1167671    .7348624
ingo          | .3114173    .365112     0.85     0.394   -.4041891   1.027024
_cons         | -4.565513   1.864396    -2.45    0.014   -8.219663   -.9113642
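Coefficients in log relative-hazard form are easier to read as percent changes in the hazard, 100 × (exp(b) − 1), per one-unit increase in a covariate with the others held fixed. The Python sketch below uses the estimates above. The slides do not give the covariates' units, so “one unit” means whatever the dataset uses.

```python
import math

# Estimates from the time-varying covariate model above
coefs = {
    "gdp": -.044568,
    "degradation": -.4766958,
    "education": .0377531,
    "democracy": .2295392,
    "ngo": .4258148,
    "ingo": .3114173,
}

# Percent change in the hazard for a one-unit increase, other covariates held fixed
pct_change = {name: 100 * (math.exp(b) - 1) for name, b in coefs.items()}

for name, pct in pct_change.items():
    print(f"{name:12s} {pct:+6.1f}%")
```

For instance, a one-unit increase in degradation corresponds to a hazard roughly 38% lower, and a one-unit increase in ngo to a hazard roughly 53% higher.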


Constant Rate Model: Example

• What if we expect global civil society to have a particularly strong effect in the non-West?

• Option #1: Create an interaction term

No. of subjects      = 92            Number of obs = 1938
No. of failures      = 77
Time at risk         = 1938          Wald chi2(8)  = 91.25
Log pseudolikelihood = 282.5435      Prob > chi2   = 0.0000

(Std. Err. adjusted for 92 clusters in newid3)
              |             Robust
_t            | Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
--------------+------------------------------------------------------------------
gdp           | -.0789765   .2546507    -0.31    0.756   -.5780827   .4201298
degradation   | -.4656443   .1177774    -3.95    0.000   -.6964838   -.2348047
education     | .0425672    .0137641    3.09     0.002   .01559      .0695444
democracy     | .2277121    .0951693    2.39     0.017   .0411836    .4142406
ngo           | .4069064    .1595268    2.55     0.011   .0942397    .7195732
ingo          | -.1326514   .6842896    -0.19    0.846   -1.473834   1.208532
nonwest       | -3.345421   4.94285     -0.68    0.499   -13.03323   6.342387
ingoXnonwest  | .49408      .6819827    0.72     0.469   -.8425815   1.830741
_cons         | -1.28664    5.692187    -0.23    0.821   -12.44312   9.869841
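With the interaction, the ingo coefficient is the effect in the West (nonwest = 0). The effect in the non-West is the sum of the main effect and the interaction. The Python arithmetic below spells this out using the estimates above.

```python
import math

# Estimates from the interaction model above
b_ingo = -.1326514          # ingo effect when nonwest = 0 (the West)
b_ingo_x_nonwest = .49408   # additional ingo effect when nonwest = 1

effect_west = b_ingo
effect_nonwest = b_ingo + b_ingo_x_nonwest   # main effect + interaction

hr_west = math.exp(effect_west)        # hazard ratio per unit of ingo, West
hr_nonwest = math.exp(effect_nonwest)  # hazard ratio per unit of ingo, non-West
```

The implied non-Western effect is about .36 (a hazard ratio of roughly 1.4 per unit of ingo), versus about -.13 in the West. Neither term is significant on its own in this output. Judging the sum requires the covariance of the two estimates; in Stata, running `lincom ingo + ingoXnonwest` after the model would report the sum with a standard error.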


Constant Rate Model: Example

• What if we expect global civil society to have a particularly strong effect in the non-West?

• Option #2: Include only non-Western countries in the analysis

No. of subjects      = 76            Number of obs = 1720
No. of failures      = 61
Time at risk         = 1720          Wald chi2(6)  = 55.26
Log pseudolikelihood = 215.57325     Prob > chi2   = 0.0000

(Std. Err. adjusted for 76 clusters in newid3)
              |             Robust
_t            | Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
--------------+------------------------------------------------------------------
gdp           | .3521921    .3470927    1.01     0.310   -.3280971   1.032481
degradation   | -.7326479   .2566293    -2.85    0.004   -1.235632   -.2296637
education     | .0314009    .0193698    1.62     0.105   -.0065633   .069365
democracy     | .2387203    .0935281    2.55     0.011   .0554087    .422032
ngo           | .3604018    .1984957    1.82     0.069   -.0286426   .7494462
ingo          | .5447586    .4949746    1.10     0.271   -.4253738   1.514891
_cons         | -8.446306   3.872579    -2.18    0.029   -16.03642   -.8561915


Constant Rate Model: Example

• What if we expect global civil society to have a particularly strong effect in developing countries, but only in earlier periods? We can end the analysis in 1990

No. of subjects      = 71            Number of obs = 1348
No. of failures      = 21
Time at risk         = 1348          Wald chi2(6)  = 47.11
Log pseudolikelihood = 64.196123     Prob > chi2   = 0.0000

(Std. Err. adjusted for 71 clusters in newid3)
              |             Robust
_t            | Coef.       Std. Err.   z        P>|z|   [95% Conf. Interval]
--------------+------------------------------------------------------------------
gdp           | .6710042    .8928315    0.75     0.452   -1.078913   2.420922
degradation   | -.8862021   .6193312    -1.43    0.152   -2.100069   .3276647
education     | .0106282    .0399874    0.27     0.790   -.0677456   .0890021
democracy     | -.0359767   .2079641    -0.17    0.863   -.4435789   .3716255
ngo           | .2688239    .3306838    0.81     0.416   -.3793045   .9169522
ingo          | 1.933407    .5669864    3.41     0.001   .8221337    3.044679
_cons         | -19.4724    7.021114    -2.77    0.006   -33.23353   -5.711268