SC968: Panel Data Methods for Sociologists
Introduction to survival/event history models
Types of outcome
• Continuous: OLS linear regression
• Binary: binary regression (logistic or probit regression)
• Time to event data: survival or event history analysis
Examples of time to event data
• Time to death
• Time to incidence of disease
• Unemployed: time until finding a job
• Time to birth of first child
• Smokers: time until quitting smoking
Time to event data
• A finite set of discrete states
• Units (individuals, firms, households etc.), each in one state
• Transitions between states
• Time until a transition takes place
4 key concepts for survival analysis
• States
• Events
• Risk period
• Duration/time
States
• States are categories of the outcome variable of interest
• Each person occupies exactly one state at any moment in time
• Examples
  • alive, dead
  • single, married, divorced, widowed
  • never smoker, smoker, ex-smoker
• The set of possible states is called the state space
Events
• A transition from one state to another
• From an origin state to a destination state
• Possible events depend on the state space
• Examples
  • From smoker to ex-smoker
  • From married to widowed
• Not all transitions can be events
  • E.g. from smoker to never smoker
Risk period
• 2 states: A and B
• Event: transition from A → B
• To be able to undergo this transition, one must be in state A (someone already in state B cannot make the transition)
• Not all individuals will be in state A at any given time
• Example: one can only experience divorce if married
• The period of time during which someone is at risk of a particular event is called the risk period
• All subjects at risk of an event at a given point in time are called the risk set
Time
Various meanings...
• Calendar time
  • ...but onset of risk is usually not simultaneous for all units
  • Ex: by age 40, some individuals will have smoked for 20+ years, others for 1 year
• Duration = time since onset of risk
  • ...but intensity may not be the same
  • Ex: one smoker may smoke 5 cigarettes a day, another 20
• 1 unit of time is the same for all individuals
Duration
• Event history analysis is concerned with the duration of the non-occurrence of an event, i.e. the length of time spent in the risk period
• Examples
  • Duration of marriage
  • Length of life
• In practice we model the probability of a transition conditional on being in the risk set
Example data
    ID   Entry date   Died         End date
    1    01/01/1991                01/01/2008
    2    01/01/1991   01/01/2000   01/01/2000
    3    01/01/1995                01/01/2005
    4    01/01/1994   01/07/2004   01/07/2004
Calendar time
[Figure: subjects' follow-up plotted against calendar time, 1991 to 2009, with the point where study follow-up ended marked]
Censoring
• Ideally: observe each individual from the onset of risk until the event has occurred
  • ...very demanding in terms of data collection (ex: risk of death starts when one is born)
• Usually: incomplete data → censoring
• An observation is censored if it has incomplete information
• Types of censoring
  • Right censoring
  • Left censoring
Censoring
• Right censoring: the person did not experience the event during the time that they were studied
  • Common reasons for right censoring
    • the study ends
    • the person drops out of the study
  • We do not know when the person experiences the event, but we do know that it is later than a given time T
• Left censoring: the person became at risk before we started observing her
  • We do not know when the person entered the risk set → EHA cannot deal with this
  • We know when the person entered the risk set → condition on the person having survived long enough to enter the study
• Censoring must be independent of the survival process!
Study time in years
[Figure: the subjects' spells plotted against study time, 0 to 18 years; the spells are labelled censored, event, censored, event]
Why a special set of methods?
• Duration is a continuous variable, so why not OLS?
• Censoring
  • If censored cases are excluded: higher probability of throwing out longer durations
  • If censored cases are treated as complete: mis-measurement of duration
• Non-normality of residuals
• Time-varying covariates
• Interested in the probability of a transition at any given time rather than in the length of complete spells
• Need to take into account simultaneously:
  • Whether the event has taken place or not
  • The length of the period at risk before the event occurred
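A small numeric sketch of the censoring problem. The durations and the 10-year follow-up window are made-up illustrative values, not course data; the point is only that both naive fixes understate the mean duration.

```python
# Hypothetical true spell lengths in years (illustrative values only).
true_durations = [2, 5, 9, 14, 20]
follow_up = 10  # assumed length of the observation window

# What a study would record: (observed time, event observed?).
# Spells longer than the window are right-censored at the window length.
observed = [(min(d, follow_up), d <= follow_up) for d in true_durations]


def mean(values):
    return sum(values) / len(values)


true_mean = mean(true_durations)
# Naive fix 1: drop censored spells (this discards the longest durations).
mean_excluding = mean([t for t, event in observed if event])
# Naive fix 2: treat the censoring time as if it were the event time.
mean_as_complete = mean([t for t, _ in observed])

print(true_mean, round(mean_excluding, 2), mean_as_complete)
```

Here the true mean is 10 years, while excluding censored spells gives about 5.3 and treating them as complete gives 7.2.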
Survival function
• T = length of time (duration) before an event occurs (length of 'spell')
• Probability density function (pdf), f(t):
$$f(t) = \lim_{\Delta t \to 0} \frac{\Pr(t \le T \le t + \Delta t)}{\Delta t} = \frac{dF(t)}{dt}$$
• Cumulative distribution function (cdf), F(t):
$$F(t) = \Pr(T \le t) = \int_0^t f(u)\,du$$
• Survival function:
$$S(t) = 1 - F(t)$$
Hazard rate
• h(t) = f(t) / S(t)
• The exact definition and interpretation of h(t) differs according to whether
  • duration is continuous
  • duration is discrete
• Conditional on having survived up to t, what is the probability of leaving between t and t+Δt?
• It is a measure of risk intensity
• h(t) >= 0
• In principle h(t) is a rate, not a probability
• There is a 1-1 relationship between h(t), f(t), F(t) and S(t)
• EHA: h(t) = g(t, Xs), where g is a parametric or semi-parametric specification
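The 1-1 relationship can be written out explicitly for continuous durations. This is the standard derivation, using only the definitions of f, F and S given earlier and assuming S(0) = 1; the symbol H(t) for the cumulative hazard is notation introduced here.

```latex
\begin{aligned}
h(t) &= \frac{f(t)}{S(t)} = \frac{-\,dS(t)/dt}{S(t)} = -\frac{d}{dt}\ln S(t), \\
H(t) &= \int_0^t h(u)\,du = -\ln S(t), \\
S(t) &= \exp\{-H(t)\}, \qquad F(t) = 1 - S(t), \qquad f(t) = h(t)\,S(t).
\end{aligned}
```

So knowing any one of the four functions is enough to recover the other three.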
Data
• Survival or event history data are characterised by 2 variables
  • Time or duration of risk period
  • Failure (event)
    • 1 if not survived or event observed
    • 0 if censored or event not yet occurred
• Data structure differs according to whether
  • Duration is discrete
  • Duration is continuous
• Assume: 2 states; 1 transition; no repeated events
Data structure-Discrete time
Wide format (one row per person):
    ID   Entry        End date     Event        X at t0   X at t1   ....
    1    01/01/1991   01/01/2008   01/01/2002
    2    01/01/1991   01/01/2008
Long format (one row per person-period):
    ID    Date         Duration (t)   Event   X
    1     01/01/1991   1              0
    1     01/01/1992   2              0
    ...   .....        ....           .....
    1     01/01/2002   11             1
    2     01/01/1991   1              0
    ...   ....         ....           ....
    2     01/01/2008   17             0
Data structure-Discrete time
• Each row is an individual-period
• An individual has as many rows as the number of periods he is observed to be at risk
• No longer at risk when
  • Experienced event
  • No longer under observation (censored)
• For each period (row) there is a value of the explanatory variable X → very easy to incorporate time-varying covariates
• Stata: reshape long
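The person-period layout can also be sketched outside Stata. The Python below is only an illustration with two made-up people (not BHPS records): each person gets one row per period at risk, and the event indicator is 1 only in the final row of a spell that ended in an event. The function name `to_person_period` is hypothetical.

```python
def to_person_period(people):
    """Expand one record per person into one row per period at risk.

    `people` is a list of (id, periods_observed, had_event) tuples.
    """
    rows = []
    for pid, n_periods, had_event in people:
        for t in range(1, n_periods + 1):
            # The event can only be recorded in the last observed period.
            event = 1 if (had_event and t == n_periods) else 0
            rows.append({"id": pid, "t": t, "event": event})
    return rows


# Person 1 has the event in period 3; person 2 is censored after 2 periods.
rows = to_person_period([(1, 3, True), (2, 2, False)])
for row in rows:
    print(row)
```

A time-varying covariate would simply be one more key in each row, holding its value for that period.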
Data structure-continuous time
    ID   Entry        Died         End date     Duration   Event   X
    1    01/01/1991                01/01/2008   17.0       0       0
    2    01/01/1991   01/01/2002   01/01/2002   11.0       1       0
    3    01/01/1995                01/01/2000   5.0        0       0
    3    01/01/2000   01/01/2005   01/01/2005   5.0        1       1
Data structure-continuous time
• Each row is a person
• Indicator for observed events / censored cases
• Calculate duration = exit date - entry date
• Exit date =
  • Failure date
  • Censoring date
• If there are time-varying covariates
  • Split the period an individual is under observation each time a time-varying X changes
  • If there are many Xs that change often: multiple rows per person
Worked example
• Random 20% sample from BHPS Waves 1-15
• One record per person/wave
• Outcome: duration of cohabitation
• Conditions on cohabiting in first wave
• Survival time: years from entry to the study in 1991 till year living without a partner
The data
    +----------------------------+
    |      pid   wave     mastat |
    |----------------------------|
    | 10081798      1    married |
    | 10081798      2    married |
    | 10081798      3    married |
    | 10081798      4    married |
    | 10081798      5    married |
    | 10081798      6    married |
    | 10081798      7    widowed |
    | 10081798      8    widowed |
    | 10081798      9    widowed |
    | 10081798     10    widowed |
    | 10081798     11    widowed |
    | 10081798     12    widowed |
    | 10081798     13    widowed |
    | 10081798     14    widowed |
    | 10081798     15    widowed |
    |----------------------------|
• Duration = 6 years
• Event = 1
• Ignore data after event = 1
The data (continued)
    +----------------------------+
    |      pid   wave     mastat |
    |----------------------------|
    | 10162747      1   living a |
    | 10162747      2   living a |
    | 10162747      3   living a |
    | 10162747      4   living a |
    | 10162747      5   living a |
    | 10162747      6   living a |
    | 10162747     10   separate |
    | 10162747     11          . |
    | 10162747     12          . |
    | 10162747     13          . |
    | 10162747     14   never ma |
    | 10162747     15   never ma |
    +----------------------------+
Note missing waves before event
Preparing the data
Select records for respondents who were cohabiting in 1991:
    . sort pid wave
    . generate skey=1 if wave==1&(mastat==1|mastat==2)
    . by pid: replace skey=skey[_n-1] if wave~=1
    . keep if skey==1
    . drop skey
Declare that you want to set the data to survival time:
    . stset wave,id(pid) failure(mastat==3/6)

                    id:  pid
         failure event:  mastat == 3 4 5 6
    obs. time interval:  (wave[_n-1], wave]
     exit on or before:  failure

    ------------------------------------------------------------------------------
        15058  total obs.
         1628  obs. begin on or after (first) failure
    ------------------------------------------------------------------------------
        13430  obs. remaining, representing
         1357  subjects
          270  failures in single failure-per-subject data
        13612  total analysis time at risk, at risk from t =         0
                                 earliest observed entry t =         0
                                      last observed exit t =        15
Important to check that you have set data as intended
Checking the data setup
    . list pid wave mastat _st _d _t _t0 if pid==10081798,sepby(pid) noobs

    +-------------------------------------------------+
    |      pid   wave     mastat   _st   _d   _t  _t0 |
    |-------------------------------------------------|
    | 10081798      1    married     1    0    1    0 |
    | 10081798      2    married     1    0    2    1 |
    | 10081798      3    married     1    0    3    2 |
    | 10081798      4    married     1    0    4    3 |
    | 10081798      5    married     1    0    5    4 |
    | 10081798      6    married     1    0    6    5 |
    | 10081798      7    widowed     1    1    7    6 |
    | 10081798      8    widowed     0    .    .    . |
    | 10081798      9    widowed     0    .    .    . |
    | 10081798     10    widowed     0    .    .    . |
    | 10081798     11    widowed     0    .    .    . |
    | 10081798     12    widowed     0    .    .    . |
    | 10081798     13    widowed     0    .    .    . |
    | 10081798     14    widowed     0    .    .    . |
    | 10081798     15    widowed     0    .    .    . |
    +-------------------------------------------------+
• _st: 1 if observation is to be used and 0 otherwise
• _d: 1 if event, 0 if censoring or event not yet occurred
• _t: time of exit
• _t0: time of entry
Checking the data setup
    . list pid wave mastat _st _d _t _t0 if pid==10162747,sepby(pid) noobs

    +-------------------------------------------------+
    |      pid   wave     mastat   _st   _d   _t  _t0 |
    |-------------------------------------------------|
    | 10162747      1   living a     1    0    1    0 |
    | 10162747      2   living a     1    0    2    1 |
    | 10162747      3   living a     1    0    3    2 |
    | 10162747      4   living a     1    0    4    3 |
    | 10162747      5   living a     1    0    5    4 |
    | 10162747      6   living a     1    0    6    5 |
    | 10162747     10   separate     1    1   10    6 |
    | 10162747     11          .     0    .    .    . |
    | 10162747     12          .     0    .    .    . |
    | 10162747     13          .     0    .    .    . |
    | 10162747     14   never ma     0    .    .    . |
    | 10162747     15   never ma     0    .    .    . |
    +-------------------------------------------------+
How do we know when this person separated?
Trying again!
    . fillin pid wave
    . stset wave,id(pid) failure(mastat==3/6) exit(mastat==3/6 .)

                    id:  pid
         failure event:  mastat == 3 4 5 6
    obs. time interval:  (wave[_n-1], wave]
     exit on or before:  mastat==3 4 5 6 .

    ------------------------------------------------------------------------------
        20355  total obs.
         7524  obs. begin on or after exit
    ------------------------------------------------------------------------------
        12831  obs. remaining, representing
         1357  subjects
          234  failures in single failure-per-subject data
        12831  total analysis time at risk, at risk from t =         0
                                 earliest observed entry t =         0
                                      last observed exit t =        15
Checking the new data setup
    . list pid wave mastat _st _d _t _t0 if pid==10162747,sepby(pid) noobs

    +-------------------------------------------------+
    |      pid   wave     mastat   _st   _d   _t  _t0 |
    |-------------------------------------------------|
    | 10162747      1   living a     1    0    1    0 |
    | 10162747      2   living a     1    0    2    1 |
    | 10162747      3   living a     1    0    3    2 |
    | 10162747      4   living a     1    0    4    3 |
    | 10162747      5   living a     1    0    5    4 |
    | 10162747      6   living a     1    0    6    5 |
    | 10162747      7          .     1    0    7    6 |
    | 10162747      8          .     0    .    .    . |
    | 10162747      9          .     0    .    .    . |
    | 10162747     10   separate     0    .    .    . |
    | 10162747     11          .     0    .    .    . |
    | 10162747     12          .     0    .    .    . |
    | 10162747     13          .     0    .    .    . |
    | 10162747     14   never ma     0    .    .    . |
    | 10162747     15   never ma     0    .    .    . |
    +-------------------------------------------------+
Now censored instead of an event
Summarising time to event data
• Individuals are followed up for different lengths of time
• So can't use prevalence rates (% of people who have an event)
• Instead use rates that take account of person-years at risk
  • Incidence rate per year
  • Death rate per 1000 person-years
Summarising time to event data
    . stsum

             failure _d:  mastat == 3 4 5 6
       analysis time _t:  wave
      exit on or before:  mastat==3 4 5 6 .
                     id:  pid

             |               incidence       no. of    |------ Survival time -----|
             | time at risk     rate        subjects        25%       50%       75%
    ---------+---------------------------------------------------------------------
       total |        12831   .0182371          1357          .         .         .
Annotations on the output:
• Number of observations
• Person-years
• Rate per year
• <25% of sample had event by 15 elapsed years
stvary: checks whether a variable varies within individuals and over time
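The incidence rate reported by `stsum` is simply events divided by person-time at risk. A quick check in Python, using the 234 failures and 12,831 person-years reported by `stset` and `stsum` in this example:

```python
failures = 234         # failures reported by stset
person_years = 12831   # total analysis time at risk

incidence_rate = failures / person_years
print(f"{incidence_rate:.7f} per person-year")
print(f"{1000 * incidence_rate:.1f} per 1000 person-years")
```

The first figure matches the `.0182371` in the output; the second is the same rate rescaled to 1000 person-years.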
![Page 34: SC968: Panel Data Methods for Sociologists](https://reader035.vdocument.in/reader035/viewer/2022062520/56816304550346895dd37f90/html5/thumbnails/34.jpg)
Descriptive analysis
To recap…. pdf= probability that a spell has a length of
exactly Tf(t)= lim Pr(t<=T<=t+Δt) = δF(t) δt
Δt0 Δt cdf=probability that a spell has a length<=T F(t)= Pr( T<=t) =∫f(t) dt Survival function S(t)=1-F(t)
Kaplan-Meier estimates of survival time
• The Kaplan-Meier estimate is the cumulative probability of an individual surviving to any time t
• Analysis can be made by subgroup
• Nonparametric method
• First period: $S_1 = 1 - d_1/n_1$, where $d_1/n_1$ is the exit rate
• After t periods: $S_t = (1 - d_1/n_1)(1 - d_2/n_2)\cdots(1 - d_t/n_t)$
• The survival function is estimated only at times where you observe exits!
• Last t that can be estimated: the highest non-censored time observed
Survival/ failure function
Describing the survival/ failure function
    . sts list, failure

             failure _d:  mastat == 3 4 5 6
       analysis time _t:  wave
      exit on or before:  mastat==3 4 5 6 .
                     id:  pid

               Beg.          Net       Failure      Std.
      Time    Total   Fail   Lost      Function     Error     [95% Conf. Int.]
    -------------------------------------------------------------------------------
         2     1357     29    162        0.0214    0.0039     0.0149    0.0306
         3     1166     33     89        0.0491    0.0061     0.0384    0.0625
         4     1044     16     64        0.0636    0.0070     0.0513    0.0789
         5      964     35     58        0.0976    0.0088     0.0818    0.1164
         6      871     12     34        0.1101    0.0094     0.0931    0.1300
         7      825     20     24        0.1316    0.0103     0.1128    0.1534
         8      781     14     17        0.1472    0.0109     0.1271    0.1701
         9      750     12     30        0.1609    0.0115     0.1398    0.1848
        10      708     15     23        0.1786    0.0121     0.1563    0.2038
        11      670      9     32        0.1897    0.0125     0.1666    0.2155
        12      629      8     16        0.2000    0.0128     0.1762    0.2266
        13      605     13     24        0.2172    0.0134     0.1922    0.2449
        14      568      8     24        0.2282    0.0138     0.2025    0.2566
        15      536     10    526        0.2426    0.0143     0.2160    0.2719
    -------------------------------------------------------------------------------
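To see where the failure function comes from, the Kaplan-Meier product formula can be applied by hand to the first rows of the `sts list` output. A minimal Python sketch (the variable names are ours; the numbers at risk and the failures are copied from the table):

```python
# (time, number at risk at start of period, failures) from sts list
rows = [(2, 1357, 29), (3, 1166, 33), (4, 1044, 16), (5, 964, 35)]

survival = 1.0
failure_by_time = {}
for time, at_risk, failed in rows:
    # Multiply in the conditional probability of surviving this period.
    survival *= 1 - failed / at_risk
    # The failure function is one minus the survival function.
    failure_by_time[time] = round(1 - survival, 4)

print(failure_by_time)
```

The values agree with the Failure Function column for times 2 to 5. Note that censored cases ("Net Lost") never enter the product directly; they only reduce the number at risk in later periods.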
Kaplan-Meier graphs
• Can read off the estimated probability of surviving a relationship at any time point on the graph
  • E.g. at 5 years 88% are still cohabiting
• The survival probability only changes when an event occurs → the graph is not smooth but an (irregular) step function
• sts graph, survival
[Figure: Kaplan-Meier survival estimate; y-axis 0.00 to 1.00, x-axis analysis time 0 to 15]
[Figure: Kaplan-Meier survival estimate; y-axis 0.00 to 1.00, x-axis time in years 0 to 15]
Comparing survival by group using Kaplan-Meier graphs
[Figure: Kaplan-Meier survival estimates for sex = male and sex = female; y-axis 0.00 to 1.00, x-axis analysis time 0 to 15]
Testing equality of survival curves among groups
The log-rank test
• A non-parametric test that assesses the null hypothesis that there are no differences in survival times between groups
Log-rank test example
    . sts test sex, logrank

             failure _d:  mastat == 3 4 5 6
       analysis time _t:  wave
      exit on or before:  mastat==3 4 5 6 .
                     id:  pid

    Log-rank test for equality of survivor functions

           |   Events         Events
    sex    |  observed       expected
    -------+-------------------------
    male   |        98         113.59
    female |       136         120.41
    -------+-------------------------
    Total  |       234         234.00

                 chi2(1) =       4.25
                 Pr>chi2 =     0.0392
Significant difference between men and women
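The reported p-value can be reproduced from the test statistic. For a chi-squared variable with 1 degree of freedom the upper tail probability equals erfc(sqrt(x/2)), so Python's standard library is enough. The sketch starts from the rounded statistic 4.25 in the output, so the result agrees with Stata only to about three decimal places.

```python
import math

chi2 = 4.25  # log-rank statistic from the sts test output (1 d.f.)

# Upper tail of chi-squared(1): P(Z**2 > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))
print(round(p_value, 3))
```

Since the p-value is below 0.05, the null hypothesis of equal survivor functions for men and women is rejected at the 5% level.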
More elaborate models…
• Modelling the hazard rate, not survival time directly
• h(t) = rate of transitioning at time t, having survived up to t
• Time:
  • Continuous, parametric
    • Exponential
    • Weibull
    • Log-logistic
  • Continuous, semi-parametric
    • Cox
  • Discrete
    • Logistic
    • Complementary log-log
Some hazard shapes
• Increasing: onset of Alzheimer's
• Decreasing: survival after surgery
• U-shaped: age-specific mortality
• Constant: time till next email arrives
Proportional-hazards (PH) models
• h(t) is separable into h0(t) and the effects of Xs
• h0(t) = 'baseline' hazard that depends on t but not on individual characteristics
• h(t) = h0(t)exp(βX)
• Absolute differences in X → proportional differences in h(t), i.e. a scaling of h0(t)
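Why "proportional": under this specification the ratio of the hazards of two individuals with covariate values X_a and X_b (labels introduced here for illustration) does not depend on t, because the baseline hazard cancels.

```latex
\frac{h(t \mid X_a)}{h(t \mid X_b)}
  = \frac{h_0(t)\exp(\beta X_a)}{h_0(t)\exp(\beta X_b)}
  = \exp\{\beta\,(X_a - X_b)\}
```

For a one-unit difference in X the ratio is exp(β), which is the hazard ratio.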
The Cox regression model
Cox regression model
• Regression model for survival analysis
• Can model time-invariant and time-varying explanatory variables
• Produces estimated hazard ratios (sometimes called rate ratios or risk ratios)
• Regression coefficients are on a log scale
  • Exponentiate to get the hazard ratio
  • Similar to odds ratios from logistic models
Cox regression equation (i)
$$h_i(t) = h_0(t)\exp(\beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_n x_{in})$$
• $h_0(t)$ is the baseline hazard function and can take any form. It is estimated from the data (non-parametric)
• $h_i(t)$ is the hazard function for individual i
• $x_{i1}, x_{i2}, \dots, x_{in}$ are the covariates
• $\beta_1, \beta_2, \dots, \beta_n$ are the regression coefficients estimated from the data
• PH assumption needed
• Estimate the βs without estimating h0(t) → semi-parametric model
Cox regression equation (ii)
• If we divide both sides of the equation on the previous slide by h0(t) and take logarithms, we obtain:
$$\ln\left(\frac{h_i(t)}{h_0(t)}\right) = \beta_1 x_{i1} + \beta_2 x_{i2} + \dots + \beta_n x_{in}$$
• We call $h_i(t) / h_0(t)$ the hazard ratio
• The coefficients $\beta_1, \dots, \beta_n$ are estimated by Cox regression, and can be interpreted in a similar manner to those of multiple logistic regression
• $\exp(\beta_j)$ is the instantaneous relative risk of an event for a one-unit increase in the j-th covariate
Cox regression in Stata
• We will first model the effect of a time-invariant covariate (sex) on the risk of partnership ending
• Then we will add a time-dependent covariate (age) to the model
Cox regression in Stata
    . stcox female

             failure _d:  mastat == 3 4 5 6
       analysis time _t:  wave
      exit on or before:  mastat==3 4 5 6 .
                     id:  pid

    Cox regression -- Breslow method for ties

    No. of subjects =         1357                     Number of obs   =     12337
    No. of failures =          234
    Time at risk    =        12337
                                                       LR chi2(1)      =      4.18
    Log likelihood  =   -1574.5782                     Prob > chi2     =    0.0409

    ------------------------------------------------------------------------------
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          female |    1.30913   .1734699     2.03   0.042     1.009699    1.697358
    ------------------------------------------------------------------------------
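The columns of the `stcox` table are linked by simple arithmetic on the log scale. The sketch below (plain Python, standard library only) starts from the reported hazard ratio and its standard error and recovers the z statistic and the 95% confidence interval. It assumes that the reported standard error is the delta-method value on the ratio scale, HR × se(log HR); the recovered figures matching the table is consistent with that assumption.

```python
import math
from statistics import NormalDist

hazard_ratio = 1.30913       # Haz. Ratio for female
se_hazard_ratio = 0.1734699  # Std. Err. as reported (ratio scale)

log_hr = math.log(hazard_ratio)             # the Cox coefficient
se_log_hr = se_hazard_ratio / hazard_ratio  # back to the coefficient scale

z = log_hr / se_log_hr
crit = NormalDist().inv_cdf(0.975)  # about 1.96
ci_low = math.exp(log_hr - crit * se_log_hr)
ci_high = math.exp(log_hr + crit * se_log_hr)

print(round(z, 2), round(ci_low, 3), round(ci_high, 3))
```

The interval is built on the log scale and then exponentiated, which is why it is not symmetric around 1.31.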
Interpreting output from Cox regression
• The Cox model has no intercept
  • It is included in the baseline hazard
• In our example, the baseline hazard is when sex=1 (male)
• The hazard ratio is the ratio of the hazard for a unit change in the covariate
  • HR = 1.3 for women vs. men
  • The risk of partnership breakdown is increased by 30% for women compared with men
• Hazard ratio assumed constant over time
  • At any time point, the hazard of partnership breakdown for a woman is 1.3 times the hazard for a man
Interpreting output from Cox regression (ii)
• The hazard ratio is equivalent to the odds that a female has a partnership breakdown before a man
• The probability of having a partnership breakdown first is (hazard ratio) / (1 + hazard ratio)
• So in our example, a HR of 1.30 corresponds to a probability of 0.57 that a woman will experience a partnership breakdown first
• The probability or risk of partnership breakdown can be different each year but the relative risk is constant
• So if we know that the probability of a man having a partnership breakdown in the following year is 1.5%, then the probability of a woman having a partnership breakdown in the following year is 0.015*1.30 = 1.95%
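The two calculations on this slide, written out in Python with the rounded hazard ratio of 1.30. The 1.5% yearly probability for men is the slide's own illustrative figure; multiplying a probability by a hazard ratio is an approximation that is reasonable only when the probability is small.

```python
hazard_ratio = 1.30

# Probability that the woman's partnership breaks down first
p_woman_first = hazard_ratio / (1 + hazard_ratio)

# Approximate yearly probability for women, given 1.5% for men
p_men = 0.015
p_women = p_men * hazard_ratio

print(round(p_woman_first, 2), round(p_women, 4))
```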
[Figure: Estimated cumulative hazard: men vs. women; y-axis 0 to .25, x-axis _t 0 to 15; separate lines for sex = women and sex = men]
[Figure: Cox proportional hazards regression: hazard function varying over time; y-axis smoothed hazard function .012 to .02, x-axis analysis time 4 to 12]
Time dependent covariates
• Examples
  • Current age group rather than age at baseline
  • GHQ score may change over time and predict break-ups
• Will use age to predict duration of cohabitation
  • Nonlinear relationship hypothesised
  • Recode age into 8 equally spaced age groups
Cox regression with time dependent covariates
    . xi: stcox female i.agecat
    i.agecat          _Iagecat_0-7        (naturally coded; _Iagecat_0 omitted)

             failure _d:  mastat == 3 4 5 6
       analysis time _t:  wave
      exit on or before:  mastat==3 4 5 6 .
                     id:  pid

    Cox regression -- Breslow method for ties

    No. of subjects =         1357                     Number of obs   =     12337
    No. of failures =          234
    Time at risk    =        12337
                                                       LR chi2(8)      =     78.44
    Log likelihood  =   -1537.4472                     Prob > chi2     =    0.0000

    ------------------------------------------------------------------------------
              _t | Haz. Ratio   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          female |     1.3705   .1842481     2.34   0.019      1.05304    1.783666
      _Iagecat_1 |   .5838602   .1883578    -1.67   0.095     .3102449    1.098786
      _Iagecat_2 |    .311325   .1039311    -3.50   0.000     .1618279    .5989281
      _Iagecat_3 |   .2136714   .0737986    -4.47   0.000     .1085813    .4204725
      _Iagecat_4 |   .2225187   .0811395    -4.12   0.000     .1088888    .4547261
      _Iagecat_5 |   .4770023   .1691695    -2.09   0.037      .238035    .9558732
      _Iagecat_6 |   1.203702   .4306775     0.52   0.604     .5969856    2.427023
      _Iagecat_7 |   1.644141   .9677715     0.84   0.398      .518688     5.21161
    ------------------------------------------------------------------------------
Cox regression assumptions
• Assumption of proportional hazards
• No censoring patterns
• True starting time
• Plus assumptions for all modelling
  • Sufficient sample size, proper model specification, independent observations, exogenous covariates, no high multicollinearity, random sampling, and so on
Proportional hazards assumption
• Cox regression with time-invariant covariates assumes that the ratio of hazards for any two observations is the same across time periods
• This can be a false assumption, for example when using age at baseline as a covariate
• If a covariate fails this assumption
  • for hazard ratios that increase over time for that covariate, relative risk is overestimated
  • for ratios that decrease over time, relative risk is underestimated
  • standard errors are incorrect and significance tests are decreased in power
Testing the proportional hazards assumption
• Graphical methods
  • Comparison of Kaplan-Meier observed & predicted curves by group. Observed lines should be close to predicted
  • Survival probability plots (cumulative survival against time for each group). Lines should not cross
  • Log minus log plots (minus log cumulative hazard against log survival time). Lines should be parallel
Testing the proportional hazards assumption
Formal tests of proportional hazard assumption
Include an interaction between the covariate and a function of time. Log time often used but could be any function. If significant then assumption violated
Test the proportional hazards assumption using partial residuals, known as Schoenfeld residuals.
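Both formal tests can be sketched in Stata as follows. The example again assumes Stata's shipped `cancer` dataset instead of the course data, with `treated` as a hypothetical indicator, and interacting `age` with log time is only one possible choice of time function.

```stata
* Assumption: Stata's shipped example data, not the data used on the slides
sysuse cancer, clear
stset studytime, failure(died)
generate byte treated = drug > 1

* Fit the Cox model, then test PH using the scaled Schoenfeld residuals
stcox treated age
estat phtest, detail

* Alternative: interact age with log analysis time.
* A significant coefficient in the tvc equation suggests non-proportionality
stcox treated age, tvc(age) texp(ln(_t))
```

`estat phtest, detail` reports a test for each covariate and a global test; small p-values are evidence against proportional hazards.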
![Page 62: SC968: Panel Data Methods for Sociologists](https://reader035.vdocument.in/reader035/viewer/2022062520/56816304550346895dd37f90/html5/thumbnails/62.jpg)
When assumptions are not met
If categorical covariate, include the variable as a strata variable
Allows underlying hazard function to differ between categories and be non proportional
Estimates separate underlying baseline hazard for each stratum
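A stratified Cox model might look like this in Stata. The sketch assumes Stata's shipped `cancer` dataset, not the course data, and treats the three-category `drug` variable as the covariate that fails the assumption purely for illustration.

```stata
* Assumption: Stata's shipped example data, not the data used on the slides
sysuse cancer, clear
stset studytime, failure(died)

* drug is stratified rather than entered as a covariate:
* each drug group gets its own baseline hazard
stcox age, strata(drug)
```

The trade-off is that no hazard ratio is estimated for the stratifying variable itself.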
![Page 63: SC968: Panel Data Methods for Sociologists](https://reader035.vdocument.in/reader035/viewer/2022062520/56816304550346895dd37f90/html5/thumbnails/63.jpg)
When assumptions are not met
If a continuous covariate
• Consider splitting the follow-up time. For example, hazard may be proportional within first 5 years, next 5-10 years and so on
• Could covariate be included as time dependent covariate?
• There are different survival regression methods (e.g. parametric models) that do not assume PH
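A rough Stata sketch of two of these options. It assumes Stata's shipped `cancer` dataset rather than the course data; the cut points at 10 and 20 months and the variable names `id` and `period` are hypothetical choices made only for illustration. The lognormal model is used because it is fitted in the accelerated failure-time metric; some other parametric models, such as the Weibull, still assume proportional hazards.

```stata
* Assumption: Stata's shipped example data, not the data used on the slides
sysuse cancer, clear
generate long id = _n
stset studytime, failure(died) id(id)

* Split each record at months 10 and 20; period takes the values 0, 10 and 20
stsplit period, at(10 20)

* Let the effect of age differ between the three periods
stcox c.age#i.period

* A parametric alternative that does not assume proportional hazards
streg age, distribution(lognormal)
```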
![Page 64: SC968: Panel Data Methods for Sociologists](https://reader035.vdocument.in/reader035/viewer/2022062520/56816304550346895dd37f90/html5/thumbnails/64.jpg)
Censoring assumptions
Censored cases must be independent of the survival distribution. There should be no pattern to these cases, which instead should be missing at random.
If censoring is not independent, then censoring is said to be informative
You have to judge this for yourself
• Usually don’t have any data that can be used to test the assumption
• Think carefully about start and end dates
• Always check a sample of records
![Page 65: SC968: Panel Data Methods for Sociologists](https://reader035.vdocument.in/reader035/viewer/2022062520/56816304550346895dd37f90/html5/thumbnails/65.jpg)
True starting time
The ideal model for survival analysis would be where there is a true zero time
If the zero point is arbitrary or ambiguous, the data series will be different depending on starting point. The computed hazard rate coefficients could differ, sometimes markedly
Conduct a sensitivity analysis to see how coefficients may change according to different starting points
![Page 66: SC968: Panel Data Methods for Sociologists](https://reader035.vdocument.in/reader035/viewer/2022062520/56816304550346895dd37f90/html5/thumbnails/66.jpg)
Other extensions to survival analysis
Discrete (interval-censored) survival times
Repeated events
Multi-state models (more than 1 event type) - competing risks
• Transition from employment to unemployment or leaving labour market
• Modelling type of exit from cohabiting relationship - separation/divorce/widowhood
Frailty (unobserved heterogeneity)
![Page 67: SC968: Panel Data Methods for Sociologists](https://reader035.vdocument.in/reader035/viewer/2022062520/56816304550346895dd37f90/html5/thumbnails/67.jpg)
Could you use logistic regression instead?
May produce similar results for short or fixed follow-up periods
Examples
• everyone followed-up for 7 years
• maximum follow-up 5 years
Results may differ if there are varying follow-up times
If dates of entry and dates of events are available then better to use Cox regression
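To compare the two approaches on the same data, something like the following could be run. The sketch assumes Stata's shipped `cancer` dataset rather than the course data, and `treated` is a hypothetical indicator created for illustration.

```stata
* Assumption: Stata's shipped example data, not the data used on the slides
sysuse cancer, clear
stset studytime, failure(died)
generate byte treated = drug > 1

* Logistic regression: models whether death occurred, ignoring when
logistic died treated age

* Cox regression: uses the time to death and allows for censoring
stcox treated age
```

The first model reports odds ratios and the second hazard ratios, so the estimates are not the same quantity; what is worth comparing is their direction and relative size.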
![Page 68: SC968: Panel Data Methods for Sociologists](https://reader035.vdocument.in/reader035/viewer/2022062520/56816304550346895dd37f90/html5/thumbnails/68.jpg)
Finally….
This is just an introduction to survival/event history analysis
Only reviewed the Cox regression model
• Also parametric survival methods
• But Cox regression likely to suit type of analyses of interest to sociologists
Consider an intensive course if you want to use survival analysis in your own work
![Page 69: SC968: Panel Data Methods for Sociologists](https://reader035.vdocument.in/reader035/viewer/2022062520/56816304550346895dd37f90/html5/thumbnails/69.jpg)
Some Resources
Stephen Jenkins’s course on survival analysis: https://www.iser.essex.ac.uk/files/teaching/stephenj/ec968/pdfs/ec968lnotesv6.pdf
Allison, Paul D. 1984. Event History Analysis: Regression for Longitudinal Event Data. Sage
Cleves, M., W. Gould, and R. Gutierrez. 2004. An Introduction to Survival Analysis Using Stata. Rev. ed. Stata Press: College Station, Texas