analysis of complex survey data
![Page 1: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/1.jpg)
Analysis of Complex Survey Data
Day 4: Survival analysis and Cox proportional hazards models
![Page 2: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/2.jpg)
Nonparametric Survival Analysis
• Kaplan-Meier Method (also called Product-Limit Method)
• Life Table Method (also called Actuarial Method)
![Page 3: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/3.jpg)
Nonparametric Survival Analysis
A statistical method to study time to an event
1) Divide the risk period into many small time intervals
2) Treat each interval as a small cohort analysis
3) Combine the results for the intervals
![Page 4: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/4.jpg)
Basic Concepts of Survival Analysis
1. Censoring
2. Time to an event
3. Survival Function
![Page 5: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/5.jpg)
Censoring
• Subjects had not experienced the event (outcome) by the end of the study, or they withdrew from the study (lost to follow-up or died from other diseases).
• Survival analysis assumes that censoring due to loss to follow-up (LTF) and competing causes is random (independent of exposure and outcome)
• Survival analysis is most useful with longitudinal complex surveys (e.g., PSID, AddHealth)
• It can also be used in cross-sectional studies that collect retrospective age-of-onset information.
![Page 6: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/6.jpg)
Censoring
Example:
Cohort size at start: 1,000, followed for 1 year
Number with disease: 28
Number LTF: 15
If we assume all dropped out on the 1st day of the study, rate of disease per year:
$$\frac{28}{1{,}000 - 15} = \frac{28}{985} = .0284$$
If we assume all dropped out on the last day of the study, probability of disease:
$$\frac{28}{1{,}000} = .0280$$
If the drop-out rate is constant over the period, the best estimate of when subjects dropped out is the midpoint; the probability of disease is then:
$$\frac{28}{1{,}000 - 7.5} = \frac{28}{992.5} = .0282$$
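The three estimates in this example differ only in how much of the year the 15 lost subjects are assumed to have been followed. A minimal Python sketch of that arithmetic, using the slide's numbers (the function and argument names are illustrative, not part of the course materials):

```python
def disease_probability(cases, cohort_size, lost, fraction_at_risk):
    """Cases divided by the cohort size after discounting lost subjects.

    fraction_at_risk is the share of the period each lost subject is
    assumed to have been followed:
      0.0 = dropped out on the first day,
      1.0 = dropped out on the last day,
      0.5 = dropped out at the midpoint.
    """
    effective_n = cohort_size - lost * (1.0 - fraction_at_risk)
    return cases / effective_n


first_day = disease_probability(28, 1000, 15, 0.0)  # 28 / 985
last_day = disease_probability(28, 1000, 15, 1.0)   # 28 / 1000
midpoint = disease_probability(28, 1000, 15, 0.5)   # 28 / 992.5

for label, value in [("first day", first_day),
                     ("last day", last_day),
                     ("midpoint", midpoint)]:
    print(f"{label}: {value:.4f}")
```

The midpoint case is the same half-interval assumption that the life table method applies to censored subjects.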
![Page 7: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/7.jpg)
Survival Function
The probability of surviving beyond a specific time [i.e., S(t) = 1 – F(t)]
F(t) = cumulative probability distribution for endpoint (e.g., death)
![Page 8: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/8.jpg)
Probability of survival at each new time period = probability of surviving that period, conditional on "surviving" to the start of that interval
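[Figure: tree diagram with successive survival states S1, S2, S3, S4, conditional survival probabilities labelled n, o, p, q, and a failure branch F at each step]
Probability of survival to S4 = n * o * p * q
Failures (F) = deaths or cases or losses to follow up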
![Page 9: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/9.jpg)
Life Table Method
• Time is partitioned into a fixed sequence of intervals (not necessarily of equal lengths)
A classical method of estimating the survival function in epidemiology and actuarial science
Useful for large samples
Interval lengths are arbitrary: the larger the interval, the larger the bias
![Page 10: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/10.jpg)
The LIFETEST Procedure
Stratum 1: platelet = 0
Life Table Survival Estimates

| Interval [Lower, Upper) | Number Failed | Number Censored | Effective Sample Size | Conditional Probability of Failure | Conditional Probability Standard Error | Survival | Failure |
| --- | --- | --- | --- | --- | --- | --- | --- |
| [0, 10) | 4 | 0 | 9.0 | 0.4444 | 0.1656 | 1.0000 | 0 |
| [10, 20) | 2 | 1 | 4.5 | 0.4444 | 0.2342 | 0.5556 | 0.4444 |
| [20, 30) | 0 | 0 | 2.0 | 0 | 0 | 0.3086 | 0.6914 |
| [30, 40) | 1 | 0 | 2.0 | 0.5000 | 0.3536 | 0.3086 | 0.6914 |
| [40, 50) | 0 | 0 | 1.0 | 0 | 0 | 0.1543 | 0.8457 |
| [50, 60) | 1 | 0 | 1.0 | 1.0000 | 0 | 0.1543 | 0.8457 |

Effective sample size: whenever there is censoring (withdrawal or loss), we assume that, on average, those individuals who became lost or withdrawn during the interval were at risk for half the interval.
Thus, effective sample size (n*) = n – ½ (number censored)
E.g., effective sample size (1st interval) = 9 – ½ (0) = 9
E.g., effective sample size (2nd interval) = 5 – ½ (1) = 4.5
![Page 11: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/11.jpg)
Conditional Probability of Failure: P(F) = Number failed / Effective Sample Size
e.g., P(F) (1st interval) = 4/9 = .44
e.g., P(F) (2nd interval) = 2/4.5 = .44
Survival probability (in each interval) = 1 - failure probability (in each interval)
Cumulative Survival: S(t) = S(t-1) * [1 - P(F) in interval t], with S(0) = 1
e.g., S(1) = 1 * (1-.4444) = 1 * .5556 = .5556
e.g., S(2) = S(1) * (1-.4444) = .5556 * .5556 = .3086
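The life table columns can be reproduced from the failed and censored counts alone. The sketch below is a plain-Python illustration of the actuarial calculation described on these slides, not the SAS implementation. The starting count of 9 is taken from the slide's effective-sample-size example, and the function and key names are illustrative.

```python
# (lower, upper, number failed, number censored), from the LIFETEST output
intervals = [
    (0, 10, 4, 0),
    (10, 20, 2, 1),
    (20, 30, 0, 0),
    (30, 40, 1, 0),
    (40, 50, 0, 0),
    (50, 60, 1, 0),
]


def life_table(intervals, n_start):
    """Actuarial life table: censored subjects count for half an interval."""
    rows = []
    at_risk = n_start
    survival = 1.0  # cumulative survival at the start of the interval
    for lower, upper, failed, censored in intervals:
        n_eff = at_risk - 0.5 * censored
        p_fail = failed / n_eff if n_eff > 0 else 0.0
        rows.append({
            "interval": (lower, upper),
            "n_eff": n_eff,
            "p_fail": p_fail,
            "survival_at_start": survival,
        })
        survival *= 1.0 - p_fail          # carry survival into next interval
        at_risk -= failed + censored      # subjects entering next interval
    return rows


rows = life_table(intervals, n_start=9)
for r in rows:
    print(r["interval"], r["n_eff"],
          round(r["p_fail"], 4), round(r["survival_at_start"], 4))
```

As in the SAS output, the survival column here is the cumulative survival at the start of each interval, which is why the first row shows 1.0000.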
![Page 12: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/12.jpg)
Kaplan-Meier (Product-limit) Method
• Time is partitioned into variable intervals
Whenever a case arises, set up a time interval. Use the actual censored and event times
If the longest follow-up times are censored (censored times > last event time), the KM curve never reaches zero and the average duration will be underestimated by the KM method
![Page 13: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/13.jpg)
Kaplan-Meier Method
[Figure: follow-up timelines for Patients 1 to 6 against months since enrollment (axis marks at 4, 10, 14, 24); four patients died and two were lost to follow-up]
![Page 14: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/14.jpg)
Kaplan-Meier Method

| (1) Times to death from starting treatment (months) | (2) Number alive at each time | (3) Number who died at each time | (4) HAZARD: proportion who died at that time, (3)/(2) | (5) Proportion who survived at that time, 1.00 - (4) | (6) Cumulative survival |
| --- | --- | --- | --- | --- | --- |
| 4 | 6 | 1 | .167 | .833 | .833 |
| 10 | 4 | 1 | .250 | .750 | .625 |
| 14 | 3 | 1 | .333 | .667 | .417 |
| 24 | 1 | 1 | 1.00 | .000 | .000 |

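The product-limit calculation in this table can be sketched in a few lines of Python. The four death times (4, 10, 14, 24 months) come from the slide. The two censoring times (7 and 20 months) are assumptions, chosen only so that the numbers at risk match column (2); the slides do not state when the two patients were lost to follow-up.

```python
def kaplan_meier(times, events):
    """Product-limit estimate.

    times  : follow-up time for each subject
    events : 1 if the subject died at that time, 0 if censored
    Returns one row per distinct death time:
    (time, number at risk, deaths, hazard, cumulative survival).
    """
    death_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival = 1.0
    table = []
    for t in death_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        hazard = deaths / at_risk
        survival *= 1.0 - hazard
        table.append((t, at_risk, deaths, hazard, survival))
    return table


# Deaths at 4, 10, 14, 24 months (from the slide).
# Censoring at 7 and 20 months is ASSUMED for illustration.
times = [4, 7, 10, 14, 20, 24]
events = [1, 0, 1, 1, 0, 1]

km_table = kaplan_meier(times, events)
for t, at_risk, deaths, hazard, survival in km_table:
    print(t, at_risk, deaths, round(hazard, 3), round(survival, 3))
```

Any censoring times between 4 and 10 months and between 14 and 24 months would give the same table, because censored subjects only affect the number at risk at later death times.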
![Page 15: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/15.jpg)
Kaplan-Meier Plot (N=6)
[Figure: step plot of % surviving (0 to 100) against months after enrollment (0, 4, 10, 14, 24); the curve steps down to .833, .625, .417 and .0]
![Page 16: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/16.jpg)
Kaplan-Meier Curve (N = 5,398)
[Figure: proportion of claimants (0.0 to 1.0) against days to settlement (0 to 1,400), censored at Nov 1, 1997; separate curves for Tort, No Fault 1 and No Fault 2]
"Effect of eliminating compensation for pain and suffering on the outcome of insurance claims for whiplash injury" Cassidy JD et al., N Engl J Med 2000;342:1179-1186
![Page 17: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/17.jpg)
Median Survival Time
[Figure: Kaplan-Meier curves of proportion of claimants (0.0 to 1.0) against days to settlement (0 to 1,400), censored at Nov 1, 1997, for Tort, No Fault 1 and No Fault 2]
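The slide does not spell out how the median is read off the curve. Under the usual definition (an assumption here), the median survival time is the earliest time at which the estimated cumulative survival is 0.5 or lower. A small Python sketch, reusing the cumulative survival values from the six-patient Kaplan-Meier table in these slides rather than the claims data:

```python
def median_survival(km_steps):
    """Earliest time at which cumulative survival is <= 0.5.

    km_steps: list of (time, cumulative_survival), sorted by time.
    Returns None if the curve never drops to 0.5 (for example, when
    heavy censoring leaves the curve above one half).
    """
    for time, survival in km_steps:
        if survival <= 0.5:
            return time
    return None


# (months, cumulative survival) from the N=6 Kaplan-Meier example
km_steps = [(4, 0.833), (10, 0.625), (14, 0.417), (24, 0.0)]
median = median_survival(km_steps)
print(median)
```

On a plot this corresponds to drawing a horizontal line at 0.5 and reading the time where it first meets the step curve.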
![Page 18: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/18.jpg)
Semi-Parametric Methods
• No need to choose a particular probability distribution to represent survival time
• Can incorporate time-dependent covariates
Example: exposure increases over time as with drug dosage or with workers in hazardous occupations
![Page 19: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/19.jpg)
Cox Proportional Hazards Model
• 1. Proportional Hazards Model
Basic model of the hazard for individual i at time t:
$$h_i(t) = h_0(t)\,\exp\{\beta_1 x_{i1} + \cdots + \beta_k x_{ik}\}$$
Here $h_0(t)$ is the baseline hazard function (non-negative), and the exponent is a linear function of fixed covariates.
Take the logarithm of both sides:
$$\log h_i(t) = \log h_0(t) + \beta_1 x_{i1} + \cdots + \beta_k x_{ik}$$
No need to specify the functional form of the baseline hazard function.
![Page 20: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/20.jpg)
Cox Proportional Hazards Model
• 1. Proportional Hazards Model
Consider the hazard ratio of two individuals i and j:
$$h_i(t) = h_0(t)\,\exp\{\beta_1 x_{i1} + \cdots + \beta_k x_{ik}\}$$
$$h_j(t) = h_0(t)\,\exp\{\beta_1 x_{j1} + \cdots + \beta_k x_{jk}\}$$
$$\text{Hazard ratio} = \frac{h_i(t)}{h_j(t)} = \exp\{\beta_1 (x_{i1} - x_{j1}) + \cdots + \beta_k (x_{ik} - x_{jk})\}$$
Hazard functions are multiplicatively related; the hazard ratio is constant over survival time.
Hazards of any two individuals are proportional.
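A short numerical check of the proportionality property: whatever the baseline hazard is, it cancels when the hazards of two individuals are divided. In the Python sketch below the coefficients, covariate values and baseline function are all hypothetical, chosen only for illustration.

```python
import math


def hazard(t, x, beta, baseline_hazard):
    """Hazard for covariate vector x: baseline(t) * exp(beta . x)."""
    linear_predictor = sum(b * v for b, v in zip(beta, x))
    return baseline_hazard(t) * math.exp(linear_predictor)


def hazard_ratio(x_i, x_j, beta):
    """exp{ sum_k beta_k (x_ik - x_jk) }; no baseline hazard needed."""
    return math.exp(sum(b * (vi - vj) for b, vi, vj in zip(beta, x_i, x_j)))


def baseline(t):
    # Hypothetical baseline hazard; any positive function of t would do.
    return 0.01 * t


beta = [0.7, 0.05]   # hypothetical coefficients (exposure, age)
x_i = [1, 60]        # hypothetical individual i
x_j = [0, 50]        # hypothetical individual j

ratios = [hazard(t, x_i, beta, baseline) / hazard(t, x_j, beta, baseline)
          for t in (1, 5, 20)]
print(ratios, hazard_ratio(x_i, x_j, beta))
```

With these made-up values the ratio is exp(0.7 + 0.05 × 10) = exp(1.2) at every time point, which is what "constant over survival time" means.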
![Page 21: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/21.jpg)
Cox Proportional Hazards Model
• 2. Partial Likelihood Estimation
Estimate the β coefficients of the Cox model without having to specify the baseline hazard function h0(t)
Partial likelihood depends only on the order in which events occur, not on the exact times of occurrence.
Partial likelihood estimates are not fully efficient because of loss of information about exact times of event occurrence
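To illustrate that the partial likelihood depends only on the ordering of events, the Python sketch below evaluates a log partial likelihood for a single covariate and shows that squaring every follow-up time (which preserves the order) leaves it unchanged. It assumes no tied event times; the data, the covariate values and the trial value of β are hypothetical, and this is not the SUDAAN or SAS estimation routine.

```python
import math


def log_partial_likelihood(beta, times, events, x):
    """Cox log partial likelihood, one covariate, no tied event times.

    For each observed event, compare the subject's risk score with the
    sum of risk scores over everyone still at risk at that time.
    """
    total = 0.0
    for i, (t_i, d_i) in enumerate(zip(times, events)):
        if d_i == 1:
            risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
            denom = sum(math.exp(beta * x[j]) for j in risk_set)
            total += beta * x[i] - math.log(denom)
    return total


# Hypothetical data: follow-up months, event indicator, binary covariate
times = [4, 7, 10, 14, 20, 24]
events = [1, 0, 1, 1, 0, 1]
x = [1, 0, 1, 0, 1, 0]

ll_original = log_partial_likelihood(0.5, times, events, x)
# Squaring positive times changes the values but not their order
ll_rescaled = log_partial_likelihood(0.5, [t ** 2 for t in times], events, x)
print(ll_original, ll_rescaled)
```

In practice the estimate of β is the value that maximizes this function; the sketch only evaluates it at one trial value.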
![Page 22: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/22.jpg)
Interpretation of Coefficients
• No intercept: h0(t) is an arbitrary function of time and cancels out of the estimating equations
• e^β: hazard ratio
• Indicator variables (coded as 0 and 1): e^β is the ratio of the estimated hazard for those with a value of 1 in X to the estimated hazard for those with a value of 0 in X (controlling for other covariates)
• Quantitative (continuous) variables: 100(e^β - 1) is the estimated percent change in the hazard for each one-unit increase in X. For example, for the variable AGE, e^β = 1.5 yields 100(1.5 - 1) = 50: for each one-year increase in the age at diagnosis, the hazard of death goes up by an estimated 50 percent, controlling for other covariates.
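The conversion from a coefficient to a percent change in the hazard is a one-line calculation. A minimal Python sketch using the slide's AGE example, where the coefficient is backed out from e^β = 1.5 (the function name is illustrative):

```python
import math


def percent_change_in_hazard(beta):
    """Estimated percent change in the hazard per one-unit increase in X."""
    return 100.0 * (math.exp(beta) - 1.0)


beta_age = math.log(1.5)          # coefficient implied by e^beta = 1.5
pct = percent_change_in_hazard(beta_age)
print(round(pct, 1))
```

A negative coefficient gives a negative percent change, that is, a lower hazard for each one-unit increase in X.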
![Page 23: Analysis of Complex Survey Data](https://reader036.vdocument.in/reader036/viewer/2022062520/56815ee7550346895dcd93d7/html5/thumbnails/23.jpg)
Lab 4: estimating survival curves and Cox models in SUDAAN