Survival/Failure Analysis (AKA Event History Analysis)
T & F Chapter 11
Data Example 1. A medical doctor wished to compare the efficacy of two drugs for treating a sometimes fatal illness. Two groups of patients with the disease were identified. One group was given Drug A. The other group was given Drug B.
The age of the patient at the time of drug administration was recorded.
The patients were then monitored by a special team of patient observers.
The age of the patient at time of death was recorded and the survival duration from the time the patient began taking the drug computed. Several of the patients lived for many years.
The study was terminated when the last patient in the two groups died – more than 60 years after the beginning of the study. (The original researcher died while waiting for the last patient to die. The original researcher’s grandchildren were available to continue the analyses.)
The grandchildren used the Mann-Whitney U-test to compare survival times between the two groups. (The U-test was used because survival times are notoriously positively skewed.)
This is the appropriate way to compare the efficacy of the two drugs.
Problems:
1) The long time it will take to observe the survival times of all patients.
2) What to do about persons who get “lost”, that is, persons with whom contact was lost. These patients give incomplete data. Should they be ignored and treated as missing values?
Because of Problem 1 above, we typically do NOT wait until every participant in our research has died before analyzing.
Instead we define a Window of Observation, and observe participants only while that window is open.
Survival Analysis – 1 Printed on 10/26/2016
Window of Observation
The problem is that we don’t have an infinite period of time to wait until everyone quits or dies. So what do we do about the persons who are still alive when the window of observation is closed?
Plus, it may be the case that we lose contact with people so for some people we won’t know how long they survived regardless of the length of the window of observation. All we know about them is that they were alive until a specific time. We don’t know whether they’re still alive or not after that time.
The window of observation is the specific time period in which participant survival is recorded.
At some time, we begin recording whether or not each person is surviving. At some later time, we quit monitoring each patient.
Because the window is of finite duration, this necessarily results in incomplete information on some participants.
Of particular importance is the fact that some will still be alive/working when we quit observing.
This means that we won’t have accurate survival times for some people.
Medical literature
Two treatments for a disease are given. We attempt to record
1) whether or not each patient died (the dichotomous outcome), and
2) how long each patient survived until death (the continuous outcome).
Group A given Drug A.
Group B given Drug B.
Turnover literature
Persons are hired by an organization into two different buildings. We attempt to record
1) whether or not each employee quits or retires, and
2) how long each employee is employed before leaving the organization.
Building A: Kill and debone chickens
Building B: Cook the chicken carcasses
Overview of Types of cases in survival analysis
-------|------------------------------------------------------|--------
Ideal Cases – each starting time and ending time is known.
Right Censored Cases: Cases whose ending times (time of termination/death) are unknown. These are the most common problem cases.
The above cases are still employed/surviving at the time monitoring ends.
The above case is lost to follow-up (quit answering phone, left state, etc.)
Left Censored Cases: Cases whose starting times are unknown.
We will not include such cases in the analyses conducted here.
Cases whose starting times and ending times are both unknown: Fagettaboutit. These are not analyzable.
[Figure: timeline of the case types above. The horizontal axis is Time; the left marker is where monitoring of cases begins (the window opens) and the right marker is where monitoring of cases ends (the window closes).]
Incorrect Analysis 1: Use death/quit rates as a proxy for survival
Assuming that persons with long survival times will be less likely to die within the window of observation, we could use death or quit rates as an indicator of survival time.
We could use logistic regression to assess the relation of death or quit rates to independent variables.
(Use linear regression in a pinch praying that the God of statistics won’t strike you down).
Problem – it’s possible to create situations in which most people would agree that the distributions of survival times are different even though proportions of outcomes are identical.
Consider the following . . . Assume we’re dealing with employment.
In the figures, each arrow represents duration of employment for a person. The horizontal axis is time. The vertical line at the left represents the time at which the window of observation opened. The vertical line at the right represents the time at which the window closed. The arrowhead (->) represents death/termination.
Group A – Termination Rate = 100%
Group B – Termination Rate = 100%
Clearly, Group A has longer average employment times, but both have the exact same proportion of turnovers – 100% in this example.
So, comparison of death/quit rates may certainly give us an inaccurate picture of the differences between the groups.
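The point can be checked with a few lines of Python. This is only a sketch: the durations and the 48-month window are made-up numbers, not data from any study discussed here, and `termination_rate` is a helper name invented for the illustration.

```python
from statistics import mean

# Hypothetical employment durations (months) for two groups.
# Every person terminates inside the window, so no case is censored.
group_a = [30, 34, 36, 40, 44]
group_b = [4, 6, 8, 10, 12]
window_length = 48  # hypothetical window of observation, in months


def termination_rate(durations, window):
    """Proportion of cases whose termination falls inside the window."""
    return sum(d <= window for d in durations) / len(durations)


rate_a = termination_rate(group_a, window_length)
rate_b = termination_rate(group_b, window_length)
mean_a = mean(group_a)
mean_b = mean(group_b)

print(f"Group A: termination rate = {rate_a:.0%}, mean duration = {mean_a}")
print(f"Group B: termination rate = {rate_b:.0%}, mean duration = {mean_b}")
```

Both groups show a 100% termination rate, so an analysis of rates alone cannot see that Group A stayed far longer.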
Incorrect Analysis 2 – Analyze only the durations within the window of observation. Ignore the deaths/turnovers.
Group A. Average Survival time =
Group B. Average Survival time =
In the example above, the two groups have equal (ultimate) death rates but different survival times just within the window of observation. In Group A, all subjects had “time to die.” In Group B, subjects were still living when the window of observation closed. In this case, analysis of survival times within the finite window of observation will give an incorrect picture of the lack of difference between the groups.
Each type of incomplete analysis ignores the other aspect of the complete dependent variable. We need a method of analysis that takes into account both aspects.
Survival analysis is an analytic technique that combines both aspects.
Comparisons of different groups include . . .
Comparison of proportion dying / leaving
Comparison of time surviving / staying.
Survival Analysis (also called Event History Analysis)
An analytic technique that models both survival times and proportions of deaths / quits.
3 separate techniques available in SPSS – Life Table, Kaplan-Meier, Cox Regression
Key concept common to all techniques
Survival function – most important one of all of them
A plot of proportion surviving from time 0 up to a given time vs. time
A cumulative plot.
Generally decreasing curve, since proportion surviving can only remain constant or decrease across time.
Separate curves for separate groups
The curve represents both aspects of survival.
1) The height of the curve at a point represents the proportion surviving up to that time.
2) The curve also represents duration of stay/life (how far the curve has progressed to the right from t=0). The distance along the X-axis to the point where the curve reaches a given survival rate is the typical survival time at that rate; at a 50% survival rate it is the median survival time.
So, the survival curve is a two-dimensional representation of the two aspects of survival – survival rates and length of life/employment.
[Figure: example survival functions. Horizontal axis: Time (0 to 80). Vertical axis: Proportion Surviving (0% to 100%). Annotations: at time 30, 63% have survived on one curve and 54% on the other; at a 60% survival rate, average length of time was 17 on one curve and 85 on the other.]
Comparing groups.
The vertical axis represents proportion of survivals or turnovers.
Within a vertical slice at any point, turnover rates up to a particular time can be compared.
In the following, we see that at time t, Group A had a higher survival rate than Group B.
Comparing Average survival times between two groups.
Within a horizontal slice at any point, average survival times can be compared.
In the following, we see that for a 70% survival rate, average survival time was longer for Group A than it was for Group B.
When comparing groups we will usually compare the whole curve for each group. The group whose curve is generally above the others is the group with best survival.
[Figures: survival curves for Groups A and B plotted against Time, one with a vertical slice at time t and one with a horizontal slice at 70% survival.]
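The two kinds of slices can be sketched in Python. The curves below are hypothetical step functions invented for the illustration, and `proportion_surviving` and `time_to_reach` are made-up helper names, not functions from SPSS or any library.

```python
# Each curve is a step function: a list of (time, proportion surviving)
# pairs sorted by time. All values are hypothetical.
curve_a = [(0, 1.00), (10, 0.95), (20, 0.85), (40, 0.75), (60, 0.65), (80, 0.55)]
curve_b = [(0, 1.00), (10, 0.85), (20, 0.70), (40, 0.55), (60, 0.45), (80, 0.35)]


def proportion_surviving(curve, t):
    """Vertical slice: height of the step curve at time t."""
    height = 1.0
    for time, prop in curve:
        if time <= t:
            height = prop
        else:
            break
    return height


def time_to_reach(curve, level):
    """Horizontal slice: first time the curve is at or below `level`.

    Returns None if the curve never gets that low.
    """
    for time, prop in curve:
        if prop <= level:
            return time
    return None


slice_a = proportion_surviving(curve_a, 30)  # Group A at time 30
slice_b = proportion_surviving(curve_b, 30)  # Group B at time 30
time_a = time_to_reach(curve_a, 0.70)        # Group A at the 70% level
time_b = time_to_reach(curve_b, 0.70)        # Group B at the 70% level

print(f"At time 30: A = {slice_a:.0%}, B = {slice_b:.0%}")
print(f"Time to reach 70% survival: A = {time_a}, B = {time_b}")
```

With these made-up curves, Group A is higher in the vertical slice and further to the right in the horizontal slice, which is what "the curve that is generally above" means in practice.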
Three general types of Survival Analysis
1. Life Tables analysis.
The window of observation is cut up into n equal-length intervals.
Proportions of persons surviving/dying within each interval are computed.
This is the original method.
Useful for analysis of one group or for comparison of a few groups defined by levels of a single categorical factor.
Can’t incorporate quantitative predictors.
Can’t incorporate more than 2 qualitative predictors in SPSS.
Cannot analyze interactions of 2 or more predictors.
2. Kaplan-Meier analysis.
Event-based. Rather than defining intervals based on time, intervals are defined based on occurrence of death/termination. Each death/termination marks the end of one interval and the beginning of a subsequent interval.
Can’t incorporate quantitative predictors.
Can’t incorporate more than 2 qualitative predictors in SPSS.
Cannot analyze interactions of 2 or more predictors.
Survival function graphs printed by SPSS’s K-M procedure show censored cases, a plus.
3. Cox Proportional Hazards Regression (Cox Regression)
A very general procedure.
Based on a specific mathematical model of survival developed by Cox.
Estimates hazard probabilities for the whole sample.
Then estimates ratios of hazards to this overall hazard function for groups/persons with different values of the IVs.
As implemented in SPSS, the output and analyses look a lot like logistic regression.
Can incorporate quantitative predictors.
Can incorporate multiple qualitative and quantitative factors.
Can incorporate interactions.
Based on Tabachnick Table 11.1, p. 515
Analyzed using SPSS Life Tables
Suppose the efficacy of Drug 0 is being compared with that of Drug 1. Each was formulated to prolong life of patients with a usually terminal form of cancer. Seven patients were given Drug 0 and five were given Drug 1. Patients were observed for up to 12 months. After 12 months, the window of observation closed and the results were entered into SPSS. Note that this problem is analogous to a turnover problem in organizational research with two groups of employees treated differently.
The SPSS syntax to invoke the analysis.
SAVE OUTFILE='G:\MdbT\P595\P595AL07-Survival analysis\TAndFDancingData.sav'
  /COMPRESSED.
SURVIVAL TABLE=months BY drug(0 1)
  /INTERVAL=THRU 12 BY 1
  /STATUS=outcome(1)
  /PRINT=TABLE
  /PLOTS (SURVIVAL)=months BY drug.
Like ANOVA
Like multiple t-tests
Survival Analysis [DataSet0] G:\MdbT\P595\P595AL07-Survival analysis\TAndFDancingData.sav
Survival Variable: months
Life Table
Column key: Drug = First-order Controls (drug); Start = Interval Start Time; Enter = Number Entering Interval; Wdrw = Number Withdrawing during Interval; Expos = Number Exposed to Risk; Events = Number of Terminal Events; PTerm = Proportion Terminating; PSurv = Proportion Surviving; CumS = Cumulative Proportion Surviving at End of Interval; SE(CumS) = Std. Error of Cumulative Proportion Surviving at End of Interval; Dens = Probability Density; SE(Dens) = Std. Error of Probability Density; Haz = Hazard Rate; SE(Haz) = Std. Error of Hazard Rate.
Drug  Start  Enter  Wdrw  Expos  Events  PTerm  PSurv  CumS  SE(CumS)  Dens  SE(Dens)  Haz   SE(Haz)
0     0      7      0     7.000  0       .00    1.00   1.00  .00       .000  .000      .00   .00
0     1      7      0     7.000  1       .14    .86    .86   .13       .143  .132      .15   .15
0     2      6      0     6.000  2       .33    .67    .57   .19       .286  .171      .40   .28
0     3      4      0     4.000  1       .25    .75    .43   .19       .143  .132      .29   .28
0     4      3      0     3.000  1       .33    .67    .29   .17       .143  .132      .40   .39
0     5      2      0     2.000  1       .50    .50    .14   .13       .143  .132      .67   .63
0     6      1      0     1.000  0       .00    1.00   .14   .13       .000  .000      .00   .00
0     7      1      0     1.000  0       .00    1.00   .14   .13       .000  .000      .00   .00
0     8      1      0     1.000  0       .00    1.00   .14   .13       .000  .000      .00   .00
0     9      1      0     1.000  0       .00    1.00   .14   .13       .000  .000      .00   .00
0     10     1      0     1.000  0       .00    1.00   .14   .13       .000  .000      .00   .00
0     11     1      0     1.000  1       1.00   .00    .00   .00       .143  .132      2.00  .00
1     0      5      0     5.000  0       .00    1.00   1.00  .00       .000  .000      .00   .00
1     1      5      0     5.000  0       .00    1.00   1.00  .00       .000  .000      .00   .00
1     2      5      0     5.000  0       .00    1.00   1.00  .00       .000  .000      .00   .00
1     3      5      0     5.000  0       .00    1.00   1.00  .00       .000  .000      .00   .00
1     4      5      0     5.000  0       .00    1.00   1.00  .00       .000  .000      .00   .00
1     5      5      0     5.000  0       .00    1.00   1.00  .00       .000  .000      .00   .00
1     6      5      0     5.000  0       .00    1.00   1.00  .00       .000  .000      .00   .00
1     7      5      0     5.000  1       .20    .80    .80   .18       .200  .179      .22   .22
1     8      4      0     4.000  1       .25    .75    .60   .22       .200  .179      .29   .28
1     9      3      0     3.000  0       .00    1.00   .60   .22       .000  .000      .00   .00
1     10     3      0     3.000  2       .67    .33    .20   .18       .400  .219      1.00  .61
1     11     1      0     1.000  0       .00    1.00   .20   .18       .000  .000      .00   .00
1     12     1      1     .500   0       .00    1.00   .20   .18       .000  .000      .00   .00
The results suggest that survival is longer with Drug 1, the top (orange) curve.
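To see where the life table columns come from, here is a sketch in Python that rebuilds several of them from the event and withdrawal counts in the table. It assumes the standard actuarial formulas (withdrawals count as half-exposed; hazard = 2q / (h(1 + p)) for interval width h); I have not confirmed that SPSS computes them in exactly this way, and the function and key names are invented for the illustration.

```python
def life_table(n_start, events, withdrawals, width=1.0):
    """Actuarial life table for one group.

    events[i] and withdrawals[i] are the numbers of terminal events and
    withdrawals (censored cases) in interval i.
    """
    rows = []
    entering = n_start
    cum_surv = 1.0
    for i, (d, w) in enumerate(zip(events, withdrawals)):
        exposed = entering - w / 2.0            # withdrawals count half
        q = d / exposed if exposed > 0 else 0.0  # proportion terminating
        p = 1.0 - q                              # proportion surviving
        prev_cum = cum_surv
        cum_surv = prev_cum * p
        rows.append({
            "start": i * width,
            "entering": entering,
            "exposed": exposed,
            "prop_terminating": q,
            "prop_surviving": p,
            "cum_surviving": cum_surv,
            "density": (prev_cum - cum_surv) / width,
            "hazard": 2.0 * q / (width * (1.0 + p)),
        })
        entering -= d + w
    return rows


# Counts read off the "Number of Terminal Events" and
# "Number Withdrawing" columns of the life table above.
drug0 = life_table(7, [0, 1, 2, 1, 1, 1, 0, 0, 0, 0, 0, 1], [0] * 12)
drug1 = life_table(5, [0] * 7 + [1, 1, 0, 2, 0, 0], [0] * 12 + [1])

for label, table in (("Drug 0", drug0), ("Drug 1", drug1)):
    print(label)
    for row in table:
        print(f"  start={row['start']:>4.0f}  exposed={row['exposed']:.3f}  "
              f"cum_surv={row['cum_surviving']:.2f}  hazard={row['hazard']:.2f}")
```

Rounded to two decimals, the cumulative proportions surviving and hazard rates printed here should agree with the corresponding columns of the table.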
It’s my assumption that all the times are collected and the median of those times is reported here. It should correspond closely to the intersection of survival functions and a horizontal line at 50% survival.
Tabachnick Table 11.1, p. 511 Analyzed using SPSS Kaplan-Meier
Analyze -> Survival -> Kaplan-Meier . . .
KM months BY drug
  /STATUS=outcome(1)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL
  /TEST LOGRANK BRESLOW TARONE
  /COMPARE OVERALL POOLED.
[Define Event] had already been pressed when this screen shot was taken.
Kaplan-Meier [DataSet2] G:\MdbT\InClassDatasets\Survival(T&Bp511).sav
Case Processing Summary
drug     Total N  N of Events  Censored N  Censored Percent
0        7        7            0           .0%
1        5        4            1           20.0%
Overall  12       11           1           8.3%
Survival Table
(Estimate and Std. Error refer to the Cumulative Proportion Surviving at the Time.)
drug  Case  Time    Status  Estimate  Std. Error  N of Cumulative Events  N of Remaining Cases
0     1     1.000   1       .857      .132        1                       6
0     2     2.000   1       .         .           2                       5
0     3     2.000   1       .571      .187        3                       4
0     4     3.000   1       .429      .187        4                       3
0     5     4.000   1       .286      .171        5                       2
0     6     5.000   1       .143      .132        6                       1
0     7     11.000  1       .000      .000        7                       0
1     1     7.000   1       .800      .179        1                       4
1     2     8.000   1       .600      .219        2                       3
1     3     10.000  1       .         .           3                       2
1     4     10.000  1       .200      .179        4                       1
1     5     12.000  0       .         .           4                       0
Means and Medians for Survival Time
         Mean(a)                                             Median
drug     Estimate  Std. Error  95% CI Lower  95% CI Upper    Estimate  Std. Error  95% CI Lower  95% CI Upper
0        4.000     1.272       1.506         6.494           3.000     1.309       .434          5.566
1        9.400     .780        7.872         10.928          10.000    .894        8.247         11.753
Overall  6.250     1.081       4.131         8.369           5.000     2.598       .000          10.092
a. Estimation is limited to the largest survival time if it is censored.
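The Kaplan-Meier estimates, means, and medians above can be rebuilt by hand. The following Python sketch uses the times and statuses from the Survival Table (status 1 = event, 0 = censored). The function names are invented for the illustration, and the mean is computed as the area under the step curve up to the largest observed time, which is my reading of footnote a rather than something documented here.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event time, proportion surviving).

    events[i] is 1 if case i terminated at times[i], 0 if it was censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = leaving = 0
        # Gather every case (event or censored) tied at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving
    return curve


def median_survival(curve):
    """First time the curve is at or below 50%; None if it never gets there."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def restricted_mean(curve, last_time):
    """Area under the step curve from time 0 to the largest observed time."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        area += prev_s * (t - prev_t)
        prev_t, prev_s = t, s
    return area + prev_s * (last_time - prev_t)


drug0_times, drug0_events = [1, 2, 2, 3, 4, 5, 11], [1, 1, 1, 1, 1, 1, 1]
drug1_times, drug1_events = [7, 8, 10, 10, 12], [1, 1, 1, 1, 0]

km0 = kaplan_meier(drug0_times, drug0_events)
km1 = kaplan_meier(drug1_times, drug1_events)

for label, km, times in (("Drug 0", km0, drug0_times), ("Drug 1", km1, drug1_times)):
    estimates = ", ".join(f"t={t}: {s:.3f}" for t, s in km)
    print(f"{label}: {estimates}")
    print(f"  mean = {restricted_mean(km, max(times)):.3f}, "
          f"median = {median_survival(km)}")
```

A curve that never drops to 50% returns `None` for the median, which is the situation in which SPSS prints a period instead of a median.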
Overall Comparisons
                                Chi-Square  df  Sig.
Log Rank (Mantel-Cox)           3.747       1   .053
Breslow (Generalized Wilcoxon)  4.926       1   .026
Tarone-Ware                     4.522       1   .033
Test of equality of survival distributions for the different levels of drug.
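For those who want to see what is inside the Log Rank (Mantel-Cox) test, here is a sketch in Python for two groups. It uses the textbook formula (observed minus expected events in one group, summed over event times, squared and divided by the summed hypergeometric variance); the function name is invented, and I am assuming rather than asserting that SPSS computes the statistic the same way.

```python
import math


def logrank_two_groups(times, events, groups):
    """Log-rank chi-square (1 df) and p-value for groups coded 0 and 1."""
    cases = list(zip(times, events, groups))
    event_times = sorted({t for t, e, _ in cases if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n0 = sum(1 for ti, _, g in cases if ti >= t and g == 0)  # at risk, group 0
        n1 = sum(1 for ti, _, g in cases if ti >= t and g == 1)  # at risk, group 1
        n = n0 + n1
        d = sum(1 for ti, e, _ in cases if ti == t and e == 1)   # events at t
        d0 = sum(1 for ti, e, g in cases if ti == t and e == 1 and g == 0)
        obs_minus_exp += d0 - d * n0 / n
        if n > 1:
            variance += d * (n0 / n) * (n1 / n) * (n - d) / (n - 1)
    chi_square = obs_minus_exp ** 2 / variance
    # Upper-tail probability of a chi-square with 1 df.
    p_value = math.erfc(math.sqrt(chi_square / 2.0))
    return chi_square, p_value


# Months, status (1 = died, 0 = censored), and drug from the Survival Table.
months = [1, 2, 2, 3, 4, 5, 11, 7, 8, 10, 10, 12]
outcome = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0]
drug = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

chi_square, p_value = logrank_two_groups(months, outcome, drug)
print(f"Log rank chi-square = {chi_square:.3f}, p = {p_value:.3f}")
```

Run on the twelve cases above, this should come out close to the Log Rank line of the Overall Comparisons table.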
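As was the case with the analysis using the LIFE TABLES procedure, the results support the conclusion that survival is longer with Drug 1. Note that the Breslow (p = .026) and Tarone-Ware (p = .033) tests are significant at the .05 level, while the Log Rank test just misses it (p = .053).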
Note that censored cases are denoted with a + on the survival function.
Tabachnick Table 11.1, p. 511 (start here on 11/6/17)
Analyzed using SPSS Cox Regression
The program will not produce a survival curve for a group of cases defined by the value of a variable unless that variable is a categorical variable. (Reminds me of the RCMDR Factor issue.)
For that reason, I told the program that drug is a categorical variable so that survival curves for each value of drug could be obtained.
Since drug is a dichotomy, the analysis could be done without labeling it categorical, but in that case the survival curves for each value of drug could not have been generated.
The left panel would yield 1 plot. The right panel yields a plot for each value of drug.
COXREG months
  /STATUS=outcome(1)
  /PATTERN BY drug
  /CONTRAST (drug)=Indicator(1)
  /METHOD=ENTER drug
  /PLOT SURVIVAL
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
Cox Regression
[DataSet2] G:\MdbT\InClassDatasets\Survival(T&Bp511).sav
Case Processing Summary
                                                                                     N   Percent
Cases available in analysis  Event(a)                                                11  91.7%
                             Censored                                                1   8.3%
                             Total                                                   12  100.0%
Cases dropped                Cases with missing values                               0   .0%
                             Cases with negative time                                0   .0%
                             Censored cases before the earliest event in a stratum   0   .0%
                             Total                                                   0   .0%
Total                                                                                12  100.0%
a. Dependent Variable: months
Categorical Variable Codings(b)
            Frequency  (1)
drug(a)  0  7          0
         1  5          1
a. Indicator Parameter Coding
b. Category variable: drug
As mentioned above if you want separate predicted survival functions for each value of a categorical variable, put the name of that categorical variable here.
Block 0: Beginning Block
Omnibus Tests of Model Coefficients
-2 Log Likelihood: 40.740
Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
-2 Log Likelihood: 37.394
                            Chi-square  df  Sig.
Overall (score)             3.469       1   .063
Change From Previous Step   3.346       1   .067
Change From Previous Block  3.346       1   .067
a. Beginning Block Number 1. Method = Enter
Variables in the Equation
      B       SE    Wald   df  Sig.  Exp(B)
drug  -1.176  .658  3.192  1   .074  .309
Covariate Means and Pattern Values
      Mean  Pattern 1  Pattern 2
drug  .417  .000       1.000
I strongly recommend that you create a plot such as the one immediately above by hand to make sure you understand the Cox Regression results. I do it every time I use this procedure.
Cox regression coefficient signs are relative to death, not survival. So, a positive sign means that larger values of the independent variable have higher death rates. And negative signs mean that larger values of the independent variable have lower death rates.
[Figure: hand-drawn plot of Death against Drug (coded 1 and 0).]
In Cox Regression, we’re predicting DEATH, not survival.
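The numbers in the drug example can be tied together with a little arithmetic. The sketch below, in Python, takes B, SE, and the two -2 Log Likelihood values from the output above and recomputes Exp(B), the Wald chi-square, and the change chi-square. The helper `chi_square_p_1df` is an invented name; it relies on the fact that a 1-df chi-square is a squared standard normal. Small differences from the SPSS values are expected because B and SE are rounded to three decimals.

```python
import math

# Values copied from the Cox Regression output for the drug example.
b, se = -1.176, 0.658
neg2ll_block0, neg2ll_block1 = 40.740, 37.394


def chi_square_p_1df(x):
    """Upper-tail probability of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(x / 2.0))


hazard_ratio = math.exp(b)                       # Exp(B)
wald = (b / se) ** 2                             # Wald chi-square
wald_p = chi_square_p_1df(wald)
lr_chi_square = neg2ll_block0 - neg2ll_block1    # change in -2 Log Likelihood
lr_p = chi_square_p_1df(lr_chi_square)

print(f"Exp(B) = {hazard_ratio:.3f}")
print(f"Wald = {wald:.3f}, p = {wald_p:.3f}")
print(f"Change chi-square = {lr_chi_square:.3f}, p = {lr_p:.3f}")
```

Exp(B) below 1 goes with the negative sign of B: the estimated hazard of death under Drug 1 is roughly three tenths of the hazard under Drug 0.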
COXREG plots are plots of predicted survival, not actual survival. In this sense, they’re like the tables and plots of estimated marginal means from GLM. I usually report observed survival functions, using Kaplan-Meier, rather than these predicted survival functions. However, these are certainly useful in situations in which you want to show what survival should be for specific groups at specific times controlling for the other variables in the equation.
Y-hats
The Cox-Regression plots are y-hat plots, not observed survival functions.
They are predicted survival, not actual survival.
Note, however, that they are predicted SURVIVAL curves, not death curves.
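Why predicted curves from a negative coefficient sit higher can be sketched in Python. Under the proportional hazards model, the predicted survival for a case is the baseline survival raised to the power exp(B times x). The baseline values below are hypothetical (the SPSS output shown here does not list them), and `predicted_survival` is an invented helper name; only B = -1.176 comes from the output above.

```python
import math

b_drug = -1.176  # coefficient for drug from the Cox Regression output

# Hypothetical baseline survival S0(t) for drug = 0 at a few months.
baseline = {2: 0.80, 4: 0.55, 6: 0.40, 8: 0.25, 10: 0.10}


def predicted_survival(baseline_curve, b, x):
    """Proportional hazards prediction: S(t | x) = S0(t) ** exp(b * x)."""
    ratio = math.exp(b * x)
    return {t: s0 ** ratio for t, s0 in baseline_curve.items()}


pred_drug0 = predicted_survival(baseline, b_drug, 0)
pred_drug1 = predicted_survival(baseline, b_drug, 1)

for t in baseline:
    print(f"month {t:>2}: drug 0 = {pred_drug0[t]:.3f}, drug 1 = {pred_drug1[t]:.3f}")
```

The coefficient describes death (a negative B means a lower hazard), but the plotted quantity is survival, so the group with the lower hazard has the higher predicted curve at every time.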
From Kaplan-Meier
Real Life Example: Turnover at a local Manufacturing Plant
1. Effect of Friends and/or family at the plant
In this study, turnover at a local manufacturing plant was studied. On the application blank, applicants were asked to indicate whether or not they had friends or family already working at the plant.
Some did not respond to this question. They’re included in the analysis.
A screen shot of the data editor
The variable, wsfr2, represents whether or not the applicant had friends at the company.
wsfr2 = 0.50 means yes.
wsfr2 = -0.50 means no.
wsfr2 = 0.15 means no info.
Wsfr2 was created to deal with missing values in a special way. The fact that the values are fractional has no bearing on the analyses. They could just as well have been 0, 1, 2 or 1, 2, 3.
Having said that, because the LIFE TABLES procedure requires integer values of each factor, I’ll skip it here.
Kaplan-Meier analysis is shown.
Some of SPSS’s procedures are written so that a grouping variable can have any kind of values. K-M is one of them.
K-M allows you to simply specify the name of the factor, and the program figures out how many groups are implied by the values of the factor.
That’s good unless you have a grouping variable with some incidental values representing unique cases or groups of cases – cases you wish to be excluded from the analysis.
KM dos BY wsfr2
  /STATUS=status(1)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL
  /TEST LOGRANK BRESLOW TARONE
  /COMPARE OVERALL POOLED.
Kaplan-Meier
[DataSet3] G:\MdbR\1TurnoverArticle\TurnoverArticleDataset061005.sav
Large table was deleted.
Case Processing Summary
wsfr2 (Whether F/F at company for whole sample analyses)
                                Total N  N of Events  Censored N  Censored Percent
-.50                            423      174          249         58.9%
.15 Whole sample missing value  100      40           60          60.0%
.50                             778      220          558         71.7%
Overall                         1301     434          867         66.6%
Means and Medians for Survival Time
wsfr2 (Whether F/F at company for whole sample analyses)
                                Mean(a)                                             Median
                                Estimate  Std. Error  95% CI Lower  95% CI Upper    Estimate  Std. Error  95% CI Lower  95% CI Upper
-.50                            610.597   25.559      560.500       660.693         667.000   .           .             .
.15 Whole sample missing value  579.795   49.233      483.299       676.291         528.000   151.013     232.014       823.986
.50                             769.900   18.559      733.524       806.277         .         .           .             .
Overall                         706.965   15.009      677.548       736.383         .         .           .             .
a. Estimation is limited to the largest survival time if it is censored.
Note that there is no estimate of median survival for the 0.50 group (or for the overall sample). The median is the time at which the estimated survival function first drops to 50% or below. For that group the curve stayed above 50% through the end of the observation window, so a median was not computable.
Overall Comparisons
                                Chi-Square  df  Sig.
Log Rank (Mantel-Cox)           25.344      2   .000
Breslow (Generalized Wilcoxon)  25.325      2   .000
Tarone-Ware                     25.004      2   .000
Test of equality of survival distributions for the different levels of wsfr2 Whether F/F at company for whole sample analyses.
Clearly there are significant differences in overall survival between the groups.
-.50 = No friends
.15 = No info
.50 = Had friends
The data strongly suggest that applicants who had friends or family at the company had higher survival rates at all times up to 1100 days (about 3 years).
For example, at the end of 1 year (leftmost arrow in the above figure), the survival rate of those with friends and family was about 70%, while that for those who said they did not have friends or family at the organization was about 60%.
By two years (middle arrow), the rate of retention of those with friends or family was about 68%, while the rate of those without had decreased to 50%.
The fact that the curve for those for whom no information was available was between the other two curves suggests that those employees for whom no information was available were a mixture of some who did have friends and family and those who did not.
[Figure: survival functions for the three groups (Had friends or family, No friends or family, Missing response), with arrows at 1 year, 2 years, and 3 years.]
Note the huge difference in proportion surviving after two years: almost 20 percentage points between those with friends and those without friends.
Same analysis using SPSS Cox Regression
Analyze -> Survival -> Cox Regression . . .
In my limited experience with group coding variables in survival analysis, I’ve found that Dummy Variable (Indicator in SPSS) coding is the one that is most useful.
COXREG dos
  /STATUS=status(1)
  /PATTERN BY wsfr2
  /CONTRAST (wsfr2)=Indicator
  /METHOD=ENTER wsfr2
  /PLOT SURVIVAL
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
Cox Regression
Case Processing Summary
                                                                                     N     Percent
Cases available in analysis  Event(a)                                                434   33.4%
                             Censored                                                867   66.6%
                             Total                                                   1301  100.0%
Cases dropped                Cases with missing values                               0     0.0%
                             Cases with negative time                                0     0.0%
                             Censored cases before the earliest event in a stratum   0     0.0%
                             Total                                                   0     0.0%
Total                                                                                1301  100.0%
a. Dependent Variable: dos
About 2/3 of the employees were still working when the window of observation closed (the 66.6% censored cases); about 1/3 had terminated.
Remember, the variable representing different groups must have been specified as categorical.
Categorical Variable Codings(a)
                                          Frequency  (1)  (2)
wsfr2(b)  -.50=-.50                       423        1    0
          .15=Whole sample missing value  100        0    1
          .50=.50                         778        0    0
a. Category variable: wsfr2 (Whether F/F at company for whole sample analyses)
b. Indicator Parameter Coding
Block 0: Beginning Block
Omnibus Tests of Model Coefficients
-2 Log Likelihood: 5871.672
Block 1: Method = Enter
Omnibus Tests of Model Coefficients(a)
-2 Log Likelihood: 5847.172
                            Chi-square  df  Sig.
Overall (score)             25.290      2   .000
Change From Previous Step   24.499      2   .000
Change From Previous Block  24.499      2   .000
a. Beginning Block Number 1. Method = Enter
Variables in the Equation
          B     SE    Wald    df  Sig.  Exp(B)
wsfr2                 24.812  2   .000
wsfr2(1)  .489  .102  23.230  1   .000  1.631
wsfr2(2)  .425  .172  6.104   1   .013  1.529
Recall that the sign of each coefficient is relative to “Termination”.
wsfr2(1) compares the hazard of terminating in the -.50 group to that in the +.50 group.
Since the coefficient is +.489, this says that the -.50 group has a larger probability of terminating than the .50 group: Exp(B) = 1.631, so its estimated hazard of terminating is about 1.6 times that of the .50 group.
The same holds for wsfr2(2): the no-response group has a greater probability of terminating than the +.50 group (Exp(B) = 1.529).
Covariate Means and Pattern Values
          Mean  Pattern 1  Pattern 2  Pattern 3
wsfr2(1)  .325  1.000      .000       .000
wsfr2(2)  .077  .000       1.000      .000
I’m not sure what this table is for.
The reference group is the wsfr2 = +0.50 “Have Friends” group.
[Figure: hand-drawn plot with “Term’d” on one axis and the values 0 and 1 on the other.]
Predicted survival for the whole sample.
These are predicted survival curves, which is why they’re so smooth.
We might use these data to read the minds of those who did not respond to the “Do you have friends or family?” question. The similarity of their survival function to the “No Friends” function suggests that most did not have friends or family at the organization.
Using Survival Analysis to score and validate selection test questions.
An I/O consulting firm gave a 30-question pre-employment questionnaire to 1000+ employees of a local company. Each question had from one to five alternatives. The consulting company wanted to identify questions that predicted long tenure with the organization. (They would have preferred to identify questions that predicted high performance, but it was not possible to get good performance data. Don’t get me started on why organizations don’t gather good performance data.)
In order to identify responses associated with long tenure, a survival analysis was conducted for each question. A few of the analyses are presented below.
For each survival function, each curve is the survival function of persons who made a particular response to the item. I picked only those for which the difference in survival curves was significant or approached significance.
Question 1
Overall Comparisons
                                Chi-Square  df  Sig.
Log Rank (Mantel-Cox)           5.382       2   .068
Breslow (Generalized Wilcoxon)  4.307       2   .116
Tarone-Ware                     4.756       2   .093
The numbers represent the 3 possible responses to the question, coded as +1, 0, -1.
For this question, I believe we treated +1 as an indicator of long tenure and both 0 and -1 as indicators of short tenure.
[Figure: survival functions for Question 1, one curve per response, labeled +1, 0, and -1?]
Question 2
Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 7.647 4 .105
Breslow (Generalized Wilcoxon) 6.950 4 .139
Tarone-Ware 7.298 4 .121
[Figure: survival functions for Question 2 by response, labeled +1, 0, and -1?]
As in the case of the question on the previous page, the response coded as +1 was treated as an indicator of long tenure and all other responses were treated as indicators of short tenure.
Question 3
Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 5.070 3 .167
Breslow (Generalized Wilcoxon) 5.525 3 .137
Tarone-Ware 5.493 3 .139
Test of equality of survival distributions for the different levels of GenQ4 GenQ4 L: I prefer a job that / S: How often you experience conflict with a co-worker?.
[Figure: survival functions for Question 3 by response, labeled +1 and 0.]
There were very few persons who responded +1 or 0, but those who did were treated as long tenure and those who responded 0 as short tenure.
Question 4
Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 7.753 4 .101
Breslow (Generalized Wilcoxon) 6.762 4 .149
Tarone-Ware 7.439 4 .114
Test of equality of survival distributions for the different levels of GenQ3 GenQ3 L: Recieved safety training? / S: You are asked to do more physically demanding work than you were hired to do because someone out sick, how do you react?.
[Figure: survival functions for Question 4, one curve labeled +1 and one labeled 0, -1.]
+1: Long tenure
Else: Short tenure
Question 5
Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 10.971 4 .027
Breslow (Generalized Wilcoxon) 9.931 4 .042
Tarone-Ware 10.597 4 .031
Test of equality of survival distributions for the different levels of GenQ2 GenQ2 L: Your team in disagreement over who will clean the floor. What method is fair?/ S: Recent supervisor rate dependability?.
[Figure: survival functions for Question 5 by response, with labels +1, 0, Long Tenure, and Short Tenure.]
Question 6
Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 8.052 3 .045
Breslow (Generalized Wilcoxon) 12.729 3 .005
Tarone-Ware 10.614 3 .014
Test of equality of survival distributions for the different levels of GenQ1 GenQ1 L: Which strategies inspire a team and help be more effective?/ S:Your team in disagreement over who will clean the floor. What method is fair?.
[Figure: survival functions for Question 6 by response, labeled +1 and 0.]
Creation of an overall Tenure Index
Thirty questions were evaluated in the above fashion.
After examining the survival analysis for each of the 30 questions as shown above, those questions for which there were significant differences in survival between responses were identified.
Finally, an overall index was calculated, using syntax like the following . . .
In this particular case, the response associated with long survival added 1 to the index.
The response associated with short survival subtracted 1 from the index.
Tenure Scale Computation
* Start the index at 0, then add 1 for each long-tenure response and subtract 1 for each short-tenure response.
Compute genshort=0.
if ((genq1=3 or genq1=4)) genshort=genshort + 1.
if ((genq1=1 or genq1=2)) genshort=genshort - 1.
if ((genq2=3 or genq2=4)) genshort=genshort + 1.
if ((genq2=1 or genq2=2 or genq2=5)) genshort=genshort - 1.
if ((genq6=3)) genshort=genshort + 1.
if ((genq6=1 or genq6=2)) genshort=genshort - 1.
if ((genq12=1)) genshort=genshort + 1.
if ((genq12=3)) genshort=genshort - 1.
if ((genq13=1)) genshort=genshort + 1.
if ((genq13=2 or genq13=3 or genq13=4)) genshort=genshort - 1.
if ((genq21=1 or genq21=3)) genshort=genshort + 1.
if ((genq21=2)) genshort=genshort - 1.
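For readers who do not use SPSS, the same scoring logic can be sketched in Python. The item names and response codes are copied from the syntax above; the dictionary layout, the function name `tenure_index`, and the example applicants are invented for the illustration. Following the text, the responses that add 1 are taken to be the long-tenure responses.

```python
# For each item: (responses that add 1, responses that subtract 1).
SCORING_KEY = {
    "genq1": ({3, 4}, {1, 2}),
    "genq2": ({3, 4}, {1, 2, 5}),
    "genq6": ({3}, {1, 2}),
    "genq12": ({1}, {3}),
    "genq13": ({1}, {2, 3, 4}),
    "genq21": ({1, 3}, {2}),
}


def tenure_index(responses):
    """Sum +1 / -1 over the keyed items; unkeyed or missing answers add 0."""
    score = 0
    for item, (plus, minus) in SCORING_KEY.items():
        answer = responses.get(item)
        if answer in plus:
            score += 1
        elif answer in minus:
            score -= 1
    return score


# Two hypothetical applicants.
applicant_long = {"genq1": 3, "genq2": 4, "genq6": 3,
                  "genq12": 1, "genq13": 1, "genq21": 1}
applicant_mixed = {"genq1": 1, "genq2": 3, "genq6": 2,
                   "genq12": 2, "genq13": 1}

print(tenure_index(applicant_long))
print(tenure_index(applicant_mixed))
```

Keeping the key in one table makes it easy to rescore the index when a question is dropped or re-keyed after cross-validation.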
Validity of the Tenure Index
The following is not based on the scale above but on a similar scale.
The median score on the scale was determined to be -14.
Group 0 was all employees with an index value less than or equal to -14, that is, persons who generally responded with the “short tenure” answer.
Group 1 was all employees with an index value greater than -14 – persons who generally responded with the “long tenure” answer.
The graph indicates that those in Group 1, with large values of the index, had a nearly 70% retention rate after 50 months.
Those in Group 0 had a 40% retention rate after the same length of time.
The implication of this analysis would be to recommend that the company use the scale in hiring, giving preference to applicants with higher scores on the scale.
Remember that these responses were obtained at time of application. The effect lasted for 4 years.
Potential problems
The above curve was based on the same sample that was used to select the questions. So clearly there is capitalization on chance. The scale should be tested on a different sample. That is, the results need to be cross-validated.
[Figure: survival functions for Group 0 (low tenure) and Group 1 (high tenure), with time marked at 1 yr, 2 yr, 3 yr, and 4 yr.]
Multivariate Analysis using Cox Regression
Turnover as a function of 1) friends at the organization (wsfr2) and 2) sex of the employee, and 3) ethnic group of the employee (neth)
COXREG dos
  /STATUS=status(1)
  /PATTERN BY wsfr2
  /CONTRAST (neth)=Indicator(1)
  /CONTRAST (wsfr2)=Indicator
  /METHOD=ENTER wsfr2 nsex neth
  /PLOT SURVIVAL
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
Wsfr2  -0.50  does not have friends at company
        0.15  no info on whether has friends
        0.50  friends at the company
Nsex    1  Female
        2  Male
Neth    1  Employee is White
        2  Employee is Black
        3  Employee is American Indian or Asian or Hispanic
Cox Regression
[DataSet1] G:\MDBR\1TurnoverArticle\TurnoverArticleDataset061005.sav
Case Processing Summary
                                                                     N    Percent
Cases available in analysis   Event (a)                            434      33.4%
                              Censored                             867      66.6%
                              Total                               1301     100.0%
Cases dropped                 Cases with missing values              0       0.0%
                              Cases with negative time               0       0.0%
                              Censored cases before the
                              earliest event in a stratum            0       0.0%
                              Total                                  0       0.0%
Total                                                             1301     100.0%
a. Dependent Variable: dos Days of service: termdate-effdate or 3/1/1-effdate or 12/31/4-effdate
Categorical Variable Codings (a, c)
                                          Frequency   (1)   (2)
wsfr2 (b)   -.50=-.50                        423        1     0
            .15=Whole sample missing value   100        0     1
            .50=.50                          778        0     0
neth (b)    1.00=White                       903        0     0
            2.00=Black                       324        1     0
            3.00=Am Ind,Asian,Hisp            74        0     1
a. Category variable: wsfr2 (Whether F/F at company for whole sample analyses)
b. Indicator Parameter Coding
c. Category variable: neth (1=White, 2=Black, 3=Am Ind,Asian, Hisp)
No interactions were included in this analysis.
Note that
Wsfr2 = 0.50 (friends) is the reference group
Neth = 1 (white) is the reference group
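The indicator coding in the Categorical Variable Codings table can be reproduced with a few lines of code. This is a minimal Python sketch under stated assumptions: the helper name `indicator_code` is hypothetical, and the category lists are taken from the coding table in this handout. Each non-reference category gets its own 0/1 column, and the reference category is the row of all zeros.

```python
def indicator_code(value, categories, reference):
    """Return one 0/1 indicator per non-reference category, in category order."""
    return [1 if value == c else 0 for c in categories if c != reference]


# Categories as listed in the coding table.
WSFR2 = [-0.50, 0.15, 0.50]   # reference: 0.50 (friends at the company)
NETH = [1, 2, 3]              # reference: 1 (White)

if __name__ == "__main__":
    for v in WSFR2:
        print("wsfr2", v, indicator_code(v, WSFR2, reference=0.50))
    for v in NETH:
        print("neth ", v, indicator_code(v, NETH, reference=1))
```

Because the reference group is all zeros, each coefficient labeled (1) or (2) in the output compares one group against the reference group.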
Block 0: Beginning Block
Omnibus Tests of Model Coefficients
-2 Log Likelihood: 5871.672
Block 1: Method = Enter
Omnibus Tests of Model Coefficients (a)
-2 Log Likelihood: 5827.342
                               Chi-square   df   Sig.
Overall (score)                  42.322      5   .000
Change From Previous Step        44.330      5   .000
Change From Previous Block       44.330      5   .000
a. Beginning Block Number 1. Method = Enter
Variables in the Equation
               B      SE     Wald    df   Sig.   Exp(B)
wsfr2                       22.427    2   .000
wsfr2(1)     .464   .102    20.763    1   .000    1.590
wsfr2(2)     .421   .173     5.969    1   .015    1.524
nsex        -.223   .100     4.952    1   .026     .800
neth                        10.799    2   .005
neth(1)      .088   .109      .657    1   .417    1.092
neth(2)     -.908   .295     9.490    1   .002     .403
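The columns of the Variables in the Equation table are related by simple arithmetic, which the following Python sketch illustrates. It is an illustration, not SPSS's own computation: the function name `wald_summary` is hypothetical, the 95% interval uses the usual normal approximation (z = 1.96), and because B and SE are entered as rounded in the printout, the Wald values come out slightly different from the printed ones.

```python
import math


def wald_summary(b, se, z=1.96):
    """Wald chi-square (1 df), its p-value, Exp(B), and an approximate 95% CI for Exp(B)."""
    wald = (b / se) ** 2
    # Upper tail of a chi-square with 1 df: P(X > w) = erfc(sqrt(w / 2)).
    p = math.erfc(math.sqrt(wald / 2))
    return {
        "wald": wald,
        "p": p,
        "exp_b": math.exp(b),
        "ci": (math.exp(b - z * se), math.exp(b + z * se)),
    }


if __name__ == "__main__":
    # B and SE copied (rounded) from the table above.
    for name, b, se in [("wsfr2(1)", 0.464, 0.102), ("nsex", -0.223, 0.100)]:
        r = wald_summary(b, se)
        print(name, round(r["wald"], 2), round(r["p"], 3), round(r["exp_b"], 3),
              tuple(round(x, 3) for x in r["ci"]))
```

Exp(B) is the hazard ratio. An Exp(B) of 1.590 for wsfr2(1) means the estimated hazard of termination for employees without friends is about 1.59 times that of employees with friends, holding nsex and neth constant.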
Covariate Means and Pattern Values
                     Pattern
            Mean      1       2       3
wsfr2(1)    .325    1.000    .000    .000
wsfr2(2)    .077     .000   1.000    .000
nsex       1.421    1.421   1.421   1.421
neth(1)     .249     .249    .249    .249
neth(2)     .057     .057    .057    .057
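Notes on the Variables in the Equation table. Remember we’re predicting “Termination”, not survival, so a positive B means the group with the higher code is more likely to be term’d.
wsfr2(1): 0 = Friends, 1 = No Friends. We found this previously.
wsfr2(2): 0 = Friends, 1 = MV (missing value).
nsex: 1 = Female, 2 = Male.
neth(2): 0 = White, 1 = Am Ind/Asian/Hisp.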
The Kaplan-Meier Curve, for comparison . . .
[Figure: Kaplan-Meier (observed) survival curves]
[Figure: Predicted survival curves from the Cox regression]
These graphs replicate what we found above without other variables (NSEX and NETH) in the equation.
Testing for Interactions in Cox Regression
1. The interaction of Friends and Nsex
Block 1: Method = Enter
Omnibus Tests of Model Coefficients (a)
-2 Log Likelihood: 5824.879
                               Chi-square   df   Sig.
Overall (score)                  44.989      7   .000
Change From Previous Step        46.792      7   .000
Change From Previous Block       46.792      7   .000
a. Beginning Block Number 1. Method = Enter
Variables in the Equation
                   B      SE     Wald    df   Sig.   Exp(B)
wsfr2                            3.022    2   .221
wsfr2(1)         .429   .306     1.975    1   .160    1.536
wsfr2(2)        -.333   .530      .394    1   .530     .717
nsex            -.282   .138     4.158    1   .041     .754
neth                            10.964    2   .004
neth(1)          .097   .109      .800    1   .371    1.102
neth(2)         -.907   .295     9.464    1   .002     .404
nsex*wsfr2                       2.517    2   .284
nsex*wsfr2(1)    .023   .213      .011    1   .915    1.023
nsex*wsfr2(2)    .541   .347     2.424    1   .119    1.717
The nsex*wsfr2 interaction is not significant (Wald = 2.517, df = 2, Sig. = .284), so there is no evidence that the effect of having friends differs between Females and Males.
To specify that an interaction be tested, click on the 1st variable name, then while holding down the CTRL key or Command on the Mac, click on the 2nd variable name.
Finally, click on the >a*b> button.
2. The interaction of Friends and Neth – assessed in a separate analysis.
Block 1: Method = Enter
Omnibus Tests of Model Coefficients (a)
-2 Log Likelihood: 5820.584
                               Chi-square   df   Sig.
Overall (score)                  49.194      9   .000
Change From Previous Step        51.088      9   .000
Change From Previous Block       51.088      9   .000
a. Beginning Block Number 1. Method = Enter
Variables in the Equation
                        B       SE     Wald    df   Sig.   Exp(B)
wsfr2                                 27.320    2   .000
wsfr2(1)              .599    .121    24.386    1   .000    1.820
wsfr2(2)              .623    .209     8.846    1   .003    1.864
nsex                 -.224    .100     4.973    1   .026     .799
neth                                   6.603    2   .037
neth(1)               .298    .150     3.934    1   .047    1.347
neth(2)              -.465    .344     1.835    1   .176     .628
neth*wsfr2                             6.377    4   .173
neth(1)*wsfr2(1)     -.392    .230     2.906    1   .088     .675
neth(2)*wsfr2(1)    -1.222    .791     2.385    1   .123     .295
neth(1)*wsfr2(2)     -.534    .378     1.995    1   .158     .586
neth(2)*wsfr2(2)    -1.093   1.075     1.035    1   .309     .335
Again, the interaction is not significant (Wald = 6.377, df = 4, Sig. = .173), so there is no evidence that the effect of Friends differs across ethnic groups.
What the heck? What about the interaction of nsex and neth?
Block 1: Method = Enter – assessed again in a separate analysis.
Omnibus Tests of Model Coefficients (a)
-2 Log Likelihood: 5827.298
                               Chi-square   df   Sig.
Overall (score)                  42.395      7   .000
Change From Previous Step        44.373      7   .000
Change From Previous Block       44.373      7   .000
a. Beginning Block Number 1. Method = Enter
Variables in the Equation
                  B      SE     Wald    df   Sig.   Exp(B)
wsfr2                          22.438    2   .000
wsfr2(1)        .465   .102    20.791    1   .000    1.591
wsfr2(2)        .421   .173     5.941    1   .015    1.523
nsex           -.216   .118     3.365    1   .067     .806
neth                             .888    2   .642
neth(1)         .112   .327      .117    1   .732    1.118
neth(2)        -.739   .884      .700    1   .403     .477
neth*nsex                        .043    2   .979
neth(1)*nsex   -.018   .234      .006    1   .940     .982
neth(2)*nsex   -.125   .624      .040    1   .842     .883
Nope.
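The three interaction tests above use Wald statistics. A complementary check, sketched here in Python, is the likelihood-ratio test: subtract the -2 Log Likelihood of each interaction model from that of the main-effects model (5827.342 with 5 df) and refer the difference to a chi-square with df equal to the number of added terms. This is an illustrative sketch, not part of the SPSS output: the function names are hypothetical, and the closed-form tail probability used here holds only for an even number of degrees of freedom, which is the case for all three tests.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability of a chi-square variable with an even number of df."""
    if df <= 0 or df % 2:
        raise ValueError("closed form used here requires a positive even df")
    half = x / 2.0
    term = total = 1.0
    for k in range(1, df // 2):
        term *= half / k
        total += term
    return math.exp(-half) * total


def lr_test(reduced, full):
    """Each model is (-2 log likelihood, number of coefficients)."""
    change = reduced[0] - full[0]
    df = full[1] - reduced[1]
    return change, df, chi2_sf_even_df(change, df)


# -2 Log Likelihood values copied from the output in this handout.
MAIN_EFFECTS = (5827.342, 5)
INTERACTION_MODELS = {
    "nsex*wsfr2": (5824.879, 7),
    "neth*wsfr2": (5820.584, 9),
    "neth*nsex": (5827.298, 7),
}

if __name__ == "__main__":
    for name, model in INTERACTION_MODELS.items():
        change, df, p = lr_test(MAIN_EFFECTS, model)
        print(f"{name}: change = {change:.3f}, df = {df}, p = {p:.3f}")
```

None of the three likelihood-ratio tests reaches the .05 level, which agrees with the Wald tests for the interaction terms.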
Comparing Turnover in two plants
A company was interested in determining the causes of turnover in two of its plants.
Plant A: One part of the preparation of food for sale to retailers is undertaken.
Plant B: A different part of the preparation of food for sale to retailers is undertaken.
Each plant is managed by a different person.
The overall “survival” of employees in the two plants, reploc=1 and reploc=2, is as follows . . .
filter off.
compute reploc = newloc.
value labels reploc 1 "A" 2 "B".
filter by useme.
KM dayswrkd by reploc
  /STATUS=termed(1)
  /PRINT MEAN
  /PLOT SURVIVAL
  /TEST LOGRANK BRESLOW TARONE
  /COMPARE OVERALL POOLED.
Kaplan-Meier
[DataSet1] G:\MDBR\???\AllEmployeesNN041025.sav
Case Processing Summary
                                       Censored
reploc     Total N   N of Events     N     Percent
1.00 A       310         126        184     59.4%
2.00 B       837         285        552     65.9%
Overall     1147         411        736     64.2%
Means and Medians for Survival Time
Mean (a)
                                     95% Confidence Interval
reploc     Estimate   Std. Error   Lower Bound   Upper Bound
1.00 A      355.796     16.345       323.760       387.832
2.00 B      424.911      9.197       406.884       442.938
Overall     407.357      8.081       391.519       423.195
Median
                                     95% Confidence Interval
reploc     Estimate   Std. Error   Lower Bound   Upper Bound
1.00 A      377.000     39.815       298.962       455.038
2.00 B      559.000       .             .             .
Overall     489.000     33.040       424.242       553.758
a. Estimation is limited to the largest survival time if it is censored.
Overall Comparisons
Chi-Square df Sig.
Log Rank (Mantel-Cox) 13.633 1 .000
Breslow (Generalized Wilcoxon) 10.203 1 .001
Tarone-Ware 11.880 1 .001
Test of equality of survival distributions for the different levels of reploc.
filter off.
Clearly, employee “retention/survival” is best in Plant B – reploc = 2.
The manager of Plant A was pretty defensive.
[Figure: Kaplan-Meier survival curves for Plant B and Plant A]
Are these differences in survival rates the same for the different ethnic groups employed by the company?
Perhaps the differences between buildings arise because the buildings employ different proportions of the ethnic groups, and the ethnic groups themselves have different survival rates.
neweth * reploc Crosstabulation
                                                reploc
                                           1.00 A    2.00 B    Total
neweth   .00 White or Black   Count          130       219       349
                              % within reploc 41.4%     25.7%     29.9%
         1.00 Hispanic        Count          184       634       818
                              % within reploc 58.6%     74.3%     70.1%
Total                         Count          314       853      1167
                              % within reploc 100.0%   100.0%    100.0%
These differences suggest that the difference in survival between buildings might be a side-effect of the difference in proportion of Hispanics in the two buildings combined with the difference in survival between Hispanics vs. White/Black.
The way to resolve this issue is to perform a multivariate analysis, assessing the Plant effect while controlling for the Ethnic Group effect.
This can only be done with Cox Regression.
[Figure: survival curves for Hispanic and White/Black employees]
Multivariate analysis: joint effect of plant and ethnic group.
filter off.
filter by useme.
COXREG dayswrkd
  /STATUS=termed(1)
  /METHOD=ENTER reploc neweth
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20).
Cox Regression
Case Processing Summary
                                                                     N    Percent
Cases available in analysis   Event (a)                            411      33.9%
                              Censored                             736      60.7%
                              Total                               1147      94.6%
Cases dropped                 Cases with missing values             65       5.4%
                              Cases with negative time               0       0.0%
                              Censored cases before the
                              earliest event in a stratum            0       0.0%
                              Total                                 65       5.4%
Total                                                             1212     100.0%
a. Dependent Variable: dayswrkd
Block 0: Beginning Block
Omnibus Tests of Model Coefficients
-2 Log Likelihood: 5312.092
Block 1: Method = Enter
Omnibus Tests of Model Coefficients (a)
-2 Log Likelihood: 5222.145
                               Chi-square   df   Sig.
Overall (score)                 101.652      2   .000
Change From Previous Step        89.947      2   .000
Change From Previous Block       89.947      2   .000
a. Beginning Block Number 1. Method = Enter
Variables in the Equation
             B      SE     Wald    df   Sig.   Exp(B)
reploc    -.159   .111     2.072    1   .150     .853
neweth    -.916   .102    80.462    1   .000     .400
Covariate Means
           Mean
reploc    1.730
neweth     .697
filter off.
So, when controlling for differences in ethnic groups, no significant difference in survival (turnover) between the two buildings was found (reploc: B = -.159, Sig. = .150). The manager of Building A was very happy with this result.
Since both factors – reploc and neweth – are dichotomous, I did not bother to identify them as categorical variables for SPSS. I will not be able to get survival curves for the individual combinations, though, because they’re not identified as categorical.
Survival Analysis of a phenomenon with a positive outcome
PEG vs. PEGJ Example – skipped in 2018
The data for this example compared two methods of feeding trauma patients, one using a percutaneous endoscopic gastrojejunostomy (PEGJ) and the other using a percutaneous endoscopic gastrostomy (PEG). It was hoped that the data would show that the PEGJ technique would provide continuous uninterrupted nutrition with greater consistency than PEG.
Time to reach a nutrition goal was the continuous dependent variable. Patients were observed for 14 days. Whether or not a patient reached the goal was the status. Reaching the goal was the +1 state. A patient who had not reached the goal in 14 days was treated as a censored case.
Group=1 is the PEGJ group. Group=2 is the PEG group.
NUTRSD   NUTRGOAL DAYSGOAL GOALIN14 GROUP ISS AGE
02/15/98 02/16/98     1       1       1    29  43
01/10/98 01/12/98     2       1       1     5  88
02/14/98 02/18/98     4       1       1    29  37
02/02/98 02/06/98     4       1       1    27  36
01/10/98 01/13/98     3       1       1    13  92
01/09/98 .           15       0       2    19  73
01/02/98 01/04/98     2       1       2    26  42
01/20/98 01/22/98     2       1       2    36  55
03/18/98 .            5       1       1    27  23
02/04/98 02/06/98     2       1       2    13  72
01/23/98 .           15       0       2    10  45
02/01/98 02/02/98     1       1       1    22  59
02/20/98 02/21/98     1       1       1    17  54
02/03/98 02/04/98     1       1       2    14  78
03/31/98 04/02/98     2       1       2    18  30
04/13/98 04/15/98     2       1       2    27  49
05/08/98 05/09/98     1       1       2     9  22
04/14/98 04/20/98     6       1       2     9  60
05/27/98 05/28/98     1       1       1    17  27
05/13/98 .           15       0       2    29  95
05/07/98 05/16/98     9       1       2    25  31
04/16/98 04/17/98     1       1       2    32  31
03/23/98 03/25/98     2       1       2    20  41
04/07/98 04/08/98     1       1       2    16  29
03/29/98 03/30/98     1       1       1    25  24
04/30/98 05/01/98     1       1       2    29  52
05/05/98 05/08/98     3       1       2    38  79
05/28/98 05/30/98     2       1       1     4  76
06/08/98 06/10/98     2       1       2    16  70
05/27/98 05/28/98     1       1       1     9  27
04/27/98 04/29/98     2       1       1    22  87
04/10/98 04/11/98     1       1       1    27  36
02/26/98 03/04/98     6       1       1    25  54
03/27/98 03/28/98     1       1       1    29  22
04/17/98 04/18/98     1       1       1    22  22
02/25/98 03/05/98     8       1       1    25  79
03/18/98 03/19/98     1       1       1    25  56
01/28/98 01/29/98     1       1       1    17  66
03/23/98 03/24/98     1       1       1    16  20
04/29/98 05/03/98     4       1       1    26  22
07/19/98 08/02/98    14       1       2    34  33
08/13/98 08/15/98     2       1       1    25  49
08/25/98 .           15       0       2    26  77
10/06/98 10/07/98     1       1       2    34  19
09/10/98 09/11/98     1       1       2    27  36
08/14/98 08/15/98     1       1       1    30  35
08/25/98 08/27/98     2       1       2    27  29
09/20/98 09/21/98     1       1       2    36  62
09/29/98 10/01/98     2       1       2    17  19
10/09/98 .           15       0       2    38  74
DAYSGOAL is the “length of the arrow” variable in the first handout.
GOALIN14 is a variable which represents whether the goal was reached or not.
GOALIN14=1 means that the goal was reached.
GOALIN14=0 means that the case is right-censored.
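To make the relation between the date columns and the two analysis variables concrete, here is a small Python sketch. It is an assumption about how DAYSGOAL and GOALIN14 could be derived from NUTRSD and NUTRGOAL, not the author's procedure: the function name `days_and_status` is hypothetical, and the convention of recording censored cases as day 15 is inferred from the listing. One listed case (start date 03/18/98) has no goal date but a DAYSGOAL of 5, so the actual file was not built purely by this rule.

```python
from datetime import datetime

WINDOW_DAYS = 14          # length of the observation window, from the text
DATE_FORMAT = "%m/%d/%y"  # dates in the listing look like 02/15/98


def days_and_status(start, goal, window=WINDOW_DAYS):
    """Return (DAYSGOAL, GOALIN14) for one patient.

    `goal` is "." (SPSS missing) when no goal date was recorded.
    Cases with no goal inside the window are right-censored: status 0,
    with the time recorded as window + 1, as in the listing.
    """
    if goal in (None, "", "."):
        return window + 1, 0
    days = (datetime.strptime(goal, DATE_FORMAT)
            - datetime.strptime(start, DATE_FORMAT)).days
    if days > window:
        return window + 1, 0
    return days, 1


if __name__ == "__main__":
    print(days_and_status("02/15/98", "02/16/98"))
    print(days_and_status("01/09/98", "."))
    print(days_and_status("07/19/98", "08/02/98"))
```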
NUTRSD   NUTRGOAL DAYSGOAL GOALIN14 GROUP ISS AGE
10/02/98 10/03/98     1       1       1    10  40
08/26/98 09/04/98     9       1       2    18  48
08/19/98 08/21/98     2       1       1    18  31
08/03/98 08/04/98     1       1       1    41  46
08/25/98 08/28/98     3       1       2    24  37
09/17/98 .           15       0       2    26  75
07/02/98 .           15       0       1    19  28
08/03/98 08/05/98     2       1       2    13  52
07/15/98 07/17/98     2       1       2    38  71
07/27/98 08/01/98     5       1       2    34  33
04/30/98 05/02/98     2       1       2     4  61
05/29/98 05/30/98     1       1       1    29  58
05/16/98 05/18/98     2       1       2    19  42
06/20/98 06/23/98     3       1       1    25  19
08/30/98 .           15       0       1    25  70
04/30/98 05/02/98     2       1       2    43  33
07/01/98 07/02/98     1       1       1    43  79
09/29/98 .           15       0       2    17  18
05/28/98 06/08/98    11       1       2    36  57
07/15/98 07/16/98     1       1       2    27  59
08/11/98 08/12/98     1       1       1    19  43
10/12/98 10/13/98     1       1       1    36  18
08/24/98 08/25/98     1       1       1    20  84
10/22/98 .           15       0       1    25  17
10/08/98 10/09/98     1       1       2    25  20
10/06/98 .           15       0       2    17  31
07/30/98 08/02/98     3       1       1    22  26
04/16/98 04/17/98     1       1       1    38  18
10/08/98 10/09/98     1       1       1    25  34
08/19/98 08/21/98     2       1       1    34  22
03/20/98 03/21/98     1       1       1    25  48
06/20/98 06/21/98     1       1       1    11  45
07/30/98 07/31/98     1       1       1    25  33
09/07/98 .           15       0       2    36  28
07/17/98 07/18/98     1       1       1    22  62
09/15/98 09/17/98     2       1       2    20  47
07/07/98 07/08/98     1       1       1    33  27
10/01/98 10/02/98     1       1       2    25  33
09/11/98 09/12/98     1       1       1    41  31
Specifying the analysis using Life Tables . . .
The output of LIFE TABLES
SURVIVAL TABLE=DAYSGOAL BY GROUP(1 2)
  /INTERVAL=THRU 15 BY 1
  /STATUS=GOALIN14(1)
  /PRINT=TABLE
  /PLOTS ( SURVIVAL)=DAYSGOAL BY GROUP .
Survival Analysis
G:\MdbT\P595\P595AL07-Survival analysis\PEGPEGJData.sav
Survival Variable: DAYSGOAL
Life Table (First-order Controls: GROUP)
Column key:
Start = Interval Start Time
Enter = Number Entering Interval
Wdrw = Number Withdrawing during Interval
Expos = Number Exposed to Risk
Events = Number of Terminal Events
PTerm = Proportion Terminating
PSurv = Proportion Surviving
CumS = Cumulative Proportion Surviving at End of Interval
SE(CumS) = Std. Error of Cumulative Proportion Surviving at End of Interval
Dens = Probability Density
SE(Dens) = Std. Error of Probability Density
Haz = Hazard Rate
SE(Haz) = Std. Error of Hazard Rate

GROUP   Start   Enter Wdrw  Expos  Events PTerm PSurv CumS SE(CumS) Dens SE(Dens) Haz SE(Haz)
1        .000    46    0   46.000    0     .00  1.00  1.00   .00    .000   .000   .00   .00
1       1.000    46    0   46.000   28     .61   .39   .39   .07    .609   .072   .88   .15
1       2.000    18    0   18.000    6     .33   .67   .26   .06    .130   .050   .40   .16
1       3.000    12    0   12.000    3     .25   .75   .20   .06    .065   .036   .29   .16
1       4.000     9    0    9.000    3     .33   .67   .13   .05    .065   .036   .40   .23
1       5.000     6    0    6.000    1     .17   .83   .11   .05    .022   .022   .18   .18
1       6.000     5    0    5.000    1     .20   .80   .09   .04    .022   .022   .22   .22
1       7.000     4    0    4.000    0     .00  1.00   .09   .04    .000   .000   .00   .00
1       8.000     4    0    4.000    1     .25   .75   .07   .04    .022   .022   .29   .28
1       9.000     3    0    3.000    0     .00  1.00   .07   .04    .000   .000   .00   .00
1      10.000     3    0    3.000    0     .00  1.00   .07   .04    .000   .000   .00   .00
1      11.000     3    0    3.000    0     .00  1.00   .07   .04    .000   .000   .00   .00
1      12.000     3    0    3.000    0     .00  1.00   .07   .04    .000   .000   .00   .00
1      13.000     3    0    3.000    0     .00  1.00   .07   .04    .000   .000   .00   .00
1      14.000     3    0    3.000    0     .00  1.00   .07   .04    .000   .000   .00   .00
2        .000    43    0   43.000    0     .00  1.00  1.00   .00    .000   .000   .00   .00
2       1.000    43    0   43.000   11     .26   .74   .74   .07    .256   .067   .29   .09
2       2.000    32    0   32.000   15     .47   .53   .40   .07    .349   .073   .61   .15
2       3.000    17    0   17.000    2     .12   .88   .35   .07    .047   .032   .13   .09
2       4.000    15    0   15.000    0     .00  1.00   .35   .07    .000   .000   .00   .00
2       5.000    15    0   15.000    1     .07   .93   .33   .07    .023   .023   .07   .07
2       6.000    14    0   14.000    1     .07   .93   .30   .07    .023   .023   .07   .07
2       7.000    13    0   13.000    0     .00  1.00   .30   .07    .000   .000   .00   .00
2       8.000    13    0   13.000    0     .00  1.00   .30   .07    .000   .000   .00   .00
2       9.000    13    0   13.000    2     .15   .85   .26   .07    .047   .032   .17   .12
2      10.000    11    0   11.000    0     .00  1.00   .26   .07    .000   .000   .00   .00
2      11.000    11    0   11.000    1     .09   .91   .23   .06    .023   .023   .10   .10
2      12.000    10    0   10.000    0     .00  1.00   .23   .06    .000   .000   .00   .00
2      13.000    10    0   10.000    0     .00  1.00   .23   .06    .000   .000   .00   .00
2      14.000    10    0   10.000    1     .10   .90   .21   .06    .023   .023   .11   .11

Median Survival Time
First-order Controls   Med Time
GROUP 1                  1.82
GROUP 2                  2.70
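The cumulative proportions and the median survival times in the life table can be reproduced from the event counts alone. The following Python sketch illustrates the standard actuarial calculation under stated assumptions: the function names are hypothetical, withdrawals are counted as half-exposed (all withdrawal counts are 0 here), and the median is found by linear interpolation within the interval where the cumulative proportion surviving first drops to .5 or below. Only the event counts are copied from the table.

```python
def life_table(n_start, events, withdrawn=None, width=1):
    """Actuarial life table: one dict per interval."""
    rows = []
    entering = n_start
    cum_surv = 1.0
    for i, d in enumerate(events):
        w = withdrawn[i] if withdrawn else 0
        exposed = entering - w / 2.0           # withdrawals count as half-exposed
        p_term = d / exposed if exposed else 0.0
        cum_surv *= 1.0 - p_term
        rows.append({"start": i * width, "entering": entering,
                     "exposed": exposed, "events": d,
                     "p_term": p_term, "cum_surv": cum_surv})
        entering -= d + w
    return rows


def median_time(rows, width=1):
    """Linear interpolation inside the interval where cum_surv crosses .5."""
    prev = 1.0
    for r in rows:
        if r["cum_surv"] <= 0.5:
            return r["start"] + width * (prev - 0.5) / (prev - r["cum_surv"])
        prev = r["cum_surv"]
    return None  # the curve never reaches .5


# Number of Terminal Events per interval (start times 0..14), from the table.
EVENTS_GROUP1 = [0, 28, 6, 3, 3, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
EVENTS_GROUP2 = [0, 11, 15, 2, 0, 1, 1, 0, 0, 2, 0, 1, 0, 0, 1]

if __name__ == "__main__":
    for label, n, ev in [("GROUP 1", 46, EVENTS_GROUP1), ("GROUP 2", 43, EVENTS_GROUP2)]:
        rows = life_table(n, ev)
        print(label, [round(r["cum_surv"], 2) for r in rows],
              "median:", round(median_time(rows), 2))
```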
[Figure: survival functions from the life table. First-order Control: GROUP]
These data are strange because the “event” is something that is sought after - reaching a feeding goal, rather than something that is to be avoided - death or termination. So for these data, lower "survival" is preferred, since the "event" is not death, but reaching a nutrition goal. The sooner a patient reached the nutrition goal the better. Thus, the investigators hoped that patients in the PEGJ condition would reach those goals faster, leading to lower "survival" curves. In this case, survival should be called "Failure to reach feeding goal."
Since the outcome is a good event, the faster the curve falls to zero, the better.
So the group performing best is the group with the lowest curve.
Analysis of the same data using Kaplan-Meier
KM DAYSGOAL BY GROUP
  /STATUS=GOALIN14(1)
  /PRINT TABLE MEAN
  /PLOT SURVIVAL HAZARD
  /TEST LOGRANK BRESLOW TARONE
  /COMPARE OVERALL POOLED .
Kaplan-Meier
G:\MdbT\P595\P595AL07-Survival analysis\PEGPEGJData.sav
Case Processing Summary
                                       Censored
GROUP      Total N   N of Events     N     Percent
1             46          43          3      6.5%
2             43          34          9     20.9%
Overall       89          77         12     13.5%
Means and Medians for Survival Time
Mean (a)
                                     95% Confidence Interval
GROUP      Estimate   Std. Error   Lower Bound   Upper Bound
1            2.717       .527         1.685         3.750
2            5.488       .857         3.808         7.169
Overall      4.056       .517         3.043         5.069
Median
                                     95% Confidence Interval
GROUP      Estimate   Std. Error   Lower Bound   Upper Bound
1            1.000        .             .             .
2            2.000       .214         1.581         2.419
Overall      2.000       .211         1.587         2.413
a. Estimation is limited to the largest survival time if it is censored.
Overall Comparisons
                                  Chi-Square   df   Sig.
Log Rank (Mantel-Cox)                8.479      1   .004
Breslow (Generalized Wilcoxon)       9.588      1   .002
Tarone-Ware                          9.306      1   .002
Test of equality of survival distributions for the different levels of GROUP.
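The Log Rank chi-square in the Overall Comparisons table can be checked by hand. The Python sketch below illustrates the textbook two-group log-rank computation, not SPSS's code: at each event time it compares the events observed in Group 1 with the number expected if both groups shared one hazard. The function name is hypothetical, and the version shown assumes that no case is censored before the last event time, which holds for these data because every censored case is recorded at day 15. The event counts are the Number of Terminal Events from the life table.

```python
def logrank_chi2(events1, n1, events2, n2):
    """Two-group log-rank chi-square (1 df).

    events1, events2: dicts mapping event time -> number of events.
    n1, n2: group sizes. Cases without an event are assumed to be
    censored after the last event time.
    """
    observed = expected = variance = 0.0
    at_risk1, at_risk2 = n1, n2
    for t in sorted(set(events1) | set(events2)):
        d1 = events1.get(t, 0)
        d2 = events2.get(t, 0)
        d = d1 + d2
        n = at_risk1 + at_risk2
        observed += d1
        expected += d * at_risk1 / n
        if n > 1:
            # Hypergeometric variance of d1 given the margins at time t.
            variance += d * at_risk1 * at_risk2 * (n - d) / (n * n * (n - 1))
        at_risk1 -= d1
        at_risk2 -= d2
    return (observed - expected) ** 2 / variance


# Days to goal -> number of patients reaching the goal on that day.
GROUP1_EVENTS = {1: 28, 2: 6, 3: 3, 4: 3, 5: 1, 6: 1, 8: 1}          # PEGJ, n = 46
GROUP2_EVENTS = {1: 11, 2: 15, 3: 2, 5: 1, 6: 1, 9: 2, 11: 1, 14: 1}  # PEG,  n = 43

if __name__ == "__main__":
    print(round(logrank_chi2(GROUP1_EVENTS, 46, GROUP2_EVENTS, 43), 3))
```

The Breslow and Tarone-Ware tests use the same observed-minus-expected differences but weight the early event times more heavily.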
The same analysis using Cox Regression
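One requirement of the Cox Regression analysis is that the hazard functions be proportional. That means that for any two values of a covariate, the ratio of the hazards for those two values must be constant across time.
This rules out hazard functions that cross. It also rules out hazard functions that stay a fixed distance apart (parallel on the raw scale) while the baseline hazard changes, because the ratio would then change over time.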
Roughly speaking the hazard function should look like the following . . .
That is, the hazard functions diverge over time.
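The proportionality requirement can be written compactly. The notation below is the standard textbook form of the Cox model and is supplied here as an aid, not taken from the handout: h_0(t) is the baseline hazard, x is the covariate, and B is the coefficient reported in the SPSS output.

```latex
h(t \mid x) = h_0(t)\, e^{Bx}
\qquad\Longrightarrow\qquad
\frac{h(t \mid x_1)}{h(t \mid x_2)}
  = \frac{h_0(t)\, e^{Bx_1}}{h_0(t)\, e^{Bx_2}}
  = e^{B(x_1 - x_2)}
```

The baseline hazard cancels, so the ratio does not depend on t. For a 0/1 indicator, the ratio is e^B, which is the Exp(B) column of the output.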
COXREG DAYSGOAL
  /STATUS=GOALIN14(1)
  /PATTERN BY GROUP
  /CONTRAST (GROUP)=Indicator(1)
  /METHOD=ENTER GROUP
  /PLOT SURVIVAL HAZARD
  /CRITERIA=PIN(.05) POUT(.10) ITERATE(20) .
Cox Regression
G:\MdbT\P595\P595AL07-Survival analysis\PEGPEGJData.sav
Case Processing Summary
                                                                     N    Percent
Cases available in analysis   Event (a)                             77      86.5%
                              Censored                              12      13.5%
                              Total                                 89     100.0%
Cases dropped                 Cases with missing values              0        .0%
                              Cases with negative time               0        .0%
                              Censored cases before the
                              earliest event in a stratum            0        .0%
                              Total                                  0        .0%
Total                                                               89     100.0%
a. Dependent Variable: DAYSGOAL
Categorical Variable Codings (b)
                Frequency   (1)
GROUP (a)   1       46       0
            2       43       1
a. Indicator Parameter Coding
b. Category variable: GROUP
Block 0: Beginning Block
Omnibus Tests of Model Coefficients
-2 Log Likelihood: 618.281
Block 1: Method = Enter
Omnibus Tests of Model Coefficients (a, b)
-2 Log Likelihood: 612.895
                               Chi-square   df   Sig.
Overall (score)                   5.448      1   .020
Change From Previous Step         5.385      1   .020
Change From Previous Block        5.385      1   .020
a. Beginning Block Number 0, initial Log Likelihood function: -2 Log likelihood: 618.281
b. Beginning Block Number 1. Method = Enter
Variables in the Equation
            B      SE     Wald    df   Sig.   Exp(B)
GROUP    -.542   .235    5.332     1   .021    .582
Covariate Means and Pattern Values
                     Pattern
           Mean      1       2
GROUP      .483    .000    1.000
The above graph presents predicted proportions. They are analogous to plots of y-hats vs. predictors in a regression analysis.
When you perform a Cox-regression analysis, you may also have to run a Kaplan-Meier analysis just for the observed survival curves the K-M procedure produces.
[Figure: plot with labels “1”, “2”, “No Goal”, and “Goal”]