
Using Weibull accelerated failure time regression model to predict survival time and life expectancy

Enwu Liu 1,2*

1 Musculoskeletal Health and Ageing Research Program, Mary MacKillop Institute for Health Research, Australian Catholic University, Melbourne, Victoria, Australia

2 College of Medicine and Public Health, Flinders University, Adelaide, South Australia, Australia

* [email protected]

Abstract

Predicting mean time to failure (MTTF), mean time between failures (MTBF) and median survival time is common in engineering reliability research. In the medical literature, most prediction models predict the probability of an event during a certain period. In this paper we present detailed calculations for predicting different survival times with the Weibull accelerated failure time regression model and assess the accuracy of the point predictions. The method for constructing a confidence interval for the predicted survival time is also discussed.

Weibull distribution

The Weibull distribution is also called the type III extreme value distribution. [1] The distribution has three parameters: the location parameter $\mu$, the scale parameter $\rho$ and the shape parameter $\gamma$. The location parameter $\mu$ is the minimum value of the distribution. In survival analysis we usually set this minimum to 0, so the distribution reduces to a two-parameter distribution.

The cumulative distribution function (CDF) of a two-parameter Weibull distributed random variable $T \sim W(\rho, \gamma)$ is given as

$$
F_T(t; \rho, \gamma) = 1 - \exp\left[-\left(\frac{t}{\rho}\right)^{\gamma}\right] \tag{1}
$$

for $t \ge 0$, $\rho > 0$, $\gamma > 0$, and $F_T(t; \rho, \gamma) = 0$ for $t < 0$.

The probability density function (PDF) of the Weibull distribution is given as

$$
f_T(t; \rho, \gamma) = F_T'(t; \rho, \gamma) = \frac{\gamma}{\rho}\left(\frac{t}{\rho}\right)^{\gamma-1}\exp\left[-\left(\frac{t}{\rho}\right)^{\gamma}\right] \tag{2}
$$

The survival function of the Weibull distribution is given as

$$
S_T(t) = 1 - F_T(t) = \exp\left[-\left(\frac{t}{\rho}\right)^{\gamma}\right] \tag{3}
$$
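As a quick numerical check of Eq(1) to Eq(3), the following R sketch compares the formulas with base R's Weibull functions, where `shape` plays the role of γ and `scale` the role of ρ. The parameter values and time points are illustrative assumptions, not values from the paper.

```r
# Illustrative values (assumed, not from the paper)
rho <- 8      # scale parameter
gam <- 1.2    # shape parameter
t   <- c(0.5, 2, 5, 10)

# Eq (1), Eq (2) and Eq (3) written out directly
cdf_manual  <- 1 - exp(-(t / rho)^gam)
pdf_manual  <- (gam / rho) * (t / rho)^(gam - 1) * exp(-(t / rho)^gam)
surv_manual <- exp(-(t / rho)^gam)

# Base R parameterizes the Weibull by shape (gamma) and scale (rho)
check_cdf  <- all.equal(cdf_manual,  pweibull(t, shape = gam, scale = rho))
check_pdf  <- all.equal(pdf_manual,  dweibull(t, shape = gam, scale = rho))
check_surv <- all.equal(surv_manual,
                        pweibull(t, shape = gam, scale = rho, lower.tail = FALSE))

print(c(cdf = check_cdf, pdf = check_pdf, survival = check_surv))
```

Keeping this mapping in mind helps later, because software output for the AFT model is reported on a different scale (intercept and log scale) than ρ and γ.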

June 23, 2018 1/14

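The mean survival time, or mean time to failure (MTTF), is given as

$$
\begin{aligned}
E(T) &= \int_0^\infty S(t)\,dt \quad \text{(Darth Vader rule)} \\
&= \int_0^\infty \exp\left[-\left(\frac{t}{\rho}\right)^{\gamma}\right] dt \quad \text{let } \left(\frac{t}{\rho}\right)^{\gamma} = u \Rightarrow t = \rho u^{1/\gamma} \\
&= \int_0^\infty e^{-u}\,\rho\,\frac{1}{\gamma}\,u^{\frac{1}{\gamma}-1}\,du \\
&= \rho\,\frac{1}{\gamma}\,\Gamma\left(\frac{1}{\gamma}\right) \quad \text{note } \frac{1}{\gamma}\Gamma\left(\frac{1}{\gamma}\right) = \Gamma\left(\frac{1}{\gamma}+1\right) \\
&= \rho\,\Gamma\left(\frac{1}{\gamma}+1\right)
\end{aligned} \tag{4}
$$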
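The closed form in Eq(4) can be checked against direct numerical integration of the survival function. This is a minimal sketch with assumed parameter values (ρ = 8, γ = 1.2); the variable names are illustrative.

```r
# Illustrative values (assumed, not from the paper)
rho <- 8
gam <- 1.2

# Survival function of the Weibull distribution, Eq (3)
surv <- function(t) exp(-(t / rho)^gam)

# Mean survival time as the area under the survival curve
mttf_numeric <- integrate(surv, lower = 0, upper = Inf)$value

# Closed form from Eq (4)
mttf_closed <- rho * gamma(1 + 1 / gam)

print(c(numeric = mttf_numeric, closed_form = mttf_closed))
```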

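Log-Weibull distribution

The log-Weibull distribution is also called the Gumbel distribution, or type I extreme value distribution. [2] Let $T \sim W(\rho, \gamma)$. Then $Y = g(T) = \log T$ is a one-to-one transformation from the support $\mathcal{T} = \{t \mid t > 0\}$ to $\mathcal{Y} = \{y \mid -\infty < y < \infty\}$. The inverse transformation is

$$
T = g^{-1}(Y) = e^{Y}
$$

and the Jacobian is given as

$$
|J| = \left|\frac{d g^{-1}(Y)}{dY}\right| = e^{Y}
$$

The probability density function of $Y$ is then derived from Eq(2):

$$
\begin{aligned}
f_Y(y) &= f_T(g^{-1}(y))\,|J| = \frac{\gamma}{\rho}\left(\frac{e^{y}}{\rho}\right)^{\gamma-1}\exp\left[-\left(\frac{e^{y}}{\rho}\right)^{\gamma}\right] e^{y} \\
&= \frac{\gamma}{\rho}\,\frac{e^{y(\gamma-1)}}{\rho^{\gamma-1}}\,e^{y}\exp\left[-\frac{e^{\gamma y}}{\rho^{\gamma}}\right] \\
&= \gamma\,\frac{e^{\gamma y}}{\rho^{\gamma}}\exp\left[-\frac{e^{\gamma y}}{\rho^{\gamma}}\right] \quad \text{note } \rho^{\gamma} = e^{\gamma\log\rho} \\
&= \gamma\,\frac{e^{\gamma y}}{e^{\gamma\log\rho}}\exp\left[-\frac{e^{\gamma y}}{e^{\gamma\log\rho}}\right] \\
&= \gamma\,e^{\gamma y-\gamma\log\rho}\exp\left[-e^{\gamma y-\gamma\log\rho}\right] \\
&= \gamma\,e^{\gamma(y-\log\rho)}\exp\left[-e^{\gamma(y-\log\rho)}\right] \quad \text{let } \gamma = \frac{1}{b} \text{ and } \log\rho = a \\
&= \frac{1}{b}\exp\left(\frac{y-a}{b}\right)\exp\left[-\exp\left(\frac{y-a}{b}\right)\right], \quad -\infty < y < \infty
\end{aligned} \tag{5}
$$

This shows that the log-Weibull distribution is a Gumbel distribution $G(a, b)$, where $a = \log\rho$ and $b = 1/\gamma$. The cumulative distribution function of the log-Weibull distribution can be derived from the above PDF or directly from the definition:

$$
F_Y(y) = P(Y \le y) = P(\log T \le y) = P(T \le e^{y}) = F_T(e^{y})
$$

By Eq(1) we get

$$
\begin{aligned}
F_Y(y) = F_T(e^{y}) &= 1 - \exp\left[-\left(\frac{e^{y}}{\rho}\right)^{\gamma}\right] \\
&= 1 - \exp\left[-\frac{e^{\gamma y}}{\rho^{\gamma}}\right] \\
&= 1 - \exp\left[-\frac{e^{\gamma y}}{e^{\gamma\log\rho}}\right] \\
&= 1 - \exp\left[-e^{\gamma(y-\log\rho)}\right] \\
&= 1 - \exp\left[-\exp\left(\frac{y-a}{b}\right)\right] \quad \text{where } \gamma = \frac{1}{b},\ \log\rho = a
\end{aligned} \tag{6}
$$

The survival function of $Y = \log(T)$ is given by

$$
S_Y(y) = 1 - F_Y(y) = \exp\left[-\exp\left(\frac{y-a}{b}\right)\right] \tag{7}
$$

The hazard function is given by

$$
h_Y(y) = \frac{f_Y(y)}{S_Y(y)} = \frac{1}{b}\exp\left(\frac{y-a}{b}\right)
$$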
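The relationship between the Weibull and Gumbel forms can be verified numerically. The sketch below, using assumed values ρ = 8 and γ = 1.2, checks that the Gumbel CDF of Eq(6) evaluated at y equals the Weibull CDF evaluated at exp(y), and that the ratio of Eq(5) to Eq(7) reproduces the hazard function.

```r
# Illustrative values (assumed, not from the paper)
rho <- 8
gam <- 1.2
a <- log(rho)   # Gumbel location
b <- 1 / gam    # Gumbel scale
y <- seq(-2, 4, by = 0.5)

# Gumbel (log-Weibull) PDF, CDF and survival function: Eq (5), (6), (7)
w        <- (y - a) / b
f_gumbel <- (1 / b) * exp(w) * exp(-exp(w))
F_gumbel <- 1 - exp(-exp(w))
S_gumbel <- exp(-exp(w))

# The CDF of log(T) at y must equal the Weibull CDF of T at exp(y)
same_cdf <- all.equal(F_gumbel, pweibull(exp(y), shape = gam, scale = rho))

# Hazard as the ratio f / S, compared with its closed form
same_hazard <- all.equal(f_gumbel / S_gumbel, (1 / b) * exp(w))

print(c(cdf = same_cdf, hazard = same_hazard))
```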

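Weibull AFT regression model

Let $T$ be the survival time. Suppose we have a random sample of size $n$ from a target population. For subject $i$ ($i = 1, 2, ..., n$), we have observed values of the covariates $x_{i1}, x_{i2}, ..., x_{ip}$ and a possibly censored survival time $t_i$. We can write the Weibull accelerated failure time (AFT) model as

$$
\log(t_i) = \beta_0 + \beta_1 x_{i1} + ... + \beta_p x_{ip} + \sigma\varepsilon_i = x_i'\beta + \sigma\varepsilon_i, \quad i = 1, 2, ..., n \tag{8}
$$

where $\beta = (\beta_0, ..., \beta_p)$ are the regression coefficients of interest, $\sigma$ is a scale parameter and $\varepsilon_1, ..., \varepsilon_n$ are independent and identically distributed according to a Gumbel distribution with the PDF

$$
f_\varepsilon(x) = \exp(x)\exp\left[-\exp(x)\right] \tag{9}
$$

and the CDF

$$
F_\varepsilon(x) = 1 - \exp\left[-\exp(x)\right] \tag{10}
$$

Note that this corresponds to $a = 0$, $b = 1$ in Eq(5) and Eq(6). We denote it as the $G(0, 1)$ distribution, or standard Gumbel distribution.

Now, let us find the PDF of $T$ from Eq(8).

$$
\log(T) = x'\beta + \sigma\varepsilon \Rightarrow T = e^{x'\beta + \sigma\varepsilon}
$$

$$
\Rightarrow \varepsilon = g^{-1}(T) = \frac{\log(T) - x'\beta}{\sigma} \tag{11}
$$

$$
\Rightarrow |J| = \left|\frac{d(g^{-1}(T))}{dT}\right| = \frac{1}{\sigma T} \tag{12}
$$

Substituting Eq(11) and Eq(12) into Eq(9) we get

$$
\begin{aligned}
f_T(t) &= f_\varepsilon(g^{-1}(t))\,|J| = \exp\left(\frac{\log(t) - x'\beta}{\sigma}\right)\exp\left[-\exp\left(\frac{\log(t) - x'\beta}{\sigma}\right)\right]\frac{1}{\sigma t} \\
&= \left(\frac{t}{\exp(x'\beta)}\right)^{\frac{1}{\sigma}}\exp\left[-\left(\frac{t}{\exp(x'\beta)}\right)^{\frac{1}{\sigma}}\right]\frac{1}{\sigma t} \\
&= \frac{1/\sigma}{\exp(x'\beta)}\left(\frac{t}{\exp(x'\beta)}\right)^{\frac{1}{\sigma}-1}\exp\left[-\left(\frac{t}{\exp(x'\beta)}\right)^{\frac{1}{\sigma}}\right]
\end{aligned} \tag{13}
$$

Comparing Eq(13) with Eq(2) and letting $\gamma = 1/\sigma$ and $\rho = \exp(x'\beta)$, we can see that $T$ has a Weibull distribution, $T \sim W(\exp(x'\beta), 1/\sigma)$.

Referring to Eq(3), the survival function of $T \sim W(\exp(x'\beta), 1/\sigma)$ can be written as

$$
S_T(t) = \exp\left[-\left(\frac{t}{\exp(x'\beta)}\right)^{\frac{1}{\sigma}}\right] \tag{14}
$$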
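The change of variables behind Eq(13) and Eq(14) can be confirmed numerically. In this sketch the linear predictor x'β and the scale σ are assumed illustrative values; the density obtained from the standard Gumbel error term is compared with base R's Weibull density using shape 1/σ and scale exp(x'β).

```r
# Illustrative values (assumed, not from the paper)
lp    <- 2      # linear predictor x'beta
sigma <- 0.9    # AFT scale parameter
t     <- c(0.5, 2, 5, 10, 20)

# Standard Gumbel density of the error term, Eq (9)
f_eps <- function(x) exp(x) * exp(-exp(x))

# Density of T by change of variables (first line of Eq (13))
f_T <- f_eps((log(t) - lp) / sigma) / (sigma * t)

# Weibull density with rho = exp(x'beta) and gamma = 1 / sigma
same_density <- all.equal(f_T, dweibull(t, shape = 1 / sigma, scale = exp(lp)))

# Survival function, Eq (14)
S_T <- exp(-(t / exp(lp))^(1 / sigma))
same_survival <- all.equal(
  S_T, pweibull(t, shape = 1 / sigma, scale = exp(lp), lower.tail = FALSE))

print(c(density = same_density, survival = same_survival))
```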


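Referring to Eq(4), the expected survival time of $W(\exp(x'\beta), 1/\sigma)$ is given as

$$
E(T) = \exp(x'\beta)\,\Gamma(\sigma + 1) \tag{15}
$$

Since most statistical software fits the model on the $\log(T)$ scale, we now derive the distribution and characteristics of $\log(T)$. Let

$$
Y = \log(T) = x'\beta + \sigma\varepsilon
$$

$$
\Rightarrow \varepsilon = g^{-1}(Y) = \frac{Y - x'\beta}{\sigma} \tag{16}
$$

$$
\Rightarrow |J| = \left|\frac{d(g^{-1}(Y))}{dY}\right| = \frac{1}{\sigma} \tag{17}
$$

Substituting Eq(16) and Eq(17) into Eq(9) we get

$$
f_Y(y) = f_\varepsilon(g^{-1}(y))\,|J| = \frac{1}{\sigma}\exp\left(\frac{y - x'\beta}{\sigma}\right)\exp\left[-\exp\left(\frac{y - x'\beta}{\sigma}\right)\right] \tag{18}
$$

Comparing Eq(18) with Eq(5), we can see that $Y$, i.e. $\log(T)$, has a $G(x'\beta, \sigma)$ distribution. The error term $\varepsilon$, which has a $G(0, 1)$ distribution, plays a role similar to the error term in simple linear regression, which has a $N(0, \sigma^2)$ distribution.

From Eq(13) and Eq(18), we can see that in the Weibull AFT model $T$ has a Weibull $W(\exp(x'\beta), 1/\sigma)$ distribution and $\log(T)$ has a Gumbel $G(x'\beta, \sigma)$ distribution.

From Eq(7), the survival function of $Y$, i.e. $\log(T)$, is given as

$$
S_Y(y) = \exp\left[-\exp\left(\frac{y - x'\beta}{\sigma}\right)\right] \tag{19}
$$

The expectation of $Y$, i.e. $\log(T)$, is calculated as

$$
\begin{aligned}
E(Y) = E(\log(T)) &= \int_{-\infty}^{\infty} y\,f_Y(y)\,dy \\
&= \int_{-\infty}^{\infty} y\,\frac{1}{\sigma}\exp\left(\frac{y - x'\beta}{\sigma}\right)\exp\left[-\exp\left(\frac{y - x'\beta}{\sigma}\right)\right] dy \quad \text{let } z = \exp\left(\frac{y - x'\beta}{\sigma}\right) \\
&= \frac{1}{\sigma}\int_0^{\infty}(\sigma\log z + x'\beta)\,z\exp(-z)\,d(\sigma\log z + x'\beta) \quad (y = \sigma\log z + x'\beta) \\
&= \int_0^{\infty}(\sigma\log z + x'\beta)\exp(-z)\,dz = \sigma\int_0^{\infty}\log z\,\exp(-z)\,dz + x'\beta\int_0^{\infty}\exp(-z)\,dz \\
&= \sigma\int_0^{\infty}\frac{\partial}{\partial\alpha}\left[z^{\alpha}e^{-z}\right]_{\alpha=0} dz + x'\beta \\
&= \sigma\,\frac{\partial}{\partial\alpha}\left[\int_0^{\infty} z^{\alpha}e^{-z}\,dz\right]_{\alpha=0} + x'\beta \quad \text{(differentiating under the integral, note } \frac{dz^{\alpha}}{d\alpha} = \log(z)\,z^{\alpha}\text{)} \\
&= \sigma\,\Gamma'(1) + x'\beta \\
&= x'\beta - \sigma\xi
\end{aligned}
$$

where $\Gamma'(1) = -\xi \approx -0.57721$ and $\xi$ is the Euler-Mascheroni constant.

Note that by Jensen's inequality $E(\log(T)) \le \log(E(T))$, since $\log(x)$ is a concave function. We therefore should not use $\exp(x'\beta - \sigma\xi)$ to calculate the expected survival time. Eq(15) is the correct formula for the expected survival time.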
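The size of the gap implied by Jensen's inequality can be illustrated numerically. The sketch below uses assumed values for x'β and σ, checks the closed form E(log T) = x'β − σξ by numerical integration of the density in Eq(18) over a finite range that holds essentially all of the probability mass, and then compares exp(E(log T)) with the mean from Eq(15).

```r
# Illustrative values (assumed, not from the paper)
lp    <- 2      # linear predictor x'beta
sigma <- 0.9
euler <- -digamma(1)   # Euler-Mascheroni constant, about 0.57721

# Density of Y = log(T), Eq (18)
f_Y <- function(y) {
  w <- (y - lp) / sigma
  exp(w - exp(w)) / sigma
}

# E(log T) by numerical integration; the mass outside this range is negligible
mean_logT_numeric <- integrate(function(y) y * f_Y(y),
                               lower = lp - 40 * sigma,
                               upper = lp + 6 * sigma)$value
mean_logT_closed <- lp - sigma * euler

# Back-transforming E(log T) underestimates the mean survival time
naive_mean <- exp(mean_logT_closed)
mean_T     <- exp(lp) * gamma(1 + sigma)   # Eq (15)

print(c(E_logT_numeric = mean_logT_numeric, E_logT_closed = mean_logT_closed,
        exp_E_logT = naive_mean, E_T = mean_T))
```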


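Parameter estimation for the Weibull AFT model

The parameters of the Weibull AFT model can be estimated by the maximum likelihood method. The likelihood function of the $n$ observed log times $y_1, y_2, ..., y_n$, where $y_i = \log(t_i)$, is given by

$$
\begin{aligned}
L(\beta, \sigma; y) &= \prod_{i=1}^{n}\left[f_Y(y_i)\right]^{\delta_i}\left[S_Y(y_i)\right]^{1-\delta_i} \\
&= \prod_{i=1}^{n}\left\{\frac{1}{\sigma}\exp\left(\frac{y_i - x_i'\beta}{\sigma}\right)\exp\left[-\exp\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right]\right\}^{\delta_i}\left\{\exp\left[-\exp\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right]\right\}^{1-\delta_i}
\end{aligned} \tag{20}
$$

where $\delta_i$ is the event indicator for the $i$th subject, with $\delta_i = 1$ if an event has occurred and $\delta_i = 0$ if the event has not occurred. The maximum likelihood method estimates the $p + 2$ parameters $\sigma, \beta_0, \beta_1, ..., \beta_p$. We can take the log of the likelihood function and use the Newton-Raphson method to calculate these parameters. Most statistical software can do the calculations.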
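To make the likelihood in Eq(20) concrete, the following sketch writes its negative logarithm as an R function and maximizes it on simulated data. Everything here is an assumption for illustration: the data are simulated, the true parameter values and censoring mechanism are made up, and the general-purpose optimizer `optim` (BFGS) is used in place of the Newton-Raphson iterations that dedicated software applies.

```r
set.seed(1)

# Simulated data under assumed true values (not from the paper)
n          <- 2000
x          <- rnorm(n)
beta_true  <- c(2, -0.5)
sigma_true <- 0.8
eps        <- log(rexp(n))                      # standard Gumbel errors, Eq (10)
y_event    <- beta_true[1] + beta_true[2] * x + sigma_true * eps
y_cens     <- log(runif(n, min = 1, max = 30))  # assumed censoring times
y          <- pmin(y_event, y_cens)             # observed log time
delta      <- as.numeric(y_event <= y_cens)     # 1 = event, 0 = censored

# Negative log of Eq (20); sigma is optimized on the log scale to keep it positive
negloglik <- function(par) {
  sigma <- exp(par[3])
  w <- (y - par[1] - par[2] * x) / sigma
  # log f_Y = -log(sigma) + w - exp(w);  log S_Y = -exp(w)
  -sum(delta * (w - log(sigma)) - exp(w))
}

# Crude starting values from least squares on the observed log times
start <- c(unname(coef(lm(y ~ x))), 0)
fit   <- optim(start, negloglik, method = "BFGS")

est <- c(beta0 = fit$par[1], beta1 = fit$par[2], sigma = exp(fit$par[3]))
print(round(est, 3))
```

With real data, `survreg` from the survival package fits the same model and reports the scale as `Log(scale)`, which matches the log-scale parameterization used in this sketch.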

Predict mean survival time by the Weibull AFT model

In reliability research the mean survival time is called mean time to failure (MTTF) or mean time between failures (MTBF). [3]

Suppose we want to predict person $i$'s mean survival time $t_i$ by the Weibull AFT model. First we use the MLE method of Eq(20) to calculate $\hat\beta$ and $\hat\sigma$. Then, by the invariance property of the MLE, we use Eq(15) to calculate the predicted MTTF directly:

$$
\hat t_i = \hat E(t_i) = \exp(x_i'\hat\beta)\,\Gamma(\hat\sigma + 1) \tag{21}
$$

After we calculate the mean time to failure (MTTF), we can use the Delta method to calculate a confidence interval for the MTTF. We treat the predicted MTTF as a function of $\hat\beta$ and $\hat\sigma$. The standard error of the MTTF can be calculated as

$$
SE = \left[\begin{pmatrix}\dfrac{\partial\hat E(t_i)}{\partial\hat\beta} \\ \dfrac{\partial\hat E(t_i)}{\partial\hat\sigma}\end{pmatrix}^{t}\Sigma_{\hat\sigma\hat\beta}\begin{pmatrix}\dfrac{\partial\hat E(t_i)}{\partial\hat\beta} \\ \dfrac{\partial\hat E(t_i)}{\partial\hat\sigma}\end{pmatrix}\right]^{\frac{1}{2}} \tag{22}
$$

where $\Sigma_{\hat\sigma\hat\beta}$ is the variance-covariance matrix of $\hat\beta$ and $\hat\sigma$. It can be estimated from the observed Fisher information of the Weibull AFT model. The $100(1-\alpha)\%$ confidence interval is given as

$$
\hat t_i - z_{1-\frac{\alpha}{2}}\,SE < t_i < \hat t_i + z_{1-\frac{\alpha}{2}}\,SE \tag{23}
$$

where $\alpha$ is the type I error rate and $z_{1-\alpha/2}$ is the corresponding quantile of the standard normal distribution.
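The Delta method calculation in Eq(22) and Eq(23) can be sketched for the simplest case of an intercept-only model. The estimates and the 2 x 2 variance-covariance matrix below are hypothetical values chosen for illustration; the gradient uses the identity dΓ(1+σ)/dσ = Γ(1+σ)ψ(1+σ), where ψ is the digamma function.

```r
# Hypothetical estimates for an intercept-only model (not from the paper)
b0    <- 2      # estimated intercept
sigma <- 0.9    # estimated scale
Sigma <- matrix(c(0.040, 0.002,
                  0.002, 0.010), nrow = 2, byrow = TRUE)  # assumed Var-Cov of (b0, sigma)

# Predicted MTTF as a function of the parameters, Eq (21)
mttf <- function(par) exp(par[1]) * gamma(1 + par[2])
est  <- mttf(c(b0, sigma))

# Analytic gradient with respect to (b0, sigma)
grad <- c(est, est * digamma(1 + sigma))

# Standard error, Eq (22), and 95% confidence interval, Eq (23)
se <- sqrt(drop(t(grad) %*% Sigma %*% grad))
ci <- est + c(-1, 1) * qnorm(0.975) * se

print(c(MTTF = est, SE = se, lower = ci[1], upper = ci[2]))
```

For a model with covariates, the gradient with respect to each coefficient is the same MTTF multiplied by the corresponding covariate value.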

Predict median survival time by the Weibull AFT model

Another important statistic in survival analysis is the median survival time or, more generally, a percentile of the survival time. The $p$th percentile of survival time is calculated from the survival function. For individual $i$, the $p$th percentile of survival time $t_i(p)$ satisfies

$$
S_T(t_i(p)) = \frac{100 - p}{100}
$$

For the Weibull AFT model we use Eq(14) to calculate the $p$th percentile of survival time of individual $i$.

$$
\begin{aligned}
S_T(t_i) &= \exp\left[-\left(\frac{t_i}{\exp(x_i'\beta)}\right)^{\frac{1}{\sigma}}\right] = \frac{100 - p}{100} \\
\Rightarrow \left(\frac{t_i}{\exp(x_i'\beta)}\right)^{\frac{1}{\sigma}} &= -\log\left(\frac{100 - p}{100}\right) \\
\Rightarrow t_i &= \left[-\log\left(\frac{100 - p}{100}\right)\right]^{\sigma}\exp(x_i'\beta)
\end{aligned}
$$

After we obtain $\hat\beta$ and $\hat\sigma$ from Eq(20), by the invariance property of the MLE the median survival time is estimated by

$$
\hat t_i(50) = (\log 2)^{\hat\sigma}\exp(x_i'\hat\beta) \tag{24}
$$

For a fixed $p$ we can again treat the estimated percentile as a function of $\hat\sigma$ and $\hat\beta$ and use the Delta method to calculate the standard error of the predicted $p$th percentile of survival time. The method is the same as in Eq(22) and Eq(23).
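The percentile formula can be wrapped in a small function and cross-checked against base R's Weibull quantile function. The function name and the parameter values are illustrative assumptions.

```r
# p-th percentile of survival time under the Weibull AFT model
# (hypothetical helper; lp is the linear predictor x'beta)
aft_percentile <- function(p, lp, sigma) {
  (-log((100 - p) / 100))^sigma * exp(lp)
}

# Illustrative values (assumed, not from the paper)
lp    <- 2
sigma <- 0.9
p     <- c(10, 25, 50, 75, 90)

t_p <- aft_percentile(p, lp, sigma)

# Same quantities from the Weibull quantile function with
# shape = 1 / sigma and scale = exp(x'beta)
same_quantiles <- all.equal(t_p, qweibull(p / 100, shape = 1 / sigma, scale = exp(lp)))

# The median is the special case p = 50, Eq (24)
same_median <- all.equal(aft_percentile(50, lp, sigma), log(2)^sigma * exp(lp))

print(c(quantiles = same_quantiles, median = same_median))
```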

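Minimum prediction error survival time (MPET)

Both mean and median survival time estimates are biased when the sample is small and the data include censoring. [3] Henderson et al proposed a method to find the optimum prediction time with the minimum prediction error. [4] They define a prediction as accurate if the observed survival time $t$ falls in the interval $p/k < t < kp$, where $p$ is the predicted survival time and $k > 1$. The probability of prediction error $E_k$ conditional on the predicted time $p$ is given as

$$
\begin{aligned}
P(E_k \mid p) &= P(T < p/k) + P(T > kp) \quad \text{(observed time falls outside the bounds)} \\
&= 1 - S_T(p/k) + S_T(kp)
\end{aligned}
$$

Setting the derivative of $P(E_k \mid p)$ with respect to $p$, which is $f_T(p/k)/k - k f_T(kp)$, to zero shows that when

$$
f_T(p/k) = k^2 f_T(kp) \tag{25}
$$

the probability of prediction error $P(E_k \mid p)$ achieves its minimum value.

Now let us calculate the minimum prediction error survival time for the Weibull AFT model. From Eq(13) we get

$$
\begin{aligned}
f_T(p/k) &= \frac{1/\sigma}{\exp(x'\beta)}\left(\frac{p/k}{\exp(x'\beta)}\right)^{\frac{1}{\sigma}-1}\exp\left[-\left(\frac{p/k}{\exp(x'\beta)}\right)^{\frac{1}{\sigma}}\right] \\
k^2 f_T(kp) &= k^2\,\frac{1/\sigma}{\exp(x'\beta)}\left(\frac{kp}{\exp(x'\beta)}\right)^{\frac{1}{\sigma}-1}\exp\left[-\left(\frac{kp}{\exp(x'\beta)}\right)^{\frac{1}{\sigma}}\right]
\end{aligned}
$$

From Eq(25), after cancelling the common factors, we get

$$
k^{1-\frac{1}{\sigma}}\exp\left[-\left(\frac{p/k}{\exp(x'\beta)}\right)^{\frac{1}{\sigma}}\right] = k^{1+\frac{1}{\sigma}}\exp\left[-\left(\frac{kp}{\exp(x'\beta)}\right)^{\frac{1}{\sigma}}\right]
$$

Taking the log of both sides we get

$$
\left(1 - \frac{1}{\sigma}\right)\log(k) - \left(\frac{p/k}{\exp(x'\beta)}\right)^{\frac{1}{\sigma}} = \left(1 + \frac{1}{\sigma}\right)\log(k) - \left(\frac{kp}{\exp(x'\beta)}\right)^{\frac{1}{\sigma}}
$$

Rearranging these terms we get

$$
p = \left[\frac{\frac{2}{\sigma}\log(k)}{k^{\frac{1}{\sigma}} - k^{-\frac{1}{\sigma}}}\right]^{\sigma}\exp(x'\beta) \tag{26}
$$

Here $p$ is the minimum prediction error survival time. We may use the Delta method to get the standard error of the minimum prediction error survival time. Bootstrap methods could also be used to get a confidence interval.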
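The closed form in Eq(26) can be checked by minimizing the prediction error probability numerically. In this sketch the parameter values, the search interval and the function names are assumptions for illustration; `optimize` performs a one-dimensional search for the p that minimizes 1 − S(p/k) + S(kp).

```r
# Closed-form MPET, Eq (26) (hypothetical helper; lp is x'beta)
mpet <- function(lp, sigma, k) {
  ((2 / sigma) * log(k) / (k^(1 / sigma) - k^(-1 / sigma)))^sigma * exp(lp)
}

# Probability that the observed time falls outside (p / k, k * p)
pred_error <- function(p, lp, sigma, k) {
  S <- function(t) exp(-(t / exp(lp))^(1 / sigma))   # Eq (14)
  1 - S(p / k) + S(k * p)
}

# Illustrative values (assumed, not from the paper)
lp    <- 2
sigma <- 0.9
k     <- 2

p_closed  <- mpet(lp, sigma, k)
p_numeric <- optimize(pred_error, interval = c(0.01, 50),
                      lp = lp, sigma = sigma, k = k)$minimum

print(c(closed_form = p_closed, numeric = p_numeric))
```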

An example to predict the survival time

We use a published larynx cancer dataset [5] to demonstrate the calculation of the predictions. The data set includes 90 male larynx cancer patients with five variables: stage of disease (1=stage 1, 2=stage 2, 3=stage 3, 4=stage 4), time to death or on-study time in months, age at diagnosis of larynx cancer, year of diagnosis of larynx cancer and death indicator (0=alive, 1=dead). The author added a variable ID to the dataset and changed the variable name delta to death. The data set can be downloaded from here.

The larynx cancer data have the following structure.

ID  stage  time  age  diagyr  death
1   1      0.6   77   76      1
2   1      1.3   53   71      1
3   1      2.4   45   71      1
4   1      2.5   57   78      0
5   1      3.2   58   74      1
... ... ... ...
46  2      6.2   74   72      1
47  2      7.0   62   73      1
48  2      7.5   50   73      0
49  2      7.6   53   73      0
50  2      9.3   61   71      0
51  3      0.3   49   72      1
52  3      0.3   71   76      1
... ... ... ...
89  4      3.8   84   74      1
90  4      4.3   48   76      0

We will use two predictors to predict the survival time: the stage of disease and the age at diagnosis of larynx cancer. Since stage is a categorical variable, we create three dummy variables for the stage and use stage 1 as the reference group. The Weibull AFT model can be written as

$$
\log(T) = \beta_0 + \beta_1 \cdot stage2 + \beta_2 \cdot stage3 + \beta_3 \cdot stage4 + \beta_4 \cdot age + \sigma\varepsilon, \quad \varepsilon \sim G(0, 1)
$$

Most statistical software can fit the Weibull regression model; we use R here.

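library(survival)
# Read the larynx cancer data (the path is specific to the author's machine)
larynx <- read.csv("D:/larynx.csv")
# Fit the Weibull AFT model with stage (as a factor) and age as predictors
wr <- survreg(Surv(time, death) ~ factor(stage) + age,
              data = larynx, dist = "w")
summary(wr)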

We got the following results

Call:
survreg(formula = Surv(time, death) ~ factor(stage) + age, data = larynx,
    dist = "w")
                 Value Std. Error      z        p
(Intercept)     3.5288     0.9041  3.903 9.50e-05
factor(stage)2 -0.1477     0.4076 -0.362 7.17e-01
factor(stage)3 -0.5866     0.3199 -1.833 6.68e-02
factor(stage)4 -1.5441     0.3633 -4.251 2.13e-05
age            -0.0175     0.0128 -1.367 1.72e-01
Log(scale)     -0.1223     0.1225 -0.999 3.18e-01

Scale= 0.885

Weibull distribution
Loglik(model)= -141.4   Loglik(intercept only)= -151.1
        Chisq= 19.37 on 4 degrees of freedom, p= 0.00066
Number of Newton-Raphson Iterations: 5
n= 90

Suppose we want to predict the survival time of patient ID=46, who had stage 2 larynx cancer and was aged 74. We use Eq(21) to calculate his expected survival time (mean, or MTTF)

$$
\begin{aligned}
MTTF_{46} = \hat E(t_{46}) &= \exp(x_i'\hat\beta)\,\Gamma(\hat\sigma + 1) \\
&= \exp(3.5288 - 0.1477 \times 1 - 0.5866 \times 0 - 1.5441 \times 0 - 0.0175 \times 74) \times \Gamma(1.885) \\
&= 7.7 \text{ (months)}
\end{aligned}
$$

We use Eq(24) to predict his median survival time

$$
\begin{aligned}
Median_{46} &= (\log 2)^{\hat\sigma}\exp(x_i'\hat\beta) \\
&= (\log 2)^{0.885}\exp(3.5288 - 0.1477 \times 1 - 0.5866 \times 0 - 1.5441 \times 0 - 0.0175 \times 74) \\
&= 5.8 \text{ (months)}
\end{aligned}
$$

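Now let us calculate the standard error of the predicted median time. By Eq(22)

$$
\begin{aligned}
SE &= \left[\begin{pmatrix}\dfrac{\partial\hat E(t_i)}{\partial\hat\beta} \\ \dfrac{\partial\hat E(t_i)}{\partial\hat\sigma}\end{pmatrix}^{t}\Sigma_{\hat\sigma\hat\beta}\begin{pmatrix}\dfrac{\partial\hat E(t_i)}{\partial\hat\beta} \\ \dfrac{\partial\hat E(t_i)}{\partial\hat\sigma}\end{pmatrix}\right]^{\frac{1}{2}} \\
&= \left[\begin{pmatrix}\dfrac{\partial(\log 2)^{\hat\sigma}\exp(x_i'\hat\beta)}{\partial\hat\beta} \\ \dfrac{\partial(\log 2)^{\hat\sigma}\exp(x_i'\hat\beta)}{\partial\hat\sigma}\end{pmatrix}^{t}\Sigma_{\hat\sigma\hat\beta}\begin{pmatrix}\dfrac{\partial(\log 2)^{\hat\sigma}\exp(x_i'\hat\beta)}{\partial\hat\beta} \\ \dfrac{\partial(\log 2)^{\hat\sigma}\exp(x_i'\hat\beta)}{\partial\hat\sigma}\end{pmatrix}\right]^{\frac{1}{2}} \\
&= \left[\begin{pmatrix}(\log 2)^{\hat\sigma}\exp(x'\hat\beta) \\ (\log 2)^{\hat\sigma}\exp(x'\hat\beta) \cdot stage2 \\ (\log 2)^{\hat\sigma}\exp(x'\hat\beta) \cdot stage3 \\ (\log 2)^{\hat\sigma}\exp(x'\hat\beta) \cdot stage4 \\ (\log 2)^{\hat\sigma}\exp(x'\hat\beta) \cdot age \\ (\log 2)^{\hat\sigma}\log(\log 2)\exp(x'\hat\beta)\end{pmatrix}^{t}\Sigma_{\hat\sigma\hat\beta}\begin{pmatrix}(\log 2)^{\hat\sigma}\exp(x'\hat\beta) \\ (\log 2)^{\hat\sigma}\exp(x'\hat\beta) \cdot stage2 \\ (\log 2)^{\hat\sigma}\exp(x'\hat\beta) \cdot stage3 \\ (\log 2)^{\hat\sigma}\exp(x'\hat\beta) \cdot stage4 \\ (\log 2)^{\hat\sigma}\exp(x'\hat\beta) \cdot age \\ (\log 2)^{\hat\sigma}\log(\log 2)\exp(x'\hat\beta)\end{pmatrix}\right]^{\frac{1}{2}}
\end{aligned} \tag{27}
$$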

First, we need to find the variance-covariance matrix $\Sigma_{\hat\sigma\hat\beta}$. It can be calculated from the observed Fisher information of the Weibull AFT model. Most software can calculate this variance-covariance matrix; we use R here.

wr$var

We get

            (Intercept)   stage2   stage3  stage4      age Log(scale)
(Intercept)       0.817 -0.09049 -0.08479 -0.0444 -0.01114    0.02591
stage2           -0.090  0.16611  0.05319  0.0507  0.00057    0.00016
stage3           -0.085  0.05319  0.10237  0.0567  0.00042   -0.00731
stage4           -0.044  0.05068  0.05668  0.1320 -0.00020   -0.01070
age              -0.011  0.00057  0.00042 -0.0002  0.00016   -0.00026
Log(scale)        0.026  0.00016 -0.00731 -0.0107 -0.00026    0.01501

Note that what R returns is the variance-covariance matrix of the coefficients and log(scale), so we need to convert log(scale) to scale with some extra calculations. Using the formulas on page 401 of the book by Klein and Moeschberger [5] and setting $g_1(\sigma, \beta) = \beta$ and $g_2(\sigma, \beta) = \sigma = e^{\log(\sigma)}$, i.e. $\sigma = e^{\theta}$ is a function of $\theta = \log(\sigma)$, we get

$$
\begin{aligned}
Cov(\beta_0, \sigma) &= Cov(\beta_0, e^{\log(\sigma)}) = Cov(\beta_0, \log(\sigma)) \times \sigma = 0.02292735 \\
Cov(\beta_1, \sigma) &= Cov(\beta_1, e^{\log(\sigma)}) = Cov(\beta_1, \log(\sigma)) \times \sigma = 0.0001403178 \\
Cov(\beta_2, \sigma) &= Cov(\beta_2, e^{\log(\sigma)}) = Cov(\beta_2, \log(\sigma)) \times \sigma = -0.006469443 \\
Cov(\beta_3, \sigma) &= Cov(\beta_3, e^{\log(\sigma)}) = Cov(\beta_3, \log(\sigma)) \times \sigma = -0.009470604 \\
Cov(\beta_4, \sigma) &= Cov(\beta_4, e^{\log(\sigma)}) = Cov(\beta_4, \log(\sigma)) \times \sigma = -0.0002297781 \\
Var(\sigma) &= Var(e^{\log(\sigma)}) = (e^{\log(\sigma)})^2\,Var(\log(\sigma)) = \sigma^2\,Var(\log(\sigma)) = 0.0117501
\end{aligned}
$$

Now we use these six values to replace the last row and column of the variance-covariance matrix of the coefficients and log(scale). We get

       [,1]     [,2]     [,3]    [,4]     [,5]     [,6]
[1,]  0.817 -0.09049 -0.08479 -0.0444 -0.01114  0.02293
[2,] -0.090  0.16611  0.05319  0.0507  0.00057  0.00014
[3,] -0.085  0.05319  0.10237  0.0567  0.00042 -0.00647
[4,] -0.044  0.05068  0.05668  0.1320 -0.00020 -0.00947
[5,] -0.011  0.00057  0.00042 -0.0002  0.00016 -0.00023
[6,]  0.023  0.00014 -0.00647 -0.0095 -0.00023  0.01175

This is the $\Sigma_{\hat\sigma\hat\beta}$ matrix we need.
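The conversion from log(scale) to scale is a first-order Delta method step and is easy to reproduce. The sketch below applies it to the last column of the printed `wr$var` output, using the rounded values shown in the text, so the results agree with the reported ones only to about four or five decimal places.

```r
# Rounded values copied from the printed output above
sigma <- 0.885
cov_beta_logscale <- c(intercept = 0.02591, stage2 = 0.00016, stage3 = -0.00731,
                       stage4 = -0.01070, age = -0.00026)
var_logscale <- 0.01501

# sigma = exp(log(sigma)), so d sigma / d log(sigma) = sigma
cov_beta_scale <- cov_beta_logscale * sigma     # Cov(beta_j, sigma)
var_scale      <- sigma^2 * var_logscale        # Var(sigma)

print(round(c(cov_beta_scale, var_sigma = var_scale), 5))
```

The same result follows from the matrix form J Σ J', where J is the identity matrix with σ in place of the last diagonal element.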

If we use SAS software we can directly get the variance-covariance matrix of $\hat\beta$ and $\hat\sigma$ by using the following statements

proc lifereg data=larynx order=data COVOUT outest=est;
class stage;
model time*death(0)=stage age/dist=weibull;
run;
proc print data=est;
run;


The right side vector of Eq(27) is calculated as

$$
\begin{pmatrix}(\log 2)^{\hat\sigma}\exp(x'\hat\beta) \\ (\log 2)^{\hat\sigma}\exp(x'\hat\beta) \cdot stage2 \\ (\log 2)^{\hat\sigma}\exp(x'\hat\beta) \cdot stage3 \\ (\log 2)^{\hat\sigma}\exp(x'\hat\beta) \cdot stage4 \\ (\log 2)^{\hat\sigma}\exp(x'\hat\beta) \cdot age \\ (\log 2)^{\hat\sigma}\log(\log 2)\exp(x'\hat\beta)\end{pmatrix}
= \begin{pmatrix}(\log 2)^{0.885}\exp(x_{46}'\hat\beta) \\ (\log 2)^{0.885}\exp(x_{46}'\hat\beta) \times 1 \\ (\log 2)^{0.885}\exp(x_{46}'\hat\beta) \times 0 \\ (\log 2)^{0.885}\exp(x_{46}'\hat\beta) \times 0 \\ (\log 2)^{0.885}\exp(x_{46}'\hat\beta) \times 74 \\ (\log 2)^{0.885}\log(\log 2)\exp(x_{46}'\hat\beta)\end{pmatrix}
= \begin{pmatrix}5.8383 \\ 5.8383 \\ 0 \\ 0 \\ 432.03 \\ -7.7915\end{pmatrix}
$$

where $x_{46}'\hat\beta = 3.5288 - 0.1477 \times 1 - 0.5866 \times 0 - 1.5441 \times 0 - 0.0175 \times 74$.

Now we have everything needed to calculate the standard error of the median survival time.

$$
SE = \left[\begin{pmatrix}5.8383 \\ 5.8383 \\ 0 \\ 0 \\ 432.03 \\ -7.7915\end{pmatrix}^{t}
\begin{pmatrix}
0.817 & -0.0905 & -0.0848 & -0.0444 & -0.0111 & 0.0229 \\
-0.090 & 0.1661 & 0.0532 & 0.0507 & 0.00057 & 0.00014 \\
-0.085 & 0.0532 & 0.1024 & 0.0567 & 0.00042 & -0.0065 \\
-0.044 & 0.0507 & 0.0567 & 0.1320 & -0.00020 & -0.0095 \\
-0.011 & 0.00057 & 0.0004 & -0.00020 & 0.00016 & -0.00023 \\
0.023 & 0.00014 & -0.0065 & -0.0095 & -0.00023 & 0.01175
\end{pmatrix}
\begin{pmatrix}5.8383 \\ 5.8383 \\ 0 \\ 0 \\ 432.03 \\ -7.7915\end{pmatrix}\right]^{\frac{1}{2}} = 2.156133
$$

Therefore, the 95% confidence interval is given as

$$
95\%\ CI: (5.83 - 1.96 \times 2.16 < Median_{46} < 5.83 + 1.96 \times 2.16) = (1.60 \text{ to } 10.06) \text{ months}
$$

This means we are 95% confident that the median survival time lies between 1.60 and 10.06 months. We can also use the built-in R function predict to predict the median survival time.

Median46 <- predict(wr, newdata = data.frame(stage = 2, age = 74), type = "quantile",
                    p = 0.5, se.fit = TRUE)
Median46

We get

$fit
5.838288
$se.fit
2.095133

The standard error differs slightly from our calculation, which might be due to round-off error.

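We use Eq(26) to calculate the minimum prediction error survival time (MPET) and we fix k=2.

$$
\begin{aligned}
MPET_{46} &= \left[\frac{\frac{2}{\sigma}\log(k)}{k^{\frac{1}{\sigma}} - k^{-\frac{1}{\sigma}}}\right]^{\sigma}\exp(x'\beta) \\
&= \left[\frac{\frac{2}{0.885}\log(2)}{2^{\frac{1}{0.885}} - 2^{-\frac{1}{0.885}}}\right]^{0.885} \times \exp(3.5288 - 0.1477 \times 1 - 0.5866 \times 0 - 1.5441 \times 0 - 0.0175 \times 74) \\
&= 7.4 \text{ (months)}
\end{aligned}
$$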
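The three point predictions for patient ID=46 can be reproduced from the printed coefficients with a few lines of R. This is a sketch using the rounded estimates from the `summary(wr)` output, so the results may differ from the reported values in the second decimal place; the variable names are illustrative.

```r
# Rounded estimates copied from the summary(wr) output
beta  <- c(intercept = 3.5288, stage2 = -0.1477, stage3 = -0.5866,
           stage4 = -1.5441, age = -0.0175)
sigma <- 0.885

# Covariate vector for patient 46: stage 2, aged 74
x46 <- c(1, 1, 0, 0, 74)
lp  <- sum(x46 * beta)   # linear predictor x'beta
k   <- 2

mttf46   <- exp(lp) * gamma(1 + sigma)                       # Eq (21)
median46 <- log(2)^sigma * exp(lp)                           # Eq (24)
mpet46   <- ((2 / sigma) * log(k) /
             (k^(1 / sigma) - k^(-1 / sigma)))^sigma * exp(lp)  # Eq (26)

print(round(c(MTTF = mttf46, median = median46, MPET = mpet46), 2))
```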


All three predictions are quite close to the observed survival time of patient ID=46, which was 6.2 months.

Note that in R's built-in predict function for the Weibull AFT model, type="response" calculates $\exp(x'\hat\beta)$ without the factor $\Gamma(1 + \hat\sigma)$, and type="lp" calculates $x'\hat\beta$ only, so neither should be used to predict the MTTF. We are not aware of any software that calculates the minimum prediction error survival time.

Assess the point prediction accuracy

Parkes [6] suggested a simple method to measure the accuracy of the predicted survival time. Let $t$ be the observed survival time and $p$ the predicted time. If $p/k \le t \le kp$, the point prediction $p$ is defined as "accurate"; otherwise it is defined as "inaccurate". Christakis and Lamont proposed a 33 percent rule to measure accuracy: they divided the observed time by the predicted survival time and defined the prediction as "accurate" if this quotient was between 0.67 and 1.33. Values less than 0.67 or greater than 1.33 were defined as "error". [7] This method in fact just fixes k = 3 in Parkes' method. We choose k = 2 for our accuracy assessment. The accuracy rate is defined as the proportion of "accurate" predictions in the total sample. The results are presented in Table 1.
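Parkes' accuracy rule is straightforward to express in code. The sketch below defines a hypothetical helper and applies it with k = 2 to four rows of Table 1 (IDs 1, 7, 46 and 75), using the observed time and the predicted median survival time.

```r
# TRUE when the observed time lies within [predicted / k, k * predicted]
is_accurate <- function(observed, predicted, k = 2) {
  observed >= predicted / k & observed <= k * predicted
}

# Observed times and predicted medians for IDs 1, 7, 46 and 75 in Table 1
observed  <- c(0.6, 3.3, 6.2, 8.0)
predicted <- c(6.42, 6.54, 5.84, 3.51)

flags         <- is_accurate(observed, predicted, k = 2)
accuracy_rate <- mean(flags)   # proportion of accurate predictions

print(data.frame(observed, predicted, accurate = as.integer(flags)))
```

Applying the same function to all 90 rows and each of the three predictors gives the accuracy rates reported at the bottom of Table 1.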

Discussion

In this paper, we showed how to use the Weibull AFT model to predict survival times. Mean survival time (mean time to failure, mean time between failures), median survival time and minimum prediction error survival time were used as predictions, and the prediction accuracy was assessed by Parkes' method. When we fixed k = 2, the accuracy was 55.6% for the median, 50% for the MTTF and 51.1% for the MPET. When we fixed k = 3, as suggested by Christakis and Lamont, the accuracy rate increased to 77.8%, 66.7% and 67.8%, respectively. The sample we used is quite small and we used only two predictors; with a bigger sample and more predictors the accuracy rate might be even higher. In this sample we did not observe that the minimum prediction error survival time had a better accuracy rate than the median survival time.

Parametric survival models have an advantage over the semi-parametric Cox regression model in predicting survival time. The Cox regression model, which can be specified as $S_i(t \mid x_i) = S_0(t)^{\exp(x_i'\beta)}$, cannot predict time directly. What it can do is calculate the probability of survival for a time period that is specified in advance. The disadvantage of parametric survival models is that they require stronger assumptions than semi-parametric models. [8]

Currently, most clinical prediction models calculate a patient's probability of having or developing a certain disease, or risk scores based on the calculated probabilities. [9] However, a probability seems difficult for the general population to understand, and probability itself can be defined in quite different ways. [10] In practice, the time axis remains the most natural measure for both clinicians and patients. It is much easier to understand a survival time than a subjective assessment of the probability of survival to a certain time point. [4] Predicting survival time can provide a practical and concrete guide for clinicians and health care providers in managing their patients, and can help families and patients make suitable plans for the remaining lifespan.

References

1. Gorgoso-Varela JJ, Rojo-Alboreca A. Use of Gumbel and Weibull functions to model extreme values of diameter distributions in forest stands. Annals of forest science. 2014;71(7):741-50.

2. Lai C-D. Generalized Weibull Distributions. Generalized Weibull Distributions: Springer; 2014. p. 23-75.

3. Ho L, Silva A. Unbiased estimators for mean time to failure and percentiles in a Weibull regression model. International Journal of Quality & Reliability Management. 2006;23(3):323-39.

4. Henderson R, Jones M, Stare J. Accuracy of point predictions in survival analysis. Statistics in medicine. 2001;20(20):3083-96.

5. Klein JP, Moeschberger ML. Survival analysis: techniques for censored and truncated data. Springer Science & Business Media; 2005.

6. Parkes CM. Accuracy of predictions of survival in later stages of cancer. British medical journal. 1972;2(5804):29.

7. Christakis NA, Smith JL, Parkes CM, Lamont EB. Extent and determinants of error in doctors' prognoses in terminally ill patients: prospective cohort study. Commentary: Why do doctors overestimate? Commentary: Prognoses should be based on proved indices not intuition. BMJ. 2000;320(7233):469-73.

8. Nardi A, Schemper M. Comparing Cox and parametric models in clinical studies. Statistics in medicine. 2003;22(23):3597-610.

9. Lee Y-h, Bang H, Kim DJ. How to establish clinical prediction models. Endocrinology and Metabolism. 2016;31(1):38-44.

10. Saunders S. What is probability? Quo vadis quantum mechanics? Springer; 2005. p. 209-38.


Table 1. Prediction results and accuracy (the digit after each predicted time is the accuracy flag: 0 = inaccurate, 1 = accurate)

ID  stage  age  death  time  Median(95% CI)       MTTF     MPET
1   1      77   1      0.6   6.42(3.16,9.68),0    8.47,0   8.11,0
2   1      53   1      1.3   9.77(4.07,15.46),0   12.9,0   12.34,0
3   1      45   1      2.4   11.23(3.07,19.39),0  14.84,0  14.19,0
4   1      57   0      2.5   9.11(4.32,13.89),0   12.03,0  11.5,0
5   1      58   1      3.2   8.95(4.36,13.54),0   11.82,0  11.3,0
6   1      51   0      3.2   10.11(3.88,16.34),0  13.36,0  12.78,0
7   1      76   1      3.3   6.54(3.29,9.78),1    8.62,0   8.25,0
8   1      63   0      3.3   8.2(4.37,12.03),0    10.83,0  10.36,0
9   1      43   1      3.5   11.63(2.72,20.54),0  15.36,0  14.7,0
10  1      60   1      3.5   8.64(4.4,12.89),0    11.41,0  10.91,0
11  1      52   1      4     9.94(3.98,15.89),0   13.13,0  12.55,0
12  1      63   1      4     8.2(4.37,12.03),0    10.83,0  10.36,0
13  1      86   1      4.3   5.49(1.96,9.01),1    7.24,1   6.92,1
14  1      48   0      4.5   10.66(3.52,17.79),0  14.08,0  13.47,0
15  1      68   0      4.5   7.52(4.12,10.91),1   9.92,0   9.49,0
16  1      81   1      5.3   5.99(2.63,9.34),1    7.9,1    7.56,1
17  1      70   0      5.5   7.26(3.96,10.56),1   9.58,1   9.16,1
18  1      58   0      5.9   8.95(4.36,13.54),1   11.82,0  11.3,1
19  1      47   0      5.9   10.84(3.38,18.31),1  14.33,0  13.7,0
20  1      75   1      6     6.65(3.41,9.89),1    8.78,1   8.39,1
21  1      77   0      6.1   6.42(3.16,9.68),1    8.47,1   8.11,1
22  1      64   0      6.2   8.06(4.34,11.77),1   10.64,1  10.18,1
23  1      77   1      6.4   6.42(3.16,9.68),1    8.47,1   8.11,1
24  1      67   1      6.5   7.65(4.19,11.1),1    10.1,1   9.66,1
25  1      79   0      6.5   6.2(2.9,9.5),1       8.18,1   7.83,1
26  1      61   0      6.7   8.49(4.4,12.58),1    11.21,1  10.73,1
27  1      66   0      7     7.78(4.25,11.31),1   10.27,1  9.83,1
28  1      68   1      7.4   7.52(4.12,10.91),1   9.92,1   9.49,1
29  1      73   0      7.4   6.89(3.65,10.12),1   9.09,1   8.69,1
30  1      56   0      8.1   9.27(4.28,14.26),1   12.24,1  11.71,1
31  1      73   0      8.1   6.89(3.65,10.12),1   9.09,1   8.69,1
32  1      58   0      9.6   8.95(4.36,13.54),1   11.82,1  11.3,1
33  1      68   0      10.7  7.52(4.12,10.91),1   9.92,1   9.49,1
34  2      86   1      0.2   4.73(0.68,8.78),0    6.25,0   5.97,0
35  2      64   1      1.8   6.95(2.37,11.54),0   9.18,0   8.78,0
36  2      63   1      2     7.07(2.4,11.75),0    9.34,0   8.93,0
37  2      71   0      2.2   6.15(1.96,10.34),0   8.12,0   7.77,0
38  2      67   0      2.6   6.6(2.22,10.97),0    8.71,0   8.33,0
39  2      51   0      3.3   8.72(2.28,15.17),0   11.52,0  11.02,0
40  2      70   1      3.6   6.26(2.03,10.49),1   8.26,0   7.9,0
41  2      72   0      3.6   6.05(1.89,10.2),1    7.98,0   7.63,0
42  2      81   1      4     5.17(1.13,9.21),1    6.82,1   6.52,1
43  2      47   0      4.3   9.36(1.98,16.73),0   12.36,0  11.82,0
44  2      64   0      4.3   6.95(2.37,11.54),1   9.18,0   8.78,0
45  2      66   0      5     6.71(2.28,11.15),1   8.86,1   8.48,1
46  2      74   1      6.2   5.84(1.73,9.94),1    7.7,1    7.37,1
47  2      62   1      7     7.2(2.43,11.97),1    9.51,1   9.09,1
48  2      50   0      7.5   8.88(2.22,15.54),1   11.73,1  11.22,1
49  2      53   0      7.6   8.42(2.38,14.47),1   11.13,1  10.64,1
50  2      61   0      9.3   7.33(2.46,12.2),1    9.67,1   9.25,1
51  3      49   1      0.3   5.83(2.41,9.24),0    7.69,0   7.36,0
52  3      71   1      0.3   3.97(2.19,5.75),0    5.24,0   5.01,0
53  3      57   1      0.5   5.07(2.68,7.45),0    6.69,0   6.4,0
54  3      79   1      0.7   3.45(1.56,5.34),0    4.55,0   4.35,0
55  3      82   1      0.8   3.27(1.32,5.23),0    4.32,0   4.13,0
56  3      49   1      1     5.83(2.41,9.24),0    7.69,0   7.36,0
57  3      60   1      1.3   4.81(2.68,6.94),0    6.35,0   6.07,0
58  3      64   1      1.6   4.48(2.58,6.39),0    5.92,0   5.66,0
59  3      74   1      1.8   3.76(1.96,5.56),0    4.97,0   4.75,0
60  3      72   1      1.9   3.9(2.12,5.68),0     5.14,0   4.92,0
61  3      53   1      1.9   5.43(2.6,8.27),0     7.17,0   6.86,0
62  3      54   1      3.2   5.34(2.63,8.05),1    7.05,0   6.74,0
63  3      81   1      3.5   3.33(1.4,5.27),1     4.39,1   4.2,1
64  3      52   0      3.7   5.53(2.56,8.49),1    7.3,1    6.98,1
65  3      66   0      4.5   4.33(2.49,6.17),1    5.71,1   5.47,1
66  3      54   0      4.8   5.34(2.63,8.05),1    7.05,1   6.74,1
67  3      63   0      4.8   4.56(2.61,6.51),1    6.02,1   5.76,1
68  3      59   1      5     4.89(2.69,7.1),1     6.46,1   6.18,1
69  3      49   0      5     5.83(2.41,9.24),1    7.69,1   7.36,1
70  3      69   0      5.1   4.11(2.32,5.89),1    5.42,1   5.19,1
71  3      70   1      6.3   4.04(2.26,5.82),1    5.33,1   5.1,1
72  3      65   1      6.4   4.41(2.54,6.27),1    5.82,1   5.56,1
73  3      65   0      6.5   4.41(2.54,6.27),1    5.82,1   5.56,1
74  3      68   1      7.8   4.18(2.38,5.98),1    5.52,1   5.28,1
75  3      78   0      8     3.51(1.64,5.38),0    4.63,1   4.43,1
76  3      69   0      9.3   4.11(2.32,5.89),0    5.42,1   5.19,1
77  3      51   0      10.1  5.63(2.52,8.73),1    7.43,1   7.11,1
78  4      65   1      0.1   1.69(0.77,2.61),0    2.23,0   2.14,0
79  4      71   1      0.3   1.52(0.7,2.34),0     2.01,0   1.92,0
80  4      76   1      0.4   1.4(0.61,2.18),0     1.84,0   1.76,0
81  4      65   1      0.8   1.69(0.77,2.61),0    2.23,0   2.14,0
82  4      78   1      0.8   1.35(0.56,2.13),1    1.78,0   1.7,0
83  4      41   1      1     2.57(0.3,4.84),0     3.4,0    3.25,0
84  4      68   1      1.5   1.6(0.74,2.47),1     2.12,1   2.03,1
85  4      69   1      2     1.58(0.73,2.42),1    2.08,1   1.99,1
86  4      62   1      2.3   1.78(0.78,2.79),1    2.35,1   2.25,1
87  4      74   0      2.9   1.44(0.65,2.24),0    1.91,1   1.82,1
88  4      71   1      3.6   1.52(0.7,2.34),0     2.01,1   1.92,1
89  4      84   1      3.8   1.21(0.42,2.01),0    1.6,0    1.53,0
90  4      48   0      4.3   2.28(0.57,3.99),1    3.01,1   2.87,1

Accuracy rate (%): Median 55.6% (50/90), MTTF 50% (45/90), MPET 51.1% (46/90)