Introduction Kernel estimates Simulation study Real data Conclusion
Kernel estimation as alternative to(semi)parametric models in survival analysis
Iveta Selingerová, Stanislav Katina, Ivana Horová
Department of Mathematics and StatisticsFaculty of Science, Masaryk University
Brno
ROBUST 201611 - 16 September 2016
Introduction Kernel estimates Simulation study Real data Conclusion
Outline1 Introduction
Survival analysisKernel smoothing
2 Modeling hazard function using kernel methodTwo types of kernel estimates of conditional hazard functionCox proportional hazard modelBandwidth selection
3 Simulation studyWeibull modelLognormal modelCox modelGeneral hazard function
4 Real dataTriple negative breast cancer
5 Conclusion
Introduction Kernel estimates Simulation study Real data Conclusion
Survival analysis
Survival time T with distribution function FRandom right censoringCensoring time C with distribution function GObserved time Y = min(T ,C) with distribution function L,censoring indicator δ = I(T ≤ C)Survival function F (t) = P(T ≥ t) = 1− F (t)Hazard function
λ(t) = lim∆t→0+
P(t ≤ T < t + ∆t|T ≥ t)∆t = f (t)
F (t)
Cumulative hazard function Λ(t) =∫ t
0 λ(u)du
Introduction Kernel estimates Simulation study Real data Conclusion
Properties
Survival function of observed time L(t) = F (t)G(t)Joint density of (Y , δ) is l(·, ·) and is composed of two”subdensities” l(·, 0) and l(·, 1)Subdensity of the event time r(t) = l(·, 1) = f (t)G(t)Hazard function
λ(t) = r(t)L(t)
Introduction Kernel estimates Simulation study Real data Conclusion
Nonparametric estimation
Kaplan-Meier estimator of the survival function
F KM(t) =
1 t ≤ Y(1),∏i :Y(i)<t
(n−i
n−i+1
)δ(i) t > Y(1).
Nelson-Aalen estimator of the cumulative hazard
ΛNA(t) =
0 t ≤ Y(1),∑i :Y(i)<t
δ(i)n−i+1 t > Y(1).
Smoothing techniques for estimation of hazard function λ(t)
Introduction Kernel estimates Simulation study Real data Conclusion
Survival models
X = (X1,X2, . . . ,Xp)T vector of p covariatesConditional hazard function
λ(t|x) = lim∆t→0+
P(t ≤ T < t + ∆t|T ≥ t,X = x)∆t
Parametric survival modelAssumption of distribution of survival timeAccelerated failure time (AFT) models, parametricproportional hazard (PH) models
Semiparametric survival model – Cox modelExponential dependence on covariatesAssumption proportionality of hazard ratioNonparametric estimate of baseline hazard
Nonparametric survival modelNo assumption of distribution or dependence on covariatesSmoothing techniques
Introduction Kernel estimates Simulation study Real data Conclusion
Kernel smoothing
The value of unknown function at a point is estimated as alocal weighted average of known observations in theneighbourhood of this pointKernel K – real function with support on [−1, 1] satisfying
∫ 1
−1x jK (x)dx =
1 j = 0,0 0 < j < k,bk 6= 0 j = k.
Epanechnikov kernel K (x) = −34(x2 − 1)I ([−1, 1])
Introduction Kernel estimates Simulation study Real data Conclusion
Kernel smoothing
The bandwidth sequence {h(n)}
limn→∞
h(n) = 0, limn→∞
nh(n) =∞
Statistical properties of an estimator ϑ(x) of a function ϑ(x)Local error of an estimator
MSE(ϑ(x)
)= E
(ϑ(x)− ϑ(x)
)2= Var
(ϑ(x)
)+Bias2
(ϑ(x)
)Global error of an estimator
MISE(ϑ)
=∫
MSE(ϑ(x)
)ω(x)dx
Introduction Kernel estimates Simulation study Real data Conclusion
Bandwidths
0 2 4 6 8 10 12 14 16 18 20
0
2
4
6
8
10
0
0.05
0.1
0.15
0.2
0.25
0.3
t
h
λ(t
)
Introduction Kernel estimates Simulation study Real data Conclusion
Two types of kernel conditional hazard estimates
External estimatorL(t|x) = F (t|x)G(t|x)λ(t|x) = f (t|x)
F (t|x) = f (t|x)G(t|x)L(t|x) = r(t|x)
L(t|x)
λE (t|x) = r(t|x)L(t|x)
=1ht
∑ni=1 wi (x)K
(t−Yi
ht
)δi∑n
i=1 wi (x)W(
Yi−tht
)Nadaraya–Watson weights
wi (x) =K(
x−Xihx
)∑n
j=1 K(
x−Xjhx
) , i = 1, . . . , n
Introduction Kernel estimates Simulation study Real data Conclusion
Two types of kernel conditional hazard estimates
Internal estimatorBeran estimator of conditional cumulative hazard function
Λ(t|x) =
0 t ≤ Y(1)∑i :Y(i)<t
δ(i)w(i)(x)1−∑i−1
j=1 w(j)(x)t > Y(1)
λI(t|x) = 1ht
∫K( t − u
ht
)dΛ(u|x)
= 1ht
n∑i=1
K( t − Y(i)
ht
)δ(i)w(i)(x)
1−∑i−1
j=1 w(j)(x)
ht influences smoothing in direction of the timehx influences smoothing in direction of the covariate
Introduction Kernel estimates Simulation study Real data Conclusion
Cox proportional hazards model
Cox modelλ(t|x) = λ0(t)eβx
Properties and assumptions:Hazard ratios are constant over time, i.e.
λ(t|x1)λ(t|x2) = eβ(x1−x2)
The dependence on covariate is exponentialNo distribution of the survival time is assumed
Maximum partial likelihood estimator of the model parameterβ
Introduction Kernel estimates Simulation study Real data Conclusion
Cox proportional hazards model
Breslow estimator of baseline hazard function
Λ0(t) =∑
i :t(i)<tλ0(t(i)) =
∑i :t(i)<t
di∑j∈Ri eβXj
Kernel estimator of baseline hazard function
λ0(t) = 1h
∫K( t − u
h
)dΛ0(u)
= 1h
n∑i=1
K( t − Yi
h
)δi∑n
j=1 I(Yj ≥ Yi )eβXj
Introduction Kernel estimates Simulation study Real data Conclusion
Bandwidth selection
Asymptotically optimal bandwidthsMinimize MISETheoretical value of bandwidthsNeed to know distribution of survival time, observed time, orcovariate
Introduction Kernel estimates Simulation study Real data Conclusion
Bandwidth selection
Cross-validation method
CV (hx , ht) = 1n
n∑i=1
∫λ3(t|Xi)e
−∫ t
0λ(u|Xi )dudt
− 2n
n∑i=1
λ2−i(Yi |Xi)
L(Yi |Xi)e−∫ Yi
0λ−i (u|Xi )du
δi
(hx,CV , ht,CV ) = arg min(hx ,ht ) CV (hx , ht)
0 2 4 6 8 10 12 14 16 18 20
0
24
6
810
−0.01
0
0.01
0.02
0.03
0.04
0.05
0.06
ht
hx
CV
(hx,h
t)
Introduction Kernel estimates Simulation study Real data Conclusion
Bandwidth selection
Maximum likelihood method
ML(hx , ht) =n∏
i=1
λδi−i(Yi |Xi)F−i(Yi |Xi)
(hx,ML, ht,ML) = arg max(hx ,ht ) ML(hx , ht)
0 2 4 6 8 10 12 14 16 18 20
0
2
4
6
8
100
1
2
3
4
5
6
7
8
x 10−207
ht
hx
ML(h
x,h
t)
Introduction Kernel estimates Simulation study Real data Conclusion
Simulation study
200 triples of observed data (Xi ,Yi , δi )100 samplesSpecified conditional hazard function λ(t|x)Covariate Xi ∼ U(0, 10)Survival time Ti obtained as Ti = F−1(Ui |Xi ), whereUi ∼ U(0, 1) and F (t|x) = 1− e−
∫ t0 λ(u|x)du
Censoring time Ci ∼ logN (µ; 0.22), where µ influencescensoring rate (30%, 60% or 90%)Observed time Yi = min(Ti ,Ci )Censoring indicator δi = 1 for Yi = Ti and δi = 0 for otherError between true and estimated conditional hazard functionmeasured by
ASE(λ) = 1n
n∑i=1
(λ(Yi |Xi)− λ(Yi |Xi)
)2
Introduction Kernel estimates Simulation study Real data Conclusion
Weibull model
λ(t|x) = νµtµ−1eβx
ν = 0.018, µ = 1.3,β = 0.2The interpretationusing PH or AFTmodel 0
5
10
15
20
0
2
4
6
8
10
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
t
x
Introduction Kernel estimates Simulation study Real data Conclusion
Lognormal model
λ(t|x) =1σt φ( log t−βx
σ )Φ(βx−log t
σ )σ = 0.5, β = 1
3The interpretationusing AFT modelln Ti = βXi + εi ,εi ∼ N (0, σ2) 0
5
10
15
20
0
2
4
6
8
10
0
0.5
1
1.5
2
t
x
Introduction Kernel estimates Simulation study Real data Conclusion
Cox model
λ(t|x) =1
15e(1.5− cos
( t1.7 + 1
))eβx
β = 0.2λ0(t) =
115e(1.5− cos
( t1.7 + 1
))0
5
10
15
20
0
2
4
6
8
10
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
tx
Introduction Kernel estimates Simulation study Real data Conclusion
General hazard function
λ(t|x) =1
40000
(t (t − 25)2 + 200
)×
× (sin(0.7x − 5) + 2)
0
5
10
15
20
02
46
8100
0.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0.2
tx
Introduction Kernel estimates Simulation study Real data Conclusion
Results of simulation study
If assumptions of the (semi)parametric models are satisfied,these models have slightly better resultsThis difference between models are diminished with increasingcensoring rateThe differences between kernel estimates and parametricmodels are more pronounced than between kernel estimatesand Cox model. This is due to stronger assumptions of theparametric models.The simulation of general conditional hazard function(violated assumptions of the (semi)parametric models) showskernel estimates as preferableIn practice, time distribution and dependence on covariatesare not known ⇒ general conditional hazard is most frequentsituation
Introduction Kernel estimates Simulation study Real data Conclusion
Triple negative breast cancer
408 patients diagnosed and/or treated at Masaryk MemorialCancer Institute in Brno in the period 2004–2011100 deaths (82 deaths in the first 4 years from diagnosis ∼80% censoring rate)The age at diagnosis varies between 25 and 88 years (theaverage 55 years)Is survival time affected by patient age? (Testing ofhypothesis that β = 0 versus β 6= 0 )
0 20 40 60 80 100 120 1400
50
100
150
200
250
300
350
400
Time (months)
Patient
death
censored
Patients’ times Death rate
Introduction Kernel estimates Simulation study Real data Conclusion
Triple negative breast cancer
010
2030
4050
20
40
60
80
1000
0.002
0.004
0.006
0.008
0.01
0.012
Time (months)Age (years)
Weibull model p-value=0.129β = 0.0128, ν = 0.0005, µ = 1.41PH interpretation: increase of age about one yearincreases risk of death 1.0128 timesAFT interpretation: increase of age about one yearshortens survival time 0.9910 times
010
2030
4050
20
40
60
80
1000
0.002
0.004
0.006
0.008
0.01
0.012
Time (months)Age (years)
Lognormal model p-value=0.045β = −0.0126, σ = 1.290, µ = 5.616AFT interpretation: increase of age about one yearshortens survival time 0.9875 times
010
2030
4050
20
40
60
80
1000
0.002
0.004
0.006
0.008
0.01
0.012
Time (months)Age (years)
Cox model p-value=0.127β = 0.0126PH interpretation: increase of age about one yearincreases risk of death by 1.0127 times
Introduction Kernel estimates Simulation study Real data Conclusion
Triple negative breast cancer
010
2030
4050
20
40
60
80
1000
0.002
0.004
0.006
0.008
0.01
0.012
Time (months)Age (years)
External kernel estimate
010
2030
4050
20
40
60
80
1000
0.002
0.004
0.006
0.008
0.01
0.012
Time (months)Age (years)
Internal kernel estimate
High risk of death for patients over 70 years and increased riskof death for patients up to 40 yearsSlightly different shape of hazard function for various agegroupsIt offers possibility to divide patients into groups with similarrisk of death (up to 40, 41-70, over 70 years)
Introduction Kernel estimates Simulation study Real data Conclusion
Conclusion
In practice, the theoretic shape of the conditional hazardfunction or distribution of the survival time is not known,assumptions of PH model are often violated – using kernelestimates is therefore very helpful alternativeThe kernel estimates of the conditional hazard function can beuseful for verifying assumptions of the (semi)parametricmodels and for finding thresholds of continuous variablesThe kernel estimates are able to capture any changes in thehazard function in direction of covariate and timeThe kernel methods are able to estimate different shapes ofthe hazard function without any constrains and smooth themoutThe kernel estimation produces functions more useful forpresentation
Introduction Kernel estimates Simulation study Real data Conclusion
Conclusion
Advantage DisadvantageKernel Flexibility Complexity of calculating
estimates Visualization Estimate influencedNo assumptions by choice of bandwidthFinding groupswith similar risk
Parametric Easy interpretationInadequate results can
be obtained whereassumptions of models
are violated
model Quality of estimate forknown time distribution
Semiparametric Easy interpretationmodel No distribution
assumption
Introduction Kernel estimates Simulation study Real data Conclusion
References
I. Selingerová, H. Doleželová, I. Horová, S. Katina, J. Zelinka, Survival ofPatients with Primary Brain Tumors: Comparison of Two StatisticalApproaches, PloS one 11(2), 2016.T. R. Fleming and D. P. Harrington, Counting processes and survivalanalysis, John Wiley & Sons, 2011.I. Horová, J. Koláček, J. Zelinka, Kernel Smoothing in MATLAB: Theoryand Practice of Kernel Smoothing, World Scientific Publishing, 2012.L. Spierdijk, Nonparametric Conditional Hazard Rate Estimation: A LocalLinear Approach, Computational Statistics & Data Analysis 52, 2008,2419–2434.M. Svoboda et al., Triple-Negative Breast Cancer: Analysis of PatientsDiagnosed and/or Treated at the Masaryk Memorial Cancer Institutebetween 2004–2009 (in Czech), Clinical Oncology 25(3), 2012, 188–198.
Introduction Kernel estimates Simulation study Real data Conclusion
Thank you for yourattention!