nonparametric estimation in the cox model · 4l-a25,47-a53,45-lio,4s-m05. febrnarv 27, 1989....
TRANSCRIPT
NONPARAMETRIC ESTIMATION IN THE COX MODEL
by
Finbarr O'Sullivan
TECHNICAL REPORT No. 169
August 1989
Department of Statistics, GN-22
University of Washington
Seattle, Washington 98195 USA
ABSTRACT
Finbarr
Department ofI:sio~;tatisti(;s
University of 'Wa:5hirlgtcmSeattle. WA
Key words and phrases. Cox Model, Counting Process Formulation, Martingales, PenalizedPanial Likelihood, Rates of ConvergeIlce,Relative. Risk, Uniformity.
on of convergencematch the optimalsity estimation. The results are uniforminthe$moothing p3j"am,ete:r,important step forthe analysis of datadependelltrules for the Selection of thesmoothing parameter.
AMS 1980 subject classifications. Primary, 62-GOS,Secondary, V""'··JV.).
4l-A25,47-A53, 45-LIO, 4S-M05.
Febrnarv 27, 1989
For each n ,
intensity process
a multivariate counting process with a random
(1.2)
The underlying baseline hazard Ao and the relative risk eo: Rd ~ R are fixed unknown quanti-
ties. Throughout this the time index t takes values on a bounded interval which without loss of
generality is taken to be [0,1]. A family of right continuous non-decreasing sub o-algebras
(Ft(l1 ) : t e [O,l]} are defined on the n'th sample space, with Ft0Y<representing the history of the
n'th process up to time t. All processes are adapted to this family of c-algebras. Yi(l1)(-)iS a
predictable process taking values in {O, I}; Yi(n )(t) = 1 if the i'th component of the process is
under observation just before time t , y/l1 }(t ) = 0 otherwise, see [8] for further discussion. The d-
dimensional covariate process Xi(l1}O is predictable and locally bounded, in fact wewi1l assume
that the collection of covariate processes take values in a fixed bounded subset of Rd.
Specification ofA,(l1) as an intensity process means that the process
tM/l1)(t)=N/l1)(t)-JA,ll1)(1:)d1:, i=1,2,···n and te[O,lJ, (1.3)
is a local martingale with mean
by
=0. is given
t
= !A,t(l1}('t)d1: (1.4)
ease of 110t,itic,n
1.2. Definition of the Penalized Partial Lilteliihood and Some Assumptions
= + , J.1 >
P(t .x) = p (Y (r) =1 I x (t) =x ] ,
and let q (x ,t) =P (x ,t)h (x It). We suppose that there are strictly positive constants, k 1 and kz
(independent ofx and t), such that
kl < q(x,t) < kz and a(jtq(x ,t) < kz.
(tv) {x (r ) , t E (O,l]) \;; K \;; Rd where K is a bounded open simply connected set with Coo -
boundary (see Definition 3.2.1.2. ofTriebel[14] ).
Assumptions (i) and (ii) should be familiar, (iii) is used used in proving results concerning
derivatives of a continuous version of the partial likelihood and (iv) is used is to obtain growth
behavior on eigenvalues which in turn determine the ultimate rates of convergence. For the
assumptions on the parameter space, let W~CK,R) denote the Sobolev space of real valued Lz
functions definedonK whose k'th derivative is square integrable, see Adams(l]. Sobolevspaces
may also be definedfornon..integer k (again see Adams(l]) and throughout this paper the orderk
can assume any positive real value. Wfu CK ,R) is the subspace of W~ CK ,R) which consists of
functions with mean zero. Thus !e(X)dx =0, for all eE W6zCK,R).
Assumption B. (Parameter Space)
(i) e is a Hilbert space offunctions e:K ~ R with inner product <.;> and norm
menzs ofe are constrained to have mean zero.
The ele-
For some
is sett-adiotm are positive constants
-6-
occurs
Proof: The results §3.
1.4. Discussion and Outline of the Paper
The first part of the Theorem gives the order of the systematic error and the second part
gives the order of the stochastic error. From the theorem it follows that if 11: is 0 (n-2mI(2mp+d» ,
then /lfJlI.lJ.: - eo/I?; is bounded by Op(n-2m(P- b)/(2mp+d » . In particular, if eo E wp p = 1),
the integrated square error /lell.~ - edJa is bounded by Op(n-2mI(2m+d » . Thus in the analogue of
the standard l-dimensional cubic smoothing spline setup (m = 2, d = 1 and p = 1) the integrated
squared error converges we get the familiar rate of Op(n-415). Uniformity in the smoothing
parameter is noteworthy. We expect that this will be most useful in the analysis of data depen-
rules for choosing the smoothing parameter. Theorem 1 yields upper bounds the rates of
convergence. theoptimalni.tes ofconvyfgellceobtained
for non-parametric regression and density estimation in Holder spaces, We conjecture the
rates obtained are optimal.
repI'eseIlts an application a general approa(;h
sions to the asymptotic analysis of penalized likelihood-type estimators developed in Cox and
O'Sullivan[4] (heI'l~aftler abbreviated CO[4]). The theoretical rramewcrx
some
reQlJtires a strengt!::leniJl1.g
a more tecnmcal lemma in obtaining bounds on aenva-
(2.3)
(2.2)
e
and
<
a.>
,t) <
S
Ds (e,t)(\> = f(\>(x )e9(.r)q (x ,t)dx
D2s.(e,t)(\>'V = J(\>(x)'V(x)e El(.r)q{x ,t)dx
D3s{e,t)(\>\ff~= J(\>(x )\ff(x)~(x)e9(.r)q (x,t)dx
- 8-
DslI(e,t)(\> = ~ ,t(\>(Xdt»Yi(t)e 6(x;(t»
D2slI(e,t)(\>'V= n It(\>(Xi(t»\ff(Xi(t»Yi(t)e9(X,(l »
D3Sn (e,t )(\>\ff~= ~ ;t(\>(Xi(t »\ff(Xi (t »~(Xi (t»Yi (t)e 9(x,(t» ,
e8
are
Lemma 2.1. Eorany R > 0 and. rt,>dl2.rn there are constants 0 < m« S; MR < 00
(a)
all e,e* E S(R ,a.) and t e [0,1].
above quantities are the Frechet derivatives of slI(e,t) in Wr(X;R). We note that the Frechet
calculus, see Ralll l l] for example. The continuity of the derivatives follows since e9(.r) is con-
where (\>,\ff,~ are functions. Since evaluations is a continuous linear functional in WTCt.(X;R) the
tinuous in efor e E S(R ,a.) because cc-drlm,
The derivatives are clearly continuous. The following Lemma will be useful.
Fixing e E Wr by the strong law oflarge numbers SII (e,t) -7 S (e,t) almost surely where
S (e,t) =E[Y (t)e 9(l1=fe 9(.r)q (x ,t)dx. The Frechet derivatives of s (e,t) are given by
derivative is the function space generalization of the total derivative used in standard multivariate
ous derivatives up to order three guarantees that Assumption A.3 of CO[4] holds.
The first and second order Frechet derivatives of in are given by
(2.11)
Again, with Pi (r) = [..lYi(t)e 8(X,(t)1/Sn (O,t) for i =1,2, ... n , and ~t = t<\>(xi (t»Pi(t),n l~
(2.12)
and it follows that in and inll, are convex. A straightforward argument, along the lines given in
the appendix of(9), shows that the penalized partial likelihood estimator must lie in the subspace
8 n =N(W) + Span~(xi(tij»}, where ~(x) is the Riesz representer of evaluation at x and N(W)
is the null space of the linear operator W. Span stands for the sPanifor the given set where tij
ranges over the event times of the counting process N/n )(t) for t E [0, IT and i == 1,2, ..•. n . If
N (W) is finite then 8 n is a finite dimensional space, although its dimension will in general be
larger than n . When W corresponds to the usual Laplacian penalty functional used· to generate
thin plate smoothing splines[9, 15], the penalized partial likelihood estimator can be represented
as a generalized Laplacian smoothing spline. Following the argument in[9]
umlClue mirlimilzer of the peaalized is
a umlQue mirumilzer
are summanzed
ttnraue root
a Unl,'111p minimizer is
be
operators on E>b e [0,1]
Let U (6) be defined by
(2.17)
By definition of the derivative U (6) is a bounded linear operator on 8a. since (J. > dl'lm . From
(2.10) we have the next.Lemma which implies assumpti()n AA of CO[4].
Lemma 2.3.
for all 6 E E>.
S(R
Replacing U by U (6*) in equation (2.11) we obtain for each 6* E N 90 sequences of eigen
values {Y*v;v = 1,2, ... } and corresponding eigenfunctions {(p*v; v = 1,2, ... ). This leads to
a norm
(2.18)
are unifonnlY ecuivalent
U (6* ) extends to a bounded linear operator on E>*b for b e [0,1Jand the linear operator
is OOlJtndt::d
Lemma 2.4. For R > 0 and =
1
There a
b:::;; a-d12m
Both M and An are independent of l.land b .
Proof: The results follow from Lemma A.l in the appendix because Neo c S (R ,a) for som~R.
We consider part (iii) which illustrates the technicalities. LetH.. = 91Jlnd H;:::: 92
//GjJ.(6.. )-lD31n (6)uv lib s M IIGjJ.(6.. )-lD31n(6)uvll"b (2.21)
=M v~[1 + "{"v)b[1 +l.l"{"vr2{D 31n(6.. )uv$ ..vp
But from Lemma A.l (iii)
and the from Lemma 2.4 (i) and (iii) (the condition that b < d/2m used here)
This proves part (iii). The arguments for parts (i) and arevery similar, just replace the appli-
cation ofLemma A.l (iii) by J.."I;;U.ll.UQ.
f 119n~HJlt .}. ..0 .(1.( -1))!.LlltE,J.lo11n-1j.1., :j:d12m) s P og Jinand
Thus Assumption A.S
=
-d/2m,
[lln ,110]. We have
1
en!.L - 9!.L = G!.L(9~-1[Dln (90)- DI (90)] + !G!.L(9~-1[D2In (9s) - DI2(9s)](9Jl- ( 0)sds (3.5)
is {'lPl~r1V
where 9s = 90 + s (9Jl - (0). Thus
n-1ll ; 2(a.+dI2m )~ O. Let 0 :::;; b :::;; min(2 - a.- d/2m ,a.) and let u-, be a deterministic sequence in
Proof: By first order Taylor series expansion about 90(see Rall[11])
From Theorem 3.1 and Lemma 2.5 (iii) the second term is bounded above by
Lemma 3.2.
Before considering the stochastic component of the error we analyze 119n!.L - 9.,JIb for b .:::; a.
Let u, is any sequence tending to zero such that for some a.satisfying dl2m < a.< (p - d/2m )/2,
3.1 of CO[4]. The conclusion of the theorem tc)liows.
18)
(3.15)
(3.16)
- s.(eo,t )AQ(t )dt]
1+
E//xn Ql)//?;::;k[1+yv]b [1 + J.Lyvr2EB ~n)
s M n-1k[1 +yv]b [1 + J.LYvr2 ::;M n-1J.L-(b+d l2m )
condition on a.
Applying arguments used in Lemma A.1(i) in the appendix gives bounds of 0 (n-1//$Ovlla) for the
where M does not depend on n or v. The first part of (3.12) now follows directly by Markov's
as Jl ~ O. The last inequality comes from Lemma 2.4(iii). For the second part
expectations of both terms on the right hand side of the latter expression. Thus
bounded by M n-1//$Ov//'&J. For the first term
inequality byc;ause
The expected values of the squares of the last three terms are easily computed and shown to be
Thus Assumption in the appendixis satisfied and so bY.Tl:1eorem A.2 the event that 9nfl.
uniquely exists for all }l E [}In ,Ilo] with
occurs with probability approaching unity as n ~ 00. On this event
- -//9nfl. -9P//b S;//9nfl.-9P/1b + 1/9nfl. - 9nP//b
S;d; (}l,b) + r« (}l,b )dn (}l,a.)
Using Lemma 3.2
(3.23)
Thus the convergence rate of //9n !J. - 9~/b is determined by the convergence of d« (}l,b). From
here the results follow by Lemma 3.2. O.
the
differeJrlt terms. For the first term, direct application of the Cauchy-Schwartz inequality and
Lemma 2.2 gives
s MRllullJ.llv IIJ./lw IIJ (A.3)
The last inequality follows from Sobolev's Imdedding Theorem because a.> dl'Im . Similar
analysis of the other terms leads to the bound in (i),
For paruii),
D2[n (a. )uv - D2[ (a.)uv (AA)
= J[D2Sn(a.,t)Uv _ Dsn(a.,t)u Dsn(a.,t)v]dii(t)s; (a. ,t) s; (6. ,t) s; (6. ,t)
J[ D2s (a. ,t)uv Ds (a. ,t)u Ds(a. ,t)v] s (a t)'l c(t)dt- s(6. ,t ) - s (6. ,t) S (a. ,t) 0, A.() .
We analyze a representative term
write
-~~-dN(t) (A.6)
= :: =
(A. 12)
(A.13)
a
From assumption
CO[4] for penalized lilcelil1lood estimators is
E[fbv(t) [dN(t) - s(80,t)Ao(t)dt]j2
s E[fbv(t )dM(t)]2 +E[fbv(t )[sn(eo,t)-s (eo,t )]Ao(t)dtP::;Mn-1/lbvl Ljr=Mn-1
[J D2:~:,;?UV [dN(t) - s(eo,t)Ao(t)dt]j2
s (L[I+V]2gJ ) . 'L[I+vr2[Jb v(t ) [dN(t)- s(80,t)Ao(t)dt]]2v v
upper bound of /I U ,(I xt t V
analysis of parts
generalization of a Iineanzanon
ereved. Consider a penatized lilcelitlood funC:tiOIllal
applying the Cauchy-Schwartz inequality to the sum over v (as in the argument before equation
But
(A.7» gives
orthonormal functions and //g//Wi = 'L[I+v]2gJ . Substituting the series expansion for g andv
Uniform Linearization Result
other part of (AA) and this gives the bound in part (ii), The proof of part (iii) uses techniques
Therefore g E W:t[O,I]. Thus g has the representation g(t)='Lgvbv(t) where b; are L2[O,I]-v
To deal with the remaining term in (A.S) let g (r) = -""""",,"-,,-
=
and
n>no
(A. 19)
(A.IS)
related to this
I am very grateful to Professors John Crowley and Grace Wahba for stimulating discussions
Acknowledgement
o
which hold for all lie [Ii" ,lloT. Thus F"p. is a contraction on Se/t"p.,cx.) for all lie •[Ii,,<,llo]· From
here the argument of Theorem 3.2 in CO[4] is repeated to obtain parts (i) and (ii) of the theorem.
the computations
13. Tikhonov, and Arsenin, V., Solutions ofIll-Posed PrOllJlems,
14. Triebel, H., Interpolation Theory, FunctionSpaces, DifferentialOperators, North-Holland,
New York, 1978.
15. Wahba, G., "Cross-validated spline methods for the estimation of multivariate functions
from data on functionals," in Statistics: An Appraisal, ed, H. A. David and H. T. David, pp,
205-233, The Iowa State University Press, 1984.
16. Weinberger, H. F., "Variational Methods for Eigenvalue Approximation," in CEMS
Regional conference series in appliedmathematics, SIAM, Philadelphia, 1974.