nonparametric estimation in the cox model · 4l-a25,47-a53,45-lio,4s-m05. febrnarv 27, 1989....

NONPARAMETRIC ESTIMATION IN THE COX MODEL

by

Finbarr O'Sullivan

TECHNICAL REPORT No. 169

August 1989

Department of Statistics, GN-22

University of Washington

Seattle, Washington 98195 USA

ABSTRACT

Finbarr

Department ofI:sio~;tatisti(;s

University of 'Wa:5hirlgtcmSeattle. WA

Key words and phrases. Cox Model, Counting Process Formulation, Martingales, PenalizedPanial Likelihood, Rates of ConvergeIlce,Relative. Risk, Uniformity.

on of convergencematch the optimalsity estimation. The results are uniforminthe$moothing p3j"am,ete:r,important step forthe analysis of datadependelltrules for the Selection of thesmoothing parameter.

AMS 1980 subject classifications. Primary, 62-GOS,Secondary, V""'··JV.).

4l-A25,47-A53, 45-LIO, 4S-M05.

Febrnarv 27, 1989

For each n ,

intensity process

a multivariate counting process with a random

(1.2)

The underlying baseline hazard Ao and the relative risk eo: Rd ~ R are fixed unknown quanti-

ties. Throughout this the time index t takes values on a bounded interval which without loss of

generality is taken to be [0,1]. A family of right continuous non-decreasing sub o-algebras

(Ft(l1 ) : t e [O,l]} are defined on the n'th sample space, with Ft0Y<representing the history of the

n'th process up to time t. All processes are adapted to this family of c-algebras. Yi(l1)(-)iS a

predictable process taking values in {O, I}; Yi(n )(t) = 1 if the i'th component of the process is

under observation just before time t , y/l1 }(t ) = 0 otherwise, see [8] for further discussion. The d-

dimensional covariate process Xi(l1}O is predictable and locally bounded, in fact wewi1l assume

that the collection of covariate processes take values in a fixed bounded subset of Rd.

Specification ofA,(l1) as an intensity process means that the process

tM/l1)(t)=N/l1)(t)-JA,ll1)(1:)d1:, i=1,2,···n and te[O,lJ, (1.3)

is a local martingale with mean

by

=0. is given

t

= !A,t(l1}('t)d1: (1.4)

ease of 110t,itic,n

1.2. Definition of the Penalized Partial Lilteliihood and Some Assumptions

= + , J.1 >

P(t .x) = p (Y (r) =1 I x (t) =x ] ,

and let q (x ,t) =P (x ,t)h (x It). We suppose that there are strictly positive constants, k 1 and kz

(independent ofx and t), such that

kl < q(x,t) < kz and a(jtq(x ,t) < kz.

(tv) {x (r ) , t E (O,l]) \;; K \;; Rd where K is a bounded open simply connected set with Coo -

boundary (see Definition 3.2.1.2. ofTriebel[14] ).

Assumptions (i) and (ii) should be familiar, (iii) is used used in proving results concerning

derivatives of a continuous version of the partial likelihood and (iv) is used is to obtain growth

behavior on eigenvalues which in turn determine the ultimate rates of convergence. For the

assumptions on the parameter space, let W~CK,R) denote the Sobolev space of real valued Lz

functions definedonK whose k'th derivative is square integrable, see Adams(l]. Sobolevspaces

may also be definedfornon..integer k (again see Adams(l]) and throughout this paper the orderk

can assume any positive real value. Wfu CK ,R) is the subspace of W~ CK ,R) which consists of

functions with mean zero. Thus !e(X)dx =0, for all eE W6zCK,R).

Assumption B. (Parameter Space)

(i) e is a Hilbert space offunctions e:K ~ R with inner product <.;> and norm

menzs ofe are constrained to have mean zero.

The ele-

For some

is sett-adiotm are positive constants

-6-

occurs

Proof: The results §3.

1.4. Discussion and Outline of the Paper

The first part of the Theorem gives the order of the systematic error and the second part

gives the order of the stochastic error. From the theorem it follows that if 11: is 0 (n-2mI(2mp+d» ,

then /lfJlI.lJ.: - eo/I?; is bounded by Op(n-2m(P- b)/(2mp+d » . In particular, if eo E wp p = 1),

the integrated square error /lell.~ - edJa is bounded by Op(n-2mI(2m+d » . Thus in the analogue of

the standard l-dimensional cubic smoothing spline setup (m = 2, d = 1 and p = 1) the integrated

squared error converges we get the familiar rate of Op(n-415). Uniformity in the smoothing

parameter is noteworthy. We expect that this will be most useful in the analysis of data depen-

rules for choosing the smoothing parameter. Theorem 1 yields upper bounds the rates of

convergence. theoptimalni.tes ofconvyfgellceobtained

for non-parametric regression and density estimation in Holder spaces, We conjecture the

rates obtained are optimal.

repI'eseIlts an application a general approa(;h

sions to the asymptotic analysis of penalized likelihood-type estimators developed in Cox and

O'Sullivan[4] (heI'l~aftler abbreviated CO[4]). The theoretical rramewcrx

some

reQlJtires a strengt!::leniJl1.g

a more tecnmcal lemma in obtaining bounds on aenva-

(2.3)

(2.2)

e

and

<

a.>

,t) <

S

Ds (e,t)(\> = f(\>(x )e9(.r)q (x ,t)dx

D2s.(e,t)(\>'V = J(\>(x)'V(x)e El(.r)q{x ,t)dx

D3s{e,t)(\>\ff~= J(\>(x )\ff(x)~(x)e9(.r)q (x,t)dx

- 8-

DslI(e,t)(\> = ~ ,t(\>(Xdt»Yi(t)e 6(x;(t»

D2slI(e,t)(\>'V= n It(\>(Xi(t»\ff(Xi(t»Yi(t)e9(X,(l »

D3Sn (e,t )(\>\ff~= ~ ;t(\>(Xi(t »\ff(Xi (t »~(Xi (t»Yi (t)e 9(x,(t» ,

e8

are

Lemma 2.1. Eorany R > 0 and. rt,>dl2.rn there are constants 0 < m« S; MR < 00

(a)

all e,e* E S(R ,a.) and t e [0,1].

above quantities are the Frechet derivatives of slI(e,t) in Wr(X;R). We note that the Frechet

calculus, see Ralll l l] for example. The continuity of the derivatives follows since e9(.r) is con-

where (\>,\ff,~ are functions. Since evaluations is a continuous linear functional in WTCt.(X;R) the

tinuous in efor e E S(R ,a.) because cc-drlm,

The derivatives are clearly continuous. The following Lemma will be useful.

Fixing e E Wr by the strong law oflarge numbers SII (e,t) -7 S (e,t) almost surely where

S (e,t) =E[Y (t)e 9(l1=fe 9(.r)q (x ,t)dx. The Frechet derivatives of s (e,t) are given by

derivative is the function space generalization of the total derivative used in standard multivariate

ous derivatives up to order three guarantees that Assumption A.3 of CO[4] holds.

The first and second order Frechet derivatives of in are given by

(2.11)

Again, with Pi (r) = [..lYi(t)e 8(X,(t)1/Sn (O,t) for i =1,2, ... n , and ~t = t<\>(xi (t»Pi(t),n l~

(2.12)

and it follows that in and inll, are convex. A straightforward argument, along the lines given in

the appendix of(9), shows that the penalized partial likelihood estimator must lie in the subspace

8 n =N(W) + Span~(xi(tij»}, where ~(x) is the Riesz representer of evaluation at x and N(W)

is the null space of the linear operator W. Span stands for the sPanifor the given set where tij

ranges over the event times of the counting process N/n )(t) for t E [0, IT and i == 1,2, ..•. n . If

N (W) is finite then 8 n is a finite dimensional space, although its dimension will in general be

larger than n . When W corresponds to the usual Laplacian penalty functional used· to generate

thin plate smoothing splines[9, 15], the penalized partial likelihood estimator can be represented

as a generalized Laplacian smoothing spline. Following the argument in[9]

umlClue mirlimilzer of the peaalized is

a umlQue mirumilzer

are summanzed

ttnraue root

a Unl,'111p minimizer is

be

operators on E>b e [0,1]

Let U (6) be defined by

(2.17)

By definition of the derivative U (6) is a bounded linear operator on 8a. since (J. > dl'lm . From

(2.10) we have the next.Lemma which implies assumpti()n AA of CO[4].

Lemma 2.3.

for all 6 E E>.

S(R

Replacing U by U (6*) in equation (2.11) we obtain for each 6* E N 90 sequences of eigen

values {Y*v;v = 1,2, ... } and corresponding eigenfunctions {(p*v; v = 1,2, ... ). This leads to

a norm

(2.18)

are unifonnlY ecuivalent

U (6* ) extends to a bounded linear operator on E>*b for b e [0,1Jand the linear operator

is OOlJtndt::d

Lemma 2.4. For R > 0 and =

1

There a

b:::;; a-d12m

Both M and An are independent of l.land b .

Proof: The results follow from Lemma A.l in the appendix because Neo c S (R ,a) for som~R.

We consider part (iii) which illustrates the technicalities. LetH.. = 91Jlnd H;:::: 92

//GjJ.(6.. )-lD31n (6)uv lib s M IIGjJ.(6.. )-lD31n(6)uvll"b (2.21)

=M v~[1 + "{"v)b[1 +l.l"{"vr2{D 31n(6.. )uv$ ..vp

But from Lemma A.l (iii)

and the from Lemma 2.4 (i) and (iii) (the condition that b < d/2m used here)

This proves part (iii). The arguments for parts (i) and arevery similar, just replace the appli-

cation ofLemma A.l (iii) by J.."I;;U.ll.UQ.

f 119n~HJlt .}. ..0 .(1.( -1))!.LlltE,J.lo11n-1j.1., :j:d12m) s P og Jinand

Thus Assumption A.S

=

-d/2m,

[lln ,110]. We have

1

en!.L - 9!.L = G!.L(9~-1[Dln (90)- DI (90)] + !G!.L(9~-1[D2In (9s) - DI2(9s)](9Jl- ( 0)sds (3.5)

is {'lPl~r1V

where 9s = 90 + s (9Jl - (0). Thus

n-1ll ; 2(a.+dI2m )~ O. Let 0 :::;; b :::;; min(2 - a.- d/2m ,a.) and let u-, be a deterministic sequence in

Proof: By first order Taylor series expansion about 90(see Rall[11])

From Theorem 3.1 and Lemma 2.5 (iii) the second term is bounded above by

Lemma 3.2.

Before considering the stochastic component of the error we analyze 119n!.L - 9.,JIb for b .:::; a.

Let u, is any sequence tending to zero such that for some a.satisfying dl2m < a.< (p - d/2m )/2,

3.1 of CO[4]. The conclusion of the theorem tc)liows.

18)

(3.15)

(3.16)

- s.(eo,t )AQ(t )dt]

1+

E//xn Ql)//?;::;k[1+yv]b [1 + J.Lyvr2EB ~n)

s M n-1k[1 +yv]b [1 + J.LYvr2 ::;M n-1J.L-(b+d l2m )

condition on a.

Applying arguments used in Lemma A.1(i) in the appendix gives bounds of 0 (n-1//$Ovlla) for the

where M does not depend on n or v. The first part of (3.12) now follows directly by Markov's

as Jl ~ O. The last inequality comes from Lemma 2.4(iii). For the second part

expectations of both terms on the right hand side of the latter expression. Thus

bounded by M n-1//$Ov//'&J. For the first term

inequality byc;ause

The expected values of the squares of the last three terms are easily computed and shown to be

Thus Assumption in the appendixis satisfied and so bY.Tl:1eorem A.2 the event that 9nfl.

uniquely exists for all }l E [}In ,Ilo] with

occurs with probability approaching unity as n ~ 00. On this event

- -//9nfl. -9P//b S;//9nfl.-9P/1b + 1/9nfl. - 9nP//b

S;d; (}l,b) + r« (}l,b )dn (}l,a.)

Using Lemma 3.2

(3.23)

Thus the convergence rate of //9n !J. - 9~/b is determined by the convergence of d« (}l,b). From

here the results follow by Lemma 3.2. O.

the

differeJrlt terms. For the first term, direct application of the Cauchy-Schwartz inequality and

Lemma 2.2 gives

s MRllullJ.llv IIJ./lw IIJ (A.3)

The last inequality follows from Sobolev's Imdedding Theorem because a.> dl'Im . Similar

analysis of the other terms leads to the bound in (i),

For paruii),

D2[n (a. )uv - D2[ (a.)uv (AA)

= J[D2Sn(a.,t)Uv _ Dsn(a.,t)u Dsn(a.,t)v]dii(t)s; (a. ,t) s; (6. ,t) s; (6. ,t)

J[ D2s (a. ,t)uv Ds (a. ,t)u Ds(a. ,t)v] s (a t)'l c(t)dt- s(6. ,t ) - s (6. ,t) S (a. ,t) 0, A.() .

We analyze a representative term

write

-~~-dN(t) (A.6)

= :: =

(A. 12)

(A.13)

a

From assumption

CO[4] for penalized lilcelil1lood estimators is

E[fbv(t) [dN(t) - s(80,t)Ao(t)dt]j2

s E[fbv(t )dM(t)]2 +E[fbv(t )[sn(eo,t)-s (eo,t )]Ao(t)dtP::;Mn-1/lbvl Ljr=Mn-1

[J D2:~:,;?UV [dN(t) - s(eo,t)Ao(t)dt]j2

s (L[I+V]2gJ ) . 'L[I+vr2[Jb v(t ) [dN(t)- s(80,t)Ao(t)dt]]2v v

upper bound of /I U ,(I xt t V

analysis of parts

generalization of a Iineanzanon

ereved. Consider a penatized lilcelitlood funC:tiOIllal

applying the Cauchy-Schwartz inequality to the sum over v (as in the argument before equation

But

(A.7» gives

orthonormal functions and //g//Wi = 'L[I+v]2gJ . Substituting the series expansion for g andv

Uniform Linearization Result

other part of (AA) and this gives the bound in part (ii), The proof of part (iii) uses techniques

Therefore g E W:t[O,I]. Thus g has the representation g(t)='Lgvbv(t) where b; are L2[O,I]-v

To deal with the remaining term in (A.S) let g (r) = -""""",,"-,,-

=

and

n>no

(A. 19)

(A.IS)

related to this

I am very grateful to Professors John Crowley and Grace Wahba for stimulating discussions

Acknowledgement

o

which hold for all lie [Ii" ,lloT. Thus F"p. is a contraction on Se/t"p.,cx.) for all lie •[Ii,,<,llo]· From

here the argument of Theorem 3.2 in CO[4] is repeated to obtain parts (i) and (ii) of the theorem.

the computations

13. Tikhonov, and Arsenin, V., Solutions ofIll-Posed PrOllJlems,

14. Triebel, H., Interpolation Theory, FunctionSpaces, DifferentialOperators, North-Holland,

New York, 1978.

15. Wahba, G., "Cross-validated spline methods for the estimation of multivariate functions

from data on functionals," in Statistics: An Appraisal, ed, H. A. David and H. T. David, pp,

205-233, The Iowa State University Press, 1984.

16. Weinberger, H. F., "Variational Methods for Eigenvalue Approximation," in CEMS

Regional conference series in appliedmathematics, SIAM, Philadelphia, 1974.

nonparametric estimation in the cox model · 4l-a25,47-a53,45-lio,4s-m05. febrnarv 27, 1989....

Documents