1 goodness-of-fit tests with censored data edsel a. pena statistics department university of south...

Post on 15-Dec-2015

217 Views

Category:

Documents

2 Downloads

Preview:

Click to see full reader

TRANSCRIPT

1

Goodness-of-Fit Tests with Censored Data

Edsel A. PenaStatistics Department

University of South CarolinaColumbia, SC

[E-Mail: pena@stat.sc.edu]

Research support from NIH and NSF.

Talk at Cornell University, 3/13/02

2

Practical Problem

3

Product-Limit Estimator and Best-Fitting ExponentialSurvivor Function

Question: Is the underlying survivor function modeled by a family of exponential distributions? a Weibull distribution?

4

A Car Tire Data Set

• Times to withdrawal (in hours) of 171 car tires, with withdrawal either due to failure or right-censoring.

• Reference: Davis and Lawrance, in Scand. J. Statist., 1989.

• Pneumatic tires subjected to laboratory testing by rotating each tire against a steel drum until either failure (several modes) or removal (right-censoring).

5

Product-Limit Estimator, Best-Fitting Exponential and Weibull Survivor Functions

Exp

PLE

Weibull

Question: Is the Weibull family a good model for this data?

6

Goodness-of-Fit Problem

• T1, T2, …, Tn are IID from an unknown distribution function F

• Case 1: F is a continuous df• Case 2: F is discrete df with known jump points• Case 3: F is a mixed distribution with known jump points

• C1, C2, …, Cn are (nuisance) variables that right-censor the Ti’s

• Data: (Z1, 1), (Z2, 2), …, (Zn, n), where

Zi = min(Ti, Ci) and i = I(Ti < Ci)

7

Simple GOF Problem: For a pre-specified F0, to test the null hypothesis that

H0: F = F0 versus H1: F F0.

On the basis of the data (Z1, 1), (Z2, 2), …, (Zn, n):

Composite GOF Problem: For a pre-specified family of dfs F = {F0(.;}, to test the hypotheses that

H0: F F versus H1: F F .

Statement of the GOF Problem

8

Generalizing Pearson

With complete data, the famous Pearson test statistics are:

Simple Case:

K

i i

ii

E

EO

1

22 )(

Composite Case:

K

i i

ii

E

EO

1

22

ˆ)ˆ(

where Oi is the # of observations in the ith interval; Ei is the expected number of observations in the ith interval; and

)ˆ;(ˆ0 ii InFE

is the estimated expected number of observations in the ith interval under the null model.

9

Obstacles with Censored Data

• With right-censored data, determining the exact values of

the Oj’s is not possible.

• Need to estimate them using the product-limit estimator

(Hollander and Pena, ‘92; Li and Doss, ‘93), Nelson-Aalen

estimator (Akritas, ‘88; Hjort, ‘90), or by self-consistency

arguments.• Hard to examine the power or optimality properties of the

resulting Pearson generalizations because of the ad hoc

nature of their derivations.

10

In Hazards View: Continuous Case

For T an abs cont +rv, the hazard rate function (t) is:

)(

)(}|{)(

tF

dttftTdttTtPdtt

Cumulative hazard function (t) is:

t

dwwt0

)()(

Survivor function in terms of :

)}(exp{}{)( ttTPtF

11

Two Common Examples

Exponential: );(t

Two-parameter Weibull: 1))((),;( tt

12

Counting Processes and Martingales

t

n

ii

i

n

ii

dwwwYtNtM

tZItY

tZItN

0

1

1

)()()()(

}{)(

}1,{)(

{M(t): 0 < t < is a square-integrable zero-mean martingale with predictable quadratic variation (PQV) process

t

dwwwYtMM0

)()()(,

13

Idea in Continuous Case

• For testing H0: (.) C ={0(.;): , if H0 holds, then there is some such that the true hazard is such

0(.;)

space some);(

)(log);(

0

0

t

tt K• Let

• Basis Set for K: ),...}(.;),(.;{ 21

• Expansion: );();(1

tt ik

i

• Truncation: );();(1

tt k

p

kk

p is smoothing order,

14

Hazard Embedding and Approach

• From this truncation, we obtain the approximation

)};(exp{);();(exp);()( 01

00 ttttt tp

kkk

• Embedding Class

pp

kkkp ttt :);(exp);(),;(

10Cp

• Note: H0 Cp obtains by taking = 0.

• GOF Tests: Score tests for H0: = 0 versus H1: 0.

• Note that is a nuisance parameter in this testing problem.

15

Class of Statistics

• Estimating equation for the nuisance

0)ˆ;()()()ˆ;(

);(log);(

0

0

0

dwwwYwdNw

tt

• Quadratic Statistic:

QQn

S

dwwwYwdNwQ

tp

p

ˆ ˆ ˆ 1

)ˆ;()()()ˆ;(ˆ0

0

• ̂ is an estimator of the limiting covariance of Qn

ˆ1

16

Asymptotics and Test

• Under regularity conditions,

) ,0(ˆ121

1221211

pd NQ

n

• Estimator of obtained from the matrix:

dwwwYdN(w)w

w

np )ˆ;()(

)ˆ;(

)ˆ;(

2

1ˆ0

2

0

• Test: Reject H0 if .2*; ppS

17

A Choice of Generalizing Pearson

• Partition [0,] into 0 = a1 < a2 < … < ap = and let

taaaaap tItItIt

pp))( ..., ),( ),(()( ],(],(],0[ 1211

• Then

dwwwYE

aNaNwdNO

EOEOEOQ

j

j

j

j

a

a

j

j

a

a

jj

pp

)ˆ;()(ˆ

)()()(

ˆ ..., ,ˆ ,ˆˆ

1

1

0

1

t2211

• are dynamic expected frequenciessE j 'ˆ

18

Special Case: Testing Exponentiality

C = {t}

0

0

)(

)(

ˆ

dwwY

wdN

j

j

j

j

a

a

j

a

a

j dwwYEwdNO11

)(ˆˆ and )( with

• Exponential Hazards:

• Test Statistic (“generalized Pearson”):

)ˆ( }ˆˆ)ˆ({ )ˆ( ˆ pDgpES tt

p

tp

tp

p

kk EEE

EOOO

EpEE )ˆ,...,ˆ,ˆ(ˆ

1ˆ ;),...,,(ˆ

1 ;ˆˆ

21211

where

19

A Polynomial-Type Choice of

t100 )];([ ..., ),;( ,1);( p

p ttt

• Components of Q̂

;,...,2,1 ,)()(ˆ*

0

1 pkdwwYwdNwQ RRkk

where

}.{)( and };1;{)( );ˆ;(11

0 tRItYtRItNZRn

ii

Rn

iii

Rii

• Resulting test based on the ‘generalized’ residuals. The framework allows correcting for the estimation of nuisance .

20

Simulated Levels(Polynomial Specification, K = p)

21

Simulated Powers

Legend: Solid: p=2; Dots: p=3; Short Dashes: p = 4; Long Dashes: p=5

22

Back to Lung Cancer Data

Test for Exponentiality Test for Weibull

S4 and S5 also both indicaterejection of Weibull family.

23

Back to Davis & Lawrance Car Tire Data

Exp

PLE

Weibull

24

Test of Exponentiality

Conclusion: Exponentiality does not hold as in graph!

25

Test of Weibull Family

Conclusion: Cannot reject Weibull family of distributions.

26

Simple GOF Problem: Discrete Data

• Ti’s are discrete +rvs with jump points {a1, a2, a3, …}.

• Hazards:

• Problem: To test the hypotheses

based on the right-censored data (Z1, 1), …, (Zn, n).

27

• True and hypothesized hazard odds:

• For p a pre-specified order, let

be a p x J (possibly random) matrix, with its p rows linearly independent, and with [0, aJ] being the maximum observation period for all n units.

28

Embedding Idea

• To embed the hypothesized hazard odds into

• Equivalent to assuming that the log hazard odds ratios satisfy

• Class of tests are the score tests of H0: vs. H1: as p and are varied.

29

Class of Test Statistics

),...,,( 0022

0110 JJVVVDgV

• Quadratic Score Statistic:

• Under H0, this converges in distribution to a chi-square rv.

p

30

A Pearson-Type Choice of

Partition {1,2,…,J}:

31

A Polynomial-Type Choice

quadratic form from the above matrices.

32

Hyde’s Test: A Special Case

When p = 1 with polynomial specification, we obtain:

Resulting test coincides with Hyde’s (‘77, Btka) test.

33

Adaptive Choice of Smoothing Order

= partial likelihood of

= associated observed information matrix

= partial MLE of

Adjusted Schwarz (‘78, Ann. Stat.) Bayesian Information Criterion

*

34

Simulation Results for Simple Discrete Case

Note: Based on polynomial-type specification. Performances of Pearson type tests were not as good as for the polynomial type.

35

Concluding Remarks

• Framework is general enough so as to cover both continuous and discrete cases.

• Mixed case dealt with via hazard decomposition. • Since tests are score tests, they possess local

optimality properties.• Enables automatic adjustment of effects due to

estimation of nuisance parameters.• Basic approach extends Neyman’s 1937 idea by

embedding hazards instead of densities.• More studies needed for adaptive procedures.

top related