1 goodness-of-fit tests with censored data edsel a. pena statistics department university of south...

35
1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: [email protected]] Research support from NIH and NSF. Talk at Cornell University, 3/13/02

Upload: layne-gloster

Post on 15-Dec-2015

217 views

Category:

Documents


2 download

TRANSCRIPT

Page 1: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

1

Goodness-of-Fit Tests with Censored Data

Edsel A. PenaStatistics Department

University of South CarolinaColumbia, SC

[E-Mail: [email protected]]

Research support from NIH and NSF.

Talk at Cornell University, 3/13/02

Page 2: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

2

Practical Problem

Page 3: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

3

Product-Limit Estimator and Best-Fitting ExponentialSurvivor Function

Question: Is the underlying survivor function modeled by a family of exponential distributions? a Weibull distribution?

Page 4: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

4

A Car Tire Data Set

• Times to withdrawal (in hours) of 171 car tires, with withdrawal either due to failure or right-censoring.

• Reference: Davis and Lawrance, in Scand. J. Statist., 1989.

• Pneumatic tires subjected to laboratory testing by rotating each tire against a steel drum until either failure (several modes) or removal (right-censoring).

Page 5: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

5

Product-Limit Estimator, Best-Fitting Exponential and Weibull Survivor Functions

Exp

PLE

Weibull

Question: Is the Weibull family a good model for this data?

Page 6: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

6

Goodness-of-Fit Problem

• T1, T2, …, Tn are IID from an unknown distribution function F

• Case 1: F is a continuous df• Case 2: F is discrete df with known jump points• Case 3: F is a mixed distribution with known jump points

• C1, C2, …, Cn are (nuisance) variables that right-censor the Ti’s

• Data: (Z1, 1), (Z2, 2), …, (Zn, n), where

Zi = min(Ti, Ci) and i = I(Ti < Ci)

Page 7: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

7

Simple GOF Problem: For a pre-specified F0, to test the null hypothesis that

H0: F = F0 versus H1: F F0.

On the basis of the data (Z1, 1), (Z2, 2), …, (Zn, n):

Composite GOF Problem: For a pre-specified family of dfs F = {F0(.;}, to test the hypotheses that

H0: F F versus H1: F F .

Statement of the GOF Problem

Page 8: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

8

Generalizing Pearson

With complete data, the famous Pearson test statistics are:

Simple Case:

K

i i

ii

E

EO

1

22 )(

Composite Case:

K

i i

ii

E

EO

1

22

ˆ)ˆ(

where Oi is the # of observations in the ith interval; Ei is the expected number of observations in the ith interval; and

)ˆ;(ˆ0 ii InFE

is the estimated expected number of observations in the ith interval under the null model.

Page 9: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

9

Obstacles with Censored Data

• With right-censored data, determining the exact values of

the Oj’s is not possible.

• Need to estimate them using the product-limit estimator

(Hollander and Pena, ‘92; Li and Doss, ‘93), Nelson-Aalen

estimator (Akritas, ‘88; Hjort, ‘90), or by self-consistency

arguments.• Hard to examine the power or optimality properties of the

resulting Pearson generalizations because of the ad hoc

nature of their derivations.

Page 10: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

10

In Hazards View: Continuous Case

For T an abs cont +rv, the hazard rate function (t) is:

)(

)(}|{)(

tF

dttftTdttTtPdtt

Cumulative hazard function (t) is:

t

dwwt0

)()(

Survivor function in terms of :

)}(exp{}{)( ttTPtF

Page 11: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

11

Two Common Examples

Exponential: );(t

Two-parameter Weibull: 1))((),;( tt

Page 12: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

12

Counting Processes and Martingales

t

n

ii

i

n

ii

dwwwYtNtM

tZItY

tZItN

0

1

1

)()()()(

}{)(

}1,{)(

{M(t): 0 < t < is a square-integrable zero-mean martingale with predictable quadratic variation (PQV) process

t

dwwwYtMM0

)()()(,

Page 13: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

13

Idea in Continuous Case

• For testing H0: (.) C ={0(.;): , if H0 holds, then there is some such that the true hazard is such

0(.;)

space some);(

)(log);(

0

0

t

tt K• Let

• Basis Set for K: ),...}(.;),(.;{ 21

• Expansion: );();(1

tt ik

i

• Truncation: );();(1

tt k

p

kk

p is smoothing order,

Page 14: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

14

Hazard Embedding and Approach

• From this truncation, we obtain the approximation

)};(exp{);();(exp);()( 01

00 ttttt tp

kkk

• Embedding Class

pp

kkkp ttt :);(exp);(),;(

10Cp

• Note: H0 Cp obtains by taking = 0.

• GOF Tests: Score tests for H0: = 0 versus H1: 0.

• Note that is a nuisance parameter in this testing problem.

Page 15: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

15

Class of Statistics

• Estimating equation for the nuisance

0)ˆ;()()()ˆ;(

);(log);(

0

0

0

dwwwYwdNw

tt

• Quadratic Statistic:

QQn

S

dwwwYwdNwQ

tp

p

ˆ ˆ ˆ 1

)ˆ;()()()ˆ;(ˆ0

0

• ̂ is an estimator of the limiting covariance of Qn

ˆ1

Page 16: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

16

Asymptotics and Test

• Under regularity conditions,

) ,0(ˆ121

1221211

pd NQ

n

• Estimator of obtained from the matrix:

dwwwYdN(w)w

w

np )ˆ;()(

)ˆ;(

)ˆ;(

2

1ˆ0

2

0

• Test: Reject H0 if .2*; ppS

Page 17: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

17

A Choice of Generalizing Pearson

• Partition [0,] into 0 = a1 < a2 < … < ap = and let

taaaaap tItItIt

pp))( ..., ),( ),(()( ],(],(],0[ 1211

• Then

dwwwYE

aNaNwdNO

EOEOEOQ

j

j

j

j

a

a

j

j

a

a

jj

pp

)ˆ;()(ˆ

)()()(

ˆ ..., ,ˆ ,ˆˆ

1

1

0

1

t2211

• are dynamic expected frequenciessE j 'ˆ

Page 18: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

18

Special Case: Testing Exponentiality

C = {t}

0

0

)(

)(

ˆ

dwwY

wdN

j

j

j

j

a

a

j

a

a

j dwwYEwdNO11

)(ˆˆ and )( with

• Exponential Hazards:

• Test Statistic (“generalized Pearson”):

)ˆ( }ˆˆ)ˆ({ )ˆ( ˆ pDgpES tt

p

tp

tp

p

kk EEE

EOOO

EpEE )ˆ,...,ˆ,ˆ(ˆ

1ˆ ;),...,,(ˆ

1 ;ˆˆ

21211

where

Page 19: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

19

A Polynomial-Type Choice of

t100 )];([ ..., ),;( ,1);( p

p ttt

• Components of Q̂

;,...,2,1 ,)()(ˆ*

0

1 pkdwwYwdNwQ RRkk

where

}.{)( and };1;{)( );ˆ;(11

0 tRItYtRItNZRn

ii

Rn

iii

Rii

• Resulting test based on the ‘generalized’ residuals. The framework allows correcting for the estimation of nuisance .

Page 20: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

20

Simulated Levels(Polynomial Specification, K = p)

Page 21: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

21

Simulated Powers

Legend: Solid: p=2; Dots: p=3; Short Dashes: p = 4; Long Dashes: p=5

Page 22: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

22

Back to Lung Cancer Data

Test for Exponentiality Test for Weibull

S4 and S5 also both indicaterejection of Weibull family.

Page 23: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

23

Back to Davis & Lawrance Car Tire Data

Exp

PLE

Weibull

Page 24: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

24

Test of Exponentiality

Conclusion: Exponentiality does not hold as in graph!

Page 25: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

25

Test of Weibull Family

Conclusion: Cannot reject Weibull family of distributions.

Page 26: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

26

Simple GOF Problem: Discrete Data

• Ti’s are discrete +rvs with jump points {a1, a2, a3, …}.

• Hazards:

• Problem: To test the hypotheses

based on the right-censored data (Z1, 1), …, (Zn, n).

Page 27: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

27

• True and hypothesized hazard odds:

• For p a pre-specified order, let

be a p x J (possibly random) matrix, with its p rows linearly independent, and with [0, aJ] being the maximum observation period for all n units.

Page 28: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

28

Embedding Idea

• To embed the hypothesized hazard odds into

• Equivalent to assuming that the log hazard odds ratios satisfy

• Class of tests are the score tests of H0: vs. H1: as p and are varied.

Page 29: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

29

Class of Test Statistics

),...,,( 0022

0110 JJVVVDgV

• Quadratic Score Statistic:

• Under H0, this converges in distribution to a chi-square rv.

p

Page 30: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

30

A Pearson-Type Choice of

Partition {1,2,…,J}:

Page 31: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

31

A Polynomial-Type Choice

quadratic form from the above matrices.

Page 32: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

32

Hyde’s Test: A Special Case

When p = 1 with polynomial specification, we obtain:

Resulting test coincides with Hyde’s (‘77, Btka) test.

Page 33: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

33

Adaptive Choice of Smoothing Order

= partial likelihood of

= associated observed information matrix

= partial MLE of

Adjusted Schwarz (‘78, Ann. Stat.) Bayesian Information Criterion

*

Page 34: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

34

Simulation Results for Simple Discrete Case

Note: Based on polynomial-type specification. Performances of Pearson type tests were not as good as for the polynomial type.

Page 35: 1 Goodness-of-Fit Tests with Censored Data Edsel A. Pena Statistics Department University of South Carolina Columbia, SC [E-Mail: pena@stat.sc.edu] Research

35

Concluding Remarks

• Framework is general enough so as to cover both continuous and discrete cases.

• Mixed case dealt with via hazard decomposition. • Since tests are score tests, they possess local

optimality properties.• Enables automatic adjustment of effects due to

estimation of nuisance parameters.• Basic approach extends Neyman’s 1937 idea by

embedding hazards instead of densities.• More studies needed for adaptive procedures.