1 goodness-of-fit tests with censored data edsel a. pena statistics department university of south...
TRANSCRIPT
1
Goodness-of-Fit Tests with Censored Data
Edsel A. PenaStatistics Department
University of South CarolinaColumbia, SC
[E-Mail: [email protected]]
Research support from NIH and NSF.
Talk at Cornell University, 3/13/02
2
Practical Problem
3
Product-Limit Estimator and Best-Fitting ExponentialSurvivor Function
Question: Is the underlying survivor function modeled by a family of exponential distributions? a Weibull distribution?
4
A Car Tire Data Set
• Times to withdrawal (in hours) of 171 car tires, with withdrawal either due to failure or right-censoring.
• Reference: Davis and Lawrance, in Scand. J. Statist., 1989.
• Pneumatic tires subjected to laboratory testing by rotating each tire against a steel drum until either failure (several modes) or removal (right-censoring).
5
Product-Limit Estimator, Best-Fitting Exponential and Weibull Survivor Functions
Exp
PLE
Weibull
Question: Is the Weibull family a good model for this data?
6
Goodness-of-Fit Problem
• T1, T2, …, Tn are IID from an unknown distribution function F
• Case 1: F is a continuous df• Case 2: F is discrete df with known jump points• Case 3: F is a mixed distribution with known jump points
• C1, C2, …, Cn are (nuisance) variables that right-censor the Ti’s
• Data: (Z1, 1), (Z2, 2), …, (Zn, n), where
Zi = min(Ti, Ci) and i = I(Ti < Ci)
7
Simple GOF Problem: For a pre-specified F0, to test the null hypothesis that
H0: F = F0 versus H1: F F0.
On the basis of the data (Z1, 1), (Z2, 2), …, (Zn, n):
Composite GOF Problem: For a pre-specified family of dfs F = {F0(.;}, to test the hypotheses that
H0: F F versus H1: F F .
Statement of the GOF Problem
8
Generalizing Pearson
With complete data, the famous Pearson test statistics are:
Simple Case:
K
i i
ii
E
EO
1
22 )(
Composite Case:
K
i i
ii
E
EO
1
22
ˆ)ˆ(
where Oi is the # of observations in the ith interval; Ei is the expected number of observations in the ith interval; and
)ˆ;(ˆ0 ii InFE
is the estimated expected number of observations in the ith interval under the null model.
9
Obstacles with Censored Data
• With right-censored data, determining the exact values of
the Oj’s is not possible.
• Need to estimate them using the product-limit estimator
(Hollander and Pena, ‘92; Li and Doss, ‘93), Nelson-Aalen
estimator (Akritas, ‘88; Hjort, ‘90), or by self-consistency
arguments.• Hard to examine the power or optimality properties of the
resulting Pearson generalizations because of the ad hoc
nature of their derivations.
10
In Hazards View: Continuous Case
For T an abs cont +rv, the hazard rate function (t) is:
)(
)(}|{)(
tF
dttftTdttTtPdtt
Cumulative hazard function (t) is:
t
dwwt0
)()(
Survivor function in terms of :
)}(exp{}{)( ttTPtF
11
Two Common Examples
Exponential: );(t
Two-parameter Weibull: 1))((),;( tt
12
Counting Processes and Martingales
t
n
ii
i
n
ii
dwwwYtNtM
tZItY
tZItN
0
1
1
)()()()(
}{)(
}1,{)(
{M(t): 0 < t < is a square-integrable zero-mean martingale with predictable quadratic variation (PQV) process
t
dwwwYtMM0
)()()(,
13
Idea in Continuous Case
• For testing H0: (.) C ={0(.;): , if H0 holds, then there is some such that the true hazard is such
0(.;)
space some);(
)(log);(
0
0
t
tt K• Let
• Basis Set for K: ),...}(.;),(.;{ 21
• Expansion: );();(1
tt ik
i
• Truncation: );();(1
tt k
p
kk
p is smoothing order,
14
Hazard Embedding and Approach
• From this truncation, we obtain the approximation
)};(exp{);();(exp);()( 01
00 ttttt tp
kkk
• Embedding Class
pp
kkkp ttt :);(exp);(),;(
10Cp
• Note: H0 Cp obtains by taking = 0.
• GOF Tests: Score tests for H0: = 0 versus H1: 0.
• Note that is a nuisance parameter in this testing problem.
15
Class of Statistics
• Estimating equation for the nuisance
0)ˆ;()()()ˆ;(
);(log);(
0
0
0
dwwwYwdNw
tt
• Quadratic Statistic:
QQn
S
dwwwYwdNwQ
tp
p
ˆ ˆ ˆ 1
)ˆ;()()()ˆ;(ˆ0
0
• ̂ is an estimator of the limiting covariance of Qn
ˆ1
16
Asymptotics and Test
• Under regularity conditions,
) ,0(ˆ121
1221211
pd NQ
n
• Estimator of obtained from the matrix:
dwwwYdN(w)w
w
np )ˆ;()(
)ˆ;(
)ˆ;(
2
1ˆ0
2
0
• Test: Reject H0 if .2*; ppS
17
A Choice of Generalizing Pearson
• Partition [0,] into 0 = a1 < a2 < … < ap = and let
taaaaap tItItIt
pp))( ..., ),( ),(()( ],(],(],0[ 1211
• Then
dwwwYE
aNaNwdNO
EOEOEOQ
j
j
j
j
a
a
j
j
a
a
jj
pp
)ˆ;()(ˆ
)()()(
ˆ ..., ,ˆ ,ˆˆ
1
1
0
1
t2211
• are dynamic expected frequenciessE j 'ˆ
18
Special Case: Testing Exponentiality
C = {t}
0
0
)(
)(
ˆ
dwwY
wdN
j
j
j
j
a
a
j
a
a
j dwwYEwdNO11
)(ˆˆ and )( with
• Exponential Hazards:
• Test Statistic (“generalized Pearson”):
)ˆ( }ˆˆ)ˆ({ )ˆ( ˆ pDgpES tt
p
tp
tp
p
kk EEE
EOOO
EpEE )ˆ,...,ˆ,ˆ(ˆ
1ˆ ;),...,,(ˆ
1 ;ˆˆ
21211
where
19
A Polynomial-Type Choice of
t100 )];([ ..., ),;( ,1);( p
p ttt
• Components of Q̂
;,...,2,1 ,)()(ˆ*
0
1 pkdwwYwdNwQ RRkk
where
}.{)( and };1;{)( );ˆ;(11
0 tRItYtRItNZRn
ii
Rn
iii
Rii
• Resulting test based on the ‘generalized’ residuals. The framework allows correcting for the estimation of nuisance .
20
Simulated Levels(Polynomial Specification, K = p)
21
Simulated Powers
Legend: Solid: p=2; Dots: p=3; Short Dashes: p = 4; Long Dashes: p=5
22
Back to Lung Cancer Data
Test for Exponentiality Test for Weibull
S4 and S5 also both indicaterejection of Weibull family.
23
Back to Davis & Lawrance Car Tire Data
Exp
PLE
Weibull
24
Test of Exponentiality
Conclusion: Exponentiality does not hold as in graph!
25
Test of Weibull Family
Conclusion: Cannot reject Weibull family of distributions.
26
Simple GOF Problem: Discrete Data
• Ti’s are discrete +rvs with jump points {a1, a2, a3, …}.
• Hazards:
•
• Problem: To test the hypotheses
based on the right-censored data (Z1, 1), …, (Zn, n).
27
• True and hypothesized hazard odds:
• For p a pre-specified order, let
be a p x J (possibly random) matrix, with its p rows linearly independent, and with [0, aJ] being the maximum observation period for all n units.
28
Embedding Idea
• To embed the hypothesized hazard odds into
• Equivalent to assuming that the log hazard odds ratios satisfy
• Class of tests are the score tests of H0: vs. H1: as p and are varied.
29
Class of Test Statistics
),...,,( 0022
0110 JJVVVDgV
• Quadratic Score Statistic:
• Under H0, this converges in distribution to a chi-square rv.
p
30
A Pearson-Type Choice of
Partition {1,2,…,J}:
31
A Polynomial-Type Choice
quadratic form from the above matrices.
32
Hyde’s Test: A Special Case
When p = 1 with polynomial specification, we obtain:
Resulting test coincides with Hyde’s (‘77, Btka) test.
33
Adaptive Choice of Smoothing Order
= partial likelihood of
= associated observed information matrix
= partial MLE of
Adjusted Schwarz (‘78, Ann. Stat.) Bayesian Information Criterion
*
34
Simulation Results for Simple Discrete Case
Note: Based on polynomial-type specification. Performances of Pearson type tests were not as good as for the polynomial type.
35
Concluding Remarks
• Framework is general enough so as to cover both continuous and discrete cases.
• Mixed case dealt with via hazard decomposition. • Since tests are score tests, they possess local
optimality properties.• Enables automatic adjustment of effects due to
estimation of nuisance parameters.• Basic approach extends Neyman’s 1937 idea by
embedding hazards instead of densities.• More studies needed for adaptive procedures.