Lecture 10: Hypothesis Testing II
Weight Functions and Trend Tests
Testing >2 Samples in R

> ### 2-sample testing using toy example
> time <- c(3,6,9,9,11,16,8,9,10,12,19,23)
> cens <- c(1,0,1,1,0,1,1,1,0,0,1,0)
> grp <- c(1,1,1,1,1,1,2,2,2,2,2,2)
> grp <- as.factor(grp)
>
> sdat <- Surv(time, cens)
> survdiff(sdat ~ grp)
Call:
survdiff(formula = sdat ~ grp)

      N Observed Expected (O-E)^2/E (O-E)^2/V
grp=1 6        4     2.57     0.800      1.62
grp=2 6        3     4.43     0.463      1.62

 Chisq= 1.6  on 1 degrees of freedom, p= 0.203
Testing >2 Samples in R

> survdiff(sdat ~ grp, rho=1)
Call:
survdiff(formula = sdat ~ grp, rho = 1)

      N Observed Expected (O-E)^2/E (O-E)^2/V
grp=1 6     3.20     2.15     0.513      1.23
grp=2 6     2.11     3.16     0.349      1.23

 Chisq= 1.2  on 1 degrees of freedom, p= 0.268
Revisit 'Linear Dependence' of Zj(t)

• How are they linearly dependent?
• Two-sample case: Z1(τ) = -Z2(τ), so the two statistics carry the same information.

Revisit 'Linear Dependence' of Zj(t)

• K-sample case: Z1(τ) + Z2(τ) + … + ZK(τ) = 0, so only K-1 of the Zj(τ) are linearly independent (which is why the K-sample test has K-1 degrees of freedom).
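The linear dependence is easy to verify numerically. A small check using the toy two-sample data from the earlier slide: the group-wise observed-minus-expected statistics returned by survdiff always sum to zero.

```r
# Verify that the Z_j(tau) = O_j - E_j from survdiff sum to zero
# (toy two-sample data from the earlier slide)
library(survival)

time <- c(3, 6, 9, 9, 11, 16, 8, 9, 10, 12, 19, 23)
cens <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0)
grp  <- factor(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2))

test <- survdiff(Surv(time, cens) ~ grp)
zj <- test$obs - test$exp   # Z_1(tau), Z_2(tau)
sum(zj)                     # 0 up to floating-point error
```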
Beyond Log-Rank
• The log-rank test has optimal power to detect HA when the hazard rates of our K groups are proportional
• What if they're not?
• We've mentioned using other weight functions
• Depending on the choice of weight function, we can place emphasis on different regions of the survival curve.
Example: Kidney Infection
• Data on 119 kidney dialysis patients
• Comparing time to kidney infection between two groups
  – Catheters placed percutaneously (n = 76)
  – Catheters placed surgically (n = 43)
Log-Rank Test

  ti  Yi1 di1 Yi2 di2  Yi  di  E(di1)   O-E     Vi
 0.5   43   0  76   6 119   6   2.168  -2.168  1.326
 1.5   43   1  60   0 103   1   0.417   0.583  0.243
 2.5   42   0  56   2  98   2   0.857  -0.857  0.485
 3.5   40   1  49   1  89   2   0.899   0.101  0.489
 4.5   36   2  43   0  79   2   0.911   1.089  0.490
 5.5   33   1  40   0  73   1   0.452   0.548  0.248
 6.5   31   0  35   1  66   1   0.470  -0.470  0.249
 8.5   25   2  30   0  55   2   0.909   1.091  0.487
 9.5   22   1  27   0  49   1   0.449   0.551  0.247
10.5   20   1  25   0  45   1   0.444   0.556  0.247
11.5   18   1  22   0  40   1   0.450   0.550  0.248
15.5   11   1  14   1  25   2   0.880   0.120  0.472
16.5   10   1  13   0  23   1   0.435   0.565  0.246
18.5    9   1  11   0  20   1   0.450   0.550  0.248
23.5    4   1   5   0   9   1   0.444   0.556  0.247
26.5    2   1   3   0   5   1   0.400   0.600  0.240
 Sum                                    3.964  6.211
The last three columns are the usual log-rank pieces computed at each event time ti:

  E(di1) = Yi1 (di / Yi)

  O - E = di1 - Yi1 (di / Yi)

  Vi = (Yi1 / Yi) (1 - Yi1 / Yi) ((Yi - di) / (Yi - 1)) di
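As a quick hand check, the first row of the table (t = 0.5: Yi1 = 43, di1 = 0, Yi = 119, di = 6) can be reproduced directly from these formulas:

```r
# Reproduce the first row of the log-rank table by hand
Yi1 <- 43; di1 <- 0; Yi <- 119; di <- 6

E   <- Yi1 * di / Yi                                        # expected events in group 1
OmE <- di1 - E                                              # observed minus expected
V   <- (Yi1/Yi) * (1 - Yi1/Yi) * ((Yi - di)/(Yi - 1)) * di  # hypergeometric variance

round(c(E = E, OmE = OmE, V = V), 3)   # 2.168, -2.168, 1.326
```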
Comparisons

Test                              W(ti)                                Z1(τ)   σ̂11     χ²     p-value
Log-Rank                          1                                     3.96    6.21    2.53   0.112
Gehan                             Yi                                   -9       38862   0.002  0.964
Tarone-Ware                       sqrt(Yi)                             13.2     432.83  0.4    0.526
Peto-Peto                         S̃(ti)                                 2.47    4.36    1.4    0.237
Modified Peto-Peto                S̃(ti) Yi/(Yi+1)                       2.31    4.2     1.28   0.259
Fleming-Harrington p=0, q=1       1 - Ŝ(t(i-1))                         1.41    0.21    9.67   0.002
Fleming-Harrington p=1, q=0       Ŝ(t(i-1))                             2.55    4.69    1.39   0.239
Fleming-Harrington p=1, q=1       Ŝ(t(i-1)) [1 - Ŝ(t(i-1))]             1.02    0.11    9.83   0.002
Fleming-Harrington p=0.5, q=0.5   Ŝ(t(i-1))^0.5 [1 - Ŝ(t(i-1))]^0.5     2.47    0.66    9.28   0.002
Fleming-Harrington p=0.5, q=2     Ŝ(t(i-1))^0.5 [1 - Ŝ(t(i-1))]^2       0.32    0.01    8.18   0.004
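Each row of the table follows the same recipe: χ² = Z1(τ)² / σ̂11 on 1 degree of freedom. For instance, the log-rank row:

```r
# Chi-square and p-value for the log-rank row of the comparisons table
Z   <- 3.96   # Z_1(tau)
s11 <- 6.21   # estimated variance sigma_11

chisq <- Z^2 / s11
pval  <- pchisq(chisq, df = 1, lower.tail = FALSE)

round(c(chisq = chisq, p = pval), 3)   # 2.525, 0.112
```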
Notice the Differences!

• Situation of varying inference
• Need to be sure you are testing what you think you are testing
• Check
  – Look at the hazards
  – Do they cross?
• Problem
  – Estimating hazards is imprecise (as we've discussed)
[Figure: Cumulative Hazards]

[Figure: Hazard Rate (smoothing spline)]
Misconception

• "Survival curves crossing tells us whether the log-rank test is appropriate"
• Not true
  – Whether survival curves cross depends on censoring and study duration
  – What if they cross, but we don't follow patients far enough out to see it?
• Consider
  – Survival curves cross ⇒ hazards cross
  – Hazards cross ⇒ survival curves may or may not cross
• Solution?
  – Test regions of t: before and after the cross, based on looking at the hazard
  – Some tests allow for crossing (Yang and Prentice, 2005)
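A small numeric sketch of the second "Consider" bullet, using two made-up hazards h1(t) = 1 and h2(t) = 0.5 + 0.25 t (purely illustrative, not from the kidney data): the hazards cross at t = 2, but the survival curves do not cross until t = 4, so a study that ends follow-up at t = 3 sees crossing hazards without crossing survival curves.

```r
# Hypothetical hazards: h1(t) = 1 (constant), h2(t) = 0.5 + 0.25*t (increasing)
# Hazards cross where 1 = 0.5 + 0.25*t        =>  t = 2
# Cumulative hazards: H1(t) = t, H2(t) = 0.5*t + 0.125*t^2
# Survival curves cross where H1(t) = H2(t)   =>  t = 4
t  <- seq(0.1, 3, by = 0.1)              # follow-up truncated at t = 3
S1 <- exp(-t)
S2 <- exp(-(0.5*t + 0.125*t^2))
all(S2 > S1)                             # TRUE: curves never cross before t = 3
```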
Take-home

• Choice of weight function can be critical
• K&M recommend applying the log-rank and Gehan tests
• (Simple) Cox regression is akin to the log-rank test
• Think carefully about the distribution of weights and about possible crossing of hazards
What About Weights…
• We know that R has a limited selection of weight functions.
• SAS doesn't seem to allow us to specify any weights (at least not in proc lifetest).
• So of course we can write our own function…
R Function for Different Weights
• What information will we need to construct the different weights?
• Can we get this information from R?
Building Our R Function
> times <- kidney$Time
> cens <- kidney$d
> grp <- kidney$cath
> fit <- survfit(Surv(times, cens)~1)
> tm <- summary(fit)$time
> Yi <- fit$n.risk[which(fit$time %in% tm)]
> di <- fit$n.event[which(fit$time %in% tm)]
> Yi
 [1] 119 103  98  89  79  73  66  55  49  45  40  25  23  20   9   5
> di
 [1] 6 1 2 2 2 1 1 2 1 1 1 2 1 1 1 1
> fit <- survfit(st ~ kidney$cath)
> summary(fit)
Call: survfit(formula = st ~ kidney$cath)

                kidney$cath=1
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
  1.5     43       1    0.977  0.0230       0.9327        1.000
  3.5     40       1    0.952  0.0329       0.8899        1.000
  4.5     36       2    0.899  0.0478       0.8104        0.998
  5.5     33       1    0.872  0.0536       0.7732        0.984
  8.5     25       2    0.802  0.0683       0.6790        0.948
  9.5     22       1    0.766  0.0743       0.6332        0.926
 10.5     20       1    0.728  0.0799       0.5868        0.902
 11.5     18       1    0.687  0.0851       0.5392        0.876
 15.5     11       1    0.625  0.0976       0.4599        0.849
 16.5     10       1    0.562  0.1060       0.3886        0.813
 18.5      9       1    0.500  0.1111       0.3233        0.773
 23.5      4       1    0.375  0.1366       0.1835        0.766
 26.5      2       1    0.187  0.1491       0.0394        0.891

                kidney$cath=2
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
  0.5     76       6    0.921  0.0309        0.862        0.984
  2.5     56       2    0.888  0.0376        0.817        0.965
  3.5     49       1    0.870  0.0409        0.793        0.954
  6.5     35       1    0.845  0.0467        0.758        0.942
 15.5     14       1    0.785  0.0726        0.655        0.941
> names(fit)
 [1] "n"         "time"      "n.risk"    "n.event"   "n.censor"  "surv"      "type"      "strata"    "std.err"   "upper"
[11] "lower"     "conf.type" "conf.int"  "call"
> fit$n.risk
 [1] 43 42 40 36 33 31 29 25 22 20 18 16 14 13 11 10  9  8  6  4  3  2  1 76 60 56 49 43 40 35 33 30 27
[34] 25 22 20 16 14 13 11 10  7  6  5  4  3  1
> fit$n.event
 [1] 1 0 1 2 1 0 0 2 1 1 1 0 0 0 1 1 1 0 0 1 0 1 0 6 0 2 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
> names(summary(fit))
 [1] "n"         "time"      "n.risk"    "n.event"   "n.censor"  "surv"      "type"      "strata"    "std.err"   "upper"
[11] "lower"     "conf.type" "conf.int"  "call"      "table"
> summary(fit)$n.risk [1] 43 40 36 33 25 22 20 18 11 10 9 4 2 76 56 49 35 14
> summary(fit)$n.event [1] 1 1 2 1 2 1 1 1 1 1 1 1 1 6 2 1 1 1
Building Our R Function
• We still need to think about how to determine Yi1 and di1 at every time where ≥ 1 event occurs
  – including times where group 1 has only censored observations
• We can certainly construct a risk set using what we get out of R.
  – Recall how we find the risk set…
Building Our R Function

> dat <- cbind(times, cens)[which(grp==1),]
> yij <- dij <- c()
> for (i in 1:length(tm))
+ {
+   tmi <- tm[i]
+   yij <- append(yij, length(which(dat[,1] >= tmi)))
+   dij <- append(dij, sum(dat[which(dat[,1]==tmi),2]))
+ }
> yij
 [1] 43 43 42 40 36 33 31 25 22 20 18 11 10  9  4  2
> dij
 [1] 0 1 0 1 2 1 0 2 1 1 1 1 1 1 1 1
Test Statistic

- We have all the parts we need to construct the "constant" portion of our test statistic.

> OmEi <- dij - yij*(di/Yi)
> vi <- (yij/Yi)*(1-yij/Yi)*((Yi-di)/(Yi-1))*di
> round(OmEi, 3)
 [1] -2.168  0.583 -0.857  0.101  1.089  0.548 -0.470  1.091  0.551
[10]  0.556  0.550  0.120  0.565  0.550  0.556  0.600
> round(vi, 3)
 [1] 1.326 0.243 0.485 0.489 0.490 0.248 0.249 0.487 0.247 0.247 0.248 0.472 0.246 0.248 0.247 0.240
-Now we need to estimate the weights so we can construct the weighted versions…
# generating weights
Sim1 <- c(1, fit$surv[which(fit$time %in% tm)][1:(length(tm)-1)])
if (wt=="lr")  Wti <- rep(1, length(tm))
if (wt=="geh") Wti <- Yi
if (wt=="tw")  Wti <- sqrt(Yi)
if (wt=="pp")  Wti <- cumprod(1-di/(Yi+1))
if (wt=="mpp") Wti <- cumprod(1-di/(Yi+1))*Yi/(Yi+1)
if (wt=="fh")
{
  if (missing(p) | missing(q))
    stop("Use of Fleming-Harrington Weights requires values for p and q")
  else Wti <- Sim1^p*(1-Sim1)^q
}
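Each branch is easy to check by hand. For example, using the pooled kidney-data Yi and di shown earlier, the first Peto-Peto weight is 1 - 6/120 = 0.95:

```r
# Check the Peto-Peto weights by hand using the pooled kidney-data Yi and di
Yi <- c(119, 103, 98, 89, 79, 73, 66, 55, 49, 45, 40, 25, 23, 20, 9, 5)
di <- c(6, 1, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1)

Wti <- cumprod(1 - di/(Yi + 1))   # survival-like product with Yi+1 in the denominator
round(Wti[1:3], 3)                # 0.950, 0.941, 0.922
```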
# Example using the Gehan weight
> wt <- "geh"
> if (wt=="geh") Wti <- Yi
> Wti
 [1] 119 103  98  89  79  73  66  55  49  45  40  25  23  20   9   5
Different Weight functions
Final Calculations

# Apply the chosen weight to our test statistic and its variance
> OmE <- as.numeric(t(Wti)%*%OmEi)
> v <- as.numeric(t(Wti^2)%*%vi)
> tstat <- OmE^2/v
> pval <- pchisq(tstat, df=1, lower.tail=F)

> OmE
[1] -9
> v
[1] 38861.81
> tstat
[1] 0.002084309
> pval
[1] 0.9635858
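The Gehan result can be reproduced from the rounded pieces printed above; with the rounded OmEi the weighted sum lands near the reported -9 (it is -9 with unrounded inputs).

```r
# Reproduce the Gehan statistic from the printed (rounded) pieces
Yi   <- c(119, 103, 98, 89, 79, 73, 66, 55, 49, 45, 40, 25, 23, 20, 9, 5)
OmEi <- c(-2.168, 0.583, -0.857, 0.101, 1.089, 0.548, -0.470, 1.091,
          0.551, 0.556, 0.550, 0.120, 0.565, 0.550, 0.556, 0.600)

OmE <- sum(Yi * OmEi)   # Gehan weight W(ti) = Yi
round(OmE, 1)           # -8.9 (exactly -9 with unrounded OmEi)
```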
survdiff_wts <- function(times, cens, grp, wt, p, q)
{
  fit <- survfit(Surv(times, cens)~1)
  tm <- summary(fit)$time
  Yi <- fit$n.risk[which(fit$time %in% tm)]
  di <- fit$n.event[which(fit$time %in% tm)]
  dat <- cbind(times, cens)[which(grp==1),]
  yij <- dij <- c()
  for (i in 1:length(tm))
  {
    tmi <- tm[i]
    yij <- append(yij, length(which(dat[,1] >= tmi)))
    dij <- append(dij, sum(dat[which(dat[,1]==tmi),2]))
  }
  OmEi <- dij - yij*(di/Yi)
  vi <- (yij/Yi)*(1-yij/Yi)*((Yi-di)/(Yi-1))*di
  Sim1 <- c(1, fit$surv[which(fit$time %in% tm)][1:(length(tm)-1)])
  if (wt=="lr")  Wti <- rep(1, length(tm))
  if (wt=="geh") Wti <- Yi
  if (wt=="tw")  Wti <- sqrt(Yi)
  if (wt=="pp")  Wti <- cumprod(1-di/(Yi+1))
  if (wt=="mpp") Wti <- cumprod(1-di/(Yi+1))*Yi/(Yi+1)
  if (wt=="fh")
  {
    if (missing(p) | missing(q))
      stop("Use of Fleming-Harrington Weights requires values for p and q")
    else Wti <- Sim1^p*(1-Sim1)^q
  }
  OmE <- as.numeric(t(Wti)%*%OmEi)
  v <- as.numeric(t(Wti^2)%*%vi)
  tstat <- OmE^2/v
  pval <- pchisq(tstat, df=1, lower.tail=F)
  ans <- list(weights=Wti, Z_tau=OmE, sig_11=v, chisq=tstat, pval=pval)
  names(ans) <- c("Weights", "Z_tau", "sig_11", "chisq value", "pvalue")
  return(ans)
}
Larynx Cancer
• 90 patients diagnosed with larynx cancer (1970s)
• Patients classified according to disease stage
  – Stages I-IV
• We are interested in survival
• BUT we want to compare the four stages
Kaplan-Meier curves
R: survdiff

> lar <- read.csv("H:public.html\\BMTRY_722_Summer2015\\Date\\larynx.csv")
> time <- lar$time; death <- lar$death; stage <- lar$stage
> st <- Surv(time, death)
> test0 <- survdiff(st ~ stage)
> test0
Call: survdiff(formula = st ~ stage)

        N Observed Expected (O-E)^2/E (O-E)^2/V
stage=1 33      15    22.57     2.537     4.741
stage=2 17       7    10.01     0.906     1.152
stage=3 27      17    14.08     0.603     0.856
stage=4 13      11     3.34    17.590    19.827

 Chisq= 22.8  on 3 degrees of freedom, p= 4.53e-05
R: survdiff

> test1 <- survdiff(st ~ stage, rho=1)
> test1
Call: survdiff(formula = st ~ stage, rho = 1)
…
 Chisq= 23.1  on 3 degrees of freedom, p= 3.85e-05

> test2 <- survdiff(st ~ stage, rho=3)
> test2
Call: survdiff(formula = st ~ stage, rho = 3)
…
 Chisq= 21.8  on 3 degrees of freedom, p= 7.03e-05
Recall the Fleming-Harrington weights: W(ti) = Ŝ(t(i-1))^p [1 - Ŝ(t(i-1))]^q; survdiff's rho argument corresponds to p = ρ, q = 0.
What about our hazards?
R: survdiff

> test3 <- survdiff(st[stage<3] ~ stage[stage<3])          # stage 1 vs 2
 Chisq= 0  on 1 degrees of freedom, p= 0.866
> test4 <- survdiff(st ~ factor(stage, exclude=c(2,4)))    # stage 1 vs 3
 Chisq= 3.1  on 1 degrees of freedom, p= 0.0801
> test5 <- survdiff(st ~ factor(stage, exclude=c(2,3)))    # stage 1 vs 4
 Chisq= 23.4  on 1 degrees of freedom, p= 1.32e-06
> test6 <- survdiff(st ~ factor(stage, exclude=c(1,4)))    # stage 2 vs 3
 Chisq= 1.5  on 1 degrees of freedom, p= 0.266
> test7 <- survdiff(st ~ factor(stage, exclude=c(1,3)))    # stage 2 vs 4
 Chisq= 11.5  on 1 degrees of freedom, p= 0.000679
> test8 <- survdiff(st[stage>2] ~ stage[stage>2])          # stage 3 vs 4
 Chisq= 0.5  on 1 degrees of freedom, p= 0.769
What about the differences

• Not much evidence of hazards crossing
• If there isn't overlap, then the tests will be somewhat consistent
• Log-rank: most appropriate when hazards are proportional
Test For Trends
• We generally perform tests for trend on ordinal variables
  – Dose level
  – PSA categories (prostate cancer)
  – Cancer stage
• Different from treating the variable as continuous, although that is one 'accepted' approach
• For continuous covariates, we need a regression model (we will get there shortly)
Formally: Tests for Trend

• Our hypothesis is

  H0: h1(t) = h2(t) = … = hK(t)
  HA: h1(t) ≤ h2(t) ≤ … ≤ hK(t), with at least 1 strict inequality

• Any weight function discussed previously can be used
• Test statistic:

  Z = [ Σ_{j=1}^{K} aj Zj(τ) ] / sqrt( Σ_{j=1}^{K} Σ_{g=1}^{K} aj ag σ̂jg )
Formally: Tests for Trend

• aj: weights, often chosen as aj = j, but they can be user-specified
• σ̂jg: the (j,g)th element of the estimated variance-covariance matrix of the Zj(τ)
• Under H0:

  Z = [ Σ_{j=1}^{K} aj Zj(τ) ] / sqrt( Σ_{j=1}^{K} Σ_{g=1}^{K} aj ag σ̂jg ) ~ N(0,1)
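A toy sanity check of this formula (made-up scores, with Z1(τ) and σ̂11 borrowed from the kidney log-rank row purely for illustration): in the two-sample case, Z2 = -Z1 and σ̂22 = σ̂11 = -σ̂12, so the trend statistic collapses to the ordinary two-sample statistic up to sign, whatever scores we pick.

```r
# Toy check: for K = 2, the trend statistic with scores a = (1, 2)
# reduces (up to sign) to the usual two-sample Z = Z_1 / sqrt(sigma_11)
z1 <- 3.96; s11 <- 6.21                        # illustrative values
zj <- c(z1, -z1)                               # Z_2 = -Z_1 (linear dependence)
sv <- matrix(c(s11, -s11, -s11, s11), 2, 2)    # var-cov matrix of (Z_1, Z_2)
aj <- c(1, 2)                                  # scores a_j

num <- sum(aj * zj)                            # = a1*z1 - a2*z1 = -z1
den <- sqrt(sum(outer(aj, aj) * sv))           # = sqrt(s11)
Ztrend <- num/den

c(Ztrend, -z1/sqrt(s11))                       # identical
```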
Stage: Ordinal Categories
Trend Test in R

# Test for trend in R
# Takes a fitted survdiff object ('test') and the scores a_j (passed as 'wt'),
# matching how the function is called on the next slide
surv.trendtest <- function(test, wt)
{
  aj <- wt                      # scores a_j, e.g. 1:K
  zj <- test$obs - test$exp     # Z_j(tau)
  zv <- test$var                # estimated variance-covariance matrix
  num <- sum(aj*zj)
  den <- 0
  for (i in 1:length(aj))
  {
    for (g in 1:length(aj))
    {
      den <- den + aj[i]*aj[g]*zv[i,g]
    }
  }
  den <- sqrt(den)
  zz <- num/den
  pval <- 2*(1-pnorm(abs(zz)))
  return(list(Z=zz, pvalue=pval))
}
Trend Test in R

> test.t0 <- surv.trendtest(test=test0, wt=1:4)
> test.t0
$Z
[1] 3.718959

$pvalue
[1] 0.0002000459

> test.t1 <- surv.trendtest(test=test1, wt=1:4)
> test.t1
$Z
[1] 4.120055

$pvalue
[1] 3.787827e-05
Next Time…

• Stratified tests
• Other K-sample tests