Lecture 10: Hypothesis Testing II
Weight Functions and Trend Tests
Testing >2 Samples in R

> ### 2-sample testing using toy example
> time <- c(3,6,9,9,11,16,8,9,10,12,19,23)
> cens <- c(1,0,1,1,0,1,1,1,0,0,1,0)
> grp <- c(1,1,1,1,1,1,2,2,2,2,2,2)
> grp <- as.factor(grp)
>
> sdat <- Surv(time, cens)
> survdiff(sdat ~ grp)
Call:
survdiff(formula = sdat ~ grp)

      N Observed Expected (O-E)^2/E (O-E)^2/V
grp=1 6        4     2.57     0.800      1.62
grp=2 6        3     4.43     0.463      1.62

 Chisq= 1.6  on 1 degrees of freedom, p= 0.203
Testing >2 Samples in R

> survdiff(sdat ~ grp, rho=1)
Call:
survdiff(formula = sdat ~ grp, rho = 1)

      N Observed Expected (O-E)^2/E (O-E)^2/V
grp=1 6     3.20     2.15     0.513      1.23
grp=2 6     2.11     3.16     0.349      1.23

 Chisq= 1.2  on 1 degrees of freedom, p= 0.268
Revisit 'Linear Dependence' of Zj(t)

• How are they linearly dependent?
• Two-sample case: Z1(τ) = -Z2(τ), so the two statistics carry the same information.

Revisit 'Linear Dependence' of Zj(t)

• K-sample case: Z1(τ) + Z2(τ) + … + ZK(τ) = 0, so only K-1 of the Zj(τ) are linearly independent (which is why the K-sample test has K-1 degrees of freedom).
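The linear dependence is easy to verify numerically. A small check using the toy two-sample data from the earlier slide: the group-wise observed-minus-expected statistics returned by survdiff always sum to zero.

```r
# Verify that the Z_j(tau) = O_j - E_j from survdiff sum to zero
# (toy two-sample data from the earlier slide)
library(survival)

time <- c(3, 6, 9, 9, 11, 16, 8, 9, 10, 12, 19, 23)
cens <- c(1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 1, 0)
grp  <- factor(c(1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2))

test <- survdiff(Surv(time, cens) ~ grp)
zj <- test$obs - test$exp   # Z_1(tau), Z_2(tau)
sum(zj)                     # 0 up to floating-point error
```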
Beyond Log-Rank
• The log-rank test has optimal power to detect HA when the hazard rates of our K groups are proportional
• What if they're not?
• We've mentioned using other weight functions
• Depending on the choice of weight function, we can place emphasis on different regions of the survival curve.
Example: Kidney Infection
• Data on 119 kidney dialysis patients
• Comparing time to kidney infection between two groups
  – Catheters placed percutaneously (n = 76)
  – Catheters placed surgically (n = 43)
Log-Rank Test

  ti  Yi1 di1 Yi2 di2  Yi  di  E(di1)   O-E     Vi
 0.5   43   0  76   6 119   6   2.168  -2.168  1.326
 1.5   43   1  60   0 103   1   0.417   0.583  0.243
 2.5   42   0  56   2  98   2   0.857  -0.857  0.485
 3.5   40   1  49   1  89   2   0.899   0.101  0.489
 4.5   36   2  43   0  79   2   0.911   1.089  0.490
 5.5   33   1  40   0  73   1   0.452   0.548  0.248
 6.5   31   0  35   1  66   1   0.470  -0.470  0.249
 8.5   25   2  30   0  55   2   0.909   1.091  0.487
 9.5   22   1  27   0  49   1   0.449   0.551  0.247
10.5   20   1  25   0  45   1   0.444   0.556  0.247
11.5   18   1  22   0  40   1   0.450   0.550  0.248
15.5   11   1  14   1  25   2   0.880   0.120  0.472
16.5   10   1  13   0  23   1   0.435   0.565  0.246
18.5    9   1  11   0  20   1   0.450   0.550  0.248
23.5    4   1   5   0   9   1   0.444   0.556  0.247
26.5    2   1   3   0   5   1   0.400   0.600  0.240
 Sum                                    3.964  6.211
The last three columns are the usual log-rank pieces computed at each event time ti:

  E(di1) = Yi1 (di / Yi)

  O - E = di1 - Yi1 (di / Yi)

  Vi = (Yi1 / Yi) (1 - Yi1 / Yi) ((Yi - di) / (Yi - 1)) di
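As a quick hand check, the first row of the table (t = 0.5: Yi1 = 43, di1 = 0, Yi = 119, di = 6) can be reproduced directly from these formulas:

```r
# Reproduce the first row of the log-rank table by hand
Yi1 <- 43; di1 <- 0; Yi <- 119; di <- 6

E   <- Yi1 * di / Yi                                        # expected events in group 1
OmE <- di1 - E                                              # observed minus expected
V   <- (Yi1/Yi) * (1 - Yi1/Yi) * ((Yi - di)/(Yi - 1)) * di  # hypergeometric variance

round(c(E = E, OmE = OmE, V = V), 3)   # 2.168, -2.168, 1.326
```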
Comparisons

Test                              W(ti)                                Z1(τ)   σ̂11     χ²     p-value
Log-Rank                          1                                     3.96    6.21    2.53   0.112
Gehan                             Yi                                   -9       38862   0.002  0.964
Tarone-Ware                       sqrt(Yi)                             13.2     432.83  0.4    0.526
Peto-Peto                         S̃(ti)                                 2.47    4.36    1.4    0.237
Modified Peto-Peto                S̃(ti) Yi/(Yi+1)                       2.31    4.2     1.28   0.259
Fleming-Harrington p=0, q=1       1 - Ŝ(t(i-1))                         1.41    0.21    9.67   0.002
Fleming-Harrington p=1, q=0       Ŝ(t(i-1))                             2.55    4.69    1.39   0.239
Fleming-Harrington p=1, q=1       Ŝ(t(i-1)) [1 - Ŝ(t(i-1))]             1.02    0.11    9.83   0.002
Fleming-Harrington p=0.5, q=0.5   Ŝ(t(i-1))^0.5 [1 - Ŝ(t(i-1))]^0.5     2.47    0.66    9.28   0.002
Fleming-Harrington p=0.5, q=2     Ŝ(t(i-1))^0.5 [1 - Ŝ(t(i-1))]^2       0.32    0.01    8.18   0.004
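Each row of the table follows the same recipe: χ² = Z1(τ)² / σ̂11 on 1 degree of freedom. For instance, the log-rank row:

```r
# Chi-square and p-value for the log-rank row of the comparisons table
Z   <- 3.96   # Z_1(tau)
s11 <- 6.21   # estimated variance sigma_11

chisq <- Z^2 / s11
pval  <- pchisq(chisq, df = 1, lower.tail = FALSE)

round(c(chisq = chisq, p = pval), 3)   # 2.525, 0.112
```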
Notice the Differences!

• Situation of varying inference
• Need to be sure you are testing what you think you are testing
• Check
  – Look at the hazards
  – Do they cross?
• Problem
  – Estimating hazards is imprecise (as we've discussed)
[Figure: Cumulative Hazards]

[Figure: Hazard Rate (smoothing spline)]
Misconception

• "Survival curves crossing tells us whether the log-rank test is appropriate"
• Not true
  – Whether survival curves cross depends on censoring and study duration
  – What if they cross, but we don't follow patients far enough out to see it?
• Consider
  – Survival curves cross ⇒ hazards cross
  – Hazards cross ⇒ survival curves may or may not cross
• Solution?
  – Test regions of t: before and after the cross, based on looking at the hazard
  – Some tests allow for crossing (Yang and Prentice, 2005)
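A small numeric sketch of the second "Consider" bullet, using two made-up hazards h1(t) = 1 and h2(t) = 0.5 + 0.25 t (purely illustrative, not from the kidney data): the hazards cross at t = 2, but the survival curves do not cross until t = 4, so a study that ends follow-up at t = 3 sees crossing hazards without crossing survival curves.

```r
# Hypothetical hazards: h1(t) = 1 (constant), h2(t) = 0.5 + 0.25*t (increasing)
# Hazards cross where 1 = 0.5 + 0.25*t        =>  t = 2
# Cumulative hazards: H1(t) = t, H2(t) = 0.5*t + 0.125*t^2
# Survival curves cross where H1(t) = H2(t)   =>  t = 4
t  <- seq(0.1, 3, by = 0.1)              # follow-up truncated at t = 3
S1 <- exp(-t)
S2 <- exp(-(0.5*t + 0.125*t^2))
all(S2 > S1)                             # TRUE: curves never cross before t = 3
```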
Take-home

• Choice of weight function can be critical
• K&M recommend applying the log-rank and Gehan tests
• (Simple) Cox regression is akin to the log-rank test
• Think carefully about the distribution of weights and about possible crossing of hazards
What About Weights…
• We know that R has a limited selection of weight functions.
• SAS doesn't seem to allow us to specify any weights (at least not in proc lifetest).
• So of course we can write our own function…
R Function for Different Weights
• What information will we need to construct the different weights?
• Can we get this information from R?
Building Our R Function
> times <- kidney$Time
> cens <- kidney$d
> grp <- kidney$cath
> fit <- survfit(Surv(times, cens)~1)
> tm <- summary(fit)$time
> Yi <- fit$n.risk[which(fit$time %in% tm)]
> di <- fit$n.event[which(fit$time %in% tm)]
> Yi
 [1] 119 103  98  89  79  73  66  55  49  45  40  25  23  20   9   5
> di
 [1] 6 1 2 2 2 1 1 2 1 1 1 2 1 1 1 1
> fit <- survfit(st ~ kidney$cath)
> summary(fit)
Call: survfit(formula = st ~ kidney$cath)

                kidney$cath=1
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
  1.5     43       1    0.977  0.0230       0.9327        1.000
  3.5     40       1    0.952  0.0329       0.8899        1.000
  4.5     36       2    0.899  0.0478       0.8104        0.998
  5.5     33       1    0.872  0.0536       0.7732        0.984
  8.5     25       2    0.802  0.0683       0.6790        0.948
  9.5     22       1    0.766  0.0743       0.6332        0.926
 10.5     20       1    0.728  0.0799       0.5868        0.902
 11.5     18       1    0.687  0.0851       0.5392        0.876
 15.5     11       1    0.625  0.0976       0.4599        0.849
 16.5     10       1    0.562  0.1060       0.3886        0.813
 18.5      9       1    0.500  0.1111       0.3233        0.773
 23.5      4       1    0.375  0.1366       0.1835        0.766
 26.5      2       1    0.187  0.1491       0.0394        0.891

                kidney$cath=2
 time n.risk n.event survival std.err lower 95% CI upper 95% CI
  0.5     76       6    0.921  0.0309        0.862        0.984
  2.5     56       2    0.888  0.0376        0.817        0.965
  3.5     49       1    0.870  0.0409        0.793        0.954
  6.5     35       1    0.845  0.0467        0.758        0.942
 15.5     14       1    0.785  0.0726        0.655        0.941
> names(fit)
 [1] "n"         "time"      "n.risk"    "n.event"   "n.censor"  "surv"      "type"      "strata"    "std.err"   "upper"
[11] "lower"     "conf.type" "conf.int"  "call"
> fit$n.risk
 [1] 43 42 40 36 33 31 29 25 22 20 18 16 14 13 11 10  9  8  6  4  3  2  1 76 60 56 49 43 40 35 33 30 27
[34] 25 22 20 16 14 13 11 10  7  6  5  4  3  1
> fit$n.event
 [1] 1 0 1 2 1 0 0 2 1 1 1 0 0 0 1 1 1 0 0 1 0 1 0 6 0 2 1 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
> names(summary(fit))
 [1] "n"         "time"      "n.risk"    "n.event"   "n.censor"  "surv"      "type"      "strata"    "std.err"   "upper"
[11] "lower"     "conf.type" "conf.int"  "call"      "table"
> summary(fit)$n.risk [1] 43 40 36 33 25 22 20 18 11 10 9 4 2 76 56 49 35 14
> summary(fit)$n.event [1] 1 1 2 1 2 1 1 1 1 1 1 1 1 6 2 1 1 1
Building Our R Function
• We still need to think about how to determine Yi1 and di1 at every time where ≥ 1 event occurs
  – including times where group 1 has only censored observations
• We can certainly construct a risk set using what we get out of R.
  – Recall how we find the risk set…
Building Our R Function

> dat <- cbind(times, cens)[which(grp==1),]
> yij <- dij <- c()
> for (i in 1:length(tm))
+ {
+   tmi <- tm[i]
+   yij <- append(yij, length(which(dat[,1] >= tmi)))
+   dij <- append(dij, sum(dat[which(dat[,1]==tmi),2]))
+ }
> yij
 [1] 43 43 42 40 36 33 31 25 22 20 18 11 10  9  4  2
> dij
 [1] 0 1 0 1 2 1 0 2 1 1 1 1 1 1 1 1
Test Statistic

- We have all the parts we need to construct the "constant" portion of our test statistic.

> OmEi <- dij - yij*(di/Yi)
> vi <- (yij/Yi)*(1-yij/Yi)*((Yi-di)/(Yi-1))*di
> round(OmEi, 3)
 [1] -2.168  0.583 -0.857  0.101  1.089  0.548 -0.470  1.091  0.551
[10]  0.556  0.550  0.120  0.565  0.550  0.556  0.600
> round(vi, 3)
 [1] 1.326 0.243 0.485 0.489 0.490 0.248 0.249 0.487 0.247 0.247 0.248 0.472 0.246 0.248 0.247 0.240
-Now we need to estimate the weights so we can construct the weighted versions…
# generating weights
Sim1 <- c(1, fit$surv[which(fit$time %in% tm)][1:(length(tm)-1)])
if (wt=="lr")  Wti <- rep(1, length(tm))
if (wt=="geh") Wti <- Yi
if (wt=="tw")  Wti <- sqrt(Yi)
if (wt=="pp")  Wti <- cumprod(1-di/(Yi+1))
if (wt=="mpp") Wti <- cumprod(1-di/(Yi+1))*Yi/(Yi+1)
if (wt=="fh")
{
  if (missing(p) | missing(q))
    stop("Use of Fleming-Harrington Weights requires values for p and q")
  else Wti <- Sim1^p*(1-Sim1)^q
}
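Each branch is easy to check by hand. For example, using the pooled kidney-data Yi and di shown earlier, the first Peto-Peto weight is 1 - 6/120 = 0.95:

```r
# Check the Peto-Peto weights by hand using the pooled kidney-data Yi and di
Yi <- c(119, 103, 98, 89, 79, 73, 66, 55, 49, 45, 40, 25, 23, 20, 9, 5)
di <- c(6, 1, 2, 2, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 1, 1)

Wti <- cumprod(1 - di/(Yi + 1))   # survival-like product with Yi+1 in the denominator
round(Wti[1:3], 3)                # 0.950, 0.941, 0.922
```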
# Example using the Gehan weight
> wt <- "geh"
> if (wt=="geh") Wti <- Yi
> Wti
 [1] 119 103  98  89  79  73  66  55  49  45  40  25  23  20   9   5
Different Weight functions
Final Calculations

# Apply the chosen weight to our test statistic and its variance
> OmE <- as.numeric(t(Wti)%*%OmEi)
> v <- as.numeric(t(Wti^2)%*%vi)
> tstat <- OmE^2/v
> pval <- pchisq(tstat, df=1, lower.tail=F)

> OmE
[1] -9
> v
[1] 38861.81
> tstat
[1] 0.002084309
> pval
[1] 0.9635858
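The Gehan result can be reproduced from the rounded pieces printed above; with the rounded OmEi the weighted sum lands near the reported -9 (it is -9 with unrounded inputs).

```r
# Reproduce the Gehan statistic from the printed (rounded) pieces
Yi   <- c(119, 103, 98, 89, 79, 73, 66, 55, 49, 45, 40, 25, 23, 20, 9, 5)
OmEi <- c(-2.168, 0.583, -0.857, 0.101, 1.089, 0.548, -0.470, 1.091,
          0.551, 0.556, 0.550, 0.120, 0.565, 0.550, 0.556, 0.600)

OmE <- sum(Yi * OmEi)   # Gehan weight W(ti) = Yi
round(OmE, 1)           # -8.9 (exactly -9 with unrounded OmEi)
```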
survdiff_wts <- function(times, cens, grp, wt, p, q)
{
  fit <- survfit(Surv(times, cens)~1)
  tm <- summary(fit)$time
  Yi <- fit$n.risk[which(fit$time %in% tm)]
  di <- fit$n.event[which(fit$time %in% tm)]
  dat <- cbind(times, cens)[which(grp==1),]
  yij <- dij <- c()
  for (i in 1:length(tm))
  {
    tmi <- tm[i]
    yij <- append(yij, length(which(dat[,1] >= tmi)))
    dij <- append(dij, sum(dat[which(dat[,1]==tmi),2]))
  }
  OmEi <- dij - yij*(di/Yi)
  vi <- (yij/Yi)*(1-yij/Yi)*((Yi-di)/(Yi-1))*di
  Sim1 <- c(1, fit$surv[which(fit$time %in% tm)][1:(length(tm)-1)])
  if (wt=="lr")  Wti <- rep(1, length(tm))
  if (wt=="geh") Wti <- Yi
  if (wt=="tw")  Wti <- sqrt(Yi)
  if (wt=="pp")  Wti <- cumprod(1-di/(Yi+1))
  if (wt=="mpp") Wti <- cumprod(1-di/(Yi+1))*Yi/(Yi+1)
  if (wt=="fh")
  {
    if (missing(p) | missing(q))
      stop("Use of Fleming-Harrington Weights requires values for p and q")
    else Wti <- Sim1^p*(1-Sim1)^q
  }
  OmE <- as.numeric(t(Wti)%*%OmEi)
  v <- as.numeric(t(Wti^2)%*%vi)
  tstat <- OmE^2/v
  pval <- pchisq(tstat, df=1, lower.tail=F)
  ans <- list(weights=Wti, Z_tau=OmE, sig_11=v, chisq=tstat, pval=pval)
  names(ans) <- c("Weights", "Z_tau", "sig_11", "chisq value", "pvalue")
  return(ans)
}
Larynx Cancer
• 90 patients diagnosed with larynx cancer (1970s)
• Patients classified according to disease stage
  – Stages I-IV
• We are interested in survival
• BUT we want to compare the four stages
Kaplan-Meier curves
R: survdiff

> lar <- read.csv("H:public.html\\BMTRY_722_Summer2015\\Date\\larynx.csv")
> time <- lar$time; death <- lar$death; stage <- lar$stage
> st <- Surv(time, death)
> test0 <- survdiff(st ~ stage)
> test0
Call: survdiff(formula = st ~ stage)

        N Observed Expected (O-E)^2/E (O-E)^2/V
stage=1 33      15    22.57     2.537     4.741
stage=2 17       7    10.01     0.906     1.152
stage=3 27      17    14.08     0.603     0.856
stage=4 13      11     3.34    17.590    19.827

 Chisq= 22.8  on 3 degrees of freedom, p= 4.53e-05
R: survdiff

> test1 <- survdiff(st ~ stage, rho=1)
> test1
Call: survdiff(formula = st ~ stage, rho = 1)
…
 Chisq= 23.1  on 3 degrees of freedom, p= 3.85e-05

> test2 <- survdiff(st ~ stage, rho=3)
> test2
Call: survdiff(formula = st ~ stage, rho = 3)
…
 Chisq= 21.8  on 3 degrees of freedom, p= 7.03e-05
Recall the Fleming-Harrington weights: W(ti) = Ŝ(t(i-1))^p [1 - Ŝ(t(i-1))]^q; survdiff's rho argument corresponds to p = ρ, q = 0.
What about our hazards?
R: survdiff

> test3 <- survdiff(st[stage<3] ~ stage[stage<3])          # stage 1 vs 2
 Chisq= 0  on 1 degrees of freedom, p= 0.866
> test4 <- survdiff(st ~ factor(stage, exclude=c(2,4)))    # stage 1 vs 3
 Chisq= 3.1  on 1 degrees of freedom, p= 0.0801
> test5 <- survdiff(st ~ factor(stage, exclude=c(2,3)))    # stage 1 vs 4
 Chisq= 23.4  on 1 degrees of freedom, p= 1.32e-06
> test6 <- survdiff(st ~ factor(stage, exclude=c(1,4)))    # stage 2 vs 3
 Chisq= 1.5  on 1 degrees of freedom, p= 0.266
> test7 <- survdiff(st ~ factor(stage, exclude=c(1,3)))    # stage 2 vs 4
 Chisq= 11.5  on 1 degrees of freedom, p= 0.000679
> test8 <- survdiff(st[stage>2] ~ stage[stage>2])          # stage 3 vs 4
 Chisq= 0.5  on 1 degrees of freedom, p= 0.769
What about the differences

• Not much evidence of hazards crossing
• If there isn't overlap, then the tests will be somewhat consistent
• Log-rank: most appropriate when hazards are proportional
Test For Trends
• We generally perform tests for trend on ordinal variables
  – Dose level
  – PSA categories (prostate cancer)
  – Cancer stage
• Different from treating the variable as continuous, although that is one 'accepted' approach
• For continuous covariates, we need a regression model (we will get there shortly)
Formally: Tests for Trend

• Our hypothesis is

  H0: h1(t) = h2(t) = … = hK(t)
  HA: h1(t) ≤ h2(t) ≤ … ≤ hK(t), with at least 1 strict inequality

• Any weight function discussed previously can be used
• Test statistic:

  Z = [ Σ_{j=1}^{K} aj Zj(τ) ] / sqrt( Σ_{j=1}^{K} Σ_{g=1}^{K} aj ag σ̂jg )
Formally: Tests for Trend

• aj: weights, often chosen as aj = j, but they can be user-specified
• σ̂jg: the (j,g)th element of the estimated variance-covariance matrix of the Zj(τ)
• Under H0:

  Z = [ Σ_{j=1}^{K} aj Zj(τ) ] / sqrt( Σ_{j=1}^{K} Σ_{g=1}^{K} aj ag σ̂jg ) ~ N(0,1)
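A toy sanity check of this formula (made-up scores, with Z1(τ) and σ̂11 borrowed from the kidney log-rank row purely for illustration): in the two-sample case, Z2 = -Z1 and σ̂22 = σ̂11 = -σ̂12, so the trend statistic collapses to the ordinary two-sample statistic up to sign, whatever scores we pick.

```r
# Toy check: for K = 2, the trend statistic with scores a = (1, 2)
# reduces (up to sign) to the usual two-sample Z = Z_1 / sqrt(sigma_11)
z1 <- 3.96; s11 <- 6.21                        # illustrative values
zj <- c(z1, -z1)                               # Z_2 = -Z_1 (linear dependence)
sv <- matrix(c(s11, -s11, -s11, s11), 2, 2)    # var-cov matrix of (Z_1, Z_2)
aj <- c(1, 2)                                  # scores a_j

num <- sum(aj * zj)                            # = a1*z1 - a2*z1 = -z1
den <- sqrt(sum(outer(aj, aj) * sv))           # = sqrt(s11)
Ztrend <- num/den

c(Ztrend, -z1/sqrt(s11))                       # identical
```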
Stage: Ordinal Categories
Trend Test in R

# Test for trend in R
# Takes a fitted survdiff object ('test') and the scores a_j (passed as 'wt'),
# matching how the function is called on the next slide
surv.trendtest <- function(test, wt)
{
  aj <- wt                      # scores a_j, e.g. 1:K
  zj <- test$obs - test$exp     # Z_j(tau)
  zv <- test$var                # estimated variance-covariance matrix
  num <- sum(aj*zj)
  den <- 0
  for (i in 1:length(aj))
  {
    for (g in 1:length(aj))
    {
      den <- den + aj[i]*aj[g]*zv[i,g]
    }
  }
  den <- sqrt(den)
  zz <- num/den
  pval <- 2*(1-pnorm(abs(zz)))
  return(list(Z=zz, pvalue=pval))
}
Trend Test in R

> test.t0 <- surv.trendtest(test=test0, wt=1:4)
> test.t0
$Z
[1] 3.718959

$pvalue
[1] 0.0002000459

> test.t1 <- surv.trendtest(test=test1, wt=1:4)
> test.t1
$Z
[1] 4.120055

$pvalue
[1] 3.787827e-05
Next Time…

• Stratified tests
• Other K-sample tests