External validation in biomarker research:
examples from gene profiling
Marc Buyse, ScD
IDDI, Louvain-la-Neuve, and
Hasselt University, Diepenbeek, Belgium
Challenges in design, analysis and reporting of prognostic and predictive marker research
Freiburg, October 8, 2008
Current state of tumor markers
“There are few tumor markers that are clinically useful in predicting therapeutic response or patient outcomes despite nearly 20 years of advances in molecular biology.”
Hammond and Taube, Seminars in Oncology, 2002
Few predictive markers
Few prognostic markers
Reasons for conflicting results in biomarker studies
• Different assay protocols or measurement techniques
• Specimen format (fresh-frozen vs. fixed tissue, serum)
• Different clinical endpoints (e.g., response, DFS, OS)
• Different patient populations (e.g., stage, treatments)
• Single study without independent confirmation
• Statistical issues (next slides)
Simon et al, JNCI 2003;95,14; Lusa et al, Statist in Med 2007; 26, 1102.
Statistical reasons for conflicting results in biomarker studies - 1
• Underpowered
– small sample sizes
– few “events”
– insensitive tests for interaction
• Over-analyzed
– multiple endpoints
– cutpoint optimization
– model overfitting
– subset analyses
Simon et al, JNCI 2003;95,14; Lusa et al, Statist in Med 2007; 26, 1102.
Statistical reasons for conflicting results in biomarker studies - 2
• No prospective protocol
• Data dredging
• No control of multiplicity
• Inappropriate statistics (P-values, odds ratios)
• Publication bias
• Poor reporting
Simon et al, JNCI 2003;95,14; Lusa et al, Statist in Med 2007; 26, 1102.
Example of molecular profiling in early breast cancer
• Signatures predictive of outcome:
– « Amsterdam » 70-gene signature (Agendia)
– « Rotterdam » 76-gene signature (Veridex)
– « Oncotype DX » 21-gene signature (Genomic Health)
• Signature predictive of risk (pathological grade):
– « Genomic grade index » (GGI) (Institut Bordet)
van de Vijver et al, NEJM 2002;347,1999; Paik et al, NEJM 2004;351,2817; Wang et al, Lancet 2005;365:671; Sotiriou et al, JNCI 2006;98:262.
Signatures predictive of outcome
• Measure ≈ 25,000 genes in RNA from breast tumors
• Apply algorithm to identify classifier
– Good class: no metastases at 5 (or 10) years
– Poor class: metastases within 5 (or 10) years
The “Amsterdam” (Agendia) signature
• Discovery (or “training”) set:
– 78 node negative patients
– tumor < 5 cm
– < 55 years old
– ER- or ER+
– Few or none received endocrine therapy or chemotherapy
• Validation (or “test”) set:
– 295 patients (including 61/78 from discovery set)
– 151 node negative / 144 node positive patients
van de Vijver et al, NEJM 2002;347,1999
The “Oncotype DX” signature
• Recurrence score < 18: risk = 7% (95% CI: 4%, 10%)
• Recurrence score 18-30: risk = 14% (95% CI: 8%, 20%)
• Recurrence score > 30: risk = 31% (95% CI: 24%, 37%)
Paik et al, NEJM 2004;351,2817
Signatures predictive of risk
• Measure ≈ 25,000 genes in RNA from breast tumors
• Apply algorithm to identify classifier
– Good class: histological grade 1
– Poor class: histological grade 3
Affymetrix U133A, 22,283 probe sets
The genomic grade
• Discovery (or “training”) set:
– 64 node negative patients (33 histological grade 1, 31 histological grade 3)
– All ER+
– All untreated
• Validation (or “test”) set:
– 129 new patients
– 300 patients from published datasets
Sotiriou et al., J Natl Cancer Inst 2006;98:262.
Histologic grade (G1, G2, G3)
• G2: poor inter-observer reproducibility
• G2: difficult treatment decision-making, under- or overtreatment likely
Genomic grade (GG1, GG2, GG3)
• More objective assessment (based on gene expression)
• Easier treatment decision-making
• Most genes involved in cell proliferation
Sotiriou et al., J Natl Cancer Inst 2006;98:262.
Signature predictive of risk (grade)
[Figure: GG3 vs. GG1]
Problems with discovery set (Amsterdam signature)
Cross-validation in the discovery set excluded gene selection; this may have led to overestimation of the odds ratio in the discovery set.
Simon et al, JNCI 2003;95,14.
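The selection bias described here can be illustrated with a small simulation. The sketch below is not the analysis from the cited papers: it assumes pure-noise "expression" data, a simple nearest-centroid classifier and leave-one-out cross-validation, and all names and sizes (`top_genes`, `loo_accuracy`, 40 samples, 300 genes, 20 selected genes) are hypothetical choices made for illustration. It compares gene selection done once on all samples (outside the cross-validation loop) with gene selection repeated inside each fold.

```python
import random

def mean_diff(X, y, j):
    """Difference in mean expression of gene j between class 1 and class 0."""
    g1 = [row[j] for row, lab in zip(X, y) if lab == 1]
    g0 = [row[j] for row, lab in zip(X, y) if lab == 0]
    return sum(g1) / len(g1) - sum(g0) / len(g0)

def top_genes(X, y, k):
    """Indices of the k genes with the largest absolute class difference."""
    p = len(X[0])
    return sorted(range(p), key=lambda j: -abs(mean_diff(X, y, j)))[:k]

def predict(X_train, y_train, x_new, genes):
    """Nearest-centroid prediction restricted to the selected genes."""
    dist = {}
    for lab in (0, 1):
        rows = [r for r, l in zip(X_train, y_train) if l == lab]
        centroid = [sum(r[j] for r in rows) / len(rows) for j in genes]
        dist[lab] = sum((x_new[j] - c) ** 2 for j, c in zip(genes, centroid))
    return 0 if dist[0] <= dist[1] else 1

def loo_accuracy(X, y, k, select_inside):
    """Leave-one-out accuracy; gene selection inside or outside the loop."""
    genes_all = top_genes(X, y, k)  # uses every sample, including the one left out
    hits = 0
    for i in range(len(y)):
        X_tr, y_tr = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = top_genes(X_tr, y_tr, k) if select_inside else genes_all
        hits += predict(X_tr, y_tr, X[i], genes) == y[i]
    return hits / len(y)

# Hypothetical noise data: the class labels carry no information about X.
random.seed(0)
n, p, k = 40, 300, 20
X = [[random.gauss(0, 1) for _ in range(p)] for _ in range(n)]
y = [i % 2 for i in range(n)]

biased = loo_accuracy(X, y, k, select_inside=False)
unbiased = loo_accuracy(X, y, k, select_inside=True)
print(f"selection outside CV: {biased:.2f}, selection inside CV: {unbiased:.2f}")
```

Because the labels are unrelated to the data, an honest estimate should sit near chance; selecting genes outside the loop typically reports a much higher accuracy, which is the kind of optimism the slide warns about.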
Problems with validation set (Amsterdam signature)
The validation set (295 patients) included some patients from the discovery set; this may have led to overestimation of the odds ratio in the validation set.
Lusa et al, Statist in Med 2007; 26, 1102.
Is predictive accuracy acceptable? (Amsterdam signature)
OR = (31 / 18) / (3 / 26) ≈ 15
Sensitivity = 31 / 34 = 0.91
Specificity = 26 / 44 = 0.59
van de Vijver et al, NEJM 2002;347,1999
The odds ratio is not a good indicator of predictive accuracy
Sensitivity = 91%
Specificity = 59%
Pepe et al, Am J Epidemiol 2004;159:882.
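A short worked example may help show why a large odds ratio says little about classification accuracy. The sketch below uses the 2 x 2 counts quoted for the Amsterdam signature (31 and 18 in the poor-signature group, 3 and 26 in the good-signature group); the function names are hypothetical. It relies on the identity OR = [sens / (1 - sens)] x [spec / (1 - spec)], so a single odds ratio is compatible with many different sensitivity/specificity pairs.

```python
def accuracy_measures(tp, fp, fn, tn):
    """Sensitivity, specificity and odds ratio from a 2 x 2 table."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    odds_ratio = (tp / fp) / (fn / tn)
    return sens, spec, odds_ratio

def specificity_for(sens, odds_ratio):
    """Specificity implied by a given sensitivity and odds ratio."""
    # OR = [sens/(1-sens)] * [spec/(1-spec)]  =>  spec/(1-spec) = OR * (1-sens)/sens
    odds_spec = odds_ratio * (1 - sens) / sens
    return odds_spec / (1 + odds_spec)

# Counts from the slide: poor signature 31 with / 18 without metastases,
# good signature 3 with / 26 without metastases.
sens, spec, odds_ratio = accuracy_measures(tp=31, fp=18, fn=3, tn=26)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} OR={odds_ratio:.1f}")

# The same odds ratio at other operating points (illustrative sensitivities).
for s in (0.50, 0.70, 0.90):
    print(f"sensitivity={s:.2f} -> specificity={specificity_for(s, odds_ratio):.2f}")
```

The loop makes the point of the slide concrete: holding the odds ratio fixed, specificity can range from modest to high depending on where sensitivity is set.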
Is predictive accuracy acceptable? (Rotterdam signature)
[Figure: probability of distant metastasis at 5 years (0% to 100%) versus relapse hazard score (-75 to 145), good-prognosis group and poor-prognosis group]
Sensitivity: 52/56 = 93%
Specificity: 55/115 = 48%
Adapted from Foekens, Erasmus Medical Center, Rotterdam, the Netherlands; Wang et al, Lancet 2005;365:671.
External validation of signatures
• Tumor samples and clinical data: 326 patients from 5 European institutions (N-, < 5 cm tumors, < 61 years old; 19 patients disqualified)
• Median follow-up: 13.6 years
• Endpoints:
– Time to distant metastases
– Overall survival
– Disease-free survival
Buyse et al, JNCI 2006;98,1183.
External validation of signatures
• Sites: IGR (Villejuif), JRH (Oxford), GH (London), KI (Stockholm), CRH (Paris), NKI (Amsterdam)
• Central pathology review (Milan)
• Central microarray analyses / review (Amsterdam / Lausanne)
• Central validation analyses (IDDI, Brussels)
• Independent clinical site audits
Key validation issues
1 – is prognostic value confirmed?
Time to distant metastases
[Figure: probability (0.0 to 1.0) versus year (0 to 14), by risk group; annotated levels 70% and 90%]
Patients   Events   Risk group
52         7        Gene signature low risk, clinical low risk
59         11       Gene signature low risk, clinical high risk
28         6        Gene signature high risk, clinical low risk
163        52       Gene signature high risk, clinical high risk
HR (signature) = 2.32 [1.35 – 4.00]
GGI compared with Amsterdam signature
Time to distant metastases
[Figure: probability (0.0 to 1.0) versus year (0 to 26), by risk group; annotated levels 70% and 90%]
Patients   Events   Risk group
113        18       Agendia gene signature low risk
192        57       Agendia gene signature high risk
92         15       GGI low risk
213        60       GGI high risk
Key validation issues
1 – is prognostic value confirmed?
• Regression to the mean?
• Patient selection?
• Other reason?
Buyse et al, J Natl Cancer Inst 2006;98:1183; Desmedt et al, Clin Cancer Res 2007;13:3207.
Key validation issues
2 – is signature independent of clinical risk?
Clinical risk: Adjuvant! Online
Time to distant metastases
[Figure: adjusted hazard ratio for gene signature (log scale, 0.1 to 10) and proportion of patients in high clinical risk group, by threshold defining high clinical risk]
High clinical risk defined as probability of 10-year survival lower than: 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%
Proportion of patients in high clinical risk group: 1%, 10%, 21%, 30%, 40%, 62%, 87%, 98%
Adjusted hazard ratio for gene signature: 2.29, 2.13, 2.27, 2.34, 1.87, 2.04, 2.56, 2.51
Key validation issues
3 – is prognostic value robust across sites?
Time to distant metastases
[Forest plot: adjusted HR on a log scale (0.2 to 10); left of 1.0 = high risk better, right of 1.0 = low risk better]
Study   Low risk (events / patients)   High risk (events / patients)   Adjusted HR (CI)
IGR     7 / 46                         15 / 50                         2.06 (0.81, 5.25)
KI      2 / 27                         10 / 33                         9.93 (1.13, 87.27)
CRH     4 / 17                         8 / 38                          1.11 (0.31, 3.99)
GH      5 / 16                         18 / 37                         1.64 (0.57, 4.71)
JRH     0 / 5                          7 / 33                          >100 (-, -)
Tot     18 / 111                       58 / 191                        2.13 (1.19, 3.82)
NKI     7 / 60                         47 / 91                         6.07 (2.64, 13.98)
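One rough way to look at between-site consistency from the published numbers alone is an inverse-variance pooling of the log hazard ratios with Cochran's Q. This is a sketch under stated assumptions, not the analysis behind the slide: standard errors are back-calculated from the quoted intervals assuming they are 95% Wald intervals, JRH is omitted because no finite estimate is given, and the pooled value is not expected to reproduce the "Tot" row exactly.

```python
import math

# (site, adjusted HR, lower limit, upper limit) as quoted on the slide.
# JRH is left out: with 0 events in the low-risk group no finite HR is reported.
sites = [
    ("IGR", 2.06, 0.81, 5.25),
    ("KI", 9.93, 1.13, 87.27),
    ("CRH", 1.11, 0.31, 3.99),
    ("GH", 1.64, 0.57, 4.71),
]

def log_hr_and_se(hr, lower, upper, z=1.96):
    """Log hazard ratio and its standard error, assuming a 95% Wald interval."""
    return math.log(hr), (math.log(upper) - math.log(lower)) / (2 * z)

estimates, weights = [], []
for _, hr, lower, upper in sites:
    est, se = log_hr_and_se(hr, lower, upper)
    estimates.append(est)
    weights.append(1 / se ** 2)  # inverse-variance weight

# Fixed-effect pooled log HR and Cochran's Q statistic for heterogeneity.
pooled = sum(w * e for w, e in zip(weights, estimates)) / sum(weights)
q_stat = sum(w * (e - pooled) ** 2 for w, e in zip(weights, estimates))
df = len(sites) - 1
pooled_hr = math.exp(pooled)

print(f"pooled HR = {pooled_hr:.2f}, Q = {q_stat:.2f} on {df} df")
```

Q is compared with a chi-square distribution on `df` degrees of freedom (upper 5% point about 7.81 for 3 df); a value well below that threshold gives no evidence of heterogeneity, although with only four sites the test has little power.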
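Key validation issues
4 – is prognostic value constant over time?
Time to distant metastases
[Figure: adjusted hazard ratio for gene signature (log scale, 0.1 to 10) and cumulative proportion of events, by censoring time]
Censoring time (in years): 2, 3, 4, 5, 7, 10, 15, none
Cumulative proportion of events: 29%, 39%, 50%, 62%, 75%, 83%, 96%, 100%
Adjusted hazard ratio for gene signature: 4.52, 7.54, 4.68, 3.24, 3.5, 9.14, 2.13, 2.33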
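Little time dependency of clinical risk
Time to distant metastases
[Figure: adjusted hazard ratio for clinical risk (log scale, 0.1 to 10) and cumulative proportion of events, by censoring time]
Censoring time (in years): 2, 3, 4, 5, 7, 10, 15, none
Cumulative proportion of events: 27%, 43%, 55%, 69%, 73%, 80%, 96%, 100%
Adjusted hazard ratio for clinical risk: 4.19, 3.20, 3.29, 2.93, 3.56, 3.91, 2.28, 2.72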
Key validation issues
5 – is predictive accuracy acceptable?
Metastases within 5 years                             Sensitivity   Specificity
Gene signature                                        0.90          0.42
Adjuvant! software                                    0.87          0.29
NPI                                                   0.91          0.32
St Gallen criteria                                    0.96          0.10
Adjuvant! software concordant with gene signature     0.93          0.28
Adjuvant! software discordant with gene signature     0.40          0.30
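The sensitivity/specificity pairs above can be condensed into single-number summaries to make the comparison easier to read. The sketch below computes Youden's index (sensitivity + specificity - 1) and the positive likelihood ratio (sensitivity / (1 - specificity)) for the four classifiers in the table; these two summaries are an illustrative choice, not measures reported on the slide.

```python
# (sensitivity, specificity) for metastases within 5 years, as tabulated above.
classifiers = {
    "Gene signature": (0.90, 0.42),
    "Adjuvant! software": (0.87, 0.29),
    "NPI": (0.91, 0.32),
    "St Gallen criteria": (0.96, 0.10),
}

def youden_index(sens, spec):
    """Youden's J: 0 for an uninformative test, 1 for a perfect one."""
    return sens + spec - 1

def positive_lr(sens, spec):
    """Positive likelihood ratio: how much a positive result raises the odds."""
    return sens / (1 - spec)

summary = {
    name: (youden_index(se, sp), positive_lr(se, sp))
    for name, (se, sp) in classifiers.items()
}
best = max(summary, key=lambda name: summary[name][0])

for name, (j, lr) in summary.items():
    print(f"{name:20s} J = {j:.2f}  LR+ = {lr:.2f}")
print("highest Youden index:", best)
```

All four positive likelihood ratios stay well below 2, which matches the reading that sensitivity is high across the board while specificity is poor.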
Key validation issues
6 – is predictive accuracy of continuous risk score better?
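A common way to assess a continuous risk score, as opposed to a dichotomized one, is the area under the ROC curve. The sketch below is only an illustration with made-up scores (the `cases` and `controls` lists are hypothetical, not study data): it computes the AUC as the probability that a patient with metastasis scores higher than a patient without, and shows that any single cutoff corresponds to just one sensitivity/specificity point on that curve.

```python
def auc(case_scores, control_scores):
    """P(random case scores higher than random control); ties count one half."""
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))

def sens_spec_at(threshold, case_scores, control_scores):
    """Sensitivity and specificity when 'high risk' means score >= threshold."""
    sens = sum(s >= threshold for s in case_scores) / len(case_scores)
    spec = sum(s < threshold for s in control_scores) / len(control_scores)
    return sens, spec

# Hypothetical risk scores, for illustration only.
cases = [0.9, 0.8, 0.4]           # patients with metastases within 5 years
controls = [0.7, 0.3, 0.2, 0.1]   # patients without

area = auc(cases, controls)
point = sens_spec_at(0.35, cases, controls)
print(f"AUC = {area:.3f}; at cutoff 0.35: sensitivity, specificity = {point}")
```

Comparing the AUC of the continuous score with the single operating point of the dichotomized signature is one way to frame the question raised on this slide.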
Lessons from validation
Glass is half full!
• No major heterogeneity between centers
• Signature independent of clinical risk
• Poor signature increases risk more than two-fold
• Signature identifies « discordant » patients
Lessons from validation
Glass is half empty!
• Sensitivity no better than with clinico-pathological prognostic factors
• Specificity very poor
• Is cost of microarray worth it?
Lessons from validation
Other findings…
• Effect of signature highly time dependent (predicts early metastases much better than any clinicopathological factor)
• Several signatures (Amsterdam, Rotterdam, GGI) show similar patterns, though genes involved differ