Validation of Predictive Classifiers
Richard Simon, D.Sc., Chief, Biometric Research Branch, National Cancer Institute
http://linus.nci.nih.gov/brb


Page 1: Validation of Predictive Classifiers

Validation of Predictive Classifiers

Richard Simon, D.Sc., Chief, Biometric Research Branch

National Cancer Institute, http://linus.nci.nih.gov/brb

Page 2: Validation of Predictive Classifiers

Biomarker = Biological Measurement

• Surrogate endpoint – a measurement made on a patient before, during, and after treatment to determine whether the treatment is working

• Prognostic factor – a measurement made before treatment that correlates with outcome, often for a heterogeneous set of patients

• Predictive factor – a measurement made before treatment to predict whether a particular treatment is likely to be beneficial

Page 3: Validation of Predictive Classifiers

Prognostic Factors

• Most prognostic factors are not used because they are not therapeutically relevant

• Many prognostic factor studies use a convenience sample of patients for whom tissue is available. Generally the patients are too heterogeneous to support therapeutically relevant conclusions

Page 4: Validation of Predictive Classifiers

Pusztai et al. The Oncologist 8:252-8, 2003

• 939 articles on “prognostic markers” or “prognostic factors” in breast cancer in the past 20 years

• ASCO guidelines only recommend routine testing for ER, PR and HER-2 in breast cancer

• “With the exception of ER or progesterone receptor expression and HER-2 gene amplification, there are no clinically useful molecular predictors of response to any form of anticancer therapy.”

Page 5: Validation of Predictive Classifiers

Predictive Biomarkers

• Most cancer treatments benefit only a minority of patients to whom they are administered

• Being able to predict which patients are likely to benefit would:
– Save patients from unnecessary toxicity and enhance their chance of receiving a drug that helps them
– Improve the efficiency of clinical development
– Help control medical costs

Page 6: Validation of Predictive Classifiers

• In new drug development, the role of a classifier is to select a target population for treatment
– The focus should be on evaluating the new drug, not on validating the classifier

• Adoption of a classifier to restrict the use of a treatment in wide use should be based on demonstrating that use of the classifier leads to better clinical outcome

Page 7: Validation of Predictive Classifiers

• Targeted clinical trials can be much more efficient than untargeted clinical trials, if we know who to target

Page 8: Validation of Predictive Classifiers

Developmental Strategy (I)

• Develop a diagnostic classifier that identifies the patients likely to benefit from the new drug

• Develop a reproducible assay for the classifier

• Use the diagnostic to restrict eligibility to a prospectively planned evaluation of the new drug

• Demonstrate that the new drug is effective in the prospectively defined set of patients determined by the diagnostic

Page 9: Validation of Predictive Classifiers

Using phase II data, develop a predictor of response to the new drug:

– Patient predicted responsive → randomize: New Drug vs Control

– Patient predicted non-responsive → off study

Page 10: Validation of Predictive Classifiers

Evaluating the Efficiency of Strategy (I)

• Simon R and Maitournam A. Evaluating the efficiency of targeted designs for randomized clinical trials. Clinical Cancer Research 10:6759-63, 2004.

• Maitournam A and Simon R. On the efficiency of targeted clinical trials. Statistics in Medicine 24:329-339, 2005.

• reprints at http://linus.nci.nih.gov/brb

Page 11: Validation of Predictive Classifiers

• For Herceptin, even a relatively poor assay enabled conduct of a targeted phase III trial which was crucial for establishing effectiveness

• Recent results with Herceptin in early stage breast cancer show dramatic benefits for patients selected to express Her-2

Page 13: Validation of Predictive Classifiers

Developmental Strategy (II)

Develop predictor of response to new Rx:

– Predicted responsive to new Rx → randomize: New Rx vs Control

– Predicted non-responsive to new Rx → randomize: New Rx vs Control

Page 14: Validation of Predictive Classifiers

Developmental Strategy II

• Do not use the diagnostic to restrict eligibility, but to structure a prospective analysis plan.

• Compare the new drug to the control for classifier-positive patients
– If p+ > 0.05, make no claim of effectiveness
– If p+ ≤ 0.05, claim effectiveness for the classifier-positive patients and test the treatment effect for classifier-negative patients at the 0.05 level
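The analysis plan above can be encoded as a small decision function. A sketch with illustrative names: `p_positive` and `p_negative` stand for the treatment-vs-control p-values in the classifier-positive and classifier-negative subsets.

```python
# Illustrative sketch of the prospective analysis plan; p_positive and
# p_negative are the treatment-vs-control p-values in the two subsets
# (names assumed, not from the slides).
def strategy_ii_claims(p_positive, p_negative, alpha=0.05):
    if p_positive > alpha:
        return []  # no claim of effectiveness
    claims = ["classifier-positive patients"]
    # The negative subset is tested only after success in the positive subset.
    if p_negative <= alpha:
        claims.append("classifier-negative patients")
    return claims

print(strategy_ii_claims(0.01, 0.20))  # ['classifier-positive patients']
```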

Page 15: Validation of Predictive Classifiers

Key Features of Design (II)

• The purpose of the RCT is to evaluate treatment T vs C for the two pre-defined subsets defined by the binary classifier; not to re-evaluate the components of the classifier, or to modify, refine or re-develop the classifier

Page 16: Validation of Predictive Classifiers

Guiding Principle

• The data used to develop the classifier must be distinct from the data used to test hypotheses about treatment effect in subsets determined by the classifier
– Developmental studies are exploratory
– Studies on which treatment effectiveness claims are to be based should be definitive studies that test a treatment hypothesis in a patient population completely pre-specified by the classifier

Page 17: Validation of Predictive Classifiers

Adaptive Signature Design

An adaptive design for generating and prospectively testing a gene expression signature for sensitive patients

Boris Freidlin and Richard Simon, Clinical Cancer Research 11:7872-8, 2005

Page 18: Validation of Predictive Classifiers

Adaptive Signature Design: End of Trial Analysis

• Compare E to C for all patients at significance level 0.04
– If the overall H0 is rejected, then claim effectiveness of E for eligible patients
– Otherwise:

Page 19: Validation of Predictive Classifiers

• Otherwise:
– Using only the first half of patients accrued during the trial, develop a binary classifier that predicts the subset of patients most likely to benefit from the new treatment E compared to control C
– Compare E to C for patients accrued in the second stage who are predicted responsive to E based on the classifier

• Perform the test at significance level 0.01

• If H0 is rejected, claim effectiveness of E for the subset defined by the classifier
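The two-stage decision rule can be sketched as follows; the overall and subset p-values would come from the trial data, and classifier development on the first-stage patients is abstracted away here.

```python
# Illustrative decision logic for the end-of-trial analysis; the two
# p-values would come from the overall and second-stage-subset
# comparisons of E vs C (names assumed).
def adaptive_signature_claim(p_overall, p_subset):
    if p_overall <= 0.04:   # overall comparison spends 0.04 of the 0.05 alpha
        return "effective for all eligible patients"
    if p_subset <= 0.01:    # subset comparison spends the remaining 0.01
        return "effective for classifier-defined sensitive subset"
    return "no claim"

print(adaptive_signature_claim(0.10, 0.005))
```

Note that the two significance levels sum to the conventional overall 0.05.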

Page 20: Validation of Predictive Classifiers

Biomarker Adaptive Threshold Design

Wenyu Jiang, Boris Freidlin & Richard Simon

JNCI 99:1036-43, 2007
http://linus.nci.nih.gov/brb

Page 21: Validation of Predictive Classifiers

Biomarker Adaptive Threshold Design

• Randomized pivotal trial comparing new treatment E to control C

• Survival or DFS endpoint

• Have identified a univariate biomarker index B thought to be predictive of patients likely to benefit from E relative to C

• Eligibility not restricted by biomarker

• No threshold for the biomarker determined

• Biomarker value scaled to the range (0,1)

Page 22: Validation of Predictive Classifiers

Evaluating a Classifier

• Fit of a model to the same data used to develop it is no evidence of prediction accuracy for independent data.

• When the number of candidate predictors (p) exceeds the number of cases (n), perfect prediction on the same data used to create the predictor is always possible
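This claim is easy to demonstrate by simulation. A minimal sketch with purely random data and labels, where a least-squares fit still achieves perfect training-set "accuracy":

```python
# Minimal sketch (simulated data): with more predictors than cases, a
# least-squares classifier fits even completely random labels perfectly.
import numpy as np

rng = np.random.default_rng(0)
n, p = 20, 1000                     # 20 cases, 1000 candidate predictors
X = rng.normal(size=(n, p))
y = rng.integers(0, 2, size=n)      # labels assigned at random: no signal

# With p > n the linear system is underdetermined, so an exact
# interpolating solution exists; lstsq returns the minimum-norm one.
w, *_ = np.linalg.lstsq(X, y.astype(float) * 2 - 1, rcond=None)
train_accuracy = ((X @ w > 0).astype(int) == y).mean()
print(train_accuracy)               # 1.0 despite there being no real signal
```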

Page 23: Validation of Predictive Classifiers

Evaluating a Classifier

• Validation does not mean that repeating classifier process results in similar gene sets

• Validation means predictions for independent cases are accurate

Page 24: Validation of Predictive Classifiers

Internal Validation of a Predictive Classifier

• Split-sample validation
– Often applied with too small a validation set
– Don’t combine the training and validation sets
– Don’t validate multiple models and select the “best”

• Cross-validation
– Often misused by pre-selection of genes

Page 25: Validation of Predictive Classifiers

Split Sample Approach

• Separate training set of patients from test set

• Patients should represent those eligible for a clinical trial that asks a therapeutically relevant question

• Do not access information about patients in test set until a single completely specified classifier is agreed upon based on the training set data

Page 26: Validation of Predictive Classifiers

Re-Sampling Approach

• Partition data into training set and test set

• Develop a single fully specified classifier of outcome on training set

• Use the classifier to predict outcome for patients in the test set and estimate the error rate

• Repeat the process for many random training-test partitions

Page 27: Validation of Predictive Classifiers

• Re-sampling is only valid if the test set is not used in any way in the development of the model. Using the complete set of samples to select genes violates this assumption and invalidates the process

• With proper re-sampling, the model must be developed from scratch for each training set. This means that gene selection must be repeated for each training set.
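A simulation on pure-noise data illustrates why gene selection must be repeated inside each cross-validation fold. The nearest-centroid classifier and all names here are illustrative, not from the slides: selecting genes once on the full data set yields an optimistic accuracy estimate, while honest leave-one-out cross-validation hovers near chance.

```python
# Sketch (simulated null data): gene selection outside the CV loop (wrong)
# versus inside every training fold (right). Labels are random, so the
# true achievable accuracy is 50%.
import numpy as np

rng = np.random.default_rng(1)
n, p, k = 40, 2000, 10          # cases, candidate genes, genes kept
X = rng.normal(size=(n, p))
y = rng.integers(0, 2, size=n)  # random labels: no gene is informative

def select_genes(X, y, k):
    # Keep the k genes whose mean differs most between the two classes.
    diff = np.abs(X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0))
    return np.argsort(diff)[-k:]

def nearest_centroid_predict(Xtr, ytr, xte):
    c0, c1 = Xtr[ytr == 0].mean(axis=0), Xtr[ytr == 1].mean(axis=0)
    return int(np.linalg.norm(xte - c1) < np.linalg.norm(xte - c0))

def loocv_accuracy(X, y, preselected=None):
    hits = 0
    for i in range(len(y)):
        tr = np.arange(len(y)) != i
        # Wrong: reuse genes chosen on ALL data. Right: re-select per fold.
        genes = preselected if preselected is not None else select_genes(X[tr], y[tr], k)
        hits += nearest_centroid_predict(X[tr][:, genes], y[tr], X[i, genes]) == y[i]
    return hits / len(y)

biased = loocv_accuracy(X, y, preselected=select_genes(X, y, k))
honest = loocv_accuracy(X, y)
print(biased, honest)  # the biased estimate looks far better than the honest one
```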

Page 28: Validation of Predictive Classifiers

• Re-sampling, e.g. leave-one-out cross-validation, is widely misunderstood even by statisticians and widely misused in the published clinical literature

• It is only applicable when there is a completely pre-defined algorithm for gene selection and classifier development that can be applied blindly to each training set

Page 29: Validation of Predictive Classifiers

Myth

• Split sample validation is superior to LOOCV or 10-fold CV for estimating prediction error

Page 32: Validation of Predictive Classifiers

Types of Clinical Outcome

• Survival or disease-free survival

• Response to therapy

Page 33: Validation of Predictive Classifiers

• 90 publications identified that met the criteria
– Abstracted information for all 90

• Performed a detailed review of the statistical analysis for the 42 papers published in 2004

Page 34: Validation of Predictive Classifiers

Major Flaws Found in 42 Studies Published in 2004

• Inadequate control of multiple comparisons in gene finding
– 9/23 studies had unclear or inadequate methods to deal with false positives
• 10,000 genes × 0.05 significance level = 500 false positives

• Misleading reports of prediction accuracy
– 12/28 reports based on incomplete cross-validation

• Misleading use of cluster analysis
– 13/28 studies invalidly claimed that expression clusters based on differentially expressed genes could help distinguish clinical outcomes

• 50% of studies contained one or more major flaws
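The 10,000 × 0.05 arithmetic can be checked directly: under the null hypothesis every gene's p-value is uniform on (0, 1), so an uncorrected 0.05 screen flags roughly 500 of 10,000 null genes by chance alone. A quick simulation:

```python
# Sketch: expected false positives when screening many genes at an
# uncorrected 0.05 level, illustrated on pure-noise p-values.
import numpy as np

rng = np.random.default_rng(2)
n_genes = 10_000
# Under the null, each gene's p-value is uniform on (0, 1), so about
# 10,000 * 0.05 = 500 genes fall below the threshold by chance.
p_values = rng.uniform(size=n_genes)
false_positives = int((p_values < 0.05).sum())
print(false_positives)  # close to 500
```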

Page 35: Validation of Predictive Classifiers

Validation of Predictive Classifiers for Use with Available Treatments

• Should establish that the classifier is reproducibly measurable and has clinical utility
– Better patient outcome, or equivalent outcome with less morbidity
– Improvement relative to available staging tools

Page 36: Validation of Predictive Classifiers

Developmental vs Validation Studies

• Developmental studies should select patients sufficiently homogeneous for addressing a therapeutically relevant question

• Developmental studies should develop a completely specified classifier

• Developmental studies should provide an unbiased estimate of predictive accuracy
– Statistical significance of an association between prediction and outcome is not the same as predictive accuracy

Page 37: Validation of Predictive Classifiers

Limitations to Developmental Studies

• Sample handling and assay conduct are performed under controlled conditions that do not incorporate real world sources of variability

• Poor analysis may result in biased estimates of prediction accuracy

• Small study size limits precision of estimates of predictive accuracy
– Cases may be unrepresentative of patients at other sites

• Developmental studies may not estimate to what extent predictive accuracy is greater than that achievable with standard prognostic factors

• Predictive accuracy is often not the same as clinical utility

Page 38: Validation of Predictive Classifiers

Independent Validation Studies

• Predictive classifier completely pre-specified

• Patients from different clinical centers

• Specimen handling and assays simulate real-world conditions

• Study addresses medical utility of new classifier relative to practice standards

Page 39: Validation of Predictive Classifiers

Types of Clinical Utility

• Identify patients whose prognosis is sufficiently good without cytotoxic chemotherapy

• Identify patients who are likely to benefit from a specific therapy or patients who are unlikely to benefit from it

Page 40: Validation of Predictive Classifiers

Establishing Clinical Utility

• Develop prognostic classifier for patients not receiving cytotoxic chemotherapy

• Identify patients for whom:
– Current practice standards imply chemotherapy
– The classifier indicates very good prognosis without chemotherapy

• Withhold chemotherapy to test predictions

Page 41: Validation of Predictive Classifiers

Prospectively Planned Validation Using Archived Materials

Oncotype DX

• Fully specified classifier developed using data from NSABP B-20, applied prospectively to frozen specimens from NSABP B-14 patients who received tamoxifen for 5 years

• Long term follow-up available

• Good risk patients had very good relapse-free survival

Page 42: Validation of Predictive Classifiers

Prospective Validation Design

• Randomize patients with node negative ER+ breast cancer receiving TAM to chemotherapy vs classifier determined therapy

• Determine whether the classifier-determined arm has outcome equivalent to the arm in which all patients receive chemotherapy
– Therapeutic equivalence trial

• Gold standard but rarely performed
– Very inefficient, because most patients get the same treatment in both arms, so the trial must be sized to detect a minuscule difference in outcome

Page 44: Validation of Predictive Classifiers

• Measure the classifier for all patients and randomize only those for whom classifier-determined therapy differs from the standard of care

Page 46: Validation of Predictive Classifiers

M-rx vs SOC

• SOC involves chemotherapy
– M-rx does not

• SOC does not involve chemotherapy
– M-rx does

Page 47: Validation of Predictive Classifiers

M-rx vs SOC

• SOC involves chemotherapy; M-rx does not
– Validation by withholding chemotherapy and observing outcome of cases in a single-arm study

• SOC does not involve chemotherapy; M-rx does
– Validation by withholding chemotherapy and observing outcome in a single-arm study?
– Validation by randomizing chemo vs no chemo

Page 48: Validation of Predictive Classifiers

US Intergroup Study

• OncotypeDx risk score < 15
– Tam alone

• OncotypeDx risk score > 30
– Tam + Chemo

• OncotypeDx risk score 15-30
– Randomize to Tam vs Tam + Chemo
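The assignment rule above can be written as a small function. This is an illustrative sketch: the score cutoffs (15 and 30) are from the slide, but the function name and return strings are not from the study protocol.

```python
# Hypothetical helper encoding the slide's treatment-assignment rule;
# cutoffs come from the slide, names are illustrative.
def intergroup_arm(recurrence_score):
    if recurrence_score < 15:
        return "Tam alone"
    if recurrence_score > 30:
        return "Tam + Chemo"
    return "Randomize: Tam vs Tam + Chemo"

print(intergroup_arm(22))  # intermediate scores go to the randomized comparison
```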

Page 49: Validation of Predictive Classifiers

Key Steps in Development and Validation of Therapeutically Relevant Genomic Classifiers

• Develop the classifier for addressing a specific important therapeutic decision:
– Patients sufficiently homogeneous and receiving uniform treatment so that results are therapeutically relevant
– Treatment options and costs of mis-classification such that a classifier is likely to be used

• Perform internal validation of the classifier to assess whether it appears sufficiently accurate relative to standard prognostic factors that it is worth further development

• Translate the classifier to a platform that would be used for broad clinical application

• Demonstrate that the classifier is reproducible

• Independently validate the completely specified classifier in a prospectively planned study

Page 50: Validation of Predictive Classifiers

Types of Clinical Utility

• Identify patients whose prognosis is sufficiently good without cytotoxic chemotherapy
– Identify patients whose prognosis is so good on standard therapy S that they do not need additional treatment T

• Identify patients who are likely to benefit from a specific systemic therapy and/or patients who are unlikely to benefit from it

Page 51: Validation of Predictive Classifiers

Validation Study for Identifying Patients Who Benefit from a Specific Regimen

• Standard treatment S
• Test treatment T (e.g. S+X)
• Classifier based on previous data for identifying patients who benefit from T relative to S
• Randomized study of S vs T
• Endpoint is an accepted measure of patient benefit
• Measure the classifier on all patients
• Compare T vs S separately within classifier-positive and classifier-negative patients
– Establish that T is better than S for classifier-positive patients but not for classifier-negative patients

Page 52: Validation of Predictive Classifiers

• Approach is not feasible when T is curative for some patients and S is not

• Under these circumstances, a classifier with less than perfect negative predictive value is probably not acceptable

• These studies are best accomplished prior to approval of the new drug T or using archived specimens from the pivotal trials leading to approval of T

Page 53: Validation of Predictive Classifiers

Acknowledgements

• Kevin Dobbin

• Boris Freidlin

• Aboubakar Maitournam

• Annette Molinaro

• Michael Radmacher

• Yingdong Zhao