Multi trial evaluation of longitudinal tumor measurement (TM)-based metrics for predicting overall survival (OS) using the RECIST 1.1 data warehouse
Background:
• Response Evaluation Criteria in Solid Tumors (RECIST) version 1.0 (and RECIST version 1.1) for measuring tumor shrinkage groups patients into categories based on change in tumor measurements, specifically:
◦ Complete response - complete disappearance of all lesions
◦ Partial response - at least 30% reduction from baseline sum for target lesions
◦ Progression - at least 20% increase from the lowest sum of measurements (and at least 5 mm absolute increase in version 1.1) or new lesion recorded (with additional FDG PET assessment in version 1.1)
◦ Stable otherwise
• We previously reported (ASCO 2012) that alternative cutpoints and alternate categorical metrics to RECIST standards provided no meaningful improvement in overall survival (OS) prediction
Therasse et al., JNCI 2000; Eisenhauer et al., EJC 2009
Background
Average Baseline Sum (in mm)
◦ Sum of one-dimensional baseline tumor measurements of consistent lesions / number of consistent lesions
◦ Tumor measurements are recorded in millimeters (mm).
First slope (0-6 wks) and last slope (6-12 wks)
◦ (m6 – m0) / 6 and (m12 – m6) / 6; mx = tumor measurement at week x
◦ Units: millimeters per week (mm/w)
• Indicator of (first slope > 0) = 1, if first slope >0; 0 otherwise
• Similarly for last slope-based metrics
First % change (0-6 wks) and last % change (6-12 wks)
◦ 10*(m6 – m0) /(6*m0) and 10* (m12 – m6) /(6*m6)
◦ mx = tumor measurement at week x
◦ Units: 10% change per week (10%/w)
• Indicator of (first % change > 0) = 1, if first % change>0; 0 otherwise
• Similarly for last % change-based metrics
• Indicator of (inflection status) = 1, if inflection; 0 otherwise
Definitions of Metrics
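The slope and % change definitions above can be sketched in a few lines of Python. This is only an illustration of the formulas as written on the poster: the names `TrajectoryMetrics` and `trajectory_metrics` are hypothetical, the inputs are assumed to be the tumor measurements (in mm) at weeks 0, 6 and 12, and the inflection indicator is left out because the poster does not spell out its definition.

```python
from dataclasses import dataclass


@dataclass
class TrajectoryMetrics:
    first_slope: float         # mm per week over weeks 0-6
    last_slope: float          # mm per week over weeks 6-12
    first_pct_change: float    # units of 10% per week over weeks 0-6
    last_pct_change: float     # units of 10% per week over weeks 6-12
    first_slope_positive: int  # 1 if first slope > 0, else 0
    last_slope_positive: int   # 1 if last slope > 0, else 0
    first_pct_positive: int    # 1 if first % change > 0, else 0
    last_pct_positive: int     # 1 if last % change > 0, else 0


def trajectory_metrics(m0: float, m6: float, m12: float) -> TrajectoryMetrics:
    """Compute the slope-based and % change-based metrics from measurements in mm."""
    if m0 <= 0 or m6 <= 0:
        # The % change divides by m0 and m6, so both must be positive.
        raise ValueError("m0 and m6 must be positive to compute % change")
    first_slope = (m6 - m0) / 6
    last_slope = (m12 - m6) / 6
    first_pct = 10 * (m6 - m0) / (6 * m0)
    last_pct = 10 * (m12 - m6) / (6 * m6)
    return TrajectoryMetrics(
        first_slope=first_slope,
        last_slope=last_slope,
        first_pct_change=first_pct,
        last_pct_change=last_pct,
        first_slope_positive=int(first_slope > 0),
        last_slope_positive=int(last_slope > 0),
        first_pct_positive=int(first_pct > 0),
        last_pct_positive=int(last_pct > 0),
    )
```

For instance, measurements of 60, 48 and 54 mm give a first slope of -2 mm/w and a last slope of 1 mm/w, so only the last-slope indicator is set.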
• Regardless of tumor type:
◦ TM-based metrics had similar predictive performance compared to RECIST-based categorical metrics.
◦ Although point estimates were higher, the 95% CIs for the TM-based models encompass the RECIST c-index
◦ Smaller sample size for some of the training and validation cohorts
◦ Theoretical c-index much higher than TM-based and RECIST-based metrics
Summary
Models
Methods
2013 Mayo Foundation for Medical Education and Research
Imaging assessment schedule per study
Sumithra J. Mandrekar, Ph.D.1, Ming-Wen An, Ph.D.2, Xinxin Dong, Ph.D.3, Axel Grothey, M.D.1, Jan Bogaerts, M.D.4, Daniel J. Sargent, Ph.D.1, 1Mayo Clinic, Rochester, MN, USA; 2Vassar College, Poughkeepsie, NY, USA; 3University of Pittsburgh, Pittsburgh, PA, USA; 4European Organization for Research and Treatment of Cancer Headquarters, Brussels, Belgium
• Landmark analysis at 12 weeks
◦ Window around landmark time point: keeping only those who are alive beyond the landmark time point, with available tumor status at landmark time point +/- 2 weeks
• Outcome: OS (time from registration to death from any cause)
• Cox PH models, stratified by study and number of consistent lesions (< 3 and >= 3), and adjusted for average baseline tumor sum
◦ Separate models for each tumor type
• Excluded the following due to lack of (reliable) TM measurements:
◦ Progression due to new lesions
◦ Assessments based on clinical examination only
• 60:40 split (training: test), stratified by:
◦ survival status, progression status, and "perfect status" (if observed assessments are within 2 weeks of protocol expected assessments based on a sliding window)
• Selection criterion: concordance index (with associated 95% CI)
• Theoretical upper bound for c-index: calculated from time-dependent Cox models using PFS as time-dependent status, using all available data
◦ Breast: 0.66
◦ Lung: 0.67
◦ Colon: 0.68
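The landmark selection described above could be sketched as follows. This is a minimal illustration, not the analysis code behind the poster: the `Patient` record and its field names are hypothetical, and "alive beyond the landmark" is assumed to mean follow-up time strictly greater than 12 weeks.

```python
from dataclasses import dataclass
from typing import List, Optional

LANDMARK_WEEKS = 12.0  # landmark time point from the poster
WINDOW_WEEKS = 2.0     # +/- window around the landmark


@dataclass
class Patient:
    patient_id: str                # hypothetical identifier
    os_weeks: float                # follow-up time from registration, in weeks
    assessment_weeks: List[float]  # weeks at which tumor measurements were taken


def landmark_assessment(p: Patient) -> Optional[float]:
    """Return the assessment time closest to week 12 within +/- 2 weeks, else None."""
    in_window = [w for w in p.assessment_weeks
                 if abs(w - LANDMARK_WEEKS) <= WINDOW_WEEKS]
    if not in_window:
        return None
    return min(in_window, key=lambda w: abs(w - LANDMARK_WEEKS))


def landmark_cohort(patients: List[Patient]) -> List[Patient]:
    """Keep patients followed beyond week 12 who have an assessment in the window."""
    return [p for p in patients
            if p.os_weeks > LANDMARK_WEEKS and landmark_assessment(p) is not None]
```

Conditioning on survival to the landmark is what later appears under the limitations: early deaths and dropouts never enter the cohort.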
Slope based:
log λ(t) = log λ0(t) + β1 × average baseline sum + β2 × first slope + β3 × first slope × I(first slope > 0) + β4 × last slope + β5 × last slope × I(last slope > 0) + β6 × I(inflection), where λ0(t) is the baseline hazard
• Similar interpretation for the last slope metrics.
• The % change-based metrics (per 10% change) follow a similar model.
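To make the structure of the slope-based model concrete, the sketch below evaluates its linear predictor, that is, the log hazard relative to the baseline hazard within a stratum. The function name and the coefficient values in the usage note are hypothetical and chosen only for illustration; they are not estimates from the poster.

```python
from typing import Sequence


def slope_model_log_relative_hazard(
    avg_baseline_sum: float,
    first_slope: float,
    last_slope: float,
    inflection: int,
    betas: Sequence[float],
) -> float:
    """Linear predictor of the slope-based Cox model: log(hazard / baseline hazard)."""
    if len(betas) != 6:
        raise ValueError("expected six coefficients, beta1..beta6")
    b1, b2, b3, b4, b5, b6 = betas
    first_positive = 1 if first_slope > 0 else 0  # I(first slope > 0)
    last_positive = 1 if last_slope > 0 else 0    # I(last slope > 0)
    return (
        b1 * avg_baseline_sum
        + b2 * first_slope
        + b3 * first_slope * first_positive
        + b4 * last_slope
        + b5 * last_slope * last_positive
        + b6 * inflection
    )
```

With hypothetical coefficients `(0.01, 0.04, 0.01, -0.01, 0.5, -0.3)`, a patient with an average baseline sum of 20 mm, a first slope of -2 mm/w, a last slope of 1 mm/w and an inflection contributes nothing through the first interaction term, because the first slope is negative.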
Multivariable Cox Proportional Hazard Model Results (slope-based model)
Entries are hazard ratios (p-values) unless noted otherwise.

| Metric | Breast: Training (N=140) | Breast: Validation (N=88) | Lung: Training (N=512) | Lung: Validation (N=335) | Colon: Training (N=278) | Colon: Validation (N=190) |
|---|---|---|---|---|---|---|
| Average baseline sum (mm) | 1.01 (0.08) | 0.99 (0.29) | 1.01 (0.01) | 1.01 (0.02) | 1.01 (0.15) | 1.00 (0.58) |
| 1st slope (mm/w) | 0.96 (0.36) | 0.95 (0.10) | 1.00 (0.97) | 1.02 (0.44) | 1.04 (0.22) | 0.92 (0.02) |
| Interaction term: 1st slope * Indicator of (1st slope>0) | 0.82 (0.57) | 1.78 (0.64) | 1.05 (0.85) | 0.84 (0.62) | 1.01 (0.90) | 1.46 (0.01) |
| Last slope (mm/w) | 1.00 (0.99) | 0.92 (0.50) | 0.96 (0.33) | 1.03 (0.57) | 0.99 (0.85) | 1.06 (0.42) |
| Interaction term: Last slope * Indicator of (last slope>0) | 1.11 (0.86) | 0.24 (0.33) | 1.23 (0.003) | 1.07 (0.70) | 1.69 (<0.001) | 1.84 (0.004) |
| Indicator of inflection status (Yes vs. No) | 1.58 (0.38) | 1.25 (0.80) | 1.44 (0.07) | 0.95 (0.87) | 0.73 (0.35) | 0.88 (0.74) |
| Model c-index (95% CI) | 0.59 (0.50-0.66) | 0.57 (0.49-0.65) | 0.57 (0.53-0.61) | 0.58 (0.53-0.62) | 0.60 (0.54-0.67) | 0.63 (0.56-0.71) |

RECIST c-index: Breast 0.51; Lung 0.56; Colon 0.58
Theoretical upper bound c-index: Breast 0.66; Lung 0.67; Colon 0.68
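The c-index values reported here measure how well a model's risk ordering agrees with the observed survival ordering. The poster does not say which estimator or confidence interval method was used, so the sketch below assumes Harrell's concordance index for right-censored data, ignores the stratification, and omits the confidence interval; the function name is hypothetical.

```python
from typing import Sequence


def harrell_c_index(
    times: Sequence[float],
    events: Sequence[int],
    risk_scores: Sequence[float],
) -> float:
    """Harrell's c-index: share of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when patient i has an observed death
    (events[i] == 1) strictly before patient j's follow-up time. The pair is
    concordant when the earlier death carries the higher risk score; ties in
    risk score count as one half.
    """
    if not (len(times) == len(events) == len(risk_scores)):
        raise ValueError("inputs must have the same length")
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is why model values near 0.6 sit only modestly above chance and below the theoretical upper bounds.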
Multivariable Cox Proportional Hazard Model Results (% change-based model)
Entries are hazard ratios (p-values) unless noted otherwise.

| Metric | Breast: Training (N=133) | Breast: Validation (N=88) | Lung: Training (N=503) | Lung: Validation (N=326) | Colon: Training (N=276) | Colon: Validation (N=187) |
|---|---|---|---|---|---|---|
| Average baseline sum (mm) | 1.01 (0.01) | 1.00 (0.71) | 1.01 (<0.001) | 1.01 (0.04) | 1.01 (0.02) | 1.00 (0.90) |
| 1st % change (10%/w) | 0.95 (0.91) | 1.42 (0.59) | 1.03 (0.90) | 1.34 (0.31) | 5.84 (0.005) | 0.56 (0.22) |
| Interaction term: 1st % change * Indicator of (1st % change>0) | 0.05 (0.25) | 0.001 (0.16) | 0.23 (0.50) | 0.02 (0.52) | 0.37 (0.25) | 1.16 (0.93) |
| Last % change (10%/w) | 0.79 (0.50) | 0.64 (0.21) | 1.10 (0.68) | 1.45 (0.23) | 1.28 (0.64) | 2.78 (0.14) |
| Interaction term: Last % change * Indicator of (last % change>0) | 0.005 (0.17) | <0.001 (0.13) | 2.35 (0.07) | 0.34 (0.42) | 10.37 (0.03) | 0.20 (0.12) |
| Indicator of inflection status (Yes vs. No) | 4.14 (0.03) | 5.58 (0.10) | 1.44 (0.07) | 1.20 (0.59) | 0.69 (0.24) | 1.87 (0.08) |
| Model c-index (95% CI) | 0.57 (0.50-0.64) | 0.56 (0.48-0.64) | 0.58 (0.54-0.62) | 0.59 (0.54-0.64) | 0.62 (0.56-0.69) | 0.62 (0.55-0.70) |

RECIST c-index: Breast 0.51; Lung 0.56; Colon 0.58
Theoretical upper bound c-index: Breast 0.66; Lung 0.67; Colon 0.68
Figure: c-indices (and 95% CI) from the training set for the slope-based and % change-based models, with the point estimates for the c-indices marked for each model.
Limitations
• Analysis based on data from only 3 tumor types
• Missing data issues:
◦ Not all lesions measured over time
◦ Missed visits, or missing assessments due to only clinical evaluations
◦ Primarily in breast cancer, leading to small sample size for the models investigated
◦ Missing measurements on target lesions when progression was from new or non-target lesions
• Censored predictor variables
◦ No TM measurements after RECIST progression
• Landmark analysis: Conditional on survival to 12 weeks
◦ Alternative: Include a time-dependent component prior to 12 weeks to account for early deaths, progressions, dropouts, etc.
• RECIST steering committee
• Supported in part by National Institute grant, CA167326-01
Acknowledgements
Research Question
Sample Data
Imaging Results
Slope-based model % change-based model
Results
An Example:
Consider 2 patients with the same average baseline sum and last slope, but with different first slopes (using the colon cancer training set):
That is, the HR associated with a 1 mm/w increase in first slope depends on whether the first slope is positive or negative.
An Example:
Consider 2 patients with the same average baseline sum and last % change, but with different first % changes (using the colon cancer training set):
That is, the HR associated with a 10%/w increase in first % change depends on whether the first % change is positive or negative.
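The dependence on sign follows from the interaction term: for a negative first metric the HR per unit increase is exp(β2), and for a positive one it is exp(β2 + β3), which is the product of the two tabulated hazard ratios. The sketch below works this through with the rounded colon training values from the two results tables; the function name is hypothetical, and the numbers are a reading of those tables rather than a reproduction of the figures shown on the poster.

```python
def hr_per_unit_increase(hr_main: float, hr_interaction: float,
                         metric_positive: bool) -> float:
    """HR for a one-unit increase in the metric, given the sign of the metric.

    hr_main is exp(beta) for the metric itself and hr_interaction is exp(beta)
    for metric x I(metric > 0). When the metric is positive both terms apply,
    so the hazard ratios multiply.
    """
    return hr_main * hr_interaction if metric_positive else hr_main


# Colon training set, slope-based model: 1st slope 1.04, interaction 1.01.
slope_negative = hr_per_unit_increase(1.04, 1.01, metric_positive=False)
slope_positive = hr_per_unit_increase(1.04, 1.01, metric_positive=True)

# Colon training set, % change-based model: 1st % change 5.84, interaction 0.37.
pct_negative = hr_per_unit_increase(5.84, 0.37, metric_positive=False)
pct_positive = hr_per_unit_increase(5.84, 0.37, metric_positive=True)
```

Under these rounded inputs, the HR per 1 mm/w increase is 1.04 for a shrinking tumor and about 1.05 for a growing one, while the HR per 10%/w increase is 5.84 versus about 2.16.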
• Are there continuous, longitudinal TM-based metrics that can enhance prediction of OS outcomes compared to RECIST-based categorical metrics?
• Goals:
◦ 1) Identify and validate clinically relevant (necessary & sufficient) features of the tumor trajectory for overall survival (OS) prediction.
◦ 2) Compare the longitudinal TM-based metrics to RECIST-based categorical metrics for OS prediction
Trajectories of 10 randomly selected patients from each study