

Multi trial evaluation of longitudinal tumor measurement (TM)-based metrics for predicting overall survival (OS) using the RECIST 1.1 data warehouse

Background:

• Response Evaluation Criteria in Solid Tumors (RECIST) version 1.0 (and RECIST version 1.1) for measuring tumor shrinkage groups patients into categories based on change in tumor measurements, specifically:
◦ Complete response - complete disappearance of all lesions
◦ Partial response - at least 30% reduction from baseline sum for target lesions
◦ Progression - at least 20% increase from the lowest sum of measurements (and at least 5 mm absolute increase in version 1.1) or new lesion recorded (with additional FDG PET assessment in version 1.1)
◦ Stable - otherwise
• We previously reported (ASCO 2012) that alternative cutpoints and alternate categorical metrics to RECIST standards provided no meaningful improvement in overall survival (OS) prediction.

Therasse et al., JNCI 2000; Eisenhauer et al., EJC 2009
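As a rough illustration of the RECIST categories listed in the Background, the thresholds can be written as a small function. This is a simplified sketch, not the full RECIST rule set: it uses only the sum of target-lesion measurements and a new-lesion flag, it ignores non-target lesions, confirmation rules, and the FDG PET assessment, and the function and argument names are hypothetical.

```python
def recist_category(baseline_sum, nadir_sum, current_sum,
                    new_lesion=False, version="1.1"):
    """Classify one assessment from target-lesion sums (all in mm).

    Simplified illustration only. nadir_sum is the lowest sum recorded
    so far (assumed to include the baseline assessment).
    """
    increase = current_sum - nadir_sum
    # Progression: at least 20% increase from the lowest sum ...
    progressed = increase > 0 and increase >= 0.20 * nadir_sum
    # ... and, in version 1.1, at least 5 mm absolute increase.
    if version == "1.1":
        progressed = progressed and increase >= 5
    if new_lesion or progressed:
        return "progression"
    # Complete response: all measured lesions have disappeared.
    if current_sum == 0:
        return "complete response"
    # Partial response: at least 30% reduction from the baseline sum.
    if baseline_sum - current_sum >= 0.30 * baseline_sum:
        return "partial response"
    return "stable"


if __name__ == "__main__":
    print(recist_category(100, 100, 65))   # 35% reduction from baseline
    print(recist_category(100, 50, 61))    # 22% and 11 mm above the lowest sum
```

The same 4 mm increase from a lowest sum of 20 mm counts as progression under the 1.0 rule but not under 1.1, which is the practical effect of the added 5 mm requirement.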

Definitions of Metrics

• Average baseline sum (in mm)
◦ Sum of one-dimensional baseline tumor measurements of consistent lesions / number of consistent lesions
◦ Tumor measurements are recorded in millimeters (mm).
• First slope (0-6 wks) and last slope (6-12 wks)
◦ (m6 - m0) / 6 and (m12 - m6) / 6; mx = tumor measurement at week x
◦ Units: millimeters per week (mm/w)
◦ Indicator of (first slope > 0) = 1, if first slope > 0; 0 otherwise
◦ Similarly for last slope-based metrics
• First % change (0-6 wks) and last % change (6-12 wks)
◦ 10*(m6 - m0) / (6*m0) and 10*(m12 - m6) / (6*m6); mx = tumor measurement at week x
◦ Units: 10% change per week (10%/w)
◦ Indicator of (first % change > 0) = 1, if first % change > 0; 0 otherwise
◦ Similarly for last % change-based metrics
• Indicator of (inflection status) = 1, if inflection; 0 otherwise
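The metric definitions translate directly into code. The sketch below is illustrative only: the function names and the dictionary layout are assumptions, and the inflection indicator is left out because the poster does not state how inflection is determined.

```python
def average_baseline_sum(baseline_measurements_mm):
    """Sum of baseline measurements of consistent lesions / number of lesions."""
    return sum(baseline_measurements_mm) / len(baseline_measurements_mm)


def tm_metrics(m0, m6, m12):
    """Slope and % change metrics from measurements (mm) at weeks 0, 6 and 12."""
    first_slope = (m6 - m0) / 6          # mm per week, weeks 0-6
    last_slope = (m12 - m6) / 6          # mm per week, weeks 6-12
    # % change metrics are scaled so that one unit equals 10% per week.
    first_pct = 10 * (m6 - m0) / (6 * m0)
    # Undefined when the week-6 measurement is 0; None is an assumption here.
    last_pct = 10 * (m12 - m6) / (6 * m6) if m6 != 0 else None
    return {
        "first_slope": first_slope,
        "last_slope": last_slope,
        "first_slope_positive": int(first_slope > 0),
        "last_slope_positive": int(last_slope > 0),
        "first_pct_change": first_pct,
        "last_pct_change": last_pct,
        "first_pct_change_positive": int(first_pct > 0),
        "last_pct_change_positive": int(last_pct is not None and last_pct > 0),
    }


if __name__ == "__main__":
    # A tumor that shrinks from 60 mm to 48 mm, then regrows to 54 mm.
    for name, value in tm_metrics(60, 48, 54).items():
        print(name, value)
```

For this made-up patient the first slope is -2 mm/w and the last slope is +1 mm/w, so only the last-slope indicator is 1.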

Summary

• Regardless of tumor type:
◦ TM-based metrics had similar predictive performance compared to RECIST-based categorical metrics.
◦ Although point estimates were higher, the 95% CIs for the TM-based models encompass the RECIST c-index.
◦ Smaller sample size for some of the training and validation cohorts
◦ Theoretical c-index much higher than TM-based and RECIST-based metrics
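The statement that the 95% CIs of the TM-based models encompass the RECIST c-index can be checked against the c-index rows of the two results tables. The values below are transcribed from those tables. Applying the single RECIST c-index reported per tumor type to both the training and validation columns is an assumption of this sketch, and the names are hypothetical.

```python
# RECIST c-index per tumor type, as reported in the results tables.
RECIST_C_INDEX = {"breast": 0.51, "lung": 0.56, "colon": 0.58}

# 95% CI of the model c-index: (model, tumor type, cohort) -> (lower, upper).
MODEL_CI = {
    ("slope", "breast", "training"): (0.50, 0.66),
    ("slope", "breast", "validation"): (0.49, 0.65),
    ("slope", "lung", "training"): (0.53, 0.61),
    ("slope", "lung", "validation"): (0.53, 0.62),
    ("slope", "colon", "training"): (0.54, 0.67),
    ("slope", "colon", "validation"): (0.56, 0.71),
    ("pct_change", "breast", "training"): (0.50, 0.64),
    ("pct_change", "breast", "validation"): (0.48, 0.64),
    ("pct_change", "lung", "training"): (0.54, 0.62),
    ("pct_change", "lung", "validation"): (0.54, 0.64),
    ("pct_change", "colon", "training"): (0.56, 0.69),
    ("pct_change", "colon", "validation"): (0.55, 0.70),
}


def recist_within_ci():
    """Return, per model/tumor/cohort, whether the CI contains the RECIST c-index."""
    return {
        key: lower <= RECIST_C_INDEX[key[1]] <= upper
        for key, (lower, upper) in MODEL_CI.items()
    }


if __name__ == "__main__":
    for key, inside in recist_within_ci().items():
        print(key, inside)
```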

Sumithra J. Mandrekar, Ph.D.¹, Ming-Wen An, Ph.D.², Xinxin Dong, Ph.D.³, Axel Grothey, M.D.¹, Jan Bogaerts, M.D.⁴, Daniel J. Sargent, Ph.D.¹
¹Mayo Clinic, Rochester, MN, USA; ²Vassar College, Poughkeepsie, NY, USA; ³University of Pittsburgh, Pittsburgh, PA, USA; ⁴European Organization for Research and Treatment of Cancer Headquarters, Brussels, Belgium

© 2013 Mayo Foundation for Medical Education and Research

Methods

• Landmark analysis at 12 weeks
◦ Window around landmark time point: keeping only those who are alive beyond the landmark time point, with available tumor status at landmark time point +/- 2 weeks
• Outcome: OS (time from registration to death from any cause)
• Cox PH models, stratified by study and number of consistent lesions (< 3 and >= 3), and adjusted for average baseline tumor sum
◦ Separate models for each tumor type
• Excluded the following due to lack of (reliable) TM measurements:
◦ Progression due to new lesions
◦ Assessments based on clinical examination only
• 60:40 split (training: test), stratified by:
◦ survival status, progression status, and “perfect status” (if observed assessments are within 2 weeks of protocol expected assessments based on a sliding window)
• Selection criterion: concordance index (with associated 95% CI)
• Theoretical upper bound for c-index: calculated from time-dependent Cox models using PFS as time-dependent status, using all available data
◦ Breast: 0.66
◦ Lung: 0.67
◦ Colon: 0.68

[Figure: Imaging assessment schedule per study]

Models

Slope based:

log λ(t) = log λ0(t) + β1 (average baseline sum) + β2 (first slope) + β3 (first slope) x I(first slope > 0) + β4 (last slope) + β5 (last slope) x I(last slope > 0) + β6 I(inflection)

• Similar interpretation for the last slope metrics.
• The % change based metrics (for 10% change) follow a similar model.
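To make the role of the interaction term explicit, the slope-based model can be restated in shorthand. The notation is introduced here for illustration and is not the poster's: $B$ is the average baseline sum, $s_1$ and $s_2$ are the first and last slopes, and $\lambda_0(t)$ is read as the stratum-specific baseline hazard, which is an assumption about the intended right-hand side.

```latex
\begin{align*}
\log \lambda(t) &= \log \lambda_0(t) + \beta_1 B + \beta_2 s_1 + \beta_3 s_1 I(s_1 > 0)
                   + \beta_4 s_2 + \beta_5 s_2 I(s_2 > 0) + \beta_6 I(\text{inflection}) \\[4pt]
\mathrm{HR}(s_1 \to s_1 + 1) &=
\begin{cases}
  e^{\beta_2}, & s_1 + 1 \le 0 \quad \text{(slope stays non-positive)} \\
  e^{\beta_2 + \beta_3}, & s_1 > 0 \quad \text{(slope stays positive)}
\end{cases}
\end{align*}
```

With all other covariates held fixed, a 1 mm/w increase in first slope therefore multiplies the hazard by $e^{\beta_2}$ for a shrinking tumor and by $e^{\beta_2 + \beta_3}$ for a growing one. The same reading applies to the last slope with $\beta_4$ and $\beta_5$.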

Multivariable Cox Proportional Hazard Model Results (slope-based model)

Cells are hazard ratios (p-values) unless noted otherwise.

| Metric | Breast, Training (N=140) | Breast, Validation (N=88) | Lung, Training (N=512) | Lung, Validation (N=335) | Colon, Training (N=278) | Colon, Validation (N=190) |
|---|---|---|---|---|---|---|
| Average baseline sum (mm) | 1.01 (0.08) | 0.99 (0.29) | 1.01 (0.01) | 1.01 (0.02) | 1.01 (0.15) | 1.00 (0.58) |
| 1st slope (mm/w) | 0.96 (0.36) | 0.95 (0.10) | 1.00 (0.97) | 1.02 (0.44) | 1.04 (0.22) | 0.92 (0.02) |
| Interaction term: 1st slope * Indicator of (1st slope > 0) | 0.82 (0.57) | 1.78 (0.64) | 1.05 (0.85) | 0.84 (0.62) | 1.01 (0.90) | 1.46 (0.01) |
| Last slope (mm/w) | 1.00 (0.99) | 0.92 (0.50) | 0.96 (0.33) | 1.03 (0.57) | 0.99 (0.85) | 1.06 (0.42) |
| Interaction term: Last slope * Indicator of (last slope > 0) | 1.11 (0.86) | 0.24 (0.33) | 1.23 (0.003) | 1.07 (0.70) | 1.69 (<0.001) | 1.84 (0.004) |
| Indicator of inflection status (Yes vs. No) | 1.58 (0.38) | 1.25 (0.80) | 1.44 (0.07) | 0.95 (0.87) | 0.73 (0.35) | 0.88 (0.74) |
| Model c-index (95% CI) | 0.59 (0.50-0.66) | 0.57 (0.49-0.65) | 0.57 (0.53-0.61) | 0.58 (0.53-0.62) | 0.60 (0.54-0.67) | 0.63 (0.56-0.71) |

RECIST c-index: Breast 0.51; Lung 0.56; Colon 0.58
Theoretical upper bound c-index: Breast 0.66; Lung 0.67; Colon 0.68
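For readers less familiar with the concordance index reported in the table, the following is a minimal sketch of Harrell's c-index for right-censored data. It is a plain, unstratified version without a confidence interval, so it is not the exact procedure behind the reported values, which came from stratified Cox models. The function name and toy data are hypothetical.

```python
def harrell_c_index(times, events, risk_scores):
    """Fraction of comparable patient pairs ordered correctly by risk score.

    times: follow-up times; events: 1 if death observed, 0 if censored;
    risk_scores: higher score means higher predicted hazard.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            # A pair is comparable when patient i died before time j.
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0      # earlier death had higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5      # ties in risk count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


if __name__ == "__main__":
    times = [5, 10, 15]
    events = [1, 1, 0]
    print(harrell_c_index(times, events, [3, 2, 1]))
    print(harrell_c_index(times, events, [1, 1, 1]))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering, which is the scale on which the model, RECIST, and theoretical upper bound values should be read.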

Multivariable Cox Proportional Hazard Model Results (% change-based model)

Cells are hazard ratios (p-values) unless noted otherwise.

| Metric | Breast, Training (N=133) | Breast, Validation (N=88) | Lung, Training (N=503) | Lung, Validation (N=326) | Colon, Training (N=276) | Colon, Validation (N=187) |
|---|---|---|---|---|---|---|
| Average baseline sum (mm) | 1.01 (0.01) | 1.00 (0.71) | 1.01 (<0.001) | 1.01 (0.04) | 1.01 (0.02) | 1.00 (0.90) |
| 1st % change (10%/w) | 0.95 (0.91) | 1.42 (0.59) | 1.03 (0.90) | 1.34 (0.31) | 5.84 (0.005) | 0.56 (0.22) |
| Interaction term: 1st % change * Indicator of (1st % change > 0) | 0.05 (0.25) | 0.001 (0.16) | 0.23 (0.50) | 0.02 (0.52) | 0.37 (0.25) | 1.16 (0.93) |
| Last % change (10%/w) | 0.79 (0.50) | 0.64 (0.21) | 1.10 (0.68) | 1.45 (0.23) | 1.28 (0.64) | 2.78 (0.14) |
| Interaction term: Last % change * Indicator of (last % change > 0) | 0.005 (0.17) | <0.001 (0.13) | 2.35 (0.07) | 0.34 (0.42) | 10.37 (0.03) | 0.20 (0.12) |
| Indicator of inflection status (Yes vs. No) | 4.14 (0.03) | 5.58 (0.10) | 1.44 (0.07) | 1.20 (0.59) | 0.69 (0.24) | 1.87 (0.08) |
| Model c-index (95% CI) | 0.57 (0.50-0.64) | 0.56 (0.48-0.64) | 0.58 (0.54-0.62) | 0.59 (0.54-0.64) | 0.62 (0.56-0.69) | 0.62 (0.55-0.70) |

RECIST c-index: Breast 0.51; Lung 0.56; Colon 0.58
Theoretical upper bound c-index: Breast 0.66; Lung 0.67; Colon 0.68

[Figure: c-indices (and 95% CI) from the training set for the slope-based and % change-based models. Point estimates for the c-indices; legend: *: slope-based model, % change-based model]

Limitations

• Analysis based on data from only 3 tumor types
• Missing data issues:
◦ Not all lesions measured over time
◦ Missed visits, or missing assessments due to only clinical evaluations
◦ Primarily in breast cancer, leading to small sample size for the models investigated
◦ Missing measurements on target lesions when progression was from new or non-target lesions
• Censored predictor variables
◦ No TM measurements after RECIST progression
• Landmark analysis: conditional on survival to 12 weeks
◦ Alternative: include a time-dependent component prior to 12 weeks to account for early deaths, progressions, dropouts, etc.
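The conditioning noted in the landmark limitation can be illustrated with a small filter that keeps only patients followed beyond week 12 who have a tumor assessment within 2 weeks of the landmark. This is a sketch under assumptions: the record layout and field names (`followup_weeks`, `assessment_weeks`) are hypothetical, and a patient censored before week 12 is treated as not known to be alive beyond the landmark.

```python
LANDMARK_WEEK = 12
WINDOW_WEEKS = 2


def landmark_cohort(patients):
    """Split patients into those kept for the landmark analysis and those excluded.

    Each patient is a dict with:
      followup_weeks   - time from registration to death or censoring
      assessment_weeks - weeks at which tumor measurements were recorded
    """
    kept, excluded = [], []
    for patient in patients:
        followed_beyond = patient["followup_weeks"] > LANDMARK_WEEK
        assessed_near_landmark = any(
            abs(week - LANDMARK_WEEK) <= WINDOW_WEEKS
            for week in patient["assessment_weeks"]
        )
        if followed_beyond and assessed_near_landmark:
            kept.append(patient)
        else:
            excluded.append(patient)
    return kept, excluded


if __name__ == "__main__":
    patients = [
        {"id": "A", "followup_weeks": 30, "assessment_weeks": [0, 6, 13]},
        {"id": "B", "followup_weeks": 10, "assessment_weeks": [0, 6]},
        {"id": "C", "followup_weeks": 40, "assessment_weeks": [0, 6]},
    ]
    kept, excluded = landmark_cohort(patients)
    print([p["id"] for p in kept], [p["id"] for p in excluded])
```

Patient B in the toy data is the kind of early death that drops out of the analysis entirely, which is why the results apply only to patients who survive to the landmark.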

Acknowledgements

• RECIST steering committee
• Supported in part by National Institute grant, CA167326-01

Research Question

• Are there continuous, longitudinal TM-based metrics that can enhance prediction of OS outcomes compared to RECIST-based categorical metrics?
• Goals:
◦ 1) Identify and validate clinically relevant (necessary & sufficient) features of the tumor trajectory for overall survival (OS) prediction.
◦ 2) Compare the longitudinal TM-based metrics to RECIST-based categorical metrics for OS prediction.

Sample Data

[Figure: Trajectories of 10 randomly selected patients from each study]

Imaging Results

Results

Slope-based model

An Example: Consider 2 patients with the same average baseline sum and last slope, but with different first slopes (using the colon cancer training set). That is, the HR associated with a 1 mm/w increase in first slope depends on whether the first slopes are positive or negative.

% change-based model

An Example: Consider 2 patients with the same average baseline sum and last % change, but with different first % changes (using the colon cancer training set). That is, the HR associated with a 10%/w increase in first % change depends on whether the first % changes are positive or negative.
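The two colon training-set examples (first slope and first % change) can be made concrete from the hazard ratios reported in the results tables. Because the models include the term "metric x I(metric > 0)" with no separate indicator term, the HR for a one-unit increase is the main-effect HR when the metric stays negative, and the product of the main-effect and interaction HRs when it stays positive. The sketch below uses the published values, which are rounded to two decimals, so the products are approximate. The dictionary and function names are hypothetical.

```python
# Colon training-set hazard ratios transcribed from the two results tables.
COLON_TRAINING_HR = {
    "slope": {"main": 1.04, "interaction": 1.01},       # per 1 mm/w
    "pct_change": {"main": 5.84, "interaction": 0.37},  # per 10%/w
}


def hr_per_unit_increase(model, positive):
    """HR for a one-unit increase in the first metric, other covariates fixed.

    positive=False: the metric stays at or below 0 -> main-effect HR.
    positive=True:  the metric stays above 0 -> main-effect HR x interaction HR.
    """
    hr = COLON_TRAINING_HR[model]
    return hr["main"] * hr["interaction"] if positive else hr["main"]


if __name__ == "__main__":
    for model in COLON_TRAINING_HR:
        negative_hr = hr_per_unit_increase(model, positive=False)
        positive_hr = hr_per_unit_increase(model, positive=True)
        print(f"{model}: negative {negative_hr:.2f}, positive {positive_hr:.2f}")
```

The gap between the two cases is small for first slope (about 1.04 versus 1.05) and larger for first % change (about 5.84 versus 2.16). The interaction p-values in the tables (0.90 and 0.25) indicate that neither difference is statistically significant in this cohort.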