
Machine Learning Models to Quantify HER2 for Real-Time Tissue Image Analysis in Prospective Clinical Trials

STUDY BACKGROUND AND CONCLUSIONS | PATHAI ML MODELS DEVELOPMENT | VALIDATION | FUTURE DIRECTIONS

• 689 breast cancer tissue samples were stained with HER2 immunohistochemistry (IHC; Ventana HER2 [4B5] Assay) and digitized into WSI (Leica Biosystems) across 5 laboratories in the US.

• Breast cancer tissue (purchased from Avaden Biosciences or anonymized from the AstraZeneca biobank) included primary and metastatic tumors, core needle biopsies and surgical resections, and lobular and ductal carcinomas, spanning tumor grades and HER2 expression levels to reflect real-world conditions.

• WSI were stratified into training (n=407), validation (n=110), and test (n=173) sets.

• Multiple convolutional neural network (CNN)-based ML models were trained using over 190,000 manual annotations provided by 30 board-certified pathologists to identify artifacts, invasive tumor, and individual cancer cells, and to classify tumor cell membrane HER2 staining as partial or complete and as negative, weak-to-moderate, or intense (Figure 1, Figure 2, Figure 3).
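The last bullet describes each tumor cell's membrane call as a combination of completeness and intensity. As a minimal sketch, assuming those two attributes are available for every detected cell, the combination could be mapped to the Table 1 cell types and their ASCO/CAP bins as below. The enum and function names are hypothetical, and assigning partial weak-to-moderate cells to the faint partial (1+) type is an assumption, not the authors' documented rule.

```python
from enum import Enum
from typing import Tuple


class Completeness(Enum):
    NONE = "none"          # no membrane staining detected
    PARTIAL = "partial"    # incomplete membrane staining
    COMPLETE = "complete"  # complete (circumferential) membrane staining


class Intensity(Enum):
    NEGATIVE = "negative"
    WEAK_TO_MODERATE = "weak-to-moderate"
    INTENSE = "intense"


# Hypothetical lookup from (completeness, intensity) to a Table 1-style cell
# type and its ASCO/CAP-style bin. Combinations not listed are reported as
# unmapped rather than guessed.
CELL_TYPES = {
    (Completeness.PARTIAL, Intensity.WEAK_TO_MODERATE):
        ("HER2 faint partial membranous positive", "1+"),  # assumption
    (Completeness.COMPLETE, Intensity.WEAK_TO_MODERATE):
        ("HER2 weak-to-moderate complete membranous positive", "2+"),
    (Completeness.COMPLETE, Intensity.INTENSE):
        ("HER2 intense complete membranous positive", "3+"),
}


def classify_cell(completeness: Completeness, intensity: Intensity) -> Tuple[str, str]:
    """Return (cell type, ASCO/CAP bin) for one tumor cell's membrane call."""
    if intensity is Intensity.NEGATIVE or completeness is Completeness.NONE:
        return ("HER2 negative", "0")
    try:
        return CELL_TYPES[(completeness, intensity)]
    except KeyError:
        raise ValueError(
            f"unmapped combination: {completeness.value}, {intensity.value}"
        ) from None
```

Raising on unmapped combinations (for example, partial but intense staining) keeps the sketch from silently inventing a category the poster does not define.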

Cell-Level Scores

ML cell-level scores were validated against a consensus (median) of manual counts from 5 independent pathologists in 320 representative frames (Figure 1B,C). In the test set, ML-model and pathologist consensus scores agreed strongly for all cell types except faintly positive HER2 cells, for which ML-based quantification identified more cells on average (Table 1).
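The cell-level comparison reduces to two steps: take the median of the pathologists' counts in each frame, then correlate those consensus counts with the ML counts. A minimal sketch follows; the function names are hypothetical, the example counts are illustrative rather than study data, and the Fisher z-transform interval is one common way to obtain a 95% CI for Pearson r, not necessarily the method used in the study.

```python
import math
import statistics
from typing import List, Sequence, Tuple


def consensus_counts(frames: Sequence[Sequence[int]]) -> List[float]:
    """Per-frame consensus: median of the pathologists' counts for one cell type."""
    return [statistics.median(counts) for counts in frames]


def pearson_with_ci(
    x: Sequence[float], y: Sequence[float], z_crit: float = 1.959963984540054
) -> Tuple[float, float, float]:
    """Pearson r with an approximate 95% CI from the Fisher z-transform."""
    n = len(x)
    if n != len(y) or n < 4:
        raise ValueError("need two equal-length sequences with at least 4 frames")
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    r = sxy / math.sqrt(sxx * syy)
    if abs(r) >= 1.0:
        return r, r, r  # perfect correlation: the Fisher transform is infinite
    z = math.atanh(r)
    half_width = z_crit / math.sqrt(n - 3)
    return r, math.tanh(z - half_width), math.tanh(z + half_width)


# Illustrative counts only: 5 pathologists per frame, 5 frames.
frames = [[10, 12, 11, 9, 13], [20, 22, 19, 21, 25], [5, 4, 6, 5, 7],
          [30, 28, 31, 33, 29], [15, 14, 16, 15, 18]]
consensus = consensus_counts(frames)  # [11, 21, 5, 30, 15]
ml_counts = [12, 20, 6, 29, 16]
print(pearson_with_ci(ml_counts, consensus))
```

In practice each cell type (Table 1 rows) would be correlated separately across all evaluated frames.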

Abstract #3061

AUTHORS

2021 American Society of Clinical Oncology Annual Meeting June 4-8, 2021, Virtual Meeting, USA

Benjamin Glass¹, Michel Erminio Vandenberghe², Surya Teja Chavali¹, Syed Ashar Javed¹, Marlon Rebelatto³, Shamira Sridharan¹, Hunter Elliott¹, Sudha Rao¹, Michael Montalto¹, Murray Resnick¹, Ilan Wapinski¹, Andrew Beck¹, Craig Barker²

¹PathAI, Boston, MA; ²AstraZeneca R&D, Cambridge, United Kingdom; ³AstraZeneca, Gaithersburg, MD

ACKNOWLEDGMENTS

This study was funded by AstraZeneca Research and Development.

This poster template was developed by SciStories LLC. https://scistories.com/

Figure 2. Breast Cancer HER2 Cell Model. Breast cancer tissue with (A) HER2 IHC stain (Ventana HER2 [4B5] Assay); (B) ML model detection of cellular membrane (red); (C) ML model detection of HER2-negative cancer cells (red), partial HER2-positive cancer cells (red with orange border), complete HER2-positive cancer cells (red with purple border), and other cells (green).

Figure 3. Breast Cancer HER2 Tissue Segmentation Model. Breast cancer tissue sample with (A) HER2 IHC stain; (B) ML model detection of cancer epithelium (red), cancer stroma (orange), DCIS (green), necrosis (black), and non-tumor/background (no color).

Slide-Level Scores

HER2 slide-level scores were generated by automatically applying rules derived from the 2018 ASCO/CAP guidelines and were compared with consensus scores from 3 independent pathologists in the test set (Figure 1B,C). The automatically generated ML-ASCO/CAP scores showed substantial agreement with the pathologists' consensus scores across IHC categories (ICC 0.88 [95% CI 0.82–0.92]; Figure 4, Precision Score). Agreement improved further when the cutoffs were adjusted to align with pathologist scoring by assigning ML-predicted HER2 weak-to-moderate partially positive tumor cells to the 2+ score (ICC 0.91 [95% CI 0.89–0.94]; Figure 4, Adjusted Score).
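The rule-based slide scoring and the Adjusted cutoff can be illustrated with a simplified sketch. It assumes per-slide counts of four hypothetical ML cell classes and a single 10% threshold checked in order (3+, then 2+, then 1+); the actual rules PathAI derived from the 2018 ASCO/CAP guidelines are not given on the poster and may differ.

```python
from typing import Mapping


def slide_score(counts: Mapping[str, int], adjusted: bool = False,
                threshold: float = 0.10) -> str:
    """Return "0", "1+", "2+" or "3+" from per-slide tumor cell counts.

    Hypothetical keys: "negative", "partial_weak_moderate",
    "complete_weak_moderate", "complete_intense". A category is called when
    more than `threshold` of tumor cells stain at that level or higher.
    With adjusted=True, partial weak-to-moderate cells count toward 2+
    instead of 1+ (mirroring the Adjusted Score described above).
    """
    total = sum(counts.values())  # negative cells only enter the denominator
    if total == 0:
        raise ValueError("no tumor cells counted")
    partial_wm = counts.get("partial_weak_moderate", 0)
    n1 = 0 if adjusted else partial_wm
    n2 = counts.get("complete_weak_moderate", 0) + (partial_wm if adjusted else 0)
    n3 = counts.get("complete_intense", 0)
    if n3 / total > threshold:
        return "3+"
    if (n2 + n3) / total > threshold:
        return "2+"
    if (n1 + n2 + n3) / total > threshold:
        return "1+"
    return "0"


# Illustrative counts only, not study data.
counts = {"negative": 800, "partial_weak_moderate": 150,
          "complete_weak_moderate": 40, "complete_intense": 10}
print(slide_score(counts))                 # 1+
print(slide_score(counts, adjusted=True))  # 2+
```

The example shows how moving a single cell class between bins can shift a borderline slide from 1+ to 2+, which is the kind of change the Adjusted Score introduces.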

Figure 1. ML Model Training, Pathologist Label Collection, and Validation of HER2 Scores

A) Model Training: ML model training incorporating annotations from board-certified pathologists (top). Inference on study samples (bottom left). Generation of cell-level predictions summarized as slide-level scores (bottom right).

B) Ground Truth Collection: Board-certified pathologists score study samples with ASCO/CAP HER2 scores (bottom left) and exhaustively annotate cells within randomly generated frames extracted from the whole slide images (bottom right).

C) Model Validation: Pathologist labels and ML-predicted scores for slide-level readouts (left) and cell-level readouts (right).

Table 1. ML Model-Quantified and Pathologist HER2 Cell-Level Scores. Pearson correlation (95% CI) of consensus cell counts with the ML model in evaluated frames.

Figure 4. Confusion Matrices of Test Set (n=173): Comparison of Pathologist and PathAI Precision Algorithm Scoring

Precision Score (left; PathAI Precision Algorithm) and Adjusted Score (right; PathAI Adjusted Algorithm).
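The slide-level agreement is summarized as ICC values and confusion matrices, but the poster does not state which ICC form was used. The sketch below assumes ICC(2,1) (two-way random effects, absolute agreement, single rater), with the ordinal scores 0, 1+, 2+, 3+ coded as 0–3 and the pathologist consensus and the ML algorithm treated as two raters. Function names are hypothetical.

```python
from typing import List, Sequence

SCORES = ("0", "1+", "2+", "3+")  # ordinal ASCO/CAP IHC categories


def confusion_matrix(pathologist: Sequence[str], model: Sequence[str]) -> List[List[int]]:
    """Rows = pathologist consensus score, columns = ML score."""
    if len(pathologist) != len(model):
        raise ValueError("score lists must have the same length")
    index = {s: i for i, s in enumerate(SCORES)}
    matrix = [[0] * len(SCORES) for _ in SCORES]
    for p, m in zip(pathologist, model):
        matrix[index[p]][index[m]] += 1
    return matrix


def icc_2_1(ratings: Sequence[Sequence[float]]) -> float:
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` has one row per slide and one column per rater.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between slides
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((v - grand) ** 2 for row in ratings for v in row)
    ss_err = ss_total - ss_rows - ss_cols                    # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )
```

Categorical scores can be turned into ICC input with `SCORES.index(score)`, one row per slide in the form `[pathologist, model]`; the diagonal of the confusion matrix counts exact agreements.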

Validated ML models were incorporated into a clinical trial monitoring tool that supports uploading WSIs to the PathAI cloud-based platform, deploying ML models, and reporting case-level (Figure 5) and trial-level (Figure 6) results. Clinical laboratories can use the HER2 QC tool to rapidly and reproducibly monitor sample adequacy and HER2 assay performance in active clinical trials.

REFERENCES

1. Marchiò, Caterina, et al. “Evolving Concepts in HER2 Evaluation in Breast Cancer: Heterogeneity, HER2-Low Carcinomas and Beyond.” Seminars in Cancer Biology, 2020, doi:10.1016/j.semcancer.2020.02.016.

2. Modi, Shanu, et al. “Trastuzumab Deruxtecan in Previously Treated HER2-Positive Breast Cancer.” New England Journal of Medicine, vol. 382, no. 7, 2020, pp. 610–621, doi:10.1056/nejmoa1914510.

3. Perez, Edith A., et al. “HER2 Testing By Local, Central, and Reference Laboratories in Specimens From the North Central Cancer Treatment Group N9831 Intergroup Adjuvant Trial.” Journal of Clinical Oncology, vol. 24, no. 19, 2006, pp. 3032–3038, doi:10.1200/jco.2005.03.4744.

Cell Type | ASCO/CAP Score | Model r [95% CI] | Pathologist r [95% CI]
HER2 Negative Cancer Cell | 0 | 0.77 [0.62–0.85] | 0.86 [0.80–0.90]
HER2 Faint Partial Membranous Positive Cancer Cell | 1+ | 0.65 [0.58–0.71] | 0.32 [0.20–0.44]
HER2 Weak To Moderate Complete Membranous Positive Cancer Cell | 2+ | 0.84 [0.80–0.88] | 0.76 [0.68–0.82]
HER2 Intense Complete Membranous Positive Cancer Cell | 3+ | 0.91 [0.88–0.93] | 0.83 [0.76–0.88]

Figure 6. Sample HER2 Trial Report from the PathAI Clinical Trial Support Platform Generated Using Simulated Data

Trial report contains comparisons of ML-model predicted and pathologist scores using the Precision (left) and Adjusted (right) algorithms.

Deployment of the HER2 QC Tool for Use in Clinical Trials


HER2 overexpression is a demonstrated negative prognostic factor in breast cancer and a target for anti-HER2 compounds [1]. There is an unmet need for reproducible and accurate HER2 scoring in breast cancer, as it is essential to inform treatment decisions [2]. Pathologists show inter- and intra-observer variability in whole slide quantitative scores, in part because exhaustive cell scoring is not feasible by manual means [3]. Machine learning (ML) approaches can classify every viable tumor cell within a HER2-stained sample. Here, we developed, trained, and validated an automated ML-based model as a quality control tool for HER2 testing and monitoring in clinical trials. The ML model was trained using whole slide images (WSI) from multiple sources to quantify HER2 expression and to measure stain intensity, artifact content, tumor area, and DCIS (ductal carcinoma in situ) across a diverse range of breast cancer phenotypes. Model-quantified HER2 scores were consistent with pathologist consensus scores across breast cancer tissue types. These results support incorporating ML-based algorithms into clinical trial workflows to monitor HER2 testing quality, including scoring, tissue quality, and assay performance.

Figure 5. Sample Individual HER2 Case Report from the PathAI Clinical Trial Support Platform

Report shows ML model Adjusted and Precision scores, tumor area, and tumor cell counts (HER2-positive and HER2-negative tumor cells). Additional readouts include artifact area, DCIS area, and turnaround time.
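The case-level readouts listed for Figure 5 map naturally onto a simple record type. The sketch below is hypothetical: the class name, field names, and units (mm², hours) are assumptions, and the derived properties only show how such a record might be summarized, not what the platform actually reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HER2CaseReport:
    """Hypothetical record mirroring the case-report readouts (not the platform schema)."""
    case_id: str
    precision_score: str            # e.g. "0", "1+", "2+", "3+"
    adjusted_score: str
    tumor_area_mm2: float           # assumed unit
    her2_positive_tumor_cells: int
    her2_negative_tumor_cells: int
    artifact_area_mm2: float        # assumed unit
    dcis_area_mm2: float            # assumed unit
    turnaround_time_h: float        # assumed unit

    @property
    def tumor_cell_count(self) -> int:
        return self.her2_positive_tumor_cells + self.her2_negative_tumor_cells

    @property
    def percent_her2_positive(self) -> float:
        total = self.tumor_cell_count
        return 100.0 * self.her2_positive_tumor_cells / total if total else 0.0

    @property
    def scores_differ(self) -> bool:
        """True when the Precision and Adjusted algorithms disagree for this case."""
        return self.precision_score != self.adjusted_score


# Illustrative values only, not study data.
report = HER2CaseReport("case-001", "1+", "2+", 12.5, 250, 750, 0.8, 1.1, 24.0)
print(report.percent_her2_positive)  # 25.0
```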