


An AI-Augmented Lesion Detection Framework For Liver Metastases With Model Interpretability

Xin J. Hunt
SAS Institute Inc.
[email protected]

Ralph Abbey
SAS Institute Inc.

[email protected]

Ricky Tharrington
SAS Institute Inc.

[email protected]

Joost Huiskens
SAS Institute B.V.

[email protected]

Nina Wesdorp, MD
Amsterdam University Medical Centers

[email protected]

ABSTRACT
Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer-related deaths worldwide. Most CRC deaths are the result of progression of metastases. The assessment of metastases is done using the RECIST criteria, which is time consuming and subjective, as clinicians must manually measure anatomical tumor sizes. AI has had many successes in image object detection, but often suffers because the models used are not interpretable, leading to issues of trust and implementation in the clinical setting. We propose a framework for an AI-augmented system in which an interactive AI system assists clinicians in metastasis assessment. We include model interpretability to explain the reasoning of the underlying models.

KEYWORDS
Colorectal cancer; liver metastases; artificial intelligence; Shapley values; interpretability

ACM Reference Format:
Xin J. Hunt, Ralph Abbey, Ricky Tharrington, Joost Huiskens, and Nina Wesdorp, MD. An AI-Augmented Lesion Detection Framework For Liver Metastases With Model Interpretability. In Proceedings of SIGKDD ’19 (KDD ’19). ACM, New York, NY, USA, 4 pages.

1 INTRODUCTION
Colorectal cancer (CRC) is the third most common cancer and the second leading cause of cancer-related deaths worldwide [5]. Most cancer deaths result from the progression of metastases. Approximately 50% of CRC patients will develop metastases to the liver (CRLM) [1, 8]. Patients with liver-only colorectal metastases can be treated with curative intent. Complete surgical resection of CRLM is considered the only method with a chance to cure these patients [4, 7, 20]. Only 20% of patients with CRLM present with resectable disease [2, 6, 22, 29]. Initially-unresectable liver metastases can become resectable after downsizing the lesions via systemic therapy. However, there is no consensus regarding the optimal systemic therapy regimen. The effect of systemic treatment varies between patients: some have a total response, while others show progression of disease [2, 21]. Moreover, systemic therapy has significant side effects due to its cytotoxicity [18].

KDD ’19, August 04–08, 2019, Anchorage, Alaska.

In clinical oncology, the selection and monitoring of treatment are crucial for effective cancer treatment and for the evaluation of new drug therapies. Accordingly, assessment of patient response to treatment is a crucial part of the clinical evaluation of systemic therapy. The widely accepted and applied criterion for such assessment is the Response Evaluation Criteria In Solid Tumors (RECIST), which aims to measure the objective change in anatomical tumor size. The RECIST assessment is performed by measuring changes in the one-dimensional (1-D) diameter of two target lesions before and after therapy [10]. Though RECIST is a clinical standard worldwide, it is highly limited. Currently, it is not possible to predict clinical outcome from tumor response assessment (RECIST) and patient characteristics in individual patients. A meta-analysis revealed that inter-observer variability in RECIST measurements may exceed the 20% cut-off for progression, resulting in potential misclassification of diagnosis (stable disease or progression) [30]. A further problem of RECIST is the subjectivity and variability in selecting target lesions.

1.1 Assessment Automation
One of the goals of this project is to assess tumor response more accurately and efficiently. Automated medical image processing allows more objective analysis of clinically relevant imaging features. Machine learning methods such as object detection models trained on clinical data can be used to detect tumor lesions and automate systemic therapy response assessment. However, fully automated systems directly based on object detection models are not ideal under current circumstances, for reasons including:

• Accuracy concerns: Modern general-purpose object detection methods achieve only 30% to roughly 75% mean average precision (mAP), depending on the dataset they are tested on [13]. Specialized models trained specifically for lesion detection may achieve better accuracy, but will still make mistakes. Incorporating professional knowledge and feedback from clinicians can significantly improve detection accuracy.

• Lack of interpretability: Most high-performance object detection models are based on deep learning and are often considered black-box models. However, it is important to communicate the reasons behind decisions to the physician prescribing treatment plans and to the patient. A system that cannot explain its decisions is not desirable in the clinical setting.

Thus, in this paper we propose an interactive system in which we use machine learning object detection models, along with model interpretability and natural language generation, to augment the efficacy and accuracy of clinicians in lesion detection and assessment.

arXiv:1907.07713v1 [eess.IV] 17 Jul 2019



Model interpretability with natural language generation provides trust in and understanding of the underlying machine learning object detection models.

2 RELATED WORK
2.1 Clinical studies
Artificial intelligence is quickly moving forward in many fields, including medicine. Deep learning techniques have delivered impressive results in image recognition. Radiology and pathology are medical specialties that create and evaluate large volumes of medical images. Multiple studies in these specialties have used artificial intelligence to mimic or augment human capabilities.

Masood et al. [17] propose a computer-assisted decision support system with the potential to help radiologists improve detection and diagnosis decisions for pulmonary cancer stage classification. Wang et al. [28] show that using a deep learning model for automated detection of metastatic breast cancer in lymph node biopsies reduces the error rate of pathologists from over three percent to less than one percent. Hamm et al. [11] use a convolutional neural network for fast liver tumor diagnosis on multi-phasic MRI images.

While these initial results are positive, substantial translation or implementation of these technologies into clinical use has not yet transpired [12]. Beyond building AI algorithms, applying them in daily clinical practice is complex [9]. Key challenges for implementation include data sharing, patient safety, data standardization, integration into complex clinical workflows, compliance with regulation, and transparency [12]. Transparency of complex AI algorithms is of great importance for clinicians. If a medical doctor cannot understand the outcome of an algorithm, then the doctor will be unable to explain the outcome to a patient. Technologies that help explain complex AI algorithms have an important role in the acceptance of AI by the medical community.

2.2 Model interpretability
In our proposed framework there is a need for an applicable model interpretability method. The majority of methods for model interpretability come in three forms: inherently-interpretable models [3, 15, 27], methods for interpreting existing models [19, 25], and post-hoc investigations [16, 23, 24, 26]. In the current framework, we use a segmentation convolutional neural network model for lesion detection in CT images of CRLM patients. To explain this model, we use Shapley values, a post-hoc, model-agnostic interpretability method. The model-agnostic nature of Shapley values gives us the flexibility to substitute better-performing models in future research.

Shapley values were originally introduced in game theory as a way to determine the individual contributions of players in a collaborative game. In model interpretability, Shapley values are used to measure the individual contributions of the input variable values of a single observation to a model’s prediction. Recent work in this area includes [16, 26]. Image models take a grid of pixel values as input. Assigning Shapley values to these pixels creates a gradient over an image, indicating regions that lead to or detract from lesion detection probabilities. This gradient is easy to understand, which makes Shapley values a natural method for the clinician-collaboration framework.
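As a concrete illustration of how such a gradient can be rendered for a clinician, the following sketch (a hypothetical helper of our own, not part of the system described in this paper) maps signed per-pixel attributions to an RGB overlay, with positive values in red and negative values in green, matching the red/green convention used in the sample report:

```python
def shapley_overlay(attributions):
    """Map a 2-D grid of per-pixel Shapley values to RGB triples.

    Positive values (supporting the lesion prediction) map to the red
    channel, negative values (detracting from it) to the green channel,
    scaled by the largest absolute attribution in the grid.
    """
    scale = max((abs(v) for row in attributions for v in row), default=0.0)
    overlay = []
    for row in attributions:
        out_row = []
        for v in row:
            norm = v / scale if scale else 0.0  # in [-1, 1]
            red = max(norm, 0.0)                # positive contribution
            green = max(-norm, 0.0)             # negative contribution
            out_row.append((red, green, 0.0))
        overlay.append(out_row)
    return overlay
```

For example, a grid `[[1.0, -2.0], [0.0, 2.0]]` yields half-intensity red at the top-left pixel, full green at the top-right, and full red at the bottom-right.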

3 PROPOSED FRAMEWORK
In this section we propose a system with clinician interaction for high-accuracy lesion detection and measurement. The proposed system is summarized in Figure 1.

The images from a CT scan are sent to an automated report generation system. The report generation system’s purpose is to create an interactive and coherent report highlighting possible lesions. This report allows a clinician to quickly confirm or overwrite the detections made by the automated system. The report generation system consists of three modules: a lesion detection module, an interpretability module, and a natural language generation module. The lesion detection module processes the scan images and labels all potential lesions. Note that we can use any high-accuracy object detection model as the detection module. The detected lesions are sent to the interpretability module to generate visual explanations that highlight important decision areas, which can help the clinician make better informed decisions. The natural language generation module then collects information from the previous modules and generates a report for the clinician.
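The wiring of the three modules can be sketched as follows. This is an illustrative skeleton under our own assumptions (the class names, `Lesion` fields, and callable signatures are hypothetical), with each module stubbed out as a callable so any detection, interpretability, or language model can be plugged in:

```python
from dataclasses import dataclass
from typing import Callable, List

@dataclass
class Lesion:
    bounding_box: tuple        # (row, col, height, width) on the slice
    probability: float         # model's predicted lesion probability
    explanation: str = ""      # filled in by the interpretability module

@dataclass
class ReportGenerator:
    """Wires the three modules of the automated report generation system."""
    detect: Callable[[object], List[Lesion]]   # lesion detection module
    explain: Callable[[object, Lesion], str]   # interpretability module
    narrate: Callable[[List[Lesion]], str]     # natural language generation

    def run(self, scan) -> str:
        lesions = self.detect(scan)            # label all potential lesions
        for lesion in lesions:                 # attach a visual/textual explanation
            lesion.explanation = self.explain(scan, lesion)
        return self.narrate(lesions)           # assemble the clinician-facing report
```

A toy usage with stub modules: `ReportGenerator(detect, explain, narrate).run(scan)` returns the generated report text, ready for the clinician review step described next.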

Once a report is generated, a clinician reviews it, confirming or rejecting each individual detection. The clinician can also add new lesion detections missed by the automated system. During the review process, the clinician can interact with the interpretability module, review explanations for detected lesions, and request explanations for new areas indicated by the clinician. Once all detections are confirmed or rejected, the scan image is sent to the automated measuring system, where information such as lesion count, location, and diameter is recorded.

Figure 1: Flow chart of the proposed system. [The chart shows a CT scan image entering the automated report generation system, whose lesion detection module, interpretability module, and natural language generation module produce the detected lesions, explanations, and the lesion detection report; the clinician reviews and interacts with the interpretability module before the automated measuring system records the final lesion count and measurement.]



4 HYPERSHAP (HYPERPARAMETERIZED SHAPLEY VALUE ESTIMATION)

The interpretability module in our proposed framework uses Shapley values. We propose a novel and deterministic approximation to the Shapley values for efficient computation.

The Shapley value for the $i$th variable of an instance of interest, which we call a query $q$, given a predictive model $f : \mathbb{R}^m \to \mathbb{R}$, is
$$\phi_i = \sum_{z \in \{0,1\}^m,\, z_i = 0} \frac{(m - |z| - 1)!\,|z|!}{m!} \Big( \mathbb{E}_{z+e_i}[f(x)] - \mathbb{E}_{z}[f(x)] \Big),$$
where the expectation $\mathbb{E}_z[f(q)] = \frac{1}{n} \sum_{x \in X} f(\tau(q, x, S))$ is computed over all observations $x$ in a data set $X$ with $n$ total observations. The function $\tau$ is defined element-wise by
$$\tau(q, x, S)_i = \begin{cases} q_i & i \in S \\ x_i & i \notin S, \end{cases}$$
and the relationship between $z$ and $S$ is
$$z_i = \begin{cases} 1 & \text{variable } i \in S \\ 0 & \text{variable } i \notin S. \end{cases}$$

$S$ represents a subset of the variables used in the model training. The summation in the Shapley value computation is over all $z \in \{0,1\}^m$ with $z_i = 0$, that is to say, over all subsets of variables that do not include the variable of interest.
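For small $m$, this definition can be evaluated exactly by brute force. The following sketch (an illustrative implementation of the summation above, not the code used in our system) enumerates every binary vector $z$ and applies the combinatorial weighting; it is exponential in $m$ and intended only to make the definition concrete:

```python
from itertools import product
from math import factorial

def shapley_values(f, q, X):
    """Exact Shapley values phi_1..phi_m for query q, model f, data X."""
    m = len(q)
    n = len(X)

    def expect(z):
        # E_z[f]: average f over the data, with variables i where z_i = 1
        # replaced by the query values q_i (the function tau above).
        total = 0.0
        for x in X:
            total += f([q[i] if z[i] else x[i] for i in range(m)])
        return total / n

    phi = [0.0] * m
    for i in range(m):
        for z in product((0, 1), repeat=m):
            if z[i]:                      # only subsets excluding variable i
                continue
            s = sum(z)                    # |z|, the subset size
            w = factorial(m - s - 1) * factorial(s) / factorial(m)
            z_plus = list(z)
            z_plus[i] = 1                 # z + e_i: add variable i
            phi[i] += w * (expect(z_plus) - expect(z))
    return phi
```

For an additive model such as $f(x) = 2x_1 + 3x_2$, each $\phi_i$ reduces to the coefficient times $(q_i - \bar{x}_i)$, and the values satisfy the efficiency property $\sum_i \phi_i = f(q) - \mathbb{E}[f(x)]$.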

It is important to note that the Shapley value computation requires all subsets of variables, of which there are $2^m$. This computational complexity makes direct computation infeasible. Instead, we rely on a deterministic approximation that uses only allowed subsets of variables, as determined by a hyperparameter $\chi$, leading to the name HyperSHAP. We use a deterministic approximation to ensure stability in explanations while still achieving high accuracy of the Shapley values.
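To make the savings concrete, the following sketch counts the subsets HyperSHAP evaluates under our reading of the restriction in Algorithm 2 (only subset sizes $k \le \chi$ or $k \ge m - \chi$ are kept) and compares the count with the full $2^m$:

```python
from math import comb

def allowed_subsets(m: int, chi: int) -> int:
    """Number of variable subsets retained for approximation depth chi:
    only sizes k <= chi or k >= m - chi, rather than all 2**m subsets."""
    sizes = set(range(0, chi + 1)) | set(range(m - chi, m + 1))
    return sum(comb(m, k) for k in sizes)
```

For example, with $m = 20$ variables and $\chi = 2$, `allowed_subsets(20, 2)` evaluates $2(1 + 20 + 190) = 422$ subsets instead of $2^{20} = 1{,}048{,}576$.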

Algorithm 2 describes the full HyperSHAP computation for a value of $\chi$, using Algorithm 1, which computes the expectations.

Algorithm 1 Shapley Expected Values
1: input: training data $X^t \in \mathbb{R}^{n \times m}$, query data $q \in \mathbb{R}^m$, model function $f(\cdot) : \mathbb{R}^m \to \mathbb{R}$, selection matrix $Z \in \{0,1\}^{c \times m}$
2: initialize: let $\mu = 0_{c \times 1}$ be an all-zero vector
3: for $j = 1, \ldots, c$ do
4:   initialize: $X_z = [\,]$
5:   Let $z = Z[j,:]$ be the $j$th row of $Z$
6:   for $i = 1, \ldots, n$ do
7:     Compute $x_z = \tau(q, X^t[i,:], z)$ using the $i$th row of $X^t$
8:     $X_z = \begin{bmatrix} X_z \\ x_z \end{bmatrix}$
9:   end for
10:  Compute $y \in \mathbb{R}^n$, where $y_t = f(X_z[t,:])$, $t = 1, \ldots, n$
11:  $\mu_j = \frac{1}{n} 1_{n \times 1}^\top y$
12: end for
13: output: $\mu$

Algorithm 2 HyperSHAP
1: input: training data $X^t \in \mathbb{R}^{n \times m}$, query data $q \in \mathbb{R}^m$, model function $f(\cdot) : \mathbb{R}^m \to \mathbb{R}$, approximation depth $\chi$
2: initialize: $Z = [\,]$
3: for $k = 0, \ldots, \chi, m-\chi, \ldots, m$ do
4:   Let $Z_k \in \{0,1\}^{\binom{m}{k} \times m}$ be the selection matrix whose rows form the set $\{z \in \{0,1\}^m : |z| = k\}$
5:   $Z = \begin{bmatrix} Z \\ Z_k \end{bmatrix}$
6: end for
7: Use Algorithm 1 to compute $\mu$ with $Z$
8: Let $c$ be the number of rows in $Z$
9: for $i = 1, 2, \ldots, m$ do
10:  $a = Z[:,i] \in \{0,1\}^c$, the $i$th column of $Z$
11:  $b = Z[:,\neq i]\, 1_{(m-1) \times 1} \in \mathbb{N}^{c \times 1}$, the row sums of $Z$ excluding the $i$th column
12:  $v = 0_{c \times 1}$
13:  for $j = 1, \ldots, c$ do
14:    $v_j = \begin{cases} (2a_j - 1)\, w_\chi(b_j), & \text{if } b_j \le \chi - 1 \text{ or } b_j \ge m - \chi \\ 0, & \text{otherwise} \end{cases}$
       where $w_\chi(b_j) = \begin{cases} \frac{b_j!\,(m-b_j-1)!}{m!} \cdot \frac{m}{2\chi}, & \chi < \lceil m/2 \rceil \\ \frac{b_j!\,(m-b_j-1)!}{m!}, & \text{otherwise} \end{cases}$
15:  end for
16:  $\phi_i = v^\top \mu$
17: end for
18: $\phi_0 = f(q) - \sum_{i=1}^m \phi_i$
19: output: $\phi_0, \phi_1, \ldots, \phi_m$

5 PRELIMINARY RESULTS
5.1 Clinical Data
The first phase of this project aims to improve the response assessment to systemic therapy of CRLM patients by applying advanced analytics to medical imaging and clinical data. All patient data used were collected as part of the multicenter randomized clinical trial CAIRO5 [14]. This ongoing study aims to downsize tumor burden in the liver and make local treatment with curative intent feasible for initially unresectable colorectal liver metastases, in order to improve (disease-free) survival. The data consist of diagnostic imaging (CT images) before and after systemic therapy, evaluated by a nationwide expert panel. The CT images from 52 patients were used for segmentation of the liver and liver metastases by an expert radiologist. The expert segmentations were performed semi-automatically using the Philips® IntelliSpace Portal software. A total of 1380 liver metastases were segmented, resulting in the 3-D organ contours of the liver and all metastases. From each tumor, each three-dimensional pixel (voxel) is available for analytics.

5.2 Model Training and Interpretation
We built a deep-learning image segmentation model targeting labeled lesion regions on the CT images. For each new test CT image, we use the deep-learning model to predict lesion regions, and then calculate the Shapley values for the deep-learning model on the detected lesion regions in the image. Positive Shapley values indicate pixels that contribute positively toward the predicted probability of a lesion, while negative Shapley values indicate pixels that contribute negatively toward the model’s predicted probability of a lesion. By viewing the area that contributes toward the predicted probability of a lesion, a clinician can see which area of the CT image the deep-learning model considers indicative of a lesion in the liver.

In Figure 2 we show a sample report generated using the proposed framework.

Figure 2: A sample report, showing a selected lesion and the explanation of the selected lesion for Slice 121. The clinician can confirm or reject each detected lesion by clicking the green or red button next to the bounding box. Clicking the orange button (labeled with the letter “i”) will show explanations for the corresponding image patch for review. The clinician can also label new areas as lesions and request explanations. [In-figure text: “The model has found three potential lesions (orange boxes) in the slice. The selected lesion area and its explanation are shown above. The red pixels in the explanation highlight the area that contributes positively to the lesion prediction according to the model. The green pixels highlight the area that detracts from the model’s predicted probability of the lesion.”]

6 CONCLUSION
Current lesion detection and measurement systems are not clinician-efficient, taking large amounts of clinicians’ time. These systems also suffer from significant inter-clinician variability. The proposed system optimizes the use of clinicians’ time by quickly identifying potential lesions and providing interpretation as to why the model made each prediction. By automating a portion of the lesion detection task, our framework can reduce inter-clinician variability.

Our initial experiments have shown promise both in the ability to detect lesions and in the ability to explain the predictions of a lesion detection model. Future work includes improving the detection model, improving the interaction system with clinicians, and validating our AI-augmented framework through clinical trials.

REFERENCES
[1] Eddie K Abdalla, Rene Adam, Anton J Bilchik, Daniel Jaeck, Jean-Nicolas Vauthey, and David Mahvi. 2006. Improving resectability of hepatic colorectal metastases: expert consensus statement. Annals of Surgical Oncology 13, 10 (2006), 1271–1280.
[2] René Adam, Valérie Delvart, Gérard Pascal, Adrian Valeanu, Denis Castaing, Daniel Azoulay, Sylvie Giacchetti, Bernard Paule, Francis Kunstlinger, Odile Ghémard, et al. 2004. Rescue surgery for unresectable colorectal liver metastases downstaged by chemotherapy: a model to predict long-term survival. Annals of Surgery 240, 4 (2004), 644.
[3] Elaine Angelino, Nicholas Larus-Stone, Daniel Alabi, Margo Seltzer, and Cynthia Rudin. 2017. Certifiably optimal rule lists for categorical data. In Proceedings of the 23rd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD).
[4] J-H Angelsen, A Horn, H Sorbye, GE Eide, IM Løes, and A Viste. 2017. Population-based study on resection rates and survival in patients with colorectal liver metastasis in Norway. British Journal of Surgery 104, 5 (2017), 580–589.
[5] Freddie Bray, Jacques Ferlay, Isabelle Soerjomataram, Rebecca L Siegel, Lindsey A Torre, and Ahmedin Jemal. 2018. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: A Cancer Journal for Clinicians 68, 6 (2018), 394–424.
[6] MV d Eynde and Alain Hendlisz. 2009. Treatment of colorectal liver metastases: a review. Reviews on Recent Clinical Trials 4, 1 (2009), 56–62.
[7] Jannemarie AM de Ridder, Eric P van der Stok, Leonie J Mekenkamp, Bastiaan Wiering, Miriam Koopman, Cornelis JA Punt, Cornelis Verhoef, and H Johannes. 2016. Management of liver metastases in colorectal cancer patients: a retrospective case-control study of systemic therapy versus liver resection. European Journal of Cancer 59 (2016), 13–21.
[8] Matteo Donadon, Dario Ribero, Gareth Morris-Stiff, Eddie K Abdalla, and Jean-Nicolas Vauthey. 2007. New paradigm in the management of liver-only metastases from colorectal cancer. Gastrointestinal Cancer Research: GCR 1, 1 (2007), 20.
[9] Keith J Dreyer and J Raymond Geis. 2017. When machines think: radiology's next frontier. Radiology 285, 3 (2017), 713–718.
[10] Elizabeth A Eisenhauer, Patrick Therasse, Jan Bogaerts, Lawrence H Schwartz, D Sargent, Robert Ford, Janet Dancey, S Arbuck, Steve Gwyther, Margaret Mooney, et al. 2009. New response evaluation criteria in solid tumours: revised RECIST guideline (version 1.1). European Journal of Cancer 45, 2 (2009), 228–247.
[11] Charlie A Hamm, Clinton J Wang, Lynn J Savic, Marc Ferrante, Isabel Schobert, Todd Schlachter, MingDe Lin, James S Duncan, Jeffrey C Weinreb, Julius Chapiro, et al. 2019. Deep learning for liver tumor diagnosis part I: development of a convolutional neural network classifier for multi-phasic MRI. European Radiology (2019), 1–10.
[12] Jianxing He, Sally L Baxter, Jie Xu, Jiming Xu, Xingtao Zhou, and Kang Zhang. 2019. The practical implementation of artificial intelligence technologies in medicine. Nature Medicine 25, 1 (2019), 30.
[13] Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, et al. 2017. Speed/accuracy trade-offs for modern convolutional object detectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 7310–7311.
[14] Joost Huiskens, Thomas M van Gulik, Krijn P van Lienden, Marc RW Engelbrecht, Gerrit A Meijer, Nicole CT van Grieken, Jonne Schriek, Astrid Keijser, Linda Mol, I Quintus Molenaar, et al. 2015. Treatment strategies in colorectal cancer patients with initially unresectable liver-only metastases, a study protocol of the randomised phase 3 CAIRO5 study of the Dutch Colorectal Cancer Group (DCCG). BMC Cancer 15, 1 (2015), 365.
[15] Yin Lou, Rich Caruana, Johannes Gehrke, and Giles Hooker. 2013. Accurate intelligible models with pairwise interactions. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 623–631.
[16] Scott M Lundberg and Su-In Lee. 2017. A unified approach to interpreting model predictions. In Advances in Neural Information Processing Systems. 4765–4774.
[17] Anum Masood, Bin Sheng, Ping Li, Xuhong Hou, Xiaoer Wei, Jing Qin, and Dagan Feng. 2018. Computer-assisted decision support system in pulmonary cancer detection and stage classification on CT images. Journal of Biomedical Informatics 79 (2018), 117–128.
[18] Jeffrey A Meyerhardt and Robert J Mayer. 2005. Systemic therapy for colorectal cancer. New England Journal of Medicine 352, 5 (2005), 476–487.
[19] Grégoire Montavon, Wojciech Samek, and Klaus-Robert Müller. 2018. Methods for interpreting and understanding deep neural networks. Digital Signal Processing 73 (2018), 1–15.
[20] Agneta Norén, HG Eriksson, and LI Olsson. 2016. Selection for surgery and survival of synchronous colorectal liver metastases; a nationwide study. European Journal of Cancer 53 (2016), 105–114.
[21] National Working Group on Gastrointestinal Tumors. 2004. Colorectal carcinoma national guideline 2014. https://www.oncoline.nl/colorectaalcarcinoom
[22] Graeme J Poston, Joan Figueras, Felice Giuliante, Gennaro Nuzzo, Alberto F Sobrero, Jean-Francois Gigot, Bernard Nordlinger, Rene Adam, Thomas Gruenberger, Michael A Choti, et al. 2008. Urgent need for a new staging system in advanced colorectal cancer. Journal of Clinical Oncology 26, 29 (2008), 4828–4833.
[23] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why should I trust you?": Explaining the predictions of any classifier. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1135–1144.
[24] Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2018. Anchors: High-precision model-agnostic explanations. In AAAI Conference on Artificial Intelligence.
[25] Wojciech Samek, Thomas Wiegand, and Klaus-Robert Müller. 2017. Explainable artificial intelligence: Understanding, visualizing and interpreting deep learning models. arXiv preprint arXiv:1708.08296 (2017).
[26] Erik Strumbelj and Igor Kononenko. 2010. An efficient explanation of individual classifications using game theory. Journal of Machine Learning Research 11 (March 2010), 1–18. http://dl.acm.org/citation.cfm?id=1756006.1756007
[27] Berk Ustun and Cynthia Rudin. 2016. Learning optimized risk scores on large-scale datasets. arXiv preprint arXiv:1610.00168 (2016).
[28] Dayong Wang, Aditya Khosla, Rishab Gargeya, Humayun Irshad, and Andrew H Beck. 2016. Deep learning for identifying metastatic breast cancer. arXiv preprint arXiv:1606.05718 (2016).
[29] Dennis A Wicherts, Robbert J de Haas, and René Adam. 2007. Bringing unresectable liver disease to resection with curative intent. European Journal of Surgical Oncology (EJSO) 33 (2007), S42–S51.
[30] Soon Ho Yoon, Kyung Won Kim, Jin Mo Goo, Dong-Wan Kim, and Seokyung Hahn. 2016. Observer variability in RECIST-based tumour burden measurements: a meta-analysis. European Journal of Cancer 53 (2016), 5–15.