Image Labeling for Deep Learning: Human vs. Machine · 2018-09-26

Image Labeling for Deep Learning: Human vs. Machine September 10, 2018 Curtis P. Langlotz, MD, PhD Professor of Radiology and Biomedical Informatics Director, Center for Artificial Intelligence in Medicine & Imaging (AIMI) Associate Chair, Information Systems, Department of Radiology Medical Informatics Director for Radiology, Stanford Health Care


Page 1

Image Labeling for Deep Learning: Human vs. Machine
September 10, 2018

Curtis P. Langlotz, MD, PhD
Professor of Radiology and Biomedical Informatics
Director, Center for Artificial Intelligence in Medicine & Imaging (AIMI)
Associate Chair, Information Systems, Department of Radiology
Medical Informatics Director for Radiology, Stanford Health Care

Page 2

Annotation: an explanation or comment added to a text or diagram

Labeling: a descriptive or identifying word or phrase

http://www.radiologyassistant.nl/ https://arxiv.org/abs/1603.08486

Page 3

Radiologist Labels

1: No Significant Abnormality

4: Possible Significant Abnormality, May Need Action

9: Critical, Clinical Notified

1.5 million studies

Leslie Zatz, MD
https://whatsnext.nuance.com/healthcare/radiologists-role-in-patient-centered-care/

Page 4

Penn “Code Abdomen”
Solid Organ Masses

0: Incompletely evaluated. See RECOMMENDATION.
1: No mass.
2: Benign. No further evaluation needed.
3: Indeterminate. Future imaging follow up may be needed. See RECOMMENDATION.
4: Suspicious. May represent malignancy.
5: Highly suspicious. Clear imaging evidence of malignancy.
6: Known cancer.
7: Completely treated cancer.

Zafar, H et al. JACR 2015; 12(9):947-50
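Structured codes like these are useful as machine-readable labels only if they can be pulled out of the report text reliably. The following is a minimal sketch of that extraction step. It assumes, hypothetically, that a report carries the code in a phrase such as "Code Abdomen: 3"; that phrasing, the regular expression, and the function name `extract_code` are illustrative assumptions, not the format used in the cited work.

```python
import re
from typing import Optional

# Category descriptions as listed on the slide (shortened).
CODE_ABDOMEN = {
    0: "Incompletely evaluated",
    1: "No mass",
    2: "Benign",
    3: "Indeterminate",
    4: "Suspicious",
    5: "Highly suspicious",
    6: "Known cancer",
    7: "Completely treated cancer",
}

# Hypothetical report convention: "Code Abdomen: <digit 0-7>".
_CODE_RE = re.compile(r"code\s+abdomen\s*[:=]?\s*([0-7])\b", re.IGNORECASE)


def extract_code(report_text: str) -> Optional[int]:
    """Return the first Code Abdomen value found in the text, or None."""
    match = _CODE_RE.search(report_text)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    report = "IMPRESSION: Liver lesion. Code Abdomen: 3. See RECOMMENDATION."
    code = extract_code(report)
    print(code, CODE_ABDOMEN.get(code))
```

Returning `None` when no code is found keeps unlabeled reports distinct from reports explicitly coded 0.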

Page 5

Reverse-Index Radiology Report Search

Page 6

Reverse-Index Radiology Report Search
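The slides show only screenshots of the search tool, so the following is a generic sketch of the idea behind a reverse (inverted) index rather than the tool's implementation: each token maps to the set of reports that contain it, and a multi-word query intersects those sets. The report IDs, sample texts, and function names are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(reports):
    """Map each token to the set of report IDs whose text contains it."""
    index = defaultdict(set)
    for report_id, text in reports.items():
        for token in tokenize(text):
            index[token].add(report_id)
    return index


def search(index, query):
    """Return the IDs of reports containing every token in the query."""
    tokens = tokenize(query)
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result


if __name__ == "__main__":
    # Hypothetical mini-corpus for illustration only.
    reports = {
        "r1": "Proximal fibular fracture.",
        "r2": "Sclerotic humeral metastasis.",
        "r3": "Proximal humeral fracture.",
    }
    index = build_index(reports)
    print(sorted(search(index, "proximal fracture")))
```

Because lookups touch only the posting sets for the query terms, search cost depends on how common the terms are rather than on the total number of reports.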

Page 7

https://www.youtube.com/watch?v=iSQHelJ1xxU
https://github.com/HazyResearch/snorkel

Snorkel: Data Programming for Weakly-Supervised Machine Learning
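In data programming, experts write small heuristic "labeling functions" that each vote on an example or abstain, and the noisy votes are then combined into training labels. The sketch below illustrates the idea in plain Python and deliberately does not use the Snorkel API. The three labeling functions and their keyword rules are hypothetical, and the simple majority vote is a stand-in for the learned model Snorkel uses to weight and combine labeling functions.

```python
from collections import Counter

ABSTAIN, NORMAL, ABNORMAL = -1, 0, 1


# Hypothetical labeling functions: each returns a label or ABSTAIN.
def lf_no_acute(text):
    return NORMAL if "no acute" in text.lower() else ABSTAIN


def lf_fracture(text):
    lowered = text.lower()
    if "fracture" in lowered and "no fracture" not in lowered:
        return ABNORMAL
    return ABSTAIN


def lf_critical(text):
    return ABNORMAL if "critical" in text.lower() else ABSTAIN


LABELING_FUNCTIONS = [lf_no_acute, lf_fracture, lf_critical]


def majority_vote(text):
    """Combine non-abstaining votes; abstain on no votes or on a tie."""
    votes = [lf(text) for lf in LABELING_FUNCTIONS]
    votes = [v for v in votes if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


if __name__ == "__main__":
    for report in [
        "No acute cardiopulmonary abnormality.",
        "Proximal fibular fracture, critical result called.",
        "Stable appearance.",
    ]:
        print(majority_vote(report), report)
```

Abstaining on ties keeps ambiguous reports out of the training set instead of assigning them an arbitrary label.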

Page 8

“Crowd” Labeling

Page 9

• Part of speech
• Porter stemmer
• Word shape
• NegEx negation
• RadLex ontology class

Information Extraction Results

Hassanpour, S & Langlotz, CP. Artif Intell Med 23(1):84-9, 2016.
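The hand-engineered features listed above can be illustrated with a short sketch. It covers only two of them: word shape, and a much-simplified stand-in for NegEx-style negation that checks a few preceding tokens. The trigger set, window size, and function names are assumptions for illustration; the real NegEx algorithm uses a much larger trigger list and scope rules.

```python
def word_shape(token):
    """Map characters to X (upper), x (lower), d (digit); keep the rest."""
    shape = []
    for ch in token:
        if ch.isupper():
            shape.append("X")
        elif ch.islower():
            shape.append("x")
        elif ch.isdigit():
            shape.append("d")
        else:
            shape.append(ch)
    return "".join(shape)


def collapsed_shape(token):
    """Word shape with runs of the same symbol collapsed to one."""
    collapsed = []
    for symbol in word_shape(token):
        if not collapsed or collapsed[-1] != symbol:
            collapsed.append(symbol)
    return "".join(collapsed)


# Tiny illustrative subset, NOT the NegEx trigger list.
NEGATION_TRIGGERS = {"no", "not", "without"}


def token_features(tokens, i, window=3):
    """Features for tokens[i]; 'negated' looks back `window` tokens."""
    preceding = [t.lower() for t in tokens[max(0, i - window):i]]
    return {
        "lower": tokens[i].lower(),
        "shape": word_shape(tokens[i]),
        "negated": any(t in NEGATION_TRIGGERS for t in preceding),
    }


if __name__ == "__main__":
    tokens = "No evidence of fracture at T12".split()
    for i in range(len(tokens)):
        print(token_features(tokens, i))
```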

Page 10

• Part of speech
• Porter stemmer
• Word shape
• NegEx negation
• RadLex class

Word Representations

“You shall know a word by the company it keeps” (Firth, J. R. 1957:11)

Page 11

Adapted from https://www.slideshare.net/BhaskarMitra3/a-simple-introduction-to-word-embeddings

[Figure: word-context co-occurrence matrix. Rows are words, columns are contexts, and each cell counts how often the word appears with that context. Rows with matching counts are marked “similar”.]

Example phrases:
• “proximal fibular fracture”
• “proximal humeral fracture”
• “sclerotic fibular metastasis”
• “sclerotic humeral metastasis”

Words: fibular, fracture, humeral, metastasis

Contexts: (proximal, -2), (proximal, -1), (sclerotic, -2), (sclerotic, -1), (fibular, -1), (humeral, -1), (fracture, +1), (metastasis, +1)
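The counts behind this slide can be recomputed from the four example phrases. The sketch below builds (context word, offset) counts for each word and compares words by cosine similarity. The window size of 2 and the function names are assumptions; with that window, "fibular" and "humeral" receive identical context counts, which is the sense in which they are similar.

```python
import math
from collections import Counter, defaultdict

PHRASES = [
    "proximal fibular fracture",
    "proximal humeral fracture",
    "sclerotic fibular metastasis",
    "sclerotic humeral metastasis",
]


def context_counts(phrases, window=2):
    """Count (neighbor, relative offset) contexts for every word."""
    counts = defaultdict(Counter)
    for phrase in phrases:
        tokens = phrase.split()
        for i, word in enumerate(tokens):
            for offset in range(-window, window + 1):
                j = i + offset
                if offset != 0 and 0 <= j < len(tokens):
                    counts[word][(tokens[j], offset)] += 1
    return counts


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


if __name__ == "__main__":
    counts = context_counts(PHRASES)
    print(dict(counts["fracture"]))
    print(cosine(counts["fibular"], counts["humeral"]))
    print(cosine(counts["fibular"], counts["fracture"]))
```

Learned embeddings compress this kind of sparse count vector into a dense, low-dimensional one, but the underlying signal is the same co-occurrence pattern.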

Page 12

Empiric Utility of Word Embeddings

http://nlp.stanford.edu/

Page 13

https://doi.org/10.1148/radiol.2017171115

Page 14

Information Extraction Results

Zhang, Y & Langlotz, CP. Artif Intell Med 23(1):84-9, 2016.

Page 15

Effect of Noisy Labels on Training Data Requirements

[Figure: plot of Data Set Size (y-axis, 0% to 700%) against Noise (x-axis, 0% to 30%). Legend: Noisy data, Clean data. Annotation: NLP Accuracy.]

V Agarwal et al. Learning Statistical Models of Phenotypes Using Noisy Labeled Training Data. JAMIA 23 (6): 1166–73, 2016.

HU Simon. General Bounds on the Number of Examples Needed for Learning Probabilistic Concepts. J Comput System Sci 52 (2): 239–54, 1996.
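The relationship in this plot can be approximated with a short calculation. The sketch assumes the 1/(1 − 2η)² scaling commonly derived for random label noise with flip rate η below 50%. Whether the plotted curve uses exactly this form is an assumption, although it is consistent with the axis range shown (about 625% at 30% noise). The function name is hypothetical.

```python
def noise_inflation_factor(noise_rate):
    """Multiplier on training-set size needed to offset label noise.

    Assumes the 1 / (1 - 2 * noise_rate) ** 2 scaling for random label
    flips; only defined for noise rates in [0, 0.5).
    """
    if not 0.0 <= noise_rate < 0.5:
        raise ValueError("noise_rate must be in [0, 0.5)")
    return 1.0 / (1.0 - 2.0 * noise_rate) ** 2


if __name__ == "__main__":
    for pct in range(0, 35, 5):
        factor = noise_inflation_factor(pct / 100.0)
        print(f"noise {pct:2d}% -> data set size {factor:.0%} of clean")
```

Under this assumption the cost of noise grows slowly at first and then steeply, which is why automated labels with modest error rates can still be worthwhile when unlabeled data is plentiful.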

Page 16

Effect of Noisy Labels on Accuracy

https://arxiv.org/abs/1805.00932

Page 17

EDEMA
INFILTRATE
CONSOLIDATION
PNEUMONIA
ATELECTASIS
MASS
NODULE
EMPHYSEMA
PLEURAL THICKENING
EFFUSION
FIBROSIS
PNEUMOTHORAX
CARDIOMEGALY
HERNIA

Label Hierarchy

CheXpert: A Large-Scale Uncertainty-Labeled Dataset for Multi-Label Classification of Observations in Chest Radiographs

Page 18

Your Ontology May Vary

• Alveolar opacity

• Edema

• Consolidation

• Pneumonia

• Atelectasis

• Infiltrate

• Alveolar opacity

• Edema

• Pneumonia

• Consolidation

• Interstitial

• Atelectasis

EDEMA

INFILTRATE

CONSOLIDATION

PNEUMONIA

ATELECTASIS

Page 19

• Alveolar opacity

• Edema

• Pneumonia

• Consolidation

• Interstitial

• Atelectasis

• Opacity

• Pneumonia

• Consolidation

• Interstitial

• Edema

• Atelectasis

Your Ontology May Vary

• Alveolar opacity

• Edema

• Consolidation

• Pneumonia

• Atelectasis

• Infiltrate

EDEMA

CONSOLIDATION

PNEUMONIA

ATELECTASIS

INTERSTITIAL

Page 20

Image Labeling Best Practices

• Large training set with automated noisy labels

• Accurately labeled test set

• Multiple expert observers

• User training with examples and hierarchy

• Validate that users are following hierarchy

• Method to adjudicate observers
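Two of these practices, validating that labelers follow the hierarchy and adjudicating between observers, lend themselves to simple automated checks. The sketch below is one possible approach, not the procedure used by the speaker. The parent-child hierarchy is hypothetical, loosely modeled on the ontology variants shown earlier, and the function names are illustrative.

```python
from collections import Counter

# Hypothetical hierarchy: parent finding -> child findings.
HIERARCHY = {
    "Opacity": ["Edema", "Consolidation", "Pneumonia", "Atelectasis"],
}


def hierarchy_violations(labels):
    """Return (parent, child) pairs where a child is positive but its
    parent is not. `labels` maps finding name -> bool."""
    violations = []
    for parent, children in HIERARCHY.items():
        for child in children:
            if labels.get(child, False) and not labels.get(parent, False):
                violations.append((parent, child))
    return violations


def adjudicate(votes):
    """Majority vote over observers' boolean labels for one finding.
    Returns None on a tie so the case can go to an additional reader."""
    counts = Counter(votes)
    if counts[True] == counts[False]:
        return None
    return counts[True] > counts[False]


if __name__ == "__main__":
    print(hierarchy_violations({"Pneumonia": True}))
    print(adjudicate([True, True, False]))
    print(adjudicate([True, False]))
```

Flagging ties instead of breaking them arbitrarily keeps the test set clean, at the cost of extra reads on the disputed cases.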

Page 21

Conclusions

MEDICAL IMAGING DATA IS MESSY

LABELS HAVE HIERARCHICAL RELATIONSHIPS

DATA VOLUME CAN OVERCOME DATA NOISE

ACCURATE TEST SET LABELS ARE IMPORTANT

Courtesy of Matt Lungren

Page 22

Thank You

Curtis P. Langlotz, MD, PhD
Professor of Radiology and Biomedical Informatics
Director, Center for Artificial Intelligence in Medicine & Imaging
Associate Chair for Information Systems
Department of Radiology, Stanford University
Informatics Director for Radiology
Stanford Health Care

@curtlanglotz