© 2018 Konica Minolta, Inc. INTERNAL - CONFIDENTIAL
Feature Visualization Techniques for Medical Image Analysis
Dr. Maximilian Baust, Konica Minolta Laboratory Europe
GTC 2018, October 10, 2018
Fundamentals of Feature Visualization
feature visualization → understanding visual concepts used for a specific decision
attribution → source of decision
Feature Visualization - Disambiguation
InceptionV1:mixed4d:159 by C. Olah et al., 2018
reconstruction (feature inversion) → relevant information at a certain abstraction level (loss of information)
dimensionality reduction/embedding → class separability
Feature Visualization - Disambiguation
https://cs.stanford.edu/people/karpathy/cnnembed/
Activation Maximization as a Building Block for Feature Visualization
Figure panels (C. Olah et al., 2018): individual neuron, spatial activation, channel activation, cluster activation
Neural network mapping input $x$ to output $y$:
$$\Phi(x) = y.$$
Denote the weights of $\Phi$ by
$$\Phi_i, \quad i = 1, \dots, n.$$
Denote the set of neurons to activate/stimulate by $S$; then we wish to maximize
$$E(x) = \sum_{i \in S} \Phi_i(a_i(x)),$$
where $a_i(x)$ is the activation corresponding to input $x$.
© 2018 Konica Minolta, Inc.INTERNAL - CONFIDENTIAL
1. Define an energy $E$ and the set of neurons $S = S^+ \cup S^-$ to be activated (negative activations are also possible):
$$E(x) = \sum_{i \in S^+} \Phi_i(a_i(x)) - \sum_{i \in S^-} \Phi_i(a_i(x)).$$
2. Compute the gradient with respect to the input $x$ via backpropagation: $\nabla_x E(x)$.
3. Use gradient ascent
$$x_{i+1} = x_i + \tau \cdot \nabla_x E(x_i)$$
to compute the desired activation image (or use any other gradient-based optimization strategy).
Variational Recipe for Activation Maximization
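The recipe above can be sketched on a toy problem. This is only an illustration under stated assumptions: the "network" is a hypothetical set of three neurons with activations a_i(x) = tanh(w_i · x), the weights in `WEIGHTS` are made up, the energy simply sums the selected activations, and the gradient is written out by hand instead of using backpropagation from a deep learning framework.

```python
import math

# Hypothetical toy "network": neuron i has activation a_i(x) = tanh(w_i . x).
WEIGHTS = [
    [1.0, 0.0, -1.0],
    [0.5, 0.5, 0.5],
    [-1.0, 2.0, 0.0],
]

def activations(x):
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in WEIGHTS]

def energy(x, s_plus, s_minus):
    """Step 1: E(x) = sum over S+ of a_i(x) minus sum over S- of a_i(x)."""
    a = activations(x)
    return sum(a[i] for i in s_plus) - sum(a[i] for i in s_minus)

def energy_gradient(x, s_plus, s_minus):
    """Step 2: gradient of E with respect to the input x (analytic here)."""
    a = activations(x)
    grad = [0.0] * len(x)
    for sign, neurons in ((1.0, s_plus), (-1.0, s_minus)):
        for i in neurons:
            slope = 1.0 - a[i] ** 2  # derivative of tanh
            for k, w in enumerate(WEIGHTS[i]):
                grad[k] += sign * slope * w
    return grad

def activation_maximization(x0, s_plus, s_minus, tau=0.1, steps=200):
    """Step 3: plain gradient ascent x <- x + tau * grad E(x)."""
    x = list(x0)
    for _ in range(steps):
        g = energy_gradient(x, s_plus, s_minus)
        x = [v + tau * gv for v, gv in zip(x, g)]
    return x

x_start = [0.0, 0.0, 0.0]
# Stimulate neuron 0 positively and neuron 2 negatively.
x_opt = activation_maximization(x_start, s_plus=[0], s_minus=[2])
print(energy(x_start, [0], [2]), energy(x_opt, [0], [2]))
```

In this toy setting the energy is bounded by 2, so the ascent saturates; for a real network the image would additionally need the regularization discussed later in the talk.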
1. Activate/visualize first layer filters.
2. Build semantic dictionaries explaining the decision at a specific position.
3. Combine attribution & feature visualization to obtain concept specific saliency maps.
4. Obtain channel-wise attribution masks.
5. Activate arbitrary combinations of neurons (positively and negatively) to investigate their interactions.
6. Activate clusters that concisely summarize entangled visual concepts.
7. Activate the output class to study class-specific visual representations (pre-image).
What you can do with it
The Importance of Being Smooth
Vanilla execution of the variational recipe does not lead to satisfactory results. Example:
Following the Recipe
Main issue: Strided convolutions and pooling layers introduce high-frequency (checkerboard-like) artifacts.
Essentially, pooling layers are linear or even non-linear low-pass operations whose inverse operations are high-pass filters. This causes high frequencies to appear during backpropagation. → Regularization is necessary.
Numerical Issues
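A small illustration of how a pooling layer injects high frequencies during backpropagation. This is a hand-written 1D sketch, not code from the talk: `max_pool_1d` and `max_pool_backward` are hypothetical helpers, and the input values are made up. A perfectly flat upstream gradient comes back as an alternating 0/1 pattern, the 1D analogue of a checkerboard.

```python
def max_pool_1d(x, size=2):
    """Non-overlapping max pooling; returns pooled values and argmax indices."""
    pooled, argmax = [], []
    for start in range(0, len(x) - size + 1, size):
        window = x[start:start + size]
        k = max(range(size), key=lambda j: window[j])
        pooled.append(window[k])
        argmax.append(start + k)
    return pooled, argmax

def max_pool_backward(grad_out, argmax, n):
    """Route each upstream gradient entry to the input position that won the max."""
    grad_in = [0.0] * n
    for g, idx in zip(grad_out, argmax):
        grad_in[idx] += g
    return grad_in

x = [0.1, 0.9, 0.2, 0.8, 0.3, 0.7, 0.4, 0.6]
pooled, argmax = max_pool_1d(x)

# A constant (perfectly smooth) gradient arriving from the layer above ...
grad_x = max_pool_backward([1.0] * len(pooled), argmax, len(x))

# ... becomes an alternating pattern at the input.
print(grad_x)  # [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```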
1. Define an energy $E$ and the set of neurons $S = S^+ \cup S^-$ to be activated (negative activations are also possible):
$$E(x) = \sum_{i \in S^+} \Phi_i(a_i(x)) - \sum_{i \in S^-} \Phi_i(a_i(x)).$$
2. Compute the gradient with respect to the input $x$ via backpropagation: $\nabla_x E(x)$.
3. Use gradient ascent
$$x_{i+1} = G_\sigma * (x_i + \tau \cdot \nabla_x E(x_i))$$
to compute the desired activation image (or use any other gradient-based optimization strategy).
An ad hoc Remedy?
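The filtered update of step 3 can be sketched in 1D. The helpers below (`gaussian_kernel`, `convolve_same`, `smoothed_ascent_step`, `total_variation`) are illustrative names, the border handling (replicating edge values) is an arbitrary choice, and the "gradient" is a synthetic alternating signal standing in for the checkerboard artifacts described earlier.

```python
import math

def gaussian_kernel(sigma, radius=None):
    """Sampled, normalized Gaussian G_sigma with an odd number of taps."""
    radius = int(3 * sigma) + 1 if radius is None else radius
    weights = [math.exp(-0.5 * (k / sigma) ** 2) for k in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def convolve_same(signal, kernel):
    """Filter a 1D signal with a symmetric odd-length kernel, replicating border values."""
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for j in range(n):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = min(max(j + k - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def smoothed_ascent_step(x, grad, tau, kernel):
    """x_{i+1} = G_sigma * (x_i + tau * grad E(x_i))."""
    stepped = [v + tau * g for v, g in zip(x, grad)]
    return convolve_same(stepped, kernel)

def total_variation(signal):
    return sum(abs(b - a) for a, b in zip(signal, signal[1:]))

x = [0.0] * 16
checkerboard_grad = [(-1.0) ** j for j in range(16)]  # synthetic high-frequency gradient
kernel = gaussian_kernel(sigma=1.0)

plain = [v + 0.5 * g for v, g in zip(x, checkerboard_grad)]
smoothed = smoothed_ascent_step(x, checkerboard_grad, 0.5, kernel)
print(total_variation(plain), total_variation(smoothed))
```

The smoothed iterate has a much smaller total variation than the plain one, which is the effect the ad hoc remedy is after.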
Figure panels: without regularization | with regularization
Impact of Regularization
Theory Behind
$$x_{i+1} = K_s * (x_i + \tau \cdot K_u * \nabla_x E(x_i))$$
$K_u = G_\sigma$ corresponds to an implicit regularization, i.e. a parametrization of $x$ such that it is smooth:
$$x = \sum_i K_u \cdot \alpha_i.$$
If $K_s = G_\sigma$, this corresponds to an explicit regularization, i.e.
$$E(x) + \lambda(\sigma) \sum_x \|\nabla x\|^2.$$
Baust, M., Ludwig, F., Rupprecht, C., Kohl, M., & Braunewell, S. (2018).Understanding Regularization to Visualize Convolutional Neural Networks.arXiv preprint arXiv:1805.00071.
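One way to read the implicit case is sketched below. It rests on an assumption that is not spelled out on the slide: the smooth parametrization is taken to be a linear filtering $x = K * \alpha$ with a symmetric kernel $K$, and boundary effects are ignored. Gradient ascent on the coefficients $\alpha$ then turns into an ascent on $x$ with a filtered gradient.

```latex
\begin{aligned}
\nabla_\alpha E(K * \alpha) &= K * \nabla_x E(x)
  && \text{(chain rule, $K$ symmetric)} \\
\alpha_{i+1} &= \alpha_i + \tau \, K * \nabla_x E(x_i)
  && \text{(ascent in $\alpha$)} \\
x_{i+1} = K * \alpha_{i+1} &= x_i + \tau \, (K * K) * \nabla_x E(x_i)
  && \text{(same step, expressed in $x$)}
\end{aligned}
```

Under this assumption the filter acting on the gradient is $K * K$. For a Gaussian $K = G_\sigma$ this is again a Gaussian, $G_{\sqrt{2}\sigma}$, so filtering the update with a Gaussian $K_u$ can be seen as optimizing over a smooth parametrization rather than adding a penalty to $E$.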
Gaussian kernels could be too "blurry". Is there an alternative? → Yes!
Figure panels: Gaussian kernel | Sobolev kernel
Kernels Beyond Gaussian
What is the Difference? King Crab Class in VGG19
Figure grid: columns: implicit regularization, explicit regularization, implicit & explicit regularization; rows: Gaussian kernel, Sobolev kernel
explicit regularization:
• total variation by Mahendran & Vedaldi (2016)
• Gaussian filtering of solution by Mordvintsev (2015) and Øygard (2015)
• bilateral filter by Tyka (2016)
implicit regularization:
• update regularization with Gaussian by Nguyen et al. (2016)
• multiscale, wavelet-type regularization by Mordvintsev (deepdream, 2016)
• CNN-parameterization by Dosovitskiy & Brox (2015), Nguyen et al. (2016) and Ulyanov & Vedaldi (2018)
Further Regularization Techniques Employed so far
In addition, people often use augmentations (jitter, rotations) and octaves (multiscale approaches).
Applicability to Medical Image Analysis
Digital Histopathology – Workflow
Class Activations
Figure grid: columns: implicit regularization, explicit regularization, implicit & explicit regularization; rows: normal class, invasive class
Pathology Detection in Frontal X-ray Images (ChestXray14)
class: "nodule"
Class Activations for ChestXray14
Figure grid: columns: implicit regularization, explicit regularization, implicit & explicit regularization; rows: Nodule, Pneumonia
Computer vision databases are (more) exhaustively annotated; medical ones usually are not.
Main Issue with Current Data Sets
Figure panels: detection of biases in training data | convergence analysis & parameter reduction
General Benefits of Activation Maximization Techniques
1. Activation maximization (and attribution) are essential building blocks for understanding decisions of deep neural networks.
2. Regularization is crucial to obtain meaningful results. → (cf. our paper for an overview)
3. In combination with attribution, basic explicability can be achieved.
4. Activation maximization facilitates the detection of dataset biases & basic model analysis.
5. Visual concepts discovered by deep neural networks might not correspond to medical terminology.
6. Non-exhaustively annotated datasets might hinder the applicability to medical pattern recognition problems.
7. Employed regularization techniques are not connected to the network.
→ Obtaining visually pleasing activations does not prevent the existence of adversarial examples!
Conclusions so far
A Note on Generative Modeling & Introspection
1. Generative approaches have proven to be very powerful, cf. variational auto-encoders, generative adversarial networks, etc.
2. Many are based on the minimum description length (MDL) principle. → Compression means understanding.
3. The effort of compressing information helps us to model it appropriately.
4. It is not the compressed information itself that resembles the learning process, but the way the information is compressed.
On the Importance of Generative Modeling
Introspective Neural Networks in a Nutshell
Figure labels: classification, newly drawn, pseudo negatives
1. Plain application of activation maximization does not feed back into the learning process.
2. INNs require activation maximization for generating pseudo-negative examples.
3. Idea: Any method for better generation of pseudo-negative examples facilitates better training of introspective neural networks.
Introspection and Activation Maximization
Figure (training loop): generation of new pseudo negatives ↔ classifier training
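The alternation between classifier training and generation of pseudo-negatives can be caricatured in one dimension. Everything here is an assumption made for illustration: the "classifier" is a logistic regression on scalar inputs, the data points are invented, and pseudo-negatives are produced by plain gradient ascent on the classifier output, without the sampling noise or network architectures used in actual introspective neural networks.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_classifier(positives, negatives, lr=0.5, steps=300):
    """Fit p(x) = sigmoid(w * x + b) by gradient descent on the mean log-loss."""
    data = [(x, 1.0) for x in positives] + [(x, 0.0) for x in negatives]
    w, b = 0.0, 0.0
    for _ in range(steps):
        grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in data) / len(data)
        grad_b = sum(sigmoid(w * x + b) - y for x, y in data) / len(data)
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

def synthesize_pseudo_negatives(seeds, w, b, tau=0.5, steps=20):
    """Activation maximization: gradient ascent on p(x) with respect to the input x."""
    samples = []
    for x in seeds:
        for _ in range(steps):
            p = sigmoid(w * x + b)
            x = x + tau * p * (1.0 - p) * w  # dp/dx = p * (1 - p) * w
        samples.append(x)
    return samples

positives = [1.5, 1.8, 2.0, 2.2, 2.5]
initial_negatives = [-2.5, -2.2, -2.0, -1.8, -1.5]

negatives = list(initial_negatives)
batch = list(initial_negatives)
for _ in range(3):
    w, b = train_classifier(positives, negatives)     # classifier training
    batch = synthesize_pseudo_negatives(batch, w, b)  # generation of new pseudo-negatives
    negatives.extend(batch)                           # feed them back into training

print(sum(initial_negatives) / 5, sum(batch) / 5)
```

Over the rounds the synthesized samples drift from the initial negatives towards the region the classifier considers positive, which is what forces the next classifier to refine its decision boundary.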
Figure panels: positive examples | pseudo negative examples
An Example: Digital Pathology
Feature Visualization for Computer Aided Diagnosis & Medical Decision Support
Can you Learn the Shape of a cup?
21st Century Cures Act & FDA | European Medical Device Regulation (2017/745 & 2017/746)
Legal Guidelines
Main issues:
• In general, software is difficult to approve.
• Root cause analysis for deep-learning-based algorithms is hard.
date | company | product | autonomy level | reg. pathway
02/2018 | Arterys | Arterys Oncology AI suite | assistive tool | 510(k)
02/2018 | Viz.AI | Clinical Decision Support Software for Stroke | assistive tool | De Novo
04/2018 | IDx LLC | IDx-DR for diabetic retinopathy screening | screening without clinician for interpretation | De Novo
Examples for Recent FDA-approved & AI-based Medical Devices
1. Use an accepted machine learning approach that has been shown to yield satisfactory results under optimal conditions, such as sufficient training data and proper parameter choice.
→ Visualize filters to monitor convergence properties of your network.
→ Make networks more robust using introspective approaches.
2. Ensure that your training data is complete (w.r.t. the targeted application).
→ Detect biases in the data via feature visualization.
3. Validate your system with (>3) experts and prove that it is always better than the worst of them.
4. Start with a non-intelligent product to gather data and add intelligence later.
5. Leave responsibility to the clinician.
→ Purely assistive devices can greatly benefit from explained decisions.
Still an issue: Systems working with multi-parametric data, e.g. radiological, pathological and genetic information.
Possible Strategies for Getting Approval & Opportunities for the Application of Feature Visualization Tools
Conclusion and Outlook
1. Research on feature/decision visualization techniques is very important (and we are only at the very beginning).
2. Most research is focused on certain network architectures, such as VGG networks and Inception networks (particularly GoogLeNet).
3. Generating interpretable visualizations requires regularization.
4. Feeding back good visualizations can improve the training of generative methods (especially in the case of INNs).
5. We have to change the way we annotate our medical data.
→ More exhaustive/complete annotations are needed.
Conclusion & Outlook