
© 2018 Konica Minolta, Inc. INTERNAL - CONFIDENTIAL

Feature Visualization Techniques for Medical Image Analysis

Dr. Maximilian Baust, Konica Minolta Laboratory Europe

GTC 2018, October 10, 2018

Fundamentals of Feature Visualization


Feature Visualization - Disambiguation

feature visualization → understanding visual concepts used for a specific decision

attribution → source of decision

InceptionV1:mixed4d:159 by C. Olah et al., 2018


Feature Visualization - Disambiguation

reconstruction (feature inversion) → relevant information at a certain abstraction level (loss of information)

dimensionality reduction/embedding → class separability

https://cs.stanford.edu/people/karpathy/cnnembed/


Activation Maximization as a Building Block for Feature Visualization

Neural network mapping input $x$ to output $y$:

$\Phi(x) = y$.

Denote the weights of $\Phi$ by

$\Phi_i, \quad i = 1, \dots, n$.

Denote the set of neurons to activate/stimulate by $S$; then we wish to maximize

$E(x) = \sum_{i \in S} \Phi_i(a_i(x))$,

where $a_i(x)$ is the activation corresponding to input $x$.

Figure panels: individual neuron, spatial activation, channel activation, cluster activation (C. Olah et al., 2018)
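The choice of the set $S$ is what distinguishes the variants named in the figure panels. The following is a minimal sketch, assuming a precomputed activation map indexed by (channel, position) and treating each term of the energy as the scalar activation of the selected unit. The function name `energy` and the numbers are illustrative and not taken from the talk.

```python
def energy(act, selected):
    """E = sum of the activations over the index set S of (channel, position) pairs."""
    return sum(act[c][p] for c, p in selected)

# Hypothetical activation map of one layer: 2 channels x 3 spatial positions.
act = [[0.5, 1.0, 0.0],
       [2.0, 0.0, 1.5]]
channels, positions = len(act), len(act[0])

neuron = [(1, 0)]                               # individual neuron
spatial = [(c, 1) for c in range(channels)]     # all channels at position 1
channel = [(0, p) for p in range(positions)]    # all positions of channel 0

print(energy(act, neuron), energy(act, spatial), energy(act, channel))
```

A cluster activation would use the same function with $S$ set to the units grouped by some clustering of the activations.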


Variational Recipe for Activation Maximization

1. Define an energy $E$ and the set of neurons $S = S^+ \cup S^-$ to be activated (negative activations are also possible):

$E(x) = \sum_{i \in S^+} \Phi_i(a_i(x)) - \sum_{i \in S^-} \Phi_i(a_i(x))$.

2. Compute the gradient w.r.t. the input $x$ via backpropagation: $\nabla_x E(x)$.

3. Use gradient ascent

$x_{i+1} = x_i + \tau \cdot \nabla_x E(x_i)$

to compute the desired activation image (or any other gradient-based optimization strategy).
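The three steps of the recipe can be sketched on a toy problem. This is an illustration under stated assumptions, not the implementation used in the talk: the "network" is a single hypothetical layer with activations $\tanh(\langle w_i, x\rangle)$, and its analytic gradient stands in for backpropagation. All names (`activations`, `grad_energy`, `activation_maximization`) are made up for this example.

```python
import math

def activations(W, x):
    """a_i(x) = tanh(<w_i, x>) for a single hypothetical layer."""
    return [math.tanh(sum(w * v for w, v in zip(row, x))) for row in W]

def energy(W, x, s_plus, s_minus):
    """Step 1: E(x) = sum over S+ of a_i(x) minus sum over S- of a_i(x)."""
    a = activations(W, x)
    return sum(a[i] for i in s_plus) - sum(a[i] for i in s_minus)

def grad_energy(W, x, s_plus, s_minus):
    """Step 2: analytic gradient of E w.r.t. x (stands in for backpropagation)."""
    a = activations(W, x)
    g = [0.0] * len(x)
    for sign, idx in ((1.0, s_plus), (-1.0, s_minus)):
        for i in idx:
            scale = sign * (1.0 - a[i] ** 2)   # d tanh(z)/dz = 1 - tanh(z)^2
            for j, w in enumerate(W[i]):
                g[j] += scale * w
    return g

def activation_maximization(W, x0, s_plus, s_minus, tau=0.1, steps=100):
    """Step 3: gradient ascent x_{i+1} = x_i + tau * grad E(x_i)."""
    x = list(x0)
    for _ in range(steps):
        g = grad_energy(W, x, s_plus, s_minus)
        x = [v + tau * gv for v, gv in zip(x, g)]
    return x

W = [[1.0, 0.0], [0.0, 1.0]]     # two units on a 2-D input (illustrative)
x0 = [0.1, 0.1]
x_star = activation_maximization(W, x0, s_plus=[0], s_minus=[1])
print(energy(W, x0, [0], [1]), "->", energy(W, x_star, [0], [1]))
```

In a real setting the gradient would come from the automatic differentiation of a deep learning framework, and $x$ would be an image rather than a two-element vector.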


What you can do with it

1. Activate/visualize first layer filters.

2. Build semantic dictionaries explaining the decision at a specific position.

3. Combine attribution & feature visualization to obtain concept-specific saliency maps.

4. Obtain channel-wise attribution masks.

5. Activate arbitrary combinations of neurons (positively and negatively) to investigate their interactions.

6. Activate clusters that concisely summarize entangled visual concepts.

7. Activate the output class to study class-specific visual representations (pre-image).


The Importance of Being Smooth



Following the Recipe

Vanilla execution of the variational recipe does not lead to satisfactory results. Example:


Numerical Issues

Main issue: Strided convolutions and poolings introduce high-frequency (checkerboard-like) artifacts.

Essentially, poolings are linear or even non-linear low-pass operations whose inverse operations are high-pass filters. This causes high frequencies to appear during backpropagation. → Regularization is necessary.
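A 1-D analogue makes the effect concrete. The sketch below assumes non-overlapping max pooling with window size 2 and backpropagates a constant upstream gradient through it; `max_pool_backward` is a hypothetical helper written for this example. Even for a perfectly smooth input, the resulting input gradient alternates between 0 and 1, which is the 1-D counterpart of a checkerboard pattern.

```python
def max_pool_backward(x, upstream, size=2):
    """Route each upstream gradient to the arg-max of its non-overlapping window."""
    grad = [0.0] * len(x)
    for k, g in enumerate(upstream):
        window = range(k * size, (k + 1) * size)
        grad[max(window, key=lambda j: x[j])] += g
    return grad

x = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]   # smooth ramp as input
grad = max_pool_backward(x, upstream=[1.0] * 4)
print(grad)   # → [0.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 1.0]
```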



An ad hoc Remedy?

1. Define an energy $E$ and the set of neurons $S = S^+ \cup S^-$ to be activated (negative activation is also possible):

$E(x) = \sum_{i \in S^+} \Phi_i(a_i(x)) - \sum_{i \in S^-} \Phi_i(a_i(x))$.

2. Compute the gradient w.r.t. the input $x$ via backpropagation: $\nabla_x E(x)$.

3. Use gradient ascent

$x_{i+1} = G_\sigma * (x_i + \tau \cdot \nabla_x E(x_i))$,

where $G_\sigma$ is a Gaussian kernel and $*$ denotes convolution, to compute the desired activation image (or any other gradient-based optimization strategy).
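The smoothed update can be compared with the plain one on a toy problem. The sketch below makes several assumptions that are not from the talk: a 1-D signal, a linear toy energy whose gradient alternates between 0 and 1 (a stand-in for a gradient with high-frequency artifacts), and periodic boundary handling in the convolution. The helper names are hypothetical.

```python
import math

def gaussian_kernel(sigma, radius):
    """Normalized, truncated 1-D Gaussian kernel G_sigma."""
    w = [math.exp(-0.5 * (d / sigma) ** 2) for d in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def smooth(x, kernel):
    """Circular convolution G_sigma * x (periodic boundary is a simplifying assumption)."""
    r, n = len(kernel) // 2, len(x)
    return [sum(kernel[d + r] * x[(j + d) % n] for d in range(-r, r + 1))
            for j in range(n)]

def ascent(x0, grad_fn, tau, steps, kernel=None):
    """x_{i+1} = G_sigma * (x_i + tau * grad E(x_i)); no smoothing if kernel is None."""
    x = list(x0)
    for _ in range(steps):
        x = [v + tau * g for v, g in zip(x, grad_fn(x))]
        if kernel is not None:
            x = smooth(x, kernel)
    return x

def roughness(x):
    """Sum of squared neighbour differences, used here as a simple smoothness measure."""
    return sum((b - a) ** 2 for a, b in zip(x, x[1:]))

n = 16

def grad_fn(x):
    # Gradient of the toy energy E(x) = sum of the odd-indexed entries of x.
    return [float(j % 2) for j in range(n)]

x0 = [0.0] * n
plain = ascent(x0, grad_fn, tau=0.1, steps=50)
smoothed = ascent(x0, grad_fn, tau=0.1, steps=50,
                  kernel=gaussian_kernel(sigma=1.0, radius=3))
print(roughness(plain), roughness(smoothed))
```

The smoothed iterate retains the mean of the plain one while the alternating component is almost entirely suppressed, which is the behaviour the remedy aims for.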


Impact of Regularization

Figure panels: without regularization, with regularization


Theory Behind

$x_{i+1} = K_s * (x_i + \tau \cdot K_u * \nabla_x E(x_i))$

$K_u = G_\sigma$ corresponds to an implicit regularization, i.e. a parametrization of $x$ such that it is smooth:

$x = \sum_i K_u \cdot \alpha_i$.

If $K_s = G_\sigma$, this corresponds to an explicit regularization, i.e.

$E(x) + \lambda(\sigma) \sum_x \|\nabla x\|^2$.

Baust, M., Ludwig, F., Rupprecht, C., Kohl, M., & Braunewell, S. (2018). Understanding Regularization to Visualize Convolutional Neural Networks. arXiv preprint arXiv:1805.00071.
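Why a smoothed update corresponds to a smooth parametrization can be seen from a short chain-rule argument. The following is a sketch under the assumption that $x$ is parametrized linearly as $x = K\alpha$ with a fixed smoothing operator $K$; the symbols $K$ and $\alpha$ follow the slide only loosely, and the derivation is not quoted from the cited paper.

```latex
% Assumption: x = K \alpha with a fixed linear smoothing operator K.
\begin{align*}
\nabla_\alpha E(K\alpha) &= K^{\top}\,\nabla_x E(x), \\
\alpha_{i+1} &= \alpha_i + \tau\,K^{\top}\,\nabla_x E(x_i), \\
x_{i+1} = K\alpha_{i+1} &= x_i + \tau\,K K^{\top}\,\nabla_x E(x_i).
\end{align*}
```

Gradient ascent in the coefficients $\alpha$ is therefore gradient ascent in $x$ with the gradient filtered by $K K^{\top}$. If $K$ is a convolution with a Gaussian of width $s$, then $K K^{\top}$ is again a Gaussian convolution, of width $\sqrt{2}\,s$, so the update kernel $K_u$ is itself Gaussian.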


Kernels Beyond Gaussian

Gaussian kernels could be too "blurry". Is there an alternative? → Yes!

Figure panels: Gaussian kernel, Sobolev kernel



What is the Difference? King Crab Class in VGG19

Figure columns: implicit regularization, explicit regularization, implicit & explicit regularization

Figure rows: Gaussian kernel, Sobolev kernel


Further Regularization Techniques Employed so far

Explicit regularization:

• total variation, by Mahendran & Vedaldi (2016)

• Gaussian filtering of the solution, by Mordvintsev (2015) and Øygard (2015)

• bilateral filter, by Tyka (2016)

Implicit regularization:

• update regularization with a Gaussian, by Nguyen et al. (2016)

• multiscale, wavelet-type regularization, by Mordvintsev (deepdream, 2016)

• CNN parameterization, by Dosovitskiy & Brox (2015), Nguyen et al. (2016) and Ulyanov & Vedaldi (2018)

In addition, people often use augmentations (jitter, rotations) and octaves (multiscale approaches).
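Two of the listed ingredients are simple enough to sketch directly. The code below assumes an image stored as a list of rows and shows one common discrete form of total variation (the anisotropic L1 form, which is not necessarily the variant used in the cited works) and a circular shift as a basic form of jitter. Both function names are hypothetical.

```python
def total_variation(img):
    """Anisotropic total variation: sum of absolute differences between neighbouring pixels."""
    rows, cols = len(img), len(img[0])
    horiz = sum(abs(img[r][c + 1] - img[r][c])
                for r in range(rows) for c in range(cols - 1))
    vert = sum(abs(img[r + 1][c] - img[r][c])
               for r in range(rows - 1) for c in range(cols))
    return horiz + vert

def jitter(img, dy, dx):
    """Circularly shift the image by (dy, dx) pixels, one simple form of jitter."""
    rows, cols = len(img), len(img[0])
    return [[img[(r - dy) % rows][(c - dx) % cols] for c in range(cols)]
            for r in range(rows)]

flat = [[1.0] * 4 for _ in range(4)]
checker = [[float((r + c) % 2) for c in range(4)] for r in range(4)]
print(total_variation(flat), total_variation(checker))
```

As an explicit regularizer, the total variation would be subtracted (with a weight) from the energy being maximized, so that checkerboard-like solutions are penalized. Jitter would be applied to the current image before each gradient evaluation, with a freshly drawn shift each time.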



Applicability to Medical Image Analysis



Digital Histopathology – Workflow




Class Activations

Figure rows: normal class, invasive class


Pathology Detection in Frontal X-ray Images (ChestXray14)

class: "nodule"



Class Activations for ChestXray14

Figure rows: Nodule, Pneumonia


Main Issue with Current Data Sets

Computer vision databases are (more) exhaustively annotated; medical ones often are not.


General Benefits of Activation Maximization Techniques

• detection of biases in training data

• convergence analysis & parameter reduction


Conclusions so far

1. Activation maximization (and attribution) are essential building blocks for understanding decisions of deep neural networks.

2. Regularization is crucial to obtain meaningful results. → (cf. our paper for an overview)

3. In combination with attribution, basic explicability can be achieved.

4. Activation maximization facilitates the detection of dataset biases & basic model analysis.

5. Visual concepts discovered by deep neural networks might not correspond to medical terminology.

6. Non-exhaustively annotated datasets might hinder the applicability to medical pattern recognition problems.

7. Employed regularization techniques are not connected to the network.
→ Obtaining visually pleasing activations does not prevent the existence of adversarial examples!


A Note on Generative Modeling & Introspection


On the Importance of Generative Modeling

1. Generative approaches have proven to be very powerful, cf. variational auto-encoders, generative adversarial networks, etc.

2. Many are based on the minimum description length (MDL) principle. → Compression means understanding.

3. The effort of compressing information helps us to model it appropriately.

4. It is not the compressed information itself that resembles the learning process, but the way the information is compressed.


Introspective Neural Networks in a Nutshell

Figure labels: classification, newly drawn pseudo negatives


Introspection and Activation Maximization

1. Plain application of activation maximization does not feed back into the learning process.

2. INNs require activation maximization for generating pseudo-negative examples.

3. Idea: Any method for better generation of pseudo-negative examples facilitates better training of introspective neural networks.

Figure labels: generation of new pseudo negatives, classifier training
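The alternation between classifier training and pseudo-negative generation can be sketched with a deliberately tiny model. Everything here is an assumption made for illustration: the data are scalars, the classifier is a 1-D logistic regression, new pseudo-negatives are obtained by ascending $\log p(x)$ starting from the previous batch, and all function names are hypothetical. It is not the training procedure of any published INN implementation.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_classifier(pos, neg, lr=0.5, steps=200):
    """Fit a 1-D logistic classifier p(x) = sigmoid(w*x + b) by gradient descent."""
    w, b = 0.0, 0.0
    data = [(x, 1.0) for x in pos] + [(x, 0.0) for x in neg]
    for _ in range(steps):
        gw = sum((sigmoid(w * x + b) - y) * x for x, y in data) / len(data)
        gb = sum((sigmoid(w * x + b) - y) for x, y in data) / len(data)
        w, b = w - lr * gw, b - lr * gb
    return w, b

def synthesize(seeds, w, b, tau=0.1, steps=10):
    """Activation maximization on the classifier: ascend log p(x) starting from the seeds."""
    out = []
    for x in seeds:
        for _ in range(steps):
            x += tau * (1.0 - sigmoid(w * x + b)) * w   # d/dx log sigmoid(w*x + b)
        out.append(x)
    return out

def introspective_loop(pos, seeds, rounds=3):
    """Alternate classifier training and generation of new pseudo-negatives."""
    negatives, batches = list(seeds), [list(seeds)]
    for _ in range(rounds):
        w, b = train_classifier(pos, negatives)          # classifier training
        batches.append(synthesize(batches[-1], w, b))    # new pseudo-negatives
        negatives += batches[-1]
    return (w, b), batches

positives = [1.8, 2.0, 2.2]    # illustrative "real" samples
seeds = [-0.2, 0.0, 0.2]       # illustrative initial pseudo-negatives
(w, b), batches = introspective_loop(positives, seeds)
print([round(sum(batch) / len(batch), 3) for batch in batches])
```

Whether new samples start from fresh noise or from the previous batch is a design choice; the sketch uses the previous batch. Any regularized variant of the synthesis step could be substituted in `synthesize`, which is where better activation maximization would feed back into training.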


An Example: Digital Pathology

Figure panels: positive examples, pseudo negative examples


Feature Visualization for Computer Aided Diagnosis & Medical Decision Support


Can you Learn the Shape of a cup?



Legal Guidelines

• 21st Century Cures Act & FDA

• European Medical Device Regulation (2017/745 & 2017/746)

Main issues:

• In general, software is difficult to approve.

• Root cause analysis for deep-learning-based algorithms is hard.


Examples of Recent FDA-approved & AI-based Medical Devices

date    | company  | product                                       | autonomy level                                 | reg. pathway
02/2018 | Arterys  | Arterys Oncology AI suite                     | assistive tool                                 | 510(k)
02/2018 | Viz.AI   | Clinical Decision Support Software for Stroke | assistive tool                                 | De Novo
04/2018 | IDx LLC  | IDx-DR for diabetic retinopathy screening     | screening without clinician for interpretation | De Novo


Possible Strategies for Getting Approval & Opportunities for the Application of Feature Visualization Tools

1. Use an accepted machine learning approach that has been shown to yield satisfactory results under optimal conditions, such as sufficient training data and proper parameter choice.
→ Visualize filters to monitor convergence properties of your network.
→ Make networks more robust using introspective approaches.

2. Ensure that your training data is complete (w.r.t. the targeted application).
→ Detect biases in the data via feature visualization.

3. Validate your system with (>3) experts and prove that it is always better than the worst of them.

4. Start with a non-intelligent product to gather data and add intelligence later.

5. Leave responsibility to the clinician.
→ Purely assistive devices can greatly benefit from explained decisions.

Still an issue: Systems working with multi-parametric data, e.g. radiological, pathological and genetic information.


Conclusion and Outlook



Conclusion & Outlook

1. Research on feature/decision visualization techniques is very important (and we are only at the very beginning).

2. Most research is focused on certain network architectures, such as VGG networks and Inception networks (particularly GoogLeNet).

3. Generating interpretable visualizations requires regularization.

4. Feeding back good visualizations can improve the training of generative methods (especially in the case of INNs).

5. We have to change the way we annotate our medical data.
→ More exhaustive/complete annotations are needed.
