TRANSCRIPT
Intro to Deep Learning for NeuroImaging
Andrew Doyle
@crocodoyle
McGill Centre for Integrative Neuroscience
Outline
1. GET EXCITED
2. Artificial Neural Networks
3. Backpropagation
4. Convolutional Neural Networks
5. Neuroimaging Applications
ImageNet-1000 Results
Image courtesy Aaron Courville, 2016
Generative Models
[Figure: image style transfer examples]
Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "Image style transfer using convolutional neural networks." Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on. IEEE, 2016.
Generative Models
Zhang, Han, et al. "StackGAN: Text to photo-realistic image synthesis
with stacked generative adversarial networks." arXiv preprint
arXiv:1612.03242 (2016).
StackGAN
Generative Models
Zhu, Jun-Yan, et al. "Unpaired image-to-image translation using cycle-
consistent adversarial networks." arXiv preprint
arXiv:1703.10593 (2017).
CycleGAN
Generative Models
[Figure: paired vs. unpaired training data]
Wolterink, Jelmer M., et al. "Deep MR to CT synthesis using unpaired
data." International Workshop on Simulation and Synthesis in Medical
Imaging. Springer, Cham, 2017.
Generative Models
Vue.ai
Deep Reinforcement Learning
Mnih, Volodymyr, et al. "Playing Atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
DQN - 600 epochs
Silver, David, et al. "Mastering the game of Go without human knowledge." Nature 550.7676 (2017): 354-359.
AlphaGo defeats Lee Sedol
Deep Reinforcement Learning
Moravčík, Matej, et al. "DeepStack: Expert-level artificial intelligence in no-limit poker." arXiv preprint arXiv:1701.01724 (2017).
DeepStack
Introduction
For Deep Learning, you need:
1. Artificial Neural Network
2. Loss
3. Optimizer
4. Data
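Those four pieces map one-to-one onto a few lines of Keras (the library recommended at the end of this deck). A minimal sketch; the toy data, layer sizes, and optimizer choice here are illustrative, not from the talk:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# 4. Data: a toy binary classification problem (placeholder values)
x = np.random.rand(200, 8)
y = (x.sum(axis=1) > 4).astype(int)

# 1. Artificial Neural Network
model = Sequential()
model.add(Dense(16, activation='relu', input_shape=(8,)))
model.add(Dense(1, activation='sigmoid'))

# 2. Loss and 3. Optimizer
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(x, y, epochs=10, batch_size=32)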
Artificial Neurons
Feedforward Recurrent
Artificial Neurons
$o = f(x) = f(\mathbf{w}^T \mathbf{i} + b)$

[Diagram: inputs i1, i2, i3 multiplied by weights w1, w2, w3 and summed with bias b to give pre-activation x; activation f yields output o]
Artificial Neurons
$o = \sigma(x) = \sigma(\mathbf{w}^T \mathbf{i} + b)$

[Diagram: the same neuron with a sigmoid activation: inputs i1, i2, i3, weights w1, w2, w3, bias b, pre-activation x, output o]
Logistic Regression
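A single sigmoid neuron is exactly logistic regression. A minimal NumPy sketch, with made-up input and weight values:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

i = np.array([0.5, -1.0, 2.0])   # inputs i1, i2, i3 (illustrative)
w = np.array([0.1, 0.4, -0.3])   # weights w1, w2, w3 (illustrative)
b = 0.2                          # bias

x = w @ i + b                    # pre-activation: w^T i + b
o = sigmoid(x)                   # output squashed to (0, 1)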
Neural Networks
[Diagram: inputs i1, i2 feed an Input layer (x1, x2), a Hidden layer (h1, h2), and an Output layer (y) producing o, shown alongside a Support Vector Machine for comparison]
Neural Networks
[Diagram: a decision tree with internal nodes h1-h7 over inputs x1, x2, redrawn as a network with hidden units h1-h7 and output y]
Sethi, Ishwar Krishnan. "Entropy nets: From decision trees to neural networks." Proceedings of the IEEE 78.10 (1990): 1605-1613.
Neural Networks
[Diagram: the same tree-to-network mapping extended to a deeper tree with internal nodes h1-h15 over inputs x1, x2, giving a network with an extra hidden layer and output y]
Sethi, Ishwar Krishnan. "Entropy nets: From decision trees to neural networks." Proceedings of the IEEE 78.10 (1990): 1605-1613.
Neural Networks
[Diagram: the 2-2-1 network: inputs i1, i2, input units x1, x2, hidden units h1, h2, output y]

$f(x_2) = \sigma(i_2 w_{x_2,i_2} + b_{x_2})$

$f(h_2) = \sigma(w_{h_2,x_1} f(x_1) + w_{h_2,x_2} f(x_2) + b_{h_2})$
$= \sigma(w_{h_2,x_1}\,\sigma(i_1 w_{x_1,i_1} + b_{x_1}) + w_{h_2,x_2}\,\sigma(i_2 w_{x_2,i_2} + b_{x_2}) + b_{h_2})$

$f(y) = \sigma(w_{y,h_1} f(h_1) + w_{y,h_2} f(h_2) + b_y)$
$= \sigma(w_{y,h_1}\,\sigma(w_{h_1,x_1}\,\sigma(i_1 w_{x_1,i_1} + b_{x_1}) + w_{h_1,x_2}\,\sigma(i_2 w_{x_2,i_2} + b_{x_2}) + b_{h_1}) + w_{y,h_2}\,\sigma(w_{h_2,x_1}\,\sigma(i_1 w_{x_1,i_1} + b_{x_1}) + w_{h_2,x_2}\,\sigma(i_2 w_{x_2,i_2} + b_{x_2}) + b_{h_2}) + b_y)$

17 parameters: θ = {w, b}
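Writing that composition out in NumPy makes the nesting concrete. A sketch of the forward pass, with arbitrary placeholder parameters and the layer naming taken from the diagram above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# arbitrary placeholder parameters theta = {w, b}
w_x = np.array([0.1, -0.2])            # w_{x1,i1}, w_{x2,i2}
b_x = np.array([0.0, 0.1])
W_h = np.array([[0.3, -0.4],           # w_{h1,x1}, w_{h1,x2}
                [0.5,  0.6]])          # w_{h2,x1}, w_{h2,x2}
b_h = np.array([0.0, -0.1])
w_y = np.array([0.7, -0.8])            # w_{y,h1}, w_{y,h2}
b_y = 0.2

i = np.array([1.0, 0.0])               # inputs i1, i2
f_x = sigmoid(w_x * i + b_x)           # input units x1, x2
f_h = sigmoid(W_h @ f_x + b_h)         # hidden units h1, h2
f_y = sigmoid(w_y @ f_h + b_y)         # output y
print(f_y)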
Backpropagation
1. Initialize θ randomly
2. Iterate:
   1. Forward pass: compute the loss
   2. Backward pass: update the parameters
Backpropagation
[Diagram: the 2-2-1 network (i1, i2 → x1, x2 → h1, h2 → y) trained to compute XOR]

i1  i2 | o
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 0

$\hat{y} \approx P(o)$

$J(o, \hat{y}) = \frac{1}{2}(o - \hat{y})^2$

$\nabla_\theta J(o, \hat{y}) = \left[\frac{\partial J}{\partial w_{x_1,i_1}}, \frac{\partial J}{\partial b_{x_1}}, \frac{\partial J}{\partial w_{x_2,i_2}}, \frac{\partial J}{\partial b_{x_2}}, \ldots, \frac{\partial J}{\partial w_{y,h_2}}\right]^T$
Backpropagation
[Plot: loss J as a function of a single weight w; the forward pass evaluates J, the backward pass follows the gradient ∂J/∂w downhill]

$w' = w - \alpha \frac{\partial J}{\partial w}$

α: learning rate
Backpropagation
[Diagram: the XOR network, with $\hat{y} \approx o$]

$\frac{\partial J}{\partial w_{y,h_1}} = \frac{\partial J}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_{y,h_1}} = -\sigma(\hat{y})(1 - \sigma(\hat{y}))\, f(h_1)$

…
Backpropagation
[Diagram: the XOR network, with $\hat{y} \approx o$]

$\frac{\partial J}{\partial w_{h_1,x_1}} = \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_{h_1,x_1}}$

$\frac{\partial J}{\partial w_{h_2,x_2}} = \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial h_2} \cdot \frac{\partial h_2}{\partial w_{h_2,x_2}}$
Backpropagation
[Diagram: the XOR network, with $\hat{y} \approx o$]

$\frac{\partial J}{\partial w_{x_1,i_1}} = \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial h_1} \cdot \frac{\partial h_1}{\partial x_1} \cdot \frac{\partial x_1}{\partial w_{x_1,i_1}} + \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial h_2} \cdot \frac{\partial h_2}{\partial x_1} \cdot \frac{\partial x_1}{\partial w_{x_1,i_1}}$
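Putting the pieces together, a NumPy sketch of the whole loop on the XOR table: random initialization, forward pass to get the loss, chain-rule backward pass, gradient update. For brevity it uses weight matrices for a 2-2-1 network rather than the per-edge names on the slides, and the learning rate and step count are arbitrary:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
o = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros((1, 2))   # input -> hidden
W2, b2 = rng.normal(size=(2, 1)), np.zeros((1, 1))   # hidden -> output
alpha = 1.0

for step in range(5000):
    # forward pass: loss J = 1/2 (o - y_hat)^2
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    # backward pass: chain rule, layer by layer
    d_y = -(o - y_hat) * y_hat * (1 - y_hat)   # dJ/d(pre-activation of y)
    d_h = (d_y @ W2.T) * h * (1 - h)           # dJ/d(pre-activation of h)
    # gradient descent update: w' = w - alpha * dJ/dw
    W2 -= alpha * (h.T @ d_y)
    b2 -= alpha * d_y.sum(axis=0)
    W1 -= alpha * (X.T @ d_h)
    b1 -= alpha * d_h.sum(axis=0)

print(y_hat.round(2))   # should approach [0, 1, 1, 0]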
Optimizers
1. Gradient Descent: $w' = w - \alpha \frac{\partial J}{\partial w}$
2. Stochastic Gradient Descent: approximate $\frac{\partial J}{\partial w}$ on mini-batches
3. Momentum: $v = \gamma v + \alpha \frac{\partial J}{\partial w}$, $w' = w - v$
4. Adagrad/Adadelta: per-parameter decaying learning rates
5. RMSprop: normalize by a running average of gradient magnitudes
6. Adam: RMSprop + momentum
Image courtesy Chris Olah, 2014
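The update rules above differ by only a line or two. A NumPy sketch contrasting plain (stochastic) gradient descent with the momentum update; the α and γ values and the example gradient are arbitrary:

import numpy as np

def sgd_step(w, grad, alpha=0.01):
    # gradient descent: step against the gradient
    return w - alpha * grad

def momentum_step(w, v, grad, alpha=0.01, gamma=0.9):
    # momentum: accumulate a velocity v, then step along it
    v = gamma * v + alpha * grad
    return w - v, v

w, v = np.zeros(3), np.zeros(3)
grad = np.array([0.5, -1.0, 0.2])   # stand-in for dJ/dw from one batch
w = sgd_step(w, grad)
w, v = momentum_step(w, v, grad)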
Convolutional Neural Networks
CNN/convnet neurons:
1. Have receptive field
2. Share weights
3. Max pooling
Images courtesy Vincent Dumoulin, 2016
Convolutional Neural Networks
CNN/convnet neurons:
1. Have receptive field
2. Share weights
3. Max pooling
[Animation: a kernel sliding across the Input feature map to produce the Output feature map]
Images courtesy Vincent Dumoulin, 2016
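All three properties show up directly in a Keras model. A minimal sketch (input size and filter counts are placeholders): Conv2D units see only a local receptive field and share their kernel weights across positions, and MaxPooling2D downsamples:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# 3x3 kernels: local receptive fields with weights shared across the image
model.add(Conv2D(16, (3, 3), activation='relu', input_shape=(64, 64, 1)))
# 2x2 max pooling: keep the strongest response in each neighbourhood
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(2, activation='softmax'))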
Convolutional Neural Networks
Roughly 90% of AlexNet's parameters sit in its fully-connected layers. AlexNet was trained using:
1. Dropout
2. Local response normalization
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet
classification with deep convolutional neural networks." Advances in
neural information processing systems. 2012.
Convolutional Neural Networks
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional
networks for large-scale image recognition." arXiv preprint
arXiv:1409.1556 (2014).
VGG16
Convolutional Neural Networks
ResNet
He, Kaiming, et al. "Deep residual learning for image
recognition." Proceedings of the IEEE conference on computer vision
and pattern recognition. 2016.
152 convolutional layers
Skip (residual) connections
GoogLeNet
Szegedy, Christian, et al. "Going deeper with
convolutions." Proceedings of the IEEE conference on computer vision
and pattern recognition. 2015.
1. Deep Supervision helps training
2. 1x1 convolutions can replace fully-connected layers
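Point 2 is easy to see in code. A hedged Keras sketch (layer sizes illustrative) where a 1x1 convolution mixes channels at every spatial position, doing the job of a fully-connected layer while keeping the network fully convolutional:

from keras.models import Sequential
from keras.layers import Conv2D, GlobalAveragePooling2D, Activation

model = Sequential()
model.add(Conv2D(64, (3, 3), activation='relu', input_shape=(32, 32, 3)))
# 1x1 convolution: a per-position fully-connected layer across channels
model.add(Conv2D(10, (1, 1)))
# average each of the 10 class maps down to a single score
model.add(GlobalAveragePooling2D())
model.add(Activation('softmax'))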
NeuroImaging Applications
1. Alzheimer’s Prediction
2. T1w MRI Quality Control
3. MRI Tissue Segmentation
4. PET Brain Extraction
Longitudinal patient data with missing measurements (blank cells = not acquired):

id | t | x1    | x2   | x3    | x4    | x5   | x6   | DX
1  | 0 | 0.10  | 0.25 | -0.20 | 0.01  |      |      | Healthy
1  | 1 |       |      | -0.20 | 0.01  |      |      | Healthy
1  | 2 | 0.21  | 0.14 | -0.31 | 0.01  |      |      | MCI
1  | 3 | 0.12  | 0.32 | -0.28 | 0.11  |      |      | MCI
2  | 0 | -0.01 | 0.35 |       | -0.42 | 0.29 | 0.20 | MCI
2  | 1 | 0.03  | 0.40 |       | -0.82 |      |      | MCI
2  | 2 | 0.10  | 0.89 |       | -0.21 |      |      | Alzheimer's
…
(id 1 = Patient 1, id 2 = Patient 2, …)
The same table with the missing values filled in:

id | t | x1    | x2   | x3    | x4    | x5    | x6   | DX
1  | 0 | 0.10  | 0.25 | -0.20 | 0.01  | -0.20 | 0.01 | Healthy
1  | 1 | 0.10  | 0.25 | -0.20 | 0.01  | -0.20 | 0.01 | Healthy
1  | 2 | 0.21  | 0.14 | -0.31 | 0.01  | -0.20 | 0.01 | MCI
1  | 3 | 0.12  | 0.32 | -0.28 | 0.11  | -0.20 | 0.01 | MCI
2  | 0 | -0.01 | 0.35 | -0.20 | -0.42 | 0.29  | 0.20 | MCI
2  | 1 | 0.03  | 0.40 | -0.20 | -0.82 | 0.29  | 0.20 | MCI
2  | 2 | 0.10  | 0.89 | -0.20 | -0.21 | 0.29  | 0.20 | Alzheimer's
…
$P(DX_{t+\Delta t} \mid DX_t, X_t)$

[Diagram: fully-connected classifier; the input layer takes X (374 features) plus Δt, hidden fully-connected layers of width 512, 1024, 1024, 512, 512 receive Δt alongside each layer, and a 3-class softmax layer gives the output]

93% Accuracy
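A hedged Keras sketch of a model along the lines of the diagram: fully-connected layers over the tabular features X with the horizon Δt as an extra input, ending in a 3-class softmax over Healthy/MCI/Alzheimer's. The widths are read off the slide; how Δt is actually injected in the original model is an assumption here:

from keras.models import Model
from keras.layers import Input, Dense, concatenate

x_in = Input(shape=(374,), name='X')        # tabular features at time t
dt_in = Input(shape=(1,), name='delta_t')   # prediction horizon
h = concatenate([x_in, dt_in])
for width in (512, 1024, 1024, 512, 512):
    h = Dense(width, activation='relu')(h)
    h = concatenate([h, dt_in])             # re-inject delta_t (assumption)
out = Dense(3, activation='softmax')(h)     # P(DX_{t+dt} | DX_t, X_t)

model = Model(inputs=[x_in, dt_in], outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy')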
Automatic QC of T1w Brain MRI
P(QC | MRI)

[Diagram: convolutional classifier with layer widths 16, 32, 32, 64, 64, 256, 256, 256, 128, 2; legend: 3x3 convolutional layer, 2x2 max pooling layer, fully-connected layer, 2-class softmax layer]
Automatic QC of T1w Brain MRI
+/- 10 voxels
Dataset | Sensitivity | Specificity
IBIS    | 97%         | 96%
Segmentation
Kamnitsas, Konstantinos, et al. "Efficient multi-scale 3D CNN with fully
connected CRF for accurate brain lesion segmentation." Medical image
analysis 36 (2017): 61-78.
DeepMedic
Segmentation
Çiçek, Özgün, et al. "3D U-Net: learning dense volumetric
segmentation from sparse annotation." International Conference on
Medical Image Computing and Computer-Assisted Intervention.
Springer International Publishing, 2016.
Dilated Convolutions
Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions." arXiv preprint arXiv:1511.07122 (2015).
Efficient Multi-scale
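In Keras, dilation is a single argument to the convolution layer. A minimal sketch (shapes illustrative): with dilation_rate=2, the taps of a 3x3 kernel are spread apart so it covers a 5x5 neighbourhood while still holding only nine weights per channel:

from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(32, (3, 3), dilation_rate=2, activation='relu',
                 input_shape=(128, 128, 1)))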
PET Brain Extraction
Funck, T. et al. Brain tissue segmentation from multiple PET radiotracers.
Poster at Montreal Artificial Intelligence in Neuroscience conference, 2017
[Figure: predicted vs. ground-truth brain extractions for four PET radiotracers: FMZ, RCL, FDOPA, FDG]
Motion Estimation
Iglesias, Juan Eugenio, et al. "Retrospective head motion estimation in
structural brain MRI with 3D CNNs." International Conference on
Medical Image Computing and Computer-Assisted Intervention.
Springer, Cham, 2017.
Motion Estimation
[Figure: Grad-CAM visualizations for PASS and FAIL scans]
Selvaraju, Ramprasaath R., et al. "Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization." arXiv preprint arXiv:1610.02391 (2016).
Challenges
1. Data quantity
2. Data size
3. Data quality
4. Data expense
5. Data variability
6. Unexpected pathology
Start here
http://keras.io
http://www.deeplearningbook.org/
Autism Prediction
Heinsfeld, Anibal Sólon, et al. "Identification of autism spectrum disorder using deep learning and the ABIDE dataset." NeuroImage: Clinical 17 (2018): 16-23.
Denoising autoencoders