
Intro to Deep Learning for NeuroImaging

Andrew Doyle

@crocodoyle

McGill Centre for Integrative Neuroscience

Outline

1. GET EXCITED

2. Artificial Neural Networks

3. Backpropagation

4. Convolutional Neural Networks

5. Neuroimaging Applications

ImageNet-1000 Results

Image courtesy Aaron Courville, 2016

Generative Models

Deep Blood by Team BloodArtBrainBrush

Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "Image style transfer using convolutional neural networks." Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on. IEEE, 2016.

Generative Models

Zhang, Han, et al. "StackGAN: Text to photo-realistic image synthesis with stacked generative adversarial networks." arXiv preprint arXiv:1612.03242 (2016).

StackGAN

Generative Models

Zhu, Jun-Yan, et al. "Unpaired image-to-image translation using cycle-consistent adversarial networks." arXiv preprint arXiv:1703.10593 (2017).

CycleGAN

Generative Models

Paired Data Unpaired Data

Wolterink, Jelmer M., et al. "Deep MR to CT synthesis using unpaired data." International Workshop on Simulation and Synthesis in Medical Imaging. Springer, Cham, 2017.

Generative Models


Vue.ai

Deep Reinforcement Learning

Mnih, Volodymyr, et al. "Playing Atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).

DQN - 600 epochs

Silver, David, et al. "Mastering the game of Go without human knowledge." Nature 550.7676 (2017): 354-359.

AlphaGo defeats Lee Sedol

Deep Reinforcement Learning

Moravčík, Matej, et al. "DeepStack: Expert-level artificial intelligence in no-limit poker." arXiv preprint arXiv:1701.01724 (2017).

DeepStack

Introduction

For Deep Learning, you need:

1. Artificial Neural Network

2. Loss

3. Optimizer

4. Data
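All four ingredients fit in a few lines of Keras. The sketch below is a minimal, illustrative setup; the toy data, layer sizes, and choice of loss and optimizer are placeholders, not anything from these slides.

```python
# Minimal sketch, assuming a toy problem: 100 samples of 10 features, binary labels.
import numpy as np
from tensorflow import keras

# 4. Data
X = np.random.rand(100, 10).astype("float32")
y = np.random.randint(0, 2, size=(100, 1))

# 1. Artificial Neural Network
model = keras.models.Sequential([
    keras.layers.Dense(8, activation="relu", input_shape=(10,)),
    keras.layers.Dense(1, activation="sigmoid"),
])

# 2. Loss and 3. Optimizer
model.compile(loss="binary_crossentropy", optimizer="sgd", metrics=["accuracy"])

# Training alternates forward passes (compute the loss)
# and backward passes (update the parameters).
model.fit(X, y, epochs=5, batch_size=16)
```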

Artificial Neurons

Feedforward Recurrent

Artificial Neurons

π‘œ = 𝑓 π‘₯ = 𝑓 π’˜π‘»π’Š + 𝒃

i1

i2

i3

o

w1i1

w2i2

w3i3b

x

Artificial Neurons

Artificial Neurons

o = σ(x) = σ(w^T i + b)

[Diagram: the same neuron with the sigmoid σ as activation function]

Logistic Regression
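The same neuron can be written directly in NumPy; the weights, bias, and inputs below are made-up values for illustration.

```python
# A single sigmoid neuron: o = σ(w^T i + b).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

i = np.array([0.5, -1.0, 2.0])   # inputs i1, i2, i3
w = np.array([0.1, 0.4, -0.3])   # weights w1, w2, w3
b = 0.2                          # bias

x = np.dot(w, i) + b             # weighted sum
o = sigmoid(x)                   # output: this is logistic regression
print(o)
```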

Neural Networks

[Diagram: a network with inputs x1, x2, hidden units h1, h2, and output y, shown alongside a single neuron (i1, i2 → o) and a Support Vector Machine; layers labelled Input, Hidden, Output]

Neural Networks

[Diagram: a decision tree over x1, x2 with decision nodes h1–h7, and the equivalent neural network with hidden units h1–h7 and output y]

Sethi, Ishwar Krishnan. "Entropy nets: From decision trees to neural networks." Proceedings of the IEEE 78.10 (1990): 1605-1613.

Neural Networks

[Diagram: the same decision tree (nodes h1–h7) redrawn as a neural network over x1, x2 with output y]

Sethi, Ishwar Krishnan. "Entropy nets: From decision trees to neural networks." Proceedings of the IEEE 78.10 (1990): 1605-1613.

[Diagram: a deeper tree over x1, x2 with nodes h1–h15, and the corresponding network with hidden layers h1–h7 and h8–h15 and output y]

Neural Networks

[Diagram: inputs i1, i2 feed units x1, x2, which feed hidden units h1, h2, which feed the output y]

f(x2) = σ(w_{x2,i2} i2 + b_{x2})

f(h2) = σ(w_{h2,x1} f(x1) + w_{h2,x2} f(x2) + b_{h2})
      = σ(w_{h2,x1} σ(w_{x1,i1} i1 + b_{x1}) + w_{h2,x2} σ(w_{x2,i2} i2 + b_{x2}) + b_{h2})

f(y) = σ(w_{y,h1} f(h1) + w_{y,h2} f(h2) + b_y)
     = σ(w_{y,h1} σ(w_{h1,x1} σ(w_{x1,i1} i1 + b_{x1}) + w_{h1,x2} σ(w_{x2,i2} i2 + b_{x2}) + b_{h1})
        + w_{y,h2} σ(w_{h2,x1} σ(w_{x1,i1} i1 + b_{x1}) + w_{h2,x2} σ(w_{x2,i2} i2 + b_{x2}) + b_{h2})
        + b_y)

17 parameters θ = {w, b}
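A sketch of this forward pass in NumPy, following the wiring above (i1 into x1, i2 into x2, both x's into h1 and h2, both h's into y); the parameter values are random placeholders.

```python
# Forward pass of the small network above; only the wiring follows the slide.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

i1, i2 = 1.0, 0.0   # inputs

rng = np.random.default_rng(0)
w = {k: rng.normal() for k in
     ["x1,i1", "x2,i2", "h1,x1", "h1,x2", "h2,x1", "h2,x2", "y,h1", "y,h2"]}
b = {k: 0.0 for k in ["x1", "x2", "h1", "h2", "y"]}

f_x1 = sigmoid(w["x1,i1"] * i1 + b["x1"])
f_x2 = sigmoid(w["x2,i2"] * i2 + b["x2"])
f_h1 = sigmoid(w["h1,x1"] * f_x1 + w["h1,x2"] * f_x2 + b["h1"])
f_h2 = sigmoid(w["h2,x1"] * f_x1 + w["h2,x2"] * f_x2 + b["h2"])
y_hat = sigmoid(w["y,h1"] * f_h1 + w["y,h2"] * f_h2 + b["y"])
print(y_hat)
```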

Backpropagation

1. Random θ initialization

Iterate:

1. Forward - compute loss

2. Backward - update parameters

forward pass

backward pass

Backpropagation

[Diagram: the network (i1, i2 → x1, x2 → h1, h2 → y) trained to compute XOR]

i1 i2 | o
 0  0 | 0
 0  1 | 1
 1  0 | 1
 1  1 | 0

ŷ ≈ P(o)

XOR

forward pass

backward pass

𝐽 π‘œ, ΰ·œπ‘¦ =1

2(π‘œ βˆ’ ΰ·œπ‘¦)2

π›»πœƒπ½ π‘œ, ΰ·œπ‘¦ =πœ•π½

πœ•π‘€π‘₯1,𝑖1

,πœ•π½

πœ•π‘π‘₯1,

πœ•π½

πœ•π‘€π‘₯2,𝑖2

,πœ•π½

πœ•π‘π‘₯2, … ,

πœ•π½

πœ•π‘€π‘¦,β„Ž2

𝑇

Backpropagation

[Plot: the loss J as a function of a weight w, with the gradient ∂J/∂w at the current point]

w′ = w − α ∂J/∂w        (α: learning rate)
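A minimal sketch of this update rule on a stand-in one-dimensional loss J(w) = (w − 3)², showing how repeated steps against the gradient, scaled by the learning rate α, move w toward the minimum.

```python
# Gradient descent on a stand-in loss J(w) = (w - 3)^2.
def J(w):
    return (w - 3.0) ** 2

def dJ_dw(w):
    return 2.0 * (w - 3.0)

w = 0.0        # initial weight
alpha = 0.1    # learning rate
for step in range(50):
    w = w - alpha * dJ_dw(w)   # step against the gradient
print(w, J(w))  # w approaches 3, where the loss is smallest
```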

Backpropagation

[Diagram: the XOR network, highlighting the output-layer weight w_{y,h1}]

ŷ ≈ o

∂J/∂w_{y,h1} = (∂J/∂ŷ) (∂ŷ/∂w_{y,h1}) = −(o − ŷ) ŷ (1 − ŷ) f(h1)

…

Backpropagation

[Diagram: the XOR network, highlighting the hidden-layer weights]

ŷ ≈ o

∂J/∂w_{h1,x1} = (∂J/∂ŷ) (∂ŷ/∂h1) (∂h1/∂w_{h1,x1})

∂J/∂w_{h2,x2} = (∂J/∂ŷ) (∂ŷ/∂h2) (∂h2/∂w_{h2,x2})

Backpropagation

[Diagram: the XOR network, highlighting the first-layer weight w_{x1,i1}]

ŷ ≈ o

∂J/∂w_{x1,i1} = (∂J/∂ŷ) (∂ŷ/∂h1) (∂h1/∂x1) (∂x1/∂w_{x1,i1})
              + (∂J/∂ŷ) (∂ŷ/∂h2) (∂h2/∂x1) (∂x1/∂w_{x1,i1})

Optimizers

1. Gradient Descent: w′ = w − α ∂J/∂w
2. Stochastic Gradient Descent: approximate ∂J/∂w on mini-batches
3. Momentum: v = γv + α ∂J/∂w,  w′ = w − v
4. Adagrad/Adadelta: parameter-wise decaying learning rate
5. RMSprop: averaged gradients
6. Adam: RMSprop + momentum

Image courtesy Chris Olah, 2014
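In Keras these optimizers are ready-made classes chosen at compile time; the tiny model below is a placeholder, and the learning rates are common defaults rather than recommendations.

```python
# Placeholder model; only the optimizer choice matters here.
from tensorflow import keras

model = keras.models.Sequential(
    [keras.layers.Dense(3, activation="softmax", input_shape=(8,))])

# (learning_rate is called lr in older standalone Keras releases)
sgd      = keras.optimizers.SGD(learning_rate=0.01)                 # 1. gradient descent
momentum = keras.optimizers.SGD(learning_rate=0.01, momentum=0.9)   # 3. momentum
rmsprop  = keras.optimizers.RMSprop(learning_rate=0.001)            # 5. RMSprop
adam     = keras.optimizers.Adam(learning_rate=0.001)               # 6. RMSprop + momentum

model.compile(loss="categorical_crossentropy", optimizer=adam)
```

Mini-batching for stochastic gradient descent comes from the batch_size argument to model.fit rather than from the optimizer itself.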

Convolutional Neural Networks

CNN/convnet neurons:

1. Have receptive field

2. Share weights

3. Max pooling

Images courtesy Vincent Dumoulin, 2016
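A minimal convolutional block in Keras illustrating all three properties; the 64x64 single-channel input shape and filter counts are placeholders.

```python
# Two convolution + max-pooling stages; shapes are placeholders.
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Conv2D(16, kernel_size=3, padding="same", activation="relu",
                        input_shape=(64, 64, 1)),   # 3x3 receptive field, shared weights
    keras.layers.MaxPooling2D(pool_size=2),          # max pooling halves the resolution
    keras.layers.Conv2D(32, kernel_size=3, padding="same", activation="relu"),
    keras.layers.MaxPooling2D(pool_size=2),
])
model.summary()
```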

Convolutional Neural Networks

CNN/convnet neurons:

1. Have receptive field

2. Share weights

3. Max pooling

[Animation: input and output feature maps of a convolution]

Images courtesy Vincent Dumoulin, 2016

Convolutional Neural Networks

90% of parameters

AlexNet trained using:

1. Dropout

2. Batch Normalization

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.
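Dropout and batch normalization are single layers in Keras. The block below is a generic sketch (not the actual AlexNet architecture): dropout randomly zeroes activations during training, and batch normalization standardizes activations per mini-batch.

```python
# Generic dense block with batch normalization and dropout (not AlexNet itself).
from tensorflow import keras

model = keras.models.Sequential([
    keras.layers.Dense(256, input_shape=(100,)),
    keras.layers.BatchNormalization(),   # standardize activations per mini-batch
    keras.layers.Activation("relu"),
    keras.layers.Dropout(0.5),           # randomly zero 50% of activations while training
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(loss="categorical_crossentropy", optimizer="adam")
```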

Convolutional Neural Networks

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.

Convolutional Neural Networks

Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional networks for large-scale image recognition." arXiv preprint arXiv:1409.1556 (2014).

VGG16

Convolutional Neural Networks

ResNet

He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.

152 convolutional layers

Skip (residual) connections
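A skip (residual) connection in the Keras functional API, the basic building block that ResNet stacks many layers deep; the input shape and filter counts below are placeholders.

```python
# One residual block: output = activation(input + F(input)).
from tensorflow import keras

inputs = keras.layers.Input(shape=(32, 32, 64))
x = keras.layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = keras.layers.Conv2D(64, 3, padding="same")(x)
x = keras.layers.Add()([inputs, x])          # the skip (residual) connection
outputs = keras.layers.Activation("relu")(x)

block = keras.models.Model(inputs, outputs)
block.summary()
```

The Add layer means each block only has to learn a correction to its input, which is what keeps very deep networks trainable.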

GoogLeNet

Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.

1. Deep Supervision helps training

2. 1x1 convolutions can replace fully-connected layers
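One way to realize point 2, sketched in Keras: a 1x1 convolution produces one map per class and global average pooling collapses it, replacing a fully-connected classifier head. The 8x8x256 feature map and 10 classes are placeholder values.

```python
# 1x1 convolution + global average pooling as a classifier head.
from tensorflow import keras

features = keras.layers.Input(shape=(8, 8, 256))           # feature maps from a conv backbone
x = keras.layers.Conv2D(10, kernel_size=1)(features)       # one 1x1-conv map per class
x = keras.layers.GlobalAveragePooling2D()(x)                # average each map to a single score
probs = keras.layers.Activation("softmax")(x)

head = keras.models.Model(features, probs)
```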

NeuroImaging Applications

1. Alzheimer’s Prediction

2. T1w MRI Quality Control

3. MRI Tissue Segmentation

4. PET Brain Extraction

Longitudinal visits for Patient 1 (id 1) and Patient 2 (id 2), with missing features and visits:

id t  x1     x2    x3     x4     x5    x6    DX
1  0  0.10   0.25  -0.20  0.01               Healthy
1  1               -0.20  0.01               Healthy
1  2  0.21   0.14  -0.31  0.01               MCI
1  3  0.12   0.32  -0.28  0.11               MCI
2  0  -0.01  0.35         -0.42  0.29  0.20  MCI
2  1  0.03   0.40         -0.82              MCI
2  2  0.10   0.89         -0.21              Alzheimer's

The same visits with the missing entries filled in:

id t  x1     x2    x3     x4     x5    x6    DX
1  0  0.10   0.25  -0.20  0.01   -0.20 0.01  Healthy
1  1  0.10   0.25  -0.20  0.01   -0.20 0.01  Healthy
1  2  0.21   0.14  -0.31  0.01   -0.20 0.01  MCI
1  3  0.12   0.32  -0.28  0.11   -0.20 0.01  MCI
2  0  -0.01  0.35  -0.20  -0.42  0.29  0.20  MCI
2  1  0.03   0.40  -0.20  -0.82  0.29  0.20  MCI
2  2  0.10   0.89  -0.20  -0.21  0.29  0.20  Alzheimer's

P(DX_{t+Δt} | DX_t, X_t)

[Network diagram: an input layer of 374 features X plus Δt, fully-connected layers of width 512, 1024, 1024, 512, 512 (with Δt also fed to intermediate layers), and a 3-class softmax output predicting P(DX_{t+Δt} | DX_t, X_t)]

93% Accuracy
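A hedged Keras sketch of a network of this kind: features X_t plus Δt in, a 3-class softmax over Healthy / MCI / Alzheimer's out. The layer widths follow the figure, but the ReLU activations and the exact points where Δt and DX_t enter the network are assumptions, not the published model.

```python
# Hypothetical fully-connected model; widths follow the figure, activations
# and the handling of Δt / DX_t are assumptions.
from tensorflow import keras

n_features = 374   # size of the feature vector X_t in the figure
model = keras.models.Sequential([
    keras.layers.Dense(512, activation="relu", input_shape=(n_features + 1,)),  # X_t plus Δt
    keras.layers.Dense(1024, activation="relu"),
    keras.layers.Dense(1024, activation="relu"),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.Dense(3, activation="softmax"),   # Healthy / MCI / Alzheimer's
])
model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
```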

Automatic QC of T1w Brain MRI

P(QC|MRI)

[Network diagram: layer widths 16, 32, 32, 64, 64, 256, 256, 256, 128, 2, built from 3x3 convolutional layers, 2x2 max pooling layers, fully-connected layers, and a 2-class softmax layer]

Automatic QC of T1w Brain MRI

+/- 10 voxels

Dataset | Sensitivity | Specificity
IBIS    | 97%         | 96%

Segmentation

Kamnitsas, Konstantinos, et al. "Efficient multi-scale 3D CNN with fully connected CRF for accurate brain lesion segmentation." Medical Image Analysis 36 (2017): 61-78.

DeepMedic

Segmentation

Çiçek, Özgün, et al. "3D U-Net: learning dense volumetric segmentation from sparse annotation." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer International Publishing, 2016.

Dilated Convolutions

Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions." arXiv preprint arXiv:1511.07122 (2015).

Efficient Multi-scale
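Dilated convolutions are available directly in Keras through the dilation_rate argument; the sketch below (placeholder shapes) shows a 3x3 kernel whose taps are spaced two pixels apart, enlarging the receptive field without pooling or extra parameters.

```python
# A 3x3 convolution with dilation rate 2; shapes are placeholders.
from tensorflow import keras

x = keras.layers.Input(shape=(128, 128, 1))
y = keras.layers.Conv2D(32, kernel_size=3, dilation_rate=2,
                        padding="same", activation="relu")(x)
model = keras.models.Model(x, y)
model.summary()
```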

PET Brain Extraction

Funck, T., et al. "Brain tissue segmentation from multiple PET radiotracers." Poster at the Montreal Artificial Intelligence in Neuroscience conference, 2017.

[Figure: predicted vs. ground-truth brain masks for FMZ, RCL, FDOPA, and FDG radiotracer images]

Motion Estimation

Iglesias, Juan Eugenio, et al. "Retrospective head motion estimation in structural brain MRI with 3D CNNs." International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, Cham, 2017.

Motion Estimation

[Figure: Grad-CAM visual explanations for PASS and FAIL predictions]

Selvaraju, Ramprasaath R., et al. "Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization." arXiv preprint arXiv:1610.02391 (2016).

Challenges

1. Data quantity

2. Data size

3. Data quality

4. Data expense

5. Data variability

6. Unexpected pathology

Start here

http://keras.io

http://www.deeplearningbook.org/

Autism Prediction

Heinsfeld, Anibal Sólon, et al. "Identification of autism spectrum disorder using deep learning and the ABIDE dataset." NeuroImage: Clinical 17 (2018): 16.

Denoising autoencoders
