TRANSCRIPT
Intro to Deep Learning for NeuroImaging
Andrew Doyle
@crocodoyle
McGill Centre for Integrative Neuroscience
Outline
1. GET EXCITED
2. Artificial Neural Networks
3. Backpropagation
4. Convolutional Neural Networks
5. Neuroimaging Applications
ImageNet-1000 Results
Image courtesy Aaron Courville, 2016
Generative Models
[Figure: image style transfer examples]
Gatys, Leon A., Alexander S. Ecker, and Matthias Bethge. "Image style transfer using convolutional neural networks." Computer Vision and Pattern Recognition (CVPR), 2016 IEEE Conference on. IEEE, 2016.
Generative Models
Zhang, Han, et al. "StackGAN: Text to photo-realistic image synthesis
with stacked generative adversarial networks." arXiv preprint
arXiv:1612.03242 (2016).
StackGAN
Generative Models
Zhu, Jun-Yan, et al. "Unpaired image-to-image translation using cycle-
consistent adversarial networks." arXiv preprint
arXiv:1703.10593 (2017).
CycleGAN
Generative Models
[Figure: paired vs. unpaired training data]
Wolterink, Jelmer M., et al. "Deep MR to CT synthesis using unpaired
data." International Workshop on Simulation and Synthesis in Medical
Imaging. Springer, Cham, 2017.
Generative Models
Vue.ai
Deep Reinforcement Learning
Mnih, Volodymyr, et al. "Playing Atari with deep reinforcement learning." arXiv preprint arXiv:1312.5602 (2013).
DQN - 600 epochs
Silver, David, et al. "Mastering the game of Go without human knowledge." Nature 550.7676 (2017): 354-359.
AlphaGo defeats Lee Sedol
Deep Reinforcement Learning
Moravčík, Matej, et al. "DeepStack: Expert-level artificial intelligence in no-limit poker." arXiv preprint arXiv:1701.01724 (2017).
DeepStack
Introduction
For Deep Learning, you need:
1. Artificial Neural Network
2. Loss
3. Optimizer
4. Data
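Those four pieces map one-to-one onto a few lines of Keras (the library recommended at the end of this deck). A minimal sketch; the toy data, layer sizes, and optimizer choice here are illustrative, not from the talk:

import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# 4. Data: a toy binary classification problem (placeholder values)
x = np.random.rand(200, 8)
y = (x.sum(axis=1) > 4).astype(int)

# 1. Artificial Neural Network
model = Sequential()
model.add(Dense(16, activation='relu', input_shape=(8,)))
model.add(Dense(1, activation='sigmoid'))

# 2. Loss and 3. Optimizer
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])

model.fit(x, y, epochs=10, batch_size=32)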
Artificial Neurons
Feedforward Recurrent
Artificial Neurons
$o = f(x) = f(\mathbf{w}^T \mathbf{i} + b)$

[Diagram: inputs i1, i2, i3 multiplied by weights w1, w2, w3 and summed with bias b to give pre-activation x; activation f yields output o]
Artificial Neurons
$o = \sigma(x) = \sigma(\mathbf{w}^T \mathbf{i} + b)$

[Diagram: the same neuron with a sigmoid activation: inputs i1, i2, i3, weights w1, w2, w3, bias b, pre-activation x, output o]
Logistic Regression
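A single sigmoid neuron is exactly logistic regression. A minimal NumPy sketch, with made-up input and weight values:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

i = np.array([0.5, -1.0, 2.0])   # inputs i1, i2, i3 (illustrative)
w = np.array([0.1, 0.4, -0.3])   # weights w1, w2, w3 (illustrative)
b = 0.2                          # bias

x = w @ i + b                    # pre-activation: w^T i + b
o = sigmoid(x)                   # output squashed to (0, 1)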
Neural Networks
[Diagram: inputs i1, i2 feed an Input layer (x1, x2), a Hidden layer (h1, h2), and an Output layer (y) producing o, shown alongside a Support Vector Machine for comparison]
Neural Networks
[Diagram: a decision tree with internal nodes h1-h7 over inputs x1, x2, redrawn as a network with hidden units h1-h7 and output y]
Sethi, Ishwar Krishnan. "Entropy nets: From decision trees to neural networks." Proceedings of the IEEE 78.10 (1990): 1605-1613.
Neural Networks
[Diagram: the same tree-to-network mapping extended to a deeper tree with internal nodes h1-h15 over inputs x1, x2, giving a network with an extra hidden layer and output y]
Sethi, Ishwar Krishnan. "Entropy nets: From decision trees to neural networks." Proceedings of the IEEE 78.10 (1990): 1605-1613.
Neural Networks
[Diagram: the 2-2-1 network: inputs i1, i2, input units x1, x2, hidden units h1, h2, output y]

$f(x_2) = \sigma(i_2 w_{x_2,i_2} + b_{x_2})$

$f(h_2) = \sigma(w_{h_2,x_1} f(x_1) + w_{h_2,x_2} f(x_2) + b_{h_2})$
$= \sigma(w_{h_2,x_1}\,\sigma(i_1 w_{x_1,i_1} + b_{x_1}) + w_{h_2,x_2}\,\sigma(i_2 w_{x_2,i_2} + b_{x_2}) + b_{h_2})$

$f(y) = \sigma(w_{y,h_1} f(h_1) + w_{y,h_2} f(h_2) + b_y)$
$= \sigma(w_{y,h_1}\,\sigma(w_{h_1,x_1}\,\sigma(i_1 w_{x_1,i_1} + b_{x_1}) + w_{h_1,x_2}\,\sigma(i_2 w_{x_2,i_2} + b_{x_2}) + b_{h_1}) + w_{y,h_2}\,\sigma(w_{h_2,x_1}\,\sigma(i_1 w_{x_1,i_1} + b_{x_1}) + w_{h_2,x_2}\,\sigma(i_2 w_{x_2,i_2} + b_{x_2}) + b_{h_2}) + b_y)$

17 parameters: θ = {w, b}
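Writing that composition out in NumPy makes the nesting concrete. A sketch of the forward pass, with arbitrary placeholder parameters and the layer naming taken from the diagram above:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# arbitrary placeholder parameters theta = {w, b}
w_x = np.array([0.1, -0.2])            # w_{x1,i1}, w_{x2,i2}
b_x = np.array([0.0, 0.1])
W_h = np.array([[0.3, -0.4],           # w_{h1,x1}, w_{h1,x2}
                [0.5,  0.6]])          # w_{h2,x1}, w_{h2,x2}
b_h = np.array([0.0, -0.1])
w_y = np.array([0.7, -0.8])            # w_{y,h1}, w_{y,h2}
b_y = 0.2

i = np.array([1.0, 0.0])               # inputs i1, i2
f_x = sigmoid(w_x * i + b_x)           # input units x1, x2
f_h = sigmoid(W_h @ f_x + b_h)         # hidden units h1, h2
f_y = sigmoid(w_y @ f_h + b_y)         # output y
print(f_y)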
Backpropagation
1. Initialize θ randomly
2. Iterate:
   1. Forward pass: compute the loss
   2. Backward pass: update the parameters
Backpropagation
[Diagram: the 2-2-1 network (i1, i2 → x1, x2 → h1, h2 → y) trained to compute XOR]

i1  i2 | o
 0   0 | 0
 0   1 | 1
 1   0 | 1
 1   1 | 0

$\hat{y} \approx P(o)$

$J(o, \hat{y}) = \frac{1}{2}(o - \hat{y})^2$

$\nabla_\theta J(o, \hat{y}) = \left[\frac{\partial J}{\partial w_{x_1,i_1}}, \frac{\partial J}{\partial b_{x_1}}, \frac{\partial J}{\partial w_{x_2,i_2}}, \frac{\partial J}{\partial b_{x_2}}, \ldots, \frac{\partial J}{\partial w_{y,h_2}}\right]^T$
Backpropagation
[Plot: loss J as a function of a single weight w; the forward pass evaluates J, the backward pass follows the gradient ∂J/∂w downhill]

$w' = w - \alpha \frac{\partial J}{\partial w}$

α: learning rate
Backpropagation
[Diagram: the XOR network, with $\hat{y} \approx o$]

$\frac{\partial J}{\partial w_{y,h_1}} = \frac{\partial J}{\partial \hat{y}} \cdot \frac{\partial \hat{y}}{\partial w_{y,h_1}} = -\sigma(\hat{y})(1 - \sigma(\hat{y}))\, f(h_1)$

…
Backpropagation
[Diagram: the XOR network, with $\hat{y} \approx o$]

$\frac{\partial J}{\partial w_{h_1,x_1}} = \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial h_1} \cdot \frac{\partial h_1}{\partial w_{h_1,x_1}}$

$\frac{\partial J}{\partial w_{h_2,x_2}} = \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial h_2} \cdot \frac{\partial h_2}{\partial w_{h_2,x_2}}$
Backpropagation
[Diagram: the XOR network, with $\hat{y} \approx o$]

$\frac{\partial J}{\partial w_{x_1,i_1}} = \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial h_1} \cdot \frac{\partial h_1}{\partial x_1} \cdot \frac{\partial x_1}{\partial w_{x_1,i_1}} + \frac{\partial J}{\partial y} \cdot \frac{\partial y}{\partial h_2} \cdot \frac{\partial h_2}{\partial x_1} \cdot \frac{\partial x_1}{\partial w_{x_1,i_1}}$
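Putting the pieces together, a NumPy sketch of the whole loop on the XOR table: random initialization, forward pass to get the loss, chain-rule backward pass, gradient update. For brevity it uses weight matrices for a 2-2-1 network rather than the per-edge names on the slides, and the learning rate and step count are arbitrary:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR data
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
o = np.array([[0], [1], [1], [0]], dtype=float)

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros((1, 2))   # input -> hidden
W2, b2 = rng.normal(size=(2, 1)), np.zeros((1, 1))   # hidden -> output
alpha = 1.0

for step in range(5000):
    # forward pass: loss J = 1/2 (o - y_hat)^2
    h = sigmoid(X @ W1 + b1)
    y_hat = sigmoid(h @ W2 + b2)
    # backward pass: chain rule, layer by layer
    d_y = -(o - y_hat) * y_hat * (1 - y_hat)   # dJ/d(pre-activation of y)
    d_h = (d_y @ W2.T) * h * (1 - h)           # dJ/d(pre-activation of h)
    # gradient descent update: w' = w - alpha * dJ/dw
    W2 -= alpha * (h.T @ d_y)
    b2 -= alpha * d_y.sum(axis=0)
    W1 -= alpha * (X.T @ d_h)
    b1 -= alpha * d_h.sum(axis=0)

print(y_hat.round(2))   # should approach [0, 1, 1, 0]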
Optimizers
1. Gradient Descent: $w' = w - \alpha \frac{\partial J}{\partial w}$
2. Stochastic Gradient Descent: approximate $\frac{\partial J}{\partial w}$ on mini-batches
3. Momentum: $v = \gamma v + \alpha \frac{\partial J}{\partial w}$, $w' = w - v$
4. Adagrad/Adadelta: per-parameter decaying learning rates
5. RMSprop: normalize by a running average of gradient magnitudes
6. Adam: RMSprop + momentum
Image courtesy Chris Olah, 2014
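The update rules above differ by only a line or two. A NumPy sketch contrasting plain (stochastic) gradient descent with the momentum update; the α and γ values and the example gradient are arbitrary:

import numpy as np

def sgd_step(w, grad, alpha=0.01):
    # gradient descent: step against the gradient
    return w - alpha * grad

def momentum_step(w, v, grad, alpha=0.01, gamma=0.9):
    # momentum: accumulate a velocity v, then step along it
    v = gamma * v + alpha * grad
    return w - v, v

w, v = np.zeros(3), np.zeros(3)
grad = np.array([0.5, -1.0, 0.2])   # stand-in for dJ/dw from one batch
w = sgd_step(w, grad)
w, v = momentum_step(w, v, grad)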
Convolutional Neural Networks
CNN/convnet neurons:
1. Have receptive field
2. Share weights
3. Max pooling
Images courtesy Vincent Dumoulin, 2016
Convolutional Neural Networks
CNN/convnet neurons:
1. Have receptive field
2. Share weights
3. Max pooling
[Animation: a kernel sliding across the Input feature map to produce the Output feature map]
Images courtesy Vincent Dumoulin, 2016
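All three properties show up directly in a Keras model. A minimal sketch (input size and filter counts are placeholders): Conv2D units see only a local receptive field and share their kernel weights across positions, and MaxPooling2D downsamples:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
# 3x3 kernels: local receptive fields with weights shared across the image
model.add(Conv2D(16, (3, 3), activation='relu', input_shape=(64, 64, 1)))
# 2x2 max pooling: keep the strongest response in each neighbourhood
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(2, activation='softmax'))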
Convolutional Neural Networks
Roughly 90% of AlexNet's parameters sit in its fully-connected layers. AlexNet was trained using:
1. Dropout
2. Local response normalization
Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "ImageNet
classification with deep convolutional neural networks." Advances in
neural information processing systems. 2012.
Convolutional Neural Networks
Simonyan, Karen, and Andrew Zisserman. "Very deep convolutional
networks for large-scale image recognition." arXiv preprint
arXiv:1409.1556 (2014).
VGG16
Convolutional Neural Networks
ResNet
He, Kaiming, et al. "Deep residual learning for image
recognition." Proceedings of the IEEE conference on computer vision
and pattern recognition. 2016.
152 convolutional layers
Skip (residual) connections
GoogLeNet
Szegedy, Christian, et al. "Going deeper with
convolutions." Proceedings of the IEEE conference on computer vision
and pattern recognition. 2015.
1. Deep Supervision helps training
2. 1x1 convolutions can replace fully-connected layers
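Point 2 is easy to see in code. A hedged Keras sketch (layer sizes illustrative) where a 1x1 convolution mixes channels at every spatial position, doing the job of a fully-connected layer while keeping the network fully convolutional:

from keras.models import Sequential
from keras.layers import Conv2D, GlobalAveragePooling2D, Activation

model = Sequential()
model.add(Conv2D(64, (3, 3), activation='relu', input_shape=(32, 32, 3)))
# 1x1 convolution: a per-position fully-connected layer across channels
model.add(Conv2D(10, (1, 1)))
# average each of the 10 class maps down to a single score
model.add(GlobalAveragePooling2D())
model.add(Activation('softmax'))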
NeuroImaging Applications
1. Alzheimer’s Prediction
2. T1w MRI Quality Control
3. MRI Tissue Segmentation
4. PET Brain Extraction
Longitudinal patient data with missing measurements (blank cells = not acquired):

id | t | x1    | x2   | x3    | x4    | x5   | x6   | DX
1  | 0 | 0.10  | 0.25 | -0.20 | 0.01  |      |      | Healthy
1  | 1 |       |      | -0.20 | 0.01  |      |      | Healthy
1  | 2 | 0.21  | 0.14 | -0.31 | 0.01  |      |      | MCI
1  | 3 | 0.12  | 0.32 | -0.28 | 0.11  |      |      | MCI
2  | 0 | -0.01 | 0.35 |       | -0.42 | 0.29 | 0.20 | MCI
2  | 1 | 0.03  | 0.40 |       | -0.82 |      |      | MCI
2  | 2 | 0.10  | 0.89 |       | -0.21 |      |      | Alzheimer's
…
(id 1 = Patient 1, id 2 = Patient 2, …)
The same table with the missing values filled in:

id | t | x1    | x2   | x3    | x4    | x5    | x6   | DX
1  | 0 | 0.10  | 0.25 | -0.20 | 0.01  | -0.20 | 0.01 | Healthy
1  | 1 | 0.10  | 0.25 | -0.20 | 0.01  | -0.20 | 0.01 | Healthy
1  | 2 | 0.21  | 0.14 | -0.31 | 0.01  | -0.20 | 0.01 | MCI
1  | 3 | 0.12  | 0.32 | -0.28 | 0.11  | -0.20 | 0.01 | MCI
2  | 0 | -0.01 | 0.35 | -0.20 | -0.42 | 0.29  | 0.20 | MCI
2  | 1 | 0.03  | 0.40 | -0.20 | -0.82 | 0.29  | 0.20 | MCI
2  | 2 | 0.10  | 0.89 | -0.20 | -0.21 | 0.29  | 0.20 | Alzheimer's
…
$P(DX_{t+\Delta t} \mid DX_t, X_t)$

[Diagram: fully-connected classifier; the input layer takes X (374 features) plus Δt, hidden fully-connected layers of width 512, 1024, 1024, 512, 512 receive Δt alongside each layer, and a 3-class softmax layer gives the output]

93% Accuracy
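A hedged Keras sketch of a model along the lines of the diagram: fully-connected layers over the tabular features X with the horizon Δt as an extra input, ending in a 3-class softmax over Healthy/MCI/Alzheimer's. The widths are read off the slide; how Δt is actually injected in the original model is an assumption here:

from keras.models import Model
from keras.layers import Input, Dense, concatenate

x_in = Input(shape=(374,), name='X')        # tabular features at time t
dt_in = Input(shape=(1,), name='delta_t')   # prediction horizon
h = concatenate([x_in, dt_in])
for width in (512, 1024, 1024, 512, 512):
    h = Dense(width, activation='relu')(h)
    h = concatenate([h, dt_in])             # re-inject delta_t (assumption)
out = Dense(3, activation='softmax')(h)     # P(DX_{t+dt} | DX_t, X_t)

model = Model(inputs=[x_in, dt_in], outputs=out)
model.compile(optimizer='adam', loss='categorical_crossentropy')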
Automatic QC of T1w Brain MRI
P(QC | MRI)

[Diagram: convolutional classifier with layer widths 16, 32, 32, 64, 64, 256, 256, 256, 128, 2; legend: 3x3 convolutional layer, 2x2 max pooling layer, fully-connected layer, 2-class softmax layer]
Automatic QC of T1w Brain MRI
+/- 10 voxels
Dataset | Sensitivity | Specificity
IBIS    | 97%         | 96%
Segmentation
Kamnitsas, Konstantinos, et al. "Efficient multi-scale 3D CNN with fully
connected CRF for accurate brain lesion segmentation." Medical image
analysis 36 (2017): 61-78.
DeepMedic
Segmentation
Çiçek, Özgün, et al. "3D U-Net: learning dense volumetric
segmentation from sparse annotation." International Conference on
Medical Image Computing and Computer-Assisted Intervention.
Springer International Publishing, 2016.
Dilated Convolutions
Yu, Fisher, and Vladlen Koltun. "Multi-scale context aggregation by dilated convolutions." arXiv preprint arXiv:1511.07122 (2015).
Efficient Multi-scale
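In Keras, dilation is a single argument to the convolution layer. A minimal sketch (shapes illustrative): with dilation_rate=2, the taps of a 3x3 kernel are spread apart so it covers a 5x5 neighbourhood while still holding only nine weights per channel:

from keras.models import Sequential
from keras.layers import Conv2D

model = Sequential()
model.add(Conv2D(32, (3, 3), dilation_rate=2, activation='relu',
                 input_shape=(128, 128, 1)))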
PET Brain Extraction
Funck, T. et al. Brain tissue segmentation from multiple PET radiotracers.
Poster at Montreal Artificial Intelligence in Neuroscience conference, 2017
[Figure: predicted vs. ground-truth brain extractions for four PET radiotracers: FMZ, RCL, FDOPA, FDG]
Motion Estimation
Iglesias, Juan Eugenio, et al. "Retrospective head motion estimation in
structural brain MRI with 3D CNNs." International Conference on
Medical Image Computing and Computer-Assisted Intervention.
Springer, Cham, 2017.
Motion Estimation
[Figure: Grad-CAM visualizations for PASS and FAIL scans]
Selvaraju, Ramprasaath R., et al. "Grad-CAM: Why did you say that? Visual explanations from deep networks via gradient-based localization." arXiv preprint arXiv:1610.02391 (2016).
Challenges
1. Data quantity
2. Data size
3. Data quality
4. Data expense
5. Data variability
6. Unexpected pathology
Start here
http://keras.io
http://www.deeplearningbook.org/
Autism Prediction
Heinsfeld, Anibal Sólon, et al. "Identification of autism spectrum disorder using deep learning and the ABIDE dataset." NeuroImage: Clinical 17 (2018): 16-23.
Denoising autoencoders