![Page 1: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/1.jpg)
“Demystifying” Deep Learning
Valentin Leveau and Titouan Lorieul
Cocomeet 15/12/17
![Page 2: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/2.jpg)
Summary
1. Introduction
2. From shallow neural networks to (very) deep ones
3. Why did deep CNNs become popular so late?
4. Theoretical focus: learning as an optimization problem
5. Practical limitations of neural networks
![Page 3: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/3.jpg)
Introduction: DL vs ML
We are now good at mimicking one part of intelligence: learning.
(Machine) Learning = learning from examples to do a given task and generalize to new examples.
• Predict some variables given others.
• Capture statistical relationships / structure between observed variables.
[Figure: AI ⊃ Machine Learning ⊃ Deep Learning]
![Page 4: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/4.jpg)
Machine Learning: Supervised Learning
It is all about learning a prediction function $f(x; \theta)$ parametrized by some parameters $\theta$.
A simple function could be a linear model: $f(x; w, b) = w^T x + b$
The goal is to learn the parameters so that we predict good values of $y_i$ given $x_i$.
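As a minimal sketch of the linear prediction function, the snippet below evaluates $w^T x + b$ for one input. The parameter values and the input are hypothetical, chosen only so the arithmetic is easy to follow, and plain Python is used to keep the example self-contained.

```python
def predict(x, w, b):
    """Linear model f(x; w, b) = w^T x + b for a single feature vector x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# Hypothetical parameters and input, for illustration only.
w = [2.0, -1.0]
b = 0.5
x = [3.0, 4.0]

y_hat = predict(x, w, b)  # 2*3 - 1*4 + 0.5
print(y_hat)
```

Learning then amounts to choosing `w` and `b` so that `predict` returns values close to the observed $y_i$.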
![Page 5: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/5.jpg)
Classification in high dimensional spaces
An image can be modeled by its pixel representation:
• Very high dimensional feature space
• Features that are not relevant
See what a machine sees!
![Page 6: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/6.jpg)
Find nonlinear invariants in the data (S. Mallat)
![Page 7: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/7.jpg)
Find nonlinear invariants in the data (S. Mallat)
Find a mapping such that the problem becomes linearly separable.
We want to produce an abstraction of the image containing highly discriminative features
→ Representation Learning
![Page 8: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/8.jpg)
Several strategies to go nonlinear
Produce handcrafted intermediate representations of images: image abstractions that contain the relevant information.
Intermediate representation pipeline: Query image → Low-level representation (SIFT, GIST, ...) → Mid-level representation (BoW, Fisher Vectors, ...) → Classification → Label
Problem: we have to decide manually which kinds of features are good for the different tasks.
Solution: let the system learn the change of variables.
→ Toward Deep Representation Learning
![Page 9: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/9.jpg)
Deep (Convolutional) Neural Network
• Learning several levels of abstraction of the input signal (compositionality).
• Progressive embedding from input pixels to the label space.
• Jointly learning all the trainable modules.
Intermediate representation pipeline: Image → Low-level feature extraction → Local pooling (sum, max, avg, etc.) → High-level feature extraction → Local pooling (sum, max, avg, etc.) → Classification
![Page 10: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/10.jpg)
From Shallow Neural Networks to (very) Deep Ones
![Page 11: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/11.jpg)
Perceptron : Simple Elementary Neural Unit
![Page 12: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/12.jpg)
Multi Layer Perceptron
2-layer architecture:
• Layer 1: several units in parallel + nonlinearities
• Layer 2: final linear classifier unit
[Figure: input data → deep nonlinear embedding (weights W1 ... W5) → classification layer]
![Page 13: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/13.jpg)
Multi Layer Perceptron
• The nonlinearities are crucial!
• A linear combination of linear combinations is still a linear combination (without nonlinearities, depth is useless).
• Nonlinearities bend the space so that the samples become linearly separable.
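The collapse of stacked linear layers can be checked numerically. The sketch below uses small hypothetical weight matrices `W1` and `W2` (not taken from the slides) and plain Python lists: two linear layers give the same output as the single matrix `W2 W1`, while inserting a ReLU between them changes the result.

```python
def matvec(W, x):
    """Matrix-vector product, W given as a list of rows."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in W]

def matmul(A, B):
    """Matrix-matrix product of two lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def relu(v):
    return [max(0.0, vi) for vi in v]

# Hypothetical weights and input, for illustration only.
W1 = [[1.0, -1.0],
      [0.0, 2.0]]
W2 = [[1.0, 1.0]]
x = [1.0, 3.0]

two_linear_layers = matvec(W2, matvec(W1, x))   # W2 (W1 x)
one_linear_layer = matvec(matmul(W2, W1), x)    # (W2 W1) x, a single layer
with_relu = matvec(W2, relu(matvec(W1, x)))     # W2 relu(W1 x)

print(two_linear_layers, one_linear_layer, with_relu)
```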
![Page 14: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/14.jpg)
Optimization strategy: Supervised Learning
Define an objective function $J(\theta)$ to minimize with respect to the parameters $\theta$.
Example: least squares minimization over the training samples, $J(\theta) = \frac{1}{n} \sum_i (y_i - f(x_i; \theta))^2$
![Page 15: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/15.jpg)
Optimization strategy: Gradient Descent (GD)
Update the parameters of an objective function in the direction opposite to the gradient: $\theta \leftarrow \theta - \eta \nabla_\theta J(\theta)$, where $\eta$ is the learning rate.
Several GD strategies:
a) All the training samples: Batch GD (exact gradient of J)
b) Only one training sample: Stochastic GD
c) A subsample of the training set: Mini-batch GD (reasonable approximation of a)
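The three strategies differ only in how many samples feed each gradient estimate. The sketch below is one possible illustration, assuming a 1-D linear model, a least squares objective, and a toy noiseless dataset ($y = 2x + 1$); the learning rate, epoch count and seed are arbitrary choices, not values from the talk.

```python
import random

def grad_least_squares(w, b, batch):
    """Gradient of (1/m) * sum (w*x + b - y)^2 over the samples in `batch`."""
    m = len(batch)
    gw = sum(2.0 * (w * x + b - y) * x for x, y in batch) / m
    gb = sum(2.0 * (w * x + b - y) for x, y in batch) / m
    return gw, gb

def gradient_descent(data, batch_size, lr=0.02, epochs=2000, seed=0):
    """batch_size = len(data): batch GD; 1: stochastic GD; else mini-batch GD."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        shuffled = data[:]
        rng.shuffle(shuffled)
        for start in range(0, len(shuffled), batch_size):
            batch = shuffled[start:start + batch_size]
            gw, gb = grad_least_squares(w, b, batch)
            # Step against the gradient to decrease the objective.
            w, b = w - lr * gw, b - lr * gb
    return w, b

# Toy noiseless data generated from y = 2x + 1 (an assumption for the demo).
data = [(float(x), 2.0 * x + 1.0) for x in range(5)]

w_batch, b_batch = gradient_descent(data, batch_size=len(data))
w_sgd, b_sgd = gradient_descent(data, batch_size=1)
w_mini, b_mini = gradient_descent(data, batch_size=2)
print(w_batch, b_batch, w_sgd, b_sgd, w_mini, b_mini)
```

On this easy problem all three variants recover roughly the same parameters; on real data the stochastic variants trade gradient accuracy for much cheaper updates.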
![Page 16: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/16.jpg)
![Page 17: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/17.jpg)
Gradient descent through deep feedforward models
• The gradient can be hard to compute
• A lot of parameters, with interdependencies from layer to layer
Solution: a deep model is a composition of functions, so its gradient follows from the chain rule
![Page 18: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/18.jpg)
Backpropagation of the gradient
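For a linear unit, the gradient with respect to the weights is collinear with the input data, so each update moves the weights along the direction of the input.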
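To make the chain rule concrete, here is a minimal sketch of backpropagation through a tiny network with one tanh hidden unit and a squared loss. The architecture, the values of `x`, `y`, `w1`, `w2` and the finite-difference check are all assumptions made for this illustration; they are not taken from the slides.

```python
import math

def forward(x, w1, w2):
    h = math.tanh(w1 * x)   # hidden activation
    y_hat = w2 * h          # linear output unit
    return h, y_hat

def loss(x, y, w1, w2):
    return (forward(x, w1, w2)[1] - y) ** 2

def backward(x, y, w1, w2):
    """Gradients of the squared loss w.r.t. w1 and w2 via the chain rule."""
    h, y_hat = forward(x, w1, w2)
    dL_dy = 2.0 * (y_hat - y)            # loss -> output
    dL_dw2 = dL_dy * h                   # output -> w2
    dL_dh = dL_dy * w2                   # output -> hidden activation
    dL_dw1 = dL_dh * (1.0 - h * h) * x   # tanh'(z) = 1 - tanh(z)^2, then z = w1 * x
    return dL_dw1, dL_dw2

# Hypothetical sample and weights.
x, y, w1, w2 = 0.5, 1.0, 0.3, -0.8
g1, g2 = backward(x, y, w1, w2)

# Central finite differences as an independent check of the analytic gradients.
eps = 1e-6
num_g1 = (loss(x, y, w1 + eps, w2) - loss(x, y, w1 - eps, w2)) / (2 * eps)
num_g2 = (loss(x, y, w1, w2 + eps) - loss(x, y, w1, w2 - eps)) / (2 * eps)
print(g1, num_g1, g2, num_g2)
```

Note that `dL_dw1` is the upstream gradient times the input `x`, which is the scalar version of the collinearity remark above.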
![Page 19: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/19.jpg)
Convolutional Neural Network
[Figure: deep network with weights W1 ... W5]
• Such deep nonlinear mappings are still inefficient at learning invariances.
• We are still trying to map a pixel-wise vector representation to the target space.
• We need stronger assumptions about the model to learn useful invariants for vision.
Solution: replace the linear maps by convolution kernels with learnable parameters!
![Page 20: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/20.jpg)
A convolution
Replace the linear maps by a convolution
![Page 21: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/21.jpg)
Translation Invariance with Conv Layers
Applying a convolution is equivalent to sharing the parameters of local receptive fields!
→ Huge number of matrix multiplications
→ But the number of free parameters is low!
Weight sharing: the same filter weights are applied to the image patch at every position (x, y) to produce the activation at position (x, y) for the k-th filter.
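Weight sharing can be sketched with a naive "valid" 2D convolution (implemented as a cross-correlation, as most deep learning libraries do). The image, the kernel and the dense-layer comparison are hypothetical. Strictly speaking, the convolution output shifts along with the input (equivariance); invariance is typically obtained once pooling is added on top.

```python
def conv2d_valid(image, kernel):
    """Slide one kernel over the image; no padding, stride 1."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = [[0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            # The same kernel weights are reused at every position (x, y).
            out[y][x] = sum(kernel[i][j] * image[y + i][x + j]
                            for i in range(kh) for j in range(kw))
    return out

# Hypothetical 4x4 image with a single bright pixel, and a 2x2 kernel.
image = [[0, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 0, 0],
         [0, 0, 0, 0]]
shifted = [[0] + row[:-1] for row in image]  # same image moved one pixel right
kernel = [[1, 2],
          [3, 4]]

out = conv2d_valid(image, kernel)
out_shifted = conv2d_valid(shifted, kernel)

n_conv_params = len(kernel) * len(kernel[0])  # 4 shared weights
n_dense_params = (4 * 4) * (3 * 3)            # a dense 16 -> 9 map needs 144
print(out, out_shifted, n_conv_params, n_dense_params)
```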
![Page 22: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/22.jpg)
Pooling layers
Progressively reduce the spatial resolution (e.g. max pooling).
The gradient is backpropagated only to the spatial locations that were selected as the max.
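A minimal 1-D sketch of max pooling and its backward pass, assuming non-overlapping windows; the function names and the numbers are made up for the illustration.

```python
def max_pool_1d(values, size):
    """Non-overlapping max pooling; also returns the index of each maximum."""
    outputs, argmaxes = [], []
    for start in range(0, len(values) - size + 1, size):
        window = values[start:start + size]
        best = max(range(size), key=lambda i: window[i])
        outputs.append(window[best])
        argmaxes.append(start + best)
    return outputs, argmaxes

def max_pool_1d_backward(grad_out, argmaxes, input_len):
    """Route each output gradient to the input position that was the max."""
    grad_in = [0.0] * input_len
    for g, idx in zip(grad_out, argmaxes):
        grad_in[idx] += g
    return grad_in

values = [1.0, 3.0, 2.0, 0.0, 5.0, 4.0]
pooled, argmaxes = max_pool_1d(values, size=2)
grad_in = max_pool_1d_backward([10.0, 20.0, 30.0], argmaxes, len(values))
print(pooled, argmaxes, grad_in)
```

Every non-maximal location receives a zero gradient, which is why only the winning activations are updated.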
![Page 23: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/23.jpg)
Why did ConvNets become famous so late?
• Few theoretical justifications (compared to CV + Kernel Methods)
• Too many parameters for small datasets; algorithms and hardware too slow
• Severe convergence problems (vanishing gradients, bad normalization, …)
• No good ways to initialize deep models
→ Motivations in Unsupervised Learning (2000-2011)
![Page 24: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/24.jpg)
Unsupervised Learning example: Autoencoder
Learn a model to:
• Project data into a nonlinear intermediate embedding (coding)
• Reconstruct its own input from the codes (decoding)
Coding function: maps the input X to the code h
Decoding function: maps the code h to the reconstruction X_hat
Reconstruction objective: $\min \|X - \hat{X}\|^2$
A linear autoencoder spans the same subspace as the principal directions of PCA.
[Figure: X → h → X_hat]
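The link between a linear autoencoder and PCA can be illustrated with a deliberately simplified model: no bias, a 1-D code, and the same weight vector `w` for coding and decoding (tied weights). These simplifications, the toy data and the two candidate directions are assumptions of this sketch, not the formulation used on the slide.

```python
import math

def encode(x, w):
    """Coding: h = w^T x (1-D linear code)."""
    return sum(wi * xi for wi, xi in zip(w, x))

def decode(h, w):
    """Decoding: x_hat = w * h (tied weights)."""
    return [wi * h for wi in w]

def reconstruction_error(data, w):
    """Mean squared reconstruction error ||x - x_hat||^2 over the data."""
    total = 0.0
    for x in data:
        x_hat = decode(encode(x, w), w)
        total += sum((xi - xh) ** 2 for xi, xh in zip(x, x_hat))
    return total / len(data)

# Toy data lying on the line spanned by (1, 1).
data = [[1.0, 1.0], [2.0, 2.0], [-1.0, -1.0]]
principal = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # principal direction of the data
other = [1.0, 0.0]                                # some other unit direction

err_principal = reconstruction_error(data, principal)
err_other = reconstruction_error(data, other)
print(err_principal, err_other)
```

Coding along the principal direction reconstructs the data almost exactly, while the other direction loses information, which is the behaviour PCA would predict.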
![Page 25: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/25.jpg)
Unsupervised Learning example: Autoencoder
It is more efficient in practice to learn the layers separately.
→ Actually, with ReLU, training all the layers jointly works fine.
Stacked autoencoders:
1. Train the first autoencoder layer to reconstruct the input
2. Use the intermediate representation as input to the next layer and re-apply the process
3. Fine-tune the model to jointly learn the layers
![Page 26: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/26.jpg)
Why did ConvNets become famous so late?
• Large-scale labelled dataset: ImageNet (15 M images)
• Hardware acceleration: GPU
![Page 27: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/27.jpg)
What Changed?
• The ReLU nonlinearity: $\mathrm{ReLU}(x) = \max(0, x)$
• The killer detail: as good as (if not better than) unsupervised pretraining!
![Page 28: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/28.jpg)
What Changed?
• Batch Normalization (×14 faster training!)
• Dropout
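Both techniques fit in a few lines for a single feature. The sketch below assumes training-time batch normalization over one scalar feature and the "inverted" dropout variant (survivors rescaled by the keep probability). `gamma`, `beta`, `eps` and `p_drop` are the usual names but are assumptions here, since the slide gives no formulas.

```python
import random

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize a batch of scalars to zero mean / unit variance, then rescale."""
    mean = sum(batch) / len(batch)
    var = sum((v - mean) ** 2 for v in batch) / len(batch)
    return [gamma * (v - mean) / (var + eps) ** 0.5 + beta for v in batch]

def dropout(activations, p_drop, rng):
    """Inverted dropout: zero each unit with probability p_drop, rescale the rest."""
    keep = 1.0 - p_drop
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

normalized = batch_norm([1.0, 2.0, 3.0, 4.0])
norm_mean = sum(normalized) / len(normalized)
norm_var = sum((v - norm_mean) ** 2 for v in normalized) / len(normalized)

dropped = dropout([1.0] * 1000, p_drop=0.5, rng=random.Random(0))
dropped_mean = sum(dropped) / len(dropped)
print(norm_mean, norm_var, dropped_mean)
```

The rescaling in `dropout` keeps the expected activation unchanged, so nothing needs to be adjusted at test time when dropout is switched off.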
![Page 29: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/29.jpg)
Evolution of the architectures
• LeNet 1-5 [Y. LeCun et al. 1989]
• AlexNet [A. Krizhevsky et al. 2012]
![Page 30: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/30.jpg)
Evolution of the architectures
• VGG Net [K. Simonyan et al. 2014]
![Page 31: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/31.jpg)
Evolution of the architectures
• GoogLeNet [C. Szegedy et al. 2015]
![Page 32: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/32.jpg)
Evolution of the architectures
• Inception modules
![Page 33: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/33.jpg)
Evolution of the architectures
• ResNet (152 layers, even 1,000 ...) [K. He et al. 2015]
![Page 34: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/34.jpg)
Evolution of the architectures
• DenseNet [G. Huang et al. 2016]
![Page 35: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/35.jpg)
Setting hyper-parameters of DL is a nightmare!
• Many more good practices are needed to make it work:
• Data augmentation (random flip, crop, rotation, etc.)
• Momentum and adaptive learning rate methods
• Learning rate tuning and policy
• Learning an ensemble of models (and aggregating their predictions in some way)
![Page 36: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/36.jpg)
Setting hyper-parameters of DL is a Nightmare !
• Learning rate decay
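Two common decay policies, sketched under assumed hyper-parameters (`drop`, `epochs_per_drop` and `rate` are illustrative defaults, not recommendations from the talk): step decay divides the learning rate at fixed intervals, exponential decay shrinks it smoothly.

```python
import math

def step_decay(initial_lr, epoch, drop=0.1, epochs_per_drop=30):
    """Multiply the learning rate by `drop` every `epochs_per_drop` epochs."""
    return initial_lr * drop ** (epoch // epochs_per_drop)

def exponential_decay(initial_lr, epoch, rate=0.05):
    """Smooth decay: lr = initial_lr * exp(-rate * epoch)."""
    return initial_lr * math.exp(-rate * epoch)

step_schedule = [step_decay(0.1, e) for e in (0, 29, 30, 65)]
exp_schedule = [exponential_decay(0.1, e) for e in (0, 10, 20)]
print(step_schedule, exp_schedule)
```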
![Page 37: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/37.jpg)
![Page 38: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/38.jpg)
![Page 39: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/39.jpg)
Transfer learning (fine-tuning)
Problem: CNNs require huge amounts of training data to learn their millions of parameters.
Solution: learn domain-specific features by transfer learning.
1. Train a CNN on a generalist image dataset with millions of images.
2. Keep the weights of the lowest layers but remove/reset the top layers.
3. Feed forward and back-propagate new domain-specific images (usually with a different number of classes C).
[Figure: pretrained network up to Layer 7, followed by the new output layer]
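Step 2 of the recipe can be sketched without any framework by treating a network as a list of weight matrices. Everything here is hypothetical: the toy layer sizes, the helper names and the small Gaussian initialization stand in for whatever a real library would provide, and step 3 (actual fine-tuning by backpropagation) is not shown.

```python
import random

def new_output_layer(n_features, n_classes, rng):
    """Fresh, randomly initialized C-class output layer (rows = classes)."""
    return [[rng.gauss(0.0, 0.01) for _ in range(n_features)]
            for _ in range(n_classes)]

def prepare_for_fine_tuning(pretrained_layers, n_classes, rng):
    """Keep every layer but the last; replace the last by a new output layer."""
    kept = [[row[:] for row in layer] for layer in pretrained_layers[:-1]]
    n_features = len(pretrained_layers[-1][0])  # input size of the old top layer
    return kept + [new_output_layer(n_features, n_classes, rng)]

# Toy "pretrained" network: 3 inputs -> 2 units -> 2 units -> 5 source classes.
pretrained = [
    [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
    [[1.0, -1.0], [0.5, 0.5]],
    [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6], [0.7, 0.8], [0.9, 1.0]],
]

# The new domain has C = 3 classes.
fine_tuned = prepare_for_fine_tuning(pretrained, n_classes=3, rng=random.Random(0))
print([(len(layer), len(layer[0])) for layer in fine_tuned])
```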
![Page 40: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/40.jpg)
The power of transfer learning
Transfer learning usually works for any domain.

Table 1 - accuracy measured on several fine-grained image classification datasets

| | Trademark Logos | Car models | Paris Buildings | Aircraft models | Bird species | Flower species |
|---|---|---|---|---|---|---|
| GoogLeNet trained from scratch | 67.7% | 59.3% | 55.3% | 72.7% | 24.4% | 59.5% |
| GoogLeNet pre-trained on ImageNet | 87.5% | 79.9% | 71.3% | 88.1% | 72.4% | 89.5% |

Even very specific ones:

| | Rice seeds varieties recognition (100 classes, 1 500 texture images) | Herbaria species recognition (255 classes, 11K herbaria sheets) |
|---|---|---|
| GoogLeNet trained from scratch | 58.1% | 8.8% |
| GoogLeNet pre-trained on ImageNet | 70.4% | 52.4% |
![Page 41: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/41.jpg)
Toolkit
Frameworks
• From academia…
• ...to industry
• etc.
Hardware
• GPUs…
• Mobile…
• FPGA… Project Brainwave (Microsoft)
• And others… TPUs (Google), etc.
![Page 42: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/42.jpg)
Applications
![Page 43: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/43.jpg)
Plant species recognition: Pl@ntNet
![Page 44: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/44.jpg)
Localization / Segmentation (Deep Mask)
![Page 45: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/45.jpg)
Style Transfer to real and sketch images
![Page 46: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/46.jpg)
Image Generation with GAN [S. Chopra et al. 2015]
![Page 47: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/47.jpg)
Generative Adversarial Network (GAN) [I. Goodfellow et al. 2014]
![Page 48: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/48.jpg)
Logic with Deep Learning
![Page 49: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/49.jpg)
NLP with Memory Networks
![Page 50: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/50.jpg)
NLP with Memory Networks
![Page 51: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/51.jpg)
Image Captioning via attention based models
![Page 52: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/52.jpg)
“Reverse Image Captioning” :
![Page 53: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/53.jpg)
Resources
• Yann LeCun's course at the Collège de France
• Stéphane Mallat's talk at the Collège de France
• The Deep Learning Book
• Pattern Recognition and Machine Learning
![Page 54: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/54.jpg)
Questions
![Page 55: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/55.jpg)
Theoretical Focus: Learning as an Optimization Problem
![Page 56: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/56.jpg)
Statistical Learning Theory
Data generating process: supervised learning
• $\mathcal{X} \times \mathcal{Y}$ probability space
• Input space $\mathcal{X}$, e.g. $\mathbb{R}^d$, images, ...
• Output space $\mathcal{Y}$, e.g. $\mathbb{R}$, $\{0, 1\}$, $\{0, 1, \dots, K-1\}$, ...
• Data drawn from $p(x, y)$, which is fixed but unknown; the probability models the uncertainty, $p(x, y) = p(y|x)\,p(x)$
• We have a sample set $T_n = \{(x_i, y_i),\ i = 1, 2, \dots, n\}$
Objective
Find the function $f$ that best predicts $y$ given $x$.
$\hat{y} = f(x)$
![Page 57: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/57.jpg)
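Formalism
• Reduce the search to a parametrized function class $f(\cdot; \theta)$, e.g. linear regression $f(x; w, b) = w^T x + b$, $\Theta = \mathbb{R}^d \times \mathbb{R}$
• Loss function: $l(y, \hat{y}) = l(y, f(x; \theta))$, e.g. squared loss $l(y, \hat{y}) = (y - \hat{y})^2$
• Risk function: $J^*(\theta) = \mathbb{E}_{x,y}[l(y, f(x; \theta))]$
Optimization problem 1
$\min_{\theta \in \Theta} J^*(\theta)$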
![Page 58: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/58.jpg)
Statistical Learning Theory
Optimization problem 1
$\min_{\theta \in \Theta} \mathbb{E}_{x,y}[l(y, f(x; \theta))]$
$p(x, y)$ is unknown, so we cannot compute the expectation.
Use the empirical risk instead: $J(\theta) = \frac{1}{n} \sum_i l(y_i, f(x_i; \theta))$
Optimization problem 2
$\min_{\theta \in \Theta} J(\theta)$
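The empirical risk is just the average loss over the sample set. A minimal sketch, assuming a 1-D linear model and the squared loss; the three samples and the two candidate parameter vectors are made up for the illustration.

```python
def squared_loss(y, y_hat):
    return (y - y_hat) ** 2

def linear_model(x, theta):
    """f(x; theta) with theta = (w, b) and scalar x."""
    w, b = theta
    return w * x + b

def empirical_risk(f, theta, samples, loss):
    """J(theta) = (1/n) * sum_i loss(y_i, f(x_i; theta))."""
    return sum(loss(y, f(x, theta)) for x, y in samples) / len(samples)

# Hypothetical sample set T_n, generated by y = 2x + 1.
samples = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]

risk_good = empirical_risk(linear_model, (2.0, 1.0), samples, squared_loss)
risk_bad = empirical_risk(linear_model, (1.0, 1.0), samples, squared_loss)
print(risk_good, risk_bad)
```

Minimizing `empirical_risk` over `theta` is optimization problem 2; it only approximates problem 1 because it sees `samples`, not $p(x, y)$.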
![Page 59: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/59.jpg)
![Page 60: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/60.jpg)
Optimization algorithms
Usually, we have convex objective functions and $\Theta \subset \mathbb{R}^p$. We know how to deal with these optimization problems:
• if differentiable, first-order gradient approaches: Gradient Descent, Stochastic Gradient Descent, ...
• if twice differentiable, second-order approaches: Newton-Raphson, Quasi-Newton, ...
• ...
However, when measured on a test set, the error is high. The fitted model does not generalize to new inputs...
Why?
![Page 61: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/61.jpg)
Underfitting VS Overfitting
![Page 62: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/62.jpg)
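Regularization term
Optimization problem 3
$\min_{\theta \in \Theta} J(\theta) + \lambda \Omega(\theta)$
where $\lambda$ is the regularization parameter (hyperparameter)
Examples:
• L1 regularization: $\Omega(\theta) = \|\theta\|_1 = \sum_k |\theta_k|$
• L2 regularization: $\Omega(\theta) = \frac{1}{2} \|\theta\|_2^2 = \frac{1}{2} \sum_k \theta_k^2$
Backed by theory: VC theory, PAC learning
$\epsilon^2 \le \frac{\log |H| + \log \frac{1}{\delta}}{2n}$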
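The two penalties and the regularized objective translate directly into code. In this sketch the parameter vector, the data-fit value `j_theta` and the choice `lam = 0.1` are hypothetical numbers used only to make the arithmetic visible.

```python
def l1_penalty(theta):
    """Omega(theta) = ||theta||_1 = sum_k |theta_k|."""
    return sum(abs(t) for t in theta)

def l2_penalty(theta):
    """Omega(theta) = 0.5 * ||theta||_2^2 = 0.5 * sum_k theta_k^2."""
    return 0.5 * sum(t * t for t in theta)

def regularized_objective(j_theta, theta, lam, penalty):
    """J(theta) + lambda * Omega(theta), given the data-fit term J(theta)."""
    return j_theta + lam * penalty(theta)

theta = [3.0, -4.0]
j_theta = 1.0   # hypothetical value of the empirical risk at theta
lam = 0.1       # regularization parameter (hyperparameter)

obj_l1 = regularized_objective(j_theta, theta, lam, l1_penalty)
obj_l2 = regularized_objective(j_theta, theta, lam, l2_penalty)
print(l1_penalty(theta), l2_penalty(theta), obj_l1, obj_l2)
```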
![Page 63: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/63.jpg)
What about Neural Networks?
Optimization problem
• definitely not convex
• needs a lot of engineering to make it converge to something: good initialization, ReLU activations, lots of tricks to keep the gradient from exploding, etc.
• ...but somehow Stochastic Gradient Descent works well
Generalization
• no regularization term (but other approaches)
• huge capacity (universal approximation theorem): classical theory does not provide a good bound on generalization...
In the end, we do not even optimize what we would like to (surrogate loss); we just monitor the validation error in order to know when to stop...
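Monitoring the validation error to decide when to stop is usually called early stopping. The sketch below shows one simple patience-based rule; the rule itself, the `patience` value and the list of validation errors are assumptions for illustration, not the procedure used by the authors.

```python
def early_stopping_epoch(val_errors, patience):
    """Return (best_epoch, best_error), stopping once the validation error
    has not improved for `patience` consecutive epochs."""
    best_epoch, best_err = 0, float("inf")
    for epoch, err in enumerate(val_errors):
        if err < best_err:
            best_epoch, best_err = epoch, err
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs: stop training
    return best_epoch, best_err

# Hypothetical validation errors, one per epoch.
val_errors = [0.9, 0.6, 0.5, 0.55, 0.52, 0.58, 0.4]
best_epoch, best_err = early_stopping_epoch(val_errors, patience=3)
print(best_epoch, best_err)
```

With this rule training halts at epoch 5, so the later improvement to 0.4 is never seen; choosing the patience is itself a hyper-parameter.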
![Page 64: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/64.jpg)
What about Neural Networks?
Open question: Why does it work?
Optimization
• empirically: a lot of the minima reached by SGD are equivalent in terms of generalization error¹
Generalization
• new theories rising?²
¹ Choromanska et al. 2015; Goodfellow and Vinyals 2014.
² Shwartz-Ziv and Tishby 2017.
![Page 65: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/65.jpg)
Practical Limitations of Neural Networks
![Page 66: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/66.jpg)
Some current limitations
Interpretability
building predictive models can be useful to perform analysis on a set of variables (e.g. economics, ecology, ...)
... given we can understand what the model has learned
but neural networks are black boxes
Can be fooled
no measure of prediction uncertainty
usually outputs only some sort of scores that cannot be interpreted as uncertainty estimates
Bayesian Neural Networks (BNN)³
easy to find adversarial examples
need for formal verification
³ Kendall and Gal 2017.
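One way to see why raw output scores are not uncertainty estimates is sketched below. This is an illustration under stated assumptions, not the argument of the cited work: the logits are made-up values, and `scale` is a hypothetical factor standing in for the overall magnitude of the last layer's weights. Rescaling the logits never changes the predicted class, yet it moves the top softmax score from moderate to nearly 1.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for a 3-class problem.
logits = [2.0, 1.0, 0.5]

results = {}
for scale in (0.5, 1.0, 5.0):
    scores = softmax([scale * z for z in logits])
    predicted_class = scores.index(max(scores))
    results[scale] = (predicted_class, max(scores))
    print("scale", scale, "class", predicted_class, "top score", round(max(scores), 3))
```

The ranking of the classes is identical in all three cases, so the top score reflects the logit magnitude as much as any notion of confidence.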
![Page 67: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/67.jpg)
Some failures...
Because neural networks have allowed us to make considerable progress in a lot of different tasks, we might consider these tasks solved... but they are not.
![Page 68: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/68.jpg)
Some failures...
![Page 69: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/69.jpg)
Some failures...
![Page 70: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/70.jpg)
Some failures...
![Page 71: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/71.jpg)
Adversarial examples⁴
⁴ Goodfellow, Shlens, et al. 2014; Kurakin et al. 2016.
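A minimal sketch of how such an example can be built, using the fast gradient sign method described in Goodfellow, Shlens, et al. 2014. To stay self-contained it attacks a logistic-regression model instead of a deep network; the weights `w`, the input `x`, and the budget `eps` are made-up values for illustration. Each coordinate moves by at most `eps`, but in 100 dimensions these small moves add up and flip the prediction.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, b, x):
    """Probability of class 1 under a logistic-regression model."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def sign(v):
    return 1.0 if v > 0 else (-1.0 if v < 0 else 0.0)

def fgsm(w, b, x, y, eps):
    """Fast gradient sign method: step of size eps along the sign of the loss gradient."""
    p = predict(w, b, x)
    # Gradient of the logistic loss with respect to the input x is (p - y) * w.
    grad = [(p - y) * wi for wi in w]
    return [xi + eps * sign(g) for xi, g in zip(x, grad)]

# Hypothetical 100-dimensional model and an input confidently classified as 1.
w = [0.5 if i % 2 == 0 else -0.5 for i in range(100)]
b = 0.0
x = [0.1 * sign(wi) for wi in w]

x_adv = fgsm(w, b, x, y=1, eps=0.2)
print("clean score:", predict(w, b, x))
print("adversarial score:", predict(w, b, x_adv))
print("largest coordinate change:", max(abs(a - c) for a, c in zip(x_adv, x)))
```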
![Page 72: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/72.jpg)
Verification
For safety-critical systems, we need formal guarantees on the behavior of the algorithms, e.g.:
check properties of a DNN implementation of the next-generation airborne collision avoidance system for unmanned aircraft⁵
verify that there is no adversarial example around a test point⁶
⁵ Katz et al. 2017.
⁶ Huang et al. 2017.
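To give a flavor of the second property, here is a minimal sketch based on interval arithmetic. This is a swapped-in technique, much simpler than the methods of the cited works and not taken from them. It propagates a box of width `eps` around a test point through a tiny ReLU network and checks whether the output score is guaranteed to stay positive. The network weights and the function names are hypothetical. The check is sound but incomplete: `True` is a guarantee, while `False` only means the property could not be certified.

```python
def affine_bounds(W, b, lo, hi):
    """Bounds on y = W x + b when each x[i] lies in [lo[i], hi[i]]."""
    out_lo, out_hi = [], []
    for row, bias in zip(W, b):
        l = h = bias
        for w, a, c in zip(row, lo, hi):
            if w >= 0:
                l += w * a
                h += w * c
            else:
                l += w * c
                h += w * a
        out_lo.append(l)
        out_hi.append(h)
    return out_lo, out_hi

def relu_bounds(lo, hi):
    """ReLU is monotone, so it can be applied to both bounds directly."""
    return [max(0.0, v) for v in lo], [max(0.0, v) for v in hi]

def certify(layers, x, eps):
    """True if the single output score is provably > 0 for every input
    within L-infinity distance eps of x."""
    lo = [xi - eps for xi in x]
    hi = [xi + eps for xi in x]
    for i, (W, b) in enumerate(layers):
        lo, hi = affine_bounds(W, b, lo, hi)
        if i < len(layers) - 1:  # no ReLU after the output layer
            lo, hi = relu_bounds(lo, hi)
    return lo[0] > 0.0

# Hypothetical 2-2-1 ReLU network; a positive output score means class 1.
layers = [
    ([[1.0, -1.0], [0.5, 1.0]], [0.0, 0.0]),
    ([[1.0, 1.0]], [-0.5]),
]
x = [1.0, 0.5]

print("certified for eps=0.1:", certify(layers, x, 0.1))
print("certified for eps=1.0:", certify(layers, x, 1.0))
```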
![Page 73: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/73.jpg)
Conclusion
Deep Learning has been very successful and has a very large array of applications.
It has shown that the current statistical learning theories are insufficient.
We need to rethink generalization in high dimensions.
We need to find a new learning theory.
Lack of theoretical understanding limits the use of neural networks as black-box modules in software because they do not provide strong contracts (hence the need for formal verification).
There is an ongoing debate in the community (e.g. see Ali Rahimi's NIPS 2017 Test-of-Time award talk).
![Page 74: Valentin Leveau and Titouan Lorieul Cocomeet 15/12/17 Deep Learning.pdf · Valentin Leveau - Kernelizing spatially consistent visual matches for FGC 11/09/2016- 9 • Learning several](https://reader036.vdocument.in/reader036/viewer/2022063000/5f118bbafa03a25d885b0aab/html5/thumbnails/74.jpg)
Questions?