A Hands-On Experience on Visual Object Recognition
Project: Visual Recognition, Module 5
Week 5: Joost van de Weijer, Marc Masana, German Ros
Coordination: Ramon Baldrich
Contents
Tips and tricks to make it work
• Stochastic gradient descent with momentum
• Initialization
• Vanishing gradient problem
• Overfitting: dropout
• Batch normalization
Training

Each training image $x_t$ has a one-hot target vector $y_t$ (e.g. 'antelope' $\to (1,0,0)$, 'ballet' $\to (0,1,0)$, 'boat' $\to (0,0,1)$), and the network outputs a vector of class probabilities $f(x_t;\theta)$ (e.g. $(0.98,\, 0.01,\, 0.01)$ for an antelope image).

Empirical Risk:

$$\theta^* = \arg\min_\theta \frac{1}{T} \sum_t \ell\big(f(x_t;\theta),\, y_t\big)$$

With the cross-entropy loss this becomes:

$$\theta^* = \arg\min_\theta \; -\frac{1}{T} \sum_t \sum_j y_j^t \log f_j(x_t;\theta)$$
Training

$$\theta^* = \arg\min_\theta E(x, y;\theta) = \arg\min_\theta \sum_t L(x_t, y_t;\theta)$$

Gradient descent: $\theta \leftarrow \theta - \eta\, \nabla_\theta E$

• The gradient $\nabla_\theta E$ can be computed with the backpropagation algorithm (chain rule).
Training
Training

$$\theta^* = \arg\min_\theta E(x, y;\theta) = \arg\min_\theta \sum_t L(x_t, y_t;\theta)$$

• The gradient $\nabla_\theta E$ can be computed with the backpropagation algorithm.

$$\nabla_\theta E(x, y;\theta) = \nabla_\theta \sum_t L(x_t, y_t;\theta) = \sum_t \nabla_\theta L(x_t, y_t;\theta)$$
Training

$$\theta^* = \arg\min_\theta E(x, y;\theta) = \arg\min_\theta \sum_t L(x_t, y_t;\theta)$$

• Stochastic gradient descent: if the data set is highly redundant, the gradient over a part of the data will already represent the gradient over the whole data set, so single-example gradients $\nabla_\theta L(x_t, y_t;\theta)$ can be used for the updates.

• Minibatch gradient descent: use groups of images (called minibatches) to update the parameters.
  • Less computation per update (very efficient on GPUs).
  • Balance the classes in each minibatch.
  • One pass through all the data is called an epoch.
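The minibatch update can be sketched in a few lines of NumPy (an illustrative sketch, not the course code; the helper names `minibatch_sgd_step` and `grad_fn` are assumptions):

```python
import numpy as np

def minibatch_sgd_step(theta, X, y, grad_fn, lr=0.01, batch_size=32, rng=None):
    """One minibatch SGD step: sample a random minibatch and take a
    gradient step on the average loss over that batch."""
    rng = rng or np.random.default_rng()
    idx = rng.choice(len(X), size=batch_size, replace=False)
    g = grad_fn(theta, X[idx], y[idx])  # gradient averaged over the minibatch
    return theta - lr * g
```

Repeating this step over all minibatches once constitutes one epoch.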
Training

General recipe for stochastic gradient descent (Hinton):
• Guess an initial learning rate.
• If the error keeps getting worse or oscillates wildly, reduce the learning rate.
• If the error is falling fairly consistently but slowly, increase the learning rate.
• Towards the end of mini-batch learning it nearly always helps to turn down the learning rate.
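The recipe can be caricatured as a tiny heuristic (an illustrative sketch; the factors 0.5 and 1.05 are arbitrary choices, not from the slides):

```python
def adjust_learning_rate(lr, errors, down=0.5, up=1.05):
    """Crude automation of Hinton's recipe: shrink the learning rate when
    the last error went up (oscillation/divergence), grow it slightly
    while the error is still falling."""
    if len(errors) < 2:
        return lr
    if errors[-1] > errors[-2]:
        return lr * down   # error got worse: reduce
    return lr * up         # error falling consistently: increase slowly
```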
Training

• A lot of literature exists on more elaborate gradient descent methods.

• Stochastic gradient descent with momentum:

$$\Delta\theta_t = 0.9\, \Delta\theta_{t-1} - 0.1\, \eta\, \nabla_\theta L(x_t, y_t;\theta)$$
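With the coefficients from the slide (0.9 on the previous update, 0.1 on the gradient term), one momentum step might look like this (a minimal sketch; function and variable names are illustrative):

```python
import numpy as np

def momentum_step(theta, velocity, grad, lr=1.0, mu=0.9):
    """SGD with momentum: the velocity keeps 90% of its previous value
    and adds 10% of the scaled negative gradient; the parameters then
    move along the velocity."""
    velocity = mu * velocity - 0.1 * lr * grad
    return theta + velocity, velocity
```

Because past gradients accumulate in the velocity, consistent gradient directions are amplified while oscillating ones cancel out, which speeds up convergence.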
Training + Regularization

Add a weight-decay penalty on the weights $W^k$ of each layer $k$ to the cross-entropy objective:

$$\theta^* = \arg\min_\theta \; -\frac{1}{T} \sum_t \sum_j y_j^t \log f_j(x_t;\theta) \;+\; \lambda \sum_k \sum_{i,j} \big(W_{ij}^k\big)^2$$
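In gradient descent the penalty term simply contributes $2\lambda W$ to each weight gradient, shrinking ("decaying") the weights at every step (a minimal sketch; the helper name is an assumption):

```python
import numpy as np

def sgd_step_with_weight_decay(W, grad, lr=0.01, lam=1e-4):
    """SGD step on loss + lam * sum(W**2): the L2 penalty adds
    2 * lam * W to the data gradient, pulling weights toward zero."""
    return W - lr * (grad + 2.0 * lam * W)
```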
Initialization

The initialization of the network is very important for (fast) convergence.

INPUT: networks converge faster if their inputs are whitened (linearly transformed to zero mean and unit variance, and decorrelated).

It is important to observe that the weights cannot all be initialized with the same value: you need to break the symmetry.

BIASES: initialized to zero.

WEIGHTS:
• Glorot & Bengio [2010] aim to keep the variance of the outputs at each layer white, with unit variance:

$$W \sim U\!\left[-\frac{\sqrt{6}}{\sqrt{n_{in}+n_{out}}},\; \frac{\sqrt{6}}{\sqrt{n_{in}+n_{out}}}\right]$$

In Caffe: weight_filler { type: "xavier" }

• An improvement was proposed by He et al. [arXiv 2015] for ReLU activations.
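Both initializations are easy to sketch in NumPy (illustrative helper names; the uniform bound follows the Glorot & Bengio formula, and the Gaussian with std $\sqrt{2/n_{in}}$ is the He et al. variant for ReLU):

```python
import numpy as np

def xavier_init(n_in, n_out, rng=None):
    """Glorot & Bengio [2010]: W ~ U[-a, a] with a = sqrt(6/(n_in+n_out)),
    chosen to keep activation variance roughly constant across layers."""
    rng = rng or np.random.default_rng()
    a = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-a, a, size=(n_in, n_out))

def he_init(n_in, n_out, rng=None):
    """He et al. [2015]: zero-mean Gaussian with std sqrt(2/n_in),
    compensating for ReLU zeroing roughly half the activations."""
    rng = rng or np.random.default_rng()
    return rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
```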
Training Deep Networks

• A neural network with a single hidden layer can approximate arbitrarily closely any network with multiple layers.
• However, as the number of layers increases, the number of nodes needed to express the same function decreases exponentially.

Problems:
• The vanishing gradient problem.
• Overfitting (the number of parameters is often larger than the number of training examples).
slide credit P. Poupart
Dropout

slide credit R. Fergus

• You have to experiment with the best positioning of the dropout layer (often it is placed between the fully connected layers).
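A minimal "inverted dropout" sketch (one common formulation, assumed here; at test time the layer is the identity, so no rescaling is needed at inference):

```python
import numpy as np

def dropout(x, p=0.5, train=True, rng=None):
    """Inverted dropout: at training time each unit is zeroed with
    probability p and the survivors are scaled by 1/(1-p), so the
    expected activation is unchanged; at test time x passes through."""
    if not train:
        return x
    rng = rng or np.random.default_rng()
    mask = (rng.random(x.shape) >= p) / (1.0 - p)
    return x * mask
```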
Batch Normalization

• Batch normalization is recent work by Sergey Ioffe and Christian Szegedy [ICML 2015].
• We apply whitening to the input, and choose our initial weights in such a way that the activations in between are close to whitened ('Xavier' initialization). It would be nice to also ensure whitened activations for all layers during training.
• Training of networks is complicated because the distribution of layer inputs changes during training (internal covariate shift). Making normalization at all layers part of the training prevents this internal covariate shift.
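The training-time forward pass of batch normalization can be sketched as follows (a minimal sketch of the Ioffe & Szegedy formulation; the learned scale gamma and shift beta let the network undo the normalization if that is optimal):

```python
import numpy as np

def batchnorm_forward(x, gamma, beta, eps=1e-5):
    """Batch normalization, training-time forward pass: normalize each
    feature over the minibatch to zero mean / unit variance, then apply
    the learned scale gamma and shift beta."""
    mu = x.mean(axis=0)                    # per-feature minibatch mean
    var = x.var(axis=0)                    # per-feature minibatch variance
    x_hat = (x - mu) / np.sqrt(var + eps)  # whitened activations
    return gamma * x_hat + beta
```

At test time the minibatch statistics are replaced by running averages collected during training.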
Batch Normalization
Batch Normalization
Backpropagation with Batch Normalization
Batch Normalization
Results on MNIST [Ioffe & Szegedy 2015]
Batch Normalization
Results on ImageNet [Ioffe & Szegedy 2015]

Learning rate multiplied by 5 and by 30.

14× faster to reach the same results.
Conclusion

We discussed a set of tools which you can use to improve the training of CNNs:

• Changing the learning rate / weight decay.
• SGD with momentum can speed up convergence.
• Correct initialization is crucial for the successful application of DNNs.
• Dropout is an effective method to prevent overfitting.
• Batch Normalization prevents the internal covariate shift and therefore allows higher learning rates. (It seems that Dropout is not necessary when using Batch Normalization.)

Assignment
For next week, Monday 11/4 (deadline at 9:00):
• Submit a short presentation in which you show your results for the different exercises (exercises 1–4).
• Include a copy of your best network (mynet_train.m).
Exam Material
All material, including the hands-on slides (except the batch-normalization derivation of backpropagation, and Generative Adversarial Networks).