Introduction to Deep Learning in Python and Matlab
TRANSCRIPT
Introduction to Hands-on Deep Learning
Imry Kissos, Algorithm Researcher
Outline
● Problem Definition
● Motivation
● Training a Regression DNN
● Training a Classification DNN
● Open Source Packages
● Summary + Questions
Problem Definition
Deep Convolutional Network Tutorial
● Goal: Detect facial landmarks on (normal) face images
● Data set provided by Dr. Yoshua Bengio
● Tutorial code available: https://github.com/dnouri/kfkd-tutorial/blob/master/kfkd.py
Flow
Train Model: General
Train Model: “Nose Tip”
Train Model: “Mouth Corners”
Predict Points on Test Set
Flow
Train Images + Train Points → Fit → Trained Net
Flow
Test Images → Predict → Predicted Points
Python DL Framework
High Level:
● Wrapper to Lasagne
● Lasagne - Theano extension for Deep Learning
● Theano - defines, optimizes, and evaluates mathematical expressions
Low Level:
● Efficient CUDA GPU kernels for DNNs
HW support: GPU & CPU. OS: Linux, OS X, Windows.
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
4. Training the DNN
Training a Deep Neural Network
1. Data Analysis
   a. Exploration + Validation
   b. Pre-Processing
   c. Batch and Split
2. Architecture Engineering
3. Optimization
4. Training the DNN
Data Exploration + Validation
Data:
● 7K gray-scale images of detected faces
● 96x96 pixels per image
● 15 landmarks per image (?)
Data validation:
● Some landmarks are missing
Pre-Processing
● Data normalization
● Shuffle train data
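A sketch of these two pre-processing steps, following the kfkd tutorial's conventions (pixel values scaled to [0, 1]; landmark coordinates on 96x96 images scaled to [-1, 1] via the 48-pixel half-width; the function name is my own):

```python
import numpy as np

def preprocess(X, y, seed=42):
    """Normalize images and landmark targets, then shuffle the training set."""
    X = X.astype(np.float32) / 255.0           # pixel values -> [0, 1]
    y = (y.astype(np.float32) - 48.0) / 48.0   # coords on 96x96 images -> [-1, 1]
    idx = np.random.RandomState(seed).permutation(len(X))  # shuffle train data
    return X[idx], y[idx]
```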
Batch and split:
● train batches
● validation batch
● test batch
⇐ One epoch’s data
The train/valid/test splits are constant.
Train / Validation Split
Classification - the Train/Validation split preserves class proportions
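A sketch of a stratified train/validation split that preserves each class's proportion (the function name and the 20% default are my own choices):

```python
import numpy as np

def stratified_split(labels, valid_fraction=0.2, seed=0):
    """Split sample indices into train/validation, class by class, so that
    each class contributes the same fraction to the validation set."""
    rng = np.random.RandomState(seed)
    train_idx, valid_idx = [], []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)    # all samples of class c
        rng.shuffle(idx)
        n_valid = int(round(len(idx) * valid_fraction))
        valid_idx.extend(idx[:n_valid])
        train_idx.extend(idx[n_valid:])
    return np.array(train_idx), np.array(valid_idx)
```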
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
   a. Layers Definition
   b. Layers Implementation
3. Optimization
4. Training
Architecture
X → Conv → Pool → Dense → Output → Y
Layers Definition
Activation Function
ReLU
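The ReLU activation named on the slide can be sketched in one line (NumPy version):

```python
import numpy as np

def relu(x):
    """ReLU activation: max(0, x), applied element-wise."""
    return np.maximum(0, x)
```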
Dense Layer
Dropout
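A sketch of (inverted) dropout: at train time each unit is dropped with probability p and the survivors are rescaled by 1/(1-p), so at test time the layer is simply the identity. The function name and defaults are my own:

```python
import numpy as np

def dropout(x, p=0.5, train=True, rng=None):
    """Inverted dropout: zero units with probability p at train time and
    scale the survivors by 1/(1-p); identity at test time."""
    if not train or p == 0.0:
        return x
    rng = rng or np.random.RandomState(0)
    mask = rng.uniform(size=x.shape) >= p     # True = unit survives
    return x * mask / (1.0 - p)
```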
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
   a. Back Propagation
   b. Objective
   c. SGD
   d. Updates
   e. Convergence Tuning
4. Training the DNN
Back Propagation - Forward Path
X → Conv → Dense → Output Points (Y)
Back Propagation - Forward Path
X → Conv → Dense → Output Points (Y), compared against the Training Points
Back Propagation - Backward Path
Y → Dense → Conv → X
Back Propagation - Update
For all layers: W ← W − η · ∂Loss/∂W
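The per-layer update can be sketched as plain SGD over a list of layer weights (scalar weights here for brevity; the function name is my own):

```python
def sgd_update(weights, grads, lr=0.01):
    """Apply W <- W - lr * dLoss/dW to every layer's weights."""
    return [w - lr * g for w, g in zip(weights, grads)]

# One update step over two (scalar) layer weights:
new_w = sgd_update([1.0, 2.0], [10.0, -10.0], lr=0.1)
```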
Objective
S.G.D.
Updates the network after each batch
Optimization - Updates
Visualization: Alec Radford
Adjusting Learning Rate & Momentum
Linear in epoch
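The kfkd tutorial adjusts both hyper-parameters linearly in the epoch number (e.g. learning rate 0.03 → 0.0001 while momentum rises 0.9 → 0.999); a sketch of such a schedule (the function name is my own):

```python
def linear_schedule(start, stop, max_epochs):
    """Value of a hyper-parameter at each epoch, interpolated linearly
    from `start` (epoch 0) to `stop` (last epoch)."""
    return [start + (stop - start) * epoch / (max_epochs - 1)
            for epoch in range(max_epochs)]
```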
Convergence Tuning
● Stops according to validation loss
● Returns best weights
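A stripped-down sketch of this early-stopping idea (class and attribute names are my own; the patience value is illustrative): stop when the validation loss has not improved for `patience` epochs, and keep the weights from the best epoch.

```python
class EarlyStopping:
    """Stop training when validation loss stops improving; keep best weights."""
    def __init__(self, patience=10):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_weights = None
        self.best_epoch = 0

    def step(self, epoch, valid_loss, weights):
        """Record this epoch's result; return True when training should stop."""
        if valid_loss < self.best_loss:
            self.best_loss = valid_loss
            self.best_weights = weights
            self.best_epoch = epoch
        return epoch - self.best_epoch >= self.patience
```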
Training a Deep Neural Network
1. Data Analysis
2. Architecture Engineering
3. Optimization
4. Training the DNN
   a. Fit
   b. Fine Tune Pre-Trained
   c. Learning Curves
Fit
● Loop over train batches: Forward + BackProp
● Loop over test batches: Forward
Fine Tune Pre-Trained
● Change output layer
● Load pre-trained weights
● Fine tune specialist
Learning Curves
Loop over 6 Nets:
[Plot: RMSE vs. Epochs]
Learning Curves Analysis
[Plots: RMSE vs. Epochs - Net 1: Overfitting; Net 2: Convergence Jittering]
Part 1 Summary
Training a DNN:
Python
● Rich eco-system
● State-of-the-art
● Easy to port from prototype to production
https://github.com/yoavram/Py4Eng
Python DL Framework
Theano-based Packages
Part 1 End - Break
Part 2
Outline
● Problem Definition
● Motivation
● Training a regression DNN
● Training a classification DNN
● Improving the DNN
● Open Source Packages
● Summary
Matlab DL Framework
High Level:
● MatConvNet - open-source CNN toolbox
● Numerical computing using the Parallel Computing Toolbox
Low Level:
● Efficient CUDA GPU kernels for DNNs
HW support: GPU & CPU. OS: Linux, OS X, Windows.
Problem Statement
Classify a, b, …, z images into 26 classes:
http://www.robots.ox.ac.uk/~vgg/practicals/cnn/
Bonus - OCR:
Training a Deep Neural Network
1. Data Analysis
2. Training the DNN
3. Architecture Engineering
4. Optimization
Data Analysis
● Defines training vs. validation sets
● Class: one uint label per image, in [1, 26]
Data Pre-Processing
Training Flow
Customized Batch Loading
How would you add Data Augmentation?
trainOpts
Starts from the last iteration if interrupted
initializeCharCnn() - Net Architecture
Layers:
● Conv
● Pool
● Conv
● Pool
● Conv
● ReLU
● Conv
● SoftMaxLoss
%f is the initial std of W
Optimization
SoftMax
Maps scores in (-∞, ∞) → probabilities in [0, 1]
https://classroom.udacity.com/courses/ud730
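A sketch of the SoftMax mapping (subtracting the max score before exponentiating is a standard numerical-stability trick, not shown on the slide):

```python
import numpy as np

def softmax(scores):
    """Map real-valued scores to probabilities in [0, 1] that sum to 1."""
    shifted = scores - np.max(scores)   # avoid overflow in exp
    e = np.exp(shifted)
    return e / e.sum()
```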
One Hot Encoding
Encode class labels
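A sketch of the encoding, assuming 1-based integer labels in [1, 26] as on the Data Analysis slide (the function name is my own):

```python
import numpy as np

def one_hot(labels, num_classes=26):
    """Encode 1-based integer class labels as one-hot rows."""
    return np.eye(num_classes, dtype=np.float32)[np.asarray(labels) - 1]
```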
Cross Entropy
Distance measure between S(Y) and Labels
D(S, L) = −Σᵢ Lᵢ · log(Sᵢ) = −log(Sₜ)
D(S, L) is a positive scalar
t - index of the ground-truth class
In vl_nnloss.m:
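vl_nnloss.m is MatConvNet code; the same quantity can be sketched in Python (assuming `S` holds the SoftMax probabilities and `t` the ground-truth class index):

```python
import numpy as np

def cross_entropy(S, t):
    """D(S, L) = -log(S[t]): with a one-hot label L, only the predicted
    probability of the true class t contributes to the distance."""
    return -np.log(S[t])
```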
Training Goal
CNN → Minimize Loss
Loss = average cross entropy
Minimize loss by gradient descent:
W ← W − α · ∂Loss/∂W
α - learning rate
Error Rate
TopK - Target label is one of the top K predictions
The error rate is the fraction of samples whose target label is not among the top K predictions.
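A sketch of the Top-K error rate described above (NumPy; the function name is my own):

```python
import numpy as np

def topk_error(scores, targets, k=1):
    """Fraction of samples whose target label is NOT among the top-k predictions."""
    topk = np.argsort(-scores, axis=1)[:, :k]           # k highest-scoring classes
    hit = (topk == np.asarray(targets)[:, None]).any(axis=1)
    return 1.0 - hit.mean()
```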
Loss & Error Convergence
[Plots: Loss and Error Rate vs. training epochs]
Learned Filters
OCR Evaluation
Beyond Training
1. Training a classification DNN
2. Improving the DNN
   a. Analysis Capabilities
   b. Augmentation
3. Open Source Packages
4. Summary
Basic vs. Advanced Mode
[Comparison: Basic | Advanced]
Improving the DNN
Very tempting:
● >1M images
● >1M parameters
● Large gap: Theory ↔ Practice
⇒ Brute-force experiments?!
Analysis Capabilities
1. Theoretical explanation
   a. e.g. dropout/augmentation decrease overfitting
2. Empirical claims about a phenomenon
   a. e.g. normalization helps convergence
3. Numerical understanding
   a. e.g. exploding/vanishing updates
Reduce Overfitting
Solution: Data Augmentation
[Plot: RMSE vs. Epochs for Net 1 and Net 2, showing overfitting]
Data Augmentation
Horizontal Flip Perturbation
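A sketch of the horizontal-flip perturbation, following the kfkd tutorial's conventions (96x96 images; landmark coordinates scaled to [-1, 1] with x-coordinates at even positions; the `flip_indices` argument for swapping left/right landmark pairs is an assumption):

```python
import numpy as np

def flip_horizontal(X, y, flip_indices=()):
    """Mirror each 96x96 image left-right; landmark x-coords change sign,
    and left/right landmark pairs swap columns."""
    Xf = X.reshape(-1, 96, 96)[:, :, ::-1].reshape(X.shape)  # mirror pixels
    yf = y.copy()
    yf[:, ::2] = -yf[:, ::2]              # x-coordinates sit at even positions
    for a, b in flip_indices:             # e.g. left-eye-x <-> right-eye-x
        yf[:, [a, b]] = yf[:, [b, a]]
    return Xf, yf
```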
Convergence Challenges
Need to monitor the forward + backward path
[Plots: RMSE vs. Epochs - Data Error; Normalization]
Deal with NaN
1. If in the first 100 iterations
   a. Learning rate is too high
2. Beyond 100 iterations
   a. Gradient explosion
      i. Consider gradient clipping
   b. Illegal math operation
      i. SoftMax: inf/inf
      ii. Division by zero in one of your customized layers
http://russellsstewart.com//notes/0.html
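The gradient clipping suggested under 2.a can be sketched as rescaling by the L2 norm (the threshold value is illustrative; the function name is my own):

```python
import numpy as np

def clip_gradient(grad, max_norm=5.0):
    """Rescale the gradient when its L2 norm exceeds max_norm,
    taming gradient explosions without changing its direction."""
    norm = np.linalg.norm(grad)
    if norm > max_norm:
        grad = grad * (max_norm / norm)
    return grad
```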
The Net Doesn’t Learn Anything
1. Training loss does not decrease after the first 100 iterations
   a. Reduce the training set to 10 instances (images) and overfit it
      i. Achieve 100% training accuracy on this small portion of the data
   b. Change the batch size to 1 and monitor the error per batch
   c. Solve the simplest version of your problem
http://russellsstewart.com//notes/0.html
Beyond Training
1. Training a classification DNN
2. Improving the DNN
3. Open Source Packages
   a. DL Open Source Packages
   b. Effort Estimation
4. Summary
Tips from Other Packages
● Torch: code organization
● Caffe: separation of configuration ↔ code
● NeuralNet → YAML text format defining the experiment’s configuration
DL Open Source Packages
● Caffe & MatConvNet - for applications (simple DNNs)
● Torch, TensorFlow and Theano - for research on DL (complex DNNs)
http://fastml.com/torch-vs-theano/
Disruptive Effort Estimation
Feature Engineering → modest SW infrastructure
Deep Learning → huge SW infrastructure
Summary
● Dove into Training a DNN
● Presented Analysis Capabilities
● Reviewed Open Source Packages
References
● Hinton Coursera Neural Networks course: https://www.coursera.org/course/neuralnets
● Udacity TensorFlow course: https://classroom.udacity.com/courses/ud730
● Technion Deep Learning course: http://moodle.technion.ac.il/course/view.php?id=4128
● Oxford Deep Learning course: https://www.youtube.com/playlist?list=PLE6Wd9FR--EfW8dtjAuPoTuPcqmOV53Fu
● CS231n CNN for Visual Recognition: http://cs231n.github.io/
● Deep Learning Book: http://www.deeplearningbook.org/
Questions?