practical deep learning
![Page 1: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/1.jpg)
Practical Deep Learning
Tambet Matiisen
Machine Learning Meetup
29.03.2015
![Page 2: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/2.jpg)
![Page 3: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/3.jpg)
Deep Learning in Practice
0. Use web API
1. Use pre-trained model
1.5. Fine-tune pre-trained model
2. Train your own model
2.5. Write custom layer/loss function
3. Design your own architecture
![Page 5: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/5.jpg)
Microsoft Project Oxford
www.projectoxford.ai
• Face Detection
• Face Identification
• Face Verification
• Similar Face Search
• Face Grouping
• Emotion Recognition
![Page 6: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/6.jpg)
Microsoft Project Oxford Demo
https://www.projectoxford.ai/demo/Emotion
![Page 7: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/7.jpg)
Other Services
• Computer vision API
– image categorization, pornography detection, OCR
• Video API
– face tracking, motion detection
• Speech API
– Speech recognition, speaker recognition
• Language API
– Spell check, entity recognition, predict next word
![Page 8: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/8.jpg)
Price
• 5000-10000 transactions per month free.
• Later from $0.05 to $4 per 1000 transactions.
• Good for prototyping?
• But what if
– your dataset is too big?
– your dataset is sensitive?
– the pricing model doesn’t match?
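To judge whether the pricing model matches, it can help to estimate the monthly bill first. The sketch below assumes a single free quota followed by a flat price per 1000 transactions. The function name and this simplified billing model are assumptions for illustration, and only the numeric ranges come from the slide.

```python
def monthly_cost(transactions, free_quota=5000, price_per_1000=0.05):
    """Estimate the monthly bill in USD under a simple free-tier-plus-flat-rate model."""
    billable = max(0, transactions - free_quota)
    return billable / 1000 * price_per_1000


# One million calls per month at the cheapest and the most expensive rate.
low = monthly_cost(1_000_000, free_quota=10000, price_per_1000=0.05)
high = monthly_cost(1_000_000, free_quota=5000, price_per_1000=4.0)
print(f"${low:.2f} to ${high:.2f} per month")
```

The spread between the two ends is large, so the per-API price matters much more than the size of the free quota once the dataset grows.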
![Page 9: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/9.jpg)
Deep Learning in Practice
0. Use web API
1. Use pre-trained model
1.5. Fine-tune pre-trained model
2. Train your own model
2.5. Write custom layer/loss function
3. Design your own architecture
![Page 10: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/10.jpg)
Caffe
• Developed at the Berkeley Vision and Learning Center (BVLC).
• Written in C++, with bindings for Python and MATLAB.
• Works on Ubuntu and OS X, and with some effort on Windows.
• Uses GPUs (cuDNN) to accelerate learning.
caffe.berkeleyvision.org
![Page 11: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/11.jpg)
Caffe Model Zoo
• 1000 everyday objects, including 100 dog breeds
• 205 scene categories, including outdoors, indoors
• Face feature extraction
• Others: age and gender classification, emotion recognition, car model classification, flower classification, image hashing, image segmentation, object detection (RCNN) etc.
https://github.com/BVLC/caffe/wiki/Model-Zoo
![Page 12: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/12.jpg)
Caffe Pretrained Model Demo
https://github.com/BVLC/caffe/blob/master/examples/00-classification.ipynb
![Page 13: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/13.jpg)
Transfer Learning
$d(x, y) = \sum_i (x_i - y_i)^2$
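One plausible reading of this slide is that features extracted by a pre-trained network are compared with a distance function, so a new image gets the label of its nearest labeled neighbor. The sketch below assumes that reading. The helper names and the toy feature vectors are hypothetical, and the distance is the sum of squared differences.

```python
def squared_distance(x, y):
    """d(x, y) = sum_i (x_i - y_i)^2 for two equal-length feature vectors."""
    if len(x) != len(y):
        raise ValueError("feature vectors must have the same length")
    return sum((xi - yi) ** 2 for xi, yi in zip(x, y))


def nearest_label(query, labeled_features):
    """Return the label of the stored feature vector closest to `query`."""
    best_label, _ = min(
        ((label, squared_distance(query, feats)) for label, feats in labeled_features),
        key=lambda pair: pair[1],
    )
    return best_label


# Toy 3-dimensional "features"; a real network layer would produce thousands of values.
gallery = [("cat", [0.9, 0.1, 0.0]), ("dog", [0.1, 0.8, 0.2])]
print(nearest_label([0.8, 0.2, 0.1], gallery))
```

Taking the square root of the distance would not change which neighbor is nearest, so the sketch skips it.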
![Page 15: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/15.jpg)
Deep Learning in Practice
0. Use web API
1. Use pre-trained model
1.5. Fine-tune pre-trained model
2. Train your own model
2.5. Write custom layer/loss function
3. Design your own architecture
![Page 16: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/16.jpg)
Fine-tuning Pretrained Network
[Figure: pretrained network, labeled "10", with some layers marked "freeze layers" and the rest marked "train"]
![Page 19: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/19.jpg)
Fine-tuning Pretrained Network
[Figure: the same network, labeled "10", with no frozen layers, only "train"]
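The freeze/train split can be pictured as an update rule that skips frozen layers. The sketch below is framework-free and assumes the slides progress from training only part of the network to training all of it. The layer names, weights and gradients are made up for illustration.

```python
def sgd_step(params, grads, frozen, lr=0.1):
    """Apply one SGD update, leaving every layer named in `frozen` untouched."""
    updated = {}
    for name, weights in params.items():
        if name in frozen:
            updated[name] = list(weights)  # pretrained weights are kept as-is
        else:
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
    return updated


# Hypothetical three-layer network with flattened weights and made-up gradients.
params = {"conv1": [0.5, -0.2], "conv2": [0.1, 0.3], "fc": [0.0, 0.0]}
grads = {"conv1": [1.0, 1.0], "conv2": [1.0, 1.0], "fc": [0.5, -0.5]}

# Early stage: train only the new final layer.
stage1 = sgd_step(params, grads, frozen={"conv1", "conv2"})
# Final stage: nothing is frozen, so every layer is trained.
stage2 = sgd_step(stage1, grads, frozen=set())
```

In a real toolkit the same effect is usually achieved by setting a per-layer learning rate or trainable flag; the exact mechanism depends on the framework.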
![Page 20: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/20.jpg)
Example: Estonian Border Guard
![Page 21: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/21.jpg)
Deep Learning in Practice
0. Use web API
1. Use pre-trained model
1.5. Fine-tune pre-trained model
2. Train your own model
2.5. Write custom layer/loss function
3. Design your own architecture
![Page 22: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/22.jpg)
Images labeled “Lennart Meri”
• Many people in the image.
• Face too small.
• Not facing the camera.
• The person is not in the image at all!
![Page 23: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/23.jpg)
Training your own model
• Make sure you have enough labeled data
– Minimum in thousands, preferably in millions.
• Start with existing (most similar) architecture:
– Models based on ImageNet (256x256 color images)
– Models based on CIFAR-10 (32x32 color images)
• Scale down hidden layer sizes so that the ratio of samples to parameters stays roughly the same.
– AlexNet: 1.2M training images, 61M parameters
– Your dataset: 100K images, 6M parameters
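The scaling rule is simple arithmetic. The sketch below uses the AlexNet figures from the slide as the reference point; the function name is hypothetical.

```python
def parameter_budget(n_samples, ref_samples=1_200_000, ref_params=61_000_000):
    """Scale a reference parameter count so samples / parameters stays constant."""
    return round(ref_params * n_samples / ref_samples)


print(parameter_budget(100_000))
```

For 100K images this gives about 5.1M parameters, in the same range as the 6M on the slide; the rule is a rough guide, not an exact target.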
![Page 24: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/24.jpg)
Python toolkits
Keras (keras.io)
• Written in Python
• Built on top of Theano, also supports TensorFlow
• Inspired by Torch API
• Plenty of examples
Neon (neon.nervanasys.com)
• Written in Python
• Custom GPU backend, written in GPU assembler
• Fastest convolutions
• Plenty of examples
![Page 25: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/25.jpg)
Defining architecture
[Code listings: Keras and Neon]
![Page 26: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/26.jpg)
Training the model
[Code listings: Keras and Neon]
![Page 27: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/27.jpg)
The Good and the Bad...
Keras
Good:
• Nicer API
• Better documentation
• Can be extended with Theano
Bad:
• Slower convolutions
• Compilation time (Theano)
• Repo stability
Use with text data!
Neon
Good:
• Fastest convolutions
• Some nice gimmicks:
– deconvolution layer
– object detection (RCNN)
– guided backpropagation
Bad:
• Recurrent networks slow
• Documentation
• Repo stability
Use with image data!
![Page 28: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/28.jpg)
Hyperparameter shock
• Too many hyperparameters to try
– number of layers, hidden nodes, filter size, learning rate etc.
– Start with default parameters from an example
– Use an adaptive learning rate (Adam, RMSprop)
– Use batch normalization
– Turn off regularization at first
– Overfit a small subset, then regularize with more data and dropout. Consider data augmentation.
– Do greedy search, changing one parameter at a time
– If desperate, try Bayesian optimization
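The greedy search advice, changing one parameter at a time, can be sketched as a coordinate-wise loop. Everything named here is hypothetical: `evaluate` stands for "train a model and return a validation score", and the toy scoring function only exists so that the example runs without a dataset.

```python
def greedy_search(evaluate, grid, start):
    """Coordinate-wise search: vary one hyperparameter at a time, keep improvements."""
    best = dict(start)
    best_score = evaluate(best)
    improved = True
    while improved:
        improved = False
        for name, candidates in grid.items():
            for value in candidates:
                trial = {**best, name: value}
                score = evaluate(trial)
                if score > best_score:
                    best, best_score, improved = trial, score, True
    return best, best_score


# Stand-in for "train a model and return validation accuracy" (hypothetical).
def toy_score(cfg):
    return -abs(cfg["layers"] - 3) - abs(cfg["lr_exp"] + 3)


grid = {"layers": [1, 2, 3, 4], "lr_exp": [-1, -2, -3, -4]}
best, score = greedy_search(toy_score, grid, start={"layers": 1, "lr_exp": -1})
print(best, score)
```

The loop stops once a full pass over all parameters brings no improvement. It can miss combinations that only help together, which is where a method such as Bayesian optimization may do better.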
![Page 29: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/29.jpg)
Example: positioning rat using brain activity
![Page 30: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/30.jpg)
![Page 31: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/31.jpg)
Deep Learning in Practice
0. Use web API
1. Use pre-trained model
1.5. Fine-tune pre-trained model
2. Train your own model
2.5. Write custom layer/loss function
3. Design your own architecture
![Page 33: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/33.jpg)
TensorFlow
• Computational flow graph
– Automatic differentiation
• Runs on CPUs and GPUs
– Desktop, server, mobile
• Asynchronous computation
– Assign nodes to different devices
• Connect research and production
– The same code can be run everywhere
![Page 34: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/34.jpg)
Example: Logistic Regression
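$a = Wx + b$
$p = \mathrm{softmax}(a)$
$L = -\sum_{i,j} y_{ij} \log(p_{ij})$
$\frac{\partial L}{\partial a} = p - y$
$\frac{\partial a}{\partial W} = x$
$\frac{\partial L}{\partial W} = \frac{\partial L}{\partial a} \frac{\partial a}{\partial W} = (p - y)\, x^T$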
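The logistic regression gradient can be checked numerically. The sketch below is plain Python for a single sample: it computes a = Wx + b, p = softmax(a) and the cross-entropy loss, then compares the analytic gradient (p - y) x^T with a finite difference. The function names and the toy numbers are assumptions for illustration.

```python
import math


def softmax(a):
    """Numerically stable softmax of a list of scores."""
    m = max(a)
    exps = [math.exp(v - m) for v in a]
    total = sum(exps)
    return [e / total for e in exps]


def forward(W, b, x, y):
    """Return (loss, p) for one sample: a = Wx + b, p = softmax(a), L = -sum_j y_j log p_j."""
    a = [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]
    p = softmax(a)
    loss = -sum(yj * math.log(pj) for yj, pj in zip(y, p))
    return loss, p


def grad_W(W, b, x, y):
    """Analytic gradient dL/dW = (p - y) x^T, as a list of rows."""
    _, p = forward(W, b, x, y)
    return [[(pj - yj) * xi for xi in x] for pj, yj in zip(p, y)]


# Tiny example: 3 classes, 2 input features, one-hot target for class 1.
W = [[0.1, -0.2], [0.0, 0.3], [0.5, 0.1]]
b = [0.0, 0.1, -0.1]
x = [1.0, 2.0]
y = [0.0, 1.0, 0.0]

# Check one entry of the analytic gradient against a finite difference.
eps = 1e-6
W_plus = [row[:] for row in W]
W_plus[2][1] += eps
numeric = (forward(W_plus, b, x, y)[0] - forward(W, b, x, y)[0]) / eps
analytic = grad_W(W, b, x, y)[2][1]
print(analytic, numeric)
```

The two printed numbers should agree to several decimal places. Deriving and checking gradients by hand like this is the work that automatic differentiation takes over.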
![Page 35: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/35.jpg)
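[Figure: computational graph. x and W feed ×; the result and b feed +; then Softmax, Log, * with y, Sum, and − produce the loss.]
$a = Wx + b$
$p = \mathrm{softmax}(a)$
$L = -\sum_{i,j} y_{ij} \log(p_{ij})$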
![Page 37: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/37.jpg)
Differentiable Programming
1. Express your assumptions about the problem as a computational graph
2. Come up with a meaningful loss function
3. Optimize the hell out of it using gradient descent
4. Profit!!!
• Automatic differentiation inefficient?
– C seemed inefficient compared to assembler
– ORMs seemed inefficient compared to SQL
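The recipe above can be tried without any framework. The sketch below is a minimal scalar reverse-mode automatic differentiation class, followed by gradient descent on a one-parameter model. It is an illustration under stated assumptions, not how TensorFlow or any other toolkit is implemented; the `Value` class and the toy problem are hypothetical.

```python
class Value:
    """A scalar node in a computational graph that records how to backpropagate."""

    def __init__(self, data, parents=(), grad_fns=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents
        self._grad_fns = grad_fns  # one local-derivative function per parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other),
                     (lambda g: g, lambda g: g))

    def __sub__(self, other):
        return Value(self.data - other.data, (self, other),
                     (lambda g: g, lambda g: -g))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (lambda g: g * other.data, lambda g: g * self.data))

    def backward(self):
        """Accumulate dL/d(node) into .grad for every node reachable from self."""
        order, seen = [], set()

        def visit(node):
            if id(node) not in seen:
                seen.add(id(node))
                for parent in node._parents:
                    visit(parent)
                order.append(node)

        visit(self)
        self.grad = 1.0
        # Walk from the output back to the inputs, applying the chain rule.
        for node in reversed(order):
            for parent, fn in zip(node._parents, node._grad_fns):
                parent.grad += fn(node.grad)


# 1. Graph: prediction = w * x.  2. Loss: squared error.  3. Gradient descent.
w = 0.0
x, y = Value(2.0), Value(6.0)
for _ in range(50):
    w_node = Value(w)  # fresh node each step so gradients do not carry over
    err = w_node * x - y
    loss = err * err
    loss.backward()
    w -= 0.1 * w_node.grad
print(w)
```

The loop should drive `w` toward 3, the value for which `w * 2` equals the target 6. Only the graph and the loss were written by hand; the gradient came from `backward()`.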
![Page 38: Practical Deep Learning](https://reader031.vdocument.in/reader031/viewer/2022022203/58708f301a28ab412b8b51f5/html5/thumbnails/38.jpg)
Choose the level of abstraction that you are comfortable with!
0. Use web API
1. Use pre-trained model
2. Train your own model
3. Design your own architecture