pydresden 20170824 - deep learning for computer vision
![Page 1: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/1.jpg)
Deep Learning for Computer Vision
Alex Conway
alex @ numberboost.com
PyDresden Meetup 24/08/2017
![Page 2: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/2.jpg)
Hands up!
![Page 3: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/3.jpg)
Image Classification
![Page 4: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/4.jpg)
https://research.googleblog.com/2017/06/supercharge-your-computer-vision-models.html
Object detection
![Page 5: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/5.jpg)
https://www.youtube.com/watch?v=VOC3huqHrss
Object detection
![Page 6: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/6.jpg)
Image Captioning & Visual Attention
https://einstein.ai/research/knowing-when-to-look-adaptive-attention-via-a-visual-sentinel-for-image-captioning
![Page 7: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/7.jpg)
Image Q&A
https://arxiv.org/pdf/1612.00837.pdf
![Page 8: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/8.jpg)
Video Q&A
https://www.youtube.com/watch?v=UeheTiBJ0Io
![Page 9: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/9.jpg)
Pix2Pix
https://affinelayer.com/pix2pix/
https://github.com/affinelayer/pix2pix-tensorflow
![Page 10: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/10.jpg)
Pix2Pix
https://medium.com/towards-data-science/face2face-a-pix2pix-demo-that-mimics-the-facial-expression-of-the-german-chancellor-b6771d65bf66
![Page 11: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/11.jpg)
![Page 12: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/12.jpg)
Style Transfer
https://github.com/junyanz/CycleGAN
![Page 13: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/13.jpg)
Style Transfer
https://github.com/junyanz/CycleGAN
![Page 14: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/14.jpg)
1. What is a neural network?
2. What is a convolutional neural network?
3. Quick tutorial
4. More advanced methods
5. Practical tips
6. Where to from here?
![Page 15: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/15.jpg)
Big Shout Outs
Jeremy Howard & Rachel Thomas
http://course.fast.ai
Andrej Karpathy
http://cs231n.github.io
![Page 16: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/16.jpg)
1. What is a neural network?
![Page 17: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/17.jpg)
What is a single neuron?
• 3 inputs [x1,x2,x3]
• 3 weights [w1,w2,w3]
• Element-wise multiply and sum
• Apply activation function f
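The four bullets above fit in a few lines of plain Python. This is only a sketch: the input and weight values are made up, the activation is assumed to be a sigmoid, and no bias term is included because the slide does not mention one.

```python
import math

def sigmoid(z):
    """Squash any real number into the range between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def neuron(inputs, weights, activation=sigmoid):
    """Element-wise multiply inputs by weights, sum, then apply f."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return activation(weighted_sum)

x = [1.0, 2.0, 3.0]     # example inputs (made up)
w = [0.5, -1.0, 0.5]    # example weights (made up)
output = neuron(x, w)   # weighted sum is 0.5 - 2.0 + 1.5 = 0.0
print(output)
```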
![Page 18: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/18.jpg)
What is an Activation Function?
Sigmoid, Tanh, ReLU
Nonlinearities … “squashing functions” … transform neuron’s output
NB: sigmoid output in [0,1]
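For reference, here is one way to write the three activation functions named on this slide in plain Python. The sample inputs are arbitrary; the point is to see how each function transforms the same values.

```python
import math

def sigmoid(z):
    """Output always lies between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    """Output always lies between -1 and 1."""
    return math.tanh(z)

def relu(z):
    """Negative inputs become 0; positive inputs pass through unchanged."""
    return max(0.0, z)

for z in (-2.0, 0.0, 2.0):
    print(z, round(sigmoid(z), 3), round(tanh(z), 3), relu(z))
```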
![Page 19: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/19.jpg)
What is a Deep Neural Network?
Inputs → hidden layer 1 → hidden layer 2 → hidden layer 3 → outputs
Outputs of one layer are inputs into the next layer
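A rough sketch of that chaining, assuming fully connected layers with ReLU activations and no biases. The layer sizes and weights below are invented purely so the numbers are easy to follow by hand.

```python
def relu(z):
    return max(0.0, z)

def dense_layer(inputs, weights, activation):
    """Each row of `weights` holds the weights of one neuron."""
    return [activation(sum(x * w for x, w in zip(inputs, row)))
            for row in weights]

def forward(inputs, layers):
    """Feed the outputs of each layer into the next one."""
    activations = inputs
    for weights in layers:
        activations = dense_layer(activations, weights, relu)
    return activations

# Made-up weights: 2 inputs -> 3 hidden units -> 2 hidden units -> 1 output.
layers = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[1.0, 1.0, 0.0], [0.0, 0.0, 1.0]],
    [[0.5, 0.5]],
]
print(forward([1.0, 2.0], layers))
```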
![Page 20: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/20.jpg)
How does a neural network learn?
• You need labelled examples “training data”
• Initially, the network makes random predictions (weights initialized randomly)
• For each training data point, we calculate the error between the network’s predictions and the ground-truth labels (“loss function” e.g. mean squared error)
• Using a method called ‘backpropagation’ (really just the chain rule from calculus 1), we use the error to update the weights (using ‘gradient descent’) so that the error is a little bit smaller next time (it learns from past errors)
• We show the network many examples and update its parameters
• An epoch is when we’ve shown the network all the training examples
• We train for many epochs, measuring the error rate as we go
• (Actually, we split out our data into training and validation so we don’t overfit)
• Stop training when the validation error rate stops decreasing
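The whole recipe can be tried on a toy problem. The sketch below assumes a one-weight model y = w * x, invented data, a squared-error loss, and a simple "patience" rule for early stopping; none of these choices come from the talk. With a single weight, backpropagation collapses to one application of the chain rule.

```python
import random

random.seed(0)

# Made-up labelled data: y = 3x plus a little noise.
data = [(x / 10.0, 3.0 * x / 10.0 + random.gauss(0.0, 0.1)) for x in range(40)]
random.shuffle(data)
train, valid = data[:30], data[30:]   # hold out a validation set

def mse(w, examples):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in examples) / len(examples)

w = random.uniform(-1.0, 1.0)   # weights start out random
learning_rate = 0.01
best_valid, patience, bad_epochs = float("inf"), 5, 0

for epoch in range(200):
    for x, y in train:                     # one epoch = every training example once
        gradient = 2.0 * (w * x - y) * x   # d(error^2)/dw via the chain rule
        w -= learning_rate * gradient      # gradient descent step
    valid_error = mse(w, valid)
    if valid_error < best_valid - 1e-9:
        best_valid, bad_epochs = valid_error, 0
    else:
        bad_epochs += 1
    if bad_epochs >= patience:             # validation error stopped decreasing
        break

print(epoch, round(w, 2), round(best_valid, 4))
```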
![Page 21: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/21.jpg)
http://playground.tensorflow.org
![Page 22: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/22.jpg)
What is a Neural Network?
For much more detail, see:
1. Michael Nielsen’s Neural Networks & Deep Learning free online book http://neuralnetworksanddeeplearning.com/chap1.html
2. Andrej Karpathy’s CS231n Notes http://cs231n.github.io
![Page 23: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/23.jpg)
2. What is a convolutional neural network?
![Page 24: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/24.jpg)
What is a Convolutional Neural Network?
“like a simple neural network but with special types of layers that work well on images”
(math works on numbers)
• Pixel = 3 colour channels (R, G, B)
• Pixel intensity = number in [0,255]
• Image has width w and height h
• Therefore image is w x h x 3 numbers
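To make "an image is w x h x 3 numbers" concrete, here is a tiny made-up image stored as nested Python lists. Real code would normally use an array library, but the counting is the same.

```python
width, height = 4, 2   # a tiny made-up image

# image[row][column] is one pixel: [R, G, B], each an integer in [0, 255].
image = [[[255, 0, 0] for _ in range(width)] for _ in range(height)]
image[0][1] = [0, 128, 255]   # recolour one pixel

count = sum(len(pixel) for row in image for pixel in row)
print(count)   # w x h x 3
```

By the same arithmetic, a 224 x 224 colour image is 224 x 224 x 3 = 150,528 numbers.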
![Page 25: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/25.jpg)
Example Architecture
This is VGGNet – don’t panic, we’ll come back to it later!
![Page 26: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/26.jpg)
Convolutions
http://setosa.io/ev/image-kernels/
![Page 27: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/27.jpg)
New Layer Type: Convolutional Layer
• 2-d weighted sum: multiply the kernel element-wise with each pixel patch and add up the results
• We slide the kernel over all pixels of the image (handle borders)
• Kernel starts off with “random” values and network updates the kernel values (using backpropagation) to try to minimize loss
• The kernels are learned
• Kernels are shared across the whole image (parameter sharing)
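Parameter sharing is easiest to appreciate by counting weights. The sketch below compares 64 convolutional filters of size 3x3 with 64 fully connected units on a 224 x 224 x 3 image. The filter count and the one-bias-per-filter convention are assumptions for illustration.

```python
def conv_params(kernel_size, in_channels, num_filters):
    """Weights plus one bias per filter; independent of image size."""
    return (kernel_size * kernel_size * in_channels + 1) * num_filters

def dense_params(num_inputs, num_units):
    """Every unit has its own weight for every input, plus a bias."""
    return (num_inputs + 1) * num_units

width, height, channels = 224, 224, 3
print(conv_params(3, channels, 64))
print(dense_params(width * height * channels, 64))
```

The convolutional layer needs a couple of thousand parameters; the dense layer needs millions for the same number of outputs per location.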
![Page 28: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/28.jpg)
Convolutions
“imagine taking this 3x3 matrix (“kernel”) and positioning it over a 3x3 area of an image, and let's multiply each overlapping value. Next, let's sum up these products, and let's replace the center pixel with this new value. If we slide this 3x3 matrix over the entire image, we can construct a new image by replacing each pixel in the same manner just described.”
http://cs231n.github.io/convolutional-networks/
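The procedure in that quote, written out as a sketch in plain Python. It assumes a single-channel (grayscale) image, treats pixels outside the border as 0, and does not flip the kernel, which matches what deep learning libraries usually call a convolution. The image and kernel values are made up.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over a grayscale `image`, zero-padding the borders."""
    height, width = len(image), len(image[0])
    k = len(kernel) // 2   # kernel "radius": 1 for a 3x3 kernel
    output = [[0.0] * width for _ in range(height)]
    for row in range(height):
        for col in range(width):
            total = 0.0
            for i in range(-k, k + 1):
                for j in range(-k, k + 1):
                    r, c = row + i, col + j
                    if 0 <= r < height and 0 <= c < width:   # outside counts as 0
                        total += image[r][c] * kernel[i + k][j + k]
            output[row][col] = total
    return output

# A made-up 4x4 image with a vertical edge, and a vertical-edge kernel.
image = [[0, 0, 9, 9]] * 4
kernel = [[-1, 0, 1],
          [-1, 0, 1],
          [-1, 0, 1]]
for row in convolve2d(image, kernel):
    print(row)
```

The output is largest where the image changes from dark to bright, which is what an edge detector should do. In a CNN the kernel values would be learned rather than hand-picked.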
![Page 29: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/29.jpg)
Convolutions
“...for example, we can start with 8 randomly generated filters; that is 8 3x3 matrices with random elements. Given labeled inputs, we can then use stochastic gradient descent to determine what the optimal values of these filters are, and therefore we allow the neural network to learn what things are most important to detect in classifying images.”
http://cs231n.github.io/convolutional-networks/
![Page 30: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/30.jpg)
Convolutions
https://github.com/fchollet/keras/blob/master/examples/conv_filter_visualization.py
![Page 31: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/31.jpg)
Convolutions
![Page 32: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/32.jpg)
Convolutions
![Page 33: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/33.jpg)
Convolutions
![Page 34: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/34.jpg)
Great vid
https://www.youtube.com/watch?v=AgkfIQ4IGaM
![Page 35: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/35.jpg)
New Layer Type: Max Pooling
• Reduces dimensionality from one layer to next
• By replacing NxN sub-area with max value
• Makes network “look” at larger areas of the image at a time e.g. Instead of identifying fur, identify cat
• Reduces computational load
• Controls for overfitting since losing information helps the network generalize
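A minimal sketch of max pooling on a single feature map, assuming non-overlapping N x N blocks (N = 2 here) and made-up values.

```python
def max_pool(feature_map, size=2):
    """Replace each non-overlapping size x size block with its maximum."""
    height, width = len(feature_map), len(feature_map[0])
    return [
        [max(feature_map[r + i][c + j]
             for i in range(size) for j in range(size))
         for c in range(0, width - size + 1, size)]
        for r in range(0, height - size + 1, size)
    ]

feature_map = [[1, 3, 2, 0],
               [4, 2, 1, 1],
               [0, 0, 5, 6],
               [1, 2, 7, 8]]
print(max_pool(feature_map))
```

A 4x4 map becomes 2x2, so the next layer has a quarter as many values to process.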
![Page 36: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/36.jpg)
New Layer Type: Dropout
• Form of regularization (helps prevent overfitting)
• Trades ability to fit training data to help generalize to new data
• Used during training (not test)
• Randomly set activations in hidden layers to 0 with some probability p
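One common way to implement dropout is "inverted dropout": zero each hidden unit's output with probability p during training and scale the survivors so the expected value stays the same. The sketch below assumes that variant; the function name and arguments are illustrative, not taken from any library.

```python
import random

def dropout(activations, p, training=True, rng=random):
    """Inverted dropout: zero each activation with probability p while training."""
    if not training or p == 0.0:
        return list(activations)
    keep = 1.0 - p
    # Scale the survivors by 1/keep so the expected value is unchanged.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]

rng = random.Random(0)
hidden = [1.0] * 10
print(dropout(hidden, p=0.5, rng=rng))                   # roughly half zeroed
print(dropout(hidden, p=0.5, training=False, rng=rng))   # unchanged at test time
```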
![Page 37: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/37.jpg)
Bringing it all together
![Page 38: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/38.jpg)
ImageNet
http://image-net.org/explore
![Page 39: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/39.jpg)
ImageNet
![Page 40: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/40.jpg)
3. Quick tutorial
![Page 41: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/41.jpg)
Using a Pre-Trained ImageNet-Winning CNN
• We’ll be using “VGGNet”
• Oxford Visual Geometry Group (VGG)
• The runner-up in ILSVRC 2014
• Network is 16 layers (deep!)
• Its main contribution was in showing that the depth of the network is a critical component for good performance.
• Only 3x3 (stride 1 pad 1) convolutions and 2x2 max pooling
• Easy to fine-tune
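The "3x3, stride 1, pad 1" and "2x2 max pooling" choices make the layer sizes easy to track. The sketch below applies the standard output-size formula, assuming VGG's usual 224 x 224 input and the five pooling stages of the 16-layer network.

```python
def conv_output_size(size, kernel=3, stride=1, pad=1):
    """Spatial output size of a convolution: (W - F + 2P) / S + 1."""
    return (size - kernel + 2 * pad) // stride + 1

def pool_output_size(size, pool=2):
    """Non-overlapping pooling divides the size by the pool width."""
    return size // pool

size = 224   # VGG's input images are 224 x 224
print(conv_output_size(size))   # 3x3, stride 1, pad 1 keeps the size
for block in range(5):          # five conv blocks, each ending in a 2x2 pool
    size = pool_output_size(conv_output_size(size))
    print(block + 1, size)
```

The convolutions never change the width or height; only the pooling layers do, halving them each time.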
![Page 42: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/42.jpg)
Using a Pre-Trained ImageNet-Winning CNN
https://blog.keras.io/building-powerful-image-classification-models-using-very-little-data.html
![Page 43: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/43.jpg)
CODE TIME! https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
![Page 44: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/44.jpg)
![Page 45: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/45.jpg)
Fine-tuning A CNN To Solve A New Problem
• Fix weights in convolutional layers (set trainable=False)
• Re-train final dense layer(s)
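The idea can be shown without any deep learning library. In this toy sketch a tiny fixed layer stands in for the pre-trained convolutional layers and only a new final layer is trained. The task, weights, and learning rate are all invented; in the talk's Keras code the freezing is done by setting trainable=False on the real layers.

```python
import random

random.seed(0)

# Stand-in for pre-trained layers: fixed weights, like setting trainable=False.
frozen_weights = [1.0, -1.0]
# New final dense layer for the new problem: randomly initialised, trainable.
head_weights = [random.uniform(-0.1, 0.1) for _ in frozen_weights]

def features(x):
    """Frozen layer followed by ReLU; these weights are never updated."""
    return [max(0.0, w * x) for w in frozen_weights]

def predict(x):
    return sum(w * f for w, f in zip(head_weights, features(x)))

# Made-up new task: y = 2 * relu(x) + 3 * relu(-x) on inputs in [-1, 1].
xs = [i / 10.0 for i in range(-10, 11)]
data = [(x, 2.0 * max(0.0, x) + 3.0 * max(0.0, -x)) for x in xs]

frozen_before = list(frozen_weights)
learning_rate = 0.5
for epoch in range(200):
    # Gradient of the mean squared error with respect to the head only.
    grads = [0.0] * len(head_weights)
    for x, y in data:
        error = predict(x) - y
        for k, f in enumerate(features(x)):
            grads[k] += 2.0 * error * f / len(data)
    head_weights = [w - learning_rate * g for w, g in zip(head_weights, grads)]

print([round(w, 3) for w in head_weights], frozen_weights == frozen_before)
```

Only the head's weights move during training; the frozen layer is reused as a fixed feature extractor.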
![Page 46: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/46.jpg)
Visual Similarity
• Chop off last 2 VGG layers
• Use dense layer with 4096 activations
• Compute nearest neighbours in the space of these activations
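A sketch of the nearest-neighbour step. The slide does not say which distance is used, so cosine similarity is an assumption here, and the short vectors and file names are made-up stand-ins for the real 4096-dimensional activations.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest_neighbours(query, activations, k=2):
    """Return the names of the k images most similar to `query`."""
    ranked = sorted(activations,
                    key=lambda name: cosine_similarity(query, activations[name]),
                    reverse=True)
    return ranked[:k]

# Made-up 4-d vectors standing in for the 4096-d dense-layer activations.
activations = {
    "cat_1.jpg": [0.9, 0.1, 0.0, 0.2],
    "cat_2.jpg": [0.8, 0.2, 0.1, 0.1],
    "car_1.jpg": [0.0, 0.9, 0.8, 0.0],
}
query = [1.0, 0.1, 0.0, 0.1]
print(nearest_neighbours(query, activations))
```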
![Page 47: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/47.jpg)
![Page 48: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/48.jpg)
![Page 49: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/49.jpg)
4. More advanced topics
![Page 50: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/50.jpg)
cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Lots of Computer Vision Tasks
![Page 51: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/51.jpg)
Long, Shelhamer, and Darrell, “Fully Convolutional Networks for Semantic Segmentation” CVPR 2015
Noh et al, “Learning Deconvolution Network for Semantic Segmentation” CVPR 2015
Semantic Segmentation
![Page 52: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/52.jpg)
http://cs231n.stanford.edu/slides/2017/cs231n_2017_lecture11.pdf
Object detection
![Page 53: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/53.jpg)
Object detection
![Page 54: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/54.jpg)
https://www.youtube.com/watch?v=VOC3huqHrss
Object detection
![Page 55: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/55.jpg)
![Page 56: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/56.jpg)
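http://blog.romanofoti.com/style_transfer/
https://github.com/fchollet/keras/blob/master/examples/neural_style_transfer.py
Style Transfer
Loss = content_loss + style_loss
Content loss = difference between the pre-trained network’s convolution activations for the content image and the generated image
Style loss = difference between Gram matrices of the convolution activations for the style image and the generated image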
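The Gram matrix is the less familiar piece, so here is a small sketch. It assumes each channel's feature map has been flattened to a list, uses invented activations, and leaves out the normalisation constants that real implementations add.

```python
def gram_matrix(feature_maps):
    """feature_maps[c] is channel c flattened to a list of activations.

    Entry (i, j) is the dot product of channels i and j, i.e. how strongly
    the two filters fire together, regardless of where in the image.
    """
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in feature_maps]
            for fi in feature_maps]

def style_loss(style_maps, generated_maps):
    """Sum of squared differences between the two Gram matrices (unnormalised)."""
    gs, gg = gram_matrix(style_maps), gram_matrix(generated_maps)
    return sum((s - g) ** 2 for rs, rg in zip(gs, gg) for s, g in zip(rs, rg))

# Made-up activations: 2 channels, each a flattened 2x2 feature map.
style = [[1.0, 0.0, 1.0, 0.0], [0.0, 2.0, 0.0, 2.0]]
generated = [[1.0, 1.0, 0.0, 0.0], [0.0, 2.0, 0.0, 2.0]]
print(gram_matrix(style))
print(style_loss(style, generated))
```

Because the Gram matrix sums over positions, it captures which textures occur together rather than where they occur, which is why it works as a style descriptor.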
![Page 57: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/57.jpg)
https://www.autoblog.com/2017/08/04/self-driving-car-sign-hack-stickers/
![Page 58: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/58.jpg)
http://blog.kaggle.com/2016/02/04/noaa-right-whale-recognition-winners-interview-2nd-place-felix-lau/
Computer Vision Pipelines
https://flyyufelix.github.io/2017/04/16/kaggle-nature-conservancy.html
![Page 59: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/59.jpg)
5. Practical tips
![Page 60: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/60.jpg)
Practical Tips
• use a GPU – AWS p2 instances (use spot!)
• when overfitting (validation_accuracy <<< training_accuracy)
– Increase dropout
– Early stopping
• when underfitting (low training_accuracy)
1. Add more data
2. Use data augmentation
– Flipping / stretching / rotating
– Slightly change hues / brightness
3. Use more complex model
– ResNet / Inception / DenseNet
– Ensemble (average <<< weighted average with learned weights)
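Two of the augmentations listed above, flipping and brightness changes, are simple enough to sketch by hand. The image is a made-up nested list of RGB pixels; in practice a library would do this on arrays, and the brightness factor here is arbitrary.

```python
def flip_horizontal(image):
    """Mirror each row of pixels left to right."""
    return [row[::-1] for row in image]

def adjust_brightness(image, factor):
    """Scale every channel value, clipping to the valid range [0, 255]."""
    return [[[min(255, max(0, round(v * factor))) for v in pixel]
             for pixel in row] for row in image]

# A made-up 1x2 RGB image: one red pixel, one blue pixel.
image = [[[200, 0, 0], [0, 0, 200]]]
augmented = [image, flip_horizontal(image), adjust_brightness(image, 1.5)]
for variant in augmented:
    print(variant)
```

Each variant keeps the original label, so one labelled image yields several training examples.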
![Page 61: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/61.jpg)
6. Where to from here?
![Page 62: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/62.jpg)
Where to From Here?
• Clone the repo and try using your own data
• Do the fast.ai course
• Do Stanford cs231n
• Read colah.github.io/posts
• Email me questions / ideas :) [email protected]
![Page 63: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/63.jpg)
We’re hiring!
![Page 64: PyDresden 20170824 - Deep Learning for Computer Vision](https://reader038.vdocument.in/reader038/viewer/2022100803/5a6613137f8b9a214f8b514f/html5/thumbnails/64.jpg)
THANKS!
https://github.com/alexcnwy/CTDL_CNN_TALK_20170620
Alex Conway
alex @ numberboost.com