Deep Learning with CNNs
University of Rome "La Sapienza", Dep. of Computer, Control and Management Engineering A. Ruberti
Valsamis Ntouskos, ALCOR Lab
Outline
• Introduction - Motivation
• Theoretical aspects
• Brief history of CNNs
• Evolution of CNNs for image classification
• Applications of CNNs in computer vision
Compositional Models
Learned End-to-End
Hierarchy of Representations
- vision: pixel, motif, part, object
- text: character, word, clause, sentence
- speech: audio, band, phone, word
(learning proceeds from concrete to abstract representations)
Slides from Caffe framework tutorial @ CVPR2015
Back-propagation jointly learns
all of the model parameters to
optimize the output for the task.
Motivation
Up to now we treated inputs as general feature vectors
In some cases inputs have special structure:
• Audio
• Images
• Videos
Signals: Numerical representations of physical quantities
Deep learning can be applied directly to signals by using suitable operators
Motivation
Audio
1D data - (variable length) vectors
e.g. ... 0.0468 0.0468 0.0468 0.0390 0.0390 0.0390 0.0546 0.0625 0.0625 0.0390 0.0312 0.0468 0.0625 ...
Motivation
Images
2D data - matrices
Video
A sequence of images sampled through time - 3D data
What is a CNN?
Some theory
Convolution
From Steve Seitz and Richard Szeliski's slides (https://courses.cs.washington.edu/courses/cse576/08sp/)
Interactive examples: http://setosa.io/ev/image-kernels/
Some theory
Convolution
• Image filtering is
based on convolution
with special kernels
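As a rough illustration of filtering with special kernels, here is a minimal NumPy sketch of 2D "valid" convolution on a single-channel image. The box-blur and Sobel kernels and the toy image are illustrative choices, not taken from the slides. Note the flip: true convolution rotates the kernel by 180 degrees, while most CNN libraries actually compute cross-correlation (no flip) in their "convolution" layers.

```python
import numpy as np

def conv2d(image, kernel, flip=True):
    """'Valid' 2D convolution of a single-channel image with a kernel.

    flip=True  -> true convolution (kernel rotated by 180 degrees)
    flip=False -> cross-correlation, as computed by most CNN layers
    """
    if flip:
        kernel = kernel[::-1, ::-1]
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Weighted sum of the kh x kw window whose top-left corner is (i, j)
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

box_blur = np.ones((3, 3)) / 9.0                 # smoothing kernel
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)     # horizontal-gradient kernel

# Toy 5x5 image: dark left part, bright right part (a vertical edge)
image = np.array([[0, 0, 1, 1, 1]] * 5, dtype=float)
print(conv2d(image, box_blur))             # every row is [1/3, 2/3, 1]
print(conv2d(image, sobel_x, flip=False))  # every row is [4, 4, 0]: strong response at the edge
```

In a CNN the kernel values are not hand-designed like these; they are the parameters learned by backpropagation.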
Some theory
Pooling
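• Introduces subsampling: each output value summarizes a local window of the input (e.g. its maximum or its average), reducing the spatial size of the feature map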
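A minimal sketch of max pooling in NumPy, assuming the common setting of 2x2 windows with stride 2 (the slide does not fix these values). Each output keeps only the strongest response in its window, so a 4x4 map shrinks to 2x2.

```python
import numpy as np

def max_pool2d(x, size=2, stride=2):
    """Max pooling of a 2D feature map over size x size windows."""
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.empty((out_h, out_w), dtype=x.dtype)
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = window.max()  # keep only the strongest response
    return out

x = np.array([[1, 3, 2, 0],
              [4, 2, 1, 5],
              [0, 1, 7, 2],
              [3, 2, 1, 1]])
print(max_pool2d(x))  # [[4 5]
                      #  [3 7]]
```

Replacing `window.max()` with `window.mean()` gives average pooling instead.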
Some theory
Activation
Standard way to model a neuron
$f(x) = \tanh(x)$ or $f(x) = (1 + e^{-x})^{-1}$
Very slow to train (saturation: the gradient vanishes for large |x|)
Non-saturating nonlinearity (ReLU): $f(x) = \max(0, x)$
Quick to train
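To make "saturation" concrete, the sketch below evaluates the derivatives that backpropagation multiplies through each layer. For large |x| the sigmoid and tanh derivatives are nearly zero, while the ReLU derivative stays exactly 1 for any positive input. The sample points are arbitrary; the ReLU derivative at 0 is set to 0 here, which is one common convention.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # at most 0.25, tends to 0 for large |x|

def d_tanh(x):
    return 1.0 - np.tanh(x) ** 2  # at most 1, tends to 0 for large |x|

def d_relu(x):
    return (x > 0).astype(float)  # 1 for positive inputs, 0 otherwise

xs = np.array([-10.0, -1.0, 0.5, 10.0])
for name, grad in [("sigmoid", d_sigmoid), ("tanh", d_tanh), ("relu", d_relu)]:
    print(name, np.round(grad(xs), 4))
```

Because these local derivatives are multiplied together across layers, values far below 1 shrink the gradient quickly in deep networks, which is why saturating units train slowly.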
Some theory
Regularization
Dropout
• Applied to the fully-connected layers
• During training, drop (temporarily remove) each node with probability α
• During testing, keep all nodes and weight their outputs by the retention probability 1 - α
Image from Srivastava et al., "Dropout: A Simple Way to Prevent Neural Networks from Overfitting"
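A minimal NumPy sketch of the dropout scheme above, assuming α denotes the drop probability. During training each unit is zeroed independently; at test time every unit is kept and scaled by 1 - α, so its output matches its expected value during training.

```python
import numpy as np

def dropout_train(h, alpha, rng):
    """Training: zero each unit independently with probability alpha."""
    mask = rng.random(h.shape) >= alpha  # True = unit is kept
    return h * mask

def dropout_test(h, alpha):
    """Testing: keep every unit, scaled by the retention probability 1 - alpha."""
    return h * (1.0 - alpha)

rng = np.random.default_rng(0)
h = np.ones(100_000)   # activations of a (large) hypothetical layer
alpha = 0.5
train_out = dropout_train(h, alpha, rng)
print(train_out.mean())           # close to 0.5: about half the units survive
print(dropout_test(h, alpha)[0])  # 0.5
```

Many libraries implement the equivalent "inverted" variant instead: they scale by 1/(1 - α) during training and leave the network unchanged at test time.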
Some theory
Every convolutional layer of a CNN transforms the 3D input
volume to a 3D output volume of neuron activations.
A regular 3-layer Neural Network
Material from Fei-Fei’s group
Some theory
Each neuron is connected to a
local region in the input volume
spatially, but to all channels
The neurons still compute a dot
product of their weights with the
input followed by a non-linearity
Material from Fei-Fei’s group
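The following NumPy sketch shows one convolutional layer acting on a 3D volume, as described above. Each output neuron sees a local spatial region through the full input depth, computes a dot product with its filter plus a bias, and applies a ReLU. The sizes (32x32x3 input, ten 5x5x3 filters) and the (H, W, C) memory layout are illustrative assumptions.

```python
import numpy as np

def conv_layer(x, weights, bias, stride=1, pad=0):
    """One convolutional layer on an H x W x C input volume.

    weights: (K, kh, kw, C) -> K filters, each spanning all C input channels
    bias:    (K,)           -> one scalar per filter
    Returns an H' x W' x K volume of ReLU activations.
    """
    x = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))  # zero-pad spatially only
    K, kh, kw, C = weights.shape
    H, W, _ = x.shape
    out_h = (H - kh) // stride + 1
    out_w = (W - kw) // stride + 1
    out = np.zeros((out_h, out_w, K))
    for i in range(out_h):
        for j in range(out_w):
            # Local spatial region, but through the full depth C
            region = x[i * stride:i * stride + kh, j * stride:j * stride + kw, :]
            for k in range(K):
                # Same filter k at every (i, j): parameter sharing
                out[i, j, k] = np.sum(region * weights[k]) + bias[k]
    return np.maximum(out, 0.0)  # ReLU non-linearity

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 32, 3))    # e.g. a small RGB image
w = rng.standard_normal((10, 5, 5, 3))  # 10 filters of size 5x5x3
b = np.zeros(10)
print(conv_layer(x, w, b, stride=1, pad=2).shape)  # (32, 32, 10)
```

Real frameworks vectorize this computation (e.g. as one large matrix multiplication) rather than looping, but the arithmetic per output neuron is the same.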
Terminology
• Kernel: matrix corresponding to a convolution / filter
• Depth: number of feature maps / filters (d)
• Depth slice: a single feature map
• Padding: additional zero-filled rows/columns around the border of the input (p)
• Stride: step with which the kernel slides (s)
– e.g. a stride of 1 moves the kernel one pixel at a time
• Receptive field: 2D dimensions of the kernel ($w_k \times h_k$)
• Weight or parameter sharing: the parameters of a filter are shared across a depth slice
– i.e. the parameters are the same for all units of the same feature map
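These quantities determine the shape of a layer's output. Along each spatial axis the output size is $(W_{in} - w_k + 2p)/s + 1$. With parameter sharing, each of the $d$ filters has $w_k h_k C_{in} + 1$ parameters (weights plus one bias). The sketch below computes both for a hypothetical layer with a 227x227x3 input and 96 filters of 11x11, stride 4 and no padding. It also shows how many parameters the same layer would need if every output unit had its own filter.

```python
def conv_output_size(n, k, p, s):
    """Output size along one spatial axis: (n - k + 2p) / s + 1."""
    span = n - k + 2 * p
    if span % s != 0:
        raise ValueError(f"stride {s} does not tile the padded input evenly")
    return span // s + 1

def conv_param_count(w_k, h_k, c_in, d, out_h, out_w, shared=True):
    """Weights + biases of a conv layer with d filters of size w_k x h_k x c_in."""
    per_filter = w_k * h_k * c_in + 1      # +1 for the bias
    if shared:
        return d * per_filter              # one filter per depth slice
    return out_h * out_w * d * per_filter  # a separate filter for every unit

# Hypothetical layer: 227x227x3 input, d = 96 filters of 11x11, s = 4, p = 0
out = conv_output_size(227, 11, p=0, s=4)
print(out)                                                      # 55
print(conv_param_count(11, 11, 3, 96, out, out))                # 34944
print(conv_param_count(11, 11, 3, 96, out, out, shared=False))  # 105705600
```

The roughly 3000-fold difference between the last two numbers is the practical reason for weight sharing.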
Algorithms
• Each* neuron/layer is differentiable!
• Just apply backpropagation (chain-rule)
• Use standard gradient-based optimization algorithms
(SGD, AdaGrad, …)
• The devil lies in the details though …
▪Choosing hyperparameters / loss-function
▪Exploding/Vanishing gradients – batch normalization
▪Overfitting – Regularization
▪Cost of performing experiments
▪Convergence
▪…
*what about max-pooling?
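As a minimal sketch of "standard gradient-based optimization", the code below compares the plain SGD update with AdaGrad on a toy, badly scaled quadratic loss. The loss, learning rates and step count are illustrative assumptions. In a CNN, the gradient would come from backpropagation instead of the closed-form `grad_fn` used here.

```python
import numpy as np

SCALES = np.array([10.0, 1.0])  # toy loss L(w) = 0.5 * (10 * w0^2 + w1^2)

def grad_fn(w):
    """Gradient of the toy loss; its minimum is at (0, 0)."""
    return SCALES * w

def sgd_step(w, grad, lr=0.1):
    return w - lr * grad

def adagrad_step(w, grad, cache, lr=0.5, eps=1e-8):
    cache = cache + grad ** 2                   # running sum of squared gradients
    w = w - lr * grad / (np.sqrt(cache) + eps)  # per-parameter step size
    return w, cache

w_sgd = np.array([1.0, 1.0])
w_ada, cache = np.array([1.0, 1.0]), np.zeros(2)
for _ in range(100):
    w_sgd = sgd_step(w_sgd, grad_fn(w_sgd))
    w_ada, cache = adagrad_step(w_ada, grad_fn(w_ada), cache)
print(w_sgd, w_ada)  # both approach the minimum at (0, 0)
```

AdaGrad divides each coordinate's step by its own gradient history, so here both coordinates follow the same path despite their 10x difference in curvature. SGD uses a single learning rate for all parameters.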
Kernels and Feature maps
Material from Fei-Fei’s group
Brief history of CNNs
Foundational work dates back to the mid-20th century
• 1940s-1960s: Cybernetics [McCulloch and Pitts 1943,
Hebb 1949, Rosenblatt 1958]
• 1980s-mid 1990s: Connectionism [Rumelhart 1986,
Hinton 1989]
• 1990s: modern convolutional networks [LeCun et al.
1998], LSTM [Hochreiter & Schmidhuber 1997],
MNIST and other large datasets
Brief history of CNNs
Hubel & Wiesel [60s]: simple & complex cells architecture
Fukushima's Neocognitron [70s]
Yann LeCun's early CNNs [80s]
Brief history of CNNs
Convolutional Networks: 1989
LeNet: a layered model composed of convolution and subsampling operations followed
by a holistic representation and ultimately a classifier for handwritten digits. [ LeNet ]
Recent success
• Parallel Computation (GPU)
• Larger training sets
• International Competitions
• Theoretical advancements
– Dropout
– ReLUs
– Batch Normalization
Recent success
Better Hardware – GPUs
(Figure: supported platforms include CUDA, Jetson TX1/TK1, an Android lib and demo, and an OpenCL branch)
Recent success
Larger training sets
ImageNet
• Over 15M labeled high-resolution images
• Roughly 22K categories
• Collected from the web and labeled using Amazon Mechanical Turk
Recent success
Competitions
ILSVRC
• Annual large-scale image classification competition
• 1.2M images in 1K categories
• Classification: make 5 guesses per image; the prediction counts as correct if the true label is among them (top-5 error)
(Figure: example fine-grained classes, e.g. EntleBucher vs. Appenzeller)
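The top-5 error quoted in the following results is the fraction of images whose true label is not among the model's five highest-scoring classes. Below is a minimal NumPy sketch on a toy example. The scores and labels are hypothetical and were constructed so that one image is a top-1 hit, one is a miss, and one is correct only at rank 5.

```python
import numpy as np

def top_k_error(scores, labels, k=5):
    """Fraction of images whose true label is NOT among the k highest scores.

    scores: (N, num_classes) array of class scores
    labels: (N,) array of true class indices
    """
    top_k = np.argsort(scores, axis=1)[:, -k:]  # k best guesses per image
    hit = (top_k == labels[:, None]).any(axis=1)
    return 1.0 - hit.mean()

rng = np.random.default_rng(0)
scores = rng.random((3, 10))                   # 3 images, 10 classes
labels = np.array([np.argmax(scores[0]),       # best guess is right
                   np.argmin(scores[1]),       # true label ranked last
                   np.argsort(scores[2])[-5]]) # true label is the 5th guess
print(top_k_error(scores, labels, k=5))  # about 0.333: only the second image is missed
print(top_k_error(scores, labels, k=1))  # about 0.667
```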
Evolution of CNNs for image classification
Convolutional Nets: 2012
AlexNet: a layered model composed of convolution, subsampling, and further operations
followed by a holistic representation and all-in-all a landmark classifier on
ILSVRC12. [AlexNet]
Evolution of CNNs for image classification
Convolutional Nets: 2014
ILSVRC14 Winners: ~6.6% Top-5 error
- GoogLeNet: composition of multi-scale
dimension-reduced modules
+ depth
+ data
+ dimensionality reduction
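The "dimension-reduced modules" in GoogLeNet place 1x1 convolutions before the expensive larger convolutions to shrink the depth. The arithmetic below illustrates the savings. The sizes (28x28x192 input, 32 output channels, a 16-channel bottleneck) are hypothetical and not taken from the slide. Stride 1 and "same" padding are assumed, so the spatial size is unchanged.

```python
def conv_mults(h, w, c_in, c_out, k):
    """Multiplications for a stride-1, 'same'-padded k x k convolution."""
    return h * w * c_out * (k * k * c_in)

H, W, C_IN = 28, 28, 192  # hypothetical input volume
C_OUT, C_MID = 32, 16     # hypothetical output depth and bottleneck depth

# 5x5 convolution applied directly to the 192-channel input
direct = conv_mults(H, W, C_IN, C_OUT, k=5)
# 1x1 reduction to 16 channels first, then the 5x5 convolution
reduced = conv_mults(H, W, C_IN, C_MID, k=1) + conv_mults(H, W, C_MID, C_OUT, k=5)
print(direct, reduced, round(direct / reduced, 1))
```

In this example the bottleneck cuts the cost by roughly a factor of ten, which is what allows many such modules to be stacked for extra depth.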
Evolution of CNNs for image classification
Convolutional Nets: 2014
ILSVRC14 (2nd in classification, 1st in localization): ~7.3% Top-5 error
- VGG: 16 layers of 3x3 convolution
interleaved with max pooling +
3 fully-connected layers
+ depth
+ data
+ dimensionality reduction
Evolution of CNNs for image classification
Convolutional Nets: 2015
ResNet
ILSVRC15 Winner: ~3.6% Top-5 error
Intuition: it is easier to push a residual F(x) to zero than to learn an identity mapping, so each block outputs F(x) + x
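A minimal NumPy sketch of this intuition, assuming tiny fully-connected two-layer blocks in place of ResNet's convolutional ones. With all weights at zero, the residual block already computes the identity, at least for the non-negative inputs used here, because the final ReLU is applied after the addition. The plain block collapses to zero and would have to learn the identity explicitly.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def plain_block(x, W1, W2):
    """Two layers that must learn the desired mapping H(x) directly."""
    return relu(W2 @ relu(W1 @ x))

def residual_block(x, W1, W2):
    """Two layers that only learn the residual F(x); output is relu(F(x) + x)."""
    return relu(W2 @ relu(W1 @ x) + x)  # identity shortcut

x = np.array([0.5, 1.0, 2.0])
zeros = np.zeros((3, 3))
print(residual_block(x, zeros, zeros))  # equals x: identity for free
print(plain_block(x, zeros, zeros))     # all zeros: identity must be learned
```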
Evolution of CNNs for image classification
Reasonable questions
• Is this just for a particular dataset? – No!
Slides from ICCV 2015 Math of Deep Learning tutorial
Reasonable questions
• Is this just for a particular task? – No!
Object Localization [R-CNN, HyperColumns, OverFeat, etc.]
Pose estimation [Tompson et al., CVPR'15]
Slides from ICCV 2015 Math of Deep Learning tutorial
Reasonable questions
• Is this just for a particular task? – No!
Semantic Segmentation [Pinheiro, Collobert, Dollár, ICCV'15]
Slides from ICCV 2015 Math of Deep Learning tutorial
Reasonable questions
• Is this just for a particular task? – No!
Slides from ICCV 2015 Math of Deep Learning tutorial
Fine Tuning
Take a pre-trained model and fine-tune it to new tasks [DeCAF] [Zeiler-Fergus] [OverFeat]
(Figure: ImageNet (lots of data) → your task, e.g. style recognition; Dogs vs. Cats on kaggle.com: top 10 in 10 minutes)
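A minimal sketch of the simplest form of this idea: keep the pre-trained network frozen as a feature extractor and train only a new classifier head for your task. Everything here is a stand-in. The "pre-trained" extractor is a fixed random projection plus ReLU, and the two-class dataset is synthetic. Full fine-tuning would additionally update the pre-trained weights, typically with a small learning rate.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a pre-trained network with its classifier removed.
# It is FROZEN: W_frozen is never updated below.
W_frozen = rng.standard_normal((64, 20))

def extract_features(x):
    return np.maximum(0.0, x @ W_frozen.T)  # (N, 20) -> (N, 64)

# Hypothetical small dataset for "your task": 2 classes
X = rng.standard_normal((200, 20))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

feats = extract_features(X)
feats = (feats - feats.mean(axis=0)) / (feats.std(axis=0) + 1e-8)

# New task-specific head: logistic regression trained by gradient descent
w, b = np.zeros(64), 0.0
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(feats @ w + b)))  # predicted P(class 1)
    grad = p - y                                # d(cross-entropy)/d(logit)
    w -= 0.1 * feats.T @ grad / len(y)          # only the head is updated
    b -= 0.1 * grad.mean()

acc = ((feats @ w + b > 0) == (y == 1)).mean()
print(f"training accuracy of the new head: {acc:.2f}")
```

Training only the head is cheap, because the features can be computed once, and it works well when the new dataset is small. This is consistent with the "top 10 in 10 minutes" result mentioned above.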
Pixelwise Prediction
Fully convolutional networks for pixelwise prediction,
in particular semantic segmentation
- end-to-end learning
- efficient inference and learning
100 ms per-image prediction
- multi-modal, multi-task
Applications
- semantic segmentation
- denoising
- depth estimation
- optical flow
Jon Long* & Evan Shelhamer*,
Trevor Darrell. CVPR’15
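One key ingredient of fully convolutional networks is that the final classifier is itself a convolution, so it works on a feature map of any size and outputs one prediction per location. The NumPy sketch below implements this as a 1x1 convolution, i.e. the same linear classifier applied at every pixel. The class count and channel depth are hypothetical. The upsampling stage that real FCNs use to bring coarse predictions back to input resolution is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)
NUM_CLASSES, C = 3, 8                          # hypothetical sizes
W_cls = rng.standard_normal((NUM_CLASSES, C))  # 1x1 convolution weights
b_cls = np.zeros(NUM_CLASSES)

def pixelwise_classifier(features):
    """(H, W, C) feature map -> (H, W) map of predicted class labels.

    A 1x1 convolution applies the same linear classifier at every pixel,
    so there is no fixed-size fully-connected layer and any H, W works.
    """
    scores = np.einsum("hwc,kc->hwk", features, W_cls) + b_cls  # (H, W, NUM_CLASSES)
    return scores.argmax(axis=-1)

for h, w in [(4, 6), (10, 7)]:
    label_map = pixelwise_classifier(rng.standard_normal((h, w, C)))
    print(label_map.shape)  # (4, 6), then (10, 7)
```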
Dealing with sequences
Recurrent Nets and Long Short-Term Memory (LSTM) networks
are sequential models for
- video
- language
- dynamics
learned by backpropagation through time
Recurrent Networks for Sequences
LRCN: Long-term Recurrent Convolutional Network
- activity recognition (sequence-in)
- image captioning (sequence-out)
- video captioning (sequence-to-sequence)
LRCN:
recurrent + convolutional
for visual sequences
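To show what a recurrent layer computes at each step, here is a minimal NumPy sketch of one LSTM time step, run over a short sequence. The sequence could stand in for per-frame CNN features of a video, as in LRCN. The sizes, the random weights and the stacking order of the gates are assumptions: libraries differ in these conventions. Training with backpropagation through time is not shown.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step.

    x: (D,) input; h_prev, c_prev: (H,) previous hidden and cell state.
    W: (4H, D), U: (4H, H), b: (4H,) hold the parameters of the input (i),
    forget (f) and output (o) gates and the candidate (g), stacked in that order.
    """
    H = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    i = sigmoid(z[0:H])          # how much new information to write
    f = sigmoid(z[H:2 * H])      # how much of the old cell state to keep
    o = sigmoid(z[2 * H:3 * H])  # how much of the cell state to expose
    g = np.tanh(z[3 * H:4 * H])  # candidate values
    c = f * c_prev + i * g       # additive cell update
    h = o * np.tanh(c)
    return h, c

rng = np.random.default_rng(0)
D, H, T = 16, 8, 5  # hypothetical feature size, hidden size, sequence length
W = rng.standard_normal((4 * H, D)) * 0.1
U = rng.standard_normal((4 * H, H)) * 0.1
b = np.zeros(4 * H)
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.standard_normal((T, D)):  # e.g. one feature vector per frame
    h, c = lstm_step(x_t, h, c, W, U, b)
print(h.shape)  # (8,)
```

The same weights W, U, b are reused at every step, which is why the gradient must be propagated back through all time steps during training.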
Dealing with sequences
Visual Sequence Tasks
Jeff Donahue et al. CVPR’15
Based on Long Short-Term Memory (LSTM) layers
What’s next?
Various questions/problems are still open:
• Learning with constraints / on manifolds
• Using high-level knowledge/structure
• Exploring the mathematics of the networks
– what types of functions can they represent?
– are these functions useful/interesting?
– convergence/efficiency
• Rotation invariance (group operators)
• CNNs can be easily ‘fooled’
• …
Resources
Frameworks:
• Caffe/Caffe 2 (UC Berkeley) | C/C++, Python, Matlab
• TensorFlow (Google) | C/C++, Python, Java, Go
• Theano (U Montreal) | Python
• CNTK (Microsoft) | Python, C++, C#/.NET, Java
• Torch/PyTorch (Facebook) | Lua/Python
• MXNet (DMLC) | Python, C++, R, Perl, …
• Darknet (Redmon J.) | C
• …
Resources
High-level libraries:
• Keras | Backends: TensorFlow (TF), Theano
Models:
• Depends on the framework, e.g.
– https://github.com/BVLC/caffe/wiki/Model-Zoo (Caffe)
– https://github.com/tensorflow/models/tree/master/research (TF)
Interactive Interfaces:
• DIGITS (NVIDIA) | Caffe, TF, Torch
• TensorBoard (TF)
Tools:
• http://ethereon.github.io/netscope (for networks defined in protobuf)