machine learning for applications in computer vision · machine learning for applications in...
TRANSCRIPT
![Page 1: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/1.jpg)
Computer Vision Group Prof. Daniel Cremers
Machine Learning for Applications in Computer Vision
Neural Networks
![Page 2: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/2.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Perceptron• The human brain (1010 cells) is the archetype of neural networks
http://home.agh.edu.pl/~vlsi/AI/intro/
2
![Page 3: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/3.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Perceptron• The human brain (1010 cells) is the archetype of neural networks
http://home.agh.edu.pl/~vlsi/AI/intro/
3
Finds a hyperplane to separate the classes !Very similar to SVMs
![Page 4: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/4.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Perceptron• The human brain (1010 cells) is the archetype of neural networks
4
Activation Potential
![Page 5: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/5.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Neuron Activations• Different type of activation functions
Threshold activation (binary classifier)
5
![Page 6: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/6.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Neuron Activations• Different type of activation functions
Sigmoid activation
6
More accurate ! Describe the non-linearcharacteristics of biological neurons
y =
1
1 + exp
��
![Page 7: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/7.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Neuron Activations• Different type of activation functions
Tangent activation
7
y =
exp
� � exp
��
exp
�+exp
��
![Page 8: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/8.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Neuron Activations• Different type of activation functions
Linear activation
8
![Page 9: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/9.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Neuron Activations• Different type of activation functions
rectified Linear Unit activation
9
y = max(0,�)
![Page 10: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/10.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Simple Neural Network• Best reference: neuralnetworksanddeeplearning.com
10
![Page 11: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/11.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Simple Neural Network• Best reference: neuralnetworksanddeeplearning.com
11
How to train ?
How to learn weights and biases ?
![Page 12: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/12.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Back Propagation• Define a cost function
• Goal: Find global minimum
12
predictionground truth
![Page 13: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/13.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Back Propagation• Define a cost function
• Goal: Find global minimum
13
predictionground truth
Gradient Descent
![Page 14: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/14.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Back Propagation• Define a cost function
• Minimize the error (derivative of the cost) with gradient descent• Gradient descent update rule:
14
predictionground truth
Learning rate
![Page 15: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/15.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Back Propagation• Define a cost function
• Propagate the error through layers• Take the derivate of the output of the layer w.r.t the
input of the layer• Apply update rule for the parameters
• Repeat until convergence
15
predictionground truth
![Page 16: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/16.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Stochastic Gradient DescentGradient Descent
• Computing cost and derivative for the entire training set intractable.
• Not easy to incorporate new data in an ‘online’ settingSolution: SGD
• Update parameters over a batch of images• Mini-batch reduces the variance in the parameter
update and can lead to more stable convergence• Use momentum for faster convergence
http://ufldl.stanford.edu/tutorial/supervised/OptimizationStochasticGradientDescent/
16
vk ! v0k = µvk � ⌘@C
@wk, wk ! w0
k = wk + vk+1
![Page 17: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/17.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Other Optimisation Methods
• AdaDelta• AdaGrad• NAG (Nesterov’s Accelerated Gradient)• RMSProb• ADAM
• Less sensitive to hyper-parameter initialisation
17
![Page 18: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/18.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Back to Neuron Activations• Sigmoid:
๏Dying gradients - saturated activation๏Non-zero centered
• Tangent: ๏Dying gradients - saturated activation๏Zero centered
• ReLU: ๏Greatly accelerated convergence๏No expensive operation๏Can be fragile during training : use Leaky-ReLU
18
![Page 19: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/19.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
MNIST Digit Classification with NN
19
![Page 20: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/20.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Lets play around!
• cs.stanford.edu/people/karpathy/convnetjs
20
![Page 21: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/21.jpg)
C. HazirbasMachine Learning for Applications in Computer Vision
Quiz
21
![Page 22: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/22.jpg)
Machine Learning Practical
Convolutional Neural Network in A NetshellLingni Ma
Technische Universität MünchenComputer Vision Group
Summer Semester 2016
Unless otherwise stated, this slide takes most references from Standford lecture cs231n: ''convolutionalneural networks for visual recognition'' (http://cs231n.stanford.edu/). The mathematical details arereferenced from C. M. Bishop, "Pattern recognition and machine learning''.
![Page 23: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/23.jpg)
Outline
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 2 / 23
I Overview of convolutional neural network (CNN)I Building blocks: basic layers of CNNI Typical CNN architectureI Training
![Page 24: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/24.jpg)
Applications of CNN
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 3 / 23
I classification and recognitionI image segmentationI regression problems: optical flow, depth/normal estimation, loop
closure, transformation, reconstructionI feature extraction, encodingI unsupervised problem
![Page 25: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/25.jpg)
CNN Major Features
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 4 / 23
(recall) regular neural networksI universal approximatorI fully-connected
I not scale well to large inputI easily overfit
regular neural network
![Page 26: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/26.jpg)
CNN Major Features
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 5 / 23
convolutional neural networksI feed-forward architecture, direct acyclic graph (DAG)
yl+1k (w, x) = f
( M∑j=1
wl+1kj g
( D∑i=1
wljixi + bl
j)+ bl+1
k
)I error backpropagationI 3D volume of neuronsI small filter, local fully-connected, parameter sharing
regular neural network convolutional neural network
![Page 27: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/27.jpg)
A Toy Example CIFAR-10 classification with CNN
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 6 / 23
input input RGB image(32 x 32 x 3)
Conv compute the output of neurons, 12 filters of 3x3(32 x 32 x 12)
ReLU element-wise activation xo = max(0, xi)(32 x 32 x 12)
Pool downsample operation along width and height(16 x 16 x 12)
FC compute class score (SVM classifier)(1 x 1 x 10)
Loss loss computation during training
CNN: a direct acyclic graph (DAG) operates on volumes of neurons
![Page 28: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/28.jpg)
Basic CNN Layer Convolution
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 7 / 23
I a set of learnable filters (or kernels)I local fully connected: each filter is small in width and height, but
go through the full depth of input volumeI parameter sharing: each filter slides along input width and depthI feature map, output slice of one filter, stack into volumeI layer configuration
I receptive field (F): filter spatial dimensionI depth (D): number of filtersI padding (P): zero-padding on boundaryI stride (S): control the overlap of receptive field
![Page 29: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/29.jpg)
Basic CNN Layer Convolution
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 8 / 23
input volume: D1 × H1 × W1
conv layer: receptive field F, depth D, padding P, stride SI how many parameters does this layer contain?
(F × F × D1 + 1)× DI what is the output volume shape?
H2 = (H1 + 2P − F)/S + 1W2 = (W1 + 2P − F)/S + 1D2 = D
I spatial resolution perservationS = 1 and F = 2P + 1
![Page 30: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/30.jpg)
Basic CNN Layer Activation
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 9 / 23
I regular element-wise activationI no hyper-parameter requiredI ReLU1 = max(0, y)I PReLU2 = max(0, y) + αmin(0, y)
1 Alex Krizhevsky, Ilya Sutskever and Geoff Hinton, "ImageNet classification with deep convolutional neuralnetworks'', NIPS 20122 He et al. ICCV 2015, ''Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNetClassification''
![Page 31: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/31.jpg)
Basic CNN Layer Pooling
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 10 / 23
I drastically reduce the spatial parametersI control over-fittingI operate on each feature mapI common pooling layer specification
F = 2,S = 2,P = 0, non-overlapping, most commonF = 3,S = 2,P = 0, overlap pooling
I MAX pooling vs Average pooling
![Page 32: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/32.jpg)
Basic CNN Layer Fully Connected
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 11 / 23
I contain most of network parametersI last layer of CNN compute class score ⇒ act as SVM classifierI net surgery: convert FC layer to Conv layer
FC layer: K = 4096 operate on 7× 7× 512 volume of neuronFC: 4096 inner product wTx,where w, x ∈ R7×7×512
Conv: D = 4096,F = 7,P = 0,S = 1
I practical advantages with Conv layer: efficiencyinput 224× 224 image, downsample 32 times before FC.What if input 384× 384?evaluating 224× 224 crops of the 384× 384 image in strides of 32pixels is identical to forwarding the converted ConvNet one time
![Page 33: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/33.jpg)
Basic CNN Layer Batch Norm / Dropout
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 12 / 23
I batch normalization1
I accelerate training and increase accuracyI append after activation
I yi = γxi − µB√σ2B + ε
+ β, with µB =1
m
m∑i
xi, σ2B =
1
m
m∑i(xi − µB)
2
I dropout2I randomly mask out neurosI control over-fittingI mostly append to FC layers
1Ioffe et al., ''Batch normalization: accelerating deep network training by reducing internal covariate shift'',NIPS 20152Srivastava et al., "Dropout: a simple way to prevent neural networks from overfitting'', JMLR 2014
![Page 34: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/34.jpg)
Loss Function
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 13 / 23
regression with Euclidean loss
I E(w) =1
2N∑
i‖f (w, xi)− yi‖2
classification with hinge loss
I E(w) =1
N∑xn
∑j 6=k
max(0, yj(w, xn)− yk(w, xn) + 1
)
![Page 35: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/35.jpg)
Loss Function Classification with Cross-Entropy Loss
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 14 / 23
I cross-entropy H(p, q) =∑
x−p(x) log q(x)
minimize KL-divergence between true distribution p(x) andestimated distribution q(x)
I two-class classification (logistic regression)I E(w) = − 1
N∑xn
(pn logσ
(y(w, xn)
)+ (1− pn) log(1− σ
(y(w, xn)
))I logistic sigmoid: σ(y) = 1/
(1 + exp(−y)
)I multi-class classification (multiclass logistic regression)
I E(w) = − 1
N∑xn
log(σ(yk(w, xn)
))I softmax: σ(yk) = eyk/
∑j eyj
![Page 36: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/36.jpg)
Loss Function Regularizer
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 15 / 23
E(w) = Edata(w) + λEregu(w)
I regularizer is also known as weight decayI control over-fitting by penalize parameter valuesI L1 := ‖w‖1, not strictly differetiableI L2 := ‖w‖22, differentiable, zero-mean Gaussian prior distribution
![Page 37: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/37.jpg)
Building CNN patterns of layers and filters
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 16 / 23
Input
[[Conv ReLU
(BatchNorm
) ]× N Pool
]× M
[FC ReLU
(Dropout
) ]× K
FC (score)
I usually N ∈ (0, 3),M ≥ 0
I shape-preserve conv F = 2P + 1,S = 1, downsample with poolingI better stack small conv filter to obtain large receptive field
I stack three F = 3 filters vs. one F = 7 filterI K ∈ (0, 3)
![Page 38: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/38.jpg)
Classical CNN Architectures LeNet1
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 17 / 23
I first successful CNN exampleI digit and letters recognitionI 3 Conv, 1 FC, 60K parameters
1 Yann LeCun et al., ''Gradient-based learning applied to document recognition'', IEEE proceedings, 1998
![Page 39: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/39.jpg)
Classical CNN Architectures AlexNet1
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 18 / 23
I first work to popularize CNNI 5 Conv layers, 3 FC layers, 650K neurons, 60M parametersI first 96 filters learned
1 Alex Krizhevsky, Ilya Sutskever and Geoff Hinton, "ImageNet classification with deep convolutional neuralnetworks'', NIPS 2012
![Page 40: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/40.jpg)
Classical CNN Architectures VGGNet1
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 19 / 23
I deep: 16 or 19 layersI 13 or 16 Conv layers, small filters F = 3,S = 1,P = 1
I 5 max pooling layer, downsample by 32I 240K neurons, 138M parameters
1 Karen Simonyan and Andrew Zisserman, 2014 ICLR, ''Very deep convolutional networks for large-scaleimage recognition''
![Page 41: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/41.jpg)
Classical CNN Architectures
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 20 / 23
GoogLeNet1I 22 layersI inception module: combine convolutional layersResNet2I very deep, e.g. 152 layers
Do networks get better simply by stacking more layers?
1 Christian Szegedy et al, 2015 CVPR, "Going deeper with convolutions''2 He et al, 2015, ''Deep Residual Learning for Image Recognition''
![Page 42: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/42.jpg)
Training CNN
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 21 / 23
data preparationI mean substraction, normalizationI augmentationweight initializationI Gaussain, xavier1, msra2
I trainsfer learningoptimization algorithmI SGDI shuffle inputhyper-parameterI learning rate policyI batch size1 Xavier Glorot et al. ''Understanding the difficulty of training deep feedforward neural networks'', 20102 He et al., ''Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNetClassification'', ICCV 2015
![Page 43: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/43.jpg)
Training CNN Framework and Packages
Convolutional Neural Networks in A Netshell Machine Learning Practial SS2016 22 / 23
Caffe1
I C++/CUDA, with Python and Matlab bindingsI (+) good for feedforward network (+) finetune (+) easy to startI (-) bad at recurrent network (-) poor support for visualizationTorch2
I C and LuaI used a lot by Facebook, DeepMindTensorFlow3
I GoogleI Python, source code not readableI not limited for CNN, also good at RNN
1 http://caffe.berkeleyvision.org/2 https://torch.ch3 https://www.tensorflow.org/
![Page 44: Machine Learning for Applications in Computer Vision · Machine Learning for Applications in Computer Vision C. Hazirbas Perceptron • The human brain (1010 cells) is the archetype](https://reader033.vdocument.in/reader033/viewer/2022052609/5b2e1a7d7f8b9a594c8d1d35/html5/thumbnails/44.jpg)
Question ?