deep learning for computer vision - michigan state …cse803/lectures/deeplearning...6.s191...
TRANSCRIPT
![Page 1: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/1.jpg)
Deep Learning for Computer VisionMIT 6.S191
Ava SoleimanyJanuary 29, 2019
![Page 3: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/3.jpg)
What Computers “See”
![Page 4: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/4.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Images are Numbers
[1]
![Page 5: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/5.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Images are Numbers
[1]
![Page 6: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/6.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Images are NumbersWhat the computer sees
An image is just a matrix of numbers [0,255]!i.e., 1080x1080x3 for an RGB image
[1]
![Page 7: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/7.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Tasks in Computer Vision
- Regression: output variable takes continuous value- Classification: output variable takes class label. Can produce probability of belonging to a particular class
Input Image
classification
Lincoln
Washington
Jefferson
Obama
Pixel Representation
0.8
0.1
0.05
0.05
![Page 8: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/8.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
High Level Feature Detection
Let’s identify key features in each image category
Wheels, License Plate, Headlights
Door, Windows,
Steps
Nose, Eyes,
Mouth
![Page 9: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/9.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Manual Feature Extraction
Problems?
Define featuresDomain knowledge Detect features to classify
![Page 10: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/10.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Manual Feature Extraction
Define featuresDomain knowledge Detect features to classify
[2]
![Page 11: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/11.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Manual Feature Extraction
Define featuresDomain knowledge Detect features to classify
[2]
![Page 12: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/12.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Learning Feature Representations
Can we learn a hierarchy of features directly from the data instead of hand engineering?
Low level features Mid level features High level features
Eyes, ears, noseEdges, dark spots Facial structure
[3]
![Page 13: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/13.jpg)
Learning Visual Features
![Page 14: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/14.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Fully Connected Neural Network
![Page 15: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/15.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Fully Connected Neural Network
Fully Connected: • Connect neuron in hidden
layer to all neurons in input layer
• No spatial information!• And many, many parameters!
Input: • 2D image• Vector of pixel values
![Page 16: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/16.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Fully Connected Neural Network
How can we use spatial structure in the input to inform the architecture of the network?
Fully Connected: • Connect neuron in hidden
layer to all neurons in input layer
• No spatial information!• And many, many parameters!
Input: • 2D image• Vector of pixel values
![Page 17: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/17.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Using Spatial Structure
Neuron connected to region of input. Only “sees” these values.
Idea: connect patches of input to neurons in hidden layer.
Input: 2D image. Array of pixel values
![Page 18: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/18.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Using Spatial Structure
Connect patch in input layer to a single neuron in subsequent layer.Use a sliding window to define connections.
How can we weight the patch to detect particular features?
![Page 19: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/19.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Applying Filters to Extract Features
1) Apply a set of weights – a filter – to extract local features
2) Use multiple filters to extract different features
3) Spatially share parameters of each filter(features that matter in one part of the input should matter elsewhere)
![Page 20: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/20.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Feature Extraction with Convolution
1) Apply a set of weights – a filter – to extract local features
2) Use multiple filters to extract different features
3) Spatially share parameters of each filter
- Filter of size 4x4 : 16 different weights- Apply this same filter to 4x4 patches in input- Shift by 2 pixels for next patch
This “patchy” operation is convolution
![Page 21: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/21.jpg)
Feature Extraction and ConvolutionA Case Study
![Page 22: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/22.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
X or X?
Image is represented as matrix of pixel values… and computers are literal! We want to be able to classify an X as an X even if it’s shifted, shrunk, rotated, deformed.
[4]
![Page 23: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/23.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Features of X
[4]
![Page 24: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/24.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Filters to Detect X Features
filters
[4]
![Page 25: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/25.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
The Convolution Operation
element wisemultiply add outputs
1 1 = 1X
= 9
[4]
![Page 26: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/26.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
The Convolution Operation
Suppose we want to compute the convolution of a 5x5 image and a 3x3 filter :
We slide the 3x3 filter over the input image, element-wise multiply, and add the outputs…image
filter
[5]
![Page 27: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/27.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
The Convolution Operation
filter feature map
We slide the 3x3 filter over the input image, element-wise multiply, and add the outputs:
[5]
![Page 28: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/28.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
The Convolution Operation
We slide the 3x3 filter over the input image, element-wise multiply, and add the outputs:
filter feature map
[5]
![Page 29: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/29.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
The Convolution Operation
We slide the 3x3 filter over the input image, element-wise multiply, and add the outputs:
filter feature map
[5]
![Page 30: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/30.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
The Convolution Operation
We slide the 3x3 filter over the input image, element-wise multiply, and add the outputs:
filter feature map
[5]
![Page 31: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/31.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
The Convolution Operation
We slide the 3x3 filter over the input image, element-wise multiply, and add the outputs:
filter feature map
[5]
![Page 32: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/32.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
The Convolution Operation
We slide the 3x3 filter over the input image, element-wise multiply, and add the outputs:
filter feature map
[5]
![Page 33: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/33.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
The Convolution Operation
We slide the 3x3 filter over the input image, element-wise multiply, and add the outputs:
[5]
filter feature map
![Page 34: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/34.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
The Convolution Operation
We slide the 3x3 filter over the input image, element-wise multiply, and add the outputs:
filter feature map
[5]
![Page 35: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/35.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
The Convolution Operation
We slide the 3x3 filter over the input image, element-wise multiply, and add the outputs:
filter feature map
[5]
![Page 36: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/36.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Producing Feature Maps
Original Sharpen Edge Detect “Strong” Edge Detect
![Page 37: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/37.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Feature Extraction with Convolution
1) Apply a set of weights – a filter – to extract local features
2) Use multiple filters to extract different features3) Spatially share parameters of each filter
![Page 38: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/38.jpg)
Convolutional Neural Networks (CNNs)
![Page 39: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/39.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
CNNs for Classification
1. Convolution: Apply filters with learned weights to generate feature maps.2. Non-linearity: Often ReLU. 3. Pooling: Downsampling operation on each feature map.
Train model with image data.Learn weights of filters in convolutional layers.
![Page 40: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/40.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Convolutional Layers: Local Connectivity
For a neuron in hidden layer:- Take inputs from patch - Compute weighted sum- Apply bias
![Page 41: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/41.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Convolutional Layers: Local Connectivity
For a neuron in hidden layer:- Take inputs from patch - Compute weighted sum- Apply bias
4x4 filter : matrix of weights !"# $
"%&
'$#%&
'!"# (")*,#), + .
for neuron (p,q) in hidden layer
1) applying a window of weights 2) computing linear combinations
3) activating with non-linear function
![Page 42: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/42.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
CNNs: Spatial Arrangement of Output Volume
depth
width
height
Layer Dimensions:ℎ " # " $
where h and w are spatial dimensionsd (depth) = number of filters
Receptive Field: Locations in input image that a node is path connected to
Stride:Filter step size
[3]
![Page 43: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/43.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Introducing Non-Linearity
! " = max (0 , " )
Rectified Linear Unit (ReLU)- Apply after every convolution operation (i.e., after convolutional layers)
- ReLU: pixel-by-pixel operation that replaces all negative values by zero. Non-linear operation!
[5]
![Page 44: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/44.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Pooling
How else can we downsample and preserve spatial invariance?
1) Reduced dimensionality2) Spatial invariance
[3]
![Page 45: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/45.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Representation Learning in Deep CNNs
Mid level features
Eyes, ears, nose
Low level features
Edges, dark spots
High level features
Facial structure
Conv Layer 1 Conv Layer 2 Conv Layer 3
[3]
![Page 46: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/46.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
CNNs for Classification: Feature Learning
1. Learn features in input image through convolution2. Introduce non-linearity through activation function (real-world data is non-linear!)3. Reduce dimensionality and preserve spatial invariance with pooling
![Page 47: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/47.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
CNNs for Classification: Class Probabilities
- CONV and POOL layers output high-level features of input- Fully connected layer uses these features for classifying input image- Express output as probability of image belonging to a particular class
softmax () = +,-∑/ +,0
![Page 48: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/48.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
CNNs: Training with Backpropagation
Learn weights for convolutional filters and fully connected layers
! " =$%&(%) log ,-(%)
Backpropagation: cross-entropy loss
![Page 49: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/49.jpg)
CNNs for Classification: ImageNet
![Page 50: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/50.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
ImageNet Dataset
Dataset of over 14 million images across 21,841 categories
1409 pictures of bananas.
“Elongated crescent-shaped yellow fruit with soft sweet flesh”
[6,7]
![Page 51: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/51.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
ImageNet Challenge
Classification task: produce a list of object categories present in image. 1000 categories.“Top 5 error”: rate at which the model does not output correct label in top 5 predictions
Other tasks include: single-object localization, object detection from video/image, scene classification, scene parsing
[6,7]
![Page 52: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/52.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
ImageNet Challenge: Classification Task
2010
2011
2012
2013
2014
2015
Human0
10
20
30
clas
sific
atio
n er
ror %
28.225.8
16.4
11.7
6.73.57
5.1
2012: AlexNet. First CNN to win. - 8 layers, 61 million parameters2013: ZFNet- 8 layers, more filters2014: VGG- 19 layers 2014: GoogLeNet- “Inception” modules - 22 layers, 5million parameters2015: ResNet- 152 layers
[6,7]
![Page 53: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/53.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
ImageNet Challenge: Classification Task
2010
2011
2012
2013
2014
2014
2015
Human
0
10
20
30
clas
sifica
tion
erro
r %
28.225.8
16.4
11.7
6.73.57
5.17.3
2010
2011
2012
2013
2014
2014
2015
0
50
100
150
num
ber
of la
yers
[6,7]
![Page 54: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/54.jpg)
An Architecture for Many Applications
![Page 55: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/55.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
An Architecture for Many Applications
Object detection with R-CNNsSegmentation with fully convolutional networks
Image captioning with RNNs
![Page 56: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/56.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Beyond Classification
Object Detection
CAT, DOG, DUCK
Semantic Segmentation
CAT
Image Captioning
The cat is in the grass.
![Page 57: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/57.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Semantic Segmentation: FCNs
FCN: Fully Convolutional Network.Network designed with all convolutional layers,
with downsampling and upsampling operations
[3,8,9]
![Page 58: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/58.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Driving Scene Segmentation
[10]Fix reference
![Page 59: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/59.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Driving Scene Segmentation
[11, 12]
![Page 60: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/60.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Object Detection with R-CNNs
R-CNN: Find regions that we think have objects. Use CNN to classify.
[13]
![Page 61: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/61.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Image Captioning using RNNs
[14,15]
![Page 62: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/62.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Image Captioning using RNNs
[14,15]
![Page 63: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/63.jpg)
Deep Learning for Computer Vision: Impact and Summary
![Page 64: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/64.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Data, Data, Data
MNIST: handwritten digits
places: natural scenesImageNet:
22K categories. 14M images.CIFAR-10
Airplane
Automobile
Bird
Cat
Deer
Dog
Frog
Horse
Ship
Truck
![Page 65: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/65.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Deep Learning for Computer Vision: Impact
![Page 66: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/66.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Impact: Face Detection6.S191 Lab!
![Page 67: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/67.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Impact: Self-Driving Cars
[16]
![Page 68: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/68.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Impact: Healthcare
[17]
Identifying facial phenotypes of genetic disorders using deep learningGurovich et al., Nature Med. 2019
![Page 69: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/69.jpg)
6.S191 Introduction to Deep Learningintrotodeeplearning.com 1/29/19
Deep Learning for Computer Vision: Summary
• Why computer vision?• Representing images• Convolutions for
feature extraction
Foundations CNNs Applications• CNN architecture• Application to
classification• ImageNet
• Segmentation, object detection, image captioning
• Visualization
![Page 70: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/70.jpg)
Referencesgoo.gl/hbLkF6
![Page 71: Deep Learning for Computer Vision - Michigan State …cse803/Lectures/deepLearning...6.S191 Introduction to Deep Learning introtodeeplearning.com 1/29/19 Images are Numbers What the](https://reader033.vdocument.in/reader033/viewer/2022042410/5f283b0ef33220736a7575bd/html5/thumbnails/71.jpg)
End of Slides