![Page 1: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/1.jpg)
Module 5
Deep Convnets for Local Recognition
Joost van de Weijer
4 April 2016
![Page 2: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/2.jpg)
Previously, end-to-end..
Slide credit: Jose M Àlvarez
Dog
![Page 3: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/3.jpg)
Previously, end-to-end..
Slide credit: Jose M Àlvarez
Dog
Learned Representation
![Page 4: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/4.jpg)
Dog
Learned Representation
Part I: End-to-end learning (E2E)
Previously, end-to-end..
![Page 5: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/5.jpg)
Learned Representation
Part I: End-to-end learning (E2E)
Task A (e.g. image classification)
Previously, end-to-end..
![Page 6: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/6.jpg)
Previously, fine-tuning..
Part I: End-to-end learning (E2E)
Domain A: Learned Representation
Transfer
Part I’: End-to-End Fine-Tuning (FT)
Domain B: Fine-tuned Learned Representation
slide credit: X. Giro
![Page 7: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/7.jpg)
Fine-tuning a pre-trained network
Slide credit: Victor Campos, “Layer-wise CNN surgery for Visual Sentiment Prediction” (ETSETB 2015)
Previously, fine-tuning..
![Page 8: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/8.jpg)
Slide credit: Victor Campos, “Layer-wise CNN surgery for Visual Sentiment Prediction” (ETSETB 2015)
Fine-tuning a pre-trained network
Fine-tuning: High learning rate in new layer, and low learning rate in all other layers.
Previously, fine-tuning..
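The fine-tuning rule on this slide (high learning rate in the new layer, low learning rate elsewhere) can be sketched as a plain SGD step with a layer-dependent learning rate. The layer names and the two learning-rate values below are illustrative assumptions, not values from the lecture.

```python
# Assumed values for illustration only.
BASE_LR = 0.001  # low learning rate for the pre-trained layers
NEW_LR = 0.01    # high learning rate for the newly added layer


def sgd_step(params, grads, new_layers):
    """Return updated parameters using a layer-dependent learning rate."""
    updated = {}
    for name, weights in params.items():
        lr = NEW_LR if name in new_layers else BASE_LR
        updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
    return updated


# Hypothetical two-layer model: one pre-trained layer, one new output layer.
params = {"conv1": [1.0, -2.0], "fc_new": [0.5, 0.5]}
grads = {"conv1": [1.0, 1.0], "fc_new": [1.0, -1.0]}
updated = sgd_step(params, grads, new_layers={"fc_new"})
```

With the same gradient magnitude, the new layer moves ten times further than the pre-trained layer, which keeps the transferred representation close to its starting point.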
![Page 9: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/9.jpg)
Task A (e.g. image classification)
Learned Representation
Part I: End-to-end learning (E2E)
Task B (e.g. image retrieval)
Part II: Off-the-shelf features
Previously, off-the-shelf features..
slide credit: X. Giro
![Page 10: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/10.jpg)
Orange
Image classification: image as an input, label as output
spatially coded image representations (like spatial pyramids): d_x × d_y × d_F
orderless image representation (like BOW): 1 × 1 × d_F
Previously, off-the-shelf features..
![Page 11: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/11.jpg)
Two deep lectures in M5
Global Scale (today’s lecture)
Local Scale (next lecture)
Deep ConvNets for Recognition at...
![Page 12: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/12.jpg)
Orange
Image Classification
Image classification: image as an input, label as output
How to process non-square images?
Options: resize, zero padding, largest centred square
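Two of the three options can be written down as simple coordinate arithmetic. The sketch below uses hypothetical helper names and assumes integer pixel coordinates with boxes given as (left, top, right, bottom).

```python
def centred_square_crop(width, height):
    """Box (left, top, right, bottom) of the largest centred square."""
    side = min(width, height)
    left = (width - side) // 2
    top = (height - side) // 2
    return left, top, left + side, top + side


def zero_padding(width, height):
    """Padding (left, top, right, bottom) that makes the image square."""
    side = max(width, height)
    pad_w, pad_h = side - width, side - height
    return pad_w // 2, pad_h // 2, pad_w - pad_w // 2, pad_h - pad_h // 2


crop_box = centred_square_crop(640, 480)
padding = zero_padding(640, 480)
```

Cropping discards the image borders, padding keeps all pixels but adds empty area, and a plain resize keeps everything but distorts the aspect ratio.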
![Page 13: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/13.jpg)
Local object recognition
object localization
(single object)
object detection
semantic segmentation
![Page 14: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/14.jpg)
Classification + Localization
slide credit: Li, Karpathy, Johnson
![Page 15: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/15.jpg)
Localization as regression
slide credit: Li, Karpathy, Johnson
![Page 16: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/16.jpg)
slide credit: Li, Karpathy, Johnson
Localization as regression
![Page 17: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/17.jpg)
regression head
classification head
Localization as regression
slide credit: Li, Karpathy, Johnson
![Page 18: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/18.jpg)
regression head
classification head
Localization as regression
slide credit: Li, Karpathy, Johnson
![Page 19: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/19.jpg)
Localization as regression
slide credit: Li, Karpathy, Johnson
![Page 20: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/20.jpg)
Localization as regression
Classification head: C class scores
Regression head: C x 4 numbers (one box per class)
slide credit: Li, Karpathy, Johnson
Problem: multiple classes
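One way to read out a single box from the two heads is to take the box that belongs to the highest-scoring class. The sketch below assumes this read-out rule and a class-specific regression head with one 4-number box per class; the function name is hypothetical.

```python
def predict(class_scores, boxes_per_class):
    """Pick the top-scoring class and the box regressed for that class.

    class_scores: list of C scores (classification head).
    boxes_per_class: list of C boxes, each with 4 numbers (regression head).
    """
    best = max(range(len(class_scores)), key=lambda c: class_scores[c])
    return best, boxes_per_class[best]


scores = [0.1, 0.7, 0.2]
boxes = [[0, 0, 1, 1], [10, 20, 50, 60], [5, 5, 6, 6]]
label, box = predict(scores, boxes)
```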
![Page 21: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/21.jpg)
Localization as regression
slide credit: Li, Karpathy, Johnson
![Page 22: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/22.jpg)
Localization as regression (example)
Example of localization of clothes. Regression is done in two steps: first the bounding box of the person, then the bounding boxes of the clothes (master's project, 2015).
Esteve Cervantes: Evaluating deep features for Fashion Recognition
![Page 23: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/23.jpg)
Local object recognition
object localization
(single object)
object detection
semantic segmentation
Any ideas?
![Page 24: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/24.jpg)
Sliding window
[Figure: a 227x227 window at two positions in the image; classification + regression gives a score of 0.03 at the first position and 0.83 at the second.]
Compute a new regressed bounding box and classification score for all sliding window positions.
![Page 25: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/25.jpg)
Sliding window
[Figure: 227x227 windows at different scales, with scores 0.83 and 0.99.]
Repeat for different scales and combine all results (e.g. with non-maximum suppression).
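Non-maximum suppression can be sketched as a greedy procedure: visit boxes in order of decreasing score and drop any box that overlaps an already kept box too much. The IoU threshold of 0.5 and the box coordinates below are assumptions chosen for illustration; only the scores are taken from the slides.

```python
def iou(a, b):
    """Intersection over union of two boxes (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


# Hypothetical detections: two overlapping boxes and one far away.
boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (200, 200, 300, 300)]
scores = [0.83, 0.99, 0.03]
kept = nms(boxes, scores)
```

In practice a score threshold would normally be applied as well, so that the isolated low-scoring box is discarded too.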
![Page 26: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/26.jpg)
Sliding window (efficient computation)
Let us for simplicity consider a simple three-layer network.
[Figure: 10x10 input → conv1 (5x5 filters) → 6x6x5 → fc1 → 1x1x10 → fc2 → 1x1x2 (car/not car)]
What are the spatial coordinates of conv1?
[Figure: a 10x10 window and the conv1 filter (5x5) inside a 12x17 image]
Part of the convolutional features are the same and do not need recomputation!
![Page 27: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/27.jpg)
Sliding window (efficient computation)
[Figure: a 10x10 window and the conv1 filter (5x5) inside a 12x17 image]
How many 10x10 windows are there in this 12x17 image?
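The count follows from how far the window can slide in each direction. A small sketch, assuming a stride of one pixel (the function name is hypothetical):

```python
def count_windows(image_h, image_w, win_h, win_w, stride=1):
    """Number of window positions that fit entirely inside the image."""
    rows = (image_h - win_h) // stride + 1
    cols = (image_w - win_w) // stride + 1
    return rows * cols


n_windows = count_windows(12, 17, 10, 10)  # 3 rows x 8 columns
```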
![Page 28: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/28.jpg)
Sliding window (efficient computation)
[Figure: the full 12x17 image → conv1 (5x5) → 8x13x5 feature map]
The convolutions can be computed in a single pass.
![Page 29: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/29.jpg)
Sliding window (efficient computation)
[Figure: the full 12x17 image → conv1 (5x5) → 8x13x5 feature map; fc2 maps each 6x6x5 block of this map to a 1x1x10 vector]
![Page 30: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/30.jpg)
Sliding window (efficient computation)
[Figure: 12x17 image → conv1 (5x5x3) → 8x13x5 → fc2 = conv2 (6x6x5) → 3x8x10]
![Page 31: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/31.jpg)
Sliding window (efficient computation)
[Figure: 12x17 image → conv1 (5x5x3) → 8x13x5 → fc2 = conv2 (6x6x5) → 3x8x10; fc3 maps each 1x1x10 vector to 1x1x2]
![Page 32: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/32.jpg)
Sliding window (efficient computation)
[Figure: 12x17 image → conv1: 5 filters of (5x5x3) → 8x13x5 → fc2 = conv2: 10 filters of (6x6x5) → 3x8x10 → fc3 = conv3: 2 filters of (1x1x10) → 3x8x2]
Here the layers are numbered conv1, fc2, fc3; they correspond to conv1, fc1, fc2 in the network diagram.
We obtain the 8x3=24 classification scores while sharing the computation of the convolutional features.
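The equivalence can be checked numerically. The sketch below builds the three layers as convolutions with the filter shapes from the slide, runs them once over a 12x17x3 image, and compares each of the 3x8 output positions with the result of running the same network on the corresponding 10x10 window. Weights and pixel values are random, nonlinearities are omitted (they are pointwise and do not change the argument), and all function names are hypothetical.

```python
import random


def conv_valid(x, filters):
    """'Valid' convolution, stride 1, no padding.

    x: H x W x C nested lists; filters: K kernels, each kh x kw x C.
    Returns an (H-kh+1) x (W-kw+1) x K nested list.
    """
    kh, kw = len(filters[0]), len(filters[0][0])
    out = []
    for i in range(len(x) - kh + 1):
        row = []
        for j in range(len(x[0]) - kw + 1):
            responses = []
            for f in filters:
                s = 0.0
                for di in range(kh):
                    for dj in range(kw):
                        for a, b in zip(x[i + di][j + dj], f[di][dj]):
                            s += a * b
                responses.append(s)
            row.append(responses)
        out.append(row)
    return out


def random_filters(rng, k, kh, kw, c):
    return [[[[rng.uniform(-1, 1) for _ in range(c)] for _ in range(kw)]
             for _ in range(kh)] for _ in range(k)]


rng = random.Random(0)
conv1 = random_filters(rng, 5, 5, 5, 3)    # 5 filters of 5x5x3
conv2 = random_filters(rng, 10, 6, 6, 5)   # fully connected layer as 10 filters of 6x6x5
conv3 = random_filters(rng, 2, 1, 1, 10)   # fully connected layer as 2 filters of 1x1x10


def network(x):
    return conv_valid(conv_valid(conv_valid(x, conv1), conv2), conv3)


image = [[[rng.random() for _ in range(3)] for _ in range(17)] for _ in range(12)]
dense = network(image)  # single pass over the whole image: 3 x 8 x 2

max_diff = 0.0
for i in range(3):
    for j in range(8):
        window = [row[j:j + 10] for row in image[i:i + 10]]
        scores = network(window)[0][0]  # 1 x 1 x 2 output for one window
        for a, b in zip(scores, dense[i][j]):
            max_diff = max(max_diff, abs(a - b))
```

The dense pass computes conv1 once for the whole image, whereas the window-by-window loop recomputes the overlapping conv1 responses for each of the 24 windows.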
![Page 33: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/33.jpg)
Example of bear and fish detection on multiple scales.
Sermanet et al., ‘Integrated Recognition, Localization and Detection using Convolutional Networks’, ICLR 2014
Networks can be written as fully convolutional networks to speed up computation at testing time.
Sliding window (efficient computation)
![Page 34: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/34.jpg)
object proposals
selective search
K. Van de Sande et al. Segmentation as selective search for object recognition. ICCV 2011.
• Object proposal methods compute boxes which potentially contain an object.
• Features for each box are extracted and a classifier is applied.
• Typically thousands of boxes (but far fewer than with a sliding window).
• Many different approaches: selective search, edge boxes, GOP, etc.
![Page 35: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/35.jpg)
object proposals (RCNN)
Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR 2014.
1. compute object proposals (~2k)
2. warp dilated bounding box
3. compute CNN features
4. classify regions (car: yes, person: no)
bounding box regression
![Page 36: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/36.jpg)
object proposals (RCNN)
Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR 2014.
AlexNet
![Page 37: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/37.jpg)
object proposals (RCNN)
Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR 2014.
AlexNet
Remove the last layer and fine-tune for the 20 PASCAL classes.
Use the 4096-d fc7 vector as the description of the bounding box.
Train an SVM on this representation for classification.
![Page 38: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/38.jpg)
object proposals (RCNN)
slide credit: Girshick
![Page 39: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/39.jpg)
object proposals (RCNN)
![Page 40: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/40.jpg)
object proposals (RCNN)
slide credit: Li, Karpathy, Johnson
![Page 41: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/41.jpg)
object proposals (RCNN)
Girshick, Ross, et al. "Rich feature hierarchies for accurate object detection and semantic segmentation." CVPR 2014.
1. compute object proposals (~2k)
2. warp dilated bounding box
3. compute CNN features
4. classify regions (car: yes, person: no)
improved bounding box
drawbacks:
• not end-to-end
• warping of boxes
• lots of double computation (overlap of bounding boxes)
![Page 42: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/42.jpg)
object proposals (Fast R-CNN)
![Page 43: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/43.jpg)
object proposals (Fast R-CNN)
He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." PAMI 2015
• compute the convolutional features once per image (‘conv 5’).
shared computation (conv1-conv5)
![Page 44: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/44.jpg)
object proposals (Fast R-CNN)
This was first proposed by: He, Kaiming, et al. "Spatial pyramid pooling in deep convolutional networks for visual recognition." PAMI 2015
• compute the convolutional features once
• extract features from conv5 for all bounding boxes
shared computation (‘conv 5’)
![Page 45: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/45.jpg)
object proposals (Fast R-CNN)
• pool the features in a spatial grid.
for all bounding boxes: Region of Interest pooling (ROI pooling)
shared computation
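ROI pooling turns a box of arbitrary size on the feature map into a fixed-size grid by max-pooling within each grid cell. The sketch below works on a single-channel feature map with the box given in feature-map coordinates (x1, y1, x2, y2), with x2 and y2 exclusive; the bin-boundary rounding is one reasonable choice and is an assumption, not the exact rule of any particular implementation.

```python
def roi_max_pool(fmap, roi, grid=2):
    """Max-pool the region `roi` of a 2-D feature map into a grid x grid output."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    pooled = []
    for gy in range(grid):
        y_start = y1 + (gy * h) // grid
        y_end = y1 + -(-((gy + 1) * h) // grid)  # ceiling division
        row = []
        for gx in range(grid):
            x_start = x1 + (gx * w) // grid
            x_end = x1 + -(-((gx + 1) * w) // grid)
            row.append(max(fmap[y][x]
                           for y in range(y_start, y_end)
                           for x in range(x_start, x_end)))
        pooled.append(row)
    return pooled


# Toy 4x6 feature map with values 1..24.
fmap = [[r * 6 + c + 1 for c in range(6)] for r in range(4)]
pooled = roi_max_pool(fmap, (1, 0, 6, 3), grid=2)
```

Whatever the box size, the output has the same shape, which is what allows the fully connected layers to follow.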
![Page 46: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/46.jpg)
object proposals (Fast R-CNN)
• pool the features in a spatial grid
ROI pooling → FCs
classification: log loss
regression: smooth L1 loss
end-to-end training
shared computation
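The two losses named on this slide can be combined into one multi-task objective. The sketch below uses the usual definition of the smooth L1 function and assumes that class 0 denotes background, for which no box loss is applied; the weighting factor `lam` and the function names are assumptions.

```python
import math


def smooth_l1(x):
    """Quadratic near zero, linear elsewhere."""
    ax = abs(x)
    return 0.5 * x * x if ax < 1.0 else ax - 0.5


def log_loss(probs, true_class):
    """Negative log-probability of the true class."""
    return -math.log(probs[true_class])


def multitask_loss(probs, true_class, box_pred, box_target, lam=1.0):
    """Classification loss plus box regression loss for non-background boxes."""
    loss = log_loss(probs, true_class)
    if true_class != 0:  # assumed convention: class 0 is background
        loss += lam * sum(smooth_l1(p - t) for p, t in zip(box_pred, box_target))
    return loss


loss = multitask_loss([0.5, 0.5], 1, [0.0, 0.0, 0.0, 0.0], [0.5, 0.0, 0.0, 2.0])
```

Compared with an L2 loss, the linear tails make the regression less sensitive to outlier boxes.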
![Page 47: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/47.jpg)
object proposals (Fast R-CNN)
| | Fast R-CNN | R-CNN |
|---|---|---|
| Train time (hours) | 9.5 | 84 |
| Train speedup | 8.8x | - |
| Test time/image | 0.32s | 47s |
| Test speedup | 146x | - |
| mAP | 66.9% | 66.0% |
Multi-task training also improves classification performance, and end-to-end training improves the results.
Test time does not include object proposal computation (which is now the bottleneck)
![Page 48: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/48.jpg)
object proposals (Faster R-CNN)
shared computation (‘conv5’)
compute the object proposals directly in the network.
Region Proposal Network (RPN)
ROI pooling → FCs
![Page 49: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/49.jpg)
object proposals (Faster R-CNN)
slide credit: Kaiming He
Slide a window over the feature map.
Add a network which classifies and regresses the bounding boxes.
The classification score provides the confidence of the presence of object.
![Page 50: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/50.jpg)
object proposals (Faster R-CNN)
slide credit: Kaiming He
Slide a window over the feature map.
Add a network which classifies and regresses the bounding boxes.
The classification score provides the confidence of the presence of object.
Use N anchors for proposals of varying aspect ratios.
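Anchors are reference boxes centred on each sliding-window position; the network regresses offsets relative to them. The sketch below generates anchors with a fixed area per scale and varying aspect ratio. The three scales and three ratios are assumed defaults giving N = 9, and the function name is hypothetical.

```python
import math


def make_anchors(cx, cy, scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Anchor boxes (x1, y1, x2, y2) centred at (cx, cy).

    Each anchor has area scale**2 and aspect ratio height / width = ratio.
    """
    anchors = []
    for s in scales:
        for r in ratios:
            w = s / math.sqrt(r)
            h = s * math.sqrt(r)
            anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors


anchors = make_anchors(300, 200)
```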
![Page 51: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/51.jpg)
object proposals (Faster R-CNN)
slide credit: Kaiming He
Computation for 1000 boxes:

| Model | Time |
|---|---|
| Edge boxes + R-CNN | 0.25 sec + 1000 × ConvTime + 1000 × FcTime |
| Edge boxes + Fast R-CNN | 0.25 sec + 1 × ConvTime + 1000 × FcTime |
| Faster R-CNN | 1 × ConvTime + 1000 × FcTime |
![Page 52: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/52.jpg)
object proposals (Faster R-CNN)
slide credit: Li, Karpathy, Johnson
![Page 53: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/53.jpg)
object proposals (Faster R-CNN)
slide credit: Li, Karpathy, Johnson
![Page 54: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/54.jpg)
object localization
Winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2015, using residual networks and Faster R-CNN.
![Page 55: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/55.jpg)
object localization
Winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2015, using residual networks and Faster R-CNN.
![Page 56: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/56.jpg)
summary object detection
slide credit: Li, Karpathy, Johnson
• Object localization: when there is one object, or a known number of objects/classes, you can localize by adding a ‘regression head’ to your network.
• Sliding window + CNN can be computed efficiently by writing the network as a fully convolutional network.
• Object proposal methods are straightforwardly combined with CNNs, but for fast/good results consider:
  • adding a regression head to improve bounding box estimation.
  • sharing the computation of the convolutional features (SPP).
  • end-to-end training of the network (Fast R-CNN).
  • including a Region Proposal Network for fast object proposals within the network (Faster R-CNN).
![Page 57: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/57.jpg)
Local object recognition
object localization
(single object)
object detection
semantic segmentation
![Page 58: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/58.jpg)
semantic segmentation
semantic segmentation: assign a class to all pixels
instance segmentation : assign pixels to a particular instance of a class (chair1, etc..)
![Page 59: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/59.jpg)
semantic segmentation
ConvNet: predict center pixel
Because of the convolutions (strides and pooling) the output resolution is smaller than the input, and upsampling is required.
Write the network as a fully convolutional network and apply it to the image.
![Page 60: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/60.jpg)
semantic segmentation
Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015
pixelwise loss
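A pixelwise loss treats every output pixel as its own classification problem. The sketch below computes a softmax cross-entropy per pixel and averages over the image; averaging rather than summing is an assumption, as is the function name.

```python
import math


def pixelwise_loss(scores, labels):
    """Mean softmax cross-entropy over all pixels.

    scores: H x W x C nested lists of class scores.
    labels: H x W nested lists of integer class indices.
    """
    total, count = 0.0, 0
    for score_row, label_row in zip(scores, labels):
        for s, y in zip(score_row, label_row):
            m = max(s)  # subtract the maximum for numerical stability
            log_z = m + math.log(sum(math.exp(v - m) for v in s))
            total += log_z - s[y]
            count += 1
    return total / count


# Toy 2x2 image with 2 classes and uninformative (all-zero) scores.
scores = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.0]]]
labels = [[0, 1], [1, 0]]
loss_value = pixelwise_loss(scores, labels)
```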
![Page 61: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/61.jpg)
semantic segmentation
Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015
Convolution (3x3), padding [1 1 1 1], stride [1 1]
input
![Page 62: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/62.jpg)
semantic segmentation
Convolution (3x3), padding [1 1 1 1], stride [1 1]
input
![Page 63: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/63.jpg)
semantic segmentation
Convolution (3x3), padding [1 1 1 1], stride [2 2]
input
Convolution (3x3), padding [1 1 1 1], stride [1 1]
input
![Page 64: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/64.jpg)
semantic segmentation
Convolution (3x3), padding [1 1 1 1], stride [2 2]
input
Convolution (3x3), padding [1 1 1 1], stride [1 1]
input
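The effect of the stride on the output resolution can be made concrete with the standard output-size formula. The sketch assumes that padding [1 1 1 1] means one pixel of padding on each side; the function name is hypothetical.

```python
def conv_output_size(n, kernel=3, padding=1, stride=1):
    """Output length along one dimension for an input of length n."""
    return (n + 2 * padding - kernel) // stride + 1


same = conv_output_size(6, stride=1)  # stride [1 1]: resolution is preserved
half = conv_output_size(6, stride=2)  # stride [2 2]: resolution is halved
```

A 3x3 convolution with padding 1 and stride 1 keeps the resolution, while stride 2 roughly halves it, which is why upsampling is needed to get back to a dense prediction.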
![Page 65: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/65.jpg)
semantic segmentation
deconvolution (3x3), padding [1 1 1 1], stride [2 2]
input
![Page 66: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/66.jpg)
semantic segmentation
deconvolution (3x3), padding [1 1 1 1], stride [2 2]
input
• Deconvolutions are also called fractionally strided convolutions or transposed convolutions.
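A one-dimensional sketch shows how a transposed convolution upsamples: each input value is multiplied by the kernel and added into the output at positions spaced `stride` apart, and the padding is cropped from both ends. The kernel values below are an assumption chosen so that the result is linear interpolation; in a network they would be learned.

```python
def conv_transpose_1d(x, kernel, stride=2, padding=1):
    """1-D transposed convolution; output length is (len(x)-1)*stride + len(kernel) - 2*padding."""
    full = [0.0] * ((len(x) - 1) * stride + len(kernel))
    for i, value in enumerate(x):
        for j, weight in enumerate(kernel):
            full[i * stride + j] += value * weight
    return full[padding:len(full) - padding]


upsampled = conv_transpose_1d([1.0, 2.0, 3.0], [0.5, 1.0, 0.5])
```

With this kernel the three input values reappear at every second output position and the positions in between hold their averages.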
![Page 67: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/67.jpg)
semantic segmentation
Noh et al. ICCV 2015
![Page 68: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/68.jpg)
semantic segmentation
Noh et al. ICCV 2015
![Page 69: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/69.jpg)
semantic segmentation
combine where (local, shallow) with what (global, deep)
Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015
![Page 70: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/70.jpg)
semantic segmentation
Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015
[Figure: ‘skip layers’; predictions from different layers are combined by interp + sum (twice) to give the dense output.]
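The "interp + sum" step can be sketched as follows, assuming single-channel score maps as nested lists. Nearest-neighbour upsampling stands in for the interpolation to keep the example short; the score values and helper names are hypothetical.

```python
def upsample2x_nearest(score):
    """Upsample a 2-D score map by a factor of 2 (nearest neighbour)."""
    out = []
    for row in score:
        wide = [v for v in row for _ in range(2)]   # repeat every column twice
        out.append(wide)
        out.append(list(wide))                      # repeat every row twice
    return out


def fuse(coarse, fine):
    """'interp + sum': upsample the coarse (deep) scores and add the fine (shallow) scores."""
    up = upsample2x_nearest(coarse)
    return [[u + f for u, f in zip(up_row, fine_row)]
            for up_row, fine_row in zip(up, fine)]


deep = [[2.0, 0.0],
        [0.0, 2.0]]                  # coarse "what" scores (hypothetical)
shallow = [[0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 0.0],
           [0.0, 0.0, 0.0, 1.0],
           [0.0, 0.0, 0.0, 0.0]]     # finer "where" scores (hypothetical)

fused = fuse(deep, shallow)
for row in fused:
    print(row)
```

Applying `fuse` once more with an even finer score map would correspond to adding a second skip.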
![Page 71: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/71.jpg)
semantic segmentation
Long et al., Fully Convolutional Networks for Semantic Segmentation, CVPR 2015
[Figure panels: input image; stride 32 (no skips); stride 16 (1 skip); stride 8 (2 skips); ground truth.]
![Page 72: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/72.jpg)
semantic segmentation
Eigen, Fergus, Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture, ICCV 2015
![Page 73: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/73.jpg)
semantic segmentation
Eigen, Fergus, Predicting Depth, Surface Normals and Semantic Labels with a Common Multi-Scale Convolutional Architecture, ICCV 2015
Surface normals results
![Page 74: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/74.jpg)
instance segmentation
Dai et al. ‘Instance-aware Semantic Segmentation via Multi-task Network Cascades’, arXiv 2015.
![Page 75: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/75.jpg)
instance segmentation
Dai et al. ‘Instance-aware Semantic Segmentation via Multi-task Network Cascades’, arXiv 2015.
![Page 76: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/76.jpg)
instance segmentation
Dai et al. ‘Instance-aware Semantic Segmentation via Multi-task Network Cascades’, arXiv 2015.
![Page 77: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/77.jpg)
instance segmentation
Dai et al. ‘Instance-aware Semantic Segmentation via Multi-task Network Cascades’, arXiv 2015.
[Figure columns: results; ground truth.]
![Page 78: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/78.jpg)
Generative Adversarial Networks
Fractionally strided convolutions (deconvolutions) can be used to generate images from noise.
![Page 79: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/79.jpg)
Generative Adversarial Networks
Suppose I would like to generate images of horses. My generated horse images G(z) are generated from noise z.
I can train a discriminative network D to distinguish real horse images x from generated horse images G(z):
$$\max_D \; \log D(x) + \log\bigl(1 - D(G(z))\bigr)$$
[Figure: real horses x and generated horses G(z) are the inputs to D.]
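A minimal sketch of the quantity D maximizes, assuming the objective is averaged over a batch of discriminator outputs (the slide omits the averaging). `d_real` holds D(x) for real images and `d_fake` holds D(G(z)) for generated ones; both names and all numbers are hypothetical.

```python
import math


def discriminator_objective(d_real, d_fake):
    """Batch average of log D(x) + log(1 - D(G(z)))."""
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates real from generated images scores high...
confident = discriminator_objective([0.9, 0.8], [0.1, 0.2])
# ...while one that cannot tell them apart (D = 0.5 everywhere) scores -2 log 2.
fooled = discriminator_objective([0.5, 0.5], [0.5, 0.5])
print(confident, fooled)
```

The generator pushes in the opposite direction: it wants D(G(z)) to be large, which lowers the second term.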
![Page 80: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/80.jpg)
Generative Adversarial Networks
I can then optimize my generative network G to fool the discriminative network D:
$$\min_G \max_D \; \log D(x) + \log\bigl(1 - D(G(z))\bigr)$$
![Page 81: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/81.jpg)
Generative Adversarial Networks
You can then re-optimize the discriminative network D, and so on:
$$\min_G \max_D \; \log D(x) + \log\bigl(1 - D(G(z))\bigr)$$
![Page 82: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/82.jpg)
Generative Adversarial Networks
You can then re-optimize the discriminative network D, and so on, until D gives in.
$$\min_G \max_D \; \log D(x) + \log\bigl(1 - D(G(z))\bigr)$$
Goodfellow et al., Generative Adversarial Nets, NIPS 2014
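What "D gives in" means can be made precise with a short derivation. It assumes the expectation form of the objective from Goodfellow et al.; the symbols $p_{\text{data}}$ (distribution of real images), $p_z$ (noise distribution) and $p_g$ (distribution of generated images) do not appear on the slides.

$$V(D, G) = \mathbb{E}_{x \sim p_{\text{data}}}\bigl[\log D(x)\bigr] + \mathbb{E}_{z \sim p_z}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr]$$

For a fixed generator, the discriminator that maximizes $V$ is

$$D^{*}(x) = \frac{p_{\text{data}}(x)}{p_{\text{data}}(x) + p_g(x)}$$

If the generator matches the data distribution, $p_g = p_{\text{data}}$, then $D^{*}(x) = \tfrac{1}{2}$ everywhere and $V(D^{*}, G) = \log\tfrac{1}{2} + \log\tfrac{1}{2} = -\log 4$: the discriminator can do no better than guessing.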
![Page 83: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/83.jpg)
Generative Adversarial Networks
Examples of generated bedrooms.
Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016
![Page 84: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/84.jpg)
Generative Adversarial Networks
Interpolation between points in z.
Radford et al., Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks, ICLR 2016
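Interpolating between two points in z can be sketched as below, assuming plain linear interpolation on latent vectors stored as lists. The vectors and the helper names `lerp` and `interpolate` are hypothetical; each interpolated z would then be passed through the trained generator G to obtain one image of the sequence.

```python
def lerp(z0, z1, t):
    """Linear interpolation between latent vectors z0 (t = 0) and z1 (t = 1)."""
    return [(1.0 - t) * a + t * b for a, b in zip(z0, z1)]


def interpolate(z0, z1, steps):
    """Return `steps` evenly spaced latent vectors from z0 to z1, endpoints included."""
    return [lerp(z0, z1, i / (steps - 1)) for i in range(steps)]


z_start = [0.0, 1.0, -1.0]   # hypothetical 3-D latent codes
z_end = [1.0, -1.0, 1.0]

path = interpolate(z_start, z_end, 5)
for z in path:
    print(z)
```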
![Page 85: Module 5 Deep Convnets for Local Recognition · Deep Convnets for Local Recognition Joost van de Weijer 4 April 2016. Previously, end-to-end.. Slide credit: Jose M 2 ... (SPP) •end-to-end](https://reader033.vdocument.in/reader033/viewer/2022041506/5e2529da461a7e4f7719b67e/html5/thumbnails/85.jpg)
summary semantic segmentation
Slide credit: Li, Karpathy, Johnson
• Fully convolutional networks can be applied for efficient classification of all pixels.
• To get high-quality segmentations, deep features at multiple scales need to be combined (e.g. with skip layers).
• Upsampling can be done with deconvolution and unpooling (de-pooling) operations.
• Instance segmentation can be performed by combining object detection and semantic segmentation pipelines.
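The de-pooling (unpooling) operation mentioned in the summary can be sketched as follows. The sketch assumes 2x2 max pooling with stride 2 on a single-channel map with even height and width; the pooling step records where each maximum came from, and unpooling writes the value back to that position and fills the rest with zeros. The helper names and the feature map are hypothetical.

```python
def max_pool2x2(fmap):
    """2x2 max pooling with stride 2; also returns the position of each maximum."""
    pooled, switches = [], []
    for i in range(0, len(fmap), 2):
        prow, srow = [], []
        for j in range(0, len(fmap[0]), 2):
            window = [(fmap[i + di][j + dj], (i + di, j + dj))
                      for di in range(2) for dj in range(2)]
            value, pos = max(window, key=lambda item: item[0])
            prow.append(value)
            srow.append(pos)
        pooled.append(prow)
        switches.append(srow)
    return pooled, switches


def unpool2x2(pooled, switches, height, width):
    """Place every pooled value back at its recorded position; other entries stay 0."""
    out = [[0.0] * width for _ in range(height)]
    for prow, srow in zip(pooled, switches):
        for value, (i, j) in zip(prow, srow):
            out[i][j] = value
    return out


fmap = [[1.0, 3.0, 2.0, 0.0],
        [4.0, 2.0, 1.0, 5.0],
        [0.0, 1.0, 7.0, 2.0],
        [6.0, 2.0, 3.0, 1.0]]

pooled, switches = max_pool2x2(fmap)
restored = unpool2x2(pooled, switches, 4, 4)
print(pooled)
for row in restored:
    print(row)
```

The unpooled map is sparse, which is why it is typically followed by deconvolution layers that fill in the remaining positions.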