Fast R-CNN: Object detection with Caffe · 2015-07-21
![Page 1: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/1.jpg)
Fast R-CNN
Object detection with Caffe
Ross Girshick
Microsoft Research
arXiv code
Latest roasts
![Page 2: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/2.jpg)
Goals for this section
• Super quick intro to object detection
• Show one way to tackle obj. det. with ConvNets
• Highlight some more sophisticated uses of Caffe
  • Python layers
  • Multi-task training with multiple losses
  • Batch sizes that change dynamically during Net::Forward()
• Pointers to open source code so you can explore, try, and understand!
![Page 3: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/3.jpg)
Image classification (mostly what you’ve seen)
• 𝐾 classes
• Task: Assign the correct class label to the whole image
Digit classification (MNIST)
Object recognition (Caltech-101, ImageNet, etc.)
![Page 4: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/4.jpg)
Classification vs. Detection
Classification: Dog, Bridge (easyish, these days)
Detection: Dog, Dog, Bridge (still quite a lot harder)
![Page 5: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/5.jpg)
Problem formulation
Person 0.7
Motorbike 0.9
Input, Desired output*
*Actual results may vary
The Visual World ≈ 𝐾 object classes {airplane, bird, motorbike, person, sofa, bg}
YODA: Yet another Object Detection Algorithm
![Page 6: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/6.jpg)
PASCAL VOC object detection
[Plot: mean Average Precision (mAP), 0% to 70%, by year, 2006 to 2015. Precision: higher is better.]
Before the successful application of ConvNets: ~5 years
After: < 2 years, 1.8x mAP
![Page 7: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/7.jpg)
Fast R-CNN (Region-based Convolutional Networks)
A fast object detector implemented with Caffe
- Caffe fork on GitHub that adds two new layers (ROIPoolingLayer and SmoothL1LossLayer)
- Python (using pycaffe) / more advanced Caffe usage
- A type of Region-based Convolutional Network (R-CNN)
Image
Object box proposals (N), e.g., selective search
A Fast R-CNN network (VGG_CNN_M_1024)
Two outputs:
1. NK regressed object boxes
2. $P(\mathrm{cls} = k \mid \mathrm{box} = n, \mathrm{image})$ for each of the NK boxes
Let’s see how it works!
![Page 8: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/8.jpg)
Quick background
Region-based Convolutional Networks (R-CNNs)
1. Input image
2. Extract region proposals (~2k / image), e.g., selective search [van de Sande, Uijlings et al.]
3. Compute CNN features on regions
4. Classify and refine regions
[Girshick et al. CVPR’14]
![Page 9: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/9.jpg)
Fast R-CNN (test-time detection)
Given an image and object proposals, detection happens with a single call to Net::Forward()
Net::Forward() takes 60 to 330ms
Image
Object box proposals (N), e.g., selective search
A Fast R-CNN network (VGG_CNN_M_1024)
Two output types:
1. NK regressed object boxes
2. $P(\mathrm{cls} = k \mid \mathrm{box} = n, \mathrm{image})$ for each of the NK boxes
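One way to read the first output type: for each proposal and each class, the network emits four regression values that adjust the proposal box. The sketch below is a NumPy illustration, not code from the repository. It assumes the center/size parameterization (dx, dy, dw, dh) described in the Fast R-CNN paper and inclusive pixel coordinates; the function name `apply_box_deltas` is hypothetical.

```python
import numpy as np

def apply_box_deltas(boxes, deltas):
    """Refine N proposal boxes with per-class regression outputs.

    boxes:  (N, 4) array of [x1, y1, x2, y2] proposals.
    deltas: (N, 4 * K) array, four values (dx, dy, dw, dh) per class.
    Returns an (N, 4 * K) array holding one refined box per class.
    """
    boxes = boxes.astype(np.float64)
    # Assumption: coordinates are inclusive, so width = x2 - x1 + 1.
    widths = boxes[:, 2] - boxes[:, 0] + 1.0
    heights = boxes[:, 3] - boxes[:, 1] + 1.0
    ctr_x = boxes[:, 0] + 0.5 * widths
    ctr_y = boxes[:, 1] + 0.5 * heights

    dx, dy = deltas[:, 0::4], deltas[:, 1::4]
    dw, dh = deltas[:, 2::4], deltas[:, 3::4]

    # Shift the center proportionally to box size, scale size in log space.
    pred_ctr_x = dx * widths[:, None] + ctr_x[:, None]
    pred_ctr_y = dy * heights[:, None] + ctr_y[:, None]
    pred_w = np.exp(dw) * widths[:, None]
    pred_h = np.exp(dh) * heights[:, None]

    out = np.zeros(deltas.shape, dtype=np.float64)
    out[:, 0::4] = pred_ctr_x - 0.5 * pred_w
    out[:, 1::4] = pred_ctr_y - 0.5 * pred_h
    out[:, 2::4] = pred_ctr_x + 0.5 * pred_w - 1.0
    out[:, 3::4] = pred_ctr_y + 0.5 * pred_h - 1.0
    return out
```

With N proposals and K classes this produces the NK regressed boxes; all-zero deltas return each proposal unchanged.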
![Page 10: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/10.jpg)
Fast R-CNN (test-time detection)
Image
Object box proposals (N), e.g., selective search
Object proposals come from:
- Selective Search (2s / image) [van de Sande/Uijlings et al.]
- EdgeBoxes (0.2s / image) [Zitnick & Dollar]
- MCG (30s / image) [Arbelaez et al.]
- Etc.
A Fast R-CNN network (VGG_CNN_M_1024)
Two output types:
1. NK regressed object boxes
2. $P(\mathrm{cls} = k \mid \mathrm{box} = n, \mathrm{image})$ for each of the NK boxes
Minimal post-processing:
- Non-maximum suppression (NMS)
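The NMS post-processing step can be sketched as a greedy loop over boxes sorted by score, typically run once per class. This is a generic NumPy illustration rather than the repository's implementation; the function name `nms`, the default threshold of 0.3, and the inclusive pixel-area convention are assumptions.

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.3):
    """Greedy non-maximum suppression.

    boxes:  (N, 4) array of [x1, y1, x2, y2].
    scores: (N,) array of detection scores for one class.
    Returns the indices of the boxes that survive, highest score first.
    """
    x1, y1, x2, y2 = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    areas = (x2 - x1 + 1) * (y2 - y1 + 1)
    order = np.argsort(scores)[::-1]  # indices sorted by descending score
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        # Intersection of the current best box with every remaining box.
        xx1 = np.maximum(x1[i], x1[rest])
        yy1 = np.maximum(y1[i], y1[rest])
        xx2 = np.minimum(x2[i], x2[rest])
        yy2 = np.minimum(y2[i], y2[rest])
        w = np.maximum(0.0, xx2 - xx1 + 1)
        h = np.maximum(0.0, yy2 - yy1 + 1)
        inter = w * h
        iou = inter / (areas[i] + areas[rest] - inter)
        # Drop boxes that overlap the current best box too much.
        order = rest[iou <= iou_threshold]
    return keep
```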
![Page 11: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/11.jpg)
Zooming into the net
image comes in here, blob size = S x 3 x H x W (e.g., S = 1 or 5, H = 600, W = 1000)
(a bunch of conv layers and whatnot)
conv5 feature map blob size = S x 512 x H/16 x W/16
2000 image regions come in here, blob size = 2000 x 5
Pool5 blob size = 2000 x 512 x 6 x 6
Outputs: 2000 x 21 (class probabilities) and 2000 x (4 * 21) (regressed boxes)
![Page 12: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/12.jpg)
Zooming into the net
image comes in here, blob size = S x 3 x H x W (e.g., S = 1 or 5, H = 600, W = 1000)
conv5 feature map blob size = S x 512 x H/16 x W/16
2000 image regions come in here, blob size = 2000 x 5
RoI Pooling Layer:
- adaptive max pooling layer
- dynamically expands batch from S to R (e.g., 2000)
Pool5 blob size = 2000 x 512 x 6 x 6
Outputs: 2000 x 21 (class probabilities) and 2000 x (4 * 21) (regressed boxes)
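The blob sizes on this slide follow from a little shape arithmetic, sketched below. The helper `fast_rcnn_blob_shapes` and the dictionary keys `cls_prob` and `bbox_pred` are illustrative names, and floor division stands in for the exact conv5 size, which depends on the padding of the conv layers.

```python
def fast_rcnn_blob_shapes(S, H, W, R, K=21, channels=512, pooled=6, stride=16):
    """Return the approximate blob shapes for S images and R regions.

    K is the number of classes including background; the other defaults
    mirror the numbers quoted on the slide.
    """
    return {
        "data": (S, 3, H, W),
        "rois": (R, 5),                                    # [r, x1, y1, x2, y2]
        "conv5": (S, channels, H // stride, W // stride),  # approximate
        "pool5": (R, channels, pooled, pooled),            # batch grows from S to R
        "cls_prob": (R, K),
        "bbox_pred": (R, 4 * K),
    }

shapes = fast_rcnn_blob_shapes(S=1, H=600, W=1000, R=2000)
for name, shape in shapes.items():
    print(name, shape)
```

The batch dimension changes at pool5: everything before it is sized by the number of images S, everything after it by the number of regions R.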
![Page 13: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/13.jpg)
Another view of the same thing
These (top and bottom images) are the same
[Figure: Deep ConvNet → conv5 feature map → RoI projection → RoI pooling layer → FCs → RoI feature vector → two FC outputs, softmax and bbox regressor, for each RoI]
![Page 14: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/14.jpg)
RoI Pooling Layer
• Special case of SPPnet’s SPP layer [He et al. ECCV’14]
• Two inputs (“bottoms”)
  • Conv feature map: S x 512 x H x W
  • Regions of Interest: R x 5
    • 5 comes from [r, x1, y1, x2, y2], where r in [0, S – 1] specifies an image batch index
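To make the two inputs concrete, here is a slow, loop-based NumPy sketch of RoI max pooling. It is an illustration under stated assumptions, not the Caffe layer itself: the function name `roi_max_pool`, the rounding of RoI coordinates, and the floor/ceil bin boundaries are choices made for this example.

```python
import numpy as np

def roi_max_pool(feature_map, rois, pooled_h=6, pooled_w=6, spatial_scale=1.0 / 16):
    """Max-pool each RoI into a fixed pooled_h x pooled_w grid.

    feature_map: (S, C, H, W) conv feature map.
    rois:        (R, 5) rows of [r, x1, y1, x2, y2] in image coordinates,
                 where r selects one of the S images.
    Returns an (R, C, pooled_h, pooled_w) array: the batch grows from S to R.
    """
    S, C, H, W = feature_map.shape
    R = rois.shape[0]
    out = np.zeros((R, C, pooled_h, pooled_w), dtype=feature_map.dtype)
    for n in range(R):
        r = int(rois[n, 0])
        # Project the RoI from image coordinates onto the feature map.
        x1, y1, x2, y2 = [int(round(v * spatial_scale)) for v in rois[n, 1:]]
        roi_w = max(x2 - x1 + 1, 1)
        roi_h = max(y2 - y1 + 1, 1)
        for ph in range(pooled_h):
            hs = min(max(y1 + int(np.floor(ph * roi_h / pooled_h)), 0), H)
            he = min(max(y1 + int(np.ceil((ph + 1) * roi_h / pooled_h)), 0), H)
            for pw in range(pooled_w):
                ws = min(max(x1 + int(np.floor(pw * roi_w / pooled_w)), 0), W)
                we = min(max(x1 + int(np.ceil((pw + 1) * roi_w / pooled_w)), 0), W)
                if he > hs and we > ws:  # empty bins stay at zero
                    out[n, :, ph, pw] = feature_map[r, :, hs:he, ws:we].max(axis=(1, 2))
    return out
```

Because the bin size adapts to each RoI, regions of any size map to the same fixed output shape, which is what lets the fully connected layers follow.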
![Page 15: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/15.jpg)
The train-time net
Single fine-tuning operation all in Caffe
Even more boxes and arrows
Let’s look at them
![Page 16: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/16.jpg)
The train-time net (inputs)
Zoomed area
B full images: B x 3 x H x W (e.g., B = 2, H = 600, W = 1000)
Class labels: 128 x 21
Bounding-box regression targets: 128 x 84
Bounding-box regression loss weights: 128 x 84
RoIs: 128 x 5 (75% background)
![Page 17: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/17.jpg)
The train-time net (exotic data layers)
Custom Python data layer
• Samples 2 images
• From each sampled image, takes 64 RoIs
• Input batch is initially 2 elements
• Gets expanded by the RoI Pooling Layer to 128 elements
• Outputs 5 “tops”
  • data [images]
  • rois [regions of interest]
  • labels [class labels for the rois]
  • bbox_targets [box regression targets]
  • bbox_loss_weights […details…]
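The sampling done by the data layer can be sketched in NumPy as follows. This is a hypothetical helper, not the layer's actual code: the function name `sample_minibatch_rois` is invented, the foreground/background flags are taken as given (how they are computed is outside this sketch), and the 25% foreground fraction is inferred from the "75% background" figure quoted for the RoIs blob.

```python
import numpy as np

def sample_minibatch_rois(per_image_rois, per_image_is_fg, rois_per_image=64,
                          fg_fraction=0.25, rng=None):
    """Build the R x 5 `rois` blob for a mini-batch of images.

    per_image_rois:  list of (N_i, 4) arrays of [x1, y1, x2, y2] proposals.
    per_image_is_fg: list of (N_i,) boolean arrays marking foreground proposals.
    Returns an array whose rows are [batch_index, x1, y1, x2, y2].
    """
    rng = np.random.default_rng() if rng is None else rng
    blob = []
    for batch_idx, (rois, is_fg) in enumerate(zip(per_image_rois, per_image_is_fg)):
        fg_inds = np.flatnonzero(is_fg)
        bg_inds = np.flatnonzero(~is_fg)
        # Take up to 25% foreground RoIs, fill the rest with background.
        n_fg = min(int(round(fg_fraction * rois_per_image)), fg_inds.size)
        n_bg = min(rois_per_image - n_fg, bg_inds.size)
        keep = np.concatenate([rng.permutation(fg_inds)[:n_fg],
                               rng.permutation(bg_inds)[:n_bg]])
        idx_col = np.full((keep.size, 1), batch_idx, dtype=np.float32)
        blob.append(np.hstack([idx_col, rois[keep].astype(np.float32)]))
    return np.vstack(blob)
```

With 2 images and 64 RoIs each, the result is the 128 x 5 blob, and the first column tells the RoI Pooling Layer which image each region belongs to.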
![Page 18: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/18.jpg)
The train-time net (multi-task losses)
Zoomed area
Classification loss (cross-entropy)
+
Bounding-box regression loss (“Smooth L1”)
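A minimal NumPy sketch of how the two losses add up for one mini-batch. Several details are assumptions made for illustration: labels are passed as integer class indices, both terms are normalized by the number of RoIs, the two terms are weighted equally, and the function name `multi_task_loss` is hypothetical.

```python
import numpy as np

def multi_task_loss(cls_scores, labels, bbox_pred, bbox_targets, bbox_loss_weights):
    """Sum of softmax cross-entropy and Smooth L1 box regression loss.

    cls_scores:        (R, K) raw class scores.
    labels:            (R,) integer class indices (assumed encoding).
    bbox_pred:         (R, 4K) predicted regression values.
    bbox_targets:      (R, 4K) regression targets.
    bbox_loss_weights: (R, 4K) per-dimension weights; zeros switch off
                       entries that should not contribute to the loss.
    """
    n = cls_scores.shape[0]

    # Classification: numerically stable log-softmax, then cross-entropy.
    shifted = cls_scores - cls_scores.max(axis=1, keepdims=True)
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    loss_cls = -log_prob[np.arange(n), labels].mean()

    # Regression: Smooth L1 on the weighted differences.
    diff = bbox_loss_weights * (bbox_pred - bbox_targets)
    abs_diff = np.abs(diff)
    per_elem = np.where(abs_diff < 1.0, 0.5 * diff ** 2, abs_diff - 0.5)
    loss_bbox = per_elem.sum() / n

    return loss_cls + loss_bbox, loss_cls, loss_bbox
```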
![Page 19: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/19.jpg)
Code is on GitHub (MIT License, runs on Linux)
![Page 20: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/20.jpg)
A brief tour of some of the code
Caffe fork
Train, test
Python modules
![Page 21: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/21.jpg)
A brief tour of some of the code (Caffe bits)
Caffe fork
Train, test
Python modules
![Page 22: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/22.jpg)
Region of Interest (RoI) Pooling Layer
Expands a small batch into a big batch
![Page 23: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/23.jpg)
Smooth L1 Loss Layer
Robust to outliers
Optimizer friendly
Per-dimension loss weights
[Plot: L1, L2, and Smooth L1 loss curves]
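The Smooth L1 function, as defined in the Fast R-CNN paper, is quadratic near zero and linear elsewhere. The NumPy sketch below shows the function, its gradient, and a weighted variant; the function names are illustrative and this is not the Caffe layer's code.

```python
import numpy as np

def smooth_l1(x):
    """0.5 * x^2 where |x| < 1, and |x| - 0.5 otherwise."""
    ax = np.abs(x)
    return np.where(ax < 1.0, 0.5 * x ** 2, ax - 0.5)

def smooth_l1_grad(x):
    """Gradient is x inside (-1, 1) and sign(x) outside, so it never exceeds 1."""
    return np.clip(x, -1.0, 1.0)

def weighted_smooth_l1(pred, target, weights):
    """Apply per-dimension weights to the difference before the loss."""
    return smooth_l1(weights * (pred - target)).sum()

errors = np.array([0.1, 0.5, 2.0, 10.0])
print("smooth L1:", smooth_l1(errors))
print("L2 (0.5 x^2):", 0.5 * errors ** 2)
print("smooth L1 grad:", smooth_l1_grad(errors))
```

For a large error such as 10, the L2 loss is 50 with gradient 10, while Smooth L1 is 9.5 with gradient 1. That bounded gradient is what makes the loss robust to outliers, and the quadratic region near zero keeps it smooth for the optimizer.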
![Page 24: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/24.jpg)
A brief tour of some of the code (Python bits)
Caffe fork
Train, test
Python modules
![Page 25: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/25.jpg)
Python data layer for Fast R-CNN
Reshapes blobs on-the-fly
![Page 26: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/26.jpg)
Python training code
Custom solver loop with custom snapshot method
![Page 27: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/27.jpg)
A brief tour of some of the code (CLI tools)
Caffe fork
Train, test
Python modules
![Page 28: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/28.jpg)
![Page 29: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/29.jpg)
![Page 30: Fast R-CNN Object detection with Caffe · 2015-07-21 · Teaser: Faster R-CNN Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research •The detection network also proposes](https://reader033.vdocument.in/reader033/viewer/2022041612/5e3869925c7923323e0cce34/html5/thumbnails/30.jpg)
Teaser: Faster R-CNN
Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun. Microsoft Research
• The detection network also proposes objects
• Marginal cost of proposals: 10ms
• VGG16 runtime ~200ms including all steps
• Higher mAP, faster
• Open-source Caffe code coming later this summer
Region Proposal Network shares conv layers with Fast R-CNN object detection network