SPP-net: Spatial Pyramid Pooling in Deep Convolutional Networks
Highlights
• ILSVRC 2014 (all provided-data tracks)
  • DET - 2nd
  • CLS - 3rd
  • LOC - 5th
• ECCV 2014 paper
  • Published 2 months ago (arXiv:1406.4729v1, June 18)
  • Details disclosed (arXiv:1406.4729v2)
Overview
• SPP-net - a new network structure
• Classification - improves all CNNs
• Detection - 20-60x faster than R-CNN, as accurate
Spatial Pyramid Matching
• SPM: very successful in traditional computer vision
  [Grauman & Darrell, ICCV 2005] “The Pyramid Match Kernel: Discriminative Classification with Sets of Image Features”
  [Lazebnik et al., CVPR 2006] “Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories”
[Figure: the traditional pipeline (dense SIFT, encoded with VQ, SC or FV -> SPM -> SVM -> prediction) beside its CNN counterparts (“conv layers”, “simply pooling?”, “fc layers”)]
SPP-net: SPM in CNN
[Figure: traditional CNN, fixed-size input -> conv -> fc 4096 -> fc 4096 -> 1000; SPP-net, any-size input -> conv -> spatial pyramid pooling -> fc 4096 -> fc 4096 -> 1000]
• Fix the number of bins
• DO NOT fix the bin size
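A minimal NumPy sketch of the idea above, assuming a single (C, H, W) feature map and max pooling: each pyramid level n always yields an n x n grid of bins, while each bin's extent grows or shrinks with H and W. The function name `spatial_pyramid_pool` and the default levels (4, 2, 1) are illustrative choices, not taken from the slides.

```python
import math

import numpy as np


def spatial_pyramid_pool(fmap, levels=(4, 2, 1)):
    """Max-pool a (C, H, W) feature map into a vector of length C * sum(n * n).

    Hypothetical helper: the bin count per level is fixed, the bin size is not.
    """
    _, height, width = fmap.shape
    pooled = []
    for n in levels:
        for i in range(n):
            # Row extent of bin i: floor(i*H/n) up to ceil((i+1)*H/n), never empty
            r0, r1 = (i * height) // n, math.ceil((i + 1) * height / n)
            for j in range(n):
                c0, c1 = (j * width) // n, math.ceil((j + 1) * width / n)
                # One max per channel for this bin
                pooled.append(fmap[:, r0:r1, c0:c1].max(axis=(1, 2)))
    # Concatenate all bins from all levels into one fixed-length vector
    return np.concatenate(pooled)


# Feature maps of different spatial sizes give vectors of the same length
rng = np.random.default_rng(0)
small = spatial_pyramid_pool(rng.standard_normal((256, 10, 10)))
large = spatial_pyramid_pool(rng.standard_normal((256, 13, 13)))
print(small.shape, large.shape)  # (5376,) (5376,)
```

Because the vector length depends only on the channel count and the levels, the same fc layers can follow conv maps of any size; taking the ceiling for each bin's upper edge keeps every bin non-empty even when the map is smaller than the grid.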
SPP-net
• variable input size/scale
  • multi-size training
  • multi-scale testing
  • full-image view
• multi-level pooling
  • robust to deformation
• operates on feature maps
  • pooling in regions
[Figure: input image -> conv layers -> conv feature maps -> spatial pyramid pooling layer (bins from every level are concatenated) -> fc layers]
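Why the concatenated output can feed fixed-size fc layers, written as a worked formula under the assumption of k conv channels and a pyramid of L levels with n_1, ..., n_L bins per side (symbols introduced here for illustration):

```latex
d = k \sum_{l=1}^{L} n_l^{2}
\qquad\text{e.g. } k = 256,\ \{n_l\} = \{4, 2, 1\}:\quad
d = 256 \times (16 + 4 + 1) = 5376
```

The side length a of the conv feature map only changes the bin size (roughly a/n_l per side), not d, which is what lets one set of fc weights serve multi-size training and multi-scale testing.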
ILSVRC top-5 val error (%), 10-view, 4 architectures
architecture | no-SPP baseline | + multi-level pooling | + multi-size training
ZF-5 | 14.76 | 14.14 | 13.64
Convnet*-5 | 13.92 | 13.54 | 13.33
Overfeat-5 | 13.52 | 12.80 | 12.33
Overfeat-7 | 11.97 | 11.12 | 10.95
All CNNs improved!
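The gains can be read off by subtracting each architecture's full-SPP result from its no-SPP baseline; a short Python check using the per-architecture top-5 numbers on this slide (the dictionary layout is just for illustration):

```python
# Top-5 val error (%), 10-view:
# (no-SPP baseline, + multi-level pooling, + multi-size training)
results = {
    "ZF-5": (14.76, 14.14, 13.64),
    "Convnet*-5": (13.92, 13.54, 13.33),
    "Overfeat-5": (13.52, 12.80, 12.33),
    "Overfeat-7": (11.97, 11.12, 10.95),
}

# Absolute reduction in top-5 error from the baseline to the full SPP setting
gains = {name: round(base - full, 2) for name, (base, _, full) in results.items()}
for name, gain in gains.items():
    print(f"{name}: -{gain:.2f} points")
```

Every architecture improves, by between 0.59 and 1.19 points of top-5 error.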
ILSVRC 2014 CLS Results
• “shallow”
  • 7-conv, 1 Titan GPU, 3 weeks
• but has potential
  • SPP can improve deeper nets: >1% gain post-competition
team | top-5 test error (%)
GoogLeNet | 6.66
Oxford VGG | 7.32
ours | 8.06
Howard | 8.11
DeeperVision | 9.50
NUS-BST | 9.79
TTIC_ECP | 10.22
…
7-conv SPP-net, 10-view: 10.95%
7-conv SPP-net, multi-scale, 96-view + 2 full-view: 9.08%
multiple SPP-nets: 8.06%
Detection: SPP on Regions
[Figure: input image -> conv layers -> conv feature maps -> SPP on each region of the feature maps -> fc layers]
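A sketch of SPP on regions, assuming the conv layers have already run once over the whole image and reduce resolution by a total stride of 16 pixels (an illustrative value). Each region, given in image pixels, is mapped to the feature-map cells it touches and then pooled to a fixed-length vector. The box-to-cell rule and all helper names (`spp`, `box_to_cells`, `pool_region`) are hypothetical simplifications, not the exact projection used by SPP-net.

```python
import math

import numpy as np


def spp(window, levels=(2, 1)):
    """Max-pool a (C, h, w) window into C * sum(n * n) values (fixed bins per level)."""
    _, h, w = window.shape
    out = []
    for n in levels:
        for i in range(n):
            r0, r1 = (i * h) // n, math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = (j * w) // n, math.ceil((j + 1) * w / n)
                out.append(window[:, r0:r1, c0:c1].max(axis=(1, 2)))
    return np.concatenate(out)


def box_to_cells(box, stride, map_h, map_w):
    """Map an image box (x0, y0, x1, y1) in pixels to feature-map row/column ranges.

    Assumption: cell (r, c) covers pixels [r*stride, (r+1)*stride) along each axis;
    the window is every cell the box touches, clipped to the map and never empty.
    """
    x0, y0, x1, y1 = box
    c0 = min(max(math.floor(x0 / stride), 0), map_w - 1)
    r0 = min(max(math.floor(y0 / stride), 0), map_h - 1)
    c1 = min(max(math.ceil(x1 / stride), c0 + 1), map_w)
    r1 = min(max(math.ceil(y1 / stride), r0 + 1), map_h)
    return r0, r1, c0, c1


def pool_region(fmap, box, stride=16, levels=(2, 1)):
    """Pool one region of a shared (C, H, W) feature map into a fixed-length vector."""
    _, map_h, map_w = fmap.shape
    r0, r1, c0, c1 = box_to_cells(box, stride, map_h, map_w)
    return spp(fmap[:, r0:r1, c0:c1], levels)


# One conv pass per image (random stand-in map), then cheap pooling per region
rng = np.random.default_rng(0)
fmap = rng.standard_normal((256, 38, 50))
boxes = [(0, 0, 120, 90), (300, 200, 640, 580), (10, 40, 800, 600)]
features = np.stack([pool_region(fmap, b) for b in boxes])
print(features.shape)  # (3, 1280)
```

The conv layers run once per image; only the cheap pooling step repeats per region, so regions of any shape still yield vectors that fit the same fc layers.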
R-CNN vs. SPP
• image regions vs. feature map regions
[Figure: SPP-net runs 1 net on the full image (image -> net -> feature maps, from which each region's feature is taken); R-CNN runs 2000 nets on image regions (each region -> net -> feature)]
• With regional features, we can do everything R-CNN does
  • fine-tune, SVM, bbox regression…
• similar accuracy, much faster
VOC2007
metric | SPP-net 1-scale | SPP-net 5-scale | R-CNN
mAP (%) | 58.0 | 59.2 | 58.5
GPU time / img | 0.14 s | 0.38 s | 9 s
speed-up vs. R-CNN | 64x | 24x | -
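The speed-up row is R-CNN's per-image GPU time divided by SPP-net's; a quick check of the slide's rounded figures (the `speedup` helper is just for illustration):

```python
def speedup(baseline_seconds, seconds):
    """How many times faster `seconds` is than `baseline_seconds`."""
    return baseline_seconds / seconds


# Per-image GPU times on VOC2007 from this slide (seconds)
rcnn = 9.0
for name, t in [("SPP-net 1-scale", 0.14), ("SPP-net 5-scale", 0.38)]:
    print(f"{name}: {speedup(rcnn, t):.1f}x")  # 64.3x, then 23.7x
```

These round to the 64x and 24x shown in the table.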
metric | SPP-net | R-CNN
GPU time / img | 0.6 s | 32 s
40k test imgs | 8 hours | 15 days
cost of a single model
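Scaling the per-image GPU time by the 40k test images gives totals of the same order as the second row; a small sketch (the `total_hours` helper is hypothetical):

```python
def total_hours(seconds_per_image, num_images):
    """Total GPU time in hours for a test set."""
    return seconds_per_image * num_images / 3600


spp_hours = total_hours(0.6, 40_000)   # about 6.7 hours
rcnn_hours = total_hours(32, 40_000)   # about 355.6 hours
print(f"SPP-net: {spp_hours:.1f} h, R-CNN: {rcnn_hours / 24:.1f} days")
# SPP-net: 6.7 h, R-CNN: 14.8 days
```

The R-CNN product matches the slide's 15 days; for SPP-net it gives about 6.7 hours against the slide's 8 hours, so that figure presumably covers more than the per-image GPU time alone.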
ILSVRC 2014 DET Results
“provided data” track
team | mAP (%)
NUS | 37.2
ours, multiple SPP-nets | 35.1
UvA | 32.0
ours, 1 SPP-net | 31.8
Southeast-CASIA | 30.4
1-HKUST | 28.8
CASIA_CRIPAC_2 | 28.6
• Conclusion
  • SPM in CNNs
  • CLS: improves all CNNs tested from the literature (4 architectures)
  • DET: practical, fast, and accurate
• Future work
  • SPP on advanced networks
• Resources
  • code, config, tech report… http://research.microsoft.com/en-us/um/people/kahe/
• Acknowledgement
  • We thank NVIDIA for the GPU donation.