relja arandjelović and andrew zisserman · 2014. 10. 28. · visual vocabulary with a semantic...
Visual vocabulary with a semantic twist
Relja Arandjelović and Andrew Zisserman
Visual Geometry Group, Department of Engineering Science, University of Oxford
Motivation and objectives
Semantic vocabulary
Results
Fast Semantic Segmentation via Soft Segments (FSSS)
(paper within a paper)
• Standard large-scale instance retrieval:
- Usually based on matching local descriptors, e.g. (Root)SIFT
- Not distinctive enough
- Can't "see the big picture"
• SemanticSIFT:
- Matching: exploit the semantic content of local image regions
(Figure: "before" / "after" panels)
• Suppose we have pixel-wise semantic segmentation into C classes
• Assign a "semantic word" to a local image patch:
- The patch contains semantic class c if it contains at least one pixel of class c
- Number of possible semantic words: $K_s = 2^C - 1$ (one per non-empty subset of the C classes)
- For our choice {sky, flora, other} ($C = 3$) there are $K_s = 2^3 - 1 = 7$ semantic words: {sky}, {flora}, {other}, {sky, flora}, {sky, other}, {flora, other}, {sky, flora, other}
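The assignment rule above can be sketched in Python. This is a minimal illustration, not the authors' code: it assumes the segmentation provides an integer label per pixel (0 = sky, 1 = flora, 2 = other, an ordering chosen here for illustration) and encodes each semantic word as a bitmask, so the $2^C - 1$ non-empty subsets map to the integers 1 to $2^C - 1$. The function names are hypothetical.

```python
from typing import Sequence, Set

# Hypothetical label ordering for the poster's C = 3 classes.
CLASSES = ("sky", "flora", "other")


def semantic_word(patch: Sequence[Sequence[int]], num_classes: int = len(CLASSES)) -> int:
    """Bitmask of the classes present in a patch of per-pixel labels.

    Bit c is set iff at least one pixel has class c, so every non-empty
    subset of classes gets a distinct word in 1 .. 2**num_classes - 1.
    """
    present = {label for row in patch for label in row}
    if not present:
        raise ValueError("an empty patch has no semantic word")
    word = 0
    for c in present:
        if not 0 <= c < num_classes:
            raise ValueError(f"unexpected class label {c}")
        word |= 1 << c  # class c occurs somewhere in the patch
    return word


def word_to_classes(word: int) -> Set[str]:
    """Decode a semantic word back into the set of class names it contains."""
    return {name for c, name in enumerate(CLASSES) if word & (1 << c)}


# A 3x3 patch containing only sky (0) and flora (1) pixels.
patch = [[0, 0, 1],
         [0, 1, 1],
         [1, 1, 1]]
print(semantic_word(patch))                           # 3
print(sorted(word_to_classes(semantic_word(patch))))  # ['flora', 'sky']
```

The bitmask makes "identical semantic words" a single integer comparison and gives a dense range of word ids.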
Matching
Product vocabulary
Feature removal
• Patches can match only if their semantic words are identical
• Win #1: Increases precision due to stricter matching
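Restricting matches to identical semantic words can be sketched as a filter on tentative matches. This is a brute-force illustration under assumed inputs: each feature is already quantized to a visual word and tagged with a bitmask semantic word (bit c set when class c is present). The `Feature` type and `tentative_matches` function are hypothetical, and a real large-scale system would find matches through the inverted index rather than by comparing all pairs.

```python
from typing import List, NamedTuple, Tuple


class Feature(NamedTuple):
    visual_word: int    # index into the visual vocabulary
    semantic_word: int  # bitmask: bit c set if class c occurs in the patch


def tentative_matches(query: List[Feature], db: List[Feature],
                      use_semantics: bool = True) -> List[Tuple[int, int]]:
    """Index pairs (i, j) whose visual words agree; with use_semantics=True
    the semantic words must also be identical (the stricter rule)."""
    return [
        (i, j)
        for i, q in enumerate(query)
        for j, d in enumerate(db)
        if q.visual_word == d.visual_word
        and (not use_semantics or q.semantic_word == d.semantic_word)
    ]


SKY, OTHER = 0b001, 0b100
query = [Feature(5, SKY), Feature(9, OTHER)]
db = [Feature(5, SKY), Feature(5, OTHER), Feature(9, OTHER)]
print(tentative_matches(query, db, use_semantics=False))  # [(0, 0), (0, 1), (1, 2)]
print(tentative_matches(query, db))                       # [(0, 0), (1, 2)]
```

The dropped pair (0, 1) is the kind of match the semantic constraint rules out: the same visual word, but a sky-only patch against a patch with no sky.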
• SemanticSIFT vocabulary: the product of the visual and semantic vocabularies, of size $K = K_\text{semantic} \times K_\text{visual}$
• Large-scale retrieval: ranking via an inverted index, which exploits bag-of-words sparsity
- Larger vocabulary => shorter posting lists => fewer items to traverse during scoring => faster retrieval
• Win #2: Faster retrieval due to the larger (product) vocabulary
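A minimal sketch of the product vocabulary and its effect on the inverted index, assuming each database feature is stored as a hypothetical `(image_id, visual_word, semantic_word)` triple with the semantic word as a bitmask in 1 to $K_s$. Packing a pair as `(semantic_word - 1) * k_visual + visual_word` is one convenient way (an assumption, not from the poster) to give each of the $K_\text{semantic} \times K_\text{visual}$ product words a distinct id. `traversal_cost` counts the postings visited for a query, which is the quantity that shorter posting lists reduce.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# Hypothetical feature record: (image_id, visual_word, semantic_word)
Feature = Tuple[int, int, int]
K_VISUAL = 4  # toy visual vocabulary size


def product_word(visual_word: int, semantic_word: int, k_visual: int = K_VISUAL) -> int:
    """Dense id in [0, K_s * k_visual) for a (visual, semantic) pair."""
    return (semantic_word - 1) * k_visual + visual_word


def build_index(features: List[Feature], key: Callable[[int, int], int]) -> Dict[int, List[int]]:
    """Inverted index: word id -> posting list of image ids."""
    index: Dict[int, List[int]] = defaultdict(list)
    for image_id, v, s in features:
        index[key(v, s)].append(image_id)
    return index


def traversal_cost(index: Dict[int, List[int]], query_words: List[int]) -> int:
    """Number of postings visited when scoring a query."""
    return sum(len(index.get(w, [])) for w in query_words)


db = [(0, 2, 1), (1, 2, 1), (2, 2, 4), (3, 2, 2), (4, 3, 4)]
visual_index = build_index(db, lambda v, s: v)
product_index = build_index(db, product_word)

# One query feature: visual word 2 in a sky-only patch (semantic word 1).
print(traversal_cost(visual_index, [2]))                    # 4
print(traversal_cost(product_index, [product_word(2, 1)]))  # 2
```

The visual word 2 posting list is split four ways by semantic word, so the query only visits the two postings whose semantic word also matches.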
• For a specific task: some features are not useful, or even detrimental
• Can remove features a priori known to be irrelevant
• Win #3: Reduced storage (RAM) costs
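Feature removal amounts to filtering the database before it is indexed. Which semantic words count as irrelevant is task-specific and not stated here, so the set below is purely a hypothetical choice (for instance, discarding patches that contain only sky and/or flora when retrieving buildings). Features use the same assumed `(image_id, visual_word, semantic_word)` layout with bitmask semantic words (sky = 1, flora = 2, other = 4).

```python
from typing import Iterable, List, Set, Tuple

Feature = Tuple[int, int, int]  # (image_id, visual_word, semantic_word)

SKY, FLORA, OTHER = 0b001, 0b010, 0b100
# Hypothetical task-specific choice: patches with no "other" pixels are dropped.
IRRELEVANT_WORDS: Set[int] = {SKY, FLORA, SKY | FLORA}


def remove_irrelevant(features: Iterable[Feature],
                      irrelevant: Set[int] = IRRELEVANT_WORDS) -> List[Feature]:
    """Keep only features whose semantic word is not marked irrelevant."""
    return [f for f in features if f[2] not in irrelevant]


db = [(0, 5, SKY), (0, 6, OTHER), (1, 5, SKY | FLORA), (1, 7, FLORA | OTHER)]
kept = remove_irrelevant(db)
print(kept)                                       # [(0, 6, 4), (1, 7, 6)]
print(f"stored features: {len(kept)}/{len(db)}")  # stored features: 2/4
```

Every removed feature is one posting that never has to be stored in RAM or traversed at query time.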
Win-Win-Win
• Testing on the Oxford 5k and 105k datasets, training on Paris 6k
• Baseline: Hamming Embedding + burstiness
• Over 5 random seeds: +1.2% mAP
• Baseline with a 7× larger visual vocabulary, Oxford 5k: 54.9% mAP
• Expected speedup for an average query on Oxford 105k with SoftSemanticSIFT: 38.4%
(Figure: mean average precision (mAP))
(Figure: empirical speedup for the 55 Oxford queries)
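One way to estimate an expected speedup of this kind from database statistics, sketched under assumptions the poster does not state: query features are drawn from the same distribution as database features, and scoring time is proportional to the number of postings traversed. A query feature then falls in word $w$ with probability $n_w / N$ and traverses $n_w$ postings, so the expected cost is $\sum_w n_w^2 / N$. This is not necessarily how the reported 38.4% was computed; the function names are hypothetical.

```python
from collections import Counter
from typing import List, Tuple


def expected_postings(counts: Counter) -> float:
    """Expected posting-list length for one query feature: sum_w n_w^2 / N."""
    total = sum(counts.values())
    return sum(n * n for n in counts.values()) / total


def expected_time_reduction(db: List[Tuple[int, int]]) -> float:
    """Fractional reduction in postings traversed when moving from the visual
    vocabulary to the product vocabulary. db holds one (visual_word,
    semantic_word) pair per database feature."""
    visual = Counter(v for v, _ in db)
    product = Counter(db)
    return 1.0 - expected_postings(product) / expected_postings(visual)


db = [(2, 1), (2, 1), (2, 4), (2, 2), (3, 4)]
print(round(expected_time_reduction(db), 3))  # 0.588
```

In this toy database the visual vocabulary gives $(4^2 + 1^2)/5 = 3.4$ expected postings per query feature and the product vocabulary $(2^2 + 1 + 1 + 1)/5 = 1.4$, a reduction of $1 - 1.4/3.4 \approx 0.588$.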
• State-of-the-art semantic segmentation methods take minutes per image
• We introduce a new method that takes 7 seconds on a single CPU in MATLAB for a 500×500 pixel image
• Code available: www.robots.ox.ac.uk/~vgg/software/fast_semantic_segmentation
• Idea:
- Start with the fast soft-segmentation method of Leordeanu et al. (ECCV 2012), which takes 1.7 s
- To handle segmentation uncertainty: introduce an "unknown" class and allow it to match all classes
- Minimize an energy that encourages agreement between soft segments and similar pixels, taking into account the soft-segment unary potentials
• Results:
- Stanford background dataset: 78% @ 3.7 s / image
- State-of-the-art: Lempitsky et al. (2011): 81.9% @ minutes per image, due to using globalPb
- Tighe & Lazebnik (2010): 77.5% @ 10 min / image
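The "unknown" class from the FSSS idea list can be illustrated as a compatibility test between semantic words. The rule below is a hypothetical reading, not taken from the poster: each known class in one word must also appear in the other word, unless the other word contains "unknown", which may stand in for any class. When neither word contains "unknown", it reduces to requiring identical semantic words.

```python
from typing import FrozenSet

UNKNOWN = "unknown"


def compatible(a: FrozenSet[str], b: FrozenSet[str]) -> bool:
    """Hypothetical matching rule with an 'unknown' wildcard class."""
    def covered(src: FrozenSet[str], dst: FrozenSet[str]) -> bool:
        # Every known class of src must appear in dst, unless dst is uncertain.
        return UNKNOWN in dst or (src - {UNKNOWN}) <= dst
    return covered(a, b) and covered(b, a)


w = frozenset
print(compatible(w({"sky"}), w({"sky"})))               # True
print(compatible(w({"sky"}), w({"flora"})))             # False
print(compatible(w({"unknown"}), w({"flora"})))         # True
print(compatible(w({"sky", "unknown"}), w({"other"})))  # False
```

Treating "unknown" as a partial wildcard keeps uncertain patches matchable without letting a confidently labelled class (here sky) match a patch that lacks it.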
(Figure: false SIFT matches, with no geometric verification, that are removed by semantic filtering)