Object retrieval with large vocabularies and fast spatial matching
James Philbin¹, Ondrej Chum¹, Michael Isard², Josef Sivic¹, and Andrew Zisserman¹
¹ Department of Engineering Science, University of Oxford
² Microsoft Research, Silicon Valley
CVPR 2007
Overview
• Problem
  – Input: a user-selected region of a query image
  – Return: a ranked list of images retrieved from a large corpus
    • Containing the same object
• Objective
  – A promising step towards “web-scale” image corpora
• Improvement
  – Improving the visual vocabulary
  – Incorporating spatial information into the ranking
  – Examples
Datasets
• Source
  – Flickr
• Oxford 5K dataset
  – “Oxford Christ Church,” “Oxford Radcliffe Camera,” … with “Oxford”
  – 5,062 images (1,024 × 768)
• 100K dataset
  – 145 most popular tags
  – 99,782 images (1,024 × 768)
• 1M dataset
  – 450 most popular tags
  – 1,040,801 images (500 × 333)
Indexing the dataset
• Image description
  – Affine-invariant Hessian regions
    • 3,300 regions on a 1,024 × 768 image
  – SIFT descriptor
    • 128-D
      – 4 × 4 × 8-direction gradient histogram
• Model
  – Bag-of-visual-words
    • Quantize the visual descriptors to index the image
• Search engine
  – L2 distance as similarity
  – tf-idf weighting scheme
    • More commonly occurring = less discriminative = smaller weight
[Figure: 2×2 8-direction gradient histogram]
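The bag-of-visual-words model with tf-idf weighting can be sketched as follows. This is a minimal illustration, not the authors' implementation: images are assumed to be already quantized into lists of visual-word ids, the names `idf_weights`, `tfidf_vector` and `l2_distance` are hypothetical, and normalizing the vectors before comparison is an assumption of this sketch.

```python
import math
from collections import Counter

def idf_weights(corpus):
    """idf(w) = log(N / n_w), where n_w is the number of images containing word w."""
    n_images = len(corpus)
    doc_freq = Counter(w for words in corpus for w in set(words))
    return {w: math.log(n_images / n) for w, n in doc_freq.items()}

def tfidf_vector(words, idf):
    """Sparse, L2-normalized tf-idf vector for one image (word id -> weight)."""
    counts = Counter(words)
    total = sum(counts.values())
    vec = {w: (c / total) * idf.get(w, 0.0) for w, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm > 0 else vec

def l2_distance(a, b):
    """L2 distance between two sparse vectors stored as dicts."""
    keys = set(a) | set(b)
    return math.sqrt(sum((a.get(k, 0.0) - b.get(k, 0.0)) ** 2 for k in keys))

# Toy corpus: each image is a list of visual-word ids (made-up data).
corpus = [
    [1, 2, 2, 7],
    [1, 3, 3, 7],
    [1, 4, 5, 6],
]
idf = idf_weights(corpus)
vectors = [tfidf_vector(words, idf) for words in corpus]

# Rank the corpus images by increasing distance to a query region's words.
query = tfidf_vector([2, 2, 7], idf)
ranking = sorted(range(len(corpus)), key=lambda i: l2_distance(query, vectors[i]))
```

Word 1 occurs in every image, so its idf is zero and it contributes nothing to the ranking, which is the "more common = smaller weight" behaviour listed above.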
Train the Dictionary
k-means
Approximate k-means (AKM)
Hierarchical k-means (HKM)
• Traditional k-means
  – Single iteration
    • O(NK)
• Strategy
  – Reduce the number of candidate nearest cluster centers
  – AKM
    • Approximate nearest neighbor
      – Replace the exact nearest-neighbor computation with
        » 8 randomized k-d trees built over the cluster centers
    • Less than 1% of points are assigned differently from k-means for moderate values of K
  – HKM
    • “Vocabulary tree”
      – A small number (K = 10) of cluster centers at each level
      – K^n clusters at the n-th level
• Quantization effect
  – AKM
    • Conjunction of trees
      – Overlapping partition
  – HKM
    • Points can additionally be assigned to some internal nodes
AKM vs. HKM
[Figure: 2D k-d tree]
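The cost figures above (O(NK) for exact assignment, K^n clusters at level n of a vocabulary tree) can be made concrete with a little arithmetic. The sketch below only counts distance computations. The function names, the descriptor count and the tree depth of 6 are assumptions chosen for illustration, not values taken from the slides.

```python
def exact_assignment_cost(n_points, n_clusters):
    """Distance computations for one exact k-means assignment pass: O(NK)."""
    return n_points * n_clusters

def hkm_leaf_count(branching, depth):
    """Number of clusters at level `depth` of a vocabulary tree: K^n."""
    return branching ** depth

def hkm_assignment_cost(n_points, branching, depth):
    """Descending the tree compares a point against `branching` centers per level."""
    return n_points * branching * depth

n_points = 1_000_000      # hypothetical number of descriptors
branching, depth = 10, 6  # K = 10 per level; the depth is an assumed value

leaves = hkm_leaf_count(branching, depth)
flat_cost = exact_assignment_cost(n_points, leaves)
tree_cost = hkm_assignment_cost(n_points, branching, depth)

print(f"leaf clusters:         {leaves:,}")
print(f"exact assignment cost: {flat_cost:,}")
print(f"tree descent cost:     {tree_cost:,}")
```

Under these assumptions the flat pass needs 10^12 distance computations against 6 × 10^7 for the tree descent, which is why some way of reducing the candidate centers is needed at this vocabulary size.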
Comparing vocabularies
k-means vs. AKM
HKM vs. AKM
Scaling up with AKM
Ground Truth
• Dataset
  – 5K dataset
• Searching
  – Manual
  – Entire dataset
  – For 11 landmarks
• Labels
  – Positive
    • Good: nice, clear
    • OK: more than 25% of the object visible
  – Null
    • Junk: less than 25% of the object visible
  – Negative
    • Absent: object not present
5 queries for each landmark
Evaluation
• Precision
  – # of retrieved positive images / # of total retrieved images
• Recall
  – # of retrieved positive images / # of total positive images
• Average precision (AP)
  – The area under the precision-recall curve for a query
• Mean average precision (mAP)
  – Average the AP over the 5 queries for a landmark
  – Final mAP = average of the per-landmark mAP values
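The evaluation measures above can be sketched in a few lines. This is one possible reading rather than the authors' evaluation script. It assumes that "Null" (Junk) images are skipped as if they were absent from the ranked list and that the area under the precision-recall curve is approximated with the trapezoid rule. The function names are hypothetical.

```python
def average_precision(ranked, positives, junk=frozenset()):
    """Area under the precision-recall curve for one query (trapezoid rule)."""
    old_recall, old_precision = 0.0, 1.0
    ap = 0.0
    hits = 0  # positives retrieved so far
    seen = 0  # non-junk images retrieved so far
    for image in ranked:
        if image in junk:
            continue  # Null images affect neither precision nor recall
        if image in positives:
            hits += 1
        seen += 1
        recall = hits / len(positives)
        precision = hits / seen
        ap += (recall - old_recall) * (old_precision + precision) / 2.0
        old_recall, old_precision = recall, precision
    return ap

def mean_average_precision(ap_by_landmark):
    """Average the APs per landmark, then average those values over landmarks."""
    per_landmark = [sum(aps) / len(aps) for aps in ap_by_landmark.values()]
    return sum(per_landmark) / len(per_landmark)

# Made-up ranked list: "a" and "b" are positive, "j" is junk, "x" is negative.
ap = average_precision(["a", "j", "x", "b"], positives={"a", "b"}, junk={"j"})
perfect = average_precision(["a", "b", "x"], positives={"a", "b"})
overall = mean_average_precision({"landmark_1": [1.0, 0.5], "landmark_2": [0.0]})
```

Only the rank positions of positive images change the result, so a negative image ranked above a positive one lowers the AP while a junk image in the same position does not.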
k-means vs. AKM
HKM vs. AKM
Recognition Benchmark
D. Nistér and H. Stewénius. Scalable recognition with a vocabulary tree. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR), volume 2, pages 2161-2168, June 2006.
Scaling up with AKM
Spatial re-ranking
Use Spatial Info.
• Usage
  – Re-ranking the top-ranked results
• Procedure
  1. Estimate a transformation for each target image
  2. Refine the estimates
     – Reduce the errors due to outliers
     – LO-RANSAC
       » RANdom SAmple Consensus
       » Additional local optimization step for the model
  3. Re-rank target images
     – Score each target image by the sum of the idf values of its inlier words
     – Verified images rank above unverified images
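Step 3 of the procedure above can be sketched as follows, assuming steps 1 and 2 have already produced, for each top-ranked image, the list of visual words that are inliers under the estimated transformation. The function name `spatial_rerank`, the `min_inliers` threshold that decides whether an image counts as verified, and the toy data are all assumptions of this sketch.

```python
def spatial_rerank(initial_ranking, inlier_words, idf, top_n, min_inliers=4):
    """Re-rank the first `top_n` results; the rest keep their original order."""
    head, tail = initial_ranking[:top_n], initial_ranking[top_n:]
    verified, unverified = [], []
    for image in head:
        words = inlier_words.get(image, [])
        if len(words) >= min_inliers:
            # Score = sum of the idf values of the inlier words.
            verified.append((sum(idf[w] for w in words), image))
        else:
            unverified.append(image)
    # Python's sort is stable, so equal scores keep their original relative order.
    verified.sort(key=lambda pair: pair[0], reverse=True)
    return [image for _, image in verified] + unverified + tail

idf = {1: 1.0, 2: 2.0, 3: 0.5}  # made-up idf values
inlier_words = {
    "a": [1, 3, 3, 1],  # score 3.0
    "b": [1],           # too few inliers: stays unverified
    "c": [2, 2, 1, 3],  # score 5.5
}
reranked = spatial_rerank(["a", "b", "c", "d"], inlier_words, idf, top_n=3)
```

Here "c" overtakes "a" on score, "b" drops below both verified images, and "d" stays last because it lies outside the re-ranked prefix.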
Restricted transformation
• Degrees of freedom
  – 3 dof
    • Isotropic scale
    • Covering changes in zoom or distance
  – 4 dof
    • Anisotropic scale
    • Covering foreshortening, either horizontal or vertical
  – 5 dof
    • Anisotropic scale and vertical shear
• NOT
  – In-plane rotation
[Figure labels: foreshortening (perspective), shear]
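One way to write the three restricted transformations is as matrices acting on homogeneous image coordinates $(x, y, 1)^\top$. The parameterization and the symbol names ($s$, $s_x$, $s_y$, $a$, $b$, $c$, $t_x$, $t_y$) are assumptions of this sketch. Each model also includes a translation $(t_x, t_y)$, which accounts for two of the listed degrees of freedom.

```latex
% 3 dof: isotropic scale s plus translation
T_3 = \begin{pmatrix} s & 0 & t_x \\ 0 & s & t_y \\ 0 & 0 & 1 \end{pmatrix}
\qquad
% 4 dof: anisotropic scale (s_x, s_y) plus translation
T_4 = \begin{pmatrix} s_x & 0 & t_x \\ 0 & s_y & t_y \\ 0 & 0 & 1 \end{pmatrix}
\qquad
% 5 dof: anisotropic scale (a, c), vertical shear b, plus translation
T_5 = \begin{pmatrix} a & 0 & t_x \\ b & c & t_y \\ 0 & 0 & 1 \end{pmatrix}
```

In all three the upper-right entry of the 2×2 block is zero, so vertical lines in the query map to vertical lines in the target, consistent with excluding in-plane rotation.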
Comparing spatial rankings
Different transformation types
Large datasets
Examples
Examples of errors
Different transformation types
Large datasets
Examples
Examples of errors
Conclusion
• Conclusion
  – Scalable visual object-retrieval system
• Future work
  – More evaluation at larger scale
  – Including spatial info. in the index
  – Moving some of the burden of spatial matching to the first ranking stage
RANSAC
http://en.wikipedia.org/wiki/RANSAC
RANSAC example
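As a generic illustration of RANSAC (not the spatial matcher used in the paper), the sketch below fits a 2D line to points contaminated by gross outliers. The final least-squares refit on the consensus set is only loosely analogous to the extra optimization step in LO-RANSAC. All function names, the threshold and the data are made up for this example.

```python
import random

def fit_line(p, q):
    """Line y = slope * x + intercept through two points with different x."""
    (x1, y1), (x2, y2) = p, q
    slope = (y2 - y1) / (x2 - x1)
    return slope, y1 - slope * x1

def least_squares_line(points):
    """Ordinary least-squares line through a set of points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def ransac_line(points, threshold, iterations, rng):
    """Keep the largest consensus set over random 2-point hypotheses, then refit."""
    best_inliers = []
    for _ in range(iterations):
        p, q = rng.sample(points, 2)
        if p[0] == q[0]:
            continue  # a vertical pair cannot be written as y = m*x + c
        slope, intercept = fit_line(p, q)
        inliers = [(x, y) for x, y in points
                   if abs(y - (slope * x + intercept)) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    return least_squares_line(best_inliers), best_inliers

# Ten points on y = 2x + 1 plus three gross outliers.
points = [(float(x), 2.0 * x + 1.0) for x in range(10)]
points += [(2.0, 20.0), (5.0, -7.0), (8.0, 30.0)]

(slope, intercept), inliers = ransac_line(
    points, threshold=0.5, iterations=100, rng=random.Random(0))
```

A plain least-squares fit over all thirteen points would be pulled towards the outliers. Restricting the refit to the consensus set avoids that.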