Large Scale Discovery of Spatially Related Images
Ondřej Chum and Jiří Matas
Center for Machine Perception
Czech Technical University
Prague
Related Vision Problems
• Organize my holiday snapshots – Schaffalitzky and Zisserman ECCV’02
• Find images containing a given “object” (“window”) – Sivic ICCV’03, Nister CVPR’06, Jegou CVPR’07, Philbin CVPR’07, Chum ICCV’07
• Find a small “object” in a film – Sivic and Zisserman CVPR’04
• Match and reconstruct Saint Marco – Snavely, Seitz and Szeliski SIGGRAPH’06
This Work
• Find and match ALL spatially related images in a large database, using only visual information, i.e. not using (Flickr) tags, EXIF info, GPS, …
O.Chum, J. Matas: Large Scale Discovery of Spatially Related Images
Visual Only Approach
• Large database (100 000 images in our experiments)
• Find spatially related clusters
• Fast method, even for sizes up to 2^50 images
• Probability of successful discovery of spatial relation of images independent of database size
Image Clustering and its Time Complexity
Standard approach (using image retrieval):
Quadratic method in the size of the database D: O(D^2). The multiplicative constant at the quadratic term is ~ 1, so the method is quadratic even for small D.
1. Take each image in turn
2. Use an image retrieval system to retrieve related images
3. Compute connected components of the graph
Proposed method:
1. Seed generation – hashing
   characterize images by pseudo-random numbers stored in a hash table
   time complexity equal to the sum of variances of Poisson distributions; linear for database size D ≈ 2^50
2. Seed growing – retrieval
   complete the clusters only for cluster members; c << D, complexity O(cD)
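Step 3 of the standard approach can be sketched with a small union-find. This is only an illustration: the retrieval of steps 1 and 2 is replaced by a hypothetical, precomputed list `related_pairs`, and the function name is an assumption rather than anything from the slides.

```python
from collections import defaultdict


def connected_components(num_images, related_pairs):
    """Group image indices 0..num_images-1 into clusters.

    related_pairs is a hypothetical list of (i, j) index pairs that a
    retrieval system reported as related (steps 1 and 2).
    """
    parent = list(range(num_images))

    def find(i):
        # Follow parent links to the root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in related_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = defaultdict(list)
    for i in range(num_images):
        groups[find(i)].append(i)
    return sorted(groups.values())


clusters = connected_components(6, [(0, 1), (1, 2), (4, 5)])
print(clusters)
```

The quadratic cost of the standard approach comes from steps 1 and 2 (one retrieval query per image), not from this grouping step.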
Building on Two Methods
• Fast (low recall) seed generation based on hashing
  Chum, Philbin, Isard, and Zisserman: Scalable Near Identical Image and Shot Detection, CIVR 2007
• Thorough (high recall) seed growing based on image retrieval
  Chum, Philbin, Sivic, Isard, and Zisserman: Total Recall: Automatic Query Expansion with a Generative Feature Model for Object Retrieval, ICCV 2007
Image Representation
Feature detector → SIFT descriptor [Lowe’04] → Vector quantization (visual vocabulary)
Bag of words: (0, 4, 0, 2, ...)
Set of words: (0, 1, 0, 1, ...)
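The pipeline above can be illustrated with a toy example. This is a minimal sketch under stated assumptions: the descriptors are 2-D points instead of SIFT vectors, the four-word vocabulary is made up, and the function names are hypothetical.

```python
from collections import Counter


def quantize(descriptor, vocabulary):
    """Index of the nearest visual word (squared Euclidean distance)."""
    return min(
        range(len(vocabulary)),
        key=lambda w: sum((d - c) ** 2 for d, c in zip(descriptor, vocabulary[w])),
    )


def bag_of_words(descriptors, vocabulary):
    """Histogram: how many descriptors fall on each visual word."""
    counts = Counter(quantize(d, vocabulary) for d in descriptors)
    return [counts.get(w, 0) for w in range(len(vocabulary))]


def set_of_words(bag):
    """Binary vector: 1 if the visual word occurs at all, else 0."""
    return [1 if count > 0 else 0 for count in bag]


# Toy vocabulary of four 2-D "visual words" (an assumption for illustration).
vocabulary = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]
descriptors = [(0.9, 0.1), (1.1, 0.0), (0.8, -0.1), (1.0, 0.2), (0.9, 0.9), (1.1, 1.2)]

bag = bag_of_words(descriptors, vocabulary)
words = set_of_words(bag)
print(bag, words)
```

The toy data is chosen so that the result matches the vectors on the slide: a bag of words (0, 4, 0, 2) and a set of words (0, 1, 0, 1).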
Hypothesizing Seeds with min-Hash
Image similarity measured as a set overlap (using min-Hash algorithm):
sim(A1, A2) = |A1 ∩ A2| / |A1 ∪ A2|
• Spatially related images share visual words
• Problem: robustly estimate set overlap of high-dimensional sparse binary vectors in constant time, independent of the dimensionality (d ≈ 10^5)
• Set overlap probabilistically estimated via min-Hash
• Similar approach to LSH (locality-sensitive hashing)
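For reference, the exact set overlap (intersection over union) that min-Hash approximates can be computed directly. A minimal sketch, with images represented as Python sets of visual-word indices; the function name is an assumption:

```python
def set_overlap(a, b):
    """Exact set overlap |A ∩ B| / |A ∪ B| of two sets of visual words."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty sets
    return len(a & b) / len(union)


print(set_overlap({1, 2, 3}, {2, 3, 4}))
```

This exact computation takes time proportional to the sizes of the sets and has to be repeated for every image pair, which is what the probabilistic estimate avoids.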
min-Hash
Four sparse binary vectors:
0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 1
0 0 1 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 0 0
1 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 1 1 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
0 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
• According to some (replicable) key, select a small number of non-zero elements
• Similar vectors should have similar selected elements
• Key = generate a random number (a hash) for each dimension, choose the non-zero element with the minimal value of the key
min-Hashes of the four vectors (one row per vector):
29 12 19
26 3 26
29 12 1
35 27 7
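The selection rule above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: vectors are stored as sets of non-zero indices, the keys come from Python's seeded `random.Random`, and all function names are assumptions.

```python
import random


def make_keys(dim, num_hashes, seed=0):
    """One replicable random key per dimension, for each hash function."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(num_hashes)]


def min_hashes(words, keys):
    """For each hash function, pick the non-zero element with the minimal key.

    words is a non-empty set of indices of the non-zero elements.
    """
    return [min(words, key=lambda w: key[w]) for key in keys]


def estimate_overlap(sketch_a, sketch_b):
    """Fraction of hash functions on which the two min-Hashes agree."""
    agree = sum(a == b for a, b in zip(sketch_a, sketch_b))
    return agree / len(sketch_a)


keys = make_keys(dim=100, num_hashes=2000, seed=1)
a = set(range(0, 60))
b = set(range(30, 90))  # exact overlap with a: 30 / 90 = 1/3
estimate = estimate_overlap(min_hashes(a, keys), min_hashes(b, keys))
print(estimate)
```

For a single hash function, the two min-Hashes agree exactly when the element of the union with the smallest key lies in the intersection, so the agreement rate estimates the set overlap.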
Seed Generation: Probability of Success
An image pair forms a seed if at least one of k s-tuples of min-Hashes agrees.
Probability that an image pair is retrieved is a function of the similarity:
P(retrieval) = 1 − (1 − sim^s)^k
where s, k are user-controllable parameters of the method:
• s governs the size of the hashing table
• k is the number of hashing tables
Successfully retrieved pair of images = at least one collision in one of the tables (equivalent to AND-OR)
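The AND-OR rule above can be sketched as follows, assuming each min-Hash of a pair agrees independently with probability equal to the set overlap. The function names, the data layout (a dict from image id to a list of min-Hashes) and the values s = 3, k = 512 in the example are assumptions for illustration; the slides do not state the settings used.

```python
from collections import defaultdict
from itertools import combinations


def retrieval_probability(sim, s, k):
    """P(at least one of k tables has all s min-Hashes agreeing)."""
    return 1.0 - (1.0 - sim ** s) ** k


def candidate_seeds(sketches, s, k):
    """Pairs of images that collide in at least one of the k hash tables.

    sketches maps an image id to a list of at least s * k min-Hashes;
    table t is keyed by the t-th s-tuple of each sketch.
    """
    seeds = set()
    for t in range(k):
        table = defaultdict(list)
        for image_id, sketch in sketches.items():
            table[tuple(sketch[t * s:(t + 1) * s])].append(image_id)
        for bucket in table.values():
            seeds.update(combinations(sorted(bucket), 2))
    return seeds


# Illustrative parameter values only (not taken from the slides).
p = retrieval_probability(0.05, s=3, k=512)
print(round(p, 3))

toy = {"a": [1, 2, 3, 4], "b": [1, 2, 9, 9], "c": [7, 7, 3, 5]}
print(candidate_seeds(toy, s=2, k=2))
```

With these assumed values a similarity of 0.05 gives roughly 6.2 %. Larger s makes each table more selective, while larger k gives a pair more chances to collide.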
Probability of Retrieving an Image Pair
[Figure: probability of retrieval plotted against similarity (set overlap), with example image pairs]
Near duplicate images: 100 % (sim = 0.746), 100 % (sim = 0.322), 99.5 % (sim = 0.217)
Images of the same object and unrelated images: 13.9 % (sim = 0.066), 8.9 % (sim = 0.057), 5.1 % (sim = 0.047)
Spatially Related Images
[Figure: probability of retrieval (log scale) plotted against similarity (set overlap), with example pairs of spatially related images: 18.9 % (sim = 0.074), 5.1 % (sim = 0.047); further pairs labelled 13.9 %, 8.9 %, 5.1 %, 9.8 %, 7.2 %, 8.9 %, 13.9 %, 16.3 %, 10.7 %]
Seed Generation
[Figure: a cluster of images with pairwise retrieval probabilities 10 %, 7 %, 4 %, 5 %, 4 %, 6 %]
P(no seed) = 94.00 %, 85.73 %, 68.88 %
Seed Generation
P(no seed) = 68.88 %, 55.13 %, 31.84 %, 1.94 %
Resemblance to RANSAC
Related image pair ~ an all-inlier sample (there is no need to enumerate them all, one hit is sufficient)
Probability of retrieving an image pair ~ fraction of inliers
The number of related image pairs ~ how many times we can try
At Least One Seed in Cluster
[Figure: P(no seed) plotted against cluster size for similarity 0.05, 0.06 and 0.07 (= probability of retrieval 6.2 %, 10.4 % and 16.1 %)]
Estimate of the probability of failure plotted against the size of the cluster.
Assumption used in this plot: all images in the cluster are related.
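The shape of such a curve can be reproduced with a short sketch. It uses the stated assumption that all images in the cluster are related, plus two further assumptions of this example: every pair is retrieved with the same probability, and the pairs are retrieved independently. The function name is hypothetical.

```python
from math import comb


def p_no_seed(cluster_size, p_pair):
    """Probability that none of the C(n, 2) image pairs forms a seed."""
    return (1.0 - p_pair) ** comb(cluster_size, 2)


for n in (2, 4, 8, 16):
    print(n, round(p_no_seed(n, 0.104), 4))
```

The number of pairs grows quadratically with the cluster size, so the failure probability drops quickly. For example, a cluster of four mutually related images has six pairs, and with a pair retrieval probability of 10.4 % the estimate is roughly 0.52.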
Growing the Seed
• Application of Total Recall
  – Combining average query expansion and transitive closure
  – 3D geometric constraint (not only affine transformation)
  – Tighter geometric constraints (10 pixel threshold)
[Figure: query → backproject features → enhanced query]
Average query expansion (from possibly multiple coplanar structures)
Transitive closure crawl
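The transitive closure crawl can be sketched as a breadth-first traversal. This is a simplified illustration: `retrieve` is a hypothetical callable standing in for a spatially verified image query, and average query expansion is not modelled.

```python
from collections import deque


def grow_seed(seed_pair, retrieve):
    """Collect every image reachable from the seed through retrieval results.

    retrieve(image) is assumed to return the images that the retrieval
    system reports as spatially related to the query image.
    """
    cluster = set(seed_pair)
    queue = deque(seed_pair)
    while queue:
        image = queue.popleft()
        for other in retrieve(image):
            if other not in cluster:
                cluster.add(other)
                queue.append(other)  # newly found images are queried in turn
    return cluster


# Toy retrieval results (made up for illustration).
toy_results = {"a": ["b"], "b": ["a", "c"], "c": ["b"], "d": ["e"], "e": ["d"]}
cluster = grow_seed(("a", "b"), lambda image: toy_results.get(image, []))
print(sorted(cluster))
```

Each image is queried at most once, so the number of retrieval queries equals the number of cluster members, in line with the O(cD) cost quoted for seed growing.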
Summary of the Method
[Diagram labels: images; unknown structure; min-Hash seeds; spatial verification (x = rejected seed); query expansion; seed; cluster skeleton; missed cluster; failed retrieval]
Experiment 1: Univ. of Kentucky Dataset [Nister & Stewenius]
2550 clusters of size 4 – very small clusters
“Partial” ground truth: “different” clusters share the same background
How many clusters have at least one seed? 46.9 %
CONTRAST – DIFFERENT TASK: if we were looking for ALL results, not ANY (seed), the standard retrieval measure on this dataset would be only 1.63 out of 4
Experimental Validation UKY dataset
[Figure: P(no seed) plotted against cluster size for similarity 0.05, 0.06 and 0.07 (= probability of retrieval 6.2 %, 10.4 % and 16.1 %); the measured result is marked with +]
In the University of Kentucky dataset the “average” similarity is slightly above 0.06
Experimental Results on 100k Images
Hertford
Keble
Magdalen
Pitt Rivers
Radcliffe Camera
All Souls
Ashmolean
Balliol
Bodleian
Christ Church
Cornmarket
Images downloaded from Flickr
Includes 11 Oxford landmarks with manually labelled ground truth
Experimental Results on 100k Images
Settings scalable to millions of images, also finding small clusters
Settings scalable to billions of images, only finding larger clusters
Timing: 17 min 13 sec + 16 min 20 sec = 0.019 sec / image
Application – Object Labelling
Factorizing the clusters using multiple constraints
• Matches between images
• Weak geometric constraints (coplanarity, disparity)
• Photographer’s psychology – tends to take pictures of single objects
Automatic 3D Reconstruction
Conclusions
• Novel method for fast clustering in large collections
• Combines a fast low recall method (seed generation) and a thorough (total recall) method for seed growing
• Probability of finding a cluster rapidly increases with its size and is independent of the size of the database
• Can be incrementally updated as the database grows
• Efficient: 0.019 sec / image on a single PC
• Fully parallelizable
• State-of-the-art near duplicate detection comes as a bonus (as a part of seed generation)
Thank you!
Thanks to Daniel Martinec, Michal Perďoch, James Philbin, Jakub Pokluda
Technical Report available: http://cmp.felk.cvut.cz/~chum/papers/Chum-TR-08.pdf