![Page 1: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/1.jpg)
Collective Vision: Using Extremely Large Photograph Collections
Mark Lenz
CameraNet Seminar
University of Wisconsin – Madison
January 26, 2010
Acknowledgments: These slides combine and modify slides provided by Yantao Zheng et al. (National University of Singapore/Google)
![Page 2: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/2.jpg)
Introduction
• Distributed Collaboration
• Google Goggles
  – Personal object recognition
• World-Wide Landmark Recognition
• Building Rome in a Day
  – Distributed matching and reconstruction
![Page 3: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/3.jpg)
Distributed Collaboration
• Disaster or emergency
  – Time is of the essence
• Telecommunication networks down
• No maps or GPS
What can we do to help ourselves and those around us?
![Page 4: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/4.jpg)
Mobile Phones for Distributed Collaboration
• Camera for collecting visual information
• Ad-hoc wireless LAN
  – e.g. Bluetooth
Goals:
  – Determine location, exits and hazardous paths
Have I or someone else been here before?
![Page 5: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/5.jpg)
Model Scenarios
• Firefighters
• Trapped miners
• Natural disasters
  – Large population exodus
  – Building collapse
Multiple agents collaborating to traverse an unknown environment
![Page 6: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/6.jpg)
Google Goggles
• Visual search using a picture as the query
• Combination of algorithms
  – Object recognition
  – Optical character recognition
  – Geo-location (GPS & compass)
• Identify
  – Books and products
  – Businesses and landmarks
![Page 7: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/7.jpg)
A World-Wide Landmark Recognition Engine with Web Learning
• Goal: Build a landmark recognition engine at earth-scale
![Page 8: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/8.jpg)
Challenge I
No list of landmarks in the world
We only have noisy data on the Internet:
• Tourist web articles
• Tourist photos
• Geographical location
![Page 9: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/9.jpg)
Challenge II
How to learn landmark visual models
Image search engine
Photo-sharing websites
![Page 10: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/10.jpg)
Challenge III
• Efficiency
  – Learning from enormous data
  – Recognizing from a huge model
![Page 11: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/11.jpg)
Discovering landmarks in the world
Two approaches:
• Photos on photo-sharing websites (geo-tagged)
• Online tourist articles (landmark names)
![Page 12: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/12.jpg)
Learning landmarks from GPS-Tagged photos
GPS-tagged photos
• 20M images from picasa.com and panoramio.com
Geo-clustering
• Geo cluster = landmark?
• Validate by photo authors
Noisy image pool
Visual clustering
• Graph clustering based on local features
• Validate by photo authors
Analyzing text tags
• Compute frequency of n-grams of text tags
Premise: landmark photos are
• geographically adjacent
• visually similar
• uploaded by different users
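The geo-clustering and author-validation steps can be sketched roughly as follows. This is a minimal illustration, not the method used in the slides: the clustering rule (linking photos within a fixed radius), the `radius_km` and `min_authors` thresholds, and the photo record layout are all assumptions made for the example, and the pairwise loop would not scale to 20M photos.

```python
import math

def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def geo_clusters(photos, radius_km=0.2, min_authors=2):
    """Link photos closer than radius_km (assumed rule), then keep only
    clusters whose photos come from at least min_authors distinct users."""
    parent = list(range(len(photos)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(photos)):
        for j in range(i + 1, len(photos)):
            if haversine_km(photos[i]["gps"], photos[j]["gps"]) <= radius_km:
                parent[find(i)] = find(j)

    groups = {}
    for i, photo in enumerate(photos):
        groups.setdefault(find(i), []).append(photo)

    # "Validate by photo authors": a cluster from a single user is discarded.
    return [g for g in groups.values()
            if len({p["author"] for p in g}) >= min_authors]

# Hypothetical toy data: two users near one spot, one user alone elsewhere,
# and one user who uploaded two photos of the same place.
photos = [
    {"gps": (48.8584, 2.2945), "author": "alice"},
    {"gps": (48.8585, 2.2946), "author": "bob"},
    {"gps": (35.0000, 139.0000), "author": "carol"},
    {"gps": (40.0000, -3.0000), "author": "dave"},
    {"gps": (40.0001, -3.0001), "author": "dave"},
]
landmark_candidates = geo_clusters(photos)
```

Only the first two photos survive in this toy run, because the other clusters have a single author each.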
![Page 13: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/13.jpg)
Landmarks from GPS-Tagged photos
• ~20 million GPS-tagged photos
• 140k geo-clusters and 14k visual clusters
• 2240 landmarks from 812 cities in 104 countries
  – Biased distribution, mostly in Europe

| Country | Landmarks |
|---|---|
| United States | 263 |
| Spain | 194 |
| Italy | 183 |
| France | 141 |
| United Kingdom | 136 |
| Greece | 51 |
| Portugal | 48 |
| Russia | 45 |
| Austria | 42 |
![Page 14: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/14.jpg)
Learning landmarks from tourist web articles
Explore article corpus in wikitravel.com
Assume a geographical hierarchy
Landmark mining = named entity extraction
HTML is a structure tree
• Node: an HTML tag
• Value: text
Classify each tree node based on semantic clues embedded in the document structure
![Page 15: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/15.jpg)
Learning landmarks from tourist web articles
Heuristic rules:
• Nodes are in a "To See" or "See" section
• Nodes are children of "bullet list" nodes
• Nodes indicate bold font format
Extract all named entities as landmark candidates
Validate by visual models
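The heuristic rules on this slide could be sketched with Python's standard `html.parser` module as below. This is only an illustration under assumptions: the heading tags checked (`h1` to `h3`), the class name `LandmarkCandidateParser`, and the sample HTML are hypothetical, and taking the bold text verbatim stands in for a real named-entity extraction step.

```python
from html.parser import HTMLParser

HEADINGS = ("h1", "h2", "h3")  # assumed section-heading tags

class LandmarkCandidateParser(HTMLParser):
    """Collects bold text inside bullet-list items of a "See" / "To See" section."""

    def __init__(self):
        super().__init__()
        self.stack = []            # currently open tags
        self.in_see_section = False
        self.heading_text = None   # text buffer while inside a heading
        self.bold_text = None      # text buffer while inside a qualifying <b>
        self.candidates = []

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        if tag in HEADINGS:
            self.heading_text = []
        elif tag == "b" and self.in_see_section and "li" in self.stack:
            self.bold_text = []

    def handle_endtag(self, tag):
        if tag in HEADINGS and self.heading_text is not None:
            title = "".join(self.heading_text).strip().lower()
            self.in_see_section = title in ("see", "to see")
            self.heading_text = None
        elif tag == "b" and self.bold_text is not None:
            name = "".join(self.bold_text).strip()
            if name:
                self.candidates.append(name)
            self.bold_text = None
        if tag in self.stack:
            # Pop up to and including the matching open tag.
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if self.heading_text is not None:
            self.heading_text.append(data)
        if self.bold_text is not None:
            self.bold_text.append(data)

# Hypothetical article fragment for illustration.
sample = (
    "<h2>See</h2><ul><li><b>Eiffel Tower</b> - iron tower</li>"
    "<li><b>Louvre</b></li></ul><p><b>Not in a list</b></p>"
    "<h2>Eat</h2><ul><li><b>Cafe X</b></li></ul>"
)
parser = LandmarkCandidateParser()
parser.feed(sample)
candidates = parser.candidates
```

The candidates found this way would still need the visual-model validation that the slide mentions.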
![Page 16: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/16.jpg)
Learning landmarks from tourist web articles
~7000 landmarks from 787 cities in 145 countries
More evenly distributed
![Page 17: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/17.jpg)
Unsupervised learning of landmark images
Geo-clusters
Landmarks from tourist articles
Noisy image pool
Visual clustering
Premise: photos of the same landmark should be visually similar
Clustering based on local features
Validate and clean models
Visual model validates landmarks!
Photo vs. non-photo classifier to filter out noisy images
![Page 18: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/18.jpg)
Local Feature Detection
• Find invariant and robust features
• Create distinctive feature descriptions
![Page 19: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/19.jpg)
Laplacian-of-Gaussian (LoG)
• Scale-invariant edge detection
• Gaussian image filter to remove noise
• Laplacian filter to find areas of rapid change
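A small pure-Python sketch of the two steps above, combined into a single sampled Laplacian-of-Gaussian kernel. It assumes a fixed scale `sigma` and a truncated kernel of half-width `radius`, shifts the kernel so its entries sum to zero, and uses a synthetic Gaussian blob as input; the function names are hypothetical and the slides do not specify an implementation.

```python
import math

def log_kernel(sigma, radius):
    """Sampled LoG kernel: (r^2 - 2*sigma^2) / sigma^4 * G(r; sigma),
    shifted so the entries sum to zero (flat regions give ~0 response)."""
    size = 2 * radius + 1
    kernel = [[0.0] * size for _ in range(size)]
    for y in range(-radius, radius + 1):
        for x in range(-radius, radius + 1):
            r2 = x * x + y * y
            gauss = math.exp(-r2 / (2 * sigma ** 2)) / (2 * math.pi * sigma ** 2)
            kernel[y + radius][x + radius] = (r2 - 2 * sigma ** 2) / sigma ** 4 * gauss
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]

def filter_valid(image, kernel):
    """Correlate image with kernel at every position where the kernel fits."""
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for i in range(len(image) - kh + 1):
        row = []
        for j in range(len(image[0]) - kw + 1):
            row.append(sum(kernel[a][b] * image[i + a][j + b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out

def strongest_response(image, sigma, radius):
    """Return (row, col) of the pixel with the largest absolute LoG response."""
    response = filter_valid(image, log_kernel(sigma, radius))
    _, row, col = max((abs(v), i + radius, j + radius)
                      for i, r in enumerate(response) for j, v in enumerate(r))
    return row, col

# Synthetic 21x21 image with one bright Gaussian blob centred at (10, 10).
image = [[math.exp(-((x - 10) ** 2 + (y - 10) ** 2) / (2 * 2.0 ** 2))
          for x in range(21)] for y in range(21)]
interest_point = strongest_response(image, sigma=2.0, radius=6)
```

Repeating the filter over several values of `sigma` and keeping extrema across scales is the usual way such a detector is made scale-aware; that step is omitted here.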
![Page 20: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/20.jpg)
Local Feature Description
• Invariant and distinctive description
• Texture described by a 118-dimensional Gabor wavelet descriptor
![Page 21: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/21.jpg)
Object matching based on local features
Image representation
• Interest points: Laplacian-of-Gaussian (LoG) filter
• Local feature: Gabor wavelets
Image match score
$$\mathrm{Sim}(I_i, I_j) = 1 - P_{FP}(I_i, I_j)$$
• $P_{FP}(I_i, I_j)$: probability that the match of $I_i$ and $I_j$ is a false positive
• Computed as the probability that at least $m$ out of $n$ features match, if the match is a false positive
• $p$: probability of a feature match by chance
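One way to make the "at least m out of n features match by chance" quantity concrete is a binomial tail, sketched below. This is an assumption made for illustration: the slide does not show the formula, and the source may use a different (for example Bayesian) model. The sketch treats the n feature matches as independent, each occurring by chance with probability `p`, and takes the match score as one minus that tail probability.

```python
from math import comb

def prob_at_least(m, n, p):
    """P(at least m of n features match), assuming independent chance
    matches with probability p each (binomial tail; an assumed model)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(m, n + 1))

def match_score(m, n, p):
    """Assumed score: 1 minus the probability that the match is a false positive."""
    return 1.0 - prob_at_least(m, n, p)

# Hypothetical numbers: 12 of 100 features matched, 2% chance-match rate.
score = match_score(m=12, n=100, p=0.02)
```

Under this model the score approaches 1 quickly once the number of matched features exceeds what chance alone would produce, here about n·p = 2.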
![Page 22: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/22.jpg)
Constructing match region graph
Image matching
• Node is a match region
• 2 types of edges:
  – Match edge: measures match confidence
  – Overlap region edge: measures spatial overlap
![Page 23: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/23.jpg)
Graph clustering on match regions
Distance between any two regions = length of the shortest path connecting them
Why hierarchical agglomerative clustering, and not K-means, GMM, etc.?
• We have no a priori knowledge of the number of clusters
• Intuitively, each cluster should correspond to one aspect of a landmark
Match region graph → agglomerative hierarchical clustering → visual clusters
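A rough sketch of this idea: shortest-path distances on a small match region graph, followed by agglomerative merging that stops at a distance threshold, so the number of clusters is never fixed in advance. The single-linkage rule, the stopping `threshold`, and the edge weights are assumptions for illustration; the slides do not state which linkage or stopping criterion was used.

```python
import math

def shortest_paths(n, edges):
    """All-pairs shortest-path distances (Floyd-Warshall) on an undirected graph."""
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v, w in edges:
        dist[u][v] = dist[v][u] = min(dist[u][v], w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
    return dist

def agglomerate(dist, threshold):
    """Repeatedly merge the two closest clusters (single linkage, assumed)
    until the closest pair is farther apart than threshold."""
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = min(dist[i][j] for i in clusters[a] for j in clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        if best[0] > threshold:
            break
        _, a, b = best
        clusters[a] |= clusters.pop(b)
    return sorted(sorted(c) for c in clusters)

# Hypothetical graph: regions 0-2 match closely, regions 3-4 match closely,
# and a weak (long) edge joins the two groups.
edges = [(0, 1, 0.2), (1, 2, 0.3), (3, 4, 0.1), (2, 3, 5.0)]
visual_clusters = agglomerate(shortest_paths(5, edges), threshold=1.0)
```

In this toy graph the two tightly connected groups end up as separate visual clusters, which is the behaviour one would want for two different aspects of a landmark.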
![Page 24: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/24.jpg)
Visual cluster example
Corcovado, Rio de Janeiro, Brazil
Acropolis, Athens, Greece
![Page 25: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/25.jpg)
Visual cluster validation and cleaning
• Validate by the authors or hosting websites of the images
  – Reflects the popular appeal of landmarks
• Filter out non-photographic images, like maps and logos
  – Train an AdaBoost classifier
  – Features: color histogram, Hough transform, etc.
• Clean clusters by detecting large-area human faces
![Page 26: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/26.jpg)
Efficiency issues
Issue 1: learning landmark images (21.4M photos)
• Parallel computing to learn true landmark images
• Efficient hierarchical clustering
Issue 2: recognizing landmarks in a query image (recognition engine: ~5000 landmarks)
• Indexing local features for matching
  – kd-tree indexing
  – Query time: ~0.2 sec on a P4 computer
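The slide names kd-tree indexing but gives no details, so the following is only a generic sketch of an exact kd-tree nearest-neighbour lookup. It assumes Euclidean distance and uses toy 2-D points in place of the high-dimensional local feature descriptors; `build_kdtree` and `nearest` are hypothetical names.

```python
def build_kdtree(points, depth=0):
    """Recursively split the points on the median along a cycling axis."""
    if not points:
        return None
    axis = depth % len(points[0])
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2
    return {
        "point": pts[mid],
        "axis": axis,
        "left": build_kdtree(pts[:mid], depth + 1),
        "right": build_kdtree(pts[mid + 1:], depth + 1),
    }

def sq_dist(a, b):
    """Squared Euclidean distance."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(node, query, best=None):
    """Return the indexed point closest to query."""
    if node is None:
        return best
    if best is None or sq_dist(query, node["point"]) < sq_dist(query, best):
        best = node["point"]
    diff = query[node["axis"]] - node["point"][node["axis"]]
    near, far = ((node["left"], node["right"]) if diff < 0
                 else (node["right"], node["left"]))
    best = nearest(near, query, best)
    # Search the other side only if the splitting plane is closer than
    # the best match found so far.
    if diff ** 2 < sq_dist(query, best):
        best = nearest(far, query, best)
    return best

# Toy 2-D "descriptors" for illustration.
descriptors = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
index = build_kdtree(descriptors)
closest = nearest(index, (9, 2))
```

Exact kd-tree search degrades in high dimensions, so a system at this scale would likely rely on an approximate variant; that is an assumption, not something the slides state.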
![Page 27: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/27.jpg)
Experiments: statistics of learned landmarks
| | From photos | From articles | Total |
|---|---|---|---|
| Landmark # | 2240 | 3246 | 5486 |
| City # | 812 | 626 | 1259 |
| Country # | 104 | 130 | 144 |
Small overlap: 174 landmarks shared
China: 101 landmarks. Under-counted! Why?
U.S.: high internet penetration rate and a large number of tourist sites
![Page 28: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/28.jpg)
Evaluation of landmark image learning
• Randomly select 1000 visual clusters
• 68 (6.8%) are outliers: maps, logos, human photos
• Apply photographic vs. non-photographic classifier
• 37 outliers remain: 6.8% => 3.7%
![Page 29: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/29.jpg)
Evaluation of landmark recognition
• Positive testing images:
  – 728 images from 124 landmarks
• Negative testing images:
  – Caltech-256 (30,524) + Pascal VOC 07 (9,986) = 40,510 images
• For positive images:
  – 417 images detected to be landmarks
  – 337/417 (80.8%) are correct
  – Identification rate: 337/728 (46.3%)
• For negative images:
  – 463 images detected to be landmarks
  – False acceptance rate: 1.1%
Landmarks can be similar!
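The three rates on this slide follow directly from the reported counts, as the short check below shows. The variable names are chosen for illustration; the counts are taken from the slide.

```python
# Counts reported on the slide.
positives_total = 728          # positive test images
detected_as_landmark = 417     # positives flagged as some landmark
correctly_identified = 337     # flagged positives with the right landmark
negatives_total = 30524 + 9986 # Caltech-256 + Pascal VOC 07
false_accepts = 463            # negatives flagged as a landmark

def percent(part, whole):
    """Ratio expressed as a percentage."""
    return 100.0 * part / whole

accuracy_of_detections = percent(correctly_identified, detected_as_landmark)
identification_rate = percent(correctly_identified, positives_total)
false_acceptance_rate = percent(false_accepts, negatives_total)

print(f"{accuracy_of_detections:.1f}% "
      f"{identification_rate:.1f}% "
      f"{false_acceptance_rate:.1f}%")
```

Note that the two positive-image rates use different denominators: 417 detections for the first, all 728 positive images for the second.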
![Page 30: Collective Vision: Using Extremely Large Photograph Collections Mark Lenz CameraNet Seminar University of Wisconsin – Madison January 26, 2010 Acknowledgments:](https://reader035.vdocument.in/reader035/viewer/2022062421/56649e045503460f94aefde7/html5/thumbnails/30.jpg)
False detected images
• Match is technically correct, but the match region is not the landmark
  – A problem of model generation
• Match is technically false, due to visual similarity
  – A problem of the image features and matching mechanism