In-depth Exploration of Geotagging Performance using sampling strategies on YFCC100M
George Kordopatis-Zilos, Symeon Papadopoulos, Yiannis Kompatsiaris
Information Technologies Institute, Thessaloniki, Greece
MMCommons Workshop, October 16, 2016 @ Amsterdam, NL

Upload: symeon-papadopoulos

Post on 13-Jan-2017


TRANSCRIPT

Page 1: In-depth Exploration of Geotagging Performance

In-depth Exploration of Geotagging Performance using sampling strategies on YFCC100M
George Kordopatis-Zilos, Symeon Papadopoulos, Yiannis Kompatsiaris
Information Technologies Institute, Thessaloniki, Greece

MMCommons Workshop, October 16, 2016 @ Amsterdam, NL

Page 2: Where is it?

• Depicted landmark: Eiffel Tower
• Location: Paris, Tennessee

The keyword "Tennessee" is very important to correctly place the photo.

Source (Wikipedia): http://en.wikipedia.org/wiki/Eiffel_Tower_(Paris,_Tennessee)

Page 3: Motivation

Evaluating multimedia retrieval systems:
• What do we evaluate?
• How?
• What decisions do we make based on it?

[Diagram: MM system (black box) + Test Collection → Comparison to ground truth → Evaluation measure → Decision]

Page 4: Problem Formulation

• Test collection creation → evaluation bias
• Performance reduced to a single measure → misses many nuances of performance
• Test problem: geotagging = predicting the geographic location of a multimedia item based on its content

Page 5: Example: Evaluating Geotagging

• Test collection #1: 1M images, 700K located in the US
• Assume we use P@1km as the evaluation measure
• System 1: almost perfect precision in the US (100%), very poor for the rest of the world (10%) → P@1km = 0.7·100 + 0.3·10 = 73%
• System 2: approximately the same precision all over the world (65%) → P@1km = 65%
• Test collection #2: 1M images, 500K depicting cats and puppies on a white background → for 50% of the collection, any prediction is essentially random.
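The aggregate P@1km figures above are just population-weighted averages; a minimal sketch (the function name is ours, not from the slides):

```python
def aggregate_p_at_1km(segments):
    """Population-weighted aggregate precision.

    segments: (fraction_of_collection, precision_percent) pairs
    that together cover the whole collection.
    """
    assert abs(sum(f for f, _ in segments) - 1.0) < 1e-9
    return sum(f * p for f, p in segments)

# System 1: 100% precision on the 70% of items in the US, 10% elsewhere.
system1 = aggregate_p_at_1km([(0.7, 100.0), (0.3, 10.0)])  # 73%
# System 2: uniform 65% precision everywhere.
system2 = aggregate_p_at_1km([(0.7, 65.0), (0.3, 65.0)])   # 65%
```

The single aggregate number hides that System 1 is far worse outside the US, which is exactly the evaluation-bias point of this slide.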

Page 6: Multimedia Geotagging

• Problem of estimating the geographic location of a multimedia item (e.g. Flickr image + metadata)
• Variety of approaches:
  • Text-based: use the text metadata (tags)
    • Gazetteer-based
    • Statistical methods (associations between tags & locations)
  • Visual
    • Similarity-based (find the most similar items and use their location)
    • Model-based (learn a visual model of an area)
  • Hybrid: combine text and visual

Page 7: Language Model

• Most likely cell: [formula shown on slide]
• Tag-cell probability: [formula shown on slide]

We will refer to this as: Base LM (or Basic)
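The two formulas on this slide are rendered as images in the transcript. A standard formulation of such a tag-based language model (an assumption based on common practice, not a transcription of the slide) is:

```latex
\hat{c} \;=\; \operatorname*{arg\,max}_{c} \sum_{t \in T} p(t \mid c),
\qquad
p(t \mid c) \;=\; \frac{N_u(t, c)}{\sum_{c'} N_u(t, c')}
```

where T is the set of tags of the query item and N_u(t, c) counts the users who used tag t inside grid cell c.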

Page 8: Language Model Extensions

• Feature selection
  • Discard tags that do not provide any geographical cues
  • Selection criterion: locality > 0
• Feature weighting
  • Give more importance to tags carrying geographic information
  • Linear combination of locality and spatial entropy
• Multiple grids
  • Consider two grids, fine and coarse; if the estimate from the fine grid falls within that of the coarse grid, use the fine-grid estimate
• Similarity search
  • Within the selected cell, use the lat/lon of the most similar item to refine the location estimate

We will refer to this as: Full LM (or Full)
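The selection and weighting steps can be sketched as follows. The slides only state "locality > 0" and "linear combination of locality and spatial entropy", so the entropy inversion and the mixing weight `a` below are illustrative assumptions, not the paper's exact definitions:

```python
import math

def spatial_entropy(cell_counts):
    """Shannon entropy (bits) of a tag's distribution over grid cells."""
    total = sum(cell_counts.values())
    return -sum(
        (n / total) * math.log2(n / total)
        for n in cell_counts.values() if n > 0
    )

def select_tags(tag_locality):
    """Feature selection: keep only tags with locality > 0."""
    return [tag for tag, loc in tag_locality.items() if loc > 0]

def tag_weight(locality, entropy, a=0.5):
    """Feature weighting (assumed form): a linear combination of locality
    and a term that decays with spatial entropy, since tags spread over
    many cells carry little geographic information."""
    return a * locality + (1 - a) / (1 + entropy)
```

A tag like "eiffel" (high locality, low spatial entropy) gets a large weight; a tag like "fun" is discarded by the selection step.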

Page 9: MediaEval Placing Task

• Benchmarking activity in the context of MediaEval
• Dataset: Flickr images and videos (different each year), split into training and test sets
• Also possible to test systems that use external data

Edition   Training Set   Test Set
2015      4,695,149      949,889
2014      5,025,000      510,000
2013      8,539,050      262,000

Page 10: Proposed Evaluation Framework

• Initial (reference) test collection Dref
• Sampling function f: Dref → Dtest
• Performance volatility [formula shown on slide]
• p(D): performance score achieved on collection D
• In our case, we consider two such measures:
  • P@1km
  • Median distance error
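The two measures p(D) can be sketched as below. The haversine great-circle distance is a standard choice for this task, though the slides do not specify the exact distance function used:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = math.radians(lat2 - lat1)
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def evaluate(predictions, ground_truth):
    """Return (P@1km, median distance error in km) over a collection."""
    errors = sorted(haversine_km(*p, *g) for p, g in zip(predictions, ground_truth))
    p_at_1km = sum(e <= 1.0 for e in errors) / len(errors)
    mid = len(errors) // 2
    median = errors[mid] if len(errors) % 2 else (errors[mid - 1] + errors[mid]) / 2
    return p_at_1km, median
```

Running `evaluate` on Dref and on a sampled Dtest gives the two scores whose difference the framework studies.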

Page 11: Sampling Strategies

A variety of approaches for the Placing Task collection:
• Geographical Uniform Sampling
• User Uniform Sampling
• Text-based Sampling
• Text Diversity Sampling
• Geographically Focused Sampling
• Ambiguity-based Sampling
• Visual Sampling

Page 12: Uniform Sampling

• Geographic Uniform Sampling
  • Divide the earth's surface into square areas of approximately the same size (~10x10 km)
  • Select N items from each area (N = median of items/area)
• User Uniform Sampling
  • Select only one item per user

Page 13: Text Sampling

• Text-based Sampling
  • Select only items with more than M terms (M = median of terms/item)
• Text Diversity Sampling
  • Represent items using bag-of-words
  • Use MinHash to generate a binary code per BoW vector
  • Select one item per code (bucket)

Page 14: Other Sampling Strategies

• Geographically Focused Sampling
  • Pick items from a selected place (continent/country)
• Ambiguity-based Sampling
  • Select the set of items associated with ambiguous place names (or the complementary set)
  • Ambiguity is defined with the help of entropy
• Visual Sampling
  • Select only items associated with a given visual concept
  • Select only items associated with concepts related to buildings

Page 15: Experiments - Setup

• Placing Task 2015 dataset: 949,889 images (subset of YFCC100M)
• Test four variants of the Language Model method:
  • Basic-PT: Base LM method trained on the PT dataset (≈4.7M geotagged images released by the task organizers)
  • Full-PT: Full LM method trained on the PT dataset
  • Basic-Y: Base LM method trained on the YFCC dataset (≈40M geotagged images of YFCC100M)
  • Full-Y: Full LM method trained on the YFCC dataset

Page 16: Reference Results

Page 17: Geographical Uniform Sampling

• Initial distribution
• Uniform distribution: select three items/cell

Page 18: User Uniform Sampling

Page 19: Text-based Sampling

Select only images with >7 tags/item

Page 20: Text Diversity Sampling

• After MinHash, 478,817 buckets were created.

Page 21: Geographically Focused Sampling

Results of Full-Y

Page 22: Ambiguity-based Sampling

Page 23: Visual Sampling

Page 24: Summary of Results

Page 25: Thank you!

Data/Code:
• https://github.com/MKLab-ITI/multimedia-geotagging/

Get in touch:
• George Kordopatis-Zilos: [email protected]
• Symeon Papadopoulos: [email protected] / @sympap

With the support of: