qmul @ mediaeval 2012: social event detection in collaborative photo collections

QMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo Collections

Markus Brenner, Prof. Ebroul Izquierdo

Multimedia and Vision Research Group, Queen Mary University of London, UK

OBJECTIVE

In collaborative photo collections …

1. Find and detect social events

2. Retrieve photos associated with the events

… with the help of additional, external information

INTRODUCTION AND BACKGROUND

The Internet enables people to host, access and share their photos online, for example through websites like Flickr and Facebook

Collaborative annotations and tags as well as public comments are commonplace

The information people assign varies greatly, but often seems to include some sort of reference to what happened, where, and who was involved: observed experiences or occurrences that are simply referred to as events

INTRODUCTION AND BACKGROUND

Easier to search through photo collections if photos are grouped into events

Link events in photo collections to public social media like online news feeds

Automatically link news with corresponding photos

Provide additional information that might be relevant to users to facilitate their search, like the date and location of an event

OVERVIEW OF FRAMEWORK

[Framework diagram; block labels as recovered from the figure]

Gathering External Data (General and Topic-Specific): Expanding the Topic, Compiling Names of Geographic Locations, Looking up Geographic Locations, Soccer Matches*

External sources: Google Geocoding API, GeoNames, Google Translate API, DBpedia (via SPARQL), WordNet

Preprocessing: Translating Terms, Matching Geographic Locations, Composing Textual Features, Extracting Visual Features

Detecting Events: By Date/Time and Topic, By Date and Location

Retrieving Photos: Limiting Search Space (By Date and Time, By Location), Expanding Feature Space, Classification (Textual Features), Visual Pruning (Classification)

Inputs and outputs: Query, Detected Events, Retrieved Photos

* Example. Framework extendable to other topics.

GATHERING EXTERNAL DATA

Expanding the topic

Handling geographic locations (e.g. compiling names of locations)

Expanding the Topic

Social events often revolve around a topic

Examples: festivals, sport events, …

Problem: Users do not adhere to a controlled vocabulary

Idea: Expand the textual representation of a given topic

Example: Expand the term concert with related terms like festival, gig, band, sound, etc.

Accomplish through combination of WordNet, DBpedia and some initial evidence

Handling Geographic Locations

Venue location of a social event is an important cue

Interested in gaining a more complete understanding, such as the city and country in which an event takes place, to expand the query

Beneficial as users often refer to different levels of the geographic hierarchy, e.g. a foreigner refers to the country, whereas a local refers to the city

Also consider geographic coordinates to later match geo-tagged photos

Use Google Geocoding API

Compiling Names of Locations

Identify and understand any textual annotations in photos that refer to geographic locations

Used in the retrieval process to isolate photos that likely do not correspond to the venue of a queried event

Extract all countries and larger cities from the GeoNames dataset

Topic-Specific: Soccer Matches

Use DBpedia (SPARQL) to find all soccer clubs and associated stadiums for a given city in the query

PREPROCESSING

Matching geographic locations

Translating terms and stop-words

Composing textual features

Matching Geographic Locations

Geo-tagged photos are becoming more and more popular

Identify photos as belonging and not belonging to a venue (and an event when also considering the time)

For each venue, compile two sets of photos (within/outside its bounds)
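The within/outside split can be sketched as a simple distance test. This is a minimal illustration, not the authors' implementation: it assumes a venue is represented by a single coordinate pair and a circular bound (the 500 m radius mirrors the bounding threshold reported in the setup), and the field names `id`, `lat` and `lon` are hypothetical.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def split_by_venue(photos, venue, radius_m=500.0):
    """Split geo-tagged photos into (within, outside) ID lists relative to a venue.

    Photos without coordinates are left out of both lists.
    """
    within, outside = [], []
    for photo in photos:
        if photo.get("lat") is None or photo.get("lon") is None:
            continue
        d = haversine_m(photo["lat"], photo["lon"], venue["lat"], venue["lon"])
        (within if d <= radius_m else outside).append(photo["id"])
    return within, outside


# Hypothetical venue and photos, for illustration only
venue = {"name": "example stadium", "lat": 51.5560, "lon": -0.2796}
photos = [
    {"id": "a", "lat": 51.5570, "lon": -0.2800},  # roughly 100 m from the venue
    {"id": "b", "lat": 48.8566, "lon": 2.3522},   # a different city
    {"id": "c", "lat": None, "lon": None},        # not geo-tagged
]
within, outside = split_by_venue(photos, venue)
```

Photos without a geo-tag end up in neither set, so they remain candidates for the later text-based classification.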

Translating Terms and Stop-words

Photos get annotated and tagged in many different languages

Translate topic-related terms and stop-words into other languages

Limit to languages prevailing in the countries in which the query venues are located

Use Google Translate API

Composing Textual Features

Concatenate all information into a combined textual representation (title, description, keywords, username, …)

Also include information obtained from external sources

Use a Roman preprocessor to convert text into lower case, strip punctuation and whitespace, and remove accents from Unicode characters

Eliminate common stop-words, numbers and terms commonly associated with photography

Apply language-agnostic character-based tokenizer

Convert tokens into a matrix of occurrences (TF/IDF)
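The preprocessing chain above can be sketched with the standard library alone. This is an illustration under assumptions: the stop-word list is a tiny made-up sample, character trigrams stand in for the unspecified character-based tokenizer, and the smoothed IDF formula is one common variant rather than the one used in the experiments.

```python
import math
import re
import unicodedata
from collections import Counter

# Illustrative sample only; the real list covers stop-words and photography terms
STOP_WORDS = {"the", "a", "at", "of", "photo", "canon", "nikon"}


def normalize(text):
    """Lower-case, remove accents, replace punctuation by spaces, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^\w\s]|_", " ", text)
    return " ".join(text.split())


def char_ngrams(text, n=3):
    """Character n-grams per word, after dropping stop-words and pure numbers."""
    words = [w for w in normalize(text).split() if w not in STOP_WORDS and not w.isdigit()]
    grams = []
    for word in words:
        padded = f" {word} "  # pad so that word boundaries become part of the n-grams
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams


def tfidf(docs, n=3):
    """Return one sparse {ngram: weight} row per document."""
    counts = [Counter(char_ngrams(doc, n)) for doc in docs]
    doc_freq = Counter(gram for row in counts for gram in row)  # each gram once per doc
    n_docs = len(docs)
    return [
        {g: tf * (math.log((1 + n_docs) / (1 + doc_freq[g])) + 1) for g, tf in row.items()}
        for row in counts
    ]


rows = tfidf(["Concert gig", "gig"])
```

Storing each row as a dictionary keeps only the non-zero entries, which matches the sparse representation the classifiers operate on.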

RETRIEVING PHOTOS OF AN EVENT

In the most basic case, we (already) know about a specific event, and we wish to simply retrieve all photos associated with it

Classification-based approach

Limiting search space

Expanding feature space

Visual pruning

Classification-based Approach: I

Treat each event independently (for a series of events, we instantiate a separate classifier for each event)

Train classifier on the textual features we compose beforehand according to each event

No separate training dataset required

Classification-based Approach: II

Binary classification, but also introduce a third class that reflects events of the same topic to improve results

Possible to include features of another query

Two different fusing strategies implemented

Experiment with multiple classifiers (Linear SVC, SGD, …)

Use sparse data representation and sparse-adjusted classifier
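A per-event linear classifier over sparse features might look like the sketch below. It is a hand-written hinge-loss SGD trainer meant only to illustrate the idea; it is not the Linear SVC or SGD implementation used in the experiments, it covers only the binary case (the third, same-topic class is left out), and all hyperparameters and feature names are assumptions.

```python
import random


def train_linear_sgd(rows, labels, epochs=20, lr=0.1, l2=1e-4, seed=0):
    """Hinge-loss SGD for a linear model over sparse {feature: value} rows.

    Labels are +1 (photo belongs to the event) or -1 (it does not).
    """
    rng = random.Random(seed)
    weights, bias = {}, 0.0
    order = list(range(len(rows)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = rows[i], labels[i]
            margin = y * (sum(weights.get(f, 0.0) * v for f, v in x.items()) + bias)
            # Shrink only the weights of features present in this row (keeps updates sparse)
            for f in x:
                weights[f] = weights.get(f, 0.0) * (1.0 - lr * l2)
            if margin < 1.0:  # inside the margin or misclassified: take a gradient step
                for f, v in x.items():
                    weights[f] += lr * y * v
                bias += lr * y
    return weights, bias


def decision(model, row):
    """Signed score; positive means the photo is predicted to belong to the event."""
    weights, bias = model
    return sum(weights.get(f, 0.0) * v for f, v in row.items()) + bias


def train_per_event(event_examples):
    """One independent classifier per event: {event_id: (rows, labels)} -> {event_id: model}."""
    return {eid: train_linear_sgd(rows, labels) for eid, (rows, labels) in event_examples.items()}


# Toy data with hypothetical features
rows = [
    {"wembley": 1.0, "final": 1.0},
    {"wembley": 1.0, "match": 1.0},
    {"beach": 1.0, "holiday": 1.0},
    {"paris": 1.0, "museum": 1.0},
]
labels = [1, 1, -1, -1]
models = train_per_event({"event-1": (rows, labels)})
```

Because every event gets its own model, adding or removing an event never requires retraining the others.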

Limiting Search Space

Generally, the date and time a photo was captured are effective cues to bound the search space

For each event’s prediction step, we consider only those photos that lie within the event’s temporal search window, which is either:

Specified by the query (e.g. New Year’s Eve)

Retrieved by the framework through external topic-specific sources (e.g. the specific days of a concert tour)

Roughly estimated (based on a clustering scheme) in the forthcoming event detection method

Exclude photos not matching geographic location
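The two restrictions above amount to a filter over the candidate photos. The sketch below assumes each photo carries a capture timestamp under a hypothetical `taken` key, and that the set of photos geo-tagged outside the venue has already been compiled.

```python
from datetime import datetime


def limit_search_space(photos, window_start, window_end, outside_venue_ids=frozenset()):
    """Return photos captured in [window_start, window_end) that were not
    geo-tagged outside the venue. Photos without a geo-tag stay in."""
    return [
        p for p in photos
        if window_start <= p["taken"] < window_end and p["id"] not in outside_venue_ids
    ]


# Hypothetical photos around New Year's Eve
photos = [
    {"id": "a", "taken": datetime(2011, 12, 31, 23, 50)},
    {"id": "b", "taken": datetime(2011, 12, 31, 22, 15)},  # geo-tagged elsewhere
    {"id": "c", "taken": datetime(2011, 12, 30, 12, 0)},   # before the window
]
candidates = limit_search_space(
    photos,
    window_start=datetime(2011, 12, 31, 18, 0),
    window_end=datetime(2012, 1, 1, 6, 0),
    outside_venue_ids={"b"},
)
```

Only the remaining candidates need to be passed to the classifier's prediction step.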

Expanding Feature Space

Expand feature space based on query information and photo collection itself

Helpful when “training” information is sparse (the case when there are few geo-tagged photos)

Iterative process:

1. Train initial classifier on the few query terms available

2. Then compile new list of textual terms based on the predicted outcome over all applicable photos

3. Finally, use the gained terms to refine the initial query terms

Example: Photos related to a specific music venue contain terms of the playing band or artist
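One round of this expansion can be sketched as follows. The sketch replaces the initial classifier with a plain term-overlap rule and selects new terms by a smoothed frequency ratio; both choices, and the parameters `top_k`, `min_count` and `min_ratio`, are assumptions made for illustration.

```python
from collections import Counter


def expand_terms(photos, seed_terms, top_k=3, min_count=2, min_ratio=2.0):
    """photos: list of token sets. Returns the seed terms plus up to top_k gained terms."""
    seeds = set(seed_terms)
    # Step 1 (stand-in for the initial classifier): a photo is predicted
    # positive if it shares at least one term with the query.
    positives = [p for p in photos if p & seeds]
    negatives = [p for p in photos if not (p & seeds)]
    # Step 2: compile term statistics over the predicted outcome
    pos_counts = Counter(t for p in positives for t in p)
    neg_counts = Counter(t for p in negatives for t in p)
    candidates = []
    for term, n_pos in pos_counts.items():
        if term in seeds or n_pos < min_count:
            continue
        pos_rate = n_pos / len(positives)
        neg_rate = (neg_counts[term] + 1) / (len(negatives) + 1)  # add-one smoothing
        if pos_rate / neg_rate >= min_ratio:
            candidates.append((pos_rate / neg_rate, n_pos, term))
    # Step 3: refine the query with the strongest gained terms
    gained = [term for _, _, term in sorted(candidates, reverse=True)[:top_k]]
    return seeds | set(gained)


# Hypothetical tokenized photos
photos = [
    {"paradiso", "amsterdam", "radiohead"},
    {"paradiso", "radiohead", "live"},
    {"paradiso", "concert", "radiohead"},
    {"amsterdam", "canal", "boat"},
    {"canal", "bike"},
    {"museum", "amsterdam"},
]
expanded = expand_terms(photos, {"paradiso"})
```

Calling `expand_terms` again with the returned set gives the iterative behavior: each round can pull in photos that only mention the gained terms.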

Visual Pruning

Mixing textual and visual features is not straightforward

Employ a cascade of two separate classifiers, each separately adjusted to its feature space and data representation

First fast textual classification, then visual binary pruning on few remaining photos

Utilize MPEG-7 color and texture features

Experiment with several classifiers (Random Forest, SVC with RBF kernel, Linear SVC)
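The cascade itself is a short piece of control flow. In the sketch below the two classifiers are passed in as scoring functions; the precomputed `text` and `visual` scores are hypothetical placeholders for a textual model and a model trained on MPEG-7 descriptors.

```python
def cascade(photos, text_score, visual_score, text_threshold=0.0, visual_threshold=0.0):
    """Two-stage cascade: fast textual classification first, visual pruning second.

    Returns the kept photos and the number of photos that reached the visual stage.
    """
    survivors = [p for p in photos if text_score(p) > text_threshold]
    kept = [p for p in survivors if visual_score(p) > visual_threshold]
    return kept, len(survivors)


# Hypothetical precomputed scores standing in for real classifier outputs
photos = [
    {"id": "a", "text": 1.2, "visual": 0.8},
    {"id": "b", "text": 0.4, "visual": -0.5},  # passes text, pruned visually
    {"id": "c", "text": -0.9, "visual": 0.9},  # rejected by text, never scored visually
]
kept, n_visual_evaluations = cascade(photos, lambda p: p["text"], lambda p: p["visual"])
```

The visual stage can only remove photos, so it may raise precision but cannot recover anything the textual stage rejected.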

DETECTING EVENTS

Two proposals:

If the date but not the time of day is known, apply a clustering method on all candidates of a given day; the largest clusters then reflect events

Otherwise: Expand the approach by performing a prediction step for any day instead of just the selected days conforming to the events; this, however, grows the search space

In both cases, apply a threshold (number of photos relating to a potential event) prior to considering a new event
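The first proposal can be illustrated with a simple temporal clustering. The slides do not name the clustering method, so the gap-based grouping below is a swapped-in stand-in, and the `max_gap` and `min_photos` values are arbitrary assumptions.

```python
from datetime import datetime, timedelta


def detect_events(timestamps, max_gap=timedelta(hours=2), min_photos=3):
    """Split sorted capture times into clusters wherever the gap exceeds max_gap,
    keep clusters with at least min_photos photos, largest first."""
    clusters, current = [], []
    for ts in sorted(timestamps):
        if current and ts - current[-1] > max_gap:
            clusters.append(current)
            current = []
        current.append(ts)
    if current:
        clusters.append(current)
    kept = [c for c in clusters if len(c) >= min_photos]
    return sorted(kept, key=len, reverse=True)


# Hypothetical capture times of candidate photos on one day
day = datetime(2012, 5, 19)
times = [
    day.replace(hour=9),
    day.replace(hour=15), day.replace(hour=15, minute=20),
    day.replace(hour=16, minute=10), day.replace(hour=16, minute=40),
    day.replace(hour=21), day.replace(hour=21, minute=30),
]
events = detect_events(times)
```

The threshold on cluster size plays the role described above: small groups of photos are not promoted to events.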

EXPERIMENTS

Dataset

Implementation details and setup

Results

Dataset

2012 MediaEval SED Dataset – Challenge II

167,332 photos collected from Flickr

Metadata: unique Flickr ID, capture timestamp, username, title, description, keywords and partial geographic coordinates (in about a fifth of the cases)

Ground truth in the form of event clusters (specifying associated photos) for two topics/challenges

“Training set”: 2011 MediaEval SED Dataset

Implementation Details and Setup

Define event as a distinct combination of location and date (one event per day at the same location)

Use English names of locations only

Bounding threshold of 500 meters

Default: Linear SVC, no feature expansion, no visual pruning

Evaluation measures: Precision (P), Recall (R), F-score, Normalized Mutual Information (NMI)
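For reference, the measures can be computed as in the sketch below. The slides do not state how the benchmark averages scores over events or which NMI normalization it uses; the version here normalizes by the mean of the two entropies and should be read as one common definition, not necessarily the official one.

```python
import math
from collections import Counter


def precision_recall_f(retrieved, relevant):
    """Set-based precision, recall and F-score (harmonic mean of P and R)."""
    retrieved, relevant = set(retrieved), set(relevant)
    tp = len(retrieved & relevant)
    p = tp / len(retrieved) if retrieved else 0.0
    r = tp / len(relevant) if relevant else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


def nmi(labels_a, labels_b):
    """Normalized mutual information between two labelings of the same items."""
    n = len(labels_a)
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mutual = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    h_a = -sum((c / n) * math.log(c / n) for c in count_a.values())
    h_b = -sum((c / n) * math.log(c / n) for c in count_b.values())
    if h_a + h_b == 0:  # both labelings put everything into a single cluster
        return 1.0
    return 2 * mutual / (h_a + h_b)


p, r, f = precision_recall_f({1, 2, 3, 4}, {2, 3, 4, 5, 6})
same = nmi([0, 0, 1, 1], [1, 1, 0, 0])         # identical clusterings up to renaming
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])  # clusterings that share no information
```

NMI ignores the cluster names themselves, which is why relabeled but otherwise identical clusterings count as a perfect match.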

Dataset Setup

Focus on Challenge II

Challenge I/III: Current approach has limitations

No event/venue detection through social media websites like Twitter

Only basic venue/location detection/clustering; an issue when the destination covers a large area (e.g. an entire country)

Results: Challenge II

Detected: 32 events

Identified several thousand photos not belonging to any relevant venue; this yields a substantial reduction of candidates and a large number of training samples

Configuration             P     R     F     NMI
Default configuration     79.0  67.1  72.6  0.65
Basic event detection     56.0  69.6  62.0  0.53
With visual pruning       83.2  61.9  71.0  0.63
With feature expansion    79.0  66.9  72.5  0.65

worse

CONCLUSION

External information, e.g. about a venue, helpful for both event detection and retrieval of associated photos

Finding and linking external data in a uniform way still challenging

Visual information does not improve results much

Future considerations: Social media websites like Facebook and Twitter

Improved venue/location detection/clustering

Thank you! Questions?
