QMUL @ MediaEval 2012: Social Event Detection in Collaborative Photo Collections
Markus Brenner, Prof. Ebroul Izquierdo
Multimedia and Vision Research Group, Queen Mary University of London, UK
OBJECTIVE
In collaborative photo collections …
1. Find and detect social events
2. Retrieve photos associated with the events
… with the help of additional, external information
INTRODUCTION AND BACKGROUND
The Internet enables people to host, access and share their photos online, for example through websites like Flickr and Facebook
Collaborative annotations and tags as well as public comments are commonplace
The information people assign varies greatly, but it often includes some sort of reference to what happened, where, and who was involved: observed experiences or occurrences, simply referred to as events
INTRODUCTION AND BACKGROUND
Easier to search through photo collections if photos are grouped into events
Link events in photo collections to public social media like online news feeds
Automatically link news with corresponding photos
Provide additional information that might be relevant to users to facilitate their search, like the date and location of an event
OVERVIEW OF FRAMEWORK
[Figure: Overview of the framework. Input: Query. Gathering External Data (General; Topic-Specific, e.g. Soccer Matches*): Expanding the Topic (WordNet, DBpedia via SPARQL), Compiling Names of Geographic Locations (GeoNames), Looking up Geographic Locations (Google Geocoding API), Translating Terms (Google Translate API). Preprocessing: Matching Geographic Locations, Composing Textual Features, Extracting Visual Features. Retrieving Photos: Limiting Search Space (By Date and Time; By Location), Expanding Feature Space, Classification, Visual Pruning (Classification). Detecting Events: By Date/Time and Topic; By Date and Location. Outputs: Retrieved Photos, Detected Events.]
* Example. Framework extendable to other topics.
GATHERING EXTERNAL DATA
Expanding the topic
Handling geographic locations (e.g. compiling names of locations)
Expanding the Topic
Social events often revolve around a topic, e.g. festivals, sports events, …
Problem: Users do not adhere to a controlled vocabulary
Idea: Expand the textual representation of a given topic
Example: Expand the term concert with related terms like festival, gig, band, sound, etc.
Accomplished through a combination of WordNet, DBpedia and some initial evidence
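A minimal sketch of the topic expansion idea, assuming the related terms have already been fetched. `RELATED_TERMS` is a hypothetical hard-coded stand-in for what WordNet and DBpedia lookups might return, and "initial evidence" is modelled, as an assumption, as term counts observed in the photo collection.

```python
from collections import Counter

# Hypothetical stand-in for related terms gathered from WordNet and DBpedia;
# a real pipeline would query those sources instead of hard-coding them.
RELATED_TERMS = {
    "concert": ["festival", "gig", "band", "sound", "stage"],
    "soccer": ["football", "match", "stadium", "goal"],
}

def expand_topic(topic, evidence_counts=None, min_count=1):
    """Return the topic plus related terms, optionally filtered by evidence.

    evidence_counts: hypothetical term counts observed in the collection;
    related terms seen fewer than min_count times are dropped.
    """
    terms = [topic] + RELATED_TERMS.get(topic, [])
    if evidence_counts is None:
        return terms
    return [t for t in terms if t == topic or evidence_counts.get(t, 0) >= min_count]

# Usage: keep only related terms that actually occur in the collection.
observed = Counter("concert gig band band festival".split())
print(expand_topic("concert", observed))  # ['concert', 'festival', 'gig', 'band']
```

Filtering by observed counts keeps generic lexical relatives from flooding the query with terms nobody uses in this collection.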
Handling Geographic Locations
Venue location of a social event is an important cue
Aim to gain a more complete understanding of the venue, such as the city and country in which an event takes place, to expand the query
Beneficial, as users often refer to different levels of the geographic hierarchy, e.g. depending on whether they are foreign to a country or local to a city
Also consider geographic coordinates to later match geo-tagged photos
Use Google Geocoding API
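A sketch of how a venue lookup could be turned into city, country and coordinates, assuming the standard JSON output of the Google Geocoding API (`results[].address_components[]` with `types` such as `locality` and `country`, and `geometry.location`). The network call is left out; `SAMPLE` is a hand-written, trimmed response with illustrative values, not real API output.

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "https://maps.googleapis.com/maps/api/geocode/json"

def geocode_url(address, api_key):
    """Build a Geocoding API request URL for a venue name or address."""
    return GEOCODE_ENDPOINT + "?" + urlencode({"address": address, "key": api_key})

def parse_geocode(response):
    """Pull city, country and coordinates from a decoded JSON response."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    top = response["results"][0]  # take the best-ranked match
    city = country = None
    for comp in top.get("address_components", []):
        if "locality" in comp["types"]:
            city = comp["long_name"]
        elif "country" in comp["types"]:
            country = comp["long_name"]
    loc = top["geometry"]["location"]
    return {"city": city, "country": country, "lat": loc["lat"], "lng": loc["lng"]}

# Hand-written, trimmed response for illustration (coordinates are not real output).
SAMPLE = {
    "status": "OK",
    "results": [{
        "address_components": [
            {"long_name": "Barcelona", "short_name": "Barcelona", "types": ["locality", "political"]},
            {"long_name": "Spain", "short_name": "ES", "types": ["country", "political"]},
        ],
        "geometry": {"location": {"lat": 41.38, "lng": 2.12}},
    }],
}
print(parse_geocode(SAMPLE))
```

Keeping both the hierarchy (city, country) and the coordinates serves the two later uses: query expansion and matching geo-tagged photos.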
Compiling Names of Locations
Identify and understand any textual annotations in photos that refer to geographic locations
Used in the retrieval process to isolate photos that likely do not correspond to the venue of a queried event
Extract all countries and larger cities from the GeoNames dataset
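A sketch of compiling location names and spotting them in photo text. It assumes the tab-separated GeoNames dump layout (column 1 name, column 2 ASCII name, column 14 population); the population cut-off for "larger cities" is a hypothetical parameter, and only the city part is shown. Matching 1- to 3-word phrases avoids scanning the text once per name.

```python
import re

def normalize_name(s):
    """Lower-case and reduce to space-separated word characters."""
    return " ".join(re.findall(r"\w+", s.lower()))

def load_location_names(lines, min_population=100_000):
    """Collect normalized names of larger places from GeoNames dump lines.

    Assumes the tab-separated GeoNames layout: column 1 = name,
    column 2 = ASCII name, column 14 = population.
    """
    names = set()
    for line in lines:
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 15:
            continue  # skip malformed lines
        if int(cols[14] or 0) >= min_population:
            names.add(normalize_name(cols[1]))
            names.add(normalize_name(cols[2]))
    return names

def mentioned_locations(text, names, max_words=3):
    """Return known location names appearing as 1- to max_words-word phrases."""
    words = normalize_name(text).split()
    phrases = {" ".join(words[i:i + n])
               for n in range(1, max_words + 1)
               for i in range(len(words) - n + 1)}
    return phrases & names
```

A photo whose text mentions a known location other than the queried venue's city or country can then be flagged as unlikely to belong to the event.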
Topic-Specific: Soccer Matches
Use DBpedia (SPARQL) to find all soccer clubs and associated stadiums for a given city in the query
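A sketch of the kind of SPARQL query this step could send to DBpedia. The ontology terms `dbo:SoccerClub`, `dbo:ground` and `dbo:location` are assumptions about DBpedia's schema and should be checked against the live ontology; the query actually used in the framework is not given in the source. The `format` parameter follows the Virtuoso endpoint convention.

```python
from urllib.parse import urlencode

DBPEDIA_SPARQL = "https://dbpedia.org/sparql"

def soccer_venues_query(city_resource):
    """Build a SPARQL query for soccer clubs and their stadiums in a city.

    city_resource: full DBpedia resource URI of the city. The predicates used
    here are assumptions about the DBpedia ontology.
    """
    return f"""
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT DISTINCT ?club ?stadium ?stadiumName WHERE {{
  ?club a dbo:SoccerClub ;
        dbo:ground ?stadium .
  ?stadium dbo:location <{city_resource}> ;
           rdfs:label ?stadiumName .
  FILTER (lang(?stadiumName) = "en")
}}
"""

def sparql_url(query):
    """Encode the query as a GET request URL asking for JSON results."""
    return DBPEDIA_SPARQL + "?" + urlencode({"query": query, "format": "application/sparql-results+json"})
```

The English-label filter mirrors the setup's choice of using English location names only.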
PREPROCESSING
Matching geographic locations
Translating terms and stop-words
Composing textual features
Matching Geographic Locations
Geo-tagged photos are becoming more and more popular
Identify photos as belonging or not belonging to a venue (and to an event when also considering the time)
For each venue, compile two sets of photos (within/outside its bounds)
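A sketch of splitting geo-tagged photos into the two sets, assuming a circular bound around the geocoded venue coordinates with the 500 m threshold from the experimental setup; the exact shape of the bound is not specified in the source. Distances use the haversine great-circle formula.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in meters."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def split_by_venue(photos, venue, radius_m=500):
    """Split geo-tagged photos into those within / outside a venue's radius.

    photos: iterable of (photo_id, lat, lon); photos without coordinates
    should be filtered out beforehand. venue: (lat, lon).
    """
    inside, outside = set(), set()
    for pid, lat, lon in photos:
        d = haversine_m(venue[0], venue[1], lat, lon)
        (inside if d <= radius_m else outside).add(pid)
    return inside, outside
```

The "outside" set is as useful as the "inside" one: it marks photos that can be excluded for this venue.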
Translating Terms and Stop-words
Photos get annotated and tagged in many different languages
Translate topic-related terms and stop-words into other languages
Limit to languages prevailing in the countries in which the query venues are located
Use Google Translate API
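A sketch of building a multilingual term set limited to the languages of the venue countries. `COUNTRY_LANGUAGES` and `TRANSLATIONS` are hypothetical hard-coded stand-ins; in the framework the translations come from the Google Translate API, which is abstracted here as any `translate(term, lang)` callable.

```python
# Hypothetical mapping from venue country to prevailing languages.
COUNTRY_LANGUAGES = {"Spain": ["es", "ca"], "Germany": ["de"]}

# Hypothetical stand-in for translations returned by a translation service.
TRANSLATIONS = {
    ("concert", "es"): "concierto",
    ("concert", "ca"): "concert",
    ("concert", "de"): "Konzert",
}

def target_languages(venue_countries):
    """Languages prevailing in the countries of the query venues."""
    langs = set()
    for country in venue_countries:
        langs.update(COUNTRY_LANGUAGES.get(country, []))
    return sorted(langs)

def multilingual_terms(terms, venue_countries, translate):
    """Return the original terms plus their translations, lower-cased."""
    out = {t.lower() for t in terms}
    for lang in target_languages(venue_countries):
        for t in terms:
            translated = translate(t, lang)
            if translated:
                out.add(translated.lower())
    return out

# Usage with the dictionary stand-in instead of a real API call.
lookup = lambda term, lang: TRANSLATIONS.get((term, lang))
print(sorted(multilingual_terms(["concert"], ["Spain", "Germany"], lookup)))
# ['concert', 'concierto', 'konzert']
```

The same function applies to stop-word lists; restricting target languages keeps the number of translation requests small.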
Composing Textual Features
Concatenate all information into a combined textual representation (title, description, keywords, username, …)
Also include information obtained from external sources
Use a Roman preprocessor to convert text to lower case, strip punctuation and whitespace, and remove accents from Unicode characters
Eliminate common stop-words, numbers and terms commonly associated with photography
Apply a language-agnostic, character-based tokenizer
Convert tokens into a matrix of occurrences weighted by TF-IDF
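A standard-library sketch of these preprocessing steps. Character trigrams of space-padded words are one possible language-agnostic tokenizer and are an assumption here; the idf variant `ln(N / df) + 1` is one common choice, not necessarily the one used in the framework. Photography terms can simply be added to the `stop_words` set.

```python
import math
import re
import unicodedata
from collections import Counter

def normalize(text):
    """Lower-case, strip accents, replace punctuation and collapse whitespace."""
    text = unicodedata.normalize("NFKD", text.lower())
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = re.sub(r"[^\w\s]", " ", text)
    return " ".join(text.split())

def char_ngrams(text, stop_words, n=3):
    """Drop stop-words and numbers, then emit character n-grams per word."""
    grams = []
    for word in text.split():
        if word in stop_words or word.isdigit():
            continue
        padded = f" {word} "  # pad so word boundaries become features
        grams.extend(padded[i:i + n] for i in range(len(padded) - n + 1))
    return grams

def tfidf(docs):
    """Map each token list to a sparse {token: tf * idf} dict."""
    counts = [Counter(d) for d in docs]
    df = Counter(tok for c in counts for tok in c)
    n_docs = len(docs)
    return [{tok: tf * (math.log(n_docs / df[tok]) + 1.0) for tok, tf in c.items()}
            for c in counts]

# Usage: title, description, keywords and username concatenated per photo.
photos = ["Café concert!", "The concert 2012"]
vectors = tfidf([char_ngrams(normalize(p), {"the"}) for p in photos])
```

Keeping each row as a sparse dict matches the sparse representation the classifiers expect.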
RETRIEVING PHOTOS OF AN EVENT
In the most basic case, we (already) know about a specific event, and we wish to simply retrieve all photos associated with it
Classification-based approach
Limiting search space
Expanding feature space
Visual pruning
Classification-based Approach: I
Treat each event independently (a separate classifier is instantiated for each event in a series of events)
Train each classifier on the textual features composed beforehand for its event
No separate training dataset required
Classification-based Approach: II
Binary classification, but also introduce a third class that reflects events of the same topic to improve results
Possible to include features of another query
Two different fusion strategies implemented
Experiment with multiple classifiers (Linear SVC, SGD, …)
Use a sparse data representation and classifiers adapted to sparse data
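The framework uses library classifiers such as Linear SVC and SGD; the sketch below is instead a hand-rolled, unregularized SGD on the hinge loss over sparse feature dicts, to show how one per-event binary classifier works on sparse input. Labels are assumed, since no separate training set is used, to come from the collection itself, e.g. +1 for photos matching the venue and -1 for photos geo-tagged elsewhere.

```python
import random

def dot(weights, features):
    """Sparse dot product: only features present in the photo contribute."""
    return sum(weights.get(f, 0.0) * v for f, v in features.items())

def train_sgd_hinge(samples, labels, epochs=20, lr=0.1, seed=0):
    """Train a linear classifier with SGD on the hinge loss (no regularization).

    samples: list of sparse dicts {feature: value}; labels: +1 / -1.
    Returns (weights, bias).
    """
    weights, bias = {}, 0.0
    order = list(range(len(samples)))
    rng = random.Random(seed)
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            x, y = samples[i], labels[i]
            if y * (dot(weights, x) + bias) < 1.0:  # margin violated: subgradient step
                for f, v in x.items():
                    weights[f] = weights.get(f, 0.0) + lr * y * v
                bias += lr * y
    return weights, bias

def predict(model, features):
    weights, bias = model
    return 1 if dot(weights, features) + bias > 0 else -1
```

Updates touch only the features present in a photo, which keeps training cheap on large, sparse TF-IDF vectors.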
Limiting Search Space
Generally, the date and time a photo was captured are effective cues to bound the search space
For each event’s prediction step, we consider only those photos that lie within the event’s temporal search window, which is either:
Specified by the query (e.g. New Year’s Eve)
Retrieved by the framework through external topic-specific sources (e.g. the specific days of a concert tour)
Roughly estimated (based on a clustering scheme) in the event detection method presented later
Also exclude photos that do not match the venue’s geographic location
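A sketch of limiting the search space before prediction. The window bounds and the set of geographically excluded photos are inputs here; how they are obtained (query, external sources, clustering, geo matching) is described above, and the concrete window in the usage lines is a hypothetical example.

```python
from datetime import datetime, timedelta

def limit_search_space(photos, window_start, window_end, excluded_ids=frozenset()):
    """Keep photos captured inside [window_start, window_end] and not excluded.

    photos: iterable of (photo_id, capture_datetime).
    excluded_ids: photos known to lie outside the venue (e.g. geo-tagged
    elsewhere); they are dropped regardless of their timestamp.
    """
    return [pid for pid, taken in photos
            if window_start <= taken <= window_end and pid not in excluded_ids]

# Usage: a hypothetical New Year's Eve window running until 6 a.m.
start = datetime(2011, 12, 31)
end = start + timedelta(hours=30)
photos = [("a", datetime(2011, 12, 31, 23, 50)),
          ("b", datetime(2011, 12, 30, 12, 0)),
          ("c", datetime(2012, 1, 1, 0, 10))]
print(limit_search_space(photos, start, end, {"c"}))  # ['a']
```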
Expanding Feature Space
Expand feature space based on query information and photo collection itself
Helpful when “training” information is sparse (the case when there are few geo-tagged photos)
Iterative process:
1. Train an initial classifier on the few query terms available
2. Then compile a new list of textual terms based on the predicted outcome over all applicable photos
3. Finally, use the gained terms to refine the initial query terms
Example: Photos related to a specific music venue contain terms naming the band or artist playing there
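A sketch of the iterative expansion loop. The term-selection rule (document frequency among positive predictions versus add-one-smoothed frequency among negatives) and its thresholds are assumptions for illustration; `classify` is a hypothetical callable standing in for retraining and predicting with the current terms.

```python
from collections import Counter

def expand_query_terms(query_terms, photo_terms, predictions, top_k=5, min_ratio=2.0):
    """One refinement step: add terms over-represented among positive predictions.

    photo_terms: {photo_id: set of terms}; predictions: {photo_id: bool}.
    """
    pos = [photo_terms[p] for p, hit in predictions.items() if hit]
    neg = [photo_terms[p] for p, hit in predictions.items() if not hit]
    pos_df = Counter(t for terms in pos for t in terms)
    neg_df = Counter(t for terms in neg for t in terms)

    def ratio(term):
        pos_rate = pos_df[term] / max(len(pos), 1)
        neg_rate = (neg_df[term] + 1) / (len(neg) + 1)  # add-one smoothing
        return pos_rate / neg_rate

    candidates = [t for t in pos_df if t not in query_terms and ratio(t) >= min_ratio]
    candidates.sort(key=lambda t: (-ratio(t), t))  # strongest first, ties alphabetical
    return set(query_terms) | set(candidates[:top_k])

def iterate_expansion(query_terms, photo_terms, classify, rounds=2):
    """Alternate between predicting with the current terms and refining them.

    classify: hypothetical callable mapping a term set to {photo_id: bool}.
    """
    terms = set(query_terms)
    for _ in range(rounds):
        terms = expand_query_terms(terms, photo_terms, classify(terms))
    return terms
```

With a venue name as the only query term, a band name that co-occurs with it in the predicted photos is picked up in the first round, which matches the music-venue example.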
Visual Pruning
Mixing textual and visual features is not straightforward
Employ a cascade of two separate classifiers, each separately adjusted to its feature space and data representation
First fast textual classification, then visual binary pruning on few remaining photos
Utilize MPEG-7 color and texture features
Experiment with several classifiers (Random Forest, SVC with RBF kernel, Linear SVC)
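A sketch of the two-stage cascade. The MPEG-7 feature extraction and the trained classifiers are out of scope here; `text_score` and `visual_score` are hypothetical callables returning decision values, and the thresholds are assumptions.

```python
def cascade(photos, text_score, visual_score, text_threshold=0.0, visual_threshold=0.0):
    """Two-stage cascade: fast textual classifier first, visual pruning second.

    Only photos accepted by the textual stage reach the (more expensive)
    visual stage, which can only remove photos, never add them.
    """
    textual_hits = [p for p in photos if text_score(p) > text_threshold]
    return [p for p in textual_hits if visual_score(p) > visual_threshold]
```

Because each stage keeps its own feature space and classifier, the textual and visual features never have to be mixed in one representation.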
DETECTING EVENTS
Two proposals:
If the date but not the time of day is known, apply a clustering method to all candidates of a given day; the largest clusters then reflect events
Otherwise, extend the approach by performing a prediction step for every day instead of only the selected days conforming to the events; this, however, enlarges the search space
In both cases, apply a threshold (on the number of photos relating to a potential event) before considering a new event
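A simplified sketch of the thresholding step. Instead of the unspecified clustering method, it groups candidate photos by (capture date, venue), following the setup's definition of an event as a distinct combination of location and date; `venue_id` and `min_photos` are hypothetical inputs.

```python
from collections import defaultdict
from datetime import datetime

def detect_events(candidates, min_photos=10):
    """Group candidate photos by (capture date, venue) and keep large groups.

    candidates: iterable of (photo_id, capture_datetime, venue_id), where
    venue_id is the location label assigned during retrieval.
    Returns {(date, venue_id): [photo_ids]} for groups with >= min_photos
    photos, largest first.
    """
    groups = defaultdict(list)
    for pid, taken, venue in candidates:
        groups[(taken.date(), venue)].append(pid)
    kept = [(k, v) for k, v in groups.items() if len(v) >= min_photos]
    kept.sort(key=lambda kv: len(kv[1]), reverse=True)
    return dict(kept)  # dicts preserve insertion order
```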
EXPERIMENTS
Dataset
Implementation details and setup
Results
Dataset
2012 MediaEval SED Dataset – Challenge II
167,332 photos collected from Flickr
Metadata: unique Flickr ID, capture timestamp, username, title, description, keywords and, in about a fifth of the cases, geographic coordinates
Ground truth in the form of event clusters (specifying associated photos) for two topics/challenges
“Training set”: 2011 MediaEval SED Dataset
Implementation Details and Setup
Define event as a distinct combination of location and date (one event per day at the same location)
Use English names of locations only
Bounding threshold of 500 meters
Default: Linear SVC, no feature expansion, no visual pruning
Evaluation measures: Precision (P), Recall (R), F-score, Normalized Mutual Information (NMI)
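A sketch of the two less obvious measures. The F-score is the harmonic mean of P and R; for NMI, the `sqrt(H(U) * H(V))` normalization is an assumption, as several normalizations exist and the official evaluation may use another.

```python
import math
from collections import Counter

def f_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def nmi(labels_true, labels_pred):
    """Normalized mutual information between two flat clusterings."""
    n = len(labels_true)
    cu, cv = Counter(labels_true), Counter(labels_pred)
    joint = Counter(zip(labels_true, labels_pred))
    mi = sum(c / n * math.log(n * c / (cu[u] * cv[v])) for (u, v), c in joint.items())
    hu = -sum(c / n * math.log(c / n) for c in cu.values())
    hv = -sum(c / n * math.log(c / n) for c in cv.values())
    if hu == 0 or hv == 0:  # a single cluster on one side
        return 1.0 if hu == hv else 0.0
    return mi / math.sqrt(hu * hv)

print(round(f_score(79.0, 67.1), 1))  # 72.6, the default configuration's F-score
```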
Dataset Setup
Focus on Challenge II
Challenge I/III: the current approach has limitations
No event/venue detection through social media websites
Only basic venue/location detection/clustering, which is problematic when the destination covers a large area (e.g. an entire country)
Results: Challenge II
Detected: 32 events
Identified several thousand photos not belonging to any relevant venue → substantial reduction of candidates and a large number of training samples
Configuration             P (%)   R (%)   F (%)   NMI
Default configuration      79.0    67.1    72.6   0.65
Basic event detection      56.0    69.6    62.0   0.53
With visual pruning        83.2    61.9    71.0   0.63
With feature expansion     79.0    66.9    72.5   0.65
CONCLUSION
External information, e.g. about a venue, helpful for both event detection and retrieval of associated photos
Finding and linking external data in a uniform way is still challenging
Visual information does not improve results much
Future considerations: Social media websites like Facebook and Twitter
Improved venue/location detection/clustering
Thank you! Questions?