"MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd", talk given at the 2nd Real-Time Analysis and Mining of Social Streams Workshop (RAMSS), co-located with WWW 2013, Rio de Janeiro, Brazil
MediaFinder: Collect, Enrich and Visualize Media Memes
Shared by the Crowd
Raphaël Troncy
@rtroncy
Conferences and natural disaster
14/05/2013 Real-Time Analysis and Mining of Social Streams (RAMSS) - Rio de Janeiro - 2
Social Media: some definitions
Media Item: a photo or a video that is shared on a social network
Micropost: a text status message that can optionally accompany a media item
Social Network: an online service that focuses on building and reflecting social relationships among people who share interests or activities
Media Sharing Platform: a service that emphasizes sharing media, although the boundary with social networks is blurred since users are encouraged to react to media content (like, comment, favorite, etc.)
Social networks and media items
First-order support: posting requires the inclusion of a media item
Example: Flickr, YouTube
Second-order support: media items can be posted, but so can text-only messages
Example: Facebook
Third-order support: no direct support for media items; relies on third-party applications to host them
Example: Twitter before the introduction of native photo support
Media Server
Composition of media item extractors (12 social networks)
Relies on search APIs plus a fixed 30 s timeout window to provide results
Falls back on screen scraping when necessary (Twitter ecosystem)
Implemented as a NodeJS server
Serialize results in a common schema (JSON)
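The fan-out described on this slide (query several networks in parallel, keep whatever has answered when a fixed timeout expires, serialize to one JSON schema) can be sketched as follows. This is only an illustration in Python, not the NodeJS code of the Media Server: the `MediaItem` fields, the `fake_extractor` stand-in and the network names are all assumptions made for the example.

```python
import asyncio
import json
from dataclasses import dataclass, asdict


@dataclass
class MediaItem:
    # Hypothetical common schema; the field names are illustrative only.
    network: str
    media_url: str
    permalink: str
    micropost: str


async def fake_extractor(network, term, delay):
    # Stand-in for a call to one network's search API (no real request is made).
    await asyncio.sleep(delay)
    return [MediaItem(network,
                      f"http://example.org/{network}/{term}.jpg",
                      f"http://example.org/{network}/post/1",
                      f"about {term}")]


async def search_all(term, extractors, timeout_s=30.0):
    """Query every extractor concurrently; keep only what finished in time."""
    tasks = {asyncio.create_task(fn(term)): name
             for name, fn in extractors.items()}
    done, pending = await asyncio.wait(tasks.keys(), timeout=timeout_s)
    for task in pending:          # extractors that missed the window
        task.cancel()
    if pending:
        await asyncio.gather(*pending, return_exceptions=True)
    results = {}
    for task in done:
        if task.exception() is None:   # ignore extractors that failed
            results[tasks[task]] = [asdict(item) for item in task.result()]
    return results


extractors = {
    "fastnet": lambda term: fake_extractor("fastnet", term, 0.01),
    "slownet": lambda term: fake_extractor("slownet", term, 5.0),
}
# A short timeout is used here so the example finishes quickly.
results = asyncio.run(search_all("www2013", extractors, timeout_s=0.2))
print(json.dumps(results, indent=2))
```

With the short timeout used in the example, only the fast extractor contributes items; the slow one is cancelled rather than delaying the response.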
Deep link Permalink
Clean text for NLP processing
Aggregate view of ALL social interactions
12 Social Networks
Media Finder (www2013)
Media Finder (zooming on media items)
Media Finder (timeline view)
Named Entities are Pivotal
Standalone software: GATE, Stanford CoreNLP, Temis
Web APIs
http://nerd.eurecom.fr/
What is NERD?
Ontology [1], REST API [2], UI [3]
[1] http://nerd.eurecom.fr/ontology
[2] http://nerd.eurecom.fr/api/application.wadl
[3] http://nerd.eurecom.fr
The NERD ontology has been integrated into the NIF project, an EU FP7 effort in the context of LOD2: Creating Knowledge out of Interlinked Data
NERD REST API
GET, POST, PUT, DELETE
/document /user /annotation/{extractor} /extraction /evaluation ...
JSON/RDF*
"entities": [{
  "entity": "Tim Berners-Lee",
  "type": "Person",
  "uri": "http://dbpedia.org/resource/Tim_berners_lee",
  "nerdType": "http://nerd.eurecom.fr/ontology#Person",
  "startChar": 30,
  "endChar": 45,
  "confidence": 1,
  "relevance": 0.5
}]
Rizzo G., Troncy R. (2012), NERD: A Framework for Unifying Named Entity Recognition and Disambiguation Web Extraction Tools. In: European Chapter of the Association for Computational Linguistics (EACL'12), Avignon, France.
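As an illustration of how a client might consume the entity list shown on this slide, the sketch below parses that JSON fragment and groups entities by the local name of their `nerdType`. No request is sent to the NERD API; the `group_by_nerd_type` helper and its `min_confidence` parameter are assumptions made for the example, not part of NERD.

```python
import json
from collections import defaultdict

# The "entities" fragment from the slide, wrapped in an object so it parses.
response = json.loads("""
{"entities": [{
  "entity": "Tim Berners-Lee", "type": "Person",
  "uri": "http://dbpedia.org/resource/Tim_berners_lee",
  "nerdType": "http://nerd.eurecom.fr/ontology#Person",
  "startChar": 30, "endChar": 45, "confidence": 1, "relevance": 0.5
}]}
""")


def group_by_nerd_type(entities, min_confidence=0.0):
    """Group (surface form, URI) pairs by the fragment of the nerdType URI."""
    grouped = defaultdict(list)
    for e in entities:
        if e["confidence"] >= min_confidence:
            label = e["nerdType"].rsplit("#", 1)[-1]
            grouped[label].append((e["entity"], e["uri"]))
    return dict(grouped)


by_type = group_by_nerd_type(response["entities"], min_confidence=0.5)
print(by_type)

# In this sample the character span matches the length of the surface form.
first = response["entities"][0]
span_length = first["endChar"] - first["startChar"]
print(span_length, len(first["entity"]))
```

Grouping on `nerdType` rather than on `type` keeps the result independent of the labels used by each underlying extractor, which is the purpose of the shared NERD ontology.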
Media Finder Architecture
Media items harvesting using the Media Server:
http://eventmedia.eurecom.fr/media-server/search/{combined}/{term}
https://github.com/vuknje/media-server (@tomayac fork)
Image near de-duplication: DCT signature on images and video frames, Hamming distance between image pairs
Clustering and disambiguation: Named Entity Extraction using NERD, Topic Generation using LDA
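The near de-duplication step above can be sketched as a perceptual hash: take a 2-D DCT of a small grayscale thumbnail, keep the low-frequency coefficients, threshold them against their median to get a bit signature, and compare signatures with the Hamming distance. The slides do not give the thumbnail size, the number of coefficients or the distance threshold, so the 8x8 input, the `keep=4` block and `max_distance=2` below are assumptions for illustration only.

```python
import math


def dct_2d(pixels):
    """Naive, unnormalized 2-D DCT-II of a square grayscale matrix."""
    n = len(pixels)
    cos = [[math.cos((2 * x + 1) * u * math.pi / (2 * n)) for x in range(n)]
           for u in range(n)]
    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += pixels[x][y] * cos[u][x] * cos[v][y]
            out[u][v] = total
    return out


def dct_signature(pixels, keep=4):
    """Bit signature from the low-frequency keep x keep block (DC dropped)."""
    coeffs = dct_2d(pixels)
    low = [coeffs[u][v] for u in range(keep) for v in range(keep)][1:]
    median = sorted(low)[len(low) // 2]
    bits = 0
    for c in low:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits


def hamming(a, b):
    """Number of differing bits between two integer signatures."""
    return bin(a ^ b).count("1")


def is_near_duplicate(sig_a, sig_b, max_distance=2):
    # max_distance is a hypothetical threshold, not taken from the talk.
    return hamming(sig_a, sig_b) <= max_distance


# Toy 8x8 "image" and a uniformly brighter copy of it.
gradient = [[(x * 8 + y * 3) % 256 for y in range(8)] for x in range(8)]
brighter = [[2 * p for p in row] for row in gradient]

sig_a = dct_signature(gradient)
sig_b = dct_signature(brighter)
print(hamming(sig_a, sig_b), is_near_duplicate(sig_a, sig_b))
```

Because the signature only records whether each coefficient lies above the median, a uniform change in brightness scale leaves it unchanged, which is what makes this kind of hash useful for spotting re-shared copies of the same picture. For video, the same signature would be computed on sampled frames.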
Media Finder (named entities clustering)
Media Finder (zooming in a cluster)
Media Finder
Live Topic Generation from Event Streams
Meet us at the WWW 2013 Demo Session
http://www.youtube.com/watch?v=8iRiwz7cDYY
Tracking an event: Italian Election
Repeated queries over a period of time
We tracked and analyzed media posts tagged as elezioni2013 from 2013-02-26 to 2013-03-03
Cron job: every 30 minutes over the 6 days
Data sliced into 24-hour slots
Research question: can we re-create the news headlines?
Storyboarding: http://mediafinder.eurecom.fr/story/elezioni2013
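The slicing of the tracked posts into 24-hour slots can be sketched as below. The post records and their `published` field are hypothetical, and slot boundaries are assumed to fall at midnight of the first tracked day; a cron expression such as `*/30 * * * *` would be one way to express the 30-minute schedule, though the slides do not show the actual crontab.

```python
from collections import defaultdict
from datetime import datetime, timedelta

START = datetime(2013, 2, 26)      # first tracked day (from the slide)
SLOT = timedelta(hours=24)         # slot width (from the slide)


def slot_index(ts, start=START, slot=SLOT):
    """Zero-based index of the 24-hour slot containing timestamp ts."""
    return (ts - start) // slot


def slice_into_slots(posts):
    """Group posts by slot index, keeping arrival order within each slot."""
    slots = defaultdict(list)
    for post in posts:
        slots[slot_index(post["published"])].append(post["id"])
    return dict(slots)


# Hypothetical sample records; only the dates come from the tracking window.
posts = [
    {"id": "a", "published": datetime(2013, 2, 26, 9, 30)},
    {"id": "b", "published": datetime(2013, 2, 26, 23, 59)},
    {"id": "c", "published": datetime(2013, 3, 3, 12, 0)},
]
slots = slice_into_slots(posts)
print(slots)

# Number of polls made by a job that fires every 30 minutes for 6 days.
polls = (6 * 24 * 60) // 30
print(polls)
```

Each slot can then be clustered on its own, so that the entities dominating one day can be compared with those of the next.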
Tracking an event: Italian Election
Dataset:
~16501 microposts containing (duplicate) media items
~21087 Named Entities extracted
Clustering: NER and LDA
Generate a Bag of Entities (BOE), each entity disambiguated with a DBpedia URI
Examples: Monti, Bersani, Italia, Berlusconi, Grillo, Stelle
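A Bag of Entities can be pictured as a frequency count over disambiguated entity URIs, with each URI also serving as a cluster key for the microposts that mention it. The sketch below assumes each micropost already carries the list of URIs returned by the entity extractor; the sample records and the specific DBpedia URIs are illustrative, not taken from the dataset.

```python
from collections import Counter, defaultdict

DBP = "http://dbpedia.org/resource/"

# Hypothetical microposts, each already annotated with entity URIs.
microposts = [
    {"id": "p1", "entities": [DBP + "Mario_Monti", DBP + "Beppe_Grillo"]},
    {"id": "p2", "entities": [DBP + "Beppe_Grillo"]},
    {"id": "p3", "entities": [DBP + "Silvio_Berlusconi"]},
]


def bag_of_entities(posts):
    """Count in how many posts each entity URI appears."""
    bag = Counter()
    for post in posts:
        bag.update(set(post["entities"]))   # count once per post
    return bag


def cluster_by_entity(posts):
    """Map each entity URI to the ids of the posts that mention it."""
    clusters = defaultdict(list)
    for post in posts:
        for uri in sorted(set(post["entities"])):
            clusters[uri].append(post["id"])
    return dict(clusters)


bag = bag_of_entities(microposts)
clusters = cluster_by_entity(microposts)
print(bag.most_common(1))
print(clusters[DBP + "Beppe_Grillo"])
```

Keying on the URI rather than on the surface string means that different spellings of the same person end up in the same cluster once they have been disambiguated.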
Tracking an event: Italian Election
Tracking and Analyzing the 2013 Italian Election
To appear at the ESWC 2013 Demo Session
http://www.youtube.com/watch?v=jIMdnwMoWnk
Take Home Message
Media Server / Media Finder:
Aggregating fresh social media items
Making sense of media collections for video hyper-linking
NERD platform for extracting key information
Vision: adoption of semantic multimedia technologies will foster a European market for media fragment re-purposing and re-selling
Sneak preview: interact with a Kinect and discover enriched hypervideo
http://www.youtube.com/watch?v=4mSC685AG7k
Credits
Vuk Milicic … interaction designer
Giuseppe Rizzo … NERD guru
José Luis Redondo Garcia … triplification and clustering
Thomas Steiner … Media Server original code
http://www.slideshare.net/troncy