MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Raphaël Troncy [email protected] / @rtroncy

Upload: raphael-troncy

Post on 18-Dec-2014


DESCRIPTION

"MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd", talk given at the 2nd Real Time Analysis and Mining of Social Streams Workshop (RAMSS) colocated with WWW 2013, Rio de Janeiro, Brazil

TRANSCRIPT

Page 1: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Raphaël Troncy

[email protected] / @rtroncy

Page 2: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Conferences and natural disaster

14/05/2013 Real-Time Analysis and Mining of Social Streams (RAMSS) - Rio de Janeiro - 2

Page 3: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd


Page 4: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd


Page 6: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd


Page 7: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Social Media: some definitions

Media Item: a photo or a video that is shared on a social network

Micropost: a text status message that can optionally accompany a media item

Social Network: an online service that focuses on building and reflecting social relationships among people sharing interests or activities

Media Sharing Platforms: emphasis on sharing media, but blurred boundaries with social networks since users are encouraged to react to media content (like, comment, favorite, etc.)


Page 8: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Social networks and media items

First-order support: Posting requires the inclusion of a media item
Example: Flickr, YouTube

Second-order support: Possibility to post media items but also text-only messages
Example: Facebook

Third-order support: No direct support for media items; relies on third-party applications to host them
Example: Twitter before the introduction of native photo support


Page 9: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Server

Composition of media item extractors (12 SNs)

Relies on search APIs + a fixed 30s timeout window to provide results

Falls back on screen scraping when necessary (Twitter ecosystem)

Implemented as a NodeJS server

Serialize results in a common schema (JSON)
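
The extractor composition described on this slide can be sketched as follows. This is a minimal illustration in Python, not the actual NodeJS Media Server code: the extractor callables, the `search_all` and `to_common_schema` helpers, and the field names of the common record are all assumptions made for the example. Only the fixed 30-second window and the JSON serialization come from the slide.

```python
import json
import time
from concurrent.futures import ThreadPoolExecutor, wait

TIMEOUT_SECONDS = 30  # fixed window mentioned on the slide


def to_common_schema(network, raw):
    """Map one raw hit to a shared record (field names are hypothetical)."""
    return {
        "network": network,
        "mediaUrl": raw.get("url"),
        "micropost": raw.get("text", ""),
    }


def search_all(extractors, term, timeout=TIMEOUT_SECONDS):
    """Query every extractor in parallel; keep only what returns in time."""
    pool = ThreadPoolExecutor(max_workers=max(1, len(extractors)))
    futures = {pool.submit(fn, term): name for name, fn in extractors.items()}
    done, _late = wait(futures, timeout=timeout)
    # Networks that are late or that fail simply contribute no items.
    results = {name: [] for name in extractors}
    for future in done:
        name = futures[future]
        try:
            hits = future.result()
        except Exception:
            continue
        results[name] = [to_common_schema(name, hit) for hit in hits]
    pool.shutdown(wait=False)  # do not block on extractors that are still running
    return results


if __name__ == "__main__":
    # Stand-in extractors; a real one would call a social network search API.
    def fast(term):
        return [{"url": "http://example.org/a.jpg", "text": "photo " + term}]

    def slow(term):
        time.sleep(0.5)
        return [{"url": "http://example.org/late.jpg"}]

    print(json.dumps(search_all({"fast": fast, "slow": slow}, "www2013", timeout=0.1), indent=2))
```

With the short demo timeout, the `slow` stand-in misses the window and its list stays empty, which mirrors the idea of returning whatever is available when the window closes.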


Page 10: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd


Deep link Permalink

Clean text for NLP processing

Aggregate view of ALL social interactions

12 Social Networks

Page 11: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Finder (www2013)


Page 12: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Finder (zooming on media items)


Page 13: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Finder (timeline view)


Page 14: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Named Entities are Pivotal

Standalone software: GATE, Stanford CoreNLP, Temis

Web APIs

http://nerd.eurecom.fr/


Page 15: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

What is NERD?

An ontology [1], a REST API [2] and a UI [3]

[1] http://nerd.eurecom.fr/ontology
[2] http://nerd.eurecom.fr/api/application.wadl
[3] http://nerd.eurecom.fr

The NERD ontology has been integrated into the NIF project, an EU FP7 effort in the context of LOD2: Creating Knowledge out of Interlinked Data


Page 16: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

NERD REST API


GET, POST, PUT, DELETE

/document /user /annotation/{extractor} /extraction /evaluation ...

JSON/RDF*

"entities": [{
  "entity": "Tim Berners-Lee",
  "type": "Person",
  "uri": "http://dbpedia.org/resource/Tim_berners_lee",
  "nerdType": "http://nerd.eurecom.fr/ontology#Person",
  "startChar": 30,
  "endChar": 45,
  "confidence": 1,
  "relevance": 0.5
}]

Rizzo G., Troncy R. (2012), NERD: A Framework for Unifying Named Entity Recognition and Disambiguation Web Extraction Tools. In: European chapter of the Association for Computational Linguistics (EACL'12), Avignon, France.
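
To show how the fields of the entity record on this slide might be consumed, here is a small Python sketch. It does not call the NERD API; it only parses a response shaped like the slide's example. The input sentence, the `surface_forms` helper and the confidence threshold are assumptions made for illustration, and the sentence was chosen so that the slide's offsets (30 to 45) fall on the entity.

```python
import json

# Hypothetical input text, built so that characters 30..45 hold the entity.
SAMPLE_TEXT = "The Web was first proposed by Tim Berners-Lee"

# Response fragment copied from the slide and wrapped in an object.
SAMPLE_RESPONSE = """
{"entities": [{
  "entity": "Tim Berners-Lee",
  "type": "Person",
  "uri": "http://dbpedia.org/resource/Tim_berners_lee",
  "nerdType": "http://nerd.eurecom.fr/ontology#Person",
  "startChar": 30,
  "endChar": 45,
  "confidence": 1,
  "relevance": 0.5
}]}
"""


def surface_forms(text, response_json, min_confidence=0.5):
    """Return (surface text, NERD class, URI) for sufficiently confident entities."""
    found = []
    for entity in json.loads(response_json)["entities"]:
        if entity["confidence"] < min_confidence:
            continue
        surface = text[entity["startChar"]:entity["endChar"]]
        nerd_class = entity["nerdType"].rsplit("#", 1)[-1]  # local name after '#'
        found.append((surface, nerd_class, entity["uri"]))
    return found


if __name__ == "__main__":
    for surface, nerd_class, uri in surface_forms(SAMPLE_TEXT, SAMPLE_RESPONSE):
        print(surface, nerd_class, uri)
```

The character offsets let a client highlight the mention in the original micropost, while `nerdType` gives a class that is comparable across the different extractors.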

Page 17: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Finder Architecture

Media items harvesting using the Media Server:
http://eventmedia.eurecom.fr/media-server/search/{combined}/{term}
https://github.com/vuknje/media-server (@tomayac fork)

Image near de-duplication:
DCT signature on image and video frame, Hamming distance between image pairs

Clustering and disambiguation:
Named Entity Extraction using NERD
Topic Generation using LDA
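
The near de-duplication step (DCT signature plus Hamming distance) can be illustrated with the sketch below. It is a generic perceptual-hash construction in pure Python, not necessarily the exact signature used by Media Finder: the input is assumed to be an already downscaled square grayscale grid, and the number of retained coefficients and the `max_distance` threshold are hypothetical choices.

```python
import math


def dct_signature(gray, keep=8):
    """Build a bit signature from the low-frequency 2D DCT-II coefficients.

    `gray` is a square list of lists of pixel intensities. The DC term is
    skipped so that a global brightness change has little effect.
    """
    n = len(gray)
    coeffs = []
    for u in range(keep):
        for v in range(keep):
            if u == 0 and v == 0:
                continue
            total = 0.0
            for x in range(n):
                cos_x = math.cos((2 * x + 1) * u * math.pi / (2 * n))
                for y in range(n):
                    total += gray[x][y] * cos_x * math.cos((2 * y + 1) * v * math.pi / (2 * n))
            coeffs.append(total)
    median = sorted(coeffs)[len(coeffs) // 2]
    bits = 0
    for value in coeffs:
        bits = (bits << 1) | (1 if value > median else 0)  # one bit per coefficient
    return bits


def hamming_distance(a, b):
    """Number of bit positions in which two signatures differ."""
    return bin(a ^ b).count("1")


def is_near_duplicate(img_a, img_b, max_distance=5):
    """Treat two images as near-duplicates when their signatures are close."""
    return hamming_distance(dct_signature(img_a), dct_signature(img_b)) <= max_distance


if __name__ == "__main__":
    image = [[(3 * x * x + 7 * y + 5 * x * y) % 256 for y in range(16)] for x in range(16)]
    inverted = [[255 - p for p in row] for row in image]
    print(hamming_distance(dct_signature(image), dct_signature(image)))
    print(hamming_distance(dct_signature(image), dct_signature(inverted)))
```

For video, the same signature would be computed on sampled frames; comparing compact bit strings keeps the pairwise comparison cheap.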


Page 18: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Finder (named entities clustering)


Page 19: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Finder (zooming in a cluster)


Page 20: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Media Finder

Live Topic Generation from Event Streams

Meet us at WWW 2013 Demo Session

http://www.youtube.com/watch?v=8iRiwz7cDYY


Page 21: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Tracking an event: Italian Election

Repeated queries over a period of time

We have tracked and analyzed media posts tagged as elezioni2013 from 2013-02-26 to 2013-03-03

Cron job: every 30 minutes over the 6 days

Slice the data into 24-hour slots

Research questions: Can we re-create the news headlines?

Storyboarding: http://mediafinder.eurecom.fr/story/elezioni2013
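
The 24-hour slicing mentioned on this slide could look like the following Python sketch. The post structure (a dict with a `timestamp` key) and the `slice_into_slots` helper are assumptions made for the example; only the tracking period and the slot length come from the slide.

```python
from collections import defaultdict
from datetime import datetime, timedelta

TRACKING_START = datetime(2013, 2, 26)  # first tracked day, from the slide
SLOT = timedelta(hours=24)


def slice_into_slots(posts, start=TRACKING_START, slot=SLOT):
    """Group posts by slot index: 0 for the first 24 hours, 1 for the next, etc."""
    slots = defaultdict(list)
    for post in posts:
        index = (post["timestamp"] - start) // slot  # timedelta // timedelta gives an int
        slots[index].append(post)
    return dict(slots)


if __name__ == "__main__":
    # Hypothetical harvested posts; a real run would read the cron job output.
    sample = [
        {"id": "a", "timestamp": datetime(2013, 2, 26, 10, 0)},
        {"id": "b", "timestamp": datetime(2013, 2, 26, 23, 30)},
        {"id": "c", "timestamp": datetime(2013, 3, 3, 12, 0)},
    ]
    for index, items in sorted(slice_into_slots(sample).items()):
        print(index, [p["id"] for p in items])
```

Each slot can then be clustered on its own, which is what makes a day-by-day comparison with news headlines possible.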


Page 22: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Tracking an event: Italian Election

Dataset:
~16501 microposts containing (duplicate) media items
~21087 Named Entities extracted

Clustering: NER and LDA
Generate a Bag of Entities (BOE), each disambiguated with a DBpedia URI

Examples: Monti, Bersani, Italia, Berlusconi, Grillo, Stelle
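
A Bag of Entities keyed by DBpedia URI can be sketched in a few lines of Python. The input layout (one list of entity dicts per micropost, each with a `uri` field as in the NERD output) and the sample URIs are illustrative assumptions, not data from the study.

```python
from collections import Counter


def bag_of_entities(annotated_posts):
    """Count, per DBpedia URI, the number of microposts mentioning that entity.

    Entities without a URI are skipped, and each URI counts once per post,
    so different surface forms of the same entity collapse into one key.
    """
    bag = Counter()
    for entities in annotated_posts:
        uris = {e["uri"] for e in entities if e.get("uri")}
        bag.update(uris)
    return bag


if __name__ == "__main__":
    # Hypothetical annotations for three microposts.
    grillo = "http://dbpedia.org/resource/Beppe_Grillo"
    monti = "http://dbpedia.org/resource/Mario_Monti"
    posts = [
        [{"entity": "Grillo", "uri": grillo}, {"entity": "Beppe Grillo", "uri": grillo}],
        [{"entity": "Grillo", "uri": grillo}, {"entity": "Monti", "uri": monti}],
        [{"entity": "Stelle", "uri": None}],
    ]
    print(bag_of_entities(posts).most_common())
```

Keying on the URI rather than on the surface string is what lets mentions such as "Grillo" and "Beppe Grillo" contribute to the same cluster.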


Page 23: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Tracking an event: Italian Election

Tracking and Analyzing The 2013 Italian Election

To appear at ESWC 2013 Demo Session

http://www.youtube.com/watch?v=jIMdnwMoWnk


Page 24: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Take Home Message

Media Server / Media Finder:
Aggregating fresh social media items
Making sense of media collection for video hyper-linking

NERD platform for extracting key information

Vision: adoption of semantic multimedia technologies will foster a European market for media fragment re-purposing and re-selling

Sneak preview: Interact with a Kinect and discover enriched hypervideo

http://www.youtube.com/watch?v=4mSC685AG7k


Page 25: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

Credits

Vuk Milicic … interaction designer

Giuseppe Rizzo … NERD guru

José Luis Redondo Garcia … triplification and clustering

Thomas Steiner … Media Server original code


Page 26: MediaFinder: Collect, Enrich and Visualize Media Memes Shared by the Crowd

http://www.slideshare.net/troncy
