EVENT IDENTIFICATION IN SOCIAL MEDIA
Hila Becker, Luis Gravano (Columbia University)
Mor Naaman (Rutgers University)

Post on 23-Dec-2015

Social Media Sites Host Many “Event” Documents

Photo-sharing: Flickr
Video-sharing: YouTube
Social networking: Facebook

“Event” = something that occurs at a certain time in a certain place [Yang et al. ’99]

Popular, widely known events: Presidential Inauguration, Thanksgiving Day Parade

Smaller events, without traditional news coverage: local food drive, street fair

Example: social media documents for “All Points West” festival, Liberty State Park, New Jersey, 8/8/08

Identifying Events and Associated Social Media Documents

Applications
  Event search and browsing
  Local search
  …

General approach: group similar documents via clustering
  Each cluster corresponds to one event and its associated social media documents

Event Identification: Challenges

Uneven data quality
  Missing, short, uninformative text
  … but revealing structured context available: tags, date/time, geo-coordinates
Scalability
Dynamic data stream of event information
Unknown number of events
  Necessary for many clustering algorithms
  Difficult to estimate

Clustering Social Media Documents

Social media document representation
Social media document similarity
Social media document clustering
  Clustering task: definition
  Ensemble algorithm: combining multiple clustering results
Preliminary evaluation

Social Media Document Representation

[Diagram: document features] Title, Description, Tags, Date/Time, Location, All-Text

Social Media Document Similarity

Text: tf-idf weights, cosine similarity

[Diagram: feature representations] Title, Description, Tags, All-Text, Date/Time-Keywords, Date/Time-Proximity, Location-Keywords, Location-Proximity

Location: geo-coordinate proximity

Time: proximity in minutes
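
The three similarity notions on this slide (text, location, time) can be sketched as follows. This is a minimal illustration, not the authors' implementation. The idf variant, the linear time decay with its `window_minutes` cutoff, and the haversine-based location score normalized by `max_km` are all assumptions introduced here, as are the function names.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Build one sparse tf-idf vector (term -> weight) per tokenized document.

    Assumption: idf = log(N / df) + 1, one common variant among several.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: c * (math.log(n / df[t]) + 1.0) for t, c in tf.items()})
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def time_similarity(t1_minutes, t2_minutes, window_minutes):
    """Time proximity in [0, 1]: 1 for identical timestamps, 0 beyond the window.

    Assumption: linear decay; window_minutes is a hypothetical parameter.
    """
    gap = abs(t1_minutes - t2_minutes)
    return max(0.0, 1.0 - gap / window_minutes)


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(min(1.0, math.sqrt(a)))


def location_similarity(p, q, max_km=20015.0):
    """Location proximity in [0, 1].

    Assumption: distance is normalized by max_km (roughly half the Earth's
    circumference by default); this normalization is illustrative only.
    """
    return max(0.0, 1.0 - haversine_km(p, q) / max_km)
```

All three functions return scores in [0, 1], so they can be compared or combined on a common scale.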

Social Media Document Clustering Framework

[Diagram: social media documents → document feature representation → event clusters]

Clustering: Ensemble Algorithm

Clusterers: Ctitle, Ctags, Ctime
Weights: Wtitle, Wtags, Wtime
Learned in a training step
Consensus function f(C,W): combine ensemble similarities
Ensemble clustering solution
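
A consensus function f(C,W) of this kind can be sketched as a weighted average of the clusterers' binary "same cluster" predictions, which is how the Experimental Setup slide describes it. The function names, the dictionary layout, and the example weights below are hypothetical and chosen only for illustration.

```python
def same_cluster(assignment, doc_a, doc_b):
    """Binary prediction of one clusterer: 1.0 if it put both documents together."""
    return 1.0 if assignment[doc_a] == assignment[doc_b] else 0.0


def consensus_similarity(assignments, weights, doc_a, doc_b):
    """Weighted average of the clusterers' binary predictions for one document pair.

    assignments: clusterer name -> {document id -> cluster id}
    weights:     clusterer name -> non-negative weight (hypothetical values here)
    """
    total = sum(weights.values())
    score = sum(weights[name] * same_cluster(assignments[name], doc_a, doc_b)
                for name in weights)
    return score / total


# Hypothetical output of three clusterers over three documents.
assignments = {
    "title": {"d1": 0, "d2": 0, "d3": 1},
    "tags":  {"d1": 0, "d2": 0, "d3": 0},
    "time":  {"d1": 0, "d2": 1, "d3": 1},
}
# Hypothetical weights; in the talk these are learned in a training step.
weights = {"title": 0.5, "tags": 0.3, "time": 0.2}
```

With these made-up values, `consensus_similarity(assignments, weights, "d1", "d2")` is about 0.8, because the title and tags clusterers agree that the two documents belong together while the time clusterer does not.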

Clustering: Measuring Quality

Homogeneous clusters
Complete clusters

Metric: Normalized Mutual Information (NMI)
  Shared information between clustering solution and “ground truth”
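
To make the metric concrete, here is a small sketch of NMI between a predicted clustering and ground-truth event labels. It assumes the mutual information is normalized by the arithmetic mean of the two entropies; other normalizations exist, and the slides do not say which one is used.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a labeling."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(pred, truth):
    """Mutual information between two labelings of the same documents."""
    n = len(pred)
    count_pred, count_truth = Counter(pred), Counter(truth)
    joint = Counter(zip(pred, truth))
    return sum((nij / n) * math.log(n * nij / (count_pred[p] * count_truth[t]))
               for (p, t), nij in joint.items())


def nmi(pred, truth):
    """NMI = 2 * I(pred; truth) / (H(pred) + H(truth)).

    Assumption: arithmetic-mean normalization. Returns 1.0 when both
    labelings are a single cluster (both entropies are zero).
    """
    denom = entropy(pred) + entropy(truth)
    if denom == 0:
        return 1.0
    return 2.0 * mutual_information(pred, truth) / denom
```

NMI ignores the names of the cluster labels: `nmi([0, 0, 1, 1], [1, 1, 0, 0])` is 1.0 (up to rounding), whereas a clustering that splits every event evenly across clusters, such as `nmi([0, 1, 0, 1], [0, 0, 1, 1])`, scores 0.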

Experimental Setup

Data: >270K Flickr photos
  Event labels from Yahoo!’s “upcoming” event database
  Split into 3 parts for training/validation/testing
Clusterers: single pass algorithm with centroid similarity
Weighting scheme: Normalized Mutual Information (NMI) scores on validation set
Consensus function: weighted average of clusterers’ binary predictions
Final prediction step: single pass clustering algorithm
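
A single pass clustering algorithm with centroid similarity can be sketched as below. This is a generic version under stated assumptions, not the exact procedure from the paper: documents are dense feature vectors, similarity is cosine, centroids are running means, and `threshold` is a hypothetical parameter that would need tuning.

```python
import math


def cosine(u, v):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def single_pass_cluster(vectors, threshold):
    """Assign each document, in arrival order, to the most similar centroid.

    A document joins the best cluster if its similarity to that cluster's
    centroid is at least `threshold`; otherwise it starts a new cluster.
    Returns one cluster id per input vector.
    """
    centroids, sizes, labels = [], [], []
    for vec in vectors:
        best, best_sim = None, threshold
        for idx, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = idx, sim
        if best is None:
            centroids.append(list(vec))
            sizes.append(1)
            labels.append(len(centroids) - 1)
        else:
            sizes[best] += 1
            n = sizes[best]
            # Incremental update of the running-mean centroid.
            centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], vec)]
            labels.append(best)
    return labels
```

Each document is processed once and the number of clusters is not fixed in advance, which fits the scalability and unknown-number-of-events challenges listed earlier in the talk.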

Preliminary Evaluation Results

Individual clusterer performance
  Highest NMI: Tags, All-Text
  Lowest NMI: Description, Title
Ensemble performance, compared against all individual clusterers
  Highest overall performance in terms of NMI
  More homogeneous clusters: each event is spread over fewer clusters

Details in paper

Future Work: Alternative Choices

Document similarity metric
Ensemble approach
  Weight assignment
  Choice of clusterers
Train a classifier to predict document similarity
  Features correspond to similarity scores
  All-text, title, tags, time, location, etc.
  Numeric values in [0,1]
  State-of-the-art classifiers: SVM, Logistic Regression, …
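
The classifier idea from this slide, with one feature per similarity score and a prediction of whether two documents belong to the same event, might look like the sketch below. It uses a hand-written logistic regression trained by batch gradient descent so that it stays self-contained. The training pairs, feature order, learning rate, and epoch count are all invented for illustration and are not results from the talk.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(features, labels, lr=0.5, epochs=2000):
    """Fit logistic regression weights with plain batch gradient descent."""
    dim = len(features[0])
    n = len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(b + sum(wi * xi for wi, xi in zip(w, x))) - y
            for j in range(dim):
                grad_w[j] += err * x[j]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_same_event(w, b, x):
    """Estimated probability that a document pair describes the same event."""
    return sigmoid(b + sum(wi * xi for wi, xi in zip(w, x)))


# Hypothetical document pairs: [all-text sim, tags sim, time sim], each in [0, 1].
pairs = [
    [0.90, 0.80, 0.95], [0.70, 0.90, 0.90], [0.80, 0.60, 0.99],  # same event
    [0.10, 0.00, 0.20], [0.20, 0.10, 0.50], [0.05, 0.20, 0.10],  # different events
]
same_event = [1, 1, 1, 0, 0, 0]

w, b = train_logistic(pairs, same_event)
```

The learned probability could then take the place of the weighted-average consensus score as the document similarity fed into the final clustering step.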

Future Work: Alternative Choices

Final clustering step
  Apply graph partitioning algorithms
  Requires estimating the number of clusters
Evaluation metrics: beyond NMI
Datasets
  Flickr
  LastFM, YouTube
Exploit social network connections

Conclusions

Identified events and their corresponding social media documents
Proposed a clustering solution
Leveraged different representations of social media documents
Employed various social media similarity metrics
Developed a weighted ensemble clustering approach
Reported preliminary results of our event identification approach on a large-scale dataset of Flickr photographs