effective event identification in social media

+ Effective Event Identification in Social Media

2014/4/27 (Mon.), Chang Wei-Yuan @ MakeLab Lab Meeting

Fotis Psallidas, Hila Becker, DEB’13



+ Outline

- Introduction
- Method
  - Known-Event Identification
  - Unknown-Event Identification
  - Improving Identification Effectiveness
- Experimental Evaluation
- Conclusion
- Thought

+ Introduction

- Online social media extensively distribute content related to real-world events.
  - Event: something that occurs at a certain time in a certain place


Goal: Identifying Events and Associated Social Media Documents

+ Introduction

- General approach: group similar documents via clustering
  - Each cluster corresponds to one event and its associated social media documents

+ Introduction

- Challenges
  - Uneven data quality
  - Highly heterogeneous
  - Dynamic data stream of event information
  - Number of events unknown

+ Event Identification

- Known-Event Identification
- Unknown-Event Identification
- Improving Identification Effectiveness

+ Known-Event Identification

- Social media content related to known events
  - resides on multiple social media sites, each contributing different information
- Retrieving cross-site social media documents for the same event
  - tends to miss many relevant event documents

+ Known-Event Identification

- First step: use the known event properties to achieve high-precision results.
- Second step: use term extraction and frequency analysis to improve recall.
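The two steps above might be sketched as follows. This is a minimal illustration, not the authors' implementation: the `make_search` toy search function, the sample corpus, the stop-word list, and the `top_k` cutoff are all hypothetical stand-ins introduced for the example.

```python
from collections import Counter
import re

# Illustrative stop-word list (an assumption, not from the slides).
STOP_WORDS = {"the", "a", "at", "in", "of", "and", "to", "is", "was", "on"}


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens that are not stop words."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOP_WORDS]


def make_search(corpus):
    """Toy search: return documents that contain every query term."""
    def search(terms):
        return [d for d in corpus if all(t in tokenize(d) for t in terms)]
    return search


def precision_step(event, search):
    """Step 1: query with the known event properties (title and venue together)."""
    return search(tokenize(event["title"]) + tokenize(event["venue"]))


def recall_step(event, seed_docs, search, top_k=2):
    """Step 2: extract frequent terms from the step-1 results and issue broader queries."""
    known = set(tokenize(event["title"]) + tokenize(event["venue"]))
    counts = Counter(t for doc in seed_docs for t in tokenize(doc) if t not in known)
    results = list(seed_docs)
    for term, _ in counts.most_common(top_k):
        for doc in search([term]):
            if doc not in results:
                results.append(doc)
    return results


corpus = [
    "Radiohead live at Madison Square Garden, Thom Yorke on fire",
    "Radiohead Madison Square Garden setlist thom yorke",
    "thom yorke was amazing",
    "lunch in the park",
]
event = {"title": "Radiohead", "venue": "Madison Square Garden"}
search = make_search(corpus)
seeds = precision_step(event, search)
expanded = recall_step(event, seeds, search)
```

Here the strict query finds only the two documents that mention both the title and the venue, while the frequent term extracted from them pulls in a third document that never names the event. Querying on a single extracted term is the simplest possible choice and would trade away precision on real data.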

+ Unknown-Event Identification

- A Twitter stream may contain many tweets related to an event
  - mixed with messages related to other events
  - mixed with messages unrelated to events

+ Unknown-Event Identification

- The proposed online clustering framework
  - leverages multiple features to decide when two social media documents correspond to the same event

+ Social Media Document Clustering Framework

[Figure: social media documents -> document feature representation -> event clusters]
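A single-pass version of such a framework could look like the sketch below. It is a simplified illustration only: the Jaccard similarity over whitespace tokens, the cluster representation, and the `threshold` value are assumptions, not details from the slides.

```python
def jaccard(a, b):
    """Similarity between two term sets (an assumed stand-in for the framework's similarity)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def online_cluster(documents, threshold=0.3):
    """Single pass: each document joins its most similar cluster or starts a new one."""
    clusters = []  # each cluster holds its documents and the union of their terms
    for doc in documents:
        terms = set(doc.lower().split())
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = jaccard(terms, cluster["terms"])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best["docs"].append(doc)
            best["terms"] |= terms
        else:
            clusters.append({"docs": [doc], "terms": terms})
    return clusters


stream = [
    "obama inauguration washington",
    "superbowl halftime show",
    "inauguration washington crowd",
    "superbowl show tonight",
]
clusters = online_cluster(stream)
```

Because new clusters are created on demand, this style of clustering does not need the number of events in advance, which matches the "number of events unknown" challenge listed earlier.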

+ Ensemble Algorithm

- The proposed online clustering framework
  - deploys ensemble learning methods to learn and associate each feature with a weight and a threshold that capture the importance of the feature

+ Ensemble Algorithm

[Figure: the per-feature clusterers C_title, C_tags, C_time and their weights W_title, W_tags, W_time (learned in a training step) feed the consensus function f(C, W), which combines the ensemble similarities into the ensemble clustering solution.]
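One plausible reading of the consensus function f(C, W) is a weighted average of per-feature similarities, sketched below. The weight values, the linear time similarity, and the 24-hour window are hypothetical; the slides state only that the weights are learned in a training step.

```python
from datetime import datetime


def term_similarity(a, b):
    """Jaccard similarity over whitespace tokens (an assumed per-feature similarity)."""
    a, b = set(a.lower().split()), set(b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0


def time_similarity(t1, t2, window_hours=24.0):
    """1.0 for identical timestamps, falling linearly to 0.0 at `window_hours` apart."""
    gap = abs((t1 - t2).total_seconds()) / 3600.0
    return max(0.0, 1.0 - gap / window_hours)


# Hypothetical weights standing in for W_title, W_tags, W_time.
WEIGHTS = {"title": 0.5, "tags": 0.3, "time": 0.2}


def consensus(doc_a, doc_b, weights=WEIGHTS):
    """Weighted average of the per-feature similarities."""
    sims = {
        "title": term_similarity(doc_a["title"], doc_b["title"]),
        "tags": term_similarity(doc_a["tags"], doc_b["tags"]),
        "time": time_similarity(doc_a["time"], doc_b["time"]),
    }
    return sum(weights[f] * sims[f] for f in sims) / sum(weights.values())


a = {"title": "jazz festival", "tags": "jazz music nyc", "time": datetime(2013, 6, 1, 20)}
b = {"title": "jazz festival night", "tags": "jazz nyc", "time": datetime(2013, 6, 1, 22)}
c = {"title": "tech meetup", "tags": "startup", "time": datetime(2013, 7, 1, 20)}
score_ab = consensus(a, b)
score_ac = consensus(a, c)
```

Documents `a` and `b` agree on all three features and score high, while `a` and `c` share nothing and score zero.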

+ Event Classification

- The proposed online clustering framework
  - deploys event classification to distinguish between event-related clusters and non-event ones

+ Event Classification

[Figure: event classification labels each cluster in the ensemble clustering solution as related to an event or unrelated to events.]

+ Improving Identification Effectiveness

- How events behave over time has a significant impact on the effectiveness of the document clustering procedure.
- Refining the clustering procedure to benefit from these factors is a challenging task.

+ URLs

- URLs in event-related social streams are ubiquitous. Individuals use them to share meaningful event-related external content.
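As a rough illustration of how shared URLs could serve as a similarity signal, the sketch below extracts URLs with a simple regular expression and compares the resulting sets. The pattern, the punctuation stripping, and the Jaccard comparison are assumptions for the example; real streams would also need shortened URLs resolved.

```python
import re

# Simple pattern: "http://" or "https://" followed by non-whitespace characters.
URL_PATTERN = re.compile(r"https?://[^\s]+")


def extract_urls(text):
    """Return the set of URLs in a message, with trailing punctuation stripped."""
    return {u.rstrip(".,;:!?)") for u in URL_PATTERN.findall(text)}


def url_similarity(text_a, text_b):
    """Jaccard overlap of the URL sets; 0.0 when either message has no URL."""
    a, b = extract_urls(text_a), extract_urls(text_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


same = url_similarity(
    "watch live http://example.com/stream.",
    "stream here: http://example.com/stream",
)
none = url_similarity("no links here", "see http://example.com/a")
```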

+ Bursty Vocabulary

- The social media content related to an event tends to revolve around a central topic.
  - This central topic is expressed by a set of terms that are significantly more frequent than usual (bursty terms).
  - Events that span a wide time range exhibit a different set of these bursty terms at different points of their lifetime.
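A simple way to flag bursty terms is to compare a term's relative frequency in a recent window against a background sample, as in the sketch below. The `ratio` and `min_count` parameters and the add-one smoothing are assumptions chosen for illustration, not the method from the slides.

```python
from collections import Counter


def bursty_terms(window_docs, background_docs, ratio=3.0, min_count=2):
    """Return terms whose window frequency is at least `ratio` times the background's."""
    window = Counter(t for d in window_docs for t in d.lower().split())
    background = Counter(t for d in background_docs for t in d.lower().split())
    w_total = sum(window.values()) or 1
    b_total = sum(background.values()) or 1
    bursty = set()
    for term, count in window.items():
        if count < min_count:
            continue  # ignore terms seen too rarely to be reliable
        w_rate = count / w_total
        # Add-one smoothing so terms unseen in the background do not divide by zero.
        b_rate = (background[term] + 1) / (b_total + 1)
        if w_rate / b_rate >= ratio:
            bursty.add(term)
    return bursty


background = [
    "good morning everyone",
    "coffee time this morning",
    "traffic is bad this morning",
]
window = [
    "earthquake just now",
    "did you feel the earthquake",
    "earthquake this morning",
]
terms = bursty_terms(window, background)
```

Recomputing the bursty set over successive windows would let a long-running event's vocabulary shift over its lifetime, as the last bullet describes.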


+ Time Decay

- Adding a time decay function to the clustering framework
  - penalizes clusters that have been inactive for a long time.
  - re-triggers events that have been inactive for some time if the similarity score without the time-decay factor is strong enough.
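The penalize-and-re-trigger behavior can be sketched as below. The exponential form, the 24-hour half-life, and both thresholds are hypothetical; the slides do not give the shape of the decay function.

```python
import math


def decayed_similarity(similarity, hours_inactive, half_life_hours=24.0):
    """Scale a raw similarity by an exponential decay in the cluster's inactivity time."""
    return similarity * math.exp(-math.log(2) * hours_inactive / half_life_hours)


def should_join(similarity, hours_inactive, threshold=0.5, retrigger_threshold=0.8):
    """Join if the decayed score passes, or re-trigger if the raw score is strong enough."""
    if decayed_similarity(similarity, hours_inactive) >= threshold:
        return True
    return similarity >= retrigger_threshold


fresh = should_join(0.6, hours_inactive=0)       # active cluster, moderate match
stale = should_join(0.6, hours_inactive=48)      # inactive cluster, moderate match
retriggered = should_join(0.9, hours_inactive=48)  # inactive cluster, strong match
```

With these assumed values, a moderate match joins an active cluster but not one that has been idle for two days, while a strong match re-triggers the idle cluster.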

+ Experimental Evaluation

- Data
  - Upcoming dataset
  - 273,842 multi-featured Flickr photos that correspond to 9,613 real-world events from Upcoming
- Result: the BurstyV + TimeDec technique obtained the highest-quality results.

+ Conclusion

- This article discussed the event identification task under two different scenarios, known-event and unknown-event identification.
- We showed how to identify event content effectively
  - by exploiting rich features of the social media documents
  - by revealing temporal patterns of the relevant content

+ Thought

+ Thanks for listening.

2014/4/27 (Mon.) @ MakeLab Group