Finding News Curators in Twitter

Janette Lehmann, Carlos Castillo, Mounia Lalmas, Ethan Zuckerman Finding News Curators in Twitter

Upload: janette-lehmann

Post on 18-Dec-2014


DESCRIPTION

Users interact with online news in many ways, one of them being sharing content through online social networking sites such as Twitter. There is a small but important group of users who devote a substantial amount of effort and care to this activity. These users monitor a large variety of sources on a topic or around a story, carefully select interesting material on this topic, and disseminate it to an interested audience ranging from thousands to millions. These users are news curators, and they are the main subject of study of this paper. We adopt the perspective of a journalist or news editor who wants to discover news curators among the audience engaged with a news site. We look at the users who shared a news story on Twitter and attempt to identify news curators who may provide more information related to that story. In this paper we describe how to find this specific class of curators, which we refer to as news story curators. To do so, we compute a set of features for each user and demonstrate that they can be used to automatically find relevant curators among the audience of two large news organizations. This presentation is part of the SNOW workshop of the World-Wide-Web Conference, held in Rio de Janeiro, May 2013.

TRANSCRIPT

Page 1: Finding News Curators in Twitter

Janette Lehmann, Carlos Castillo, Mounia Lalmas, Ethan Zuckerman


Page 2: Finding News Curators in Twitter

Outline

• Motivation
• Types of curators
• Labeling news story curators
• Automatically finding news story curators
• Conclusion and future work

Photo credit (first slide): Hobvias Sudoneighm (CC-BY).

Page 3: Finding News Curators in Twitter

Motivation

• Twitter has become a powerful tool for the aggregation and consumption of time-sensitive content in general and news in particular.
• Journalists use online social media platforms (Twitter, Facebook and others) and blogs to elicit other story angles or verify stories they are working on.

To what extent can the community of engaged readers (those who share news articles in social media) contribute to the journalistic process?

What kinds of roles do people play when sharing news?

We want to detect users who provide further relevant information on a news story. We call them news story curators.

Page 4: Finding News Curators in Twitter

Example

Al Jazeera English news article about the civil war in Syria: “Syria allows UN to step up food aid” [16 Jan 2013]

Users that posted the article in Twitter:

User               #Followers   Is tweeting about
@RevolutionSyria   88,122       Syria
@KenanFreeSyria    13,388       Syria
@UP_food           703          Food
@KeriJSmith        8,838        Breaking news/top stories
@BreakingNews      5,662,866    Breaking news/top stories

Whom would you follow to find out more about the civil war in Syria?

Page 5: Finding News Curators in Twitter

Types of news story curators

Human, topic-unfocused: Topic-unfocused curator. Disseminating news articles about diverse topics, usually breaking news/top stories → @KeriJSmith

Automatic, topic-unfocused: News aggregators. Collecting news articles (e.g. from RSS feeds) and automatically posting their headlines and URLs → @BreakingNews

Human, topic-focused: Topic-focused curator. Collecting interesting information with a specific focus, usually a geographic region or a topic → @KenanFreeSyria

Automatic, topic-focused: Topic-focused aggregators. Automatically disseminating news with a topical focus → @UP_food, @RevolutionSyria

Page 6: Finding News Curators in Twitter

Types of news story curators

Topic-focused (human and automatic): valuable curators for a specific story

• Topic-focused curator (human): Collecting interesting information with a specific focus, usually a geographic region or a topic → @KenanFreeSyria
• Topic-focused aggregators (automatic): Automatically disseminating news with a topical focus → @UP_food, @RevolutionSyria

Topic-unfocused (human and automatic): these curators are probably less valuable or not valuable at all
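The two-by-two typology on these slides can be sketched as a small lookup. This is only an illustration: the function names and the boolean inputs are assumptions made for the example, while the four type names and the "valuable for a specific story" rule come from the slides.

```python
def curator_type(is_human: bool, is_topic_focused: bool) -> str:
    """Map the two dimensions of the typology to the type names used on the slides."""
    if is_topic_focused:
        return "topic-focused curator" if is_human else "topic-focused aggregator"
    return "topic-unfocused curator" if is_human else "news aggregator"


def is_valuable_for_story(is_topic_focused: bool) -> bool:
    """Per the slides, topic-focused accounts (human or automatic) are the valuable ones."""
    return is_topic_focused


# The example accounts from the slides, written as (is_human, is_topic_focused).
examples = {
    "@KeriJSmith": (True, False),
    "@BreakingNews": (False, False),
    "@KenanFreeSyria": (True, True),
    "@UP_food": (False, True),
    "@RevolutionSyria": (False, True),
}

for account, (human, focused) in examples.items():
    print(account, "->", curator_type(human, focused),
          "| valuable for the story:", is_valuable_for_story(focused))
```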

Page 7: Finding News Curators in Twitter

Data sets

Step 1: Selection of news articles
• News articles published in early 2013 from
  ◦ BBC World Service [BBC]: 75 articles
  ◦ Al Jazeera English [AJE]: 155 articles
• Stories: Obama's inauguration, Mali conflict, pollution in Beijing, etc.

Step 2: News crowd detection
• All users who tweeted the article within the first 6 hours after publication

Step 3: User characteristics
• Extraction of data from each user in the news crowd (e.g. further tweets, profile information)
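Step 2 can be illustrated with a minimal sketch. It assumes the tweets linking to an article are already available as `(user, posted_at)` pairs; the function name `news_crowd`, the data layout and the sample values are hypothetical, and only the 6-hour window comes from the slide.

```python
from datetime import datetime, timedelta


def news_crowd(tweets, published_at, window_hours=6):
    """Return the set of users who tweeted the article within the window.

    tweets: iterable of (user, posted_at) pairs for tweets linking to the article
    (an assumed input format, not the authors' data schema).
    """
    deadline = published_at + timedelta(hours=window_hours)
    return {user for user, posted_at in tweets
            if published_at <= posted_at <= deadline}


# Hypothetical timestamps for illustration only.
published = datetime(2013, 1, 16, 9, 0)
sample_tweets = [
    ("@KenanFreeSyria", datetime(2013, 1, 16, 9, 30)),
    ("@UP_food", datetime(2013, 1, 16, 14, 59)),
    ("@late_reader", datetime(2013, 1, 16, 15, 1)),  # outside the 6-hour window
]
print(sorted(news_crowd(sample_tweets, published)))
```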

Page 8: Finding News Curators in Twitter

Labeling News Story Curators

Photo credit: Thomas Leuthard (CC BY).

Page 9: Finding News Curators in Twitter

Labeling tasks

Data
• Sample of 20 news articles
• For each news article, a sample of 10 users who posted the article
• We showed three assessors:
  ◦ The title of the news article and a sample of the user's tweets
  ◦ The profile description and the number of followers of the user

Labeling questions

Q1) Please indicate whether the user is interested in, or an expert on, the topic of the article's story:
• Yes: Most of her/his tweets relate to the topic of the story (e.g. the article is about the conflict in Syria, she/he is often tweeting about the conflict in Syria).
• Maybe: Many of her/his tweets relate to the topic of the story or she/he is interested in a related topic (e.g. the article is about the conflict in Syria, she/he is tweeting about armed conflicts or the Arabic world).
• No: She/he is not tweeting about the topic of the story.
• Unknown: Based on the information of the user it was not possible to label her/him.

Q2) Please indicate whether the user is a human or generates tweets automatically:
• Human: The user has conversations and personal comments in his or her tweets. The text of tweets that have URLs (e.g. to news articles) seems self-written and contains the user's own opinions.
• Maybe automatic: The Twitter user has characteristics of an automatic profile, but she/he could be human as well.
• Automatic: The tweet stream of the user looks automatically generated. The tweets contain only headlines and URLs of news articles.
• Unknown: Based on the information of the user it was not possible to label her/him as human or automatic.

Page 10: Finding News Curators in Twitter

Resulting training set

      Interested? (topic-focused)   Human or automatic?           Interested + human
      n       yes     no            n       human   automatic
AJE   63      21%     79%           71      55%     45%           13%
BBC   58      3%      54%           54      35%     65%           1.8%

Many users are topic-unfocused and automatic.

We considered only users for which at least two annotators provided a decisive label (Yes or No, Human or Automatic).
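The filtering rule for the training set can be sketched as follows. The slide only states that at least two annotators had to provide a decisive label; treating the majority among the decisive labels as the final label, and dropping ties, are assumptions of this sketch, as are the function name and the lowercase label strings.

```python
from collections import Counter


def decisive_label(labels, decisive, min_decisive=2):
    """Return the agreed decisive label for one user, or None if the user is dropped.

    labels: the labels given by the assessors, e.g. ["yes", "maybe", "yes"]
    decisive: the labels that count as decisive, e.g. {"yes", "no"}
    """
    votes = Counter(label for label in labels if label in decisive)
    if sum(votes.values()) < min_decisive:
        return None  # fewer than two decisive labels: user is not used
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # assumption: a tie between decisive labels is also dropped
    return ranked[0][0]


print(decisive_label(["yes", "maybe", "yes"], {"yes", "no"}))
print(decisive_label(["human", "automatic", "automatic"], {"human", "automatic"}))
print(decisive_label(["yes", "maybe", "unknown"], {"yes", "no"}))
```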

Page 11: Finding News Curators in Twitter

Automatically finding News Story Curators

Photo credit: Mads Iversen (CC BY-NC-SA).

Page 12: Finding News Curators in Twitter

Features

Visibility
• Number of followers
• Number of Twitter lists that include the user

Tweeting activity
• Number of tweets per day
• Fraction of tweets that contain a re-tweet mark "RT", a URL, a user mention or a hashtag

Topic focus
• Number of crowds the user belongs to
• Number of distinct article sections (e.g. sports, business) of the crowds the user belongs to
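The tweeting-activity features can be approximated from raw tweet text. The sketch below is an assumption-laden illustration rather than the authors' feature extractor: the regular expressions, the function name `tweeting_activity` and the output keys are all invented for the example.

```python
import re

# Simple patterns for the four markers named on the slide (assumed definitions).
RT_RE = re.compile(r"(?<!\w)RT\b")
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"(?<!\w)@\w+")
HASHTAG_RE = re.compile(r"(?<!\w)#\w+")


def tweeting_activity(tweets, days_observed):
    """Compute tweets per day and the fraction of tweets containing each marker."""
    if days_observed <= 0:
        raise ValueError("days_observed must be positive")
    n = len(tweets)

    def fraction(pattern):
        return sum(1 for text in tweets if pattern.search(text)) / n if n else 0.0

    return {
        "tweets_per_day": n / days_observed,
        "frac_rt": fraction(RT_RE),
        "frac_url": fraction(URL_RE),
        "frac_mention": fraction(MENTION_RE),
        "frac_hashtag": fraction(HASHTAG_RE),
    }


# Hypothetical tweets for illustration.
sample = [
    "RT @bbc: Mali update http://t.co/x",
    "Good morning!",
    "Reading about #Syria",
    "http://t.co/y Syria allows UN to step up food aid",
]
features = tweeting_activity(sample, days_observed=2)
print(features)
```

A stream in which nearly every tweet carries a URL would score close to 1.0 on `frac_url`, which is the kind of signal the simple UserIsHuman rule later in the deck relies on.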

Page 13: Finding News Curators in Twitter

Simple models

UserIsHuman
• Model: UserFracURL >= 0.85: automatic, otherwise human
• Evaluation: Human class: Prec/Rec: 0.85, AUC: 0.81

UserIsInterestedInStory
• Model: UserSectionsQ >= 0.9: not-interested, otherwise interested
• Evaluation: Interested class: Prec: 0.48 / Rec: 0.93, AUC: 0.83

Method: feature selection (one feature) + random forest algorithm

Preselection: the user must have
• At least 1,000 followers
• Posted an article that is estimated to be related to the original article [1]

[1] J. Lehmann, C. Castillo, M. Lalmas, and E. Zuckerman. Transient news crowds in social media. In ICWSM, 2013.
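The two single-feature rules and the preselection step translate directly into threshold checks. In this sketch the thresholds are the ones on the slide, while the function names and argument names are assumptions; `UserSectionsQ` is not defined in the slides, so it is treated here as a precomputed value between 0 and 1.

```python
def user_is_human(user_frac_url, threshold=0.85):
    """UserIsHuman rule: a URL fraction at or above the threshold means automatic."""
    return user_frac_url < threshold


def user_is_interested_in_story(user_sections_q, threshold=0.9):
    """UserIsInterestedInStory rule: a value at or above the threshold means not interested."""
    return user_sections_q < threshold


def passes_preselection(followers, posted_related_article, min_followers=1000):
    """Preselection: enough followers and at least one related article posted."""
    return followers >= min_followers and bool(posted_related_article)


# Hypothetical feature values for two users.
print(user_is_human(0.40), user_is_interested_in_story(0.30), passes_preselection(13388, True))
print(user_is_human(0.97), user_is_interested_in_story(0.95), passes_preselection(703, True))
```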

Page 14: Finding News Curators in Twitter

Complex models

                 Precision   Recall   AUC
Automatic        0.88        0.84     0.93
Human            0.82        0.86     0.93
Interested       0.95        0.92     0.90
Not-interested   0.53        0.67     0.90

• Automatic/Human: random forest with information-gain-based feature selection
• Interested/Not-interested: random forest with asymmetric misclassification costs; false negatives (classifying an interested user as not interested) were considered 5 times more costly than false positives
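One way to see what a 5:1 cost ratio does is to apply it at decision time to a predicted probability. This is a generic minimum-expected-cost rule, shown only as an illustration; the slides do not say how the costs were fed to the random forest, and the function name and inputs are assumptions.

```python
def min_cost_label(p_interested, fn_cost=5.0, fp_cost=1.0):
    """Pick the label with the lower expected misclassification cost.

    Predicting "not-interested" risks a false negative: expected cost p * fn_cost.
    Predicting "interested" risks a false positive: expected cost (1 - p) * fp_cost.
    """
    cost_if_not_interested = p_interested * fn_cost
    cost_if_interested = (1.0 - p_interested) * fp_cost
    if cost_if_interested <= cost_if_not_interested:
        return "interested"
    return "not-interested"


# With a 5:1 ratio the break-even probability is 1 / (1 + 5), about 0.17,
# so even fairly unlikely users are kept as "interested".
for p in (0.05, 0.10, 0.20, 0.50):
    print(p, min_cost_label(p))
```

With equal costs the same rule reduces to the usual 0.5 cut-off, which makes the effect of the asymmetry easy to compare.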

Page 15: Finding News Curators in Twitter

Precision-oriented evaluation

We compared our method with two baseline approaches:
• Users with the largest number of followers [FOLLOWER-APPROACH]
• Users with the largest number of stories detected as related to the original one [STORY-APPROACH]

Data
• Sample of 20 news articles that had at least one curator, detected using the complex model with a confidence value >= 0.75
• We extracted for each article the same number of possible curators using the other two approaches
• We asked three assessors to evaluate the results (question Q1 – UserIsInterestedInStory)
• About 210 labels for 70 units were collected

Results (true positive/false positive)
• FOLLOWER-APPROACH: 2/18 = 11%
• STORY-APPROACH: 5/20 = 25%
• OUR APPROACH: 6/16 = 38%
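The reported percentages can be reproduced with a few lines of arithmetic. The sketch assumes the figures are counts of true positives and false positives, as the slide's label says; under that reading the three approaches together cover 20 + 25 + 22 = 67 labeled units, in line with the roughly 70 units mentioned. The precision column, TP / (TP + FP), is an addition for comparison and does not appear on the slide.

```python
# (true positives, false positives) per approach, read from the slide.
results = {
    "FOLLOWER-APPROACH": (2, 18),
    "STORY-APPROACH": (5, 20),
    "OUR APPROACH": (6, 16),
}


def tp_fp_ratio(tp, fp):
    """The ratio reported on the slide."""
    return tp / fp


def precision(tp, fp):
    """Standard precision, shown for comparison (not reported on the slide)."""
    return tp / (tp + fp)


total_units = sum(tp + fp for tp, fp in results.values())
print("units:", total_units)
for name, (tp, fp) in results.items():
    print(f"{name}: TP/FP = {tp_fp_ratio(tp, fp):.3f}, precision = {precision(tp, fp):.3f}")
```

Either way of reading the numbers preserves the ordering: the proposed approach comes out ahead of the story-based baseline, which in turn beats the follower-based one.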

Page 16: Finding News Curators in Twitter

Conclusion and future work

We were able to detect and model news story curators, who could play (and perhaps already play) an important role in the news ecosystem, not only for news readers but also for journalists and editors.

• A large amount of activity on Twitter is automatic, and some of these news aggregators can be considered good curators
• In most cases the user's attention shifts away quickly: posting a link does not necessarily reflect a long-standing interest in the subject of the link

Future work
• Adding other (Twitter) variables to the system that capture, for instance, interestingness and serendipity
• Application to other news providers
• Analysis of the functionality of popular news aggregators, which are comparable to RSS feeds

Page 17: Finding News Curators in Twitter

Questions and Discussion…

Janette Lehmann, Universitat Pompeu Fabra
Carlos Castillo, Qatar Computing Research Institute
Mounia Lalmas, Yahoo! Labs
Ethan Zuckerman, MIT Center for Civic Media

Photo credit: Wayne Large (CC BY-ND).

Photo credits: Hobvias Sudoneighm (CC BY), Thomas Leuthard (CC BY), Mads Iversen (CC BY-NC-SA), Wayne Large (CC BY-ND)