leveraging microblogs for resource ranking

22
Leveraging Microblogs for Resource Ranking Tomáš Majer, Marián Šimko [email protected], [email protected] 23.01.2012, SOFSEM 2012 Institute of Informatics and Software Engineering Faculty of Informatics and Information Technologies Slovak University of Technology in Bratislava

Upload: tomas-majer

Post on 12-Nov-2014

1.806 views

Category:

Technology


1 download

DESCRIPTION

 

TRANSCRIPT

Page 1: Leveraging microblogs for resource ranking

Leveraging Microblogsfor Resource Ranking

Tomáš Majer, Marián Š[email protected], [email protected]

23.01.2012, SOFSEM 2012

Institute of Informatics and Software EngineeringFaculty of Informatics and Information TechnologiesSlovak University of Technology in Bratislava

Page 2: Leveraging microblogs for resource ranking

Microblog

• a brief form of a blog with limited size of a post, typically 140 characters of length

• sharing experiences, opinions, comments, links• Twitter, Identi.ca, Jaiku, (Google+, Facebook)

• Twitter:▫ over 300 millions of users (100 mil. at the time of writing paper)

▫ over 300 millions of tweets/day (100 mil. --||--)

Page 3: Leveraging microblogs for resource ranking

Microblog – why important?

• „Read-Write Web“ vision• user-generated data

▫ unbiased▫ not moderated

• huge amount of data – text, links• valuable source of information

▫ data mining

Page 4: Leveraging microblogs for resource ranking

Microblog – why important?

• „Read-Write Web“ vision• user-generated data

▫ unbiased▫ not moderated

• huge amount of data – text, links• valuable source of information

▫ data mining

22 % of posts

Page 5: Leveraging microblogs for resource ranking

State-of-the-Art

• topic identification/keyword extraction (Ramage et al., 2010)

• opinion mining (Pendey, Iyer, 2009)

• twitter search, tweets ranking (Teevan et al., 2011)

• user ranking (Gayo-Avello, 2010)

• user modeling (Abel et al., 2011)

• …• challenge: resource ranking

Page 6: Leveraging microblogs for resource ranking

Twitter Graph

U1

U2

U3

T1

T2

T4

T3

T7

T6

T5

R1

R2

R4

R5

R3

Users Tweets Resources

Page 7: Leveraging microblogs for resource ranking

Twitter Graph

U1

U2

U3

T1

T2

T4

T3

T7

T6

T5

R1

R2

R4

R5

R3

Users Tweets Resources

posts

contains

follows re-posts

Page 8: Leveraging microblogs for resource ranking

Resource Ranking: Overview

• TweetRank▫ ranking of a resource based on Twitter graph analysis

• Computation1. UserRank

2. TweetRelevance

3. TweetRank

Page 9: Leveraging microblogs for resource ranking

Resource Ranking: Overview

• TweetRank▫ ranking of a resource based on Twitter graph analysis

• Computation1. UserRank

2. TweetRelevance

3. TweetRank

U1

U2

U3

T1

T2

T4

T3

T7

T6

T5

R1

R2

R4

R5

R3

Page 10: Leveraging microblogs for resource ranking

UserRank

UserRank (u) = ∑f ∈ followers(u)

1+γ(u)UserRank ( f )∣ followers( f )∣

Page 11: Leveraging microblogs for resource ranking

UserRank

UserRank (u) = ∑f ∈ followers(u)

1+γ(u)UserRank ( f )∣ followers( f )∣

γ(u)=∣ followers(u)∣∣tweets(u)∣

Page 12: Leveraging microblogs for resource ranking

TweetRelevance, TweetRank

TweetRelevance (t) =UserRank (Author (t))∣tweets(Author (t ))∣

Page 13: Leveraging microblogs for resource ranking

TweetRelevance, TweetRank

TweetRelevance (t) =UserRank (Author (t))∣tweets(Author (t ))∣

= TR(t)

TweetRank (r) = ∑t∈tweets(r ) (TR(t)+ ∑

rt∈retweets(t)TR(t)TR(rt))

Page 14: Leveraging microblogs for resource ranking

Evaluation

1. TweetRank ranking vs. explicit user ranking (YouTube )

2. Search results ranking study (Search)

• Data:▫ 1,997,466 tweets from 367,824 users 85 % in English

▫ 1,468,365,182 connections between 40,103,281 users▫ 1,150,168 unique web links 3 % of them: YouTube

Page 15: Leveraging microblogs for resource ranking

Computing TweetRank

• TweetRank computed for each resource▫ power-law distribution

TweetRank intervals

# of

reso

urce

s

Page 16: Leveraging microblogs for resource ranking

Experiment YouTube

• YouTube – explicit user rating (Y1Rank)▫ positive/negative vote, normalized

• TweetRank vs. Y1Rank▫ correlation coefficient

r = 0.02

Twee

tRan

k, Y

1Ran

k

YouTube videos

Page 17: Leveraging microblogs for resource ranking

Experiment YouTube 2

• YouTube – application-collected user rating (Y2Rank)▫ „how do you like the video“?▫ 5 degree scale (1-best, 5-worst), 70 participants

• TweetRank vs. Y2Rank▫ Kendall rank correlation coefficient τ= 0.125

• Relative video ranking

Page 18: Leveraging microblogs for resource ranking

Experiment YouTube 2▫ 5-tuplets

Page 19: Leveraging microblogs for resource ranking

Experiment YouTube 2▫ pairs

Page 20: Leveraging microblogs for resource ranking

Experiment Search

• 20,000 resources• indexing: SOLR (Apache Lucene)• searching: resource ranking extended with TweetRank

• search results manual comparison▫ 20 randomly selected queries (e.g.: „apple“)▫ analyzed top-k results

Page 21: Leveraging microblogs for resource ranking

Experiment Search

• findings:▫ in general, „newer“ resources rank better▫ ranking does not reflect chronological ordering of

resources

• suitable for sorting search results within a predefined time window

Page 22: Leveraging microblogs for resource ranking

Conclusions

• microblog▫ perspective source of data, information, knowledge

• we proposed novel method for resource ranking leveraging microblog network

• an important additional knowledge from the crowd▫ a form of indirect explicit user rating

• great potential for search improvement▫ reflects temporal characteristics (not linearly)▫ sorting results within a predefined time window