a comparative study of hits vs pagerank algorithms for twitter users analysis
Post on 11-Apr-2017
A Comparative Study of HITS vs PageRank Algorithms for Twitter Users Analysis
Ong Kok Chien, Poo Kuan Hoong and Chiung Ching Ho
Faculty of Computing Informatics, Multimedia University Cyberjaya.
Introduction
Graph analysis algorithms applied to social networks.
Identify noteworthy Twitter users for a specific set of topics.
Twitter: a microblogging site with a maximum of 140 characters per Tweet.
“A Tweet is an expression of a moment or idea. It can contain text, photos, and videos. Millions of Tweets are shared in real time, every day.”
Reply
Retweet
Favorite
Hashtags
https://about.twitter.com/what-is-twitter/story-of-a-tweet
Problem Statement
Why is it important to rank users?
By ranking users, we aim to differentiate relevant, important information sources from spam accounts.
Objectives
To rank Twitter users using HITS and PageRank.
To identify the direction of edges for the graph.
Methods
Link-based ranking algorithms (HITS & PageRank)
Twitter Users as Nodes.
Retweet relationships as Edges.
Direction of graph.
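A minimal sketch of how such a graph could be built. The function name, the sample users, and the edge direction (retweeter pointing to the retweeted user) are assumptions made for illustration; the slides leave the direction as a question to be settled.

```python
from collections import defaultdict


def build_retweet_graph(retweets):
    """Build a directed, weighted graph from (retweeter, author) pairs.

    Assumption: an edge points from the retweeting user to the user being
    retweeted, so a retweet behaves like a backlink to the original content.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for retweeter, author in retweets:
        if retweeter == author:
            continue  # ignore self-retweets
        graph[retweeter][author] += 1  # edge weight = number of retweets
        graph[author]  # make sure the retweeted user exists as a node
    return {node: dict(targets) for node, targets in graph.items()}


# Hypothetical sample data, not taken from the study's dataset.
sample = [("user_b", "user_a"), ("user_c", "user_a"), ("user_c", "user_b")]
graph = build_retweet_graph(sample)
```

Reversing each pair before calling the function gives the opposite edge direction, which makes it easy to compare both choices.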
Example
PageRank (PR)
E.g.: backlinks in websites, referring back to the original content.
- Sergey Brin & Larry Page (1998). The anatomy of a large-scale hypertextual Web search engine.
Image extracted from Wikipedia
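The idea can be sketched as a simple power iteration. This is an illustrative version, not the implementation used in the study; the damping factor of 0.85, the iteration count, and the sample graph are assumptions.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank.

    graph maps each node to a dict {target: weight}; every target must
    also appear as a key (with an empty dict if it has no outgoing edges).
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node starts each round with the "random jump" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            total = sum(targets.values())
            if total == 0:
                # Dangling node: spread its rank evenly over all nodes.
                for other in nodes:
                    new_rank[other] += damping * rank[node] / n
            else:
                # Pass rank along outgoing edges in proportion to weight.
                for target, weight in targets.items():
                    new_rank[target] += damping * rank[node] * weight / total
        rank = new_rank
    return rank


# Hypothetical graph: user_b and user_c point to user_a, user_c also to user_b.
sample_graph = {"user_a": {}, "user_b": {"user_a": 1}, "user_c": {"user_a": 1, "user_b": 1}}
ranks = pagerank(sample_graph)
top_user = max(ranks, key=ranks.get)
```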
Example
Hyperlink-Induced Topic Search (HITS)
Hubs: catalogs of relevant content.
Authorities: the great content itself.
- Jon Kleinberg (1999). Authoritative sources in a hyperlinked environment.
Image extracted from cornell.edu
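A minimal sketch of the mutual hub/authority update, assuming the same node-to-targets graph layout as a plain dictionary. The iteration count and the sample graph are assumptions for illustration only.

```python
import math


def hits(graph, iterations=50):
    """Iterative HITS; returns (hub_scores, authority_scores).

    graph maps each node to a dict {target: weight}; every target must
    also appear as a key.
    """
    nodes = list(graph)
    hub = {node: 1.0 for node in nodes}
    auth = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the nodes pointing to it.
        auth = {node: 0.0 for node in nodes}
        for node, targets in graph.items():
            for target, weight in targets.items():
                auth[target] += weight * hub[node]
        # Hub score: sum of authority scores of the nodes it points to.
        hub = {
            node: sum(w * auth[t] for t, w in graph[node].items())
            for node in nodes
        }
        # Normalise both vectors so the scores stay bounded.
        for scores in (auth, hub):
            norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
            for node in scores:
                scores[node] /= norm
    return hub, auth


# Hypothetical graph: user_b and user_c point to user_a, user_c also to user_b.
sample_graph = {"user_a": {}, "user_b": {"user_a": 1}, "user_c": {"user_a": 1, "user_b": 1}}
hub_scores, auth_scores = hits(sample_graph)
```

In this toy graph user_a ends up as the strongest authority and user_c as the strongest hub.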
Example
[Diagram: a Tweet by Khairykj (Minister of Youth & Sports), with RT-ed (retweet) links involving shatyrah2 and AyenSanji]
https://twitter.com/Khairykj/status/410964119521460224
Keywords
TeamMsia every1connects tmrewards yellowpages_my tmsmebiz MaxisComms MaxisListens DiGi_Telco DiGi_Youths helloUMobile
HyppTV Streamyx UMobile Digi Maxis Yes4G Celcom xpaxsays TMCorp TMConnects
Basic Statistics
Raw Dataset
Total Tweets: 230,166
Total Unique Users: 121,461 (screen_name)
Total Verified Users: 113 (verified)
Average Followers Count: 983 (followers_count)
Experiment Dataset (Retweets)
No. of Tweets: 56,727
No. of Unique Users: 50,636
9th Dec 2013 -> 29th Dec 2013
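One way the retweet subset could be derived from the raw Tweets is sketched below. It assumes each Tweet is a dictionary shaped like a Twitter REST API v1.1 object, where a retweet carries a `retweeted_status` entry; the slides do not state the storage format, and the sample records are hypothetical.

```python
def extract_retweets(tweets):
    """Return (retweeter, original_author) screen-name pairs.

    Assumption: a retweet is recognised by the presence of a
    'retweeted_status' object holding the original Tweet.
    """
    pairs = []
    for tweet in tweets:
        original = tweet.get("retweeted_status")
        if original is None:
            continue  # not a retweet, skip it
        pairs.append(
            (tweet["user"]["screen_name"], original["user"]["screen_name"])
        )
    return pairs


# Hypothetical records, not taken from the study's dataset.
tweets = [
    {"user": {"screen_name": "user_b"},
     "retweeted_status": {"user": {"screen_name": "user_a"}}},
    {"user": {"screen_name": "user_a"}},
    {"user": {"screen_name": "user_c"},
     "retweeted_status": {"user": {"screen_name": "user_a"}}},
]
pairs = extract_retweets(tweets)
unique_users = {name for pair in pairs for name in pair}
```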
Results
Ranking  PageRank          HITS
1        TeamMsia          TeamMsia
2        ManOlimpik        Khairykj
3        Khairykj          ManOlimpik
4        OKS_HARIMAUMUDA   OKS_HARIMAUMUDA
5        BrooksBeau        FIH_Hockey
6        TMCorp            BB_Johor
7        LawakLegend       AtletMalaysia
8        WTFSG             TMCorp
9        JanganPanas       Faif_D
10       FIH_Hockey        BBST15
60% of the Top 10 users appear in both rankings (70% for the Top 20).
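The Top 10 overlap can be checked directly from the two rankings reported on this slide. The helper name is illustrative; the user lists are the PageRank and HITS Top 10 as given in the results.

```python
def top_k_overlap(ranking_a, ranking_b, k):
    """Fraction of users shared by the first k entries of two rankings."""
    shared = set(ranking_a[:k]) & set(ranking_b[:k])
    return len(shared) / k


pagerank_top10 = [
    "TeamMsia", "ManOlimpik", "Khairykj", "OKS_HARIMAUMUDA", "BrooksBeau",
    "TMCorp", "LawakLegend", "WTFSG", "JanganPanas", "FIH_Hockey",
]
hits_top10 = [
    "TeamMsia", "Khairykj", "ManOlimpik", "OKS_HARIMAUMUDA", "FIH_Hockey",
    "BB_Johor", "AtletMalaysia", "TMCorp", "Faif_D", "BBST15",
]

overlap = top_k_overlap(pagerank_top10, hits_top10, 10)
print(overlap)  # 0.6
```

The measure ignores position within the Top 10, so two lists with the same users in a different order still count as a full overlap.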
Results
User Screen Name   Verified   Follower Count
TeamMsia           False      98,469
ManOlimpik         False      1,661
Khairykj           True       432,259
OKS_HARIMAUMUDA    False      35,058
BrooksBeau         False      1,226,629
TMCorp             False      13,767
LawakLegend        False      49,593
WTFSG              False      469,909
JanganPanas        False      14,642
FIH_Hockey         False      24,628
The number of followers a Twitter user has doesn't directly affect the number of retweets.
A verified account is not very important for getting retweeted.
Results
“Football at the 2013 Southeast Asian Games”
9th Dec 2013 -> 29th Dec 2013
Summary
The use of link-based ranking algorithms such as PageRank and HITS promises some insight into Twitter users and their significance.
These insights can be useful for Customer Care / Churn Management.
Future Work
Consider additional relationships (conversational replies, pure mentions).
Further validate additional attributes (Verified, Tweet Count, Followers Count, Following Count, etc.).
Extend the analysis deeper, to the Tweet level.
Questions?
Contact Me
Ong Kok Chien
ahchienong@gmail.com
http://qrs.ly/2t49r7l vCard Download link
Thanks…