A Comparative Study of HITS vs PageRank Algorithms for Twitter Users Analysis


Upload: poo-kuan-hoong

Post on 11-Apr-2017


1

A Comparative Study of HITS vs PageRank Algorithms for Twitter Users Analysis

Ong Kok Chien, Poo Kuan Hoong and Chiung Ching Ho

Faculty of Computing and Informatics, Multimedia University, Cyberjaya.

2

Outline

Introduction

Problem Statement

Objective

Methods

Results

Summary

3

Introduction

Graph analysis algorithms applied to social networks.

Identify noteworthy Twitter users for a specific set of topics.

4

Twitter

Microblogging site with a maximum of 140 characters per Tweet.

“A Tweet is an expression of a moment or idea. It can contain text, photos, and videos. Millions of Tweets are shared in real time, every day.”

Reply

Retweet

Favorite

Hashtags

https://about.twitter.com/what-is-twitter/story-of-a-tweet


5

Problem Statement

Why is it important to rank users?

By ranking users, we aim to differentiate relevant, important information sources from spam accounts.

6

Objectives

To rank Twitter users using HITS and PageRank.

To identify the direction of edges for the graph.

7

Methods

Link-based ranking algorithms (HITS & PageRank)

Twitter Users as Nodes.

Retweet relationships as Edges.

Direction of graph.
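The graph model above can be sketched in a few lines. This is a minimal illustration, not the authors' code: the edge direction (retweeter pointing to the original author, by analogy with a backlink) is an assumption, since choosing the direction is one of the stated objectives, and the user names are hypothetical.

```python
def build_retweet_graph(retweets):
    """Build a directed graph from (retweeter, original_author) pairs.

    Assumption: an edge points from the retweeter to the original author.
    Every user appears as a key, even if they never retweet anyone.
    """
    graph = {}
    for retweeter, author in retweets:
        graph.setdefault(retweeter, set())
        graph.setdefault(author, set())
        if retweeter != author:  # ignore self-retweets
            graph[retweeter].add(author)
    return graph


# Hypothetical retweet pairs, for illustration only.
sample = [("user_a", "user_c"), ("user_b", "user_c"), ("user_a", "user_b")]
graph = build_retweet_graph(sample)

# In-degree = number of distinct users who retweeted this user.
in_degree = {u: sum(u in targets for targets in graph.values()) for u in graph}
print(in_degree)
```

Reversing the tuple order in `sample` gives the opposite edge direction, which makes it easy to compare both conventions.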

8

Example

PageRank (PR)

E.g.: backlinks in websites, referring back to the original content.

- Sergey Brin & Larry Page (1998). The anatomy of a large-scale hypertextual Web search engine.

Image extracted from Wikipedia
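As a rough illustration of the idea, PageRank can be computed by power iteration. The sketch below is a textbook-style version, not the implementation used in this study; the damping factor of 0.85, the iteration count, and the three-user graph are assumptions chosen for the example.

```python
def pagerank(graph, damping=0.85, iterations=50):
    """Power-iteration PageRank on a dict {node: set_of_successors}.

    Every node must appear as a key. Rank held by nodes with no
    outgoing edges (dangling nodes) is spread evenly over all nodes.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        dangling = sum(rank[u] for u in nodes if not graph[u])
        new_rank = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u in nodes:
            for v in graph[u]:
                # Each node splits its rank equally among its successors.
                new_rank[v] += damping * rank[u] / len(graph[u])
        rank = new_rank
    return rank


# Hypothetical retweet graph: an edge u -> v means "u retweeted v".
graph = {
    "user_a": {"user_b", "user_c"},
    "user_b": {"user_c"},
    "user_c": set(),
}
scores = pagerank(graph)
for user in sorted(scores, key=scores.get, reverse=True):
    print(user, round(scores[user], 4))
```

With this edge direction, the most-retweeted user collects the most rank, much as a heavily backlinked page does on the Web.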

9

Example

Hyperlink-Induced Topic Search (HITS)

Hubs: catalogs of relevant content.

Authorities: the great content itself.

- Jon Kleinberg (1999). Authoritative sources in a hyperlinked environment.

Image extracted from cornell.edu
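The mutual reinforcement between hubs and authorities can be sketched as a short iteration. This is a generic version of the HITS update rule, not the authors' code; the iteration count, the L2 normalisation, and the three-user graph are assumptions for the example.

```python
import math


def hits(graph, iterations=50):
    """Iterative HITS on a dict {node: set_of_successors}.

    Returns (hub, authority) score dicts, each L2-normalised.
    Every node must appear as a key.
    """
    nodes = list(graph)
    hub = {u: 1.0 for u in nodes}
    for _ in range(iterations):
        # Authority score: sum of hub scores of the nodes pointing at it.
        auth = {u: 0.0 for u in nodes}
        for u in nodes:
            for v in graph[u]:
                auth[v] += hub[u]
        # Hub score: sum of authority scores of the nodes it points to.
        hub = {u: sum(auth[v] for v in graph[u]) for u in nodes}
        # Normalise so the scores stay bounded.
        a_norm = math.sqrt(sum(x * x for x in auth.values())) or 1.0
        h_norm = math.sqrt(sum(x * x for x in hub.values())) or 1.0
        auth = {u: x / a_norm for u, x in auth.items()}
        hub = {u: x / h_norm for u, x in hub.items()}
    return hub, auth


# Hypothetical retweet graph: an edge u -> v means "u retweeted v".
graph = {
    "user_a": {"user_b", "user_c"},
    "user_b": {"user_c"},
    "user_c": set(),
}
hub, auth = hits(graph)
print("best hub:", max(hub, key=hub.get))
print("best authority:", max(auth, key=auth.get))
```

Under this edge direction, users who retweet widely score as hubs and users who are retweeted by good hubs score as authorities.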

10

Example

[Figure: retweet example involving Khairykj (Minister of Youth & Sports), shatyrah2 and AyenSanji, connected by two "RT-ed" edges]

https://twitter.com/Khairykj/status/410964119521460224

11

Architecture

Twitter Streaming API

Configure Keywords

[Diagram: numbered steps 1 to 4; surviving labels: JSON raw data, HiveQL, Unix script]
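The slides indicate that the raw JSON was processed with HiveQL and a Unix script. As a stand-in for that step, the sketch below shows how retweet edges could be pulled out of line-delimited tweet JSON in Python. The field names `user.screen_name` and `retweeted_status` follow the classic Twitter API tweet payload; the sample records and the function name are hypothetical.

```python
import json


def extract_retweet_edges(lines):
    """Return (retweeter, original_author) pairs from line-delimited tweet JSON.

    Lines that are blank, malformed, or not retweets are skipped.
    """
    edges = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            tweet = json.loads(line)
        except json.JSONDecodeError:
            continue
        if not isinstance(tweet, dict):
            continue
        original = tweet.get("retweeted_status")
        if not isinstance(original, dict):
            continue  # not a retweet
        retweeter = (tweet.get("user") or {}).get("screen_name")
        author = (original.get("user") or {}).get("screen_name")
        if retweeter and author:
            edges.append((retweeter, author))
    return edges


# Hypothetical records, for illustration only.
sample_lines = [
    '{"user": {"screen_name": "user_a"}, "text": "plain tweet"}',
    '{"user": {"screen_name": "user_b"}, '
    '"retweeted_status": {"user": {"screen_name": "user_c"}}}',
    "",
    "not json",
]
edges = extract_retweet_edges(sample_lines)
print(edges)
```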

12

Keywords

TeamMsia every1connects tmrewards yellowpages_my tmsmebiz MaxisComms MaxisListens DiGi_Telco DiGi_Youths helloUMobile

HyppTV Streamyx UMobile Digi Maxis Yes4G Celcom xpaxsays TMCorp TMConnects

13

Basic Statistics

Raw Dataset

Total Tweets: 230,166

Total Unique Users: 121,461 (screen_name)

Total Verified Users: 113 (verified)

Average Followers Count: 983 (followers_count)

Experiment Dataset (Retweets)

No. of Tweets: 56,727

No. of Unique Users: 50,636

9th Dec 2013 -> 29th Dec 2013

14

Results

PageRank           Ranking   HITS
TeamMsia           1         TeamMsia
ManOlimpik         2         Khairykj
Khairykj           3         ManOlimpik
OKS_HARIMAUMUDA    4         OKS_HARIMAUMUDA
BrooksBeau         5         FIH_Hockey
TMCorp             6         BB_Johor
LawakLegend        7         AtletMalaysia
WTFSG              8         TMCorp
JanganPanas        9         Faif_D
FIH_Hockey         10        BBST15

60% of the top 10 users were the same in both rankings (70% for the top 20).
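The overlap figure can be checked with a small set intersection. The two lists below are the top-10 screen names as read from this slide; the helper name `top_k_overlap` is hypothetical, and the top-20 lists are not reproduced in the slides, so only the top-10 figure is checked.

```python
pagerank_top10 = [
    "TeamMsia", "ManOlimpik", "Khairykj", "OKS_HARIMAUMUDA", "BrooksBeau",
    "TMCorp", "LawakLegend", "WTFSG", "JanganPanas", "FIH_Hockey",
]
hits_top10 = [
    "TeamMsia", "Khairykj", "ManOlimpik", "OKS_HARIMAUMUDA", "FIH_Hockey",
    "BB_Johor", "AtletMalaysia", "TMCorp", "Faif_D", "BBST15",
]


def top_k_overlap(first, second, k):
    """Return (fraction, shared_users) for the top-k entries of two rankings."""
    shared = set(first[:k]) & set(second[:k])
    return len(shared) / k, shared


overlap, shared = top_k_overlap(pagerank_top10, hits_top10, 10)
print(f"{overlap:.0%} overlap:", sorted(shared))
```

The comparison ignores rank position: a user counts as shared if they appear anywhere in both top-10 lists.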

15

Results

User Screen Name    Verified   Follower Count
TeamMsia            False      98,469
ManOlimpik          False      1,661
Khairykj            True       432,259
OKS_HARIMAUMUDA     False      35,058
BrooksBeau          False      1,226,629
TMCorp              False      13,767
LawakLegend         False      49,593
WTFSG               False      469,909
JanganPanas         False      14,642
FIH_Hockey          False      24,628

The number of followers a Twitter user has does not directly affect the number of retweets.

A verified account is not very important for getting retweeted.

16

Results

“Football at the 2013 Southeast Asian Games”

9th Dec 2013 -> 29th Dec 2013

17

Results

A closer look at how TeamMsia was involved in the conversation.

18

Summary

Link-based ranking algorithms such as PageRank and HITS offer some insight into Twitter users and their significance.

These insights can be useful for customer care and churn management.

19

Future Work

Additional relationships to be considered. (Conversational Reply, Pure Mentions)

Further validation of additional attributes. (Verified, Tweet Count, Followers Count, Following Count, etc.)

Extend deeper into Tweet level analysis.

20

Questions?

Contact Me

Ong Kok Chien [email protected]

http://qrs.ly/2t49r7l vCard Download link

Thanks…