ENHANCING TWITTER SPAM DETECTION USING CROSS ACCOUNT PATTERN MATCHING. By Ambarish Pande


Post on 22-Jan-2018


Page 1: Enhancing Twitter spam discovery using cross account pattern matching

ENHANCING TWITTER SPAM DETECTION USING CROSS ACCOUNT PATTERN MATCHING.

By Ambarish Pande

Page 2: Enhancing Twitter spam discovery using cross account pattern matching

Contents

▸ Introduction
▸ Motivation
▸ Proposed Algorithm
▸ Implementation Details
▸ Advantages and Drawbacks
▸ Conclusion and Future Work

Page 3: Enhancing Twitter spam discovery using cross account pattern matching

Introduction

▸ Emerging Social Networks
▹ Popularity of Facebook and Twitter
▹ 1,550 million active Facebook users
▹ 320 million active Twitter users
▹ Global reach
▹ Multi-platform

▸ Social Networks' Revenue Model
▹ Advertising
▹ 85% of Twitter's revenue comes from advertising

Page 4: Enhancing Twitter spam discovery using cross account pattern matching

Motivation

▸ The Problem
▹ Social networks like Twitter provide a legal way of publicizing content.
▹ Some companies resort to illegal methods such as spam accounts.
▹ Huge revenue loss to Twitter

10,000,00 $ / Yr
Millions of dollars per year. That's a lot of money!

Page 5: Enhancing Twitter spam discovery using cross account pattern matching

Motivation

▸ Existing Solution
▹ Twitter's spam detection algorithm focuses on criteria such as:
▹ Harmful links
▹ Aggressive following behavior
▹ Posting to trending topics
▹ Posting duplicated tweets
▹ Low profile activity

▸ Drawbacks
▹ Spammers have evolved.
▹ Twitter can no longer detect their spam with the existing algorithm.

Page 6: Enhancing Twitter spam discovery using cross account pattern matching

Proposed Algorithm

▸ Emphasis on interaction between accounts and not on individual accounts.

▸ Finding patterns shared with existing spam tweets.

▸ Detecting spam accounts based on tweets and spam tweets based on accounts.

Page 7: Enhancing Twitter spam discovery using cross account pattern matching

FLOW CHART TO DETECT SPAM

Identify Tweets with Malicious Links

Mining Spam Patterns

Spam Likelihood Estimation

Page 8: Enhancing Twitter spam discovery using cross account pattern matching

Proposed Algorithm

Stage 1: Identify Tweets with Malicious Links.

1. Collect tweets and user info.
2. Follow the links in each tweet.
3. Check whether the link is flagged by Twitter (t.co) or by another URL shortening service (goo.gl or bit.ly).
4. If it is flagged, mark the tweet as spam; otherwise leave it unmarked.

Leverage Twitter’s Database of Malicious links.

Page 9: Enhancing Twitter spam discovery using cross account pattern matching

Proposed Algorithm

Stage 2: Mining Spam Patterns.

1. Strip off all URLS, @user mentions and #hashtags.

2. Strip off all non-alphabetic characters, such as the digits 0-9 and symbols like *, !, @, #.

3. Create a hash for each stripped tweet.

4. Compare the hash with hashes of other tweets.

Find Pattern
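
The four steps above can be sketched roughly as follows. This is a minimal illustration in Python rather than the Java used for the actual implementation. The regular expressions, the lowercasing, and the whitespace collapsing are assumptions, since the slides do not give the exact patterns. MD5 is used because the implementation slides name it.

```python
import hashlib
import re

# Hypothetical patterns: the slides do not show the real expressions.
URL_RE = re.compile(r"https?://\S+")
MENTION_OR_HASHTAG_RE = re.compile(r"[@#]\w+")
# Assumption: only ASCII letters and whitespace survive stripping.
NON_LETTER_RE = re.compile(r"[^A-Za-z\s]")


def normalize(tweet: str) -> str:
    """Steps 1 and 2: drop URLs, @mentions, #hashtags, digits and symbols."""
    text = URL_RE.sub(" ", tweet)
    text = MENTION_OR_HASHTAG_RE.sub(" ", text)
    text = NON_LETTER_RE.sub(" ", text)
    # Lowercasing and collapsing whitespace are extra assumptions that make
    # trivially different copies of a message hash to the same value.
    return " ".join(text.lower().split())


def fingerprint(tweet: str) -> str:
    """Step 3: hash the stripped text so tweets can be compared cheaply."""
    return hashlib.md5(normalize(tweet).encode("utf-8")).hexdigest()


if __name__ == "__main__":
    a = "Win a FREE phone!!! http://t.co/abc @alice #giveaway 123"
    b = "Win a free phone http://bit.ly/xyz @bob #promo 999"
    # Step 4: two tweets match when their fingerprints are equal.
    print(fingerprint(a) == fingerprint(b))
```

Because the links, mentions, and hashtags are removed before hashing, two copies of a campaign message that differ only in those parts end up with the same fingerprint.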

Page 10: Enhancing Twitter spam discovery using cross account pattern matching

Proposed Algorithm

Stage 3: Spam Likelihood Estimation.

1. Iterate through users and assign spam scores based on the user’s tweets.

2. Iterate through tweets and assign each a spam score based on the user who posted it.

Calculate Spam Score
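
The scoring formulas from the math slide did not survive in this transcript, so the sketch below is not the author's method. It only illustrates the two passes under a deliberately simple assumption: a user's score is the fraction of that user's tweets already labelled as spam, and an unlabelled tweet inherits its author's score. The function name and data layout are hypothetical.

```python
from collections import defaultdict


def estimate_spam_scores(tweet_authors, labelled_spam):
    """Two-pass scoring sketch (assumed formulas, not the deck's).

    tweet_authors: dict mapping tweet id -> user id
    labelled_spam: set of tweet ids marked as spam by the earlier stages
    Returns (user_scores, tweet_scores), both with values in [0, 1].
    """
    tweets_by_user = defaultdict(list)
    for tweet_id, user_id in tweet_authors.items():
        tweets_by_user[user_id].append(tweet_id)

    # Pass 1: score each user from the tweets they posted.
    user_scores = {}
    for user_id, tweet_ids in tweets_by_user.items():
        spam_count = sum(1 for t in tweet_ids if t in labelled_spam)
        user_scores[user_id] = spam_count / len(tweet_ids)

    # Pass 2: score each tweet from the user who posted it.
    tweet_scores = {}
    for tweet_id, user_id in tweet_authors.items():
        if tweet_id in labelled_spam:
            tweet_scores[tweet_id] = 1.0
        else:
            tweet_scores[tweet_id] = user_scores[user_id]
    return user_scores, tweet_scores


if __name__ == "__main__":
    authors = {"t1": "u1", "t2": "u1", "t3": "u2"}
    users, tweets = estimate_spam_scores(authors, {"t1"})
    print(users)
    print(tweets)
```

Under this assumption, a tweet that neither earlier stage labelled still receives a non-zero score when its author has posted labelled spam.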

Page 11: Enhancing Twitter spam discovery using cross account pattern matching

Proposed Algorithm

Stage 3: Spam Likelihood Estimation.

Here comes the MATH

Page 12: Enhancing Twitter spam discovery using cross account pattern matching

Proposed Algorithm

Page 13: Enhancing Twitter spam discovery using cross account pattern matching

Implementation Details

▸ Data Collection
▹ Twitter Java API: Twitter4J
▹ Registering the app with Twitter

Page 14: Enhancing Twitter spam discovery using cross account pattern matching

Implementation Details

▸ Data Storage
▹ MySQL database

Page 15: Enhancing Twitter spam discovery using cross account pattern matching

379,867 tweets

3,129 users

Implementation Details

▸ The Twitter API limits the number of requests.

▸ 180 requests / 15 min
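
To make the limit concrete, the short Python sketch below works out the pacing it implies. The constants come from the slide. The helper names are hypothetical, and spreading requests evenly across the window is only one possible strategy.

```python
import math

REQUESTS_PER_WINDOW = 180   # from the slide
WINDOW_SECONDS = 15 * 60    # 15 minutes


def min_interval_seconds() -> float:
    """Spacing between calls if requests are spread evenly over a window."""
    return WINDOW_SECONDS / REQUESTS_PER_WINDOW


def windows_needed(total_requests: int) -> int:
    """Number of 15-minute windows needed to issue total_requests calls."""
    return math.ceil(total_requests / REQUESTS_PER_WINDOW)


if __name__ == "__main__":
    print(min_interval_seconds())
    print(windows_needed(3129))
```

Evenly spaced, this is one request every 5 seconds. If, purely as an assumption, each of the 3,129 users needed a single request, the collection would span 18 rate-limit windows.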

Page 16: Enhancing Twitter spam discovery using cross account pattern matching

Implementation Details

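▸ Stage 1 Implementation
▹ jsoup, a Java HTML parsing library, used to follow links and read the resulting pages
▹ Warning text shown by each service for a flagged link: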

● t.co - Warning: this link may be unsafe

● Goo.gl - The site ahead contains malware

● Bit.ly - STOP - there might be a problem with the requested link
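
One way to act on these warning pages is to look for the quoted phrases in the text of the page a link resolves to. The Python sketch below does only that text check and leaves the fetching out (the slides use jsoup for it). The phrases are copied from the slide, while the function names and the case-insensitive matching are assumptions.

```python
# Warning phrases as quoted on the slide, lowercased for matching.
WARNING_PHRASES = {
    "t.co": "this link may be unsafe",
    "goo.gl": "the site ahead contains malware",
    "bit.ly": "there might be a problem with the requested link",
}


def flagged_by(page_text: str) -> list:
    """Return the services whose warning phrase appears in the page text."""
    # Lowercase and collapse whitespace so layout differences do not matter.
    text = " ".join(page_text.lower().split())
    return [service for service, phrase in WARNING_PHRASES.items()
            if phrase in text]


def is_malicious(page_text: str) -> bool:
    """A link counts as malicious if any service shows its warning page."""
    return bool(flagged_by(page_text))


if __name__ == "__main__":
    print(flagged_by("<h1>Warning: this link may be unsafe</h1>"))
    print(is_malicious("Welcome to my blog"))
```

Matching on fixed phrases is brittle, because a service can reword its warning page at any time, so a check like this would need to be kept in step with what the services actually display.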

Page 17: Enhancing Twitter spam discovery using cross account pattern matching

Implementation Details

▸ Stage 1 Stats
▹ After implementing the first stage of the algorithm

Page 18: Enhancing Twitter spam discovery using cross account pattern matching

Implementation Details

▸ Stage 2 Implementation
▹ Regular expressions to strip off #hashtags, @user mentions, URLs, special characters and numbers
▹ Used the MD5 algorithm to generate hashes.
▹ Tweets with the same hash value were marked as spam.
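
The marking step can be sketched as a grouping of tweets by hash value. This Python illustration assumes the hashes have already been computed and are held in a dictionary keyed by tweet id. The function name and data layout are hypothetical. The slides also do not say whether repeats within a single account count, so this sketch treats any repeated hash as a duplicate.

```python
from collections import defaultdict


def mark_duplicates(fingerprints):
    """Group tweets by hash and flag every hash shared by 2+ tweets.

    fingerprints: dict mapping tweet id -> hash string
    Returns (duplicate_hashes, spam_tweet_ids) as two sets.
    """
    groups = defaultdict(list)
    for tweet_id, digest in fingerprints.items():
        groups[digest].append(tweet_id)

    duplicate_hashes = {d for d, ids in groups.items() if len(ids) > 1}
    spam_tweet_ids = {t for d in duplicate_hashes for t in groups[d]}
    return duplicate_hashes, spam_tweet_ids


if __name__ == "__main__":
    sample = {1: "aa", 2: "bb", 3: "aa", 4: "cc", 5: "bb", 6: "aa"}
    hashes, spam = mark_duplicates(sample)
    print(sorted(hashes), sorted(spam))
```

Grouping through a dictionary keeps the comparison linear in the number of tweets, instead of comparing every pair of hashes.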

Page 19: Enhancing Twitter spam discovery using cross account pattern matching

Implementation Details

▸ Stage 2 Stats
▹ 13,015 duplicate hashes were found
▹ They covered 70,728 tweets
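
For scale, these counts can be set against the collected tweet total reported on the data-collection slide. The short Python calculation below assumes Stage 2 ran over that whole set, which the slides do not state explicitly.

```python
TOTAL_TWEETS = 379_867       # collected tweets, from the data slide
DUPLICATE_HASHES = 13_015    # from the Stage 2 stats
COVERED_TWEETS = 70_728      # from the Stage 2 stats

# Share of all collected tweets that fall under a duplicate hash.
covered_share = COVERED_TWEETS / TOTAL_TWEETS

# Average number of tweets sharing each duplicate hash.
tweets_per_hash = COVERED_TWEETS / DUPLICATE_HASHES

print(f"{covered_share:.1%} of tweets, {tweets_per_hash:.2f} tweets per hash")
```

Under that assumption, roughly 18.6% of the collected tweets share a hash with at least one other tweet, at an average of about 5.4 tweets per duplicate hash.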

Page 20: Enhancing Twitter spam discovery using cross account pattern matching

Implementation Details

▸ Stage 3 Stats
▹ Spam tweets that the first two stages had not labelled were identified.
▹ Users who tweet more spam were assigned a higher spam score.
▹ Tweets posted by such accounts were also assigned a higher spam score.

Page 21: Enhancing Twitter spam discovery using cross account pattern matching

Drawbacks

▸ Not good enough in detecting human controlled spam accounts.

Advantages

▸ Detects bot-controlled spam accounts.
▸ Easily detects spam campaigns.
▸ Spam tweets with different user mentions and links are also detected.
▸ Excessive retweets to unrelated topics are also treated as spam.

Page 22: Enhancing Twitter spam discovery using cross account pattern matching

Conclusion and Future Work

▸ The cross-account pattern matching method is highly effective.
▸ Old methods do not work nowadays.
▸ For Future Work
▹ Clustering of tweets to understand which topics spammers use the most
▹ Providing a real-time spam discovery solution by implementing machine learning

Page 23: Enhancing Twitter spam discovery using cross account pattern matching

References

[1] Publication: http://dl.ifip.org/db/conf/im/im2015m/137446.pdf

Page 24: Enhancing Twitter spam discovery using cross account pattern matching

THANKS!

Any questions?