@SPAM: THE UNDERGROUND ON 140 CHARACTERS OR LESS
Chris Grier, Vern Paxson, Michael Zhang, University of California, Berkeley
Kurt Thomas, University of Illinois, Urbana-Champaign
ACM CCS 2010
2011/3/22




AGENDA

Introduction
Background
Data Collection
Spam on Twitter
Spam Campaigns
Blacklist Performance
Conclusion


INTRODUCTION

Twitter has developed a following of 106 million users that post to the site over one billion times per month

Threats:
  Brute-force guessing of weak passwords
  Phishing
  …

Twitter currently lacks a filtering mechanism to prevent spam, with the exception of malware, blocked using Google’s Safebrowsing API

Twitter has developed a loose set of heuristics to quantify spamming activity, such as excessive account creation or requests to befriend other users


INTRODUCTION (CONT.)

Present the first in-depth look at spam on Twitter

Find that 0.13% of users exposed to spam URLs click through to the spam web site

Identify a diversity of spam campaigns exploiting a range of Twitter features to attract audiences

Blacklists are currently too slow to stop harmful links

Two types of spamming accounts on Twitter


BACKGROUND

Common techniques to filter email spam:
  IP blacklisting
  domain and URL blacklisting
  filtering on email contents

Social network spam requires a large social circle

The challenges of a successful spam campaign on Twitter:
  Obtaining enough accounts
  URL shortening services on Twitter
  Having enough fresh URLs


BACKGROUND (CONT.)

Tweets: Twitter restricts these updates to 140 characters or less
  URL shortening
Followers: how to obtain a lot of followers
Friends: relationships in Twitter are not bidirectional
Mentions, Retweets, Hashtags


DATA COLLECTION

Collect data from two separate taps:
  one targets a random sample of Twitter activity
  one specifically targets any tweets containing URLs

Use a custom web crawler to follow each URL through HTTP status codes and META tag redirects until reaching the final landing page

Redirect resolution removes any URL obfuscation that masks the domain of the final landing page
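The redirect-following step can be sketched roughly as below. This is a minimal illustration, not the authors' crawler: the `fetch` callback, the simplified META refresh pattern, the hop limit, and the `*.example` URLs are all assumptions made for the example, and no network access is performed.

```python
import re
from urllib.parse import urljoin

# Simplified pattern for <meta http-equiv="refresh" content="0; url=...">.
META_REFRESH = re.compile(
    r'<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*'
    r'content=["\']?\s*\d+\s*;\s*url=([^"\'>\s]+)',
    re.IGNORECASE,
)

def resolve_landing_page(url, fetch, max_hops=10):
    """Follow HTTP and META refresh redirects; return the chain of URLs visited."""
    chain = [url]
    for _ in range(max_hops):
        status, headers, body = fetch(chain[-1])
        if status in (301, 302, 303, 307, 308) and "Location" in headers:
            next_url = urljoin(chain[-1], headers["Location"])
        else:
            match = META_REFRESH.search(body)
            if not match:
                break  # no further redirect: this is the landing page
            next_url = urljoin(chain[-1], match.group(1))
        if next_url in chain:
            break  # redirect loop, stop here
        chain.append(next_url)
    return chain

# Hypothetical stand-in for the web: url -> (status, headers, body).
FAKE_WEB = {
    "http://short.example/abc": (301, {"Location": "http://affiliate.example/r?id=1"}, ""),
    "http://affiliate.example/r?id=1": (
        200, {}, '<meta http-equiv="refresh" content="0; url=http://landing.example/offer">'
    ),
    "http://landing.example/offer": (200, {}, "<html>offer</html>"),
}

def fake_fetch(url):
    return FAKE_WEB[url]

chain = resolve_landing_page("http://short.example/abc", fake_fetch)
landing_page = chain[-1]
```

The last element of the chain is the landing page whose domain is no longer masked by the shortener or the intermediate hop.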


DATA COLLECTION (CONT.)

We regularly check every landing page’s URL in our data set against three blacklists:
  Google Safebrowsing → phishing or malware
  URIBL, Joewein → domain present in spam email

Once a landing page is marked as spam, we analyze the associated spam tweets and users involved in the spam operation.

We have found that URIBL and Joewein include domains that are not exclusively hosting spam
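A blacklist lookup on a landing page might look like the sketch below. The in-memory sets are hypothetical snapshots invented for illustration (the real lists are obtained from each provider), and treating Google Safebrowsing as a URL list while matching URIBL and Joewein by domain is an assumption of this example.

```python
from urllib.parse import urlsplit

# Hypothetical snapshots; entries are invented for illustration.
URL_BLACKLISTS = {"google_safebrowsing": {"http://malware.example/dropper"}}
DOMAIN_BLACKLISTS = {
    "uribl": {"pills.example"},
    "joewein": {"pills.example", "scam.example"},
}

def blacklist_hits(landing_url):
    """Return the sorted names of blacklists that flag the landing page."""
    host = (urlsplit(landing_url).hostname or "").lower()
    hits = [name for name, urls in URL_BLACKLISTS.items() if landing_url in urls]
    for name, domains in DOMAIN_BLACKLISTS.items():
        # Match the host itself or any parent domain (e.g. www.pills.example).
        if any(host == d or host.endswith("." + d) for d in domains):
            hits.append(name)
    return sorted(hits)
```

Because domain lists can include hosts that are not exclusively serving spam, a hit from them is better read as a signal than as proof.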


DATA COLLECTION (CONT.)

During this time we gathered over 200 million tweets from the stream
  → Over 3 million tweets were identified as spam

Crawled 25 million URLs
  → 8% of all unique links were identified as spam by blacklists
    5% were malware and phishing
    95% directed users towards scams


DATA COLLECTION (CONT.)

When bit.ly or an affiliated service is used to shorten a spam URL, we use the bit.ly API to download clickthrough statistics and click stream data, which allows us to identify highly successful spam pages and the rate of traffic


SPAM ON TWITTER

Spammers must coerce Twitter members into following spam accounts
  spamming bots
  compromised accounts
  unwitting participants in spam distribution


SPAM ON TWITTER (CONT.)

Roughly 50% of spam was uncategorized due to using random terms

The table on this slide breaks down the other 50%


SPAM ON TWITTER (CONT.)

Call outs: Mentions are used by spammers to personalize messages in an attempt to increase the likelihood a victim follows a spam link.
  Example: Win an iTouch AND a $150 Apple gift card @victim! http://spam.com

Retweets: four sources of spam retweets:
  retweets purchased by spammers from respected Twitter members
  spam accounts retweeting other spam
  hijacked retweets
  users unwittingly retweeting spam
  Example: RT @scammer: check out the Ipads there having a giveaway http://spam.com


SPAM ON TWITTER (CONT.)

Tweet hijacking: spammers can hijack tweets posted by other users and retweet them, prepending the tweet with spam URLs.
  Example: http://spam.com RT @barackobama A great battle is ahead of us

Trend setting: the anomaly of 70% of phishing and malware spam containing hashtags can be explained by spammers attempting to create a trending topic
  Example: Buy more followers! http://spam.com #fwlr

Trend hijacking: Rather than generating a unique topic, spammers can append currently trending topics to their own spam.
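To make the features in these examples concrete, here is a rough sketch that pulls mentions, hashtags, URLs, and retweet markers out of raw tweet text. The regular expressions and the "URL placed before the RT marker" hijack heuristic are simplifying assumptions for illustration, not the method used in the study.

```python
import re

MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")
URL = re.compile(r"https?://\S+")
RETWEET = re.compile(r"\bRT @(\w+)")

def tweet_features(text):
    """Extract mentions, hashtags, URLs, and retweet information from tweet text."""
    retweet = RETWEET.search(text)
    urls = URL.findall(text)
    return {
        "mentions": MENTION.findall(text),
        "hashtags": HASHTAG.findall(text),
        "urls": urls,
        "retweet_of": retweet.group(1) if retweet else None,
        # Hijack heuristic (assumption): a URL appears before the "RT @user" marker.
        "url_before_retweet": bool(
            retweet and urls and text.find(urls[0]) < retweet.start()
        ),
    }

hijacked = tweet_features("http://spam.com RT @barackobama A great battle is ahead of us")
trend = tweet_features("Buy more followers! http://spam.com #fwlr")
```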


SPAM ON TWITTER (CONT.)

Coefficient of correlation between clicks and each feature:
  accounts involved in spamming and the number of followers that receive a link (ρ > 0.7)
  hashtags (ρ = 0.74)
  retweets with hashtags (ρ = 0.55)
  number of times spam is tweeted (ρ = 0.28), indicating that repeatedly posting a link does little to increase traffic
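For readers unfamiliar with correlation coefficients, the sketch below computes Pearson's coefficient in plain Python. The slides do not say which coefficient ρ denotes, so using Pearson here is an assumption, and the numbers are made-up toy values rather than data from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up toy values for illustration only.
clicks = [10, 20, 30, 40]
followers_reached = [100, 200, 300, 400]  # grows in step with clicks
times_tweeted = [5, 1, 4, 2]              # no clear relationship to clicks

strong = pearson(clicks, followers_reached)
weak = pearson(clicks, times_tweeted)
```

A value near 1 means the feature rises together with clicks; a value near 0 means the feature says little about clicks.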


SPAM ON TWITTER (CONT.)

To understand the effectiveness of tweeting to entice a follower into visiting a spam URL:
  Reach = t × f
    t: the total tweets sent
    f: the followers exposed to each tweet
  Averaging (clicks / reach) over each of the 245,000 URLs in our bit.ly data set, we find roughly 0.13% of spam tweets generate a visit, orders of magnitude higher than the clickthrough rates of 0.003%–0.006% reported for spam email
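The reach and clickthrough calculation can be sketched as follows. The record layout `(clicks, t, f)` and the sample values are hypothetical, and treating f as a single per-URL number is a simplification made for this example.

```python
def reach(tweets_sent, followers_per_tweet):
    """Reach = t × f: total tweets sent times followers exposed to each tweet."""
    return tweets_sent * followers_per_tweet

def mean_clickthrough(records):
    """Average clicks / reach over URLs, skipping URLs with zero reach."""
    rates = [
        clicks / reach(t, f)
        for clicks, t, f in records
        if reach(t, f) > 0
    ]
    return sum(rates) / len(rates)

# Hypothetical (clicks, t, f) triples, not values from the bit.ly data set.
records = [(2, 10, 100), (0, 5, 200), (3, 20, 50)]
average_rate = mean_clickthrough(records)
```

Note that this averages the per-URL rates rather than dividing total clicks by total reach, which matches the "averaging of (clicks / reach) for each URL" wording on the slide.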


SPAM ON TWITTER (CONT.)

A number of factors may degrade the quality of this estimate:
  bit.ly URLs may carry an inherent bias of trust, as bit.ly is the most popular URL shortening service
  click data from bit.ly includes the entire history of a link, while our observation of a link’s usage only accounts for one month of Twitter activity


SPAM ON TWITTER (CONT.)

Twitter accounts
  career spamming account
  compromised account: an account that was created by a legitimate user
Tests
  χ² test on timestamps
  tweet text and link entropy
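The two kinds of test can be illustrated with the sketch below: a χ² statistic comparing posting times against a uniform distribution, and Shannon entropy of tweet text or links. Binning timestamps by second-of-minute, the choice of 60 bins, and the toy data are assumptions of this example; choosing a significance threshold is left out.

```python
import math
from collections import Counter

def chi_square_uniform(values, bins=60):
    """Chi-square statistic of observed values (0..bins-1) against a uniform distribution."""
    counts = Counter(values)
    expected = len(values) / bins
    return sum((counts.get(b, 0) - expected) ** 2 / expected for b in range(bins))

def shannon_entropy(items):
    """Shannon entropy in bits of the empirical distribution of items."""
    counts = Counter(items)
    total = len(items)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Toy data (made up): an automated account posting at second 0 of every minute
# versus posts spread evenly across the minute.
bot_seconds = [0] * 60
spread_seconds = list(range(60))

bot_stat = chi_square_uniform(bot_seconds)
spread_stat = chi_square_uniform(spread_seconds)
repeated_text_entropy = shannon_entropy(["buy now", "buy now", "buy now"])
varied_text_entropy = shannon_entropy(["hello", "lunch time"])
```

A large χ² statistic suggests scheduled, automated posting, and low entropy suggests the same text or link being repeated.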


SPAM ON TWITTER (CONT.)

Compromised spamming accounts
  an account could have been compromised by means of phishing, malware, or simple password guessing, currently a major trend in Twitter
  the Koobface botnet


SPAM TOOLS


SPAM CAMPAIGNS

Campaign : the set of accounts that spam at least one blacklisted landing page in common

To cluster accounts into campaigns:
  each campaign is a binary vector $c \in \{0, 1\}^n$
  accounts $i$ and $j$ are clustered when $c_i \cap c_j \neq \emptyset$, indicating at least one link is shared by both accounts.
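The clustering rule above can be sketched with a union-find structure. Representing each account's binary vector as a set of landing-page URLs and using union-find are choices made for this illustration, and the account names and URLs are hypothetical.

```python
def cluster_campaigns(account_links):
    """Group accounts into campaigns: accounts sharing any landing page are merged."""
    parent = {account: account for account in account_links}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    first_seen = {}  # landing page -> first account seen posting it
    for account, links in account_links.items():
        for link in links:
            if link in first_seen:
                parent[find(account)] = find(first_seen[link])
            else:
                first_seen[link] = account

    campaigns = {}
    for account in account_links:
        campaigns.setdefault(find(account), set()).add(account)
    return sorted(campaigns.values(), key=min)

# Hypothetical accounts and blacklisted landing pages.
accounts = {
    "a": {"http://x.example/1"},
    "b": {"http://x.example/1", "http://y.example/2"},
    "c": {"http://y.example/2"},
    "d": {"http://z.example/3"},
}
campaigns = cluster_campaigns(accounts)
```

Accounts "a" and "c" share no link directly, but both share one with "b", so all three end up in one campaign.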


SPAM CAMPAIGNS (CONT.)

If an account participates in multiple campaigns, the algorithm will automatically group the campaigns into a single superset. This occurs when an account is:
  shared by two spammers
  used for multiple campaigns over time by a single spammer
  compromised by different services


SPAM CAMPAIGNS (CONT.)

URLs being tweeted
  Single hop (shortened URL → landing page)
  Second hop (shortened URL → affiliate link → landing page)
  Landing page itself appears in tweets

Phishing for followers
  websites purporting to provide victims with followers if they revealed their account credentials
  phished accounts are used to further promote the phishing campaign
  Defining features
    a defining feature of tweets in this campaign is the extensive use of hashtags (73%)


SPAM CAMPAIGNS (CONT.)

Personalized mentions (http://twitprize.com)
  Spam within the campaign would target victims by using mentions and crafting URLs to include the victim’s Twitter account name to allow for personalized greetings
  Defining features
    99% are a retweet or mention
    this campaign passes the entropy tests, since each tweet contains a different username and the links point to distinct twitprize URLs


SPAM CAMPAIGNS (CONT.)

Buying retweets
  One such service: retweet.it
  Defining features
    unique feature present in all retweet.it

Distributing malware
  Defining features
    One difference from other campaigns is the use of redirects to mask the landing page (bit.ly → intermediate → malware landing site)
    Nested URL shortening


BLACKLIST PERFORMANCE


CONCLUSION

This paper presents the first study of spam on Twitter, including spam behavior, clickthrough, and the effectiveness of blacklists to prevent spam propagation

By measuring the clickthrough of these campaigns, we find that Twitter spam is far more successful at coercing users into clicking on spam URLs than email, with an overall clickthrough rate of 0.13%.

If blacklists were integrated into Twitter, they would protect only a minority of users

URLs posted to the site must be crawled to unravel potentially long chains of redirects, using the final landing page for blacklisting.
