
Whispers in the Dark: Analysis of an Anonymous Social Network

Gang Wang, Bolun Wang, Tianyi Wang, Ana Nika, Haitao Zheng, Ben Y. Zhao

UC Santa Barbara

IMC’14


Concerns of Using Online Social Networks

• Users cannot speak freely in online social networks
  – User profile is linkable to real-world identity
  – Online actions can cause serious consequences


Whisper Anonymous Social Network

• Whisper, an anonymous social app
  – Online profile unlinkable to real identity
  – Express freely without fear of retaliation or abuse
     o Share stories, seek advice, express complaints
     o Whistleblowers, teenagers avoiding bullying
  – Interact with people anonymously
  – > 3 billion monthly page views (2014)

• Part of a wave of new anonymous social networks
  – Snapchat, Secret, Yik Yak, Wickr, Rooms (Facebook)

Key Features

• No personally identifiable information
  – No real names, only nicknames
  – No user profiles (phone#/email)
  – No explicit social links
  – Content is moderated to make sure users don’t reveal their identity

• Post whisper messages
  – Topics including relationships, family, work, religion, politics, sex, etc.
  – Secrets, confessions


Our Goals

• Understand how anonymity affects user behavior in anonymous social networks

– How is Whisper’s network structure different from existing networks like Facebook and Twitter?

– How does anonymity impact the friendships between users and user engagement over time?

– Implications for user anonymity and privacy


Outline

• Motivation

• Dataset and Whisper Network
  – Data Collection
  – Basic Network Structure

• User Engagement and Stickiness

• Anonymity and Privacy in Whisper

• Conclusion


Whisper Functions and Data

System-wide recent whispers

Public whisper lists
• Latest: all recent whispers in the network
• Nearby: whispers in the local area (< 40 miles)
• Popular: whispers that received many replies
• Featured: editor-picked whispers

Whispers and replies are public data; chatting is private

Nickname and a (rough) location


Data Collection

• Crawled the “latest whisper” stream for 3 months*
  – All public messages from February to May 2014
  – 9,343,590 original whispers, 15,268,964 whisper replies
  – 1,038,364 unique userIDs

• Interacted frequently with Whisper
  – In-person meetings to get data collection permission
  – Whisper removed the GUID in June 2014

*Data collection with Whisper’s permission, IRB approved

Globally unique identifier (GUID): links the same user’s data over time

Basic Analysis: Interaction Graph

• How do users interact with each other with no explicit social links?

• Interaction graph: Whisper vs. Facebook and Twitter
  – Users are nodes; edges represent user interactions
  – 3-month time window for all three graphs

Graph     Interaction event  Nodes   Edges    Avg. degree  Clustering coef.  Avg. path length  Assortativity
Whisper   Replies            690K    6,531K   9.47         0.033             4.28              -0.011
Facebook  Wall posts         707K    1,260K   1.78         0.059             10.13             0.116
Twitter   Retweets           4,317K  16,972K  3.93         0.048             5.52              -0.025
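In every row of the table, Avg. Degree equals Edges / Nodes (for example, 6,531K / 690K ≈ 9.47). The sketch below builds an interaction graph from reply events and computes that ratio together with the average local clustering coefficient. It is a minimal illustration under stated assumptions: the graph is undirected, there is one edge per interacting user pair, and the (replier, poster) event format is a hypothetical stand-in for the crawled data. Path length and assortativity are left out.

```python
from collections import defaultdict
from itertools import combinations


def build_interaction_graph(events):
    """Undirected graph with one edge per pair of users who interacted.

    events: iterable of (replier, poster) pairs (hypothetical format).
    Self-replies are ignored; repeated interactions collapse into one edge.
    """
    adj = defaultdict(set)
    for u, v in events:
        if u == v:
            continue
        adj[u].add(v)
        adj[v].add(u)
    return adj


def graph_stats(adj):
    """Node count, edge count, edges per node, and average clustering."""
    n = len(adj)
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    total_clustering = 0.0
    for nbrs in adj.values():
        k = len(nbrs)
        if k < 2:
            continue  # nodes with degree < 2 contribute a clustering of 0
        # Edges among this node's neighbors close triangles through it.
        links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        total_clustering += 2 * links / (k * (k - 1))
    return {
        "nodes": n,
        "edges": m,
        "edges_per_node": m / n if n else 0.0,
        "avg_clustering": total_clustering / n if n else 0.0,
    }


replies = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "b"), ("d", "a")]
stats = graph_stats(build_interaction_graph(replies))
print(stats["nodes"], stats["edges"], stats["edges_per_node"])  # 4 4 1.0
```

For an undirected graph, the textbook average degree would be 2 × Edges / Nodes. The sketch reports Edges / Nodes under a separate name so that the two definitions are not confused.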


[Figure: interaction patterns, Whisper vs. Facebook]
• Whisper graph has high dispersion: users interact with a wide range of strangers
• Existing social networks (Facebook): users interact with a fixed set of friends

Persistent Friendship

• Persistent user pairs (strong ties) are extremely rare
  – Only 7.7% of user pairs (edges) interacted multiple times
  – The majority are weak ties: they talked once, never again
  – This is a lower bound on user interactions (no data on private messages)

[Figure: number of total interactions vs. time between first and last interaction (days), per user pair]

The majority of user pairs have weak relationships: short-lived, with few interactions
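The 7.7% figure is the share of user pairs (edges) that interacted more than once. The figure's x-axis shows the time between a pair's first and last interaction. The sketch below computes both. It assumes each interaction is logged as a hypothetical (user_a, user_b, day) tuple and treats pairs as unordered. It illustrates the measurement and is not the study's pipeline.

```python
from collections import defaultdict


def pair_persistence(interactions):
    """Summarize how persistent each user pair's interactions are.

    interactions: (user_a, user_b, day) tuples (hypothetical format).
    Pairs are unordered, so (a, b) and (b, a) count as the same edge.
    Returns (fraction of pairs with > 1 interaction, per-pair lifetimes).
    """
    counts = defaultdict(int)
    first_last = {}
    for a, b, day in interactions:
        pair = frozenset((a, b))
        counts[pair] += 1
        lo, hi = first_last.get(pair, (day, day))
        first_last[pair] = (min(lo, day), max(hi, day))
    if not counts:
        return 0.0, {}
    repeated = sum(1 for c in counts.values() if c > 1)
    # Days between first and last interaction; 0 for one-off pairs.
    lifetimes = {p: hi - lo for p, (lo, hi) in first_last.items()}
    return repeated / len(counts), lifetimes


log = [("u1", "u2", 0), ("u2", "u1", 5), ("u1", "u3", 2), ("u4", "u5", 9)]
share, lifetimes = pair_persistence(log)
print(share)  # 0.3333333333333333
```

This count only includes interactions visible in public replies. Like the slide's figure, it is therefore a lower bound.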


Do Communities Exist?

• Community detection on the Whisper interaction graph
  – Modularity-based approaches: Louvain and Wakita
  – Resulting modularity: Louvain (0.492), Wakita (0.409)

• Modularity > 0.3 indicates community structure
  – Facebook (0.63), YouTube (0.66), Orkut (0.67) [IMC’09]
  – Whisper has weak community structure

Even though users don’t have persistent friends, they still form communities
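Modularity compares the fraction of edges inside communities with the fraction expected if edges were rewired at random while keeping degrees fixed: Q = Σ_c [L_c / m − (d_c / 2m)²]. Here m is the number of edges, L_c is the number of edges inside community c, and d_c is the total degree of c's nodes. The sketch below scores a given partition of an undirected, unweighted graph. It does not implement Louvain or Wakita, which search for a high-Q partition. The edge-list and label-dict inputs are assumptions.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity of a partition of an undirected, unweighted graph.

    edges: list of (u, v) pairs, each undirected edge listed once.
    community: dict mapping every node to a community label.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c: edges with both ends in c
    degree_sum = defaultdict(int)  # d_c: total degree of nodes in c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum
    )


# Two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
print(round(modularity(edges, split), 3))  # 0.357
```

Placing every node in a single community always gives Q = 0. The two-triangle split gives 5/14 ≈ 0.357, which is just above the 0.3 rule of thumb quoted on the slide.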


Why Do Users Form Communities?

• Intuition: users interact with nearby users (via nearby list)

• Validation: does community membership correlate with geographic location?
  – Example: a community of 28,342 users; its top 4 regions are
     o California (62%), Texas (1.5%), England (1.2%), Arizona (0.9%)

• Users within a community are likely from the same region

Users form communities based on geolocation

Percentile of communities  1st region  2nd region  3rd region  4th region
50th                       52%         3.9%        1.5%        1.4%
70th                       45%         1.4%        1.3%        1.3%
90th                       32%         0.9%        0.9%        0.8%
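Each entry in the table is the share of a community's users who come from its most common region, its second most common region, and so on. The sketch below computes those shares for one community. It assumes each user has a single region label in a hypothetical `region_of` mapping and skips users with no known region. The percentile step across communities is left out because the slide does not define it precisely.

```python
from collections import Counter


def top_region_shares(members, region_of, k=4):
    """Fraction of a community's members in each of its k most common regions.

    members: iterable of user IDs in one community.
    region_of: dict mapping user ID -> region label (hypothetical input).
    Users without a known region are skipped (excluded from the denominator).
    """
    regions = [region_of[u] for u in members if u in region_of]
    if not regions:
        return []
    counts = Counter(regions)
    total = len(regions)
    return [(region, n / total) for region, n in counts.most_common(k)]


region_of = {
    "u1": "California", "u2": "California", "u3": "California",
    "u4": "Texas", "u5": "England",
}
print(top_region_shares(["u1", "u2", "u3", "u4", "u5", "u6"], region_of))
```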


Outline

• Motivation

• Dataset and Whisper Network

• User Engagement and Stickiness
  – User Engagement Over Time
  – Predicting Future Engagement

• Anonymity and Privacy in Whisper

• Conclusion


From Network Ties to User Engagement

• Background: social ties impact network “stickiness”
  – Strong ties: close friends; weak ties: strangers
  – Strong ties help keep existing users from leaving, making the network more “sticky”

• Our question: with a network of strangers, how well can Whisper maintain user engagement over time?

• Evaluate per-user engagement over time
  – How long do users stay active?
  – Do users turn dormant quickly?

How Long Do Users Stay Active?

[Figure: % of users vs. user’s active period (normalized); annotated groups: users who were only active for the first 1-2 days (~35%), and users who stayed active]

• User’s active period (normalized)
  – “Active” means the user still generates new content
  – Normalized as: user’s active period / our monitoring period of that user

• A significant portion of users quickly turn dormant

• Bimodal distribution: can we predict whether users will stay or not?
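The normalized active period divides how long a user kept posting by how long that user was observed. The sketch below makes two assumptions, because the slide does not define the exact endpoints. First, the active period runs from the user's first post to their last post. Second, the monitoring period runs from the first post to the end of the crawl. The sketch also bins the values into ten 0.1-wide bins, matching the 0.05 to 0.95 bin centers on the figure's x-axis.

```python
def normalized_active_period(first_post, last_post, crawl_end):
    """Active period / monitoring period, both measured in days.

    Assumption: a user is monitored from their first post until crawl_end.
    Users first seen on the final day get 1.0 (nothing left to observe).
    """
    monitored = crawl_end - first_post
    if monitored <= 0:
        return 1.0
    return (last_post - first_post) / monitored


def histogram(values, bins=10):
    """Percentage of values in each of `bins` equal-width bins over [0, 1]."""
    counts = [0] * bins
    for v in values:
        idx = min(int(v * bins), bins - 1)  # put v == 1.0 in the last bin
        counts[idx] += 1
    n = len(values)
    return [100 * c / n for c in counts] if n else counts


# (first post day, last post day, crawl end day) for three toy users.
users = {"u1": (0, 1, 90), "u2": (10, 90, 90), "u3": (30, 31, 90)}
periods = [normalized_active_period(*t) for t in users.values()]
print(histogram(periods))  # mass in the first and last bins only
```

A bimodal shape shows up as mass piling into the first and last bins, as in this toy example.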


Predicting User Engagement

• Binary prediction: will a user disengage quickly or not?
  – Input: user’s data during the initial X days
  – ML classifiers: Random Forest, SVMs, Bayes, Decision Tree

• Features (20)
  – Content posting volume, frequency (7)
  – Social interactions (8)
  – Temporal features (2)
  – Activity trend (3)

An extensive list of features; it can be further trimmed

Prediction Result (Random Forest)

[Figure: accuracy (%) vs. data from users’ first X days (X = 1, 3, 7, 14, 30), all features vs. top 4 features]

• 10-fold cross validation on ground-truth dataset
  – Classify users using their first X days of data

1 day of data already yields 75% accuracy

94% accuracy when predicting engagement
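10-fold cross validation splits the labeled users into 10 folds. Each round trains on 9 folds and tests on the held-out fold, and the accuracy is averaged over the 10 rounds. The sketch below shows that loop using only the standard library. A one-feature threshold rule stands in for the Random Forest; this is a deliberate simplification, not the study's classifier. The `fit`/`predict` callables and the toy data are hypothetical.

```python
import random


def k_fold_accuracy(X, y, fit, predict, k=10, seed=0):
    """Mean held-out accuracy over k folds.

    fit(X_train, y_train) -> model; predict(model, x) -> label.
    """
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # round-robin split into k folds
    scores = []
    for test in folds:
        if not test:
            continue
        test_set = set(test)
        train = [i for i in idx if i not in test_set]
        model = fit([X[i] for i in train], [y[i] for i in train])
        correct = sum(predict(model, X[i]) == y[i] for i in test)
        scores.append(correct / len(test))
    return sum(scores) / len(scores)


def fit_threshold(X, y):
    """Stand-in classifier: the best single cut on the first feature."""
    best_t, best_acc = None, -1.0
    for t in sorted({x[0] for x in X}):
        acc = sum((x[0] >= t) == label for x, label in zip(X, y)) / len(y)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def predict_threshold(t, x):
    return x[0] >= t


# Toy data: feature = days active early on; label = stayed engaged.
X = [[0], [1], [3], [4]] * 5
y = [x[0] >= 3 for x in X]
print(k_fold_accuracy(X, y, fit_threshold, predict_threshold))  # 1.0
```

Any classifier exposing the same `fit`/`predict` shape could be plugged into the loop in place of the threshold rule.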


Top 4 Features produce accurate results

• # of days with > 1 whisper
• # of days with > 1 reply
• Is posting volume decreasing?
• # of total posts

• Whisper can identify users likely to leave

• Increase user engagement using other tools
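The sketch below computes the four top features from a user's first X days, assuming per-day whisper and reply counts are available. The slide gives only the feature names, so two definitions are assumptions. First, "> 1 whisper" is read literally as more than one whisper on that day. Second, "posting volume decreasing" compares total posts in the second half of the window with the first half. The function and key names are hypothetical.

```python
def top4_features(whispers_per_day, replies_per_day):
    """Features from a user's first X days (lists indexed by day).

    Hypothetical definitions: see the lead-in for the assumptions.
    """
    if len(whispers_per_day) != len(replies_per_day):
        raise ValueError("daily series must cover the same days")
    totals = [w + r for w, r in zip(whispers_per_day, replies_per_day)]
    half = len(totals) // 2
    # Posting trend: total posts in the later half vs. the earlier half.
    first, second = sum(totals[:half]), sum(totals[half:])
    return {
        "days_with_gt1_whisper": sum(1 for w in whispers_per_day if w > 1),
        "days_with_gt1_reply": sum(1 for r in replies_per_day if r > 1),
        "volume_decreasing": second < first,
        "total_posts": sum(totals),
    }


print(top4_features([3, 2, 0, 0], [1, 4, 0, 1]))
# {'days_with_gt1_whisper': 2, 'days_with_gt1_reply': 1, 'volume_decreasing': True, 'total_posts': 11}
```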


Outline

• Motivation

• Dataset and Whisper Network

• User Engagement and Stickiness

• Anonymity and Privacy in Whisper

• Conclusion


Privacy and Anonymity in Whisper

• Existing mechanisms to prevent PII leakage
  – No personal information is collected (no real name, phone# or email address)
  – Server only stores public whispers; private chats stay on the phone
  – Noise is added to user GPS before sending it to Whisper’s servers

• Worst case: attacker compromises servers and obtains data
  – Much more external data would be needed to de-anonymize users

Location Tracking Attack

• Tracking Whisper users’ locations
  – Pinpoint current location: error < 0.2 miles
  – Allows attackers to follow (stalk) users

• How to attack
  – “Nearby list” shows whispers by distance
  – Triangulate user location using distance measurements
  – Reverse-engineer Whisper’s noise function

• Key problem: lack of GPS authentication
  – Unlimited # of queries from any location (fake GPS input)
  – Use statistics to overcome noise
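Planar trilateration illustrates the triangulation step. The distances from three known points define three circles, and subtracting the circle equations pairwise leaves a 2×2 linear system for the target's (x, y). Averaging many repeated queries shrinks zero-mean noise, which is one way statistics can overcome it. The sketch below uses synthetic coordinates in miles and assumes a flat local plane and zero-mean Gaussian noise. Whisper's actual noise function is not described here, and `noisy_distance` is a hypothetical stand-in for the distance shown in the Nearby list.

```python
import math
import random


def trilaterate(anchors, dists):
    """Solve for (x, y) from three anchor points and their distances."""
    (x1, y1), (x2, y2), (x3, y3) = anchors
    d1, d2, d3 = dists
    # Subtract circle 1 from circles 2 and 3 to get two linear equations.
    a1, b1 = 2 * (x2 - x1), 2 * (y2 - y1)
    c1 = d1**2 - d2**2 + x2**2 - x1**2 + y2**2 - y1**2
    a2, b2 = 2 * (x3 - x1), 2 * (y3 - y1)
    c2 = d1**2 - d3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a1 * b2 - a2 * b1
    if abs(det) < 1e-12:
        raise ValueError("anchors must not be collinear")
    # Cramer's rule for a1*x + b1*y = c1, a2*x + b2*y = c2.
    return ((c1 * b2 - c2 * b1) / det, (a1 * c2 - a2 * c1) / det)


def noisy_distance(p, target, rng, sigma=0.5):
    """Hypothetical oracle: true distance plus zero-mean Gaussian noise."""
    return math.dist(p, target) + rng.gauss(0.0, sigma)


def averaged_distance(p, target, rng, queries=2000):
    """Average many queries issued from the same (forged) position."""
    return sum(noisy_distance(p, target, rng) for _ in range(queries)) / queries


rng = random.Random(42)
target = (3.0, 4.0)  # unknown to the attacker
anchors = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
dists = [averaged_distance(a, target, rng) for a in anchors]
print(trilaterate(anchors, dists))  # close to (3.0, 4.0)
```

With noise of 0.5 miles per query, 2,000 queries reduce the standard error of each averaged distance to about 0.011 miles. This is why unlimited queries from forged locations matter.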


An Example Attack

[Figure: attacker and victim on a map. Victim’s whisper: “BZ is away in Dublin, party in the lab!”; later whisper: “Get more beer!”]

• Attack fully automated with forged GPS
  – Query “distance” to the victim
  – Navigate to the victim step by step until convergence
  – Triangulate the target location until it converges

• Fixed by Whisper
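The slide does not say how the attacker moves toward the victim. Compass (pattern) search is one simple way to navigate "step by step until convergence" using only a distance oracle. At each step, the search tries moving a fixed distance north, south, east, or west and keeps any move that reduces the reported distance. If no move helps, it halves the step size. The sketch below is a swapped-in illustration, not the attack used in the study. It assumes a noise-free oracle on a flat plane, and `distance_to_victim` is hypothetical.

```python
import math


def compass_search(distance_to_victim, start, step=8.0, tol=0.05):
    """Walk toward the point that minimizes distance_to_victim(pos).

    Tries the four compass directions; halves the step when no move
    improves the distance, and stops once the step is smaller than tol.
    """
    x, y = start
    best = distance_to_victim((x, y))
    while step >= tol:
        moved = False
        for dx, dy in ((step, 0), (-step, 0), (0, step), (0, -step)):
            candidate = (x + dx, y + dy)
            d = distance_to_victim(candidate)
            if d < best:
                (x, y), best, moved = candidate, d, True
                break
        if not moved:
            step /= 2
    return (x, y), best


victim = (2.7, -1.3)
pos, dist = compass_search(lambda p: math.dist(p, victim), start=(0.0, 0.0))
print(pos, dist)  # pos ends within ~0.05 of the victim
```

With Euclidean distance, the search stops only when no move of the last step size s helps. That condition means the position is within s/2 of the victim on each axis, which bounds the final error.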


Summary

• The first large-scale measurement study of Whisper

• User interactions have high dispersion; persistent friendships are difficult to build

• User engagement shows a bimodal distribution; future engagement can be predicted from early-days data

• Anonymous apps can still leak personal information
  – Location: once shared with the app, it is at risk of leaking
  – No reliable GPS authentication: attackers can query any location

Thank You! Questions?

References

• [COSN’13] GARCIA, D., MAVRODIEV, P., AND SCHWEITZER, F. Social resilience in online communities: The autopsy of Friendster. In Proc. of COSN (2013).

• [IMC’09] KWAK, H., CHOI, Y., EOM, Y.-H., JEONG, H., AND MOON, S. Mining communities in networks: a solution for consistency and its evaluation. In Proc. of IMC (2009).