Whispers in the Dark: Analysis of an Anonymous Social Network
Gang Wang, Bolun Wang, Tianyi Wang, Ana Nika, Haitao Zheng, Ben Y. Zhao
UC Santa Barbara
IMC’14
Concerns of Using Online Social Networks
• Users cannot speak freely in online social networks
– User profile is linkable to real-world identity
– Online actions can cause serious consequences
Whisper Anonymous Social Network
• Whisper, an anonymous social app
– Online profile unlinkable to real identity
– Express freely without fear of retaliation or abuse
o Share stories, seek advice, express complaints
o Whistleblowers, teenagers avoiding bullying
– Interact with people anonymously
– > 3 billion monthly page views, 2014
• Part of a wave of new, anonymous social networks
– SnapChat, Secret, Yik-yak, Wickr, Rooms (Facebook)
Key Features
• No personally identifiable information
– No real names, only nicknames
– No user profiles (phone#/email)
– No explicit social links
– Moderate content to make sure users don’t reveal their identity
• Post whisper messages
– Topics including relationships, family, work, religion, politics, sex, etc.
– Secrets, confessions
Our Goals
• Understand how anonymity affects user behavior in anonymous social networks
– How is Whisper’s network structure different from existing networks like Facebook and Twitter?
– How does anonymity impact the friendships between users and user engagement over time?
– Implications on user anonymity and privacy
Outline
• Motivation
• Dataset and Whisper Network
– Data Collection
– Basic Network Structure
• User Engagement and Stickiness
• Anonymity and Privacy in Whisper
• Conclusion
Whisper Functions and Data
System-wide recent whispers
Public whisper lists
• Latest: all recent whispers in the network
• Nearby: whispers in local area < 40 miles
• Popular: whispers that received many replies
• Featured: editor-picked whispers
Whispers and replies are public data; chatting is private
Each whisper shows a nickname and a (rough) location
Data Collection
• Crawled the “latest whisper” stream for 3 months*
– All public messages from February to May 2014
– 9,343,590 original whispers, 15,268,964 whisper replies
– 1,038,364 unique userIDs
• Interacted frequently with Whisper
– In-person meetings to get data collection permission
– Whisper removed GUID in June 2014
*Data collection with Whisper’s permission, IRB approved
Global universal identifier (GUID): links the same user’s data over time
Basic Analysis: Interaction Graph
• How do users interact with each other with no explicit social links?
• Interaction graph: Whisper vs. Facebook and Twitter
– Users are nodes, edges represent user interaction
– 3-month time window for all three graphs
Graph     Interact. Event  Nodes   Edges    Avg. Degree  Clustering Coefficient  Avg. Path Length  Assort.
Whisper   Replies          690K    6,531K   9.47         0.033                   4.28              -0.011
Facebook  Wall Posts       707K    1,260K   1.78         0.059                   10.13             0.116
Twitter   Retweets         4,317K  16,972K  3.93         0.048                   5.52              -0.025
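The interaction graph described above can be sketched as follows. This is a minimal illustration, assuming reply events are available as (replier, original poster) pairs and that edges are directed; the function names and the toy data are hypothetical, not the authors' code. Average degree is computed as edges per node, which is consistent with the table (6,531K / 690K ≈ 9.47).

```python
from collections import defaultdict


def build_interaction_graph(replies):
    """Build a directed interaction graph from (replier, author) pairs.

    Returns a dict mapping each directed edge to its interaction count.
    Assumption: self-replies are ignored.
    """
    edges = defaultdict(int)
    for replier, author in replies:
        if replier != author:
            edges[(replier, author)] += 1
    return dict(edges)


def average_degree(edges):
    """Edges per node, counting every user that appears on any edge."""
    nodes = {user for edge in edges for user in edge}
    return len(edges) / len(nodes) if nodes else 0.0


# Toy data (hypothetical user IDs).
replies = [("a", "b"), ("a", "b"), ("b", "c"), ("c", "a"), ("d", "a")]
graph = build_interaction_graph(replies)
print(average_degree(graph))
```

Keeping the per-edge interaction count makes it possible to tell repeated interactions apart from one-off replies later on.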
Whisper graph has high dispersion
Whisper: interact with a wide range of strangers
vs.
Existing social networks: interact with a fixed set of friends
Persistent Friendship
• Persistent user pairs (strong ties) are extremely rare
– Only 7.7% of user pairs interacted multiple times (out of all edges)
– The majority are weak ties: talked once, never again
– Lower bound of user interactions (no data on private messages)
[Figure: Number of Total Interactions vs. Time Between First and Last Interaction (Days)]
Majority of user-pairs have weak relationships: short-lived, with few interactions
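The two quantities plotted for each user pair (total interactions, and days between first and last interaction) can be computed with a short sketch like the one below. It assumes interactions are given as (user, user, day) tuples and that a pair is unordered; both the input layout and the function names are assumptions for illustration.

```python
from collections import defaultdict


def pair_stats(interactions):
    """Map each unordered user pair to (interaction count, days between first and last)."""
    days = defaultdict(list)
    for u, v, day in interactions:
        days[tuple(sorted((u, v)))].append(day)
    return {pair: (len(d), max(d) - min(d)) for pair, d in days.items()}


def persistent_fraction(stats):
    """Fraction of pairs that interacted more than once."""
    multi = sum(1 for count, _ in stats.values() if count > 1)
    return multi / len(stats) if stats else 0.0


# Toy data (hypothetical): only the pair (a, b) interacts twice.
interactions = [("a", "b", 1), ("b", "a", 9), ("a", "c", 2), ("c", "d", 3), ("d", "e", 5)]
stats = pair_stats(interactions)
print(stats[("a", "b")], persistent_fraction(stats))
```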
Do Communities Exist?
• Community detection on Whisper interaction graph
– Modularity-based approaches: Louvain and Wakita
– Resulting modularity: Louvain (0.492), Wakita (0.409)
• Modularity > 0.3 indicates community structure
– Facebook (0.63), Youtube (0.66), Orkut (0.67) [IMC’09]
– Whisper has weak community structures
Even though users don’t have persistent friends, they still form communities
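For reference, the modularity score of a given partition can be computed as below. This sketch only evaluates the standard Newman modularity Q for an undirected, unweighted graph and a partition supplied by the caller; it does not implement the Louvain or Wakita detection algorithms, and the toy graph is hypothetical.

```python
def modularity(edges, community_of):
    """Newman modularity of a partition of an undirected, unweighted graph.

    Q = sum over communities c of  L_c / m - (d_c / (2m))^2,
    where m is the number of edges, L_c the number of edges inside c,
    and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    internal = {}
    degree_sum = {}
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())


# Toy graph: two triangles joined by a single edge.
edges = [(1, 2), (2, 3), (1, 3), (4, 5), (5, 6), (4, 6), (3, 4)]
community_of = {1: "A", 2: "A", 3: "A", 4: "B", 5: "B", 6: "B"}
q = modularity(edges, community_of)
print(round(q, 3))
```

On this toy graph the score is 5/14, which sits just above the 0.3 threshold quoted on the slide.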
Why Do Users Form Communities?
• Intuition: users interact with nearby users (via nearby list)
• Validation: whether community membership correlates with geographic location
– Example community of 28,342 users, its top 4 regions are
o California (62%), Texas (1.5%), England (1.2%), Arizona (0.9%)
• Users within a community likely from the same region
Users form communities based on geolocation
Percentile of Communities  1st Region  2nd Region  3rd Region  4th Region
50-percentile              52%         3.9%        1.5%        1.4%
70-percentile              45%         1.4%        1.3%        1.3%
90-percentile              32%         0.9%        0.9%        0.8%
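The per-community region shares in the table can be derived with a sketch like this one, assuming each community is represented as a list with one region label per member. The function name and the toy labels are hypothetical.

```python
from collections import Counter


def region_shares(member_regions, top_n=4):
    """Return the top_n regions of a community with their share of members."""
    counts = Counter(member_regions)
    total = sum(counts.values())
    return [(region, count / total) for region, count in counts.most_common(top_n)]


# Toy community (hypothetical): 10 members across 4 regions.
community = ["CA"] * 6 + ["TX"] * 2 + ["AZ"] + ["NY"]
shares = region_shares(community)
print(shares[0], shares[1])
```

Sorting communities by the share of their first region then gives the percentile rows.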
Outline
• Motivation
• Dataset and Whisper Network
• User Engagement and Stickiness
– User Engagement Over Time
– Predicting Future Engagement
• Anonymity and Privacy in Whisper
• Conclusion
From Network Ties to User Engagement
• Background: social ties impact network “stickiness”
– Strong ties: close friends, weak ties: strangers
– Strong ties help keep existing users from leaving, giving a more “sticky” network
• Our question: with a network of strangers, how well can Whisper maintain user engagement over time?
• Evaluate per-user engagement over time
– How long do users stay active?
– Do users turn dormant quickly?
How Long Do Users Stay Active?
[Figure: histogram of % of Users vs. User’s Active Period (Normalized); annotations: “Users who were only active for the first 1-2 days (~35%)” and “Users who stayed active”]
• User’s active period (normalized)
– “Active” means users still generate new content
– User’s active period / our monitoring period of that user
• Significant portion of users quickly turn dormant
• Bimodal distribution: can we predict whether users stay or not?
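The normalized active period defined above can be sketched as follows. The slide gives only the ratio, so the specifics here are assumptions: the active period is taken as last post day minus first post day, the monitoring period as end of data collection minus first post day, and the histogram uses ten equal-width bins.

```python
def normalized_active_period(post_days, collection_end):
    """Active period divided by the monitoring period for one user.

    Assumptions: post_days are day indices of the user's posts;
    monitoring starts at the user's first post and ends at collection_end.
    """
    first, last = min(post_days), max(post_days)
    monitored = collection_end - first
    if monitored <= 0:
        return 0.0  # user first seen on the last day; no period to normalize
    return (last - first) / monitored


def histogram_bin(value, bins=10):
    """Index of the equal-width bin in [0, 1] that value falls into."""
    return min(int(value * bins), bins - 1)


# Toy users (hypothetical): one leaves after a day, one stays until the end.
short_lived = normalized_active_period([10, 11], collection_end=90)
long_lived = normalized_active_period([10, 50, 88], collection_end=90)
print(histogram_bin(short_lived), histogram_bin(long_lived))
```

Users of the first kind pile up in the lowest bin and users of the second kind in the highest, which is the bimodal shape the slide points out.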
Predicting User Engagement
• Binary prediction: whether users disengage quickly or not
– Input: user’s data during initial X days
– ML classifiers: Random Forest, SVMs, Bayes, Decision Tree
• Features (20)
– Content posting volume, frequency (7)
– Social interactions (8)
– Temporal features (2)
– Activity trend (3)
An extensive list of features; can be further trimmed
Prediction Result (Random Forest)
• 10-fold cross validation on ground-truth dataset
– Classify users using their first X days of data
[Figure: Accuracy (%) vs. Data From Users’ First X Days (1, 3, 7, 14, 30 days); series: All Features, Top 4 Features]
1-day data already yields 75% accuracy
94% accuracy when predicting engagement
Top 4 Features produce accurate results
• # of days with > 1 whisper
• # of days with > 1 reply
• Is posting volume decreasing?
• # of total posts
• Whisper can identify users likely to leave
• Increase user engagement using other tools
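The four features listed above could be extracted from a user's first X days roughly as follows. This is a sketch under stated assumptions: posts are (day index, kind) pairs with kind "whisper" or "reply", and "posting volume decreasing" is approximated by comparing the second half of the window to the first half, since the slides do not define it. The resulting dictionary would then be fed to a classifier such as a Random Forest.

```python
from collections import Counter


def top4_features(posts, window_days):
    """Compute the four top features from a user's first window_days of posts.

    posts: list of (day_index, kind), kind in {"whisper", "reply"},
    with day_index counted from the user's first day (hypothetical layout).
    """
    in_window = [(d, k) for d, k in posts if 0 <= d < window_days]
    whispers = Counter(d for d, k in in_window if k == "whisper")
    replies = Counter(d for d, k in in_window if k == "reply")
    half = window_days / 2
    first_half = sum(1 for d, _ in in_window if d < half)
    second_half = len(in_window) - first_half
    return {
        "days_with_gt1_whisper": sum(1 for n in whispers.values() if n > 1),
        "days_with_gt1_reply": sum(1 for n in replies.values() if n > 1),
        # Assumed definition: fewer posts in the second half than in the first.
        "volume_decreasing": second_half < first_half,
        "total_posts": len(in_window),
    }


# Toy user (hypothetical): busy on days 0-1, one whisper on day 5.
posts = [(0, "whisper"), (0, "whisper"), (0, "reply"),
         (1, "reply"), (1, "reply"), (5, "whisper")]
features = top4_features(posts, window_days=7)
print(features)
```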
Outline
• Motivation
• Dataset and Whisper Network
• User Engagement and Stickiness
• Anonymity and Privacy in Whisper
• Conclusion
Privacy and Anonymity in Whisper
• Existing mechanisms to prevent PII leakage
– No personal information is collected (no real name, phone# or email address)
– Server only stores public whispers, private chats stay on the phone
– Noise is added to user GPS before sending to Whisper’s servers
• Worst case: attacker compromises servers and obtains data
– Much more external data needed to de-anonymize users
Location Tracking Attack
• Tracking whisper users’ locations
– Pinpoint current location: error < 0.2 miles
– Allow attackers to follow (stalk) users
• How to attack
– “Nearby list” shows whispers by distance
– Triangulate user location using distance measurements
– Reverse-engineer Whisper’s noise function
• Key problem: lack of GPS authentication
– Unlimited # of queries from any location (fake GPS input)
– Use statistics to overcome noise
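Why unlimited queries defeat added noise can be shown with a purely synthetic simulation. The sketch below assumes a flat plane, a zero-mean additive noise model, and a stand-in distance function; none of this reflects Whisper's actual noise function or API, which the slides do not describe. Averaging many noisy readings from each of three vantage points recovers the distances, and standard trilateration then recovers the position.

```python
import math
import random


def averaged_distance(query, vantage, samples=2000):
    """Average repeated noisy distance readings taken from one vantage point."""
    return sum(query(vantage) for _ in range(samples)) / samples


def trilaterate(p1, r1, p2, r2, p3, r3):
    """Solve for (x, y) from three points and distances on a flat plane."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    # Subtracting the first circle equation from the other two gives a linear system.
    a, b = 2 * (x2 - x1), 2 * (y2 - y1)
    c = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    d, e = 2 * (x3 - x1), 2 * (y3 - y1)
    f = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a * e - b * d
    if det == 0:
        raise ValueError("vantage points must not be collinear")
    return ((c * e - b * f) / det, (a * f - c * d) / det)


# Synthetic setup (all values hypothetical).
rng = random.Random(0)
target = (3.0, 4.0)


def noisy_query(vantage):
    """Stand-in for a distance reading with zero-mean uniform noise."""
    return math.dist(vantage, target) + rng.uniform(-0.5, 0.5)


vantages = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
radii = [averaged_distance(noisy_query, v) for v in vantages]
estimate = trilaterate(vantages[0], radii[0], vantages[1], radii[1],
                       vantages[2], radii[2])
print(estimate)
```

The simulation also suggests the defenses named on the slide: limiting the number of queries or authenticating the reported GPS position removes the attacker's ability to average the noise away.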
An Example Attack
[Figure: map with Victim and Attacker markers and example whispers “BZ is away in Dublin, party in the lab!” and “Get more beer!”; animation steps: Distance Query, More Details, Location converged!]
Attack fully automated with forged GPS
• Query “distance” to the victim
• Navigate to victim step by step until convergence
Triangulate target location!
Fixed by Whisper
Summary
• The first large-scale measurements on Whisper
• User interaction has high dispersion, difficult to build persistent friendship
• User engagement shows bimodal distribution, future engagement can be predicted by early-day data
• Anonymous apps can still leak personal information
– Location: once shared with the app, has the risk of leaking
– No reliable GPS authentication, attacker can query any locations
References
• [COSN’13] GARCIA, D., MAVRODIEV, P., AND SCHWEITZER, F. Social resilience in online communities: The autopsy of friendster. In Proc. of COSN (2013).
• [IMC’09] KWAK, H., CHOI, Y., EOM, Y.-H., JEONG, H., AND MOON, S. Mining communities in networks: a solution for consistency and its evaluation. In Proc. of IMC (2009).