TRANSCRIPT
PRYING DATA FROM A SOCIAL NETWORK
Joseph Bonneau
Jonathan
George
Computer Laboratory
ASONAM Conference
Athens, Greece
July 20, 2009
Joseph Bonneau (University of Cambridge) Prying Data From a Social Network July 20, 2009 1 / 1
I. Research Question
How can we extract data from a social network on a large scale?
Our Case Study
Why Facebook is interesting:
Size: 225 M users
Complexity
  Third-Party Applications
  Public Listings
  FB Connect
Accurate Profiles
Data of Interest
User Profiles
Social Graph
Traffic Data
Potential Adversaries
Advertisers
Marketers
Data Aggregators
Credit Ratings Agencies
Insurance Companies
Law Enforcement
Intelligence
Employers
Educators
Online Scammers
Research Community
What This Talk is Not
Mechanics of large-scale parallelized web crawling
  Largest academic crawls: ∼ 10 M profiles
  See Wilson et al. User Interactions in Social Networks and their Implications. EuroSys 2009.
II. Data Extraction Techniques
Public Listings
False Profiles
Malicious Applications
Phishing
Facebook API
1.) Public Listings
Not protected from crawling
  Able to extract ∼ 500 k listings per day on a desktop PC
  Extract entire network in ∼ 500 machine-days
Get only 8 links per listing
  Can still extract many useful features (Bonneau et al. 2009)
    High Degree Nodes
    Small Dominating Sets
    Highly Central Nodes
    Communities
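The crawl-time figure above is a simple division. A minimal sketch, taking the 225 M user count and the ∼ 500 k listings per day from the slides as inputs (the function name is illustrative, not from the talk):

```python
import math

def machine_days(total_users: int, listings_per_day: int) -> int:
    """Whole machine-days needed to fetch one public listing per user."""
    return math.ceil(total_users / listings_per_day)

TOTAL_USERS = 225_000_000      # network size quoted in the talk
LISTINGS_PER_DAY = 500_000     # crawl rate quoted for one desktop PC

print(machine_days(TOTAL_USERS, LISTINGS_PER_DAY))  # 450
```

The result, 450, is in line with the rounded ∼ 500 machine-days on the slide.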
2.) False Profiles
80% of users will befriend a frog (Krishnamurthy and Wills, 2008)
Can then crawl profiles with Friend-of-Friend Privacy
70-90% of users viewable within a sub-network
  Regional networks being phased out
3.) Malicious Applications
3.) Top Applications
     Application                   # Users
 1.  How Well Do You Know Me?      28,074,528
 2.  Causes                        25,508,174
 3.  MyCalendar                    18,403,878
 4.  We’re Related                 16,860,948
 5.  LivingSocial                  16,618,043
 6.  Movies                        16,128,539
 7.  RockYou Live                  14,931,229
 8.  Texas HoldEm Poker            14,594,931
 9.  Pet Society                   12,743,918
10.  Mafia Wars                    12,694,729
11.  MindJolt Games                12,346,549
12.  Top Friends                   12,144,263
13.  MyCalendar                    12,128,128
14.  Slide FunSpace                11,088,636
15.  Farm Town                     11,001,529
Source: InsideFacebook.com, 7/7/09
3.) Top Developers
     Developer                     # Users
 1.  Zynga                         54,778,127
 2.  RockYou!                      37,783,778
 3.  Playfish                      33,030,872
 4.  How Well Do You Know Me?      28,074,528
 5.  Slide, Inc.                   27,149,377
 6.  Causes                        25,508,174
 7.  MyCalendar                    18,403,878
 8.  LivingSocial                  17,543,375
 9.  FamilyLink.com                17,299,316
10.  Flixster                      16,128,539
11.  MindJolt                      12,346,549
12.  My Calendar                   12,128,128
13.  Slashkey                      11,001,529
14.  6 waves                       10,809,797
15.  Zwigglers                     10,006,859
Source: InsideFacebook.com, 7/7/09
3.) Weekly Application Churn
     Application                             # Users (weekly change)
 1.  MindJolt Games                          +2,444,470
 2.  We’re Related                           +1,291,531
 3.  Quizzer                                 +959,600
 4.  Farm Town                               +953,428
 5.  Pet Society                             +840,296
 6.  MyCalendar                              +820,085
 7.  What Type Of Girl Are you?              +743,560
 8.  FARKLE                                  +731,537
 9.  Food Fling!                             +713,604
10.  Music                                   +621,588
11.  Barn Buddy                              +600,105
12.  What Era Should You Time Travel To?     +558,301
13.  Texas HoldEm Poker                      +490,325
14.  Cities I’ve Visited                     +488,831
15.  Waka-Waka                               +486,538
Source: InsideFacebook.com, 7/7/09
4.) Profile Compromise & Phishing
Email Phishing
Password Sharing
Facebook Connect
5.) Facebook Query Language
SELECT uid, name, affiliations FROM user
WHERE uid IN (X,Y, ... Z);
Step 1: Fetch Name/UID pairs
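Step 1 can be sketched as splitting a UID list into batches and rendering one query per batch. This is a minimal illustration that only builds query strings; it assumes batches of 1,000 UIDs (the per-query figure quoted later in the talk), and the helper names `chunks` and `user_query` are hypothetical. How the queries are submitted to Facebook is not shown.

```python
from typing import Iterable, Iterator, List

BATCH_SIZE = 1000  # assumption: roughly 1,000 users per query

def chunks(uids: List[int], size: int = BATCH_SIZE) -> Iterator[List[int]]:
    """Yield consecutive slices of at most `size` UIDs."""
    for start in range(0, len(uids), size):
        yield uids[start:start + size]

def user_query(uids: Iterable[int]) -> str:
    """Render the Step 1 query for one batch of UIDs."""
    id_list = ", ".join(str(int(uid)) for uid in uids)
    return f"SELECT uid, name, affiliations FROM user WHERE uid IN ({id_list});"

# Example: 2,500 candidate UIDs become three queries.
queries = [user_query(batch) for batch in chunks(list(range(1, 2501)))]
print(len(queries))  # 3
```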
SELECT uid1, uid2 FROM friend
WHERE uid1 IN (X,Y, ... Z)
AND uid2 IN (U,V, ... W);
Step 2: Fetch Friendships
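Step 2 needs one query per pair of UID batches. A minimal sketch, again only building query strings; the helper names are hypothetical, and pairing each batch with itself (to catch friendships inside a batch) is an assumption of this sketch rather than something stated on the slide.

```python
from itertools import combinations_with_replacement
from typing import Iterator, List, Sequence

def friend_query(batch_a: Sequence[int], batch_b: Sequence[int]) -> str:
    """Render the Step 2 query for one pair of UID batches."""
    a = ", ".join(str(int(uid)) for uid in batch_a)
    b = ", ".join(str(int(uid)) for uid in batch_b)
    return f"SELECT uid1, uid2 FROM friend WHERE uid1 IN ({a}) AND uid2 IN ({b});"

def friend_queries(batches: List[Sequence[int]]) -> Iterator[str]:
    """Yield one query per unordered pair of batches, including (a, a)."""
    for batch_a, batch_b in combinations_with_replacement(batches, 2):
        yield friend_query(batch_a, batch_b)

# Example: three tiny batches give 3 self-pairs + 3 cross-pairs = 6 queries.
example = list(friend_queries([[1, 2], [3, 4], [5]]))
print(len(example))  # 6
```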
Can query sets of ∼ 1,000 users at a time
Can fetch all Name/UID pairs in ∼ 600 machine-days
Quadratic blowup in friendship queries:
\[ \binom{N/1{,}000}{2} \approx \binom{200{,}000}{2} \approx 2 \cdot 10^{10} \]
Still, useful to fill in gaps from other methods
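The pair count in the friendship-query estimate can be checked directly. A small sketch, assuming N ≈ 200 M users (which is what gives the 200,000 batches on the slide) and 1,000 users per batch; the function name is illustrative.

```python
import math

def friendship_query_count(num_users: int, batch_size: int = 1000) -> int:
    """Number of distinct batch pairs, i.e. C(num_users / batch_size, 2)."""
    batches = math.ceil(num_users / batch_size)
    return math.comb(batches, 2)

count = friendship_query_count(200_000_000)
print(count)           # 19999900000
print(f"{count:.1e}")  # 2.0e+10
```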
III.) Simulation
How many nodes must be “compromised” to view a large portion of the network?
Assume all nodes have friends-only or friend-of-friend privacy
Test growth of node coverage and edge coverage
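One way to make the coverage measurement concrete is sketched below. It is not the authors' simulator; it assumes that a compromised account can view every profile within 1 hop (friends-only) or 2 hops (friend-of-friend), and that viewing a profile reveals all of that user's friendship links. All names here are hypothetical.

```python
import random
from typing import Dict, Set, Tuple

Graph = Dict[int, Set[int]]  # undirected graph as an adjacency map

def visible_profiles(graph: Graph, compromised: Set[int], hops: int) -> Set[int]:
    """Nodes within `hops` of a compromised node (1 = friends-only, 2 = friend-of-friend)."""
    seen = set(compromised)
    frontier = set(compromised)
    for _ in range(hops):
        frontier = {nbr for node in frontier for nbr in graph[node]} - seen
        seen |= frontier
    return seen

def coverage(graph: Graph, compromised: Set[int], hops: int) -> Tuple[float, float]:
    """Return (node coverage, edge coverage) as fractions of the whole graph."""
    profiles = visible_profiles(graph, compromised, hops)
    edges = {frozenset((u, v)) for u in graph for v in graph[u]}
    # Assumption: an edge is visible if either endpoint's profile is visible.
    seen_edges = {edge for edge in edges if edge & profiles}
    return len(profiles) / len(graph), len(seen_edges) / len(edges)

def random_compromise(graph: Graph, k: int, seed: int = 0) -> Set[int]:
    """Pick k nodes uniformly at random to compromise."""
    return set(random.Random(seed).sample(sorted(graph), k))

# Toy example: a 5-node path 0-1-2-3-4 with node 0 compromised.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(coverage(path, {0}, hops=1))  # (0.4, 0.5)
print(coverage(path, {0}, hops=2))  # (0.6, 0.75)
```

Sweeping the size of the compromised set and recording both fractions gives growth curves of the kind the simulation measures.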
Data Set
Crawled ∼ 15,000 users from Stanford University
Used FQL method, took < 12 hours.
Experimental Results
[Figure: panels for Nodes and Links under Friends-Only and Friend-of-Friend privacy]
Experimental Results
                                          50% profiles   90% links
Targeted compromise, friend-only          0.16%          0.14%
Random compromise, friend-only            0.71%          0.60%
Friend requests, friend-only              50.0%          19.6%
Targeted compromise, friend-of-friend     0.01%          0.01%
Random compromise, friend-of-friend       0.04%          0.03%
Friend requests, friend-of-friend         0.16%          0.14%
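To compare the two privacy settings, the compromise rows of the table can be divided row by row. A short sketch using only the percentages above (the dictionary layout and function name are illustrative):

```python
# Percent of nodes needed, copied from the table: (50% profiles, 90% links).
FRIEND_ONLY = {"targeted": (0.16, 0.14), "random": (0.71, 0.60)}
FRIEND_OF_FRIEND = {"targeted": (0.01, 0.01), "random": (0.04, 0.03)}

def speedups(slow: dict, fast: dict) -> dict:
    """Ratio of nodes needed under `slow` versus `fast`, per strategy and column."""
    return {
        strategy: tuple(a / b for a, b in zip(slow[strategy], fast[strategy]))
        for strategy in slow
    }

for strategy, (profiles, links) in speedups(FRIEND_ONLY, FRIEND_OF_FRIEND).items():
    print(f"{strategy}: {profiles:.1f}x fewer nodes for profiles, {links:.1f}x for links")
```

For these four cells the ratios fall between roughly 14 and 20.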
Simulation Conclusions
Only need to compromise a small fraction of network
  Initial gains very fast
Friend-of-friend makes discovery 10-20 times faster
Targeted compromise doesn’t help much
Phishing needs to be taken seriously...
General Conclusions
Many ways to get data out of a modern SNS
Most users unaware of these methods
Data collection practical for many motivated parties
Thank You
Questions?