Lecture 20: Privacy in Online Social Networks Xiaowei Yang

Post on 19-Dec-2015

Lecture 20: Privacy in Online Social Networks

Xiaowei Yang

• References:
  – On the Leakage of Personally Identifiable Information Via Online Social Networks, by Balachander Krishnamurthy and Craig E. Wills
  – Characterizing Privacy in Online Social Networks, by Balachander Krishnamurthy and Craig E. Wills

Problem

• Online social networks are places where users share private information
  – Personally identifiable information (PII)
    • Information that can be used to distinguish or trace an individual’s identity, either alone or when combined with other information linkable to the individual
    • Examples of PII
      – Photos
      – Status updates
• However, this information can be leaked to unintended parties

Today

• Measurement studies of the importance of the problem

• PII can be leaked to third-party websites, which makes users’ browsing history linkable

• OSN default privacy settings leak PII

Types of private bits in OSNs

Who can see your private bits

USER PRIVACY CONTROLS

• Defaults are dangerous
  – By default, the information in a user’s Facebook profile, content, and comments (as on a user’s “Wall”) is viewable by any other user in the user’s networks
    • Has it changed?
  – MySpace uses similarly permissive defaults for access to a user’s information: all users have access to all other users’ information.

Do users change their defaults?

• A 2005 study found that
  – only 1.2% of college Facebook users at CMU changed the searchability of their thumbnail profile
  – 0.06% changed their profile visibility (second row)
• 75% of 200 users in the Facebook London regional network have their full profile viewable by other users in the network

Measurement Methodology

• MySpace
  – Generated 5000 random numeric userids in an observed range of valid userids
  – Retrieved their corresponding user profiles
• Bebo
  – Examined the profiles of users who were members of interest groups within Bebo
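
The MySpace sampling step can be sketched as follows. This is only an illustration: the userid bounds and the seed are hypothetical placeholders, not values from the study, and no profiles are fetched.

```python
import random

def sample_userids(low, high, count, seed=None):
    """Draw `count` distinct numeric userids uniformly from [low, high]."""
    rng = random.Random(seed)
    return rng.sample(range(low, high + 1), count)

# Hypothetical bounds standing in for the observed range of valid userids.
candidate_ids = sample_userids(low=1, high=400_000_000, count=5000, seed=20)
print(len(candidate_ids), len(set(candidate_ids)))
```

Not every sampled id has to correspond to an existing account, so the number of profiles actually retrieved can be smaller than the number of ids generated.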

• Facebook
  – Joined regional networks
    • Large and small
    • Geographic diversity
    • Linguistic/cultural diversity
  – Used the random network browsing feature of Facebook to crawl users’ profiles
    • 10 users are displayed
  – 200 retrievals for each regional network
    • 1600-1700 users

Results

• MySpace
  – Obtained profile information for 3851 valid userids
  – 79% (3046) of users retained their default settings
  – Profile, friends, comments and user content are world-viewable.
• Bebo
  – 80% of the Bebo users allowed their profile, friends, comments and user content to be viewable.

Facebook

Observations

• Users in smaller networks are less concerned about making private information available
• Higher privacy value in profile information than in the list of friends
• Wall is the most valuable
  – 79% of those with a viewable profile allowed their Wall to be viewable to anyone in the network for NY
  – 83% for Seattle
  – 95% for the Worcester region.

USE OF THIRD-PARTY DOMAINS

Information leakage to 3rd party domains

• PII is sent to 3rd party domains via HTTP requests

• The same PII may be sent to the same 3rd party domains when users browse other websites
  – Online history becomes traceable

HTTP Background

• A cookie is a piece of text stored by a web browser
• A cookie is sent as an HTTP header by a web server to a web browser
• The web browser sends it back unchanged to the server each time it accesses the server
• A cookie makes web browsing stateful
  – HTTP is a stateless request/response protocol
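
The cookie round trip described above can be sketched with Python's standard `http.cookies` module. The cookie name and value are made up for illustration; a real browser also tracks domain, path, and expiry when deciding which cookies to send.

```python
from http.cookies import SimpleCookie

# Server side: build the Set-Cookie header for a response.
server_cookie = SimpleCookie()
server_cookie["session"] = "abc123"      # hypothetical name and value
server_cookie["session"]["path"] = "/"
set_cookie_header = server_cookie.output(header="Set-Cookie:")
print(set_cookie_header)

# Browser side: store the cookie, then send it back unchanged
# in the Cookie header of the next request to the same server.
jar = SimpleCookie()
jar.load("session=abc123")
cookie_header = "Cookie: " + "; ".join(
    f"{name}={morsel.value}" for name, morsel in jar.items()
)
print(cookie_header)
```

Because the same value comes back on every request, the server can tie otherwise independent requests to one browser.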

HTTP background (cont.)

• An HTTP request contains
  – The method to be applied to the resource
  – Request-URI (the uniform resource identifier of the resource)
  – The protocol version in use
• Example of a request (the Request-URI is /pub/WWW/TheProject.html)
  GET /pub/WWW/TheProject.html HTTP/1.1
  Host: www.w3.org
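
To make the three parts of the request line concrete, here is a small sketch that splits the example request into method, Request-URI, and protocol version. The helper name `parse_request_line` is illustrative, and the parsing is deliberately minimal (no validation).

```python
def parse_request_line(raw_request):
    """Split the first line of an HTTP request into its three parts."""
    request_line = raw_request.split("\r\n", 1)[0]
    method, request_uri, version = request_line.split(" ")
    return method, request_uri, version

raw = "GET /pub/WWW/TheProject.html HTTP/1.1\r\nHost: www.w3.org\r\n\r\n"
print(parse_request_line(raw))
```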

HTTP background (cont.)

• Referer is a request header field

• Specifies to the server the address (URI) of the resource from which the Request-URI was obtained
  – I.e., the page that led the browser to request the URI

• Referer allows a server to generate customized content
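
The Referer header is also how identifiers embedded in a page URL reach other servers. The sketch below shows what a third-party server could extract from a Referer value; the host names, the URL layout, and the parameter names `id` and `friendid` are assumptions for illustration.

```python
from urllib.parse import urlsplit, parse_qs

def ids_in_referer(referer, id_params=("id", "friendid")):
    """Return query parameters in a Referer URL that look like user IDs."""
    query = parse_qs(urlsplit(referer).query)
    return {name: values[0] for name, values in query.items() if name in id_params}

# Hypothetical headers of a request for an ad hosted on a third-party server.
headers = {
    "Host": "ads.example.net",
    "Referer": "http://osn.example.com/profile.php?id=12345&ref=home",
}
leaked = ids_in_referer(headers["Referer"])
print(leaked)
```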

PII in OSNs

Sample of Leakage

• Friendid is associated with the doubleclick cookie

• Other sites the user browses can be linked to the friendid
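
The linking step can be illustrated with a toy model of a third-party server. The class, cookie value, and URLs are hypothetical; the point is only that one tracking cookie seen together with a friendid is enough to attach that id to every other page fetched under the same cookie.

```python
from collections import defaultdict
from urllib.parse import urlsplit, parse_qs

class Aggregator:
    """Toy third-party server that logs requests keyed by its own cookie."""

    def __init__(self):
        self.pages_by_cookie = defaultdict(list)
        self.osn_id_by_cookie = {}

    def handle_request(self, cookie, referer):
        parts = urlsplit(referer)
        self.pages_by_cookie[cookie].append(parts.netloc)
        friendid = parse_qs(parts.query).get("friendid")
        if friendid:
            # The OSN id is now tied to this tracking cookie.
            self.osn_id_by_cookie[cookie] = friendid[0]

    def history_for_osn_id(self, osn_id):
        """Return every site visited under cookies tied to this OSN id."""
        sites = []
        for cookie, known_id in self.osn_id_by_cookie.items():
            if known_id == osn_id:
                sites.extend(self.pages_by_cookie[cookie])
        return sites

agg = Aggregator()
agg.handle_request("ck-1", "http://osn.example.com/profile?friendid=777")
agg.handle_request("ck-1", "http://news.example.org/story")
agg.handle_request("ck-1", "http://shop.example.org/cart")
history = agg.history_for_osn_id("777")
print(history)
```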

Leakage of OSN IDs

• z.digg.com is a 3rd party advertisement site

Leakage via External Applications

Leakage of pieces of PII

Protection Against PII Leakage

• User actions
  – Providing no PII to OSNs
  – Filtering HTTP headers
    • Referer, Cookie
  – Disallowing cookies
  – …
• Aggregators
  – Filtering PII
  – Are they going to do it?
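
A user-side header filter could look roughly like the sketch below. It treats any request whose host differs from the page's host as third-party, which is a simplification (browsers compare registrable domains instead), and the function name and URLs are illustrative.

```python
from urllib.parse import urlsplit

SENSITIVE_HEADERS = {"referer", "cookie"}

def filter_headers(request_url, page_url, headers):
    """Drop Referer and Cookie when the request leaves the page's host."""
    if urlsplit(request_url).hostname == urlsplit(page_url).hostname:
        return dict(headers)  # first-party request: leave headers alone
    return {k: v for k, v in headers.items() if k.lower() not in SENSITIVE_HEADERS}

page = "http://osn.example.com/profile?id=12345"
outgoing = {"Referer": page, "Cookie": "sid=x", "Accept": "*/*"}
filtered = filter_headers("http://ads.example.net/banner.gif", page, outgoing)
print(filtered)
```

Stripping these headers can break sites that rely on them, which is one reason such filtering is usually left to the user to enable.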

• OSNs
  – Strip PII from HTTP requests
  – Use a session-specific value for the UID
• External applications
  – Similarly, strip PII from HTTP requests
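
One way an OSN might produce a session-specific UID is to derive a pseudonym from the real id and the session with a keyed hash. The slides do not say how this would be done; HMAC-SHA256, the 16-character truncation, and all names below are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

def session_uid(server_secret, user_id, session_id):
    """Derive a per-session pseudonym for user_id using HMAC-SHA256."""
    message = f"{user_id}:{session_id}".encode()
    return hmac.new(server_secret, message, hashlib.sha256).hexdigest()[:16]

server_secret = secrets.token_bytes(32)   # known only to the OSN
uid_a = session_uid(server_secret, "12345", "session-a")
uid_b = session_uid(server_secret, "12345", "session-b")
print(uid_a, uid_b)
```

A third party that sees `uid_a` in one session and `uid_b` in another cannot tell they belong to the same account without the server's secret, while the OSN can recompute either value.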

Problem Not Unique to OSNs

• Any site you have an account with can leak PII in the same way
• Examples
  – A news site leaks user email addresses to online aggregators
  – A travel site embeds a user’s first name and default airport in its cookies, and leaks them to any site hiding in its domain

Conclusion

• Eric Schmidt: “If you have something that you don’t want anyone to know, maybe you shouldn’t be doing it in the first place.”

• When you click links and browse online, third parties learn a lot more about you than you might think

Discussion

• What can be done to improve online user privacy?
  – Browser isolation

• Next lecture: privacy-preserving online advertisements

• Law enforcement?

Lecture 21: Privacy and Online Advertising

References

• Challenges in Measuring Online Advertising Systems by Saikat Guha, Bin Cheng, and Paul Francis

• Serving Ads from localhost for Performance, Privacy, and Profit by Saikat Guha, Alexey Reznichenko, Kevin Tang, Hamed Haddadi, and Paul Francis

Problem

• Online advertising funds many web services
  – E.g., all the free stuff we get from Google

• Ad networks gather much user information

• How do they use the user information?

Goals

• Determining how well ad networks target users

Methodology

• Creating two clients representing two different user types

• Measuring the different ads each client sees

Challenges

• How to compare ads

• How to collect a representative snapshot of ads

• Quantifying the differences

• Avoiding measurement artifacts

Comparing Ads is challenging

• Ads don’t have unique IDs
• A & B are semantically the same, but with different text
• A & C are different, but with the same display URLs

How to define when two ads are the same?

• Easy but illegal approach: comparing destination URLs
  – FP: flagged as equal, but not actually equal
  – FN: actually equal, but not flagged
• The display URL has the lowest FNs, so use the display URL to define ad equality

Taking a Snapshot

• More ads can be displayed on any single page
• How to determine all ads that may be fed to a user?
  – Reload the page multiple times
  – But too many reloads may lead to ads churn: old ads expire, new ads show up

Determining the # of reloads

• Reloads every 5 seconds
• Repeated for 200 queries
• Curve becomes linear beyond 10 reloads
  – Ads churn
• Use 10 reloads as the threshold
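
The curve in question plots how many distinct ads have been seen after each reload. A minimal sketch of how it could be computed, assuming each reload is recorded as a list of display URLs (the sample data is made up):

```python
def cumulative_distinct(reloads):
    """Return the number of distinct ads seen after each successive reload.

    reloads: list of lists of display URLs, one inner list per page reload.
    """
    seen = set()
    curve = []
    for ads in reloads:
        seen.update(ads)
        curve.append(len(seen))
    return curve

reloads = [["a.com", "b.com"], ["a.com", "c.com"], ["b.com", "c.com"], ["d.com"]]
curve = cumulative_distinct(reloads)
print(curve)
```

While the available pool of ads is still being exhausted, the curve flattens; once it grows at a steady rate, the new ads are coming from churn rather than from the original pool, which motivates cutting off at a fixed number of reloads.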

Quantifying Change

• Metrics
  – Jaccard index: $J(A, B) = \frac{|A \cap B|}{|A \cup B|}$
  – Extended Jaccard index (cosine similarity)
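
Both metrics can be sketched in a few lines. The Jaccard index compares two sets of ads; for the weighted variant the slide names cosine similarity but the exact formula is not shown, so the standard definition over per-ad weight vectors is assumed here. The example ads and weights are made up.

```python
import math

def jaccard(a, b):
    """|A ∩ B| / |A ∪ B| for two collections of ads (1.0 if both are empty)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def cosine_similarity(wa, wb):
    """Cosine similarity of two dicts mapping ad -> non-negative weight."""
    dot = sum(w * wb.get(ad, 0.0) for ad, w in wa.items())
    norm_a = math.sqrt(sum(w * w for w in wa.values()))
    norm_b = math.sqrt(sum(w * w for w in wb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

print(jaccard({"a.com", "b.com", "c.com"}, {"b.com", "c.com", "d.com"}))
print(cosine_similarity({"a.com": 3.0, "b.com": 1.0}, {"a.com": 3.0, "c.com": 1.0}))
```

The unweighted index treats an ad shown once the same as an ad shown on every reload; the weighted form lets frequently shown ads count for more.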

Comparing Effectiveness

• Views: # of page reloads containing the ad
• Value: # of page reloads scaled by the position of the ad
• Overlap: Jaccard index

Comparing Effectiveness

The winner is

• Weight: log(views) or log(value)
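
The views and value counts, and the log weights built from them, might be computed as in the sketch below. The slides do not give the position scaling, so a 1/position factor is assumed purely for illustration, and each ad is assumed to appear at most once per reload.

```python
import math
from collections import Counter

def views_and_value(reloads):
    """Count views and position-scaled value for each ad.

    reloads: list of lists of ads, each inner list ordered by page position.
    """
    views = Counter()
    value = Counter()
    for ads in reloads:
        for position, ad in enumerate(ads, start=1):
            views[ad] += 1
            value[ad] += 1.0 / position  # assumed scaling: top slot counts most
    return views, value

def log_weights(counts):
    """Weight each ad by the natural log of its count."""
    return {ad: math.log(count) for ad, count in counts.items()}

reloads = [["a.com", "b.com"], ["b.com", "a.com"], ["a.com"]]
views, value = views_and_value(reloads)
print(dict(views), dict(value))
print(log_weights(views))
```

Note that with log weighting an ad seen exactly once gets weight log(1) = 0, so one-off ads do not contribute to the weighted comparison.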

Avoiding artifacts

• Different system parameters may lead to different ad views
  – Browsers use different DNS servers
  – Browsers receive different cookies
  – HTTP proxy

Analysis

• Configure two or more instances to differ by one parameter

• Comparing results for
  – Search ads
  – Website ads
  – Online social network ads

Search Ads

• A, B: control w/o cookies
• C, D: w/ cookies enabled, seeded w/ different personae
• Google 730 random product-related queries for 5 days
• No obvious behavioral targeting in search ads. Why?
  – Keyword-based ads bidding
• Location targeting not studied

Website Ads

• Measure 15 websites that show Google ads
• A, B: control in NY
• C: SF; D: Germany
• Location affects web ads

Website Ads

• A, B: control
• C: browse 3 out of 15 websites
• D and E: browse random websites and Google search random websites
• Google does not use browsing behavior to pick ads

Online social network ads

• Set up three or more Facebook profiles
• A, B: control and identical
• C: differs from A by one profile parameter

Online social network ads

• Use all profile parameters to customize ads
• Age and gender are two primary factors
• Diurnal patterns due to ads churn
  – Should it increase or decrease?
• Education and relationship matter less, except for engaged and non-engaged women

Checking Impact of Sexual Preference

• Six profiles with different sexual preferences
• Two males interested in females (male control)
• Two females interested in males (female control)
• One male interested in males
• One female interested in females

Ads differ by sexual preferences

Other results

• Found neutral ads targeted exclusively to gay men

• Clicking would reveal to the advertiser a user’s sexual preference

• 66 ads shown exclusively to gay men more than 50 times during experiments

Summary

• Search ads are largely keyword-based so far

• Website ads use location but probably not behavior

• Social network ads use all profile attributes to target users

Question: how can we design a privacy-preserving online advertising system?

Goals

• Support online advertising
  – A good revenue source to fund online services

• Preserve user privacy

PrivAd

• Serving ads from a localhost client
• Actors: user, publisher, advertiser, broker, and dealer

How it works

• Advertisers upload ads to broker

• The user client subscribes with the broker to a set of ads according to the user’s profile
  – The message is encrypted with the broker’s public key and contains a symmetric key

• The broker sends filtered ads to the user client
  – Ads are encrypted with the symmetric key

• The dealer anonymizes the client’s messages to the broker

Ad View/Click Reporting

• When a user views or clicks an ad, the user client sends a view/click report containing the ad ID and publisher ID to the broker via the dealer

• The dealer attaches a unique report ID, removes the client’s identity information, and records the mapping from the report ID to the user’s identity information

Click-fraud defense

• The broker provides the dealer with the record IDs if it suspects click fraud

• The dealer finds the user

• The dealer stops relaying ads to user if convinced

• Questions not answered: how the broker detects click fraud, and what the punishment is
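
The dealer's role in reporting and in the click-fraud defense can be sketched as a small relay that keeps the only mapping between report IDs and clients. This is a toy model, not PrivAd's implementation: the class and method names are hypothetical, encryption is omitted (the `payload` stands in for a report only the broker can read), and the dealer here blocks a client as soon as it is notified, skipping the "if convinced" judgment.

```python
import secrets
from dataclasses import dataclass, field

@dataclass
class Dealer:
    report_owner: dict = field(default_factory=dict)  # report ID -> client ID
    blocked: set = field(default_factory=set)

    def relay_report(self, client_id, payload):
        """Strip the client identity and tag the report with a fresh report ID."""
        report_id = secrets.token_hex(8)
        self.report_owner[report_id] = client_id
        # Only the report ID and the opaque payload travel on to the broker.
        return {"report_id": report_id, "payload": payload}

    def handle_fraud_notice(self, report_ids):
        """Find the clients behind suspicious report IDs and stop relaying for them."""
        for report_id in report_ids:
            client_id = self.report_owner.get(report_id)
            if client_id is not None:
                self.blocked.add(client_id)
        return set(self.blocked)

dealer = Dealer()
forwarded = dealer.relay_report("user-A", b"opaque-report-for-broker")
print(sorted(forwarded))
print(dealer.handle_fraud_notice([forwarded["report_id"]]))
```

The broker sees which ad was reported but not who sent it; the dealer sees who sent a report but not what it says.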

Defining User Privacy

• Unlinkability
  – No single player can link the identity of a user with any piece of the user’s profile
  – No single player can link together more than some limited number of pieces of personalization information of a given user
• The dealer learns that User A clicked on some ad
• The broker learns that someone clicked on ad X
• Not robust to dealer/broker collusion

Scaling PrivAd

• Ads churn is significant
• 2GB/month of compressed ad data
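
To put the 2GB/month figure in perspective, it can be converted to an average transfer rate. The slide does not say what the figure is measured per (for example, per client), and the conversion below assumes decimal gigabytes and a 30-day month.

```python
BYTES_PER_GB = 10**9                  # assumption: decimal gigabytes
SECONDS_PER_MONTH = 30 * 24 * 3600    # assumption: 30-day month

monthly_bytes = 2 * BYTES_PER_GB
avg_bytes_per_second = monthly_bytes / SECONDS_PER_MONTH
avg_kbit_per_second = avg_bytes_per_second * 8 / 1000
print(round(avg_bytes_per_second), round(avg_kbit_per_second, 1))
```

Under these assumptions the average works out to roughly 772 bytes per second, or about 6.2 kbit/s.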

Discussion

• What challenges may PrivAd face in a practical deployment?