david choffnes, northeastern university jingjing ren, northeastern university ashwin rao, university...

20
David Choffnes , Northeastern University Jingjing Ren , Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology Arnaud Legout, INRIA Sophia-Antipolis ReCon: Revealing and Controling PII Leaks in Mobile Network Systems DTL Workshop, Nov. 201 Sponsored by:

Upload: brenda-black

Post on 18-Jan-2018

216 views

Category:

Documents


0 download

DESCRIPTION

How Frequently Is PII Leaked? 3 Basic tracking is common Significant fraction of very personal information leaked across all platforms PII leakage is pervasive! Fraction of top 100 apps leaking PII (Tested in September, 2015)

TRANSCRIPT

Page 1: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

David Choffnes, Northeastern University Jingjing Ren, Northeastern UniversityAshwin Rao, University of HelsinkiMartina Lindorfer, Vienna Univ. of TechnologyArnaud Legout, INRIA Sophia-Antipolis

ReCon: Revealing and Controling PII Leaks in Mobile Network Systems

DTL Workshop, Nov. 2015

Sponsored by:

Page 2: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

2

Motivation Mobile devices

Rich sensors Ubiquitous connectivity

Key questions What personal information is transmitted? To whom does it go? What can average users do about it?

usernameemail address

home addressgender

real name

phone numberpassword...MAC address

Advertiser ID

Page 3: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

3

How Frequently Is PII Leaked?

Basic tracking is common

Significant fraction of very personal information leaked across all platforms

PII leakage is pervasive!

Frac

tion

of to

p 10

0 ap

ps le

akin

g PI

I

(Tested in September, 2015)

Page 4: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

4

How to Detect PII Leaks in Mobile? At the OS

Information flow analysis (static/dynamic/hybrid) Ok solutions, but not perfect or easily deployable

In the network Independent of OS, app store Easy to detect if you know what PII to search for

What if you don’t know the PII a priori?

Page 5: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

5

ReCon: Automatically Identifying PII Leaks

Hypothesis: PII leaks have distinguishing characteristics Is it just simple key/value pairs (e.g., “user=R3C0N”)?

Nope, this leads to high FP/FN rates Need to learn the structure of PII leaks

Approach: Build ML classifiers to reliably detect leaks Does not require knowing PII in advance Resilient to changes in PII leak formats over time

We built ReCon Machine learning to reveal PII leaks from mobile devices Software middleboxes to intercept and control leaks Works on all major platforms (iOS, Android, Windows Phone)

Page 6: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

6

ReCon: Viewing detected leaks

PII Category Device Identifiers Contact Information User Identifiers Credentials

User Feedback Correct Incorrect Not sure Not about me

Page 7: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

7

Where They Know You’ve Been

Location information is hard todigest using text alone

WTKYB shows just how pervasivelocation tracking is

Creepiness factor to help userscare more about privacy(?)

Page 8: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

8

Mitigating PII Leaks ReCon gives users control over leaks

Example simple strategies Block PII Modify PII Randomize identifiers Coarsen locations

Advanced mitigation (under dev) Mock user profiles Provide k-anonymity

Page 9: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

9

How does ReCon work? Key challenges for ML-based PII detection

Which classifier do we use? C4.5 Decision Tree is best trade-off between speed and

accuracy

How do we train the classifier? Use traces from real users and controlled experiments Break flows into separate words that may indicate a leak Feature selection for scalability

How well are we doing? Controlled experiments In the wild: Only the users themselves know for sure!

Crowdsourced reinforcement

Page 10: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

10

Key Results: ReCon accuracy How accurate is ReCon?

99% overall accuracy from controlled experiments

FPR: 2.2%, FNR: 3.5% Why?

Per-domain classifiers Decision tree captures non-trivial cases

Page 11: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

11

Key Results: ReCon Has Good Coverage

How does it compare to other solutions?

Device Identifier User Identifier Contact Info Location0.00%

10.00%

20.00%

30.00%

40.00%

50.00%

60.00%

70.00%

80.00%

90.00%

100.00%FlowDroid(Static IFA) Andrubis (Dynamic IFA) AppAudit(Hybrid IFA) ReCon

ReCon finds significantly more PII than IFA solutions

ReCon successfully idenifies missing leaks after

retrainingFrac

tion

of to

tal l

eaks

foun

d

Page 12: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

12

Key Results: User study IRB-approved user study

24 iOS, 13 Android devices 20/26 responses: system useful & behavior

change 165 cases of credential leaks, 94 verified Average leaks: iOS > Android Unexpected, suspicious leaks

Recipe/cooking app tracks location Video/Game/News app leaks gender

And more… Check out http://recon.meddle.mobi

Page 13: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

13

Summary ReCon: Provides transparency/control over

PII leaks Relies only on access to network traffic (OS

independent) Machine learning to automatically identify PII

leaks Crowdsourced reinforcement with user

feedback Works today! Check out

http://recon.meddle.mobiSponsor:Question

s?David Choffnes

[email protected]

Page 14: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

14

Backups

Page 15: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

15

Encryption and ReCon What is your answer for increasing use of

encryption? Recon needs access only to plaintext flows mcTLS, BlindBox Route to trusted middlebox that can do MITM

Works for most apps, but usually not logins Haystack (on Android device)

Page 16: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

16

Encryption: What is leaked? Leaks over SSL (not much)

Send PII to trackers over SSL (100 apps/device) 6 iOS 2 Android 1 Windows

Problem with SSL Certification pining Not working with VPN enabled

Obfuscation Little evidence in controlled experiment using

IFA

Page 17: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

17

Other applications of ReCon K-anonymity Explicit sharing

Allow users to control how much shared to third-parties

Obfuscation Retrain classifiers to identify obfuscated leaks Use static/dynamic to analysis tools that are

resilient to evasion techniques

Page 18: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

18

Deployment models ReCon only needs access to network flows

VPN proxy (current deployment): tunnel to proxy server Currently supported by all mobile OSes Can run VMs anywhere in the world

Raspberry Pi In home network Enables HTTPS decryption with minimal additional

risks On device

Haystack on Android In network

Awazza and other APN/middlebox deployment models

Page 19: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

19

Methodology Details Controlled experiments as ground truth Text classification approaches

Problem: Given a network flow, whether it contains PII information or not?

Feature Extraction: Bag-of-word model Example.com /someevent?x=1&y=2 {“z”:”xx@y”} Words: someevent, x, 1, y, 2, z, xx@y,

Per-Domain classifiers (e.g. Google-Analytics) Faster (compared to one-for-all) More accurate

Library: weka

Page 20: David Choffnes, Northeastern University Jingjing Ren, Northeastern University Ashwin Rao, University of Helsinki Martina Lindorfer, Vienna Univ. of Technology

20

Why Run ReCon? User incentives

Control over data leaks! Blocking unwanted content k-anonymity for increased privacy