PRiSM Lab. - UMR 8144
Personal Data Management with Secure Hardware
The Advantage of Keeping your Data at Hand
Nicolas Anciaux, Benjamin Nguyen & Iulian Sandu Popa
INRIA Paris-Rocquencourt & University of Versailles St-Quentin
IEEE MDM’13 Advanced Seminar, 4th June 2013


Mass-generation of personal data

Data sources have mostly turned digital
  Paper-based interactions, e.g., banking, e-administration
  Analog processes, e.g., photography
  Mechanical interactions, e.g., opening a door
[Figure captions: "People recording", "People listening", St Peter's Square, Rome]

Where is your personal data? … In data centers
  112 new emails per day → Mail servers
  800 pages of social data → Social networks
  Daily interaction data → Search engines, telcos, transport, etc.
  List of purchases → Central purchasing organizations

All this opens the way to exhilarating economic perspectives…

“Personal data is the new oil” (quoting WEF)

Good news (for the economy)
  $2 billion a year spent by US companies on third-party data about individuals (Forrester Report)
    (oil exploration and production is $400 billion)
  $44.25 is the estimated return on $1 invested in email marketing (oil is up to $0.5/yr)
  States (e.g., France in 2013) are investigating the idea of taxing personal data (personal data are resources collected for free, escaping value-added tax)
  Facebook: value / #accounts ≈ $50
  Google: $38 billion business selling ads based on how people search the Web
  Not only for Google & Facebook but also: e.g., Amazon (knows purchase intent), mail order companies, loyalty program sellers, banks & insurance, employment market (LinkedIn, Viadeo), travel & transportation (voyages-sncf), the « love » market (Meetic), etc.

“Personal data is the new oil” (quoting WEF)

… or bad news? (for the fields)

How would oil producers behave?
  They would offer to exploit your oil field for free
  They would offer free services to you
    … which would cost them only a few cents (e.g., HW/SW to manage emails)
  and would provide services which may not be for you (and not advertised)
    … which would yield healthy returns (e.g., advertisement and profiling, location tracking and spying, …)

In other words: your personal data would be processed by sophisticated data refineries …

Many reasons to get anxious…

Even the most defended servers are successfully attacked
  Including those of the Pentagon, FBI and NASA
  E.g., Feb. 2013: 1 TB hacked daily – victims include US military facilities
Personal data cannot avoid being subject to negligence
  +1000 data leak incidents, +100 million records affected per year
  (reported by Open Security Foundation, Privacy Rights Clearinghouse & Ponemon Institute)
Personal data is often regulated by loose privacy policies
  Obscure policies, assumed accepted when using the service
Ill-intentioned scrutinization flourishes
  Justified by business interests, governmental pressures and inquisitiveness among people
  Intelius.com, which makes scrutinization its business: “Live in the know”
  … is a recipient of the 2011 “5000-Fastest Growing Private Companies” Award.
Only a few actors hold most of the data…
  E.g., Google: “We have YouTube, Gmail, Google Docs, Google Calendar, Google+, Google Wallet, Chrome browser, Android mobile platform, etc.”
… and users cannot escape this without being excluded from necessary services…

=> The risk of a backlash is growing

Yet data centers openly assume offenses against privacy laws

Privacy policies of dominant actors are invalid vs. EU & US standards
  Too vague about purposes for which personal data is collected
Security principle is violated
  E.g., Facebook does not guarantee any level of data security (2013):
  “We do our best to keep Facebook safe, but we cannot guarantee it”
Consent principle is not respected
  Personal data is collected, transferred, used without the user’s consent
  (Microsoft, Pandora, Yelp get personal information from Facebook)
  Personal data of non-users are collected (non-user profiles)
  Policies change (frequently) without requesting users’ consent
Openness (view & correct false data) is not provided in practice
  E.g., EU versus Facebook affair: 40000 users still waiting (2013)…
Data retention limits are not applied
  Data retention is far too long wrt collection purposes (e.g., Google)

Question: After all, is privacy really required?

Great untruth #1: Being privacy aware has limited effects for business
  Argument:
    Users do not switch to non-invasive systems (Ixquick 3% market)
  Alternative answer:
    MORE privacy gives the same number of users, but MORE personal data
    (study of +5000 Facebook users)

Great untruth #2: Privacy is an old(-fashioned) concept
  Argument:
    Youth exposes personal life online more easily than adults
    “Privacy is no longer the social norm” (by M. Zuckerberg)
  Alternative answer:
    The household is the adult’s private sphere, but for a teen it is not
    The online sphere is their private sphere, far from parents’ prying eyes

Answer: YES, privacy is really required

Vulnerable citizens remain under threat
  A study conducted on 1000 pupils shows that 72% received unpleasant contact from strangers via online profiles
  The current practice “Accept the policy or quit” is not a good option
Blatant failures of emblematic applications due to privacy concerns
  National EHRs failed in many countries because doctors feel spied on and patients fear being discriminated against – Prejudice is economic & social
A new digital divide: applications wishing to follow UN charters
  Organizations like NGOs, healthcare companies, etc., must build their applications on infrastructures complying with worldwide privacy laws
Citizens & governments are more and more concerned
  More privacy complaints (+30% in France in 2011), more citizens feel that their privacy online is not sufficiently protected (18/24 years become the dominant category with 78%)
  WEF: high risk to lock the value of personal data if privacy is not provided (2012)

Is the current centralised model good wrt privacy protection?

Intrinsic problem #1: personal data is exposed to sophisticated attacks
  Cost of attacks proportional to benefits (high on a centralized system)
  One person's negligence may affect millions
  E.g., hackers who cracked Sony’s PlayStation 3 game system last year got 12 million credit and debit card numbers …
Intrinsic problem #2: personal data is hostage of sudden privacy changes
  Centralised administration of data means delegation of control
  This leads to regular changes, with application (and business) evolution, with mergers and acquisitions, etc.
Increasing security does not solve those intrinsic limitations
  E.g., TrustedDB [VLDB11] proposes tamper-resistant hardware to secure outsourced centralized databases.

Alternative solutions?

Alternative solution: for the W.E.F. it would be
  “a data platform that allows individuals to manage the collection, usage and sharing of data in different contexts and for different types and sensitivities of data”
Alternative privacy-preserving technical solutions are flourishing
  Based on decentralized & user-centric principles
  E.g., FreedomBox, ProjectVRM, personal data servers

Goal of this presentation: catch a glimpse of the Holy Grail

A Personal Data Ecosystem…
… built around user-centricity and trust
  Transparency: what data is captured, how, for what purpose
  Trust: security, integrity, availability, reliability
  Control: over the use and sharing of personal data
  Value: assess the value created by the use of the data

Outline of the seminar

PART I. Decentralized architectures
  Review of privacy-oriented decentralized solutions
    An interesting attempt or a panacea?
  Abstract architecture with secure hardware
    A sea change?
PART II. Resource constrained data management
  Hardware description, constraints and problem statement
  Existing data management techniques for constrained HW
  Representative structures and evaluation strategies
PART III. Global processing
  Review of existing solutions
  Distributed processing on the asymmetric architecture
CONCLUSION. A view of expected instances

PART I
Decentralized Architectures

Decentralized Architectures

Outline of Part I

Review of privacy-preserving decentralized solutions
  Infomediaries
  Vendor Relationship Management
  FreedomBox
  Decentralized Social Networks
Personal Data Server (PDS) architecture
  A trusted, secure and decentralized architecture for personal data management

Infomediaries (since the late 1990s)

Infomediary: trusted third party helping consumers to take control over the personal information used by marketers
  Personal information is the property of individuals, not of the one who gathers it
  Personal data has value → provide users with means to monetize and profit from their information profiles
  Trust: separate the control over personal data from the service provider

AllAdvantage, Bynamite, Mydex, Adnostic, Lumeria, …

Source: www.identitywoman.net/mass-educational-databases-wrong-architecture

Vendor Relationship Management (VRM, projectvrm.org, since 2006)

VRM: software tools for customers to provide them independence from vendors
  VRM is a software implementation of an infomediary
Observations
  No privacy implemented in the Internet, which mainly works as a Master-Slave system
  Customer Relationship Management (CRM), $14 billion market in 2013, but the customers are not involved
  “Big Data is turning into Big Brother” (Washington Post)
(Some of) VRM principles
  Give the customer independence and a way to engage
  Specify your own terms of service
  Be able to gather, examine and control the use of your own data
VRM tools to do all that either on your own or with the help of a “fourth party” (a third party that works for you)
  A dozen open source and commercial development projects in 2012

Privowny (example of VRM software)

Source: http://cyber.law.harvard.edu/interactive/events/2012/06/searls

FreedomBox (freedomboxfoundation.org/, since 2010)

Personal plug servers running open software to regain privacy and control
  Return the Internet to its intended P2P architecture (dehierarchicalization)
  Keep your data in your home
Base hardware requirements
  Cheap (around $30 for a plug server)
  Power consumption < 15 W
  RAM > 256 MB, Flash storage for file system > 512 MB
  Communication interfaces: network, serial, JTAG
  Storage interfaces: SATA, USB, SD
  Noise level < 20 dB

FreedomBox

Software stack covering a wide range of applications:
  Secure and anonymous communications
  Distributed Social Networks
  Personal Cloud
  VRM
Trust: secure and anonymous communications, open software, distribution

Decentralized Social Networks (DSN)

Distributed SN (P2P) or Federated SN (interoperable client-server implementations)
Main challenges of privacy-preserving DSN
  Secure message hosting
  Secure and anonymous message transfer
Message hosting
  Encryption and distributed hash table (Lotusnet, PeerSoN), encryption and trusted contacts (Safebook)
  Attribute-based encryption for fine-grained access control (Persona)
  Self-hosting (FreedomBox)

Message transfer in DSNs

Message transfer: communication privacy optimized on the social graph and physical network topology
  Hop-by-hop encryption among trusted users (Freenet)
  Anonymous routing (Safebook, FreedomBox)

[Figure: anonymous routing in Safebook (Matryoshka)]
Source: Safebook: A Privacy-Preserving Online Social Network Leveraging on Real-Life Trust

Diaspora* DSN

Diaspora* (https://joindiaspora.com/, since 2010, more than 400 thousand users in 2013, cf. Wikipedia): appeared as a response to the many privacy issues engendered by Facebook/Google
  “...our distributed design means no big corporation will ever control Diaspora. Diaspora* will never sell your social life to advertisers, and you won’t have to conform to someone’s arbitrary rules or look over your shoulder before you speak.”
Trust: distribution, open software, users own their data

Summary of Distributed Solutions

Common main objective: privacy-preserving services

Different types of decentralized architectures
  Three-tier architecture (Infomediary)
  Two-tier architecture (VRM)
  P2P (FreedomBox, Decentralized Social Networks)
  Hybrid architecture (Decentralized Social Networks, Personal Cloud-FreedomBox, Personal Data Store)
Built on common principles
  User-centricity and trust (transparency, security, control)

Critique of Decentralized Approaches

The Good: do not exhibit the intrinsic limitations of centralized solutions (privacy, security, etc.)
The Bad: yet, they’ve generally known little success (the privacy paradox)
… and the Challenging: raise important, but interesting challenges
  Economic: viable business models compatible with privacy
  Technical: design a secure Personal Data Server
    1 - Secure storage of personal data (i.e., local requirements)
    2 - Provide the same level of functionality, responsiveness and availability as a centralized solution (i.e., global requirements)

1. Secure storage with a Personal Data Server

Secure storage under user’s control
  Data must be made highly available, resilient to failure and protected against confidentiality and integrity attacks
  Cryptographic keys must be secured and only accessible by the user
  Accessing data from anywhere without privacy breaches
Data integration/aggregation
  Aggregate user’s data in a single location: better usage, privacy, value
  Personal data is heterogeneous
    Structured/unstructured data, text, images, sound, video …
    Records of transactions, clickstream data, bookmarks, bills, profiles, projects, preferences …
  Data modeling, data integration, querying
Privacy policy definition
  Intuitive, simple ways for users to define access control rules

Existing attempts at a Personal Data Server

Many recent initiatives (Mydex, the Locker Project, Personal.com, data.fm, Qiy Foundation, …)
  Personal data stores, personal data lockers/vaults, personal cloud
Focus on secure storage and data aggregation
  Managed locally by the user (The Locker Project) or outsourced to a trusted third party (Mydex, Personal.com)
  Federate data from different sources (The Locker Project)

Weaknesses of existing solutions

Important security breaches related to the data storage
  Data is stored encrypted in the Cloud (Mydex, Personal.com)
    But the cryptographic keys are under the control of the service provider
  Data is stored locally by the users on their personal computers (The Locker Project)
    Raises several problems related to security, durability and availability
Many functionalities required to obtain a complete Personal Data Ecosystem are not provided
  E.g., global querying, anonymous data publishing, secure sharing, secure usage and accountability

Personal Data Server: functional architecture

Database engine to securely manage personal data
  Facilitate the development of applications: data structuring, integrity, queries, transactions
  Facilitate the definition/enforcement of access and usage control rules

[Figure: PDS functional architecture. Labels: HMI / Applications, Sensors; Administration, Data Sharing Manager, Query Manager, Recovery, Anonymizer; CONTROL: Context Manager, Access & Usage Control; DATA MODEL: Key Value Store, Relational DBMS, Files, Spatio-temporal; RAW ACCESS: Log Containers, File System, Remote Files; the cloud]

2. Required global functionalities of a Personal Data Server

Global querying
  Personal data is essential to the development of society-related applications (smart cities, transport, energy, healthcare …)
  Transparently query many PDSs as with a centralized database
Anonymous data publishing
  PDS must allow users to anonymously participate in global treatments
Distributed secure sharing
  Users must get a proof of legitimacy for the credentials exposed by the participants of a data exchange
Secure usage and accountability
  Users must not lose control over their data through data sharing
    KuppingerCole, a security analyst company, promotes Life Management Platforms: “a new approach for privacy-aware sharing of sensitive information, without the risk of losing control of that information”
  Privacy principles must be enforced for the externalized data

Personal Data Server: complete functional architecture

Provide global processing facilities (e.g., global queries, production of anonymized releases, data sharing) similar to those of a centralized database

[Figure: PDS complete functional architecture. Labels: HMI / Applications, Sensors, Data Sharing; Administration, Data Sharing Manager, Query Manager, Recovery, Anonymizer; CONTROL: Context Manager, Access & Usage Control; DATA MODEL: Key Value Store, Relational DBMS, Files, Spatio-temporal; RAW ACCESS: Log Containers, File System, Remote Files; the cloud]

How to enforce the security of the PDS architecture

Advent of secure hardware at the edges of the Internet
  Secure portable tokens: Secure MCU + Flash storage
A sea change for personal data services
  Offer privacy guarantees ( >> Trust )

[Figure: Secure Portable Token = Secure MCU + FLASH (GB size). Form factors shown: SIM card (two chips superposed); USB form factor (MicroSD Flash); Contactless + USB, 8 GB Flash; Secure MicroSD, 4 GB Flash; USB form factor (with SIM card)]

Why trust personal secure HW solutions?

Users store their own data → minimize abusive usage
Self (user) managed platform → no DBA attack
Tamper-resistance + certified code/secure execution + single user → ratio cost/benefit of an attack is very high
Enforce privacy principles for externalized (shared) data provided the recipient of the data is another PDS
  Observation: a user does not have all the privileges over the data in her PDS

From a local functional architecture to a global distributed architecture

[Figure: PDS functional architecture with two annotations: "Implementation depending on the distributed architecture model" and "Device dependent implementation". Labels: HMI / Applications, Sensors; Administration, External Data Manager, Query Manager, Recovery, Anonymizer; CONTROL: Context Manager, Access & Usage Control; DATA MODEL: Key Value Store, Relational DBMS, Files, Spatio-temporal; RAW ACCESS: Log Containers, File System, Remote Files; the cloud]

Global PDS Architectures: a spectrum of solutions

Durability, secure sharing, global querying

PDS asymmetric architecture
  Built on Secure Portable Tokens
  Challenges
    Embedded data management (Part II of the seminar)
    Global querying (Part III of the seminar)

Present other configurations of global architectures in the Conclusion and Perspectives

PART II
Resource Constrained Data Management

Resource constrained data management

Goal: manage our own data in a secure & personal device
  Personal folders can be large
    E-mails, medical record, official documents (admin., bank, etc.), e-bills (telecom, Amazon, IP, etc.), digital traces (transport, geo-localized services, etc.), …
  A query engine must be embedded (to extract authorized results)

Outline
  Target hardware platforms & constraints
  Existing techniques & problem statement
    The “small RAM – NAND FLASH” paradox
  A general framework to solve the problem
  Representative proposals for search engine & relational DBMS

Target hardware: secure personal devices

Different form factors
  SIM cards … on which a GB flash chip is superposed
  Memory devices … in which a secure chip is implanted
  Shown: SIM card; USB MicroSD reader; Contactless + USB, 8 GB Flash; Secure MicroSD, 4 GB Flash

Common architecture
  GBs of memory
    NAND FLASH (dense, robust, low cost)
  Tamper resistant microcontroller [SC02]
    Miniaturization,
    Protective layers (carrying signal),
    Multi-layering (hide sensitive lines),
    Sensors (light/temp/power/freq.)
    ⇒ Highly costly to attack
  & communication interfaces (USB, contactless, APDU)

[Figure: (Secure) MCU and NAND FLASH connected by a BUS]

Strong hardware constraints … with a big impact on data management

Microcontrollers
  Small RAM (<128 KB)
    RAM is not dense
    Security is linked with size
  ⇒ Favor pipeline evaluation … requires (many) indexes

NAND FLASH
  High cost of random writes
    Pages are erased before write
    Erase by Block vs. write by Page
  ⇒ Data structures and strategies … must minimize random writes
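
The NAND flash constraints listed above can be made concrete with a toy model. The sketch below is only an illustration under assumed parameters: the class name `NandFlash`, the geometry (4 blocks of 64 pages) and the naive `update_in_place` strategy are hypothetical and do not describe any particular chip or flash translation layer. It shows that filling a block sequentially costs one page write per page, whereas changing a single page in place costs a block erase plus a rewrite of the whole block.

```python
class NandFlash:
    """Toy NAND flash: a page is written once, erasure is per block."""

    def __init__(self, num_blocks, pages_per_block):
        self.pages_per_block = pages_per_block
        # None means "erased" (free); anything else is written data.
        self.pages = [None] * (num_blocks * pages_per_block)
        self.page_writes = 0
        self.block_erases = 0

    def write_page(self, page_no, data):
        if self.pages[page_no] is not None:
            raise ValueError("page must be erased before it is written again")
        self.pages[page_no] = data
        self.page_writes += 1

    def erase_block(self, block_no):
        start = block_no * self.pages_per_block
        for p in range(start, start + self.pages_per_block):
            self.pages[p] = None
        self.block_erases += 1

    def update_in_place(self, page_no, data):
        """Naive random write: save the block, erase it, rewrite its pages."""
        block_no = page_no // self.pages_per_block
        start = block_no * self.pages_per_block
        saved = self.pages[start:start + self.pages_per_block]
        saved[page_no - start] = data
        self.erase_block(block_no)
        for offset, content in enumerate(saved):
            if content is not None:
                self.write_page(start + offset, content)


flash = NandFlash(num_blocks=4, pages_per_block=64)  # assumed geometry
for page_no in range(64):                            # sequential fill of block 0
    flash.write_page(page_no, b"v1")
sequential_cost = (flash.page_writes, flash.block_erases)

flash.update_in_place(10, b"v2")                     # one random update in block 0
update_cost = (flash.page_writes - sequential_cost[0],
               flash.block_erases - sequential_cost[1])
print("sequential fill (writes, erases):", sequential_cost)
print("single in-place update (writes, erases):", update_cost)
```

In this model the in-place update also needs enough working memory to hold a whole block, which is one more reason such a strategy fits poorly with a small RAM.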

How do existing techniques deal with these constraints?

Existing Techniques

Light & embedded databases
  Embedded DBMS, e.g., SQLite, BerkeleyDB
  Light DBMS, e.g., DB2 Everyplace, Oracle Database Mobile Server
  Target small but powerful devices (e.g., smart phones, set top boxes)
  ⇒ Small RAM & NAND Flash constraints are not supported

FLASH-aware indexing techniques
  B+Tree adaptation: BFTL [TECS07], LATree [VLDB09], FD Tree [VLDB10]
    Store index updates in a Flash resident log, itself indexed in RAM
    Updates are committed to the B+-Tree in a batch mode (amortize write cost)
    Vary in the way log/RAM index are managed
  ⇒ Not compliant with small RAM
    Small RAM ⇒ Small index in RAM ⇒ High commit frequency ⇒ Low gains
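
The chain "small RAM ⇒ small index in RAM ⇒ high commit frequency ⇒ low gains" can be illustrated with a simple counting model. This is a sketch under stated assumptions, not the algorithm of BFTL, LATree or FD Tree: the function `count_batch_commits` and the rule "commit as soon as the RAM index over the log is full" are hypothetical simplifications.

```python
def count_batch_commits(num_updates, ram_entries):
    """Count batch commits when at most `ram_entries` pending
    index updates can be tracked in RAM at a time (assumed policy)."""
    pending = 0
    commits = 0
    for _ in range(num_updates):
        pending += 1                 # update appended to the flash-resident log
        if pending == ram_entries:   # RAM index over the log is full
            commits += 1             # apply the whole batch to the B+-tree
            pending = 0
    if pending:
        commits += 1                 # final, partially filled batch
    return commits


updates = 100_000
for ram_entries in (16, 1_024, 65_536):
    commits = count_batch_commits(updates, ram_entries)
    print(f"RAM entries={ram_entries:>6}  commits={commits:>5}  "
          f"updates amortized per commit={updates / commits:.1f}")
```

With few RAM entries, each commit amortizes only a handful of updates, so the write cost per update stays close to that of updating the tree directly.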


Existing Techniques (cont.)

Flash-aware implementation of key-value stores
  SkimpyStash [SIG11], LogBase [VLDB12], SILT [SOSP11]
    Store the key-value pairs in a log structure in FLASH
    An index is maintained in RAM (relatively large size, ~1B per key-value pair)
  ⇒ Incompatible with small RAM
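
As a rough sketch of the log-plus-RAM-index idea described above (not the actual design of SkimpyStash, LogBase or SILT), the hypothetical class `LogStructuredKV` below appends every pair to a log and keeps only the position of the latest record per key in RAM. The helper `ram_index_bytes` applies the "~1B per key-value pair" figure from the slide; the 128 KB budget comes from the microcontroller constraint stated earlier, and the 10 million pairs are an assumed workload. A Python `dict` is of course far larger than 1 byte per entry; it only stands in for the compact index.

```python
class LogStructuredKV:
    """Toy key-value store: append-only log plus an index kept in RAM."""

    def __init__(self):
        self.log = []      # stands in for the sequential log in flash
        self.index = {}    # RAM index: key -> position of the latest record

    def put(self, key, value):
        self.index[key] = len(self.log)
        self.log.append((key, value))   # always appended, never updated in place

    def get(self, key):
        return self.log[self.index[key]][1]


def ram_index_bytes(num_pairs, bytes_per_pair=1):
    """RAM needed by the index at ~1 byte per pair (figure from the slide)."""
    return num_pairs * bytes_per_pair


store = LogStructuredKV()
store.put("a", 1)
store.put("b", 2)
store.put("a", 3)            # the old record for "a" stays in the log
print(store.get("a"), len(store.log), len(store.index))

ram_budget = 128 * 1024      # bytes, upper bound on MCU RAM from the constraints
needed = ram_index_bytes(10_000_000)   # assumed workload: 10 million pairs
print(f"index needs {needed} B, RAM budget is {ram_budget} B")
```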

Data management techniques dedicated to MCU (SoC)
  Pioneer proposals in the area of DBMS for smartcards
    PicoDBMS [VLDBJ01], VSDB [TOIS03]
    Exploit byte write accesses (EEPROM, NOR) not available in NAND FLASH
  Data management techniques on-chip (details next)
    RDBMS: GhostDB [SIG07], PBFilter [IS12], MiloDB [DAPD13]
    Search engines: MAX [TSN08], Snoogle [TPDS10], Microsearch [TECS10]

Problem statement

Problem: execute queries with a small RAM on large volumes of data stored in NAND FLASH

Evaluating queries with a small RAM → Pipeline strategy → Build indexes → Index maintenance → Many random writes … unacceptable costs in NAND Flash → Compensate → Increase RAM consumption

The “small RAM – NAND FLASH” combination … leads to paradoxical solutions!

How do recent works resolve the problem?

General framework to solve the problem

Identify/design the needed indexes for a pipeline evaluation

Organize them into Sequentially Written Structures (SWS)
  … structures which satisfy Flash constraints by construction:
  Allocation & de-allocation are only made on a BLOCK basis
    Partial garbage collection never occurs (avoids costly GC)
  Pages are written sequentially (and never updated nor moved)
    Proscribes random writes by construction

If more scalability is needed: reorganize the SWSs
  Transform a SWS into a more efficient SWS
  NB: transformation itself must only rely on (temporary) SWSs…

How do recent works implement this methodology?
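
As a rough illustration of the SWS constraints listed above, the following Python sketch models an append-only structure whose space is allocated and freed block by block. The class name, the `pages_per_block` parameter and the in-memory lists standing in for Flash are assumptions made for this example, not part of the cited systems.

```python
class SequentiallyWrittenStructure:
    """Append-only page log; space is allocated and freed per block only."""

    def __init__(self, pages_per_block=4):
        self.pages_per_block = pages_per_block  # hypothetical block size
        self.blocks = []                        # each block is a list of pages

    def append_page(self, page):
        # Allocate a whole new block when the current one is full.
        if not self.blocks or len(self.blocks[-1]) == self.pages_per_block:
            self.blocks.append([])
        self.blocks[-1].append(page)  # pages are written strictly in sequence
        return len(self.blocks) - 1, len(self.blocks[-1]) - 1

    def scan(self):
        # Sequential read of every page, in write order.
        for block in self.blocks:
            yield from block

    def drop(self):
        # De-allocation frees all blocks at once: no partial garbage collection.
        freed = len(self.blocks)
        self.blocks = []
        return freed
```

There is deliberately no method to update or move a page: the only ways to change the structure are to append to it or to drop it entirely.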


First illustration: embedded search engine

Information retrieval queries: from a set of terms, retrieve the K most relevant documents (according to a weight function like TF-IDF)

\[ \text{TF-IDF}(doc) = \sum_{t_i \in Q} value_{t_i,doc} \times \log\left( \frac{|\{doc\}|}{|\{doc \text{ containing } t_i\}|} \right), \quad Q = \{t_i\} \text{ the set of query terms} \]

Use an inverted index

Stores triples (term, docid, value)

Retrieves all triples corresponding to a given term

Classical search algorithm

Inverted index access for each term of the query

In RAM: one container is allocated per retrieved docid (too much!), used to aggregate the values of the different triples for that docid

Sort results and return the K docids with the highest TF-IDF

How to store the index sequentially? How to answer in pipeline?

How to store the inverted index sequentially (SWS)? [TECS10]

[Figure: a hash table in RAM (entries H1, H2, H3; H3 is shown with addresses 17 and 26) points into FLASH. Hash buckets hold index triples (term, value, docid) and are chained in flash. Both the documents (doc2, doc4, …; docid=7, 9, 21, 23) and the hash buckets of index triples are Sequentially Written Structures (SWS) in FLASH.]

How to evaluate the query in pipeline? [TECS10]

The base for the pipeline evaluation strategy: in each hash list, docids are sorted

[Figure: hash table in RAM (H1: 56, H2: 40, H3: 43) and hash buckets stored as an SWS in FLASH at Addr 14, 17, 25, 26, 40 and 43. Buckets hold (term, value, docid) triples, e.g. (t1,5,7), (t1,1,9) or (t2,1,20), (t2,2,21), (t2,1,23), followed by the address of another bucket (or ∅). Label: docid sorted (desc.) for hash value H3.]

How to compute the query in pipeline:

1 page is allocated in RAM for each hash list containing a query term (in practice the number of terms ≤ 3 or 4…)

The triples from the hash lists are “merged” on docid ⇒ triples with an equal docid arrive in RAM at the same time…

… the TF-IDF of each docid can be computed (directly)

The K docid with the highest TF-IDF are kept in RAM
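
The merge-based evaluation described above can be sketched as follows. This is a minimal sketch under stated assumptions: each posting list is a Python list of (docid, value) pairs already sorted by descending docid, the document frequency of a term is taken to be the length of its list, and the function names `stream` and `top_k` are hypothetical. A real embedded engine would read one Flash page per list at a time instead of holding lists in memory.

```python
import heapq
import math
from itertools import groupby

def stream(term, postings):
    # Yield (docid, term, value) in the stored order (descending docid).
    for docid, value in postings:
        yield docid, term, value

def top_k(posting_lists, num_docs, k):
    """posting_lists: {term: [(docid, value), ...]} sorted by docid descending."""
    # IDF per query term: log(|docs| / |docs containing the term|).
    idf = {t: math.log(num_docs / len(pl)) for t, pl in posting_lists.items()}
    streams = [stream(t, pl) for t, pl in posting_lists.items()]
    # Merge on docid: triples with an equal docid come out consecutively.
    merged = heapq.merge(*streams, key=lambda x: x[0], reverse=True)
    best = []  # min-heap of (score, docid), never more than k entries
    for docid, group in groupby(merged, key=lambda x: x[0]):
        score = sum(value * idf[term] for _, term, value in group)
        if len(best) < k:
            heapq.heappush(best, (score, docid))
        else:
            heapq.heappushpop(best, (score, docid))
    return sorted(best, reverse=True)
```

Only the K current best results and one cursor per query term are kept, which is the point of the pipeline strategy: RAM usage does not grow with the number of matching documents.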


Second illustration: embedded relational database

SQL queries

Evaluate selections, projections, joins (group by and aggregates)

RDBMS use indexes

Q1: How to store an index sequentially (in a SWS)?

Q2: How to make it scalable?

… and algorithms: join algorithms (HJ, SMJ, …) need lots of RAM

Join indices could be a solution…

… but consecutive joins incur random access or a RAM-hungry sort

Q3: How to compute select-project-join queries in pipeline?

[Figure: σ(CUSTOMER), ORDER and LINEITEM linked by two join indices (JI); labels: sorted on CUS.id, sorted on ORD.id, sorted on CUS.id.]

How to build a selection index sequentially (SWSs)? [IS12]

SWS 1: «Keys» (vertical partition of the indexed column, here CUSTOMER.CITY)

Filled sequentially at tuples’ insertion

SWS 2: «Bloom Filters» (summary of SWS 1)

One BF summarizes 1 page of «Keys» (consumes ~2B per key)

Written sequentially

[Figure: CUSTOMER tuples t20, t30, t50, t70 and t90 have CITY = ‘Lyon’; the corresponding «Keys» pages P2, P16, P68 and P78 are summarized by Bloom filters Sum2, Sum16, Sum68 and Sum78. Table scan: 640 IOs. Summary scan: 17 IOs.]

Efficient: SWS 2 + 1 IO/result… but how to achieve scalability?

Retrieve CUSTOMER.CITY = ‘Lyon’

Full scan of «Bloom Filters» (SWS 2)

For each BF: test whether ‘Lyon’ ∈ BF

Negative ⇒ ignore it

Positive ⇒ access 1 page of «Keys»,

search ‘Lyon’ & return tuples ptrs
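
The lookup procedure above can be illustrated with a small Python sketch. It is only a sketch: the Bloom filter is a plain integer bitmap, the bit positions come from SHA-256 (an arbitrary choice), and the names `build_summary`, `lookup`, `m` and `k` are assumptions made for this example rather than details taken from [IS12].

```python
import hashlib

def _bits(key, m, k=3):
    # k bit positions derived from a SHA-256 digest (arbitrary for this sketch).
    digest = hashlib.sha256(key.encode()).digest()
    return [int.from_bytes(digest[4 * i:4 * i + 4], "big") % m for i in range(k)]

def build_summary(keys_page, m=64):
    """Bloom filter (as an int bitmap) summarizing one page of (key, ptr) pairs."""
    bf = 0
    for key, _ in keys_page:
        for b in _bits(key, m):
            bf |= 1 << b
    return bf

def lookup(keys_pages, summaries, wanted, m=64):
    """Scan the summaries; read a «Keys» page only when its filter is positive."""
    pointers, page_reads = [], 0
    for page, bf in zip(keys_pages, summaries):
        if all((bf >> b) & 1 for b in _bits(wanted, m)):
            page_reads += 1  # positive (possibly a false positive)
            pointers += [ptr for key, ptr in page if key == wanted]
    return pointers, page_reads
```

A negative filter is always correct, so skipped pages never contain the key; a false positive only costs one extra page read.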


Scalability ⇒ timely reorganize the index… to transform it into a more efficient index [DAPD13]

Reorganization process:

Only uses sequential structures (SWSs)

Background / interruptible

Ex: Summary scan index → B-Tree like index

1) Sort the (key, pointer) pairs

→ Temp. SWSs (sorted “runs”)

→ result is SWS: «Sorted Keys»

2) Build a key hierarchy

→ No need of temporary SWS

→ result is SWS: «Tree»

[Figure: the «Keys» and «Bloom Filters» SWSs of the summary scan index (pages P2, P16, P68, P78 with summaries Sum2, Sum16, Sum68, Sum78) are sorted through temporary SWSs (sorted run 1, sorted run 2) into the SWS «Sorted Keys», where ‘Lyon’ is followed by t20, t30, t50, t70, t90; the SWS «Tree» (keys K1, K2, …, Kn) is built on top of it.]

Result: efficient B-Tree like index

… how to evaluate

SQL queries in pipeline?
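
The two reorganization steps can be sketched in Python as follows. This is an illustration under assumptions: Python lists stand in for the SWSs, `ram_capacity` and `fanout` are hypothetical parameters, and the functions are not interruptible as the real background process would be.

```python
import heapq

def make_runs(pairs, ram_capacity):
    """Step 1a: cut the (key, pointer) pairs into sorted runs that fit in RAM."""
    return [sorted(pairs[i:i + ram_capacity])
            for i in range(0, len(pairs), ram_capacity)]

def merge_runs(runs):
    """Step 1b: merge the temporary runs into the «Sorted Keys» structure."""
    return list(heapq.merge(*runs))

def build_tree(sorted_pairs, fanout):
    """Step 2: build the key hierarchy bottom-up, appending one level at a time."""
    leaves = [sorted_pairs[i:i + fanout]
              for i in range(0, len(sorted_pairs), fanout)]
    levels = [leaves]
    seps = [leaf[0][0] for leaf in leaves]  # first key of each leaf
    while len(seps) > 1:
        nodes = [seps[i:i + fanout] for i in range(0, len(seps), fanout)]
        levels.append(nodes)
        seps = [node[0] for node in nodes]  # first key of each upper node
    return levels  # levels[0] holds the leaves, levels[-1] the root level
```

Each step only reads its input in order and appends to its output, so every intermediate and final structure could itself be stored as an SWS.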


How to evaluate SQL queries in pipeline? [SIG07, DAPD13]

Query (TPC-D like schema):

SELECT CUS.*, ORD.*, LIN.*, PS.*
FROM CUSTOMER CUS, ORDER ORD, LINEITEM LIN, PARTSUP PS, SUPPLIER SUP
WHERE CUS.CUSkey = ORD.CUSkey AND ORD.ORDkey = LIN.ORDkey AND
LIN.PSkey = PS.PSkey AND PS.SUPkey = SUP.SUPkey AND
CUS.Mktsegment = 'HOUSEHOLD' AND SUP.Name = 'SUPPLIER-1'

Tjoin indexes (generalized join index): each rowid of the root table contains the rowids of the tuples it refers to in the subtree

Tselect indexes: each key of the index contains the rowids of the root table referring to it

NB: Tselect returns sorted row ids!

Execution plan

[Figure: schema tree rooted at LIN (the query root table), with ORD (and CUS below it) on one side and PS (with SUP and PART below it) on the other. A Tselect access on CUS.Mktsegment for ‘HOUSEHOLD’ and a Tselect access on SUP.Name for ‘SUPPLIER-1’ each return a sorted list {LINid}; the two lists are intersected by a merge; a Tjoin access on LIN (columns LINid, ORDid, CUSid, PSid, PARid, SUPid) then yields {LINid, CUSid, ORDid, PSid}, which feeds the final projections.]
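
A minimal Python sketch of this execution plan is given below. It assumes that both Tselect indexes return root-table rowids in ascending order and that the Tjoin index can be modelled as a dictionary from a LINid to the rowids it refers to; the function names and the sample data layout are hypothetical.

```python
def intersect_sorted(a, b):
    """Merge-intersect two ascending rowid streams with constant RAM."""
    a, b = iter(a), iter(b)
    x, y = next(a, None), next(b, None)
    while x is not None and y is not None:
        if x == y:
            yield x
            x, y = next(a, None), next(b, None)
        elif x < y:
            x = next(a, None)
        else:
            y = next(b, None)

def evaluate(tselect_cus, tselect_sup, tjoin_lin):
    """Yield (LINid, CUSid, ORDid, PSid) for rows passing both selections."""
    for lin_id in intersect_sorted(tselect_cus, tselect_sup):
        # One Tjoin access per qualifying root rowid.
        yield (lin_id, *tjoin_lin[lin_id])
```

Because both inputs are sorted on the same rowid, the intersection needs no sort and no hash table, which is why sorted Tselect output matters for a pipeline plan.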

Conclusion

Encouraging results…

A good methodology to tackle the conflicting small RAM – NAND Flash constraints

Efficient search engines

Efficient SQL queries

…a whole DBMS (indexes, tables, updates, buffers) can fit into SWS

… and many remaining challenges

Extend those principles to other data models: XML, time series, spatial-temporal data, noSQL & key-value stores, etc.

A general co-design approach is still missing

How to benefit from additional RAM?

How to adapt to dynamic variations of the HW parameters?

PART III : SECURE GLOBAL COMPUTATIONS

The example of Secure Computation of Privacy Preserving Data Publishing Algorithms using Tokens

Secure Global Computation and Anonymous Data Publishing

PART III : OUTLINE

Problem Statement

Current Solutions to Secure Global Computation

Generic Approach

Toolkits for Secure Computation

Using Trusted Hardware to Achieve Generic Computation

Taking on Privacy Preserving Data Publishing

Perspectives

PROBLEM STATEMENT

Part II of the talk showed how to query local data on PDSs.

Part III of the talk discusses how to compute aggregate data using many PDSs.

Secure Global Computation on Tokens: Problem Statement

OBJECTIVE: Maintain the functionalities of traditional database servers managing private data (availability, durability, scalability of the system, etc.) while increasing privacy protection (by using secure tokens and distributing computation)

PROBLEM: The use of secure portable tokens must not jeopardize traditional data intensive applications, in particular applications aggregating data, e.g. Privacy Preserving Data Publishing, SQL processing

THREAT MODEL:

Infrastructure (SSI) can be: honest but curious, or weakly malicious (covert adversary)

Token can be: unbreakable (honest), or a subset can be broken (weakly malicious)

The « classical » problem of Secure Global Computation is more general and makes no trust assumption

Is this a new problem ?

Several approaches are possible to securely perform global computations:

1. Use only an untrusted server/cloud/P2P and use generic (and costly) algorithms (e.g. Secure Multi-Party Computing, fully homomorphic encryption). → Problem = COST

2. Use only an untrusted server/cloud/P2P and develop a specific algorithm for each specific class of queries or applications (e.g. Data Mining Toolkit), using low-cost primitives. → Problem = GENERICITY

3. Introduce a tangible element of trust, through the use of a trusted component, and develop a generic methodology to execute any centralized algorithm in this context (e.g. PDSs). → Problem = TRUST

CURRENT SOLUTIONS TO SECURE GLOBAL QUERYING


APPROACH I : GENERIC AND SECURE GLOBAL COMPUTATION


Generic Secure Multi-Party Computing (SMC)

Truly generic SMC is exponential in the number of inputs and therefore does not scale. See [Yao82, Yao86].

Other solutions such as [GMW87] do not provide specific generics to compute a solution (i.e. they need a zero-knowledge proof to work).

• Cost is impractical: the resolution of the millionaire problem proposed in ’82 is proportional to the size of the values compared.

• Generalization to m different parties requires taking into account cheating (extra cost).

• [CKL06] have shown that in fact if there is not an honest majority, then only trivial functions can be computed.

There are (more or less) complicated cryptographic protocols.

Protocols are generic in the sense that they compute values of mathematical functions.

Protocols are far too costly.

Algebraic approach : Homomorphic Encryption

Homomorphic encryption is a characteristic of several crypto-systems such as RSA, Paillier, ElGamal, etc.

Example: consider RSA. Given the RSA public key (e, m), the encryption of a message p is given by:

\[ E(p) = p^e \bmod m \]

The homomorphic property is:

\[ E(p_1) \times E(p_2) = p_1^e \times p_2^e \bmod m = (p_1 \times p_2)^e \bmod m = E(p_1 \times p_2) \]
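
The multiplicative property of textbook RSA can be checked numerically. The sketch below uses toy parameters (p = 61, q = 53, e = 17) chosen only for illustration; they are far too small for any real use, and unpadded RSA as shown here is not secure in practice.

```python
# Toy textbook-RSA parameters (illustrative assumptions, not secure values).
P, Q = 61, 53
m = P * Q                          # public modulus
e = 17                             # public exponent, coprime with (P-1)*(Q-1)
d = pow(e, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)

def E(x):
    """Encrypt x with the public key (e, m)."""
    return pow(x, e, m)

def D(c):
    """Decrypt c with the private exponent d."""
    return pow(c, d, m)

p1, p2 = 7, 12
product_of_ciphertexts = (E(p1) * E(p2)) % m

# Multiplying ciphertexts yields the encryption of the product of plaintexts.
assert product_of_ciphertexts == E((p1 * p2) % m)
assert D(product_of_ciphertexts) == p1 * p2
```

The party doing the multiplication never sees p1 or p2, yet the key holder recovers their product; only multiplication is supported, which is why RSA is homomorphic but not fully homomorphic.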


Fully Homomorphic Encryption

Fully Homomorphic Encryption means that all ring operators are homomorphic (this means + and x).

Example: we say that E is a fully homomorphic encryption from ({0,1}, +, x) to (D, ⊕, ⊗) if for all c1, c2 in D such that c1 = E(p1) and c2 = E(p2):

\[ E^{-1}(c_1 \oplus c_2) = p_1 + p_2 \]

\[ E^{-1}(c_1 \otimes c_2) = p_1 \times p_2 \]

Or more generally:

\[ E^{-1}(f_D(c_1, \dots, c_n)) = f_{\{0,1\}}(p_1, \dots, p_n) \]

Why is this a solution?

• Any program with bounded input can be transformed into a Boolean circuit

• Any circuit can be transformed into a polynomial modulo 2

• Secure computation of a polynomial equates to securely computing any program

• To securely compute a polynomial, it is necessary and sufficient to securely compute + and x operations.
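
The claim that + and x modulo 2 suffice can be made concrete with a short Python sketch, working on plaintext bits only (no encryption is performed here). The gate names and the full-adder example are illustrative choices.

```python
def xor(a, b):
    return (a + b) % 2          # addition in {0,1}

def and_(a, b):
    return (a * b) % 2          # multiplication in {0,1}

def not_(a):
    return (a + 1) % 2          # NOT a = a + 1 (mod 2)

def or_(a, b):
    return (a + b + a * b) % 2  # a OR b = a + b + ab (mod 2)

def full_adder(a, b, cin):
    """One-bit adder written only with the + and x based gates above."""
    s = xor(xor(a, b), cin)
    cout = or_(and_(a, b), and_(cin, xor(a, b)))
    return s, cout
```

If an encryption scheme lets a server evaluate these two operations on ciphertexts, the same composition of gates can in principle be evaluated on encrypted bits.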


Fully Homomorphic Encryption

For a long time, it was unclear whether fully homomorphic encryption was possible.

A first result was proposed using ideal lattice cryptography in [Gent09], and has been a hot topic since.

Keys to encrypt only a couple of bits are gigabytes in size…

→ The cost to have good security is (incredibly) high.

APPROACH 2: TOOLKITS FOR SECURE COMPUTATIONS


Data Mining Toolkit

Toolkit for Data Mining: [CKV+02]

Primitives:

– Secure Sum,

– Secure Set Union,

– Secure Size of Set Intersection,

– Scalar Product.

Can compute: Association Rules, Clusters. (Also: efficiency drops when some participants are dishonest.)

Not usable for other applications (such as PPDP or SQL)

[Figure: Secure Sum primitive. Sites holding the values 5, 7, 9 and 2 pass a running total around; the first site adds a random value R=32, the successive totals are 37, 4, 13 and 15 (modulo 40), and the first site obtains the result as 15 - 32 [40] = 23.]
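
The Secure Sum figure can be replayed with a few lines of Python. This is a single-process simulation under assumptions: the function name `secure_sum`, the optional `r` argument and the returned message trace are introduced for this example, and the honest-participant setting of the figure is taken for granted.

```python
import random

def secure_sum(values, modulus, r=None):
    """Simulate the ring protocol; return (sum, messages passed between sites)."""
    if r is None:
        r = random.randrange(modulus)   # mask known only to the first site
    running = (values[0] + r) % modulus
    messages = [running]
    for v in values[1:]:
        # Each site only ever sees a masked running total.
        running = (running + v) % modulus
        messages.append(running)
    # The first site removes its mask to obtain the true sum (modulo modulus).
    return (running - r) % modulus, messages
```

With the figure's values, `secure_sum([5, 7, 9, 2], 40, r=32)` produces the messages 37, 4, 13, 15 and the result 23. The result is only correct if the modulus exceeds the true sum.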


Queries on encrypted databases

• Similar questions appear in outsourced databases (DaaS), which have been a hot research topic for over 10 years. WHAT ARE THE PROBLEMS?

– Same attack model: SSI (DaaS provider) can be untrusted.

– Attacks considered are inference attacks based on the frequency of ciphertexts.

e.g. SELECT department, COUNT(DISTINCT salary) FROM emp GROUP BY department

might leak some information on individual salaries, given background knowledge of their distribution in the company

• Some solutions:

– [HILM02, HIM04] propose techniques to execute various SQL and SQL aggregation queries over encrypted data.

– [HMT04] propose a specific index for range queries

– Many works (such as [ABG+04, AGB+05, SAP03]) use a trusted third party to compute queries, which makes things simple

– [AEW12] give a good overview of the problem of securing DaaS in the cloud

In the approach envisioned here, no data is outsourced, but some data will be exported to the SSI.

Some techniques proposed in these articles could be useful in the PDS context!

APPROACH 3 : USING TRUSTED HARDWARE TO ACHIEVE GENERIC COMPUTATION


A new trend : SMC Using Tokens

• Using tokens to improve the speed of computations: [JKSS10]

• New foundations of SMC: [Katz07, GIS+10]

• Limited to Secure Intersect (Oblivious Search): [HL08, FPS+11]

→ The primitives used are not « data intensive » primitives. Complex processing using tokens is a new topic!

→ These processes involve initializing and sending one or more smart cards, which may or may not be trusted. (PDSs would be an alternative.)

→ Smart cards cannot compute everything themselves (this is not introducing a trusted third party)

The general idea when using Secure Hardware:

Use cheap secure hardware to obtain substantial complexity class gains with SMC algorithms.

What of Complicated Data Intensive Computations … ?

One of the classical multi-user data intensive applications that require private computations is Privacy Preserving Data Publishing (an example of aggregate queries).

We will give some insight on the global framework proposed in [ANP13].

Adapting this type of approach to SQL is ongoing work.

EXAMPLE


Taking on Privacy Preserving Data Publishing…

(or more generally aggregation operations)

…using Secure Portable Tokens


Privacy Preserving Data Publishing (PPDP)

[Figure: Individuals → raw data → Publisher (trusted) → anonymized (or sanitized…) data → Recipients (no assumption of trust)]

Is the process known in advance?

What is anonymized data ? [Sweeney02]

Quasi-identifiers! (QID)

It is feasible to de-anonymize some parts of an anonymized dataset based on quasi-identifiers, i.e., sets of attributes that take unique values over a given dataset.

These quasi-identifiers can then be used to cross information with other databases or simply to deduce (private/sensitive) information from the data published

Concepts used

Traditional PPDP considers 2 classes of attributes:

Those part of the QID

The others, assumed sensitive

PPDP Techniques

Range from anything trivial and simple (pseudonymisation) to complex and provable (differential privacy [Dwo06])

ADVANTAGE: global queries are direct

DISADVANTAGE: differential privacy does not support all types of queries

And other ad hoc techniques…

k-anonymity

l-diversity

m-invariance

[Figure: example releases at time t1 and time t2; there are two fake tuples.]

/!\ Computing a k-anonymous release means computing an AGGREGATION (as in SQL Group By)
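
The link between k-anonymity and a Group By can be shown with a short Python check. The attribute names, the generalized values and the function names below are made up for this sketch; it only verifies a release, it does not compute the generalization itself.

```python
from collections import Counter

def qid_groups(rows, qid):
    """GROUP BY the quasi-identifier attributes and count the tuples per group."""
    return Counter(tuple(row[a] for a in qid) for row in rows)

def is_k_anonymous(rows, qid, k):
    """A release is k-anonymous if every QID group holds at least k tuples."""
    return all(n >= k for n in qid_groups(rows, qid).values())

# Hypothetical generalized release (values invented for illustration).
release = [
    {"zip": "78***", "age": "20-30", "disease": "flu"},
    {"zip": "78***", "age": "20-30", "disease": "cold"},
    {"zip": "75***", "age": "30-40", "disease": "flu"},
    {"zip": "75***", "age": "30-40", "disease": "asthma"},
]
```

Here `is_k_anonymous(release, ["zip", "age"], 2)` holds because each of the two QID groups contains two tuples, while the same release is not 3-anonymous.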


Overview : Generic Protocol

Computing a query on such an architecture follows 3 steps

1. The querier broadcasts a (credentials, query) couple

2. Each PDS decides locally whether or not to participate in the query, depending on privacy models and rules and local opt-in/opt-out choices. → Collection Phase

3. A distributed protocol is established between participating PDSs and SSI such that the final result can be delivered to the querier.

a) Construction Phase (secure computation of the query)

b) Sanitization Phase (sending the results to the querier)

/!\ Depending on the complexity of the query, the SSI may only store intermediate results or may play a more active role in the computation

Mondrian Algorithm [LeFevre06]


Parallelizing (and securing) the Approach

Collection phase is naturally parallel!

Construction phase is algorithm dependent. To remain as generic as possible, this task is delegated to the central server, while disclosing an amount of information compatible with the privacy requirements. → BREAK UP THE TUPLES!

Encrypted Data (e)

Construction Information (c) / Sanitization Information (s)

Safety Information (ζ)

Sanitization is parallelized on the tokens by sending them batches of information.

And what of malicious adversaries … ?

• First solution: clustering to reduce the impact of a cracked PDS

• Attacks launched in the case of malicious adversaries can be the creation, deletion and copy of tuples. (Active attacks)

• Several generic safety properties can be defined, together with a supporting meta-protocol, in order to support current PPDP models against such adversaries.

• In the case of covert adversaries, detection is probabilistic.

→ These countermeasures are generic

The MetaP Meta-Algorithm


Future work

The immediate idea is to work on generic evaluation of SQL queries!

Can be simple in the case of SFW queries without joins

Is harder with joins or with group by

Techniques used in the PPDP context will probably be useful

Other types of queries (No-SQL) could also be supported

The difficult part will often be the aggregate part.

PERSPECTIVES

Instances of alternative global architectures relying on secure hardware

Personal Social-Medical Folder (field experiment)

A personal folder available at home to ease care coordination

Each patient owns her medical-social folder in a secure token

The folder is archived (encrypted) and shared using central services

Local and central copies are synchronized without Internet connection

Human Powered Information Systems

Secure and low cost PDS for personal data services in Least Developed Countries

A delay tolerant network (no infrastructure) is established

Trusted Cells

Based on secure personal tokens to regulate personal data at home

The token is connected and regulates data sharing in the cloud

Personal social-medical folder: architecture elements

[Figure: the patient’s personal server (secure chip + FLASH) hosts the health records in a DBMS behind a JDBC API, a UI web app, a synchronization web app and a file system holding sync. files; the practitioner’s smart badge (secure chip + FLASH) is also shown, as well as the central server (data durability, availability, querying).]

Availability at patient’s home

EHR on a personal server

Access from a browser by patient’s visitors (doctors & social workers, family…)

Disconnected access to Personal Servers

[Figure: Personal Server (patient) and Smart Badge]

Care coordination between practitioners

EHRs on a central server: Web access & exchange

Sync. via Smart Badges: no data re-entered, no network link required

EHR on a personal server: access from a browser by patient’s visitors (doctors & social workers, family…)

Sync. with central server via Smart Badges

[Figure: Personal Server, Smart Badge (practitioner), central server and External IS]

Human Powered Information Systems (HPIS)

Trusted Cells Vision Architecture (credit: Gi-De)

ARM TrustZone

THANK YOU

REFERENCES

PART I: Distributed architecture (1/3)

The World Economic Forum. Rethinking Personal Data: Strengthening Trust. May 2012

A. Pentland et al. Personal Data: The Emergence of a New Asset Class. World Economic Forum. January 2011

H. Nissenbaum. Privacy in context: Technology, policy, and the integrity of social life. Stanford Law Books, 2010

J. Catlett. Panel on infomediaries and negotiated privacy techniques. In Proceedings of the tenth conference on Computers, freedom and privacy: challenging the assumptions, CFP ’00, pages 155–156, New York, NY, USA, 2000

Mass-Educational Databases = Wrong Architecture, www.identitywoman.net/mass-educational-databases-wrong-architecture

VRM project, http://blogs.law.harvard.edu/vrm/projects/

A. Mitchell, I. Henderson, and D. Searls. Reinventing direct marketing - with vrm inside. Journal of Direct Data and Digital Marketing Practice, 10(1):3–15, 2008

FreedomBox: http://freedomboxfoundation.org/

Wikipedia. Freedombox, Vendor Relationship Management, Distributed Social Networks

PART I: Distributed architecture (2/3)

L. Cutillo, R. Molva, and T. Strufe. Safebook: A privacy-preserving online social network leveraging on real-life trust. IEEE Communications Magazine, 47(12):94–101, 2009

L. M. Aiello and G. Ruffo. Lotusnet: tunable privacy for distributed online social network services. Computer Communications, In Press, 2010

I. Clarke, S. G. Miller, T. W. Hong, O. Sandberg, and B. Wiley. Protecting free expression online with Freenet. Internet Computing IEEE, 6(February):40–49, 2002

Diaspora*, https://joindiaspora.com/

R. Baden, A. Bender, N. Spring, B. Bhattacharjee, and D. Starin. Persona: An online social network with user-defined privacy. Computer, 39(4):135–146, 2009

S. Buchegger, D. Schioberg, L. H. Vu, and A. Datta. PeerSoN: P2P Social Networking - Early Experiences and Insights. In Proceedings of the Second ACM Workshop on Social Network Systems 2009, co-located with Eurosys 2009, Nurnberg, Germany, March 31 2009

A. Narayanan, V. Toubiana, S. Barocas, H. Nissenbaum, D. Boneh: A Critical Look at Decentralized Personal Data Architectures. CoRR abs/1202.4503 (2012)

M. Mun, S. Hao, N. Mishra, K. Shilton, J. Burke, D. Estrin, M. Hansen, and R. Govindan. Personal Data Vaults: a locus of control for personal data streams. 2010

PART I: Distributed architecture (3/3)

Mydex, http://mydex.org/

Mydex. The case for personal information empowerment: The rise of the personal data store, 2010

The Locker Project, http://lockerproject.org/

Qiy Foundation, www.qiyfoundation.org/

Personal, www.personal.com

KuppingerCole, http://www.kuppingercole.com/report/advisorylifemanagementplatforms7060813412

T. Allard et al.: Secure Personal Data Servers: a Vision Paper. PVLDB 3(1): 25-35 (2010)

Giesecke & Devrient, “Portable Security Token”, http://www.gd-sfs.com/portable-security-token

Eurosmart. Smart USB token. White paper, Eurosmart, 2008, (10p)

ARM-TrustZone, http://www.arm.com/products/processors/technologies/trustzone.php

N. Anciaux, P. Bonnet, L. Bouganim, B. Nguyen, I. Sandu Popa, P. Pucheral. Trusted Cells: A Sea Change for Personal Data Services, in "6th Biennial Conference on Innovative Database Research (CIDR)", Asilomar, USA, 2013

PART II: Resource constrained data management (1/4)

Smart card security

[SC02] Witteman, M. (2002). Advances in smartcard security. Information Security Bulletin, 7(2002), 11-22.

Flash aware indexes

[TECS07] Wu, C. H., Kuo, T. W., & Chang, L. P. (2007). An efficient B-tree layer implementation for flash-memory storage systems. ACM Transactions on Embedded Computing Systems (TECS), 6(3), 19.

[VLDB09] Agrawal, D., Ganesan, D., Sitaraman, R., Diao, Y., & Singh, S. (2009). Lazy-adaptive tree: An optimized index structure for flash devices. Proceedings of the VLDB Endowment, 2(1), 361-372.

[VLDB10] Li, Y., He, B., Yang, R. J., Luo, Q., & Yi, K. (2010). Tree indexing on solid state drives. Proceedings of the VLDB Endowment, 3(1-2), 1195-1206.

PART II: Resource constrained data management (2/4)

Flash aware key-value stores

[SIG11] Debnath, B., Sengupta, S., & Li, J. (2011, June). SkimpyStash: RAM space skimpy key-value store on flash-based storage. In Proceedings of the 2011 international conference on Management of data (pp. 25-36). ACM.

[VLDB12] Vo, H. T., Wang, S., Agrawal, D., Chen, G., & Ooi, B. C. (2012). LogBase: a scalable log-structured database system in the cloud. Proceedings of the VLDB Endowment, 5(10), 1004-1015.

[SOSP11] Lim, H., Fan, B., Andersen, D. G., & Kaminsky, M. (2011, October). SILT: A memory-efficient, high-performance key-value store. In Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles (pp. 1-13). ACM.

PART II: Resource constrained data management (3/4)

DBMS on-chip

[VLDBJ01] Pucheral, P., Bouganim, L., Valduriez, P., & Bobineau, C. (2001). PicoDBMS: Scaling down database techniques for the smartcard. The VLDB Journal, 10(2-3), 120-132.

[TOIS03] Bolchini, C., Salice, F., Schreiber, F. A., & Tanca, L. (2003). Logical and physical design issues for smart card databases. ACM Transactions on Information Systems (TOIS), 21(3), 254-285.

[SIG07] Anciaux, N., Benzine, M., Bouganim, L., Pucheral, P., & Shasha, D. (2007, June). GhostDB: querying visible and hidden data without leaks. In Proceedings of the 2007 ACM SIGMOD international conference on Management of data (pp. 677-688). ACM.

[IS12] Yin, S., & Pucheral, P. (2012). PBFilter: A flash-based indexing scheme for embedded systems. Information Systems.

PART II: Resource constrained data management (4/4)

DBMS on-chip (cont.)

[DAPD13] Anciaux, N., Bouganim, L., Pucheral, P., Guo, Y., Le Folgoc, L., & Yin, S. (2013). MILo-DB: a personal, secure and portable database machine. Distributed and Parallel Databases, 1-27.

Search engines on-chip

[TSN08] Yap, K. K., Srinivasan, V., & Motani, M. (2008). Max: Wide area human-centric search of the physical world. ACM Transactions on Sensor Networks (TOSN), 4(4), 26.

[TPDS10] Wang, H., Tan, C. C., & Li, Q. (2010). Snoogle: A search engine for pervasive environments. IEEE Transactions on Parallel and Distributed Systems, 21(8), 1188-1202.

[TECS10] Tan, C. C., Sheng, B., Wang, H., & Li, Q. (2010). Microsearch: A search engine for embedded devices used in pervasive computing. ACM Transactions on Embedded Computing Systems (TECS), 9(4).

PART III: references (incomplete)

[ANP13] Allard, T., Nguyen, B., Pucheral, P.: MetaP: Revisiting Privacy-Preserving Data Publishing using Secure Devices, in DAPD, 55p, to appear.

[CKV+02] Clifton, C., Kantarcioglu, M., Vaidya, J., Lin, X., Zhu, M.Y.: Tools for privacy preserving distributed data mining. SIGKDD Explor. Newsl., vol. 4, pages 28-34, ACM, New York, NY, USA, (2002)

[FPS+11] Fischlin, M., Pinkas, B., Sadeghi, A-R., Schneider, T., Visconti, I.: Secure set intersection with untrusted hardware tokens. In CT-RSA, (2011).

[Gent09] Gentry, C.: Fully Homomorphic Encryption Using Ideal Lattices. In STOC, (2009)

[GIS+10] Goyal, V., Ishai, Y., Sahai, A., Venkatesan R., Wadia, A.: Founding Cryptography on Tamper-Proof Hardware Tokens. Theory of Cryptography, pp 308-326, (2010)

[GMW87] Goldreich, O., Micali, S., Wigderson, A.: How to play ANY mental game. In ACM STOC, pp 218-229, New York, NY, USA, (1987)

[HILM02] Hacigumus, H., Iyer, B., Li, C., Mehrotra, S.: Executing SQL over encrypted data in database service provider model. ACM SIGMOD, pp. 216-227. Wisconsin (2002)

[HIM04] Hacigumus, H., Iyer, B. R., Mehrotra, S.: Efficient execution of aggregation queries over encrypted relational databases. DASFAA, pp. 125-136. Korea (2004)

[HL08] Hazay, C., Lindell, Y.: Constructions of truly practical secure protocols using standard smartcards. In ACM CCS, New York, NY, USA (2008)

PART III: references

[JKSS10] Jarvinen, K., Kolesnikov, V., Sadeghi A-R., Schneider, T.: Embedded SFE: Offloading Server and Network Using Hardware Tokens. In Financial Cryptography and Data Security (2010)

[Katz07] Katz, J.: Universally Composable Multi-party Computation Using Tamper-Proof Hardware. In Advances in Cryptology, EUROCRYPT '07, pp 115-128, (2007)

[Yao82] Yao, A.C.: Protocols for secure computations. In Annual Symposium on Foundations of Computer Science, FOCS, pp 160-164, Washington, DC, USA, (1982)

[Yao86] Yao, A.C.: How to generate and exchange secrets. In Annual Symposium on Foundations of Computer Science, FOCS, pp 162-167, Washington, DC, USA, (1986)