privacy-preserving scientific data analysis in an open cloud c2d: conclave cloud dataverse ·...

13
C2D: Conclave Cloud Dataverse Privacy-Preserving Scientific Data Analysis in an Open Cloud Mayank Varia, Andrei Lapets, Ata Turk, Orran Krieger, Robert Bartlett Baron, Ben Getchell, Nicolas Haddad, Parul Singh

Upload: others

Post on 24-Jul-2020

2 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Privacy-Preserving Scientific Data Analysis in an Open Cloud C2D: Conclave Cloud Dataverse · 2018-10-01 · •Capability to manage slack resources on MOC ... Dataverse Mass Open

C2D: Conclave Cloud Dataverse

Privacy-Preserving Scientific Data Analysisin an Open Cloud

Mayank Varia, Andrei Lapets, Ata Turk, Orran Krieger, Robert Bartlett Baron, Ben Getchell, Nicolas Haddad, Parul Singh

Page 2: Privacy-Preserving Scientific Data Analysis in an Open Cloud C2D: Conclave Cloud Dataverse · 2018-10-01 · •Capability to manage slack resources on MOC ... Dataverse Mass Open

Data Utility vs Data Privacy

• Companies in MA want to compute average salary differences across genders, ethnicities, ... without exposing average salary of any company

• Tier-1 trauma centers in Boston want to generate aggregate reports about cases they service without revealing any patient data• E.g. how many trauma cases they serviced during the marathon bombing

• Researchers in hospitals want to generate aggregate statistics about rare diseases across multiple hospitals without revealing patient data

• Companies want to run data analytics in the public cloud but do not trust a single public cloud provider

Page 3: Privacy-Preserving Scientific Data Analysis in an Open Cloud C2D: Conclave Cloud Dataverse · 2018-10-01 · •Capability to manage slack resources on MOC ... Dataverse Mass Open

Privacy-Preserving Scientific Data Analysis in an Open Cloud

Mass Open Cloud

• Multi-vendor public cloud datacenter

• Collaborative effort: 5 universities, government, industry

Page 4: Privacy-Preserving Scientific Data Analysis in an Open Cloud C2D: Conclave Cloud Dataverse · 2018-10-01 · •Capability to manage slack resources on MOC ... Dataverse Mass Open

Privacy-Preserving Scientific Data Analysis in an Open Cloud

Dataverse Mass Open Cloud

• Open-source platform for data repositories

• Mechanisms to control access

• Incentives to share and credit use of data

Page 5: Privacy-Preserving Scientific Data Analysis in an Open Cloud C2D: Conclave Cloud Dataverse · 2018-10-01 · •Capability to manage slack resources on MOC ... Dataverse Mass Open

Dataverse Mass Open Cloud

Conclave (MPC)

Privacy-Preserving Scientific Data Analysis in an Open Cloud

Page 6: Privacy-Preserving Scientific Data Analysis in an Open Cloud C2D: Conclave Cloud Dataverse · 2018-10-01 · •Capability to manage slack resources on MOC ... Dataverse Mass Open

Images: Facebook, Wikipedia

Toxicsilo data → safeguard privacy

Valuableshare data → new social insights

Page 7: Privacy-Preserving Scientific Data Analysis in an Open Cloud C2D: Conclave Cloud Dataverse · 2018-10-01 · •Capability to manage slack resources on MOC ... Dataverse Mass Open

Toxicsilo data → safeguard privacy

Valuableshare data → new social insights

MPC enables secure data analysis for social good

and

Page 8: Privacy-Preserving Scientific Data Analysis in an Open Cloud C2D: Conclave Cloud Dataverse · 2018-10-01 · •Capability to manage slack resources on MOC ... Dataverse Mass Open

Conclave: MPC for relational queries on big data

MPC query compilation from (unannotated) relational queries

• Static analysis to minimize MPC use while maintaining security

• Trust annotations to indicate when data sharing in the clear is acceptable for even better performance

Prototype implementation that:• Connects to existing backend data stacks like Spark and Hadoop

• Scales 4 magnitudes higher than most MPC engines (~100 GB range)

Code at https://github.com/cici-conclave

Page 9: Privacy-Preserving Scientific Data Analysis in an Open Cloud C2D: Conclave Cloud Dataverse · 2018-10-01 · •Capability to manage slack resources on MOC ... Dataverse Mass Open

• C2D framework runs on containers

• Each container stores data owned by a single project

• Containers never share data with each other

• Built an OpenShift / K8s container orchestration product with

• In-built job framework

• Capability to manage slack resources on MOC

• Integrate with Elastic Secure Infrastructure to build trusted secure bare-metal enclaves for parties is ongoing

• Demo video at https://youtu.be/_vEJmd_rO-0

The C2D framework

Page 10: Privacy-Preserving Scientific Data Analysis in an Open Cloud C2D: Conclave Cloud Dataverse · 2018-10-01 · •Capability to manage slack resources on MOC ... Dataverse Mass Open

Dataverse Mass Open Cloud

Conclave (MPC)

Benefits of integration

Benefit:Bring cryptographically secure computing to where the data live

Page 11: Privacy-Preserving Scientific Data Analysis in an Open Cloud C2D: Conclave Cloud Dataverse · 2018-10-01 · •Capability to manage slack resources on MOC ... Dataverse Mass Open

Dataverse Mass Open Cloud

Conclave (MPC)

Benefits of integration

Benefit:Leverage unique open cloud environment to improve performance

Page 12: Privacy-Preserving Scientific Data Analysis in an Open Cloud C2D: Conclave Cloud Dataverse · 2018-10-01 · •Capability to manage slack resources on MOC ... Dataverse Mass Open

Dataverse Mass Open Cloud

Conclave (MPC)

Benefits of integration

Synergistic payoff: Separate the responsibilities and amortize the effort of each expert (developers, IT staff, privacy experts, etc.)

Page 13: Privacy-Preserving Scientific Data Analysis in an Open Cloud C2D: Conclave Cloud Dataverse · 2018-10-01 · •Capability to manage slack resources on MOC ... Dataverse Mass Open

Thanks!

Conclave