
SECURITY AND PRIVACY OF BIG DATA:

A NIST WORKING GROUP PERSPECTIVE

Arnab Roy

Fujitsu Laboratories of America

Co-Chair, Security and Privacy Subgroup

NIST Big Data Working Group


A 10000-FEET VIEW

Big Data

Scaling
• Volume
• Velocity

Mixing
• Variety
• Veracity

EMERGENT S&P CONSIDERATIONS

(Big) Scaling – Retarget to Big Data infrastructural shift
• Distributed computing platforms like Hadoop
• Non-relational data stores

(Data) Mixing – Control visibility while enabling utility
• Balancing privacy and utility
• Enabling analytics and governance on encrypted data
• Reconciling authentication and anonymity


CONCEPTUAL TAXONOMY OF S&P TOPICS

Big Data S&P
• Privacy
• Provenance
• System Health

Big Data S&P
• Privacy
  – Communication Privacy
  – Data Visibility Control
  – Privacy Preserving Dissemination
• Provenance
• System Health

[Figure: the topics above placed on the Big Data reference architecture: System Orchestrator; Data Provider; Big Data Application Provider (Visualization, Access, Analytics, Curation, Collection); Big Data Framework Provider (Processing Frameworks; Platforms; Infrastructures; Physical and Virtual Resources; each horizontally or vertically scalable); Data Consumer; DATA and SW flows between components]

Big Data S&P
• Privacy
• Provenance
  – Communication Integrity
  – Metadata Tracking
  – End-Point Input Validation
  – Control of Valuable Assets
• System Health

[Figure: the topics above placed on the Big Data reference architecture diagram (System Orchestrator, Data Provider, Big Data Application Provider, Big Data Framework Provider, Data Consumer)]

Big Data S&P
• Privacy
• Provenance
• System Health
  – Big Data Analytics for Security Intelligence
  – Resistance to DoS (shown at two places in the figure)
  – Forensics

[Figure: the topics above placed on the Big Data reference architecture diagram (System Orchestrator, Data Provider, Big Data Application Provider, Big Data Framework Provider, Data Consumer)]

OPERATIONAL TAXONOMY OF S&P TOPICS

Big Data S&P
• Device and Application Registration
• Identity and Access Management
• Data Governance
• Infrastructure Management
• Risk and Accountability

Big Data S&P
• Device and Application Registration
  – Registration of Devices, Users
  – Registration of Applications
  – Security Metadata Model
  – Policy Enforcement and Audit
• Identity and Access Management
• Data Governance
• Infrastructure Management
• Risk and Accountability

[Figure: the topics above placed on the Big Data reference architecture diagram (System Orchestrator, Data Provider, Big Data Application Provider, Big Data Framework Provider, Data Consumer)]

Big Data S&P
• Device and Application Registration
• Identity and Access Management
  – Virtualization Layer Identity
  – Application Layer Identity
  – Identity Providers
• Data Governance
• Infrastructure Management
• Risk and Accountability

[Figure: the topics above placed on the Big Data reference architecture diagram (System Orchestrator, Data Provider, Big Data Application Provider, Big Data Framework Provider, Data Consumer)]

Big Data S&P
• Device and Application Registration
• Identity and Access Management
• Data Governance
  – Digital Rights Management
  – Isolation, Storage Security
  – Data Loss Prevention and Detection
  – Data Lifecycle Management
  – Encryption and Key Management
• Infrastructure Management
• Risk and Accountability

[Figure: the topics above placed on the Big Data reference architecture diagram (System Orchestrator, Data Provider, Big Data Application Provider, Big Data Framework Provider, Data Consumer)]

Big Data S&P
• Device and Application Registration
• Identity and Access Management
• Data Governance
• Infrastructure Management
  – Configuration Management
  – Logging
  – Network Boundary Control
  – Threat and Vulnerability Management
  – Malware Surveillance and Remediation
• Risk and Accountability

[Figure: the topics above placed on the Big Data reference architecture diagram (System Orchestrator, Data Provider, Big Data Application Provider, Big Data Framework Provider, Data Consumer)]

Big Data S&P
• Device and Application Registration
• Identity and Access Management
• Data Governance
• Infrastructure Management
• Risk and Accountability
  – Compliance
  – Forensics
  – Accountability
  – Business Risk Model

[Figure: the topics above placed on the Big Data reference architecture diagram (System Orchestrator, Data Provider, Big Data Application Provider, Big Data Framework Provider, Data Consumer)]

EMERGING CRYPTOGRAPHIC TECHNOLOGIES

Utility of Encrypted Client Data                              Technology
No operation possible at Cloud                                Standard Encryption
Controlled results visible at Cloud                           Searchable Encryption (Symmetric, Asymmetric)
Transformations possible, but results not visible to Cloud    Homomorphic Encryption
Policy-based Access Control                                   Identity-based Encryption, Attribute-based Encryption

DATA GOVERNANCE:
POLICY-BASED ENCRYPTION

• Traditionally, access control has been enforced by systems – Operating Systems, Virtual Machines
  – Restrict access to data, based on access policy
  – Data is still in plaintext
  – Systems can be hacked!
  – Security of the same data in transit is ad hoc
• What if we protect the data itself in a cryptographic shell depending on the access policy?
  – Decryption only possible by entities allowed by the policy
  – Keys can be hacked! – but much smaller attack surface
  – Encrypted data can be moved around, as well as kept at rest – uniform handling

IDENTITY-BASED ENCRYPTION

[Figure, two panels. Public-Key Encryption: Certificate Authority, Bob, Alice; labels: Signed Certificate of Public Key (twice), Encrypted Data. Id-based Encryption: Master Authority, Bob, Alice, George; labels: Master Public Key, Encrypted Data.]

POLICY-BASED ENCRYPTION

[Figure: policy-based encryption example. Labels: PK; two secret keys (SK) with the attribute sets "Doctor", "Neurology" and "Nurse", "Physical Therapy"; an access policy tree with nodes OR, Doctor, AND, Nurse, ICU, shown twice. Credit: Mitchell et al.]
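The access check behind the figure can be sketched in a few lines. This is only the policy logic, not the cryptography: a real attribute-based scheme enforces the same check inside decryption. The reading of the flattened policy tree as Doctor OR (Nurse AND ICU) is an assumption, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple, Union


@dataclass(frozen=True)
class Attr:
    name: str


@dataclass(frozen=True)
class And:
    children: Tuple["Policy", ...]


@dataclass(frozen=True)
class Or:
    children: Tuple["Policy", ...]


Policy = Union[Attr, And, Or]


def satisfies(policy: Policy, attributes: FrozenSet[str]) -> bool:
    """Return True if the key's attribute set satisfies the policy tree."""
    if isinstance(policy, Attr):
        return policy.name in attributes
    if isinstance(policy, And):
        return all(satisfies(child, attributes) for child in policy.children)
    if isinstance(policy, Or):
        return any(satisfies(child, attributes) for child in policy.children)
    raise TypeError(f"unknown policy node: {policy!r}")


# Assumed reading of the slide's tree: Doctor OR (Nurse AND ICU).
POLICY = Or((Attr("Doctor"), And((Attr("Nurse"), Attr("ICU")))))

doctor_key = frozenset({"Doctor", "Neurology"})
nurse_key = frozenset({"Nurse", "Physical Therapy"})

print(satisfies(POLICY, doctor_key))  # True: "Doctor" alone satisfies the OR
print(satisfies(POLICY, nurse_key))   # False: "Nurse" is present but "ICU" is not
```

Under this reading, the key carrying "Doctor", "Neurology" could decrypt, while the key carrying "Nurse", "Physical Therapy" could not.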

PRIVACY PRESERVING DISSEMINATION:
SEARCHING AND FILTERING ENCRYPTED DATA

• Suppose you have a system to receive emails encrypted under your public key
• However, you do not want to receive spam
• With plain public-key encryption, there is no way to distinguish a legitimate email ciphertext from a spam ciphertext!
• However, with recent techniques you can do the following:
  – Give a ‘token’ to the spam filter
  – Spam filter can apply the token to the ciphertext, deducing only whether it is spam or not
  – Filter doesn’t get any clue about any other property of the mail!

SEARCHING AND FILTERING ENCRYPTED DATA

[Figure: Encrypter and Decrypter, with labels PK, SK, and Filtering Token]
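The token idea can be illustrated with a symmetric-key analogue built from HMAC. This is a swapped-in technique, not the public-key scheme the slides describe: here the sender must share `master_key` with the recipient, and all function names are hypothetical. The sketch covers only the tag that travels with a ciphertext; encrypting the mail body is omitted.

```python
import hashlib
import hmac
import os


def make_token(master_key: bytes, label: str) -> bytes:
    """Token for one label; it does not help test for any other label."""
    return hmac.new(master_key, label.encode(), hashlib.sha256).digest()


def make_tag(master_key: bytes, label: str) -> tuple:
    """Tag attached to a ciphertext: a fresh nonce and a MAC keyed by the label token."""
    nonce = os.urandom(16)
    token = make_token(master_key, label)
    return nonce, hmac.new(token, nonce, hashlib.sha256).digest()


def matches(token: bytes, tag: tuple) -> bool:
    """Run by the filter: learns only whether the tag was made for the token's label."""
    nonce, mac = tag
    expected = hmac.new(token, nonce, hashlib.sha256).digest()
    return hmac.compare_digest(expected, mac)


master_key = os.urandom(32)
spam_token = make_token(master_key, "spam")   # handed to the spam filter

spam_tag = make_tag(master_key, "spam")
invoice_tag = make_tag(master_key, "invoice")

print(matches(spam_token, spam_tag))      # True
print(matches(spam_token, invoice_tag))   # False
```

Because every tag uses a fresh nonce, two tags for the same label look unrelated to anyone who does not hold the matching token.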

SECURE DATA FILTRATION

Problem Scenario:
• The intelligence gathering community needs to collect a useful subset of huge streaming sources of data
• The criteria for being useful may be classified – private criteria
• Most of the streaming data is useless and storing it all may be impractical – filter at source
• How do we keep the filtering criteria secret even if the filter is executing at the source?

Solution: Obfuscate the filtration code
• Even if the source falls into enemy hands, the enemy cannot figure out the criteria

SECURE DATA FILTRATION

[Figure: Secret Criteria are obfuscated into a Garbled Filter; the Garbled Filter runs in the Cloud over streaming sources (Blogs, Net Traffic, News Feed) and produces Encrypted Filtered Data, which is decrypted into Filtered Data]

Private Searching on Streaming Data - Ostrovsky and Skeith, CRYPTO 2005
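A heavily simplified sketch of how a garbled filter can work, using an additively homomorphic scheme (Paillier) with toy parameters. The "garbled filter" is a dictionary that maps every vocabulary word to an encryption of 1 (secret keyword) or 0 (anything else), so the source cannot tell which words matter. The key size, buffer layout, and all names are assumptions for illustration; the sketch does not resolve collisions between several matching documents, which a complete scheme has to handle.

```python
import math
import secrets

# Toy Paillier key (two known Mersenne primes; far too small for real use).
P, Q = 2**31 - 1, 2**61 - 1
N, N2 = P * Q, (P * Q) ** 2                          # public
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)    # private
MU = pow(LAM, -1, N)                                 # private


def encrypt(m):
    r = secrets.randbelow(N - 1) + 1
    while math.gcd(r, N) != 1:
        r = secrets.randbelow(N - 1) + 1
    return (1 + m * N) * pow(r, N, N2) % N2


def decrypt(c):
    return (pow(c, LAM, N2) - 1) // N * MU % N


def make_filter(vocabulary, secret_keywords):
    """Garbled filter: Enc(1) for secret keywords, Enc(0) for every other word."""
    return {w: encrypt(1 if w in secret_keywords else 0) for w in vocabulary}


def filter_at_source(garbled, documents, slots=8, copies=3):
    """Runs at the untrusted source; uses only the public values N and N2."""
    buffer = [(1, 1)] * slots                   # 1 is a trivial encryption of 0
    for doc_id, text in documents:
        c = 1
        for word in set(text.split()):
            if word in garbled:
                c = c * garbled[word] % N2      # Enc(number of matching keywords k)
        d = pow(c, doc_id, N2)                  # Enc(k * doc_id)
        for _ in range(copies):
            i = secrets.randbelow(slots)
            a, b = buffer[i]
            buffer[i] = (a * c % N2, b * d % N2)
    return buffer


def recover(buffer):
    """Runs at the key owner: non-matching documents contributed only zeros."""
    found = set()
    for a, b in buffer:
        k, kv = decrypt(a), decrypt(b)
        if k and kv % k == 0:
            found.add(kv // k)
    return found


garbled = make_filter(["bomb", "weather", "sports", "budget"], {"budget"})
docs = [(101, "weather sports"), (202, "budget weather"), (303, "sports")]
print(recover(filter_at_source(garbled, docs)))  # {202}
```

Every document updates the buffer in the same way, so an observer at the source sees identical processing whether or not a document matched.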

SINGLE CLIENT END-TO-END PRIVACY:
SECURE OUTSOURCING OF COMPUTATION

• Suppose you want to send all your sensitive data to the cloud: photos, medical records, financial records, …
• You could send everything encrypted
  – But that wouldn’t be much use if you wanted the cloud to perform some computations on the data
  – What if you wanted to see how much you spent on movies last month?
• Solution: Fully Homomorphic Encryption
  – Cloud can perform any computation on the underlying plaintext, all the while the results are encrypted!
  – Cloud has no clue about the plaintext or the results

Secure Outsourcing of Computation

[Figure: Fully Homomorphic Encryption (FHE). Plaintext -> Encrypt -> Ciphertext -> Homomorphic Transformation -> Processed Ciphertext -> Decrypt -> Plaintext]

• With FHE, computation on plaintext can be transformed into computation on ciphertext
• As a use case, a cloud can keep and process customer’s data without ever knowing the contents
  – Only customer can decrypt the processed data
  – End to end security of customer data
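The "how much did I spend on movies" use case needs only addition, so it can be sketched with an additively homomorphic scheme. The example below uses Paillier, which is not fully homomorphic (it supports addition of plaintexts but not arbitrary computation), with toy primes that are far too small for real use. The function names and the amounts are made up for illustration.

```python
import math
import secrets


def keygen():
    # Two known Mersenne primes, chosen only to keep the sketch short.
    p, q = 2**31 - 1, 2**61 - 1
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)
    mu = pow(lam, -1, n)          # valid because the generator is g = n + 1
    return n, (lam, mu)           # (public key, private key)


def encrypt(n, m):
    r = secrets.randbelow(n - 1) + 1
    while math.gcd(r, n) != 1:
        r = secrets.randbelow(n - 1) + 1
    return (1 + m * n) * pow(r, n, n * n) % (n * n)


def decrypt(n, private_key, c):
    lam, mu = private_key
    return (pow(c, lam, n * n) - 1) // n * mu % n


def add_encrypted(n, c1, c2):
    """Run by the cloud: multiplying ciphertexts adds the hidden plaintexts."""
    return c1 * c2 % (n * n)


n, private_key = keygen()

# Client side: encrypt each movie purchase (in cents) before uploading.
purchases = [1250, 899, 1500]
uploaded = [encrypt(n, amount) for amount in purchases]

# Cloud side: sum the ciphertexts without seeing any amount.
encrypted_total = uploaded[0]
for c in uploaded[1:]:
    encrypted_total = add_encrypted(n, encrypted_total, c)

# Client side: only the private key holder can read the result.
print(decrypt(n, private_key, encrypted_total))  # 3649
```

The cloud handles only ciphertexts and the public value `n`; the total becomes readable only after the client decrypts it.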

HOW DOES FHE WORK?

Intuition:
• Represent programs as circuits
  – Sequence of additions and multiplications
• Transform the input data to a high dimensional ring (popularly, lattices)
• Exploit ring homomorphism with respect to +,×

Source: http://cseweb.ucsd.edu/~daniele/lattice/lattice.html
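To make "programs as circuits of additions and multiplications" concrete, the sketch below writes Boolean gates as arithmetic on bits modulo 2 and composes them into a 2-bit equality test. It works on plaintext bits only; the point is that a scheme supporting + and × on encrypted bits could evaluate the same circuit. The function names are hypothetical.

```python
def xor(a, b):
    return (a + b) % 2            # XOR is addition mod 2


def and_(a, b):
    return (a * b) % 2            # AND is multiplication mod 2


def not_(a):
    return (a + 1) % 2            # NOT is adding the constant 1


def or_(a, b):
    return (a + b + a * b) % 2    # OR written with + and x only


def equal_2bit(x, y):
    """x and y are (high, low) bit pairs; returns 1 if they are equal, else 0."""
    same_high = not_(xor(x[0], y[0]))
    same_low = not_(xor(x[1], y[1]))
    return and_(same_high, same_low)


print(equal_2bit((1, 0), (1, 0)))  # 1
print(equal_2bit((1, 0), (1, 1)))  # 0
```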

HOW DOES FHE WORK?

• How do we ensure that the transformed representation hides the plaintext?
  – Solution: add some noise to the representation
  – In sufficiently high dimensions, it is considered hard to derive the closest lattice point, when noise is added
• Now, we have a different problem
  – With each +,×, noise grows!
  – At some point, data may be irrecoverable
  – Solution: noise reduction techniques (Bootstrapping, Modulus switching)

Source: http://cseweb.ucsd.edu/~daniele/lattice/lattice.html
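Noise growth can be observed in a toy scheme over the integers. This is a stand-in for the lattice construction the slide refers to, not that construction itself: a bit is hidden as `bit + 2*r + p*q` for a secret odd `p`, and decryption computes `(c mod p) mod 2`. All parameter sizes and names are assumptions chosen to keep the numbers small, and the scheme is not secure at this size.

```python
import secrets

P_BITS, Q_BITS, NOISE_BITS = 64, 128, 8   # toy sizes


def keygen():
    # Secret key: a random odd integer with the top bit set, so p >= 2**63.
    return secrets.randbits(P_BITS) | (1 << (P_BITS - 1)) | 1


def encrypt(p, bit):
    r = 1 + secrets.randbelow(2**NOISE_BITS - 1)   # fresh noise, 1..255
    q = 1 + secrets.randbits(Q_BITS)
    return bit + 2 * r + p * q


def decrypt(p, c):
    # Correct only while the accumulated noise term stays below p.
    return (c % p) % 2


def noise_bits(p, c):
    return (c % p).bit_length()


p = keygen()

# Adding ciphertexts adds the bits mod 2 (XOR) and adds the noise terms.
print(decrypt(p, encrypt(p, 1) + encrypt(p, 0)))   # 1

# Multiplying ciphertexts multiplies the bits (AND) and multiplies the noise.
ct = encrypt(p, 1)
history = []
for depth in range(1, 11):
    ct = ct * encrypt(p, 1)
    history.append((depth, noise_bits(p, ct), decrypt(p, ct)))

for depth, bits, value in history:
    print(f"depth={depth} noise_bits={bits} decrypted={value}")
```

Each fresh noise term is below 2^9, so a product of up to seven ciphertexts (depth 6) keeps the noise below 2^63 and still decrypts to 1. Beyond that the noise can exceed `p`, after which the decrypted bit is no longer reliable; this is the point at which a noise reduction step would be needed.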



THANKS!