Sensitive Data In a Wired World Negative Representations of Data Stephanie Forrest Dept. of Computer Science Univ. of New Mexico Albuquerque, NM http://cs.unm.edu/~forrest [email protected]

Post on 18-Dec-2015


Sensitive Data In a Wired World
Negative Representations of Data

Stephanie Forrest
Dept. of Computer Science
Univ. of New Mexico
Albuquerque, NM

http://cs.unm.edu/~forrest

[email protected]


Introduction

• Goal: Develop new approaches to data security and privacy that incorporate design principles from living systems:
  – Survivability and evolvability
  – Autonomy
  – Robustness, adaptation, and self-repair
  – Diversity
• Extends earlier work on computational properties of the immune system:
  – Intrusion detection
  – Automated response
  – Collaborative information filtering


Project Overview

• Immunology and data:
  – Negative representations of information
• Epidemiology and the Internet:
  – Social networks matter
  – The real world is not always scale free
• The social utility of privacy:
  – Why is privacy an important value in democratic societies?
  – Evolutionary perspective


Collaborations

• Paul Helman and Cris Moore (UNM)
• Robert Axelrod and Mark Newman (Univ. Michigan)
• Matthew Williamson (Sana Security)
• Rebecca Wright and Michael de Mare (Stevens)
• Joan Feigenbaum and Avi Silberschatz (Yale)
  – Fernando Esponda’s post-doc next year.


How the Immune System Distributes Detection

• Advantages of distributed negative detection:
  – Localized (no communication costs)
  – Scalable and tunable
  – Robust (no single point of failure)
  – Private

• Many small detectors matching nonself (negative detection).

• Each detector matches multiple patterns (generalization).


Applications to Computing

• Anomaly detectors (earlier work)
• Information filters (earlier work)
• Adaptive queries (future)
• Negative representations (in progress)
  – A positive set DB is a set of fixed-length strings.
  – A negative set NDB represents all the strings not in DB.
  – Intuition: If an adversary obtains a string from NDB, little information is revealed.

Example:
  – U = all possible four-character strings
  – DB = {juan, eric, dave}
  – U-DB = {aaaa, aaab, cris, john, luca, raul, tehj, tosh, …}
  – There are 26^4 − 3 = 456,973 strings in U-DB.
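The count in the example can be checked directly. The sketch below assumes that U is the set of four-character strings over the 26 lowercase letters (the slide does not name the alphabet, but 26^4 matches its total). The function name `in_negative_set` is illustrative only.

```python
import string

ALPHABET = string.ascii_lowercase   # assumption: 26 lowercase letters
LENGTH = 4
DB = {"juan", "eric", "dave"}

universe_size = len(ALPHABET) ** LENGTH    # |U| = 26^4
negative_size = universe_size - len(DB)    # |U - DB|


def in_negative_set(s):
    """True if s is a valid string of U that does not appear in DB."""
    valid = len(s) == LENGTH and all(c in ALPHABET for c in s)
    return valid and s not in DB


print(universe_size, negative_size)   # 456976 456973
print(in_negative_set("cris"))        # True
print(in_negative_set("juan"))        # False
```

Listing U-DB explicitly is clearly impractical even at this size, which is why a compressed representation is needed.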


Results

• Can U-DB be represented efficiently, given |U-DB| >> |DB|?
  – YES: There is an algorithm that creates an NDB whose size is polynomial in the size of DB.
  – Strategy: Compress information using a don’t-care symbol (*). Other representations?
• What properties does the representation have?
  – Membership queries are tractable (linear time even without indexing).
• Other queries and information leakage are future work.
  – Inferring information from a subset of NDB (next slide).
  – Inferring DB from NDB is NP-hard (note: not doing crypto):
    • Currently investigating instance difficulty.
    • Algorithms for increasing instance difficulty.
    • On-line insert/delete algorithms preserve problem difficulty.
    • Collaborations with R. Wright, M. de Mare, and C. Moore.
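One simple way to obtain a compact NDB is a prefix-based construction, sketched below for binary strings. This is an illustrative assumption, not necessarily the algorithm the slide refers to: for every prefix that occurs in DB, it emits one record for each one-bit extension that does not occur, padded with don't-care symbols. The function name `prefix_ndb` is hypothetical.

```python
def prefix_ndb(db, length):
    """Build NDB records (strings over 0, 1, *) that cover exactly U - DB."""
    ndb = []
    prefixes = {""}                       # prefixes of DB strings of length i-1
    for i in range(1, length + 1):
        present = {s[:i] for s in db}     # length-i prefixes that occur in DB
        for p in sorted(prefixes):
            for bit in "01":
                if p + bit not in present:
                    # No DB string starts with p+bit, so every string with
                    # that prefix belongs to U - DB.
                    ndb.append(p + bit + "*" * (length - i))
        prefixes = present
    return ndb


print(prefix_ndb({"000", "101", "111"}, 3))   # ['01*', '001', '100', '110']
```

Each prefix of a DB string has at least one extension in DB, so each level adds at most |DB| records and the result has at most length × |DB| records. The output differs from the NDB in the slide's table, which shows that a given U-DB has more than one negative representation.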

DB     U-DB    NDB
000    001     01*
101    010     0*1
111    011     1*0
       100
       110
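The table can be checked mechanically. The sketch below (helper names are illustrative) treats `*` as a don't-care position, answers the membership query "is x in DB?" by scanning the NDB records once, and confirms that the three NDB records cover exactly the five strings of U-DB.

```python
from itertools import product

DB = {"000", "101", "111"}
NDB = ["01*", "0*1", "1*0"]


def matches(record, s):
    """A record matches s if every specified (non-*) position agrees."""
    return all(r == "*" or r == c for r, c in zip(record, s))


def in_db(s, ndb):
    """s is in DB exactly when no NDB record matches it (one linear scan)."""
    return not any(matches(record, s) for record in ndb)


universe = ["".join(bits) for bits in product("01", repeat=3)]
recovered = {s for s in universe if in_db(s, NDB)}
print(sorted(recovered))   # ['000', '101', '111']
```

Enumerating the universe is only feasible for toy lengths; for realistic string lengths, recovering DB this way is exactly the hard problem mentioned on the slide.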


What information is revealed by queries? (without assuming irreversibility)

• Having access to a subset of NDB (or DB) yields some information about strings outside that subset:
  – Assume NDB (or DB) is partitioned into n subsets.
• To the query “Is x in DB,” what do I learn about x if x is not in my subset?
  – With NDB: must consult all n subsets to conclude that x is in DB.
  – With DB: must consult the subsets only until x is found (on average n/2).
  – Assumes that we care more about DB than U-DB.
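The n/2 figure follows from a short expectation. As an assumption not stated on the slide, suppose x is in DB, is equally likely to lie in any one of the n subsets, and the subsets are consulted one at a time in a fixed order:

```latex
\[
  E[\text{subsets consulted}]
    = \sum_{i=1}^{n} i \cdot \frac{1}{n}
    = \frac{n+1}{2}
    \approx \frac{n}{2}
\]
```

With the negative representation there is no such early exit for members of DB: x is confirmed to be in DB only after all n subsets of NDB fail to match it.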

Figure: Probability and information content as the membership of strings is revealed. DB contains 10% of all possible L-length strings (formulas are on the Probabilities slide in the supplementary material).


Private Set Intersection

• Determine which records are in the intersection of several databases, i.e.,
  – $DB_1 \cap DB_2 \cap \dots \cap DB_n = U - (NDB_1 \cup NDB_2 \cup \dots \cup NDB_n)$
• Each party may compute the intersection:
  – $DB_i - (NDB_1 \cup NDB_2 \cup \dots \cup NDB_n)$
• Party i learns only the intersection of all the sets,
• and not the cardinality of the other sets.
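The set identity behind this slide can be illustrated with a small sketch. The databases, the hand-written NDBs, and the function names below are illustrative assumptions: each party keeps its own DB, sees only the other parties' NDBs, and keeps those of its records that no NDB matches.

```python
def matches(record, s):
    """A record matches s if every specified (non-*) position agrees."""
    return all(r == "*" or r == c for r, c in zip(record, s))


def in_complement(s, ndb):
    """True if s is represented by ndb, i.e. s is NOT in the underlying DB."""
    return any(matches(record, s) for record in ndb)


def private_intersection(my_db, all_ndbs):
    """Records of my_db that are excluded by none of the negative databases."""
    return {s for s in my_db
            if not any(in_complement(s, ndb) for ndb in all_ndbs)}


# Toy data: NDB1 represents U - DB1 and NDB2 represents U - DB2.
DB1, NDB1 = {"000", "101", "111"}, ["01*", "0*1", "1*0"]
DB2, NDB2 = {"101", "110"}, ["0**", "100", "111"]

print(private_intersection(DB1, [NDB1, NDB2]))   # {'101'}
print(private_intersection(DB2, [NDB1, NDB2]))   # {'101'}
```

This shows only that the computation yields the intersection. The privacy properties claimed on the slide depend on the NDBs being hard to reverse, which three-bit examples obviously are not.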


Results cont.

• How might these properties be useful?
  – Protect data from insider attacks
  – Computing set intersections
  – Surveys involving sensitive information
  – Anonymous digital credentials
  – Fingerprint databases
  – Other ideas?
• Prototype implementations:
  – Perl, C
  – http://esa.ackleyshack.com/ndb
  – See demo


Computer Epidemiology
Justin Balthrop, Mark Newman, Matt Williamson

• Information spreads over networks of social contacts between computers:
  – Email address books.
  – URL links.
• Network topology affects the rate and extent of spreading:
  – Epidemiological models, and the epidemic threshold.
• Controlling spread on scale-free networks:
  – Random vaccination is ineffective (e.g., anti-virus software).
  – Targeted vaccination of high-connectivity nodes.
  – Control degree distribution in time rather than space.

[Figure: degree distributions (horizontal axis: degree k) for four networks: IP network, administrator network, email traffic, and address books.]

Science 304:527-529 (2004)
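The contrast between random and targeted vaccination can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the model from the cited paper: a preferential-attachment graph stands in for a scale-free network, and the size of the largest connected component among unvaccinated nodes serves as a crude proxy for how far an infection could spread. Graph size, vaccination fraction, and function names are all illustrative.

```python
import random
from collections import defaultdict


def preferential_attachment_graph(n, m, rng):
    """Grow a graph in which each new node links to m distinct existing nodes
    chosen with probability proportional to their current degree."""
    adj = defaultdict(set)
    pool = []                      # each node appears once per incident edge
    for u in range(m + 1):         # seed: a clique on m+1 nodes
        for v in range(u + 1, m + 1):
            adj[u].add(v)
            adj[v].add(u)
            pool += [u, v]
    for new in range(m + 1, n):
        chosen = set()
        while len(chosen) < m:
            chosen.add(rng.choice(pool))
        for t in chosen:
            adj[new].add(t)
            adj[t].add(new)
            pool += [new, t]
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component after deleting `removed`."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            u = stack.pop()
            size += 1
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        best = max(best, size)
    return best


rng = random.Random(0)
graph = preferential_attachment_graph(2000, 2, rng)
k = 400                                            # vaccinate 20% of the nodes
random_nodes = set(rng.sample(sorted(graph), k))
hub_nodes = set(sorted(graph, key=lambda u: len(graph[u]), reverse=True)[:k])

size_random = largest_component(graph, random_nodes)
size_targeted = largest_component(graph, hub_nodes)
print("largest component, random vaccination:  ", size_random)
print("largest component, targeted vaccination:", size_targeted)
```

Removing the highest-degree nodes should fragment the graph far more than removing the same number of random nodes, which is the intuition behind targeting high-connectivity nodes.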


The Social Utility of Privacy
Robert Axelrod and Ryan Gerety

• Typical framing:
  – Privacy values should remain as is (e.g., Lessig).
  – Individual rights vs. state (i.e., civil liberties vs. community safety / crime).
• A community may have its own interest in defending individual privacy (or not), independent of the civil liberties argument:
  – To promote innovation in changing environments.
  – To cope with distortions (e.g., overconfidence of middle managers).
  – To compensate for overgeneralized norms.
• Not necessarily advocating more privacy:
  – From a societal/informational point of view, how should appropriate bounds on privacy be determined?
• Current status:
  – Exploratory modeling based on simple games.


Next Steps: Negative Representations

• Distributed negative representations
• Leaking partial information
• Relational algebra operators on the negative database:
  – Select, join, etc.
• Instance difficulty:
  – Hiding given satisfying assignments in a SAT formula
  – Approximate representations
  – Other representations?
• More realistic implementations
• Negative data mining:
  – Is it easier/harder to find certain instances in NDB?
• Imprecise representations:
  – Partial matching and queries
  – Learning algorithms


People

Stephanie Forrest

Elena Ackley

Fernando Esponda

Paul Helman


Publications

• F. Esponda, S. Forrest, and P. Helman. "Negative representations of information." International Journal of Information Security (submitted March 2005).

• F. Esponda, E. S. Ackley, S. Forrest, and P. Helman. "On-line negative databases." Journal of Unconventional Computing (in press).

• F. Esponda, S. Forrest, and P. Helman. "A formal framework for positive and negative detection." IEEE Transactions on Systems, Man, and Cybernetics 34:1, pp. 357-373 (2004).

• J. Balthrop, S. Forrest, M. Newman, and M. Williamson. "Technological networks and the spread of computer viruses." Science 304:527-529 (2004).

• H. Inoue and S. Forrest. "Inferring Java security policies through dynamic sandboxing." 2005 International Conference on Programming Languages and Compilers (PLC'05) (in press).

• F. Esponda, E. Ackley, S. Forrest, and P. Helman. "On-line negative databases." Third International Conference on Artificial Immune Systems (ICARIS), Best paper award (2004).


SUPPLEMENTARY MATERIAL


Probabilities

\[ F_1 = P(x \in DB \mid x \notin NDB_{f_j}) = \frac{|DB|}{|U| - |NDB_{f_j}|} \]

\[ F_2 = P(x \in DB \mid x \notin DB_{f_j}) = \frac{|DB| - |DB_{f_j}|}{|U| - |DB_{f_j}|} \]

\[ H_N(x) = -F_1 \log_2 F_1 - (1 - F_1) \log_2 (1 - F_1) \]

\[ H_P(x) = -F_2 \log_2 F_2 - (1 - F_2) \log_2 (1 - F_2) \]
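A small numerical sketch makes the formulas above concrete. The sizes used (|U| = 1000, |DB| = 100, matching the 10% figure from the earlier slide) are hypothetical, and `ndb_seen` is assumed to mean the number of strings of U-DB covered by the observed NDB subset; the slide does not spell this out.

```python
import math


def binary_entropy(p):
    """Entropy in bits of a yes/no outcome with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def f1(db_size, u_size, ndb_seen):
    """P(x in DB | x not in the observed part of NDB)."""
    return db_size / (u_size - ndb_seen)


def f2(db_size, u_size, db_seen):
    """P(x in DB | x not in the observed part of DB)."""
    return (db_size - db_seen) / (u_size - db_seen)


U_SIZE, DB_SIZE = 1000, 100     # hypothetical sizes: DB is 10% of U

for seen in (0, 450, 900):      # strings of U - DB revealed via NDB
    p = f1(DB_SIZE, U_SIZE, seen)
    print(f"NDB seen={seen:3d}  F1={p:.3f}  H_N={binary_entropy(p):.3f}")

for seen in (0, 50, 100):       # strings of DB revealed directly
    p = f2(DB_SIZE, U_SIZE, seen)
    print(f"DB  seen={seen:3d}  F2={p:.3f}  H_P={binary_entropy(p):.3f}")
```

In both cases the uncertainty about an unseen string falls to zero once the whole representation has been observed: F1 reaches 1 when all of U-DB is known, and F2 reaches 0 when all of DB is known.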


Generating Hard-to-Reverse Negative Databases

• The randomized algorithm can be used to create a negative database.

• Insert/Delete operations turn known hard formulas into negative databases.

• The Morph operator may be used to search for hard instances.

[Figure, first panel: Instance Difficulty (l=64). Horizontal axis: specified bits per record (k-SAT), 1 to 8. Vertical axis: decisions (zchaff).]

[Figure, second panel: Instance Difficulty (Glassy8 formula, l=64). Same axes. Series: Original NDB and Updated NDB.]

H. Jia, C. Moore, and B. Selman. "From spin glasses to hard satisfiable formulas." SAT 2004.
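The link between negative databases and SAT can be made explicit with a short sketch (function names are illustrative). Each NDB record becomes one clause that is falsified exactly by the strings the record matches, so a record with k specified bits yields a clause with k literals, and DB is the set of satisfying assignments of the resulting formula.

```python
from itertools import product


def record_to_clause(record):
    """Translate an NDB record into a clause of signed 1-based variable
    indices (DIMACS style): positive means x_i, negative means NOT x_i."""
    clause = []
    for i, bit in enumerate(record, start=1):
        if bit == "1":
            clause.append(-i)   # record needs x_i = 1, so the clause asks for NOT x_i
        elif bit == "0":
            clause.append(i)    # record needs x_i = 0, so the clause asks for x_i
    return clause


def satisfies(assignment, clause):
    """assignment is a bit string; literal i holds when bit i matches its sign."""
    return any((assignment[abs(lit) - 1] == "1") == (lit > 0) for lit in clause)


NDB = ["01*", "0*1", "1*0"]
formula = [record_to_clause(r) for r in NDB]
print(formula)   # [[1, -2], [1, -3], [-1, 3]]

solutions = {"".join(bits) for bits in product("01", repeat=3)
             if all(satisfies("".join(bits), c) for c in formula)}
print(sorted(solutions))   # ['000', '101', '111']
```

Read in the other direction, a known hard satisfiable formula can be written down as a set of NDB records, which is how hard formulas relate to hard-to-reverse negative databases.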


Effect of the Morph operation

• The Morph operation takes as input a negative database NDB and outputs NDB’ that represents the same set U-DB.

• The plot shows how the complexity of a database changes after applying the morph operator.
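The slide does not define the Morph operator itself. As a stand-in, the sketch below applies one very simple representation-preserving rewrite (splitting a record on a don't-care position) and checks that the rewritten NDB still represents the same set U-DB. All function names are hypothetical.

```python
from itertools import product


def matches(record, s):
    """A record matches s if every specified (non-*) position agrees."""
    return all(r == "*" or r == c for r, c in zip(record, s))


def represented_set(ndb, length):
    """All binary strings of the given length matched by some record."""
    universe = ("".join(bits) for bits in product("01", repeat=length))
    return {s for s in universe if any(matches(r, s) for r in ndb)}


def expand_one(record):
    """Split a record on its first don't-care position into two records."""
    i = record.find("*")
    if i < 0:
        return [record]
    return [record[:i] + b + record[i + 1:] for b in "01"]


NDB = ["01*", "0*1", "1*0"]
rewritten = expand_one(NDB[0]) + NDB[1:]
print(rewritten)   # ['010', '011', '0*1', '1*0']
print(represented_set(NDB, 3) == represented_set(rewritten, 3))   # True
```

Checking equivalence by enumeration works only for toy sizes; it is used here just to confirm that the rewrite leaves the represented set unchanged.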