Sensitive Data in a Wired World: Negative Representations of Data
Stephanie Forrest
Dept. of Computer Science
Univ. of New Mexico, Albuquerque, NM
http://cs.unm.edu/~forrest
Introduction
• Goal: Develop new approaches to data security and privacy that incorporate design principles from living systems:
– Survivability and evolvability
– Autonomy
– Robustness, adaptation, and self-repair
– Diversity
• Extends earlier work on computational properties of the immune system:
– Intrusion detection
– Automated response
– Collaborative information filtering
Project Overview
• Immunology and data:
– Negative representations of information
• Epidemiology and the Internet:
– Social networks matter
– The real world is not always scale-free
• The social utility of privacy:
– Why is privacy an important value in democratic societies?
– Evolutionary perspective
Collaborations
• Paul Helman and Cris Moore (UNM)
• Robert Axelrod and Mark Newman (Univ. Michigan)
• Matthew Williamson (Sana Security)
• Rebecca Wright and Michael de Mare (Stevens)
• Joan Feigenbaum and Avi Silberschatz (Yale)
– Fernando Esponda's postdoc, next year
How the Immune System Distributes Detection
• Advantages of distributed negative detection:
– Localized (no communication costs)
– Scalable and tunable
– Robust (no single point of failure)
– Private
• Many small detectors matching nonself (negative detection).
• Each detector matches multiple patterns (generalization).
Applications to Computing
• Anomaly detectors (earlier work)
• Information filters (earlier work)
• Adaptive queries (future)
• Negative representations (in progress)
– A positive set DB is a set of fixed-length strings.
– A negative set NDB represents all the strings not in DB.
– Intuition: If an adversary obtains a string from NDB, little information is revealed.
Example:
– U = all possible four-character strings
– DB = {juan, eric, dave}
– U-DB = {aaaa, aaab, cris, john, luca, raul, tehj, tosh, …}
– There are $26^4 - 3 = 456{,}973$ strings in U-DB.
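The size of U-DB in this example can be checked by brute force. This sketch assumes, as the count $26^4$ implies, that U is built from the 26 lowercase ASCII letters; it enumerates U without storing it and counts the strings outside DB.

```python
from itertools import product
from string import ascii_lowercase

DB = {"juan", "eric", "dave"}

# Assumption: U = every four-character string over the 26 lowercase letters.
universe_size = len(ascii_lowercase) ** 4

# Count U-DB by enumerating U one string at a time (nothing is stored).
complement_size = sum(
    1 for chars in product(ascii_lowercase, repeat=4) if "".join(chars) not in DB
)

print(universe_size, complement_size)  # 456976 456973
```

Even for this tiny DB, listing U-DB explicitly would take hundreds of thousands of entries, which is why the compact representation discussed next matters.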
Results
• Can U-DB be represented efficiently, given |U-DB| >> |DB|?
– YES: There is an algorithm that creates an NDB of size polynomial in |DB|.
– Strategy: Compress information using a don't-care symbol (*). Other representations?
• What properties does the representation have?
– Membership queries are tractable (linear time even without indexing).
• Other queries and information leakage are future work.
– Inferring information from a subset of NDB (next slide).
– Inferring DB from NDB is NP-hard (note: not doing crypto):
• Currently investigating instance difficulty.
• Algorithms for increasing instance difficulty.
• On-line insert/delete algorithms preserve problem difficulty.
• Collaborations with R. Wright, M. de Mare, and C. Moore.
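One simple way to see how the don't-care symbol compresses U-DB is a prefix-based construction for binary strings, sketched below. It is a hypothetical illustration (the function names are made up here), not necessarily the algorithm the slides refer to. For each prefix shared with some DB string, every one-bit extension that no DB string starts with becomes a record padded with `*`. Each such prefix has at least one extension in DB, so a non-empty DB yields at most length × |DB| records.

```python
from itertools import product


def prefix_ndb(db, length):
    """Hypothetical prefix-based NDB builder for binary strings of a fixed length.

    For every prefix that some DB string starts with, each one-bit extension
    that no DB string starts with becomes an NDB record, padded with '*'.
    """
    ndb = []
    present = {""}  # DB prefixes of the previous length (start: empty prefix)
    for i in range(1, length + 1):
        next_present = {s[:i] for s in db}
        for prefix in sorted(present):
            for bit in "01":
                candidate = prefix + bit
                if candidate not in next_present:
                    # Every string starting with candidate lies in U-DB.
                    ndb.append(candidate + "*" * (length - i))
        present = next_present
    return ndb


def expand(record):
    """All binary strings matched by a record; '*' matches either bit."""
    options = ["01" if c == "*" else c for c in record]
    return {"".join(bits) for bits in product(*options)}


db = {"000", "101", "111"}
ndb = prefix_ndb(db, 3)
print(ndb)  # ['01*', '001', '100', '110']
```

For the three-bit DB in the example table, this yields a different NDB from the one shown there, although both cover the same set U-DB. So a negative database is not unique.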
DB    U-DB   NDB
000   001    01*
101   010    0*1
111   011    1*0
      100
      110
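For the three-bit example above, a membership query is a single linear pass over NDB: a string is in DB exactly when no NDB record matches it, where `*` matches either bit. The helper names below are illustrative only.

```python
from itertools import product


def matches(record, x):
    """True if x agrees with record on every specified position ('*' = don't care)."""
    return len(record) == len(x) and all(r in ("*", c) for r, c in zip(record, x))


def in_db(x, ndb):
    """x is in DB exactly when no NDB record matches it: one pass over NDB."""
    return not any(matches(record, x) for record in ndb)


NDB = ["01*", "0*1", "1*0"]
universe = ["".join(bits) for bits in product("01", repeat=3)]
print([x for x in universe if in_db(x, NDB)])  # ['000', '101', '111']
```

The query needs no index. Its cost is proportional to |NDB| times the string length, which matches the "linear time even without indexing" claim.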
What information is revealed by queries? (without assuming irreversibility)
• Having access to a subset of NDB (or DB) yields some information about strings outside that subset:
– Assume NDB (or DB) is partitioned into n subsets.
• To the query "Is x in DB?", what do I learn about x if x is not in my subset?
– Must consult n subsets of NDB to conclude that x is in DB.
– Must consult the subsets only until x is found (on average n/2).
– Assumes that we care more about DB than U-DB.
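The NDB side of this argument can be sketched as follows. A match in any subset proves that x is in U-DB, so the scan can stop early, but concluding that x is in DB requires every one of the n subsets to report no match. The three-way split of the example NDB and the function names are hypothetical.

```python
def matches(record, x):
    """True if x agrees with record on every specified position ('*' = don't care)."""
    return len(record) == len(x) and all(r in ("*", c) for r, c in zip(record, x))


def query_partitioned(x, subsets):
    """Ask each holder of an NDB subset in turn whether it matches x.

    Returns (x_in_db, number_of_subsets_consulted). A match anywhere proves
    x is in U-DB, so the scan stops; concluding x is in DB needs all n.
    """
    for consulted, subset in enumerate(subsets, start=1):
        if any(matches(record, x) for record in subset):
            return False, consulted
    return True, len(subsets)


# Hypothetical n = 3 partition of the example NDB {01*, 0*1, 1*0}.
subsets = [["01*"], ["0*1"], ["1*0"]]
print(query_partitioned("101", subsets))  # (True, 3)
print(query_partitioned("001", subsets))  # (False, 2)
```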
[Figure: Probability and information content as the membership of strings is revealed. DB contains 10% of all possible l-length strings (formulas in the supplementary material).]
Private Set Intersection
• Determine which records are in the intersection of several databases, i.e.,
– $DB_1 \cap DB_2 \cap \cdots \cap DB_n = U - (NDB_1 \cup NDB_2 \cup \cdots \cup NDB_n)$
• Each party may compute the intersection:
– $DB_i - (NDB_1 \cup NDB_2 \cup \cdots \cup NDB_n)$
• Party i learns only the intersection of all the sets,
• and not the cardinality of the other sets.
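The set algebra behind this protocol can be sketched with small hand-built databases. Party i keeps each of its own records that no record in any party's NDB matches, which computes $DB_i - (NDB_1 \cup \cdots \cup NDB_n)$. This sketch covers only the set computation, not how NDBs would be exchanged. The names, and the example DB2 with its NDB2, are assumptions made for illustration.

```python
def matches(record, x):
    """True if x agrees with record on every specified position ('*' = don't care)."""
    return len(record) == len(x) and all(r in ("*", c) for r, c in zip(record, x))


def party_intersection(my_db, all_ndbs):
    """Compute DB_i - (NDB_1 ∪ ... ∪ NDB_n): keep own records that no NDB matches."""
    return {
        x for x in my_db
        if not any(matches(rec, x) for ndb in all_ndbs for rec in ndb)
    }


# Hypothetical three-bit databases; each NDB covers exactly U minus its DB.
DB1, NDB1 = {"000", "101", "111"}, ["01*", "0*1", "1*0"]
DB2, NDB2 = {"000", "011", "111"}, ["001", "010", "10*", "110"]

print(sorted(party_intersection(DB1, [NDB1, NDB2])))  # ['000', '111']
print(sorted(party_intersection(DB2, [NDB1, NDB2])))  # ['000', '111']
```

Both parties obtain DB1 ∩ DB2 by working only from their own records and the published NDBs.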
Results cont.
• How might these properties be useful?
– Protect data from insider attacks
– Computing set intersections
– Surveys involving sensitive information
– Anonymous digital credentials
– Fingerprint databases
– Other ideas?
• Prototype implementations:
– Perl, C
– http://esa.ackleyshack.com/ndb (see demo)
Computer Epidemiology: Justin Balthrop, Mark Newman, Matt Williamson
• Information spreads over networks of social contacts between computers:
– Email address books.
– URL links.
• Network topology affects the rate and extent of spreading:
– Epidemiological models and the epidemic threshold.
• Controlling spread on scale-free networks:
– Random vaccination is ineffective (e.g., anti-virus software).
– Targeted vaccination of high-connectivity nodes.
– Control degree distribution in time rather than space.
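The contrast between random and targeted vaccination can be illustrated with a toy percolation experiment. This sketch assumes a Barabási–Albert preferential-attachment graph as a stand-in for a scale-free contact network. It uses the size of the largest connected component of unvaccinated nodes as a rough proxy for how far an infection could spread. The graph size, the attachment parameter m = 2 and the 5% vaccination budget are arbitrary choices, and the exact numbers vary with the seed.

```python
import random


def barabasi_albert(n, m, rng):
    """Preferential-attachment graph: each new node links to m existing nodes
    chosen with probability proportional to their degree."""
    adj = {v: set() for v in range(n)}
    endpoints = []            # each node appears once per incident edge
    targets = list(range(m))  # the first new node attaches to m seed nodes
    for new in range(m, n):
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
        endpoints.extend(targets)
        endpoints.extend([new] * m)
        chosen = set()
        while len(chosen) < m:  # sample m distinct nodes, degree-weighted
            chosen.add(rng.choice(endpoints))
        targets = list(chosen)
    return adj


def largest_component(adj, removed):
    """Size of the largest connected component once `removed` nodes are deleted."""
    seen = set(removed)
    best = 0
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        stack, size = [start], 0
        while stack:
            v = stack.pop()
            size += 1
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        best = max(best, size)
    return best


rng = random.Random(1)
adj = barabasi_albert(2000, 2, rng)
budget = 100  # vaccinate 5% of nodes
random_vax = set(rng.sample(sorted(adj), budget))
targeted_vax = set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:budget])
print("random:", largest_component(adj, random_vax))
print("targeted:", largest_component(adj, targeted_vax))
```

With the same budget, removing the highest-degree nodes should shrink the reachable component far more than removing random nodes. This matches the qualitative point that random vaccination is ineffective on scale-free networks.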
[Figure: degree distributions (x-axis: degree k) for an IP network, an administrator network, email traffic, and address books.]
Science 304:527-529 (2004)
The Social Utility of Privacy: Robert Axelrod and Ryan Gerety
• Typical framing:
– Privacy values should remain as is (e.g., Lessig).
– Individual rights vs. the state (i.e., civil liberties vs. community safety / crime).
• A community may have its own interest in defending individual privacy (or not), independent of the civil liberties argument:
– To promote innovation in changing environments.
– To cope with distortions (e.g., overconfidence of middle managers).
– To compensate for overgeneralized norms.
• Not necessarily advocating more privacy:
– From a societal/informational point of view, how should appropriate bounds on privacy be determined?
• Current status:
– Exploratory modeling based on simple games.
Next Steps: Negative Representations
• Distributed negative representations
• Leaking partial information
• Relational algebra operators on the negative database:
– Select, join, etc.
• Instance difficulty:
– Hiding given satisfying assignments in a SAT formula
– Approximate representations
– Other representations?
• More realistic implementations
• Negative data mining:
– Is it easier/harder to find certain instances in NDB?
• Imprecise representations:
– Partial matching and queries
– Learning algorithms
People
Stephanie Forrest
Elena Ackley
Fernando Esponda
Paul Helman
Publications
• F. Esponda, S. Forrest, and P. Helman, "Negative representations of information," International Journal of Information Security (submitted March 2005).
• F. Esponda, E. S. Ackley, S. Forrest, and P. Helman, "On-line negative databases," Journal of Unconventional Computing (in press).
• F. Esponda, S. Forrest, and P. Helman, "A formal framework for positive and negative detection," IEEE Transactions on Systems, Man, and Cybernetics 34(1):357-373 (2004).
• J. Balthrop, S. Forrest, M. Newman, and M. Williamson, "Technological networks and the spread of computer viruses," Science 304:527-529 (2004).
• H. Inoue and S. Forrest, "Inferring Java security policies through dynamic sandboxing," 2005 International Conference on Programming Languages and Compilers (PLC'05) (in press).
• F. Esponda, E. S. Ackley, S. Forrest, and P. Helman, "On-line negative databases," Third International Conference on Artificial Immune Systems (ICARIS), Best Paper Award (2004).
SUPPLEMENTARY MATERIAL
Probabilities
$$F_1 = P(x \in DB \mid x \notin NDB_{f_j}) = \frac{|DB|}{|U| - |NDB_{f_j}|}$$

$$F_2 = P(x \in DB \mid x \notin DB_{f_j}) = \frac{|DB| - |DB_{f_j}|}{|U| - |DB_{f_j}|}$$
$$H_N(x) = -F_1 \log_2 F_1 - (1 - F_1) \log_2 (1 - F_1)$$

$$H_P(x) = -F_2 \log_2 F_2 - (1 - F_2) \log_2 (1 - F_2)$$
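These formulas can be evaluated directly. The sketch below reads $|NDB_{f_j}|$ as the number of U-DB strings covered by one's subset of NDB, and $|DB_{f_j}|$ as the number of DB strings held in one's subset of DB. That reading is an assumption. The sizes are hypothetical: l = 10, so |U| = 1024, with DB holding about 10% of U, in line with the figure caption.

```python
from math import log2


def f1(db_size, universe_size, ndb_revealed):
    """F1 = |DB| / (|U| - |NDB_fj|): P(x in DB) given x is not among
    the ndb_revealed U-DB strings covered by one's subset of NDB."""
    return db_size / (universe_size - ndb_revealed)


def f2(db_size, universe_size, db_revealed):
    """F2 = (|DB| - |DB_fj|) / (|U| - |DB_fj|): P(x in DB) given x is not
    among the db_revealed DB strings held in one's subset of DB."""
    return (db_size - db_revealed) / (universe_size - db_revealed)


def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), using 0 * log2(0) = 0."""
    return sum(-q * log2(q) for q in (p, 1 - p) if q > 0)


# Hypothetical sizes: |U| = 2**10, DB holds about 10% of U.
U, DB = 1024, 102
for fraction in (0.0, 0.5, 1.0):
    h_n = binary_entropy(f1(DB, U, round(fraction * (U - DB))))
    h_p = binary_entropy(f2(DB, U, round(fraction * DB)))
    print(f"fraction revealed {fraction:.1f}: H_N={h_n:.3f}  H_P={h_p:.3f}")
```

When nothing is revealed, both probabilities equal |DB|/|U|. When the whole of the other side is revealed, the remaining uncertainty about x drops to zero bits.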
Generating Hard-to-Reverse Negative Databases
• The randomized algorithm can be used to create a negative database.
• Insert/Delete operations turn known hard formulas into negative databases.
• The Morph operator may be used to search for hard instances.
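The link between negative databases and formulas follows from the definitions. A string is in DB exactly when no record matches it, so each record over {0, 1, *} can be read as one CNF clause saying "differ from this record on at least one specified bit". A record with k specified bits gives a k-literal clause, and DB is the set of satisfying assignments. The sketch below converts in both directions. The DIMACS-style integer literals (variable i is bit i) and the helper names are assumptions for illustration.

```python
from itertools import product


def record_to_clause(record):
    """NDB record -> clause. Bit '0' at position i gives literal +i, bit '1'
    gives -i; '*' positions contribute nothing. The clause is satisfied
    exactly by strings the record does NOT match."""
    return [(i + 1) if c == "0" else -(i + 1)
            for i, c in enumerate(record) if c != "*"]


def clause_to_record(clause, length):
    """Clause -> NDB record over {0, 1, *} (inverse of record_to_clause)."""
    record = ["*"] * length
    for lit in clause:
        record[abs(lit) - 1] = "0" if lit > 0 else "1"
    return "".join(record)


def satisfies(assignment, cnf):
    """assignment is a bit string; variable i is true iff assignment[i-1] == '1'."""
    return all(
        any((assignment[abs(lit) - 1] == "1") == (lit > 0) for lit in clause)
        for clause in cnf
    )


NDB = ["01*", "0*1", "1*0"]
cnf = [record_to_clause(r) for r in NDB]
print(cnf)  # [[1, -2], [1, -3], [-1, 3]]
print([s for s in ("".join(b) for b in product("01", repeat=3)) if satisfies(s, cnf)])
# ['000', '101', '111']
```

Under this reading, recovering DB from NDB amounts to finding satisfying assignments. Conversely, a known hard formula can be turned into an NDB record by record.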
[Figure: Instance difficulty (l = 64). x-axis: specified bits per record (k-SAT), 1 to 8; y-axis: decisions (zchaff).]
[Figure: Instance difficulty (Glassy8 formula, l = 64). x-axis: specified bits per record (k-SAT), 1 to 8; y-axis: decisions (zchaff); series: original NDB vs. updated NDB.]
H. Jia, C. Moore, and B. Selman, "From spin glasses to hard satisfiable formulas," SAT 2004.
Effect of the Morph operation
• The Morph operation takes as input a negative database NDB and outputs NDB’ that represents the same set U-DB.
• The plot shows how the difficulty of a database changes after applying the Morph operator.
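The Morph operator itself is not specified here. As a hypothetical stand-in, the simplest set-preserving rewrite replaces one record containing a `*` with its two specializations (0 and 1 at that position). The sketch below applies that rewrite and checks by brute-force expansion that NDB and NDB' still represent the same U-DB. The function names and the split rule are assumptions. The rewrite changes the shape of the database, which is the property Morph-style search relies on, but it is not the authors' operator.

```python
import random
from itertools import product


def expand(record):
    """All binary strings matched by a record; '*' matches either bit."""
    options = ["01" if c == "*" else c for c in record]
    return {"".join(bits) for bits in product(*options)}


def represented(ndb):
    """The set U-DB that an NDB stands for: the union of its records' expansions."""
    out = set()
    for record in ndb:
        out |= expand(record)
    return out


def split_morph(ndb, rng):
    """Hypothetical set-preserving rewrite: pick a record with a '*' and
    replace it by the two records obtained by fixing that position to 0 and 1."""
    candidates = [i for i, r in enumerate(ndb) if "*" in r]
    if not candidates:
        return list(ndb)
    i = rng.choice(candidates)
    r = ndb[i]
    j = rng.choice([k for k, c in enumerate(r) if c == "*"])
    return ndb[:i] + ndb[i + 1:] + [r[:j] + "0" + r[j + 1:], r[:j] + "1" + r[j + 1:]]


NDB = ["01*", "0*1", "1*0"]
morphed = split_morph(NDB, random.Random(0))
print(morphed, represented(morphed) == represented(NDB))
```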