Sensitive Data in a Wired World: Negative Representations of Data
Stephanie Forrest, Dept. of Computer Science
Univ. of New Mexico, Albuquerque, NM
http://cs.unm.edu/~forrest
Introduction
• Goal: Develop new approaches to data security and privacy that incorporate design principles from living systems:
– Survivability and evolvability
– Autonomy
– Robustness, adaptation, and self-repair
– Diversity
• Extends earlier work on computational properties of the immune system:
– Intrusion detection
– Automated response
– Collaborative information filtering
Project Overview
• Immunology and data:
– Negative representations of information
• Epidemiology and the Internet:
– Social networks matter
– The real world is not always scale-free
• The social utility of privacy:
– Why is privacy an important value in democratic societies?
– Evolutionary perspective
Collaborations
• Paul Helman and Cris Moore (UNM)
• Robert Axelrod and Mark Newman (Univ. of Michigan)
• Matthew Williamson (Sana Security)
• Rebecca Wright and Michael de Mare (Stevens)
• Joan Feigenbaum and Avi Silberschatz (Yale)
– Fernando Esponda's postdoc (next year).
How the Immune System Distributes Detection
• Advantages of distributed negative detection:
– Localized (no communication costs)
– Scalable and tunable
– Robust (no single point of failure)
– Private
• Many small detectors matching nonself (negative detection).
• Each detector matches multiple patterns (generalization).
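To make "many small detectors matching nonself" concrete, here is a minimal generate-and-test negative selection sketch. The detector form (binary strings with a `*` don't-care symbol, borrowed from the NDB slides) and all names and parameters (`generate_detectors`, `p_star`, `max_tries`) are assumptions for illustration; it is not presented as the detector scheme or matching rule used in the immune-system work. A detector with k don't-cares matches 2^k strings, which is the generalization the last bullet refers to.

```python
import random

def matches(detector: str, s: str) -> bool:
    """A detector matches s when every specified (non-'*') position agrees."""
    return all(d == "*" or d == c for d, c in zip(detector, s))

def generate_detectors(self_set, length, count, p_star, rng, max_tries=10_000):
    """Generate-and-test negative selection: propose random detectors over
    {0, 1, *} and keep only those that match no self string."""
    detectors = []
    for _ in range(max_tries):
        if len(detectors) == count:
            break
        cand = "".join("*" if rng.random() < p_star else rng.choice("01")
                       for _ in range(length))
        if not any(matches(cand, s) for s in self_set):
            detectors.append(cand)
    return detectors

def is_nonself(s, detectors):
    """Flag s as anomalous if any detector matches it."""
    return any(matches(d, s) for d in detectors)

def coverage(detector):
    """Number of binary strings one detector matches: 2 ** (number of '*')."""
    return 2 ** detector.count("*")

if __name__ == "__main__":
    rng = random.Random(0)
    self_set = {"0000", "0101", "1111"}
    detectors = generate_detectors(self_set, 4, 8, 0.5, rng)
    for d in detectors:
        print(d, "covers", coverage(d), "strings")
    print(is_nonself("0101", detectors))  # False: self is never flagged
```

Each detector is checked locally against the string at hand, so detectors can be spread across sites with no coordination, which mirrors the "localized" and "no single point of failure" bullets.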
Applications to Computing
• Anomaly detectors (earlier work)
• Information filters (earlier work)
• Adaptive queries (future)
• Negative representations (in progress):
– A positive set DB is a set of fixed-length strings.
– A negative set NDB represents all the strings not in DB, i.e., U-DB.
– Intuition: if an adversary obtains a string from NDB, little information is revealed.
Example:
– U = all possible four-character strings over the 26 lowercase letters
– DB = {juan, eric, dave}
– U-DB = {aaaa, aaab, cris, john, luca, raul, tehj, tosh, …}
– There are 26^4 − 3 = 456,973 strings in U-DB.
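The count above can be checked directly. This sketch assumes, as the 26^4 figure implies, that U is every four-letter string over a to z; the names `in_negative_set` and `ALPHABET` are illustrative. It shows why an explicit U-DB is impractical: three strings in DB leave nearly half a million strings in the complement.

```python
from itertools import product
from string import ascii_lowercase

DB = {"juan", "eric", "dave"}
LENGTH = 4
ALPHABET = ascii_lowercase  # assumption: U = all 4-letter strings over a..z

def in_negative_set(x: str) -> bool:
    """Membership in U-DB: x is a well-formed element of U that is not in DB."""
    return len(x) == LENGTH and all(c in ALPHABET for c in x) and x not in DB

size_u = len(ALPHABET) ** LENGTH
size_neg = size_u - len(DB)

# Brute-force cross-check: enumerate U and count strings outside DB.
enumerated = sum(1 for t in product(ALPHABET, repeat=LENGTH)
                 if in_negative_set("".join(t)))

if __name__ == "__main__":
    print(size_u, size_neg, enumerated)  # 456976 456973 456973
```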
Results
• Can U-DB be represented efficiently, given |U-DB| >> |DB|?
– Yes: there is an algorithm that creates an NDB of size polynomial in the size of DB.
– Strategy: compress information using a don't-care symbol (*). Other representations?
• What properties does the representation have?
– Membership queries are tractable (linear time even without indexing).
– Other queries and information leakage are future work.
– Inferring information from a subset of NDB (next slide).
– Inferring DB from NDB is NP-hard (note: not doing crypto):
• Currently investigating instance difficulty.
• Algorithms for increasing instance difficulty.
• On-line insert/delete algorithms preserve problem difficulty.
• Collaborations with R. Wright, M. de Mare, and C. Moore.
Example (strings of length 3; * is the don't-care symbol):
– DB = {000, 101, 111}
– U-DB = {001, 010, 011, 100, 110}
– NDB = {01*, 0*1, 1*0}
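A minimal sketch of both claims, assuming binary strings. The construction here is a simple prefix-tree one chosen for illustration (not necessarily the algorithm the slides refer to): it emits at most length × |DB| records, so its size is polynomial in DB. Membership is a single linear scan: x is in DB exactly when no NDB record matches it. The function names are hypothetical. Note that it produces a different NDB from the example above that represents the same U-DB, so many NDBs can stand for one set.

```python
from itertools import product

def prefix_ndb(db, length):
    """Build an NDB representing U-DB (binary strings of `length` bits).

    Walk the prefix tree of DB. Whenever a prefix p of some DB string has a
    one-bit extension p+b that prefixes no DB string, every string below
    p+b is outside DB, so emit the single record p+b followed by '*'s.
    Each DB prefix yields at most one record per level, so the NDB has at
    most length * |DB| records.
    """
    if not db:
        return ["*" * length]  # U-DB is all of U
    ndb = []
    for i in range(length):
        prefixes = sorted({s[:i] for s in db})
        extended = {s[:i + 1] for s in db}
        for p in prefixes:
            for b in "01":
                if p + b not in extended:
                    ndb.append(p + b + "*" * (length - i - 1))
    return ndb

def matches(record, x):
    """A record matches x when all specified (non-'*') bits agree."""
    return all(r == "*" or r == c for r, c in zip(record, x))

def in_db(ndb, x):
    """Membership query: x is in DB exactly when no NDB record matches it.
    This is one linear scan over the NDB, with no index."""
    return not any(matches(r, x) for r in ndb)

def negative_set(ndb, length):
    """Brute-force expansion of U-DB from an NDB (tiny lengths only)."""
    strings = ("".join(t) for t in product("01", repeat=length))
    return {x for x in strings if not in_db(ndb, x)}

if __name__ == "__main__":
    db = {"000", "101", "111"}
    ndb = prefix_ndb(db, 3)
    print(ndb)  # ['01*', '001', '100', '110']
    print(negative_set(ndb, 3) == negative_set(["01*", "0*1", "1*0"], 3))  # True
    print(in_db(ndb, "101"), in_db(ndb, "011"))  # True False
```

This particular construction is easy to reverse (DB can be read off the prefix tree), so it illustrates size and query cost only, not the hardness property.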
What information is revealed by queries? (without assuming irreversibility)
• Having access to a subset of NDB (or DB) yields some information about strings outside that subset:
– Assume NDB (or DB) is partitioned into n subsets.
• To the query "Is x in DB?", what do I learn about x if x is not in my subset?
– Must consult all n subsets of NDB to conclude that x is in DB.
– With DB partitioned instead, must consult the subsets only until x is found (on average n/2).
– Assumes that we care more about DB than U-DB.
[Figure: probability and information content as the membership of strings is revealed; DB contains 10% of all possible l-length strings (formulas in the supplementary material).]
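The asymmetry in the query bullets can be counted directly. In this sketch (the partitions and function names are made up for illustration), confirming x ∈ DB against a partitioned NDB means checking that none of the n subsets matches x, so all n are consulted. Against a partitioned DB the search stops at the subset holding x, which costs (n + 1)/2 consultations on average when x is equally likely to be in any subset, i.e. roughly n/2.

```python
def matches(record, x):
    """A record matches x when all specified (non-'*') bits agree."""
    return all(r == "*" or r == c for r, c in zip(record, x))

def consult_ndb(ndb_parts, x):
    """Query a partitioned NDB. Return (x_in_db, subsets_consulted).
    A match anywhere proves x is NOT in DB, so we may stop early; but
    concluding x IS in DB requires that every subset fail to match."""
    for k, part in enumerate(ndb_parts, start=1):
        if any(matches(r, x) for r in part):
            return False, k
    return True, len(ndb_parts)

def consult_db(db_parts, x):
    """Query a partitioned DB: stop at the first subset that contains x."""
    for k, part in enumerate(db_parts, start=1):
        if x in part:
            return True, k
    return False, len(db_parts)

# Hypothetical example: DB = {000, 011, 101, 111}, so U-DB = {001, 010, 100, 110}.
DB_PARTS = [{"000"}, {"011"}, {"101"}, {"111"}]    # n = 4 subsets of DB
NDB_PARTS = [["001"], ["010"], ["100"], ["110"]]   # n = 4 subsets of an NDB

if __name__ == "__main__":
    members = sorted(set().union(*DB_PARTS))
    db_cost = [consult_db(DB_PARTS, x)[1] for x in members]
    ndb_cost = [consult_ndb(NDB_PARTS, x)[1] for x in members]
    print(sum(db_cost) / len(members), sum(ndb_cost) / len(members))  # 2.5 4.0
```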
Private Set Intersection
• Determine which records are in the intersection of several databases, i.e.,
– $DB_1 \cap DB_2 \cap \dots \cap DB_n = U - (NDB_1 \cup NDB_2 \cup \dots \cup NDB_n)$
• Each party may compute the intersection as
– $DB_i - (NDB_1 \cup NDB_2 \cup \dots \cup NDB_n)$
• Party i learns only the intersection of all the sets,
• and not the cardinality of the other sets.
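A sketch of the local computation, assuming each party publishes an NDB for its own set. By De Morgan, a string matched by any NDB_j lies in U-DB_j and so cannot be in the intersection; party i therefore keeps exactly the strings of DB_i that no published NDB matches. The prefix-tree NDB builder and the example sets are illustrative assumptions, and that builder is easy to reverse, so this shows only the set algebra, not the privacy property claimed above.

```python
def matches(record, x):
    """A record matches x when all specified (non-'*') bits agree."""
    return all(r == "*" or r == c for r, c in zip(record, x))

def prefix_ndb(db, length):
    """Exact NDB for U-DB from DB's prefix tree (illustrative, easy to reverse)."""
    if not db:
        return ["*" * length]
    ndb = []
    for i in range(length):
        extended = {s[:i + 1] for s in db}
        for p in sorted({s[:i] for s in db}):
            for b in "01":
                if p + b not in extended:
                    ndb.append(p + b + "*" * (length - i - 1))
    return ndb

def local_intersection(own_db, ndbs):
    """Party i computes DB_i - (NDB_1 u ... u NDB_n): keep its own strings
    that no published NDB matches."""
    return {x for x in own_db
            if not any(matches(r, x) for ndb in ndbs for r in ndb)}

if __name__ == "__main__":
    dbs = [{"0000", "0011", "0101", "1111"},
           {"0011", "0101", "1000"},
           {"0101", "0011", "1110"}]
    ndbs = [prefix_ndb(db, 4) for db in dbs]  # what each party publishes
    for i, db in enumerate(dbs, start=1):
        print(i, sorted(local_intersection(db, ndbs)))  # each: ['0011', '0101']
```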
Results cont.
• How might these properties be useful?
– Protect data from insider attacks
– Computing set intersections
– Surveys involving sensitive information
– Anonymous digital credentials
– Fingerprint databases
– Other ideas?
• Prototype implementations:
– Perl, C
– http://esa.ackleyshack.com/ndb
– See demo
Computer Epidemiology
Justin Balthrop, Mark Newman, Matt Williamson
• Information spreads over networks of social contacts between computers:
– Email address books.
– URL links.
• Network topology affects the rate and extent of spreading:
– Epidemiological models, and the epidemic threshold.
• Controlling spread on scale-free networks:
– Random vaccination is ineffective (e.g., anti-virus software).
– Targeted vaccination of high-connectivity nodes.
– Control degree distribution in time rather than space.
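A toy experiment contrasting the two vaccination strategies. Everything here is an assumption for illustration and not the model behind the cited Science paper: a Barabási–Albert-style preferential-attachment graph stands in for a scale-free contact network, spread is an independent cascade with per-edge probability p, and the sizes (2000 nodes, 5% vaccinated, p = 0.3) are arbitrary. Since hubs carry a disproportionate share of the edges, one would expect immunizing them to shrink outbreaks far more than immunizing the same number of random nodes.

```python
import random
from collections import deque

def preferential_attachment_graph(n, m, rng):
    """Toy Barabasi-Albert-style graph: each new node links to m distinct
    existing nodes chosen with probability proportional to their degree."""
    adj = {v: set() for v in range(n)}
    for i in range(m + 1):              # seed: a clique on m + 1 nodes
        for j in range(i):
            adj[i].add(j)
            adj[j].add(i)
    # Each node appears once per incident edge, so a uniform pick from
    # `ends` is a degree-proportional pick of a node.
    ends = [v for v in range(m + 1) for _ in range(m)]
    for new in range(m + 1, n):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(ends))
        for t in targets:
            adj[new].add(t)
            adj[t].add(new)
            ends.extend((new, t))
    return adj

def outbreak_size(adj, vaccinated, p, rng, start):
    """Independent-cascade spread: each newly infected node gets one chance
    to infect each unvaccinated neighbor, succeeding with probability p."""
    if start in vaccinated:
        return 0
    infected = {start}
    queue = deque([start])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in infected and v not in vaccinated and rng.random() < p:
                infected.add(v)
                queue.append(v)
    return len(infected)

def random_vaccination(adj, k, rng):
    """Vaccinate k nodes chosen uniformly at random."""
    return set(rng.sample(sorted(adj), k))

def targeted_vaccination(adj, k):
    """Vaccinate the k highest-degree nodes (the hubs)."""
    return set(sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k])

def mean_outbreak(adj, vaccinated, p, rng, trials=200):
    """Average outbreak size from random unvaccinated starting nodes."""
    candidates = [v for v in adj if v not in vaccinated]
    return sum(outbreak_size(adj, vaccinated, p, rng, rng.choice(candidates))
               for _ in range(trials)) / trials

if __name__ == "__main__":
    rng = random.Random(1)
    g = preferential_attachment_graph(2000, 2, rng)
    k = 100  # vaccinate 5% of the nodes
    for name, vac in (("random", random_vaccination(g, k, rng)),
                      ("targeted", targeted_vaccination(g, k))):
        print(name, mean_outbreak(g, vac, 0.3, rng))
```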
[Figure: degree distributions (x-axis: degree k). Panels: IP network and administrator network; email traffic and address books.]
Science 304:527-529 (2004)
The Social Utility of Privacy
Robert Axelrod and Ryan Gerety
• Typical framing:
– Privacy values should remain as is (e.g., Lessig).
– Individual rights vs. the state (i.e., civil liberties vs. community safety / crime).
• A community may have its own interest in defending individual privacy (or not), independent of the civil liberties argument:
– To promote innovation in changing environments.
– To cope with distortions (e.g., overconfidence of middle managers).
– To compensate for overgeneralized norms.
• Not necessarily advocating more privacy:
– From a societal/informational point of view, how should appropriate bounds on privacy be determined?
• Current status:
– Exploratory modeling based on simple games.
Next Steps: Negative Representations
• Distributed negative representations
• Leaking partial information
• Relational algebra operators on the negative database:
– Select, join, etc.
• Instance difficulty:
– Hiding given satisfying assignments in a SAT formula
– Approximate representations
– Other representations?
• More realistic implementations
• Negative data mining:
– Is it easier/harder to find certain instances in NDB?
• Imprecise representations:
– Partial matching and queries
– Learning algorithms
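As one illustration of what a relational operator on an NDB could look like, here is the simplest case we can see, assumed for this sketch rather than taken from the project: selecting the DB strings whose bit i equals a given value. Since U − (DB ∩ {x : x_i = v}) = (U − DB) ∪ {x : x_i ≠ v}, and the second set is matched by a single record with only bit i specified, the selection just appends one record. The names `select_bit` and `decode` are hypothetical, and `decode` is a brute-force check that only works for tiny lengths.

```python
from itertools import product

def matches(record, x):
    """A record matches x when all specified (non-'*') bits agree."""
    return all(r == "*" or r == c for r, c in zip(record, x))

def select_bit(ndb, length, i, value):
    """NDB for {x in DB : x[i] == value}: append one record that matches
    every string whose bit i differs from `value`."""
    other = "0" if value == "1" else "1"
    return ndb + ["*" * i + other + "*" * (length - i - 1)]

def decode(ndb, length):
    """Brute-force DB recovery for tiny lengths only: the strings no record matches."""
    strings = ("".join(t) for t in product("01", repeat=length))
    return {x for x in strings if not any(matches(r, x) for r in ndb)}

if __name__ == "__main__":
    ndb = ["01*", "0*1", "1*0"]            # represents DB = {000, 101, 111}
    selected = select_bit(ndb, 3, 0, "1")  # keep strings whose first bit is 1
    print(selected)                        # ['01*', '0*1', '1*0', '0**']
    print(sorted(decode(selected, 3)))     # ['101', '111']
```

Operators such as join or projection do not reduce to a single appended record this simply, which is presumably why they appear here as future work.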
People
Stephanie Forrest
Elena Ackley
Fernando Esponda
Paul Helman
Publications
• F. Esponda, S. Forrest, and P. Helman, "Negative representations of information." International Journal of Information Security (submitted March 2005).
• F. Esponda, E. S. Ackley, S. Forrest, and P. Helman, "On-line negative databases." Journal of Unconventional Computing (in press).
• F. Esponda, S. Forrest, and P. Helman, "A formal framework for positive and negative detection." IEEE Transactions on Systems, Man, and Cybernetics 34:1, pp. 357-373 (2004).
• J. Balthrop, S. Forrest, M. Newman, and M. Williamson, "Technological networks and the spread of computer viruses." Science 304:527-529 (2004).
• H. Inoue and S. Forrest, "Inferring Java security policies through dynamic sandboxing." 2005 International Conference on Programming Languages and Compilers (PLC'05) (in press).
• F. Esponda, E. Ackley, S. Forrest, and P. Helman, "On-line negative databases." Third International Conference on Artificial Immune Systems (ICARIS), Best paper award (2004).
SUPPLEMENTARY MATERIAL
Probabilities
$$F_1 = P(x \in DB \mid x \notin NDB_{f_j}) = \frac{|DB|}{|U| - |NDB_{f_j}|}$$
$$F_2 = P(x \in DB \mid x \notin DB_{f_j}) = \frac{|DB| - |DB_{f_j}|}{|U| - |DB_{f_j}|}$$
$$H_N(x) = -F_1 \log_2 F_1 - (1 - F_1) \log_2 (1 - F_1)$$
$$H_P(x) = -F_2 \log_2 F_2 - (1 - F_2) \log_2 (1 - F_2)$$
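The formulas can be evaluated directly. This sketch reads NDB_{f_j} and DB_{f_j} as the strings of U-DB (respectively DB) already revealed to the querier, treating NDB as the explicit set U-DB; that reading, the string length l = 10, and the function names are our assumptions. With |DB| set to about 10% of U as in the figure caption, it shows the two trends: as more of U-DB is revealed F_1 rises toward 1, and as more of DB is revealed F_2 falls toward 0, with the binary entropy H dropping to 0 at either extreme.

```python
from math import log2

def binary_entropy(p):
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), taking 0 log2 0 = 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

def f1(u, db, ndb_seen):
    """F1: P(x in DB | x is not among the ndb_seen revealed strings of U-DB)."""
    return db / (u - ndb_seen)

def f2(u, db, db_seen):
    """F2: P(x in DB | x is not among the db_seen revealed strings of DB)."""
    return (db - db_seen) / (u - db_seen)

if __name__ == "__main__":
    u = 2 ** 10   # binary strings of length l = 10 (assumed)
    db = u // 10  # about 10% of U, as in the figure caption
    for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
        p1 = f1(u, db, round(frac * (u - db)))
        p2 = f2(u, db, round(frac * db))
        print(f"{frac:4.2f}  F1={p1:.3f} H_N={binary_entropy(p1):.3f}  "
              f"F2={p2:.3f} H_P={binary_entropy(p2):.3f}")
```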
Generating Hard-to-Reverse Negative Databases
• The randomized algorithm can be used to create a negative database.
• Insert/Delete operations turn known hard formulas into negative databases.
• The Morph operator may be used to search for hard instances.
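The link between negative databases and k-SAT that these bullets rely on can be written out. A string x avoids record r iff x differs from r in at least one specified bit, so each record becomes one clause and DB is exactly the set of satisfying assignments. A record with k specified bits is a k-literal clause, which is the "specified bits per record (k-SAT)" axis in the plots below. The sketch emits DIMACS CNF, the text format read by standard SAT solvers, and also converts a clause back into a record (the direction used when turning a known hard formula into an NDB). The function names are illustrative.

```python
def ndb_to_dimacs(ndb, length):
    """Encode an NDB as a CNF formula in DIMACS format.

    Variable i+1 is true when bit i of x is 1. Record r becomes the clause
    OR over specified positions i of (x_i != r_i), so a '1' in r gives the
    literal -(i+1) and a '0' gives +(i+1). Satisfying assignments = DB.
    """
    lines = [f"p cnf {length} {len(ndb)}"]
    for r in ndb:
        lits = [-(i + 1) if c == "1" else i + 1
                for i, c in enumerate(r) if c != "*"]
        lines.append(" ".join(map(str, lits + [0])))
    return "\n".join(lines)

def clause_to_record(lits, length):
    """Inverse direction: a clause forbids exactly one pattern of bits."""
    rec = ["*"] * length
    for lit in lits:
        rec[abs(lit) - 1] = "0" if lit > 0 else "1"
    return "".join(rec)

if __name__ == "__main__":
    print(ndb_to_dimacs(["01*", "0*1", "1*0"], 3))
    # p cnf 3 3
    # 1 -2 0
    # 1 -3 0
    # -1 3 0
```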
[Figure: Instance Difficulty (l=64). x-axis: specified bits per record (k-SAT), 1 to 8; y-axis: decisions (zchaff).]
[Figure: Instance Difficulty (Glassy8 formula, l=64). x-axis: specified bits per record (k-SAT), 1 to 8; y-axis: decisions (zchaff); series: Original NDB, Updated NDB.]
H. Jia, C. Moore, and B. Selman, "From spin glasses to hard satisfiable formulas." SAT 2004.
Effect of the Morph operation
• The Morph operation takes as input a negative database NDB and outputs NDB’ that represents the same set U-DB.
• The plot shows how the complexity of a database changes after applying the Morph operator.
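The slides do not define Morph, so this sketch only illustrates the invariant it must respect: the output NDB' represents the same U-DB as the input. As a hypothetical stand-in rewrite (not the Morph operator), it splits a record on one of its don't-care positions into the two records with 0 and 1 there. The union of matched strings is unchanged, which a brute-force check confirms on a small example. It says nothing about whether instance difficulty goes up or down; that is what the plot measures.

```python
import random
from itertools import product

def matches(record, x):
    """A record matches x when all specified (non-'*') bits agree."""
    return all(r == "*" or r == c for r, c in zip(record, x))

def negative_set(ndb, length):
    """Brute-force expansion of the set U-DB an NDB represents (tiny lengths)."""
    strings = ("".join(t) for t in product("01", repeat=length))
    return {x for x in strings if any(matches(r, x) for r in ndb)}

def split_record(ndb, index, position):
    """Replace ndb[index], which has '*' at `position`, by its two
    specializations with '0' and '1' there; U-DB is unchanged."""
    r = ndb[index]
    if r[position] != "*":
        raise ValueError("position must hold a don't-care symbol")
    left = r[:position] + "0" + r[position + 1:]
    right = r[:position] + "1" + r[position + 1:]
    return ndb[:index] + [left, right] + ndb[index + 1:]

def random_splits(ndb, steps, rng):
    """Apply up to `steps` random splits (a stand-in rewrite, not Morph)."""
    for _ in range(steps):
        choices = [(i, j) for i, r in enumerate(ndb)
                   for j, c in enumerate(r) if c == "*"]
        if not choices:
            break
        ndb = split_record(ndb, *rng.choice(choices))
    return ndb

if __name__ == "__main__":
    ndb = ["01*", "0*1", "1*0"]
    morphed = random_splits(ndb, 3, random.Random(0))
    print(morphed)
    print(negative_set(morphed, 3) == negative_set(ndb, 3))  # True
```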