maverick woo 2004-09-22 a brief history of history independent data structures
Post on 19-Dec-2015
217 views
TRANSCRIPT
2004-09-22
Maverick Woo
A Brief History Of A Brief History Of A Brief History Of A Brief History Of
History Independent Data History Independent Data StructuresStructures
History Independent Data History Independent Data StructuresStructures
2
Credit: South China Morning Post
3
Oblivious Data Structures
4
[Mic97][Mic97][Mic97][Mic97]
Daniele MicciancioOblivious data structures: Applications to cryptography
- STOC 1997An attempt to solve the privacy problem in
incremental cryptography
5
[BGG95][BGG95][BGG95][BGG95]
M. Bellare, O. Goldreich, S. GoldwasserIncremental Cryptographyand Application to Virus Protection
- STOC 1995
6
Why Incremental?Why Incremental?Why Incremental?Why Incremental?Most cryptographic primitives act on the
document as a whole.
MD5 Signature
Re-compute from scratch is wastefulIdeally, the running time should be O(f(|change|)).
T
------------------B9ECE18C950AFBFA6B0FDBFA4FF731D3
Th
------------------EEEB9A8EB45DD351D9EC0EB4ACCE66CE
This is a survey on the area of history inde-pendent data struc-tures. I will show you how this area developed in the last decade [. . .]------------------B97799DE817E55BCC3ADE4370246EB0D
…Thi
------------------A4704FD35F0308287F2937BA3ECCF5FE
7
The Free Food CamThe Free Food CamThe Free Food CamThe Free Food CamAt 15 frame per second, 1 M pixel per frame, 3 byte per pixel
http://zimbs4.srv.cs.cmu.edu/coke/ffc.html (Google for the phrase “free food cam”)
8
2004-09-21 13:15:552004-09-21 13:15:552004-09-21 13:15:552004-09-21 13:15:55
9
2004-09-21 13:46:242004-09-21 13:46:242004-09-21 13:46:242004-09-21 13:46:24How do you know these
pictures are
authentic?
How do you know these
pictures are
authentic?
10
Tree Signatures [BGG95]Tree Signatures [BGG95]Tree Signatures [BGG95]Tree Signatures [BGG95]Represent the document as a 2-3 Tree whose leaves contain constant-size block of characters
Non-incrementally sign each leaf and each internal node
Each change only takes O(log n) signature updates
C
A,B
A C D E
D
BS S S S S
S S
S
13
Privacy of Tree SignaturesPrivacy of Tree SignaturesPrivacy of Tree SignaturesPrivacy of Tree SignaturesAn Apparent Paradox
The very nature of digital signature )there can be no secret about the document and its signature…
How can privacy be an issue of when your original intent is to publish the document and its signature?
14
MetadataMetadataMetadataMetadataOct 2000: The Wall Street Journal reports that a candidate running for the U.S. Senate began receiving anonymous emails containing messages written in MS Word criticizing and attacking the candidate. A savvy aide looked at the document properties and discovered they were authored by the chief-of-staff of the opposing party.
Feb 2003: A dossier on Iraq’s security and intelligence organizations, cited by Colin Powell and published by 10 Downing Street, is discovered to have been plagiarized from a U.S. researcher on Iraq. Since the dossier was published on their website in MS Word format, researchers also discovered the four people in the British government who edited the document. They were subsequently called to Parliament for a hearing.
Mar 2004: SCO Group, seller of UNIX and Linux, sent out a warning letter to 1,500 of the world’s largest companies threatening legal liability for using Linux if they failed to obtain a license from the Utah-based company. After filing suit against Daimler-Chrysler, metadata in a MS Word document revealed that the SCO’s attorneys had originally identified Bank of America as the defendant.
From http://www.abanet.org/tech/ltrc/publications/metadata.html
“a fine documen
t”
“a fine documen
t”
Is Microsoft Word really so evil?
Is Microsoft Word really so evil?
16
Metadata in Tree SignaturesMetadata in Tree SignaturesMetadata in Tree SignaturesMetadata in Tree Signatures
Represent the document as a 2-3 Tree whose leaves contain constant-size block of characters
Non-incrementally sign each leaf and each internal node
Where is the metadata?
C
A,B
A C D E
D
BS S S S S
S S
S
How can a
2-3 tree leak
information?
How can a
2-3 tree leak
information?
17
The Wedding Guest ListThe Wedding Guest ListThe Wedding Guest ListThe Wedding Guest List
As you may know, some of us are getting married soon!
As hard-core computer scientists, of course you will maintain the guest list in sorted order
(using an object-oriented multimedia relational XML database)
Congrads!Congrads!C
A,B
A C D E
D
B
18
Manual Sorting Is HardManual Sorting Is HardManual Sorting Is HardManual Sorting Is Hard
BlandfordBlandfordBlellochBlelloch
19
Back To The Guest ListBack To The Guest ListBack To The Guest ListBack To The Guest List
And as the guest list gets compiled, there will always be someone who gets added to the list LAST…
“You added me after you have added {Foo}?”
C
A,B
A C D E
D
B
20
Between B and D is C…Between B and D is C…ontentionontentionBetween B and D is C…Between B and D is C…ontentionontention
Initial
Insert(“D”)
Initial
Insert(“B”)
B
A
A B C E
C
C
A
A C D E
D
B
A
A B C E
C,D
D
C
A,B
A C D E
D
B
21
[NT01][NT01][NT01][NT01]
Moni Naor, Vanessa TeagueAnti-persistence:History Independent Data Structures
- STOC 2001
22
[Mic97][Mic97][Mic97][Mic97]
Daniele MicciancioOblivious data structures: Applications to cryptography
- STOC 1997An attempt to solve the privacy problem in
incremental cryptography
23
Oblivious Data StructuresOblivious Data StructuresOblivious Data StructuresOblivious Data Structures
Informal definition
A data structure is said to be oblivious if it does not give out any knowledge about the sequence of update operations that have been applied to it other than the final result of the operations.
24
Formal Definition [Mic97]Formal Definition [Mic97]Formal Definition [Mic97]Formal Definition [Mic97]Let M be a set of operations,and S be a set of algorithms implementing them.
We say S is oblivious if:
for any two sequences of operations p1, p2, …, pn and q1, q2, …, qm that lead to the same set of values,the execution of these sequences have identical output probability distributions.
25
Oblivious 2-3 TreeOblivious 2-3 TreeOblivious 2-3 TreeOblivious 2-3 Tree
Issue: How do 2-3 trees “leak”?Degrees of nodes give out too much information
Solution: Randomize the degrees!Degrees should split uniformly between 2 and 3
26
Results in [Mic97]Results in [Mic97]Results in [Mic97]Results in [Mic97]Oblivious 2-3 Trees
Insertion and Deletion in expected O(log n) time whpSearch in worst-case O(log n) time
Incremental Signature SchemeSecurity: Tamper Proof, as defined in [BGG95]Privacy: Private (hmm… we will see :P)Expected O(log n) signature updates per change whp
27
For More InformationFor More InformationFor More InformationFor More Information
28
History Independence
29
Are We Done Yet?Are We Done Yet?Are We Done Yet?Are We Done Yet?
In an oblivious 2-3 tree, the degreesindeed don’t tell you anything aboutthe update sequence.
What else can leak information?
C
A,B
A C D E
D
B
30
Interface MattersInterface MattersInterface MattersInterface Matters
Dictionary ADT
Insert(key)Delete(key)Search(key)
Does this allow you to ask“is k1 is inserted some time before k2?”
C
A,B
A C D E
D
B
PrincipleWhen privacy is concerned, if some piece of information cannot be retrieved via the legitimate interface of a system, then it should not be retrievable even with full access to the system.
PrincipleWhen privacy is concerned, if some piece of information cannot be retrieved via the legitimate interface of a system, then it should not be retrievable even with full access to the system.
31
Full AccessFull AccessFull AccessFull Access00000200: 006e 1ef0 6335 0000 563d 0673 77a2 f4fc .n..c5..V=.sw...00000210: c81b 44b0 c62c 7e3e ff89 504e 470d 0a1a ..D..,~>..PNG...00000220: 0a00 0000 0d49 4844 5200 0000 9600 0000 .....IHDR.......00000230: 3908 0600 0001 a34c e24f 0000 0009 7048 9......L.O....pH00000240: 5973 0000 0b13 0000 0b13 0100 9a9c 1800 Ys..............00000250: 0000 0467 414d 4100 00b1 9e61 4c41 f700 ...gAMA....aLA..00000260: 0000 2063 4852 4d00 007a 3000 0080 9800 .. cHRM..z0.....00000270: 00f4 2400 0084 cb00 006d 5f00 00e8 6c00 ..#......m_...l.00000280: 003c 8b00 001b 58ca 10f6 b800 0034 c849 .<....X......4.I00000290: 4441 5478 da62 fc5f 3ff9 fff3 e72a 0ce2 DATx.b._?....*..000002a0: 921e 0c07 9e3f 6778 2629 c970 1848 4bda .....?gx&).p.HK.000002b0: 4b32 3c97 00d2 7640 faf5 3506 49c9 7f0c K2<[email protected]: cf9f 9f07 d23c 40fa 1190 5600 d2df 81b4 .....<@...V.....000002d0: 1190 6664 90e4 5365 0008 2096 e7cf ff01 ..fd..Se.. .....000002e0: 054c 18ae 020d f80b 3448 dfff 3783 1e8f .L......4H..7...000002f0: 24c3 379e df0c 5fc4 2419 76ff bacf 2021 #.7..._.#.v... !00000300: f10a a8e1 1850 1d0b 90be cb20 0d54 f7f4 .....P..... .T..00000310: f90d 205f 15c8 7f0a a4ed 80f4 0d06 8000 .. _............00000320: 6292 9434 04db c006 5470 1e68 201b 372b b..4....Tp.h .7+00000330: 83b3 ab3e c32f 0156 8648 231d 060e 8ec7 ...>./.V.H#.....
32
[NT01][NT01][NT01][NT01]
Moni Naor, Vanessa TeagueAnti-persistence:History Independent Data Structures
- STOC 2001
33
Memory RepresentationMemory RepresentationMemory RepresentationMemory Representation
Most memory allocators have a tendency toallocate newer objects at higher addresses
C
A,B
A C D E
D
B CA,BA CD EDB
Shape Memory
34
History Independence History Independence [NT01][NT01]History Independence History Independence [NT01][NT01]Definition
A data structure implementation is history independent if any two sequences S1 and S2 that yield the same content induce the same distribution on the memory representation.
Similar to Micciancio’s definition, except this is specified on memory representation
35
Single vs. Multiple Single vs. Multiple ObservationsObservationsSingle vs. Multiple Single vs. Multiple ObservationsObservations
Thief Hacker
36
Strong History Independence Strong History Independence [NT01][NT01]Strong History Independence Strong History Independence [NT01][NT01]Definition
Let P1 = {i11, i12, … , i1l} and P2 = {i21, i22, … , i2l}
be two lists of points such that for all b 2 {1, 2} and 1 · j · l,
we have that 1 · ibj · |Sb| and the content of data structure
following S1 up to i1j and S2 up to i2j are identical.
A data structure implementation is strongly history
independent if for any S1,P1,S2,P2 the distributions of the
memory representations at the points i1j and i2j are
identical.
37
Strong History Independence Strong History Independence [NT01][NT01]Strong History Independence Strong History Independence [NT01][NT01]
…
…
D1
S1
P1
D2
S2
P2
i11 i12 i13 i15
i21 i22 i23 i24 i25
i14
38
Weak History Independence Weak History Independence [NT01][NT01]Weak History Independence Weak History Independence [NT01][NT01]
Definition
A data structure implementation is history independent if any two sequences S1 and S2 that yield the same content induce the same distribution on the memory representation.
Similar to definition [Mic97], except this is specified on memory representation
Weakly
??
39
CheckpointCheckpoint
How HIDS got startedWhy obliviousness is inadequateWhat WHI and SHI are
The rest of this talk:- WHI is bearable- SHI is hard
40
Results in [NT01]Results in [NT01]Results in [NT01]Results in [NT01]
Besides the definitions,
WHI object allocatorsApplied to Dynamic perfect hashing
SHI hash tableCurrently does not support deletion
WHI Union-Find (sketch)
41
Incremental Card ShufflingIncremental Card ShufflingIncremental Card ShufflingIncremental Card ShufflingInsert(new card x)
Put the x at the end
Swap x with another card, chosen u.a.r.
Delete(card at position j)Swap card with the last card
Discard the (new) last cardLemma
The permutation of cards remains random with these procedures.
42
Fixed Size Object AllocatorFixed Size Object AllocatorFixed Size Object AllocatorFixed Size Object Allocator
Turn any bounded-indegree, fixed-size record oblivious data structure into WHI
Record allocation takes O(1) timeNeed to be careful when moving
blocks
ExamplesTreapsOblivious 2-3 Trees
44
Variable Size Object Variable Size Object AllocatorAllocatorVariable Size Object Variable Size Object AllocatorAllocator
Create buckets for objects of geometrically increasing size
B1, B2, B4, B8... (exercise :P)
Record allocation takes O(s log s) timeFrom here, adapt the dynamic perfect hashing into WHI
45
Going Back To Card ShufflingGoing Back To Card ShufflingGoing Back To Card ShufflingGoing Back To Card ShufflingInsert(new card x)
Put the x at the end
Swap x with another card, chosen u.a.r.
Don’t store the randomness and we get WHI
Don’t store the randomness and we get WHI
But that also
rules out SHI
But that also
rules out SHI
46
ExampleExampleExampleExample
3, 1, 2
3, 4, 2, 1
3, 4, 5, 1, 2
3, 2, 5, 1
3, 1, 2
3, 5, 2, 1
D1 D2
Ins(4)
Ins(5)
Del(4)
Ins(5)
47
Ordered HashingOrdered HashingOrdered HashingOrdered Hashing
In open addressing, how do you break tie in a collision?Resolve by alphabetical order
O. Amble, D. KnuthOrdered hash tables
- The Computer Journal 17(2), 1974
1 return
2 if
3 for
4 struct
5 while
6 int
7 float
48
SHI Hash TableSHI Hash TableSHI Hash TableSHI Hash Table
Looks like Ordered Hashing on Steroid
Define a priority function p(i, x, y) : {0, …, N-1} £ U £ U {True, False}
Round #
Contestants
Theorem
If p(i, x, y) defines a total order, then the hash table is SHI.
49
SHI Hash TableSHI Hash TableSHI Hash TableSHI Hash TableYouth-rules true if age(i,x) < age(i,y) p(i, x, y) = true if age(i,x) = age(i,y) and x < y
false otherwise
TheoremsFor any sequence of insertions the expected
amortized insertion probe-time for an element is 1/(1-).
For any element, the expected probe-time for successful or unsuccessful search is 1/(1-)
50
Observation in [NT01]Observation in [NT01]Observation in [NT01]Observation in [NT01]
“It is interesting to note that all the SHI data structures we have found have the property that each data structure content has a unique representation.”
SHI ) Unique Representation?
51
SHI ) Unique Representation
52
[H.02][H.02][H.02][H.02]
J. Hartline, E. S. Hong, A. E. Mohr, W. R. Pentney, E. C. RockeCharacterizing History Independent Data Structures
- ISSAC 2002
53
Results in [H.02]Results in [H.02]Results in [H.02]Results in [H.02]A (somewhat) simpler definition of WHI and
SHIShow SHI ) Unique Representation
Relax SHI to a new definition: SHI*The adversary can distinguish between empty and non-empty sequences of operations
Show how to resize WHI data structures
54
HysteresisHysteresisHysteresisHysteresis
n0.75n
2n
n
Capacity
#Items
0
Your wedding
is running out of
budget?
Your wedding
is running out of
budget?
56
[BP03][BP03][BP03][BP03]
Niv Buchbinder, Erez PetrankLower and Upper Bounds on Obtaining History Independence
- Crypto 2003
57
Results In [BP03]Results In [BP03]Results In [BP03]Results In [BP03]SHI ) Unique Representation
Predated by [H.02]
SHI Heap Lowerbound (in the comparison-based model)Insert, Extract, Increase-Key: (n)
WHI Heap UpperboundBuild-Heap: O(n); Increase-Key: O(log n)Insert and Extract-Max: expected O(log n)
58
Heap ReviewHeap ReviewHeap ReviewHeap Review
Binary Heap
10
3 6 2
8 5 4 1
9 7
10
9 7 8 5 4 1 3 6 2
59
www.HeapOrNot.comwww.HeapOrNot.comwww.HeapOrNot.comwww.HeapOrNot.com
2
3 6 5
8 9 4 1
10 7
10
3 6 2
8 5 4 1
9 7
Heapify
Heapify-1
60
Heap Heap Permutation PermutationHeap Heap Permutation Permutation
Build-Heap-1 (H)If size of H is 1, Return HChoose a node i u.a.r. among all nodes in HH Ã heapify-1(H, i)Return MakeNode(root(H),
Build-Heap-1(HL), Build-Heap-1(HR))
61
WHI HeapWHI HeapWHI HeapWHI Heap
1: Pick u.a.r. a heap among all possible heaps with values v1, …, vn.
2: Pick u.a.r. a permutation on the values v1, …, vn. Place them in an (almost) full tree and call Build-Heap.
Theorem 1 and 2 are actually the same distribution.
62
SimulationSimulationSimulationSimulation10
3 6 2
8 5 4 1
9 7
8,2,4,5,3,9,10,1,7,6
8,2,4,5,3,11,10,1,7,6,9
11
1 5 2
7 6 4 8
9 10
3
10,9,7,8,5,4,1,3,6,2
Build-Heap-1
Build-Heap
Insert
Bu
ild-H
eap
-1
63
[A.04][A.04][A.04][A.04]
U. A. Acar, G. E. Blelloch, R. Harper, J. L. Vittes, S. L. M. WooDynamizing Static Algorithms with Applications to Dynamic Trees and History Independence
- SODA 2004Also [ABH02], [ABH03]
64
DynamiziationDynamiziationDynamiziationDynamiziation10
3 6 2
8 5 4 1
9 7
8,2,4,5,3,9,10,1,7,6
8,2,4,5,3,11,10,1,7,6,9
11
1 5 2
7 6 4 8
9 10
3
Build-Heap
Build-Heap
Insert
10,9,7,8,5,4,1,3,6,2
Perm
ute
65
Uniquely Represented Dictionaries
66
[Sny77][Sny77][Sny77][Sny77]
Lawrence SnyderOn Uniquely Represented Data Structures
- FOCS 1977
67
Motivating QuestionsMotivating QuestionsMotivating QuestionsMotivating Questions
unique representation ) linear cost?logarithmic cost ) redundant representation?
Data Structure
Uniquely Represented
Cost
Linked list yes O(n)
Heap yes O(n)
AVL tree no O(log n)
2-3 tree no O(log n)
68
JellyfishJellyfishJellyfishJellyfish
Update and Search take O(n1/2) time
8
124
1 5 9 13
2
3
6
7
10
11
14
15
body
tentacles
69
Snyder’s LowerboundSnyder’s LowerboundSnyder’s LowerboundSnyder’s Lowerbound
Theorem
If a search tree uniquely represents its data, then
“The argumentation of this lemma is quite involved and will not be presented here.”
70
[AO91][AO91][AO91][AO91]
Arne Andersson, Thomas OttmannNew Tight Bounds on Uniquely Represented Dictionaries
- FOCS 1991
71
72
73
74
[ST90][ST90][ST90][ST90]
Rajamani Sundar, Robert E. TarjanUnique Binary Search Tree Representations and Equality-testing of Sets and Sequences
- STOC 1990
76
Future
77
Open ProblemsOpen ProblemsOpen ProblemsOpen Problems
SHI Hash table with deletion (Working on it)SHI* seems to be a very reasonable definition
See what can be done in this model
78
Q & A
79
GGooooooooooooooooooooooooooooooooggllee is is HiringHiringGGooooooooooooooooooooooooooooooooggllee is is HiringHiring
Amitabh Sinha
Operations and Mgmt. Science,
U.Mich. B. Sch.
Amitabh Sinha
Operations and Mgmt. Science,
U.Mich. B. Sch.
Ke Yang
Ke Yang
Nikhil Bansal
IBM Research
Nikhil Bansal
IBM Research
2004-5-15
80
• They are hiring!
• Talk to Ke Yang if you are interested…
(917) 881-2797
He’ll be around until 2004-Oct-1.
Ask him for a GMail account.
81
End
85
Color Test