sketching in adversarial environments or sublinearity and cryptography
DESCRIPTION
Sketching in Adversarial Environments Or Sublinearity and Cryptography. Moni Naor. Joint work with: Ilya Mironov and Gil Segev. Comparing Streams. How to compare data streams without storing them? Step 1: Compress data on-line into sketches
TRANSCRIPT
![Page 1: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/1.jpg)
Sketching in Adversarial Environments
Or
Sublinearity and Cryptography
1
Moni Naor
Joint work with: Ilya Mironov and Gil Segev
![Page 2: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/2.jpg)
2
Comparing Streams
How to compare data streams without storing them?
[Figure: two streams, $S_A$ and $S_B$]
Step 1: Compress data on-line into sketches
Step 2: Interact using only the sketches
Goal: Minimize sketches, update time, and communication
![Page 3: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/3.jpg)
3
Comparing Streams
How to compare data streams that cannot be stored?
Real-life applications: massive data sets, on-line data, ...
Highly efficient solutions assuming shared randomness
[Figure label: Shared randomness]
![Page 4: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/4.jpg)
4
Comparing Streams
How to compare data streams that cannot be stored?
Is shared randomness a reasonable assumption?
No guarantees when set adversarially
Inputs may be adversarially chosen depending on the randomness
Plagiarism detection
[Figure label: Shared randomness]
![Page 5: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/5.jpg)
5
The Adversarial Sketch Model
Massive data sets: Sketching, streaming
“Adversarial” factors: No secrets, adversarially-chosen inputs
Communication complexity
Adversarial sketch model
![Page 6: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/6.jpg)
6
The Adversarial Sketch Model
Goal: Compute f(A,B)
Sketch phase (small sketches, fast updates)
• An adversary chooses the inputs of the parties
• Provided as on-line sequences of insert and delete operations
• No shared secrets
• The parties are not allowed to communicate
• Any public information is known to the adversary in advance
• Adversary is computationally all-powerful
Interaction phase (low communication & computation)
![Page 7: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/7.jpg)
7
Our Results
Equality testing: $A, B \subseteq [N]$ of size at most $K$, error probability $\epsilon$
If we had public randomness…
• Sketches of size $O(\log(1/\epsilon))$
• Similar update time, communication and computation
Lower Bound: Equality testing in the adversarial sketch model requires sketches of size $\Omega(\sqrt{K \cdot \log(N/K)})$
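To see why public randomness makes equality testing cheap, here is a minimal Python sketch of one standard approach: each party XORs together pseudorandom masks of its elements, all derived from a shared seed. The seed value, the 64-bit length, the use of SHA-256 as a stand-in for a random function, and the names `element_mask` and `XorSketch` are assumptions for illustration, not taken from the talk.

```python
import hashlib

SKETCH_BITS = 64  # assumed sketch length in bits


def element_mask(seed: bytes, x: int) -> int:
    """Derive a SKETCH_BITS-bit mask for element x from the shared seed."""
    digest = hashlib.sha256(seed + x.to_bytes(8, "big")).digest()
    return int.from_bytes(digest[: SKETCH_BITS // 8], "big")


class XorSketch:
    """Sketch of a set: XOR of the masks of its current elements."""

    def __init__(self, seed: bytes) -> None:
        self.seed = seed
        self.value = 0

    def toggle(self, x: int) -> None:
        """Insert x if absent, delete it if present (XOR is its own inverse)."""
        self.value ^= element_mask(self.seed, x)


seed = b"shared-public-randomness"  # hypothetical shared random string
alice, bob = XorSketch(seed), XorSketch(seed)
for x in (3, 17, 42):
    alice.toggle(x)
for x in (42, 3, 17, 99):
    bob.toggle(x)
bob.toggle(99)  # delete 99 again, so both parties hold {3, 17, 42}
same = alice.value == bob.value
```

The guarantee only holds when the inputs are chosen independently of the seed. An adversary who knows the seed in advance can look for two different sets with the same sketch, which is the situation the adversarial sketch model is meant to capture.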
![Page 8: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/8.jpg)
8
Our Results
Equality testing: $A, B \subseteq [N]$ of size at most $K$, error probability $\epsilon$
Lower Bound: Equality testing in the adversarial sketch model requires sketches of size $\Omega(\sqrt{K \cdot \log(N/K)})$
Upper Bound: Explicit and efficient protocol
• Sketches of size $O(\sqrt{K \cdot \mathrm{polylog}(N) \cdot \log(1/\epsilon)})$
• Update time, communication and computation $\mathrm{polylog}(N)$
![Page 9: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/9.jpg)
9
Our Results
Symmetric difference approximation: $A, B \subseteq [N]$ of size at most $K$
Goal: approximate $|A \Delta B|$ with error probability $\epsilon$
Upper Bound: $(1 + \rho)$-approximation for any constant $\rho$
• Sketches of size $O(\sqrt{K \cdot \mathrm{polylog}(N) \cdot \log(1/\epsilon)})$
• Update time, communication and computation $\mathrm{polylog}(N)$
Explicit construction: $\mathrm{polylog}(N)$-approximation
![Page 10: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/10.jpg)
10
Outline
• Lower bound
• Equality testing
– Main tool: Incremental encoding
– Explicit construction using dispersers
• Symmetric difference approximation
• Summary & open problems
![Page 11: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/11.jpg)
11
Simultaneous Messages Model
x y
f(x,y)
![Page 12: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/12.jpg)
12
Simultaneous Messages Model
Lower Bound [NS96, BK97]: Equality testing in the private-coin SM model requires communication $\Omega(\sqrt{K \cdot \log(N/K)})$
[Figure labels: x, y; sketches; adversarial sketch model]
![Page 13: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/13.jpg)
13
Outline
• Lower bound
• Equality testing
– Main tool: Incremental encoding
– Explicit construction using dispersers
• Symmetric difference approximation
• Summary & open problems
![Page 14: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/14.jpg)
14
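Simultaneous Equality Testing
[Figure: $x$ is encoded as $C(x)$ and $y$ as $C(y)$; a codeword of length $K$ is arranged as a $K^{1/2} \times K^{1/2}$ matrix]
Communication $K^{1/2}$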
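The row/column idea on this slide can be sketched in Python as follows. The slide does not say which code $C$ is used, so this sketch assumes a Reed-Solomon-style code over a small prime field as a stand-in; the constants `P` and `SIDE` and all function names are hypothetical.

```python
import random

P = 257   # assumed prime field size; must be at least SIDE * SIDE
SIDE = 8  # the codeword is arranged as a SIDE x SIDE matrix


def encode(message):
    """Evaluate the message polynomial at SIDE*SIDE points (Reed-Solomon style)."""
    def poly(t):
        acc = 0
        for coeff in reversed(message):  # Horner's rule
            acc = (acc * t + coeff) % P
        return acc
    return [[poly(r * SIDE + c) for c in range(SIDE)] for r in range(SIDE)]


def alice_message(x, rng):
    """Alice sends a random row index and that row of C(x)."""
    row = rng.randrange(SIDE)
    return row, encode(x)[row]


def bob_message(y, rng):
    """Bob sends a random column index and that column of C(y)."""
    col = rng.randrange(SIDE)
    return col, [line[col] for line in encode(y)]


def referee(msg_a, msg_b):
    """Accept iff C(x) and C(y) agree where Alice's row meets Bob's column."""
    row, row_values = msg_a
    col, col_values = msg_b
    return row_values[col] == col_values[row]
```

Each message carries `SIDE` symbols, the square root of the codeword length. Two distinct messages of `k` symbols agree on at most `k - 1` of the `SIDE * SIDE` positions, so the referee wrongly accepts with probability at most `(k - 1) / SIDE**2` under this assumed code.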
![Page 15: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/15.jpg)
15
First Attempt
[Figure: $C(A)$ with row = 3 selected and $C(B)$ with col = 2 selected; the compared entry is $C(B)_{3,2}$]
Sketches of size $K^{1/2}$
Problem: update time $K^{1/2}$
![Page 16: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/16.jpg)
16
Incrementality vs. Distance
High distance: For every distinct $A, B \subseteq [N]$ of size at most $K$, $d(C(A),C(B)) > 1 - \epsilon$
Incrementality: Given $C(S)$ and $x \in [N]$, the encodings of $S \cup \{x\}$ and $S \setminus \{x\}$ are obtained by modifying very few entries
Impossible to achieve both properties simultaneously with Hamming distance
[Callout labels: logarithmic, constant]
![Page 17: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/17.jpg)
17
Incremental Encoding
$S \mapsto C(S)_1, \ldots, C(S)_r$
$d(C(A),C(B)) = 1 - \prod_{i=1}^{r} \{1 - d_H(C(A)_i, C(B)_i)\}$, where $d_H$ is the normalized Hamming distance
• $r = 1$: Hamming distance
• Hope: Larger $r$ will enable fast updates
• $r$ corresponds to the communication complexity of our protocol
• Want to keep $r$ as small as possible
Explicit construction with $r = \log K$:
• Codeword size $K \cdot \mathrm{polylog}(N)$
• Update time $\mathrm{polylog}(N)$
![Page 18: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/18.jpg)
18
Equality Protocol
[Figure: codewords $C(A)_1, C(A)_2, C(A)_3$ and $C(B)_1, C(B)_2, C(B)_3$; messages are rows (3,1,1) and cols (2,3,1), together with the values]
Error probability: $1 - d(C(A),C(B)) = \prod_{i=1}^{r} \{1 - d_H(C(A)_i, C(B)_i)\} < \epsilon$
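The error probability on this slide is the chance that one uniformly sampled entry from each of the $r$ codeword pairs agrees in all $r$ pairs. A small Python sketch of that computation follows; the function names are hypothetical, and codewords are represented as plain lists purely for illustration.

```python
from fractions import Fraction


def hamming_fraction(u, v):
    """Normalized Hamming distance between two equal-length codewords."""
    assert len(u) == len(v)
    return Fraction(sum(a != b for a, b in zip(u, v)), len(u))


def incremental_distance(code_a, code_b):
    """d(C(A),C(B)) = 1 - prod_i (1 - d_H(C(A)_i, C(B)_i))."""
    all_agree = Fraction(1)  # probability that every sampled entry agrees
    for u, v in zip(code_a, code_b):
        all_agree *= 1 - hamming_fraction(u, v)
    return 1 - all_agree


code_a = [[0, 0, 0, 0], [1, 1, 1, 1]]
code_b = [[0, 0, 0, 1], [1, 1, 0, 0]]
distance = incremental_distance(code_a, code_b)  # 1 - (3/4)(1/2) = 5/8
```

With $r = 1$ the function reduces to the normalized Hamming distance, matching the remark that $r = 1$ is the Hamming case.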
![Page 19: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/19.jpg)
19
The Encoding
Global encoding
• Map each element to several entries of each codeword
• Exploit “random-looking” graphs
Local encoding
• Resolve collisions separately in each entry
• A simple solution when $|A \Delta B|$ is guaranteed to be small
![Page 20: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/20.jpg)
20
The Local Encoding
Suppose that $|A \Delta B| \le \ell$
![Page 21: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/21.jpg)
21
Missing Number Puzzle
Let $S = \{1, \ldots, N\} \setminus \{i\}$ and let $\pi$ be a random permutation over $S$
$\pi(1), \ldots, \pi(N-1)$ arrives as a one-way stream
One number $i$ is missing
Goal: Determine the missing number $i$ using $O(\log N)$ bits
What if there are $\ell$ missing numbers?
• Can it be done using $O(\ell \cdot \log N)$ bits?
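For the single missing number, the well-known solution keeps one running sum. A minimal Python sketch (the function name `find_missing` is hypothetical):

```python
def find_missing(stream, n):
    """Return the element of {1, ..., n} absent from the stream.

    Only one counter is kept; its value never exceeds n(n+1)/2,
    so it fits in O(log n) bits.
    """
    remaining = n * (n + 1) // 2
    for value in stream:
        remaining -= value
    return remaining


missing = find_missing([3, 1, 5, 2], 5)
```

The same idea extends to $\ell$ missing numbers by tracking the first $\ell$ power sums, which is the shape of the local encoding used later in the talk.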
![Page 22: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/22.jpg)
22
The Local Encoding
Suppose that $|A \Delta B| \le \ell$
A simple & well-known solution:
Associate each $x \in [N]$ with $v(x)$ such that for any distinct $x_1, \ldots, x_\ell$ the vectors $v(x_1), \ldots, v(x_\ell)$ are linearly independent
$C(S) = \sum_{x \in S} v(x)$
If $1 \le |A \Delta B| \le \ell$ then $C(A) \ne C(B)$
For example $v(x) = (1, x, \ldots, x^{\ell-1})$
Size & update time $O(\ell \cdot \log N)$
Independent of the size of the sets
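A minimal Python sketch of this local encoding, assuming arithmetic modulo a prime `P` larger than the universe size so that the vectors $v(x) = (1, x, \ldots, x^{\ell-1})$ stay linearly independent; the choice of prime and the class name `LocalEncoding` are assumptions made for illustration.

```python
P = 2_147_483_647  # the prime 2**31 - 1, assumed larger than N


class LocalEncoding:
    """C(S) = sum over x in S of v(x), with v(x) = (1, x, ..., x**(ell-1)) mod P."""

    def __init__(self, ell):
        self.ell = ell
        self.coords = [0] * ell

    def _update(self, x, sign):
        power = 1
        for j in range(self.ell):
            self.coords[j] = (self.coords[j] + sign * power) % P
            power = (power * x) % P

    def insert(self, x):
        self._update(x, +1)

    def delete(self, x):
        self._update(x, -1)


def encode(elements, ell):
    enc = LocalEncoding(ell)
    for x in elements:
        enc.insert(x)
    return enc.coords


# A = {1, 4} and B = {2, 3} have the same size and the same sum, and |A Δ B| = 4.
collide = encode([1, 4], 2) == encode([2, 3], 2)  # ell = 2 is too small here
differ = encode([1, 4], 4) != encode([2, 3], 4)   # ell = 4 covers |A Δ B|
```

Each update touches only the `ell` coordinates, regardless of how many elements the set holds, which is the "independent of the size of the sets" point on the slide.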
![Page 23: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/23.jpg)
23
The Global Encoding
Each element is mapped into several entries of each codeword
The content of each entry is locally encoded
[Figure: universe of size N mapped into codewords $C_1$, $C_2$, $C_3$]
![Page 24: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/24.jpg)
24
The Global Encoding
Each element is mapped into several entries of each codeword
The content of each entry is locally encoded
The local guarantee: If $1 \le |C_i[y] \cap (A \Delta B)| \le \ell$ then $C(A)$ and $C(B)$ differ on $C_i[y]$
Consider $\ell = 1$
[Figure: universe of size N with sets A and B; highlighted entries such as $C_1[2]$, with the callout “C(A) and C(B) differ at least on these entries”]
![Page 25: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/25.jpg)
25
The Global Encoding
Identify each codeword with a bipartite graph $G = ([N], R, E)$
For $S \subseteq [N]$ define $\Gamma(S,\ell) \subseteq R$ as the set of all $y \in R$ for which $1 \le |\Gamma(y) \cap S| \le \ell$
$(K, \epsilon, \ell)$-Bounded-Neighbor Disperser: For any $S \subset [N]$ such that $K \le |S| \le 2K$ it holds that $|\Gamma(S,\ell)| > (1 - \epsilon)|R|$
[Figure: universe of size N with a subset S]
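The bounded-neighbor disperser condition can be checked by brute force on toy graphs. The Python sketch below does so under an assumed representation in which the graph is a dict from each right vertex $y$ to its set of left neighbors $\Gamma(y)$; the function names are hypothetical, and the check is exponential, so it is only meant to illustrate the definition.

```python
from itertools import combinations


def good_right_vertices(neighbors, subset, ell):
    """Gamma(S, ell): right vertices y with 1 <= |Gamma(y) ∩ S| <= ell."""
    subset = set(subset)
    return {y for y, nbrs in neighbors.items() if 1 <= len(nbrs & subset) <= ell}


def is_bnd(neighbors, universe, k, eps, ell):
    """Check |Gamma(S, ell)| > (1 - eps)|R| for every S with k <= |S| <= 2k."""
    for size in range(k, 2 * k + 1):
        for subset in combinations(universe, size):
            good = good_right_vertices(neighbors, subset, ell)
            if len(good) <= (1 - eps) * len(neighbors):
                return False
    return True


# Toy graph: four right vertices, each adjacent to two of the left vertices 1..4.
toy = {0: {1, 2}, 1: {2, 3}, 2: {3, 4}, 3: {4, 1}}
```

On this toy graph the condition holds for $K = 2$, $\epsilon = 0.5$, $\ell = 2$ but fails for $\ell = 1$, because the set $\{1, 2\}$ leaves only two right vertices with exactly one neighbor in it.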
![Page 26: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/26.jpg)
26
The Global Encoding
$r = \log K$ codewords, each $C_i$ is identified with a $(2^i, \epsilon, \ell)$-BND (Bounded-Neighbor Disperser)
For $i = \log_2 |A \Delta B|$ we have $d_H(C(A)_i, C(B)_i) > 1 - \epsilon$
In particular $d(C(A),C(B)) = 1 - \prod_{i=1}^{r} \{1 - d_H(C(A)_i, C(B)_i)\} > 1 - \epsilon$
[Figure: universe of size N with sets A and B mapped into codewords $C_1$, $C_2$, $C_3$]
![Page 27: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/27.jpg)
27
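Constructing BNDs
$(K, \epsilon, \ell)$-Bounded-Neighbor Disperser: For any $S \subset [N]$ such that $K \le |S| \le 2K$ it holds that $|\Gamma(S,\ell)| > (1 - \epsilon)|R|$
Universe of size $N$, codeword of length $M$
Given $N$ and $K$, want to optimize $M$, $\ell$, $\epsilon$ and the left-degree $D$

| | Optimal | Extractor | Disperser |
|---|---|---|---|
| $M$ | $K \cdot \log(N/K)$ | $K \cdot 2^{(\log\log N)^2}$ | $K$ |
| $D$ | $\log(N/K)$ | $2^{(\log\log N)^2}$ | $\mathrm{polylog}(N)$ |
| $\ell$ | $O(1)$ | $1$ | $\mathrm{polylog}(N)$ |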
![Page 28: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/28.jpg)
28
Outline
• Lower bound
• Equality testing
– Main tool: Incremental encoding
– Explicit construction using dispersers
• Symmetric difference approximation
• Summary & open problems
![Page 29: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/29.jpg)
29
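Symmetric Difference Approximation
1. Sketch input streams into codewords: $A \mapsto C(A)_1, \ldots, C(A)_k$ and $B \mapsto C(B)_1, \ldots, C(B)_k$
2. Compare $s$ entries from each pair of codewords; $d_i$ = # of differing entries sampled from the $i$-th pair ($d_1, \ldots, d_k$)
3. Output $APX = (1 + \rho)^i$ for the maximal $i$ s.t. $d_i \gtrsim (1 - \epsilon)s$
$|A \Delta B| \le APX \le (1 + \rho) \cdot \frac{KD}{(1 - \epsilon)M} \cdot |A \Delta B|$
where the factor $\frac{KD}{(1 - \epsilon)M}$ is $\sim 1$ for the non-explicit construction and $\mathrm{polylog}(N)$ for the explicit one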
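Step 3 of the procedure is a simple threshold rule, sketched below in Python. The slide's "approximately at least" comparison is replaced by a plain `>=`, and the function name and the `None` return value for the case where no index qualifies are assumptions made for illustration.

```python
def approximate_difference(differing, samples, eps, rho):
    """Return (1 + rho)**i for the largest i with d_i >= (1 - eps) * samples.

    `differing` lists d_1, ..., d_k, the number of differing entries among
    the `samples` entries compared in each codeword pair. Returns None when
    no index passes the threshold (an assumption; the slide leaves this open).
    """
    best = None
    for i, d_i in enumerate(differing, start=1):
        if d_i >= (1 - eps) * samples:
            best = i
    return None if best is None else (1 + rho) ** best


estimate = approximate_difference([20, 19, 19, 7], samples=20, eps=0.1, rho=1.0)
```

In this toy input the first three pairs pass the threshold of 18 differing samples, so the rule picks $i = 3$ and outputs $(1 + \rho)^3$.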
![Page 30: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/30.jpg)
30
Outline
• Lower bound
• Equality testing
– Main tool: Incremental encoding
– Explicit construction using dispersers
• Symmetric difference approximation
• Summary & open problems
![Page 31: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/31.jpg)
31
Summary
Formalized a realistic model for computation over massive data sets
Massive data sets: Sketching, streaming
“Adversarial” factors: No secrets, adversarially-chosen inputs
Communication complexity
Adversarial sketch model
![Page 32: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/32.jpg)
32
Summary
Formalized a realistic model for computation over massive data sets
Incremental encoding
• Main technical contribution
• Additional applications?
• $S \mapsto C(S)_1, \ldots, C(S)_r$
• $d(C(A),C(B)) = 1 - \prod_{i=1}^{r} \{1 - d_H(C(A)_i, C(B)_i)\}$
Determined the complexity of two fundamental tasks
• Equality testing
• Symmetric difference approximation
![Page 33: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/33.jpg)
33
Open Problems
Better explicit approximation for symmetric difference
• Our $(1 + \rho)$-approximation is non-explicit
• Explicit approximation: $\mathrm{polylog}(N)$
Approximating various similarity measures
• $L_p$ norms, resemblance, ...
The Power of Adversarial Sketching
• Characterizing the class of functions that can be “efficiently” computed in the adversarial sketch model (sublinear sketches, polylog updates)
• Possible approach: public-coins to private-coins transformation that “preserves” the update time
![Page 34: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/34.jpg)
34
Computational Assumptions
Better schemes using computational assumptions?
Equality testing: Incremental collision-resistant hashing [BGG ’94]
• Significantly smaller sketches
• Existing constructions either have very long public descriptions, or rely on random oracles
• Practical constructions without random oracles?
Symmetric difference approximation:
• Not known
• Even with random oracles!
Thank you!
![Page 35: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/35.jpg)
Pan-Privacy Model
Data is stream of items, each item belongs to a user
Data of different users interleaved arbitrarily
Curator sees items, updates internal state, output at stream end
Pan-Privacy: For every possible behavior of user in stream, joint distribution of the internal state at any single point in time and the final output is differentially private
Can also consider multiple intrusions
[Figure labels: state, output]
![Page 36: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/36.jpg)
Adjacency: User Level
Universe U of users whose data in the stream; $x \in U$
• Streams x-adjacent if same projections of users onto U\{x}
Example: axbxcxdxxxex and abcdxe are x-adjacent
• Both project to abcde
• Notion of “corresponding locations” in x-adjacent streams
• U-adjacent: $\exists x \in U$ for which they are x-adjacent
– Simply “adjacent,” if U is understood
Note: Streams of different lengths can be adjacent
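User-level adjacency is easy to check mechanically: remove every occurrence of the user and compare what is left. A minimal Python sketch, with hypothetical function names and streams represented as strings of one-character user ids:

```python
def project_out(stream, user):
    """Drop every occurrence of `user` from the stream."""
    return [item for item in stream if item != user]


def is_x_adjacent(stream_a, stream_b, user):
    """True if the two streams agree once `user` is projected out."""
    return project_out(stream_a, user) == project_out(stream_b, user)


def is_adjacent(stream_a, stream_b, universe):
    """U-adjacent: x-adjacent for some user x in the universe."""
    return any(is_x_adjacent(stream_a, stream_b, user) for user in universe)


example = is_x_adjacent("axbxcxdxxxex", "abcdxe", "x")
```

The two example streams have lengths 12 and 6, yet both project to abcde, which illustrates the note that streams of different lengths can be adjacent.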
![Page 37: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/37.jpg)
Example: Stream Density or # Distinct Elements
Universe U of users, estimate how many distinct users in U appear in data stream
Application: # distinct users who searched for “flu”
Ideas that don’t work:
• Naïve
– Keep list of users that appeared (bad privacy and space)
• Streaming
– Track random sub-sample of users (bad privacy)
– Hash each user, track minimal hash (bad privacy)
![Page 38: Sketching in Adversarial Environments Or Sublinearity and Cryptography](https://reader035.vdocument.in/reader035/viewer/2022081516/56815164550346895dbf8fe9/html5/thumbnails/38.jpg)
Pan-Private Density Estimator
Inspired by randomized response.
Store for each user $x \in U$ a single bit $b_x$
Initially all $b_x$ drawn from distribution $D_0$: 0 w.p. ½, 1 w.p. ½
When encountering x redraw $b_x$ from distribution $D_1$: 0 w.p. ½−ε, 1 w.p. ½+ε
Final output: [(fraction of 1’s in table − ½)/ε] + noise
Pan-Privacy
If user never appeared: entry drawn from $D_0$
If user appeared any # of times: entry drawn from $D_1$
$D_0$ and $D_1$ are 4ε-differentially private
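The estimator can be simulated in a few lines of Python. The slide only says "+ noise", so the Laplace noise and its scale below are assumptions, as are the class and method names; the bit table and the redraw rule follow the slide.

```python
import random


class PanPrivateDensity:
    def __init__(self, universe_size, eps, rng):
        self.eps = eps
        self.rng = rng
        # Initially every bit is drawn from D0: 0 w.p. 1/2, 1 w.p. 1/2.
        self.bits = [rng.random() < 0.5 for _ in range(universe_size)]

    def observe(self, user):
        # Redraw the user's bit from D1: 1 w.p. 1/2 + eps, 0 w.p. 1/2 - eps.
        self.bits[user] = self.rng.random() < 0.5 + self.eps

    def estimate(self, noise_scale):
        fraction_ones = sum(self.bits) / len(self.bits)
        # Assumed noise: Laplace(0, noise_scale), sampled as the difference
        # of two exponentials with mean noise_scale.
        noise = (self.rng.expovariate(1 / noise_scale)
                 - self.rng.expovariate(1 / noise_scale))
        return (fraction_ones - 0.5) / self.eps + noise


rng = random.Random(1)
sketch = PanPrivateDensity(universe_size=20_000, eps=0.25, rng=rng)
for user in range(6_000):  # 30% of the users appear
    sketch.observe(user)
for user in range(1_000):  # repeat visits do not change the distribution
    sketch.observe(user)
density = sketch.estimate(noise_scale=0.001)
```

The expected fraction of 1's is ½ + ε · density, so subtracting ½ and dividing by ε recovers the density up to sampling error and the added noise. Repeat visits simply redraw from $D_1$ again, which is why the state reveals only whether a user appeared, not how often.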