Towards a Theory of Data Entanglement
James Aspnes, Joan Feigenbaum,
Aleksandr Yampolskiy, and Sheng Zhong
(Yale University)
Outline
Motivation
Dagster and Tangler
Our model
Notions of entanglement
Possibility and impossibility results
Conclusion
Goal: Protect Remotely Stored Data from the Server
Question: Suppose you store your data on a remote server. How do you ensure that it is not corrupted by the server?
Answer: Have your data entangled with some VIPs’ data so that corruption of your data implies corruption of theirs.
Previous Work: Dagster [SW01]
[Diagram: New Document; c randomly chosen blocks from the pool of blocks; Encrypt.]
Analysis:
Deleting a typical document causes the loss of O(c) documents.
Previous Work: Tangler [WM01]
[Diagram: the point (0, New Document) and 2 randomly chosen blocks from the pool of n blocks are used to interpolate a degree-2 polynomial F(), yielding the points (x1, F(x1)) and (x2, F(x2)).]
Analysis:
Deleting a typical document causes the loss of O((log n) / n) documents.
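The interpolation step in the Tangler picture can be sketched as follows. This is a minimal illustration, not Tangler's actual implementation: the prime modulus P, the integer encoding of blocks, the choice of new x-coordinates, and the names lagrange_eval and tangle are all assumptions made for the example.

```python
P = 2**31 - 1  # prime modulus, assumed here only for illustration

def lagrange_eval(points, x):
    """Evaluate at x the unique polynomial of degree < len(points) through points, mod P."""
    total = 0
    for i, (xi, yi) in enumerate(points):
        num, den = 1, 1
        for j, (xj, _) in enumerate(points):
            if j != i:
                num = num * (x - xj) % P
                den = den * (xi - xj) % P
        # pow(den, -1, P) is the modular inverse (Python 3.8+)
        total = (total + yi * num * pow(den, -1, P)) % P
    return total

def tangle(document, pool_blocks, new_xs):
    """Interpolate F through (0, document) and two pool blocks; return new points on F."""
    points = [(0, document)] + list(pool_blocks)
    return [(x, lagrange_eval(points, x)) for x in new_xs]

# Hypothetical toy values: a document encoded as 42 and two existing pool blocks.
pool = [(1, 100), (2, 200)]
new_blocks = tangle(42, pool, new_xs=[3, 4])

# Any three points on a degree-2 polynomial determine it, so the document F(0)
# can be read back from one old pool block and the two new blocks.
recovered = lagrange_eval([pool[0]] + new_blocks, 0)
```

Because three points fix a degree-2 polynomial, the document depends on the blocks used to rebuild it, which is the entanglement the slide refers to.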
Our Model: Basic Framework
Initialization: Keys are distributed to participants.
Entanglement: Users’ data are combined into a common store.
Tampering: The adversary tampers with the store before it is placed on the server.
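[Diagram: the initializer I distributes keys k1, k2, …, kn to the users and kE to the encoding E; E combines documents d1, d2, …, dn into a store, which passes through the tamperer to the storage server.]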
Our Model: Basic Framework (cont.)
Recovery: Users attempt to recover their data.
If Ri returns original document di, we say that user i recovers her data.
[Diagram: users holding keys k1, k2, …, kn retrieve the store from the storage server.]
Our Model: Classification
Question: What can the adversary do to the data store?
Answer: He can…
tamper with the store
tamper with the store and distribute a new recovery algorithm to all users (upgrade attack)
encrypt the store and distribute his recovery algorithm only to a few select buddies (superencryption attack)
Our Model: Classification (cont.)
Classification based on recovery algorithm:
Standard recovery algorithm
Public recovery algorithm
Private recovery algorithm
Our Model: Classification (cont.)
Classification based on corrupting algorithm:
Destructive adversary that reduces the entropy of the data store.
Arbitrary adversary.
Altogether, we have 6 (= 3 × 2) adversary classes.
Our Definitions
Fix an encoding scheme E, an adversary, and recovery algorithms Ri.
The recovery vector summarizes which documents are recovered.
Our Definitions (cont.)
Data dependency: di depends on dj if, with high probability, di being recovered implies that dj is recovered.
[Diagram: documents d1, d2, d3, d4; d1 depends on d2.]
Our Definitions (cont.)
All-or-nothing integrity (AONI): every document depends on every other document:
[Diagram: documents d1, d2, d3, d4, each depending on every other.]
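The two definitions above can be checked empirically on sampled recovery vectors. The sketch below is only an illustration: the 0/1 tuple representation of a recovery vector, the 0.99 threshold standing in for "with high probability", and the function names are assumptions, not part of the talk.

```python
def depends(vectors, i, j, threshold=0.99):
    """Estimate whether d_i depends on d_j from sampled recovery vectors.

    A sample supports the dependency when d_i is not recovered or d_j is recovered,
    i.e. when "d_i recovered implies d_j recovered" holds in that sample.
    """
    ok = sum(1 for r in vectors if not r[i] or r[j])
    return ok / len(vectors) >= threshold

def all_or_nothing(vectors, threshold=0.99):
    """AONI holds (empirically) when every document depends on every other document."""
    n = len(vectors[0])
    return all(
        depends(vectors, i, j, threshold)
        for i in range(n)
        for j in range(n)
        if i != j
    )

# Hypothetical samples: every run recovers either all documents or none.
aoni_samples = [(1, 1, 1), (0, 0, 0), (1, 1, 1)]
# Hypothetical samples: d2 is lost while d1 and d3 survive.
biased_samples = [(1, 0, 1)] * 5
```

In the second sample set, d1 does not depend on d2 (d1 survives without d2), so all-or-nothing integrity fails.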
Our Definitions (cont.)
Symmetric recovery: adversary cannot bias which documents are recovered
Possibility of AONI in Standard-Recovery Model
All users use the standard recovery algorithm: for all i, Ri = R.
When combining data, mark the data store using an unforgeable Message Authentication Code (MAC).
The standard recovery algorithm checks the MAC:
If the MAC is valid, recover data.
If the MAC is invalid, refuse to recover data.
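The MAC check described above can be sketched with Python's standard hmac module. This is a simplified illustration under stated assumptions: a single key shared by the encoder and the standard recovery algorithm and unknown to the adversary, HMAC-SHA256 as the MAC, a length-prefixed concatenation as the "combined" store, and the hypothetical names mac_entangle and mac_recover.

```python
import hashlib
import hmac

TAG_LEN = 32  # HMAC-SHA256 output size in bytes

def mac_entangle(key, documents):
    """Combine documents into one store and append a MAC tag over the whole store."""
    body = b"".join(len(d).to_bytes(4, "big") + d for d in documents)
    tag = hmac.new(key, body, hashlib.sha256).digest()
    return body + tag

def mac_recover(key, store, i):
    """Standard recovery R: return document i only if the MAC verifies, else None."""
    body, tag = store[:-TAG_LEN], store[-TAG_LEN:]
    expected = hmac.new(key, body, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None  # any tampering makes every user's recovery fail
    pos = 0
    for _ in range(i):  # skip over the first i length-prefixed documents
        pos += 4 + int.from_bytes(body[pos:pos + 4], "big")
    length = int.from_bytes(body[pos:pos + 4], "big")
    return body[pos + 4:pos + 4 + length]

key = b"k" * 32  # placeholder key for the example
docs = [b"alice", b"bob", b"carol"]
store = mac_entangle(key, docs)

# Flip one bit anywhere in the store: all recoveries are refused together.
tampered = bytes([store[0] ^ 1]) + store[1:]
```

Because the tag covers the whole store, a change to any one document invalidates recovery for all of them, which is the all-or-nothing behaviour the slide claims for this model.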
Impossibility of AONI in Public and Private-Recovery Models
If any users use the adversary’s recovery algorithm (for some i, Ri ≠ R), AONI cannot be achieved.
The adversary modifies the data store so that the old recovery algorithm does not work,
and distributes a new recovery algorithm that flips a coin to decide whether to recover data or not.
Impossibility of AONI in Public and Private-Recovery Models (cont.)
With high probability, not all coin flips will have same result.
With high probability, some data are recovered while others are not.
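To see how quickly "all coin flips agree" becomes unlikely, the probability can be computed exactly by enumeration. The sketch assumes one independent, fair coin flip per user, which the slides do not spell out; the function name is hypothetical.

```python
from fractions import Fraction
from itertools import product

def prob_all_flips_agree(n):
    """Exact probability that n independent fair coin flips all give the same result."""
    outcomes = list(product((0, 1), repeat=n))
    agree = sum(1 for r in outcomes if len(set(r)) == 1)
    return Fraction(agree, len(outcomes))

# Only two of the 2**n outcomes (all heads, all tails) are all-or-nothing,
# so the probability is 2 / 2**n.
p10 = prob_all_flips_agree(10)
```

With 10 users the chance of an all-or-nothing outcome is already 1/512, and it halves with each additional user.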
Possibility of Symmetric Recovery in Public-Recovery Model
All users use the adversary’s recovery algorithm: for all i, Ri is the algorithm distributed by the adversary.
We can prevent targeted destruction of documents.
Documents d1, …, dn must appear i.i.d.
Encoding scheme must be symmetric.
Possibility of AONI for Destructive Adversaries
We can achieve AONI in all recovery models if the tamperer destroys entropy.
When combining data, interpolate a polynomial using points (ki, di).
Store = polynomial.
AONI is achieved if sufficient entropy is removed.
Many stores are mapped to a single corrupted store.
With high probability, cannot recover every data item.
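The polynomial store can be sketched as below. This is an illustration only, not the construction's proof: the prime field P, storing the polynomial as a coefficient list, evaluation at the key as the recovery step, and the names poly_entangle and poly_recover are assumptions. Zeroing the constant coefficient stands in for an entropy-destroying tamperer, since it maps many different stores to the same corrupted store.

```python
P = 2**61 - 1  # prime modulus, assumed here only for illustration

def mul_linear(poly, root):
    """Multiply poly (coefficients, lowest degree first) by (x - root), mod P."""
    out = [0] * (len(poly) + 1)
    for i, c in enumerate(poly):
        out[i] = (out[i] - root * c) % P
        out[i + 1] = (out[i + 1] + c) % P
    return out

def poly_entangle(points):
    """Coefficients of the unique polynomial of degree < n through the points (k_i, d_i)."""
    coeffs = [0] * len(points)
    for i, (ki, di) in enumerate(points):
        basis, denom = [1], 1
        for j, (kj, _) in enumerate(points):
            if j != i:
                basis = mul_linear(basis, kj)
                denom = denom * (ki - kj) % P
        scale = di * pow(denom, -1, P) % P  # modular inverse, Python 3.8+
        for t, c in enumerate(basis):
            coeffs[t] = (coeffs[t] + scale * c) % P
    return coeffs

def poly_recover(store, key):
    """Evaluate the stored polynomial at the user's key (Horner's rule)."""
    acc = 0
    for c in reversed(store):
        acc = (acc * key + c) % P
    return acc

# Hypothetical toy data: keys 1, 2, 3 with documents 10, 19, 32.
points = [(1, 10), (2, 19), (3, 32)]
store = poly_entangle(points)           # 5 + 3x + 2x^2
tampered = [0] + store[1:]              # destroy the constant coefficient
after = [poly_recover(tampered, k) for k, _ in points]
```

In this example the single change to the store alters the value recovered under every key, so no user gets her document back.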
Summary of Results
                    Destructive Tamperer   Arbitrary Tamperer
Standard Recovery   all-or-nothing         all-or-nothing
Public Recovery     all-or-nothing         symmetric recovery
Private Recovery    all-or-nothing         -
Future Work
We have considered a single-round model. Allowing multiple rounds of storage/retrieval will be more realistic.
What if data entanglement is combined with other techniques like replication? Will that help to defend data against untrusted server(s)?