The Complexity of Pebbling Graphs and Spam Fighting
Moni Naor, WEIZMANN INSTITUTE OF SCIENCE
Based on:
Cynthia Dwork, Andrew Goldberg, N:
On Memory-Bound Functions for Fighting Spam.
Cynthia Dwork, N, Hoeteck Wee:
Pebbling and Proofs of Work
Principal techniques for spam-fighting
1. FILTERING
text-based, trainable filters …
2. MAKING SENDER PAY
computation [Dwork Naor 92, Back 97, Abadi Burrows Manasse Wobber 03, DGN 03, DNW 05]
human attention [Naor 96, Captcha]
micropayments
NOTE techniques are complementary: reinforce each other!
Talk Plan
The proofs of work approach
DGN’s Memory bound functions
Generating a large random looking table [DNW]
Open problems: moderately hard functions
Pricing via processing [Dwork-Naor Crypto 92]
automated for the user
non-interactive, single-pass
no need for third party or payment infrastructure
IDEA If I don’t know you: prove you spent significant computational resources (say 10 secs CPU time), just for me, and just for this message
sender S to recipient R: message m, time d, + proof = f(m,S,R,d)
f is easy to verify, moderately hard to compute
Choosing the function f
Message m, Sender S, Receiver R and Date and time d
Hard to compute f(m,S,R,d); cost cannot be amortized
• lots of work for the sender
• Should have good understanding of best methods for computing f
Easy to check “z = f(m,S,R,d)”: little work for the receiver
Parameterized to scale with Moore's Law
• easy to exponentially increase computational cost, while barely increasing checking cost
Example: computing a square root mod a prime vs. verifying it: $x^2 \equiv y \pmod{P}$
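The square-root example can be made concrete with a minimal sketch. It assumes a prime $P \equiv 3 \pmod 4$, where a root of a quadratic residue $y$ is $y^{(P+1)/4} \bmod P$ (this shortcut and the chosen prime are illustration choices, not from the slides). Computing costs one modular exponentiation, about $\log_2 P$ multiplications; verifying costs a single multiplication.

```python
def sqrt_mod_p(y: int, p: int) -> int:
    """Square root of y modulo a prime p with p % 4 == 3.

    Costs one modular exponentiation (about log2(p) modular multiplications).
    Assumes y is a quadratic residue mod p.
    """
    if p % 4 != 3:
        raise ValueError("this shortcut needs p = 3 (mod 4)")
    return pow(y, (p + 1) // 4, p)


def verify_sqrt(x: int, y: int, p: int) -> bool:
    """Verification costs a single modular multiplication."""
    return (x * x - y) % p == 0


P = 2**31 - 1           # a Mersenne prime; P % 4 == 3
y = pow(123456, 2, P)   # a quadratic residue by construction
x = sqrt_mod_p(y, P)
print(verify_sqrt(x, y, P))  # True
```

The gap here is only a logarithmic factor, which is why such functions are "moderately" hard rather than intractable.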
Which computational resource(s)?
WANT a cost that corresponds to the same computation time across machines
computing cycles
high variance of CPU speeds within desktops
factors of 10-30
memory-bound approach [Abadi Burrows Manasse Wobber 03]
low variance in memory latencies
factors of 1-4
GOAL design a memory-bound proof of effort function which requires a large number of cache misses
memory-bound model
USER
MAIN MEMORY large but slow
CACHE small but fast
SPAMMER
MAIN MEMORY may be very very large; may exploit locality
CACHE cache size at most ½ user’s main memory
1. charge accesses to main memory
must avoid exploitation of locality
2. computation is free
except for hash function calls
watch out for low-space crypto attacks
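To see why the model charges main-memory accesses and must rule out locality, here is a toy LRU cache simulation (a sketch only; the table size, cache size and 8-word line size are hypothetical). A sequential scan pays one miss per cache line, while random accesses into a table twice the cache size miss roughly half the time.

```python
import random
from collections import OrderedDict


def count_misses(addresses, cache_lines: int, line_words: int) -> int:
    """Count misses for a sequence of word addresses under an LRU cache."""
    cache = OrderedDict()  # line number -> None, ordered by recency of use
    misses = 0
    for addr in addresses:
        line = addr // line_words
        if line in cache:
            cache.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1                    # miss: charged as a main-memory access
            cache[line] = None
            if len(cache) > cache_lines:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


TABLE_WORDS = 4096   # table is twice the cache size
CACHE_LINES = 256    # 256 lines * 8 words = 2048 words of cache
LINE_WORDS = 8

rng = random.Random(0)
random_walk = [rng.randrange(TABLE_WORDS) for _ in range(TABLE_WORDS)]

print(count_misses(range(TABLE_WORDS), CACHE_LINES, LINE_WORDS))  # 512: one miss per line
print(count_misses(random_walk, CACHE_LINES, LINE_WORDS))         # typically about half of 4096
```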
Talk Plan
The proofs of work approach
DGN’s Memory bound functions
Generating a large random looking table [DNW]
Open problems: moderately hard functions
Path-following approach [DGN Crypto 03]
PUBLIC large random table T (2 x spammer’s cache size)
PARAMETERS integer L, effort parameter e
IDEA path is a sequence of L sequential accesses to T
sender searches a collection of paths to find a good path
collection depends on (m, S, R, d)
density of good paths = $1/2^e$
locations in T depend on hash functions H0,…,H3
OUTPUT (m, S, R, d) + description of a good path
COMPLEXITY sending: $O(2^e L)$ memory accesses; verifying: $O(L)$ accesses
[Figure: table T and a collection P of paths of length L, depending on (m,S,R,d), with one successful path highlighted]
Abstracted Algorithm
Sender and Receiver share large random Table T.
To send message m from sender S to receiver R at date/time d:
Repeat trial for k = 1,2, … until success:
Current state specified by A, an auxiliary table
Thread defined by (m,S,R,d,k)
Initialization: A = H0(m,S,R,d,k)
Main Loop: Walk for L steps (L=path length):
c = H1(A)
A = H2(A,T[c])
Success: if the last e bits of H3(A) are 00…0
Attach to (m,S,R,d) the successful trial number k and H3(A)
Verification: straightforward given (m, S, R, d, k, H3 (A))
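The abstracted algorithm can be sketched end to end as follows. This is a toy, not DGN's implementation: SHA-256 with a domain-separation tag stands in for the idealized H0,…,H3; the table is generated deterministically for reproducibility (the scheme wants a truly random T of twice the spammer's cache size); A is only a 32-byte digest, whereas the slides note A should not be small; and the header encoding of (m,S,R,d) is hypothetical.

```python
import hashlib


def H(tag: bytes, *parts: bytes) -> bytes:
    """Stand-in for the idealized H0..H3: SHA-256 with a domain-separation tag."""
    h = hashlib.sha256(tag)
    for part in parts:
        h.update(len(part).to_bytes(4, "big"))  # length prefix avoids ambiguity
        h.update(part)
    return h.digest()


def walk(T: list, header: bytes, k: int, L: int) -> bytes:
    """One trial: L dependent lookups into T, then return H3(A)."""
    A = H(b"H0", header, k.to_bytes(8, "big"))
    for _ in range(L):
        c = int.from_bytes(H(b"H1", A), "big") % len(T)  # next table index
        A = H(b"H2", A, T[c])                            # fold entry into the state
    return H(b"H3", A)


def last_bits_zero(digest: bytes, e: int) -> bool:
    return int.from_bytes(digest, "big") % (1 << e) == 0


def send(T: list, header: bytes, L: int, e: int):
    """Expected 2^e trials, i.e. about 2^e * L table accesses."""
    k = 1
    while True:
        h3 = walk(T, header, k, L)
        if last_bits_zero(h3, e):
            return k, h3
        k += 1


def verify(T: list, header: bytes, k: int, h3: bytes, L: int, e: int) -> bool:
    """A single trial: L table accesses."""
    return last_bits_zero(h3, e) and walk(T, header, k, L) == h3


# Hypothetical toy parameters.
T = [hashlib.sha256(i.to_bytes(4, "big")).digest() for i in range(1024)]
header = b"m|S|R|d"
k, h3 = send(T, header, L=64, e=6)
print(k, verify(T, header, k, h3, L=64, e=6))
```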
Animated Algorithm – a Single Step in the Loop
C = H1(A)
A = H2(A,T[C])
[Figure: H1 maps A to index C; the entry T[C] is read from table T and combined with A by H2]
Full Specification
E = (expected) factor by which computation cost exceeds verification = expected number of trials = $2^e$
If H3 behaves as a random function
L = length of walk
Want, say, ELt = 10 seconds, where
t = memory latency = 0.2 μsec
Reasonable choices:
E = 24,000, L = 2048
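The arithmetic behind these choices, as a quick check. It assumes, as above, t = 0.2 μs per access and that every step of the walk is a cache miss.

```python
import math


def required_E(target_seconds: float, L: int, latency_seconds: float) -> float:
    """Solve E * L * t = target for E (expected number of trials)."""
    return target_seconds / (L * latency_seconds)


def send_time(E: float, L: int, latency_seconds: float) -> float:
    """Expected sending time E * L * t when every step misses the cache."""
    return E * L * latency_seconds


t = 0.2e-6                      # 0.2 microseconds per memory access
E = required_E(10.0, 2048, t)   # about 24,414
e = math.ceil(math.log2(E))     # smallest e with 2^e >= E
print(round(E), e, round(send_time(24_000, 2048, t), 2))  # 24414 15 9.83
```

Since the success test makes E a power of two, rounding up to e = 15 gives E = 32,768 and about 13.4 seconds.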
Also need: How large is A?
A should not be very small…
abstract algorithm
1. Initialize: A = H0(m,S,R,d,k)
2. Main Loop: Walk for L steps:
c ← H1(A)
A ← H2(A,T[c])
3. Success if H3(A) ends in $0^{\log E}$ (its last log E bits are zero)
4. Trial repeated for k = 1,2, …
5. Proof = (m,S,R,d,k,H3(A))
Choosing the H’s
A “theoretical” approach: idealized random functions
Provide a formal analysis showing that the amortized number of memory accesses is high
A concrete approach inspired by the RC4 stream cipher
Very Efficient: a few cycles per step
Don’t have time inside inner loop to compute complex function
A is not small – changes gradually
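For reference, here is the standard RC4 keystream generator. It is not DGN's concrete function; it only illustrates the kind of inner loop meant here: a 256-byte state that is not small, changes gradually (one swap per step), and costs a few cycles per step.

```python
def rc4_keystream(key: bytes, n: int) -> bytes:
    """Standard RC4: key scheduling, then n bytes of keystream."""
    S = list(range(256))
    j = 0
    for i in range(256):                  # key-scheduling algorithm (KSA)
        j = (j + S[i] + key[i % len(key)]) % 256
        S[i], S[j] = S[j], S[i]
    i = j = 0
    out = bytearray()
    for _ in range(n):                    # each step: two index updates and one swap
        i = (i + 1) % 256
        j = (j + S[i]) % 256
        S[i], S[j] = S[j], S[i]
        out.append(S[(S[i] + S[j]) % 256])
    return bytes(out)


print(rc4_keystream(b"Key", 10).hex())  # eb9f7781b734ca72a719
```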
Experimental Results across different machines
Path-following approach [Dwork-Goldberg-Naor Crypto 03]
[Remarks]
1. lower bound holds for spammer maximizing throughput across any collection of messages and recipients
2. model idealized hash functions using random oracles
3. relies on information-theoretic unpredictability of T
[Theorem] fix any spammer:
whose cache size is smaller than |T|/2
assuming T is truly random
assuming H0,…,H3 are idealized hash functions
the amortized number of memory accesses per successful message is $\Omega(2^e L)$.
Why Random Oracles?
Random Oracles 101
Can measure progress:
know which oracle calls must be made
can see when they occur.
First occurrence of each such call is a progress call:
1 2 3 1 3 2 3 4…
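A minimal sketch of how progress is measured: a lazily sampled random oracle (fresh random answer per new query, same answer on repeats) that records the position of each first-time query. The class name is hypothetical; the call sequence is the one above.

```python
import os


class LoggingRandomOracle:
    """Lazily sampled random oracle that records its progress calls."""

    def __init__(self, out_len: int = 16):
        self.answers = {}
        self.out_len = out_len
        self.calls = 0
        self.progress = []  # 0-based positions (in call order) of progress calls

    def __call__(self, query):
        if query not in self.answers:            # first occurrence: a progress call
            self.answers[query] = os.urandom(self.out_len)
            self.progress.append(self.calls)
        self.calls += 1
        return self.answers[query]


oracle = LoggingRandomOracle()
for q in [1, 2, 3, 1, 3, 2, 3, 4]:  # the call sequence from the slide
    oracle(q)
print(oracle.progress)  # [0, 1, 2, 7]
```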
Proof highlights
Use of idealized hash function implies:
At any point in time A is incompressible
The average number of oracle calls per success is $\Omega(EL)$.
We can follow the progress of the algorithm
Cast the problem as that of asymmetric communication complexity between memory and cache
Only the cache has access to the functions H1 and H2
[Figure: cache and memory as the two communicating parties]
Talk Plan
The proofs of work approach
DGN’s Memory bound functions
Generating a large random looking table [DNW]
Open problems
Using a succinct table [DNW 05]
GOAL use a table T with a succinct description
easy distribution of software (new users)
fast updates (over slow connections)
PROBLEM lose information-theoretic unpredictability
spammer can exploit succinct description to avoid memory accesses
IDEA generate T using a memory-bound process
Use time-space trade-offs for pebbling, studied extensively in the 1970s
User builds the table T once and for all
Pebbling a graph
GIVEN a directed acyclic graph
RULES:
inputs: a pebble can be placed on an input node at any time
a pebble can be placed on any non-input vertex if all immediate parent nodes have pebbles
pebbles may be removed at any time
GOAL find a strategy to pebble all the outputs while using few pebbles and few moves
[Figure: example DAG with INPUT nodes at the bottom and OUTPUT nodes at the top]
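The pebbling rules above can be checked mechanically. This sketch (function name and move encoding are hypothetical) replays a strategy, rejects illegal placements, and reports the number of moves, the peak number of pebbles, and whether every output was pebbled.

```python
def play_pebbling(parents: dict, outputs: set, moves: list):
    """Replay a pebbling strategy.

    parents: node -> tuple of immediate parents (empty tuple = input node)
    moves:   list of ("place", v) or ("remove", v)
    Returns (placements, max pebbles at once, all outputs pebbled at some point).
    """
    pebbled, reached = set(), set()
    placements = max_pebbles = 0
    for op, v in moves:
        if op == "place":
            # Inputs may always be pebbled; other nodes need all parents pebbled.
            if any(p not in pebbled for p in parents[v]):
                raise ValueError(f"illegal move: a parent of {v!r} has no pebble")
            pebbled.add(v)
            placements += 1
            max_pebbles = max(max_pebbles, len(pebbled))
            if v in outputs:
                reached.add(v)
        elif op == "remove":
            pebbled.discard(v)  # pebbles may be removed at any time
        else:
            raise ValueError(f"unknown move {op!r}")
    return placements, max_pebbles, reached == set(outputs)


# A small pyramid: inputs a, b, c; d from (a, b); e from (b, c); output g from (d, e)
parents = {"a": (), "b": (), "c": (), "d": ("a", "b"), "e": ("b", "c"), "g": ("d", "e")}
strategy = [("place", "a"), ("place", "b"), ("place", "d"), ("remove", "a"),
            ("place", "c"), ("place", "e"), ("remove", "b"), ("remove", "c"),
            ("place", "g")]
print(play_pebbling(parents, {"g"}, strategy))  # (6, 4, True)
```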
What do we know about pebbling
Any graph can be pebbled using O(N/log N) pebbles. [Valiant]
There are graphs requiring Ω(N/log N) pebbles [PTC]
Any graph of depth d can be pebbled using O(d) pebbles
(graphs of constant degree throughout)
Tight tradeoffs: some shallow graphs require many (super-polynomially many) steps to pebble with few pebbles [LT]
Some results about pebbling outputs hold even when possible to put the available pebbles in any initial configuration
Succinctly generating T
GIVEN a directed acyclic graph with constant in-degree
1. input node i labeled H4(i)
2. non-input node i labeled H4(i, labels of parent nodes)
3. entries of T = labels of output nodes
[Figure: node i with parents j and k; Li = H4(i, Lj, Lk)]
OBSERVATION good pebbling strategy ⇒ good spammer strategy
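The labeling rule can be sketched in a few lines. SHA-256 with a tag stands in for the idealized H4 (an assumption), and Python's `graphlib.TopologicalSorter` supplies an order in which every node comes after its parents.

```python
import hashlib
from graphlib import TopologicalSorter


def H4(i: int, *parent_labels: bytes) -> bytes:
    """Stand-in for the idealized H4 (assumption: SHA-256 with a tag)."""
    h = hashlib.sha256(b"H4" + i.to_bytes(8, "big"))
    for label in parent_labels:  # labels are fixed-length 32-byte digests
        h.update(label)
    return h.digest()


def generate_table(parents: dict, outputs: list) -> list:
    """Label nodes in topological order; T = labels of the output nodes.

    parents maps node -> tuple of parent nodes (empty tuple for input nodes).
    """
    label = {}
    for i in TopologicalSorter(parents).static_order():
        label[i] = H4(i, *(label[p] for p in parents[i]))
    return [label[o] for o in outputs]


parents = {0: (), 1: (), 2: (), 3: (0, 1), 4: (1, 2), 5: (3, 4)}
T = generate_table(parents, [5])
print(len(T), T[0].hex()[:16])
```

A spammer who can pebble the outputs cheaply can recompute these labels cheaply, which is exactly the observation above.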
Converting spammer strategy to a pebbling
EX POST FACTO PEBBLING computed by offline inspection of spammer strategy
1. PLACING A PEBBLE place a pebble on node i if
H4 used to compute Li = H4(i, Lj, Lk), and
Lj, Lk are the correct labels
2. INITIAL PEBBLES place initial pebble on node j if
H4 applied with Lj as argument, and
Lj not computed via H4
3. REMOVING A PEBBLE remove a pebble as soon as it’s not needed anymore
IDEA limit # of pebbles used by the spammer as a function of its cache size and # of bits it brings from memory
computing a label using a hash function:
lower bound on # moves ⇒ lower bound on # hash function calls
using cache + memory fetches:
lower bound on # pebbles ⇒ lower bound on # memory accesses
Constructing the dag
CONSTRUCTION dag D composed of D1 & D2
D1 has the property that pebbling many outputs requires many pebbles
more than cache and pages brought from memory can supply
stack of superconcentrators [Lengauer Tarjan 82]
D2 is a fault-tolerant layered graph
even if a constant fraction of each layer is deleted, it can still embed a superconcentrator
stack of expanders [Alon Chung 88, Upfal 92]
[Figure: dag D built from D1 and D2, running from the inputs of D to the outputs of D]
SUPERCONCENTRATOR is a dag
N inputs, N outputs
any k inputs and k outputs connected by vertex-disjoint paths
Using the dag
[idea] fix any execution:
1. let S = set of mid-level nodes pebbled
2. if S is large, use time-space trade-offs for D1
3. if S is small, use fault-tolerant property of D2 :
delete nodes whose labels are largely determined by S
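To make the superconcentrator property used in D1 concrete, here is a brute-force checker for tiny graphs (exponential time, illustration only). It counts vertex-disjoint paths with unit-capacity max-flow after splitting each node into an in/out pair, a standard Menger-style technique; it assumes the input and output sets are disjoint.

```python
from collections import defaultdict, deque
from itertools import combinations


def max_vertex_disjoint_paths(edges, nodes, sources, sinks) -> int:
    cap = defaultdict(int)
    adj = defaultdict(set)

    def add(u, v, c):
        cap[(u, v)] += c
        adj[u].add(v)
        adj[v].add(u)  # allow traversal of the residual edge

    for v in nodes:
        add((v, "in"), (v, "out"), 1)  # each node carries at most one path
    for u, v in edges:
        add((u, "out"), (v, "in"), 1)
    for s in sources:
        add("S", (s, "in"), 1)
    for t in sinks:
        add((t, "out"), "T", 1)

    flow = 0
    while True:  # Edmonds-Karp: shortest augmenting paths via BFS
        prev = {"S": None}
        queue = deque(["S"])
        while queue and "T" not in prev:
            u = queue.popleft()
            for v in adj[u]:
                if v not in prev and cap[(u, v)] > 0:
                    prev[v] = u
                    queue.append(v)
        if "T" not in prev:
            return flow
        v = "T"
        while prev[v] is not None:  # push one unit along the path
            u = prev[v]
            cap[(u, v)] -= 1
            cap[(v, u)] += 1
            v = u
        flow += 1


def is_superconcentrator(edges, nodes, inputs, outputs) -> bool:
    for k in range(1, len(inputs) + 1):
        for ins in combinations(inputs, k):
            for outs in combinations(outputs, k):
                if max_vertex_disjoint_paths(edges, nodes, ins, outs) < k:
                    return False
    return True


full = [(0, 2), (0, 3), (1, 2), (1, 3)]        # every input wired to every output
bottleneck = [(0, 4), (1, 4), (4, 2), (4, 3)]  # all paths pass through node 4
print(is_superconcentrator(full, [0, 1, 2, 3], [0, 1], [2, 3]))           # True
print(is_superconcentrator(bottleneck, [0, 1, 2, 3, 4], [0, 1], [2, 3]))  # False
```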
The lower bound result
[Remarks]
1. lower bound holds for spammer maximizing throughput across any collection of messages and recipients
2. model idealized hash functions using random oracles
[Theorem] for the dag D, fix any spammer:
whose cache size is smaller than |T|/2
assuming H0,…,H4 are idealized hash functions
makes poly # of hash function calls
the amortized number of memory accesses per successful message is $\Omega(2^e L)$.
What can we conclude from the lower bound?
Shows that the design principles are sound
Gives us a plausibility argument
Tells us that if something will go wrong we will know where to look
But
Based on idealized random functions
How to implement them
Might be computationally expensive
Are applied to all of A
Might be computationally expensive simply to “touch” all of A
Talk Plan
The proofs of work approach
DGN’s Memory bound functions
Generating a large random looking table [DNW]
Open problems: moderately hard functions
Alternative construction based on sorting
motivated by time-space trade-offs for sorting [Borodin Cook 82]
easier to implement
[Figure: the array is repeatedly sorted (SORT, SORT, …) between hashing rounds]
T[i] = H4(i, 1)
T[i] = H4(i, T[i], 2)
1. input node i labeled H4(i, 1)
2. at each round, sort array
3. then apply H4 to current values of the array
OPEN PROBLEM prove a lower bound
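The sorting-based rounds can be sketched as below. SHA-256 over a simple encoding stands in for H4 (an assumption), and the number of rounds is a free parameter here since the slides do not fix it.

```python
import hashlib


def H4(i: int, *rest) -> bytes:
    """Stand-in for the idealized H4 (assumption: SHA-256 over a simple encoding)."""
    h = hashlib.sha256(b"H4" + i.to_bytes(8, "big"))
    for x in rest:
        h.update(x if isinstance(x, bytes) else x.to_bytes(8, "big"))
    return h.digest()


def sort_based_table(n: int, rounds: int) -> list:
    T = [H4(i, 1) for i in range(n)]            # round 1: T[i] = H4(i, 1)
    for r in range(2, rounds + 1):
        T.sort()                                # sort the whole array
        T = [H4(i, T[i], r) for i in range(n)]  # round r: T[i] = H4(i, T[i], r)
    return T


print(len(sort_based_table(1024, 4)))  # 1024
```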
More open problems
WEAKER ASSUMPTIONS? no recourse to random oracles
use lower bounds for cell probe model and branching programs?
Unlike most of cryptography – in this case there is a chance of coming up with an unconditional result
Physical limitations of computation to form a reasonable lower bound on the spammer’s effort
A theory of moderately hard functions?
Key idea in cryptography: use the computational infeasibility of problems in order to obtain security.
For many applications moderate hardness is needed
current applications:
abuse prevention, fairness, few round zero-knowledge
FURTHER WORK develop a theory of moderately hard functions
Open problems: moderately hard functions
Unifying Assumption
In the intractable world: one-way function necessary and sufficient for many tasks
Is there a similar primitive when moderate hardness is needed?
Precise model
Details of the computational model may matter; can they be unified?
Hardness Amplification
Start with a somewhat hard problem and turn it into one that is harder.
Hardness vs. Randomness
Can we turn moderate hardness into moderate pseudorandomness?
Following the standard transformation is not necessarily applicable here
Evidence for non-amortization
Is it possible to demonstrate that if a certain problem is not resilient to amortization, then a single instance can be solved much more quickly?
Open problems: moderately hard functions
Immunity to Parallel Attacks
Important for timed-commitments
For the power function that was used, is there a good argument showing immunity against parallel attacks?
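A sketch of the question, assuming (as in time-lock puzzles of Rivest, Shamir and Wagner) that the power function is $g^{2^t} \bmod N$. Without the factorization of N the obvious method is t sequential squarings; whoever knows $\varphi(N)$ can shortcut the exponent. The open problem is whether the sequential method admits any substantial parallel speedup.

```python
def power_sequential(g: int, t: int, N: int) -> int:
    """g^(2^t) mod N by t squarings, each depending on the previous one."""
    x = g % N
    for _ in range(t):
        x = x * x % N
    return x


def power_trapdoor(g: int, t: int, N: int, phi: int) -> int:
    """With phi(N) known, reduce 2^t mod phi(N) first (needs gcd(g, N) = 1)."""
    return pow(g, pow(2, t, phi), N)


p, q = 61, 53  # toy primes; a real modulus would be large
N, phi = p * q, (p - 1) * (q - 1)
print(power_sequential(5, 20, N) == power_trapdoor(5, 20, N, phi))  # True
```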
Is it possible to reduce worst-case to average case:
find a random self reduction.
In the intractable world it is known that there are limitations on random self reductions from NP-Complete problems
Is it possible to randomly reduce a P-Complete problem to itself?
Is it possible to use linear programming or lattice basis reduction for such purposes?
New Candidates for Moderately Hard Functions
Thank you
Merci beaucoup (thank you very much)
תודה רבה (thank you very much)