Making proof-based verified computation almost practical
Michael Walfish
The University of Texas at Austin
The motivation is 3rd party computing: cloud, volunteers, etc.
By verified computation, we mean the following: the client sends “f” and input x to the server; the server returns y (plus auxiliary information); the client checks whether y = f(x), without computing f(x), and outputs ACCEPT or REJECT.
We desire the following properties in the above exchange:
1. Unconditional, meaning no assumptions about the server
2. General-purpose, meaning not specialized to a particular f
3. Practical, or at least conceivably practical soon
Theory can supposedly help. Consider the theory of Probabilistically Checkable Proofs (PCPs) [ALMSS92, AS92]. Unfortunately, the constants and proof length are outrageous: using a naive PCP implementation, verifying multiplication of 500×500 matrices would take 500+ trillion CPU years (seriously).
We have reduced the costs of a PCP-based argument system [Ishai et al., CCC07] by 20+ orders of magnitude, with proof.
We have implemented the refinements in a system that is not ready for prime time but is practical in some cases.
We believe that PCP-based machinery will be a key tool for building secure systems in the near future.
S. Setty, B. Braun, V. Vu, A. J. Blumberg, B. Parno, and M. Walfish. Resolving the conflict between generality and plausibility in verified computation. Cryptology ePrint 622, November 2012.
S. Setty, V. Vu, N. Panpalia, B. Braun, A. J. Blumberg, and M. Walfish. Taking proof-based verified computation a few steps closer to practicality. Usenix Security, August 2012.
S. Setty, R. McPherson, A. J. Blumberg, and M. Walfish. Making argument systems for outsourced computation practical (sometimes). Network and Distributed Systems Security Symposium (NDSS), February 2012.
S. Setty, A. J. Blumberg, and M. Walfish. Toward practical and unconditional verification of remote computations. Workshop on Hot Topics in Operating Systems (HotOS), May 2011.
Pepper incorporates PCPs, but not like this: the client sends “f” and x; the server returns y along with a PCP; the client outputs ACCEPT/REJECT. The proof is not drawn to scale: it is far too long to be transferred. (Even the asymptotically short PCPs [BGHSV CCC05, BGHSV SIJC06, Dinur JACM07, Ben-Sasson & Sudan SIJC08] seem to have high constants.)
We move out of the PCP setting: we make computational assumptions. (And we allow # of query bits to be superconstant.)
Pepper uses an efficient argument [Kilian CRYPTO 92, 95; IKO CCC07]. Instead of transferring the PCP, the server first answers a commit request with a commit response, binding itself to a proof vector w. The client then sends queries q1, q2, q3, …, and the server answers each with PCPQuery(q) { return <q, w>; }. The client runs efficient checks [ALMSS JACM98] on the response scalars q1·w, q2·w, q3·w, … and outputs ACCEPT/REJECT.
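The query/response step above can be sketched as follows. This is a minimal illustration of the idea (not the Pepper implementation): the field modulus and the proof-vector values are toy choices of ours.

```python
# Minimal sketch of the linear-PCP query/response step (illustrative only).
# The server's committed "oracle" is the linear function pi(q) = <q, w>
# over a prime field.

P = 2**31 - 1  # a toy prime modulus; the real protocol's field differs

def pcp_query(w, q):
    """Server side: answer query vector q with the inner product <q, w> mod P."""
    return sum(qi * wi for qi, wi in zip(q, w)) % P

w = [3, 1, 4, 1, 5]              # committed proof vector (toy values)
queries = [[1, 0, 0, 0, 0],      # a query that reads w[0]
           [2, 7, 1, 8, 2]]      # a random-looking linear combination
responses = [pcp_query(w, q) for q in queries]  # the client checks these
```

The point of the sketch: each response is a single field element, so the client learns spot-check information about w without the (far too long) proof ever being transferred.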
The server’s vector w encodes an execution trace of f(x). What is in w? (1) An entry for each wire; and (2) an entry for the product of each pair of wires.
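The two-part structure of w can be sketched directly: the quadratic part is the outer product of the wire assignment with itself. Names here are ours; this is an illustration, not the system's code.

```python
# Sketch: build the proof vector w = (z, z ⊗ z) from a wire assignment z:
# one entry per wire, plus one entry per pair of wires. Illustrative only.

def build_w(z):
    tensor = [zi * zj for zi in z for zj in z]  # z ⊗ z: all pairwise products
    return z + tensor

z = [1, 0, 1]                            # toy wire values
w = build_w(z)
assert len(w) == len(z) + len(z) ** 2    # |w| is quadratic in |z|
```

This quadratic blowup in |w| is exactly the server overhead that a later refinement (mostly) eliminates.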
(Figure: a Boolean circuit computing f, with inputs x0, x1, …, xn, internal AND/OR/NOT gates carrying 0/1 values on the wires, and outputs y0, y1.)
This is still too costly (by a factor of 10^23), but it is promising.
PEPPER incorporates refinements to [IKO CCC07], with proof. The exchange is as before: the client sends “f” and x; the server commits to w (commit request, commit response); the client sends query vectors q1, q2, q3, …; the server returns response scalars q1·w, q2·w, q3·w, …; the client runs checks and outputs ACCEPT/REJECT.
The client amortizes its overhead by reusing queries over multiple runs. Each run has the same f but a different input x: the same query vectors q1, q2, q3, … are issued against the proof vectors w1, w2, w3, … of the different runs.
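The amortization idea can be sketched as follows. The cost split is the point of the sketch; the query-generation procedure here is a stand-in (real query generation is protocol-specific), and all names are ours.

```python
# Sketch of query amortization: the client pays to generate query vectors once,
# then reuses them across many runs of the same f on different inputs x.
# Illustrative only.

import random

P = 2**31 - 1

def gen_queries(n, u, seed=0):
    """One-time (expensive, amortizable) client work: u query vectors."""
    rng = random.Random(seed)
    return [[rng.randrange(P) for _ in range(n)] for _ in range(u)]

def respond(w, queries):
    """Per-run server work: inner products against this run's proof vector w."""
    return [sum(qi * wi for qi, wi in zip(q, w)) % P for q in queries]

queries = gen_queries(n=4, u=3)            # generated once for the batch
batch = [[1, 2, 3, 4], [4, 0, 0, 1]]       # w1, w2 for different inputs x
all_responses = [respond(w, queries) for w in batch]  # queries reused
```

The fixed cost of gen_queries is divided over the batch, which is why (as the evaluation later shows) the client needs a large enough batch to break even.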
The computational model moves from Boolean circuits, to arithmetic circuits, to arithmetic circuits with concise gates (× and + gates that act on whole values a, b rather than on bits). Unfortunately, this computational model does not really handle fractions, comparisons, logical operations, etc.
We now work with (degree-2) constraints over a finite field. For example, Increment(X) { Y = X + 1 } becomes the constraints:
0 = X − <input>, 0 = Y − (X + 1), 0 = Y − <output>
As an example, suppose the input above is 6. If the output is 7, the constraints are 0 = X − 6, 0 = Y − (X + 1), 0 = Y − 7, and there is a solution (X = 6, Y = 7). If the output is 8, the constraints are 0 = X − 6, 0 = Y − (X + 1), 0 = Y − 8, and there is no solution. An incorrect input/output pair results in unsatisfiable constraints.
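The Increment example can be checked mechanically. This is a toy sketch in a small prime field of our choosing; the brute-force search is only there to show that the wrong output admits no satisfying assignment at all.

```python
# Sketch: check a candidate assignment against the Increment constraints
# over a small prime field. Illustrative only.

P = 101  # toy prime modulus

def satisfies(x_in, y_out, X, Y):
    c1 = (X - x_in) % P        # 0 = X - <input>
    c2 = (Y - (X + 1)) % P     # 0 = Y - (X + 1)
    c3 = (Y - y_out) % P       # 0 = Y - <output>
    return c1 == 0 and c2 == 0 and c3 == 0

assert satisfies(6, 7, X=6, Y=7)          # correct output: satisfiable
assert not any(satisfies(6, 8, X, Y)      # wrong output: no (X, Y) works
               for X in range(P) for Y in range(P))
```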
This formalism represents general-purpose program constructs concisely: if/else, !=, <, >, ||, &&, floating-point, …
“Z1 != Z2” becomes 1 = (Z1 − Z2)·M, where M is an auxiliary variable; the constraint is satisfiable exactly when Z1 − Z2 is nonzero (and hence invertible in the field).
“if (bool X) Y = 3; else Y = 4;” becomes 0 = X − M, 0 = Y − (M·3 + (1 − M)·4).
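The != gadget can be sketched concretely: the prover supplies the auxiliary value M, and the verifier only checks the degree-2 constraint. The field size and variable names here are ours; the modular-inverse call requires Python 3.8+.

```python
# Sketch of the "Z1 != Z2" gadget: the prover supplies M with
# 1 = (Z1 - Z2) * M, which is possible exactly when Z1 != Z2 in the field.
# Illustrative only.

P = 101  # toy prime modulus

def neq_witness(z1, z2):
    """Prover: return M satisfying 1 = (z1 - z2) * M mod P, or None."""
    d = (z1 - z2) % P
    if d == 0:
        return None             # Z1 == Z2: no witness exists
    return pow(d, -1, P)        # modular inverse (Python 3.8+)

def neq_constraint_holds(z1, z2, m):
    """Verifier: the degree-2 constraint itself."""
    return ((z1 - z2) * m) % P == 1

m = neq_witness(23, 87)
assert m is not None and neq_constraint_holds(23, 87, m)
assert neq_witness(5, 5) is None
```

Note the division of labor: finding M (an inverse) is the prover's job, so the verifier's constraint stays degree-2.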
We have a compiler that does this transformation; it is derived from Fairplay [Malkhi et al. Usenix Security04].
This work should apply to other protocols.
The proof vector w now encodes the assignment that satisfies the constraints: for constraints such as 1 = (Z1 − Z2)·M, 0 = Z3 − Z4, 0 = Z3·Z5 + Z6 − 5, the entries of w are field values such as Z1 = 23, Z2 = 187, …, rather than the 0/1 wire values of a Boolean circuit.
Take-away: constraints are general-purpose, (relatively) concise, and accommodate the verification machinery.
The savings from the change are enormous (though the new model is not perfect – loops are still unrolled).
We (mostly) eliminate the server’s PCP-based overhead:
before: # of entries in w quadratic in the computation size (|w| = |Z|²)
after: # of entries linear in the computation size
Now, the server’s overhead is mainly in the cryptographic machinery and the constraint model itself. The client and server reap tremendous benefit from this change. This resolves a conjecture of Ishai et al. [IKO CCC07]: for any computation, there exists a linear PCP whose proof vector is (quasi)linear in the computation size. [Gennaro et al. 2012]
In the PCP verifier’s protocol, the commit request/response binds the server to a linear function π(·) = <·, w>; the queries q1, q2, …, qu come back as responses π(q1), …, π(qu), which feed a linearity test, a quadratic-correction test, and a circuit test. Before, the proof vector was w = (z, z ⊗ z), so |w| = |Z|²; with our new quadratic test, it is w = (z, h), so |w| = |Z| + |C|.
We eliminate linearity tests. We prove: the commitment primitive is (far) stronger than a linearity test.
A linearity test checks π(q1) + π(q2) = π(q1 + q2). The commitment protocol already gives the client leverage over π: in the commit request, the client sends Enc(r); the commit response contains Enc(π(r)); then, along with the queries q1, q2, …, qu, the client sends t = r + α1·q1 + … + αu·qu and runs a consistency test, checking π(t) =? π(r) + α1·π(q1) + … + αu·π(qu).
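The consistency check can be sketched as plain arithmetic (leaving out the encryption that hides r from the server): for any genuinely linear π, the identity holds automatically. All names and sizes here are toy choices of ours.

```python
# Sketch of the consistency test that replaces linearity tests: when
# t = r + sum_i alpha_i * q_i and pi is linear, pi(t) must equal
# pi(r) + sum_i alpha_i * pi(q_i). Illustrative only (no encryption of r).

import random

P = 2**31 - 1
rng = random.Random(1)

def pi(w, q):                      # the server's linear function <q, w>
    return sum(qi * wi for qi, wi in zip(q, w)) % P

n, u = 4, 3
w = [rng.randrange(P) for _ in range(n)]
r = [rng.randrange(P) for _ in range(n)]              # secret blinding query
qs = [[rng.randrange(P) for _ in range(n)] for _ in range(u)]
alphas = [rng.randrange(P) for _ in range(u)]

# t = r + alpha_1*q_1 + ... + alpha_u*q_u (componentwise, mod P)
t = [(r[j] + sum(a * q[j] for a, q in zip(alphas, qs))) % P
     for j in range(n)]

lhs = pi(w, t)
rhs = (pi(w, r) + sum(a * pi(w, q) for a, q in zip(alphas, qs))) % P
assert lhs == rhs   # holds whenever the server really answers with a linear pi
```

A server that answers with anything other than a fixed linear function cannot predict t's relation to r (which is hidden by encryption), so it fails this check with high probability; that is the sense in which the commitment primitive subsumes the linearity test.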
We eliminate soundness amplification (repetition to reduce error).
Our implementation of the server is massively parallel; it is threaded, distributed, and GPUed.
Some details of our evaluation platform:
It uses a cluster at the Texas Advanced Computing Center (TACC).
Each machine runs Linux on an Intel Xeon 2.53 GHz with 48 GB of RAM.
For GPU experiments, we use NVIDIA Tesla M2070 GPUs (448 CUDA cores and 6 GB of memory).
To get a sense of the cost reduction, consider amortized costs for multiplication of 500×500 matrices:

                   under the theory,        under PEPPER
                   naively applied
  client CPU time  >100 trillion years      <2.2 seconds
  server CPU time  >100 trillion years      6.5 hours

However, this assumes a (fairly large) batch.
1. What are the break-even points?
2. At these points, is the server’s latency acceptable?
(Figure: CPU time vs. # of instances; verification costs start above computation costs but grow more slowly, crossing at the break-even point.)
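The break-even point is simple amortization arithmetic: the client's fixed per-batch query cost spreads over the instances, and break-even is where total verification work drops below just computing f locally. The cost numbers in this sketch are hypothetical, not measurements.

```python
# Sketch of the break-even calculation. The client pays a fixed (per-batch)
# cost plus a per-instance verification cost; break-even is the smallest
# batch at which this beats computing every instance locally.
# All cost figures below are made up for illustration.

import math

def break_even(fixed_cost, per_instance_verify, per_instance_compute):
    """Smallest batch size at which verifying beats local computation."""
    if per_instance_verify >= per_instance_compute:
        return None  # verification never wins, no matter the batch size
    return math.ceil(fixed_cost /
                     (per_instance_compute - per_instance_verify))

# Hypothetical numbers (seconds): 1000 s of one-time query setup, 0.5 s to
# verify an instance, 0.75 s to compute an instance locally.
assert break_even(1000.0, 0.5, 0.75) == 4000
```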
Consider outsourcing in two different regimes:

(1) Verification work performed on CPU:

                   mat. mult.   insertion sort   PAM clustering
                   (m=500)      (m=256)          (m=15, d=100)
  break-even       4100 inst.   30,500 inst.     24,500 inst.
  client CPU time  2.5 hours    27 minutes       25 minutes
  server CPU time  3.1 years    10 months        9.5 months

(2) If we had free crypto hardware for verification:

                   mat. mult.   insertion sort   PAM clustering
                   (m=500)      (m=256)          (m=15, d=100)
  break-even       20 inst.     1570 inst.       990 inst.
  client CPU time  44 sec.      83 sec.          60 sec.
  server CPU time  5.4 days     16 days          11.9 days
(Figure: server running time for matrix multiplication, insertion sort, PAM clustering, and all-pairs shortest paths, on 1 core, 4 cores, 60 cores, and 60 cores (ideal).)
Parallelizing the server results in near-linear speedup.
(Figure: the same comparison for polynomial evaluation, Hamming distance, root finding, and the Fannkuch benchmark, on 1 core, 4 cores, 60 cores, and 60 cores (ideal).)
Parallelizing the server results in near-linear speedup.
PEPPER is not ready for prime time:
1. The client requires batching to break even.
2. The server’s burden is still too high.
3. The computational model is still limited (looping is clumsy, etc.).
We describe the landscape in terms of our three goals.
Gives up being unconditional or general-purpose:
Replication [Castro & Liskov TOCS02], trusted hardware [Chiesa & Tromer ICS10, Sadeghi et al. TRUST10], auditing [Haeberlen et al. SOSP07, Monrose et al. NDSS99]
Special-purpose [Freivalds MFCS79, Golle & Mironov RSA01, Sion VLDB05, Benabbas et al. CRYPTO11, Boneh & Freeman EUROCRYPT11]
Unconditional and general-purpose but not geared toward practice
Use fully homomorphic encryption [Gennaro et al., Chung et al. CRYPTO10]
Proof-based verified computation [GMR85, Ben-Or et al. STOC88, BFLS91, Kilian STOC92, ALMSS92, AS92, Goldwasser et al. STOC08, Ben-Sasson et al. ePrint12]
Proof-based verified computation geared to practice
PEPPER [HotOS11, NDSS12, Usenix Security 12, ePrint12]
[Cormode et al. ITCS12, Thaler et al. HotCloud12] [Gennaro et al. ePrint12]
We have reduced the costs of a PCP-based argument system [Ishai et al., CCC07] by 20+ orders of magnitude, with proof.
The refinements are in the PCP and the cryptography.
We have made the computational model general-purpose.
This leads to cost savings and programmability … and yields a practical compiler.
Take-away: Proof-based verified computation is now close to practical and will be a key tool in real systems.