a secure protocol for computing dot-products in clustered and distributed environments ioannis...
TRANSCRIPT
A Secure Protocol for Computing Dot-products in Clustered and Distributed
EnvironmentsIoannis Ioannidis, Ananth Grama and Mikhail Atallah
Purdue University.
Acknowledgements: National Science Foundation.
The Problem
• Dot-products are the basis of many important applications
• Scientific computations• Data mining• Transaction processing• Biometrics
• Use of distributed environments creates security issues
• Data too valuable to expose• Untrusted links or hosts• Spoofing is very easy
The Problem
• Each party is honest-but-curious– They play by the rules, but if they can find
out more, they will.
• Only one of the parties is interested in the result.
• We have a random number generator, which generates a uniformly distributed random integer, cast into a real.
Candidate Solution
• Use conventional cryptography– Secure tunneling can protect the links– More complex protocols offer protection
against untrusted hosts
• Unfortunately, public-key crypto has a high complexity– Modular exponentiation computations can
have a crippling effect on the overall performance
Security vs. Efficiency
• Ideally, no information should leak about the participating vectors during a secure dot-product protocol
• However, in the context of the given problem, in a clustered environment, security need not be so tight– Dot-products inherently leak data in the solution– Some leakage may be acceptable, since the same
dot-product will not be computed multiple times– Small compromises in security can lead to large
gains in efficiency
An Efficient Alternative
• Use linear algebraic properties to achieve a sufficient level of security– Hide a vector inside a matrix– Scramble the matrix– Multiply the matrix by the other vector– Retrieve the dot-product– A large part of the computation can be reused– Both parties must share a secret – a number –
before the protocol
An Efficient Alternative
• Security is not perfect– A small number of equations will leak– Statistics can reveal information
• But is sufficient for a real-world setting– If you don’t need to execute the same instance
many times, leaking a few equations is not a problem
– Statistical attacks demand larges amounts of information
– Not so easy to gather them in clustered environments
Algorithmic Considerations• Time overhead
– How much more computation needs to be performed?
– Public-key cryptography adds an unacceptable amount of overhead.
– But it is the only solution if perfect secrecy is the goal.
• Communication overhead– Network latency prevails in larger networks.
– Bit count is the decisive factor in tightly coupled networks.
Stability Considerations
• Algebraic manipulations of the data can introduce numerical errors in scientific computation data.
• Any protocol applied to real-valued vectors must be numerically stable to be of practical importance.
Experimental Results
• The protocol was executed on two PIII/450Mhz machines connected on a Gigabit Ethernet network
• Data was randomly generated vectors of length 106
• We measured the total overhead (computation and communication)– Communication overhead is expected to be a
factor of 4
Experimental Results
• Measured overhead showed a factor of 4.69 overhead– Communication overhead is the dominating
factor, even on a fast network
• Average numerical error was measured to 4.5 x 10-9
Conclusions and Ongoing Research
• It is possible to execute multiparty, real-valued dot-product computations efficiently and with satisfactory security
• Binary dot-products pose a different problem due to the sparsity of the vectors– Number theoretic techniques introduce
large time and communication overheads