the university of north carolina at chapel hill cryptography on graphics processors francisco...

37
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Cryptography on Graphics Processors Francisco Chinchilla COMP 290-GPGP Presentation

Post on 18-Dec-2015

216 views

Category:

Documents


2 download

TRANSCRIPT

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Cryptography on Graphics Processors

Francisco ChinchillaCOMP 290-GPGP Presentation

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Papers

“Brute Force Attack on UNIX Passwords with SIMD Computer”

Gershon Kedem and Yuriko Ishihara, Computer Science Department, Duke University. Usenix ’99 Security Symposium

"CryptoGraphics: Secret Key Cryptography Using Graphics Cards"

Debra L. Cook, John Ioannidis, Angelos D. Keromytis, and Jake Luck, Columbia University. To appear in the Proceedings of the RSA Conference, Cryptographer's Track (CT-RSA). February 2005, San Francisco, CA.

Remotely-Keyed CryptographicsDebra L. Cook, Ricardo Baratto, and Angelos D. Keromytis, Columbia University.

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Outline

Quick Tour of CryptographyBrute force password hacking using PixelFlowImplementing Cryptographic systems on a GPUUsing GPU based cryptographic systems in remote keyed systems in unfriendly environmentsConclusions

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Cryptography Basics

Two functions of two variables

E(keye, text) encrypts

D(keyd, E(keye, text)) decrypts

If keye==keyd cryptosystem is called symmetric, otherwise it’s asymmetricWe’ll deal with symmetric

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Cipher Types

Block CipherEncrypts fixed-size blocks of data

Stream CipherEncrypts input data one bit (sometimes one byte) at a timeUseful for data with unknown length

Stream ciphers are faster in hardware, but slower in software than block ciphersBoth use XOR heavily

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Attacks to Break Encryption

Techniques usedDifferentialLinearBrute Force

Underlying methodsChosen PlaintextCiphertext-onlyKnown Plaintext

GoalsFind a passwordFind the key

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

DES

Block cipherDeveloped by IBM, made a FIPS in ‘76Uses only 56-bit keys, not 64, not 128, possibly due to NSADesigned against differential cryptoanalysis, a ‘new’ (not to IBM nor NSA) field of cryptography in the late 1980’s. Takes 2^47 chosen plaintexts to break.Not designed against linear cryptoanalysis. Takes 2^39-2^44 known plaintextsBroken in 56 hours by EFF $250k machine in 1998

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

DES Encryption Details

Subkey(48 bits)

The Feistel (F) function

Key Schedule

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

DES – Other Boxes

http://en.wikipedia.org/wiki/Substitution_box

http://en.wikipedia.org/wiki/DES_supplementary_material

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Crypt

A variation of the DES algorithmUsed to encrypt Unix passwordsSample invocations:

crypt(password)=pajXG3p8vEaokcrypt(God)=GoqTni0y.7CH2crypt(sex)=se8lvCoF9FQE.crypt(money)=moIzVjU7RTRr6crypt(love)=loYMPnrPlj9gk

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

AES

Competition to replace DES FIPSCompetition won by RijndaelAES is fast in both SW and HWRelatively easy to implementRequires little memory

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

AES 128-bit Details

AES operates on a state, a 4×4 array of bytes. For encryption, each round of a total of 10 consists of four stages:

SubBytes — a non-linear substitution step where each byte is replaced with another according to a lookup table, an S-block. This is the non-linear step.ShiftRows — a transposition step where each row of the state is shifted cyclically a certain number of steps. MixColumns — a mixing operation which operates on the columns of the state, combining the four bytes in each column using a linear transformation. This is the “Diffusion” step and is omitted in the final round!AddRoundKey — each byte of the state is combined with the round key; each round key is derived from the cipher key using a key schedule.

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

RC4

Stream cipherDesigned by Ron Rivest in ’87In 1994 a description was anonymously posted, which lead to ARCFOURIn 2001 it was found that the first 256 bytes of the output were non-random (this leads to one big vulnerability in WEP)Unusually simpleRequires only byte-length manipulations

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

RC4 - details

for i from 0 to 255 S[i] := i j: = 0 for i from 0 to 255 j := (j + S[i] + key[i mod l]) mod 256 swap(S[i],S[j])

pseudo-random gen. algorithm (PRGA)

i := 0 j := 0 while GeneratingOutput: i := (i + 1) mod 256 j := (j + S[i]) mod 256 swap(S[i],S[j]) output S[(S[i] + S[j]) mod 256]

Key Scheduling Algorithm (KSA)Encrypt(bits, key){

S = new byte[256];

KSA(key,S);

randomBits = PRGA(S);

for i=0 to length(bits)-1{

bits[i] = XOR(bits[i],randomBits[i])

}

}

Decrypt(bits, key){ Encrypt(bits, key) }

RC4

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

PixelFlow Unit

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Processing Element

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

PixelFlow

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Password Hacking

Two full chassis=18 SIMD Arrays=147,456 PE’sFind the key used to encrypt a stream of bytes using RC4Check Unix passwords

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Password Hacking Results

Each PE checks 1,000 RC4 keys in 3.75s = 38.8M/s for the systemEach PE checks 1,000 crypt passwords in 6s =24.6M/s for the systemSystem encrypts 614M DES passwords/second

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Secret Key Cryptography

Test capability of XOR on different cards using different ciphers

RC4AES

Used OpenGL and GLUT, no CGDid not use ‘the latest greatest’ cardsBasic steps

Disable dithering!Copy pixels between coordinates, with colormapping and XOR on or off

Why colormaps?

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Test systems

Note: 256KB RAM = 256 MB RAM?

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

RC4 on a GPU

Random bit stream must be precomputedSimply load the bit stream and the data to the GPU and then perform the XOR

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

RC4 Comparisons (includes all data transfers)

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

AES on a GPU

Key schedule is not done on GPUOnly store 1 byte of data in each pixel(De/En)crypts 4*n blocks simultaneously, given that the data is stored in n pixelsTested against reference and optimized CPU implementations

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

AES on GPU data layout

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

AES Encryption Rates

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Cryptography on GPU -Conclusions

Performance isn’t great, now what?GPU’s can still offload some cycles off the CPU?• Not really, the CPU was still fully utilized

Look for a new application, like video system described in their next paper

ResolutionRequired

MB/sGeForce3 RC4

Radeon 7500 RC4 TNT2 RC4

320x200@10fps 1.92 pass pass pass

320x200@15fps 2.88 pass pass pass

1280x1024@10fps 39.25 pass maybe fail

1280x1024@15fps 58.9 pass fail fail

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Remotely Keyed Cryptographics

Extend their previous workRC4 can be used to sustain rates that are ok for live videoThe video is not susceptible to attacks from within the OSFinally? explained why they didn’t use CG:

“Higher level languages do not allow the developer to specify which OpenGL commands are utilized when there are multiple ways of implementing a function via OpenGL commands and do not even guarantee the operations will be transformed into OpenGL commands but instead may transform it into C code… that may be executed on the OS and not using OpenGL.”

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Regarding compression and transmission

Encrypted data doesn’t compress100 Mbps = 16.7 500x500 RGB frames/sIf you know about the RFP protocol (used in VNC) or MPG video, you know that the entire screen isn’t necessarily updated from frame to frame, only the sections that need it

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

System Overview

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Session Overview

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Test systems (again)

Note: 256KB RAM = 256 MB RAM?

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Test results

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Remote Keyed Cryptography- Conclusion and Comments

99% of time taken on GPU is writing the keystreamRates still good for thin-clientsGreat for encrypting very sensitive images in a system that is not completely trusted

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Presentation Conclusion

GPU’s can have lots and lots of FLOPsCan we find applications that use them as the main PU?As a co-processor to offload cycles?If we fail, can we apply a different twist or find a special case/market of the setting where we had failed previously?

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Discussion

Could make the performance of these algorithms better if we could use fragment programs for example?

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL

Additional References

Wikipedia lots and lots of info and images taken from here

PixelFlow WebsiteIn particular:Eyles, John, Steven Molnar, John Poulton, Trey Greer, Anselmo Lastra, and Nick England, "PixelFlow: The Realization", Proceedings of the Siggraph/Eurographics Workshop on Graphics Hardware, Los Angeles, CA, August 3-4, 1997, 57-68.