AES encryption and decryption on GPU
AES encryption and decryption, applied parallel computing
Presented by: S ROY
In this case study we take up integer stream processing on the GPU.
With the new GeForce 8 Series GPU, several new extensions and functions have been introduced to GPU programming.
New integer processing features include not only the arithmetic operations but also the bitwise logical operations (such as AND and OR) and the right/left shift operations.
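As a small illustration of why these integer operations matter for a byte-oriented cipher, the sketch below splits a 32-bit word into bytes with shifts and AND, then rebuilds it with shifts and OR. It is written in Python rather than as a GPU program, and the names split_word and join_bytes are hypothetical, not taken from the slides.

```python
def split_word(word):
    """Split a 32-bit word into four bytes, most significant byte first."""
    return [(word >> shift) & 0xFF for shift in (24, 16, 8, 0)]


def join_bytes(parts):
    """Rebuild a 32-bit word from four bytes using left shifts and OR."""
    return (parts[0] << 24) | (parts[1] << 16) | (parts[2] << 8) | parts[3]


word = 0x2B7E1516
parts = split_word(word)
print([hex(p) for p in parts])
print(hex(join_bytes(parts)))
```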
Array parameters and the new texture-buffer object provide a flexible way of referring to integer-indexed tables.
With the new "transform feedback mode," it is now possible to store our results without the need to render to textures or pixel buffers.
Several block-cipher modes of operation are also considered here.
New Functions for Integer Stream Processing
Transform Feedback Mode
Features of transform feedback mode:
The GL's target parameter is changed to GL_TRANSFORM_FEEDBACK_BUFFER_NV.
We need to specify the output attributes and whether each of them is output into a separate buffer object or they are all output interleaved into a single buffer object.
The output buffer must be bound through special new API calls.
Rasterization can optionally be disabled.
GPU Program Extensions
Two features are used:
When declaring a register, we can either specify its type, such as FLOAT or INT, or just leave it typeless.
We can refer to tables using an integer index, array parameters, or one of the newly introduced texture-buffer objects.
An Overview of the AES Algorithm
The AES algorithm is currently the standard block-cipher algorithm that has replaced the Data Encryption Standard (DES)
A rough summary of the requirements made by NIST for the new AES were the following:
Symmetric-key cipher
Block cipher
Support for 128-bit block sizes
Support for 128-, 192-, and 256-bit key lengths
The AES cipher operates as follows:
The encryption step uses a key that converts the data into an unreadable ciphertext, and then the decryption step uses the same key to convert the ciphertext back into the original data. This type of key is a symmetric key; other algorithms require a different key for encryption and decryption.
The precise steps involved in the algorithm
In cryptography, algorithms such as AES are called product ciphers.
For this class of ciphers, encryption is done in rounds, where each round's processing is accomplished using the same logic.
These product ciphers, including AES, change the cipher key at each round.
Each of these round keys is determined by a key schedule, which is generated from the cipher key given by the user.
The AES Implementation on the GPU
The code given throughout this chapter uses C-style macros and comments to improve readability.
Head of the AES Cipher Vertex Program
Program Parameters for Arguments and Constant Tables
In this application we expand the cipher key using the CPU and store the key schedule in the GPU program-local parameters.
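The CPU-side key expansion can be sketched as follows. This is a minimal Python sketch for the 128-bit key length only, following the standard AES key schedule; the function names are hypothetical, and uploading the result into the GPU program-local parameters is not shown. The S-box is computed from its mathematical definition so that the example stays self-contained.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def build_sbox():
    """Build the AES S-box: field inverse followed by the affine transform."""
    sbox = []
    for x in range(256):
        inv = 1
        for _ in range(254):          # x^254 is the inverse of x; 0 maps to 0
            inv = gmul(inv, x)
        s = inv
        for shift in range(1, 5):     # XOR with four left rotations of inv
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()
RCON = [0x01, 0x02, 0x04, 0x08, 0x10, 0x20, 0x40, 0x80, 0x1B, 0x36]


def expand_key_128(key):
    """Expand a 16-byte cipher key into 44 four-byte words (11 round keys)."""
    if len(key) != 16:
        raise ValueError("this sketch handles 128-bit keys only")
    words = [list(key[4 * i:4 * i + 4]) for i in range(4)]
    for i in range(4, 44):
        temp = list(words[i - 1])
        if i % 4 == 0:
            temp = temp[1:] + temp[:1]          # RotWord
            temp = [SBOX[b] for b in temp]      # SubWord
            temp[0] ^= RCON[i // 4 - 1]         # round constant
        words.append([a ^ b for a, b in zip(words[i - 4], temp)])
    return words


key = bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c")
schedule = expand_key_128(key)
print(bytes(schedule[4]).hex())
```

Words 4k to 4k+3 of the returned list form the round key for round k, so the initial AddRoundKey uses words 0 to 3.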
Input/Output and the State
AES encryption operates over a two-dimensional array of bytes, called the state.
During the input step, we slice our data into sequential blocks of 16 bytes and unpack it into 4x4 arrays that we push onto the GPU's registers.
Finally, during the output step, we pack these 4x4 arrays back into sequential blocks of 16 bytes and stream the results back to the transform feedback buffer.
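The input and output steps can be sketched on the CPU as below. This is a Python illustration, not the GPU program from the slides; it assumes the column-major byte order of the AES standard, where state[row][col] holds byte row + 4*col of the block. How the original program lays the state out in registers is not shown here, and the function names are hypothetical.

```python
def slice_blocks(data):
    """Slice the input into sequential 16-byte blocks."""
    if len(data) % 16 != 0:
        raise ValueError("input length must be a multiple of 16 bytes")
    return [data[i:i + 16] for i in range(0, len(data), 16)]


def unpack_block(block):
    """Unpack 16 bytes into a 4x4 state, filled column by column."""
    return [[block[row + 4 * col] for col in range(4)] for row in range(4)]


def pack_state(state):
    """Pack a 4x4 state back into 16 sequential bytes."""
    return bytes(state[row][col] for col in range(4) for row in range(4))


data = bytes(range(32))
blocks = slice_blocks(data)
state = unpack_block(blocks[0])
print(state)
print(pack_state(state) == blocks[0])
```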
Initialization
During the initialization stage, we do an AddRoundKey operation, which is an XOR operation on the state by the round key, as determined by the key schedule.
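A minimal Python sketch of the AddRoundKey step, assuming the state and the round key are both held as 4x4 lists of bytes (the function name and the sample values are illustrative only):

```python
def add_round_key(state, round_key):
    """XOR every byte of the state with the matching byte of the round key."""
    return [
        [s ^ k for s, k in zip(state_row, key_row)]
        for state_row, key_row in zip(state, round_key)
    ]


state = [[0x32, 0x88, 0x31, 0xE0],
         [0x43, 0x5A, 0x31, 0x37],
         [0xF6, 0x30, 0x98, 0x07],
         [0xA8, 0x8D, 0xA2, 0x34]]
round_key = [[0x2B, 0x28, 0xAB, 0x09],
             [0x7E, 0xAE, 0xF7, 0xCF],
             [0x15, 0xD2, 0x15, 0x4F],
             [0x16, 0xA6, 0x88, 0x3C]]

mixed = add_round_key(state, round_key)
print([[hex(b) for b in row] for row in mixed])
```

Because XOR is its own inverse, applying the same round key a second time restores the original state, which is why decryption can reuse this step unchanged.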
Rounds
A round for the AES algorithm consists of four operations: the SubBytes operation, the ShiftRows operation, the MixColumns operation, and the previously mentioned AddRoundKey operation.
The SubBytes Operation
The SubBytes operation substitutes bytes independently, in a black-box fashion, using a nonlinear substitution table called the S-box.
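The substitution can be sketched in Python as a plain table lookup. A GPU program would normally hold the S-box as a precomputed constant table; here it is derived from its standard definition (inverse in GF(2^8) followed by an affine transform) only to keep the sketch self-contained. The function names are hypothetical.

```python
def gf_mul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def make_sbox():
    """Derive the 256-entry AES S-box."""
    table = []
    for x in range(256):
        inv = 1
        for _ in range(254):          # x^254 equals the inverse of x; 0 stays 0
            inv = gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):     # affine step: XOR of four rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        table.append(s ^ 0x63)
    return table


S_BOX = make_sbox()


def sub_bytes(state):
    """Replace each byte of the 4x4 state by its S-box entry."""
    return [[S_BOX[b] for b in row] for row in state]


print(hex(S_BOX[0x53]))
```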
The ShiftRows Operation
The ShiftRows operation shifts the last three rows of the state cyclically, effectively scrambling row data.
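In the AES standard, row r of the state is rotated left by r positions, so the first row is unchanged. A minimal Python sketch, assuming the state is a list of four rows (the function name is hypothetical):

```python
def shift_rows(state):
    """Rotate row r of the 4x4 state left by r positions."""
    return [row[r:] + row[:r] for r, row in enumerate(state)]


state = [[0, 1, 2, 3],
         [10, 11, 12, 13],
         [20, 21, 22, 23],
         [30, 31, 32, 33]]

for row in shift_rows(state):
    print(row)
```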
The MixColumns Operation
The next step is the MixColumns operation, which has the purpose of scrambling the data of each column.
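MixColumns multiplies each column by a fixed matrix over GF(2^8), which needs only shifts and XORs. The Python sketch below follows the standard definition rather than the GPU code from the slides; the names xtime, mix_column, and mix_columns are illustrative.

```python
def xtime(b):
    """Multiply a byte by 2 in GF(2^8): shift left, reduce on overflow."""
    b <<= 1
    if b & 0x100:
        b ^= 0x11B
    return b


def mix_column(col):
    """Mix one four-byte column with the AES matrix (2 3 1 1 / 1 2 3 1 / ...)."""
    a0, a1, a2, a3 = col
    return [
        xtime(a0) ^ (xtime(a1) ^ a1) ^ a2 ^ a3,
        a0 ^ xtime(a1) ^ (xtime(a2) ^ a2) ^ a3,
        a0 ^ a1 ^ xtime(a2) ^ (xtime(a3) ^ a3),
        (xtime(a0) ^ a0) ^ a1 ^ a2 ^ xtime(a3),
    ]


def mix_columns(state):
    """Apply mix_column to every column of a state stored as state[row][col]."""
    out = [[0] * 4 for _ in range(4)]
    for c in range(4):
        mixed = mix_column([state[r][c] for r in range(4)])
        for r in range(4):
            out[r][c] = mixed[r]
    return out


print([hex(b) for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```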
The AddRoundKey Operation
This operation determines the current round key from the key schedule.
As an optimization, we can also combine the MixColumns and AddRoundKey operations into a single subroutine.
Performance
Tests were performed on a test machine with the following specifications:
CPU: Pentium 4, 3 GHz, 2 MB Level 2 cache
Memory: 1 GB
Video: GeForce 8800 GTS 640 MB
System: Linux 2.6, Driver 97.46
Vertex Program vs. Fragment Program
Results were obtained by processing a plaintext of 128 MB filled with random numbers and averaging measurements from ten runs.
The throughput for the vertex program is 53 MB/sec, whereas for the fragment program, the throughput is 95 MB/sec with a batch size of 1 MB.
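To put the two figures side by side, the short Python calculation below derives the time to process the 128 MB plaintext and the relative speedup from the reported throughputs. It assumes 1 MB = 2^20 bytes, which the slides do not state, and the helper names are hypothetical.

```python
PLAINTEXT_MB = 128          # plaintext size reported in the test
VERTEX_MB_PER_SEC = 53      # reported vertex-program throughput
FRAGMENT_MB_PER_SEC = 95    # reported fragment-program throughput
BLOCK_BYTES = 16            # one AES block


def seconds_for(size_mb, rate_mb_per_sec):
    """Time needed to process size_mb at the given throughput."""
    return size_mb / rate_mb_per_sec


def blocks_per_second(rate_mb_per_sec, bytes_per_mb=2**20):
    """AES blocks processed per second (assumes 1 MB = 2**20 bytes)."""
    return rate_mb_per_sec * bytes_per_mb / BLOCK_BYTES


speedup = FRAGMENT_MB_PER_SEC / VERTEX_MB_PER_SEC

print(f"vertex:   {seconds_for(PLAINTEXT_MB, VERTEX_MB_PER_SEC):.2f} s")
print(f"fragment: {seconds_for(PLAINTEXT_MB, FRAGMENT_MB_PER_SEC):.2f} s")
print(f"speedup:  {speedup:.2f}x")
print(f"fragment blocks/s: {blocks_per_second(FRAGMENT_MB_PER_SEC):,.0f}")
```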