An automated pipeline balancing in the SRC Reconfigurable Computer
and its application to the RC5 cipher breaking
Hatim Diab¹, Miaoqing Huang¹, Kris Gaj², Tarek El-Ghazawi¹, Nikitas Alexandridis¹
¹ The George Washington University, ² George Mason University
Diab 1011/MAPLD'04
Objectives
• Implement pipelined RC5 Key Breaker on a single chip,
• Demonstrate automatic balancing of a pipeline by a compiler (SRC),
• Show the cost of added pipeline.
Requirements
• Given:
– A matching pair of plain text message (M) and cipher text (C)
• Find the correct corresponding Secret Key
– Test the possible Secret Keys exhaustively,
– Keys: 128-bit-long, from all 0’s to all 1’s.
• Requirements
– The processing element (PE) is to be fed a new Secret Key (Ki) each cycle,
– Compare C with the output Ci corresponding to Ki
RC5 Algorithm
• Mixing in the Secret Key.
    // S[0..25] is the array to be mixed for RC5 encryption
    // L[0..3] is the array converted from the secret key K[0..15]
    // The output is the array S[0..25], which will be used to encrypt the plain text.
    i=j=0
    A=B=0
    do 3*max(26,4) times
        A=S[i]=(S[i]+A+B)<<<3;
        B=L[j]=(L[j]+A+B)<<<(A+B);
        i=(i+1) mod (26);
        j=(j+1) mod (4);
• Encryption.
    LE=A+S[0]; // A is the upper part of plain text
    RE=B+S[1]; // B is the low part of plain text
    for i=1 to 12 do
        LE=((LE⊕RE)<<<RE)+S[2*i];
        RE=((RE⊕LE)<<<LE)+S[2*i+1];
    The processed LE is the upper part of cipher text,
    the processed RE is the low part of cipher text.
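The two loops above can be sketched in software as follows. This is a minimal Python rendering for RC5 32/12/16, not the MAP code used on the SRC machine. The function names (rotl, expand_key, encrypt) are chosen here for illustration, and the initial fill of S with the constants P32 and Q32 is taken from Rivest's RC5 description because the slide does not show it.

```python
MASK = 0xFFFFFFFF                       # 32-bit words (w = 32)
P32, Q32 = 0xB7E15163, 0x9E3779B9       # constants from Rivest's RC5 description (not on the slide)
ROUNDS, T, C = 12, 26, 4                # 12 rounds, 26 subkeys S[0..25], 4 key words L[0..3]


def rotl(x, n):
    """Rotate the 32-bit word x left by n; only the low 5 bits of n are used."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK


def expand_key(key):
    """Build S[0..25] from a 16-byte key using the 78-step mixing loop."""
    assert len(key) == 16
    # L[0..3]: key bytes K[0..15] packed little-endian into 32-bit words
    L = [int.from_bytes(key[4 * k:4 * k + 4], "little") for k in range(C)]
    # S starts as an arithmetic progression P32, P32+Q32, ... (mod 2^32)
    S = [(P32 + k * Q32) & MASK for k in range(T)]
    i = j = A = B = 0
    for _ in range(3 * max(T, C)):      # 3 * max(26, 4) = 78 steps
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)
        i = (i + 1) % T
        j = (j + 1) % C
    return S


def encrypt(block, S):
    """Encrypt one 64-bit block given as (upper word A, lower word B)."""
    A, B = block
    LE = (A + S[0]) & MASK
    RE = (B + S[1]) & MASK
    for i in range(1, ROUNDS + 1):
        LE = (rotl(LE ^ RE, RE) + S[2 * i]) & MASK
        RE = (rotl(RE ^ LE, LE) + S[2 * i + 1]) & MASK
    return LE, RE
```

Calling encrypt((0, 0), expand_key(bytes(16))) is expected to give (0xEEDBA521, 0x6D8F4B15), the cipher text listed for the all-zero key in the benchmark table, assuming the benchmark used an all-zero plain text block.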
Key-Breaking Flowchart
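[Flowchart: Set the 128-bit key to all 0s → Counter produces Ki → Key Generation → Encryption (with M) → Ci. If Ci = C (Y), stop and return to the main program; otherwise (N), continue with the next key from the Counter.]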
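The flowchart amounts to a counter-driven search loop. A minimal software sketch is given below, assuming the cipher is passed in as a callable. The names break_key and toy_encrypt are hypothetical, and toy_encrypt is a stand-in that is not RC5; it only keeps the example self-contained.

```python
def break_key(encrypt, plaintext, ciphertext, start=0, limit=1 << 20):
    """Try keys start, start+1, ... and return the first Ki with encrypt(M, Ki) == C.

    Returns None if no key in the searched range matches.
    """
    for key in range(start, start + limit):      # the "Counter" box
        if encrypt(plaintext, key) == ciphertext:  # the "Ci = C?" test
            return key                           # Y: stop and return
    return None                                  # range exhausted


def toy_encrypt(block, key):
    """Hypothetical stand-in cipher (NOT RC5), used only for the demo."""
    return (block * 0x9E3779B1 + key * 0x85EBCA6B) & 0xFFFFFFFF


secret = 0x10000
target = toy_encrypt(0x1234, secret)
found = break_key(toy_encrypt, 0x1234, target, limit=1 << 17)
```

In hardware the loop body is unrolled into the pipeline, so one comparison completes per clock once the pipeline is full; the software loop only mirrors the control flow.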
Condition & Implementation
• RC5 32/12/16
– Cipher text 32*2 bits = 64 bits
– 12 rounds
– Key = 16 * 8 bits = 128 bits
• Implement RC5 encryption using
– 12 rounds of encryption macros, with 6 clocks latency
– 78 iterations of key generation macros, with 3 clocks latency
Design & Bottleneck
• Pipelined design
– Process one key every clock cycle in a pipelined fashion
• Data dependencies
– One of the features of RC5 is the extensive use of data-dependent rotations,
– S value needed every 26th step,
– L value needed every 4th step,
• Manual HDL-based realization of the pipeline proved to be time-consuming and error-prone.
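The "every 26th step" and "every 4th step" dependencies can be made concrete with a small sketch. It assumes the 78 mixing steps are numbered 0 to 77 and that step k reads S[k mod 26] and L[k mod 4]; the function name producers is hypothetical.

```python
T, C, STEPS = 26, 4, 78   # S words, L words, mixing steps (3 * max(26, 4))


def producers(step):
    """Return (s_src, l_src): the earlier steps whose S and L outputs this step reads.

    None means the value has not been written by any earlier step yet,
    so it comes from the initial S table or directly from the key.
    """
    s_src = step - T if step >= T else None
    l_src = step - C if step >= C else None
    return s_src, l_src


# Every step's long-range inputs, e.g. deps[26] == (0, 22)
deps = {k: producers(k) for k in range(STEPS)}
```

In a fully unrolled pipeline each of these long-range values has to be held until its consumer is ready, which is the bookkeeping that makes a manual HDL realization tedious.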
Data Dependencies in Each Iteration
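[Figure: the 78 key-mixing steps laid out in three rows (0 to 25, 26 to 51, 52 to 77), with step 25 feeding step 26 and step 51 feeding step 52. The L values (L0 to L3) are reused every 4th step and the S values (S0 to S25) every 26th step; the final S values feed RC5 Encryption.]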
Solution
• Implement on one FPGA chip concurrently
– 78 key initialization macros
– 12 encryption macros
• Connect the macros in a linear pipeline.
• The SRC compiler will balance the pipeline by inserting delay channels to make all macros run synchronously.
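The idea behind balancing can be illustrated with a generic sketch: give every macro a start time equal to its latest-arriving input, then pad each earlier-arriving input with registers. This is not the SRC compiler's algorithm, which the slides do not describe. The function name balance, the node names, and the small graph are hypothetical; only the 3-clock latency of a key generation macro comes from the slides, and the resulting register counts are illustrative.

```python
def balance(latency, edges):
    """Compute start times and per-edge delay registers for a pipeline graph.

    latency: {node: clocks}, with nodes listed in topological order (assumed)
    edges:   list of (src, dst) connections
    """
    start = {}
    for node in latency:
        # An input arrives when its producer has started and finished
        arrivals = [start[src] + latency[src] for src, dst in edges if dst == node]
        start[node] = max(arrivals, default=0)
    # Pad every edge so its value arrives exactly when the consumer starts
    delays = {(src, dst): start[dst] - (start[src] + latency[src])
              for src, dst in edges}
    return start, delays


# Hypothetical example: five chained key generation steps (3 clocks each)
# plus one skip connection from step 0 to step 4, like an L-value reuse.
latency = {f"step{k}": 3 for k in range(5)}
edges = [(f"step{k}", f"step{k + 1}") for k in range(4)] + [("step0", "step4")]
start, delays = balance(latency, edges)
```

Here the chained edges need no delay, while the skip connection needs 9 registers in this model, because step0 finishes at clock 3 and step4 starts at clock 12.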
Delay Channels Added by SRC Compiler
[Figure: delay channels built from registers. Legend: Delay 1 = 1 reg, Delay 2 = 2 reg, Delay 5 = 5 reg, wire.]
Detailed flow
[Figure: detailed flow of the 78 key generation steps (0 to 77, in three rows, with step 25 feeding step 26 and step 51 feeding step 52) leading into RC5 Encryption. Signals are labelled skey000 to skey025, skey100 to skey125, skey200, skey201 and kkey000 to kkey003, kkey010, and DelayChannel blocks are inserted on the connections between stages.]
Compilation Result
• Device utilization summary:
    Number of External IOBs          594 out of 1104     53%
    Number of LOCed External IOBs    594 out of 594     100%
    Number of Slices               33790 out of 33792    99%
    Number of BUFGMUXs                 1 out of 16        6%
• Maximum Clock Frequency
Effectiveness of the Benchmark
Cipher Text       | Expected Key                        | Found Key                           | Time (SRC) (s) | Time (PC) (s)
EEDBA521 6D8F4B15 | 00000000 00000000 00000000 00000000 | 00000000 00000000 00000000 00000000 | 97,342         | 0
C53073A4 8AFAE310 | 00000000 00000000 00000000 00010000 | 00000000 00000000 00000000 00010000 | 98,028         | 359,000
07CEC757 C72BCAE9 | 00000000 00000000 00000000 10000000 | 00000000 00000000 00000000 10000000 | 2,781,980      | 1,847,105,000
2F68DC4A ADBFACC6 | 00000000 00000000 00000000 20000000 | 00000000 00000000 00000000 20000000 | 5,466,274      | 5,251,282,000
6643CACD D1EDD161 | 00000000 00000000 00000001 00000000 | 00000000 00000000 00000001 00000000 | 43,050,562     | Too large to simulate
51C6514A 4EF0A99B | 00000000 00000000 00000010 00000000 | 00000000 00000000 00000010 00000000 | 687,318,493    | Too large to simulate
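The speed-up implied by the table can be checked by dividing the PC time by the SRC time for each row where both were measured. The sketch below uses only the figures from the table and assumes both time columns are in the same unit, so the ratio is unitless; the variable and function names are chosen here for illustration.

```python
# (last word of the key, SRC time, PC time), copied from the benchmark table
rows = [
    ("00010000", 98_028, 359_000),
    ("10000000", 2_781_980, 1_847_105_000),
    ("20000000", 5_466_274, 5_251_282_000),
]


def speedup(src_time, pc_time):
    """Ratio of PC time to SRC time for one table row."""
    return pc_time / src_time


ratios = {key: speedup(src, pc) for key, src, pc in rows}
for key, ratio in ratios.items():
    print(f"key ...{key}: speed-up of about {ratio:.0f}x")
```

The ratio grows with the size of the search, presumably because a fixed start-up cost on the SRC side (visible in the all-zero-key row) is amortized over more keys.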
Conclusion
• The objective was realized, i.e., every clock cycle one 128-bit-long variable is pushed into the processing chain,
• A speed-up of 1000x over SW and 300x over serial HW implementations was achieved,
• Because the parameters of the RC5 algorithm are flexible, different map routines can be designed to fit distinct area and throughput requirements,
• The automated pipeline balancing of the SRC compiler proved to substantially decrease the development time of complex pipelined designs.