An automated pipeline balancing in the SRC Reconfigurable Computer and its application to the RC5 cipher breaking

Hatim Diab 1, Miaoqing Huang 1, Kris Gaj 2, Tarek El-Ghazawi 1, Nikitas Alexandridis 1

1 The George Washington University
2 George Mason University




Diab 1011/MAPLD'04

Objectives

• Implement pipelined RC5 Key Breaker on a single chip,

• Demonstrate automatic balancing of a pipeline by a compiler (SRC),

• Show the cost of added pipeline.


Requirements

• Given:
  – A matching pair of plaintext message (M) and ciphertext (C)
• Find the correct corresponding secret key
  – Test the possible secret keys exhaustively,
  – Keys: 128-bit-long keys, from all 0’s to all 1’s.
• Requirements
  – The processing element (PE) is to be fed a new secret key (Ki) each cycle,
  – Compare C with the output Ci corresponding to Ki


RC5 Algorithm

• Mixing in the Secret Key.
    // S[0..25] is the array to be mixed for RC5 encryption
    // L[0..3] is the array converted from the secret key K[0..15]
    // The output is the array S[0..25], which will be used to encrypt
    // the plain text.
    i = j = 0
    A = B = 0
    do 3*max(26,4) times
        A = S[i] = (S[i] + A + B) <<< 3;
        B = L[j] = (L[j] + A + B) <<< (A + B);
        i = (i + 1) mod (26);
        j = (j + 1) mod (4);

• Encryption.
    LE = A + S[0];    // A is the upper part of plain text
    RE = B + S[1];    // B is the low part of plain text
    for i = 1 to 12 do
        LE = ((LE ⊕ RE) <<< RE) + S[2*i];
        RE = ((RE ⊕ LE) <<< LE) + S[2*i+1];
    The processed LE is the upper part of cipher text,
    the processed RE is the low part of cipher text.
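
As a cross-check of the pseudocode on this slide, here is a minimal Python sketch of RC5-32/12/16. The slide does not show how S and L are initialized, so the constants P32 and Q32 and the little-endian packing of key bytes into L are taken from Rivest's RC5 specification rather than from this deck. The function names are illustrative only.

```python
MASK = 0xFFFFFFFF                       # 32-bit words (w = 32)
P32, Q32 = 0xB7E15163, 0x9E3779B9       # RC5 magic constants for w = 32 (from the RC5 spec)

def rotl(x, n):
    """Rotate a 32-bit word left by n mod 32 bits."""
    n &= 31
    return ((x << n) | (x >> (32 - n))) & MASK

def expand_key(key: bytes, rounds: int = 12):
    """Key mixing: build the round-key array S from the secret key."""
    c = max(1, (len(key) + 3) // 4)     # number of 32-bit words in L
    L = [0] * c
    for i in range(len(key) - 1, -1, -1):          # little-endian packing
        L[i // 4] = ((L[i // 4] << 8) + key[i]) & MASK
    t = 2 * (rounds + 1)                # 26 words for 12 rounds
    S = [(P32 + i * Q32) & MASK for i in range(t)]
    A = B = i = j = 0
    for _ in range(3 * max(t, c)):      # 78 mixing steps for RC5-32/12/16
        A = S[i] = rotl((S[i] + A + B) & MASK, 3)
        B = L[j] = rotl((L[j] + A + B) & MASK, A + B)
        i = (i + 1) % t
        j = (j + 1) % c
    return S

def encrypt(block, S, rounds: int = 12):
    """Encrypt one 64-bit block given as two 32-bit words (A, B)."""
    A = (block[0] + S[0]) & MASK
    B = (block[1] + S[1]) & MASK
    for i in range(1, rounds + 1):
        A = (rotl(A ^ B, B) + S[2 * i]) & MASK
        B = (rotl(B ^ A, A) + S[2 * i + 1]) & MASK
    return A, B

if __name__ == "__main__":
    a, b = encrypt((0, 0), expand_key(bytes(16)))
    print(f"{a:08X} {b:08X}")
```

With an all-zero key and an all-zero plaintext block, this should reproduce the reference test vector EEDBA521 6D8F4B15 from the RC5 specification, which is also the ciphertext listed for the all-zero key in this deck's benchmark table.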


Key-Breaking Flowchart

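[Flowchart: the 128-bit key is set to all 0s and driven by a counter. Each key Ki goes through Key Generation and then Encryption of M to produce Ci. If Ci = C (Y), stop and return to the main program; otherwise (N), continue with the next key.]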
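
The search loop in the flowchart can be sketched in software as follows. This is only an illustration of the control flow: `toy_encrypt` is a hypothetical stand-in cipher (not RC5), and the big-endian mapping from the counter to the 16 key bytes is an assumption, since the slides do not say how the counter is mapped onto the key.

```python
import hashlib

def toy_encrypt(key: bytes, message: bytes) -> bytes:
    """Hypothetical stand-in cipher (NOT RC5): first 8 bytes of SHA-256(key || message)."""
    return hashlib.sha256(key + message).digest()[:8]

def break_key(message: bytes, target: bytes, encrypt=toy_encrypt, limit: int = 1 << 20):
    """Try keys 0, 1, 2, ... until encrypt(key, message) equals target."""
    for counter in range(limit):
        key = counter.to_bytes(16, "big")     # 128-bit key; byte order is an assumption
        if encrypt(key, message) == target:   # the "Ci = C?" test
            return key                        # stop and return to the caller
    return None                               # not found within the limit

if __name__ == "__main__":
    secret = (0x1234).to_bytes(16, "big")
    M = b"8bytemsg"
    C = toy_encrypt(secret, M)
    print(break_key(M, C).hex())
```

In the hardware design the loop body is unrolled into a pipeline, so one comparison completes per clock cycle instead of one per loop iteration.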


Condition & Implementation

• RC5 32/12/16
  – Cipher text: 32 × 2 bits = 64 bits
  – 12 rounds
  – Key = 16 × 8 bits = 128 bits
• Implement RC5 encryption using
  – 12 rounds of encryption macros, with 6 clocks latency
  – 78 iterations of key generation macros, with 3 clocks latency
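
The numbers on this slide follow from the three RC5 parameters (word size, rounds, key bytes). A small sketch of that arithmetic, using the standard RC5 definitions of the table sizes; the function and field names are made up for illustration:

```python
def rc5_parameters(word_bits: int = 32, rounds: int = 12, key_bytes: int = 16) -> dict:
    """Derive block size, table sizes and mixing-step count for RC5-w/r/b."""
    bytes_per_word = word_bits // 8
    t = 2 * (rounds + 1)                            # words in S
    c = max(1, -(-key_bytes // bytes_per_word))     # words in L (ceiling division)
    return {
        "block_bits": 2 * word_bits,
        "key_bits": 8 * key_bytes,
        "S_words": t,
        "L_words": c,
        "mixing_steps": 3 * max(t, c),
    }

if __name__ == "__main__":
    print(rc5_parameters())
```

For RC5 32/12/16 this gives a 64-bit block, a 128-bit key, 26 words in S, 4 words in L, and 3 × max(26, 4) = 78 key-mixing steps, which is where the 78 key generation macros come from.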


Design & Bottleneck

• Pipelined design
  – Process one key every clock cycle in a pipelined fashion
• Data dependencies
  – One of the features of RC5 is the extensive use of data-dependent rotations,
  – S value needed every 26th step,
  – L value needed every 4th step,
• Manual HDL-based realization of the pipeline proved to be time-consuming and error-prone.
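
To make the "every 26th step" and "every 4th step" dependencies concrete, the sketch below lists, for each of the 78 mixing steps, which S and L entries it touches and which earlier step last wrote them. It assumes the indexing i = k mod 26 and j = k mod 4 from the key-mixing pseudocode; the function name and tuple layout are illustrative.

```python
def mixing_dependencies(t: int = 26, c: int = 4):
    """Return (step, S index, step that last wrote it, L index, step that last wrote it).

    A source step of None means the value comes from the initial S or L array.
    Every step also needs A and B from the step immediately before it.
    """
    deps = []
    for k in range(3 * max(t, c)):
        s_source = k - t if k >= t else None   # S[k mod t] was written t steps earlier
        l_source = k - c if k >= c else None   # L[k mod c] was written c steps earlier
        deps.append((k, k % t, s_source, k % c, l_source))
    return deps

if __name__ == "__main__":
    for row in mixing_dependencies()[24:32]:
        print(row)
```

In an unrolled pipeline, a value written at step k − 26 has to be carried past the 25 intervening stages before step k can use it, which is the bookkeeping that makes a hand-written HDL version tedious.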


Data Dependencies in Each Iteration

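[Figure: the 78 key-mixing steps drawn as three rows (steps 0 to 25, 26 to 51, 52 to 77), followed by RC5 Encryption. Each step is labelled with the L word it uses (L0, L1, L2, L3, L0, ...) and, from step 26 onward, with the S word it reuses (S0 at step 26, S1 at 27, ... S25 at 51, then S0 again at 52).]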


Solution

• Implement on one FPGA chip concurrently
  – 78 key initialization macros
  – 12 encryption macros
• Connect the macros in a linear pipeline.
• The SRC compiler will balance the pipeline by inserting delay channels to make all macros run synchronously.
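
The general idea of pipeline balancing can be sketched as follows. This is a generic textbook-style approach, not the SRC compiler's actual algorithm, which the slides do not describe: compute when each stage can start, then pad every edge that arrives early with enough delay registers. The stage names and latencies in the example are hypothetical.

```python
def balance(latency: dict, edges: list) -> dict:
    """Return the number of delay registers to insert on each (src, dst) edge.

    latency: {stage: cycles}, with stages listed in topological order (assumed).
    edges:   list of (src, dst) data connections.
    """
    preds = {node: [] for node in latency}
    for src, dst in edges:
        preds[dst].append(src)
    start = {}
    for node in latency:
        # A stage starts when its slowest input arrives.
        start[node] = max((start[p] + latency[p] for p in preds[node]), default=0)
    # Pad each edge so its data arrives exactly when the consumer starts.
    return {(s, d): start[d] - (start[s] + latency[s]) for s, d in edges}

if __name__ == "__main__":
    latency = {"src": 0, "a": 3, "b": 3, "c": 6}
    edges = [("src", "a"), ("a", "b"), ("b", "c"), ("src", "c")]
    print(balance(latency, edges))
```

In the example, the direct connection from `src` to `c` skips two 3-cycle stages, so it needs 6 delay registers, while the edges along the main chain need none.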


Delay Channels Added by SRC Compiler

[Figure legend: Delay 1 = 1 register, Delay 2 = 2 registers, Delay 5 = 5 registers; the remaining connections are plain wires.]
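
A delay channel of depth n behaves like a chain of n registers: a value written in one cycle comes out n cycles later. A minimal behavioural model, with a hypothetical class name and interface that are not taken from the SRC tools:

```python
from collections import deque

class DelayChannel:
    """Behavioural model of an n-register delay line (n = 0 acts like a wire)."""

    def __init__(self, depth: int, fill: int = 0):
        # A full deque with maxlen=depth drops its oldest entry on every append.
        self.regs = deque([fill] * depth, maxlen=depth) if depth > 0 else None

    def clock(self, value):
        """Advance one clock cycle: shift in `value`, return the value leaving the line."""
        if self.regs is None:
            return value
        out = self.regs[0]
        self.regs.append(value)
        return out

if __name__ == "__main__":
    delay2 = DelayChannel(2)
    print([delay2.clock(v) for v in [1, 2, 3, 4]])
```

With depth 2 and registers initially holding 0, the inputs 1, 2, 3, 4 come out as 0, 0, 1, 2.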


Detailed flow

[Figure: detailed flow of the 78 key-mixing steps (0 to 77) feeding RC5 Encryption. The signals between stages are labelled skey000 to skey025, skey100 to skey125, skey200, skey201, ... for the S values and kkey000 to kkey003, kkey010, ... for the key words; values that skip stages are routed through delay channels.]


Compilation Result

• Device utilization summary:
    Number of External IOBs           594 out of 1104     53%
    Number of LOCed External IOBs     594 out of 594     100%
    Number of Slices                33790 out of 33792    99%
    Number of BUFGMUXs                  1 out of 16        6%

• Maximum Clock Frequency


Effectiveness of the Benchmark

Cipher Text | Expected Key | Found Key | Time (SRC) (s) | Time (PC) (s)
EEDBA521 6D8F4B15 | 00000000 00000000 00000000 00000000 | 00000000 00000000 00000000 00000000 | 97,342 | 0
C53073A4 8AFAE310 | 00000000 00000000 00000000 00010000 | 00000000 00000000 00000000 00010000 | 98,028 | 359,000
07CEC757 C72BCAE9 | 00000000 00000000 00000000 10000000 | 00000000 00000000 00000000 10000000 | 2,781,980 | 1,847,105,000
2F68DC4A ADBFACC6 | 00000000 00000000 00000000 20000000 | 00000000 00000000 00000000 20000000 | 5,466,274 | 5,251,282,000
6643CACD D1EDD161 | 00000000 00000000 00000001 00000000 | 00000000 00000000 00000001 00000000 | 43,050,562 | Too large to simulate
51C6514A 4EF0A99B | 00000000 00000000 00000010 00000000 | 00000000 00000000 00000010 00000000 | 687,318,493 | Too large to simulate
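
The timing figures can be sanity-checked with a little arithmetic. The sketch below rests on two assumptions that the slides do not state: that the hexadecimal key value equals the number of keys tried before the match, and that the time columns are in microseconds. The column headers say seconds, but read as seconds the last SRC entry would correspond to roughly two decades of run time, so microseconds seems the more plausible reading. Taking the difference between two rows cancels any fixed start-up overhead.

```python
# (keys tried before the match, SRC time, PC time), copied from the 0x10000000 and 0x20000000 rows.
ROWS = [
    (0x10000000, 2_781_980, 1_847_105_000),
    (0x20000000, 5_466_274, 5_251_282_000),
]

def keys_per_second(row_a, row_b, time_unit_s: float = 1e-6) -> float:
    """Throughput from the difference between two rows (time unit is an assumption)."""
    extra_keys = row_b[0] - row_a[0]
    extra_time = (row_b[1] - row_a[1]) * time_unit_s
    return extra_keys / extra_time

def speedup(row) -> float:
    """Ratio of PC time to SRC time for one row (independent of the time unit)."""
    return row[2] / row[1]

if __name__ == "__main__":
    print(f"SRC throughput: {keys_per_second(*ROWS) / 1e6:.1f} M keys/s")
    for row in ROWS:
        print(f"key {row[0]:#x}: speed-up {speedup(row):.0f}x")
```

Under these assumptions the SRC throughput works out to about 100 million keys per second, that is, one key per clock cycle at a clock of roughly 100 MHz, and the speed-up over the PC is about 660x and 960x for the two rows, in line with the roughly 1000x figure quoted in the conclusion.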


Conclusion

• The objective was realized, i.e., one 128-bit-long key is pushed into the processing chain every clock cycle,
• A speed-up of 1000x over SW and 300x over serial HW implementations was achieved,
• Because the parameters of the RC5 algorithm are flexible, different map routines can be designed to fit distinct area and throughput requirements,
• The automated pipeline balancing of the SRC compiler proved to substantially decrease the development time of complex pipelined designs.