authenticated encryption decryption scheme


Upload: hardik-manocha

Post on 22-Feb-2017


Page 1: Authenticated Encryption Decryption Scheme

VLSI DESIGN CONFERENCE 2016

Domain- Analog/Digital Design

Challenge D3- Efficient Accelerator for Authenticated Encryption

Title: HarSam

Authors: Samnit Dua and Hardik Manocha

Passcode: 26X-C4E3D5E4H7

Confirmation No: 26

Introduction:

Our team has selected one of the CAESAR candidates, TIAOXIN-346, to implement in the Design Contest for VLSI Design Conference 2016. The paper (http://competitions.cr.yp.to/round1/tiaoxinv1.pdf) describes a software implementation and reports a speed analysis for it; no hardware implementation is listed. Our team therefore decided to design hardware for TIAOXIN-346, with emphasis on an FPGA implementation in Verilog HDL, and to try to achieve on the FPGA the same speed as stated in the paper. Our team also worked on the memory feature of the design.

A complete analysis of our design is given on the following pages, together with a comparison against the analysis listed in the paper.

Our design encrypts and decrypts 256 bits of text.


Specification:

TIAOXIN-346 is a nonce-based authenticated encryption scheme that operates on 256 bits of message and 256 bits of associated data, with a 128-bit key and a 128-bit nonce (public message number).

For the ENCRYPTION/AUTHENTICATION stage: Tiaoxin-346 (K; IV; M; AD) = (C; Tag)

Inputs-

Key, K (128 bits)

Nonce, IV (128 bits)

Plain Text, M (256 bits)

Associated Data, AD (256 bits)

Outputs-

Cipher Text, C (256 bits)

Authentication Tag, Tag (128 bits)

For DECRYPTION stage

Inputs-

Key, K (128 bits)

Nonce, IV (128 bits)

Cipher Text, C (256 bits)

Associated Data, AD (256 bits)

Authentication Tag, Tag (128 bits)

Outputs-

Plain Text, M (256 bits), released only if the generated Authentication Tag matches the input Authentication Tag.


Notations:

Z0 - a constant word defined as Z0 = 428a2f98d728ae227137449123ef65cd

Z1 - a constant word defined as Z1 = b5c0fbcfec4d3b2fe9b5dba58189dbbc

Ts - a state composed of s words. For instance, T3 has 3 words and T6 has 6 words. State words are indexed in C notation, hence Ts = (Ts[0], Ts[1], ..., Ts[s-1]), where Ts[i], i = 0, ..., s-1, are words and Ts[0] is the first word.

Operations:

X ^ Y - bitwise addition (XOR) of the words X and Y

X & Y - bitwise conjunction (AND) of the words X and Y

AES(X; SK) - one keyed round of AES applied to the word X, where SK is the sub key, i.e.:

AES(X; SK) = MixColumns(ShiftRows(SubBytes(X))) ^ SK

SubBytes, ShiftRows and MixColumns are the same operations as in AES. Thus, AES(X; SK) corresponds to the AES-NI instruction aesenc.
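As a software reference for the AES(X; SK) operation above, the following Python sketch models one such round. It is not taken from our Verilog code: the function and helper names are illustrative, the 16-byte word is assumed to be stored column by column as in FIPS-197, and the S-box is computed rather than tabulated only to keep the listing short.

```python
def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return r


def build_sbox() -> list:
    """Build the AES S-box: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = 0
        if x:
            inv = 1
            for _ in range(254):  # x^254 is the multiplicative inverse of x
                inv = gf_mul(inv, x)
        s = inv
        for shift in range(1, 5):  # XOR of the four left rotations
            s ^= ((inv << shift) | (inv >> (8 - shift))) & 0xFF
        sbox.append(s ^ 0x63)
    return sbox


SBOX = build_sbox()


def aes_round(x: bytes, sk: bytes) -> bytes:
    """AES(X; SK) = MixColumns(ShiftRows(SubBytes(X))) ^ SK."""
    s = [SBOX[b] for b in x]
    # ShiftRows: row r of column c is taken from column (c + r) mod 4.
    t = [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]
    out = bytearray(16)
    for c in range(4):
        a = t[4 * c:4 * c + 4]
        for r in range(4):  # MixColumns on one column
            out[4 * c + r] = (gf_mul(a[r], 2) ^ gf_mul(a[(r + 1) % 4], 3)
                              ^ a[(r + 2) % 4] ^ a[(r + 3) % 4])
    return bytes(o ^ k for o, k in zip(out, sk))
```

A sketch like this can be checked against the first round of the cipher example in FIPS-197 Appendix B before it is used to cross-check hardware simulation results.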

R(Ts; M) - a round transformation of a state with s words. The inputs of R are the state Ts and the word M, while the output is a new state Tnew_s, i.e.

R: Ts x M -> Tnew_s:

Tnew_s[0] = AES(Ts[s-1]; Ts[0]) ^ M
Tnew_s[1] = AES(Ts[0]; Z0)
Tnew_s[2] = Ts[1]
...
Tnew_s[s-1] = Ts[s-2]
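The round transformation R can be sketched in Python as follows. This is an illustration under stated assumptions, not our hardware description: each 128-bit word is held as a Python integer, and the AES round is passed in as the parameter `aes` so that the shifting structure can be read (and tested) independently of the AES internals. The names `round_r` and `Word` are hypothetical.

```python
from typing import Callable, List

Word = int  # one 128-bit word held as a Python integer (assumption)


def round_r(state: List[Word], m: Word, z0: Word,
            aes: Callable[[Word, Word], Word]) -> List[Word]:
    """Return R(Ts; M) for a state of s >= 2 words."""
    s = len(state)
    new = [0] * s
    new[0] = aes(state[s - 1], state[0]) ^ m  # last word, keyed by first word
    new[1] = aes(state[0], z0)                # first word, keyed by Z0
    for i in range(2, s):
        new[i] = state[i - 1]                 # remaining words shift by one
    return new
```

Only two AES rounds are computed per state regardless of s; every other word is a plain shift, which is why the same function serves T3, T4 and T6.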

States of TIAOXIN-346:

Both the encryption and decryption parts of TIAOXIN-346 operate on three states: T3, T4 and T6. T3 consists of 3 words, T4 of 4 words and T6 of 6 words.

Update Operation:

T3, T4 and T6 are updated using the UPDATE function, which applies the R(T; M) operation defined above to each state.

Update: T3 x T4 x T6 x M0 x M1 x M2 -> T3 x T4 x T6

T3new = R(T3; M0); T3 = T3new
T4new = R(T4; M1); T4 = T4new
T6new = R(T6; M2); T6 = T6new

[Figure: T3 Update, T4 Update and T6 Update]

In the figure, a circled A stands for one AES round. The AES rounds applied to T3[2], T4[3] and T6[5] are keyless, while the AES rounds applied to T3[0], T4[0] and T6[0] use Z0 as a sub key.

Definition of TIAOXIN-346

Tiaoxin-346 processes the associated data AD and the message M in blocks, where each block is composed of 2 words (32 bytes, 256 bits).

The associated data AD is 32 bytes long. The length of AD is encoded as a 16-byte big-endian word and stored in AD Length, i.e. AD Length = |AD|.

The message M is 32 bytes long. The length of M is encoded as a 16-byte big-endian word and stored in M Length, i.e. M Length = |M|.
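The 16-byte big-endian length words can be illustrated with a short Python sketch. The text above does not say whether the length is counted in bytes or in bits; the sketch assumes bytes, and the helper name `length_word` is hypothetical.

```python
def length_word(data: bytes) -> bytes:
    """Encode len(data) as a 16-byte big-endian word (byte count is an assumption)."""
    return len(data).to_bytes(16, "big")


ad = bytes(32)  # a 32-byte associated data block, all zero for illustration
print(length_word(ad).hex())
```

For the fixed 32-byte AD and M used in our design, both length words are constants, so in hardware they can be wired in rather than computed.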

Tiaoxin-346 is a stream-cipher-based design and as such it works in four phases: Initialization, Processing associated data, Encryption, and Finalization. These phases are executed in the order specified above.

INITIALIZATION:

In the initialization, the key K and the public message number (nonce) IV are loaded into the three states T3, T4 and T6, and the states go through 15 rounds.

T3[0] = K; T3[1] = K; T3[2] = IV;
T4[0] = K; T4[1] = K; T4[2] = IV; T4[3] = Z0;
T6[0] = K; T6[1] = K; T6[2] = IV; T6[3] = Z1; T6[4] = 0; T6[5] = 0;
for i = 1 to 15
    Update (T3; T4; T6; Z0; Z1; Z0);
end for
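The state loading and the 15 initialization rounds can be sketched in Python as below. This is a reference sketch, not our Verilog: words are Python integers, the AES round is supplied by the caller as `aes`, and the function names are illustrative. R is repeated in compact form so that the listing runs on its own.

```python
from typing import Callable, List, Tuple

Z0 = 0x428a2f98d728ae227137449123ef65cd
Z1 = 0xb5c0fbcfec4d3b2fe9b5dba58189dbbc
Aes = Callable[[int, int], int]  # assumed signature: aes(word, sub_key) -> word


def round_r(t: List[int], m: int, aes: Aes) -> List[int]:
    """R(Ts; M): two AES rounds on the end words, then a one-word shift."""
    return [aes(t[-1], t[0]) ^ m, aes(t[0], Z0)] + t[1:-1]


def update(t3, t4, t6, m0, m1, m2, aes: Aes) -> Tuple[list, list, list]:
    """Apply R to T3, T4 and T6 with the words M0, M1 and M2."""
    return round_r(t3, m0, aes), round_r(t4, m1, aes), round_r(t6, m2, aes)


def initialize(key: int, iv: int, aes: Aes) -> Tuple[list, list, list]:
    """Load K and IV into the three states and run 15 update rounds."""
    t3 = [key, key, iv]
    t4 = [key, key, iv, Z0]
    t6 = [key, key, iv, Z1, 0, 0]
    for _ in range(15):
        t3, t4, t6 = update(t3, t4, t6, Z0, Z1, Z0, aes)
    return t3, t4, t6
```

Each update costs six AES rounds (two per state), so initialization amounts to 90 AES rounds in this model.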

PROCESSING ASSOCIATED DATA

Assume the associated data is composed of two words, i.e. AD = AD0 || AD1. The processing of associated data is defined as:

Update (T3; T4; T6; AD0; AD1; AD0 ^ AD1);

ENCRYPTION:

Assume the message M is composed of two words, i.e. M = M0 || M1. In the encryption, a block M is processed in one round, and a block of cipher text C = C0 || C1 (concatenation) is output. The encryption is defined as:

Update (T3; T4; T6; M0; M1; M0 ^ M1);
C0 = T3[0] ^ T3[2] ^ T4[1] ^ (T6[3] & T4[3]);
C1 = T6[0] ^ T4[2] ^ T3[1] ^ (T6[5] & T3[2]);

TAG PRODUCTION:

After all message blocks have been processed, the words holding the lengths of the associated data and the message are processed, then the states go through 20 more rounds, and the tag Tag is produced as an XOR of all words of all states. This final phase is defined as:

Update (T3; T4; T6; AD Length; M Length; AD Length ^ M Length);
for i = 1 to 20
    Update (T3; T4; T6; Z1; Z0; Z1);
end for
Tag = T3[0] ^ T3[1] ^ T3[2] ^ T4[0] ^ T4[1] ^ T4[2] ^ T4[3] ^ T6[0] ^ T6[1] ^ T6[2] ^ T6[3] ^ T6[4] ^ T6[5];
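The final phase can be sketched in Python as follows. To keep the listing short, the Update function is passed in as the parameter `update` (assumed to take the three states and three words and return the three new states); that parameter and the name `finalize` are illustrative and not part of our design files.

```python
from functools import reduce
from operator import xor

Z0 = 0x428a2f98d728ae227137449123ef65cd
Z1 = 0xb5c0fbcfec4d3b2fe9b5dba58189dbbc


def finalize(t3, t4, t6, ad_length: int, m_length: int, update) -> int:
    """Absorb the two length words, run 20 rounds, and XOR all 13 state words."""
    t3, t4, t6 = update(t3, t4, t6, ad_length, m_length, ad_length ^ m_length)
    for _ in range(20):
        t3, t4, t6 = update(t3, t4, t6, Z1, Z0, Z1)
    return reduce(xor, list(t3) + list(t4) + list(t6))
```

Note that the constants are fed in the order Z1, Z0, Z1 here, the reverse of the order used during initialization.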


DECRYPTION and VERIFICATION:

In the decryption-verification process, the order of the phases is the same: Initialization, Processing associated data, Decryption, and Finalization. Initialization, Processing associated data and Finalization are the same as during encryption. Decryption of a cipher text block C = C0 || C1 is defined as:

Update (T3; T4; T6; 0; 0; 0);
M0 = C0 ^ T3[0] ^ T3[2] ^ T4[1] ^ (T6[3] & T4[3]);
M1 = C1 ^ T6[0] ^ T4[2] ^ T3[1] ^ (T6[5] & T3[2]) ^ M0;
T3[0] = T3[0] ^ M0;
T4[0] = T4[0] ^ M1;
T6[0] = T6[0] ^ M0 ^ M1;
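That the decryption equations invert the encryption equations, and leave the states identical for the tag computation, can be checked with a small Python model. The sketch below deliberately replaces the AES round with a placeholder (`toy_aes`, a rotate-and-XOR that is NOT AES), because the inversion relies only on the XOR structure of Update; all names are illustrative.

```python
MASK = (1 << 128) - 1
Z0 = 0x428a2f98d728ae227137449123ef65cd


def toy_aes(x: int, sk: int) -> int:
    """Placeholder for the AES round (not AES): rotate left by 7, XOR sub key."""
    return (((x << 7) | (x >> 121)) & MASK) ^ sk


def round_r(t, m):
    return [toy_aes(t[-1], t[0]) ^ m, toy_aes(t[0], Z0)] + t[1:-1]


def update(t3, t4, t6, m0, m1, m2):
    return round_r(t3, m0), round_r(t4, m1), round_r(t6, m2)


def output_words(t3, t4, t6):
    """The two state combinations used for C0/C1 and for M0/M1."""
    w0 = t3[0] ^ t3[2] ^ t4[1] ^ (t6[3] & t4[3])
    w1 = t6[0] ^ t4[2] ^ t3[1] ^ (t6[5] & t3[2])
    return w0, w1


def encrypt_block(t3, t4, t6, m0, m1):
    t3, t4, t6 = update(t3, t4, t6, m0, m1, m0 ^ m1)
    return (t3, t4, t6), output_words(t3, t4, t6)


def decrypt_block(t3, t4, t6, c0, c1):
    t3, t4, t6 = update(t3, t4, t6, 0, 0, 0)
    w0, w1 = output_words(t3, t4, t6)
    m0 = c0 ^ w0
    m1 = c1 ^ w1 ^ m0
    # Re-insert the recovered message so the states match the encryption side.
    t3[0] ^= m0
    t4[0] ^= m1
    t6[0] ^= m0 ^ m1
    return (t3, t4, t6), (m0, m1)


start = ([1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11, 12, 13])
enc_state, (c0, c1) = encrypt_block(*start, 0x1234, 0xABCD)
dec_state, (p0, p1) = decrypt_block(*start, c0, c1)
print((p0, p1) == (0x1234, 0xABCD), dec_state == enc_state)
```

The extra "^ M0" in the M1 equation is needed because T6 absorbs M0 ^ M1 during encryption, so C1 carries both message words.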

VERILOG IMPLEMENTATION:

Verilog HDL is an IEEE-standard hardware description language, widely used in the design of digital integrated circuits. It is used for verification through simulation, for timing analysis, for test analysis and for logic synthesis. Verilog HDL allows designers to work at various levels of abstraction, such as register transfer level, gate level and switch level. Verilog is also the input to synthesis programs, which generate a gate-level description of the circuit. Xilinx ISE 13.2 is a software tool developed by Xilinx for synthesis and analysis of HDL designs.

Our Verilog HDL code is written in Xilinx ISE 13.2.

SIMULATION:

Our Verilog HDL code is simulated using ISim, which is available with Xilinx ISE 13.2.

Test Vectors/Data for ENCRYPTION:

Inputs

Key, K = 91cc70a38f1cf31c3a3a39c748e8ee3a


Nonce, IV = b7ddefbdfad7df7b7dbee3e5f5f5fbe6

Message, M= b7ddf2398e1471e39e6387474738e91d1dc74fbdfad7df7b7dbee3e5f5f5fbe6

Associated Data, AD= 91cc70a38f1cf31c3a3a39c748edbeef7defd6befbdbedf71f2fafafdf30ee3a

Outputs

C= d4a1b9fb02fa511cdf7f8cfbb90e22438702502bada2b70436ca6fc14c5d6224

Tag= bf979c14211c4930064abc4f50c2d0d0

Simulation Result for ENCRYPTION:

Test Vectors/Data for DECRYPTION:

Inputs

Key, K = 91cc70a38f1cf31c3a3a39c748e8ee3a

Nonce, IV = b7ddefbdfad7df7b7dbee3e5f5f5fbe6

Associated Data, AD= 91cc70a38f1cf31c3a3a39c748edbeef7defd6befbdbedf71f2fafafdf30ee3a


C= d4a1b9fb02fa511cdf7f8cfbb90e22438702502bada2b70436ca6fc14c5d6224

Tag= bf979c14211c4930064abc4f50c2d0d0

Output CASE (1): When Same Tag is entered to DECRYPTION:

Message, M= b7ddf2398e1471e39e6387474738e91d1dc74fbdfad7df7b7dbee3e5f5f5fbe6

fail= 1

Simulation Result for Decryption:

Output CASE (2): When a different Tag is entered to DECRYPTION:

Tag= 0f979c14211c4930064abc4f50c2d0d0 (only the first hex digit is changed, to 0)

Message, M= X

fail= 0
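The two decryption cases amount to releasing the message only when the generated tag equals the input tag. A minimal software sketch of that gating is given below. It is an illustration, not a transcription of our Verilog: the function name is hypothetical, `None` stands in for the undefined output X, and the constant-time comparison from Python's standard `hmac` module is our assumption about good practice in software rather than something the hardware does.

```python
import hmac
from typing import Optional


def release_plaintext(message: bytes, generated_tag: bytes,
                      input_tag: bytes) -> Optional[bytes]:
    """Return the message only when the two tags match; otherwise None."""
    if hmac.compare_digest(generated_tag, input_tag):
        return message
    return None
```

Withholding the decrypted message on a mismatch matters because the plain text is computed before the tag can be checked.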


Simulation Result for Decryption:

SYNTHESIZE SETTINGS:


SYNTHESIZE SUMMARY


The device family used for our design implementation is SPARTAN 3E (device xc3s500e-5vq100).

SUMMARY- ENCRYPTION

This summary shows the Synthesize report for the Enhanced Pentium M architecture.

Summary for Haswell architecture


SUMMARY- DECRYPTION

The detailed Synthesize report for Decryption is available in the main folder as “reports -> synthesize reports -> detailed_synthesize_report_dec.txt”.

FPGA IMPLEMENTATION:

The code written by our team was altered in order to test our design on the FPGA.

Encryption code changes:

1) Inputs K, IV, AD and M are made constants in the code.
2) The rest of the inputs are unchanged.
3) K, IV, AD and M are declared as parameters with the values listed in this report.
4) The outputs defined for the FPGA code are only 8 bits wide. All of these bits are used to display the Cipher Data and Tag on LEDs.
5) Only certain bits of C and Tag are displayed on the LEDs, so that the FPGA code stays as similar as possible to the actual code.


SUMMARY- ENCRYPTION for FPGA

The detailed Synthesize report is available in the main folder as “synthesize report_fpga_implementation -> detailed_synthesize_report_enc_fpga.txt”.

Decryption Code changes:

1) The only inputs are clk and rst.
2) Inputs K, IV, C, Tag and AD are created as parameters with the values generated by the Encryption module.
3) The output is a single LED, which indicates whether the input Tag matches the generated Tag and thus what the output would be.

SUMMARY- DECRYPTION for FPGA


COMPARISON:

This section compares our design with the one described in the paper (http://competitions.cr.yp.to/round1/tiaoxinv1.pdf). The comparison is done for ENCRYPTION only, as the performance listed in the TIAOXIN-346 paper is only for ENCRYPTION.

Features                                                         Our Design    TIAOXIN-346
Software                                                         Yes           Yes
Hardware (SPARTAN 3E)                                            Yes           No
SPEED (Enhanced Pentium M micro architecture) (256 bits Data)    7.562 ns      N/A
SPEED (Haswell micro architecture) (256 bits Data)               7.782 ns      N/A
SPEED (Sandy Bridge micro architecture) (256 bits Data)          N/A           1.45 ns
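The speed figures in the table can be compared directly with a short calculation. The sketch assumes that all three times refer to the same 256-bit workload, as the row labels suggest; the variable names are illustrative.

```python
# Times for 256 bits of data, copied from the comparison table (nanoseconds).
ours_pentium_m_ns = 7.562
ours_haswell_ns = 7.782
paper_sandy_bridge_ns = 1.45

ratio_pentium_m = ours_pentium_m_ns / paper_sandy_bridge_ns
ratio_haswell = ours_haswell_ns / paper_sandy_bridge_ns

print(f"Pentium M run: {ratio_pentium_m:.1f}x the paper's time")
print(f"Haswell run:   {ratio_haswell:.1f}x the paper's time")
```

Both ratios come out slightly above 5.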


CONCLUSION:

The problem statement for the Design Contest required participating teams to implement hardware for a CAESAR entry, and our team achieved a hardware implementation of TIAOXIN-346. The paper entitled “TIAOXIN-346” lists no hardware implementation; only a software implementation is described.

In our design, however, we were not able to match the speed reported for TIAOXIN-346. Our design is about 5 times slower than TIAOXIN-346, but we have successfully verified it on the FPGA.

Our team is still working to bring the speed listed for TIAOXIN-346 to our design, so that a hardware implementation becomes one more feature of TIAOXIN-346. For this, our team has built another design that uses function calls and loop structures instead of multiple module instantiations. We have successfully simulated this design, but due to a lack of system resources the synthesis process takes a very long time and has not completed. We were therefore unable to determine the speed of that design, and it has not been implemented in hardware. We expect that the design with functions and loop structures would match the speed of TIAOXIN-346, as functions and loop structures take only 2-3 clock ticks rather than complete clock cycles.