data transformation unit


BACHELOR OF ENGINEERING PROJECT ON

INTEGRATED DATA TRANSFORMATION UNIT

Submitted By

ADISH GULECHHA
NEHA RASKAR
SHAILESH TENDULKAR

IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE DEGREE OF
BACHELOR OF ENGINEERING
IN
ELECTRONICS

UNDER THE GUIDANCE OF
PROF. GIRISH GIDAYE

Department of Electronics Engineering

Vidyalankar Institute of Technology
Wadala (E), Mumbai 400 037.

University of Mumbai

2011-2012



CERTIFICATE

This is to certify that

ADISH GULECHHA
NEHA RASKAR
SHAILESH TENDULKAR

Have successfully completed project titled

INTEGRATED DATA TRANSFORMATION UNIT

IN PARTIAL FULFILLMENT OF THE REQUIREMENT FOR THE DEGREE OF
BACHELOR OF ENGINEERING
IN
ELECTRONICS

Leading to Bachelors Degree in Engineering
2011-2012

UNDER THE GUIDANCE OF
PROF. GIRISH GIDAYE

Signature of Guide Head of Department

Examiner 1 Examiner 2 Principal

College Seal



ACKNOWLEDGEMENT

First and foremost, we would like to extend our deepest gratitude to our project guide, Professor Girish Gidaye, for giving us the opportunity to work on new areas of digital system design. Without his continued support and interest, this project would not have been the same as presented here.

Our sincerest appreciation goes out to all those who have contributed directly and indirectly to the completion of this project. Of particular mention is Professor Shrikant Velankar, for his guidance, advice and motivation. His constant encouragement, criticism and guidance were key to bringing this project to a fruitful completion.

Our sincere appreciation also extends to all our colleagues and others who have provided assistance on various occasions. Their views and tips were useful indeed. At the same time, the constant encouragement and camaraderie shared among all our friends during our studies has been an enriching experience.



ABSTRACT

This project designs an Integrated Data Transformation Unit on an FPGA to increase the effective bandwidth of a channel. Data is received on multiple data channels from host processors (PCs). The data streams are first multiplexed to form a single data stream, which then undergoes data compression by run-length encoding. Finally, the compressed data stream is encrypted using DES. This data stream is communicated to another FPGA, where the reverse process of decryption, decompression and de-multiplexing is carried out to retrieve the original data channels. For these operations, data is received from and sent to the host processor (PC) on 4, 8 or 16 channels, and the multiplexer / compression / encryption logic as well as the de-multiplexer / decompression / decryption logic is implemented on two FPGAs. Hence, the concept of a secured high-bandwidth channel is implemented.



CONTENTS

Sr. No.   Title

1.    List of Figures
      List of Tables
2.    Introduction
3.    Review of Literature
      3.1 Verilog
      3.2 FPGA
      3.3 Multiplexer and De-multiplexer Unit
      3.4 Compression and Decompression Unit
      3.5 Encryption and Decryption Unit
4.    Design Hierarchy
5.    Plan of work
6.    Testing and Results
7.    Discussion of Results
8.    Conclusion
9.    Appendix
10.   References



List of Figures

FIGURE 1 STAGES IN VERILOG
FIGURE 2 SYMBOL OF 4:1 MULTIPLEXER
FIGURE 3 SYMBOL OF 1:4 DE-MULTIPLEXER
FIGURE 4 DATA COMPRESSION MODEL
FIGURE 5 DES ALGORITHM OVERVIEW
FIGURE 6 KEY SCHEDULING
FIGURE 7 CALCULATION OF F(R,K)
FIGURE 8 TRANSMISSION SYSTEM HIERARCHICAL FLOW
FIGURE 9 RECEIVER SYSTEM HIERARCHICAL FLOW
FIGURE 10 RTL SCHEMATIC OF THE TRANSMITTER SYSTEM
FIGURE 11 WAVEFORM FOR TRANSMITTER SYSTEM
FIGURE 12 RTL SCHEMATIC OF THE RECEIVER SYSTEM
FIGURE 13 WAVEFORM FOR RECEIVER SYSTEM
FIGURE 14 HUFFMAN CODER BLOCK
FIGURE 15 STRUCTURE OF LZSS ALGORITHM USED
FIGURE 16 RTL SCHEMATIC OF DES ENCRYPTION
FIGURE 17 RTL SCHEMATIC OF DES DECRYPTION
FIGURE 18 RTL SCHEMATIC OF COMPLETE TRX SYSTEM
FIGURE 19 RTL SCHEMATIC OF COMPLETE RX SYSTEM



List of Tables

TABLE 1 FUNCTION TABLE OF MUX
TABLE 2 FUNCTION TABLE OF 1:4 DEMUX
TABLE 3 PC-1 PERMUTED CHOICE 1
TABLE 4 PC-2 PERMUTED CHOICE 2
TABLE 5 IP INITIAL PERMUTATION MATRIX
TABLE 6 INVERSE INITIAL PERMUTATION MATRIX
TABLE 7 DEVICE UTILIZATION SUMMARY OF TRX SYSTEM
TABLE 8 DEVICE UTILIZATION SUMMARY OF RX SYSTEM
TABLE 9 TIMING REPORT OF TRANSMITTER CORE
TABLE 10 TIMING REPORT OF SYSTEM CORE



2. INTRODUCTION

This project implements the register-transfer-level design of a proprietary high-speed data transformation processor core using the Verilog Hardware Description Language. In addition, this project offers enhancements aimed at improving the portability of the design across hardware implementation technologies. The main aim of this project has been to develop a core that processes data in a fast and secure manner entirely in hardware. We have made use of the Data Encryption Standard (DES) and standard compression techniques such as LZ77/LZSS, which operate on the input data stream. All of this takes place completely in hardware, which also increases the security of the system. We present the design of a complete Transmitter and Receiver system which can be ported to Xilinx Spartan 3/3E family FPGA boards.

The growing possibilities of modern communications call for special means of security, especially on computer networks. Network security is becoming more important as the amount of data exchanged on the Internet increases. Security requirements apply both at the end-user level and at the enterprise level, especially given the massive use of personal computers, networks, and the globally available Internet. Over time, computational security needs have focused on different features: secrecy or confidentiality, identification, verification, non-repudiation, integrity control and availability.

This has resulted in explosive growth in the field of information hiding. In addition, the rapid growth of publishing and broadcasting technology also requires alternative solutions for hiding information.



The rapid growth of networking is driving high-bandwidth data transfers all over the world. Today, financial transactions, video surveillance, and e-commerce are all performed online. All data transfers are carried over networks such as LAN, WAN, and ATM, which are interconnected with routers, switches, bridges, and other network equipment. The growth of virtual private networks (VPNs) and IP security solutions (IPSec) has heightened demand for secure, high-performance data transfers.



3. REVIEW OF LITERATURE

3.1 VERILOG

Verilog hardware description language is an IEEE standard (IEEE std. 1364-

1995) language used for describing the behaviour and functionality of digital

circuits.

In the semiconductor and electronic design industry, Verilog is a hardware description language (HDL) used to model electronic systems. Verilog HDL, not to be confused with VHDL (a competing language), is most commonly used in the design, verification, and implementation of digital logic chips at the register-transfer level of abstraction. It is also used in the verification of analog and mixed-signal circuits.

Hardware description languages such as Verilog differ from software

programming languages because they include ways of describing the

propagation of time and signal dependencies (sensitivity). At the time of

Verilog's introduction (1984), Verilog represented a tremendous productivity

improvement for circuit designers who were already using graphical

schematic capture software and specially written software programs to

document and simulate electronic circuits.

Entry of large digital designs at the schematic level is very time consuming

and can be exceedingly tedious for circuits with wide data paths that must be

repeated for each bit of the data path. Hardware description languages

(HDLs) provide a more compact textual description of a design. Verilog is a

powerful language and offers several different levels of descriptions. The

lowest level is the gate level, in which statements are used to define individual

gates.



3.1.1 Structural v/s Behavioral Verilog

Behavioral modeling describes what a design must do, but does not have an obvious mapping to hardware. Behavioral Verilog is used to describe designs at a high level of abstraction. It is the most abstract level of description, resembling C with function calls (called tasks), for and while loops, etc. To quantify the complexity and timing requirements of a design, however, the design must be described closer to the gate level, so the design in this project is written in structural Verilog.

At the structural level, assign statements and always blocks, which are more abstract than individual gate instances, are used. These constructs are more powerful and can describe a design with fewer lines of code, but still provide a clearly defined relationship to actual hardware.

Structural Verilog designs are built from Verilog libraries containing modules that serve as the basic building blocks. These library parts include, for example, simple logic gates, registers, and memory modules. While the library parts are designed behaviorally, they incorporate timing information that is used in simulations. Using a common library ensures a uniform timing standard across the design.

Structural Verilog allows designers to describe a digital system as a hierarchical interconnection of modules.

The Verilog code for the project consists only of module definitions and their instances, with some behavioral Verilog used for debugging purposes.
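The distinction can be illustrated with a small sketch that is not taken from the project sources. The module names `mux2_structural` and `mux2_tb` are hypothetical. The first module is a structural 2:1 multiplexer built from gate primitives; the second is a behavioral, simulation-only test bench of the kind the report says was used for debugging.

```verilog
// Structural description: gate primitives wired together explicitly.
module mux2_structural (
    input  wire a,
    input  wire b,
    input  wire sel,   // 0 selects a, 1 selects b
    output wire y
);
    wire sel_n, a_path, b_path;

    not u_inv  (sel_n, sel);
    and u_and0 (a_path, a, sel_n);
    and u_and1 (b_path, b, sel);
    or  u_or   (y, a_path, b_path);
endmodule

// Behavioral test bench (simulation only): a loop and system tasks,
// with no direct mapping to hardware.
module mux2_tb;
    reg  a, b, sel;
    wire y;
    integer i;

    mux2_structural dut (.a(a), .b(b), .sel(sel), .y(y));

    initial begin
        // Walk through all eight input combinations.
        for (i = 0; i < 8; i = i + 1) begin
            {sel, b, a} = i[2:0];
            #1;
            if (y !== (sel ? b : a))
                $display("MISMATCH: sel=%b a=%b b=%b y=%b", sel, a, b, y);
        end
        $display("mux2 check finished");
        $finish;
    end
endmodule
```

Only the first module would be synthesized; the test bench exists purely to exercise it in a simulator.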



3.2 FPGA

A synthesis tool is used to translate the Verilog into actual hardware, such as

logic gates on a custom Application Specific Integrated Circuit (ASIC) or

configurable logic blocks (CLBs) on a Field Programmable Gate Array

(FPGA).

Various stages of ASIC/FPGA

Figure 1 Stages in Verilog



A field-programmable gate array (FPGA) is an integrated circuit designed to be configured by the customer or designer after manufacturing, hence "field-programmable". The FPGA configuration is generally specified using a hardware description language (HDL), similar to that used for an application-specific integrated circuit (ASIC). (Circuit diagrams were previously used to specify the configuration, as they were for ASICs, but this is increasingly rare.)

FPGAs can be used to implement any logical function that an ASIC could

perform. The ability to update the functionality after shipping, partial re-

configuration of a portion of the design and the low non-recurring engineering

costs relative to an ASIC design (notwithstanding the generally higher unit

cost), offer advantages for many applications.

FPGAs contain programmable logic components called "logic blocks", and a hierarchy of reconfigurable interconnects that allow the blocks to be "wired together", somewhat like many (changeable) logic gates that can be inter-wired in (many) different configurations. Logic blocks can be configured to perform complex combinational functions, or merely simple logic gates like AND and XOR. In most FPGAs, the logic blocks also include memory elements, which may be simple flip-flops or more complete blocks of memory.



3.3 MULTIPLEXER AND DEMULTIPLEXER UNIT

3.3.1 MULTIPLEXER

A multiplexer is a combinational circuit that is given a certain number (usually a power of two) of data inputs, say 2^n, and n address inputs used as a binary number to select one of the data inputs. The multiplexer has a single output, which has the same value as the selected data input.

Depending upon the digital code applied at the select inputs, one of the 2^n data inputs is selected and transmitted to the single output channel.

At face value a multiplexer is a logic circuit whose function is to select one

data line from among many. For this reason, many people refer to

multiplexers as data selectors.

Figure 2 Symbol of 4:1 Multiplexer

Table 1 Function Table of MUX

Input          Output
S1    S0       Y
0     0        D0
0     1        D1
1     0        D2
1     1        D3
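The function table can be written directly as a case statement. The following is a minimal sketch rather than the project's own multiplexer; the module name `mux4to1` and the parameter `WIDTH` (an assumed channel width of 8 bits) are illustrative.

```verilog
module mux4to1 #(
    parameter WIDTH = 8              // assumed channel width
) (
    input  wire [WIDTH-1:0] d0, d1, d2, d3,
    input  wire [1:0]       sel,     // {S1, S0} as in the function table
    output reg  [WIDTH-1:0] y
);
    always @(*) begin
        case (sel)
            2'b00:   y = d0;
            2'b01:   y = d1;
            2'b10:   y = d2;
            2'b11:   y = d3;
            default: y = {WIDTH{1'bx}};  // sel contains x or z in simulation
        endcase
    end
endmodule
```

Cycling `sel` with a counter would time-multiplex the four channels onto the single output stream.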



3.3.2 DEMULTIPLEXER

The de-multiplexer is the inverse of the multiplexer, in that it takes a single data input and n address inputs. It has 2^n outputs. The address inputs determine which data output is going to have the same value as the data input. The other data outputs will have the value 0.

Figure 3 Symbol of 1:4 De-multiplexer

Table 2 Function Table of 1:4 DEMUX

Input              Output
E    S0   S1       D0   D1   D2   D3
E    0    0        E    0    0    0
E    1    0        0    E    0    0
E    0    1        0    0    E    0
E    1    1        0    0    0    E
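A 1:4 de-multiplexer matching the function table can be sketched as follows. The module name `demux1to4` and the packing of D0..D3 into one vector are assumptions made for illustration.

```verilog
module demux1to4 (
    input  wire       e,     // data input (column E in the function table)
    input  wire [1:0] sel,   // {S1, S0}
    output reg  [3:0] d      // d[0] = D0, d[1] = D1, d[2] = D2, d[3] = D3
);
    always @(*) begin
        d      = 4'b0000;    // unselected outputs are driven to 0
        d[sel] = e;          // the selected output follows the data input
    end
endmodule
```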



3.4 COMPRESSION AND DECOMPRESSION UNIT

Data compression is the technique to reduce the redundancies in data

representation in order to decrease data storage requirements and hence

communication costs. Reducing the storage requirement is equivalent to

increasing the capacity of the storage medium and hence communication

bandwidth. Thus the development of efficient compression techniques will

continue to be a design challenge for future communication systems and

advanced multimedia applications.

Data is represented as a combination of information and redundancy.

Information is the portion of data that must be preserved permanently in its

original form in order to correctly interpret the meaning or purpose of the data.

Redundancy is that portion of data that can be removed when it is not needed

or can be reinserted to interpret the data when needed. Most often, the

redundancy is reinserted in order to generate the original data in its original

form. Data compression is defined as a technique to reduce the redundancy of data. The redundancy in the data representation is reduced in such a way that it can be subsequently reinserted to recover the original data, which is called decompression of the data.

Figure 4 Data compression model

When we speak of a compression technique or a compression algorithm we actually refer to two algorithms: the first one takes an input X and generates a representation XC that requires fewer bits; the second one is a reconstruction algorithm that operates on the compressed representation XC to generate the reconstruction Y.



3.4.1 Types of Data Compression Models

There are two types of data compression models: lossy and lossless.

• Lossy data compression works on the assumption that the data do not have to be stored perfectly, so some information may be discarded. It is unsuitable for text files (especially files containing computer programs), which are stored using lossless techniques, since losing a single character can make, in the worst case, the text dangerously misleading.

• Lossless compression ensures that the original information can be exactly reproduced from the compressed data.
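Run-length encoding, which the abstract names as the compression step, is one of the simplest lossless schemes: each run of identical symbols is replaced by a (symbol, count) pair from which the run can be rebuilt exactly. The sketch below is illustrative only and is not the project's compression core. The module name `rle_encoder`, the port names, the 8-bit symbol width and the 255-symbol run limit are all assumptions.

```verilog
module rle_encoder (
    input  wire       clk,
    input  wire       rst,        // synchronous reset, active high
    input  wire       in_valid,   // din carries a new symbol this cycle
    input  wire [7:0] din,
    input  wire       flush,      // end of stream; assert while in_valid is low
    output reg        out_valid,  // run_sym / run_len are valid this cycle
    output reg  [7:0] run_sym,
    output reg  [7:0] run_len
);
    reg       have;     // a run is currently being counted
    reg [7:0] cur_sym;
    reg [7:0] cur_len;

    always @(posedge clk) begin
        out_valid <= 1'b0;
        if (rst) begin
            have    <= 1'b0;
            cur_len <= 8'd0;
        end else if (in_valid) begin
            if (have && din == cur_sym && cur_len != 8'hFF) begin
                // Same symbol again: extend the current run.
                cur_len <= cur_len + 8'd1;
            end else begin
                // Symbol changed (or the counter is full): emit the finished run.
                if (have) begin
                    out_valid <= 1'b1;
                    run_sym   <= cur_sym;
                    run_len   <= cur_len;
                end
                have    <= 1'b1;
                cur_sym <= din;
                cur_len <= 8'd1;
            end
        end else if (flush && have) begin
            // Emit the last pending run at the end of the stream.
            out_valid <= 1'b1;
            run_sym   <= cur_sym;
            run_len   <= cur_len;
            have      <= 1'b0;
        end
    end
endmodule
```

A decoder would simply repeat `run_sym` for `run_len` cycles, which is what makes the scheme lossless.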

3.4.2 Advantages of Data Compression

• It reduces the data storage requirements.

• The audience can experience rich-quality signals for audio-visual data representation.

• Data security can also be greatly enhanced by encrypting the decoding parameters and transmitting them separately from the compressed database files to restrict access to proprietary information.

• The rate of input-output operations in a computing device can be greatly increased due to the shorter representation of data.

• Data compression reduces the cost of backup and recovery of data in computer systems by storing the backup of large database files in compressed form.

The technique used in the design of the high-speed data compression and decompression processor cores is based on a combination of the LZSS compression algorithm and Huffman coding. The source data to be compressed is first processed by the LZSS compression technique, since the algorithm is not restricted in what type of data it can process and requires no a priori knowledge of the source. An LZSS codeword is then generated whenever a match between the source data and the dictionary elements is detected, and the encoded data are represented as position-length pair codewords.



3.5 ENCRYPTION AND DECRYPTION UNIT

Fast computers and advances in telecommunications have made high-speed,

global, widespread computer networks possible, in particular the Internet,

which is an open network. It has increased the access to databases, such as

the open World Wide Web. To decrease communication cost and to be user

friendly, private databases containing medical records, proprietary

information, tax information, etc., are often accessible via the Internet by using

a low-security password scheme.

The privacy of data is vulnerable during communication, and data in transit can be modified, in particular in open networks. Because of the lack of secure computers, such concerns extend to stored data. Data communicated and/or accessible over such networks include bank and other financial transactions, love letters, medical records, proprietary information, etc., whose privacy must be protected. The authenticity of (the data in) contracts, databases, electronic commerce, etc. must be protected against modification by an outsider or by one of the parties involved in the transaction.

Modern cryptography provides the means to address these issues. Cryptography includes two basic components: the encryption algorithm and the keys. If the sender and recipient use the same key, the scheme is known as symmetric or private-key cryptography. It is well suited to long data streams. Such a system is difficult to use in practice because both the sender and the receiver must know the key, which has to be sent over a secure channel from sender to recipient. This raises the question: if a secure channel already exists, why not transmit the data itself over that channel?

On the other hand, if different keys are used by the sender and recipient, the scheme is known as asymmetric or public-key cryptography. The key used for encryption is called the public key and the key used for decryption is called the private key. This technique is used for short data streams and requires more time to encrypt the data.



3.5.1 Techniques of Cryptography

There are two techniques used for data encryption and decryption, which are:

A] Symmetric Cryptography

If the sender and recipient use the same key, the scheme is known as symmetric or private-key cryptography. It is well suited to long data streams. Such a system is difficult to use in practice because both the sender and the receiver must know the key, which has to be sent over a secure channel from sender to recipient.

There are two methods that are used in symmetric key cryptography: block and stream.

• The block method divides a large data set into blocks (based on a predefined size or the key size), encrypts each block separately and finally combines the blocks to produce the encrypted data.

• The stream method encrypts the data as a stream of bits without separating the data into blocks. The stream of bits from the data is encrypted sequentially, using some of the results from the previous bit, until all the bits in the data are encrypted.

B] Asymmetric Cryptography

If the sender and recipient use different keys, the scheme is known as asymmetric or public-key cryptography. The key used for encryption is called the public key and the key used for decryption is called the private key. This technique is used for short data streams and requires more time to encrypt the data.

Asymmetric encryption techniques are commonly cited as being almost 1000 times slower than symmetric techniques, because they require more computational processing power. To get the benefits of both methods, a hybrid technique is usually used. In this technique, asymmetric encryption is used to exchange the secret key; symmetric encryption is then used to transfer data between sender and receiver.



3.5.2 DES ALGORITHM

Data Encryption Standard (DES) is a cryptographic standard that was developed at IBM in the early 1970s and adopted as a U.S. federal standard (FIPS PUB 46) by the National Bureau of Standards (NBS) in 1977. DES is a block cipher, which means that during the encryption process the plaintext is broken into fixed-length blocks and each block is encrypted as a unit. It takes a 64-bit plaintext input and a 64-bit key (only 56 bits are used for the conversion; the remaining 8 bits are used for parity checking) and produces a 64-bit ciphertext, which can be decrypted with the same key to recover the message.

Additionally, there are four standardized modes of operation of DES:

• ECB (Electronic Codebook mode)
• CBC (Cipher Block Chaining mode)
• CFB (Cipher Feedback mode)
• OFB (Output Feedback mode)

In general, the DES encryption algorithm consists of an initial permutation of the 64-bit plaintext followed by 16 rounds, where each round consists of permutation and substitution of the text bits and the key bits, and finally an inverse initial permutation that yields the 64-bit ciphertext.



Figure 5 DES Algorithm Overview



3.5.3 Steps for Algorithm

Step 1: Create 16 sub-keys, each of which is 48-bits long

The 64-bit key is permuted according to the following table, PC-1. Since the

first entry in the table is "57", this means that the 57th bit of the original key K

becomes the first bit of the permuted key K+. The 49th bit of the original key

becomes the second bit of the permuted key. The 4th bit of the original key is

the last bit of the permuted key. Note only 56 bits of the original key appear in

the permuted key.


Table 3 PC-1 Permuted choice 1

Next, split this key into left and right halves, C0 and D0, where each half has

28 bits.

From the permuted key K+, we get

C0 = 0011001111000011001100111100

D0 = 0011001111000011001100110011
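The permutation and split described above can be sketched in Verilog as pure wiring. This is an illustration, not the project's key scheduler: the module name `des_pc1` is hypothetical, and the PC-1 entries are taken from the published DES standard (FIPS PUB 46-3), which agrees with the first, second and last entries quoted in the text. Vectors are declared `[1:64]` so that index 1 is the first (leftmost) bit, matching the DES bit numbering.

```verilog
module des_pc1 (
    input  wire [1:64] key,   // key[1] is the first (leftmost) key bit
    output wire [1:28] c0,    // left half of the permuted key K+
    output wire [1:28] d0     // right half of the permuted key K+
);
    wire [1:56] kplus;

    // Permuted choice 1: the parity bits 8, 16, ..., 64 are dropped.
    assign kplus = {
        key[57], key[49], key[41], key[33], key[25], key[17], key[9],
        key[1],  key[58], key[50], key[42], key[34], key[26], key[18],
        key[10], key[2],  key[59], key[51], key[43], key[35], key[27],
        key[19], key[11], key[3],  key[60], key[52], key[44], key[36],
        key[63], key[55], key[47], key[39], key[31], key[23], key[15],
        key[7],  key[62], key[54], key[46], key[38], key[30], key[22],
        key[14], key[6],  key[61], key[53], key[45], key[37], key[29],
        key[21], key[13], key[5],  key[28], key[20], key[12], key[4]
    };

    assign c0 = kplus[1:28];
    assign d0 = kplus[29:56];
endmodule
```

Because a permutation only renames wires, this block consumes routing but no logic resources on the FPGA.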



With C0 and D0 defined, we now create sixteen blocks Cn and Dn, 1 ≤ n ≤ 16.



We now form the keys Kn, for 1 ≤ n ≤ 16.
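One step of the key schedule can be sketched as below. The rotation amounts and the PC-2 entries are not reproduced in the surviving text; they are taken here from the published DES standard (FIPS PUB 46-3), where rounds 1, 2, 9 and 16 rotate by one position and all other rounds rotate by two. The module name `des_subkey_step` and the `shift_two` control input are hypothetical.

```verilog
module des_subkey_step (
    input  wire [1:28] c_prev,     // C(n-1)
    input  wire [1:28] d_prev,     // D(n-1)
    input  wire        shift_two,  // 0: rotate left by 1, 1: rotate left by 2
    output wire [1:28] c_next,     // C(n)
    output wire [1:28] d_next,     // D(n)
    output wire [1:48] k           // sub-key K(n) = PC-2(C(n) D(n))
);
    // Left rotation: the leading bit(s) wrap around to the end of the block.
    assign c_next = shift_two ? {c_prev[3:28], c_prev[1:2]}
                              : {c_prev[2:28], c_prev[1]};
    assign d_next = shift_two ? {d_prev[3:28], d_prev[1:2]}
                              : {d_prev[2:28], d_prev[1]};

    wire [1:56] cd = {c_next, d_next};

    // Permuted choice 2: selects 48 of the 56 bits.
    assign k = {
        cd[14], cd[17], cd[11], cd[24], cd[1],  cd[5],
        cd[3],  cd[28], cd[15], cd[6],  cd[21], cd[10],
        cd[23], cd[19], cd[12], cd[4],  cd[26], cd[8],
        cd[16], cd[7],  cd[27], cd[20], cd[13], cd[2],
        cd[41], cd[52], cd[31], cd[37], cd[47], cd[55],
        cd[30], cd[40], cd[51], cd[45], cd[33], cd[48],
        cd[44], cd[49], cd[39], cd[56], cd[34], cd[53],
        cd[46], cd[42], cd[50], cd[36], cd[29], cd[32]
    };
endmodule
```

Sixteen such steps chained together, or one step reused over sixteen clock cycles, would produce K1 through K16.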



Step 2: Encode each 64-bit block of data

There is an initial permutation IP of the 64 bits of the message data M. This

rearranges the bits according to the following table, where the entries in the

table show the new arrangement of the bits from their initial order.

Table 5 IP Initial Permutation Matrix

Here the 58th bit of M is "1", which becomes the first bit of IP. The 50th bit of

M is "1", which becomes the second bit of IP. The 7th bit of M is "0", which

becomes the last bit of IP.

Next divide the permuted block IP into a left half L0 of 32 bits, and a right half

R0 of 32 bits.
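As a sketch, the initial permutation and the split into L0 and R0 are again pure wiring. The module name `des_ip` is hypothetical, and the IP entries follow the published DES standard (FIPS PUB 46-3); they agree with the first, second and last entries quoted in the text (58, 50 and 7).

```verilog
module des_ip (
    input  wire [1:64] m,    // message block, m[1] is the first bit
    output wire [1:32] l0,   // left half after the initial permutation
    output wire [1:32] r0    // right half after the initial permutation
);
    wire [1:64] ip;

    assign ip = {
        m[58], m[50], m[42], m[34], m[26], m[18], m[10], m[2],
        m[60], m[52], m[44], m[36], m[28], m[20], m[12], m[4],
        m[62], m[54], m[46], m[38], m[30], m[22], m[14], m[6],
        m[64], m[56], m[48], m[40], m[32], m[24], m[16], m[8],
        m[57], m[49], m[41], m[33], m[25], m[17], m[9],  m[1],
        m[59], m[51], m[43], m[35], m[27], m[19], m[11], m[3],
        m[61], m[53], m[45], m[37], m[29], m[21], m[13], m[5],
        m[63], m[55], m[47], m[39], m[31], m[23], m[15], m[7]
    };

    assign l0 = ip[1:32];
    assign r0 = ip[33:64];
endmodule
```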

We now proceed through 16 iterations, for 1 ≤ n ≤ 16.



This results in a final block, for n = 16, of L16 R16. That is, in each iteration,

we take the right 32 bits of the previous result and make them the left 32 bits

of the current step.

For the right 32 bits in the current step, we XOR the left 32 bits of the previous step with the calculation f. Here + denotes bitwise XOR, so in general Ln = Rn-1 and Rn = Ln-1 + f(Rn-1, Kn); for example,

R1 = L0 + f(R0,K1)
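The data movement of a single round is small enough to show in full. In this sketch the module name `des_round` is hypothetical, and the value of f is assumed to be computed by a separate block and supplied on `f_val`.

```verilog
module des_round (
    input  wire [1:32] l_prev,   // L(n-1)
    input  wire [1:32] r_prev,   // R(n-1)
    input  wire [1:32] f_val,    // f(R(n-1), K(n)), computed elsewhere
    output wire [1:32] l_next,   // L(n)
    output wire [1:32] r_next    // R(n)
);
    assign l_next = r_prev;            // the right half moves to the left
    assign r_next = l_prev ^ f_val;    // the "+" in the text is a bitwise XOR
endmodule
```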

To calculate f, we first expand each block Rn-1 from 32 bits to 48 bits. This is done by using a selection table that repeats some of the bits in Rn-1. We'll call the use of this selection table the function E. Thus E(Rn-1) has a 32-bit input block and a 48-bit output block. The first three bits of E(Rn-1) are the bits in positions 32, 1 and 2 of Rn-1, while the last two bits of E(Rn-1) are the bits in positions 32 and 1.

(Note that each block of 4 original bits has been expanded to a block of 6

output bits.)

Next in the f calculation, we XOR the output E(Rn-1) with the key Kn:

Kn + E(Rn-1).
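The expansion and key mixing can be sketched as follows. The module name `des_expand_xor` is hypothetical. The E selection table from the published DES standard (FIPS PUB 46-3) takes each 4-bit group of R together with its two neighbouring bits, which is consistent with the first and last entries described above.

```verilog
module des_expand_xor (
    input  wire [1:32] r,   // R(n-1)
    input  wire [1:48] k,   // sub-key K(n)
    output wire [1:48] b    // K(n) XOR E(R(n-1)), i.e. B1 B2 ... B8
);
    // Each 6-bit group is a 4-bit group of r plus the bit before and after it.
    wire [1:48] e = {
        r[32],    r[1:5],
        r[4:9],   r[8:13],  r[12:17],
        r[16:21], r[20:25], r[24:29],
        r[28:32], r[1]
    };

    assign b = k ^ e;
endmodule
```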

Figure 7 Calculation of f(R,K)



To this point we have expanded Rn-1 from 32 bits to 48 bits, using the

selection table, and XORed the result with the key Kn . We now have 48 bits,

or eight groups of six bits. We now do something strange with each group of

six bits: we use them as addresses in tables called "S boxes". Each group of

six bits will give us an address in a different S box. Located at that address

will be a 4 bit number. This 4 bit number will replace the original 6 bits.

The net result is that the eight groups of 6 bits are transformed into eight

groups of 4 bits (the 4-bit outputs from the S boxes) for 32 bits total.

Write the previous result, which is 48 bits, in the form:

Kn + E(Rn-1) =B1B2B3B4B5B6B7B8, where each Bi is a group of six bits.

We now calculate S1(B1)S2(B2)S3(B3)S4(B4)S5(B5)S6(B6)S7(B7)S8(B8)

where Si(Bi) refers to the output of the i-th S box.

To repeat, each of the functions S1, S2,..., S8, takes a 6-bit block as input and

yields a 4-bit block as output.
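As an illustration of one S box lookup, the sketch below implements S1 only. The module name `des_sbox1` is hypothetical, and the table contents and addressing rule are taken from the published DES standard (FIPS PUB 46-3) rather than from this report: the first and last bits of the 6-bit group select one of four rows, and the middle four bits select one of sixteen columns. Each row is packed into a 64-bit constant of sixteen hexadecimal digits, leftmost digit being column 0.

```verilog
module des_sbox1 (
    input  wire [1:6] b,   // 6-bit group B1
    output wire [1:4] s    // 4-bit result S1(B1)
);
    wire [1:0] row = {b[1], b[6]};   // outer bits select the row
    wire [3:0] col = b[2:5];         // inner bits select the column

    reg [63:0] row_word;             // sixteen 4-bit entries of the chosen row

    always @(*) begin
        case (row)
            2'd0:    row_word = 64'hE4D12FB83A6C5907;
            2'd1:    row_word = 64'h0F74E2D1A6CB9538;
            2'd2:    row_word = 64'h41E8D62BFC973A50;
            default: row_word = 64'hFC8249175B3EA06D;
        endcase
    end

    // Column 0 is the leftmost hex digit, so the entry starts at bit 4*(15 - col);
    // for a 4-bit col, 15 - col equals ~col.
    assign s = row_word[{~col, 2'b00} +: 4];
endmodule
```

For the input 011011 the row is 01 and the column is 1101 (13), which selects the entry 5 (0101) from the second row. The other seven S boxes would follow the same pattern with their own constants.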

The final stage in the calculation of f is to do a permutation P of the S-box

output to obtain the final value of f:

f = P(S1(B1)S2(B2)...S8(B8))

P yields a 32-bit output from a 32-bit input by permuting the bits of the input

block.

We calculate R2 = L1 + f(R1, K2), and so on for 16 rounds. At the end of the sixteenth round we have the blocks L16 and R16. We then reverse the order of the two blocks into the 64-bit block R16 L16 and apply a final permutation IP^-1 (the inverse of IP) as defined by the following table:



Table 6 Inverse Initial Permutation Matrix

Decryption is simply the inverse of encryption, following the same steps as

above, but reversing the order in which the sub-keys are applied.
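Because only the sub-key order changes, the same round hardware can serve both directions if the sub-key index is selected as sketched below. The module name `des_subkey_select` and its ports are hypothetical, and the report's encryption and decryption cores may well be organized differently.

```verilog
module des_subkey_select (
    input  wire       decrypt,    // 0: encrypt, 1: decrypt
    input  wire [4:0] round,      // current round number, 1 to 16
    output wire [4:0] key_index   // index n of the sub-key K(n) to apply
);
    // Encryption uses K1, K2, ..., K16; decryption uses K16, K15, ..., K1.
    assign key_index = decrypt ? (5'd17 - round) : round;
endmodule
```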



4. DESIGN HIERARCHY

4.1 Transmission System

Figure 8 Transmission System Hierarchical Flow



4.2 Receiver System

Figure 9 Receiver System Hierarchical Flow



5. PLAN OF WORK

August - September
• Formation of final block diagram
• Study and selection of algorithms for compression core

October - November
• Study of algorithms for encryption core
• Selection and study of hardware description language

January - February
• Coding of MUX and DEMUX unit in Verilog
• Study of DES encryption algorithm
• Coding and implementation of DES encryption and decryption unit

March - April
• Decision not to implement the compression core because of the increase in complexity
• Final system connections and structuring
• Implementation of final system



6. TESTING AND RESULTS

6.1 Transmission System

6.1.1 Device Utilization Summary

DEVICE UTILIZATION SUMMARY

Logic Utilization             Used    Available    Utilization
Number of slices              559     3584         15%
Number of slice Flip Flops    487     7168         6%
Number of 4 input LUTs        989     7168         13%
Number of bonded IOBs         17      141          12%
Number of BRAMs               4       16           25%
Number of GCLKs               1       8            12%

Table 7 Device Utilization Summary of Trx system



Figure 10 RTL Schematic of the Transmitter System

Figure 11 Waveform for Transmitter System



6.2 Receiver System

6.2.1 Device Utilization Summary

DEVICE UTILIZATION SUMMARY

Logic Utilization             Used    Available    Utilization
Number of slices              772     3584         21%
Number of slice Flip Flops    743     7168         10%
Number of 4 input LUTs        1185    7168         16%
Number of bonded IOBs         20      141          14%
Number of BRAMs               4       16           25%
Number of GCLKs               5       8            62%

Table 8 Device Utilization summary of Rx System



Figure 12 RTL Schematic of the Receiver System

Figure 13 Waveform for Receiver System



7. DISCUSSION OF RESULTS

7.1 Timing Report of Transmitter Core

Delay:        9.534ns (Levels of Logic = 4)
Source:       Sel (PAD)
Destination:  sample (PAD)

Data Path: Sel to sample

Cell:in->out     Fanout    Gate Delay (ns)    Net Delay (ns)
IBUF:I->O        128       0.715              2.338
LUT3:I0->O       1         0.479              0.000
MUXF5:I1->O      4         0.314              0.779
OBUF:I->O                  4.909

Total            9.534ns (6.417ns logic, 3.117ns route)
                 (67.3% logic, 32.7% route)

Table 9 Timing Report of Transmitter Core



7.2 Timing Report of System core

Offset:        10.138ns (Levels of Logic = 4)
Source:        T/bitcounter_5 (FF)
Destination:   SERIAL_CIPHER_TEXT (PAD)
Source Clock:  CLK rising

Data Path: T/bitcounter_5 to SERIAL_CIPHER_TEXT

Cell:in->out     Fanout    Gate Delay (ns)    Net Delay (ns)
FDRE:C->Q        5         0.626              1.078
LUT4:I0->O       1         0.479              0.704
LUT4:I3->O       1         0.479              0.704
LUT4:I3->O       1         0.479              0.681
OBUF:I->O                  4.909

Total            10.138ns (6.972ns logic, 3.166ns route)
                 (68.8% logic, 31.2% route)

Table 10 Timing Report of System Core



8. CONCLUSION

A proprietary high-speed encryption and decryption core design has been analyzed. The timing reports show worst-case path delays of about 10 ns (9.534 ns for the transmitter core and 10.138 ns for the system core), which suggests that the computations of the system core occur at a very high speed compared to existing software prototypes. Because the data is processed entirely on the FPGA, it travels over a hardware channel and is therefore better protected. Hence, the hardware implementation of such a system provides secure and fast data transfer.

The first limitation is that the compression core was not implemented in hardware, because of the complexity of implementing the algorithm in HDL.

The second limitation is that data is exchanged with the PC serially (over a UART), which slows the system.

A complete system core and its associated test firmware have also been developed, and together they form the hardware evaluation platform. Using this evaluation platform, the functionality of the design running on real hardware has been proven.



9. Appendix A

CORE DESIGN

Design of Compression Unit

The main hardware module of the compression unit consists of three hierarchical blocks: the LZSS coder, the fixed Huffman coder and the data packer. All modules are synchronously clocked. The LZSS coder performs the LZSS encoding of the source data symbols, while the fixed Huffman coder re-encodes the length of each LZSS codeword to achieve a better compression ratio. Finally, the data packer packs the unary codes from the fixed Huffman coder into a fixed-length output packet and sends it to the interfacing block.

Figure 14 Huffman Coder Block

Huffman coding is therefore employed to further encode the length portion of the LZSS codeword in order to achieve a higher compression saving. On the decompression side, the whole process is performed in the reverse order.



1. LZSS CODER

The LZSS algorithm involves a computationally intensive matching process during the compression stage, because each input phrase has to be compared with every possible phrase in the dictionary. Furthermore, the dictionary updating process involves variable-length shifting of the input source into the dictionary, since the length of the longest matched phrase changes with time. If this operation is done using a variable-length shifter, a considerable amount of hardware resources will be consumed, which can lead to a higher implementation cost because a bigger (and correspondingly more expensive) programmable logic device or ASIC is needed. The design tackles these problems through a systolic array architecture for the LZSS compression dictionary, where each input symbol is compared with every dictionary element simultaneously, while input data is shifted in one symbol at a time through the use of a fixed-length shifter.
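The two ideas named above, a fixed one-symbol shift and a simultaneous comparison against every dictionary element, can be illustrated with a much simplified sketch. It is not the systolic array of the actual design: the module name `lzss_dictionary`, the parameters `DEPTH` and `SYMW`, and the port names are assumptions, and the reduction tree, delay tree and codeword generator are omitted.

```verilog
module lzss_dictionary #(
    parameter DEPTH = 16,   // assumed number of dictionary elements
    parameter SYMW  = 8     // assumed symbol width in bits
) (
    input  wire             clk,
    input  wire             rst,        // synchronous reset, active high
    input  wire             shift_en,   // shift one symbol into the dictionary
    input  wire [SYMW-1:0]  symbol,     // current input symbol
    output wire [DEPTH-1:0] match       // match[g] = 1 if element g equals symbol
);
    reg [SYMW-1:0] dict [0:DEPTH-1];
    integer i;

    // Fixed-length shifter: exactly one symbol enters per enabled clock.
    always @(posedge clk) begin
        if (rst) begin
            for (i = 0; i < DEPTH; i = i + 1)
                dict[i] <= {SYMW{1'b0}};
        end else if (shift_en) begin
            dict[0] <= symbol;
            for (i = 1; i < DEPTH; i = i + 1)
                dict[i] <= dict[i-1];
        end
    end

    // One comparator per element, all evaluated in parallel.
    genvar g;
    generate
        for (g = 0; g < DEPTH; g = g + 1) begin : cmp
            assign match[g] = (dict[g] == symbol);
        end
    endgenerate
endmodule
```

A complete coder would also track which elements hold valid data (after reset every element here reads as zero) and would combine `match` over successive symbols to find the longest matching phrase.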

Figure 15 Structure of LZSS Algorithm Used

In order to achieve a sufficiently high processing speed with data-independent throughput, and to use a fixed-length shifter to reduce hardware resource utilization, the LZSS coder design employs a systolic array architecture. The hardware architecture consists of four main components, namely the dictionary, reduction tree, delay tree and codeword generator sub-modules.

2. HUFFMAN CODER

The Huffman coding technique also presents certain design challenges.

Conventional Huffman coding requires a priori knowledge of the source data

distribution characteristics in order to construct an optimal encoding table for

better performance.

However, in many real-life applications it is difficult to determine the characteristics of the source data, because its probability distribution normally changes with time. Even when the source statistics are available, different sources have different distributions, so an encoding table must be generated for each type of source data. Furthermore, the generated table must be transmitted along with the encoded data so that decompression can be performed correctly. This both reduces the compression saving and increases the processing time of the hardware. The design tackles these problems by employing a predefined Huffman encoding table in both the compression and decompression cores.

The reason for this is two-fold. First, it simplifies generation of the encoding table, since the table no longer has to be built adaptively for different source data. Second, it eliminates the need to transmit the encoding table to the decompression side, which avoids the inefficient resource utilization and the loss of compression saving that such a transmission would cause.
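A predefined table of this kind can be held as a small combinational lookup that is built into both the compressor and the decompressor. The sketch below is illustrative only: the module name, the 3-bit length index and the codeword assignments are assumptions (a simple unary-style prefix code that gives shorter codewords to smaller indices), not the table used in the actual design.

```verilog
// Hypothetical sketch: the table contents and port names are assumptions,
// not the predefined table used in the report's design.
module huff_len_table_sketch (
    input  wire [2:0] len_idx,    // LZSS match-length index (assumed 3 bits)
    output reg  [6:0] code,       // codeword, right-aligned in 7 bits
    output reg  [2:0] code_bits   // number of valid bits in code (1 to 7)
);
    // Fixed prefix-free table: no codeword is a prefix of another,
    // so the decoder can rebuild len_idx without receiving the table.
    always @(*) begin
        // Defaults keep the logic purely combinational.
        code      = 7'b0000000;
        code_bits = 3'd1;
        case (len_idx)
            3'd0: begin code = 7'b0000000; code_bits = 3'd1; end // 0
            3'd1: begin code = 7'b0000010; code_bits = 3'd2; end // 10
            3'd2: begin code = 7'b0000110; code_bits = 3'd3; end // 110
            3'd3: begin code = 7'b0001110; code_bits = 3'd4; end // 1110
            3'd4: begin code = 7'b0011110; code_bits = 3'd5; end // 11110
            3'd5: begin code = 7'b0111110; code_bits = 3'd6; end // 111110
            3'd6: begin code = 7'b1111110; code_bits = 3'd7; end // 1111110
            3'd7: begin code = 7'b1111111; code_bits = 3'd7; end // 1111111
        endcase
    end
endmodule
```

Since the same fixed mapping exists on both sides, nothing about the table has to be sent with the compressed data. The price is that the codeword lengths are only as good as the assumed length statistics.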



DES ENCRYPTION AND DECRYPTION SYSTEM CORE



Figure 16 RTL Schematic of DES - ENCRYPTION

Figure 17 RTL Schematic of DES - DECRYPTION



COMPLETE SYSTEM CORE



Figure 18 RTL Schematic of Complete Trx System

Figure 19 RTL Schematic of Complete Rx System



APPENDIX B

TRANSMITTER SYSTEM CORE VERILOG CODE

This appendix presents the Verilog source code of the transmitter system core. The design hierarchy is presented under Design Hierarchy. The listing starts from the top-level module; the complete code of the sub-modules is not reproduced in this report.

Module Name: Transmitter_System_Top

module Transmitter_System_Top(
    CLK, RST, CHIP_SELECT_BAR, ADDRESS, SERIAL_CIPHER_TEXT, Sel, transmit,
    waddress, we, cs_ram_rec, cs_ram_tx, ENA, DIN, RD, DR
);

    // Input Signals
    input ENA, DIN, RD;
    input CLK;
    input RST;
    input cs_ram_rec, cs_ram_tx;
    input CHIP_SELECT_BAR;
    input ADDRESS;
    input transmit, we;
    input [3:0] waddress;
    input [1:0] Sel;

    // Output Signals
    output DR;
    output SERIAL_CIPHER_TEXT;

    // Internal Wires
    wire CLK;
    wire RST;
    wire CHIP_SELECT_BAR;
    wire ADDRESS;
    wire [64:1] CIPHER_TEXT_RAM;
    wire [1:0] Sel;
    wire [64:1] I3, I2, I1, I0;
    wire [64:1] O;
    wire [64:1] inter_mux;
    wire [64:1] to_tx;
    wire [64:1] to_ram;

    // Receiver Module
    RX Receiver(
        .CLK(CLK),
        .RST(RST),
        .DIN(DIN),
        .ENA(ENA),
        .RD(RD),
        .DR(DR),
        .DOUT(to_ram)
    );

    // Receiver RAM Module
    ram1 RAM_REC(
        .CLK(CLK),
        .waddress(waddress),
        .data_in(to_ram),
        .we(we),
        .cs(cs_ram_rec),
        .data_out(inter_mux)
    );

    // MUX Module
    mux4to1 MUX(
        .I0(inter_mux),
        .I1(inter_mux),
        .I2(inter_mux),
        .I3(inter_mux),
        .Sel(Sel),
        .Y(O)
    );

    // Encryption Module
    Des_Top ENCRYPT(
        .CLK(CLK),
        .RST(RST),
        .CHIP_SELECT_BAR(CHIP_SELECT_BAR),
        .ADDRESS(ADDRESS),
        .PLAIN_TEXT(O),
        .CIPHER_TEXT(CIPHER_TEXT_RAM)
    );

    // Transmitter RAM Module
    ram1 RAM_TX(
        .CLK(CLK),
        .waddress(waddress),
        .data_in(CIPHER_TEXT_RAM),
        .we(we),
        .cs(cs_ram_tx),
        .data_out(to_tx)
    );

    // Transmitter Module
    Transmitter T(
        .CLK(CLK),
        .RST(RST),
        .transmit(transmit),
        .data(to_tx),
        .TxD(SERIAL_CIPHER_TEXT)
    );

endmodule
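The sub-module sources are not reproduced in this report. As a rough guide to what the mux4to1 instance in the top-level module expects, a 64-bit 4-to-1 multiplexer with the same port names and bus widths could look like the sketch below. The module name mux4to1_sketch and the mapping of Sel values to inputs are assumptions, not the authors' code.

```verilog
// Hypothetical sketch of a multiplexer compatible with the ports used by
// the MUX instance (I0..I3, Sel, Y); the Sel-to-input mapping is assumed.
module mux4to1_sketch (
    input  wire [64:1] I0,
    input  wire [64:1] I1,
    input  wire [64:1] I2,
    input  wire [64:1] I3,
    input  wire [1:0]  Sel,
    output reg  [64:1] Y
);
    // Purely combinational selection of one 64-bit input.
    always @(*) begin
        case (Sel)
            2'b00:   Y = I0;
            2'b01:   Y = I1;
            2'b10:   Y = I2;
            default: Y = I3;
        endcase
    end
endmodule
```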



APPENDIX C

DATASHEETS



