Post on 17-Sep-2020
CHAPTER 1
INTRODUCTION
1.1 Hash algorithm
In this thesis, hash function SHA-256 is implemented on FPGA in a processor
structure. The design is described and captured using a hardware description
language, namely Verilog. Due to the rapid developments in the wireless
communications area and personal communications systems, providing information
security has become a more and more important subject. This security concept
becomes a more complicated subject when next-generation system requirements and
real-time computation speed are considered. To address these security problems,
a great deal of research and development has been carried out, and cryptography has
become a very important part of any communication system in recent years. Cryptographic
algorithms fulfil specific information security requirements such as data integrity,
confidentiality and data origin authentication. Hash functions are among the most
important cryptographic algorithms and are used in several fields, including
communication integrity and signature authentication. These functions take an
input of arbitrary length and produce a condensed representation of that input.
This condensed representation of an arbitrarily long input is usually referred to
as the message digest or hash value. The size of the message digest is fixed and
depends on the particular hash function being used. The security of a hash
function is directly related to this message digest length. Hash functions have some
specific properties that make them secure; these properties are pre-image resistance,
second pre-image resistance and collision resistance, as indicated in the FIPS
documents. Pre-image resistance means that for any given hash value it is
computationally very hard to find an input having that particular hash value. Second
pre-image resistance means that given an input, it is computationally very hard to find
another input such that both inputs have the same hash value. Collision resistance
means that it is computationally very difficult to find two inputs having the same hash
value.
Hash functions are mostly used to provide password authentication in different
applications, to generate digital signatures with DSA (Digital Signature Algorithm)
and to verify data integrity. In order to protect passwords from attacks, hash
values of the passwords are stored in the password database rather than clear text.
When a user logs into the system, the hash of the password entered by the user is
calculated and compared with the one stored in the database. If the two hash values
match, the user is authenticated; otherwise access is denied.
In order to generate a digital signature and sign a document with it, the hash
value of the document is calculated. This hash value is then encrypted with the
sender's private key using a public-key encryption algorithm. The resulting
digital signature is appended to the document and the document is sent with that
signature. At the receiving end, anyone holding the public key that corresponds to
the sender's private key can decrypt the digital signature and recover the
original hash value. The receiver then calculates the hash value of the
received document. If the two hashes match, both the origin of the document is
authenticated and the content of the document is verified. In order to verify data
integrity, the hash values of the documents are calculated and kept in a safe
location. At a later time, the hash value of a document is recomputed. If the hash
values do not match, one concludes that the file has been corrupted. The same
technique is used for time stamping documents.
Many hash functions have been developed up to now; MD5, SHA-1, SHA-256, SHA-384
and SHA-512 are the most popular of them. The oldest of these is MD5, which was
developed in 1991 and has an output size of 128 bits. Research on more secure hash
functions continued, and in 1993 the 160-bit hash function SHA was published; its
revised version, SHA-1, followed in 1995. In 2002, in order to match the security
levels offered by other cryptographic algorithms, NIST standardized three new hash
functions: SHA-256, SHA-384 and SHA-512. These hash functions are standardized
together with SHA-1 as SHS (Secure Hash Standard). A 224-bit hash function, SHA-224,
based on SHA-256, was added to SHS in 2004. Hash calculations are mainly
composed of three parts. In the first part, the incoming message is padded and
fixed-size message blocks are prepared according to the particular hash function
being applied. After padding, the message schedule is prepared. In this stage, each
message block is further divided into sub-blocks to be used in the rounds of the
hash calculation process. In the hash calculation process, the message digest is
computed after an algorithm-specific number of iterations by using:
(i) Algorithm specific constants
(ii) Message words prepared by the message scheduler
(iii) The chaining variables
Hash functions can be implemented in hardware or software. However, as
security and throughput requirements of the systems increase, it is found that software
implementations cannot provide desired security and throughput values. As a result, it
is preferred to implement the hash functions in hardware. There are several hash
function implementations in the literature and commercially available in the market.
These implementations differ from each other according to the properties such as
area, speed and throughput. Kyu et al. implemented the SHA-1, HAS-160 and MD5
algorithms on a single chip and proposed two architectures, one with resource
sharing and one without. McLoone et al. implemented SHA-512 and SHA-384 on a
single chip. The proposed design achieves a throughput of 479 Mbps using a
shift register design approach in the message scheduling part and look up tables for
the constants required by the algorithms. Grembowski et al. implemented SHA-1 and
SHA-512 hash functions separately and compared the implementation results.
Sklavos et al. implemented SHA-1 and RIPEMD-160 hash functions in the same
hardware module. The advantage of the proposed implementation is that it exhibits
high throughput due to the pipeline technique used in the design. In another study,
Sklavos et al. determined a common architecture for the SHA-256, SHA-384 and
SHA-512 hash functions and implemented these functions separately. The
implementation results of the three functions are compared in terms of the security
level provided and hardware performance. Michail et al. implemented the SHA-1 hash
function in such a way that the throughput of the design is increased by 53% and
the power dissipation is kept low. In a recent work on hash function implementations,
T.S. Ganesh et al. unified the hash functions MD5, SHA-1 and RIPEMD-160.
The design is reported to exhibit better throughput than existing hash function
implementations. In this study, the hash functions SHA-1 and SHA-256 are
implemented in a processor structure. SHA-1 and SHA-256 are chosen because of their
architectural similarities, such as word size and block size, and at the same time
because of computational differences that make the design not straightforward. An
instruction set is developed by analyzing the hash functions. It consists of 14
instructions, six of which are special instructions developed for the SHA-1 and
SHA-256 hash functions. The other instructions are general-purpose instructions.
The address length of the instructions is six bits. The data length is 32 bits. The
proposed instruction set can be extended for other hash algorithms, which can then
be implemented using the same architecture.
1.2 SHA
SHA (Secure Hash Algorithm) was designed by the National Security Agency of the
U.S.A. It is a message digest standard intended to work with the DSS (Digital
Signature Standard) published by NIST (National Institute of Standards and
Technology). Although SHA was designed for DSS, it can also be used in many other
protocols and security algorithms. The original version of SHA is called SHA or
SHA-0. SHA-1 is the improved version of SHA-0. Figure 1-1 shows the internal block
diagram of SHA-1.
Figure 1-1: Sha-1 Basic Blocks
Using SHA-1, a message of fewer than $2^{64}$ bits is converted into a 160-bit
message digest. The message digest is much shorter than the message itself, so
generating a digital signature from it takes less time. More importantly, a
digital signature generated from the message digest offers the same security as one
generated from the message itself. Finally, SHA-1 is easy to implement. The input
of SHA is a message of fewer than $2^{64}$ bits, and the output is a 160-bit message
digest. A message whose length is not a multiple of 512 bits is padded to the next
multiple of 512 bits, and the padded message is then separated into 512-bit groups
(blocks). These groups are processed one after another by the SHA algorithm to
produce the message digest. When the message digest is generated, five 32-bit
initial values A, B, C, D, E are used. In each step t of SHA-1, the non-linear
function $F_t$, the message word $W_t$ and the constant $K_t$ depend on the value
of t. Each 512-bit group is separated into 16 sub-groups $W_t$ of 32 bits each.
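The text does not list how $F_t$ and $K_t$ vary with t. As a minimal sketch, assuming the definitions published in the Secure Hash Standard (FIPS 180) rather than anything stated in this chapter, the selection by step index can be written as follows. The names `sha1_f` and `sha1_k` are illustrative only.

```python
MASK32 = 0xFFFFFFFF  # all arithmetic is on 32-bit words


def sha1_f(t, b, c, d):
    """Non-linear function F_t of SHA-1 for step t (0..79), per FIPS 180."""
    if 0 <= t <= 19:
        return ((b & c) | (~b & d)) & MASK32      # choose
    if 20 <= t <= 39 or 60 <= t <= 79:
        return b ^ c ^ d                          # parity
    if 40 <= t <= 59:
        return (b & c) | (b & d) | (c & d)        # majority
    raise ValueError("t must be in 0..79")


def sha1_k(t):
    """Step constant K_t of SHA-1 for step t (0..79), per FIPS 180."""
    if not 0 <= t <= 79:
        raise ValueError("t must be in 0..79")
    return (0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, 0xCA62C1D6)[t // 20]
```

The 80 steps fall into four groups of 20, each with its own function and constant.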
CHAPTER-2
HASH FUNCTIONS AND PROCESSORS
2.1 Hash functions
2.1.1 Definition and properties of hash function
A hash function is an operation that takes an input and produces a fixed-size
string which is called the hash value. The input string can be of any length, up to
a limit that depends on the algorithm used. The produced output is a condensed
representation of the input message or document and is usually called a message
digest, a digital fingerprint or a checksum. The size of the message digest is fixed
and depends on the particular algorithm being used. This means that for a particular
algorithm, all inputs yield an output of the same length. Furthermore, a very small
change in the input results in a completely different hash value. This is known as
the avalanche effect. The hashing operation is illustrated below in Figure 2-1.
Figure 2-1 Hashing Operation
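The fixed output length and the avalanche effect can be observed with a short sketch. It assumes Python's standard `hashlib` module as the SHA-256 implementation; the two sample inputs and the helper `bit_difference` are illustrative only.

```python
import hashlib


def bit_difference(a: bytes, b: bytes) -> int:
    """Count the bit positions in which two equal-length byte strings differ."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# 'n' (0x6e) and 'o' (0x6f) differ in exactly one bit,
# so the two inputs differ in a single bit.
d1 = hashlib.sha256(b"hash function").digest()
d2 = hashlib.sha256(b"hash functioo").digest()

print(len(d1) * 8, len(d2) * 8)                  # both digests are 256 bits long
print(bit_difference(d1, d2), "of 256 digest bits differ")
```

For a well-designed hash function, roughly half of the digest bits are expected to change.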
The security of a hash function is directly related to the message digest length.
Pre-image resistance, second pre-image resistance and collision resistance are very
important characteristics of any hash function.
1. Pre-image resistance (one-wayness): For all specified hash values it is
computationally very hard to find an input message having that particular hash
value. This property is illustrated in Figure 2-2.
Figure 2-2 Preimage Resistance
2. Second pre-image resistance: Given an input message m1, it is
computationally very hard to find another input message m2 such that
hash (m1)=hash(m2). This property is illustrated in Figure 2-3.
Figure 2-3 Second Preimage Resistance
3. Collision resistance: It is computationally very hard to find any two different
inputs that have the same hash value. This property is illustrated in Figure 2-4.
Figure 2-4 Collision Resistance
Hash functions can be classified as keyed and unkeyed hash functions. Keyed
hash functions take a secret key as an additional input parameter. In this case,
the above defined characteristics of hash functions are satisfied for any value of the
secret key. Keyed hash functions are also called Message Authentication Codes or
MACs. In this study, we only deal with unkeyed hash functions.
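The difference between the two classes can be sketched as follows. The text does not name a particular MAC; HMAC with SHA-256 from Python's standard library is used here only as one well-known keyed construction, and the keys and message are made up for illustration.

```python
import hashlib
import hmac

message = b"transfer 100 to account 42"

# Unkeyed hash: anyone can recompute this value from the message alone.
unkeyed = hashlib.sha256(message).hexdigest()

# Keyed hash (MAC): the result also depends on a secret key.
tag_a = hmac.new(b"secret-key-A", message, hashlib.sha256).hexdigest()
tag_b = hmac.new(b"secret-key-B", message, hashlib.sha256).hexdigest()

print(unkeyed)
print(tag_a)
print(tag_b)   # differs from tag_a although the message is the same
```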
2.1.2 Applications of hash functions
The most common uses of hash functions are verifying data
integrity, providing password authentication and generating digital signatures with
DSA in applications such as electronic mail, electronic funds transfer, software
distribution and data storage, which require data integrity assurance and data origin
authentication. Data integrity is a very important part of a secure system. Any changes
made to files can be detected by generating the message digests of the files using a
hash function. These digests are saved, and at a later time the digest of a file is
recomputed. If the new digest is different from the original digest, the
original file has been corrupted in some way. This can be very important when
protecting critical system binaries and sensitive databases. In addition, files can
be corrupted during transmission through networks such as the internet. In order
to verify that the received file is identical to the original file, the message digest
of the received file is calculated.
This calculated message digest is then compared with the original one
published by the web site or FTP site. If the calculated digest is different from the
original, one can be sure that the received file differs from the transmitted file. If
the digests match, the files can be taken to be identical, since it is computationally
very hard to find two inputs that have the same hash value (the collision resistance
property of a hash function). Verifying data integrity by means of
a hash function is illustrated below in Figure 2-5.
Figure 2-5 Verifying Data Integrity
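The integrity check described above can be sketched in a few lines. This is a minimal illustration assuming SHA-256 from Python's standard `hashlib`; the function names and the chunk size are arbitrary choices, not part of the thesis design.

```python
import hashlib


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 digest of a file as a hex string, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def is_intact(path, published_hex_digest):
    """Compare the recomputed digest with the one saved or published earlier."""
    return file_digest(path) == published_hex_digest.lower()
```

A downloaded file would be checked with `is_intact(path, digest_from_site)`, where the second argument is the digest published alongside the file.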
Password authentication is another field in which hash functions are used. For
computer systems, it is insecure to store passwords in clear text: anyone who gains
access to the password database obtains all of the passwords, and the entire user
password database is compromised. For this reason, a more secure way is to store the
hashes of the passwords rather than the clear-text passwords.
Storing the hashes of passwords is shown below in Figure 2-6.
Figure 2-6 Storing the Hash of a Password
When a user logs in, the hash value of the submitted password is calculated
and compared with the one stored in the password database. If the calculated hash
value is identical to the one stored in the database, the user is authenticated;
otherwise access is denied. This scenario is illustrated below in Figure 2-7.
Figure 2-7 Authenticating Users
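The store-and-compare scheme of Figures 2-6 and 2-7 can be sketched as follows. This is a simplified illustration of the flow described in the text, with a hypothetical in-memory dictionary standing in for the password database. Deployed systems usually also add a per-user salt and a deliberately slow hash, which is outside the scope of this sketch.

```python
import hashlib
import hmac

password_db = {}  # hypothetical store: user name -> hex hash of the password


def register(user, password):
    """Store only the hash of the password, never the clear text."""
    password_db[user] = hashlib.sha256(password.encode("utf-8")).hexdigest()


def authenticate(user, password):
    """Hash the submitted password and compare it with the stored hash."""
    stored = password_db.get(user)
    if stored is None:
        return False
    candidate = hashlib.sha256(password.encode("utf-8")).hexdigest()
    return hmac.compare_digest(stored, candidate)


register("alice", "s3cret")
print(authenticate("alice", "s3cret"))   # True
print(authenticate("alice", "guess"))    # False
```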
In this way, even if the password database is compromised, user privacy is still
protected since it is computationally very difficult to obtain the original passwords
from the hash values.
One of the most popular applications of hash functions is
digital signatures. A digital signature is a type of asymmetric cryptography used to
simulate the security properties of a signature in digital, rather than written, form.
Digital signatures are used to provide authentication of the associated input, usually
called a message. Messages can be anything from electronic mail to
a message sent in a more complicated cryptographic protocol. A digital signature
scheme consists of three algorithms:
(i) A key generation algorithm G that randomly produces a “key pair” (PK, SK)
for the signer. PK is the verifying key, which is to be public, and SK is the
signing key, to be kept private.
(ii) A signing algorithm S that, on input of a message m and a signing key SK,
produces a signature.
(iii) A signature verifying algorithm V that, on input of a message m, a verifying
key PK, and a signature, either accepts or rejects.
Two main properties are required. First, signatures computed properly should
always verify. That is, V should accept (m, PK, S(m, SK)), where SK is the secret
key related to PK, for any message m. Secondly, it should be hard for any adversary,
knowing only PK, to create valid signatures.
In practice, computing the digital signature of a long message with public key
algorithms is very inefficient. To save time, digital signature protocols are often
implemented with one-way hash functions. Instead of signing the whole document,
the hash of the document is signed. In this case, the scenario is as follows:
(i) The hash value of the document is calculated.
(ii) The calculated hash value is encrypted with the private key; thereby the
document is signed.
(iii) The document and the signed hash value are sent to the recipient.
(iv) The recipient calculates the one-way hash value of the document and decrypts
the signed hash value by using the public key. If the decrypted hash value is the
same as the calculated hash value, then the signature is valid.
The application and verification of a digital signature are illustrated below in
Figure 2-8 and Figure 2-9.
Figure 2-8 Application of a Digital Signature
Figure 2-9 Verification of a Digital Signature
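The hash-then-sign scenario can be sketched with textbook RSA. Everything below is an illustrative assumption: the thesis refers to DSA and gives no key sizes, and the key pair here is a toy built from two small Mersenne primes with no padding, so it is far too weak for real use. Reducing the digest modulo N is needed only because the toy modulus is shorter than the 256-bit digest.

```python
import hashlib

# Toy key pair (hypothetical, illustration only; NOT secure).
P = 2**61 - 1                         # Mersenne prime
Q = 2**89 - 1                         # Mersenne prime
N = P * Q                             # modulus, part of both keys
E = 65537                             # public exponent: verifying key is (N, E)
D = pow(E, -1, (P - 1) * (Q - 1))     # private exponent: signing key (Python 3.8+)


def digest_as_int(document: bytes) -> int:
    """Step (i): hash the document and map the digest into the range 0..N-1."""
    return int.from_bytes(hashlib.sha256(document).digest(), "big") % N


def sign(document: bytes) -> int:
    """Step (ii): 'encrypt' the hash value with the private key."""
    return pow(digest_as_int(document), D, N)


def verify(document: bytes, signature: int) -> bool:
    """Step (iv): recover the hash with the public key and compare."""
    return pow(signature, E, N) == digest_as_int(document)


sig = sign(b"pay 10 to Bob")
print(verify(b"pay 10 to Bob", sig))     # signature matches the document
print(verify(b"pay 1000 to Bob", sig))   # altered document is rejected
```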
Because the signature is computed over the hash of the document, any change in
the document changes its hash value, and the signature no longer validates.
As a result, when the signature is validated, the recipient can be sure that
the document has not been altered. Another benefit of digital signatures is the
authentication of the source of the message. Since the private key used in the
encryption process belongs to a specific user, a valid signature shows that the
message was sent by that user.
One of the earliest proposed applications of digital signatures was to
facilitate the verification of nuclear test ban treaties. The United States and
the former Soviet Union permitted each other to put seismometers
on the other’s soil to monitor nuclear tests. The problem was that each country
needed to assure itself that the host nation was not tampering with the data
from the monitoring nation’s seismometers. Simultaneously, the host nation
needed to assure itself that the monitor was sending only the specific
information needed for monitoring. Conventional authentication techniques can
solve the first problem, but only digital signatures can solve both problems. The
host nation can read but not alter the data from the seismometer, and the
monitoring nation knows that the data has not been tampered with.
2.1.3 Attacks to the hash functions
There are two brute-force attacks on a hash function. In a brute-force attack,
random inputs are tried and the results of the computations are stored until a
match is found. The first attack can be described as follows: given the
hash of a specific message, an adversary tries to find another message
which has the same hash value. The second attack can be
described as follows: an adversary tries to find any two messages that
have the same hash value. This attack is easier than the first one and is known as
the birthday attack.
The birthday attack gets its name from the birthday paradox, which is a
well-known statistical problem. The answer to the question of how many people
there must be in a room for the chance that at least one of them shares your
birthday to exceed ½ is 253. Surprisingly, the answer to the question of how many
people there must be in a room for the chance that at least two of them share the
same birthday to exceed ½ is only 23. Now assume that there is a hash function
with an n-bit output. Finding a message having a particular hash value requires
about $2^n$ hash calculations. On the other hand, finding two messages having the
same hash value requires only about $2^{n/2}$ hash calculations. For instance, a
machine which can compute the hash values of one million messages per second would
take about 600,000 years to find a second message that has a given 64-bit hash
value, whereas the same machine can find two messages having the same hash value
in about an hour.
This means that in order to avoid a birthday attack, one should choose a
hash value twice as long as the length otherwise needed.
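The figures in this paragraph can be checked numerically. The sketch below assumes 365 equally likely birthdays and the hashing rate of one million messages per second used in the example; the function names are illustrative.

```python
def people_to_match_your_birthday(days=365):
    """Smallest group size with > 1/2 chance that someone shares a fixed birthday."""
    n, p_none = 0, 1.0
    while 1.0 - p_none <= 0.5:
        n += 1
        p_none *= (days - 1) / days      # one more person who misses your date
    return n


def people_for_any_shared_birthday(days=365):
    """Smallest group size with > 1/2 chance that any two people share a birthday."""
    n, p_distinct = 0, 1.0
    while 1.0 - p_distinct <= 0.5:
        p_distinct *= (days - n) / days  # next person avoids all n earlier dates
        n += 1
    return n


RATE = 1_000_000                                     # hashes per second
years_second_preimage = 2**64 / RATE / (365 * 24 * 3600)
hours_collision = 2**32 / RATE / 3600

print(people_to_match_your_birthday())    # 253
print(people_for_any_shared_birthday())   # 23
print(round(years_second_preimage), "years;", round(hours_collision, 1), "hours")
```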
2.1.4 Known hash functions
Several hash functions have been developed up to now, and among these
hash functions MD5, SHA-1 and SHA-256 are the most popular. A summary of the
standard hash functions is given in Table 2-1.
Table 2-1 Summary of Standard Hash Functions
Algorithm    Output (bits)  Block size (bits)  Word size (bits)  Rounds x Steps       Year of the standard
MD4          128            512                32                16x3                 1990
MD5          128            512                32                16x4                 1991
RIPEMD       128            512                32                16x3 (x2 parallel)   1992
RIPEMD-128   128            512                32                16x4 (x2 parallel)   1996
RIPEMD-160   160            512                32                16x5 (x2 parallel)   1996
SHA-0        160            512                32                80                   1993
SHA-1        160            512                32                80                   1995
SHA-256      256            512                32                64                   2002
SHA-224      224            512                32                64                   2004
SHA-384      384            1024               64                80                   2002
SHA-512      512            1024               64                80                   2002
MD4, proposed by Ron Rivest in 1990, was designed using 32-bit
operations for high-speed software implementations on 32-bit processors. MD
stands for message digest and the numeral refers to the function being the
fourth design from the same hash function family. However, a collision problem
was found, and in 1991 MD4 was revised to MD5 by adding countermeasures
such as increasing the number of compression rounds from three to four.
The compression function of MD5 operates on 512-bit blocks, and each 512-bit
block is further divided into sixteen 32-bit sub-blocks. The word size is 32 bits.
There are four 32-bit chaining variables and the output size is 128 bits. One
important parameter for compression functions is the number of rounds, that is,
the number of sequential updates of the chaining variables. The compression
function of MD5 has 64 rounds. MD5 is one of the most popular hash functions for
many applications such as IPSec. However, it was pointed out that collisions can be
generated using the compression function of MD5 and that its 128-bit hash value is
not long enough to stop birthday attacks. It was estimated that two messages that
have the same hash value could be found within 24 days by developing
dedicated hardware at a cost of 10 million dollars. Considering that the processing
power of computers is improving 10-fold every 5 years, MD5 is no longer
secure against the birthday attack, and it is not recommended for future use.
RIPEMD is a 128-bit hash function developed by the RIPE (RACE
Integrity Primitives Evaluation) project in 1992 to address the attack on MD4.
However, collisions for the first two and the last two out of three rounds were
found. In addition, as described above, a 128-bit hash value is no longer secure
enough, and thus RIPEMD was improved to the 160-bit hash function
RIPEMD-160 in 1996, which has a five-round compression function. At the same
time, a 128-bit hash function, RIPEMD-128, which has a four-round compression
function, was proposed to replace RIPEMD.
NIST (National Institute of Standards and Technology) standardized a
160-bit hash function SHA (Secure Hash Algorithm) for use with the digital
signature standard DSS (Digital Signature Standard) in 1993. Soon after that, a
way was found to cause collisions in the compression function by analyzing the
message expansion function, which consisted of only XOR (exclusive OR)
operations. To fix this weakness, SHA was modified to SHA-1 by adding a
one-bit rotation to the message expansion function. A 160-bit hash function has
a security level on the order of 80 bits, so SHA-1 is designed to match the
security level of the block cipher Skipjack, which uses an 80-bit secret key. SHA-1
is modelled on MD5 in several respects: it operates on 512-bit blocks and has five
32-bit chaining variables, whereas MD5 has just four. The output length is 160
bits. Although the round functions are less varied and simpler than those of MD5,
SHA-1 has more rounds: 80 instead of 64. SHA-1 also uses a more complex procedure
for deriving 32-bit sub-blocks from the 512-bit message block.
In 2001 NIST standardized the new block cipher AES (Advanced
Encryption Standard) to replace the DES (Data Encryption Standard) that had
been used for more than 20 years. AES supports three key lengths, 128, 192 and
256 bits, whose security levels are higher than that of SHA-1. In order to match
these security levels, NIST developed three new hash functions, SHA-256, SHA-384
and SHA-512, whose hash value sizes are 256, 384 and 512 bits,
respectively. SHA-256 and SHA-512 have similar designs, with SHA-256
operating on 32-bit words and SHA-512 operating on 64-bit words. Both
designs bear a strong resemblance to SHA-1, although they are much closer to
each other than to their common predecessor. SHA-384 is a trivial modification
of SHA-512 which consists of trimming the output to 384 bits and changing the
initial value of the chaining variables. These hash functions are standardized with
SHA-1 as SHS (Secure Hash Standard), and a 224-bit hash function, SHA-224,
based on SHA-256, was added to SHS in 2004. SHA-224 is a truncated version
of SHA-256 with a different initial value. The most important difference
between the three new functions and SHA-1 is the procedure for deriving the
sub-blocks from one block of message. Recently, collisions for MD4, MD5,
RIPEMD and SHA have been reported and a possibility for breaking SHA-1 has
been suggested. Therefore, the migration to more secure hash functions should
be accelerated.
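The one-bit rotation that separates SHA-1 from the original SHA can be made concrete. The sketch below assumes the message expansion recurrence from the Secure Hash Standard, $W_t = W_{t-3} \oplus W_{t-8} \oplus W_{t-14} \oplus W_{t-16}$ for t = 16..79, with a left rotation by one bit applied in SHA-1 only; the function names are illustrative.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, n):
    """Rotate a 32-bit word left by n bits."""
    return ((x << n) | (x >> (32 - n))) & MASK32


def expand(block_words, rotate):
    """Expand 16 message words to 80; rotate=False gives SHA-0, True gives SHA-1."""
    w = list(block_words)
    for t in range(16, 80):
        x = w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16]
        w.append(rotl32(x, 1) if rotate else x)
    return w


block = [0x80000000] + [0] * 15
print(hex(expand(block, rotate=False)[16]))   # 0x80000000 (pure XOR)
print(hex(expand(block, rotate=True)[16]))    # 0x1 (XOR followed by rotation)
```

Without the rotation, each bit position of the expanded words depends only on the same bit position of the input words, which is the structure the collision analysis exploited.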
In this study, the SHA-1 and SHA-256 hash functions are chosen to be
implemented as a starting point. The reason for this selection is that SHA-1 is
one of the most commonly used hash functions, and SHA-256 was developed after
SHA-1 and offers an increased security level. As described above, both of these
functions operate on 512-bit message blocks and both use a word size of 32
bits. Although they are similar in general, they differ in the number of chaining
variables, the output size, the generation of 32-bit sub-blocks from 512-bit
message blocks and the number of rounds.
2.2 Hash processor implementation
In this study SHA-1 and SHA-256 hash functions are implemented in a
general processor structure. The design is fully described and captured using a
hardware description language named VHDL and implemented on a Xilinx FPGA. The
aim is to follow all the steps in a digital hardware design flow and implement the hash
functions in a processor structure rather than in classical form. The first step in a
digital hardware design process is to determine the design methodology that will be
followed in order to satisfy the given specifications. Thus, as a
first step, the concept of processor design on FPGA is examined and the design
modules that are going to be implemented are determined. There are generally two
types of processors: general purpose processors and dedicated processors. General
purpose processors such as the Pentium CPU can perform different tasks under the
control of software instructions. General purpose processors are used in all
personal computers.
Dedicated processors, on the other hand, are designed to perform one specific
task. Dedicated processors are usually much smaller and less complex than
general purpose processors. They are used in every smart electronic device,
such as TVs, cell phones and microwave ovens. The designed hash processor can be
considered as a general purpose processor. The logic circuit of a processor can be
divided into two parts: the data path and the control unit. The data path is responsible
for the actual execution of all data operations performed by the processor, such as the
addition of two numbers. Even though the data path is capable of performing all the
data operations of the processor, it cannot do so on its own. In order for the
data path to execute the operations automatically, the control unit is required. The
control unit is a finite state machine (FSM): it operates by moving from one state
to another, and there are only a finite number of states it can be in.
A simple block diagram of a processor is shown below in Figure 2-10.
Figure 2-10 Block Diagram of a Processor
The data path usually contains an arithmetic logic unit (ALU) and registers for
temporary storage of the data. Additionally, a program memory to hold the
instructions that are going to be run is a very important part of a processor.
Consequently, it is decided that the hash processor will contain a control unit, a
program memory and a data path. The internal structure of the data path, control unit
and program memory is determined according to the properties of the hash functions
SHA-1 and SHA-256.
CHAPTER-3
SECURE HASH ALGORITHM-256
3.1 Hash computational flow
Every hash computation process consists of two stages. The first stage is
the pre-processing stage. In this stage the message is padded, parsed into n
blocks and the chaining variables are initialized. In the second stage, hash
calculation is done. In the hash calculation stage, constants, functions and word
operations specific to the hash function are used. Hash calculation generates a
message schedule from the padded message and uses that schedule, along with
functions, constants and word operations to iteratively generate a series of hash
values. The final hash value generated by the hash computation is used to
generate the message digest. This scenario is illustrated below in Figure 3-1.
Figure 3-1: General Hash Computational Flow
SHA-256 is one of the most popular hash functions. Its general properties are
summarized in Table 3-1. The message block size is 512 bits and the message digest
size is 256 bits. The calculation of the message digest for one message block is
completed in 64 rounds, using 8 hash variables of 32 bits each. The word size of all
the calculations is 32 bits. The padded message is processed in 512-bit blocks. Each
512-bit block is composed of 16 message words. These 16 message words are expanded
by means of functions into 64 words, and in each of the 64 rounds a new message
word is used.
Table 3-1 SHA-256 Summary
Property                   SHA-256
Message size               < $2^{64}$ bits
Block size                 512 bits
Word size                  32 bits
Transformation rounds      64
Message digest             256 bits
Security                   128 bits
# of chaining variables    8
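3.1.1 SHA-256 functions
SHA-256 uses six different logical functions. These functions operate on 32-bit
words: Ch and Maj take three words as parameters, and the remaining four take a
single word. Here $\wedge$, $\neg$ and $\oplus$ denote bitwise AND, complement and
XOR, $\mathrm{ROTR}^{n}$ denotes right rotation by n bits and $\mathrm{SHR}^{n}$
denotes right shift by n bits. These functions are:
1) $\mathrm{Ch}(x, y, z) = (x \wedge y) \oplus (\neg x \wedge z)$
The architecture of this function is illustrated in Figure 3-2.
Figure 3-2 Ch Function Architecture
2) $\mathrm{Maj}(x, y, z) = (x \wedge y) \oplus (x \wedge z) \oplus (y \wedge z)$
The architecture of this function is illustrated in Figure 3-3.
Figure 3-3 Maj Function Architecture
3) $\Sigma_0^{\{256\}}(x) = \mathrm{ROTR}^{2}(x) \oplus \mathrm{ROTR}^{13}(x) \oplus \mathrm{ROTR}^{22}(x)$
The architecture of this function is shown below in Figure 3-4.
Figure 3-4 $\Sigma_0^{\{256\}}$ Function Architecture
4) $\Sigma_1^{\{256\}}(x) = \mathrm{ROTR}^{6}(x) \oplus \mathrm{ROTR}^{11}(x) \oplus \mathrm{ROTR}^{25}(x)$
The architecture of this function is shown in Figure 3-5.
Figure 3-5 $\Sigma_1^{\{256\}}$ Function Architecture
5) $\sigma_0^{\{256\}}(x) = \mathrm{ROTR}^{7}(x) \oplus \mathrm{ROTR}^{18}(x) \oplus \mathrm{SHR}^{3}(x)$
The architecture of this function is shown in Figure 3-6.
Figure 3-6 $\sigma_0^{\{256\}}(x)$ Function Architecture
6) $\sigma_1^{\{256\}}(x) = \mathrm{ROTR}^{17}(x) \oplus \mathrm{ROTR}^{19}(x) \oplus \mathrm{SHR}^{10}(x)$
The architecture of this function is shown in Figure 3-7.
Figure 3-7 $\sigma_1^{\{256\}}(x)$ Function Architecture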
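The six functions translate directly into word operations. The following is a software sketch for reference, not the hardware architecture of the figures; the Python function names are illustrative, and every result is kept within 32 bits by masking.

```python
MASK32 = 0xFFFFFFFF


def rotr(x, n):
    """ROTR^n: rotate a 32-bit word right by n bits."""
    return ((x >> n) | (x << (32 - n))) & MASK32


def shr(x, n):
    """SHR^n: shift a 32-bit word right by n bits."""
    return x >> n


def ch(x, y, z):
    return (x & y) ^ (~x & MASK32 & z)


def maj(x, y, z):
    return (x & y) ^ (x & z) ^ (y & z)


def big_sigma0(x):
    return rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22)


def big_sigma1(x):
    return rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25)


def small_sigma0(x):
    return rotr(x, 7) ^ rotr(x, 18) ^ shr(x, 3)


def small_sigma1(x):
    return rotr(x, 17) ^ rotr(x, 19) ^ shr(x, 10)
```

Ch selects each bit from y or z depending on the corresponding bit of x, and Maj returns the majority value of the three input bits.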
3.1.3 SHA-256 constants
SHA-256 uses a sequence of sixty-four constant 32-bit words,
$K_0^{\{256\}}, K_1^{\{256\}}, \ldots, K_{63}^{\{256\}}$. These words represent the
first thirty-two bits of the fractional parts of the cube
roots of the first sixty-four prime numbers. In hex, these constant words are (from
left to right):
428a2f98 71374491 b5c0fbcf e9b5dba5 3956c25b 59f111f1 923f82a4 ab1c5ed5
d807aa98 12835b01 243185be 550c7dc3 72be5d74 80deb1fe 9bdc06a7 c19bf174
e49b69c1 efbe4786 0fc19dc6 240ca1cc 2de92c6f 4a7484aa 5cb0a9dc 76f988da
983e5152 a831c66d b00327c8 bf597fc7 c6e00bf3 d5a79147 06ca6351 14292967
27b70a85 2e1b2138 4d2c6dfc 53380d13 650a7354 766a0abb 81c2c92e 92722c85
a2bfe8a1 a81a664b c24b8b70 c76c51a3 d192e819 d6990624 f40e3585 106aa070
19a4c116 1e376c08 2748774c 34b0bcb5 391c0cb3 4ed8aa4a 5b9cca4f 682e6ff3
748f82ee 78a5636f 84c87814 8cc70208 90befffa a4506ceb bef9a3f7 c67178f2
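The stated origin of the constants can be verified with exact integer arithmetic. The sketch below relies on the identity that the integer part of $\sqrt[3]{p} \cdot 2^{32}$ equals the integer cube root of $p \cdot 2^{96}$, whose low 32 bits are the first 32 fractional bits of $\sqrt[3]{p}$. The helper names are illustrative.

```python
def first_primes(count):
    """Return the first `count` prime numbers by trial division."""
    primes = []
    n = 2
    while len(primes) < count:
        if all(n % p for p in primes if p * p <= n):
            primes.append(n)
        n += 1
    return primes


def integer_cube_root(n):
    """Largest integer r with r**3 <= n, found by binary search."""
    lo, hi = 0, 1
    while hi ** 3 <= n:
        hi *= 2
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid
    return lo


def sha256_constants():
    """First 32 fractional bits of the cube roots of the first 64 primes."""
    return [integer_cube_root(p << 96) & 0xFFFFFFFF for p in first_primes(64)]


K = sha256_constants()
print(" ".join(f"{k:08x}" for k in K[:8]))   # first row of the table above
```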
3.1.4 SHA-256 computation flow
SHA-256 computation is composed of two stages: a pre-processing stage and a hash
calculation stage. In the pre-processing stage, the message is padded and divided into
512-bit blocks of sixteen 32-bit words each, and the message schedule is prepared.
Message Padding: Suppose that the length of the message, M, is l bits.
Append the bit “1” to the end of the message, followed by k zero bits,
where k is the smallest non-negative solution to the equation
$l + 1 + k \equiv 448 \pmod{512}$. Then append the 64-bit block that is equal to the number l expressed using a binary representation. For example, the (8-bit ASCII) message
“abc” has length $8 \times 3 = 24$ bits, so the message is padded with a one bit, then
$448 - (24 + 1) = 423$ zero bits, and then the message length, to become the 512-bit padded
message. This is illustrated below in Figure 3-8.
Figure 3-8 Message Padding
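The number of zero bits k in the padding rule can be computed for any message length. The following is a minimal sketch; the function name pad_zeros and the test lengths other than 24 are illustrative assumptions.

```verilog
// Illustrative sketch: number of zero padding bits for a message of l bits.
module sha256_pad_len_tb;

  // Smallest non-negative k with l + 1 + k = 448 (mod 512).
  function integer pad_zeros;
    input integer l;
    pad_zeros = (448 - ((l + 1) % 512) + 512) % 512;
  endfunction

  initial begin
    $display("l = 24  -> k = %0d", pad_zeros(24));   // 423, the "abc" case
    $display("l = 447 -> k = %0d", pad_zeros(447));  // 0, block exactly filled
    $display("l = 448 -> k = %0d", pad_zeros(448));  // 511, spills into a second block
    $finish;
  end
endmodule
```

The last case shows why a message of 448 bits or more in its final block needs an additional 512-bit block: the one bit and the 64-bit length no longer fit.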
Setting the initial value: The 256-bit initial hash value $H^{(0)}$ is composed of eight 32-bit
words, as shown in Table 3-2.
Table 3-2 Initial Hash Value for SHA-256
$H_0^{(0)}$ | $H_1^{(0)}$ | $H_2^{(0)}$ | $H_3^{(0)}$
6A09E667 | BB67AE85 | 3C6EF372 | A54FF53A
$H_4^{(0)}$ | $H_5^{(0)}$ | $H_6^{(0)}$ | $H_7^{(0)}$
510E527F | 9B05688C | 1F83D9AB | 5BE0CD19
Hash Calculation: SHA-256 may be used to hash a message, M, having a length of l
bits, where $0 \le l < 2^{64}$. The algorithm uses:
1. A message schedule of sixty-four 32-bit words. The words of the message schedule are
labelled $W_0, W_1, \ldots, W_{63}$.
2. Eight working variables of 32 bits each. The working variables are labelled
A, B, C, D, E, F, G, H.
3. A hash value of eight 32-bit words. The words of the hash value are labelled
$H_0^{(i)}, H_1^{(i)}, \ldots, H_7^{(i)}$. They hold the initial hash
value $H^{(0)}$, which is replaced by each intermediate hash value $H^{(i)}$ (after each message block is
processed), where i denotes the number of the 512-bit block being processed in the
message M, and end with the final hash value $H^{(N)}$, where N is the number of the
last 512-bit block in the message M.
4. Two temporary words, $T_1$ and $T_2$.
5. The previously defined constants, labelled $K_t$, where t is the round number.
The calculation is carried out as follows:
The message schedule is prepared, i.e. the message word that is going to be used in each
round is computed, as described in the following formula:
$$W_t = \begin{cases} M_t^{(i)} & 0 \le t \le 15 \\ \sigma_1^{\{256\}}(W_{t-2}) + W_{t-7} + \sigma_0^{\{256\}}(W_{t-15}) + W_{t-16} & 16 \le t \le 63 \end{cases}$$
In the above formula, $M_t^{(i)}$ denotes the t-th 32-bit message word of the i-th 512-bit message
block in the message M, and all additions are performed modulo $2^{32}$. The eight working variables A, B, C, D, E, F, G, and H that are
going to be used in the computation are initialized as follows:
$A = H_0^{(i-1)}$
$B = H_1^{(i-1)}$
$C = H_2^{(i-1)}$
$D = H_3^{(i-1)}$
$E = H_4^{(i-1)}$
$F = H_5^{(i-1)}$
$G = H_6^{(i-1)}$
$H = H_7^{(i-1)}$
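The message schedule recurrence can be exercised on the padded "abc" block, whose only non-zero words are W0 = 61626380 and W15 = 00000018. The following is a simulation-only sketch; the module and function names are hypothetical, and the memory-style array is an assumption made for brevity rather than the register structure used in the appendix.

```verilog
// Illustrative sketch: expand 16 message words into the 64-word schedule.
module sha256_schedule_tb;
  reg [31:0] W [0:63];
  integer t;

  function [31:0] rotr;
    input [31:0] x;
    input [4:0]  n;
    rotr = (x >> n) | (x << (32 - n));
  endfunction

  function [31:0] small_sigma0;
    input [31:0] x;
    small_sigma0 = rotr(x, 5'd7) ^ rotr(x, 5'd18) ^ (x >> 3);
  endfunction

  function [31:0] small_sigma1;
    input [31:0] x;
    small_sigma1 = rotr(x, 5'd17) ^ rotr(x, 5'd19) ^ (x >> 10);
  endfunction

  initial begin
    // Words 0..15 come straight from the padded message block.
    for (t = 0; t < 16; t = t + 1) W[t] = 32'h00000000;
    W[0]  = 32'h61626380;   // "abc" followed by the appended 1 bit
    W[15] = 32'h00000018;   // message length, 24 bits
    // Words 16..63 follow the recurrence; 32-bit sums wrap modulo 2^32.
    for (t = 16; t < 64; t = t + 1)
      W[t] = small_sigma1(W[t-2]) + W[t-7] + small_sigma0(W[t-15]) + W[t-16];
    $display("W16 = %h", W[16]);  // 61626380 (all other terms are zero)
    $display("W17 = %h", W[17]);  // 000f0000 (only sigma1(W15) contributes)
    $finish;
  end
endmodule
```

In hardware, only the sixteen most recent words are needed at any time, so a 16-word shift register is a common alternative to storing all 64 words.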
After these initializations, the working variables are updated in each round t
($0 \le t \le 63$) as described below:
$T_1 = H + \Sigma_1^{\{256\}}(E) + \mathrm{Ch}(E, F, G) + K_t^{\{256\}} + W_t$
$T_2 = \Sigma_0^{\{256\}}(A) + \mathrm{Maj}(A, B, C)$
$H = G$
$G = F$
$F = E$
$E = D + T_1$
$D = C$
$C = B$
$B = A$
$A = T_1 + T_2$
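A single round of this update can be checked against the "abc" example of Chapter 4. The sketch below evaluates round t = 0 with K0 = 428a2f98 and W0 = 61626380, starting from the initial hash value; the module and function names are hypothetical.

```verilog
// Illustrative sketch: one SHA-256 round (t = 0) for the "abc" block.
module sha256_round0_tb;
  reg [31:0] A, B, C, D, E, F, G, H, T1, T2;

  function [31:0] rotr;
    input [31:0] x;
    input [4:0]  n;
    rotr = (x >> n) | (x << (32 - n));
  endfunction

  function [31:0] ch;
    input [31:0] x, y, z;
    ch = (x & y) ^ (~x & z);
  endfunction

  function [31:0] maj;
    input [31:0] x, y, z;
    maj = (x & y) ^ (x & z) ^ (y & z);
  endfunction

  function [31:0] big_sigma0;
    input [31:0] x;
    big_sigma0 = rotr(x, 5'd2) ^ rotr(x, 5'd13) ^ rotr(x, 5'd22);
  endfunction

  function [31:0] big_sigma1;
    input [31:0] x;
    big_sigma1 = rotr(x, 5'd6) ^ rotr(x, 5'd11) ^ rotr(x, 5'd25);
  endfunction

  initial begin
    // Working variables start from the initial hash value H(0).
    A = 32'h6a09e667; B = 32'hbb67ae85; C = 32'h3c6ef372; D = 32'ha54ff53a;
    E = 32'h510e527f; F = 32'h9b05688c; G = 32'h1f83d9ab; H = 32'h5be0cd19;
    // K0 = 428a2f98, W0 = 61626380; 32-bit sums wrap modulo 2^32.
    T1 = H + big_sigma1(E) + ch(E, F, G) + 32'h428a2f98 + 32'h61626380;
    T2 = big_sigma0(A) + maj(A, B, C);
    // Blocking assignments in this order use the old value of each source.
    H = G; G = F; F = E; E = D + T1;
    D = C; C = B; B = A; A = T1 + T2;
    // Expected per the t = 0 row of the example: a = 5d6aebcd, e = fa2a4622
    $display("a = %h  e = %h", A, E);
    $finish;
  end
endmodule
```

In a clocked design the same update would normally be written with non-blocking assignments, so that all eight registers change simultaneously on the clock edge.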
As the final step, after all 64 rounds, the intermediate hash value is calculated as described below:
$H_0^{(i)} = A + H_0^{(i-1)}$
$H_1^{(i)} = B + H_1^{(i-1)}$
$H_2^{(i)} = C + H_2^{(i-1)}$
$H_3^{(i)} = D + H_3^{(i-1)}$
$H_4^{(i)} = E + H_4^{(i-1)}$
$H_5^{(i)} = F + H_5^{(i-1)}$
$H_6^{(i)} = G + H_6^{(i-1)}$
$H_7^{(i)} = H + H_7^{(i-1)}$
This yields the hash value of the incoming 512-bit message block.
The basic SHA-256 computation flow described above is shown below in Figure 3-9:
Figure 3-9 SHA-256 Computational Flow
3.2 Different hash implementations
Hash functions can be implemented either in hardware or in software.
Implementing hash functions completely in software is easier than implementing them
in hardware. However, as data rates increase and security protocols become more
complex, software implementations of hash functions cannot satisfy the
speed requirements of applications such as embedded systems, network routers and
online databases. Security of the implementation is another important issue: the
system itself should remain secure even under attack.
Software implementations of hash functions cannot provide that degree of security,
since access and modification are easier. When all these aspects are considered, it is
desirable to implement hash functions in hardware in order to satisfy the
speed requirements of the system and at the same time provide security. Hardware
implementations of hash functions are more secure, since access and modification are
harder. Additionally, power consumption is lower and throughput is higher. Hardware
implementations of hash functions can be divided into two groups: classical
implementations and reconfigurable (reprogrammable) implementations. Classical
implementations are completely custom designs on Application Specific Integrated
Circuits (ASICs), and reconfigurable implementations are on FPGAs. In terms of
performance, ASICs are the best, FPGAs are
close to ASICs and software implementations are the slowest. On the other hand,
software has the lowest development cost, FPGAs cost somewhat
more and ASICs are the most expensive to develop. In terms
of flexibility, ASICs are the worst, software implementations are
the most flexible and FPGAs are close to software implementations since
they are reconfigurable structures. On these criteria,
FPGA implementations combine advantages of both hardware and software.
Implementing hash functions on reconfigurable platforms such as FPGAs brings
some advantages. These advantages can be listed as follows:
- Ease of algorithm modification: modifications can be made easily due to
the reconfigurable nature of FPGAs.
- Architecture efficiency.
- Resource efficiency: FPGA implementations of hash functions require fewer
resources in the development phase.
- Cost efficiency: FPGA implementations are cost effective since they have
a shorter design lead time.
- High throughput: FPGA implementations work at high speeds and so exhibit high
throughput.
Hash function implementation in hardware is a very active research area and
various implementations exist in the literature. These implementations differ from
each other in specifications such as area, speed, throughput, design complexity
and power consumption. Although the implementations of the complex arithmetic
and logic functions differ, the main hardware
blocks in each design are similar. The general block diagram of a hash function
implementation is illustrated in Figure 3-10:
Figure 3-10 General Block Diagram for a Hash Function Implementation
In general, the message is input to the hardware as 32-bit message words. The
padding unit counts the incoming message words and performs the computations
required to pad the message and prepare the message blocks. The prepared message
block is usually stored in a RAM block, whose size depends on
the algorithm implemented. The constants required by the algorithm are kept in
an array, which is usually implemented as a ROM. The complex arithmetic
and logic computations required by the algorithm to prepare the message schedule and
hash the incoming message are carried out in the hash calculation block. The control
block provides the necessary control signals for the padding unit, message RAM, hash
calculation block and constants array.
3.2.1 Commercial hash function implementations
There are various hash function implementations on the market, which differ from
each other in their capabilities. Some commercial hash function
implementations are listed in Table 3-3.
CAST Inc. has two hash function cores on the market: a SHA-1 core and a
SHA-256 core. Both cores calculate the digest of messages of
any length smaller than $2^{64} - 1$ bits, and message lengths must be a multiple of 8 bits.
Bit padding is provided with both cores. The SHA-1 calculation is completed
in 82 clock cycles and the SHA-256 calculation is completed in 66 clock cycles.
Both cores are implemented and tested on various FPGA families and the results are
provided in the product datasheets. The SHA-256 and SHA-1 implementations are
available as soft cores (synthesizable HDL) for ASIC technologies and as firm cores
(netlist) for FPGA technologies, and include everything required for successful
implementation. The functional description of the cores is as follows:
Both cores accept the input message as 32-bit words. When a block of 512 bits
is complete, the input stream is paused and the hash calculation is carried out. When
processing of the 512-bit block is completed, the core permits input data to be fed
again. On the final message block, when the last 32-bit word is input, the core must be
told that this is the last message word, and the number of valid bytes in the last
message word must be supplied so that the padding unit knows how many bytes to pad.
Helion Technology Limited has two hash function cores on the market, named the Helion Tiny
Hashing core and the Helion Fast Hashing core. The first core supports SHA-1,
SHA-224, SHA-256 and MD5, with or without HMAC hashing. The user can select one of
these hash functions using the corresponding input on the core. The core is available with
an 8-, 16- or 32-bit data interface. Input message words are stored in a 512-bit block
RAM in the core. After a 512-bit data block has been loaded, it is processed according to the
selected algorithm by the micro-coded controller. This controller executes a sequence
of instructions which perform a series of computations on the data block using a
specially designed Arithmetic Logic Unit (ALU). The core is implemented on various
Xilinx FPGA families and the implementation results are provided in the product
datasheet. The Helion Fast Hashing core has five modes of operation: SHA-1
hashing, SHA-256 hashing, MD5 hashing, dual mode (SHA-1 and SHA-256) and
dual mode (SHA-1 and MD5). In all of these modes, the message is input to the core
as 32-bit words. Once a 512-bit message block has been loaded, hash calculation
begins. In the hash calculation process, a sequence of complex arithmetic and logic
functions is applied to the message words over a number of iterations. In each
iteration, intermediate results of the chaining variables are stored, and at the end of
each block these are used to compute the running digest. The core is
implemented on various Xilinx FPGA families and the implementation results are
provided in the product datasheet.
Aldec Inc. has a SHA-1 IP core on the market. The core supports only SHA-1
hashing. 512-bit message blocks are processed in 81 clock cycles. Data is input to the
core as 32-bit message words. VHDL/Verilog source code, technology-dependent
EDIF and VHDL/Verilog netlists and a software emulator of the SHA core are delivered to
the user.
Ocean Logic Pty. Ltd. has SHA-1 and SHA-256 hash function cores on the
market. In both cores, the message is input as 32-bit words. The SHA-1
calculation is completed in 81 clock cycles and the SHA-256 calculation is completed in
65 clock cycles. The cores are implemented on various Xilinx FPGAs and also
as ASICs. The results of these implementations are given below in Table 3-3.
Sci-worx has a SHA-1 function core on the market. The core supports only
SHA-1 hashing. 512-bit message blocks are processed in 81 clock cycles. Data is
input to the core as 32-bit message words. VHDL/Verilog source code is delivered
to the user.
Table 3-3 Commercial Hash Function Cores
Vendor | Supported hash function | Supported platforms | Throughput | Year
CAST Inc. | SHA-1 | ASIC/FPGA | 6.24 Mbps/MHz | Oct-2007
CAST Inc. | SHA-256 | ASIC/FPGA | 7.75 Mbps/MHz | Oct-2007
HDL Design House | SHA-1 | ASIC/FPGA/SoC | 6.4 Mbps/MHz | Dec-2002
Helion Technology Limited | SHA-1 only; SHA-256 only; MD5 only; dual mode (selectable SHA-1 and SHA-256); dual mode (selectable SHA-1 and MD5) | FPGA | SHA-1: 6.24 Mbps/MHz; SHA-256: 7.75 Mbps/MHz; MD5: 7.75 Mbps/MHz | July-2005
Helion Technology Limited | MD5, SHA-1, SHA-224 and SHA-256 | FPGA | SHA-1: 0.201 Mbps/MHz; SHA-224: 0.16 Mbps/MHz; SHA-256: 0.16 Mbps/MHz; MD5: 0.31 Mbps/MHz | July-2005
Aldec, Inc. | SHA-1 | FPGA | - | 2006
Ocean Logic Pty. Ltd. | SHA-256 | FPGA/ASIC | 6.325 Mbps/MHz for ASIC 0.18 µm process; 6.32 Mbps/MHz for Xilinx Virtex-E -8; 6.96 Mbps/MHz for Xilinx Virtex-II -5 | 2005
Ocean Logic Pty. Ltd. | SHA-1 | FPGA/ASIC | 6.55 Mbps/MHz for ASIC 0.18 µm process; 5.5 Mbps/MHz for Xilinx Virtex-E -8; 6.96 Mbit/s for Xilinx Virtex-II -5 | 2005
Sci-worx | SHA-1 | FPGA | 6.24 Mbps/MHz |
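The Mbps/MHz figures for the CAST cores can be related to the cycle counts quoted in the text. The worked relation below is a sketch under the assumption that one 512-bit block is processed every N clock cycles and that input and output overhead is ignored; it is not taken from the vendor datasheets.

```latex
\[
\text{throughput per MHz} = \frac{512\ \text{bits}}{N_{\text{cycles}}},
\qquad
\frac{512}{82} \approx 6.24\ \text{Mbps/MHz (SHA-1)},
\qquad
\frac{512}{66} \approx 7.76\ \text{Mbps/MHz (SHA-256)}
\]
```

Under this assumption the SHA-1 figure matches the table entry of 6.24, and the SHA-256 figure agrees with the listed 7.75 up to rounding.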
CHAPTER-4
ALGORITHM FOR SHA-256
4.1 SHA-256 Example
4.1.1 One-Block Message
Let the message, M, be the 24-bit (l = 24) ASCII string "abc", which is equivalent to
the following binary string:
01100001 01100010 01100011.
The message is padded by appending a "1" bit, followed by 423 "0" bits, and ending
with the hex value 00000000 00000018 (the two 32-bit word representation of
the length, 24). Thus, the final padded message consists of one block (N = 1).
For SHA-256, the initial hash value H(0) is:
H0 (0) = 6A09E667
H1 (0) = BB67AE85
H2 (0) = 3C6EF372
H3 (0) = A54FF53A
H4 (0) = 510E527F
H5 (0) = 9B05688C
H6 (0) = 1F83D9AB
H7 (0) = 5BE0CD19
The words of the padded message block are then assigned to the words W0 … W15 of
the message schedule:
W0 = 61626380
W1 = 00000000
W2 = 00000000
W3 = 00000000
W4 = 00000000
W5 = 00000000
W6 = 00000000
W7 = 00000000
W8 = 00000000
W9 = 00000000
W10 = 00000000
W11 = 00000000
W12 = 00000000
W13 = 00000000
W14 = 00000000
W15 = 00000018.
The following schedule shows the hex values for a, b, c, d, e, f, g, and h after pass t of
the “for t = 0 to 63” loop.
a b c d e f g h
t = 0: 5d6aebcd 6a09e667 bb67ae85 3c6ef372 fa2a4622 510e527f 9b05688c 1f83d9ab
t = 1: 5a6ad9ad 5d6aebcd 6a09e667 bb67ae85 78ce7989 fa2a4622 510e527f 9b05688c
t = 2: c8c347a7 5a6ad9ad 5d6aebcd 6a09e667 f92939eb 78ce7989 fa2a4622 510e527f
t = 3: d550f666 c8c347a7 5a6ad9ad 5d6aebcd 24e00850 f92939eb 78ce7989 fa2a4622
t = 4: 04409a6a d550f666 c8c347a7 5a6ad9ad 43ada245 24e00850 f92939eb 78ce7989
t = 5: 2b4209f5 04409a6a d550f666 c8c347a7 714260ad 43ada245 24e00850 f92939eb
t = 6: e5030380 2b4209f5 04409a6a d550f666 9b27a401 714260ad 43ada245 24e00850
t = 7: 85a07b5f e5030380 2b4209f5 04409a6a 0c657a79 9b27a401 714260ad 43ada245
t = 8 : 8e04ecb9 85a07b5f e5030380 2b4209f5 32ca2d8c 0c657a79 9b27a401 714260ad
t = 9: 8c87346b 8e04ecb9 85a07b5f e5030380 1cc92596 32ca2d8c 0c657a79 9b27a401
t = 10: 4798a3f4 8c87346b 8e04ecb9 85a07b5f 436b23e8 1cc92596 32ca2d8c 0c657a79
t = 11: f71fc5a9 4798a3f4 8c87346b 8e04ecb9 816fd6e9 436b23e8 1cc92596 32ca2d8c
t = 12: 87912990 f71fc5a9 4798a3f4 8c87346b 1e578218 816fd6e9 436b23e8 1cc92596
t = 13: d932eb16 87912990 f71fc5a9 4798a3f4 745a48de 1e578218 816fd6e9 436b23e8
t = 14: c0645fde d932eb16 87912990 f71fc5a9 0b92f20c 745a48de 1e578218 816fd6e9
t = 15: b0fa238e c0645fde d932eb16 87912990 07590dcd 0b92f20c 745a48de 1e578218
t = 16: 21da9a9b b0fa238e c0645fde d932eb16 8034229c 07590dcd 0b92f20c 745a48de
t = 17: c2fbd9d1 21da9a9b b0fa238e c0645fde 846ee454 8034229c 07590dcd 0b92f20c
t = 18: fe777bbf c2fbd9d1 21da9a9b b0fa238e cc899961 846ee454 8034229c 07590dcd
t = 19: e1f20c33 fe777bbf c2fbd9d1 21da9a9b b0638179 cc899961 846ee454 8034229c
t = 20: 9dc68b63 e1f20c33 fe777bbf c2fbd9d1 8ada8930 b0638179 cc899961 846ee454
t = 21: c2606d6d 9dc68b63 e1f20c33 fe777bbf e1257970 8ada8930 b0638179 cc899961
t = 22: a7a3623f c2606d6d 9dc68b63 e1f20c33 49f5114a e1257970 8ada8930 b0638179
t = 23: c5d53d8d a7a3623f c2606d6d 9dc68b63 aa47c347 49f5114a e1257970 8ada8930
t = 24: 1c2c2838 c5d53d8d a7a3623f c2606d6d 2823ef91 aa47c347 49f5114a e1257970
t = 25: cde8037d 1c2c2838 c5d53d8d a7a3623f 14383d8e 2823ef91 aa47c347 49f5114a
t = 26: b62ec4bc cde8037d 1c2c2838 c5d53d8d c74c6516 14383d8e 2823ef91 aa47c347
t = 27: 77d37528 b62ec4bc cde8037d 1c2c2838 edffbff8 c74c6516 14383d8e 2823ef91
t = 28: 363482c9 77d37528 b62ec4bc cde8037d 6112a3b7 edffbff8 c74c6516 14383d8e
t = 29: a0060b30 363482c9 77d37528 b62ec4bc ade79437 6112a3b7 edffbff8 c74c6516
t = 30: ea992a22 a0060b30 363482c9 77d37528 0109ab3a ade79437 6112a3b7 edffbff8
t = 31: 73b33bf5 ea992a22 a0060b30 363482c9 ba591112 0109ab3a ade79437 6112a3b7
t = 32: 98e12507 73b33bf5 ea992a22 a0060b30 9cd9f5f6 ba591112 0109ab3a ade79437
t = 33: fe604df5 98e12507 73b33bf5 ea992a22 59249dd3 9cd9f5f6 ba591112 0109ab3a
t = 34: a9a7738c fe604df5 98e12507 73b33bf5 085f3833 59249dd3 9cd9f5f6 ba591112
t = 35: 65a0cfe4 a9a7738c fe604df5 98e12507 f4b002d6 085f3833 59249dd3 9cd9f5f6
t = 36: 41a65cb1 65a0cfe4 a9a7738c fe604df5 0772a26b f4b002d6 085f3833 59249dd3
t = 37: 34df1604 41a65cb1 65a0cfe4 a9a7738c a507a53d 0772a26b f4b002d6 085f3833
t = 38: 6dc57a8a 34df1604 41a65cb1 65a0cfe4 f0781bc8 a507a53d 0772a26b f4b002d6
t = 39: 79ea687a 6dc57a8a 34df1604 41a65cb1 1efbc0a0 f0781bc8 a507a53d 0772a26b
t = 40: d6670766 79ea687a 6dc57a8a 34df1604 26352d63 1efbc0a0 f0781bc8 a507a53d
t = 41: df46652f d6670766 79ea687a 6dc57a8a 838b2711 26352d63 1efbc0a0 f0781bc8
t = 42: 17aa0dfe df46652f d6670766 79ea687a decd4715 838b2711 26352d63 1efbc0a0
t = 43: 9d4baf93 17aa0dfe df46652f d6670766 fda24c2e decd4715 838b2711 26352d63
t = 44: 26628815 9d4baf93 17aa0dfe df46652f a80f11f0 fda24c2e decd4715 838b2711
t = 45: 72ab4b91 26628815 9d4baf93 17aa0dfe b7755da1 a80f11f0 fda24c2e decd4715
t = 46: a14c14b0 72ab4b91 26628815 9d4baf93 d57b94a9 b7755da1 a80f11f0 fda24c2e
t = 47: 4172328d a14c14b0 72ab4b91 26628815 fecf0bc6 d57b94a9 b7755da1 a80f11f0
t = 48: 05757ceb 4172328d a14c14b0 72ab4b91 bd714038 fecf0bc6 d57b94a9 b7755da1
t = 49: f11bfaa8 05757ceb 4172328d a14c14b0 6e5c390c bd714038 fecf0bc6 d57b94a9
t = 50: 7a0508a1 f11bfaa8 05757ceb 4172328d 52f1ccf7 6e5c390c bd714038 fecf0bc6
t = 51: 886e7a22 7a0508a1 f11bfaa8 05757ceb 49231c1e 52f1ccf7 6e5c390c bd714038
t = 52: 101fd28f 886e7a22 7a0508a1 f11bfaa8 529e7d00 49231c1e 52f1ccf7 6e5c390c
t = 53: f5702fdb 101fd28f 886e7a22 7a0508a1 9f4787c3 529e7d00 49231c1e 52f1ccf7
t = 54: 3ec45cdb f5702fdb 101fd28f 886e7a22 e50e1b4f 9f4787c3 529e7d00 49231c1e
t = 55: 38cc9913 3ec45cdb f5702fdb 101fd28f 54cb266b e50e1b4f 9f4787c3 529e7d00
t = 56: fcd1887b 38cc9913 3ec45cdb f5702fdb 9b5e906c 54cb266b e50e1b4f 9f4787c3
t = 57: c062d46f fcd1887b 38cc9913 3ec45cdb 7e44008e 9b5e906c 54cb266b e50e1b4f
t = 58: ffb70472 c062d46f fcd1887b 38cc9913 6d83bfc6 7e44008e 9b5e906c 54cb266b
t = 59: b6ae8fff ffb70472 c062d46f fcd1887b b21bad3d 6d83bfc6 7e44008e 9b5e906c
t = 60: b85e2ce9 b6ae8fff ffb70472 c062d46f 961f4894 b21bad3d 6d83bfc6 7e44008e
t = 61: 04d24d6c b85e2ce9 b6ae8fff ffb70472 948d25b6 961f4894 b21bad3d 6d83bfc6
t = 62: d39a2165 04d24d6c b85e2ce9 b6ae8fff fb121210 948d25b6 961f4894 b21bad3d
t = 63: 506e3058 d39a2165 04d24d6c b85e2ce9 5ef50f24 fb121210 948d25b6 961f4894
That completes the processing of the first and only message block, M (1). The
final hash value, H (1) is calculated to be
H0 (1) = 6a09e667 + 506e3058 = ba7816bf
H1 (1) = bb67ae85 + d39a2165 = 8f01cfea
H2 (1) = 3c6ef372 + 04d24d6c = 414140de
H3 (1) = a54ff53a + b85e2ce9 = 5dae2223
H4 (1) = 510e527f + 5ef50f24 = b00361a3
H5 (1) = 9b05688c + fb121210 = 96177a9c
H6 (1) = 1f83d9ab + 948d25b6 = b410ff61
H7 (1) = 5be0cd19 + 961f4894 = f20015ad.
The resulting 256-bit message digest is
ba7816bf 8f01cfea 414140de 5dae2223 b00361a3 96177a9c
b410ff61 f20015ad.
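The additions in this final step are performed modulo 2^32, which a 32-bit register provides automatically. The short sketch below reproduces three of the sums listed above; the module and register names are hypothetical.

```verilog
// Illustrative sketch: final hash additions wrap around at 32 bits.
module sha256_final_add_tb;
  reg [31:0] h0, h1, h7;

  initial begin
    h0 = 32'h6a09e667 + 32'h506e3058;  // no carry out of bit 31
    h1 = 32'hbb67ae85 + 32'hd39a2165;  // carry out of bit 31 is discarded
    h7 = 32'h5be0cd19 + 32'h961f4894;
    $display("%h %h %h", h0, h1, h7);  // ba7816bf 8f01cfea f20015ad
    $finish;
  end
endmodule
```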
CHAPTER-5
SIMULATION AND SYNTHESIS RESULTS
5.1 Introduction
This chapter presents the behavioural simulation results of the Secure Hash Algorithm
SHA-256 in the form of waveforms. The design was simulated and synthesised using
the Xilinx ISE tools. A test bench was written which applies the input plain
text, and the output gives the 256-bit message digest as encoded data.
5.2 Port description
This section gives the outline regarding the names of the ports which we used
in this project SHA-256
S No. | Mode | Size | Port description
1 | Input | < 2^64 bits | Plain text
2 | Input | 1 | Appending
3 | Output | [0:255] | Encrypted data (message digest)
Table 5-1 Port description
5.3 Functional description
The table above shows the ports used for the SHA-256 design. The input is the
plain text; in this example it is the string "abc", which
is 24 bits long ([0:23]), i.e. 01100001 01100010 01100011 in binary
(616263 in hexadecimal). The output is 256 bits wide.
5.4 Output waveform of SHA-256:
Figure 5-1 Output Waveform Of SHA-256
SYNTHESIS RESULT:
=============================================================
* Synthesis Options Summary *
=============================================================
---- Source Parameters
Input File Name : "sha256_final.prj"
Input Format : mixed
Ignore Synthesis Constraint File : NO
---- Target Parameters
Output File Name : "sha256_final"
Output Format : NGC
Target Device : xc3s100e-4-vq100
---- Source Options
Top Module Name : sha256_final
Automatic FSM Extraction : YES
FSM Encoding Algorithm : Auto
FSM Style : lut
RAM Extraction : Yes
RAM Style : Auto
ROM Extraction : Yes
ROM Style : Auto
Mux Extraction : YES
Decoder Extraction : YES
Priority Encoder Extraction : YES
Shift Register Extraction : YES
Logical Shifter Extraction : YES
XOR Collapsing : YES
Resource Sharing : YES
Multiplier Style : auto
Automatic Register Balancing : No
---- Target Options
Add IO Buffers : YES
Global Maximum Fanout : 500
Add Generic Clock Buffer(BUFG) : 8
Register Duplication : YES
Equivalent register Removal : YES
Slice Packing : YES
Pack IO Registers into IOBs : auto
---- General Options
Optimization Goal : Speed
Optimization Effort : 1
Keep Hierarchy : NO
Global Optimization : AllClockNets
RTL Output : Yes
Write Timing Constraints : NO
Hierarchy Separator : /
Bus Delimiter : <>
Case Specifier : maintain
Slice Utilization Ratio : 100
Slice Utilization Ratio Delta : 5
---- Other Options
lso : sha256_final.lso
Read Cores : YES
cross_clock_analysis : NO
verilog2001 : YES
safe_implementation : No
Optimize Instantiated Primitives : NO
use_clock_enable : Yes
use_sync_set : Yes
use_sync_reset : Yes
enable_auto_floorplanning : No
=============================================================
* HDL Compilation *
=============================================================
Compiling verilog file "chinnu.v"
Module <sha256_final> compiled
No errors in compilation
Analysis of file <"sha256_final.prj"> succeeded.
=============================================================
* HDL Analysis *
=============================================================
Analyzing top module <sha256_final>.
l = 24
p = 423
Module <sha256_final> is correct for synthesis.
Set property "resynthesize = true" for unit <sha256_final>.
=============================================================
* HDL Synthesis *
=============================================================
INFO:Xst:1304 - Contents of register <Wt> in unit <sha256_final> never changes during circuit operation. The register is replaced by logic.
INFO:Xst:1304 - Contents of register <busy> in unit <sha256_final> never changes during circuit operation. The register is replaced by logic.
INFO:Xst:1304 - Contents of register <W0> in unit <sha256_final> never changes during circuit operation. The register is replaced by logic.
Synthesizing Unit <sha256_final>.
Found 32-bit register for signal <A>.
Found 32-bit register for signal <B>.
Found 32-bit register for signal <C>.
Found 32-bit register for signal <D>.
Found 32-bit register for signal <E>.
Found 32-bit register for signal <F>.
Found 32-bit register for signal <G>.
Found 32-bit register for signal <H>.
Found 2-bit register for signal <round>.
Found 32-bit register for signal <Wt>.
Summary:
inferred 290 D-type flip-flop(s).
Unit <sha256_final> synthesized.
=============================================================
* Advanced HDL Synthesis *
=============================================================
Advanced RAM inference ...
Advanced multiplier inference ...
Advanced Registered AddSub inference ...
Dynamic shift register inference ...
* HDL Synthesis Report *
=============================================================
Macro Statistics
# Registers : 8
32-bit register :
=============================================================
* Low Level Synthesis *
=============================================================
Optimizing unit <sha256_final>...
Loading device for application Rf_Device from file '3s100e.nph' in environment C:/Xilinx.
Mapping all equations...
Building and optimizing final netlist ...
Found area constraint ratio of 100 (+ 5) on block sha256_final, actual ratio is 0.
FlipFlop H_0 has been replicated 135 time(s) to handle iob=true attribute.
=============================================================
* Final Report *
=============================================================
Final Results
RTL Top Level Output File Name : sha256_final.ngr
Top Level Output File Name : sha256_final
Output Format : NGC
Optimization Goal : Speed
Keep Hierarchy : NO
Design Statistics
# IOs : 258
Macro Statistics :
# Registers : 8
# 32-bit register : 8
Cell Usage :
# BELS : 2
# GND : 1
# VCC : 1
# FlipFlops/Latches : 136
# FDR : 136
# Clock Buffers : 1
# BUFGP : 1
# IO Buffers : 257
# IBUF : 1
# OBUF : 256
=============================================================
* Device utilization summary *
=============================================================
Selected Device : 3s100evq100-4
Number of Slices: 79 out of 960 8%
Number of Slice Flip Flops: 136 out of 1920 7%
Number of bonded IOBs: 258 out of 66 390% (*)
Number of GCLKs: 1 out of 24 4%
=============================================================
TIMING REPORT
=============================================================
Clock Information:
-----------------------------------+------------------------+-------+
Clock Signal | Clock buffer(FF name) | Load |
-----------------------------------+------------------------+-------+
clk_i | BUFGP | 136 |
-----------------------------------+------------------------+-------+
Timing Summary:
---------------
Speed Grade: -4
Minimum period: No path found
Minimum input arrival time before clock: 5.586ns
Maximum output required time after clock: 6.198ns
Maximum combinational path delay: No path found
Timing Detail:
All values displayed in nanoseconds (ns)
=============================================================
Timing constraint: Default OFFSET IN BEFORE for Clock 'clk_i'
Total number of paths / destination ports: 136 / 136
-------------------------------------------------------------------------
Offset: 5.586ns (Levels of Logic = 1)
Source: rst_i (PAD)
Destination: H_0 (FF)
Destination Clock: clk_i rising
Data Path: rst_i to H_0
Gate Net
Cell:in->out fanout Delay Delay Logical Name (Net Name)
---------------------------------------- ------------
IBUF:I->O 136 1.930 2.449 rst_i_IBUF (rst_i_IBUF)
FDR:R 1.207 H_0
----------------------------------------
Total 5.586ns (3.137ns logic, 2.449ns route)
(56.2% logic, 43.8% route)
Timing constraint: Default OFFSET OUT AFTER for Clock 'clk_i'
Total number of paths / destination ports: 136 / 136
-------------------------------------------------------------------------
Offset : 6.198ns (Levels of Logic = 1)
Source : H_0_1 (FF)
Destination : SHA256_result<254> (PAD)
Source Clock: clk_i rising
Data Path: H_0_1 to SHA256_result<254>
Gate Net
Cell:in->out fanout Delay Delay Logical Name (Net Name)
---------------------------------------- ------------
FDR:C->Q 1 0.522 0.801 H_0_1 (H_0_1)
OBUF:I->O 4.875 SHA256_result_254_OBUF (SHA256_result<254>)
----------------------------------------
Total 6.198ns (5.397ns logic, 0.801ns route)
(87.1% logic, 12.9% route)
=============================================================
CPU : 5.18 / 5.32 s | Elapsed : 5.00 / 5.00 s
Total memory usage is 160084 kilobytes
Number of errors : 0 ( 0 filtered)
Number of warnings : 0 ( 0 filtered)
Number of infos : 3 ( 0 filtered)
RTL Schematic View:
CONCLUSION
The goal of this design is to compress a long message into a short, secure
message digest with acceptable throughput. We chose to implement our
architecture on Xilinx Virtex-E family FPGA devices. We tried different
implementation strategies for several components used in the architecture to get the best
results. SHA is a well-known family of message digest standards used in computer cryptography. Its
improved version, the SHA-256 algorithm, has been analyzed in this work and implemented in
an HDL (hardware description language). Xilinx ISE 7.1 was used to synthesize the module,
generate the RTL-level circuit description and simulate the waveforms.
APPENDIX - I
Source code for SHA-256 algorithm:
`define SHA256_H0 32'h6a09e667
`define SHA256_H1 32'hbb67ae85
`define SHA256_H2 32'h3c6ef372
`define SHA256_H3 32'ha54ff53a
`define SHA256_H4 32'h510e527f
`define SHA256_H5 32'h9b05688c
`define SHA256_H6 32'h1f83d9ab
`define SHA256_H7 32'h5be0cd19
`define K00 32'h428a2f98
`define K01 32'h71374491
`define K02 32'hb5c0fbcf
`define K03 32'he9b5dba5
`define K04 32'h3956c25b
`define K05 32'h59f111f1
`define K06 32'h923f82a4
`define K07 32'hab1c5ed5
`define K08 32'hd807aa98
`define K09 32'h12835b01
`define K10 32'h243185be
`define K11 32'h550c7dc3
`define K12 32'h72be5d74
`define K13 32'h80deb1fe
`define K14 32'h9bdc06a7
`define K15 32'hc19bf174
`define K16 32'he49b69c1
`define K17 32'hefbe4786
`define K18 32'h0fc19dc6
`define K19 32'h240ca1cc
`define K20 32'h2de92c6f
`define K21 32'h4a7484aa
`define K22 32'h5cb0a9dc
`define K23 32'h76f988da
`define K24 32'h983e5152
`define K25 32'ha831c66d
`define K26 32'hb00327c8
`define K27 32'hbf597fc7
`define K28 32'hc6e00bf3
`define K29 32'hd5a79147
`define K30 32'h06ca6351
`define K31 32'h14292967
`define K32 32'h27b70a85
`define K33 32'h2e1b2138
`define K34 32'h4d2c6dfc
`define K35 32'h53380d13
`define K36 32'h650a7354
`define K37 32'h766a0abb
`define K38 32'h81c2c92e
`define K39 32'h92722c85
`define K40 32'ha2bfe8a1
`define K41 32'ha81a664b
`define K42 32'hc24b8b70
`define K43 32'hc76c51a3
`define K44 32'hd192e819
`define K45 32'hd6990624
`define K46 32'hf40e3585
`define K47 32'h106aa070
`define K48 32'h19a4c116
`define K49 32'h1e376c08
`define K50 32'h2748774c
`define K51 32'h34b0bcb5
`define K52 32'h391c0cb3
`define K53 32'h4ed8aa4a
`define K54 32'h5b9cca4f
`define K55 32'h682e6ff3
`define K56 32'h748f82ee
`define K57 32'h78a5636f
`define K58 32'h84c87814
`define K59 32'h8cc70208
`define K60 32'h90befffa
`define K61 32'ha4506ceb
`define K62 32'hbef9a3f7
`define K63 32'hc67178f2
module sha256_final(clk_i,rst_i,SHA256_result);
input clk_i; // global clock input
input rst_i;
output [255:0] SHA256_result; // 256-bit digest; range matches the wire declaration below
reg [6:0] round;
wire [6:0] round_plus_1;
reg [31:0] H0,H1,H2,H3,H4,H5,H6,H7;
reg [31:0] Wt,Kt;
reg busy;
reg [31:0]
W0,W1,W2,W3,W4,W5,W6,W7,W8,W9,W10,W11,W12,W13,W14,W15;
reg [31:0] A,B,C,D,E,F,G,H;
wire [31:0]
M0,M1,M2,M3,M4,M5,M6,M7,M8,M9,M10,M11,M12,M13,M14,M15;
parameter l=8*3;
reg [(l-1):0] message;
initial
begin
message="abc";
end
wire [l:0] s;
assign s={message,1'b1};
parameter p=448-(l+1);
wire [447:0] out;
assign out={s,{p{1'b0}}};
wire [63:0] out1;
assign out1=l;
wire [511:0] out2;
assign out2={out,out1};
assign M0 = out2[511:480] ;
assign M1 = out2[479:448] ;
assign M2 = out2[447:416];
assign M3 = out2[415:384] ;
assign M4 = out2[383:352] ;
assign M5 = out2[351:320];
assign M6 = out2[319:288] ;
assign M7 = out2[287:256] ;
assign M8 = out2[255:224];
assign M9 = out2[223:192];
assign M10 = out2[191:160];
assign M11 = out2[159:128];
assign M12 = out2[127:96];
assign M13 = out2[95:64];
assign M14 = out2[63:32];
assign M15 = out2[31:0] ;
wire [31:0]
f1_EFG_32,f2_ABC_32,f3_A_32,f4_E_32,f5_W1_32,f6_W14_32,T1_32,T2_32;
wire [31:0] next_Wt,next_E,next_A;
wire [255:0] SHA256_result;
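assign f1_EFG_32 = (E & F) ^ (~E & G); // Ch(E,F,G)
assign f2_ABC_32 = (A & B) ^ (B & C) ^ (A & C); // Maj(A,B,C)
// Sigma0(A): rotate right by 2, 13 and 22
assign f3_A_32 = {A[1:0],A[31:2]} ^ {A[12:0],A[31:13]} ^ {A[21:0],A[31:22]};
// Sigma1(E): rotate right by 6, 11 and 25
assign f4_E_32 = {E[5:0],E[31:6]} ^ {E[10:0],E[31:11]} ^ {E[24:0],E[31:25]};
// sigma0(W1): rotate right by 7 and 18, shift right by 3
assign f5_W1_32 = {W1[6:0],W1[31:7]} ^ {W1[17:0],W1[31:18]} ^
{3'b000,W1[31:3]};
// sigma1(W14): rotate right by 17 and 19, shift right by 10
assign f6_W14_32 = {W14[16:0],W14[31:17]} ^ {W14[18:0],W14[31:19]} ^
{10'b00_0000_0000,W14[31:10]};
// Temporary words of the compression function
assign T1_32 = H[31:0] + f4_E_32 + f1_EFG_32 + Kt + Wt;
assign T2_32 = f3_A_32 + f2_ABC_32;
// Next message-schedule word and next working variables
assign next_Wt = f6_W14_32 + W9[31:0] + f5_W1_32 + W0[31:0];
assign next_E = D[31:0] + T1_32;
assign next_A = T1_32 + T2_32;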
assign SHA256_result = {A,B,C,D,E,F,G,H};
assign round_plus_1 = round + 1;
always@(posedge clk_i)
begin
if (rst_i)
begin
W1 <= 'b0;
W2 <= 'b0;
W3 <= 'b0;
W4 <= 'b0;
W5 <= 'b0;
W6 <= 'b0;
W7 <= 'b0;
W8 <= 'b0;
W9 <= 'b0;
W10 <= 'b0;
W11 <= 'b0;
W12 <= 'b0;
W13 <= 'b0;
W14 <= 'b0;
W15 <= 'b0;
Wt <= 'b0;
A <= 'b0;
B <= 'b0;
C <= 'b0;
D <= 'b0;
E <= 'b0;
F <= 'b0;
G <= 'b0;
H <= 'b0;
H0 <= 'b0;
H1 <= 'b0;
H2 <= 'b0;
H3 <= 'b0;
H4 <= 'b0;
H5 <= 'b0;
H6 <= 'b0;
H7 <= 'b0;
round <= 'd0;
busy <= 'b0;
end
else if (round == 'd0) // load the block and initial hash values only while idle
begin
W0<=M0;
W1<=M1;
W2<=M2;
W3<=M3;
W4<=M4;
W5<=M5;
W6<=M6;
W7<=M7;
W8<=M8;
W9<=M9;
W10<=M10;
W11<=M11;
W12<=M12;
W13<=M13;
W14<=M14;
W15<=M15;
H0 <= `SHA256_H0;
H1 <= `SHA256_H1;
H2 <= `SHA256_H2;
H3 <= `SHA256_H3;
H4 <= `SHA256_H4;
H5 <= `SHA256_H5;
H6 <= `SHA256_H6;
H7 <= `SHA256_H7;
A <= `SHA256_H0;
B <= `SHA256_H1;
C <= `SHA256_H2;
D <= `SHA256_H3;
E <= `SHA256_H4;
F <= `SHA256_H5;
G <= `SHA256_H6;
H <= `SHA256_H7;
// round is advanced by the round state machine below
end
end
always@(posedge clk_i)
begin
case (round)
'd0:
begin
if(rst_i)
begin
W0 <= 'b0;
Wt <= 'b0;
end
else begin
Wt <= M0;
round <= round_plus_1;
end
end
'd1:
begin
H <= G;
G <= F;
F <= E;
E <= next_E;
D <= C;
C <= B;
B <= A;
A <= next_A;
Wt <= M1;
round<=round_plus_1;
end
'd2:
begin
H <= G;
G <= F;
F <= E;
E <= next_E;
D <= C;
C <= B;
B <= A;
A <= next_A;
Wt <= M2;
round <= round_plus_1;
end
'd3:
begin
H <= G;
G <= F;
F <= E;
E <= next_E;
D <= C;
C <= B;
B <= A;
A <= next_A;
Wt <= M3;
round <= round_plus_1;
end
// The same process is repeated for rounds 'd4 through 'd63.
'd64:
begin
// Final round: apply the last compression step and add the initial hash values.
A <= next_A + H0;
B <= A + H1;
C <= B + H2;
D <= C + H3;
E <= next_E + H4;
F <= E + H5;
G <= F + H6;
H <= G + H7;
round <= 'd0;
end
default:
begin
round <= 'd0;
end
endcase
end
//------------------------------------------------------------------
// Kt generator
//------------------------------------------------------------------
always @ (posedge clk_i)
begin
if (rst_i)
begin
Kt <= 'b0;
end
else
begin
case (round)
'd0: Kt <= `K00;
'd1: Kt <= `K01;
'd2: Kt <= `K02;
'd3: Kt <= `K03;
'd4: Kt <= `K04;
'd5: Kt <= `K05;
'd6: Kt <= `K06;
'd7: Kt <= `K07;
'd8: Kt <= `K08;
'd9: Kt <= `K09;
'd10: Kt <= `K10;
'd11: Kt <= `K11;
'd12: Kt <= `K12;
'd13: Kt <= `K13;
'd14: Kt <= `K14;
'd15: Kt <= `K15;
'd16: Kt <= `K16;
'd17: Kt <= `K17;
'd18: Kt <= `K18;
'd19: Kt <= `K19;
'd20: Kt <= `K20;
'd21: Kt <= `K21;
'd22: Kt <= `K22;
'd23: Kt <= `K23;
'd24: Kt <= `K24;
'd25: Kt <= `K25;
'd26: Kt <= `K26;
'd27: Kt <= `K27;
'd28: Kt <= `K28;
'd29: Kt <= `K29;
'd30: Kt <= `K30;
'd31: Kt <= `K31;
'd32: Kt <= `K32;
'd33: Kt <= `K33;
'd34: Kt <= `K34;
'd35: Kt <= `K35;
'd36: Kt <= `K36;
'd37: Kt <= `K37;
'd38: Kt <= `K38;
'd39: Kt <= `K39;
'd40: Kt <= `K40;
'd41: Kt <= `K41;
'd42: Kt <= `K42;
'd43: Kt <= `K43;
'd44: Kt <= `K44;
'd45: Kt <= `K45;
'd46: Kt <= `K46;
'd47: Kt <= `K47;
'd48: Kt <= `K48;
'd49: Kt <= `K49;
'd50: Kt <= `K50;
'd51: Kt <= `K51;
'd52: Kt <= `K52;
'd53: Kt <= `K53;
'd54: Kt <= `K54;
'd55: Kt <= `K55;
'd56: Kt <= `K56;
'd57: Kt <= `K57;
'd58: Kt <= `K58;
'd59: Kt <= `K59;
'd60: Kt <= `K60;
'd61: Kt <= `K61;
'd62: Kt <= `K62;
'd63: Kt <= `K63;
default:Kt <= 'd0;
endcase
end
end
endmodule
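A simple way to check the module above in simulation is to compare SHA256_result with the published SHA-256 digest of "abc". The following testbench is a minimal sketch and not part of the original design: the name sha256_final_tb, the 10 ns clock period and the 80-cycle observation window are assumptions chosen for illustration. It also presumes that the `SHA256_H0 to `SHA256_H7 and `K00 to `K63 definitions are compiled together with sha256_final and that all 64 rounds of the case statement are present.

```verilog
`timescale 1ns/1ps
// Hypothetical testbench for sha256_final (illustrative sketch only).
module sha256_final_tb;
  reg clk_i = 1'b0;
  reg rst_i = 1'b1;            // start in reset
  wire [255:0] SHA256_result;
  reg seen;                    // set once the expected digest has been observed
  integer cycle;

  // Known SHA-256 digest of the 3-byte message "abc"
  localparam [255:0] EXPECTED =
    256'hba7816bf_8f01cfea_414140de_5dae2223_b00361a3_96177a9c_b410ff61_f20015ad;

  sha256_final dut (
    .clk_i(clk_i),
    .rst_i(rst_i),
    .SHA256_result(SHA256_result)
  );

  always #5 clk_i = ~clk_i;    // 10 ns period (arbitrary choice)

  initial begin
    seen = 1'b0;
    repeat (2) @(posedge clk_i);
    @(negedge clk_i);
    rst_i = 1'b0;              // release reset away from the active clock edge
    // Rounds 0..64 need 65 cycles; 80 cycles leaves some margin.
    for (cycle = 0; cycle < 80; cycle = cycle + 1) begin
      @(negedge clk_i);
      if (SHA256_result === EXPECTED) seen = 1'b1;
    end
    if (seen) $display("PASS: digest of \"abc\" observed");
    else      $display("FAIL: expected digest never appeared");
    $finish;
  end
endmodule
```

The testbench samples the output on every falling clock edge instead of at one fixed cycle, because the digest may be held only briefly before the round counter restarts.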
REFERENCES
[1] Federal Information Processing Standards (FIPS) Publication 180-1, Secure
Hash Standard (SHS), U.S. DoC/NIST, April 17, 1995.
[2] A. Menezes, P. van Oorschot, and S. Vanstone, Handbook of Applied
Cryptography, CRC Press, Inc., October 1997.
[3] NIST, Federal Information Processing Standards (FIPS) Publication 180-2, Secure
Hash Standard (SHS), August 2001.
[4] D. R. Stinson, Cryptography: Theory and Practice, CRC Press LLC, 1995.