8/14/2019 Cryptanalysis of Hash Functions using an approach based on SAT-solvers (Formal Proposal, first draft)
Cryptanalysis of Hash Functions using an approach based on SAT-solvers.
Jair Cazarin Villanueva 125535
Dr. Mauricio Osorio
February 2008
1. Introduction.
Cryptography is the study of mathematical techniques related to aspects of information
security such as confidentiality, data integrity, entity authentication, and data origin
authentication [1]. In other words, cryptography is about the prevention and detection of
cheating and other malicious activities. Cryptographic hash functions produce hash
values, which concisely represent the longer messages or documents from which they were
computed [2], and they are used for a variety of purposes, mainly in cryptography.
Computer security therefore depends heavily on the strength of hash functions.
Examples of hash functions are MD4 [7], MD5 [8] and the SHA family [9]. The main role
of cryptographic hash functions is in the provision of message integrity checks and digital
signatures [3]. The formal analysis of their robustness is clearly of utmost
importance; unfortunately, several standard cryptographic hash functions were broken in
2005 [4]. Breaking in this case means finding a way to efficiently produce different
messages that are mapped to the same hash value by some hash function, which would
compromise the security of applications in which these functions are used. The use of
other kinds of approaches therefore seems to be the next step toward a greater assurance of security.
On the other hand, we have Boolean satisfiability (SAT) solvers, which attack the problem
of determining whether the variables of a given Boolean formula can be assigned in such a way
as to make the formula evaluate to true. The use of SAT-solvers in various applications is
increasing, and a growing number of problems that can be efficiently encoded into SAT are
being tackled successfully by these programs [5]; furthermore, the performance of the
algorithms in these programs has improved [6]. Most of these applications still
belong to the traditional domains of formal verification and artificial intelligence, and
although several applications of SAT-solvers to cryptanalysis have been described in the
literature, these efforts failed to produce any attacks of interest to
cryptologists until the research of Ilya Mironov and Lintao Zhang [4].
In their paper, they showed that some of the attacks on these hash functions can be
automated by encoding them as CNF formulas, which are within reach of modern
SAT-solvers, and that this transformation delegates the more laborious part of the attack
to the solver, creating the first example of a SAT-solver-aided cryptanalysis of a non-trivial
cryptographic primitive.
The strategy was based on the fact that the original attacks consisted of several steps, each
of which involves a lot of bit-tweaking and manual work, requiring the attacker to keep track
of as many as 122 Boolean conditions in the simplest function. Building on this, they found a
way to transform these conditions into CNF formulas, which are then passed to a SAT-solver in
order to automate certain parts of the attack, obviating the need for compiling tables of
sufficient conditions and designing clever message modification techniques.
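As a rough illustration of what encoding a bit-level condition as a CNF formula involves, the following Python sketch encodes the single condition z = x XOR y as four clauses and checks the encoding against the truth table. The helper names `xor_clauses` and `satisfies` are hypothetical and are not taken from the cited papers; literals are written as signed integers, a common convention for SAT-solver input.

```python
from itertools import product

def xor_clauses(x: int, y: int, z: int) -> list[list[int]]:
    """CNF clauses (signed-integer literals) that force z == x XOR y."""
    return [
        [-x, -y, -z],  # x true,  y true  -> z must be false
        [x, y, -z],    # x false, y false -> z must be false
        [x, -y, z],    # x false, y true  -> z must be true
        [-x, y, z],    # x true,  y false -> z must be true
    ]

def satisfies(clauses, assignment) -> bool:
    """True if every clause has at least one literal made true by assignment."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in clauses
    )

clauses = xor_clauses(1, 2, 3)
for x, y, z in product([False, True], repeat=3):
    model = {1: x, 2: y, 3: z}
    # The clauses hold exactly when z equals x XOR y.
    assert satisfies(clauses, model) == (z == (x != y))
```

A full hash function would need many such gate encodings chained together; this sketch covers only one gate.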
So far, current research in this field has found ways to automate certain parts of these
attacks, but it is still necessary to verify and test these approaches in order to achieve
complete automation and/or improvements in both areas. This can be accomplished in two ways:
by creating a new toolkit for cryptanalysts, or by improving SAT-solvers for specific
cryptographic problems.
2. Objectives.
2.1 General Objective.
To design and test this approach with different kinds of SAT-solvers in order to
find the most suitable one; then to implement a semiautomatic testing tool that helps
cryptanalysts find weaknesses in hash functions; and, through this tool, to
analyze the problem in more depth in order to detect possible improvements to SAT-solver
algorithms for this specific kind of problem.
2.2 Specific Objectives.
* Understand the theory and construction of hash functions, including their principles.
* Study the MD4 and MD5 family of hash functions and how it works.
* Interpret the problem of Boolean satisfiability, including its complexity and variations.
* Install and use the MiniSAT SAT-solver.
* Understand how the attacks on hash functions run.
* Study the discovery process of the differential path for MD4 and MD5.
* Work out how to automate the differential path via SAT-solvers.
* Find one or two more SAT-solvers in order to compare results with MiniSAT.
* Generate full collisions for the MD4 and MD5 hash functions.
3. Research Scope.
This thesis focuses on the MDx family of hash functions designed by Ron
Rivest. Although there are other state-of-the-art hash functions, such as SHA-0 and SHA-1,
and these hash functions share similar design principles, it is well
known that the attack on SHA-1 is only theoretical [10], and for SHA-0 generating a full
collision would require 3 million CPU hours using common SAT-solvers [4].
Moreover, the first two stages of the attacks are usually done by hand or by applying
heuristics that demand a lot of creativity; it is therefore not yet possible to
develop a fully automatic attack, and we are going to replicate only the attacks already
known in the literature.
In the case of SAT-solvers, we will use MiniSAT [11] as the main SAT-solver and, if time
constraints permit, we will test two or more different SAT-solvers,
to be chosen as the research advances.
4. Hardware and Software.
A simple personal computer with minimal processing and storage characteristics is required
to run the approaches and algorithms in this research. For this reason, a laptop with the
following features will be used:
* Dell XPS M1210, Intel Core 2 Duo, 2 GHz.
* 3 GB of RAM.
For the software, so far we have only taken into account MiniSAT [11], which is a
minimalistic, open-source SAT-solver intended to help researchers and developers
with projects related to SAT. We will probably also use SATELITE, which is a CNF
minimizer and preprocessor for MiniSAT. Some reasons for choosing MiniSAT are its
key features, such as efficiency, ease of integration and ease of modification, and, more
importantly, its performance in SAT competitions [6].
5. Problem Statement.
5.1 Theory of Hash Functions.
Cryptographic hash functions, also known as one-way hash functions, are a major tool in
modern cryptography.
A hash function is defined as a computationally efficient function mapping binary strings
of arbitrary length to binary strings of some fixed length, called hash-values [1]. The basic
idea is that a hash-value serves as a compact representative of an input string. In the
cryptographic field, a hash function h is typically chosen such that it is computationally
infeasible to find two distinct inputs which hash to a common value, that is, to find
inputs x and y with x ≠ y such that h(x) = h(y) (this is called the collision-resistance
property and is recognized as the gold standard of security for hash functions [13]), and also
such that, given a specific hash-value y, it is computationally infeasible to find an input x
such that h(x) = y (this is known as preimage resistance).
The formal definition of a cryptographic hash function is a mapping:
$$h : \{0,1\}^* \rightarrow \{0,1\}^n$$
where $\{0,1\}^*$ denotes the set of bit strings of arbitrary length and $n$ is the fixed
length of the hash value. The image $h(X)$ of some message $X \in \{0,1\}^*$ is called the
hash value of $X$ [16].
The most common cryptographic uses of hash functions are in digital signatures and
for data integrity. With digital signatures, a long message is usually hashed and only the
hash-value is signed (as is done in signature schemes) [14]. The party receiving the
message then hashes the received message and verifies that the received signature is
correct for this hash-value. Note that the inability to find two messages with the
same hash-value is a security requirement here, since otherwise the signature on one
message's hash-value would be the same as that on another, allowing a signer to sign one
message and at a later point in time claim to have signed another. In the case of data
integrity, the hash-value corresponding to a particular input is computed at some point in
time, and the integrity of this hash-value is protected in some manner. At a subsequent point
in time, to verify that the input data has not been altered, the hash-value is recomputed
using the input at hand and compared for equality with the original hash-value. Specific
applications include virus protection and software distribution.
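The data integrity check described above can be sketched in a few lines of Python. This is a minimal illustration that assumes Python's standard `hashlib` module; the helper names `digest` and `is_unaltered` are hypothetical, and MD5 is used only because it is one of the functions studied here, not because it is recommended for this purpose.

```python
import hashlib

def digest(data: bytes) -> str:
    """Hex hash-value of data (MD5, chosen only because the text studies it)."""
    return hashlib.md5(data).hexdigest()

def is_unaltered(data: bytes, stored_digest: str) -> bool:
    """Recompute the hash-value and compare it with the protected original."""
    return digest(data) == stored_digest

original = b"software release 1.0"
stored = digest(original)  # computed once and kept in a protected place

assert is_unaltered(original, stored)
assert not is_unaltered(b"software release 1.O", stored)  # one character changed
```

The check is only as trustworthy as the protection of the stored hash-value and the collision resistance of the hash function.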
Now it is important to differentiate between a weak one-way hash function and a strong
one-way hash function, so let us define each one.
A weak one-way hash function is a function H such that [14]:
* H can be applied to any argument of any size.
* H produces a fixed-size output.
* Given H and y, it is easy to compute H(y); that is, H(y) is computable in polynomial time.
* Given H and a suitably chosen y, it is computationally infeasible to find y' ≠ y such
that H(y') = H(y).
A strong one-way hash function is a function H such that [14]:
* H can be applied to any argument of any size.
* H produces a fixed-size output (larger than that of a weak hash function).
* Given H and y, it is easy to compute H(y).
* Given H, it is computationally infeasible to find any pair y, y' such that y ≠ y' and
H(y) = H(y').
The main differences between strong and weak hash functions are that strong functions are
easier to use in systems design, because there are no preconditions on the selection of y,
and that they provide the full claimed level of security even when used repeatedly. In this
thesis research, we will use only strong one-way hash functions.
The literature also offers a different taxonomy for hash functions [17]. A hash function h is:
* Preimage resistant, if, given a hash value y, it is computationally infeasible to find a
message x with h(x) = y.
* Second preimage resistant, if, given a message y, it is computationally infeasible to
find a message x ≠ y with h(x) = h(y).
* Collision resistant, if it is computationally infeasible to find a collision, that is, a
pair of two different messages y and y' with h(y) = h(y').
We can infer that one-way is equivalent to preimage resistant and that a weak hash function is
second preimage resistant. These properties will be used later, when we
discuss hash function construction.
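To make the notion of a collision concrete, the following Python sketch searches for two different messages with the same hash value under a deliberately weakened hash: MD5 truncated to 3 bytes. The truncation, the counter-based messages and the names `truncated_hash` and `find_collision` are assumptions made for illustration. For an n-bit output, a generic search of this kind is expected to need on the order of 2^(n/2) hash evaluations, which is why it only finishes quickly for very short outputs.

```python
import hashlib

def truncated_hash(message: bytes, n_bytes: int = 3) -> bytes:
    """First n_bytes of MD5(message): a deliberately weak toy hash."""
    return hashlib.md5(message).digest()[:n_bytes]

def find_collision(n_bytes: int = 3):
    """Hash distinct messages until two of them share a truncated hash value."""
    seen = {}  # maps truncated hash value -> first message that produced it
    counter = 0
    while True:
        message = str(counter).encode()
        h = truncated_hash(message, n_bytes)
        if h in seen:
            return seen[h], message
        seen[h] = message
        counter += 1

m1, m2 = find_collision()
assert m1 != m2 and truncated_hash(m1) == truncated_hash(m2)
```

The loop always terminates, because there are only finitely many truncated values and every message tried is distinct.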
The first standard hash function was MD4 (designed by Ron Rivest [7]), which was followed
by an improved version called MD5. Then came the first NIST-approved hash function,
SHA-0 (Secure Hash Algorithm) [9], which adopted the general structure of MD4 and was
replaced two years later by a new version, SHA-1 [4]. In practice, just two hash functions
came into widespread use: MD5 and SHA-1. One of the first people to study the
construction of hash functions was Damgard, from Aarhus University in Denmark. One idea in
his theory of collision-free hash function construction was to consider families of hash
functions instead of just one hash function, in order to make a complexity-theoretic
treatment possible [14]. In this sense we can say that MD4, MD5, SHA-0 and
SHA-1 belong to one family.
Most hash functions have a similar iterative structure which is based around what is
termed a compression function [12]. In summary, the computation of the hash value for a
message depends on what is called a chaining variable. At the start of hashing, this
chaining variable has a fixed initial value which is specified as a part of the algorithm. The
compression function is then used to update the value of this chaining variable in a
suitably complex way under the action and influence of the part of the message being
hashed. This process continues recursively, with the chaining variable being updated
under the action of different parts of the message, until the entire message has been used.
The final value of the chaining variable is then output as the hash value corresponding to
that message. Later we'll study some of the approaches in the basic construction block of
a collision-resistant compression function.
Figure 1. The use of compression functions in an iterative hash function. MDx and SHAx
follow the same design.
5.2 MD4.
MD4 is a message-digest algorithm developed by Rivest in 1990 that operates on 32-bit
words. The message is padded to ensure that its length in bits plus 64 is divisible by 512. A
64-bit binary representation of the original length of the message is then concatenated to
the message [17]. The message is processed in 512-bit blocks, and each block is processed
in three distinct rounds.
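The block-by-block processing described above, in which a chaining variable is repeatedly updated by a compression function, can be sketched in Python as follows. The compression function `toy_compress` is a stand-in built from SHA-256 purely for illustration; it is not MD4's compression function, and the all-zero initial value is likewise an assumption. Only the iteration pattern is the point.

```python
import hashlib

BLOCK_SIZE = 64  # bytes, i.e. 512-bit blocks as in MD4

def toy_compress(chaining: bytes, block: bytes) -> bytes:
    """Stand-in compression function (NOT MD4's): 16-byte state + block -> 16 bytes."""
    return hashlib.sha256(chaining + block).digest()[:16]

def iterated_hash(padded: bytes, iv: bytes = bytes(16)) -> bytes:
    """Fold the compression function over the blocks of an already padded message."""
    if len(padded) % BLOCK_SIZE != 0:
        raise ValueError("message must be padded to a multiple of 64 bytes")
    chaining = iv  # fixed initial value of the chaining variable
    for offset in range(0, len(padded), BLOCK_SIZE):
        chaining = toy_compress(chaining, padded[offset:offset + BLOCK_SIZE])
    return chaining  # final chaining value is the hash value

two_blocks = bytes(range(128))
expected = toy_compress(toy_compress(bytes(16), two_blocks[:64]), two_blocks[64:])
assert iterated_hash(two_blocks) == expected
```

Because each step depends only on the current chaining value and one block, a collision in the compression function carries over to the whole hash, which is why the attacks target the compression function.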
Attacks on versions of MD4 with either the first or the last round missing were developed
very quickly by den Boer, Bosselaers et al. [18]. Also, [19] has shown how collisions for the
full version of MD4 can be found in under a minute on a typical PC.
5.3 MD5.
Some weaknesses that might lead to a compromise were discovered in MD4, so RSA had
to improve it, and MD5 was born in 1991 (also designed by Rivest). It is basically MD4 with
safety-belts, and while it is slightly slower than MD4, it is more secure [17]. The algorithm
consists of four distinct rounds, which have a slightly different design from that of MD4.
Message-digest size, as well as padding requirements, remain the same.
Attacking MD5 is a much more involved proposition than attacking MD4, since it is a far
more complicated algorithm to analyze.
5.4 Overview of the attacks on hash functions.
The attacks on the MDx family of hash algorithms are very similar. They can be summarized as
finding a 512-bit message $M$ such that $H(IV, M) = H(IV, M + \Delta M)$, where $H$ is the
compression function and $\Delta M$ is fixed [4]. Also, as stated in [2], the generic
complexity of this attack is $2^{n/2}$, where n = 128 or 160. The trick lies in choosing a
good $\Delta M$ and using techniques to find an $M$ that takes advantage of the weaknesses of
the compression function, bringing the complexity of the attack down to fewer than $2^{42}$
evaluations of the hash function.
In more detail, these attacks are divided into four stages [2]:
1. Choose the message difference $\Delta M = (\Delta m_0, \ldots, \Delta m_{15})$. Here the
difference stands for both xor and mod $2^{32}$.
2. Choose a differential path $\Delta Q_0, \ldots, \Delta Q_r$, where r is the number of
rounds (r = 48, 64 or 80).
3. Find a set of sufficient conditions on the message $M = (m_0, \ldots, m_{15})$ and the
intermediate variables $Q_0, \ldots, Q_r$ that guarantee that the message pair $M$,
$M' = (m_0 + \Delta m_0, \ldots, m_{15} + \Delta m_{15})$ follows the differential path
$\Delta Q_0, \ldots, \Delta Q_r$.
4. Choose a message M such that all sufficient conditions hold.
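Since stage 1 speaks of differences both as xor and mod 2^32, a small Python sketch may help show that the two notions differ for the same pair of 32-bit words. The function names are hypothetical, and the sign convention (second word minus first) is an assumption.

```python
MASK32 = 0xFFFFFFFF

def xor_difference(w: int, w_prime: int) -> int:
    """Bitwise (xor) difference of two 32-bit words."""
    return (w ^ w_prime) & MASK32

def modular_difference(w: int, w_prime: int) -> int:
    """Difference (w' - w) mod 2^32 of two 32-bit words."""
    return (w_prime - w) & MASK32

w, w_prime = 0x7FFFFFFF, 0x80000000
print(hex(xor_difference(w, w_prime)))      # 0xffffffff: every bit differs
print(hex(modular_difference(w, w_prime)))  # 0x1: the words differ by one
```

The carry propagation that separates the two views is part of what makes tracking a differential path by hand laborious.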
An equivalent formulation is to say that each clause should have at least one literal that is
true under the assignment. Such a clause is then said to be satisfied. If there is no
assignment satisfying all clauses, the CNF is said to be unsatisfiable.
An example of what an instance of SAT looks like:
SAT is a typical search problem. We are given an instance I (that is, some input data
specifying the problem at hand, in this case a Boolean formula in a conjunctive normal
form), and we are asked to find a solution S (an object that meets a particular
specification, in this case an assignment that satisfies each clause). If no such solution
exists, we must say so.
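The search-problem view above can be illustrated with a brute-force Python sketch that tries every assignment and either returns a solution or reports that none exists. The three-clause formula is a hypothetical instance chosen for this sketch, with literals written as signed integers (a negative number denotes a negated variable); the function name is likewise an assumption.

```python
from itertools import product

def brute_force_sat(clauses, num_vars):
    """Return a satisfying assignment {variable: bool}, or None if unsatisfiable."""
    for values in product([False, True], repeat=num_vars):
        assignment = {var: values[var - 1] for var in range(1, num_vars + 1)}
        # Every clause needs at least one literal that is true under the assignment.
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in clauses):
            return assignment
    return None

# (x1 or not x2) and (x2 or x3) and (not x1 or not x3)
instance = [[1, -2], [2, 3], [-1, -3]]
print(brute_force_sat(instance, 3))   # {1: False, 2: False, 3: True}
print(brute_force_sat([[1], [-1]], 1))  # None: x1 and not x1 cannot both hold
```

Enumeration takes time exponential in the number of variables, which is why practical solvers rely on the techniques discussed next.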
5.6 SAT-Solver Algorithms.
A wide variety of techniques have been developed for solving SAT instances; all
of them can be classified as either complete or approximate. Complete methods
systematically examine the entire search space and find a solution (if one exists) in bounded
time, or otherwise report that the formula is unsatisfiable. In this thesis we are going to
focus on software that implements a complete method, but later we may analyze the behavior
of some approximate methods.
5.6.1 DPLL Algorithm.
The original Davis-Putnam procedure was based on a resolution rule that eliminated the
variables one by one and added all possible resolvents to the set of clauses; this was
known as the DP method. Unfortunately, this procedure requires exponential space, so
the resolution rule was quickly replaced with a splitting rule, which divides the
problem into two smaller subproblems. This became known as DPLL after the authors of the
1962 paper, M. Davis, G. Logemann and D. Loveland [25].
This is the fastest known algorithm for satisfiability testing that is not just sound, but also
complete. In summary, DPLL is depth-first search with backtracking and unit
propagation.
With good data structures, we can implement unit propagation to take linear time in the
size of the input set of clauses.
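A minimal Python sketch of DPLL, that is, depth-first search with backtracking and unit propagation, is given below. It is an illustration under simplifying assumptions: clauses are lists of signed integers, the branching variable is chosen naively, and unit propagation rescans all clauses on every pass instead of using the data structures that make it linear-time.

```python
def unit_propagate(clauses, assignment):
    """Assign literals forced by unit clauses; return None on a conflict."""
    clauses = list(clauses)
    assignment = dict(assignment)
    while True:
        simplified = []
        unit = None
        for clause in clauses:
            if any(assignment.get(abs(l)) == (l > 0) for l in clause):
                continue  # clause already satisfied, drop it
            remaining = [l for l in clause if abs(l) not in assignment]
            if not remaining:
                return None  # every literal is false: conflict
            if len(remaining) == 1 and unit is None:
                unit = remaining[0]
            simplified.append(remaining)
        clauses = simplified
        if unit is None:
            return clauses, assignment
        assignment[abs(unit)] = unit > 0  # the unit literal must be true

def dpll(clauses, assignment=None):
    """Return a satisfying (possibly partial) assignment, or None if unsatisfiable."""
    result = unit_propagate(clauses, assignment or {})
    if result is None:
        return None
    clauses, assignment = result
    if not clauses:
        return assignment  # all clauses satisfied
    var = abs(clauses[0][0])  # naive branching choice (splitting rule)
    for value in (True, False):
        model = dpll(clauses, {**assignment, var: value})
        if model is not None:
            return model
    return None  # both branches failed: backtrack

model = dpll([[1, -2], [2, 3], [-1, -3]])
assert model is not None
assert dpll([[1], [-1]]) is None
```

Variables missing from the returned assignment can take either value, since every clause is already satisfied by the assigned ones.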
5.6.2 Stochastic Local Search Algorithms.
Approximate SAT algorithms have gained widespread attention because they offer a
computationally feasible approach to finding high-quality solutions to NP-hard problems
in a scalable and efficient manner [26].
SLS (Stochastic Local Search) algorithms generally involve taking a candidate solution
and performing some sort of perturbation which results in one or more new candidate
solutions. An evaluation function is then used to determine which of the candidate
solutions should be accepted. This kind of algorithm also includes two operations called
intensification and diversification [27]. Intensification is a means of greedily improving
solution quality within a small area of the search space around a local optimum, while
diversification helps to prevent stagnation by ensuring that the search does not remain in
regions that contain only suboptimal solutions. Incorporating some form of randomness has
proved to be an efficient diversification mechanism, while intensification can be achieved
through a variety of techniques such as iterative improvement or the selection step in a
genetic algorithm.
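The interplay of intensification and diversification can be sketched with a small WalkSAT-style local search in Python. This is only an illustration under stated assumptions: the noise probability, the flip limit, the fixed seed and all function names are hypothetical choices, and clauses are assumed to be non-empty lists of signed integers. Being an approximate method, it may return None even for a satisfiable formula.

```python
import random

def num_unsatisfied(clauses, assignment):
    """Evaluation function: number of clauses with no true literal."""
    return sum(
        not any(assignment[abs(l)] == (l > 0) for l in clause)
        for clause in clauses
    )

def local_search(clauses, num_vars, max_flips=10_000, noise=0.5, seed=0):
    """WalkSAT-style search: returns a model, or None if none was found in time."""
    rng = random.Random(seed)
    assignment = {v: rng.random() < 0.5 for v in range(1, num_vars + 1)}
    for _ in range(max_flips):
        unsat = [c for c in clauses
                 if not any(assignment[abs(l)] == (l > 0) for l in c)]
        if not unsat:
            return assignment
        clause = rng.choice(unsat)
        if rng.random() < noise:
            # Diversification: flip a random variable of an unsatisfied clause.
            var = abs(rng.choice(clause))
        else:
            # Intensification: flip the variable that leaves fewest clauses unsatisfied.
            def cost(v):
                flipped = {**assignment, v: not assignment[v]}
                return num_unsatisfied(clauses, flipped)
            var = min((abs(l) for l in clause), key=cost)
        assignment[var] = not assignment[var]
    return assignment if num_unsatisfied(clauses, assignment) == 0 else None

model = local_search([[1, -2], [2, 3], [-1, -3]], 3)
```

Unlike a complete method, a None result here says nothing about unsatisfiability; it only means the flip budget ran out.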
5.7 MiniSAT.
MiniSAT was described in the paper "An Extensible SAT-solver" by Niklas Eén and Niklas
Sörensson, from Chalmers University of Technology in Sweden [5]. Given the
growing number of problems encoded into SAT, they found that modifying an existing
solver requires an understanding of both the problem domain and modern SAT techniques,
which is difficult. For this reason, they developed a small, complete and efficient SAT-solver
and documented its implementation in sufficient detail to enable
researchers around the world to construct their own solver in a very short time, in
order to meet the needs of a particular application area.
The ideas behind MiniSAT are based on conflict-driven backtracking, watched literals
and dynamic variable ordering. MiniSAT is implemented in C++. Later, we will analyze
the internal algorithms of MiniSAT in more depth.
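MiniSAT reads its input as a CNF file in the DIMACS format: a header line `p cnf <variables> <clauses>` followed by one clause per line, each terminated by 0. The following Python sketch serializes a list of clauses into that format; the function name `to_dimacs` and the example formula are hypothetical.

```python
def to_dimacs(clauses, num_vars):
    """Serialize clauses (lists of signed integers) in DIMACS CNF format."""
    lines = [f"p cnf {num_vars} {len(clauses)}"]
    for clause in clauses:
        lines.append(" ".join(str(lit) for lit in clause) + " 0")
    return "\n".join(lines) + "\n"

text = to_dimacs([[1, -2], [2, 3], [-1, -3]], 3)
print(text, end="")
# p cnf 3 3
# 1 -2 0
# 2 3 0
# -1 -3 0
```

Written to a file, this text can be given to the solver as its input; generating such files from the conditions of an attack is the step a semiautomatic tool would need to perform.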
6. Bibliography already reviewed.
[2] Dejan Jovanovic and Predrag Janicic. Logical analysis of hash functions. Pages 200-215.
Springer Verlag, 2005.
[3] RSA Laboratories - 2.1.6 What is a hash function?.
http://www.rsa.com/rsalabs/node.asp?id=2176.
[4] Ilya Mironov, Lintao Zhang. Applications of SAT Solvers to cryptanalysis of hash functions.
[6] SAT Competitions. http://www.satcompetition.org/.
[7] RFC 1186 (rfc1186) - MD4 Message Digest Algorithm. http://www.faqs.org/rfcs/rfc1186.html.
[8] RFC 1321 (rfc1321) - The MD5 Message-Digest Algorithm.
http://www.faqs.org/rfcs/rfc1321.html.
[9] RFC 3174 (rfc3174) - US Secure Hash Algorithm 1 (SHA1).
http://www.faqs.org/rfcs/rfc3174.html.
[11] MiniSat Page. http://minisat.se/.
[12] Ivan Damgard. Collision free hash functions and public key signature schemes. In David
Chaum and Wyn L. Price, editors, Advances in Cryptology. Springer, 1988.
[14] Ivan Damgard. A design principle for hash functions. In advances in cryptology. Springer,
1990.
[15] Brassard, Gilles. One way hash functions and DES. Advances in Cryptology. Berlin:
Springer-Verlag, 1990.
[16] The one on MD4.
[18] Crypto FAQ RSA http://www.rsa.com/rsalabs/node.asp?id=2253
6. Bibliography partially reviewed.
[1] Menezes, A. et al. Handbook of Applied Cryptography. Boca Raton: CRC Press, 1997.
[5] Niklas Een and Niklas Sorensson. An extensible SAT Solver. SAT 2003.
[10] Xiaoyun Wang, Yiqun Lisa Yin, and Hongbo Yu. Finding collisions in the full SHA-1.
[17] Ilya Mironov. Hash functions, theory, attacks and applications.
[19] B. den Boer and A. Bosselaers, An attack on the last two rounds of MD4, Advances in
Cryptology - Crypto '91, Springer-Verlag (1992), 194-203.
[20] H. Dobbertin, Alf Swindles Ann, CryptoBytes (3) 1 (Autumn 1995).
[21] Gabbay, Dov and Christopher Hogger. Handbook of Logic in Artificial Intelligence and Logic
Programming. Oxford: Clarendon Press, 1993.
[22] I.P. Gent and T. Walsh, "The search for Satisfaction", Internal Report, Dept. of Computer
Science, University of Strathclyde, 1999
[23] Algorithms for the Satisfiability (SAT) Problem: A survey. J. Gu, P. W. Purdom, J. Franco, and
B. W. Wah, in "Satisfiability Problem: Theory and Applications", DIMACS Series in Discrete
Mathematics and Theoretical Computer Science, American Mathematical Society, 1997, pp. 19-152.
[24] Sanjoy Dasgupta, Christos Papadimitriou, Umesh Vazirani. Algorithms. McGrawHill.
[25] Davis, M., G. Logemann, and D. Loveland (1962, July). A machine program for theorem-
proving. Commun. ACM 5 (7), 394-397.
[26] Holger H. Hoos and Thomas Stützle: Stochastic local search: foundations and applications
(2005).
[27] Li, C.M., and Anbulagan. Heuristics based on unit propagation for satisfiability problems. In
Proc. 15th IJCAI. 1997.
7. Bibliography to Review.
* Philip Hawkes, Michael Paddon, Gregory G. Rose: Musings on the Wang et al. MD5
Collision, Cryptology ePrint Archive, Report 2004/264, 13 October 2004.
* M.J.B. Robshaw, On Recent Results for MD2, MD4 and MD5. RSA Laboratories Bulletin, News
and advice from RSA Laboratories. Number 4. November 12, 1996.
* Hans Dobbertin, The Status of MD5 after a Recent Attack. RSA Laboratories, CryptoBytes, The
technical newsletter of RSA laboratories, a division of RSA Data Security, INC. Number 2, Summer
1996.
* Ilya Mironov. Hash functions: Theory, attacks and applications. November 14, 2005.
* Ilya Mironov. Hash Functions: From Merkle-Damgard to Shoup. Computer Science Department,
Stanford University.
* Preneel Bart. Analysis and Design of Cryptographic Hash Functions. February 2003.
* Klima Vlastimil. Finding MD5 Collisions on a Notebook PC Using Multi-message Modifications.
March 31, 2005.
* Propositional Logic, Class Notes for CS264A, UCLA.
* Goldberg Evgueni and Yakov Novikov. BerkMin: a Fast and Robust Sat-Solver.
* Niklas Een and Armin Biere. Effective Preprocessing in SAT through Variable and Clause
Elimination.
* Marques-Silva Joao et al. GRASP: A search Algorithm for a propositional satisfiability.
* Irina Rish and Rina Dechter. Resolution versus Search: Two strategies for SAT.
* Fabio Massacci. Using WALK-SAT and Rel-SAT for Cryptographic Key Search.
Appendix 1.
MD4 Algorithm Description
We begin by supposing that we have a b-bit message as input, and that
we wish to find its message digest. Here b is an arbitrary
nonnegative integer; b may be zero, it need not be a multiple of 8,
and it may be arbitrarily large. We imagine the bits of the message
written down as follows:
m_0 m_1 ... m_{b-1} .
The following five steps are performed to compute the message digest
of the message.
Step 1. Append padding bits
The message is "padded" (extended) so that its length (in bits)
is congruent to 448, modulo 512. That is, the message is
extended so that it is just 64 bits shy of being a multiple of
512 bits long. Padding is always performed, even if the length
of the message is already congruent to 448, modulo 512 (in
which case 512 bits of padding are added).
Padding is performed as follows: a single "1" bit is appended
to the message, and then enough zero bits are appended so that
the length in bits of the padded message becomes congruent to
448, modulo 512.
Step 2. Append length
A 64-bit representation of b (the length of the message before
the padding bits were added) is appended to the result of the
previous step. In the unlikely event that b is greater than
2^64, then only the low-order 64 bits of b are used. (These
bits are appended as two 32-bit words and appended low-order
word first in accordance with the previous conventions.)
At this point the resulting message (after padding with bits
and with b) has a length that is an exact multiple of 512 bits.
Equivalently, this message has a length that is an exact
multiple of 16 (32-bit) words. Let M[0 ... N-1] denote the
words of the resulting message, where N is a multiple of 16.
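Steps 1 and 2 can be sketched in Python as follows. This is a minimal illustration restricted to messages made of whole bytes (the description above allows arbitrary bit lengths); the function name `md_pad` is hypothetical.

```python
def md_pad(message: bytes) -> bytes:
    """Steps 1 and 2 for byte-aligned messages: pad to 448 mod 512 bits, append length."""
    bit_length = (8 * len(message)) % (1 << 64)    # low-order 64 bits of b
    padded = message + b"\x80"                     # a single "1" bit, then seven "0" bits
    padded += b"\x00" * ((56 - len(padded)) % 64)  # zero bits up to 448 mod 512
    padded += bit_length.to_bytes(8, "little")     # low-order word (and byte) first
    return padded

assert len(md_pad(b"")) == 64          # one 512-bit block
assert len(md_pad(bytes(56))) == 128   # already 448 bits: 512 bits of padding added
assert len(md_pad(b"abc")) % 64 == 0
```

The second assertion corresponds to the case noted in Step 1, where a message whose length is already congruent to 448 modulo 512 still receives a full 512 bits of padding.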
Step 3. Initialize MD buffer
[B C D A 11 19]
[A B C D 12 3] [D A B C 13 7] [C D A B 14 11] [B C D A 15 19]
[Round 2]
Let [A B C D i s] denote the operation
A = (A + g(B,C,D) + X[i] + 5A827999) <<< s
[D A B C 9 9] [C D A B 5 11] [B C D A 13 15]
[A B C D 3 3] [D A B C 11 9] [C D A B 7 11] [B C D A 15 15]
Then perform the following additions:
A = A + AA
B = B + BB
C = C + CC
D = D + DD
(That is, each of the four registers is incremented by
the value it had before this block was started.)
end /* of loop on i */
Step 5. Output
The message digest produced as output is A,B,C,D. That is, we
begin with the low-order byte of A, and end with the high-order
byte of D.
MD5 Algorithm Description
We begin by supposing that we have a b-bit message as input, and that
we wish to find its message digest. Here b is an arbitrary
nonnegative integer; b may be zero, it need not be a multiple of
eight, and it may be arbitrarily large. We imagine the bits of the
message written down as follows:
m_0 m_1 ... m_{b-1}
The following five steps are performed to compute the message digest
of the message.
3.1 Step 1. Append Padding Bits
The message is "padded" (extended) so that its length (in bits) is
congruent to 448, modulo 512. That is, the message is extended so
that it is just 64 bits shy of being a multiple of 512 bits long.
Padding is always performed, even if the length of the message is
already congruent to 448, modulo 512.
Padding is performed as follows: a single "1" bit is appended to the
message, and then "0" bits are appended so that the length in bits of
the padded message becomes congruent to 448, modulo 512. In all, at
least one bit and at most 512 bits are appended.
3.2 Step 2. Append Length
A 64-bit representation of b (the length of the message before the
padding bits were added) is appended to the result of the previous
step. In the unlikely event that b is greater than 2^64, then only
the low-order 64 bits of b are used. (These bits are appended as two
32-bit words and appended low-order word first in accordance with the
previous conventions.)
At this point the resulting message (after padding with bits and with
b) has a length that is an exact multiple of 512 bits. Equivalently,
this message has a length that is an exact multiple of 16 (32-bit)
words. Let M[0 ... N-1] denote the words of the resulting message,
where N is a multiple of 16.
3.3 Step 3. Initialize MD Buffer
A four-word buffer (A,B,C,D) is used to compute the message digest.
Here each of A, B, C, D is a 32-bit register. These registers are
initialized to the following values in hexadecimal (low-order bytes
first):
word A: 01 23 45 67
word B: 89 ab cd ef
word C: fe dc ba 98
word D: 76 54 32 10
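Because the initial values are listed low-order byte first, the corresponding 32-bit integers read differently from the byte listing. The Python sketch below converts each listing into a word; the helper name `word_from_low_order_bytes` is hypothetical.

```python
def word_from_low_order_bytes(hex_bytes: str) -> int:
    """Interpret a byte sequence written low-order byte first as a 32-bit word."""
    return int.from_bytes(bytes.fromhex(hex_bytes), "little")

A = word_from_low_order_bytes("01 23 45 67")
B = word_from_low_order_bytes("89 ab cd ef")
C = word_from_low_order_bytes("fe dc ba 98")
D = word_from_low_order_bytes("76 54 32 10")
print(hex(A), hex(B), hex(C), hex(D))
# 0x67452301 0xefcdab89 0x98badcfe 0x10325476
```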
3.4 Step 4. Process Message in 16-Word Blocks
We first define four auxiliary functions that each take as input
three 32-bit words and produce as output one 32-bit word.
F(X,Y,Z) = XY v not(X) Z
G(X,Y,Z) = XZ v Y not(Z)
H(X,Y,Z) = X xor Y xor Z
I(X,Y,Z) = Y xor (X v not(Z))
In each bit position F acts as a conditional: if X then Y else Z.
(The function F could have been defined using + instead of v since XY
and not(X)Z will never have 1's in the same bit position.) It is
interesting to note that if the bits of X, Y, and Z are independent
and unbiased, then each bit of F(X,Y,Z) will be independent and
unbiased.
The functions G, H, and I are similar to the function F, in that they
act in "bitwise parallel" to produce their output from the bits of X,
Y, and Z, in such a manner that if the corresponding bits of X, Y,
and Z are independent and unbiased, then each bit of G(X,Y,Z),
H(X,Y,Z), and I(X,Y,Z) will be independent and unbiased. Note that
the function H is the bit-wise "xor" or "parity" function of its
inputs.
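The four auxiliary functions translate directly into bitwise operations on 32-bit words. The Python sketch below is one way to write them; the helper `bit_not` and the sample values are assumptions. It also checks the remark that F could be defined with + instead of v, and that F acts as a conditional in each bit position.

```python
MASK32 = 0xFFFFFFFF

def bit_not(x: int) -> int:
    """32-bit complement (Python integers are unbounded, so mask the result)."""
    return ~x & MASK32

def F(x, y, z):
    return (x & y) | (bit_not(x) & z)

def G(x, y, z):
    return (x & z) | (y & bit_not(z))

def H(x, y, z):
    return x ^ y ^ z

def I(x, y, z):
    return y ^ (x | bit_not(z))

x, y, z = 0x0F0F0F0F, 0x12345678, 0x9ABCDEF0
# XY and not(X)Z never share a 1 bit, so "+" and "v" give the same result.
assert F(x, y, z) == ((x & y) + (bit_not(x) & z)) & MASK32
# F is a bitwise conditional: if X then Y else Z.
assert F(MASK32, y, z) == y and F(0, y, z) == z
```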
This step uses a 64-element table T[1 ... 64] constructed from the
sine function. Let T[i] denote the i-th element of the table, which
is equal to the integer part of 4294967296 times abs(sin(i)), where i
is in radians. The elements of the table are given in the appendix.
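The table can be generated directly from this definition. The Python sketch below relies on double-precision floating point, which is assumed to be accurate enough for the integer parts involved; index 0 is left unused so that the indices match T[1 ... 64].

```python
import math

# T[i] = integer part of 4294967296 * abs(sin(i)), with i in radians.
T = [None] + [int(4294967296 * abs(math.sin(i))) for i in range(1, 65)]

print(hex(T[1]), hex(T[64]))  # 0xd76aa478 0xeb86d391
```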
Do the following:
/* Process each 16-word block. */
For i = 0 to N/16-1 do
/* Copy block i into X. */
For j = 0 to 15 do
Set X[j] to M[i*16+j].
end /* of loop on j */
/* Save A as AA, B as BB, C as CC, and D as DD. */
AA = A
BB = B
CC = C
DD = D
/* Round 1. */
/* Let [abcd k s i] denote the operation
a = b + ((a + F(b,c,d) + X[k] + T[i]) <<< s). */
end /* of loop on i */
3.5 Step 5. Output
The message digest produced as output is A, B, C, D. That is, we
begin with the low-order byte of A, and end with the high-order byte
of D.
This completes the description of MD5. A reference implementation in
C is given in the appendix.
4. Summary
The MD5 message-digest algorithm is simple to implement, and provides
a "fingerprint" or message digest of a message of arbitrary length.
It is conjectured that the difficulty of coming up with two messages
having the same message digest is on the order of 2^64 operations,
and that the difficulty of coming up with any message having a given
message digest is on the order of 2^128 operations. The MD5 algorithm
has been carefully scrutinized for weaknesses. It is, however, a
relatively new algorithm and further security analysis is of course
justified, as is the case with any new proposal of this sort.