
Design and Analysis of Lightweight Block Ciphers: A Focus on the Linear Layer

Christof Beierle

Doctoral Dissertation
Faculty of Mathematics
Ruhr-Universität Bochum

December 2017


Submitted by Christof Beierle

Dissertation for the degree of Doctor of Natural Sciences at the Faculty of Mathematics of Ruhr-Universität Bochum


First reviewer: Prof. Dr. Gregor Leander
Second reviewer: Prof. Dr. Alexander May

Date of oral examination: February 9, 2018

Abstract

Many cryptographic schemes are based on block ciphers. Formally, a block cipher can be defined as a family of permutations on a finite binary vector space. Most modern constructions are based on the alternation of a nonlinear and a linear operation. The scope of this work is to study the linear operation with regard to optimized efficiency and necessary security requirements. Our main topics are

• the problem of efficiently implementing multiplication with fixed elements in finite fields of characteristic two.

• a method for finding optimal alternatives for the ShiftRows operation in AES-like ciphers.

• the tweakable block ciphers Skinny and Mantis.

• the effect of the choice of the linear operation and the round constants with regard to the resistance against invariant attacks.

• the derivation of a security argument for the block cipher Simon that does not rely on computer-aided methods.


Acknowledgements

First and foremost I would like to thank my advisor Gregor Leander for accepting me as his student and for giving me the opportunity to work in his group during the last three years. He offered plenty of time for discussions and pointed me to interesting research questions. I would also like to thank Alexander May for agreeing to be the second reviewer of this thesis.

My work was funded by DFG Research Training Group GRK 1817. Special thanks also to them.

Further, I am very grateful to all of my co-authors with whom I collaborated during the three years of my Ph.D. studies and to all other people I worked with. I would especially like to thank Roberto Avanzi, Anne Canteaut, Gottfried Herold, Takanori Isobe and Thomas Peyrin for some valuable discussions and comments on particular topics discussed in this thesis.

I would like to thank Anne Canteaut for welcoming me for seven weeks in her group at INRIA, Paris and Christian Rechberger for inviting me to visit his group at DTU, Copenhagen for one week.

I would also like to express my gratitude to the Embedded Security Group of the Faculty of Electrical Engineering and Information Technology for hosting me for the first months as a Ph.D. student and for several weeks during my interdisciplinary project.

Further thanks go to my colleagues Thorsten Kranz and Friedrich Wiemer with whom I shared an office. Special thanks to Friedrich for helping me several times with Linux issues and using the C3 cluster, and for proofreading parts of this thesis.

Special thanks also to Irmgard and Marion for helping with lots of administrative tasks.

I would like to express my deepest gratitude to all of my family for their support and guidance. Especially, I thank my parents Marion and Klaus and my grandparents Helga and Hermann, from whom I have learned so much. Finally, I thank Anja for all her love and support.

Bochum, February 2018

Contents

List of Figures

List of Tables

Notations

1 Introduction

2 State of the Art in Block Cipher Design
  2.1 Security Notions
  2.2 Block Cipher Constructions
    2.2.1 Key-Alternating Ciphers
    2.2.2 Substitution-Permutation Ciphers
    2.2.3 Feistel and ARX
  2.3 Cryptanalytic Attacks
    2.3.1 Differential Cryptanalysis
    2.3.2 Linear Cryptanalysis
  2.4 The Wide-Trail Strategy and AES-like Ciphers
    2.4.1 The Branch Number and a Link to Coding Theory
    2.4.2 AES-like Ciphers
    2.4.3 Computing Active S-boxes with Automatic Tools
  2.5 Lightweight Cryptography
    2.5.1 Lightweight Metrics
    2.5.2 Characteristics of Lightweight Block Ciphers
    2.5.3 Midori

I Design of Lightweight Linear Layers

3 Lightweight Linear Layers based on Finite Field Multiplications
  3.1 Introduction
  3.2 Preliminaries
    3.2.1 The XOR-Count and the Cycle Normal Form
  3.3 Efficient Multiplication in Finite Fields
    3.3.1 Characterizing Elements with Optimal XOR-Count
    3.3.2 Experimental Search for Optimal XOR-Counts
  3.4 Constructing Lightweight MDS Matrices
    3.4.1 Generic Lightweight MDS Matrices
    3.4.2 Instantiating Lightweight MDS Matrices
  3.5 Generalizing the MDS Property
  3.6 Conclusion and Open Problems

4 On the Best Word Permutations for Lightweight AES-like Ciphers
  4.1 Introduction
  4.2 Classifying Word Permutations
    4.2.1 Structure Matrix of a Word Permutation
    4.2.2 Search Algorithm
    4.2.3 The Difference between the Equivalence Notions
  4.3 Case Study – The Best Word Permutations for Midori
    4.3.1 Computing the Minimum Number of Active S-boxes
  4.4 Conclusion and Open Problems

5 The Tweakable Block Ciphers Skinny and Mantis
  5.1 Introduction
  5.2 Tweakable Block Ciphers
    5.2.1 Related-Key Attacks
    5.2.2 The TWEAKEY Framework
  5.3 The Skinny Family of Lightweight (Tweakable) Block Ciphers
    5.3.1 Specification
    5.3.2 Design Rationale
    5.3.3 The MILP Model for Computing Active S-Boxes
    5.3.4 Diffusion Test
    5.3.5 Security Claim and Best Cryptanalysis so far
  5.4 The Mantis Family of Low-Latency Tweakable Block Ciphers
    5.4.1 Specification
    5.4.2 Design Rationale
    5.4.3 Security Claim and Best Cryptanalysis so far
    5.4.4 Unrolled Implementations
    5.4.5 The MILP Constraints for the MixColumns Operation
  5.5 Conclusion and Future Work

II Analysis of Lightweight Block Ciphers

6 Invariant Attacks
  6.1 Introduction
  6.2 Preliminaries and Explanation of the Attack
  6.3 A Link to Linear Cryptanalysis
  6.4 Proving the Absence of Invariants in Lightweight SPNs
    6.4.1 The Simple Case
    6.4.2 When the Dimension is Smaller
    6.4.3 Results for some Lightweight Ciphers
  6.5 Design Criteria on the Linear Layer and on the Round Constants
    6.5.1 The Possible Dimensions of the L-Invariant Subspaces
    6.5.2 Considering More Round Constants
    6.5.3 Choosing Random Round Constants

7 Differential Trails in Simon-like Ciphers
  7.1 Introduction
  7.2 Preliminaries
  7.3 Analysis of Differential Trails
    7.3.1 Restriction to Rotations as Rotational Invariant Functions
    7.3.2 Obtaining the Upper Bound for Simon and Simeck

Bibliography

List of Figures

2.1 A product cipher
2.2 A key-alternating cipher
2.3 A substitution-permutation (SP) cipher
2.4 One round $F_f$ of a Feistel cipher
2.5 One round $F_f$ of a key-alternating Feistel cipher
2.6 The structure of a key-alternating cipher with an AES-like round

4.1 An AES-like cipher for equivalent permutations
4.2 The possible transition patterns of the MixColumn matrix in Midori

5.1 Illustration of Mantis$_6$

6.1 The highest possible dimensions of $W_L(c_1, \dots, c_t)$ for Skinny-64, Prince and Mantis
6.2 The probability that $W_L(c_1, \dots, c_t) = \mathbb{F}_2^n$ for uniformly random constants $c_i$ for several lightweight ciphers

7.1 Comparison of the experimental bounds for Simon-32 and Simon-48 from the literature and our provable bounds
7.2 Illustration of the Simon round function and the generalized Simon-like round function
7.3 Propagation of the Hamming weight for differences $(\alpha, 0)$ with $w_1(\alpha) \in \{1, 2, 3\}$

List of Tables

2.1 The S-box $\mathrm{Sb}_{\mathrm{AES}}$ used in the AES
2.2 The 4-bit S-box $\mathrm{Sb}_{\mathrm{Mid64}}$ used in Midori-64

3.1 Optimal instantiations of the generic MDS matrices for $2 \le n \le 8$
3.2 Comparison of our results with $\mathbb{F}_{2^s}$-linear MDS matrices from the literature by average overhead per row
3.3 Minimal XOR-counts for all elements in $\mathbb{F}_{2^4}^*$
3.4 Minimal XOR-counts for all elements in $\mathbb{F}_{2^5}^*$
3.5 Minimal XOR-counts for all elements in $\mathbb{F}_{2^6}^*$
3.6 Minimal XOR-counts for all elements in $\mathbb{F}_{2^7}^*$
3.7 Minimal XOR-counts for all elements in $\mathbb{F}_{2^8}^*$
3.8 Matrices of the form $C_{x^s+1} + E_{[i_1,j_1]} + E_{[i_2,j_2]}$ with irreducible characteristic pentanomial

4.1 Some classes of permutations that, under the MixM operation of Midori, lead to optimal bounds on the number of active S-boxes

5.1 Number of rounds $t$ for Skinny-$n$-$\kappa$
5.2 The 4-bit S-box $\mathrm{Sb}_4$ employed in the Skinny-64 versions
5.3 The S-box $\mathrm{Sb}_8$ used in the Skinny-128 versions
5.4 The Round Constants used in Skinny
5.5 Round-based ASIC implementations of the Skinny-64 and the Skinny-128 versions and comparison to Simon
5.6 Lower bounds on the minimum number of active S-boxes in Skinny
5.7 Number of rounds of Skinny that are broken by the best key-recovery attacks so far
5.8 Lower bounds on the number of active S-boxes in the single-key model and in the related-tweak model for Mantis
5.9 Unrolled implementations of several Mantis versions constrained for the smallest area
5.10 Unrolled implementations of Mantis constrained for shortest delay

7.1 Number of rounds $t$ needed for bounding the value of $P(C)$ below $2^{-n}$ for all versions of Simon and Simeck

Notations

$\mathbb{N}$    The set of natural numbers, i.e., $\{1, 2, \dots\}$

$\mathbb{N}_{\le l}$    The set $\{1, \dots, l\} \subset \mathbb{N}$

$\mathbb{N}^{0}_{<l}$    The set $\{0, \dots, l-1\} \subset \mathbb{Z}$

$\mathbb{F}_2$    The finite field with two elements, i.e., $\{0, 1\}$

$\mathbb{F}_{2^s}$    The binary extension field with $2^s$ elements

$\mathrm{id}, \mathrm{id}_m$    The identity function (resp. the identity $\mathbb{F}_2^m \to \mathbb{F}_2^m$)

$K^*$    The multiplicative group of a field $K$

$K[X]$    The polynomial ring in $X$ over the field $K$

$\mathrm{Mat}_m(K), \mathrm{Mat}_m(R)$    The ring of $m \times m$ matrices with coefficients in the field $K$, resp. ring with unity $R$

$\mathrm{GL}_m(K)$    The general linear group of degree $m$ over the field $K$

$M^\top$    The transpose of the matrix $M$

$0_m$    The $m \times m$ zero matrix

$I_m$    The $m \times m$ identity matrix

$M_{i,j}$    The coefficient of the matrix $M$ in the $i$-th row of the $j$-th column

$E_{[i,j]}$    A matrix having the single non-zero coefficient $(E_{[i,j]})_{i,j} = 1$

$C_Q$    The companion matrix of a polynomial $Q$

$m_\alpha, m_A, m_L$    The minimal polynomial of a finite field element $\alpha$ or the minimal polynomial of an invertible matrix $A$ (resp. linear bijection $L$)

$\bigoplus_{i=1}^{d} A_i$    The block-diagonal matrix consisting of the $d$ matrix blocks $A_i$

$\bigoplus_{i=1}^{d} V_i$    The direct sum of vector spaces $V_i$

$\dim V$    The dimension of the vector space $V$

$\mathrm{lcm}(Q_1, \dots, Q_l)$    The least common multiple of $Q_1, \dots, Q_l \in K[X]$

$\gcd(Q_1, \dots, Q_l)$    The greatest common divisor of $Q_1, \dots, Q_l \in K[X]$

$\det(A)$    The determinant of a matrix $A$

$w(A)$    The number of non-zero coefficients of a matrix $A$ or the number of non-zero coefficients of a polynomial $A$

$\mathrm{Quot}(\mathbb{F}_2[X])$    The fraction field of the polynomial ring $\mathbb{F}_2[X]$

$S_m$    The symmetric group of degree $m$, i.e., all permutations on $\{1, \dots, m\}$

$B_m$    The set of all Boolean functions of $m$ variables, i.e., all functions $f : \mathbb{F}_2^m \to \mathbb{F}_2$

$\mathbf{0}, \mathbf{1}$    The constant Boolean functions $x \mapsto 0$ (resp. $x \mapsto 1$)

$\deg g, \deg Q$    The algebraic degree of a Boolean function $g$ or the degree of a polynomial $Q$

$\Delta_\alpha f$    The derivative of the Boolean function $f$ in direction $\alpha$, i.e., $x \mapsto f(x + \alpha) + f(x)$

$\mathrm{span}\{x_i\}_{i \in I}$    The linear span of the vectors $x_i$ for $i \in I$

$\langle \alpha, x \rangle$    The dot product of vectors $\alpha, x \in \mathbb{F}_2^m$, i.e., $\sum_{i=1}^{m} \alpha_i x_i$

$E$    A block cipher

$E_k$    A keyed instance of the block cipher $E$

$R_i$    The $i$-th round of a product cipher

$R_i$    The $i$-th unkeyed round function of a key-alternating cipher

$\cdot \wedge \cdot$    The bit-wise AND operation in $\mathbb{F}_2^m$, i.e., $(x_1, \dots, x_m) \wedge (y_1, \dots, y_m) := (x_1 y_1, \dots, x_m y_m)$

$\overline{\,\cdot\,}$    The bit-wise NOT operation in $\mathbb{F}_2^m$, i.e., $\overline{(x_1, \dots, x_m)} := (x_1 + 1, \dots, x_m + 1)$

$\cdot \vee \cdot$    The bit-wise OR operation in $\mathbb{F}_2^m$, i.e., $(x_1, \dots, x_m) \vee (y_1, \dots, y_m) := \overline{\overline{(x_1, \dots, x_m)} \wedge \overline{(y_1, \dots, y_m)}}$

$\lll r$    A cyclic $r$-bit rotation to the left, i.e., for $(x_1, \dots, x_m) \in \mathbb{F}_2^m$, $((x_1, \dots, x_m) \lll r) := (x_{r+1}, \dots, x_m, x_1, \dots, x_r)$

$\ggg r$    A cyclic $r$-bit rotation to the right, i.e., for $(x_1, \dots, x_m) \in \mathbb{F}_2^m$, $((x_1, \dots, x_m) \ggg r) := (x_{m-r+1}, \dots, x_m, x_1, \dots, x_{m-r})$

$\gg r$    An $r$-bit shift to the right, i.e., for $(x_1, \dots, x_m) \in \mathbb{F}_2^m$, $((x_1, \dots, x_m) \gg r) := (0, \dots, 0, x_1, \dots, x_{m-r})$

$\cdot \| \cdot$    Concatenation of vectors, i.e., $(x_1, \dots, x_l) \| (y_1, \dots, y_m) := (x_1, \dots, x_l, y_1, \dots, y_m)$

$w_1(x)$    The Hamming weight of a vector $x = (x_1, \dots, x_d) \in \mathbb{F}_2^d$, i.e., the number of non-zero coefficients $x_i$

$w_s(x)$    The weight with respect to $s$-bit words of $x = (x_1, \dots, x_d) \in (\mathbb{F}_2^s)^d$, i.e., the number of non-zero coefficients $x_i$

$\texttt{0}, \dots, \texttt{9}, \texttt{A}, \dots, \texttt{F}$    Digits in base-16 representation (denoted in typewriter font), also used for representing binary vectors, e.g., $\texttt{C} := (1, 1, 0, 0)$
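To make the index conventions of the bit-vector notation concrete, the following is a minimal sketch in Python. It assumes that a vector $(x_1, \dots, x_m)$ is modelled as a tuple of 0/1 integers and that a vector of $d$ words of $s$ bits is stored flattened as $d \cdot s$ bits. All function names are hypothetical and chosen only for this illustration.

```python
# Bit vectors are modelled as tuples (x_1, ..., x_m) with entries in {0, 1}.
# All names below are illustrative assumptions, not notation from the thesis.

def band(x, y):
    """Bit-wise AND: (x_1 y_1, ..., x_m y_m)."""
    return tuple(a & b for a, b in zip(x, y))

def bnot(x):
    """Bit-wise NOT: (x_1 + 1, ..., x_m + 1) over F_2."""
    return tuple(a ^ 1 for a in x)

def bor(x, y):
    """Bit-wise OR, expressed as NOT(NOT x AND NOT y)."""
    return bnot(band(bnot(x), bnot(y)))

def rotl(x, r):
    """Cyclic left rotation: (x_{r+1}, ..., x_m, x_1, ..., x_r)."""
    r %= len(x)
    return x[r:] + x[:r]

def rotr(x, r):
    """Cyclic right rotation: (x_{m-r+1}, ..., x_m, x_1, ..., x_{m-r})."""
    r %= len(x)
    return x[len(x) - r:] + x[:len(x) - r]

def shr(x, r):
    """Right shift: (0, ..., 0, x_1, ..., x_{m-r}), for 0 <= r <= m."""
    return (0,) * r + x[:len(x) - r]

def hamming_weight(x):
    """w_1(x): the number of non-zero coefficients of x."""
    return sum(1 for a in x if a != 0)

def word_weight(x, s):
    """w_s(x): the number of non-zero s-bit words (x is stored flattened)."""
    return sum(1 for i in range(0, len(x), s) if any(x[i:i + s]))

def hex_to_vec(digit):
    """Hex digit to a 4-bit vector, most significant bit first."""
    v = int(digit, 16)
    return tuple((v >> (3 - i)) & 1 for i in range(4))
```

For example, `hex_to_vec("C")` gives `(1, 1, 0, 0)`, matching the convention $\texttt{C} := (1, 1, 0, 0)$, and `rotl((1, 0, 0, 0), 1)` moves the leading one to the last position.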

Chapter 1

Introduction

Due to the increasing deployment of connected devices, a huge amount of data is circulating in the “Internet of Things” (IoT). Much of this data contains sensitive, personal information. Consequently, it has to be protected from access by third parties. Many of the desired security goals, with confidentiality and authenticity among the most important, can be realized by symmetric cryptographic algorithms, i.e., block ciphers, stream ciphers, hash functions, and message authentication codes. This thesis focuses on block ciphers, which can be seen as the cornerstones of symmetric cryptography. Indeed, many other cryptographic primitives and even more complex protocols can be realized with them. For instance, if we already have a secure block cipher at hand, there are well-known constructions for building secure hash functions or message authentication codes. Consequently, one needs a secure cipher to start with.

Formally, a block cipher is defined as a family of permutations on a finite message space parametrized by a key from a finite key space. Ideally, each of those permutations should be indistinguishable from a permutation chosen uniformly at random from the set of all permutations on the message space. As this ideal security goal is quite hard to capture, a rather practical “security notion” has been established, i.e., a cryptographic primitive is assumed to be secure if no significant weaknesses have been found over a sufficiently long period of time (e.g., a few years). Nowadays, we have quite efficient and versatile block ciphers of that kind. The most important one is the Advanced Encryption Standard (AES), standardized in 2001 [PUB01]. It can certainly be considered to be the best-understood construction today and, since its publication, no significant weaknesses have been found. Its rather simple and elegant design and its versatility for many applications make it the state-of-the-art cipher today.

However, as the variety of possible applications increases and connected devices become smaller and cheaper, there might be situations in which a cryptographic solution tailored for meeting extremely constrained requirements on performance and efficiency is needed. In this context, one can think of the requirements of having extremely low response time, low energy consumption, occupying little chip area, or having compact code size when implemented in software. Those requirements stand even more in focus in cases where offering security is not the main purpose of the application. Therefore, one might need new kinds of designs, optimized with regard to various lightweight metrics. Over the years, plenty of those lightweight primitives appeared, see e.g., [BP17] for a comprehensive list. Many of those block cipher designs follow the structure of a key-alternating substitution-permutation (SP) cipher. In those structures, the primitive is designed as an iteration of predefined round functions which are interleaved by the addition of round keys. The round functions are defined in a simple way, consisting of a non-linear operation and a linear operation, also called linear layer. It is a well-studied design paradigm to which also the AES belongs. In fact, in many designs, a well-known algorithm (e.g., the AES) is taken and its components are modified to meet the lightweight requirements.

The purpose of this thesis is to broaden the knowledge of lightweight block cipher design. In particular, we study the design of the linear layer in key-alternating ciphers from the viewpoint of optimizing efficiency without sacrificing necessary security properties. When we talk about security, we mean security against well-known attack methods, the most important being differential and linear attacks. Also, as lightweight designs tend to employ sparser components and sometimes rather innovative design considerations, another important part is to avoid the applicability of new, dedicated attack methods.

At the core of our considerations is the well-known wide-trail strategy, introduced in [Dae95]. It suggests that investing some computational effort in a well-chosen linear layer helps to obtain strong arguments on the security of the primitive. In this light, there still exist various possibilities for optimization.

Outline of this Thesis

In Chapter 2, we start with explaining the state of the art in block cipher design, focussing on the most important topics needed for our considerations. We first give an intuitive (and then more formal) notion of security and then focus on the most common constructions for block ciphers today. After that, the two most important attack strategies, i.e., differential and linear attacks, are explained in more detail. We then continue by explaining the wide-trail design strategy, a powerful method for deriving strong security arguments on the resistance against those attacks. The AES, which is designed according to the wide-trail strategy, is presented in this context. We conclude the chapter by explaining various lightweight metrics and typical characteristics of existing lightweight designs. We emphasize that this chapter is merely for introductory purposes and we do not claim any new results. When omitting some of the details, we point to the important literature.

The rest of this thesis is arranged in two parts. The first part sets the focus slightly more on the design of linear layers, whereas the second part is more dedicated to the study of certain cryptanalytic attacks and the analysis of existing lightweight designs. The first part consists of Chapters 3–5.

Chapter 3 studies the problem of efficiently implementing multiplication with fixed elements in finite fields of characteristic two. In particular, we study how choosing an appropriate $\mathbb{F}_2$-basis can reduce the number of XOR operations (i.e., additions in $\mathbb{F}_2$) needed for computing the multiplication. Those results are then used for constructing MDS matrices which need a low number of XOR operations for their implementation. As the linear layer of a block cipher is defined for vector spaces over a finite field of characteristic two, our results shed some light on how a particular choice of field representation can influence the efficiency of the linear layer, and hence the efficiency of the primitive as a whole. This chapter is based on joint work with Thorsten Kranz and Gregor Leander, published in the proceedings of CRYPTO 2016 [BKL16].

Chapter 4 focusses on the linear layer in lightweight AES-like ciphers, i.e., lightweight ciphers that resemble the round function employed in the AES. More precisely, we study how the search for an optimal word permutation, to be used as an alternative for the AES ShiftRows operation, can be conducted in an exhaustive manner, i.e., without relying on a reduction of the search space. We give an algorithm that enumerates all word permutations up to some reasonable notion of equivalence. The lightweight block cipher Midori then serves as a case study for finding the best word permutation in terms of resistance against differential and linear attacks. This cipher is particularly interesting as the designers based their search on heuristic conditions. Indeed, there exists an alternative word permutation that slightly outperforms the original choice. This chapter is based on joint work with Gianira Alfarano, Stefan Kölbl and Gregor Leander.

Chapter 5 presents the two families of lightweight tweakable block ciphers Skinny and Mantis. They were designed for having competitive performance in various environments while at the same time providing strong arguments on the resistance against related-key attacks. Those attacks, which were already introduced in the 1990s, assume a more powerful adversary that is able to encrypt (resp. decrypt) messages under several keys that are related to the key in use. As a tweakable block cipher can be defined as a block cipher for which parts of the key are assumed to be public, this adversary model is of significant importance. Before we turn to describing the particular designs, we recall the formal notion of the related-key adversary model and tweakable block ciphers. Along with describing the design of Skinny and Mantis and their design rationale, we then explain in more detail how the search for the choice of the linear layer was conducted and how its security was evaluated using Mixed-Integer Linear Programming. The results in this chapter are based on joint work with Jérémy Jean, Stefan Kölbl, Gregor Leander, Amir Moradi, Thomas Peyrin, Yu Sasaki, Pascal Sasdrich and Siang Meng Sim, published in the proceedings of CRYPTO 2016 [BJK+16a].

The last two chapters form the second part of this thesis. In Chapter 6, we focus on invariant attacks on lightweight ciphers. Those attacks were introduced in [LAAZ11] in the context of invariant subspaces, and in [TLS16] as the more general nonlinear invariant attack. Those attacks allow one to easily distinguish the cipher from a random permutation for a particular class of weak keys, by exploiting approximations by (nonlinear) Boolean functions. After explaining the attack, we present a link to linear cryptanalysis. In particular, in some cases, the existence of an invariant attack for a particular weak key implies the existence of a highly biased linear approximation over the keyed instance. Then, for SP ciphers with a simple derivation of the round keys, the question we study is how a particular choice of the linear layer, together with a particular choice of round constants, affects the applicability of the attack. We show that the lightweight block ciphers Skinny-64-64, Prince and Mantis$_7$ (with tweak zero) are resistant against a large class of invariant attacks, including all the practical attacks we are aware of from the literature. In this context, we also explain a straightforward approach by which a designer can protect against this class of invariant attacks by carefully choosing the round constants along with the linear layer. Large parts of this chapter are based on joint work with Anne Canteaut, Gregor Leander and Yann Rotella, published in the proceedings of CRYPTO 2017 [BCLR17].

Chapter 7 focusses on the Simon family of lightweight block ciphers, an innovative design presented by the US National Security Agency. Almost all of the security analysis was conducted and published by third-party researchers, in most cases with the help of computer-aided tools. We outline a more theoretical security argument against differential attacks that is verifiable by hand. For that, the round function of Simon (and, more generally, of Simon-like designs) is considered as a separation into a non-linear and a linear operation. The applicability of our security argument can then be formulated as a property of this linear operation. The chapter is based on the author’s publication that appeared in the proceedings of SCN 2016 [Bei16].

We conclude each of the Chapters 3–7 by pointing to possible considerations for future work.

On the Author’s Contribution

Large parts of this thesis are based on joint work with other co-authors. Therefore, we start each chapter that contains joint work by mentioning the particular author’s contribution to the results.

Chapter 2

State of the Art in Block Cipher Design

We start with the formal definition of a block cipher as a family of permutations on a finite message space $\mathcal{M}$ indexed by a key from a finite key space $\mathcal{K}$. As modern ciphers are implemented on a computer, we will without loss of generality encode messages and keys by binary vectors of fixed length $n$ and $\kappa$, respectively.

Definition 2.1 (Block Cipher). Let $n, \kappa \in \mathbb{N}$. An $(n, \kappa)$-block cipher is a function

$$E : \mathbb{F}_2^n \times \mathbb{F}_2^\kappa \to \mathbb{F}_2^n$$

with the property that, for each $k \in \mathbb{F}_2^\kappa$, the projection $E_k := E(\cdot, k)$ is a permutation on $\mathbb{F}_2^n$. Thereby, $k$ is called the key of the keyed instance $E_k$.
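The permutation requirement of Definition 2.1 can be checked exhaustively for very small parameters. The following is a minimal sketch, assuming that vectors in $\mathbb{F}_2^n$ are encoded as integers below $2^n$. The toy cipher, its S-box table, and all function names are hypothetical and serve only as an illustration; they are not a construction from this thesis and offer no security.

```python
# An arbitrary permutation of {0, ..., 15}, chosen only for illustration.
SBOX = (0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2)
INV_SBOX = tuple(SBOX.index(y) for y in range(16))

def toy_encrypt(x, k):
    """Hypothetical (4, 4)-block cipher: three rounds of key addition and S-box."""
    for _ in range(3):
        x = SBOX[x ^ k]
    return x

def toy_decrypt(y, k):
    """Inverse of toy_encrypt under the same key."""
    for _ in range(3):
        y = INV_SBOX[y] ^ k
    return y

def is_block_cipher(E, n, kappa):
    """Check that E(., k) is a permutation on n-bit values for every kappa-bit key."""
    domain = list(range(2 ** n))
    return all(sorted(E(x, k) for x in domain) == domain
               for k in range(2 ** kappa))

def not_a_cipher(x, k):
    """Counterexample: x AND k is not injective for most keys."""
    return x & k
```

Here `is_block_cipher(toy_encrypt, 4, 4)` holds because each round is a composition of bijections, whereas `not_a_cipher` fails the check already for the all-zero key.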

Given an $(n, \kappa)$-block cipher $E$, we refer to the parameter $n$ as its block length and to $\kappa$ as its key length. Whenever the parameters are clear from the context, we simply refer to $E$ as a block cipher. Given the values $x \in \mathbb{F}_2^n$ and $y = E_k(x)$, we call $x$ the message or plaintext and $y$ the ciphertext corresponding to encryption under $E_k$. A ciphertext $y \in \mathbb{F}_2^n$ can be decrypted to the corresponding plaintext $x$ via $x = E_k^{-1}(y)$.

Typical values for the block length in modern ciphers are 64, 128 or 256. To encrypt messages of arbitrary length, one can employ the block cipher in a so-called mode of operation. This allows one to extend the block cipher $E$ to a family of permutations $E$ on $(\mathbb{F}_2^n)^*$, i.e., the set of binary strings of any finite length that is a multiple of $n$. As an example, we mention the counter mode (CTR mode), which is included in the list of recommendations from the US National Institute of Standards and Technology (NIST) [Dwo01]. In the CTR mode, a message $(x_1, \dots, x_m) \in (\mathbb{F}_2^n)^*$, with $x_i \in \mathbb{F}_2^n$, is encrypted under the key $k \in \mathbb{F}_2^\kappa$ via

$$(x_1, \dots, x_m) \overset{E_k}{\longmapsto} (x_1 + E_k(c_1),\, x_2 + E_k(c_2),\, \dots,\, x_m + E_k(c_m)) \tag{2.1}$$

for some pre-defined counter values $c_i \in \mathbb{F}_2^n$. For instance, $c_1$ can be chosen as the encoding of a natural number $l_1 < 2^n$ and, for $i > 1$, $c_i$ as the encoding of $l_{i-1} + 1 \bmod 2^n$. In this thesis, we only study block ciphers and do not focus on their operation modes. Therefore, we do not go into further detail here. More examples of the possible usage of block ciphers can be found, e.g., in [KR11, Section 4].
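The CTR encryption rule (2.1) can be sketched in a few lines. The example below assumes $n = 8$, encodes blocks and counters as integers below $2^n$, and realizes addition in $\mathbb{F}_2^n$ as XOR. The function `toy_E` is a hypothetical stand-in for a keyed instance $E_k$ and is not secure; the names `ctr_encrypt`, `ctr_decrypt` and the parameter `l1` are likewise illustrative assumptions.

```python
N = 8                   # block length n (toy value, assumed for illustration)
MASK = (1 << N) - 1

def toy_E(x, k):
    """Hypothetical keyed permutation on n-bit values (insecure placeholder)."""
    for _ in range(4):
        x = ((x ^ k) * 5 + 3) & MASK              # bijective affine step mod 2^n
        x = ((x << 3) | (x >> (N - 3))) & MASK    # rotate left by 3 bits
    return x

def ctr_encrypt(blocks, k, l1=0):
    """Apply (2.1): block x_i is mapped to x_i + E_k(c_i), with c_i encoding l_i."""
    out = []
    l = l1
    for x in blocks:
        out.append(x ^ toy_E(l, k))   # addition in F_2^n is XOR
        l = (l + 1) & MASK            # next counter: l + 1 mod 2^n
    return out

# Adding the same key stream twice cancels, so decryption is the same map.
ctr_decrypt = ctr_encrypt
```

As a usage example, `ctr_decrypt(ctr_encrypt([0x12, 0x34, 0x56], 0xA7, 254), 0xA7, 254)` returns the original blocks; the counter wraps from 255 to 0 for the third block. Note that only the forward direction of the cipher is evaluated in this mode.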

It is worth remarking that it must be possible to evaluate the block cipher $E$ in an efficient way. An efficient evaluation is crucial as the cipher actually has to be computed in practice. As an example, one could imagine a random $(128, 128)$-block cipher which is simply defined by its look-up table. However, this look-up table would contain so much information (i.e., its description would be $2^{128} \cdot 2^{128} \cdot 128$ bits) that it is not possible to store it and thus to evaluate the cipher in practice. Therefore, one has to rely on simpler constructions of block ciphers. Typical constructions of modern block ciphers are shown in Section 2.2. First, we explain what we require from a secure cipher.
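The size estimate for the look-up table can be reproduced with a short calculation. The sketch below simply counts one $n$-bit output for each of the $2^\kappa$ keys and each of the $2^n$ messages; the function name `lookup_table_bits` is a hypothetical choice for this illustration.

```python
def lookup_table_bits(n, kappa):
    """Bits needed to tabulate E_k(x) for every key k and every message x."""
    return (2 ** kappa) * (2 ** n) * n

# For the (128, 128) example: 2^128 * 2^128 * 128 = 2^263 bits.
full_size = lookup_table_bits(128, 128)

# For comparison, a toy (4, 4)-block cipher needs 16 * 16 * 4 bits.
toy_size = lookup_table_bits(4, 4)
```

Since $128 = 2^7$, the description of the $(128, 128)$ table amounts to $2^{263}$ bits, whereas the table of a toy $(4, 4)$-block cipher fits in 1024 bits.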

2.1 Security Notions

One of the main purposes of a block cipher is to protect confidential data from the eyes of a third party, called the adversary. Therefore, it is necessary to first define some notion of security and to specify against what kind of adversaries one has to protect. Before we actually provide a formal definition, we give an informal intuition in the following.

Without explaining what security actually means, we require that the securityof a block cipher E (or better said the security of a keyed instance Ek) shouldsolely rely on the secrecy of the key k and not on the secrecy of the definition of E.In particular, we assume that the description of E is publicly available and, givenk and x, everyone can efficiently evaluate Ek(x) or its inverse Ek

−1(x). This goesback to one of the postulates of Auguste Kerkhoff from 1883, which is nowadaysbetter known as Kerkhoff’s principle.1

Kerkhoff’s Principle [Ker83]. The security of a cryptosystem should notrely on its secrecy. It should stay secure, even if it is in the hands of theadversary.

One of the main benefits of sticking to this principle is that, due to the public knowledge of $E$, the security of the block cipher can be analyzed by experts from all over the world and possible weaknesses are more likely to be revealed. Moreover, (short) keys are easier to keep secret than whole algorithms.

But what do we actually require from a secure cipher? Obviously, we do not want an adversary, given some plaintext/ciphertext data $(x_i, E_k(x_i))_i$, to be able to recover the secret key $k$. If the key were in the hands of the adversary, it could simply decrypt all encrypted messages it intercepts. Therefore, protection against those so-called key-recovery attacks is a necessary requirement. However, this is not sufficient. Just consider the trivial block cipher $E$ for which $E_k = \mathrm{id}$ for all keys $k$. Clearly, it is not possible to reveal any information on the actual key as it is simply not used for encryption. Obviously, such a cipher should not be considered secure as one immediately obtains information on the plaintext (in this example, one even obtains the whole plaintext). What we rather want to achieve is that any keyed instance is indistinguishable from a permutation that is chosen uniformly at random. This has become the state-of-the-art notion of security and we mainly aim for protecting against those so-called distinguishing attacks.

$^1$ The postulate was originally stated in French. This is a loose translation.

The Power of the Adversary

We further have to think about what kind of adversaries we want to protect against. In particular, one can make different assumptions on the kind of data that is in the hands of the adversary.

In a Ciphertext-Only Attack (COA), the adversary only has knowledge of a single (or multiple) ciphertext(s) $(E_k(x_i))_i$. Its aim is to gain some information on the message $x$ or the secret key $k$.

In a Known-Plaintext Attack (KPA), the adversary knows some pre-defined message/ciphertext pairs $(x_i, E_k(x_i))_i$.

In a Chosen-Plaintext Attack (CPA), the adversary is able to choose messages $x_i$ on its own and has knowledge of $(x_i, E_k(x_i))_i$, i.e., it is able to encrypt any chosen message.

In a Chosen-Ciphertext Attack (CCA), the adversary has knowledge of tuples $(x_i, E_k(x_i))_i$ for either chosen plaintexts $x_i$ or chosen ciphertexts $E_k(x_i)$, i.e., it is able to encrypt and decrypt messages of its choice.

As a designer of a cipher, it is reasonable to be rather conservative. In particular, one usually aims for protecting against the last two scenarios (CPA and CCA) as they assume the most powerful adversaries.

A Formal Security Definition

We have already given some intuition on the requirements of a secure block cipher. In this section, we would like to outline, in a more formal way, what it actually means for a block cipher to be secure against CPA and CCA attacks. For this, we use the notion of a pseudorandom permutation as described in, e.g., [BR05, Section 4]. Note that throughout this chapter, we only consider the single-key adversary model. The more involved related-key model is considered later in Chapter 5.

As we want to protect against distinguishing attacks, we require from a secure cipher that once the key is fixed and kept secret, an adversary cannot efficiently distinguish the keyed instance from a permutation that is chosen uniformly at random. Here, the restriction to efficient adversaries is crucial. As an example for a non-efficient adversary, one could consider an algorithm that simply tries all possible keys $k$ and computes $E_k(x)$ for all $x$. Of course, this adversary could distinguish a keyed instance from a random permutation as it simply computes the complete look-up table for $E_k$ (see also Example 2.1). Therefore, we naturally have to restrict the computational resources of the adversary and we only consider computationally bounded algorithms as possible adversaries.

Formally, a $(q, t)$-adversary is defined as a (probabilistic) algorithm $\mathcal{A}_{q,t}$ with running time$^2$ at most $t$ that is allowed to make at most $q$ oracle queries to given functions $\mathcal{O}_1, \dots, \mathcal{O}_l$. Here, $q$ denotes the total number of allowed queries. After termination, the algorithm outputs a bit $b \in \{0, 1\}$. We write $\mathcal{A}_{q,t}^{\mathcal{O}_1, \dots, \mathcal{O}_l} \Rightarrow b$ to denote the event that the adversary $\mathcal{A}_{q,t}$ outputs $b$, given oracle access to $\mathcal{O}_1, \dots, \mathcal{O}_l$. In this context, oracle access to some function $\mathcal{O}$ means that the adversary has no further knowledge about how the function is defined, except that it learns the output $\mathcal{O}(x)$ whenever it requests the function value of $x$.

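Now, suppose we are given a block cipher $E : \mathbb{F}_2^n \times \mathbb{F}_2^\kappa \to \mathbb{F}_2^n$. We consider an adversary $\mathcal{A}_{q,t}$ whose goal is to distinguish a keyed instance $E_k$ from a permutation chosen uniformly at random from the class of all permutations on $\mathbb{F}_2^n$. The adversary will either interact with the keyed instance $E_k$ or with a random permutation (as oracles) and has to decide (by outputting the bit $b$) with which particular permutation it interacted. In this model, the (CPA)-advantage of the adversary $\mathcal{A}_{q,t}$ is formally defined as

\[
  \mathrm{Adv}^{\mathrm{PRP\text{-}CPA}}_{E}(\mathcal{A}_{q,t}) :=
  \mathrm{Prob}_{k \stackrel{\$}{\leftarrow} \mathbb{F}_2^\kappa}\left(\mathcal{A}_{q,t}^{E_k} \Rightarrow 1\right)
  - \mathrm{Prob}_{\pi \stackrel{\$}{\leftarrow} \mathrm{Perm}_n}\left(\mathcal{A}_{q,t}^{\pi} \Rightarrow 1\right).
\]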

Here, $k \stackrel{\$}{\leftarrow} \mathbb{F}_2^\kappa$ denotes that the key $k$ is chosen uniformly at random from the set $\mathbb{F}_2^\kappa$ of possible keys, and $\pi \stackrel{\$}{\leftarrow} \mathrm{Perm}_n$ denotes that the permutation $\pi$ is chosen uniformly at random from the set of all permutations on $\mathbb{F}_2^n$. Thus, the probabilities are defined over the uniform choices of $k$ and $\pi$ and over the random choices that the probabilistic adversary $\mathcal{A}_{q,t}$ makes.
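The two probabilities in the CPA-advantage can be estimated empirically for toy parameters. The following Python sketch is only an illustration: the 8-bit cipher `toy_cipher` (a plain XOR with the key), the two-query `adversary` and the Monte Carlo estimator are assumptions made for this example and do not appear in the definitions above.

```python
import random

N_BITS = 8  # toy block length, far too small for any real use


def toy_cipher(x, k):
    """Hypothetical, deliberately weak cipher: E_k(x) = x XOR k."""
    return x ^ k


def adversary(oracle):
    """Makes q = 2 queries and outputs 1 if the oracle looks like x -> x XOR k."""
    return 1 if oracle(0) ^ oracle(1) == 1 else 0


def estimate_advantage(trials=2000, seed=1):
    """Monte Carlo estimate of Prob(A^{E_k} => 1) - Prob(A^{pi} => 1)."""
    rng = random.Random(seed)
    size = 1 << N_BITS
    real = 0
    ideal = 0
    for _ in range(trials):
        k = rng.randrange(size)              # uniformly random key
        real += adversary(lambda x: toy_cipher(x, k))
        perm = list(range(size))
        rng.shuffle(perm)                    # uniformly random permutation
        ideal += adversary(lambda x: perm[x])
    return real / trials - ideal / trials


advantage = estimate_advantage()
print(advantage)
```

For this weak cipher the adversary always outputs 1 in the real world, whereas a random permutation satisfies the tested relation only rarely, so the estimate should come out close to 1.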

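One can further allow oracle access to the inverse permutation in order to equip the adversary with more power. Then, the (CCA)-advantage of $\mathcal{A}_{q,t}$ can be formally defined as

\[
  \mathrm{Adv}^{\mathrm{PRP\text{-}CCA}}_{E}(\mathcal{A}_{q,t}) :=
  \mathrm{Prob}_{k \stackrel{\$}{\leftarrow} \mathbb{F}_2^\kappa}\left(\mathcal{A}_{q,t}^{E_k, E_k^{-1}} \Rightarrow 1\right)
  - \mathrm{Prob}_{\pi \stackrel{\$}{\leftarrow} \mathrm{Perm}_n}\left(\mathcal{A}_{q,t}^{\pi, \pi^{-1}} \Rightarrow 1\right).
\]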

These notions can be considered as a measure of how well a particular adversary is able to "break" the cipher. In the following, we give an example of an adversary that adheres to the very generic brute-force strategy.

$^2$ In principle, one could also consider other computational resources, e.g., memory.

Example 2.1 (Brute-Force Attack). We consider the family of adversaries $\mathrm{BF}_{q,t}$, each given oracle access to an $n$-bit permutation $\mathcal{O}$, as described in Algorithm 2.1. The constant $T_E$ denotes the time to evaluate one block cipher call $E_{k'}(x)$, depending on the machine on which the algorithm runs. For simplicity, we neglect the time for the other operations executed in Algorithm 2.1 and thus basically measure the time with regard to the number of block cipher calls.$^3$ One can give a lower bound on the advantage of the adversary $\mathrm{BF}_{q,t}$. In particular, we have

\[
  \mathrm{Prob}_{k \stackrel{\$}{\leftarrow} \mathbb{F}_2^\kappa}\left(\mathrm{BF}_{q,t}^{E_k} \Rightarrow 1\right)
  \;\ge\; \frac{\left\lfloor \frac{t}{q T_E} \right\rfloor}{2^\kappa}
\]

and

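\[
  \mathrm{Prob}_{\pi \stackrel{\$}{\leftarrow} \mathrm{Perm}_n}\left(\mathrm{BF}_{q,t}^{\pi} \Rightarrow 1\right)
  \;\le\; \mathrm{Prob}\left(\exists k' : \forall i \in \{1, \dots, q\} : y_i = E_{k'}(x_i)\right)
  \;\le\; \frac{2^\kappa}{2^n (2^n - 1) \cdots (2^n - q + 1)}.
\]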

This immediately leads to a lower bound for the advantage as

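\[
  \mathrm{Adv}^{\mathrm{PRP\text{-}CPA}}_{E}(\mathrm{BF}_{q,t})
  \;\ge\; \left\lfloor \frac{t}{q T_E} \right\rfloor 2^{-\kappa}
  - \frac{2^\kappa}{2^n (2^n - 1) \cdots (2^n - q + 1)}.
\]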

The second term is the only one that depends on the block length $n$, and it tends to zero exponentially as the number of oracle queries $q$ increases. Thus, a very small value for $q$ suffices in order to approximate this term by zero. For instance, if $E$ is a $(128, 64)$-block cipher, we could choose $q = 1$ and $t$ such that $\lfloor t / T_E \rfloor = 2^{63}$ and obtain

\[
  \mathrm{Adv}^{\mathrm{PRP\text{-}CPA}}_{E}(\mathrm{BF}_{q,t}) \;\ge\; 49.9999\% .
\]

The algorithm would therefore be at least about 50% more successful in distinguishing the cipher from a random permutation than an algorithm which simply guesses the value $b \stackrel{\$}{\leftarrow} \{0, 1\}$.

When the block length is not larger than the key length, one would choose a slightly higher value for $q$. For instance, for a $(64, 64)$-block cipher, $q = 2$ would be reasonable.
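The lower bound of Example 2.1 can be evaluated exactly with rational arithmetic. The sketch below is an illustration only: the function name `bf_advantage_lower_bound` and the parameter `key_trials` (standing for $\lfloor t / (q T_E) \rfloor$) are names introduced here. For the $(64, 64)$ case with $q = 2$, we assume the same time budget as before, so that only $2^{62}$ keys can be tried.

```python
from fractions import Fraction


def bf_advantage_lower_bound(n, kappa, q, key_trials):
    """Lower bound from Example 2.1; key_trials stands for floor(t / (q * T_E))."""
    hit = Fraction(key_trials, 2 ** kappa)
    denominator = 1
    for i in range(q):
        denominator *= 2 ** n - i            # 2^n (2^n - 1) ... (2^n - q + 1)
    false_positive = Fraction(2 ** kappa, denominator)
    return hit - false_positive


# (128, 64)-block cipher, q = 1, 2^63 key trials
bound_128_64 = bf_advantage_lower_bound(n=128, kappa=64, q=1, key_trials=2 ** 63)
# (64, 64)-block cipher: a single query gives a vacuous bound ...
bound_64_64_q1 = bf_advantage_lower_bound(n=64, kappa=64, q=1, key_trials=2 ** 63)
# ... whereas two queries make the second term negligible again
bound_64_64_q2 = bf_advantage_lower_bound(n=64, kappa=64, q=2, key_trials=2 ** 62)

print(float(bound_128_64), float(bound_64_64_q1), float(bound_64_64_q2))
```

With $q = 1$ and $n = \kappa = 64$, the second term equals 1 and the bound becomes negative, which illustrates why a second query is needed in that setting.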

The brute-force attack described above is very generic and does not exploit any particular structure of the block cipher. Moreover, it suggests that a sufficiently large key length should be used. Although executing $2^{63}$ block cipher calls is a huge effort, it is not completely out of reach, especially with a large supercomputer. In comparison, the three versions of the Advanced Encryption Standard (AES), which were standardized by NIST in 2001 [PUB01], provide key lengths of 128, 192 and 256 bits, respectively. In its report on lightweight cryptography [MBTM17], NIST indicates that new ciphers to be standardized should provide a key length of at least $\kappa = 112$.

From a secure block cipher, we would require that there exists no adversary with a significantly high advantage for reasonable restrictions on $q$ and $t$. This leads to the notion of a (strong) pseudorandom permutation as defined in the following.

$^3$ We do not measure the time for the oracle queries as $q$ in $\mathrm{BF}_{q,t}$ would usually be fairly low, i.e., $q \le 3$.

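Algorithm 2.1 $\mathrm{BF}_{q,t}$

    Choose $q$ pairwise different $x_1, \dots, x_q \stackrel{\$}{\leftarrow} \mathbb{F}_2^n$
    Compute $(y_1, \dots, y_q) \leftarrow (\mathcal{O}(x_1), \dots, \mathcal{O}(x_q))$
    $L \leftarrow \{\}$
    for $i \in \{1, \dots, \lfloor t / (q T_E) \rfloor\}$ do
        $k' \stackrel{\$}{\leftarrow} \mathbb{F}_2^\kappa \setminus L$
        if $\forall j \in \{1, \dots, q\} : y_j = E_{k'}(x_j)$ then
            return 1
        end if
        $L \leftarrow L \cup \{k'\}$
    end for
    return 0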
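Algorithm 2.1 can be tried out on toy parameters. The following Python sketch assumes a hypothetical $(16, 8)$-block cipher `toy_encrypt` that was made up for this illustration. The parameter `max_trials` plays the role of $\lfloor t / (q T_E) \rfloor$, and sampling the key guesses without replacement replaces the explicit set $L$.

```python
import random

N_BITS, KEY_BITS = 16, 8  # toy parameters, chosen only to keep the search tiny


def toy_encrypt(x, k):
    """Hypothetical toy (16, 8)-block cipher; a permutation of x for each key k."""
    y = (x + k) % (1 << N_BITS)
    y = (y * 40503) % (1 << N_BITS)   # odd multiplier, hence a bijection mod 2^16
    return y ^ (k << 8)


def brute_force(oracle, q, max_trials, rng):
    """Returns 1 if some tried key is consistent with all q oracle answers."""
    xs = rng.sample(range(1 << N_BITS), q)          # q pairwise different inputs
    ys = [oracle(x) for x in xs]
    # Sampling without replacement plays the role of the set L of tried keys.
    for k_guess in rng.sample(range(1 << KEY_BITS), max_trials):
        if all(toy_encrypt(x, k_guess) == y for x, y in zip(xs, ys)):
            return 1
    return 0


rng = random.Random(2017)
secret_key = rng.randrange(1 << KEY_BITS)
perm = list(range(1 << N_BITS))
rng.shuffle(perm)                                   # stand-in for a random permutation

out_real = brute_force(lambda x: toy_encrypt(x, secret_key), 2, 1 << KEY_BITS, rng)
out_ideal = brute_force(lambda x: perm[x], 2, 1 << KEY_BITS, rng)
print(out_real, out_ideal)
```

Since all $2^8$ keys are tried here, the real-world run must find the secret key, while a false positive against the random permutation is very unlikely for $q = 2$.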

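Definition 2.2 ((Strong) Pseudorandom Permutation, see e.g., [BR05]). We say that the block cipher $E$ is a $(q, t, \varepsilon)$-PRP (pseudorandom permutation) if

\[
  \max_{\mathcal{A}_{q,t}} \mathrm{Adv}^{\mathrm{PRP\text{-}CPA}}_{E}(\mathcal{A}_{q,t}) \le \varepsilon .
\]

Analogously, we say $E$ is a $(q, t, \varepsilon)$-SPRP (strong pseudorandom permutation) if

\[
  \max_{\mathcal{A}_{q,t}} \mathrm{Adv}^{\mathrm{PRP\text{-}CCA}}_{E}(\mathcal{A}_{q,t}) \le \varepsilon .
\]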

These notions of a pseudorandom permutation and a strong pseudorandom permutation formalize the intuition of security against chosen-plaintext and chosen-ciphertext attacks, respectively. However, these notions are only capable of defining security up to a certain threshold ($\varepsilon$) and only against adversaries with bounded resources $(q, t)$. Therefore, in practice one always has to evaluate which assumptions on the adversary are realistic and which level of security one wants to achieve. This of course highly depends on the particular application. As a guideline, one may aim for a cipher for which there is essentially no better attack than brute force.

Provable Security and Block Cipher Design

The benefit of the above formal security definition is that it provides a framework for proving security of (more complex) constructions like cryptographic protocols, modes of operation, or other (S)PRPs by reducing to the security of the underlying (S)PRPs. In other words, under the assumption that $E$ is a (strong) pseudorandom permutation with parameters $(q, t, \varepsilon)$, one tries to build other constructions that can be proven secure for certain parameters $(q', t', \varepsilon')$ depending on $(q, t, \varepsilon)$. A lot of research focuses mainly on this kind of provable security. However, by the conceptual method of reduction, the security of complex cryptographic systems relies on the security of the underlying building blocks (here the underlying block cipher). Consequently, one needs a secure cipher to start with.

Figure 2.1: A product cipher.

The scope of block cipher design, as we focus on it in this thesis, can be phrased as building efficient algorithms to be used as block ciphers and providing sound arguments that justify the assumption that such an algorithm is a (strong) pseudorandom permutation. In fact, we are not aware of any block cipher that is proven to be an (S)PRP for reasonable parameters $q, t, \varepsilon$ without any reduction argument. The problem is simply that, given parameters $q, t$, one has to consider all possible $(q, t)$-adversaries. Rather, the state of the art is to propose a block cipher design and then analyze the resistance of the block cipher against certain well-studied kinds of attacks. In a nutshell, one studies adversaries that adhere to specific strategies, the most important being differential and linear attacks. They are described in detail in Section 2.3.1 and Section 2.3.2, respectively.

The designers of new block ciphers usually claim security of their cipher up to certain parameters and directly provide sound arguments on the resistance against well-studied attack strategies. After the publication of the design, the scientific community further analyzes possible methods for attacking the cipher. If no significant weaknesses have been found over a certain period of time (usually a few years), it may be safe to conjecture that the cipher is an (S)PRP up to certain parameters $(q, t, \varepsilon)$. Then, the block cipher may be used as a building block in provably secure constructions.

As the most prominent example of a block cipher, we refer to the AES. Its specification has been publicly available for over 15 years now and it can certainly be considered the best-analyzed block cipher to date. No significant weaknesses have been spotted. The cipher will be explained in detail in Section 2.4.2.

2.2 Block Cipher Constructions

We have already seen an example of an obviously bad block cipher, a cipher in which the key has no influence on the encryption at all. Before we explain common constructions for block ciphers today, we give another example of a weak block cipher. For that, we consider a cipher $E$ that splits into two (or more) smaller ciphers $E'$ and $E''$, i.e., a message $x = (x_1, x_2)$ is encrypted as

\[
  E_k(x_1, x_2) = \left(E'_{k'}(x_1), E''_{k''}(x_2)\right) .
\]

Such a construction should not be considered a secure block cipher, as changing only one part of the message ($x_1$ or $x_2$) will only change one part of the ciphertext. This may allow for statistical attacks in which the knowledge of the distribution of possible messages is exploited. In the seminal work of Claude Shannon from 1949 [Sha49], the author provides several design principles that a modern cipher should adhere to in order to avoid such statistical attacks. These principles were called confusion and diffusion and, indeed, they are still of major importance in modern designs.
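The weakness of such a split construction is easy to observe on a toy instance. In the following Python sketch, the 8-bit cipher `small_encrypt` and all constants are hypothetical choices made for illustration. Both halves use the same small cipher under different keys.

```python
def small_encrypt(x, k):
    """Hypothetical 8-bit toy cipher; the odd multiplier makes it a bijection."""
    return ((x ^ k) * 167 + 13) % 256


def split_encrypt(x1, x2, k1, k2):
    """E_k(x1, x2) = (E'_{k'}(x1), E''_{k''}(x2)): both halves are handled separately."""
    return small_encrypt(x1, k1), small_encrypt(x2, k2)


# Two messages that differ only in the second half x2
c_a = split_encrypt(0x12, 0x34, 0xA5, 0x5A)
c_b = split_encrypt(0x12, 0x35, 0xA5, 0x5A)
print(c_a, c_b)
```

The first ciphertext halves coincide, so an observer learns that the first message halves were equal without knowing any key.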

According to Shannon, confusion means "to make the relation between the simple statistics of (the ciphertext) and the simple description of (the key) a very complex and involved one". Further, to stick to his wording, "in the method of diffusion the statistical structure of (the message) which leads to its redundancy is 'dissipated' into long range statistics – i.e., into statistical structure involving long combinations of letters in the cryptogram" [Sha49, Section 23].

Obviously, we further want to avoid that a cipher is linear. If that were the case, an adversary would just need the encryptions of a basis to be able to encrypt or decrypt arbitrary messages. Therefore, every secure block cipher needs a non-linear component. Most modern designs today realize diffusion by the application of a linear transformation, whereas a non-linear operation provides confusion. The non-linear operation is often realized as a substitution operation on parts of the message space (see Section 2.2.2).
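The attack on a linear cipher can be sketched in a few lines. As an assumed example of an $\mathbb{F}_2$-linear bijection, the Python code below uses multiplication by a non-zero key element in $\mathbb{F}_{2^8}$ (with the reduction polynomial $x^8 + x^4 + x^3 + x + 1$). The cipher `linear_encrypt` and the helper `predict` are hypothetical names introduced here.

```python
def gf_mul(a, b):
    """Multiplication in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result


def linear_encrypt(x, k):
    """Hypothetical F_2-linear toy cipher: multiply x by a non-zero key element."""
    return gf_mul(x, k)


def predict(basis_images, x):
    """Encrypt x using only the ciphertexts of the 8 unit vectors."""
    y = 0
    for i in range(8):
        if (x >> i) & 1:
            y ^= basis_images[i]
    return y


secret_key = 0x53
# The adversary asks for the encryptions of a basis (8 chosen plaintexts) ...
basis_images = [linear_encrypt(1 << i, secret_key) for i in range(8)]
# ... and can then encrypt every message without knowing the key.
recovered = all(predict(basis_images, x) == linear_encrypt(x, secret_key)
                for x in range(256))
print(recovered)
```

Decryption works analogously by inverting the linear map that is determined by the eight basis images.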

Before we go into more detail on modern block cipher constructions, we explain the notion of a product cipher. This concept also goes back to [Sha49] and today, almost all practical block ciphers can be defined as a product cipher. The high-level structure is depicted in Figure 2.1. The cipher $E$ is built from multiple iterations of other (simpler) block ciphers $R_i$, so-called rounds. Each round $R_i$ operates under the influence of a specific round key that is derived from the initial key for $E$ by the so-called key-scheduling algorithm. In other words, each keyed instance $E_k : \mathbb{F}_2^n \to \mathbb{F}_2^n$ is defined as

\[
  E_k = (R_t)_{k_t} \circ \dots \circ (R_1)_{k_1} \circ (R_0)_{k_0} ,
\]

for previously defined block ciphers $R_i : \mathbb{F}_2^n \times \mathbb{F}_2^{\kappa_i} \to \mathbb{F}_2^n$ and round keys $k_i$ which are derived by a function $\mathrm{KeySchedule} : \mathbb{F}_2^\kappa \to \mathbb{F}_2^{\kappa_0} \times \dots \times \mathbb{F}_2^{\kappa_t}$ as $(k_0, k_1, \dots, k_t) = \mathrm{KeySchedule}(k)$.
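The composition of keyed rounds can be sketched as follows. The 8-bit round function and the rotation-based key schedule in this Python example are hypothetical and were chosen only so that each round is a bijection. They are not taken from any real cipher.

```python
def key_schedule(k, rounds):
    """Hypothetical key schedule: round key i is the 8-bit key rotated left by i."""
    return [((k << i) | (k >> (8 - i))) & 0xFF for i in range(rounds)]


def round_function(x, round_key):
    """Hypothetical keyed round; a bijection in x for every round key."""
    return (((x ^ round_key) * 5) + 1) % 256


def product_encrypt(x, k, rounds=4):
    """E_k as the composition of the keyed rounds, applied in order 0, 1, ..., t."""
    for round_key in key_schedule(k, rounds):
        x = round_function(x, round_key)
    return x


is_permutation = len({product_encrypt(x, 0x3C) for x in range(256)}) == 256
print(is_permutation)
```

Because every keyed round is a bijection, the keyed instance of the product cipher is a bijection as well, and the same round code is reused in every iteration.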

There are two main benefits of building block ciphers as product ciphers. Firstly, they tend to be more efficient to implement. If, for instance, the same round is iterated several times, it only has to be implemented once. Secondly, a product cipher can often be analyzed more easily. In fact, modern block ciphers iterate quite simple round functions several times and the security analysis is most often conducted under the assumption of independent round keys$^4$ (see Assumption 2.2).

$^4$ For that reason, we omit the discussion on how to design a key schedule.

Figure 2.2: A key-alternating cipher.

2.2.1 Key-Alternating Ciphers

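A special type of product cipher is the key-alternating cipher [DGV95, DR02], to which a majority of modern designs belong. This type of product cipher specifies exactly how the round keys are introduced within the rounds. The structure is depicted in Figure 2.2. The key-scheduling function has to generate round keys in $\mathbb{F}_2^n$, i.e., $\mathrm{KeySchedule} : \mathbb{F}_2^\kappa \to (\mathbb{F}_2^n)^{t+1}$, and each round $R_i$ is defined as

\[
  R_i : \mathbb{F}_2^n \times \mathbb{F}_2^n \to \mathbb{F}_2^n, \quad
  (x, k_i) \mapsto \mathrm{Add}_{k_i}(R_i(x)) ,
\]

where the unkeyed function $R_i$ on the right-hand side is a bijection on $\mathbb{F}_2^n$ and $\mathrm{Add}_{k_i}$ is defined as$^5$

\[
  \mathrm{Add}_{k_i} : \mathbb{F}_2^n \to \mathbb{F}_2^n, \quad x \mapsto x + k_i .
\]

We then have that any keyed instance of $E$ can be written as

\[
  E_k = \mathrm{Add}_{k_t} \circ R_t \circ \dots \circ \mathrm{Add}_{k_1} \circ R_1 \circ \mathrm{Add}_{k_0} \circ R_0 .
\]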

Note that, according to Kerckhoffs's principle, we assume that all of the unkeyed round functions $R_i$ are publicly known. In particular, any known plaintext $x$ can be transformed by $R_0$ to $R_0(x)$ and thus, the application of $R_0$ in the beginning adds no security to the cipher at all. In practice, one therefore usually starts with the addition of $k_0$, i.e., $R_0 = \mathrm{id}$. Without loss of generality, we can assume $R_0$ to be the identity and refer to the construction depicted in Figure 2.2 (with $R_0 = \mathrm{id}$) as a $t$-round key-alternating cipher.
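The observation about $R_0$ can be checked on a small instance. In the Python sketch below, the public round function `R`, the round keys and the flag `with_r0` are hypothetical choices for illustration. The check confirms that the cipher with an initial $R_0$ equals the cipher without it, evaluated on the publicly computable value $R_0(x)$.

```python
def R(x):
    """Hypothetical public unkeyed round function on 8 bits (a bijection)."""
    return (x * 5 + 1) % 256


def encrypt(x, round_keys, with_r0):
    """Key-alternating cipher; with_r0 toggles the initial unkeyed R_0."""
    if with_r0:
        x = R(x)            # R_0, applied before any key material is used
    x ^= round_keys[0]      # Add_{k_0}
    for k_i in round_keys[1:]:
        x = R(x) ^ k_i      # R_i followed by Add_{k_i}
    return x


round_keys = [0x1F, 0xA0, 0x77, 0x3C]
same = all(encrypt(x, round_keys, True) == encrypt(R(x), round_keys, False)
           for x in range(256))
print(same)
```

Anyone who knows $x$ can compute `R(x)` without the key, so an attack on the variant without $R_0$ carries over directly.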

The benefit of the key-alternating structure is that, on the one hand, it allows for an easier analysis of the cipher under the assumption of independent round keys and, on the other hand, it allows for a quite efficient implementation of the cipher, as changing the particular keyed instance can be realized by simply adding other round keys. Moreover, the unkeyed round functions $R_i$ are often (almost) identical and therefore only a single round has to be implemented.

$^5$ Note that, in the literature, the vector addition $+$ in $\mathbb{F}_2^n$ is also often denoted by $\oplus$ as it corresponds to the bit-wise exclusive-or (XOR) operation.

Figure 2.3: A substitution-permutation (SP) cipher.

A majority of common key-alternating block ciphers follow the notion of a substitution-permutation cipher, which we explain in the following.

2.2.2 Substitution-Permutation Ciphers

A substitution-permutation cipher (also called SP cipher or SP network) defines a special structure of the round functions in a product cipher. The rounds consist of the application of a non-linear operation $S$ (also called substitution layer or S-box layer), which is realized as a parallel application of smaller functions (so-called S-boxes), and the application of a linear transformation $L$ (also called linear layer). Originally, the structure of a substitution-permutation network was introduced in [FNS75], where the linear layer $L$ was defined as a permutation of bits, i.e., a matrix associated with $L$ is a permutation matrix over $\mathbb{F}_2$. Nowadays, most SP ciphers follow the notion of a key-alternating cipher as explained above and the unkeyed round functions can be decomposed into the (invertible) non-linear layer $S$ and the (invertible) linear layer $L$. The structure is depicted in Figure 2.3. Whenever we refer to SP ciphers in the remainder of this thesis, we are talking about this particular structure.

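Formally, the substitution layer $S : \mathbb{F}_2^n \to \mathbb{F}_2^n$ is defined as an $n_s$-times parallel application of an invertible S-box $S_b : \mathbb{F}_2^s \to \mathbb{F}_2^s$ such that $n = n_s \cdot s$. In other words,$^{6,7}$

\[
  S : \mathbb{F}_2^n \cong \mathbb{F}_2^s \times \dots \times \mathbb{F}_2^s \to \mathbb{F}_2^s \times \dots \times \mathbb{F}_2^s \cong \mathbb{F}_2^n, \quad
  (x_1, x_2, \dots, x_{n_s}) \mapsto (S_b(x_1), S_b(x_2), \dots, S_b(x_{n_s})) .
\]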
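A substitution layer of this form takes only a few lines when the S-box is stored as a look-up table. The Python sketch below assumes $s = 4$ and $n_s = 4$ and uses the 4-bit S-box of the block cipher PRESENT purely as an illustrative choice of $S_b$. The packing of the state into an integer (chunk $i$ in bits $4i$ to $4i + 3$) is likewise an assumption of this example.

```python
# 4-bit S-box of PRESENT, used here only as an example of an invertible S_b
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def sbox_layer(state, n_s=4, s=4):
    """Applies S_b in parallel to the n_s chunks of s bits each."""
    out = 0
    for i in range(n_s):
        chunk = (state >> (s * i)) & ((1 << s) - 1)
        out |= SBOX[chunk] << (s * i)
    return out


print(hex(sbox_layer(0x0001)))
```

Since every chunk is processed independently, the layer is invertible exactly when the S-box is, and a change in one chunk never affects another chunk.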

Usually, $s$ is chosen to be rather small. In particular, common choices for $s$ are $s = 4$ or $s = 8$ and therefore, those S-boxes could be simply implemented by storing their look-up tables. Sometimes, algebraic constructions for $S_b$ are used. In that case, the S-boxes may also be computed during execution. Research has been done that focuses on building cryptographically strong S-boxes and on optimizing their efficiency, see for instance [Nyb93, Nyb94, Can05, CDL16, Sto16].

$^6$ In principle, one could use a different S-box at each position. However, for easier implementation, usually the same S-box is applied in parallel.

$^7$ We interchangeably identify $\mathbb{F}_2^{s n_s}$ with $(\mathbb{F}_2^s)^{n_s}$.

Figure 2.4: One round $F_f$ of a Feistel cipher.

Figure 2.5: One round $F_f$ of a key-alternating Feistel cipher.

The linear layer $L$ can be defined by a matrix $M \in \mathrm{GL}_n(\mathbb{F}_2)$ by fixing a particular choice of basis. Then, the application of $L$ corresponds to (left-) multiplication with $M$. Nowadays, in an SP cipher, one allows $L$ to be any invertible linear transformation that is not necessarily a permutation of bits. It was to a large extent the wide-trail strategy [Dae95] that pioneered the usage of a general linear layer instead of a bit permutation. Although a bit permutation may allow for an easier implementation, the wide-trail strategy suggested that the usage of a slightly more complex linear layer may allow for better trade-offs between security and efficiency. We explain the wide-trail strategy in more detail in Section 2.4.
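Multiplication with a binary matrix can be sketched with bitmasks. In the following Python example, each row of $M$ is stored as an integer and an output bit is the parity of the bit-wise AND of a row with the input. The $4 \times 4$ matrix `ROWS` (all ones except for a zero diagonal) is a hypothetical example chosen because it is invertible. The representation is an assumption of this sketch.

```python
# Row i of M as a bitmask: bit j of ROWS[i] is the entry M[i][j]
ROWS = [0b1110, 0b1101, 0b1011, 0b0111]


def apply_linear(rows, x):
    """Computes y = M * x over F_2, with x and y packed into integers."""
    y = 0
    for i, row in enumerate(rows):
        parity = bin(row & x).count("1") & 1   # inner product of row i and x
        y |= parity << i
    return y


is_linear = all(apply_linear(ROWS, a ^ b) == apply_linear(ROWS, a) ^ apply_linear(ROWS, b)
                for a in range(16) for b in range(16))
is_involution = all(apply_linear(ROWS, apply_linear(ROWS, x)) == x for x in range(16))
print(is_linear, is_involution)
```

A bit permutation is the special case in which every row mask has exactly one bit set. The example matrix is not of that form, since each output bit depends on three input bits.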

To return to the principles of Shannon, the S-boxes are to a large extent responsible for obtaining (local) confusion within the cipher. Then, the linear layer should diffuse the information over the whole $n$-bit state. It is exactly this linear layer we focus on in this thesis. In particular, we study the questions of how to design the linear layer in order to improve security of the cipher and efficiency of its implementation with regard to certain lightweight metrics (as explained in Section 2.5).

2.2.3 Feistel and ARX

Another important block cipher construction is the so-called Feistel cipher (or Feistel network), which was first introduced in the design of Lucifer (see [Sor84]), a predecessor of the Data Encryption Standard (DES). The DES was developed in the 1970s and was published as a US FIPS standard in 1977 [PUB77]. A detailed description of the cipher can be found, e.g., in [KR11]. In particular, a Feistel cipher is another special form of product cipher in which all the rounds $R_i$ are of the form $F_f$, which we define below. To define a Feistel round, the requirement is that the block length $n$ is an even number. Let $f : \mathbb{F}_2^{n/2} \times \mathbb{F}_2^\kappa \to \mathbb{F}_2^{n/2}$ be any keyed function. It is not required that $f$ fulfills the definition of a block cipher, i.e., a keyed instance $f_k = f(\cdot, k)$ does not necessarily have to be a bijection. One also calls $f$ the Feistel function. The Feistel round $F_f$ of an $(n, \kappa)$-block cipher is then defined as$^8$

\[
  F_f : \mathbb{F}_2^{n/2} \times \mathbb{F}_2^{n/2} \times \mathbb{F}_2^\kappa \to \mathbb{F}_2^{n/2} \times \mathbb{F}_2^{n/2}, \quad
  (x_l, x_r, k) \mapsto (f_k(x_l) + x_r, x_l) .
\]

In other words, any input $x \in \mathbb{F}_2^n$ is split into two halves $x_l$ and $x_r$. The left half $x_l$ is copied to the right half of the output, and the left half of the output consists of the right input half $x_r$, which is XORed with $f_k(x_l)$. The Feistel round is illustrated in Figure 2.4. It is easy to see that $F_f$ fulfills the definition of an $(n, \kappa)$-block cipher. In particular, the inverse of any keyed instance $F_{f_k} : (x_l, x_r) \mapsto (f_k(x_l) + x_r, x_l)$ can be given as

\[
  F_{f_k}^{-1} : (x_l, x_r) \mapsto (x_r, f_k(x_r) + x_l) .
\]

This already illustrates the advantages of Feistel ciphers: firstly, the Feistel function $f$ can be any keyed function and, secondly, decryption is almost identical to encryption, resulting in a low implementation overhead.
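Both advantages can be observed on a toy Feistel round. In the Python sketch below, the keyed function `feistel_function` is a hypothetical example that is deliberately not a bijection in its input. The half-block size of 8 bits is likewise chosen only for illustration.

```python
HALF_BITS = 8
MASK = (1 << HALF_BITS) - 1


def feistel_function(x, k):
    """Hypothetical keyed function f_k; deliberately not a bijection in x."""
    return ((x & k) * (x | 1)) & MASK


def feistel_round(xl, xr, k):
    """(x_l, x_r) -> (f_k(x_l) + x_r, x_l)"""
    return feistel_function(xl, k) ^ xr, xl


def feistel_round_inverse(yl, yr, k):
    """(y_l, y_r) -> (y_r, f_k(y_r) + y_l); only f_k itself is needed, not its inverse."""
    return yr, feistel_function(yr, k) ^ yl


k = 0x5A
invertible = all(feistel_round_inverse(*feistel_round(xl, xr, k), k) == (xl, xr)
                 for xl in range(256) for xr in range(256))
f_is_bijective = len({feistel_function(x, k) for x in range(256)}) == 256
print(invertible, f_is_bijective)
```

The round is inverted with the same function $f_k$ that is used for encryption, which is why decryption adds almost no implementation overhead.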

In their general form (and also for the particular structure of the DES), Feistel ciphers are structurally different from key-alternating ciphers. However, when the Feistel function $f$ is of the special form

\[
  f : \mathbb{F}_2^{n/2} \times \mathbb{F}_2^{n/2} \to \mathbb{F}_2^{n/2}, \quad
  (x, k) \mapsto \mathrm{Add}_k(f(x))
\]

for a public, unkeyed function on the right-hand side, again denoted by $f : \mathbb{F}_2^{n/2} \to \mathbb{F}_2^{n/2}$, the resulting Feistel cipher can be considered as a key-alternating cipher in which the round keys are only added to the right halves. In particular, any keyed instance of one Feistel round can then be written as

\[
  F_{f_k} : (x_l, x_r) \mapsto (f(x_l) + x_r + k, x_l) .
\]

The structure is depicted in Figure 2.5. An example of a key-alternating Feistel cipher is the lightweight cipher Simon [BSS+13].

$^8$ Originally, instead of copying the left half of the input and processing it through the Feistel function, the right half was copied and processed by the Feistel function. Without loss of generality, we stick to our notion of a Feistel round.

Example 2.2 (Simon). Simon is a block cipher family that was designed by the National Security Agency (NSA) and was published in 2013. It was designed for achieving exceptionally good performance when implemented on a variety of platforms. It employs a very simple function $f$ in the key-alternating Feistel setting. The cipher comes in different versions, supporting block lengths of $n \in \{32, 48, 64, 96, 128\}$, and $f$ can be defined for all corresponding values of $\frac{n}{2}$ as

\[
  f : \mathbb{F}_2^{n/2} \to \mathbb{F}_2^{n/2}, \quad
  x \mapsto (\vartheta_1(x) \wedge \vartheta_2(x)) + \theta(x) ,
\]

where $\wedge$ denotes the component-wise $\mathbb{F}_2$-multiplication of binary vectors and $\vartheta_1$, $\vartheta_2$ and $\theta$ are cyclic rotations to the left by eight, one and two bits, respectively. Formally,

\[
  \vartheta_1 : (x_1, x_2, \dots, x_{n/2}) \mapsto (x_9, x_{10}, \dots, x_{n/2}, x_1, x_2, \dots, x_8) ,
\]
\[
  \vartheta_2 : (x_1, x_2, \dots, x_{n/2}) \mapsto (x_2, x_3, \dots, x_{n/2}, x_1) ,
\]
\[
  \theta : (x_1, x_2, \dots, x_{n/2}) \mapsto (x_3, x_4, \dots, x_{n/2}, x_1, x_2) .
\]

Simon can be considered a rather innovative design because of the simplicity of the Feistel function and because of its key schedule. Most of the security analysis of the cipher was done using computer-aided methods. We consider Simon and Simon-like designs in more detail in Chapter 7, putting the focus on deriving a more theoretical security argument.
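The function $f$ of Example 2.2 translates directly into word operations. The Python sketch below assumes that a vector $(x_1, \dots, x_{n/2})$ is packed into an integer with $x_1$ as the most significant bit, so that the rotations above become left rotations of a $w$-bit word with $w = n/2$. The helper names and the default $w = 16$ are choices made for this example, and the key schedule is not modelled.

```python
def rotl(x, r, w):
    """Cyclic left rotation of a w-bit word by r positions."""
    r %= w
    return ((x << r) | (x >> (w - r))) & ((1 << w) - 1)


def simon_f(x, w=16):
    """f(x) = (rot_8(x) AND rot_1(x)) XOR rot_2(x), as in Example 2.2."""
    return (rotl(x, 8, w) & rotl(x, 1, w)) ^ rotl(x, 2, w)


def simon_like_round(xl, xr, k, w=16):
    """Key-alternating Feistel round (x_l, x_r) -> (f(x_l) + x_r + k, x_l)."""
    return simon_f(xl, w) ^ xr ^ k, xl


print(simon_f(1), simon_like_round(1, 0, 0))
```

The only non-linear operation is the bit-wise AND, and the round key enters by a plain XOR.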

Several generalizations of Feistel networks have been introduced, for instance unbalanced Feistel networks [SK96], i.e., Feistel constructions in which the two input parts $x_l, x_r$ do not have to be of the same length, or generalized Feistel constructions, which allow splitting the input into more than two parts [ZMI90, Nyb96].

Add-Rotation-XOR (ARX) Ciphers

Some ciphers avoid the usage of smaller S-boxes for their non-linear operation. Instead, they rely on arithmetic operations applied to the state. This in particular avoids the implementation of table look-ups. For efficiency reasons, many such designs employ three basic arithmetic operations, namely

(i) Addition modulo $2^d$,

(ii) Cyclic bit-wise rotations, and

(iii) XOR operations (component-wise $\mathbb{F}_2$-addition in $\mathbb{F}_2^d$).

We have already seen Simon as an example of a cipher that applies cyclic rotations and XOR operations. Those two operations are $\mathbb{F}_2$-linear. The non-linear operation is the addition modulo $2^d$. There, the $d$-bit vectors are regarded as integers smaller than $2^d$ and the addition modulo $2^d$ is the group operation in $(\mathbb{Z}_{2^d}, +)$. The benefit of modular addition is its efficiency in software implementations. Indeed, ARX ciphers can often be implemented using only a few lines of code. Block ciphers belonging to the class of ARX designs include for example FEAL [SM88], TEA [WN95], Speck [BSS+13], or SPARX and LAX [DPU+16].
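The difference between the linear and the non-linear ARX operations can be checked directly. The Python sketch below uses $d = 8$ as an assumed toy word size. It verifies that a rotation commutes with XOR for all inputs and exhibits one input for which modular addition does not.

```python
D = 8
MASK = (1 << D) - 1


def add_mod(x, y):
    """Addition modulo 2^d."""
    return (x + y) & MASK


def rotl(x, r):
    """Cyclic left rotation of a d-bit word by r positions (0 < r < d)."""
    return ((x << r) | (x >> (D - r))) & MASK


# Rotation is F_2-linear: rotl(a + b) = rotl(a) + rotl(b), where + is XOR.
rotation_is_linear = all(rotl(a ^ b, 3) == rotl(a, 3) ^ rotl(b, 3)
                         for a in range(1 << D) for b in range(1 << D))

# Modular addition is not: for (x1, y1) = (1, 1) and (x2, y2) = (1, 0),
# adding the XORed inputs differs from XORing the two sums.
lhs = add_mod(1 ^ 1, 1 ^ 0)
rhs = add_mod(1, 1) ^ add_mod(1, 0)
addition_is_nonlinear = lhs != rhs
print(rotation_is_linear, lhs, rhs)
```

The non-linearity stems from the carry propagation, which is absent in the bit-wise XOR.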

While the wide-trail strategy offers a powerful method for evaluating the security of S-box based ciphers, deriving general arguments on the resistance of ARX designs against the most important attack methods (differential and linear cryptanalysis) has been a long-standing open problem. Recently, Dinu et al. presented a strategy for proving resistance against differential and linear attacks by design [DPU+16].

2.3 Cryptanalytic Attacks

In this section, we explain the most important cryptanalytic attacks on block ciphers, i.e., differential and linear attacks. They have gained so much importance that every new cipher should come along with strong arguments for the resistance against them. All in all, differential and linear attacks can be seen as the cornerstones of cryptanalysis of modern block ciphers. They aim for exploiting specific properties of the cipher that allow one to distinguish it from a permutation chosen uniformly at random. Especially for product ciphers, distinguishing attacks can also be seen in the context of key recovery. In particular, the adversary may guess the last round key(s) of the cipher and decrypt ciphertexts over the last round(s). If there exists a distinguishing attack on the reduced-round cipher, the adversary may now be able to validate its key guess. It is assumed that a wrong key guess randomizes the intermediate values corresponding to the round-reduced ciphertexts that are obtained by the partial decryption under the wrong key.

The standard approach of a designer of a new (product) cipher is to prove, mostly under simplifying assumptions, that it is not possible to efficiently distinguish $t$ rounds of the proposed product cipher from a random permutation using differential and linear attacks. Then, the final cipher will be specified as a $(t + t_m)$-round version of the product cipher, where $t_m$ denotes a reasonable security margin. Therefore, while explaining the attacks in the following, we already mention the designer's standard security arguments for obtaining confidence in the resistance against them.

2.3.1 Differential Cryptanalysis

The technique of differential cryptanalysis was introduced by Biham and Shamir in 1990 as an attack on round-reduced versions of DES [BS91a, BS91b]. The general idea can be phrased as analyzing how differences in the input of the cipher propagate to differences in the output. It turned out to be a powerful cryptanalytic technique and nowadays, the designers of new ciphers are expected to provide strong arguments for the resistance against differential cryptanalysis. We explain the details of this attack method in the following. We keep the focus on product ciphers and, especially, key-alternating ciphers, as is also done in, e.g., [DR02].

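Definition 2.3 (Differential Probability). For a vectorial function $F : \mathbb{F}_2^k \to \mathbb{F}_2^l$ and vectors $\alpha \in \mathbb{F}_2^k$, $\beta \in \mathbb{F}_2^l$, we define

\[
  \Delta_F(\alpha, \beta) := \{x \in \mathbb{F}_2^k \mid F(x) + F(x + \alpha) = \beta\} .
\]

The pair $(\alpha, \beta)$ is said to be a differential over $F$ (also denoted $\alpha \stackrel{F}{\to} \beta$) and the differential probability is defined as

\[
  \mathrm{Prob}(\alpha \stackrel{F}{\to} \beta) := \frac{|\Delta_F(\alpha, \beta)|}{2^k} .
\]

Thus, $\mathrm{Prob}(\alpha \stackrel{F}{\to} \beta)$ describes exactly the probability that $F(x) + F(x + \alpha) = \beta$ over a uniformly chosen $x \in \mathbb{F}_2^k$.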
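For small functions, the differential probability can be computed by exhaustive counting. The following Python sketch evaluates Definition 2.3 for a 4-bit S-box. The S-box of PRESENT is used here only as an illustrative choice of $F$, and the function names are introduced for this example.

```python
# 4-bit S-box of PRESENT, used only as an example of a vectorial function F
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]


def F(x):
    return SBOX[x]


def differential_probability(func, k_bits, alpha, beta):
    """|{x : func(x) + func(x + alpha) = beta}| / 2^k, with + realized as XOR."""
    count = sum(1 for x in range(1 << k_bits)
                if func(x) ^ func(x ^ alpha) == beta)
    return count / (1 << k_bits)


trivial = differential_probability(F, 4, 0, 0)
row_sums = [sum(differential_probability(F, 4, alpha, beta) for beta in range(16))
            for alpha in range(16)]
max_dp = max(differential_probability(F, 4, alpha, beta)
             for alpha in range(1, 16) for beta in range(16))
print(trivial, max_dp)
```

For every fixed $\alpha$, the probabilities over all $\beta$ sum to one, and the trivial differential $(0, 0)$ holds with probability one. The quantity `max_dp` is the largest probability over all non-zero input differences.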

The simple idea of an adversary that adheres to differential cryptanalysis is that it is in possession of a differential $(\alpha, \beta)$ over $E_k$ that holds with high probability, i.e., $\mathrm{Prob}(\alpha \stackrel{E_k}{\to} \beta) > 2^{-n}$. The adversary can now distinguish the keyed instance from a random permutation by querying the oracle $\mathcal{O}$ for many randomly chosen input pairs $(x, x + \alpha)$ and checking whether the output difference $\mathcal{O}(x) + \mathcal{O}(x + \alpha)$ equals $\beta$ as often as one would expect by the differential probability. Thus, differential cryptanalysis is an example of a chosen-plaintext attack.

As the block cipher is a family of permutations parametrized by a key, we have to differentiate between the fixed-key probability of a differential $(\alpha, \beta)$ and the expected differential probability when averaged over all possible keys. Formally, for an $(n, \kappa)$-block cipher $E$, the expected differential probability of a differential $(\alpha, \beta)$ is defined as

\[
  \mathrm{EDP}_E(\alpha, \beta) := \frac{1}{2^\kappa} \sum_{k \in \mathbb{F}_2^\kappa} \mathrm{Prob}(\alpha \stackrel{E_k}{\to} \beta) .
\]

A priori, the adversary has no knowledge of the actual key $k$ of the keyed instance $E_k$ it wants to distinguish from a random permutation. Moreover, it may actually want to exploit a differential that holds with a high expected differential probability. Thus, for practical reasons, we implicitly assume that the fixed-key probability of a differential is to a large extent independent of the actual key used. This assumption was first formulated in [LM91] as the Hypothesis of Stochastic Equivalence.

Assumption 2.1 (Hypothesis of Stochastic Equivalence). Given a block cipher $E$ and a differential $(\alpha, \beta)$, then

\[
  \mathrm{EDP}_E(\alpha, \beta) \approx \mathrm{Prob}(\alpha \stackrel{E_k}{\to} \beta)
\]

for "almost" all keys $k$.

Thus, a designer would like to guarantee a low upper bound on the expected differential probability of any non-zero differential. In the following, we elaborate more on differential cryptanalysis of product ciphers and key-alternating ciphers in particular.

Differential Cryptanalysis of Product Ciphers

Any keyed instance $E_k$ of a product cipher can be written as iterations of bijective round functions as

\[
E_k = \mathcal{R}^t_{k_t} \circ \dots \circ \mathcal{R}^1_{k_1} \circ \mathcal{R}^0_{k_0} .
\]

For an iterative function, besides considering a differential over the function itself, one can also consider all the intermediate differences after each iteration simultaneously. This leads to the definition of a differential trail as follows.

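Definition 2.4 (Differential Trail, see, e.g., p. 117 in [DR02]). Let $F\colon \mathbb{F}_2^n \to \mathbb{F}_2^n$ be an iterative function of the form $F = F_t \circ \dots \circ F_2 \circ F_1$ with $F_i\colon \mathbb{F}_2^n \to \mathbb{F}_2^n$. Given $t+1$ vectors $\alpha_0, \dots, \alpha_t \in \mathbb{F}_2^n$, we define

\[
\Delta_{F_1,\dots,F_t}(\alpha_0, \dots, \alpha_t) := \{x \in \mathbb{F}_2^n \mid \forall\, 1 \le i \le t\colon F_i \circ \dots \circ F_1(x) + F_i \circ \dots \circ F_1(x + \alpha_0) = \alpha_i\} .
\]

The tuple $(\alpha_0, \dots, \alpha_t) \in (\mathbb{F}_2^n)^{t+1}$ is said to be a differential trail over $F$ (also denoted $\alpha_0 \xrightarrow{F_1} \alpha_1 \xrightarrow{F_2} \dots \xrightarrow{F_t} \alpha_t$) and the probability of the differential trail is defined as

\[
\mathrm{Prob}(\alpha_0 \xrightarrow{F_1} \alpha_1 \xrightarrow{F_2} \dots \xrightarrow{F_t} \alpha_t) := \frac{|\Delta_{F_1,\dots,F_t}(\alpha_0, \dots, \alpha_t)|}{2^n} .
\]

Thus, $\mathrm{Prob}(\alpha_0 \xrightarrow{F_1} \alpha_1 \xrightarrow{F_2} \dots \xrightarrow{F_t} \alpha_t)$ describes exactly the probability that, over a uniformly chosen $x \in \mathbb{F}_2^n$, the intermediate difference $F_i \circ \dots \circ F_1(x) + F_i \circ \dots \circ F_1(x + \alpha_0)$ is equal to $\alpha_i$, for all $i \in \{1, \dots, t\}$.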

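It follows that for an iterative function $F\colon \mathbb{F}_2^n \to \mathbb{F}_2^n$, one obtains the probability of a differential $\alpha_0 \xrightarrow{F} \alpha_t$ as the sum of the probabilities of all the differential trails it contains, i.e.,

\[
\mathrm{Prob}(\alpha_0 \xrightarrow{F} \alpha_t) = \sum_{\alpha_1, \dots, \alpha_{t-1} \in \mathbb{F}_2^n} \mathrm{Prob}(\alpha_0 \xrightarrow{F_1} \alpha_1 \xrightarrow{F_2} \dots \xrightarrow{F_{t-1}} \alpha_{t-1} \xrightarrow{F_t} \alpha_t) .
\]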

There remains the question of how to efficiently compute the probability of a differential trail, especially for larger values of $n$. Indeed, this is a rather difficult task. However, if we assume that the probabilities of the differentials over the $F_i$ are independent, one can compute the probability of the differential trail as the product of the probabilities of the differentials over the $F_i$. This assumption is likely to make the computation much easier, especially if the definitions of the $F_i$ are quite simple. We outline this property for the case of product ciphers in the following.
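The difference between the exact trail probability and the product of single-step probabilities can be explored on a toy example. The following sketch is an illustration under stated assumptions: the two 4-bit steps `F1` and `F2` are hypothetical choices (the PRESENT S-box and a rotated variant of it), not functions from this thesis. It verifies that the differential probability is the sum over all trails, and counts the trails for which the independence assumption is not exact.

```python
from fractions import Fraction
from itertools import product

S = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
     0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
N = 16

def rot(x):
    """Rotate a 4-bit word left by one position."""
    return ((x << 1) | (x >> 3)) & 0xF

F1 = S                                  # first step (assumed toy function)
F2 = [S[rot(x)] for x in range(N)]      # second step (assumed toy function)
F = [F2[F1[x]] for x in range(N)]       # F = F2 o F1

def diff_prob(f, a, b):
    return Fraction(sum(f[x] ^ f[x ^ a] == b for x in range(N)), N)

def trail_prob(a0, a1, a2):
    """Exact probability of the trail a0 -> a1 -> a2, by counting inputs x."""
    hits = 0
    for x in range(N):
        y, y2 = F1[x], F1[x ^ a0]
        if y ^ y2 == a1 and F2[y] ^ F2[y2] == a2:
            hits += 1
    return Fraction(hits, N)

# The probability of a differential equals the sum over all intermediate differences.
sums_match = all(
    diff_prob(F, a0, a2) == sum(trail_prob(a0, a1, a2) for a1 in range(N))
    for a0, a2 in product(range(N), repeat=2)
)

# Number of trails whose exact probability differs from the product of step probabilities.
mismatches = sum(
    trail_prob(a0, a1, a2) != diff_prob(F1, a0, a1) * diff_prob(F2, a1, a2)
    for a0, a1, a2 in product(range(N), repeat=3)
)
```

For an unkeyed composition such as this one, `mismatches` indicates how often the product formula is only an approximation; the Markov-cipher argument below explains why averaging over independent round keys removes this discrepancy.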


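Let $E$ be a product cipher with rounds $\mathcal{R}^0, \dots, \mathcal{R}^t$. If all the rounds $\mathcal{R}^i$ are such that, for each differential $(\alpha, \beta)$, the expected differential probability $\mathrm{EDP}_{\mathcal{R}^i}(\alpha, \beta)$ is independent of the choice of the actual input value $x$, i.e., if

\[
\forall x \in \mathbb{F}_2^n\colon \sum_{k \in \mathbb{F}_2^n} \mathrm{Prob}(\alpha \xrightarrow{\mathcal{R}^i_k} \beta) = |\{k \in \mathbb{F}_2^n \mid \mathcal{R}^i_k(x) + \mathcal{R}^i_k(x + \alpha) = \beta\}| , \tag{2.2}
\]

it can be shown that the average probability of any differential trail when averaged over all possible round keys can be computed as the product of the average probabilities of its single-round differentials when averaged over all round keys [LM91]. In other words,

\[
\sum_{k_0, \dots, k_t \in \mathbb{F}_2^n} \mathrm{Prob}(\alpha_0 \xrightarrow{\mathcal{R}^0_{k_0}} \dots \xrightarrow{\mathcal{R}^t_{k_t}} \alpha_{t+1}) = \sum_{k_0, \dots, k_t \in \mathbb{F}_2^n} \prod_{i=0}^{t} \mathrm{Prob}(\alpha_i \xrightarrow{\mathcal{R}^i_{k_i}} \alpha_{i+1}) .
\]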

Product ciphers for which the rounds fulfill Equation 2.2 were defined as Markov ciphers in [LM91].⁹ Key-alternating ciphers are a special case of product ciphers for which the following, well-known connection can be shown.

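Theorem 2.1. Let $E$ be an $(n, (t+1)n)$-block cipher that is defined as a $t$-round key-alternating cipher (as depicted in Figure 2.2, with $R_0 = \mathrm{id}$). Let further $\mathrm{KeySchedule}\colon \mathbb{F}_2^{(t+1)n} \to (\mathbb{F}_2^n)^{t+1}$ bijectively map the $(t+1)n$-bit initial key to $t+1$ round keys of $n$ bits each. Then, the expected differential probability of a differential $(\alpha_0, \alpha_t)$ can be given as

\[
\mathrm{EDP}_E(\alpha_0, \alpha_t) = \sum_{\alpha_1, \dots, \alpha_{t-1} \in \mathbb{F}_2^n} \prod_{i=1}^{t} \mathrm{Prob}(\alpha_{i-1} \xrightarrow{R_i} \alpha_i) . \tag{2.3}
\]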

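Proof. One can see that the rounds for decryption, i.e.,

\[
(\mathcal{R}^i)^{-1}\colon \mathbb{F}_2^n \times \mathbb{F}_2^n \to \mathbb{F}_2^n, \quad (x, k) \mapsto R_i^{-1}(x + k) ,
\]

fulfill the properties of a Markov cipher as in Equation 2.2. In particular, for all $x \in \mathbb{F}_2^n$, it is

\[
\begin{aligned}
\sum_{k \in \mathbb{F}_2^n} \mathrm{Prob}(\alpha \xrightarrow{(\mathcal{R}^i_k)^{-1}} \beta)
&= \sum_{k \in \mathbb{F}_2^n} \frac{|\{x \in \mathbb{F}_2^n \mid R_i^{-1}(x + k) + R_i^{-1}(x + k + \alpha) = \beta\}|}{2^n} \\
&= |\{x \in \mathbb{F}_2^n \mid R_i^{-1}(x) + R_i^{-1}(x + \alpha) = \beta\}| \\
&= |\{k \in \mathbb{F}_2^n \mid R_i^{-1}(x + k) + R_i^{-1}(x + k + \alpha) = \beta\}| \\
&= |\{k \in \mathbb{F}_2^n \mid (\mathcal{R}^i_k)^{-1}(x) + (\mathcal{R}^i_k)^{-1}(x + \alpha) = \beta\}| .
\end{aligned}
\]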

⁹ In [LM91], the theory was developed for a more general notion of "difference" that can be defined for any group operation on the message space. However, as we are focussing on key-alternating ciphers, we only consider XOR differences in this thesis.


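Moreover, for all keys $k \in \mathbb{F}_2^n$, we have $\mathrm{Prob}(\alpha \xrightarrow{(\mathcal{R}^i_k)^{-1}} \beta) = \mathrm{Prob}(\alpha \xrightarrow{R_i^{-1}} \beta)$. For the $t$-round key-alternating cipher $E$, one can now deduce the following on the expected differential probability of a differential $(\alpha_{-1}, \alpha_t)$:

\[
\begin{aligned}
\mathrm{EDP}_E(\alpha_{-1}, \alpha_t)
&= \frac{1}{2^{(t+1)n}} \sum_{k \in \mathbb{F}_2^{(t+1)n}} \mathrm{Prob}(\alpha_{-1} \xrightarrow{E_k} \alpha_t) \\
&= \frac{1}{2^{(t+1)n}} \sum_{k \in \mathbb{F}_2^{(t+1)n}} \mathrm{Prob}(\alpha_t \xrightarrow{E_k^{-1}} \alpha_{-1}) \\
&= \frac{1}{2^{(t+1)n}} \sum_{k_0, \dots, k_t \in \mathbb{F}_2^n} \; \sum_{\alpha_0, \dots, \alpha_{t-1} \in \mathbb{F}_2^n} \mathrm{Prob}(\alpha_t \xrightarrow{(\mathcal{R}^t_{k_t})^{-1}} \dots \xrightarrow{(\mathcal{R}^1_{k_1})^{-1}} \alpha_0 \xrightarrow{(\mathcal{R}^0_{k_0})^{-1}} \alpha_{-1}) \\
&= \frac{1}{2^{(t+1)n}} \sum_{\alpha_0, \dots, \alpha_{t-1} \in \mathbb{F}_2^n} \; \sum_{k_0, \dots, k_t \in \mathbb{F}_2^n} \prod_{i=0}^{t} \mathrm{Prob}(\alpha_i \xrightarrow{(\mathcal{R}^i_{k_i})^{-1}} \alpha_{i-1}) \\
&= \sum_{\alpha_0, \dots, \alpha_{t-1} \in \mathbb{F}_2^n} \prod_{i=0}^{t} \mathrm{Prob}(\alpha_i \xrightarrow{R_i^{-1}} \alpha_{i-1}) \\
&= \sum_{\alpha_0, \dots, \alpha_{t-1} \in \mathbb{F}_2^n} \prod_{i=0}^{t} \mathrm{Prob}(\alpha_{i-1} \xrightarrow{R_i} \alpha_i) \\
&= \sum_{\alpha_0 \in \mathbb{F}_2^n} \mathrm{Prob}(\alpha_{-1} \xrightarrow{R_0} \alpha_0) \sum_{\alpha_1, \dots, \alpha_{t-1} \in \mathbb{F}_2^n} \prod_{i=1}^{t} \mathrm{Prob}(\alpha_{i-1} \xrightarrow{R_i} \alpha_i) .
\end{aligned}
\]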

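Now, by the definition of the $t$-round key-alternating cipher, we have $R_0 = \mathrm{id}$ and thus, $\mathrm{Prob}(\alpha_{-1} \xrightarrow{R_0} \alpha_0) = 1$ if and only if $\alpha_{-1} = \alpha_0$. Otherwise, the probability is zero. By substituting $\alpha_{-1}$ with $\alpha_0$ and vice versa, we finally obtain

\[
\mathrm{EDP}_E(\alpha_0, \alpha_t) = \sum_{\alpha_1, \dots, \alpha_{t-1} \in \mathbb{F}_2^n} \prod_{i=1}^{t} \mathrm{Prob}(\alpha_{i-1} \xrightarrow{R_i} \alpha_i) .
\]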
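Equation 2.3 can be checked exhaustively on a toy instance. The following sketch assumes a hypothetical 3-bit, 2-round key-alternating cipher with independent round keys; the permutations `R1` and `R2` are arbitrary choices made for illustration and do not stem from any real design.

```python
from fractions import Fraction
from itertools import product

N = 8                                   # 2^n with n = 3
R1 = [0, 1, 3, 6, 7, 4, 5, 2]           # assumed unkeyed round R_1
R2 = [5, 2, 7, 0, 3, 6, 1, 4]           # assumed unkeyed round R_2

def diff_prob(f, a, b):
    return Fraction(sum(f[x] ^ f[x ^ a] == b for x in range(N)), N)

def encrypt(x, k0, k1, k2):
    # R_0 = id: add k0, apply R_1, add k1, apply R_2, add k2
    return R2[R1[x ^ k0] ^ k1] ^ k2

def edp(a0, a2):
    """Expected differential probability, averaged over all 2^(3n) round-key tuples."""
    hits = 0
    for k0, k1, k2 in product(range(N), repeat=3):
        hits += sum(encrypt(x, k0, k1, k2) ^ encrypt(x ^ a0, k0, k1, k2) == a2
                    for x in range(N))
    return Fraction(hits, N * N ** 3)

def trail_sum(a0, a2):
    """Right-hand side of Equation 2.3 for t = 2."""
    return sum(diff_prob(R1, a0, a1) * diff_prob(R2, a1, a2) for a1 in range(N))

equation_2_3_holds = all(edp(a0, a2) == trail_sum(a0, a2)
                         for a0, a2 in product(range(N), repeat=2))
```

Exact rational arithmetic is used so that the comparison is an equality test rather than a floating-point approximation.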

The above theorem is only valid if there is no real key-scheduling algorithm, i.e., if the initial key already consists of all the round keys. In other words, Equation 2.3 only holds true under the assumption of independent round keys. When analyzing a cipher in practice, we will implicitly assume that the round keys are independent and that Equation 2.3 holds true.

Assumption 2.2 (Independent Round Keys). We assume that the round keys of a $t$-round key-alternating cipher are independent. Then, according to Theorem 2.1, the expected differential probability of a differential $(\alpha_0, \alpha_t)$ can be computed as given in Equation 2.3.


Differentials over the Building Blocks of SP Ciphers

As explained in Section 2.2.2, the rounds of SP ciphers have a special structure, i.e., they consist of a parallel application of smaller invertible S-boxes followed by the application of an invertible linear transformation. For those layers, it is straightforward to derive the following, well-known properties on the differential probability. We therefore state the following proposition without proof.

Proposition 2.1 (Differential Probability over the Building Blocks of SP Ciphers). One can give the differential probability over an S-box layer and a linear layer as follows:

(i) Let $S\colon (\mathbb{F}_2^s)^{n_s} \to (\mathbb{F}_2^s)^{n_s}$ be the $n_s$-times parallel application of a function $\mathrm{Sb}\colon \mathbb{F}_2^s \to \mathbb{F}_2^s$. Then,

\[
\mathrm{Prob}\big((\alpha_1, \dots, \alpha_{n_s}) \xrightarrow{S} (\beta_1, \dots, \beta_{n_s})\big) = \prod_{i=1}^{n_s} \mathrm{Prob}(\alpha_i \xrightarrow{\mathrm{Sb}} \beta_i) .
\]

(ii) Let $L\colon \mathbb{F}_2^n \to \mathbb{F}_2^n$ be a linear transformation. Then,

\[
\mathrm{Prob}(\alpha \xrightarrow{L} \beta) =
\begin{cases}
1 & \text{if } \beta = L(\alpha) \\
0 & \text{else.}
\end{cases}
\]

The Standard Argument on the Resistance against Differential Attacks

The standard argument of a designer is to guarantee a low upper bound on the expected differential probability (over a round-reduced version of the cipher) of any non-zero differential, based on a single differential trail. In other words, the designer shows that the product of the differential probabilities in any non-zero differential trail over the round-reduced version is below $2^{-n}$. Then, under the simplifying assumption that the expected differential probability as given in Equation 2.3 is dominated by a single product, the expected differential probability would be too low to distinguish the round-reduced version of the cipher from a random permutation, thus rendering the cipher secure against differential cryptanalysis. Note that this approach for proving resistance against differential attacks also implicitly assumes the hypothesis of stochastic equivalence and the assumption of independent round keys (i.e., Assumption 2.1 and Assumption 2.2).

Extensions and Generalizations

Several variants and generalizations of differential attacks have been proposed. For instance, impossible differential attacks utilize differentials with probability exactly zero (see, e.g., [Knu98, Proposition 1], [BBS05]). As other generalizations, we refer to truncated differentials and higher order differential attacks [Lai94, Knu95].


2.3.2 Linear Cryptanalysis

The general idea of linear cryptanalysis, which was discovered by Matsui in 1993 [Mat94], is to approximate a linear Boolean function of the cipher's output by a linear Boolean function of the input. In particular, to distinguish a keyed instance $E_k$ of an $(n, \kappa)$-block cipher $E$ from a random permutation, the adversary would exploit linear functions $l_\alpha, l_\beta\colon \mathbb{F}_2^n \to \mathbb{F}_2$ for which the approximation

\[
l_\alpha(x) = l_\beta(E_k(x))
\]

holds with a high absolute bias, i.e., for many values $x$ or only for few values $x$. We explain the most important concepts of linear cryptanalysis in the following. For a comprehensive study, we refer to [DR02]. A more recent systematization of knowledge on linear cryptanalysis is given in [KLW17]. We basically follow the lines of the literature.

It is worth remarking that, for a fixed $n$, the linear Boolean functions in $\mathcal{B}_n$ form a binary vector space of dimension $n$ which is isomorphic to $\mathbb{F}_2^n$ via

\[
\alpha \in \mathbb{F}_2^n \mapsto (l_\alpha\colon x \mapsto \langle \alpha, x \rangle) .
\]

Here, $\langle \alpha, x \rangle$ denotes the canonical inner product, which is defined as $\sum_i \alpha_i x_i$ in $\mathbb{F}_2$. Note that, whenever $\alpha \neq 0$, the corresponding linear function $l_\alpha$ is balanced, i.e., the outputs 0 and 1 are taken equally often. Equivalently, this can be stated (see [Car07, Lemma 1]) as

\[
\sum_{x \in \mathbb{F}_2^n} (-1)^{\langle \alpha, x \rangle} =
\begin{cases}
2^n & \text{if } \alpha = 0 \\
0 & \text{if } \alpha \neq 0 .
\end{cases} \tag{2.4}
\]

In the context of linear cryptanalysis, a linear approximation is usually defined in terms of vectors (also called masks) $\alpha, \beta$.

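Definition 2.5 (Linear Approximation). For a vectorial function $F\colon \mathbb{F}_2^k \to \mathbb{F}_2^l$, a linear approximation is defined as a tuple $(\alpha, \beta)$ with $\alpha \in \mathbb{F}_2^k$, $\beta \in \mathbb{F}_2^l$. The bias of the linear approximation is defined as

\[
\varepsilon_F(\alpha, \beta) := \mathrm{Prob}_x\big(\langle \alpha, x \rangle = \langle \beta, F(x) \rangle\big) - \frac{1}{2} = \frac{|\{x \in \mathbb{F}_2^k \mid \langle \alpha, x \rangle = \langle \beta, F(x) \rangle\}|}{2^k} - \frac{1}{2}
\]

and its correlation is defined as

\[
\mathrm{cor}_F(\alpha, \beta) := 2 \cdot \varepsilon_F(\alpha, \beta) = 2 \cdot \mathrm{Prob}_x\big(\langle \alpha, x \rangle = \langle \beta, F(x) \rangle\big) - 1 .
\]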

Thus, the correlation of a linear approximation $(\alpha, \beta)$ can take values between $-1$ and $1$. In much of the literature, the correlation is equivalently represented in terms of the Fourier transform as

\[
\mathrm{cor}_F(\alpha, \beta) = \frac{1}{2^k} \sum_{x \in \mathbb{F}_2^k} (-1)^{\langle \alpha, x \rangle + \langle \beta, F(x) \rangle} . \tag{2.5}
\]
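As a small sanity check of the two equivalent representations, the following sketch computes the correlation once via the probability in Definition 2.5 and once via the sum in Equation 2.5. The example function (the PRESENT S-box) and all function names are assumptions made for illustration.

```python
from fractions import Fraction

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def dot(a, b):
    """Canonical inner product <a, b> in F_2 for vectors packed into integers."""
    return bin(a & b).count("1") & 1

def cor_by_probability(f, alpha, beta):
    """2 * Prob_x(<alpha, x> = <beta, F(x)>) - 1."""
    agree = sum(dot(alpha, x) == dot(beta, f[x]) for x in range(len(f)))
    return 2 * Fraction(agree, len(f)) - 1

def cor_by_fourier(f, alpha, beta):
    """(1 / 2^k) * sum_x (-1)^(<alpha, x> + <beta, F(x)>), as in Equation 2.5."""
    total = sum((-1) ** (dot(alpha, x) ^ dot(beta, f[x])) for x in range(len(f)))
    return Fraction(total, len(f))

representations_agree = all(
    cor_by_probability(SBOX, a, b) == cor_by_fourier(SBOX, a, b)
    for a in range(16) for b in range(16)
)
```

For the output mask $\beta = 0$, the sum reduces to the one in Equation 2.4, so the correlation is 1 for $\alpha = 0$ and 0 otherwise.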


This representation, together with the fundamental fact given in Equation 2.4, is usually employed for proving the results presented in this section.

The simple idea of an adversary that adheres to linear cryptanalysis is that it is in possession of a linear approximation $(\alpha, \beta)$ over the keyed instance $E_k$ that holds with high absolute correlation, i.e., $|\mathrm{cor}_{E_k}(\alpha, \beta)| > 2^{-\frac{n}{2}}$. The adversary can now distinguish the keyed instance from a random permutation by querying the oracle $\mathcal{O}$ for many inputs $x$ and checking whether $\langle \alpha, x \rangle = \langle \beta, \mathcal{O}(x) \rangle$ holds as often as one would expect by the correlation of the linear approximation.¹⁰ In contrast to differential cryptanalysis, in which the adversary has to choose pairs of plaintexts that fulfill a certain input difference, linear cryptanalysis is an example of a known-plaintext attack.

However, if the adversary wants to exploit a particular linear approximation $(\alpha, \beta)$, its correlation over the keyed instance of $E$ is highly dependent on the actual key $k$. In particular, this key-dependency can be stated in terms of the so-called linear hull, which was introduced by Nyberg in 1994 [Nyb95]. As explained in detail in [KLW17], the correlation of the linear approximation over $E_k$ can be given as a signed sum of correlations of linear approximations over $E$, i.e.,

\[
\mathrm{cor}_{E_k}(\alpha, \beta) = \sum_{\gamma \in \mathbb{F}_2^{\kappa}} (-1)^{\langle \gamma, k \rangle} \, \mathrm{cor}_E((\alpha, \gamma), \beta) .
\]

Therefore, the suitability of a linear approximation for attacking the cipher actually depends on the distribution of the correlations for all possible keys. In other words, the attack might work for "almost" all keys or only for (few) particular keys, depending on this distribution. The keys for which the linear attack works are usually referred to as weak keys. Thus, for a thorough understanding of the security of a cipher, one has to study this distribution. While the description of the linear hull given above is the most general description of the key dependency for arbitrary block ciphers, we will see later in Corollary 2.1 that the linear hull in the case where $E$ is a $t$-round key-alternating cipher actually reduces to a much more concrete and simpler expression.

Moreover, under the assumption of independent round keys (i.e., Assumption 2.2), one can derive the mean and variance of the distribution of correlations over all keys (see Corollary 2.2 and Corollary 2.3). Of course, the actual distribution depends on the key-scheduling algorithm of the cipher, and the assumption of independent round keys only serves to simplify the analysis. In fact, not much theory is known on how the key schedule affects this distribution. For more details, we refer to, e.g., [KLW17], where the focus is on linear key schedules. We now elaborate more on linear cryptanalysis of key-alternating ciphers in particular.

¹⁰ [Mat94] indicates that one needs about $c \cdot |\mathrm{cor}_{E_k}(\alpha, \beta)|^{-2}$ known plaintexts for a distinguisher with a reasonably high advantage, where $c$ is some small constant. Thus, if the absolute correlation were smaller than $2^{-\frac{n}{2}}$, there would not be enough plaintexts available.


Linear Cryptanalysis of Key-Alternating Ciphers

Given a linear approximation $(\alpha_0, \alpha_t)$ over an iterative function, similar to the notion of a differential trail, one can consider a chain of approximations over the particular iterations. This leads to the notion of a linear trail as follows.

Definition 2.6 (Linear Trail (see [DGV95])). Let $F\colon \mathbb{F}_2^n \to \mathbb{F}_2^n$ be an iterated function of the form $F = F_t \circ \dots \circ F_2 \circ F_1$ with $F_i\colon \mathbb{F}_2^n \to \mathbb{F}_2^n$. Given $t+1$ vectors $\alpha_0, \dots, \alpha_t \in \mathbb{F}_2^n$, the tuple $(\alpha_0, \dots, \alpha_t)$ is said to be a linear trail over $F$ and its correlation is defined as

\[
\mathrm{cor}_{F_1,\dots,F_t}(\alpha_0, \alpha_1, \dots, \alpha_t) := \prod_{i=1}^{t} \mathrm{cor}_{F_i}(\alpha_{i-1}, \alpha_i) .
\]

This leads to the following important theorem for iterative functions, which was first stated in [DGV95].

Theorem 2.2 (Theorem of Linear Trail Composition (Theorem 7.8.1 in [DR02])). Let $F\colon \mathbb{F}_2^n \to \mathbb{F}_2^n$ be an iterated function of the form $F = F_t \circ \dots \circ F_2 \circ F_1$ with $F_i\colon \mathbb{F}_2^n \to \mathbb{F}_2^n$. Then, the correlation of the linear approximation $(\alpha_0, \alpha_t)$ over $F$ can be given as the sum of the correlations of all the linear trails it contains, i.e.,

\[
\mathrm{cor}_F(\alpha_0, \alpha_t) = \sum_{\alpha_1, \dots, \alpha_{t-1} \in \mathbb{F}_2^n} \mathrm{cor}_{F_1,\dots,F_t}(\alpha_0, \alpha_1, \dots, \alpha_{t-1}, \alpha_t) .
\]

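Proof. Without loss of generality, we show the theorem for $F = F_2 \circ F_1$, where $F_1, F_2\colon \mathbb{F}_2^n \to \mathbb{F}_2^n$. The general case then follows by induction. We use the representation given in Equation 2.5 and the fundamental fact given in Equation 2.4. In particular, for fixed $\alpha_0, \alpha_2 \in \mathbb{F}_2^n$, it is

\[
\begin{aligned}
\sum_{\alpha_1 \in \mathbb{F}_2^n} \mathrm{cor}_{F_1}(\alpha_0, \alpha_1) \, \mathrm{cor}_{F_2}(\alpha_1, \alpha_2)
&= \sum_{\alpha_1 \in \mathbb{F}_2^n} \frac{1}{2^n} \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} (-1)^{\langle \alpha_0, x \rangle + \langle \alpha_1, F_1(x) \rangle} \sum_{x' \in \mathbb{F}_2^n} (-1)^{\langle \alpha_1, x' \rangle + \langle \alpha_2, F_2(x') \rangle} \\
&= \frac{1}{2^n} \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} \sum_{x' \in \mathbb{F}_2^n} (-1)^{\langle \alpha_0, x \rangle + \langle \alpha_2, F_2(x') \rangle} \sum_{\alpha_1 \in \mathbb{F}_2^n} (-1)^{\langle \alpha_1, F_1(x) + x' \rangle} \\
&= \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} (-1)^{\langle \alpha_0, x \rangle + \langle \alpha_2, F_2(F_1(x)) \rangle} = \mathrm{cor}_F(\alpha_0, \alpha_2) .
\end{aligned}
\]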
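The two-step case of the theorem can also be verified numerically. The sketch below uses two arbitrary 3-bit lookup tables `F1` and `F2`, which are hypothetical examples, and compares the correlation over the composition with the sum over all intermediate masks.

```python
from fractions import Fraction

N = 8                                   # 2^n with n = 3
F1 = [0, 1, 3, 6, 7, 4, 5, 2]           # assumed toy functions
F2 = [5, 2, 7, 0, 3, 6, 1, 4]
F = [F2[F1[x]] for x in range(N)]       # F = F2 o F1

def dot(a, b):
    return bin(a & b).count("1") & 1

def cor(f, alpha, beta):
    """Correlation via the Fourier-type sum of Equation 2.5."""
    total = sum((-1) ** (dot(alpha, x) ^ dot(beta, f[x])) for x in range(N))
    return Fraction(total, N)

# cor_F(a0, a2) must equal the sum over a1 of cor_F1(a0, a1) * cor_F2(a1, a2).
composition_holds = all(
    cor(F, a0, a2) == sum(cor(F1, a0, a1) * cor(F2, a1, a2) for a1 in range(N))
    for a0 in range(N) for a2 in range(N)
)
```

In contrast to the differential case, this identity holds exactly for every fixed function, without any averaging over keys.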

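We now focus on linear approximations over key-alternating ciphers. Let $E_k$ be a keyed instance of a $t$-round key-alternating cipher. In particular,

\[
E_k = \mathcal{R}^t_{k_t} \circ \dots \circ \mathcal{R}^1_{k_1} \circ \mathcal{R}^0_{k_0}
\]

with $\mathcal{R}^i_{k_i} = \mathrm{Add}_{k_i} \circ R_i$ and $R_0 = \mathrm{id}$. Thereby, $k_0, \dots, k_t$ denote the round keys that are derived from $k$ by the key-scheduling algorithm. The correlation of a linear approximation $(\alpha, \beta)$ over a key addition $\mathrm{Add}_k$ can be given as

\[
\mathrm{cor}_{\mathrm{Add}_k}(\alpha, \beta) =
\begin{cases}
(-1)^{\langle \beta, k \rangle} & \text{if } \alpha = \beta \\
0 & \text{else.}
\end{cases}
\]

This can be deduced from the fact that

\[
\langle \alpha, x \rangle = \langle \beta, x + k \rangle
\;\Leftrightarrow\; \langle \alpha, x \rangle = \langle \beta, x \rangle + \langle \beta, k \rangle
\;\Leftrightarrow\; \langle \alpha + \beta, x \rangle = \langle \beta, k \rangle
\]

and $x \mapsto \langle \alpha + \beta, x \rangle$ is balanced (i.e., the correlation $\mathrm{cor}_{\mathrm{Add}_k}(\alpha, \beta)$ is zero) if and only if $\alpha \neq \beta$. Otherwise, $x \mapsto \langle \alpha + \beta, x \rangle$ is equal to the zero function.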

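This implies that the absolute correlation of a linear trail over $\mathcal{R}^0_{k_0}, \dots, \mathcal{R}^t_{k_t}$ is independent of the actual round keys $k_0, \dots, k_t$. In particular,

\[
\begin{aligned}
\mathrm{cor}_{\mathcal{R}^0_{k_0},\dots,\mathcal{R}^t_{k_t}}(\alpha_{-1}, \alpha_0, \dots, \alpha_t)
&= \prod_{i=0}^{t} \mathrm{cor}_{\mathcal{R}^i_{k_i}}(\alpha_{i-1}, \alpha_i) \\
&= \prod_{i=0}^{t} \sum_{\gamma} \mathrm{cor}_{R_i}(\alpha_{i-1}, \gamma) \, \mathrm{cor}_{\mathrm{Add}_{k_i}}(\gamma, \alpha_i) \\
&= \prod_{i=0}^{t} (-1)^{\langle \alpha_i, k_i \rangle} \, \mathrm{cor}_{R_i}(\alpha_{i-1}, \alpha_i) \\
&= (-1)^{\langle \alpha_0, k_0 \rangle + \dots + \langle \alpha_t, k_t \rangle} \prod_{i=0}^{t} \mathrm{cor}_{R_i}(\alpha_{i-1}, \alpha_i) .
\end{aligned}
\]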

From this formula and Theorem 2.2, one obtains the following, well-known connections as straightforward implications (see [DR02, pp. 103–108]).

Corollary 2.1 (Linear Hull Theorem for Key-Alternating Ciphers). Let $E$ be a $t$-round key-alternating cipher (as depicted in Figure 2.2 with $R_0 = \mathrm{id}$). Then, the correlation of a linear approximation $(\alpha_0, \alpha_t)$ over a keyed instance $E_k$ is given as

\[
\mathrm{cor}_{E_k}(\alpha_0, \alpha_t) = \sum_{\alpha_1, \dots, \alpha_{t-1} \in \mathbb{F}_2^n} (-1)^{\langle \alpha_0, k_0 \rangle + \dots + \langle \alpha_t, k_t \rangle} \prod_{i=1}^{t} \mathrm{cor}_{R_i}(\alpha_{i-1}, \alpha_i) .
\]

Thereby, $(k_0, \dots, k_t)$ denote the round keys that are derived from $k$ by the key-scheduling algorithm.

For independent round keys, it follows that the average correlation over all keys is equal to zero. One can further give the average square correlation of a given linear approximation over all keys as the sum of the square correlations of all the linear trails it contains over the unkeyed rounds. These connections are stated in the following Corollaries 2.2 and 2.3.

Corollary 2.2. Let $E$ be an $(n, (t+1)n)$-block cipher that is defined as a $t$-round key-alternating cipher (as depicted in Figure 2.2 with $R_0 = \mathrm{id}$). Let further $\mathrm{KeySchedule}\colon \mathbb{F}_2^{(t+1)n} \to (\mathbb{F}_2^n)^{t+1}$ bijectively map the $(t+1)n$-bit initial key to $t+1$ round keys of $n$ bits each. Then, the average correlation over all keys of any non-trivial linear approximation $(\alpha_0, \alpha_t)$ equals zero, i.e.,

\[
\forall \alpha_0, \alpha_t \in \mathbb{F}_2^n \setminus \{0\}\colon \quad \frac{1}{2^{(t+1)n}} \sum_{k \in \mathbb{F}_2^{(t+1)n}} \mathrm{cor}_{E_k}(\alpha_0, \alpha_t) = 0 .
\]

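Proof. Let $\alpha_0, \alpha_t \in \mathbb{F}_2^n \setminus \{0\}$. Then,

\[
\begin{aligned}
\sum_{k \in \mathbb{F}_2^{(t+1)n}} \mathrm{cor}_{E_k}(\alpha_0, \alpha_t)
&= \sum_{k_0, \dots, k_t \in \mathbb{F}_2^n} \; \sum_{\alpha_1, \dots, \alpha_{t-1} \in \mathbb{F}_2^n} (-1)^{\langle \alpha_0, k_0 \rangle + \dots + \langle \alpha_t, k_t \rangle} \prod_{i=1}^{t} \mathrm{cor}_{R_i}(\alpha_{i-1}, \alpha_i) \\
&= \sum_{\alpha_1, \dots, \alpha_{t-1} \in \mathbb{F}_2^n} \prod_{i=1}^{t} \mathrm{cor}_{R_i}(\alpha_{i-1}, \alpha_i) \sum_{k_0, \dots, k_t \in \mathbb{F}_2^n} (-1)^{\langle \alpha_0, k_0 \rangle + \dots + \langle \alpha_t, k_t \rangle} \\
&= \sum_{\alpha_1, \dots, \alpha_{t-1} \in \mathbb{F}_2^n} \prod_{i=1}^{t} \mathrm{cor}_{R_i}(\alpha_{i-1}, \alpha_i) \cdot 0 \\
&= 0 .
\end{aligned}
\]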

Corollary 2.3 (See Theorem 7.9.1 in [DR02]). Let $E$ be an $(n, (t+1)n)$-block cipher that is defined as a $t$-round key-alternating cipher (as depicted in Figure 2.2 with $R_0 = \mathrm{id}$). Let further $\mathrm{KeySchedule}\colon \mathbb{F}_2^{(t+1)n} \to (\mathbb{F}_2^n)^{t+1}$ bijectively map the $(t+1)n$-bit initial key to $t+1$ round keys of $n$ bits each. Then, for the average square correlation over all keys of the linear approximation $(\alpha_0, \alpha_t)$, one obtains

\[
\frac{1}{2^{(t+1)n}} \sum_{k \in \mathbb{F}_2^{(t+1)n}} \mathrm{cor}_{E_k}(\alpha_0, \alpha_t)^2 = \sum_{\alpha_1, \dots, \alpha_{t-1}} \prod_{i=1}^{t} \mathrm{cor}_{R_i}(\alpha_{i-1}, \alpha_i)^2 .
\]
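Both corollaries can be checked exhaustively on a toy instance. The sketch below assumes a hypothetical 3-bit, 2-round key-alternating cipher with independent round keys; the unkeyed rounds `R1` and `R2` are arbitrary permutations chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

N = 8                                   # 2^n with n = 3
R1 = [0, 1, 3, 6, 7, 4, 5, 2]           # assumed unkeyed round R_1
R2 = [5, 2, 7, 0, 3, 6, 1, 4]           # assumed unkeyed round R_2

def dot(a, b):
    return bin(a & b).count("1") & 1

def cor(f, alpha, beta):
    total = sum((-1) ** (dot(alpha, x) ^ dot(beta, f[x])) for x in range(N))
    return Fraction(total, N)

def keyed_cor(a0, a2, keys):
    """Correlation over E_k = Add_k2 o R2 o Add_k1 o R1 o Add_k0."""
    k0, k1, k2 = keys
    table = [R2[R1[x ^ k0] ^ k1] ^ k2 for x in range(N)]
    return cor(table, a0, a2)

def check(a0, a2):
    cors = [keyed_cor(a0, a2, ks) for ks in product(range(N), repeat=3)]
    mean = sum(cors) / len(cors)                       # Corollary 2.2
    mean_sq = sum(c * c for c in cors) / len(cors)     # left side of Corollary 2.3
    trail_sq = sum((cor(R1, a0, a1) * cor(R2, a1, a2)) ** 2 for a1 in range(N))
    return mean, mean_sq, trail_sq

results = {(a0, a2): check(a0, a2) for a0 in range(1, N) for a2 in range(1, N)}
```

Each entry of `results` holds the average correlation, the average square correlation, and the sum of squared trail correlations for one non-trivial approximation.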

Linear Approximations over the Building Blocks in SP Ciphers

We have already seen the correlation of a linear approximation over a key addition $\mathrm{Add}_k$. For the other building blocks of SP ciphers, similar to the case of differentials as described in Proposition 2.1, there are well-known simplified expressions for the correlations of linear approximations. As they are straightforward to obtain, we state them without proof.


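Proposition 2.2 (Correlation over the Building Blocks of SP ciphers). One can compute the correlation of a linear approximation over an S-box layer and a linear layer as follows:

(i) Let $S\colon (\mathbb{F}_2^s)^{n_s} \to (\mathbb{F}_2^s)^{n_s}$ be the $n_s$-times parallel application of a bijective function $\mathrm{Sb}\colon \mathbb{F}_2^s \to \mathbb{F}_2^s$. Then,

\[
\mathrm{cor}_S\big((\alpha_1, \dots, \alpha_{n_s}), (\beta_1, \dots, \beta_{n_s})\big) = \prod_{i=1}^{n_s} \mathrm{cor}_{\mathrm{Sb}}(\alpha_i, \beta_i) .
\]

(ii) Let $L\colon \mathbb{F}_2^n \to \mathbb{F}_2^n$, $x \mapsto Mx$ be a linear permutation given by $M \in \mathrm{GL}_n(\mathbb{F}_2)$. Then,

\[
\mathrm{cor}_L(\alpha, \beta) =
\begin{cases}
1 & \text{if } \alpha = M^\top \beta \\
0 & \text{else.}
\end{cases}
\]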
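Part (ii) can be illustrated with a short exhaustive check. The sketch below draws a random $6 \times 6$ binary matrix (a hypothetical example; it is not tested for invertibility, which the checked identity does not rely on) and confirms that the correlation over $x \mapsto Mx$ is 1 exactly for $\alpha = M^\top \beta$ and 0 otherwise.

```python
import random

n = 6

def dot(a, b):
    return bin(a & b).count("1") & 1

def apply(rows, x):
    """Compute Mx over F_2; rows[i] is row i of M packed into an integer."""
    return sum(dot(rows[i], x) << i for i in range(len(rows)))

def apply_transpose(rows, beta):
    """Compute M^T beta: XOR of the rows of M selected by the bits of beta."""
    out = 0
    for i, row in enumerate(rows):
        if (beta >> i) & 1:
            out ^= row
    return out

def cor(f, alpha, beta):
    size = len(f)
    return sum((-1) ** (dot(alpha, x) ^ dot(beta, f[x])) for x in range(size)) / size

rng = random.Random(1)                           # fixed seed for reproducibility
M = [rng.randrange(1 << n) for _ in range(n)]    # random binary matrix (assumption)
L = [apply(M, x) for x in range(1 << n)]         # lookup table of x -> Mx

mask_rule_holds = all(
    cor(L, a, b) == (1 if a == apply_transpose(M, b) else 0)
    for a in range(1 << n) for b in range(1 << n)
)
```

The check rests on the identity $\langle \beta, Mx \rangle = \langle M^\top \beta, x \rangle$ together with Equation 2.4.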

The Standard Argument on the Resistance against Linear Attacks

Similar to the argument for the resistance against differential cryptanalysis, the standard designer's argument for the resistance against linear cryptanalysis is based on a single linear trail. In particular, the aim is to guarantee a low upper bound (i.e., $< 2^{-\frac{n}{2}}$) on the absolute correlation of any non-zero linear trail over a reduced-round version of the cipher. Then, under the simplifying assumption that the correlation of the linear approximation as given in Corollary 2.1 is dominated by the correlation of a single linear trail, the absolute correlation of the linear approximation would be too low to distinguish the round-reduced version of the cipher from a random permutation.

Extensions and Generalizations

Over the years, several extensions and generalizations of linear cryptanalysis appeared. As an example, we refer to zero-correlation attacks [BR14], in which linear approximations with correlation exactly zero are utilized.

In principle, instead of approximating the cipher's input and output by linear Boolean functions $l_\alpha$ and $l_\beta$, one can think about approximating the input and output by any, not necessarily linear, Boolean functions $g$ and $h$, respectively. Analogous to the correlation of a linear approximation, for a vectorial function $F\colon \mathbb{F}_2^k \to \mathbb{F}_2^l$, the correlation of an approximation $(g, h)$, with $g \in \mathcal{B}_k$, $h \in \mathcal{B}_l$, is defined as

\[
\mathrm{cor}_F(g, h) := 2 \cdot \mathrm{Prob}_x\big(g(x) = h(F(x))\big) - 1 .
\]

In the case where $g$ and $h$ are balanced, the distinguisher works similarly to the case of linear cryptanalysis. The usage of nonlinear approximations as a generalization of linear cryptanalysis was first considered in [HKM95] and [KR96b]. However, for a long time, nonlinear approximations were no real threat to practical block ciphers, as there is a complex key dependency and no real possibility of iteratively joining approximations, as is done with linear trails. More recently, the usage of nonlinear approximations was rediscovered with the introduction of so-called invariant attacks, i.e., invariant subspace [LAAZ11] and nonlinear invariant attacks [TLS16]. These attacks utilize a nonlinear approximation $(g, h)$ for which $\mathrm{cor}_{E_k}(g, h) \in \{-1, 1\}$ for a significant fraction of weak keys $k$. They work especially well for lightweight ciphers with a simple key schedule, see Section 2.5 and Chapter 6.

2.4 The Wide-Trail Strategy and AES-like Ciphers

In this section, we explain the wide-trail strategy as introduced by Daemen in [Dae95]. It proposes a design strategy for key-alternating block ciphers that allows for simple arguments on the resistance against differential and linear attacks. The main starting point is that the (unkeyed) round functions $R_i$ of the key-alternating cipher are composed as $R_i = L \circ S$ for an S-box layer $S$ and a linear layer $L$. Thereby, $S$ consists of an $n_s$-times parallel application of the bijective $s$-bit S-box $\mathrm{Sb}$ and $L$ can be given by $M \in \mathrm{GL}_n(\mathbb{F}_2)$ as $x \mapsto Mx$. Instead of using an ordinary bit permutation for $L$, as was originally done in substitution-permutation networks, the wide-trail strategy explains how the linear layer could be chosen in a more general way to avoid the existence of differential (resp. linear) trails with high probability (resp. absolute correlation).

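For a vector $v = (v_1, \dots, v_d) \in (\mathbb{F}_2^s)^d$, its weight with respect to $s$-bit words, denoted $w_s(v)$, is defined as the total number of non-zero $s$-bit words in $v$. Formally,

\[
w_s(v) := \sum_{i=1}^{d} \delta_s(v_i), \quad \text{where } \delta_s(v_i) :=
\begin{cases}
1 \in \mathbb{Z} & \text{if } v_i \neq 0 \\
0 \in \mathbb{Z} & \text{if } v_i = 0 .
\end{cases}
\]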

Thereby, the value of $\delta_s(v_i)$ is said to be the activity of $v_i$. We further define the function

\[
\vec{\delta}_s\colon (\mathbb{F}_2^s)^d \to \mathbb{Z}^d, \quad (v_1, \dots, v_d) \mapsto (\delta_s(v_1), \dots, \delta_s(v_d))
\]

and we call $\vec{\delta}_s(v)$ the activity pattern of $v$. Let now $C = (\alpha^{(0)}, \alpha^{(1)}, \dots, \alpha^{(t)})$ be a $t$-round (differential or linear) trail,¹¹ where $\forall r \in \{0, \dots, t\}$, $\alpha^{(r)} = (\alpha^{(r)}_1, \dots, \alpha^{(r)}_{n_s}) \in (\mathbb{F}_2^s)^{n_s}$. The weight of the trail with respect to $s$-bit words, denoted $\mathrm{wt}_s(C)$, is defined as the total number of non-zero $s$-bit words in the first $t$ difference (resp. mask) patterns for the round inputs, i.e.,

\[
\mathrm{wt}_s(C) := \sum_{r=0}^{t-1} w_s(\alpha^{(r)}) = \sum_{r=0}^{t-1} \sum_{i=1}^{n_s} \delta_s(\alpha^{(r)}_i) .
\]

¹¹ We change to superscript notation $(r)$ for denoting the round $r$ within a trail.


Note that, as the round function is defined as $L \circ S$ with $S$ splitting into smaller $s$-bit S-boxes as described above, the $s$-bit words in the first $t$ components of the trail correspond exactly to the S-box input differences (resp. masks). Therefore, if $\delta_s(\alpha^{(r)}_i) = 1$, the $i$-th S-box is called active. If $\delta_s(\alpha^{(r)}_i) = 0$, it is called passive.
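The notions of weight, activity pattern and trail weight translate directly into code. The following is a minimal sketch in which a state is packed into an integer, with the first word in the least significant bits; this encoding, the function names and the example trail are assumptions made for illustration.

```python
def words(v, s, d):
    """Split an (s*d)-bit integer into d words of s bits (word 1 = least significant)."""
    mask = (1 << s) - 1
    return [(v >> (s * i)) & mask for i in range(d)]

def activity_pattern(v, s, d):
    """The vector (delta_s(v_1), ..., delta_s(v_d))."""
    return [1 if w != 0 else 0 for w in words(v, s, d)]

def word_weight(v, s, d):
    """w_s(v): number of non-zero s-bit words in v."""
    return sum(activity_pattern(v, s, d))

def trail_weight(trail, s, d):
    """wt_s(C): active words in the first t patterns alpha^(0), ..., alpha^(t-1)."""
    return sum(word_weight(a, s, d) for a in trail[:-1])

# Hypothetical 2-round trail on a state of four 4-bit words.
trail = [0x00F0, 0x1001, 0x0300]
weight = trail_weight(trail, s=4, d=4)
```

In this example, the first pattern has one active S-box and the second has two, so the trail weight counts three active S-boxes; the last pattern is an output pattern and does not contribute.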

The Wide-Trail Security Argument

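According to Proposition 2.1, we have a simple formula for the product of the probabilities of the single-round differentials contained in the differential trail $C$. Similarly, if $C$ is a linear trail, we have a simple formula for the absolute correlation of $C$ according to Proposition 2.2. In particular,

\[
\prod_{r=1}^{t} \mathrm{Prob}\big(\alpha^{(r-1)} \xrightarrow{L \circ S} \alpha^{(r)}\big) = \prod_{r=1}^{t} \prod_{i=1}^{n_s} \mathrm{Prob}\big(\alpha^{(r-1)}_i \xrightarrow{\mathrm{Sb}} L^{-1}(\alpha^{(r)})_i\big)
\]

and

\[
\big|\mathrm{cor}_{L \circ S, \dots, L \circ S}\big(\alpha^{(0)}, \alpha^{(1)}, \dots, \alpha^{(t)}\big)\big| = \prod_{r=1}^{t} \prod_{i=1}^{n_s} \big|\mathrm{cor}_{\mathrm{Sb}}\big(\alpha^{(r-1)}_i, (M^\top \alpha^{(r)})_i\big)\big| .
\]

We can then bound the product of the probabilities of the single-round differentials contained in a differential trail $C$ as

\[
\prod_{r=1}^{t} \mathrm{Prob}\big(\alpha^{(r-1)} \xrightarrow{L \circ S} \alpha^{(r)}\big) \le p_{\mathrm{Sb}}^{\mathrm{wt}_s(C)}, \quad \text{where } p_{\mathrm{Sb}} := \max_{\alpha \neq 0, \beta} \{\mathrm{Prob}(\alpha \xrightarrow{\mathrm{Sb}} \beta)\} .
\]

Analogously, the absolute correlation of a linear trail $C$ can be upper bounded as

\[
\big|\mathrm{cor}_{L \circ S, \dots, L \circ S}(\alpha^{(0)}, \alpha^{(1)}, \dots, \alpha^{(t)})\big| \le c_{\mathrm{Sb}}^{\mathrm{wt}_s(C)}, \quad \text{where } c_{\mathrm{Sb}} := \max_{\alpha, \beta \neq 0} \{|\mathrm{cor}_{\mathrm{Sb}}(\alpha, \beta)|\} .
\]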

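We say that a $t$-round differential trail $C = (\alpha^{(0)}, \alpha^{(1)}, \dots, \alpha^{(t)}) \neq 0$ is valid if every differential probability of the single-round differentials it contains is non-zero. We denote the set of valid $t$-round differential trails over $R_t \circ \dots \circ R_2 \circ R_1$ by $\mathcal{V}^{\mathrm{diff}}_{R_1,\dots,R_t}$. Formally,

\[
\mathcal{V}^{\mathrm{diff}}_{R_1,\dots,R_t} = \{(\alpha^{(0)}, \alpha^{(1)}, \dots, \alpha^{(t)}) \neq 0 \mid \forall r \in \{1, \dots, t\}\colon \mathrm{Prob}(\alpha^{(r-1)} \xrightarrow{R_r} \alpha^{(r)}) \neq 0\} .
\]

Analogously, we say that a $t$-round linear trail $C = (\alpha^{(0)}, \alpha^{(1)}, \dots, \alpha^{(t)}) \neq 0$ is valid if its correlation is non-zero. We denote the set of valid $t$-round linear trails over $R_t \circ \dots \circ R_2 \circ R_1$ by $\mathcal{V}^{\mathrm{lin}}_{R_1,\dots,R_t}$. Formally,

\[
\mathcal{V}^{\mathrm{lin}}_{R_1,\dots,R_t} = \{(\alpha^{(0)}, \alpha^{(1)}, \dots, \alpha^{(t)}) \neq 0 \mid \forall r \in \{1, \dots, t\}\colon \mathrm{cor}_{R_r}(\alpha^{(r-1)}, \alpha^{(r)}) \neq 0\} .
\]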

The goal of the cipher designer is to guarantee a low upper bound on the product of the single-round differential probabilities of any valid differential trail and on the absolute correlation of any valid linear trail, respectively. There are basically two approaches the designer can focus on. Firstly, one can choose an S-box $\mathrm{Sb}$ for which $p_{\mathrm{Sb}}$ and $c_{\mathrm{Sb}}$ are low. These quantities cannot be made arbitrarily small: as lower bounds, one can show that, for any $s$-bit S-box $\mathrm{Sb}$, $p_{\mathrm{Sb}} \ge 2^{1-s}$ and $c_{\mathrm{Sb}} \ge 2^{\frac{-s+1}{2}}$, see [NK93, CV95]. In the first place, this suggests that the word length $s$ should be chosen to be large, ideally $s = n$. However, as the value of $s$ gets larger, the size of the description of the S-box as a look-up table grows exponentially. Therefore, one usually aims for $s$ to be fairly low, e.g., $s \le 8$.
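For a concrete S-box, $p_{\mathrm{Sb}}$ and $c_{\mathrm{Sb}}$ can be computed exhaustively and compared with the two lower bounds. The sketch below does so for the 4-bit PRESENT S-box, which is an example chosen here and not a component discussed in this section; the function names are illustrative assumptions.

```python
from fractions import Fraction

SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]
s = 4

def dot(a, b):
    return bin(a & b).count("1") & 1

def max_diff_prob(sb):
    """p_Sb: maximum of Prob(alpha -> beta) over alpha != 0 and all beta."""
    size = len(sb)
    best = 0
    for alpha in range(1, size):
        counts = [0] * size
        for x in range(size):
            counts[sb[x] ^ sb[x ^ alpha]] += 1
        best = max(best, max(counts))
    return Fraction(best, size)

def max_abs_cor(sb):
    """c_Sb: maximum of |cor(alpha, beta)| over all alpha and beta != 0."""
    size = len(sb)
    best = 0
    for beta in range(1, size):
        for alpha in range(size):
            total = sum((-1) ** (dot(alpha, x) ^ dot(beta, sb[x])) for x in range(size))
            best = max(best, abs(total))
    return Fraction(best, size)

p_sb = max_diff_prob(SBOX)
c_sb = max_abs_cor(SBOX)
lower = Fraction(2, 2 ** s)                  # 2^(1-s)
# c_Sb >= 2^((1-s)/2) is compared via squares to stay in exact arithmetic.
bounds_hold = p_sb >= lower and c_sb ** 2 >= lower
```

Raising `p_sb` and `c_sb` to the power of the trail weight then gives the bounds on the trail probability and the absolute trail correlation stated above.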

A second approach would be to design a linear layer $L$ that maximizes the minimum weight of any valid differential or linear trail. This suggests the usage of a more complex, carefully chosen $L$ and is exactly the approach of the wide-trail strategy. In a nutshell, it should be the responsibility of the linear layer to guarantee high diffusion in differential and linear trails. According to Proposition 2.1 and as explained above, the set

\[
\Gamma^{\mathrm{diff}}_{L,s,t} := \{(\alpha^{(0)}, \dots, \alpha^{(t)}) \neq 0 \mid \forall r \in \{1, \dots, t\}\colon \vec{\delta}_s(\alpha^{(r-1)}) = \vec{\delta}_s(L^{-1}(\alpha^{(r)}))\} \tag{2.6}
\]

contains all valid $t$-round differential trails as a subset. Moreover, it is only dependent on the linear layer $L$ and not on the S-box layer $S$. Similarly, according to Proposition 2.2, the set

\[
\Gamma^{\mathrm{lin}}_{L,s,t} := \{(\alpha^{(0)}, \dots, \alpha^{(t)}) \neq 0 \mid \forall r \in \{1, \dots, t\}\colon \vec{\delta}_s(\alpha^{(r-1)}) = \vec{\delta}_s(M^\top \alpha^{(r)})\} \tag{2.7}
\]

contains all valid $t$-round linear trails as a subset and is not dependent on the particular S-box layer $S$. The goal of the wide-trail strategy is to define a linear layer

\[
L\colon \mathbb{F}_2^{s \cdot n_s} \to \mathbb{F}_2^{s \cdot n_s}
\]

that guarantees a high weight for all trails in $\Gamma^{\mathrm{diff}}_{L,s,t}$ and $\Gamma^{\mathrm{lin}}_{L,s,t}$, for some value of $t$. One also uses the wording that $L$ should lead to a high number of active S-boxes over $t$ rounds. The value of $t$ for which the minimum number of active S-boxes can be proven to be high enough would be an indicator for the number of rounds specified for the cipher.

It is worth remarking that, whenever the weight of any valid $t$-round trail is at least $w$, the weight of any valid $dt$-round trail must be at least $d \cdot w$.

2.4.1 The Branch Number and a Link to Coding Theory

In [Dae95], Daemen introduced the branch number of a linear transformation as a measure for its diffusion. Indeed, it provides a lower bound on the minimum number of active S-boxes in any valid differential (resp. linear) trail over two rounds.

Definition 2.7 (Differential and linear branch number (see, e.g., pp. 131–132 of [DR02])). Let $L\colon \mathbb{F}_2^{s \cdot n_s} \to \mathbb{F}_2^{s \cdot n_s}$, $x \mapsto Mx$ be an $\mathbb{F}_2$-linear transformation. The differential branch number of $L$ with respect to $s$-bit words is defined as

\[
\mathcal{B}^{\mathrm{diff}}_s(L) := \min_{\alpha \neq 0} \{w_s(\alpha) + w_s(L(\alpha))\} .
\]

Analogously, the linear branch number of $L$ with respect to $s$-bit words is defined as

\[
\mathcal{B}^{\mathrm{lin}}_s(L) := \min_{\alpha \neq 0} \{w_s(\alpha) + w_s(M^\top \alpha)\} .
\]

Proposition 2.3 (Two-Round Propagation Theorem (Theorem 9.3.1 in [DR02])). The weight with respect to $s$-bit words of any valid two-round differential (resp. linear) trail over round functions $L \circ S$, where $S$ consists of a parallel application of an $s$-bit S-box, is lower bounded by the differential (resp. linear) branch number of $L$, i.e.,

\[
\min_{C \in \mathcal{V}^{\mathrm{diff}}_{L \circ S, L \circ S}} \{\mathrm{wt}_s(C)\} \ge \mathcal{B}^{\mathrm{diff}}_s(L) \quad \text{and} \quad \min_{C \in \mathcal{V}^{\mathrm{lin}}_{L \circ S, L \circ S}} \{\mathrm{wt}_s(C)\} \ge \mathcal{B}^{\mathrm{lin}}_s(L) .
\]

Proof. As the case of linear trails can be proven in a similar way as the case of differential trails, we only show the proof for differential trails here.

We consider trails in $\Gamma^{\mathrm{diff}}_{L,s,t}$. In particular, let $C = (\alpha^{(0)}, \alpha^{(1)}, \alpha^{(2)})$ be a differential trail in $\Gamma^{\mathrm{diff}}_{L,s,2}$. Then, $\vec{\delta}_s(\alpha^{(0)}) = \vec{\delta}_s(L^{-1}(\alpha^{(1)}))$. Moreover,

\[
\mathrm{wt}_s(C) = w_s(\alpha^{(0)}) + w_s(\alpha^{(1)}) = w_s(L^{-1}(\alpha^{(1)})) + w_s(\alpha^{(1)}) \ge \mathcal{B}^{\mathrm{diff}}_s(L) ,
\]

as $\alpha^{(1)} \neq 0$.

One immediately obtains

Corollary 2.4. The weight with respect to $s$-bit words of any valid $2t$-round differential (resp. linear) trail over round functions $L \circ S$, where $S$ consists of a parallel application of an $s$-bit S-box, is lower bounded by $t \cdot \mathcal{B}^{\mathrm{diff}}_s(L)$ (resp. $t \cdot \mathcal{B}^{\mathrm{lin}}_s(L)$).
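For small parameters, the branch number of Definition 2.7 can be determined by exhaustive search. The sketch below assumes a toy linear layer on four 2-bit words in which every output word is the XOR of the other three input words (the binary circulant matrix circ(0, 1, 1, 1)); this layer and the function names are illustrative assumptions.

```python
from itertools import product

s, d = 2, 4     # four 2-bit words: small enough for exhaustive search

def weight(state):
    """w_s: number of non-zero words in the state."""
    return sum(w != 0 for w in state)

def mix(state):
    """Toy linear layer circ(0,1,1,1): each output word is the XOR of the other three."""
    total = 0
    for w in state:
        total ^= w
    return tuple(total ^ w for w in state)

def identity(state):
    return tuple(state)

def branch_number(layer):
    """min over non-zero alpha of w_s(alpha) + w_s(layer(alpha))."""
    states = product(range(1 << s), repeat=d)
    return min(weight(a) + weight(layer(a)) for a in states if any(a))

bn_mix = branch_number(mix)
bn_id = branch_number(identity)
```

As the matrix of the toy layer is symmetric, its differential and linear branch numbers coincide. By Corollary 2.4, a branch number of $\mathcal{B}$ translates into at least $t \cdot \mathcal{B}$ active S-boxes over $2t$ rounds.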

It is obvious that for any $\mathbb{F}_2$-linear transformation on $\mathbb{F}_2^{sd}$, its differential and linear branch number with respect to $s$-bit words is at most $d+1$, since an input with a single non-zero word has weight 1 and its image has weight at most $d$. In [RDP+96], the relation between the branch number of a linear transformation and the minimum distance of a linear code was pointed out. In particular, if $L\colon \mathbb{F}_{2^s}^d \to \mathbb{F}_{2^s}^d$ is defined as an $\mathbb{F}_{2^s}$-linear transformation over the $d$-dimensional vector space $\mathbb{F}_{2^s}^d$, the differential branch number of $L$ is equal to the minimum distance of the $\mathbb{F}_{2^s}$-linear code $\mathcal{C}$ of length $2d$ and dimension $d$ generated by $[I_d \; M]$, where $I_d$ is the $d \times d$ identity matrix and $M \in \mathrm{GL}_d(\mathbb{F}_{2^s})$ is the matrix associated with $L$ for some choice of basis. Analogously, the linear branch number of $L$ is equal to the minimum distance of the dual code $\mathcal{C}^\perp$, which is generated by $[M^\top \; I_d]$. From the Singleton bound for linear codes (see [MS77, Chapter 1, Theorem 11]), one immediately obtains that the minimum distance of each of those codes is upper bounded by $d+1$. As linear codes with highest possible minimum distance are called maximum distance separable (MDS), the following terminology has been established for matrices that lead to an optimal branch number.

Definition 2.8 (MDS Matrix). A $d \times d$ matrix $M$ with coefficients in $\mathbb{F}_{2^s}$ is called maximum distance separable (MDS) if the $\mathbb{F}_{2^s}$-linear code of length $2d$ and dimension $d$ generated by $[I_d \; M]$ is MDS.


Since for every MDS code its dual code is also MDS (see [MS77, Chapter 11, Theorem 2]), it follows that a $d \times d$ MDS matrix $M$ corresponds to a linear transformation on $\mathbb{F}_{2^s}^d$ with optimal differential and linear branch number of $d+1$. Further, an MDS matrix can be characterized as follows.

Theorem 2.3 (Theorem 8, p. 321 in [MS77]). Let $M$ be a $d \times d$ matrix with coefficients in $\mathbb{F}_{2^s}$. Then, $M$ is MDS if and only if all its square submatrices are invertible.

It follows that MDS matrices must be invertible. A discussion on a more general MDS property in cases where $L$ is not $\mathbb{F}_{2^s}$-linear, but just $\mathbb{F}_2$-linear, is made in Section 3.5.
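The characterization in Theorem 2.3 yields a direct, if naive, test for the MDS property. The following sketch checks all square submatrices of a matrix over $\mathbb{F}_{2^8}$, using the AES field polynomial and the AES MixColumns matrix as an example; these concrete choices and the function names are assumptions made for illustration, and the Laplace expansion is only practical for small $d$.

```python
from itertools import combinations

def gf_mul(a, b, poly=0x11B):
    """Multiply in F_{2^8} modulo x^8 + x^4 + x^3 + x + 1 (the AES polynomial)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= poly
        b >>= 1
    return r

def det(m):
    """Determinant by Laplace expansion; in characteristic two all signs are +1."""
    if len(m) == 1:
        return m[0][0]
    result = 0
    for j, entry in enumerate(m[0]):
        minor = [row[:j] + row[j + 1:] for row in m[1:]]
        result ^= gf_mul(entry, det(minor))
    return result

def is_mds(m):
    """True if every square submatrix of m has a non-zero determinant."""
    d = len(m)
    for size in range(1, d + 1):
        for rows in combinations(range(d), size):
            for cols in combinations(range(d), size):
                sub = [[m[r][c] for c in cols] for r in rows]
                if det(sub) == 0:
                    return False
    return True

AES_MIX = [[2, 3, 1, 1], [1, 2, 3, 1], [1, 1, 2, 3], [3, 1, 1, 2]]
BINARY = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]

aes_is_mds = is_mds(AES_MIX)
binary_is_mds = is_mds(BINARY)
```

The second matrix contains zero entries, i.e., singular $1 \times 1$ submatrices, and therefore cannot be MDS.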

In order to guarantee the highest possible number of active S-boxes, and thus to obtain a high resistance against differential and linear attacks, the wide-trail strategy suggests the application of an MDS matrix for the linear layer $L$. However, such an MDS matrix may cause a huge implementation overhead for larger values of $n_s$. In the following, we explain a cipher design that allows for a better trade-off in terms of the minimum number of active S-boxes and implementation efficiency.

2.4.2 AES-like Ciphers

We now explain a block cipher structure that was especially designed according to the wide-trail strategy. Originally, the structure was introduced with the block cipher SQUARE [DKR97], a predecessor of the Rijndael cipher which was adopted as the Advanced Encryption Standard (AES) in 2001 [PUB01]. As the AES is the best-known and best-studied block cipher today, we explain its particular round function in the following. However, as the AES inspired lots of other designs (e.g., ANUBIS [BR00b], LED [GPPR11], mCrypton [LK06], Midori [BBI+15], PHOTON [GPP11], Prince [BCG+12], QARMA [Ava17], Skinny and Mantis (Chapter 5), and Whirlpool [BR00a]), we give a more general definition first. We call ciphers designed according to this general notion AES-like ciphers.

An AES-like cipher fulfills the notion of an SP cipher as depicted in Figure 2.3. In particular, it is a key-alternating block cipher operating on a block length of $n = s \cdot n_s$, where $n_s$ is split into two dimensions $n_r, n_c$ such that $n_s = n_r \cdot n_c$. For a better representation, one usually writes the cipher’s input, output and its internal states $x \in \mathbb{F}_2^{s n_r n_c}$ as an $n_r \times n_c$-dimensional array with $s$-bit words,$^{12}$ i.e.,

$$x = \begin{bmatrix} x_1 & x_{n_r+1} & \cdots & x_{(n_c-1)n_r+1} \\ x_2 & x_{n_r+2} & \cdots & x_{(n_c-1)n_r+2} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n_r} & x_{2n_r} & \cdots & x_{n_c n_r} \end{bmatrix}, \quad x_i \in \mathbb{F}_{2^s}.$$

$^{12}$ Note that, in the following, we will represent an $s$-bit word $x_i$ as an element of the finite field with $2^s$ elements, i.e., $x_i \in \mathbb{F}_{2^s}$.

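[Figure 2.6, diagram not reproduced: a message $m$ is mapped to a ciphertext $c$ by alternating the addition of the round keys $k_0, k_1, \dots, k_t$ with the round $R_{Sb,p,M}$. Each round $R_{Sb,p,M}$ consists of the S-box layer $S_{Sb}$ (sixteen parallel S-boxes $Sb$), followed by $\mathrm{Permute}_p$ and $\mathrm{Mix}_M$.]

Figure 2.6: The structure of a key-alternating cipher with an AES-like round, illustrated for a $4 \times 4$ state of $s$-bit words.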

The characteristic of an AES-like cipher is that it adheres to a special kind of round function, as given in Definition 2.9 and depicted in Figure 2.6. After the application of such an (unkeyed) round, a round key $k^{(r)} \in \mathbb{F}_2^{s n_r n_c}$ is added to the cipher’s internal state, as is common for a key-alternating cipher. Our notion of an AES-like cipher sets no requirements on the key-scheduling algorithm and, for simplicity, we will ignore it in the following. In any case, as explained earlier in Sections 2.3.1 and 2.3.2, the standard security argument against differential and linear attacks is made under the assumption of independent round keys.

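Definition 2.9. An AES-like round is defined as a permutation

$$R_{Sb,p,M} : \mathbb{F}_{2^s}^{n_r n_c} \to \mathbb{F}_{2^s}^{n_r n_c},$$

which is parametrized by the word length $s \in \mathbb{N}$, the state dimension $n_r, n_c \in \mathbb{N}$, where $n_r$ denotes the number of rows and $n_c$ denotes the number of columns of an $n_r \times n_c$ state, an invertible S-box $Sb : \mathbb{F}_{2^s} \to \mathbb{F}_{2^s}$, a permutation $p \in S_{n_r n_c}$, and a matrix $M \in \mathrm{GL}_{n_r}(\mathbb{F}_{2^s})$. In particular, the round function $R_{Sb,p,M}$ is composed of the bijective transformations $S_{Sb}$, $\mathrm{Permute}_p$ and $\mathrm{Mix}_M$ operating on an $n_r \times n_c$ state, such that $R_{Sb,p,M} = \mathrm{Mix}_M \circ \mathrm{Permute}_p \circ S_{Sb}$:

1. $S_{Sb}$ is a parallel application of the S-box $Sb : \mathbb{F}_{2^s} \to \mathbb{F}_{2^s}$ to all $n_r \cdot n_c$ words of the state.

$$S_{Sb} : (\mathbb{F}_{2^s})^{n_r n_c} \to (\mathbb{F}_{2^s})^{n_r n_c}$$
$$\forall i \in \mathbb{N}_0^{<n_r}, \forall j \in \mathbb{N}_0^{<n_c} : \; x_{n_r j+i+1} \mapsto Sb(x_{n_r j+i+1}).$$

2. $\mathrm{Permute}_p$ permutes the words of the state according to the permutation $p$, i.e.,

$$\mathrm{Permute}_p : (\mathbb{F}_{2^s})^{n_r n_c} \to (\mathbb{F}_{2^s})^{n_r n_c}$$
$$\forall i \in \mathbb{N}_0^{<n_r}, \forall j \in \mathbb{N}_0^{<n_c} : \; x_{n_r j+i+1} \mapsto x_{p(n_r j+i+1)}.$$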


Table 2.1: The S-box $Sb_{\mathrm{AES}}$ used in the AES. For each (hexadecimal) value of $x$ (row) and $y$ (column), the table shows $Sb_{\mathrm{AES}}(x \| y)$ as a hexadecimal value. For instance, $Sb_{\mathrm{AES}}(\mathtt{1B}) = \mathtt{AF}$.

x\y   0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
0    63 7C 77 7B F2 6B 6F C5 30 01 67 2B FE D7 AB 76
1    CA 82 C9 7D FA 59 47 F0 AD D4 A2 AF 9C A4 72 C0
2    B7 FD 93 26 36 3F F7 CC 34 A5 E5 F1 71 D8 31 15
3    04 C7 23 C3 18 96 05 9A 07 12 80 E2 EB 27 B2 75
4    09 83 2C 1A 1B 6E 5A A0 52 3B D6 B3 29 E3 2F 84
5    53 D1 00 ED 20 FC B1 5B 6A CB BE 39 4A 4C 58 CF
6    D0 EF AA FB 43 4D 33 85 45 F9 02 7F 50 3C 9F A8
7    51 A3 40 8F 92 9D 38 F5 BC B6 DA 21 10 FF F3 D2
8    CD 0C 13 EC 5F 97 44 17 C4 A7 7E 3D 64 5D 19 73
9    60 81 4F DC 22 2A 90 88 46 EE B8 14 DE 5E 0B DB
A    E0 32 3A 0A 49 06 24 5C C2 D3 AC 62 91 95 E4 79
B    E7 C8 37 6D 8D D5 4E A9 6C 56 F4 EA 65 7A AE 08
C    BA 78 25 2E 1C A6 B4 C6 E8 DD 74 1F 4B BD 8B 8A
D    70 3E B5 66 48 03 F6 0E 61 35 57 B9 86 C1 1D 9E
E    E1 F8 98 11 69 D9 8E 94 9B 1E 87 E9 CE 55 28 DF
F    8C A1 89 0D BF E6 42 68 41 99 2D 0F B0 54 BB 16

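3. $\mathrm{Mix}_M$ applies a left-multiplication by the $n_r \times n_r$ matrix $M$ to all columns of the state, i.e.,

$$\mathrm{Mix}_M : (\mathbb{F}_{2^s})^{n_r n_c} \to (\mathbb{F}_{2^s})^{n_r n_c}$$
$$\forall j \in \mathbb{N}_0^{<n_c} : \; [x_{n_r j+1}, \dots, x_{n_r j+n_r}]^\top \mapsto M \cdot [x_{n_r j+1}, \dots, x_{n_r j+n_r}]^\top.$$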

The Advanced Encryption Standard

The AES comes in three different versions, i.e., AES-128, AES-192 and AES-256. All of them operate on a block length of $n = 128$, but differ in their key length. In particular, AES-128 supports a key length of $\kappa = 128$, AES-192 a key length of $\kappa = 192$, and AES-256 a key length of $\kappa = 256$. The 128-bit state is represented as a $4 \times 4$ array of words of length $s = 8$. As one represents the 8-bit words by elements in the finite field with $2^8$ elements, one has to agree on a particular field representation. In the AES, the field is represented as $\mathbb{F}_{2^8} \cong \mathbb{F}_2[X]/(X^8 + X^4 + X^3 + X + 1)$ and thus, each element of the field is given as a polynomial in $\mathbb{F}_2[X]$ of degree lower than 8. In the literature, such a polynomial is usually denoted as a hexadecimal value representing the coefficient vector of the polynomial. For instance, the field element $X + 1$ is denoted by $\mathtt{03}$. Using this notation, the unkeyed round $R_{Sb_{\mathrm{AES}},p,M} : \mathbb{F}_{2^8}^{16} \to \mathbb{F}_{2^8}^{16}$ of all versions of the AES is defined as follows:


$S_{Sb_{\mathrm{AES}}}$ (SubBytes). The S-box $Sb_{\mathrm{AES}} : \mathbb{F}_{2^8} \to \mathbb{F}_{2^8}$ employed in the S-box layer is given in Table 2.1. It has the algebraic expression$^{13}$

$$x \mapsto h(x^{2^s-2}),$$

where $h$ is an affine permutation.
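The algebraic expression can be turned into a short program that regenerates Table 2.1. This is a minimal sketch: the concrete affine map $h$ used here (XOR of the byte with its left rotations by 1, 2, 3 and 4 bits, plus the constant $\mathtt{63}$) is the one from the AES specification [PUB01] and is not spelled out in the text above; the helper names are illustrative.

```python
AES_POLY = 0x11B  # X^8 + X^4 + X^3 + X + 1

def gf_mul(a, b):
    """Multiplication in F_2[X]/(X^8 + X^4 + X^3 + X + 1)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= AES_POLY
        b >>= 1
    return result

def gf_pow(a, e):
    """Square-and-multiply exponentiation in the AES field."""
    result = 1
    while e:
        if e & 1:
            result = gf_mul(result, a)
        a = gf_mul(a, a)
        e >>= 1
    return result

def rotl8(x, k):
    """Rotate an 8-bit value left by k positions."""
    return ((x << k) | (x >> (8 - k))) & 0xFF

def sbox_aes(x):
    """x -> h(x^(2^8 - 2)): field inversion (with 0 -> 0), then the affine map h."""
    y = gf_pow(x, 2**8 - 2)
    return y ^ rotl8(y, 1) ^ rotl8(y, 2) ^ rotl8(y, 3) ^ rotl8(y, 4) ^ 0x63

table = [sbox_aes(x) for x in range(256)]
print(" ".join(f"{v:02X}" for v in table[:16]))  # first row of Table 2.1
```

Since $x^{2^8-2} = x^{-1}$ for $x \neq 0$, the exponentiation realizes the field inversion mentioned in footnote 13, with $0$ mapped to $0$.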

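$\mathrm{Permute}_p$ (ShiftRows) operates as a permutation of the words of the state. In particular, it left-rotates the rows of the state by the offsets 0, 1, 2 and 3, respectively.

$$\begin{bmatrix} x_1 & x_5 & x_9 & x_{13} \\ x_2 & x_6 & x_{10} & x_{14} \\ x_3 & x_7 & x_{11} & x_{15} \\ x_4 & x_8 & x_{12} & x_{16} \end{bmatrix} \mapsto \begin{bmatrix} x_1 & x_5 & x_9 & x_{13} \\ x_6 & x_{10} & x_{14} & x_2 \\ x_{11} & x_{15} & x_3 & x_7 \\ x_{16} & x_4 & x_8 & x_{12} \end{bmatrix}.$$

This corresponds to the permutation

$$p = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\ 1 & 6 & 11 & 16 & 5 & 10 & 15 & 4 & 9 & 14 & 3 & 8 & 13 & 2 & 7 & 12 \end{pmatrix}.$$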

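$\mathrm{Mix}_M$ (MixColumns). The MDS matrix

$$M = \begin{bmatrix} \mathtt{02} & \mathtt{03} & \mathtt{01} & \mathtt{01} \\ \mathtt{01} & \mathtt{02} & \mathtt{03} & \mathtt{01} \\ \mathtt{01} & \mathtt{01} & \mathtt{02} & \mathtt{03} \\ \mathtt{03} & \mathtt{01} & \mathtt{01} & \mathtt{02} \end{bmatrix} \in \mathrm{GL}_4(\mathbb{F}_{2^8})$$

is applied to each of the four columns of the state.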
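The column-wise multiplication can be sketched as follows, assuming the state is stored as a list of 16 bytes in the column-major order $x_1, \dots, x_{16}$ used above. The function names are illustrative; the input column in the usage line is a commonly quoted MixColumns test vector.

```python
from functools import reduce

def gf_mul(a, b):
    """Multiplication in the AES field F_2[X]/(X^8 + X^4 + X^3 + X + 1)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11B
        b >>= 1
    return result

M = [[0x02, 0x03, 0x01, 0x01],
     [0x01, 0x02, 0x03, 0x01],
     [0x01, 0x01, 0x02, 0x03],
     [0x03, 0x01, 0x01, 0x02]]

def mix_column(col):
    """Left-multiply one column (four bytes) by M; field addition is XOR."""
    return [reduce(lambda acc, t: acc ^ t,
                   (gf_mul(M[i][j], col[j]) for j in range(4)))
            for i in range(4)]

def mix_columns(state):
    """Apply M to each of the four columns of a column-major 16-byte state."""
    out = []
    for j in range(4):
        out.extend(mix_column(state[4 * j:4 * j + 4]))
    return out

print([f"{b:02X}" for b in mix_column([0xDB, 0x13, 0x53, 0x45])])
```

A column whose four bytes are all equal is mapped to itself, because each row of $M$ sums to $\mathtt{02} + \mathtt{03} + \mathtt{01} + \mathtt{01} = \mathtt{01}$ in the field.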

The three versions of the AES differ in their key-scheduling algorithm and in the number of applied rounds. The key-scheduling algorithm of AES-128 takes the 128-bit initial key $k$ and generates eleven 128-bit round keys $k^{(0)}, \dots, k^{(10)}$. Thus, AES-128 is a 10-round key-alternating cipher. All rounds of the cipher are as defined above, with the single exception of the last round, which omits the MixColumns operation. Similarly, the key-scheduling algorithm of AES-192 takes the 192-bit initial key $k$ and generates 13 128-bit round keys $k^{(0)}, \dots, k^{(12)}$. Thus, AES-192 is a 12-round key-alternating cipher. Again, the MixColumns operation is omitted in the last round. The last version, AES-256, is a 14-round key-alternating cipher and the key-scheduling algorithm takes the 256-bit initial key $k$ and generates 15 128-bit round keys $k^{(0)}, \dots, k^{(14)}$. As for the other versions, the MixColumns operation in the last round is omitted. For more details on the specification of the AES, and particularly for the key-scheduling algorithms, we refer to [PUB01].

$^{13}$ It was shown in [Nyb94] that this algebraic construction based on inversion in the finite field has strong cryptographic properties. In particular, $p_{Sb_{\mathrm{AES}}} = 2^{-6}$ and $c_{Sb_{\mathrm{AES}}} = 2^{-3}$. It is still an open problem whether there exists a bijective S-box on 8 bits that improves those values.


Active S-boxes in AES-like Ciphers – The Four-Round Propagation Theorem

One of the advantages of the AES-like design is its simple structure. Moreover, if the permutation $p$ is carefully chosen, one can prove a strong lower bound on the weight of any valid four-round trail. In particular, one obtains that the minimum number of active S-boxes in any valid four-round trail is lower bounded by the square of the branch number of the linear transformation $x \mapsto Mx$. In the following, we denote the differential branch number of this transformation by $\mathcal{B}^{\mathrm{diff}}_s(M)$ and the linear branch number by $\mathcal{B}^{\mathrm{lin}}_s(M)$, respectively.

Theorem 2.4 (Four-Round Propagation Theorem (Theorem 9.5.1 in [DR02])). Let $R_{Sb,p,M}$ be an AES-like round with $n_c \geq n_r$ such that, for each column of the state, $\mathrm{Permute}_p$ distributes the words of a column to all different columns.

Then, the minimum number of active S-boxes of any valid four-round differential (resp. linear) trail is lower bounded by $\mathcal{B}^{\mathrm{diff}}_s(M)^2$ (resp. $\mathcal{B}^{\mathrm{lin}}_s(M)^2$).

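Proof. We only show the case of differential trails. Let

$$C = (\alpha^{(0)}, \alpha^{(1)}, \alpha^{(2)}, \alpha^{(3)}, \alpha^{(4)}) \in \Gamma^{\mathrm{diff}}_{\mathrm{Mix}_M \circ \mathrm{Permute}_p,\, s,\, 4}.$$

We have to show that $\mathrm{wt}_s(C) \geq \mathcal{B}^{\mathrm{diff}}_s(M)^2$. As $\mathrm{Permute}_p$ commutes with $S_{Sb}$, the four rounds of $R_{Sb,p,M}$ can be written as

$$\mathrm{Mix}_M \circ \mathrm{Permute}_p \circ S_{Sb} \circ \mathrm{Mix}_M \circ S_{Sb} \circ \mathrm{PMP} \circ S_{Sb} \circ \mathrm{Mix}_M \circ \mathrm{Permute}_p \circ S_{Sb},$$

where $\mathrm{PMP} := \mathrm{Permute}_p \circ \mathrm{Mix}_M \circ \mathrm{Permute}_p$. We thus have for the activity pattern of the trail:

$$\vec{\delta}_s(\alpha^{(0)}) = \vec{\delta}_s(\mathrm{Permute}_p^{-1}(\mathrm{Mix}_M^{-1}(\alpha^{(1)})))$$
$$\vec{\delta}_s(\alpha^{(1)}) = \vec{\delta}_s(\mathrm{PMP}^{-1}(\mathrm{Permute}_p(\alpha^{(2)})))$$
$$\vec{\delta}_s(\mathrm{Permute}_p(\alpha^{(2)})) = \vec{\delta}_s(\mathrm{Mix}_M^{-1}(\alpha^{(3)})).$$

We can now give a lower bound for the weight of the trail $C$, by using the branch number of $\mathrm{PMP}$ with respect to columns (i.e., $s n_r$-bit words), as

$$\begin{aligned}
\mathrm{wt}_s(C) &= w_s(\alpha^{(0)}) + w_s(\alpha^{(1)}) + w_s(\alpha^{(2)}) + w_s(\alpha^{(3)}) \\
&= w_s(\alpha^{(0)}) + w_s(\alpha^{(1)}) + w_s(\mathrm{Permute}_p(\alpha^{(2)})) + w_s(\alpha^{(3)}) \\
&= w_s(\mathrm{Mix}_M^{-1}(\alpha^{(1)})) + w_s(\alpha^{(1)}) + w_s(\mathrm{Permute}_p(\alpha^{(2)})) + w_s(\alpha^{(3)}) \\
&\geq \mathcal{B}^{\mathrm{diff}}_s(M) \cdot w_{s n_r}(\alpha^{(1)}) + \mathcal{B}^{\mathrm{diff}}_s(M) \cdot w_{s n_r}(\mathrm{Permute}_p(\alpha^{(2)})) \\
&= \mathcal{B}^{\mathrm{diff}}_s(M) \cdot \left( w_{s n_r}(\alpha^{(1)}) + w_{s n_r}(\mathrm{Permute}_p(\alpha^{(2)})) \right) \\
&\geq \mathcal{B}^{\mathrm{diff}}_s(M) \cdot \mathcal{B}^{\mathrm{diff}}_{s n_r}(\mathrm{PMP}).
\end{aligned}$$

It is left to show that $\mathrm{PMP}$ has a branch number with respect to columns larger than or equal to $\mathcal{B}^{\mathrm{diff}}_s(M)$. For this, let $\alpha \in \mathbb{F}_{2^s}^{n_r n_c}$ with $\alpha \neq 0$. Let further $\beta = (\beta_1, \dots, \beta_{n_r n_c}) = \mathrm{Permute}_p(\alpha)$, let $\gamma = (\gamma_1, \dots, \gamma_{n_r n_c}) = \mathrm{Mix}_M(\beta)$ and let $\delta = \mathrm{Permute}_p(\gamma)$. As $\beta \neq 0$, there exists an active column in $\beta$, i.e., there exists a $j \in \{0, \dots, n_c - 1\}$ such that

$$\delta_{s n_r}((\beta_{n_r j+1}, \dots, \beta_{n_r j+n_r})) = 1.$$

Because of the definition of the branch number of $M$, we further have

$$w_s((\beta_{n_r j+1}, \dots, \beta_{n_r j+n_r})) + w_s((\gamma_{n_r j+1}, \dots, \gamma_{n_r j+n_r})) \geq \mathcal{B}^{\mathrm{diff}}_s(M).$$

Since $\mathrm{Permute}_p^{-1}$ distributes the words $\beta_{n_r j+1}, \dots, \beta_{n_r j+n_r}$ to all different columns and $\mathrm{Permute}_p$ distributes the words $\gamma_{n_r j+1}, \dots, \gamma_{n_r j+n_r}$ to all different columns, we must have at least $w_s(\beta_{n_r j+1}, \dots, \beta_{n_r j+n_r})$ active columns in $\alpha$ and also at least $w_s(\gamma_{n_r j+1}, \dots, \gamma_{n_r j+n_r})$ active columns in $\delta$. In other words,

$$w_{s n_r}(\alpha) \geq w_s(\beta_{n_r j+1}, \dots, \beta_{n_r j+n_r}) \quad \text{and} \quad w_{s n_r}(\delta) \geq w_s(\gamma_{n_r j+1}, \dots, \gamma_{n_r j+n_r}).$$

Thus, one finally obtains $w_{s n_r}(\alpha) + w_{s n_r}(\delta) \geq \mathcal{B}^{\mathrm{diff}}_s(M)$.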

Corollary 2.5. Let $R_{Sb,p,M}$ be an AES-like round as in Theorem 2.4. Then, the minimum number of active S-boxes of any valid $4t$-round differential (resp. linear) trail is lower bounded by $t \cdot \mathcal{B}^{\mathrm{diff}}_s(M)^2$ (resp. $t \cdot \mathcal{B}^{\mathrm{lin}}_s(M)^2$).
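As an illustration, the hypothesis of Theorem 2.4 on $\mathrm{Permute}_p$ can be checked directly for the AES ShiftRows permutation, and the resulting bound evaluated for the AES (branch number $d + 1 = 5$ for the $4 \times 4$ MDS matrix). The sketch below assumes the convention of Definition 2.9 that position $i$ receives the word $x_{p(i)}$; all function names are illustrative.

```python
NR, NC = 4, 4  # number of rows and columns of the state

# AES ShiftRows as the permutation p (1-based): position i receives x_{p(i)}.
SHIFT_ROWS = [1, 6, 11, 16, 5, 10, 15, 4, 9, 14, 3, 8, 13, 2, 7, 12]
IDENTITY = list(range(1, NR * NC + 1))

def permute(state, p):
    """Apply Permute_p to a column-major list of words."""
    return [state[p[i] - 1] for i in range(len(p))]

def column_of(index):
    """Column (0-based) that contains the word with 1-based index `index`."""
    return (index - 1) // NR

def spreads_columns(p):
    """True if the words of every column end up in pairwise different columns."""
    for j in range(NC):
        targets = {column_of(new_pos)
                   for new_pos in range(1, NR * NC + 1)
                   if column_of(p[new_pos - 1]) == j}
        if len(targets) != NR:
            return False
    return True

def active_sbox_bound(branch_number, t=1):
    """Lower bound of Corollary 2.5 for 4t rounds."""
    return t * branch_number ** 2

print(spreads_columns(SHIFT_ROWS), spreads_columns(IDENTITY))
print(active_sbox_bound(5), active_sbox_bound(5, t=2))
```

For the AES this gives at least 25 active S-boxes over four rounds and at least 50 over eight rounds. The identity permutation fails the check, which shows that the condition on $p$ is not automatic.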

2.4.3 Computing Active S-boxes with Automatic Tools

The Four-Round Propagation Theorem is among the strongest wide-trail arguments currently known, strengthening confidence in the security of the AES. Its simplicity and elegance inspired several designers to adopt the structure of an AES-like round function. However, for ciphers which do not follow an AES-like design, or in cases where the multi-round bounds obtained by iterating the Four-Round Propagation Theorem (Corollary 2.5) are not strong enough, the minimum number of active S-boxes has to be analyzed in a different way. In most cases, this is done using computer-aided tools. We explain two common methods, i.e., Matsui’s approach and Mixed-Integer Linear Programming (MILP).

Matsui’s Approach

In [Mat95, Section 4], Matsui presented an algorithm for computing the maximum of the product of the single-round differential probabilities over all (non-trivial) differential trails, and the maximum absolute correlation over all (non-trivial) linear trails in DES, respectively. In the context of the wide-trail approach, it is especially useful for computing the minimum weight of all trails in $\Gamma^{\mathrm{diff}}_{L,s,t}$ or $\Gamma^{\mathrm{lin}}_{L,s,t}$ for a given number of rounds $t$. For the case of differential trails, this algorithm is given as Algorithm 2.2 below. Indeed, the designers of the AES-like lightweight cipher Midori utilized this approach for computing the minimum number of active S-boxes in order to find the best choice for the $\mathrm{Permute}_p$ operation.


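Algorithm 2.2 Matsui’s algorithm for computing $\min_{C \in \Gamma^{\mathrm{diff}}_{L,s,t}} \mathrm{wt}_s(C)$

procedure $\mathrm{Matsui}_t((B_0, \dots, B_{t-1}), \overline{B}_t)$
    for $\delta^{(0)} \in \{0,1\}^{n_s} \setminus \{0\}$ do
        $\mathrm{MatsuiRecursive}_t((B_0, \dots, B_{t-1}), 0, w_s(\delta^{(0)}), \delta^{(0)})$
    end for
    $B_t \leftarrow \overline{B}_t$
    return $B_t$
end procedure

procedure $\mathrm{MatsuiRecursive}_t((B_0, \dots, B_{t-1}), r, w, \delta^{(r)})$
    if $r = t - 1$ then
        $\overline{B}_t \leftarrow w$
        return
    end if
    for $\delta^{(r+1)}$ s.t. $\exists\, \alpha^{(r)} \in (\mathbb{F}_2^s)^{n_s}$ with $\delta^{(r)} = \vec{\delta}_s(\alpha^{(r)})$ and $\delta^{(r+1)} = \vec{\delta}_s(L(\alpha^{(r)}))$ do
        if $w + B_{t-r-1} \leq \overline{B}_t$ then
            $\mathrm{MatsuiRecursive}_t((B_0, \dots, B_{t-1}), r + 1, w + w_s(\delta^{(r+1)}), \delta^{(r+1)})$
        end if
    end for
end procedure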

For $t \in \mathbb{N}$, let

$$B_t = \min_{C \in \Gamma^{\mathrm{diff}}_{L,s,t}} \mathrm{wt}_s(C)$$

denote the minimum number of active S-boxes over all trails in $\Gamma^{\mathrm{diff}}_{L,s,t}$ and let $B_0 := 0$. In order to run $\mathrm{Matsui}_t$ for computing $B_t$, one needs to know $B_{t'}$ for all $t' < t$. In other words, one has to run $\mathrm{Matsui}_{t'}$ for all values of $t'$ smaller than $t$ first. As Matsui stated in his original paper, the values of $B_t$ up to $t = 3$ can often be obtained quite easily, so there might be no necessity to start with computing $B_1$ by Matsui’s algorithm. Further, the algorithm $\mathrm{Matsui}_t$ needs as input an initial value $\overline{B}_t$ with the requirement that $\overline{B}_t \geq B_t$. This initial value has to be estimated first. During the execution, the algorithm will dynamically overwrite $\overline{B}_t$ with tighter estimates for the actual bound $B_t$. It is an example of a branch-and-bound algorithm. In particular, the activity patterns can be seen to be arranged in a tree, where each node represents a possible activity pattern of the input difference at one particular round and its children are the possible activity patterns for the output differences according to Equation 2.6. The tree is traversed in a depth-first manner and whenever the weight of a (partial) trail, represented by a path in the tree, exceeds $\overline{B}_t$, the subtree rooted at the current node can be pruned and need not be traversed any further. Thus, the more precise the initial guess of $\overline{B}_t$ is, the more paths of the tree can be pruned in the first place and the more efficient the algorithm gets. Moreover, if the linear layer of the cipher allows only few possible output differential activity patterns for many of the possible input differences, the algorithm can become very efficient.
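The branch-and-bound idea can be sketched on a toy example. The code below is not a model of any real cipher: it assumes a hypothetical $2 \times 2$ AES-like state whose word permutation swaps the two words of the second row and whose mixing matrix has branch number 3, and it models only a superset of the valid transitions (a column transition is allowed whenever the number of active input and output words is zero or at least 3). As a defensive deviation from Algorithm 2.2, the running estimate is updated with a minimum rather than overwritten.

```python
from itertools import product

BRANCH = 3  # assumed branch number of the toy 2x2 mixing matrix

def weight(pattern):
    """Number of active words in an activity pattern."""
    return sum(pattern)

def successors(pattern):
    """Activity patterns reachable through the toy linear layer (superset model)."""
    # Toy Permute: swap the second-row words of the two columns.
    q = (pattern[0], pattern[3], pattern[2], pattern[1])
    per_column = []
    for col in (q[0:2], q[2:4]):
        n_in = sum(col)
        if n_in == 0:
            per_column.append([(0, 0)])      # inactive column stays inactive
        else:
            per_column.append([out for out in product((0, 1), repeat=2)
                               if n_in + sum(out) >= BRANCH])
    return [a + b for a in per_column[0] for b in per_column[1]]

def matsui(t, known, upper):
    """Return B_t, given known[i] = B_i for all i < t and an estimate upper >= B_t."""
    best = upper

    def recurse(r, w, pattern):
        nonlocal best
        if r == t - 1:                       # a full t-round trail has been built
            best = min(best, w)
            return
        if w + known[t - r - 1] > best:      # prune: cannot beat the estimate
            return
        for nxt in successors(pattern):
            recurse(r + 1, w + weight(nxt), nxt)

    for start in product((0, 1), repeat=4):
        if any(start):
            recurse(0, weight(start), start)
    return best

bounds = [0]                                 # B_0 := 0
for t in range(1, 6):
    bounds.append(matsui(t, bounds, upper=4 * t))
print(bounds)
```

In this toy model the four-round value equals $3^2 = 9$, which is consistent with the bound of Theorem 2.4 for a mixing matrix of branch number 3. The trivial estimate $4t$ (all words active in every round) is always valid but prunes little; a tighter initial estimate would prune more.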

Mixed-Integer Linear Programming (MILP)

Another way of computing the minimum number of active S-boxes is to model the minimization problem

$$\min_{C \in \Gamma^{\mathrm{diff}}_{L,s,t}} \mathrm{wt}_s(C)$$

as a MILP instance, see, e.g., [MWGP12]. The crucial part is to model the linear constraints that exactly define the branching transitions of the linear layer corresponding to the trails in $\Gamma^{\mathrm{diff}}_{L,s,t}$ (resp. $\Gamma^{\mathrm{lin}}_{L,s,t}$). For efficiency reasons, often a superset of $\Gamma^{\mathrm{diff}}_{L,s,t}$ (resp. $\Gamma^{\mathrm{lin}}_{L,s,t}$) is modelled instead. For instance, in [MWGP12], the authors define the optimization problem by linear constraints that model only the branch number of the linear layer and thus, their bounds might not be tight for the specific linear layer considered. Over the years, much progress has been made in deriving tighter bounds using MILP, especially in the more involved related-key setting (see, e.g., [SHS+13, SHW+14b, SHW+14a]). In the design of Skinny and Mantis, a MILP approach is used in order to derive bounds on the minimum number of active S-boxes, both in the single-key setting and in the related-key setting. Details follow in Chapter 5.
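A branch-number-only model in the style of [MWGP12] can be sketched as follows for an AES-like cipher. The variable names are our own illustrative assumptions: a binary variable $x^{(r)}_i$ indicates whether word $i$ is active at the input of round $r$; for every application of the mixing matrix with branch number $\mathcal{B}$ to one column, $u_1, \dots, u_{n_r}$ denote the activity variables of its (already permuted) input words, $v_1, \dots, v_{n_r}$ those of its output words, and $d$ is a binary dummy variable introduced for that column.

```latex
\begin{align*}
\text{minimize}\quad   & \sum_{r=0}^{t-1} \sum_{i=1}^{n_s} x^{(r)}_i \\
\text{subject to}\quad & \sum_{i=1}^{n_r} u_i + \sum_{i=1}^{n_r} v_i \;\geq\; \mathcal{B} \cdot d
                         && \text{for every column in every round,} \\
                       & d \geq u_i, \quad d \geq v_i
                         && \text{for } 1 \leq i \leq n_r, \\
                       & \sum_{i=1}^{n_s} x^{(0)}_i \;\geq\; 1
                         && \text{(exclude the trivial trail).}
\end{align*}
```

The dummy variable forces $d = 1$ as soon as any word of the column is active, in which case at least $\mathcal{B}$ of its input and output words must be active; if $d = 0$, all words of the column are inactive. As the model only encodes the branch number, its optimum is a lower bound that need not be tight for the concrete matrix.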

2.5 Lightweight Cryptography

Today, the AES can certainly be considered the most versatile block cipher for environments with high security standards. By design, it only requires a reasonable implementation overhead for the security it offers. However, there might be situations in which the AES cannot be employed because its operations require too many computational resources. An important field of research that emerged over the last years is lightweight cryptography. Although it is not trivial to give a precise definition of lightweight cryptography, its goal is to establish cryptographic solutions for applications in resource-constrained environments. Especially in the context of the “Internet of Things” (IoT), there is a strong demand for cryptographic algorithms (e.g., block ciphers) that can be implemented on devices with extremely constrained resources, e.g., low chip area or low power supply. Further, especially if the main application of the device is not to provide a cryptographic solution, the cryptographic algorithm should come with negligible additional cost. To give a concrete example from the healthcare sector, consider a pacemaker. Here, the constrained resource is energy, as one certainly wants to avoid frequently changing batteries. If a cryptographic algorithm is to be implemented in order to prevent manipulation of the device from outside, it should require as little energy as possible.

Because of the importance this field has gained over the last years, NIST is currently running a lightweight cryptography project with the goal of learning more about the needs of industry and of standardizing new lightweight designs. We refer to the report on this project [MBTM17] for further details. We also refer to [BP17] for a comprehensive survey on lightweight cryptography and for an extensive list of examples of existing designs. In this thesis, we merely consider lightweight block ciphers, especially their design and analysis.

2.5.1 Lightweight Metrics

A lightweight design is optimized with respect to certain lightweight metrics. Such metrics include chip area, latency, throughput, code size, power or energy consumption. Here, we briefly explain two important lightweight metrics, i.e., chip area and latency, as examples.

Area in Hardware

One of the most frequently considered lightweight metrics is area. If a block cipher is to be implemented on a small hardware device, e.g., an RFID tag, only a limited amount of chip area is available. Usually, the hardware area needed for an implementation is measured in terms of Gate Equivalents (GE). Here, one GE is the area needed for implementing a single two-input NAND gate (i.e., $\overline{x_1 \wedge x_2}$ for $x_1, x_2 \in \mathbb{F}_2$). One of the first block ciphers that was designed with a focus on reducing area in hardware implementations is Present [BKL+07]. It has a block length of $n = 64$ and supports two different key lengths of $\kappa = 80$ and $\kappa = 128$, respectively. The design offers very competitive area requirements. In the original paper, the designers were able to implement the 80-bit key length version of their cipher requiring 1570 GE, outperforming several other ciphers including the AES. Present employs a simple bit permutation as its linear layer. The main reason is that a bit permutation can be implemented without any gates, merely by permuting wires between gates.

Whenever the linear layer is designed in a more complex manner, a special focus lies on the number of XOR gates required for its implementation. Here, one XOR gate implements the field addition of two elements in $\mathbb{F}_2$. Reducing the number of XOR operations is one important design goal that has attracted much attention recently. In Chapter 3, we look at this so-called XOR-count in more detail and analyze the efficiency of multiplication in finite fields of characteristic two.
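To give a first impression of such a count, the following sketch writes the multiplication by a fixed nonzero element of the AES field as a binary $8 \times 8$ matrix and counts XORs naively, as the number of ones in the matrix minus the number of rows. This naive counting rule and the function names are assumptions made for illustration here; the precise notion used in this thesis is defined in Chapter 3.

```python
def mult_matrix(a, poly=0x11B, s=8):
    """Binary s x s matrix of x -> a*x in the polynomial basis 1, X, ..., X^(s-1).
    Entry [i][j] is the coefficient of X^i in a * X^j."""
    images = []
    v = a
    for _ in range(s):
        images.append(v)        # v = a * X^j reduced modulo poly
        v <<= 1
        if v >> s:
            v ^= poly
    return [[(images[j] >> i) & 1 for j in range(s)] for i in range(s)]

def naive_xor_count(a, poly=0x11B, s=8):
    """XORs needed when every output bit is computed directly from its row.
    Only meaningful for a != 0, where every row contains at least one 1."""
    matrix = mult_matrix(a, poly, s)
    return sum(sum(row) for row in matrix) - s

for element in (0x01, 0x02, 0x03):
    print(f"{element:02X}: {naive_xor_count(element)} XORs")
```

Multiplication by $\mathtt{01}$ is free, multiplication by $\mathtt{02}$ costs only the three XORs caused by the reduction polynomial, and $\mathtt{03} = \mathtt{02} + \mathtt{01}$ adds eight further XORs for the word addition. The count depends on the chosen field representation, which is the starting point of Chapter 3.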

Latency

Another lightweight metric is latency. For a block cipher, this is the amount of time (usually measured in nanoseconds) needed for encrypting a message and providing the ciphertext. Low-latency ciphers are especially required in applications where a fast response is crucial. One of the first low-latency block ciphers optimized for hardware implementations is Prince [BCG+12]. Its main feature is the capability of encrypting messages within a single clock cycle while requiring only a reasonably low amount of chip area. For this, the cipher is not implemented in a round-based manner, i.e., with only a single round implemented and iterated several times. Instead, the rounds are unrolled and the full cipher is implemented at once. For this reason, Prince employs only a small number of rounds. The other beneficial feature of Prince is that decryption can be realized with almost no implementation overhead due to its so-called α-reflection property.

In Section 5.4 we present Mantis, a family of (tweakable) low-latency block ciphers. It is inspired by the design of Prince and the low-energy block cipher Midori, which we explain in Section 2.5.3 below.

2.5.2 Characteristics of Lightweight Block Ciphers

In this section, we briefly explain the characteristics that lightweight block cipher designs usually have in common.

Sparser Components

The AES consists of cryptographically strong building blocks. Its S-box is of $s = 8$ bits length and has a maximum differential probability and maximum absolute correlation of only $p_{Sb_{\mathrm{AES}}} = 2^{-6}$ and $c_{Sb_{\mathrm{AES}}} = 2^{-3}$, respectively. No cryptographically stronger S-box of eight bits length with respect to those values is known to date. Further, its MixColumns matrix is MDS and its key-scheduling algorithm is quite complex and non-linear. However, those operations may still be too expensive for applications in extremely constrained environments. For instance, the S-box has to be implemented either as an inversion in the finite field $\mathbb{F}_{2^8}$ or by storing a look-up table of size $8 \times 2^8$ bits. Further, due to its density, the MDS MixColumns matrix might as well be too expensive to be implemented.

Therefore, in lightweight cryptography, designers focus on tailoring the cryptographic components to a minimum. For instance, a four-bit S-box requires less area than an eight-bit S-box, and a sparse linear layer can be implemented more efficiently than an MDS matrix. For that reason, lightweight block ciphers often have S-boxes of length $s = 4$. In cases where eight-bit S-boxes are still employed, they are usually constructed from smaller building blocks, e.g., four-bit S-boxes, as done in [CDL16]. In order to minimize XOR operations, the linear layer is often very sparse and therefore only offers a limited amount of diffusion.

More Rounds

The 128-bit key length version of the AES only needs ten rounds to resist cryptanalytic attacks. The main reason for that is the employment of cryptographically strong building blocks. One can think about the following extreme: The non-linear operation within a cipher round consists of just a single bitwise AND and the round key is introduced by a single bitwise XOR. In order to obtain a secure cipher, many rounds have to be applied and the actual security would come from iterating a very weak function many times.$^{14}$ However, this would significantly increase the latency in a round-based implementation. Therefore, designers of lightweight cryptographic algorithms usually aim to find the best trade-off between the lightness of the rounds and the number of rounds that have to be applied. This, of course, depends on the particular lightweight metric one wants to optimize. For low-latency requirements, the number of rounds has to be quite low.

Simple Key Schedules

As the building blocks of lightweight designs are tailored to a minimum, several designs also employ a very simple, often linear, key schedule. One of the simplest kinds of key schedule one can think of is the usage of identical round keys in every round. However, in order to avoid slide attacks [BW99], the (keyed) rounds $R_i$ of a block cipher should not all be the same. Therefore, in the simplest practical key schedule, pre-defined and public round constants $c_0, \dots, c_t \in \mathbb{F}_2^n$ are added to the round keys. In other words, the key-scheduling algorithm takes the initial key $k$ and derives the round keys of the form$^{15}$

$$k \mapsto (k + c_0, k + c_1, \dots, k + c_t).$$

Examples of schemes that employ such a key schedule are LED, Midori, Noekeon [DPVAR00] and Prince. Such a simple and linear key schedule may look suspicious at first sight, especially as we analyze the security of ciphers under the assumption of independent round keys. Kranz, Leander and Wiemer recently analyzed this key schedule in [KLW17] and found that, fortunately, for every linear approximation, the average variance of the correlation over all possible round constants is the same as the variance for independent round keys as stated in Corollary 2.3. This gives an indication of the soundness of employing simple linear key schedules and choosing random round constants.
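A key schedule of this form takes only a few lines. The sketch below represents keys and constants as Python integers, so that addition in $\mathbb{F}_2^n$ becomes a bitwise XOR; the constants in the usage lines are arbitrary placeholder values, not those of any concrete cipher, and the second function illustrates the variant with two alternating keys mentioned in footnote 15.

```python
def round_keys(k, constants):
    """k -> (k + c_0, k + c_1, ..., k + c_t), with + being bitwise XOR."""
    return [k ^ c for c in constants]

def alternating_round_keys(k1, k2, constants):
    """Variant: two keys derived from k, used in every second round each."""
    return [(k1 if i % 2 == 0 else k2) ^ c for i, c in enumerate(constants)]

# Placeholder constants, chosen only to make the example concrete.
example_constants = [0x0, 0x1, 0x2, 0x3]
print([hex(rk) for rk in round_keys(0xABCD, example_constants)])
print([hex(rk) for rk in alternating_round_keys(0xAAAA, 0x5555, example_constants)])
```

With all constants equal, every keyed round would be identical, which is exactly the situation exploited by slide attacks; pairwise different constants are therefore necessary, although, as invariant attacks show, not sufficient.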

However, studying the security of key-alternating ciphers with a key-scheduling algorithm of the form $k \mapsto (k + c_0, k + c_1, \dots, k + c_t)$ remains of significant importance in lightweight cryptography. In particular, several lightweight block ciphers with such a key-scheduling algorithm have recently been broken by so-called invariant attacks, i.e., the invariant subspace attack [LAAZ11] or the nonlinear invariant attack [TLS16]. Chapter 6 deals with those attacks in more detail.

$^{14}$ Such an example was described by Bogdanov at the summer school on “Design and Security of Cryptographic Functions, Algorithms and Devices” in Albena, Bulgaria in 2013. He claimed that several thousand of those rounds are required for obtaining a secure cipher. The slides can be found at https://www.cosic.esat.kuleuven.be/summer_school_albena/slides/Andrey_lightweight-bc.pdf (accessed: December 5, 2017).

$^{15}$ Several variants are possible. For instance, sometimes two different round keys $k'$ and $k''$ are derived from the initial key $k$ and are employed in every second round, i.e., $k \mapsto (k' + c_0, k'' + c_1, k' + c_2, \dots)$.


Innovative Designs

By focussing strongly on efficiency and performance, designers sometimes come up with innovative constructions that deviate from well-known cipher designs. One of the most prominent examples of an innovative block cipher design is the NSA cipher Simon, as mentioned in Example 2.2. It was designed for having competitive performance on a variety of platforms, making it a flexible lightweight cipher for the IoT. Although Simon is a Feistel cipher, it is based neither on S-boxes nor on an ARX construction. Moreover, it employs an innovative key schedule. The drawback of an innovative design is that it may be harder to analyze and may lead to new, dedicated attacks. Indeed, the NSA did not publish any design rationale for their cipher from a cryptographic viewpoint and only explained their considerations with respect to performance (see also [BSS+15]). This implicitly left the task of cryptanalysis to third-party researchers. In fact, several papers that focussed on analyzing Simon have appeared since its introduction in 2013. Many of those papers employed dedicated, computer-aided arguments for analyzing the cipher with respect to standard attacks. However, no serious threats have been found so far and Simon still offers a reasonable security margin based on existing cryptanalysis. In Chapter 7, we also take a look at Simon and derive a security argument on the resistance against differential attacks that does not rely on computer-aided tools, i.e., that can be verified by hand.

Chapter 5 explains the block cipher Skinny, which was designed with the motivation of having a cipher that offers a competitive level of performance but comes with security arguments against standard cryptanalytic attacks by design, especially in the more involved related-key setting.

2.5.3 Midori

Midori is a lightweight block cipher published by Banik et al. in 2015 [BBI+15]. It was designed specifically for achieving low energy consumption. It comes in two different versions, i.e., Midori-64 and Midori-128, to support block lengths of $n = 64$ and $n = 128$, respectively. In both versions, the key length is $\kappa = 128$. The overall design is inspired by the AES, which places it in the class of AES-like ciphers as described in Section 2.4.2. In particular, the building blocks of the rounds are much simpler than those of the AES with the purpose of minimizing energy consumption. For instance, the implementation of the multiplicative structure of the finite field $\mathbb{F}_{2^s}$ is avoided by only using the trivial elements 0 and 1 as coefficients in the MixColumn matrix. Moreover, the cipher employs the typical “lightweight” key-scheduling algorithm of generating basically the same round key and adding pre-defined round constants. As we are going to consider Midori-64 as a case study for some of our results, i.e., finding the best word permutation layers (Chapter 4) and studying invariant attacks (Chapter 6), we briefly explain the design of Midori-64 in the following. Moreover, the low-latency block cipher Mantis, which we explain in Section 5.4, is to a large extent inspired by the design of Midori. For more details and the exact specification of Midori-64 and Midori-128, we refer to the original design document.

The Round Function of Midori-64

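The unkeyed round function $R_{Sb_{\mathrm{Mid64}},p,M} : \mathbb{F}_{2^4}^{16} \to \mathbb{F}_{2^4}^{16}$ of Midori-64 fulfills the definition of an AES-like round as given in Definition 2.9. In particular, the state is represented by a $4 \times 4$ array of words of length $s = 4$ as

$$\begin{bmatrix} x_1 & x_5 & x_9 & x_{13} \\ x_2 & x_6 & x_{10} & x_{14} \\ x_3 & x_7 & x_{11} & x_{15} \\ x_4 & x_8 & x_{12} & x_{16} \end{bmatrix}.$$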

The round function of Midori-64 is composed of the following consecutive operations. It is iterated 16 times with a round key addition in between.$^{16}$

$S_{Sb_{\mathrm{Mid64}}}$ (SubCell). The involutory S-box $Sb_{\mathrm{Mid64}} : \mathbb{F}_2^4 \to \mathbb{F}_2^4$ as given in Table 2.2 is employed in the S-box layer. Note that in Midori one does not need the multiplicative structure of the finite field $\mathbb{F}_{2^4}$. Therefore, we simply give the S-box as a mapping $\mathbb{F}_2^4 \to \mathbb{F}_2^4$. A four-bit vector is represented in hexadecimal notation, e.g., $(0, 0, 1, 0) = \mathtt{2}$.

Table 2.2: The 4-bit S-box SbMid64 used in Midori-64.

x 0 1 2 3 4 5 6 7 8 9 A B C D E F

SbMid64(x) C A D 3 E B F 7 8 9 1 5 0 2 4 6
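The involution property stated for this S-box can be confirmed directly from Table 2.2. The following minimal sketch stores the table as a list; the names `SB_MID64` and `sub_cell` are our own and the state in the usage line is an arbitrary example.

```python
# Table 2.2 as a lookup table: SB_MID64[x] = Sb_Mid64(x).
SB_MID64 = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
            0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]

def sub_cell(state):
    """SubCell: apply the S-box to each of the 16 four-bit words of the state."""
    return [SB_MID64[x] for x in state]

is_involution = all(SB_MID64[SB_MID64[x]] == x for x in range(16))
fixed_points = [x for x in range(16) if SB_MID64[x] == x]

example_state = list(range(16))
print(is_involution, [f"{x:X}" for x in fixed_points])
print(sub_cell(sub_cell(example_state)) == example_state)
```

Because the S-box is its own inverse, the same circuit or table serves for encryption and decryption.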

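$\mathrm{Permute}_p$ (ShuffleCell) operates as a permutation of the words of the state as follows:

$$\begin{bmatrix} x_1 & x_5 & x_9 & x_{13} \\ x_2 & x_6 & x_{10} & x_{14} \\ x_3 & x_7 & x_{11} & x_{15} \\ x_4 & x_8 & x_{12} & x_{16} \end{bmatrix} \mapsto \begin{bmatrix} x_1 & x_{15} & x_{10} & x_8 \\ x_{11} & x_5 & x_4 & x_{14} \\ x_6 & x_{12} & x_{13} & x_3 \\ x_{16} & x_2 & x_7 & x_9 \end{bmatrix}.$$

This corresponds to the permutation

$$p = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\ 1 & 11 & 6 & 16 & 15 & 5 & 12 & 2 & 10 & 4 & 13 & 7 & 8 & 14 & 3 & 9 \end{pmatrix}.$$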

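$\mathrm{Mix}_M$ (MixColumn). The involutory matrix

$$M = \begin{bmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{bmatrix} \in \mathrm{GL}_4(\mathbb{F}_{2^4})$$

is applied to each column of the state.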
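Since $M$ only has the coefficients 0 and 1, multiplying a column by $M$ needs no field multiplication: each output word is the XOR of the three other input words. The sketch below uses this to confirm that $M$ is involutory and to determine its differential branch number by exhaustive search over all $2^{16} - 1$ nonzero columns; the function names are illustrative.

```python
from itertools import product

def mix_column(col):
    """Multiply a column of four 4-bit words by the Midori matrix M.
    Output word i is the XOR of all input words except word i."""
    total = col[0] ^ col[1] ^ col[2] ^ col[3]
    return [total ^ x for x in col]

def weight(words):
    """Number of nonzero (active) words."""
    return sum(1 for w in words if w)

def differential_branch_number():
    """Minimum of weight(x) + weight(M x) over all nonzero columns x."""
    return min(weight(col) + weight(mix_column(col))
               for col in product(range(16), repeat=4) if any(col))

involutory = all(mix_column(mix_column(list(col))) == list(col)
                 for col in product(range(16), repeat=4))
print(involutory, differential_branch_number())
```

The search returns a branch number of 4, one less than the optimum $d + 1 = 5$ of a $4 \times 4$ MDS matrix. This is the price paid for a matrix that can be implemented with XORs of words only.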


$^{16}$ The last round omits the linear layer.


The Key-Scheduling Algorithm of Midori-64

Midori-64 takes the initial 128-bit key $k$ and splits it into two 64-bit keys $\xi_0$ and $\xi_1$, thus $k = \xi_0 \| \xi_1$. Then, we have for the whitening keys, $k_0 = k_{16} = \xi_0 + \xi_1$, and for the other round keys, $k_i = \xi_{(i-1) \bmod 2} + c_i$, where the $c_i$ are pre-defined round constants. For the precise definition of the round constants, we refer to the original design paper. It is worth remarking that the round constants are all contained in the set $\{(0, 0, 0, 0), (0, 0, 0, 1)\}^{16}$, i.e., in each word of the state only the least significant bit is affected by the round constant addition.
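The derivation of the 17 round keys $k_0, \dots, k_{16}$ can be sketched as follows. Two points are assumptions of this sketch and not taken from the specification: keys are represented as Python integers with $\xi_0$ as the most significant half of $k$, and the round constants $c_1, \dots, c_{15}$ are passed in by the caller (the usage line uses all-zero placeholders, since the actual constants are not reproduced here).

```python
MASK64 = (1 << 64) - 1

def midori64_round_keys(key, constants):
    """Return [k_0, ..., k_16] for a 128-bit key and constants [c_1, ..., c_15].
    Assumes k = xi_0 || xi_1 with xi_0 in the upper 64 bits."""
    if len(constants) != 15:
        raise ValueError("expected the 15 round constants c_1, ..., c_15")
    xi = [(key >> 64) & MASK64, key & MASK64]
    whitening = xi[0] ^ xi[1]                  # k_0 = k_16 = xi_0 + xi_1
    keys = [whitening]
    for i in range(1, 16):                     # k_i = xi_((i-1) mod 2) + c_i
        keys.append(xi[(i - 1) % 2] ^ constants[i - 1])
    keys.append(whitening)
    return keys

placeholder_constants = [0] * 15               # hypothetical, for illustration only
example_key = (0x0123456789ABCDEF << 64) | 0xFEDCBA9876543210
print([f"{rk:016X}" for rk in midori64_round_keys(example_key, placeholder_constants)[:3]])
```

Apart from the whitening keys, the round keys simply alternate between $\xi_0$ and $\xi_1$, so the round constants are the only source of variation between rounds that use the same key half.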

On the ShuffleCell Permutation

One major contribution of the designers was the observation that the usage of a word permutation $\mathrm{Permute}_p$ that is different from the AES ShiftRows operation, in combination with a non-MDS MixColumns matrix, may lead to a higher number of active S-boxes than one would expect by iterating the bounds obtained by the Four-Round Propagation Theorem (Theorem 2.4). Indeed, the designers were able to increase the minimum number of active S-boxes, e.g., for six rounds from 20 (i.e., $4 \cdot 4$ active S-boxes for four rounds plus 4 active S-boxes for two rounds) to 30, by using the permutation as described above. Those bounds were obtained by automatic search tools and, unfortunately, there is not much theoretical understanding of why the bounds improve that significantly. In Chapter 4, we analyze possible alternatives for the ShuffleCell permutation in Midori.

Invariant Attacks on Midori-64

Several external cryptanalyses of Midori have already been conducted. One particular threat is that Midori-64 is vulnerable to invariant attacks, see [GJN+16, TLS16]. In particular, [TLS16] showed that there exists a fraction of $2^{-64}$ weak keys for which the keyed instance can be distinguished from a random permutation using only very few plaintext/ciphertext pairs. Chapter 6 studies the invariant attack in more detail.


Part I

Design of Lightweight Linear Layers


Chapter 3

Lightweight Linear Layers based on Finite Field Multiplications

This chapter is based on the publication [BKL16], which is joint work with Thorsten Kranz and Gregor Leander.$^{1}$ All authors contributed equally. The author’s main contribution was to Sections 3.2 and 3.3, i.e., studying the efficiency of multiplications in finite fields of characteristic two.

3.1 Introduction

Many block cipher designs, including the AES or other AES-like ciphers, build on finite fields as their underlying mathematical structure. In most cases, those ciphers can be designed without having to specify a concrete representation of the finite field in advance. However, when the cipher is finally being implemented in practice, one necessarily has to choose a particular representation of the finite field, basically as binary strings for its elements. For instance, in the case of the AES, the finite field with $2^8$ elements is represented as $\mathbb{F}_2[X]/(X^8+X^4+X^3+X+1)$, see Section 2.4.2. In general, this choice does not influence the security of the cipher, but might heavily influence the performance of the resulting implementation. In this chapter we focus on this choice of field representations and evaluate how to choose an optimal representation with respect to multiplication with fixed field elements.

When applying an MDS matrix in the linear layer of a block cipher design, the main challenge is to choose an MDS matrix that is most suitable for allowing an

¹ This chapter is a modified version of the original publication [BKL16]. We additionally discuss follow-up work that appeared afterwards. The version published by Springer-Verlag is available at DOI: 10.1007/978-3-662-53018-4_23, https://dx.doi.org/10.1007/978-3-662-53018-4_23 (© IACR 2016).

efficient implementation. In particular, as those MDS matrices are usually defined over a finite field with characteristic two, i.e., $\mathbb{F}_{2^s}$, one important question is how the choice of a particular $\mathbb{F}_2$-basis of $\mathbb{F}_{2^s}$ impacts the implementation efficiency.

From a design point of view, one thus has to choose a linear layer given as a mapping on $\mathbb{F}_{2^s}^d$ and an $\mathbb{F}_2$-basis of $\mathbb{F}_{2^s}$ to concretely specify the primitive. This is actually a very natural separation of the design of the cipher and its specification (and thus implementation) on bit level. As nicely explained in [DR11] by introducing Rijndael-GF, this separation is probably most obvious for the AES itself, but in principle possible for any cipher. Following [DR11], the choice of basis is to a large extent independent of the design and the security of the cipher. However, the choice of basis might have a significant impact on the efficiency of the cipher on certain platforms.

For software implementations, depending on the details, the choice of basis is either irrelevant (e.g., in a table-based implementation) or hard to capture (e.g., in a bit-sliced implementation) as the efficiency might depend on the exact instructions offered by a given platform. For hardware implementations, one has to distinguish between a serial implementation and a round-based implementation. As the round-based implementation seems most relevant in practice (see [SKOP15]), we mainly focus on this use-case here.

For a round-based hardware implementation, the impact of the choice of basis already becomes apparent when focusing on how to implement the multiplication with one given element $\alpha$ in $\mathbb{F}_{2^s}$. For different choices of bases, the efficiency of implementations of the resulting $\mathbb{F}_2$-linear mappings differs significantly. Thus, the very fundamental task we study in the first part of this chapter is:

For a given element $\alpha \in \mathbb{F}_{2^s}$, find a basis such that multiplication by $\alpha$ can be implemented most efficiently.

While the above question is of independent interest, with potentially very different applications, we use our results for designing efficient linear layers. Thus, in the second part of this chapter, we will give several constructions of MDS matrices. Echoing the above, the construction of our MDS matrices is independent of the choice of the basis, and actually to a large extent independent of the field size as well.

The combination of the first part, i.e., how to choose a basis that allows for an optimal implementation, and the second part, i.e., the construction of MDS matrices, finally results in implementations of MDS matrices that are very efficient for a large variety of parameters. This application serves as a nice example where an improved understanding of how to choose the field representation immediately leads to improved results. This is even more interesting as the construction of efficient MDS matrices has been an active field of research recently.

Related Work

In particular, the construction of efficient serial MDS matrices is a well-studied subject. Considering serial implementations of MDS matrices is based on the initial idea of Guo, Peyrin, and Poschmann used in the design of PHOTON [GPP11] and


later in the block cipher LED [GPPR11]. In a nutshell, the idea is not to implement an MDS matrix directly, but rather to implement a matrix $A$ such that $A^l$ is MDS for some small $l$. When considering a hardware implementation, it reduces the chip area if implementing $A$ is significantly cheaper than implementing $A^l$. The circuit implementing $A$ is then iterated $l$ times, which does not increase its size significantly. This basic idea has been further generalized and improved in a series of subsequent papers. In [SDMS12] and [WWW13] the authors focus on even more efficient choices for $A$ by considering $\mathbb{F}_2$-linear MDS codes. Their approach uses symbolic computations in order to derive general conditions on how to choose the matrix entries independent of the dimension.

In [XZL14], Xu et al. furthermore took into account the cost of implementing the inverse matrix. In [AF15], Augot and Finiasz improved significantly upon the efficiency of the search algorithm of [SDMS12], allowing them to search for MDS matrices of much larger dimension than previously possible.

For the case of round-based implementations, the authors of [SKOP15] focus on MDS matrices that have an efficient implementation (in terms of the number of XOR operations needed) and put special emphasis on involutory MDS matrices, i.e., MDS matrices that are their own inverse. They derive several constructions and rather efficient search methods for MDS matrices meeting their goals. Liu and Sim [LS16] improved upon some of those results by characterizing equivalences in circulant (and circulant-like) MDS matrices and thus further reduced the search space. In both works, in order to improve the efficiency for a given MDS matrix defined over a finite field, the authors considered different representations of the underlying finite fields by running through all possible irreducible polynomials of the given degree. However, in view of the question of how to choose an optimal basis, this corresponds to investigating only a small subset of all possible bases. Work on investigating the XOR-count distribution for bases other than the polynomial bases has also been done in [SS16a].

Li and Wang constructed circulant involutory $\mathbb{F}_2$-linear MDS matrices [LW16]. While it was already known that circulant MDS matrices over a finite field cannot be involutory [GR14], they have shown their existence in this more general case. Independently, the authors of [LS16] have shown the existence of left-circulant involutory MDS matrices over finite fields.

Recently, Sarkar and Syed pointed out how lightweight linear layers could be constructed from Toeplitz matrices and constructed MDS matrices with optimized XOR-count [SS16b]. As in previous work, they only considered field representations given by a polynomial basis. In [JPST17], the authors introduced the s-XOR metric, which allows reusing intermediate results when computing the XOR operations. They investigated the s-XOR count for all elements of the finite fields $\mathbb{F}_{2^4}$ and $\mathbb{F}_{2^8}$ under all possible polynomial bases and were able to derive some of the lightest MDS matrices known to date using an improved search tool. We adopted this s-XOR metric for our purposes of optimizing finite field multiplications with fixed elements.

It is important to remark that all of those results on finding the lightest MDS


matrices, as well as our approach on constructing lightweight MDS matrices, are based on local optimizations. In a matrix-vector multiplication, every coordinate of the result is computed as the sum over multiplications in the finite field. In local optimizations, the XOR operations needed to sum up the results of the multiplications are considered as a fixed part and only the overhead for computing the multiplications in the finite field is optimized. Recently, Kranz et al. focused on globally optimizing the implementation cost for those locally-optimized matrices [KLSW17]. Using well-known heuristic algorithms for finding the shortest linear straight-line program, they were able to significantly improve the implementation. Thus, their results suggest that future research should focus on globally optimizing the implementation cost of linear layers.

Results of this Chapter

After introducing notation and recalling some basics in Section 3.2, in Section 3.3 we study the question of how to find an optimal implementation of the multiplication by a given field element $\alpha$. Here, efficiency is measured in terms of the number of binary additions (i.e., XOR operations) needed to implement the corresponding binary matrix. Note that this metric corresponds to the s-XOR metric as introduced in [JPST17] and differs from the XOR-count metric used in [KPPY14] and [SKOP15]. In those two (and many other) papers, the XOR-count of an $s \times s$ matrix $M$ is defined as the number of ones in $M$ minus $s$. However, the number of (additional) ones in a matrix does not necessarily correspond to the number of XOR operations needed for implementation. Thus, while the number of ones in $M$ is certainly an easier-to-handle metric, it is more appropriate to consider the actual number of XOR operations as the efficiency metric. For technical reasons, we focus on the number of XOR operations without using temporary registers, i.e., in-place XOR operations. One of our main results in this first part is that for a non-trivial element $\alpha$, one can find a basis such that the matrix corresponding to multiplication with $\alpha$ can be implemented with one single XOR operation if and only if the characteristic polynomial of $\alpha$ is an irreducible trinomial, i.e., an irreducible polynomial with exactly three non-zero coefficients. Note that in our notion, an XOR-count equal to one coincides with the definition of the XOR-count in [KPPY14] and [SKOP15]. The interesting part here is that the condition on the characteristic polynomial is not only sufficient but also necessary. As an immediate consequence, one cannot hope to implement the multiplication by any element $\alpha \neq 1$ in $\mathbb{F}_{2^8}^*$ with one binary addition only. This follows from the above and the well-known fact that there do not exist irreducible trinomials of degree 8 [Swa62].

We further show that, for any given basis, there are at most two (non-trivial) elements $\alpha$ and $\beta$ such that the multiplication with those elements can be implemented with one XOR operation. In fact, $\beta$ is necessarily the multiplicative inverse of $\alpha$.

While the weight of the (irreducible) characteristic polynomial of an element $\alpha$ clearly gives an upper bound on the number of XOR operations needed to implement the corresponding multiplication, we show that this bound is in general not tight in the case where the characteristic polynomial is of weight larger than three.

In particular, for all elements $\alpha \in \mathbb{F}_{2^s}^*$ with $4 \le s \le 8$ we present an optimal representation such that the multiplication with $\alpha$ can be implemented with a minimal number of XOR operations. For all those elements $\alpha$ that are not contained in a proper subfield of $\mathbb{F}_{2^s}$, the multiplication can be implemented with at most three XOR operations (and often with only two). Those results are given in Tables 3.3 to 3.7 and cover the cases which are most relevant for symmetric cryptography in practice. Interestingly, and perhaps counter-intuitively, multiplication with non-trivial elements in a proper subfield turns out to be among the most expensive in all the cases explored here.

Moreover, for all $2 \le s \le 2048$ for which no irreducible trinomial of degree $s$ exists, we present one element $\alpha \in \mathbb{F}_{2^s}$ such that multiplication by $\alpha$ requires two XOR operations, see Table 3.8. Those results are proven optimal by the above-mentioned necessary and sufficient condition.

In the second part, i.e., Section 3.4, we present several (circulant) matrices. Entries in those matrices are represented as powers of a generic field element $\alpha$. By symbolically computing all minors, i.e., the determinants of all square submatrices, we derive a list of polynomials in $\mathbb{F}_2[\alpha]$. Now, whenever $\alpha$ is chosen such that it is not a root of any of those polynomials, the matrix is MDS. One nice consequence of this approach is that, as the degree of those polynomials is limited, our matrices are MDS for almost all elements in $\mathbb{F}_{2^s}$ as soon as $s$ is large enough, i.e., larger than the maximal degree of those polynomials.

Finally, the first and second part are combined in Section 3.4.2 to result in very efficient MDS matrices in terms of the XOR-count. A summary of our results and a comparison with other work are given in Table 3.1 and Table 3.2, respectively. The main observation here is that if multiplication by $\alpha$ can be implemented with $t$ XOR operations, then multiplication by $\alpha^{\pm i}$ for $i \ge 0$ can be implemented with at most $t \cdot i$ XOR operations.² Thus, by simply minimizing the sum of the (absolute) exponents for our circulant MDS matrices, we immediately reduce the XOR-count.

As an interesting side result, we would like to point out that the XOR-count per bit actually decreases with increasing field size.³ For example, our $4 \times 4$ MDS matrices have a per bit XOR-count of $3 + \frac{3}{s}$, or $3 + \frac{6}{s}$ in the case that no irreducible trinomial of degree $s$ exists.

Thus, even though reducing the number of XOR operations has already received considerable attention recently, this part nicely shows that our improved understanding of how to choose an optimal basis allows us to easily improve upon known constructions. Note that such improvements are possible independently of which XOR-count definition is used, i.e., we were able to improve existing results

² It is exactly this part where considering only in-place XOR operations becomes very helpful, as otherwise multiplication by $\alpha$ and by $\alpha^{-1}$ might differ in their XOR-count.

³ This is also true for the constructions given in [WWW13], but does not hold for the subfield (or code-interleaving) construction.

also in the simple XOR-count definition by changing the basis. For example, we found an element in $\mathbb{F}_{2^8}$ with only two additional non-zero entries in its matrix, which directly improves the results of [SKOP15].

Finally, in Section 3.5 we give a perspective on $\mathbb{F}_2$-linear MDS matrices. In particular, we point out that while there exists no $\alpha \in \mathbb{F}_{2^8}$ (resp. $\mathbb{F}_{2^{13}}$, $\mathbb{F}_{2^{16}}$) for which the multiplication can be implemented with only one XOR operation, there does exist an $8 \times 8$ (resp. $13 \times 13$, $16 \times 16$) binary matrix that can be used in place of the multiplication by $\alpha$ in the above-mentioned $4 \times 4$ matrix to result in an additive MDS matrix with reduced cost.⁴ Again, the idea of considering the entries of the matrix as powers of a single field element is beneficial as the conditions for the matrix to be MDS remain basically unchanged.

We then conclude by pointing to some interesting questions for future investigations.

3.2 Preliminaries

Although there exists up to isomorphism only one finite field for every possible order, we are interested in the specific representation. For instance, if $Q \in \mathbb{F}_2[X]$ is an irreducible polynomial of degree $s$, then $\mathbb{F}_{2^s} \cong \mathbb{F}_2[X]/(Q)$, where $(Q)$ denotes the ideal generated by $Q$.

We first recall some basics about finite fields and matrix representations. For more background the reader is referred to, e.g., [LN94, Section 2.5] and [War94]. Let $V \cong K^s$ be a finite-dimensional vector space over the field $K$. Every linear mapping $f : V \to V$ can be described as $v \mapsto A_B v$ by a left-multiplication with a matrix $A_B \in \mathrm{Mat}_s(K)$. This representation depends on the choice of the basis $B$ for $V$. For instance, if $B = \{b_1, \dots, b_s\}$, the $j$-th column of $A_B$ consists of the coefficients $a_{1,j}, \dots, a_{s,j}$ of $f(b_j) = \sum_{i=1}^{s} a_{i,j} b_i$. Thus, changing the basis from $B$ to $B'$ results in a different matrix representation of $f$. This transformation is called the change of basis transformation, which is simply a conjugation of $A_B$. Thus, $A_{B'} = T A_B T^{-1}$ using an invertible matrix $T$. In this case, $A_B$ and $A_{B'}$ are called similar (resp. permutation-similar if $T$ is a permutation matrix).

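There is a natural way of representing the elements in a finite field with characteristic $p$ as vectors with coefficients in $\mathbb{F}_p$. In the following, we consider the representation of the multiplication by $\alpha$ by a matrix as described in the following diagram.

$$\begin{array}{ccc} \mathbb{F}_{2^s} & \xrightarrow{\;\cdot\,\alpha\;} & \mathbb{F}_{2^s} \\ \Big\downarrow{\scriptstyle \Phi_B} & & \Big\uparrow{\scriptstyle \Phi_B^{-1}} \\ \mathbb{F}_2^s & \xrightarrow{\;M_{\alpha,B}\;} & \mathbb{F}_2^s \end{array}$$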

⁴ Note that the authors of [LW16] constructed a similar $32 \times 32$ $\mathbb{F}_2$-linear MDS matrix.


The bijection $\Phi_B$ maps an element $\alpha \in \mathbb{F}_{2^s}$ to its vectorial representation over $\mathbb{F}_2$ with regard to a basis $B$ (and $\Phi_B^{-1}$ vice versa). $M_{\alpha,B}$ denotes the $s \times s$ matrix representing (left-) multiplication by the element $\alpha$. For different bases $B$ and $B'$, one can obtain $M_{\alpha,B'}$ from $M_{\alpha,B}$ by the change of basis transformation, in particular $M_{\alpha,B'} = T M_{\alpha,B} T^{-1}$ for an invertible $T$. We denote similarity of matrices with the relation symbol $\sim$ (resp. $\sim_\pi$ for permutation-similarity). The characteristic polynomial of a matrix $A$ is defined as $\chi_A := \det(\lambda I - A) \in \mathbb{F}_2[\lambda]$ and the minimal polynomial is denoted by $m_A$. Recall that the minimal polynomial is the (monic) polynomial $P$ of least degree such that $P(A) = 0_s$. It is a well-known fact that the minimal polynomial divides the characteristic polynomial, thus $\chi_A(A) = 0_s$. As the minimal polynomial and the characteristic polynomial are actually properties of the underlying linear mapping, similar matrices have the same characteristic and the same minimal polynomial. A special type of matrix that will play an important role in the following is the companion matrix of a polynomial.

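Definition 3.1. For a monic polynomial

$$Q = X^d + \pi_{d-1} X^{d-1} + \dots + \pi_1 X + \pi_0 \in \mathbb{F}_2[X]$$

of degree $d$, the companion matrix of $Q$ is defined as the $d \times d$ matrix

$$C_Q := \begin{pmatrix} 0 & & & & \pi_0 \\ 1 & 0 & & & \pi_1 \\ & \ddots & \ddots & & \vdots \\ & & 1 & 0 & \pi_{d-2} \\ & & & 1 & \pi_{d-1} \end{pmatrix}.$$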

It is known from linear algebra that the characteristic polynomial and the minimal polynomial of $C_Q$ are equal to $Q$ itself, i.e., $\chi_{C_Q} = m_{C_Q} = Q$. In addition, any matrix $A$ is similar to a companion matrix if and only if its characteristic polynomial coincides with its minimal polynomial. In particular, $C_Q$ is exactly the rational canonical form of $A$ in this case (see Proposition 6.10 in Chapter 6).
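To make Definition 3.1 concrete, the following minimal sketch builds $C_Q$ over $\mathbb{F}_2$ and checks that $Q(C_Q) = 0$, which is a consequence of $\chi_{C_Q} = Q$. The function names and the list-of-rows matrix encoding are illustrative assumptions, not notation from this chapter; the AES polynomial is used only as a familiar example.

```python
def companion_matrix(low_coeffs):
    """Return C_Q over F_2 for the monic Q = X^d + pi_{d-1} X^{d-1} + ... + pi_0.

    low_coeffs = [pi_0, ..., pi_{d-1}]; the matrix is a list of d rows.
    """
    d = len(low_coeffs)
    C = [[0] * d for _ in range(d)]
    for r in range(1, d):
        C[r][r - 1] = 1                   # ones on the subdiagonal
    for r in range(d):
        C[r][d - 1] = low_coeffs[r] & 1   # last column holds pi_0, ..., pi_{d-1}
    return C


def mat_mul(A, B):
    """Matrix product over F_2."""
    n = len(A)
    return [[sum(A[i][k] & B[k][j] for k in range(n)) & 1 for j in range(n)]
            for i in range(n)]


def evaluate(low_coeffs, A):
    """Evaluate Q(A) = A^d + sum_k pi_k A^k over F_2 (d = len(low_coeffs))."""
    n = len(A)
    power = [[int(i == j) for j in range(n)] for i in range(n)]  # A^0
    total = [[0] * n for _ in range(n)]
    for pi_k in low_coeffs:
        if pi_k:
            total = [[t ^ p for t, p in zip(tr, pr)] for tr, pr in zip(total, power)]
        power = mat_mul(power, A)
    # After the loop, power equals A^d, the leading (monic) term.
    return [[t ^ p for t, p in zip(tr, pr)] for tr, pr in zip(total, power)]


AES_LOW = [1, 1, 0, 1, 1, 0, 0, 0]   # X^8 + X^4 + X^3 + X + 1
C = companion_matrix(AES_LOW)
Q_of_C = evaluate(AES_LOW, C)        # expected to be the zero matrix
```

The column convention matches the one recalled above: column $j$ holds the coordinates of the image of the $j$-th basis vector.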

3.2.1 The XOR-Count and the Cycle Normal Form

The XOR-count of a field element was already studied in [KPPY14] and [SKOP15]. In their formal definition, a matrix $A \in \mathrm{GL}_s(\mathbb{F}_2)$ has an XOR-count of $t$ if and only if $A$ can be written as a permutation matrix with $t$ additional non-zero entries. Formally, $A = P + \sum_{k=1}^{t} E^{[i_k,j_k]}$ and $w(A) = s + t$. Here, $P$ is a permutation matrix and $E^{[i_k,j_k]}$ denotes a binary $s \times s$ matrix which consists of all zeros, except for a single one in the $i_k$-th row of the $j_k$-th column. Although all matrices of that structure can be implemented with at most $t$ XOR operations (not necessarily without temporary registers), the construction does not contain all possible matrices which are realizable with at most $t$ XOR operations. For instance, there are matrices with three additional non-zero entries such that the result of their defining linear


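function can be computed with just two additions. As an example, consider

$$\begin{pmatrix} 1 & 0 & 1 \\ 1 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} v_1 \\ v_2 \\ v_3 \end{pmatrix} = \begin{pmatrix} v_1 + v_3 \\ (v_1 + v_3) + v_2 \\ v_3 \end{pmatrix}.$$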
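The example above can be checked mechanically. The sketch below compares the matrix-vector product with a two-step in-place XOR program on all eight inputs; the helper names and the 0-based indexing are assumptions made for illustration.

```python
from itertools import product

# The 3x3 example matrix: a permutation matrix (here the identity)
# with three additional non-zero entries.
A = [[1, 0, 1],
     [1, 1, 1],
     [0, 0, 1]]


def mat_vec(M, v):
    """Matrix-vector product over F_2."""
    return [sum(m & x for m, x in zip(row, v)) & 1 for row in M]


def two_xor_program(v):
    """Compute A*v with two in-place XORs and no temporary register."""
    v = list(v)
    v[0] ^= v[2]   # v1 <- v1 + v3
    v[1] ^= v[0]   # v2 <- v2 + (v1 + v3), reusing the updated v1
    return v


agree = all(mat_vec(A, list(v)) == two_xor_program(v)
            for v in product((0, 1), repeat=3))
extra_ones = sum(map(sum, A)) - len(A)   # ones beyond the permutation part
```

The saving comes from reusing the intermediate sum $v_1 + v_3$, which a count of non-zero entries alone cannot capture.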

In the following, we consider an alternative definition which includes the cases described above. Note that this corresponds to the notion of the s-XOR metric, which was first introduced in [JPST17].

Definition 3.2. A matrix $A \in \mathrm{GL}_s(\mathbb{F}_2)$ has an XOR-count of $t$, denoted $w_\oplus(A) = t$, if $t$ is the minimal number such that $A$ can be written as

$$A = P \prod_{k=1}^{t} \left(I + E^{[i_k,j_k]}\right)$$

for a permutation matrix $P$ and such that $i_k \neq j_k$ for all $k$.

Note that if a matrix can be represented in the form $P \prod_{k=1}^{t} (I + E^{[i_k,j_k]})$, the number of factors $(I + E^{[i_k,j_k]})$ clearly gives an upper bound on the actual XOR-count. It is worth pointing out that the definition above just counts the number of XOR operations without using temporary registers. Those are technically somewhat easier to handle. However, this restriction does not make a difference for matrices with XOR-count less than or equal to two, which we are most concerned with in the following. In general, allowing temporary registers might well reduce the number of XOR operations needed for an implementation.

This definition of the XOR-count coincides with the one from [KPPY14] for the case that $t = 1$, i.e., for matrices of XOR-count 1. For other cases, the number of additional non-zero entries can increase. We will often consider $t = 2$ within this chapter. By evaluating the product, it follows that any $A$ with $w_\oplus(A) = 2$ is of the form

$$A = \begin{cases} P + P\left(E^{[i_1,j_1]} + E^{[i_2,j_2]}\right) & \text{if } i_2 \neq j_1, \\ P + P\left(E^{[i_1,j_1]} + E^{[i_2,j_2]} + E^{[i_1,j_2]}\right) & \text{if } i_2 = j_1. \end{cases}$$
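As a short worked step, the case distinction follows by expanding the product of the two factors and using the standard rule for multiplying matrix units (the Kronecker delta $\delta$ is notation introduced here only for this derivation):

```latex
\begin{align*}
\left(I + E^{[i_1,j_1]}\right)\left(I + E^{[i_2,j_2]}\right)
  &= I + E^{[i_1,j_1]} + E^{[i_2,j_2]} + E^{[i_1,j_1]} E^{[i_2,j_2]}, \\
E^{[i_1,j_1]} E^{[i_2,j_2]}
  &= \delta_{j_1, i_2}\, E^{[i_1,j_2]} .
\end{align*}
```

The extra entry $E^{[i_1,j_2]}$ therefore appears exactly when $i_2 = j_1$, i.e., when the second XOR reads the coordinate that the first one has just updated.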

The XOR-count is invariant under permutation-similarity. Moreover, naturally in the setting not allowing temporary registers, the XOR-count is invariant under taking the inverse. This is summarized and formally proven in the following lemma and corollary.

Lemma 3.1. If $A \sim_\pi A'$, then $w_\oplus(A) = w_\oplus(A')$.

Proof. Let $A' = Q A Q^{-1}$ where $Q$ is the permutation matrix representing the permutation $\sigma \in S_s$. Let $I + E^{[i_k,j_k]}$ be a factor in the XOR-count representation of $A = P \prod_{k=1}^{t} (I + E^{[i_k,j_k]})$, where $t = w_\oplus(A)$. Then the following identity holds:

$$\left(I + E^{[i_k,j_k]}\right) Q^{-1} = Q^{-1} + E^{[i_k,\sigma^{-1}(j_k)]} = Q^{-1}\left(I + E^{[\sigma^{-1}(i_k),\sigma^{-1}(j_k)]}\right).$$


One is able to commute $Q^{-1}$ to the front before the first factor by proceeding in this way for all of the $t$ factors, and finally obtains

$$A' = Q P Q^{-1} \prod_{k=1}^{t} \left(I + E^{[\sigma^{-1}(i_k),\sigma^{-1}(j_k)]}\right).$$

It follows that $w_\oplus(A') \le w_\oplus(A)$. By reverting the above steps we obtain $w_\oplus(A) \le w_\oplus(A')$.

Corollary 3.1. If $w_\oplus(A) = t$, then also $w_\oplus(A^{-1}) = t$.

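Proof. We show that $A^{-1}$ is permutation-similar to a matrix with an XOR-count of $t$. Since every factor $I + E^{[i_k,j_k]}$ with $i_k \neq j_k$ is its own inverse over $\mathbb{F}_2$, we have

$$\left(P \prod_{k=1}^{t} \left(I + E^{[i_k,j_k]}\right)\right)^{-1} = \prod_{k=t}^{1} \left(I + E^{[i_k,j_k]}\right) P^{-1} \;\sim_\pi\; P^{-1} \prod_{k=t}^{1} \left(I + E^{[i_k,j_k]}\right).$$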

Later, we would like to be able to exhaustively search over all matrices with low XOR-count for a given dimension $s$. Since the number of permutation matrices (which is $s!$) rapidly increases with $s$, an exhaustive search will quickly become infeasible if we do not restrict the structure of $P$. By a well-known fact from combinatorics, one is able to assume $P$ to be in a specific form.

Lemma 3.2. For any permutation matrix $P$ of dimension $s$, it holds that

$$P \sim_\pi \bigoplus_{k=1}^{d} C_{X^{m_k}+1}$$

for some $m_k$ with $\sum_{k=1}^{d} m_k = s$ and $m_1 \ge \dots \ge m_d \ge 1$.

Proof. It is well known that two permutations with the same cycle type are conjugate [DF04, Chapter 4.3, Proposition 11]. That is, given the permutations $\sigma, \tau \in S_s$ as

$$\sigma = (o_1, o_2, \dots, o_{d_1})(o_{d_1+1}, \dots, o_{d_2}) \cdots (o_{d_{m-1}+1}, \dots, o_{d_m})$$

$$\tau = (t_1, t_2, \dots, t_{d_1})(t_{d_1+1}, \dots, t_{d_2}) \cdots (t_{d_{m-1}+1}, \dots, t_{d_m})$$

in cycle notation, one can find a $\pi \in S_s$ such that $\pi\sigma\pi^{-1} = \tau$. This $\pi$ operates as a relabeling of indices.

Let $\sigma$ in the form above be the permutation defined by $P$. Now, there exists a permutation $\pi$ such that $\pi\sigma\pi^{-1} = (d_1, 1, 2, \dots, d_1 - 1)(d_2, d_1 + 1, d_1 + 2, \dots, d_2 - 1) \cdots (d_m, d_{m-1} + 1, d_{m-1} + 2, \dots, d_m - 1)$. If $Q$ denotes the permutation matrix defined by $\pi$, one obtains $QPQ^{-1}$ in the desired form.


We say that any permutation matrix of this structure is in cycle normal form. The cycle normal form of $P$ is denoted by $C(P)$. Up to permutation-similarity, we can always assume that the permutation matrix $P$ of a given matrix with XOR-count $t$ is in cycle normal form, as stated in the following corollary.

Corollary 3.2.

$$P \prod_{k=1}^{t} \left(I + E^{[i_k,j_k]}\right) \;\sim_\pi\; C(P) \prod_{k=1}^{t} \left(I + E^{[\sigma^{-1}(i_k),\sigma^{-1}(j_k)]}\right)$$

for some permutation $\sigma \in S_s$.
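The cycle normal form can be computed directly from the cycle type. The following is a minimal sketch under the assumption that a permutation is given as a 0-based list `sigma` with `sigma[i]` the image of `i`; the function names are illustrative and not part of the original text.

```python
def cycle_type(sigma):
    """Cycle lengths of sigma, sorted so that m_1 >= ... >= m_d >= 1."""
    seen = [False] * len(sigma)
    lengths = []
    for start in range(len(sigma)):
        if seen[start]:
            continue
        length, cur = 0, start
        while not seen[cur]:
            seen[cur] = True
            cur = sigma[cur]
            length += 1
        lengths.append(length)
    return sorted(lengths, reverse=True)


def cycle_normal_form(sigma):
    """Block-diagonal matrix of the blocks C_{X^m + 1}, one per cycle length m."""
    n = len(sigma)
    C = [[0] * n for _ in range(n)]
    offset = 0
    for m in cycle_type(sigma):
        for c in range(m):
            # Within a block, column c has its one in row c+1;
            # the last column wraps around to row 0.
            C[offset + (c + 1) % m][offset + c] = 1
        offset += m
    return C


sigma = [1, 0, 2, 4, 5, 3]        # cycles (0 1), (2), (3 4 5)
lengths = cycle_type(sigma)
normal_form = cycle_normal_form(sigma)
```

Restricting the search to one representative per cycle type reduces the number of permutation matrices to consider from $s!$ to the number of integer partitions of $s$.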

3.3 Efficient Multiplication in Finite Fields

In this section, we first present some theoretical results towards understanding the structure of matrices $M_{\alpha,B}$ representing (left-) multiplication by some finite field element $\alpha \in \mathbb{F}_{2^s}^*$. The parameter $B$ indicates a basis of $\mathbb{F}_{2^s}$ considered as an $s$-dimensional vector space over $\mathbb{F}_2$. The XOR-count of $M_{\alpha,B}$ indeed depends on the choice of the basis $B$. As described in Corollary 3.2, we can assume a certain normal form for matrices with an XOR-count of $t$.

Not every (invertible) matrix is a representation of a field multiplication. For example, an obvious necessary condition is that the multiplicative order of the matrix divides $2^s - 1$. In order to understand exactly which matrices indeed represent multiplication with some field element $\alpha$, Theorem 3.1 below gives a characterization that allows one to efficiently decide when a given matrix corresponds to multiplication by a field element. The crucial part is the minimal polynomial of $\alpha$. It is a property of the linear mapping

$$f_\alpha : \mathbb{F}_{2^s} \to \mathbb{F}_{2^s}, \quad \beta \mapsto \alpha\beta$$

and is invariant under changing the specific representation of $f_\alpha$ to $\beta \mapsto M_{\alpha,B}\beta$.
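The order condition mentioned above is easy to test for small dimensions. The sketch below computes the multiplicative order of a binary matrix by repeated multiplication; the helper names are assumptions, and the test is only a necessary condition, not the characterization itself.

```python
def mat_mul(A, B):
    """Matrix product over F_2."""
    n = len(A)
    return [[sum(A[i][k] & B[k][j] for k in range(n)) & 1 for j in range(n)]
            for i in range(n)]


def multiplicative_order(A):
    """Smallest k >= 1 with A^k = I, or None if A is singular.

    The powers of an invertible A are non-zero elements of the algebra F_2[A],
    which has at most 2^s elements, so the order is at most 2^s - 1.
    """
    n = len(A)
    I = [[int(r == c) for c in range(n)] for r in range(n)]
    power = A
    for k in range(1, 2 ** n):
        if power == I:
            return k
        power = mat_mul(power, A)
    return None


# Companion matrix of X^4 + X + 1 (irreducible): multiplication by X in F_16.
FIELD = [[0, 0, 0, 1],
         [1, 0, 0, 1],
         [0, 1, 0, 0],
         [0, 0, 1, 0]]

# Companion matrix of X^4 + 1 = (X + 1)^4: a cyclic shift, not a field multiplication.
SHIFT = [[0, 0, 0, 1],
         [1, 0, 0, 0],
         [0, 1, 0, 0],
         [0, 0, 1, 0]]

order_field = multiplicative_order(FIELD)
order_shift = multiplicative_order(SHIFT)
shift_passes = (2 ** 4 - 1) % order_shift == 0
```

Here the cyclic shift has order 4, which does not divide 15, so it cannot represent a multiplication in $\mathbb{F}_{2^4}$ for any basis.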

Theorem 3.1. Let $A \in \mathrm{Mat}_s(\mathbb{F}_2) \setminus \{0_s\}$. Then $A = M_{\alpha,B}$ for some element $\alpha \in \mathbb{F}_{2^s}^*$ with respect to some basis $B$ if and only if $m_A$ is irreducible.

Proof. As described in [War94], the ring generated by a matrix $A$ defines a field of order $2^s$ if and only if the characteristic polynomial $\chi_A$ is irreducible. This is the case since $\chi_A(A) = 0$ and thus $A$ is the root of an irreducible polynomial of degree $s$. One can see that $\mathbb{F}_2(A) = \{\sum_{i=0}^{s-1} \alpha_i A^i \mid \alpha_i \in \mathbb{F}_2\}$ since it must contain all sums of powers of $A$. However, for $\mathbb{F}_2(A)$ to be a field it is not necessary that $A$ has an irreducible characteristic polynomial. It is possible that $A$ generates a subfield $\mathbb{F}_{2^m}$ of $\mathbb{F}_{2^s}$. As we show now, this is the case if and only if the minimal polynomial of $A$ is irreducible and has degree $m$.

If $m_A$ is not irreducible, $\mathbb{F}_2(A)$ is not a field and thus $A$ cannot represent a field multiplication. Let now $m_A$ be irreducible. The characteristic polynomial


$\chi_A$ is necessarily a power of $m_A$, since both of these polynomials share the same irreducible factors. So, $\chi_A = (m_A)^d$ for some positive integer $d$. Both $d$ and $\deg m_A$ divide $s$. Because of the irreducibility of $m_A$, the rational canonical form of $A$ consists of $d$ blocks of $C_{m_A}$. Thus, we obtain the similarity

$$A \sim \bigoplus_{k=1}^{d} C_{m_A}.$$

Since $\chi_{C_{m_A}} = m_A$, the matrix $A$ defines a multiplication with some element in a subfield of $\mathbb{F}_{2^s}$.

Note that any field element $\alpha$ is, up to its conjugates $\alpha, \alpha^2, \alpha^{2^2}, \dots, \alpha^{2^{s-1}}$, uniquely identified by its minimal polynomial. For every field element $\alpha$, the minimal polynomial $m_\alpha$ is exactly the minimal polynomial $m_A$ of a matrix $A$ representing multiplication with $\alpha$. Furthermore, two matrices $A, A' \in \mathrm{Mat}_s(\mathbb{F}_2)$ with the same irreducible minimal polynomial are similar. Thus, given a matrix $A$, identifying the element $\alpha$ such that $A = M_{\alpha,B}$ is equivalent to computing the (irreducible) minimal polynomial of $A$.

The main question is which field elements can be implemented with a minimal number of XOR operations, or in particular, what the minimal XOR-count is for a given (non-trivial) field element $\alpha \in \mathbb{F}_{2^s}^*$. Trivially, multiplication with $\alpha = 1$ can be implemented with zero additions since $M_{1,B} = I_s$ for all bases $B$. Conversely, if the XOR-count is 0, the element is equal to 1. In the first place, we thus aim for an XOR-count of 1 whenever possible. By a simple observation, this optimal result can be realized if the minimal polynomial of $\alpha$ is a trinomial of degree $s$.

Example 3.1. Let the field with $2^s$ elements be represented as $\mathbb{F}_{2^s} = \mathbb{F}_2[X]/(Q)$ for an irreducible polynomial $Q$ of degree $s$. For the (left-) multiplication with $X$ in the canonical basis $B = \{1, X, X^2, \dots, X^{s-1}\}$, we have $M_{X,B} = C_Q$. Thus, $w_\oplus(M_{X,B}) = w(Q) - 2$ and the XOR-count of $M_{X,B}$ equals 1 if $Q$ is a trinomial.
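Example 3.1 can be made tangible for a trinomial $Q = X^s + X^t + 1$: multiplication by $X$ is a cyclic rotation of the coefficient vector (pure wiring in hardware) followed by one XOR. The sketch below assumes coefficient lists `[c_0, ..., c_{s-1}]` and uses hypothetical function names; it compares the one-XOR version against a generic reduction for $Q = X^4 + X + 1$.

```python
from itertools import product


def mul_by_x_trinomial(c, t):
    """Multiply c_0 + c_1 X + ... + c_{s-1} X^{s-1} by X modulo X^s + X^t + 1.

    Requires 0 < t < s. The rotation costs no XOR; one XOR remains.
    """
    out = [c[-1]] + list(c[:-1])   # rotation: X^s wraps around to the constant term
    out[t] ^= out[0]               # the single XOR: also add c_{s-1} * X^t
    return out


def mul_by_x_generic(c, low_coeffs):
    """Reference version for any monic Q with low coefficients [pi_0, ..., pi_{s-1}]."""
    carry = c[-1]
    shifted = [0] + list(c[:-1])
    return [b ^ (carry & p) for b, p in zip(shifted, low_coeffs)]


S, T = 4, 1
LOW = [1, 1, 0, 0]   # Q = X^4 + X + 1

agree = all(mul_by_x_trinomial(list(c), T) == mul_by_x_generic(list(c), LOW)
            for c in product((0, 1), repeat=S))
```

For a polynomial $Q$ of larger weight, the same reference routine would need one XOR per additional middle term, matching the count $w(Q) - 2$ for the companion matrix.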

Since our approach is about finding any (non-trivial) element $\alpha \in \mathbb{F}_{2^s}^*$ such that multiplication with $\alpha$ can be implemented with minimal additions, this fact implies that we cannot hope to improve upon the implementation costs if there exists an irreducible trinomial of degree $s$. However, for several $s$, including the interesting case where $s$ is a multiple of 8, there does not exist such a trinomial [Swa62]. The question is what happens in these cases. As one of our main results, we show that the condition on the minimal polynomial is not only sufficient but also necessary.

3.3.1 Characterizing Elements with Optimal XOR-Count

In this section, we prove the converse of the fact described in Example 3.1, namely the necessary condition on the minimal (resp. characteristic) polynomial of $\alpha$ resulting in an XOR-count of 1.

Theorem 3.2. Let $\alpha \in \mathbb{F}_{2^s}$. Then there exists a matrix $A$ with $w_\oplus(A) = 1$ such that $A = M_{\alpha,B}$ for some basis $B$ if and only if $m_\alpha$ is a trinomial of degree $s$.


Proof. Let $M_{\alpha,B}$ represent multiplication by some element $\alpha \in \mathbb{F}_{2^s}$ with respect to the basis $B = \{b_1, \dots, b_s\}$ and let further $w_\oplus(M_{\alpha,B}) = 1$. We show that the characteristic polynomial $\chi_{M_{\alpha,B}}$ is a trinomial and coincides with $m_\alpha$. Since the XOR-count is 1, we can assume, w.l.o.g., that $M_{\alpha,B} = P + E^{[i,j]}$ such that $P = \bigoplus_{k=1}^{l} C_{X^{m_k}+1}$ is in cycle normal form. We first show that $l = 1$. Suppose $l > 1$; then, depending on $E^{[i,j]}$, the matrix $M_{\alpha,B}$ is either in upper or lower block-triangular form consisting of at least two diagonal blocks. Since at least one of them must be of the form $C_{X^m+1}$, the polynomial $X^m + 1$ must divide the characteristic polynomial $\chi_{M_{\alpha,B}}$. Since further $(X + 1) \mid (X^m + 1)$, the minimal polynomial of $\alpha$ is necessarily a multiple of $X + 1$. This is a contradiction since $\alpha \neq 1$ and $m_\alpha$ must be irreducible. Hence, $M_{\alpha,B}$ is permutation-similar to $C_{X^s+1} + E^{[i,j]}$. Furthermore, $i \neq j + 1 \bmod s$ since otherwise $M_{\alpha,B}$ would be singular.

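We now investigate how $\alpha$ operates on the basis elements $b_k \in B$. Considering the structure of $M_{\alpha,B}$, we obtain the following list of equations.

$$\begin{aligned} \alpha b_1 &= b_2 \\ &\;\;\vdots \\ \alpha b_{j-1} &= b_j \\ \alpha b_j &= b_{j+1} + b_i \\ \alpha b_{j+1} &= b_{j+2} \\ &\;\;\vdots \\ \alpha b_s &= b_1 . \end{aligned}$$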

By defining $\gamma = b_{j+1}$, one can express every basis element $b_k$ as a power of $\alpha$ multiplied by $\gamma$. In particular,

$$b_{j+k \bmod s} = \alpha^{k-1}\gamma \tag{3.1}$$

for $k \in \{1, \dots, s\}$. Combining this observation with the identity $\alpha b_j = b_{j+1} + b_i$, one obtains

$$\alpha^s\gamma = \gamma + \alpha^t\gamma \tag{3.2}$$

for some exponent $t \neq 0$. Since $\gamma \neq 0$, the field element $\alpha$ is a root of the trinomial $Q = X^s + X^t + 1$. It is left to show that $Q$ is exactly the minimal polynomial of $\alpha$. Suppose that $m_\alpha = X^m + \sum_{k=0}^{m-1} c_k X^k$ with coefficients $c_k \in \{0, 1\}$ and $m < s$. By multiplying $m_\alpha(\alpha)$ with $\gamma$, one obtains

$$\alpha^m\gamma = \sum_{k=0}^{m-1} c_k \alpha^k \gamma$$

and thus $b_{t_m} = \sum_{k=0}^{m-1} c_k b_{t_k}$ for some basis elements $b_{t_k}$. We are now able to express one basis element $b_{t_k}$ as a sum of other elements from $B$, which is contradictory to


the linear independence of the basis. Hence, $\deg m_\alpha = s$ and thus $m_\alpha = Q$, which finally proves the theorem.

Note that the polynomial $Q$ is exactly the characteristic polynomial of $M_{\alpha,B}$ since it must be a monic multiple of $m_\alpha$ having degree $s$. An alternative way of proving that the characteristic polynomial of a matrix $C_{X^s+1} + E^{[i,j]}$ is a trinomial is given in Section 3.3.2 below. As a simple corollary one obtains that any $\alpha \in \mathbb{F}_{2^s}^*$ with an XOR-count of 1 cannot be contained in a proper subfield.

Corollary 3.3. Let $\alpha \in \mathbb{F}_{2^s}^* \setminus \{1\}$ and let further $\deg m_\alpha < s$, indicating that $\alpha$ lies in a proper subfield of $\mathbb{F}_{2^s}$. Then, any matrix $M_{\alpha,B}$ representing multiplication by the field element $\alpha$ with respect to some basis $B$ has $w_\oplus(M_{\alpha,B}) > 1$.

This result implies that building MDS layers using a block interleaving construction [ADK+14], also called subfield construction in [KPPY14], almost always results in suboptimal implementation costs. Note that specific instances of this construction are also implicitly used in the AES, LS-Designs [GLSV15] and the hash function Whirlwind [BNN+10].

Now let $\alpha$ be an element with XOR-count 1. From Corollary 3.1 we know that $\alpha^{-1}$ has the same XOR-count. Next, we show that there do not exist any further elements with an XOR-count equal to 1.

Theorem 3.3. For any given basis $B$ of $\mathbb{F}_{2^s}$, there exist at most two field elements $\alpha$ and $\alpha^{-1}$ with $w_\oplus(M_{\alpha,B}) = w_\oplus(M_{\alpha^{-1},B}) = 1$.

Proof. Let $\alpha \in \mathbb{F}_{2^s}^*$ with $w_\oplus(M_{\alpha,B}) = 1$ for the basis $B = \{b_1, \dots, b_s\}$. We show that, for any $\beta \in \mathbb{F}_{2^s}$ with $w_\oplus(M_{\beta,B}) = 1$, it holds that $\beta = \alpha^{\pm 1}$.

Since, w.l.o.g., $M_{\alpha,B}$ can be assumed to be of the form $C_{X^s+1} + E^{[i,j]}$, we know that (3.1) and (3.2) hold. We further know that $M_{\beta,B}$ is of the form $P + E^{[i',j']}$ and thus there exist $l, m \in \{1, \dots, s\}$ with $l \neq m$ and $\beta b_{j+l \bmod s} = b_{j+m \bmod s}$. Using equation (3.1), we can write $\beta = \alpha^{m-l} =: \alpha^y$ where $y \in \{-(s-1), \dots, s-1\}$. We directly see that $y \neq 0$. It remains to show that $-1 \le y \le 1$.

Assume y ≥ 2. We use equations (3.1) and (3.2) to obtain

βbj+(s−y+1) mod s = αsγ = γ + αtγ = bj+1 mod s + bj+t+1 mod s .

Since 0 < t < s, it holds that bj+1 mod s 6= bj+t+1 mod s and thus the accordingcolumn contains an additional 1. For the next column, we have

βbj+(s−y+2) mod s = αs+1γ = αγ + αt+1γ

=

{bj+2 mod s + bj+t+2 mod s, for t < s− 1

bj+2 mod s + bj+1 mod s + bj mod s, for t = s− 1

Hence, this column also contains at least one additional 1 which is contradictoryto the XOR-count of 1.

For −y ≥ 2 we can construct the same contradiction by considering β−1.

63

We now understand the structure of field elements α that can be implementedwith a single addition. One might think that also for the other cases, the weight ofthe minimal polynomial of α strictly lower-bounds the XOR-count as w(mα)− 2.As we will see next, this is not the case.

3.3.2 Experimental Search for Optimal XOR-Counts

Surprisingly, we often can improve the XOR-count, compared to using the companion matrix for multiplication, if the weight of the minimal polynomial is greater than 3. For instance, if $m_\alpha$ is an irreducible pentanomial (that is, of weight 5) of degree $s$, there often exists a basis $B$ such that $w_\oplus(M_{\alpha,B}) = 2$. Indeed, for all $2 \le s \le 2048$ for which no irreducible trinomial of degree $s$ exists, we found some element $\alpha \in \mathbb{F}_{2^s}^*$ with an XOR-count of 2 for some basis $B$. For every such dimension, we present an example of such a matrix in Table 3.8. Thus, for all practically relevant fields, we are able to identify an element such that multiplication can be implemented with one or two XOR operations. By Theorem 3.2, these results are proven to be optimal.

Moreover, as fields of small size are most interesting for SP ciphers, we investigated those in full detail. For the fields $\mathbb{F}_{2^4}$, $\mathbb{F}_{2^5}$, $\mathbb{F}_{2^6}$, $\mathbb{F}_{2^7}$ and $\mathbb{F}_{2^8}$ we present the optimal XOR-count for each non-trivial element $\alpha$ in Tables 3.3, 3.4, 3.5, 3.6 and 3.7, respectively. The main observation is that each element which is not contained in a proper subfield can be implemented with at most 3 additions. Furthermore, whenever an XOR-count of 2 is possible, the minimal polynomial of $\alpha$ is a pentanomial in all those cases. However, a more thorough characterization of elements with non-optimal XOR-count is left as an open problem (see Section 3.6 for more details).

Those results are based on a search. Since we are only interested in matrices up to similarity (due to the change of basis), we just need to consider all matrices in the normal form described in Corollary 3.2. This will exhaust all possibilities of similarity classes for a given XOR-count $t$. In particular, the search space is reduced from $s!\,(s(s-1))^t$ to only $p(s)\,(s(s-1))^t$, where $p(s)$ denotes the number of partitions of $s$, which is exactly the number of possible cycle normal forms of dimension $s$. This allows us to exhaustively search over all similarity classes up to $t = 3$ XOR operations for the fields of small size. The key point here is that, instead of searching for an optimal basis for a given field element, we generated all matrices with small XOR-count and used Theorem 3.1 in order to check which field element (if any) the given matrix corresponds to.
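To get a feeling for the size of this reduction, the two counts can be evaluated directly. The following is only a sketch of the counting argument, not the search code used for the tables; the function names `partitions` and `search_space` are illustrative and do not appear in the original work.

```python
from math import factorial


def partitions(s):
    """Number of integer partitions p(s), via the usual dynamic programme."""
    p = [1] + [0] * s
    for part in range(1, s + 1):
        for total in range(part, s + 1):
            p[total] += p[total - part]
    return p[s]


def search_space(s, t):
    """Return (naive, reduced) candidate counts for XOR-count t in dimension s.

    naive   = s! * (s(s-1))^t   (all permutation matrices)
    reduced = p(s) * (s(s-1))^t (one cycle normal form per partition of s)
    """
    extra = (s * (s - 1)) ** t
    return factorial(s) * extra, partitions(s) * extra


if __name__ == "__main__":
    for s in (4, 5, 6, 7, 8):
        naive, reduced = search_space(s, 3)
        print(f"s={s}: p(s)={partitions(s)}, naive={naive}, reduced={reduced}")
```

The factor saved is $s!/p(s)$, independent of $t$, so the benefit grows quickly with the dimension.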

In order to identify a single lightweight element for larger field sizes, we identified conditions under which the characteristic polynomial of a matrix with XOR-count 2 has weight 5, see Theorem 3.4 below. During the search, one only has to check for irreducibility. This allows us to compute the results presented in Table 3.8 extremely fast, i.e., within a couple of minutes on a standard PC.

Theorem 3.4. Let $M = C_{X^s+1} + E_{[i_1,j_1]} + E_{[i_2,j_2]}$ such that the following relations hold:

$$i_1 < j_1 \neq s, \quad i_2 > j_2 + 1, \quad i_1 \le j_2, \quad i_2 \le j_1, \quad j_1 - (i_1 - 1) \neq s, \quad s - (j_1 - i_1) \neq i_2 - j_2 .$$

The characteristic polynomial of $M$ is a pentanomial of degree $s$. In particular,

$$\chi_M = \lambda^s + \lambda^{s+i_1-j_1+i_2-j_2-2} + \lambda^{s+i_1-j_1-1} + \lambda^{i_2-j_2-1} + 1 .$$
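As an illustration, the formula can be instantiated with the first entry of Table 3.8, i.e., $s = 8$ and $(i_1, j_1, i_2, j_2) = (1, 3, 3, 1)$. The following worked instance only substitutes these values into the statement of the theorem:

$$
\begin{aligned}
& i_1 = 1 < j_1 = 3 \neq 8, \quad i_2 = 3 > j_2 + 1 = 2, \quad i_1 = 1 \le j_2 = 1, \quad i_2 = 3 \le j_1 = 3, \\
& j_1 - (i_1 - 1) = 3 \neq 8, \quad s - (j_1 - i_1) = 6 \neq i_2 - j_2 = 2, \\
& \chi_M = \lambda^{8} + \lambda^{8+1-3+3-1-2} + \lambda^{8+1-3-1} + \lambda^{3-1-1} + 1 = \lambda^8 + \lambda^6 + \lambda^5 + \lambda + 1 .
\end{aligned}
$$

This agrees with the pentanomial denoted $(6,5,1)$ for $s = 8$ in Table 3.1.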

Proof of Theorem 3.4

In the following, we first present an alternative way of proving the fact that the characteristic polynomial of some matrix $M = C_{X^s+1} + E_{[i,j]}$ with $w_\oplus(M) = 1$ is a trinomial of degree $s$. This is true in general, even if $M$ does not represent a multiplication with a field element. This later helps us to prove Theorem 3.4.

Lemma 3.3. For $M = C_{X^s+1} + E_{[i,j]}$ with $w(M) = s + 1$, the characteristic polynomial $\chi_M$ of $M$ is a trinomial of degree $s$.

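Proof. We have to compute $\chi_M = \det(\lambda I_s - M) = \det(\lambda I_s + C_{X^s+1} + E_{[i,j]})$. If $j = s$, then $M = C_{X^s+X^{i-1}+1}$ and $\chi_M = \lambda^s + \lambda^{i-1} + 1$ is a trinomial of degree $s$. Thus, w.l.o.g., one can assume $j < s$. To compute the determinant, we use Laplace's formula by expanding along the $s$-th column. One obtains

$$\chi_M = \det\left( \begin{pmatrix} 1 & \lambda & & \\ & 1 & \ddots & \\ & & \ddots & \lambda \\ & & & 1 \end{pmatrix} + E_{[i-1,j]} \right) + \lambda \det\left( \begin{pmatrix} \lambda & & & \\ 1 & \lambda & & \\ & \ddots & \ddots & \\ & & 1 & \lambda \end{pmatrix} + E_{[i,j]} \right),$$

where $E_{[0,j]} := 0$ and $E_{[s,j]} := 0$. Both of these remaining matrices are of dimension $(s-1) \times (s-1)$. We now distinguish three cases: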

1. $i < j$: The additional 1 lies in the upper triangle of $M$. Now, $\chi_M$ reduces to $\chi_M = 1 + \lambda \det(\lambda I_{s-1} + C_{X^{s-1}} + E_{[i,j]})$. In order to compute the remaining determinant, we keep on expanding along the last column for $s - 1 - j$ times until the additional 1 is located in the rightmost column. We now obtain the determinant of a companion matrix. Thus,

$$\chi_M = 1 + \lambda^{s-j} \det(\lambda I_j + C_{X^j + X^{i-1}}) = 1 + \lambda^{s-j}(\lambda^j + \lambda^{i-1}) = \lambda^s + \lambda^{s-j+i-1} + 1 .$$

2. $i = j$: In this case, the additional 1 lies on the main diagonal of $M$ and

$$\chi_M = 1 + \lambda(\lambda^{s-2}(\lambda + 1)) = \lambda^s + \lambda^{s-1} + 1 .$$

3. $i > j$: The additional 1 lies in the lower triangle of $M$. Because of the structure of $M$, it is further $i > j + 1$. Defining the $m \times m$ matrix $S^\lambda_m$ as

$$S^\lambda_m := \begin{pmatrix} 1 & \lambda & & \\ & 1 & \ddots & \\ & & \ddots & \lambda \\ & & & 1 \end{pmatrix},$$

the characteristic polynomial of $M$ reduces to $\chi_M = \det(S^\lambda_{s-1} + E_{[i-1,j]}) + \lambda^s$. We expand along the last row of $S^\lambda_{s-1} + E_{[i-1,j]}$ for $s - i$ times and get

$$\chi_M = \det(S^\lambda_{i-1} + E_{[i-1,j]}) + \lambda^s .$$

Now, the additional 1 lies in the last row of the remaining $(i-1) \times (i-1)$-dimensional matrix. The goal is now to shift this 1 to the first column. This is done by expanding $j - 1$ times along the first column. We now obtain $\chi_M = \det(S^\lambda_{i-j} + E_{[i-j,1]}) + \lambda^s$ and the additional 1 is in the lower left corner of the matrix. As a last step, we expand along the first column one more time and finally get

$$\chi_M = \lambda^s + \det(S^\lambda_{i-j} + E_{[i-j,1]}) = \lambda^s + \det(\lambda I_{i-j-1} + C_{X^{i-j-1}}) + 1 = \lambda^s + \lambda^{i-j-1} + 1 .$$

We now present the proof of Theorem 3.4 which makes use of Lemma 3.3.

Proof of Theorem 3.4. The first two conditions ensure that $M$ has exactly one additional non-zero entry in the upper and one in the lower triangle (not on the main diagonal). Since $j_1, j_2, i_2 \neq s$, we can expand along the last column and obtain

$$\chi_M = \det(S^\lambda_{s-1} + E_{[i_1-1,j_1]} + E_{[i_2-1,j_2]}) + \lambda \det(\lambda I_{s-1} + C_{X^{s-1}} + E_{[i_1,j_1]} + E_{[i_2,j_2]}) .$$

For simplicity, we define $A := S^\lambda_{s-1} + E_{[i_1-1,j_1]} + E_{[i_2-1,j_2]}$ and $B := \lambda I_{s-1} + C_{X^{s-1}} + E_{[i_1,j_1]} + E_{[i_2,j_2]}$. In order to compute the latter part, we "push" the additional non-zero entry from the upper triangle to the top-right corner by first expanding $s - 1 - j_1$ times along the last column and then expanding $i_1 - 1$ times along the first row. The condition $i_2 \le j_1$ ensures that $E_{[i_2,j_2]}$ will not be eliminated from expanding along the last column and the condition $i_1 \le j_2$ ensures that $E_{[i_2,j_2]}$ will not be eliminated from expanding along the first row. Using Lemma 3.3, one obtains

$$
\begin{aligned}
\lambda \det(B) &= \lambda\, \lambda^{s-1-j_1} \lambda^{i_1-1} \det(\lambda I_{j_1-i_1+1} + C_{X^{j_1-i_1+1}+1} + E_{[i_2-i_1+1,\, j_2-i_1+1]}) \\
&= \lambda^{s-1-j_1+i_1} (\lambda^{j_1-i_1+1} + \lambda^{i_2-i_1+1-j_2+i_1-1-1} + 1) \\
&= \lambda^s + \lambda^{s+i_1-j_1+i_2-j_2-2} + \lambda^{s+i_1-j_1-1} .
\end{aligned}
$$

For $\det(A)$, we proceed similarly to case 3 in Lemma 3.3. We first expand $j_2 - 1$ times along the first column in order to get the additional non-zero value from the lower triangle to the leftmost column. Because of the condition $i_1 \le j_2$, this eliminates $E_{[i_1-1,j_1]}$. Now, one can expand $s - j_2 - (i_2 - j_2)$ times along the last row, until the remaining additional non-zero entry lies in the lower left corner of the remaining matrix. We finally expand along the first column one more time and obtain

$$\det(A) = \det(S^\lambda_{s-j_2} + E_{[i_2-j_2,1]}) = \det(S^\lambda_{i_2-j_2} + E_{[i_2-j_2,1]}) = \lambda^{i_2-j_2-1} + 1 .$$

The last two relations make sure that all five monomials of $\det(A) + \lambda \det(B)$ are distinct such that $\chi_M$ is indeed a pentanomial.
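The closed formulas of Lemma 3.3 and Theorem 3.4 can be cross-checked by brute force for small dimensions. The sketch below is not part of the original work: it encodes polynomials over $\mathbb{F}_2$ as integer bitmasks, assumes the convention visible in the proofs above ($C_{X^s+1}$ has ones at positions $(k+1,k)$ and $(1,s)$, indices 1-based), and evaluates the determinant by the Leibniz formula, which is only feasible for small $s$. All function names are illustrative.

```python
from itertools import permutations


def clmul(a, b):
    """Carry-less product of two GF(2)[x] polynomials stored as bitmasks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def charpoly_gf2(M):
    """det(lambda*I + M) over GF(2)[lambda]; signs vanish in characteristic 2."""
    s = len(M)
    # Entry bitmask: bit 0 = constant term, bit 1 = coefficient of lambda.
    A = [[M[r][c] ^ (2 if r == c else 0) for c in range(s)] for r in range(s)]
    total = 0
    for perm in permutations(range(s)):
        term = 1
        for r in range(s):
            term = clmul(term, A[r][perm[r]])
            if term == 0:
                break
        total ^= term
    return total


def shift_plus_units(s, extra):
    """C_{X^s+1} + sum of E[i,j] for the 1-based index pairs in `extra`."""
    M = [[0] * s for _ in range(s)]
    for k in range(s - 1):
        M[k + 1][k] = 1
    M[0][s - 1] = 1
    for i, j in extra:
        M[i - 1][j - 1] ^= 1
    return M


def lemma_3_3_trinomial(s, i, j):
    """Trinomial from the proof of Lemma 3.3 (cases i <= j and i > j + 1)."""
    e = s - j + i - 1 if i <= j else i - j - 1
    return (1 << s) | (1 << e) | 1


def theorem_3_4_pentanomial(s, i1, j1, i2, j2):
    """Pentanomial stated in Theorem 3.4."""
    exps = (s, s + i1 - j1 + i2 - j2 - 2, s + i1 - j1 - 1, i2 - j2 - 1, 0)
    p = 0
    for e in exps:
        p ^= 1 << e
    return p


if __name__ == "__main__":
    M = shift_plus_units(8, [(1, 3), (3, 1)])  # first entry of Table 3.8
    print(bin(charpoly_gf2(M)), bin(theorem_3_4_pentanomial(8, 1, 3, 3, 1)))
```

For the larger dimensions of Table 3.8 this enumeration is of course useless; there, the point of Theorem 3.4 is precisely that no determinant has to be computed.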

3.4 Constructing Lightweight MDS Matrices

Our goal is now to construct lightweight MDS matrices. We use the results obtained in the previous sections and restrict our search to circulant matrices and entries with low XOR-count. This simplifies checking the MDS property and computing an upper bound of the XOR-count of the whole matrix. The complexity of our algorithm enables us to easily search for MDS matrices up to dimension 8. Our construction is generic and works for all finite fields $\mathbb{F}_{2^s}$ with $s > b$ for a given bound $b$.

More precisely, we construct circulant matrices with entries of the form $\alpha^{\pm i}$ where $\alpha$ is an element in $\mathbb{F}_{2^s}$. Choosing entries of this form enables us to easily upper-bound the XOR-count of the elements since

$$w_\oplus(x^{\pm k}) \le k\, w_\oplus(x) .$$

This can be easily seen by using Corollary 3.1 and the fact that $\alpha^k$ can be implemented by $k$ times implementing $\alpha$. We want to keep the size of the finite field over which the matrix is defined generic. Thus, we choose the matrix entries from a subgroup of the field of fractions of the polynomial ring $\mathbb{F}_2[X]$, denoted $\mathrm{Quot}(\mathbb{F}_2[X])$. That is, every element is of the form

$$\frac{X^d + a_{d-1}X^{d-1} + \dots + a_1 X + a_0}{X^t + b_{t-1}X^{t-1} + \dots + b_1 X + b_0} .$$

More precisely, and as mentioned above, we restrict our search to elements from $\langle X \rangle$, which is the multiplicative subgroup of $\mathrm{Quot}(\mathbb{F}_2[X])$ generated by $X$. Our search works by constructing MDS conditions for an $n \times n$ matrix $M$ with entries in $\langle X \rangle$. This approach later allows us to substitute the indeterminate $X$ by any $\alpha \in \mathbb{F}_{2^s}$ that fulfills all of the conditions given below. In this context, we let $M(\alpha) \in \mathrm{Mat}_n(\mathbb{F}_{2^s})$ denote the matrix obtained by substituting $X$ with $\alpha \in \mathbb{F}_{2^s}$.

We define the weight of some circulant matrix with entries in $\langle X \rangle$ as the sum of the absolute values of the exponents in its first row, that is, the number of times $\alpha$ has to be applied per row. Then, for a given dimension, we are interested in finding the lightest matrix $M$ which can be made MDS for as many finite fields as possible. Note that the higher priority here was to find a lightweight matrix. Thus, there might exist matrices which can be made MDS for even more fields, but with a probably higher cost.

MDS conditions

Note that a matrix is MDS if and only if all its square submatrices are invertible [MS77, p. 321, Theorem 8]. Thus, given a matrix $M \in \mathrm{Mat}_n(\mathrm{Quot}(\mathbb{F}_2[X]))$, we compute the determinants of all square submatrices (called minors) of $M$ in order to check the MDS property. This way one obtains a list of conditions (polynomials in $\mathbb{F}_2[X]$) for a matrix to be MDS. Since the determinant of a matrix with elements from a field is an element of the field itself, all of these determinants can be represented as the fraction of two polynomials. Thus, $M$ is MDS if and only if the numerator of all minors is non-zero. One can decompose the numerators into their irreducible factors and collect all of them in a set $T$. This set now defines the MDS conditions. In particular, $M(\alpha)$ is MDS if and only if $\alpha$ is not a root of any of these irreducible polynomials in $T$, i.e., if and only if $m_\alpha \notin T$. This trivially holds for $s > \max_{P \in T}\{\deg P\}$ and any $\alpha \in \mathbb{F}_{2^s}$ which is not contained in a proper subfield. In general, if $\alpha$ is not contained in a proper subfield, the necessary and sufficient condition for the existence of an MDS matrix $M(\alpha)$ is that not all irreducible polynomials of degree $s$ are contained in $T$. We note that there exists a value $b$ which lower-bounds the field size for which $M$ can always be made MDS. That is, for all $t > b$, there exists an irreducible polynomial of degree $t$ which is not in $T$.

3.4.1 Generic Lightweight MDS Matrices

We now present some results obtained by the approach described above. Given the restrictions, these matrices achieve the smallest weight, i.e., the smallest sum of (absolute) exponents of $X$. Later, we will use these generic matrices to build concrete instantiations of $n \times n$ MDS matrices $M(\alpha)$ for $n \in \{2, 3, \dots, 8\}$ over a finite field $\mathbb{F}_{2^s}$ with $s > b$. We note that the given results are not necessarily the only possible constructions with the smallest weight.

We also present the conditions for the matrix to be MDS, i.e., the irreducible polynomials that must not be equal to $m_\alpha$. However, since the number of conditions rapidly increases with the dimension of the matrix, we refrain from presenting a complete list for dimensions 6 to 8. Instead, we give the SageMath [StSDT16] source code that was used to compute the set $T$ of irreducible polynomials in Listing 3.1.

Listing 3.1: Sage code for computing the set T.

    P.<X> = GF(2)[]
    K = FractionField(P)

    def mds_equations(M):
        # M is a square matrix over K; X itself is always a condition
        R = [P(X)]
        # iterate over the sizes 1, ..., n of the square submatrices
        for i in range(len(M.rows()) + 1)[1:]:
            L = M.minors(i)
            for l in L:
                if (l != 0):
                    # collect the irreducible factors of the numerator
                    F = list(l.numerator().factor())
                    for f in F:
                        R.append(f[0])
                else:
                    # a vanishing minor: M can never be made MDS
                    return
        return list(set(R))

2× 2 and 3× 3 matrices

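The matrices

$$\mathrm{circ}(1, \alpha) = \begin{pmatrix} 1 & \alpha \\ \alpha & 1 \end{pmatrix} \quad \text{and} \quad \mathrm{circ}(1, 1, \alpha) = \begin{pmatrix} 1 & 1 & \alpha \\ \alpha & 1 & 1 \\ 1 & \alpha & 1 \end{pmatrix}$$

are MDS for all $\alpha \neq 0, 1$.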

4× 4 matrices

For $s > 3$, there exists an $\alpha \in \mathbb{F}_{2^s}$ such that the matrix $\mathrm{circ}(1, 1, \alpha, \alpha^{-2})$ is MDS. More precisely, the matrix is MDS if and only if $\alpha$ is not a root of any of the following polynomials:

$X$

$X + 1$

$X^2 + X + 1$

$X^3 + X + 1$

$X^3 + X^2 + 1$

$X^4 + X^3 + X^2 + X + 1$

$X^5 + X^2 + 1$
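For a concrete field, the condition list can be confirmed by checking all minors directly. The following sketch does so for $\mathrm{circ}(1, 1, \alpha, \alpha^{-2})$ over $\mathbb{F}_{2^4}$; it is an independent illustration, not the Sage code of Listing 3.1. Field elements are stored as bitmasks, $\alpha$ is taken to be the class of $X$ modulo the chosen irreducible polynomial, and helper names such as `generic_4x4` are hypothetical.

```python
from itertools import combinations


def gf_mul(a, b, modulus, s):
    """Multiply in GF(2^s) = GF(2)[X]/(modulus); elements are bitmasks < 2^s."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> s:
            a ^= modulus
        b >>= 1
    return r


def gf_pow(a, e, modulus, s):
    """a^e for non-zero a; negative exponents are reduced via a^(2^s - 1) = 1."""
    r = 1
    for _ in range(e % ((1 << s) - 1)):
        r = gf_mul(r, a, modulus, s)
    return r


def gf_det(M, modulus, s):
    """Laplace expansion along the first row (no signs in characteristic 2)."""
    if len(M) == 1:
        return M[0][0]
    d = 0
    for c, entry in enumerate(M[0]):
        if entry:
            minor = [row[:c] + row[c + 1:] for row in M[1:]]
            d ^= gf_mul(entry, gf_det(minor, modulus, s), modulus, s)
    return d


def circ(first_row):
    """Circulant matrix whose rows are right shifts of first_row."""
    n = len(first_row)
    return [[first_row[(c - r) % n] for c in range(n)] for r in range(n)]


def is_mds(M, modulus, s):
    """True iff every square submatrix of M is invertible."""
    n = len(M)
    for d in range(1, n + 1):
        for rows in combinations(range(n), d):
            for cols in combinations(range(n), d):
                sub = [[M[r][c] for c in cols] for r in rows]
                if gf_det(sub, modulus, s) == 0:
                    return False
    return True


def generic_4x4(modulus, s=4, alpha=0b10):
    """circ(1, 1, alpha, alpha^-2) with alpha = X mod modulus."""
    return circ([1, 1, alpha, gf_pow(alpha, -2, modulus, s)])


if __name__ == "__main__":
    for modulus in (0b10011, 0b11001, 0b11111):  # the irreducible quartics
        print(bin(modulus), is_mds(generic_4x4(modulus), modulus, 4))
```

Of the three irreducible polynomials of degree 4, only $X^4 + X^3 + X^2 + X + 1$ appears in the list above, so this is the only choice for which the check is expected to fail.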

5× 5 matrices

For $s > 3$, there exists an $\alpha \in \mathbb{F}_{2^s}$ such that the matrix $\mathrm{circ}(1, 1, \alpha, \alpha^{-2}, \alpha)$ is MDS. More precisely, the matrix is MDS if and only if $\alpha$ is not a root of any of the following polynomials:

$X$

$X + 1$

$X^2 + X + 1$

$X^3 + X + 1$

$X^3 + X^2 + 1$

$X^4 + X + 1$

$X^4 + X^3 + 1$

6× 6 matrices

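For $s > 5$, there exists an $\alpha \in \mathbb{F}_{2^s}$ such that the matrix $\mathrm{circ}(1, \alpha, \alpha^{-1}, \alpha^{-2}, 1, \alpha^3)$ is MDS.

7 × 7 matrices

For $s > 5$, there exists an $\alpha \in \mathbb{F}_{2^s}$ such that $\mathrm{circ}(1, 1, \alpha^{-2}, \alpha, \alpha^2, \alpha, \alpha^{-2})$ is an MDS matrix.

8 × 8 matrices

For $s > 7$, there exists an $\alpha \in \mathbb{F}_{2^s}$ such that $\mathrm{circ}(1, 1, \alpha^{-1}, \alpha, \alpha^{-1}, \alpha^3, \alpha^4, \alpha^{-3})$ is an MDS matrix.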

Table 3.1: Optimal instantiations of the generic MDS matrices for $2 \le n \le 8$. In each cell, the first entry describes the minimal polynomial of $\alpha \in \mathbb{F}_{2^s}$ and the second entry describes the overhead of the instantiated $n \times n$ matrix $M(\alpha)$. The trinomial $X^s + X^a + 1$ is denoted by $(a)$ and the pentanomial $X^s + X^a + X^b + X^c + 1$ is denoted by $(a, b, c)$, respectively.

| n \ s | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2 | (1), 1 | (1), 1 | (1), 1 | (2), 1 | (1), 1 | (1), 1 | (6,5,1), 2 | (1), 1 | (3), 1 | (2), 1 | (3), 1 | (10,9,1), 2 |
| 3 | (1), 1 | (1), 1 | (1), 1 | (2), 1 | (1), 1 | (1), 1 | (6,5,1), 2 | (1), 1 | (3), 1 | (2), 1 | (3), 1 | (10,9,1), 2 |
| 4 | - | - | (1), 3 | (3), 3 | (1), 3 | (1), 3 | (6,5,1), 6 | (1), 3 | (3), 3 | (2), 3 | (3), 3 | (10,9,1), 6 |
| 5 | - | - | (3,2,1), 8 | (2), 4 | (1), 4 | (1), 4 | (6,5,1), 8 | (1), 4 | (3), 4 | (2), 4 | (3), 4 | (10,9,1), 8 |
| 6 | - | - | - | - | (1), 7 | (1), 7 | (6,5,1), 14 | (1), 7 | (3), 7 | (2), 7 | (3), 7 | (10,9,1), 14 |
| 7 | - | - | - | - | (1), 8 | (1), 8 | (6,5,1), 16 | (1), 8 | (3), 8 | (2), 8 | (3), 8 | (10,9,1), 16 |
| 8 | - | - | - | - | - | - | (6,5,2), 26 | (8), 13 | (3), 13 | (2), 13 | (3), 13 | (10,9,1), 26 |

3.4.2 Instantiating Lightweight MDS Matrices

We now combine the efficient multiplication in finite fields from Section 3.3 with our construction of MDS matrices, i.e., the presented generic MDS matrices are instantiated with elements $\alpha$ with low XOR-count.

In a matrix multiplication every element is computed as the sum over multiplications. The according XOR-count was already discussed in [KPPY14] and [SKOP15]. For our matrices, the total number of XOR operations needed per row is upper bounded by

$$(n - 1)s + w \cdot w_\oplus(\alpha) .$$

Here, $(n - 1)s$ XORs are the static part which comes from summing over the multiplication results and $w$ is the weight as defined above. The overhead of $w \cdot w_\oplus(\alpha)$ XORs is needed for multiplying with the single elements. The static part cannot be changed by fast multiplication. Therefore, this overhead is the part that has to be minimized.

The cost per bit for the whole matrix is given by

$$\frac{n((n - 1)s + w\, w_\oplus(\alpha))}{ns} = n - 1 + \frac{w\, w_\oplus(\alpha)}{s} .$$

One can notice that it decreases for larger field sizes.

For each of the matrices $M$ described in Section 3.4.1, Table 3.1 presents choices for $\alpha$ such that $M(\alpha)$ is MDS. Note that concrete instantiations are only given up to the field size $s = 13$. The reason is that for larger $s$, all possible $C_P$ with $P$ as an irreducible degree-$s$ polynomial of weight 3 are valid choices. If no such trinomial exists, one can choose $M_{\alpha,B}$ as in Table 3.8.
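The overhead values in Table 3.1 follow mechanically from the weights of the generic matrices of Section 3.4.1. As a small sketch (the dictionary `GENERIC` and the function names are illustrative; the exponent vectors are copied from the first rows given above), the upper bounds can be recomputed as follows:

```python
from fractions import Fraction

# First rows of the generic circulant matrices, written as exponents of alpha.
GENERIC = {
    2: (0, 1),
    3: (0, 0, 1),
    4: (0, 0, 1, -2),
    5: (0, 0, 1, -2, 1),
    6: (0, 1, -1, -2, 0, 3),
    7: (0, 0, -2, 1, 2, 1, -2),
    8: (0, 0, -1, 1, -1, 3, 4, -3),
}


def weight(exponents):
    """Weight w: sum of the absolute values of the exponents in the first row."""
    return sum(abs(e) for e in exponents)


def row_cost(n, s, xor_count):
    """Upper bound per row as (static part, overhead) = ((n-1)s, w * w_xor(alpha))."""
    return (n - 1) * s, weight(GENERIC[n]) * xor_count


def cost_per_bit(n, s, xor_count):
    """n((n-1)s + w*w_xor(alpha)) / (ns), kept exact as a fraction."""
    static, overhead = row_cost(n, s, xor_count)
    return Fraction(n * (static + overhead), n * s)


if __name__ == "__main__":
    for n in sorted(GENERIC):
        # XOR-count 1 (trinomial) versus XOR-count 2 (pentanomial, e.g. s = 8)
        print(n, weight(GENERIC[n]), row_cost(n, 9, 1), row_cost(n, 8, 2))
```

For example, the $8 \times 8$ matrix has weight 13, which gives the overhead 13 for trinomial instantiations and 26 for $s = 8$, as listed in the last row of Table 3.1.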

Table 3.2 compares the results presented in this section to the best constructions known to date. It turned out that our construction of the $4 \times 4$ MDS matrix in $\mathbb{F}_{2^4}$ is identical to the $\mathbb{F}_2$-linear matrix constructed in [LW16, LS16]. We stress that our construction yields some of the lightest MDS matrices, improving the results described in [LS16, SKOP15] for $4 \times 4$ MDS matrices over $\mathbb{F}_{2^8}$ and $8 \times 8$ MDS matrices over $\mathbb{F}_{2^8}$, respectively. This is also the case when considering an unrolled implementation of the serial implementations in [WWW13]. Unrolled variants of their implementations have an XOR-count that is slightly larger than ours. Moreover, and more importantly, the circuit depth is considerably increased due to the optimization with respect to a serial implementation.

Table 3.2: Comparison of our results with the (non-involutory) $\mathbb{F}_{2^s}$-linear MDS matrices from [SKOP15, Section 6.2] and [LS16, LW16, SS16b, JPST17] by average overhead per row. In cases where a product is given, a subfield construction was used. †: In these constructions, the XOR-count is given by counting the number of additional 1's in the corresponding matrix.

| (n, s) | our | [SKOP15]† | [LS16]† | [LW16]† | [SS16b]† | [JPST17] |
|---|---|---|---|---|---|---|
| (4,4) | 3 | 5 | 3 | 3 | 2.5 | 2.5 |
| (4,8) | 6 | 2 · 5 | 8 | – | 6.75 | 2 · 2.5 |
| (8,8) | 26 | 40 | 30 | – | – | 24 |

Note that our results in Table 3.2 are measured by the XOR-count from Definition 3.2 while the results from [LW16, LS16, SKOP15, SS16b] use the simpler XOR-count definition, i.e., counting the number of 1's in the matrix. In addition to those results, our understanding of how to choose an optimal basis can also be used to improve existing results in the simple XOR-count definition. For example, we can represent the $8 \times 8$ MDS matrix in $\mathbb{F}_{2^8}$ from [LS16] with 28 additional ones instead of 30 by change of basis.

3.5 Generalizing the MDS Property

Here, following, e.g., [WWW13], we consider a generalization to $\mathbb{F}_2$-linear MDS codes in order to improve efficiency.

There are some dimensions for which no field element with an XOR-count of 1 exists, for instance $s = 8$. However, this dimension in particular is very important since lots of block cipher designs, including the AES, are byte oriented. One would like to have some element $\alpha$ with $w_\oplus(\alpha) = 1$. A way of solving this problem is to not restrict to field elements. Instead, $\alpha$ can be chosen to be some other matrix in the ring $R = \mathrm{Mat}_s(\mathbb{F}_2)$. Given an $n \times n$ matrix $M$ with elements in $\mathrm{Quot}(\mathbb{F}_2[X])$, the substitution $M(\alpha)$ now consists of elements in a commutative ring with unity, which is the subring of $R$ generated by $\alpha$. In general, given a commutative ring with unity $R$, one can define the determinant $\det_R : \mathrm{Mat}_n(R) \to R$ in a similar way as for matrices over fields. As described in [Kna07, pp. 212 - 215], any $A \in \mathrm{Mat}_n(R)$ is invertible if and only if $\det_R(A)$ is a unit in $R$. We now define the MDS property for matrices over a commutative ring.

Definition 3.3. Let $R$ be a commutative ring with unity. A matrix $M \in \mathrm{Mat}_n(R)$ is MDS if and only if for every $1 \le d \le n$, any $d \times d$ submatrix of $M$ is invertible.

For checking the MDS property in our case, we use a well-known fact about block matrices.

Theorem 3.5 (Theorem 1 in [Sil00]). Let $K$ be a field and let $R$ be a commutative subring of $\mathrm{Mat}_s(K)$ for some integer $s$. For any matrix $M \in \mathrm{Mat}_d(R)$, it holds that

$$\det(M) = \det(\det\nolimits_R(M)) ,$$

where $\det(M)$ is the determinant of $M$ considered as $M \in \mathrm{Mat}_{ds}(K)$.

As an implication, $M(\alpha)$ is MDS if and only if $P(\alpha)$ is invertible for all $P \in T$, if and only if $\det(P(\alpha)) \neq 0$ for all $P \in T$.

2× 2 and 3× 3 matrices

Given $M = \mathrm{circ}(1, X)$ (resp. $M = \mathrm{circ}(1, 1, X)$), one has to make sure that both $X$ and $X + 1$ are invertible for $M$ to be MDS. This is the case if $X$ is substituted by the companion matrix $C_{X^s+X+1}$ for $s \ge 2$. Thus, $M(C_{X^s+X+1})$ is MDS and each non-trivial entry has an XOR-count of 1.
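The $2 \times 2$ case is small enough to check Theorem 3.5 by hand or by machine. The sketch below, which is not taken from the original work, substitutes a companion matrix for $X$ in $\mathrm{circ}(1, X)$, builds the resulting $2s \times 2s$ binary matrix and compares its invertibility with that of $\det_R(M) = I + X^2$. It assumes the companion-matrix convention used in the proofs of Section 3.3.2 (ones on the subdiagonal, coefficients in the last column); all function names are illustrative.

```python
def companion(coeffs):
    """Companion matrix over GF(2) of X^s + c_{s-1}X^{s-1} + ... + c_0,
    given coeffs = [c_0, ..., c_{s-1}]."""
    s = len(coeffs)
    M = [[0] * s for _ in range(s)]
    for k in range(s - 1):
        M[k + 1][k] = 1
    for r in range(s):
        M[r][s - 1] = coeffs[r]
    return M


def identity(s):
    return [[int(r == c) for c in range(s)] for r in range(s)]


def add(A, B):
    return [[a ^ b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def mul(A, B):
    return [[sum(A[r][k] & B[k][c] for k in range(len(B))) & 1
             for c in range(len(B[0]))] for r in range(len(A))]


def rank_gf2(M):
    """Rank over GF(2) by Gaussian elimination on rows packed into integers."""
    rows = [int("".join(map(str, row)), 2) for row in M]
    rank = 0
    for bit in range(len(M[0])):
        pivot = next((r for r in rows if (r >> bit) & 1), None)
        if pivot is None:
            continue
        rows.remove(pivot)
        rows = [r ^ pivot if (r >> bit) & 1 else r for r in rows]
        rank += 1
    return rank


def invertible(M):
    return rank_gf2(M) == len(M)


def block_circ_1_x(C):
    """circ(1, X) with X replaced by C, as a 2s x 2s matrix over GF(2)."""
    I = identity(len(C))
    return [I[r] + C[r] for r in range(len(C))] + [C[r] + I[r] for r in range(len(C))]


def is_mds_2x2(C):
    """Definition 3.3 for circ(1, C): all 1x1 blocks and the full matrix invertible."""
    return invertible(C) and invertible(block_circ_1_x(C))


if __name__ == "__main__":
    good = companion([1, 1, 0])  # C_{X^3 + X + 1}
    bad = companion([1, 0, 0])   # C_{X^3 + 1}, here X + 1 is singular
    for C in (good, bad):
        det_r = add(identity(3), mul(C, C))  # det_R(circ(1, C)) = I + C^2
        print(is_mds_2x2(C), invertible(det_r))
```

In both cases the invertibility of the large binary matrix coincides with that of $I + X^2 = (I + X)^2$, which is what Theorem 3.5 predicts.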

4× 4 matrices

The MDS conditions are more complex than above. So, we only present some improvements for $s \in \{8, 13, 16\}$. The matrix $M = \mathrm{circ}(1, 1, \alpha, \alpha^{-2})$ is MDS for

$$\alpha \in \{C_{X^8+X^2+1},\ C_{X^{13}+X+1},\ C_{X^{16}+X+1}\} .$$

Note that a similar matrix for $s = 8$ was recently constructed in [LW16].

3.6 Conclusion and Open Problems

We presented a study of optimal multiplication bases with respect to the XOR-count. When applied to MDS matrices, those lead to very efficient round-based implementations. We expect our results to be applied in other domains as well.

Our investigations leave some possibilities for future research. While we have been able to characterize exactly which field elements can be implemented with one XOR operation only, the general case is still open. For small fields of dimension at most eight, we were able to compute the optimal bases with the help of an exhaustive computer search. However, for larger dimensions, this approach quickly becomes inefficient and more insight would be needed. As a first step, we conjecture the following statement.

Conjecture 3.1. If $w_\oplus(M_{\alpha,B}) = 2$, then $m_\alpha$ is of weight smaller than or equal to 5.

Note that the converse of the conjectured statement is (unlike the case of trinomials) wrong. As can be seen in Table 3.7, there exists a pentanomial of degree 8 which cannot be implemented with two XOR operations only. Beyond that, our intuition is that the larger the weight of the minimal polynomial, the larger the gap between the most efficient multiplication and the efficiency of multiplying by means of the companion matrix. Quantifying and demonstrating such a statement is an interesting and challenging open problem. Another interesting question is to get an improved understanding of how to most efficiently multiply with elements in proper subfields. More specifically, as a generalization of Corollary 3.3, one may ask the following question.

Question 3.1. Is the most efficient way to multiply with a subfield element given by multiplying in the subfield $d$ times, where $d$ is the extension degree of the field when viewed as an extension of the subfield? More precisely, given an $\alpha \in \mathbb{F}_{2^m}^* \subset \mathbb{F}_{2^s}^*$ in a proper subfield of dimension $m = \frac{s}{d}$, let $M_{\alpha \in \mathbb{F}_{2^m}, B'}$ be the multiplication matrix in $\mathbb{F}_{2^m}$ with an optimal XOR-count. Is $M_{\alpha \in \mathbb{F}_{2^s}, B} = \bigoplus_{k=1}^{d} M_{\alpha \in \mathbb{F}_{2^m}, B'}$ a matrix with the lowest possible XOR-count for multiplication with $\alpha \in \mathbb{F}_{2^s}$? In particular, is $w_\oplus(M_{\alpha \in \mathbb{F}_{2^s}, B}) = d\, w_\oplus(M_{\alpha \in \mathbb{F}_{2^m}, B'})$?

Finally, for MDS matrices it should be noted that we locally achieve the optimal solution. What would be needed to finally settle the search for lightweight matrices is a global optimal solution. That is, for a given dimension, find an MDS matrix that can be implemented with the minimal number of XOR operations. Very recently, Kranz et al. tackled this problem by applying well-known heuristic algorithms for finding the shortest linear straight-line program to the most efficient locally-optimized MDS matrices [KLSW17]. They substantially reduced the number of XOR operations needed for implementing the matrices.

Finally, when optimizing for software, similar questions can be phrased and investigating solutions that are valid for more than one specific platform is a challenging research topic.

Table 3.3: Minimal XOR-counts for all elements in $\mathbb{F}_{2^4}^*$.

| minimal polynomial $m_\alpha$ | min $w_\oplus(\alpha)$ | matrix |
|---|---|---|
| $X + 1$ | 0 | $I$ |
| $X^2 + X + 1$ | 2 | $C_{m_\alpha} \oplus C_{m_\alpha}$ |
| $X^4 + X + 1$ | 1 | $C_{m_\alpha}$ |
| $X^4 + X^3 + 1$ | 1 | $C_{m_\alpha}$ |
| $X^4 + X^3 + X^2 + X + 1$ | 2 | $C_{X^4+1} + E_{[2,2]} + E_{[3,4]}$ |



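Table 3.4: Minimal XOR-counts for all elements in $\mathbb{F}_{2^5}^*$.

| minimal polynomial $m_\alpha$ | min $w_\oplus(\alpha)$ | matrix |
|---|---|---|
| $X^5 + X^2 + 1$ | 1 | $C_{m_\alpha}$ |
| $X^5 + X^3 + 1$ | 1 | $C_{m_\alpha}$ |
| $X^5 + X^3 + X^2 + X + 1$ | 2 | $C_{X^5+1} + E_{[2,4]} + E_{[4,2]}$ |
| $X^5 + X^4 + X^2 + X + 1$ | 2 | $C_{X^5+1} + E_{[2,2]} + E_{[3,5]}$ |
| $X^5 + X^4 + X^3 + X + 1$ | 2 | $C_{X^5+1} + E_{[2,3]} + E_{[3,1]} + E_{[3,3]}$ |
| $X^5 + X^4 + X^3 + X^2 + 1$ | 2 | $C_{X^5+1} + E_{[2,2]} + E_{[3,4]}$ |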


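Table 3.5: Minimal XOR-counts for all elements in $\mathbb{F}_{2^6}^*$.

| minimal polynomial $m_\alpha$ | min $w_\oplus(\alpha)$ | matrix |
|---|---|---|
| $X + 1$ | 0 | $I$ |
| $X^2 + X + 1$ | 3 | $C_{m_\alpha} \oplus C_{m_\alpha} \oplus C_{m_\alpha}$ |
| $X^3 + X + 1$ | 2 | $C_{m_\alpha} \oplus C_{m_\alpha}$ |
| $X^3 + X^2 + 1$ | 2 | $C_{m_\alpha} \oplus C_{m_\alpha}$ |
| $X^6 + X + 1$ | 1 | $C_{m_\alpha}$ |
| $X^6 + X^3 + 1$ | 1 | $C_{m_\alpha}$ |
| $X^6 + X^4 + X^2 + X + 1$ | 2 | $(C_{X^4+1} \oplus C_{X^2+1})(I + E_{[1,5]} + E_{[5,4]})$ |
| $X^6 + X^4 + X^3 + X + 1$ | 2 | $C_{X^6+1} + E_{[2,3]} + E_{[4,6]}$ |
| $X^6 + X^5 + 1$ | 1 | $C_{m_\alpha}$ |
| $X^6 + X^5 + X^2 + X + 1$ | 2 | $C_{X^6+1} + E_{[2,2]} + E_{[3,6]}$ |
| $X^6 + X^5 + X^3 + X^2 + 1$ | 2 | $C_{X^6+1} + E_{[2,2]} + E_{[3,5]}$ |
| $X^6 + X^5 + X^4 + X + 1$ | 2 | $C_{X^6+1} + E_{[2,3]} + E_{[3,1]} + E_{[3,3]}$ |
| $X^6 + X^5 + X^4 + X^2 + 1$ | 2 | $(C_{X^4+1} \oplus C_{X^2+1})(I + E_{[1,5]} + E_{[6,1]} + E_{[6,5]})$ |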



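Table 3.6: Minimal XOR-counts for all elements in $\mathbb{F}_{2^7}^*$.

| minimal polynomial $m_\alpha$ | min $w_\oplus(\alpha)$ | matrix |
|---|---|---|
| $X^7 + X + 1$ | 1 | $C_{m_\alpha}$ |
| $X^7 + X^3 + 1$ | 1 | $C_{m_\alpha}$ |
| $X^7 + X^3 + X^2 + X + 1$ | 2 | $C_{X^7+1} + E_{[2,6]} + E_{[4,2]}$ |
| $X^7 + X^4 + 1$ | 1 | $C_{m_\alpha}$ |
| $X^7 + X^4 + X^3 + X^2 + 1$ | 2 | $(C_{X^4+1} \oplus C_{X^3+1})(I + E_{[1,5]} + E_{[5,3]})$ |
| $X^7 + X^5 + X^2 + X + 1$ | 2 | $(C_{X^5+1} \oplus C_{X^2+1})(I + E_{[1,6]} + E_{[6,5]})$ |
| $X^7 + X^5 + X^3 + X + 1$ | 2 | $C_{X^7+1} + E_{[2,3]} + E_{[4,7]}$ |
| $X^7 + X^5 + X^4 + X^3 + 1$ | 2 | $(C_{X^4+1} \oplus C_{X^3+1})(I + E_{[1,5]} + E_{[7,2]})$ |
| $X^7 + X^5 + X^4 + X^3 + X^2 + X + 1$ | 3 | $C_{X^7+1} + E_{[2,3]} + E_{[4,6]} + E_{[4,7]}$ |
| $X^7 + X^6 + 1$ | 1 | $C_{m_\alpha}$ |
| $X^7 + X^6 + X^3 + X + 1$ | 2 | $(C_{X^6+1} \oplus C_{X^1+1})(I + E_{[1,7]} + E_{[7,4]})$ |
| $X^7 + X^6 + X^4 + X + 1$ | 2 | $(C_{X^6+1} \oplus C_{X^1+1})(I + E_{[1,7]} + E_{[7,3]})$ |
| $X^7 + X^6 + X^4 + X^2 + 1$ | 2 | $C_{X^7+1} + E_{[2,4]} + E_{[4,1]} + E_{[4,4]}$ |
| $X^7 + X^6 + X^5 + X^2 + 1$ | 2 | $(C_{X^5+1} \oplus C_{X^2+1})(I + E_{[1,6]} + E_{[7,1]} + E_{[7,6]})$ |
| $X^7 + X^6 + X^5 + X^3 + X^2 + X + 1$ | 3 | $C_{X^7+1} + E_{[2,2]} + E_{[2,3]} + E_{[4,7]}$ |
| $X^7 + X^6 + X^5 + X^4 + 1$ | 2 | $C_{X^7+1} + E_{[2,2]} + E_{[3,4]}$ |
| $X^7 + X^6 + X^5 + X^4 + X^2 + X + 1$ | 3 | $C_{X^7+1} + E_{[2,2]} + E_{[3,4]} + E_{[3,7]}$ |
| $X^7 + X^6 + X^5 + X^4 + X^3 + X^2 + 1$ | 3 | $C_{X^7+1} + E_{[2,2]} + E_{[2,3]} + E_{[4,6]}$ |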



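Table 3.7: Minimal XOR-counts for all elements in $\mathbb{F}_{2^8}^*$.

| minimal polynomial $m_\alpha$ | min $w_\oplus(\alpha)$ | matrix |
|---|---|---|
| $X^2 + X + 1$ | 4 | $\bigoplus_{k=1}^{4} C_{m_\alpha}$ |
| $X^4 + X + 1$ | 2 | $C_{m_\alpha} \oplus C_{m_\alpha}$ |
| $X^4 + X^3 + 1$ | 2 | $C_{m_\alpha} \oplus C_{m_\alpha}$ |
| $X^4 + X^3 + X^2 + X + 1$ | 4 | $\bigoplus_{k=1}^{2} (C_{X^4+1} + E_{[2,2]} + E_{[3,4]})$ |
| $X^8 + X^4 + X^3 + X + 1$ | 2 | $C_{X^8+1} + E_{[2,6]} + E_{[4,2]}$ |
| $X^8 + X^4 + X^3 + X^2 + 1$ | 3 | $C_{m_\alpha}$ |
| $X^8 + X^5 + X^3 + X + 1$ | 2 | $(C_{X^5+1} \oplus C_{X^3+1})(I + E_{[1,6]} + E_{[6,5]})$ |
| $X^8 + X^5 + X^3 + X^2 + 1$ | 2 | $C_{X^8+1} + E_{[2,6]} + E_{[5,2]}$ |
| $X^8 + X^6 + X^5 + X + 1$ | 2 | $C_{X^8+1} + E_{[2,4]} + E_{[4,2]}$ |
| $X^8 + X^6 + X^5 + X^2 + 1$ | 2 | $(C_{X^6+1} \oplus C_{X^2+1})(I + E_{[1,7]} + E_{[7,2]})$ |
| $X^8 + X^6 + X^5 + X^3 + 1$ | 2 | $C_{X^8+1} + E_{[2,3]} + E_{[4,6]}$ |
| $X^8 + X^6 + X^5 + X^4 + 1$ | 3 | $C_{m_\alpha}$ |
| $X^8 + X^6 + X^5 + X^4 + X^2 + X + 1$ | 3 | $C_{X^8+1} + E_{[2,3]} + E_{[2,4]} + E_{[5,8]}$ |
| $X^8 + X^6 + X^5 + X^4 + X^3 + X + 1$ | 3 | $C_{X^8+1} + E_{[2,3]} + E_{[2,5]} + E_{[6,8]}$ |
| $X^8 + X^7 + X^2 + X + 1$ | 2 | $C_{X^8+1} + E_{[2,2]} + E_{[3,8]}$ |
| $X^8 + X^7 + X^3 + X + 1$ | 2 | $(C_{X^7+1} \oplus C_{X+1})(I + E_{[1,8]} + E_{[8,5]})$ |
| $X^8 + X^7 + X^3 + X^2 + 1$ | 2 | $C_{X^8+1} + E_{[2,2]} + E_{[3,7]}$ |
| $X^8 + X^7 + X^4 + X^3 + X^2 + X + 1$ | 3 | $C_{X^8+1} + E_{[2,2]} + E_{[3,6]} + E_{[3,8]}$ |
| $X^8 + X^7 + X^5 + X + 1$ | 2 | $(C_{X^7+1} \oplus C_{X+1})(I + E_{[1,8]} + E_{[8,3]})$ |
| $X^8 + X^7 + X^5 + X^3 + 1$ | 2 | $(C_{X^5+1} \oplus C_{X^3+1})(I + E_{[1,6]} + E_{[8,1]} + E_{[8,6]})$ |
| $X^8 + X^7 + X^5 + X^4 + 1$ | 2 | $C_{X^8+1} + E_{[2,2]} + E_{[3,5]}$ |
| $X^8 + X^7 + X^5 + X^4 + X^3 + X^2 + 1$ | 3 | $C_{X^8+1} + E_{[2,2]} + E_{[3,5]} + E_{[3,7]}$ |
| $X^8 + X^7 + X^6 + X + 1$ | 2 | $C_{X^8+1} + E_{[2,3]} + E_{[3,1]} + E_{[3,3]}$ |
| $X^8 + X^7 + X^6 + X^3 + X^2 + X + 1$ | 3 | $C_{X^8+1} + E_{[2,2]} + E_{[2,3]} + E_{[4,8]}$ |
| $X^8 + X^7 + X^6 + X^4 + X^2 + X + 1$ | 3 | $(C_{X^6+1} \oplus C_{X^2+1})(I + E_{[1,7]} + E_{[7,3]} + E_{[7,8]})$ |
| $X^8 + X^7 + X^6 + X^4 + X^3 + X^2 + 1$ | 3 | $C_{X^8+1} + E_{[2,2]} + E_{[2,3]} + E_{[4,7]}$ |
| $X^8 + X^7 + X^6 + X^5 + X^2 + X + 1$ | 3 | $C_{X^8+1} + E_{[2,2]} + E_{[3,4]} + E_{[3,8]}$ |
| $X^8 + X^7 + X^6 + X^5 + X^4 + X + 1$ | 3 | $C_{X^8+1} + E_{[2,3]} + E_{[3,1]} + E_{[3,3]} + E_{[8,3]}$ |
| $X^8 + X^7 + X^6 + X^5 + X^4 + X^2 + 1$ | 3 | $C_{X^8+1} + E_{[2,2]} + E_{[2,5]} + E_{[6,7]}$ |
| $X^8 + X^7 + X^6 + X^5 + X^4 + X^3 + 1$ | 3 | $C_{X^8+1} + E_{[2,2]} + E_{[2,3]} + E_{[4,6]}$ |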

Table 3.8: For each $s \le 2048$ for which no irreducible trinomial of degree $s$ exists, this table presents a matrix of the form $C_{X^s+1} + E_{[i_1,j_1]} + E_{[i_2,j_2]}$ with irreducible characteristic pentanomial. Such a matrix is represented as a 4-tuple $(i_1, j_1, i_2, j_2)$. In all cases, the characteristic polynomial is equal to $\lambda^s + \lambda^{s+i_1-j_1+i_2-j_2-2} + \lambda^{s+i_1-j_1-1} + \lambda^{i_2-j_2-1} + 1$.

s s s s s s s s s s8 (1,3,3,1) 237 (1,168,3,1) 451 (1,104,3,1) 659 (1,250,3,1) 869 (1,128,3,1) 1067 (1,960,5,1) 1274 (1,1176,3,1) 1480 (1,413,3,1) 1680 (1,645,3,1) 1867 (1,670,3,1)

13 (1,4,3,1) 240 (1,121,4,1) 452 (1,90,3,1) 661 (1,224,3,1) 872 (1,405,3,1) 1068 (1,54,3,1) 1275 (1,1265,3,1) 1483 (1,412,3,1) 1682 (1,5,3,1) 1868 (1,420,3,1)16 (1,9,4,1) 243 (1,38,3,1) 453 (1,302,3,1) 664 (1,149,3,1) 874 (1,83,3,1) 1069 (1,338,3,1) 1277 (1,230,3,1) 1484 (1,41,3,1) 1683 (1,1278,3,1) 1869 (1,384,3,1)19 (1,8,3,1) 245 (1,38,3,1) 454 (1,314,3,1) 666 (1,117,3,1) 875 (1,386,3,1) 1070 (1,228,3,1) 1280 (1,81,3,1) 1485 (1,1086,3,1) 1684 (1,730,3,1) 1872 (1,183,3,1)24 (1,5,3,1) 246 (1,71,3,1) 456 (1,129,3,1) 667 (1,38,4,1) 877 (1,248,5,1) 1072 (1,789,4,1) 1283 (1,344,3,1) 1488 (1,1017,3,1) 1685 (1,816,3,1) 1874 (1,35,4,1)26 (1,11,3,1) 248 (1,29,3,1) 459 (1,270,3,1) 669 (1,48,3,1) 878 (1,3,3,1) 1073 (1,362,3,1) 1285 (1,1174,3,1) 1491 (1,666,3,1) 1686 (1,72,3,1) 1875 (1,1386,3,1)27 (1,3,3,1) 251 (1,24,3,1) 461 (1,170,3,1) 672 (1,567,3,1) 880 (1,11,3,1) 1074 (1,801,3,1) 1288 (1,379,3,1) 1493 (1,1002,3,1) 1688 (1,255,3,1) 1876 (1,1204,3,1)32 (1,3,3,1) 254 (1,19,3,1) 464 (1,55,3,1) 674 (1,307,4,1) 883 (1,589,3,1) 1075 (1,142,3,1) 1290 (1,149,3,1) 1494 (1,620,3,1) 1690 (1,733,3,1) 1877 (1,628,3,1)37 (1,16,4,1) 256 (1,157,3,1) 466 (1,451,3,1) 675 (1,225,3,1) 885 (1,512,3,1) 1076 (1,49,3,1) 1291 (1,302,3,1) 1496 (1,21,3,1) 1691 (1,26,3,1) 1880 (1,207,3,1)38 (1,4,3,1) 259 (1,20,3,1) 467 (1,72,3,1) 677 (1,647,3,1) 886 (1,10,3,1) 1077 (1,706,4,1) 1292 (1,473,3,1) 1498 (1,223,3,1) 1693 (1,394,3,1) 1882 (1,399,4,1)40 (1,14,3,1) 261 (1,12,3,1) 469 (1,188,3,1) 678 (1,312,3,1) 888 (1,501,3,1) 1080 (1,75,3,1) 1293 (1,212,3,1) 1499 (1,3,3,1) 1696 (1,19,3,1) 1883 (1,680,3,1)43 (1,8,3,1) 262 (1,13,3,1) 472 (1,385,3,1) 680 (1,21,3,1) 891 (1,12,3,1) 1083 (1,92,3,1) 1296 (1,257,3,1) 1501 (1,1222,3,1) 1699 (1,404,3,1) 1885 (1,352,3,1)45 (1,6,3,1) 264 (1,63,3,1) 475 (1,94,3,1) 681 (1,51,3,1) 893 (1,827,4,1) 1088 (1,3,3,1) 1299 (1,144,3,1) 1502 (1,5,3,1) 1701 (1,540,3,1) 1888 (1,905,3,1)48 (1,21,3,1) 267 (1,182,3,1) 477 (1,286,4,1) 683 (1,104,3,1) 896 (1,87,3,1) 1091 (1,1026,3,1) 
1301 (1,160,3,1) 1504 (1,559,3,1) 1702 (1,262,3,1) 1891 (1,280,3,1)50 (1,7,3,1) 269 (1,64,3,1) 480 (1,273,5,1) 685 (1,172,3,1) 899 (1,64,3,1) 1093 (1,310,3,1) 1303 (1,380,3,1) 1506 (1,215,3,1) 1704 (1,1617,4,1) 1892 (1,440,3,1)51 (1,12,3,1) 272 (1,165,3,1) 482 (1,115,3,1) 688 (1,149,3,1) 901 (1,504,4,1) 1096 (1,947,3,1) 1304 (1,391,3,1) 1507 (1,200,3,1) 1706 (1,843,3,1) 1893 (1,344,3,1)53 (1,4,3,1) 275 (1,20,3,1) 483 (1,26,3,1) 691 (1,606,7,1) 904 (1,241,5,1) 1099 (1,644,3,1) 1307 (1,1200,3,1) 1509 (1,128,5,1) 1707 (1,150,3,1) 1894 (1,391,3,1)56 (1,13,3,1) 277 (1,208,3,1) 485 (1,158,3,1) 693 (1,278,3,1) 907 (1,142,3,1) 1101 (1,474,3,1) 1309 (1,26,3,1) 1512 (1,381,3,1) 1709 (1,688,3,1) 1896 (1,1053,4,1)59 (1,14,3,1) 280 (1,73,3,1) 488 (1,359,3,1) 696 (1,77,3,1) 909 (1,480,3,1) 1104 (1,515,3,1) 1312 (1,901,3,1) 1515 (1,14,3,1) 1712 (1,95,3,1) 1897 (1,80,3,1)61 (1,4,3,1) 283 (1,154,3,1) 491 (1,477,3,1) 699 (1,360,3,1) 910 (1,8,3,1) 1107 (1,936,3,1) 1315 (1,508,3,1) 1517 (1,698,3,1) 1714 (1,1021,3,1) 1898 (1,241,3,1)64 (1,61,3,1) 285 (1,158,3,1) 493 (1,20,3,1) 701 (1,238,3,1) 912 (1,627,3,1) 1109 (1,278,3,1) 1316 (1,204,3,1) 1520 (1,131,3,1) 1715 (1,250,3,1) 1899 (1,986,3,1)67 (1,58,3,1) 288 (1,206,3,1) 496 (1,149,3,1) 703 (1,19,3,1) 914 (1,81,3,1) 1112 (1,35,3,1) 1317 (1,820,4,1) 1522 (1,985,3,1) 1717 (1,142,3,1) 1901 (1,230,3,1)69 (1,42,4,1) 290 (1,96,3,1) 499 (1,40,3,1) 704 (1,195,5,1) 915 (1,320,3,1) 1114 (1,143,3,1) 1318 (1,109,3,1) 1523 (1,906,3,1) 1718 (1,242,3,1) 1904 (1,535,3,1)70 (1,19,3,1) 291 (1,200,3,1) 501 (1,144,4,1) 706 (1,503,3,1) 917 (1,572,3,1) 1115 (1,328,3,1) 1320 (1,167,3,1) 1525 (1,602,3,1) 1720 (1,133,3,1) 1907 (1,780,3,1)72 (1,15,5,1) 293 (1,16,3,1) 502 (1,245,3,1) 707 (1,376,3,1) 920 (1,535,3,1) 1117 (1,220,3,1) 1322 (1,405,3,1) 1528 (1,79,3,1) 1723 (1,322,3,1) 1909 (1,46,3,1)75 (1,36,3,1) 296 (1,109,3,1) 504 (1,141,3,1) 709 (1,230,3,1) 922 (1,299,4,1) 1118 (1,168,3,1) 1323 (1,272,3,1) 1531 (1,910,3,1) 1725 (1,90,3,1) 1910 (1,266,3,1)77 
(1,47,3,1) 298 (1,55,3,1) 507 (1,429,6,1) 710 (1,17,3,1) 923 (1,52,4,1) 1120 (1,1043,3,1) 1325 (1,22,3,1) 1532 (1,66,3,1) 1727 (1,207,3,1) 1912 (1,157,3,1)78 (1,23,3,1) 299 (1,116,3,1) 509 (1,10,3,1) 712 (1,455,3,1) 925 (1,859,3,1) 1123 (1,410,3,1) 1328 (1,917,3,1) 1533 (1,1160,3,1) 1728 (1,1227,3,1) 1914 (1,639,4,1)80 (1,5,3,1) 301 (1,154,3,1) 512 (1,425,3,1) 715 (1,110,3,1) 928 (1,537,4,1) 1124 (1,424,3,1) 1330 (1,187,3,1) 1536 (1,39,3,1) 1730 (1,443,3,1) 1915 (1,40,3,1)82 (1,47,5,1) 304 (1,109,3,1) 515 (1,236,3,1) 717 (1,448,4,1) 929 (1,51,3,1) 1125 (1,254,3,1) 1331 (1,978,3,1) 1538 (1,509,3,1) 1731 (1,338,3,1) 1916 (1,663,3,1)83 (1,4,3,1) 306 (1,81,3,1) 517 (1,172,3,1) 720 (1,369,4,1) 931 (1,544,3,1) 1128 (1,21,3,1) 1333 (1,530,3,1) 1539 (1,30,3,1) 1732 (1,1250,3,1) 1917 (1,1026,3,1)85 (1,16,3,1) 307 (1,192,4,1) 520 (1,331,3,1) 723 (1,414,3,1) 933 (1,264,3,1) 1131 (1,552,3,1) 1336 (1,97,3,1) 1541 (1,698,3,1) 1733 (1,128,3,1) 1920 (1,39,3,1)88 (1,11,3,1) 309 (1,155,3,1) 523 (1,140,3,1) 725 (1,68,3,1) 934 (1,424,3,1) 1132 (1,52,3,1) 1339 (1,820,3,1) 1544 (1,411,3,1) 1736 (1,493,3,1) 1922 (1,1497,3,1)91 (1,8,3,1) 311 (1,25,3,1) 525 (1,328,4,1) 728 (1,393,3,1) 936 (1,23,3,1) 1133 (1,238,3,1) 1341 (1,732,3,1) 1546 (1,351,4,1) 1739 (1,420,3,1) 1923 (1,326,3,1)96 (1,9,3,1) 312 (1,7,5,1) 528 (1,35,3,1) 731 (1,80,3,1) 939 (1,288,3,1) 1136 (1,957,3,1) 1342 (1,610,3,1) 1547 (1,62,3,1) 1741 (1,476,3,1) 1925 (1,210,3,1)99 (1,78,3,1) 315 (1,50,3,1) 530 (1,17,3,1) 733 (1,310,3,1) 940 (1,8,3,1) 1139 (1,246,3,1) 1344 (1,185,3,1) 1549 (1,1220,3,1) 1744 (1,1183,3,1) 1928 (1,83,3,1)

101 (1,20,3,1) 317 (1,90,3,1) 531 (1,44,3,1) 734 (1,56,3,1) 941 (1,382,3,1) 1141 (1,196,3,1) 1346 (1,57,3,1) 1552 (1,325,3,1) 1747 (1,1402,3,1) 1930 (1,569,3,1)104 (1,94,3,1) 320 (1,53,3,1) 533 (1,56,3,1) 736 (1,271,5,1) 944 (1,191,3,1) 1143 (1,191,3,1) 1347 (1,18,3,1) 1555 (1,404,3,1) 1749 (1,56,3,1) 1931 (1,3,3,1)107 (1,44,4,1) 323 (1,120,3,1) 535 (1,104,3,1) 739 (1,286,3,1) 946 (1,469,3,1) 1144 (1,335,3,1) 1349 (1,138,5,1) 1557 (1,822,3,1) 1752 (1,1585,4,1) 1933 (1,940,3,1)109 (1,20,3,1) 325 (1,244,3,1) 536 (1,117,3,1) 741 (1,18,3,1) 947 (1,16,3,1) 1147 (1,598,3,1) 1352 (1,595,3,1) 1560 (1,1027,5,1) 1754 (1,103,3,1) 1936 (1,1657,3,1)112 (1,91,3,1) 326 (1,27,3,1) 539 (1,3,3,1) 744 (1,635,3,1) 949 (1,550,4,1) 1149 (1,902,5,1) 1355 (1,216,3,1) 1563 (1,198,3,1) 1755 (1,308,3,1) 1939 (1,110,3,1)114 (1,33,3,1) 328 (1,237,4,1) 541 (1,364,5,1) 747 (1,548,3,1) 950 (1,154,3,1) 1150 (1,130,3,1) 1357 (1,302,5,1) 1565 (1,66,3,1) 1757 (1,988,3,1) 1941 (1,1262,3,1)115 (1,32,3,1) 331 (1,8,3,1) 542 (1,50,3,1) 749 (1,316,3,1) 952 (1,573,4,1) 1152 (1,11,3,1) 1360 (1,485,3,1) 1568 (1,231,3,1) 1758 (1,213,3,1) 1942 (1,76,3,1)116 (1,20,3,1) 334 (1,64,3,1) 544 (1,65,5,1) 752 (1,117,3,1) 955 (1,80,3,1) 1155 (1,8,3,1) 1363 (1,1018,3,1) 1571 (1,430,3,1) 1760 (1,273,3,1) 1944 (1,429,3,1)117 (1,99,4,1) 335 (1,20,3,1) 546 (1,213,3,1) 755 (1,230,3,1) 957 (1,372,3,1) 1157 (1,328,3,1) 1365 (1,488,3,1) 1573 (1,112,3,1) 1761 (1,186,3,1) 1947 (1,210,3,1)120 (1,15,3,1) 336 (1,35,3,1) 547 (1,302,4,1) 757 (1,751,3,1) 958 (1,80,3,1) 1160 (1,87,3,1) 1368 (1,533,3,1) 1574 (1,284,3,1) 1762 (1,277,3,1) 1949 (1,686,3,1)122 (1,60,3,1) 338 (1,15,3,1) 548 (1,18,3,1) 760 (1,187,3,1) 960 (1,765,3,1) 1162 (1,451,3,1) 1370 (1,35,3,1) 1576 (1,131,3,1) 1763 (1,266,3,1) 1952 (1,205,3,1)125 (1,18,3,1) 339 (1,20,3,1) 549 (1,304,4,1) 763 (1,334,3,1) 962 (1,213,3,1) 1163 (1,162,3,1) 1371 (1,528,3,1) 1579 (1,706,3,1) 1765 (1,458,3,1) 1954 (1,71,4,1)128 (1,61,3,1) 341 (1,278,3,1) 552 (1,195,3,1) 764 (1,40,3,1) 963 
(1,38,3,1) 1165 (1,32,3,1) 1373 (1,154,3,1) 1581 (1,1446,3,1) 1766 (1,453,3,1) 1955 (1,1256,4,1)131 (1,36,3,1) 344 (1,85,3,1) 554 (1,191,3,1) 765 (1,216,3,1) 965 (1,88,3,1) 1168 (1,629,3,1) 1376 (1,151,3,1) 1584 (1,621,3,1) 1768 (1,157,3,1) 1957 (1,1486,3,1)133 (1,28,3,1) 347 (1,10,3,1) 555 (1,10,4,1) 766 (1,161,3,1) 968 (1,319,3,1) 1171 (1,466,3,1) 1378 (1,611,4,1) 1587 (1,250,7,1) 1771 (1,518,3,1) 1960 (1,31,3,1)136 (1,11,3,1) 349 (1,58,3,1) 557 (1,318,3,1) 768 (1,117,3,1) 970 (1,415,3,1) 1172 (1,288,3,1) 1379 (1,306,3,1) 1589 (1,128,7,1) 1773 (1,1352,3,1) 1963 (1,1504,3,1)138 (1,53,3,1) 352 (1,125,3,1) 560 (1,125,3,1) 770 (1,547,3,1) 971 (1,346,3,1) 1173 (1,108,3,1) 1381 (1,1276,3,1) 1592 (1,1135,3,1) 1776 (1,669,3,1) 1965 (1,732,3,1)139 (1,8,5,1) 355 (1,164,3,1) 562 (1,25,3,1) 771 (1,138,3,1) 973 (1,56,3,1) 1176 (1,57,3,1) 1382 (1,270,3,1) 1594 (1,821,3,1) 1779 (1,1046,3,1) 1968 (1,9,3,1)141 (1,47,3,1) 356 (1,10,3,1) 563 (1,60,3,1) 773 (1,138,3,1) 974 (1,30,3,1) 1179 (1,56,3,1) 1384 (1,221,3,1) 1595 (1,192,3,1) 1781 (1,36,3,1) 1970 (1,649,3,1)143 (1,19,3,1) 357 (1,266,3,1) 565 (1,178,3,1) 776 (1,569,3,1) 976 (1,715,3,1) 1181 (1,650,4,1) 1387 (1,140,3,1) 1597 (1,1150,3,1) 1784 (1,1481,3,1) 1971 (1,6,3,1)144 (1,39,3,1) 360 (1,5,3,1) 568 (1,373,3,1) 779 (1,338,3,1) 978 (1,311,3,1) 1184 (1,701,3,1) 1389 (1,204,3,1) 1598 (1,289,3,1) 1786 (1,1429,3,1) 1972 (1,820,3,1)149 (1,40,3,1) 361 (1,25,3,1) 571 (1,10,4,1) 781 (1,278,3,1) 980 (1,369,3,1) 1187 (1,530,3,1) 1392 (1,1283,3,1) 1600 (1,529,3,1) 1787 (1,112,4,1) 1973 (1,1918,3,1)152 (1,3,3,1) 363 (1,258,3,1) 572 (1,35,3,1) 784 (1,169,3,1) 981 (1,42,3,1) 1189 (1,136,3,1) 1394 (1,371,3,1) 1603 (1,134,3,1) 1789 (1,578,3,1) 1976 (1,361,3,1)157 (1,50,3,1) 365 (1,294,3,1) 573 (1,6,3,1) 786 (1,387,3,1) 984 (1,917,5,1) 1192 (1,397,3,1) 1395 (1,380,3,1) 1605 (1,1328,5,1) 1792 (1,1001,4,1) 1978 (1,289,3,1)158 (1,12,3,1) 368 (1,25,3,1) 576 (1,41,3,1) 787 (1,52,3,1) 987 (1,308,3,1) 1194 (1,153,3,1) 1397 (1,128,3,1) 1608 
(1,273,3,1) 1794 (1,27,4,1) 1979 (1,1190,4,1)160 (1,91,3,1) 371 (1,20,3,1) 578 (1,507,3,1) 788 (1,26,3,1) 989 (1,138,3,1) 1195 (1,962,3,1) 1400 (1,489,3,1) 1610 (1,561,3,1) 1795 (1,130,3,1) 1981 (1,1930,3,1)163 (1,104,3,1) 373 (1,34,3,1) 579 (1,114,3,1) 789 (1,294,3,1) 992 (1,33,3,1) 1197 (1,12,3,1) 1403 (1,3,3,1) 1611 (1,506,3,1) 1796 (1,48,3,1) 1982 (1,260,3,1)164 (1,18,3,1) 374 (1,38,3,1) 581 (1,318,3,1) 790 (1,269,3,1) 995 (1,558,5,1) 1200 (1,179,3,1) 1405 (1,658,3,1) 1613 (1,892,4,1) 1797 (1,8,3,1) 1984 (1,739,3,1)165 (1,135,3,1) 376 (1,113,3,1) 584 (1,63,3,1) 792 (1,131,3,1) 997 (1,76,3,1) 1203 (1,788,3,1) 1406 (1,97,3,1) 1614 (1,138,3,1) 1800 (1,819,3,1) 1987 (1,1196,3,1)168 (1,21,3,1) 379 (1,8,3,1) 586 (1,469,3,1) 795 (1,450,3,1) 1000 (1,101,3,1) 1205 (1,590,5,1) 1408 (1,817,3,1) 1616 (1,801,3,1) 1803 (1,1158,3,1) 1989 (1,232,5,1)171 (1,60,3,1) 381 (1,198,4,1) 587 (1,256,3,1) 796 (1,26,3,1) 1002 (1,225,3,1) 1208 (1,319,3,1) 1411 (1,634,3,1) 1619 (1,578,3,1) 1805 (1,1312,3,1) 1992 (1,177,3,1)173 (1,32,3,1) 384 (1,195,3,1) 589 (1,70,3,1) 797 (1,336,3,1) 1003 (1,716,3,1) 1211 (1,686,3,1) 1413 (1,880,4,1) 1621 (1,852,4,1) 1808 (1,545,3,1) 1995 (1,710,3,1)176 (1,19,3,1) 387 (1,102,3,1) 591 (1,95,3,1) 800 (1,523,3,1) 1004 (1,106,3,1) 1213 (1,664,3,1) 1416 (1,461,3,1) 1622 (1,158,3,1) 1811 (1,618,3,1) 1997 (1,366,3,1)179 (1,10,3,1) 389 (1,38,3,1) 592 (1,241,3,1) 802 (1,47,4,1) 1005 (1,692,3,1) 1216 (1,709,3,1) 1418 (1,149,3,1) 1624 (1,1375,3,1) 1812 (1,702,3,1) 1998 (1,155,3,1)181 (1,175,3,1) 392 (1,47,3,1) 595 (1,364,3,1) 803 (1,60,3,1) 1006 (1,40,3,1) 1219 (1,676,4,1) 1419 (1,324,3,1) 1626 (1,573,3,1) 1813 (1,14,3,1) 2000 (1,333,3,1)184 (1,83,3,1) 395 (1,126,3,1) 597 (1,540,3,1) 805 (1,248,3,1) 1008 (1,521,3,1) 1221 (1,200,3,1) 1421 (1,496,3,1) 1627 (1,130,3,1) 1816 (1,83,3,1) 2002 (1,5,3,1)187 (1,22,3,1) 397 (1,208,3,1) 598 (1,248,3,1) 808 (1,379,3,1) 1011 (1,330,3,1) 1222 (1,181,3,1) 1424 (1,323,3,1) 1629 (1,500,3,1) 1819 (1,926,3,1) 2003 (1,710,3,1)188 
(1,21,3,1) 398 (1,104,3,1) 600 (1,87,3,1) 811 (1,394,3,1) 1013 (1,296,3,1) 1224 (1,545,3,1) 1427 (1,1110,3,1) 1632 (1,791,3,1) 1821 (1,668,3,1) 2005 (1,1074,4,1)189 (1,63,3,1) 400 (1,283,3,1) 603 (1,453,3,1) 813 (1,98,3,1) 1016 (1,131,3,1) 1227 (1,212,3,1) 1429 (1,400,3,1) 1635 (1,545,3,1) 1822 (1,856,3,1) 2008 (1,445,3,1)190 (1,16,3,1) 403 (1,124,3,1) 605 (1,248,3,1) 816 (1,285,3,1) 1017 (1,261,3,1) 1229 (1,40,3,1) 1432 (1,1241,3,1) 1637 (1,90,3,1) 1824 (1,1125,4,1) 2011 (1,922,3,1)192 (1,155,3,1) 405 (1,68,5,1) 608 (1,73,3,1) 819 (1,496,4,1) 1018 (1,937,3,1) 1232 (1,477,3,1) 1435 (1,484,3,1) 1640 (1,401,3,1) 1826 (1,335,3,1) 2012 (1,40,3,1)195 (1,158,3,1) 408 (1,27,3,1) 611 (1,334,3,1) 821 (1,424,3,1) 1019 (1,3,3,1) 1235 (1,50,3,1) 1437 (1,380,3,1) 1643 (1,3,3,1) 1827 (1,1308,3,1) 2013 (1,108,3,1)197 (1,126,3,1) 410 (1,117,3,1) 613 (1,368,4,1) 824 (1,233,3,1) 1021 (1,832,3,1) 1237 (1,380,3,1) 1439 (1,300,3,1) 1644 (1,221,3,1) 1829 (1,760,3,1) 2014 (1,170,3,1)200 (1,53,3,1) 411 (1,108,7,1) 616 (1,597,4,1) 827 (1,680,3,1) 1024 (1,643,3,1) 1240 (1,826,3,1) 1440 (1,901,4,1) 1645 (1,176,3,1) 1832 (1,295,3,1) 2016 (1,645,3,1)203 (1,3,3,1) 413 (1,36,3,1) 619 (1,118,3,1) 829 (1,560,3,1) 1027 (1,830,3,1) 1243 (1,34,3,1) 1443 (1,348,3,1) 1646 (1,68,3,1) 1834 (1,1820,3,1) 2018 (1,173,3,1)205 (1,176,3,1) 416 (1,87,3,1) 621 (1,344,3,1) 830 (1,60,3,1) 1032 (1,541,4,1) 1244 (1,46,3,1) 1445 (1,68,3,1) 1648 (1,1249,3,1) 1835 (1,1352,3,1) 2019 (1,1506,3,1)206 (1,7,3,1) 419 (1,40,5,1) 624 (1,609,3,1) 832 (1,685,3,1) 1035 (1,208,5,1) 1245 (1,360,3,1) 1448 (1,93,3,1) 1651 (1,490,3,1) 1837 (1,1162,4,1) 2021 (1,102,3,1)208 (1,197,4,1) 421 (1,20,7,1) 627 (1,377,3,1) 835 (1,92,3,1) 1037 (1,758,3,1) 1248 (1,327,3,1) 1450 (1,1183,3,1) 1653 (1,834,3,1) 1840 (1,409,3,1) 2024 (1,1949,4,1)211 (1,112,3,1) 424 (1,359,3,1) 629 (1,268,3,1) 836 (1,8,3,1) 1038 (1,708,3,1) 1250 (1,313,3,1) 1451 (1,22,3,1) 1654 (1,1000,3,1) 1842 (1,303,3,1) 2027 (1,1980,3,1)213 (1,24,3,1) 427 (1,22,3,1) 630 (1,80,3,1) 
837 (1,542,3,1) 1040 (1,311,3,1) 1251 (1,38,3,1) 1453 (1,146,3,1) 1656 (1,179,3,1) 1843 (1,904,3,1) 2029 (1,4,3,1)216 (1,11,3,1) 429 (1,18,3,1) 632 (1,117,3,1) 840 (1,173,3,1) 1043 (1,304,3,1) 1253 (1,318,3,1) 1456 (1,29,3,1) 1658 (1,801,3,1) 1845 (1,786,3,1) 2030 (1,337,3,1)219 (1,201,3,1) 430 (1,49,3,1) 635 (1,60,3,1) 843 (1,200,3,1) 1045 (1,298,3,1) 1254 (1,56,3,1) 1459 (1,496,3,1) 1659 (1,446,3,1) 1848 (1,1077,3,1) 2032 (1,1479,5,1)221 (1,80,3,1) 432 (1,117,5,1) 637 (1,38,6,1) 848 (1,243,3,1) 1046 (1,38,3,1) 1256 (1,605,3,1) 1461 (1,122,3,1) 1661 (1,156,3,1) 1850 (1,1015,4,1) 2035 (1,442,3,1)222 (1,71,3,1) 434 (1,119,3,1) 638 (1,3,3,1) 851 (1,58,3,1) 1048 (1,539,3,1) 1258 (1,689,3,1) 1462 (1,913,3,1) 1662 (1,285,3,1) 1851 (1,720,3,1) 2037 (1,210,3,1)224 (1,83,3,1) 435 (1,122,3,1) 640 (1,35,3,1) 853 (1,142,3,1) 1051 (1,12,4,1) 1259 (1,168,3,1) 1464 (1,1155,3,1) 1664 (1,135,3,1) 1852 (1,46,3,1) 2038 (1,1498,3,1)226 (1,169,3,1) 437 (1,318,3,1) 643 (1,412,4,1) 854 (1,138,3,1) 1053 (1,1044,3,1) 1261 (1,640,3,1) 1467 (1,782,3,1) 1667 (1,1146,3,1) 1853 (1,906,3,1) 2040 (1,77,3,1)227 (1,182,3,1) 440 (1,297,3,1) 644 (1,236,3,1) 856 (1,691,3,1) 1056 (1,891,3,1) 1262 (1,285,3,1) 1469 (1,164,4,1) 1669 (1,106,3,1) 1856 (1,861,3,1) 2042 (1,417,3,1)229 (1,80,3,1) 442 (1,187,4,1) 645 (1,50,3,1) 859 (1,178,3,1) 1059 (1,260,3,1) 1264 (1,283,3,1) 1472 (1,83,3,1) 1670 (1,472,3,1) 1858 (1,1099,3,1) 2043 (1,398,3,1)230 (1,3,3,1) 443 (1,48,3,1) 648 (1,626,3,1) 863 (1,92,3,1) 1061 (1,268,3,1) 1267 (1,800,4,1) 1474 (1,829,3,1) 1672 (1,881,4,1) 1859 (1,4,3,1) 2045 (1,1554,3,1)232 (1,133,3,1) 445 (1,88,3,1) 653 (1,478,3,1) 864 (1,39,3,1) 1064 (1,351,3,1) 1269 (1,907,4,1) 1475 (1,68,3,1) 1675 (1,970,4,1) 1861 (1,398,3,1) 2046 (1,240,3,1)235 (1,26,3,1) 448 (1,119,3,1) 656 (1,295,3,1) 867 (1,48,3,1) 1066 (1,401,3,1) 1272 (1,383,3,1) 1477 (1,140,3,1) 1677 (1,852,3,1) 1864 (1,61,3,1) 2048 (1,83,3,1)


Chapter 4

On the Best Word Permutations for Lightweight AES-like Ciphers

This chapter is based on joint work with Gianira Alfarano, Stefan Kölbl and Gregor Leander. The author partly contributed to all of the results. His main contribution was the formalization of the method for classifying word permutations with regard to our defined notion of equivalence (i.e., Section 4.2) and the implementation of the classification algorithm for the particular case of the MixColumn matrix of Midori.

4.1 Introduction

The AES has influenced a large variety of lightweight designs. In particular, many designs start with the initial structure of the AES and tailor one or more of its parts in order to fulfill their considered lightweight requirements. One of the more recent examples was presented in 2015 with the block cipher Midori [BBI+15]. While its design follows the general outline of the AES, almost all components are modified in order to reach the goal of minimizing energy.

In particular, the authors decided to change the MixColumns operation such that it applies multiplication with a binary matrix of branch number 4, compared to the non-binary MixColumns operation in the AES with branch number 5. This has the benefit of reducing the energy consumption. Moreover, the implementation of the finite field multiplication can be avoided. However, the downside is that, a priori, the minimum number of active S-boxes reduces. More precisely, while for the AES we have at least 25 active S-boxes in any linear or differential 4-round trail, moving to a branch number of 4 reduces this number to 16 (see Theorem 2.4). For square states and under the usage of a MixColumns operation with optimal branch number, it was shown in [BJL+15] that one cannot increase the minimum number of active S-boxes when substituting the ShiftRows operation by another, arbitrary, word permutation. Therefore, the interesting, and maybe unexpected, observation made by the designers of Midori is that substituting ShiftRows can in fact significantly increase the number of active S-boxes if a MixColumns operation with non-optimal branch number is employed. Indeed, by using the Midori word permutation as given in Section 2.5.3, the authors of Midori managed to increase the minimum number of active S-boxes, e.g., from 20 to 30 for 6 rounds. Later in 2016, Todo and Aoki analyzed which binary matrices lead to an improved number of active S-boxes for the classical ShiftRows permutation [TA16].

The interesting and important question raised by the designers of Midori is what the optimal choice of the word permutation actually is. The difficulty in answering this question comes from the fact that the theoretical approach described in Section 2.4.2 is not capable of proving better bounds on the minimum number of active S-boxes than one obtains by just iterating four-round trails. In other words, with our current knowledge we are not able to theoretically analyze those improved bounds on the number of active S-boxes, but rather have to rely on computer search techniques like Matsui's algorithm [Mat95] or techniques based on Mixed-Integer Linear Programming (MILP) [MWGP12]. Quite some progress has been made on those tools in recent years, especially in the area of MILP (e.g. [SHW+14b]). For a given word permutation, even for a higher number of rounds, it is still possible to compute bounds on the minimum number of active S-boxes within reasonable time using a standard PC. However, there is a huge choice of possible permutations to be considered (roughly $16! \approx 2^{44.25}$ in the case of a 4 × 4 state as used in Midori), which immediately renders the naive approach of simply testing them all very inefficient.

The designers of Midori overcame this problem by heuristically reducing the search space of all word permutations to be considered. However, it remained unclear whether this reduction actually excludes the best possible permutations. In the design document of Midori, some conditions are given under which a permutation should lead to an optimal number of active S-boxes [BBI+15, pp. 15-16]. Unfortunately, those conditions are given without a proof or an intuition. Moreover, as we will see later, those conditions do not guarantee an optimal number of active S-boxes for all numbers of rounds simultaneously.

Thus, the final goal is clearly to gain theoretical insight on what exactly characterizes the optimal word permutations. However, as already mentioned, this seems out of reach with our current knowledge. We therefore focus on the task of computationally finding the best permutations among all permutations, i.e., without any restriction on the search space.



The starting point is the simple, but useful, observation given in Section 4.2, i.e., for any AES-like cipher, there are several word permutations which basically lead to the same cipher. More precisely, if two word permutations $p$ and $p'$ differ only by conjugation with a permutation that commutes with the MixColumns operation, the two ciphers differ only by a permutation on the plaintext and ciphertext and a corresponding permutation of the round keys. This immediately implies that, in particular, the ciphers have the same cryptographic resistance against any attack that does not involve details about the key-scheduling algorithm. In particular, the bounds on the number of active S-boxes are necessarily the same.

For our task of finding the best permutation, this means that we have to check only one of those word permutations, $p$ or $p'$. More formally, we show that being equal up to conjugation with a permutation that commutes with a given MixColumns operation actually defines an equivalence relation (see Definition 4.1) on the set of all possible permutations. We then have to check only one representative of each possible equivalence class.

This naturally leads to the task of classifying permutations with respect to this notion of equivalence. Again, when approaching this task in a naive manner, it quickly becomes very inefficient. In order to keep it manageable, we first study a weakened notion of equivalence, which allows us to significantly simplify the classification algorithm. We furthermore give an easy-to-verify condition on when this a priori weakened equivalence notion coincides with the equivalence notion mentioned above. The running time of the resulting classification algorithm as well as the number of existing equivalence classes strongly depend on the structure of the MixColumns operation.

The MixColumns operation used in Midori finally serves as a case study for our general approach (see Section 4.3). Focusing on Midori is especially interesting for the following two reasons. Firstly, the MixColumns operation fulfills the sufficient condition for which the weaker notion of equivalence coincides with the stronger notion of equivalence we are actually interested in. This allows the simplified classification mentioned above. Secondly, as we will explain in detail, the number of different equivalence classes is especially small for Midori. Indeed, our algorithm reveals that there are only about $2^{21.7}$ equivalence classes. Thus, compared to checking $2^{44.25}$ possible permutations, we gain a speed-up factor of more than $2^{22}$. All in all, this allows us to compare the actual best word permutation with respect to the number of active S-boxes, without any restriction on the search space, with the one actually used in Midori.
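The sizes quoted above can be checked with a few lines of Python. This is only an arithmetic sanity check; the class count 3,413,774 is the figure reported in Section 4.3, and the variable names are illustrative.

```python
from math import factorial, log2

# All word permutations of a 4 x 4 state.
total_permutations = factorial(16)

# Number of M-equivalence classes reported for Midori in Section 4.3.
equivalence_classes = 3_413_774

log_total = log2(total_permutations)      # about 44.25
log_classes = log2(equivalence_classes)   # about 21.70
log_speedup = log_total - log_classes     # more than 22

print(f"log2(16!)     = {log_total:.2f}")
print(f"log2(classes) = {log_classes:.2f}")
print(f"log2(speedup) = {log_speedup:.2f}")
```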

As it turns out, the permutation used in Midori gives optimal bounds for 1 up to 8 and 10 up to 12 rounds. For all other numbers of rounds up to 40, that word permutation is not optimal. As an alternative, we are able to present a word permutation that is optimal for most numbers of rounds up to 40, with few exceptions. It is worth noticing that the number of optimal permutations varies with the number of rounds considered and that there is actually no single permutation that achieves the optimal bound for all numbers of rounds simultaneously.


Our analysis indicates that the conditions on optimal word permutations given by the Midori designers (see [BBI+15, pp. 15-16]) do not precisely characterize the properties a designer would like to have on optimal word permutations.

It is worth remarking that the ciphers mCrypton and Mantis apply the same MixColumns operation as Midori. This suggests that our findings are not limited to Midori, but may instead be useful for future cipher designs.

4.2 Classifying Word Permutations

When designing a new block cipher, besides choosing a cryptographically strong S-box, a crucial goal of the designer is to choose an appropriate linear layer in order to maximize diffusion properties and protect against differential and linear attacks. In the notion of AES-like ciphers as depicted in Figure 2.6 and formally defined in Definition 2.9, the linear layer is fully specified by a matrix $M \in \mathrm{GL}_{n_r}(\mathbb{F}_{2^s})$ corresponding to the MixColumns operation and by a permutation $p \in S_{n_r n_c}$. A natural designer's approach is to first select the matrix $M$ and then choose the word permutation that maximizes the minimum number of active S-boxes. Indeed, one of the major novelties of the Midori design was that the choice of a specific type of word permutation, in combination with the appropriate MixColumns matrix, can lead to a much higher number of guaranteed active S-boxes, compared to just applying a simple ShiftRows-like operation as was done in the AES. This analysis can be done by using automatic search tools. For Midori in particular, Matsui's algorithm was used (see Algorithm 2.2 in Section 2.4.3).

In order to reduce the search space of all permutations in $S_{n_r n_c}$, it is crucial to identify under which conditions two permutations lead to the same cryptographic properties. In particular, we base our work on the following simple observation: If we consider a permutation $p \in S_{n_r n_c}$, then any permutation that is obtained from $p$ by conjugation with some $\vartheta \in S_{n_r n_c}$ for which $\mathrm{Mix}_M \circ \mathrm{Permute}_\vartheta = \mathrm{Permute}_\vartheta \circ \mathrm{Mix}_M$ leads to the same cryptographic properties. In other words, the SP cipher instantiated with the AES-like round $R_{\mathrm{Sb},\vartheta \circ p \circ \vartheta^{-1},M}$ is just a permuted version of the SP cipher instantiated with $R_{\mathrm{Sb},p,M}$ (under a permutation of the round keys), whenever the word permutation $\mathrm{Permute}_\vartheta$ commutes with the operation $\mathrm{Mix}_M$. This fact is illustrated in Figure 4.1. Overall, this motivates the notion of $M$-equivalence as defined in the following. For given state dimensions $n_r \times n_c$, we will denote the set of all word permutations $\vartheta$ for which $\mathrm{Permute}_\vartheta$ commutes with $\mathrm{Mix}_M$ by $T(M)$.

Definition 4.1 ($M$-equivalence). Let $M \in \mathrm{GL}_{n_r}(\mathbb{F}_{2^s})$. We say that two permutations $p, p' \in S_{n_r n_c}$ are $M$-equivalent if there exists a permutation $\vartheta \in T(M)$ such that $p' = \vartheta \circ p \circ \vartheta^{-1}$. We write $p \sim p'$ for two $M$-equivalent permutations $p$ and $p'$.

Note that $T(M)$ is a subgroup of $S_{n_r n_c}$, which implies that the relation $\sim$ is symmetric, reflexive and transitive. Thus, $\sim$ is indeed an equivalence relation on $S_{n_r n_c}$.


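[Figure 4.1 (diagram omitted): three two-round ciphers. In (a), each round applies $S_{\mathrm{Sb}}$, $P_{\vartheta^{-1}}$, $P_p$, $P_\vartheta$, $\mathrm{Mix}_M$; in (b), each round applies $P_{\vartheta^{-1}}$, $S_{\mathrm{Sb}}$, $P_p$, $P_\vartheta$, $\mathrm{Mix}_M$; in (c), each round applies $P_{\vartheta^{-1}}$, $S_{\mathrm{Sb}}$, $P_p$, $\mathrm{Mix}_M$, $P_\vartheta$. The round keys $k_0, k_1, k_2$ are successively replaced by $P_\vartheta(k_0), P_\vartheta(k_1), P_\vartheta(k_2)$.]

Figure 4.1: This illustration shows how equivalent permutations lead to the same cipher up to permutation of the input, the output and the round keys. One can easily see that cipher (a) is the same as cipher (b) by using the fact that the S-box layer commutes with the word permutation layer. The ciphers (b) and (c) are the same since $\mathrm{Permute}_\vartheta$ (here denoted as $P_\vartheta$ for short) commutes with $\mathrm{Mix}_M$.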

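If $p$ and $p'$ are $M$-equivalent permutations, by definition there exists a permutation $\vartheta$ such that

$$R_{\mathrm{Sb},p',M} = \mathrm{Permute}_\vartheta \circ R_{\mathrm{Sb},p,M} \circ \mathrm{Permute}_{\vartheta^{-1}} .$$

It is important to note that this can be extended to an arbitrary number of rounds. In particular, for any $t \in \mathbb{N}$, we have

$$R^t_{\mathrm{Sb},p',M} = \mathrm{Permute}_\vartheta \circ R^t_{\mathrm{Sb},p,M} \circ \mathrm{Permute}_{\vartheta^{-1}} .$$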

Thus, if any cryptanalysis is done independently of the actual specification of the round keys (i.e., under the assumption of independent round keys, see Assumption 2.2), the corresponding ciphers share the same cryptographic properties. This holds in particular for the case of differential and linear cryptanalysis; $M$-equivalent permutations lead to the same bound on the minimum number of active S-boxes.
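The conjugation identity can be checked experimentally on a toy instance. The following is a minimal sketch under several assumptions that are not taken from Definition 2.9: a keyless round of the form Mix ∘ Permute ∘ SubCells, the binary 4 × 4 matrix with zero diagonal and ones elsewhere acting on 4-bit words, column-major word numbering starting at 0, and the convention that Permute_p moves the word at position k to position p[k]. All function names are illustrative.

```python
import random

NR, NC = 4, 4  # state dimensions (rows, columns)

def permute(p, state):
    """Move the word at position k to position p[k]."""
    out = [0] * len(state)
    for k, word in enumerate(state):
        out[p[k]] = word
    return out

def compose(a, b):
    """Return the permutation a o b (apply b first)."""
    return [a[b[k]] for k in range(len(b))]

def inverse(p):
    inv = [0] * len(p)
    for k, image in enumerate(p):
        inv[image] = k
    return inv

def mix(state):
    """Apply the matrix with zero diagonal and ones elsewhere to each column."""
    out = []
    for c in range(NC):
        column = state[NR * c:NR * (c + 1)]
        total = 0
        for word in column:
            total ^= word
        out.extend(total ^ word for word in column)
    return out

def round_function(sbox, p, state):
    return mix(permute(p, [sbox[w] for w in state]))

def random_theta(rng):
    """A column permutation composed with independent row permutations."""
    sigma = rng.sample(range(NC), NC)
    theta = [0] * (NR * NC)
    for c in range(NC):
        tau = rng.sample(range(NR), NR)
        for r in range(NR):
            theta[NR * c + r] = NR * sigma[c] + tau[r]
    return theta

def check_conjugation(rounds=5, trials=100, seed=1):
    """Test R_{p'}^t = Permute_theta o R_p^t o Permute_theta^-1 on random inputs."""
    rng = random.Random(seed)
    sbox = rng.sample(range(16), 16)
    for _ in range(trials):
        p = rng.sample(range(NR * NC), NR * NC)
        theta = random_theta(rng)
        p_conj = compose(theta, compose(p, inverse(theta)))
        x = [rng.randrange(16) for _ in range(NR * NC)]
        left = x
        right = permute(inverse(theta), x)
        for _ in range(rounds):
            left = round_function(sbox, p_conj, left)
            right = round_function(sbox, p, right)
        if left != permute(theta, right):
            return False
    return True

print(check_conjugation())
```

The check relies on two facts used in the text: the S-box layer commutes with every word permutation, and the sampled ϑ commutes with the mixing layer because the chosen matrix is invariant under simultaneous row and column permutations.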

For given $n_r, n_c \in \mathbb{N}$ and $M \in \mathrm{GL}_{n_r}(\mathbb{F}_{2^s})$, we aim to classify all permutations in $S_{n_r n_c}$ up to $M$-equivalence. As described above, such a classification would allow us to check only a single representative of each equivalence class for its cryptographic properties, and it thus may significantly reduce the complexity of finding the best word permutation. However, there is a difficulty in achieving this classification. Namely, for an arbitrary $M$, it is not obvious how to efficiently determine $T(M)$ and to separate all permutations into their equivalence classes. In order to reach our goal, we therefore first consider a weaker notion of $M$-equivalence and describe an algorithm that enumerates all permutations up to this weak equivalence. Later, we will see that, for certain choices of $M$, this weak equivalence coincides with the general notion of $M$-equivalence. Fortunately, this approach allows us to finally classify all word permutations up to $M$-equivalence in its stronger notion for the case of Midori.

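Definition 4.2 (weak $M$-equivalence). Let $\mathcal{P}_{\leftrightarrow}$ denote the set of all word permutations that permute whole columns of the state and let $\mathcal{P}_{\updownarrow}$ be the set of word permutations that operate independently on the columns of the state. Formally,

$$\mathcal{P}_{\leftrightarrow} = \{ p \in S_{n_r n_c} \mid \exists \sigma \in S_{n_c} : \forall i \in \mathbb{N}_{\le n_r}, j \in \mathbb{N}_{\le n_c} : n_r(j-1) + i \overset{p}{\mapsto} n_r(\sigma(j)-1) + i \},$$

$$\mathcal{P}_{\updownarrow} = \{ p \in S_{n_r n_c} \mid \exists \sigma_0, \dots, \sigma_{n_c-1} \in S_{n_r} : \forall i \in \mathbb{N}_{\le n_r}, j \in \mathbb{N}_0^{<n_c} : n_r j + i \overset{p}{\mapsto} n_r j + \sigma_j(i) \}.$$

Then, we say that a word permutation $p$ is weakly $M$-equivalent to a word permutation $p'$, written $p \sim_w p'$, if there exists a word permutation $\vartheta \in T(M)$ of the form $\vartheta = \pi \circ \varphi$, with $\pi \in \mathcal{P}_{\leftrightarrow}$ and $\varphi \in \mathcal{P}_{\updownarrow}$, such that $p' = \vartheta \circ p \circ \vartheta^{-1}$.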

Again, since $\{\vartheta \in T(M) \mid \vartheta = \pi \circ \varphi \text{ with } \pi \in \mathcal{P}_{\leftrightarrow} \text{ and } \varphi \in \mathcal{P}_{\updownarrow}\}$ is a subgroup of $S_{n_r n_c}$, the relation $\sim_w$ is indeed an equivalence relation. Further, $p \sim_w p'$ trivially implies $p \sim p'$. For an equivalence class $[p]_{\sim_w}$, we consider the smallest permutation (in lexicographic ordering $\prec$) as its canonical representative. For a given $M$, we now describe a way to enumerate a single representative of each equivalence class.

4.2.1 Structure Matrix of a Word Permutation

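Let $p \in S_{n_r n_c}$ be a word permutation on an $n_r \times n_c$ state. We define the structure matrix of $p$ as the $n_c \times n_c$ matrix $A_p$ such that $(A_p)_{i,j}$ contains the number of words of column $i$ that are permuted to column $j$. Formally, for $i, j \in \mathbb{N}_{\le n_c}$,

$$(A_p)_{i,j} = \left| \{ k \mid k = n_r(i-1) + r \text{ for } r \in \mathbb{N}_{\le n_r} \text{ and } n_r(i-1) + r \overset{p}{\mapsto} n_r(j-1) + r',\ r' \in \mathbb{N}_{\le n_r} \} \right| .$$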

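Example 4.1.

$$\begin{pmatrix} 1 & 5 & 9 & 13 \\ 2 & 6 & 10 & 14 \\ 3 & 7 & 11 & 15 \\ 4 & 8 & 12 & 16 \end{pmatrix} \overset{p}{\mapsto} \begin{pmatrix} 5 & 1 & 14 & 2 \\ 6 & 7 & 15 & 3 \\ 12 & 10 & 9 & 4 \\ 16 & 13 & 8 & 11 \end{pmatrix}, \qquad A_p = \begin{pmatrix} 0 & 1 & 0 & 3 \\ 2 & 1 & 1 & 0 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 2 & 0 \end{pmatrix}$$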
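The structure matrix of Example 4.1 can be recomputed mechanically. The sketch below assumes that the right-hand array of the example shows the state after applying p, i.e., each entry is the label of the word that ends up at that position, with labels numbered column by column; the function name is illustrative.

```python
def structure_matrix(state_after, nr, nc):
    """Count, for each pair (i, j), the words of column i that end up in column j.

    state_after[r][c] is the 1-based label of the word found in row r,
    column c after the permutation. Labels are column-major, so word w
    started in column (w - 1) // nr (0-based).
    """
    A = [[0] * nc for _ in range(nc)]
    for r in range(nr):
        for c in range(nc):
            source_column = (state_after[r][c] - 1) // nr
            A[source_column][c] += 1
    return A

# Right-hand array of Example 4.1, row by row.
example_state = [
    [5, 1, 14, 2],
    [6, 7, 15, 3],
    [12, 10, 9, 4],
    [16, 13, 8, 11],
]

A_p = structure_matrix(example_state, nr=4, nc=4)
for row in A_p:
    print(row)
```

Under this reading the function reproduces the matrix given in the example, and every row and column of the result sums to 4.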

Note that an $n_c \times n_c$ matrix of non-negative integers is a valid structure matrix for some permutation if and only if each of its rows and each of its columns sums to $n_r$. Let now $\sigma \in S_{n_c}$. For an $n_c \times n_c$ matrix $A$, we define $A^\sigma$ as the $n_c \times n_c$ matrix that is obtained from $A$ by permuting both the rows and the columns according to $\sigma$. Formally,

$$\forall i, j \in \mathbb{N}_{\le n_c} : A^\sigma_{i,j} := A_{\sigma(i),\sigma(j)} .$$

We now define an equivalence relation $\sim$ on $n_c \times n_c$ structure matrices as

$$A \sim B \ :\Leftrightarrow\ \exists \sigma \in S_{n_c} : B = A^\sigma .$$

The following proposition explains how the weak $M$-equivalence of permutations and the equivalence of their corresponding structure matrices are related.


Proposition 4.1. Let $n_r, n_c \in \mathbb{N}$ and let $M \in \mathrm{GL}_{n_r}(\mathbb{F}_{2^s})$.

(i) If $p \sim_w p'$ for two $p, p' \in S_{n_r n_c}$, then $A_p \sim A_{p'}$.

(ii) Let $A \sim B$ for two valid $n_c \times n_c$ structure matrices of permutations in $S_{n_r n_c}$. Then, for any permutation $p \in S_{n_r n_c}$ that has $A$ as structure matrix, there exists a permutation $p' \in S_{n_r n_c}$ that has $B$ as structure matrix such that $p \sim_w p'$.

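Proof. (i) Let $p \sim_w p'$. Then, by definition, there exist permutations $\pi \in \mathcal{P}_{\leftrightarrow}$ and $\varphi \in \mathcal{P}_{\updownarrow}$ such that $p' = \pi \circ \varphi \circ p \circ \varphi^{-1} \circ \pi^{-1}$. Clearly, $A_p = A_{\varphi \circ p \circ \varphi^{-1}}$. Let now $\sigma \in S_{n_c}$ be such that, for all $i \in \mathbb{N}_{\le n_r}$ and $j \in \mathbb{N}_{\le n_c}$, $n_r(j-1) + i \overset{\pi}{\mapsto} n_r(\sigma(j)-1) + i$. Then, for all $i, j \in \mathbb{N}_{\le n_c}$, we have

$$(A_{p'})_{i,j} = (A_{p' \circ \pi})_{\sigma^{-1}(i),j} = (A_{\pi^{-1} \circ p' \circ \pi})_{\sigma^{-1}(i),\sigma^{-1}(j)} = (A_{\varphi \circ p \circ \varphi^{-1}})_{\sigma^{-1}(i),\sigma^{-1}(j)} .$$

This shows that $A_{p'} = A^{\sigma^{-1}}_{\varphi \circ p \circ \varphi^{-1}} = A^{\sigma^{-1}}_p$.

(ii) Let $p \in S_{n_r n_c}$ such that $A_p = A$. By definition of the equivalence between $A$ and $B$, there exists a $\sigma \in S_{n_c}$ such that $A^\sigma_p = B$. With the same observation as above, it follows that there exists a $\pi \in \mathcal{P}_{\leftrightarrow}$ such that $A^\sigma_p = A_{\pi \circ p \circ \pi^{-1}}$. This means that $\pi \circ p \circ \pi^{-1}$ has structure matrix $B$. Now, since $\mathrm{Permute}_\pi$ commutes with $\mathrm{Mix}_M$, we have $\pi \circ p \circ \pi^{-1} \sim_w p$.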
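Part (i) of the proof can be sanity-checked numerically. The following sketch assumes 0-based, column-major positions and the convention that a permutation p sends position k to p[k]; the helper names are illustrative. It draws random p, π and φ and compares the structure matrix of the conjugate with the matrix obtained by permuting rows and columns with the inverse of σ.

```python
import random

NR, NC = 4, 4

def structure_matrix(p):
    """A[i][j] = number of positions in column i that p sends to column j."""
    A = [[0] * NC for _ in range(NC)]
    for k, image in enumerate(p):
        A[k // NR][image // NR] += 1
    return A

def compose(a, b):
    """Return a o b (apply b first)."""
    return [a[b[k]] for k in range(len(b))]

def inverse(p):
    inv = [0] * len(p)
    for k, image in enumerate(p):
        inv[image] = k
    return inv

def column_permutation(sigma):
    """Word permutation moving column c to column sigma[c], rows unchanged."""
    return [NR * sigma[k // NR] + k % NR for k in range(NR * NC)]

def within_columns(taus):
    """Word permutation applying taus[c] to the rows of column c."""
    return [NR * (k // NR) + taus[k // NR][k % NR] for k in range(NR * NC)]

def conjugate_matrix(A, sigma):
    """Return A^sigma with (A^sigma)[i][j] = A[sigma[i]][sigma[j]]."""
    return [[A[sigma[i]][sigma[j]] for j in range(NC)] for i in range(NC)]

def check_proposition(trials=200, seed=7):
    rng = random.Random(seed)
    for _ in range(trials):
        p = rng.sample(range(NR * NC), NR * NC)
        sigma = rng.sample(range(NC), NC)
        pi = column_permutation(sigma)
        phi = within_columns([rng.sample(range(NR), NR) for _ in range(NC)])
        inner = compose(phi, compose(p, inverse(phi)))
        p_prime = compose(pi, compose(inner, inverse(pi)))
        # Conjugation inside the columns leaves the structure matrix unchanged.
        if structure_matrix(inner) != structure_matrix(p):
            return False
        # Conjugation by a column permutation permutes rows and columns.
        expected = conjugate_matrix(structure_matrix(p), inverse(sigma))
        if structure_matrix(p_prime) != expected:
            return False
    return True

print(check_proposition())
```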

Result (ii) implies that, in order to characterize permutations up to weak $M$-equivalence, it is enough to separate all valid structure matrices into their equivalence classes, pick a representative of each class and search for all permutations (up to weak $M$-equivalence) that have the respective structure matrix.

4.2.2 Search Algorithm

Algorithm 4.1 for enumerating all permutations up to weak $M$-equivalence for a given structure matrix $A$ is given below. It works as follows. We start with an $n_r \times n_c$ word permutation $p_{\mathrm{start}}$ which is undefined at every position (this is represented by a $-1$ value). Then, we apply EnumerateRecursive to $p_{\mathrm{start}}$. After each call, the permutation is extended by another column until it is completely defined. Note that it can only be extended by a column which meets the requirements given by the structure matrix $A$ (see line 12 of Algorithm 4.1). Further, the extension has to make sure that we can still obtain a permutation, i.e., no value except $-1$ occurs twice. The algorithm proceeds with the extended permutation only if it leads to a smallest representative of the equivalence class of conjugation with all $\pi \in \mathcal{P}_{\updownarrow}$. This is checked in line 14. Checking whether $p$ is the smallest representative can be done with at most $|\mathcal{P}_{\updownarrow}|$ iterations.

After the algorithm terminates, it outputs a list of word permutations that contains at least one representative of each equivalence class. However, it can contain more than one representative. As a last step, for each $p$ in this list, it is therefore required to check whether $p$ is the smallest permutation with respect to conjugation by all $\pi \circ \varphi$ with $\varphi \in \mathcal{P}_{\updownarrow}$ and $\pi \in \mathcal{P}_{\leftrightarrow}$ that leave the structure matrix of $p$ invariant. In other words, one has to check whether $p$ is smaller than any permutation $\pi \circ \varphi \circ p \circ \varphi^{-1} \circ \pi^{-1}$, where $\varphi \in \mathcal{P}_{\updownarrow}$ and $\pi \in \mathcal{P}_{\leftrightarrow}$ such that $A_p = A^\sigma_p$ for the $\sigma \in S_{n_c}$ corresponding to the permutation $\pi$.
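For very small states, the outcome of such a classification can be cross-checked by brute force. The sketch below is not Algorithm 4.1: it simply conjugates every permutation of a toy 2 × 2 state by all ϑ = π ∘ φ and keeps the lexicographically smallest element of each orbit. It assumes that every such ϑ commutes with the mixing layer (which holds, for instance, for the 2 × 2 matrix with zero diagonal and ones elsewhere); positions are 0-based and column-major, and all names are illustrative.

```python
from itertools import permutations, product

NR, NC = 2, 2
N = NR * NC

def compose(a, b):
    """Return a o b (apply b first)."""
    return tuple(a[b[k]] for k in range(len(b)))

def inverse(p):
    inv = [0] * len(p)
    for k, image in enumerate(p):
        inv[image] = k
    return tuple(inv)

def all_thetas():
    """All theta = pi o phi: row permutations per column, then a column permutation."""
    thetas = set()
    for sigma in permutations(range(NC)):
        for taus in product(permutations(range(NR)), repeat=NC):
            thetas.add(tuple(NR * sigma[k // NR] + taus[k // NR][k % NR]
                             for k in range(N)))
    return thetas

def canonical(p, thetas):
    """Lexicographically smallest conjugate theta o p o theta^-1."""
    return min(compose(t, compose(p, inverse(t))) for t in thetas)

def representatives():
    thetas = all_thetas()
    return sorted({canonical(p, thetas) for p in permutations(range(N))})

reps = representatives()
print(len(reps), "classes")
for rep in reps:
    print(rep)
```

For this toy case the 24 permutations fall into 8 classes. The same brute-force approach is hopeless for a 4 × 4 state, which is why the structure-matrix decomposition and the column-by-column extension are needed.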

4.2.3 The Difference between the Equivalence Notions

In this section, we outline the relation between weak $M$-equivalence and the stronger notion of $M$-equivalence. The following proposition describes a sufficient condition on the matrix $M$ such that the notions of weak $M$-equivalence and $M$-equivalence are the same. Note that, from now on, we focus on matrices $M \in \mathrm{GL}_{n_r}(\mathbb{F}_{2^s})$ that consist of binary coefficients only, i.e., coefficients from $\{0, 1\} \subseteq \mathbb{F}_{2^s}$.

Proposition 4.2. Let $M \in \mathrm{GL}_{n_r}(\mathbb{F}_{2^s})$ be an $n_r \times n_r$ matrix with binary coefficients and let $G$ be the directed graph with $n_r$ vertices that has $M$ as its adjacency matrix. Then, if $G$ is a strongly connected directed graph, the notion of $M$-equivalence coincides with the notion of weak $M$-equivalence.

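Proof. Let $\vartheta$ be a permutation in $T(M)$. We can write any word position $k \in \{1, \dots, n_r n_c\}$ of an $n_r \times n_c$ state uniquely as $k = n_r \cdot \mathrm{Block}(k) + \mathrm{Index}(k)$ with $1 \le \mathrm{Index}(k) \le n_r$. We now have to show that, for all $k, k' \in \{1, \dots, n_r n_c\}$, $\mathrm{Block}(k) = \mathrm{Block}(k')$ implies $\mathrm{Block}(\vartheta(k)) = \mathrm{Block}(\vartheta(k'))$. This would imply that $\vartheta$ can be written as $\pi \circ \varphi$ with $\pi \in \mathcal{P}_{\leftrightarrow}$ and $\varphi \in \mathcal{P}_{\updownarrow}$.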

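We can represent the operation $\mathrm{Mix}_M$ as the $n_r n_c \times n_r n_c$ binary block-diagonal matrix which consists of $n_c$ blocks of the $n_r \times n_r$ matrix $M$, i.e.,

$$\mathrm{Mix}_M = \begin{pmatrix} M & & & \\ & M & & \\ & & \ddots & \\ & & & M \end{pmatrix} =: (b_{i,j})_{i,j \in \mathbb{N}_{\le n_r n_c}} .$$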

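Since the permutation matrix of $\vartheta$ has to commute with $\mathrm{Mix}_M$ considered as an $n_r n_c \times n_r n_c$ binary matrix, we necessarily have the property that $b_{i,j} = b_{\vartheta(i),\vartheta(j)}$ for all $i, j \in \{1, \dots, n_r n_c\}$. Let now $k, k' \in \{1, \dots, n_r n_c\}$. By the block-diagonal structure of the matrix $\mathrm{Mix}_M$, we have

$$b_{k,k'} = 1 \Leftrightarrow \mathrm{Block}(k) = \mathrm{Block}(k') \text{ and } \mathrm{Index}(k') \in T_{\mathrm{Index}(k)},$$

where $T_i := \{ j \in \mathbb{N}_{\le n_r} \mid M_{i,j} = 1 \}$. Since $b_{k,k'} = b_{\vartheta(k),\vartheta(k')}$, we have that $\mathrm{Block}(\vartheta(k)) = \mathrm{Block}(\vartheta(k'))$ for all $k'$ with $\mathrm{Index}(k') \in T_{\mathrm{Index}(k)}$ and $\mathrm{Block}(k) = \mathrm{Block}(k')$.

What we have now shown is that, for an arbitrary $l \in \mathbb{N}_{\le n_r}$ and for all $k, k'$ with $\mathrm{Index}(k), \mathrm{Index}(k') \in T_l$,

$$\mathrm{Block}(k) = \mathrm{Block}(k') \implies \mathrm{Block}(\vartheta(k)) = \mathrm{Block}(\vartheta(k')) .$$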


Algorithm 4.1 Enumerate all permutations up to weak M-equivalence for a given structure matrix A

 1: procedure EnumeratePermutations(A)
 2:     R ← {}
 3:     p_start = n_r × n_c array with every entry set to −1
 4:     EnumerateRecursive(A, 0, p_start)
 5:     return R
 6: end procedure
 7:
 8: procedure EnumerateRecursive(A, j, p)
 9:     if j ≥ n_c then
10:         return
11:     end if
12:     for q = [q_1, ..., q_{n_r}]^T corresponding to A_{·,j} do
13:         p_new = Extend(p, q)
14:         if p_new is permutation and ∄ p′ ∈ [p]_{∼w} : p′ ≺ p then
15:             if p_new is completely defined then
16:                 R ← R ∪ {p_new}
17:             else
18:                 return EnumerateRecursive(A, j + 1, p_new)
19:             end if
20:         end if
21:     end for
22: end procedure
23:
24: procedure Extend(p, q)
25:     for j ∈ N_{≤n_c} do
26:         if p_{·,j} = [−1, ..., −1]^T then
27:             p_{·,j} ← q
28:             return p
29:         end if
30:     end for
31: end procedure


Let now $k, k'$ be given with $\mathrm{Block}(k) = \mathrm{Block}(k')$ and $\mathrm{Index}(k), \mathrm{Index}(k')$ not necessarily in the same $T_l$. From the property that $G$ is a strongly connected directed graph, we obtain that there exist $k^{(0)}, \dots, k^{(t)}$ such that $k = k^{(0)}$, $k' = k^{(t)}$, $\mathrm{Block}(k^{(i)}) = \mathrm{Block}(k^{(j)})$ for all $i, j \le t$, and

$$\forall i \in \{0, \dots, t-1\}\ \exists l' : \mathrm{Index}(k^{(i)}), \mathrm{Index}(k^{(i+1)}) \in T_{l'} .$$

But then, $\mathrm{Block}(\vartheta(k^{(i)}))$ must be the same for all $k^{(i)}$, and in particular,

$$\mathrm{Block}(\vartheta(k)) = \mathrm{Block}(\vartheta(k')) .$$

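Example 4.2. Let

$$M = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix}$$

be the MixColumns matrix applied in Midori. Then, the directed graph $G$ with $n_r$ vertices having adjacency matrix $M$ can be given as

[Figure (diagram omitted): the graph on the vertices 1, 2, 3, 4 in which every two distinct vertices are joined by edges in both directions.]

which is a strongly connected directed graph.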

Corollary 4.1. For the Midori MixColumns matrix $M$, the notion of weak $M$-equivalence coincides with the notion of $M$-equivalence.
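The sufficient condition of Proposition 4.2 is easy to test for a concrete binary matrix. The following sketch, with illustrative names, checks strong connectivity by a depth-first search from one vertex in the graph and in its reverse; it is applied to the Midori matrix of Example 4.2 and, for contrast, to the identity matrix.

```python
def is_strongly_connected(M):
    """Check strong connectivity of the directed graph with adjacency matrix M."""
    n = len(M)

    def reachable(start, adjacency):
        seen = {start}
        stack = [start]
        while stack:
            v = stack.pop()
            for w in range(n):
                if adjacency[v][w] and w not in seen:
                    seen.add(w)
                    stack.append(w)
        return seen

    # A graph is strongly connected iff every vertex is reachable from
    # vertex 0 both in the graph and in the graph with all edges reversed.
    transpose = [[M[j][i] for j in range(n)] for i in range(n)]
    return len(reachable(0, M)) == n and len(reachable(0, transpose)) == n

MIDORI = [
    [0, 1, 1, 1],
    [1, 0, 1, 1],
    [1, 1, 0, 1],
    [1, 1, 1, 0],
]
IDENTITY = [[int(i == j) for j in range(4)] for i in range(4)]

print(is_strongly_connected(MIDORI))    # True
print(is_strongly_connected(IDENTITY))  # False
```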

4.3 Case Study – The Best Word Permutations for Midori

Midori operates on an $n_r \times n_c$ state with $n_r = n_c = 4$, using a word size of $s = 4$ for the 64-bit block-size version and $s = 8$ for the 128-bit version, respectively. For such state dimensions, there are 501 possible structure matrices up to equivalence. The Midori MixColumns operation $\mathrm{Mix}_M$ has the useful property that $\mathrm{Permute}_\varphi$ commutes with $\mathrm{Mix}_M$ for all $24^4$ possible permutations $\varphi \in \mathcal{P}_{\updownarrow}$.
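The count of structure matrices can be reproduced by direct enumeration. The sketch below, with illustrative function names, generates all 4 × 4 matrices of non-negative integers whose rows and columns sum to 4, maps each one to the smallest matrix obtainable by a simultaneous row and column permutation, and counts the distinct results.

```python
from itertools import permutations
from math import factorial

def compositions(total, parts):
    """All tuples of `parts` non-negative integers summing to `total`."""
    if parts == 1:
        yield (total,)
        return
    for first in range(total + 1):
        for rest in compositions(total - first, parts - 1):
            yield (first,) + rest

def valid_structure_matrices(nr, nc):
    """All nc x nc non-negative integer matrices with row and column sums nr."""
    rows = list(compositions(nr, nc))

    def extend(partial, col_sums):
        if len(partial) == nc:
            if all(s == nr for s in col_sums):
                yield tuple(partial)
            return
        for row in rows:
            new_sums = tuple(s + x for s, x in zip(col_sums, row))
            if all(s <= nr for s in new_sums):
                yield from extend(partial + [row], new_sums)

    yield from extend([], (0,) * nc)

def conjugate(A, sigma):
    """Return A^sigma with (A^sigma)[i][j] = A[sigma[i]][sigma[j]]."""
    n = len(A)
    return tuple(tuple(A[sigma[i]][sigma[j]] for j in range(n)) for i in range(n))

def canonical(A):
    return min(conjugate(A, sigma) for sigma in permutations(range(len(A))))

def count_classes(nr, nc):
    return len({canonical(A) for A in valid_structure_matrices(nr, nc)})

print(count_classes(4, 4))   # number of structure matrices up to equivalence
print(factorial(4) ** 4)     # number of permutations acting within the columns
```

The first number agrees with the 501 classes stated above, and the second is $24^4 = 331{,}776$.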

Applying Algorithm 4.1 can be done efficiently and thus, all word permutations can be enumerated up to $M$-equivalence. One finally obtains $3{,}413{,}774 \approx 2^{21.7}$ distinct equivalence classes. Out of those, 14,022 permutations correspond to the all-1 structure matrix, i.e.,

\[ A = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \end{pmatrix} . \]

Note that permutations having this particular structure matrix distribute the words within a column to all different columns. Intuitively, those permutations should lead to the highest number of active S-boxes.

For each of the 3,413,774 distinct word permutations $p$, we now want to evaluate the cryptographic properties of the corresponding cipher that is obtained by substituting the word permutation of Midori by $p$. In particular, we would like to compute an exact lower bound on the minimum number of active S-boxes.

4.3.1 Computing the Minimum Number of Active S-boxes

In order to find the exact lower bounds on the minimum number of active S-boxes for all of the $2^{21.7}$ candidates, we applied Matsui's algorithm [Mat95] (see Algorithm 2.2). In particular, the MixColumns matrix of Midori has the interesting property that only a very limited number of branching transitions are possible. For instance, 2 active words in a column will never lead to 3 active words after applying MixColumns. Figure 4.2 shows all 51 possible MixColumns transitions for a single column.
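The count of 51 transitions can be reproduced by brute force. The sketch below assumes that the binary Midori matrix acts word-wise on a column of four $s$-bit words, so that each output word is the XOR of the other three input words; the function names are hypothetical and the word size $s = 4$ is the default chosen here.

```python
from itertools import product

def midori_mix_column(x):
    """Apply the binary Midori matrix to a column of four s-bit words:
    each output word is the XOR of the three other input words."""
    total = x[0] ^ x[1] ^ x[2] ^ x[3]
    return tuple(total ^ xi for xi in x)

def activity(x):
    """Activity pattern of a column: 1 marks a nonzero (active) word."""
    return tuple(int(w != 0) for w in x)

def column_transitions(s=4):
    """Collect all (input pattern, output pattern) pairs over nonzero columns."""
    transitions = set()
    for x in product(range(1 << s), repeat=4):
        if any(x):
            transitions.add((activity(x), activity(midori_mix_column(x))))
    return transitions

TRANSITIONS = column_transitions()
# Pairs (number of active input words, number of active output words).
WEIGHT_PAIRS = {(sum(a), sum(b)) for a, b in TRANSITIONS}
```

The set `TRANSITIONS` contains 51 pattern pairs, and `WEIGHT_PAIRS` contains neither `(2, 3)` nor `(3, 2)`, in line with the observation that 2 active words never lead to 3 active words.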

This leads to a very efficient running time of Matsui's algorithm, which allows us to compute the bounds for the permutations for up to 40 rounds within a few days on a CPU cluster.¹ Our most interesting observation is that the Midori word permutation is in fact not optimal for every number of rounds. For instance, there are four permutations up to equivalence that guarantee 44 active S-boxes for 9 rounds, while the permutation used in Midori only guarantees 41. Furthermore, up to 40 rounds, the Midori word permutation is never optimal from 13 rounds onwards. Instead, there are two alternative permutations that are optimal for most numbers of rounds up to 40. Interestingly, there does not exist a word permutation that is optimal for all numbers of rounds simultaneously. Some optimal permutations are listed in Table 4.1.

In the case of Midori, the designers only looked at a subset of word permutations, denoted $S_{\mathrm{opt}}$, which they called “optimal”, by first filtering all row-based permutations according to Condition 1 and then applying a column permutation for which Condition 2 or Condition 3 holds. We recall those conditions as stated in [BBI+15, pp. 15-16].

¹ If a permutation reached only 40 or fewer active S-boxes over 10 rounds, the computation of the bounds for more rounds was aborted. The overall running time for all of the $2^{21.7}$ permutations was roughly 1600 CPU days.


Figure 4.2: This figure shows all of the 51 possible (non-trivial) transition patterns of the MixColumns matrix in Midori.

Table 4.1: Some classes of permutations that, under the $\mathrm{Mix}_M$ operation of Midori, lead to optimal bounds on the number of active S-boxes. All of the permutations have the all-1 structure matrix. An optimal bound for a particular number of rounds is emphasized in red. Here, optimal refers to the best bound over all permutations that have more than 40 active S-boxes over 10 rounds. The first line represents the equivalence class of the Midori permutation.

                                                 Rounds
[p]∼                                                5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40
[ 1  5  9 13  6 10  4 14 15  2  7 12 11  8 16  3]  23  30  35  38  41  50  57  62  67  72  75  84  89  94 101 106 109 116 121 128 133 138 143 148 155 160 165 170 175 182 187 192 197 202 209 214
[ 2  5  9 13  3  7 14 10 11  6 15  1 16  4 12  8]  23  30  35  38  41  50  57  62  67  72  75  84  89  94 101 106 109 116 121 128 133 138 143 148 155 160 165 170 175 182 187 192 197 202 209 214
[ 1  5  9 13  3  7 10 14  8 15  4 12  6 16 11  2]  18  24  30  38  44  48  52  56  60  70  76  80  84  88  92  96 106 112 116 120 124 128 132 142 148 152 156 160 164 168 178 184 188 192 196 200
[ 1  5  9 13  3  7 10 14 11  4  8 15 12  2  6 16]  18  24  30  38  44  48  52  56  60  70  76  80  84  88  92  96 106 112 116 120 124 128 132 142 148 152 156 160 164 168 178 184 188 192 196 200
[ 2  5  9 13  3  7 14 10  6 11  1 15  4 16  8 12]  18  24  30  38  44  48  52  56  60  70  76  80  84  88  92  96 106 112 116 120 124 128 132 142 148 152 156 160 164 168 178 184 188 192 196 200
[ 2  5  9 13  6 10  3 14  1 15  7 11  4 16  8 12]  18  24  30  38  44  48  52  56  60  70  76  80  84  88  92  96 106 112 116 120 124 128 132 142 148 152 156 160 164 168 178 184 188 192 196 200
[ 1  5  9 13  3  7 10 14 15  8  4 11 16  6  2 12]  20  28  34  38  43  48  54  62  68  72  76  82  88  96 100 104 110 116 122 128 132 138 144 150 156 160 166 172 178 184 188 194 200 206 212 216
[ 2  5  9 13  6  3 10 14 15  8  4 11 12 16  7  1]  22  28  32  38  41  48  54  60  68  72  75  82  88  95 100 104 109 116 120 128 132 136 143 148 152 160 164 168 176 180 184 192 196 200 208 212
[ 2  5  9 13  6 10  1 14 15 11  4  8  3 16  7 12]  22  28  32  38  41  48  54  60  68  72  75  82  88  95 100 104 109 116 120 128 132 136 143 148 152 160 164 168 176 180 184 192 196 200 208 212
[ 1  5  9 13  3  6 10 14  8  2 11 15 12 16  7  4]  20  28  32  38  41  50  55  60  67  74  77  83  91  94 101 107 111 118 125 130 137 142 147 154 159 164 171 178 181 188 195 200 206 212 217 224
[ 1  5  9 13  3  6 10 14  8  2 11 15 16 12  4  7]  20  28  32  38  41  50  55  60  67  74  77  83  91  94 101 107 111 118 125 130 137 142 147 154 159 164 171 178 181 188 195 200 206 212 217 224
[ 1  5  9 13  3  7 10 14  8  2 11 15 12 16  6  4]  22  28  33  38  41  48  55  62  67  74  77  84  90  96 102 108 113 120 125 130 136 142 148 154 160 166 170 176 182 188 195 200 206 210 216 222
[ 1  5  9 13  3  7 14 10  6  2 11 15 16 12  4  8]  22  28  33  38  41  48  55  62  67  74  77  84  90  96 102 108 113 120 125 130 136 142 148 154 160 166 170 176 182 188 195 200 206 210 216 222
[ 1  5  9 13  6 10  2 14  8 11  3 15 12 16  4  7]  22  28  32  38  41  48  56  62  67  74  77  84  90  95 101 108 113 118 124 130 137 144 149 154 160 166 173 178 184 190 196 201 207 214 220 226
[ 1  5  9 13  6 10  2 14 15 11  3  7  8  4 16 12]  22  28  32  38  41  48  56  62  67  74  77  84  90  95 101 108 113 118 124 130 137 144 149 154 160 166 173 178 184 190 196 201 207 214 220 226
[ 1  5  9 13  6  3 10 14  8 15  2 11 16 12  4  7]  20  28  32  38  43  48  55  60  67  72  79  84  89  96 101 108 114 120 125 130 137 144 148 155 161 166 171 178 183 190 197 201 207 212 219 225
[ 1  5  9 13  6  3 10 14 11  2 15  8 12 16  7  4]  20  28  32  38  43  48  55  60  67  72  79  84  89  96 101 108 114 120 125 130 137 144 148 155 161 166 171 178 183 190 197 201 207 212 219 225
[ 1  5  9 13  6 10  2 14 15  4  7 11  8 16 12  3]  22  26  31  38  43  48  53  60  65  72  79  84  87  96 101 108 113 120 123 130 137 141 147 154 160 166 172 177 183 188 195 201 207 212 219 224
[ 1  5  9 13  6 10  2 14 15  7  3 11  8  4 16 12]  22  28  33  38  43  50  55  62  67  72  79  84  90  96 101 108 114 120 126 130 136 142 148 154 160 166 172 176 182 188 194 200 206 212 218 222
[ 1  5  9 13  6 10  2 14 15 11  7  3  8 16 12  4]  22  28  33  38  43  50  55  62  67  72  79  84  90  96 101 108 114 120 126 130 136 142 148 154 160 166 172 176 182 188 194 200 206 212 218 222
[ 1  5  9 13  6 10  3 14 15  4  8 12  7 16  2 11]  22  28  33  38  43  50  57  62  67  72  79  86  91  96 101 108 115 120 125 132 137 144 150 156 161 166 173 180 185 190 195 202 209 214 221 226
[ 1  5  9 13  6 10  4 14  8 15  2 12 16 11  3  7]  22  28  33  38  43  50  57  62  67  72  79  86  91  96 101 108 115 120 125 132 137 144 150 156 161 166 173 180 185 190 195 202 209 214 221 226


• Condition 1: After applying a cell-permutation once and twice, each input cell in a column is mapped into a cell in the different column.

• Condition 2: After applying a cell-permutation twice and twice inversely, each input cell in a column is mapped into a cell in the same row.

• Condition 3: After applying a cell-permutation once and three times inversely, each input cell in a column is mapped into a cell in the same row.

For our optimal word permutations for 9 rounds, we checked whether any permutation in their equivalence classes is a member of $S_{\mathrm{opt}}$. This is not the case, which shows that these conditions are neither strictly necessary nor sufficient to maximize the number of active S-boxes.


In this chapter, we have seen that it is feasible to classify all word permutations for some lightweight AES-like ciphers like Midori and to find the optimal word permutations with respect to the minimum number of active S-boxes. We showed how the full search space can be reduced by classifying all the word permutations up to a reasonable notion of equivalence.

We provided an efficient algorithm for finding all those equivalence classes and then determined the exact bound on the minimum number of active S-boxes using Matsui's approach. This demonstrates that, for certain designs including Midori, mCrypton and Mantis, it is feasible to cover all choices for the word permutation. We further provided several permutations which outperform the original word permutation used in Midori. For future work, it would be interesting to gain a better understanding of how optimal word permutations could be generated, e.g., in a similar way as the Midori designers generated the permutations from Conditions 1–3 above. We emphasize that this seems to be a very difficult problem.

Overall, we think the methods presented in this chapter will be particularly useful for future designs, as they allow the designer to explore the whole design space for word permutations. It would also be interesting to analyze other AES-like ciphers, for instance Skinny. However, due to the specific $\mathrm{Mix}_M$ operation, the classification algorithm would lead to more classes that have to be handled with Matsui's algorithm. On the other hand, Matsui's algorithm for computing the bounds would be more efficient for the $\mathrm{Mix}_M$ operation used in Skinny due to the even more limited possible MixColumns transitions.

Another topic for further research would be to study in which scenarios MILP is preferable over Matsui's approach in terms of running time for computing bounds on the number of active S-boxes.


Chapter 5

The Tweakable Block Ciphers Skinny and Mantis

The two designs Skinny and Mantis, described in Section 5.3 and Section 5.4, were previously published in [BJK+16a], which is joint work with Jérémy Jean, Stefan Kölbl, Gregor Leander, Amir Moradi, Thomas Peyrin, Yu Sasaki, Pascal Sasdrich and Siang Meng Sim.¹ The author contributed to the design of the Skinny family of block ciphers, with a focus on the design of the linear layer and the choice of the tweakey permutation for maximizing the minimum number of active S-boxes in the related-tweakey (TK1) setting. Further, he contributed to the design of the low-latency cipher Mantis and derived the dedicated MILP model for computing the minimum number of active S-boxes in the related-tweak setting.

5.1 Introduction

In 2013, the National Security Agency (NSA) presented the Simon family of lightweight block ciphers [BSS+13]. The ciphers of that family consist of a very lightweight round function and offer extremely good performance on a variety of platforms. However, due to its innovative design, standard security arguments (e.g., those according to the wide-trail strategy) do not apply and the security analysis with regard to differential and linear attacks is not straightforward. Unfortunately, the designers did not provide any security analysis in the design document and all the cryptanalysis was basically conducted by external researchers. For instance, in [KLT15], the authors derive bounds on the probability of differential trails (resp. square correlation of linear trails) based on SAT/SMT solvers.

¹ The original article published by Springer-Verlag is available at DOI: 10.1007/978-3-662-53008-5_5 (© IACR 2016). A full version is available at eprint.iacr.org/2016/660. We omit many parts here, especially those that are not related to the author's contribution.


In the more complex related-key adversary model, a model in which the adversary is allowed to query encryptions under different related keys, no such bounds are known.

Although no significant weaknesses of Simon have been published to date, it would be beneficial to have an alternative cipher at hand that competes with the performance of Simon, while additionally providing strong security arguments, even in the related-key model.

Outline of this Chapter

We first give a brief introduction to tweakable block ciphers and their formal security notions. Basically, a tweakable block cipher receives an additional (public) input, called the tweak, that serves as a parameter for achieving variability. We also explain the related-key adversary model for (classical) block ciphers. We then step back from the formal security notions and consider adversaries that mount the more specific related-key differential attack. This type of attack allows the adversary to insert differences not only within the plaintext, but also within the key. Such attacks are by far the most important ones when arguing about the resistance against related-key attacks. Similar to differential attacks in the single-key model, there are standard arguments used by block cipher designers for providing evidence on the resistance against related-key differential attacks. We then explain the TWEAKEY framework, which was introduced as a unification of tweakable block cipher designs and the design of classical block ciphers resistant against related-key attacks.

In Section 5.3, we present Skinny, a family of lightweight tweakable block ciphers. It comes in different versions to support block lengths of 64 and 128 bits and different key/tweak sizes. The design according to the TWEAKEY framework allows the usage of Skinny as a flexible tweakable block cipher. Skinny is a design from academia whose goal is to compete with the NSA design Simon in terms of hardware/software performance, while in addition providing easy and strong security arguments on the resistance against differential and linear attacks. In comparison to Simon (which does not allow a tweak input), we also consider the related-key/related-tweak model in the security analysis. To reach our goal, Skinny uses well-understood design principles. All components are optimized for performance, with many parts that are not strictly necessary for security being removed. After giving the specification of the Skinny family and the motivation behind the design choices, we explain the mixed-integer linear programming (MILP) approach for evaluating the security against differential and linear attacks in more detail. We then conclude the section by mentioning the best external cryptanalysis of Skinny so far.

In Section 5.4, we explain the lightweight tweakable block cipher Mantis, a design optimized for low-latency applications such as memory encryption [HT14]. Especially for the use case of memory encryption, the additional tweak input is beneficial as the memory address can be defined as the tweak. Such a low-latency block cipher should allow for a fast execution within a single clock cycle and the overhead for additionally implementing decryption should be quite low. The design is based on the low-latency (non-tweakable) block cipher Prince and the block cipher Midori. After giving the specification and the motivation behind the design choices, we explain how the security was evaluated using MILP and mention the best external cryptanalysis so far.

5.2 Tweakable Block Ciphers

The concept of a block cipher that includes a public parameter for achieving variability goes back to the design of the Hasty Pudding Cipher [Sch98]. This was later formalized in the notion of a tweakable block cipher [LRW02, LRW11]. Formally, a tweakable block cipher can be defined as a family of block ciphers parametrized by a public parameter, called the tweak.

Definition 5.1 (Tweakable Block Cipher). Let $n, \kappa, \tau \in \mathbb{N}$. An $(n, \kappa, \tau)$-tweakable block cipher is a function

\[ E : \mathbb{F}_2^n \times \mathbb{F}_2^\kappa \times \mathbb{F}_2^\tau \to \mathbb{F}_2^n \]

with the property that, for each $h \in \mathbb{F}_2^\tau$, the projection ${}^{h}E := E(\cdot, \cdot, h)$ is a block cipher. Thereby, $h$ is called the tweak. For a key $k \in \mathbb{F}_2^\kappa$, we denote by $E_k$ the projection $E(\cdot, k, \cdot)$ and refer to it as a keyed instance of the tweakable block cipher $E$. By ${}^{h}E_k := E(\cdot, k, h)$, we denote the keyed instance of $E$ with key $k$ and tweak $h$.

There seems to be no substantial difference between the definition of an $(n, \kappa, \tau)$-tweakable block cipher and the definition of an $(n, \kappa + \tau)$-block cipher, as a key/tweak pair $(k, h)$ could simply be considered as one element $k \| h \in \mathbb{F}_2^{\kappa+\tau}$, serving as the key. The main point of separating the notion of a tweakable block cipher from that of a classical block cipher is that the key is kept secret and the tweak is assumed to be public information that serves as a parameter for achieving variability of the actual instance. The motivation for allowing this additional variability, as outlined in [LRW11], is that variability is needed at the mode-of-operation level. For instance, when we recall the example of the CTR mode (Equation 2.1), a counter is used for varying the encryption functions in each block. The authors suggested that the source of variability should be directly incorporated in the block cipher itself (as an example of a mode of operation, each block could be encrypted with the same tweakable block cipher and different counters are incorporated as the tweaks). Such a tweakable block cipher should then be designed in a way that makes changing the tweak more efficient than changing the key.

The formal security notion of a tweakable block cipher has to model the tweak as public information. More precisely, for a key $k$ chosen uniformly at random, the keyed instance $E_k$ should be indistinguishable from a family of permutations chosen independently and uniformly at random. This is formalized in the next definition. Let us define $\mathrm{Perm}_{n,\tau}$ as the set of all $(n, \tau)$-block ciphers, i.e., the set of all families of $2^\tau$ permutations on $\mathbb{F}_2^n$ that are parametrized by $h \in \mathbb{F}_2^\tau$.


Definition 5.2 (Pseudorandomness of a Tweakable Block Cipher (Definition 3 in [LRW11])). Let $E$ be an $(n, \kappa, \tau)$-tweakable block cipher. Let $A_{q,t}$ be a $(q, t)$-adversary with oracle access to an element of $\mathrm{Perm}_{n,\tau}$. The TPRP advantage of $A_{q,t}$ against $E$ is defined as

\[ \mathrm{Adv}^{\mathrm{TPRP\text{-}CPA}}_{E}(A_{q,t}) := \mathrm{Prob}_{k \xleftarrow{\$} \mathbb{F}_2^\kappa}\left(A_{q,t}^{E_k} \Rightarrow 1\right) - \mathrm{Prob}_{\Pi \xleftarrow{\$} \mathrm{Perm}_{n,\tau}}\left(A_{q,t}^{\Pi} \Rightarrow 1\right) . \]

Here, $\Pi \xleftarrow{\$} \mathrm{Perm}_{n,\tau}$ denotes that $\Pi$ is chosen as a family of $2^\tau$ independent, uniformly random permutations on $\mathbb{F}_2^n$. Thus, the probabilities are defined over the uniform choices of $k$ and $\Pi$ and over the random choices that the probabilistic adversary $A_{q,t}$ makes. Similar to the notion of a pseudorandom permutation for a block cipher, one considers $E$ to be secure (also called chosen-tweak secure) if, for reasonable restrictions on the computational resources $q, t$,

\[ \max_{A_{q,t}} \mathrm{Adv}^{\mathrm{TPRP\text{-}CPA}}_{E}(A_{q,t}) \le \varepsilon \]

for a sufficiently small $\varepsilon$. One can moreover consider CCA adversaries that, as in the notion of a strong pseudorandom permutation, also have oracle access to the inverse permutations (see [LRW11, Definition 4]).

Examples of existing tweakable block ciphers in the literature can be divided into two classes. The first class constructs a tweakable block cipher from underlying primitives (e.g., classical block ciphers), see for instance the LRW construction [LRW02] or the XE and XEX construction [Rog04]. The security of the tweakable block cipher is then proven by reducing to the security of the underlying primitive. The second class contains tweakable block ciphers designed from scratch, i.e., block cipher designs that directly support the tweak input. Examples include the Hasty Pudding Cipher [Sch98], the ciphers Deoxys-BC, Joltik-BC and Kiasu-BC proposed along with the TWEAKEY framework [JNP14], or the low-latency tweakable block cipher QARMA [Ava17].

5.2.1 Related-Key Attacks

Related-key attacks [Bih94a, Bih94b] refer to a special kind of adversary model on block ciphers. In a nutshell, this model allows the adversary to query different instances of the cipher which are related in some way. The goal of the adversary is to distinguish the original keyed instance from a random permutation. Bellare and Kohno formalized the related-key adversary model in [BK03a] and introduced the notion of pseudorandomness with respect to related-key attacks. Formally, the adversary interacts with a special kind of oracle that exactly models the queries to related instances according to a previously defined relation.

Definition 5.3 (Related-Key Oracle [BK03a]). Let $E$ be an $(n, \kappa)$-block cipher. For a keyed instance $E_k$, a related-key oracle $E_{\mathrm{RK},k}$ is defined as an oracle which takes as arguments a function $\phi : \mathbb{F}_2^\kappa \to \mathbb{F}_2^\kappa$ and an element $x \in \mathbb{F}_2^n$, and returns the value of $E_{\phi(k)}(x)$ whenever queried at inputs $\phi$ and $x$.


In this context, $\phi$ is also called a related-key deriving function. The notion of pseudorandomness in the related-key model will restrict the adversary to oracle queries of the form $(\phi, x)$, where the related-key deriving function $\phi$ must belong to a previously defined set of allowed functions.

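Definition 5.4 (Pseudorandomness with Respect to Related-Key Attacks (Definition 1 in [BK03a])). Let $E$ be an $(n, \kappa)$-block cipher and let $\Phi = \{\phi_i : \mathbb{F}_2^\kappa \to \mathbb{F}_2^\kappa\}_i$ be a set of related-key deriving functions. Let $A_{q,t}$ be a $(q, t)$-adversary with oracle access to a related-key oracle and being restricted to queries of the form $(\phi, x)$, for $\phi \in \Phi$ and $x \in \mathbb{F}_2^n$. The $\Phi$-restricted related-key advantage of $A_{q,t}$ against $E$ is defined as

\[ \mathrm{Adv}^{\mathrm{PRP\text{-}RKA}}_{\Phi,E}(A_{q,t}) := \mathrm{Prob}_{k \xleftarrow{\$} \mathbb{F}_2^\kappa}\left(A_{q,t}^{E_{\mathrm{RK},k}} \Rightarrow 1\right) - \mathrm{Prob}_{\Pi \xleftarrow{\$} \mathrm{Perm}_{n,\kappa},\ k \xleftarrow{\$} \mathbb{F}_2^\kappa}\left(A_{q,t}^{\Pi_{\mathrm{RK},k}} \Rightarrow 1\right) . \]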

The probabilities are defined over the uniform choices of $k$ and $\Pi$ and over the random choices that the probabilistic adversary $A_{q,t}$ makes. Similar to all other security notions we have mentioned, one considers the block cipher $E$ secure against $\Phi$-restricted related-key attacks if, for reasonable restrictions on the computational resources $q, t$, the maximum $\Phi$-restricted related-key advantage over all $(q, t)$-adversaries is sufficiently small. Moreover, the above definition can easily be extended to adversaries that are given related-key oracle access to the inverse permutations as well (see [BK03b, Definition 8.1]).

Bellare and Kohno formally showed several impossibility results for achieving related-key security. In particular, the set $\Phi$ of allowed related-key deriving functions must be defined in a way that excludes trivial attacks.

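Example 5.1 (Proposition 4.1 in [BK03b]). Let $E$ be an $(n, \kappa)$-block cipher and let $c \in \mathbb{F}_2^\kappa$. Let $\Phi$ be a set of related-key deriving functions that contains the constant function $\phi_c : k \mapsto c$. We consider the adversary $A$ with oracle access to a related-key oracle $O_{\mathrm{RK},k}$ given in Algorithm 5.1 below. It only makes a single oracle query (for the function $\phi_c$ contained in $\Phi$) and has basically the running time of one block cipher call. Its advantage is

\[ \mathrm{Adv}^{\mathrm{PRP\text{-}RKA}}_{\Phi,E}(A) = \mathrm{Prob}_{x \xleftarrow{\$} \mathbb{F}_2^n,\ k \xleftarrow{\$} \mathbb{F}_2^\kappa}\left(E_{\phi_c(k)}(x) = E_c(x)\right) - \mathrm{Prob}_{\pi \xleftarrow{\$} \mathrm{Perm}_n,\ x \xleftarrow{\$} \mathbb{F}_2^n}\left(\pi(x) = E_c(x)\right) = 1 - \frac{1}{2^n} . \]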

Algorithm 5.1 Adversary A

Choose x ←$ F_2^n
Compute y ← O_{RK,k}(φ_c, x)
if y = E_c(x) then
    return 1
end if
return 0

This example illustrates that it is impossible to achieve resistance against all types of related-key attacks. The authors further showed impossibility results for more natural choices of $\Phi$. As a more important example of practically relevant related-key deriving functions, they studied the set

\[ \Phi^{\oplus}_{\kappa} := \{\phi : \mathbb{F}_2^\kappa \to \mathbb{F}_2^\kappa \mid \exists \iota \in \mathbb{F}_2^\kappa \text{ such that } \phi : k \mapsto k + \iota\} = \{\mathrm{Add}_\iota\}_{\iota \in \mathbb{F}_2^\kappa} \]

in further detail. Especially at the protocol or mode-of-operation level, related-key access with regard to related-key deriving functions in $\Phi^{\oplus}_{\kappa}$ might be a reasonable assumption on the power of the adversary, and most block cipher designs (especially those not designed for lightweight purposes) aim for security against $\Phi^{\oplus}_{\kappa}$-restricted related-key attacks. Moreover, Bellare and Kohno formally proved that a block cipher which is secure against $\Phi^{\oplus}_{\kappa}$-restricted related-key adversaries gives rise to a chosen-tweak secure tweakable block cipher. In particular, if one has a block cipher resistant against $\Phi^{\oplus}_{\kappa}$-restricted related-key attacks, one can construct a tweakable block cipher by just XOR-ing the tweak to the key.
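The behaviour of the adversary from Algorithm 5.1 can be simulated on a toy scale. The following sketch is not part of [BK03b]; it assumes a block length of 8 bits and replaces $E$ by a key-seeded shuffle, which is a stand-in without any security claim. All class and function names are hypothetical.

```python
import random

N_BITS = 8                      # toy block length n (assumption)
BLOCKS = 1 << N_BITS

def toy_cipher(key, x):
    """Toy (n, kappa)-block cipher: a key-seeded shuffle of all blocks.
    This is only a stand-in for E; it is not a secure cipher."""
    table = list(range(BLOCKS))
    random.Random(key).shuffle(table)
    return table[x]

class RealOracle:
    """Related-key oracle E_{RK,k}: answers (phi, x) with E_{phi(k)}(x)."""
    def __init__(self, key):
        self.key = key

    def query(self, phi, x):
        return toy_cipher(phi(self.key), x)

class IdealOracle:
    """Ideal counterpart: an independent random permutation per derived key."""
    def __init__(self, key, rng):
        self.key, self.rng, self.perms = key, rng, {}

    def query(self, phi, x):
        derived = phi(self.key)
        if derived not in self.perms:
            table = list(range(BLOCKS))
            self.rng.shuffle(table)
            self.perms[derived] = table
        return self.perms[derived][x]

def adversary(oracle, c, rng):
    """Algorithm 5.1: one query with the constant function phi_c."""
    x = rng.randrange(BLOCKS)
    y = oracle.query(lambda k: c, x)
    return 1 if y == toy_cipher(c, x) else 0

def estimate_advantage(trials=2000, c=0x2A, seed=1):
    """Return the empirical rates of output 1 in the real and ideal world."""
    rng = random.Random(seed)
    real = ideal = 0
    for _ in range(trials):
        k = rng.randrange(1 << 16)
        real += adversary(RealOracle(k), c, rng)
        ideal += adversary(IdealOracle(k, rng), c, rng)
    return real / trials, ideal / trials
```

In the real world the adversary always outputs 1, while in the ideal world it does so only when a random permutation happens to agree with $E_c$ at the queried point, which is expected with probability $2^{-n}$.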

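Theorem 5.1 (Theorem 7.1 in [BK03b]). Let $E$ be an $(n, \kappa)$-block cipher and let $\widetilde{E}$ be the $(n, \kappa, \kappa)$-tweakable block cipher defined by ${}^{h}\widetilde{E}_k = E_{k+h}$. For any adversary $A$ with oracle access to an element of $\mathrm{Perm}_{n,\kappa}$, one can construct a $\Phi^{\oplus}_{\kappa}$-restricted PRP-RKA-adversary $B$ against $E$, with roughly the same number of oracle queries and running time, such that

\[ \mathrm{Adv}^{\mathrm{TPRP\text{-}CPA}}_{\widetilde{E}}(A) \le \mathrm{Adv}^{\mathrm{PRP\text{-}RKA}}_{\Phi^{\oplus}_{\kappa},E}(B) . \]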

Related-Key Differential Attacks

When we consider related-key attacks for the set of related-key deriving functions $\Phi^{\oplus}_{\kappa}$, and especially aim for designing a cipher resistant against $\Phi^{\oplus}_{\kappa}$-restricted related-key attacks, a natural type of attack to consider is the related-key differential attack [KSW96]. In contrast to the differential attack in the single-key model, the adversary is not only allowed to insert differences in the plaintext, but also in the key.

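Definition 5.5 (Related-Key Differential). Let $E$ be an $(n, \kappa)$-block cipher. A differential $((\alpha, \iota), \beta)$ over $E$ with $\alpha, \beta \in \mathbb{F}_2^n$ and $\iota \in \mathbb{F}_2^\kappa$ is said to be a related-key differential over $E$.

One can give the related-key differential probability as

\[ \mathrm{Prob}\left((\alpha, \iota) \xrightarrow{E} \beta\right) = \frac{\left|\{(x, k) \in \mathbb{F}_2^n \times \mathbb{F}_2^\kappa \mid E_k(x) + E_{k+\iota}(x + \alpha) = \beta\}\right|}{2^{n+\kappa}} . \]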

If the adversary wants to distinguish a keyed instance $E_k$ of the cipher from a random permutation and is in possession of a related-key differential $(\alpha, \iota) \xrightarrow{E} \beta$ that holds with probability greater than $2^{-n}$, it would query the related-key oracle on inputs $(\mathrm{Add}_0 : k \mapsto k + 0, x)$ and $(\mathrm{Add}_\iota : k \mapsto k + \iota, x + \alpha)$ for many $x \in \mathbb{F}_2^n$ and check whether the XOR difference of the two oracle responses equals $\beta$ as often as one would expect by the differential probability.
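For very small parameters, the related-key differential probability can be evaluated by exhaustive counting. The sketch below assumes a hypothetical one-round cipher $E_k(x) = S(x + k)$ on 4-bit blocks; the S-box is the 4-bit S-box of PRESENT, chosen only as a convenient example permutation.

```python
# 4-bit S-box used only as an example permutation (the PRESENT S-box).
SBOX = [0xC, 0x5, 0x6, 0xB, 0x9, 0x0, 0xA, 0xD,
        0x3, 0xE, 0xF, 0x8, 0x4, 0x7, 0x1, 0x2]

def toy_cipher(k, x):
    """Hypothetical one-round (4, 4)-block cipher: E_k(x) = S(x + k)."""
    return SBOX[x ^ k]

def rk_differential_probability(alpha, iota, beta, n=4, kappa=4):
    """Count the pairs (x, k) with E_k(x) + E_{k+iota}(x + alpha) = beta
    and divide by 2^(n + kappa), following the formula for the
    related-key differential probability."""
    hits = sum(
        1
        for x in range(1 << n)
        for k in range(1 << kappa)
        if toy_cipher(k, x) ^ toy_cipher(k ^ iota, x ^ alpha) == beta
    )
    return hits / (1 << (n + kappa))
```

For this toy cipher, a key difference equal to the plaintext difference cancels before the S-box, so `rk_differential_probability(0x3, 0x3, 0x0)` equals 1, whereas the single-key differential with `iota = 0` and output difference 0 has probability 0. This illustrates why key differences give the adversary strictly more power.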

If $E$ is defined as a product cipher, then, as in the case of single-key differential attacks, a security argument on the resistance against related-key differential attacks is usually based on a single differential trail. In the related-key model, a differential trail not only fixes the particular output differences after every round, but also the differences of all the round keys. In particular, if $R_0, \dots, R_t$ denote the rounds of the product cipher (where $R_i : \mathbb{F}_2^n \times \mathbb{F}_2^{\kappa_i} \to \mathbb{F}_2^n$), as a security argument one computes an upper bound on the value of

\[ \mathrm{Prob}\left(\iota \xrightarrow{\mathrm{KeySchedule}} (\iota_0, \dots, \iota_t)\right) \prod_{i=0}^{t} \mathrm{Prob}\left((\alpha_i, \iota_i) \xrightarrow{R_i} \alpha_{i+1}\right) \tag{5.1} \]

over all $\iota \in \mathbb{F}_2^\kappa$, $\iota_i \in \mathbb{F}_2^{\kappa_i}$, and $\alpha_i \in \mathbb{F}_2^n$ with $(\iota, \alpha_0) \neq (0, 0)$. In the case of a $t$-round key-alternating cipher with unkeyed round functions $R_0, \dots, R_t$, Equation 5.1 reduces to

\[ \mathrm{Prob}\left(\iota \xrightarrow{\mathrm{KeySchedule}} (\iota_0, \dots, \iota_t)\right) \prod_{i=0}^{t} \mathrm{Prob}\left(\alpha_i \xrightarrow{R_i} \alpha_{i+1} + \iota_i\right) . \tag{5.2} \]

Note that, whenever we fix $\iota = 0$ in Equation 5.2, this is exactly the security argument on the resistance against single-key differential attacks. By additionally considering all possible differences $\iota$ in the key, we include much more powerful adversaries.

As a standard approach, the designer of a key-alternating cipher aiming for resistance against related-key differential attacks would evaluate the number of rounds $t$ for which Equation 5.2 can be upper bounded by $2^{-n}$, considering all differences in the round inputs/outputs and all differences in the round keys (with the restriction $(\alpha_0, \iota) \neq (0, 0)$ in order to avoid a trivial differential trail), and specify the actual number of rounds of the cipher as $t + t_m$ for a reasonable security margin $t_m$. As for single-key differential attacks, several automatic search tools exist for finding the best differential trails and for deriving those upper bounds. For instance, for word-oriented ciphers, Biryukov and Nikolić developed a tool for finding related-key differential trails based on Matsui's approach [BN10]. A search algorithm based on MILP is presented in [SHW+14b].

We would like to mention that related-key attacks (based on differential cryptanalysis) exist, for instance, on AES-192 and AES-256 [BKN09, BK09].

The Wide-Trail Approach

When the rounds of the cipher are defined as $R_i = L \circ S$ for a bijective linear layer $L$ and for $S$ consisting of a parallel application of a bijective $s$-bit S-box $S_b$, the probability given in Equation 5.2 can be upper bounded by computing the minimum number of active S-boxes, similarly to what was described in Section 2.4 for the single-key model. In particular,

\[ \prod_{i=0}^{t} \mathrm{Prob}\left(\alpha_i \xrightarrow{R_i} \alpha_{i+1} + \iota_i\right) = \prod_{i=0}^{t} \mathrm{Prob}\left(\alpha_i \xrightarrow{S} L^{-1}(\alpha_{i+1}) + L^{-1}(\iota_i)\right) \le p_{S_b}^{\mathrm{wt}_s(C)} , \]

where $C = (\alpha_0, \dots, \alpha_t)$ and $\mathrm{wt}_s(C) = \sum_{i=0}^{t-1} w_s(\alpha_i)$.
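As a small numerical illustration of how such a bound is used, the following sketch computes how many active S-boxes are needed before the bound drops to $2^{-n}$. The example values (a 64-bit block and an S-box with maximum differential probability $2^{-2}$) are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math

def min_active_sboxes_needed(block_bits, sbox_log2_prob):
    """Smallest number a of active S-boxes with p^a <= 2^(-block_bits),
    where p = 2^(sbox_log2_prob) is the maximum differential probability
    of the S-box (sbox_log2_prob is negative)."""
    return math.ceil(block_bits / -sbox_log2_prob)

def trail_bound_log2(active_sboxes, sbox_log2_prob):
    """log2 of the upper bound p^(number of active S-boxes)."""
    return active_sboxes * sbox_log2_prob
```

Under these assumptions, `min_active_sboxes_needed(64, -2)` evaluates to 32, and a trail with 41 active S-boxes is bounded by `trail_bound_log2(41, -2)`, i.e., by $2^{-82}$.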

The difference to the single-key model is that one has to take care of the addition of the differences in the key input, i.e., valid trails are not defined as those for which, for all $i \in \{0, \dots, t\}$, $\mathrm{Prob}(\alpha_i \xrightarrow{R_i} \alpha_{i+1}) \neq 0$, but as those for which $\mathrm{Prob}(\alpha_i \xrightarrow{R_i} \alpha_{i+1} + \iota_i) \neq 0$, where $(\iota_0, \dots, \iota_t)$ defines a possible output difference over the key schedule. If the key schedule is an affine function, the differences $\iota_0, \dots, \iota_t$ are completely determined by the initial key-input difference $\iota$. In other words, for each $\iota \in \mathbb{F}_2^\kappa$, there is only one tuple $(\iota_0, \dots, \iota_t)$ for which the differential probability $\mathrm{Prob}(\iota \xrightarrow{\mathrm{KeySchedule}} (\iota_0, \dots, \iota_t)) \neq 0$. In fact, that differential probability is equal to one.
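That an affine key schedule propagates key differences deterministically can be observed on a toy example. The key schedule below is entirely hypothetical (a 16-bit key, an arbitrary F_2-linear update and arbitrary round constants) and is not the key schedule of any cipher discussed here.

```python
KAPPA = 16                      # toy key length in bits (assumption)
MASK = (1 << KAPPA) - 1
ROUND_CONSTANTS = [0x1234, 0xBEEF, 0x0F0F, 0xA5A5]   # arbitrary values

def rotl(v, r):
    """Rotate a KAPPA-bit value to the left by r positions (F_2-linear)."""
    return ((v << r) | (v >> (KAPPA - r))) & MASK

def key_schedule(k):
    """Hypothetical affine key schedule: the state is updated by a linear
    map and each round key is the state plus a round constant."""
    round_keys = []
    state = k
    for c in ROUND_CONSTANTS:
        state = rotl(state, 3) ^ (state >> 5)     # linear update
        round_keys.append(state ^ c)              # constant addition
    return round_keys

def round_key_differences(k, iota):
    """Differences (iota_0, ..., iota_t) produced by the key difference iota."""
    return [a ^ b for a, b in zip(key_schedule(k), key_schedule(k ^ iota))]
```

For a fixed `iota`, the list returned by `round_key_differences(k, iota)` is the same for every key `k`: the round constants cancel and only the linear part acts on the difference.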

For Skinny and Mantis, the key schedule is indeed affine (linear up to addition of round constants), which renders the computation of bounds on the minimum number of active S-boxes very efficient. After explaining those designs, we describe the MILP model for evaluating the bounds in Sections 5.3.3 and 5.4.5, respectively.

5.2.2 The TWEAKEY Framework

In [JNP14], the TWEAKEY framework was proposed in order to unify the design of block ciphers that resist related-key attacks and the design of tweakable block ciphers. A drawback of constructing a tweakable block cipher from a $\Phi^{\oplus}_{\kappa}$-restricted related-key secure block cipher by simply adding the tweak to the key, as given in Theorem 5.1, is that one loses the related-key security of the tweakable block cipher, i.e., changing the tweak $h$ to $h + \delta$ and the key $k$ to $k + \delta$ results in the same instance of the cipher. Instead, the idea of TWEAKEY is to consider the key and the tweak as basically the same type of input. In particular, the idea is to design an $(n, \kappa + \tau)$-block cipher $E$ and use it as an $(n, \kappa, \tau)$-tweakable block cipher $\widetilde{E}$ by setting ${}^{h}\widetilde{E}_k = E_{k \| h}$, i.e., parts of the $(\kappa + \tau)$-bit key are assumed to be a public tweak. This yields tweakable block cipher designs with a flexible tweak length. It was suggested that such a design should be of a key-alternating construction with a very efficient key schedule. Further, as security of $E$ against $\Phi^{\oplus}_{\kappa+\tau}$-restricted related-key attacks is necessary in order to employ it as such a flexible tweakable block cipher, the design should allow for arguments on the resistance against related-key differential attacks.
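The drawback of adding the tweak to the key can be made concrete with a toy cipher. In the sketch below, `toy_block_cipher` is a hypothetical 32-bit Feistel network built from SHA-256; it merely stands in for $E$ and carries no security claim. The two wrappers model the construction of Theorem 5.1 and the concatenation used in TWEAKEY, respectively.

```python
import hashlib

def toy_block_cipher(x, key, rounds=4):
    """Hypothetical 32-bit Feistel cipher keyed by a byte string.
    It only serves as a stand-in for E and is not meant to be secure."""
    left, right = x >> 16, x & 0xFFFF
    for r in range(rounds):
        data = key + bytes([r]) + right.to_bytes(2, "big")
        f = int.from_bytes(hashlib.sha256(data).digest()[:2], "big")
        left, right = right, left ^ f
    return (left << 16) | right

def xor_bytes(a, b):
    """Bytewise XOR of two byte strings of equal length."""
    return bytes(u ^ v for u, v in zip(a, b))

def tweakable_by_xor(x, k, h):
    """Construction of Theorem 5.1: the tweak is XORed to the key."""
    return toy_block_cipher(x, xor_bytes(k, h))

def tweakable_by_tweakey(x, k, h):
    """TWEAKEY idea: key and tweak are concatenated into one tweakey."""
    return toy_block_cipher(x, k + h)

k = bytes.fromhex("00112233")
h = bytes.fromhex("a0b0c0d0")
delta = bytes.fromhex("01000000")
```

With these values, `tweakable_by_xor` returns the same ciphertext for `(k, h)` and for `(k + delta, h + delta)`, because both pairs lead to the same underlying key. The concatenation-based variant treats the two pairs as different tweakeys.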

More precisely, the TWEAKEY framework specifies the key-scheduling algorithm, also called the tweakey schedule, as follows. From an initial input $tk \in \mathbb{F}_2^{\kappa+\tau}$, called the tweakey, the round keys to be added in the key-alternating construction are generated by the iteration of an update function $\mathrm{upd} : \mathbb{F}_2^{\kappa+\tau} \to \mathbb{F}_2^{\kappa+\tau}$, an extraction function $\mathrm{extr} : \mathbb{F}_2^{\kappa+\tau} \to \mathbb{F}_2^n$, and the addition of a round constant. In particular, the round key $k_r$ is obtained from $tk$ by²

\[ k_r = \mathrm{extr}(\mathrm{upd}^r(tk)) + c_r . \]

For AES-like ciphers, the authors presented the superposition tweakey (STK) construction as an efficient instance of the TWEAKEY framework that allows for an easier security analysis using automatic tools. In that construction, the length of the tweakey input $tk$ is a multiple of $n = s \cdot n_r \cdot n_c$. The STK tweakey schedule separates the tweakey input $tk$ into distinct AES-like $n_r \times n_c$ states with $s$-bit words. Then, $\mathrm{upd}$ operates as a permutation of the words of each state, followed by a transformation operating separately on each $s$-bit word of the states.³ The extraction function $\mathrm{extr}$ is simply defined as the $\mathbb{F}_2$-addition of each $s \cdot n_r \cdot n_c$-bit state. In the next section, we explain Skinny, which was designed as a family of lightweight block ciphers following the STK construction.
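The structure of an STK-style tweakey schedule can be sketched in a few lines. The word permutation `PERM` and the per-word map `word_transform` below are placeholders invented for this illustration; they are not the parameters of Skinny or of any cipher from [JNP14]. States are modelled as lists of 16 words.

```python
S_BITS = 4                                   # word size s (assumption)
WORD_MASK = (1 << S_BITS) - 1
# Hypothetical word permutation: new_state[i] = old_state[PERM[i]].
PERM = [(i + 5) % 16 for i in range(16)]

def word_transform(word, j):
    """Toy per-word F_2-linear map for state j: rotate the word by j bits."""
    j %= S_BITS
    return ((word << j) | (word >> (S_BITS - j))) & WORD_MASK

def upd(states):
    """One update: permute the words of each state, then transform each word."""
    return [[word_transform(st[PERM[i]], j) for i in range(16)]
            for j, st in enumerate(states)]

def extr(states):
    """Extraction: word-wise F_2-addition (XOR) of all states."""
    out = [0] * 16
    for st in states:
        out = [a ^ b for a, b in zip(out, st)]
    return out

def round_key(tk_states, r, constants):
    """k_r = extr(upd^r(tk)) + c_r, with c_r given as a list of 16 words."""
    states = tk_states
    for _ in range(r):
        states = upd(states)
    return [a ^ b for a, b in zip(extr(states), constants[r])]
```

With a single state and all-zero constants, `round_key([tk], 1, constants)` is simply the permuted tweakey. With two identical states, the round key for `r = 0` is zero, which hints at why the per-word transformation has to differ between the states.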

5.3 The Skinny Family of Lightweight (Tweakable) Block Ciphers

In the following, we interchangeable use the expressions key, tweak and tweakey,as they basically refer to the same type of input.

The Skinny family of lightweight tweakable block ciphers comes in six different versions, supporting block lengths of $n = 64$ and $n = 128$ and tweakey lengths of $\kappa \in \{n, 2n, 3n\}$, respectively. We denote the $(n, \kappa)$-version of the cipher by Skinny-n-κ. Like almost all block ciphers, Skinny is designed as a product cipher. In particular, all the versions of Skinny mentioned above can be given as

$$\text{Skinny-}n\text{-}\kappa \colon \mathbb{F}_2^{n} \times \mathbb{F}_2^{\kappa} \to \mathbb{F}_2^{n}, \quad (x, k) \mapsto R^{t}_{k_t} \circ \cdots \circ R^{1}_{k_1}(x) .$$

Thereby, the round keys $k_1, \dots, k_t \in \mathbb{F}_2^{n}$ are derived from the initial key $k$ by a key-scheduling algorithm $\mathrm{KeySchedule} \colon \mathbb{F}_2^{\kappa} \to (\mathbb{F}_2^{n})^{t}$. The actual definition of the rounds $R_i$, the key-scheduling algorithm, and the number of rounds $t$ depend on the particular Skinny version. We outline their specifications in the following.⁴

5.3.1 Specification

In all versions, the internal state is represented by a $4 \times 4$ array of words in $\mathbb{F}_2^{s}$, with $s = 4$ for $n = 64$, and $s = 8$ for $n = 128$. Therefore, we denote an internal state $x \in \mathbb{F}_2^{16 s}$ as

$$x = \begin{pmatrix} x_1 & x_5 & x_9 & x_{13} \\ x_2 & x_6 & x_{10} & x_{14} \\ x_3 & x_7 & x_{11} & x_{15} \\ x_4 & x_8 & x_{12} & x_{16} \end{pmatrix}, \quad x_i \in \mathbb{F}_2^{s} .$$

² In the TWEAKEY paper, the addition of the round constant was not part of the tweakey schedule. Instead, it was part of the cipher's round functions.

³ In the TWEAKEY paper, this transformation is a finite field multiplication with pre-defined elements $\alpha_i \in \mathbb{F}_{2^s}$.

⁴ In contrast to the original design document, we give a slightly different representation that includes the round constants within the key schedule.

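Note that the initial state is loaded in a row-wise manner, i.e., a plaintext $m = m_1 \| \dots \| m_{16}$ with $m_i \in \mathbb{F}_2^{s}$ is mapped to the cipher's initial state as

$$m \mapsto \begin{pmatrix} m_1 & m_2 & m_3 & m_4 \\ m_5 & m_6 & m_7 & m_8 \\ m_9 & m_{10} & m_{11} & m_{12} \\ m_{13} & m_{14} & m_{15} & m_{16} \end{pmatrix} .$$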

The design of the key schedule of the Skinny family follows the TWEAKEY framework, more precisely the STK construction. As already mentioned, each Skinny-n family naturally comes in three tweakey length versions, i.e., $n$, $2n$, or $3n$. Therefore, the tweakey state will be represented as a collection of $z$ distinct $4 \times 4$ arrays of words in $\mathbb{F}_2^{s}$. We denote these arrays $TK^{(1)}$ when $z = 1$, $TK^{(1)}$ and $TK^{(2)}$ when $z = 2$, and $TK^{(1)}$, $TK^{(2)}$ and $TK^{(3)}$ when $z = 3$. Moreover, by SK, we denote the single-key adversary model and by TK1, TK2 or TK3, we denote the adversary model in which the attacker can introduce differences in the particular tweakey states.

The cipher gets an initial key input $k = \xi_1 \| \xi_2 \| \dots \| \xi_{16z}$ with $\xi_i \in \mathbb{F}_2^{s}$. The initialization of the cipher's initial tweakey state is then done in a row-wise manner by simply setting, for all $j \in \{1, \dots, z\}$,

$$TK^{(j)} = \begin{pmatrix} \xi_{16(j-1)+1} & \xi_{16(j-1)+2} & \xi_{16(j-1)+3} & \xi_{16(j-1)+4} \\ \xi_{16(j-1)+5} & \xi_{16(j-1)+6} & \xi_{16(j-1)+7} & \xi_{16(j-1)+8} \\ \xi_{16(j-1)+9} & \xi_{16(j-1)+10} & \xi_{16(j-1)+11} & \xi_{16(j-1)+12} \\ \xi_{16(j-1)+13} & \xi_{16(j-1)+14} & \xi_{16(j-1)+15} & \xi_{16(j-1)+16} \end{pmatrix} .$$

Note that, whenever parts of the key material are dedicated to a public tweak, the user must ensure that the length of the actual secret key is always at least as large as the block length $n$.

The Round Function

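Table 5.1 specifies the number of rounds $t$ that are applied in the particular Skinny version. All of the rounds $R_i$ are defined by four operations, i.e., $\mathrm{S}_{Sb_s}$, $\mathrm{Permute}_p$, $\mathrm{Mix}_M$, and a round key addition, as

$$R_i \colon \mathbb{F}_2^{n} \times \mathbb{F}_2^{n} \to \mathbb{F}_2^{n}, \quad (x, k_i) \mapsto \mathrm{Mix}_M \circ \mathrm{Permute}_p \circ \mathrm{Add}_{k_i} \circ \mathrm{S}_{Sb_s}(x) .$$

Those operations are defined as follows, depending on the block length $n$.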

Table 5.1: Number of rounds t for Skinny-n-κ.

                     Tweakey length κ
Block length n       n         2n        3n
64                   t = 32    t = 36    t = 40
128                  t = 40    t = 48    t = 56

$\mathrm{S}_{Sb_s}$ (SubCells). For the versions with $n = 64$, the 4-bit S-box $Sb_4 \colon \mathbb{F}_2^{4} \to \mathbb{F}_2^{4}$ as given in Table 5.2 is applied to every word of the cipher's internal state. Its construction is similar to the S-box employed in the lightweight block cipher Piccolo [SIH+11].

Table 5.2: The 4-bit S-box Sb4 employed in the Skinny-64 versions.

x        0 1 2 3 4 5 6 7 8 9 A B C D E F
Sb4(x)   C 6 9 0 1 A 2 B 3 8 5 D 4 E 7 F

Note that $Sb_4$ can also be described by using four bit-wise NOR operations (i.e., $\overline{x_1 \vee x_2}$ for $x_1, x_2 \in \mathbb{F}_2$) and four bit-wise XOR operations (i.e., addition in $\mathbb{F}_2$). If $(x_3, x_2, x_1, x_0) \in \mathbb{F}_2^{4}$ represents the S-box input, one simply applies the transformation

$$(x_3, x_2, x_1, x_0) \mapsto \big( (x_3, x_2, x_1, x_0 + \overline{(x_3 \vee x_2)}) \lll 1 \big) ,$$

iterated four times, except that the last iteration omits the cyclic rotation by one bit.

For the versions with $n = 128$, the 8-bit S-box $Sb_8 \colon \mathbb{F}_2^{8} \to \mathbb{F}_2^{8}$ as given in Table 5.3 is applied to every word of the cipher's internal state. It is constructed in a similar way as $Sb_4$ described above. In particular, if $(x_7, x_6, \dots, x_0) \in \mathbb{F}_2^{8}$ represents the S-box input, $Sb_8$ applies the transformation

$$(x_7, x_6, \dots, x_0) \mapsto \big( x_2, x_1, x_7, x_6, x_4 + \overline{(x_7 \vee x_6)}, x_0 + \overline{(x_3 \vee x_2)}, x_3, x_5 \big) ,$$

which is iterated three times. Then, as a final step, the transformation

$$(x_7, x_6, \dots, x_0) \mapsto \big( x_7, x_6, x_5, x_4 + \overline{(x_7 \vee x_6)}, x_3, x_1, x_2, x_0 + \overline{(x_3 \vee x_2)} \big)$$

is applied.

$\mathrm{Add}_{k_i}$ applies the addition of the round key $k_i$ to the internal state $x$. How those round keys are derived from the initial tweakey by the tweakey schedule is described below.

Table 5.3: The S-box Sb8 used in the Skinny-128 versions. For each (hexadecimal) value of x and y, the table shows Sb8(x||y) as a hexadecimal value. For instance, Sb8(A4) = 19.

y

x 0 1 2 3 4 5 6 7 8 9 A B C D E F

0 65 4C 6A 42 4B 63 43 6B 55 75 5A 7A 53 73 5B 7B

1 35 8C 3A 81 89 33 80 3B 95 25 98 2A 90 23 99 2B

2 E5 CC E8 C1 C9 E0 C0 E9 D5 F5 D8 F8 D0 F0 D9 F9

3 A5 1C A8 12 1B A0 13 A9 05 B5 0A B8 03 B0 0B B9

4 32 88 3C 85 8D 34 84 3D 91 22 9C 2C 94 24 9D 2D

5 62 4A 6C 45 4D 64 44 6D 52 72 5C 7C 54 74 5D 7D

6 A1 1A AC 15 1D A4 14 AD 02 B1 0C BC 04 B4 0D BD

7 E1 C8 EC C5 CD E4 C4 ED D1 F1 DC FC D4 F4 DD FD

8 36 8E 38 82 8B 30 83 39 96 26 9A 28 93 20 9B 29

9 66 4E 68 41 49 60 40 69 56 76 58 78 50 70 59 79

A A6 1E AA 11 19 A3 10 AB 06 B6 08 BA 00 B3 09 BB

B E6 CE EA C2 CB E3 C3 EB D6 F6 DA FA D3 F3 DB FB

C 31 8A 3E 86 8F 37 87 3F 92 21 9E 2E 97 27 9F 2F

D 61 48 6E 46 4F 67 47 6F 51 71 5E 7E 57 77 5F 7F

E A2 18 AE 16 1F A7 17 AF 01 B2 0E BE 07 B7 0F BF

F E2 CA EE C6 CF E7 C7 EF D2 F2 DE FE D7 F7 DF FF

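$\mathrm{Permute}_p$ (ShiftRows) operates as a permutation of the words of the state. Similar to the AES, this operation cyclically rotates the words within the rows of the state by 0, 1, 2, and 3 positions, respectively. However, the rotations are to the right, instead of to the left. Formally,

$$\begin{pmatrix} x_1 & x_5 & x_9 & x_{13} \\ x_2 & x_6 & x_{10} & x_{14} \\ x_3 & x_7 & x_{11} & x_{15} \\ x_4 & x_8 & x_{12} & x_{16} \end{pmatrix} \mapsto \begin{pmatrix} x_1 & x_5 & x_9 & x_{13} \\ x_{14} & x_2 & x_6 & x_{10} \\ x_{11} & x_{15} & x_3 & x_7 \\ x_8 & x_{12} & x_{16} & x_4 \end{pmatrix} ,$$

which corresponds to the permutation of indices

$$p = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\ 1 & 14 & 11 & 8 & 5 & 2 & 15 & 12 & 9 & 6 & 3 & 16 & 13 & 10 & 7 & 4 \end{pmatrix} .$$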

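$\mathrm{Mix}_M$ (MixColumns). Each column of the state is multiplied by the matrix

$$M = \begin{pmatrix} 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \end{pmatrix} \in \mathrm{GL}_4(\mathbb{F}_{2^s}) .$$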

The final value of the internal state array (after the last $\mathrm{Mix}_M$ operation) then provides the ciphertext, with the state words being mapped to the ciphertext row by row. Test vectors for Skinny can be found in [BJK+16b]. Note that all components of the cipher have very simple inverses; thus, decryption can be described and implemented in a similar way to encryption.
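The iterative NOR/XOR descriptions of $Sb_4$ and $Sb_8$ given for the SubCells step can be checked against Tables 5.2 and 5.3. The following is a minimal sketch in Python, not taken from the design document; the function names `sbox4` and `sbox8` are ours, and words are represented as integers whose bit $i$ holds $x_i$.

```python
def sbox4(x: int) -> int:
    """Evaluate Sb4 on the nibble (x3, x2, x1, x0) by four NOR/XOR steps."""
    for step in range(4):
        x3, x2 = (x >> 3) & 1, (x >> 2) & 1
        x ^= 1 ^ (x3 | x2)                    # x0 <- x0 + NOR(x3, x2)
        if step < 3:                          # the last iteration omits the rotation
            x = ((x << 1) | (x >> 3)) & 0xF   # cyclic left rotation by one bit
    return x


def _nor_xor(b):
    """Apply x4 <- x4 + NOR(x7, x6) and x0 <- x0 + NOR(x3, x2); b[i] holds x_i."""
    b = list(b)
    b[4] ^= 1 ^ (b[7] | b[6])
    b[0] ^= 1 ^ (b[3] | b[2])
    return b


def sbox8(x: int) -> int:
    """Evaluate Sb8: three NOR/XOR-plus-permutation steps and one final step."""
    b = [(x >> i) & 1 for i in range(8)]
    for _ in range(3):
        b = _nor_xor(b)
        # (x7, ..., x0) <- (x2, x1, x7, x6, x4, x0, x3, x5), listed here from x0 up to x7
        b = [b[5], b[3], b[0], b[4], b[6], b[7], b[1], b[2]]
    b = _nor_xor(b)
    b[1], b[2] = b[2], b[1]                   # the final step only swaps x1 and x2
    return sum(bit << i for i, bit in enumerate(b))


if __name__ == "__main__":
    print(" ".join(f"{sbox4(x):X}" for x in range(16)))
    print(f"Sb8(A4) = {sbox8(0xA4):02X}")
```

Representing the 8-bit word as a list of bits keeps the bit permutation readable; a table-driven implementation would normally be preferred in practice.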

The Tweakey Schedule

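The key-scheduling algorithm for all versions with $z \in \{1, 2, 3\}$ is given as Algorithm 5.2. It shows how the tweakey states $TK^{(i)}$ are updated and the particular round keys are extracted. For the update of the tweakey states, the key schedule applies the functions $\mathrm{Permute}_{p_T}$, $\mathrm{LFSRupdate}^{(2)}_s$, and $\mathrm{LFSRupdate}^{(3)}_s$, defined as follows:

$\mathrm{Permute}_{p_T}$ operates on a $4 \times 4$ state $TK^{(i)}$ as a permutation of the $s$-bit words as

$$\begin{pmatrix} TK^{(i)}_{1} & TK^{(i)}_{5} & TK^{(i)}_{9} & TK^{(i)}_{13} \\ TK^{(i)}_{2} & TK^{(i)}_{6} & TK^{(i)}_{10} & TK^{(i)}_{14} \\ TK^{(i)}_{3} & TK^{(i)}_{7} & TK^{(i)}_{11} & TK^{(i)}_{15} \\ TK^{(i)}_{4} & TK^{(i)}_{8} & TK^{(i)}_{12} & TK^{(i)}_{16} \end{pmatrix} \mapsto \begin{pmatrix} TK^{(i)}_{7} & TK^{(i)}_{16} & TK^{(i)}_{3} & TK^{(i)}_{8} \\ TK^{(i)}_{11} & TK^{(i)}_{12} & TK^{(i)}_{4} & TK^{(i)}_{15} \\ TK^{(i)}_{1} & TK^{(i)}_{5} & TK^{(i)}_{9} & TK^{(i)}_{13} \\ TK^{(i)}_{2} & TK^{(i)}_{6} & TK^{(i)}_{10} & TK^{(i)}_{14} \end{pmatrix} .$$

This corresponds to the permutation of indices

$$p_T = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\ 7 & 11 & 1 & 2 & 16 & 12 & 5 & 6 & 3 & 4 & 9 & 10 & 8 & 15 & 13 & 14 \end{pmatrix} .$$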

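$\mathrm{LFSRupdate}^{(2)}_s$ transforms each word in the first two rows of the tweakey state $TK^{(2)}$ according to an LFSR operation. In particular, let us denote the $j$-th word of that tweakey state by $TK^{(2)}_j = (x_{j_3}, x_{j_2}, x_{j_1}, x_{j_0})$ if $s = 4$, and by $TK^{(2)}_j = (x_{j_7}, \dots, x_{j_1}, x_{j_0})$ if $s = 8$, respectively. Then,

$$\mathrm{LFSRupdate}^{(2)}_4 \colon TK^{(2)} \mapsto TK'^{(2)} ,$$

where, for each $j \in \{1, 2, 5, 6, 9, 10, 13, 14\}$, $TK'^{(2)}_j = (x_{j_2}, x_{j_1}, x_{j_0}, x_{j_3} + x_{j_2})$, and, for each other index $j$, $TK'^{(2)}_j = TK^{(2)}_j$. If $s = 8$, we have

$$\mathrm{LFSRupdate}^{(2)}_8 \colon TK^{(2)} \mapsto TK'^{(2)} ,$$

where, for each $j \in \{1, 2, 5, 6, 9, 10, 13, 14\}$,

$$TK'^{(2)}_j = (x_{j_6}, x_{j_5}, x_{j_4}, x_{j_3}, x_{j_2}, x_{j_1}, x_{j_0}, x_{j_7} + x_{j_5}) ,$$

and, for each other index $j$, $TK'^{(2)}_j = TK^{(2)}_j$.

$\mathrm{LFSRupdate}^{(3)}_s$ transforms each word in the first two rows of the tweakey state $TK^{(3)}$ according to an LFSR operation (a different one than in $\mathrm{LFSRupdate}^{(2)}_s$). In particular, let us denote the $j$-th word of that tweakey state by $TK^{(3)}_j = (x_{j_3}, x_{j_2}, x_{j_1}, x_{j_0})$ if $s = 4$, and by $TK^{(3)}_j = (x_{j_7}, \dots, x_{j_1}, x_{j_0})$ if $s = 8$, respectively. Then,

$$\mathrm{LFSRupdate}^{(3)}_4 \colon TK^{(3)} \mapsto TK'^{(3)} ,$$

where, for each $j \in \{1, 2, 5, 6, 9, 10, 13, 14\}$, $TK'^{(3)}_j = (x_{j_0} + x_{j_3}, x_{j_3}, x_{j_2}, x_{j_1})$, and, for each other index $j$, $TK'^{(3)}_j = TK^{(3)}_j$. If $s = 8$, we have

$$\mathrm{LFSRupdate}^{(3)}_8 \colon TK^{(3)} \mapsto TK'^{(3)} ,$$

where, for each $j \in \{1, 2, 5, 6, 9, 10, 13, 14\}$,

$$TK'^{(3)}_j = (x_{j_0} + x_{j_6}, x_{j_7}, x_{j_6}, x_{j_5}, x_{j_4}, x_{j_3}, x_{j_2}, x_{j_1}) ,$$

and, for each other index $j$, $TK'^{(3)}_j = TK^{(3)}_j$.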

Table 5.4: The Round Constants used in Skinny.

Round r

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16

ar 1 3 7 F F E D B 7 F E C 9 3 7 E

br 0 0 0 0 1 3 3 3 3 2 1 3 3 3 2 0

17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32

ar D A 5 B 6 C 8 0 1 2 5 B 7 E C 8

br 1 3 3 2 1 2 1 3 2 0 0 0 1 2 1 3

33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48

ar 1 3 6 D B 6 D A 4 9 2 4 8 1 2 4

br 3 2 0 0 1 3 2 1 3 2 1 2 0 1 2 0

49 50 51 52 53 54 55 56

ar 9 3 6 C 9 2 5 A

br 0 1 2 0 1 3 2 0

The Round Constants $a_r$ and $b_r$ that are added to the round keys are given in Table 5.4. For a lightweight implementation, those round constants can be generated by a 6-bit affine LFSR as follows. Denote the LFSR state by $(x_5, x_4, x_3, x_2, x_1, x_0)$. It is initialized to $(0, 0, 0, 0, 0, 0)$ and updated in every round according to

$$(x_5, x_4, x_3, x_2, x_1, x_0) \mapsto (x_4, x_3, x_2, x_1, x_0, x_5 + x_4 + 1) .$$

Then, in every round $r$, the round constants are taken from the updated state as $a_r = (x_3, x_2, x_1, x_0)$ and $b_r = (0, 0, x_5, x_4)$ if $s = 4$. If $s = 8$, the constants are padded by zeros in the most significant bits, i.e., $a_r = (0, 0, 0, 0, x_3, x_2, x_1, x_0)$ and $b_r = (0, 0, 0, 0, 0, 0, x_5, x_4)$.
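The LFSR-based generation of the round constants can be reproduced in a few lines. The sketch below is our own illustration (the function name `round_constants` is an assumption, not part of the specification); it packs the state $(x_5, \dots, x_0)$ into an integer with $x_0$ as the least significant bit and returns the pairs $(a_r, b_r)$ for $r = 1, \dots, t$.

```python
def round_constants(t: int):
    """Return the list [(a_1, b_1), ..., (a_t, b_t)] produced by the 6-bit affine LFSR."""
    state = 0                                   # (x5, ..., x0) = (0, ..., 0)
    constants = []
    for _ in range(t):
        x5, x4 = (state >> 5) & 1, (state >> 4) & 1
        # shift left by one position and insert x5 + x4 + 1 as the new x0
        state = ((state << 1) & 0x3F) | (x5 ^ x4 ^ 1)
        a = state & 0xF                         # a_r = (x3, x2, x1, x0)
        b = state >> 4                          # b_r = (x5, x4)
        constants.append((a, b))
    return constants


if __name__ == "__main__":
    rc = round_constants(56)
    print("a_r:", "".join(f"{a:X}" for a, _ in rc))
    print("b_r:", "".join(f"{b:X}" for _, b in rc))
```

The two printed strings can be compared digit by digit with the rows of Table 5.4.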

View as an AES-like Cipher

Although the round keys are added in between the rounds, it is important to note that the definition of the round $R_i$ can be re-written as

$$(x, k_i) \mapsto \mathrm{Add}_{\mathrm{Mix}_M \circ \mathrm{Permute}_p(k_i)} \circ \mathrm{Mix}_M \circ \mathrm{Permute}_p \circ \mathrm{S}_{Sb_s}(x)$$

and thus, all Skinny versions can be defined in the notion of a key-alternating AES-like cipher, as explained in Section 2.4.2. Then, their unkeyed round functions follow the structure of an AES-like round $R_{Sb_s, p, M}$ (Definition 2.9).
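The rewriting rests only on the $\mathbb{F}_2$-linearity of $\mathrm{Mix}_M \circ \mathrm{Permute}_p$. As a small sanity check, the following sketch (our own helper names; the state is a list of 16 integers with entry $i - 1$ holding the word $x_i$) compares both forms on random inputs. The S-box layer is omitted because it is applied first and identically in both forms.

```python
import random

# Index permutation p of Permute_p, 1-based as in the text.
P = [1, 14, 11, 8, 5, 2, 15, 12, 9, 6, 3, 16, 13, 10, 7, 4]


def permute(x):
    """Permute_p: the word at position i becomes x_{p(i)}."""
    return [x[p - 1] for p in P]


def mix(x):
    """Mix_M: multiply every column (four consecutive words) by M."""
    out = []
    for c in range(4):
        a, b, d, e = x[4 * c: 4 * c + 4]
        out += [a ^ d ^ e, a, b ^ d, a ^ d]
    return out


def add(x, k):
    """Word-wise addition over F_2."""
    return [u ^ v for u, v in zip(x, k)]


def check_rewriting(trials: int = 1000, s: int = 8, seed: int = 1) -> bool:
    """Compare Mix(Permute(x + k)) with Mix(Permute(x)) + Mix(Permute(k))."""
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.randrange(1 << s) for _ in range(16)]
        k = [rng.randrange(1 << s) for _ in range(16)]
        if mix(permute(add(x, k))) != add(mix(permute(x)), mix(permute(k))):
            return False
    return True


if __name__ == "__main__":
    print(check_rewriting())
```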

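Algorithm 5.2 The Key Schedule of Skinny

procedure KeySchedule($TK^{(1)}$)
  for r = 1, . . . , t do

$$k_r \leftarrow \begin{pmatrix} TK^{(1)}_{1} & TK^{(1)}_{5} & TK^{(1)}_{9} & TK^{(1)}_{13} \\ TK^{(1)}_{2} & TK^{(1)}_{6} & TK^{(1)}_{10} & TK^{(1)}_{14} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} a_r & 0 & 0 & 0 \\ b_r & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

    $TK^{(1)} \leftarrow \mathrm{Permute}_{p_T}(TK^{(1)})$
  end for
end procedure

procedure KeySchedule($TK^{(1)}$, $TK^{(2)}$)
  for r = 1, . . . , t do

$$k_r \leftarrow \begin{pmatrix} TK^{(1)}_{1} & TK^{(1)}_{5} & TK^{(1)}_{9} & TK^{(1)}_{13} \\ TK^{(1)}_{2} & TK^{(1)}_{6} & TK^{(1)}_{10} & TK^{(1)}_{14} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} TK^{(2)}_{1} & TK^{(2)}_{5} & TK^{(2)}_{9} & TK^{(2)}_{13} \\ TK^{(2)}_{2} & TK^{(2)}_{6} & TK^{(2)}_{10} & TK^{(2)}_{14} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} a_r & 0 & 0 & 0 \\ b_r & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

    $TK^{(1)} \leftarrow \mathrm{Permute}_{p_T}(TK^{(1)})$
    $TK^{(2)} \leftarrow \mathrm{Permute}_{p_T}(TK^{(2)})$
    $TK^{(2)} \leftarrow \mathrm{LFSRupdate}^{(2)}_s(TK^{(2)})$
  end for
end procedure

procedure KeySchedule($TK^{(1)}$, $TK^{(2)}$, $TK^{(3)}$)
  for r = 1, . . . , t do

$$k_r \leftarrow \begin{pmatrix} TK^{(1)}_{1} & TK^{(1)}_{5} & TK^{(1)}_{9} & TK^{(1)}_{13} \\ TK^{(1)}_{2} & TK^{(1)}_{6} & TK^{(1)}_{10} & TK^{(1)}_{14} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} TK^{(2)}_{1} & TK^{(2)}_{5} & TK^{(2)}_{9} & TK^{(2)}_{13} \\ TK^{(2)}_{2} & TK^{(2)}_{6} & TK^{(2)}_{10} & TK^{(2)}_{14} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} TK^{(3)}_{1} & TK^{(3)}_{5} & TK^{(3)}_{9} & TK^{(3)}_{13} \\ TK^{(3)}_{2} & TK^{(3)}_{6} & TK^{(3)}_{10} & TK^{(3)}_{14} \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} + \begin{pmatrix} a_r & 0 & 0 & 0 \\ b_r & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}$$

    $TK^{(1)} \leftarrow \mathrm{Permute}_{p_T}(TK^{(1)})$
    $TK^{(2)} \leftarrow \mathrm{Permute}_{p_T}(TK^{(2)})$
    $TK^{(3)} \leftarrow \mathrm{Permute}_{p_T}(TK^{(3)})$
    $TK^{(2)} \leftarrow \mathrm{LFSRupdate}^{(2)}_s(TK^{(2)})$
    $TK^{(3)} \leftarrow \mathrm{LFSRupdate}^{(3)}_s(TK^{(3)})$
  end for
end procedure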
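The tweakey schedule can be sketched compactly in Python. This is an illustration under our own conventions, not a reference implementation: a tweakey state is a list of 16 integers whose entry $j - 1$ holds the word $TK_j$, the helper names (`permute_t`, `lfsr2`, `lfsr3`, `round_keys`) are ours, and the round constants $(a_r, b_r)$ are passed in by the caller, e.g., taken from Table 5.4.

```python
# Index permutation p_T (1-based, as in the text) and the 0-based positions of
# the words in the first two rows, i.e., j in {1, 2, 5, 6, 9, 10, 13, 14}.
PT = [7, 11, 1, 2, 16, 12, 5, 6, 3, 4, 9, 10, 8, 15, 13, 14]
TOP = (0, 1, 4, 5, 8, 9, 12, 13)


def permute_t(tk):
    """Permute_{p_T}: the word at position j becomes TK_{p_T(j)}."""
    return [tk[p - 1] for p in PT]


def lfsr2(w: int, s: int) -> int:
    """Word update of LFSRupdate^(2): shift left, feed back x3+x2 (s=4) or x7+x5 (s=8)."""
    fb = ((w >> (s - 1)) ^ (w >> (2 if s == 4 else 5))) & 1
    return ((w << 1) & ((1 << s) - 1)) | fb


def lfsr3(w: int, s: int) -> int:
    """Word update of LFSRupdate^(3): shift right, feed back x0+x3 (s=4) or x0+x6 (s=8)."""
    fb = (w ^ (w >> (3 if s == 4 else 6))) & 1
    return (w >> 1) | (fb << (s - 1))


def round_keys(tks, s, consts):
    """Derive one round key per entry (a_r, b_r) of consts from z = len(tks) tweakey states."""
    tks = [list(tk) for tk in tks]
    keys = []
    for a, b in consts:
        k = [0] * 16
        for j in TOP:                          # only the first two rows are extracted
            for tk in tks:
                k[j] ^= tk[j]
        k[0] ^= a                              # constants go into the first column
        k[1] ^= b
        k[2] ^= 2
        keys.append(k)
        tks = [permute_t(tk) for tk in tks]
        for z, update in ((1, lfsr2), (2, lfsr3)):
            if z < len(tks):
                tks[z] = [update(w, s) if j in TOP else w for j, w in enumerate(tks[z])]
    return keys


def cycle_length(perm):
    """Length of the cycle of a 1-based permutation that contains index 1."""
    j, n = perm[0], 1
    while j != 1:
        j, n = perm[j - 1], n + 1
    return n
```

With these definitions one can observe, for instance, that `cycle_length(PT)` equals 16, that `lfsr3` undoes `lfsr2` for both word sizes, and that for $s = 4$ every non-zero word returns to itself after 15 applications of `lfsr2`.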




5.3.2 Design Rationale

In order to understand the motivation behind the particular construction of Skinny, we briefly recall some of the design considerations, both with regard to security and efficiency. For more details, we refer to [BJK+16b, Section 3].

Several design choices are inspired by already existing constructions, e.g., the TWEAKEY framework and the building blocks of AES-like ciphers. However, all of the components employed in Skinny are optimized for lightweight purposes. Although the design performs well in most lightweight scenarios, a special focus was on minimizing the area required for round-based hardware implementations. The main competitor with regard to this metric is Simon. Indeed, the round-based ASIC implementations of Skinny-64-128, resp. Skinny-128-256, obtained in [BJK+16b] slightly outperform the implementations of Simon-64-128, resp. Simon-128-256, given in [BSS+15] with regard to area requirements. In addition, Skinny-128-128 performs very well compared to its Simon equivalent (see Table 5.5 for details).⁵

While all the components employed in Skinny are optimized for reducing area, an important criterion was that the round function itself still preserves a significant amount of cryptographic strength. Otherwise, the total number of rounds to be applied would have to become extremely large in order to guarantee security of the cipher using standard arguments. In other words, the task was to find the exact spot for which we get optimal performance along with strong security arguments (in our design measured by the minimum number of active S-boxes). We aimed to achieve this goal by an iterative design process. In a nutshell, the intuition behind our final design choice is that removing any operation from the cipher would lead to a much weaker design that significantly worsens the trade-off between implementation efficiency and throughput (i.e., the number of rounds that have to be applied).

⁵ In [BJK+16b], results for several other implementations, i.e., unrolled implementations, serial implementations, as well as FPGA and software implementations, are given. Also, a threshold implementation for protection against side-channel attacks was realized.

Table 5.5: Round-based ASIC implementations of the Skinny-64 and the Skinny-128 versions and comparison to Simon. (Cell library: UMC L180 0.18 µm for Skinny and IBM 8RF 130 nm for Simon)

                  Area    Delay   Clock cycles   Throughput @100KHz   Throughput @maximum   Ref.
                  (GE)    (ns)    (#)            (KBit/s)             (MBit/s)
Skinny-64-64      1223    1.77    32             200.00               1130.00               [BJK+16b]
Skinny-64-128     1696    1.87    36             177.78                951.11               [BJK+16b]
Skinny-64-192     2183    2.02    40             160.00                792.00               [BJK+16b]
Skinny-128-128    2391    2.89    40             320.00               1107.20               [BJK+16b]
Skinny-128-256    3312    2.89    48             266.67                922.67               [BJK+16b]
Skinny-128-384    4268    2.89    56             228.57                790.86               [BJK+16b]
Simon-64-128      1751    1.60    46             145.45                870.00               [BSS+15]
Simon-128-128     2342    1.60    70             188.24               1145.00               [BSS+15]
Simon-128-256     3419    1.60    74             177.78               1081.00               [BSS+15]

The General Structure

We chose to design Skinny as an SP cipher since there are well-known solutions for evaluating the security, i.e., by computing the minimum number of active S-boxes. Moreover, such a structure allows for designing the $n = 64$ and $n = 128$ versions of the cipher in almost the same way by only changing the word length $s$ and using a different S-box. Further, fixing a particular version Skinny-n-κ, we wanted to keep the definition of all the rounds exactly the same. For instance, this means that no whitening key is applied and that we avoid the usage of a different linear layer in the last round (as opposed to the AES, where the last MixColumns operation is omitted). This simplifies the overall description of the cipher and reduces its implementation overhead. However, this design decision implies that parts of the first and the last round do not contribute to the security of the cipher, as the adversary can simply invert all operations until the first (resp. last) key addition.

We further chose the STK construction of the TWEAKEY framework for introducing the tweakey input. This makes it possible to design Skinny as a flexible tweakable block cipher with the benefit of obtaining security arguments against related-key differential attacks, using automatic tools like MILP. We explain the MILP model for computing the minimum number of active S-boxes in the TK1 adversary model in detail in Section 5.3.3. For a detailed description of the MILP approach for computing bounds in TK2 and TK3, we refer to [BJK+16b].

SubCells

For the choice of the S-boxes $Sb_4$ and $Sb_8$ to be applied in the non-linear layer, we required that $p_{Sb_4}, p_{Sb_8} \le 2^{-2}$ and $c_{Sb_4}, c_{Sb_8} \le 2^{-1}$. An automatic search was conducted in order to find an S-box construction with a limited implementation cost that meets the above requirements on the cryptographic properties. The final choice for the S-box $Sb_4$ has an algebraic degree of three; the final choice of $Sb_8$ has an algebraic degree of six. Both choices meet the required bounds on the maximum differential probability and maximum absolute correlation with equality.

The Round Constants

The purpose of round constants is to differentiate the particular rounds and avoid symmetries. For instance, a bad choice of round constants may cause weaknesses with regard to invariant attacks. In Chapter 6, we study these attacks in more detail and actually show that Skinny-64-64 can be proven secure against a large class of invariant attacks. To avoid storing the particular round constants directly as a look-up table, a lightweight LFSR is used for their generation.

The Tweakey Update Function and the Round Key Addition

In contrast to the STK construction as originally proposed in [JNP14], the words of the tweakey states are updated by LFSRs instead of by finite field multiplications, mainly for reducing hardware area. But as the most important difference, the round keys are only extracted from the first two rows of the tweakey states, i.e., the third and fourth row of each round key array (except for the round constant 2 in the word in the first column of the third row) consist of only zeros. This saves the implementation of the XOR operations for adding the last two rows of the round key arrays. As only half of the words of each tweakey state are extracted in every round, only those halves are updated by the LFSR operations. Special care has been taken in choosing the actual LFSRs in order to guarantee a high number of rounds before cancellations of differences in TK2 and TK3 can happen. In particular, and due to the fact that only half of the state is affected by the LFSRs in every round, for a fixed word index, a single cancellation can only happen every 30 rounds for TK2 and two cancellations can only happen every 30 rounds for TK3.

The tweakey permutation $p_T$ has been chosen to maximize the bounds on the number of active S-boxes in the related-key model (in the SK model, it has no impact). Additionally, we have enforced the special property of $p_T$ that all words in the lower half of the state are permuted to the upper half and vice versa. Since only the first and the second row of the tweakey states are added to the cipher's internal state, this ensures that all words of the tweakey states will be added (almost) equally often to the cipher's internal state. On top of that, we only considered those variants of $p_T$ that leave the words in the upper half of the tweakey state at the same relative position in the lower half and that consist of a single cycle.

The Linear Layer

In order to compete with the performance of Simon with regard to hardware implementations, we heavily tailored the AES-like linear layer used in Skinny and chose a very sparse matrix $M$ for the $\mathrm{Mix}_M$ operation. In particular, $M$ only consists of the coefficients $0, 1 \in \mathbb{F}_{2^s}$. It is sparser than the matrix employed in Midori and has a differential and linear branch number with respect to $s$-bit words of only two. This may look suspicious at first glance, and one certainly has to consider the existence of differential and linear trails with only a single active S-box per round. However, we designed $M$ in such a way that whenever a branching transition with only two active S-boxes occurs, the next transition will likely lead to a much higher number of active S-boxes. For instance, when looking at the definition of $M$, the only way to get a branching transition with two active S-boxes is to have a non-zero input difference in either the second or the fourth component. But this necessarily leads to a non-zero input difference in the first or third component in the next round, which then diffuses to many output positions. A (multiple-round) differential trail with a single active S-box per round is therefore not possible. Actually, one can prove at least 96 active S-boxes over 20 rounds using the MILP tool (see Table 5.6 for the actual bounds). Similar observations can be made for linear trails, i.e., by considering $(M^{\top})^{-1}$.

We have considered all possibilities for $M$ that can be implemented with at most three XOR operations and kept those matrices that, in combination with $\mathrm{Permute}_p$, guaranteed fast diffusion and led to strong bounds on the minimum number of active S-boxes in the SK model. Here, by full diffusion we refer to the number of rounds needed such that every bit of the internal state depends on every input bit. In all Skinny versions, six rounds are needed (both in the forward direction and for the inverse cipher) to guarantee full diffusion. Section 5.3.4 outlines how that diffusion was evaluated.

As a last criterion on $M$, we required that the round key input affects the whole internal state of the cipher as fast as possible. This is in particular crucial as only half of the state is added with non-zero key material in every round. Our final choice of $M$ is optimal with respect to full key diffusion in the sense that only a single round is required (for both the case of encryption and decryption) to ensure full round key diffusion over the internal state.

Table 5.6: Lower bounds on the minimum number of active S-boxes in Skinny. The numbers for SK Lin correspond to the active S-boxes in linear trails according to $(M^{\top})^{-1}$. In cases where solving the particular MILP instance did not finish in time, upper bounds are given between parentheses. 64 active S-boxes are required in order to avoid differential and linear distinguishers based on a single trail (it is $p_{Sb_s}^{64} = 2^{-128} \le 2^{-n}$ and $c_{Sb_s}^{2 \cdot 64} = 2^{-128} \le 2^{-n}$).

Model      1     2     3     4     5     6     7     8     9     10    11    12    13    14    15
SK         1     2     5     8     12    16    26    36    41    46    51    55    58    61    66
TK1        0     0     1     2     3     6     10    13    16    23    32    38    41    45    49
TK2        0     0     0     0     1     2     3     6     9     12    16    21    25    31    35
TK3        0     0     0     0     0     0     1     2     3     6     10    13    16    19    24
SK Lin     1     2     5     8     13    19    25    32    38    43    48    52    55    58    64

Model      16    17    18    19    20    21    22    23    24    25    26    27    28    29    30
SK         75    82    88    92    96    102   108   (114) (116) (124) (132) (138) (136) (148) (158)
TK1        54    59    62    66    70    75    79    83    85    88    95    102   (108) (112) (120)
TK2        40    43    47    52    57    59    64    67    72    75    82    85    88    92    96
TK3        27    31    35    43    45    48    51    55    58    60    65    72    77    81    85
SK Lin     70    76    80    85    90    96    102   107   (110) (118) (122) (128) (136) (141) (143)

5.3.3 The MILP Model for Computing Active S-Boxes

We used the MILP approach based on the framework of Mouha et al. [MWGP12] for computing the minimum number of active S-boxes in SK (with regard to differential and linear trails) and TK1 (with regard to differential trails). In our MILP model, we need the following decision variables that can take values in $\{0, 1\} \subseteq \mathbb{Z}$.

• $x_{i,j,r}$, with $i, j \in \mathbb{N}^{0}_{<4}$, $r \in \mathbb{N}^{0}_{<t+1}$, for indicating the activity pattern at the S-box inputs. In particular, we are going to define the constraints such that $x_{i,j,r} = 1$ if and only if the S-box at position $i + 4j + 1$ of the input state in round $r$ is active (here, the round index starts from 0). The objective function that we want to minimize will then be defined as

$$\sum_{i,j \in \mathbb{N}^{0}_{<4}} \sum_{r \in \mathbb{N}^{0}_{<t}} x_{i,j,r} .$$

• $y_{i,j,r}$, with $i, j \in \mathbb{N}^{0}_{<4}$, $r \in \mathbb{N}^{0}_{<t}$, for indicating the activity pattern (with respect to $s$-bit words) of the internal state after the addition of the round keys.

• $\xi_{i,j}$, with $i, j \in \mathbb{N}^{0}_{<4}$, for indicating the activity pattern (with respect to $s$-bit words) of the initial tweakey state $TK^{(1)}$.

• For modelling the branching transitions, we need two sets of auxiliary variables, i.e., $d^{\oplus}_{i,j,r}$, with $i \in \mathbb{N}^{0}_{<2}$, $j \in \mathbb{N}^{0}_{<4}$, $r \in \mathbb{N}^{0}_{<t}$, for modelling the round key addition layer (only needed for TK1) and $d_{j,r}$, $d'_{j,r}$, $d''_{j,r}$, with $j \in \mathbb{N}^{0}_{<4}$, $r \in \mathbb{N}^{0}_{<t}$, for the $\mathrm{Mix}_M$ layer.

As the round key addition and the $\mathrm{Mix}_M$ layer only consist of word-wise XOR operations, the main building blocks of the model are the linear constraints for defining the branching transitions over those word-wise XOR operations. For shorter notation, we define the following sets.

Constraints for XOR. Let $i_1, i_2, o, d$ be decision variables that can take values in $\{0, 1\} \subseteq \mathbb{Z}$. We denote by $C_{\oplus}[i_1, i_2, o, d]$ the set of linear constraints

$$\{i_1 \le d\} \cup \{i_2 \le d\} \cup \{o \le d\} \cup \{i_1 + i_2 + o \ge 2d\} .$$

All of the constraints in $C_{\oplus}[i_1, i_2, o, d]$ are fulfilled if and only if

$$(i_1, i_2, o, d) \in \{(0, 0, 0, 0), (0, 1, 1, 1), (1, 0, 1, 1), (1, 1, 0, 1), (1, 1, 1, 1)\} .$$

The crucial observation is that, if $x_1$ is an $s$-bit word with activity $i_1$ and $x_2$ is an $s$-bit word with activity $i_2$, the activity $o$ of the word $x_1 + x_2$ fulfills

(i) $o = 0$ if $i_1 = i_2 = 0$,

(ii) $o = 1$ if $(i_1, i_2) \in \{(0, 1), (1, 0)\}$, and

(iii) $o \in \{0, 1\}$ if $i_1 = i_2 = 1$.

Those properties are modelled by the above linear constraints. The auxiliary variable $d$ just indicates whether at least one input $x_1$ or $x_2$ is active.
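The claimed solution set of $C_{\oplus}$ is small enough to be confirmed by exhaustive enumeration. The sketch below (our own helper names) lists all binary assignments satisfying the four constraints and, as a cross-check, collects the activity patterns that actually occur for the XOR of two $s$-bit words; the word size $s = 2$ is an arbitrary small choice made for the illustration.

```python
from itertools import product


def xor_constraints_hold(i1: int, i2: int, o: int, d: int) -> bool:
    """The four linear constraints of C_xor[i1, i2, o, d]."""
    return i1 <= d and i2 <= d and o <= d and i1 + i2 + o >= 2 * d


# All binary assignments (i1, i2, o, d) that satisfy the constraints.
feasible = sorted(v for v in product((0, 1), repeat=4) if xor_constraints_hold(*v))


def observed_patterns(s: int = 2):
    """Activity patterns (i1, i2, o) that occur for x1 + x2 over all pairs of s-bit words."""
    return sorted({(int(a != 0), int(b != 0), int(a ^ b != 0))
                   for a in range(1 << s) for b in range(1 << s)})


if __name__ == "__main__":
    print(feasible)
    print(observed_patterns())
```

Dropping the last coordinate of `feasible` yields exactly the observed patterns, which illustrates cases (i) to (iii) above.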

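Constraints for the Linear Mixing. Let $i_1, \dots, i_4, o_1, \dots, o_4, d_1, d_2, d_3$ be decision variables that can take values in $\{0, 1\} \subseteq \mathbb{Z}$.

By $C_M[i_1, i_2, i_3, i_4, o_1, o_2, o_3, o_4, d_1, d_2, d_3]$ we denote the set of linear constraints

$$C_{\oplus}[i_1, i_3, o_4, d_1] \cup C_{\oplus}[o_4, i_4, o_1, d_2] \cup C_{\oplus}[i_2, i_3, o_3, d_3] \cup \{o_2 = i_1\} .$$

Those constraints then model the (word-wise) differential branching transitions of the transformation

$$\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} \mapsto \begin{pmatrix} 1 & 0 & 1 & 1 \\ 1 & 0 & 0 & 0 \\ 0 & 1 & 1 & 0 \\ 1 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} x_1 + x_3 + x_4 \\ x_1 \\ x_2 + x_3 \\ x_1 + x_3 \end{pmatrix} ,$$

where the $x_i$ are $s$-bit words.⁶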
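The constraint set $C_M$ can be explored by brute force as well. The following sketch (helper names are ours) enumerates all word-wise transitions admitted by $C_M$ for a non-zero input pattern, determines the minimum total number of active words, and checks that every real input/output pair of the column transformation is covered by the model; the word size $s = 2$ is again an arbitrary small choice.

```python
from itertools import product


def xor_ok(i1: int, i2: int, o: int) -> bool:
    """True if C_xor[i1, i2, o, d] is satisfiable for some auxiliary d in {0, 1}."""
    return any(i1 <= d and i2 <= d and o <= d and i1 + i2 + o >= 2 * d for d in (0, 1))


def mix_ok(i, o) -> bool:
    """True if the patterns i = (i1..i4), o = (o1..o4) satisfy C_M for some d1, d2, d3."""
    i1, i2, i3, i4 = i
    o1, o2, o3, o4 = o
    return xor_ok(i1, i3, o4) and xor_ok(o4, i4, o1) and xor_ok(i2, i3, o3) and o2 == i1


patterns = list(product((0, 1), repeat=4))
transitions = [(i, o) for i in patterns for o in patterns if any(i) and mix_ok(i, o)]

# Word-wise branch number admitted by the model and the transitions attaining it.
branch = min(sum(i) + sum(o) for i, o in transitions)
minimal = sorted(t for t in transitions if sum(t[0]) + sum(t[1]) == branch)


def model_covers_words(s: int = 2) -> bool:
    """Check that every pair (x, Mx) of s-bit word columns has a pattern allowed by C_M."""
    for x1, x2, x3, x4 in product(range(1 << s), repeat=4):
        y = (x1 ^ x3 ^ x4, x1, x2 ^ x3, x1 ^ x3)
        i = tuple(int(v != 0) for v in (x1, x2, x3, x4))
        o = tuple(int(v != 0) for v in y)
        if not mix_ok(i, o):
            return False
    return True


if __name__ == "__main__":
    print(branch, minimal, model_covers_words())
```

The two minimal transitions have a single active input word in the fourth, resp. second, position, which matches the discussion of the branch number of $M$ in the design rationale.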

One now obtains a lower bound on the minimum number of active S-boxes over $t$ rounds in SK by solving the following MILP instance:

Minimize

$$\sum_{i,j \in \mathbb{N}^{0}_{<4}} \sum_{r \in \mathbb{N}^{0}_{<t}} x_{i,j,r}$$

Subject to:

1. Excluding the trivial solution with zero active S-boxes

$$\Big\{ \sum_{i,j \in \mathbb{N}^{0}_{<4}} x_{i,j,0} \ge 1 \Big\}$$

2. Application of the linear layer

$$\bigcup_{r \in \mathbb{N}^{0}_{<t}} \bigcup_{j \in \mathbb{N}^{0}_{<4}} C_M[x_{0,j,r}, x_{1,(j-1) \bmod 4,r}, x_{2,(j-2) \bmod 4,r}, x_{3,(j-3) \bmod 4,r}, x_{0,j,r+1}, x_{1,j,r+1}, x_{2,j,r+1}, x_{3,j,r+1}, d_{j,r}, d'_{j,r}, d''_{j,r}]$$

For TK1, we have to optimize the following MILP model:

Minimize

$$\sum_{i,j \in \mathbb{N}^{0}_{<4}} \sum_{r \in \mathbb{N}^{0}_{<t}} x_{i,j,r}$$

Subject to:

1. Excluding the trivial solution

$$\Big\{ \sum_{i,j \in \mathbb{N}^{0}_{<4}} (x_{i,j,0} + \xi_{i,j}) \ge 1 \Big\}$$

2. Application of the tweakey addition to half of the state. Here, the tweakey state permutation $p_T$ is denoted as a permutation on the indices $(i, j) \in \mathbb{N}^{0}_{<4} \times \mathbb{N}^{0}_{<4}$.

$$\bigcup_{r \in \mathbb{N}^{0}_{<t}} \Big( \bigcup_{i \in \{0,1\}} \bigcup_{j \in \mathbb{N}^{0}_{<4}} C_{\oplus}[x_{i,j,r}, \xi_{p_T^{r}(i,j)}, y_{i,j,r}, d^{\oplus}_{i,j,r}] \ \cup \bigcup_{i \in \{2,3\}} \bigcup_{j \in \mathbb{N}^{0}_{<4}} \{y_{i,j,r} = x_{i,j,r}\} \Big)$$

3. Application of the linear layer

$$\bigcup_{r \in \mathbb{N}^{0}_{<t}} \bigcup_{j \in \mathbb{N}^{0}_{<4}} C_M[y_{0,j,r}, y_{1,(j-1) \bmod 4,r}, y_{2,(j-2) \bmod 4,r}, y_{3,(j-3) \bmod 4,r}, x_{0,j,r+1}, x_{1,j,r+1}, x_{2,j,r+1}, x_{3,j,r+1}, d_{j,r}, d'_{j,r}, d''_{j,r}]$$

For all numbers of rounds given in Table 5.6, the corresponding MILP instances were solved using Gurobi [GO16].

⁶ For computing bounds on the minimum number of active S-boxes in linear trails in the single-key model, one has to model the branching transitions of $(M^{\top})^{-1}$. For that, one employs another auxiliary variable $d$ and defines $C_{(M^{\top})^{-1}}[i_1, i_2, i_3, i_4, o_1, o_2, o_3, o_4, d, d_1, d_2, d_3]$ as the set of constraints $C_{\oplus}[i_2, i_3, d, d_1] \cup C_{\oplus}[i_1, d, o_2, d_2] \cup C_{\oplus}[i_4, d, o_4, d_3] \cup \{o_1 = i_4\} \cup \{o_3 = i_2\}$.

On The Tightness of the MILP Bounds

The solution of the minimization problems defined above determines a lower bound on the number of active S-boxes in any (non-trivial) $t$-round trail in the SK, resp. TK1, case. If we consider the word-wise application of the S-box as a black box, then all of the computed bounds for SK are tight in the sense that one can construct a valid differential trail for a specific choice of S-boxes. In other words, the bound is tight if the S-box can be chosen independently for every word and every round. This is less clear in the related-key scenario and therefore, we only claim lower bounds. The actual minimum number of active S-boxes might be even higher.

5.3.4 Diffusion Test

When we analyze the diffusion properties of a cipher, we evaluate the minimum number of rounds $r$ such that every bit of the internal state after the application of $r$ rounds depends on every input bit. For an SP cipher like Skinny, the diffusion properties depend both on the linear layer and on the S-box. To formally define what full diffusion means and to evaluate the diffusion properties of Skinny with block length $n = 16s$ in particular, we first define the diffusion matrix $D_s$ as described in the following.

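Since the linear layer in Skinny consists of a word permutation and a $\mathrm{Mix}_M$ operation with a binary $4 \times 4$ matrix $M$, one can represent the linear layer as a binary matrix $L \in \mathrm{GL}_{16}(\mathbb{F}_{2^s})$. In particular,

$$L = \begin{pmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 \\
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0
\end{pmatrix} .$$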
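The matrix $L$ follows mechanically from the index permutation $p$ and the matrix $M$. As a hedged illustration (the function name `linear_layer_matrix` and the string encoding of the printed matrix are our own), the sketch below builds $L$ with the column-wise word indexing used throughout this chapter and compares it with the matrix given above.

```python
# Index permutation p of Permute_p (1-based) and the matrix M of Mix_M.
P = [1, 14, 11, 8, 5, 2, 15, 12, 9, 6, 3, 16, 13, 10, 7, 4]
M = [[1, 0, 1, 1],
     [1, 0, 0, 0],
     [0, 1, 1, 0],
     [1, 0, 1, 0]]

# The matrix L as printed in the text, one string per row.
PRINTED_L = [
    "1000000100100000", "1000000000000000", "0000000000100100", "1000000000100000",
    "0000100000010010", "0000100000000000", "0100000000000010", "0000100000000010",
    "0010000010000001", "0000000010000000", "0010010000000000", "0010000010000000",
    "0001001000001000", "0000000000001000", "0000001001000000", "0000001000001000",
]


def linear_layer_matrix():
    """Build L such that output word 4c+r+1 is the sum of M[r][k] * x_{p(4c+k+1)}."""
    L = [[0] * 16 for _ in range(16)]
    for c in range(4):               # column of the state
        for r in range(4):           # output row within the column
            for k in range(4):       # input row within the (permuted) column
                if M[r][k]:
                    L[4 * c + r][P[4 * c + k] - 1] = 1
    return L


if __name__ == "__main__":
    built = ["".join(map(str, row)) for row in linear_layer_matrix()]
    print(built == PRINTED_L)
```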

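Furthermore, for an S-box $Sb$, we define the dependency matrix $\mathrm{Dep}(Sb) \in \mathrm{Mat}_s(\mathbb{Z})$ by

$$\mathrm{Dep}(Sb)_{i,j} = \begin{cases} 1 & \text{if } \exists x \colon Sb_i(x) \ne Sb_i(x + e_j) \\ 0 & \text{else.} \end{cases}$$

Thereby, $Sb_i$ denotes the $i$-th coordinate function and $e_j$ the $j$-th unit vector. For the Skinny S-boxes $Sb_4$ and $Sb_8$, we have

$$\mathrm{Dep}(Sb_4) = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \end{pmatrix}, \quad \mathrm{Dep}(Sb_8) = \begin{pmatrix} 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\ 0 & 1 & 1 & 0 & 0 & 0 & 1 & 0 \\ 1 & 1 & 1 & 1 & 0 & 0 & 0 & 0 \\ 1 & 0 & 1 & 1 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 & 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 & 1 & 0 & 1 & 1 \\ 1 & 0 & 1 & 1 & 1 & 1 & 1 & 1 \end{pmatrix} .$$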
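The dependency matrices can be recomputed directly from the definition. The sketch below is our own illustration: $Sb_4$ is taken from Table 5.2, $Sb_8$ is evaluated by the iterative NOR/XOR description, and we assume that the $i$-th coordinate and the $j$-th unit vector are counted from the least significant bit, a convention that is not stated explicitly in the text but that reproduces the two matrices above.

```python
SB4 = [0xC, 0x6, 0x9, 0x0, 0x1, 0xA, 0x2, 0xB, 0x3, 0x8, 0x5, 0xD, 0x4, 0xE, 0x7, 0xF]


def sbox8(x: int) -> int:
    """Sb8 via its iterative description; b[i] holds the bit x_i."""
    b = [(x >> i) & 1 for i in range(8)]
    for step in range(4):
        b[4] ^= 1 ^ (b[7] | b[6])
        b[0] ^= 1 ^ (b[3] | b[2])
        if step < 3:
            b = [b[5], b[3], b[0], b[4], b[6], b[7], b[1], b[2]]
        else:
            b[1], b[2] = b[2], b[1]
    return sum(bit << i for i, bit in enumerate(b))


SB8 = [sbox8(x) for x in range(256)]

# Printed matrices, one string per row (row i, column j, both starting at 1).
DEP4_PRINTED = ["1111", "1111", "0111", "1011"]
DEP8_PRINTED = ["11111111", "11111111", "01100010", "11110000",
                "10111111", "10110000", "00001011", "10111111"]


def dependency_matrix(sbox, s: int):
    """Dep(Sb)[i][j] = 1 iff output bit i changes for some x when input bit j is flipped."""
    return [[int(any(((sbox[x] ^ sbox[x ^ (1 << j)]) >> i) & 1 for x in range(1 << s)))
             for j in range(s)]
            for i in range(s)]


def as_rows(dep):
    """Render a 0/1 matrix as a list of strings for comparison with the printed form."""
    return ["".join(map(str, row)) for row in dep]


if __name__ == "__main__":
    print(as_rows(dependency_matrix(SB4, 4)) == DEP4_PRINTED)
    print(as_rows(dependency_matrix(SB8, 8)) == DEP8_PRINTED)
```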

Now, we can define the Skinny diffusion matrix $D_s$ for $s \in \{4, 8\}$ as a block matrix $D_s \in \mathrm{Mat}_{16}(\mathrm{Mat}_s(\mathbb{Z}))$ by

$$(D_s)_{i,j} = \begin{cases} \mathrm{Dep}(Sb_s) & \text{if } L_{i,j} = 1 \\ 0_s & \text{if } L_{i,j} = 0. \end{cases}$$

One can now define full diffusion as follows.

Definition 5.6. The cipher achieves full diffusion after $r$ rounds if $D_s^{r}$ contains no zero coefficient when $D_s$ is interpreted as a $16s \times 16s$ matrix over the integers.

For the Skinny-64 and Skinny-128 versions, we made sure that full diffusion is achieved after 6 rounds, both in the forward direction and for the inverse. Note that the diffusion matrix of the inverse has to be computed separately.
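Definition 5.6 can be evaluated with a few lines of code. The following sketch covers the forward direction only and uses our own helper names; since all entries of $D_s$ are non-negative, a coefficient of $D_s^{r}$ is non-zero exactly when the corresponding entry of the Boolean matrix power is set, so each row is stored as an integer bitmask instead of working with large integers.

```python
P = [1, 14, 11, 8, 5, 2, 15, 12, 9, 6, 3, 16, 13, 10, 7, 4]
M = [[1, 0, 1, 1], [1, 0, 0, 0], [0, 1, 1, 0], [1, 0, 1, 0]]
DEP4 = ["1111", "1111", "0111", "1011"]
DEP8 = ["11111111", "11111111", "01100010", "11110000",
        "10111111", "10110000", "00001011", "10111111"]


def diffusion_matrix(dep):
    """Rows of D_s as bitmasks: bit J*s+j of row I*s+i is set iff L[I][J] = Dep[i][j] = 1."""
    s = len(dep)
    L = [[0] * 16 for _ in range(16)]
    for c in range(4):
        for r in range(4):
            for k in range(4):
                if M[r][k]:
                    L[4 * c + r][P[4 * c + k] - 1] = 1
    rows = []
    for I in range(16):
        for i in range(s):
            mask = 0
            for J in range(16):
                if L[I][J]:
                    for j in range(s):
                        if dep[i][j] == "1":
                            mask |= 1 << (J * s + j)
            rows.append(mask)
    return rows


def bool_mul(A, B):
    """Boolean matrix product on bitmask rows: row r of A*B is the OR of the rows B[k] with A[r][k] set."""
    out = []
    for a in A:
        acc, k = 0, 0
        while a:
            if a & 1:
                acc |= B[k]
            a >>= 1
            k += 1
        out.append(acc)
    return out


def full_diffusion_rounds(dep, max_rounds: int = 64):
    """Smallest r such that D_s^r has no zero coefficient, or None if max_rounds is exceeded."""
    D = diffusion_matrix(dep)
    full = (1 << len(D)) - 1
    power = D
    for r in range(1, max_rounds + 1):
        if all(row == full for row in power):
            return r
        power = bool_mul(power, D)
    return None


if __name__ == "__main__":
    print(full_diffusion_rounds(DEP4), full_diffusion_rounds(DEP8))
```

For the inverse direction, the same procedure would have to be applied to the diffusion matrix of the inverse round, which this sketch does not construct.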

5.3.5 Security Claim and Best Cryptanalysis so far

The security claim for Skinny is resistance against related-key/related-tweak attacks. The particular design according to the TWEAKEY framework, in particular the possibility of dedicating some key material to a public tweak input, should make it possible to use Skinny in scenarios where both related-key and related-tweak security are needed. We emphasize that, in cases where related-key security is not needed, one could also use all the tweakey input as a secret key and then XOR a public tweak to the key. From Theorem 5.1, one would directly obtain provable chosen-tweak security, but at the price of sacrificing related-key security.

In the design paper, various cryptanalytic results on Skinny were already given by the authors. After the publication of the design, several third-party analyses followed, see [SMB16, ABC+17, LGS17, TAY17] for a selection. In Table 5.7, for all Skinny versions, we give the maximum number of rounds for which the round-reduced cipher can be broken by the best published attacks so far. Here, the term "broken" refers to a key-recovery attack (in the related-tweakey model) with a time complexity below $2^{\kappa}$ encryption operations⁷ and a data complexity below $2^{n}$. Those best attacks are based on impossible differential [Knu98, BBS05] and rectangle [BDK01] distinguishers. These results indicate that there is still a large security margin left.

⁷ We exclude attacks that are based on accelerated brute force.

Table 5.7: Number of rounds of Skinny that are broken by the best key-recovery attacks so far, published in [LGS17]. All those key-recovery attacks make use of a distinguisher over a smaller number of rounds. The percentage values show the ratio between the number of rounds broken and the total number of rounds t.

                    Tweakey length κ
Block length n      n                2n               3n
64                  19/32 (59.4%)    23/36 (63.9%)    27/40 (67.5%)
128                 19/40 (47.5%)    23/48 (47.9%)    27/56 (48.2%)

5.4 The Mantis Family of Low-Latency Tweakable Block Ciphers

In this section, we give a tweakable block cipher design which is optimized for low-latency implementations.⁸

The existing low-latency block cipher Prince [BCG+12] already provides a very good starting point for a low-latency design. Its round function basically follows the AES-like structure, employing a MixM operation of branch number four. The main difference between the overall structure of Prince and AES (and actually all other key-alternating ciphers we have already considered) is that the design is symmetric around a linear layer in the middle. This allows one to realize what was defined as α-reflection, i.e., the decryption $E_k^{-1}$ under a key k basically corresponds to encryption with a related key k + α, where α is a fixed constant. A natural way of turning a Prince-like design into a tweakable block cipher is to define a tweak schedule and evaluate the number of rounds until the minimum number of active S-boxes (in the related-tweak model) is high enough.

However, the problem is that the latency of a cipher is directly related to the number of rounds. Thus, it is crucial to find a design that ensures security already with a low number of rounds. Here, components of the block cipher Midori turn out to be beneficial. As outlined before, one of the key observations in Midori was that deviating from the ShiftRows operation used in the AES allows one to significantly improve upon the number of active S-boxes (in the single-key model) if a MixM layer with a branch number of only four is used. Moreover, the designers of Midori designed a 4-bit S-box that was optimized with respect to circuit depth. This directly leads to an improved version of Prince itself: Simply replace the round function by the function of Midori while keeping the entire design symmetric around the middle in order to preserve the α-reflection property. This simple change would result in a cipher with improved latency and improved security (measured by the number of active S-boxes) compared to Prince. It is exactly this Prince-like Midori that we use as a starting point for designing the low-latency block cipher Mantis.

The final step in the design of Mantis was to design a suitable tweak-scheduling algorithm that would guarantee a high number of active S-boxes in the setting where the attacker can control the difference in the tweak. Using again the MILP approach (see Section 5.4.5), we are able to demonstrate that only a slight increase in the number of rounds compared to Prince is already sufficient to get confidence in the resistance against differential attacks in the related-tweak model. It is important to note that we now make a distinction between related-key and related-tweak attacks. In the former, the adversary is allowed to insert differences in the key input, while in the latter case, the adversary is only allowed to insert differences in the tweak input. We have to make this distinction as Mantis is certainly not secure in the related-key setting because of the α-reflection property.

⁸ We acknowledge the contribution of Roberto Avanzi to the design of Mantis. He first suggested that we combine Prince with the TWEAKEY framework, and also modify the latter by permuting the tweak independently from the key, in order to save on the field multiplications of the tweak words. He then brainstormed with us on early versions of the design.

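Figure 5.1: Illustration of $\text{Mantis}_6$ (diagram not reproduced: the rounds $R_1, \dots, R_6$, the middle part S, M, S, and the inverse rounds $R_6^{-1}, \dots, R_1^{-1}$, with the tweak h updated by $P_\sigma$ in the first half and by $P_\sigma^{-1}$ in the second half, and with whitening keys ζ and ζ′). Here, $P_\sigma$ is a short notation for $\text{Permute}_\sigma$, M a short notation for MixM, and $\bar{\xi} := \xi + \alpha$.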

5.4.1 Specification

Mantis is a family of tweakable block ciphers that comes in different versions, i.e.,

$$\text{Mantis}_t : \mathbb{F}_2^n \times \mathbb{F}_2^{\kappa} \times \mathbb{F}_2^{\tau} \to \mathbb{F}_2^n$$

with a block length of n = 64, a key length of κ = 128, and a tweak length of τ = 64. The only difference between the versions, parametrized by a natural number t, is the number of rounds. In particular, as $\text{Mantis}_t$ is defined as a reflection cipher, the parameter t specifies the number of rounds of one half of the cipher. The overall design structure is illustrated in Figure 5.1. Compared to Skinny, we have to distinguish tweak and key input. The reason is that, because of the design structure as a reflection cipher, Mantis cannot resist related-key attacks, but only related-tweak attacks. By related-tweak (differential) attacks, we refer to the model in which the adversary is able to insert differences in the plaintext and in the tweak input, but not in the key input.

Similar to the block cipher Prince, the tweakable cipher $\text{Mantis}_t$ is based on the FX-construction [KR96a] and thus applies whitening keys before and after applying its core components. For that, the 128-bit initial key k is first split into $k = \zeta \| \xi$, where $\zeta, \xi \in \mathbb{F}_2^{64}$. Then, k is extended to the 192-bit key

$$(\zeta, \zeta', \xi) = (\zeta, (\zeta \ggg 1) + (\zeta \gg 63), \xi) ,$$

where $\ggg$ denotes a rotation and $\gg$ a shift to the right, and ζ, ζ′ are used as whitening keys in an FX-construction. The subkey ξ is used as the round key for all of the 2t rounds of $\text{Mantis}_t$. We decided to stick with the FX-construction for simplicity, even though other options exist, as described in [BCKL17].
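As an illustration of the key extension, the following Python sketch derives (ζ, ζ′, ξ) from a 128-bit key given as an integer. It rests on assumptions that are not fixed by the text above: ζ is taken to be the most significant 64 bits of k, the first operation is read as a rotation by one bit and the second as a logical shift by 63 bits (as in Prince), and the function names are hypothetical.

```python
MASK64 = (1 << 64) - 1


def rotr64(x, r):
    """Rotate a 64-bit word to the right by r bits."""
    return ((x >> r) | (x << (64 - r))) & MASK64


def extend_key(k):
    """Split a 128-bit key k = zeta || xi and derive the second whitening key.

    Assumption: zeta is the most significant half of k.
    """
    zeta = (k >> 64) & MASK64
    xi = k & MASK64
    # Addition in F_2^64 is a bitwise XOR.
    zeta_prime = rotr64(zeta, 1) ^ (zeta >> 63)
    return zeta, zeta_prime, xi


zeta, zeta_prime, xi = extend_key((1 << 127) | 0x0123456789ABCDEF)
print(hex(zeta), hex(zeta_prime), hex(xi))
```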

Initialization

In all versions of Mantis, the cipher’s internal states, the states of the keys, and the tweak state are each represented by a 4 × 4 array of words in $\mathbb{F}_2^4$. We denote an internal state (resp. key or tweak state) $x \in \mathbb{F}_2^{4 \cdot 16}$ as

$$x = \begin{pmatrix} x_1 & x_5 & x_9 & x_{13} \\ x_2 & x_6 & x_{10} & x_{14} \\ x_3 & x_7 & x_{11} & x_{15} \\ x_4 & x_8 & x_{12} & x_{16} \end{pmatrix}, \quad x_i \in \mathbb{F}_2^4 .$$

A plaintext $m = m_1 \| \dots \| m_{16} \in \mathbb{F}_2^{4 \cdot 16}$ is mapped to the cipher’s initial state in a row-wise manner. Similarly, the initial tweak input $h = h_1 \| \dots \| h_{16} \in \mathbb{F}_2^{4 \cdot 16}$ and all of the key inputs $\zeta = \zeta_1 \| \dots \| \zeta_{16} \in \mathbb{F}_2^{4 \cdot 16}$, $\zeta' = \zeta'_1 \| \dots \| \zeta'_{16} \in \mathbb{F}_2^{4 \cdot 16}$, and $\xi = \xi_1 \| \dots \| \xi_{16} \in \mathbb{F}_2^{4 \cdot 16}$ are loaded row-wise to the initial tweak state, resp. key states.

The Round Functions

A keyed instance of the round $R_i$ (i.e., $(R_i)_{k_i} = R_i(\cdot, k_i)$) in $\text{Mantis}_t$ operates on the cipher’s internal state as

$$\text{MixM} \circ \text{Permute}_p \circ \text{Add}_{k_i} \circ \text{Add}_{c_i} \circ S_{\mathrm{Sb}_{\mathrm{Mid64}}},$$

where the constant $c_i$ depends on the round index i. In the following, we describe all those components of the rounds.

$S_{\mathrm{Sb}_{\mathrm{Mid64}}}$ (SubCells). The involutory S-box $\mathrm{Sb}_{\mathrm{Mid64}}$ as given in Table 2.2 is applied to every word of the internal state. Using the Midori S-box is beneficial as it is especially optimized for small area and low circuit depth.

$\text{Add}_{c_i}$ ($\text{AddConstant}_i$). In round $R_i$, the i-th round constant $c_i$, as defined below, is added to the internal state. To demonstrate that the round constants are not chosen with intentional weaknesses, they are defined in a similar way as for Prince, i.e., we use the first fractional digits of the base-16 representation of the irrational number π to generate those constants (actually the very first digits correspond to α defined below). Note that, in contrast to Prince, the constants are added row-wise instead of column-wise.

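$$\alpha = \begin{pmatrix} 2 & 4 & 3 & F \\ 6 & A & 8 & 8 \\ 8 & 5 & A & 3 \\ 0 & 8 & D & 3 \end{pmatrix}, \quad c_1 = \begin{pmatrix} 1 & 3 & 1 & 9 \\ 8 & A & 2 & E \\ 0 & 3 & 7 & 0 \\ 7 & 3 & 4 & 4 \end{pmatrix}, \quad c_2 = \begin{pmatrix} A & 4 & 0 & 9 \\ 3 & 8 & 2 & 2 \\ 2 & 9 & 9 & F \\ 3 & 1 & D & 0 \end{pmatrix},$$

$$c_3 = \begin{pmatrix} 0 & 8 & 2 & E \\ F & A & 9 & 8 \\ E & C & 4 & E \\ 6 & C & 8 & 9 \end{pmatrix}, \quad c_4 = \begin{pmatrix} 4 & 5 & 2 & 8 \\ 2 & 1 & E & 6 \\ 3 & 8 & D & 0 \\ 1 & 3 & 7 & 7 \end{pmatrix}, \quad c_5 = \begin{pmatrix} B & E & 5 & 4 \\ 6 & 6 & C & F \\ 3 & 4 & E & 9 \\ 0 & C & 6 & C \end{pmatrix},$$

$$c_6 = \begin{pmatrix} C & 0 & A & C \\ 2 & 9 & B & 7 \\ C & 9 & 7 & C \\ 5 & 0 & D & D \end{pmatrix}, \quad c_7 = \begin{pmatrix} 3 & F & 8 & 4 \\ D & 5 & B & 5 \\ B & 5 & 4 & 7 \\ 0 & 9 & 1 & 7 \end{pmatrix}, \quad c_8 = \begin{pmatrix} 9 & 2 & 1 & 6 \\ D & 5 & D & 9 \\ 8 & 9 & 7 & 9 \\ F & B & 1 & B \end{pmatrix}.$$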

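$\text{Permute}_p$ (PermuteCells) is equal to the $\text{Permute}_p$ transformation of Midori-64. To recall, it operates as a permutation of the words of the state as

$$\begin{pmatrix} x_1 & x_5 & x_9 & x_{13} \\ x_2 & x_6 & x_{10} & x_{14} \\ x_3 & x_7 & x_{11} & x_{15} \\ x_4 & x_8 & x_{12} & x_{16} \end{pmatrix} \mapsto \begin{pmatrix} x_1 & x_{15} & x_{10} & x_8 \\ x_{11} & x_5 & x_4 & x_{14} \\ x_6 & x_{12} & x_{13} & x_3 \\ x_{16} & x_2 & x_7 & x_9 \end{pmatrix},$$

$$p = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\ 1 & 11 & 6 & 16 & 15 & 5 & 12 & 2 & 10 & 4 & 13 & 7 & 8 & 14 & 3 & 9 \end{pmatrix}.$$

Note that this permutation ensures a higher number of active S-boxes compared to the choice made in Prince.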
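To make the indexing convention of the two-line notation concrete, the following Python sketch applies p to a state stored as a flat list $[x_1, \dots, x_{16}]$, i.e., in the column-wise order in which the words are numbered above. The function names are hypothetical; the reading "position i of the new state receives $x_{p(i)}$" is inferred from comparing the bottom row of p with the displayed mapping.

```python
# Bottom row of the two-line notation of p (1-based word indices).
P = [1, 11, 6, 16, 15, 5, 12, 2, 10, 4, 13, 7, 8, 14, 3, 9]


def permute_cells(state, perm=P):
    """Word i of the new state is word perm(i) of the old state."""
    return [state[perm[i] - 1] for i in range(16)]


def invert(perm):
    """Return the inverse permutation in the same 1-based list form."""
    inverse = [0] * len(perm)
    for position, source in enumerate(perm, start=1):
        inverse[source - 1] = position
    return inverse


def as_rows(state):
    """Arrange a flat state (numbered column by column) as a list of 4 rows."""
    return [[state[4 * col + row] for col in range(4)] for row in range(4)]


labels = list(range(1, 17))  # stands for x_1, ..., x_16
for row in as_rows(permute_cells(labels)):
    print(row)
```

Applying `permute_cells` with `invert(P)` undoes the permutation, which is what the inverse rounds in the second half of a reflection cipher need.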

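MixM (MixColumns). As in Midori, the involutory matrix

$$M = \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix} \in \mathrm{GL}_4(\mathbb{F}_{2^4})$$

is applied to every column of the internal state.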
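Because M has only the coefficients 0 and 1, multiplying a column by M needs no field multiplications: every output word is the XOR of the three other input words. A minimal Python sketch (hypothetical function names, state stored column by column as a flat list of 4-bit integers) is:

```python
def mix_column(column):
    """Multiply one column (four 4-bit words) by M = circ(0, 1, 1, 1)."""
    total = column[0] ^ column[1] ^ column[2] ^ column[3]
    # y_j = sum of all words except x_j = total + x_j over F_2^4.
    return [total ^ word for word in column]


def mix_m(state):
    """Apply mix_column to each of the four columns of a flat 16-word state."""
    out = []
    for col in range(4):
        out.extend(mix_column(state[4 * col:4 * col + 4]))
    return out


print(mix_column([1, 2, 3, 4]))
```

Applying `mix_m` twice returns the original state, which reflects that M is involutory.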


Encryption

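In the following, we define $H_t : \mathbb{F}_2^{64} \times \mathbb{F}_2^{64} \times \mathbb{F}_2^{64} \to \mathbb{F}_2^{64}$ as the application of the t rounds $R_1, \dots, R_t$ and one additional $S_{\mathrm{Sb}_{\mathrm{Mid64}}}$ layer. More precisely, each instance is defined as

$$H_t(\cdot, \xi, h) = S_{\mathrm{Sb}_{\mathrm{Mid64}}} \circ R_t(\cdot, \text{Permute}_\sigma^t(h) + \xi) \circ \dots \circ R_1(\cdot, \text{Permute}_\sigma^1(h) + \xi) ,$$

where $\text{Permute}_\sigma$ permutes the words of the tweak state h as

$$\begin{pmatrix} h_1 & h_5 & h_9 & h_{13} \\ h_2 & h_6 & h_{10} & h_{14} \\ h_3 & h_7 & h_{11} & h_{15} \\ h_4 & h_8 & h_{12} & h_{16} \end{pmatrix} \mapsto \begin{pmatrix} h_{10} & h_6 & h_{12} & h_{16} \\ h_1 & h_5 & h_9 & h_{13} \\ h_{14} & h_4 & h_8 & h_2 \\ h_3 & h_7 & h_{11} & h_{15} \end{pmatrix},$$

$$\sigma = \begin{pmatrix} 1 & 2 & 3 & 4 & 5 & 6 & 7 & 8 & 9 & 10 & 11 & 12 & 13 & 14 & 15 & 16 \\ 10 & 1 & 14 & 3 & 6 & 5 & 4 & 7 & 12 & 9 & 8 & 11 & 16 & 13 & 2 & 15 \end{pmatrix}.$$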
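The round tweakeys $\text{Permute}_\sigma^i(h) + \xi$ used inside $H_t$ can be generated iteratively. The Python sketch below assumes the same flat, column-wise numbering $[h_1, \dots, h_{16}]$ as in the displayed arrays and uses hypothetical function names; it is meant to illustrate the schedule only and is not a full implementation of the cipher.

```python
# Bottom row of the two-line notation of sigma (1-based word indices).
SIGMA = [10, 1, 14, 3, 6, 5, 4, 7, 12, 9, 8, 11, 16, 13, 2, 15]


def permute_tweak(h):
    """Word i of the new tweak state is word sigma(i) of the old one."""
    return [h[SIGMA[i] - 1] for i in range(16)]


def round_tweakeys(h, xi, t):
    """Return [Permute_sigma^1(h) + xi, ..., Permute_sigma^t(h) + xi]."""
    tweakeys = []
    current = list(h)
    for _ in range(t):
        current = permute_tweak(current)
        # Addition of 4-bit words over F_2 is a bitwise XOR.
        tweakeys.append([a ^ b for a, b in zip(current, xi)])
    return tweakeys


labels = list(range(1, 17))  # stands for h_1, ..., h_16
print(permute_tweak(labels))
```

Read as a permutation of the 16 word positions, σ consists of one cycle of length 14 and one of length 2, so the tweak state returns to its initial value after 14 applications, which is more than the number of forward rounds in any of the versions considered here.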

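With this notation, we can define the tweakable block cipher $\text{Mantis}_t : \mathbb{F}_2^{64} \times \mathbb{F}_2^{128} \times \mathbb{F}_2^{64} \to \mathbb{F}_2^{64}$ by giving each instance as

$$\text{Mantis}_t(\cdot, k, h) = \text{Add}_{\zeta' + \xi + \alpha + h} \circ H_t^{-1}(\cdot, \xi + \alpha, h) \circ \text{MixM} \circ H_t(\cdot, \xi, h) \circ \text{Add}_{\zeta + \xi + h} .$$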

Decryption

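Because of the α-reflection property, we have

$$\text{Mantis}_t^{-1}(\cdot, k, h) = \text{Add}_{\zeta + \xi + h} \circ H_t^{-1}(\cdot, \xi, h) \circ \text{MixM} \circ H_t(\cdot, \xi + \alpha, h) \circ \text{Add}_{\zeta' + \xi + \alpha + h} .$$

Test vectors for $\text{Mantis}_t$, $t \in \{5, 6, 7, 8\}$, can be found in [BJK+16b].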

5.4.2 Design Rationale

The goal was to design a cipher competitive to Prince in terms of latency with the advantage of being tweakable. In contrast to Skinny, we distinguish between tweak and key input. In particular, we allow an attacker to control the tweak but not the key. Thus, similar to Prince, we do not claim related-key security. In order to reach this goal, again, several components are borrowed from already existing ciphers. Note that, as we aim for an efficient unrolled implementation, one is not restricted to a classical round-iterated design. As latency was the main optimization goal, the security margin of Mantis is considerably smaller than that of Skinny. We chose to include the number of rounds as a defining parameter in each Mantis version in order to specify more aggressive and more conservative versions. Those can be targets for external cryptanalysis. While the small versions of Mantis for t ≤ 4 can only be considered as “toy ciphers” and are just defined for the sake of completeness, $\text{Mantis}_5$ refers to the most aggressive practical solution. We refer to Section 5.4.3 for the particular security claims and the best external cryptanalysis so far.

α-Reflection Property

$\text{Mantis}_t$ is designed as a reflection cipher, i.e., encryption under a key k equals decryption under a related key. This significantly reduces the implementation overhead for decryption. Therefore, the parameter t denotes only half the number of rounds, as the second half of the cipher is basically the inverse of the first half. It is advantageous that MixM is involutory since we need the middle part of the cipher to be an involution.

The Choice of the Linear Layer

To achieve low latency in a fully unrolled implementation, one is limited in the number of rounds to be applied. Therefore, one has to achieve very fast diffusion and guarantee a high number of active S-boxes. To meet those requirements, we adopted the linear layer of Midori. It provides full diffusion after only three rounds and guarantees a high number of active S-boxes in the single-key setting. We refer to Table 5.8 for the actual bounds. The bounds (both in the single-key and in the related-tweak setting) were computed with the same MILP approach as in Skinny. Section 5.4.5 explains how the constraints for modelling the particular linear layer of Mantis were defined.

The Choice of the Round Constants

Compared to Midori, whose round constants are extremely sparse and structured, we decided to employ very dense (and basically random) round constants over the whole state, similar to what was done in Prince. This should in particular protect against invariant attacks. In Chapter 6, we look at the resistance of Mantis against invariant attacks in more detail.

The Choice of the S-box

For the S-box in Mantis we used the same S-box as in Midori-64. This S-box $\mathrm{Sb}_{\mathrm{Mid64}}$ has a low circuit depth and thus can be implemented to achieve a significantly lower latency than the Prince S-box. The maximum differential probability is $p_{\mathrm{Sb}_{\mathrm{Mid64}}} = 2^{-2}$ and the maximum absolute correlation is $c_{\mathrm{Sb}_{\mathrm{Mid64}}} = 2^{-1}$.

The Choice of the Tweak Permutation σ

Our aim was to choose a word permutation $\sigma \in S_{16}$ such that five rounds (plus one additional $S_{\mathrm{Sb}_{\mathrm{Mid64}}}$ layer) guarantee at least 16 active S-boxes in the related-tweak setting. This would guarantee at least 32 active S-boxes for $\text{Mantis}_5$, which is enough for the standard wide-trail argument on the resistance against (related-tweak) differential attacks (resp. linear attacks) based on a single trail. Since there are 16! possibilities for σ, which is too many for an exhaustive search, we restricted ourselves to a subclass of 8! tweak permutations. The restriction is that two complete rows (without changing the position of the words in those rows) are permuted to different rows. In our case, the first and third row are permuted to the second and fourth row, respectively. The bounds were derived using the MILP approach. We tested several thousand choices for the permutation σ and found out that 16 active S-boxes were the best possible to reach over $H_5$. Out of these optimal choices, we took the permutation that maximized the bound for $\text{Mantis}_5$ and, as a second step, for $\text{Mantis}_6$. We refer to Table 5.8 for the actual bounds.

Table 5.8: Lower bounds on the number of linear (and differential) active S-boxes in the single-key model and on the number of active S-boxes in related-tweak (RT) differential attacks for Mantis.

          Mantis2   Mantis3   Mantis4   Mantis5   Mantis6   Mantis7   Mantis8
Linear    14        32        46        62        70        76        82
RT        6         12        20        34        44        50        56

5.4.3 Security Claim and Best Cryptanalysis so far

For $\text{Mantis}_7$, the original security claim is that any adversary who is in possession of $2^n$ chosen plaintext/ciphertext pairs which were obtained under chosen tweaks, but with a fixed unknown key, needs at least $2^{126-n}$ calls to the encryption function in order to recover the secret key. Thus, the security claims are the same as for Prince, except that also related-tweak security is claimed. Until now, no attack on $\text{Mantis}_7$ that invalidates this claim has been published.

Mantis was designed as a cipher having only a small security margin. In the original design document, further cryptanalysis on the more aggressive versions of Mantis was explicitly encouraged. The designers’ claim on the aggressive version $\text{Mantis}_5$ was security against practical attacks, similar to what has been considered in the Prince challenge. More precisely, it was claimed that no related-tweak attack (better than the generic claim above) is possible against $\text{Mantis}_5$ with less than $2^{30}$ chosen (resp. $2^{40}$ known) plaintext/ciphertext pairs. It turned out that this claim was too optimistic and it was invalidated by external cryptanalysis. In particular, Dobraunig et al. presented a (related-tweak) key-recovery attack on $\text{Mantis}_5$ with a (theoretical) data complexity of $2^{28}$ chosen plaintexts and a time complexity of about $2^{38}$ block cipher operations. They implemented the attack and were able to recover the key using $2^{30}$ chosen plaintexts in about one core hour [DEKM16]. Very recently, Eichlseder and Kales presented a (related-tweak) key-recovery attack on $\text{Mantis}_6$ with a data complexity of $2^{53.94}$ chosen plaintexts and a time complexity of $2^{53.94}$ computations [EK17]. The attacks use differential cryptanalysis methods and exploit families of differential trails.

5.4.4 Unrolled Implementations

In Table 5.9 and Table 5.10, we list results of unrolled implementations for Mantis, constrained for the smallest area and the shortest latency, respectively.

Table 5.9: Unrolled implementations of several Mantis versions constrained for the smallest area (both encryption and decryption). Cell library: UMC L180 0.18 µm.

           Area (GE)   Delay (ns)   Ref.
Mantis5    8544        15.95        [BJK+16b]
Mantis6    9861        17.60        [BJK+16b]
Mantis7    11209       20.50        [BJK+16b]
Mantis8    12533       21.34        [BJK+16b]
Prince     8344        16.00        [MS16]

Table 5.10: Unrolled implementations of several Mantis versions constrained for the shortest delay (both encryption and decryption). Cell library: UMC L180 0.18 µm.

           Area (GE)   Delay (ns)   Ref.
Mantis5    13424       9.00         [BJK+16b]
Mantis6    18375       10.00        [BJK+16b]
Mantis7    23926       11.00        [BJK+16b]
Mantis8    30252       12.00        [BJK+16b]
Prince     17693       9.00         [MS16]

5.4.5 The MILP Constraints for the MixColumns Operation

For computing the minimum number of active S-boxes in Mantis, we used the same MILP approach as for Skinny. The main difference is that the linear constraints for modelling the (word-wise) branching transitions of the transformation

$$M : \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} \mapsto \begin{pmatrix} 0 & 1 & 1 & 1 \\ 1 & 0 & 1 & 1 \\ 1 & 1 & 0 & 1 \\ 1 & 1 & 1 & 0 \end{pmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = \begin{pmatrix} x_2 + x_3 + x_4 \\ x_1 + x_3 + x_4 \\ x_1 + x_2 + x_4 \\ x_1 + x_2 + x_3 \end{pmatrix} =: \begin{pmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \end{pmatrix}$$

are more complex than for the linear mixing in Skinny. Therefore, we do not describe the complete model, but just give the constraints for M instead. In particular, let $i_1, \dots, i_4, o_1, \dots, o_4, d$ be decision variables that can take values in $\{0, 1\} \subseteq \mathbb{Z}$. We define

$$C_M[i_1, i_2, i_3, i_4, o_1, o_2, o_3, o_4, d] := \bigcup_{j=1}^{5} C_j[i_1, i_2, i_3, i_4, o_1, o_2, o_3, o_4, d] ,$$

where the five sets of linear constraints $C_j[i_1, i_2, i_3, i_4, o_1, o_2, o_3, o_4, d]$, for $j \in \{1, \dots, 5\}$, are given as follows:

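1. Constraints for describing the branch number:

$$C_1[i_1, i_2, i_3, i_4, o_1, o_2, o_3, o_4, d] = \Big\{ \sum_{j=1}^{4} (i_j + o_j) \geq 4d \Big\} \cup \bigcup_{j=1}^{4} \{ i_j \leq d, \; o_j \leq d \}$$

2. An inactive input vector cannot turn active and vice versa:

$$C_2[i_1, i_2, i_3, i_4, o_1, o_2, o_3, o_4, d] = \Big\{ \sum_{j=1}^{4} i_j \geq d, \; \sum_{j=1}^{4} o_j \geq d \Big\}$$

3. If $y_l$ (resp. $x_l$) is active then at least one of the $x_j$ (resp. $y_j$) with $j \neq l$ is active:

$$C_3[i_1, i_2, i_3, i_4, o_1, o_2, o_3, o_4, d] = \Big\{ \sum_{j \in \{1,2,3\}} i_j \geq o_4, \; \sum_{j \in \{1,2,4\}} i_j \geq o_3, \; \sum_{j \in \{1,3,4\}} i_j \geq o_2, \; \sum_{j \in \{2,3,4\}} i_j \geq o_1 \Big\}$$
$$\cup \Big\{ \sum_{j \in \{1,2,3\}} o_j \geq i_4, \; \sum_{j \in \{1,2,4\}} o_j \geq i_3, \; \sum_{j \in \{1,3,4\}} o_j \geq i_2, \; \sum_{j \in \{2,3,4\}} o_j \geq i_1 \Big\}$$

4. If $y_j$ (resp. $x_j$) is active then, for all $l \neq j$, at least one of $x_j, x_l, y_l$ (resp. $y_j, y_l, x_l$) is active:

$$C_4[i_1, i_2, i_3, i_4, o_1, o_2, o_3, o_4, d] = \bigcup_{\substack{(j,l) \in \mathbb{N}_{\leq 4} \times \mathbb{N}_{\leq 4} \\ j \neq l}} \{ i_j + i_l + o_l \geq o_j, \; o_j + o_l + i_l \geq i_j \}$$

5. Additional constraints:

$$C_5[i_1, i_2, i_3, i_4, o_1, o_2, o_3, o_4, d] = \bigcup_{\substack{(d,j) \in \mathbb{N}_{\leq 4} \times \mathbb{N}_{\leq 4} \\ d \neq j}} \Big\{ o_j + \sum_{\substack{l \neq j \\ l \neq d}} i_l \geq i_d, \; i_j + \sum_{\substack{l \neq j \\ l \neq d}} o_l \geq o_d \Big\}$$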

The solutions for $i_1, \dots, i_4, o_1, \dots, o_4$ for which there exists a $d \in \{0, 1\}$ such that all constraints in $C_M[i_1, i_2, i_3, i_4, o_1, o_2, o_3, o_4, d]$ are fulfilled correspond to the possible branching transitions as depicted in Figure 4.2.

As for Skinny, the corresponding MILP instances were solved using Gurobi [GO16].
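The claim that the constraint sets describe exactly the word-wise transitions of M can be cross-checked by brute force, without any MILP solver. The Python sketch below is an illustration under our own reading of the five constraint sets (in particular, the pair of indices in the fifth set is called `m, j` in the code to keep it apart from the decision variable `d`); all function names are hypothetical. It enumerates all 16⁴ inputs over 4-bit words, records which input and output words are non-zero, and compares the result with the 0/1 assignments admitted by the constraints.

```python
from itertools import product


def true_transitions():
    """Activity patterns (i1..i4, o1..o4) that M = circ(0,1,1,1) can produce."""
    patterns = set()
    for x in product(range(16), repeat=4):
        total = x[0] ^ x[1] ^ x[2] ^ x[3]
        y = [total ^ w for w in x]  # y_j = XOR of all words except x_j
        patterns.add(tuple(int(w != 0) for w in x) + tuple(int(w != 0) for w in y))
    return patterns


def satisfies_cm(i, o, d):
    """Check the five sets of linear constraints for one 0/1 assignment."""
    idx = range(4)
    # Set 1: branch number four, and d is forced to 1 by any active word.
    if sum(i) + sum(o) < 4 * d or any(i[j] > d or o[j] > d for j in idx):
        return False
    # Set 2: inactive input <=> inactive output.
    if sum(i) < d or sum(o) < d:
        return False
    # Set 3: an active word needs an active word among the other three on the other side.
    for l in idx:
        if sum(i[j] for j in idx if j != l) < o[l]:
            return False
        if sum(o[j] for j in idx if j != l) < i[l]:
            return False
    for j in idx:
        for l in idx:
            if j == l:
                continue
            # Set 4.
            if i[j] + i[l] + o[l] < o[j] or o[j] + o[l] + i[l] < i[j]:
                return False
            # Set 5, with (m, j) = (j, l) playing the role of the index pair.
            rest = [r for r in idx if r not in (j, l)]
            if o[l] + sum(i[r] for r in rest) < i[j]:
                return False
            if i[l] + sum(o[r] for r in rest) < o[j]:
                return False
    return True


def feasible_patterns():
    """All (i, o) for which some d in {0, 1} satisfies every constraint."""
    return {bits for bits in product((0, 1), repeat=8)
            if any(satisfies_cm(bits[:4], bits[4:], d) for d in (0, 1))}


print(len(true_transitions()), feasible_patterns() == true_transitions())
```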


5.5 Conclusion and Future Work

We have presented the two tweakable block cipher designs Skinny and Mantis. While Skinny is designed to be a flexible lightweight cipher for various applications, Mantis is optimized for low-latency applications. Because of the extreme focus on latency, the security margin of Mantis is rather small. We explicitly encourage further cryptanalysis on the two designs.

Recently, Avanzi published the tweakable block cipher family QARMA for low-latency applications [Ava17]. It reuses components of Mantis, but adds several innovative design choices. For instance, its reflection property is built upon a non-involutory and keyed layer in the middle. Further, instead of using a MixColumns matrix with binary coefficients, it considers more general matrices over rings with zero-divisors for defining the linear layer. Those allow more possibilities while at the same time keeping the latency low. More analysis of QARMA and its general structure would be an interesting topic for future work.

Part II

Analysis of Lightweight Block Ciphers

Chapter 6

Invariant Attacks

Large parts of this chapter (with the exception of Section 6.3) are based on the publication [BCLR17] which is joint work with Anne Canteaut, Gregor Leander and Yann Rotella.¹ All authors equally contributed. The main contribution of the author was in the first part of the paper (here Section 6.4), i.e., the algorithmic approach of proving the non-applicability of the invariant attack. In Section 6.5, he contributed to the proof of Theorem 6.1 and by proving Proposition 6.11.

6.1 Introduction

As explained in Section 2.5, the main idea of lightweight cryptography can be summarized as designing cryptographic primitives that put an extreme focus on performance. This in turn resulted in many new designs which achieve better performance by essentially removing any operations that are not strictly necessary (or believed to be necessary) for the security of the scheme. One particularly interesting case of reducing the complexity is the design of the key schedule and the choice of round constants. Both of these are arguably the parts that we understand least and only very basic design criteria are available on how to choose a good key schedule or how to choose good round constants. Consequently, many of the lightweight block ciphers avoid any complexity in the key schedule at all. Instead, identical keys are used in the rounds and (often very simple and sparse) round constants are added on top (e.g., see LED [GPPR11], Skinny (Chapter 5), Prince [BCG+12], Mantis (Chapter 5), Midori [BBI+15], to mention a few).

However, several of those schemes were recently broken using a structural attack called invariant subspace attack [LAAZ11, LMR15], as well as the recently published generalization called nonlinear invariant attack [TLS16]. Indeed, those attacks have been successfully applied to quite a number of recent designs including PRINTcipher [LAAZ11], Midori-64 [GJN+16, TLS16], iSCREAM [LMR15] and SCREAM [TLS16], NORX v2.0 [CFG+17], Simpira v1 [Røn16] and Haraka v.0 [Jea16]. Both attacks, which we jointly call invariant attacks, notably exploit the fact that these lightweight primitives have a very simple key schedule where the same round key (up to the addition of a round constant) is applied in several rounds.

¹ The original article published by Springer-Verlag is available at DOI: 10.1007/978-3-319-63715-0_22 (© IACR 2017), https://dx.doi.org/10.1007/978-3-319-63715-0_22. Here, parts of the text are rearranged, modified or omitted.

It is therefore of major importance to study invariant attacks in more detail and to determine whether a given primitive is vulnerable. More generally, it would be interesting to exhibit some design criteria for the building blocks of a cipher which guarantee the resistance against those attacks. As mentioned above, this would shed light on the fundamental open question on how to select proper round constants.

After explaining the idea of invariant attacks, we show in Section 6.3 how they are related to the existence of linear approximations. In particular, we see that in many cases, the existence of an invariant for a keyed instance $E_k$ directly implies the existence of a linear approximation over $E_k$ with a significantly high bias. This comes from the relation between invariant attacks and nonlinear approximations. Although this connection to linear cryptanalysis is a quite interesting observation, we are unfortunately not able to state anything more about those linear approximations besides their existence.

Then, we analyze the resistance of several lightweight substitution-permutation ciphers against invariant attacks. Our framework covers both the invariant subspace attack and the recently published nonlinear invariant attack. By exactly formalizing the requirements of those attacks, we are able to reveal the precise mathematical properties that render those attacks applicable. Indeed, as we will detail below, the rational canonical form of the linear layer will play a major role in our analysis. Our results show that the linear layer and the round constants have a major impact on the resistance against invariant attacks, while this type of attacks was previously believed to be mainly related to the behaviour of the S-box, see e.g., [GJN+16]. In particular, if the number of invariant factors of the linear layer is small (for instance, if its minimal polynomial has a high degree), we can easily find round constants which guarantee the resistance to all types of invariant attacks, independently of the choice of the S-box layer. In order to ease the application of our results in practice, we implemented all our findings in Sage [StSDT16] and added the source code in Listing 6.1 at the end of this chapter.

In our framework, the resistance against invariant attacks is defined in the following sense: For each instantiation of the cipher with a fixed key, there is no function that is invariant for both the substitution layer and for the linear part of each round. This implies that any adversary who still wants to apply an invariant attack necessarily has to search for invariants over the whole round function, which appears to have a cost exponential in the block size in general. Indeed, all published invariant attacks we are aware of exploit weaknesses in the underlying building blocks of the round. Therefore, our notion of resistance guarantees complete security against the major class of invariant attacks, including all variants published so far.

This analysis of the resistance against invariant attacks is split in two parts, a first part (Section 6.4) which can be seen as the attacker’s view on the problem and a second part (Section 6.5) which reflects more on the designer’s decision on how to avoid those attacks. More precisely, Section 6.4 details an algorithmic approach which enables an adversary to spot a possible weakness with respect to invariant attacks within a given cipher. For the lightweight block ciphers Skinny-64, Prince and $\text{Mantis}_7$, the 7-round version of Mantis, this algorithm is used to prove the resistance against invariant attacks.²

These results come from the following observation: Let L denote the linear layer of the cipher in question and let $c_1, \dots, c_t \in \mathbb{F}_2^n$ be the (XOR) differences between two round constants involved in rounds where the same round key is applied. Furthermore, let $W_L(c_1, \dots, c_t)$ denote the smallest L-invariant subspace of $\mathbb{F}_2^n$ that contains all $c_1, \dots, c_t$. Then, one can guarantee resistance if $W_L(c_1, \dots, c_t)$ covers the whole input space $\mathbb{F}_2^n$. As a direct result, we will see that in Skinny-64, there are enough differences between round constants to guarantee the full dimension of the corresponding L-invariant subspace. This directly implies the resistance of Skinny-64, and this result holds for any reasonable choice of the S-box layer.³ In contrast, for Prince and $\text{Mantis}_7$, there are not enough suitable $c_i$ to generate a subspace $W_L(c_1, \dots, c_t)$ with full dimension. However, for both primitives, we are able to keep the security argument by also considering the S-box layer, using the fact that the dimension of $W_L(c_1, \dots, c_t)$ is not too low in both cases.
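The dimension of $W_L(c_1, \dots, c_t)$ can be computed by repeatedly applying L to the constants and performing Gaussian elimination over $\mathbb{F}_2$. The Python sketch below is a minimal stand-in and not the Sage code of Listing 6.1: vectors are stored as integer bitmasks, the linear map is given by the images of the unit vectors, and the 4-bit rotation `ROT4` is a made-up toy example rather than the linear layer of any of the ciphers discussed here.

```python
def reduce_and_insert(basis, v):
    """Insert v into an XOR basis (dict: leading bit -> vector); return True if it was new."""
    while v:
        lead = v.bit_length() - 1
        if lead not in basis:
            basis[lead] = v
            return True
        v ^= basis[lead]
    return False


def apply_linear(columns, v):
    """Apply the F_2-linear map whose image of the j-th unit vector is columns[j]."""
    out, j = 0, 0
    while v:
        if v & 1:
            out ^= columns[j]
        v >>= 1
        j += 1
    return out


def invariant_subspace_dim(columns, constants):
    """Dimension of the smallest L-invariant subspace containing all constants."""
    basis = {}
    pending = list(constants)
    while pending:
        v = pending.pop()
        # Only vectors that enlarge the span can produce new images under L.
        if reduce_and_insert(basis, v):
            pending.append(apply_linear(columns, v))
    return len(basis)


# Toy linear layer on F_2^4: cyclic rotation of the four coordinates.
ROT4 = [1 << ((j + 1) % 4) for j in range(4)]
print(invariant_subspace_dim(ROT4, [0b0001]))
```

In this toy case a single well-chosen constant already generates the whole space, whereas the constant `0b1111` is fixed by the rotation and only spans a one-dimensional subspace.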

As a second part, in Section 6.5 we provide an in-depth analysis of the impact of the round constants and of the linear layer on the resistance against invariant attacks. The first question we study is the following:

Given the linear layer L of a cipher, what is the minimum number of round constants needed to guarantee resistance against the invariant attack, independently from the choice of the S-box?

Figure 6.1 shows the maximal dimension that can be reached by $W_L(c_1, \dots, c_t)$ when t values of $c_i$ are considered. It shows in particular that the whole input space can be covered with only t = 4 values in the case of Skinny-64, while 8 and 16 values are needed for Prince and Mantis, respectively. This explains why, even though Prince and Mantis apply very dense round constants, the dimension does not increase rapidly for higher values of t. The observations in Fig. 6.1 are deduced from the invariant factors (or the rational canonical form) of the linear layer, as we show by the following theorem.

² For $\text{Mantis}_7$, the resistance against invariant attacks is only proven for the untweaked version, i.e., the tweak input is considered to be zero.

³ We have to require that the S-box has no component of degree 1. If the S-box has such a linear or affine component, the cipher would be vulnerable against linear cryptanalysis.

Figure 6.1: For Skinny-64, Prince and Mantis, this figure shows the highest possible dimension of $W_L(c_1, \dots, c_t)$ for t values $c_1, \dots, c_t$ (see Theorem 6.1). (Plot not reproduced; horizontal axis: t from 0 to 16, vertical axis: $\max \dim W_L(c_1, \dots, c_t)$ from 8 to 64.)

Figure 6.2: For several lightweight ciphers, this figure shows the probability that $W_L(c_1, \dots, c_t) = \mathbb{F}_2^n$ for uniformly random constants $c_i$ (see Theorem 6.3). (Plot not reproduced; curves for LED, Skinny-64, Prince and Mantis; horizontal axis: t from 0 to 26, vertical axis: $\Pr(\dim W_L(c_1, \dots, c_t) = 64)$ from 0 to 1.)

Theorem 6.1. Let $Q_1, \dots, Q_r$ be the invariant factors of the linear layer L and let $t \leq r$. Then

$$\max_{c_1, \dots, c_t \in \mathbb{F}_2^n} \dim W_L(c_1, \dots, c_t) = \sum_{i=1}^{t} \deg Q_i .$$

For the special case of a single constant c, the maximal dimension of $W_L(c)$ is equal to the degree of the greatest invariant factor of L, i.e., the minimal polynomial of L. We will also explain how the particular round constants must be chosen in order to guarantee the best possible resistance.

As designers often choose random round constants to instantiate the primitive, we were also interested in the following question:

How many randomly chosen round constants are needed to guarantee the best possible resistance with a high probability?

We derive an exact formula for the probability that the subspace $W_L(c_1, \dots, c_t)$ has full dimension for t uniformly random constants $c_i$. Fig. 6.2 gives an overview of this probability for several lightweight designs.

6.2 Preliminaries and Explanation of the Attack

Let $f \in \mathcal{B}_n$ be an n-bit Boolean function. The derivative of f in direction $\alpha \in \mathbb{F}_2^n$ is the Boolean function defined by $\Delta_\alpha f : x \mapsto f(x + \alpha) + f(x)$. The following terminology will be extensively used in this chapter. It refers to the constant derivatives which play a major role in the context of invariant attacks.

Definition 6.1 (see [Lai95]). An element $\alpha \in \mathbb{F}_2^n$ is said to be a linear structure of $f \in \mathcal{B}_n$ if the corresponding derivative $\Delta_\alpha f$ is constant. The set of all linear structures of a function f is a linear subspace of $\mathbb{F}_2^n$ and is called the linear space of f:

$$\mathrm{LS}(f) := \{ \alpha \in \mathbb{F}_2^n \mid \Delta_\alpha f = \varepsilon, \; \varepsilon \in \{\mathbf{0}, \mathbf{1}\} \} .$$

The nonlinear invariant attack was described in [TLS16] as a distinguishing attack on block ciphers. Indeed, the authors were able to construct an attack on the lightweight schemes Midori-64, SCREAM [GLS+15], and iSCREAM [GLS+14]. For an (n, κ)-block cipher E, the idea is to find a subset $S \subset \mathbb{F}_2^n$ for which the partition of the input set into $S \cup (\mathbb{F}_2^n \setminus S)$ is preserved by the keyed instance $E_k$ for as many keys k as possible, i.e.,

$$E_k(S) = S \quad \text{or} \quad E_k(S) = \mathbb{F}_2^n \setminus S .$$

The keys for which this property is fulfilled are called weak keys. The special case when S is an affine space corresponds to the so-called invariant subspace attack [LAAZ11].

An equivalent formulation is obtained by considering the n-bit Boolean function $g \in \mathcal{B}_n$ defined by g(x) = 1 if and only if $x \in S$. Then, finding an invariant consists in finding a function $g \in \mathcal{B}_n$ such that $g + g \circ E_k$ is constant. This motivates the following definition.

Definition 6.2 (see [TLS16]). Let $F : \mathbb{F}_2^n \to \mathbb{F}_2^n$ be a permutation. A Boolean function $g \in \mathcal{B}_n$ for which $g + g \circ F \in \{\mathbf{0}, \mathbf{1}\}$ is called an invariant for F. We denote the set of all invariants for F by

$$\mathcal{U}(F) := \{ g \in \mathcal{B}_n \mid g + g \circ F \text{ is constant} \} .$$

As observed in [TLS16], the set $\mathcal{U}(F)$ is a linear subspace of $\mathcal{B}_n$. If n is quite small, e.g., n = 4 in the example of a four-bit S-box, the space $\mathcal{U}(F)$ can be efficiently computed. An important remark, which will be used later, is that if F has a cycle of odd length, then all $g \in \mathcal{U}(F)$ satisfy $g + g \circ F = \mathbf{0}$.
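For very small n, the set of invariants can simply be enumerated. The following Python sketch does this by exhaustive search over all truth tables; it is only feasible for n ≤ 4 and is meant as an illustration, with made-up 3-bit toy permutations and hypothetical function names. It also illustrates the remark on odd cycles: as soon as the permutation has a fixed point or another odd cycle, no invariant with $g + g \circ F = \mathbf{1}$ exists.

```python
def invariants(perm):
    """Enumerate all invariants of a permutation given as a lookup table.

    A Boolean function is encoded as an integer truth table tt, with
    g(x) = (tt >> x) & 1.  Returns a list of pairs (tt, eps) such that
    g + g o F is the constant eps.
    """
    size = len(perm)
    found = []
    for tt in range(1 << size):
        values = {((tt >> x) ^ (tt >> perm[x])) & 1 for x in range(size)}
        if len(values) == 1:
            found.append((tt, values.pop()))
    return found


# Toy permutation with four cycles of length two (all cycles even).
SWAPS = [1, 0, 3, 2, 5, 4, 7, 6]
# Toy permutation with one cycle of length three and five fixed points.
THREE_CYCLE = [1, 2, 0, 3, 4, 5, 6, 7]

print(len(invariants(SWAPS)), len(invariants(THREE_CYCLE)))
```

In the first example, half of the invariants satisfy $g + g \circ F = \mathbf{1}$; in the second, the invariants are exactly the functions that are constant on every cycle.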

Whenever we are trying to find an invariant for a keyed instance $E_k$ of a block cipher, we obviously focus on non-trivial invariants, i.e., on $g \notin \{0, 1\}$. When trying to find an invariant in practice, one usually tries to find a Boolean function $g \in B_n$ that is an invariant for all the building blocks of the cipher simultaneously. Indeed, all the invariant attacks we are aware of exploit the iterative structure of the block cipher. We now illustrate the attack on the block cipher Midori-64 as an example.

Example 6.1 (Invariant Attack on Midori-64 as presented in [TLS16]). The block length is $n = 64$ and the key length is $\kappa = 128$. If we denote the four-bit word in the $j$-th cell of the state by $(x_{j,3}, x_{j,2}, x_{j,1}, x_{j,0})$, the invariant $g \in B_n$ is given by

\[ g(x) = \sum_{j=1}^{16} (x_{j,3} x_{j,2} + x_{j,2} + x_{j,1} + x_{j,0}) . \]

This function is an invariant for the S-box layer as the four-bit Boolean function

\[ (x_{j,3}, x_{j,2}, x_{j,1}, x_{j,0}) \mapsto x_{j,3} x_{j,2} + x_{j,2} + x_{j,1} + x_{j,0} \]

is contained in $U(\mathrm{Sb}_{\mathrm{Mid64}})$. Further, since this function is quadratic and since the MixColumns matrix of Midori is orthogonal and consists only of coefficients in $\mathbb{F}_2$, Theorem 1 in [TLS16] applies and $g$ is an invariant for the linear layer $L$ and thus also for the (unkeyed) round function $L \circ S$. To determine the set of weak keys, one has to identify the round keys $k_i$ such that $g$ is an invariant for $\mathrm{Add}_{k_i}$. For this, we rely on the following fact, which was already implicitly contained in [TLS16].

Proposition 6.1. Let $g \in B_n$. Then, $g$ is an invariant for the key addition $\mathrm{Add}_{k_i}$ if and only if $k_i \in \mathrm{LS}(g)$.

Proof. By the definition of the invariant property, $g$ is an invariant for $\mathrm{Add}_{k_i}$ if and only if there exists a constant $\varepsilon \in \mathbb{F}_2$ such that

\[ \forall x \in \mathbb{F}_2^n : g(x) + g(x + k_i) = \varepsilon . \]

This coincides with the definition of $k_i \in \mathrm{LS}(g)$.

This implies that the weak round keys in Midori-64 are those for which $k^i_{j,3} = k^i_{j,2} = 0$ for all $j$. Due to the simplicity of the key-scheduling algorithm, this happens for $2^{64}$ out of all $2^{128}$ possible initial keys $k$.
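The two facts used in this example can be checked by exhaustive search on a single four-bit cell. The sketch below assumes that the Midori-64 S-box is the table `SB_MID64` (recalled from the cipher specification, not given in this chapter) and that $x_{j,3}$ is the most significant bit of a cell; both are assumptions of this illustration.

```python
# Assumption: Midori-64 S-box table as recalled from the cipher specification.
SB_MID64 = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
            0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]


def bit(x, i):
    """Bit i of x (bit 3 is assumed to be the most significant bit of a cell)."""
    return (x >> i) & 1


def g_cell(x):
    """Per-cell part of the invariant: x3*x2 + x2 + x1 + x0 over F_2."""
    return (bit(x, 3) & bit(x, 2)) ^ bit(x, 2) ^ bit(x, 1) ^ bit(x, 0)


def is_invariant(func, perm):
    """True if func + func∘perm is a constant function."""
    return len({func(x) ^ func(perm[x]) for x in range(len(perm))}) == 1


def linear_space(func, n):
    """All alpha whose derivative x -> func(x + alpha) + func(x) is constant."""
    return [a for a in range(1 << n)
            if len({func(x) ^ func(x ^ a) for x in range(1 << n)}) == 1]


print(is_invariant(g_cell, SB_MID64))
print(linear_space(g_cell, 4))   # the alpha with alpha_3 = alpha_2 = 0
```

Under these assumptions, the linear space of the per-cell function consists of the four values whose two most significant bits are zero, which is the weak-key condition stated above.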

6.3 A Link to Linear Cryptanalysis

In Section 2.3.2, we have already mentioned how the invariant attack is related to nonlinear approximations. In particular, an invariant $g$ for a keyed instance $E_k$ defines a nonlinear approximation with absolute correlation equal to one, i.e., $|\mathrm{cor}_{E_k}(g, g)| = 1$. Similar to the Theorem of Linear Trail Composition (Theorem 2.2), one can show the following for nonlinear approximations. For a Boolean function $g \in B_n$ and a mask $\gamma \in \mathbb{F}_2^n$, we define $\mathrm{cor}_g(\gamma) := \mathrm{cor}_g(\gamma, 1) = 2 \cdot \mathrm{Prob}_x(\langle \gamma, x \rangle = g(x)) - 1$.

Proposition 6.2 (Nonlinear Trail Composition). Let $F : \mathbb{F}_2^n \to \mathbb{F}_2^n$ and let $g, h \in B_n$. Then,

\[ \mathrm{cor}_F(g, h) = \sum_{\gamma, \gamma' \in \mathbb{F}_2^n} \mathrm{cor}_g(\gamma)\, \mathrm{cor}_F(\gamma, \gamma')\, \mathrm{cor}_h(\gamma') . \]

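Proof. The proof is very similar to the proof of Theorem 2.2. We first show that, for any $\gamma' \in \mathbb{F}_2^n$, $\sum_{\gamma \in \mathbb{F}_2^n} \mathrm{cor}_g(\gamma)\, \mathrm{cor}_F(\gamma, \gamma') = \mathrm{cor}_F(g, l_{\gamma'})$. For this, we use the fundamental fact given in Equation 2.4, i.e.,

\[ \sum_{\gamma \in \mathbb{F}_2^n} (-1)^{\langle \gamma, x \rangle} = \begin{cases} 2^n & \text{if } x = 0 \\ 0 & \text{if } x \neq 0. \end{cases} \]

In particular,

\begin{align*}
\sum_{\gamma \in \mathbb{F}_2^n} \mathrm{cor}_g(\gamma)\, \mathrm{cor}_F(\gamma, \gamma')
&= \sum_{\gamma \in \mathbb{F}_2^n} \frac{1}{2^n} \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} (-1)^{\langle \gamma, x \rangle + g(x)} \sum_{x' \in \mathbb{F}_2^n} (-1)^{\langle \gamma, x' \rangle + \langle \gamma', F(x') \rangle} \\
&= \frac{1}{2^n} \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} \sum_{x' \in \mathbb{F}_2^n} (-1)^{g(x) + \langle \gamma', F(x') \rangle} \sum_{\gamma \in \mathbb{F}_2^n} (-1)^{\langle \gamma, x + x' \rangle} \\
&= \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} (-1)^{g(x) + \langle \gamma', F(x) \rangle} = \mathrm{cor}_F(g, l_{\gamma'}) .
\end{align*}

Using the fact proven above, we proceed to prove the result as follows:

\begin{align*}
\sum_{\gamma \in \mathbb{F}_2^n} \sum_{\gamma' \in \mathbb{F}_2^n} \mathrm{cor}_g(\gamma)\, \mathrm{cor}_F(\gamma, \gamma')\, \mathrm{cor}_h(\gamma')
&= \sum_{\gamma' \in \mathbb{F}_2^n} \mathrm{cor}_h(\gamma') \sum_{\gamma \in \mathbb{F}_2^n} \mathrm{cor}_g(\gamma)\, \mathrm{cor}_F(\gamma, \gamma') \\
&= \sum_{\gamma' \in \mathbb{F}_2^n} \mathrm{cor}_h(\gamma')\, \mathrm{cor}_F(g, l_{\gamma'}) \\
&= \frac{1}{2^n} \frac{1}{2^n} \sum_{x \in \mathbb{F}_2^n} \sum_{x' \in \mathbb{F}_2^n} (-1)^{g(x') + h(x)} \sum_{\gamma' \in \mathbb{F}_2^n} (-1)^{\langle \gamma', x + F(x') \rangle} \\
&= \frac{1}{2^n} \sum_{x' \in \mathbb{F}_2^n} (-1)^{g(x') + h(F(x'))} = \mathrm{cor}_F(g, h) .
\end{align*}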
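The identity of Proposition 6.2 can be verified numerically for small $n$. The sketch below uses exact rational arithmetic and arbitrary toy inputs for $n = 3$; the permutation `F` and the truth tables `G` and `H` are hypothetical choices made only for this illustration.

```python
from fractions import Fraction

N = 3
SIZE = 1 << N


def dot(a, b):
    """Inner product <a, b> over F_2 for bit vectors stored as integers."""
    return bin(a & b).count("1") & 1


def cor_fn(g, gamma):
    """cor_g(gamma) = 2^-n * sum_x (-1)^(<gamma,x> + g(x))."""
    return Fraction(sum((-1) ** (dot(gamma, x) ^ g[x]) for x in range(SIZE)), SIZE)


def cor_lin(F, a, b):
    """Linear correlation cor_F(a, b) = 2^-n * sum_x (-1)^(<a,x> + <b,F(x)>)."""
    return Fraction(sum((-1) ** (dot(a, x) ^ dot(b, F[x])) for x in range(SIZE)), SIZE)


def cor_nonlin(F, g, h):
    """Nonlinear correlation cor_F(g, h) = 2^-n * sum_x (-1)^(g(x) + h(F(x)))."""
    return Fraction(sum((-1) ** (g[x] ^ h[F[x]]) for x in range(SIZE)), SIZE)


def trail_sum(F, g, h):
    """Right-hand side of Proposition 6.2: sum over all pairs of linear masks."""
    return sum(cor_fn(g, a) * cor_lin(F, a, b) * cor_fn(h, b)
               for a in range(SIZE) for b in range(SIZE))


# Hypothetical toy inputs (illustration only).
F = [3, 6, 0, 5, 7, 1, 4, 2]
G = [0, 1, 1, 0, 1, 0, 0, 0]
H = [0, 0, 1, 1, 0, 1, 1, 1]

print(trail_sum(F, G, H) == cor_nonlin(F, G, H))
```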

This result shows how a nonlinear approximation is related to multiple linear approximations. When applying Proposition 6.2 to an invariant $g$ for $E_k$, one obtains

\[ 1 = |\mathrm{cor}_{E_k}(g, g)| = \Big| \sum_{\gamma, \gamma' \in \Gamma_g} \mathrm{cor}_g(\gamma)\, \mathrm{cor}_{E_k}(\gamma, \gamma')\, \mathrm{cor}_g(\gamma') \Big| , \tag{6.1} \]

where $\Gamma_g := \{\gamma \mid \mathrm{cor}_g(\gamma) \neq 0\}$. If we consider the invariant attacks on Midori-64, SCREAM and iSCREAM as presented in [TLS16], one observes that in those examples the absolute value of $\mathrm{cor}_g(\gamma)$ is the same for all $\gamma$ in $\Gamma_g$. In particular, Equation 6.1 simplifies to

\[ \Big| \sum_{\gamma, \gamma' \in \Gamma_g} (-1)^{f(\gamma) + f(\gamma')}\, \mathrm{cor}_{E_k}(\gamma, \gamma') \Big| = 2^{32} , \tag{6.2} \]

where $f$ is an $n$-bit Boolean function and $|\Gamma_g| = 2^{32}$.

Midori-64

We have $\Gamma_g = \{(\gamma_1, \dots, \gamma_{16}) \mid \forall j : \gamma_{j,0} = \gamma_{j,1} = 1\}$ and

\[ f(\gamma) = \sum_{j=1}^{16} (\gamma_{j,3} \gamma_{j,2} + \gamma_{j,3}) . \]

SCREAM

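We have $n = 128$ and the invariant is given by

\[ g(x) = \sum_{j=1}^{16} (x_{j,2} x_{j,1} + x_{j,5} + x_{j,2} + x_{j,0}) . \]

In Equation 6.2, we have

\[ \Gamma_g = \{(\gamma_1, \dots, \gamma_{16}) \mid \forall j : \gamma_{j,0} = \gamma_{j,5} = 1,\ \gamma_{j,3} = \gamma_{j,4} = \gamma_{j,6} = \gamma_{j,7} = 0\} \]

and

\[ f(\gamma) = \sum_{j=1}^{16} (\gamma_{j,2} \gamma_{j,1} + \gamma_{j,2}) . \]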

iSCREAM

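We have $n = 128$ and the invariant is given by

\[ g(x) = \sum_{j=1}^{16} (x_{j,5} x_{j,4} + x_{j,6} + x_{j,0}) . \]

In Equation 6.2, we have

\[ \Gamma_g = \{(\gamma_1, \dots, \gamma_{16}) \mid \forall j : \gamma_{j,0} = \gamma_{j,6} = 1,\ \gamma_{j,1} = \gamma_{j,2} = \gamma_{j,3} = \gamma_{j,7} = 0\} \]

and

\[ f(\gamma) = \sum_{j=1}^{16} \gamma_{j,5} \gamma_{j,4} . \]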

From Equation 6.2, one can deduce that, for each weak key, there must exist linear approximations (over the whole cipher, and in fact for any number of rounds) with absolute correlation larger than or equal to $2^{-32}$. Since the block length of SCREAM and iSCREAM is $n = 128$, this implies the existence of linear approximations with absolute correlation larger than $2^{-n/2}$, i.e., linear approximations that can be exploited by a standard linear attack.

The particular reason for the nonlinear trail composition simplifying to Equation 6.2 in those three examples is that the invariants are quadratic. In general, one can prove the following theorem.⁴

⁴ We gratefully acknowledge the contribution of Anne Canteaut, who communicated this relation.

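Theorem 6.2 (Existence of Linear Approximations from Quadratic Invariants). Let $g \in B_n$ be a quadratic invariant for a permutation $F : \mathbb{F}_2^n \to \mathbb{F}_2^n$. Then, there exists a Boolean function $f \in B_n$ such that

\[ \Big| \sum_{\gamma, \gamma' \in \Gamma_g} (-1)^{f(\gamma) + f(\gamma')}\, \mathrm{cor}_F(\gamma, \gamma') \Big| = |\Gamma_g| \]

and $|\Gamma_g| = 2^{n - \dim \mathrm{LS}(g)}$. Moreover, if $g$ is balanced, there must exist $\gamma, \gamma' \neq 0$ for which $|\mathrm{cor}_F(\gamma, \gamma')| \geq 2^{\dim \mathrm{LS}(g) - n}$.

Proof. From Proposition 6.2, we obtain

\[ \Big| \sum_{\gamma, \gamma' \in \Gamma_g} \mathrm{cor}_g(\gamma)\, \mathrm{cor}_g(\gamma')\, \mathrm{cor}_F(\gamma, \gamma') \Big| = 1 , \]

where $\Gamma_g = \{\gamma \in \mathbb{F}_2^n \mid \mathrm{cor}_g(\gamma) \neq 0\}$. From Theorem 4 in [Car07] it follows that, for any $\gamma \in \mathbb{F}_2^n$, we have $\mathrm{cor}_g(\gamma) \in \{0, \pm 2^{\frac{\dim \mathrm{LS}(g) - n}{2}}\}$. This implies

\[ \Big| \sum_{\gamma, \gamma' \in \Gamma_g} (-1)^{f(\gamma) + f(\gamma')}\, \mathrm{cor}_F(\gamma, \gamma') \Big| = 2^{n - \dim \mathrm{LS}(g)} \]

for an appropriate function $f \in B_n$.

To see that $|\Gamma_g| = 2^{n - \dim \mathrm{LS}(g)}$, we use Parseval's relation (see Corollary 3 in [Car07]) and obtain

\[ 1 = \sum_{\gamma \in \mathbb{F}_2^n} \mathrm{cor}_g^2(\gamma) = \sum_{\gamma \in \Gamma_g} \mathrm{cor}_g^2(\gamma) = |\Gamma_g| \, 2^{\dim \mathrm{LS}(g) - n} . \]

The existence of $\gamma, \gamma' \in \Gamma_g$ for which $|\mathrm{cor}_F(\gamma, \gamma')| \geq 2^{\dim \mathrm{LS}(g) - n}$ immediately follows, since the sum above has $|\Gamma_g|^2$ terms and absolute value $|\Gamma_g|$. If $g$ is balanced, we have $0 \notin \Gamma_g$.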
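The spectrum property used in this proof can be observed on the per-cell quadratic function of the Midori-64 invariant, $x_3 x_2 + x_2 + x_1 + x_0$, for which $n = 4$ and the linear space is $\{\alpha \mid \alpha_3 = \alpha_2 = 0\}$, so $\dim \mathrm{LS}(g) = 2$. The following sketch (with bit 3 assumed to be the most significant bit of the cell) computes all correlations exactly and compares their signs with the per-cell function $\gamma_3 \gamma_2 + \gamma_3$ given for Midori-64 above.

```python
from fractions import Fraction

N = 4


def bit(x, i):
    return (x >> i) & 1


def dot(a, b):
    """Inner product over F_2 for bit vectors stored as integers."""
    return bin(a & b).count("1") & 1


def g_cell(x):
    """Quadratic per-cell function x3*x2 + x2 + x1 + x0."""
    return (bit(x, 3) & bit(x, 2)) ^ bit(x, 2) ^ bit(x, 1) ^ bit(x, 0)


def cor_g(gamma):
    """cor_g(gamma) = 2^-n * sum_x (-1)^(<gamma,x> + g(x))."""
    total = sum((-1) ** (dot(gamma, x) ^ g_cell(x)) for x in range(1 << N))
    return Fraction(total, 1 << N)


def f_cell(gamma):
    """Per-cell sign function gamma3*gamma2 + gamma3."""
    return (bit(gamma, 3) & bit(gamma, 2)) ^ bit(gamma, 3)


# Gamma_g: masks with non-zero correlation.
support = [gamma for gamma in range(1 << N) if cor_g(gamma) != 0]
print(support)
print([cor_g(gamma) for gamma in support])
```

The support has $2^{n - \dim \mathrm{LS}(g)} = 4$ elements and every non-zero correlation has absolute value $2^{(\dim \mathrm{LS}(g) - n)/2} = 1/2$.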

6.4 Proving the Absence of Invariants in Lightweight SPNs

In the remainder of this chapter, we concentrate on block ciphers which follow the specific structure of SP networks as depicted in Figure 2.3.

In practice, the technique applied for finding invariants for the cipher usually consists in exploiting its iterative structure and in searching for functions which are invariant for all constituent building blocks. Indeed, computing invariants for the round function is in general infeasible for a proper block size, typically $n = 64$ or $n = 128$. Although all published invariant attacks we are aware of exploit invariants for all the constituent building blocks, the algorithm described in [LMR15] searches for invariant subspaces over the whole round function. However, it can only be applied to the special case of finding an invariant subspace, and not for detecting an arbitrary invariant set. Moreover, it only detects spaces of large dimension efficiently.

Therefore, we consider in the following only those invariants that are invariant under both the substitution layer $S$ and the linear parts $\mathrm{Add}_{k_i} \circ L$ of all rounds. The linear spaces of these invariants then have a very specific structure, as pointed out in the following proposition.

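Proposition 6.3. Let $L$ be a linear permutation on $\mathbb{F}_2^n$. Let $g \in B_n$ be an invariant for both $\mathrm{Add}_{k_i} \circ L$ and $\mathrm{Add}_{k_j} \circ L$ for two round keys $k_i$ and $k_j$. Then, $\mathrm{LS}(g)$ is a linear space invariant under $L$ which contains $(k_i + k_j)$.

Proof. By definition of $g$, there exist $\varepsilon_i, \varepsilon_j \in \mathbb{F}_2$ such that, for all $x \in \mathbb{F}_2^n$,

\[ g(x) = g(L(x) + k_i) + \varepsilon_i \quad \text{and} \quad g(x) = g(L(x) + k_j) + \varepsilon_j . \]

This implies that, for all $x \in \mathbb{F}_2^n$,

\[ g(L(x) + k_i) + g(L(x) + k_j) = \varepsilon_i + \varepsilon_j , \]

or equivalently, by replacing $(L(x) + k_j)$ by $y$:

\[ g(y + k_i + k_j) + g(y) = \varepsilon_i + \varepsilon_j , \quad \forall y \in \mathbb{F}_2^n \]

and thus $(k_i + k_j) \in \mathrm{LS}(g)$. We then have to show that $\mathrm{LS}(g)$ is invariant under $L$. Let $s \in \mathrm{LS}(g)$. Then, there exists a constant $\varepsilon \in \mathbb{F}_2$ such that $g(x) = g(x + s) + \varepsilon$ for all $x$. Since $g$ is an invariant for $\mathrm{Add}_{k_i} \circ L$, we deduce

\[ g(L(x) + k_i) + \varepsilon_i = g(x) = g(x + s) + \varepsilon = g(L(x) + L(s) + k_i) + (\varepsilon_i + \varepsilon) . \]

Finally, we set $y := L(x) + k_i$ and obtain

\[ g(y) = g(y + L(s)) + \varepsilon \tag{6.3} \]

which completes the proof.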

Therefore, the attack requires the existence of an invariant for the substitution layer whose linear space is invariant under $L$ and contains all differences between the round keys.⁵ The difference between two round keys, which should be contained in $\mathrm{LS}(g)$, is in general dependent on the initial key. However, if we consider only designs where some round keys are equal up to the addition of a round constant, we obtain that the differences between these round constants must belong to $\mathrm{LS}(g)$. Then, $\mathrm{LS}(g)$ is a linear space invariant under $L$ which contains the differences $(c_i + c_j)$ for any pair $(i, j)$ of rounds such that $k_i = k + c_i$ and $k_j = k + c_j$. The smallest such subspaces are spanned by the cycles of $L$, as shown by the following lemma.

⁵ Note that a similar observation was already made in [Ava17] in the context of the invariant subspace attack.

Lemma 6.1. Let $L$ be a linear permutation of $\mathbb{F}_2^n$. For any $c \in \mathbb{F}_2^n$, the smallest $L$-invariant linear subspace of $\mathbb{F}_2^n$ which contains $c$, denoted by $W_L(c)$, is

\[ \mathrm{span}\{L^i(c) \mid i \geq 0\} . \]

Proof. Obviously, $\mathrm{span}\{L^i(c) \mid i \geq 0\}$ is included in $W_L(c)$, since $W_L(c)$ is a linear subspace of $\mathbb{F}_2^n$ which contains $c$ and is invariant under $L$. Moreover, we observe that $\mathrm{span}\{L^i(c) \mid i \geq 0\}$ is invariant under $L$. Indeed, for any $\lambda_1, \lambda_2 \in \mathbb{F}_2$ and any $(i, j)$,

\[ L(\lambda_1 L^i(c) + \lambda_2 L^j(c)) = \lambda_1 L^{i+1}(c) + \lambda_2 L^{j+1}(c) \]

and then belongs to $\mathrm{span}\{L^i(c) \mid i \geq 0\}$. Then, this subspace is the smallest linear subspace of $\mathbb{F}_2^n$ invariant under $L$ which contains $c$.

Let now $D$ be a set of known differences between round keys, i.e., a subset of all $k_i + k_j = (c_i + c_j)$. We define the subspace

\[ W_L(D) := \sum_{c \in D} \mathrm{span}\{L^i(c) \mid i \geq 0\} = \sum_{c \in D} W_L(c) . \]

We then deduce from the previous observations that the invariant attack applies only if there is a non-trivial invariant $g$ for the S-box layer such that $W_L(D) \subseteq \mathrm{LS}(g)$. A SageMath code that computes the linear space $W_L(D)$ for a predefined list $D$ is given in Listing 6.1 at the end of this chapter. It has been used for determining the dimension of $W_L(D)$ corresponding to the round constants in several lightweight ciphers.
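As a rough illustration of how $W_L(D)$ can be computed (this is a plain-Python sketch and not the SageMath code of Listing 6.1), vectors of $\mathbb{F}_2^n$ can be stored as integers and the span can be closed under $L$ with incremental Gaussian elimination. The helper names and the toy map `rot4` (a cyclic rotation of 4-bit vectors) are assumptions made only for this example.

```python
def reduce(v, basis):
    """Reduce v against an echelon basis {leading bit: vector}; 0 means v is in the span."""
    while v:
        lead = v.bit_length() - 1
        if lead not in basis:
            return v
        v ^= basis[lead]
    return 0


def invariant_span(L, D):
    """Echelon basis of W_L(D), the smallest L-invariant subspace containing D.

    L is a Python function implementing a linear map on integers-as-bit-vectors.
    """
    basis = {}
    queue = list(D)
    while queue:
        v = reduce(queue.pop(), basis)
        if v:
            basis[v.bit_length() - 1] = v
            queue.append(L(v))   # the image of every new basis vector must be covered
    return basis


def rot4(v):
    """Hypothetical toy linear layer: cyclic rotation of a 4-bit vector."""
    return ((v << 1) | (v >> 3)) & 0xF


for D in ([0b0001], [0b0011], [0b0101], [0b1111]):
    print(D, len(invariant_span(rot4, D)))
```

The loop terminates after at most $n$ insertions, since each insertion increases the dimension of the span by one.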

Skinny-64

Considering the untweaked version Skinny-64-64, one observes that the round keys repeat every 16 rounds. We define

\[ D := \{c_1 + c_{17},\ c_2 + c_{18},\ c_3 + c_{19},\ c_4 + c_{20},\ c_5 + c_{21}\} , \]

where the round constants $c_i$ are those of Skinny and of the form

\[ c_i = \begin{pmatrix} a_r & 0 & 0 & 0 \\ b_r & 0 & 0 & 0 \\ 2 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \]

as described in Algorithm 5.2. We obtain $\dim W_L(D) = 64$.

Skinny-128

In Skinny-128, the round constants are all of the following form:

\[ \begin{pmatrix} a & 0 & 0 & 0 \\ b & 0 & 0 & 0 \\ 02 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix} \]

with 8-bit values $a \in \{00, \dots, 0F\}$ and $b \in \{00, \dots, 03\}$. Then, as the linear layer is defined by a binary matrix, we can see that the dimension of $W_L(D)$ is at most 64, because none of the four most significant bits will be activated with any round constant.

Prince

Prince uses ten round keys $k_i$, $1 \leq i \leq 10$, which are all of the form $k_i = k + c_i$. The so-called $\alpha$-reflection property implies that, for any $i$, $k_i + k_{11-i} = \alpha$ where $\alpha$ is a fixed constant. The particular values of the round constants $c_i$ and $\alpha$ can be found in the specification of Prince [BCG+12]. We can then consider the set of (independent) round constant differences

\[ D = \{\alpha,\ c_1 + c_2,\ c_1 + c_3,\ c_1 + c_4,\ c_1 + c_5\} . \]

We obtain that $\dim W_L(D) = 56$.

Mantis

Like Prince, Mantis$_7$ also has the $\alpha$-reflection property. We therefore consider the following set of round constant differences:⁶

\[ D = \{\alpha,\ c_1 + c_2,\ c_1 + c_3,\ c_1 + c_4,\ c_1 + c_5,\ c_1 + c_6,\ c_1 + c_7\} , \]

where the $c_i$ and $\alpha$ are as given in Section 5.4. We obtain that $\dim W_L(D) = 42$.

Midori-64

In Midori-64, the round constants are only added to the least significant bit of each word and the linear layer does not provide any mixing within the words. Then $W_L(D) = \{(0, 0, 0, 0), (0, 0, 0, 1)\}^{16}$, and has dimension 16 only.

As the invariant attack applies only if there is a non-trivial invariant $g$ for the S-box layer such that $W_L(D) \subseteq \mathrm{LS}(g)$, by intuition, the attack should be harder as the dimension of $W_L(D)$ increases. In the following, we analyze the impact of the dimension of $W_L(D)$ on the applicability of the attack in detail and present a method to prove the non-existence of invariants based on this dimension.

⁶ For Mantis, we assume the tweak to be zero. Otherwise the key schedule is not of the simple form described above.

6.4.1 The Simple Case

We first consider a simple case, i.e., when the dimension of $W_L(D)$ is at least $n - 1$.

Proposition 6.4. Suppose that the dimension of $W_L(D)$ is at least $n - 1$. Then, any $g \in B_n$ such that $W_L(D) \subseteq \mathrm{LS}(g)$ is linear, affine or constant. As a consequence, there is no non-trivial invariant $g$ of the S-box layer such that $W_L(D) \subseteq \mathrm{LS}(g)$, unless the S-box layer has a component of degree 1.

Proof. From [Car07, Prop. 14], it follows that

\[ \dim \mathrm{LS}(g) \geq k \ \Leftrightarrow\ \deg g \leq \begin{cases} n - k & \text{if } k \neq n \\ 1 & \text{if } k = n. \end{cases} \]

This implies that $g$ must be linear, affine or constant. Invariants of algebraic degree 1 imply the existence of a linear approximation with probability 1, or equivalently that the S-box has a component (i.e., a linear combination of its coordinates) of degree 1.

In the rest of the chapter, we will implicitly exclude the case when the S-box has a component of degree 1, as the cipher would be already broken by linear cryptanalysis.

Skinny-64

As shown before, for the untweaked version Skinny-64-64 one obtains $\dim W_L(D) = 64$. This indicates that the round constants do not allow non-trivial invariants that are invariant for both the substitution and the linear parts of Skinny-64, and this result holds for any choice of the S-box layer.

Unfortunately, the dimension of $W_L(D)$ is not high enough for the other ciphers we considered. For those primitives, we therefore cannot prove the resistance against invariant attacks based on the linear layer only.

6.4.2 When the Dimension is Smaller

Not every cipher applies round constants such that the dimension of $W_L(D)$ is larger than or equal to $n - 1$. Even for Prince and Mantis, which have very dense round constants, this is not the case and we cannot directly rely on this argument. However, if $n - \dim W_L(D)$ is small, we can still prove that the invariant attack does not apply, but only by exploiting some information on the S-box layer. This can be done by checking whether there exists a non-trivial invariant $g$ for the S-box layer which admits some given elements as 0-linear structures, in the sense of the following definition.

Definition 6.3. A linear structure $\alpha$ of a Boolean function $f$ is called a 0-linear structure if the corresponding derivative equals the all-zero function. The set of all 0-linear structures of $f$ is a linear subspace of $\mathrm{LS}(f)$ denoted by $\mathrm{LS}_0(f)$. Elements $\beta$ such that $\Delta_\beta f = 1$ are called 1-linear structures of $f$.

Note that 0-linear structures are also called invariant linear structures. It is well known that the dimension of $\mathrm{LS}_0(f)$ drops by at most 1 compared to $\mathrm{LS}(f)$ [DW97].

Checking that all invariants are constant based on 0-linear structures.

In the following, we search for an invariant $g$ for the S-box layer $S$ that is also invariant for the linear part of each round. Suppose now, in a first step, that we know a subspace $Z$ of $\mathrm{LS}(g)$ which is composed of 0-linear structures only. In other words, we now search for an invariant $g$ for $S$ such that $\mathrm{LS}_0(g) \supseteq Z$ for some fixed $Z$. If the dimension of this subspace $Z$ is close to $n$, we can try to prove that any such invariant is constant based on the following observation.

Proposition 6.5. Let $g$ be an invariant for a permutation $S : \mathbb{F}_2^n \to \mathbb{F}_2^n$ such that $\mathrm{LS}_0(g) \supseteq Z$ for some given subspace $Z \subset \mathbb{F}_2^n$. Then

(i) $g$ is constant on each coset of $Z$;

(ii) $g$ is constant on $S(Z)$.

Proof. Since $Z \subseteq \mathrm{LS}_0(g)$, for any $a \in \mathbb{F}_2^n$, we have that $g(a + z) = g(a)$ for all $z \in Z$, i.e., $g$ is constant on all $(a + Z)$. Now, we use that $g$ is an invariant for $S$, which means that there exists $\varepsilon \in \mathbb{F}_2$ such that $g(S(x)) = g(x) + \varepsilon$. Since $g$ is constant on $Z$, we deduce that $g$ is constant on $S(Z)$.

To show that $g$ must be trivial, the idea is to evaluate the S-box layer at some points in $Z$ and deduce that $g$ takes the same value on all corresponding cosets. The number of distinct cosets of $Z$ equals $2^{n - \dim Z}$, which is not too large when $\dim Z$ is close to $n$. Then, we hope that all cosets will be hit when evaluating $S$ at a few points in $Z$. In this situation, $g$ must be a constant function. In other words, we are able to conclude that there do not exist non-trivial invariants for both the substitution layer and the linear part.

In our experiments, we used the following very simple algorithm. If it terminates, all invariants must be constant. An efficient SageMath implementation of Algorithm 6.1 is given in Listing 6.1 at the end of this chapter.
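The coset-covering idea behind Algorithm 6.1 can be sketched in plain Python as follows. This is not the SageMath implementation of Listing 6.1: the function names, the iteration cap `max_tries` (added so that the sketch always returns) and the toy 4-bit setting are assumptions of this illustration, and the S-box table is the one assumed for Midori-64 and Mantis, recalled from the cipher specification.

```python
import random


def make_basis(vectors):
    """Echelon basis {leading bit: vector} of the span of the given vectors."""
    basis = {}
    for v in vectors:
        while v:
            lead = v.bit_length() - 1
            if lead not in basis:
                basis[lead] = v
                break
            v ^= basis[lead]
    return basis


def coset_rep(v, basis):
    """Canonical representative of the coset v + Z (all leading bits cleared)."""
    for lead in sorted(basis, reverse=True):
        if (v >> lead) & 1:
            v ^= basis[lead]
    return v


def all_cosets_hit(sbox_layer, z_gens, n, max_tries=10000, seed=0):
    """Return True if S(Z) is observed to meet every coset of Z within max_tries samples."""
    basis = make_basis(z_gens)
    gens = list(basis.values())
    target = 1 << (n - len(gens))          # number of cosets, 2^(n - dim Z)
    rng = random.Random(seed)
    hit = set()
    for _ in range(max_tries):
        z = 0
        for b in gens:                     # sample z uniformly from Z
            if rng.getrandbits(1):
                z ^= b
        hit.add(coset_rep(sbox_layer(z), basis))
        if len(hit) == target:
            return True
    return False


# Assumption: 4-bit S-box table recalled from the Midori-64 specification.
SB = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
      0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]

print(all_cosets_hit(lambda x: SB[x], [1, 2, 4], 4))             # dim Z = 3, two cosets
print(all_cosets_hit(lambda x: SB[x], [3], 4, max_tries=200))    # dim Z = 1, eight cosets
```

A return value of `True` corresponds to termination of the algorithm. A return value of `False` proves nothing by itself; in the second call it is consistent with the fact that a single-cell function such as $x_3 x_2 + x_2 + x_1 + x_0$ is a non-trivial invariant having 3 as a 0-linear structure.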

Algorithm 6.1 Checking that $U(S) \cap \{g \in B_n \mid Z \subseteq \mathrm{LS}_0(g)\}$ is trivial

    R = {}
    repeat
        z ←$ Z
        Compute S(z)
        Add to R a representative of the coset defined by S(z)
    until |R| = 2^(n − dim Z)

Determining a suitable $Z$ from $W_L(D)$

Up to now, we assumed the knowledge of a subspace $Z$ of $W_L(D)$ for which $Z \subseteq \mathrm{LS}_0(g)$ for all invariants $g$ we are considering. However, whether an element is a 0-linear structure depends on the actual invariant $g$ and thus, each of the elements $d \in W_L(D)$ might or might not be a 0-linear structure. Nevertheless, some 0-linear structures can be determined by using one of the two following approaches.

First approach. The first observation comes from (6.3) in the proof of Prop. 6.3.

Lemma 6.2. Let $L$ be a linear permutation on $\mathbb{F}_2^n$. Let $g \in B_n$ be an invariant for $\mathrm{Add}_{k_i} \circ L$ for some $k_i$ and let $V$ be a subspace of $\mathrm{LS}(g)$ which is invariant under $L$. Then, for any $v \in V$, $(v + L(v)) \in \mathrm{LS}_0(g)$.

Proof. Let $v \in V$. Similarly to the proof of Prop. 6.3, we use that $g$ is an invariant for $\mathrm{Add}_{k_i} \circ L$ and see that there exists an $\varepsilon \in \mathbb{F}_2$ such that, for all $x \in \mathbb{F}_2^n$,

\[ g(x) = g(x + v) + \varepsilon = g(x + L(v)) + \varepsilon . \]

We finally set $y := x + v$ and obtain

\[ g(y) = g(y + v + L(v)) , \]

implying that $v + L(v)$ is a 0-linear structure for $g$.

Following the previous lemma, one option is to just run Algorithm 6.1 on $Z = W_L(D')$ with $D' = \{d + L(d) \mid d \in D\}$. The disadvantage is that the dimension of $Z$ might be too low and therefore the algorithm might be too inefficient. In this case, one can also consider a different approach and run the algorithm several times, by considering all possible choices for the 0-linear structures among all elements in $D$. Suppose that, in the initial set of constants $D = \{d_1, d_2, \dots, d_m, \dots, d_t\}$, the elements $d_1, \dots, d_m$ are all 1-linear structures and the elements $d_{m+1}, \dots, d_t$ are all 0-linear structures for some invariant $g$ with $\mathrm{LS}(g) \supseteq W_L(D)$. One can now consider

\[ D' = \{d_1 + L(d_1),\ d_2 + L(d_2),\ \dots,\ d_m + L(d_m),\ d_{m+1},\ \dots,\ d_t,\ d_1 + d_2,\ \dots,\ d_1 + d_m\} \]

which increases the dimension of $W_L(D')$ by adding the sums of the 1-linear structures. We then have $W_L(D') \subseteq \mathrm{LS}_0(g)$ and we can apply Algorithm 6.1 on $Z = W_L(D')$. Since we cannot say in advance which of the constants are 1-linear structures, there are $2^t$ possible choices of such a subspace $W_L(D')$ and we run Algorithm 6.1 on all of them. This approach still might be very inefficient due to the smaller dimension of $W_L(D')$ and since Algorithm 6.1 has to be run $2^t$ times.

Second approach. If the S-box layer $S$ of the cipher has an odd-length cycle (i.e., if every S-box has an odd-length cycle), we can come up with the following.

Proposition 6.6. Let $g \in U(S)$ where $S : \mathbb{F}_2^n \to \mathbb{F}_2^n$ is a permutation with an odd cycle. Then, any linear structure of $g$ which belongs to the image set of $(S + \mathrm{id}_n)$, i.e., $\{S(x) + x \mid x \in \mathbb{F}_2^n\}$, is a 0-linear structure of $g$.

Proof. If the S-box layer has an odd cycle, then any $g \in U(S)$ necessarily fulfills $g(x) = g(S(x))$ for all $x \in \mathbb{F}_2^n$. Now let $g \in U(S)$ and $c \in \mathrm{LS}(g) \cap \mathrm{Im}(S + \mathrm{id}_n)$. Then there exists an $x_0 \in \mathbb{F}_2^n$ such that $S(x_0) = x_0 + c$. We then deduce that

\[ g(x_0) = g(S(x_0)) = g(x_0 + c) , \]

implying that $c$ is a 0-linear structure of $g$.

Therefore, if we find enough of these $c \in W_L(D) \cap \mathrm{Im}(S + \mathrm{id}_n)$, we can just apply Algorithm 6.1 on the resulting set. This approach will be used on Mantis$_7$, as explained next.

6.4.3 Results for some Lightweight Ciphers

We now apply the techniques explained above to some existing lightweight designs.

Prince

For Prince, we apply the first approach to $D' = \{d + L(d) \mid d \in D\}$ where

\[ D = \{\alpha,\ c_1 + c_2,\ c_1 + c_3,\ c_1 + c_4,\ c_1 + c_5\} . \]

Then, $\dim W_L(D') = 51$. We run Algorithm 6.1 on $W_L(D')$ and the algorithm terminates within a few minutes on a standard PC. We have thus proven that there are no non-trivial invariants that are invariant for both the substitution layer and the linear parts of all rounds in Prince.

Mantis

Since $\dim W_L(D) = 42$ for Mantis$_7$, applying our algorithm $2^7$ times on a subspace of codimension 23 is a quite expensive task. We therefore exploit Proposition 6.6. Let $\mathrm{Sb}$ denote the 4-bit S-box of Mantis as given in Table 2.2. Indeed, the S-box layer of Mantis is the parallel application of $\mathrm{Sb}$.

The S-box layer has an odd cycle because $\mathrm{Sb}$ has a fixed point. Moreover, the image set of $(\mathrm{Sb} + \mathrm{id}_4)$ is composed of 7 values $\{0, 9, A, B, C, E, F\}$. Any $c \in W_L(D)$ for which each 4-bit word is equal to a value in $\mathrm{Im}(\mathrm{Sb} + \mathrm{id}_4)$ is a 0-linear structure. For a random value $c \in \mathbb{F}_2^{64}$, we expect that every 4-bit word belongs to $\mathrm{Im}(\mathrm{Sb} + \mathrm{id}_4)$ with a probability of $\left(\frac{7}{16}\right)^{16} \approx 2^{-19.082}$. In fact, one can find enough such $c \in W_L(D)$ in a reasonable time that generate the whole invariant space $W_L(D)$, implying that $W_L(D) \subseteq \mathrm{LS}_0(g)$ for all invariants $g \in U(S)$ with $W_L(D) \subseteq \mathrm{LS}(g)$. We then run Algorithm 6.1 on $Z = W_L(D)$. The algorithm terminates and we therefore deduce the non-existence of any non-trivial invariant which is invariant for $S$ and the linear parts of all rounds in Mantis$_7$ (where the tweak is assumed to be zero).
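The quantities used in this argument are easy to recompute. The sketch below assumes that the Mantis S-box is the table `SB` (recalled from the Midori and Mantis specifications; Table 2.2 is not reproduced in this section, so the table is an assumption of this example). The helper `all_cells_in_image` is a hypothetical name for the per-cell membership test applied to a 64-bit candidate.

```python
import math

# Assumption: 4-bit S-box of Mantis, recalled from the cipher specification.
SB = [0xC, 0xA, 0xD, 0x3, 0xE, 0xB, 0xF, 0x7,
      0x8, 0x9, 0x1, 0x5, 0x0, 0x2, 0x4, 0x6]

# Image set of Sb + id_4, i.e. all values Sb(x) + x.
IMAGE = sorted({SB[x] ^ x for x in range(16)})

# A fixed point is a cycle of length one, hence an odd cycle.
FIXED_POINTS = [x for x in range(16) if SB[x] == x]

# log2 of the probability that all 16 cells of a random 64-bit value lie in the image set.
LOG2_PROB = 16 * math.log2(len(IMAGE) / 16)


def all_cells_in_image(c):
    """True if every 4-bit cell of the 64-bit value c lies in Im(Sb + id_4)."""
    return all(((c >> (4 * j)) & 0xF) in IMAGE for j in range(16))


print([format(v, "X") for v in IMAGE])
print(FIXED_POINTS)
print(round(LOG2_PROB, 3))
```

Candidates $c \in W_L(D)$ passing `all_cells_in_image` would then be collected until they span $W_L(D)$; computing $W_L(D)$ itself requires the Mantis linear layer and round constants, which are not part of this sketch.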

Midori-64

For Midori-64, $W_L(D) = \{(0, 0, 0, 0), (0, 0, 0, 1)\}^{16}$ and has dimension 16 only. Then, there are $2^{48}$ different cosets of $W_L(D)$, implying that our algorithm is not efficient. Indeed, the cipher has a significant set of weak keys, as shown in [GJN+16, TLS16] (see also Example 6.1).

6.5 Design Criteria on the Linear Layer and on the Round Constants

In this section, we study the properties of $W_L(D)$ in more detail and explain the different behaviors which have been previously observed. Most notably, we would like to determine whether the differences in the dimensions of $W_L(D)$ we noticed are due to a bad choice of the round constants or if they are inherent to the choice of the linear layer. To this end, we analyze the possible values for the dimension of $W_L(D)$ from a more theoretical viewpoint. We first consider the $L$-invariant subspace $W_L(c)$ generated by a single element $c$.

6.5.1 The Possible Dimensions of the L-Invariant Subspaces

We show that, for a single element $c$, the dimension of $W_L(c)$ is upper-bounded by the degree of the minimal polynomial of the linear layer. Recall that, for a linear invertible mapping $L : \mathbb{F}_2^n \to \mathbb{F}_2^n$, the minimal polynomial of $L$ is defined as the monic polynomial $m_L \in \mathbb{F}_2[X]$ of smallest degree such that $m_L(L) = 0$. Moreover, we have to consider the minimal polynomial with respect to single elements in $\mathbb{F}_2^n$, defined as follows.

Definition 6.4 (e.g., page 176 of [Gan59]). The minimal annihilating polynomial of an element $c \in \mathbb{F}_2^n$ (w.r.t. $L$) (aka the order polynomial of $c$ or simply the minimal polynomial of $c$) is the monic polynomial $\mathrm{ord}_L(c) = \sum_{i=0}^{d} \pi_i X^i \in \mathbb{F}_2[X]$ of smallest degree such that

\[ \mathrm{ord}_L(c)(L)(c) = \sum_{i=0}^{d} \pi_i (L^i(c)) = 0 . \]

Proposition 6.7. Let $L$ be a linear permutation of $\mathbb{F}_2^n$. For any non-zero $c \in \mathbb{F}_2^n$, the dimension of $W_L(c)$ is the degree of the minimal polynomial of $c$.

Proof. We know from Lemma 6.1 that $W_L(c)$ is spanned by all $L^i(c)$, $i \geq 0$. Let $d$ be the smallest integer such that $L^d(c) \in \mathrm{span}\{L^i(c) \mid 0 \leq i < d\}$; in particular, $c, L(c), \dots, L^{d-1}(c)$ are linearly independent. By definition, $d$ corresponds to the degree of the minimal polynomial of $c$, since the fact that $L^d(c)$ belongs to $\mathrm{span}\{L^i(c) \mid 0 \leq i < d\}$ is equivalent to the existence of $\pi_0, \dots, \pi_{d-1} \in \mathbb{F}_2$ such that $L^d(c) = \sum_{i=0}^{d-1} \pi_i L^i(c)$, i.e., $P(L)(c) = 0$ with $P = X^d + \sum_{i=0}^{d-1} \pi_i X^i \in \mathbb{F}_2[X]$. It follows that $d \leq \dim W_L(c)$.

We now need to prove that $d = \dim W_L(c)$, i.e., that all $L^{d+t}(c)$ for $t \geq 0$ belong to the linear subspace spanned by $\{c, L(c), \dots, L^{d-1}(c)\}$. This can be proven by induction on $t$. The property holds for $t = 0$ by definition of $d$. Suppose now that $L^{d+t}(c) = \sum_{i=0}^{d-1} \lambda_i L^i(c) \in \mathrm{span}\{c, L(c), \dots, L^{d-1}(c)\}$. Then,

\[ L^{d+t+1}(c) = L\big(L^{d+t}(c)\big) = L\Big(\sum_{i=0}^{d-1} \lambda_i L^i(c)\Big) = \sum_{i=0}^{d-1} \lambda_i L^{i+1}(c) \in \mathrm{span}\{c, \dots, L^{d-1}(c)\} . \]

Obviously, the minimal polynomial of $c$ is a divisor of the minimal polynomial of $L$. The previous proposition then provides an upper bound on the dimension of any subspace $W_L(c)$, for $c \in \mathbb{F}_2^n \setminus \{0\}$.

Corollary 6.1. Let $L$ be a linear permutation of $\mathbb{F}_2^n$. For any $c \in \mathbb{F}_2^n$, the dimension of $W_L(c)$ is at most the degree of the minimal polynomial of $L$.

We can even get a more precise result and show that the possible values for the dimension of $W_L(c)$ correspond to the degrees of the divisors of $m_L$. Moreover, there are some elements $c$ which lead to any of these values. In particular, the degree of $m_L$ can always be achieved. This result can be proven in a constructive way by using the representation of the associated matrix as a block diagonal matrix whose diagonal consists of companion matrices.

Let us first focus on the special case when the minimal polynomial of $L$ has degree $n$. It is well known that, in this case, there is a basis such that the matrix of $L$ is the companion matrix of $m_L$ (e.g., [Her75, Lemma 6.7.1]). Using this property, we can prove the following proposition.

Proposition 6.8. Let $L$ be a linear permutation of $\mathbb{F}_2^n$ corresponding to the multiplication by some companion matrix $C_Q$ with $Q \in \mathbb{F}_2[X]$ of degree $n$. For any non-constant divisor $P$ of $Q$ in $\mathbb{F}_2[X]$, there exists $c \in \mathbb{F}_2^n$ such that $\mathrm{ord}_L(c) = P$.

Proof. When the matrix of the linear permutation we consider is a companion matrix $C_Q$, then the elements $\{c^\top, c^\top C_Q, c^\top C_Q^2, \dots\}$ can be seen as the successive internal states of the $n$-bit LFSR with characteristic polynomial $Q$ and initial state $c$ (see [LN94, Lemma 6.12 and p. 195]). It follows that $\mathrm{ord}_L(c)$ corresponds to the minimal polynomial of the sequence produced by the LFSR with characteristic polynomial $Q$ and initial state $c$ (see [LN94, Theorem 6.51]). On the other hand, it is well known that there is a one-to-one correspondence between the sequences $(s_t)_{t \geq 0}$ produced by the LFSR with characteristic polynomial $Q$ and the set of polynomials $C \in \mathbb{F}_2[X]$ with $\deg C < \deg Q$ [LN94, Theorem 6.40]. This comes from the fact that the generating function of any LFSR sequence can be written as

\[ \sum_{t \geq 0} s_t X^t = \frac{C(X)}{Q^*(X)} , \]

where $Q^*$ denotes the reciprocal of polynomial $Q$, i.e., $Q^*(X) = X^{\deg Q} Q(1/X)$, and $C$ is defined by the LFSR initial state.

Let now $P$ be any non-constant divisor of $Q$, i.e., $Q(X) = P(X) R(X)$ with $P \neq 1$. Then, the reciprocal polynomials satisfy $Q^*(X) = P^*(X) R^*(X)$. It follows that, for $C(X) = R^*(X)$,

\[ \frac{C(X)}{Q^*(X)} = \frac{1}{P^*(X)} . \]

Therefore, the sequence generated from the initial state defined by $C = R^*$ has minimal polynomial $P$. This is equivalent to the fact that the order polynomial of this initial state equals $P$.

When the degree of the minimal polynomial of the linear layer is smaller than the block size, the previous result can be generalized by representing $L$ by a block diagonal matrix whose diagonal is composed of companion matrices. It leads to the following general result on the possible dimensions of $W_L(c)$.

Proposition 6.9. Let $L$ be a linear permutation of $\mathbb{F}_2^n$ and $m_L$ be its minimal polynomial. Then, for any divisor $P$ of $m_L$, there exists $c \in \mathbb{F}_2^n$ such that $\dim W_L(c) = \deg P$.

Most notably,

\[ \max_{c \in \mathbb{F}_2^n} \dim W_L(c) = \deg m_L . \]

Proof. If $P$ equals the constant polynomial of degree zero, i.e., $P = 1$, we choose $c = 0$. Therefore, we assume in the following that $P$ is of positive degree.

Let us factor the minimal polynomial of $L$ as

\[ m_L(X) = M_1(X)^{e_1} M_2(X)^{e_2} \cdots M_k(X)^{e_k} \]

where $M_1, \dots, M_k$ are distinct irreducible polynomials over $\mathbb{F}_2$. From Theorem 6.7.1 and its corollary in [Her75], $\mathbb{F}_2^n$ can be decomposed into a direct sum of $L$-invariant subspaces

\[ \mathbb{F}_2^n = \bigoplus_{i=1}^{k} \bigoplus_{j=1}^{r_i} V_{i,j} \]

such that the matrix of the linear transformation induced by $L$ on $V_{i,j}$ is the companion matrix of $M_i^{\ell_{i,j}}$, where the $\ell_{i,j}$ are integers such that $\ell_{i,1} = e_i$ (the polynomials $M_i^{\ell_{i,j}}$ are called the elementary divisors of $L$). Let now $P$ be a non-constant divisor of $m_L$. Thus, we assume w.l.o.g. that

\[ P(X) = M_1(X)^{a_1} M_2(X)^{a_2} \cdots M_\kappa(X)^{a_\kappa} \quad \text{with } 1 \leq a_i \leq e_i . \]

Since each $M_i^{a_i}$ is a non-constant divisor of $M_i^{e_i}$, we know from Proposition 6.8 that there exists $u_i \in V_{i,1}$ such that $\mathrm{ord}_{L_i}(u_i) = M_i^{a_i}$, where $L_i$ denotes the linear transformation induced by $L$ on $V_{i,1}$. Let us now consider the element $c \in \bigoplus_{i=1}^{\kappa} V_{i,1}$ defined by $c = \sum_{i=1}^{\kappa} u_i$. Let $\pi_0, \dots, \pi_{d-1} \in \mathbb{F}_2$ be such that $R(X) := X^d + \sum_{t=0}^{d-1} \pi_t X^t$ equals the order polynomial of $c$. In particular,

\[ L^d(c) = \sum_{t=0}^{d-1} \pi_t L^t(c) . \]

Using that $L^t(c) = \sum_{i=1}^{\kappa} L^t(u_i)$ and the direct sum property, we deduce that, for any $1 \leq i \leq \kappa$,

\[ L^d(u_i) = \sum_{t=0}^{d-1} \pi_t L^t(u_i) . \]

Then, $R$ is a multiple of the order polynomials of all $u_i$. It follows that $R$ must be a multiple of $\mathrm{lcm}(M_1^{a_1}, \dots, M_\kappa^{a_\kappa}) = P$. Since $P(L)(c) = 0$, we deduce that the order polynomial of $c$ is equal to $P$.

LED

The minimal polynomial of the linear layer in LED is

\[ m_L = (X^8 + X^7 + X^5 + X^3 + 1)^4 (X^8 + X^7 + X^6 + X^5 + X^2 + X + 1)^4 \in \mathbb{F}_2[X] . \]

Since its degree equals the block size, we deduce from the previous proposition that there exists an element $c \in \mathbb{F}_2^{64}$ such that $W_L(c)$ covers the whole space.

Skinny

The linear layer in Skinny with a $16s$-bit state, $s \in \{4, 8\}$, is an $\mathbb{F}_{2^s}$-linear permutation of $(\mathbb{F}_{2^s})^{16}$ defined by a $16 \times 16$ matrix $M$ with coefficients in $\mathbb{F}_2$. Moreover, the multiplicative order of this matrix in $\mathrm{GL}_{16}(\mathbb{F}_2)$ equals 16, implying that the minimal polynomial of $L$ is a divisor of $X^{16} + 1$. It can actually be checked that $(M + \mathrm{id}_{16})^e \neq 0$ for all $e < 16$, implying that

\[ m_L = X^{16} + 1 = (X + 1)^{16} \in \mathbb{F}_2[X] . \]

It follows that there exist some elements $c \in (\mathbb{F}_{2^s})^{16}$ such that $\dim W_L(c) = d$ for any value of $d$ between 1 and 16. Elements $c$ which generate a subspace $W_L(c)$ of given dimension can be easily exhibited using the construction detailed in the proof of Proposition 6.8. Indeed, up to a change of basis, the matrix of $L$ in $\mathrm{GL}_{16}(\mathbb{F}_2)$ corresponds to the companion matrix of $(X^{16} + 1)$, i.e., to a mere rotation of 16-bit vectors. In other words, we can find a matrix $U \in \mathrm{GL}_{16}(\mathbb{F}_2)$ such that $M = U \times C_{X^{16}+1} \times U^{-1}$. Let us now consider elements $c \in (\mathbb{F}_{2^s})^{16}$ for which only the least significant bits of the cells can take non-zero values. Let $b$ be the 16-bit vector corresponding to these least significant bits, then $\dim W_L(c) = d$ where $d$ is the length of the shortest LFSR generating $b' = U^{-1} b$.

Prince

The minimal polynomial of the linear layer in Prince is

\begin{align*}
m_L &= X^{20} + X^{18} + X^{16} + X^{14} + X^{12} + X^8 + X^6 + X^4 + X^2 + 1 \\
&= (X^4 + X^3 + X^2 + X + 1)^2 (X^2 + X + 1)^4 (X + 1)^4 \in \mathbb{F}_2[X] .
\end{align*}

The maximal dimension of $W_L(c)$ is then 20 and the factorization of $m_L$ shows that there exist elements which generate subspaces of much lower dimension.

Mantis and Midori-64

Mantis and Midori-64 share the same linear layer, which has minimal polynomial

\[ m_L = (X + 1)^6 \in \mathbb{F}_2[X] . \]

We deduce that $\dim W_L(c) \leq 6$.

6.5.2 Considering More Round Constants

We can now consider more than one round constant and determine the maximum dimension of $W_L(c_1, \ldots, c_t)$ spanned by $t$ elements. This value is related to the so-called invariant factor form (also known as the rational canonical form) of the linear layer, as defined in the following proposition.

Proposition 6.10 (Invariant factors (see Chapter 12, Theorem 16 of [DF04])). Let $L$ be a linear permutation of $\mathbb{F}_2^n$. A basis of $\mathbb{F}_2^n$ can be found in which the matrix of $L$ is of the form

$$\begin{pmatrix}
C_{Q_r} & & & \\
 & C_{Q_{r-1}} & & \\
 & & \ddots & \\
 & & & C_{Q_1}
\end{pmatrix}$$

for polynomials $Q_i$ such that $Q_r \mid Q_{r-1} \mid \cdots \mid Q_1$. The polynomial $Q_1$ equals the minimal polynomial of $L$. In this decomposition, the $Q_i$ are called the invariant factors of $L$.


The invariant factors of the linear layer then define the maximal dimension of $W_L(c_1, \ldots, c_t)$, as stated in Theorem 6.1, which we restate below.

Theorem 6.1. Let $Q_1, \ldots, Q_r$ be the invariant factors of the linear layer $L$ and let $t \leq r$. Then

$$\max_{c_1, \ldots, c_t \in \mathbb{F}_2^n} \dim W_L(c_1, \ldots, c_t) = \sum_{i=1}^{t} \deg Q_i .$$

Most notably, the minimal number of elements that must be considered in $D$ in order to generate a space $W_L(D)$ of full dimension is equal to the number of invariant factors of the linear layer.

Proof of Theorem 6.1

We represent $L$ in invariant factor form as in Proposition 6.10. We denote by $V_1, \ldots, V_r$ the invariant subspaces such that $\mathbb{F}_2^n = \bigoplus_{i=1}^{r} V_i$ and the linear transformation induced by $L$ on $V_i$, denoted $L|_{V_i}$, is represented by the companion matrix $C_{Q_i}$. We define $e_{V_i}$ as the first unit vector in $V_i$, i.e., $V_i = \mathrm{span}\{L^k(e_{V_i}) \mid 0 \leq k < \deg Q_i\}$ and $\mathrm{ord}_{L|_{V_i}}(e_{V_i}) = Q_i$. Using Proposition 6.8, one can prove the following lemma.

Lemma 6.3. Let $t \leq r$. Then

$$\max_{c_1, \ldots, c_t \in \mathbb{F}_2^n} \dim W_L(c_1, \ldots, c_t) \geq \sum_{i=1}^{t} \deg Q_i .$$

Proof. We choose $c_1 = e_{V_1}$ and obtain $W_L(c_1) = W_{L|_{V_1}}(c_1) = V_1$. Then $\dim V_1$ equals $\deg Q_1$. We now continue with $L|_{V_2 \oplus \cdots \oplus V_r}$, which has minimal polynomial $Q_2$, and choose $c_2$ accordingly. Iterating this until $c_t$, we construct $W_L(c_1, \ldots, c_t)$ as the direct sum $\bigoplus_{i=1}^{t} W_L(c_i)$, which has dimension $\sum_{i=1}^{t} \deg Q_i$.

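In order to prove equality, we need the following two lemmas.

Lemma 6.4. Let $c \in \mathbb{F}_2^n = \bigoplus_{j=1}^{r} V_j$ be represented as $c = \sum_{j \in J} u_j$ with $J \subseteq \{1, \ldots, r\}$ and $u_j \in V_j \setminus \{0\}$. Then $W_L(c) \subseteq W_L(\bar{c})$ with $\bar{c} := \sum_{j \in J} e_{V_j}$.

Proof. Let $v \in W_L(c)$. Then

$$\begin{aligned}
v &= \sum_{i \in \mathbb{N}} \alpha_i L^i(c) = \sum_{i \in \mathbb{N}} \alpha_i L^i\Big(\sum_{j \in J} u_j\Big) = \sum_{i \in \mathbb{N}} \sum_{j \in J} \alpha_i L^i(u_j) \\
  &= \sum_{i \in \mathbb{N}} \sum_{j \in J} \alpha_i L^i\Big(\sum_{k \in \mathbb{N}} \beta_k L^k(e_{V_j})\Big) = \sum_{i \in \mathbb{N}} \sum_{j \in J} \sum_{k \in \mathbb{N}} \alpha_i \beta_k L^{i+k}(e_{V_j}) \\
  &= \sum_{i \in \mathbb{N}} \sum_{k \in \mathbb{N}} \alpha_i \beta_k L^{i+k}\Big(\sum_{j \in J} e_{V_j}\Big) = \sum_{i \in \mathbb{N}} \sum_{k \in \mathbb{N}} \alpha_i \beta_k L^{i+k}(\bar{c}) \in W_L(\bar{c}) .
\end{aligned}$$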

This implies that for any $c_1, \ldots, c_t \in \mathbb{F}_2^n$, we have $W_L(c_1, \ldots, c_t) \subseteq W_L(\bar{c}_1, \ldots, \bar{c}_t)$. Thus, we can assume w.l.o.g. that all $c_i$ are of the form $c_i = \sum_{j=1}^{r} \gamma_{ij} e_{V_j}$ with $\gamma_{ij} \in \mathbb{F}_2$. Then, to any $t$-tuple $(c_1, \ldots, c_t) \in (\mathbb{F}_2^n)^t$ where each $c_i$ is of the form described above, we associate a $t \times r$ matrix $M_{(c_1, \ldots, c_t)} := [\gamma_{ij}]_{i,j}$ over $\mathbb{F}_2$.

Lemma 6.5. Let $(c_1, \ldots, c_t) \in (\mathbb{F}_2^n)^t$ be such that $c_i = \sum_{j=1}^{r} \gamma_{ij} e_{V_j}$ and let $M_{(c'_1, \ldots, c'_t)}$ be any matrix obtained from $M_{(c_1, \ldots, c_t)}$ by elementary row operations. Then, for $(c'_1, \ldots, c'_t)$ corresponding to $M_{(c'_1, \ldots, c'_t)}$, we have

$$W_L(c'_1, \ldots, c'_t) = W_L(c_1, \ldots, c_t) .$$

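Proof. For a $t \times r$ matrix over $\mathbb{F}_2$, an elementary row operation is either

(i) a swap of two different rows or

(ii) an addition of one row to another.

Transforming a matrix $M_{(c_1, \ldots, c_r, \ldots, c_s, \ldots, c_t)}$ by operation (i) results in the matrix $M_{(c_1, \ldots, c_s, \ldots, c_r, \ldots, c_t)}$, and the sum $\sum_{i=1}^{t} W_L(c_i)$ is obviously commutative.

We therefore only have to show that for two constants $c_r, c_s$ the equality $W_L(c_r) + W_L(c_s) = W_L(c_r + c_s) + W_L(c_s)$ holds. Let $v \in W_L(c_r) + W_L(c_s)$. Then,

$$\begin{aligned}
v &= \sum_{i \in \mathbb{N}} \big(\alpha_i L^i(c_r) + \beta_i L^i(c_s)\big) = \sum_{i \in \mathbb{N}} \big(\alpha_i L^i(c_r) + \alpha_i L^i(c_s) + \alpha_i L^i(c_s) + \beta_i L^i(c_s)\big) \\
  &= \sum_{i \in \mathbb{N}} \big(\alpha_i L^i(c_r + c_s) + (\alpha_i + \beta_i) L^i(c_s)\big) \in W_L(c_r + c_s) + W_L(c_s) .
\end{aligned}$$

The other inclusion $\supseteq$ follows accordingly.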

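Now, we can prove the main theorem.

Proof of Theorem 6.1. The only thing left to show is $\leq$. Let $c_1, \ldots, c_t \in \mathbb{F}_2^n$ be given with $t \leq r$. By Lemma 6.4, $W_L(c_1, \ldots, c_t) \subseteq W_L(\bar{c}_1, \ldots, \bar{c}_t)$ for appropriate $\bar{c}_i = \sum_{j=1}^{r} \gamma_{ij} e_{V_j}$ with $\gamma_{ij} \in \mathbb{F}_2$.

Consider the matrix $M_{(\bar{c}_1, \ldots, \bar{c}_t)}$. Using elementary row operations, one can bring $M_{(\bar{c}_1, \ldots, \bar{c}_t)}$ into reduced row-echelon form $M_{(\tilde{c}_1, \ldots, \tilde{c}_t)}$. Now, by Lemma 6.5, the $\tilde{c}_i$ are such that $W_L(\bar{c}_1, \ldots, \bar{c}_t) = W_L(\tilde{c}_1, \ldots, \tilde{c}_t)$ and, most importantly, $W_L(\tilde{c}_1, \ldots, \tilde{c}_t) = \sum_{i=1}^{t} W_{L|_{V_i \oplus \cdots \oplus V_r}}(\tilde{c}_i)$. This is because $\tilde{c}_i = \sum_{j=1}^{r} \gamma_{ij} e_{V_j}$ has $\gamma_{ij} = 0$ for all $j < i$.

Since the minimal polynomial of $L|_{V_i \oplus \cdots \oplus V_r}$ equals $Q_i$, one finally obtains:

$$\dim W_L(c_1, \ldots, c_t) \leq \dim W_L(\tilde{c}_1, \ldots, \tilde{c}_t) = \dim \sum_{i=1}^{t} W_{L|_{V_i \oplus \cdots \oplus V_r}}(\tilde{c}_i) \leq \sum_{i=1}^{t} \deg Q_i .$$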

Prince

The linear layer of Prince has 8 invariant factors:

$$\begin{aligned}
Q_1 = Q_2 &= m_L = X^{20} + X^{18} + X^{16} + X^{14} + X^{12} + X^{8} + X^{6} + X^{4} + X^{2} + 1 , \\
Q_3 = Q_4 &= X^{8} + X^{6} + X^{2} + 1 = (X + 1)^4 (X^2 + X + 1)^2 , \\
Q_5 = Q_6 = Q_7 = Q_8 &= (X + 1)^2 ,
\end{aligned}$$

all in $\mathbb{F}_2[X]$. Then, from any set $D$ with 5 elements, the maximal dimension we can get for $W_L(D)$ is $20 + 20 + 8 + 8 + 2 = 58$, while we get 56 for the particular $D$ derived from the effective round constants, $D = \{\alpha, c_1 + c_2, c_1 + c_3, c_1 + c_4, c_1 + c_5\}$. We can then see that the round constants are not optimal, but that we can never achieve the full dimension with the number of rounds used in Prince.


Mantis and Midori-64

The linear layer of Mantis (resp. Midori-64) has 16 invariant factors:

$$Q_1 = \cdots = Q_8 = (X + 1)^6 \quad \text{and} \quad Q_9 = \cdots = Q_{16} = (X + 1)^2 .$$

From the set $D$ of size 7 (resp. 8) obtained from the actual round constants of Mantis7 (resp. Mantis8), we generate a space $W_L(D)$ of dimension 42 (resp. 48), which is then optimal. We also see that one needs at least 16 round constant differences $c_1, \ldots, c_{16}$ to cover the whole input space. It is worth noticing that the round constants in Midori are only non-zero on the least significant bit in each cell, implying that $W_L(D)$ has dimension at most 16. This is the main weakness of Midori-64 with respect to invariant attacks, and it explains why the use of the same linear layer in Mantis does not lead to a similar attack.

The maximal dimension we can reach from a given number of round constants for the linear layers of Prince and of Mantis is depicted in Fig. 6.1 at the beginning of this chapter.
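By Theorem 6.1, the best reachable dimension for $t$ constants is simply a running sum of the invariant factor degrees. The sketch below tabulates it for the degree lists quoted above; the function name `max_dimensions` is an illustrative choice.

```python
from itertools import accumulate


def max_dimensions(invariant_factor_degrees):
    """Entry t-1 is the maximal dim W_L(c_1, ..., c_t) according to Theorem 6.1."""
    degrees = sorted(invariant_factor_degrees, reverse=True)  # deg Q_1 >= deg Q_2 >= ...
    return list(accumulate(degrees))


# Degrees of the invariant factors as listed in the text.
PRINCE_DEGREES = [20, 20, 8, 8, 2, 2, 2, 2]
MANTIS_DEGREES = [6] * 8 + [2] * 8

for name, degrees in (("Prince", PRINCE_DEGREES), ("Mantis", MANTIS_DEGREES)):
    print(name, max_dimensions(degrees))
```

Reading off entry $t - 1$ of each list gives the value for $t$ constants, for instance the fifth entry for Prince and the seventh and eighth entries for Mantis.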

6.5.3 Choosing Random Round Constants

Often, the round constants of a cipher are chosen randomly. In this section, we want to compute the probability that a set $D$ of elements chosen uniformly at random generates a space $W_L(D)$ of maximal dimension. Again, we first consider the case of a single constant, i.e., $D = \{c\}$.

Proposition 6.11. Let $L$ be a linear permutation of $\mathbb{F}_2^n$. Assume that

$$m_L = M_1^{e_1} M_2^{e_2} \cdots M_k^{e_k} ,$$

where $M_1, \ldots, M_k$ are distinct irreducible polynomials in $\mathbb{F}_2[X]$. Then, the probability for a uniformly chosen $c \in \mathbb{F}_2^n$ to obtain $\dim W_L(c) = \deg m_L$ is

$$\mathrm{Prob}_{c \xleftarrow{\$} \mathbb{F}_2^n}\big(\dim W_L(c) = \deg m_L\big) = \prod_{i=1}^{k} \left(1 - \frac{1}{2^{\mu_i \deg M_i}}\right) ,$$

where $\mu_i$ is the number of invariant factors of $L$ which are multiples of $M_i^{e_i}$.

Proof. We use the decomposition based on the elementary divisors, as in the proof of Proposition 6.9. From [Her75, Page 308], $\mathbb{F}_2^n$ can be decomposed into a direct sum

$$\mathbb{F}_2^n = \bigoplus_{i=1}^{k} \bigoplus_{j=1}^{r_i} V_{i,j}$$

such that the matrix of the linear transformation induced by $L$ on $V_{i,j}$ is the companion matrix of $M_i^{\ell_{i,j}}$ where, for each $i$, the $\ell_{i,j}$, $1 \leq j \leq r_i$, form a decreasing sequence of integers such that $\ell_{i,1} = e_i$. Then, the minimal polynomial of any element $u$ in $V_{i,j}$ is a divisor of $M_i^{\ell_{i,j}}$. It follows that, if $c = \sum_{i=1}^{k} \sum_{j=1}^{r_i} u_{i,j} \in \bigoplus_{i=1}^{k} \bigoplus_{j=1}^{r_i} V_{i,j}$, then $\mathrm{ord}_L(c) = m_L$ if and only if, for any $i$, there exists an index $j$ such that $\mathrm{ord}_L(u_{i,j}) = M_i^{e_i}$. Obviously, this situation can only occur if $\ell_{i,j} = e_i$. This last condition is equivalent to the fact that $j \leq \mu_i$, where $\mu_i = \max\{j \mid \ell_{i,j} = e_i\}$. Using that the invariant factors of $L$ are related to the decomposition of $m_L$ by

$$Q_v = \prod_{i=1}^{k} M_i^{\ell_{i,v}} ,$$

where $\ell_{i,v} = 0$ if $v > r_i$, we deduce that $\mu_i$ is the number of invariant factors $Q_v$ which are multiples of $M_i^{e_i}$. Let us now define the event

$$E_{i,j} : \ \mathrm{ord}_L(u_{i,j}) = M_i^{\ell_{i,j}} .$$

Then, we have

$$\mathrm{Prob}_{c \xleftarrow{\$} \mathbb{F}_2^n}\big(\dim W_L(c) = \deg m_L\big) = \prod_{i=1}^{k} \mathrm{Prob}\Big(\bigcup_{j=1}^{\mu_i} E_{i,j}\Big) .$$

It is important to note that for a fixed $i$, the probability of the event $E_{i,j}$ is the same for all $j$. This probability corresponds to the proportion of polynomials of degree less than $\deg M_i^{\ell_{i,j}}$ which are coprime to $M_i^{\ell_{i,j}}$. Indeed, as noticed in the proof of Proposition 6.8, there is a correspondence between the elements in $V_{i,j}$ and the initial states of the LFSR with characteristic polynomial $M_i^{\ell_{i,j}}$. Recall that the number of polynomials coprime to a given polynomial $P$ is

$$\varphi(P) := |\{f \in \mathbb{F}_2[X] \mid \deg f < \deg P, \ \gcd(f, P) = 1\}| .$$

If $P$ is irreducible, then for any power of $P$ we have $\varphi(P^k) = 2^{(k-1) \deg P} (2^{\deg P} - 1)$. We then deduce that

$$\mathrm{Prob}(E_{i,j}) = \frac{\varphi(M_i^{\ell_{i,j}})}{2^{\ell_{i,j} \deg M_i}} = \frac{2^{(\ell_{i,j} - 1) \deg M_i} (2^{\deg M_i} - 1)}{2^{\ell_{i,j} \deg M_i}} = 1 - \frac{1}{2^{\deg M_i}} .$$

To compute $\mathrm{Prob}\big(\bigcup_{j=1}^{\mu_i} E_{i,j}\big)$, we use the inclusion-exclusion principle and obtain

$$\mathrm{Prob}\Big(\bigcup_{j=1}^{\mu_i} E_{i,j}\Big) = \sum_{j=1}^{\mu_i} (-1)^{j-1} \binom{\mu_i}{j} \left(1 - \frac{1}{2^{\deg M_i}}\right)^{j} = 1 - \frac{1}{2^{\mu_i \deg M_i}} .$$

LED

The minimal polynomial of the linear layer in LED is

$$m_L = (X^8 + X^7 + X^5 + X^3 + 1)^4 (X^8 + X^7 + X^6 + X^5 + X^2 + X + 1)^4 \in \mathbb{F}_2[X] .$$

A single constant $c$ is sufficient to generate the whole space. Since $m_L$ has two irreducible factors, each of degree 8, we get from the previous proposition that the probability that $W_L(c) = \mathbb{F}_2^{64}$ for a uniformly chosen constant $c$ is

$$\mathrm{Prob}\big(W_L(c) = \mathbb{F}_2^{64}\big) = (1 - 2^{-8})^2 \approx 0.9922 .$$
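The formula of Proposition 6.11 is easy to evaluate numerically. The following sketch assumes the linear layer is described by a list of pairs $(\deg M_i, \mu_i)$; this input encoding and the function name are illustrative choices, not part of the original text.

```python
from fractions import Fraction


def prob_max_dimension_single(factors):
    """Evaluate prod_i (1 - 2^(-mu_i * deg M_i)) for a list of (deg M_i, mu_i) pairs."""
    prob = Fraction(1)
    for degree, mu in factors:
        prob *= 1 - Fraction(1, 2 ** (mu * degree))
    return prob


# LED: two irreducible factors of degree 8, and a single invariant factor (mu_i = 1).
LED_FACTORS = [(8, 1), (8, 1)]
p_led = prob_max_dimension_single(LED_FACTORS)
print(f"LED: {p_led} ~ {float(p_led):.4f}")
```

Exact rational arithmetic is used so that the result can be compared against the rounded value in the text without floating-point doubts.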

Probability to generate the whole space with several random constants

One can also give a formula for the probability of reaching the maximal dimension with $t$ randomly chosen round elements, as $t$ varies. This probability depends strongly on the degrees of the irreducible factors of the minimal polynomial of $L$.

Theorem 6.3. Let $L$ be a linear permutation of $\mathbb{F}_2^n$. Assume that

$$m_L = M_1^{e_1} M_2^{e_2} \cdots M_k^{e_k} ,$$

where $M_1, \ldots, M_k$ are distinct irreducible polynomials over $\mathbb{F}_2$. Then, the probability that $W_L(c_1, \ldots, c_t)$ equals $\mathbb{F}_2^n$ is

$$\mathrm{Prob}_{c_1, \ldots, c_t \xleftarrow{\$} \mathbb{F}_2^n}\big(W_L(c_1, \ldots, c_t) = \mathbb{F}_2^n\big) = \prod_{j=1}^{k} \prod_{i_j=0}^{r_j - 1} \left(1 - \frac{1}{2^{(t - i_j) \deg M_j}}\right) ,$$

where $r_j$ is the number of invariant factors of $L$ which are multiples of $M_j$.

It is worth noticing that, when $t < r$ with $r$ the number of invariant factors, the product equals zero, which corresponds to the fact that we need at least $r$ constants to generate the whole space. The proof of this theorem can be found in the original publication [BCLR17].


Prince

Recall that the minimal polynomial of the linear layer in Prince is

$$\begin{aligned}
m_L &= X^{20} + X^{18} + X^{16} + X^{14} + X^{12} + X^{8} + X^{6} + X^{4} + X^{2} + 1 \\
    &= (X^4 + X^3 + X^2 + X + 1)^2 (X^2 + X + 1)^4 (X + 1)^4 \in \mathbb{F}_2[X] .
\end{aligned}$$

It then has three irreducible factors

$$M_1 = X^4 + X^3 + X^2 + X + 1, \quad M_2 = X^2 + X + 1 \quad \text{and} \quad M_3 = X + 1 .$$

Moreover, we know that the eight invariant factors of $L$ are

$$\begin{aligned}
Q_1 = Q_2 &= m_L , \\
Q_3 = Q_4 &= (X + 1)^4 (X^2 + X + 1)^2 , \\
Q_5 = Q_6 = Q_7 = Q_8 &= (X + 1)^2 .
\end{aligned}$$

We then deduce that $\mu_1 = 2$, $\mu_2 = 2$ and $\mu_3 = 4$. Proposition 6.11 then implies that $\dim W_L(c) \leq 20$ and

$$\mathrm{Prob}\big(\dim W_L(c) = 20\big) = (1 - 2^{-8})(1 - 2^{-4})^2 \approx 0.8755$$

for a uniformly chosen $c$. Since $L$ has 8 invariant factors, at least $t = 8$ elements $c_1, \ldots, c_8$ are needed to reach $W_L(c_1, \ldots, c_t) = \mathbb{F}_2^{64}$. The number of invariant factors in which each of the $M_i$ appears is given by $r_1 = 2$, $r_2 = 4$ and $r_3 = 8$. From Theorem 6.3, we get that the probability that $W_L(c_1, \ldots, c_8) = \mathbb{F}_2^{64}$ is

$$\prod_{i=0}^{1} \left(1 - 2^{-(8-i) \cdot 4}\right) \times \prod_{i=0}^{3} \left(1 - 2^{-(8-i) \cdot 2}\right) \times \prod_{i=0}^{7} \left(1 - 2^{-(8-i)}\right) \simeq 0.2895 .$$


Mantis and Midori-64

The minimal polynomial of the linear layer of Mantis and Midori-64 has a single irreducible factor, which is $X + 1$. This linear layer has 16 invariant factors. Since the first 8 invariant factors equal the minimal polynomial, which has degree 6, we derive from Proposition 6.11 that the probability that a uniformly chosen element generates a subspace of dimension 6 is

$$\mathrm{Prob}\big(\dim W_L(c) = 6\big) = 1 - 2^{-8} \approx 0.9961 .$$

We need at least 16 elements $c_1, \ldots, c_{16}$ to cover the whole space and this occurs with probability

$$\prod_{j=1}^{16} \left(1 - \frac{1}{2^{j}}\right) \simeq 0.28879 .$$

It is worth noticing that when we increase the number of random round constants from 16 to 20, this probability increases to 0.93879.
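The numbers quoted for Prince and Mantis can be reproduced by evaluating the product of Theorem 6.3 directly. The sketch below assumes each irreducible factor $M_j$ is given as a pair $(\deg M_j, r_j)$; the encoding and the function name are illustrative assumptions.

```python
from fractions import Fraction


def prob_full_space(factors, t):
    """Evaluate the product of Theorem 6.3 for t uniformly random constants.

    factors: list of (deg M_j, r_j), where r_j counts the invariant factors
    that are multiples of M_j.
    """
    if t < max(r for _, r in factors):
        return Fraction(0)  # fewer constants than invariant factors: never full
    prob = Fraction(1)
    for degree, r in factors:
        for i in range(r):
            prob *= 1 - Fraction(1, 2 ** ((t - i) * degree))
    return prob


PRINCE_FACTORS = [(4, 2), (2, 4), (1, 8)]  # (deg M_j, r_j) as stated in the text
MANTIS_FACTORS = [(1, 16)]

print(f"Prince, t = 8:  {float(prob_full_space(PRINCE_FACTORS, 8)):.4f}")
for t in (16, 20):
    print(f"Mantis, t = {t}: {float(prob_full_space(MANTIS_FACTORS, t)):.5f}")
```

The early return makes the zero probability for too few constants explicit instead of relying on a vanishing factor inside the product.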

Figure 6.2 at the beginning of this chapter shows how the probability that the whole space is covered increases with the number of randomly chosen elements, for the linear layers of LED, Skinny-64, Prince and Mantis. The curves corresponding to Skinny-64, Prince and Mantis have a similar shape because all three linear layers have a minimal polynomial divisible by $(X + 1)$, and this factor appears in all invariant factors. Then, the term corresponding to the irreducible factor of degree 1, namely

$$\prod_{j=t-r+1}^{t} \left(1 - \frac{1}{2^{j}}\right) ,$$

is the dominant term in the formula in Theorem 6.3. Most notably, for $t = r$, the probability is close to $(1 - 2^{-1})(1 - 2^{-2})(1 - 2^{-3})(1 - 2^{-4}) \simeq 0.3$.


For lightweight substitution-permutation ciphers with a simple key schedule, we provided a detailed analysis of the impact of the design of the linear layer and the particular choice of the round constants on the applicability of both the invariant subspace attack and the more recently published nonlinear invariant attack. With an algorithmic approach, a designer is now able to easily check the soundness of the chosen round constants, in combination with the choice of the linear layer, with regard to the resistance against invariant attacks, and can thus easily avoid possible weaknesses by design. We stress that in many cases, this analysis can be done independently of the choice of the substitution layer. We directly applied our methods to several existing lightweight ciphers and showed in particular why Skinny-64-64, Prince, and Mantis7 are secure against invariant attacks, unless the adversary exploits weaknesses which are not based on weaknesses of the underlying building blocks, i.e., substitution layer and linear layer. In fact, we are not aware of any such strong attacks in the literature.

As future work, one can think about further generalizations of invariant attacks. As already mentioned in [TLS16], it would be interesting to know if one can make use of statistical invariant attacks, i.e., invariant attacks that only work with a certain probability instead of for all possible plaintexts. In other words, is it possible to utilize nonlinear approximations that hold with an absolute correlation less than 1? Further, it would be worthwhile to study invariant attacks under more complex key schedules.

We have also seen a relation between a quadratic invariant $g$ for an instance $E_k$ and the existence of highly biased linear approximations over $E_k$. For future work, one could try to understand more about this relation. As a starting point, one could try to analyze the distribution of the correlations $\mathrm{cor}_{E_k}(\gamma, \gamma')$ over all $\gamma, \gamma' \in \Gamma_g$. There might also be a link to the observations made in [LAAZ11, Section 4.3] and [AABL12, Section 5].


Listing 6.1: Sage code for proving the non-applicability of the invariant attack.

from sage.geometry.hyperplane_arrangement.affine_subspace import AffineSubspace

# converts an integer to a binary vector.
def to_binary_vector(a, length):
    ls = Integer(a).bits()[::-1]
    return vector(GF(2), length, [0]*(length-len(ls))+ls)

# Evaluates the S-box layer with S-box Sb on vector v
def sbox_layer_eval(Sb, bit_Sb, v):
    w = copy(v)
    # integer division, so that range() receives an integer
    for i in range(len(w) // bit_Sb):
        w[(i*bit_Sb):((i+1)*bit_Sb)] = list(to_binary_vector(
            Sb[ZZ(list(w[(i*bit_Sb):((i+1)*bit_Sb)][::-1]), base=2)], bit_Sb))
    return w

# returns complement C of V such that C.intersection(V) is trivial
def decomposition_complement(V):
    L1 = list(V.basis())
    L2 = list(V.ambient_vector_space().basis())
    R = []
    for v in L2:
        if (v not in span(L1)):
            L1.append(v)
            R.append(v)
    return span(R)

# input: list of differences D, linear layer L as a matrix
# output: the subspace W_L(D)
def W_space(D, L):
    R = []
    for c in D:
        for j in range(L.multiplicative_order()):
            R.append((L**j)*c)
    return span(R)

# input: S-box S, subspace Z of W_L(D)
# if true, the constants prevent invariant attacks
def check_with_sbox(S, Z):
    bit_S = int(log(len(S), 2))

    # define 0 + Z as an affine space and choose a complement Q of Z
    # Q is isomorphic to (GF(2)^n)/Z and each q in Q is a
    # representative of a different coset q + Z
    A = AffineSubspace(0, Z)
    Q = AffineSubspace(0, decomposition_complement(Z))

    # ls will indicate all cosets "hit" by the S-box layer
    ls = set()
    k = 2**Q.dimension()
    print(repr(k) + ' cosets to check')
    percent_done = 0

    # repeat this until each coset is hit
    while (len(ls) < k):
        a = A.linear_part().random_element() + A.point()
        b = sbox_layer_eval(S, bit_S, a)
        # q gives the unique representative of the coset in Q
        q = Q.intersection(AffineSubspace(b, Z)).point()
        # add integer representation of q in the set of cosets hit.
        ls.add(ZZ(list(q), base=2))
        if (len(ls)/k >= (percent_done+1)/100):
            percent_done = percent_done + 1
            print(repr(percent_done) + ' % done')

    return True


Chapter 7

Differential Trails in Simon-like Ciphers

This chapter is a revised version of the author’s publication [Bei16].1

7.1 Introduction

Once a new cipher is proposed, the designers are expected to provide security arguments on the resistance against differential and linear attacks. In particular, any new design itself should allow for a security argument that is, if possible, simple. In SP ciphers, the separation into linear and non-linear components offers the advantage of analyzing the structure more easily. As we have already introduced in Section 2.4.2 and Section 2.4.3, two principles are common. Firstly, one can try to obtain theoretical bounds on the minimum number of active S-boxes according to the wide-trail strategy (see Theorem 2.4), or secondly, one can employ computer-aided methods. The advantage of having provable bounds on the minimum number of active S-boxes is one reason why so many AES-like designs have appeared over the last years. It also emphasizes that designers prefer well-understood principles. While for AES-like ciphers counting the number of active S-boxes can to some extent be done independently of the choice of the S-box, some other strategies use specific properties of the non-linear components. For instance, the designers of Present showed that any valid five-round differential trail has at least 10 active S-boxes, using properties of the actual S-boxes [BKL+07].

The other strategy is evaluating the security using computer-aided search methods. For instance, one can model the propagation of differential and linear trails as a mixed-integer linear programming instance [MWGP12, SHW+14b, BJL+15].

1 The original article published by Springer-Verlag is available at DOI: 10.1007/978-3-319-44618-9_23 (© Springer International Publishing Switzerland 2016).

Examples of designs which use experimental arguments are the hash function Keccak [BDPA11], the block cipher Serpent [BAK98], and also the AES-like designs Skinny and Mantis (Chapter 5). However, the bounds obtained with this approach, although very useful, are not verifiable without a machine and do not contribute significantly to a better understanding of the design itself.

Basically, in both strategies, (if the non-linear component is not too weak) the design of the linear layer is the crucial step when it comes to providing security against differential and linear attacks. While a single round can often be analyzed quite easily, the analysis of the linear layer usually has to be done using a more complex argument over multiple rounds. Unfortunately, not many constructions are known that allow one to prove security using arguments that are verifiable by hand. One may therefore seek alternative design principles. Especially for many Feistel designs, the constructions might be less clear and less understood. However, there are some fundamental results on bounding the differential and linear behavior [NK95]. There are also Feistel designs which consist of SP-type round functions [SP04, SS04], combining the advantages of the Feistel construction and the simple arguments of the wide-trail strategy.

In contrast to a scientific design process, the NSA presented the Simon family of lightweight block ciphers [BSS+13]. Besides its specification, no arguments on the security are provided. Especially since Simon is an innovative Feistel cipher, its design is harder to analyze. Besides its non-bijective round function and the combination of the branches after every round, the difficulties are caused by the bit-wise structure. Since the design choices were left unclear in the first place, one seeks a deeper understanding of the cipher.

Related Work

The appearance of the Simon family of block ciphers inspired the cryptographic community to investigate the possible design rationale further. Therefore, several cryptanalytic results followed. For instance, see [ABG+13, AL13, WLV+14, WWJZ14, AAA+15, ALLW15, Ash15, BRV15, CW16, KSI16, TM16] for a selection. They are mostly based on experimental search. No significant weaknesses have been found so far and Simon still offers a reasonable security margin based on existing cryptanalysis.

In [KLT15], Kölbl, Leander and Tiessen pointed out some interesting properties of Simon-like round functions. Those observations were then used for a further analysis of the differential and linear behavior over multiple rounds. Although the analysis of the round function was done in a mathematically rigorous manner, the multi-round behavior was derived using a computer-aided approach. As one result, the rotation constants of Simon turned out to be in some sense not optimally chosen.

Inspired by the design of Simon, Yang et al. proposed the Simeck family of lightweight block ciphers in [YZS+15]. It can be seen as a Simon-like cipher using different rotation constants in its round function and a key schedule inspired by Speck [BSS+13].

At the NIST lightweight workshop in 2015, the designers of Simon presented a follow-up paper covering some considerations with regard to implementation [BSS+15].

Very recently, in [LLW17b], Liu, Li and Wang derived an upper bound on the absolute correlation of non-trivial linear trails in Simon and Simeck, adapting our methods to the case of linear cryptanalysis.


After describing a generalization of the Simon design by decoupling the round function into a linear and a non-linear component, we show that the structure of a Simon-like design allows for a proof of the resistance against differential attacks under standard simplifying assumptions. Whether the proof works depends on the interaction between those two components. If the non-linear part $\rho$ is of the form $\rho(x) = (x \lll a) \wedge (x \lll b)$, this can in general be formulated as a property of the linear part. A sufficient condition is that the linear part has a branch number (with respect to bits) of at least 11. Since this is not the case for Simon and Simeck, we consider those designs separately. In particular, for all versions of Simon and Simeck, we are able to show that

$$\prod_{i=1}^{t} \mathrm{Prob}\big(\alpha_{i-1} \xrightarrow{F^{f_S}} \alpha_i\big) \leq 2^{-2t+2} ,$$

where $t$ denotes the number of rounds of the particular cipher, $F^{f_S}$ the (unkeyed) round of the key-alternating Feistel structure, and $(\alpha_0, \ldots, \alpha_t)$ any non-zero $t$-round differential trail. We show this in detail for the example of Simon. Most importantly, for all versions of Simon and Simeck, the number of rounds $t$ is such that $2^{-2t+2} \leq 2^{-n}$, where $n$ denotes the block length.
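To see what the condition $2^{-2t+2} \leq 2^{-n}$ demands, one can solve it for $t$: it holds exactly when $2t - 2 \geq n$. The sketch below computes the smallest such $t$ for a few block lengths; the list of block lengths and the function names are illustrative assumptions, not a statement about the specified round numbers of any cipher.

```python
def trail_bound_log2(t):
    """log2 of the upper bound 2^(-2t+2) on the probability of a t-round trail."""
    return -2 * t + 2


def min_rounds_for_bound(n):
    """Smallest t such that 2^(-2t+2) <= 2^(-n), i.e. 2t - 2 >= n."""
    return (n + 3) // 2  # equals ceil((n + 2) / 2)


for n in (32, 48, 64, 96, 128):  # assumed sample block lengths
    t = min_rounds_for_bound(n)
    print(f"n = {n:3d}: t >= {t} (bound 2^{trail_bound_log2(t)})")
```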

In clear distinction to prior work such as [KLT15], our argument is a formal proof covering multiple rounds and can thus be verified without experimental tools. In our approach, we use a well-known property of the Simon-like Feistel function, namely that the set of possible output differences $U_\alpha$ defines an affine subspace depending on the input difference $\alpha$ and that the differential probability highly depends on the Hamming weight of $\alpha$. The main idea is that we extend the analysis of the Feistel function to the cases where $\alpha$ has a Hamming weight equal to two and consider the propagation of Hamming weights over the Feistel structure.$^2$

Figure 7.1 illustrates the bounds proven with our method and, as a comparison, the bounds obtained from experimental search described in [KLT15, Section 5.2] for two instances of Simon. It should be mentioned that, although our bounds are worse than the experimental results, they are still much better than the bounds one obtains by trivially multiplying the worst-case probabilities for every round.

2 Note that recently, in [LLW17a], Liu, Li and Wang presented an improved result on the differential probability of differences $\alpha$ with Hamming weight less than $n/4$. Their result would allow for some simplifications in the proof of our main result.

[Plot: $-\log_2 \prod_{i=1}^{t} \mathrm{Prob}(\alpha_{i-1} \xrightarrow{F^{f_S}} \alpha_i)$ (axis from 0 to 50) against the number of rounds $t$ (axis from 2 to 16), with curves labeled Simon-48, Simon-32, our bound, and trivial bound.]

Figure 7.1: Comparison of the experimental bounds for Simon-32 and Simon-48 as described in [KLT15, Section 5.2] and our provable bounds.

7.2 Preliminaries

For a slightly simpler notation in our arguments, throughout this chapter we will denote the bits within vectors $x \in \mathbb{F}_2^d$ with indices starting from zero, i.e., $x = (x_0, x_1, \ldots, x_{d-1})$. We may sometimes also use a superscript notation for denoting the position of a bit. For example, the element $(0, \ldots, 0, y^{(i)}, 0, 0, \ldots, 0)$ denotes a vector $(x_0, x_1, \ldots, x_{d-1})$ with $x_i = y$ and $x_j = 0$ for all $j \neq i$. Moreover, when $n$ denotes the block length of a cipher, we define $n' = n/2$.

Argument on the Resistance Against Differential Attacks

When designing a new block cipher, in the ideal case, one would like to avoid the existence of non-trivial differentials with high probability for all possible keyed instances. However, since in general, computing the maximum differential probability of multi-round differentials is a non-trivial task, one adheres to simplifying assumptions. In most cases, and as outlined in detail in Section 2.3.1, one concentrates on upper bounding the product of the differential probabilities over the rounds contained in any non-zero differential trail. In particular, if one is going to design a key-alternating cipher as depicted in Figure 2.2, a typical approach is to estimate the number of rounds $t$ such that $\prod_{i=1}^{t} \mathrm{Prob}(\alpha_{i-1} \xrightarrow{R_i} \alpha_i) < 2^{-n}$ for any non-zero $t$-round differential trail $(\alpha_0, \alpha_1, \ldots, \alpha_t)$ and finally specify the number of rounds of the primitive as $t + t_m$, where $t_m$ defines a reasonable security margin. This provides a sound argument on the resistance against differential attacks under the hypothesis of stochastic equivalence (i.e., Assumption 2.1), the assumption of independent round keys (i.e., Assumption 2.2), and the assumption that the value of the sum over the probabilities given in Equation 2.3 is dominated by a single differential trail.

For the key-alternating Feistel cipher Simon, where the unkeyed Feistel round is denoted by $F^{f_S}$ and the block length by $n$, we show that

$$\prod_{i=1}^{t} \mathrm{Prob}\big(\alpha_{i-1} \xrightarrow{F^{f_S}} \alpha_i\big) < 2^{-n}$$

for any non-zero $t$-round differential trail $(\alpha_0, \alpha_1, \ldots, \alpha_t)$, where $t$ is strictly smaller than the specified number of rounds, still leaving a reasonable security margin. Note that our security argument still implicitly assumes independent round keys, although the key is only added to one half of the $n$-bit state.

A Remark on the Key-Alternating Feistel Construction

In Section 2.2.3, we have explained the notion of a key-alternating Feistel structure. Recall that, for a vectorial Boolean function $f : \mathbb{F}_2^{n'} \to \mathbb{F}_2^{n'}$ and $k \in \mathbb{F}_2^{n'}$, we have given the keyed instance of the round as

$$\begin{aligned}
F_k^f : \mathbb{F}_2^{n'} \times \mathbb{F}_2^{n'} &\to \mathbb{F}_2^{n'} \times \mathbb{F}_2^{n'} \\
(x_l, x_r) &\mapsto (f(x_l) + x_r + k, \ x_l) .
\end{aligned}$$

Thereby, $f$ is called the Feistel function (or simply $f$-function) and $k$ is the round key. For simplicity, we will consider an identical Feistel function $f$ in every round. With regard to the key-alternating structure, we define $F^f := F_0^f$ as the unkeyed round of the cipher. The application of those unkeyed rounds is then interleaved by the round key addition, where the left half of each round key is always equal to zero.

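We are going to denote a difference within the Feistel structure as a pair $(\gamma, \delta) \in \mathbb{F}_2^{n'} \times \mathbb{F}_2^{n'}$, describing the differences in the left and the right branch, respectively. For a $t$-round differential trail $C = ((\gamma_0, \delta_0), (\gamma_1, \delta_1), \ldots, (\gamma_t, \delta_t))$ over (unkeyed) Feistel rounds $F^f$, we define

$$P(C) := \prod_{i=1}^{t} \mathrm{Prob}\big(\gamma_{i-1} \xrightarrow{f} \gamma_i + \delta_{i-1}\big) .$$

It is straightforward to deduce that

$$\prod_{i=1}^{t} \mathrm{Prob}\big((\gamma_{i-1}, \delta_{i-1}) \xrightarrow{F^f} (\gamma_i, \delta_i)\big) = \begin{cases} P(C) & \text{if } \forall i \in \mathbb{N}_{\leq t} : \gamma_{i-1} = \delta_i \\ 0 & \text{else.} \end{cases}$$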
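The case distinction above translates directly into code. The following sketch evaluates the product for a toy Feistel structure by brute force. The half-block size of 8 bits and the use of the Simeck rotation constants $(5, 0, 1)$ as a toy $f$-function are assumptions made only to keep the enumeration small; they are not a cipher instance from the text.

```python
from fractions import Fraction

N = 8                 # toy half-block size n' (assumption for brute force)
A, B, C = 5, 0, 1     # rotation constants of the toy f-function (assumption)
MASK = (1 << N) - 1


def rol(x, r):
    """Rotate the N-bit word x left by r positions."""
    r %= N
    return ((x << r) | (x >> (N - r))) & MASK


def f(x):
    """Toy Feistel function: (x <<< A) AND (x <<< B), plus (x <<< C)."""
    return (rol(x, A) & rol(x, B)) ^ rol(x, C)


def diff_prob(alpha, beta):
    """Exact probability of the differential alpha -> beta over f."""
    hits = sum(1 for x in range(1 << N) if f(x) ^ f(x ^ alpha) == beta)
    return Fraction(hits, 1 << N)


def trail_probability(trail):
    """Product of the round probabilities of a trail [(gamma_0, delta_0), ...]."""
    prob = Fraction(1)
    for (g0, d0), (g1, d1) in zip(trail, trail[1:]):
        if d1 != g0:
            return Fraction(0)            # the left difference must move to the right
        prob *= diff_prob(g0, g1 ^ d0)    # Prob(gamma_{i-1} -> gamma_i + delta_{i-1})
    return prob


print(trail_probability([(0, 1), (1, 0), (2, 1)]))
```

A round with zero left input difference is free, which is why trails that start and end with a zero left difference play a special role in the analysis.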


Lemma 7.1 presents a general observation on the Feistel construction. It states that, having upper bounds on the value of $P(C)$ for all differential trails $C$ starting with $(0, \alpha)$ and ending with $(0, \beta)$, one can easily upper bound the value of $P(C)$ of any differential trail $C$.

Lemma 7.1. For $t \geq 1$ and all non-zero differences $\alpha, \beta \in \mathbb{F}_2^{n'}$, let the value of $P(C)$ of any $t$-round trail $C$ (over round functions $F^f$) starting with $(0, \alpha)$ and ending with $(0, \beta)$ be upper bounded by $p(t)$.

Let further $p(0) := 1$ and $q := \max_{\alpha \neq 0, \beta} \mathrm{Prob}(\alpha \xrightarrow{f} \beta)$. Then, for any $T > 0$ and any non-zero $T$-round trail $C$ over round functions $F^f$, we have

$$P(C) \leq \max\Big\{ \max_{l \in \mathbb{N}_0, \ l < T} \big\{ p(l) \, q^{T-l-1} \big\}, \ p(T) \Big\} .$$

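Proof. Let us fix a non-zero $T$-round trail $C = (\gamma_0, \delta_0) \xrightarrow{F^f} (\gamma_1, \delta_1) \xrightarrow{F^f} \cdots \xrightarrow{F^f} (\gamma_T, \delta_T)$. The proof is now split into two cases.

(i) Let us assume that there exist distinct $i, j$ such that $\gamma_i = \gamma_j = 0$. Then, w.l.o.g., one can choose two distinct indices $i', j'$ such that $\gamma_{i'} = \gamma_{j'} = 0$ and $\gamma_l \neq 0$ for all $l < i'$ and all $l > j'$. If $(i', j') = (0, T)$, we already have $P(C) \leq p(T)$ by definition. Thus, let $(i', j') \neq (0, T)$. Now, by definition

$$P\big((\gamma_{i'}, \delta_{i'}), \ldots, (\gamma_{j'}, \delta_{j'})\big) \leq p(j' - i') .$$

Since $\gamma_{j'} = 0$ and, for all $l < i'$ and $l > j'$, $\gamma_l \neq 0$, we have

$$\begin{aligned}
P(C) &\leq p(j' - i') \prod_{l=0}^{i'-1} \mathrm{Prob}\big(\gamma_l \xrightarrow{f} \gamma_{l+1} + \delta_l\big) \prod_{l=j'+1}^{T-1} \mathrm{Prob}\big(\gamma_l \xrightarrow{f} \gamma_{l+1} + \delta_l\big) \\
     &\leq p(j' - i') \, q^{i'} q^{T-(j'+1)} = p(j' - i') \, q^{T-(j'-i')-1}
\end{aligned}$$

and $j' - i' < T$.

(ii) If $\gamma_i = 0$ for at most one $i$, then

$$\prod_{l=0}^{T-1} \mathrm{Prob}\big(\gamma_l \xrightarrow{f} \gamma_{l+1} + \delta_l\big) \leq \prod_{l \in \{0, \ldots, T-1\} \setminus \{i\}} \mathrm{Prob}\big(\gamma_l \xrightarrow{f} \gamma_{l+1} + \delta_l\big) \leq q^{T-1} = p(0) \, q^{T-1} .$$

As Lemma 7.1 states a general property for all Feistel structures, we give a simplified version in Section 7.3 as Corollary 7.1. It covers the special case of the round function in Simon and a Simon-like round function, which will be defined next.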


Simon and Simon-like Ciphers

We generalize the design of the Simon block cipher to the Simon-like structure. Figure 7.2 illustrates this construction. For the Simon-like design, one requires a rotational invariant function of algebraic degree two as the non-linear component. A vectorial function $f : \mathbb{F}_2^d \to \mathbb{F}_2^d$ is called rotational invariant if $f(x \lll r) = (f(x) \lll r)$ for all $x \in \mathbb{F}_2^d$ and all offsets $r$. This leads to the following definition.

Definition 7.1. A Simon-like $f$-function $f_S$ is composed of an $\mathbb{F}_2$-linear function $\theta : \mathbb{F}_2^{n'} \to \mathbb{F}_2^{n'}$ and a function $\rho : \mathbb{F}_2^{n'} \to \mathbb{F}_2^{n'}$ of algebraic degree two of the form $\rho(x) = \vartheta_1(x) \wedge \vartheta_2(x)$, with $\mathbb{F}_2$-linear and rotational invariant $\vartheta_i : \mathbb{F}_2^{n'} \to \mathbb{F}_2^{n'}$, as

$$f_S : x \mapsto \rho(x) + \theta(x) .$$

In this context, a Simon-like cipher employs such an $f$-function in a key-alternating Feistel construction.

Note that the rotational invariance is, in this general case, not required for the linear part $\theta$.
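As an illustration of Definition 7.1, the sketch below instantiates the $f$-function with plain rotations for $\vartheta_1$, $\vartheta_2$ and $\theta$, which is the simplest choice and an assumption of this example rather than the general definition, and tests rotational invariance on random inputs. The function names and the sampling-based check are illustrative.

```python
import random


def rol(x, r, n):
    """Rotate the n-bit word x left by r positions."""
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def simon_like_f(x, n, a, b, c):
    """f_S(x) = rho(x) + theta(x) with rho(x) = (x <<< a) AND (x <<< b), theta(x) = x <<< c."""
    rho = rol(x, a, n) & rol(x, b, n)
    theta = rol(x, c, n)
    return rho ^ theta  # addition over F_2 is XOR


def is_rotation_invariant(func, n, samples=200, seed=0):
    """Randomized check of func(x <<< r) == func(x) <<< r (evidence, not a proof)."""
    rng = random.Random(seed)
    for _ in range(samples):
        x = rng.getrandbits(n)
        r = rng.randrange(n)
        if func(rol(x, r, n)) != rol(func(x), r, n):
            return False
    return True


# Rotation constants (8, 1, 2) on 16-bit words, i.e. the Simon-32 round function shape.
simon16 = lambda x: simon_like_f(x, 16, 8, 1, 2)
print("rotational invariance holds on samples:", is_rotation_invariant(simon16, 16))
```

With rotations for all three components the whole $f$-function is rotational invariant, although the definition only demands this of $\vartheta_1$ and $\vartheta_2$.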

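[Figure: the Simon round function with rotations $\lll 8$, $\lll 1$, $\lll 2$, a bitwise AND and round key $k_t$, next to the generalized Simon-like round function with components $\vartheta_1$, $\vartheta_2$, $\theta$, the AND-based $\rho$ and round key $k_t$.]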

Figure 7.2: Illustration of the Simon round function and the generalized Simon-like round function.

7.3 Analysis of Differential Trails

In this section, we analyze the propagation characteristics of differences over several rounds. We rely on the fact that a single Simon-like round is quite well understood. Let

$$L_\alpha(x) := (\vartheta_1(x) \wedge \vartheta_2(\alpha)) + (\vartheta_1(\alpha) \wedge \vartheta_2(x)) .$$

We first recall the observation that for any input difference $\alpha \in \mathbb{F}_2^{n'}$ into a Simon-like round function $f_S$, a possible output difference lies in the affine subspace $U_\alpha := \mathrm{Im}\, L_\alpha + f_S(\alpha)$. The main reason is that $f_S$ has algebraic degree two. This is formally stated in Proposition 7.1.


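Proposition 7.1 (Kolbl, Leander, Tiessen [KLT15]). Let $f_S$ be a Simon-like $f$-function. For an input difference $\alpha \in \mathbb{F}_2^{n'}$ into $f_S$, the set of possible output differences defines an affine subspace $U_\alpha$ such that $\mathrm{Prob}(\alpha \xrightarrow{f_S} \beta) \neq 0$ if and only if $\beta \in U_\alpha$. Defining $d_\alpha := \dim \operatorname{Im} L_\alpha$, it holds that

\[ \beta \in U_\alpha \Leftrightarrow \beta + f_S(\alpha) \in \operatorname{Im} L_\alpha \]

and $\mathrm{Prob}(\alpha \xrightarrow{f_S} \beta) = 2^{-d_\alpha}$ for all valid differentials (i.e., differentials with non-zero differential probability) over $f_S$.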

Since the probability is the same for all output differences $\beta$ in this subspace, we simply write $p_\alpha$ for $\mathrm{Prob}(\alpha \xrightarrow{f_S} \beta)$ with $\beta \in U_\alpha$. For all output differences which are not elements in this subspace, the differential probability is equal to zero.

Because of the rotational invariance, it holds that $\operatorname{Im} L_{(\alpha \lll r)} = (\operatorname{Im} L_\alpha \lll r)$ with $p_{(\alpha \lll r)} = p_\alpha$. One can thus restrict the consideration to a single representative of this equivalence class if only one round function is analyzed.
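Proposition 7.1 can be checked exhaustively on a toy word size. The following is a minimal sketch under stated assumptions: the function names are hypothetical, the linear maps are rotations with illustrative constants, and words are Python integers. For each input difference it compares the brute-force set of output differences with $\operatorname{Im} L_\alpha + f_S(\alpha)$ and checks that every valid differential has the same count.

```python
from collections import Counter


def rotl(x, r, n):
    """Rotate the n-bit word x to the left by r positions."""
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def f_s(x, n, a, b, c):
    """Simon-like f-function with rotations as linear maps."""
    return (rotl(x, a, n) & rotl(x, b, n)) ^ rotl(x, c, n)


def image_of_L(alpha, n, a, b):
    """Enumerate Im L_alpha by closing the images of the unit vectors under XOR."""
    image = {0}
    for i in range(n):
        x = 1 << i
        v = (rotl(x, a, n) & rotl(alpha, b, n)) ^ (rotl(alpha, a, n) & rotl(x, b, n))
        image |= {u ^ v for u in image}
    return image


def check_proposition(n, a, b, c):
    """Return True if Proposition 7.1 holds for every input difference."""
    for alpha in range(1 << n):
        counts = Counter(
            f_s(x, n, a, b, c) ^ f_s(x ^ alpha, n, a, b, c) for x in range(1 << n)
        )
        image = image_of_L(alpha, n, a, b)
        u_alpha = {v ^ f_s(alpha, n, a, b, c) for v in image}
        # Possible output differences are exactly the affine subspace U_alpha ...
        if set(counts) != u_alpha:
            return False
        # ... and each of them occurs with probability 2^(-d_alpha) = 1 / |Im L_alpha|.
        if set(counts.values()) != {(1 << n) // len(image)}:
            return False
    return True


print(check_proposition(8, 5, 0, 1))
```

The exhaustive loop is only feasible for small $n'$; for real block sizes one would work with $\dim \operatorname{Im} L_\alpha$ directly.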

7.3.1 Restriction to Rotations as Rotational Invariant Functions

From now on, we consider $\vartheta_1, \vartheta_2$ of the form $\vartheta_1(x) = (x \lll a)$ and $\vartheta_2(x) = (x \lll b)$, respectively. This describes the simplest structure of the generalized Simon-like cipher. For the $\theta$ step defined as $\theta(x) = (x \lll c)$, one obtains Simon and Simeck as special cases using $(8, 1, 2)$ and $(5, 0, 1)$, respectively, as the choice for the rotation constants $(a, b, c)$. The following lemma states that we can obtain an upper bound on the differential probability over $f_S$ depending on the Hamming weight of the input difference. While a weaker version of Lemma 7.2 can be deduced from [KLT15, Theorem 3, p. 9], we improve the bound from [KLT15] if the Hamming weight of the input difference equals 2. Although this improvement seems to be of little importance at first glance, it is exactly this tighter bound which allows us to prove the main result. Thus, Lemma 7.2, and especially case (ii), is one of the core components in our argument.³

Lemma 7.2. Let $\vartheta_1(x) = (x \lll a)$ and $\vartheta_2(x) = (x \lll b)$. Assume that $n' \geq 6$ is even and $\gcd(a - b, n') = 1$. Let $\alpha = (\alpha_0, \dots, \alpha_{n'-1}) \in \mathbb{F}_2^{n'}$ be an input difference into $f_S$. Then, for the differential probability over $f_S$, it holds that

(i) If $w_1(\alpha) = 1$, then $p_\alpha \leq 2^{-2}$.

(ii) If $w_1(\alpha) = 2$, then $p_\alpha \leq 2^{-3}$.

(iii) If $w_1(\alpha) \neq n'$, then $p_\alpha \leq 2^{-w_1(\alpha)}$.

(iv) If $w_1(\alpha) = n'$, then $p_\alpha \leq 2^{-n'+1}$.

³ In [LLW17a], Liu, Li and Wang extended case (ii) to all $\alpha$ with $w_1(\alpha) < \frac{n'}{2}$. In particular, they showed that $p_\alpha \leq 2^{-w_1(\alpha)-1}$ in those cases.


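Proof. Without loss of generality one can assume that $b = 0$ and $a < \frac{n'}{2}$, $a \neq 0$ because of the rotational invariance and since $a - b$ and $n'$ are coprime. According to [KLT15, Theorem 3, p. 9], it is $p_\alpha = 2^{-d_\alpha}$ with

\[
d_\alpha =
\begin{cases}
w_1\big( ((\alpha \lll a) \vee \alpha) + (\alpha \wedge \overline{(\alpha \lll a)} \wedge (\alpha \lll 2a)) \big) & \text{if } w_1(\alpha) \neq n' \\
n' - 1 & \text{if } w_1(\alpha) = n' .
\end{cases}
\]

Note that $d_\alpha = \dim \operatorname{Im} L_\alpha$, where

\[ L_\alpha(x) = ((x \lll a) \wedge \alpha) + ((\alpha \lll a) \wedge x) . \]

(i), (iii) and (iv) follow directly from the above formula. In order to show (ii), we construct three linearly independent elements in $\operatorname{Im} L_\alpha$.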

Let $w_1(\alpha) = 2$ with $\alpha_0 = \alpha_i = 1$. W.l.o.g., let $i \leq \frac{n'}{2}$, $i \neq 0$, since every $\alpha$ with a Hamming weight of two is rotational equivalent to that assumed form. Now, consider the following three elements $x, y, z \in \mathbb{F}_2^{n'}$:

\[
\begin{aligned}
x &= (0, \dots, 0, 1^{(a)}, 0, \dots, 0) &&\Rightarrow\; L_\alpha(x) = (1^{(0)}, 0, \dots, 0, \alpha_{2a}^{(a)}, 0, \dots, 0) \\
y &= (0, \dots, 0, 1^{(a+i)}, 0, \dots, 0) &&\Rightarrow\; L_\alpha(y) = (0, \dots, 1^{(i)}, 0, \dots, 0, \alpha_{i+2a \bmod n'}^{(i+a)}, 0, \dots, 0) \\
z &= (1, 1, \dots, 1) &&\Rightarrow\; L_\alpha(z) = (\alpha \lll a) + \alpha .
\end{aligned}
\]

Clearly, $L_\alpha(x)$ and $L_\alpha(y)$ are linearly independent. To show that $L_\alpha(z) \notin \operatorname{span}\{L_\alpha(x), L_\alpha(y)\}$, we consider the two cases

(1) $\alpha_{i+2a \bmod n'} = 0$: Then $L_\alpha(y)_{i+a} = 0$. Since $L_\alpha(z)_{n'-a} = 1$ and $n' - a \notin \{0, i, a\}$, the linear independence follows.

(2) $\alpha_{i+2a \bmod n'} = 1$: Then $i + 2a \bmod n' \in \{0, i\}$ because of the construction of $\alpha$. However, since $2a \neq 0 \bmod n'$, it follows that $i + 2a = 0 \bmod n'$. Hence, $2a = n' - i$. Now $2a \neq i$, because otherwise $n' = 4a$ which is contradictory to $\gcd(a, n') = 1$ (since $n' \geq 6$). Thus $L_\alpha(x)_a = 0$. In addition, $i \neq a$ because otherwise $3a = 0 \bmod n'$ which is also contradictory to $\gcd(a, n') = 1$. Now, $L_\alpha(z)_{i-a \bmod n'} = 1$ and $i - a \notin \{0, i, i + a\}$.
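The bounds of Lemma 7.2 can be cross-checked numerically. The sketch below computes $d_\alpha = \dim \operatorname{Im} L_\alpha$ by Gaussian elimination and compares it with a closed form. Note the assumptions: the function names are hypothetical, and the closed form is written for general $(a, b)$ as recalled from [KLT15, Theorem 3], with the middle term negated; for $b = 0$ it reduces to the expression used in the proof above. The comparison with the rank computation guards against a mis-stated formula for the tested parameters.

```python
from itertools import combinations


def rotl(x, r, n):
    """Rotate the n-bit word x to the left by r positions."""
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def rank_of_L(alpha, n, a, b):
    """dim Im L_alpha over F_2, via an XOR basis of the unit-vector images."""
    basis = {}  # leading bit position -> basis vector
    for i in range(n):
        x = 1 << i
        v = (rotl(x, a, n) & rotl(alpha, b, n)) ^ (rotl(alpha, a, n) & rotl(x, b, n))
        while v:
            top = v.bit_length() - 1
            if top not in basis:
                basis[top] = v
                break
            v ^= basis[top]
    return len(basis)


def d_closed_form(alpha, n, a, b):
    """Closed form for d_alpha (assumed general-(a, b) version of the formula)."""
    mask = (1 << n) - 1
    if alpha == mask:
        return n - 1
    var = rotl(alpha, a, n) | rotl(alpha, b, n)
    dbl = rotl(alpha, b, n) & ~rotl(alpha, a, n) & rotl(alpha, 2 * a - b, n) & mask
    return bin(var ^ dbl).count("1")


def lemma_exponent(w, n):
    """Lower bound on d_alpha claimed by Lemma 7.2 for Hamming weight w >= 1."""
    if w == n:
        return n - 1
    return w + 1 if w <= 2 else w


def check_lemma(n, a, b, max_weight):
    """Check closed form and Lemma 7.2 for all alpha of weight 1..max_weight."""
    for w in range(1, max_weight + 1):
        for positions in combinations(range(n), w):
            alpha = sum(1 << i for i in positions)
            d = rank_of_L(alpha, n, a, b)
            if d != d_closed_form(alpha, n, a, b) or d < lemma_exponent(w, n):
                return False
    return True


print(check_lemma(16, 8, 1, 3))  # Simon constants, n' = 16, weights 1 to 3
```

Restricting the check to small weights keeps the run time negligible; the low-weight cases are also the ones the later trail argument relies on.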

In all cases, we thus have $p_\alpha \leq 2^{-2}$ if $\alpha \neq 0$ and $p_0 = 1$. The interesting property is the fact that $p_\alpha \leq 2^{-w_1(\alpha)-1}$ if $\alpha$ has a Hamming weight of 2. This is what we make use of in the following arguments. The basic idea is to guarantee enough transitions with a probability $\leq 2^{-3}$ before a zero input difference into $f_S$ occurs (then $p_0 = 1$). This allows us to make up for the factor $2^{-2}$ that we lose for the zero input difference. Otherwise, if we were not able to guarantee the tighter bound described in Lemma 7.2 (ii), the input difference into $f_S$ of every second round might be equal to zero in the worst case and our argument would only provide the trivial bound of $2^{-T}$ over $T$ rounds. See also Figure 7.1 for an illustration. For the formal proof, we first state Corollary 7.1. It is an implication of Lemma 7.1 for the Simon-like $f$-function.


Corollary 7.1. Let, for all non-zero differences $\alpha, \beta$ and all $t \geq 1$, the value of $P(C)$ for any $t$-round differential trail $C$ (over Simon-like Feistel rounds $F_{f_S}$) starting with $(0, \alpha)$ and ending with $(0, \beta)$ be upper bounded by $2^{-2t}$. Let further $p_\gamma \leq 2^{-2}$ for all $\gamma \neq 0$. Then, for any non-zero $T$-round trail $C$ over Simon-like rounds $F_{f_S}$, with $T > 0$, it is

\[ P(C) \leq 2^{-2T+2} . \]

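Proof. With the notation in Lemma 7.1, it is $p(t) = 2^{-2t}$ and $q \leq 2^{-2}$. Thus,

\[
P(C) \leq \max\Big\{ \max_{l \in \mathbb{N}_0,\, l < T} \{ p(l)\, q^{T-l-1} \},\; p(T) \Big\}
\leq \max\Big\{ \max_{l \in \mathbb{N}_0,\, l < T} \{ 2^{-2l}\, 2^{-2T+2l+2} \},\; 2^{-2T} \Big\}
= \max\{ 2^{-2T+2}, 2^{-2T} \} = 2^{-2T+2} .
\]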
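The chain of inequalities above can be replayed on exponents. This is a small sketch, assuming $p(0)$ is read as $2^0 = 1$ and $q$ is taken at its largest admissible value $2^{-2}$; the function name is hypothetical.

```python
def corollary_exponent(T: int) -> int:
    """Return log2 of max{ max_{0 <= l < T} p(l) q^(T-l-1), p(T) }.

    Uses p(t) = 2^(-2t) and q = 2^(-2), so all quantities are integer exponents.
    """
    log_q = -2
    candidates = [-2 * l + log_q * (T - l - 1) for l in range(T)]
    return max(max(candidates), -2 * T)


for T in (1, 2, 5, 17):
    print(T, corollary_exponent(T), -2 * T + 2)
```

Every choice of $l$ contributes the same exponent $-2T + 2$, which is why the inner maximum collapses to a single value.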

Thus, in order to prove an upper bound on the value of $P(C)$ of $2^{-2T+2}$ for any non-zero $T$-round differential trail $C$, we only have to concentrate on $t$-round trails of the form $(0, \alpha) \to \dots \to (0, \beta)$ and prove an upper bound of $2^{-2t}$ for all of those. We can further restrict ourselves to the shortest trails of this form, i.e., $\gamma_i \neq 0$ for all intermediate $\gamma_i$. The reason is that one can easily concatenate these short trails to longer ones for which the property holds as well.

We have to do the analysis for a specific choice of the linear mapping $\theta$. As a more general case, Theorem 7.1 formulates a sufficient condition on $\theta$ for our argument to work. It requires that the branch number of $\theta$ (with respect to bits) is high enough.⁴

Theorem 7.1. Let $B_1(\theta) \geq 11$. Then for any distinct $a, b$ and any $n'$ fulfilling the properties of Lemma 7.2, the value of $P(C)$ for any non-zero $T$-round differential trail $C$ over round functions $F_{f_S}$ is upper bounded by $2^{-2T+2}$.

Proof. Fix a $t$-round trail of the form

\[ (0, \alpha) \to (\gamma_1 = \alpha, 0) \to (\gamma_2, \delta_2) \to \dots \to (\gamma_{t-1}, \delta_{t-1}) \to (0, \beta) \]

with $\gamma_i \neq 0$ for all $i \in \{1, \dots, t-1\}$. Thus, we have $p_{\gamma_i} \leq 2^{-2}$ for all those $i$. Since $\gamma_1 = \alpha$ and $(0, \alpha) \to (\alpha, 0)$ holds with certainty ($p_0 = 1$), we have to show that either $p_{\gamma_i} \leq 2^{-4}$ for at least one $i$ or that $p_{\gamma_i}, p_{\gamma_j} \leq 2^{-3}$ for at least two distinct indices $i, j$. In other words, one has to make sure to gain a factor of $2^{-2}$ within the trail. In order to show this, we make use of Lemma 7.2. If $w_1(\alpha) \geq 4$, we are clearly done since $p_{\gamma_1} = p_\alpha \leq 2^{-w_1(\alpha)}$. We thus have to distinguish 3 cases.

(i) $w_1(\alpha) = 1$: Because of the branch number, it is $w_1(\theta(x) + \theta(x + \alpha)) \geq 10$. Since further $w_1(\rho(x) + \rho(x + \alpha)) \leq 2$, we have $w_1(\gamma_2) \geq 8$ and $p_{\gamma_2} \leq 2^{-4}$.

⁴ Using the improvements recently shown in [LLW17a], a branch number of 8 instead of 11 would suffice.


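(ii) $w_1(\alpha) = 2$: It is $w_1(\theta(x) + \theta(x + \alpha)) \geq 9$ and $w_1(\rho(x) + \rho(x + \alpha)) \leq 4$. Thus, $w_1(\gamma_2) \geq 5$ and therefore $p_{\gamma_2} \leq 2^{-4}$.

(iii) $w_1(\alpha) = 3$: We already have $p_\alpha \leq 2^{-3}$. Since $w_1(\theta(x) + \theta(x + \alpha)) \geq 8$ and $w_1(\rho(x) + \rho(x + \alpha)) \leq 6$, it is $w_1(\gamma_2) \geq 2$ and therefore $p_{\gamma_2} \leq 2^{-3}$.

See also Figure 7.3 for the propagation of the Hamming weights of differences $\alpha$.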

[Figure 7.3: Propagation of the Hamming weight (in red) for differences $(\alpha, 0)$ with $w_1(\alpha) \in \{1, 2, 3\}$.]
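The weight bookkeeping in the three cases of the proof of Theorem 7.1 can be sketched as follows. The function names are hypothetical, and the helper `min_exponent` encodes only the bounds of Lemma 7.2 as stated in this section (it assumes $n'$ is large enough that the all-ones difference is not the minimizing case). The sketch checks whether a given branch number yields the required extra factor of $2^{-2}$.

```python
def gamma2_weight_bound(w_alpha: int, branch_number: int) -> int:
    """Lower bound on w1(gamma_2) for an input difference of weight w_alpha."""
    theta_min = branch_number - w_alpha  # w1(theta(alpha)) >= B - w1(alpha)
    rho_max = 2 * w_alpha                # each active bit affects at most two AND outputs
    return max(theta_min - rho_max, 0)


def min_exponent(weight_at_least: int) -> int:
    """Smallest guaranteed -log2(p) for a difference of weight >= the given value."""
    if weight_at_least <= 0:
        return 0
    if weight_at_least <= 2:
        return weight_at_least + 1  # Lemma 7.2 (i) and (ii)
    return weight_at_least          # Lemma 7.2 (iii)


def argument_works(branch_number: int) -> bool:
    """Check that p_alpha * p_gamma2 <= 2^(-2-2-2) for w1(alpha) in {1, 2, 3}."""
    for w in (1, 2, 3):
        gain_alpha = min_exponent(w) - 2
        gain_gamma2 = min_exponent(gamma2_weight_bound(w, branch_number)) - 2
        if gain_alpha + max(gain_gamma2, 0) < 2:
            return False
    return True


print([gamma2_weight_bound(w, 11) for w in (1, 2, 3)])
print(argument_works(11), argument_works(10))
```

A negative result for a smaller branch number only means that this particular counting argument no longer goes through with the bounds used here; it does not show that the bound of Theorem 7.1 fails.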

We recall that $\theta$ does not have to be rotational invariant. Nevertheless, having a branch number of at least 11 is a quite restrictive property on a linear layer and in fact, for $n' = 16$, there does not exist such a linear mapping. The reason is that the minimum distance $d$ of any $[32, 16, d]$ code over $\mathbb{F}_2$ is at most 8 [Gra07]. However, for $n' \in \{24, 32, 48, 64\}$, which corresponds to most of the block lengths in Simon, such a linear mapping $\theta$ exists, as one can also deduce from [Gra07]. As the previous argument is a rather generic one, we investigate the linear part of Simon in more detail in the following.

7.3.2 Obtaining the Upper Bound for Simon and Simeck

In the following, we consider the linear part $\theta(x) = (x \lll c)$, which has a branch number of only 2. Choosing $(8, 1, 2)$ for the rotation constants $(a, b, c)$, we obtain the round function of Simon. Theorem 7.2 states the same bound as above for all versions of Simon with regard to the supported block lengths. Note that the results are dependent on the specific choice of the rotation constants, but can be proven for other choices in a similar way. Of course, it does not hold for all possible $a$, $b$ and $c$. For example, if $c = a$ or $c = b$, one obtains the trivial bound of $P(C) \leq 2^{-t}$ for any non-zero $t$-round trail $C$ since

\[ C' = ((1, 0, \dots, 0), 0) \to (0, (1, 0, \dots, 0)) \to ((1, 0, \dots, 0), 0) \]

would be a valid two-round trail with $P(C') = 2^{-2}$ that is iterative and can thus be concatenated to longer trails.
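The iterative trail $C'$ hinges on the transition $(\alpha, 0) \to (0, \alpha)$, which requires the output difference of $f_S$ to be zero. The following brute-force sketch (hypothetical function names, $n' = 16$, left rotations on Python integers) estimates that probability for $\alpha = (1, 0, \dots, 0)$ under different choices of $c$.

```python
def rotl(x, r, n):
    """Rotate the n-bit word x to the left by r positions."""
    r %= n
    return ((x << r) | (x >> (n - r))) & ((1 << n) - 1)


def f_s(x, n, a, b, c):
    """Simon-like f-function with rotation constants (a, b, c)."""
    return (rotl(x, a, n) & rotl(x, b, n)) ^ rotl(x, c, n)


def zero_output_probability(alpha, n, a, b, c):
    """Probability that input difference alpha yields output difference 0."""
    hits = sum(
        1 for x in range(1 << n)
        if f_s(x, n, a, b, c) ^ f_s(x ^ alpha, n, a, b, c) == 0
    )
    return hits / (1 << n)


print(zero_output_probability(1, 16, 8, 1, 8))  # c = a
print(zero_output_probability(1, 16, 8, 1, 1))  # c = b
print(zero_output_probability(1, 16, 8, 1, 2))  # Simon's choice of c
```

With $c \in \{a, b\}$ the bit contributed by $\theta(\alpha)$ lies inside $\operatorname{Im} L_\alpha$ and can be cancelled, whereas for Simon's constants it cannot.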

Theorem 7.2 (Bounds for Simon). Let $n' \in \{16, 24, 32, 48, 64\}$ and let $\theta(x) = (x \lll 2)$. For the rotation constants $a = 8$, $b = 1$, the value of $P(C)$ for any $T$-round differential trail $C$ over round functions $F_{f_S}$ is upper bounded by $2^{-2T+2}$.

Proof. Again, fix a $t$-round trail of the form

\[ (0, \alpha) \to (\gamma_1 = \alpha, 0) \to (\gamma_2, \delta_2) \to \dots \to (\gamma_{t-1}, \delta_{t-1}) \to (0, \beta) \]

with $\gamma_i \neq 0$ for all $i \in \{1, \dots, t-1\}$. We have to show that either $p_{\gamma_i} \leq 2^{-4}$ for at least one $i$ or that $p_{\gamma_i}, p_{\gamma_j} \leq 2^{-3}$ for at least two distinct indices $i, j$. In order to show this, Lemma 7.2 is used several times within this proof. Again, we have to distinguish three cases. Note that for simplicity with indices, we assume rotations to the right in the following. We use the $*$ symbol to indicate a bit with unspecified value.

(i) $w_1(\alpha) = 1$: Considering the rotational equivalence, let, w.l.o.g.,

\[ \alpha = (1, 0, \dots, 0) . \]

Recall that we get $U_\alpha = \operatorname{Im} L_\alpha + f_S(\alpha)$. Since we assume

\[ f_S : x \mapsto (x \ggg 8) \wedge (x \ggg 1) + (x \ggg 2) , \]

we obtain

\[ \gamma_2 = (0, *_1, 1, 0, 0, 0, 0, 0, *_2, 0, 0, 0, 0, 0, 0, 0 \dots) \in U_\alpha + 0 . \]

Case 1 ($*_2 = 0$): Then,

\[ \gamma_3 = (1, 0, *, *, 1, 0, 0, 0, 0, *, *, 0, 0, 0, 0, 0 \dots) \in U_{\gamma_2} + \alpha , \]

\[ \gamma_4 = (0, *, *^{\dagger}, *, *, *, 1, 0, *, 0, *, *, *, 0, 0, 0 \dots) \in U_{\gamma_3} + \gamma_2 . \]

If now the weight of $\gamma_4$ is larger than 1, then $p_{\gamma_3}, p_{\gamma_4} \leq 2^{-3}$. Thus, let $w_1(\gamma_4) = 1$. It follows that

\[ \gamma_5 = (1, 0, *, *, 1, 0, 0, *, 1, *, *, 0, 0, 0, *, 0 \dots) \in U_{\gamma_4} + \gamma_3 \]

and thus $p_{\gamma_5} \leq 2^{-3}$.

Case 2 ($*_2 = 1$): Then $p_{\gamma_2} \leq 2^{-3}$ already holds and

\[ \gamma_3 = (*^{\ddagger}, 0, *, *, 1, 0, 0, 0, 0, *, *, 0, 0, 0, 0, 0 \dots) \in U_{\gamma_2} + \alpha . \]

Again, w.l.o.g., let $w_1(\gamma_3) = 1$. It follows that

\[ \gamma_4 = (0, *, 1, 0, 0, *, 1, 0, 1, 0, 0, 0, *, 0, 0, 0 \dots) \in U_{\gamma_3} + \gamma_2 \]

and thus $p_{\gamma_4} \leq 2^{-3}$, since $\gamma_4$ has a Hamming weight of at least 2.

† This bit is only unknown if the length is 16 bit ($n' = 16$). Therefore, w.l.o.g., we assume this bit to be unknown. In the following, we may also consider certain bits to be unknown if the actual value does not matter for the proof.


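(ii) $w_1(\alpha) = 2$: Considering the rotational equivalence, let, w.l.o.g.,

\[ \alpha = (1, 0, \dots, 0, 1^{(i)}, 0, \dots, 0) \]

with $i \leq \frac{n'}{2}$. It follows that already $p_\alpha \leq 2^{-3}$.

Case 1 ($i = 1$): Then,

\[ \gamma_2 = (0, *, *, 1, 0, 0, 0, 0, *, *, 0, 0, 0, 0, 0, 0 \dots) \in U_\alpha + 0 . \]

Again, w.l.o.g., let $w_1(\gamma_2) = 1$. Then,

\[ \gamma_3 = (1, 1, 0, 0, *, 1, 0, 0, 0, 0, 0, *, 0, 0, 0, 0 \dots) \in U_{\gamma_2} + \alpha \]

and thus $p_{\gamma_3} \leq 2^{-3}$, since $\gamma_3$ has a Hamming weight of at least 2.

Case 2 ($i = 4$): Then,

\[ \gamma_2 = (0, *, 1, 0, 0, *, 1, 0, *, 0, 0, 0, *, 0, 0, 0 \dots) \in U_\alpha + 0 \]

and $p_{\gamma_2} \leq 2^{-3}$.

Case 3 ($i \neq 1$, $i \neq 4$): Then,

\[ \gamma_2 = (*, *, 1, *, *, *, *, *, *, *, *, *, *, *, *, * \dots) \in U_\alpha + 0 . \]

Again, w.l.o.g., let $w_1(\gamma_2) = 1$. Then,

\[ \gamma_3 = (1, *, *, *, 1, *, *, *, *, *, *, *, *, *, *, * \dots) \in U_{\gamma_2} + \alpha \]

and thus $p_{\gamma_3} \leq 2^{-3}$, since $\gamma_3$ has a Hamming weight of at least 2.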


(iii) $w_1(\alpha) = 3$: W.l.o.g., let $\alpha = (1, 0, \dots, 1^{(i)}, 0, \dots, 1^{(j)}, 0, \dots, 0)$ with $i \geq \frac{n'}{3}$ because of the rotational invariance. Again, $p_\alpha \leq 2^{-3}$. Since $n' \geq 16$, it is $i \geq 6$. We distinguish the following cases:

Case 1 ($j \neq n' - 6$, $i \neq n' - 6$): Then,

\[ \gamma_2 = (*, *, 1, *, *, *, *, *, \dots *, *, *, *, *, *, *, *) \in U_\alpha + 0 \]

and for $w_1(\gamma_2) = 1$ we obtain

\[ \gamma_3 = (1, 0, 0, *, 1, 0, *, *, \dots *, *, *, *, *, *, *, *) \in U_{\gamma_2} + \alpha \]

such that $p_{\gamma_3} \leq 2^{-3}$.

Case 2 ($i = n' - 6$): Then,

\[ \gamma_2 = (*, *, *, *, *, *, *, *, \dots *, *, *, *, 1, *, *, *) \in U_\alpha + 0 \]

if $j \neq n' - 5$ and

\[ \gamma_2 = (*, *, *, *, *, *, *, *, \dots *, *, *, *, *, 1, *, *) \in U_\alpha + 0 \]

if $j = n' - 5$. In both cases, for $w_1(\gamma_2) = 1$ we obtain

\[ \gamma_3 = (1^{(0)}, 0, 0, 0, *, *, 0, 0, \dots 0, 0, 1^{(i)}, *, *, *, *, *) \in U_{\gamma_2} + \alpha \]

such that $p_{\gamma_3} \leq 2^{-3}$.

Case 3 ($j = n' - 6$): Now, we still have to consider the two possibilities $j - i \neq 6$ and $j - i = 6$. For the first case, one gets

\[ \gamma_2 = (*, *, *, *, *, *, *, *, \dots *, *, *, *, 1, *, *, *) \in U_\alpha + 0 \]

and for $w_1(\gamma_2) = 1$,

\[ \gamma_3 = (1, *, *, *, *, *, *, *, \dots *, *, *, *, *, *, 1, *) \in U_{\gamma_2} + \alpha . \]

If $j - i = 6$, then,

\[ \gamma_2 = (*, *, *, *, \dots *, *, 1^{(i+2)}, *, *, *, *, *, *, *, *, *) \in U_\alpha + 0 \]

and for $w_1(\gamma_2) = 1$,

\[ \gamma_3 = (1^{(1)}, *, *, *, \dots 1^{(i)}, *, *, *, 1, *, 1^{(j)}, *, *, *, *, *) \in U_{\gamma_2} + \alpha . \]

‡ Of course, this bit is already equal to 1 if the length $n'$ is larger than 16.
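The bit patterns in the proof of Theorem 7.2 can be reproduced mechanically when the incoming difference is fully known. The following is a minimal sketch with hypothetical names (`rotr`, `next_pattern`). Differences are lists of bits with position 0 first, rotations go to the right as in the proof, and a position is marked `*` when it depends on the state, i.e. when it lies in the support of $(\gamma \ggg 8) \vee (\gamma \ggg 1)$.

```python
def rotr(bits, r):
    """Rotate a list to the right: the entry at position i moves to position i + r."""
    n = len(bits)
    return [bits[(j - r) % n] for j in range(n)]


def next_pattern(gamma, prev, a=8, b=1, c=2):
    """Pattern of U_gamma + prev for fully known bit lists gamma and prev.

    '*' marks a bit with unspecified value, '0' and '1' mark fixed bits.
    """
    ga, gb, gc = rotr(gamma, a), rotr(gamma, b), rotr(gamma, c)
    out = []
    for j in range(len(gamma)):
        if ga[j] or gb[j]:
            out.append("*")                    # bit of Im L_gamma, value depends on x
        else:
            out.append(str(gc[j] ^ prev[j]))   # fixed by theta(gamma) and prev
    return "".join(out)


def unit(n, *positions):
    """Bit list of length n with ones at the given positions."""
    return [1 if i in positions else 0 for i in range(n)]


zero = [0] * 16
print(next_pattern(unit(16, 0), zero))                 # case (i): gamma_2
print(next_pattern(unit(16, 0, 1), zero))              # case (ii), i = 1: gamma_2
print(next_pattern(unit(16, 3), unit(16, 0, 1)))       # case (ii), i = 1: gamma_3 for gamma_2 = e_3
print(next_pattern(unit(16, 0, 4), zero))              # case (ii), i = 4: gamma_2
```

The sketch does not handle inputs that themselves contain unspecified bits, so it covers only the steps in which the proof fixes $\gamma_i$ to a concrete weight-one difference or starts from a known $\alpha$.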

Using a similar argument, one obtains the bounds for Simeck as the following theorem states.

Theorem 7.3 (Bounds for Simeck). Let $n' \in \{16, 24, 32\}$ and $\theta(x) = (x \lll 1)$. For the rotation constants $a = 5$, $b = 0$, the value of $P(C)$ for any $T$-round differential trail $C$ over round functions $F_{f_S}$ is upper bounded by $2^{-2T+2}$.

Interestingly, for every version of Simon and Simeck, it turns out that our approach is sufficient in order to bound $P(C)$ below $2^{-n}$ for a smaller number of rounds than specified in the actual cipher. Recall that $n'$ denotes the length of one Feistel branch, i.e., half the block length. For $n'$ up to 32, the security margin $t_m$ of the corresponding cipher(s) can be considered as reasonable. See Table 7.1 for a comparison.


Table 7.1: Number of rounds $t$ needed for bounding the value of $P(C)$, where $C$ is any non-zero $t$-round differential trail, below $2^{-n}$ for all versions of Simon and Simeck. The * symbol indicates that there is an appropriate version of Simeck with the same number of rounds.

cipher            rounds   rounds needed   margin t_m
Simon-32-64*        32          17             15
Simon-48-72         36          25             11
Simon-48-96*        36          25             11
Simon-64-96         42          33              9
Simon-64-128*       44          33             11
Simon-96-96         52          49              3
Simon-96-144        54          49              5
Simon-128-128       68          65              3
Simon-128-192       69          65              4
Simon-128-256       72          65              7
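The columns of Table 7.1 follow from the bound $P(C) \leq 2^{-2t+2}$ by simple arithmetic. The sketch below recomputes them; it assumes that "below $2^{-n}$" is read as "at most $2^{-n}$", which is the reading that reproduces the listed values, and it takes the specified round numbers from the table itself. The names `SIMON_ROUNDS`, `rounds_needed` and `margins` are hypothetical.

```python
# (block length n, key length) -> specified number of rounds, as listed in Table 7.1
SIMON_ROUNDS = {
    (32, 64): 32, (48, 72): 36, (48, 96): 36, (64, 96): 42, (64, 128): 44,
    (96, 96): 52, (96, 144): 54, (128, 128): 68, (128, 192): 69, (128, 256): 72,
}


def rounds_needed(n: int) -> int:
    """Smallest t with 2^(-2t+2) <= 2^(-n)."""
    t = 1
    while -2 * t + 2 > -n:
        t += 1
    return t


def margins():
    """Return {(n, k): (rounds, rounds needed, margin t_m)} for all table rows."""
    return {
        key: (rounds, rounds_needed(key[0]), rounds - rounds_needed(key[0]))
        for key, rounds in SIMON_ROUNDS.items()
    }


for (n, k), row in margins().items():
    print(f"Simon-{n}-{k}", *row)
```

Since the bound depends only on the block length $n = 2n'$, all key sizes of one block length share the same number of rounds needed.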


We presented a more general description of Simon-like designs by separating the round function into a linear and a non-linear component and developed a non-experimental security argument on full-round versions of Simon that can be verified by hand. We hope that this work encourages further research on analyzing Simon-like designs. An open question is whether our approach can be generalized in order to obtain better bounds over multiple rounds. The recent improvements shown in [LLW17a] seem to be a good starting point. Furthermore, it would be favorable to avoid the consideration of every special case individually. This is related to the question of how to design the linear part $\theta$ in this set-up.


Bibliography

[AAA+15] Mohamed Ahmed Abdelraheem, Javad Alizadeh, Hoda A. Alkhzaimi, Mohammad Reza Aref, Nasour Bagheri, and Praveen Gauravaram. Improved linear cryptanalysis of reduced-round SIMON-32 and SIMON-48. In Alex Biryukov and Vipul Goyal, editors, INDOCRYPT 2015, volume 9462 of LNCS, pages 153–179. Springer, Heidelberg, 2015.

[AABL12] Mohamed Ahmed Abdelraheem, Martin Agren, Peter Beelen, and Gregor Leander. On the distribution of linear biases: Three instructive examples. In Reihaneh Safavi-Naini and Ran Canetti, editors, CRYPTO 2012, volume 7417 of LNCS, pages 50–67. Springer, Heidelberg, 2012.

[ABC+17] Ralph Ankele, Subhadeep Banik, Avik Chakraborti, Eik List, Florian Mendel, Siang Meng Sim, and Gaoli Wang. Related-key impossible-differential attack on reduced-round SKINNY. In Dieter Gollmann, Atsuko Miyaji, and Hiroaki Kikuchi, editors, ACNS 17, volume 10355 of LNCS, pages 208–228. Springer, Heidelberg, 2017.

[ABG+13] Javad Alizadeh, Nasour Bagheri, Praveen Gauravaram, Abhishek Kumar, and Somitra Kumar Sanadhya. Linear cryptanalysis of round reduced SIMON. Cryptology ePrint Archive, Report 2013/663, 2013. https://eprint.iacr.org/2013/663.

[ADK+14] Martin R. Albrecht, Benedikt Driessen, Elif Bilge Kavun, Gregor Leander, Christof Paar, and Tolga Yalcin. Block ciphers - focus on the linear layer (feat. PRIDE). In Juan A. Garay and Rosario Gennaro, editors, CRYPTO 2014, Part I, volume 8616 of LNCS, pages 57–76. Springer, Heidelberg, 2014.

[AF15] Daniel Augot and Matthieu Finiasz. Direct construction of recursive MDS diffusion layers using shortened BCH codes. In Carlos Cid and Christian Rechberger, editors, FSE 2014, volume 8540 of LNCS, pages 3–17. Springer, Heidelberg, 2015.



[AL13] Hoda A. Alkhzaimi and Martin M. Lauridsen. Cryptanalysis of the SIMON family of block ciphers. Cryptology ePrint Archive, Report 2013/543, 2013. https://eprint.iacr.org/2013/543.

[ALLW15] Farzaneh Abed, Eik List, Stefan Lucks, and Jakob Wenzel. Differential cryptanalysis of round-reduced Simon and Speck. In Carlos Cid and Christian Rechberger, editors, FSE 2014, volume 8540 of LNCS, pages 525–545. Springer, Heidelberg, 2015.

[Ash15] Tomer Ashur. Improved linear trails for the block cipher Simon. Cryptology ePrint Archive, Report 2015/285, 2015. https://eprint.iacr.org/2015/285.

[Ava17] Roberto Avanzi. The QARMA block cipher family. IACR Trans. Symm. Cryptol., 2017(1):4–44, 2017.

[BAK98] Eli Biham, Ross J. Anderson, and Lars R. Knudsen. Serpent: A new block cipher proposal. In Serge Vaudenay, editor, FSE’98, volume 1372 of LNCS, pages 222–238. Springer, Heidelberg, 1998.

[BBI+15] Subhadeep Banik, Andrey Bogdanov, Takanori Isobe, Kyoji Shibutani, Harunaga Hiwatari, Toru Akishita, and Francesco Regazzoni. Midori: A block cipher for low energy. In Tetsu Iwata and Jung Hee Cheon, editors, ASIACRYPT 2015, Part II, volume 9453 of LNCS, pages 411–436. Springer, Heidelberg, 2015.

[BBS05] Eli Biham, Alex Biryukov, and Adi Shamir. Cryptanalysis of Skipjack reduced to 31 rounds using impossible differentials. J. Cryptol., 18(4):291–311, 2005.

[BCG+12] Julia Borghoff, Anne Canteaut, Tim Guneysu, Elif Bilge Kavun, Miroslav Knezevic, Lars R. Knudsen, Gregor Leander, Ventzislav Nikov, Christof Paar, Christian Rechberger, Peter Rombouts, Søren S. Thomsen, and Tolga Yalcin. PRINCE - A low-latency block cipher for pervasive computing applications - extended abstract. In Xiaoyun Wang and Kazue Sako, editors, ASIACRYPT 2012, volume 7658 of LNCS, pages 208–225. Springer, Heidelberg, 2012.

[BCKL17] Christina Boura, Anne Canteaut, Lars R Knudsen, and Gregor Leander. Reflection ciphers. Design. Code. Cryptogr., 82(1-2):3–25, 2017.

[BCLR17] Christof Beierle, Anne Canteaut, Gregor Leander, and Yann Rotella. Proving resistance against invariant attacks: How to choose the round constants. In Jonathan Katz and Hovav Shacham, editors, CRYPTO 2017, Part II, volume 10402 of LNCS, pages 647–678. Springer, Heidelberg, 2017.





[BDK01] Eli Biham, Orr Dunkelman, and Nathan Keller. The rectangle attack - rectangling the Serpent. In Birgit Pfitzmann, editor, EUROCRYPT 2001, volume 2045 of LNCS, pages 340–357. Springer, Heidelberg, 2001.

[BDPA11] Guido Bertoni, Joan Daemen, Michael Peeters, and Gilles Van Assche. The Keccak reference. Submission to NIST (Round 3), 13, 2011.

[Bei16] Christof Beierle. Pen and paper arguments for SIMON and Simon-like designs. In Vassilis Zikas and Roberto De Prisco, editors, Security and Cryptography for Networks - 10th International Conference, SCN 2016, Amalfi, Italy, August 31 - September 2, 2016, Proceedings, volume 9841 of LNCS, pages 431–446. Springer, Heidelberg, 2016.

[Bih94a] Eli Biham. New types of cryptanalytic attacks using related keys. J. Cryptol., 7(4):229–246, 1994.

[Bih94b] Eli Biham. New types of cryptanalytic attacks using related keys (extended abstract). In Tor Helleseth, editor, EUROCRYPT’93, volume 765 of LNCS, pages 398–409. Springer, Heidelberg, 1994.

[BJK+16a] Christof Beierle, Jeremy Jean, Stefan Kolbl, Gregor Leander, Amir Moradi, Thomas Peyrin, Yu Sasaki, Pascal Sasdrich, and Siang Meng Sim. The SKINNY family of block ciphers and its low-latency variant MANTIS. In Matthew Robshaw and Jonathan Katz, editors, CRYPTO 2016, Part II, volume 9815 of LNCS, pages 123–153. Springer, Heidelberg, 2016.

[BJK+16b] Christof Beierle, Jeremy Jean, Stefan Kolbl, Gregor Leander, Amir Moradi, Thomas Peyrin, Yu Sasaki, Pascal Sasdrich, and Siang Meng Sim. The SKINNY family of block ciphers and its low-latency variant MANTIS. Cryptology ePrint Archive, Report 2016/660, 2016. https://eprint.iacr.org/2016/660.

[BJL+15] Christof Beierle, Philipp Jovanovic, Martin M. Lauridsen, Gregor Leander, and Christian Rechberger. Analyzing permutations for AES-like ciphers: Understanding ShiftRows. In Kaisa Nyberg, editor, CT-RSA 2015, volume 9048 of LNCS, pages 37–58. Springer, Heidelberg, 2015.

[BK03a] Mihir Bellare and Tadayoshi Kohno. A theoretical treatment of related-key attacks: RKA-PRPs, RKA-PRFs, and applications. In Eli Biham, editor, EUROCRYPT 2003, volume 2656 of LNCS, pages 491–506. Springer, Heidelberg, 2003.

[BK03b] Mihir Bellare and Tadayoshi Kohno. A theoretical treatment of related-key attacks: RKA-PRPs, RKA-PRFs, and applications. Full version, 2003. https://homes.cs.washington.edu/~yoshi/papers/RKA/rka.pdf.

[BK09] Alex Biryukov and Dmitry Khovratovich. Related-key cryptanalysis of the full AES-192 and AES-256. In Mitsuru Matsui, editor, ASIACRYPT 2009, volume 5912 of LNCS, pages 1–18. Springer, Heidelberg, 2009.

[BKL+07] Andrey Bogdanov, Lars R. Knudsen, Gregor Leander, Christof Paar, Axel Poschmann, Matthew J. B. Robshaw, Yannick Seurin, and C. Vikkelsoe. PRESENT: An ultra-lightweight block cipher. In Pascal Paillier and Ingrid Verbauwhede, editors, CHES 2007, volume 4727 of LNCS, pages 450–466. Springer, Heidelberg, 2007.

[BKL16] Christof Beierle, Thorsten Kranz, and Gregor Leander. Lightweight multiplication in GF(2^n) with applications to MDS matrices. In Matthew Robshaw and Jonathan Katz, editors, CRYPTO 2016, Part I, volume 9814 of LNCS, pages 625–653. Springer, Heidelberg, 2016.

[BKN09] Alex Biryukov, Dmitry Khovratovich, and Ivica Nikolic. Distinguisher and related-key attack on the full AES-256. In Shai Halevi, editor, CRYPTO 2009, volume 5677 of LNCS, pages 231–249. Springer, Heidelberg, 2009.

[BN10] Alex Biryukov and Ivica Nikolic. Automatic search for related-key differential characteristics in byte-oriented block ciphers: Application to AES, Camellia, Khazad and others. In Henri Gilbert, editor, EUROCRYPT 2010, volume 6110 of LNCS, pages 322–344. Springer, Heidelberg, 2010.

[BNN+10] Paulo Barreto, Ventzislav Nikov, Svetla Nikova, Vincent Rijmen, and Elmar Tischhauser. Whirlwind: a new cryptographic hash function. Design. Code. Cryptogr., 56(2-3):141–162, 2010.

[BP17] Alex Biryukov and Leo Perrin. State of the art in lightweight symmetric cryptography. Cryptology ePrint Archive, Report 2017/511, 2017. https://eprint.iacr.org/2017/511.

[BR00a] Paulo S. L. M. Barreto and Vincent Rijmen. The Whirlpool Hashing Function. NESSIE submission, 2000.

[BR00b] Paulo S. L. M. Barreto and Vincent Rijmen. The ANUBIS Block Cipher. NESSIE submission, 2000.

[BR05] Mihir Bellare and Phillip Rogaway. Introduction to modern cryptography. Lecture notes, 2005. http://web.cs.ucdavis.edu/~rogaway/classes/227/spring05/book/main.pdf.


[BR14] Andrey Bogdanov and Vincent Rijmen. Linear hulls with correlation zero and linear cryptanalysis of block ciphers. Design. Code. Cryptogr., 70(3):369–383, 2014.

[BRV15] Alex Biryukov, Arnab Roy, and Vesselin Velichkov. Differential analysis of block ciphers SIMON and SPECK. In Carlos Cid and Christian Rechberger, editors, FSE 2014, volume 8540 of LNCS, pages 546–570. Springer, Heidelberg, 2015.

[BS91a] Eli Biham and Adi Shamir. Differential cryptanalysis of DES-like cryptosystems. In Alfred J. Menezes and Scott A. Vanstone, editors, CRYPTO’90, volume 537 of LNCS, pages 2–21. Springer, Heidelberg, 1991.

[BS91b] Eli Biham and Adi Shamir. Differential cryptanalysis of DES-like cryptosystems. J. Cryptol., 4(1):3–72, 1991.

[BSS+13] Ray Beaulieu, Douglas Shors, Jason Smith, Stefan Treatman-Clark, Bryan Weeks, and Louis Wingers. The SIMON and SPECK families of lightweight block ciphers. Cryptology ePrint Archive, Report 2013/404, 2013. https://eprint.iacr.org/2013/404.

[BSS+15] Ray Beaulieu, Douglas Shors, Jason Smith, Stefan Treatman-Clark, Bryan Weeks, and Louis Wingers. SIMON and SPECK: Block ciphers for the internet of things. NIST Lightweight Cryptography Workshop, 2015.

[BW99] Alex Biryukov and David Wagner. Slide attacks. In Lars R. Knudsen, editor, FSE’99, volume 1636 of LNCS, pages 245–259. Springer, Heidelberg, 1999.

[Can05] David Canright. A very compact S-box for AES. In Josyula R. Rao and Berk Sunar, editors, CHES 2005, volume 3659 of LNCS, pages 441–455. Springer, Heidelberg, 2005.

[Car07] Claude Carlet. Boolean Functions for Cryptography and Error Correcting Codes. In Yves Crama and Peter Hammer, editors, Boolean Methods and Models. Cambridge University Press, 2007.

[CDL16] Anne Canteaut, Sebastien Duval, and Gaetan Leurent. Construction of lightweight S-boxes using Feistel and MISTY structures. In Orr Dunkelman and Liam Keliher, editors, SAC 2015, volume 9566 of LNCS, pages 373–393. Springer, Heidelberg, 2016.

[CFG+17] Colin Chaigneau, Thomas Fuhr, Henri Gilbert, Jeremy Jean, and Jean-Rene Reinhard. Cryptanalysis of NORX v2.0. IACR Trans. Symm. Cryptol., 2017(1):156–174, 2017.



[CV95] Florent Chabaud and Serge Vaudenay. Links between differential and linear cryptanalysis. In Alfredo De Santis, editor, EUROCRYPT’94, volume 950 of LNCS, pages 356–365. Springer, Heidelberg, 1995.

[CW16] Huaifeng Chen and Xiaoyun Wang. Improved linear hull attack on round-reduced SIMON with dynamic key-guessing techniques. In Thomas Peyrin, editor, FSE 2016, volume 9783 of LNCS, pages 428–449. Springer, Heidelberg, 2016.

[Dae95] Joan Daemen. Cipher and hash function design strategies based on linear and differential cryptanalysis. PhD thesis, Doctoral Dissertation, March 1995, KU Leuven, 1995.

[DEKM16] Christoph Dobraunig, Maria Eichlseder, Daniel Kales, and Florian Mendel. Practical key-recovery attack on MANTIS5. IACR Trans. Symm. Cryptol., 2016(2):248–260, 2016.

[DF04] David S. Dummit and Richard M. Foote. Abstract algebra. Wiley, Hoboken, 2004.

[DGV95] Joan Daemen, Rene Govaerts, and Joos Vandewalle. Correlation matrices. In Bart Preneel, editor, FSE’94, volume 1008 of LNCS, pages 275–285. Springer, Heidelberg, 1995.

[DKR97] Joan Daemen, Lars R. Knudsen, and Vincent Rijmen. The block cipher Square. In Eli Biham, editor, FSE’97, volume 1267 of LNCS, pages 149–165. Springer, Heidelberg, 1997.

[DPU+16] Daniel Dinu, Leo Perrin, Aleksei Udovenko, Vesselin Velichkov, Johann Großschadl, and Alex Biryukov. Design strategies for ARX with provable bounds: Sparx and LAX. In Jung Hee Cheon and Tsuyoshi Takagi, editors, ASIACRYPT 2016, Part I, volume 10031 of LNCS, pages 484–513. Springer, Heidelberg, 2016.

[DPVAR00] Joan Daemen, Michael Peeters, Gilles Van Assche, and Vincent Rijmen. Nessie proposal: Noekeon, 2000.

[DR02] Joan Daemen and Vincent Rijmen. The design of Rijndael: AES - the advanced encryption standard. Springer-Verlag Berlin Heidelberg, 2002.

[DR11] Joan Daemen and Vincent Rijmen. Correlation analysis in GF(2^n). In Advanced Linear Cryptanalysis of Block and Stream Ciphers. Cryptology and information security, pages 115–131, 2011.

[DW97] Ed Dawson and Chuan-Kun Wu. On the linear structure of symmetric Boolean functions. Australas. J. Comb., 16:239–243, 1997.


[Dwo01] Morris Dworkin. Recommendation for block cipher modes of operation: Methods and techniques. National Institute of Standards and Technology. SP 800-38A, 2001.

[EK17] Maria Eichlseder and Daniel Kales. Clustering related-tweak characteristics: Application to MANTIS-6. Cryptology ePrint Archive, Report 2017/1136, 2017. https://eprint.iacr.org/2017/1136.

[FNS75] Horst Feistel, William A. Notz, and J. Lynn Smith. Some cryptographic techniques for machine-to-machine data communications. In Proc. of the IEEE, volume 63, pages 1545–1554, 1975.

[Gan59] Felix R. Gantmacher. The theory of matrices. Chelsea Publishing Company, 1959.

[GJN+16] Jian Guo, Jeremy Jean, Ivica Nikolic, Kexin Qiao, Yu Sasaki, and Siang Meng Sim. Invariant subspace attack against Midori64 and the resistance criteria for S-box designs. IACR Trans. Symm. Cryptol., 2016(1):33–56, 2016.

[GLS+14] Vincent Grosso, Gaetan Leurent, Francois-Xavier Standaert, Kerem Varici, Anthony Journault, Francois Durvaux, Lubos Gaspar, and Stephanie Kerckhof. SCREAM v1, 2014. Submission to CAESAR.

[GLS+15] Vincent Grosso, Gaetan Leurent, Francois-Xavier Standaert, Kerem Varici, Anthony Journault, Francois Durvaux, Lubos Gaspar, and Stephanie Kerckhof. SCREAM v3, 2015. Submission to CAESAR.

[GLSV15] Vincent Grosso, Gaetan Leurent, Francois-Xavier Standaert, and Kerem Varici. LS-designs: Bitslice encryption for efficient masked software implementations. In Carlos Cid and Christian Rechberger, editors, FSE 2014, volume 8540 of LNCS, pages 18–37. Springer, Heidelberg, 2015.

[GO16] Gurobi Optimization, Inc. Gurobi optimizer reference manual, 2016.

[GPP11] Jian Guo, Thomas Peyrin, and Axel Poschmann. The PHOTON family of lightweight hash functions. In Phillip Rogaway, editor, CRYPTO 2011, volume 6841 of LNCS, pages 222–239. Springer, Heidelberg, 2011.

[GPPR11] Jian Guo, Thomas Peyrin, Axel Poschmann, and Matthew J. B. Robshaw. The LED block cipher. In Bart Preneel and Tsuyoshi Takagi, editors, CHES 2011, volume 6917 of LNCS, pages 326–341. Springer, Heidelberg, 2011.



[GR14] Kishan Chand Gupta and Indranil Ghosh Ray. Cryptographically significant MDS matrices based on circulant and circulant-like matrices for lightweight applications. Crypt. Commun., 7(2):257–287, 2014.

[Gra07] Markus Grassl. Bounds on the minimum distance of linear codes and quantum codes. http://www.codetables.de, 2007. Accessed: February 15, 2016.

[Her75] Israel N. Herstein. Topics in Algebra. John Wiley & Sons, Lexington, USA, 1975.

[HKM95] Carlo Harpes, Gerhard G. Kramer, and James L. Massey. A generalization of linear cryptanalysis and the applicability of Matsui’s piling-up lemma. In Louis C. Guillou and Jean-Jacques Quisquater, editors, EUROCRYPT’95, volume 921 of LNCS, pages 24–38. Springer, Heidelberg, 1995.

[HT14] Michael Henson and Stephen Taylor. Memory encryption: a survey of existing techniques. ACM Comput. Surv., 46(4):1–53, 2013.

[Jea16] Jeremy Jean. Cryptanalysis of Haraka. IACR Trans. Symm. Cryptol., 2016(1):1–12, 2016.

[JNP14] Jeremy Jean, Ivica Nikolic, and Thomas Peyrin. Tweaks and keys for block ciphers: The TWEAKEY framework. In Palash Sarkar and Tetsu Iwata, editors, ASIACRYPT 2014, Part II, volume 8874 of LNCS, pages 274–288. Springer, Heidelberg, 2014.

[JPST17] Jeremy Jean, Thomas Peyrin, Siang Meng Sim, and Jade Tourteaux. Optimizing implementations of lightweight building blocks. IACR Trans. Symm. Cryptol., 2017(4):130–168, 2017.

[Ker83] Auguste Kerckhoffs. La cryptographie militaire. Journal des sciences militaires, IX:5–38, 161–191, 1883.

[KLSW17] Thorsten Kranz, Gregor Leander, Ko Stoffelen, and Friedrich Wiemer. Shorter linear straight-line programs for MDS matrices. IACR Trans. Symm. Cryptol., 2017(4):188–211, 2017.

[KLT15] Stefan Kolbl, Gregor Leander, and Tyge Tiessen. Observations on the SIMON block cipher family. In Rosario Gennaro and Matthew J. B. Robshaw, editors, CRYPTO 2015, Part I, volume 9215 of LNCS, pages 161–185. Springer, Heidelberg, 2015.

[KLW17] Thorsten Kranz, Gregor Leander, and Friedrich Wiemer. Linear cryptanalysis: Key schedules and tweakable block ciphers. IACR Trans. Symm. Cryptol., 2017(1):474–505, 2017.

[Kna07] Anthony W. Knapp. Basic algebra. Springer Science & Business Media, 2007.

[Knu95] Lars R. Knudsen. Truncated and higher order differentials. In Bart Preneel, editor, FSE’94, volume 1008 of LNCS, pages 196–211. Springer, Heidelberg, 1995.

[Knu98] Lars Knudsen. Deal - a 128-bit block cipher. NIST AES Proposal, 1998.

[KPPY14] Khoongming Khoo, Thomas Peyrin, Axel York Poschmann, and Huihui Yap. FOAM: Searching for hardware-optimal SPN structures and components with a fair comparison. In Lejla Batina and Matthew Robshaw, editors, CHES 2014, volume 8731 of LNCS, pages 433–450. Springer, Heidelberg, 2014.

[KR96a] Joe Kilian and Phillip Rogaway. How to protect DES against exhaustive key search. In Neal Koblitz, editor, CRYPTO’96, volume 1109 of LNCS, pages 252–267. Springer, Heidelberg, 1996.

[KR96b] Lars R. Knudsen and Matthew J. B. Robshaw. Non-linear approximations in linear cryptanalysis. In Ueli M. Maurer, editor, EUROCRYPT’96, volume 1070 of LNCS, pages 224–236. Springer, Heidelberg, 1996.

[KR11] Lars R. Knudsen and Matthew Robshaw. The block cipher companion. Springer Science & Business Media, 2011.

[KSI16] Kota Kondo, Yu Sasaki, and Tetsu Iwata. On the design rationale of SIMON block cipher: Integral attacks and impossible differential attacks against SIMON variants. In Mark Manulis, Ahmad-Reza Sadeghi, and Steve Schneider, editors, ACNS 16, volume 9696 of LNCS, pages 518–536. Springer, Heidelberg, 2016.

[KSW96] John Kelsey, Bruce Schneier, and David Wagner. Key-schedule cryptanalysis of IDEA, G-DES, GOST, SAFER, and Triple-DES. In Neal Koblitz, editor, CRYPTO’96, volume 1109 of LNCS, pages 237–251. Springer, Heidelberg, 1996.

[LAAZ11] Gregor Leander, Mohamed Ahmed Abdelraheem, Hoda Alkhzaimi, and Erik Zenner. A cryptanalysis of PRINTcipher: The invariant subspace attack. In Phillip Rogaway, editor, CRYPTO 2011, volume 6841 of LNCS, pages 206–221. Springer, Heidelberg, 2011.

[Lai94] Xuejia Lai. Higher order derivatives and differential cryptanalysis. In Richard E. Blahut, Daniel J. Costello, Ueli Maurer, and Thomas Mittelholzer, editors, Communications and Cryptography: Two Sides of One Tapestry, pages 227–233. Springer US, Boston, MA, 1994.

[Lai95] Xuejia Lai. Additive and linear structures of cryptographic functions. In Bart Preneel, editor, FSE’94, volume 1008 of LNCS, pages 75–85. Springer, Heidelberg, 1995.

[LGS17] Guozhen Liu, Mohona Ghosh, and Ling Song. Security analysis of SKINNY under related-tweakey settings. IACR Trans. Symm. Cryptol., 2017(3):37–72, 2017.

[LK06] Chae Hoon Lim and Tymur Korkishko. mCrypton - a lightweight block cipher for security of low-cost RFID tags and sensors. In Jooseok Song, Taekyoung Kwon, and Moti Yung, editors, WISA 05, volume 3786 of LNCS, pages 243–258. Springer, Heidelberg, 2006.

[LLW17a] Zhengbin Liu, Yongqiang Li, and Mingsheng Wang. Optimal differential trails in SIMON-like ciphers. IACR Trans. Symm. Cryptol., 2017(1):358–379, 2017.

[LLW17b] Zhengbin Liu, Yongqiang Li, and Mingsheng Wang. The security of SIMON-like ciphers against linear cryptanalysis. Cryptology ePrint Archive, Report 2017/576, 2017. https://eprint.iacr.org/2017/576.

[LM91] Xuejia Lai and James L. Massey. Markov ciphers and differential cryptanalysis. In Donald W. Davies, editor, EUROCRYPT’91, volume 547 of LNCS, pages 17–38. Springer, Heidelberg, 1991.

[LMR15] Gregor Leander, Brice Minaud, and Sondre Rønjom. A generic approach to invariant subspace attacks: Cryptanalysis of Robin, iSCREAM and Zorro. In Elisabeth Oswald and Marc Fischlin, editors, EUROCRYPT 2015, Part I, volume 9056 of LNCS, pages 254–283. Springer, Heidelberg, 2015.

[LN94] Rudolf Lidl and Harald Niederreiter. Introduction to finite fields and their applications. Cambridge University Press, 1994.

[LRW02] Moses Liskov, Ronald L. Rivest, and David Wagner. Tweakable block ciphers. In Moti Yung, editor, CRYPTO 2002, volume 2442 of LNCS, pages 31–46. Springer, Heidelberg, 2002.

[LRW11] Moses Liskov, Ronald L. Rivest, and David Wagner. Tweakable block ciphers. J. Cryptol., 24(3):588–613, 2011.

[LS16] Meicheng Liu and Siang Meng Sim. Lightweight MDS generalized circulant matrices. In Thomas Peyrin, editor, FSE 2016, volume 9783 of LNCS, pages 101–120. Springer, Heidelberg, 2016.

[LW16] Yongqiang Li and Mingsheng Wang. On the construction of lightweight circulant involutory MDS matrices. In Thomas Peyrin, editor, FSE 2016, volume 9783 of LNCS, pages 121–139. Springer, Heidelberg, 2016.

[Mat94] Mitsuru Matsui. Linear cryptanalysis method for DES cipher. In Tor Helleseth, editor, EUROCRYPT’93, volume 765 of LNCS, pages 386–397. Springer, Heidelberg, 1994.

[Mat95] Mitsuru Matsui. On correlation between the order of S-boxes and the strength of DES. In Alfredo De Santis, editor, EUROCRYPT’94, volume 950 of LNCS, pages 366–375. Springer, Heidelberg, 1995.

[MBTM17] Kerry A. McKay, Larry Bassham, Meltem Sonmez Turan, and Nicky Mouha. Report on lightweight cryptography. National Institute of Standards and Technology, 2017. https://doi.org/10.6028/NIST.IR.8114.

[MS77] Florence Jessie MacWilliams and Neil James Alexander Sloane. The theory of Error-Correcting Codes. North-Holland Publishing Company, 1977.

[MS16] Amir Moradi and Tobias Schneider. Side-channel analysis protection and low-latency in action – case study of PRINCE and Midori –. In Jung Hee Cheon and Tsuyoshi Takagi, editors, ASIACRYPT 2016, Part I, volume 10031 of LNCS, pages 517–547. Springer, Heidelberg, 2016.

[MWGP12] Nicky Mouha, Qingju Wang, Dawu Gu, and Bart Preneel. Differential and linear cryptanalysis using mixed-integer linear programming. In Chuan-Kun Wu, Moti Yung, and Dongdai Lin, editors, Information Security and Cryptology, volume 7537 of LNCS, pages 57–76. Springer Berlin Heidelberg, 2012.

[NK93] Kaisa Nyberg and Lars R. Knudsen. Provable security against differential cryptanalysis. In Ernest F. Brickell, editor, CRYPTO’92, volume 740 of LNCS, pages 566–574. Springer, Heidelberg, 1993.

[NK95] Kaisa Nyberg and Lars R. Knudsen. Provable security against a differential attack. J. Cryptol., 8(1):27–37, 1995.

[Nyb93] Kaisa Nyberg. On the construction of highly nonlinear permutations. In Rainer A. Rueppel, editor, EUROCRYPT’92, volume 658 of LNCS, pages 92–98. Springer, Heidelberg, 1993.

[Nyb94] Kaisa Nyberg. Differentially uniform mappings for cryptography. In Tor Helleseth, editor, EUROCRYPT’93, volume 765 of LNCS, pages 55–64. Springer, Heidelberg, 1994.

[Nyb95] Kaisa Nyberg. Linear approximation of block ciphers. In Alfredo De Santis, editor, EUROCRYPT’94, volume 950 of LNCS, pages 439–444. Springer, Heidelberg, 1995.

[Nyb96] Kaisa Nyberg. Generalized Feistel networks. In Kwangjo Kim and Tsutomu Matsumoto, editors, ASIACRYPT’96, volume 1163 of LNCS, pages 91–104. Springer, Heidelberg, 1996.

[PUB77] PUB FIPS. 46: Data encryption standard. National Institute of Standards and Technology, 1977.

[PUB01] PUB FIPS. 197: Advanced encryption standard (AES). National Institute of Standards and Technology, 2001. http://csrc.nist.gov/publications/fips/fips197/fips-197.pdf.

[RDP+96] Vincent Rijmen, Joan Daemen, Bart Preneel, Antoon Bosselaers, and Erik De Win. The cipher SHARK. In Dieter Gollmann, editor, FSE’96, volume 1039 of LNCS, pages 99–111. Springer, Heidelberg, 1996.

[Rog04] Phillip Rogaway. Efficient instantiations of tweakable blockciphers and refinements to modes OCB and PMAC. In Pil Joong Lee, editor, ASIACRYPT 2004, volume 3329 of LNCS, pages 16–31. Springer, Heidelberg, 2004.

[Røn16] Sondre Rønjom. Invariant subspaces in Simpira. Cryptology ePrint Archive, Report 2016/248, 2016. https://eprint.iacr.org/2016/248.

[Sch98] Rick Schroeppel. The Hasty Pudding Cipher. NIST AES proposal, 1998.

[SDMS12] Mahdi Sajadieh, Mohammad Dakhilalian, Hamid Mala, and Pouyan Sepehrdad. Recursive diffusion layers for block ciphers and hash functions. In Anne Canteaut, editor, FSE 2012, volume 7549 of LNCS, pages 385–401. Springer, Heidelberg, 2012.

[Sha49] C. E. Shannon. Communication theory of secrecy systems. Bell Syst. Tech. J., 28(4):656–715, 1949.

[SHS+13] Siwei Sun, Lei Hu, Ling Song, Yonghong Xie, and Peng Wang. Automatic security evaluation of block ciphers with s-bp structures against related-key differential attacks. In International Conference on Information Security and Cryptology, pages 39–51. Springer, 2013.

[SHW+14a] Siwei Sun, Lei Hu, Meiqin Wang, Peng Wang, Kexin Qiao, Xiaoshuang Ma, Danping Shi, Ling Song, and Kai Fu. Towards finding the best characteristics of some bit-oriented block ciphers and automatic enumeration of (related-key) differential and linear characteristics with predefined properties. Cryptology ePrint Archive, Report 2014/747, 2014. https://eprint.iacr.org/2014/747.

[SHW+14b] Siwei Sun, Lei Hu, Peng Wang, Kexin Qiao, Xiaoshuang Ma, and Ling Song. Automatic security evaluation and (related-key) differential characteristic search: Application to SIMON, PRESENT, LBlock, DES(L) and other bit-oriented block ciphers. In Palash Sarkar and Tetsu Iwata, editors, ASIACRYPT 2014, Part I, volume 8873 of LNCS, pages 158–178. Springer, Heidelberg, 2014.

[SIH+11] Kyoji Shibutani, Takanori Isobe, Harunaga Hiwatari, Atsushi Mitsuda, Toru Akishita, and Taizo Shirai. Piccolo: An ultra-lightweight blockcipher. In Bart Preneel and Tsuyoshi Takagi, editors, CHES 2011, volume 6917 of LNCS, pages 342–357. Springer, Heidelberg, 2011.

[Sil00] John R. Silvester. Determinants of block matrices. Math. Gaz., 84(501):460–467, 2000.

[SK96] Bruce Schneier and John Kelsey. Unbalanced Feistel networks and block cipher design. In Dieter Gollmann, editor, FSE’96, volume 1039 of LNCS, pages 121–144. Springer, Heidelberg, 1996.

[SKOP15] Siang Meng Sim, Khoongming Khoo, Frederique E. Oggier, and Thomas Peyrin. Lightweight MDS involution matrices. In Gregor Leander, editor, FSE 2015, volume 9054 of LNCS, pages 471–493. Springer, Heidelberg, 2015.

[SM88] Akihiro Shimizu and Shoji Miyaguchi. Fast data encipherment algorithm FEAL. In David Chaum and Wyn L. Price, editors, EUROCRYPT’87, volume 304 of LNCS, pages 267–278. Springer, Heidelberg, 1988.

[SMB16] Sadegh Sadeghi, Tahere Mohammadi, and Nasour Bagheri. Cryptanalysis of reduced round SKINNY block cipher. Cryptology ePrint Archive, Report 2016/1120, 2016. https://eprint.iacr.org/2016/1120.

[Sor84] Arthur Sorkin. Lucifer, a cryptographic algorithm. Cryptologia, 8(1):22–42, 1984.

[SP04] Taizo Shirai and Bart Preneel. On Feistel ciphers using optimal diffusion mappings across multiple rounds. In Pil Joong Lee, editor, ASIACRYPT 2004, volume 3329 of LNCS, pages 1–15. Springer, Heidelberg, 2004.

[SS04] Taizo Shirai and Kyoji Shibutani. Improving immunity of Feistel ciphers against differential cryptanalysis by using multiple MDS matrices. In Bimal K. Roy and Willi Meier, editors, FSE 2004, volume 3017 of LNCS, pages 260–278. Springer, Heidelberg, 2004.

[SS16a] Sumanta Sarkar and Siang Meng Sim. A deeper understanding of the XOR count distribution in the context of lightweight cryptography. In David Pointcheval, Abderrahmane Nitaj, and Tajjeeddine Rachidi, editors, AFRICACRYPT 16, volume 9646 of LNCS, pages 167–182. Springer, Heidelberg, 2016.

[SS16b] Sumanta Sarkar and Habeeb Syed. Lightweight diffusion layer: Importance of Toeplitz matrices. IACR Trans. Symm. Cryptol., 2016(1):95–113, 2016.

[Sto16] Ko Stoffelen. Optimizing S-box implementations for several criteria using SAT solvers. In Thomas Peyrin, editor, FSE 2016, volume 9783 of LNCS, pages 140–160. Springer, Heidelberg, 2016.

[StSDT16] William A. Stein and the Sage Development Team. Sage Mathematics Software, 2016.

[Swa62] Richard G. Swan. Factorization of polynomials over finite fields. Pacific J. Math., 12(3):1099–1106, 1962.

[TA16] Yosuke Todo and Kazumaro Aoki. Wide trail design strategy for binary MixColumns - enhancing lower bound of number of active S-boxes. In Mark Manulis, Ahmad-Reza Sadeghi, and Steve Schneider, editors, ACNS 16, volume 9696 of LNCS, pages 467–484. Springer, Heidelberg, 2016.

[TAY17] Mohamed Tolba, Ahmed Abdelkhalek, and Amr M. Youssef. Impossible differential cryptanalysis of reduced-round SKINNY. In Marc Joye and Abderrahmane Nitaj, editors, AFRICACRYPT 17, volume 10239 of LNCS, pages 117–134. Springer, Heidelberg, 2017.

[TLS16] Yosuke Todo, Gregor Leander, and Yu Sasaki. Nonlinear invariant attack - practical attack on full SCREAM, iSCREAM, and Midori64. In Jung Hee Cheon and Tsuyoshi Takagi, editors, ASIACRYPT 2016, Part II, volume 10032 of LNCS, pages 3–33. Springer, Heidelberg, 2016.

[TM16] Yosuke Todo and Masakatu Morii. Bit-based division property and application to Simon family. In Thomas Peyrin, editor, FSE 2016, volume 9783 of LNCS, pages 357–377. Springer, Heidelberg, 2016.

[War94] William P. Wardlaw. Matrix representation of finite fields. Math. Mag., 67(4):289–293, 1994.

[WLV+14] Qingju Wang, Zhiqiang Liu, Kerem Varici, Yu Sasaki, Vincent Rijmen, and Yosuke Todo. Cryptanalysis of reduced-round SIMON32 and SIMON48. In Willi Meier and Debdeep Mukhopadhyay, editors, INDOCRYPT 2014, volume 8885 of LNCS, pages 143–160. Springer, Heidelberg, 2014.

[WN95] David J. Wheeler and Roger M. Needham. TEA, a tiny encryption algorithm. In Bart Preneel, editor, FSE’94, volume 1008 of LNCS, pages 363–366. Springer, Heidelberg, 1995.

[WWJZ14] Ning Wang, Xiaoyun Wang, Keting Jia, and Jingyuan Zhao. Differential attacks on reduced SIMON versions with dynamic key-guessing techniques. Cryptology ePrint Archive, Report 2014/448, 2014. https://eprint.iacr.org/2014/448.

[WWW13] Shengbao Wu, Mingsheng Wang, and Wenling Wu. Recursive diffusion layers for (lightweight) block ciphers and hash functions. In Lars R. Knudsen and Huapeng Wu, editors, SAC 2012, volume 7707 of LNCS, pages 355–371. Springer, Heidelberg, 2013.

[XZL14] Hong Xu, Yonghui Zheng, and Xuejia Lai. Construction of perfect diffusion layers from linear feedback shift registers. IET Inf. Secur., 9(2):127–135, 2015.

[YZS+15] Gangqiang Yang, Bo Zhu, Valentin Suder, Mark D. Aagaard, and Guang Gong. The Simeck family of lightweight block ciphers. In Tim Guneysu and Helena Handschuh, editors, CHES 2015, volume 9293 of LNCS, pages 307–329. Springer, Heidelberg, 2015.

[ZMI90] Yuliang Zheng, Tsutomu Matsumoto, and Hideki Imai. On the construction of block ciphers provably secure and not relying on any unproved hypotheses. In Gilles Brassard, editor, CRYPTO’89, volume 435 of LNCS, pages 461–480. Springer, Heidelberg, 1990.
