Linköping Studies in Science and Technology
Dissertation No. 731
VLSI Aspects on
Inversion in Finite Fields
Mikael Olofsson
Department of Electrical Engineering
Linköpings universitet, SE-581 83 Linköping, Sweden
Linköping 2002
© 2002 Mikael Olofsson
This document was prepared using LaTeX2ε on a SUN Ultra 10. The figures were produced using Xfig (from xfig.org) and the plots were produced using Matlab (from MathWorks, Inc.). The irreducible polynomials in Appendix A were found using Magma (from University of Sydney) and the index was compiled by Makeindex.
Printed in Sweden by UniTryck, Linköping 2002
ISBN 91-7373-256-7
ISSN 0345-7524
To my surprise
Abstract
Different algorithms and architectures for inversion in finite extension fields are studied. The investigation is restricted to fields of characteristic two. Based on a simple transistor model, various architectures are compared with respect to delay, area requirement, and energy consumption.
Both polynomial and normal basis representations are considered. A special investigation is made of representations of fields as tower fields. New architectures are presented and compared with previously known architectures. For tower fields, a thorough investigation is made for the case where the extension degree is a power of two. In that case the investigation is based on a classification of all possible bases of the field over its largest subfield.
It is noted that normal bases, generated by irreducible all-one polynomials, are closely related to the polynomial bases which are generated by the same polynomials. Based on this observation, it is shown how architectures considered for polynomial basis representation can be modified for use with corresponding normal bases.
A list of minimum weight irreducible polynomials over $\mathbb{F}_2$ is also given.
Acknowledgements
Everyone who has tried to prepare a thesis knows that it is hard work and that it cannot be done without support from others.
First and foremost, my thoughts go to my supervisor Professor Thomas Ericson, and to Professor Stefan Dodunekov who filled in for Thomas during the academic year 1993-94. Your experience and your comments on my work have been invaluable to me. And Thomas, your patience while reading my manuscripts has always amazed me.
I am lucky enough to be working in a very stimulating environment. To my former and present colleagues in the Image Coding group, the Information Theory group, and the Data Transmission group: Thanks for research related discussions, mini-golf, go-cart, go, Friday cakes, and discussions about just anything over a cup of coffee.
Some colleagues deserve to be especially mentioned: Dr. Edoardo Mastrovito, who introduced me to finite fields; Assistant Professor Ralf Kotter, who gave me the push that made me look into inversion; Associate Professor Lars-Inge Alfredsson, with whom I have had numerous discussions about cost measures; Mrs. Gunilla Svahn-Roming, our secretary, who has been a great help on many occasions; and Research Engineer Jean-Jacques Moulis, who has been keeping my computer up and running.
I would not have been in the place where I am today, if it had not been for my parents, Mrs. Lisbeth Olofsson and Mr. Anders Olofsson. You taught me that nothing is impossible; all that's needed is some reading and effort. My father, regrettably, never got the opportunity to see this result of my efforts.
Last, but absolutely not least, my thoughts go to my family. Elisabeth, your love, patience, and understanding are what has kept me going. Our sons, Erik and Viktor, have also had their share of sacrifices.
To all of you, I send my deepest gratitude.
Linköping, December 2001.
Mikael Olofsson
Contents
1 Introduction
1.1 Outline of the thesis
1.2 Background
1.3 Notation and Symbols
2 Mathematical Background
2.1 Groups, Rings, and Fields
2.2 Extension Fields
2.3 Vector and Matrix Representations
3 VLSI Considerations
3.1 Static CMOS Gates
3.2 Models Used
3.2.1 The Delay Model
3.2.2 The Size Model
3.2.3 The Power Dissipation Model
3.3 Special Cases
3.3.1 Large Capacitive Loads
3.3.2 Adder Trees
3.3.3 Buffers
3.3.4 Control Logic and Control Signals
3.4 Summary of Cost Measures
4 Polynomial Basis Inverters
4.1 Inversion Based on the Euclidean Algorithm
4.1.1 The Architecture
4.1.2 Properties of the Architecture
4.2 A Berlekamp-Massey Based Inverter
4.2.1 Triangular Bases
4.2.2 The Architecture
4.2.3 Properties of the Architecture
4.3 Inversion Based on the Gauss-Jordan Algorithm
4.3.1 A Systolic Implementation of the Gauss-Jordan Algorithm
4.3.2 Previous Inverters Based on the Gauss-Jordan Algorithm
4.3.3 A New Preprocessor
4.4 Properties of the Polynomial Basis Inverters
5 Normal Basis Inverters
5.1 All-One Polynomials
5.2 The Massey-Omura Multiplier
5.3 Inversion Based on Exponentiation
5.3.1 Inversion by Squaring and Multiplication
5.3.2 Inversion by Accelerated Squaring and Multiplication
5.4 Polynomial Basis Inverters Revisited
5.5 Properties of the Normal Basis Inverters
6 Inversion in Tower Fields
6.1 Bases of Tower Fields
6.2 Arithmetic Using Bases of Type I
6.2.1 Inversion
6.2.2 Multiplication
6.2.3 Squaring
6.3 Arithmetic Using Bases of Type II
6.3.1 Inversion
6.3.2 Multiplication
6.3.3 Squaring
6.4 Arithmetic Using Bases of Type III
6.4.1 Inversion
6.4.2 Multiplication
6.4.3 Squaring
6.5 Arithmetic in $\mathbb{F}_4$
6.5.1 Polynomial Basis Representation
6.5.2 Normal Basis Representation
6.5.3 Properties of Arithmetic in $\mathbb{F}_4$
6.6 Arithmetic in $\mathbb{F}_{16}$
6.6.1 Multiplication by a Constant
6.6.2 Squaring
6.6.3 Inversion
6.6.4 Multiplication
6.6.5 Properties of Arithmetic in $\mathbb{F}_{16}$
6.7 Properties of Inversion in Tower Fields
6.7.1 Type I Bases
6.7.2 Type II Bases
6.7.3 Type III Bases
6.7.4 Best Choices
7 Concluding Remarks
7.1 Conclusions
7.1.1 Fast Inverters
7.1.2 Small Inverters
7.1.3 Low Energy Inverters
7.2 Future Research
A Minimum Weight Irreducible Polynomials over $\mathbb{F}_2$
A.1 The table
Chapter 1
Introduction
1.1 Outline of the thesis
The overall structure of this thesis is as follows. First, in Chapters 2 and 3, we give mathematical and technical background. Then we present both some new and some old solutions to the inversion problem in Chapters 4 through 6. These solutions are compared with respect to chip area, time, and energy consumption. Last, we summarize, draw some conclusions, and suggest future research in Chapter 7.
In Chapter 2, we give some mathematical background. We give definitions of fundamental concepts, and introduce some notation. We also introduce a matrix representation of finite extension fields.
In Chapter 3, we give some technical background. Based on a simple model of the MOS transistors, we derive the cost measures used in this thesis.
We start the investigation in Chapter 4, where we consider polynomial bases for the representation of the field. Here we present two new architectures for inversion.
Normal basis inverters are studied in Chapter 5. We show that polynomial basis inverters can in some cases be used for inversion in normal basis representation. We also make small improvements to two known architectures, and propose an architecture based on a known algorithm.
In Chapter 6 we consider inversion in tower fields, where the extension degree is a power of two. We consider all possible bases of the field over its largest proper subfield. New architectures are given.
The last chapter in the thesis, Chapter 7, summarizes the thesis. Here we draw final conclusions and suggest future research.
In Appendix A, we give the result of a computer search for irreducible polynomials of low weight.
1.2 Background
The theory of finite fields goes back to the nineteenth century. Mathematicians like Abel [1], Gauss [17], and Galois addressed the problem of solving algebraic equations of higher degree, but it was Galois [15] who in 1830, inspired by complex numbers, introduced what were to be called Galois imaginaries. In that way he actually gave rise to the theory of finite fields and especially finite extension fields. He is therefore considered one of the founders of modern algebra. In 1893 Weber [51] treated fields as abstract objects in terms of groups, in much the same way as we do today. In 1901 Dickson [10] gave a thorough exposition of the area as known to the nineteenth century mathematicians.
Shannon's [43] introduction of information theory in 1948 and the related area of algebraic coding theory gave rise to new questions in the area. Error correcting codes are often defined over finite rings or fields. Efficient hardware for decoding of algebraic error-correcting codes over finite fields depends on the existence of good VLSI implementations of algebraic operations on elements in finite extension fields. Therefore, the study of architectures for VLSI implementations of algebraic operations on elements in these fields is of interest.
Much effort has been spent on finding and comparing architectures for multiplication, squaring, and exponentiation. However, an operation that is often regarded as too complex to be suitable for implementation is inversion. A natural question that arises is whether this is true. The goal of this thesis is to compare old and new architectures for inversion in finite extension fields with respect to time, area, and energy needs.
The first constructive idea of an architecture for inversion was described by Berlekamp [5] in 1968. That architecture for inversion in $\mathbb{F}_{2^m}$, the field of size $2^m$, is based on the Euclidean algorithm for polynomials. The architecture is regular and uses $O(m)$ clock cycles and $O(m)$ chip area, with a clock period of length $O(\log m)$. The chip area is here regarded as the number of arithmetic units for the ground field used in the architecture. The clock period length is considered to be the largest delay among the signal paths in the architecture, and the delay of a certain gate in a path is considered to depend on the capacitive load of its output. However, as described by Berlekamp this architecture needs a relatively large amount of control circuitry.
Since then, people have proposed various methods for inversion, with different application areas. The idea of using the Euclidean algorithm for inversion is also used by Araki, Fujita, and Morisue [2].
Mastrovito [34] studies in his thesis several architectures for algebraic operations on elements of $\mathbb{F}_{2^m}$. A small part of that thesis deals with inversion. One method treated there is inversion by exponentiation. This can be done by combining multiplication and squaring either in $O(m)$ clock cycles and $O(m^2)$ chip area or in $O(m^2)$ clock cycles and $O(m)$ chip area, depending on how the multiplication and squaring are implemented. In both these cases the clock period is of length $O(\log m)$. This assumes polynomial basis representation of the element. Another method for inversion by exponentiation studied by Mastrovito is inversion by a multiplier tree. Using a polynomial basis and direct multipliers, a direct inverter reaching $O(m^3)$ chip area and a clock period of $O(\log^2 m)$ can be produced. The regularity of these inverters is fairly good. Table lookup, also mentioned by Mastrovito, has good time properties and regularity, but the chip area needed is $O(m 2^m)$, which makes it interesting for small $m$ only. The Euclidean algorithm is mentioned, but not investigated further.
Hasan and Bhargava [20] present new architectures for multiplication and squaring using a polynomial basis generated by an irreducible all-one polynomial. They use these new architectures for inversion by exponentiation. They reach $O(m)$ clock cycles of length $O(\log m)$ and $O(m^2)$ chip area. Wang et al. [47] propose an architecture that uses this idea for normal basis representation. They use a Massey-Omura multiplier to perform the multiplication and hence the time, size, and structural properties of that multiplier are inherited by the inverter.
There are other somewhat similar methods for inversion using normal basis representation. Feng [13] proposes an architecture that uses $O(m \log m)$ clock cycles of length $O(\log m)$ and at least $O(m)$ chip area for inversion in $\mathbb{F}_{2^m}$ using normal basis representation. It is based on a normal basis multiplier derived in that paper. Itoh and Tsujii [26][27] propose algorithms for inversion in two classes of finite fields using normal basis representation. However, they do not propose any architectures.
Direct inversion has reasonably good time and size properties, at least in some cases, but the architectures are typically terribly irregular. However, some people have given structured methods for generating networks for direct inversion. Litow and Davida [31] and von zur Gathen [16] prove the existence of networks for direct inversion with a depth that is $O(\log m)$ and a gate count that is $O(m^{O(1)})$. Morii and Kasahara [36] propose a method using all subfields of $\mathbb{F}_{2^m}$. The method is especially efficient when $m$ is a power of 2. In that case the chip size is $O(m^{\log 3} \log m)$ and the clock period $O(\log^3 m)$. Asano, Itoh, and Tsujii [3] also propose a method using all subfields of $\mathbb{F}_{2^m}$. They claim that the gate count of the corresponding architecture is $O(m^3 \log m)$, with a depth of $O(\log^2 m)$.
Paar [39] also studies several architectures for algebraic operations on elements of $\mathbb{F}_{2^m}$. He concentrates on bit parallel architectures in his thesis. Inversion in composite fields is considered in one chapter. The architectures for inversion that are studied are based on the inverters of Itoh and Tsujii [26] and Morii and Kasahara [36].
Davida [9] gives an equation system based on the Chinese remainder theorem. The $(2m-1) \times (2m-1)$ matrix over the ground field defining this equation system is partly Toeplitz. Wang and Lin [48] as well as Hasan and Bhargava [22] propose systolic array solutions for division. The architecture proposed by Wang and Lin solves the equation system given by Davida, while the architecture proposed by Hasan and Bhargava solves an equation system based on an $m \times m$ matrix. Both architectures use $O(m)$ clock cycles of length $O(1)$ and $O(m^2)$ chip area. However, both the size and the time requirements are substantially smaller for the architecture by Hasan and Bhargava than for the architecture by Wang and Lin. A similar architecture is proposed in this thesis that needs approximately 85% of the chip size needed by the architecture by Hasan and Bhargava. These architectures are good in the sense that they are extremely structured, have small clock periods and use no feedback loops and no adder trees. They are all modifications of standard systolic array structures for solving linear equation systems.
Kovac, Ranganatan, and Varanasi [28][29] present an algorithm that is said to be based on pattern recognition. This results in an exhaustive search for elements of the field. Therefore the time and size properties are not good. The presented architecture is for $m = 4$, but the idea can easily be adapted to be used with any $m$. The calculations are then performed in $n$ parallel branches, where $n$ is some suitable power of 2. The architecture uses $O(2^m/n)$ clock cycles of length $O(\log m)$ and $O(nm^2)$ chip area.
1.3 Notation and Symbols
Throughout this thesis we will analyze architectures for performing arithmetic operations in finite fields of characteristic 2 using basic arithmetic operations in subfields. We denote a field of size $q$ by $\mathbb{F}_q$. The arithmetic operations addition, multiplication and division in these fields are denoted by $+$, juxtaposition, and $/$. Sometimes multiplication is denoted by a dot ($\cdot$) for clarity. The symbols used for hardware manipulating elements of fields of characteristic 2 are given in Figure 1.1.
We will occasionally use Boolean operations. These will be denoted as follows.
Boolean inversion of $A$: $\overline{A}$
Boolean AND of $A$ and $B$: $A \wedge B$
Boolean OR of $A$ and $B$: $A \vee B$
Boolean XOR of $A$ and $B$: $A \veebar B$
The symbols used for Boolean gates are given in Figure 1.2.
We will frequently use switches. The symbols used for these switches followthe pattern given in Figure 1.3.
[Figure 1.1: Symbols for objects manipulating elements of fields of characteristic 2. Symbols shown: addition, multiplication, multiplication by $a$, buffer.]
[Figure 1.2: Symbols for Boolean gates and memory cells. Symbols shown: AND-gate, NAND-gate, OR-gate, NOR-gate, XOR-gate, XNOR-gate, inverter, memory cell initiated by $a$.]
[Figure 1.3: A two-way switch with two control signals, $A$ and $B$. The switch is understood to be connected to the terminal labelled $A$ when we have $A = 1$ and to the terminal labelled $B$ when we have $B = 1$.]
Chapter 2
Mathematical Background
In this chapter we give an overview of the mathematical background of this thesis. Statements that can be considered as standard material are given without proof. We refer to Herstein [25], Lidl and Niederreiter [30], and Wan [46] for more details and proofs. Berlekamp [5], Blahut [7], and McEliece [35] consider finite fields from an engineering point of view.
2.1 Groups, Rings, and Fields
Algebra is the theory of sets under one or more operations that are defined on the sets. Algebraic systems with special structure have been given names such as groups, rings, and fields.
Definition 1 A group $G$ is a nonempty set under a binary associative operation $\star$ that operates on $G$, with the following properties.
1. There is an element $e \in G$, called the unit element of $G$, such that $a \star e = e \star a = a$ holds for all $a \in G$.
2. For any $a$ in $G$ there is an element $a^{-1}$, called the inverse of $a$, such that $a^{-1} \star a = a \star a^{-1} = e$ holds.
If $\star$ is commutative, that is if $a \star b = b \star a$ holds for any $a, b$ in $G$, then $G$ is called an Abelian group.
It can easily be shown that both the unit element and the inverse of an element of $G$ are uniquely determined. A simple example of an Abelian group is the set of integers under addition.
Definition 2 A ring $R$ is a nonempty set under two binary operations, normally called addition, denoted by $+$, and multiplication, denoted by $\cdot$, with the following properties.
1. $R$ is an Abelian group with respect to addition.
2. Multiplication is associative, that is $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ holds for any $a, b, c$ in $R$.
3. Multiplication is distributive over addition, that is $a \cdot (b + c) = a \cdot b + a \cdot c$ and $(b + c) \cdot a = b \cdot a + c \cdot a$ hold for any $a, b, c$ in $R$.
If the multiplication in $R$ is commutative, that is if $a \cdot b = b \cdot a$ holds for any $a, b$ in $R$, then $R$ is called a commutative ring.
Multiplication is often denoted by juxtaposition. The best known ring is the integer ring, which is commutative, but we should note that there are other rings. The operations do not necessarily need to be what we normally mean by addition and multiplication. For instance, we do not even demand the presence of a multiplicative unit element. But if there is a multiplicative unit element, it can be shown that it cannot equal the additive unit element, unless the ring consists of a single element. By convention we denote the additive unit element by 0 and the multiplicative unit element (if any) by 1. The ring of integers is a ring with multiplicative unit element, while the set of even integers under integer addition and multiplication is a ring without multiplicative unit element, both with infinitely many elements. Denote by $\mathbb{Z}_n$ the set of residue classes of integers modulo $n$, $n > 1$. Then $\mathbb{Z}_n$ is a ring with exactly $n$ elements, under addition and multiplication reduced modulo $n$.
Definition 3 A field $F$ is a commutative ring with the additional property that the set of nonzero elements of $F$ forms an Abelian group under multiplication. A field with finitely many elements is called a finite field.
A finite field is also called a Galois field after the French 19th centurymathematician Evariste Galois.
The real numbers under ordinary addition and multiplication form an example of an infinite field. We noted that $\mathbb{Z}_n$ is a finite ring with $n$ elements. However, if $n$ is a prime, then $\mathbb{Z}_n$ is actually a finite field. We normally refer to such a field as a prime field, since its size is a prime.
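The statement that $\mathbb{Z}_n$ is a field when $n$ is a prime (and, as is well known, only then) can be checked by brute force for small $n$. The following Python sketch is only an illustration; the helper names `is_field` and `is_prime` are hypothetical and not taken from the thesis.

```python
def is_field(n):
    """True if every nonzero residue modulo n has a multiplicative inverse."""
    return all(
        any((a * b) % n == 1 for b in range(1, n))
        for a in range(1, n)
    )


def is_prime(n):
    """Trial division, sufficient for the small moduli used here."""
    return n > 1 and all(n % d for d in range(2, int(n ** 0.5) + 1))


# Z_n is a field exactly for the prime moduli in this range.
field_sizes = [n for n in range(2, 30) if is_field(n)]
prime_sizes = [n for n in range(2, 30) if is_prime(n)]
```

For instance, 2 has no inverse in $\mathbb{Z}_8$, since every multiple of 2 is even modulo 8.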
There is a finite field with $q$ elements if and only if $q$ is a prime power. Furthermore, any two finite fields with $q$ elements are isomorphic, which means that there is a one-to-one mapping from one of the fields to the other such that the algebraic structure is preserved. This means that there is essentially only one finite field with $q$ elements. We will denote this field by $\mathbb{F}_q$. The multiplicative group $\mathbb{F}_q^* \triangleq \mathbb{F}_q \setminus \{0\}$ is not only Abelian. It is also a cyclic group, i.e. there is at least one element $\alpha \in \mathbb{F}_q^*$ such that any element in $\mathbb{F}_q^*$ is a power of $\alpha$. The element $\alpha$ is then referred to as a generator of $\mathbb{F}_q^*$ or a primitive element of $\mathbb{F}_q$.
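As a small illustration of the cyclic structure, the following Python sketch lists the primitive elements of a prime field $\mathbb{F}_p$ by computing multiplicative orders directly. It is a brute-force sketch for small $p$ only, and the function names are hypothetical.

```python
def multiplicative_order(a, p):
    """Smallest k >= 1 with a^k = 1 modulo the prime p (a not divisible by p)."""
    order, x = 1, a % p
    while x != 1:
        x = (x * a) % p
        order += 1
    return order


def primitive_elements(p):
    """Elements of F_p^* whose powers run through all of F_p^*."""
    return [a for a in range(1, p) if multiplicative_order(a, p) == p - 1]


print(primitive_elements(7))  # [3, 5]
```

In $\mathbb{F}_7$ the powers of 3 are 3, 2, 6, 4, 5, 1, so 3 generates all six nonzero elements, while 2 only generates the subgroup {2, 4, 1}.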
The smallest finite field is $\mathbb{F}_2$. Here addition and multiplication are performed modulo 2, which means that addition is equivalent to Boolean XOR and that multiplication is equivalent to Boolean AND.
2.2 Extension Fields
Let $p$ be a prime. We have seen how we can construct $\mathbb{F}_p$. Now we wish to construct $\mathbb{F}_{p^m}$, $m > 1$.
Definition 4 Let $F$ and $K$ be fields such that $F \subset K$ holds. Then $K$ is called an extension of $F$ and $F$ is called a subfield of $K$.
The field of complex numbers $\mathbb{C}$ is an extension of the field of real numbers $\mathbb{R}$, constructed by adjoining a root of the irreducible polynomial $x^2 + 1$ to $\mathbb{R}$. Extensions of finite fields can also be constructed by adjoining roots of irreducible polynomials. Let $p(x) = \sum_{i=0}^{m} p_i x^i$ be an irreducible polynomial over $\mathbb{F}_p$ of degree $m$. Such polynomials exist of any degree $m \ge 1$ over any finite field. It is often convenient to consider only irreducible polynomials with the leading coefficient $p_m = 1$. Such polynomials are called monic. $\mathbb{F}_{p^m}$ is then constructed by adjoining a root of $p(x)$ to $\mathbb{F}_p$. This can also be described as the set of polynomials over $\mathbb{F}_p$ reduced modulo $p(x)$. An important observation is that this implies that $\mathbb{F}_{p^m}$ is a vector space of dimension $m$ over $\mathbb{F}_p$. Furthermore, the elements represented by $x^i$, $0 \le i < m$, form a basis of this vector space that we formally define in the following way.
Definition 5 Let $\vartheta$ be an element of $\mathbb{F}_{q^m}$ such that $\{\vartheta^i\}_{i=0}^{m-1}$ is a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. Then $\{\vartheta^i\}_{i=0}^{m-1}$ is called a polynomial basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$.
These bases are also called standard, canonical, and conventional bases in the literature. There is always at least one polynomial basis of a field over any of its subfields. The element $\vartheta \in \mathbb{F}_{q^m}$ generates a polynomial basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ if and only if $\vartheta$ is a root of an irreducible polynomial $p(x)$ of degree $m$ over $\mathbb{F}_q$. Another often used basis is the following.
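Definition 6 Let $\vartheta$ be an element of $\mathbb{F}_{q^m}$ such that $\{\vartheta^{q^i}\}_{i=0}^{m-1}$ is a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. Then $\{\vartheta^{q^i}\}_{i=0}^{m-1}$ is called a normal basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$.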
There is always at least one normal basis of a field over any of its subfields. One useful property of normal bases is that raising an element to the power $q$ is a simple cyclic shift of the vector representing the element in a normal basis. If $\{\vartheta^{q^i}\}_{i=0}^{m-1}$ is a basis, then $\vartheta$ is a root of an irreducible polynomial, say $p(x)$, of degree $m$. Furthermore, $\{\vartheta^{q^i}\}_{i=0}^{m-1}$ is the set of all roots of $p(x)$. However, not all irreducible polynomials of degree $m$ generate normal bases. Polynomial and normal basis representations are the most common representations of finite extension fields.
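The remark that not every irreducible polynomial generates a normal basis can be illustrated in $\mathbb{F}_{16}$. The Python sketch below is an illustration under stated assumptions: elements are stored as bitmasks with bit $i$ holding the coefficient of $x^i$, the two polynomials $x^4+x+1$ and $x^4+x^3+x^2+x+1$ are chosen here as examples, and the helper names are hypothetical. It computes the roots $\vartheta^{2^i}$ of each polynomial and their rank over $\mathbb{F}_2$.

```python
def gf_mul(a, b, poly, m):
    """Product in F_{2^m}; poly is the bitmask of p(x), including the x^m term."""
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a          # add a * (current power of x)
        b >>= 1
        a <<= 1                  # multiply a by x ...
        if (a >> m) & 1:
            a ^= poly            # ... and reduce modulo p(x)
    return result


def conjugates(theta, poly, m):
    """The elements theta^(2^i), i = 0..m-1."""
    out = []
    for _ in range(m):
        out.append(theta)
        theta = gf_mul(theta, theta, poly, m)
    return out


def gf2_rank(vectors):
    """Rank over F_2 of bitmask vectors, using an incrementally reduced basis."""
    basis = []
    for v in vectors:
        for b in basis:
            v = min(v, v ^ b)    # clears the leading bit of b in v if it is set
        if v:
            basis.append(v)
    return len(basis)


X = 0b0010                                      # the root x of p(x)
roots_a = conjugates(X, 0b10011, 4)             # p(x) = x^4 + x + 1
roots_b = conjugates(X, 0b11111, 4)             # p(x) = x^4 + x^3 + x^2 + x + 1
print(gf2_rank(roots_a), gf2_rank(roots_b))     # 3 4
```

For $x^4+x+1$ the four roots sum to zero, so they are linearly dependent and do not form a basis. For the all-one polynomial the roots are linearly independent and form a normal basis of $\mathbb{F}_{16}$ over $\mathbb{F}_2$.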
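Definition 7 Let $\alpha$ be an element of $\mathbb{F}_{q^m}$. The trace of $\alpha$ over $\mathbb{F}_q$ is defined as
\[ \mathrm{Tr}_{\mathbb{F}_{q^m}/\mathbb{F}_q}(\alpha) \triangleq \sum_{i=0}^{m-1} \alpha^{q^i}. \]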
When the fields involved are obvious from the context, we will simply write $\mathrm{Tr}(\alpha)$. The trace function is a mapping from $\mathbb{F}_{q^m}$ onto $\mathbb{F}_q$ that is linear over $\mathbb{F}_q$. Actually, the linear transformations from $\mathbb{F}_{q^m}$ into $\mathbb{F}_q$ are known to be exactly the mappings $L_\beta$, $\beta \in \mathbb{F}_{q^m}$, given by $L_\beta(\alpha) = \mathrm{Tr}(\beta\alpha)$ for all $\alpha \in \mathbb{F}_{q^m}$ and they are known to be distinct. See for instance Lidl and Niederreiter [30, pp. 54-56].
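The properties of the trace stated above can be checked numerically in a small field. The following Python sketch assumes $\mathbb{F}_{16}$ represented by bitmasks modulo $p(x) = x^4 + x + 1$ (a choice made here for illustration; the names `gf_mul` and `trace` are hypothetical) and evaluates the trace over $\mathbb{F}_2$ by repeated squaring.

```python
def gf_mul(a, b, poly, m):
    """Product in F_{2^m}; bit i of an element is the coefficient of x^i."""
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return result


def trace(alpha, poly, m):
    """Tr(alpha) = alpha + alpha^2 + alpha^4 + ... + alpha^(2^(m-1))."""
    total, power = 0, alpha
    for _ in range(m):
        total ^= power
        power = gf_mul(power, power, poly, m)
    return total


POLY, M = 0b10011, 4             # p(x) = x^4 + x + 1, assumed for this example
traces = [trace(a, POLY, M) for a in range(1 << M)]
```

Every entry of `traces` is 0 or 1, that is an element of $\mathbb{F}_2$, the map is additive, and exactly half of the sixteen elements have trace 1, as expected of a linear map onto $\mathbb{F}_2$.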
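Definition 8 Let $\{\theta_i\}_{i=0}^{m-1}$ and $\{\sigma_j\}_{j=0}^{m-1}$ be bases of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. If
\[ \mathrm{Tr}_{\mathbb{F}_{q^m}/\mathbb{F}_q}(\theta_i \sigma_j) = \begin{cases} 0, & i \ne j \\ 1, & i = j \end{cases} \]
holds, the bases are said to be dual.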
Given a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$, there exists a unique dual of that basis. Finally, the smallest subfield of $\mathbb{F}_{q^m}$ is a prime field $\mathbb{F}_p$. We refer to the prime $p$ as the characteristic of $\mathbb{F}_{q^m}$.
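For a small field, the unique dual basis can be found by exhaustive search directly from Definition 8. The Python sketch below is an illustration only: it assumes $\mathbb{F}_{16}$ with $p(x) = x^4 + x + 1$ and bitmask elements, and `dual_basis` is a hypothetical helper, not an algorithm from the thesis.

```python
def gf_mul(a, b, poly, m):
    """Product in F_{2^m}; bit i of an element is the coefficient of x^i."""
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return result


def trace(alpha, poly, m):
    """Trace over F_2: sum of alpha^(2^i) for i = 0..m-1."""
    total, power = 0, alpha
    for _ in range(m):
        total ^= power
        power = gf_mul(power, power, poly, m)
    return total


def dual_basis(basis, poly, m):
    """For each j, search for the element s with Tr(basis[i] * s) = [i == j]."""
    dual = []
    for j in range(len(basis)):
        matches = [s for s in range(1 << m)
                   if all(trace(gf_mul(t, s, poly, m), poly, m) == int(i == j)
                          for i, t in enumerate(basis))]
        assert len(matches) == 1     # the dual basis is unique
        dual.append(matches[0])
    return dual


POLY, M = 0b10011, 4                 # p(x) = x^4 + x + 1, assumed for this example
poly_basis = [1, 2, 4, 8]            # 1, x, x^2, x^3
print(dual_basis(poly_basis, POLY, M))  # [9, 4, 2, 1]
```

In this representation the dual of $\{1, x, x^2, x^3\}$ is $\{x^3 + 1, x^2, x, 1\}$. The search costs $O(m 2^m)$ trace evaluations, so it is only meant for checking small cases.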
2.3 Vector and Matrix Representations
We have noted that $\mathbb{F}_{q^m}$ is a vector space over $\mathbb{F}_q$. That makes addition in $\mathbb{F}_{q^m}$ equivalent to vector addition of the corresponding vectors over $\mathbb{F}_q$. It also makes multiplication of an element of $\mathbb{F}_{q^m}$ by an element of $\mathbb{F}_q$ especially simple, since that is equivalent to multiplication of a vector by a scalar. Multiplication of two elements of $\mathbb{F}_{q^m}$ and inversion of an element in $\mathbb{F}_{q^m}^*$, however, demand some more elaborate thinking. We use the following notation for the vectors.
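Definition 9 Let $\{\theta_j\}_{j=0}^{m-1}$ be a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and let $\alpha$ be an element in $\mathbb{F}_{q^m}$. Define the row vector $\theta = (\theta_0, \theta_1, \ldots, \theta_{m-1})$. The column vector $\alpha_\theta = (a_0, a_1, \ldots, a_{m-1})^T$ over $\mathbb{F}_q$ satisfying $\alpha = \theta\alpha_\theta$ is called the $\theta$-vector of $\alpha$.
Next, we use two bases to create a matrix notation of elements in $\mathbb{F}_{q^m}$.
Definition 10 Let $\{\theta_j\}_{j=0}^{m-1}$ and $\{\sigma_i\}_{i=0}^{m-1}$ be bases of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and define the row vectors $\theta \triangleq (\theta_0, \theta_1, \ldots, \theta_{m-1})$ and $\sigma \triangleq (\sigma_0, \sigma_1, \ldots, \sigma_{m-1})$. The matrix
\[ \alpha_{\sigma,\theta} \triangleq \left( (\theta_0\alpha)_\sigma, (\theta_1\alpha)_\sigma, \ldots, (\theta_{m-1}\alpha)_\sigma \right), \]
where $(\theta_j\alpha)_\sigma$ is the $\sigma$-vector of $\theta_j\alpha$, is called the $[\sigma, \theta]$-matrix of $\alpha \in \mathbb{F}_{q^m}$.
Let $\{\theta'_i\}_{i=0}^{m-1}$ be the dual basis of $\{\theta_j\}_{j=0}^{m-1}$. It is well known that the entries $a_i$ in the $\theta$-vector of $\alpha$ are given by $a_i = \mathrm{Tr}(\theta'_i\alpha)$ for all $\alpha \in \mathbb{F}_{q^m}$. This statement can easily be extended in the following way for $[\sigma, \theta]$-matrices.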
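Lemma 1 Let $\{\theta_j\}_{j=0}^{m-1}$ and $\{\sigma_i\}_{i=0}^{m-1}$ be bases of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and let $\{\sigma'_i\}_{i=0}^{m-1}$ be the dual basis of $\{\sigma_i\}_{i=0}^{m-1}$. Define $\theta \triangleq (\theta_0, \theta_1, \ldots, \theta_{m-1})$ and $\sigma \triangleq (\sigma_0, \sigma_1, \ldots, \sigma_{m-1})$. Then the entries of
\[ \alpha_{\sigma,\theta} = \begin{pmatrix} a_{0,0} & \cdots & a_{0,m-1} \\ \vdots & \ddots & \vdots \\ a_{m-1,0} & \cdots & a_{m-1,m-1} \end{pmatrix} \]
are given by $a_{i,j} = \mathrm{Tr}(\sigma'_i\theta_j\alpha)$ for all $\alpha \in \mathbb{F}_{q^m}$.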
The following lemma gives us a possibility to describe multiplication in finite extension fields, where the elements are represented in different bases.
Lemma 2 Let $\{\theta_j\}_{j=0}^{m-1}$ and $\{\sigma_i\}_{i=0}^{m-1}$ be bases of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and define the row vectors $\theta \triangleq (\theta_0, \theta_1, \ldots, \theta_{m-1})$ and $\sigma \triangleq (\sigma_0, \sigma_1, \ldots, \sigma_{m-1})$. Let $\alpha$, $\beta$, and $\gamma$ be elements of $\mathbb{F}_{q^m}$. Then $\gamma_\sigma = \alpha_{\sigma,\theta}\beta_\theta$ holds if and only if $\gamma = \alpha\beta$ holds.
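Proof: Assuming $\gamma = \alpha\beta$, we have $\sigma\gamma_\sigma = \alpha\theta\beta_\theta$ by Definition 9. Multiplying $\alpha$ into the vector $\theta$ gives us
\[ \sigma\gamma_\sigma = (\alpha\theta_0, \alpha\theta_1, \ldots, \alpha\theta_{m-1})\,\beta_\theta. \]
By Definition 9, we have $\alpha\theta_j = \sigma(\alpha\theta_j)_\sigma$, which gives us
\[ \begin{aligned} \sigma\gamma_\sigma &= \left( \sigma(\alpha\theta_0)_\sigma, \sigma(\alpha\theta_1)_\sigma, \ldots, \sigma(\alpha\theta_{m-1})_\sigma \right) \beta_\theta \\ &= \sigma \left( (\alpha\theta_0)_\sigma, (\alpha\theta_1)_\sigma, \ldots, (\alpha\theta_{m-1})_\sigma \right) \beta_\theta, \end{aligned} \]
where we by Definition 10 can identify $\left( (\alpha\theta_0)_\sigma, (\alpha\theta_1)_\sigma, \ldots, (\alpha\theta_{m-1})_\sigma \right)$ as the $[\sigma, \theta]$-matrix of $\alpha$ and hence $\sigma\gamma_\sigma = \sigma\alpha_{\sigma,\theta}\beta_\theta$ holds. Since $\{\sigma_i\}_{i=0}^{m-1}$ is a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$, these elements are linearly independent. Therefore $\gamma_\sigma = \alpha_{\sigma,\theta}\beta_\theta$ holds.
Conversely, assuming $\gamma_\sigma = \alpha_{\sigma,\theta}\beta_\theta$, we simply follow the above steps backwards to prove that $\gamma = \alpha\beta$ holds. $\square$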
All fields considered in this thesis have characteristic 2, i.e. we are considering $\mathbb{F}_{2^m}$ for some positive integer $m$. From a practical point of view, it is reasonable to assume that all elements are represented as vectors with respect to the same basis, say $\theta$. For multiplication, this means that we have $\alpha$ and $\beta$ as $\alpha_\theta$ and $\beta_\theta$, and wish to calculate the product $\gamma = \alpha\beta$ as $\gamma_\theta$. To make use of Lemma 2, we need to generate $\alpha_{\theta,\theta}$, and then calculate the matrix-vector product $\gamma_\theta = \alpha_{\theta,\theta}\beta_\theta$. Both the generation of the matrix and the calculation of the matrix-vector product are performed using $\mathbb{F}_2$-arithmetic. We have already noted that those operations can be performed using AND gates as $\mathbb{F}_2$-multipliers and using XOR gates as $\mathbb{F}_2$-adders. If $\theta$ is a polynomial basis, we can generate the columns of $\alpha_{\theta,\theta}$ from $\alpha_\theta$ using a binary feedback shift register of length $m$, where the feedback network is given by the irreducible polynomial that is used to generate the basis. The matrix-vector product can then be sequentially determined at the same time as the generation of the matrix by considering one column at a time and accumulating the result in a register of length $m$. Most sequential polynomial basis multipliers take this approach, e.g. a multiplier presented by Berlekamp [5, Ch. 2.4].
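The sequential scheme described above can be sketched in software. The following Python sketch is an illustration, not the circuit of Berlekamp [5]: elements are bitmasks in the polynomial basis $\{1, x, \ldots, x^{m-1}\}$, the function `multiply_by_x` plays the role of one clock cycle of the feedback shift register, and $p(x) = x^4 + x + 1$ is an assumed example polynomial. All function names are hypothetical.

```python
def multiply_by_x(a, poly, m):
    """One shift-register step: a -> x * a, reduced modulo p(x)."""
    a <<= 1
    if (a >> m) & 1:
        a ^= poly                    # feedback taps given by p(x)
    return a


def theta_matrix_columns(alpha, poly, m):
    """Columns of alpha_{theta,theta}: the theta-vectors of x^j * alpha."""
    cols = []
    for _ in range(m):
        cols.append(alpha)
        alpha = multiply_by_x(alpha, poly, m)
    return cols


def sequential_multiply(alpha, beta, poly, m):
    """Accumulate the columns selected by the bits of beta, one per cycle."""
    acc, column = 0, alpha
    for j in range(m):
        if (beta >> j) & 1:          # AND gates: beta_j times the column
            acc ^= column            # XOR gates: accumulate in the register
        column = multiply_by_x(column, poly, m)
    return acc


POLY, M = 0b10011, 4                 # p(x) = x^4 + x + 1, assumed for this example
print(sequential_multiply(0b0110, 0b1011, POLY, M))  # 15
```

Here `0b0110` is $x^5$ and `0b1011` is $x^7$ in this representation, and the result `15` is $x^{12} = x^3 + x^2 + x + 1$. The loop uses $m$ iterations and one $m$-bit accumulator, which mirrors the register of length $m$ mentioned above.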
A direct consequence of Lemma 2 is that $\alpha_{\sigma,\theta}$ is singular if and only if $\alpha$ is zero. Therefore, we can find the inverse of any non-zero element $\alpha$ by generating $\alpha_{\sigma,\theta}$ for some suitable basis $\sigma$ from $\alpha_\theta$, and then solving the equation system $\alpha_{\sigma,\theta}\beta_\theta = 1_\sigma$, where $1_\sigma$ is the $\sigma$-vector of 1. In this case we can choose $\sigma$ freely, since that basis is only used during calculation. For instance, we can choose $\sigma$ in order to achieve some structure in $\alpha_{\sigma,\theta}$ that we can make use of in the inversion algorithm.
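As a minimal software sketch of this idea, the code below takes the simplest choice $\sigma = \theta$ with $\theta$ the polynomial basis, builds $\alpha_{\theta,\theta}$, and solves $\alpha_{\theta,\theta}\beta_\theta = 1_\theta$ by Gauss-Jordan elimination over $\mathbb{F}_2$. It is not one of the architectures of the thesis; the field $\mathbb{F}_{16}$ with $p(x) = x^4 + x + 1$ and all function names are assumptions made for illustration.

```python
def gf_mul(a, b, poly, m):
    """Product in F_{2^m}; bit i of an element is the coefficient of x^i."""
    result = 0
    for _ in range(m):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return result


def invert(alpha, poly, m):
    """Solve alpha_{theta,theta} * beta_theta = 1_theta over F_2."""
    if alpha == 0:
        raise ZeroDivisionError("zero has no inverse")
    # Column j of the matrix is the theta-vector of x^j * alpha.
    cols = [gf_mul(alpha, 1 << j, poly, m) for j in range(m)]
    # rows[i]: bits 0..m-1 hold matrix row i, bit m holds the right-hand side.
    rows = []
    for i in range(m):
        row = 0
        for j, col in enumerate(cols):
            row |= ((col >> i) & 1) << j
        if i == 0:
            row |= 1 << m            # 1_theta = (1, 0, ..., 0)^T
        rows.append(row)
    # Gauss-Jordan elimination; a pivot always exists since alpha != 0.
    for j in range(m):
        pivot = next(r for r in range(j, m) if (rows[r] >> j) & 1)
        rows[j], rows[pivot] = rows[pivot], rows[j]
        for r in range(m):
            if r != j and (rows[r] >> j) & 1:
                rows[r] ^= rows[j]
    # The matrix part is now the identity, so bit m of row j is beta_j.
    return sum(((rows[j] >> m) & 1) << j for j in range(m))


POLY, M = 0b10011, 4                 # p(x) = x^4 + x + 1, assumed for this example
print(invert(0b0010, POLY, M))       # 9, i.e. x^3 + 1
```

With $x^4 = x + 1$ we have $x(x^3 + 1) = x^4 + x = 1$, which agrees with the printed result.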
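Definition 11 Let $\{\theta_i\}_{i=0}^{m-1}$ and $\{\sigma_i\}_{i=0}^{m-1}$ be sets of elements from $\mathbb{F}_{q^m}$. If there is a $\beta \in \mathbb{F}_{q^m}^*$ such that $\sigma_i = \beta\theta_i$ holds for all $0 \le i < m$, then $\{\sigma_i\}_{i=0}^{m-1}$ is called a multiple of $\{\theta_i\}_{i=0}^{m-1}$.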
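Obviously, if $\{\theta_j\}_{j=0}^{m-1}$ is a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$, then every multiple of $\{\theta_j\}_{j=0}^{m-1}$ is again a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. By Definition 8, the following also holds. Let $\{\theta_j\}_{j=0}^{m-1}$ be a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and let $\{\theta'_i\}_{i=0}^{m-1}$ be its dual basis. Then $\{\beta^{-1}\theta'_i\}_{i=0}^{m-1}$ is the dual basis of $\{\beta\theta_j\}_{j=0}^{m-1}$, where $\beta$ is any element of $\mathbb{F}_{q^m}^*$.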
We are now ready to derive necessary and sufficient conditions for the matrices to be Hankel.
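Theorem 1 Let $\{\sigma_i\}_{i=0}^{m-1}$ and $\{\theta_j\}_{j=0}^{m-1}$ be bases of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. Define the row vectors $\sigma \triangleq (\sigma_0, \sigma_1, \ldots, \sigma_{m-1})$ and $\theta \triangleq (\theta_0, \theta_1, \ldots, \theta_{m-1})$. The matrices $\alpha_{\sigma,\theta}$ are Hankel for all $\alpha \in \mathbb{F}_{q^m}$ if and only if $\{\theta_j\}_{j=0}^{m-1}$ is a multiple of a polynomial basis and $\{\sigma_i\}_{i=0}^{m-1}$ is a multiple of the dual basis of the same polynomial basis.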
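Proof: First assume that $\alpha_{\sigma,\theta}$ is Hankel for all $\alpha \in \mathbb{F}_{q^m}$. Let $\{\sigma'_i\}_{i=0}^{m-1}$ be the dual basis of $\{\sigma_i\}_{i=0}^{m-1}$. By Lemma 1, the $i,j$-th entry of $\alpha_{\sigma,\theta} = (a_{i,j})_{i,j=0,0}^{m-1,m-1}$ is
$$a_{i,j} = \mathrm{Tr}(\sigma'_i\theta_j\alpha), \quad 0 \le i, j < m.$$
If $\alpha_{\sigma,\theta}$ is Hankel, then we have $a_{i,j} = a_{i+1,j-1}$ for $0 \le i < m-1$ and $0 < j \le m-1$. Since $\mathrm{Tr}(\beta\alpha) = \mathrm{Tr}(\gamma\alpha)$ is true for all $\alpha \in \mathbb{F}_{q^m}$ if and only if $\beta = \gamma \in \mathbb{F}_{q^m}$ holds, we have
$$\sigma'_i\theta_j = \sigma'_{i+1}\theta_{j-1}.$$
We rewrite this in the form
$$\theta_j/\theta_{j-1} = \sigma'_{i+1}/\sigma'_i.$$
This relation must hold for all $i$, $0 \le i < m-1$, and $j$, $0 < j \le m-1$, and therefore we have
$$\theta_j/\theta_{j-1} = \sigma'_{i+1}/\sigma'_i = \varphi$$
for some $\varphi \in \mathbb{F}_{q^m}^*$. Since a multiple of a basis is again a basis, both $\{\sigma'_i\}_{i=0}^{m-1}$ and $\{\theta_j\}_{j=0}^{m-1}$ are multiples of the polynomial basis $\{\varphi^i\}_{i=0}^{m-1}$ of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and hence, $\{\sigma_i\}_{i=0}^{m-1}$ is a multiple of the dual basis of $\{\varphi^i\}_{i=0}^{m-1}$.
Conversely, assuming that $\{\theta_j\}_{j=0}^{m-1}$ is a multiple of a polynomial basis and that $\{\sigma_i\}_{i=0}^{m-1}$ is a multiple of the dual basis of the same polynomial basis, we follow the same steps backwards to prove that $\alpha_{\sigma,\theta}$ is Hankel for all $\alpha \in \mathbb{F}_{q^m}$. $\Box$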
The proof technique used above can be used to provide necessary and sufficient conditions for other structures as well.
Berlekamp [6] noted that the $[\sigma, \theta]$-matrices are Hankel if $\{\theta_j\}_{j=0}^{m-1}$ is a polynomial basis and $\{\sigma_i\}_{i=0}^{m-1}$ is its dual basis. Morii et al. [37] essentially noted the same for the case where $\{\theta_j\}_{j=0}^{m-1}$ is a polynomial basis and $\{\sigma_i\}_{i=0}^{m-1}$ is a multiple of its dual basis, but without mentioning any dual basis. The fact that we can have Hankel matrices implies that we can create algorithms for inversion in finite extension fields by modifying algorithms for solving Hankel problems. We will return to this idea.
Chapter 3
VLSI Considerations
The purpose of this chapter is to derive cost measures that are suitable for an analysis of architectures for VLSI implementations. The cost measures used are derived from a simple model of the MOS transistors, assuming static CMOS implementation of the Boolean gates. We also assume that only one-input and two-input gates are used.
3.1 Static CMOS Gates
XOR gates and AND gates perform addition and multiplication in $\mathbb{F}_2$. Therefore, we are especially interested in these gates. For clarity, we give their standard implementations using static CMOS in Figures 3.1 and 3.2. The assumed binary static master-slave D flip-flop is given in Figure 3.3.
All switches are assumed to be implemented as ordinary CMOS transmission gates using one nMOS transistor and one pMOS transistor in parallel. This switch needs two control signals. We will assume that any $n$-way switch is implemented by combining $n$ transmission gates in the obvious way. This switch needs $2n$ control signals, each connected to the gate of one transistor. However, the two transmission gates in a 2-way switch can be controlled by the same two control signals. Hence, these two control signals are connected to the gates of two transistors each.
Figure 3.1: A static CMOS AND gate. $A$ and $B$ are the inputs and $A \wedge B$ is the output.
Figure 3.2: A static CMOS XOR gate. $A$ and $B$ are the inputs and $A \oplus B$ is the output.
Figure 3.3: A static CMOS D flip-flop with input $D$, output $Q$, and clock signal $\phi$.
3.2 Models Used
We would like to have a model of the transistor that is simple to use, but we still want the model to mirror effects like delay in Boolean gates, area requirement, and power dissipation.
3.2.1 The Delay Model
The delay $T_d$ of a Boolean gate is defined to be the time difference between the 50% level of the input transition and the 50% level of the output transition. Assume that we have a Boolean inverter, whose output is loaded by a capacitance $C_L$, and let the input signal be a step function. Based on analytic reasoning, Weste and Eshraghian [53, Ch 4.5] derive the expression
$$T_d = A\,\frac{C_L}{\beta},$$
where $A$ is a constant given by the supply voltage and the threshold voltage of the transistor and where $\beta$ is the gain factor of the transistor. Now, let $W$ and $L$ denote the width and the length of the transistor. Also, let $\mu$ denote the mobility of the charge carriers, let $\varepsilon$ denote the permittivity of the insulator, and let $t_{ox}$ denote the thickness of the insulator. Then the gain factor is given by
$$\beta = \frac{\mu\varepsilon}{t_{ox}} \cdot \frac{W}{L}.$$
Introducing the process constant
$$K \triangleq A\,\frac{t_{ox}}{\mu\varepsilon},$$
we can express the delay as
$$T_d = K\,\frac{L}{W}\,C_L.$$
There are actually two such process constants, one for each type of transistor. We use the notation $K_n$ for nMOS transistors and $K_p$ for pMOS transistors. The main reason that these constants differ is that the mobilities of electrons and holes differ. Weste and Eshraghian [53, Ch 2.2.1] state that the mobilities of holes and electrons are 180 cm²/Vs and 500 cm²/Vs, respectively. Hence, we have $K_p \approx 2.8K_n$.
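The quoted ratio can be checked directly from the definition of $K$. The short computation below assumes, as a simplification for this check only, that the constant $A$, the oxide thickness and the insulator constant are the same for both transistor types, so that the process constants are inversely proportional to the mobilities.

```latex
\frac{K_p}{K_n} = \frac{\mu_n}{\mu_p}
                = \frac{500\ \mathrm{cm^2/Vs}}{180\ \mathrm{cm^2/Vs}}
                \approx 2.78 \approx 2.8
```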
In a CMOS technology there are minimum sizes of all geometrical measures. Consider a transistor of minimum length $L_0$ and minimum width $W_0$. We will normalize all properties with respect to this minimum size transistor. Therefore we use the normalized width
$$w = \frac{W}{W_0}$$
of a transistor instead of the width $W$. In static CMOS, nMOS transistors are used to make the output low, while pMOS transistors are used to make the output high. Therefore, $K_n$ determines the delay for negative output transitions and $K_p$ determines the delay for positive output transitions. In order to get symmetric behaviour, we assume, unless otherwise stated, that all transistors have length $L_0$, all nMOS transistors have normalized width 1, and all pMOS transistors have normalized width 2.8. These transistors, and Boolean gates built of them, we call unscaled.
In more complex gates the load capacitance may be charged or discharged through transistors connected in series. The delay is then given by
$$T_d = K \sum_i \frac{L_i}{W_i}\,C_L, \tag{3.1}$$
where $L_i$ and $W_i$ are the length and the width, respectively, of the $i$-th transistor. The load capacitance may also be charged or discharged through transistors connected in parallel. The worst case, however, is always given by transistors connected in series. We use the worst case as our estimate of the delay of a given gate. Equation 3.1 can be interpreted as the delay of an RC-link. Therefore we introduce the resistance
$$R = K \sum_i \frac{L_i}{W_i}$$
of the series transistors and the reference resistance
$$R_0 = K_n\,\frac{L_0}{W_0}.$$
We normalize resistances with respect to $R_0$ by introducing the normalized resistance
$$r = \frac{R}{R_0}$$
of a resistance $R$. For the output of an unscaled Boolean gate, this is the maximum number of series transistors connecting the output to $V_{dd}$ or ground. In case the transistor widths of a Boolean gate are scaled by a scaling factor $s$, then the normalized output resistance of that Boolean gate is scaled by $s^{-1}$.
The capacitances in a MOS transistor can be modeled as the three capacitances $C_g$, $C_s$, and $C_d$ connecting the gate, source, and drain, respectively, to ground. Weste and Eshraghian [53, Ch 4.3.4] give an example of an nMOS transistor where $C_g$ is about 4.5 times larger than $C_s$ and $C_d$. The capacitive load of a CMOS gate is then dominated by the input gate capacitances of the succeeding gates. We will therefore only consider the gate capacitances in our transistor model.
The gate capacitance is proportional to $LW$, where $L$ and $W$ are the length and width, respectively, of the transistor gate. Let $C'_0$ be the gate capacitance of a minimum size transistor. Then the gate capacitance of an unscaled nMOS transistor is $C'_0$ and the gate capacitance of an unscaled pMOS transistor is $2.8C'_0$. Since almost all gate outputs in static CMOS are loaded by as many nMOS gate capacitances as pMOS gate capacitances, it is convenient to normalize capacitances with respect to the average gate capacitance. Therefore, we introduce the reference capacitance $C_0 = 1.9C'_0$.
We normalize capacitances with respect to $C_0$ by introducing the normalized capacitance
$$c = \frac{C}{C_0}$$
of a capacitance $C$. For an input of an unscaled Boolean gate, this is simply the number of transistor gates connected to that input. In case the transistor widths in a Boolean gate are scaled by a scaling factor $s$, then the normalized input capacitance of that Boolean gate is also scaled by $s$.
Now we are ready to return to the delay. We wish to normalize delays with respect to the minimum size transistor. Therefore, we introduce the reference delay $T_0 = R_0C_0$, and we normalize delays with respect to $T_0$ by introducing the normalized delay
$$t = \frac{T}{T_0}$$
of a delay $T$. Assume that we have a Boolean gate with normalized output resistance $r$. Let the normalized load capacitance be $c$. Then the normalized delay of that link is $t = rc$. As an example, consider the AND gate in Figure 3.1, realized as a NAND gate followed by a Boolean inverter. The normalized output resistance of the NAND gate is $r = 2$ and the normalized input capacitance of the inverter is $c = 2$. The normalized delay of this link is therefore $t_{int} = rc = 4$. We call $t_{int}$ the normalized internal delay of the AND gate.
Let $t_i$ be the normalized delay of the $i$-th link in a path from the output of a flip-flop to the input of a flip-flop in an architecture to be analyzed. The total normalized delay of that path is simply the sum
$$\sum_i t_i.$$
This includes delays of Boolean gates and of one flip-flop. The path with the largest delay is called the critical path. This limits the clocking frequency $f_c$ to
$$f_c \le (t_{CP}T_0)^{-1},$$
where $t_{CP}$ is the normalized delay of the critical path. Since there may in some cases be more than one critical path, we will speak about "a critical path" or "critical paths" when that is appropriate.
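The normalized delay model is easy to evaluate numerically. The following Python sketch, with hypothetical function names and a purely illustrative value for the reference delay $T_0$, computes the delay of a link as $t = rc$ and the resulting bound on the clock frequency for a path given as a list of normalized link delays.

```python
def link_delay(r: float, c: float) -> float:
    """Normalized delay t = r * c of a link with normalized output
    resistance r and normalized load capacitance c."""
    return r * c


def max_clock_frequency(link_delays, t0_seconds: float) -> float:
    """Upper bound 1 / (t_CP * T_0) on the clock frequency, where t_CP is
    the sum of the normalized link delays on the critical path."""
    t_cp = sum(link_delays)
    return 1.0 / (t_cp * t0_seconds)


# Internal link of the AND gate: NAND output (r = 2) driving an inverter (c = 2).
t_int_and = link_delay(2, 2)

# A made-up path with normalized link delays 10, 4 and 2; the value of T_0
# (10 ps) is an assumption for illustration, not a figure from the thesis.
f_max = max_clock_frequency([10, 4, 2], 1e-11)
```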
So far we have assumed that the input waveform is a step function. A more accurate assumption would have been to assume a ramp. Hedenstierna and Jeppson [24] have given a modified expression of the delay, where a fraction of the input rise- or fall-time is added to our delay. Our limited investigation suggests that this may increase the delay by some 30 percent compared to our simpler model. We will not take this effect into account.
3.2.2 The Size Model
The actual chip area requirement of a certain architecture depends very much on the technology used for the implementation. The area is not only determined by the number of transistors and their sizes. Area is also needed for interconnections. This area depends on how complicated the interconnection network is, and can in some cases be a substantial part of the total area. It would be desirable to include this area in our model for the chip area. Regrettably, that would make our model far too complicated. In order to make the chip area model reasonable, we therefore restrict ourselves to considering the number of transistors and their sizes only.
Let $A'_0$ be the area occupied by a transistor of minimum size, and assume that a transistor of normalized width $w$ occupies the area $wA'_0$. Then the area occupied by an nMOS transistor is $A'_0$ and the area occupied by a pMOS transistor is $2.8A'_0$. We normally have as many nMOS transistors as pMOS transistors in static CMOS. Following the arguments for capacitances above, it is convenient to normalize areas with respect to the average transistor area. Hence, define the reference area $A_0 = 1.9A'_0$. We normalize areas with respect to $A_0$ by introducing the normalized area
$$a = \frac{A}{A_0}$$
of an area $A$. The normalized area occupied by an unscaled Boolean gate is then simply the number of transistors in that gate. In case the transistor widths in a Boolean gate are scaled by a scaling factor $s$, then the normalized area of that Boolean gate is also scaled by $s$.
3.2.3 The Power Dissipation Model
There are several different types of power dissipation in static CMOS. The most obvious ones are listed below.
• Dynamic power dissipation arises when capacitances are charged and discharged. This is normally the dominating part of the total power dissipation.
• Static power dissipation is caused by leakage currents through reverse biased parasitic diodes. For a well designed chip, this is a very small part of the total power dissipation. Weste and Eshraghian [53, Ch. 4.7] state that the static power dissipation for a specific inverter is between 1 nW and 2 nW at the supply voltage $V_{dd} = 5$ V.
• Short circuit power dissipation arises due to the fact that both the nMOS net and the pMOS net of a Boolean gate will be conducting during a short period of time during input signal transitions. This is normally a few per cent of the total power dissipation, as noted by Veendrick [45].
We will only consider dynamic power dissipation since this is the dominating part of the total power dissipation. Consider a network of Boolean gates, with a number of inputs, and only one output. Assume further that the delays of the paths from the inputs to the output differ. Due to this difference in delay, we may have multiple transitions at the output node during a clock period, even if each input only changes its value at most once. Such spurious transitions are called glitches and they cause unwanted power dissipation. Shen et al. [44] have noted that this unwanted power dissipation is typically about 20 per cent of the total power dissipation. In order to simplify the power dissipation model, we will only consider single transitions during one clock period, i.e. the signal changes its value either once or not at all during a clock cycle. However, there is one obvious exception: The clock signal always has exactly two transitions per clock cycle.
A capacitance $C$ charged through a transistor to the voltage $V_{dd}$ holds the energy $CV_{dd}^2/2$. At the same time the energy lost in the transistor is also $CV_{dd}^2/2$. This means that when the capacitor has been charged and discharged, the total consumed energy is $CV_{dd}^2$, and the energy cost of a single transition is $CV_{dd}^2/2$. The power consumed by charging and discharging $C$ with the frequency $f$ is therefore $fCV_{dd}^2$. Let us relate energies and powers to our reference capacitance $C_0$ by defining the reference energy $E_0 = C_0V_{dd}^2/2$ and the reference power $P_0 = f_cC_0V_{dd}^2/2$, where $f_c$ is the clock frequency used. We normalize energies and powers with respect to $E_0$ and $P_0$, respectively, by introducing the normalized energy
$$e = \frac{E}{E_0}$$
of an energy $E$, and the normalized power dissipation
$$p = \frac{P}{P_0}$$
of a power $P$.
3.3 Special Cases
3.3.1 Large Capacitive Loads
We have noted that the normalized delay of a link consisting of a gate output with normalized resistance $r$ and a normalized load capacitance $c$ is $rc$. A consequence of our assumption, that we only use Boolean gates with one or two inputs, is that $r$ is either 1 or 2 for an unscaled Boolean gate. However, $c$ may be large, which could be a problem for us. Luckily, there is a simple method to make the delay substantially smaller than $rc$ for large enough $rc$.
Let $n$ be a positive odd integer and let $s$ be given by $s = (rc/2)^{1/n}$. Consider an inverter chain, consisting of $n-1$ cascaded Boolean inverters, connected between the gate output and the large load capacitance. The $i$-th inverter in the chain is scaled by $s^i/r$, $1 \le i < n$. The normalized delay $t_i$ of the $i$-th link in this chain is therefore given by
$$t_i = \frac{r}{s^{i-1}} \cdot \frac{2s^i}{r} = 2s, \quad 1 \le i \le n,$$
including both the input link ($i = 1$) and the output link ($i = n$) of the chain. Hence, the total normalized delay of this chain is
$$t = \sum_{i=1}^{n} t_i = n \cdot 2s = 2s \log_s\left(\frac{rc}{2}\right).$$
It is fairly easy to show that we have
$$t \ge t' \triangleq 2e \ln\left(\frac{rc}{2}\right).$$
Let us study the quotient $t/t'$. Then we have
$$\frac{t}{t'} = \frac{s}{e \ln s},$$
which only depends on $s$. This quotient reaches its minimum at $s = e$ where we have $t = t'$. Given $rc$, we wish to choose $n$ and $s$ such that $t$ is minimized. Typically, this does not give us $s = e$. Luckily, the choice of $s$ is not crucial. For example, the delay does not increase by more than some 6 percent if we choose $s$ somewhere in the interval $2 \le s \le 4$, compared to the optimum $s = e$.
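The 6 percent figure can be reproduced numerically from the quotient $t/t' = s/(e \ln s)$. The helper name `delay_penalty` below is made up for this sketch.

```python
import math


def delay_penalty(s: float) -> float:
    """Quotient t / t' = s / (e * ln s) for a stage scaling factor s > 1."""
    return s / (math.e * math.log(s))


# The quotient is 1 at s = e and takes the same value at both ends of the
# interval 2 <= s <= 4, because ln 4 = 2 ln 2.
penalty_at_e = delay_penalty(math.e)
penalty_at_2 = delay_penalty(2.0)
penalty_at_4 = delay_penalty(4.0)
```

Since $s/\ln s$ decreases for $s < e$ and increases for $s > e$, the values at the end points $s = 2$ and $s = 4$ are the largest ones on the interval.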
The normalized area requirement of the inverter chain is
$$a = \sum_{i=1}^{n-1} \frac{2s^i}{r} = \frac{2s}{r} \cdot \frac{s^{n-1} - 1}{s - 1}.$$
The quotient $a/c$ for an optimum choice of $n$ approaches $(e-1)^{-1}$ as $c$ tends to infinity.
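The limit can be made explicit with a short computation. It uses only the relation $s^n = rc/2$, i.e. $c = 2s^n/r$, from the definition of $s$, together with the observation that the optimum $s$ tends to $e$ as $n$ grows.

```latex
\frac{a}{c}
  = \frac{2s}{r}\cdot\frac{s^{n-1}-1}{s-1}\cdot\frac{r}{2s^{n}}
  = \frac{1 - s^{1-n}}{s-1}
  \;\longrightarrow\; \frac{1}{e-1} \approx 0.58
  \qquad (n \to \infty,\ s \to e)
```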
The inverter chain described above cannot be used for all possible values of $r$ and $c$. The largest normalized output resistance among the considered gates is 2. A potential problem is when the best choice of $s$ would be smaller than $r$, based on the above reasoning. Then the scaling factor $s/r$ of the first inverter in the inverter chain would be smaller than 1. But the smallest possible scaling factor is 1, otherwise at least the nMOS transistor of that inverter would be smaller than the minimum size transistor. The problem appears for us when we have $r = 2$ and $3^{3/2} < c < 8$. The best choice of $n$ is still 3. In this case, we use an unscaled first inverter, which minimizes the delay of the input link of the inverter chain. The scaling factor of the second inverter is set to $(c/2)^{1/2}$. Then the delay of the input link and the delay of the output link of that inverter are equal, which minimizes the total delay of the inverter chain.
Let $t(r, c)$ and $a(r, c)$ denote the normalized delay and area of an optimum inverter chain, driven by a normalized resistance $r$ and loaded by a normalized capacitance $c$. Let $n^*(r, c)$ be the optimum inverter chain length for the first approach, that is we have
$$n^*(r, c) = \arg\min_{\text{odd } n > 0} \left\{ 2n\left(\frac{rc}{2}\right)^{1/n} \right\},$$
where $\arg\min$ gives us the smallest $n$ that minimizes $2n(rc/2)^{1/n}$. Based on the above reasoning, we have
$$t(r, c) = \begin{cases} rc, & r = 2 \text{ and } c \le 3 + 5^{1/2} \\ 4 + (8c)^{1/2}, & r = 2 \text{ and } 3 + 5^{1/2} < c < 8 \\ 2n^*(r, c)\left(\frac{rc}{2}\right)^{1/n^*(r,c)}, & \text{otherwise} \end{cases} \tag{3.2}$$
and
$$a(r, c) = \begin{cases} 0, & r = 2 \text{ and } c \le 3 + 5^{1/2} \\ 2 + (2c)^{1/2}, & r = 2 \text{ and } 3 + 5^{1/2} < c < 8 \\ \frac{2}{r} \cdot \frac{\frac{rc}{2} - \left(\frac{rc}{2}\right)^{1/n^*(r,c)}}{\left(\frac{rc}{2}\right)^{1/n^*(r,c)} - 1}, & \text{otherwise} \end{cases} \tag{3.3}$$
for $r \in \{1, 2\}$. Whenever we have a situation where $r$ and $c$ are fixed, we use these functions directly. However, when $c$ depends on the dimension of the field, we would prefer simpler expressions. Then we use the values given by the bounds
$$t(r, c) \le 4.4 \log_2\left(\frac{rc}{2}\right), \tag{3.4}$$
$$a(r, c) \le c \tag{3.5}$$
instead, still limited to $r \in \{1, 2\}$, where the bound in Equation 3.4 is only valid for $rc \ge 2\sqrt{3}$.
Let $p(r, c, p_L)$ denote the normalized power dissipation of an optimum inverter chain, driven by a normalized resistance $r$ and loaded by a normalized capacitance $c$ with normalized power dissipation $p_L$. Since the inverter chain is used to amplify a certain signal, we have the same activity throughout the chain, and hence we have
$$p(r, c, p_L) = \frac{p_L}{c}\,a(r, c).$$
The bound on $a(r, c)$ gives us the corresponding bound $p(r, c, p_L) \le p_L$.
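The piecewise expressions for $t(r, c)$ and $a(r, c)$ in Equations 3.2 and 3.3 can be evaluated with a small Python sketch. The function names, the search limit `n_max`, and the explicit handling of the case $n^* = 1$ (where the chain is empty and the area is taken to be 0) are choices made for this example.

```python
import math


def n_star(r: float, c: float, n_max: int = 99) -> int:
    """Smallest odd n > 0 that minimizes 2 n (rc/2)^(1/n), searched up to n_max."""
    x = r * c / 2
    best_n, best_t = 1, 2 * x
    for n in range(3, n_max + 1, 2):
        t = 2 * n * x ** (1 / n)
        if t < best_t:  # strict comparison keeps the smallest n on ties
            best_n, best_t = n, t
    return best_n


def chain_delay(r: float, c: float) -> float:
    """Normalized delay t(r, c) following Equation 3.2."""
    if r == 2 and c <= 3 + math.sqrt(5):
        return r * c
    if r == 2 and c < 8:
        return 4 + math.sqrt(8 * c)
    n = n_star(r, c)
    return 2 * n * (r * c / 2) ** (1 / n)


def chain_area(r: float, c: float) -> float:
    """Normalized area a(r, c) following Equation 3.3."""
    if r == 2 and c <= 3 + math.sqrt(5):
        return 0.0
    if r == 2 and c < 8:
        return 2 + math.sqrt(2 * c)
    n = n_star(r, c)
    if n == 1:
        return 0.0  # no inverters are inserted
    x = r * c / 2
    s = x ** (1 / n)
    return (2 / r) * (x - s) / (s - 1)
```

For a driver with $r = 1$ and a load of $c = 1000$, the sketch selects a chain with $n^* = 7$, and the resulting delay and area respect the bounds in Equations 3.4 and 3.5.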
3.3.2 Adder Trees
We will need to add several elements of $\mathbb{F}_2$. In order to minimize the delay of such an adder, we will assume that this addition is performed by a minimum depth adder tree, where the basic two-input adder is an XOR gate. The depth of this tree is $\lceil \log_2 m \rceil$, where $m$ is the number of signals to add.
The XOR gate has the properties $c_{in} = 4$, $t_{int} = 2$, and $r_{out} = 2$. Because the depth of the adder tree is $\lceil \log_2 m \rceil$, the largest normalized delay in the tree is
$$t = t_{int} + (t_{int} + r_{out}c_{in})(\lceil \log_2 m \rceil - 1) = 2 + 10(\lceil \log_2 m \rceil - 1).$$
We will use the value given by the bound $t < 2 + 10 \log_2 m$.
The number of XOR gates needed in the adder tree is $m - 1$, and the normalized area requirement of each XOR gate is 12. Therefore, we have the normalized area requirement $a = 12(m - 1)$ of an $m$-input $\mathbb{F}_2$ adder.
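The adder tree figures can be tabulated with a few lines of Python. The function names are hypothetical; the depth is computed with integer arithmetic, using the fact that `(m - 1).bit_length()` equals $\lceil \log_2 m \rceil$ for every integer $m \ge 1$.

```python
import math

# Normalized XOR gate properties as stated in the text.
XOR_C_IN, XOR_T_INT, XOR_R_OUT, XOR_AREA = 4, 2, 2, 12


def adder_tree_depth(m: int) -> int:
    """Depth ceil(log2 m) of a minimum depth tree adding m >= 2 signals."""
    return (m - 1).bit_length()


def adder_tree_delay(m: int) -> int:
    """Largest normalized delay t_int + (t_int + r_out c_in)(depth - 1)."""
    return XOR_T_INT + (XOR_T_INT + XOR_R_OUT * XOR_C_IN) * (adder_tree_depth(m) - 1)


def adder_tree_area(m: int) -> int:
    """Normalized area of the m - 1 XOR gates in the tree."""
    return XOR_AREA * (m - 1)
```

For $m = 8$ this gives depth 3, normalized delay 22 and normalized area 84, to be compared with the simplified bound $2 + 10\log_2 8 = 32$.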
3.3.3 Buffers
A binary buffer simply consists of two cascaded inverters, which in some cases can be used to reduce the capacitive load of a gate output. Hence buffers provide a possibility to reduce the delay of architectures. The delay is often regarded as the most important property. We therefore insert buffers in architectures whenever that reduces the delay, and we assume that all buffers consist of unscaled inverters. Buffers for $\mathbb{F}_{2^m}$ are implemented as $m$ parallel binary buffers in the obvious way.
3.3.4 Control Logic and Control Signals
The control logic of an architecture can be implemented in several different ways. As for the architecture itself, there is a tradeoff between time, area, and energy consumption. We do not include the control logic in the area and energy measures. We do, though, include the needed inverter chains for the distribution of control signals.
The critical paths may very well pass through the control logic. The delay of the control logic is therefore included as a generic $t_{ctrl}$, which may be a function of the extension degree of the field.
We assume that the normalized input capacitances of the control logic inputs are all 2. We further assume that the normalized output resistances of the control logic outputs are all 1.
3.4 Summary of Cost Measures
We assume that every $m$-input adder is implemented as an adder tree of depth $\lceil \log_2 m \rceil$ and that any large capacitive load in a critical path is driven by an inverter chain as described in Section 3.3. In particular, we assume that all the clock signals connected to the flip-flops are driven by appropriate inverter chains. Moreover, we assume that buffers are used to reduce capacitive loads whenever that reduces the delay of critical paths. Let $n_{ff}$ be the number of flip-flops in the architecture. Then the total normalized capacitive load of the clock signals is $c_{clock} = 8n_{ff}$ since each flip-flop has 8 clocked transistors.
Let $t_{CP}$ be the normalized delay of the critical path of the architecture, and let $n$ be the number of clock cycles needed for an architecture to perform the calculation it is supposed to do. Our physical model described in Sections 3.2 and 3.3 can be summarized as follows:
Time: The normalized time needed for a calculation is $nt_{CP}$, provided that the architecture is clocked at maximum clock frequency.
Space: The normalized area requirement of an architecture is the sum of the normalized area requirement of the gates, flip-flops, and needed inverter chains in the architecture. This includes the normalized area requirement of the clock signal inverter chains.
Power: The normalized power dissipation $p$ of an architecture is the sum of the normalized power dissipation of the gates, flip-flops, and inverter chains in the architecture, including the normalized power dissipation of the clock signal inverter chains.
Energy: The normalized energy cost of a calculation is $np$.
The normalized properties from Sections 3.2 and 3.3 of the gates used are gathered in Table 3.1.

| Gate | $c_{in}$ | $t_{int}$ | $r_{out}$ | $a$ |
|---|---|---|---|---|
| Boolean inverter | 2 | 0 | 1 | 2 |
| Binary buffer | 2 | 2 | 1 | 4 |
| 2-input NAND gate | 2 | 0 | 2 | 4 |
| 2-input NOR gate | 2 | 0 | 2 | 4 |
| 2-input AND gate | 2 | 4 | 1 | 6 |
| 2-input OR gate | 2 | 4 | 1 | 6 |
| 2-input XOR gate | 4 | 2 | 2 | 12 |
| 2-input XNOR gate | 4 | 2 | 2 | 12 |
| Transmission gate¹ | 1 | 0 | +1 | 2 |
| Inverter chain² | – | $t(r_D, c_L)$ | – | $a(r_D, c_L)$ |
| $m$-input $\mathbb{F}_2$ adder | 4 | $2 + 10\log_2 m$ | 2 | $12(m-1)$ |
| $n$-way switch¹, $n > 2$ | 1 | 0 | +1 | $2n$ |
| 2-way switch¹ | 2 | 0 | +1 | 4 |
| Flip-flop | 2 | 10 | 1 | 16 |

Table 3.1: The normalized input capacitances $c_{in}$, normalized internal delays $t_{int}$, normalized output resistances $r_{out}$, and normalized area requirements $a$ of the building blocks used.
¹ Transmission gate and switches: The input capacitance is only valid for the control signals. Observe that the $n$-way switch has $2n$ control signals while the 2-way switch has only 2 control signals. The transmission gate and all switches add 1 to the normalized output resistance of the previous gate.
² Inverter chain: $c_L$ is the normalized load capacitance and $r_D$ is the normalized driver resistance of the previous gate. $t_{int}$ is here the normalized delay for all links from the previous gate to the load capacitance.
Chapter 4
Polynomial Basis Inverters
Polynomial bases are frequently used for representation of finite extension fields. They provide regular architectures for most arithmetic operations.
Definition 5 (Restated from Section 2.2) Let $\vartheta$ be an element of $\mathbb{F}_{q^m}$ such that $\{\vartheta^i\}_{i=0}^{m-1}$ is a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. Then $\{\vartheta^i\}_{i=0}^{m-1}$ is called a polynomial basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$.
4.1 Inversion Based on the Euclidean Algorithm
Berlekamp [5, pp 21–44] proposes the use of the extended Euclidean algorithm for polynomials for inversion in $\mathbb{F}_{2^m}$. In this section, we analyze a modified architecture based on Berlekamp's inversion algorithm. The advantage of this architecture is that its control logic is much simpler than the control logic of Berlekamp's architecture.
4.1.1 The Architecture
Let $p(x)$ be the irreducible polynomial over $\mathbb{F}_2$ of degree $m$ used to generate a polynomial basis of $\mathbb{F}_{2^m}$. Also, let $a(x)$ be the polynomial representation of $\alpha \in \mathbb{F}_{2^m}$ in this basis. Obviously, since $p(x)$ is irreducible and since $\deg a(x) < \deg p(x)$ holds, $a(x)$ and $p(x)$ are relatively prime. If we initiate the Euclidean algorithm with $p(x)$ and $a(x)$, then the extended algorithm generates two polynomials, $c(x)$ and $d(x)$, with degrees $\deg c(x) < m$ and $\deg d(x) < m - 1$. These polynomials satisfy
$$a(x)c(x) + p(x)d(x) = 1. \tag{4.1}$$
From Equation 4.1, we find that the inverse element $\alpha^{-1}$ has the polynomial representation $c(x)$. Therefore, we can use the extended Euclidean algorithm for inversion in $\mathbb{F}_{2^m}$ using a polynomial basis. The algorithm given by Berlekamp [5, p 41] is reformulated below.
Algorithm 4.1 Inversion Algorithm Based on the Euclidean Algorithm

Initiate upper registers $r_U(x) = p(x)$ and $c_U(x) = 1$. Also initiate lower registers $r_L(x) = a(x)$ and $c_L(x) = 0$. Set $d_U = m$ and $d_L = m - 1$. Also initiate the control bit $k = 0$.

repeat
    if $r_{U,d_U} = 0$ then
        if $d_U = d_L$ then Set $k = k + 1 \bmod 2$ ($\Rightarrow k = 1$)
        Decrement $d_U$ by 1
    else if $r_{L,d_L} = 0$ then
        if $d_U = d_L$ then Set $k = k + 1 \bmod 2$ ($\Rightarrow k = 0$)
        Decrement $d_L$ by 1
    else
        if $k = 0$ then
            Set $r_U(x) = r_U(x) + x^{d_U - d_L} r_L(x)$
            Set $c_L(x) = c_L(x) + x^{d_U - d_L} c_U(x)$
            if $d_U = d_L$ then Set $k = k + 1 \bmod 2$ ($\Rightarrow k = 1$)
            Decrement $d_U$ by 1
        else
            Set $r_L(x) = r_L(x) + x^{d_L - d_U} r_U(x)$
            Set $c_U(x) = c_U(x) + x^{d_L - d_U} c_L(x)$
            if $d_U = d_L$ then Set $k = k + 1 \bmod 2$ ($\Rightarrow k = 0$)
            Decrement $d_L$ by 1
        end if
    end if
until $(d_U, d_L) = (0, 1)$ or $(1, 0)$
if $k = 0$ then The desired inverse is $c_U(x)$.
else The desired inverse is $c_L(x)$.
end if
Figure 4.1: An inverter architecture for $\mathbb{F}_{2^m}$ based on Euclid's algorithm.
In each iteration in Algorithm 4.1 we decrement $d_U$ or $d_L$ by 1, and when the algorithm terminates we have $(d_U, d_L) = (0, 1)$ or $(d_U, d_L) = (1, 0)$. Since $d_U$ and $d_L$ are initially set to $m$ and $m - 1$ respectively, the algorithm needs a total of $2m - 2$ iterations after initialization.
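The register updates of Algorithm 4.1 can be modelled in software as follows. This is a behavioural sketch only, not the architecture of Figure 4.1: polynomials are packed into Python integers (bit $i$ is the coefficient of $x^i$, an assumption of the example), and $m \ge 2$ is assumed. Instead of testing $k$ at the end, the sketch picks the output register from the final pair $(d_U, d_L)$, using the congruences $r_U(x) \equiv c_L(x)a(x)$ and $r_L(x) \equiv c_U(x)a(x) \pmod{p(x)}$, which hold initially and are preserved by every update.

```python
def euclid_invert(a: int, p: int, m: int) -> int:
    """Invert a non-zero element of F_{2^m} following Algorithm 4.1.

    a: element to invert, p: irreducible polynomial of degree m (with its
    x^m term), both packed with bit i as the coefficient of x^i.
    """
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    r_u, c_u = p, 1          # upper registers
    r_l, c_l = a, 0          # lower registers
    d_u, d_l = m, m - 1      # positions of the examined coefficients
    k = 0                    # control bit
    while (d_u, d_l) not in ((0, 1), (1, 0)):
        if not (r_u >> d_u) & 1:
            if d_u == d_l:
                k ^= 1
            d_u -= 1
        elif not (r_l >> d_l) & 1:
            if d_u == d_l:
                k ^= 1
            d_l -= 1
        elif k == 0:
            r_u ^= r_l << (d_u - d_l)
            c_l ^= c_u << (d_u - d_l)
            if d_u == d_l:
                k ^= 1
            d_u -= 1
        else:
            r_l ^= r_u << (d_l - d_u)
            c_u ^= c_l << (d_l - d_u)
            if d_u == d_l:
                k ^= 1
            d_l -= 1
    # At (0, 1) the upper remainder is 1, so c_L holds the inverse;
    # at (1, 0) the lower remainder is 1, so c_U holds the inverse.
    return c_l if d_u == 0 else c_u
```

For instance, with $p(x) = x^3 + x + 1$ the call `euclid_invert(0b100, 0b1011, 3)` inverts $x^2$ in four iterations, in agreement with the count $2m - 2$.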
The degree of $p(x)$ is $m$, while the degrees of all other polynomials appearing in the algorithm do not exceed $m - 1$. Therefore $m + 1$ memory cells suffice for the upper register $r_U(x)$ and $m$ memory cells for each of the other registers. In total, we can store the polynomials using $4m + 1$ memory cells. Berlekamp observed that the inequalities
$$\deg r_U(x) + \deg c_U(x) \le m, \tag{4.2}$$
$$\deg r_L(x) + \deg c_L(x) \le m \tag{4.3}$$
hold throughout the algorithm. He used this fact to reduce the number of memory cells to $2(m + 2)$ by letting spare memory cells from the $r$-registers be allocated by the $c$-registers. However, this solution demands very much control logic, both global logic and especially local logic in each bit-slice.
We instead analyze the algorithm using four registers, one of length $m + 1$ and three of length $m$, thus removing the need for local control signals in each bit-slice. In Figure 4.1 we show an architecture based on Algorithm 4.1. The control logic needs to keep track of $k$, $d_U$, and $d_L$, and generate the control signals R (reset), KU (keep $r_U(x)$), SU (shift $r_U(x)$), AU (add to $r_U(x)$), OU (output $c_U(x)$), KL (keep $r_L(x)$), SL (shift $r_L(x)$), AL (add to $r_L(x)$), and OL (output $c_L(x)$).
4.1.2 Properties of the Architecture
The critical paths start in $r_{U,d_U}$ and $r_{L,d_L}$, pass through the control logic via the global reset signal R, and end in the inputs of the flip-flops. The normalized area and delay of this architecture can easily be found using the methods from Chapter 3. However, special attention is needed for the normalized power.
Let $n_1$ denote the total number of transitions among the flip-flops in the $r$ and $c$ registers in Figure 4.1 during initialization, and let $e_1$ denote the corresponding normalized energy. Initially in Figure 4.1, we set $r_U(x) = p(x)$, $r_L(x) = a(x)$, $c_U(x) = 1$, and $c_L(x) = 0$. Then all flip-flops in $r_L(x)$, $c_U(x)$, and $c_L(x)$ may have transitions, since $a(x)$ may be any nonzero element of the field, and the contents of the $c$ registers may be any nonzero binary vectors of length $m$ after the previous calculation. However, regarding $r_U(x)$ we should note that either one or two of the two leftmost flip-flops are set after the previous calculation. Therefore, the weight $w$ of $p(x)$ bounds the total number of transitions among the flip-flops in $r_U(x)$ at initialization to at most $w + 1$. Thus, we have
$$n_1 \le 3m + w + 1.$$
The output of almost every one of these flip-flops is connected to an input of an XOR gate. There are 8 transistors in the data path of each flip-flop, and 6 transistors of an XOR gate controlled by each flip-flop. In total, we have
$$e_1 \le 14n_1 \le 42m + 14w + 14.$$
A question that arises is whether we can determine or bound the smallest possible $w$ for all $m$. By computer search we have found that there is at least one irreducible trinomial or pentanomial for all degrees $m$ in the interval $2 \le m \le 4000$. Hence, $w$ does not need to be greater than 5. The result of this search is given in Appendix A.
In each clock cycle we decrement either $d_U$ or $d_L$. Consider a clock cycle where $d_L$ is decremented from $d$ to $d - 1$, and let $k_d$ be the number of transitions in the $r$ and $c$ registers in Figure 4.1 during this clock cycle. The inequalities in Equations 4.2 and 4.3 can be used to bound $k_d$. When $d_L$ is decremented, we have $r_{U,d_U} = 1$ and there are two possibilities; $r_{L,d}$ is either 0 or 1. When we have $r_{L,d} = 0$ the following happens.
1. $r_L(x)$ is shifted, yielding transitions in at most $d + 1$ flip-flops in $r_L(x)$.
2. $r_U(x)$ is unchanged.
3. $c_L(x)$ is shifted. Then Equation 4.3 assures that there are transitions in at most $m - d + 2$ flip-flops in $c_L(x)$.
4. $c_U(x)$ is unchanged.
The total number of transitions among the flip-flops in the $r$ and $c$ registers is then at most $m + 3$. If instead $r_{L,d} = 1$ holds, the following happens.
1. A shifted version of $r_U(x)$ is added to a shifted version of $r_L(x)$ and placed in $r_L(x)$, yielding transitions in at most $d + 1$ flip-flops in $r_L(x)$.
2. $r_U(x)$ is unchanged.
3. $c_L(x)$ is shifted. Then Equation 4.3 assures that there are transitions in at most $m - d + 2$ flip-flops in $c_L(x)$.
4. $c_L(x)$ is added to $c_U(x)$. Then Equation 4.3 assures that there are transitions in at most $m - d + 1$ flip-flops in $c_U(x)$.
In this case, the total number of transitions among the flip-flops in the $r$ and $c$ registers is at most $2m - d + 4$. Throughout the algorithm, we have $d \le m$, which implies $2m - d + 4 > m + 3$. Hence, we have $k_d \le 2m - d + 4$. For a clock cycle where instead $d_U$ is decremented from $d$ to $d - 1$ we use the same arguments, based on Equation 4.2 instead, and get the same bound on the number of transitions.
Let $n_2$ denote the total number of transitions among the flip-flops in the r and c registers in Figure 4.1 during calculation after initialization, and let $e_2$ denote the corresponding normalized energy. Initially, we set $(d_U, d_L)$ to $(m, m-1)$ and during the calculation $(d_U, d_L)$ are decremented to either $(1, 0)$ or $(0, 1)$. With up to 14 transistors controlled by each flip-flop we therefore have the bounds

\[ n_2 = k_m + 2\sum_{d=2}^{m-1} k_d + k_1 \le 3m^2 + 4m - 7, \]
\[ e_2 \le 14 n_2 \le 42m^2 + 56m - 98. \]
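The closed form above can be checked numerically. The following is a minimal sketch, assuming the sum runs over $d = 2, \ldots, m-1$ (the range that reproduces the stated closed form) and that every $k_d$ is replaced by its bound $2m-d+4$. The function names are illustrative and not taken from the thesis.

```python
def kd_bound(m: int, d: int) -> int:
    """Bound from the text on the transitions in one clock cycle
    in which a degree counter is decremented from d to d - 1."""
    return 2 * m - d + 4


def n2_bound(m: int) -> int:
    """k_m + 2 * sum_{d=2}^{m-1} k_d + k_1 with each k_d replaced by its bound."""
    middle = sum(kd_bound(m, d) for d in range(2, m))
    return kd_bound(m, m) + 2 * middle + kd_bound(m, 1)


for m in range(2, 66):
    # closed form for the number of transitions
    assert n2_bound(m) == 3 * m * m + 4 * m - 7
    # 14 transistors per flip-flop give the energy bound e2
    assert 14 * n2_bound(m) == 42 * m * m + 56 * m - 98
```

The check covers the range $2 \le m \le 65$ that is used for the plots later in the chapter.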
Let $e_3$ denote the normalized energy corresponding to the transitions in the reset signal R during one calculation. R is 1 during initialization, i.e. during exactly one clock cycle in each calculation. Hence, there are exactly 2 transitions in R. Let $c_3$ denote the total normalized load capacitance of R and the needed inverter chains. Recall that $a(r, c)$ is the area of a delay-optimal inverter chain, driven by a normalized resistance $r$ and loaded by a normalized capacitance $c$, as defined in Equation 3.3 on page 25. Then we have

\[ c_3 = 8m + 2 + a(1, 8m+2) \le 16m + 4, \]
\[ e_3 = 2c_3 \le 32m + 8, \]

using the bound on $a(r, c)$ in Equation 3.5 on page 25.
Let $e_4$ denote the normalized energy corresponding to the transitions in the control signals $O_U$ and $O_L$ during one calculation. These signals are used to choose the output. This choice is made once for each calculation. Therefore we have at most one transition in these signals. Let $c_4$ denote the total normalized load capacitance of $O_U$ and the needed inverter chains for $O_U$. We note that $c_4$ is also the total normalized load capacitance of $O_L$ and the needed inverter chains for $O_L$. Then we have

\[ c_4 = 2m + a(1, 2m) \le 4m, \]
\[ e_4 \le 2c_4 \le 8m, \]

again using the bound on $a(r, c)$ in Equation 3.5 on page 25.
Let $e_5$ denote the normalized energy of all control signals, except $O_U$, $O_L$, and R, during one calculation including initialization. The normalized load capacitances of control signals $K_U$, $S_U$, and $A_U$ are all $4m+2$, while the normalized load capacitances of control signals $K_L$, $S_L$, and $A_L$ are all $4m$. During each of the $2m-1$ clock cycles, there are transitions in at most two of the first three signals and in at most two of the latter three signals. In total, this gives us

\[ e_5 \le \bigl(2(4m+2+4m) + 2a(4m+2) + 2a(4m)\bigr)(2m-1) \le 64m^2 - 16m - 8. \]
The total normalized energy of the considered signals is thus bounded by

\[ \sum_{i=1}^{5} e_i \le 106m^2 + 122m + 14w - 84 \]

for one calculation.
Number of clock cycles (n)    $2m - 1$
Normalized delay (t)          $4.4\log_2(m + \tfrac{1}{4}) + 26.8 + t_{ctrl}$
Normalized time (nt)          $(2m-1)\bigl(4.4\log_2(m + \tfrac{1}{4}) + 26.8 + t_{ctrl}\bigr)$
Normalized area (a)           $192m + 28$
Normalized power (p)          $181m + 119.5 + \frac{14w + 3.5}{2m-1}$
Normalized energy (np)        $362m^2 + 58m + 14w - 116$

Table 4.1: Properties of the Euclidean inverter architecture. $m$ is the extension degree and $w$ is the weight of the irreducible polynomial used to generate the basis. All properties except the number of clock cycles are upper bounds. $t_{ctrl}$ is the normalized delay of the control logic.
The main part of the energy, however, is due to the clock signal. Let $e_6$ denote the normalized energy due to clocking during one calculation. We have $4m+1$ flip-flops with 8 clocked transistors each, which gives us the normalized load capacitance $32m+8$. The number of clock cycles for one calculation is $2m-1$, and we have 2 transitions in each clock cycle, which gives us exactly $4m-2$ transitions for the clock signal during one calculation. This gives us the normalized energy $(32m+8)(4m-2)$ of the clocked parts of the flip-flops, and the normalized energy of the clock signal inverter chains is at most the same value. In total, the normalized energy of one calculation due to the clocking signal is bounded by

\[ e_6 \le 256m^2 - 64m - 32, \]

and we have the expected normalized power and energy of the architecture given in Table 4.1, together with its other properties.
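The energy and power entries of Table 4.1 follow from adding the clock energy to the signal energy and dividing by the number of clock cycles. The sketch below repeats that arithmetic with exact fractions. The function names are illustrative, and the weights tried are only examples.

```python
from fractions import Fraction


def euclid_energy(m: int, w: int) -> int:
    """Bound on the total normalized energy of the Euclidean inverter."""
    signals = 106 * m * m + 122 * m + 14 * w - 84   # bound on e1 + ... + e5
    clock = 256 * m * m - 64 * m - 32               # bound on e6
    return signals + clock


def euclid_power(m: int, w: int) -> Fraction:
    """Energy divided by the 2m - 1 clock cycles of one calculation."""
    return Fraction(euclid_energy(m, w), 2 * m - 1)


for m in range(2, 66):
    for w in (3, 5):
        assert euclid_energy(m, w) == 362 * m * m + 58 * m + 14 * w - 116
        # 181m + 119.5 + (14w + 3.5) / (2m - 1), written with integers only
        expected = 181 * m + Fraction(239, 2) + Fraction(28 * w + 7, 2 * (2 * m - 1))
        assert euclid_power(m, w) == expected
```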
4.2 A Berlekamp-Massey Based Inverter
Several authors have observed and characterized the similarities between the Euclidean algorithm for polynomials and the Berlekamp-Massey algorithm for shift register synthesis. For instance, Reed et al. [42] give an implementation of the Berlekamp-Massey algorithm for finding the error-locator polynomial using continued fractions. The topic is also treated by Cheng [8], Dornstetter [11], Eastman [12], and Welch and Scholtz [52]. Since the extended Euclidean algorithm for polynomials can be used for inversion in finite extension fields, it seems reasonable that there should be a way to use the Berlekamp-Massey algorithm for the same purpose. Hasan [19] also claims that it is possible, but does not give any architecture.
4.2.1 Triangular Bases
The Berlekamp-Massey algorithm essentially performs Gauss elimination on Hankel matrices. Thus, by Theorem 1 on page 13, we need two bases in order to represent our elements as Hankel matrices: one that is a multiple of a polynomial basis, and one that is a multiple of the dual of that basis. The following basis was used for multiplication by Wang and Blake [50] and by Hasan and Bhargava [21]. The basis was introduced by Wang and Blake, and named by Hasan and Bhargava.
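Definition 12 Let $\{\vartheta^i\}_{i=0}^{m-1}$ be a polynomial basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ with $\vartheta$ being a root of the monic irreducible polynomial $p(x) = \sum_{i=0}^{m} p_i x^i$ over $\mathbb{F}_q$. Then $\{\sigma_j\}_{j=0}^{m-1}$ given by $\sigma_j = \sum_{i=0}^{m-1-j} p_{i+j+1}\vartheta^i$ is the triangular basis corresponding to $\{\vartheta^i\}_{i=0}^{m-1}$.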
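To make the definition concrete, the following sketch lists the triangular basis for a small example. It assumes $q = 2$ and the polynomial $p(x) = x^4 + x + 1$, which is a common irreducible polynomial of degree 4 chosen here only for illustration. Each basis element is returned as its coefficient list with respect to $1, \vartheta, \ldots, \vartheta^{m-1}$.

```python
def triangular_basis(p):
    """p[i] is the coefficient p_i of the monic polynomial p(x) of degree m.

    Returns sigma_0, ..., sigma_{m-1}, where sigma_j is the coefficient list
    of sum_{i=0}^{m-1-j} p_{i+j+1} * theta^i, padded with zeros to length m.
    """
    m = len(p) - 1
    assert p[m] == 1, "p(x) must be monic"
    return [[p[i + j + 1] for i in range(m - j)] + [0] * j for j in range(m)]


# p(x) = x^4 + x + 1, i.e. (p_0, ..., p_4) = (1, 1, 0, 0, 1)
for j, sigma_j in enumerate(triangular_basis([1, 1, 0, 0, 1])):
    print(f"sigma_{j} =", sigma_j)
```

For this polynomial the basis is $\sigma_0 = 1 + \vartheta^3$, $\sigma_1 = \vartheta^2$, $\sigma_2 = \vartheta$, and $\sigma_3 = 1$. No reduction modulo $p(x)$ is needed, since every exponent in the definition is below $m$.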
A simple result of Lemma 2 on page 12 is the following.
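Lemma 3 Let $\{\theta_i\}_{i=0}^{m-1}$ and $\{\sigma_i\}_{i=0}^{m-1}$ be bases of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and define the row vectors $\boldsymbol{\theta} \triangleq (\theta_0, \theta_1, \ldots, \theta_{m-1})$ and $\boldsymbol{\sigma} \triangleq (\sigma_0, \sigma_1, \ldots, \sigma_{m-1})$. Then $\boldsymbol{\sigma} = \boldsymbol{\theta} 1_{\theta,\sigma}$ holds.

Proof: By Definition 9 we have $\sigma_i = \boldsymbol{\sigma}(\sigma_i)_\sigma = \boldsymbol{\theta}(\sigma_i)_\theta$, $0 \le i < m$, and by Lemma 2 we have $(\sigma_i)_\theta = 1_{\theta,\sigma}(\sigma_i)_\sigma$, $0 \le i < m$. Combining these two statements, we get $\boldsymbol{\sigma}(\sigma_i)_\sigma = \boldsymbol{\theta} 1_{\theta,\sigma}(\sigma_i)_\sigma$, $0 \le i < m$. This equation system can be identified as $\boldsymbol{\sigma} 1_{\sigma,\sigma} = \boldsymbol{\theta} 1_{\theta,\sigma} 1_{\sigma,\sigma}$, since $(\sigma_i)_\sigma$, $0 \le i < m$, are the columns of $1_{\sigma,\sigma}$. The matrix $1_{\sigma,\sigma}$ is the $m \times m$ unit matrix, which is trivially nonsingular. Therefore we have $\boldsymbol{\sigma} = \boldsymbol{\theta} 1_{\theta,\sigma}$. □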
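The following extension of Lemma 2 is straightforward and is therefore given without proof.

Lemma 4 Let $\{\theta_j\}_{j=0}^{m-1}$, $\{\sigma_i\}_{i=0}^{m-1}$, and $\{\phi_k\}_{k=0}^{m-1}$ be bases of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and define the row vectors $\boldsymbol{\theta} \triangleq (\theta_0, \theta_1, \ldots, \theta_{m-1})$, $\boldsymbol{\sigma} \triangleq (\sigma_0, \sigma_1, \ldots, \sigma_{m-1})$, and $\boldsymbol{\phi} \triangleq (\phi_0, \phi_1, \ldots, \phi_{m-1})$. Let $\alpha$, $\beta$, and $\gamma$ be elements of $\mathbb{F}_{q^m}$. Then $\gamma_{\sigma,\phi} = \alpha_{\sigma,\theta}\beta_{\theta,\phi}$ holds if and only if $\gamma = \alpha\beta$ holds.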
Now we have what we need to prove the following result.
Theorem 2 Let $\{\vartheta^i\}_{i=0}^{m-1}$ be a polynomial basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and let $\{\sigma_i\}_{i=0}^{m-1}$ be its triangular basis. Then $\{\sigma_i\}_{i=0}^{m-1}$ is a multiple of the dual basis of $\{\vartheta^i\}_{i=0}^{m-1}$.
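Proof: Since $\{\vartheta^i\}_{i=0}^{m-1}$ is a polynomial basis, $\vartheta$ is a root of an irreducible polynomial, say $p(x) = \sum_{i=0}^{m} p_i x^i$, over $\mathbb{F}_q$ with $p_m = 1$. Define the row vectors $\boldsymbol{\theta} \triangleq (\vartheta^0, \vartheta^1, \ldots, \vartheta^{m-1})$ and $\boldsymbol{\sigma} \triangleq (\sigma_0, \sigma_1, \ldots, \sigma_{m-1})$. By Theorem 1, it suffices to show that $\alpha_{\sigma,\theta}$ is a Hankel matrix for all $\alpha$ in $\mathbb{F}_{q^m}$.

It is easily shown that

\[ \vartheta_{\theta,\theta} = \begin{pmatrix} 0 & \cdots & 0 & -p_0 \\ 1 & & 0 & -p_1 \\ & \ddots & & \vdots \\ 0 & & 1 & -p_{m-1} \end{pmatrix} \]

holds. By Definition 12 and Lemma 3 we have

\[ 1_{\theta,\sigma} = \begin{pmatrix} p_1 & \cdots & p_{m-1} & p_m \\ \vdots & & p_m & 0 \\ p_{m-1} & p_m & & \vdots \\ p_m & 0 & \cdots & 0 \end{pmatrix}. \]

It is easily checked that $\vartheta_{\sigma,\sigma} = \vartheta_{\theta,\theta}^T$ fulfills the equality $1_{\theta,\sigma}\vartheta_{\sigma,\sigma} = \vartheta_{\theta,\theta}1_{\theta,\sigma}$, given by Lemma 4.

To prove that $\alpha_{\sigma,\theta}$ is a Hankel matrix for all $\alpha \in \mathbb{F}_{q^m}$ we have to examine this matrix closer. From Definition 10 on page 11 and Lemma 2 on page 12, we have

\[ \alpha_{\sigma,\theta} = \left( \vartheta_{\sigma,\sigma}^{0}\alpha_\sigma,\ \vartheta_{\sigma,\sigma}^{1}\alpha_\sigma,\ \ldots,\ \vartheta_{\sigma,\sigma}^{m-1}\alpha_\sigma \right). \]

That is, each column of $\alpha_{\sigma,\theta}$ is the previous column multiplied by $\vartheta_{\sigma,\sigma}$. Since $\vartheta_{\sigma,\sigma}$ has the above form, it is obvious that $\alpha_{\sigma,\theta}$ is a Hankel matrix. □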
It should be mentioned that Hasan and Bhargava [23] showed that $\alpha_{\sigma,\theta}$ is given by the Hankel matrix $\alpha_{\sigma,\theta} = (a_{ij})_{i,j=0,0}^{m-1,m-1}$, $a_{ij} = \mathrm{Tr}(\sigma'_0\vartheta^{i+j}\alpha)$. This observation together with Theorem 1 is enough to prove Theorem 2.
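[Figure 4.2 (diagram not reproduced). Input: $\mathrm{Tr}(\theta'_{m-1-j}\alpha)$ for $0 \le j < m$ and 0 for $m \le j < 2m-1$. Output: $\mathrm{Tr}(\sigma'_0\vartheta^j\alpha)$, $0 \le j < 2m-1$. Feedback taps: $p_{m-1}, p_{m-2}, p_{m-3}, \ldots, p_1, p_0$.]

Figure 4.2: Generation of $\alpha_{\sigma,\theta}$ from $\alpha_\theta$ using a Fibonacci type linear feedback shift register.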
4.2.2 The Architecture
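From now on let us assume that $\{\theta_j\}_{j=0}^{m-1}$ is a polynomial basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$, where the basis elements are given by $\theta_j = \vartheta^j$ and where $\vartheta$ is a root of the irreducible polynomial $p(x) = \sum_{i=0}^{m} p_i x^i$ over $\mathbb{F}_2$ of degree $m$. Let us also assume that $\{\sigma_i\}_{i=0}^{m-1}$ is the triangular basis corresponding to $\{\theta_j\}_{j=0}^{m-1}$.

It is well known that the entries of $\alpha_{\sigma,\theta}$ can be generated from the vector representation

\[ \alpha_\theta = \left( \mathrm{Tr}(\theta'_0\alpha), \mathrm{Tr}(\theta'_1\alpha), \ldots, \mathrm{Tr}(\theta'_{m-1}\alpha) \right)^T \]

of $\alpha$ with respect to $\{\theta_j\}_{j=0}^{m-1}$ using the relations

\[ \mathrm{Tr}(\sigma'_0\vartheta^j\alpha) = \begin{cases} \mathrm{Tr}(\theta'_{m-1}\alpha), & j = 0, \\ \mathrm{Tr}(\theta'_{m-1-j}\alpha) - \sum_{i=m-j}^{m-1} p_i \mathrm{Tr}(\theta'_{m-1}\vartheta^{j+i-m}\alpha), & 0 < j < m, \\ -\sum_{i=0}^{m-1} p_i \mathrm{Tr}(\theta'_{m-1}\vartheta^{j+i-m}\alpha), & m \le j \le 2m-2. \end{cases} \]

This means that we can generate $\alpha_{\sigma,\theta}$ from $\alpha_\theta$ using a Fibonacci-type linear feedback shift register of length $m$, where the feedback network is given by $p(x)$ as shown in Figure 4.2. The flip-flops in the feedback register are initially set to zero.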
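The recurrence above can be sketched in software as follows. This is an illustrative model of the relations only, not of the hardware in Figure 4.2. The name `hankel_sequence` and the list-based representation are assumptions made for the example, and signs are dropped because the arithmetic is over $\mathbb{F}_2$.

```python
def hankel_sequence(a, p):
    """Generate z_j = Tr(sigma'_0 theta^j alpha) for j = 0, ..., 2m - 2.

    a[i] is the i-th coordinate of alpha in the polynomial basis and
    p[i] is the coefficient p_i of the degree-m polynomial p(x), all in F_2.
    """
    m = len(a)
    z = []
    for j in range(2 * m - 1):
        # coordinates of alpha are fed in from the top, followed by zeros
        data_in = a[m - 1 - j] if j < m else 0
        feedback = 0
        for i in range(max(0, m - j), m):
            feedback ^= p[i] & z[j + i - m]
        z.append(data_in ^ feedback)
    return z


# alpha = theta in F_16 with p(x) = x^4 + x + 1 (an example polynomial)
z = hankel_sequence([0, 1, 0, 0], [1, 1, 0, 0, 1])
hankel = [z[i:i + 4] for i in range(4)]   # row i holds z_i, ..., z_{i+3}
print(z)
print(hankel)
```

The $2m-1$ generated values determine the whole $m \times m$ Hankel matrix, since entry $(i, j)$ only depends on $i + j$.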
The Hankel matrix $\alpha_{\sigma,\theta}$ specifies the left hand side of the equation

\[ \alpha_{\sigma,\theta}(\alpha^{-1})_\theta = 1_\sigma \]

which we wish to solve. The Berlekamp-Massey algorithm finds a solution to a Hankel problem with the right hand side being zero. The right hand side of the equation is $1_\sigma = (0, \ldots, 0, 1)^T$, since by Definition 12 we have $\sigma_{m-1} = 1$. Therefore we can run the Berlekamp-Massey algorithm on the first $m-1$ rows of $\alpha_{\sigma,\theta}$. In our inverter, the Berlekamp-Massey algorithm solves equation systems over $\mathbb{F}_2$. Therefore all nonzero discrepancies are 1 and hence we do not need to save them, nor do we need to invert them, as the original algorithm prescribes. Let $h_i$, $1 \le i \le m$, be the $i$-th row of $\alpha_{\sigma,\theta}$. Assume that the Berlekamp-Massey algorithm has generated the resulting column vector $a$. We have already noted that $\alpha_{\sigma,\theta}$ is nonsingular for all nonzero $\alpha$. Therefore the product $h_m a$ is nonzero. Since the only nonzero element of $\mathbb{F}_2$ is 1, we have $(\alpha^{-1})_\theta = a$.
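The equation can be checked exhaustively for a small field. The sketch below builds the Hankel sequence with the shift register recurrence, finds $\alpha^{-1}$ by brute force, and tests that the matrix times $(\alpha^{-1})_\theta$ equals $(0, \ldots, 0, 1)^T$. All function names are illustrative, and field elements are encoded as integers whose bit $i$ is the coefficient of $\vartheta^i$ (an assumption of this example).

```python
def hankel_sequence(a, p):
    """z_j = Tr(sigma'_0 theta^j alpha), j = 0..2m-2, via the LFSR recurrence."""
    m = len(a)
    z = []
    for j in range(2 * m - 1):
        value = a[m - 1 - j] if j < m else 0
        for i in range(max(0, m - j), m):
            value ^= p[i] & z[j + i - m]
        z.append(value)
    return z


def gf_mul(x, y, p_mask, m):
    """Multiply two elements of F_{2^m}; p_mask encodes p(x) including x^m."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if (x >> m) & 1:
            x ^= p_mask
    return result


def to_bits(x, m):
    return [(x >> i) & 1 for i in range(m)]


def hankel_system_holds(p):
    """Check alpha_{sigma,theta} (alpha^{-1})_theta = (0,...,0,1)^T for all alpha != 0."""
    m = len(p) - 1
    p_mask = sum(bit << i for i, bit in enumerate(p))
    target = [0] * (m - 1) + [1]
    for alpha in range(1, 1 << m):
        inverse = next(b for b in range(1, 1 << m)
                       if gf_mul(alpha, b, p_mask, m) == 1)
        z = hankel_sequence(to_bits(alpha, m), p)
        c = to_bits(inverse, m)
        product = [sum(z[i + j] & c[j] for j in range(m)) & 1 for i in range(m)]
        if product != target:
            return False
    return True


print(hankel_system_holds([1, 1, 0, 0, 1]))   # p(x) = x^4 + x + 1
```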
For the analysis, we give the Berlekamp-Massey algorithm here, in a form similar to that given by Feng and Tzeng [14]. However, it is given under the assumption that it is to be used for inversion in $\mathbb{F}_{2^m}$. Therefore, we can assume that the matrix is nonsingular. Define the shift matrix

\[ S \triangleq \begin{pmatrix} 0 & 0 \\ I_{m-1} & 0 \end{pmatrix}, \]

where $I_{m-1}$ is the $(m-1) \times (m-1)$ identity matrix. Since we are interested in binary equation systems only, we state the algorithm for use over $\mathbb{F}_2$.
Algorithm 4.2 The Berlekamp-Massey Algorithm for Inversion in $\mathbb{F}_{2^m}$

Given a sequence $[z_1, \ldots, z_{2m-1}]$ over $\mathbb{F}_2$, associated with a nonsingular $m \times m$ Hankel matrix, $H = (h_{i,j})_{i,j=1,1}^{m,m}$, with rows $h_i = (z_i, \ldots, z_{i+m-1})$, initiate vectors $A = (0, \ldots, 0)^T$ and $B = (1, 0, \ldots, 0)^T$ of length $m$, and initiate integers $r = s = 1$ and $s' = 0$.

    while r < m
        Set ∆ = h_r B
        if ∆ = 0 then
            Increment r                          Continue with the next row.
        else if r < s then
            Set B = B + S^(s−r−1) A              Update vector B.
            Increment r                          Continue with the next row.
        else
            Set s′ = s                           Remember this column number.
            Set s = r + 1                        Consider a new column,
            Set r = s′                           and a new row.
            Set (A, B) = (B, S^(s−s′) B + A)     Update vectors A and B accordingly.
        end if
    end while

The result is B.

The integers $r$ and $s$ are the row number and the column number considered. The third integer $s'$ is the column number of the previous column considered.
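A direct software transcription of Algorithm 4.2 may help when following the analysis. The sketch below is a hedged illustration: it mirrors the steps of the algorithm with lists over $\mathbb{F}_2$, generates the input sequence with the shift register recurrence of Section 4.2.2, and checks the result by field multiplication. The helper names and the integer encoding of field elements are assumptions of this example.

```python
def hankel_sequence(a, p):
    """z_j = Tr(sigma'_0 theta^j alpha), j = 0..2m-2, via the LFSR recurrence."""
    m = len(a)
    z = []
    for j in range(2 * m - 1):
        value = a[m - 1 - j] if j < m else 0
        for i in range(max(0, m - j), m):
            value ^= p[i] & z[j + i - m]
        z.append(value)
    return z


def berlekamp_massey_invert(z, m):
    """Algorithm 4.2 over F_2; z[0] plays the role of z_1."""
    def shift(v, k):
        # S^k v: move every entry k positions towards higher indices
        return [0] * k + v[:m - k]

    def add(u, v):
        return [x ^ y for x, y in zip(u, v)]

    A = [0] * m
    B = [1] + [0] * (m - 1)
    r, s, s_prev = 1, 1, 0
    while r < m:
        row = z[r - 1:r - 1 + m]                          # h_r
        delta = sum(h & b for h, b in zip(row, B)) & 1    # discrepancy
        if delta == 0:
            r += 1
        elif r < s:
            B = add(B, shift(A, s - r - 1))
            r += 1
        else:
            # new column s = r + 1, new row r = old s, remember old s
            s_prev, s, r = s, r + 1, s
            A, B = B, add(shift(B, s - s_prev), A)
    return B


def gf_mul(x, y, p_mask, m):
    """Multiply in F_{2^m}; bit i of an integer is the coefficient of theta^i."""
    result = 0
    while y:
        if y & 1:
            result ^= x
        y >>= 1
        x <<= 1
        if (x >> m) & 1:
            x ^= p_mask
    return result


def invert(alpha, p):
    m = len(p) - 1
    a = [(alpha >> i) & 1 for i in range(m)]
    b = berlekamp_massey_invert(hankel_sequence(a, p), m)
    return sum(bit << i for i, bit in enumerate(b))


p = [1, 1, 0, 0, 1]   # p(x) = x^4 + x + 1, an example polynomial
print(all(gf_mul(x, invert(x, p), 0b10011, 4) == 1 for x in range(1, 16)))
```

The loop only inspects rows $1$ to $m-1$, in line with the observation that the last row merely contributes the nonzero entry of the right hand side.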
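[Figure 4.3 (diagram not reproduced). Input: $a_{m-1-j}$ for $0 \le j < m$ and 0 for $m \le j < 2m-1$. The diagram shows registers A, B, and C, the discrepancy ∆, the feedback taps $p_{m-1}, p_{m-2}, \ldots, p_1, p_0$, and the control signals R, S, and L.]

Figure 4.3: Architecture of an inverter for $\mathbb{F}_{2^m}$ using the Berlekamp-Massey algorithm. The bold lines denote possible critical paths.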
Massey [33] chose to consider only the active parts of the vectors, and by letting A instead be shifted backwards one step during each iteration, he could remove the shift represented by $S^{s-s'}B$ above, and produce a very regular architecture.
In Figure 4.3 we display an inverter architecture based on Algorithm 4.2. The feedback register C in the bottom of the figure generates the entries of $\alpha_{\sigma,\theta}$, i.e. the $z_i$'s in Algorithm 4.2. It is a modified version of the feedback register in Figure 4.2. The switches in the register are used for initiating it to zero, and the flip-flop in the feedback network reduces the length of the critical path. The rest of Figure 4.3 is a modified version of the standard Berlekamp-Massey architecture proposed by Massey [33]. A result of Massey's reconfiguration is that the leftmost flip-flop in register B is always set. Therefore the leftmost bitslice in Figure 4.3 is somewhat simplified. Let $s^*$ be the value of $s$ when Algorithm 4.2 terminates. Another result of Massey's reconfiguration is that the resulting vector in B needs to be shifted $m - s^*$ steps to its correct position after the calculation. This is done by letting the algorithm continue until we have $r + s = 2m + 1$, which can be seen as an extension of the matrix by one more column. During these extra iterations, the contents of register B are moved to register A, and then register A is shifted $m - s^*$ steps. At the same time, the contents of B are updated a number of times, producing nothing of particular interest. Finally, we can read out the result from register A.

The control logic needs to keep track of row number $r$ and column number $s$ in the algorithm, and generate the control signals R (reset), S (shift register A), and L (load register A) accordingly.
4.2.3 Properties of the Architecture
There are two possibilities of critical paths depending on the delay of the control logic. One possibility is indicated by thicker lines in Figure 4.3. The other starts, like the first, at the input of the architecture, enters the control logic through the discrepancy ∆, exits the control logic through the reset signal R, and ends in the flip-flops of register B. The normalized area and delay of this inverter can be found using the methods from Chapter 3. Let $w$ be the weight of $p(x)$. The area needed for the architecture depends on $w$, since the adder in the feedback network of register C has $w - 1$ inputs. As we noted in Section 4.1, $w$ does not need to be greater than 5. As for the inverter based on the Euclidean algorithm, special attention is needed for the normalized energy.
Let $e_1$ denote the normalized energy consumed by the architecture during initialization, excluding clocking and control of the switches. The registers A, B, and C may contain any binary vectors, except the all zero vector, at the end of the previous calculation. They are initiated with all zeros, which means that there may be transitions in all flip-flops. Thus, there may be transitions in any data path in Figure 4.3, and $e_1$ is upper bounded by the normalized area of this part of the architecture, except for the transistors that are not controlled directly by any of the data paths. The transistors that we leave out now are the clocked ones in the flip-flops, and the transistors in the switches. Thus, we have

\[ e_1 \le 60m + 12w + a(1, 2(m-1)) + a(2, 10) - 52 \le 62m + 12w - 47.2, \]

where $a(r, c)$ is the area of a delay-optimal inverter chain, driven by a normalized resistance $r$ and loaded by a normalized capacitance $c$, as defined in Equation 3.3 on page 25, and where we have used the bound on $a(r, c)$ in Equation 3.5, also on page 25.
Let $n_2$ denote the total number of transitions among the flip-flops in the registers A and B in Figure 4.3 during calculation after initialization, and let $e_2$ denote the corresponding normalized energy. Number the clock cycles from 1 to $2m$, where clock cycle 1 is used for initialization. Then, in clock cycle $k$, $1 < k \le 2m$, we have $s + r = k$. We should note that only the $s'$ rightmost flip-flops among the $r$ leftmost flip-flops in register A may be set. Also, only the $s - 1$ leftmost flip-flops in register B may be set. In each iteration, we have three possibilities.

1. $\Delta = 0$. Then register A is shifted, resulting in at most $s' + 1$ transitions.
2. $\Delta = 1$ and $r < s$. Then we get at most $s' + 1$ transitions from the shift of register A. At the same time, the contents of register A are added to register B, resulting in at most $s'$ transitions in register B. In total we have at most $2s' + 1$ transitions in registers A and B.
3. $\Delta = 1$ and $r \ge s$. Then both registers are updated. There are at most $r$ transitions in A and at most $s'$ transitions in B. In total, we have at most $r + s'$ transitions in registers A and B.
From Algorithm 4.2 we find that $s' < s$ and $s' \le r$ hold throughout the algorithm. Using these inequalities in the three cases above, we can deduce that we have no more than $k$ transitions in registers A and B during iteration $k$. Therefore we have

\[ n_2 \le \sum_{k=2}^{2m} k = 2m^2 + m - 1. \]

Each flip-flop in registers A and B controls at most 4 transistors in an AND gate, and 6 transistors in an XOR gate, in addition to the 8 transistors in the data path of each flip-flop. In total, we have at most 18 transistor gates controlled by each flip-flop, which gives us

\[ e_2 \le 18 n_2 \le 36m^2 + 18m - 18. \]
Let $e_3$ denote the normalized energy corresponding to transitions in register C during calculation after initialization. The leftmost flip-flop in C together with the input controls 12 transistors in an XOR gate, 4 transistors in a buffer, and 8 transistors in the data path of the flip-flop. The rest of the flip-flops in C each control 2 transistors in an AND gate, in addition to the 8 transistors in the data path of each flip-flop. The leftmost flip-flop may have transitions during all clock cycles. Let $n_3$ denote the total number of transitions among the rest of the flip-flops in register C during calculation after initialization. Consider clock cycle $k$, $1 < k < m$. Then there may be transitions in the $k - 1$ leftmost flip-flops among the considered ones. During clock cycle $k$, $m \le k \le 2m$, however, there may be transitions in all those $m - 1$ flip-flops. Therefore, we have

\[ n_3 \le \sum_{k=2}^{m-1} (k-1) + \sum_{k=m}^{2m} (m-1) = \frac{3}{2}(m^2 - m), \]
\[ e_3 \le 10 n_3 + 24(2m-1) \le 15m^2 + 33m - 24. \]
Let $e_4$ be the normalized energy consumed in the $(w-1)$-input adder in the feedback shift register, in the $m$-input adder that produces ∆, and in the distribution of ∆ over $m - 1$ AND gates. Let $c_4$ be the total normalized capacitance in these adders, these AND gates, and the distribution over the AND gates. In total, we have

\[ c_4 = 12(w-1) + 12(m-1) + 4 + a(1, 2(m-1)) + 2(m-1) \le 16m + 12w - 28, \]

where we have used the bound on $a(r, c)$ in Equation 3.5 on page 25. One calculation, except initialization, takes $2m - 1$ clock cycles. Therefore, we have the trivial bound

\[ e_4 \le (2m-1)(16m + 12w - 28) = 32m^2 + 24mw - 72m - 12w + 28. \]

Number of clock cycles (n)    $2m$
Normalized delay (t)          $14.4\log_2(m) + 54.9$
Normalized time (nt)          $28.8m\log_2(m) + 109.8m$
Normalized area (a)           $136m + 12w - 72$
Normalized power (p)          $145.5m + 12w + 8.5 - \frac{38.6}{m}$
Normalized energy (np)        $291m^2 + 24mw + 17m - 77.2$

Table 4.2: Properties of the Berlekamp-Massey inverter architecture. $m$ is the extension degree and $w$ is the weight of the irreducible polynomial used to generate the basis. All properties except the number of clock cycles are upper bounds.
Let $e_5$ denote the normalized energy consumed by the global control signals L, S, and R, and by the distribution of these signals. There are exactly 2 transitions in R during one calculation. The normalized capacitive load of R is $10m - 4 + a(1, 10m-4)$, including inverter chains. In L and S, however, there may be transitions in all $2m$ clock cycles. The total normalized capacitive load of these signals is $4m + 2a(1, 2m)$, including inverter chains. Thus, we have

\[ e_5 \le 2\bigl(10m - 4 + a(1, 10m-4)\bigr) + 2m\bigl(4m + 2a(1, 2m)\bigr) \le 16m^2 + 40m - 16, \]

where we again have used the bound on $a(r, c)$ in Equation 3.5 on page 25.
The total normalized energy of the above considered signals is bounded by

\[ \sum_{i=1}^{5} e_i \le 99m^2 + 24mw + 81m - 77.2 \]

for one calculation.
As before, the main part of the energy is due to the clock signal. Let $e_6$ denote the normalized energy due to clocking during one calculation. We have $3m - 1$ flip-flops with 8 clocked transistors each, which gives us the normalized load capacitance $24m - 8$. The number of clock cycles for one calculation is $2m$, and we have 2 transitions in the clock signal in each clock cycle, which gives us exactly $4m$ transitions for the clock signal during one calculation. This gives us the normalized energy $4m(24m - 8)$ of the clocked parts of the flip-flops, and the normalized energy of the clock signal inverter chains is at most the same value. In total, the normalized energy of one calculation due to the clocking signal is bounded by

\[ e_6 \le 192m^2 - 64m, \]

and we have the expected normalized power and energy of the architecture given in Table 4.2, together with its other properties.
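The bookkeeping behind the energy and power entries of Table 4.2 can be repeated mechanically. The sketch below adds the individual bounds stated in this section and compares them with the tabulated expressions, using exact fractions for the decimal constants. The dictionary keys and function name are illustrative.

```python
from fractions import Fraction as F


def bm_energy_bounds(m: int, w: int) -> dict:
    """Upper bounds on e1..e6 for the Berlekamp-Massey inverter, as stated in the text."""
    return {
        "e1": 62 * m + 12 * w - F(236, 5),                 # 47.2 = 236/5
        "e2": 36 * m * m + 18 * m - 18,
        "e3": 15 * m * m + 33 * m - 24,
        "e4": 32 * m * m + 24 * m * w - 72 * m - 12 * w + 28,
        "e5": 16 * m * m + 40 * m - 16,
        "e6": 192 * m * m - 64 * m,                        # clocking
    }


for m in range(2, 66):
    for w in (3, 5):
        e = bm_energy_bounds(m, w)
        signals = sum(e[k] for k in ("e1", "e2", "e3", "e4", "e5"))
        assert signals == 99 * m * m + 24 * m * w + 81 * m - F(386, 5)
        total = signals + e["e6"]
        assert total == 291 * m * m + 24 * m * w + 17 * m - F(386, 5)
        # power = energy / number of clock cycles (2m)
        assert total / (2 * m) == F(291, 2) * m + 12 * w + F(17, 2) - F(193, 5) / m
```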
4.3 Inversion Based on the Gauss-Jordan Algorithm

Any algorithm for solving general linear equation systems can naturally be used for inversion in finite fields. One such algorithm is the Gauss-Jordan algorithm. The version of the algorithm given by Wang and Lin [48][49] is reformulated here.
Algorithm 4.3 The Gauss-Jordan Algorithm

Given an $n \times n$ nonsingular matrix $A = [a_{i,j}]$ and an $n$-dimensional column vector $b = [b_i]$, over $\mathbb{F}_2$, the solution of the equation $Ax = b$ can be found by the following row operations on $(A|b)$.

    for k = 1 to n do
        Let i be the row number of the topmost nonzero element of the k-th column.
        for j = i + 1 to n do
            Add a_{j,k} times row i to row j.
        end for
        Rotate rows i to n such that row i becomes row n.
    end for

The result is in the vector b.
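The row operations of Algorithm 4.3 translate almost line by line into software. The following sketch is an illustration over $\mathbb{F}_2$ using lists of bits. The function name and the example system are assumptions of this example, not part of the cited architectures.

```python
def gauss_jordan_f2(A, b):
    """Solve A x = b over F_2 with the row operations of Algorithm 4.3.

    A is a nonsingular n x n matrix given as a list of rows, b a list of n bits.
    """
    n = len(A)
    rows = [list(row) + [bit] for row, bit in zip(A, b)]   # augmented (A|b)
    for k in range(n):
        # topmost row with a nonzero element in column k
        i = next(r for r in range(n) if rows[r][k])
        pivot = rows[i]
        for j in range(i + 1, n):
            if rows[j][k]:                                  # add a_{j,k} times row i
                rows[j] = [x ^ y for x, y in zip(rows[j], pivot)]
        # rotate rows i..n so that the pivot row becomes the last row
        rows = rows[:i] + rows[i + 1:] + [pivot]
    return [row[n] for row in rows]


# Example: a 4 x 4 Hankel system with right hand side (0, 0, 0, 1)^T
H = [[0, 0, 1, 0],
     [0, 1, 0, 0],
     [1, 0, 0, 1],
     [0, 0, 1, 1]]
print(gauss_jordan_f2(H, [0, 0, 0, 1]))
```

Each rotation moves the used pivot row to the bottom, so the rows that have not yet served as pivots stay at the top. After $n$ steps the pivot rows are back in column order and the last column holds the solution.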
[Figure 4.4 (diagram not reproduced). Panels (a), (b), and (c). Panel (a) shows a triangular array of cells $P_{i,j}$, $1 \le i \le n$, $i \le j \le n+1$, with inputs $a_{1,1}, \ldots, a_{n,n}$ and $b_1, \ldots, b_n$, outputs $x_1, \ldots, x_n$, and a control input C fed with the sequence $1\,0^{n-1}$. The cell diagrams carry the labels T, E, $D_{in}$, and $D_{out}$.]

Figure 4.4: Basic architecture for the Gauss-Jordan algorithm over $\mathbb{F}_2$: (a) overall structure, (b) boundary cell, and (c) main cell. X represents data from previous calculations.
The basic architecture for this algorithm is given in Figure 4.4. The basic cells and the boundary cells are the cells used by Wang and Lin [48][49], with an obvious simplification in the boundary cells and another equally obvious simplification in the main cells. The number of boundary cells is $n$ and the number of main cells is $n(n+1)/2$. The latency of this architecture is $n$ clock cycles and the throughput rate is one result per $n$ clock cycles.
4.3.1 A Systolic Implementation of the Gauss-Jordan Algorithm

The choice of an architecture for any calculation is a tradeoff between space, clocking speed, throughput rate, latency, and regularity. For systolic architectures, this tradeoff is completely in favour of clocking speed and regularity. These sequential architectures are highly pipelined and built up of a regular structure of several copies of a few basic cells. The cells should be small and independent of the size of the architecture. A cell is supposed to be connected only to its nearest neighbours, including control signals. The critical path in a systolic architecture is supposed to be short and independent of the size of the architecture.

The architecture in Figure 4.4 is not systolic since the critical path is not independent of $n$. The architecture has no feedback loops except within the cells, so the signals flow in one direction between the cells. This fact makes it well suited for pipelining in order to minimize the critical path in the architecture. A pipelined version of this architecture is given in Figure 4.5. The latency of this architecture is $3n - 1$ clock cycles and the throughput rate is still one result per $n$ clock cycles. Also the critical path is short and independent of $n$.
4.3.2 Previous Inverters Based on the Gauss-Jordan Algorithm
[Figure 4.5 (diagram not reproduced). A triangular array of cells $P_{i,j}$, $1 \le i \le n$, $i \le j \le n+1$, as in Figure 4.4, with the inputs $a_{i,j}$ and $b_i$ skewed in time and outputs $x_1, \ldots, x_n$.]

Figure 4.5: A pipelined version of the architecture for the Gauss-Jordan algorithm over $\mathbb{F}_2$. X represents data from previous calculations.

Let $p(x) = \sum_{i=0}^{m} p_i x^i$ be the irreducible polynomial of degree $m$ over $\mathbb{F}_2$ used to generate a polynomial basis of $\mathbb{F}_{2^m}$. Also let $a(x) = \sum_{i=0}^{m-1} a_i x^i$ and $c(x) = \sum_{i=0}^{m-1} c_i x^i$ be the polynomial representations of $\alpha \in \mathbb{F}_{2^m}^*$ and its inverse, respectively, in this basis. Then

\[ a(x)c(x) + p(x)d(x) = 1 \tag{4.4} \]

holds for some $d(x) = \sum_{i=0}^{m-2} d_i x^i$ of degree at most $m - 2$. Wang and Lin [48][49] use a systolic implementation of the Gauss-Jordan algorithm to solve the obvious linear equation system based on this polynomial equation. The preprocessor is simple. The major drawback is the size of the equation system. The matrix is of size $(2m-1) \times (2m-1)$, while we have noted in Section 4.2 that we only need an $m \times m$ matrix. The trivial solution would be to use the architecture in Figure 4.5, with $n = 2m - 1$. Wang and Lin noted that it is possible to simplify the architecture in this case, thus removing approximately 25% of the area. However, the architecture is still approximately 3 times as large as if we had used an $m \times m$ matrix instead.
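To see where the size $(2m-1) \times (2m-1)$ comes from, one can write out the coefficient equations of $a(x)c(x) + p(x)d(x) = 1$. The sketch below is one possible way to set up such a system, with the $m$ unknown coefficients of $c(x)$ followed by the $m-1$ unknown coefficients of $d(x)$. It is an illustration under that assumption and is not claimed to match the matrix layout used by Wang and Lin. A plain Gaussian elimination is used as the solver.

```python
def bezout_system(a, p):
    """Build M and rhs such that M (c_0..c_{m-1}, d_0..d_{m-2})^T = rhs over F_2.

    Row k collects the coefficient of x^k in a(x)c(x) + p(x)d(x), k = 0..2m-2.
    """
    m = len(a)
    n = 2 * m - 1
    M = [[0] * n for _ in range(n)]
    for k in range(n):
        for i in range(m):                  # columns of the unknowns c_i
            if 0 <= k - i < m:
                M[k][i] = a[k - i]
        for i in range(m - 1):              # columns of the unknowns d_i
            if 0 <= k - i <= m:
                M[k][m + i] = p[k - i]
    return M, [1] + [0] * (n - 1)


def solve_f2(M, rhs):
    """Plain Gaussian elimination over F_2; assumes M is nonsingular."""
    n = len(M)
    rows = [list(r) + [v] for r, v in zip(M, rhs)]
    for k in range(n):
        piv = next(r for r in range(k, n) if rows[r][k])
        rows[k], rows[piv] = rows[piv], rows[k]
        for r in range(n):
            if r != k and rows[r][k]:
                rows[r] = [x ^ y for x, y in zip(rows[r], rows[k])]
    return [row[n] for row in rows]


# a(x) = x in F_16 with p(x) = x^4 + x + 1 (an example polynomial)
M, rhs = bezout_system([0, 1, 0, 0], [1, 1, 0, 0, 1])
solution = solve_f2(M, rhs)
print(len(M), solution[:4])   # matrix size 2m - 1 and the coefficients of c(x)
```

For $m = 4$ the system has 7 unknowns, compared with 4 for the Hankel formulation of Section 4.2.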
The second architecture is proposed by Hasan and Bhargava [22]. As in Section 4.2 let us assume that $\{\theta_j\}_{j=0}^{m-1}$ is a polynomial basis, where the basis elements are given by $\theta_j = \vartheta^j$, and where $\vartheta$ is a root of the irreducible polynomial $p(x) = \sum_{i=0}^{m} p_i x^i$ over $\mathbb{F}_2$ of degree $m$. Let $\alpha$ be an element of $\mathbb{F}_{2^m}^*$. This inverter uses a preprocessor that generates $\alpha_{\theta,\theta}$ from $\alpha_\theta$. Then the Gauss-Jordan algorithm is used to solve the equation $\alpha_{\theta,\theta}(\alpha^{-1})_\theta = 1_\theta$ with $n = m$. The preprocessor of the Hasan and Bhargava [22] inverter is given in Figure 4.6. By using the $m \times m$ matrix $\alpha_{\theta,\theta}$, Hasan and Bhargava address the problem of the size of the Wang and Lin architecture. However, there is a disadvantage of their preprocessor, namely the triangular shift register layer in the top used to feed the matrix entries to the systolic implementation of the Gauss-Jordan algorithm. The area of this preprocessor is therefore essentially linear in $m^2$.
Using the methods from Chapter 3 we find that this architecture has the properties in Table 4.3. As before, special care is needed for the control signals. The architecture is fed with new data every $m$ clock cycles. The control signals, both of the Gauss-Jordan architecture in Figure 4.5 and of the preprocessor in Figure 4.6, have two transitions each in total during these $m$ clock cycles. All other signals may have transitions during every clock cycle. For brevity, we omit the detailed derivation of the power consumption. We use the same approach as in Sections 4.1 and 4.2.
[Figure 4.6 (diagram not reproduced). Panels (a) and (b). Labels: $m$ signals $b_0, \ldots, b_{m-1}$; shift register layers of $m-1$, $m-2$, and $2m-1$ flip-flops; inputs $a_{m-1}, \ldots, a_0$, the sequence $1, 0^{m-1}$, and $p_{m-1}, \ldots, p_0$; cell ports $a_{in}$, $C_{in}$, $p_{in}$, $a_{out}$, $C_{out}$, $p_{out}$.]

Figure 4.6: The preprocessor of the systolic inverter for $\mathbb{F}_{2^m}$ using the Gauss-Jordan algorithm due to Hasan and Bhargava [22].

Number of clock cycles (n)    $m$
Normalized delay (t)          $30$
Normalized time (nt)          $30m$
Normalized area (a)           $73m^2 + 271m - 194$
Normalized power (p)          $113m^2 + 439m + 36w - 272 - 52wm^{-1} - 84m^{-1}$
Normalized energy (np)        $113m^3 + 439m^2 + 36wm - 272m - 52w - 84$

Table 4.3: Properties of the systolic Gauss-Jordan inverter architecture due to Hasan and Bhargava. $m$ is the extension degree and $w$ is the weight of the irreducible polynomial used to generate the basis. All properties except the number of clock cycles are upper bounds.

4.3.3 A New Preprocessor

We continue to let $\{\theta_j\}_{j=0}^{m-1}$ be a polynomial basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$, where the basis elements are given by $\theta_j = \vartheta^j$, and where $\vartheta$ is a root of the irreducible polynomial $p(x) = \sum_{i=0}^{m} p_i x^i$ over $\mathbb{F}_2$ of degree $m$. As in Section 4.2, let us also assume that $\{\sigma_i\}_{i=0}^{m-1}$ is the triangular basis corresponding to $\{\theta_j\}_{j=0}^{m-1}$. Again, let $\alpha$ be an element of $\mathbb{F}_{2^m}^*$, and assume that $\alpha$ is represented as $\alpha_\theta$. We noted in Section 4.2 that $\alpha_{\sigma,\theta}$ can be generated from $\alpha_\theta$ using the Fibonacci type linear feedback shift register in Figure 4.2 on page 38. Finally, we can find the inverse of $\alpha$ by solving the equation system $\alpha_{\sigma,\theta}(\alpha^{-1})_\theta = 1_\sigma$.
Let $w$ be the weight of $p(x)$. An obvious drawback of the Fibonacci type feedback shift register is that the $w$-input $\mathbb{F}_2$-adder in the feedback network will determine the critical path of the inverter. The delay of the critical path is then essentially linear in $\log_2 w$. Also, the regularity of this feedback shift register is poor. The critical path of this shift register can be reduced by rearranging the memory cells as in Figure 4.7. This rearrangement also makes the feedback shift register much more regular. In this modified shift register no signal passes more than two $\mathbb{F}_2$-adders and one $\mathbb{F}_2$-multiplier.

The preprocessor shall not only generate the entries of the matrix. It shall also distribute these entries correctly. A possible feeding order is given in Table 4.4. The systolic implementation of the Gauss-Jordan algorithm in Figure 4.5 on page 48 is designed to accept new data every $m$ time instances. Therefore we should apply new data in the same manner starting at time instance $m$. While one part of the architecture is doing the last computations for one input, the other part of the architecture is doing the first computations for the next input.
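The skewed feeding order can be generated programmatically. The sketch below reproduces the pattern of Table 4.4 under an assumption read off from that table: column $c$ receives matrix row $t - c$ at time $t$, the entry has exponent $(t - c) + (m - 1 - c)$, and the right hand side is fed one step after the last column. The function name and the use of `None` for idle inputs are illustrative.

```python
def feeding_schedule(m):
    """Return a list of (time, exponents, rhs) for one input.

    exponents[c] is the exponent k of the entry Tr(sigma'_0 theta^k alpha)
    fed to column c at that time, or None if the column is idle.
    rhs is the right hand side bit fed at that time, or None.
    """
    schedule = []
    for t in range(2 * m):
        exponents = []
        for c in range(m):
            row = t - c                      # column c lags c steps behind column 0
            exponents.append(row + (m - 1 - c) if 0 <= row < m else None)
        row = t - m                          # the right hand side lags m steps
        rhs = (1 if row == m - 1 else 0) if 0 <= row < m else None
        schedule.append((t, exponents, rhs))
    return schedule


for t, exponents, rhs in feeding_schedule(4):
    print(t, exponents, rhs)
```

For $m = 4$ the first four time instances feed the exponents 3, 4, 5, 6 to column 0, and the single 1 of the right hand side arrives at time instance 7.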
Consider the preprocessor in Figure 4.8. It consists of two modified Fibonacci type linear feedback shift registers to deal with the need of processing two different inputs at the same time. These are built by the cells in Figure 4.8b and one of the cells in Figure 4.8c-f. The cells in Figure 4.8g are used to shift the result further. The choice between the outputs of the two feedback shift registers is performed by the switches and the uppermost shift register in Figure 4.8a. Together with each feedback shift register there is an additional shift register with input labelled $C_{in}$ and output labelled $C_{out}$ in the cells. These shift registers are used together with the additional $\mathbb{F}_2$-multipliers to reset the feedback shift registers after a completed calculation.
Using the methods from Chapter 3 we find that this architecture has the properties in Table 4.5. As before, special care is needed for the control signals. The architecture is fed with new data every $m$ clock cycles. The control signal in the Gauss-Jordan architecture in Figure 4.5 on page 48 still has two transitions in total during these $m$ clock cycles. The control signals in our preprocessor in Figure 4.8, however, have two transitions each in total during $2m$ clock cycles. All other signals may have transitions during every clock cycle. Again for brevity, we omit the detailed derivation of the power consumption.

[Figure 4.7 (diagram not reproduced). Input: $\mathrm{Tr}(\theta'_{m-1-j}\alpha)$ for $0 \le j < m$ and 0 for $m \le j < 2m-1$. Output: $\mathrm{Tr}(\sigma'_0\vartheta^j\alpha)$, $0 \le j < 2m-1$. Feedback taps: $p_{m-1}, p_{m-2}, p_{m-3}, \ldots, p_1, p_0$.]

Figure 4.7: Generation of $\alpha_{\sigma,\theta}$ from $\alpha_\theta$ using a modified Fibonacci type linear feedback shift register with a short critical path.

In the table below, $T_k$ abbreviates $\mathrm{Tr}(\sigma'_0\vartheta^k\alpha)$.

Time instance    Column 0    Column 1    Column 2    Column 3    Right hand side
0                $T_3$
1                $T_4$       $T_2$
2                $T_5$       $T_3$       $T_1$
3                $T_6$       $T_4$       $T_2$       $T_0$
4                            $T_5$       $T_3$       $T_1$       0
5                                        $T_4$       $T_2$       0
6                                                    $T_3$       0
7                                                                1

Table 4.4: The order of the input signals to the systolic implementation of the Gauss-Jordan algorithm for inversion in $\mathbb{F}_{16}$ using our preprocessor.

Number of clock cycles (n)    $m$
Normalized delay (t)          $40$
Normalized time (nt)          $40m$
Normalized area (a)           $61m^2 + 351m + 24w - 216$
Normalized power (p)          $93m^2 + 503m + 24w - 280 - 68m^{-1}$
Normalized energy (np)        $93m^3 + 503m^2 + 24mw - 280m - 68$

Table 4.5: Properties of our systolic Gauss-Jordan inverter architecture. $m$ is the extension degree and $w$ is the weight of the irreducible polynomial used to generate the basis. All properties except the number of clock cycles are upper bounds.

[Figure 4.8 (diagram not reproduced). Panels (a) to (g). Panel (a) shows $m$ output signals, a shift register of $2m-1$ flip-flops, two rows of cells built from left cells (b), one mid cell of type (c) to (f), and right cells (g), with $\lceil m/4 \rceil - 1$ and $m - \lceil m/4 \rceil - 1$ copies indicated, and the input $a_{m-1}, \ldots, a_0$. The cells carry the ports $C_{in}$, $C_{out}$, $s_{in}$, $s_{out}$, $f_{in}$, $f_{out}$ and the taps $p_{m-1-4i}, p_{m-2-4i}, p_{m-3-4i}, p_{m-4-4i}$ and $p_3, \ldots, p_0$.]

Figure 4.8: The preprocessor of our systolic architecture of an inverter for $\mathbb{F}_{2^m}$ using the Gauss-Jordan algorithm: (a) overall structure, (b) left cells, (c) mid cells for $m \equiv 0 \bmod 4$, (d) mid cells for $m \equiv 3 \bmod 4$, (e) mid cells for $m \equiv 2 \bmod 4$, (f) mid cells for $m \equiv 1 \bmod 4$, and (g) right cells.
4.4 Properties of the Polynomial Basis Inverters
The architectures in this chapter can be divided into two categories. The architectures in Sections 4.1 and 4.2, based on Euclid's and Berlekamp-Massey's algorithms, both essentially solve Hankel problems. The architectures in Section 4.3 are based on the Gauss-Jordan algorithm, which can solve arbitrary systems of linear equations. In Figures 4.9 through 4.11 we have plotted the normalized time, area, and energy, respectively, needed for the architectures considered in this chapter, for $2 \le m \le 65$. These plots are based on Tables 4.1, 4.2, 4.3, and 4.5, on pages 35, 44, 50 and 52, respectively. We have neglected the delay of the control logic in Figure 4.9.
The first two architectures have the same orders of all our cost measures. However, by comparing Tables 4.1 and 4.2 on pages 35 and 44, respectively, or by studying Figures 4.9 through 4.11, we see that these architectures have different application areas. We should prefer the inverter based on the Euclidean algorithm if the most important property is time. The critical path of the inverter based on the Berlekamp-Massey algorithm is much longer than that of the inverter based on the Euclidean algorithm because the adder tree in the Berlekamp-Massey inverter adds about $10 \log_2 m$ to the critical path. On the other hand, we should prefer the Berlekamp-Massey inverter if the most important property is area or energy consumption.
The two architectures in Section 4.3 both use a systolic implementation of the Gauss-Jordan algorithm. The difference between those architectures lies entirely in the preprocessors. By comparing Tables 4.3 and 4.5 on pages 50 and 52, respectively, or by studying Figures 4.9 through 4.11, we see that these architectures also have different application areas. We should prefer Hasan and Bhargava's inverter if the most important property is time. On the other hand, we should prefer our inverter if the most important property is area or power dissipation for $m \ge 4$.
[Plot "Polynomial basis inverters": $nt/m$ versus $m$, with curves EUC, BM, GJ-1, and GJ-2.]
Figure 4.9: Normalized time needed for inversion in $\mathbb{F}_{2^m}$ using polynomial bases and the architectures considered in this chapter.
EUC: Inversion based on the Euclidean algorithm.
BM: Inversion based on the Berlekamp-Massey algorithm.
GJ-1: Inversion based on the Gauss-Jordan algorithm using Hasan and Bhargava's preprocessor.
GJ-2: Inversion based on the Gauss-Jordan algorithm using our preprocessor.
[Plot "Polynomial basis inverters": $a/m^2$ versus $m$, with curves EUC, BM, GJ-1, and GJ-2.]
Figure 4.10: Normalized area needed for inversion in $\mathbb{F}_{2^m}$ using polynomial bases and the architectures considered in this chapter.
EUC: Inversion based on the Euclidean algorithm.
BM: Inversion based on the Berlekamp-Massey algorithm.
GJ-1: Inversion based on the Gauss-Jordan algorithm using Hasan and Bhargava's preprocessor.
GJ-2: Inversion based on the Gauss-Jordan algorithm using our preprocessor.
[Plot "Polynomial basis inverters": $np/m^3$ versus $m$, with curves EUC, BM, GJ-1, and GJ-2.]
Figure 4.11: Normalized energy needed for inversion in $\mathbb{F}_{2^m}$ using polynomial bases and the architectures considered in this chapter.
EUC: Inversion based on the Euclidean algorithm.
BM: Inversion based on the Berlekamp-Massey algorithm.
GJ-1: Inversion based on the Gauss-Jordan algorithm using Hasan and Bhargava's preprocessor.
GJ-2: Inversion based on the Gauss-Jordan algorithm using our preprocessor.
Among the four architectures considered in this chapter, the fastest one is Hasan and Bhargava's inverter based on the Gauss-Jordan algorithm, while the smallest and least power consuming one is the inverter based on the Berlekamp-Massey algorithm. Both the inverter based on the Euclidean algorithm and Hasan and Bhargava's inverter based on the Gauss-Jordan algorithm are independent of $p(x)$, the irreducible polynomial used to generate the polynomial basis. They are both fed with $p(x)$ at each calculation. Our inverters, both the one based on the Berlekamp-Massey algorithm and the one based on the Gauss-Jordan algorithm, have $p(x)$ hardwired in the architecture. This fact makes our inverters less flexible, but also easier to use, compared to the other two inverters.
Chapter 5
Normal Basis Inverters
An interesting property that can be utilized when deriving architectures for arithmetic operations in $\mathbb{F}_{2^m}$ is that squaring is a simple cyclic shift if the element is represented using a normal basis over $\mathbb{F}_2$. The Massey-Omura [32] multiplier is one example where this property is used.
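Definition 6 (Restated from Section 2.2) Let $\vartheta$ be an element of $\mathbb{F}_{q^m}$ such that $\{\vartheta^{q^i}\}_{i=0}^{m-1}$ is a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. Then $\{\vartheta^{q^i}\}_{i=0}^{m-1}$ is called a normal basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$.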
There is an important difference between polynomial bases and normal bases from an implementation point of view. The choice of a normal basis is far more crucial than the choice of a polynomial basis. For instance, the size of a Massey-Omura multiplier for $\mathbb{F}_{2^m}$ varies heavily with the choice of basis.
5.1 All-One Polynomials
Irreducible all-one polynomials are of special interest for arithmetic using normal bases, since they provide efficient architectures, at least for multiplication. An all-one polynomial of degree $m$ is a polynomial of the form
\[ f(x) = \sum_{i=0}^{m} x^i. \]
It is fairly easy to show that the roots of an irreducible all-one polynomial form a normal basis.
Irreducible all-one polynomials do exist, but not for all degrees. There is a well known theorem stating sufficient conditions for irreducibility of all-one polynomials, namely if $e$ and $p$ are primes such that $p$ is a primitive element of GF($e$), then the all-one polynomial of degree $e-1$ is irreducible over GF($p$). See for instance Dickson [10, p. 21, Th 33] or Mastrovito [34, p. 32, Th 2.27]. Actually, these conditions are necessary as well, as we state in the following theorem. We have not been able to find this sharper version of the theorem anywhere in the literature. For completeness, we give a full proof of the theorem.
Theorem 3 Let $e$ be an integer, $e > 2$, and let $p$ be a prime. Then the all-one polynomial of degree $e-1$ is irreducible over $\mathbb{F}_p$ if and only if $e$ is a prime and $p$ is a primitive element of $\mathbb{F}_e$.
Proof: Let $f(x)$ be the all-one polynomial of degree $e-1$. First assume that $e$ is a prime and $p$ is a primitive element of $\mathbb{F}_e$. Hence, $e$ and $p$ are relatively prime. Specifically, $p$ does not divide $e$. Let $\vartheta$ be a root of $f(x)$. We have $\vartheta^e = 1$ since $f(x)$ divides $x^e - 1$. There is therefore a natural interpretation of exponents of $\vartheta$ as elements of $\mathbb{F}_e$. Consider the elements $\vartheta^{p^i}$, $0 \le i < e-1$. Since $p$ is a primitive element of $\mathbb{F}_e$, we have $\{p^i\}_{i=0}^{e-2} = \{i\}_{i=1}^{e-1}$ in $\mathbb{F}_e$, and therefore we also have
\[ \{\vartheta^{p^i}\}_{i=0}^{e-2} = \{\vartheta^{i}\}_{i=1}^{e-1}. \]
Since $\vartheta^e = 1$ holds, we know that the order of $\vartheta$ divides $e$, but $e$ is a prime and hence, the order of $\vartheta$ is either 1 or $e$. If the order of $\vartheta$ is 1, then we have $\vartheta = 1$. However, we have $f(1) = e$ and $p \nmid e$. Consequently, $f(1)$ is nonzero in $\mathbb{F}_p$. Therefore, we conclude that the order of $\vartheta$ is $e$ and that the elements $\vartheta^{p^i}$, $0 \le i < e-1$, are distinct and so, $f(x)$ is irreducible over $\mathbb{F}_p$.
Conversely, assume that $f(x)$ is irreducible over $\mathbb{F}_p$. Consider the polynomial $x^e - 1 = (x-1)f(x)$. Let $a$ be a divisor of $e$. Then $x^a - 1$ is a divisor of $x^e - 1$. The only divisors of $x^e - 1$ are $x^e - 1$, $f(x)$, and $x - 1$, and hence, the only possibility is that we either have $a = e$ or $a = 1$. It follows that $e$ is a prime.
Let $\vartheta$ be a root of $f(x)$. Then $\vartheta^{p^i}$, $0 \le i < e-1$, are roots of $f(x)$. Since $f(x)$ is irreducible, those roots are distinct and there are no other roots of $f(x)$. We have already noted that there is a natural interpretation of exponents of $\vartheta$ as elements of $\mathbb{F}_e$. The exponents $p^i$, $0 \le i < e-1$, are distinct in $\mathbb{F}_e$ since $\vartheta^{p^i}$, $0 \le i < e-1$, are distinct. Furthermore, since the number of elements is exactly the same as the number of nonzero elements of $\mathbb{F}_e$, $p$ is a primitive element of $\mathbb{F}_e$. This last conclusion can be made if $p$ is in $\{p^i\}_{i=0}^{e-2}$, which is the case since we have $e > 2$. $\Box$
The first cases of irreducible all-one polynomials over $\mathbb{F}_2$ are those with the degrees
2, 4, 10, 12, 18, 28, 36, 52, 58, 60, 66, 82, and 100.
This fact is a simple consequence of Theorem 3.
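The list of degrees can be reproduced directly from the criterion in Theorem 3. The following is a minimal sketch, not part of the original text; the function names are chosen here for illustration only. It enumerates the integers $e$ that are prime and for which $p = 2$ has multiplicative order $e - 1$ modulo $e$, and reports the degrees $e - 1$.

```python
def is_prime(n):
    """Trial-division primality test, adequate for small n."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def multiplicative_order(a, e):
    """Order of a modulo the prime e (a must not be divisible by e)."""
    order, power = 1, a % e
    while power != 1:
        power = (power * a) % e
        order += 1
    return order


def irreducible_all_one_degrees(max_degree, p=2):
    """Degrees e - 1 for which Theorem 3 gives irreducibility over F_p."""
    degrees = []
    for e in range(3, max_degree + 2):
        # p is a primitive element of F_e exactly when its order is e - 1.
        if is_prime(e) and e != p and multiplicative_order(p, e) == e - 1:
            degrees.append(e - 1)
    return degrees


print(irreducible_all_one_degrees(100))
# [2, 4, 10, 12, 18, 28, 36, 52, 58, 60, 66, 82, 100]
```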
5.2 The Massey-Omura Multiplier
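Let $\vartheta$ be an element in $\mathbb{F}_{2^m}$ generating a normal basis over $\mathbb{F}_2$, and let $\vartheta'$ be the element generating the dual basis of that normal basis for which we have $\mathrm{Tr}(\vartheta\vartheta') = 1$. Define the vector
\[ \phi \triangleq \left(\vartheta^{2^0}, \vartheta^{2^1}, \ldots, \vartheta^{2^{m-1}}\right). \]
Let $\alpha$, $\beta$, and $\gamma$ be any elements of $\mathbb{F}_{2^m}$ satisfying $\alpha\beta = \gamma$, with $\phi$-vectors
\[ \alpha_\phi = (a_0, a_1, \ldots, a_{m-1})^T, \]
\[ \beta_\phi = (b_0, b_1, \ldots, b_{m-1})^T, \]
\[ \gamma_\phi = (c_0, c_1, \ldots, c_{m-1})^T, \]
respectively. Define the shift matrix
\[ S \triangleq \begin{pmatrix} 0 & 1 \\ I_{m-1} & 0 \end{pmatrix}, \]
where $I_{m-1}$ is the $(m-1) \times (m-1)$ identity matrix. There is a function $f_\phi$ satisfying
\[ c_{m-1} = f_\phi(\alpha_\phi, \beta_\phi). \]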
Squaring in $\mathbb{F}_{2^m}$ is a cyclic shift if the field is represented using a normal basis over $\mathbb{F}_2$. Therefore, we have
\[ c_{m-1-i} = f_\phi(S^i\alpha_\phi, S^i\beta_\phi), \quad 0 \le i < m. \]
Hence, we can use the same function repeatedly to calculate all coefficients in $\gamma_\phi$, either sequentially or in parallel. This multiplier is known as the Massey-Omura [32] multiplier.
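Let us study $f_\phi$ more closely. By repeatedly using Definition 9 on page 11, the assumption $\gamma = \alpha\beta$ yields
\[ \phi\gamma_\phi = \sum_{i=0}^{m-1}\sum_{j=0}^{m-1} a_i b_j\, \phi \cdot \left(\vartheta^{2^i+2^j}\right)_\phi. \]
The last coefficient of the vector $\left(\vartheta^{2^i+2^j}\right)_\phi$ is $\mathrm{Tr}\left(\vartheta^{2^i+2^j}(\vartheta')^{2^{m-1}}\right)$, and thus, we have
\[ c_{m-1} = \sum_{i=0}^{m-1}\sum_{j=0}^{m-1} a_i b_j\, \mathrm{Tr}\left(\vartheta^{2^i+2^j}(\vartheta')^{2^{m-1}}\right). \]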
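Define the matrix $T_\phi = (t_{i,j})_{i,j=0,0}^{m-1,m-1}$, where the coefficients are given by
\[ t_{i,j} = \mathrm{Tr}\left(\vartheta^{2^i+2^j}(\vartheta')^{2^{m-1}}\right). \]
Then the function $f_\phi$ can be written as the bilinear transform
\[ f_\phi(\alpha_\phi, \beta_\phi) = \alpha_\phi^T T_\phi \beta_\phi. \]
Rewrite $f_\phi$ as
\[ f_\phi(\alpha_\phi, \beta_\phi) = \sum_{i=0}^{m-1}\sum_{j=0}^{m-1} t_{i,j} a_i b_j = \sum_{(i,j)\,:\,t_{i,j}=1} a_i b_j. \]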
The obvious way to implement the function $f_\phi$ is therefore to use one $\mathbb{F}_2$-multiplier for each $(i,j)$ for which we have $t_{i,j} = 1$, and add all these products using an adder tree. Let $w_H(A)$ denote the Hamming weight of a matrix or vector $A$. The implementation of $f_\phi$ outlined here uses $w_H(T_\phi)$ $\mathbb{F}_2$-multipliers and $w_H(T_\phi) - 1$ $\mathbb{F}_2$-adders.
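The construction of $T_\phi$ and the repeated use of $f_\phi$ on shifted inputs can be illustrated in software. The sketch below is an illustration under stated assumptions rather than the architecture itself: it fixes $m = 4$, represents $\mathbb{F}_{16}$ internally by bit masks modulo the all-one polynomial $x^4+x^3+x^2+x+1$, takes $\vartheta = x$, and finds normal basis coordinates by exhaustive enumeration instead of using the dual basis. All function and variable names are hypothetical.

```python
from itertools import product

M = 4
POLY = 0b11111  # x^4 + x^3 + x^2 + x + 1, irreducible over F_2 (Theorem 3, e = 5)


def gf_mul(a, b):
    """Multiply two elements of F_2[x]/(POLY), stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return result


def normal_basis(theta):
    """Return [theta^(2^0), ..., theta^(2^(M-1))]."""
    basis = [theta]
    for _ in range(M - 1):
        basis.append(gf_mul(basis[-1], basis[-1]))
    return basis


BASIS = normal_basis(0b10)  # theta = x, a root of POLY

# Coordinates with respect to the normal basis, found by exhaustive enumeration.
COORDS = {}
for bits in product((0, 1), repeat=M):
    element = 0
    for bit, basis_element in zip(bits, BASIS):
        if bit:
            element ^= basis_element
    COORDS[element] = bits
assert len(COORDS) == 2 ** M  # the conjugates of theta are linearly independent

# T[i][j] is the last coordinate of theta^(2^i + 2^j).
T = [[COORDS[gf_mul(BASIS[i], BASIS[j])][M - 1] for j in range(M)]
     for i in range(M)]


def f_phi(a, b):
    """Bilinear form a^T T b over F_2."""
    return sum(T[i][j] & a[i] & b[j] for i in range(M) for j in range(M)) % 2


def shift(v, i):
    """Apply S^i, i.e. take the coordinates of the 2^i-th power."""
    return tuple(v[(k - i) % M] for k in range(M))


def massey_omura_mul(a, b):
    """Coordinates of the product, one coefficient per cyclic shift."""
    c = [0] * M
    for i in range(M):
        c[M - 1 - i] = f_phi(shift(a, i), shift(b, i))
    return tuple(c)
```

For this basis the Hamming weight of `T` is 7, so the sketch corresponds to seven $\mathbb{F}_2$-multipliers and six $\mathbb{F}_2$-adders per output coefficient.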
The properties of $T_\phi$ have been studied by Mullin et al. [38], Ash et al. [4], and Geiselmann [18]. They use the complexity measure
\[ C_\phi \triangleq w_H(T_\phi), \]
which we call the Hamming complexity of $\phi$. Ash et al. give the trivial upper bound $C_\phi \le m^2$. This general bound can be sharpened somewhat.
Ash et al. note that
\[ \vartheta^{1+2^l} = \sum_{k=0}^{m-1} t_{-k-1,\,l-k-1}\,\vartheta^{2^k} \]
holds, where the indices are reduced mod $m$. Hence, the vectors $(\vartheta^{1+2^l})_\phi$, $0 \le l < m$, are the diagonals of $T_\phi$, where the diagonals are wrapped around the edges of $T_\phi$. In particular, this means that the main diagonal of $T_\phi$ is $(\vartheta^2)_\phi$, i.e. all elements are zero except one.
To be able to sharpen the trivial upper bound, we first prove three simple lemmas.
Lemma 5 Let $\vartheta$ be a root of an irreducible polynomial of degree $m$, $m > 1$, over $\mathbb{F}_2$, such that $\vartheta^{1+2^l} = 1$ holds for an integer $l$, $0 \le l < m$. Then $m$ is even and we have $l = m/2$.
Proof: First, $\vartheta^2$ is one of the roots of an irreducible polynomial of degree $m$, $m > 1$, over $\mathbb{F}_2$. Hence, we have $\vartheta^2 \ne 1$ and $l \ne 0$. Assume that there are two values, $l_1$ and $l_2$, in $\{1, 2, \ldots, m-1\}$, for which we have $\vartheta^{1+2^{l_1}} = \vartheta^{1+2^{l_2}} = 1$. This gives us $\vartheta^{2^{l_1}} = \vartheta^{2^{l_2}}$. But $\vartheta$ is a root of an irreducible polynomial over $\mathbb{F}_2$, and therefore $l_1 = l_2$ holds. Thus, there is at most one $l$, $0 < l < m$, such that $\vartheta^{1+2^l} = 1$ holds. Now assume that we actually have one such $l$. We note that $\vartheta^{1+2^{m-l}} = (\vartheta^{1+2^l})^{2^{m-l}}$ holds. Hence, $m-l$, which is an element in $\{1, 2, \ldots, m-1\}$, is such that $\vartheta^{1+2^{m-l}} = 1$ holds. Since there cannot be more than one such element in $\{1, 2, \ldots, m-1\}$, we must have $m - l = l$, which gives us $m = 2l$. In other words, $m$ is even and we have $l = m/2$. $\Box$
Lemma 6 Let $m$ be an even positive integer, and let $\vartheta$ be a normal element in $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$ such that $\vartheta^{1+2^{m/2}} = 1$ holds. Let $\phi$ be the normal basis defined by $\vartheta$. Then we have
\[ w_H\left((\vartheta^{1+2^{m/2+1}})_\phi\right) = w_H\left((\vartheta^{1+2^{m/2-1}})_\phi\right) = 1. \]
Proof: From the assumption $\vartheta^{1+2^{m/2}} = 1$ we have $\vartheta^{1+2^{m/2+1}} = \vartheta^{2^{m/2}}$, which is one of the basis elements in $\phi$. Thus, we have $w_H\left((\vartheta^{1+2^{m/2+1}})_\phi\right) = 1$. We note that $\vartheta^{1+2^{m/2-1}} = (\vartheta^{1+2^{m/2+1}})^{2^{m/2-1}}$ holds. Using $\vartheta^{1+2^{m/2+1}} = \vartheta^{2^{m/2}}$ in the last expression, we have $\vartheta^{1+2^{m/2-1}} = \vartheta^{2^{m-1}}$, which also is one of the basis elements in $\phi$. Thus, we have $w_H\left((\vartheta^{1+2^{m/2-1}})_\phi\right) = 1$ as well. $\Box$
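Lemma 7 Let $m$ be an even positive integer, and let $\phi$ be a normal basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$. Also, let $\alpha$ be an element in $\mathbb{F}_{2^m}$. Then $w_H\left((\alpha^{1+2^{m/2}})_\phi\right)$ is even.
Proof: First, we note that $2^m - 1 = (2^{m/2}+1)(2^{m/2}-1)$ holds. Hence we have $(\alpha^{2^{m/2}+1})^{2^{m/2}-1} = 1$, and $\alpha^{1+2^{m/2}}$ is an element of $\mathbb{F}_{2^{m/2}}$. This also means that we have $(\alpha^{1+2^{m/2}})^{2^{m/2}} = \alpha^{1+2^{m/2}}$. Since $\phi$ is a normal basis, raising to the power $2^{m/2}$ is a cyclic shift by $m/2$ positions, so the first half of $(\alpha^{1+2^{m/2}})_\phi$ equals the last half and hence, $w_H\left((\alpha^{1+2^{m/2}})_\phi\right)$ is even. $\Box$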
We are now ready to prove the improved general upper bound.
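Theorem 4 Let $\phi$ be a normal basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$. Then the Hamming complexity of $\phi$ is upper bounded by
\[ C_\phi \le m^2 - m - 2\lceil m/2 \rceil + 3, \quad m \in \{2, 3\}, \]
\[ C_\phi \le m^2 - m - 2\lceil m/2 \rceil + 1, \quad m > 3. \]
Proof: Define $v_l \triangleq w_H\left((\vartheta^{1+2^l})_\phi\right)$. Recall the definition $C_\phi \triangleq w_H(T_\phi)$. Let $\vartheta$ be one of the generators of $\phi$. We have already noted that the diagonals of the $m \times m$ matrix $T_\phi$, wrapped around the edges of $T_\phi$, are the vectors $(\vartheta^{1+2^l})_\phi$, $0 \le l < m$. Therefore, we have
\[ C_\phi = \sum_{l=0}^{m-1} v_l. \]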
The first element in this sum, $v_0$, is 1, since $\vartheta^2$ is one of the basis elements of $\phi$. Hence we have the upper bound
\[ C_\phi \le m^2 - (m-1). \]
Since $\phi$ is a normal basis over $\mathbb{F}_2$, we have $1_\phi = (1, \ldots, 1)^T$. Hence, for every $l$ for which we have $\vartheta^{1+2^l} \ne 1$, we have $v_l < m$. According to Lemma 5, there is at most one integer $l$, $0 \le l < m$, such that $\vartheta^{1+2^l} = 1$ holds, and only if $m$ is even. Hence we can reduce the above bound by $m-2$ for even $m$, and by $m-1$ for odd $m$. Thus, we have the upper bounds
\[ C_\phi \le m^2 - (m-1) - (m-2), \quad m \text{ even}, \]
\[ C_\phi \le m^2 - (m-1) - (m-1), \quad m \text{ odd}. \]
Let $m$ be even, $m > 2$, and let us study $\vartheta^{1+2^{m/2}}$. We can identify two possibilities.
1. We have $v_{m/2} = m$, corresponding to $\vartheta^{1+2^{m/2}} = 1$, which is possible according to Lemma 5. According to Lemma 6, we then have $v_{m/2+1} = v_{m/2-1} = 1$. Thus, in this case there are an additional $2(m-2)$ zeros in the corresponding two diagonals of $T_\phi$.
2. We have $v_{m/2} < m$, corresponding to $\vartheta^{1+2^{m/2}} \ne 1$. By Lemma 7, $v_{m/2}$ is even. Thus, we have $v_{m/2} \le m-2$. In this case there are at least an additional 2 zeros in the main diagonal.
Thus, for even $m$, $m > 2$, we have the bound
\[ C_\phi \le m^2 - (m-1) - (m-2) - 2. \]
Now, let $m$ be odd, $m > 3$. Assume that there is an $l$ such that $v_l = m-1$ holds. Then we have $\vartheta^{1+2^l} = 1 + \vartheta^{2^k}$ for some $k$, $0 \le k < m$. Using this last expression, we find
\[ \vartheta^{1+2^{l+1}} = (1 + \vartheta^{2^k})\vartheta^{2^l} = \vartheta^{2^l} + \vartheta^{2^k+2^l} = \left(\vartheta + \vartheta^{1+2^{k-l}}\right)^{2^l}. \]
Hence, we have either $v_{l+1} = v_{k-l} + 1$ or $v_{l+1} = v_{k-l} - 1$, where the indices are reduced mod $m$. Then there are four possibilities.
1. We have $0 < l < m-1$ and $v_{k-l} = m-1$. Then we also have $v_{l+1} = m-2$, since all these weights are at most $m-1$. Since $T_\phi$ is symmetric, we also have $v_{m-l-1} = m-2$. This reduces the bound by 2.
2. We have $0 < l < m-1$ and $v_{k-l} = m-2$. Since $m > 3$ holds, we also have $m-2 > 1$. Thus, we have $k - l \not\equiv 0 \bmod m$, since $v_0 = 1$ holds. Again since $T_\phi$ is symmetric, we also have $v_{m-k+l} = m-2$. This also reduces the bound by 2.
3. We have $0 < l < m-1$ and $v_{k-l} < m-2$. Then, since we have $v_{l+1} \le v_{k-l} + 1$, we also have $v_{l+1} < m-1$. Again since $T_\phi$ is symmetric, we also have $v_{m-l-1} < m-1$. This also reduces the bound by 2.
4. We have $l = m-1$. Then we have $l + 1 \equiv 0 \bmod m$, and as we have already noted, $v_0 = 1$ holds. Then we have $v_{k-l} = 2$ and by the symmetry of $T_\phi$, we also have $v_{m-k+l} = 2$. This reduces the bound by $2(m-3)$.
Thus, for odd $m$, $m > 3$, we have the bound
\[ C_\phi \le m^2 - (m-1) - (m-1) - 2. \]
$\Box$
Mullin et al. [38] and Geiselmann [18] showed that the inequality
\[ C_\phi \ge 2m - 1 \]
holds for any normal basis. This lower bound coincides with our upper bound in Theorem 4 for $m \in \{2, 3\}$, and there are normal bases meeting our upper bound for $m \in \{2, 3, 4, 5\}$. For fairly large $m$, however, our upper bound seems to be off by approximately 10%. As an example, Geiselmann [18] gives a normal basis of $\mathbb{F}_{2^{30}}$ over $\mathbb{F}_2$ with Hamming complexity $C_\phi = 759$. Theorem 4 in this case gives the bound $C_\phi \le 841$.
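For very small extension degrees the Hamming complexity can be computed by brute force and compared with the two bounds. The sketch below is an illustration only: it assumes the irreducible polynomials listed in `IRREDUCIBLE` for the internal field representation (any irreducible polynomial of the right degree would do), computes $C_\phi = \sum_l v_l$ for every normal element, and uses hypothetical helper names throughout.

```python
from itertools import product

# Irreducible polynomials over F_2 chosen for this sketch, as bit masks.
IRREDUCIBLE = {2: 0b111, 3: 0b1011, 4: 0b10011, 5: 0b100101}


def gf_mul(a, b, m):
    """Multiply in F_2[x]/(IRREDUCIBLE[m]); elements are bit masks."""
    poly = IRREDUCIBLE[m]
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return result


def hamming_complexity(theta, m):
    """C_phi for the normal basis generated by theta, or None if theta is not normal."""
    basis = [theta]
    for _ in range(m - 1):
        basis.append(gf_mul(basis[-1], basis[-1], m))
    coords = {}
    for bits in product((0, 1), repeat=m):
        element = 0
        for bit, basis_element in zip(bits, basis):
            if bit:
                element ^= basis_element
        coords[element] = bits
    if len(coords) != 2 ** m:
        return None  # the conjugates are linearly dependent
    # v_l is the weight of theta^(1 + 2^l); C_phi is the sum of all v_l.
    return sum(sum(coords[gf_mul(theta, basis[l], m)]) for l in range(m))


def upper_bound(m):
    """The bound of Theorem 4."""
    return m * m - m - 2 * ((m + 1) // 2) + (3 if m in (2, 3) else 1)


def complexities(m):
    """Sorted set of Hamming complexities over all normal bases of F_{2^m}."""
    values = set()
    for theta in range(1, 2 ** m):
        c = hamming_complexity(theta, m)
        if c is not None:
            values.add(c)
    return sorted(values)


for m in sorted(IRREDUCIBLE):
    print(m, 2 * m - 1, complexities(m), upper_bound(m))
```

Each printed line shows the degree, the lower bound $2m - 1$, the complexities that occur, and the upper bound of Theorem 4, so the remark about $m \in \{2, 3, 4, 5\}$ can be checked by inspection.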
If we have $C_\phi = 2m - 1$ for a normal basis $\phi$, then $\phi$ is called optimal. There are optimal normal bases for some fields, but not for all fields. It is fairly easy to show that roots of irreducible all-one polynomials generate optimal normal bases. However, there are other optimal normal bases. The following three theorems, due to Ash et al. [4], give us a means to create normal bases, with some additional control of the Hamming complexity.
Theorem 5 Let $p$ be a prime satisfying $p = km + 1$ for some positive integers $k$ and $m$. Furthermore, let $c$ be a primitive $k$-th root of unity in $\mathbb{F}_p$, and let $\beta$ be a primitive $p$-th root of unity in $\mathbb{F}_{2^{km}}$. If $\mathbb{F}_p^*$ is generated by 2 and $c$, then
\[ \alpha = \sum_{i=0}^{k-1} \beta^{c^i} \]
generates a normal basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$.
Ash et al. [4] call these bases projection bases, since they are projected from $\mathbb{F}_{2^{km}}$ to $\mathbb{F}_{2^m}$ using a trace-like projection. They noted that there is at least one $k$ such that the conditions in Theorem 5 are fulfilled if and only if $8 \nmid m$. Thus, not all normal bases can be constructed using Theorem 5, since there are normal bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$ for all $m$, $m > 1$.
Theorem 6 Consider a normal basis $\phi$ of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$ constructed as in Theorem 5. Then we have the bounds
\[ km - (k^2 - 3k + 3) \le C_\phi \le km - 1, \quad \text{for even } k, \]
\[ (k+1)m - (k^2 - k + 1) \le C_\phi \le (k+1)m - k, \quad \text{for odd } k. \]
According to Theorem 6, any normal basis generated by Theorem 5 with $k \in \{1, 2\}$ is optimal, but $k > 2$ may produce optimal normal bases for some $m$. For those $m$ where Theorem 5 can be used with $k = 1$, we get normal bases generated by irreducible all-one polynomials. Ash et al. also give a number of theorems that we summarize in the following theorem. We have used the lower bound $C_\phi \ge 2m - 1$ to rule out impossible values of $m$.
Theorem 7 Consider a normal basis $\phi$ of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$ constructed as in Theorem 5. For the following cases, we achieve the lower bound given in Theorem 6.
(i) $k \in \{1, 2\}$
(ii) $k \in \{3, 4\}$ and $m > 2$
(iii) $k = 5$ and $m > 4$
(iv) $k = 6$ and $m > 12$
(v) $k = 7$ and $m > 6$
Geiselmann [18] lists normal bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$ for $m \le 60$. In all cases where he has found optimal normal bases, Theorem 5 produces optimal normal bases with $k \in \{1, 2\}$. The best known Hamming complexities for extension degree $m$, $2 \le m \le 60$, according to Geiselmann, are given in Table 5.1.
The properties of a parallel Massey-Omura multiplier for a normal basis $\phi$ are given in Table 5.2.
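\begin{tabular}{rl|rl|rl|rl}
$m$ & $C_\phi$ & $m$ & $C_\phi$ & $m$ & $C_\phi$ & $m$ & $C_\phi$ \\
\hline
2 & 3 * & 16 & 85 & 31 & 237 & 46 & 135 \\
3 & 5 * & 17 & 81 & 32 & 361 & 47 & 261 \\
4 & 7 * & 18 & 35 * & 33 & 65 * & 48 & 425 \\
5 & 9 * & 19 & 117 & 34 & 243 & 49 & 189 \\
6 & 11 * & 20 & 63 & 35 & 69 * & 50 & 99 * \\
7 & 19 & 21 & 95 & 36 & 71 * & 51 & 101 * \\
8 & 21 & 22 & 63 & 37 & 141 & 52 & 103 * \\
9 & 17 * & 23 & 45 * & 38 & 207 & 53 & 105 * \\
10 & 19 * & 24 & 105 & 39 & 77 * & 54 & 209 \\
11 & 21 * & 25 & 93 & 40 & 189 & 55 & 189 \\
12 & 23 * & 26 & 51 * & 41 & 81 * & 56 & 399 \\
13 & 45 & 27 & 141 & 42 & 135 & 57 & 497 \\
14 & 27 * & 28 & 55 * & 43 & 165 & 58 & 115 * \\
15 & 45 & 29 & 57 * & 44 & 147 & 59 & 597 \\
 & & 30 & 59 * & 45 & 153 & 60 & 119 *
\end{tabular}
Table 5.1: Best known Hamming complexity for normal bases of extension degree $m$, $2 \le m \le 60$, over $\mathbb{F}_2$, according to Geiselmann [18]. For $m \le 32$, the numbers are known to be the best possible. The numbers marked by an asterisk (*) correspond to optimal bases, and are thus also the best possible.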
\begin{tabular}{ll}
Number of clock cycles ($n$) & $1$ \\
Normalized delay ($t$) & $10(1 + \log_2 C_\phi)$ \\
Normalized time ($nt$) & $10(1 + \log_2 C_\phi)$ \\
Normalized area ($a$) & $(18C_\phi - 12)m$ \\
Normalized power ($p$) & $(18C_\phi - 12)m$ \\
Normalized energy ($np$) & $(18C_\phi - 12)m$ \\
Normalized input capacitance ($c$) & $2C_\phi$ \\
Normalized output resistance ($r$) & $2$
\end{tabular}
Table 5.2: Properties of a parallel Massey-Omura multiplier for a normal basis $\phi$. All properties are upper bounds, except the number of clock cycles, the input capacitance, and the output resistance.
5.3 Inversion Based on Exponentiation
Many published inverters for normal basis representation are based on the fact that $\alpha = \alpha^{2^m}$ holds for all $\alpha \in \mathbb{F}_{2^m}$. For $\alpha \ne 0$, multiply both sides of the equation by $\alpha^{-2}$, and we get $\alpha^{-1} = \alpha^{2^m-2}$. Hence, it is possible to perform inversion by exponentiation. This exponentiation can be performed by alternating squaring and multiplication. We have already noted that squaring in $\mathbb{F}_{2^m}$ is a cyclic shift when we are assuming a normal basis representation over $\mathbb{F}_2$. The problem is that we need several multiplications for each inversion.
5.3.1 Inversion by Squaring and Multiplication
Consider the decomposition
\[ 2^m - 2 = \sum_{i=1}^{m-1} 2^i. \]
It follows that we have
\[ \alpha^{-1} = \alpha^{2^m-2} = \prod_{i=1}^{m-1} \alpha^{2^i}. \]
Based on this equation, Wang et al. [47] give the following algorithm for inversion.
Algorithm 5.1 Inversion by Squaring and Multiplication
Given $\alpha \in \mathbb{F}_{2^m}^*$, initiate $A = \alpha$ and $B = 1$.
repeat $m-1$ times
    Set $(A, B) = (A^2, A^2 B)$
end repeat
The result is in $B$.

Wang et al. [47] also give an architecture that is based on Algorithm 5.1. However, since squaring is costless in normal basis representation, we can remove one iteration from Algorithm 5.1 by doing the first squaring upon initialization. This gives us the following modified algorithm.

Algorithm 5.2 Inversion by Squaring and Multiplication (5.1 modified)
Given $\alpha \in \mathbb{F}_{2^m}^*$, initiate $A = B = \alpha^2$.
repeat $m-2$ times
    Set $(A, B) = (A^2, A^2 B)$
end repeat
The result is in $B$.
We can also rewrite the exponent as
\[ 2^m - 2 = ((\cdots((2 + 1)\,2 + 1)\,2 \cdots + 1)\,2 + 1)\,2, \]
where 2 appears $m-1$ times. Thus, we have
\[ \alpha^{-1} = \alpha^{2^m-2} = \left(\left(\cdots\left(\left(\alpha^2\alpha\right)^2\alpha\right)^2 \cdots \alpha\right)^2 \alpha\right)^2, \]
where $\alpha$ appears $m-1$ times in the right-hand side. Based on this equation, we can use the following algorithm for inversion.
Algorithm 5.3 Inversion by Squaring and Multiplication
Given $\alpha \in \mathbb{F}_{2^m}^*$, initiate $A = \alpha$, $B = 1$.
repeat $m-1$ times
    Set $B = (AB)^2$
end repeat
The result is in $B$.

Mastrovito [34] gives an architecture that is based on Algorithm 5.3. As with Algorithm 5.1, we can remove one iteration from Algorithm 5.3 by doing the first squaring upon initialization.

Algorithm 5.4 Inversion by Squaring and Multiplication (5.3 modified)
Given $\alpha \in \mathbb{F}_{2^m}^*$, initiate $A = \alpha$, $B = \alpha^2$.
repeat $m-2$ times
    Set $B = (AB)^2$
end repeat
The result is in $B$.
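Algorithms 5.1 through 5.4 can be checked against each other in software. The following sketch is illustrative only and makes two assumptions that are not in the text: the field is $\mathbb{F}_{2^5}$, and it is represented in a polynomial basis modulo $x^5 + x^2 + 1$, so squaring is an ordinary multiplication rather than a cyclic shift. The function names are hypothetical.

```python
M = 5
POLY = 0b100101  # x^5 + x^2 + 1, an irreducible polynomial chosen for this sketch


def gf_mul(a, b):
    """Multiply two elements of F_2[x]/(POLY), stored as bit masks."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return result


def gf_sqr(a):
    return gf_mul(a, a)


def invert_5_1(alpha):
    """Algorithm 5.1: (A, B) = (A^2, A^2 B), repeated m - 1 times."""
    a, b = alpha, 1
    for _ in range(M - 1):
        a = gf_sqr(a)
        b = gf_mul(a, b)
    return b


def invert_5_2(alpha):
    """Algorithm 5.2: first squaring done at initialization."""
    a = b = gf_sqr(alpha)
    for _ in range(M - 2):
        a = gf_sqr(a)
        b = gf_mul(a, b)
    return b


def invert_5_3(alpha):
    """Algorithm 5.3: B = (A B)^2, repeated m - 1 times."""
    b = 1
    for _ in range(M - 1):
        b = gf_sqr(gf_mul(alpha, b))
    return b


def invert_5_4(alpha):
    """Algorithm 5.4: first squaring done at initialization."""
    b = gf_sqr(alpha)
    for _ in range(M - 2):
        b = gf_sqr(gf_mul(alpha, b))
    return b
```

In this sketch the modified variants each save one field multiplication, which mirrors the $m - 2$ clock cycles quoted for the corresponding architectures.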
[Block diagram: two registers of $m$ memory cells for $\mathbb{F}_2$, each with an Initiate input and a "Square in $\mathbb{F}_{2^m}$" block; input $\alpha$, output $\alpha^{-1}$.]
Figure 5.1: An architecture of an inverter for $\mathbb{F}_{2^m}$ based on Algorithm 5.2. $\alpha^{-1}$ is present at the output after $m-2$ steps.
[Block diagram: two registers of $m$ memory cells for $\mathbb{F}_2$, each with an Initiate input and a "Square in $\mathbb{F}_{2^m}$" block; input $\alpha$, output $\alpha^{-1}$.]
Figure 5.2: An architecture of an inverter for $\mathbb{F}_{2^m}$ based on Algorithm 5.4. $\alpha^{-1}$ is present at the output after $m-2$ steps.
Algorithms 5.1 through 5.4 can all be used regardless of what type of basis is used. In Figures 5.1 and 5.2, we display architectures based on Algorithms 5.2 and 5.4, respectively. These architectures are similar to those published by Wang et al. [47] and Mastrovito [34], respectively. The squaring in both architectures is implemented as hardwired shifts. The properties of these architectures are given in Tables 5.3 and 5.4, assuming that the multiplier is a parallel Massey-Omura multiplier as in Section 5.2.
\begin{tabular}{ll}
Number of clock cycles ($n$) & $m - 2$ \\
Normalized delay ($t$) & $14.4 \log_2 C_\phi + 26$ \\
Normalized time ($nt$) & $14.4\,m \log_2 C_\phi + 26m - 28.8 \log_2 C_\phi - 52$ \\
Normalized area ($a$) & $22C_\phi m + 52m$ \\
Normalized power ($p$) & $22C_\phi m + 68m + 32 + \frac{64}{m-2}$ \\
Normalized energy ($np$) & $22C_\phi m^2 + 68m^2 - 44C_\phi m - 104m$
\end{tabular}
Table 5.3: Properties of the inverter in Figure 5.1, assuming a parallel Massey-Omura multiplier, for a normal basis $\phi$. All properties are upper bounds, except the number of clock cycles.
\begin{tabular}{ll}
Number of clock cycles ($n$) & $m - 2$ \\
Normalized delay ($t$) & $14.4 \log_2 C_\phi + 26$ \\
Normalized time ($nt$) & $14.4\,m \log_2 C_\phi + 26m - 28.8 \log_2 C_\phi - 52$ \\
Normalized area ($a$) & $22C_\phi m + 52m$ \\
Normalized power ($p$) & $18C_\phi m + 60m + 4C_\phi + 40 + \frac{8C_\phi + 80}{m-2}$ \\
Normalized energy ($np$) & $18C_\phi m^2 + 60m^2 - 32C_\phi m - 80m$
\end{tabular}
Table 5.4: Properties of the inverter in Figure 5.2, assuming a parallel Massey-Omura multiplier, for a normal basis $\phi$. All properties are upper bounds, except the number of clock cycles.
5.3.2 Inversion by Accelerated Squaring and Multiplication
The algorithms in Section 5.3.1 all use $O(m)$ multiplications in $\mathbb{F}_{2^m}$. Feng [13] presents an algorithm that performs inversion using only $O(\log_2 m)$ multiplications in $\mathbb{F}_{2^m}$.
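Define $q = \lfloor \log_2 (m-1) \rfloor$, and consider the binary decomposition
\[ m - 1 = \sum_{i=0}^{q} m_i 2^i, \]
where we have $m_i \in \{0, 1\}$, $0 \le i < q$, and $m_q = 1$. Also, define
\[ p = \sum_{i=0}^{q-1} m_i, \]
where $m_i$, $0 \le i < q$, are interpreted as integers. Feng [13] showed that
\begin{align*}
2^m - 2 = \Bigl(\Bigl(\Bigl[\cdots\Bigl[\Bigl( & m_q 2^{-m_q 2^q}\bigl(1 + 2^{2^{q-1}}\bigr) + m_{q-1}\Bigr)\, 2^{-m_{q-1} 2^{q-1}}\bigl(1 + 2^{2^{q-2}}\bigr) \\
& + m_{q-2}\Bigr]\, 2^{-m_{q-2} 2^{q-2}}\bigl(1 + 2^{2^{q-3}}\bigr) + \cdots + m_2\Bigr]\, 2^{-m_2 2^2}\bigl(1 + 2^{2^1}\bigr) \\
& + m_1\Bigr)\, 2^{-m_1 2^1}\bigl(1 + 2^{2^0}\bigr) + m_0\Bigr)\, 2^{m - m_0}
\end{align*}
holds. Thus, we have
\begin{align*}
\alpha^{-1} = \alpha^{2^m-2} = \Bigl(\Bigl(\Bigl[\cdots\Bigl[\Bigl( & (\alpha^{m_q})^{2^{-m_q 2^q}(1 + 2^{2^{q-1}})} \cdot \alpha^{m_{q-1}}\Bigr)^{2^{-m_{q-1} 2^{q-1}}(1 + 2^{2^{q-2}})} \\
& \cdot\, \alpha^{m_{q-2}}\Bigr]^{2^{-m_{q-2} 2^{q-2}}(1 + 2^{2^{q-3}})} \cdots\, \alpha^{m_2}\Bigr]^{2^{-m_2 2^2}(1 + 2^{2^1})} \\
& \cdot\, \alpha^{m_1}\Bigr)^{2^{-m_1 2^1}(1 + 2^{2^0})} \cdot \alpha^{m_0}\Bigr)^{2^{m - m_0}}.
\end{align*}
Based on this equation, Feng [13] gives the following algorithm for inversion.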
Algorithm 5.5 Inversion by Accelerated Squaring and Multiplication
Given $\alpha \in \mathbb{F}_{2^m}^*$, initiate $A = \alpha$, $B = \alpha$.
for $i = q$ to $1$
    if $m_i = 1$ then
        Set $B = B^{2^{-2^i}}$
    end if
    Set $B = B \cdot B^{2^{2^{i-1}}}$    (This row is executed $q$ times.)
    if $m_{i-1} = 1$ then
        Set $B = BA$    (This row is executed $p$ times.)
    end if
end for
The result is $B^{2^{m-m_0}}$.
In Figure 5.3 there is an architecture based on Algorithm 5.5. The multiplexer layer feeds the multiplier with the correct data at each time instance, which includes choosing the correct cyclic shifts of the data. Its properties are given in Table 5.5.
An algorithm based on a similar technique as Algorithm 5.5 is proposed by Itoh and Tsujii [26]. That algorithm gives an architecture that is similar to the one in Figure 5.3, with similar properties. However, it only works for $m = 2^r + 1$, where $r$ is a positive integer.
5.4 Polynomial Basis Inverters Revisited
Let $p(x)$ be an irreducible all-one polynomial of degree $m$ and let $\vartheta$ be a root of $p(x)$. In the proof of Theorem 3 on page 60 we observed that
\[ \{\vartheta^{p^i}\}_{i=0}^{m-1} = \{\vartheta^{i}\}_{i=1}^{m} \]
holds. This observation has implications. After reordering the basis elements, the normal basis defined by $\vartheta$ is a multiple of the polynomial basis defined by $\vartheta$. Also, the dual of that normal basis, which in turn is also a normal basis, is a reordered multiple of the dual of the polynomial basis.
[Block diagram: two registers of $m$ memory cells for $\mathbb{F}_2$ with Initiate inputs, a control unit, a "Square in $\mathbb{F}_{2^m}$" block, a "Power $2^{m-m_0}$" block, and a multiplexer layer; input $\alpha$, output $\alpha^{-1}$.]
Figure 5.3: An architecture of an inverter for $\mathbb{F}_{2^m}$ based on Algorithm 5.5. $\alpha^{-1}$ is present at the output after $p + q$ steps.
\begin{tabular}{ll}
Number of clock cycles ($n$) & $p + q$ \\
Normalized delay ($t$) & $14.4 \log_2 C_\phi + 33.0$ \\
Normalized time ($nt$) & $(14.4 \log_2 C_\phi + 33.0)(p + q)$ \\
Normalized area ($a$) & $22C_\phi m + 12qm + 64m$ \\
Normalized power ($p$) & $22C_\phi m + 60m + \frac{56m}{p+q}$ \\
Normalized energy ($np$) & $(22C_\phi + 60)(p + q)m + 56m$
\end{tabular}
Table 5.5: Properties of the inverter in Figure 5.3, assuming a parallel Massey-Omura multiplier, for a normal basis $\phi$. All properties are upper bounds, except the number of clock cycles.

The basis exchange between the normal basis and the polynomial basis is very simple. Define the vectors
\[ \theta = \left(\vartheta^0, \vartheta^1, \ldots, \vartheta^{m-1}\right), \]
\[ \phi = \left(\vartheta^1, \vartheta^2, \ldots, \vartheta^{m}\right), \]
i.e. $\theta$ is the polynomial basis and $\phi$ is the reordered normal basis. Let $a_i$, $i \in \{0, 1, \ldots, m-1\}$, and $b_i$, $i \in \{0, 1, \ldots, m-1\}$, be the coefficients of $\alpha_\theta$ and $\alpha_\phi$ respectively. Then we have the equalities
\[ \alpha = \sum_{i=0}^{m-1} a_i \vartheta^i = \sum_{i=0}^{m-1} b_i \vartheta^{i+1}. \tag{5.1} \]
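Since $\vartheta$ is a root of the all-one polynomial of degree $m$, we have
\[ \vartheta^m = \sum_{i=0}^{m-1} \vartheta^i. \]
Applying this equality on Equation 5.1, we get
\[ a_i = \begin{cases} b_{m-1}, & i = 0, \\ b_{m-1} + b_{i-1}, & 0 < i < m, \end{cases} \tag{5.2} \]
\[ b_i = \begin{cases} a_0, & i = m-1, \\ a_0 + a_{i+1}, & 0 \le i < m-1. \end{cases} \tag{5.3} \]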
These operations can easily be performed either in parallel or sequentially. We can invert $\alpha$ represented as $\alpha_\phi$ by
1. determining $\alpha_\theta$ from $\alpha_\phi$ using Equation 5.2,
2. determining $(\alpha^{-1})_\theta$ using any inverter for polynomial basis representation, and
3. determining $(\alpha^{-1})_\phi$ from $(\alpha^{-1})_\theta$ using Equation 5.3.
The cost of these basis exchanges is small compared to the cost of inverting in a polynomial basis.
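The two basis exchanges amount to one XOR per coefficient. The following sketch, with hypothetical function names and coefficient vectors stored as Python lists of bits indexed by $i$, transcribes Equations 5.2 and 5.3 and checks on a small example that they are inverses of each other.

```python
from itertools import product


def normal_to_polynomial(b):
    """Equation 5.2: coefficients a_i of alpha_theta from b_i of alpha_phi."""
    m = len(b)
    return [b[m - 1]] + [b[m - 1] ^ b[i - 1] for i in range(1, m)]


def polynomial_to_normal(a):
    """Equation 5.3: coefficients b_i of alpha_phi from a_i of alpha_theta."""
    m = len(a)
    return [a[0] ^ a[i + 1] for i in range(m - 1)] + [a[0]]


# theta^m has phi-coordinates (0, ..., 0, 1) and theta-coordinates (1, ..., 1).
print(normal_to_polynomial([0, 0, 0, 1]))  # [1, 1, 1, 1]

# Round trip over all vectors of length 4.
for bits in product((0, 1), repeat=4):
    assert polynomial_to_normal(normal_to_polynomial(list(bits))) == list(bits)
```

An inverter for the reordered normal basis could then wrap any polynomial basis inverter between these two functions, following steps 1 to 3.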
The inverter based on the Euclidean algorithm in Section 4.1 has both parallel input and parallel output. We therefore assume that the basis exchanges are obtained by two arrays of adders performing the operations in Equations 5.2 and 5.3 in parallel. Using this approach, we reach the properties in Table 5.6.
\begin{tabular}{ll}
Number of clock cycles ($n$) & $2m - 1$ \\
Normalized delay ($t$) & $4.4 \log_2(m + \frac{1}{4}) + 26.8 + t_{\mathrm{ctrl}}$ \\
Normalized time ($nt$) & $(2m - 1)\left(4.4 \log_2(m + \frac{1}{4}) + 26.8 + t_{\mathrm{ctrl}}\right)$ \\
Normalized area ($a$) & $224m - 4$ \\
Normalized power ($p$) & $181.5m + 139.25 + \frac{11.25}{2m-1}$ \\
Normalized energy ($np$) & $371m^2 + 93m - 128$
\end{tabular}
Table 5.6: Properties of the Euclidean inverter architecture for normal basis representation, where the normal basis is generated by an all-one polynomial of degree $m$. All properties except the number of clock cycles are upper bounds. $t_{\mathrm{ctrl}}$ is the normalized delay of the control logic.

The inverter based on the Berlekamp-Massey algorithm in Section 4.2 has serial input and parallel output. We therefore assume that the first basis exchange is done by a single adder performing the operations in Equation 5.2 sequentially, and we assume that the second basis exchange is done by an array of adders performing the operations in Equation 5.3 in parallel. We reach the properties in Table 5.7.

\begin{tabular}{ll}
Number of clock cycles ($n$) & $2m$ \\
Normalized delay ($t$) & $14.4 \log_2(m) + 54.9$ \\
Normalized time ($nt$) & $28.8\,m \log_2(m) + 109.8m$ \\
Normalized area ($a$) & $164m - 64$ \\
Normalized power ($p$) & $172m + 9 - \frac{35.6}{m}$ \\
Normalized energy ($np$) & $344m^2 + 18m - 71.2$
\end{tabular}
Table 5.7: Properties of the Berlekamp-Massey inverter architecture for normal basis representation, where the normal basis is generated by an all-one polynomial of degree $m$. All properties except the number of clock cycles are upper bounds.
The two inverters based on the Gauss-Jordan algorithm can naturally also be used. However, we do not need to perform the basis exchanges explicitly. Instead, we can modify the right hand side of the equation and keep the generation of the matrix intact, by choosing suitable bases for internal use in the architectures.
The inverter based on the Gauss-Jordan algorithm proposed by Hasan and Bhargava [22], briefly described in Section 4.3.2, generates $\alpha_{\theta,\theta}$ from $\alpha_\theta$ and solves the equation $\alpha_{\theta,\theta}(\alpha^{-1})_\theta = 1_\theta$. We need a basis $\tau$ such that $\alpha_{\tau,\phi}$ is generated from $\alpha_\phi$ in the same way as $\alpha_{\theta,\theta}$ is generated from $\alpha_\theta$, in order to keep the matrix generation in the preprocessor unchanged. Let us therefore compare these matrices. By Definition 10 on page 11, we have
\[ \alpha_{\tau,\phi} = \left((\vartheta^1\alpha)_\tau, (\vartheta^2\alpha)_\tau, \ldots, (\vartheta^m\alpha)_\tau\right), \]
\[ \alpha_{\theta,\theta} = \left((\vartheta^0\alpha)_\theta, (\vartheta^1\alpha)_\theta, \ldots, (\vartheta^{m-1}\alpha)_\theta\right). \]
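For the $i$-th column, $1 \le i \le m$, of these matrices, we have
\[ (\vartheta^i\alpha)_\tau = \vartheta^i_{\tau,\tau} 1_{\tau,\phi} \alpha_\phi, \]
\[ (\vartheta^{i-1}\alpha)_\theta = \vartheta^{i-1}_{\theta,\theta} \alpha_\theta, \]
respectively, based on Lemmas 2 and 4 on pages 12 and 37. If $\alpha_{\tau,\phi}$ and $\alpha_{\theta,\theta}$ are generated in the same way from $\alpha_\phi$ and $\alpha_\theta$ respectively, we have $\vartheta^i_{\tau,\tau} 1_{\tau,\phi} = \vartheta^{i-1}_{\theta,\theta}$, $1 \le i \le m$. Thus, we have $1_{\tau,\phi} = \vartheta^{-1}_{\theta,\theta}$. Using Lemmas 3 and 4 on pages 36 and 37, this gives us $\tau = \phi\,\vartheta_{\theta,\theta}$. It is easily shown that we have
\[ \vartheta_{\theta,\theta} = \begin{pmatrix} 0 & \cdots & 0 & 1 \\ 1 & & 0 & 1 \\ & \ddots & & \vdots \\ 0 & & 1 & 1 \end{pmatrix}. \]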
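The task of the architecture is now to find $(\alpha^{-1})_\phi$ by solving $\alpha_{\tau,\phi}(\alpha^{-1})_\phi = 1_\tau$. Finally, for the left hand side of the equation we have
\[ 1_\tau = 1_{\tau,\phi} 1_\phi = \begin{pmatrix} 1 & 1 & & 0 \\ \vdots & & \ddots & \\ 1 & 0 & & 1 \\ 1 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}. \]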
Thus, the only modification that is needed is to change the left hand side of the equation from $1_\theta$ to $1_\tau$. However, we can make use of the fact that $p(x)$ is an all-one polynomial. Since all coefficients of $p(x)$ are one, we do not need to feed these coefficients to the architecture. Instead, we can hardwire the effect of these coefficients, thus removing 2 flip-flops and one $\mathbb{F}_2$-multiplier in each cell in the preprocessor in Figure 4.6 on page 50. We reach the properties in Table 5.8.
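The matrix identities used in this derivation are easy to confirm numerically. The sketch below is an illustration with hypothetical function names: it builds the matrix of multiplication by $\vartheta$ in the polynomial basis for an all-one polynomial of degree $m$, builds the matrix of multiplication by $\vartheta^{-1} = \vartheta^m$, checks over $\mathbb{F}_2$ that the two are inverses, and applies the latter to the all-one vector.

```python
def companion_all_one(m):
    """Multiplication by theta in the polynomial basis (columns are coordinates)."""
    rows = [[0] * m for _ in range(m)]
    for j in range(m - 1):
        rows[j + 1][j] = 1      # theta * theta^j = theta^(j+1)
    for i in range(m):
        rows[i][m - 1] = 1      # theta^m = 1 + theta + ... + theta^(m-1)
    return rows


def inverse_all_one(m):
    """Multiplication by theta^(-1) = theta^m in the same basis."""
    rows = [[0] * m for _ in range(m)]
    for i in range(m):
        rows[i][0] = 1          # theta^(-1) * 1 = theta^m
    for j in range(1, m):
        rows[j - 1][j] = 1      # theta^(-1) * theta^j = theta^(j-1)
    return rows


def mat_mul(x, y):
    """Matrix product over F_2."""
    m = len(x)
    return [[sum(x[i][k] & y[k][j] for k in range(m)) % 2 for j in range(m)]
            for i in range(m)]


def mat_vec(x, v):
    """Matrix-vector product over F_2."""
    return [sum(a & b for a, b in zip(row, v)) % 2 for row in x]


m = 4
print(mat_vec(inverse_all_one(m), [1] * m))  # [0, 0, 0, 1]
```

The printed vector is the vector $1_\tau$ for $m = 4$, so only a single one needs to be fed to the last position of the modified equation.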
Number of clock cycles ($n$)    $m$
Normalized delay ($t$)          $30$
Normalized time ($nt$)          $30m$
Normalized area ($a$)           $73m^2 + 271m - 194$
Normalized power ($p$)          $113m^2 + 475m - 288 - 136m^{-1}$
Normalized energy ($np$)        $113m^3 + 475m^2 - 288m - 136$

Table 5.8: Properties of the systolic Gauss-Jordan inverter architecture due to Hasan and Bhargava for normal basis representation, where the normal basis is generated by an all-one polynomial of degree $m$. All properties except the number of clock cycles are upper bounds.

Our inverter based on the Gauss-Jordan algorithm in Section 4.3.3 solves a Hankel problem that arises from an internal use of the triangular basis $\sigma$ of $\theta$. The architecture generates $\alpha_{\sigma,\theta}$ from $\alpha_\theta$ and finds $(\alpha^{-1})_\theta$ by solving the Hankel problem $\alpha_{\sigma,\theta}(\alpha^{-1})_\theta = 1_\sigma$. We need a basis $\psi$ such that $\alpha_{\psi,\phi}$ is generated from $\alpha_\phi$ in the same way as $\alpha_{\sigma,\theta}$ is generated from $\alpha_\theta$, in order to keep the matrix generation in the preprocessor unchanged. As before, let us compare these matrices. Again, by Definition 10 on page 11, we have
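\[ \alpha_{\psi,\phi} = \bigl( (\vartheta^{1}\alpha)_\psi, (\vartheta^{2}\alpha)_\psi, \ldots, (\vartheta^{m}\alpha)_\psi \bigr), \]
\[ \alpha_{\sigma,\theta} = \bigl( (\vartheta^{0}\alpha)_\sigma, (\vartheta^{1}\alpha)_\sigma, \ldots, (\vartheta^{m-1}\alpha)_\sigma \bigr). \]
For the $i$-th column, $1 \le i \le m$, of these matrices, we have
\[ (\vartheta^{i}\alpha)_\psi = \vartheta^{i}_{\psi,\psi}\, 1_{\psi,\phi}\, \alpha_\phi, \qquad (\vartheta^{i-1}\alpha)_\sigma = \vartheta^{i-1}_{\sigma,\sigma}\, 1_{\sigma,\theta}\, \alpha_\theta \]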
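respectively, again based on Lemmas 2 and 4 on pages 12 and 37. If $\alpha_{\psi,\phi}$ and $\alpha_{\sigma,\theta}$ are generated in the same way from $\alpha_\phi$ and $\alpha_\theta$ respectively, we have $\vartheta^{i}_{\psi,\psi} 1_{\psi,\phi} = \vartheta^{i-1}_{\sigma,\sigma} 1_{\sigma,\theta}$ for all $i$ in $\{1, \ldots, m\}$. Thus, $1_{\psi,\phi} = \vartheta^{-1}_{\sigma,\sigma} 1_{\sigma,\theta}$ holds. By Lemma 4, we have $1_{\phi,\psi} = 1^{-1}_{\psi,\phi}$. Combining the last two expressions, we get $1_{\phi,\psi} = 1^{-1}_{\sigma,\theta} \vartheta_{\sigma,\sigma}$. Again by Lemma 4, we have $1_{\theta,\sigma} = 1^{-1}_{\sigma,\theta}$. In the proof of Theorem 2 we noted that $\vartheta_{\sigma,\sigma} = \vartheta^{T}_{\theta,\theta}$ holds. By Lemma 3 and Definition 12, we have
\[ 1_{\theta,\sigma} = \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \iddots & \\ 1 & & 0 \end{pmatrix}. \]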
Number of clock cycles ($n$)    $m$
Normalized delay ($t$)          $40$
Normalized time ($nt$)          $40m$
Normalized area ($a$)           $61m^2 + 375m - 192$
Normalized power ($p$)          $93m^2 + 527m - 256 - 68m^{-1}$
Normalized energy ($np$)        $93m^3 + 527m^2 - 256m - 68$

Table 5.9: Properties of our systolic Gauss-Jordan inverter architecture for normal basis representation, where the basis is generated by an all-one polynomial of degree $m$. All properties except the number of clock cycles are upper bounds.
Combining the last four expressions, we get
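\[ 1_{\phi,\psi} = 1_{\theta,\sigma} \vartheta^{T}_{\theta,\theta} = \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \iddots & \\ 1 & & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 & & 0 \\ \vdots & & \ddots & \\ 0 & 0 & & 1 \\ 1 & 1 & \cdots & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 1 \\ \vdots & \vdots & \iddots & \\ 0 & 1 & & 0 \end{pmatrix}. \]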
This matrix can be used to find $\psi$ as $\psi = \phi\, 1_{\phi,\psi}$ according to Lemma 3 on page 36.
The task of the architecture is now to find $(\alpha^{-1})_\phi$ by solving $\alpha_{\psi,\phi}(\alpha^{-1})_\phi = 1_\psi$.
Finally, for the right hand side of the equation, we have
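\[ 1_\psi = 1_{\psi,\phi} 1_\phi = 1^{-1}_{\phi,\psi} 1_\phi = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & 1 \\ \vdots & \vdots & \iddots & \iddots & 1 \\ 0 & 0 & \iddots & \iddots & \\ 0 & 1 & 1 & & 0 \end{pmatrix} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}. \]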
Thus, the only modification that is needed is to change the right hand side of the equation from $1_\sigma$ to $1_\psi$. We reach the properties in Table 5.9, which are the properties of the original architecture with $w = m + 1$.
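The product $1_{\theta,\sigma}\vartheta^{T}_{\theta,\theta}$ and the resulting vector $1_\psi$ can be checked in the same numerical way. This is a sketch only, with hypothetical function names, assuming the matrix shapes written out above. Instead of inverting $1_{\phi,\psi}$, it checks the equivalent statement that $1_{\phi,\psi}$ maps $(1, 1, 0, \ldots, 0)^T$ to the all-one vector.

```python
def mat_mul(a, b):
    """Matrix product over F_2."""
    n = len(a)
    return [[sum(a[i][k] & b[k][j] for k in range(n)) & 1 for j in range(n)]
            for i in range(n)]


def mat_vec(a, v):
    """Matrix-vector product over F_2."""
    return [sum(x & y for x, y in zip(row, v)) & 1 for row in a]


def triangular(m):
    """1_{theta,sigma}: ones on and above the anti-diagonal."""
    return [[1 if i + j <= m - 1 else 0 for j in range(m)] for i in range(m)]


def companion_aop_transposed(m):
    """Transpose of the companion matrix of the all-one polynomial."""
    c = [[0] * m for _ in range(m)]
    for i in range(m - 1):
        c[i][i + 1] = 1      # ones on the superdiagonal
    c[m - 1] = [1] * m       # last row is all ones
    return c


def claimed_product(m):
    """The matrix written out for 1_{phi,psi} in the text."""
    p = [[0] * m for _ in range(m)]
    p[0][0] = 1
    for r in range(1, m):
        for c in range(1, m - r + 1):
            p[r][c] = 1
    return p


m = 5
print(mat_mul(triangular(m), companion_aop_transposed(m)) == claimed_product(m))
print(mat_vec(claimed_product(m), [1, 1] + [0] * (m - 2)))
```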
5.5 Properties of the Normal Basis Inverters
The architectures in this chapter can be divided into two categories. The three architectures in Section 5.3 are designed to be used with any normal basis, while the four architectures in Section 5.4 are designed to be used only with normal bases generated by irreducible all-one polynomials. In Figures 5.4 through 5.6 we have plotted the normalized time, area, and energy, respectively, needed for the architectures considered in this chapter, for $2 \le m \le 65$. These plots are based on Tables 5.3 through 5.9, on pages 72 through 80. We have neglected the delay of the control logic in Figure 5.4.
The three architectures in Section 5.3 are based on similar ideas. They perform the exponentiation $\alpha^{2^m-2}$ based on some decomposition of $2^m - 2$. They also have very similar properties, except that Feng's architecture in Figure 5.3 on page 75 needs at most $2\log_2 m$ clock cycles, whereas the architectures in Figure 5.1 and Figure 5.2 on page 71 both need $m - 2$ clock cycles, as seen in Tables 5.3 and 5.4 on page 72, and Table 5.5 on page 75. This also affects the energy needed to complete an inversion. We should therefore prefer the architecture based on Feng's algorithm for all normal bases that are not generated by an irreducible all-one polynomial.
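The idea of inversion by exponentiation can be illustrated in software. The sketch below is not one of the architectures above. It assumes a polynomial basis representation of $\mathbb{F}_{2^8}$ with the irreducible polynomial $x^8 + x^4 + x^3 + x + 1$ as a stand-in (in a normal basis, squaring would be a cyclic shift), and it uses the plain decomposition $2^m - 2 = 2 + 4 + \cdots + 2^{m-1}$, which costs $m - 2$ general multiplications. The function names are hypothetical.

```python
def gf_mul(a, b, poly=0x11B, m=8):
    """Multiply in F_{2^m}, polynomial basis, reduction polynomial `poly`."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> m:           # degree reached m: reduce
            a ^= poly
    return r


def inverse_by_squaring(a, poly=0x11B, m=8):
    """Return (a^(2^m - 2), number of general multiplications used)."""
    result = None
    mults = 0
    power = a
    for _ in range(m - 1):
        power = gf_mul(power, power, poly, m)   # a^(2^i) for i = 1, ..., m-1
        if result is None:
            result = power
        else:
            result = gf_mul(result, power, poly, m)
            mults += 1
    return result, mults


inv, mults = inverse_by_squaring(0x53)
print(hex(inv), mults, gf_mul(0x53, inv))
```

Shorter decompositions of $2^m - 2$ reduce the number of multiplications, which is the point of the comparison in the paragraph above.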
When the normal basis is generated by an irreducible all-one polynomial, we should typically prefer the inverters in Section 5.4, at least if the area is the most important property. For $m > 20$, the inverter based on Feng's algorithm is still the fastest one. Since the inverters in Section 5.4 are based on inverters for polynomial basis representation, they inherit their properties from the corresponding inverters in Chapter 4. The comments made in Section 4.4 therefore hold here as well. The fastest architecture is still Hasan and Bhargava's inverter based on the Gauss-Jordan algorithm, while the smallest and least power consuming one is the inverter based on the Berlekamp-Massey algorithm, as we can see in Tables 5.6 through 5.9 on pages 77 through 80. For small $m$, the inverter based on Feng's algorithm consumes less energy than the inverters in Section 5.4. For fairly large $m$, the energy consumption of the inverter based on Feng's algorithm is approximately the same as the energy consumption of the inverters based on the Berlekamp-Massey algorithm and the Euclidean algorithm.
[Plot "Normal basis inverters": $nt/m$ versus $m$ for the curves M&S-1&2, M&S-3, AOP:EUC, AOP:BM, AOP:GJ-1, and AOP:GJ-2.]

Figure 5.4: Normalized time needed for inversion in $\mathbb{F}_{2^m}$ using normal bases and the architectures considered in this chapter. In all cases we consider normal bases whose Hamming complexity is the smallest known.
M&S-1&2 – Inversion based on multiplication and squaring using our modified versions of the ideas of Wang et al. and Mastrovito respectively.
M&S-3 – Inversion based on accelerated multiplication and squaring using Feng's idea.
AOP:* – Inversion for the case where the normal basis is generated from an irreducible all-one polynomial, via a basis exchange to the corresponding polynomial basis.
AOP:EUC – Inversion based on the Euclidean algorithm.
AOP:BM – Inversion based on the Berlekamp-Massey algorithm.
AOP:GJ-1 – Inversion based on the Gauss-Jordan algorithm using Hasan and Bhargava's preprocessor.
AOP:GJ-2 – Inversion based on the Gauss-Jordan algorithm using our preprocessor.
[Plot "Normal basis inverters": $a/m^2$ versus $m$ for the curves M&S-1&2, M&S-3, AOP:EUC, AOP:BM, AOP:GJ-1, and AOP:GJ-2.]

Figure 5.5: Normalized area needed for inversion in $\mathbb{F}_{2^m}$ using normal bases and the architectures considered in this chapter. In all cases we consider normal bases whose Hamming complexity is the smallest known.
M&S-1&2 – Inversion based on multiplication and squaring using our modified versions of the ideas of Wang et al. and Mastrovito respectively.
M&S-3 – Inversion based on accelerated multiplication and squaring using Feng's idea.
AOP:* – Inversion for the case where the normal basis is generated from an irreducible all-one polynomial, via a basis exchange to the corresponding polynomial basis.
AOP:EUC – Inversion based on the Euclidean algorithm.
AOP:BM – Inversion based on the Berlekamp-Massey algorithm.
AOP:GJ-1 – Inversion based on the Gauss-Jordan algorithm using Hasan and Bhargava's preprocessor.
AOP:GJ-2 – Inversion based on the Gauss-Jordan algorithm using our preprocessor.
[Plot "Normal basis inverters": $np/m^3$ versus $m$ for the curves M&S-1, M&S-2, M&S-3, AOP:EUC, AOP:BM, AOP:GJ-1, and AOP:GJ-2.]

Figure 5.6: Normalized energy needed for inversion in $\mathbb{F}_{2^m}$ using normal bases and the architectures considered in this chapter. In all cases we consider normal bases whose Hamming complexity is the smallest known.
M&S-1 – Inversion based on multiplication and squaring using our modified version of the idea of Wang et al.
M&S-2 – Inversion based on multiplication and squaring using our modified version of the idea of Mastrovito.
M&S-3 – Inversion based on accelerated multiplication and squaring using Feng's idea.
AOP:* – Inversion for the case where the normal basis is generated from an irreducible all-one polynomial, via a basis exchange to the corresponding polynomial basis.
AOP:EUC – Inversion based on the Euclidean algorithm.
AOP:BM – Inversion based on the Berlekamp-Massey algorithm.
AOP:GJ-1 – Inversion based on the Gauss-Jordan algorithm using Hasan and Bhargava's preprocessor.
AOP:GJ-2 – Inversion based on the Gauss-Jordan algorithm using our preprocessor.
Chapter 6
Inversion in Tower Fields
Extension fields can be represented in many different ways. So far we have only considered $\mathbb{F}_{2^m}$ to be an extension of $\mathbb{F}_2$ using either a polynomial or normal basis. However, there are many valid bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$ that are neither polynomial nor normal. It is perfectly possible that some of these non-standard bases can generate smaller, faster, or less energy-consuming architectures. One way to create a non-polynomial, non-normal basis is to make the extension in more than one step. For instance, let $n$ divide $m$, and create $\mathbb{F}_{2^n}$ as an extension of $\mathbb{F}_2$. Then create $\mathbb{F}_{2^m}$ as an extension of $\mathbb{F}_{2^n}$ of degree $m/n$. Such fields are often referred to as composite fields. The resulting basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$ is typically non-polynomial and non-normal even if the two extensions are made using polynomial or normal bases.
A tower over the field $F$ is strictly a set $\mathcal{F}$ of finite extensions of $F$ which is totally ordered by inclusion. The term tower field has been used in the literature to denote the largest field in $\mathcal{F}$, assuming that the field is represented by successive extensions using all fields in $\mathcal{F}$, and we adopt this notion. Let $m$ be a power of two, $F = \mathbb{F}_2$ and $\mathcal{F} = \{ \mathbb{F}_{2^{2^k}} : k \in \{1, 2, \ldots, \log_2 m\} \}$. Hence we have the finite field $\mathbb{F}_{2^m}$, where $m$ is a power of two, and where the field is constructed by successive extensions of degree two, starting with $\mathbb{F}_2$. The properties of operations in such a field depend on how $\mathbb{F}_{2^m}$ is represented as an extension of degree two of its largest true subfield $\mathbb{F}_{2^{m/2}}$.
6.1 Bases of Tower Fields
First we state three theorems that will help us to characterize bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$. The following theorem is a special case of a well known theorem established by Pellet [40]. It can be found in any textbook about finite fields, see for instance Lidl, Niederreiter [30, Cor. 3.79, p. 127]. By this result we are able to establish all polynomial bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$.
Theorem 8 The polynomial $x^2 + x + \epsilon$ is irreducible over $\mathbb{F}_{2^n}$ if and only if we have $\mathrm{Tr}_{\mathbb{F}_{2^n}/\mathbb{F}_2}(\epsilon) = 1$.
A direct consequence of Theorem 8 is that there are $2^{n-1}$ irreducible polynomials of the given form. Furthermore, every polynomial $f(x) = ax^2 + bx + c$, where $a$ and $b$ are nonzero, can be transformed into the given form by the transformation $(a/b^2) f(bx/a) = x^2 + x + ac/b^2$.
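If $n$ is even, then $\mathrm{Tr}_{\mathbb{F}_{2^n}/\mathbb{F}_2}(\epsilon)$ is zero whenever we have $\epsilon \in \mathbb{F}_{2^{n/2}}$. If $n$ is a power of two, all true subfields of $\mathbb{F}_{2^n}$ are contained in $\mathbb{F}_{2^{n/2}}$. Therefore if we have $\mathrm{Tr}_{\mathbb{F}_{2^n}/\mathbb{F}_2}(\epsilon) = 1$, then $\epsilon$ is in $\mathbb{F}_{2^n} \setminus \mathbb{F}_{2^{n/2}}$.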
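Theorem 8 and the remarks following it can be checked exhaustively in a small field. The sketch below assumes $n = 4$ and represents $\mathbb{F}_{16}$ in a polynomial basis with the irreducible polynomial $x^4 + x + 1$; the representation and all names are choices made for this illustration only. A polynomial of degree two is irreducible exactly when it has no root in the field, which is what `has_root` tests.

```python
M = 4
POLY = 0b10011               # x^4 + x + 1, irreducible over F_2


def gf_mul(a, b):
    """Multiply in F_16 (polynomial basis, reduction by POLY)."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a >> M:
            a ^= POLY
    return r


def trace(e):
    """Trace from F_16 to F_2: e + e^2 + e^4 + e^8."""
    t, x = 0, e
    for _ in range(M):
        t ^= x
        x = gf_mul(x, x)
    return t


def has_root(e):
    """True if x^2 + x + e has a root in F_16."""
    return any((gf_mul(x, x) ^ x ^ e) == 0 for x in range(1 << M))


irreducible = [e for e in range(1 << M) if not has_root(e)]
trace_one = [e for e in range(1 << M) if trace(e) == 1]
# The subfield F_4 consists of the elements fixed by x -> x^4.
subfield = [x for x in range(1 << M)
            if gf_mul(gf_mul(x, x), gf_mul(x, x)) == x]
print(irreducible == trace_one, len(trace_one), subfield)
```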
The following theorem is a special case of a theorem of Perlis [41, Th. 1].
Theorem 9 Let $m$ be an even integer and let $p(x)$ be an irreducible polynomial over $\mathbb{F}_{2^{m/2}}$ of degree 2, with $\sigma$ as one of its roots. Then $\sigma$ defines a normal basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$.
Theorems 8 and 9 give us all polynomial and normal bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$. The following theorem essentially gives us all bases that are not related to the above ones.
Theorem 10 Let $m$ be an even integer, $m > 2$. Let $p(x) = x^2 + x + p_0$ and $q(x) = x^2 + x + q_0$ be distinct irreducible polynomials over $\mathbb{F}_{2^{m/2}}$, with roots $\sigma$ and $\theta$ respectively. Then $\{\sigma, \theta\}$ is a basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$.
Proof: Since $m$ is an even integer, we know that $\mathbb{F}_{2^{m/2}}$ exists. Furthermore, since we also have $m > 2$, Theorem 8 assures that there are at least two distinct irreducible polynomials of the given form. Assume that $\sigma$ and $\theta$ are linearly dependent over $\mathbb{F}_{2^{m/2}}$. Then there is an element $a \in \mathbb{F}^{*}_{2^{m/2}}$ such that $\sigma = a\theta$ holds, and we have
\[ p(a\theta) = a^2\theta^2 + a\theta + p_0 = 0. \]
But $\theta$ is a root of $q(x)$ and hence $\theta^2 = \theta + q_0$ holds. Identifying in the expression above we get
\[ a^2(\theta + q_0) + a\theta + p_0 = 0, \]
which gives us the two equations
\[ a^2 + a = 0, \]
\[ a^2 q_0 + p_0 = 0, \]
using the fact that $\theta$ and 1 are linearly independent over $\mathbb{F}_{2^{m/2}}$. The only nonzero solution is $a = 1$, $q_0 = p_0$. By assumption we have $q_0 \ne p_0$. Therefore $\sigma$ and $\theta$ cannot be linearly dependent over $\mathbb{F}_{2^{m/2}}$. Hence $\{\sigma, \theta\}$ is a basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$. □
Now we are ready to partition the set of all bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$ into three classes. Let $s$ and $t$ be elements of $\mathbb{F}^{*}_{2^{m/2}}$. Also let $\sigma$ and $\theta$ be roots of the distinct irreducible polynomials $p(x) = x^2 + x + p_0$ and $q(x) = x^2 + x + q_0$ respectively. The set of possible bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$ can be divided into the following three types.
• Bases of the form {s, tθ}, which we call bases of type I.
• Bases of the form $\{s\sigma, t\sigma^{2^{m/2}}\}$, which we call bases of type II.
• Bases of the form {sσ, tθ}, which we call bases of type III.
The partition is chosen so that arithmetic operations in $\mathbb{F}_{2^m}$ basically are performed in the same way using any basis within one class. Among the bases of type I are all polynomial bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$, and among the bases of type II are all normal bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$.
From Theorem 8 we conclude that there are $2^{m/2-1}$ irreducible polynomials of the form $x^2 + x + \epsilon$ over $\mathbb{F}_{2^{m/2}}$. Therefore there are $\binom{2^{m/2-1}}{2}$ pairs of distinct irreducible polynomials $p(x)$ and $q(x)$. Theorem 9 assures that each polynomial has two linearly independent roots, and Theorem 10 assures that roots of distinct polynomials are linearly independent. Finally there are $2^{m/2} - 1$ possible choices each of $s$ and $t$. Hence there are
\[ 2^{m/2}\bigl(2^{m/2} - 1\bigr)^2 \text{ bases of type I,} \]
\[ 2^{m/2-1}\bigl(2^{m/2} - 1\bigr)^2 \text{ bases of type II, and} \]
\[ \binom{2^{m/2-1}}{2}\, 2^2 \bigl(2^{m/2} - 1\bigr)^2 \text{ bases of type III.} \]
Summing these three numbers we get $(2^m - 1)\bigl(2^m - 2^{m/2}\bigr)/2$ bases of types I, II, and III, which is exactly the number of bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$. We should note that there are no bases of type III for $m = 2$.
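The summation above is easy to confirm numerically. The sketch below, with hypothetical function names, evaluates the three counts for a few values of $m$ and compares their sum with $(2^m - 1)(2^m - 2^{m/2})/2$.

```python
from math import comb


def basis_counts(m):
    """Numbers of bases of type I, II, and III of F_{2^m} over F_{2^(m/2)}."""
    q = 2 ** (m // 2)                              # size of the subfield
    type1 = q * (q - 1) ** 2
    type2 = (q // 2) * (q - 1) ** 2
    type3 = comb(q // 2, 2) * 4 * (q - 1) ** 2
    return type1, type2, type3


def total_bases(m):
    """Number of (unordered) bases of F_{2^m} over F_{2^(m/2)}."""
    q = 2 ** (m // 2)
    return (q * q - 1) * (q * q - q) // 2


for m in (2, 4, 8, 16):
    print(m, basis_counts(m), sum(basis_counts(m)) == total_bases(m))
```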
6.2 Arithmetic Using Bases of Type I
Let $s$ and $t$ be nonzero elements of $\mathbb{F}_{2^{m/2}}$, and let $\theta$ be a root of the irreducible polynomial $q(x) = x^2 + x + q_0$ over $\mathbb{F}_{2^{m/2}}$. Then $\{s, t\theta\}$ is a basis of type I as mentioned in Section 6.1.
6.2.1 Inversion
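Let $\alpha$, $\beta$, and $\gamma$ be elements of $\mathbb{F}_{2^m}$. Let $(a_0, a_1)$, $(b_0, b_1)$, and $(c_0, c_1)$ be the representations in the chosen basis of $\alpha$, $\beta$, and $\gamma$ respectively. That is, we have $\alpha = a_0 s + a_1 t\theta$, $\beta = b_0 s + b_1 t\theta$, and $\gamma = c_0 s + c_1 t\theta$. Assuming that $\gamma = \alpha\beta$ holds we get
\[ c_0 = s a_0 b_0 + t^2 s^{-1} q_0 a_1 b_1, \tag{6.1} \]
\[ c_1 = s a_0 b_1 + s a_1 b_0 + t a_1 b_1. \tag{6.2} \]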
Now, assuming that we have $\gamma = 1$, we get $c_0 = s^{-1}$ and $c_1 = 0$. Solving Equations 6.1 and 6.2 for this case gives us
\[ b_0 = \frac{a_0 + t s^{-1} \cdot a_1}{(s a_0 + t q_0^{1/2} \cdot a_1)^2 + st \cdot a_0 \cdot a_1}, \tag{6.3} \]
\[ b_1 = \frac{a_1}{(s a_0 + t q_0^{1/2} \cdot a_1)^2 + st \cdot a_0 \cdot a_1}. \tag{6.4} \]
These equations define the inverter in Figure 6.1a. It uses at most four multiplications by constants in $\mathbb{F}_{2^{m/2}}$. However, for the special case $s = t = 1$ most of the constants in Equations 6.3 and 6.4 are 1 and we get the architecture in Figure 6.1c using only one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$.

Figure 6.1: Architectures for inversion in $\mathbb{F}_{2^m}$ using a basis of type I over $\mathbb{F}_{2^{m/2}}$. (a) Architecture A based on Equations 6.3 and 6.4 for arbitrary $s$ and $t$. (b) Architecture B based on Equations 6.5 and 6.6 for arbitrary $s$ and $t$. (c),(d) Architectures A and B for $s = t = 1$.
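Equations 6.1 through 6.4 can be exercised in the smallest nontrivial case. The sketch below assumes $m = 4$, so that the subfield is $\mathbb{F}_4$, encodes $\mathbb{F}_4$ as 2-bit integers modulo $x^2 + x + 1$, and picks $s = t = 1$ and $q_0 = \omega$ (a root of $x^2 + x + 1$, which has trace 1). These choices and all names are assumptions made for the illustration. In $\mathbb{F}_4$ the square root is $x^{1/2} = x^2$ and the inverse of a nonzero $x$ is $x^2$.

```python
def f4_mul(a, b):
    """Multiply in F_4, elements encoded as 2-bit integers mod x^2 + x + 1."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    if r & 0b100:            # reduce x^2 -> x + 1
        r ^= 0b111
    return r


def f4_inv(a):
    """Inverse of a nonzero element of F_4 (a^3 = 1, so a^-1 = a^2)."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    return f4_mul(a, a)


Q0 = 2                       # omega; x^2 + x + omega is irreducible over F_4


def mul_type1(a, b):
    """Equations 6.1 and 6.2 with s = t = 1."""
    (a0, a1), (b0, b1) = a, b
    c0 = f4_mul(a0, b0) ^ f4_mul(Q0, f4_mul(a1, b1))
    c1 = f4_mul(a0, b1) ^ f4_mul(a1, b0) ^ f4_mul(a1, b1)
    return c0, c1


def inv_type1(a):
    """Equations 6.3 and 6.4 with s = t = 1."""
    a0, a1 = a
    sqrt_q0 = f4_mul(Q0, Q0)                     # q0^(1/2) in F_4
    u = a0 ^ f4_mul(sqrt_q0, a1)
    den_inv = f4_inv(f4_mul(u, u) ^ f4_mul(a0, a1))
    return f4_mul(a0 ^ a1, den_inv), f4_mul(a1, den_inv)


alpha = (0, 1)
print(inv_type1(alpha), mul_type1(alpha, inv_type1(alpha)))
```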
We get an alternative inverter by rewriting Equations 6.3 and 6.4 as
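\[ b_0 = \frac{a_0 + t s^{-1} \cdot a_1}{s^2 a_0 \cdot (a_0 + t s^{-1} \cdot a_1) + t^2 q_0 \cdot a_1^2}, \tag{6.5} \]
\[ b_1 = \frac{a_1}{s^2 a_0 \cdot (a_0 + t s^{-1} \cdot a_1) + t^2 q_0 \cdot a_1^2}. \tag{6.6} \]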
These equations define the inverter in Figure 6.1b, which uses at most three multiplications by constants in $\mathbb{F}_{2^{m/2}}$. However, for the special case $s = t = 1$ most of the constants in Equations 6.5 and 6.6 are 1 and we get the architecture in Figure 6.1d using only one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$. This last architecture has previously been published by Morii and Kasahara [36].

Figure 6.2: Architectures for multiplication in $\mathbb{F}_{2^m}$ using a basis of type I over $\mathbb{F}_{2^{m/2}}$ based on Equations 6.7 and 6.8. (a) Architecture for arbitrary $s$ and $t$. (b) Architecture for the special case where we have $s = t = 1$.

Figure 6.3: Architectures for multiplication by a constant in $\mathbb{F}_{2^m}$ using a basis of type I over $\mathbb{F}_{2^{m/2}}$. (a) Architecture based on Equations 6.9 and 6.10 for arbitrary $s$ and $t$. (b) Architecture based on Equations 6.11 and 6.12 for the special case where we have $s = t$.
We assumed that $m$ is a power of two and that $\mathbb{F}_{2^m}$ is constructed by successive extensions of degree two, starting with $\mathbb{F}_2$. For the analysis we assume that all intermediate fields are represented over their largest subfields using similar bases. Hence we need architectures for multiplication, squaring, and multiplication by a constant using the same type of basis. For simplicity we derive architectures for these operations in $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$.
6.2.2 Multiplication
Multiplication in $\mathbb{F}_{2^m}$ can be performed in the following way. Rewrite Equations 6.1 and 6.2 as
\[ c_0 = s^{-1} \cdot s a_0 \cdot s b_0 + s^{-1} q_0 \cdot t a_1 \cdot t b_1, \tag{6.7} \]
\[ c_1 = t^{-1} \cdot \bigl( (s a_0 + t a_1) \cdot (s b_0 + t b_1) + s a_0 \cdot s b_0 \bigr). \tag{6.8} \]
These equations define the multiplier in Figure 6.2a using at most seven multiplications by constants in $\mathbb{F}_{2^{m/2}}$. As for inversion we are interested in the special case $s = t = 1$. For this case most of the constants in Equations 6.7 and 6.8 are 1, and hence the corresponding multiplications by these constants are replaced by wires. In this way we get the simplified architecture in Figure 6.2b using only one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$, previously published by Paar [39].
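The rewriting in Equations 6.7 and 6.8 shares the product $s a_0 \cdot s b_0$ between both coordinates, so only three general multiplications in the subfield are needed. The sketch below compares the direct form (Equations 6.1 and 6.2) with the shared form for all nonzero $s$ and $t$, again assuming $m = 4$, the subfield $\mathbb{F}_4$ encoded as 2-bit integers modulo $x^2 + x + 1$, and $q_0 = \omega$; all names are hypothetical.

```python
def f4_mul(a, b):
    """Multiply in F_4, elements encoded as 2-bit integers mod x^2 + x + 1."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    if r & 0b100:
        r ^= 0b111
    return r


def f4_inv(a):
    """Inverse of a nonzero element of F_4."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    return f4_mul(a, a)


def prod(*xs):
    """Product of several elements of F_4."""
    r = 1
    for x in xs:
        r = f4_mul(r, x)
    return r


Q0 = 2                       # omega; x^2 + x + omega is irreducible over F_4


def mul_direct(a, b, s, t):
    """Equations 6.1 and 6.2."""
    (a0, a1), (b0, b1) = a, b
    c0 = prod(s, a0, b0) ^ prod(t, t, f4_inv(s), Q0, a1, b1)
    c1 = prod(s, a0, b1) ^ prod(s, a1, b0) ^ prod(t, a1, b1)
    return c0, c1


def mul_shared(a, b, s, t):
    """Equations 6.7 and 6.8: three general multiplications in F_4."""
    (a0, a1), (b0, b1) = a, b
    x0, x1 = f4_mul(s, a0), f4_mul(t, a1)
    y0, y1 = f4_mul(s, b0), f4_mul(t, b1)
    p00 = f4_mul(x0, y0)
    p11 = f4_mul(x1, y1)
    pm = f4_mul(x0 ^ x1, y0 ^ y1)
    c0 = prod(f4_inv(s), p00) ^ prod(f4_inv(s), Q0, p11)
    c1 = prod(f4_inv(t), pm ^ p00)
    return c0, c1


print(mul_direct((1, 2), (3, 1), 2, 3) == mul_shared((1, 2), (3, 1), 2, 3))
```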
Multiplication by a constant is naturally less complex than multiplication of two variable elements. Assume that $\beta$ is the constant element and rewrite Equations 6.1 and 6.2 as
\[ c_0 = s b_0 \cdot a_0 + t^2 s^{-1} q_0 b_1 \cdot a_1, \tag{6.9} \]
\[ c_1 = s b_1 \cdot a_0 + (s b_0 + t b_1) \cdot a_1. \tag{6.10} \]
This gives us the trivial architecture in Figure 6.3a using four multiplications by constants and two additions in $\mathbb{F}_{2^{m/2}}$. An alternative architecture can be derived if we have $s = t$. We rewrite Equations 6.9 and 6.10 for this special case as
\[ c_0 = s b_0 \cdot a_0 + s q_0 b_1 \cdot a_1, \tag{6.11} \]
\[ c_1 = s b_0 \cdot a_0 + s(b_0 + b_1) \cdot (a_0 + a_1), \tag{6.12} \]
and get the architecture in Figure 6.3b using three multiplications by constants and three additions in $\mathbb{F}_{2^{m/2}}$.
6.2.3 Squaring
Squaring can of course be performed using the suggested multiplier, but a less complex squarer can be obtained by using a dedicated architecture for squaring. Setting $\beta = \alpha$ in Equations 6.1 and 6.2 we get
\[ c_0 = s \cdot a_0^2 + t^2 s^{-1} q_0 \cdot a_1^2, \tag{6.13} \]
\[ c_1 = t \cdot a_1^2, \tag{6.14} \]
which gives us the architecture in Figure 6.4a using three multiplications by constants in $\mathbb{F}_{2^{m/2}}$. As above we are interested in the special case where we have $s = t = 1$. Here most of the constants in Equations 6.13 and 6.14 are 1. We get the architecture in Figure 6.4b, which uses only one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$.

Figure 6.4: Architectures for squaring in $\mathbb{F}_{2^m}$ using a basis of type I over $\mathbb{F}_{2^{m/2}}$ based on Equations 6.13 and 6.14. (a) Architecture for arbitrary $s$ and $t$. (b) Architecture for the special case where we have $s = t = 1$.

Figure 6.5: Architectures for inversion in $\mathbb{F}_{2^m}$ using a basis of type II over $\mathbb{F}_{2^{m/2}}$ based on Equations 6.17 and 6.18. (a) Architecture for arbitrary $s$ and $t$. (b) Architecture for the special case where we have $s = t = 1$.
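In characteristic two the cross term $s a_0 a_1 + s a_1 a_0$ in Equation 6.2 vanishes, which is why Equation 6.14 contains only $a_1^2$. The short sketch below checks Equations 6.13 and 6.14 against the general product for all nonzero $s$ and $t$, under the same illustrative assumptions as before ($m = 4$, subfield $\mathbb{F}_4$ encoded modulo $x^2 + x + 1$, $q_0 = \omega$, hypothetical names).

```python
def f4_mul(a, b):
    """Multiply in F_4, elements encoded as 2-bit integers mod x^2 + x + 1."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    if r & 0b100:
        r ^= 0b111
    return r


def prod(*xs):
    """Product of several elements of F_4."""
    r = 1
    for x in xs:
        r = f4_mul(r, x)
    return r


Q0 = 2                       # omega; x^2 + x + omega is irreducible over F_4


def mul_direct(a, b, s, t):
    """Equations 6.1 and 6.2; s^-1 = s^2 for nonzero s in F_4."""
    (a0, a1), (b0, b1) = a, b
    c0 = prod(s, a0, b0) ^ prod(t, t, f4_mul(s, s), Q0, a1, b1)
    c1 = prod(s, a0, b1) ^ prod(s, a1, b0) ^ prod(t, a1, b1)
    return c0, c1


def square_type1(a, s, t):
    """Equations 6.13 and 6.14."""
    a0, a1 = a
    c0 = prod(s, a0, a0) ^ prod(t, t, f4_mul(s, s), Q0, a1, a1)
    c1 = prod(t, a1, a1)
    return c0, c1


print(square_type1((2, 3), 1, 1), mul_direct((2, 3), (2, 3), 1, 1))
```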
6.3 Arithmetic Using Bases of Type II
Let $s$ and $t$ be nonzero elements of $\mathbb{F}_{2^{m/2}}$, and let $\sigma$ be a root of the irreducible polynomial $p(x) = x^2 + x + p_0$ over $\mathbb{F}_{2^{m/2}}$. Then $\{s\sigma, t\sigma^{2^{m/2}}\}$ is a basis of type II as mentioned in Section 6.1.
6.3.1 Inversion
Let $\alpha$, $\beta$, and $\gamma$ be elements of $\mathbb{F}_{2^m}$. Let $(a_0, a_1)$, $(b_0, b_1)$, and $(c_0, c_1)$ be the representations in the chosen basis of $\alpha$, $\beta$, and $\gamma$ respectively. That is, we have $\alpha = a_0 s\sigma + a_1 t\sigma^{2^{m/2}}$, $\beta = b_0 s\sigma + b_1 t\sigma^{2^{m/2}}$, and $\gamma = c_0 s\sigma + c_1 t\sigma^{2^{m/2}}$. Assuming $\gamma = \alpha\beta$ as in Section 6.2 we get
\[ c_0 = s(1 + p_0) a_0 b_0 + t p_0 a_0 b_1 + t p_0 a_1 b_0 + t^2 s^{-1} p_0 a_1 b_1, \tag{6.15} \]
\[ c_1 = s^2 t^{-1} p_0 a_0 b_0 + s p_0 a_0 b_1 + s p_0 a_1 b_0 + t(1 + p_0) a_1 b_1. \tag{6.16} \]
Assuming $\gamma = 1$, we get $c_0 = s^{-1}$ and $c_1 = t^{-1}$. Solving Equations 6.15 and 6.16 for this case gives us
\[ b_0 = \frac{s^{-1} \cdot t a_1}{p_0 \cdot (s a_0 + t a_1)^2 + s a_0 \cdot t a_1}, \tag{6.17} \]
\[ b_1 = \frac{t^{-1} \cdot s a_0}{p_0 \cdot (s a_0 + t a_1)^2 + s a_0 \cdot t a_1}. \tag{6.18} \]
These equations define the inverter in Figure 6.5a, using at most five multiplications by constants in $\mathbb{F}_{2^{m/2}}$. For the special case $s = t = 1$ we need only one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$, and we get the architecture in Figure 6.5b.
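As for type I, the type II formulas can be checked in the smallest case. The sketch below assumes $m = 4$, the subfield $\mathbb{F}_4$ encoded as 2-bit integers modulo $x^2 + x + 1$, $s = t = 1$, and $p_0 = \omega$; these choices and all names are illustrative assumptions. In this basis the element 1 is represented by $(s^{-1}, t^{-1}) = (1, 1)$.

```python
def f4_mul(a, b):
    """Multiply in F_4, elements encoded as 2-bit integers mod x^2 + x + 1."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    if r & 0b100:
        r ^= 0b111
    return r


def f4_inv(a):
    """Inverse of a nonzero element of F_4."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    return f4_mul(a, a)


P0 = 2                       # omega; x^2 + x + omega is irreducible over F_4
ONE = (1, 1)                 # representation of 1 for s = t = 1


def mul_type2(a, b):
    """Equations 6.15 and 6.16 with s = t = 1."""
    (a0, a1), (b0, b1) = a, b
    p00, p01 = f4_mul(a0, b0), f4_mul(a0, b1)
    p10, p11 = f4_mul(a1, b0), f4_mul(a1, b1)
    c0 = f4_mul(1 ^ P0, p00) ^ f4_mul(P0, p01 ^ p10 ^ p11)
    c1 = f4_mul(P0, p00 ^ p01 ^ p10) ^ f4_mul(1 ^ P0, p11)
    return c0, c1


def inv_type2(a):
    """Equations 6.17 and 6.18 with s = t = 1."""
    a0, a1 = a
    den = f4_mul(P0, f4_mul(a0 ^ a1, a0 ^ a1)) ^ f4_mul(a0, a1)
    den_inv = f4_inv(den)
    return f4_mul(a1, den_inv), f4_mul(a0, den_inv)


alpha = (1, 0)
print(inv_type2(alpha), mul_type2(alpha, inv_type2(alpha)))
```

Note that the numerators of Equations 6.17 and 6.18 simply swap $a_0$ and $a_1$, which reflects that conjugation permutes the two elements of a normal basis.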
As in Section 6.2 we assume that any true subfield of $\mathbb{F}_{2^m}$ is represented over its largest true subfield using a similar basis. Hence we need architectures for multiplication, squaring, and multiplication by a constant using the same type of basis.
6.3.2 Multiplication
Multiplication in $\mathbb{F}_{2^m}$ can be performed in the following way. Rewrite Equations 6.15 and 6.16 as
\[ c_0 = s^{-1} p_0 \cdot (s a_0 + t a_1) \cdot (s b_0 + t b_1) + s^{-1} \cdot s a_0 \cdot s b_0, \tag{6.19} \]
\[ c_1 = t^{-1} p_0 \cdot (s a_0 + t a_1) \cdot (s b_0 + t b_1) + t^{-1} \cdot t a_1 \cdot t b_1. \tag{6.20} \]
These equations define the multiplier in Figure 6.6a, using at most eight multiplications by constants in $\mathbb{F}_{2^{m/2}}$. For the special case where we have $s = t = 1$, we rewrite Equations 6.19 and 6.20 as
\[ c_0 = p_0 \cdot (a_0 + a_1) \cdot (b_0 + b_1) + a_0 \cdot b_0, \tag{6.21} \]
\[ c_1 = p_0 \cdot (a_0 + a_1) \cdot (b_0 + b_1) + a_1 \cdot b_1, \tag{6.22} \]
which defines the architecture in Figure 6.6b, using only one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$.

Figure 6.6: Architectures for multiplication in $\mathbb{F}_{2^m}$ using a basis of type II over $\mathbb{F}_{2^{m/2}}$. (a) Architecture for arbitrary $s$ and $t$ based on Equations 6.19 and 6.20. (b) Architecture for $s = t = 1$ based on Equations 6.21 and 6.22.

Figure 6.7: Architectures for multiplication by a constant in $\mathbb{F}_{2^m}$ using a basis of type II over $\mathbb{F}_{2^{m/2}}$. (a) Architecture based on Equations 6.23 and 6.24 for arbitrary $s$ and $t$. (b) Architecture based on Equations 6.25 and 6.26 for the special case where we have $s = t$.

Figure 6.8: Architectures for squaring in $\mathbb{F}_{2^m}$ using a basis of type II over $\mathbb{F}_{2^{m/2}}$. (a) Architecture for arbitrary $s$ and $t$ based on Equations 6.27 and 6.28. (b),(c) Architectures for the special cases where we have $s = t$ and $s = t = 1$ respectively based on Equations 6.29 and 6.30.
For multiplication by a constant we assume that $\beta$ is the constant element. Rewriting Equations 6.15 and 6.16 as
\[ c_0 = \bigl( s(1 + p_0) b_0 + t p_0 b_1 \bigr) \cdot a_0 + \bigl( t p_0 b_0 + t^2 s^{-1} p_0 b_1 \bigr) \cdot a_1, \tag{6.23} \]
\[ c_1 = \bigl( s^2 t^{-1} p_0 b_0 + s p_0 b_1 \bigr) \cdot a_0 + \bigl( s p_0 b_0 + t(1 + p_0) b_1 \bigr) \cdot a_1, \tag{6.24} \]
gives us the architecture in Figure 6.7a, using at most two adders and four multiplications by constants in $\mathbb{F}_{2^{m/2}}$. This architecture is the same as for type I bases. The only difference is that the constants differ. As for type I bases we can get an alternative architecture if we have $s = t$. Rewriting Equations 6.23 and 6.24 for this special case as
\[ c_0 = s p_0 (b_0 + b_1) \cdot (a_0 + a_1) + s b_0 \cdot a_0, \tag{6.25} \]
\[ c_1 = s p_0 (b_0 + b_1) \cdot (a_0 + a_1) + s b_1 \cdot a_1, \tag{6.26} \]
we get the architecture in Figure 6.7b, using at most three adders and three multiplications by constants in $\mathbb{F}_{2^{m/2}}$.
6.3.3 Squaring
For squaring we set β = α in Equations 6.15 and 6.16. Then we have
\[ c_0 = s(p_0 + 1) a_0^2 + t^2 s^{-1} p_0 a_1^2, \tag{6.27} \]
\[ c_1 = s^2 t^{-1} p_0 a_0^2 + t(p_0 + 1) a_1^2, \tag{6.28} \]
which gives us the architecture in Figure 6.8a, using two adders and at most four multiplications by constants in $\mathbb{F}_{2^{m/2}}$. We can get an alternative architecture for $s = t$. Rewrite Equations 6.27 and 6.28 as
\[ c_0 = s p_0 (a_0^2 + a_1^2) + s a_0^2, \tag{6.29} \]
\[ c_1 = s p_0 (a_0^2 + a_1^2) + s a_1^2, \tag{6.30} \]
for this special case. These equations define the architecture in Figure 6.8b, using three adders and at most three multiplications by constants in $\mathbb{F}_{2^{m/2}}$. For the case where we have $s = t = 1$ this architecture simplifies to the architecture in Figure 6.8c, which uses three adders and only one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$.
6.4 Arithmetic Using Bases of Type III
Let $s$ and $t$ be nonzero elements of $\mathbb{F}_{2^{m/2}}$. Also let $p(x) = x^2 + x + p_0$ and $q(x) = x^2 + x + q_0$ be distinct irreducible polynomials over $\mathbb{F}_{2^{m/2}}$. Finally let $\sigma$ be a root of $p(x)$ and let $\theta$ be a root of $q(x)$. Then $\{s\sigma, t\theta\}$ is a basis of type III as mentioned in Section 6.1.
6.4.1 Inversion
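Let $\alpha$, $\beta$, and $\gamma$ be elements of $\mathbb{F}_{2^m}$. Let $(a_0, a_1)$, $(b_0, b_1)$, and $(c_0, c_1)$ be the representations in the chosen basis of $\alpha$, $\beta$, and $\gamma$ respectively. It follows that we have $\alpha = a_0 s\sigma + a_1 t\theta$, $\beta = b_0 s\sigma + b_1 t\theta$, and $\gamma = c_0 s\sigma + c_1 t\theta$. Assuming that $\gamma = \alpha\beta$ holds we get
\[ \gamma = a_0 b_0 s^2 \sigma^2 + (a_0 b_1 + a_1 b_0) st\, \sigma\theta + a_1 b_1 t^2 \theta^2. \tag{6.31} \]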
We need to express the products $\sigma^2$, $\sigma\theta$, and $\theta^2$ in our basis. First we know that $\sigma$ and $\theta$ are roots of $p(x)$ and $q(x)$ respectively. Therefore we have
\[ \sigma^2 = \sigma + p_0, \tag{6.32} \]
\[ \theta^2 = \theta + q_0. \tag{6.33} \]
Using Equations 6.32 and 6.33, we can rewrite Equation 6.31 as
\[ \gamma = a_0 b_0 s^2 (\sigma + p_0) + (a_0 b_1 + a_1 b_0) st\, \sigma\theta + a_1 b_1 t^2 (\theta + q_0). \tag{6.34} \]
Now we need to express 1 and $\sigma\theta$ in our basis.
Define the polynomial $r(x) = x^2 + x + p_0 + q_0$. We have
\[ \mathrm{Tr}_{\mathbb{F}_{2^{m/2}}/\mathbb{F}_2}(p_0 + q_0) = \mathrm{Tr}_{\mathbb{F}_{2^{m/2}}/\mathbb{F}_2}(p_0) + \mathrm{Tr}_{\mathbb{F}_{2^{m/2}}/\mathbb{F}_2}(q_0) = 1 + 1 = 0, \]
and hence, according to Theorem 8, $r(x)$ is reducible over $\mathbb{F}_{2^{m/2}}$. Moreover, we can easily verify that $\sigma + \theta$ is a root of $r(x)$, the other being $1 + \sigma + \theta$. Therefore $\sigma + \theta$ is a nonzero element of $\mathbb{F}_{2^{m/2}}$. Define $d \triangleq \sigma + \theta$. Then 1 can be expressed as
\[ 1 = d^{-1}\sigma + d^{-1}\theta, \tag{6.35} \]
and our objective to express 1 in our basis is reached. By choosing $p_0$ and $q_0$ properly we can place $d$ in any true subfield of $\mathbb{F}_{2^m}$ except $\mathbb{F}_2$.
Our irreducible polynomials can be expressed as $p(x) = (x + \sigma)(x + \sigma^{2^{m/2}})$ and $q(x) = (x + \theta)(x + \theta^{2^{m/2}})$. Since the coefficient of $x$ by definition is 1 in both polynomials we have
\[ 1 = \sigma + \sigma^{2^{m/2}} = \theta + \theta^{2^{m/2}}. \tag{6.36} \]
Using Equations 6.35 and 6.36 we find the equality
\[ \sigma\theta = \frac{q_0}{d}\sigma + \frac{p_0}{d}\theta, \tag{6.37} \]
and our objective to express $\sigma\theta$ in our basis is reached.
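Using both Equations 6.35 and 6.37 in Equation 6.34 and identifying in $\gamma = c_0 s\sigma + c_1 t\theta$ we get
\[ c_0 = s\Bigl(1 + \frac{p_0}{d}\Bigr) a_0 b_0 + t\frac{q_0}{d} a_0 b_1 + t\frac{q_0}{d} a_1 b_0 + \frac{t^2}{s}\frac{q_0}{d} a_1 b_1, \tag{6.38} \]
\[ c_1 = \frac{s^2}{t}\frac{p_0}{d} a_0 b_0 + s\frac{p_0}{d} a_0 b_1 + s\frac{p_0}{d} a_1 b_0 + t\Bigl(1 + \frac{q_0}{d}\Bigr) a_1 b_1. \tag{6.39} \]
Assuming $\gamma = 1$, we get $c_0 = (ds)^{-1}$ and $c_1 = (dt)^{-1}$. Solving Equations 6.38 and 6.39 for this case gives us
\[ b_0 = \frac{(1 + d^{-1}) \cdot a_0 + d^{-1} t s^{-1} \cdot a_1}{(s p_0^{1/2} \cdot a_0 + t q_0^{1/2} \cdot a_1)^2 + std \cdot a_0 \cdot a_1}, \tag{6.40} \]
\[ b_1 = \frac{d^{-1} s t^{-1} \cdot a_0 + (1 + d^{-1}) \cdot a_1}{(s p_0^{1/2} \cdot a_0 + t q_0^{1/2} \cdot a_1)^2 + std \cdot a_0 \cdot a_1}. \tag{6.41} \]
These equations define the inverter in Figure 6.9a, using at most seven multiplications by constants in $\mathbb{F}_{2^{m/2}}$.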
The first special case of interest is $q_0 = p_0^2$, $d = p_0$, and $s = t = p_0^{-1}$. Rewriting Equations 6.40 and 6.41 for this case gives us
\[ b_0 = \frac{a_0 + p_0^{-1}(a_0 + a_1)}{p_0^{-1}(a_0 + a_1) \cdot a_0 + a_1^2}, \tag{6.42} \]
\[ b_1 = \frac{a_1 + p_0^{-1}(a_0 + a_1)}{p_0^{-1}(a_0 + a_1) \cdot a_0 + a_1^2}. \tag{6.43} \]
These equations define the inverter in Figure 6.9b, using one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$.
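The first special case can also be tried out in the smallest field that admits a type III basis. The sketch below assumes $m = 4$, the subfield $\mathbb{F}_4$ encoded as 2-bit integers modulo $x^2 + x + 1$, and $p_0 = \omega$, so that $q_0 = p_0^2 = \omega^2$, $d = p_0$, and $s = t = p_0^{-1}$. It multiplies with the general Equations 6.38 and 6.39 and inverts with Equations 6.42 and 6.43; the field encoding and all names are assumptions made for the illustration.

```python
def f4_mul(a, b):
    """Multiply in F_4, elements encoded as 2-bit integers mod x^2 + x + 1."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    if r & 0b100:
        r ^= 0b111
    return r


def f4_inv(a):
    """Inverse of a nonzero element of F_4."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    return f4_mul(a, a)


def prod(*xs):
    """Product of several elements of F_4."""
    r = 1
    for x in xs:
        r = f4_mul(r, x)
    return r


P0 = 2                       # omega
Q0 = f4_mul(P0, P0)          # q0 = p0^2
D = P0                       # d = p0
S = T = f4_inv(P0)           # s = t = p0^-1
ONE = (f4_inv(f4_mul(D, S)), f4_inv(f4_mul(D, T)))   # ((ds)^-1, (dt)^-1)


def mul_type3(a, b):
    """Equations 6.38 and 6.39."""
    (a0, a1), (b0, b1) = a, b
    di = f4_inv(D)
    cross = f4_mul(a0, b1) ^ f4_mul(a1, b0)
    c0 = (prod(S, 1 ^ f4_mul(P0, di), a0, b0) ^ prod(T, Q0, di, cross)
          ^ prod(T, T, f4_inv(S), Q0, di, a1, b1))
    c1 = (prod(S, S, f4_inv(T), P0, di, a0, b0) ^ prod(S, P0, di, cross)
          ^ prod(T, 1 ^ f4_mul(Q0, di), a1, b1))
    return c0, c1


def inv_type3(a):
    """Equations 6.42 and 6.43."""
    a0, a1 = a
    u = f4_mul(f4_inv(P0), a0 ^ a1)              # p0^-1 (a0 + a1)
    den_inv = f4_inv(f4_mul(u, a0) ^ f4_mul(a1, a1))
    return f4_mul(a0 ^ u, den_inv), f4_mul(a1 ^ u, den_inv)


alpha = (1, 0)
print(ONE, inv_type3(alpha), mul_type3(alpha, inv_type3(alpha)))
```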
Figure 6.9: Architectures for inversion in $\mathbb{F}_{2^m}$ using a basis of type III over $\mathbb{F}_{2^{m/2}}$. Architecture (a) for arbitrary $s$ and $t$ based on Equations 6.40 and 6.41. (b) for $q_0 = p_0^2$ and $s = t = p_0^{-1}$ based on Equations 6.42 and 6.43. (c) for $q_0 = p_0 + 1$ and $s = t = d^2$ based on Equations 6.44 and 6.45.

Figure 6.10: Architectures for multiplication in $\mathbb{F}_{2^m}$ using a basis of type III over $\mathbb{F}_{2^{m/2}}$. (a) Architecture for arbitrary $s$ and $t$ based on Equations 6.46 and 6.47. (b) Architecture for $q_0 = p_0^2$ and $s = t = p_0^{-1}$ based on Equations 6.48 and 6.49. (c) Architecture for $q_0 = p_0 + 1$ and $s = t = d^{-1}$ based on Equations 6.50 and 6.51.
The second special case of interest is when we have $q_0 = p_0 + 1$. Then $d$ is a primitive element in $\mathbb{F}_4$. Let $s = t = d^2$ hold. Rewriting Equations 6.40 and 6.41 for this case gives us the equations
\[ b_0 = \frac{d a_0 + (a_0 + a_1)}{\bigl(d a_1 + d p_0^{1/2}(a_0 + a_1)\bigr)^2 + a_0 a_1}, \tag{6.44} \]
\[ b_1 = \frac{d a_1 + (a_0 + a_1)}{\bigl(d a_1 + d p_0^{1/2}(a_0 + a_1)\bigr)^2 + a_0 a_1}, \tag{6.45} \]
defining the inverter in Figure 6.9c. This architecture seems to use three multiplications by constants in $\mathbb{F}_{2^{m/2}}$. However, two of these constants equal $d$ which, as we have noted, is in $\mathbb{F}_4$. This multiplication by a constant is simpler than multiplication by an arbitrary constant element in $\mathbb{F}_{2^{m/2}}$, as we shall see further on. Hence the architecture uses one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$ and two multiplications by constants in $\mathbb{F}_4$.
As in Sections 6.2 and 6.3 we need architectures for multiplication, squaring, and multiplication by a constant using the same type of basis.
6.4.2 Multiplication
Multiplication in $\mathbb{F}_{2^m}$ can be performed in the following way. Rewrite Equations 6.38 and 6.39 as
\[ c_0 = s^{-1} q_0 d^{-1} \cdot (s a_0 + t a_1) \cdot (s b_0 + t b_1) + s^{-1} d \cdot s a_0 \cdot s b_0, \tag{6.46} \]
\[ c_1 = t^{-1} p_0 d^{-1} \cdot (s a_0 + t a_1) \cdot (s b_0 + t b_1) + t^{-1} d \cdot t a_1 \cdot t b_1. \tag{6.47} \]
These equations define the multiplier in Figure 6.10a, which among other things uses four additions and at most eight multiplications by constants in $\mathbb{F}_{2^{m/2}}$.
For the special case when we have $q_0 = p_0^2$, $d = p_0$, and $s = t = p_0^{-1}$, we can rewrite Equations 6.46 and 6.47 as
\[ c_0 = (a_0 + a_1) \cdot (b_0 + b_1) + a_0 \cdot b_0, \tag{6.48} \]
\[ c_1 = p_0^{-1} \cdot (a_0 + a_1) \cdot (b_0 + b_1) + a_1 \cdot b_1, \tag{6.49} \]
defining the multiplier in Figure 6.10b. This architecture uses four additions and one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$.
For the special case when we have $q_0 = p_0 + 1$, $s = t = d^2$ with $d$ being a primitive element in $\mathbb{F}_4$, we can rewrite Equations 6.46 and 6.47 as
\[ c_0 = (a_0 + a_1)(b_0 + b_1)(1 + p_0) d + a_0 \cdot b_0, \tag{6.50} \]
\[ c_1 = (a_0 + a_1)(b_0 + b_1) p_0 d + a_1 \cdot b_1, \tag{6.51} \]
defining the multiplier in Figure 6.10c, using five additions, one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$, and one multiplication by a constant in $\mathbb{F}_4$.
For multiplication by a constant we assume that $\beta$ is the constant element and rewrite Equations 6.38 and 6.39 as
\[ c_0 = \Bigl( s\bigl(1 + \frac{p_0}{d}\bigr) b_0 + t\frac{q_0}{d} b_1 \Bigr) a_0 + \Bigl( t\frac{q_0}{d} b_0 + \frac{t^2}{s}\frac{q_0}{d} b_1 \Bigr) a_1, \tag{6.52} \]
\[ c_1 = \Bigl( \frac{s^2}{t}\frac{p_0}{d} b_0 + s\frac{p_0}{d} b_1 \Bigr) a_0 + \Bigl( s\frac{p_0}{d} b_0 + t\bigl(1 + \frac{q_0}{d}\bigr) b_1 \Bigr) a_1. \tag{6.53} \]
These equations define the architecture in Figure 6.11a, using two adders and at most four multiplications by constants in $\mathbb{F}_{2^{m/2}}$. This architecture is the same as for type I and II bases, only the constants differ.
We have not been able to find any improvement on multiplication by a constant for the special case when we have $q_0 = p_0^2$, $d = p_0$, and $s = t = p_0^{-1}$. However, for the special case when we have $q_0 = p_0 + 1$, $s = t$ with $d$ being a primitive element in $\mathbb{F}_4$, we can rewrite Equations 6.52 and 6.53 as
\[ c_0 = s p_0 d^2 \cdot (b_0 + b_1) \cdot (a_0 + a_1) + s b_1 \cdot a_1 + s\bigl(b_0 + d^2 b_1\bigr) \cdot \bigl(a_0 + d^2 a_1\bigr), \tag{6.54} \]
\[ c_1 = s p_0 d^2 \cdot (b_0 + b_1) \cdot (a_0 + a_1) + d \cdot s b_1 \cdot a_1. \tag{6.55} \]
These equations define the architecture in Figure 6.11b, using five adders and at most three multiplications by constants in $\mathbb{F}_{2^{m/2}}$. It also uses two multiplications by constants in $\mathbb{F}_4$.
6.4.3 Squaring
For squaring we set β = α in Equations 6.38 and 6.39. Then we have
$c_0 = s(1 + p_0 d^{-1}) \cdot a_0^2 + t^2 s^{-1} q_0 d^{-1} \cdot a_1^2,$ (6.56)
$c_1 = s^2 t^{-1} p_0 d^{-1} \cdot a_0^2 + t(1 + q_0 d^{-1}) \cdot a_1^2,$ (6.57)
which gives us the architecture in Figure 6.12a. This architecture uses two additions and at most four multiplications by constants in $\mathbb{F}_{2^{m/2}}$.
6.4. Arithmetic Using Bases of Type III 101
Figure 6.11: Architectures for multiplication by a constant in $\mathbb{F}_{2^m}$ using a basis of type III over $\mathbb{F}_{2^{m/2}}$. (a) Architecture for arbitrary s and t based on Equations 6.52 and 6.53. (b) Architecture for $q_0 = p_0 + 1$ and $s = t$ based on Equations 6.54 and 6.55.
Figure 6.12: Architectures for squaring in $\mathbb{F}_{2^m}$ using a basis of type III over $\mathbb{F}_{2^{m/2}}$. Architecture (a) for arbitrary s and t based on Equations 6.56 and 6.57; (b) for $q_0 = p_0^2$ and $s = t = p_0^{-1}$ based on Equations 6.58 and 6.59; (c) for $q_0 = p_0 + 1$ and $s = t = d^2$ based on Equations 6.60 and 6.61.
For the special case when we have $q_0 = p_0^2$, $d = p_0$, and $s = t = p_0^{-1}$, we can rewrite Equations 6.56 and 6.57 as
$c_0 = a_1^2,$ (6.58)
$c_1 = p_0^{-1} (a_0^2 + a_1^2) + a_1^2,$ (6.59)
defining the architecture in Figure 6.12b. This architecture uses two additions and one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$.
For the special case when we have $q_0 = p_0 + 1$ and $s = t = d^2$, with $d$ being a primitive element in F4, we can rewrite Equations 6.56 and 6.57 as
$c_0 = d p_0 (a_0^2 + a_1^2) + a_1^2 + d^2 (a_0^2 + a_1^2),$ (6.60)
$c_1 = d p_0 (a_0^2 + a_1^2) + a_1^2,$ (6.61)
defining the architecture in Figure 6.12c. It uses three additions and one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$. It also uses one multiplication by a constant in F4.
6.5 Arithmetic in F4
Our objective is to compare architectures for inversion in tower fields using bases of our three types. We primarily aim to analyze the architectures for inversion derived in Sections 6.2 through 6.4 using the cost measures outlined in Chapter 3. Since these architectures for $\mathbb{F}_{2^m}$ use arithmetic operations in $\mathbb{F}_{2^{m/2}}$, we will get recursive equations for all measures. When solving these equations we will need corresponding measures for some small subfield. We therefore start the analysis by considering F4. As a byproduct of the analysis of inversion we also get analyses of all arithmetic operations for which we have derived architectures in Sections 6.2 through 6.4.
We noted in Section 6.1 that there are no bases of type III of F4 over F2. Therefore we need to treat this field separately to make the comparison fair. There are essentially two ways to choose the representation of F4 and what architectures to use for arithmetic operations in F4. The first alternative is to fix the representation and the architectures without considering the consequences in larger fields. This is a fast and simple method, but a blindfolded one. The most obvious risk is that we choose a representation that favours one of our types of bases of the larger fields. The second alternative is to use the representation of F4 and the architectures for F4 that are best for each situation. This is the fairest comparison, but also the most cumbersome. In this thorough investigation we need to consider all representations of F4 and all interesting architectures for arithmetic operations using these representations, for all three types of bases of the larger fields and for all interesting architectures for arithmetic operations in these larger fields. We choose to use this second alternative.
There are three ways to choose two distinct nonzero elements of F4. All three choices are valid bases of F4 over F2. Two of them are polynomial bases and one is a normal basis. The two polynomial bases are equivalent in the sense that they give rise to equal architectures, due to the fact that they are generated by the same irreducible polynomial. Hence we have the following two alternatives to consider.
• The polynomial basis $\{1, \varepsilon\}$, and
• the normal basis $\{\varepsilon, \varepsilon^2\}$,
where ε is a root of $x^2 + x + 1$.
6.5.1 Polynomial Basis Representation
The polynomial basis is a basis of type I. Therefore the equations corresponding to the arithmetic operations in F4 using this basis are special cases of Equations 6.1 to 6.14 of Section 6.2 with $s = t = q_0 = 1$.
Let α, β, and γ be elements of F4, satisfying αβ = γ as in Section 6.2. Let (a0, a1), (b0, b1), and (c0, c1) be the representations of α, β, and γ respectively expressed in the polynomial basis. Setting $s = t = q_0 = 1$ in Equations 6.1 and 6.2 we get
c0 = a0b0 + a1b1, (6.62)
c1 = a0b1 + a1b0 + a1b1, (6.63)
for multiplication. This defines architecture A in Figure 6.13a.
Figure 6.13: Architecture of (a) multiplier A, (b) multiplier B, (c)(d) multiplication by the constants ε and $\varepsilon^2$ respectively, and (e) inversion/squaring, for F4 using a polynomial basis over F2 defined by the irreducible polynomial $p(x) = x^2 + x + 1$ over F2. The bold lines denote critical paths.
Rewriting Equations 6.62 and 6.63 as
c0 = a0b0 + a1b1, (6.64)
c1 = (a0 + a1)(b0 + b1) + a0b0, (6.65)
gives us architecture B in Figure 6.13b, which is a special case of the architecture in Figure 6.2b.
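The equivalence of the two multipliers can be checked exhaustively. The following is a minimal sketch, assuming elements of F4 are stored as bit pairs (a0, a1) in the polynomial basis; the function names are hypothetical and chosen only for illustration.

```python
from itertools import product

def mul_f4_pb_a(a, b):
    """Multiplier A: Equations 6.62 and 6.63 (four bit products)."""
    a0, a1 = a
    b0, b1 = b
    c0 = (a0 & b0) ^ (a1 & b1)
    c1 = (a0 & b1) ^ (a1 & b0) ^ (a1 & b1)
    return (c0, c1)

def mul_f4_pb_b(a, b):
    """Multiplier B: Equations 6.64 and 6.65 (three bit products)."""
    a0, a1 = a
    b0, b1 = b
    low = a0 & b0                      # shared product a0*b0
    c0 = low ^ (a1 & b1)
    c1 = ((a0 ^ a1) & (b0 ^ b1)) ^ low
    return (c0, c1)

# All four elements of F4 as coordinate pairs over F2.
F4 = list(product((0, 1), repeat=2))

# Both formulations agree on all 16 operand pairs.
assert all(mul_f4_pb_a(a, b) == mul_f4_pb_b(a, b) for a in F4 for b in F4)
```

In this sketch, form B trades one bit product for two extra additions on the inputs, which mirrors the rewriting step from Equations 6.62 and 6.63 to Equations 6.64 and 6.65.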
For multiplication by a constant we assume that β is the constant element and rewrite Equations 6.62 and 6.63 as
c0 = b0a0 + b1a1, (6.66)
c1 = b1a0 + (b0 + b1)a1. (6.67)
The two interesting cases are when β is ε or $\varepsilon^2$. For β = ε we have (b0, b1) = (0, 1), which gives us
c0 = a1, (6.68)
c1 = a0 + a1. (6.69)
For β = $\varepsilon^2$ we have (b0, b1) = (1, 1), which gives us
c0 = a0 + a1, (6.70)
c1 = a0. (6.71)
Equations 6.68 through 6.71 define the simple architectures in Figure 6.13c and d.
For squaring we have α = β and Equations 6.62 and 6.63 become
$c_0 = a_0^2 + a_1^2,$ (6.72)
$c_1 = a_1^2.$ (6.73)
Now a0 and a1 are elements of F2 and hence the squares of these elements are the elements themselves. Therefore squaring is given by
c0 = a0 + a1, (6.74)
c1 = a1. (6.75)
Equations 6.74 and 6.75 define the architecture in Figure 6.13e.
For inversion we use the fact that $\alpha^3 = 1$ holds for all nonzero α in F4. Therefore we have $\alpha^{-1} = \alpha^3 \alpha^{-1} = \alpha^2$ and inversion is in fact given by the same equations as squaring in F4. Hence we use the same architecture for inversion as for squaring.
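As a small check of this identity, the sketch below multiplies every nonzero element of F4 by its square and confirms that the result is 1. It assumes the bit-pair representation (a0, a1) in the polynomial basis; the helper names are illustrative only.

```python
def mul_f4_pb(a, b):
    """Product in the polynomial basis, Equations 6.62 and 6.63."""
    a0, a1 = a
    b0, b1 = b
    return ((a0 & b0) ^ (a1 & b1),
            (a0 & b1) ^ (a1 & b0) ^ (a1 & b1))

def square_f4_pb(a):
    """Squaring (and hence inversion) in the polynomial basis, Equations 6.74 and 6.75."""
    a0, a1 = a
    return (a0 ^ a1, a1)

ONE = (1, 0)                              # the element 1 in the basis {1, eps}
NONZERO = [(1, 0), (0, 1), (1, 1)]        # 1, eps, eps^2

# alpha * alpha^2 = alpha^3 = 1 for every nonzero alpha.
for alpha in NONZERO:
    assert mul_f4_pb(alpha, square_f4_pb(alpha)) == ONE
```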
6.5.2 Normal Basis Representation
The normal basis is a basis of type II. Therefore the equations corresponding to the arithmetic operations in F4 using this basis are special cases of Equations 6.15 to 6.28 of Section 6.3 with $s = t = p_0 = 1$.
Let α, β, and γ still be elements of F4, satisfying αβ = γ as in Section 6.3. Let (a0, a1), (b0, b1), and (c0, c1) be the representations of α, β, and γ respectively expressed in the normal basis. Setting $s = t = p_0 = 1$ in Equations 6.15 and 6.16 we get
c0 = a0b1 + a1b0 + a1b1, (6.76)
c1 = a0b1 + a1b0 + a0b0, (6.77)
for multiplication. This defines architecture C in Figure 6.14a. Rewriting Equations 6.76 and 6.77 as
c0 = (a0 + a1)(b0 + b1) + a0b0, (6.78)
c1 = (a0 + a1)(b0 + b1) + a1b1, (6.79)
gives us architecture D in Figure 6.14b, which is a special case of the architecture in Figure 6.6b.
For multiplication by a constant we assume that β is the constant element and rewrite Equations 6.76 and 6.77 as
c0 = b1a0 + (b0 + b1)a1, (6.80)
c1 = (b0 + b1)a0 + b0a1. (6.81)
The two interesting cases are still when β is ε or $\varepsilon^2$. For β = ε we have (b0, b1) = (1, 0), which gives us
c0 = a1, (6.82)
c1 = a0 + a1. (6.83)
For β = $\varepsilon^2$ we have (b0, b1) = (0, 1), which gives us
c0 = a0 + a1, (6.84)
c1 = a0. (6.85)
Equations 6.82 through 6.85 define the simple architectures in Figure 6.14c and d. It is notable that these equations are exactly the same as Equations 6.68 through 6.71 for polynomial basis representation.
For squaring and inversion we have α = β and Equations 6.76 and 6.77 become
c0 = a1, (6.86)
c1 = a0, (6.87)
where we again have used the fact that a0 and a1 are elements of F2. Squaring and inversion are therefore a cyclic shift, which can be hardwired as in Figure 6.14e.
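The normal basis equations can be checked in the same exhaustive way. This is a minimal sketch, assuming elements are stored as bit pairs (a0, a1) in the basis {ε, ε²}, so that the element 1 = ε + ε² has coordinates (1, 1); the function names are hypothetical.

```python
from itertools import product

def mul_f4_nb_c(a, b):
    """Multiplier C: Equations 6.76 and 6.77."""
    a0, a1 = a
    b0, b1 = b
    cross = (a0 & b1) ^ (a1 & b0)
    return (cross ^ (a1 & b1), cross ^ (a0 & b0))

def mul_f4_nb_d(a, b):
    """Multiplier D: Equations 6.78 and 6.79."""
    a0, a1 = a
    b0, b1 = b
    mixed = (a0 ^ a1) & (b0 ^ b1)
    return (mixed ^ (a0 & b0), mixed ^ (a1 & b1))

def square_f4_nb(a):
    """Squaring and inversion as a cyclic shift, Equations 6.86 and 6.87."""
    return (a[1], a[0])

F4 = list(product((0, 1), repeat=2))
ONE = (1, 1)   # 1 = eps + eps^2 in the normal basis

# Architectures C and D compute the same product.
assert all(mul_f4_nb_c(a, b) == mul_f4_nb_d(a, b) for a in F4 for b in F4)

# The cyclic shift really is the inverse of every nonzero element.
assert all(mul_f4_nb_c(a, square_f4_nb(a)) == ONE for a in F4 if a != (0, 0))
```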
Figure 6.14: Architecture of (a) multiplier C, (b) multiplier D, (c)(d) multiplication by the constants ε and $\varepsilon^2$ respectively, and (e) inversion/squaring, for F4 using a normal basis over F2 defined by the irreducible polynomial $p(x) = x^2 + x + 1$ over F2. The bold lines denote critical paths.
6.5.3 Properties of Arithmetic in F4
The properties of all architectures depend on the properties of the operations in F2. The assumed implementations of multiplication and addition in F2 are given in Figures 3.1 and 3.2 on page 16. The properties of these operations have already been given in Table 3.1 on page 28.
The properties of our architectures for arithmetic operations in F4 are gathered in Table 6.1a for polynomial basis representation and in Table 6.1b for normal basis representation. The tables also include the measures of F4-buffers and F4-adders. They are implemented in the only natural way, as two parallel F2-counterparts.
The measures are given assuming that inverter chains are used in the critical path whenever that minimizes the delay. That is the case only for multiplier architecture C.
(a) Polynomial basis representation
Arithmetic operation               c_in   t_int   r_out   a
Buffer                             2      2       1       8
Addition                           4      2       2       24
Multiplication (Architecture A)    4      20      2       60
Multiplication (Architecture B)    6      16      2       66
Multiplication by a constant       6      2       2       16
Squaring/Inversion                 6      2       2       16

(b) Normal basis representation
Arithmetic operation               c_in   t_int   r_out   a
Buffer                             2      2       1       8
Addition                           4      2       2       24
Multiplication (Architecture C)    4      24      2       66
Multiplication (Architecture D)    6      20      2       66
Multiplication by a constant       6      2       2       16
Squaring/Inversion                 c_L    0       0       0

Table 6.1: The normalized properties of arithmetic operations in F4 using (a) polynomial basis representation and (b) normal basis representation, where $c_L$ is the normalized load capacitance.
Figure 6.15: Architecture of multiplication by a constant in F16. The individual constants in F4 are assumed to be nonzero since that is the worst case with respect to area. (a) General architecture. The shaded areas represent hardwired connections given by the corresponding constant. (b) Example that illustrates the worst case with respect to both area and delay with $k_0 = k_1 = k_2 = k_3 = \varepsilon$. The bold lines denote critical paths.
6.6 Arithmetic in F16
We can make a few simplifications in F16. These simplifications are based on the fact that squaring and inversion are the same thing in F4 and on the simple architecture for squaring and multiplication by a constant for F4. As before, (a0, a1), (b0, b1), and (c0, c1) represent the elements α, β, and γ of the field F16 over its coefficient field F4. The coefficients in F2 are indicated by a second index. For instance, $a_i$ is represented by the pair $(a_{i0}, a_{i1})$. Some architectures in this section are given using operations in F2.
6.6.1 Multiplication by a Constant
Consider the architectures given in Figures 6.3a, 6.7a, and 6.11a, for the three different types of bases. They are all defined by the equations
c0 = k0 · a0 + k1 · a1, (6.88)
c1 = k2 · a0 + k3 · a1, (6.89)
where $k_i$, $i = 0, \ldots, 3$, are constants in F4 given by the constant element β ∈ F16 and the chosen basis of F16 over F4. The only difference between the three versions of the architecture is the actual values of the constants. Each input signal feeds two multiplications by constants in F4. We could use the simple architecture in Figures 6.13cd and 6.14cd for this multiplication in F4. However, the only difference between multiplying by one constant and by another in F4 is the connection of the output signals. Hence, we can basically use the same architecture for both multiplications by constants of the same input signal. Therefore the multiplication by a constant in F16 can be performed by the architecture given in Figure 6.15a, assuming the worst case with respect to area requirement, which is when all $k_i$ are nonzero. A special case of the more general architecture in Figure 6.15a is given in Figure 6.15b with $k_0 = k_1 = k_2 = k_3 = \varepsilon$. The example in Figure 6.15b is a worst case with respect to both area and delay.
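To make the role of the constants concrete, the sketch below derives $k_0, \ldots, k_3$ for one particular choice of basis and checks Equations 6.88 and 6.89 against a full multiplication. It rests on assumptions not stated in this section: F16 is taken in the type I polynomial basis {1, ω} with ω² = ω + ε and s = t = 1, F4 is taken in its polynomial basis, and all function names are hypothetical.

```python
from itertools import product

EPS = (0, 1)  # eps in the polynomial basis {1, eps} of F4

def add4(x, y):
    return (x[0] ^ y[0], x[1] ^ y[1])

def mul4(x, y):
    """F4 product, Equations 6.62 and 6.63."""
    return ((x[0] & y[0]) ^ (x[1] & y[1]),
            (x[0] & y[1]) ^ (x[1] & y[0]) ^ (x[1] & y[1]))

def mul16(a, b):
    """Assumed type I product in F16, using w^2 = w + eps."""
    (a0, a1), (b0, b1) = a, b
    high = mul4(a1, b1)
    c0 = add4(mul4(a0, b0), mul4(EPS, high))
    c1 = add4(add4(mul4(a0, b1), mul4(a1, b0)), high)
    return (c0, c1)

def constants(beta):
    """F4 constants k0..k3 of Equations 6.88 and 6.89 for this basis."""
    b0, b1 = beta
    return (b0, mul4(EPS, b1), b1, add4(b0, b1))

def mul16_by_constant(a, ks):
    """Equations 6.88 and 6.89."""
    a0, a1 = a
    k0, k1, k2, k3 = ks
    return (add4(mul4(k0, a0), mul4(k1, a1)),
            add4(mul4(k2, a0), mul4(k3, a1)))

F4 = list(product((0, 1), repeat=2))
F16 = list(product(F4, repeat=2))

# The constant-multiplier form agrees with the full product for every beta.
assert all(mul16_by_constant(a, constants(b)) == mul16(a, b)
           for a in F16 for b in F16)
```

For other basis types only the body of `constants` would change, which matches the remark that the three versions of the architecture differ only in the values of the constants.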
6.6.2 Squaring
The best basis of type I for F16 is the polynomial basis defined by a root of $x^2 + x + \varepsilon$. Equations 6.13 and 6.14 then become
$c_0 = a_0^2 + \varepsilon \cdot a_1^2,$ (6.90)
$c_1 = a_1^2.$ (6.91)
Expressing Equations 6.90 and 6.91 over F2 assuming a polynomial basis representation of F4 gives us
c00 = a00 + a01 + a11, (6.92)
c01 = a01 + a10, (6.93)
c10 = a10 + a11, (6.94)
c11 = a11. (6.95)
Assuming a normal basis representation of F4 instead gives us
c00 = a01 + a10, (6.96)
c01 = a00 + a10 + a11, (6.97)
c10 = a11, (6.98)
c11 = a10. (6.99)
The corresponding architectures are given in Figure 6.16a and b respectively. The buffers reduce the input capacitance and remove the load capacitance from the internal delay.
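The bit-level equations can be derived mechanically from Equations 6.90 and 6.91 by substituting the F4 squaring and constant-multiplication rules of Section 6.5. The following sketch does this for both representations of F4 and compares the result with Equations 6.92 through 6.99 on all 16 inputs; the function names are illustrative only.

```python
from itertools import product

def eps_mul(x):
    """Multiplication by eps; the same wiring in both F4 bases
    (Equations 6.68-6.69 and 6.82-6.83)."""
    return (x[1], x[0] ^ x[1])

def sq_pb(x):
    """F4 squaring in the polynomial basis, Equations 6.74 and 6.75."""
    return (x[0] ^ x[1], x[1])

def sq_nb(x):
    """F4 squaring in the normal basis, Equations 6.86 and 6.87."""
    return (x[1], x[0])

def square16_type1(a0, a1, sq4):
    """Equations 6.90 and 6.91: c0 = a0^2 + eps*a1^2, c1 = a1^2."""
    s0, s1 = sq4(a0), sq4(a1)
    e = eps_mul(s1)
    return (s0[0] ^ e[0], s0[1] ^ e[1], s1[0], s1[1])

def bits_pb(a00, a01, a10, a11):
    """Equations 6.92 through 6.95."""
    return (a00 ^ a01 ^ a11, a01 ^ a10, a10 ^ a11, a11)

def bits_nb(a00, a01, a10, a11):
    """Equations 6.96 through 6.99."""
    return (a01 ^ a10, a00 ^ a10 ^ a11, a11, a10)

for a00, a01, a10, a11 in product((0, 1), repeat=4):
    a0, a1 = (a00, a01), (a10, a11)
    assert square16_type1(a0, a1, sq_pb) == bits_pb(a00, a01, a10, a11)
    assert square16_type1(a0, a1, sq_nb) == bits_nb(a00, a01, a10, a11)
```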
The best basis of type II for F16 is the normal basis defined by a root of $x^2 + x + \varepsilon$. Equations 6.27 and 6.28 then become
$c_0 = \varepsilon^2 \cdot a_0^2 + \varepsilon \cdot a_1^2,$ (6.100)
$c_1 = \varepsilon \cdot a_0^2 + \varepsilon^2 \cdot a_1^2.$ (6.101)
Expressing Equations 6.100 and 6.101 over F2 assuming a polynomial basis representation of F4 gives us
c00 = a00 + a11, (6.102)
c01 = a00 + a01 + a10, (6.103)
c10 = a01 + a10, (6.104)
c11 = a00 + a10 + a11. (6.105)
Assuming a normal basis representation of F4 instead gives us
c00 = a00 + a01 + a10, (6.106)
c01 = a01 + a10 + a11, (6.107)
c10 = a00 + a10 + a11, (6.108)
c11 = a00 + a01 + a11. (6.109)
The corresponding architectures are given in Figure 6.16c and d respectively. Again the buffers reduce the input capacitance and remove the load capacitance from the internal delay.
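The same substitution applies to the type II case. This sketch expands Equations 6.100 and 6.101 over F2 for both representations of F4 and compares the outcome with Equations 6.102 through 6.109; as before, the helper names are hypothetical.

```python
from itertools import product

def eps_mul(x):
    """Multiplication by eps (Equations 6.68-6.69 and 6.82-6.83)."""
    return (x[1], x[0] ^ x[1])

def eps2_mul(x):
    """Multiplication by eps^2 (Equations 6.70-6.71 and 6.84-6.85)."""
    return (x[0] ^ x[1], x[0])

def sq_pb(x):
    return (x[0] ^ x[1], x[1])   # Equations 6.74 and 6.75

def sq_nb(x):
    return (x[1], x[0])          # Equations 6.86 and 6.87

def xor2(x, y):
    return (x[0] ^ y[0], x[1] ^ y[1])

def square16_type2(a0, a1, sq4):
    """Equations 6.100 and 6.101."""
    s0, s1 = sq4(a0), sq4(a1)
    c0 = xor2(eps2_mul(s0), eps_mul(s1))
    c1 = xor2(eps_mul(s0), eps2_mul(s1))
    return c0 + c1               # (c00, c01, c10, c11)

def bits_pb(a00, a01, a10, a11):
    """Equations 6.102 through 6.105."""
    return (a00 ^ a11, a00 ^ a01 ^ a10, a01 ^ a10, a00 ^ a10 ^ a11)

def bits_nb(a00, a01, a10, a11):
    """Equations 6.106 through 6.109."""
    return (a00 ^ a01 ^ a10, a01 ^ a10 ^ a11,
            a00 ^ a10 ^ a11, a00 ^ a01 ^ a11)

for a00, a01, a10, a11 in product((0, 1), repeat=4):
    a0, a1 = (a00, a01), (a10, a11)
    assert square16_type2(a0, a1, sq_pb) == bits_pb(a00, a01, a10, a11)
    assert square16_type2(a0, a1, sq_nb) == bits_nb(a00, a01, a10, a11)
```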
Figure 6.16: Architectures of squaring in F16 assuming the following representations of F16 over F4 and of F4 over F2. PB = polynomial basis. NB = normal basis.
(a) F16: type I with t = s = 1; F4: PB (Eqs. 6.92–6.95)
(b) F16: type I with t = s = 1; F4: NB (Eqs. 6.96–6.99)
(c) F16: type II with t = s = 1; F4: PB (Eqs. 6.102–6.105)
(d) F16: type II with t = s = 1; F4: NB (Eqs. 6.106–6.109)
(e) F16: type III with $t = s = \varepsilon^2$; F4: PB (Eqs. 6.112–6.115)
(f) F16: type III with $t = s = \varepsilon^2$; F4: NB (Eqs. 6.116–6.119)
Buffers are used to reduce delays. The bold lines denote critical paths.
For bases of type III we need two distinct irreducible polynomials of the preferred form over F4. There are exactly two such polynomials, namely $x^2 + x + p_0$ and $x^2 + x + q_0$ with $p_0 = \varepsilon$ and $q_0 = \varepsilon^2$. Here our two special cases considered in Section 6.4 coincide, since $q_0 = p_0^2 = p_0 + 1$ holds and we have $d = \varepsilon$. As in Section 6.4 we assume $s = t = d^{-1} = \varepsilon^2$. Equations 6.56 and 6.57 then become
$c_0 = a_1^2,$ (6.110)
$c_1 = \varepsilon^2 \cdot a_0^2 + \varepsilon \cdot a_1^2.$ (6.111)
Expressing Equations 6.110 and 6.111 over F2 assuming a polynomial basis representation of F4 gives us
c00 = a10 + a11, (6.112)
c01 = a11, (6.113)
c10 = a00 + a11, (6.114)
c11 = a00 + a01 + a10. (6.115)
Assuming a normal basis representation of F4 instead gives us
c00 = a11, (6.116)
c01 = a10, (6.117)
c10 = a00 + a01 + a10, (6.118)
c11 = a01 + a10 + a11. (6.119)
The corresponding architectures are given in Figure 6.16e and f respectively. Again the buffers reduce the input capacitance and remove the load capacitance from the internal delay.
6.6.3 Inversion
For the (type I) polynomial basis defined by a root of $x^2 + x + \varepsilon$ we get
$b_0 = (a_0 + a_1) \left( a_0 + \varepsilon^2 a_1 + (a_0 a_1)^2 \right),$ (6.120)
$b_1 = a_1 \left( a_0 + \varepsilon^2 a_1 + (a_0 a_1)^2 \right),$ (6.121)
from Equations 6.3 and 6.4 on page 88, using the fact that squaring and inversion are the same in F4. This gives us the architecture in Figure 6.17a, which can be seen as a simplified version of the architecture in Figure 6.1c for F16.
Rewriting Equations 6.120 and 6.121 we get
$b_0 = (a_0 + a_1) \left( (a_0 (a_0 + a_1))^2 + \varepsilon^2 a_1 \right),$ (6.122)
$b_1 = a_1 \left( (a_0 (a_0 + a_1))^2 + \varepsilon^2 a_1 \right).$ (6.123)
This gives us the architecture in Figure 6.17b, which can be seen as a simplified version of the architecture in Figure 6.1d for F16.
For the (type II) normal basis defined by a root of $x^2 + x + \varepsilon$ we get
$b_0 = a_1 \left( \varepsilon^2 (a_0 + a_1) + (a_0 a_1)^2 \right),$ (6.124)
$b_1 = a_0 \left( \varepsilon^2 (a_0 + a_1) + (a_0 a_1)^2 \right),$ (6.125)
from Equations 6.17 and 6.18 on page 93, again using the fact that squaring and inversion are the same in F4. This gives us the architecture in Figure 6.17c, which can be seen as a simplified version of the architecture in Figure 6.5b for F16.
For the type III basis above defined by a root of $x^2 + x + \varepsilon$ and a root of $x^2 + x + \varepsilon^2$ we get
$b_0 = (\varepsilon^2 a_0 + a_1) \left( a_0 + \varepsilon^2 a_1 + (a_0 a_1)^2 \right),$ (6.126)
$b_1 = (a_0 + \varepsilon^2 a_1) \left( a_0 + \varepsilon^2 a_1 + (a_0 a_1)^2 \right),$ (6.127)
from Equations 6.40 and 6.41 on page 97, again using the fact that squaring and inversion are the same in F4. This gives us the architecture in Figure 6.17d, which can be seen as a simplified version of the architecture in Figure 6.9c for F16.
6.6.4 Multiplication
We have not been able to find any significant improvement on multiplication in F16 compared to the architectures derived in Sections 6.2 through 6.4. However, we can reduce the input capacitance for multiplication using bases of type II and III by inserting buffers. The placement of these buffers is given in Figure 6.18.
Figure 6.17: Architecture of inversion in F16 using (a)(b) a type I basis with t = s = 1, (c) a type II basis with t = s = 1, and (d) a type III basis with $t = s = \varepsilon^2$. The bold lines denote critical paths.
Figure 6.18: Architecture of multiplication in F16 using (a) a type I basis with t = s = 1, (b) a type II basis with t = s = 1, and (c) a type III basis with $t = s = \varepsilon^2$. The bold lines denote critical paths. The buffers reduce the input capacitance.
Arithmetic operation            Basis of F16   Basis of F4   c_in   t_int   r_out   a
Buffer                          All cases                    2.00   2.00    1.00    16.00
Addition                        All cases                    4.00   2.00    2.00    48.00
Multiplication by a constant    All cases                    6.00   16.00   2.00    92.00
Squaring                        type I         PB            6.00   12.00   2.00    64.00
Squaring                        type I         NB            6.00   12.00   2.00    48.00
Squaring                        type II        PB            6.00   14.93   2.00    74.93
Squaring                        type II        NB            6.00   16.00   2.00    100.00
Squaring                        type III       PB            6.00   12.00   2.00    64.00
Squaring                        type III       NB            6.00   16.00   2.00    50.00

Table 6.2: The normalized properties of buffering, addition, multiplication by a constant, and squaring in F16 for all three types of bases of F16 over F4 and using both polynomial basis (PB) and normal basis (NB) of F4 over F2.
6.6.5 Properties of Arithmetic in F16
The properties of our architectures depend on the properties of the operations in F2 and in some cases on the properties of the operations in F4. The properties of operations in F2 are given in Table 3.1 on page 28, while the properties of the operations in F4 are given in Table 6.1 on page 108.
The properties of buffering, addition, multiplication by a constant, and squaring in F16 are given in Table 6.2. The properties of multiplication in F16 are given in Table 6.3. Finally the properties of inversion in F16 are given in Table 6.4. The properties are given assuming that inverter chains are used in the critical path whenever that minimizes the delay.
We can note that we minimize both the area and the delay of inversion in F16 by choosing normal basis representation of F4 and a type II basis of F16 over F4 with s = t = 1. The area is minimized when we use F4-multiplier C, but the delay is minimized when we use F4-multiplier D.
On the other hand, for multiplication in F16 we minimize both the area and the delay by choosing polynomial basis representation of F4 and a type I basis of F16 over F4 with s = t = 1. The area is minimized when we use F4-multiplier A, but the delay is minimized when we use F4-multiplier B.
                                                    Type I          Type II         Type III
Measure             Basis of F4   Multiplier        Multiplier in   Multiplier in   Multiplier in
                    over F2       in F4             Figure 6.18a    Figure 6.18b    Figure 6.18c
Area                PB            A                 297.46          335.46          330.80
Area                PB            B                 320.93          364.39          359.72
Area                NB            C                 315.46          353.46          348.80
Area                NB            D                 320.93          364.39          359.72
Delay               PB            A                 42.93           56.93           54.93
Delay               PB            B                 38.93           55.86           53.85
Delay               NB            C                 46.93           60.93           58.93
Delay               NB            D                 42.93           59.86           57.85
Input capacitance   PB            A                 8.00            6.00            6.00
Input capacitance   PB            B                 10.00           6.00            6.00
Input capacitance   NB            C                 8.00            6.00            6.00
Input capacitance   NB            D                 10.00           6.00            6.00
Output resistance   All cases                       2.00            2.00            2.00

Table 6.3: The normalized area, delay, input capacitance, and output resistance of multiplication in F16 for all three types of bases of F16 over F4 and using both polynomial basis (PB) and normal basis (NB) of F4 over F2. The smallest numbers for each F16-multiplier are emphasized, and smallest numbers for each choice in F4 are bold.
                                                    Type I         Type I         Type II        Type III
Measure             Basis of F4   Multiplier        Inverter in    Inverter in    Inverter in    Inverter in
                    over F2       in F4             Figure 6.17a   Figure 6.17b   Figure 6.17c   Figure 6.17d
Area                PB            A                 322.93         318.93         298.93         338.93
Area                PB            B                 343.99         343.05         319.99         359.99
Area                NB            C                 314.00         310.00         290.00         330.00
Area                NB            D                 317.06         316.12         293.06         333.06
Delay               PB            A                 74.93          87.86          74.93          74.93
Delay               PB            B                 68.66          82.66          68.66          68.66
Delay               NB            C                 70.00          82.93          70.00          70.00
Delay               NB            D                 63.74          77.74          63.74          63.74
Input capacitance   PB            A                 6.00           6.00           6.00           6.00
Input capacitance   PB            B                 8.00           6.00           8.00           8.00
Input capacitance   NB            C                 6.00           6.00           6.00           6.00
Input capacitance   NB            D                 8.00           6.00           8.00           8.00
Output resistance   All cases                       2.00           2.00           2.00           2.00

Table 6.4: The normalized area, delay, input capacitance, and output resistance of inversion in F16 for all three types of bases of F16 over F4 and using both polynomial basis (PB) and normal basis (NB) of F4 over F2. The smallest numbers for each F16-inverter are emphasized, and smallest numbers for each choice in F4 are bold.
6.7 Properties of Inversion in Tower Fields
We have assumed that we have $m = 2^k$ for some positive integer k. We have also assumed that the field $\mathbb{F}_{2^m}$ is constructed by successive extensions of degree 2, starting with F2. For the analysis we assume that all these extensions are made using the same type of basis. The only exception is F4 since there is no basis of type III of F4 over F2. For all three types of bases of the larger fields we consider both polynomial and normal bases of F4 over F2 and all four multiplier architectures derived in Section 6.6. For each type, we wish to find the smallest and the fastest inverter.
Each architecture for $\mathbb{F}_{2^m}$ in Sections 6.2 through 6.4 uses arithmetic operations in $\mathbb{F}_{2^{m/2}}$. Therefore we can express the properties of these architectures using corresponding properties of architectures for $\mathbb{F}_{2^{m/2}}$. This gives us a set of first order difference equations, and the solution of these equations gives us the explicit properties. The initial conditions of the equations are the properties of our architectures for F16 in Section 6.6. We assume that optimal inverter chains are used whenever they reduce the delay of the critical path. We therefore will use the functions a(r, c) and t(r, c), the normalized area and delay respectively of an optimal inverter chain driven by a normalized resistance r and loaded by a normalized capacitance c, introduced in Section 3.3.1. Buffers are also used to reduce capacitive loads in the critical path to further reduce the delay whenever possible.
Let $c_A(k)$, $r_A(k)$, $t_A(k)$, and $a_A(k)$ denote the normalized input capacitance, output resistance, internal delay, and area respectively of addition in $\mathbb{F}_{2^{2^k}}$. We use similar notations with indices B, CM, and I for the corresponding properties of buffering, multiplication by a constant, and inversion respectively in $\mathbb{F}_{2^{2^k}}$.
Buffering and addition are performed componentwise and therefore it is easy to establish the properties
$c_B(k) = 2$, $c_A(k) = 4$,
$r_B(k) = 1$, $r_A(k) = 2$,
$t_B(k) = 2$, $t_A(k) = 2$,
$a_B(k) = 4 \cdot 2^k$, $a_A(k) = 12 \cdot 2^k$,
for buffering and addition.
For multiplication by a constant using the architecture in Figures 6.3a, 6.7a, and 6.11a on pages 90, 94, and 101 we have the equations
$c_{CM}(k) = 2 c_{CM}(k-1),$
$r_{CM}(k) = r_A(k-1),$
$t_{CM}(k) = t_{CM}(k-1) + t(r_{CM}(k-1), c_A(k-1)) + t_A(k-1) = t_{CM}(k-1) + 10,$
$a_{CM}(k) = 4 a_{CM}(k-1) + 4 a(r_{CM}(k-1), c_A(k-1)) \frac{m}{2} + 2 a_A(k-1) = 4 a_{CM}(k-1) + 12 \cdot 2^k.$
The initial conditions for F16 are found in Table 6.2. They are $c_{CM}(2) = 6$, $r_{CM}(2) = 2$, $t_{CM}(2) = 16$, and $a_{CM}(2) = 92$. The solution is
$c_{CM}(k) = 3 \cdot 2^{k-1},$
$r_{CM}(k) = 2,$
$t_{CM}(k) = 10k - 4,$
$a_{CM}(k) = 8.75 \cdot 4^k - 12 \cdot 2^k.$
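The closed-form solution can be cross-checked by iterating the difference equations directly. The sketch below uses the simplified recurrences (the forms after the second equals sign) and the F16 initial conditions quoted above; the function names are hypothetical.

```python
def cm_by_recurrence(k_max):
    """Iterate the difference equations for multiplication by a constant,
    starting from the F16 values (k = 2) in Table 6.2."""
    c, t, a = 6, 16, 92
    values = {2: (c, t, a)}
    for k in range(3, k_max + 1):
        c = 2 * c                    # c_CM(k) = 2 c_CM(k-1)
        t = t + 10                   # t_CM(k) = t_CM(k-1) + 10
        a = 4 * a + 12 * 2**k        # a_CM(k) = 4 a_CM(k-1) + 12 * 2^k
        values[k] = (c, t, a)
    return values

def cm_closed_form(k):
    """Closed-form solution stated in the text."""
    return (3 * 2**(k - 1), 10 * k - 4, 8.75 * 4**k - 12 * 2**k)

# The recurrence and the closed form agree for every k checked.
for k, triple in cm_by_recurrence(10).items():
    assert triple == cm_closed_form(k)
```

The same pattern, one loop per set of difference equations, would apply to the other architectures once their recurrences are written out.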
Using the same recursive principle we get the properties of all our architectures.
In Section 6.4 we use a multiplication of an element of $\mathbb{F}_{2^{2^k}}$ by a primitive constant in F4. The construction of our field as a tower field can be interpreted as a resulting extension over F4 of degree $2^{k-1}$. Consequently, we can implement this multiplication componentwise over F4 using $2^{k-1}$ copies of the simple architectures in Figures 6.13cd and 6.14cd. As for buffering and addition we can easily establish the properties
$c_{CM,F_4}(k) = 6$, $t_{CM,F_4}(k) = 2$,
$r_{CM,F_4}(k) = 2$, $a_{CM,F_4}(k) = 2^{k+3}$,
for multiplication by a constant in F4. The area, delay, as well as input capacitance, are considerably smaller than the corresponding properties for multiplication by a constant in $\mathbb{F}_{2^{2^k}}$.
Most architectures for the arithmetic operations that we have considered in this chapter have the normalized output resistance 2. However, the different architectures for inversion that we have derived do not have the same input capacitance. Therefore we assume that all inverters are driven from outputs with normalized resistance 2. Any inverter chains that are needed for these input links are included in the area and the delay of the inverters.
6.7.1 Type I Bases
The smallest type I inverter, for all k > 2, is the inverter in Figure 6.1c, using the multiplication by a constant in Figure 6.3b, the squarer in Figure 6.4b, the multiplier in Figure 6.2b, and assuming a polynomial basis of F4. We use the F4 multiplier in Figure 6.13a and the F16 inverter in Figure 6.17b. For this configuration we have the properties
$t_I(k) = 7.6 \cdot k^3 - 8.4 \cdot k^2 + 19.4 \cdot k + 21.5 + 4.4 \sum_{i=4}^{k+1} \log i,$
$a_I(k) = 16.1 \cdot 3^k k + 46.7 \cdot 3^k - 18.5 \cdot 2^k k - 56.3 \cdot 2^k - 18.1.$
The situation is somewhat more complicated for fast type I inverters. For k = 3 the fastest inverter is the inverter in Figure 6.1c, using the multiplication by a constant in Figure 6.3a, the squarer in Figure 6.4b, the multiplier in Figure 6.2b, assuming a normal basis of F4, with the F4 multiplier in Figure 6.14b. Then we have the properties
$t_I(3) = 203.6,$
$a_I(3) = 1781.7.$
The fastest inverter for k = 4 is instead the inverter in Figure 6.1d, still using the multiplication by a constant in Figure 6.3a, the squarer in Figure 6.4b, the multiplier in Figure 6.2b, assuming a normal basis of F4, with the F4 multiplier in Figure 6.14b. Then we have the properties
$t_I(4) = 421.2,$
$a_I(4) = 7492.2.$
For k = 5 the fastest inverter is again the inverter in Figure 6.1c, still using the multiplication by a constant in Figure 6.3a, the squarer in Figure 6.4b, the multiplier in Figure 6.2b, but assuming a polynomial basis of F4, with the F4 multiplier in Figure 6.13b. Then we have the properties
$t_I(5) = 761.8,$
$a_I(5) = 28449.$
The fastest type I inverter, for all k > 5, is the inverter in Figure 6.1d, still using the multiplication by a constant in Figure 6.3a, the squarer in Figure 6.4b, the multiplier in Figure 6.2b, again assuming a polynomial basis of F4, with the F4 multiplier in Figure 6.13b. For this configuration we have the properties
$t_I(k) = 4.8 \cdot k^3 + 2.1 \cdot k^2 + 6.4 \cdot k + 4.4 \sum_{i=5}^{k+2} \log i + 49.7,$
$a_I(k) = 15.4 \cdot 4^k + 70.2 \cdot 3^k + 3.0 \cdot 2^k k - 143.6 \cdot 2^k + 53.7,$
for k > 5.
6.7.2 Type II Bases
The smallest type II inverter, for all k > 2, is the inverter in Figure 6.5b, using the multiplication by a constant in Figure 6.7b, the squarer in Figure 6.8c, the multiplier in Figure 6.6b, and assuming a polynomial basis of F4. We use the F4 multiplier in Figure 6.13a and the F16 inverter in Figure 6.17c. For this configuration we have the properties
$t_I(k) = 9.0 \cdot k^3 - 16.0 \cdot k^2 + 55.7 \cdot k - 33.4,$
$a_I(k) = 17.4 \cdot 3^k k + 13.4 \cdot 3^k + 5.5 \cdot 2^k k - 2.1 \cdot 2^k - 149.1.$
The fastest type II inverter, for all k > 2, is also the inverter in Figure 6.5b, but using the multiplication by a constant in Figure 6.7a, the squarer in Figure 6.8a, the multiplier in Figure 6.6b, and assuming a polynomial basis of F4. We use the F4 multiplier in Figure 6.13b and the F16 inverter in Figure 6.17c. For this configuration we have the properties
$t_I(k) = 4.8 \cdot k^3 + 11.0 \cdot k^2 - 0.6 \cdot k,$
$a_I(k) = 20.5 \cdot 4^k + 79.4 \cdot 3^k - 12.0 \cdot 2^k k - 178.1 \cdot 2^k + 86.4.$
6.7.3 Type III Bases
The smallest type III inverter, for all k > 2, is the inverter in Figure 6.9c, assuming $q_0 = p_0 + 1$ and $s = t = d^{-1}$, using the multiplication by a constant in Figure 6.11b, the squarer in Figure 6.12c, the multiplier in Figure 6.10c, and assuming a polynomial basis of F4. We use the F4 multiplier in Figure 6.13a and the F16 inverter in Figure 6.17d. For this configuration we have the properties
$t_I(k) = 12.3 \cdot k^3 - 32.9 \cdot k^2 + 83.3 \cdot k - 47.5,$
$a_I(k) = 27.2 \cdot 3^k k + 22.0 \cdot 3^k - 13.3 \cdot 2^k k - 69.3 \cdot 2^k + 56.8.$
The fastest type III inverter, for all k > 2, is the inverter in Figure 6.9a, also assuming $q_0 = p_0 + 1$ and $s = t = d^{-1}$, using the multiplication by a constant in Figure 6.11a, the squarer in Figure 6.12a, the multiplier in Figure 6.10c, and assuming a polynomial basis of F4. We use the F4 multiplier in Figure 6.13b and the F16 inverter in Figure 6.17d. For this configuration we have the properties
$t_I(k) = 4.8 \cdot k^3 + 12.0 \cdot k^2 - 7.9 \cdot k + 5.9,$
$a_I(k) = 23.9 \cdot 4^k + 87.9 \cdot 3^k - 12.0 \cdot 2^k k - 175.8 \cdot 2^k - 14.7.$
6.7.4 Best Choices
In Figures 6.19 and 6.20 we have plotted the normalized time and area, respectively, needed for the fastest and smallest architectures for each type of basis, for extension degree $m = 2^k$, 1 ≤ k ≤ 6. We should note that the normalized area of these architectures also serves as an upper bound on the normalized energy of these architectures, since we have n = 1.
The fastest choice, as well as the smallest choice, for k = 1 is trivially the (type II) normal basis inverter in Figure 6.14e on page 107.
The fastest choice for k = 2 is any of the type I inverter in Figure 6.17a, the type II inverter in Figure 6.17c, or the type III inverter in Figure 6.17d, using normal basis representation of F4, with the F4 multiplier in Figure 6.14b. However, the smallest of these three inverters is the type II inverter, and hence, that should be the natural choice when delay is the most important measure. The fastest choices for k > 2 are the fast type I inverters in Section 6.7.1, using a normal basis of F4 for 2 < k < 5 and using a polynomial basis of F4 for k ≥ 5.
Figure 6.19: Normalized time needed for inversion in $\mathbb{F}_{2^m}$ using tower field representations and the architectures considered in this chapter. Plot title: Tower field inverters. Horizontal axis: m. Vertical axis: nt/m.
TF-I-small: Inversion using bases of type I, smallest case.
TF-I-fast: Inversion using bases of type I, fastest case.
TF-II-small: Inversion using bases of type II, smallest case.
TF-II-fast: Inversion using bases of type II, fastest case.
TF-III-small: Inversion using bases of type III, smallest case.
TF-III-fast: Inversion using bases of type III, fastest case.
[Plot "Tower field inverters": a/m^2 versus m for 0 ≤ m ≤ 60, with curves TF-I-small, TF-I-fast, TF-II-small, TF-II-fast, TF-III-small, and TF-III-fast.]
Figure 6.20: Normalized area needed for inversion in F_{2^m} using tower field representations and the architectures considered in this chapter.
TF-I-small – Inversion using bases of type I, smallest case.
TF-I-fast – Inversion using bases of type I, fastest case.
TF-II-small – Inversion using bases of type II, smallest case.
TF-II-fast – Inversion using bases of type II, fastest case.
TF-III-small – Inversion using bases of type III, smallest case.
TF-III-fast – Inversion using bases of type III, fastest case.
The smallest choice for k = 2 is the type II inverter in Figure 6.17c using normal basis representation of F4, with the F4 multiplier in Figure 6.14a. The smallest choice for k = 3 is the type I inverter in Figure 6.1c using the multiplication by a constant in Figure 6.3a, the squarer in Figure 6.4b, and the multiplier in Figure 6.2b, assuming a normal basis of F4, with the F4 multiplier in Figure 6.14a. The smallest choice for 3 < k < 25 is the small type II inverter in Section 6.7.2, where we instead use a polynomial basis of F4. The smallest choice for k ≥ 25 is the small type I inverter in Section 6.7.1, using a polynomial basis of F4. However, such large fields are hardly of any practical interest today.
In Figures 6.19 and 6.20 we have plotted the normalized time and area, respectively, needed for the architectures considered in this chapter, for 2 ≤ m ≤ 65. These plots are based on the cost measures given in Sections 6.7.1 through 6.7.3.
The best choice of representation of F4 varies with the size of the field. For small k the best choice for all three types is normal basis representation of F4, but for large k the best choice is instead a polynomial basis. This has to do with the area and the delay of inversion and multiplication in F4. For any choice of k we only have one F4 inverter, but the number of F4 multipliers grows with k. Therefore the size and delay of an F4 inverter are important for small k, and the size and delay of an F4 multiplier are important for large k. In Table 6.1 on page 108 we see that the smallest polynomial basis multiplier is smaller than the smallest normal basis multiplier in F4. Also, the fastest polynomial basis multiplier is smaller than the fastest normal basis multiplier. On the other hand, the smallest and fastest F4 inverter is the trivial normal basis inverter.
Using a polynomial basis of F4 when a normal basis is the best choice, or a normal basis of F4 when a polynomial basis is the best choice, increases both area and delay slightly. The increase is a few per cent for small values of k, and smaller for large values of k.
Chapter 7
Concluding Remarks
The architectures considered in this thesis can be separated into the following two categories:
1. architectures based on a direct extension of F2, and
2. architectures based on extensions in more than one step.
The architectures can also be separated into the following two categories:
1. sequential architectures, and
2. parallel architectures.
These two ways to characterize the architectures coincide in this thesis. The architectures in Chapters 4 and 5 for polynomial and normal basis representation fall into category 1 of both characterizations, while the architectures in Chapter 6 for tower fields fall into category 2 of both characterizations. This does not necessarily mean that the architectures in these two categories cannot be compared.
7.1 Conclusions
In Figures 7.1 through 7.3 we have plotted the normalized time, area, and energy, respectively, needed for the best architectures considered in Chapters 4 through 6 for 2 ≤ m ≤ 65. Recall that we have neglected the delay of the control logic of the sequential inverters in Chapters 4 and 5. Upon studying those figures, we note that there is no architecture that is the best in all respects.
7.1.1 Fast Inverters
We find fast inverters in all three chapters where we consider inverter architectures, Chapters 4 through 6. More precisely:
• m ∈ {2, 4, 8, 16, 32}: The fastest inverters are found in Chapter 6, where we use tower field representation. For m ∈ {2, 4}, the fastest inverters are based on type II bases. For m ∈ {8, 16, 32}, we should instead prefer type I inverters.
• m ∈ {3, 5, ..., 7, 9, ..., 15, 17, 19, ..., 25, 27, 29, 31}: The fastest inverter is found in Chapter 4, where we use polynomial basis representation. We should prefer the inverter based on the Gauss-Jordan algorithm using Hasan and Bhargava's preprocessor.
• m ∈ {18, 26, 28, 30, 34, ..., 60}: The fastest inverter is found in Chapter 5, where we use normal basis representation. For those values of the extension degree, the fastest choice is Feng's inverter based on accelerated multiplication and squaring.
• m > 60: We do not know the exact values of the smallest Hamming complexity among normal bases for these extension degrees. However, the upper bound given in Theorem 4 on page 64 still guarantees that the time needed for one inversion grows more slowly for Feng's inverter than for all other considered inverters.
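The case distinction above can be written down as a small lookup. The sketch below is only an illustration of how the listed sets partition the extension degrees. The function name and the returned strings are hypothetical, and for m > 60 the result reflects the growth-rate argument rather than a measured optimum.

```python
# Extension degrees as listed in the case distinction of Section 7.1.1.
TOWER_FIELD = {2, 4, 8, 16, 32}
POLYNOMIAL_BASIS = {3, *range(5, 8), *range(9, 16), 17,
                    *range(19, 26), 27, 29, 31}
NORMAL_BASIS = {18, 26, 28, 30, *range(34, 61)}


def fastest_inverter(m):
    """Return a description of the fastest inverter listed for degree m.

    Returns None for degrees that do not appear in any of the listed sets.
    """
    if m > 60:
        return "normal basis: Feng's inverter (by the growth-rate argument)"
    if m in TOWER_FIELD:
        kind = "II" if m in (2, 4) else "I"
        return f"tower field: type {kind} inverter"
    if m in POLYNOMIAL_BASIS:
        return ("polynomial basis: Gauss-Jordan with "
                "Hasan and Bhargava's preprocessor")
    if m in NORMAL_BASIS:
        return "normal basis: Feng's inverter"
    return None


if __name__ == "__main__":
    for m in (2, 8, 13, 18, 33, 64):
        print(m, fastest_inverter(m))
```

Degrees that are not covered by any of the listed sets, such as m = 33, yield None in this sketch.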
[Plot "Fast inverters": nt/m versus m for 0 ≤ m ≤ 60, with curves GJ-1, M&S-3, AOP:GJ-1, TF-I-fast, and TF-II-fast.]
Figure 7.1: Normalized time needed for inversion in F_{2^m} using the fastest architectures for polynomial basis, normal basis, and tower field representations considered in Chapters 4 through 6.
GJ-1 – Polynomial basis inversion based on the Gauss-Jordan algorithm using Hasan and Bhargava's preprocessor.
M&S-3 – Normal basis inversion based on accelerated multiplication and squaring using Feng's idea.
AOP:GJ-1 – Normal basis inversion based on the Gauss-Jordan algorithm using Hasan and Bhargava's preprocessor for the case where the normal basis is generated from an irreducible all-one polynomial, via a basis exchange to the corresponding polynomial basis.
TF-I-fast – Tower field inversion using bases of type I, fastest case.
TF-II-fast – Tower field inversion using bases of type II, fastest case.
7.1.2 Small Inverters
We find the smallest inverters in Chapter 6 for small extension degrees, and in Chapter 4 for large extension degrees. More precisely:
• m ∈ {2, 4}: The smallest inverters are found in Chapter 6, where we use tower field representation. We should prefer the smallest type II inverters.
• m = 3 and m > 4: The smallest inverter is found in Chapter 4, where we use polynomial basis representation. For those values of the extension degree, the smallest choice is our inverter based on the Berlekamp-Massey algorithm.
7.1.3 Low Energy Inverters
We find inverters consuming little energy in all three chapters where we consider inverter architectures, Chapters 4 through 6. More precisely:
• m = 2^k, k ∈ Z+: The least energy consuming inverters are found in Chapter 6, where we use tower field representation. For m = 8, the least energy consuming inverter is the smallest type I inverter. For all other k < 25, the smallest inverter is a type II inverter. For k ≥ 25, we should prefer a type I inverter.
• m ∈ {3, 5, ..., 7, 9, ..., 12, 14, 18, 26}: The least energy consuming inverters are found in Chapter 5, where we use normal basis representation. For m = 3, we prefer our modified version of Mastrovito's inverter. In all other cases, we prefer Feng's inverter.
• All other cases: The least energy consuming inverter is found in Chapter 4, namely our inverter based on the Berlekamp-Massey algorithm, using polynomial basis representation.
[Plot "Small inverters": a/m^2 versus m for 0 ≤ m ≤ 60, with curves BM, M&S-1&2, AOP:BM, TF-I-small, and TF-II-small.]
Figure 7.2: Normalized area needed for inversion in F_{2^m} using the smallest architectures for polynomial basis, normal basis, and tower field representations considered in Chapters 4 through 6.
BM – Polynomial basis inversion based on the Berlekamp-Massey algorithm.
M&S-1&2 – Normal basis inversion based on multiplication and squaring using our modified versions of the ideas of Wang et al. and Mastrovito, respectively.
AOP:BM – Normal basis inversion based on the Berlekamp-Massey algorithm, for the case where the normal basis is generated from an irreducible all-one polynomial, via a basis exchange to the corresponding polynomial basis.
TF-I-small – Tower field inversion using bases of type I, smallest case.
TF-II-small – Tower field inversion using bases of type II, smallest case.
[Plot "Low energy inverters": np/m^2 versus m for 0 ≤ m ≤ 60, with curves BM, M&S-2, M&S-3, AOP:BM, TF-I-small, and TF-II-small.]
Figure 7.3: Normalized energy needed for inversion in F_{2^m} using the architectures needing the smallest amount of energy for polynomial basis, normal basis, and tower field representations considered in Chapters 4 through 6.
BM – Polynomial basis inversion based on the Berlekamp-Massey algorithm.
M&S-2 – Normal basis inversion based on multiplication and squaring using our modified version of the idea of Mastrovito.
M&S-3 – Normal basis inversion based on accelerated multiplication and squaring using Feng's idea.
AOP:BM – Normal basis inversion based on the Berlekamp-Massey algorithm, for the case where the normal basis is generated from an irreducible all-one polynomial, via a basis exchange to the corresponding polynomial basis.
TF-I-small – Tower field inversion using bases of type I, smallest case.
TF-II-small – Tower field inversion using bases of type II, smallest case.
7.2 Future Research
Our upper bound on the Hamming complexity of normal bases does not seem to be especially good. We noted in Section 5.2 that the bound seems to be off by approximately 10% for fairly large extension degrees. Is it possible to find better upper bounds? Our investigation suggests that the cases m ≡ i mod 4, i ∈ {0, 1, 2, 3}, should be separated.
A possibly harder problem, but more interesting, would be the search for an upper bound on the smallest possible Hamming complexity of normal bases for all or some extension degrees.
There seems to be a lack of explicitly given normal bases with small Hamming complexity for many extension degrees greater than 32. Therefore a search for normal bases for such degrees is of interest, preferably with minimum Hamming complexity. This is a computationally demanding task. Methods for explicit constructions are also of interest. Theorem 5 gives us a method for construction of normal bases for extension degrees that are not divisible by eight. A corresponding method for degrees that are divisible by eight would be preferable as well.
The list in Appendix A of minimum weight irreducible polynomials over F2, for degrees up to 4000, contains only polynomials of weight ≤ 5. It is our belief that this holds for all degrees, but we have yet to prove that.
Most published architectures for multiplication and inversion are based on either polynomial or normal basis representation. What about other bases? Are there other interesting, preferably infinite, classes of bases? Our investigation of tower field representation, where the extension degree is a power of two, is one example. A natural extension of our investigation of tower field representations would be to consider, for instance, extension degrees that are powers of three, or of the form 2^i 3^j.
Appendix A
Minimum Weight Irreducible Polynomials over F2
This appendix contains a table of irreducible polynomials, one polynomial for each degree up to 4000. The polynomials are found using Magma by a lexicographic search among the possible trinomials of the given degrees. In case no irreducible trinomial is found for the given degree, the search continues with pentanomials and so on. The first irreducible polynomial found using this procedure is chosen. The result is that the polynomial chosen has minimum possible weight for the given degree, m, and given that weight it has the minimum degree, k_1, of the second term and so on.
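The search procedure can be illustrated with a short program. The sketch below is not the Magma code used for the table. It is a plain Python reimplementation under stated assumptions: polynomials over F2 are stored as integers (bit i is the coefficient of x^i), irreducibility is decided with the Ben-Or test (a swapped-in standard technique), and all function names are hypothetical. It is meant for small degrees only.

```python
def poly_mod(a, b):
    """Remainder of a divided by b, both polynomials over F2 (b nonzero)."""
    db = b.bit_length()
    while a.bit_length() >= db:
        a ^= b << (a.bit_length() - db)
    return a


def poly_gcd(a, b):
    """Greatest common divisor of two polynomials over F2."""
    while b:
        a, b = b, poly_mod(a, b)
    return a


def poly_mulmod(a, b, f, m):
    """Product a*b modulo f, where deg f = m and deg a < m."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= f
    return r


def is_irreducible(f):
    """Ben-Or test: f is irreducible iff gcd(x^(2^i) - x, f) = 1
    for all 1 <= i <= deg(f) // 2."""
    m = f.bit_length() - 1
    x = 2                      # the polynomial x
    h = x
    for _ in range(m // 2):
        h = poly_mulmod(h, h, f, m)   # h = x^(2^i) mod f
        if poly_gcd(f, h ^ x) != 1:
            return False
    return True


def descending_tuples(limit, size):
    """Yield k1 > k2 > ... with 1 <= ki < limit, smallest k1 first,
    then smallest k2, and so on."""
    if size == 0:
        yield ()
        return
    for k in range(size, limit):
        for rest in descending_tuples(k, size - 1):
            yield (k,) + rest


def min_weight_irreducible(m):
    """Mid-term degrees of the first irreducible polynomial of degree m,
    trying trinomials first, then pentanomials, and so on."""
    if m == 1:
        return ()              # x + 1 has no mid terms
    # An even number of mid terms gives even weight, hence a factor x + 1.
    for n_mid in range(1, m, 2):
        for mids in descending_tuples(m, n_mid):
            f = (1 << m) | 1
            for k in mids:
                f |= 1 << k
            if is_irreducible(f):
                return mids
    raise ValueError("no irreducible polynomial found")


if __name__ == "__main__":
    for m in range(1, 17):
        print(m, min_weight_irreducible(m))
```

For the small degrees printed by the example, the output can be compared row by row with the beginning of the table in Section A.1.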
To save space we have only listed the degrees of the mid terms, since irreducible polynomials over F2 always contain 1 and x^m, where m is the degree. The table should be interpreted as follows. The polynomial is a trinomial if there is only one mid term, and it is a pentanomial if there are three mid terms. As an example, take the entries for degrees 90 and 91:
m ki
90 27
91 8, 5, 1
These two rows should be interpreted as the trinomial x^90 + x^27 + 1 and the pentanomial x^91 + x^8 + x^5 + x + 1.
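The interpretation of a table row can also be expressed as a small helper. This is an illustrative sketch only; the function name is hypothetical and the output format with `^` for exponents is an assumption.

```python
def row_to_polynomial(m, mid_terms):
    """Expand a table row (degree m, mid-term degrees) into a polynomial string."""
    def term(e):
        if e == 0:
            return "1"
        if e == 1:
            return "x"
        return f"x^{e}"

    # The leading term x^m and the constant term 1 are always present.
    exponents = [m, *mid_terms, 0]
    return " + ".join(term(e) for e in exponents)


if __name__ == "__main__":
    print(row_to_polynomial(90, [27]))
    print(row_to_polynomial(91, [8, 5, 1]))
```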
A.1 The table
m ki
1
2 1
3 1
4 1
5 2
6 1
7 1
8 4, 3, 1
9 1
10 3
11 2
12 3
13 4, 3, 1
14 5
15 1
16 5, 3, 1
17 3
18 3
19 5, 2, 1
20 3
21 2
22 1
23 5
24 4, 3, 1
25 3
26 4, 3, 1
27 5, 2, 1
28 1
29 2
30 1
31 3
32 7, 3, 2
33 10
34 7
35 2
36 9
37 6, 4, 1
38 6, 5, 1
39 4
40 5, 4, 3
41 3
42 7
43 6, 4, 3
44 5
45 4, 3, 1
46 1
47 5
48 5, 3, 2
49 9
50 4, 3, 2
51 6, 3, 1
52 3
53 6, 2, 1
54 9
55 7
56 7, 4, 2
57 4
58 19
59 7, 4, 2
60 1
61 5, 2, 1
62 29
63 1
64 4, 3, 1
65 18
66 3
67 5, 2, 1
68 9
69 6, 5, 2
70 5, 3, 1
71 6
72 10, 9, 3
73 25
74 35
75 6, 3, 1
76 21
77 6, 5, 2
78 6, 5, 3
79 9
80 9, 4, 2
81 4
82 8, 3, 1
83 7, 4, 2
84 5
85 8, 2, 1
86 21
87 13
88 7, 6, 2
89 38
90 27
91 8, 5, 1
92 21
93 2
94 21
95 11
96 10, 9, 6
97 6
98 11
99 6, 3, 1
100 15
101 7, 6, 1
102 29
103 9
104 4, 3, 1
105 4
106 15
107 9, 7, 4
108 17
109 5, 4, 2
110 33
111 10
112 5, 4, 3
113 9
114 5, 3, 2
115 8, 7, 5
116 4, 2, 1
117 5, 2, 1
118 33
119 8
120 4, 3, 1
121 18
122 6, 2, 1
123 2
124 19
125 7, 6, 5
126 21
127 1
128 7, 2, 1
129 5
130 3
131 8, 3, 2
132 17
133 9, 8, 2
134 57
135 11
136 5, 3, 2
137 21
138 8, 7, 1
139 8, 5, 3
140 15
141 10, 4, 1
142 21
143 5, 3, 2
144 7, 4, 2
145 52
146 71
147 14
148 27
149 10, 9, 7
150 53
151 3
152 6, 3, 2
153 1
154 15
155 62
156 9
157 6, 5, 2
158 8, 6, 5
159 31
160 5, 3, 2
161 18
162 27
163 7, 6, 3
164 10, 8, 7
165 9, 8, 3
166 37
167 6
168 15, 3, 2
169 34
170 11
171 6, 5, 2
172 1
173 8, 5, 2
174 13
175 6
176 11, 3, 2
177 8
178 31
179 4, 2, 1
180 3
181 7, 6, 1
182 81
183 56
184 9, 8, 7
185 24
186 11
187 7, 6, 5
188 6, 5, 2
189 6, 5, 2
190 8, 7, 6
191 9
192 7, 2, 1
193 15
194 87
195 8, 3, 2
196 3
197 9, 4, 2
198 9
199 34
200 5, 3, 2
201 14
202 55
203 8, 7, 1
204 27
205 9, 5, 2
206 10, 9, 5
207 43
208 9, 3, 1
209 6
210 7
211 11, 10, 8
212 105
213 6, 5, 2
214 73
215 23
216 7, 3, 1
217 45
218 11
219 8, 4, 1
220 7
221 8, 6, 2
222 5, 4, 2
223 33
224 9, 8, 3
225 32
226 10, 7, 3
227 10, 9, 4
228 113
229 10, 4, 1
230 8, 7, 6
231 26
232 9, 4, 2
233 74
234 31
235 9, 6, 1
236 5
237 7, 4, 1
238 73
239 36
240 8, 5, 3
241 70
242 95
243 8, 5, 1
244 111
245 6, 4, 1
246 11, 2, 1
247 82
248 15, 14, 10
249 35
250 103
251 7, 4, 2
252 15
253 46
254 7, 2, 1
255 52
256 10, 5, 2
257 12
258 71
259 10, 6, 2
260 15
261 7, 6, 4
262 9, 8, 4
263 93
264 9, 6, 2
265 42
266 47
267 8, 6, 3
268 25
269 7, 6, 1
270 53
271 58
272 9, 3, 2
273 23
274 67
275 11, 10, 9
276 63
277 12, 6, 3
278 5
279 5
280 9, 5, 2
281 93
282 35
283 12, 7, 5
284 53
285 10, 7, 5
286 69
287 71
288 11, 10, 1
289 21
290 5, 3, 2
291 12, 11, 5
292 37
293 11, 6, 1
294 33
295 48
296 7, 3, 2
297 5
298 11, 8, 4
299 11, 6, 4
300 5
301 9, 5, 2
302 41
303 1
304 11, 2, 1
305 102
306 7, 3, 1
307 8, 4, 2
308 15
309 10, 6, 4
310 93
311 7, 5, 3
312 9, 7, 4
313 79
314 15
315 10, 9, 1
316 63
317 7, 4, 2
318 45
319 36
320 4, 3, 1
321 31
322 67
323 10, 3, 1
324 51
325 10, 5, 2
326 10, 3, 1
327 34
328 8, 3, 1
329 50
330 99
331 10, 6, 2
332 89
333 2
334 5, 2, 1
335 10, 7, 2
336 7, 4, 1
337 55
338 4, 3, 1
339 16, 10, 7
340 45
341 10, 8, 6
342 125
343 75
344 7, 2, 1
345 22
346 63
347 11, 10, 3
348 103
349 6, 5, 2
350 53
351 34
352 13, 11, 6
353 69
354 99
355 6, 5, 1
356 10, 9, 7
357 11, 10, 2
358 57
359 68
360 5, 3, 2
361 7, 4, 1
362 63
363 8, 5, 3
364 9
365 9, 6, 5
366 29
367 21
368 7, 3, 2
369 91
370 139
371 8, 3, 2
372 111
373 8, 7, 2
374 8, 6, 5
375 16
376 8, 7, 5
377 41
378 43
379 10, 8, 5
380 47
381 5, 2, 1
382 81
383 90
384 12, 3, 2
385 6
386 83
387 8, 7, 1
388 159
389 10, 9, 5
390 9
391 28
392 13, 10, 6
393 7
394 135
395 11, 6, 5
396 25
397 12, 7, 6
398 7, 6, 2
399 26
400 5, 3, 2
401 152
402 171
403 9, 8, 5
404 65
405 13, 8, 2
406 141
407 71
408 5, 3, 2
409 87
410 10, 4, 3
411 12, 10, 3
412 147
413 10, 7, 6
414 13
415 102
416 9, 5, 2
417 107
418 199
419 15, 5, 4
420 7
421 5, 4, 2
422 149
423 25
424 9, 7, 2
425 12
426 63
427 11, 6, 5
428 105
429 10, 8, 7
430 14, 6, 1
431 120
432 13, 4, 3
433 33
434 12, 11, 5
435 12, 9, 5
436 165
437 6, 2, 1
438 65
439 49
440 4, 3, 1
441 7
442 7, 5, 2
443 10, 6, 1
444 81
445 7, 6, 4
446 105
447 73
448 11, 6, 4
449 134
450 47
451 16, 10, 1
452 6, 5, 4
453 15, 6, 4
454 8, 6, 1
455 38
456 18, 9, 6
457 16
458 203
459 12, 5, 2
460 19
461 7, 6, 1
462 73
463 93
464 19, 18, 13
465 31
466 14, 11, 6
467 11, 6, 1
468 27
469 9, 5, 2
470 9
471 1
472 11, 3, 2
473 200
474 191
475 9, 8, 4
476 9
477 16, 15, 7
478 121
479 104
480 15, 9, 6
481 138
482 9, 6, 5
483 9, 6, 4
484 105
485 17, 16, 6
486 81
487 94
488 4, 3, 1
489 83
490 219
491 11, 6, 3
492 7
493 10, 5, 3
494 17
495 76
496 16, 5, 2
497 78
498 155
499 11, 6, 5
500 27
501 5, 4, 2
502 8, 5, 4
503 3
504 15, 14, 6
505 156
506 23
507 13, 6, 3
508 9
509 8, 7, 3
510 69
511 10
512 8, 5, 2
513 26
514 67
515 14, 7, 4
516 21
517 12, 10, 2
518 33
519 79
520 15, 11, 2
521 32
522 39
523 13, 6, 2
524 167
525 6, 4, 1
526 97
527 47
528 11, 6, 2
529 42
530 10, 7, 3
531 10, 5, 4
532 1
533 4, 3, 2
534 161
535 8, 6, 2
536 7, 5, 3
537 94
538 195
539 10, 5, 4
540 9
541 13, 10, 4
542 8, 6, 1
543 16
544 8, 3, 1
545 122
546 8, 2, 1
547 13, 7, 4
548 10, 5, 3
549 16, 4, 3
550 193
551 135
552 19, 16, 9
553 39
554 10, 8, 7
555 10, 9, 4
556 153
557 7, 6, 5
558 73
559 34
560 11, 9, 6
561 71
562 11, 4, 2
563 14, 7, 3
564 163
565 11, 6, 1
566 153
567 28
568 15, 7, 6
569 77
570 67
571 10, 5, 2
572 12, 8, 1
573 10, 6, 4
574 13
575 146
576 13, 4, 3
577 25
578 23, 22, 16
579 12, 9, 7
580 237
581 13, 7, 6
582 85
583 130
584 14, 13, 3
585 88
586 7, 5, 2
587 11, 6, 1
588 35
589 10, 4, 3
590 93
591 9, 6, 4
592 13, 6, 3
593 86
594 19
595 9, 2, 1
596 273
597 14, 12, 9
598 7, 6, 1
599 30
600 9, 5, 2
601 201
602 215
603 6, 4, 3
604 105
605 10, 7, 5
606 165
607 105
608 19, 13, 6
609 31
610 127
611 10, 4, 2
612 81
613 19, 10, 4
614 45
615 211
616 19, 10, 3
617 200
618 295
619 9, 8, 5
620 9
621 12, 6, 5
622 297
623 68
624 11, 6, 5
625 133
626 251
627 13, 8, 4
628 223
629 6, 5, 2
630 7, 4, 2
631 307
632 9, 2, 1
633 101
634 39
635 14, 10, 4
636 217
637 14, 9, 1
638 6, 5, 1
639 16
640 14, 3, 2
641 11
642 119
643 11, 3, 2
644 11, 6, 5
645 11, 8, 4
646 249
647 5
648 13, 3, 1
649 37
650 3
651 14
652 93
653 10, 8, 7
654 33
655 88
656 7, 5, 4
657 38
658 55
659 15, 4, 2
660 11
661 12, 11, 4
662 21
663 107
664 11, 9, 8
665 33
666 10, 7, 2
667 18, 7, 3
668 147
669 5, 4, 2
670 153
671 15
672 11, 6, 5
673 28
674 11, 7, 4
675 6, 3, 1
676 31
677 8, 4, 3
678 15, 5, 3
679 66
680 23, 16, 9
681 11, 9, 3
682 171
683 11, 6, 1
684 209
685 4, 3, 1
686 197
687 13
688 19, 14, 6
689 14
690 79
691 13, 6, 2
692 299
693 15, 8, 2
694 169
695 177
696 23, 10, 2
697 267
698 215
699 15, 10, 1
700 75
701 16, 4, 2
702 37
703 12, 7, 1
704 8, 3, 2
705 17
706 12, 11, 8
707 15, 8, 5
708 15
709 4, 3, 1
710 13, 12, 4
711 92
712 5, 4, 3
713 41
714 23
715 7, 4, 1
716 183
717 16, 7, 1
718 165
719 150
720 9, 6, 4
721 9
722 231
723 16, 10, 4
724 207
725 9, 6, 5
726 5
727 180
728 4, 3, 2
729 58
730 147
731 8, 6, 2
732 343
733 8, 7, 2
734 11, 6, 1
735 44
736 13, 8, 6
737 5
738 347
739 18, 16, 8
740 135
741 9, 8, 3
742 85
743 90
744 13, 11, 1
745 258
746 351
747 10, 6, 4
748 19
749 7, 6, 1
750 309
751 18
752 13, 10, 3
753 158
754 19
755 12, 10, 1
756 45
757 7, 6, 1
758 233
759 98
760 11, 6, 5
761 3
762 83
763 16, 14, 9
764 6, 5, 3
765 9, 7, 4
766 22, 19, 9
767 168
768 19, 17, 4
769 120
770 14, 5, 2
771 17, 15, 6
772 7
773 10, 8, 6
774 185
775 93
776 15, 14, 7
777 29
778 375
779 10, 8, 3
780 13
781 17, 16, 2
782 329
783 68
784 13, 9, 6
785 92
786 12, 10, 3
787 7, 6, 3
788 17, 10, 3
789 5, 2, 1
790 9, 6, 1
791 30
792 9, 7, 3
793 253
794 143
795 7, 4, 1
796 9, 4, 1
797 12, 10, 4
798 53
799 25
800 9, 7, 1
801 217
802 15, 13, 9
803 14, 9, 2
804 75
805 8, 7, 2
806 21
807 7
808 14, 3, 2
809 15
810 159
811 12, 10, 8
812 29
813 10, 3, 1
814 21
815 333
816 11, 8, 2
817 52
818 119
819 16, 9, 7
820 123
821 15, 11, 2
822 17
823 9
824 11, 6, 4
825 38
826 255
827 12, 10, 7
828 189
829 4, 3, 1
830 17, 10, 7
831 49
832 13, 5, 2
833 149
834 15
835 14, 7, 5
836 10, 9, 2
837 8, 6, 5
838 61
839 54
840 11, 5, 1
841 144
842 47
843 11, 10, 7
844 105
845 2
846 105
847 136
848 11, 4, 1
849 253
850 111
851 13, 10, 5
852 159
853 10, 7, 1
854 7, 5, 3
855 29
856 19, 10, 3
857 119
858 207
859 17, 15, 4
860 35
861 14
862 349
863 6, 3, 2
864 21, 10, 6
865 1
866 75
867 9, 5, 2
868 145
869 11, 7, 6
870 301
871 378
872 13, 3, 1
873 352
874 12, 7, 4
875 12, 8, 1
876 149
877 6, 5, 4
878 12, 9, 8
879 11
880 15, 7, 5
881 78
882 99
883 17, 16, 12
884 173
885 8, 7, 1
886 13, 9, 8
887 147
888 19, 18, 10
889 127
890 183
891 12, 4, 1
892 31
893 11, 8, 6
894 173
895 12
896 7, 5, 3
897 113
898 207
899 18, 15, 5
900 1
901 13, 7, 6
902 21
903 35
904 12, 7, 2
905 117
906 123
907 12, 10, 2
908 143
909 14, 4, 1
910 15, 9, 7
911 204
912 7, 5, 1
913 91
914 4, 2, 1
915 8, 6, 3
916 183
917 12, 10, 7
918 77
919 36
920 14, 9, 6
921 221
922 7, 6, 5
923 16, 14, 13
924 31
925 16, 15, 7
926 365
927 403
928 10, 3, 2
929 11, 4, 3
930 31
931 10, 9, 4
932 177
933 16, 6, 1
934 22, 6, 5
935 417
936 15, 13, 12
937 217
938 207
939 7, 5, 4
940 10, 7, 1
941 11, 6, 1
942 45
943 24
944 12, 11, 9
945 77
946 21, 20, 13
947 9, 6, 5
948 189
949 8, 3, 2
950 13, 12, 10
951 260
952 16, 9, 7
953 168
954 131
955 7, 6, 3
956 305
957 10, 9, 6
958 13, 9, 4
959 143
960 12, 9, 3
961 18
962 15, 8, 5
963 20, 9, 6
964 103
965 15, 4, 2
966 201
967 36
968 9, 5, 2
969 31
970 11, 7, 2
971 6, 2, 1
972 7
973 13, 6, 4
974 9, 8, 7
975 19
976 17, 10, 6
977 15
978 9, 3, 1
979 178
980 8, 7, 6
981 12, 6, 5
982 177
983 230
984 24, 9, 3
985 222
986 3
987 16, 13, 12
988 121
989 10, 4, 2
990 161
991 39
992 17, 15, 13
993 62
994 223
995 15, 12, 2
996 65
997 12, 6, 3
998 101
999 59
1000 5, 4, 3
1001 17
1002 5, 3, 2
1003 13, 8, 3
1004 10, 9, 7
1005 12, 8, 2
1006 5, 4, 3
1007 75
1008 19, 17, 8
1009 55
1010 99
1011 10, 7, 4
1012 115
1013 9, 8, 6
1014 385
1015 186
1016 15, 6, 3
1017 9, 4, 1
1018 12, 10, 5
1019 10, 8, 1
1020 135
1021 5, 2, 1
1022 317
1023 7
1024 19, 6, 1
1025 294
1026 35
1027 13, 12, 6
1028 119
1029 98
1030 93
1031 68
1032 21, 15, 3
1033 108
1034 75
1035 12, 6, 5
1036 411
1037 12, 7, 2
1038 13, 7, 2
1039 21
1040 15, 10, 8
1041 412
1042 439
1043 10, 7, 6
1044 41
1045 13, 9, 6
1046 8, 5, 2
1047 10
1048 15, 7, 2
1049 141
1050 159
1051 13, 12, 10
1052 291
1053 10, 9, 1
1054 105
1055 24
1056 11, 2, 1
1057 198
1058 27
1059 6, 3, 1
1060 439
1061 10, 3, 1
1062 49
1063 168
1064 13, 11, 9
1065 463
1066 10, 9, 3
1067 13, 9, 8
1068 15, 8, 3
1069 18, 16, 8
1070 15, 14, 11
1071 7
1072 19, 9, 8
1073 12, 6, 3
1074 7, 4, 3
1075 15, 14, 5
1076 8, 6, 3
1077 10, 9, 7
1078 361
1079 230
1080 15, 9, 6
1081 24
1082 407
1083 16, 7, 2
1084 189
1085 62
1086 189
1087 112
1088 22, 21, 10
1089 91
1090 79
1091 12, 10, 5
1092 23
1093 7, 6, 1
1094 57
1095 139
1096 24, 15, 6
1097 14
1098 83
1099 16, 9, 1
1100 35
1101 9, 7, 4
1102 117
1103 65
1104 21, 9, 6
1105 21
1106 195
1107 23, 11, 10
1108 327
1109 17, 14, 3
1110 417
1111 13
1112 15, 8, 6
1113 107
1114 19, 10, 6
1115 18, 15, 3
1116 59
1117 12, 10, 4
1118 9, 7, 5
1119 283
1120 13, 9, 6
1121 62
1122 427
1123 14, 7, 3
1124 8, 7, 4
1125 15, 8, 3
1126 105
1127 27
1128 7, 3, 1
1129 103
1130 551
1131 10, 6, 1
1132 6, 4, 1
1133 11, 6, 4
1134 129
1135 9
1136 9, 4, 2
1137 277
1138 31
1139 13, 12, 5
1140 141
1141 12, 7, 3
1142 357
1143 7, 2, 1
1144 11, 9, 7
1145 227
1146 131
1147 7, 6, 3
1148 23
1149 20, 17, 3
1150 13, 4, 1
1151 90
1152 15, 3, 2
1153 241
1154 75
1155 13, 6, 1
1156 307
1157 8, 7, 3
1158 245
1159 66
1160 15, 11, 2
1161 365
1162 18, 16, 11
1163 11, 10, 1
1164 19
1165 8, 6, 1
1166 189
1167 133
1168 12, 7, 2
1169 114
1170 27
1171 6, 5, 1
1172 15, 5, 2
1173 17, 14, 5
1174 133
1175 476
1176 11, 9, 3
1177 16
1178 375
1179 15, 8, 6
1180 25
1181 17, 11, 6
1182 77
1183 87
1184 5, 3, 2
1185 134
1186 171
1187 13, 8, 4
1188 75
1189 8, 3, 1
1190 233
1191 196
1192 9, 8, 7
1193 173
1194 15, 14, 12
1195 13, 6, 5
1196 281
1197 9, 8, 2
1198 405
1199 114
1200 15, 9, 6
1201 171
1202 287
1203 8, 4, 2
1204 43
1205 4, 2, 1
1206 513
1207 273
1208 11, 10, 6
1209 118
1210 243
1211 14, 7, 1
1212 203
1213 9, 5, 2
1214 257
1215 302
1216 27, 25, 9
1217 393
1218 91
1219 12, 10, 6
1220 413
1221 15, 14, 9
1222 18, 16, 1
1223 255
1224 12, 9, 7
1225 234
1226 167
1227 16, 13, 10
1228 27
1229 15, 6, 2
1230 433
1231 105
1232 25, 10, 2
1233 151
1234 427
1235 13, 9, 8
1236 49
1237 10, 6, 4
1238 153
1239 4
1240 17, 7, 5
1241 54
1242 203
1243 16, 15, 1
1244 16, 14, 7
1245 13, 6, 1
1246 25
1247 14
1248 15, 5, 3
1249 187
1250 15, 13, 10
1251 13, 10, 5
1252 97
1253 11, 10, 9
1254 19, 10, 4
1255 589
1256 31, 30, 2
1257 289
1258 9, 6, 4
1259 11, 8, 6
1260 21
1261 7, 4, 1
1262 7, 4, 2
1263 77
1264 5, 3, 2
1265 119
1266 7
1267 9, 5, 2
1268 345
1269 17, 10, 8
1270 333
1271 17
1272 16, 9, 7
1273 168
1274 15, 13, 4
1275 11, 10, 1
1276 217
1277 18, 11, 10
1278 189
1279 216
1280 12, 7, 5
1281 229
1282 231
1283 12, 9, 3
1284 223
1285 10, 9, 1
1286 153
1287 470
1288 23, 16, 6
1289 99
1290 10, 4, 3
1291 9, 8, 4
1292 12, 10, 1
1293 14, 9, 6
1294 201
1295 38
1296 15, 14, 2
1297 198
1298 399
1299 14, 11, 5
1300 75
1301 11, 10, 1
1302 77
1303 16, 12, 8
1304 20, 17, 15
1305 326
1306 39
1307 14, 12, 9
1308 495
1309 8, 3, 2
1310 333
1311 476
1312 15, 14, 2
1313 164
1314 19
1315 12, 4, 2
1316 8, 6, 3
1317 13, 12, 3
1318 12, 11, 5
1319 129
1320 12, 9, 3
1321 52
1322 10, 8, 3
1323 17, 16, 2
1324 337
1325 12, 9, 3
1326 397
1327 277
1328 21, 11, 3
1329 73
1330 11, 6, 1
1331 7, 5, 4
1332 95
1333 11, 3, 2
1334 617
1335 392
1336 8, 3, 2
1337 75
1338 315
1339 15, 6, 4
1340 125
1341 6, 5, 2
1342 15, 9, 7
1343 348
1344 15, 6, 1
1345 553
1346 6, 3, 2
1347 10, 9, 7
1348 553
1349 14, 10, 4
1350 237
1351 39
1352 17, 14, 6
1353 371
1354 255
1355 8, 4, 1
1356 131
1357 14, 6, 1
1358 117
1359 98
1360 5, 3, 2
1361 56
1362 655
1363 9, 5, 2
1364 239
1365 11, 8, 4
1366 1
1367 134
1368 15, 9, 5
1369 88
1370 10, 5, 3
1371 10, 9, 4
1372 181
1373 15, 11, 2
1374 609
1375 52
1376 19, 18, 10
1377 100
1378 7, 6, 3
1379 15, 8, 2
1380 183
1381 18, 7, 6
1382 10, 9, 2
1383 130
1384 11, 5, 1
1385 12
1386 219
1387 13, 10, 7
1388 11
1389 19, 9, 4
1390 129
1391 3
1392 17, 15, 5
1393 300
1394 17, 13, 9
1395 14, 6, 5
1396 97
1397 13, 8, 3
1398 601
1399 55
1400 8, 3, 1
1401 92
1402 127
1403 12, 11, 2
1404 81
1405 15, 10, 8
1406 13, 2, 1
1407 47
1408 14, 13, 6
1409 194
1410 383
1411 25, 14, 11
1412 125
1413 20, 19, 16
1414 429
1415 282
1416 10, 9, 6
1417 342
1418 5, 3, 2
1419 15, 9, 4
1420 33
1421 9, 4, 2
1422 49
1423 15
1424 11, 6, 2
1425 28
1426 103
1427 18, 17, 8
1428 27
1429 11, 6, 5
1430 33
1431 17
1432 11, 10, 6
1433 387
1434 363
1435 15, 10, 9
1436 83
1437 7, 6, 4
1438 357
1439 13, 12, 4
1440 14, 13, 7
1441 322
1442 395
1443 16, 5, 1
1444 595
1445 13, 10, 3
1446 421
1447 195
1448 11, 3, 2
1449 13
1450 16, 12, 3
1451 14, 3, 1
1452 315
1453 26, 10, 5
1454 297
1455 52
1456 9, 4, 2
1457 314
1458 243
1459 16, 14, 9
1460 185
1461 12, 5, 3
1462 13, 5, 2
1463 575
1464 12, 9, 3
1465 39
1466 311
1467 13, 5, 2
1468 181
1469 20, 18, 14
1470 49
1471 25
1472 11, 4, 1
1473 77
1474 17, 11, 10
1475 15, 14, 8
1476 21
1477 17, 10, 5
1478 69
1479 49
1480 11, 10, 2
1481 32
1482 411
1483 21, 16, 3
1484 11, 7, 4
1485 22, 10, 3
1486 85
1487 140
1488 9, 8, 6
1489 252
1490 279
1491 9, 5, 2
1492 307
1493 17, 10, 4
1494 13, 12, 9
1495 94
1496 13, 11, 4
1497 49
1498 17, 11, 10
1499 16, 12, 5
1500 25
1501 6, 5, 2
1502 12, 5, 1
1503 80
1504 8, 3, 2
1505 246
1506 11, 5, 2
1507 11, 10, 2
1508 599
1509 18, 12, 10
1510 189
1511 278
1512 10, 9, 3
1513 399
1514 299
1515 13, 10, 6
1516 277
1517 13, 10, 6
1518 69
1519 220
1520 13, 10, 3
1521 229
1522 18, 11, 10
1523 16, 15, 1
1524 27
1525 18, 9, 3
1526 473
1527 373
1528 18, 17, 7
1529 60
1530 207
1531 13, 9, 8
1532 22, 20, 13
1533 25, 18, 7
1534 225
1535 404
1536 21, 6, 2
1537 46
1538 6, 2, 1
1539 17, 12, 6
1540 75
1541 4, 2, 1
1542 365
1543 445
1544 11, 7, 1
1545 44
1546 10, 8, 5
1547 12, 5, 2
1548 63
1549 17, 4, 2
1550 189
1551 557
1552 19, 12, 2
1553 252
1554 99
1555 10, 8, 5
1556 65
1557 14, 9, 3
1558 9
1559 119
1560 8, 5, 2
1561 339
1562 95
1563 12, 9, 7
1564 7
1565 13, 10, 2
1566 77
1567 127
1568 21, 10, 7
1569 319
1570 667
1571 17, 10, 3
1572 501
1573 18, 12, 9
1574 9, 8, 5
1575 17
1576 20, 9, 2
1577 341
1578 731
1579 7, 6, 5
1580 647
1581 10, 4, 2
1582 121
1583 20
1584 21, 19, 13
1585 574
1586 399
1587 15, 10, 7
1588 85
1589 16, 8, 3
1590 169
1591 15
1592 12, 7, 5
1593 568
1594 10, 7, 1
1595 18, 2, 1
1596 3
1597 14, 3, 2
1598 13, 7, 3
1599 643
1600 14, 11, 1
1601 548
1602 783
1603 14, 11, 1
1604 317
1605 7, 6, 4
1606 153
1607 87
1608 15, 13, 1
1609 231
1610 11, 5, 3
1611 18, 13, 7
1612 771
1613 30, 20, 11
1614 15, 6, 3
1615 103
1616 13, 4, 3
1617 182
1618 211
1619 17, 6, 1
1620 27
1621 13, 12, 10
1622 15, 14, 10
1623 17
1624 13, 11, 5
1625 69
1626 11, 5, 1
1627 18, 6, 1
1628 603
1629 10, 4, 2
1630 741
1631 668
1632 17, 15, 3
1633 147
1634 227
1635 15, 10, 9
1636 37
1637 16, 6, 1
1638 173
1639 427
1640 7, 5, 1
1641 287
1642 231
1643 20, 15, 10
1644 18, 9, 1
1645 14, 12, 5
1646 16, 5, 1
1647 310
1648 18, 13, 1
1649 434
1650 579
1651 18, 13, 8
1652 45
1653 12, 8, 3
1654 16, 9, 5
1655 53
1656 19, 15, 10
1657 16
1658 17, 6, 5
1659 17, 10, 1
1660 37
1661 17, 10, 9
1662 21, 13, 7
1663 99
1664 17, 9, 6
1665 176
1666 271
1667 18, 17, 13
1668 459
1669 21, 17, 10
1670 6, 5, 2
1671 202
1672 5, 4, 3
1673 90
1674 755
1675 15, 7, 2
1676 363
1677 8, 4, 2
1678 129
1679 20
1680 11, 6, 2
1681 135
1682 15, 8, 7
1683 14, 13, 2
1684 10, 4, 3
1685 24, 13, 10
1686 19, 14, 11
1687 31
1688 15, 8, 6
1689 758
1690 16, 11, 5
1691 16, 5, 1
1692 359
1693 23, 18, 17
1694 501
1695 29
1696 15, 6, 3
1697 201
1698 459
1699 12, 10, 7
1700 225
1701 22, 17, 13
1702 24, 22, 5
1703 161
1704 14, 11, 3
1705 52
1706 19, 17, 6
1707 21, 14, 12
1708 93
1709 13, 10, 3
1710 201
1711 178
1712 15, 12, 5
1713 250
1714 7, 6, 4
1715 17, 13, 6
1716 221
1717 13, 11, 8
1718 17, 14, 9
1719 113
1720 17, 14, 10
1721 300
1722 39
1723 18, 13, 3
1724 261
1725 15, 14, 8
1726 753
1727 8, 4, 3
1728 11, 10, 5
1729 94
1730 15, 13, 1
1731 10, 4, 2
1732 14, 11, 10
1733 8, 6, 2
1734 461
1735 418
1736 19, 14, 6
1737 403
1738 267
1739 10, 9, 2
1740 259
1741 20, 4, 3
1742 869
1743 173
1744 19, 18, 2
1745 369
1746 255
1747 22, 12, 9
1748 567
1749 20, 11, 7
1750 457
1751 482
1752 6, 3, 2
1753 775
1754 19, 17, 6
1755 6, 4, 3
1756 99
1757 15, 14, 8
1758 6, 5, 2
1759 165
1760 8, 3, 2
1761 13, 12, 10
1762 25, 21, 17
1763 17, 14, 9
1764 105
1765 17, 15, 14
1766 10, 3, 2
1767 250
1768 25, 6, 5
1769 327
1770 279
1771 13, 6, 5
1772 371
1773 15, 9, 4
1774 117
1775 486
1776 10, 9, 3
1777 217
1778 635
1779 30, 27, 17
1780 457
1781 16, 6, 2
1782 57
1783 439
1784 23, 21, 6
1785 214
1786 20, 13, 6
1787 20, 16, 1
1788 819
1789 15, 11, 8
1790 593
1791 190
1792 17, 14, 3
1793 114
1794 21, 18, 3
1795 10, 5, 2
1796 12, 9, 5
1797 8, 6, 3
1798 69
1799 312
1800 22, 5, 2
1801 502
1802 843
1803 15, 10, 3
1804 747
1805 6, 5, 2
1806 101
1807 123
1808 19, 16, 9
1809 521
1810 171
1811 16, 7, 2
1812 12, 6, 5
1813 22, 21, 20
1814 545
1815 163
1816 23, 18, 1
1817 479
1818 495
1819 13, 6, 5
1820 11
1821 17, 5, 2
1822 18, 8, 1
1823 684
1824 7, 5, 1
1825 9
1826 18, 11, 3
1827 22, 20, 13
1828 273
1829 4, 3, 2
1830 381
1831 51
1832 18, 13, 7
1833 518
1834 9, 5, 1
1835 14, 12, 3
1836 243
1837 21, 17, 2
1838 53
1839 836
1840 21, 10, 2
1841 66
1842 12, 10, 7
1843 13, 9, 8
1844 339
1845 16, 11, 5
1846 901
1847 180
1848 16, 13, 3
1849 49
1850 6, 3, 2
1851 15, 4, 1
1852 16, 13, 6
1853 18, 15, 12
1854 885
1855 39
1856 11, 9, 4
1857 688
1858 16, 15, 7
1859 13, 10, 6
1860 13
1861 25, 23, 12
1862 149
1863 260
1864 11, 9, 1
1865 53
1866 11
1867 12, 4, 2
1868 9, 7, 5
1869 11, 8, 1
1870 121
1871 261
1872 10, 5, 2
1873 199
1874 20, 4, 3
1875 17, 9, 2
1876 13, 9, 4
1877 12, 8, 7
1878 253
1879 174
1880 15, 4, 2
1881 370
1882 9, 6, 1
1883 16, 10, 9
1884 669
1885 20, 10, 9
1886 833
1887 353
1888 17, 13, 2
1889 29
1890 371
1891 9, 8, 5
1892 8, 7, 1
1893 19, 8, 7
1894 12, 11, 10
1895 873
1896 26, 11, 2
1897 12, 9, 1
1898 10, 7, 2
1899 13, 6, 1
1900 235
1901 26, 24, 19
1902 733
1903 778
1904 12, 11, 1
1905 344
1906 931
1907 16, 6, 4
1908 945
1909 21, 19, 14
1910 18, 13, 11
1911 67
1912 20, 15, 10
1913 462
1914 14, 5, 1
1915 10, 9, 6
1916 18, 11, 10
1917 16, 9, 7
1918 477
1919 105
1920 11, 3, 2
1921 468
1922 23, 16, 15
1923 16, 15, 6
1924 327
1925 23, 10, 4
1926 357
1927 25
1928 17, 16, 7
1929 31
1930 7, 5, 2
1931 16, 7, 6
1932 277
1933 14, 13, 6
1934 413
1935 103
1936 15, 10, 1
1937 231
1938 747
1939 5, 2, 1
1940 113
1941 20, 10, 7
1942 15, 9, 6
1943 11
1944 27, 22, 18
1945 91
1946 51
1947 18, 13, 12
1948 603
1949 10, 7, 3
1950 9
1951 121
1952 15, 14, 6
1953 17
1954 16, 11, 2
1955 23, 15, 6
1956 279
1957 16, 12, 6
1958 89
1959 371
1960 17, 15, 2
1961 771
1962 99
1963 7, 6, 3
1964 21
1965 10, 7, 5
1966 801
1967 26
1968 25, 19, 14
1969 175
1970 10, 7, 2
1971 20, 5, 4
1972 12, 11, 1
1973 22, 5, 1
1974 165
1975 841
1976 25, 19, 17
1977 238
1978 11, 8, 6
1979 22, 21, 4
1980 33
1981 8, 7, 6
1982 14, 9, 2
1983 113
1984 13, 11, 5
1985 311
1986 891
1987 20, 16, 14
1988 555
1989 23, 14, 8
1990 133
1991 546
1992 6, 3, 2
1993 103
1994 15
1995 10, 7, 3
1996 307
1997 14, 10, 1
1998 15, 12, 2
1999 367
2000 13, 10, 6
2001 169
2002 22, 21, 11
2003 12, 10, 8
2004 441
2005 17, 12, 7
2006 917
2007 205
2008 26, 23, 13
2009 54
2010 459
2011 17, 15, 4
2012 19, 15, 4
2013 5, 4, 2
2014 9, 7, 6
2015 42
2016 21, 15, 7
2017 330
2018 20, 7, 3
2019 20, 7, 2
2020 81
2021 19, 14, 1
2022 349
2023 165
2024 40, 35, 9
2025 274
2026 475
2027 11, 10, 3
2028 93
2029 12, 7, 4
2030 13, 12, 2
2031 386
2032 7, 6, 2
2033 881
2034 143
2035 9, 8, 4
2036 71
2037 19, 18, 3
2038 16, 11, 6
2039 155
2040 7, 2, 1
2041 735
2042 16, 8, 7
2043 9, 7, 4
2044 45
2045 7, 6, 4
2046 12, 11, 3
2047 3
2048 19, 14, 13
2049 124
2050 15, 13, 8
2051 13, 6, 5
2052 323
2053 21, 13, 6
2054 201
2055 11
2056 13, 12, 3
2057 245
2058 343
2059 14, 12, 10
2060 387
2061 19, 4, 1
2062 16, 3, 2
2063 48
2064 17, 9, 2
2065 97
2066 71
2067 17, 13, 8
2068 18, 10, 7
2069 18, 9, 8
2070 237
2071 11, 5, 3
2072 13, 10, 3
2073 253
2074 231
2075 9, 7, 4
2076 851
2077 15, 14, 4
2078 16, 6, 5
2079 35
2080 4, 3, 1
2081 467
2082 523
2083 21, 11, 10
2084 4, 2, 1
2085 9, 8, 3
2086 261
2087 141
2088 18, 11, 5
2089 150
2090 9, 4, 1
2091 12, 9, 5
2092 17, 15, 7
2093 16, 15, 7
2094 645
2095 256
2096 19, 4, 2
2097 119
2098 19
2099 15, 12, 9
2100 35
2101 25, 22, 9
2102 33
2103 98
2104 19, 15, 9
2105 153
2106 111
2107 17, 10, 2
2108 21, 5, 3
2109 10, 5, 1
2110 12, 9, 6
2111 249
2112 16, 13, 7
2113 385
2114 155
2115 11, 10, 1
2116 25
2117 24, 16, 11
2118 385
2119 84
2120 17, 14, 6
2121 304
2122 91
2123 14, 11, 3
2124 45
2125 24, 17, 14
2126 881
2127 539
2128 23, 9, 1
2129 21
2130 239
2131 13, 6, 5
2132 213
2133 24, 22, 4
2134 23, 13, 2
2135 47
2136 15, 12, 9
2137 331
2138 13, 9, 2
2139 14, 4, 1
2140 283
2141 16, 3, 1
2142 69
2143 345
2144 13, 7, 3
2145 19
2146 595
2147 8, 3, 2
2148 549
2149 17, 9, 2
2150 569
2151 224
2152 24, 13, 7
2153 582
2154 10, 7, 5
2155 10, 9, 8
2156 405
2157 14, 4, 1
2158 93
2159 6
2160 31, 25, 14
2161 766
2162 47
2163 12, 9, 7
2164 561
2165 10, 4, 2
2166 693
2167 840
2168 11, 9, 3
2169 55
2170 411
2171 7, 6, 4
2172 6, 4, 1
2173 15, 8, 4
2174 225
2175 128
2176 15, 8, 1
2177 554
2178 15
2179 8, 7, 2
2180 111
2181 18, 12, 7
2182 93
2183 162
2184 11, 10, 5
2185 51
2186 51
2187 22, 11, 1
2188 99
2189 19, 8, 7
2190 441
2191 111
2192 8, 5, 3
2193 71
2194 15, 13, 9
2195 23, 22, 16
2196 539
2197 6, 5, 2
2198 893
2199 49
2200 20, 15, 5
2201 143
2202 15, 3, 2
2203 14, 6, 5
2204 11, 7, 1
2205 14, 7, 4
2206 793
2207 438
2208 21, 16, 6
2209 142
2210 539
2211 20, 14, 3
2212 423
2213 20, 19, 4
2214 1041
2215 39
2216 24, 7, 2
2217 455
2218 603
2219 22, 12, 11
2220 7
2221 17, 16, 6
2222 333
2223 17, 6, 2
2224 21, 19, 5
2225 47
2226 19, 16, 7
2227 14, 9, 8
2228 425
2229 17, 8, 7
2230 637
2231 654
2232 19, 17, 4
2233 249
2234 7, 6, 1
2235 20, 17, 11
2236 63
2237 7, 4, 2
2238 1053
2239 120
2240 23, 7, 1
2241 20
2242 7
2243 27, 15, 2
2244 399
2245 22, 12, 11
2246 23, 15, 6
2247 217
2248 9, 4, 3
2249 126
2250 927
2251 19, 16, 13
2252 75
2253 19, 14, 2
2254 10, 9, 2
2255 729
2256 14, 9, 6
2257 829
2258 983
2259 16, 10, 6
2260 12, 4, 1
2261 14, 12, 7
2262 57
2263 273
2264 15, 7, 2
2265 151
2266 343
2267 18, 17, 8
2268 115
2269 15, 10, 7
2270 369
2271 560
2272 21, 10, 9
2273 630
2274 239
2275 15, 12, 1
2276 21
2277 10, 4, 2
2278 17, 14, 7
2279 276
2280 13, 4, 2
2281 715
2282 975
2283 20, 13, 4
2284 889
2285 8, 6, 2
2286 249
2287 651
2288 17, 16, 7
2289 136
2290 23, 6, 5
2291 13, 10, 2
2292 89
2293 10, 8, 3
2294 21, 17, 10
2295 259
2296 15, 10, 1
2297 405
2298 15, 13, 3
2299 16, 6, 1
2300 95
2301 15, 9, 8
2302 15, 8, 1
2303 80
2304 8, 7, 5
2305 424
2306 551
2307 11, 7, 2
2308 31
2309 12, 10, 8
2310 233
2311 148
2312 19, 6, 4
2313 221
2314 879
2315 17, 15, 4
2316 21
2317 17, 4, 2
2318 245
2319 161
2320 13, 11, 5
2321 543
2322 83
2323 16, 3, 2
2324 717
2325 14, 8, 5
2326 13, 10, 7
2327 32
2328 15, 9, 2
2329 105
2330 15, 5, 1
2331 14
2332 349
2333 18, 15, 8
2334 1125
2335 553
2336 15, 10, 8
2337 523
2338 211
2339 10, 3, 2
2340 39
2341 24, 18, 16
2342 65
2343 415
2344 27, 26, 14
2345 29
2346 987
2347 11, 10, 2
2348 731
2349 31, 16, 9
2350 21, 19, 4
2351 950
2352 23, 20, 2
2353 328
2354 14, 11, 6
2355 12, 11, 6
2356 183
2357 10, 9, 8
2358 161
2359 172
2360 19, 10, 8
2361 646
2362 13, 10, 6
2363 9, 7, 4
2364 643
2365 21, 14, 5
2366 16, 13, 6
2367 610
2368 13, 11, 8
2369 77
2370 12, 11, 6
2371 20, 18, 17
2372 1139
2373 17, 14, 5
2374 24, 16, 13
2375 198
2376 7, 5, 4
2377 381
2378 243
2379 22, 9, 3
2380 1
2381 18, 12, 2
2382 429
2383 49
2384 21, 19, 1
2385 607
2386 11, 9, 1
2387 8, 7, 6
2388 11
2389 31, 12, 10
2390 629
2391 956
2392 31, 13, 3
2393 59
2394 423
2395 17, 8, 7
2396 173
2397 22, 17, 4
2398 15, 13, 11
2399 107
2400 20, 19, 17
2401 61
2402 251
2403 11, 8, 2
2404 67
2405 17, 14, 5
2406 14, 12, 5
2407 91
2408 23, 6, 4
2409 1198
2410 807
2411 12, 2, 1
2412 25
2413 11, 6, 1
2414 29
2415 154
2416 23, 6, 5
2417 225
2418 311
2419 22, 16, 6
2420 77
2421 11, 8, 4
2422 1117
2423 102
2424 21, 16, 6
2425 678
2426 20, 4, 3
2427 8, 6, 5
2428 301
2429 22, 14, 7
2430 477
2431 303
2432 29, 22, 19
2433 305
2434 507
2435 18, 6, 2
2436 145
2437 9, 4, 3
2438 929
2439 404
2440 12, 7, 5
2441 339
2442 127
2443 15, 13, 4
2444 1115
2445 23, 20, 10
2446 18, 13, 6
2447 786
2448 21, 10, 4
2449 621
2450 191
2451 10, 4, 3
2452 331
2453 21, 14, 11
2454 357
2455 313
2456 12, 5, 3
2457 238
2458 23, 20, 18
2459 17, 7, 4
2460 35
2461 19, 18, 10
2462 22, 13, 8
2463 1172
2464 5, 4, 3
2465 531
2466 599
2467 18, 14, 2
2468 99
2469 26, 16, 11
2470 217
2471 15, 6, 3
2472 12, 3, 1
2473 225
2474 899
2475 12, 11, 9
2476 17, 3, 2
2477 19, 17, 6
2478 765
2479 72
2480 20, 5, 2
2481 710
2482 11, 7, 6
2483 12, 11, 2
2484 523
2485 142
2486 19, 14, 9
2487 155
2488 23, 13, 9
2489 315
2490 8, 7, 5
2491 25, 16, 12
2492 141
2493 18, 15, 7
2494 13, 8, 2
2495 497
2496 12, 3, 1
2497 1171
2498 8, 7, 4
2499 13, 12, 9
2500 135
2501 22, 21, 5
2502 45
2503 316
2504 19, 8, 6
2505 131
2506 17, 11, 3
2507 13, 8, 1
2508 25
2509 14, 13, 3
2510 1113
2511 110
2512 29, 21, 7
2513 99
2514 183
2515 8, 7, 5
2516 563
2517 14, 4, 1
2518 18, 13, 2
2519 579
2520 31, 15, 13
2521 426
2522 16, 10, 5
2523 23, 17, 14
2524 15, 6, 4
2525 7, 6, 5
2526 141
2527 640
2528 19, 9, 4
2529 49
2530 14, 5, 3
2531 6, 2, 1
2532 26, 22, 13
2533 10, 3, 1
2534 185
2535 24, 19, 16
2536 21, 10, 9
2537 77
2538 315
2539 10, 9, 3
2540 209
2541 11, 8, 7
2542 97
2543 240
2544 21, 20, 6
2545 982
2546 891
2547 22, 10, 3
2548 373
2549 10, 9, 5
2550 333
2551 103
2552 28, 3, 2
2553 28
2554 1123
2555 9, 6, 2
2556 349
2557 18, 17, 7
2558 18, 8, 1
2559 23
2560 9, 3, 1
2561 201
2562 203
2563 12, 11, 10
2564 561
2565 25, 16, 14
2566 37
2567 122
2568 8, 5, 2
2569 69
2570 18, 15, 14
2571 18, 16, 9
2572 535
2573 12, 11, 3
2574 5
2575 867
2576 7, 2, 1
2577 674
2578 15, 7, 3
2579 23, 6, 1
2580 105
2581 26, 14, 12
2582 22, 19, 15
2583 31
2584 25, 19, 12
2585 263
2586 1047
2587 23, 12, 10
2588 13, 8, 1
2589 29, 11, 10
2590 1017
2591 219
2592 15, 12, 5
2593 297
2594 863
2595 24, 17, 2
2596 145
2597 16, 8, 7
2598 225
2599 289
2600 14, 13, 7
2601 406
2602 11, 6, 1
2603 18, 8, 7
2604 435
2605 19, 14, 5
2606 1181
2607 34
2608 15, 11, 2
2609 425
2610 427
2611 27, 17, 10
2612 21, 14, 6
2613 14, 12, 9
2614 553
2615 518
2616 17, 8, 7
2617 462
2618 71
2619 17, 10, 1
2620 835
2621 8, 7, 1
2622 11, 5, 3
2623 409
2624 15, 10, 4
2625 112
2626 43
2627 20, 17, 11
2628 47
2629 13, 9, 6
2630 177
2631 139
2632 19, 5, 3
2633 1241
2634 20, 11, 5
2635 25, 21, 14
2636 18, 11, 10
2637 9, 6, 4
2638 10, 3, 1
2639 144
2640 23, 11, 9
2641 736
2642 551
2643 16, 13, 10
2644 597
2645 18, 11, 10
2646 297
2647 513
2648 15, 8, 1
2649 689
2650 17, 13, 5
2651 7, 5, 4
2652 519
2653 17, 4, 2
2654 20, 16, 13
2655 53
2656 19, 11, 5
2657 242
2658 6, 3, 2
2659 20, 18, 16
2660 5
2661 17, 14, 2
2662 14, 12, 7
2663 458
2664 27, 21, 19
2665 772
2666 663
2667 254
2668 819
2669 18, 4, 2
2670 229
2671 46
2672 18, 7, 1
2673 530
2674 967
2675 13, 10, 9
2676 93
2677 17, 8, 6
2678 15, 6, 5
2679 286
2680 15, 9, 4
2681 635
2682 463
2683 11, 6, 1
2684 14, 12, 3
2685 8, 2, 1
2686 789
2687 225
2688 21, 10, 6
2689 36
2690 12, 9, 3
2691 14, 10, 8
2692 577
2693 10, 5, 3
2694 621
2695 123
2696 17, 15, 12
2697 170
2698 963
2699 32, 30, 29
2700 3
2701 12, 10, 5
2702 257
2703 67
2704 12, 9, 7
2705 12, 10, 5
2706 515
2707 9, 6, 4
2708 423
2709 10, 9, 3
2710 7, 3, 1
2711 690
2712 21, 12, 7
2713 840
2714 12, 8, 7
2715 30, 26, 15
2716 255
2717 14, 8, 3
2718 369
2719 102
2720 25, 18, 1
2721 826
2722 127
2723 9, 6, 5
2724 121
2725 21, 17, 2
2726 10, 6, 1
2727 430
2728 21, 7, 5
2729 96
2730 343
2731 15, 11, 2
2732 845
2733 19, 8, 7
2734 9, 5, 4
2735 933
2736 16, 3, 1
2737 226
2738 923
2739 12, 9, 5
2740 109
2741 6, 5, 4
2742 149
2743 447
2744 19, 18, 10
2745 484
2746 9, 7, 2
2747 15, 11, 6
2748 25
2749 22, 18, 17
2750 629
2751 49
2752 15, 4, 2
2753 716
2754 231
2755 13, 7, 6
2756 159
2757 24, 23, 12
2758 17, 5, 4
2759 842
2760 29, 26, 7
2761 108
2762 1319
2763 12, 10, 6
2764 687
2765 16, 10, 3
2766 1285
2767 102
2768 25, 19, 15
2769 269
2770 567
2771 13, 12, 5
2772 135
2773 30, 25, 20
2774 28, 3, 2
2775 802
2776 7, 3, 2
2777 22, 21, 17
2778 1095
2779 20, 17, 9
2780 51
2781 28, 27, 10
2782 22, 10, 9
2783 168
2784 29, 21, 15
2785 349
2786 339
2787 19, 18, 3
2788 21, 16, 2
2789 14, 12, 8
2790 837
2791 490
2792 12, 7, 2
2793 343
2794 11, 9, 4
2795 10, 8, 4
2796 769
2797 19, 6, 1
2798 20, 14, 5
2799 880
2800 17, 14, 6
2801 279
2802 18, 14, 3
2803 18, 16, 13
2804 609
2805 24, 8, 2
2806 729
2807 270
2808 15, 13, 1
2809 1342
2810 23, 10, 9
2811 10, 9, 7
2812 453
2813 13, 7, 6
2814 621
2815 84
2816 21, 19, 8
2817 109
2818 15, 9, 1
2819 10, 6, 5
2820 815
2821 16, 6, 4
2822 18, 17, 3
2823 592
2824 15, 14, 10
2825 288
2826 135
2827 19, 10, 6
2828 1103
2829 9, 6, 4
2830 17, 15, 13
2831 186
2832 27, 18, 1
2833 409
2834 15, 13, 7
2835 20, 13, 5
2836 1113
2837 17, 8, 3
2838 20, 4, 1
2839 1033
2840 20, 15, 9
2841 370
2842 1231
2843 7, 3, 2
2844 25
2845 10, 9, 1
2846 23, 15, 4
2847 329
2848 15, 8, 1
2849 114
2850 1411
2851 10, 7, 1
2852 1145
2853 14, 8, 1
2854 313
2855 41
2856 15, 13, 3
2857 756
2858 17, 9, 7
2859 29, 20, 11
2860 603
2861 20, 16, 10
2862 405
2863 139
2864 21, 17, 15
2865 212
2866 9, 7, 2
2867 15, 13, 10
2868 915
2869 8, 6, 1
2870 12, 11, 1
2871 272
2872 21, 5, 2
2873 75
2874 13, 6, 3
2875 20, 16, 2
2876 605
2877 10, 7, 4
2878 781
2879 149
2880 13, 10, 6
2881 1201
2882 1431
2883 16, 13, 12
2884 529
2885 13, 11, 6
2886 20, 14, 9
2887 469
2888 11, 4, 1
2889 76
2890 31
2891 16, 15, 10
2892 309
2893 27, 7, 2
2894 16, 14, 9
2895 358
2896 29, 6, 1
2897 15
2898 91
2899 19, 10, 1
2900 303
2901 11, 3, 2
2902 14, 10, 9
2903 279
2904 27, 15, 6
2905 321
2906 1155
2907 17, 14, 1
2908 19, 13, 10
2909 23, 22, 4
2910 1301
2911 685
2912 16, 9, 2
2913 238
2914 351
2915 18, 7, 5
2916 21
2917 16, 15, 4
2918 237
2919 149
2920 19, 9, 5
2921 480
2922 559
2923 11, 6, 5
2924 12, 4, 1
2925 12, 4, 3
2926 20, 14, 1
2927 974
2928 24, 21, 11
2929 651
2930 9, 4, 1
2931 13, 8, 1
2932 14, 7, 6
2933 15, 14, 13
2934 713
2935 13, 12, 7
2936 5, 3, 2
2937 172
2938 499
2939 30, 17, 5
2940 49
2941 23, 18, 17
2942 1425
2943 320
2944 5, 3, 2
2945 146
2946 551
2947 22, 20, 11
2948 17, 3, 2
2949 17, 7, 4
2950 397
2951 872
2952 17, 13, 2
2953 33
2954 9, 6, 5
2955 12, 10, 6
2956 823
2957 19, 14, 3
2958 23, 13, 5
2959 69
2960 12, 3, 2
2961 86
2962 319
2963 21, 14, 5
2964 83
2965 25, 22, 15
2966 861
2967 1028
2968 29, 27, 4
2969 561
2970 583
2971 18, 13, 2
2972 693
2973 18, 10, 4
2974 11, 3, 1
2975 192
2976 21, 10, 3
2977 126
2978 375
2979 12, 11, 6
2980 381
2981 13, 2, 1
2982 669
2983 330
2984 17, 9, 6
2985 166
2986 343
2987 8, 3, 2
2988 313
2989 18, 9, 7
2990 26, 22, 9
2991 292
2992 23, 3, 1
2993 569
2994 303
2995 9, 6, 4
2996 345
2997 12, 6, 5
2998 669
2999 1011
3000 15, 12, 9
3001 975
3002 22, 21, 10
3003 12, 11, 5
3004 351
3005 14, 12, 5
3006 15, 9, 6
3007 963
3008 15, 13, 1
3009 1349
3010 25, 12, 10
3011 22, 8, 6
3012 1327
3013 23, 6, 2
3014 17, 15, 5
3015 308
3016 38, 25, 9
3017 108
3018 203
3019 16, 6, 1
3020 413
3021 22, 10, 1
3022 14, 12, 1
3023 734
3024 32, 3, 2
3025 757
3026 19, 18, 13
3027 17, 16, 4
3028 135
3029 11, 6, 4
3030 12, 9, 4
3031 55
3032 17, 15, 4
3033 238
3034 399
3035 21, 20, 2
3036 391
3037 7, 6, 3
3038 633
3039 436
3040 27, 21, 3
3041 776
3042 415
3043 18, 16, 15
3044 69
3045 17, 14, 11
3046 1021
3047 19, 15, 4
3048 18, 3, 2
3049 765
3050 651
3051 19, 17, 16
3052 363
3053 22, 20, 15
3054 21, 4, 3
3055 13, 7, 1
3056 5, 4, 3
3057 110
3058 811
3059 15, 10, 1
3060 405
3061 22, 15, 1
3062 1053
3063 32
3064 25, 11, 9
3065 432
3066 455
3067 18, 16, 13
3068 215
3069 34, 26, 19
3070 20, 13, 8
3071 65
3072 11, 10, 5
3073 184
3074 17, 9, 3
3075 16, 14, 10
3076 475
3077 12, 10, 8
3078 105
3079 174
3080 21, 19, 16
3081 64
3082 9, 6, 1
3083 23, 20, 18
3084 109
3085 25, 14, 12
3086 1281
3087 49
3088 20, 13, 11
3089 261
3090 279
3091 12, 7, 5
3092 45
3093 14, 11, 8
3094 769
3095 419
3096 33, 29, 14
3097 1162
3098 18, 17, 11
3099 14, 13, 11
3100 45
3101 10, 7, 3
3102 225
3103 124
3104 23, 9, 5
3105 833
3106 6, 2, 1
3107 14, 12, 11
3108 61
3109 26, 20, 19
3110 1421
3111 199
3112 17, 15, 1
3113 191
3114 19, 15, 4
3115 25, 18, 16
3116 461
3117 19, 8, 4
3118 525
3119 315
3120 18, 17, 11
3121 493
3122 22, 7, 6
3123 15, 10, 4
3124 861
3125 24, 21, 18
3126 449
3127 139
3128 30, 19, 11
3129 23
3130 867
3131 22, 8, 7
3132 123
3133 6, 4, 3
3134 89
3135 356
3136 15, 12, 10
3137 587
3138 29, 19, 13
3139 14, 11, 10
3140 1115
3141 23, 18, 12
3142 981
3143 8
3144 23, 21, 8
3145 112
3146 18, 11, 6
3147 17, 10, 7
3148 1171
3149 22, 3, 2
3150 253
3151 1254
3152 21, 17, 6
3153 98
3154 19, 17, 6
3155 15, 12, 2
3156 565
3157 24, 14, 10
3158 19, 9, 5
3159 103
3160 7, 6, 2
3161 858
3162 315
3163 18, 13, 10
3164 113
3165 17, 13, 10
3166 18, 10, 1
3167 672
3168 33, 31, 18
3169 1123
3170 783
3171 19, 14, 13
3172 301
3173 20, 17, 14
3174 81
3175 646
3176 13, 10, 5
3177 484
3178 915
3179 22, 12, 2
3180 1085
3181 12, 10, 3
3182 1205
3183 1225
3184 11, 10, 2
3185 204
3186 891
3187 9, 8, 2
3188 129
3189 19, 18, 12
3190 12, 4, 1
3191 495
3192 25, 8, 7
3193 211
3194 1059
3195 19, 14, 1
3196 175
3197 22, 16, 14
3198 841
3199 54
3200 11, 6, 4
3201 674
3202 24, 12, 3
3203 14, 7, 3
3204 31
3205 17, 9, 2
3206 15, 8, 6
3207 704
3208 16, 13, 3
3209 81
3210 1303
3211 12, 10, 5
3212 1559
3213 30, 16, 1
3214 1197
3215 614
3216 21, 11, 3
3217 67
3218 10, 9, 8
3219 24, 10, 3
3220 19
3221 11, 6, 5
3222 145
3223 784
3224 23, 19, 1
3225 101
3226 9, 7, 5
3227 8, 7, 6
3228 1225
3229 12, 9, 7
3230 501
3231 15, 9, 8
3232 12, 9, 7
3233 575
3234 511
3235 21, 11, 8
3236 887
3237 19, 8, 4
3238 409
3239 98
3240 12, 3, 2
3241 127
3242 27, 13, 7
3243 22, 13, 5
3244 1249
3245 11, 10, 4
3246 1221
3247 426
3248 15, 8, 1
3249 149
3250 15, 11, 8
3251 9, 6, 5
3252 567
3253 10, 5, 3
3254 1485
3255 124
3256 31, 26, 2
3257 806
3258 203
3259 22, 4, 1
3260 237
3261 18, 12, 10
3262 15, 13, 7
3263 939
3264 17, 5, 2
3265 18, 16, 7
3266 19, 2, 1
3267 20, 19, 10
3268 73
3269 22, 3, 2
3270 237
3271 333
3272 23, 10, 1
3273 1408
3274 775
3275 24, 13, 10
3276 69
3277 25, 22, 1
3278 22, 12, 1
3279 446
3280 16, 15, 6
3281 47
3282 783
3283 30, 28, 21
3284 24, 17, 13
3285 18, 4, 1
3286 397
3287 717
3288 21, 18, 11
3289 43
3290 11, 7, 3
3291 18, 7, 1
3292 61
3293 20, 18, 15
3294 249
3295 594
3296 19, 14, 13
3297 7
3298 639
3299 18, 17, 14
3300 55
3301 24, 10, 4
3302 605
3303 1336
3304 19, 17, 3
3305 806
3306 127
3307 15, 10, 2
3308 717
3309 23, 20, 6
3310 1
3311 618
3312 14, 9, 3
3313 436
3314 1019
3315 12, 8, 2
3316 1641
3317 22, 17, 7
3318 585
3319 58
3320 17, 10, 4
3321 20
3322 567
3323 28, 14, 10
3324 173
3325 25, 19, 10
3326 1145
3327 875
3328 17, 9, 2
3329 525
3330 191
3331 18, 17, 11
3332 587
3333 16, 8, 7
3334 6, 4, 1
3335 636
3336 11, 10, 5
3337 370
3338 1155
3339 22, 16, 12
3340 11, 7, 5
3341 25, 19, 12
3342 9, 6, 5
3343 73
3344 30, 27, 15
3345 796
3346 15, 6, 1
3347 23, 18, 16
3348 177
3349 20, 19, 17
3350 1401
3351 731
3352 21, 20, 19
3353 389
3354 10, 9, 3
3355 10, 6, 4
3356 339
3357 24, 17, 15
3358 19, 8, 6
3359 99
3360 18, 15, 5
3361 12, 10, 4
3362 11, 7, 4
3363 14, 10, 2
3364 85
3365 24, 15, 2
3366 257
3367 136
3368 7, 5, 1
3369 1541
3370 15, 10, 1
3371 30, 29, 18
3372 47
3373 14, 6, 4
3374 417
3375 49
3376 11, 9, 1
3377 236
3378 623
3379 25, 20, 9
3380 659
3381 7, 4, 1
3382 217
3383 956
3384 21, 9, 3
3385 603
3386 19, 9, 2
3387 26, 25, 16
3388 169
3389 17, 15, 4
3390 1381
3391 465
3392 23, 13, 6
3393 1615
3394 13, 12, 3
3395 22, 10, 6
3396 13, 6, 1
3397 19, 4, 1
3398 245
3399 416
3400 14, 13, 6
3401 531
3402 387
3403 15, 12, 6
3404 173
3405 24, 9, 2
3406 22, 13, 12
3407 507
3408 16, 15, 6
3409 244
3410 1023
3411 14, 8, 5
3412 325
3413 14, 9, 6
3414 93
3415 1272
3416 28, 27, 1
3417 32
3418 15
3419 12, 9, 3
3420 423
3421 19, 14, 5
3422 1121
3423 11
3424 22, 15, 6
3425 189
3426 1071
3427 16, 12, 1
3428 17, 16, 13
3429 16, 12, 6
3430 153
3431 153
3432 25, 2, 1
3433 28, 25, 12
3434 14, 13, 12
3435 15, 14, 5
3436 159
3437 18, 16, 10
3438 393
3439 147
3440 27, 16, 1
3441 394
3442 8, 7, 3
3443 26, 19, 3
3444 69
3445 21, 5, 2
3446 21, 17, 8
3447 404
3448 17, 11, 6
3449 917
3450 11, 8, 3
3451 19, 14, 9
3452 1145
3453 16, 6, 1
3454 25, 23, 21
3455 21
3456 19, 18, 9
3457 120
3458 519
3459 19, 18, 12
3460 1495
3461 20, 10, 7
3462 225
3463 289
3464 11, 6, 3
3465 304
3466 43
3467 28, 26, 6
3468 921
3469 38, 16, 6
3470 917
3471 314
3472 17, 14, 7
3473 720
3474 735
3475 30, 16, 13
3476 525
3477 16, 15, 12
3478 465
3479 155
3480 19, 15, 13
3481 546
3482 15, 5, 4
3483 12, 5, 2
3484 1329
3485 8, 7, 4
3486 1085
3487 120
3488 12, 11, 1
3489 518
3490 16, 12, 3
3491 19, 14, 7
3492 57
3493 19, 18, 1
3494 25, 19, 9
3495 254
3496 35, 21, 4
3497 1025
3498 567
3499 29, 24, 4
3500 375
3501 15, 8, 2
3502 15, 13, 6
3503 993
3504 23, 17, 10
3505 103
3506 13, 5, 3
3507 21, 14, 6
3508 10, 7, 6
3509 23, 12, 7
3510 81
3511 1141
3512 37, 35, 6
3513 41
3514 11, 9, 4
3515 17, 10, 9
3516 667
3517 22, 14, 12
3518 16, 14, 9
3519 569
3520 32, 29, 3
3521 129
3522 399
3523 23, 12, 2
3524 1439
3525 10, 7, 5
3526 12, 11, 10
3527 476
3528 25, 18, 7
3529 270
3530 10, 9, 5
3531 18, 3, 1
3532 1561
3533 30, 3, 2
3534 973
3535 162
3536 12, 7, 5
3537 218
3538 13, 6, 5
3539 16, 2, 1
3540 75
3541 23, 7, 2
3542 345
3543 377
3544 21, 14, 2
3545 998
3546 151
3547 26, 23, 12
3548 255
3549 14, 6, 3
3550 1269
3551 183
3552 15, 9, 6
3553 13, 3, 2
3554 24, 23, 17
3555 28, 25, 15
3556 127
3557 14, 8, 5
3558 397
3559 69
3560 17, 3, 2
3561 257
3562 927
3563 18, 15, 6
3564 225
3565 22, 17, 12
3566 8, 6, 1
3567 24, 20, 12
3568 21, 12, 10
3569 1028
3570 699
3571 30, 13, 3
3572 1143
3573 13, 8, 2
3574 889
3575 339
3576 19, 10, 3
3577 348
3578 17, 9, 5
3579 20, 14, 6
3580 915
3581 22, 15, 2
3582 713
3583 747
3584 25, 12, 10
3585 7
3586 19, 14, 8
3587 26, 6, 5
3588 843
3589 30, 28, 8
3590 1713
3591 509
3592 38, 33, 14
3593 72
3594 59
3595 28, 14, 2
3596 383
3597 22, 9, 3
3598 24, 5, 1
3599 114
3600 9, 5, 2
3601 669
3602 10, 2, 1
3603 23, 11, 6
3604 637
3605 8, 7, 4
3606 861
3607 142
3608 15, 14, 10
3609 1016
3610 12, 5, 2
3611 18, 7, 1
3612 215
3613 17, 7, 6
3614 29
3615 47
3616 25, 18, 7
3617 377
3618 1539
3619 13, 12, 5
3620 231
3621 22, 21, 16
3622 481
3623 10, 9, 7
3624 29, 27, 12
3625 279
3626 26, 25, 13
3627 7, 6, 4
3628 957
3629 15, 10, 2
3630 729
3631 90
3632 26, 17, 5
3633 553
3634 651
3635 15, 8, 2
3636 391
3637 7, 6, 5
3638 28, 8, 1
3639 76
3640 20, 15, 10
3641 1626
3642 771
3643 14, 13, 8
3644 1365
3645 21, 14, 6
3646 20, 17, 6
3647 45
3648 23, 7, 2
3649 394
3650 1691
3651 15, 13, 6
3652 721
3653 10, 9, 8
3654 273
3655 112
3656 17, 12, 11
3657 928
3658 1471
3659 18, 13, 2
3660 61
3661 16, 11, 6
3662 1365
3663 130
3664 35, 24, 14
3665 189
3666 30, 20, 11
3667 15, 6, 4
3668 269
3669 22, 7, 4
3670 23, 4, 3
3671 101
3672 19, 17, 8
3673 544
3674 27, 15, 11
3675 30, 10, 9
3676 609
3677 25, 20, 7
3678 501
3679 21
3680 14, 13, 7
3681 115
3682 471
3683 15, 13, 10
3684 81
3685 9, 4, 3
3686 81
3687 889
3688 32, 13, 11
3689 759
3690 839
3691 26, 9, 2
3692 6, 5, 3
3693 26, 20, 18
3694 1129
3695 62
3696 36, 33, 22
3697 91
3698 1719
3699 24, 21, 5
3700 675
3701 4, 2, 1
3702 1281
3703 429
3704 14, 13, 1
3705 148
3706 1195
3707 11, 6, 1
3708 147
3709 16, 14, 6
3710 797
3711 1735
3712 13, 12, 7
3713 413
3714 459
3715 20, 18, 11
3716 24, 11, 4
3717 18, 15, 4
3718 23, 18, 10
3719 488
3720 17, 15, 11
3721 31
3722 15, 7, 5
3723 18, 6, 4
3724 10, 9, 8
3725 21, 14, 8
3726 609
3727 42
3728 9, 4, 2
3729 184
3730 1191
3731 26, 20, 5
3732 1327
3733 8, 7, 3
3734 1305
3735 46
3736 33, 22, 18
3737 287
3738 75
3739 18, 10, 5
3740 95
3741 16, 15, 4
3742 25, 18, 11
3743 279
3744 27, 14, 2
3745 684
3746 22, 9, 7
3747 32, 22, 11
3748 19, 11, 8
3749 15, 4, 1
3750 1013
3751 435
3752 9, 4, 2
3753 407
3754 1611
3755 15, 13, 8
3756 291
3757 18, 16, 5
3758 21, 20, 9
3759 208
3760 23, 9, 1
3761 30
3762 383
3763 23, 10, 2
3764 1307
3765 28, 19, 12
3766 21, 15, 1
3767 672
3768 14, 7, 2
3769 300
3770 107
3771 13, 10, 9
3772 61
3773 10, 9, 4
3774 24, 9, 4
3775 1416
3776 7, 5, 4
3777 1414
3778 9, 5, 1
3779 23, 8, 2
3780 63
3781 10, 9, 6
3782 1785
3783 272
3784 29, 13, 6
3785 87
3786 1027
3787 14, 6, 1
3788 1173
3789 16, 15, 4
3790 22, 21, 17
3791 45
3792 20, 7, 5
3793 481
3794 17, 4, 3
3795 8, 7, 5
3796 127
3797 16, 8, 6
3798 1337
3799 202
3800 24, 23, 21
3801 112
3802 16, 15, 8
3803 18, 15, 6
3804 349
3805 18, 12, 9
3806 9, 7, 5
3807 68
3808 29, 18, 4
3809 938
3810 323
3811 9, 8, 4
3812 1799
3813 11, 8, 7
3814 22, 21, 11
3815 143
3816 19, 13, 9
3817 252
3818 17, 8, 6
3819 16, 6, 3
3820 20, 11, 3
3821 8, 7, 6
3822 29
3823 609
3824 19, 13, 2
3825 437
3826 23, 8, 1
3827 18, 13, 8
3828 1217
3829 13, 9, 6
3830 713
3831 310
3832 35, 13, 2
3833 35
3834 567
3835 15, 5, 4
3836 681
3837 22, 18, 3
3838 273
3839 503
3840 27, 9, 1
3841 840
3842 1331
3843 16, 5, 2
3844 1063
3845 11, 10, 9
3846 693
3847 108
3848 29, 18, 13
3849 71
3850 583
3851 29, 24, 19
3852 169
3853 12, 7, 5
3854 765
3855 1399
3856 39, 25, 3
3857 50
3858 459
3859 14, 8, 7
3860 35
3861 31, 10, 2
3862 18, 16, 5
3863 834
3864 19, 15, 9
3865 289
3866 315
3867 20, 14, 6
3868 13, 12, 9
3869 24, 22, 13
3870 913
3871 264
3872 10, 3, 2
3873 32
3874 20, 8, 3
3875 11, 10, 4
3876 157
3877 17, 11, 4
3878 19, 9, 2
3879 121
3880 27, 5, 1
3881 810
3882 1775
3883 20, 9, 2
3884 45
3885 15, 8, 3
3886 273
3887 915
3888 45, 42, 6
3889 340
3890 20, 19, 10
3891 17, 9, 2
3892 289
3893 16, 13, 2
3894 1197
3895 777
3896 15, 7, 5
3897 310
3898 25, 9, 1
3899 21, 20, 12
3900 65
3901 26, 6, 1
3902 1845
3903 350
3904 17, 13, 2
3905 26
3906 251
3907 15, 4, 1
3908 855
3909 14, 12, 11
3910 28, 22, 13
3911 1673
3912 24, 11, 2
3913 393
3914 531
3915 25, 22, 9
3916 445
3917 16, 12, 11
3918 117
3919 285
3920 15, 13, 8
3921 785
3922 26, 21, 1
3923 24, 21, 3
3924 245
3925 18, 16, 5
3926 17, 16, 12
3927 367
3928 8, 7, 5
3929 1440
3930 199
3931 23, 9, 4
3932 1563
3933 30, 19, 3
3934 28, 12, 5
3935 20, 15, 8
3936 15, 5, 3
3937 252
3938 1835
3939 28, 19, 10
3940 21, 5, 2
3941 19, 11, 6
3942 57
3943 1125
3944 31, 29, 28
3945 427
3946 1155
3947 22, 10, 5
3948 293
3949 28, 22, 3
3950 873
3951 752
3952 11, 6, 5
3953 698
3954 503
3955 24, 8, 5
3956 429
3957 18, 16, 10
3958 27, 4, 2
3959 891
3960 29, 15, 2
3961 756
3962 255
3963 13, 8, 1
3964 735
3965 14, 3, 2
3966 337
3967 357
3968 25, 18, 14
3969 196
3970 163
3971 10, 7, 2
3972 595
3973 13, 11, 8
3974 861
3975 322
3976 36, 3, 1
3977 221
3978 19, 9, 7
3979 25, 9, 2
3980 16, 9, 4
3981 21, 11, 8
3982 21, 13, 8
3983 11
3984 19, 5, 2
3985 1038
3986 12, 8, 7
3987 11, 4, 2
3988 1017
3989 6, 5, 2
3990 469
3991 168
3992 27, 8, 6
3993 1468
3994 19, 12, 9
3995 12, 9, 8
3996 19
3997 16, 13, 3
3998 153
3999 1250
4000 31, 18, 17
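As an illustration of how a row of the table can be used, the following Python sketch builds the polynomial described by a row and checks its irreducibility with Rabin's test. It assumes that a row m with a single entry k denotes the trinomial x^m + x^k + 1 and that a row with three entries denotes the pentanomial x^m + x^k1 + x^k2 + x^k3 + 1. All function names are hypothetical and chosen for this example only. The sketch verifies irreducibility; it does not verify that the listed polynomial has minimum weight, and it is not the Magma procedure that was used to produce the table.

```python
# Polynomials over F2 are stored as Python integers: bit i is the
# coefficient of x^i. All names below are illustrative assumptions.

def poly_from_row(m, ks):
    """Return x^m + sum(x^k for k in ks) + 1 as an integer bit mask."""
    f = (1 << m) | 1
    for k in ks:
        f |= 1 << k
    return f

def gf2_square(a):
    """Square a polynomial over F2: the coefficient of x^i moves to x^(2i)."""
    # Reading the binary digits of a as base-4 digits interleaves zeros.
    return int(format(a, "b"), 4)

def reduce_sparse(a, m, ks):
    """Reduce a modulo x^m + sum(x^k) + 1, using x^m = sum(x^k) + 1."""
    mask = (1 << m) - 1
    while a >> m:
        hi = a >> m          # part of a with degree >= m, divided by x^m
        a = (a & mask) ^ hi  # replace hi * x^m by hi * 1 ...
        for k in ks:
            a ^= hi << k     # ... plus hi * x^k for every middle term
    return a

def gf2_gcd(a, b):
    """Greatest common divisor of two polynomials over F2 (Euclid)."""
    while b:
        db = b.bit_length()
        while a.bit_length() >= db:
            a ^= b << (a.bit_length() - db)
        a, b = b, a
    return a

def prime_divisors(n):
    """Distinct prime divisors of n, by trial division."""
    ps, d = [], 2
    while d * d <= n:
        if n % d == 0:
            ps.append(d)
            while n % d == 0:
                n //= d
        d += 1
    if n > 1:
        ps.append(n)
    return ps

def is_irreducible(m, ks):
    """Rabin's test for x^m + sum(x^k for k in ks) + 1 over F2, m >= 2."""
    if m < 2:
        raise ValueError("this sketch assumes m >= 2")
    f = poly_from_row(m, ks)
    checkpoints = {m // p for p in prime_divisors(m)}
    x = 2                    # the polynomial x
    h = x
    for i in range(1, m + 1):
        h = reduce_sparse(gf2_square(h), m, ks)  # h = x^(2^i) mod f
        # f must share no factor with x^(2^(m/p)) + x for any prime p | m.
        if i in checkpoints and gf2_gcd(f, h ^ x) != 1:
            return False
    return h == x            # f must divide x^(2^m) + x

if __name__ == "__main__":
    # Row "2380  1" of the table, read as x^2380 + x + 1.
    print(is_irreducible(2380, [1]))
```

Storing the polynomials as integers keeps the sketch short, and the sparse reduction exploits exactly the property that makes low-weight polynomials attractive: each reduction step needs only one shift and one exclusive-or per nonzero middle term.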
Bibliography
[1] N. H. Abel. Mémoires sur les équations algébriques où on démontre l’impossibilité de la résolution de l’équation générale du cinquième degré. Christiania, 1824. Facsimile edition, Trondheim, 1976.
[2] K. Araki, I. Fujita, and M. Morisue. Fast Inverters Over Finite Fields Based on Euclid’s Algorithm. Transactions of the Institute of Electronics, Information and Communication Engineers E, E72(11):1230–34, November 1989.
[3] Y. Asano, T. Itoh, and S. Tsujii. Generalised Fast Algorithm for Computing Multiplicative Inverses in GF(2^m). Electronics Letters, 25(10):664–65, May 1989.
[4] D. W. Ash, I. F. Blake, and S. A. Vanstone. Low Complexity Normal Bases. Discrete Applied Mathematics, 25:191–210, 1989.
[5] Elwyn R. Berlekamp. Algebraic Coding Theory. McGraw-Hill Book Company, New York, 1968.
[6] Elwyn R. Berlekamp. Bit-Serial Reed-Solomon Encoders. IEEE Transactions on Information Theory, IT-28(6):869–74, November 1982.
[7] Richard E. Blahut. Theory and Practice of Error Control Codes. Addison-Wesley Publishing Company, Reading, Massachusetts, 1983.
[8] Unjeng Cheng. On the Continued Fraction and Berlekamp’s Algorithm. IEEE Transactions on Information Theory, 30(3):541–44, 1984.
[9] George I. Davida. Inverse of Element of a Galois Field. Electronics Letters, 8(21):518–20, October 1972.
[10] Leonard Eugene Dickson. Linear Groups with an Exposition of the Galois Field Theory. Dover Publications Inc., Leipzig, 1901. Reprinted by Dover Publications, New York 1958.
[11] Jean Louis Dornstetter. On the Equivalence Between Berlekamp’s and Euclid’s Algorithms. IEEE Transactions on Information Theory, 33(3):428–31, 1987.
[12] Willard L. Eastman. Inside Euclid’s Algorithm. In Dijen Ray-Chaudhuri, editor, Coding Theory and Design Theory. Part I: Coding Theory, pages 113–127, New York, 1990. Springer-Verlag.
[13] Gui-Liang Feng. A VLSI Architecture for Fast Inversion in GF(2^m). IEEE Transactions on Computers, 38(10):1383–1386, October 1989.
[14] Gui-Liang Feng and Kenneth K. Tzeng. A Generalization of the Berlekamp-Massey Algorithm for Multisequence Shift-Register Synthesis with Applications to Decoding Cyclic Codes. IEEE Transactions on Information Theory, 37(5):1274–87, 1991.
[15] Évariste Galois. Sur la théorie des nombres. Bulletin des Sciences Mathématiques de Férussac, 13:428–35, June 1830.
[16] Joachim von zur Gathen. Inversion in finite fields using logarithmic depth. Journal of Symbolic Computation, 9(2):175–83, 1990.
[17] Carl Friedrich Gauss. Werke I: Disquisitiones arithmeticae. Göttingen, 1863.
[18] Willi Geiselmann. Algebraische Algorithmenentwicklung am Beispiel der Arithmetik in endlichen Körpern. PhD thesis, Universität Karlsruhe, Karlsruhe, Germany, 1993.
[19] M. A. Hasan. Shift Register Synthesis For Multiplicative Inversion Over GF(2^m). In IEEE International Symposium on Information Theory, page 49, Whistler, British Columbia, Canada, September 1995. IEEE.
[20] M. A. Hasan and V. K. Bhargava. Multiplication and Inversion Over a Class of GF(2^m). In IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, volume 1, pages 211–14, New York, May 1991. IEEE.
[21] M. A. Hasan and V. K. Bhargava. A VLSI Architecture for a Low Complexity Rate-Adaptive Reed-Solomon Encoder. In 16th Biennial Symposium on Communications, pages 331–34, Kingston, Canada, May 1992.
[22] M. A. Hasan and V. K. Bhargava. Bit-serial Systolic Divider and Multiplier for Finite Fields GF(2^m). IEEE Transactions on Computers, 41(8):972–80, August 1992.
[23] M. A. Hasan and V. K. Bhargava. Division and Bit-serial Multiplication over GF(q^m). IEE Proceedings E, 139(3):230–36, May 1992.
[24] Nils Hedenstierna and Kjell O. Jeppson. CMOS Circuit Speed and Buffer Optimization. IEEE Transactions on Computer-Aided Design, CAD-6(2):270–81, March 1987.
[25] I. N. Herstein. Topics in Algebra. John Wiley & Sons, New York, 1975.
[26] T. Itoh and S. Tsujii. A Fast Algorithm for Computing Multiplicative Inverses in GF(2^m) Using Normal Bases. Information and Computation, 78(3):171–77, September 1988.
[27] T. Itoh and S. Tsujii. Effective Recursive Algorithm for Computing Multiplicative Inverses in GF(2^m). Electronics Letters, 24(6):334–35, March 1988.
[28] M. Kovac, N. Ranganathan, and M. Varanasi. A Systolic Algorithm and Architecture for Galois Field Arithmetic. In V. K. Prasanna and L. H. Canter, editors, Proceedings. Sixth International Parallel Processing Symposium, pages 283–88, Los Alamitos, California, USA, March 1992. IEEE Comput. Soc. Press.
[29] M. Kovac, N. Ranganathan, and M. Varanasi. SIGMA: A VLSI Systolic Array Implementation of a Galois Field GF(2^m) Based Multiplication and Division Algorithm. IEEE Transactions on VLSI Systems, 1(1):22–30, March 1993.
[30] Rudolf Lidl and Harald Niederreiter. Finite Fields. Cambridge University Press, Cambridge, 1984.
[31] B. E. Litow and G. I. Davida. O(log(n)) Parallel Time Finite Field Inversion. In VLSI Algorithms and Architectures. 3rd Aegean Workshop on Computing, AWOC88. Corfu, Greece, pages 74–80, Berlin, New York, June 1988. Springer-Verlag.
[32] J. L. Massey and J. K. Omura. Apparatus for Finite Field Computa-tion. US Patent 4,587,627, 1986.
[33] James L. Massey. Shift-Register Synthesis and BCH Decoding. IEEETransactions on Information Theory, 15(1):122–127, 1969.
[34] Edoardo D. Mastrovito. VLSI Architectures for Computation in GaloisFields. PhD thesis, Linkoping University, Linkoping, Sweden, 1991.No. 242.
[35] R. J. McEliece. Finite Fields for Computer Scientists and Engineers.Kluwer Academic Publishers, Boston, 1987.
[36] M. Morii and M. Kasahara. Efficient Construction of Gate Circuit forComputing Multiplicative Inverses over GF (2m). Transactions of theInstitute of Electronics, Information and Communication Engineers E,E72(1):37–42, January 1989.
[37] M. Morii, M. Kasahara, and D. L. Whitling. Efficient Bit-Serial Mul-tiplication and the Discrete-Time Wiener-Hopf Equations over FiniteFields. IEEE Transactions on Information Theory, IT-35(6):1177–83,November 1989.
[38] R. C. Mullin, I. M. Onyszchuk, and S. A. Vanstone. Optimal Normal Bases in GF(p^n). Discrete Appl. Math., 22:149–61, 1988/89.
[39] Christof Paar. Efficient VLSI Architectures for Bit Parallel Computation in Galois Fields. PhD thesis, Universität GH Essen, Essen, Germany, 1994. No. 328.
[40] A.-E. Pellet. Sur les fonctions irréductibles suivant un module premier et une fonction modulaire. C. R. Acad. Sci. Paris, 70(7):328–30, 1870.
[41] Sam Perlis. Normal Bases of Cyclic Fields of Prime-Power Degree. Duke Mathematical Journal, 9:507–17, 1942.
[42] Irving S. Reed, Robert A. Scholtz, T. K. Truong, and Lloyd R. Welch. The Fast Decoding of Reed-Solomon Codes Using Fermat Theoretic Transforms and Continued Fractions. IEEE Transactions on Information Theory, 24(1):100–106, 1978.
[43] C. E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 27:379–423, 623–56, July, October 1948.
[44] A. Shen, A. Ghosh, S. Devadas, and K. Keutzer. On Average Power Dissipation and Random Pattern Testability of CMOS Combinational Logic Networks. In IEEE/ACM International Conference on Computer-Aided Design, pages 402–7, Santa Clara, CA, November 1992.
[45] H. J. M. Veendrick. Short-Circuit Dissipation of Static CMOS Circuitry and its Impact on the Design of Buffer Circuits. IEEE Journal on Solid-State Circuits, SC-19(4):468–73, August 1984.
[46] Zhe-xian Wan. Introduction to Abstract and Linear Algebra. Studentlitteratur, Lund, Sweden, 1992.
[47] C. C. Wang, T. K. Truong, H. M. Shao, L. J. Deutsch, J. K. Omura, and I. S. Reed. VLSI Architectures for Computing Multiplications and Inverses in GF(2^m). IEEE Transactions on Computers, C-34(8):709–17, August 1985.
[48] Chin-Liang Wang and Jung-Lung Lin. A Systolic Architecture for Computing Inverses in Finite Fields GF(2^m). In 1991 International Symposium on VLSI Technology, Systems, and Applications. Proceedings of Technical Papers, pages 312–16, New York, May 1991. IEEE.
[49] Chin-Liang Wang and Jung-Lung Lin. A Systolic Architecture for Computing Inverses in Finite Fields GF(2^m). IEEE Transactions on Computers, 42(9):1141–46, 1993.
[50] M. Z. Wang and I. F. Blake. Bit Serial Multiplication in Finite Fields. SIAM Journal on Discrete Mathematics, 3(1):140–48, February 1990.
[51] Heinrich Weber. Die allgemeinen Grundlagen der Galois’schen Gleichungstheorie. Mathematische Annalen, 43:521–49, 1893.
[52] Lloyd R. Welch and Robert A. Scholtz. Continued Fractions and Berlekamp’s Algorithm. IEEE Transactions on Information Theory, 25(1):19–27, 1979.
[53] Neil H. E. Weste and Kamran Eshraghian. Principles of CMOS VLSI Design. A Systems Perspective. Addison-Wesley Publishing Company, Reading, Massachusetts, 1988.
Index
Abelian group, 7
Adder tree, 25–26, 27, 54
Addition, 8
in F2, 9, 12, 15, 25–26, 62
in a ring, 8
in extension fields, 11
in finite fields, 5
Notation for, 5
of vectors, 11
Properties of, 108 (table), 115 (table), 118
Symbol for, 6 (figure)
Algebraic coding theory, 2
All-one polynomial, see Polynomial
AND gate, see Boolean AND
Area, see Model
Arithmetic
in F2, 9
in F4, 102–107
Architectures for, 104 (figure), 107 (figure)
Properties of, 107, 108 (table)
using normal bases, 105–106
using polynomial bases, 103–105
in F16, 109–115
Architectures for, 108 (figure), 111 (figure), 114 (figure)
Properties of, 115–117 (table)
in tower fields, 85–125
using bases of type I, 88–92
using bases of type II, 92–95
using bases of type III, 96–102
Background, 2–5
Bases
Canonical, 10
Conventional, 10
Dual, 10, 11, 13, 14, 37, 61, 74
exchange, 74–76
Multiple of, 13, 14, 37
Normal, 3, 10, 59, 61–67, 74, 76, 81, 85–87, 103, 105–107, 110, 112, 113, 115, 118, 120, 122, 125, 128, 130, 133
Best known, 68 (table)
Construction of, 66
Optimal, 66
of F4, 103
of tower fields, 86–88
of type I, 87, 88–92, 120–122, 125, 128, 130
of type II, 87, 92–95, 121–122, 125, 128, 130
of type III, 87, 96–102, 118, 121–122
Polynomial, 3, 10, 12–14, 29, 36–38, 47, 49, 58, 74, 76, 85–87, 103–107, 109, 110, 112, 115, 118, 120–122, 125, 128, 130
Standard, 10
Triangular, 36, 37, 38, 78
Berlekamp-Massey algorithm, 35, 38
relation to the Euclidean algorithm, 35
Boolean
AND, 5, 6 (figure), 16 (figure), 20
gates, 6 (figure), 15, 16 (figure)
Properties of, 28 (table)
Scaled, 19–21, 23
Symbols for, 5
Unscaled, 18, 19–21, 23, 24, 26
inversion, 5, 6 (figure), 20, 23
NAND, 6 (figure), 20
NOR, 6 (figure)
operations, 5
OR, 5, 6 (figure)
XNOR, 6 (figure)
XOR, 5, 6 (figure), 16 (figure), 26
Buffer, 26, 27, 43, 110, 112, 113, 118
Properties of, 28 (table), 107, 108 (table), 115, 115 (table), 118
Symbol for, 6 (figure)
Canonical basis, see Bases
Capacitance, 19
Capacitive load, 19
Chinese remainder theorem, 4
Clock frequency, 23
Clock signal, 22, 27, 35
CMOS, see Static CMOS
Commutative ring, 8
Complex field, 9
Composite field, 85
Conclusions, 128–130
Continued fractions, 35
Control
logic, 26
signal, 15, 26
Conventional basis, see Bases
Cost measures, 27–28
Critical path, 20, 32, 41, 47, 51, 54, 107, 115, 118
through control logic, 26
D Flip-flop, see Flip-flop
Delay, see Model
Division, 5
Drain capacitance, 19
Dual basis, see Bases
Dynamic power dissipation, see Power dissipation
Energy, see Model
Error correcting codes, 2
Error-locator polynomial, 35
Euclidean algorithm, 29, 30
for polynomials, 35
relation to the Berlekamp-Massey algorithm, 35
Exponentiation, 2
Extension field, 9, 9–10
Fast inverters, see Inversion
Feedback network, 38
Feedback shift register, see Shift register
Field, 8
Finite field, 8
Characteristic of, 10
Flip-flop, 15, 17 (figure), 27
Frequency, 23
Future Research, 133
Galois field, see Finite field
Galois imaginaries, 2
Gate, 15
Gate capacitance, 19
Gates, see Boolean gates
Gauss elimination, 36
Gauss-Jordan algorithm, 45
Architecture of, 46 (figure), 47, 48 (figure)
Generator, 9
Glitch, 22
Group, 7
Hamming complexity, 63
Best known, 68 (table)
Lower bound on, 66, 67
Upper bound on, 63, 64, 66, 67, 133
Hamming weight
of a matrix, 62
of a polynomial, 32, 41, 51
of a vector, 62, 65
Hankel matrix, see Matrix
Information theory, 2
Input gate capacitance, 19
Integers, 8
Interconnection, 21
Introduction, 1–5
Inversion, 2
based on exponentiation, 3, 69–74
Algorithm for, 69, 70, 73
Architecture of, 71 (figure), 75 (figure)
Properties of, 72 (table), 75 (table)
based on pattern recognition, 5
based on the Berlekamp-Massey algorithm, 35–45, 54, 76
Algorithm for, 39
Architecture of, 40 (figure), 38–41
Control signals of, 41
Properties of, 44 (table), 41–45, 77 (table)
based on the Euclidean algorithm, 3, 29–35, 54, 76
Algorithm for, 30
Architecture of, 31 (figure)
Control signals of, 31
Properties of, 35 (table), 32–35, 77 (table)
based on the Gauss-Jordan algorithm, 45–54, 77–80
Order of input signals for, 52 (table)
Preprocessor of, 50 (figure), 53 (figure)
Properties of, 49, 50 (table), 51, 52 (table), 79, 80 (table)
Boolean, see Boolean inversion
by a direct network, 4
by a multiplier tree, 3
by Araki/Fujita/Morisue, 3
by Asano/Itoh/Tsujii, 4
by Berlekamp, 3, 29
by Feng, 4, 73–74
by Hasan/Bhargava, 3, 4, 49, 77
by Itoh/Tsujii, 4, 74
by Kovac/Ranganathan/Varanasi, 5
by Mastrovito, 70
by Morii/Kasahara, 4, 90
by table lookup, 3
by Wang et al., 3, 69
by Wang/Lin, 4, 49
Fast, 129 (figure)
in F4, 105, 109
Architecture for, 104 (figure), 107 (figure)
Properties of, 108 (table)
in F16, 112–113
Architecture for, 114 (figure)
Properties of, 117 (table)
in extension fields, 11
in tower fields, 85–125
Properties of, 123 (figure), 124 (figure), 118–125
Low Energy, 132 (figure)
Small, 131 (figure)
using all subfields, 4
using bases of type I, 88–90
Architecture of, 89 (figure)
Properties of, 120–121
using bases of type II, 93
Architecture of, 92 (figure)
Properties of, 121
using bases of type III, 96–99
Architecture of, 98 (figure)
Properties of, 121–122
using normal bases, 59–81
Properties of, 80–81, 82–84 (figure)
using polynomial bases, 29–58, 74–80, 105
Properties of, 55 (figure), 56 (figure), 57 (figure), 54–58
using systolic arrays, 4
with logarithmic depth, 4
Inverter chain, 23, 27
Irreducible
pentanomial, 32, 135
polynomial, see Polynomial
trinomial, 32, 135
Large capacitive loads, 23–25
Leakage current, 22
Linear feedback shift register, see Shift register
Linear transformation, 10
Low energy inverters, see Inversion
Massey-Omura multiplier, see Multiplication
Matrix
Hankel, 13, 14, 36–39, 78
Nonsingular, 13, 36, 39, 45
representation, 11, 12, 13
Toeplitz, 4
Minimum size transistor, see MOS transistor
Mobility, 18
Model, 1, 15, 17–23
Area, 17, 21
Delay, 17–21
Energy, 22, 23, 27
Power dissipation, 17, 22–23, 27
Size, 21, 27
Time, 27
Monic polynomial, see Polynomial
MOS transistor, 1, 15
Length of, 18
Minimum size of, 18, 21
Scaled, 19–21
Unscaled, 18, 19
Width of, 18
Multiple transitions, see Transitions
Multiplication, 2, 69
by Berlekamp, 13
by Hasan/Bhargava, 3, 36
by Massey/Omura, 3, 59, 61–67
Properties of, 67, 68 (table)
used in inversion, 71
by Paar, 91
by Wang/Blake, 36
in F2, 9, 12, 15, 62
in F4, 103–106
Architecture for, 104 (figure), 107 (figure)
Properties of, 108 (table)
in F16, 113
Architecture for, 114 (figure)
Properties of, 116 (table)
in a field, 8
in a ring, 8
in extension fields, 11, 12
in finite fields, 5
Notation for, 5
Symbol for, 6 (figure)
using bases of type I, 91
Architecture of, 90 (figure)
using bases of type II, 93–95
Architecture of, 94 (figure)
using bases of type III, 99–100
Architecture of, 98 (figure)
using normal bases, 105–106
using polynomial bases, 13, 103–104
Multiplication by a constant
in F4, 104–106, 109
Architecture for, 104 (figure), 107, 108 (figure)
Properties of, 108 (table)
in F16, 109
Architecture for, 108 (figure)
Properties of, 115 (table)
in tower fields, 119
Symbol for, 6 (figure)
using bases of type I, 91
Architecture of, 90 (figure)
Properties of, 119
using bases of type II, 95
Architecture of, 94 (figure)
Properties of, 119
using bases of type III, 100
Architecture of, 101 (figure)
Properties of, 119
using normal bases, 106
using polynomial bases, 104–105
NAND gate, see Boolean NAND
nMOS transistor, 15
Nonsingular matrix, see Matrix
NOR gate, see Boolean NOR
Normal basis, see Bases
inverters, see Inversion
multipliers, see Multiplication
Normalized
area, 21, 27, 28 (table)
of an adder tree, 26
of an inverter chain, 24–25
of control logic, 26
capacitance, 20
delay, 20
of an adder tree, 26
of an inverter chain, 23–25
of control logic, 26
energy, 23, 27
of control logic, 26
input capacitance, 28 (table)
of control logic, 26
internal delay, 20, 28 (table)
output resistance, 28 (table)
of control logic, 26
power dissipation, 23, 27
of an inverter chain, 25
properties, 27, 28 (table)
of addition, see Addition
of Boolean gates, see Boolean gates
of buffers, see Buffer
of inversion, see Inversion
of multiplication, see Multiplication (by a constant)
of VLSI architectures, 15–28
resistance, 19
time, 27
total delay, 20
width, 18
Notation, 5
OR gate, see Boolean OR
Outline, 1–2
Pattern recognition, 5
Pentanomial, 135, see also Polynomial
pMOS transistor, 15
Polynomial
All-one, 3, 59–61, 74
Irreducible, 9, 10, 13, 29, 60, 86, 87
of minimum weight, 32, 133, 135–149
Table of, 136–149
Monic, 9
Polynomial basis, see Bases
inverters, see Inversion
multipliers, see Multiplication
Power dissipation, see Model
Prime field, 9
Primitive element, 9
Properties, see Normalized properties
Real field, 9
Resistance, 19
Ring, 8
Scaled
gate, see Boolean gates
transistor, see MOS transistor
Scaling factor, 19–21, 23
Shift register, 12, 38, 38 (figure), 41, 51, 52 (figure)
Short circuit power dissipation, see Power dissipation
Single transitions, see Transitions
Size, see Model
Small inverters, see Inversion
Source capacitance, 19
Squaring, 2, 69, 91, 95, 100
by Hasan/Bhargava, 3
in F4, 105–106, 109
Architecture for, 104 (figure), 107 (figure)
Properties of, 108 (table)
in F16, 109–112
Architecture for, 111 (figure)
Properties of, 115 (table)
using bases of type I, 91–92
Architecture of, 92 (figure)
using bases of type II, 95
Architecture of, 94 (figure)
using bases of type III, 100–102
Architecture of, 101 (figure)
using normal bases, 62, 106
using polynomial bases, 105
Standard basis, see Bases
Static CMOS, 15
Static power dissipation, see Power dissipation
Subfield, 4, 9
Sun Zi theorem, see Chinese remainder theorem
Switch, 5, 15
Symbols, 5, 6 (figure)
Systolic array, 4
Time, see Model
Toeplitz matrix, see Matrix
Tower field, 85
Bases of, see Bases
Inversion in, see Inversion
Multiplication in, see Multiplication (by a constant)
Squaring in, see Squaring
Trace, 10
Transistor, see MOS transistor and Model
Transitions, 22
Transmission gate, 15
Triangular basis, see Bases
Trinomial, 135, see also Polynomial
Type I bases, see Bases
Type II bases, see Bases
Type III bases, see Bases
Unscaled
gate, see Boolean gates
transistor, see MOS transistor
Vector
addition, see Addition of vectors
representation, 11
space, 9
VLSI, 15–28
Weight, see Hamming weight
XNOR gate, see Boolean XNOR
XOR gate, see Boolean XOR