
Linköping Studies in Science and Technology
Dissertation No. 731

VLSI Aspects on

Inversion in Finite Fields

Mikael Olofsson

Department of Electrical Engineering
Linköpings universitet, SE-581 83 Linköping, Sweden

Linköping 2002

VLSI Aspects on Inversion in Finite Fields

© 2002 Mikael Olofsson

Department of Electrical Engineering
Linköpings universitet
SE-581 83 Linköping
Sweden

This document was prepared using LaTeX2ε on a SUN Ultra 10. The figures were produced using Xfig (from xfig.org) and the plots were produced using Matlab (from MathWorks, Inc.). The irreducible polynomials in Appendix A were found using Magma (from University of Sydney) and the index was compiled by Makeindex.

Printed in Sweden by UniTryck, Linköping 2002

ISBN 91-7373-256-7
ISSN 0345-7524

To my surprise

Abstract

Different algorithms and architectures for inversion in finite extension fields are studied. The investigation is restricted to fields of characteristic two. Based on a simple transistor model, various architectures are compared with respect to delay, area requirement, and energy consumption.

Both polynomial and normal basis representations are considered. A special investigation is made on representations of fields as tower fields. New architectures are presented and compared with previously known architectures. For tower fields, a thorough investigation is made for the case where the extension degree is a power of two. In that case the investigation is based on a classification of all possible bases of the field over its largest subfield.

It is noted that normal bases, generated by irreducible all-one polynomials, are closely related to the polynomial bases which are generated by the same polynomials. Based on this observation, it is shown how architectures considered for polynomial basis representation can be modified for use with corresponding normal bases.

A list of minimum weight irreducible polynomials over $\mathbb{F}_2$ is also given.


Acknowledgements

Everyone who has tried to prepare a thesis knows that it is hard work and that it cannot be done without support from others.

First and foremost, my thoughts go to my supervisor Professor Thomas Ericson, and to Professor Stefan Dodunekov who filled in for Thomas during the academic year 1993-94. Your experience and your comments on my work have been invaluable to me. And Thomas, your patience while reading my manuscripts has always amazed me.

I am lucky enough to be working in a very stimulating environment. To my former and present colleagues in the Image Coding group, the Information Theory group, and the Data Transmission group: Thanks for research related discussions, mini-golf, go-cart, go, Friday cakes, and discussions about just anything over a cup of coffee.

Some colleagues deserve to be especially mentioned: Dr. Edoardo Mastrovito, who introduced me to finite fields; Assistant Professor Ralf Kötter, who gave me the push that made me look into inversion; Associate Professor Lars-Inge Alfredsson, with whom I have had numerous discussions about cost measures; Mrs. Gunilla Svahn-Roming, our secretary, who has been a great help on many occasions; and Research Engineer Jean-Jacques Moulis, who has been keeping my computer up and running.

I would not have been in the place where I am today, if it had not been for my parents, Mrs. Lisbeth Olofsson and Mr. Anders Olofsson. You taught me that nothing is impossible; all that's needed is some reading and effort. My father, regrettably, never got the opportunity to see this result of my efforts.


Last, but absolutely not least, my thoughts go to my family. Elisabeth, your love, patience, and understanding are what have kept me going. Our sons, Erik and Viktor, have also had their share of sacrifices.

To all of you, I send my deepest gratitude.

Linköping, December 2001.

Mikael Olofsson


Contents

1 Introduction
1.1 Outline of the thesis
1.2 Background
1.3 Notation and Symbols

2 Mathematical Background
2.1 Groups, Rings, and Fields
2.2 Extension Fields
2.3 Vector and Matrix Representations

3 VLSI Considerations
3.1 Static CMOS Gates
3.2 Models Used
3.2.1 The Delay Model
3.2.2 The Size Model
3.2.3 The Power Dissipation Model
3.3 Special Cases
3.3.1 Large Capacitive Loads
3.3.2 Adder Trees
3.3.3 Buffers
3.3.4 Control Logic and Control Signals
3.4 Summary of Cost Measures

4 Polynomial Basis Inverters
4.1 Inversion Based on the Euclidean Algorithm
4.1.1 The Architecture
4.1.2 Properties of the Architecture
4.2 A Berlekamp-Massey Based Inverter
4.2.1 Triangular Bases
4.2.2 The Architecture
4.2.3 Properties of the Architecture
4.3 Inversion Based on the Gauss-Jordan Algorithm
4.3.1 A Systolic Implementation of the Gauss-Jordan algorithm
4.3.2 Previous Inverters Based on the Gauss-Jordan Algorithm
4.3.3 A New Preprocessor
4.4 Properties of the Polynomial Basis Inverters

5 Normal Basis Inverters
5.1 All-One Polynomials
5.2 The Massey-Omura Multiplier
5.3 Inversion Based on Exponentiation
5.3.1 Inversion by Squaring and Multiplication
5.3.2 Inversion by Accelerated Squaring and Multiplication
5.4 Polynomial Basis Inverters Revisited
5.5 Properties of the Normal Basis Inverters

6 Inversion in Tower Fields
6.1 Bases of Tower Fields
6.2 Arithmetic Using Bases of Type I
6.2.1 Inversion
6.2.2 Multiplication
6.2.3 Squaring
6.3 Arithmetic Using Bases of Type II
6.3.1 Inversion
6.3.2 Multiplication
6.3.3 Squaring
6.4 Arithmetic Using Bases of Type III
6.4.1 Inversion
6.4.2 Multiplication
6.4.3 Squaring
6.5 Arithmetic in $\mathbb{F}_4$
6.5.1 Polynomial Basis Representation
6.5.2 Normal Basis Representation
6.5.3 Properties of Arithmetic in $\mathbb{F}_4$
6.6 Arithmetic in $\mathbb{F}_{16}$
6.6.1 Multiplication by a Constant
6.6.2 Squaring
6.6.3 Inversion
6.6.4 Multiplication
6.6.5 Properties of Arithmetic in $\mathbb{F}_{16}$
6.7 Properties of Inversion in Tower Fields
6.7.1 Type I Bases
6.7.2 Type II Bases
6.7.3 Type III Bases
6.7.4 Best Choices

7 Concluding Remarks
7.1 Conclusions
7.1.1 Fast Inverters
7.1.2 Small Inverters
7.1.3 Low Energy Inverters
7.2 Future Research

A Minimum Weight Irreducible Polynomials over $\mathbb{F}_2$
A.1 The table

Chapter 1

Introduction

1.1 Outline of the thesis

The overall structure of this thesis is as follows. First, in Chapters 2 and 3, we give mathematical and technical background. Then we present both some new and some old solutions to the inversion problem in Chapters 4 through 6. These solutions are compared with respect to chip area, time and energy consumption. Last, we summarize, draw some conclusions, and suggest future research in Chapter 7.

In Chapter 2, we give some mathematical background. We give definitions of fundamental concepts, and introduce some notation. We also introduce a matrix representation of finite extension fields.

In Chapter 3, we give some technical background. Based on a simple model of the MOS transistors, we derive the cost measures used in this thesis.

We start the investigation in Chapter 4, where we consider polynomial bases for the representation of the field. Here we present two new architectures for inversion.

Normal basis inverters are studied in Chapter 5. We show that polynomial basis inverters can in some cases be used for inversion in normal basis representation. We also make small improvements to two known architectures, and propose an architecture based on a known algorithm.


In Chapter 6 we consider inversion in tower fields, where the extension degree is a power of two. We consider all possible bases of the field over its largest true subfield. New architectures are given.

The last chapter in the thesis, Chapter 7, summarizes the thesis. Here we draw final conclusions and suggest future research.

In Appendix A, we give the result of a computer search for irreducible polynomials of low weight.

1.2 Background

The theory of finite fields goes back to the nineteenth century. Mathematicians like Abel [1], Gauss [17], and Galois addressed the problem of solving algebraic equations of higher degree, but it was Galois [15] who in 1830, inspired by complex numbers, introduced what were to be called Galois imaginaries. In that way he actually gave rise to the theory of finite fields and especially finite extension fields. He is therefore considered one of the founders of modern algebra. In 1893 Weber [51] treated fields as abstract objects in terms of groups, much as we do today. In 1901 Dickson [10] gave a thorough exposition of the area as known by the nineteenth century mathematicians.

Shannon's [43] introduction of information theory in 1948 and the related area of algebraic coding theory gave rise to new questions in the area. Error correcting codes are often defined over finite rings or fields. Efficient hardware for decoding of algebraic error-correcting codes over finite fields depends on the existence of good VLSI implementations of algebraic operations on elements in finite extension fields. Therefore, the study of architectures for VLSI implementations of algebraic operations on elements in these fields is of interest.

Much effort has been put into finding and comparing architectures for multiplication, squaring, and exponentiation. However, an operation that is often regarded as too complex to be suitable for implementation is inversion. A natural question that arises is whether this is true. The goal of this thesis is to compare old and new architectures for inversion in finite extension fields with respect to time, area, and energy needs.


The first constructive idea of an architecture for inversion is described by Berlekamp [5] in 1968. That architecture for inversion in $\mathbb{F}_{2^m}$, the field of size $2^m$, is based on the Euclidean algorithm for polynomials. The architecture is regular and uses $O(m)$ clock cycles and $O(m)$ chip area, with a clock period of length $O(\log m)$. The chip area is here regarded as the number of arithmetic units for the ground field used in the architecture. The clock period length is considered to be the largest delay among the signal paths in the architecture, and the delay of a certain gate in a path is considered to depend on the capacitive load of its output. However, as described by Berlekamp this architecture needs a relatively large amount of control circuitry.

Since then, people have proposed various methods for inversion, with different application areas. The idea of using the Euclidean algorithm for inversion is also used by Araki, Fujita, and Morisue [2].

Mastrovito [34] studies in his thesis several architectures for algebraic operations on elements of $\mathbb{F}_{2^m}$. A small part of that thesis deals with inversion. One method treated there is inversion by exponentiation. This can be done by combining multiplication and squaring either in $O(m)$ clock cycles and $O(m^2)$ chip area or in $O(m^2)$ clock cycles and $O(m)$ chip area, depending on how the multiplication and squaring are implemented. In both these cases the clock period is of length $O(\log m)$. This assumes polynomial basis representation of the element. Another method for inversion by exponentiation studied by Mastrovito is inversion by a multiplier tree. Using a polynomial basis and direct multipliers, a direct inverter reaching $O(m^3)$ chip area and a clock period of $O(\log^2 m)$ can be produced. The regularity of these inverters is fairly good. Table lookup, also mentioned by Mastrovito, has good time properties and regularity, but the chip area needed is $O(m 2^m)$, which makes it interesting for small $m$ only. The Euclidean algorithm is mentioned, but not investigated further.

Hasan and Bhargava [20] present new architectures for multiplication and squaring using a polynomial basis generated by an irreducible all-one polynomial. They use these new architectures for inversion by exponentiation. They reach $O(m)$ clock cycles of length $O(\log m)$ and $O(m^2)$ chip area. Wang et al. [47] propose an architecture that uses this idea for normal basis representation. They use a Massey-Omura multiplier to perform the multiplication and hence the time, size, and structural properties of that multiplier are inherited by the inverter.


There are other somewhat similar methods for inversion using normal basis representation. Feng [13] proposes an architecture that uses $O(m \log m)$ clock cycles of length $O(\log m)$ and at least $O(m)$ chip area for inversion in $\mathbb{F}_{2^m}$ using normal basis representation. It is based on a normal basis multiplier derived in that paper. Itoh and Tsujii [26][27] propose algorithms for inversion in two classes of finite fields using normal basis representation. However, they do not propose any architectures.

Direct inversion has reasonably good time and size properties, at least in some cases, but the architectures are typically terribly irregular. However, some people have given structured methods for generating networks for direct inversion. Litow and Davida [31] and von zur Gathen [16] prove the existence of networks for direct inversion with a depth that is $O(\log m)$ and a gate count that is $O(m^{O(1)})$. Morii and Kasahara [36] propose a method using all subfields of $\mathbb{F}_{2^m}$. The method is especially efficient when $m$ is a power of 2. In that case the chip size is $O(m^{\log 3} \log m)$ and the clock period $O(\log^3 m)$. Asano, Itoh, and Tsujii [3] also propose a method using all subfields of $\mathbb{F}_{2^m}$. They claim that the gate count of the corresponding architecture is $O(m^3 \log m)$, with a depth of $O(\log^2 m)$.

Paar [39] also studies several architectures for algebraic operations on elements of $\mathbb{F}_{2^m}$. He concentrates on bit parallel architectures in his thesis. Inversion in composite fields is considered in one chapter. The architectures for inversion that are studied are based on the inverters of Itoh and Tsujii [26] and Morii and Kasahara [36].

Davida [9] gives an equation system based on the Chinese remainder theorem. The $(2m-1) \times (2m-1)$ matrix over the ground field defining this equation system is partly Toeplitz. Wang and Lin [48] as well as Hasan and Bhargava [22] propose systolic array solutions for division. The architecture proposed by Wang and Lin solves the equation system given by Davida, while the architecture proposed by Hasan and Bhargava solves an equation system based on an $m \times m$ matrix. Both architectures use $O(m)$ clock cycles of length $O(1)$ and $O(m^2)$ chip area. However, both the size and the time requirements are substantially smaller for the architecture by Hasan and Bhargava than for the architecture by Wang and Lin. A similar architecture is proposed in this thesis that needs approximately 85% of the chip size needed by the architecture by Hasan and Bhargava. These architectures are good in the sense that they are extremely structured, have small clock periods and use no feedback loops and no adder trees. They are all modifications of standard systolic array structures for solving linear equation systems.

Kovac, Ranganatan, and Varanasi [28][29] present an algorithm that is said to be based on pattern recognition. This results in an exhaustive search for elements of the field. Therefore the time and size properties are not good. The presented architecture is for $m = 4$, but the idea can easily be adapted to be used with any $m$. The calculations are then performed in $n$ parallel branches, where $n$ is some suitable power of 2. The architecture uses $O(2^m/n)$ clock cycles of length $O(\log m)$ and $O(nm^2)$ chip area.

1.3 Notation and Symbols

Throughout this thesis we will analyze architectures for performing arithmetic operations in finite fields of characteristic 2 using basic arithmetic operations in subfields. We denote a field of size $q$ by $\mathbb{F}_q$. The arithmetic operations addition, multiplication and division in these fields are denoted by $+$, juxtaposition, and $/$. Sometimes multiplication is denoted by a dot ($\cdot$) for clarity. The symbols used for hardware manipulating elements of fields of characteristic 2 are given in Figure 1.1.

We will occasionally use Boolean operations. These will be denoted as follows.

Boolean inversion of $A$: $\overline{A}$
Boolean AND of $A$ and $B$: $A \wedge B$
Boolean OR of $A$ and $B$: $A \vee B$
Boolean XOR of $A$ and $B$: $A \veebar B$

The symbols used for Boolean gates are given in Figure 1.2.

We will frequently use switches. The symbols used for these switches follow the pattern given in Figure 1.3.


Figure 1.1: Symbols for objects manipulating elements of fields of characteristic 2. The symbols shown are: addition, multiplication by a, buffer, and multiplication.

Figure 1.2: Symbols for Boolean gates and memory cells. The symbols shown are: AND-gate, NAND-gate, OR-gate, NOR-gate, XOR-gate, XNOR-gate, memory cell initiated by a, and inverter.

Figure 1.3: A two-way switch with two control signals, $A$ and $B$. The switch is understood to be connected to the terminal labelled $A$ when we have $A = 1$ and to the terminal labelled $B$ when we have $B = 1$.

Chapter 2

Mathematical Background

In this chapter we give an overview of the mathematical background of this thesis. Statements that can be considered as standard material are given without proof. We refer to Herstein [25], Lidl and Niederreiter [30], and Wan [46] for more details and proofs. Berlekamp [5], Blahut [7], and McEliece [35] consider finite fields from an engineering point of view.

2.1 Groups, Rings, and Fields

Algebra is the theory of sets under one or more operations that are defined on the sets. Algebraic systems with special structure have been given names such as groups, rings, and fields.

Definition 1 A group $G$ is a nonempty set under a binary associative operation $\star$ that operates on $G$, with the following properties.

1. There is an element $e \in G$, called the unit element of $G$, such that $a \star e = e \star a = a$ holds for all $a \in G$.

2. For any $a$ in $G$ there is an element $a^{-1}$, called the inverse of $a$, such that $a^{-1} \star a = a \star a^{-1} = e$ holds.

If $\star$ is commutative, that is if $a \star b = b \star a$ holds for any $a, b$ in $G$, then $G$ is called an Abelian group.


It can easily be shown that both the unit element and the inverse of an element of $G$ are uniquely determined. A simple example of an Abelian group is the set of integers under addition.

Definition 2 A ring $R$ is a nonempty set under two binary operations, normally called addition, denoted by $+$, and multiplication, denoted by $\cdot$, with the following properties.

1. $R$ is an Abelian group with respect to addition.

2. Multiplication is associative, that is $(a \cdot b) \cdot c = a \cdot (b \cdot c)$ holds for any $a, b, c$ in $R$.

3. Multiplication is distributive over addition, that is $a \cdot (b + c) = a \cdot b + a \cdot c$ and $(b + c) \cdot a = b \cdot a + c \cdot a$ hold for any $a, b, c$ in $R$.

If the multiplication in $R$ is commutative, that is if $a \cdot b = b \cdot a$ holds for any $a, b$ in $R$, then $R$ is called a commutative ring.

Multiplication is often denoted by juxtaposition. The best known ring is the ring of integers, which is commutative, but we should note that there are other rings. The operations do not necessarily need to be what we normally mean by addition and multiplication. For instance, we do not even demand the presence of a multiplicative unit element. But if there is a multiplicative unit element, it can be shown that it cannot equal the additive unit element, unless the ring consists of the additive unit element alone. By convention we denote the additive unit element by 0 and the multiplicative unit element (if any) by 1. The ring of integers is a ring with multiplicative unit element, while the set of even integers under integer addition and multiplication is a ring without multiplicative unit element, both with infinitely many elements. Denote by $\mathbb{Z}_n$ the set of residue classes of integers modulo $n$, $n > 1$. Then $\mathbb{Z}_n$ is a ring with exactly $n$ elements, under addition and multiplication reduced modulo $n$.

Definition 3 A field $F$ is a commutative ring with the additional property that the set of nonzero elements of $F$ forms an Abelian group under multiplication. A field with finitely many elements is called a finite field.

A finite field is also called a Galois field after the French 19th century mathematician Evariste Galois.


The set of real numbers under ordinary addition and multiplication is an example of an infinite field. We noted that $\mathbb{Z}_n$ is a finite ring with $n$ elements. However, if $n$ is a prime, then $\mathbb{Z}_n$ is actually a finite field. We normally refer to such a field as a prime field, since its size is a prime.

There is a finite field with $q$ elements if and only if $q$ is a prime power. Furthermore, any two finite fields with $q$ elements are isomorphic, which means that there is a one-to-one mapping from one of the fields to the other such that the algebraic structure is preserved. This means that there is essentially only one finite field with $q$ elements. We will denote this field by $\mathbb{F}_q$. The multiplicative group $\mathbb{F}_q^* \triangleq \mathbb{F}_q \setminus \{0\}$ is not only Abelian. It is also a cyclic group, i.e. there is at least one element $\alpha \in \mathbb{F}_q^*$ such that any element in $\mathbb{F}_q^*$ is a power of $\alpha$. The element $\alpha$ is then referred to as a generator of $\mathbb{F}_q^*$ or a primitive element of $\mathbb{F}_q$.

The smallest finite field is $\mathbb{F}_2$. Here addition and multiplication are performed mod 2, which means that addition is equivalent to Boolean XOR and that multiplication is equivalent to Boolean AND.
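As a small illustration of the last remark, the following Python sketch checks exhaustively that addition and multiplication modulo 2 agree with bitwise XOR and AND. The function names `f2_add` and `f2_mul` are illustrative only and do not come from the thesis.

```python
def f2_add(a: int, b: int) -> int:
    """Addition in F_2: the same truth table as Boolean XOR."""
    return a ^ b


def f2_mul(a: int, b: int) -> int:
    """Multiplication in F_2: the same truth table as Boolean AND."""
    return a & b


# Exhaustive check against integer arithmetic reduced modulo 2.
for a in (0, 1):
    for b in (0, 1):
        assert f2_add(a, b) == (a + b) % 2
        assert f2_mul(a, b) == (a * b) % 2
```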

2.2 Extension Fields

Let $p$ be a prime. We have seen how we can construct $\mathbb{F}_p$. Now we wish to construct $\mathbb{F}_{p^m}$, $m > 1$.

Definition 4 Let $F$ and $K$ be fields such that $F \subset K$ holds. Then $K$ is called an extension of $F$ and $F$ is called a subfield of $K$.

The field of complex numbers $\mathbb{C}$ is an extension of the field of real numbers $\mathbb{R}$, constructed by adjoining a root of the irreducible polynomial $x^2 + 1$ to $\mathbb{R}$. Extensions of finite fields can also be constructed by adjoining roots of irreducible polynomials. Let $p(x) = \sum_{i=0}^{m} p_i x^i$ be an irreducible polynomial over $\mathbb{F}_p$ of degree $m$. Such polynomials exist of any degree $m \geq 1$ over any finite field. It is often convenient to consider only irreducible polynomials with the leading coefficient $p_m = 1$. Such polynomials are called monic. $\mathbb{F}_{p^m}$ is then constructed by adjoining a root of $p(x)$ to $\mathbb{F}_p$. This can also be described as the set of polynomials over $\mathbb{F}_p$ reduced modulo $p(x)$. An important observation is that this implies that $\mathbb{F}_{p^m}$ is a vector space of dimension $m$ over $\mathbb{F}_p$. Furthermore, the elements represented by $x^i$, $0 \leq i < m$, form a basis of this vector space that we formally define in the following way.


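Definition 5 Let $\vartheta$ be an element of $\mathbb{F}_{q^m}$ such that $\{\vartheta^i\}_{i=0}^{m-1}$ is a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. Then $\{\vartheta^i\}_{i=0}^{m-1}$ is called a polynomial basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$.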
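To make the construction concrete, the following Python sketch builds $\mathbb{F}_{16}$ as the set of polynomials over $\mathbb{F}_2$ reduced modulo $p(x) = x^4 + x + 1$. An element is stored as an integer whose bit $i$ is the coefficient of $x^i$, that is, its coordinate vector in the polynomial basis $\{1, x, x^2, x^3\}$. The choice of $p(x)$, the integer encoding, and the names `POLY`, `M`, and `gf_mul` are assumptions made for illustration only.

```python
POLY = 0b10011  # assumed irreducible polynomial p(x) = x^4 + x + 1
M = 4           # extension degree, so the field has 2^4 = 16 elements


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of F_16; bit i of an int is the coefficient of x^i."""
    result = 0
    for _ in range(M):
        if b & 1:             # current coefficient of b is 1: add a * x^k
            result ^= a
        b >>= 1
        a <<= 1               # multiply a by x
        if (a >> M) & 1:      # degree reached M: reduce modulo p(x)
            a ^= POLY
    return result


# List the successive powers of x, starting with x^0 = 1.
x = 0b0010
powers = []
elem = 1
for _ in range(2**M - 1):
    powers.append(elem)
    elem = gf_mul(elem, x)

assert powers[4] == 0b0011                   # x^4 = x + 1 modulo p(x)
assert sorted(powers) == list(range(1, 16))  # every nonzero element is a power of x
```

Since the fifteen powers of $x$ are distinct, $x$ is a primitive element for this particular choice of $p(x)$. This need not hold for every irreducible polynomial.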

These bases are also called standard, canonical, and conventional bases in the literature. There is always at least one polynomial basis of a field over any of its subfields. The element $\vartheta \in \mathbb{F}_{q^m}$ generates a polynomial basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ if and only if $\vartheta$ is a root of an irreducible polynomial $p(x)$ of degree $m$ over $\mathbb{F}_q$. Another often used basis is the following.

Definition 6 Let $\vartheta$ be an element of $\mathbb{F}_{q^m}$ such that $\{\vartheta^{q^i}\}_{i=0}^{m-1}$ is a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. Then $\{\vartheta^{q^i}\}_{i=0}^{m-1}$ is called a normal basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$.

There is always at least one normal basis of a field over any of its subfields. One useful property of normal bases is that raising an element to the power $q$ is a simple cyclic shift of the vector representing the element in a normal basis. If $\{\vartheta^{q^i}\}_{i=0}^{m-1}$ is a basis, then $\vartheta$ is a root of an irreducible polynomial, say $p(x)$, of degree $m$. Furthermore, $\{\vartheta^{q^i}\}_{i=0}^{m-1}$ is the set of all roots of $p(x)$. However, not all irreducible polynomials of degree $m$ generate normal bases. Polynomial and normal basis representations are the most common representations of finite extension fields.
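The cyclic shift property can be checked numerically. The sketch below assumes $\mathbb{F}_{16}$ built from $x^4 + x + 1$ with elements stored as integers (bit $i$ is the coefficient of $x^i$), and assumes $\vartheta = x^3$ as the generating element; both choices, and all function names, are illustrative and not taken from the thesis. It first confirms that the conjugates of $\vartheta$ form a basis and then verifies that squaring ($q = 2$) shifts the coordinate vector cyclically.

```python
from itertools import product

POLY = 0b10011  # assumed irreducible polynomial p(x) = x^4 + x + 1
M = 4


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of F_16; bit i of an int is the coefficient of x^i."""
    result = 0
    for _ in range(M):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return result


def conjugates(theta: int) -> list:
    """Return [theta, theta^2, theta^4, theta^8]."""
    basis = [theta]
    for _ in range(M - 1):
        basis.append(gf_mul(basis[-1], basis[-1]))
    return basis


def from_coords(coords, basis) -> int:
    """Field element with the given F_2 coordinates in the given basis."""
    elem = 0
    for c, b in zip(coords, basis):
        if c:
            elem ^= b
    return elem


def squaring_is_cyclic_shift(basis) -> bool:
    """Check that squaring maps coordinates (c0, c1, c2, c3) to (c3, c0, c1, c2)."""
    for coords in product((0, 1), repeat=M):
        elem = from_coords(coords, basis)
        shifted = coords[-1:] + coords[:-1]
        if gf_mul(elem, elem) != from_coords(shifted, basis):
            return False
    return True


basis = conjugates(0b1000)  # assumed generator theta = x^3
elements = {from_coords(c, basis) for c in product((0, 1), repeat=M)}
assert len(elements) == 2**M  # 16 distinct combinations: the conjugates form a basis
assert squaring_is_cyclic_shift(basis)
```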

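Definition 7 Let $\alpha$ be an element of $\mathbb{F}_{q^m}$. The trace of $\alpha$ over $\mathbb{F}_q$ is defined as $\mathrm{Tr}_{\mathbb{F}_{q^m}/\mathbb{F}_q}(\alpha) \triangleq \sum_{i=0}^{m-1} \alpha^{q^i}$.

When the fields involved are obvious from the context, we will simply write $\mathrm{Tr}(\alpha)$. The trace function is a mapping from $\mathbb{F}_{q^m}$ onto $\mathbb{F}_q$ that is linear over $\mathbb{F}_q$. Actually, the linear transformations from $\mathbb{F}_{q^m}$ into $\mathbb{F}_q$ are known to be exactly the mappings $L_\beta$, $\beta \in \mathbb{F}_{q^m}$, given by $L_\beta(\alpha) = \mathrm{Tr}(\beta\alpha)$ for all $\alpha \in \mathbb{F}_{q^m}$, and they are known to be distinct. See for instance Lidl and Niederreiter [30, pp. 54-56].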
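As a hedged illustration of the trace, the sketch below evaluates $\mathrm{Tr}(\alpha) = \alpha + \alpha^2 + \alpha^4 + \alpha^8$ in $\mathbb{F}_{16}$ over $\mathbb{F}_2$. The field is assumed to be built from $x^4 + x + 1$ with elements stored as integers (bit $i$ is the coefficient of $x^i$); this encoding and the names `gf_mul` and `trace` are assumptions for illustration. The checks confirm that the trace takes values in $\mathbb{F}_2$ and is linear over $\mathbb{F}_2$.

```python
POLY = 0b10011  # assumed irreducible polynomial p(x) = x^4 + x + 1
M = 4


def gf_mul(a: int, b: int) -> int:
    """Multiply two elements of F_16; bit i of an int is the coefficient of x^i."""
    result = 0
    for _ in range(M):
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return result


def trace(a: int) -> int:
    """Trace of a over F_2: the sum of the conjugates a, a^2, a^4, a^8."""
    total, conjugate = 0, a
    for _ in range(M):
        total ^= conjugate
        conjugate = gf_mul(conjugate, conjugate)
    return total


# The trace lands in the subfield F_2 = {0, 1} and is additive.
assert all(trace(a) in (0, 1) for a in range(2**M))
assert all(trace(a ^ b) == trace(a) ^ trace(b)
           for a in range(2**M) for b in range(2**M))
```

Because the trace is a linear mapping onto $\mathbb{F}_2$, exactly half of the sixteen elements have trace 1.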

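Definition 8 Let $\{\theta_i\}_{i=0}^{m-1}$ and $\{\sigma_j\}_{j=0}^{m-1}$ be bases of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. If

$$\mathrm{Tr}_{\mathbb{F}_{q^m}/\mathbb{F}_q}(\theta_i\sigma_j) = \begin{cases} 0, & i \neq j \\ 1, & i = j \end{cases}$$

holds, the bases are said to be dual.

Given a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$, there exists a unique dual of that basis. Finally, the smallest subfield of $\mathbb{F}_{q^m}$ is a prime field $\mathbb{F}_p$. We refer to the prime $p$ as the characteristic of $\mathbb{F}_{q^m}$.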


2.3 Vector and Matrix Representations

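We have noted that $\mathbb{F}_{q^m}$ is a vector space over $\mathbb{F}_q$. That makes addition in $\mathbb{F}_{q^m}$ equivalent to vector addition of the corresponding vectors over $\mathbb{F}_q$. It also makes multiplication of an element of $\mathbb{F}_{q^m}$ by an element of $\mathbb{F}_q$ especially simple, since that is equivalent to multiplication of a vector by a scalar. Multiplication of two elements of $\mathbb{F}_{q^m}$ and inversion of an element in $\mathbb{F}_{q^m}^*$, however, demand some more elaborate thinking. We use the following notation for the vectors.

Definition 9 Let $\{\theta_j\}_{j=0}^{m-1}$ be a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and let $\alpha$ be an element in $\mathbb{F}_{q^m}$. Define the row vector $\theta = (\theta_0, \theta_1, \ldots, \theta_{m-1})$. The column vector $\alpha_\theta = (a_0, a_1, \ldots, a_{m-1})^T$ over $\mathbb{F}_q$ satisfying $\alpha = \theta\alpha_\theta$ is called the $\theta$-vector of $\alpha$.

Next, we use two bases to create a matrix notation of elements in $\mathbb{F}_{q^m}$.

Definition 10 Let $\{\theta_j\}_{j=0}^{m-1}$ and $\{\sigma_i\}_{i=0}^{m-1}$ be bases of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and define the row vectors $\theta \triangleq (\theta_0, \theta_1, \ldots, \theta_{m-1})$ and $\sigma \triangleq (\sigma_0, \sigma_1, \ldots, \sigma_{m-1})$. The matrix

$$\alpha_{\sigma,\theta} \triangleq \left((\theta_0\alpha)_\sigma, (\theta_1\alpha)_\sigma, \ldots, (\theta_{m-1}\alpha)_\sigma\right),$$

where $(\theta_j\alpha)_\sigma$ is the $\sigma$-vector of $\theta_j\alpha$, is called the $[\sigma, \theta]$-matrix of $\alpha \in \mathbb{F}_{q^m}$.

Let $\{\theta'_i\}_{i=0}^{m-1}$ be the dual basis of $\{\theta_j\}_{j=0}^{m-1}$. It is well known that the entries $a_i$ in the $\theta$-vector of $\alpha$ are given by $a_i = \mathrm{Tr}(\theta'_i\alpha)$ for all $\alpha \in \mathbb{F}_{q^m}$. This statement can easily be extended in the following way for $[\sigma, \theta]$-matrices.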

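Lemma 1 Let $\{\theta_j\}_{j=0}^{m-1}$ and $\{\sigma_i\}_{i=0}^{m-1}$ be bases of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and let $\{\sigma'_i\}_{i=0}^{m-1}$ be the dual basis of $\{\sigma_i\}_{i=0}^{m-1}$. Define $\theta \triangleq (\theta_0, \theta_1, \ldots, \theta_{m-1})$ and $\sigma \triangleq (\sigma_0, \sigma_1, \ldots, \sigma_{m-1})$. Then the entries of

$$\alpha_{\sigma,\theta} = \begin{pmatrix} a_{0,0} & \cdots & a_{0,m-1} \\ \vdots & \ddots & \vdots \\ a_{m-1,0} & \cdots & a_{m-1,m-1} \end{pmatrix}$$

are given by $a_{i,j} = \mathrm{Tr}(\sigma'_i\theta_j\alpha)$ for all $\alpha \in \mathbb{F}_{q^m}$.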


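The following lemma gives us a possibility to describe multiplication in finite extension fields, where the elements are represented in different bases.

Lemma 2 Let $\{\theta_j\}_{j=0}^{m-1}$ and $\{\sigma_i\}_{i=0}^{m-1}$ be bases of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and define the row vectors $\theta \triangleq (\theta_0, \theta_1, \ldots, \theta_{m-1})$ and $\sigma \triangleq (\sigma_0, \sigma_1, \ldots, \sigma_{m-1})$. Let $\alpha$, $\beta$, and $\gamma$ be elements of $\mathbb{F}_{q^m}$. Then $\gamma_\sigma = \alpha_{\sigma,\theta}\beta_\theta$ holds if and only if $\gamma = \alpha\beta$ holds.

Proof: Assuming $\gamma = \alpha\beta$, we have $\sigma\gamma_\sigma = \alpha\theta\beta_\theta$ by Definition 9. Multiplying $\alpha$ into the vector $\theta$ gives us

$$\sigma\gamma_\sigma = (\alpha\theta_0, \alpha\theta_1, \ldots, \alpha\theta_{m-1})\beta_\theta.$$

By Definition 9, we have $\alpha\theta_j = \sigma(\alpha\theta_j)_\sigma$, which gives us

$$\sigma\gamma_\sigma = \left(\sigma(\alpha\theta_0)_\sigma, \sigma(\alpha\theta_1)_\sigma, \ldots, \sigma(\alpha\theta_{m-1})_\sigma\right)\beta_\theta = \sigma\left((\alpha\theta_0)_\sigma, (\alpha\theta_1)_\sigma, \ldots, (\alpha\theta_{m-1})_\sigma\right)\beta_\theta,$$

where we by Definition 10 can identify $\left((\alpha\theta_0)_\sigma, (\alpha\theta_1)_\sigma, \ldots, (\alpha\theta_{m-1})_\sigma\right)$ as the $[\sigma, \theta]$-matrix of $\alpha$ and hence $\sigma\gamma_\sigma = \sigma\alpha_{\sigma,\theta}\beta_\theta$ holds. Since $\{\sigma_i\}_{i=0}^{m-1}$ is a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$, these elements are linearly independent. Therefore $\gamma_\sigma = \alpha_{\sigma,\theta}\beta_\theta$ holds.

Conversely, assuming $\gamma_\sigma = \alpha_{\sigma,\theta}\beta_\theta$, we simply follow the above steps backwards to prove that $\gamma = \alpha\beta$ holds. $\Box$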

All fields considered in this thesis have characteristic 2, i.e. we are considering $\mathbb{F}_{2^m}$ for some positive integer $m$. From a practical point of view, it is reasonable to assume that all elements are represented as vectors with respect to the same basis, say $\theta$. For multiplication, this means that we have $\alpha$ and $\beta$ as $\alpha_\theta$ and $\beta_\theta$, and wish to calculate the product $\gamma = \alpha\beta$ as $\gamma_\theta$. To make use of Lemma 2, we need to generate $\alpha_{\theta,\theta}$, and then calculate the matrix-vector product $\gamma_\theta = \alpha_{\theta,\theta}\beta_\theta$. Both the generation of the matrix and the calculation of the matrix-vector product are performed using $\mathbb{F}_2$-arithmetic. We have already noted that those operations can be performed using AND gates as $\mathbb{F}_2$-multipliers and using XOR gates as $\mathbb{F}_2$-adders. If $\theta$ is a polynomial basis, we can generate the columns of $\alpha_{\theta,\theta}$ from $\alpha_\theta$ using a binary feedback shift register of length $m$, where the feedback network is given by the irreducible polynomial that is used to generate the basis. The matrix-vector product can then be sequentially determined at the same time as the generation of the matrix by considering one column at a time and accumulating the result in a register of length $m$. Most sequential polynomial basis multipliers take this approach, e.g. a multiplier presented by Berlekamp [5, Ch. 2.4].
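The sequential scheme described above can be sketched in software as follows. This is a behavioural model only, not a description of any particular architecture from the literature. It assumes $\mathbb{F}_{16}$ with the polynomial basis generated by a root of $x^4 + x + 1$, stores each $\theta$-vector as an integer whose bit $i$ is the coordinate $a_i$, and uses the hypothetical names `lfsr_step`, `matrix_columns`, and `sequential_multiply`.

```python
POLY = 0b10011  # assumed irreducible polynomial p(x) = x^4 + x + 1
M = 4


def lfsr_step(state: int) -> int:
    """One shift of the feedback shift register: multiply by x, reduce modulo p(x)."""
    state <<= 1
    if (state >> M) & 1:
        state ^= POLY
    return state


def matrix_columns(alpha: int) -> list:
    """Columns of the [theta, theta]-matrix of alpha: the theta-vectors of x^j * alpha."""
    columns = []
    state = alpha
    for _ in range(M):
        columns.append(state)
        state = lfsr_step(state)
    return columns


def sequential_multiply(alpha: int, beta: int) -> int:
    """Accumulate the matrix-vector product one column at a time."""
    accumulator = 0
    state = alpha                  # column 0 is the theta-vector of alpha itself
    for j in range(M):
        if (beta >> j) & 1:        # AND gates: column j scaled by coordinate b_j
            accumulator ^= state   # XOR gates: add the column to the result
        state = lfsr_step(state)   # generate the next column
    return accumulator


# x * x^3 = x^4 = x + 1 modulo p(x)
assert sequential_multiply(0b0010, 0b1000) == 0b0011
```

Generating the columns and accumulating the product in the same loop mirrors the remark that the matrix never needs to be stored in full.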

A direct consequence of Lemma 2 is that $\alpha_{\sigma,\theta}$ is singular if and only if $\alpha$ is zero. Therefore, we can find the inverse of any non-zero element $\alpha$ by generating $\alpha_{\sigma,\theta}$ for some suitable basis $\sigma$ from $\alpha_\theta$, and then solving the equation system $\alpha_{\sigma,\theta}\beta_\theta = 1_\sigma$, where $1_\sigma$ is the $\sigma$-vector of 1. In this case we can choose $\sigma$ freely, since that basis is only used during calculation. For instance, we can choose $\sigma$ in order to achieve some structure in $\alpha_{\sigma,\theta}$ that we can make use of in the inversion algorithm.
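The inversion procedure described above can be sketched as follows, under the simplest assumption $\sigma = \theta$, with $\theta$ the polynomial basis of $\mathbb{F}_{16}$ generated by a root of $x^4 + x + 1$. The equation system is solved by plain Gauss-Jordan elimination over $\mathbb{F}_2$. The encoding (bit $i$ of an integer is coordinate $i$) and the names `matrix_columns`, `multiply`, and `invert` are illustrative assumptions, and the sketch makes no attempt to exploit structure in the matrix.

```python
POLY = 0b10011  # assumed irreducible polynomial p(x) = x^4 + x + 1
M = 4


def matrix_columns(alpha: int) -> list:
    """Columns of the [theta, theta]-matrix of alpha: the theta-vectors of x^j * alpha."""
    columns = []
    state = alpha
    for _ in range(M):
        columns.append(state)
        state <<= 1
        if (state >> M) & 1:
            state ^= POLY
    return columns


def multiply(alpha: int, beta: int) -> int:
    """Matrix-vector product: XOR of the columns selected by the bits of beta."""
    result = 0
    for j, column in enumerate(matrix_columns(alpha)):
        if (beta >> j) & 1:
            result ^= column
    return result


def invert(alpha: int) -> int:
    """Solve (matrix of alpha) * beta = (vector of 1) by Gauss-Jordan elimination."""
    if alpha == 0:
        raise ZeroDivisionError("the zero element has no inverse")
    columns = matrix_columns(alpha)
    # Row i holds bits 0..M-1 of the matrix row and bit M of the right-hand side.
    rows = []
    for i in range(M):
        row = 0
        for j in range(M):
            row |= ((columns[j] >> i) & 1) << j
        row |= (1 if i == 0 else 0) << M  # the vector of the element 1 is (1, 0, ..., 0)
        rows.append(row)
    for j in range(M):
        # The matrix is nonsingular for alpha != 0, so a pivot always exists.
        pivot = next(r for r in range(j, M) if (rows[r] >> j) & 1)
        rows[j], rows[pivot] = rows[pivot], rows[j]
        for r in range(M):
            if r != j and (rows[r] >> j) & 1:
                rows[r] ^= rows[j]
    # After elimination, row j reads b_j = (augmented bit).
    beta = 0
    for j in range(M):
        beta |= ((rows[j] >> M) & 1) << j
    return beta


assert all(multiply(a, invert(a)) == 1 for a in range(1, 2**M))
```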

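Definition 11 Let $\{\theta_i\}_{i=0}^{m-1}$ and $\{\sigma_i\}_{i=0}^{m-1}$ be sets of elements from $\mathbb{F}_{q^m}$. If there is a $\beta \in \mathbb{F}_{q^m}^*$ such that $\sigma_i = \beta\theta_i$ holds for all $0 \le i < m$, then $\{\sigma_i\}_{i=0}^{m-1}$ is called a multiple of $\{\theta_i\}_{i=0}^{m-1}$.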

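Obviously, if $\{\theta_j\}_{j=0}^{m-1}$ is a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$, then every multiple of $\{\theta_j\}_{j=0}^{m-1}$ is again a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. By Definition 8, the following also holds. Let $\{\theta_j\}_{j=0}^{m-1}$ be a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and let $\{\theta'_i\}_{i=0}^{m-1}$ be its dual basis. Then $\{\beta^{-1}\theta'_i\}_{i=0}^{m-1}$ is the dual basis of $\{\beta\theta_j\}_{j=0}^{m-1}$, where $\beta$ is any element of $\mathbb{F}_{q^m}^*$.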

We are now ready to derive necessary and sufficient conditions for the matrices to be Hankel.

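Theorem 1 Let $\{\sigma_i\}_{i=0}^{m-1}$ and $\{\theta_j\}_{j=0}^{m-1}$ be bases of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. Define the row vectors $\sigma \triangleq (\sigma_0, \sigma_1, \ldots, \sigma_{m-1})$ and $\theta \triangleq (\theta_0, \theta_1, \ldots, \theta_{m-1})$. The matrices $\alpha_{\sigma,\theta}$ are Hankel for all $\alpha \in \mathbb{F}_{q^m}$ if and only if $\{\theta_j\}_{j=0}^{m-1}$ is a multiple of a polynomial basis and $\{\sigma_i\}_{i=0}^{m-1}$ is a multiple of the dual basis of the same polynomial basis.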


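Proof: First assume that $\alpha_{\sigma,\theta}$ is Hankel for all $\alpha \in \mathbb{F}_{q^m}$. Let $\{\sigma'_i\}_{i=0}^{m-1}$ be the dual basis of $\{\sigma_i\}_{i=0}^{m-1}$. By Lemma 1, the $(i,j)$-th entry of $\alpha_{\sigma,\theta} = (a_{i,j})_{i,j=0,0}^{m-1,m-1}$ is

$$a_{i,j} = \mathrm{Tr}(\sigma'_i\theta_j\alpha), \quad 0 \le i, j < m.$$

If $\alpha_{\sigma,\theta}$ is Hankel, then we have $a_{i,j} = a_{i+1,j-1}$ for $0 \le i < m-1$ and $0 < j \le m-1$. Since $\mathrm{Tr}(\beta\alpha) = \mathrm{Tr}(\gamma\alpha)$ is true for all $\alpha \in \mathbb{F}_{q^m}$ if and only if $\beta = \gamma \in \mathbb{F}_{q^m}$ holds, we have

$$\sigma'_i\theta_j = \sigma'_{i+1}\theta_{j-1}.$$

We rewrite this in the form

$$\theta_j/\theta_{j-1} = \sigma'_{i+1}/\sigma'_i.$$

This relation must hold for all $i$, $0 \le i < m-1$, and $j$, $0 < j \le m-1$, and therefore we have

$$\theta_j/\theta_{j-1} = \sigma'_{i+1}/\sigma'_i = \varphi$$

for some $\varphi \in \mathbb{F}_{q^m}^*$. Since a multiple of a basis is again a basis, both $\{\sigma'_i\}_{i=0}^{m-1}$ and $\{\theta_j\}_{j=0}^{m-1}$ are multiples of the polynomial basis $\{\varphi^i\}_{i=0}^{m-1}$ of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$, and hence $\{\sigma_i\}_{i=0}^{m-1}$ is a multiple of the dual basis of $\{\varphi^i\}_{i=0}^{m-1}$.

Conversely, assuming that $\{\theta_j\}_{j=0}^{m-1}$ is a multiple of a polynomial basis and that $\{\sigma_i\}_{i=0}^{m-1}$ is a multiple of the dual basis of the same polynomial basis, we follow the same steps backwards to prove that $\alpha_{\sigma,\theta}$ is Hankel for all $\alpha \in \mathbb{F}_{q^m}$. □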

The proof technique used above can be used to provide necessary and sufficient conditions for other structures as well.

Berlekamp [6] noted that the $[\sigma, \theta]$-matrices are Hankel if $\{\theta_j\}_{j=0}^{m-1}$ is a polynomial basis and $\{\sigma_i\}_{i=0}^{m-1}$ is its dual basis. Morii et al. [37] essentially noted the same for the case where $\{\theta_j\}_{j=0}^{m-1}$ is a polynomial basis and $\{\sigma_i\}_{i=0}^{m-1}$ is a multiple of its dual basis, but without mentioning any dual basis. The fact that we can have Hankel matrices implies that we can create algorithms for inversion in finite extension fields by modifying algorithms for solving Hankel problems. We will return to this idea.
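The Hankel structure can be checked numerically for small fields. The sketch below assumes $q = 2$, takes $\theta$ to be the polynomial basis $\{x^j\}$ and $\sigma$ to be its dual basis, and uses the fact that the dual of the dual basis is the original basis, so that the entries become $a_{i,j} = \mathrm{Tr}(x^{i+j}\alpha)$. The integer encoding and all function names are assumptions of the sketch.

```python
def gf_mul(a: int, b: int, p: int, m: int) -> int:
    """Product in F_{2^m}; bit i of an int is the coefficient of x^i."""
    result = 0
    for j in range(m):
        if (b >> j) & 1:
            result ^= a
        a <<= 1
        if (a >> m) & 1:
            a ^= p
    return result


def trace(g: int, p: int, m: int) -> int:
    """Tr(g) = g + g^2 + g^4 + ... + g^(2^(m-1)), an element of F_2 (0 or 1)."""
    total, power = 0, g
    for _ in range(m):
        total ^= power
        power = gf_mul(power, power, p, m)
    return total


def sigma_theta_matrix(alpha: int, p: int, m: int) -> list:
    """Entries a[i][j] = Tr(x^(i+j) * alpha) for theta = {x^j}, sigma = dual of theta."""
    x_powers = [1]
    for _ in range(2 * m - 2):
        x_powers.append(gf_mul(x_powers[-1], 0b10, p, m))
    return [[trace(gf_mul(x_powers[i + j], alpha, p, m), p, m)
             for j in range(m)] for i in range(m)]


def is_hankel(matrix: list) -> bool:
    """True if every anti-diagonal of the square matrix is constant."""
    n = len(matrix)
    return all(matrix[i][j] == matrix[i + 1][j - 1]
               for i in range(n - 1) for j in range(1, n))


# F_8 generated by p(x) = x^3 + x + 1
print(all(is_hankel(sigma_theta_matrix(alpha, 0b1011, 3)) for alpha in range(8)))
```

A Hankel matrix is determined by its $2m-1$ anti-diagonal values, which is what makes specialised solvers applicable.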

Chapter 3

VLSI Considerations

The purpose of this chapter is to derive cost measures that are suitable for an analysis of architectures for VLSI implementations. The cost measures used are derived from a simple model of the MOS transistors, assuming static CMOS implementation of the Boolean gates. We also assume that only one-input and two-input gates are used.

3.1 Static CMOS Gates

XOR gates and AND gates perform addition and multiplication in $\mathbb{F}_2$. Therefore, we are especially interested in these gates. For clarity, we give their standard implementations using static CMOS in Figures 3.1 and 3.2. The assumed binary static master-slave D flip-flop is given in Figure 3.3.

All switches are assumed to be implemented as ordinary CMOS transmission gates using one nMOS transistor and one pMOS transistor in parallel. This switch needs two control signals. We will assume that any $n$-way switch is implemented by combining $n$ transmission gates in the obvious way. This switch needs $2n$ control signals, each connected to the gate of one transistor. However, the two transmission gates in a 2-way switch can be controlled by the same two control signals. Hence, these two control signals are connected to the gates of two transistors each.


Figure 3.1: A static CMOS AND gate. $A$ and $B$ are the inputs and $A \wedge B$ is the output.

Figure 3.2: A static CMOS XOR gate. $A$ and $B$ are the inputs and $A \veebar B$ is the output.

Figure 3.3: A static CMOS D-flip-flop (master and slave stages) with input $D$, output $Q$, and clock signal $\phi$.

3.2 Models Used

We would like to have a model of the transistor that is simple to use, but we still want the model to mirror effects like delay in Boolean gates, area requirement, and power dissipation.

3.2.1 The Delay Model

The delay $T_d$ of a Boolean gate is defined to be the time difference between the 50% level of the input transition and the 50% level of the output transition. Assume that we have a Boolean inverter, whose output is loaded by a capacitance $C_L$, and let the input signal be a step function. Based on analytical reasoning, Weste and Eshraghian [53, Ch 4.5] derive the expression

$$T_d = \frac{A C_L}{\beta},$$

where $A$ is a constant given by the supply voltage and the threshold voltage of the transistor and where $\beta$ is the gain factor of the transistor. Now, let $W$ and $L$ denote the width and the length of the transistor. Also, let $\mu$ denote the mobility of the charge carriers, let $\varepsilon$ denote the permittivity of the insulator, and let $t_{ox}$ denote the thickness of the insulator. Then the gain factor is given by

$$\beta = \frac{\mu\varepsilon}{t_{ox}} \cdot \frac{W}{L}.$$

Introducing the process constant

$$K \triangleq \frac{A\,t_{ox}}{\mu\varepsilon},$$

we can express the delay as

$$T_d = K \frac{L}{W} C_L.$$

There are actually two such process constants, one for each type of transistor. We use the notation $K_n$ for nMOS transistors and $K_p$ for pMOS transistors. The main reason that these constants differ is that the mobilities of electrons and holes differ. Weste and Eshraghian [53, Ch 2.2.1] state that the mobilities of holes and electrons are 180 cm²/Vs and 500 cm²/Vs, respectively. Hence, we have $K_p \approx 2.8K_n$.
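As a quick check of the factor 2.8, assume (an assumption of this check, not a statement from the cited source) that the constant $A$, the insulator thickness and the insulator's dielectric constant $\varepsilon$ are the same for both transistor types, so that only the mobility differs between $K_n$ and $K_p$:

```latex
\frac{K_p}{K_n} = \frac{\mu_n}{\mu_p} = \frac{500}{180} \approx 2.78 \approx 2.8
```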

In a CMOS technology there are minimum sizes of all geometrical measures. Consider a transistor of minimum length $L_0$ and minimum width $W_0$. We will normalize all properties with respect to this minimum size transistor. Therefore we use the normalized width

$$w = \frac{W}{W_0}$$

of a transistor instead of the width $W$. In static CMOS, nMOS transistors are used to make the output low, while pMOS transistors are used to make the output high. Therefore, $K_n$ determines the delay for negative output transitions and $K_p$ determines the delay for positive output transitions. In order to get symmetric behaviour, we assume, unless otherwise stated, that all transistors have length $L_0$, all nMOS transistors have normalized width 1, and all pMOS transistors have normalized width 2.8. These transistors, and Boolean gates built of them, we call unscaled.

In more complex gates the load capacitance may be charged or discharged through transistors connected in series. The delay is then given by

$$T_d = K \sum_i \frac{L_i}{W_i} C_L, \tag{3.1}$$


where $L_i$ and $W_i$ are the length and the width, respectively, of the $i$-th transistor. The load capacitance may also be charged or discharged through transistors connected in parallel. The worst case, however, is always given by transistors connected in series. We use the worst case as our estimate of the delay of a given gate. Equation 3.1 can be interpreted as the delay of an RC-link. Therefore we introduce the resistance

$$R = K \sum_i \frac{L_i}{W_i}$$

of the series transistors and the reference resistance

$$R_0 = K_n \frac{L_0}{W_0}.$$

We normalize resistances with respect to $R_0$ by introducing the normalized resistance

$$r = \frac{R}{R_0}$$

of a resistance $R$. For the output of an unscaled Boolean gate, this is the maximum number of series transistors connecting the output to $V_{dd}$ or ground. In case the transistor widths of a Boolean gate are scaled by a scaling factor $s$, then the normalized output resistance of that Boolean gate is scaled by $s^{-1}$.

The capacitances in a MOS transistor can be modeled as the three capacitances $C_g$, $C_s$, and $C_d$ connecting the gate, source, and drain respectively to ground. Weste and Eshraghian [53, Ch 4.3.4] give an example of an nMOS transistor where $C_g$ is about 4.5 times larger than $C_s$ and $C_d$. The capacitive load of a CMOS gate is then dominated by the input gate capacitances of the succeeding gates. We will therefore only consider the gate capacitances in our transistor model.

The gate capacitance is proportional to $LW$, where $L$ and $W$ are the length and width, respectively, of the transistor gate. Let $C'_0$ be the gate capacitance of a minimum size transistor. Then the gate capacitance of an unscaled nMOS transistor is $C'_0$ and the gate capacitance of an unscaled pMOS transistor is $2.8C'_0$. Since almost all gate outputs in static CMOS are loaded by as many nMOS gate capacitances as pMOS gate capacitances, it is convenient to normalize capacitances with respect to the average gate capacitance. Therefore, we introduce the reference capacitance $C_0 = 1.9C'_0$.


We normalize capacitances with respect to $C_0$ by introducing the normalized capacitance

$$c = \frac{C}{C_0}$$

of a capacitance $C$. For an input of an unscaled Boolean gate, this is simply the number of transistor gates connected to that input. In case the transistor widths in a Boolean gate are scaled by a scaling factor $s$, then the normalized input capacitance of that Boolean gate is also scaled by $s$.

Now we are ready to return to the delay. We wish to normalize delays with respect to the minimum size transistor. Therefore, we introduce the reference delay $T_0 = R_0C_0$, and we normalize delays with respect to $T_0$ by introducing the normalized delay

$$t = \frac{T}{T_0}$$

of a delay $T$. Assume that we have a Boolean gate with normalized output resistance $r$. Let the normalized load capacitance be $c$. Then the normalized delay of that link is $t = rc$. As an example, consider the AND gate in Figure 3.1, realized as a NAND gate followed by a Boolean inverter. The normalized output resistance of the NAND gate is $r = 2$ and the normalized input capacitance of the inverter is $c = 2$. The normalized delay of this link is therefore $t_{int} = rc = 4$. We call $t_{int}$ the normalized internal delay of the AND gate.

Let $t_i$ be the normalized delay of the $i$-th link in a path from the output of a flip-flop to the input of a flip-flop in an architecture to be analyzed. The total normalized delay of that path is simply the sum

$$\sum_i t_i.$$

This includes delays of Boolean gates and of one flip-flop. The path with the largest delay is called the critical path. This limits the clocking frequency $f_c$ to

$$f_c \le (t_{CP}T_0)^{-1},$$

where $t_{CP}$ is the normalized delay of the critical path. Since there may in some cases be more than one critical path, we will speak about "a critical path" or "critical paths" when that is appropriate.
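The bookkeeping for a single path can be sketched as follows. The gate values are the ones listed in Table 3.1; the `Gate` type, the function names, the assumption that every output drives exactly one input, and the numerical value of $T_0$ are all hypothetical and only serve the illustration.

```python
from typing import NamedTuple, Sequence


class Gate(NamedTuple):
    c_in: float   # normalized input capacitance
    t_int: float  # normalized internal delay
    r_out: float  # normalized output resistance


# Values as listed in Table 3.1
FLIP_FLOP = Gate(c_in=2, t_int=10, r_out=1)
AND2 = Gate(c_in=2, t_int=4, r_out=1)
XOR2 = Gate(c_in=4, t_int=2, r_out=2)


def path_delay(gates: Sequence[Gate], c_final: float) -> float:
    """Sum of internal delays and link delays r*c along a chain of gates.

    Assumes fan-out one: each output drives only the next gate's input,
    and the last gate drives a load with normalized capacitance c_final.
    """
    loads = [g.c_in for g in gates[1:]] + [c_final]
    return sum(g.t_int + g.r_out * c for g, c in zip(gates, loads))


def max_clock_frequency(t_cp: float, t0_seconds: float) -> float:
    """Upper limit on the clock frequency for a critical path delay t_cp."""
    return 1.0 / (t_cp * t0_seconds)


# Flip-flop -> AND -> XOR -> flip-flop input
t_cp = path_delay([FLIP_FLOP, AND2, XOR2], FLIP_FLOP.c_in)
print(t_cp, max_clock_frequency(t_cp, 10e-12))  # T0 = 10 ps is a made-up value
```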


So far we have assumed that the input waveform is a step function. A more accurate assumption would have been to assume a ramp. Hedenstierna and Jeppson [24] have given a modified expression of the delay, where a fraction of the input rise- or fall-time is added to our delay. Our limited investigation suggests that this may increase the delay by some 30 percent compared to our simpler model. We will not take this effect into account.

3.2.2 The Size Model

The actual chip area requirement of a certain architecture depends very much on the technology used for the implementation. The area is not only determined by the number of transistors and their sizes. Area is also needed for interconnections. This area depends on how complicated the interconnection network is, and can in some cases be a substantial part of the total area. It would be desirable to include this area in our model for the chip area. Regrettably, that would make our model far too complicated. In order to make the chip area model reasonable, we therefore restrict ourselves to considering the number of transistors and their sizes only.

Let $A'_0$ be the area occupied by a transistor of minimum size, and assume that a transistor of normalized width $w$ occupies the area $wA'_0$. Then the area occupied by an nMOS transistor is $A'_0$ and the area occupied by a pMOS transistor is $2.8A'_0$. We normally have as many nMOS transistors as pMOS transistors in static CMOS. Following the arguments for capacitances above, it is convenient to normalize areas with respect to the average transistor area. Hence, define the reference area $A_0 = 1.9A'_0$. We normalize areas with respect to $A_0$ by introducing the normalized area

$$a = \frac{A}{A_0}$$

of an area $A$. The normalized area occupied by an unscaled Boolean gate is then simply the number of transistors in that gate. In case the transistor widths in a Boolean gate are scaled by a scaling factor $s$, then the normalized area of that Boolean gate is also scaled by $s$.


3.2.3 The Power Dissipation Model

There are several different types of power dissipation in static CMOS. The most obvious ones are listed below.

• Dynamic power dissipation arises when capacitances are charged and discharged. This is normally the dominating part of the total power dissipation.

• Static power dissipation is caused by leakage currents through reverse biased parasitic diodes. For a well designed chip, this is a very small part of the total power dissipation. Weste and Eshraghian [53, Ch. 4.7] state that the static power dissipation for a specific inverter is between 1 nW and 2 nW at the supply voltage $V_{dd} = 5$ V.

• Short circuit power dissipation arises due to the fact that both the nMOS net and the pMOS net of a Boolean gate will be conducting during a short period of time during input signal transitions. This is normally a few per cent of the total power dissipation, as noted by Veendrick [45].

We will only consider dynamic power dissipation since this is the dominating part of the total power dissipation. Consider a network of Boolean gates, with a number of inputs, and only one output. Assume further that the delays of the paths from the inputs to the output differ. Due to this difference in delay, we may have multiple transitions at the output node during a clock period, even if each input only changes its value at most once. This phenomenon is called glitches and it causes unwanted power dissipation. Shen et al. [44] have noted that this unwanted power dissipation is typically about 20 per cent of the total power dissipation. In order to simplify the power dissipation model, we will only consider single transitions during one clock period, i.e. the signal changes its value either once or not at all during a clock cycle. However, there is one obvious exception: the clock signal always has exactly two transitions per clock cycle.

A capacitance $C$ charged through a transistor to the voltage $V_{dd}$ holds the energy $CV_{dd}^2/2$. At the same time the energy lost in the transistor is also $CV_{dd}^2/2$. This means that when the capacitor has been charged and discharged, the total consumed energy is $CV_{dd}^2$, and the energy cost of a single transition is $CV_{dd}^2/2$. The power consumed by charging and discharging $C$ with the frequency $f$ is therefore $fCV_{dd}^2$. Let us relate energies and powers to our reference capacitance $C_0$ by defining the reference energy $E_0 = C_0V_{dd}^2/2$ and the reference power $P_0 = f_cC_0V_{dd}^2/2$, where $f_c$ is the clock frequency used. We normalize energies and powers with respect to $E_0$ and $P_0$, respectively, by introducing the normalized energy

$$e = \frac{E}{E_0}$$

of an energy $E$, and the normalized power dissipation

$$p = \frac{P}{P_0}$$

of a power $P$.

3.3 Special Cases

3.3.1 Large Capacitive Loads

We have noted that the normalized delay of a link consisting of a gate output with normalized resistance $r$ and a normalized load capacitance $c$ is $rc$. A consequence of our assumption that we only use Boolean gates with one or two inputs is that $r$ is either 1 or 2 for an unscaled Boolean gate. However, $c$ may be large, which could be a problem for us. Luckily, there is a simple method to make the delay substantially smaller than $rc$ for large enough $rc$.

Let $n$ be a positive odd integer and let $s$ be given by $s = (rc/2)^{1/n}$. Consider an inverter chain, consisting of $n-1$ cascaded Boolean inverters, connected between the gate output and the large load capacitance. The $i$-th inverter in the chain is scaled by $s^i/r$, $1 \le i < n$. The normalized delay $t_i$ of the $i$-th link in this chain is therefore given by

$$t_i = \frac{r}{s^{i-1}} \cdot \frac{2s^i}{r} = 2s, \quad 1 \le i \le n,$$

including both the input link ($i = 1$) and the output link ($i = n$) of the chain. Hence, the total normalized delay of this chain is

$$t = \sum_{i=1}^{n} t_i = n \cdot 2s = 2s \log_s\left(\frac{rc}{2}\right).$$

It is fairly easy to show that we have

$$t \ge t' \triangleq 2e \ln\left(\frac{rc}{2}\right).$$

Let us study the quotient $t/t'$. Then we have

$$\frac{t}{t'} = \frac{s}{e \ln s},$$

which only depends on $s$. This quotient reaches its minimum at $s = e$ where we have $t = t'$. Given $rc$, we wish to choose $n$ and $s$ such that $t$ is minimized. Typically, this does not give us $s = e$. Luckily, the choice of $s$ is not crucial. For example, the delay does not increase by more than some 6 percent if we choose $s$ somewhere in the interval $2 \le s \le 4$, compared to the optimum $s = e$.
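The insensitivity to the stage ratio $s$ is easy to check numerically. The following sketch evaluates the chain delay $2n(rc/2)^{1/n}$ and the quotient $s/(e \ln s)$; the function names are hypothetical.

```python
import math


def chain_delay(rc: float, n: int) -> float:
    """Normalized delay 2*n*s of an n-link chain with stage ratio s = (rc/2)**(1/n)."""
    s = (rc / 2) ** (1 / n)
    return 2 * n * s


def relative_delay(s: float) -> float:
    """Quotient t/t' = s / (e ln s) relative to the lower bound, for s > 1."""
    return s / (math.e * math.log(s))


for s in (2.0, math.e, 4.0):
    print(f"s = {s:.3f}: t/t' = {relative_delay(s):.4f}")
```

The quotient takes the same value at $s = 2$ and $s = 4$, since $4/\ln 4 = 2/\ln 2$.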

The normalized area requirement of the inverter chain is

$$a = \sum_{i=1}^{n-1} \frac{2s^i}{r} = \frac{2s}{r} \cdot \frac{s^{n-1} - 1}{s - 1}.$$

The quotient $a/c$ for an optimum choice of $n$ approaches $(e-1)^{-1}$ as $c$ tends to infinity.

The inverter chain described above cannot be used for all possible values of $r$ and $c$. The largest normalized output resistance among the considered gates is 2. A potential problem is when the best choice of $s$ would be smaller than $r$, based on the above reasoning. Then the scaling factor $s/r$ of the first inverter in the inverter chain would be smaller than 1. But the smallest possible scaling factor is 1, otherwise at least the nMOS transistor of that inverter would be smaller than the minimum size transistor. The problem appears for us when we have $r = 2$ and $3^{3/2} < c < 8$. The best choice of $n$ is still 3. In this case, we use an unscaled first inverter, which minimizes the delay of the input link of the inverter chain. The scaling factor of the second inverter is set to $(c/2)^{1/2}$. Then the delay of the input link and the delay of the output link of that inverter are equal, which minimizes the total delay of the inverter chain.

Let $t(r, c)$ and $a(r, c)$ denote the normalized delay and area of an optimum inverter chain, driven by a normalized resistance $r$ and loaded by a normalized capacitance $c$. Let $n^*(r, c)$ be the optimum inverter chain length for the first approach, that is we have

$$n^*(r, c) = \arg\min_{\text{odd } n > 0} \left\{ 2n \left(\frac{rc}{2}\right)^{1/n} \right\},$$


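where $\arg\min$ gives us the smallest $n$ that minimizes $2n(rc/2)^{1/n}$. Based on the above reasoning, we have

$$t(r, c) = \begin{cases} rc, & r = 2 \text{ and } c \le 3 + 5^{1/2} \\ 4 + (8c)^{1/2}, & r = 2 \text{ and } 3 + 5^{1/2} < c < 8 \\ 2n^*(r, c)\left(\frac{rc}{2}\right)^{1/n^*(r,c)}, & \text{otherwise} \end{cases} \tag{3.2}$$

and

$$a(r, c) = \begin{cases} 0, & r = 2 \text{ and } c \le 3 + 5^{1/2} \\ 2 + (2c)^{1/2}, & r = 2 \text{ and } 3 + 5^{1/2} < c < 8 \\ \dfrac{2}{r} \cdot \dfrac{\frac{rc}{2} - \left(\frac{rc}{2}\right)^{1/n^*(r,c)}}{\left(\frac{rc}{2}\right)^{1/n^*(r,c)} - 1}, & \text{otherwise} \end{cases} \tag{3.3}$$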

for $r \in \{1, 2\}$. Whenever we have a situation where $r$ and $c$ are fixed, we use these functions directly. However, when $c$ depends on the dimension of the field, we would prefer simpler expressions. Then we use the values given by the bounds

$$t(r, c) \le 4.4 \log_2\left(\frac{rc}{2}\right), \tag{3.4}$$

$$a(r, c) \le c \tag{3.5}$$

instead, still limited to $r \in \{1, 2\}$, where the bound in Equation 3.4 is only valid for $rc \ge 2\sqrt{3}$.
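The piecewise definitions in Equations 3.2 and 3.3 can be evaluated directly, which also gives a numerical sanity check of the bounds in Equations 3.4 and 3.5. The sketch below is one possible reading of those equations: the function names are hypothetical, the search for $n^*$ stops at the first odd $n$ that no longer improves the delay, and the area is taken to be 0 when $n^* = 1$ (no inverters inserted).

```python
import math

THRESHOLD = 3 + math.sqrt(5)  # below this, a gate with r = 2 drives the load directly


def optimum_chain_length(r: float, c: float) -> int:
    """Smallest odd n minimizing 2*n*(r*c/2)**(1/n)."""
    x = r * c / 2
    n, best = 1, 2 * x
    while 2 * (n + 2) * x ** (1 / (n + 2)) < best:
        n += 2
        best = 2 * n * x ** (1 / n)
    return n


def chain_delay(r: float, c: float) -> float:
    """Normalized delay t(r, c) following Equation 3.2."""
    if r == 2 and c <= THRESHOLD:
        return r * c
    if r == 2 and c < 8:
        return 4 + math.sqrt(8 * c)
    n = optimum_chain_length(r, c)
    return 2 * n * (r * c / 2) ** (1 / n)


def chain_area(r: float, c: float) -> float:
    """Normalized area a(r, c) following Equation 3.3."""
    if r == 2 and c <= THRESHOLD:
        return 0.0
    if r == 2 and c < 8:
        return 2 + math.sqrt(2 * c)
    n = optimum_chain_length(r, c)
    if n == 1:                      # no inverters are inserted
        return 0.0
    s = (r * c / 2) ** (1 / n)
    return (2 / r) * (r * c / 2 - s) / (s - 1)


for r, c in ((1, 16), (2, 6), (2, 64)):
    print(r, c, round(chain_delay(r, c), 3), round(chain_area(r, c), 3))
```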

Let $p(r, c, p_L)$ denote the normalized power dissipation of an optimum inverter chain, driven by a normalized resistance $r$ and loaded by a normalized capacitance $c$ with normalized power dissipation $p_L$. Since the inverter chain is used to amplify a certain signal, we have the same activity throughout the chain, and hence we have

$$p(r, c, p_L) = \frac{p_L}{c}\,a(r, c).$$

The bound on $a(r, c)$ gives us the corresponding bound $p(r, c, p_L) \le p_L$.

3.3.2 Adder Trees

We will need to add several elements of $\mathbb{F}_2$. In order to minimize the delay of such an adder, we will assume that this addition is performed by a minimum depth adder tree, where the basic two-input adder is an XOR gate. The depth of this tree is $\lceil \log_2 m \rceil$, where $m$ is the number of signals to add.

The XOR gate has the properties $c_{in} = 4$, $t_{int} = 2$, and $r_{out} = 2$. Because the depth of the adder tree is $\lceil \log_2 m \rceil$, the largest normalized delay in the tree is

$$t = t_{int} + (t_{int} + r_{out}c_{in})(\lceil \log_2 m \rceil - 1) = 2 + 10(\lceil \log_2 m \rceil - 1).$$

We will use the value given by the bound $t < 2 + 10 \log_2 m$.

The number of XOR gates needed in the adder tree is $m-1$, and the normalized area requirement of each XOR gate is 12. Therefore, we have the normalized area requirement $a = 12(m-1)$ of an $m$-input $\mathbb{F}_2$ adder.
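The adder tree figures are simple enough to tabulate. The sketch below evaluates the exact delay expression, the simplified bound and the area for a few values of $m$; the function names are hypothetical and $m \ge 2$ is assumed.

```python
import math

# XOR gate properties used in the text
C_IN, T_INT, R_OUT, XOR_AREA = 4, 2, 2, 12


def tree_depth(m: int) -> int:
    """ceil(log2(m)) for m >= 2, computed with integer arithmetic."""
    return (m - 1).bit_length()


def adder_tree_delay(m: int) -> int:
    """Largest normalized delay through a minimum depth XOR tree with m inputs."""
    return T_INT + (T_INT + R_OUT * C_IN) * (tree_depth(m) - 1)


def adder_tree_area(m: int) -> int:
    """Normalized area of the m - 1 XOR gates in the tree."""
    return XOR_AREA * (m - 1)


for m in (2, 8, 9, 163):
    bound = 2 + 10 * math.log2(m)
    print(m, adder_tree_delay(m), round(bound, 1), adder_tree_area(m))
```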

3.3.3 Buffers

A binary buffer simply consists of two cascaded inverters, which in some cases can be used to reduce the capacitive load of a gate output. Hence buffers provide a possibility to reduce the delay of architectures. The delay is often regarded as the most important property. We therefore insert buffers in architectures whenever that reduces the delay, and we assume that all buffers consist of unscaled inverters. Buffers for $\mathbb{F}_{2^m}$ are implemented as $m$ parallel binary buffers in the obvious way.

3.3.4 Control Logic and Control Signals

The control logic of an architecture can be implemented in several different ways. As for the architecture itself, there is a tradeoff between time, area, and energy consumption. We do not include the control logic in the area and energy measures. We do, though, include the needed inverter chains for the distribution of control signals.

The critical paths may very well pass through the control logic. The delay of the control logic is therefore included as a generic $t_{ctrl}$, which may be a function of the extension degree of the field.

We assume that the normalized input capacitances of the control logic inputs are all 2. We further assume that the normalized output resistances of the control logic outputs are all 1.


3.4 Summary of Cost Measures

We assume that every $m$-input adder is implemented as an adder tree of depth $\lceil \log_2 m \rceil$ and that any large capacitive load in a critical path is driven by an inverter chain as described in Section 3.3. We will especially assume that all the clock signals connected to the flip-flops are driven by appropriate inverter chains. Moreover, we assume that buffers are used to reduce capacitive loads whenever that reduces the delay of critical paths. Let $n_{ff}$ be the number of flip-flops in the architecture. Then the total normalized capacitive load of the clock signals is $c_{clock} = 8n_{ff}$ since each flip-flop has 8 clocked transistors.

Let $t_{CP}$ be the normalized delay of the critical path of the architecture, and let $n$ be the number of clock cycles needed for an architecture to perform the calculation it is supposed to do. Our physical model described in Sections 3.2 and 3.3 can be summarized as follows:

Time: The normalized time needed for a calculation is $nt_{CP}$, provided that the architecture is clocked at maximum clock frequency.

Space: The normalized area requirement of an architecture is the sum of the normalized area requirement of the gates, flip-flops, and needed inverter chains in the architecture. This includes the normalized area requirement of the clock signal inverter chains.

Power: The normalized power dissipation $p$ of an architecture is the sum of the normalized power dissipation of the gates, flip-flops, and inverter chains in the architecture, including the normalized power dissipation of the clock signal inverter chains.

Energy: The normalized energy cost of a calculation is $np$.

The normalized properties from Sections 3.2 and 3.3 of the gates used are gathered in Table 3.1.


Gate                      c_in   t_int            r_out   a
Boolean inverter          2      0                1       2
Binary buffer             2      2                1       4
2-input NAND gate         2      0                2       4
2-input NOR gate          2      0                2       4
2-input AND gate          2      4                1       6
2-input OR gate           2      4                1       6
2-input XOR gate          4      2                2       12
2-input XNOR gate         4      2                2       12
Transmission gate (1)     1      0                +1      2
Inverter chain (2)        –      t(r_D, c_L)      –       a(r_D, c_L)
m-input F_2 adder         4      2 + 10 log2 m    2       12(m − 1)
n-way switch (1), n > 2   1      0                +1      2n
2-way switch (1)          2      0                +1      4
Flip-flop                 2      10               1       16

Table 3.1: The normalized input capacitances $c_{in}$, normalized internal delays $t_{int}$, normalized output resistances $r_{out}$, and normalized area requirements $a$ of the building blocks used.

(1) Transmission gate and switches: The input capacitance is only valid for the control signals. Observe that the $n$-way switch has $2n$ control signals while the 2-way switch has only 2 control signals. The transmission gate and all switches add 1 to the normalized output resistance of the previous gate.

(2) Inverter chain: $c_L$ is the normalized load capacitance and $r_D$ is the normalized driver resistance of the previous gate. $t_{int}$ is here the normalized delay for all links from the previous gate to the load capacitance.

Chapter 4

Polynomial Basis Inverters

Polynomial bases are frequently used for representation of finite extension fields. They provide regular architectures for most arithmetic operations.

Definition 5 (Restated from Section 2.2) Let $\vartheta$ be an element of $\mathbb{F}_{q^m}$ such that $\{\vartheta^i\}_{i=0}^{m-1}$ is a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. Then $\{\vartheta^i\}_{i=0}^{m-1}$ is called a polynomial basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$.

4.1 Inversion Based on the Euclidean Algorithm

Berlekamp [5, pp 21–44] proposes the use of the extended Euclidean algorithm for polynomials for inversion in $\mathbb{F}_{2^m}$. In this section, we analyze a modified architecture based on Berlekamp's inversion algorithm. The advantage of this architecture is that its control logic is much simpler than the control logic of Berlekamp's architecture.

4.1.1 The Architecture

Let $p(x)$ be the irreducible polynomial over $\mathbb{F}_2$ of degree $m$ used to generate a polynomial basis of $\mathbb{F}_{2^m}$. Also, let $a(x)$ be the polynomial representation of $\alpha \in \mathbb{F}_{2^m}$ in this basis. Obviously, since $p(x)$ is irreducible and since $\deg a(x) < \deg p(x)$ holds, $a(x)$ and $p(x)$ are relatively prime. If we initiate the Euclidean algorithm with $p(x)$ and $a(x)$, then the extended algorithm generates two polynomials, $c(x)$ and $d(x)$, with degrees $\deg c(x) < m$ and $\deg d(x) < m-1$. These polynomials satisfy

$$a(x)c(x) + p(x)d(x) = 1. \tag{4.1}$$

From Equation 4.1, we find that the inverse element $\alpha^{-1}$ has the polynomial representation $c(x)$. Therefore, we can use the extended Euclidean algorithm for inversion in $\mathbb{F}_{2^m}$ using a polynomial basis. The algorithm given by Berlekamp [5, p 41] is reformulated below.

Algorithm 4.1 Inversion Algorithm Based on the Euclidean Algorithm

Initiate upper registers $r_U(x) = p(x)$ and $c_U(x) = 1$. Also initiate lower registers $r_L(x) = a(x)$ and $c_L(x) = 0$. Set $d_U = m$ and $d_L = m-1$. Also initiate the control bit $k = 0$.

repeat
    if $r_{U,d_U} = 0$ then
        if $d_U = d_L$ then Set $k = k + 1 \bmod 2$ ($\Rightarrow k = 1$)
        Decrement $d_U$ by 1
    else if $r_{L,d_L} = 0$ then
        if $d_U = d_L$ then Set $k = k + 1 \bmod 2$ ($\Rightarrow k = 0$)
        Decrement $d_L$ by 1
    else
        if $k = 0$ then
            Set $r_U(x) = r_U(x) + x^{d_U-d_L}r_L(x)$
            Set $c_L(x) = c_L(x) + x^{d_U-d_L}c_U(x)$
            if $d_U = d_L$ then Set $k = k + 1 \bmod 2$ ($\Rightarrow k = 1$)
            Decrement $d_U$ by 1
        else
            Set $r_L(x) = r_L(x) + x^{d_L-d_U}r_U(x)$
            Set $c_U(x) = c_U(x) + x^{d_L-d_U}c_L(x)$
            if $d_U = d_L$ then Set $k = k + 1 \bmod 2$ ($\Rightarrow k = 0$)
            Decrement $d_L$ by 1
        end if
    end if
until $(d_U, d_L) = (0, 1)$ or $(1, 0)$
if $k = 0$ then The desired inverse is $c_U(x)$.
else The desired inverse is $c_L(x)$.
end if
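A direct software transcription of the algorithm can help when tracing it by hand. The sketch below makes several assumptions that are not part of the text: polynomials are stored as Python integers (bit $i$ is the coefficient of $x^i$), $m \ge 2$, the shift in the $k = 1$ branch is taken as $d_L - d_U$ for both registers, and the result is read from the $c$ register that was updated together with the $r$ register whose degree counter reached 0. The helper `gf_mul` is only there to check the result.

```python
def invert_euclid(a: int, p: int, m: int):
    """Return (inverse of a modulo p(x), number of iterations) for m >= 2."""
    if a == 0:
        raise ZeroDivisionError("zero has no inverse")
    r_u, c_u = p, 1          # upper registers
    r_l, c_l = a, 0          # lower registers
    d_u, d_l = m, m - 1
    k = 0
    iterations = 0
    while True:              # repeat ... until
        if not (r_u >> d_u) & 1:
            if d_u == d_l:
                k ^= 1
            d_u -= 1
        elif not (r_l >> d_l) & 1:
            if d_u == d_l:
                k ^= 1
            d_l -= 1
        elif k == 0:
            shift = d_u - d_l
            r_u ^= r_l << shift
            c_l ^= c_u << shift
            if d_u == d_l:
                k ^= 1
            d_u -= 1
        else:
            shift = d_l - d_u
            r_l ^= r_u << shift
            c_u ^= c_l << shift
            if d_u == d_l:
                k ^= 1
            d_l -= 1
        iterations += 1
        if (d_u, d_l) in ((0, 1), (1, 0)):
            break
    # r_U is congruent to c_L * a and r_L to c_U * a (mod p); pick the constant one.
    inverse = c_l if d_u == 0 else c_u
    return inverse, iterations


def gf_mul(a: int, b: int, p: int, m: int) -> int:
    """Reference product in F_{2^m}, used only to check the inverse."""
    result = 0
    for j in range(m):
        if (b >> j) & 1:
            result ^= a
        a <<= 1
        if (a >> m) & 1:
            a ^= p
    return result


inv, steps = invert_euclid(0b0110, 0b10011, 4)   # p(x) = x^4 + x + 1
print(bin(inv), steps, gf_mul(0b0110, inv, 0b10011, 4))
```

In hand traces for $m = 3$ and $m = 4$ the loop ends at $(0, 1)$ with the inverse in $c_L$ and at $(1, 0)$ with the inverse in $c_U$.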

In each iteration in Algorithm 4.1 we decrement $d_U$ or $d_L$ by 1, and when the algorithm terminates we have $(d_U, d_L) = (0, 1)$ or $(d_U, d_L) = (1, 0)$. Since $d_U$ and $d_L$ are initially set to $m$ and $m-1$ respectively, the algorithm needs a total of $2m-2$ iterations after initialization.

Figure 4.1: An inverter architecture for $\mathbb{F}_{2^m}$ based on Euclid's algorithm.

The degree of $p(x)$ is $m$, while the degrees of all other polynomials appearing in the algorithm do not exceed $m-1$. Therefore it would suffice with $m+1$ memory cells for the upper register $r_U(x)$ and $m$ memory cells for the other registers. In total, we can store the polynomials using $4m+1$ memory cells. Berlekamp observed that the inequalities

$$\deg r_U(x) + \deg c_U(x) \le m, \tag{4.2}$$

$$\deg r_L(x) + \deg c_L(x) \le m \tag{4.3}$$

hold throughout the algorithm. He used this fact to reduce the number of memory cells to $2(m+2)$ by letting spare memory cells from the $r$-registers be allocated by the $c$-registers. However, this solution demands a great deal of control logic, both global logic and especially local logic in each bit-slice.

We instead analyze the algorithm using four registers, one of length $m+1$ and three of length $m$, thus removing the need for local control signals in each bit-slice. In Figure 4.1 we show an architecture based on Algorithm 4.1. The control logic needs to keep track of $k$, $d_U$, and $d_L$, and generate the control signals R (reset), KU (keep $r_U(x)$), SU (shift $r_U(x)$), AU (add to $r_U(x)$), OU (output $c_U(x)$), KL (keep $r_L(x)$), SL (shift $r_L(x)$), AL (add to $r_L(x)$), and OL (output $c_L(x)$).


4.1.2 Properties of the Architecture

The critical paths start in $r_{U,d_U}$ and $r_{L,d_L}$, pass through the control logic via the global reset signal R, and end in the inputs of the flip-flops. The normalized area and delay of this architecture can easily be found using the methods from Chapter 3. However, special attention is needed for the normalized power.

Let $n_1$ denote the total number of transitions among the flip-flops in the $r$ and $c$ registers in Figure 4.1 during initialization, and let $e_1$ denote the corresponding normalized energy. Initially in Figure 4.1, we set $r_U(x) = p(x)$, $r_L(x) = a(x)$, $c_U(x) = 1$, and $c_L(x) = 0$. Then all flip-flops in $r_L(x)$, $c_U(x)$, and $c_L(x)$ may have transitions, since $a(x)$ may be any nonzero element of the field, and the contents of the $c$ registers may be any nonzero binary vectors of length $m$ after the previous calculation. However, regarding $r_U(x)$ we should note that either one or two of the two leftmost flip-flops are set after the previous calculation. Therefore, the weight $w$ of $p(x)$ bounds the total number of transitions among the flip-flops in $r_U(x)$ at initialization to at most $w+1$. Thus, we have

$$n_1 \le 3m + w + 1.$$

The output of almost every one of these flip-flops is connected to an input of an XOR gate. There are 8 transistors in the data path of each flip-flop, and 6 transistors of an XOR gate controlled by each flip-flop. In total, we have

$$e_1 \le 14n_1 \le 42m + 14w + 14.$$

A question that arises is whether we can determine or bound the smallest possible $w$ for all $m$. By computer search we have found that there is at least one irreducible trinomial or pentanomial for all degrees $m$ in the interval $2 \le m \le 4000$. Hence, $w$ does not need to be greater than 5. The result of this search is given in Appendix A.

In each clock cycle we decrement either $d_U$ or $d_L$. Consider a clock cycle where $d_L$ is decremented from $d$ to $d-1$, and let $k_d$ be the number of transitions in the $r$ and $c$ registers in Figure 4.1 during this clock cycle. The inequalities in Equations 4.2 and 4.3 can be used to bound $k_d$. When $d_L$ is decremented, we have $r_{U,d_U} = 1$ and there are two possibilities; $r_{L,d}$ is either 0 or 1. When we have $r_{L,d} = 0$ the following happens.


1. $r_L(x)$ is shifted, yielding transitions in at most $d+1$ flip-flops in $r_L(x)$.

2. $r_U(x)$ is unchanged.

3. $c_L(x)$ is shifted. Then Equation 4.3 assures that there are transitions in at most $m-d+2$ flip-flops in $c_L(x)$.

4. $c_U(x)$ is unchanged.

The total number of transitions among the flip-flops in the $r$ and $c$ registers is then at most $m+3$. If instead $r_{L,d} = 1$ holds, the following happens.

1. A shifted version of $r_U(x)$ is added to a shifted version of $r_L(x)$ and placed in $r_L(x)$, yielding transitions in at most $d+1$ flip-flops in $r_L(x)$.

2. $r_U(x)$ is unchanged.

3. $c_L(x)$ is shifted. Then Equation 4.3 assures that there are transitions in at most $m-d+2$ flip-flops in $c_L(x)$.

4. $c_L(x)$ is added to $c_U(x)$. Then Equation 4.3 assures that there are transitions in at most $m-d+1$ flip-flops in $c_U(x)$.

In this case, the total number of transitions among the flip-flops in the $r$ and $c$ registers is at most $2m-d+4$. Throughout the algorithm, we have $d \le m$, which implies $2m-d+4 > m+3$. Hence, we have $k_d \le 2m-d+4$. For a clock cycle where instead $d_U$ is decremented from $d$ to $d-1$ we use the same arguments, based on Equation 4.2 instead, and get the same bound on the number of transitions.

Let $n_2$ denote the total number of transitions among the flip-flops in the r and c registers in Figure 4.1 during calculation after initialization, and let $e_2$ denote the corresponding normalized energy. Initially, we set $(d_U, d_L)$ to $(m, m-1)$ and during the calculation $(d_U, d_L)$ are decremented to either $(1, 0)$ or $(0, 1)$. With up to 14 transistors controlled by each flip-flop we therefore have the bounds

\[ n_2 = k_m + 2\sum_{d=2}^{m-1} k_d + k_1 \le 3m^2 + 4m - 7, \]

\[ e_2 \le 14 n_2 \le 42m^2 + 56m - 98. \]
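As a check on the constants, the bound on $n_2$ can be expanded term by term using $k_d \le 2m-d+4$. The decrement counts assumed here (each value $2 \le d \le m-1$ occurs twice, $d = m$ and $d = 1$ at most once) are read off from the start value $(m, m-1)$ and the end values $(1,0)$ or $(0,1)$; they are an interpretation of the text rather than an additional result.

```latex
\begin{aligned}
k_m + k_1 &\le (m+4) + (2m+3) = 3m + 7,\\
2\sum_{d=2}^{m-1} k_d &\le 2\sum_{d=2}^{m-1} (2m-d+4)
   = 2(m-2)(2m+4) - \bigl(m(m-1) - 2\bigr) = 3m^2 + m - 14,\\
n_2 &\le (3m^2 + m - 14) + (3m + 7) = 3m^2 + 4m - 7.
\end{aligned}
```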

Let $e_3$ denote the normalized energy corresponding to the transitions in the reset signal R during one calculation. R is 1 during initialization, i.e. during exactly one clock cycle in each calculation. Hence, there are exactly 2 transitions in R. Let $c_3$ denote the total normalized load capacitance of R and the needed inverter chains. Recall that $a(r, c)$ is the area of a delay-optimal inverter chain, driven by a normalized resistance $r$ and loaded by a normalized capacitance $c$, as defined in Equation 3.3 on page 25. Then we have

c3 = 8m+ 2 + a(1, 8m+ 2) ≤ 16m+ 4,

e3 = 2c3 ≤ 32m+ 8,

using the bound on a(r, c) in Equation 3.5 on page 25.

Let $e_4$ denote the normalized energy corresponding to the transitions in the control signals $O_U$ and $O_L$ during one calculation. These signals are used to choose the output. This choice is done once for each calculation. Therefore we have at most one transition in each of these signals. Let $c_4$ denote the total normalized load capacitance of $O_U$ and the needed inverter chains for $O_U$. We note that $c_4$ is also the total normalized load capacitance of $O_L$ and the needed inverter chains for $O_L$. Then we have

c4 = 2m+ a(1, 2m) ≤ 4m,

e4 ≤ 2c4 ≤ 8m,

again using the bound on a(r, c) in Equation 3.5 on page 25.

Let $e_5$ denote the normalized energy of all control signals, except $O_U$, $O_L$, and R, during one calculation including initialization. The normalized load capacitances of control signals $K_U$, $S_U$, and $A_U$ are all $4m + 2$, while the normalized load capacitances of control signals $K_L$, $S_L$, and $A_L$ are all $4m$. During each of the $2m - 1$ clock cycles, there are transitions in at most two of the first three signals and in at most two of the latter three signals. In total, this gives us

\[ e_5 \le \bigl(2(4m + 2 + 4m) + 2a(4m + 2) + 2a(4m)\bigr)(2m - 1) \le 64m^2 - 16m - 8. \]

The total normalized energy of the considered signals is thus bounded by

\[ \sum_{i=1}^{5} e_i \le 106m^2 + 122m + 14w - 84 \]

for one calculation.


Number of clock cycles (n)    $2m - 1$
Normalized delay (t)          $4.4\log_2(m + \tfrac{1}{4}) + 26.8 + t_{ctrl}$
Normalized time (nt)          $(2m - 1)\bigl(4.4\log_2(m + \tfrac{1}{4}) + 26.8 + t_{ctrl}\bigr)$
Normalized area (a)           $192m + 28$
Normalized power (p)          $181m + 119.5 + \frac{14w + 3.5}{2m - 1}$
Normalized energy (np)        $362m^2 + 58m + 14w - 116$

Table 4.1: Properties of the Euclidean inverter architecture. $m$ is the extension degree and $w$ is the weight of the irreducible polynomial used to generate the basis. All properties except the number of clock cycles are upper bounds. $t_{ctrl}$ is the normalized delay of the control logic.

The main part of the energy, however, is due to the clock signal. Let $e_6$ denote the normalized energy due to clocking during one calculation. We have $4m + 1$ flip-flops with 8 clocked transistors each, which gives us the normalized load capacitance $32m + 8$. The number of clock cycles for one calculation is $2m - 1$, and we have 2 transitions in each clock cycle, which gives us exactly $4m - 2$ transitions for the clock signal during one calculation. This gives us the normalized energy $(32m + 8)(4m - 2)$ of the clocked parts of the flip-flops, and the normalized energy of the clock signal inverter chains is at most the same value. In total, the normalized energy of one calculation due to the clocking signal is bounded by

\[ e_6 \le 256m^2 - 64m - 32, \]

and we have the expected normalized power and energy of the architecture given in Table 4.1, together with its other properties.
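The way the six partial bounds combine into the energy and power entries of Table 4.1 can be checked mechanically. The sketch below simply re-adds the bounds $e_1, \ldots, e_6$ as stated in this section; the function names are hypothetical, and exact rational arithmetic is used so that the power entry can be compared without rounding.

```python
from fractions import Fraction


def euclid_energy_bounds(m, w):
    """Upper bounds e1..e6 for the Euclidean inverter, as stated in the text."""
    return [
        42 * m + 14 * w + 14,       # e1: initialization
        42 * m**2 + 56 * m - 98,    # e2: r and c registers
        32 * m + 8,                 # e3: reset signal R
        8 * m,                      # e4: output select signals
        64 * m**2 - 16 * m - 8,     # e5: remaining control signals
        256 * m**2 - 64 * m - 32,   # e6: clocking
    ]


def table_4_1_energy(m, w):
    """Normalized energy entry of Table 4.1."""
    return 362 * m**2 + 58 * m + 14 * w - 116


def table_4_1_power(m, w):
    """Normalized power entry of Table 4.1: 181m + 119.5 + (14w + 3.5)/(2m - 1)."""
    return Fraction(181 * m) + Fraction(239, 2) + Fraction(28 * w + 7, 2 * (2 * m - 1))
```

Summing the list for any $m$ and $w$ reproduces the energy entry, and multiplying the power entry by the $2m - 1$ clock cycles gives the same value.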

4.2 A Berlekamp-Massey Based Inverter

Several authors have observed and characterized the similarities between the Euclidean algorithm for polynomials and the Berlekamp-Massey algorithm for shift register synthesis. For instance, Reed et al. [42] give an implementation of the Berlekamp-Massey algorithm for finding the error-locator polynomial using continued fractions. The topic is also treated by Cheng [8], Dornstetter [11], Eastman [12], and Welch and Scholtz [52]. Since the extended Euclidean algorithm for polynomials can be used for inversion in finite extension fields, it seems reasonable that there should be a way to use the Berlekamp-Massey algorithm for the same purpose. Hasan [19] also claims that it is possible, but does not give any architecture.

4.2.1 Triangular Bases

The Berlekamp-Massey algorithm essentially performs Gauss elimination on Hankel matrices. Thus, by Theorem 1 on page 13, we need two bases in order to represent our elements as Hankel matrices; one that is a multiple of a polynomial basis, and one that is a multiple of the dual of that basis. The following basis was used for multiplication by Wang and Blake [50] and by Hasan and Bhargava [21]. The basis was introduced by Wang and Blake, and named by Hasan and Bhargava.

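Definition 12 Let $\{\vartheta^i\}_{i=0}^{m-1}$ be a polynomial basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ with $\vartheta$ being a root of the monic irreducible polynomial $p(x) = \sum_{i=0}^{m} p_i x^i$ over $\mathbb{F}_q$. Then $\{\sigma_j\}_{j=0}^{m-1}$ given by $\sigma_j = \sum_{i=0}^{m-1-j} p_{i+j+1}\vartheta^i$ is the triangular basis corresponding to $\{\vartheta^i\}_{i=0}^{m-1}$.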

A simple result of Lemma 2 on page 12 is the following.

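Lemma 3 Let $\{\theta_i\}_{i=0}^{m-1}$ and $\{\sigma_i\}_{i=0}^{m-1}$ be bases of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and define the row vectors $\boldsymbol{\theta} \triangleq (\theta_0, \theta_1, \ldots, \theta_{m-1})$ and $\boldsymbol{\sigma} \triangleq (\sigma_0, \sigma_1, \ldots, \sigma_{m-1})$. Then $\boldsymbol{\sigma} = \boldsymbol{\theta}\,1_{\theta,\sigma}$ holds.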

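Proof: By Definition 9 we have $\sigma_i = \boldsymbol{\sigma}(\sigma_i)_\sigma = \boldsymbol{\theta}(\sigma_i)_\theta$, $0 \le i < m$, and by Lemma 2 we have $(\sigma_i)_\theta = 1_{\theta,\sigma}(\sigma_i)_\sigma$, $0 \le i < m$. Combining these two statements, we get $\boldsymbol{\sigma}(\sigma_i)_\sigma = \boldsymbol{\theta}\,1_{\theta,\sigma}(\sigma_i)_\sigma$, $0 \le i < m$. This equation system can be identified as $\boldsymbol{\sigma}\,1_{\sigma,\sigma} = \boldsymbol{\theta}\,1_{\theta,\sigma}1_{\sigma,\sigma}$, since $(\sigma_i)_\sigma$, $0 \le i < m$, are the columns of $1_{\sigma,\sigma}$. The matrix $1_{\sigma,\sigma}$ is the $m \times m$ unit matrix, which is trivially nonsingular. Therefore we have $\boldsymbol{\sigma} = \boldsymbol{\theta}\,1_{\theta,\sigma}$. □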

The following extension of Lemma 2 is straightforward and is therefore given without proof.


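Lemma 4 Let $\{\theta_j\}_{j=0}^{m-1}$, $\{\sigma_i\}_{i=0}^{m-1}$, and $\{\varphi_k\}_{k=0}^{m-1}$ be bases of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and define the row vectors $\boldsymbol{\theta} \triangleq (\theta_0, \theta_1, \ldots, \theta_{m-1})$, $\boldsymbol{\sigma} \triangleq (\sigma_0, \sigma_1, \ldots, \sigma_{m-1})$, and $\boldsymbol{\varphi} \triangleq (\varphi_0, \varphi_1, \ldots, \varphi_{m-1})$. Let $\alpha$, $\beta$, and $\gamma$ be elements of $\mathbb{F}_{q^m}$. Then $\gamma_{\sigma,\varphi} = \alpha_{\sigma,\theta}\beta_{\theta,\varphi}$ holds if and only if $\gamma = \alpha\beta$ holds.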

Now we have what we need to prove the following result.

Theorem 2 Let $\{\vartheta^i\}_{i=0}^{m-1}$ be a polynomial basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$ and let $\{\sigma_i\}_{i=0}^{m-1}$ be its triangular basis. Then $\{\sigma_i\}_{i=0}^{m-1}$ is a multiple of the dual basis of $\{\vartheta^i\}_{i=0}^{m-1}$.

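Proof: Since $\{\vartheta^i\}_{i=0}^{m-1}$ is a polynomial basis, $\vartheta$ is a root of an irreducible polynomial, say $p(x) = \sum_{i=0}^{m} p_i x^i$, over $\mathbb{F}_q$ with $p_m = 1$. Define the row vectors $\boldsymbol{\theta} \triangleq (\vartheta^0, \vartheta^1, \ldots, \vartheta^{m-1})$ and $\boldsymbol{\sigma} \triangleq (\sigma_0, \sigma_1, \ldots, \sigma_{m-1})$. By Theorem 1, it suffices to show that $\alpha_{\sigma,\theta}$ is a Hankel matrix for all $\alpha$ in $\mathbb{F}_{q^m}$.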

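It is easily shown that

\[ \vartheta_{\theta,\theta} = \begin{pmatrix} 0 & \cdots & 0 & -p_0 \\ 1 & & 0 & -p_1 \\ & \ddots & & \vdots \\ 0 & & 1 & -p_{m-1} \end{pmatrix} \]

holds. By Definition 12 and Lemma 3 we have

\[ 1_{\theta,\sigma} = \begin{pmatrix} p_1 & \cdots & p_m \\ \vdots & \iddots & \\ p_m & & 0 \end{pmatrix}. \]

It is easily checked that $\vartheta_{\sigma,\sigma} = \vartheta_{\theta,\theta}^T$ fulfills the equality $1_{\theta,\sigma}\vartheta_{\sigma,\sigma} = \vartheta_{\theta,\theta}1_{\theta,\sigma}$, given by Lemma 4.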

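To prove that $\alpha_{\sigma,\theta}$ is a Hankel matrix for all $\alpha \in \mathbb{F}_{q^m}$ we have to examine this matrix more closely. From Definition 10 on page 11 and Lemma 2 on page 12, we have

\[ \alpha_{\sigma,\theta} = \bigl(\vartheta^0_{\sigma,\sigma}\alpha_\sigma,\; \vartheta^1_{\sigma,\sigma}\alpha_\sigma,\; \ldots,\; \vartheta^{m-1}_{\sigma,\sigma}\alpha_\sigma\bigr). \]

That is, each column of $\alpha_{\sigma,\theta}$ is the previous column multiplied by $\vartheta_{\sigma,\sigma}$. Since $\vartheta_{\sigma,\sigma}$ has the above form, it is obvious that $\alpha_{\sigma,\theta}$ is a Hankel matrix. □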

It should be mentioned that Hasan and Bhargava [23] showed that $\alpha_{\sigma,\theta}$ is given by the Hankel matrix $\alpha_{\sigma,\theta} = (a_{ij})_{i,j=0,0}^{m-1,m-1}$, $a_{ij} = \mathrm{Tr}(\sigma'_0\vartheta^{i+j}\alpha)$. This observation together with Theorem 1 is enough to prove Theorem 2.


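[Figure 4.2 (diagram not reproduced): a linear feedback shift register with feedback taps $p_{m-1}, p_{m-2}, p_{m-3}, \ldots, p_1, p_0$. Input: $\mathrm{Tr}(\theta'_{m-1-j}\alpha)$ for $0 \le j < m$ and 0 for $m \le j < 2m-1$. Output: $\mathrm{Tr}(\sigma'_0\vartheta^j\alpha)$, $0 \le j < 2m-1$.]

Figure 4.2: Generation of $\alpha_{\sigma,\theta}$ from $\alpha_\theta$ using a Fibonacci type linear feedback shift register.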

4.2.2 The Architecture

From now on let us assume that $\{\theta_j\}_{j=0}^{m-1}$ is a polynomial basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$, where the basis elements are given by $\theta_j = \vartheta^j$ and where $\vartheta$ is a root of the irreducible polynomial $p(x) = \sum_{i=0}^{m} p_i x^i$ over $\mathbb{F}_2$ of degree $m$. Let us also assume that $\{\sigma_i\}_{i=0}^{m-1}$ is the triangular basis corresponding to $\{\theta_j\}_{j=0}^{m-1}$.

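It is well known that the entries of $\alpha_{\sigma,\theta}$ can be generated from the vector representation

\[ \alpha_\theta = \bigl(\mathrm{Tr}(\theta'_0\alpha), \mathrm{Tr}(\theta'_1\alpha), \ldots, \mathrm{Tr}(\theta'_{m-1}\alpha)\bigr)^T \]

of $\alpha$ with respect to $\{\theta_j\}_{j=0}^{m-1}$ using the relations

\[ \mathrm{Tr}(\sigma'_0\vartheta^j\alpha) = \begin{cases}
\mathrm{Tr}(\theta'_{m-1}\alpha), & j = 0, \\
\mathrm{Tr}(\theta'_{m-1-j}\alpha) - \sum_{i=m-j}^{m-1} p_i \mathrm{Tr}(\theta'_{m-1}\vartheta^{j+i-m}\alpha), & 0 < j < m, \\
-\sum_{i=0}^{m-1} p_i \mathrm{Tr}(\theta'_{m-1}\vartheta^{j+i-m}\alpha), & m \le j \le 2m-2.
\end{cases} \]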

This means that we can generate $\alpha_{\sigma,\theta}$ from $\alpha_\theta$ using a Fibonacci-type linear feedback shift register of length $m$, where the feedback network is given by $p(x)$ as shown in Figure 4.2. The flip-flops in the feedback register are initially set to zero.
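A software model of this shift register is short. The sketch below is an illustration, not the hardware description: it assumes that `a[k]` holds $\mathrm{Tr}(\theta'_k\alpha)$, that `p` lists $p_0, \ldots, p_m$ over $\mathbb{F}_2$, and that the feedback term $\mathrm{Tr}(\theta'_{m-1}\vartheta^{k}\alpha)$ coincides with the earlier output $\mathrm{Tr}(\sigma'_0\vartheta^{k}\alpha)$, which is how the register feeds its own outputs back. The names `hankel_sequence` and `hankel_matrix` are hypothetical.

```python
def hankel_sequence(a, p):
    """Generate z_j = Tr(sigma'_0 theta^j alpha), 0 <= j <= 2m - 2, over F2."""
    m = len(a)
    z = []
    for j in range(2 * m - 1):
        # Input term: a_{m-1-j} while the element is shifted in, zero afterwards.
        bit = a[m - 1 - j] if j < m else 0
        # Feedback taps: p_i times the output produced m - i steps earlier.
        for i in range(max(0, m - j), m):
            bit ^= p[i] & z[j + i - m]
        z.append(bit)
    return z


def hankel_matrix(z, m):
    """Arrange the sequence as the m x m Hankel matrix with entries z_{i+j}."""
    return [[z[i + j] for j in range(m)] for i in range(m)]
```

For $p(x) = x^4 + x + 1$ and $\alpha = 1 + \vartheta$, i.e. `hankel_sequence([1, 1, 0, 0], [1, 1, 0, 0, 1])`, the model yields the sequence 0, 0, 1, 1, 0, 1, 0.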

The Hankel matrix $\alpha_{\sigma,\theta}$ specifies the left hand side of the equation

\[ \alpha_{\sigma,\theta}(\alpha^{-1})_\theta = 1_\sigma \]

which we wish to solve. The Berlekamp-Massey algorithm finds a solution to a Hankel problem with the right hand side being zero. The right hand side of the equation is $1_\sigma = (0, \ldots, 0, 1)^T$, since by Definition 12 we have $\sigma_{m-1} = 1$. Therefore we can run the Berlekamp-Massey algorithm on the first $m - 1$ rows of $\alpha_{\sigma,\theta}$. In our inverter, the Berlekamp-Massey algorithm solves equation systems over $\mathbb{F}_2$. Therefore all nonzero discrepancies are 1 and hence we do not need to save them, nor do we need to invert them, as the original algorithm prescribes. Let $h_i$, $1 \le i \le m$, be the $i$-th row of $\alpha_{\sigma,\theta}$. Assume that the Berlekamp-Massey algorithm has generated the resulting column vector $a$. We have already noted that $\alpha_{\sigma,\theta}$ is nonsingular for all nonzero $\alpha$. Therefore the product $h_m a$ is nonzero. Since the only nonzero element of $\mathbb{F}_2$ is 1, we have $(\alpha^{-1})_\theta = a$.

For the analysis, we give the Berlekamp-Massey algorithm here, in a form similar to that given by Feng and Tzeng [14]. However, it is given under the assumption that it is to be used for inversion in $\mathbb{F}_{2^m}$. Therefore, we can assume that the matrix is non-singular. Define the shift matrix

\[ S \triangleq \begin{pmatrix} 0 & 0 \\ I_{m-1} & 0 \end{pmatrix}, \]

where $I_{m-1}$ is the $(m-1) \times (m-1)$ identity matrix. Since we are interested in binary equation systems only, we state the algorithm for use over $\mathbb{F}_2$.

Algorithm 4.2 The Berlekamp-Massey Algorithm for Inversion in $\mathbb{F}_{2^m}$

Given a sequence $[z_1, \ldots, z_{2m-1}]$ over $\mathbb{F}_2$, associated with a nonsingular $m \times m$ Hankel matrix, $H = (h_{i,j})_{i,j=1,1}^{m,m}$, with rows $h_i = (z_i, \ldots, z_{i+m-1})$, initiate vectors $A = (0, \ldots, 0)^T$ and $B = (1, 0, \ldots, 0)^T$ of length $m$, and initiate integers $r = s = 1$ and $s' = 0$.

while r < m
    Set ∆ = h_r B
    if ∆ = 0 then
        Increment r                          Continue with the next row.
    else if r < s then
        Set B = B + S^{s−r−1} A              Update vector B.
        Increment r                          Continue with the next row.
    else
        Set s′ = s                           Remember this column number.
        Set s = r + 1                        Consider a new column,
        Set r = s′                           and a new row.
        Set (A, B) = (B, S^{s−s′} B + A)     Update vectors A and B accordingly.
    end if
end while

The result is B.

The integers $r$ and $s$ are the row number and the column number considered. The third integer $s'$ is the column number of the previous column considered.


[Figure 4.3 (diagram not reproduced): registers A, B, and C, the discrepancy ∆, the control signals R, S, and L, and the feedback taps $p_{m-1}, p_{m-2}, \ldots, p_1, p_0$. The input is $a_{m-1-j}$ for $0 \le j < m$ and 0 for $m \le j < 2m-1$.]

Figure 4.3: Architecture of an inverter for $\mathbb{F}_{2^m}$ using the Berlekamp-Massey algorithm. The bold lines denote possible critical paths.


Massey [33] chose to consider only the active parts of the vectors, and by letting A instead be shifted backwards one step during each iteration, he could remove the shift represented by $S^{s-s'}B$ above, and produce a very regular architecture.
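Algorithm 4.2 can be transcribed almost line by line into software. The following Python sketch is such a transcription under stated assumptions: vectors are lists of bits, the sequence `z` is zero-indexed (so row $h_r$ is `z[r-1 : r-1+m]`), and `shift(v, k)` models multiplication by $S^k$. The function names are hypothetical, and the sketch models the algorithm as stated, not Massey's reconfigured register architecture.

```python
def shift(v, k):
    """Apply S^k: move every entry k positions towards higher indices."""
    m = len(v)
    return [0] * min(k, m) + v[: max(m - k, 0)]


def bm_invert(z, m):
    """Run Algorithm 4.2 on the sequence z_1..z_{2m-1} (stored zero-indexed)."""
    A = [0] * m
    B = [1] + [0] * (m - 1)
    r, s, s_prev = 1, 1, 0
    while r < m:
        row = z[r - 1 : r - 1 + m]                       # h_r
        delta = sum(x & y for x, y in zip(row, B)) & 1   # discrepancy over F2
        if delta == 0:
            r += 1                                       # next row
        elif r < s:
            B = [x ^ y for x, y in zip(B, shift(A, s - r - 1))]
            r += 1                                       # next row
        else:
            s_prev = s                                   # remember this column
            s = r + 1                                    # consider a new column,
            r = s_prev                                   # and a new row
            A, B = B, [x ^ y for x, y in zip(shift(B, s - s_prev), A)]
    return B
```

As a usage example, for $p(x) = x^4 + x + 1$ the element $1 + \vartheta$ has the Hankel sequence 0, 0, 1, 1, 0, 1, 0, and `bm_invert([0, 0, 1, 1, 0, 1, 0], 4)` returns `[0, 1, 1, 1]`, i.e. $\vartheta + \vartheta^2 + \vartheta^3$, whose product with $1 + \vartheta$ is 1.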

In Figure 4.3 we display an inverter architecture based on Algorithm 4.2. The feedback register C in the bottom of the figure generates the entries of $\alpha_{\sigma,\theta}$, i.e. the $z_i$'s in Algorithm 4.2. It is a modified version of the feedback register in Figure 4.2. The switches in the register are used for initiating it to zero, and the flip-flop in the feedback network reduces the length of the critical path. The rest of Figure 4.3 is a modified version of the standard Berlekamp-Massey architecture proposed by Massey [33]. A result of Massey's reconfiguration is that the leftmost flip-flop in register B is always set. Therefore the leftmost bitslice in Figure 4.3 is somewhat simplified. Let $s^*$ be the value of $s$ when Algorithm 4.2 terminates. Another result of Massey's reconfiguration is that the resulting vector in B needs to be shifted $m - s^*$ steps to its correct position after the calculation. This is done by letting the algorithm continue until we have $r + s = 2m + 1$, which can be seen as an extension of the matrix by one more column. During these extra iterations, the contents of register B are moved to register A, and then register A is shifted $m - s^*$ steps. At the same time, the contents of B are updated a number of times, producing nothing of particular interest. Finally, we can read out the result from register A.

The control logic needs to keep track of row number $r$ and column number $s$ in the algorithm, and generate the control signals R (reset), S (shift register A), and L (load register A) accordingly.

4.2.3 Properties of the Architecture

There are two possibilities of critical paths depending on the delay of the control logic. One possibility is indicated by thicker lines in Figure 4.3. The other starts, like the first, at the input of the architecture, enters the control logic through the discrepancy ∆, exits the control logic through the reset signal R, and ends in the flip-flops of register B. The normalized area and delay of this inverter can be found using the methods from Chapter 3. Let $w$ be the weight of $p(x)$. The area needed for the architecture depends on $w$, since the adder in the feedback network of register C has $w - 1$ inputs. As we noted in Section 4.1, $w$ does not need to be greater than 5. As for the inverter based on the Euclidean algorithm, special attention is needed for the normalized energy.

Let $e_1$ denote the normalized energy consumed by the architecture during initialization, excluding clocking and control of the switches. The registers A, B, and C may contain any binary vectors, except the all zero vector, at the end of the previous calculation. They are initiated with all zeros, which means that there may be transitions in all flip-flops. Thus, there may be transitions in any data path in Figure 4.3, and $e_1$ is upper bounded by the normalized area of this part of the architecture, except for the transistors that are not controlled directly by any of the datapaths. The transistors that we leave out now are the clocked ones in the flip-flops, and the transistors in the switches. Thus, we have

\[ e_1 \le 60m + 12w + a(1, 2(m-1)) + a(2, 10) - 52 \le 62m + 12w - 47.2, \]

where $a(r, c)$ is the area of a delay-optimal inverter chain, driven by a normalized resistance $r$ and loaded by a normalized capacitance $c$, as defined in Equation 3.3 on page 25, and where we have used the bound on $a(r, c)$ in Equation 3.5, also on page 25.

Let $n_2$ denote the total number of transitions among the flip-flops in the registers A and B in Figure 4.3 during calculation after initialization, and let $e_2$ denote the corresponding normalized energy. Number the clock cycles from 1 to $2m$, where clock cycle 1 is used for initialization. Then, in clock cycle $k$, $1 < k \le 2m$, we have $s + r = k$. We should note that only the $s'$ rightmost flip-flops among the $r$ leftmost flip-flops in register A may be set. Also, only the $s - 1$ leftmost flip-flops in register B may be set. In each iteration, we have three possibilities.

1. ∆ = 0. Then register A is shifted, resulting in at most $s' + 1$ transitions.

2. ∆ = 1 and $r < s$. Then we get at most $s' + 1$ transitions from the shift of register A. At the same time, the contents of register A are added to register B, resulting in at most $s'$ transitions in register B. In total we have at most $2s' + 1$ transitions in registers A and B.

3. ∆ = 1 and $r \ge s$. Then both registers are updated. There are at most $r$ transitions in A and at most $s'$ transitions in B. In total, we have at most $r + s'$ transitions in registers A and B.


From Algorithm 4.2 we find that $s' < s$ and $s' \le r$ hold throughout the algorithm. Using these inequalities in the three cases above, we can deduce that we have no more than $k$ transitions in registers A and B during iteration $k$. Therefore we have

\[ n_2 \le \sum_{k=2}^{2m} k = 2m^2 + m - 1. \]

Each flip-flop in registers A and B controls at most 4 transistors in an AND gate, and 6 transistors in an XOR gate, in addition to the 8 transistors in the data path of each flip-flop. In total, we have at most 18 transistor gates controlled by each flip-flop, which gives us

\[ e_2 \le 18 n_2 \le 36m^2 + 18m - 18. \]

Let $e_3$ denote the normalized energy corresponding to transitions in register C during calculation after initialization. The leftmost flip-flop in C together with the input controls 12 transistors in an XOR gate, 4 transistors in a buffer, and 8 transistors in the data path of the flip-flop. The rest of the flip-flops in C each control 2 transistors in an AND gate, in addition to the 8 transistors in the data path of each flip-flop. The leftmost flip-flop may have transitions during all clock cycles. Let $n_3$ denote the total number of transitions among the rest of the flip-flops in register C during calculation after initialization. Consider clock cycle $k$, $1 < k < m$. Then there may be transitions in the $k - 1$ leftmost flip-flops among the considered ones. During clock cycle $k$, $m \le k \le 2m$, however, there may be transitions in all those $m - 1$ flip-flops. Therefore, we have

\[ n_3 \le \sum_{k=2}^{m-1} (k - 1) + \sum_{k=m}^{2m} (m - 1) = \frac{3}{2}(m^2 - m), \]

\[ e_3 \le 10 n_3 + 24(2m - 1) \le 15m^2 + 33m - 24. \]
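For reference, the two sums in the bound on $n_3$ evaluate as follows; this is only the arithmetic behind the stated closed form, written out under the flip-flop counts given in the text.

```latex
\begin{aligned}
\sum_{k=2}^{m-1} (k-1) &= \frac{(m-1)(m-2)}{2}, \qquad
\sum_{k=m}^{2m} (m-1) = (m+1)(m-1),\\
n_3 &\le \frac{(m-1)(m-2) + 2(m-1)(m+1)}{2} = \frac{3}{2}\,m(m-1),\\
10 n_3 + 24(2m-1) &\le 15m^2 - 15m + 48m - 24 = 15m^2 + 33m - 24.
\end{aligned}
```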

Let $e_4$ be the normalized energy consumed in the $(w - 1)$-input adder in the feedback shift register, in the $m$-input adder that produces ∆, and in the distribution of ∆ over $m - 1$ AND gates. Let $c_4$ be the total normalized capacitance in these adders, these AND gates, and the distribution over the AND gates. In total, we have

c4 = 12(w − 1) + 12(m− 1) + 4 + a(1, 2(m− 1)) + 2(m− 1)

≤ 16m+ 12w − 28.


Number of clock cycles (n)    $2m$
Normalized delay (t)          $14.4\log_2(m) + 54.9$
Normalized time (nt)          $28.8\,m\log_2(m) + 109.8m$
Normalized area (a)           $136m + 12w - 72$
Normalized power (p)          $145.5m + 12w + 8.5 - \frac{38.6}{m}$
Normalized energy (np)        $291m^2 + 24mw + 17m - 77.2$

Table 4.2: Properties of the Berlekamp-Massey inverter architecture. $m$ is the extension degree and $w$ is the weight of the irreducible polynomial used to generate the basis. All properties except the number of clock cycles are upper bounds.

where we have used the bound on $a(r, c)$ in Equation 3.5 on page 25. One calculation, except initialization, takes $2m - 1$ clock cycles. Therefore, we have the trivial bound

\[ e_4 \le (2m - 1)(16m + 12w - 28) = 32m^2 + 24mw - 72m - 12w + 28. \]

Let $e_5$ denote the normalized energy consumed by the global control signals L, S, and R, and by the distribution of these signals. There are exactly 2 transitions in R during one calculation. The normalized capacitive load of R is $10m - 4 + a(1, 10m - 4)$, including inverter chains. In L and S, however, there may be transitions in all $2m$ clock cycles. The total normalized capacitive load of these signals is $4m + 2a(1, 2m)$, including inverter chains. Thus, we have

\[ e_5 \le 2\bigl(10m - 4 + a(1, 10m - 4)\bigr) + 2m\bigl(4m + 2a(1, 2m)\bigr) \le 16m^2 + 40m - 16, \]

where we again have used the bound on a(r, c) in Equation 3.5 on page 25.

The total normalized energy of the above considered signals is bounded by

\[ \sum_{i=1}^{5} e_i \le 99m^2 + 24mw + 81m - 77.2 \]

for one calculation.


As before, the main part of the energy is due to the clock signal. Let $e_6$ denote the normalized energy due to clocking during one calculation. We have $3m - 1$ flip-flops with 8 clocked transistors each, which gives us the normalized load capacitance $24m - 8$. The number of clock cycles for one calculation is $2m$, and we have 2 transitions in the clock signal in each clock cycle, which gives us exactly $4m$ transitions for the clock signal during one calculation. This gives us the normalized energy $4m(24m - 8)$ of the clocked parts of the flip-flops, and the normalized energy of the clock signal inverter chains is at most the same value. In total, the normalized energy of one calculation due to the clocking signal is bounded by

\[ e_6 \le 192m^2 - 64m, \]

and we have the expected normalized power and energy of the architecture given in Table 4.2, together with its other properties.

4.3 Inversion Based on the Gauss-Jordan Algorithm

Any algorithm for solving general linear equation systems can naturally be used for inversion in finite fields. One such algorithm is the Gauss-Jordan algorithm. The version of the algorithm given by Wang and Lin [48][49] is reformulated here.

Algorithm 4.3 The Gauss-Jordan Algorithm

Given an $n \times n$ nonsingular matrix $A = [a_{i,j}]$ and an $n$-dimensional column vector $b = [b_i]$, over $\mathbb{F}_2$, the solution of the equation $Ax = b$ can be found by the following row operations on $(A|b)$.

for k = 1 to n do
    Let i be the row number of the topmost nonzero element of the k-th column.
    for j = i + 1 to n do
        Add a_{j,k} times row i to row j.
    end for
    Rotate rows i to n such that row i becomes row n.
end for

The result is in the vector b.
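A direct software rendering of Algorithm 4.3 may help in following the row rotation. The Python sketch below is an illustration with a hypothetical function name; it assumes the matrix is given as a list of rows of bits, uses zero-based indices, and models the algorithm only, not the cell array of the architecture.

```python
def gauss_jordan_gf2(A, b):
    """Solve A x = b over F2 for nonsingular A, following Algorithm 4.3."""
    n = len(A)
    rows = [list(row) + [bit] for row, bit in zip(A, b)]   # augmented (A|b)
    for k in range(n):
        # Topmost row with a nonzero entry in column k.
        i = next(idx for idx in range(n) if rows[idx][k])
        pivot = rows[i]
        # Add the pivot row to every later row that has a one in column k.
        for j in range(i + 1, n):
            if rows[j][k]:
                rows[j] = [x ^ y for x, y in zip(rows[j], pivot)]
        # Rotate rows i..n so that the pivot row becomes the last row.
        rows = rows[:i] + rows[i + 1:] + [pivot]
    return [row[n] for row in rows]
```

After the $n$ rotations the pivot rows end up in column order, so the left part of the augmented matrix is the identity and the last column holds the solution.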


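[Figure 4.4 (diagram not reproduced): panels (a), (b), and (c). Panel (a) shows a triangular array of cells $P_{1,1}, P_{1,2}, \ldots, P_{1,n+1}$; $P_{2,2}, \ldots, P_{2,n+1}$; down to $P_{n,n}, P_{n,n+1}$, with a control input C labelled $1\,0^{n-1}$, inputs $b_1, \ldots, b_n$ and $a_{i,1}, \ldots, a_{i,n}$ for $1 \le i \le n$, and outputs $x_1, \ldots, x_n$. The cell panels carry the signal labels T, E, $D_{in}$, and $D_{out}$.]

Figure 4.4: Basic Architecture for the Gauss-Jordan algorithm over $\mathbb{F}_2$: (a) Overall structure, (b) boundary cell, and (c) main cell. X represents data from previous calculations.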


The basic architecture for this algorithm is given in Figure 4.4. The basic cells and the boundary cells are the cells used by Wang and Lin [48][49], with an obvious simplification in the boundary cells and another equally obvious simplification in the main cells. The number of boundary cells is $n$ and the number of main cells is $n(n+1)/2$. The latency of this architecture is $n$ clock cycles and the throughput rate is one result per $n$ clock cycles.

4.3.1 A Systolic Implementation of the Gauss-Jordan Algorithm

The choice of an architecture for any calculation is a tradeoff between space, clocking speed, throughput rate, latency, and regularity. For systolic architectures, this tradeoff is completely in favour of clocking speed and regularity. These sequential architectures are highly pipelined and built up of a regular structure of several copies of a few basic cells. The cells should be small and independent of the size of the architecture. A cell is supposed to be connected only to its nearest neighbours, including control signals. The critical path in a systolic architecture is supposed to be short and independent of the size of the architecture.

The architecture in Figure 4.4 is not systolic since the critical path is not independent of $n$. The architecture has no feedback loops except within the cells, so the signals flow in one direction between the cells. This fact makes it well suited for pipelining in order to minimize the critical path in the architecture. A pipelined version of this architecture is given in Figure 4.5. The latency of this architecture is $3n - 1$ clock cycles and the throughput rate is still one result per $n$ clock cycles. Also the critical path is short and independent of $n$.

4.3.2 Previous Inverters Based on the Gauss-Jordan Algorithm

Let $p(x) = \sum_{i=0}^{m} p_i x^i$ be the irreducible polynomial of degree $m$ over $\mathbb{F}_2$ used to generate a polynomial basis of $\mathbb{F}_{2^m}$. Also let $a(x) = \sum_{i=0}^{m-1} a_i x^i$ and $c(x) = \sum_{i=0}^{m-1} c_i x^i$ be the polynomial representations of $\alpha \in \mathbb{F}^*_{2^m}$ and its inverse, respectively, in this basis. Then

\[ a(x)c(x) + p(x)d(x) = 1 \tag{4.4} \]


[Figure 4.5 (diagram not reproduced): the cell array $P_{1,1}, \ldots, P_{n,n+1}$ of Figure 4.4 with a control input labelled $0\,1^{m-1}$ and with skewed inputs $b_1, \ldots, b_n$ and $a_{i,1}, \ldots, a_{i,n}$ for $1 \le i \le n$, each preceded or followed by entries marked X, and outputs $x_1, \ldots, x_n$.]

Figure 4.5: A pipelined version of the architecture for the Gauss-Jordan algorithm over $\mathbb{F}_2$. X represents data from previous calculations.


holds for some $d(x) = \sum_{i=0}^{m-2} d_i x^i$ of degree at most $m - 2$. Wang and Lin [48][49] use a systolic implementation of the Gauss-Jordan algorithm to solve the obvious linear equation system based on this polynomial equation. The preprocessor is simple. The major drawback is the size of the equation system. The matrix is of size $(2m - 1) \times (2m - 1)$, while we have noted in Section 4.2 that we only need an $m \times m$ matrix. The trivial solution would be to use the architecture in Figure 4.5, with $n = 2m - 1$. Wang and Lin noted that it is possible to simplify the architecture in this case, thus removing approximately 25% of the area. However, the architecture is still approximately 3 times as large as if we had used an $m \times m$ matrix instead.
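The linear equation system implied by Equation 4.4 can be written out explicitly by comparing coefficients of $x^0, \ldots, x^{2m-2}$. The Python sketch below builds that $(2m-1) \times (2m-1)$ system and solves it with a plain rendering of the Gauss-Jordan steps. It is an illustration only: the unknown ordering $(c_0, \ldots, c_{m-1}, d_0, \ldots, d_{m-2})$ and the function names are assumptions made here, not the layout used by Wang and Lin.

```python
def bezout_system(a, p):
    """Coefficient matrix and right hand side of a(x)c(x) + p(x)d(x) = 1 over F2."""
    m = len(a)
    n = 2 * m - 1
    M = [[0] * n for _ in range(n)]
    for j in range(m):                 # column j multiplies c_j
        for i, a_i in enumerate(a):
            M[i + j][j] ^= a_i         # a_i x^i * x^j contributes to x^(i+j)
    for j in range(m - 1):             # column m + j multiplies d_j
        for i, p_i in enumerate(p):
            M[i + j][m + j] ^= p_i
    return M, [1] + [0] * (n - 1)


def solve_gf2(M, b):
    """Gauss-Jordan elimination over F2 with the pivot row rotated to the bottom."""
    n = len(M)
    rows = [list(row) + [bit] for row, bit in zip(M, b)]
    for k in range(n):
        i = next(idx for idx in range(n) if rows[idx][k])
        pivot = rows[i]
        for j in range(i + 1, n):
            if rows[j][k]:
                rows[j] = [x ^ y for x, y in zip(rows[j], pivot)]
        rows = rows[:i] + rows[i + 1:] + [pivot]
    return [row[n] for row in rows]


def invert_via_bezout(a, p):
    """Coefficients c_0..c_{m-1} of the inverse of a(x) modulo p(x)."""
    M, rhs = bezout_system(a, p)
    return solve_gf2(M, rhs)[: len(a)]
```

For $p(x) = x^4 + x + 1$ and $a(x) = 1 + x$ the system has size $7 \times 7$ and yields $c(x) = x + x^2 + x^3$, with $d(x) = 1$.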

The second architecture is proposed by Hasan and Bhargava [22]. As in Section 4.2 let us assume that $\{\theta_j\}_{j=0}^{m-1}$ is a polynomial basis, where the basis elements are given by $\theta_j = \vartheta^j$, and where $\vartheta$ is a root of the irreducible polynomial $p(x) = \sum_{i=0}^{m} p_i x^i$ over $\mathbb{F}_2$ of degree $m$. Let $\alpha$ be an element of $\mathbb{F}^*_{2^m}$. This inverter uses a preprocessor that generates $\alpha_{\theta,\theta}$ from $\alpha_\theta$. Then the Gauss-Jordan algorithm is used to solve the equation $\alpha_{\theta,\theta}(\alpha^{-1})_\theta = 1_\theta$ with $n = m$. The preprocessor of the Hasan and Bhargava [22] inverter is given in Figure 4.6. By using the $m \times m$ matrix $\alpha_{\theta,\theta}$, Hasan and Bhargava address the problem of the size of the Wang and Lin architecture. However, there is a disadvantage of their preprocessor, namely the triangular shift register layer in the top used to feed the matrix entries to the systolic implementation of the Gauss-Jordan algorithm. The area of this preprocessor is therefore essentially proportional to $m^2$.

Using the methods from Chapter 3 we find that this architecture has the properties in Table 4.3. As before, special care is needed for the control signals. The architecture is fed with new data every $m$ clock cycles. The control signals, both of the Gauss-Jordan architecture in Figure 4.5 and of the preprocessor in Figure 4.6, have a total of two transitions each during these $m$ clock cycles. All other signals may have transitions during every clock cycle. For brevity, we omit the detailed derivation of the power consumption. We use the same approach as in Sections 4.1 and 4.2.

4.3.3 A New Preprocessor

We continue to let $\{\theta_j\}_{j=0}^{m-1}$ be a polynomial basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$, where the basis elements are given by $\theta_j = \vartheta^j$, and where $\vartheta$ is a root of the irreducible


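[Figure 4.6 (diagram not reproduced): panels (a) and (b). Labels in the figure: $m$ signals $b_0, \ldots, b_{m-1}$; shift register rows of $m - 1$, $m - 2$, and $2m - 1$ flip-flops; input sequences $a_{m-1}, \ldots, a_0$, $1, 0^{m-1}$, and $p_{m-1}, \ldots, p_0$; cell ports $a_{in}$, $C_{in}$, $p_{in}$, $a_{out}$, $C_{out}$, and $p_{out}$.]

Figure 4.6: The preprocessor of the systolic inverter for $\mathbb{F}_{2^m}$ using the Gauss-Jordan algorithm due to Hasan and Bhargava [22].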

Number of clock cycles (n)    $m$
Normalized delay (t)          $30$
Normalized time (nt)          $30m$
Normalized area (a)           $73m^2 + 271m - 194$
Normalized power (p)          $113m^2 + 439m + 36w - 272 - 52wm^{-1} - 84m^{-1}$
Normalized energy (np)        $113m^3 + 439m^2 + 36wm - 272m - 52w - 84$

Table 4.3: Properties of the systolic Gauss-Jordan inverter architecture due to Hasan and Bhargava. $m$ is the extension degree and $w$ is the weight of the irreducible polynomial used to generate the basis. All properties except the number of clock cycles are upper bounds.


polynomial $p(x) = \sum_{i=0}^{m} p_i x^i$ over $\mathbb{F}_2$ of degree $m$. As in Section 4.2, let us also assume that $\{\sigma_i\}_{i=0}^{m-1}$ is the triangular basis corresponding to $\{\theta_j\}_{j=0}^{m-1}$. Again, let $\alpha$ be an element of $\mathbb{F}^*_{2^m}$, and assume that $\alpha$ is represented as $\alpha_\theta$. We noted in Section 4.2 that $\alpha_{\sigma,\theta}$ can be generated from $\alpha_\theta$ using the Fibonacci type linear feedback shift register in Figure 4.2 on page 38. Finally, we can find the inverse of $\alpha$ by solving the equation system $\alpha_{\sigma,\theta}(\alpha^{-1})_\theta = 1_\sigma$.

Let $w$ be the weight of $p(x)$. An obvious drawback of the Fibonacci type feedback shift register is that the $w$-input $\mathbb{F}_2$-adder in the feedback network will determine the critical path of the inverter. The delay of the critical path is then essentially linear in $\log_2 w$. Also, the regularity of this feedback shift register is poor. The critical path of this shift register can be reduced by rearranging the memory cells as in Figure 4.7. This rearrangement also makes the feedback shift register much more regular. In this modified shift register no signal passes more than two $\mathbb{F}_2$-adders and one $\mathbb{F}_2$-multiplier.

The preprocessor shall not only generate the entries of the matrix. It shall also distribute these entries correctly. A possible feeding order is given in Table 4.4. The systolic implementation of the Gauss-Jordan algorithm in Figure 4.5 on page 48 is designed to accept new data every $m$ time instances. Therefore we should apply new data in the same manner starting at time instance $m$. While one part of the architecture is doing the last computations for one input, the other part of the architecture is doing the first computations for the next input.

Consider the preprocessor in Figure 4.8. It consists of two modified Fibonacci type linear feedback shift registers to deal with the need of processing two different inputs at the same time. These are built by the cells in Figure 4.8b and one of the cells in Figure 4.8c-f. The cells in Figure 4.8g are used to shift the result further. The choice between the outputs of the two feedback shift registers is performed by the switches and the uppermost shift register in Figure 4.8a. Together with each feedback shift register there is an additional shift register with input labelled $C_{in}$ and output labelled $C_{out}$ in the cells. These shift registers are used together with the additional $\mathbb{F}_2$ multipliers to reset the feedback shift registers after a completed calculation.

Using the methods from Chapter 3 we find that this architecture has the properties in Table 4.5. As before, special care is needed for the control signals. The architecture is fed with new data every $m$ clock cycles. The control signal in the Gauss-Jordan architecture in Figure 4.5 on page 48 still


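[Figure 4.7 (diagram not reproduced): a rearranged linear feedback shift register with feedback taps $p_{m-1}, p_{m-2}, p_{m-3}, \ldots, p_1, p_0$ and memory cells initialized to 0. Input: $\mathrm{Tr}(\theta'_{m-1-j}\alpha)$ for $0 \le j < m$ and 0 for $m \le j < 2m-1$. Output: $\mathrm{Tr}(\sigma'_0\vartheta^j\alpha)$, $0 \le j < 2m-1$.]

Figure 4.7: Generation of $\alpha_{\sigma,\theta}$ from $\alpha_\theta$ using a modified Fibonacci type linear feedback shift register with a short critical path.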

Time instance | Column 0 | Column 1 | Column 2 | Column 3 | Left hand side
0 | Tr(σ′_0 ϑ^3 α) | | | |
1 | Tr(σ′_0 ϑ^4 α) | Tr(σ′_0 ϑ^2 α) | | |
2 | Tr(σ′_0 ϑ^5 α) | Tr(σ′_0 ϑ^3 α) | Tr(σ′_0 ϑ^1 α) | |
3 | Tr(σ′_0 ϑ^6 α) | Tr(σ′_0 ϑ^4 α) | Tr(σ′_0 ϑ^2 α) | Tr(σ′_0 ϑ^0 α) |
4 | | Tr(σ′_0 ϑ^5 α) | Tr(σ′_0 ϑ^3 α) | Tr(σ′_0 ϑ^1 α) | 0
5 | | | Tr(σ′_0 ϑ^4 α) | Tr(σ′_0 ϑ^2 α) | 0
6 | | | | Tr(σ′_0 ϑ^3 α) | 0
7 | | | | | 1

Table 4.4: The order of the input signals to the systolic implementation of the Gauss-Jordan algorithm for inversion in $\mathbb{F}_{16}$ using our preprocessor.

Number of clock cycles (n)    $m$
Normalized delay (t)          $40$
Normalized time (nt)          $40m$
Normalized area (a)           $61m^2 + 351m + 24w - 216$
Normalized power (p)          $93m^2 + 503m + 24w - 280 - 68m^{-1}$
Normalized energy (np)        $93m^3 + 503m^2 + 24mw - 280m - 68$

Table 4.5: Properties of our systolic Gauss-Jordan inverter architecture. $m$ is the extension degree and $w$ is the weight of the irreducible polynomial used to generate the basis. All properties except the number of clock cycles are upper bounds.


[Figure 4.8 (diagram not reproduced): panels (a) through (g). Labels in panel (a): $m$ signals, a shift register of $2m - 1$ flip-flops, left cells (b), $\lceil m/4 \rceil - 1$ copies of mid cells chosen from (c) through (f), $m - \lceil m/4 \rceil - 1$ copies of right cells (g), and the input sequence $a_{m-1}, \ldots, a_0$. The cell panels carry the taps $p_0, \ldots, p_3$ and $p_{m-1-4i}, \ldots, p_{m-4-4i}$ and the ports $C_{in}$, $C_{out}$, $s_{in}$, $s_{out}$, $f_{in}$, and $f_{out}$.]

Figure 4.8: The preprocessor of our systolic architecture of an inverter for $\mathbb{F}_{2^m}$ using the Gauss-Jordan algorithm: (a) Overall structure, (b) left cells, (c) mid cells for $m \equiv 0 \bmod 4$, (d) mid cells for $m \equiv 3 \bmod 4$, (e) mid cells for $m \equiv 2 \bmod 4$, (f) mid cells for $m \equiv 1 \bmod 4$, and (g) right cells.


has a total of two transitions during these $m$ clock cycles. The control signals in our preprocessor in Figure 4.8, however, have a total of two transitions each during $2m$ clock cycles. All other signals may have transitions during every clock cycle. Again for brevity, we omit the detailed derivation of the power consumption.

4.4 Properties of the Polynomial Basis Inverters

The architectures in this chapter can be divided into two categories. The architectures in Sections 4.1 and 4.2, based on Euclid's and Berlekamp-Massey's algorithms, both essentially solve Hankel problems. The architectures in Section 4.3 are based on the Gauss-Jordan algorithm, which can solve arbitrary linear equation systems. In Figures 4.9 through 4.11 we have plotted the normalized time, area, and energy, respectively, needed for the architectures considered in this chapter, for 2 ≤ m ≤ 65. These plots are based on Tables 4.1, 4.2, 4.3, and 4.5, on pages 35, 44, 50 and 52, respectively. We have neglected the delay of the control logic in Figure 4.9.

The first two architectures have the same orders of all our cost measures. However, by comparing Tables 4.1 and 4.2 on pages 35 and 44, respectively, or by studying Figures 4.9 through 4.11, we see that these architectures have different application areas. We should prefer the inverter based on the Euclidean algorithm if the most important property is time. The reason that the critical path of the inverter based on Berlekamp-Massey's algorithm is much longer than the critical path of the inverter based on the Euclidean algorithm is that the adder tree in the Berlekamp-Massey inverter adds about $10\log_2 m$ to the critical path. On the other hand, we should prefer the Berlekamp-Massey inverter if the most important property is area or energy consumption.

The two architectures in Section 4.3 both use a systolic implementation of the Gauss-Jordan algorithm. The difference between those architectures lies entirely in the preprocessors. By comparing Tables 4.3 and 4.5 on pages 50 and 52, respectively, or by studying Figures 4.9 through 4.11, we see that these architectures also have different application areas. We should prefer Hasan and Bhargava's inverter if the most important property is time. On the other hand, we should prefer our inverter if the most important property is area or power dissipation for m ≥ 4.

[Figure 4.9 (plot). Title: Polynomial basis inverters. Horizontal axis: m, from 0 to 60. Vertical axis: nt/m, from 0 to 300. Curves: EUC, BM, GJ-1, GJ-2.]

Figure 4.9: Normalized time needed for inversion in $\mathbb{F}_{2^m}$ using polynomial bases and the architectures considered in this chapter.

EUC: Inversion based on the Euclidean algorithm.
BM: Inversion based on the Berlekamp-Massey algorithm.
GJ-1: Inversion based on the Gauss-Jordan algorithm using Hasan and Bhargava's preprocessor.
GJ-2: Inversion based on the Gauss-Jordan algorithm using our preprocessor.


[Figure 4.10 (plot). Horizontal axis: m, from 0 to 60. Vertical axis: a/m², from 0 to 180. Curves: EUC, BM, GJ-1, GJ-2.]

Figure 4.10: Normalized area needed for inversion in $\mathbb{F}_{2^m}$ using polynomial bases and the architectures considered in this chapter.

EUC: Inversion based on the Euclidean algorithm.
BM: Inversion based on the Berlekamp-Massey algorithm.
GJ-1: Inversion based on the Gauss-Jordan algorithm using Hasan and Bhargava's preprocessor.
GJ-2: Inversion based on the Gauss-Jordan algorithm using our preprocessor.

[Figure 4.11 (plot). Horizontal axis: m, from 0 to 60. Vertical axis: np/m³, from 0 to 250. Curves: EUC, BM, GJ-1, GJ-2.]

Figure 4.11: Normalized energy needed for inversion in $\mathbb{F}_{2^m}$ using polynomial bases and the architectures considered in this chapter.

EUC: Inversion based on the Euclidean algorithm.
BM: Inversion based on the Berlekamp-Massey algorithm.
GJ-1: Inversion based on the Gauss-Jordan algorithm using Hasan and Bhargava's preprocessor.
GJ-2: Inversion based on the Gauss-Jordan algorithm using our preprocessor.


Among the four architectures considered in this chapter, the fastest one is Hasan and Bhargava's inverter based on the Gauss-Jordan algorithm, while the smallest and least power consuming one is the inverter based on the Berlekamp-Massey algorithm. Both the inverter based on the Euclidean algorithm and Hasan and Bhargava's inverter based on the Gauss-Jordan algorithm are independent of p(x), the irreducible polynomial used to generate the polynomial basis. They are both fed with p(x) at each calculation. Our inverters, both the one based on the Berlekamp-Massey algorithm and the one based on the Gauss-Jordan algorithm, have p(x) hardwired in the architecture. This fact makes our inverters less flexible, but also easier to use, compared to the other two inverters.

Chapter 5

Normal Basis Inverters

An interesting property, which can be utilized when deriving architectures for arithmetic operations in $\mathbb{F}_{2^m}$, is that squaring is a simple cyclic shift if the element is represented using a normal basis over $\mathbb{F}_2$. The Massey-Omura [32] multiplier is one example where this property is used.

Definition 6 (Restated from Section 2.2) Let $\vartheta$ be an element of $\mathbb{F}_{q^m}$ such that $\{\vartheta^{q^i}\}_{i=0}^{m-1}$ is a basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$. Then $\{\vartheta^{q^i}\}_{i=0}^{m-1}$ is called a normal basis of $\mathbb{F}_{q^m}$ over $\mathbb{F}_q$.

There is an important difference between polynomial bases and normal bases from an implementation point of view. The choice of a normal basis is far more crucial than the choice of a polynomial basis. For instance, the size of a Massey-Omura multiplier for $\mathbb{F}_{2^m}$ varies heavily with the choice of basis.

5.1 All-One Polynomials

Irreducible all-one polynomials are of special interest for arithmetic using normal bases, since they provide efficient architectures, at least for multiplication. An all-one polynomial of degree m is a polynomial of the form

$$f(x) = \sum_{i=0}^{m} x^i.$$

It is fairly easy to show that the roots of an irreducible all-one polynomial form a normal basis.

Irreducible all-one polynomials do exist, but not for all degrees. There is a well-known theorem stating sufficient conditions for irreducibility of all-one polynomials, namely if e and p are primes such that p is a primitive element of GF(e), then the all-one polynomial of degree e − 1 is irreducible over GF(p). See for instance Dickson [10, p. 21, Th 33] or Mastrovito [34, p. 32, Th 2.27]. Actually, these conditions are necessary as well, as we state in the following theorem. We have not been able to find this sharper version of the theorem anywhere in the literature. For completeness, we give a full proof of the theorem.

Theorem 3 Let e be an integer, e > 2, and let p be a prime. Then the all-one polynomial of degree e − 1 is irreducible over $\mathbb{F}_p$ if and only if e is a prime and p is a primitive element of $\mathbb{F}_e$.

Proof: Let f(x) be the all-one polynomial of degree e − 1. First assume that e is a prime and p is a primitive element of $\mathbb{F}_e$. Hence, e and p are relatively prime. Specifically, p does not divide e. Let $\vartheta$ be a root of f(x). We have $\vartheta^e = 1$ since f(x) divides $x^e - 1$. There is therefore a natural interpretation of exponents of $\vartheta$ as elements of $\mathbb{F}_e$. Consider the elements $\vartheta^{p^i}$, 0 ≤ i < e − 1. Since p is a primitive element of $\mathbb{F}_e$, we have $\{p^i\}_{i=0}^{e-2} = \{i\}_{i=1}^{e-1}$ in $\mathbb{F}_e$, and therefore we also have

$$\left\{\vartheta^{p^i}\right\}_{i=0}^{e-2} = \left\{\vartheta^{i}\right\}_{i=1}^{e-1}.$$

Since $\vartheta^e = 1$ holds, we know that the order of $\vartheta$ divides e, but e is a prime and hence, the order of $\vartheta$ is either 1 or e. If the order of $\vartheta$ is 1, then we have $\vartheta = 1$. However, we have f(1) = e and p ∤ e. Consequently, f(1) is nonzero in $\mathbb{F}_p$. Therefore, we conclude that the order of $\vartheta$ is e and that the elements $\vartheta^{p^i}$, 0 ≤ i < e − 1, are distinct and so, f(x) is irreducible over $\mathbb{F}_p$.

Conversely, assume that f(x) is irreducible over $\mathbb{F}_p$. Consider the polynomial $x^e - 1 = (x - 1)f(x)$. Let a be a divisor of e. Then $x^a - 1$ is a divisor of $x^e - 1$. The only divisors of $x^e - 1$ are $x^e - 1$, f(x), and x − 1, and hence, the only possibility is that we either have a = e or a = 1. It follows that e is a prime.

Let $\vartheta$ be a root of f(x). Then $\vartheta^{p^i}$, 0 ≤ i < e − 1, are roots of f(x). Since f(x) is irreducible, those roots are distinct and there are no other roots of f(x). We have already noted that there is a natural interpretation of exponents of $\vartheta$ as elements of $\mathbb{F}_e$. The exponents $p^i$, 0 ≤ i < e − 1, are distinct in $\mathbb{F}_e$ since $\vartheta^{p^i}$, 0 ≤ i < e − 1, are distinct. Furthermore, since the number of elements is exactly the same as the number of nonzero elements of $\mathbb{F}_e$, p is a primitive element of $\mathbb{F}_e$. This last conclusion can be made if p is in $\{p^i\}_{i=0}^{e-2}$, which is the case since we have e > 2. □

The first cases of irreducible all-one polynomials over $\mathbb{F}_2$ are those with the degrees

2, 4, 10, 12, 18, 28, 36, 52, 58, 60, 66, 82, and 100.

This fact is a simple consequence of Theorem 3.
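The list of degrees can be reproduced directly from the criterion in Theorem 3. The following is a minimal sketch; the function names are illustrative and the primality test is plain trial division, which is sufficient for degrees of this size.

```python
def is_prime(n):
    """Trial-division primality test (adequate for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def order_mod(a, e):
    """Multiplicative order of a modulo the prime e (requires e not dividing a)."""
    k, x = 1, a % e
    while x != 1:
        x = (x * a) % e
        k += 1
    return k

def irreducible_aop_degrees(max_degree, p=2):
    """Degrees e - 1 <= max_degree for which Theorem 3 gives an irreducible
    all-one polynomial over F_p: e prime and p a primitive element of F_e."""
    degrees = []
    for e in range(3, max_degree + 2):
        if is_prime(e) and e != p and order_mod(p, e) == e - 1:
            degrees.append(e - 1)
    return degrees

print(irreducible_aop_degrees(100))
```

For p = 2 and a maximum degree of 100 this returns exactly the thirteen degrees listed above.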

5.2 The Massey-Omura Multiplier

Let $\vartheta$ be an element in $\mathbb{F}_{2^m}$ generating a normal basis over $\mathbb{F}_2$, and let $\vartheta'$ be the element generating the dual basis of that normal basis for which we have $\mathrm{Tr}(\vartheta\vartheta') = 1$. Define the vector

$$\phi \triangleq \left(\vartheta^{2^0}, \vartheta^{2^1}, \ldots, \vartheta^{2^{m-1}}\right).$$

Let $\alpha$, $\beta$, and $\gamma$ be any elements of $\mathbb{F}_{2^m}$ satisfying $\alpha\beta = \gamma$, with $\phi$-vectors

$$\alpha_\phi = (a_0, a_1, \ldots, a_{m-1})^T,$$
$$\beta_\phi = (b_0, b_1, \ldots, b_{m-1})^T,$$
$$\gamma_\phi = (c_0, c_1, \ldots, c_{m-1})^T,$$

respectively. Define the shift matrix

$$S \triangleq \begin{pmatrix} 0 & 1 \\ I_{m-1} & 0 \end{pmatrix},$$

where $I_{m-1}$ is the (m − 1) × (m − 1) identity matrix. There is a function $f_\phi$ satisfying

$$c_{m-1} = f_\phi(\alpha_\phi, \beta_\phi).$$

Squaring in $\mathbb{F}_{2^m}$ is a cyclic shift if the field is represented using a normal basis over $\mathbb{F}_2$. Therefore, we have

$$c_{m-1-i} = f_\phi(S^i\alpha_\phi, S^i\beta_\phi), \quad 0 \le i < m.$$

Hence, we can use the same function repeatedly to calculate all coefficients in $\gamma_\phi$, either sequentially or in parallel. This multiplier is known as the Massey-Omura [32] multiplier.

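Let us study $f_\phi$ more closely. By repeatedly using Definition 9 on page 11, the assumption $\gamma = \alpha\beta$ yields

$$\phi\gamma_\phi = \sum_{i=0}^{m-1}\sum_{j=0}^{m-1} a_i b_j\, \phi\cdot\left(\vartheta^{2^i+2^j}\right)_\phi.$$

The last coefficient of the vector $\left(\vartheta^{2^i+2^j}\right)_\phi$ is $\mathrm{Tr}\!\left(\vartheta^{2^i+2^j}(\vartheta')^{2^{m-1}}\right)$, and thus, we have

$$c_{m-1} = \sum_{i=0}^{m-1}\sum_{j=0}^{m-1} a_i b_j\, \mathrm{Tr}\!\left(\vartheta^{2^i+2^j}(\vartheta')^{2^{m-1}}\right).$$

Define the matrix $T_\phi = (t_{i,j})_{i,j=0,0}^{m-1,m-1}$, where the coefficients are given by

$$t_{i,j} = \mathrm{Tr}\!\left(\vartheta^{2^i+2^j}(\vartheta')^{2^{m-1}}\right).$$

Then the function $f_\phi$ can be written as the bilinear transform

$$f_\phi(\alpha_\phi, \beta_\phi) = \alpha_\phi^T T_\phi \beta_\phi.$$

Rewrite $f_\phi$ as

$$f_\phi(\alpha_\phi, \beta_\phi) = \sum_{i=0}^{m-1}\sum_{j=0}^{m-1} t_{i,j} a_i b_j = \sum_{(i,j)\,:\,t_{i,j}=1} a_i b_j.$$

The obvious way to implement the function $f_\phi$ is therefore to use one $\mathbb{F}_2$-multiplier for each (i, j) for which we have $t_{i,j} = 1$, and add all these products using an adder tree. Let $w_H(A)$ denote the Hamming weight of a matrix or vector A. The implementation of $f_\phi$ outlined here uses $w_H(T_\phi)$ $\mathbb{F}_2$-multipliers and $w_H(T_\phi) - 1$ $\mathbb{F}_2$-adders.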
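The shift property can be tried out in software. The following is a minimal sketch for $\mathbb{F}_{16}$ under several assumptions that are not taken from the thesis: the field is built from $x^4 + x + 1$, the normal element is found by exhaustive search, and the entries $t_{i,j}$ are obtained by reading off the last normal-basis coordinate of $\vartheta^{2^i}\vartheta^{2^j}$ rather than by evaluating a trace. Coordinate vectors are stored as bit masks, with bit k holding coordinate k.

```python
M = 4
POLY = 0b10011            # x^4 + x + 1, an assumed irreducible polynomial
MASK = (1 << M) - 1

def gf_mul(a, b):
    """Multiply two elements of F_{2^M} given as polynomial-basis bit masks."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= POLY
    return r

def find_normal_basis():
    """Return (conjugates, coords->field, field->coords) for a normal element."""
    for theta in range(1, 1 << M):
        conj = [theta]
        for _ in range(M - 1):
            conj.append(gf_mul(conj[-1], conj[-1]))
        to_field = {}
        for mask in range(1 << M):
            x = 0
            for k in range(M):
                if (mask >> k) & 1:
                    x ^= conj[k]
            to_field[mask] = x
        to_coords = {x: mask for mask, x in to_field.items()}
        if len(to_coords) == 1 << M:       # the conjugates span the field
            return conj, to_field, to_coords
    raise ValueError("no normal element found")

CONJ, TO_FIELD, TO_COORDS = find_normal_basis()

# T[i][j] is the last coordinate (index M-1) of conj[i] * conj[j].
T = [[(TO_COORDS[gf_mul(CONJ[i], CONJ[j])] >> (M - 1)) & 1
      for j in range(M)] for i in range(M)]

def f_phi(a, b):
    """Bilinear form a^T T b over F_2, i.e. the coefficient c_{M-1}."""
    bit = 0
    for i in range(M):
        for j in range(M):
            if T[i][j]:
                bit ^= (a >> i) & (b >> j) & 1
    return bit

def shift(v, i):
    """Apply S^i: coordinate k moves to k + i (mod M)."""
    i %= M
    return ((v << i) | (v >> (M - i))) & MASK

def mo_multiply(a, b):
    """Massey-Omura style product: c_{M-1-i} = f_phi(S^i a, S^i b)."""
    c = 0
    for i in range(M):
        c |= f_phi(shift(a, i), shift(b, i)) << (M - 1 - i)
    return c
```

The same function `f_phi` is evaluated M times on shifted operands, which mirrors the sequential use of a single $f_\phi$ block described above.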


The properties of $T_\phi$ have been studied by Mullin et al. [38], Ash et al. [4], and Geiselmann [18]. They use the complexity measure

$$C_\phi \triangleq w_H(T_\phi),$$

which we call the Hamming complexity of $\phi$. Ash et al. give the trivial upper bound $C_\phi \le m^2$. This general bound can be sharpened somewhat.

Ash et al. note that

$$\vartheta^{1+2^l} = \sum_{k=0}^{m-1} t_{-k-1,\,l-k-1}\,\vartheta^{2^k}$$

holds, where the indices are reduced mod m. Hence, the vectors $(\vartheta^{1+2^l})_\phi$, 0 ≤ l < m, are the diagonals of $T_\phi$, where the diagonals are wrapped around the edges of $T_\phi$. In particular, this means that the main diagonal of $T_\phi$ is $(\vartheta^2)_\phi$, i.e. all elements are zero except one.

To be able to sharpen the trivial upper bound, we first prove three simple lemmas.

Lemma 5 Let $\vartheta$ be a root of an irreducible polynomial of degree m, m > 1, over $\mathbb{F}_2$, such that $\vartheta^{1+2^l} = 1$ holds for an integer l, 0 ≤ l < m. Then m is even and we have l = m/2.

Proof: First, $\vartheta^2$ is one of the roots of an irreducible polynomial of degree m, m > 1, over $\mathbb{F}_2$. Hence, we have $\vartheta^2 \ne 1$ and l ≠ 0. Assume that there are two values, $l_1$ and $l_2$, in {1, 2, ..., m − 1}, for which we have $\vartheta^{1+2^{l_1}} = \vartheta^{1+2^{l_2}} = 1$. This gives us $\vartheta^{2^{l_1}} = \vartheta^{2^{l_2}}$. But $\vartheta$ is a root of an irreducible polynomial over $\mathbb{F}_2$, and therefore $l_1 = l_2$ holds. Thus, there is at most one l, 0 < l < m, such that $\vartheta^{1+2^l} = 1$ holds. Now assume that we actually have one such l. We note that $\vartheta^{1+2^{m-l}} = (\vartheta^{1+2^l})^{2^{m-l}}$ holds. Hence, m − l, which is an element in {1, 2, ..., m − 1}, is such that $\vartheta^{1+2^{m-l}} = 1$ holds. Since there cannot be more than one such element in {1, 2, ..., m − 1}, we must have m − l = l, which gives us m = 2l. In other words, m is even and we have l = m/2. □

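Lemma 6 Let m be an even positive integer, and let $\vartheta$ be a normal element in $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$ such that $\vartheta^{1+2^{m/2}} = 1$ holds. Let $\phi$ be the normal basis defined by $\vartheta$. Then we have $w_H\!\left((\vartheta^{1+2^{m/2+1}})_\phi\right) = w_H\!\left((\vartheta^{1+2^{m/2-1}})_\phi\right) = 1$.

Proof: From the assumption $\vartheta^{1+2^{m/2}} = 1$ we have $\vartheta^{1+2^{m/2+1}} = \vartheta^{2^{m/2}}$, which is one of the basis elements in $\phi$. Thus, we have $w_H\!\left((\vartheta^{1+2^{m/2+1}})_\phi\right) = 1$. We note that $\vartheta^{1+2^{m/2-1}} = (\vartheta^{1+2^{m/2+1}})^{2^{m/2-1}}$ holds. Using $\vartheta^{1+2^{m/2+1}} = \vartheta^{2^{m/2}}$ in the last expression, we have $\vartheta^{1+2^{m/2-1}} = \vartheta^{2^{m-1}}$, which also is one of the basis elements in $\phi$. Thus, we have $w_H\!\left((\vartheta^{1+2^{m/2-1}})_\phi\right) = 1$ as well. □

Lemma 7 Let m be an even positive integer, and let $\phi$ be a normal basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$. Also, let $\alpha$ be an element in $\mathbb{F}_{2^m}$. Then $w_H\!\left((\alpha^{1+2^{m/2}})_\phi\right)$ is even.

Proof: First, we note that $2^m - 1 = (2^{m/2} + 1)(2^{m/2} - 1)$ holds. Hence, for nonzero $\alpha$, we have $(\alpha^{2^{m/2}+1})^{2^{m/2}-1} = 1$, and $\alpha^{1+2^{m/2}}$ is an element of $\mathbb{F}_{2^{m/2}}$. This also means that we have $(\alpha^{1+2^{m/2}})^{2^{m/2}} = \alpha^{1+2^{m/2}}$. Since $\phi$ is a normal basis, this means that the first half of $(\alpha^{1+2^{m/2}})_\phi$ equals the last half and hence, $w_H\!\left((\alpha^{1+2^{m/2}})_\phi\right)$ is even. □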

We are now ready to prove the improved general upper bound.

Theorem 4 Let $\phi$ be a normal basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$. Then the Hamming complexity of $\phi$ is upper bounded by

$$C_\phi \le m^2 - m - 2\lceil m/2\rceil + 3, \quad m \in \{2, 3\},$$
$$C_\phi \le m^2 - m - 2\lceil m/2\rceil + 1, \quad m > 3.$$

Proof: Define $v_l \triangleq w_H\!\left((\vartheta^{1+2^l})_\phi\right)$. Recall the definition $C_\phi \triangleq w_H(T_\phi)$. Let $\vartheta$ be one of the generators of $\phi$. We have already noted that the diagonals of the m × m matrix $T_\phi$, wrapped around the edges of $T_\phi$, are the vectors $(\vartheta^{1+2^l})_\phi$, 0 ≤ l < m. Therefore, we have

$$C_\phi = \sum_{l=0}^{m-1} v_l.$$

The first element in this sum, $v_0$, is 1, since $\vartheta^2$ is one of the basis elements of $\phi$. Hence we have the upper bound

$$C_\phi \le m^2 - (m - 1).$$

Since $\phi$ is a normal basis over $\mathbb{F}_2$, we have $1_\phi = (1, \ldots, 1)^T$. Hence, for every l for which we have $\vartheta^{1+2^l} \ne 1$, we have $v_l < m$. According to Lemma 5, there is at most one integer l, 0 ≤ l < m, such that $\vartheta^{1+2^l} = 1$ holds, and only if m is even. Hence we can reduce the above bound by m − 2 for even m, and by m − 1 for odd m. Thus, we have the upper bounds

$$C_\phi \le m^2 - (m - 1) - (m - 2), \quad m \text{ even},$$
$$C_\phi \le m^2 - (m - 1) - (m - 1), \quad m \text{ odd}.$$

Let m be even, m > 2, and let us study $\vartheta^{1+2^{m/2}}$. We can identify two possibilities.

1. We have $v_{m/2} = m$, corresponding to $\vartheta^{1+2^{m/2}} = 1$, which is possible according to Lemma 5. According to Lemma 6, we then have $v_{m/2+1} = v_{m/2-1} = 1$. Thus, in this case there are an additional 2(m − 2) zeros in the corresponding two diagonals of $T_\phi$.

2. We have $v_{m/2} < m$, corresponding to $\vartheta^{1+2^{m/2}} \ne 1$. By Lemma 7, $v_{m/2}$ is even. Thus, we have $v_{m/2} \le m - 2$. In this case there are at least an additional 2 zeros in the main diagonal.

Thus, for even m, m > 2, we have the bound

$$C_\phi \le m^2 - (m - 1) - (m - 2) - 2.$$

Now, let m be odd, m > 3. Assume that there is an l such that $v_l = m - 1$ holds. Then we have $\vartheta^{1+2^l} = 1 + \vartheta^{2^k}$ for some k, 0 ≤ k < m. Using this last expression, we find

$$\vartheta^{1+2^{l+1}} = (1 + \vartheta^{2^k})\vartheta^{2^l} = \vartheta^{2^l} + \vartheta^{2^k+2^l} = \left(\vartheta + \vartheta^{1+2^{k-l}}\right)^{2^l}.$$

Hence, we have either $v_{l+1} = v_{k-l} + 1$ or $v_{l+1} = v_{k-l} - 1$, where the indices are reduced mod m. Then there are four possibilities.

1. We have 0 < l < m − 1 and $v_{k-l} = m - 1$. Then we also have $v_{l+1} = m - 2$, since all these weights are at most m − 1. Since $T_\phi$ is symmetric, we also have $v_{m-l-1} = m - 2$. This reduces the bound by 2.

2. We have 0 < l < m − 1 and $v_{k-l} = m - 2$. Since m > 3 holds, we also have m − 2 > 1. Thus, we have k − l ≠ 0 mod m, since $v_0 = 1$ holds. Again since $T_\phi$ is symmetric, we also have $v_{m-k+l} = m - 2$. This also reduces the bound by 2.

3. We have 0 < l < m − 1 and $v_{k-l} < m - 2$. Then, since we have $v_{l+1} \le v_{k-l} + 1$, we also have $v_{l+1} < m - 1$. Again since $T_\phi$ is symmetric, we also have $v_{m-l-1} < m - 1$. This also reduces the bound by 2.

4. We have l = m − 1. Then we have l + 1 = 0 mod m, and as we already have noted, $v_0 = 1$ holds. Then we have $v_{k-l} = 2$ and by the symmetry of $T_\phi$, we also have $v_{m-k+l} = 2$. This reduces the bound by 2(m − 3).

Thus, for odd m, m > 3, we have the bound

$$C_\phi \le m^2 - (m - 1) - (m - 1) - 2.$$

□

Mullin et al. [38] and Geiselmann [18] showed that the inequality

$$C_\phi \ge 2m - 1$$

holds for any normal basis. This lower bound coincides with our upper bound in Theorem 4 for m ∈ {2, 3}, and there are normal bases meeting our upper bound for m ∈ {2, 3, 4, 5}. For fairly large m, however, our upper bound seems to be off by approximately 10%. As an example, Geiselmann [18] gives a normal basis of $\mathbb{F}_{2^{30}}$ over $\mathbb{F}_2$ with Hamming complexity $C_\phi = 759$. Theorem 4 in this case gives the bound $C_\phi \le 841$.
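For very small fields the Hamming complexity of every normal basis can be computed by exhaustive search and compared with both bounds. The following is a minimal sketch, assuming the field polynomials given in the usage example (they are illustrative choices; the complexity of a normal basis does not depend on how the field is represented). It uses the identity $C_\phi = \sum_l w_H\!\left((\vartheta^{1+2^l})_\phi\right)$ from the proof of Theorem 4.

```python
def gf_mul(a, b, poly, m):
    """Multiply in F_{2^m}; poly includes the x^m term."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return r

def hamming_complexities(m, poly):
    """Sorted list of C_phi, one entry per normal basis of F_{2^m} over F_2."""
    results = {}
    for theta in range(1, 1 << m):
        conj = [theta]
        for _ in range(m - 1):
            conj.append(gf_mul(conj[-1], conj[-1], poly, m))
        key = frozenset(conj)
        if key in results:
            continue                      # same basis, different generator
        coords = {}
        for mask in range(1 << m):
            x = 0
            for k in range(m):
                if (mask >> k) & 1:
                    x ^= conj[k]
            coords[x] = mask
        if len(coords) != 1 << m:
            continue                      # conjugates are linearly dependent
        c = sum(bin(coords[gf_mul(theta, conj[l], poly, m)]).count("1")
                for l in range(m))
        results[key] = c
    return sorted(results.values())

def theorem4_bound(m):
    """Upper bound of Theorem 4."""
    extra = 3 if m in (2, 3) else 1
    return m * m - m - 2 * ((m + 1) // 2) + extra

# Assumed field polynomials: x^2+x+1, x^3+x+1, x^4+x+1, x^5+x^2+1.
for m, poly in [(2, 0b111), (3, 0b1011), (4, 0b10011), (5, 0b100101)]:
    cs = hamming_complexities(m, poly)
    print(m, cs, "lower bound", 2 * m - 1, "upper bound", theorem4_bound(m))
```

The search enumerates all $2^m$ coordinate vectors for each candidate generator, so it is only practical for small m.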

If we have $C_\phi = 2m - 1$ for a normal basis $\phi$, then $\phi$ is called optimal. There are optimal normal bases for some fields, but not for all fields. It is fairly easy to show that roots of irreducible all-one polynomials generate optimal normal bases. However, there are other optimal normal bases. The following three theorems, due to Ash et al. [4], give us a means to create normal bases, with some additional control of the Hamming complexity.

Theorem 5 Let p be a prime satisfying p = km + 1 for some positive integers k and m. Furthermore, let c be a primitive k-th root of unity in $\mathbb{F}_p$, and let $\beta$ be a primitive p-th root of unity in $\mathbb{F}_{2^{km}}$. If $\mathbb{F}_p^*$ is generated by 2 and c, then

$$\alpha = \sum_{i=0}^{k-1} \beta^{c^i}$$

generates a normal basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$.

Ash et al. [4] call these bases projection bases, since they are projected from $\mathbb{F}_{2^{km}}$ to $\mathbb{F}_{2^m}$ using a trace-like projection. They noted that there is at least one k such that the conditions in Theorem 5 are fulfilled if and only if 8 ∤ m. Thus, not all normal bases can be constructed using Theorem 5, since there are normal bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$ for all m, m > 1.
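The conditions of Theorem 5 can be checked numerically for a given m. The following is a minimal sketch that searches for the smallest admissible k up to an arbitrary limit `k_max`; the function names and the search limit are illustrative assumptions. Returning `None` only means that no k was found below the limit, which is what the statement about 8 ∤ m predicts when m is a multiple of 8.

```python
def is_prime(n):
    """Trial-division primality test (adequate for small n)."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def primitive_kth_root(p, k):
    """Some element of order exactly k in F_p^*, assuming k divides p - 1."""
    for c in range(1, p):
        if pow(c, k, p) == 1 and all(pow(c, d, p) != 1 for d in range(1, k)):
            return c
    return None

def generated_subgroup(p, gens):
    """Subgroup of F_p^* generated by the elements in gens."""
    seen = {1}
    frontier = [1]
    while frontier:
        x = frontier.pop()
        for g in gens:
            y = (x * g) % p
            if y not in seen:
                seen.add(y)
                frontier.append(y)
    return seen

def smallest_k(m, k_max=50):
    """Smallest k <= k_max with p = km + 1 prime and F_p^* generated by 2 and c."""
    for k in range(1, k_max + 1):
        p = k * m + 1
        if not is_prime(p):
            continue
        c = primitive_kth_root(p, k)
        if len(generated_subgroup(p, [2 % p, c])) == p - 1:
            return k
    return None

for m in range(2, 17):
    print(m, smallest_k(m))
```

A result of k = 1 corresponds to the bases generated by irreducible all-one polynomials.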

Theorem 6 Consider a normal basis $\phi$ of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$ constructed as in Theorem 5. Then we have the bounds

$$km - (k^2 - 3k + 3) \le C_\phi \le km - 1, \quad \text{for even } k,$$
$$(k + 1)m - (k^2 - k + 1) \le C_\phi \le (k + 1)m - k, \quad \text{for odd } k.$$

According to Theorem 6, any normal basis generated by Theorem 5 with k ∈ {1, 2} is optimal, but k > 2 may produce optimal normal bases for some m. For those m where Theorem 5 can be used with k = 1, we get normal bases generated by irreducible all-one polynomials. Ash et al. also give a number of theorems that we summarize in the following theorem. We have used the lower bound $C_\phi \ge 2m - 1$ to rule out impossible values of m.

Theorem 7 Consider a normal basis $\phi$ of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$ constructed as in Theorem 5. For the following cases, we achieve the lower bound given in Theorem 6.

(i) k ∈ {1, 2}
(ii) k ∈ {3, 4} and m > 2
(iii) k = 5 and m > 4
(iv) k = 6 and m > 12
(v) k = 7 and m > 6

Geiselmann [18] lists normal bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$ for m ≤ 60. In all cases where he has found optimal normal bases, Theorem 5 produces optimal normal bases with k ∈ {1, 2}. The best known Hamming complexities for extension degree m, 2 ≤ m ≤ 60, according to Geiselmann, are given in Table 5.1.

The properties of a parallel Massey-Omura multiplier for a normal basis $\phi$ are given in Table 5.2.


| m | $C_\phi$ | m | $C_\phi$ | m | $C_\phi$ | m | $C_\phi$ |
|---|---|---|---|---|---|---|---|
| | | 16 | 85 | 31 | 237 | 46 | 135 |
| 2 | 3 * | 17 | 81 | 32 | 361 | 47 | 261 |
| 3 | 5 * | 18 | 35 * | 33 | 65 * | 48 | 425 |
| 4 | 7 * | 19 | 117 | 34 | 243 | 49 | 189 |
| 5 | 9 * | 20 | 63 | 35 | 69 * | 50 | 99 * |
| 6 | 11 * | 21 | 95 | 36 | 71 * | 51 | 101 * |
| 7 | 19 | 22 | 63 | 37 | 141 | 52 | 103 * |
| 8 | 21 | 23 | 45 * | 38 | 207 | 53 | 105 * |
| 9 | 17 * | 24 | 105 | 39 | 77 * | 54 | 209 |
| 10 | 19 * | 25 | 93 | 40 | 189 | 55 | 189 |
| 11 | 21 * | 26 | 51 * | 41 | 81 * | 56 | 399 |
| 12 | 23 * | 27 | 141 | 42 | 135 | 57 | 497 |
| 13 | 45 | 28 | 55 * | 43 | 165 | 58 | 115 * |
| 14 | 27 * | 29 | 57 * | 44 | 147 | 59 | 597 |
| 15 | 45 | 30 | 59 * | 45 | 153 | 60 | 119 * |

Table 5.1: Best known Hamming complexity for normal bases of extension degree m, 2 ≤ m ≤ 60, over $\mathbb{F}_2$, according to Geiselmann [18]. For m ≤ 32, the numbers are known to be the best possible. The numbers marked by an asterisk (*) correspond to optimal bases, and are thus also the best possible.

| Property | Value |
|---|---|
| Number of clock cycles ($n$) | $1$ |
| Normalized delay ($t$) | $10(1 + \log_2 C_\phi)$ |
| Normalized time ($nt$) | $10(1 + \log_2 C_\phi)$ |
| Normalized area ($a$) | $(18C_\phi - 12)m$ |
| Normalized power ($p$) | $(18C_\phi - 12)m$ |
| Normalized energy ($np$) | $(18C_\phi - 12)m$ |
| Normalized input capacitance ($c$) | $2C_\phi$ |
| Normalized output resistance ($r$) | $2$ |

Table 5.2: Properties of a parallel Massey-Omura multiplier for a normal basis $\phi$. All properties are upper bounds, except the number of clock cycles, the input capacitance, and the output resistance.


5.3 Inversion Based on Exponentiation

Many published inverters for normal basis representation are based on the fact that $\alpha = \alpha^{2^m}$ holds for all $\alpha \in \mathbb{F}_{2^m}$. For $\alpha \ne 0$, multiply both sides of the equation by $\alpha^{-2}$, and we get $\alpha^{-1} = \alpha^{2^m-2}$. Hence, it is possible to perform inversion by exponentiation. This exponentiation can be performed by alternating squaring and multiplication. We have already noted that squaring in $\mathbb{F}_{2^m}$ is a cyclic shift when we are assuming a normal basis representation over $\mathbb{F}_2$. The problem is that we need several multiplications for each inversion.

5.3.1 Inversion by Squaring and Multiplication

Consider the decomposition

$$2^m - 2 = \sum_{i=1}^{m-1} 2^i.$$

It follows that we have

$$\alpha^{-1} = \alpha^{2^m-2} = \prod_{i=1}^{m-1} \alpha^{2^i}.$$

Based on this equation, Wang et al. [47] give the following algorithm for inversion.

Algorithm 5.1 Inversion by Squaring and Multiplication

Given $\alpha \in \mathbb{F}_{2^m}^*$, initiate A = α and B = 1.

repeat m − 1 times
    Set (A, B) = (A², A²B)
end repeat

The result is in B.

Wang et al. [47] also give an architecture that is based on Algorithm 5.1. However, since squaring is costless in normal basis representation, we can remove one iteration from Algorithm 5.1 by doing the first squaring upon initialization. This gives us the following modified algorithm.

Algorithm 5.2 Inversion by Squaring and Multiplication (5.1 modified)

Given $\alpha \in \mathbb{F}_{2^m}^*$, initiate A = B = α².

repeat m − 2 times
    Set (A, B) = (A², A²B)
end repeat

The result is in B.

We can also rewrite the exponent as

$$2^m - 2 = ((\cdots((2 + 1)\,2 + 1)\,2 \cdots + 1)\,2 + 1)\,2,$$

where 2 appears m − 1 times. Thus, we have

$$\alpha^{-1} = \alpha^{2^m-2} = \left(\left(\cdots\left(\left(\alpha^2\alpha\right)^2\alpha\right)^2 \cdots \alpha\right)^2 \alpha\right)^2,$$

where α appears m − 1 times on the right-hand side. Based on this equation, we can use the following algorithm for inversion.

Algorithm 5.3 Inversion by Squaring and Multiplication

Given $\alpha \in \mathbb{F}_{2^m}^*$, initiate A = α, B = 1.

repeat m − 1 times
    Set B = (AB)²
end repeat

The result is in B.

Mastrovito [34] gives an architecture that is based on Algorithm 5.3. As with Algorithm 5.1, we can remove one iteration from Algorithm 5.3 by doing the first squaring upon initialization.

Algorithm 5.4 Inversion by Squaring and Multiplication (5.3 modified)

Given $\alpha \in \mathbb{F}_{2^m}^*$, initiate A = α, B = α².

repeat m − 2 times
    Set B = (AB)²
end repeat

The result is in B.
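Algorithms 5.1 through 5.4 translate almost line by line into software. The following is a minimal sketch that models field elements as polynomial-basis bit masks; the helper `gf_mul` and the example field polynomials are assumptions for illustration, and squaring is done by an ordinary multiplication here rather than by the cyclic shift available in a normal basis.

```python
def gf_mul(a, b, poly, m):
    """Multiply in F_{2^m}; poly includes the x^m term."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return r

def inv_alg_5_1(alpha, poly, m):
    A, B = alpha, 1
    for _ in range(m - 1):
        A = gf_mul(A, A, poly, m)        # A <- A^2
        B = gf_mul(A, B, poly, m)        # B <- A^2 * B (A already squared)
    return B

def inv_alg_5_2(alpha, poly, m):
    A = B = gf_mul(alpha, alpha, poly, m)
    for _ in range(m - 2):
        A = gf_mul(A, A, poly, m)
        B = gf_mul(A, B, poly, m)
    return B

def inv_alg_5_3(alpha, poly, m):
    A, B = alpha, 1
    for _ in range(m - 1):
        AB = gf_mul(A, B, poly, m)
        B = gf_mul(AB, AB, poly, m)      # B <- (AB)^2
    return B

def inv_alg_5_4(alpha, poly, m):
    A, B = alpha, gf_mul(alpha, alpha, poly, m)
    for _ in range(m - 2):
        AB = gf_mul(A, B, poly, m)
        B = gf_mul(AB, AB, poly, m)
    return B

# Example in F_32 with the assumed polynomial x^5 + x^2 + 1.
a = 0b10110
print([f(a, 0b100101, 5) for f in (inv_alg_5_1, inv_alg_5_2, inv_alg_5_3, inv_alg_5_4)])
```

All four functions return the same element, and the modified variants use one loop iteration less than their originals.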


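[Figure 5.1 (schematic). Blocks: two registers of m memory cells for $\mathbb{F}_2$, each with an Initiate input; two blocks "Square in $\mathbb{F}_{2^m}$"; input α, output α⁻¹.]

Figure 5.1: An architecture of an inverter for $\mathbb{F}_{2^m}$ based on Algorithm 5.2. $\alpha^{-1}$ is present at the output after m − 2 steps.

[Figure 5.2 (schematic). Blocks: two registers of m memory cells for $\mathbb{F}_2$, each with an Initiate input; two blocks "Square in $\mathbb{F}_{2^m}$"; input α, output α⁻¹.]

Figure 5.2: An architecture of an inverter for $\mathbb{F}_{2^m}$ based on Algorithm 5.4. $\alpha^{-1}$ is present at the output after m − 2 steps.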

Algorithms 5.1 through 5.4 can all be used regardless of what type of basis is used. In Figures 5.1 and 5.2, we display architectures based on Algorithms 5.2 and 5.4, respectively. These architectures are similar to those published by Wang et al. [47] and Mastrovito [34], respectively. The squaring in both architectures is implemented as hardwired shifts. The properties of these architectures are given in Tables 5.3 and 5.4, assuming that the multiplier is a parallel Massey-Omura multiplier as in Section 5.2.


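| Property | Value |
|---|---|
| Number of clock cycles ($n$) | $m - 2$ |
| Normalized delay ($t$) | $14.4\log_2 C_\phi + 26$ |
| Normalized time ($nt$) | $14.4m\log_2 C_\phi + 26m - 28.8\log_2 C_\phi - 52$ |
| Normalized area ($a$) | $22C_\phi m + 52m$ |
| Normalized power ($p$) | $22C_\phi m + 68m + 32 + \frac{64}{m-2}$ |
| Normalized energy ($np$) | $22C_\phi m^2 + 68m^2 - 44C_\phi m - 104m$ |

Table 5.3: Properties of the inverter in Figure 5.1, assuming a parallel Massey-Omura multiplier, for a normal basis $\phi$. All properties are upper bounds, except the number of clock cycles.

| Property | Value |
|---|---|
| Number of clock cycles ($n$) | $m - 2$ |
| Normalized delay ($t$) | $14.4\log_2 C_\phi + 26$ |
| Normalized time ($nt$) | $14.4m\log_2 C_\phi + 26m - 28.8\log_2 C_\phi - 52$ |
| Normalized area ($a$) | $22C_\phi m + 52m$ |
| Normalized power ($p$) | $18C_\phi m + 60m + 4C_\phi + 40 + \frac{8C_\phi+80}{m-2}$ |
| Normalized energy ($np$) | $18C_\phi m^2 + 60m^2 - 32C_\phi m - 80m$ |

Table 5.4: Properties of the inverter in Figure 5.2, assuming a parallel Massey-Omura multiplier, for a normal basis $\phi$.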



5.3.2 Inversion by Accelerated Squaring and Multiplication

The algorithms in Section 5.3.1 all use O(m) multiplications in $\mathbb{F}_{2^m}$. Feng [13] presents an algorithm that performs inversion using only $O(\log_2 m)$ multiplications in $\mathbb{F}_{2^m}$.

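Define $q = \lfloor\log_2 m\rfloor$, and consider the binary decomposition

$$m - 1 = \sum_{i=0}^{q} m_i 2^i,$$

where we have $m_i \in \{0, 1\}$, 0 ≤ i < q, and $m_q = 1$. Also, define

$$p = \sum_{i=0}^{q-1} m_i,$$

where $m_i$, 0 ≤ i < q, are interpreted as integers. Feng [13] showed that

$$2^m - 2 = \Bigl(\Bigl(\Bigl[\cdots\Bigl[\Bigl(m_q 2^{-m_q 2^q}\bigl(1 + 2^{2^{q-1}}\bigr) + m_{q-1}\Bigr)\, 2^{-m_{q-1}2^{q-1}}\bigl(1 + 2^{2^{q-2}}\bigr) + m_{q-2}\Bigr]\, 2^{-m_{q-2}2^{q-2}}\bigl(1 + 2^{2^{q-3}}\bigr) + \cdots + m_2\Bigr]\, 2^{-m_2 2^2}\bigl(1 + 2^{2^1}\bigr) + m_1\Bigr)\, 2^{-m_1 2^1}\bigl(1 + 2^{2^0}\bigr) + m_0\Bigr)\, 2^{m-m_0}$$

holds. Thus, we have

$$\alpha^{-1} = \alpha^{2^m-2} = \Bigl(\Bigl(\Bigl[\cdots\Bigl[\Bigl(\bigl(\alpha^{m_q}\bigr)^{2^{-m_q 2^q}(1+2^{2^{q-1}})} \cdot \alpha^{m_{q-1}}\Bigr)^{2^{-m_{q-1}2^{q-1}}(1+2^{2^{q-2}})} \cdot \alpha^{m_{q-2}}\Bigr]^{2^{-m_{q-2}2^{q-2}}(1+2^{2^{q-3}})} \cdots \alpha^{m_2}\Bigr]^{2^{-m_2 2^2}(1+2^{2^1})} \cdot \alpha^{m_1}\Bigr)^{2^{-m_1 2^1}(1+2^{2^0})} \cdot \alpha^{m_0}\Bigr)^{2^{m-m_0}}.$$

Based on this equation, Feng [13] gives the following algorithm for inversion.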


Algorithm 5.5 Inversion by Accelerated Squaring and Multiplication

Given $\alpha \in \mathbb{F}_{2^m}^*$, initiate A = α, B = α.

for i = q to 1
    if $m_i = 1$ then
        Set $B = B^{2^{-2^i}}$
    end if
    Set $B = B \cdot B^{2^{2^{i-1}}}$ (This row is executed q times.)
    if $m_{i-1} = 1$ then
        Set B = BA (This row is executed p times.)
    end if
end for

The result is $B^{2^{m-m_0}}$.

In Figure 5.3 there is an architecture based on Algorithm 5.5. The multiplexer layer feeds the multiplier with the correct data at each time instance, which includes choosing the correct cyclic shifts of the data. Its properties are given in Table 5.5.

An algorithm based on a technique similar to that of Algorithm 5.5 is proposed by Itoh and Tsujii [26]. That algorithm gives an architecture that is similar to the one in Figure 5.3, with similar properties. However, it only works for $m = 2^r + 1$, where r is a positive integer.
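The loop structure of Algorithm 5.5 can be sketched in software as follows. This is an illustration under stated assumptions, not Feng's architecture: elements are polynomial-basis bit masks, the helper names are invented, and powers of the form $x^{2^k}$ with negative k are computed as $x^{2^{k \bmod m}}$, which is valid because $x^{2^m} = x$. The sketch starts from B = α, which is what the leading factor $\alpha^{m_q}$ in the product formula requires, and it reads the bits $m_0, \ldots, m_q$ directly from m − 1.

```python
def gf_mul(a, b, poly, m):
    """Multiply in F_{2^m}; poly includes the x^m term."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> m) & 1:
            a ^= poly
    return r

def frobenius(x, k, poly, m):
    """Return x^(2^k); k may be negative since x^(2^m) = x."""
    for _ in range(k % m):
        x = gf_mul(x, x, poly, m)
    return x

def inv_accelerated(alpha, poly, m):
    """Return (alpha^-1, number of general multiplications used)."""
    q = m.bit_length() - 1                               # floor(log2 m)
    bits = [((m - 1) >> i) & 1 for i in range(q + 1)]    # m_0 .. m_q
    A, B = alpha, alpha
    mults = 0
    for i in range(q, 0, -1):
        if bits[i]:
            B = frobenius(B, -(1 << i), poly, m)         # B <- B^(2^(-2^i))
        B = gf_mul(B, frobenius(B, 1 << (i - 1), poly, m), poly, m)
        mults += 1
        if bits[i - 1]:
            B = gf_mul(B, A, poly, m)
            mults += 1
    return frobenius(B, m - bits[0], poly, m), mults

# Example in F_256 with the assumed polynomial x^8 + x^4 + x^3 + x + 1.
inverse, mults = inv_accelerated(0x53, 0b100011011, 8)
print(hex(inverse), mults)
```

In a normal basis every call to `frobenius` is a cyclic shift, so only the counted multiplications contribute to the p + q steps.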

5.4 Polynomial Basis Inverters Revisited

Let p(x) be an irreducible all-one polynomial of degree m and let $\vartheta$ be a root of p(x). In the proof of Theorem 3 on page 60 we observed that

$$\left\{\vartheta^{p^i}\right\}_{i=0}^{m-1} = \left\{\vartheta^{i}\right\}_{i=1}^{m}$$

holds. This observation has implications. After reordering the basis elements, the normal basis defined by $\vartheta$ is a multiple of the polynomial basis defined by $\vartheta$. Also, the dual of that normal basis, which in turn is also a normal basis, is a reordered multiple of the dual of the polynomial basis.

[Figure 5.3 (schematic). Blocks: Control; two registers of m memory cells for $\mathbb{F}_2$, each with an Initiate input; Square in $\mathbb{F}_{2^m}$; Power $2^{m-m_0}$; Multiplexer layer; input α, output α⁻¹.]

Figure 5.3: An architecture of an inverter for $\mathbb{F}_{2^m}$ based on Algorithm 5.5. $\alpha^{-1}$ is present at the output after p + q steps.

| Property | Value |
|---|---|
| Number of clock cycles ($n$) | $p + q$ |
| Normalized delay ($t$) | $14.4\log_2 C_\phi + 33.0$ |
| Normalized time ($nt$) | $(14.4\log_2 C_\phi + 33.0)(p + q)$ |
| Normalized area ($a$) | $22C_\phi m + 12qm + 64m$ |
| Normalized power ($p$) | $22C_\phi m + 60m + \frac{56m}{p+q}$ |
| Normalized energy ($np$) | $(22C_\phi + 60)(p + q)m + 56m$ |

Table 5.5: Properties of the inverter in Figure 5.3.

The basis exchange between the normal basis and the polynomial basis is very simple. Define the vectors

$$\theta = \left(\vartheta^0, \vartheta^1, \ldots, \vartheta^{m-1}\right),$$
$$\phi = \left(\vartheta^1, \vartheta^2, \ldots, \vartheta^{m}\right),$$

i.e. $\theta$ is the polynomial basis and $\phi$ is the reordered normal basis. Let $a_i$, i ∈ {0, 1, ..., m − 1}, and $b_i$, i ∈ {0, 1, ..., m − 1}, be the coefficients of $\alpha_\theta$ and $\alpha_\phi$ respectively. Then we have the equalities

$$\alpha = \sum_{i=0}^{m-1} a_i\vartheta^i = \sum_{i=0}^{m-1} b_i\vartheta^{i+1}. \qquad (5.1)$$

Since $\vartheta$ is a root of the all-one polynomial of degree m, we have

$$\vartheta^m = \sum_{i=0}^{m-1} \vartheta^i.$$

Applying this equality to Equation 5.1, we get

$$a_i = \begin{cases} b_{m-1}, & i = 0 \\ b_{m-1} + b_{i-1}, & 0 < i < m \end{cases} \qquad (5.2)$$

$$b_i = \begin{cases} a_0, & i = m - 1 \\ a_0 + a_{i+1}, & 0 \le i < m - 1 \end{cases} \qquad (5.3)$$

These operations can easily be performed either in parallel or sequentially. We can invert $\alpha$ represented as $\alpha_\phi$ by

1. determining $\alpha_\theta$ from $\alpha_\phi$ using Equation 5.2,

2. determining $(\alpha^{-1})_\theta$ using any inverter for polynomial basis representation, and

3. determining $(\alpha^{-1})_\phi$ from $(\alpha^{-1})_\theta$ using Equation 5.3.

The cost of these basis exchanges is small compared to the cost of inverting in a polynomial basis.
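The two basis exchanges in Equations 5.2 and 5.3 amount to a handful of XOR operations. The following is a minimal sketch with coefficient vectors as Python lists of bits; the function names are illustrative, and the helper `evaluate_phi` is only there to cross-check the result against the field $\mathbb{F}_{16}$ built from the all-one polynomial $x^4 + x^3 + x^2 + x + 1$.

```python
def phi_to_theta(b):
    """Equation 5.2: coefficients in phi = (v^1, ..., v^m) to theta = (v^0, ..., v^(m-1))."""
    last = b[-1]
    return [last] + [last ^ b[i - 1] for i in range(1, len(b))]

def theta_to_phi(a):
    """Equation 5.3: the inverse exchange."""
    first = a[0]
    return [first ^ a[i + 1] for i in range(len(a) - 1)] + [first]

def pack(a):
    """Polynomial-basis coefficient list to a bit mask (bit i holds a_i)."""
    return sum(bit << i for i, bit in enumerate(a))

def evaluate_phi(b, poly, m):
    """Compute sum_i b_i * v^(i+1) as a polynomial-basis bit mask, v = x."""
    acc, power = 0, 0b10                 # power starts at v^1
    for bit in b:
        if bit:
            acc ^= power
        power <<= 1                      # multiply by v
        if (power >> m) & 1:
            power ^= poly                # reduce modulo the field polynomial
    return acc

b = [1, 0, 1, 1]
a = phi_to_theta(b)
print(a, theta_to_phi(a))
```

Each output coefficient needs at most one XOR, which is consistent with the remark that the exchanges are cheap compared to the inversion itself.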

The inverter based on the Euclidean algorithm in Section 4.1 has both parallel input and parallel output. We therefore assume that the basis exchanges are obtained by two arrays of adders performing the operations in Equations 5.2 and 5.3 in parallel. Using this approach, we reach the properties in Table 5.6.

The inverter based on the Berlekamp-Massey algorithm in Section 4.2 has serial input and parallel output. We therefore assume that the first basis exchange is done by a single adder performing the operations in Equation 5.2 sequentially, and we assume that the second basis exchange is done by an array of adders performing the operations in Equation 5.3 in parallel. We reach the properties in Table 5.7.

| Property | Value |
|---|---|
| Number of clock cycles ($n$) | $2m - 1$ |
| Normalized delay ($t$) | $4.4\log_2(m + \tfrac{1}{4}) + 26.8 + t_{\mathrm{ctrl}}$ |
| Normalized time ($nt$) | $(2m - 1)\left(4.4\log_2(m + \tfrac{1}{4}) + 26.8 + t_{\mathrm{ctrl}}\right)$ |
| Normalized area ($a$) | $224m - 4$ |
| Normalized power ($p$) | $181.5m + 139.25 + \frac{11.25}{2m-1}$ |
| Normalized energy ($np$) | $371m^2 + 93m - 128$ |

Table 5.6: Properties of the Euclidean inverter architecture for normal basis representation, where the normal basis is generated by an all-one polynomial of degree m. All properties except the number of clock cycles are upper bounds. $t_{\mathrm{ctrl}}$ is the normalized delay of the control logic.

| Property | Value |
|---|---|
| Number of clock cycles ($n$) | $2m$ |
| Normalized delay ($t$) | $14.4\log_2(m) + 54.9$ |
| Normalized time ($nt$) | $28.8m\log_2(m) + 109.8m$ |
| Normalized area ($a$) | $164m - 64$ |
| Normalized power ($p$) | $172m + 9 - \frac{35.6}{m}$ |
| Normalized energy ($np$) | $344m^2 + 18m - 71.2$ |

Table 5.7: Properties of the Berlekamp-Massey inverter architecture for normal basis representation, where the normal basis is generated by an all-one polynomial of degree m. All properties except the number of clock cycles are upper bounds.

The two inverters based on the Gauss-Jordan algorithm can naturally also be used. However, we do not need to perform the basis exchanges explicitly. Instead, we can modify the right hand side of the equation and keep the generation of the matrix intact, by choosing suitable bases for internal use in the architectures.

The inverter based on the Gauss-Jordan algorithm proposed by Hasan and Bhargava [22], briefly described in Section 4.3.2, generates $\alpha_{\theta,\theta}$ from $\alpha_\theta$ and solves the equation $\alpha_{\theta,\theta}(\alpha^{-1})_\theta = 1_\theta$. We need a basis $\tau$ such that $\alpha_{\tau,\phi}$ is generated from $\alpha_\phi$ in the same way as $\alpha_{\theta,\theta}$ is generated from $\alpha_\theta$, in order to keep the matrix generation in the preprocessor unchanged. Let us therefore compare these matrices. By Definition 10 on page 11, we have

$$\alpha_{\tau,\phi} = \left((\vartheta^1\alpha)_\tau, (\vartheta^2\alpha)_\tau, \ldots, (\vartheta^m\alpha)_\tau\right),$$
$$\alpha_{\theta,\theta} = \left((\vartheta^0\alpha)_\theta, (\vartheta^1\alpha)_\theta, \ldots, (\vartheta^{m-1}\alpha)_\theta\right).$$

For the i-th column, 1 ≤ i ≤ m, of these matrices, we have

$$(\vartheta^i\alpha)_\tau = \vartheta^i_{\tau,\tau} 1_{\tau,\phi}\alpha_\phi,$$
$$(\vartheta^{i-1}\alpha)_\theta = \vartheta^{i-1}_{\theta,\theta}\alpha_\theta,$$

respectively, based on Lemmas 2 and 4 on pages 12 and 37. If $\alpha_{\tau,\phi}$ and $\alpha_{\theta,\theta}$ are generated in the same way from $\alpha_\phi$ and $\alpha_\theta$ respectively, we have $\vartheta^i_{\tau,\tau}1_{\tau,\phi} = \vartheta^{i-1}_{\theta,\theta}$, 1 ≤ i ≤ m. Thus, we have $1_{\tau,\phi} = \vartheta^{-1}_{\theta,\theta}$. Using Lemmas 3 and 4 on pages 36 and 37, this gives us $\tau = \phi\,\vartheta_{\theta,\theta}$. It is easily shown that we have

$$\vartheta_{\theta,\theta} = \begin{pmatrix} 0 & \cdots & 0 & 1 \\ 1 & & 0 & 1 \\ & \ddots & & \vdots \\ 0 & & 1 & 1 \end{pmatrix}.$$

The task of the architecture is now to find $(\alpha^{-1})_\phi$ by solving $\alpha_{\tau,\phi}(\alpha^{-1})_\phi = 1_\tau$.

Finally, for the left hand side of the equation we have

$$1_\tau = 1_{\tau,\phi}1_\phi = \begin{pmatrix} 1 & 1 & & 0 \\ \vdots & & \ddots & \\ 1 & 0 & & 1 \\ 1 & 0 & \cdots & 0 \end{pmatrix} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} 0 \\ \vdots \\ 0 \\ 1 \end{pmatrix}.$$

Thus, the only modification that is needed is to change the left hand side of the equation from $1_\theta$ to $1_\tau$. However, we can make use of the fact that p(x) is an all-one polynomial. Since all coefficients of p(x) are one, we do not need to feed these coefficients to the architecture. Instead, we can hardwire the effect of these coefficients, thus removing 2 flip-flops and one $\mathbb{F}_2$-multiplier in each cell in the preprocessor in Figure 4.6 on page 50. We reach the properties in Table 5.8.

    Number of clock cycles (n)    m
    Normalized delay (t)          30
    Normalized time (nt)          30m
    Normalized area (a)           73m^2 + 271m − 194
    Normalized power (p)          113m^2 + 475m − 288 − 136m^{-1}
    Normalized energy (np)        113m^3 + 475m^2 − 288m − 136

Table 5.8: Properties of the systolic Gauss-Jordan inverter architecture due to Hasan and Bhargava for normal basis representation, where the normal basis is generated by an all-one polynomial of degree m. All properties except the number of clock cycles are upper bounds.

Our inverter based on the Gauss-Jordan algorithm in Section 4.3.3 solves a Hankel problem that arises from an internal use of the triangular basis $\sigma$ of $\theta$. The architecture generates $\alpha_{\sigma,\theta}$ from $\alpha_\theta$ and finds $(\alpha^{-1})_\theta$ by solving the Hankel problem $\alpha_{\sigma,\theta}(\alpha^{-1})_\theta = 1_\sigma$. We need a basis $\psi$ such that $\alpha_{\psi,\phi}$ is generated from $\alpha_\phi$ in the same way as $\alpha_{\sigma,\theta}$ is generated from $\alpha_\theta$, in order to keep the matrix generation in the preprocessor unchanged. As before, let us compare these matrices. Again, by Definition 10 on page 11, we have

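$$\alpha_{\psi,\phi} = \left((\vartheta^1\alpha)_\psi, (\vartheta^2\alpha)_\psi, \ldots, (\vartheta^m\alpha)_\psi\right),$$

$$\alpha_{\sigma,\theta} = \left((\vartheta^0\alpha)_\theta, (\vartheta^1\alpha)_\theta, \ldots, (\vartheta^{m-1}\alpha)_\theta\right).$$

For the $i$-th column, $1 \le i \le m$, of these matrices, we have

$$(\vartheta^i\alpha)_\psi = \vartheta^i_{\psi,\psi} 1_{\psi,\phi} \alpha_\phi, \qquad (\vartheta^{i-1}\alpha)_\theta = \vartheta^{i-1}_{\sigma,\sigma} 1_{\sigma,\theta} \alpha_\theta,$$

respectively, again based on Lemmas 2 and 4 on pages 12 and 37. If $\alpha_{\psi,\phi}$ and $\alpha_{\sigma,\theta}$ are generated in the same way from $\alpha_\phi$ and $\alpha_\theta$ respectively, we have $\vartheta^i_{\psi,\psi} 1_{\psi,\phi} = \vartheta^{i-1}_{\sigma,\sigma} 1_{\sigma,\theta}$ for all $i$ in $\{1, \ldots, m\}$. Thus, $1_{\psi,\phi} = \vartheta^{-1}_{\sigma,\sigma} 1_{\sigma,\theta}$ holds. By Lemma 4, we have $1_{\phi,\psi} = 1^{-1}_{\psi,\phi}$. Combining the last two expressions, we get $1_{\phi,\psi} = 1^{-1}_{\sigma,\theta}\vartheta_{\sigma,\sigma}$. Again by Lemma 4, we have $1_{\theta,\sigma} = 1^{-1}_{\sigma,\theta}$. In the proof of Theorem 2 we noted that $\vartheta_{\sigma,\sigma} = \vartheta^T_{\theta,\theta}$ holds. By Lemma 3 and Definition 12, we have

$$1_{\theta,\sigma} = \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \iddots & \\ 1 & & 0 \end{pmatrix}.$$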


    Number of clock cycles (n)    m
    Normalized delay (t)          40
    Normalized time (nt)          40m
    Normalized area (a)           61m^2 + 375m − 192
    Normalized power (p)          93m^2 + 527m − 256 − 68m^{-1}
    Normalized energy (np)        93m^3 + 527m^2 − 256m − 68

Table 5.9: Properties of our systolic Gauss-Jordan inverter architecture for normal basis representation, where the basis is generated by an all-one polynomial of degree m. All properties except the number of clock cycles are upper bounds.

Combining the last four expressions, we get

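$$1_{\phi,\psi} = 1_{\theta,\sigma}\vartheta^T_{\theta,\theta} = \begin{pmatrix} 1 & \cdots & 1 \\ \vdots & \iddots & \\ 1 & & 0 \end{pmatrix} \begin{pmatrix} 0 & 1 & & 0 \\ \vdots & & \ddots & \\ 0 & 0 & & 1 \\ 1 & 1 & \cdots & 1 \end{pmatrix} = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 1 \\ \vdots & \vdots & \iddots & \\ 0 & 1 & & 0 \end{pmatrix}.$$

This matrix can be used to find $\psi$ as $\psi = \phi 1_{\phi,\psi}$ according to Lemma 3 on page 36.

The task of the architecture is now to find $(\alpha^{-1})_\phi$ by solving $\alpha_{\psi,\phi}(\alpha^{-1})_\phi = 1_\psi$.

Finally, for the right-hand side of the equation, we have

$$1_\psi = 1_{\psi,\phi} 1_\phi = 1^{-1}_{\phi,\psi} 1_\phi = \begin{pmatrix} 1 & 0 & \cdots & 0 & 0 \\ 0 & 0 & \cdots & 0 & 1 \\ \vdots & \vdots & \iddots & \iddots & 1 \\ 0 & 0 & \iddots & \iddots & \vdots \\ 0 & 1 & 1 & & 0 \end{pmatrix} \begin{pmatrix} 1 \\ \vdots \\ 1 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \\ 0 \\ \vdots \\ 0 \end{pmatrix}.$$

Thus, the only modification that is needed is to change the right-hand side of the equation from $1_\sigma$ to $1_\psi$. We reach the properties in Table 5.9, which are the properties of the original architecture with $w = m + 1$.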
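The product $1_{\theta,\sigma}\vartheta^T_{\theta,\theta}$ and the resulting vector $1_\psi$ can be checked in the same spirit. The sketch below is an illustration under the assumption that $1_{\theta,\sigma}$ is the anti-triangular all-ones matrix shown above; the helper names are hypothetical. Instead of inverting $1_{\phi,\psi}$, it verifies the equivalent statement that $1_{\phi,\psi}$ maps $(1, 1, 0, \ldots, 0)^T$ to the all-ones vector $1_\phi$.

```python
def matmul_f2(a, b):
    """Matrix product over F_2 (entries are 0 or 1)."""
    n = len(a)
    return [[sum(a[i][k] & b[k][j] for k in range(n)) & 1 for j in range(n)]
            for i in range(n)]


def one_theta_sigma(m):
    """Anti-triangular all-ones matrix: entry (i, j) is 1 when i + j <= m - 1."""
    return [[1 if i + j <= m - 1 else 0 for j in range(m)] for i in range(m)]


def companion_aop_transposed(m):
    """Transpose of the multiplication-by-root matrix of the all-one polynomial."""
    ct = [[0] * m for _ in range(m)]
    for k in range(m - 1):
        ct[k][k + 1] = 1
    ct[m - 1] = [1] * m
    return ct


def one_phi_psi(m):
    return matmul_f2(one_theta_sigma(m), companion_aop_transposed(m))


def apply_f2(mat, vec):
    """Matrix-vector product over F_2."""
    return [sum(x & y for x, y in zip(row, vec)) & 1 for row in mat]
```

Since `apply_f2(one_phi_psi(m), [1, 1] + [0] * (m - 2))` gives the all-ones vector, the vector $(1, 1, 0, \ldots, 0)^T$ is indeed $1^{-1}_{\phi,\psi} 1_\phi$.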

5.5 Properties of the Normal Basis Inverters

The architectures in this chapter can be divided into two categories. The three architectures in Section 5.3 are designed to be used with any normal basis, while the four architectures in Section 5.4 are designed to be used only with normal bases generated by irreducible all-one polynomials. In Figures 5.4 through 5.6 we have plotted the normalized time, area, and energy, respectively, needed for the architectures considered in this chapter, for $2 \le m \le 65$. These plots are based on Tables 5.3 through 5.9, on pages 72 to 80. We have neglected the delay of the control logic in Figure 5.4.

The three architectures in Section 5.3 are based on similar ideas. They perform the exponentiation $\alpha^{2^m-2}$ based on some decomposition of $2^m - 2$. They also have very similar properties, except that Feng's architecture in Figure 5.3 on page 75 needs at most $2\log_2 m$ clock cycles, whereas the architectures in Figure 5.1 and Figure 5.2 on page 71 both need $m - 2$ clock cycles, as seen in Tables 5.3 and 5.4 on page 72, and Table 5.5 on page 75. This also affects the energy needed to complete an inversion. We should therefore prefer the architecture based on Feng's algorithm for all normal bases that are not generated by an irreducible all-one polynomial.

When the normal basis is generated by an irreducible all-one polynomial, we should typically prefer the inverters in Section 5.4, at least if the area is the most important property. For $m > 20$, the inverter based on Feng's algorithm is still the fastest one. Since the inverters in Section 5.4 are based on inverters for polynomial basis representation, they inherit their properties from the corresponding inverters in Chapter 4. The comments made in Section 4.4 therefore hold here as well. Among the inverters in Section 5.4, the fastest architecture is still Hasan and Bhargava's inverter based on the Gauss-Jordan algorithm, while the smallest and least power consuming one is the inverter based on the Berlekamp-Massey algorithm, as we can see in Tables 5.6 through 5.9 on pages 77 through 80. For small $m$, the inverter based on Feng's algorithm consumes less energy than the inverters in Section 5.4. For fairly large $m$, the energy consumption of the inverter based on Feng's algorithm is approximately the same as the energy consumption of the inverters based on the Berlekamp-Massey algorithm and the Euclidean algorithm.


[Plot, titled "Normal basis inverters": nt/m on the vertical axis (0 to 350) versus m on the horizontal axis (0 to 60), with curves M&S-1&2, M&S-3, AOP:EUC, AOP:BM, AOP:GJ-1, and AOP:GJ-2.]

Figure 5.4: Normalized time needed for inversion in $\mathbb{F}_{2^m}$ using normal bases and the architectures considered in this chapter. In all cases we consider normal bases whose Hamming complexity is the smallest known.

M&S-1&2: Inversion based on multiplication and squaring using our modified versions of the ideas of Wang et al. and Mastrovito respectively.
M&S-3: Inversion based on accelerated multiplication and squaring using Feng's idea.
AOP:*: Inversion for the case where the normal basis is generated from an irreducible all-one polynomial, via a basis exchange to the corresponding polynomial basis.
AOP:EUC: Inversion based on the Euclidean algorithm.
AOP:BM: Inversion based on the Berlekamp-Massey algorithm.
AOP:GJ-1: Inversion based on the Gauss-Jordan algorithm using Hasan and Bhargava's preprocessor.
AOP:GJ-2: Inversion based on the Gauss-Jordan algorithm using our preprocessor.

[Plot: a/m^2 on the vertical axis (0 to 250) versus m on the horizontal axis (0 to 60), with curves M&S-1&2, M&S-3, AOP:EUC, AOP:BM, AOP:GJ-1, and AOP:GJ-2.]

Figure 5.5: Normalized area needed for inversion in $\mathbb{F}_{2^m}$ using normal bases and the architectures considered in this chapter. In all cases we consider normal bases whose Hamming complexity is the smallest known.

M&S-1&2: Inversion based on multiplication and squaring using our modified versions of the ideas of Wang et al. and Mastrovito respectively.
M&S-3: Inversion based on accelerated multiplication and squaring using Feng's idea.



[Plot: np/m^3 on the vertical axis (0 to 250) versus m on the horizontal axis (0 to 60), with curves M&S-1, M&S-2, M&S-3, AOP:EUC, AOP:BM, AOP:GJ-1, and AOP:GJ-2.]

Figure 5.6: Normalized energy needed for inversion in $\mathbb{F}_{2^m}$ using normal bases and the architectures considered in this chapter. In all cases we consider normal bases whose Hamming complexity is the smallest known.

M&S-1: Inversion based on multiplication and squaring using our modified version of the idea of Wang et al.
M&S-2: Inversion based on multiplication and squaring using our modified version of the idea of Mastrovito.
M&S-3: Inversion based on accelerated multiplication and squaring using Feng's idea.


Chapter 6

Inversion in Tower Fields

Extension fields can be represented in many different ways. So far we have only considered $\mathbb{F}_{2^m}$ to be an extension of $\mathbb{F}_2$ using either a polynomial or normal basis. However, there are many valid bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$ that are neither polynomial nor normal. It is perfectly possible that some of these non-standard bases can generate smaller, faster, or less energy-consuming architectures. One way to create a non-polynomial, non-normal basis is to make the extension in more than one step. For instance, let $n$ divide $m$, and create $\mathbb{F}_{2^n}$ as an extension of $\mathbb{F}_2$. Then create $\mathbb{F}_{2^m}$ as an extension of $\mathbb{F}_{2^n}$ of degree $m/n$. Such fields are often referred to as composite fields. The resulting basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_2$ is typically non-polynomial and non-normal even if the two extensions are made using polynomial or normal bases.

A tower over the field $F$ is strictly a set $\mathcal{F}$ of finite extensions of $F$ which is totally ordered by inclusion. The term tower field has been used in the literature to denote the largest field in $\mathcal{F}$, assuming that the field is represented by successive extensions using all fields in $\mathcal{F}$, and we adopt this notion. Let $m$ be a power of two, $F = \mathbb{F}_2$ and $\mathcal{F} = \{\mathbb{F}_{2^{2^k}} : k \in \{1, 2, \ldots, \log_2 m\}\}$. Hence we have the finite field $\mathbb{F}_{2^m}$, where $m$ is a power of two, and where the field is constructed by successive extensions of degree two, starting with $\mathbb{F}_2$. The properties of operations in such a field depend on how $\mathbb{F}_{2^m}$ is represented as an extension of degree two of its largest true subfield $\mathbb{F}_{2^{m/2}}$.

6.1 Bases of Tower Fields

First we state three theorems that will help us to characterize bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$. The following theorem is a special case of a well-known theorem established by Pellet [40]. It can be found in any textbook about finite fields, see for instance Lidl and Niederreiter [30, Cor. 3.79, p. 127]. By this result we are able to establish all polynomial bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$.

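Theorem 8 The polynomial $x^2 + x + \epsilon$ is irreducible over $\mathbb{F}_{2^n}$ if and only if we have $\mathrm{Tr}_{\mathbb{F}_{2^n}/\mathbb{F}_2}(\epsilon) = 1$.

A direct consequence of Theorem 8 is that there are $2^{n-1}$ irreducible polynomials of the given form. Furthermore, every polynomial $f(x) = ax^2 + bx + c$, where $a$ and $b$ are nonzero, can be transformed into the given form by the transformation $(a/b^2) f(bx/a) = x^2 + x + ac/b^2$.

If $n$ is even, then $\mathrm{Tr}_{\mathbb{F}_{2^n}/\mathbb{F}_2}(\epsilon)$ is zero whenever we have $\epsilon \in \mathbb{F}_{2^{n/2}}$. If $n$ is a power of two, all true subfields of $\mathbb{F}_{2^n}$ are contained in $\mathbb{F}_{2^{n/2}}$. Therefore if we have $\mathrm{Tr}_{\mathbb{F}_{2^n}/\mathbb{F}_2}(\epsilon) = 1$, then $\epsilon$ is in $\mathbb{F}_{2^n} \setminus \mathbb{F}_{2^{n/2}}$.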
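Theorem 8 and its counting consequence can be illustrated by brute force in a small field. The sketch below assumes $n = 4$ and represents $\mathbb{F}_{16}$ as $\mathbb{F}_2[x]/(x^4 + x + 1)$ with elements encoded as integers 0 to 15; this representation and the function names are choices made for the example, not taken from the thesis. A polynomial of degree two is irreducible exactly when it has no root in the field, so irreducibility is tested by searching for roots.

```python
N = 4
MODULUS = 0b10011  # x^4 + x + 1, irreducible over F_2 (assumed representation)


def gf_mul(a, b):
    """Multiply two elements of F_16, encoded as 4-bit integers."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> N) & 1:
            a ^= MODULUS
    return r


def trace(e):
    """Trace from F_16 down to F_2: e + e^2 + e^4 + e^8 (returns 0 or 1)."""
    t, x = 0, e
    for _ in range(N):
        t ^= x
        x = gf_mul(x, x)
    return t


def has_root(e):
    """True if x^2 + x + e has a root in F_16, i.e. is reducible."""
    return any((gf_mul(x, x) ^ x) == e for x in range(1 << N))


irreducible_constants = [e for e in range(1 << N) if not has_root(e)]
```

Here `irreducible_constants` contains $2^{n-1} = 8$ elements, and each of them has trace one.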

The following theorem is a special case of a theorem of Perlis [41, Th. 1].

Theorem 9 Let $m$ be an even integer and let $p(x)$ be an irreducible polynomial over $\mathbb{F}_{2^{m/2}}$ of degree 2, with $\sigma$ as one of its roots. Then $\sigma$ defines a normal basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$.

Theorems 8 and 9 give us all polynomial and normal bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$. The following theorem essentially gives us all bases that are not related to the above ones.

Theorem 10 Let $m$ be an even integer, $m > 2$. Let $p(x) = x^2 + x + p_0$ and $q(x) = x^2 + x + q_0$ be distinct irreducible polynomials over $\mathbb{F}_{2^{m/2}}$, with roots $\sigma$ and $\theta$ respectively. Then $\{\sigma, \theta\}$ is a basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$.

Proof: Since $m$ is an even integer, we know that $\mathbb{F}_{2^{m/2}}$ exists. Furthermore, since we also have $m > 2$, Theorem 8 assures that there are at least two distinct irreducible polynomials of the given form. Assume that $\sigma$ and $\theta$ are linearly dependent over $\mathbb{F}_{2^{m/2}}$. Then there is an element $a \in \mathbb{F}^*_{2^{m/2}}$ such that $\sigma = a\theta$ holds, and we have

$$p(a\theta) = a^2\theta^2 + a\theta + p_0 = 0.$$

But $\theta$ is a root of $q(x)$ and hence $\theta^2 = \theta + q_0$ holds. Identifying in the expression above we get

$$a^2(\theta + q_0) + a\theta + p_0 = 0,$$

which gives us the two equations

$$a^2 + a = 0,$$
$$a^2 q_0 + p_0 = 0,$$

using the fact that $\theta$ and 1 are linearly independent over $\mathbb{F}_{2^{m/2}}$. The only nonzero solution is $a = 1$, $q_0 = p_0$. By assumption we have $q_0 \ne p_0$. Therefore $\sigma$ and $\theta$ cannot be linearly dependent over $\mathbb{F}_{2^{m/2}}$. Hence $\{\sigma, \theta\}$ is a basis of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$. □

Now we are ready to partition the set of all bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$ into three classes. Let $s$ and $t$ be elements of $\mathbb{F}^*_{2^{m/2}}$. Also let $\sigma$ and $\theta$ be roots of the distinct irreducible polynomials $p(x) = x^2 + x + p_0$ and $q(x) = x^2 + x + q_0$ respectively. The set of possible bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$ can be divided into the following three types.

• Bases of the form $\{s, t\theta\}$, which we call bases of type I.

• Bases of the form $\{s\sigma, t\sigma^{2^{m/2}}\}$, which we call bases of type II.

• Bases of the form $\{s\sigma, t\theta\}$, which we call bases of type III.

The partition is chosen so that arithmetic operations in $\mathbb{F}_{2^m}$ are performed in basically the same way using any basis within one class. Among the bases of type I are all polynomial bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$, and among the bases of type II are all normal bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$.

From Theorem 8 we conclude that there are $2^{m/2-1}$ irreducible polynomials of the form $x^2 + x + \epsilon$ over $\mathbb{F}_{2^{m/2}}$. Therefore there are $\binom{2^{m/2-1}}{2}$ pairs of distinct irreducible polynomials $p(x)$ and $q(x)$. Theorem 9 assures that each polynomial has two linearly independent roots, and Theorem 10 assures that roots of distinct polynomials are linearly independent. Finally there are $2^{m/2} - 1$ possible choices each of $s$ and $t$. Hence there are

$$2^{m/2}\left(2^{m/2} - 1\right)^2 \text{ bases of type I,}$$
$$2^{m/2-1}\left(2^{m/2} - 1\right)^2 \text{ bases of type II, and}$$
$$\binom{2^{m/2-1}}{2}\, 2^2 \left(2^{m/2} - 1\right)^2 \text{ bases of type III.}$$

Summing these three numbers we get $(2^m - 1)\left(2^m - 2^{m/2}\right)/2$ bases of types I, II, and III, which is exactly the number of bases of $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$. We should note that there are no bases of type III for $m = 2$.
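The summation can be checked with a few lines of integer arithmetic. The following sketch simply evaluates the three counts and the closed form for several even values of $m$; the function names are illustrative.

```python
from math import comb


def basis_counts(m):
    """Numbers of bases of type I, II, and III of F_{2^m} over F_{2^(m/2)}."""
    q = 2 ** (m // 2)                       # size of the subfield
    type1 = q * (q - 1) ** 2
    type2 = (q // 2) * (q - 1) ** 2
    type3 = comb(q // 2, 2) * 4 * (q - 1) ** 2
    return type1, type2, type3


def total_bases(m):
    """Closed form (2^m - 1)(2^m - 2^(m/2)) / 2 for the number of bases."""
    q = 2 ** (m // 2)
    return (q * q - 1) * (q * q - q) // 2
```

For example, `basis_counts(4)` gives 36, 18, and 36 bases of the three types, which sum to `total_bases(4)`, that is 90, and `basis_counts(2)` has no bases of type III.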

6.2 Arithmetic Using Bases of Type I

Let $s$ and $t$ be nonzero elements of $\mathbb{F}_{2^{m/2}}$, and let $\theta$ be a root of the irreducible polynomial $q(x) = x^2 + x + q_0$ over $\mathbb{F}_{2^{m/2}}$. Then $\{s, t\theta\}$ is a basis of type I as mentioned in Section 6.1.

6.2.1 Inversion

Let $\alpha$, $\beta$, and $\gamma$ be elements of $\mathbb{F}_{2^m}$. Let $(a_0, a_1)$, $(b_0, b_1)$, and $(c_0, c_1)$ be the representations in the chosen basis of $\alpha$, $\beta$, and $\gamma$ respectively. That is, we have $\alpha = a_0 s + a_1 t\theta$, $\beta = b_0 s + b_1 t\theta$, and $\gamma = c_0 s + c_1 t\theta$. Assuming that $\gamma = \alpha\beta$ holds we get

$$c_0 = s a_0 b_0 + t^2 s^{-1} q_0 a_1 b_1, \tag{6.1}$$
$$c_1 = s a_0 b_1 + s a_1 b_0 + t a_1 b_1. \tag{6.2}$$

Now, assuming that we have $\gamma = 1$, we get $c_0 = s^{-1}$ and $c_1 = 0$. Solving Equations 6.1 and 6.2 for this case gives us

$$b_0 = \frac{a_0 + t s^{-1} \cdot a_1}{(s a_0 + t q_0^{1/2} \cdot a_1)^2 + st \cdot a_0 \cdot a_1}, \tag{6.3}$$
$$b_1 = \frac{a_1}{(s a_0 + t q_0^{1/2} \cdot a_1)^2 + st \cdot a_0 \cdot a_1}. \tag{6.4}$$

These equations define the inverter in Figure 6.1a. It uses at most four multiplications by constants in $\mathbb{F}_{2^{m/2}}$. However, for the special case $s = t = 1$ most of the constants in Equations 6.3 and 6.4 are 1 and we get the architecture in Figure 6.1c using only one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$.

[Block diagrams (a) to (d), not reproduced: inputs $a_0$, $a_1$, outputs $b_0$, $b_1$, each built from constant multipliers, squaring in $\mathbb{F}_{2^{m/2}}$, and inversion in $\mathbb{F}_{2^{m/2}}$.]

Figure 6.1: Architectures for inversion in $\mathbb{F}_{2^m}$ using a basis of type I over $\mathbb{F}_{2^{m/2}}$. (a) Architecture A based on Equations 6.3 and 6.4 for arbitrary $s$ and $t$. (b) Architecture B based on Equations 6.5 and 6.6 for arbitrary $s$ and $t$. (c), (d) Architectures A and B for $s = t = 1$.
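Equations 6.1 through 6.4 can be exercised exhaustively in the smallest non-trivial case. The sketch below assumes $m = 4$, so that the subfield is $\mathbb{F}_4 = \mathbb{F}_2[\omega]/(\omega^2 + \omega + 1)$ with elements encoded as the integers 0 to 3; the encoding and all function names are assumptions made for the example. In $\mathbb{F}_4$ both the inverse and the square root of a nonzero element equal its square, and the constants $q_0$ with trace one are 2 and 3.

```python
def mul4(a, b):
    """Multiply in F_4 = F_2[w]/(w^2 + w + 1); elements are the integers 0..3."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    return r ^ 0b111 if r & 0b100 else r   # reduce with w^2 = w + 1


def prod4(*xs):
    """Product of several F_4 elements."""
    r = 1
    for x in xs:
        r = mul4(r, x)
    return r


def inv4(a):
    return mul4(a, a)       # a^3 = 1 for nonzero a, so a^-1 = a^2


def sqrt4(a):
    return mul4(a, a)       # a^4 = a, so a^(1/2) = a^2


def mul_type1(a, b, s, t, q0):
    """Product in the basis {s, t*theta}, Equations 6.1 and 6.2."""
    (a0, a1), (b0, b1) = a, b
    c0 = prod4(s, a0, b0) ^ prod4(t, t, inv4(s), q0, a1, b1)
    c1 = prod4(s, a0, b1) ^ prod4(s, a1, b0) ^ prod4(t, a1, b1)
    return c0, c1


def inv_type1(a, s, t, q0):
    """Inverse in the basis {s, t*theta}, Equations 6.3 and 6.4."""
    a0, a1 = a
    root = mul4(s, a0) ^ prod4(t, sqrt4(q0), a1)
    den_inv = inv4(mul4(root, root) ^ prod4(s, t, a0, a1))
    b0 = mul4(a0 ^ prod4(t, inv4(s), a1), den_inv)
    b1 = mul4(a1, den_inv)
    return b0, b1
```

Multiplying any nonzero pair by its computed inverse with `mul_type1` should return the representation of 1, which is $(s^{-1}, 0)$ in this basis.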

We get an alternative inverter by rewriting Equations 6.3 and 6.4 as

$$b_0 = \frac{a_0 + t s^{-1} \cdot a_1}{s^2 a_0 \cdot (a_0 + t s^{-1} \cdot a_1) + t^2 q_0 \cdot a_1^2}, \tag{6.5}$$
$$b_1 = \frac{a_1}{s^2 a_0 \cdot (a_0 + t s^{-1} \cdot a_1) + t^2 q_0 \cdot a_1^2}. \tag{6.6}$$

These equations define the inverter in Figure 6.1b, which uses at most three multiplications by constants in $\mathbb{F}_{2^{m/2}}$. However, for the special case $s = t = 1$ most of the constants in Equations 6.5 and 6.6 are 1 and we get the architecture in Figure 6.1d using only one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$. This last architecture has previously been published by Morii and Kasahara [36].

[Block diagrams (a) and (b), not reproduced: inputs $a_0$, $a_1$, $b_0$, $b_1$, outputs $c_0$, $c_1$.]

Figure 6.2: Architectures for multiplication in $\mathbb{F}_{2^m}$ using a basis of type I over $\mathbb{F}_{2^{m/2}}$ based on Equations 6.7 and 6.8. (a) Architecture for arbitrary $s$ and $t$. (b) Architecture for the special case where we have $s = t = 1$.

[Block diagrams (a) and (b), not reproduced: inputs $a_0$, $a_1$, outputs $c_0$, $c_1$.]

Figure 6.3: Architectures for multiplication by a constant in $\mathbb{F}_{2^m}$ using a basis of type I over $\mathbb{F}_{2^{m/2}}$. (a) Architecture based on Equations 6.9 and 6.10 for arbitrary $s$ and $t$. (b) Architecture based on Equations 6.11 and 6.12 for the special case where we have $s = t$.

We assumed that $m$ is a power of two and that $\mathbb{F}_{2^m}$ is constructed by successive extensions of degree two, starting with $\mathbb{F}_2$. For the analysis we assume that all intermediate fields are represented over their largest subfields using similar bases. Hence we need architectures for multiplication, squaring, and multiplication by a constant using the same type of basis. For simplicity we derive architectures for these operations in $\mathbb{F}_{2^m}$ over $\mathbb{F}_{2^{m/2}}$.

6.2.2 Multiplication

Multiplication in $\mathbb{F}_{2^m}$ can be performed in the following way. Rewrite Equations 6.1 and 6.2 as

$$c_0 = s^{-1} \cdot s a_0 \cdot s b_0 + s^{-1} q_0 \cdot t a_1 \cdot t b_1, \tag{6.7}$$
$$c_1 = t^{-1} \cdot \left((s a_0 + t a_1) \cdot (s b_0 + t b_1) + s a_0 \cdot s b_0\right). \tag{6.8}$$

These equations define the multiplier in Figure 6.2a using at most seven multiplications by constants in $\mathbb{F}_{2^{m/2}}$. As for inversion we are interested in the special case $s = t = 1$. For this case most of the constants in Equations 6.7 and 6.8 are 1, and hence the corresponding multiplications by these constants are replaced by wires. In this way we get the simplified architecture in Figure 6.2b using only one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$, previously published by Paar [39].
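The rewriting in Equations 6.7 and 6.8 trades one of the four subfield products of two variables in Equations 6.1 and 6.2 for extra additions, in the manner of Karatsuba multiplication. The sketch below compares the two forms exhaustively, again assuming $m = 4$ with $\mathbb{F}_4$ encoded as the integers 0 to 3; the encoding and the function names are assumptions made for the example.

```python
def mul4(a, b):
    """Multiply in F_4 = F_2[w]/(w^2 + w + 1); elements are the integers 0..3."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    return r ^ 0b111 if r & 0b100 else r


def prod4(*xs):
    r = 1
    for x in xs:
        r = mul4(r, x)
    return r


def inv4(a):
    return mul4(a, a)       # valid for nonzero a in F_4


def mul_direct(a, b, s, t, q0):
    """Equations 6.1 and 6.2: four products of two variable operands."""
    (a0, a1), (b0, b1) = a, b
    c0 = prod4(s, a0, b0) ^ prod4(t, t, inv4(s), q0, a1, b1)
    c1 = prod4(s, a0, b1) ^ prod4(s, a1, b0) ^ prod4(t, a1, b1)
    return c0, c1


def mul_three_products(a, b, s, t, q0):
    """Equations 6.7 and 6.8: three products of two variable operands."""
    (a0, a1), (b0, b1) = a, b
    x0, x1 = mul4(s, a0), mul4(t, a1)       # scaled inputs s*a0, t*a1
    y0, y1 = mul4(s, b0), mul4(t, b1)       # scaled inputs s*b0, t*b1
    p00 = mul4(x0, y0)
    p11 = mul4(x1, y1)
    pss = mul4(x0 ^ x1, y0 ^ y1)
    c0 = mul4(inv4(s), p00) ^ prod4(inv4(s), q0, p11)
    c1 = mul4(inv4(t), pss ^ p00)
    return c0, c1
```

The seven constants counted in the text appear here as the four input scalings and the three output scalings $s^{-1}$, $s^{-1}q_0$, and $t^{-1}$.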

Multiplication by a constant is naturally less complex than multiplication of two variable elements. Assume that $\beta$ is the constant element and rewrite Equations 6.1 and 6.2 as

$$c_0 = s b_0 \cdot a_0 + t^2 s^{-1} q_0 b_1 \cdot a_1, \tag{6.9}$$
$$c_1 = s b_1 \cdot a_0 + (s b_0 + t b_1) \cdot a_1. \tag{6.10}$$

This gives us the trivial architecture in Figure 6.3a using four multiplications by constants and two additions in $\mathbb{F}_{2^{m/2}}$. An alternative architecture can be derived if we have $s = t$. We rewrite Equations 6.9 and 6.10 for this special case as

$$c_0 = s b_0 \cdot a_0 + s q_0 b_1 \cdot a_1, \tag{6.11}$$
$$c_1 = s b_0 \cdot a_0 + s(b_0 + b_1) \cdot (a_0 + a_1), \tag{6.12}$$

and get the architecture in Figure 6.3b using three multiplications by constants and three additions in $\mathbb{F}_{2^{m/2}}$.

6.2.3 Squaring

Squaring can of course be performed using the suggested multiplier, but a less complex squarer can be obtained by using a dedicated architecture for squaring. Setting $\beta = \alpha$ in Equations 6.1 and 6.2 we get

$$c_0 = s \cdot a_0^2 + t^2 s^{-1} q_0 \cdot a_1^2, \tag{6.13}$$
$$c_1 = t \cdot a_1^2, \tag{6.14}$$

which gives us the architecture in Figure 6.4a using three multiplications by constants in $\mathbb{F}_{2^{m/2}}$. As above we are interested in the special case where we have $s = t = 1$. Here most of the constants in Equations 6.13 and 6.14 are 1. We get the architecture in Figure 6.4b, which uses only one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$.

[Block diagrams (a) and (b), not reproduced: inputs $a_0$, $a_1$, outputs $c_0$, $c_1$, built from constant multipliers and squaring in $\mathbb{F}_{2^{m/2}}$.]

Figure 6.4: Architectures for squaring in $\mathbb{F}_{2^m}$ using a basis of type I over $\mathbb{F}_{2^{m/2}}$ based on Equations 6.13 and 6.14. (a) Architecture for arbitrary $s$ and $t$. (b) Architecture for the special case where we have $s = t = 1$.

[Block diagrams (a) and (b), not reproduced: inputs $a_0$, $a_1$, outputs $b_0$, $b_1$, built from constant multipliers, squaring in $\mathbb{F}_{2^{m/2}}$, and inversion in $\mathbb{F}_{2^{m/2}}$.]

Figure 6.5: Architectures for inversion in $\mathbb{F}_{2^m}$ using a basis of type II over $\mathbb{F}_{2^{m/2}}$ based on Equations 6.17 and 6.18. (a) Architecture for arbitrary $s$ and $t$. (b) Architecture for the special case where we have $s = t = 1$.

6.3 Arithmetic Using Bases of Type II

Let $s$ and $t$ be nonzero elements of $\mathbb{F}_{2^{m/2}}$, and let $\sigma$ be a root of the irreducible polynomial $p(x) = x^2 + x + p_0$ over $\mathbb{F}_{2^{m/2}}$. Then $\{s\sigma, t\sigma^{2^{m/2}}\}$ is a basis of type II as mentioned in Section 6.1.

6.3.1 Inversion

Let $\alpha$, $\beta$, and $\gamma$ be elements of $\mathbb{F}_{2^m}$. Let $(a_0, a_1)$, $(b_0, b_1)$, and $(c_0, c_1)$ be the representations in the chosen basis of $\alpha$, $\beta$, and $\gamma$ respectively. That is, we have $\alpha = a_0 s\sigma + a_1 t\sigma^{2^{m/2}}$, $\beta = b_0 s\sigma + b_1 t\sigma^{2^{m/2}}$, and $\gamma = c_0 s\sigma + c_1 t\sigma^{2^{m/2}}$. Assuming $\gamma = \alpha\beta$ as in Section 6.2 we get

$$c_0 = s(1 + p_0) a_0 b_0 + t p_0 a_0 b_1 + t p_0 a_1 b_0 + t^2 s^{-1} p_0 a_1 b_1, \tag{6.15}$$
$$c_1 = s^2 t^{-1} p_0 a_0 b_0 + s p_0 a_0 b_1 + s p_0 a_1 b_0 + t(1 + p_0) a_1 b_1. \tag{6.16}$$

Assuming $\gamma = 1$, we get $c_0 = s^{-1}$ and $c_1 = t^{-1}$. Solving Equations 6.15 and 6.16 for this case gives us

$$b_0 = \frac{s^{-1} \cdot t a_1}{p_0 \cdot (s a_0 + t a_1)^2 + s a_0 \cdot t a_1}, \tag{6.17}$$
$$b_1 = \frac{t^{-1} \cdot s a_0}{p_0 \cdot (s a_0 + t a_1)^2 + s a_0 \cdot t a_1}. \tag{6.18}$$

These equations define the inverter in Figure 6.5a, using at most five multiplications by constants in $\mathbb{F}_{2^{m/2}}$. For the special case $s = t = 1$ we need only one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$, and we get the architecture in Figure 6.5b.
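As for type I, Equations 6.15 through 6.18 can be tested exhaustively for $m = 4$. The sketch assumes the same integer encoding of $\mathbb{F}_4$ as a subfield (elements 0 to 3 with $\omega^2 = \omega + 1$); the encoding and names are illustrative assumptions. In this basis the element 1 has the representation $(s^{-1}, t^{-1})$.

```python
def mul4(a, b):
    """Multiply in F_4 = F_2[w]/(w^2 + w + 1); elements are the integers 0..3."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    return r ^ 0b111 if r & 0b100 else r


def prod4(*xs):
    r = 1
    for x in xs:
        r = mul4(r, x)
    return r


def inv4(a):
    return mul4(a, a)       # valid for nonzero a in F_4


def mul_type2(a, b, s, t, p0):
    """Product in the basis {s*sigma, t*sigma^(2^(m/2))}, Equations 6.15 and 6.16."""
    (a0, a1), (b0, b1) = a, b
    c0 = (prod4(s, 1 ^ p0, a0, b0) ^ prod4(t, p0, a0, b1)
          ^ prod4(t, p0, a1, b0) ^ prod4(t, t, inv4(s), p0, a1, b1))
    c1 = (prod4(s, s, inv4(t), p0, a0, b0) ^ prod4(s, p0, a0, b1)
          ^ prod4(s, p0, a1, b0) ^ prod4(t, 1 ^ p0, a1, b1))
    return c0, c1


def inv_type2(a, s, t, p0):
    """Inverse in the same basis, Equations 6.17 and 6.18."""
    a0, a1 = a
    x, y = mul4(s, a0), mul4(t, a1)
    den_inv = inv4(mul4(p0, mul4(x ^ y, x ^ y)) ^ mul4(x, y))
    return prod4(inv4(s), y, den_inv), prod4(inv4(t), x, den_inv)
```

Note how the inverse swaps the roles of the two scaled coordinates $s a_0$ and $t a_1$, which reflects conjugation in a normal basis.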

As in Section 6.2 we assume that any true subfield of $\mathbb{F}_{2^m}$ is represented over its largest true subfield using a similar basis. Hence we need architectures for multiplication, squaring, and multiplication by a constant using the same type of basis.

6.3.2 Multiplication

Rewriting Equations 6.15 and 6.16, we get

$$c_0 = s^{-1} p_0 \cdot (s a_0 + t a_1) \cdot (s b_0 + t b_1) + s^{-1} \cdot s a_0 \cdot s b_0, \tag{6.19}$$
$$c_1 = t^{-1} p_0 \cdot (s a_0 + t a_1) \cdot (s b_0 + t b_1) + t^{-1} \cdot t a_1 \cdot t b_1. \tag{6.20}$$

These equations define the multiplier in Figure 6.6a, using at most eight multiplications by constants in $\mathbb{F}_{2^{m/2}}$. For the special case where we have $s = t = 1$, we rewrite Equations 6.19 and 6.20 as

$$c_0 = p_0 \cdot (a_0 + a_1) \cdot (b_0 + b_1) + a_0 \cdot b_0, \tag{6.21}$$
$$c_1 = p_0 \cdot (a_0 + a_1) \cdot (b_0 + b_1) + a_1 \cdot b_1, \tag{6.22}$$

which defines the architecture in Figure 6.6b, using only one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$.

[Block diagrams (a) and (b), not reproduced: inputs $a_0$, $a_1$, $b_0$, $b_1$, outputs $c_0$, $c_1$.]

Figure 6.6: Architectures for multiplication in $\mathbb{F}_{2^m}$ using a basis of type II over $\mathbb{F}_{2^{m/2}}$. (a) Architecture for arbitrary $s$ and $t$ based on Equations 6.19 and 6.20. (b) Architecture for $s = t = 1$ based on Equations 6.21 and 6.22.

[Block diagrams (a) and (b), not reproduced: inputs $a_0$, $a_1$, outputs $c_0$, $c_1$.]

Figure 6.7: Architectures for multiplication by a constant in $\mathbb{F}_{2^m}$ using a basis of type II over $\mathbb{F}_{2^{m/2}}$. (a) Architecture based on Equations 6.23 and 6.24 for arbitrary $s$ and $t$. (b) Architecture based on Equations 6.25 and 6.26 for the special case where we have $s = t$.

[Block diagrams (a) to (c), not reproduced: inputs $a_0$, $a_1$, outputs $c_0$, $c_1$, built from constant multipliers and squaring in $\mathbb{F}_{2^{m/2}}$.]

Figure 6.8: Architectures for squaring in $\mathbb{F}_{2^m}$ using a basis of type II over $\mathbb{F}_{2^{m/2}}$. (a) Architecture for arbitrary $s$ and $t$ based on Equations 6.27 and 6.28. (b), (c) Architectures for the special cases where we have $s = t$ and $s = t = 1$ respectively based on Equations 6.29 and 6.30.

For multiplication by a constant we assume that $\beta$ is the constant element. Rewriting Equations 6.15 and 6.16 as

$$c_0 = \left(s(1 + p_0) b_0 + t p_0 b_1\right) \cdot a_0 + \left(t p_0 b_0 + t^2 s^{-1} p_0 b_1\right) \cdot a_1, \tag{6.23}$$
$$c_1 = \left(s^2 t^{-1} p_0 b_0 + s p_0 b_1\right) \cdot a_0 + \left(s p_0 b_0 + t(1 + p_0) b_1\right) \cdot a_1, \tag{6.24}$$

gives us the architecture in Figure 6.7a, using at most two adders and four multiplications by constants in $\mathbb{F}_{2^{m/2}}$. This architecture is the same as for type I bases. The only difference is that the constants differ. As for type I bases we can get an alternative architecture if we have $s = t$. Rewriting Equations 6.23 and 6.24 for this special case as

$$c_0 = s p_0 (b_0 + b_1) \cdot (a_0 + a_1) + s b_0 \cdot a_0, \tag{6.25}$$
$$c_1 = s p_0 (b_0 + b_1) \cdot (a_0 + a_1) + s b_1 \cdot a_1, \tag{6.26}$$

we get the architecture in Figure 6.7b, using at most three adders and three multiplications by constants in $\mathbb{F}_{2^{m/2}}$.

6.3.3 Squaring

For squaring we set $\beta = \alpha$ in Equations 6.15 and 6.16. Then we have

$$c_0 = s(p_0 + 1) a_0^2 + t^2 s^{-1} p_0 a_1^2, \tag{6.27}$$
$$c_1 = s^2 t^{-1} p_0 a_0^2 + t(p_0 + 1) a_1^2, \tag{6.28}$$

which gives us the architecture in Figure 6.8a, using two adders and at most four multiplications by constants in $\mathbb{F}_{2^{m/2}}$. We can get an alternative architecture for $s = t$. Rewrite Equations 6.27 and 6.28 as

$$c_0 = s p_0 (a_0^2 + a_1^2) + s a_0^2, \tag{6.29}$$
$$c_1 = s p_0 (a_0^2 + a_1^2) + s a_1^2, \tag{6.30}$$

for this special case. These equations define the architecture in Figure 6.8b, using three adders and at most three multiplications by constants in $\mathbb{F}_{2^{m/2}}$. For the case where we have $s = t = 1$ this architecture simplifies to the architecture in Figure 6.8c, which uses three adders and only one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$.


6.4 Arithmetic Using Bases of Type III

Let $s$ and $t$ be nonzero elements of $\mathbb{F}_{2^{m/2}}$. Also let $p(x) = x^2 + x + p_0$ and $q(x) = x^2 + x + q_0$ be distinct irreducible polynomials over $\mathbb{F}_{2^{m/2}}$. Finally let $\sigma$ be a root of $p(x)$ and let $\theta$ be a root of $q(x)$. Then $\{s\sigma, t\theta\}$ is a basis of type III as mentioned in Section 6.1.

6.4.1 Inversion

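Let $\alpha$, $\beta$, and $\gamma$ be elements of $\mathbb{F}_{2^m}$. Let $(a_0, a_1)$, $(b_0, b_1)$, and $(c_0, c_1)$ be the representations in the chosen basis of $\alpha$, $\beta$, and $\gamma$ respectively. It follows that we have $\alpha = a_0 s\sigma + a_1 t\theta$, $\beta = b_0 s\sigma + b_1 t\theta$, and $\gamma = c_0 s\sigma + c_1 t\theta$. Assuming that $\gamma = \alpha\beta$ holds we get

$$\gamma = a_0 b_0 s^2 \sigma^2 + (a_0 b_1 + a_1 b_0) st\,\sigma\theta + a_1 b_1 t^2 \theta^2. \tag{6.31}$$

We need to express the products $\sigma^2$, $\sigma\theta$, and $\theta^2$ in our basis. First we know that $\sigma$ and $\theta$ are roots of $p(x)$ and $q(x)$ respectively. Therefore we have

$$\sigma^2 = \sigma + p_0, \tag{6.32}$$
$$\theta^2 = \theta + q_0. \tag{6.33}$$

Using Equations 6.32 and 6.33, we can rewrite Equation 6.31 as

$$\gamma = a_0 b_0 s^2 (\sigma + p_0) + (a_0 b_1 + a_1 b_0) st\,\sigma\theta + a_1 b_1 t^2 (\theta + q_0). \tag{6.34}$$

Now we need to express 1 and $\sigma\theta$ in our basis.

Define the polynomial $r(x) = x^2 + x + p_0 + q_0$. We have

$$\mathrm{Tr}_{\mathbb{F}_{2^{m/2}}/\mathbb{F}_2}(p_0 + q_0) = \mathrm{Tr}_{\mathbb{F}_{2^{m/2}}/\mathbb{F}_2}(p_0) + \mathrm{Tr}_{\mathbb{F}_{2^{m/2}}/\mathbb{F}_2}(q_0) = 1 + 1 = 0,$$

and hence, according to Theorem 8, $r(x)$ is reducible over $\mathbb{F}_{2^{m/2}}$. Moreover, we can easily verify that $\sigma + \theta$ is a root of $r(x)$, the other being $1 + \sigma + \theta$. Therefore $\sigma + \theta$ is a nonzero element of $\mathbb{F}_{2^{m/2}}$. Define $d \triangleq \sigma + \theta$. Then 1 can be expressed as

$$1 = d^{-1}\sigma + d^{-1}\theta, \tag{6.35}$$

and our objective to express 1 in our basis is reached. By choosing $p_0$ and $q_0$ properly we can place $d$ in any true subfield of $\mathbb{F}_{2^m}$ except $\mathbb{F}_2$.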

Our irreducible polynomials can be expressed as $p(x) = (x + \sigma)(x + \sigma^{2^{m/2}})$ and $q(x) = (x + \theta)(x + \theta^{2^{m/2}})$. Since the coefficient of $x$ by definition is 1 in both polynomials we have

$$1 = \sigma + \sigma^{2^{m/2}} = \theta + \theta^{2^{m/2}}. \tag{6.36}$$

Using Equations 6.35 and 6.36 we find the equality

$$\sigma\theta = \frac{q_0}{d}\sigma + \frac{p_0}{d}\theta, \tag{6.37}$$

and our objective to express $\sigma\theta$ in our basis is reached.

Using both Equations 6.35 and 6.37 in Equation 6.34 and identifying in $\gamma = c_0 s\sigma + c_1 t\theta$ we get

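$$c_0 = s\left(1 + \frac{p_0}{d}\right) a_0 b_0 + t\frac{q_0}{d} a_0 b_1 + t\frac{q_0}{d} a_1 b_0 + \frac{t^2}{s}\frac{q_0}{d} a_1 b_1, \tag{6.38}$$
$$c_1 = \frac{s^2}{t}\frac{p_0}{d} a_0 b_0 + s\frac{p_0}{d} a_0 b_1 + s\frac{p_0}{d} a_1 b_0 + t\left(1 + \frac{q_0}{d}\right) a_1 b_1. \tag{6.39}$$

Assuming $\gamma = 1$, we get $c_0 = (ds)^{-1}$ and $c_1 = (dt)^{-1}$. Solving Equations 6.38 and 6.39 for this case gives us

$$b_0 = \frac{(1 + d^{-1}) \cdot a_0 + d^{-1} t s^{-1} \cdot a_1}{(s p_0^{1/2} \cdot a_0 + t q_0^{1/2} \cdot a_1)^2 + std \cdot a_0 \cdot a_1}, \tag{6.40}$$
$$b_1 = \frac{d^{-1} s t^{-1} \cdot a_0 + (1 + d^{-1}) \cdot a_1}{(s p_0^{1/2} \cdot a_0 + t q_0^{1/2} \cdot a_1)^2 + std \cdot a_0 \cdot a_1}. \tag{6.41}$$

These equations define the inverter in Figure 6.9a, using at most seven multiplications by constants in $\mathbb{F}_{2^{m/2}}$.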
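Equations 6.38 through 6.41 can also be tested exhaustively in the smallest field that admits a basis of type III, namely $\mathbb{F}_{16}$ over $\mathbb{F}_4$. The sketch assumes $\mathbb{F}_4$ encoded as the integers 0 to 3 with $\omega^2 = \omega + 1$, so that $\{p_0, q_0\} = \{2, 3\}$ and $d$ is a root of $x^2 + x + 1$, that is 2 or 3; the encoding and names are assumptions made for the example. The element 1 has the representation $((ds)^{-1}, (dt)^{-1})$.

```python
def mul4(a, b):
    """Multiply in F_4 = F_2[w]/(w^2 + w + 1); elements are the integers 0..3."""
    r = 0
    for i in range(2):
        if (b >> i) & 1:
            r ^= a << i
    return r ^ 0b111 if r & 0b100 else r


def prod4(*xs):
    r = 1
    for x in xs:
        r = mul4(r, x)
    return r


def inv4(a):
    return mul4(a, a)       # a^-1 = a^2 for nonzero a in F_4


def sqrt4(a):
    return mul4(a, a)       # a^(1/2) = a^2 in F_4


def mul_type3(a, b, s, t, p0, q0, d):
    """Product in the basis {s*sigma, t*theta}, Equations 6.38 and 6.39."""
    (a0, a1), (b0, b1) = a, b
    pd, qd = mul4(p0, inv4(d)), mul4(q0, inv4(d))    # p0/d and q0/d
    c0 = (prod4(s, 1 ^ pd, a0, b0) ^ prod4(t, qd, a0, b1)
          ^ prod4(t, qd, a1, b0) ^ prod4(t, t, inv4(s), qd, a1, b1))
    c1 = (prod4(s, s, inv4(t), pd, a0, b0) ^ prod4(s, pd, a0, b1)
          ^ prod4(s, pd, a1, b0) ^ prod4(t, 1 ^ qd, a1, b1))
    return c0, c1


def inv_type3(a, s, t, p0, q0, d):
    """Inverse in the same basis, Equations 6.40 and 6.41."""
    a0, a1 = a
    di = inv4(d)
    root = prod4(s, sqrt4(p0), a0) ^ prod4(t, sqrt4(q0), a1)
    den_inv = inv4(mul4(root, root) ^ prod4(s, t, d, a0, a1))
    b0 = mul4(mul4(1 ^ di, a0) ^ prod4(di, t, inv4(s), a1), den_inv)
    b1 = mul4(prod4(di, s, inv4(t), a0) ^ mul4(1 ^ di, a1), den_inv)
    return b0, b1
```

Over $\mathbb{F}_4$ the only admissible pairs satisfy $q_0 = p_0 + 1$, so this check exercises the general formulas only in the setting of the second special case discussed below.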

The first special case of interest is $q_0 = p_0^2$, $d = p_0$, and $s = t = p_0^{-1}$. Rewriting Equations 6.40 and 6.41 for this case gives us

$$b_0 = \frac{a_0 + p_0^{-1}(a_0 + a_1)}{p_0^{-1}(a_0 + a_1) \cdot a_0 + a_1^2}, \tag{6.42}$$
$$b_1 = \frac{a_1 + p_0^{-1}(a_0 + a_1)}{p_0^{-1}(a_0 + a_1) \cdot a_0 + a_1^2}. \tag{6.43}$$

These equations define the inverter in Figure 6.9b, using one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$.


[Block diagrams (a) to (c), not reproduced: inputs $a_0$, $a_1$, outputs $b_0$, $b_1$, built from constant multipliers, squaring in $\mathbb{F}_{2^{m/2}}$, and inversion in $\mathbb{F}_{2^{m/2}}$.]

Figure 6.9: Architectures for inversion in $\mathbb{F}_{2^m}$ using a basis of type III over $\mathbb{F}_{2^{m/2}}$. Architecture (a) for arbitrary $s$ and $t$ based on Equations 6.40 and 6.41; (b) for $q_0 = p_0^2$ and $s = t = p_0^{-1}$ based on Equations 6.42 and 6.43; (c) for $q_0 = p_0 + 1$ and $s = t = d^2$ based on Equations 6.44 and 6.45.

[Block diagrams (a) to (c), not reproduced: inputs $a_0$, $a_1$, $b_0$, $b_1$, outputs $c_0$, $c_1$.]

Figure 6.10: Architectures for multiplication in $\mathbb{F}_{2^m}$ using a basis of type III over $\mathbb{F}_{2^{m/2}}$. (a) Architecture for arbitrary $s$ and $t$ based on Equations 6.46 and 6.47. (b) Architecture for $q_0 = p_0^2$ and $s = t = p_0^{-1}$ based on Equations 6.48 and 6.49. (c) Architecture for $q_0 = p_0 + 1$ and $s = t = d^{-1}$ based on Equations 6.50 and 6.51.


The second special case of interest is when we have $q_0 = p_0 + 1$. Then $d$ is a primitive element in $\mathbb{F}_4$. Let $s = t = d^2$ hold. Rewriting Equations 6.40 and 6.41 for this case gives us the equations

$$b_0 = \frac{d a_0 + (a_0 + a_1)}{(d a_1 + d p_0^{1/2}(a_0 + a_1))^2 + a_0 a_1}, \tag{6.44}$$
$$b_1 = \frac{d a_1 + (a_0 + a_1)}{(d a_1 + d p_0^{1/2}(a_0 + a_1))^2 + a_0 a_1}, \tag{6.45}$$

defining the inverter in Figure 6.9c. This architecture seems to use three multiplications by constants in $\mathbb{F}_{2^{m/2}}$. However, two of these constants equal $d$ which, as we have noted, is in $\mathbb{F}_4$. Multiplication by such a constant is simpler than multiplication by an arbitrary constant element in $\mathbb{F}_{2^{m/2}}$, as we shall see further on. Hence the architecture uses one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$ and two multiplications by constants in $\mathbb{F}_4$.

As in Sections 6.2 and 6.3 we need architectures for multiplication, squaring, and multiplication by a constant using the same type of basis.



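6.4.2 Multiplication

Rewriting Equations 6.38 and 6.39, we get

$$c_0 = s^{-1} q_0 d^{-1} \cdot (s a_0 + t a_1) \cdot (s b_0 + t b_1) + s^{-1} d \cdot s a_0 \cdot s b_0, \tag{6.46}$$
$$c_1 = t^{-1} p_0 d^{-1} \cdot (s a_0 + t a_1) \cdot (s b_0 + t b_1) + t^{-1} d \cdot t a_1 \cdot t b_1. \tag{6.47}$$

These equations define the multiplier in Figure 6.10a, which among other things uses four additions and at most eight multiplications by constants in $\mathbb{F}_{2^{m/2}}$.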

For the special case when we have $q_0 = p_0^2$, $d = p_0$, and $s = t = p_0^{-1}$, we can rewrite Equations 6.46 and 6.47 as

$$c_0 = (a_0 + a_1) \cdot (b_0 + b_1) + a_0 \cdot b_0, \tag{6.48}$$
$$c_1 = p_0^{-1} \cdot (a_0 + a_1) \cdot (b_0 + b_1) + a_1 \cdot b_1, \tag{6.49}$$

defining the multiplier in Figure 6.10b. This architecture uses four additions and one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$.

For the special case when we have $q_0 = p_0 + 1$, $s = t = d^2$ with $d$ being a primitive element in $\mathbb{F}_4$, we can rewrite Equations 6.46 and 6.47 as

$$c_0 = (a_0 + a_1)(b_0 + b_1)(1 + p_0) d + a_0 \cdot b_0, \tag{6.50}$$
$$c_1 = (a_0 + a_1)(b_0 + b_1) p_0 d + a_1 \cdot b_1, \tag{6.51}$$

defining the multiplier in Figure 6.10c, using five additions, one multiplication by a constant in $\mathbb{F}_{2^{m/2}}$, and one multiplication by a constant in $\mathbb{F}_4$.

For multiplication by a constant we assume that $\beta$ is the constant element and rewrite Equations 6.38 and 6.39 as

$$c_0 = \left(s\left(1 + \frac{p_0}{d}\right) b_0 + t\frac{q_0}{d} b_1\right) a_0 + \left(t\frac{q_0}{d} b_0 + \frac{t^2}{s}\frac{q_0}{d} b_1\right) a_1, \tag{6.52}$$
$$c_1 = \left(\frac{s^2}{t}\frac{p_0}{d} b_0 + s\frac{p_0}{d} b_1\right) a_0 + \left(s\frac{p_0}{d} b_0 + t\left(1 + \frac{q_0}{d}\right) b_1\right) a_1. \tag{6.53}$$

These equations define the architecture in Figure 6.11a, using two adders and at most four multiplications by constants in $\mathbb{F}_{2^{m/2}}$. This architecture is the same as for type I and II bases; only the constants differ.

We have not been able to find any improvement on multiplication by a constant for the special case when we have $q_0 = p_0^2$, $d = p_0$, and $s = t = p_0^{-1}$. However, for the special case when we have $q_0 = p_0 + 1$, $s = t$ with $d$ being a primitive element in $\mathbb{F}_4$, we can rewrite Equations 6.52 and 6.53 as

$$c_0 = s p_0 d^2 \cdot (b_0 + b_1) \cdot (a_0 + a_1) + s b_1 \cdot a_1 + s\left(b_0 + d^2 b_1\right) \cdot \left(a_0 + d^2 a_1\right), \tag{6.54}$$
$$c_1 = s p_0 d^2 \cdot (b_0 + b_1) \cdot (a_0 + a_1) + d \cdot s b_1 \cdot a_1. \tag{6.55}$$

These equations define the architecture in Figure 6.11b, using five adders and at most three multiplications by constants in $\mathbb{F}_{2^{m/2}}$. It also uses two multiplications by constants in $\mathbb{F}_4$.

6.4.3 Squaring

For squaring we set $\beta = \alpha$ in Equations 6.38 and 6.39. Then we have

$$c_0 = s(1 + p_0 d^{-1}) \cdot a_0^2 + t^2 s^{-1} q_0 d^{-1} \cdot a_1^2, \tag{6.56}$$
$$c_1 = s^2 t^{-1} p_0 d^{-1} \cdot a_0^2 + t(1 + q_0 d^{-1}) \cdot a_1^2, \tag{6.57}$$

which gives us the architecture in Figure 6.12a. This architecture uses two additions and at most four multiplications by constants in $\mathbb{F}_{2^{m/2}}$.


[Block diagrams (a) and (b), not reproduced: inputs $a_0$, $a_1$, outputs $c_0$, $c_1$.]

Figure 6.11: Architectures for multiplication by a constant in $\mathbb{F}_{2^m}$ using a basis of type III over $\mathbb{F}_{2^{m/2}}$. (a) Architecture for arbitrary $s$ and $t$ based on Equations 6.52 and 6.53. (b) Architecture for $q_0 = p_0 + 1$ and $s = t$ based on Equations 6.54 and 6.55.

[Block diagrams (a) to (c), not reproduced: inputs $a_0$, $a_1$, outputs $c_0$, $c_1$, built from constant multipliers and squaring in $\mathbb{F}_{2^{m/2}}$.]

Figure 6.12: Architectures for squaring in $\mathbb{F}_{2^m}$ using a basis of type III over $\mathbb{F}_{2^{m/2}}$. Architecture (a) for arbitrary $s$ and $t$ based on Equations 6.56 and 6.57; (b) for $q_0 = p_0^2$ and $s = t = p_0^{-1}$ based on Equations 6.58 and 6.59; (c) for $q_0 = p_0 + 1$ and $s = t = d^2$ based on Equations 6.60 and 6.61.


For the special case when we have $q_0 = p_0^2$, $d = p_0$, and $s = t = p_0^{-1}$, we can rewrite Equations 6.56 and 6.57 as

$c_0 = a_1^2$, (6.58)

$c_1 = p_0^{-1} (a_0^2 + a_1^2) + a_1^2$, (6.59)

defining the architecture in Figure 6.12b. This architecture uses two additions and one multiplication by a constant in $F_{2^{m/2}}$.

For the special case when we have $q_0 = p_0 + 1$, $s = t = d^2$ with $d$ being a primitive element in F4, we can rewrite Equations 6.56 and 6.57 as

$c_0 = d p_0 (a_0^2 + a_1^2) + a_1^2 + d^2 (a_0^2 + a_1^2)$, (6.60)

$c_1 = d p_0 (a_0^2 + a_1^2) + a_1^2$, (6.61)

defining the architecture in Figure 6.12c. It uses three additions and one multiplication by a constant in $F_{2^{m/2}}$. It also uses one multiplication by a constant in F4.
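The two special cases can be sanity-checked numerically against Equations 6.56 and 6.57. The sketch below is only an illustration under stated assumptions: it uses GF(16), built from x^4 + x + 1, as a stand-in for the subfield $F_{2^{m/2}}$ (so m = 8), takes d as an element of order 3 in that field, and ignores the irreducibility conditions on p0 and q0, since only the algebraic identities are tested. All function names are hypothetical.

```python
from itertools import product

def mul(a, b):
    """Multiply in GF(16) = GF(2)[x]/(x^4 + x + 1); elements are ints 0..15."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if a & 0x10:          # reduce modulo x^4 + x + 1
            a ^= 0x13
    return r

def power(a, n):
    r = 1
    for _ in range(n):
        r = mul(r, a)
    return r

def inv(a):
    """Inverse of a nonzero element, using a^15 = 1."""
    return power(a, 14)

def square_general(a0, a1, p0, q0, d, s, t):
    """Equations 6.56 and 6.57 with all constants written out."""
    di = inv(d)
    x, y = mul(a0, a0), mul(a1, a1)
    k00 = mul(s, 1 ^ mul(p0, di))
    k01 = mul(mul(mul(t, t), inv(s)), mul(q0, di))
    k10 = mul(mul(mul(s, s), inv(t)), mul(p0, di))
    k11 = mul(t, 1 ^ mul(q0, di))
    return mul(k00, x) ^ mul(k01, y), mul(k10, x) ^ mul(k11, y)

def square_b(a0, a1, p0):
    """Equations 6.58 and 6.59 (q0 = p0^2, d = p0, s = t = 1/p0)."""
    x, y = mul(a0, a0), mul(a1, a1)
    return y, mul(inv(p0), x ^ y) ^ y

D = power(2, 5)               # element of order 3, i.e. primitive in the subfield F4

def square_c(a0, a1, p0):
    """Equations 6.60 and 6.61 (q0 = p0 + 1, s = t = d^2)."""
    x, y = mul(a0, a0), mul(a1, a1)
    common = mul(mul(D, p0), x ^ y) ^ y
    return common ^ mul(mul(D, D), x ^ y), common

def special_cases_agree():
    d2 = mul(D, D)
    ok_b = all(
        square_general(a0, a1, p0, mul(p0, p0), p0, inv(p0), inv(p0)) == square_b(a0, a1, p0)
        for a0, a1, p0 in product(range(16), range(16), range(1, 16)))
    ok_c = all(
        square_general(a0, a1, p0, p0 ^ 1, D, d2, d2) == square_c(a0, a1, p0)
        for a0, a1, p0 in product(range(16), repeat=3))
    return ok_b and ok_c
```

Calling `special_cases_agree()` compares both rewritten forms with the general one for every choice of a0, a1 and p0 in the stand-in field.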

6.5 Arithmetic in F4

Our objective is to compare architectures for inversion in tower fields using bases of our three types. We primarily aim to analyze the architectures for inversion derived in Sections 6.2 through 6.4 using the cost measures outlined in Chapter 3. Since these architectures for $F_{2^m}$ use arithmetic operations in $F_{2^{m/2}}$, we will get recursive equations for all measures. When solving these equations we will need corresponding measures for some small subfield. We therefore start the analysis by considering F4. As a byproduct of the analysis of inversion we also get analyses of all arithmetic operations for which we have derived architectures in Sections 6.2 through 6.4.

We noted in Section 6.1 that there are no bases of type III of F4 over F2. Therefore we need to treat this field separately to make the comparison fair. There are essentially two ways to choose the representation of F4 and what architectures to use for arithmetic operations in F4. The first alternative is to fix the representation and the architectures without considering the consequences in larger fields. This is a fast and simple method, but a blindfolded one. The most obvious risk is that we choose a representation that favours one of our types of bases of the larger fields. The second alternative is to use the representation of F4 and the architectures for F4 that are best for each situation. This is the fairest comparison, but also the most cumbersome. In this thorough investigation we need to consider all representations of F4 and all interesting architectures for arithmetic operations using these representations for all three types of bases of the larger fields and for all interesting architectures for arithmetic operations in these larger fields. We choose to use this second alternative.

There are three ways to choose two distinct nonzero elements of F4. All three choices are valid bases of F4 over F2. Two of them are polynomial bases and one is a normal basis. The two polynomial bases are equivalent in the sense that they give rise to equal architectures, due to the fact that they are generated by the same irreducible polynomial. Hence we have the following two alternatives to consider.

• The polynomial basis {1, ε}, and

• the normal basis {ε, ε²},

where ε is a root of x² + x + 1.

6.5.1 Polynomial Basis Representation

The polynomial basis is a basis of type I. Therefore the equations corresponding to the arithmetic operations in F4 using this basis are special cases of Equations 6.1 to 6.14 of Section 6.2 with s = t = q0 = 1.

Let α, β, and γ be elements of F4, satisfying αβ = γ as in Section 6.2. Let (a0, a1), (b0, b1), and (c0, c1) be the representations of α, β, and γ respectively expressed in the polynomial basis. Setting s = t = q0 = 1 in Equations 6.1 and 6.2 we get

c0 = a0b0 + a1b1, (6.62)

c1 = a0b1 + a1b0 + a1b1, (6.63)

for multiplication. This defines architecture A in Figure 6.13a.


Figure 6.13: Architecture of (a) multiplier A, (b) multiplier B, (c)(d) multiplication by the constants ε and ε² respectively, and (e) inversion/squaring, for F4 using a polynomial basis over F2 defined by the irreducible polynomial p(x) = x² + x + 1 over F2. The bold lines denote critical paths.

Rewriting Equations 6.62 and 6.63 as

c0 = a0b0 + a1b1, (6.64)

c1 = (a0 + a1)(b0 + b1) + a0b0, (6.65)

gives us architecture B in Figure 6.13b, which is a special case of the architecture in Figure 6.2b.
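That the two formulations compute the same product can be confirmed exhaustively, since there are only 16 input combinations. The following is a minimal bit-level sketch, not the gate-level architectures themselves; the function names are hypothetical, `&` plays the role of multiplication in F2 and `^` the role of addition in F2.

```python
from itertools import product

def mul_a(a0, a1, b0, b1):
    """Equations 6.62 and 6.63: four F2 products, three F2 additions."""
    c0 = (a0 & b0) ^ (a1 & b1)
    c1 = (a0 & b1) ^ (a1 & b0) ^ (a1 & b1)
    return c0, c1

def mul_b(a0, a1, b0, b1):
    """Equations 6.64 and 6.65: three F2 products, four F2 additions."""
    shared = a0 & b0                      # a0*b0 is used in both outputs
    c0 = shared ^ (a1 & b1)
    c1 = ((a0 ^ a1) & (b0 ^ b1)) ^ shared
    return c0, c1

def architectures_agree():
    """Compare both formulations on all 16 input combinations."""
    return all(mul_a(*bits) == mul_b(*bits)
               for bits in product((0, 1), repeat=4))
```

The second form trades one product for one extra addition, which is the usual effect of this kind of rewriting.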


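For multiplication by a constant β, we regard (b0, b1) as fixed and write Equations 6.62 and 6.63 as

c0 = b0a0 + b1a1, (6.66)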

c1 = b1a0 + (b0 + b1)a1. (6.67)

The two interesting cases are when β is ε or ε². For β = ε we have (b0, b1) = (0, 1), which gives us

c0 = a1, (6.68)

c1 = a0 + a1. (6.69)

For β = ε² we have (b0, b1) = (1, 1), which gives us

c0 = a0 + a1, (6.70)

c1 = a0. (6.71)

Equations 6.68 through 6.71 define the simple architectures in Figure 6.13c and d.

For squaring we have α = β and Equations 6.62 and 6.63 become

$c_0 = a_0^2 + a_1^2$, (6.72)

$c_1 = a_1^2$. (6.73)

Now a0 and a1 are elements of F2, and hence the squares of these elements are the elements themselves. Therefore squaring is given by

c0 = a0 + a1, (6.74)

c1 = a1. (6.75)

Equations 6.74 and 6.75 define the architecture in Figure 6.13e.

For inversion we use the fact that $α^3 = 1$ holds for all nonzero α in F4. Therefore we have $α^{-1} = α^3 α^{-1} = α^2$ and inversion is in fact given by the same equations as squaring in F4. Hence we use the same architecture for inversion as for squaring.
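The constant multiplications, the squarer and the claim that squaring doubles as inversion can all be checked against the general multiplier. The sketch below assumes elements of F4 are held as bit pairs (a0, a1) in the polynomial basis {1, ε}; the function and constant names are hypothetical.

```python
def mul(a, b):
    """General multiplication, Equations 6.62 and 6.63."""
    (a0, a1), (b0, b1) = a, b
    return ((a0 & b0) ^ (a1 & b1),
            (a0 & b1) ^ (a1 & b0) ^ (a1 & b1))

def times_eps(a):
    """Multiplication by the constant eps, Equations 6.68 and 6.69."""
    a0, a1 = a
    return (a1, a0 ^ a1)

def times_eps2(a):
    """Multiplication by the constant eps^2, Equations 6.70 and 6.71."""
    a0, a1 = a
    return (a0 ^ a1, a0)

def square(a):
    """Squaring, and also inversion, Equations 6.74 and 6.75."""
    a0, a1 = a
    return (a0 ^ a1, a1)

ONE, EPS, EPS2 = (1, 0), (0, 1), (1, 1)
ELEMENTS = [(0, 0), ONE, EPS, EPS2]

def squaring_inverts():
    """alpha * alpha^2 = 1 for every nonzero alpha in F4."""
    return all(mul(a, square(a)) == ONE for a in ELEMENTS if a != (0, 0))
```

Each specialised function needs at most one F2 addition, in line with Equations 6.68 through 6.75.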

6.5.2 Normal Basis Representation

The normal basis is a basis of type II. Therefore the equations corresponding to the arithmetic operations in F4 using this basis are special cases of Equations 6.15 to 6.28 of Section 6.3 with s = t = p0 = 1.

Let α, β, and γ still be elements of F4, satisfying αβ = γ as in Section 6.3. Let (a0, a1), (b0, b1), and (c0, c1) be the representations of α, β, and γ respectively expressed in the normal basis. Setting s = t = p0 = 1 in Equations 6.15 and 6.16 we get

c0 = a0b1 + a1b0 + a1b1, (6.76)

c1 = a0b1 + a1b0 + a0b0, (6.77)


for multiplication. This defines architecture C in Figure 6.14a. Rewriting Equations 6.76 and 6.77 as

c0 = (a0 + a1)(b0 + b1) + a0b0, (6.78)

c1 = (a0 + a1)(b0 + b1) + a1b1, (6.79)

gives us architecture D in Figure 6.14b, which is a special case of the architecture in Figure 6.6b.
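A quick way to gain confidence in Equations 6.76 through 6.79 is to compare them with each other and with the polynomial basis multiplier through a change of basis. The sketch below assumes the coordinate conversion that follows from ε² = ε + 1, namely a0·ε + a1·ε² = a1 + (a0 + a1)·ε; the helper names are hypothetical.

```python
from itertools import product

def mul_pb(x, y):
    """Polynomial basis product, Equations 6.62 and 6.63."""
    (x0, x1), (y0, y1) = x, y
    return ((x0 & y0) ^ (x1 & y1), (x0 & y1) ^ (x1 & y0) ^ (x1 & y1))

def mul_c(a, b):
    """Normal basis product, Equations 6.76 and 6.77."""
    (a0, a1), (b0, b1) = a, b
    cross = (a0 & b1) ^ (a1 & b0)
    return (cross ^ (a1 & b1), cross ^ (a0 & b0))

def mul_d(a, b):
    """Normal basis product, Equations 6.78 and 6.79."""
    (a0, a1), (b0, b1) = a, b
    shared = (a0 ^ a1) & (b0 ^ b1)
    return (shared ^ (a0 & b0), shared ^ (a1 & b1))

def nb_to_pb(a):
    """a0*eps + a1*eps^2 = a1 + (a0 + a1)*eps."""
    a0, a1 = a
    return (a1, a0 ^ a1)

def pb_to_nb(x):
    """x0 + x1*eps = (x0 + x1)*eps + x0*eps^2, since 1 = eps + eps^2."""
    x0, x1 = x
    return (x0 ^ x1, x0)

PAIRS = list(product((0, 1), repeat=2))

def normal_basis_multipliers_consistent():
    return all(
        mul_c(a, b) == mul_d(a, b) == pb_to_nb(mul_pb(nb_to_pb(a), nb_to_pb(b)))
        for a in PAIRS for b in PAIRS)
```

The shared term (a0 + a1)(b0 + b1) appears in both outputs of the second form, so it only has to be computed once.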


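For multiplication by a constant β, we regard (b0, b1) as fixed and write Equations 6.76 and 6.77 as

c0 = b1a0 + (b0 + b1)a1, (6.80)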

c1 = (b0 + b1)a0 + b0a1. (6.81)

The two interesting cases are still when β is ε or ε². For β = ε we have (b0, b1) = (1, 0), which gives us

c0 = a1, (6.82)

c1 = a0 + a1. (6.83)

For β = ε² we have (b0, b1) = (0, 1), which gives us

c0 = a0 + a1, (6.84)

c1 = a0. (6.85)

Equations 6.82 through 6.85 define the simple architectures in Figure 6.14c and d. It is notable that these equations are exactly the same as Equations 6.68 through 6.71 for polynomial basis representation.

For squaring and inversion we have α = β and Equations 6.76 and 6.77 become

c0 = a1, (6.86)

c1 = a0, (6.87)

where we again have used the fact that a0 and a1 are elements of F2. Squaring and inversion are therefore a cyclic shift, which can be hardwired as in Figure 6.14e.
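As a small illustration of the normal basis case, the sketch below evaluates Equations 6.80 and 6.81 for the two constants and checks that the coordinate swap of Equations 6.86 and 6.87 really produces the inverse. Elements are bit pairs (a0, a1) over the basis {ε, ε²}; the names are hypothetical.

```python
def times_const_nb(b, a):
    """Multiplication by the constant b, Equations 6.80 and 6.81."""
    (b0, b1), (a0, a1) = b, a
    return ((b1 & a0) ^ ((b0 ^ b1) & a1),
            ((b0 ^ b1) & a0) ^ (b0 & a1))

def square_nb(a):
    """Squaring and inversion as a cyclic shift, Equations 6.86 and 6.87."""
    a0, a1 = a
    return (a1, a0)

# In the normal basis, 1 = eps + eps^2 has the coordinates (1, 1).
ONE_NB, EPS_NB, EPS2_NB = (1, 1), (1, 0), (0, 1)
NONZERO = [ONE_NB, EPS_NB, EPS2_NB]

def swap_inverts():
    """alpha * alpha^2 = 1 for every nonzero alpha, using only the swap."""
    return all(times_const_nb(a, square_nb(a)) == ONE_NB for a in NONZERO)
```

Because the swap involves no logic at all, it costs nothing beyond wiring, which is reflected in the zero entries for squaring/inversion in Table 6.1b.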


Figure 6.14: Architecture of (a) multiplier C, (b) multiplier D, (c)(d) multiplication by the constants ε and ε² respectively, and (e) inversion/squaring, for F4 using a normal basis over F2 defined by the irreducible polynomial p(x) = x² + x + 1 over F2. The bold lines denote critical paths.

6.5.3 Properties of Arithmetic in F4

The properties of all architectures depend on the properties of the operations in F2. The assumed implementations of multiplication and addition in F2 are given in Figures 3.1 and 3.2 on page 16. The properties of these operations have already been given in Table 3.1 on page 28.

The properties of our architectures for arithmetic operations in F4 are gathered in Table 6.1a for polynomial basis representation and in Table 6.1b for normal basis representation. There are also the measures of F4-buffers and F4-adders. They are implemented in the only natural way as two parallel F2-counterparts.

The measures are given assuming that inverter chains are used in the critical path whenever that minimizes the delay. That is the case only for multiplier architecture C.


(a) Polynomial basis representation

Arithmetic operation               c_in   t_int   r_out    a
Buffer                               2      2       1      8
Addition                             4      2       2     24
Multiplication (Architecture A)      4     20       2     60
Multiplication (Architecture B)      6     16       2     66
Multiplication by a constant         6      2       2     16
Squaring/Inversion                   6      2       2     16

(b) Normal basis representation

Arithmetic operation               c_in   t_int   r_out    a
Buffer                               2      2       1      8
Addition                             4      2       2     24
Multiplication (Architecture C)      4     24       2     66
Multiplication (Architecture D)      6     20       2     66
Multiplication by a constant         6      2       2     16
Squaring/Inversion                  c_L     0       0      0

Table 6.1: The normalized properties of arithmetic operations in F4 using (a) polynomial basis representation and (b) normal basis representation, where c_L is the normalized load capacitance.

Figure 6.15: Architecture of multiplication by a constant in F16. The individual constants in F4 are assumed to be nonzero since that is the worst case with respect to area. (a) General architecture. The shaded areas represent hardwired connections given by the corresponding constant. (b) Example that illustrates the worst case with respect to both area and delay with k0 = k1 = k2 = k3 = ε. The bold lines denote critical paths.


6.6 Arithmetic in F16

We can make a few simplifications in F16. These simplifications are based on the fact that squaring and inversion are the same thing in F4 and on the simple architecture for squaring and multiplication by a constant for F4. As before, (a0, a1), (b0, b1), and (c0, c1) represent the elements α, β, and γ of the field F16 over its coefficient field F4. The coefficients in F2 are indicated by a second index. For instance, ai is represented by the pair (ai0, ai1). Some architectures in this section are given using operations in F2.

6.6.1 Multiplication by a Constant

Consider the architectures given in Figures 6.3a, 6.7a, and 6.11a, for the three different types of bases. They are all defined by the equations

c0 = k0 · a0 + k1 · a1, (6.88)

c1 = k2 · a0 + k3 · a1, (6.89)

where ki, i = 0, ..., 3, are constants in F4 given by the constant element β ∈ F16 and the chosen basis of F16 over F4. The only difference between the three versions of the architecture is the actual values of the constants. Each input signal feeds two multiplications by constants in F4. We could use the simple architecture in Figures 6.13cd and 6.14cd for this multiplication in F4. However, the only difference between multiplying by one constant and by another in F4 is the connection of the output signals. Hence, we can basically use the same architecture for both multiplications by constants of the same input signal. Therefore the multiplication by a constant in F16 can be performed by the architecture given in Figure 6.15a, assuming the worst case with respect to area requirement, which is when all ki are nonzero. A special case of the more general architecture in Figure 6.15a is given in Figure 6.15b with k0 = k1 = k2 = k3 = ε. The example in Figure 6.15b is a worst case with respect to both area and delay.
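To make the role of the constants ki concrete, the sketch below derives them for one particular basis and then applies Equations 6.88 and 6.89. It is an illustration under an assumption: the basis of F16 over F4 is taken to be the polynomial basis {1, ω} with ω² = ω + ε, and the expressions for k0, ..., k3 are obtained from that relation rather than quoted from Section 6.2. Elements of F4 are stored as integers 0..3 (bit 0 is the coefficient of 1, bit 1 the coefficient of ε); all names are hypothetical.

```python
EPS = 2   # the element eps of F4 in the integer encoding

def f4_mul(a, b):
    """F4 product in the polynomial basis {1, eps}, Equations 6.62 and 6.63."""
    a0, a1, b0, b1 = a & 1, a >> 1, b & 1, b >> 1
    c0 = (a0 & b0) ^ (a1 & b1)
    c1 = (a0 & b1) ^ (a1 & b0) ^ (a1 & b1)
    return c0 | (c1 << 1)

def constants_type1(beta):
    """k0..k3 for beta = b0 + b1*w, assuming w^2 = w + eps."""
    b0, b1 = beta
    return (b0, f4_mul(EPS, b1), b1, b0 ^ b1)

def times_const(k, a):
    """Equations 6.88 and 6.89: a 2x2 matrix of F4 constants applied to (a0, a1)."""
    k0, k1, k2, k3 = k
    a0, a1 = a
    return (f4_mul(k0, a0) ^ f4_mul(k1, a1),
            f4_mul(k2, a0) ^ f4_mul(k3, a1))

def f16_mul(a, b):
    """General product, obtained by treating b as the constant."""
    return times_const(constants_type1(b), a)

ELEMENTS = [(x, y) for x in range(4) for y in range(4)]
```

For a fixed β the tuple returned by `constants_type1` is computed once, and only `times_const` corresponds to hardware; for the other basis types the same `times_const` applies with different constants.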

6.6.2 Squaring

The best basis of type I for F16 is the polynomial basis defined by a root of x² + x + ε. Equations 6.13 and 6.14 then become

$c_0 = a_0^2 + \epsilon \cdot a_1^2$, (6.90)

$c_1 = a_1^2$. (6.91)

Expressing Equations 6.90 and 6.91 over F2 assuming a polynomial basis representation of F4 gives us

c00 = a00 + a01 + a11, (6.92)

c01 = a01 + a10, (6.93)

c10 = a10 + a11, (6.94)

c11 = a11. (6.95)

Assuming a normal basis representation of F4 instead gives us

c00 = a01 + a10, (6.96)

c01 = a00 + a10 + a11, (6.97)

c10 = a11, (6.98)

c11 = a10. (6.99)

The corresponding architectures are given in Figure 6.16a and b respectively. The buffers reduce the input capacitance and remove the load capacitance from the internal delay.
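The bit-level equations can be re-derived mechanically from Equations 6.90 and 6.91 by plugging in the F4 squarers and the multiplication by ε from Section 6.5. The sketch below does this for both representations of F4 and compares the result with Equations 6.92 through 6.99 on all 16 inputs; the dictionary and function names are hypothetical.

```python
from itertools import product

# F4 operations on bit pairs, one set per representation of F4.
PB = {"square": lambda a: (a[0] ^ a[1], a[1]),       # Equations 6.74, 6.75
      "times_eps": lambda a: (a[1], a[0] ^ a[1])}    # Equations 6.68, 6.69
NB = {"square": lambda a: (a[1], a[0]),              # Equations 6.86, 6.87
      "times_eps": lambda a: (a[1], a[0] ^ a[1])}    # Equations 6.82, 6.83

def square_type1(a0, a1, f4):
    """Equations 6.90 and 6.91: c0 = a0^2 + eps*a1^2, c1 = a1^2."""
    s0, s1 = f4["square"](a0), f4["square"](a1)
    e = f4["times_eps"](s1)
    return (s0[0] ^ e[0], s0[1] ^ e[1]), s1

def bits_pb(a00, a01, a10, a11):
    """Equations 6.92 through 6.95."""
    return (a00 ^ a01 ^ a11, a01 ^ a10), (a10 ^ a11, a11)

def bits_nb(a00, a01, a10, a11):
    """Equations 6.96 through 6.99."""
    return (a01 ^ a10, a00 ^ a10 ^ a11), (a11, a10)

def matches(f4, bit_equations):
    return all(square_type1((w, x), (y, z), f4) == bit_equations(w, x, y, z)
               for w, x, y, z in product((0, 1), repeat=4))
```

Both `matches(PB, bits_pb)` and `matches(NB, bits_nb)` should hold if the bit-level equations agree with Equations 6.90 and 6.91.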

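The best basis of type II for F16 is the normal basis defined by a root of x² + x + ε. Equations 6.27 and 6.28 then become

$c_0 = \epsilon^2 \cdot a_0^2 + \epsilon \cdot a_1^2$, (6.100)

$c_1 = \epsilon \cdot a_0^2 + \epsilon^2 \cdot a_1^2$. (6.101)

Expressing Equations 6.100 and 6.101 over F2 assuming a polynomial basis representation of F4 gives us

c00 = a00 + a11, (6.102)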

c01 = a00 + a01 + a10, (6.103)

c10 = a01 + a10, (6.104)

c11 = a00 + a10 + a11. (6.105)


Assuming a normal basis representation of F4 instead gives us

c00 = a00 + a01 + a10, (6.106)

c01 = a01 + a10 + a11, (6.107)

c10 = a00 + a10 + a11, (6.108)

c11 = a00 + a01 + a11. (6.109)

The corresponding architectures are given in Figure 6.16c and d respectively. Again the buffers reduce the input capacitance and remove the load capacitance from the internal delay.
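Since squaring is linear over F2, the bit-level equations for the type II basis amount to a 4×4 binary matrix, and that matrix can be read off by feeding unit vectors through Equations 6.100 and 6.101. The sketch below builds the matrix for both representations of F4; rows correspond to c00, c01, c10, c11 and columns to a00, a01, a10, a11. The names are hypothetical.

```python
def sq_pb(a):          # F4 squaring, polynomial basis (Equations 6.74, 6.75)
    return (a[0] ^ a[1], a[1])

def sq_nb(a):          # F4 squaring, normal basis (Equations 6.86, 6.87)
    return (a[1], a[0])

def times_eps(a):      # Equations 6.68, 6.69 and, identically, 6.82, 6.83
    return (a[1], a[0] ^ a[1])

def times_eps2(a):     # Equations 6.70, 6.71 and, identically, 6.84, 6.85
    return (a[0] ^ a[1], a[0])

def square_type2(bits, sq):
    """Equations 6.100 and 6.101 evaluated on (a00, a01, a10, a11)."""
    a00, a01, a10, a11 = bits
    s0, s1 = sq((a00, a01)), sq((a10, a11))
    u, v = times_eps2(s0), times_eps(s1)     # c0 = eps^2*a0^2 + eps*a1^2
    w, x = times_eps(s0), times_eps2(s1)     # c1 = eps*a0^2 + eps^2*a1^2
    return (u[0] ^ v[0], u[1] ^ v[1], w[0] ^ x[0], w[1] ^ x[1])

def matrix(sq):
    """Binary matrix of the squaring map, built from the four unit vectors."""
    cols = [square_type2(tuple(int(i == j) for i in range(4)), sq)
            for j in range(4)]
    return [[cols[j][i] for j in range(4)] for i in range(4)]

# Coefficient patterns of Equations 6.102-6.105 and 6.106-6.109.
PB_EXPECTED = [[1, 0, 0, 1], [1, 1, 1, 0], [0, 1, 1, 0], [1, 0, 1, 1]]
NB_EXPECTED = [[1, 1, 1, 0], [0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1]]
```

The number of ones in each row, minus one, is the number of two-input F2 adders needed for that output bit.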


Figure 6.16: Architectures of squaring in F16 assuming the following representations of F16 over F4 and of F4 over F2. PB = polynomial basis. NB = normal basis.

(a) F16: type I with t = s = 1; F4: PB (Eqs. 6.92–6.95)
(b) F16: type I with t = s = 1; F4: NB (Eqs. 6.96–6.99)
(c) F16: type II with t = s = 1; F4: PB (Eqs. 6.102–6.105)
(d) F16: type II with t = s = 1; F4: NB (Eqs. 6.106–6.109)
(e) F16: type III with t = s = ε²; F4: PB (Eqs. 6.112–6.115)
(f) F16: type III with t = s = ε²; F4: NB (Eqs. 6.116–6.119)

Buffers are used to reduce delays. The bold lines denote critical paths.


For bases of type III we need two distinct irreducible polynomials of the preferred form over F4. There are exactly two such polynomials, namely x² + x + p0 and x² + x + q0 with p0 = ε and q0 = ε². Here our two special cases considered in Section 6.4 coincide since $q_0 = p_0^2 = p_0 + 1$ holds and we have d = ε. As in Section 6.4 we assume $s = t = d^{-1} = \epsilon^2$. Equations 6.56 and 6.57 then become

$c_0 = a_1^2$, (6.110)

$c_1 = \epsilon^2 \cdot a_0^2 + \epsilon \cdot a_1^2$. (6.111)

Expressing Equations 6.110 and 6.111 over F2 assuming a polynomial basis representation of F4 gives us

c00 = a10 + a11, (6.112)

c01 = a11, (6.113)

c10 = a00 + a11, (6.114)

c11 = a00 + a01 + a10. (6.115)


Assuming a normal basis representation of F4 instead gives us

c00 = a11, (6.116)

c01 = a10, (6.117)

c10 = a00 + a01 + a10, (6.118)

c11 = a01 + a10 + a11. (6.119)

The corresponding architectures are given in Figure 6.16e and f respectively. Again the buffers reduce the input capacitance and remove the load capacitance from the internal delay.

6.6.3 Inversion

For the (type I) polynomial basis defined by a root of x2 + x+ ǫ we get

$b_0 = (a_0 + a_1) \left( a_0 + \epsilon^2 a_1 + (a_0 a_1)^2 \right)$, (6.120)

$b_1 = a_1 \left( a_0 + \epsilon^2 a_1 + (a_0 a_1)^2 \right)$, (6.121)

from Equations 6.3 and 6.4 on page 88, using the fact that squaring and inversion are the same in F4. This gives us the architecture in Figure 6.17a, which can be seen as a simplified version of the architecture in Figure 6.1c for F16.

Rewriting Equations 6.120 and 6.121 we get

$b_0 = (a_0 + a_1) \left( (a_0 (a_0 + a_1))^2 + \epsilon^2 a_1 \right)$, (6.122)

$b_1 = a_1 \left( (a_0 (a_0 + a_1))^2 + \epsilon^2 a_1 \right)$. (6.123)

This gives us the architecture in Figure 6.17b, which can be seen as a simplified version of the architecture in Figure 6.1d for F16.
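Both forms of the type I inverter can be checked by multiplying each nonzero element by its computed inverse. The sketch below assumes the polynomial basis {1, ω} of F16 over F4 with ω² = ω + ε, which gives the product formula used in `f16_mul`, and a polynomial basis representation of F4 with elements stored as integers 0..3. The names are hypothetical.

```python
EPS, EPS2 = 2, 3      # eps and eps^2 in the integer encoding of F4

def f4_mul(a, b):
    """F4 product in the polynomial basis {1, eps}, Equations 6.62 and 6.63."""
    a0, a1, b0, b1 = a & 1, a >> 1, b & 1, b >> 1
    c0 = (a0 & b0) ^ (a1 & b1)
    c1 = (a0 & b1) ^ (a1 & b0) ^ (a1 & b1)
    return c0 | (c1 << 1)

def f4_sq(a):
    return f4_mul(a, a)

def f16_mul(a, b):
    """Product over the basis {1, w}, assuming w^2 = w + eps."""
    (a0, a1), (b0, b1) = a, b
    high = f4_mul(a1, b1)
    return (f4_mul(a0, b0) ^ f4_mul(EPS, high),
            f4_mul(a0, b1) ^ f4_mul(a1, b0) ^ high)

def inv_a(a):
    """Equations 6.120 and 6.121."""
    a0, a1 = a
    common = a0 ^ f4_mul(EPS2, a1) ^ f4_sq(f4_mul(a0, a1))
    return (f4_mul(a0 ^ a1, common), f4_mul(a1, common))

def inv_b(a):
    """Equations 6.122 and 6.123."""
    a0, a1 = a
    common = f4_sq(f4_mul(a0, a0 ^ a1)) ^ f4_mul(EPS2, a1)
    return (f4_mul(a0 ^ a1, common), f4_mul(a1, common))

NONZERO = [(x, y) for x in range(4) for y in range(4) if (x, y) != (0, 0)]

def inverters_correct():
    return all(f16_mul(a, inv_a(a)) == (1, 0) and inv_a(a) == inv_b(a)
               for a in NONZERO)
```

The two forms differ only in where the F4 squaring sits relative to the multiplier, which is what distinguishes the critical paths of the two architectures.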

For the (type II) normal basis defined by a root of x² + x + ε we get

$b_0 = a_1 \left( \epsilon^2 (a_0 + a_1) + (a_0 a_1)^2 \right)$, (6.124)

$b_1 = a_0 \left( \epsilon^2 (a_0 + a_1) + (a_0 a_1)^2 \right)$, (6.125)

from Equations 6.17 and 6.18 on page 93, again using the fact that squaring and inversion are the same in F4. This gives us the architecture in Figure 6.17c, which can be seen as a simplified version of the architecture in Figure 6.5b for F16.
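The type II formulas can be cross-checked against the type I formulas through a change of basis. The sketch below assumes that the normal basis with s = t = 1 is {ω, ω + 1}, where ω is a root of x² + x + ε, so that a0·ω + a1·(ω + 1) = a1 + (a0 + a1)·ω. F4 elements are integers 0..3 in the polynomial basis; all names are hypothetical.

```python
EPS2 = 3              # eps^2 in the integer encoding of F4

def f4_mul(a, b):
    """F4 product in the polynomial basis {1, eps}, Equations 6.62 and 6.63."""
    a0, a1, b0, b1 = a & 1, a >> 1, b & 1, b >> 1
    c0 = (a0 & b0) ^ (a1 & b1)
    c1 = (a0 & b1) ^ (a1 & b0) ^ (a1 & b1)
    return c0 | (c1 << 1)

def f4_sq(a):
    return f4_mul(a, a)

def inv_type1(x):
    """Equations 6.120 and 6.121 over the basis {1, w}."""
    x0, x1 = x
    common = x0 ^ f4_mul(EPS2, x1) ^ f4_sq(f4_mul(x0, x1))
    return (f4_mul(x0 ^ x1, common), f4_mul(x1, common))

def inv_type2(a):
    """Equations 6.124 and 6.125 over the basis {w, w + 1}."""
    a0, a1 = a
    common = f4_mul(EPS2, a0 ^ a1) ^ f4_sq(f4_mul(a0, a1))
    return (f4_mul(a1, common), f4_mul(a0, common))

def to_type1(a):
    """a0*w + a1*(w + 1) = a1 + (a0 + a1)*w."""
    a0, a1 = a
    return (a1, a0 ^ a1)

def to_type2(x):
    """x0 + x1*w = (x0 + x1)*w + x0*(w + 1)."""
    x0, x1 = x
    return (x0 ^ x1, x0)

ELEMENTS = [(x, y) for x in range(4) for y in range(4)]

def inverters_consistent():
    return all(to_type2(inv_type1(to_type1(a))) == inv_type2(a) for a in ELEMENTS)
```

Under this assumed basis the two inverters describe the same map on F16, so any difference in cost comes from the architectures rather than from the function computed.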

For the type III basis above defined by a root of x² + x + ε and a root of x² + x + ε² we get

$b_0 = (\epsilon^2 a_0 + a_1) \left( a_0 + \epsilon^2 a_1 + (a_0 a_1)^2 \right)$, (6.126)

$b_1 = (a_0 + \epsilon^2 a_1) \left( a_0 + \epsilon^2 a_1 + (a_0 a_1)^2 \right)$, (6.127)

from Equations 6.40 and 6.41 on page 97, again using the fact that squaring and inversion are the same in F4. This gives us the architecture in Figure 6.17d, which can be seen as a simplified version of the architecture in Figure 6.9c for F16.


6.6.4 Multiplication

We have not been able to find any significant improvement on multiplication in F16 compared to the architectures derived in Sections 6.2 through 6.4. However, we can reduce the input capacitance for multiplication using bases of type II and III by inserting buffers. The placement of these buffers is given in Figure 6.18.


Figure 6.17: Architecture of inversion in F16 using (a)(b) a type I basis with t = s = 1, (c) a type II basis with t = s = 1, and (d) a type III basis with t = s = ε². The bold lines denote critical paths.

Figure 6.18: Architecture of multiplication in F16 using (a) a type I basis with t = s = 1, (b) a type II basis with t = s = 1, and (c) a type III basis with t = s = ε². The bold lines denote critical paths. The buffers reduce the input capacitance.


Arithmetic operation           Basis of F16   Basis of F4   c_in    t_int   r_out     a
Buffer                         All cases                    2.00     2.00   1.00    16.00
Addition                       All cases                    4.00     2.00   2.00    48.00
Multiplication by a constant   All cases                    6.00    16.00   2.00    92.00
Squaring                       type I         PB            6.00    12.00   2.00    64.00
                                              NB            6.00    12.00   2.00    48.00
                               type II        PB            6.00    14.93   2.00    74.93
                                              NB            6.00    16.00   2.00   100.00
                               type III       PB            6.00    12.00   2.00    64.00
                                              NB            6.00    16.00   2.00    50.00

Table 6.2: The normalized properties of buffering, addition, multiplication by a constant, and squaring in F16 for all three types of bases of F16 over F4 and using both polynomial basis (PB) and normal basis (NB) of F4 over F2.

6.6.5 Properties of Arithmetic in F16

The properties of our architectures depend on the properties of the operations in F2 and in some cases on the properties of the operations in F4. The properties of operations in F2 are given in Table 3.1 on page 28, while the properties of the operations in F4 are given in Table 6.1 on page 108.

The properties of buffering, addition, multiplication by a constant, and squaring in F16 are given in Table 6.2. The properties of multiplication in F16 are given in Table 6.3. Finally the properties of inversion in F16 are given in Table 6.4. The properties are given assuming that inverter chains are used in the critical path whenever that minimizes the delay.

We can note that we minimize both the area and the delay of inversion in F16 by choosing normal basis representation of F4 and a type II basis of F16 over F4 with s = t = 1. The area is minimized when we use F4-multiplier C, but the delay is minimized when we use F4-multiplier D.

On the other hand, for multiplication in F16 we minimize both the area and the delay by choosing polynomial basis representation of F4 and a type I basis of F16 over F4 with s = t = 1. The area is minimized when we use F4-multiplier A, but the delay is minimized when we use F4-multiplier B.


                                              Type I         Type II        Type III
                    Basis of     Multiplier   Multiplier in  Multiplier in  Multiplier in
                    F4 over F2   in F4        Figure 6.18a   Figure 6.18b   Figure 6.18c
Area                PB           A              297.46         335.46         330.80
                                 B              320.93         364.39         359.72
                    NB           C              315.46         353.46         348.80
                                 D              320.93         364.39         359.72
Delay               PB           A               42.93          56.93          54.93
                                 B               38.93          55.86          53.85
                    NB           C               46.93          60.93          58.93
                                 D               42.93          59.86          57.85
Input capacitance   PB           A                8.00           6.00           6.00
                                 B               10.00           6.00           6.00
                    NB           C                8.00           6.00           6.00
                                 D               10.00           6.00           6.00
Output resistance   All cases                     2.00           2.00           2.00

Table 6.3: The normalized area, delay, input capacitance, and output resistance of multiplication in F16 for all three types of bases of F16 over F4 and using both polynomial basis (PB) and normal basis (NB) of F4 over F2. The smallest numbers for each F16-multiplier are emphasized, and smallest numbers for each choice in F4 are bold.


                                              Type I         Type I         Type II        Type III
                    Basis of     Multiplier   Inverter in    Inverter in    Inverter in    Inverter in
                    F4 over F2   in F4        Figure 6.17a   Figure 6.17b   Figure 6.17c   Figure 6.17d
Area                PB           A              322.93         318.93         298.93         338.93
                                 B              343.99         343.05         319.99         359.99
                    NB           C              314.00         310.00         290.00         330.00
                                 D              317.06         316.12         293.06         333.06
Delay               PB           A               74.93          87.86          74.93          74.93
                                 B               68.66          82.66          68.66          68.66
                    NB           C               70.00          82.93          70.00          70.00
                                 D               63.74          77.74          63.74          63.74
Input capacitance   PB           A                6.00           6.00           6.00           6.00
                                 B                8.00           6.00           8.00           8.00
                    NB           C                6.00           6.00           6.00           6.00
                                 D                8.00           6.00           8.00           8.00
Output resistance   All cases                     2.00           2.00           2.00           2.00

Table 6.4: The normalized area, delay, input capacitance, and output resistance of inversion in F16 for all three types of bases of F16 over F4 and using both polynomial basis (PB) and normal basis (NB) of F4 over F2. The smallest numbers for each F16-inverter are emphasized, and smallest numbers for each choice in F4 are bold.


6.7 Properties of Inversion in Tower Fields

We have assumed that we have $m = 2^k$ for some positive integer k. We have also assumed that the field $F_{2^m}$ is constructed by successive extensions of degree 2, starting with F2. For the analysis we assume that all these extensions are made using the same type of basis. The only exception is F4 since there is no basis of type III of F4 over F2. For all three types of bases of the larger fields we consider both polynomial and normal bases of F4 over F2 and all four multiplier architectures derived in Section 6.5. For each type, we wish to find the smallest and the fastest inverter.

Each architecture for $F_{2^m}$ in Sections 6.2 through 6.4 uses arithmetic operations in $F_{2^{m/2}}$. Therefore we can express the properties of these architectures using corresponding properties of architectures for $F_{2^{m/2}}$. This gives us a set of first order difference equations, and the solution of these equations gives us the explicit properties. The initial conditions of the equations are the properties of our architectures for F16 in Section 6.6. We assume that optimal inverter chains are used whenever they reduce the delay of the critical path. We therefore will use the functions a(r, c) and t(r, c), the normalized area and delay respectively of an optimal inverter chain driven by a normalized resistance r and loaded by a normalized capacitance c, introduced in Section 3.3.1. Buffers are also used to reduce capacitive loads in the critical path to further reduce the delay whenever possible.

Let $c_A(k)$, $r_A(k)$, $t_A(k)$, and $a_A(k)$ denote the normalized input capacitance, output resistance, internal delay, and area respectively of addition in $F_{2^{2^k}}$. We use similar notations with indices B, CM, and I for the corresponding properties of buffering, multiplication by a constant, and inversion respectively in $F_{2^{2^k}}$.

Buffering and addition are performed componentwise and therefore it is easy to establish the properties

$c_B(k) = 2$, $c_A(k) = 4$,
$r_B(k) = 1$, $r_A(k) = 2$,
$t_B(k) = 2$, $t_A(k) = 2$,
$a_B(k) = 4 \cdot 2^k$, $a_A(k) = 12 \cdot 2^k$,

for buffering and addition.


For multiplication by a constant using the architecture in Figures 6.3a, 6.7a, and 6.11a on pages 90, 94, and 101 we have the equations

$c_{CM}(k) = 2 c_{CM}(k-1)$,

$r_{CM}(k) = r_A(k-1)$,

$t_{CM}(k) = t_{CM}(k-1) + t(r_{CM}(k-1), c_A(k-1)) + t_A(k-1) = t_{CM}(k-1) + 10$,

$a_{CM}(k) = 4 a_{CM}(k-1) + 4 a(r_{CM}(k-1), c_A(k-1)) \frac{m}{2} + 2 a_A(k-1) = 4 a_{CM}(k-1) + 12 \cdot 2^k$.

The initial conditions for F16 are found in Table 6.2. They are $c_{CM}(2) = 6$, $r_{CM}(2) = 2$, $t_{CM}(2) = 16$, and $a_{CM}(2) = 92$. The solution is

$c_{CM}(k) = 3 \cdot 2^{k-1}$,

$r_{CM}(k) = 2$,

$t_{CM}(k) = 10k - 4$,

$a_{CM}(k) = 8.75 \cdot 4^k - 12 \cdot 2^k$.

Using the same recursive principle we get the properties of all our architectures.
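The closed-form solution can be confirmed by simply iterating the difference equations from the F16 initial conditions. The sketch below uses the simplified right-hand sides given above (a delay increment of 10 and an area increment of 12·2^k per level); the function names are hypothetical.

```python
def iterate_cm(k_max):
    """Iterate the recursions for multiplication by a constant from k = 2."""
    c, r, t, a = 6, 2, 16, 92                 # F16 values from Table 6.2
    properties = {2: (c, r, t, a)}
    for k in range(3, k_max + 1):
        c = 2 * c                             # c_CM(k) = 2 c_CM(k-1)
        t = t + 10                            # t_CM(k) = t_CM(k-1) + 10
        a = 4 * a + 12 * 2 ** k               # a_CM(k) = 4 a_CM(k-1) + 12 * 2^k
        properties[k] = (c, r, t, a)
    return properties

def closed_form(k):
    """The stated solution; 8.75 * 4^k is written as 35 * 4^(k-1) to stay in integers."""
    return (3 * 2 ** (k - 1), 2, 10 * k - 4, 35 * 4 ** (k - 1) - 12 * 2 ** k)

def solution_matches(k_max=12):
    table = iterate_cm(k_max)
    return all(table[k] == closed_form(k) for k in range(2, k_max + 1))
```

For example, one step of the recursion gives an area of 4·92 + 96 = 464 for k = 3, which equals 8.75·64 − 96.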

In Section 6.4 we use a multiplication of an element of $F_{2^{2^k}}$ by a primitive constant in F4. The construction of our field as a tower field can be interpreted as a resulting extension over F4 of degree $2^{k-1}$. Consequently, we can implement this multiplication componentwise over F4 using $2^{k-1}$ copies of the simple architectures in Figures 6.13cd and 6.14cd. As for buffering and addition we can easily establish the properties

$c_{CM,F_4}(k) = 6$,
$r_{CM,F_4}(k) = 2$,
$t_{CM,F_4}(k) = 2$,
$a_{CM,F_4}(k) = 2^{k+3}$,

for multiplication by a constant in F4. The area, delay, as well as input capacitance, are considerably smaller than the corresponding properties for multiplication by a constant in $F_{2^{2^k}}$.

Most architectures for the arithmetic operations that we have considered in this chapter have the normalized output resistance 2. However, the different architectures for inversion that we have derived do not have the same input capacitance. Therefore we assume that all inverters are driven from outputs with normalized resistance 2. Any inverter chains that are needed for these input links are included in the area and the delay of the inverters.


6.7.1 Type I Bases

The smallest type I inverter, for all k > 2, is the inverter in Figure 6.1c, using the multiplication by a constant in Figure 6.3b, the squarer in Figure 6.4b, the multiplier in Figure 6.2b, and assuming a polynomial basis of F4. We use the F4 multiplier in Figure 6.13a and the F16 inverter in Figure 6.17b. For this configuration we have the properties

$t_I(k) = 7.6 \cdot k^3 - 8.4 \cdot k^2 + 19.4 \cdot k + 21.5 + 4.4 \sum_{i=4}^{k+1} \log i$,

$a_I(k) = 16.1 \cdot 3^k k + 46.7 \cdot 3^k - 18.5 \cdot 2^k k - 56.3 \cdot 2^k - 18.1$.

The situation is somewhat more complicated for fast type I inverters. For k = 3 the fastest inverter is the inverter in Figure 6.1c, using the multiplication by a constant in Figure 6.3a, the squarer in Figure 6.4b, the multiplier in Figure 6.2b, assuming a normal basis of F4, with the F4 multiplier in Figure 6.14b. Then we have the properties

$t_I(3) = 203.6$,

$a_I(3) = 1781.7$.

The fastest inverter for k = 4 is instead the inverter in Figure 6.1d, still using the multiplication by a constant in Figure 6.3a, the squarer in Figure 6.4b, the multiplier in Figure 6.2b, assuming a normal basis of F4, with the F4 multiplier in Figure 6.14b. Then we have the properties

$t_I(4) = 421.2$,

$a_I(4) = 7492.2$.

For k = 5 the fastest inverter is again the inverter in Figure 6.1c, still using the multiplication by a constant in Figure 6.3a, the squarer in Figure 6.4b, the multiplier in Figure 6.2b, but assuming a polynomial basis of F4, with the F4 multiplier in Figure 6.13b. Then we have the properties

$t_I(5) = 761.8$,

$a_I(5) = 28449$.

The fastest type I inverter, for all k > 5, is the inverter in Figure 6.1d, still using the multiplication by a constant in Figure 6.3a, the squarer in Figure 6.4b, the multiplier in Figure 6.2b, again assuming a polynomial basis of F4, with the F4 multiplier in Figure 6.13b. For this configuration we have the properties

$t_I(k) = 4.8 \cdot k^3 + 2.1 \cdot k^2 + 6.4 \cdot k + 4.4 \sum_{i=5}^{k+2} \log i + 49.7$,

$a_I(k) = 15.4 \cdot 4^k + 70.2 \cdot 3^k + 3.0 \cdot 2^k k - 143.6 \cdot 2^k + 53.7$,

for k > 5.

6.7.2 Type II Bases

The smallest type II inverter, for all k > 2, is the inverter in Figure 6.5b, using the multiplication by a constant in Figure 6.7b, the squarer in Figure 6.8c, the multiplier in Figure 6.6b, and assuming a polynomial basis of F4. We use the F4 multiplier in Figure 6.13a and the F16 inverter in Figure 6.17c. For this configuration we have the properties

$t_I(k) = 9.0 \cdot k^3 - 16.0 \cdot k^2 + 55.7 \cdot k - 33.4$,

$a_I(k) = 17.4 \cdot 3^k k + 13.4 \cdot 3^k + 5.5 \cdot 2^k k - 2.1 \cdot 2^k - 149.1$.

The fastest type II inverter, for all k > 2, is also the inverter in Figure 6.5b, but using the multiplication by a constant in Figure 6.7a, the squarer in Figure 6.8a, the multiplier in Figure 6.6b, and assuming a polynomial basis of F4. We use the F4 multiplier in Figure 6.13b and the F16 inverter in Figure 6.17c. For this configuration we have the properties

$t_I(k) = 4.8 \cdot k^3 + 11.0 \cdot k^2 - 0.6 \cdot k$,

$a_I(k) = 20.5 \cdot 4^k + 79.4 \cdot 3^k - 12.0 \cdot 2^k k - 178.1 \cdot 2^k + 86.4$.
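To get a feeling for the trade-off between the two type II configurations, the closed-form expressions can be tabulated for a few values of k. The sketch below only evaluates the formulas as printed, with their one-decimal coefficients, so the numbers are approximate; the function names are hypothetical.

```python
def t_small(k):
    """Delay of the smallest type II inverter."""
    return 9.0 * k**3 - 16.0 * k**2 + 55.7 * k - 33.4

def a_small(k):
    """Area of the smallest type II inverter."""
    return 17.4 * 3**k * k + 13.4 * 3**k + 5.5 * 2**k * k - 2.1 * 2**k - 149.1

def t_fast(k):
    """Delay of the fastest type II inverter."""
    return 4.8 * k**3 + 11.0 * k**2 - 0.6 * k

def a_fast(k):
    """Area of the fastest type II inverter."""
    return 20.5 * 4**k + 79.4 * 3**k - 12.0 * 2**k * k - 178.1 * 2**k + 86.4

def tabulate(k_values=range(3, 7)):
    """Return rows (k, m, t_small, t_fast, a_small, a_fast) with m = 2^k."""
    return [(k, 2**k, t_small(k), t_fast(k), a_small(k), a_fast(k))
            for k in k_values]

if __name__ == "__main__":
    for k, m, ts, tf, as_, af in tabulate():
        print(f"k={k} m={m:3d}  t: {ts:8.1f} vs {tf:8.1f}   a: {as_:10.1f} vs {af:10.1f}")
```

The fast configuration carries a term proportional to 4^k in its area, so the area gap between the two widens quickly with k, while the delay gap grows only cubically.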

6.7.3 Type III Bases

The smallest type III inverter, for all k > 2, is the inverter in Figure 6.9c, assuming $q_0 = p_0 + 1$ and $s = t = d^{-1}$, using the multiplication by a constant in Figure 6.11b, the squarer in Figure 6.12c, the multiplier in Figure 6.10c, and assuming a polynomial basis of F4. We use the F4 multiplier in Figure 6.13a and the F16 inverter in Figure 6.17d. For this configuration we have the properties

$t_I(k) = 12.3 \cdot k^3 - 32.9 \cdot k^2 + 83.3 \cdot k - 47.5$,

$a_I(k) = 27.2 \cdot 3^k k + 22.0 \cdot 3^k - 13.3 \cdot 2^k k - 69.3 \cdot 2^k + 56.8$.

The fastest type III inverter, for all k > 2, is the inverter in Figure 6.9a, also assuming $q_0 = p_0 + 1$ and $s = t = d^{-1}$, using the multiplication by a constant in Figure 6.11a, the squarer in Figure 6.12a, the multiplier in Figure 6.10c, and assuming a polynomial basis of F4. We use the F4 multiplier in Figure 6.13b and the F16 inverter in Figure 6.17d. For this configuration we have the properties

$t_I(k) = 4.8 \cdot k^3 + 12.0 \cdot k^2 - 7.9 \cdot k + 5.9$,

$a_I(k) = 23.9 \cdot 4^k + 87.9 \cdot 3^k - 12.0 \cdot 2^k k - 175.8 \cdot 2^k - 14.7$.

6.7.4 Best Choices

In Figures 6.19 and 6.20 we have plotted the normalized time and area, respectively, needed for the fastest and smallest architectures for each type of basis, for extension degree $m = 2^k$, 1 ≤ k ≤ 6. We should note that the normalized area of these architectures also serves as an upper bound on the normalized energy of these architectures, since we have n = 1.

The fastest choice, as well as the smallest choice, for k = 1 is trivially the (type II) normal basis inverter in Figure 6.14e on page 107.

The fastest choice for k = 2 is any of the type I inverter in Figure 6.17a, the type II inverter in Figure 6.17c, or the type III inverter in Figure 6.17d, using normal basis representation of F4, with the F4 multiplier in Figure 6.14b. However, the smallest of these three inverters is the type II inverter, and hence that should be the natural choice when delay is the most important measure. The fastest choices for k > 2 are the fast type I inverters in Section 6.7.1, using a normal basis of F4 for 2 < k < 5 and a polynomial basis of F4 for k ≥ 5.


[Plot: "Tower field inverters". Horizontal axis: m. Vertical axis: nt/m. Curves: TF-I-small, TF-I-fast, TF-II-small, TF-II-fast, TF-III-small, TF-III-fast.]

Figure 6.19: Normalized time needed for inversion in $F_{2^m}$ using tower field representations and the architectures considered in this chapter.

TF-I-small – Inversion using bases of type I, smallest case.
TF-I-fast – Inversion using bases of type I, fastest case.
TF-II-small – Inversion using bases of type II, smallest case.
TF-II-fast – Inversion using bases of type II, fastest case.
TF-III-small – Inversion using bases of type III, smallest case.
TF-III-fast – Inversion using bases of type III, fastest case.


[Plot: "Tower field inverters". Horizontal axis: m. Vertical axis: a/m². Curves: TF-I-small, TF-I-fast, TF-II-small, TF-II-fast, TF-III-small, TF-III-fast.]

Figure 6.20: Normalized area needed for inversion in $F_{2^m}$ using tower field representations and the architectures considered in this chapter.

TF-I-small – Inversion using bases of type I, smallest case.
TF-I-fast – Inversion using bases of type I, fastest case.
TF-II-small – Inversion using bases of type II, smallest case.
TF-II-fast – Inversion using bases of type II, fastest case.
TF-III-small – Inversion using bases of type III, smallest case.
TF-III-fast – Inversion using bases of type III, fastest case.


The smallest choice for k = 2 is the type II inverter in Figure 6.17c using normal basis representation of F4, with the F4 multiplier in Figure 6.14a. The smallest choice for k = 3 is the type I inverter in Figure 6.1c using the multiplication by a constant in Figure 6.3a, the squarer in Figure 6.4b, the multiplier in Figure 6.2b, assuming a normal basis of F4, with the F4 multiplier in Figure 6.14a. The smallest choice for 3 < k < 25 is the small type II inverter in Section 6.7.2, where we instead use a polynomial basis of F4. The smallest choice for k ≥ 25 is the small type I inverter in Section 6.7.1, using a polynomial basis of F4. However, such large fields are hardly of any practical interest today. The plots in Figures 6.19 and 6.20 of the normalized time and area, respectively, needed for the architectures considered in this chapter, for 2 ≤ m ≤ 65, are based on the cost measures given in Sections 6.7.1 through 6.7.3.
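The change-over between the small type II and the small type I inverter can be located roughly by evaluating the two area expressions from Sections 6.7.1 and 6.7.2. The sketch below uses the coefficients as printed, rounded to one decimal; since the leading terms of the two expressions nearly cancel, the exact change-over index is sensitive to that rounding and should be read as approximate. The function names are hypothetical.

```python
def a_small_type1(k):
    """Area of the smallest type I inverter (Section 6.7.1)."""
    return 16.1 * 3**k * k + 46.7 * 3**k - 18.5 * 2**k * k - 56.3 * 2**k - 18.1

def a_small_type2(k):
    """Area of the smallest type II inverter (Section 6.7.2)."""
    return 17.4 * 3**k * k + 13.4 * 3**k + 5.5 * 2**k * k - 2.1 * 2**k - 149.1

def smaller_type(k):
    """Which of the two small inverters has the smaller area at level k."""
    return "I" if a_small_type1(k) < a_small_type2(k) else "II"

def first_type1_win(k_start=4, k_stop=40):
    """First k >= k_start at which the small type I inverter is the smaller one."""
    for k in range(k_start, k_stop):
        if smaller_type(k) == "I":
            return k
    return None
```

With these rounded coefficients the small type II inverter comes out smaller from k = 4 up to the mid-twenties and the small type I inverter takes over after that, which is in line with the ranges stated above to within the rounding of the coefficients.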

The best choice of representation of F4 varies with the size of the field. Forsmall k the best choice for all three types is normal basis representation ofF4, but for large k the best choice is instead a polynomial basis. This has todo with the area and the delay of inversion and multiplication in F4. For anychoice of k we only have one F4 inverter but the number of F4 multipliersgrows with k. Therefore the size and delay of an F4 inverter is importantfor small k and the size and delay of an F4 multiplier is important for largek. In Table 6.1 on page 108 we see that the smallest polynomial basismultiplier is smaller than the smallest normal basis multiplier in F4. Also,the fastest polynomial basis multiplier is smaller than the fastest normalbasis multiplier. On the other hand, the smallest and fastest F4 inverter isthe trivial normal basis inverter.

Using a polynomial basis of F4 when a normal basis is the best, or using a normal basis of F4 when a polynomial basis is the best, makes the cost measures slightly larger. The increase in both area and delay is a few per cent for small values of k. The increase is smaller for large values of k.

Chapter 7

Concluding Remarks

The architectures considered in this thesis can be separated into the following two categories:

1. architectures based on a direct extension of F2, and

2. architectures based on extensions in more than one step.

The architectures can also be separated into the following two categories:

1. sequential architectures, and

2. parallel architectures.

These two ways to characterize the architectures coincide in this thesis. The architectures in Chapters 4 and 5 for polynomial and normal basis representation fall into category 1 of both characterizations, while the architectures in Chapter 6 for tower fields fall into category 2 of both characterizations. This does not necessarily mean that the architectures in these two categories are incomparable.

7.1 Conclusions

In Figures 7.1 through 7.3 we have plotted the normalized time, area, and energy, respectively, needed for the best architectures considered in Chapters 4 through 6 for 2 ≤ m ≤ 65. Recall that we have neglected the delay of the control logic of the sequential inverters in Chapters 4 and 5. Upon studying those figures, we note that there is no architecture that is the best in all respects.

7.1.1 Fast Inverters

We find fast inverters in all three chapters where we consider different inverter architectures, that is, Chapters 4 through 6. More precisely:

• m ∈ {2, 4, 8, 16, 32}: The fastest inverters are found in Chapter 6, where we use tower field representation. For m ∈ {2, 4}, the fastest inverters are based on type II bases. For m ∈ {8, 16, 32}, we should instead prefer type I inverters.

• m ∈ {3, 5 . . . 7, 9 . . . 15, 17, 19 . . . 25, 27, 29, 31}: The fastest inverter is found in Chapter 4, where we use polynomial basis representation. We should prefer the inverter based on the Gauss-Jordan algorithm using Hasan and Bhargava’s preprocessor.

• m ∈ {18, 26, 28, 30, 34 . . . 60}: The fastest inverter is found in Chapter 5, where we use normal basis representation. For those values of the extension degree, the fastest choice is Feng’s inverter based on accelerated multiplication and squaring.

• m > 60: We do not know the exact values of the smallest Hamming complexity among normal bases for these extension degrees. However, the upper bound given in Theorem 4 on page 64 still guarantees that the time needed for one inversion grows slower for Feng’s inverter than for all other considered inverters.
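The case analysis above can be written down as a small lookup, which also makes it easy to check which extension degrees the lists cover. This is only a sketch: the function name and the returned labels are illustrative, the architecture labels GJ-1 and M&S-3 are borrowed from the legend of Figure 7.1, and a range written as "a . . . b" is assumed to be inclusive.

```python
# Extension degrees as listed in the text; "a . . . b" is read as an
# inclusive range (assumption).
TOWER_FIELD = {2, 4, 8, 16, 32}
POLYNOMIAL_BASIS = {3, *range(5, 8), *range(9, 16), 17, *range(19, 26), 27, 29, 31}
NORMAL_BASIS = {18, 26, 28, 30, *range(34, 61)}


def fastest_inverter(m):
    """Return (representation, architecture) for degree m, or None if m is not listed."""
    if m in TOWER_FIELD:
        return ("tower field", "type II" if m in (2, 4) else "type I")
    if m in POLYNOMIAL_BASIS:
        return ("polynomial basis", "GJ-1")
    if m in NORMAL_BASIS:
        return ("normal basis", "M&S-3")
    if m > 60:
        # Only the growth rate is guaranteed here, via the bound in Theorem 4.
        return ("normal basis", "M&S-3 (by growth rate)")
    return None


# Degrees between 2 and 60 that none of the listed sets contains.
unlisted = [m for m in range(2, 61) if fastest_inverter(m) is None]
```

With the sets as they are printed here, `unlisted` contains only m = 33, so the sketch deliberately returns `None` for that degree rather than guessing a category.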

[Plot "Fast inverters": normalized time nt/m versus extension degree m, with curves GJ-1, M&S-3, AOP:GJ-1, TF-I-fast, and TF-II-fast.]

Figure 7.1: Normalized time needed for inversion in F_{2^m} using the fastest architectures for polynomial basis, normal basis, and tower field representations considered in Chapters 4 through 6.

GJ-1 – Polynomial basis inversion based on the Gauss-Jordan algorithm using Hasan and Bhargava’s preprocessor.
M&S-3 – Normal basis inversion based on accelerated multiplication and squaring using Feng’s idea.
AOP:GJ-1 – Normal basis inversion based on the Gauss-Jordan algorithm using Hasan and Bhargava’s preprocessor for the case where the normal basis is generated from an irreducible all-one polynomial, via a basis exchange to the corresponding polynomial basis.
TF-I-fast – Tower field inversion using bases of type I, fastest case.
TF-II-fast – Tower field inversion using bases of type II, fastest case.


7.1.2 Small Inverters

We find the smallest inverters in Chapter 6 for small extension degrees, and in Chapter 4 for large extension degrees. More precisely:

• m ∈ {2, 4}: The smallest inverters are found in Chapter 6, where we use tower field representation. We should prefer the smallest type II inverters.

• m = 3 and m > 4: The smallest inverter is found in Chapter 4, where we use polynomial basis representation. For those values of the extension degree, the smallest choice is our inverter based on the Berlekamp-Massey algorithm.

7.1.3 Low Energy Inverters

We find inverters consuming little energy in all three chapters where we consider different inverter architectures, that is, Chapters 4 through 6. More precisely:

• m = 2^k, k ∈ Z+: The least energy consuming inverters are found in Chapter 6, where we use tower field representation. For m = 8, the least energy consuming inverter is the smallest type I inverter. For all other k < 25, the smallest inverter is a type II inverter. For k ≥ 25, we should prefer a type I inverter.

• m ∈ {3, 5 . . . 7, 9 . . . 12, 14, 18, 26}: The least energy consuming inverters are found in Chapter 5, where we use normal basis representation. For m = 3, we prefer our modified version of Mastrovito’s inverter. In all other cases, we prefer Feng’s inverter.

• All other cases: The least energy consuming inverter is found in Chapter 4, namely our inverter based on the Berlekamp-Massey algorithm, using polynomial basis representation.
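The three cases for low energy inverters can likewise be sketched as a lookup for m ≥ 2. The function name and the returned labels are illustrative; the labels M&S-2, M&S-3, and BM follow the legend of Figure 7.3, the first case is read as m being a power of two, and "a . . . b" is again assumed to be an inclusive range.

```python
# Degrees for which a normal basis inverter is listed as least energy
# consuming; "a . . . b" is read as an inclusive range (assumption).
NORMAL_BASIS = {3, *range(5, 8), *range(9, 13), 14, 18, 26}


def lowest_energy_inverter(m):
    """Return (representation, architecture) for an extension degree m >= 2."""
    if m >= 2 and m & (m - 1) == 0:      # m is a power of two, m = 2^k
        k = m.bit_length() - 1
        # Type I for m = 8 (k = 3) and for k >= 25, type II otherwise.
        kind = "type I" if k == 3 or k >= 25 else "type II"
        return ("tower field", kind)
    if m in NORMAL_BASIS:
        # Modified Mastrovito inverter for m = 3, Feng's inverter otherwise.
        return ("normal basis", "M&S-2" if m == 3 else "M&S-3")
    return ("polynomial basis", "BM")
```

For example, `lowest_energy_inverter(14)` selects Feng’s inverter, while `lowest_energy_inverter(13)` falls through to the Berlekamp-Massey based inverter.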

[Plot "Small inverters": normalized area a/m^2 versus extension degree m, with curves BM, M&S-1&2, AOP:BM, TF-I-small, and TF-II-small.]

Figure 7.2: Normalized area needed for inversion in F_{2^m} using the smallest architectures for polynomial basis, normal basis, and tower field representations considered in Chapters 4 through 6.

BM – Polynomial basis inversion based on the Berlekamp-Massey algorithm.
M&S-1&2 – Normal basis inversion based on multiplication and squaring using our modified versions of the ideas of Wang et al. and Mastrovito respectively.
AOP:BM – Normal basis inversion based on the Berlekamp-Massey algorithm, for the case where the normal basis is generated from an irreducible all-one polynomial, via a basis exchange to the corresponding polynomial basis.
TF-I-small – Tower field inversion using bases of type I, smallest case.
TF-II-small – Tower field inversion using bases of type II, smallest case.


[Plot "Low energy inverters": normalized energy np/m^2 versus extension degree m, with curves BM, M&S-2, M&S-3, AOP:BM, TF-I-small, and TF-II-small.]

Figure 7.3: Normalized energy needed for inversion in F_{2^m} using the architectures needing the smallest amount of energy for polynomial basis, normal basis, and tower field representations considered in Chapters 4 through 6.

BM – Polynomial basis inversion based on the Berlekamp-Massey algorithm.
M&S-2 – Normal basis inversion based on multiplication and squaring using our modified version of the idea of Mastrovito.
M&S-3 – Normal basis inversion based on accelerated multiplication and squaring using Feng’s idea.
AOP:BM – Normal basis inversion based on the Berlekamp-Massey algorithm, for the case where the normal basis is generated from an irreducible all-one polynomial, via a basis exchange to the corresponding polynomial basis.
TF-I-small – Tower field inversion using bases of type I, smallest case.
TF-II-small – Tower field inversion using bases of type II, smallest case.

7.2 Future Research

Our upper bound on the Hamming complexity of normal bases does not seem to be especially good. We noted in Section 5.2 that the bound seems to be off by approximately 10% for fairly large extension degrees. Is it possible to find better upper bounds? Our investigation suggests that the cases m ≡ i mod 4, i ∈ {0, 1, 2, 3}, should be separated.

A possibly harder problem, but more interesting, would be the search for an upper bound on the smallest possible Hamming complexity of normal bases for all or some extension degrees.

There seems to be a lack of explicitly given normal bases with small Hamming complexity for many extension degrees > 32. Therefore a search for normal bases for such degrees, preferably with minimum Hamming complexity, is of interest. This is a computationally demanding task. Methods for explicit constructions are also of interest. Theorem 5 gives us a method for construction of normal bases for extension degrees that are not divisible by eight. A corresponding method for degrees that are divisible by eight would be preferable as well.

The list in Appendix A of minimum weight irreducible polynomials over F2, for degrees up to 4000, contains only polynomials of weight ≤ 5. It is our belief that this holds for all degrees, but we have yet to prove that.

Most published architectures for multiplication and inversion are based on either polynomial or normal basis representation. What about other bases? Are there other interesting, preferably infinite, classes of bases? Our investigation of tower field representation, where the extension degree is a power of two, is one example. A natural extension of our investigation of tower field representations would be to consider, for instance, extension degrees that are powers of three, or of the form 2^i 3^j.

Appendix A

Minimum Weight Irreducible Polynomials over F2

This appendix contains a table of irreducible polynomials, one polynomial for each degree up to 4000. The polynomials are found using Magma by a lexicographic search among the possible trinomials of the given degrees. In case no irreducible trinomial is found for the given degree, the search continues with pentanomials and so on. The first irreducible polynomial found using this procedure is chosen. The result is that the polynomial chosen has minimum possible weight for the given degree, m, and given that weight it has the minimum degree, k1, of the second term and so on.
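The search order described above can be sketched in a few lines of Python for small degrees. This is not the Magma code used for the table: the irreducibility check below is Rabin's test, swapped in for illustration, and all function names are hypothetical. Polynomials over F2 are stored as integers whose bit i is the coefficient of x^i. Only odd weights are tried, since a polynomial with an even number of terms has 1 as a root.

```python
def poly_mod(a, f):
    """Remainder of a modulo f (f nonzero); bit i of an int is the coefficient of x^i."""
    deg_f = f.bit_length() - 1
    while a.bit_length() - 1 >= deg_f:
        a ^= f << (a.bit_length() - 1 - deg_f)
    return a


def poly_mulmod(a, b, f):
    """Product a*b modulo f by shift-and-add."""
    result, a = 0, poly_mod(a, f)
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a = poly_mod(a << 1, f)
    return result


def poly_gcd(a, b):
    while b:
        a, b = b, poly_mod(a, b)
    return a


def prime_divisors(n):
    p, found = 2, []
    while p * p <= n:
        if n % p == 0:
            found.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        found.append(n)
    return found


def is_irreducible(f):
    """Rabin's test: x^(2^n) = x mod f, and gcd(f, x^(2^(n/p)) - x) = 1 for primes p | n."""
    n = f.bit_length() - 1
    if n < 1:
        return False
    frob = [poly_mod(0b10, f)]                 # frob[i] = x^(2^i) mod f
    for _ in range(n):
        frob.append(poly_mulmod(frob[-1], frob[-1], f))
    if frob[n] != frob[0]:
        return False
    return all(poly_gcd(f, frob[n // p] ^ frob[0]) == 1 for p in prime_divisors(n))


def mid_term_choices(upper, count):
    """Yield k1 > k2 > ... >= 1 with k1 < upper, smallest k1 first, then smallest k2, ..."""
    if count == 0:
        yield ()
        return
    for k in range(count, upper):
        for rest in mid_term_choices(k, count - 1):
            yield (k,) + rest


def min_weight_irreducible(m):
    """Mid-term degrees of the first irreducible polynomial of degree m in the search order."""
    if m == 1:
        return ()                              # x + 1 has no mid terms
    for count in range(1, m, 2):               # 1: trinomials, 3: pentanomials, ...
        for ks in mid_term_choices(m, count):
            f = (1 << m) | 1 | sum(1 << k for k in ks)
            if is_irreducible(f):
                return ks
    raise ValueError("no irreducible polynomial found")
```

For instance, `min_weight_irreducible(8)` finds no irreducible trinomial and moves on to pentanomials, ending at the mid terms 4, 3, 1. The shift-and-add arithmetic is adequate for small degrees only; reaching degree 4000 would need much faster polynomial arithmetic.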

To save space we have only listed the degrees of the mid terms, since irreducible polynomials over F2 always contain 1 and x^m, where m is the degree. The table should be interpreted as follows. The polynomial is a trinomial if there is only one mid term, and it is a pentanomial if there are three mid terms. As an example, take the entries for degrees 90 and 91:

m    ki

90   27

91   8, 5, 1

These two rows should be interpreted as the trinomial x^90 + x^27 + 1 and the pentanomial x^91 + x^8 + x^5 + x + 1.
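The reading rule for the table can be expressed as a short helper that expands a row into the full polynomial. The function names and the bitmask encoding (bit i is the coefficient of x^i) are illustrative assumptions, not part of the appendix.

```python
def row_to_degrees(m, mid_terms=()):
    """Degrees of all nonzero terms: x^m, the listed mid terms, and the constant 1."""
    return [m, *mid_terms, 0]


def row_to_text(m, mid_terms=()):
    """Write the polynomial of a table row as text."""
    def term(d):
        if d == 0:
            return "1"
        if d == 1:
            return "x"
        return f"x^{d}"
    return " + ".join(term(d) for d in row_to_degrees(m, mid_terms))


def row_to_bitmask(m, mid_terms=()):
    """Encode the polynomial as an integer whose bit i is the coefficient of x^i."""
    mask = 0
    for d in row_to_degrees(m, mid_terms):
        mask |= 1 << d
    return mask


assert row_to_text(90, [27]) == "x^90 + x^27 + 1"
assert row_to_text(91, [8, 5, 1]) == "x^91 + x^8 + x^5 + x + 1"
```

The weight of a row is then the number of set bits in the bitmask, which is 3 for a trinomial and 5 for a pentanomial.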

A.1 The table

m ki

1

2 1

3 1

4 1

5 2

6 1

7 1

8 4, 3, 1

9 1

10 3

11 2

12 3

13 4, 3, 1

14 5

15 1

16 5, 3, 1

17 3

18 3

19 5, 2, 1

20 3

21 2

22 1

23 5

24 4, 3, 1

25 3

26 4, 3, 1

27 5, 2, 1

28 1

29 2

30 1

31 3

32 7, 3, 2

33 10

34 7

35 2

36 9

37 6, 4, 1

38 6, 5, 1

39 4

40 5, 4, 3

41 3

42 7

43 6, 4, 3

44 5

45 4, 3, 1

46 1

47 5

48 5, 3, 2

49 9

50 4, 3, 2

51 6, 3, 1

52 3

53 6, 2, 1

54 9

55 7

56 7, 4, 2

57 4

58 19

59 7, 4, 2

60 1

61 5, 2, 1

62 29

63 1

64 4, 3, 1

65 18

66 3

67 5, 2, 1

68 9

69 6, 5, 2

70 5, 3, 1

71 6

72 10, 9, 3

73 25

74 35

75 6, 3, 1

76 21

77 6, 5, 2

78 6, 5, 3

79 9

80 9, 4, 2

81 4

82 8, 3, 1

83 7, 4, 2

84 5

85 8, 2, 1

86 21

87 13

88 7, 6, 2

89 38

90 27

91 8, 5, 1

92 21

93 2

94 21

95 11

96 10, 9, 6

97 6

98 11

99 6, 3, 1

100 15

101 7, 6, 1

102 29

103 9

104 4, 3, 1

105 4

106 15

107 9, 7, 4

108 17

109 5, 4, 2

110 33

111 10

112 5, 4, 3

113 9

114 5, 3, 2

115 8, 7, 5

116 4, 2, 1

117 5, 2, 1

118 33

119 8

120 4, 3, 1

121 18

122 6, 2, 1

123 2

124 19

125 7, 6, 5

126 21

127 1

128 7, 2, 1

129 5

130 3

131 8, 3, 2

132 17

133 9, 8, 2

134 57

135 11

136 5, 3, 2

137 21

138 8, 7, 1

139 8, 5, 3

140 15

141 10, 4, 1

142 21

143 5, 3, 2

144 7, 4, 2

145 52

146 71

147 14

148 27

149 10, 9, 7

150 53

151 3

152 6, 3, 2

153 1

154 15

155 62

156 9

157 6, 5, 2

158 8, 6, 5

159 31

160 5, 3, 2

161 18

162 27

163 7, 6, 3

164 10, 8, 7

165 9, 8, 3

166 37

167 6

168 15, 3, 2

169 34

170 11

171 6, 5, 2

172 1

173 8, 5, 2

174 13

175 6

176 11, 3, 2

177 8

178 31

179 4, 2, 1

180 3

181 7, 6, 1

182 81

183 56

184 9, 8, 7

185 24

186 11

187 7, 6, 5

188 6, 5, 2

189 6, 5, 2

190 8, 7, 6

191 9

192 7, 2, 1

193 15

194 87

195 8, 3, 2

196 3

197 9, 4, 2

198 9

199 34

200 5, 3, 2

201 14

202 55

203 8, 7, 1

204 27

205 9, 5, 2

206 10, 9, 5

207 43

208 9, 3, 1

209 6

210 7

211 11, 10, 8

212 105

213 6, 5, 2

214 73

215 23

216 7, 3, 1

217 45

218 11

219 8, 4, 1

220 7

221 8, 6, 2

222 5, 4, 2

223 33

224 9, 8, 3

225 32

226 10, 7, 3

227 10, 9, 4

228 113

229 10, 4, 1

230 8, 7, 6

231 26

232 9, 4, 2

233 74

234 31

235 9, 6, 1

236 5

237 7, 4, 1

238 73

239 36

240 8, 5, 3

241 70

242 95

243 8, 5, 1

244 111

245 6, 4, 1

246 11, 2, 1

247 82

248 15, 14, 10

249 35

250 103

251 7, 4, 2

252 15

253 46

254 7, 2, 1

255 52

256 10, 5, 2

257 12

258 71

259 10, 6, 2

260 15

261 7, 6, 4

262 9, 8, 4

263 93

264 9, 6, 2

265 42

266 47

267 8, 6, 3

268 25

269 7, 6, 1

270 53

271 58

272 9, 3, 2

273 23

274 67

275 11, 10, 9

276 63

277 12, 6, 3

278 5

279 5

280 9, 5, 2

281 93

282 35

283 12, 7, 5

284 53

285 10, 7, 5

286 69

287 71

288 11, 10, 1

289 21

290 5, 3, 2

291 12, 11, 5

292 37

293 11, 6, 1

294 33

295 48

296 7, 3, 2

297 5

298 11, 8, 4

299 11, 6, 4

300 5

301 9, 5, 2

302 41

303 1

304 11, 2, 1

305 102

306 7, 3, 1

307 8, 4, 2

308 15

309 10, 6, 4

310 93

311 7, 5, 3

312 9, 7, 4

313 79

314 15

315 10, 9, 1

316 63

317 7, 4, 2

318 45

319 36

320 4, 3, 1

321 31

322 67

323 10, 3, 1

324 51

325 10, 5, 2

326 10, 3, 1

327 34

328 8, 3, 1

329 50

330 99

331 10, 6, 2

332 89

333 2

334 5, 2, 1

335 10, 7, 2

336 7, 4, 1

337 55

338 4, 3, 1

339 16, 10, 7

340 45

341 10, 8, 6

342 125

343 75

344 7, 2, 1

345 22

346 63

347 11, 10, 3

348 103

349 6, 5, 2

350 53

351 34

352 13, 11, 6

353 69

354 99

355 6, 5, 1

356 10, 9, 7

357 11, 10, 2

358 57

359 68

360 5, 3, 2

361 7, 4, 1

362 63

363 8, 5, 3

364 9

365 9, 6, 5

366 29

367 21

368 7, 3, 2

369 91

370 139

371 8, 3, 2

372 111

373 8, 7, 2

374 8, 6, 5

375 16

376 8, 7, 5

377 41

378 43

379 10, 8, 5

380 47

381 5, 2, 1

382 81

383 90

384 12, 3, 2

385 6

386 83

387 8, 7, 1

388 159

389 10, 9, 5

390 9

391 28

392 13, 10, 6

393 7

394 135

395 11, 6, 5

396 25

397 12, 7, 6

398 7, 6, 2

399 26

400 5, 3, 2

401 152

402 171

403 9, 8, 5

404 65

405 13, 8, 2

406 141

407 71

408 5, 3, 2

409 87

410 10, 4, 3

411 12, 10, 3

412 147

413 10, 7, 6

414 13

415 102

416 9, 5, 2

417 107

418 199

419 15, 5, 4

420 7

421 5, 4, 2

422 149

423 25

424 9, 7, 2

425 12

426 63

427 11, 6, 5

428 105

429 10, 8, 7

430 14, 6, 1

431 120

432 13, 4, 3

433 33

434 12, 11, 5

435 12, 9, 5

436 165

437 6, 2, 1

438 65

439 49

440 4, 3, 1

441 7

442 7, 5, 2

443 10, 6, 1

444 81

445 7, 6, 4

446 105

447 73

448 11, 6, 4

449 134

450 47

451 16, 10, 1

452 6, 5, 4

453 15, 6, 4

454 8, 6, 1

455 38

456 18, 9, 6

457 16

458 203

459 12, 5, 2

460 19

461 7, 6, 1

462 73

463 93

464 19, 18, 13

465 31

466 14, 11, 6

467 11, 6, 1

468 27

469 9, 5, 2

470 9

471 1

472 11, 3, 2

473 200

474 191

475 9, 8, 4

476 9

477 16, 15, 7

478 121

479 104

480 15, 9, 6

481 138

482 9, 6, 5

483 9, 6, 4

484 105

485 17, 16, 6

486 81

487 94

488 4, 3, 1

489 83

490 219

491 11, 6, 3

492 7

493 10, 5, 3

494 17

495 76

496 16, 5, 2

497 78

498 155

499 11, 6, 5

500 27

501 5, 4, 2

502 8, 5, 4

503 3

504 15, 14, 6

505 156

506 23

507 13, 6, 3

508 9

509 8, 7, 3

510 69

511 10

512 8, 5, 2

513 26

514 67

515 14, 7, 4

516 21

517 12, 10, 2

518 33

519 79

520 15, 11, 2

521 32

522 39

523 13, 6, 2

524 167

525 6, 4, 1

526 97

527 47

528 11, 6, 2

529 42

530 10, 7, 3

531 10, 5, 4

532 1

533 4, 3, 2

534 161

535 8, 6, 2

536 7, 5, 3

537 94

538 195

539 10, 5, 4

540 9

541 13, 10, 4

542 8, 6, 1

543 16

544 8, 3, 1

545 122

546 8, 2, 1

547 13, 7, 4

548 10, 5, 3

549 16, 4, 3

550 193

551 135

552 19, 16, 9

553 39

554 10, 8, 7

555 10, 9, 4

556 153

557 7, 6, 5

558 73

559 34

560 11, 9, 6

561 71

562 11, 4, 2

563 14, 7, 3

564 163

565 11, 6, 1

566 153

567 28

568 15, 7, 6

569 77

570 67

571 10, 5, 2

572 12, 8, 1

573 10, 6, 4

574 13

575 146

576 13, 4, 3

577 25

578 23, 22, 16

579 12, 9, 7

580 237

581 13, 7, 6

582 85

583 130

584 14, 13, 3

585 88

586 7, 5, 2

587 11, 6, 1

588 35

589 10, 4, 3

590 93

591 9, 6, 4

592 13, 6, 3

593 86

594 19

595 9, 2, 1

596 273

597 14, 12, 9

598 7, 6, 1

599 30

600 9, 5, 2

601 201

602 215

603 6, 4, 3

604 105

605 10, 7, 5

606 165

607 105

608 19, 13, 6

609 31

610 127

611 10, 4, 2

612 81

613 19, 10, 4

614 45

615 211

616 19, 10, 3

617 200

618 295

619 9, 8, 5

620 9

621 12, 6, 5

622 297

623 68

624 11, 6, 5

625 133

626 251

627 13, 8, 4

628 223

629 6, 5, 2

630 7, 4, 2

631 307

632 9, 2, 1

633 101

634 39

635 14, 10, 4

636 217

637 14, 9, 1

638 6, 5, 1

639 16

640 14, 3, 2

641 11

642 119

643 11, 3, 2

644 11, 6, 5

645 11, 8, 4

646 249

647 5

648 13, 3, 1

649 37

650 3


651 14

652 93

653 10, 8, 7

654 33

655 88

656 7, 5, 4

657 38

658 55

659 15, 4, 2

660 11

661 12, 11, 4

662 21

663 107

664 11, 9, 8

665 33

666 10, 7, 2

667 18, 7, 3

668 147

669 5, 4, 2

670 153

671 15

672 11, 6, 5

673 28

674 11, 7, 4

675 6, 3, 1

676 31

677 8, 4, 3

678 15, 5, 3

679 66

680 23, 16, 9

681 11, 9, 3

682 171

683 11, 6, 1

684 209

685 4, 3, 1

686 197

687 13

688 19, 14, 6

689 14

690 79

691 13, 6, 2

692 299

693 15, 8, 2

694 169

695 177

696 23, 10, 2

697 267

698 215

699 15, 10, 1

700 75

701 16, 4, 2

702 37

703 12, 7, 1

704 8, 3, 2

705 17

706 12, 11, 8

707 15, 8, 5

708 15

709 4, 3, 1

710 13, 12, 4

711 92

712 5, 4, 3

713 41

714 23

715 7, 4, 1

716 183

717 16, 7, 1

718 165

719 150

720 9, 6, 4

721 9

722 231

723 16, 10, 4

724 207

725 9, 6, 5

726 5

727 180

728 4, 3, 2

729 58

730 147

731 8, 6, 2

732 343

733 8, 7, 2

734 11, 6, 1

735 44

736 13, 8, 6

737 5

738 347

739 18, 16, 8

740 135

741 9, 8, 3

742 85

743 90

744 13, 11, 1

745 258

746 351

747 10, 6, 4

748 19

749 7, 6, 1

750 309

751 18

752 13, 10, 3

753 158

754 19

755 12, 10, 1

756 45

757 7, 6, 1

758 233

759 98

760 11, 6, 5

761 3

762 83

763 16, 14, 9

764 6, 5, 3

765 9, 7, 4

766 22, 19, 9

767 168

768 19, 17, 4

769 120

770 14, 5, 2

771 17, 15, 6

772 7

773 10, 8, 6

774 185

775 93

776 15, 14, 7

777 29

778 375

779 10, 8, 3

780 13

781 17, 16, 2

782 329

783 68

784 13, 9, 6

785 92

786 12, 10, 3

787 7, 6, 3

788 17, 10, 3

789 5, 2, 1

790 9, 6, 1

791 30

792 9, 7, 3

793 253

794 143

795 7, 4, 1

796 9, 4, 1

797 12, 10, 4

798 53

799 25

800 9, 7, 1

801 217

802 15, 13, 9

803 14, 9, 2

804 75

805 8, 7, 2

806 21

807 7

808 14, 3, 2

809 15

810 159

811 12, 10, 8

812 29

813 10, 3, 1

814 21

815 333

816 11, 8, 2

817 52

818 119

819 16, 9, 7

820 123

821 15, 11, 2

822 17

823 9

824 11, 6, 4

825 38

826 255

827 12, 10, 7

828 189

829 4, 3, 1

830 17, 10, 7

831 49

832 13, 5, 2

833 149

834 15

835 14, 7, 5

836 10, 9, 2

837 8, 6, 5

838 61

839 54

840 11, 5, 1

841 144

842 47

843 11, 10, 7

844 105

845 2

846 105

847 136

848 11, 4, 1

849 253

850 111

851 13, 10, 5

852 159

853 10, 7, 1

854 7, 5, 3

855 29

856 19, 10, 3

857 119

858 207

859 17, 15, 4

860 35

861 14

862 349

863 6, 3, 2

864 21, 10, 6

865 1

866 75

867 9, 5, 2

868 145

869 11, 7, 6

870 301

871 378

872 13, 3, 1

873 352

874 12, 7, 4

875 12, 8, 1

876 149

877 6, 5, 4

878 12, 9, 8

879 11

880 15, 7, 5

881 78

882 99

883 17, 16, 12

884 173

885 8, 7, 1

886 13, 9, 8

887 147

888 19, 18, 10

889 127

890 183

891 12, 4, 1

892 31

893 11, 8, 6

894 173

895 12

896 7, 5, 3

897 113

898 207

899 18, 15, 5

900 1

901 13, 7, 6

902 21

903 35

904 12, 7, 2

905 117

906 123

907 12, 10, 2

908 143

909 14, 4, 1

910 15, 9, 7

911 204

912 7, 5, 1

913 91

914 4, 2, 1

915 8, 6, 3

916 183

917 12, 10, 7

918 77

919 36

920 14, 9, 6

921 221

922 7, 6, 5

923 16, 14, 13

924 31

925 16, 15, 7

926 365

927 403

928 10, 3, 2

929 11, 4, 3

930 31

931 10, 9, 4

932 177

933 16, 6, 1

934 22, 6, 5

935 417

936 15, 13, 12

937 217

938 207

939 7, 5, 4

940 10, 7, 1

941 11, 6, 1

942 45

943 24

944 12, 11, 9

945 77

946 21, 20, 13

947 9, 6, 5

948 189

949 8, 3, 2

950 13, 12, 10

951 260

952 16, 9, 7

953 168

954 131

955 7, 6, 3

956 305

957 10, 9, 6

958 13, 9, 4

959 143

960 12, 9, 3

961 18

962 15, 8, 5

963 20, 9, 6

964 103

965 15, 4, 2

966 201

967 36

968 9, 5, 2

969 31

970 11, 7, 2

971 6, 2, 1

972 7

973 13, 6, 4

974 9, 8, 7

975 19

976 17, 10, 6

977 15

978 9, 3, 1

979 178

980 8, 7, 6

981 12, 6, 5

982 177

983 230

984 24, 9, 3

985 222

986 3

987 16, 13, 12

988 121

989 10, 4, 2

990 161

991 39

992 17, 15, 13

993 62

994 223

995 15, 12, 2

996 65

997 12, 6, 3

998 101

999 59

1000 5, 4, 3

1001 17

1002 5, 3, 2

1003 13, 8, 3

1004 10, 9, 7

1005 12, 8, 2

1006 5, 4, 3

1007 75

1008 19, 17, 8

1009 55

1010 99

1011 10, 7, 4

1012 115

1013 9, 8, 6

1014 385

1015 186

1016 15, 6, 3

1017 9, 4, 1

1018 12, 10, 5

1019 10, 8, 1

1020 135

1021 5, 2, 1

1022 317

1023 7

1024 19, 6, 1

1025 294

1026 35

1027 13, 12, 6

1028 119

1029 98

1030 93

1031 68

1032 21, 15, 3

1033 108

1034 75

1035 12, 6, 5

1036 411

1037 12, 7, 2

1038 13, 7, 2

1039 21

1040 15, 10, 8

1041 412

1042 439

1043 10, 7, 6

1044 41

1045 13, 9, 6

1046 8, 5, 2

1047 10

1048 15, 7, 2

1049 141

1050 159

1051 13, 12, 10

1052 291

1053 10, 9, 1

1054 105

1055 24

1056 11, 2, 1

1057 198

1058 27

1059 6, 3, 1

1060 439

1061 10, 3, 1

1062 49

1063 168

1064 13, 11, 9

1065 463

1066 10, 9, 3

1067 13, 9, 8

1068 15, 8, 3

1069 18, 16, 8

1070 15, 14, 11

1071 7

1072 19, 9, 8

1073 12, 6, 3

1074 7, 4, 3

1075 15, 14, 5

1076 8, 6, 3

1077 10, 9, 7

1078 361

1079 230

1080 15, 9, 6

1081 24

1082 407

1083 16, 7, 2

1084 189

1085 62

1086 189

1087 112

1088 22, 21, 10

1089 91

1090 79

1091 12, 10, 5

1092 23

1093 7, 6, 1

1094 57

1095 139

1096 24, 15, 6

1097 14

1098 83

1099 16, 9, 1

1100 35

1101 9, 7, 4

1102 117

1103 65

1104 21, 9, 6

1105 21

1106 195

1107 23, 11, 10

1108 327

1109 17, 14, 3

1110 417

1111 13

1112 15, 8, 6

1113 107

1114 19, 10, 6

1115 18, 15, 3

1116 59

1117 12, 10, 4

1118 9, 7, 5

1119 283

1120 13, 9, 6

1121 62

1122 427

1123 14, 7, 3

1124 8, 7, 4

1125 15, 8, 3

1126 105

1127 27

1128 7, 3, 1

1129 103

1130 551

1131 10, 6, 1

1132 6, 4, 1

1133 11, 6, 4

1134 129

1135 9

1136 9, 4, 2

1137 277

1138 31

1139 13, 12, 5

1140 141

1141 12, 7, 3

1142 357

1143 7, 2, 1

1144 11, 9, 7

1145 227

1146 131

1147 7, 6, 3

1148 23

1149 20, 17, 3

1150 13, 4, 1

1151 90

1152 15, 3, 2

1153 241

1154 75

1155 13, 6, 1

1156 307

1157 8, 7, 3

1158 245

1159 66

1160 15, 11, 2

1161 365

1162 18, 16, 11

1163 11, 10, 1

1164 19

1165 8, 6, 1

1166 189

1167 133

1168 12, 7, 2

1169 114

1170 27

1171 6, 5, 1

1172 15, 5, 2

1173 17, 14, 5

1174 133

1175 476

1176 11, 9, 3

1177 16

1178 375

1179 15, 8, 6

1180 25

1181 17, 11, 6

1182 77

1183 87

1184 5, 3, 2

1185 134

1186 171

1187 13, 8, 4

1188 75

1189 8, 3, 1

1190 233

1191 196

1192 9, 8, 7

1193 173

1194 15, 14, 12

1195 13, 6, 5

1196 281

1197 9, 8, 2

1198 405

1199 114

1200 15, 9, 6

1201 171

1202 287

1203 8, 4, 2

1204 43

1205 4, 2, 1

1206 513

1207 273

1208 11, 10, 6

1209 118

1210 243

1211 14, 7, 1

1212 203

1213 9, 5, 2

1214 257

1215 302

1216 27, 25, 9

1217 393

1218 91

1219 12, 10, 6

1220 413

1221 15, 14, 9

1222 18, 16, 1

1223 255

1224 12, 9, 7

1225 234

1226 167

1227 16, 13, 10

1228 27

1229 15, 6, 2

1230 433

1231 105

1232 25, 10, 2

1233 151

1234 427

1235 13, 9, 8

1236 49

1237 10, 6, 4

1238 153

1239 4

1240 17, 7, 5

1241 54

1242 203

1243 16, 15, 1

1244 16, 14, 7

1245 13, 6, 1

1246 25

1247 14

1248 15, 5, 3

1249 187

1250 15, 13, 10


1251 13, 10, 5

1252 97

1253 11, 10, 9

1254 19, 10, 4

1255 589

1256 31, 30, 2

1257 289

1258 9, 6, 4

1259 11, 8, 6

1260 21

1261 7, 4, 1

1262 7, 4, 2

1263 77

1264 5, 3, 2

1265 119

1266 7

1267 9, 5, 2

1268 345

1269 17, 10, 8

1270 333

1271 17

1272 16, 9, 7

1273 168

1274 15, 13, 4

1275 11, 10, 1

1276 217

1277 18, 11, 10

1278 189

1279 216

1280 12, 7, 5

1281 229

1282 231

1283 12, 9, 3

1284 223

1285 10, 9, 1

1286 153

1287 470

1288 23, 16, 6

1289 99

1290 10, 4, 3

1291 9, 8, 4

1292 12, 10, 1

1293 14, 9, 6

1294 201

1295 38

1296 15, 14, 2

1297 198

1298 399

1299 14, 11, 5

1300 75

1301 11, 10, 1

1302 77

1303 16, 12, 8

1304 20, 17, 15

1305 326

1306 39

1307 14, 12, 9

1308 495

1309 8, 3, 2

1310 333

1311 476

1312 15, 14, 2

1313 164

1314 19

1315 12, 4, 2

1316 8, 6, 3

1317 13, 12, 3

1318 12, 11, 5

1319 129

1320 12, 9, 3

1321 52

1322 10, 8, 3

1323 17, 16, 2

1324 337

1325 12, 9, 3

1326 397

1327 277

1328 21, 11, 3

1329 73

1330 11, 6, 1

1331 7, 5, 4

1332 95

1333 11, 3, 2

1334 617

1335 392

1336 8, 3, 2

1337 75

1338 315

1339 15, 6, 4

1340 125

1341 6, 5, 2

1342 15, 9, 7

1343 348

1344 15, 6, 1

1345 553

1346 6, 3, 2

1347 10, 9, 7

1348 553

1349 14, 10, 4

1350 237

1351 39

1352 17, 14, 6

1353 371

1354 255

1355 8, 4, 1

1356 131

1357 14, 6, 1

1358 117

1359 98

1360 5, 3, 2

1361 56

1362 655

1363 9, 5, 2

1364 239

1365 11, 8, 4

1366 1

1367 134

1368 15, 9, 5

1369 88

1370 10, 5, 3

1371 10, 9, 4

1372 181

1373 15, 11, 2

1374 609

1375 52

1376 19, 18, 10

1377 100

1378 7, 6, 3

1379 15, 8, 2

1380 183

1381 18, 7, 6

1382 10, 9, 2

1383 130

1384 11, 5, 1

1385 12

1386 219

1387 13, 10, 7

1388 11

1389 19, 9, 4

1390 129

1391 3

1392 17, 15, 5

1393 300

1394 17, 13, 9

1395 14, 6, 5

1396 97

1397 13, 8, 3

1398 601

1399 55

1400 8, 3, 1

1401 92

1402 127

1403 12, 11, 2

1404 81

1405 15, 10, 8

1406 13, 2, 1

1407 47

1408 14, 13, 6

1409 194

1410 383

1411 25, 14, 11

1412 125

1413 20, 19, 16

1414 429

1415 282

1416 10, 9, 6

1417 342

1418 5, 3, 2

1419 15, 9, 4

1420 33

1421 9, 4, 2

1422 49

1423 15

1424 11, 6, 2

1425 28

1426 103

1427 18, 17, 8

1428 27

1429 11, 6, 5

1430 33

1431 17

1432 11, 10, 6

1433 387

1434 363

1435 15, 10, 9

1436 83

1437 7, 6, 4

1438 357

1439 13, 12, 4

1440 14, 13, 7

1441 322

1442 395

1443 16, 5, 1

1444 595

1445 13, 10, 3

1446 421

1447 195

1448 11, 3, 2

1449 13

1450 16, 12, 3

1451 14, 3, 1

1452 315

1453 26, 10, 5

1454 297

1455 52

1456 9, 4, 2

1457 314

1458 243

1459 16, 14, 9

1460 185

1461 12, 5, 3

1462 13, 5, 2

1463 575

1464 12, 9, 3

1465 39

1466 311

1467 13, 5, 2

1468 181

1469 20, 18, 14

1470 49

1471 25

1472 11, 4, 1

1473 77

1474 17, 11, 10

1475 15, 14, 8

1476 21

1477 17, 10, 5

1478 69

1479 49

1480 11, 10, 2

1481 32

1482 411

1483 21, 16, 3

1484 11, 7, 4

1485 22, 10, 3

1486 85

1487 140

1488 9, 8, 6

1489 252

1490 279

1491 9, 5, 2

1492 307

1493 17, 10, 4

1494 13, 12, 9

1495 94

1496 13, 11, 4

1497 49

1498 17, 11, 10

1499 16, 12, 5

1500 25

1501 6, 5, 2

1502 12, 5, 1

1503 80

1504 8, 3, 2

1505 246

1506 11, 5, 2

1507 11, 10, 2

1508 599

1509 18, 12, 10

1510 189

1511 278

1512 10, 9, 3

1513 399

1514 299

1515 13, 10, 6

1516 277

1517 13, 10, 6

1518 69

1519 220

1520 13, 10, 3

1521 229

1522 18, 11, 10

1523 16, 15, 1

1524 27

1525 18, 9, 3

1526 473

1527 373

1528 18, 17, 7

1529 60

1530 207

1531 13, 9, 8

1532 22, 20, 13

1533 25, 18, 7

1534 225

1535 404

1536 21, 6, 2

1537 46

1538 6, 2, 1

1539 17, 12, 6

1540 75

1541 4, 2, 1

1542 365

1543 445

1544 11, 7, 1

1545 44

1546 10, 8, 5

1547 12, 5, 2

1548 63

1549 17, 4, 2

1550 189

1551 557

1552 19, 12, 2

1553 252

1554 99

1555 10, 8, 5

1556 65

1557 14, 9, 3

1558 9

1559 119

1560 8, 5, 2

1561 339

1562 95

1563 12, 9, 7

1564 7

1565 13, 10, 2

1566 77

1567 127

1568 21, 10, 7

1569 319

1570 667

1571 17, 10, 3

1572 501

1573 18, 12, 9

1574 9, 8, 5

1575 17

1576 20, 9, 2

1577 341

1578 731

1579 7, 6, 5

1580 647

1581 10, 4, 2

1582 121

1583 20

1584 21, 19, 13

1585 574

1586 399

1587 15, 10, 7

1588 85

1589 16, 8, 3

1590 169

1591 15

1592 12, 7, 5

1593 568

1594 10, 7, 1

1595 18, 2, 1

1596 3

1597 14, 3, 2

1598 13, 7, 3

1599 643

1600 14, 11, 1

1601 548

1602 783

1603 14, 11, 1

1604 317

1605 7, 6, 4

1606 153

1607 87

1608 15, 13, 1

1609 231

1610 11, 5, 3

1611 18, 13, 7

1612 771

1613 30, 20, 11

1614 15, 6, 3

1615 103

1616 13, 4, 3

1617 182

1618 211

1619 17, 6, 1

1620 27

1621 13, 12, 10

1622 15, 14, 10

1623 17

1624 13, 11, 5

1625 69

1626 11, 5, 1

1627 18, 6, 1

1628 603

1629 10, 4, 2

1630 741

1631 668

1632 17, 15, 3

1633 147

1634 227

1635 15, 10, 9

1636 37

1637 16, 6, 1

1638 173

1639 427

1640 7, 5, 1

1641 287

1642 231

1643 20, 15, 10

1644 18, 9, 1

1645 14, 12, 5

1646 16, 5, 1

1647 310

1648 18, 13, 1

1649 434

1650 579

1651 18, 13, 8

1652 45

1653 12, 8, 3

1654 16, 9, 5

1655 53

1656 19, 15, 10

1657 16

1658 17, 6, 5

1659 17, 10, 1

1660 37

1661 17, 10, 9

1662 21, 13, 7

1663 99

1664 17, 9, 6

1665 176

1666 271

1667 18, 17, 13

1668 459

1669 21, 17, 10

1670 6, 5, 2

1671 202

1672 5, 4, 3

1673 90

1674 755

1675 15, 7, 2

1676 363

1677 8, 4, 2

1678 129

1679 20

1680 11, 6, 2

1681 135

1682 15, 8, 7

1683 14, 13, 2

1684 10, 4, 3

1685 24, 13, 10

1686 19, 14, 11

1687 31

1688 15, 8, 6

1689 758

1690 16, 11, 5

1691 16, 5, 1

1692 359

1693 23, 18, 17

1694 501

1695 29

1696 15, 6, 3

1697 201

1698 459

1699 12, 10, 7

1700 225

1701 22, 17, 13

1702 24, 22, 5

1703 161

1704 14, 11, 3

1705 52

1706 19, 17, 6

1707 21, 14, 12

1708 93

1709 13, 10, 3

1710 201

1711 178

1712 15, 12, 5

1713 250

1714 7, 6, 4

1715 17, 13, 6

1716 221

1717 13, 11, 8

1718 17, 14, 9

1719 113

1720 17, 14, 10

1721 300

1722 39

1723 18, 13, 3

1724 261

1725 15, 14, 8

1726 753

1727 8, 4, 3

1728 11, 10, 5

1729 94

1730 15, 13, 1

1731 10, 4, 2

1732 14, 11, 10

1733 8, 6, 2

1734 461

1735 418

1736 19, 14, 6

1737 403

1738 267

1739 10, 9, 2

1740 259

1741 20, 4, 3

1742 869

1743 173

1744 19, 18, 2

1745 369

1746 255

1747 22, 12, 9

1748 567

1749 20, 11, 7

1750 457

1751 482

1752 6, 3, 2

1753 775

1754 19, 17, 6

1755 6, 4, 3

1756 99

1757 15, 14, 8

1758 6, 5, 2

1759 165

1760 8, 3, 2

1761 13, 12, 10

1762 25, 21, 17

1763 17, 14, 9

1764 105

1765 17, 15, 14

1766 10, 3, 2

1767 250

1768 25, 6, 5

1769 327

1770 279

1771 13, 6, 5

1772 371

1773 15, 9, 4

1774 117

1775 486

1776 10, 9, 3

1777 217

1778 635

1779 30, 27, 17

1780 457

1781 16, 6, 2

1782 57

1783 439

1784 23, 21, 6

1785 214

1786 20, 13, 6

1787 20, 16, 1

1788 819

1789 15, 11, 8

1790 593

1791 190

1792 17, 14, 3

1793 114

1794 21, 18, 3

1795 10, 5, 2

1796 12, 9, 5

1797 8, 6, 3

1798 69

1799 312

1800 22, 5, 2

1801 502

1802 843

1803 15, 10, 3

1804 747

1805 6, 5, 2

1806 101

1807 123

1808 19, 16, 9

1809 521

1810 171

1811 16, 7, 2

1812 12, 6, 5

1813 22, 21, 20

1814 545

1815 163

1816 23, 18, 1

1817 479

1818 495

1819 13, 6, 5

1820 11

1821 17, 5, 2

1822 18, 8, 1

1823 684

1824 7, 5, 1

1825 9

1826 18, 11, 3

1827 22, 20, 13

1828 273

1829 4, 3, 2

1830 381

1831 51

1832 18, 13, 7

1833 518

1834 9, 5, 1

1835 14, 12, 3

1836 243

1837 21, 17, 2

1838 53

1839 836

1840 21, 10, 2

1841 66

1842 12, 10, 7

1843 13, 9, 8

1844 339

1845 16, 11, 5

1846 901

1847 180

1848 16, 13, 3

1849 49

1850 6, 3, 2


1851 15, 4, 1

1852 16, 13, 6

1853 18, 15, 12

1854 885

1855 39

1856 11, 9, 4

1857 688

1858 16, 15, 7

1859 13, 10, 6

1860 13

1861 25, 23, 12

1862 149

1863 260

1864 11, 9, 1

1865 53

1866 11

1867 12, 4, 2

1868 9, 7, 5

1869 11, 8, 1

1870 121

1871 261

1872 10, 5, 2

1873 199

1874 20, 4, 3

1875 17, 9, 2

1876 13, 9, 4

1877 12, 8, 7

1878 253

1879 174

1880 15, 4, 2

1881 370

1882 9, 6, 1

1883 16, 10, 9

1884 669

1885 20, 10, 9

1886 833

1887 353

1888 17, 13, 2

1889 29

1890 371

1891 9, 8, 5

1892 8, 7, 1

1893 19, 8, 7

1894 12, 11, 10

1895 873

1896 26, 11, 2

1897 12, 9, 1

1898 10, 7, 2

1899 13, 6, 1

1900 235

1901 26, 24, 19

1902 733

1903 778

1904 12, 11, 1

1905 344

1906 931

1907 16, 6, 4

1908 945

1909 21, 19, 14

1910 18, 13, 11

1911 67

1912 20, 15, 10

1913 462

1914 14, 5, 1

1915 10, 9, 6

1916 18, 11, 10

1917 16, 9, 7

1918 477

1919 105

1920 11, 3, 2

1921 468

1922 23, 16, 15

1923 16, 15, 6

1924 327

1925 23, 10, 4

1926 357

1927 25

1928 17, 16, 7

1929 31

1930 7, 5, 2

1931 16, 7, 6

1932 277

1933 14, 13, 6

1934 413

1935 103

1936 15, 10, 1

1937 231

1938 747

1939 5, 2, 1

1940 113

1941 20, 10, 7

1942 15, 9, 6

1943 11

1944 27, 22, 18

1945 91

1946 51

1947 18, 13, 12

1948 603

1949 10, 7, 3

1950 9

1951 121

1952 15, 14, 6

1953 17

1954 16, 11, 2

1955 23, 15, 6

1956 279

1957 16, 12, 6

1958 89

1959 371

1960 17, 15, 2

1961 771

1962 99

1963 7, 6, 3

1964 21

1965 10, 7, 5

1966 801

1967 26

1968 25, 19, 14

1969 175

1970 10, 7, 2

1971 20, 5, 4

1972 12, 11, 1

1973 22, 5, 1

1974 165

1975 841

1976 25, 19, 17

1977 238

1978 11, 8, 6

1979 22, 21, 4

1980 33

1981 8, 7, 6

1982 14, 9, 2

1983 113

1984 13, 11, 5

1985 311

1986 891

1987 20, 16, 14

1988 555

1989 23, 14, 8

1990 133

1991 546

1992 6, 3, 2

1993 103

1994 15

1995 10, 7, 3

1996 307

1997 14, 10, 1

1998 15, 12, 2

1999 367

2000 13, 10, 6

2001 169

2002 22, 21, 11

2003 12, 10, 8

2004 441

2005 17, 12, 7

2006 917

2007 205

2008 26, 23, 13

2009 54

2010 459

2011 17, 15, 4

2012 19, 15, 4

2013 5, 4, 2

2014 9, 7, 6

2015 42

2016 21, 15, 7

2017 330

2018 20, 7, 3

2019 20, 7, 2

2020 81

2021 19, 14, 1

2022 349

2023 165

2024 40, 35, 9

2025 274

2026 475

2027 11, 10, 3

2028 93

2029 12, 7, 4

2030 13, 12, 2

2031 386

2032 7, 6, 2

2033 881

2034 143

2035 9, 8, 4

2036 71

2037 19, 18, 3

2038 16, 11, 6

2039 155

2040 7, 2, 1

2041 735

2042 16, 8, 7

2043 9, 7, 4

2044 45

2045 7, 6, 4

2046 12, 11, 3

2047 3

2048 19, 14, 13

2049 124

2050 15, 13, 8

2051 13, 6, 5

2052 323

2053 21, 13, 6

2054 201

2055 11

2056 13, 12, 3

2057 245

2058 343

2059 14, 12, 10

2060 387

2061 19, 4, 1

2062 16, 3, 2

2063 48

2064 17, 9, 2

2065 97

2066 71

2067 17, 13, 8

2068 18, 10, 7

2069 18, 9, 8

2070 237

2071 11, 5, 3

2072 13, 10, 3

2073 253

2074 231

2075 9, 7, 4

2076 851

2077 15, 14, 4

2078 16, 6, 5

2079 35

2080 4, 3, 1

2081 467

2082 523

2083 21, 11, 10

2084 4, 2, 1

2085 9, 8, 3

2086 261

2087 141

2088 18, 11, 5

2089 150

2090 9, 4, 1

2091 12, 9, 5

2092 17, 15, 7

2093 16, 15, 7

2094 645

2095 256

2096 19, 4, 2

2097 119

2098 19

2099 15, 12, 9

2100 35

2101 25, 22, 9

2102 33

2103 98

2104 19, 15, 9

2105 153

2106 111

2107 17, 10, 2

2108 21, 5, 3

2109 10, 5, 1

2110 12, 9, 6

2111 249

2112 16, 13, 7

2113 385

2114 155

2115 11, 10, 1

2116 25

2117 24, 16, 11

2118 385

2119 84

2120 17, 14, 6

2121 304

2122 91

2123 14, 11, 3

2124 45

2125 24, 17, 14

2126 881

2127 539

2128 23, 9, 1

2129 21

2130 239

2131 13, 6, 5

2132 213

2133 24, 22, 4

2134 23, 13, 2

2135 47

2136 15, 12, 9

2137 331

2138 13, 9, 2

2139 14, 4, 1

2140 283

2141 16, 3, 1

2142 69

2143 345

2144 13, 7, 3

2145 19

2146 595

2147 8, 3, 2

2148 549

2149 17, 9, 2

2150 569

2151 224

2152 24, 13, 7

2153 582

2154 10, 7, 5

2155 10, 9, 8

2156 405

2157 14, 4, 1

2158 93

2159 6

2160 31, 25, 14

2161 766

2162 47

2163 12, 9, 7

2164 561

2165 10, 4, 2

2166 693

2167 840

2168 11, 9, 3

2169 55

2170 411

2171 7, 6, 4

2172 6, 4, 1

2173 15, 8, 4

2174 225

2175 128

2176 15, 8, 1

2177 554

2178 15

2179 8, 7, 2

2180 111

2181 18, 12, 7

2182 93

2183 162

2184 11, 10, 5

2185 51

2186 51

2187 22, 11, 1

2188 99

2189 19, 8, 7

2190 441

2191 111

2192 8, 5, 3

2193 71

2194 15, 13, 9

2195 23, 22, 16

2196 539

2197 6, 5, 2

2198 893

2199 49

2200 20, 15, 5

2201 143

2202 15, 3, 2

2203 14, 6, 5

2204 11, 7, 1

2205 14, 7, 4

2206 793

2207 438

2208 21, 16, 6

2209 142

2210 539

2211 20, 14, 3

2212 423

2213 20, 19, 4

2214 1041

2215 39

2216 24, 7, 2

2217 455

2218 603

2219 22, 12, 11

2220 7

2221 17, 16, 6

2222 333

2223 17, 6, 2

2224 21, 19, 5

2225 47

2226 19, 16, 7

2227 14, 9, 8

2228 425

2229 17, 8, 7

2230 637

2231 654

2232 19, 17, 4

2233 249

2234 7, 6, 1

2235 20, 17, 11

2236 63

2237 7, 4, 2

2238 1053

2239 120

2240 23, 7, 1

2241 20

2242 7

2243 27, 15, 2

2244 399

2245 22, 12, 11

2246 23, 15, 6

2247 217

2248 9, 4, 3

2249 126

2250 927

2251 19, 16, 13

2252 75

2253 19, 14, 2

2254 10, 9, 2

2255 729

2256 14, 9, 6

2257 829

2258 983

2259 16, 10, 6

2260 12, 4, 1

2261 14, 12, 7

2262 57

2263 273

2264 15, 7, 2

2265 151

2266 343

2267 18, 17, 8

2268 115

2269 15, 10, 7

2270 369

2271 560

2272 21, 10, 9

2273 630

2274 239

2275 15, 12, 1

2276 21

2277 10, 4, 2

2278 17, 14, 7

2279 276

2280 13, 4, 2

2281 715

2282 975

2283 20, 13, 4

2284 889

2285 8, 6, 2

2286 249

2287 651

2288 17, 16, 7

2289 136

2290 23, 6, 5

2291 13, 10, 2

2292 89

2293 10, 8, 3

2294 21, 17, 10

2295 259

2296 15, 10, 1

2297 405

2298 15, 13, 3

2299 16, 6, 1

2300 95

2301 15, 9, 8

2302 15, 8, 1

2303 80

2304 8, 7, 5

2305 424

2306 551

2307 11, 7, 2

2308 31

2309 12, 10, 8

2310 233

2311 148

2312 19, 6, 4

2313 221

2314 879

2315 17, 15, 4

2316 21

2317 17, 4, 2

2318 245

2319 161

2320 13, 11, 5

2321 543

2322 83

2323 16, 3, 2

2324 717

2325 14, 8, 5

2326 13, 10, 7

2327 32

2328 15, 9, 2

2329 105

2330 15, 5, 1

2331 14

2332 349

2333 18, 15, 8

2334 1125

2335 553

2336 15, 10, 8

2337 523

2338 211

2339 10, 3, 2

2340 39

2341 24, 18, 16

2342 65

2343 415

2344 27, 26, 14

2345 29

2346 987

2347 11, 10, 2

2348 731

2349 31, 16, 9

2350 21, 19, 4

2351 950

2352 23, 20, 2

2353 328

2354 14, 11, 6

2355 12, 11, 6

2356 183

2357 10, 9, 8

2358 161

2359 172

2360 19, 10, 8

2361 646

2362 13, 10, 6

2363 9, 7, 4

2364 643

2365 21, 14, 5

2366 16, 13, 6

2367 610

2368 13, 11, 8

2369 77

2370 12, 11, 6

2371 20, 18, 17

2372 1139

2373 17, 14, 5

2374 24, 16, 13

2375 198

2376 7, 5, 4

2377 381

2378 243

2379 22, 9, 3

2380 1

2381 18, 12, 2

2382 429

2383 49

2384 21, 19, 1

2385 607

2386 11, 9, 1

2387 8, 7, 6

2388 11

2389 31, 12, 10

2390 629

2391 956

2392 31, 13, 3

2393 59

2394 423

2395 17, 8, 7

2396 173

2397 22, 17, 4

2398 15, 13, 11

2399 107

2400 20, 19, 17

2401 61

2402 251

2403 11, 8, 2

2404 67

2405 17, 14, 5

2406 14, 12, 5

2407 91

2408 23, 6, 4

2409 1198

2410 807

2411 12, 2, 1

2412 25

2413 11, 6, 1

2414 29

2415 154

2416 23, 6, 5

2417 225

2418 311

2419 22, 16, 6

2420 77

2421 11, 8, 4

2422 1117

2423 102

2424 21, 16, 6

2425 678

2426 20, 4, 3

2427 8, 6, 5

2428 301

2429 22, 14, 7

2430 477

2431 303

2432 29, 22, 19

2433 305

2434 507

2435 18, 6, 2

2436 145

2437 9, 4, 3

2438 929

2439 404

2440 12, 7, 5

2441 339

2442 127

2443 15, 13, 4

2444 1115

2445 23, 20, 10

2446 18, 13, 6

2447 786

2448 21, 10, 4

2449 621

2450 191


2451 10, 4, 3

2452 331

2453 21, 14, 11

2454 357

2455 313

2456 12, 5, 3

2457 238

2458 23, 20, 18

2459 17, 7, 4

2460 35

2461 19, 18, 10

2462 22, 13, 8

2463 1172

2464 5, 4, 3

2465 531

2466 599

2467 18, 14, 2

2468 99

2469 26, 16, 11

2470 217

2471 15, 6, 3

2472 12, 3, 1

2473 225

2474 899

2475 12, 11, 9

2476 17, 3, 2

2477 19, 17, 6

2478 765

2479 72

2480 20, 5, 2

2481 710

2482 11, 7, 6

2483 12, 11, 2

2484 523

2485 142

2486 19, 14, 9

2487 155

2488 23, 13, 9

2489 315

2490 8, 7, 5

2491 25, 16, 12

2492 141

2493 18, 15, 7

2494 13, 8, 2

2495 497

2496 12, 3, 1

2497 1171

2498 8, 7, 4

2499 13, 12, 9

2500 135

2501 22, 21, 5

2502 45

2503 316

2504 19, 8, 6

2505 131

2506 17, 11, 3

2507 13, 8, 1

2508 25

2509 14, 13, 3

2510 1113

2511 110

2512 29, 21, 7

2513 99

2514 183

2515 8, 7, 5

2516 563

2517 14, 4, 1

2518 18, 13, 2

2519 579

2520 31, 15, 13

2521 426

2522 16, 10, 5

2523 23, 17, 14

2524 15, 6, 4

2525 7, 6, 5

2526 141

2527 640

2528 19, 9, 4

2529 49

2530 14, 5, 3

2531 6, 2, 1

2532 26, 22, 13

2533 10, 3, 1

2534 185

2535 24, 19, 16

2536 21, 10, 9

2537 77

2538 315

2539 10, 9, 3

2540 209

2541 11, 8, 7

2542 97

2543 240

2544 21, 20, 6

2545 982

2546 891

2547 22, 10, 3

2548 373

2549 10, 9, 5

2550 333

2551 103

2552 28, 3, 2

2553 28

2554 1123

2555 9, 6, 2

2556 349

2557 18, 17, 7

2558 18, 8, 1

2559 23

2560 9, 3, 1

2561 201

2562 203

2563 12, 11, 10

2564 561

2565 25, 16, 14

2566 37

2567 122

2568 8, 5, 2

2569 69

2570 18, 15, 14

2571 18, 16, 9

2572 535

2573 12, 11, 3

2574 5

2575 867

2576 7, 2, 1

2577 674

2578 15, 7, 3

2579 23, 6, 1

2580 105

2581 26, 14, 12

2582 22, 19, 15

2583 31

2584 25, 19, 12

2585 263

2586 1047

2587 23, 12, 10

2588 13, 8, 1

2589 29, 11, 10

2590 1017

2591 219

2592 15, 12, 5

2593 297

2594 863

2595 24, 17, 2

2596 145

2597 16, 8, 7

2598 225

2599 289

2600 14, 13, 7

2601 406

2602 11, 6, 1

2603 18, 8, 7

2604 435

2605 19, 14, 5

2606 1181

2607 34

2608 15, 11, 2

2609 425

2610 427

2611 27, 17, 10

2612 21, 14, 6

2613 14, 12, 9

2614 553

2615 518

2616 17, 8, 7

2617 462

2618 71

2619 17, 10, 1

2620 835

2621 8, 7, 1

2622 11, 5, 3

2623 409

2624 15, 10, 4

2625 112

2626 43

2627 20, 17, 11

2628 47

2629 13, 9, 6

2630 177

2631 139

2632 19, 5, 3

2633 1241

2634 20, 11, 5

2635 25, 21, 14

2636 18, 11, 10

2637 9, 6, 4

2638 10, 3, 1

2639 144

2640 23, 11, 9

2641 736

2642 551

2643 16, 13, 10

2644 597

2645 18, 11, 10

2646 297

2647 513

2648 15, 8, 1

2649 689

2650 17, 13, 5

2651 7, 5, 4

2652 519

2653 17, 4, 2

2654 20, 16, 13

2655 53

2656 19, 11, 5

2657 242

2658 6, 3, 2

2659 20, 18, 16

2660 5

2661 17, 14, 2

2662 14, 12, 7

2663 458

2664 27, 21, 19

2665 772

2666 663

2667 254

2668 819

2669 18, 4, 2

2670 229

2671 46

2672 18, 7, 1

2673 530

2674 967

2675 13, 10, 9

2676 93

2677 17, 8, 6

2678 15, 6, 5

2679 286

2680 15, 9, 4

2681 635

2682 463

2683 11, 6, 1

2684 14, 12, 3

2685 8, 2, 1

2686 789

2687 225

2688 21, 10, 6

2689 36

2690 12, 9, 3

2691 14, 10, 8

2692 577

2693 10, 5, 3

2694 621

2695 123

2696 17, 15, 12

2697 170

2698 963

2699 32, 30, 29

2700 3

2701 12, 10, 5

2702 257

2703 67

2704 12, 9, 7

2705 12, 10, 5

2706 515

2707 9, 6, 4

2708 423

2709 10, 9, 3

2710 7, 3, 1

2711 690

2712 21, 12, 7

2713 840

2714 12, 8, 7

2715 30, 26, 15

2716 255

2717 14, 8, 3

2718 369

2719 102

2720 25, 18, 1

2721 826

2722 127

2723 9, 6, 5

2724 121

2725 21, 17, 2

2726 10, 6, 1

2727 430

2728 21, 7, 5

2729 96

2730 343

2731 15, 11, 2

2732 845

2733 19, 8, 7

2734 9, 5, 4

2735 933

2736 16, 3, 1

2737 226

2738 923

2739 12, 9, 5

2740 109

2741 6, 5, 4

2742 149

2743 447

2744 19, 18, 10

2745 484

2746 9, 7, 2

2747 15, 11, 6

2748 25

2749 22, 18, 17

2750 629

2751 49

2752 15, 4, 2

2753 716

2754 231

2755 13, 7, 6

2756 159

2757 24, 23, 12

2758 17, 5, 4

2759 842

2760 29, 26, 7

2761 108

2762 1319

2763 12, 10, 6

2764 687

2765 16, 10, 3

2766 1285

2767 102

2768 25, 19, 15

2769 269

2770 567

2771 13, 12, 5

2772 135

2773 30, 25, 20

2774 28, 3, 2

2775 802

2776 7, 3, 2

2777 22, 21, 17

2778 1095

2779 20, 17, 9

2780 51

2781 28, 27, 10

2782 22, 10, 9

2783 168

2784 29, 21, 15

2785 349

2786 339

2787 19, 18, 3

2788 21, 16, 2

2789 14, 12, 8

2790 837

2791 490

2792 12, 7, 2

2793 343

2794 11, 9, 4

2795 10, 8, 4

2796 769

2797 19, 6, 1

2798 20, 14, 5

2799 880

2800 17, 14, 6

2801 279

2802 18, 14, 3

2803 18, 16, 13

2804 609

2805 24, 8, 2

2806 729

2807 270

2808 15, 13, 1

2809 1342

2810 23, 10, 9

2811 10, 9, 7

2812 453

2813 13, 7, 6

2814 621

2815 84

2816 21, 19, 8

2817 109

2818 15, 9, 1

2819 10, 6, 5

2820 815

2821 16, 6, 4

2822 18, 17, 3

2823 592

2824 15, 14, 10

2825 288

2826 135

2827 19, 10, 6

2828 1103

2829 9, 6, 4

2830 17, 15, 13

2831 186

2832 27, 18, 1

2833 409

2834 15, 13, 7

2835 20, 13, 5

2836 1113

2837 17, 8, 3

2838 20, 4, 1

2839 1033

2840 20, 15, 9

2841 370

2842 1231

2843 7, 3, 2

2844 25

2845 10, 9, 1

2846 23, 15, 4

2847 329

2848 15, 8, 1

2849 114

2850 1411

2851 10, 7, 1

2852 1145

2853 14, 8, 1

2854 313

2855 41

2856 15, 13, 3

2857 756

2858 17, 9, 7

2859 29, 20, 11

2860 603

2861 20, 16, 10

2862 405

2863 139

2864 21, 17, 15

2865 212

2866 9, 7, 2

2867 15, 13, 10

2868 915

2869 8, 6, 1

2870 12, 11, 1

2871 272

2872 21, 5, 2

2873 75

2874 13, 6, 3

2875 20, 16, 2

2876 605

2877 10, 7, 4

2878 781

2879 149

2880 13, 10, 6

2881 1201

2882 1431

2883 16, 13, 12

2884 529

2885 13, 11, 6

2886 20, 14, 9

2887 469

2888 11, 4, 1

2889 76

2890 31

2891 16, 15, 10

2892 309

2893 27, 7, 2

2894 16, 14, 9

2895 358

2896 29, 6, 1

2897 15

2898 91

2899 19, 10, 1

2900 303

2901 11, 3, 2

2902 14, 10, 9

2903 279

2904 27, 15, 6

2905 321

2906 1155

2907 17, 14, 1

2908 19, 13, 10

2909 23, 22, 4

2910 1301

2911 685

2912 16, 9, 2

2913 238

2914 351

2915 18, 7, 5

2916 21

2917 16, 15, 4

2918 237

2919 149

2920 19, 9, 5

2921 480

2922 559

2923 11, 6, 5

2924 12, 4, 1

2925 12, 4, 3

2926 20, 14, 1

2927 974

2928 24, 21, 11

2929 651

2930 9, 4, 1

2931 13, 8, 1

2932 14, 7, 6

2933 15, 14, 13

2934 713

2935 13, 12, 7

2936 5, 3, 2

2937 172

2938 499

2939 30, 17, 5

2940 49

2941 23, 18, 17

2942 1425

2943 320

2944 5, 3, 2

2945 146

2946 551

2947 22, 20, 11

2948 17, 3, 2

2949 17, 7, 4

2950 397

2951 872

2952 17, 13, 2

2953 33

2954 9, 6, 5

2955 12, 10, 6

2956 823

2957 19, 14, 3

2958 23, 13, 5

2959 69

2960 12, 3, 2

2961 86

2962 319

2963 21, 14, 5

2964 83

2965 25, 22, 15

2966 861

2967 1028

2968 29, 27, 4

2969 561

2970 583

2971 18, 13, 2

2972 693

2973 18, 10, 4

2974 11, 3, 1

2975 192

2976 21, 10, 3

2977 126

2978 375

2979 12, 11, 6

2980 381

2981 13, 2, 1

2982 669

2983 330

2984 17, 9, 6

2985 166

2986 343

2987 8, 3, 2

2988 313

2989 18, 9, 7

2990 26, 22, 9

2991 292

2992 23, 3, 1

2993 569

2994 303

2995 9, 6, 4

2996 345

2997 12, 6, 5

2998 669

2999 1011

3000 15, 12, 9

3001 975

3002 22, 21, 10

3003 12, 11, 5

3004 351

3005 14, 12, 5

3006 15, 9, 6

3007 963

3008 15, 13, 1

3009 1349

3010 25, 12, 10

3011 22, 8, 6

3012 1327

3013 23, 6, 2

3014 17, 15, 5

3015 308

3016 38, 25, 9

3017 108

3018 203

3019 16, 6, 1

3020 413

3021 22, 10, 1

3022 14, 12, 1

3023 734

3024 32, 3, 2

3025 757

3026 19, 18, 13

3027 17, 16, 4

3028 135

3029 11, 6, 4

3030 12, 9, 4

3031 55

3032 17, 15, 4

3033 238

3034 399

3035 21, 20, 2

3036 391

3037 7, 6, 3

3038 633

3039 436

3040 27, 21, 3

3041 776

3042 415

3043 18, 16, 15

3044 69

3045 17, 14, 11

3046 1021

3047 19, 15, 4

3048 18, 3, 2

3049 765

3050 651


3051 19, 17, 16

3052 363

3053 22, 20, 15

3054 21, 4, 3

3055 13, 7, 1

3056 5, 4, 3

3057 110

3058 811

3059 15, 10, 1

3060 405

3061 22, 15, 1

3062 1053

3063 32

3064 25, 11, 9

3065 432

3066 455

3067 18, 16, 13

3068 215

3069 34, 26, 19

3070 20, 13, 8

3071 65

3072 11, 10, 5

3073 184

3074 17, 9, 3

3075 16, 14, 10

3076 475

3077 12, 10, 8

3078 105

3079 174

3080 21, 19, 16

3081 64

3082 9, 6, 1

3083 23, 20, 18

3084 109

3085 25, 14, 12

3086 1281

3087 49

3088 20, 13, 11

3089 261

3090 279

3091 12, 7, 5

3092 45

3093 14, 11, 8

3094 769

3095 419

3096 33, 29, 14

3097 1162

3098 18, 17, 11

3099 14, 13, 11

3100 45

3101 10, 7, 3

3102 225

3103 124

3104 23, 9, 5

3105 833

3106 6, 2, 1

3107 14, 12, 11

3108 61

3109 26, 20, 19

3110 1421

3111 199

3112 17, 15, 1

3113 191

3114 19, 15, 4

3115 25, 18, 16

3116 461

3117 19, 8, 4

3118 525

3119 315

3120 18, 17, 11

3121 493

3122 22, 7, 6

3123 15, 10, 4

3124 861

3125 24, 21, 18

3126 449

3127 139

3128 30, 19, 11

3129 23

3130 867

3131 22, 8, 7

3132 123

3133 6, 4, 3

3134 89

3135 356

3136 15, 12, 10

3137 587

3138 29, 19, 13

3139 14, 11, 10

3140 1115

3141 23, 18, 12

3142 981

3143 8

3144 23, 21, 8

3145 112

3146 18, 11, 6

3147 17, 10, 7

3148 1171

3149 22, 3, 2

3150 253

3151 1254

3152 21, 17, 6

3153 98

3154 19, 17, 6

3155 15, 12, 2

3156 565

3157 24, 14, 10

3158 19, 9, 5

3159 103

3160 7, 6, 2

3161 858

3162 315

3163 18, 13, 10

3164 113

3165 17, 13, 10

3166 18, 10, 1

3167 672

3168 33, 31, 18

3169 1123

3170 783

3171 19, 14, 13

3172 301

3173 20, 17, 14

3174 81

3175 646

3176 13, 10, 5

3177 484

3178 915

3179 22, 12, 2

3180 1085

3181 12, 10, 3

3182 1205

3183 1225

3184 11, 10, 2

3185 204

3186 891

3187 9, 8, 2

3188 129

3189 19, 18, 12

3190 12, 4, 1

3191 495

3192 25, 8, 7

3193 211

3194 1059

3195 19, 14, 1

3196 175

3197 22, 16, 14

3198 841

3199 54

3200 11, 6, 4

3201 674

3202 24, 12, 3

3203 14, 7, 3

3204 31

3205 17, 9, 2

3206 15, 8, 6

3207 704

3208 16, 13, 3

3209 81

3210 1303

3211 12, 10, 5

3212 1559

3213 30, 16, 1

3214 1197

3215 614

3216 21, 11, 3

3217 67

3218 10, 9, 8

3219 24, 10, 3

3220 19

3221 11, 6, 5

3222 145

3223 784

3224 23, 19, 1

3225 101

3226 9, 7, 5

3227 8, 7, 6

3228 1225

3229 12, 9, 7

3230 501

3231 15, 9, 8

3232 12, 9, 7

3233 575

3234 511

3235 21, 11, 8

3236 887

3237 19, 8, 4

3238 409

3239 98

3240 12, 3, 2

3241 127

3242 27, 13, 7

3243 22, 13, 5

3244 1249

3245 11, 10, 4

3246 1221

3247 426

3248 15, 8, 1

3249 149

3250 15, 11, 8

3251 9, 6, 5

3252 567

3253 10, 5, 3

3254 1485

3255 124

3256 31, 26, 2

3257 806

3258 203

3259 22, 4, 1

3260 237

3261 18, 12, 10

3262 15, 13, 7

3263 939

3264 17, 5, 2

3265 18, 16, 7

3266 19, 2, 1

3267 20, 19, 10

3268 73

3269 22, 3, 2

3270 237

3271 333

3272 23, 10, 1

3273 1408

3274 775

3275 24, 13, 10

3276 69

3277 25, 22, 1

3278 22, 12, 1

3279 446

3280 16, 15, 6

3281 47

3282 783

3283 30, 28, 21

3284 24, 17, 13

3285 18, 4, 1

3286 397

3287 717

3288 21, 18, 11

3289 43

3290 11, 7, 3

3291 18, 7, 1

3292 61

3293 20, 18, 15

3294 249

3295 594

3296 19, 14, 13

3297 7

3298 639

3299 18, 17, 14

3300 55

3301 24, 10, 4

3302 605

3303 1336

3304 19, 17, 3

3305 806

3306 127

3307 15, 10, 2

3308 717

3309 23, 20, 6

3310 1

3311 618

3312 14, 9, 3

3313 436

3314 1019

3315 12, 8, 2

3316 1641

3317 22, 17, 7

3318 585

3319 58

3320 17, 10, 4

3321 20

3322 567

3323 28, 14, 10

3324 173

3325 25, 19, 10

3326 1145

3327 875

3328 17, 9, 2

3329 525

3330 191

3331 18, 17, 11

3332 587

3333 16, 8, 7

3334 6, 4, 1

3335 636

3336 11, 10, 5

3337 370

3338 1155

3339 22, 16, 12

3340 11, 7, 5

3341 25, 19, 12

3342 9, 6, 5

3343 73

3344 30, 27, 15

3345 796

3346 15, 6, 1

3347 23, 18, 16

3348 177

3349 20, 19, 17

3350 1401

3351 731

3352 21, 20, 19

3353 389

3354 10, 9, 3

3355 10, 6, 4

3356 339

3357 24, 17, 15

3358 19, 8, 6

3359 99

3360 18, 15, 5

3361 12, 10, 4

3362 11, 7, 4

3363 14, 10, 2

3364 85

3365 24, 15, 2

3366 257

3367 136

3368 7, 5, 1

3369 1541

3370 15, 10, 1

3371 30, 29, 18

3372 47

3373 14, 6, 4

3374 417

3375 49

3376 11, 9, 1

3377 236

3378 623

3379 25, 20, 9

3380 659

3381 7, 4, 1

3382 217

3383 956

3384 21, 9, 3

3385 603

3386 19, 9, 2

3387 26, 25, 16

3388 169

3389 17, 15, 4

3390 1381

3391 465

3392 23, 13, 6

3393 1615

3394 13, 12, 3

3395 22, 10, 6

3396 13, 6, 1

3397 19, 4, 1

3398 245

3399 416

3400 14, 13, 6

3401 531

3402 387

3403 15, 12, 6

3404 173

3405 24, 9, 2

3406 22, 13, 12

3407 507

3408 16, 15, 6

3409 244

3410 1023

3411 14, 8, 5

3412 325

3413 14, 9, 6

3414 93

3415 1272

3416 28, 27, 1

3417 32

3418 15

3419 12, 9, 3

3420 423

3421 19, 14, 5

3422 1121

3423 11

3424 22, 15, 6

3425 189

3426 1071

3427 16, 12, 1

3428 17, 16, 13

3429 16, 12, 6

3430 153

3431 153

3432 25, 2, 1

3433 28, 25, 12

3434 14, 13, 12

3435 15, 14, 5

3436 159

3437 18, 16, 10

3438 393

3439 147

3440 27, 16, 1

3441 394

3442 8, 7, 3

3443 26, 19, 3

3444 69

3445 21, 5, 2

3446 21, 17, 8

3447 404

3448 17, 11, 6

3449 917

3450 11, 8, 3

3451 19, 14, 9

3452 1145

3453 16, 6, 1

3454 25, 23, 21

3455 21

3456 19, 18, 9

3457 120

3458 519

3459 19, 18, 12

3460 1495

3461 20, 10, 7

3462 225

3463 289

3464 11, 6, 3

3465 304

3466 43

3467 28, 26, 6

3468 921

3469 38, 16, 6

3470 917

3471 314

3472 17, 14, 7

3473 720

3474 735

3475 30, 16, 13

3476 525

3477 16, 15, 12

3478 465

3479 155

3480 19, 15, 13

3481 546

3482 15, 5, 4

3483 12, 5, 2

3484 1329

3485 8, 7, 4

3486 1085

3487 120

3488 12, 11, 1

3489 518

3490 16, 12, 3

3491 19, 14, 7

3492 57

3493 19, 18, 1

3494 25, 19, 9

3495 254

3496 35, 21, 4

3497 1025

3498 567

3499 29, 24, 4

3500 375

3501 15, 8, 2

3502 15, 13, 6

3503 993

3504 23, 17, 10

3505 103

3506 13, 5, 3

3507 21, 14, 6

3508 10, 7, 6

3509 23, 12, 7

3510 81

3511 1141

3512 37, 35, 6

3513 41

3514 11, 9, 4

3515 17, 10, 9

3516 667

3517 22, 14, 12

3518 16, 14, 9

3519 569

3520 32, 29, 3

3521 129

3522 399

3523 23, 12, 2

3524 1439

3525 10, 7, 5

3526 12, 11, 10

3527 476

3528 25, 18, 7

3529 270

3530 10, 9, 5

3531 18, 3, 1

3532 1561

3533 30, 3, 2

3534 973

3535 162

3536 12, 7, 5

3537 218

3538 13, 6, 5

3539 16, 2, 1

3540 75

3541 23, 7, 2

3542 345

3543 377

3544 21, 14, 2

3545 998

3546 151

3547 26, 23, 12

3548 255

3549 14, 6, 3

3550 1269

3551 183

3552 15, 9, 6

3553 13, 3, 2

3554 24, 23, 17

3555 28, 25, 15

3556 127

3557 14, 8, 5

3558 397

3559 69

3560 17, 3, 2

3561 257

3562 927

3563 18, 15, 6

3564 225

3565 22, 17, 12

3566 8, 6, 1

3567 24, 20, 12

3568 21, 12, 10

3569 1028

3570 699

3571 30, 13, 3

3572 1143

3573 13, 8, 2

3574 889

3575 339

3576 19, 10, 3

3577 348

3578 17, 9, 5

3579 20, 14, 6

3580 915

3581 22, 15, 2

3582 713

3583 747

3584 25, 12, 10

3585 7

3586 19, 14, 8

3587 26, 6, 5

3588 843

3589 30, 28, 8

3590 1713

3591 509

3592 38, 33, 14

3593 72

3594 59

3595 28, 14, 2

3596 383

3597 22, 9, 3

3598 24, 5, 1

3599 114

3600 9, 5, 2

3601 669

3602 10, 2, 1

3603 23, 11, 6

3604 637

3605 8, 7, 4

3606 861

3607 142

3608 15, 14, 10

3609 1016

3610 12, 5, 2

3611 18, 7, 1

3612 215

3613 17, 7, 6

3614 29

3615 47

3616 25, 18, 7

3617 377

3618 1539

3619 13, 12, 5

3620 231

3621 22, 21, 16

3622 481

3623 10, 9, 7

3624 29, 27, 12

3625 279

3626 26, 25, 13

3627 7, 6, 4

3628 957

3629 15, 10, 2

3630 729

3631 90

3632 26, 17, 5

3633 553

3634 651

3635 15, 8, 2

3636 391

3637 7, 6, 5

3638 28, 8, 1

3639 76

3640 20, 15, 10

3641 1626

3642 771

3643 14, 13, 8

3644 1365

3645 21, 14, 6

3646 20, 17, 6

3647 45

3648 23, 7, 2

3649 394

3650 1691


3651 15, 13, 6

3652 721

3653 10, 9, 8

3654 273

3655 112

3656 17, 12, 11

3657 928

3658 1471

3659 18, 13, 2

3660 61

3661 16, 11, 6

3662 1365

3663 130

3664 35, 24, 14

3665 189

3666 30, 20, 11

3667 15, 6, 4

3668 269

3669 22, 7, 4

3670 23, 4, 3

3671 101

3672 19, 17, 8

3673 544

3674 27, 15, 11

3675 30, 10, 9

3676 609

3677 25, 20, 7

3678 501

3679 21

3680 14, 13, 7

3681 115

3682 471

3683 15, 13, 10

3684 81

3685 9, 4, 3

3686 81

3687 889

3688 32, 13, 11

3689 759

3690 839

3691 26, 9, 2

3692 6, 5, 3

3693 26, 20, 18

3694 1129

3695 62

3696 36, 33, 22

3697 91

3698 1719

3699 24, 21, 5

3700 675

3701 4, 2, 1

3702 1281

3703 429

3704 14, 13, 1

3705 148

3706 1195

3707 11, 6, 1

3708 147

3709 16, 14, 6

3710 797

3711 1735

3712 13, 12, 7

3713 413

3714 459

3715 20, 18, 11

3716 24, 11, 4

3717 18, 15, 4

3718 23, 18, 10

3719 488

3720 17, 15, 11

3721 31

3722 15, 7, 5

3723 18, 6, 4

3724 10, 9, 8

3725 21, 14, 8

3726 609

3727 42

3728 9, 4, 2

3729 184

3730 1191

3731 26, 20, 5

3732 1327

3733 8, 7, 3

3734 1305

3735 46

3736 33, 22, 18

3737 287

3738 75

3739 18, 10, 5

3740 95

3741 16, 15, 4

3742 25, 18, 11

3743 279

3744 27, 14, 2

3745 684

3746 22, 9, 7

3747 32, 22, 11

3748 19, 11, 8

3749 15, 4, 1

3750 1013

3751 435

3752 9, 4, 2

3753 407

3754 1611

3755 15, 13, 8

3756 291

3757 18, 16, 5

3758 21, 20, 9

3759 208

3760 23, 9, 1

3761 30

3762 383

3763 23, 10, 2

3764 1307

3765 28, 19, 12

3766 21, 15, 1

3767 672

3768 14, 7, 2

3769 300

3770 107

3771 13, 10, 9

3772 61

3773 10, 9, 4

3774 24, 9, 4

3775 1416

3776 7, 5, 4

3777 1414

3778 9, 5, 1

3779 23, 8, 2

3780 63

3781 10, 9, 6

3782 1785

3783 272

3784 29, 13, 6

3785 87

3786 1027

3787 14, 6, 1

3788 1173

3789 16, 15, 4

3790 22, 21, 17

3791 45

3792 20, 7, 5

3793 481

3794 17, 4, 3

3795 8, 7, 5

3796 127

3797 16, 8, 6

3798 1337

3799 202

3800 24, 23, 21

3801 112

3802 16, 15, 8

3803 18, 15, 6

3804 349

3805 18, 12, 9

3806 9, 7, 5

3807 68

3808 29, 18, 4

3809 938

3810 323

3811 9, 8, 4

3812 1799

3813 11, 8, 7

3814 22, 21, 11

3815 143

3816 19, 13, 9

3817 252

3818 17, 8, 6

3819 16, 6, 3

3820 20, 11, 3

3821 8, 7, 6

3822 29

3823 609

3824 19, 13, 2

3825 437

3826 23, 8, 1

3827 18, 13, 8

3828 1217

3829 13, 9, 6

3830 713

3831 310

3832 35, 13, 2

3833 35

3834 567

3835 15, 5, 4

3836 681

3837 22, 18, 3

3838 273

3839 503

3840 27, 9, 1

3841 840

3842 1331

3843 16, 5, 2

3844 1063

3845 11, 10, 9

3846 693

3847 108

3848 29, 18, 13

3849 71

3850 583

3851 29, 24, 19

3852 169

3853 12, 7, 5

3854 765

3855 1399

3856 39, 25, 3

3857 50

3858 459

3859 14, 8, 7

3860 35

3861 31, 10, 2

3862 18, 16, 5

3863 834

3864 19, 15, 9

3865 289

3866 315

3867 20, 14, 6

3868 13, 12, 9

3869 24, 22, 13

3870 913

3871 264

3872 10, 3, 2

3873 32

3874 20, 8, 3

3875 11, 10, 4

3876 157

3877 17, 11, 4

3878 19, 9, 2

3879 121

3880 27, 5, 1

3881 810

3882 1775

3883 20, 9, 2

3884 45

3885 15, 8, 3

3886 273

3887 915

3888 45, 42, 6

3889 340

3890 20, 19, 10

3891 17, 9, 2

3892 289

3893 16, 13, 2

3894 1197

3895 777

3896 15, 7, 5

3897 310

3898 25, 9, 1

3899 21, 20, 12

3900 65

3901 26, 6, 1

3902 1845

3903 350

3904 17, 13, 2

3905 26

3906 251

3907 15, 4, 1

3908 855

3909 14, 12, 11

3910 28, 22, 13

3911 1673

3912 24, 11, 2

3913 393

3914 531

3915 25, 22, 9

3916 445

3917 16, 12, 11

3918 117

3919 285

3920 15, 13, 8

3921 785

3922 26, 21, 1

3923 24, 21, 3

3924 245

3925 18, 16, 5

3926 17, 16, 12

3927 367

3928 8, 7, 5

3929 1440

3930 199

3931 23, 9, 4

3932 1563

3933 30, 19, 3

3934 28, 12, 5

3935 20, 15, 8

3936 15, 5, 3

3937 252

3938 1835

3939 28, 19, 10

3940 21, 5, 2

3941 19, 11, 6

3942 57

3943 1125

3944 31, 29, 28

3945 427

3946 1155

3947 22, 10, 5

3948 293

3949 28, 22, 3

3950 873

3951 752

3952 11, 6, 5

3953 698

3954 503

3955 24, 8, 5

3956 429

3957 18, 16, 10

3958 27, 4, 2

3959 891

3960 29, 15, 2

3961 756

3962 255

3963 13, 8, 1

3964 735

3965 14, 3, 2

3966 337

3967 357

3968 25, 18, 14

3969 196

3970 163

3971 10, 7, 2

3972 595

3973 13, 11, 8

3974 861

3975 322

3976 36, 3, 1

3977 221

3978 19, 9, 7

3979 25, 9, 2

3980 16, 9, 4

3981 21, 11, 8

3982 21, 13, 8

3983 11

3984 19, 5, 2

3985 1038

3986 12, 8, 7

3987 11, 4, 2

3988 1017

3989 6, 5, 2

3990 469

3991 168

3992 27, 8, 6

3993 1468

3994 19, 12, 9

3995 12, 9, 8

3996 19

3997 16, 13, 3

3998 153

3999 1250

4000 31, 18, 17
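
The rows above can be turned into explicit polynomials and spot-checked with a short script. The following is a minimal sketch, not the procedure used to produce the table (the entries were found with Magma). It assumes that a row "m k" denotes the trinomial x^m + x^k + 1 and that a row "m k1, k2, k3" denotes the pentanomial x^m + x^k1 + x^k2 + x^k3 + 1. The function names are hypothetical, and the irreducibility check is Rabin's test, chosen here only for illustration.

```python
# Polynomials over F_2 are stored as Python ints: bit i is the coefficient of x^i.

def row_to_poly(m, ks):
    """Assumed row format: x^m + sum of x^k for k in ks + 1."""
    poly = (1 << m) | 1
    for k in ks:
        poly |= 1 << k
    return poly

def poly_mod(a, f):
    """Remainder of a divided by f in F_2[x]."""
    deg_f = f.bit_length() - 1
    while a.bit_length() - 1 >= deg_f:
        a ^= f << (a.bit_length() - 1 - deg_f)
    return a

def poly_mulmod(a, b, f):
    """Product a*b reduced modulo f, by shift-and-add."""
    deg_f = f.bit_length() - 1
    a = poly_mod(a, f)
    result = 0
    while b:
        if b & 1:
            result ^= a
        b >>= 1
        a <<= 1
        if (a >> deg_f) & 1:
            a ^= f
    return result

def poly_gcd(a, b):
    """Greatest common divisor in F_2[x] (Euclid's algorithm)."""
    while b:
        a, b = b, poly_mod(a, b)
    return a

def prime_factors(n):
    """Set of distinct prime divisors of n, by trial division."""
    factors, p = set(), 2
    while p * p <= n:
        while n % p == 0:
            factors.add(p)
            n //= p
        p += 1
    if n > 1:
        factors.add(n)
    return factors

def is_irreducible(f):
    """Rabin's test: f of degree m is irreducible over F_2 iff
    x^(2^m) = x (mod f) and gcd(x^(2^(m/p)) - x, f) = 1 for every prime p | m."""
    m = f.bit_length() - 1
    if m < 1:
        return False
    x = poly_mod(2, f)
    checkpoints = {m // p for p in prime_factors(m)}
    h = x
    for i in range(1, m + 1):
        h = poly_mulmod(h, h, f)  # h = x^(2^i) mod f
        if i in checkpoints and poly_gcd(h ^ x, f) != 1:
            return False
    return h == x

# Small sanity checks on well-known cases (not rows of the table above).
assert is_irreducible(row_to_poly(4, [1]))      # x^4 + x + 1
assert not is_irreducible(row_to_poly(4, [2]))  # x^4 + x^2 + 1 = (x^2 + x + 1)^2
```

A table row is checked the same way, for example `is_irreducible(row_to_poly(1739, [10, 9, 2]))`, but the bit-by-bit multiplication makes this slow for degrees in the thousands. Note also that the test only confirms irreducibility; it says nothing about whether the listed weight is minimal.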

Bibliography

[1] N. H. Abel. Mémoires sur les équations algébriques où on démontre l’impossibilité de la résolution de l’équation générale du cinquième degré. Christiania, 1824. Facsimile edition, Trondheim, 1976.

[2] K. Araki, I. Fujita, and M. Morisue. Fast Inverters Over Finite Fields Based on Euclid’s Algorithm. Transactions of the Institute of Electronics, Information and Communication Engineers E, E72(11):1230–34, November 1989.

[3] Y. Asano, T. Itoh, and S. Tsujii. Generalised Fast Algorithm for Computing Multiplicative Inverses in GF(2^m). Electronics Letters, 25(10):664–65, May 1989.

[4] D. W. Ash, I. F. Blake, and S. A. Vanstone. Low Complexity Normal Bases. Discrete Applied Mathematics, 25:191–210, 1989.

[5] Elwyn R. Berlekamp. Algebraic Coding Theory. McGraw-Hill Book Company, New York, 1968.

[6] Elwyn R. Berlekamp. Bit-Serial Reed-Solomon Encoders. IEEE Transactions on Information Theory, IT-28(6):869–74, November 1982.

[7] Richard E. Blahut. Theory and Practice of Error Control Codes. Addison-Wesley Publishing Company, Reading, Massachusetts, 1983.

[8] Unjeng Cheng. On the Continued Fraction and Berlekamp’s Algorithm. IEEE Transactions on Information Theory, 30(3):541–44, 1984.

[9] George I. Davida. Inverse of Element of a Galois Field. Electronics Letters, 8(21):518–20, October 1972.

[10] Leonard Eugene Dickson. Linear Groups with an Exposition of the Galois Field Theory. Dover Publications Inc., Leipzig, 1901. Reprinted by Dover Publications, New York 1958.

[11] Jean Louis Dornstetter. On the Equivalence Between Berlekamp’s and Euclid’s Algorithms. IEEE Transactions on Information Theory, 33(3):428–31, 1987.

[12] Willard L. Eastman. Inside Euclid’s Algorithm. In Dijen Ray-Chaudhuri, editor, Coding Theory and Design Theory. Part I: Coding Theory, pages 113–127, New York, 1990. Springer-Verlag.

[13] Gui-Liang Feng. A VLSI Architecture for Fast Inversion in GF(2^m). IEEE Transactions on Computers, 38(10):1383–1386, October 1989.

[14] Gui-Liang Feng and Kenneth K. Tzeng. A Generalization of the Berlekamp-Massey Algorithm for Multisequence Shift-Register Synthesis with Applications to Decoding Cyclic Codes. IEEE Transactions on Information Theory, 37(5):1274–87, 1991.

[15] Évariste Galois. Sur la théorie des nombres. Bulletin des Sciences Mathématiques de Férussac, 13:428–35, June 1830.

[16] Joachim von zur Gathen. Inversion in finite fields using logarithmic depth. Journal of Symbolic Computation, 9(2):175–83, 1990.

[17] Carl Friedrich Gauss. Werke I: Disquisitiones arithmeticae. Göttingen, 1863.

[18] Willi Geiselmann. Algebraische Algorithmenentwicklung am Beispiel der Arithmetik in endlichen Körpern. PhD thesis, Universität Karlsruhe, Karlsruhe, Germany, 1993.

[19] M. A. Hasan. Shift Register Synthesis For Multiplicative Inversion Over GF(2^m). In IEEE International Symposium on Information Theory, page 49, Whistler, British Columbia, Canada, September 1995. IEEE.

[20] M. A. Hasan and V. K. Bhargava. Multiplication and Inversion Over a Class of GF(2^m). In IEEE Pacific Rim Conference on Communications, Computers and Signal Processing, volume 1, pages 211–14, New York, May 1991. IEEE.

[21] M. A. Hasan and V. K. Bhargava. A VLSI Architecture for a Low Complexity Rate-Adaptive Reed-Solomon Encoder. In 16th Biennial Symposium on Communications, pages 331–34, Kingston, Canada, May 1992.

[22] M. A. Hasan and V. K. Bhargava. Bit-serial Systolic Divider and Multiplier for Finite Fields GF(2^m). IEEE Transactions on Computers, 41(8):972–80, August 1992.

[23] M. A. Hasan and V. K. Bhargava. Division and Bit-serial Multiplication over GF(q^m). IEE Proceedings E, 139(3):230–36, May 1992.

[24] Nils Hedenstierna and Kjell O. Jeppson. CMOS Circuit Speed and Buffer Optimization. IEEE Transactions on Computer-Aided Design, CAD-6(2):270–81, March 1987.

[25] I. N. Herstein. Topics in Algebra. John Wiley & Sons, New York, 1975.

[26] T. Itoh and S. Tsujii. A Fast Algorithm for Computing Multiplicative Inverses in GF(2^m) Using Normal Bases. Information and Computation, 78(3):171–77, September 1988.

[27] T. Itoh and S. Tsujii. Effective Recursive Algorithm for Computing Multiplicative Inverses in GF(2^m). Electronics Letters, 24(6):334–35, March 1988.

[28] M. Kovac, N. Ranganathan, and M. Varanasi. A Systolic Algorithm and Architecture for Galois Field Arithmetic. In V. K. Prasanna and L. H. Canter, editors, Proceedings. Sixth International Parallel Processing Symposium, pages 283–88, Los Alamitos, California, USA, March 1992. IEEE Comput. Soc. Press.

[29] M. Kovac, N. Ranganathan, and M. Varanasi. SIGMA: A VLSI Systolic Array Implementation of a Galois Field GF(2^m) Based Multiplication and Division Algorithm. IEEE Transactions on VLSI Systems, 1(1):22–30, March 1993.

[30] Rudolf Lidl and Harald Niederreiter. Finite Fields. Cambridge Univer-sity Press, Cambridge, 1984.

[31] B. E. Litow and G. I. Davida. O(log(n)) Parallel Time Finite Field Inversion. In VLSI Algorithms and Architectures. 3rd Aegean Workshop on Computing, AWOC88. Corfu, Greece, pages 74–80, Berlin, New York, June 1988. Springer-Verlag.

[32] J. L. Massey and J. K. Omura. Apparatus for Finite Field Computation. US Patent 4,587,627, 1986.

[33] James L. Massey. Shift-Register Synthesis and BCH Decoding. IEEE Transactions on Information Theory, 15(1):122–127, 1969.

[34] Edoardo D. Mastrovito. VLSI Architectures for Computation in Galois Fields. PhD thesis, Linköping University, Linköping, Sweden, 1991. No. 242.

[35] R. J. McEliece. Finite Fields for Computer Scientists and Engineers. Kluwer Academic Publishers, Boston, 1987.

[36] M. Morii and M. Kasahara. Efficient Construction of Gate Circuit for Computing Multiplicative Inverses over GF(2^m). Transactions of the Institute of Electronics, Information and Communication Engineers E, E72(1):37–42, January 1989.

[37] M. Morii, M. Kasahara, and D. L. Whitling. Efficient Bit-Serial Multiplication and the Discrete-Time Wiener-Hopf Equations over Finite Fields. IEEE Transactions on Information Theory, IT-35(6):1177–83, November 1989.

[38] R. C. Mullin, I. M. Onyszchuk, and S. A. Vanstone. Optimal Normal Bases in GF(p^n). Discrete Appl. Math., 22:149–61, 1988/89.

[39] Christof Paar. Efficient VLSI Architectures for Bit Parallel Computation in Galois Fields. PhD thesis, Universität GH Essen, Essen, Germany, 1994. No. 328.

[40] A.-E. Pellet. Sur les fonctions irréductibles suivant un module premier et une fonction modulaire. C. R. Acad. Sci. Paris, 70(7):328–30, 1870.

[41] Sam Perlis. Normal Bases of Cyclic Fields of Prime-Power Degree. Duke Mathematical Journal, 9:507–17, 1942.

[42] Irving S. Reed, Robert A. Scholtz, T. K. Truong, and Lloyd R. Welch. The Fast Decoding of Reed-Solomon Codes Using Fermat Theoretic Transforms and Continued Fractions. IEEE Transactions on Information Theory, 24(1):100–106, 1978.

[43] C. E. Shannon. A Mathematical Theory of Communication. Bell System Technical Journal, 27:379–423, 623–56, July, October 1948.

[44] A. Shen, A. Ghosh, S. Devadas, and K. Keutzer. On Average Power Dissipation and Random Pattern Testability of CMOS Combinational Logic Networks. In IEEE/ACM International Conference on Computer-Aided Design, pages 402–7, Santa Clara, CA, November 1992.

[45] H. J. M. Veendrick. Short-Circuit Dissipation of Static CMOS Circuitry and its Impact on the Design of Buffer Circuits. IEEE Journal on Solid-State Circuits, SC-19(4):468–73, August 1984.

[46] Zhe-xian Wan. Introduction to Abstract and Linear Algebra. Studentlitteratur, Lund, Sweden, 1992.

[47] C. C. Wang, T. K. Truong, H. M. Shao, L. J. Deutsch, J. K. Omura, and I. S. Reed. VLSI Architectures for Computing Multiplications and Inverses in GF(2^m). IEEE Transactions on Computers, C-34(8):709–17, August 1985.

[48] Chin-Liang Wang and Jung-Lung Lin. A Systolic Architecture for Computing Inverses in Finite Fields GF(2^m). In 1991 International Symposium on VLSI Technology, Systems, and Applications. Proceedings of Technical Papers, pages 312–16, New York, May 1991. IEEE.

[49] Chin-Liang Wang and Jung-Lung Lin. A Systolic Architecture for Computing Inverses in Finite Fields GF(2^m). IEEE Transactions on Computers, 42(9):1141–46, 1993.

[50] M. Z. Wang and I. F. Blake. Bit Serial Multiplication in Finite Fields. SIAM Journal on Discrete Mathematics, 3(1):140–48, February 1990.

[51] Heinrich Weber. Die allgemeinen Grundlagen der Galois’schen Gleichungstheorie. Mathematische Annalen, 43:521–49, 1893.

[52] Lloyd R. Welch and Robert A. Scholtz. Continued Fractions and Berlekamp’s Algorithm. IEEE Transactions on Information Theory, 25(1):19–27, 1979.

[53] Neil H. E. Weste and Kamran Eshraghian. Principles of CMOS VLSI Design. A Systems Perspective. Addison-Wesley Publishing Company, Reading, Massachusetts, 1988.

Index

Abelian group, 7

Adder tree, 25–26, 27, 54

Addition, 8

in F2, 9, 12, 15, 25–26, 62

in a ring, 8

in extension fields, 11

in finite fields, 5

Notation for, 5

of vectors, 11

Properties of, 108 (table), 115 (table), 118

Symbol for, 6 (figure)

Algebraic coding theory, 2

All-one polynomial, see Polynomial

AND gate, see Boolean AND

Area, see Model

Arithmetic

in F2, 9

in F4, 102–107

Architectures for, 104 (figure), 107 (figure)

Properties of, 107, 108 (table)

using normal bases, 105–106

using polynomial bases, 103–105

in F16, 109–115

Architectures for, 108 (figure), 111 (figure), 114 (figure)

Properties of, 115–117 (table)

Arithmetic (continued)

in tower fields, 85–125

using bases of type I, 88–92

using bases of type II, 92–95

using bases of type III, 96–102

Background, 2–5

Bases

Canonical, 10

Conventional, 10

Dual, 10, 11, 13, 14, 37, 61, 74

exchange, 74–76

Multiple of, 13, 14, 37

Normal, 3, 10, 59, 61–67, 74, 76, 81, 85–87, 103, 105–107, 110, 112, 113, 115, 118, 120, 122, 125, 128, 130, 133

Best known, 68 (table)

Construction of, 66

Optimal, 66

of F4, 103

of tower fields, 86–88

of type I, 87, 88–92, 120–122, 125, 128, 130

of type II, 87, 92–95, 121–122, 125, 128, 130

of type III, 87, 96–102, 118, 121–122

Bases (continued)

Polynomial, 3, 10, 12–14, 29, 36–38, 47, 49, 58, 74, 76, 85–87, 103–107, 109, 110, 112, 115, 118, 120–122, 125, 128, 130

Standard, 10

Triangular, 36, 37, 38, 78

Berlekamp-Massey algorithm, 35, 38

relation to the Euclidean algorithm, 35

Boolean

AND, 5, 6 (figure), 16 (figure), 20

gates, 6 (figure), 15, 16 (figure)

Properties of, 28 (table)

Scaled, 19–21, 23

Symbols for, 5

Unscaled, 18, 19–21, 23, 24, 26

inversion, 5, 6 (figure), 20, 23

NAND, 6 (figure), 20

NOR, 6 (figure)

operations, 5

OR, 5, 6 (figure)

XNOR, 6 (figure)

XOR, 5, 6 (figure), 16 (figure), 26

Buffer, 26, 27, 43, 110, 112, 113, 118

Properties of, 28 (table), 107, 108 (table), 115, 115 (table), 118

Symbol for, 6 (figure)

Canonical basis, see Bases

Capacitance, 19

Capacitive load, 19

Chinese remainder theorem, 4

Clock frequency, 23

Clock signal, 22, 27, 35

CMOS, see Static CMOS

Commutative ring, 8

Complex field, 9

Composite field, 85

Conclusions, 128–130

Continued fractions, 35

Control

logic, 26

signal, 15, 26

Conventional basis, see Bases

Cost measures, 27–28

Critical path, 20, 32, 41, 47, 51, 54, 107, 115, 118

through control logic, 26

D Flip-flop, see Flip-flop

Delay, see Model

Division, 5

Drain capacitance, 19

Dual basis, see Bases

Dynamic power dissipation, see Power dissipation

Energy, see Model

Error correcting codes, 2

Error-locator polynomial, 35

Euclidean algorithm, 29, 30

for polynomials, 35

relation to the Berlekamp-Massey algorithm, 35

Exponentiation, 2

Extension field, 9, 9–10

Fast inverters, see Inversion

Feedback network, 38

Feedback shift register, see Shift register

Field, 8

Finite field, 8

Characteristic of, 10

Flip-flop, 15, 17 (figure), 27

Frequency, 23

Future Research, 133

Galois field, see Finite field

Galois imaginaries, 2

Gate, 15

Gate capacitance, 19

Gates, see Boolean gates

Gauss elimination, 36

Gauss-Jordan algorithm, 45

Architecture of, 46 (figure), 47, 48 (figure)

Generator, 9

Glitch, 22

Group, 7

Hamming complexity, 63

Best known, 68 (table)

Lower bound on, 66, 67

Upper bound on, 63, 64, 66, 67, 133

Hamming weight

of a matrix, 62

of a polynomial, 32, 41, 51

of a vector, 62, 65

Hankel matrix, see Matrix

Information theory, 2

Input gate capacitance, 19

Integers, 8

Interconnection, 21

Introduction, 1–5

Inversion, 2

based on exponentiation, 3, 69–74

Algorithm for, 69, 70, 73

Architecture of, 71 (figure), 75 (figure)

Properties of, 72 (table), 75 (table)

based on pattern recognition, 5

based on the Berlekamp-Massey algorithm, 35–45, 54, 76

Algorithm for, 39

Architecture of, 40 (figure), 38–41

Control signals of, 41

Properties of, 44 (table), 41–45, 77 (table)

based on the Euclidean algorithm, 3, 29–35, 54, 76

Algorithm for, 30

Architecture of, 31 (figure)

Control signals of, 31

Properties of, 35 (table), 32–35, 77 (table)

based on the Gauss-Jordan algorithm, 45–54, 77–80

Order of input signals for, 52 (table)

Preprocessor of, 50 (figure), 53 (figure)

Properties of, 49, 50 (table), 51, 52 (table), 79, 80 (table)

Boolean, see Boolean inversion

by a direct network, 4

by a multiplier tree, 3

by Araki/Fujita/Morisue, 3

Inversion (continued)

by Asano/Itoh/Tsujii, 4

by Berlekamp, 3, 29

by Feng, 4, 73–74

by Hasan/Bhargava, 3, 4, 49, 77

by Itoh/Tsujii, 4, 74

by Kovac/Ranganathan/Varanasi, 5

by Mastrovito, 70

by Morii/Kasahara, 4, 90

by table lookup, 3

by Wang et al., 3, 69

by Wang/Lin, 4, 49

Fast, 129 (figure)

in F4, 105, 109

Architecture for, 104 (figure), 107 (figure)

Properties of, 108 (table)

in F16, 112–113

Architecture for, 114 (figure)

Properties of, 117 (table)

in extension fields, 11

in tower fields, 85–125

Properties of, 123 (figure), 124 (figure), 118–125

Low Energy, 132 (figure)

Small, 131 (figure)

using all subfields, 4

using bases of type I, 88–90

Architecture of, 89 (figure)

Properties of, 120–121

using bases of type II, 93

Architecture of, 92 (figure)

Properties of, 121

using bases of type III, 96–99

Architecture of, 98 (figure)

Properties of, 121–122

Inversion (continued)

using normal bases, 59–81

Properties of, 80–81, 82–84 (figure)

using polynomial bases, 29–58, 74–80, 105

Properties of, 55 (figure), 56 (figure), 57 (figure), 54–58

using systolic arrays, 4

with logarithmic depth, 4

Inverter chain, 23, 27

Irreducible

pentanomial, 32, 135

polynomial, see Polynomial

trinomial, 32, 135

Large capacitive loads, 23–25

Leakage current, 22

Linear feedback shift register, see Shift register

Linear transformation, 10

Low energy inverters, see Inversion

Massey-Omura multiplier, see Multiplication

Matrix

Hankel, 13, 14, 36–39, 78

Nonsingular, 13, 36, 39, 45

representation, 11, 12, 13

Toeplitz, 4

Minimum size transistor, see MOS transistor

Mobility, 18

Model, 1, 15, 17–23

Area, 17, 21

Delay, 17–21

Energy, 22, 23, 27

Model (continued)

Power dissipation, 17, 22–23, 27

Size, 21, 27

Time, 27

Monic polynomial, see Polynomial

MOS transistor, 1, 15

Length of, 18

Minimum size of, 18, 21

Scaled, 19–21

Unscaled, 18, 19

Width of, 18

Multiple transitions, see Transitions

Multiplication, 2, 69

by Berlekamp, 13

by Hasan/Bhargava, 3, 36

by Massey/Omura, 3, 59, 61–67

Properties of, 67, 68 (table)

used in inversion, 71

by Paar, 91

by Wang/Blake, 36

in F2, 9, 12, 15, 62

in F4, 103–106


Properties of, 108 (table)

in F16, 113

Architecture for, 114 (figure)


in a field, 8

in a ring, 8

in extension fields, 11, 12

in finite fields, 5

Notation for, 5


using bases of type I, 91


Multiplication (continued)

using bases of type II, 93–95


using bases of type III, 99–100


using normal bases, 105–106

using polynomial bases, 13, 103–104

Multiplication by a constant

in F4, 104–106, 109

Architecture for, 104 (figure), 107, 108 (figure)


in F16, 109

Architecture for, 108 (figure)


in tower fields, 119


using bases of type I, 91


Properties of, 119

using bases of type II, 95


Properties of, 119

using bases of type III, 100


Properties of, 119

using normal bases, 106

using polynomial bases, 104–105

NAND gate, see Boolean NAND

nMOS transistor, 15

Nonsingular matrix, see Matrix

NOR gate, see Boolean NOR

Normal basis, see Bases

inverters, see Inversion

multipliers, see Multiplication

Normalized

area, 21, 27, 28 (table)

of an adder tree, 26

of an inverter chain, 24–25

of control logic, 26

capacitance, 20

delay, 20

of an adder tree, 26

of an inverter chain, 23–25


energy, 23, 27


input capacitance, 28 (table)


internal delay, 20, 28 (table)

output resistance, 28 (table)


power dissipation, 23, 27

of an inverter chain, 25

properties, 27, 28 (table)

of addition, see Addition

of Boolean gates, see Boolean gates

of buffers, see Buffer

of inversion, see Inversion

of multiplication, see Multiplication (by a constant)

of VLSI architectures, 15–28

resistance, 19

time, 27

total delay, 20

width, 18

Notation, 5

OR gate, see Boolean OR

Outline, 1–2

Pattern recognition, 5

Pentanomial, 135, see also Polynomial

pMOS transistor, 15

Polynomial

All-one, 3, 59–61, 74

Irreducible, 9, 10, 13, 29, 60, 86, 87

of minimum weight, 32, 133, 135–149

Table of, 136–149

Monic, 9

Polynomial basis, see Bases

inverters, see Inversion

multipliers, see Multiplication

Power dissipation, see Model

Prime field, 9

Primitive element, 9

Properties, see Normalized properties

Real field, 9

Resistance, 19

Ring, 8

Scaled

gate, see Boolean gates

transistor, see MOS transistor

Scaling factor, 19–21, 23

Shift register, 12, 38, 38 (figure), 41, 51, 52 (figure)

Short circuit power dissipation, see Power dissipation

Single transitions, see Transitions

Size, see Model

Small inverters, see Inversion

Source capacitance, 19

Squaring, 2, 69, 91, 95, 100

by Hasan/Bhargava, 3

in F4, 105–106, 109



Squaring (continued)

in F16, 109–112

Architecture for, 111 (figure)

Properties of, 115 (table)

using bases of type I, 91–92

Architecture of, 92 (figure)

using bases of type II, 95

Architecture of, 94 (figure)

using bases of type III, 100–102

Architecture of, 101 (figure)

using normal bases, 62, 106

using polynomial bases, 105

Standard basis, see Bases

Static CMOS, 15

Static power dissipation, see Power dissipation

Subfield, 4, 9

Sun Zi theorem, see Chinese remainder theorem

Switch, 5, 15

Symbols, 5, 6 (figure)

Systolic array, 4

Time, see Model

Toeplitz matrix, see Matrix

Tower field, 85

Bases of, see Bases

Inversion in, see Inversion

Multiplication in, see Multiplication (by a constant)

Squaring in, see Squaring

Trace, 10

Transistor, see MOS transistor and Model

Transitions, 22

Transmission gate, 15

Triangular basis, see Bases

Trinomial, 135, see also Polynomial

Type I bases, see Bases

Type II bases, see Bases

Type III bases, see Bases

Unscaled

gate, see Boolean gates

transistor, see MOS transistor

Vector

addition, see Addition of vectors

representation, 11

space, 9

VLSI, 15–28

Weight, see Hamming weight

XNOR gate, see Boolean XNOR

XOR gate, see Boolean XOR
