
2WF15 - Discrete Mathematics 2 - Part 1

Algorithmic Number Theory

Benne de Weger

version 0.54, March 6, 2012


Contents

1 Multi-precision arithmetic
    1.1 Notation
    1.2 Representation of integers
    1.3 Efficient arithmetic with integers
    1.4 Efficient polynomial arithmetic

2 Euclidean and Modular Algorithms
    2.1 The Euclidean Algorithm
    2.2 Efficient Modular Arithmetic
    2.3 The Chinese Remainder Theorem

3 Multiplicative structure of $\mathbb{Z}_n^*$
    3.1 Euler (and Fermat)
    3.2 Order of an element
    3.3 Primitive roots
    3.4 Algorithms

4 Quadratic Reciprocity
    4.1 Quadratic residues and the Legendre symbol
    4.2 The Quadratic Reciprocity Law
    4.3 Another proof
    4.4 The Jacobi symbol
    4.5 Modular Square Roots

5 Prime Numbers
    5.1 Prime Number Distribution
    5.2 Probabilistic Primality Testing
    5.3 Deterministic Primality Testing
    5.4 Prime Number Generation

6 Multiplicative functions
    6.1 Multiplicative functions
    6.2 The Möbius function
    6.3 Möbius inversion
    6.4 The Principle of Inclusion and Exclusion
    6.5 Fermat and Euler revisited

7 Continued Fractions
    7.1 The Euclidean Algorithm revisited
    7.2 Continued Fractions
    7.3 Diophantine approximation

Bibliography



Chapter 1

Multi-precision arithmetic

Introduction

General references for this chapter: [CP, Chapter 9], [GG, Chapters 2, 8], [Kn, Chapter 4], [Sh, Chapter 3].

In this chapter the following topics will be treated:

• representation of integers and polynomials in a computer,

• efficient algorithms for elementary arithmetic operations for the following objects:

– integers,

– polynomials.

1.1 Notation

First the necessary notation. We use the following notation for sets of numbers.

• $\mathbb{N} = \{1, 2, 3, \ldots\}$ is the set of natural numbers (excluding 0),

• $\mathbb{Z} = \{\ldots, -3, -2, -1, 0, 1, 2, 3, \ldots\}$ is the set of integral numbers or integers,

• $\mathbb{Q}$ is the set of rational numbers,

• $\mathbb{R}$ is the set of real numbers.

If a set $S$ contains 0, we sometimes write $S^* = S \setminus \{0\}$.

The cardinality of a set $S$ is denoted by $|S|$, or by $\#S$.

We will use the floor function $\lfloor x \rfloor$, which is the largest integer that is at most $x$, and the ceiling function $\lceil x \rceil$, which is the smallest integer that is at least $x$.

The symbol $\log$ will always stand for the natural logarithm. The logarithm to the base $b$ will be denoted by $\log_b$.

To get an idea of the growth of $f(x)$ for $x \to \infty$, we say that a function $f(x)$ is of the order of $g(x)$ for some useful function $g$, if there exists a constant $c$ such that $f(x) < c\,g(x)$ for all large enough $x$. The notation for this is the big O-notation:

\[ f(x) = O(g(x)). \]


For $g(x)$ we usually take simple functions such as $x^{\alpha}$, $e^{\alpha x}$, $\log x$, etc. Of course there is a similar big O concept for e.g. $x \downarrow 0$.

Example:

\[ \frac{n^2 + 3}{2\sqrt{n} - 1000000 \log n} = O(n^{3/2}). \]

Or, if we want to be a bit more accurate:

\[ \frac{n^2 + 3}{2\sqrt{n} - 1000000 \log n} = \frac{1}{2} n^{3/2} + O(n \log n). \]
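The more accurate estimate in this example can be checked with a short expansion. The following is only a sketch; the abbreviations $M$ and $\varepsilon_n$ are introduced here for convenience and are not part of the notation above.

```latex
% Sketch: write M = 1000000 and \varepsilon_n = \frac{M \log n}{2\sqrt{n}},
% so that \varepsilon_n \to 0 as n \to \infty.
\[
  \frac{n^2 + 3}{2\sqrt{n} - M \log n}
  = \frac{n^{3/2}}{2} \cdot \frac{1 + 3n^{-2}}{1 - \varepsilon_n} .
\]
% For small \varepsilon_n we have 1/(1 - \varepsilon_n) = 1 + O(\varepsilon_n), hence
\[
  \frac{n^2 + 3}{2\sqrt{n} - M \log n}
  = \frac{n^{3/2}}{2} \left( 1 + O\!\left( \frac{\log n}{\sqrt{n}} \right) \right)
  = \frac{1}{2} n^{3/2} + O(n \log n) .
\]
```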

1.2 Representation of integers

A problem with doing arithmetic with computers is that numbers can only be stored in rather small memory words, and that computer arithmetic therefore can, at first sight, only work with small sized numbers. Most computers nowadays have a 32 bit architecture, some have 64 bit capabilities, but there are also many processors that still have 16 or 8 bit registers only, e.g. smartcard processors. In an $n$ bit word we can represent $2^n$ different numbers, e.g. $\{0, 1, 2, \ldots, 2^n - 1\}$. In many applications, notably cryptography, we often need to do exact arithmetic with numbers that require hundreds or thousands of bits. In other words, overflow or rounding errors are not tolerated at all. This means that the standard arithmetic functions provided by the processor or the (operating system) software, operating on single (or at best double) words only, are usually not sufficient, and we must find efficient ways to do arithmetic with large integers.

When we want to store large integers that do not fit in the word size of some computer, we have to use an array of words to represent one integer. There are many different ways to represent integers. We don't want to get into too much implementation detail, such as big / little endian or 2-complement representations. Instead we'll keep it at a fairly mathematical level. We also do not treat how to represent (approximations of) non-integral rational or real numbers. Basically what we do is to use a so-called radix $b$ representation, which is a generalization of the familiar decimal notation.

Let $b \geq 2$ be an integer, the so-called radix. The elements of the set $\{0, 1, \ldots, b - 1\}$ are called radix $b$ digits, and we assume that they all fit in one memory word, and that operations on them can be handled by the available processor. Note that this means that the bit length of a word is $\geq \lceil \log_2 b \rceil$.

Which value we take as $b$ will depend on the architecture of our computer. Popular choices are $b = 2^8, 2^{16}, 2^{32}$, reflecting the word size (2 to the power the bit size of a word), as this optimizes the storage. However, as we'll see soon, it may also be wise to take e.g. $b = 2^{31}$ or even $b = 2^{16}$ on a 32-bit computer.

Any integer $n \in \mathbb{Z}$ admits a radix $b$ representation $[n]_b$, as follows. Let $m \in \mathbb{N}$ be such that $|n| < b^m$, so that $m \geq \lfloor \log_b |n| \rfloor + 1$. Then, repeatedly using division with remainder by $b$, it follows that there exist $n_0, n_1, \ldots, n_{m-1} \in \{0, 1, \ldots, b - 1\}$ such that

\[ |n| = n_0 + n_1 b + n_2 b^2 + \ldots + n_{m-1} b^{m-1} = \sum_{i=0}^{m-1} n_i b^i. \]

Then as the radix $b$ representation of $n$ we write

\[ [n]_b = \pm [n_{m-1}, n_{m-2}, \ldots, n_0]_b, \]

with $\pm = +$ or left out if $n \geq 0$, and $\pm = -$ if $n < 0$. Leading zeroes are allowed, but usually are omitted, e.g.

\[ [43]_{16} = [2, 11]_{16} = [0, 0, 2, 11]_{16}. \]

The parameter $m$ is called the word size of $[n]_b$.

When it is clear from the context which $b$ is meant, we often write $[n]_b = \pm n_{m-1} n_{m-2} \ldots n_0$. Omitting commas should only be done when confusion is not possible. When $b \leq 10$ we use the well known arabic symbols 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 for the digits. When $b > 10$ (but not too large) we run out of symbols, so after 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 we sometimes continue with A, B, C, ....

When $b = 10$ we get essentially our every day decimal representation. When $b = 2$ we get the well known binary representation; in that case the word size is equal to the bit size. When $b = 16$ the system is called hexadecimal representation, and in that case often the digits 10, 11, 12, 13, 14, 15 are written as A, B, C, D, E, F. Example: $[43]_{16}$ = 2B. This hexadecimal system is well known in the computing world.
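The repeated division with remainder by $b$ that produces the digits can be sketched in a few lines of Python. This is an illustration only: the function names `to_radix` and `from_radix` are made up here, Python's built-in unbounded integers stand in for the number $n$, and the digits are stored least significant first (so index $i$ holds $n_i$), which is the reverse of the written order $[n_{m-1}, \ldots, n_0]_b$.

```python
def to_radix(n, b):
    """Return (sign, digits) for the integer n in radix b.

    digits[i] is the digit n_i, i.e. least significant digit first.
    The names and the storage order are choices made for this sketch.
    """
    if b < 2:
        raise ValueError("radix must be at least 2")
    sign = -1 if n < 0 else 1
    n = abs(n)
    digits = []
    while n > 0:
        n, d = divmod(n, b)   # division with remainder by b
        digits.append(d)
    return sign, digits or [0]


def from_radix(sign, digits, b):
    """Inverse of to_radix: evaluate sum(n_i * b**i) by Horner's rule."""
    value = 0
    for d in reversed(digits):
        value = value * b + d
    return sign * value
```

For instance, `to_radix(43, 16)` yields the digits 11 and 2, matching $[43]_{16} = [2, 11]_{16}$, and leading zeroes (trailing entries in this storage order) do not change the value returned by `from_radix`.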

1.3 Efficient arithmetic with integers

1.3.1 Addition and subtraction

Addition and subtraction are essentially easy: we use the primary school method, which is the most efficient way of doing it. This works by adding the digits backwards, and keeping track of "carries".

We start with a formal description of addition of positive numbers. Addition of a positive and a negative number is equivalent to subtraction of two positive numbers, to be treated below. Addition of two negative numbers is equivalent to addition of two positive numbers, and adjusting the sign of the output. Addition of 0 is trivial.

Algorithm 1.1 (Addition Algorithm)

Input: an integer $b \geq 2$, and the radix $b$ representations
    $[x]_b = [x_{m-1}, \ldots, x_0]_b$, $[y]_b = [y_{n-1}, \ldots, y_0]_b$
    (without leading zeroes) of numbers $x, y \in \mathbb{N}$
Output: the radix $b$ representation $[z]_b = [z_{k-1}, \ldots, z_0]_b$
    (without leading zeroes) of $z = x + y$
Step 1: $c \leftarrow 0$
    $x_i \leftarrow 0$ for $m \leq i < \max\{m, n\}$
    $y_i \leftarrow 0$ for $n \leq i < \max\{m, n\}$
Step 2: for $i = 0, 1, \ldots, \max\{m, n\} - 1$ do
        $z_i \leftarrow x_i + y_i + c$
        if $z_i \geq b$ then $z_i \leftarrow z_i - b$, $c \leftarrow 1$ else $c \leftarrow 0$
Step 3: if $c = 1$ then $k \leftarrow \max\{m, n\} + 1$, $z_{k-1} \leftarrow 1$ else $k \leftarrow \max\{m, n\}$
    output $[z_{k-1}, \ldots, z_0]_b$

Note that indeed $z_i \in \{0, 1, \ldots, b - 1\}$ for all $i$.
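As an illustration, Algorithm 1.1 might be transcribed into Python as follows. This is a sketch under assumptions of our own: the name `add_digits` is hypothetical, digit lists are stored least significant first, and Python integers play the role of machine words.

```python
def add_digits(x, y, b):
    """Add two non-negative numbers given as radix-b digit lists.

    Digit lists are least significant digit first (an assumption of this
    sketch); the result uses the same convention.
    """
    n = max(len(x), len(y))
    # Step 1: pad the shorter operand with leading zeroes.
    x = x + [0] * (n - len(x))
    y = y + [0] * (n - len(y))
    z, c = [], 0
    # Step 2: add digit by digit, propagating a carry of 0 or 1.
    for xi, yi in zip(x, y):
        t = xi + yi + c
        if t >= b:
            t, c = t - b, 1
        else:
            c = 0
        z.append(t)
    # Step 3: a final carry becomes a new most significant digit.
    if c == 1:
        z.append(1)
    return z
```

Each pass through the loop performs a bounded number of word operations, which is the linear behaviour one expects from the primary school method.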

Next we give a formal description of subtraction of a positive number from a larger positive number. Subtraction of a positive number from a smaller positive number is essentially the same, by swapping the two numbers and adjusting the sign of the output. Subtraction of a positive and a negative number is equivalent to addition of two positive numbers, and adjusting the sign of the output. Subtraction of two negative numbers is equivalent to subtraction of two positive numbers and adjusting the sign of the output. Subtraction of 0 or from 0 is trivial.

Algorithm 1.2 (Subtraction Algorithm)

Input: an integer $b \geq 2$, and the radix $b$ representations
    $[x]_b = [x_{m-1}, \ldots, x_0]_b$, $[y]_b = [y_{n-1}, \ldots, y_0]_b$
    (without leading zeroes) of numbers $x, y \in \mathbb{N}$ such that $x > y$ (so $m \geq n$)
Output: the radix $b$ representation $[z]_b = [z_{k-1}, \ldots, z_0]_b$
    (without leading zeroes) of $z = x - y$
Step 1: $c \leftarrow 0$
    $y_i \leftarrow 0$ for $n \leq i < m$
Step 2: for $i = 0, 1, \ldots, m - 1$ do
        $z_i \leftarrow x_i - y_i - c$
        if $z_i < 0$ then $z_i \leftarrow z_i + b$, $c \leftarrow 1$ else $c \leftarrow 0$
Step 3: $k \leftarrow m$
    while $k \geq 2$ and $z_{k-1} = 0$ do $k \leftarrow k - 1$
    output $[z_{k-1}, \ldots, z_0]_b$

Note that these addition and subtraction algorithms have linear complexity, i.e. the number of atomic operations (addition / subtraction operations on words) needed to add / subtract two numbers of radix $b$ length $n$ is $O(n)$ (in essence it is $n$, if we neglect operations on indices, on bits and on signs). Clearly one cannot do better. You should have noted that these algorithms may use variables that sometimes will be one bit longer than the word size. We do not go into details on how to deal with this (one possible solution is to let the digits use one bit less than the bit size of a word, e.g. $b = 2^{31}$ on a 32-bit computer).
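Algorithm 1.2 can be sketched in Python in the same way. Again this is only an illustration: `sub_digits` is a hypothetical name, digit lists are least significant first, and the check for a remaining borrow is an addition of this sketch (the algorithm itself simply assumes $x > y$).

```python
def sub_digits(x, y, b):
    """Subtract y from x, both given as radix-b digit lists (x >= y).

    Digit lists are least significant digit first (an assumption of this
    sketch). Raises ValueError if the result would be negative.
    """
    if len(x) < len(y):
        raise ValueError("x must be at least y")
    # Step 1: pad y with leading zeroes up to the length of x.
    y = y + [0] * (len(x) - len(y))
    z, c = [], 0
    # Step 2: subtract digit by digit, propagating a borrow of 0 or 1.
    for xi, yi in zip(x, y):
        t = xi - yi - c
        if t < 0:
            t, c = t + b, 1
        else:
            c = 0
        z.append(t)
    if c == 1:
        raise ValueError("x must be at least y")
    # Step 3: strip leading zeroes, keeping at least one digit.
    while len(z) > 1 and z[-1] == 0:
        z.pop()
    return z
```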

1.3.2 Multiplication

1.3.2.1 Naive

Again multiplication can be done pretty efficiently by the naive, or primary school method. Given $x = \sum_{i=0}^{m-1} x_i b^i$ and $y = \sum_{j=0}^{n-1} y_j b^j$, we want to compute

\[ xy = \left( \sum_{i=0}^{m-1} x_i b^i \right) \left( \sum_{j=0}^{n-1} y_j b^j \right) = \sum_{i=0}^{m-1} \sum_{j=0}^{n-1} x_i y_j b^{i+j}. \]

The method computes $\sum_{j=0}^{n-1} (x y_j) b^j$ using $x y_j = \sum_{i=0}^{m-1} (x_i y_j) b^i$ for $j = 0, 1, \ldots, n - 1$.

A multiplication operation on two words $x_i y_j$ yields a number that in general will not fit in one word anymore. Note that $0 \leq x_i y_j \leq (b - 1)^2 < b^2$, so it does always fit in two words. The carry is the most significant word of the two. If to such a product a carry is to be added, say $c$ of word length 1, we have $x_i y_j + c \leq (b - 1)^2 + (b - 1) = b^2 - b < b^2$, and so the new carry will also have word length 1. With induction it's now easy to prove that the carry will never be larger than one word.

This is implemented in the algorithm below, for the multiplication of two positive numbers. When non-positive numbers have to be multiplied, you can use this algorithm too, and you simply have to adjust the sign.



Algorithm 1.3 (Naive Multiplication Algorithm)

Input: the radix $b \geq 2$, and the radix $b$ representations
    $[x]_b = [x_{m-1}, \ldots, x_0]_b$, $[y]_b = [y_{n-1}, \ldots, y_0]_b$
    (without leading zeroes) of numbers $x, y \in \mathbb{N}$
Output: the radix $b$ representation $[z]_b = [z_{k-1}, \ldots, z_0]_b$
    (without leading zeroes) of $z = xy$
Step 1: $z_i \leftarrow 0$ for $0 \leq i < m + n$
Step 2: for $i = 0, 1, \ldots, m - 1$ do
        $c \leftarrow 0$
        for $j = 0, 1, \ldots, n - 1$ do
            $t \leftarrow z_{i+j} + x_i y_j + c$
            $c \leftarrow \lfloor t / b \rfloor$
            $z_{i+j} \leftarrow t - cb$
        $z_{i+n} \leftarrow c$
Step 3: if $z_{m+n-1} = 0$ then $k \leftarrow m + n - 1$ else $k \leftarrow m + n$
    output $[z_{k-1}, \ldots, z_0]_b$

Many variants exist, that usually do not do it essentially better. Another idea is to mimic the formula

\[ \left( \sum_{i=0}^{m-1} x_i b^i \right) \left( \sum_{j=0}^{n-1} y_j b^j \right) = \sum_{k=0}^{m+n-2} \left( \sum_{i=0}^{m-1} x_i y_{k-i} \right) b^k \]

(where $y_j$ is taken to be 0 for $j < 0$ and $j \geq n$), i.e. gather all products of digits that end up in the same digit of the product.

Note that to multiply two words (an atomic operation), the processor must be able to deal with double word length. One way to achieve this is to use only half the word size (i.e. have word size $> 2 \log_2 b$). Standard processors have more efficient ways to do this. We do not go into details.

The complexity of this type of multiplication of two numbers of radix $b$ lengths $m$ and $n$ respectively is $O(mn)$. If the complexity of addition and subtraction is neglected, it is essentially $mn$. Because often $m = n$ is taken, this is called quadratic complexity (as opposed to linear complexity for addition), because the complexity of multiplying two numbers of length $n$ is $O(n^2)$.
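The primary school method can be sketched in Python as below. The sketch makes its own choices: the name `mul_digits` is hypothetical, digit lists are least significant first, and the whole result array of $m + n$ digits is cleared before the loops start. Python integers stand in for the double-word intermediate value $t$.

```python
def mul_digits(x, y, b):
    """Multiply two non-negative numbers given as radix-b digit lists.

    Digit lists are least significant digit first (an assumption of this
    sketch). The two nested loops make the O(m*n) cost visible.
    """
    m, n = len(x), len(y)
    z = [0] * (m + n)                 # room for all digits of the product
    for i in range(m):
        c = 0                         # the carry always fits in one digit
        for j in range(n):
            t = z[i + j] + x[i] * y[j] + c
            c, z[i + j] = divmod(t, b)
        z[i + n] = c
    # Strip leading zeroes, keeping at least one digit.
    while len(z) > 1 and z[-1] == 0:
        z.pop()
    return z
```

The inner statement is executed exactly $mn$ times, in line with the complexity estimate above.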

1.3.2.2 Karatsuba

In the previous sections we noticed that multiplication by the primary school method has quadratic complexity: the number of atomic operations (additions or multiplications on words) you have to perform in order to multiply two $n$-word integers is $O(n^2)$. On the other hand, addition and subtraction have only linear complexity. That is a notable difference, and the question arises whether we can multiply more efficiently. Surprisingly, the answer is affirmative.

The speed of multiplication can be improved quite easily, both in theory and in practice. This is not obvious, and a lot of theory has been developed. The easiest way to show that improvement can be reached is by a trick known as Karatsuba multiplication. It is based on three nice ideas.

The first basic idea is to split up the numbers to be multiplied in halves, and perform only multiplications on the halves. So let us take two numbers $x, y$ that have a radix $b$ representation of even (bit) length $n$ (if $n$ is odd, add a leading zero). Then we can write

\[ x = x_{hi} b^{n/2} + x_{lo}, \quad y = y_{hi} b^{n/2} + y_{lo}, \]

where $x_{hi}, x_{lo}, y_{hi}, y_{lo}$ are numbers with a length $n/2$ representation. Then

\[ xy = x_{hi} y_{hi} b^n + (x_{hi} y_{lo} + x_{lo} y_{hi}) b^{n/2} + x_{lo} y_{lo}. \]

At first sight one would guess that this takes 4 multiplications of half-length numbers, and a (linear, therefore negligible) shift and add operation (note that multiplication by powers of $b$ is just shifting, and essentially for free). Assuming that a multiplication of two numbers of size $n$ costs $cn^2$ units (unit: atomic operation, or nanoseconds, or whatever reasonable measure you like), this naive method will cost $4 \times c(n/2)^2 + O(n) = cn^2 + O(n)$ units, and we have gained nothing.

The second basic idea now is the nice trick found by Karatsuba. Note that

\[ (x_{hi} + x_{lo})(y_{hi} + y_{lo}) = x_{hi} y_{hi} + (x_{hi} y_{lo} + x_{lo} y_{hi}) + x_{lo} y_{lo}. \]

Now compute $x_{hi} y_{hi}$, $x_{lo} y_{lo}$, and $(x_{hi} + x_{lo})(y_{hi} + y_{lo})$, which is only 3 half-length multiplications and some additions. Then we can compute $x_{hi} y_{lo} + x_{lo} y_{hi} = (x_{hi} + x_{lo})(y_{hi} + y_{lo}) - x_{hi} y_{hi} - x_{lo} y_{lo}$ by doing only additions and subtractions, and to get $xy$ we can compose the results by shifting and additions.

This method costs $3 \times c(n/2)^2 + O(n) = \frac{3}{4} cn^2 + O(n)$ units, and with this factor of $\frac{3}{4}$ we have found a faster method.

The third basic idea is to apply the trick recursively: also the half-length multiplications can be sped up in the same way. We will now see what this leads to in terms of complexity of the overall multiplication.

Assume that the resulting algorithm for multiplying two $n$-word numbers has complexity $T(n)$. Multiplying two numbers of word length $2n$ then costs 3 times $T(n)$, plus some linear terms. So there exists a $C > 0$ such that

\[ T(2n) < 3T(n) + Cn. \]

If we now enlarge $C$, if necessary, to obtain $C > T(2)$, then using induction it easily follows that $T(2^k) < C(3^k - 2^k)$ for $k \geq 1$. Indeed,

\[ T(2^k) < 3T(2^{k-1}) + C 2^{k-1} < 3C(3^{k-1} - 2^{k-1}) + C 2^{k-1} = C(3^k - 2^k). \]

With $k = \lceil \log_2 n \rceil$ we find

\[ T(n) \leq T(2^k) < C \cdot 3^k < 3C \cdot 3^{\log_2 n} = 3C \cdot n^{\alpha} \quad \text{with } \alpha = \frac{\log 3}{\log 2} = 1.5849\ldots. \]

In other words: $T(n) = O(n^{\alpha}) = O(n^{1.585})$. And this is a substantial theoretical improvement over the naive quadratic algorithm.

In practice Karatsuba's method becomes advantageous already for rather small numbers (from about 5 words on).
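The three ideas can be combined into a short recursive sketch. Several things here are assumptions made for illustration only: the radix is taken to be $b = 2$ (so lengths are bit lengths and shifting is done with `<<` and `>>`), Python's built-in integers carry the operands, the built-in product is used below a cutoff, and the name `karatsuba` and the parameter `cutoff_bits` are hypothetical. It shows the recursion pattern, not a tuned implementation.

```python
def karatsuba(x, y, cutoff_bits=64):
    """Multiply non-negative integers x and y with Karatsuba's trick.

    Sketch with radix b = 2; below cutoff_bits the built-in product is
    used as the base case of the recursion.
    """
    if x < 0 or y < 0:
        raise ValueError("operands must be non-negative")
    if cutoff_bits < 4:
        raise ValueError("cutoff_bits must be at least 4")
    n = max(x.bit_length(), y.bit_length())
    if n <= cutoff_bits:
        return x * y
    half = n // 2
    mask = (1 << half) - 1
    # First idea: split both numbers in a high and a low half.
    x_hi, x_lo = x >> half, x & mask
    y_hi, y_lo = y >> half, y & mask
    # Second idea: three half-length products instead of four.
    hi = karatsuba(x_hi, y_hi, cutoff_bits)
    lo = karatsuba(x_lo, y_lo, cutoff_bits)
    mid = karatsuba(x_hi + x_lo, y_hi + y_lo, cutoff_bits) - hi - lo
    # Third idea is the recursion itself; recombine by shifting and adding.
    return (hi << (2 * half)) + (mid << half) + lo
```

The cutoff plays the role of the threshold below which the primary school method is at least as fast; its best value depends on the machine and is not claimed here.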

1.3.2.3 Other methods

Karatsuba's idea can be generalized as follows. Split up $x, y$ into $r + 1$ pieces of $n$ words each:

\[ x = x_r b^{rn} + x_{r-1} b^{(r-1)n} + \ldots + x_0, \quad y = y_r b^{rn} + y_{r-1} b^{(r-1)n} + \ldots + y_0. \]

Consider the polynomials

\[ X(t) = x_r t^r + x_{r-1} t^{r-1} + \ldots + x_0, \quad Y(t) = y_r t^r + y_{r-1} t^{r-1} + \ldots + y_0, \]

and put

\[ Z(t) = X(t) Y(t) = z_{2r} t^{2r} + z_{2r-1} t^{2r-1} + \ldots + z_0. \]

Since $x = X(b^n)$, $y = Y(b^n)$ and $xy = Z(b^n)$, we are done when we can efficiently compute the $2r + 1$ coefficients of $Z$. This we do by computing the values of $Z(t) = X(t) Y(t)$ at $t = 0, 1, 2, \ldots, 2r$, costing $2r + 1$ multiplications of $n$-word numbers, and then by interpolation the coefficients can be found in linear time, as they are fixed linear combinations of the computed numbers (the linear combinations are given by the inverse Vandermonde matrix on $0, 1, \ldots, 2r$). With $T(n)$ as above, we now find $T((r + 1)n) < (2r + 1)T(n) + Cn$. Taking $C$ large enough so that $C > T(r + 1)$ we find by induction that

\[ T\left( (r + 1)^k \right) \leq \frac{C}{r} \left( (2r + 1)^k - (r + 1)^k \right). \]

It follows, like above, that $T(n) < \frac{2r + 1}{r} C n^{\log_{r+1}(2r+1)}$, and thus $T(n) < 3C n^{\alpha}$ for $\alpha = \frac{\log(2r + 1)}{\log(r + 1)}$. With $r \to \infty$ we find:

\[ \text{for every } \varepsilon > 0: \quad T(n) < c\, n^{1 + \varepsilon}. \]

This is quite spectacular, as it shows that multiplication can be done in almost (more precisely: asymptotically) linear time. But this really is an asymptotic result, as $c$ will depend on $r$, and in practice these generalized Karatsuba methods with $r > 1$ are not very useful. A better idea, due to Toom and Cook, is to let $r$ vary with $n$, but we won't go into details here.

Completely different methods exist, such as Fast Fourier Transforms, that can be practically used to achieve multiplication in essentially linear complexity. We do not go into details, see [Sh, Section 18.6], [CP, Section 9.5], [GG, Section 8.2].

1.3.3 Division

Division with remainder is the problem of, given $x, y \in \mathbb{N}$ with $x > y$, computing the unique pair of integers $q, r$ with $x = qy + r$ and $0 \leq r < y$. This is done by the primary school long division method. The words of the quotient $q$ are computed from left to right. When a new word $q_i$ has been found, the numerator $x$ is decreased by $q_i b^i y$. Then the above steps are repeated to find the next word $q_{i-1}$, until the numerator has become smaller than $y$. Then $q$ is the number consisting of the words $q_i$, and $r$ is the remaining numerator.

We give the following algorithm, in which you should notice that we did not write out all steps in atomic operations. So to implement this a lot of further work is needed. See [Kn, Section 4.3.1] or [Sh, Section 3.3.4] for more implementation details.

Algorithm 1.4 (Division with Remainder Algorithm)

Input: the radix $b \geq 2$, and the radix $b$ representations
    $[x]_b = [x_{m-1}, \ldots, x_0]_b$, $[y]_b = [y_{n-1}, \ldots, y_0]_b$
    (without leading zeroes) of numbers $x, y \in \mathbb{N}$
Output: the radix $b$ representations $[q]_b = [q_{k-1}, \ldots, q_0]_b$
    and $[r]_b = [r_{\ell-1}, \ldots, r_0]_b$ (without leading zeroes)
    of $q, r$ such that $x = qy + r$ and $0 \leq r < y$
Step 1: $r \leftarrow x$
    $k \leftarrow m - n + 1$
Step 2: for $i = k - 1, k - 2, \ldots, 0$ do
        $q_i \leftarrow \left\lfloor \dfrac{r}{b^i y} \right\rfloor$
        $r \leftarrow r - q_i b^i y$
Step 3: remove leading zeroes from $q = [q_{k-1}, \ldots, q_0]_b$
    output $[q]_b$ and $[r]_b$


It remains to explain how to find the most significant word $\left\lfloor \dfrac{r}{b^i y} \right\rfloor$ of the quotient. This can be done by dividing the two most significant words of the numerator by the most significant word of the denominator. When the most significant word of the denominator is larger than $\frac{1}{2} b$, this is a good approximation (off by at most 2) of $q_i$. When the most significant word of the denominator is smaller than $\frac{1}{2} b$, simply multiply numerator and denominator by a suitable number to make the denominator's most significant word larger than $\frac{1}{2} b$.

So assume we have to compute the most significant word of $\dfrac{A b^{m-1} + B b^{m-2} + C}{D b^{m-2} + E}$, where $0 < A < b$, $0 \leq B < b$, $0 \leq C < b^{m-2}$, $\frac{1}{2} b \leq D < b$, and $0 \leq E < b^{m-2}$. Instead we compute the most significant word of $\dfrac{Ab + B}{D}$, which we regard as an almost elementary operation on single words only. Now note that

\[
\begin{aligned}
\left| \frac{A b^{m-1} + B b^{m-2} + C}{D b^{m-2} + E} - \frac{Ab + B}{D} \right|
&= \left| \frac{C}{D b^{m-2} + E} - \frac{(Ab + B) E}{(D b^{m-2} + E) D} \right| \\
&\leq \max \left\{ \frac{C}{D b^{m-2} + E}, \frac{(Ab + B) E}{(D b^{m-2} + E) D} \right\} \\
&\leq \max \left\{ \frac{b^{m-2} - 1}{\frac{1}{2} b^{m-1} + 0}, \frac{((b - 1) b + b - 1)(b^{m-2} - 1)}{(\frac{1}{2} b^{m-1} + 0)\, \frac{1}{2} b} \right\} \\
&< \max \left\{ \frac{2}{b}, 4 \right\} \leq 4,
\end{aligned}
\]

and consequently the most significant words differ at most by 4 (with a more careful analysis one can actually do better and reach a difference of at most 2). So a small correction to the initially computed $q$ and $r$ may be necessary, and this can easily be arranged by doing $(q, r) \leftarrow (q, r) \pm (1, -y)$ at most 4 times, until $0 \leq r < y$.

The complexity of division of two numbers of radix $b$ lengths $m$ and $n$ is $O(m(m - n + 1))$.
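The overall structure of Algorithm 1.4 can be sketched in Python as follows. This is deliberately not written out in atomic word operations: Python integers hold $x$, $y$ and $r$, and each quotient digit is obtained with the built-in `//` rather than with the estimate from the leading words described above. The names `num_digits` and `divide_radix` are hypothetical, and allowing $x \leq y$ is an extension made in this sketch.

```python
def num_digits(n, b):
    """Number of radix-b digits of the non-negative integer n (at least 1)."""
    count = 1
    while n >= b:
        n //= b
        count += 1
    return count


def divide_radix(x, y, b):
    """Long division of x by y, producing the quotient digit by digit.

    Returns (q_digits, r) with q_digits least significant digit first
    (an assumption of this sketch) and 0 <= r < y.
    """
    if x < 0 or y <= 0 or b < 2:
        raise ValueError("need x >= 0, y > 0 and b >= 2")
    m, n = num_digits(x, b), num_digits(y, b)
    k = max(m - n + 1, 1)          # Step 1: at most k quotient digits
    r = x
    q = [0] * k
    for i in range(k - 1, -1, -1):  # Step 2: most significant digit first
        shifted = y * b**i          # b^i * y is y shifted by i digits
        q[i] = r // shifted         # always a single digit, 0 <= q[i] < b
        r -= q[i] * shifted
    while len(q) > 1 and q[-1] == 0:  # Step 3: strip leading zeroes
        q.pop()
    return q, r
```

For example, dividing 2181 by 43 in radix 10 produces the quotient digits of 50 and the remainder 31.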

1.4 Efficient polynomial arithmetic

We now make a few remarks on polynomial arithmetic.

1.4.1 Representation of polynomials

Let $K$ be a field, often a finite one. In this section we assume that there is some efficient way of representing field elements, and of doing the field operations.

When $K = \mathbb{Z}_p$ for a prime number $p$ that fits in one computer word, this is easy. Then the atomic operations as described above can easily be adapted to this situation: one only has to take care of reducing mod $p$ whenever necessary.

When $K = \mathbb{Z}_p$ for a larger prime, that does not fit in one word anymore, each field element will be represented by an array of words, and each field operation will become more involved. Efficient multi-precision arithmetic in $\mathbb{Z}_n$ will be treated in the next Chapter.

Anyway, such operations on field elements are now treated as "atomic operations". Thus note that an array of field elements may now have to be represented as a 2-dimensional array of computer words.

With this in mind, there is a lot of similarity between radix $b$ representation:

\[ [n]_b = [n_{m-1}, \ldots, n_0]_b \quad \text{representing} \quad n = \sum_{i=0}^{m-1} n_i b^i, \]

and representing polynomials by listing their coefficients:

\[ [f] = [f_m, \ldots, f_0] \quad \text{representing} \quad f(X) = \sum_{i=0}^{m} f_i X^i. \]

1.4.2 Polynomial arithmetic

Addition, multiplication and division with remainder can now be treated in a similar way to the corresponding operations for integers. The only significant difference is that now there are no carries anymore. In fact, this makes polynomial arithmetic easier than integer arithmetic.

Also Karatsuba multiplication goes through for polynomials.
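To make the absence of carries concrete, here is a sketch of naive polynomial multiplication over $\mathbb{Z}_p$ for a prime $p$ that fits in one word. The name `poly_mul_mod` and the convention that coefficient lists are stored lowest degree first (with the empty list standing for the zero polynomial) are assumptions of this sketch.

```python
def poly_mul_mod(f, g, p):
    """Multiply polynomials f and g with coefficients in Z_p.

    f[i] is the coefficient of X**i (lowest degree first); the zero
    polynomial is the empty list. Both conventions are choices of this
    sketch.
    """
    if not f or not g:
        return []
    h = [0] * (len(f) + len(g) - 1)
    for i, fi in enumerate(f):
        for j, gj in enumerate(g):
            # Each coefficient is reduced mod p on its own: nothing is
            # carried over to the next coefficient.
            h[i + j] = (h[i + j] + fi * gj) % p
    # Strip zero leading coefficients (possible if inputs were not normalized).
    while h and h[-1] == 0:
        h.pop()
    return h
```

Compare this with the integer case: the loop structure is the same as in the primary school method, but the carry variable has disappeared.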

Exercises

1.1. Show that for given $a \in \mathbb{Z}$, $b \in \mathbb{N}$ there exists a unique pair of integers $(q, r)$ such that $a = qb + r$ and $0 \leq r < b$.

1.2. Let $m \in \mathbb{N}$, and let $n \in \mathbb{Z}$ be such that $|n| < b^m$. Give a formal proof that there exist $n_0, n_1, \ldots, n_{m-1} \in \{0, 1, \ldots, b - 1\}$ such that $|n| = \sum_{i=0}^{m-1} n_i b^i$.

1.3. Show that the radix $b$ representation is unique when leading zeroes are not allowed.

1.4. Give an algorithm that, on input of radices $b_1, b_2$ and $[n]_{b_1}$, outputs $[n]_{b_2}$. Hint: start with $b_1 = 10$, $b_2 = 2$.

1.5. Compute the radix 2, 8 and 16 representations of 2181 and −3507.

1.6. When the radix $b$ representation of $n$ is given by $[n]_b = [n_{m-1}, n_{m-2}, \ldots, n_0]_b$, then what is for $k > 1$ the radix $b^k$ representation $[n]_{b^k}$?

1.7. Work out the formulas for the cases $r = 1, 2$ of the generalized Karatsuba multiplication method.

1.8. Describe in detail efficient algorithms for polynomial addition, subtraction, multiplication, Karatsuba multiplication, and division with remainder.

1.9. Implement all algorithms given in this Chapter, and all you did in Exercise 1.8, in your favourite computer language or computer algebra system.



Chapter 2

Euclidean and Modular Algorithms

Introduction

General references for this chapter: [BS, Chapters 4, 5], [CP, Chapter 9], [GG, Chapters 3, 4, 5], [Sh, Chapters 4, 11]. And for those preferring Dutch: [Be, Chapters 3, 6], [Ke, Chapter 11], [dW, Chapters 1, 2].

In this Chapter the following topics will be treated:

• an analysis of the well known Euclidean Algorithm,

• algorithms for efficient multi-precision modular arithmetic,

• an algorithmic view of the Chinese Remainder Theorem.

2.1 The Euclidean Algorithm

2.1.1 The Greatest Common Divisor

The greatest common divisor of two integers $a, b$ is the largest integer $d$ such that $d \mid a$ and $d \mid b$. Notation: $\gcd(a, b)$ (or simply $(a, b)$). Note that $\gcd(a, 0) = |a|$.

When $\gcd(a, b) = 1$ we say that $a$ and $b$ are coprime or relatively prime.

Let $L(a, b) = \{xa + yb : x, y \in \mathbb{Z}\}$ be the set of $\mathbb{Z}$-linear combinations of $a$ and $b$. The set $L(a, b)$ is closely linked to $\gcd(a, b)$, as is shown in the following lemma.

Lemma 2.1 $L(a, b)$ is the set of multiples of $\gcd(a, b)$.

Proof. Clearly, any common divisor of $a$ and $b$ is a divisor of any $xa + yb$, thus of each element of $L(a, b)$. So $L(a, b)$ contains only multiples of $\gcd(a, b)$. But we need to prove more: that it contains all multiples.

Let $u$ be the smallest positive element of $L(a, b)$. Say that $u = xa + yb$. We now perform division with remainder on $a$ and $u$. There exist $q, r$ such that $a = qu + r$ and $0 \leq r < u$. As $r = a - qu = (1 - qx)a + (-qy)b$ we have $r \in L(a, b)$. But, by the definition of $u$, there are no elements in $L(a, b)$ between 0 and $u$, so by $0 \leq r < u$ we must have $r = 0$. This implies $a = qu$, i.e. $u \mid a$. Similarly we prove that $u \mid b$, and it follows that $u \mid \gcd(a, b)$. But we already saw that $\gcd(a, b) \mid u$, so $\gcd(a, b) = u \in L(a, b)$, and hence also all multiples of $\gcd(a, b)$ are in $L(a, b)$. □

An important result is the following, expressing the gcd as a $\mathbb{Z}$-linear combination of its arguments.

Theorem 2.2 (gcd and linear combination) For any $a, b \in \mathbb{Z}$ there exist $x, y \in \mathbb{Z}$ such that $\gcd(a, b) = xa + yb$.

Proof. Immediate from Lemma 2.1. □

2.1.2 The basic Euclidean Algorithm

The treatment of the gcd in Section 2.1.1 was theoretical: we proved results about existence and properties of certain numbers, but that does not necessarily give a method to compute the gcd. The prime factorizations of $a, b$ can be used to compute $\gcd(a, b)$, and this may be useful for small numbers, but finding the prime factorization of large numbers is a notoriously difficult problem in itself.

There is a very efficient way to compute $\gcd(a, b)$ without knowing the factorizations: the Euclidean Algorithm. It can also be used to check for coprimality.

Algorithm 2.1 (Euclidean Algorithm)

Input: $a, b \in \mathbb{Z}$
Output: $d \in \mathbb{Z}$ such that $d = \gcd(a, b)$
Step 1: $a' \leftarrow |a|$, $b' \leftarrow |b|$
Step 2: while $b' > 0$ do
        $r \leftarrow a' - \left\lfloor \dfrac{a'}{b'} \right\rfloor b'$
        $a' \leftarrow b'$, $b' \leftarrow r$
Step 3: $d \leftarrow a'$, output $d$

Theorem 2.3 (Euclidean Algorithm) The Euclidean Algorithm is correct, i.e. on input $a, b \in \mathbb{Z}$ it outputs $\gcd(a, b)$ in a finite number of steps.

Proof. In step 1 we have $\gcd(a, b) = \gcd(a', b')$. Any common divisor of $a'$ and $b'$ is also a common divisor of $a' - qb'$ and $b'$, for any $q \in \mathbb{Z}$, and vice versa. So when in the while-loop of Step 2 the number $r$ is computed, the set of common divisors of $a'$ and $b'$ is the same as the set of common divisors of $b'$ and $r$. So over the entire while-loop the set of common divisors of $a'$ and $b'$ is invariant, hence so is $\gcd(a', b')$. Further, in Step 2 the value of $b'$ decreases strictly but remains nonnegative, as $0 \leq r < b'$. Eventually $b'$ must become 0. At that point Step 2 ends, and $\gcd(a, b) = \gcd(a', b') = \gcd(a', 0) = a'$. □
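Algorithm 2.1 translates almost literally into Python. The sketch below uses the hypothetical name `gcd_euclid` and relies on Python's `%` operator to compute the remainder $r = a' - \lfloor a'/b' \rfloor b'$ for non-negative operands.

```python
def gcd_euclid(a, b):
    """Greatest common divisor of the integers a and b (Algorithm 2.1)."""
    a, b = abs(a), abs(b)          # Step 1
    while b > 0:                   # Step 2
        # gcd(a, b) is invariant under (a, b) -> (b, a mod b),
        # and b strictly decreases, so the loop terminates.
        a, b = b, a % b
    return a                       # Step 3
```

Two numbers are coprime exactly when this function returns 1, so the same few lines also serve as a coprimality check.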

2.1.3 The Extended Euclidean Algorithm

The Euclidean Algorithm can be extended to output also the $x$ and $y$ of Theorem 2.2. This is commonly called the Extended Euclidean Algorithm.

Algorithm 2.2 (Extended Euclidean Algorithm)

Input: $a, b \in \mathbb{Z}$
Output: $d, x, y \in \mathbb{Z}$ such that $d = \gcd(a, b) = xa + yb$
Step 1: $a' \leftarrow |a|$, $b' \leftarrow |b|$
    $x_1 \leftarrow 1$, $x_2 \leftarrow 0$
    $y_1 \leftarrow 0$, $y_2 \leftarrow 1$
Step 2: while $b' > 0$ do
        $q \leftarrow \left\lfloor \dfrac{a'}{b'} \right\rfloor$, $r \leftarrow a' - qb'$
        $a' \leftarrow b'$, $b' \leftarrow r$
        $x_3 \leftarrow x_1 - qx_2$, $y_3 \leftarrow y_1 - qy_2$
        $x_1 \leftarrow x_2$, $y_1 \leftarrow y_2$
        $x_2 \leftarrow x_3$, $y_2 \leftarrow y_3$
Step 3: $d \leftarrow a'$
    if $a \geq 0$ then $x \leftarrow x_1$ else $x \leftarrow -x_1$
    if $b \geq 0$ then $y \leftarrow y_1$ else $y \leftarrow -y_1$
    output $d, x, y$

Theorem 2.4 (Extended Euclidean Algorithm) The Extended Euclidean Algorithm is correct, i.e. on input $a, b \in \mathbb{Z}$ it outputs $\gcd(a, b), x, y$ such that $\gcd(a, b) = xa + yb$ in a finite number of steps.

Proof. After Theorem 2.3 we only have to prove that $\gcd(a, b) = xa + yb$. This follows at once from the invariance in the while-loop of $a' = x_1 |a| + y_1 |b|$ and $b' = x_2 |a| + y_2 |b|$. □
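A Python sketch of Algorithm 2.2 is given below. The function name `extended_gcd` is hypothetical; the variables mirror those of the algorithm, with tuple assignment replacing the temporaries $x_3, y_3$.

```python
def extended_gcd(a, b):
    """Return (d, x, y) with d = gcd(a, b) = x*a + y*b (Algorithm 2.2)."""
    a1, b1 = abs(a), abs(b)        # a', b' of Step 1
    x1, x2 = 1, 0
    y1, y2 = 0, 1
    while b1 > 0:                  # Step 2
        q, r = divmod(a1, b1)
        a1, b1 = b1, r
        # Invariants: a1 == x1*|a| + y1*|b| and b1 == x2*|a| + y2*|b|.
        x1, x2 = x2, x1 - q * x2
        y1, y2 = y2, y1 - q * y2
    # Step 3: undo the absolute values taken in Step 1.
    x = x1 if a >= 0 else -x1
    y = y1 if b >= 0 else -y1
    return a1, x, y
```

A typical use is computing modular inverses: when $\gcd(a, n) = 1$, the returned $x$ satisfies $xa \equiv 1 \pmod{n}$.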

2.1.4 A binary variant

The Euclidean Algorithm dates back to antiquity: Euclid lived around the year 300 BC. It is such an elegant algorithm that it should come as a surprise that it can be improved, as was only noticed recently (second half of the 20th century).

By far the most expensive operation in the (Extended) Euclidean Algorithm is the computation of $q$ (long division). To avoid this computation, the Binary Euclidean Algorithm has been developed. It replaces division of $a$ by $b$ with repeated subtraction. This is usually faster, though the number of iterations may grow.

The underlying ideas are:

• if both $a$ and $b$ are even, then divide out a common factor 2; this halves the gcd (so the factor 2 has to be multiplied back in at the end), and $\max\{a, b\}$ decreases;

• if exactly one of $a$ and $b$ is even, then divide the even number by 2; this does not change the gcd, and does not increase $\max\{a, b\}$;

• if both $a$ and $b$ are odd, then $a - b$ is even, so replace the larger one of $a$ and $b$ by $|a - b|$; this does not change the gcd, and $\max\{a, b\}$ decreases.

Algorithm 2.3 (Binary Euclidean Algorithm)

Input: $a, b \in \mathbb{Z}$
Output: $d \in \mathbb{Z}$ such that $d = \gcd(a, b)$
Step 1: $a' \leftarrow |a|$, $b' \leftarrow |b|$
    $d \leftarrow 1$
    while $a'$ and $b'$ are both even do $a' \leftarrow \frac{1}{2} a'$, $b' \leftarrow \frac{1}{2} b'$, $d \leftarrow 2d$
    while $a'$ is even do $a' \leftarrow \frac{1}{2} a'$
    while $b'$ is even do $b' \leftarrow \frac{1}{2} b'$
    if $a' < b'$ then $(a', b') \leftarrow (b', a')$
Step 2: while $b' > 0$ do
        $a' \leftarrow a' - b'$
        while $a' > 0$ and $a'$ is even do $a' \leftarrow \frac{1}{2} a'$
        if $a' < b'$ then $(a', b') \leftarrow (b', a')$
Step 3: $d \leftarrow d a'$, output $d$

We mention the following result without proof.

Theorem 2.5 (Binary Euclidean Algorithm) The Binary Euclidean Algorithm is correct, i.e. on input $a, b \in \mathbb{Z}$ it returns $\gcd(a, b)$ in a finite number of steps. The number of times the while-loop of Step 2 is executed is $O(\log \max\{|a|, |b|\})$.

The bit complexity is now easily seen to be $O((\log \max\{|a|, |b|\})^2)$, as in each while-loop only subtractions and shifts take place.
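The Binary Euclidean Algorithm can be sketched in Python as follows. Two things are additions of this sketch and not part of Algorithm 2.3: the name `binary_gcd`, and the explicit treatment of a zero input at the start (halving 0 never makes it odd, so the halving loops of Step 1 would not terminate for it). Divisions by 2 and parity tests are written with shifts and bit masks to stress that no long division is needed.

```python
def binary_gcd(a, b):
    """Greatest common divisor using only subtraction, shifts and parity tests."""
    a, b = abs(a), abs(b)
    if a == 0 or b == 0:
        return a + b               # gcd(a, 0) = |a|; guard added in this sketch
    d = 1
    # Step 1: divide out common factors 2, then make both numbers odd.
    while a & 1 == 0 and b & 1 == 0:
        a, b, d = a >> 1, b >> 1, 2 * d
    while a & 1 == 0:
        a >>= 1
    while b & 1 == 0:
        b >>= 1
    if a < b:
        a, b = b, a
    # Step 2: both a and b are odd here, so a - b is even.
    while b > 0:
        a -= b
        while a > 0 and a & 1 == 0:
            a >>= 1
        if a < b:
            a, b = b, a
    # Step 3: restore the common power of 2.
    return d * a
```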

2.1.5 Complexity

2.1.5.1 Time

We now estimate the number of iterations in the (Extended) Euclidean Algorithm in more detail.

The first time the loop in Step 2 is executed, the value of q may be zero, namely when a′ < b′. Inthat case Step 2 simply swaps a′ and b′. But from then on always b′ < a′, hence q ≥ 1. Clearlythe speed at which b′ decreases is minimal if q is minimal, i.e. q = 1. This happens all the timeexactly when a, b are consecutive Fibonacci numbers.

The Fibonacci numbers Fn (for n = 0, 1, 2, . . .) are defined recursively as follows:

F0 = 0, F1 = 1, Fn+1 = Fn + Fn−1 for n ∈ N.

The sequence of Fibonacci numbers starts as

0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, . . . .

Because $F_{n-1} < F_n$ when n ≥ 3, we find that $\frac{F_{n+1}}{F_n} = 1 + \frac{F_{n-1}}{F_n} < 2$, hence the integral part of $\frac{F_{n+1}}{F_n}$ is equal to 1, and the remainder is $F_{n-1}$.

It follows that when we apply the (Extended) Euclidean Algorithm to $a = F_{n+1}$ and $b = F_n$, the values of a′, b′ and q in the beginning of the while-loop are given by the sequence (a′, b′, q) = $(F_{n+1}, F_n, 1)$, $(F_n, F_{n-1}, 1)$, $(F_{n-1}, F_{n-2}, 1)$, . . ., (3, 2, 1), (2, 1, 2), and finally (a′, b′) = (1, 0). The while-loop is executed n − 1 times (once for each pair $(F_{k+1}, F_k)$ with k = n, n − 1, . . . , 2), and clearly this is the worst case.

We now estimate $F_n$. Let $\tau = \frac{1}{2}(1 + \sqrt{5}) = 1.618\ldots$ (the well known golden section number), and $\bar{\tau} = \frac{1}{2}(1 - \sqrt{5}) = -0.618\ldots$.



Lemma 2.6 $F_n = \frac{1}{\sqrt{5}}\left(\tau^n - \bar{\tau}^n\right)$ for all n = 0, 1, 2, . . ..

Proof. Use mathematical induction. □

Corollary 2.7

(a) $F_n$ is the integer nearest to $\frac{1}{\sqrt{5}}\tau^n$ for all n = 0, 1, 2, . . ..

(b) $n - 1 = \lceil \log_\tau F_n \rceil$ for n = 3, 4, . . ..

Proof. Trivial. □
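A quick numerical sanity check of Corollary 2.7 can be done as follows. This is a sketch in floating point arithmetic, so it is only trustworthy for moderate n; the helper names `fib` and `nearest_binet` are our own.

```python
import math

SQRT5 = math.sqrt(5)
TAU = (1 + SQRT5) / 2  # golden section number, 1.618...

def fib(n: int) -> int:
    """F_n computed from the recursion F_0 = 0, F_1 = 1."""
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def nearest_binet(n: int) -> int:
    """The integer nearest to tau^n / sqrt(5), as in Corollary 2.7(a)."""
    return round(TAU ** n / SQRT5)

# Part (a) for small n, and part (b) for n = 3, 4, ...
for n in range(0, 40):
    assert fib(n) == nearest_binet(n)
for n in range(3, 40):
    assert n - 1 == math.ceil(math.log(fib(n), TAU))
```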

We now have the following result, showing that the Euclidean Algorithm is indeed efficient: in the worst case the number of times the while-loop is executed is logarithmic in the input variables.

In practice this means that we can compute gcd’s of pretty large integers very quickly: with e.g. 1000 digit integers the number of steps in the Euclidean Algorithm is at most 4800, which on a modern personal computer should be a matter of milliseconds.

Theorem 2.8 (Euclidean Algorithm, complexity) When the (Extended) Euclidean Algorithm is applied to a, b with b ≥ 2, then the number of times the while-loop is executed is at most $\lceil \log_\tau \min\{|a|, |b|\} \rceil$.

Proof. Without loss of generality we may assume that a > b > 0. Let n be the integer such that $F_n \le b < F_{n+1}$. Note that the result is easily verified for all cases with b = 2. Hence b ≥ 3, and n ≥ 4. We show by induction that the number of iterations is at most n − 1. Then we are done, as by Corollary 2.7 $n - 1 = \lceil \log_\tau F_n \rceil \le \lceil \log_\tau b \rceil$.

To get the induction started, note that the result is easily verified for all cases with b = 3, 4. So we now assume b ≥ 5, hence n ≥ 5.

In the first iteration of the while-loop we arrive at a′ = b and b′ = r = a − qb.

If b′ ≤ 2 then the total number of iterations is at most 1 + 2 = 3 ≤ n − 1.

If $3 \le b' < F_n$ then mathematical induction tells us that the total number of iterations is at most 1 + (n − 2) = n − 1.

If $b' \ge F_n$ then we enter a second iteration. We denote the q, r of this second iteration by $q^*$, $r^*$. Note that $\frac{a'}{b'} = \frac{b}{b'} < \frac{F_{n+1}}{F_n} < 2$, hence in this second iteration we have $q^* = 1$, and $r^* = a' - b' = b - r < F_{n+1} - F_n = F_{n-1}$.

If $r^* \le 2$ then the total number of iterations is at most 2 + 2 = 4 ≤ n − 1.

If $r^* \ge 3$ then mathematical induction tells us that the total number of iterations is at most 2 + (n − 3) = n − 1. □
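The bound of Theorem 2.8 can be checked experimentally. The sketch below counts the passes through the while-loop for the plain Euclidean Algorithm and compares them with the bound; the names `euclid_iterations` and `lame_bound` are our own, and we restrict to a > b ≥ 2 as in the proof.

```python
import math

TAU = (1 + math.sqrt(5)) / 2

def fib(n: int) -> int:
    a, b = 0, 1
    for _ in range(n):
        a, b = b, a + b
    return a

def euclid_iterations(a: int, b: int) -> int:
    """Number of times the while-loop of the Euclidean Algorithm runs."""
    count = 0
    while b > 0:
        a, b = b, a % b
        count += 1
    return count

def lame_bound(b: int) -> int:
    """ceil(log_tau b), the bound of Theorem 2.8 when a > b >= 2."""
    return math.ceil(math.log(b, TAU))

# Consecutive Fibonacci numbers attain the worst case.
for n in range(3, 30):
    assert euclid_iterations(fib(n + 1), fib(n)) == n - 1 == lame_bound(fib(n))
```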

However, note that each step in the while-loop requires a (long) division. This implies that the so called bit complexity of the Euclidean Algorithm for n bit numbers is $O(n^3)$: the while-loop is executed O(n) times, and each time a long division of quadratic complexity is done.

Looking a bit more carefully, the bit complexity can actually be estimated as $O(n^2)$. The reason is that the divisions are performed on shorter and shorter numbers, so that we get a sum looking

like

n−1∑k=0

n2(n−k)/n ≈ n2.

There exist almost linear ($O(n \log^2 n \log\log n)$) algorithms for computing gcds.



2.1.5.2 Memory

Also the memory requirements of the (Extended) Euclidean Algorithm are minimal. The Euclidean Algorithm requires only storage for a, b and r, i.e. 3 numbers only. The Extended Euclidean Algorithm requires additional storage for q, $x_1$, $x_2$, $y_1$ and $y_2$, as by an efficient ordering of the computations, the temporary values $x_3$ and $y_3$ can be stored in the memory space of r. This is in total 8 numbers. Even this is not yet optimal, see Exercise 2.4.

2.2 Efficient Modular Arithmetic

We now proceed with describing the basics of efficient multi-precision modular arithmetic, i.e. arithmetic in $\mathbb{Z}_m$ for some integer m ≥ 2, not necessarily prime. In particular we deal with multi-precision numbers, including the modulus m.

2.2.1 Reduction

Reduction modulo m, also called modular reduction, is the process of, given an integer x and a modulus m, finding the unique number congruent to x modulo m in the complete residue system {0, 1, . . . , m − 1}. This unique number is often written as x (mod m). When doing more complicated computations modulo m, modular reduction will be performed time and again. The main reason to do it is to keep the numbers small: x (mod m) is never larger than the modulus, and is also unsigned. In a complicated computation reduction is often done interleaved with the computation, i.e. immediately after, or even interleaved with, each addition / subtraction / multiplication.

Reduction can often be done by common sense methods. For example, when you are certain that the number to be reduced is between m and 2m − 1, then simply subtracting m suffices. And when you are certain that it is between −m and −1, simply adding m suffices.

The general method is division with remainder. So a naive algorithm is the following:

Algorithm 2.4 (Modular Reduction, naive method)

Input: x, m ∈ Z with m ≥ 2
Output: y ∈ {0, 1, . . . , m − 1} such that y ≡ x (mod m)
Step 1: q ← ⌊x/m⌋, y ← x − qm, output y

But note that this time we’re not at all interested in the quotient, only in the remainder. So an algorithm can easily be designed that does only subtractions (and shifts, i.e. multiplications by powers of the radix).

Algorithm 2.5 (Modular Reduction with radix b)

Input: x, m, b ∈ Z with m ≥ 2, b ≥ 2
Output: y ∈ {0, 1, . . . , m − 1} such that y ≡ x (mod m)
Step 1: x′ ← |x|
        k ← word length of x, n ← word length of m
Step 2: for i = k − n down to 0 do
            while x′ ≥ $m b^i$ do x′ ← x′ − $m b^i$
Step 3: if x ≥ 0 or x′ = 0 then y ← x′ else y ← m − x′
        output y
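As an illustration, Algorithm 2.5 might be written in Python as below. The names `word_length` and `reduce_radix` are our own, and Python's built-in big integers stand in for a real multi-precision representation, so the shifts appear as multiplications by powers of b.

```python
def word_length(x: int, b: int) -> int:
    """Number of radix-b digits of x >= 0 (0 is given length 1)."""
    length = 1
    while x >= b:
        x //= b
        length += 1
    return length

def reduce_radix(x: int, m: int, b: int = 2) -> int:
    """x mod m using only comparisons, subtractions and shifts by b^i."""
    xp = abs(x)                                   # Step 1
    k, n = word_length(xp, b), word_length(m, b)
    for i in range(k - n, -1, -1):                # Step 2
        shifted = m * b ** i                      # m shifted by i words
        while xp >= shifted:
            xp -= shifted
    # Step 3: correct the sign.
    return xp if x >= 0 or xp == 0 else m - xp
```

With b = 2 the inner while-loop runs at most once per value of i, which is why the algorithm is attractive for a small radix.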

This algorithm is particularly efficient when the radix is small (such as b = 2). Of course intermediate versions of the algorithm, doing divisions on one word numbers only, are also possible. Then one should replace the while-loop in step 2 by one subtraction of $q m b^i$ for the proper value of q, found by division of one- or at most two-words numbers.

Another idea to avoid the long division lies behind Barrett’s modular reduction method. It makes an easy to compute, educated guess of the quotient $q = \lfloor x/m \rfloor$. It requires a precomputation depending only on the modulus, requires b > 3 (which is realistic as b usually is 2 to the power the word size) and requires the number to be reduced to be less than the square of the modulus (which is realistic as the most difficult modular reduction to be performed in practice is that after multiplying two reduced numbers). It is advantageous if many modular reductions have to be done with the same modulus, as happens often in cryptology.

Algorithm 2.6 (Barrett Modular Reduction)

Input: x, m, b ∈ Z with m ≥ 2, b ≥ 3, m ≤ x < $m^2$,
       n ← radix b length of m,
       [precomputed] µ ← $\lfloor b^{2n}/m \rfloor$
Output: y ∈ {0, 1, . . . , m − 1} such that y ≡ x (mod m)
Step 1: $q_0$ ← $\lfloor x / b^{n-1} \rfloor$, q ← $\lfloor \mu q_0 / b^{n+1} \rfloor$
Step 2: $r_1$ ← x (mod $b^{n+1}$), $r_2$ ← qm (mod $b^{n+1}$)
Step 3: if $r_1 \ge r_2$ then y ← $r_1 - r_2$ else y ← $r_1 - r_2 + b^{n+1}$
Step 4: while y ≥ m do y ← y − m, output y
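A minimal Python sketch of Algorithm 2.6 is given below. The split into `barrett_setup` (the precomputation) and `barrett_reduce` is our own choice of presentation; in a real multi-precision implementation the divisions and reductions by powers of b would be word shifts and truncations rather than `//` and `%`.

```python
def barrett_setup(m: int, b: int):
    """Precomputation: n = radix-b length of m, mu = floor(b^(2n) / m)."""
    n = 1
    while b ** n <= m:
        n += 1
    mu = b ** (2 * n) // m
    return n, mu

def barrett_reduce(x: int, m: int, b: int, n: int, mu: int) -> int:
    """x mod m for m <= x < m^2, without dividing by m."""
    # Step 1: estimate the quotient; both divisions are shifts in radix b.
    q0 = x // b ** (n - 1)
    q = (mu * q0) // b ** (n + 1)
    # Step 2: only the lowest n + 1 words are needed.
    r1 = x % b ** (n + 1)
    r2 = (q * m) % b ** (n + 1)
    # Step 3
    y = r1 - r2 if r1 >= r2 else r1 - r2 + b ** (n + 1)
    # Step 4: the estimate q is slightly too small at worst.
    while y >= m:
        y -= m
    return y
```

A possible use, with the data of Exercise 2.12: `n, mu = barrett_setup(1913, 10)` followed by `barrett_reduce(3507449, 1913, 10, n, mu)`.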

The precomputation has to be done only once, of course.

Note that the divisions in step 1 all are only simple shifts.

Clearly the complexity of the above algorithms, i.e. the number of operations on one word, is O((log x)(log m)). In practice one applies the algorithm so often that the inputs x are at most of the size of $m^2$. Then the algorithm runs in complexity $O((\log m)^2)$. In practice the algorithm is reasonably efficient for moduli of many thousands of bits.

2.2.2 Addition, subtraction and multiplication

With modular arithmetic algorithms we will always assume that the input is in reduced form. Modular addition and subtraction are straightforward.

Algorithm 2.7 (Modular Addition)

Input: m ∈ Z with m ≥ 2, x, y ∈ {0, 1, 2, . . . , m − 1}
Output: z ∈ {0, 1, . . . , m − 1} such that z ≡ x + y (mod m)
Step 1: z′ ← x + y
Step 2: if z′ < m then z ← z′ else z ← z′ − m
        output z

Algorithm 2.8 (Modular Subtraction)

Input: m ∈ Z with m ≥ 2, x, y ∈ {0, 1, 2, . . . , m − 1}
Output: z ∈ {0, 1, . . . , m − 1} such that z ≡ x − y (mod m)
Step 1: z′ ← x − y
Step 2: if z′ ≥ 0 then z ← z′ else z ← z′ + m
        output z

Modular multiplication can be done in a similar way, using any of the multiplication algorithms presented in Section 1.3.



Algorithm 2.9 (Modular Multiplication, naive method)

Input: m ∈ Z with m ≥ 2, x, y ∈ {0, 1, 2, . . . , m − 1}
Output: z ∈ {0, 1, . . . , m − 1} such that z ≡ xy (mod m)
Step 1: compute z′ ← xy by any convenient multiplication method
Step 2: compute z ← z′ (mod m) by any convenient modular reduction method
        output z

But here more variants are possible, interleaving the modular reduction in various ways into the multiplication algorithm. In the following algorithm modular reduction has to be done only on numbers that are at most one word longer than the modulus. This avoids both long divisions and numbers of word length larger than 1 + the word length of the modulus.

Algorithm 2.10 (Modular Multiplication with interleaved modular reduction)

Input: b ≥ 2, m ∈ N, m ≥ 2, x, y ∈ {0, 1, 2, . . . , m − 1}
Output: z ∈ {0, 1, . . . , m − 1} such that z ≡ xy (mod m)
Step 1: n ← the smallest even number that is at least
        the length of the radix b representation of m
Step 2: $x_{lo}$ ← x (mod $b^{n/2}$), $x_{hi}$ ← $(x - x_{lo})\, b^{-n/2}$,
        $y_{lo}$ ← y (mod $b^{n/2}$), $y_{hi}$ ← $(y - y_{lo})\, b^{-n/2}$
Step 3: compute by Karatsuba’s method $z_0$ ← $x_{lo} y_{lo}$ (mod m),
        $z_1$ ← $x_{hi} y_{lo} + x_{lo} y_{hi}$ (mod m), $z_2$ ← $x_{hi} y_{hi}$ (mod m)
Step 4: z ← $z_2$, for i from 1 to n/2 do z ← bz (mod m),
        z ← z + $z_1$ (mod m), for i from 1 to n/2 do z ← bz (mod m),
        z ← z + $z_0$ (mod m), output z
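The structure of Algorithm 2.10 can be made concrete with the following Python sketch. It is a simplification: the three partial products in Step 3 are computed with Python's ordinary multiplication instead of Karatsuba's method, and the function name `modmul_interleaved` is our own.

```python
def modmul_interleaved(x: int, y: int, m: int, b: int = 10) -> int:
    """x*y mod m for reduced x, y, shifting by one word at a time."""
    # Step 1: smallest even n that is at least the radix-b length of m.
    n = 1
    while b ** n <= m:
        n += 1
    n += n % 2
    half = b ** (n // 2)
    # Step 2: split both factors into a low and a high half.
    x_lo, x_hi = x % half, x // half
    y_lo, y_hi = y % half, y // half
    # Step 3: partial products (plain multiplication in this sketch).
    z0 = (x_lo * y_lo) % m
    z1 = (x_hi * y_lo + x_lo * y_hi) % m
    z2 = (x_hi * y_hi) % m
    # Step 4: z = z2 * b^n + z1 * b^(n/2) + z0, reduced after every shift.
    z = z2
    for _ in range(n // 2):
        z = (b * z) % m
    z = (z + z1) % m
    for _ in range(n // 2):
        z = (b * z) % m
    return (z + z0) % m
```

Each reduction in Step 4 acts on a number below bm, i.e. at most one word longer than the modulus.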

2.2.3 Modular inversion and division

Modular inversion, i.e. given a (mod m) such that a and m are coprime, computing the unique number b (mod m) with ab ≡ 1 (mod m), can be done by the Extended Euclidean Algorithm. Namely, this algorithm finds x, y ∈ Z such that xa + ym = 1, so b ≡ x (mod m) is the answer. Indeed, we can slightly simplify the algorithm in this case, as shown below. Note that also the Extended Binary Euclidean Algorithm (see Exercise 2.6) can be adapted to produce the modular inverse efficiently.

The bit complexity of this algorithm is that of the Euclidean Algorithm, i.e. $O((\log m)^2)$.

Algorithm 2.11 (Modular Inversion)

Input: m ∈ Z with m ≥ 2, a ∈ {0, 1, . . . , m − 1}
Output: $a^{-1}$ (mod m) ∈ {0, 1, . . . , m − 1} if it exists,
        an error message otherwise
Step 1: a′ ← a, m′ ← m
        $x_1$ ← 1, $x_2$ ← 0
Step 2: while m′ > 0 do
            q ← ⌊a′/m′⌋, r ← a′ − qm′
            a′ ← m′, m′ ← r
            $x_3$ ← $x_1 - q x_2$
            $x_1$ ← $x_2$, $x_2$ ← $x_3$
Step 3: if a′ = 1 then $a^{-1}$ ← $x_1$ (mod m), output $a^{-1}$
        else output "inverse does not exist"

Note that $x_1$ may be negative at the end of Step 2, so a final reduction modulo m is needed to land in {0, 1, . . . , m − 1}.
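In Python, Algorithm 2.11 might look as follows. The function name `mod_inverse` and the use of an exception as the error message are our own choices; the final `% m` brings a possibly negative coefficient into the range {0, 1, . . . , m − 1}.

```python
def mod_inverse(a: int, m: int) -> int:
    """Inverse of a modulo m via the (half) Extended Euclidean Algorithm."""
    a1, m1 = a, m          # Step 1
    x1, x2 = 1, 0
    while m1 > 0:          # Step 2: only the coefficient of a is tracked
        q, r = divmod(a1, m1)
        a1, m1 = m1, r
        x1, x2 = x2, x1 - q * x2
    if a1 != 1:            # Step 3: gcd(a, m) != 1
        raise ValueError("inverse does not exist")
    return x1 % m
```

For example, `mod_inverse(3, 11)` gives 4, the inverse used in the worked example of Section 2.3.1.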

Modular division, i.e. given a, b (mod m) such that a and m are coprime, computing the unique number c (mod m) with ac ≡ b (mod m), can be done by modular inversion followed by a multiplication. However, notice that modular division sometimes is possible also when a and m are not coprime, see Exercise 2.13.

2.2.4 Exponentiation

In Section 1.3 we did not treat exponentiation for integers, as this is usually not very practical. Indeed, computing $a^b$ for large a, b will in general be infeasible for b that is larger than 1000 or so. For example, when the numbers are only 10 decimal digits long, $a^b$ can have more than $10^{10}$ decimal digits, and that’s way too much to be practical.

However, when we are doing modular arithmetic, the situation is much better. Indeed, exponentiation methods exist that keep all intermediate results reasonably small, namely of the size of the modulus. And these methods are pretty efficient too, so that modular exponentiations of numbers of 1000 digits can be done in practice. Cryptographers make use of this very much.

Assume we have a modulus m, a number x (mod m) and an exponent a (may all be integers with thousands of bits), and we want to compute $x^a$ (mod m).

The naive method, of multiplying x by itself a − 1 times, and reducing after each multiplication, does keep the intermediate results small, but takes a prohibitively long time (complexity O(a), which is exponential in the size of a, where you actually would like a method that is only polynomial in the size of a).

2.2.4.1 Square and multiply

More efficient ways of performing modular exponentiation use repeated square and multiply. Note that repeated squaring of x gives $x^2, x^4, x^8, x^{16}, x^{32}, \ldots$, and exponents in between two powers of 2 can be found as sums of powers of 2, e.g. 22 = 16 + 4 + 2. This means that $x^{22} = x^{16+4+2} = x^{16} x^4 x^2$ can be computed with only 4 squarings and 2 multiplications. Note that the binary expansion of the exponent plays an important role here.

We now give several variants. Always we assume that the base of the exponentiation is reduced modulo m. In theory we can also assume that the exponent is reduced modulo φ(m) (the Euler Totient function), but that is not necessary for describing our algorithms, and moreover this reduction is not always possible in practice, because often the exact value of φ(m) is not known.

We start with the binary variants, where we use the binary expansion of the exponent. We can scan the binary expansion of the exponent from right to left or from left to right.

Algorithm 2.12 (Modular Exponentiation, binary right to left method)

Input: m, a ∈ N with m ≥ 2, x ∈ {0, 1, . . . , m − 1}
Output: z ∈ {0, 1, . . . , m − 1} such that z ≡ $x^a$ (mod m)
Step 1: z ← 1, s ← x, a′ ← a
Step 2: while a′ > 0 do
            if a′ is odd then z ← sz (mod m)
            a′ ← ⌊a′/2⌋, if a′ > 0 then s ← $s^2$ (mod m)
Step 3: output z

Scanning from left to right is illustrated by noting that 22 = ((2× 2 + 1)× 2 + 1)× 2.



Algorithm 2.13 (Modular Exponentiation, binary left to right method)

Input: m, a ∈ N with m ≥ 2, x ∈ {0, 1, . . . , m − 1},
       with the binary expansion $[a]_2 = [a_{n-1} \ldots a_0]_2$ of a
Output: z ∈ {0, 1, . . . , m − 1} such that z ≡ $x^a$ (mod m)
Step 1: z ← 1
Step 2: for i from n − 1 down to 0 do
            z ← $z^2$ (mod m)
            if $a_i = 1$ then z ← xz (mod m)
Step 3: output z
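Both binary methods are short enough to write out in full. The following Python sketch mirrors Algorithms 2.12 and 2.13; the function names are our own, and the built-in `%` stands in for whichever modular reduction method is used.

```python
def modexp_right_to_left(x: int, a: int, m: int) -> int:
    """Algorithm 2.12: scan the bits of a from least to most significant."""
    z, s = 1, x
    while a > 0:
        if a % 2 == 1:          # current bit is 1: multiply the square in
            z = (s * z) % m
        a //= 2
        if a > 0:               # skip the final, unused squaring
            s = (s * s) % m
    return z

def modexp_left_to_right(x: int, a: int, m: int) -> int:
    """Algorithm 2.13: scan the bits of a from most to least significant."""
    z = 1
    for bit in bin(a)[2:]:      # binary expansion a_{n-1} ... a_0
        z = (z * z) % m
        if bit == "1":
            z = (x * z) % m
    return z
```

Both functions can be compared against Python's built-in `pow(x, a, m)`.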

The complexity of these algorithms is not hard to estimate. The number of squarings and the number of multiplications are each at most $\left\lfloor \frac{\log a}{\log 2} \right\rfloor + 1$, i.e. the number of bits of a, and each multiplication / squaring uses numbers less than the modulus. When the exponent a is reduced, and a modular multiplication algorithm with complexity $O((\log m)^2)$ is used (as will typically be the case), then the running time of modular exponentiation is $O((\log m)^3)$.

2.2.4.2 Montgomery

Modular multiplication as treated above is essentially not faster than first doing a multiplication, and then doing a full modular reduction by division. Montgomery reduction is a technique that avoids the costly division step, at the cost of pre- and post-computations. This is especially effective when a complicated computation has to be done involving a number of modular multiplications on intermediate numbers, all with the same modulus. The first application that comes to mind is in modular exponentiation.

Let m ∈ N, m ≥ 2 be a modulus. We now choose a fixed number R with R > m and gcd(R, m) = 1, and we assume that we have to compute only with numbers x ∈ Z with 0 ≤ x < Rm. In practice we take $R = b^n$ where b is the radix (word size, so usually a suitable power of 2), and n is taken such that $b^{n-1} \le m < b^n$, i.e. n is essentially the word length of m. Note that with an odd modulus and $R = b^n$ a power of 2, the condition gcd(R, m) = 1 is automatic.

We now introduce the Montgomery representation $M_R(x)$ of x, and its inverse operation Montgomery reduction $M_R^{-1}(x)$ of x, both (mod m) and with respect to R, as follows:

Montgomery representation: $M_R(x) \equiv xR \pmod{m}$,

Montgomery reduction: $M_R^{-1}(x) \equiv xR^{-1} \pmod{m}$.

The main idea of Montgomery is to work with numbers in Montgomery representation. As precomputation $M_R(x)$ has to be computed. Furthermore a method has to be found to do efficient multiplication of numbers in Montgomery representation. Finally as postcomputation the Montgomery representation has to be undone by Montgomery reduction.

Assume that we have a procedure that for given x, y (mod m) efficiently performs Montgomery multiplication, i.e. computes the Montgomery product of x and y, that is the Montgomery reduction of xy, that is

$\mathrm{mul}_R(x, y) \equiv M_R^{-1}(xy) \equiv xyR^{-1} \pmod{m}$.

Applied to the Montgomery representations of x, y, we thus get

$\mathrm{mul}_R(M_R(x), M_R(y)) \equiv \mathrm{mul}_R(xR, yR) \equiv (xR)(yR)R^{-1} \equiv xyR \equiv M_R(xy) \pmod{m}$.

So Montgomery multiplication indeed multiplies numbers in Montgomery representation.



Note that both the Montgomery representation and the Montgomery reduction of x can be easily computed using Montgomery multiplication, because

$M_R(x) \equiv xR \equiv \mathrm{mul}_R(x, R^2 \bmod m) \pmod{m}$,

$M_R^{-1}(x) \equiv xR^{-1} \equiv \mathrm{mul}_R(x, 1) \pmod{m}$.

We will argue by example why working with the Montgomery representation may be advantageous. To compute $x^5$ mod m, the original left-to-right exponentiation method goes as follows:

squaring: $x_1 \leftarrow x^2$
reduction: $x_2 \leftarrow x_1 \pmod{m}$ (so $x_2 \equiv x^2 \pmod{m}$)
squaring: $x_3 \leftarrow x_2^2$
reduction: $x_4 \leftarrow x_3 \pmod{m}$ (so $x_4 \equiv x^4 \pmod{m}$)
multiplication: $x_5 \leftarrow x x_4$
reduction: $x_6 \leftarrow x_5 \pmod{m}$ (so $x_6 \equiv x^5 \pmod{m}$)

Note that three modular reductions are necessary, each requiring a costly division.

With Montgomery multiplication the computation of $x^5$ mod m can be done as follows:

pre-computation: $x_1 \leftarrow \mathrm{mul}_R(x, R^2 \bmod m)$ (so $x_1 \equiv xR \pmod{m}$)
Montgomery square: $x_2 \leftarrow \mathrm{mul}_R(x_1, x_1)$ (so $x_2 \equiv x_1^2 R^{-1} \equiv x^2 R \pmod{m}$)
Montgomery square: $x_3 \leftarrow \mathrm{mul}_R(x_2, x_2)$ (so $x_3 \equiv x_2^2 R^{-1} \equiv x^4 R \pmod{m}$)
Montgomery multiply: $x_4 \leftarrow \mathrm{mul}_R(x_1, x_3)$ (so $x_4 \equiv x_1 x_3 R^{-1} \equiv x^5 R \pmod{m}$)
Montgomery reduce: $x_5 \leftarrow \mathrm{mul}_R(x_4, 1)$ (so $x_5 \equiv x^5 \pmod{m}$)

You should notice that only one modular reduction is necessary (in the precomputation), and that indeed all numbers are less than mR.

Montgomery multiplication (and thus reduction as well) is easy because of the following lemma.

Lemma 2.9 Let m, b, n ∈ N satisfy b ≥ 2, $2 \le m < b^n$, gcd(m, b) = 1. Put $R = b^n$, $m' \equiv -m^{-1} \pmod{R}$. Let x, y ∈ Z satisfy 0 ≤ x < m, 0 ≤ y < m. If u = xym′ (mod R) then (xy + um)/R is an integer, and $(xy + um)/R \equiv \mathrm{mul}_R(x, y) \pmod{m}$.

Proof. Note that xy + um ≡ xy(1 + mm′) ≡ 0 (mod R), so (xy + um)/R is an integer. Clearly $(xy + um)/R \equiv xyR^{-1} \equiv \mathrm{mul}_R(x, y) \pmod{m}$. □

Note that 0 ≤ (xy + um)/R < 2m, so to compute $\mathrm{mul}_R(x, y)$ it suffices to compute (xy + um)/R, and if necessary, to subtract m.

Note that computing xy can be done by an efficient method such as Karatsuba’s.

In practice the following algorithm is used, where the idea of Lemma 2.9 is worked out per radix b word, rather than at once for the whole $R = b^n$. This means also that $m' = -m^{-1}$ is required only modulo b.

Algorithm 2.14 (Montgomery Multiplication)

Input: m, b, n ∈ N with b ≥ 2, $2 \le m < b^n$, gcd(m, b) = 1,
       $m' = -m^{-1}$ (mod b),
       x, y ∈ Z with 0 ≤ x < m, 0 ≤ y < m and radix b representations
       $[x]_b = [x_{n-1}, \ldots, x_0]_b$, $[y]_b = [y_{n-1}, \ldots, y_0]_b$
Output: $\mathrm{mul}_{b^n}(x, y) = M_{b^n}^{-1}(xy)$
Step 1: $a_0$ ← 0
Step 2: for i from 0 to n − 1 do
            $u_i$ ← (($a_i$ (mod b)) + $x_i y_0$) m′ (mod b)
            $a_{i+1}$ ← ($a_i + x_i y + u_i m$)/b
Step 3: if $a_n \ge m$ then a ← $a_n − m$, else a ← $a_n$, output a
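A direct transcription of Algorithm 2.14 into Python could look like this. The function name `mont_mul` is our own, and for brevity m′ is recomputed on every call with `pow(m, -1, b)` (available in Python 3.8 and later), whereas in practice it would be precomputed once per modulus.

```python
def mont_mul(x: int, y: int, m: int, b: int, n: int) -> int:
    """Montgomery product x*y*b^(-n) mod m, processed one radix-b word at a time."""
    m_prime = (-pow(m, -1, b)) % b          # m' = -m^(-1) mod b
    y0 = y % b                              # lowest word of y
    a = 0                                   # Step 1
    for i in range(n):                      # Step 2
        x_i = (x // b ** i) % b             # i-th radix-b word of x
        u = ((a % b + x_i * y0) * m_prime) % b   # single-word operations
        a = (a + x_i * y + u * m) // b      # exact division: a shift by one word
    return a - m if a >= m else a           # Step 3
```

The result r satisfies $r \cdot b^n \equiv xy \pmod{m}$, which gives a simple way to check an implementation, e.g. with the data of Exercise 2.14.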



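We explain why this works. Note that in the second line of step 2 we have, using $y \equiv y_0 \pmod{b}$ and $m \cdot m' \equiv -1 \pmod{b}$, that

$$a_i + x_i y + u_i m \equiv (a_i + x_i y) + (a_i + x_i y_0)\, m' \cdot m \equiv 0 \pmod{b},$$

so $a_{i+1}$ is an integer. Further note that

$$\begin{aligned}
b^n a_n &\equiv b^{n-1} a_{n-1} + b^{n-1} x_{n-1} y \\
&\equiv b^{n-2} a_{n-2} + b^{n-2} x_{n-2} y + b^{n-1} x_{n-1} y \\
&\equiv \ldots \\
&\equiv a_0 + x_0 y + b x_1 y + b^2 x_2 y + \ldots + b^{n-2} x_{n-2} y + b^{n-1} x_{n-1} y \\
&= xy \pmod{m},
\end{aligned}$$

so that indeed $a \equiv xyb^{-n} \equiv M_{b^n}^{-1}(xy) \pmod{m}$. To explain step 3 it remains to note that $0 \le a_n < 2m$, because $a_{i+1} \le \frac{a_i}{b} + 2m\,\frac{b-1}{b}$, hence

$$\begin{aligned}
a_1 &\le 2m\,\frac{b-1}{b}, \\
a_2 &\le 2m\,\frac{b-1}{b^2} + 2m\,\frac{b-1}{b}, \\
a_3 &\le 2m\,\frac{b-1}{b^3} + 2m\,\frac{b-1}{b^2} + 2m\,\frac{b-1}{b}, \\
&\ldots \\
a_n &\le 2m(b-1)\left(\frac{1}{b^n} + \frac{1}{b^{n-1}} + \ldots + \frac{1}{b^2} + \frac{1}{b}\right) < 2m(b-1)\left(\ldots + \frac{1}{b^2} + \frac{1}{b}\right) = 2m.
\end{aligned}$$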

Also note that in the first line of step 2 only single word operations are done, and that in the second line the division by b is just a shift by one word.

Finally we give the efficient method for Montgomery exponentiation using Montgomery multiplication. It is based on the binary left to right exponentiation method.

Algorithm 2.15 (Montgomery Exponentiation)

Input: m, b, n ∈ N with b ≥ 2, $2 \le m < b^n$, gcd(m, b) = 1,
       x ∈ {0, 1, . . . , m − 1},
       a ∈ N with the binary expansion $[a]_2 = [a_{t-1} \ldots a_0]_2$
       [precomputed] $r_1 = b^n$ (mod m), $r_2 = b^{2n}$ (mod m)
Output: z ∈ {0, 1, . . . , m − 1} such that z ≡ $x^a$ (mod m)
Step 1: z ← $r_1$, x′ ← $\mathrm{mul}_{b^n}(x, r_2)$ (mod m)
Step 2: for i from t − 1 down to 0 do
            z ← $\mathrm{mul}_{b^n}(z, z)$
            if $a_i = 1$ then z ← $\mathrm{mul}_{b^n}(x', z)$
Step 3: z ← $\mathrm{mul}_{b^n}(z, 1)$, output z
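The following Python sketch illustrates Algorithm 2.15. To keep it short and self-contained, the Montgomery product is computed here "at once" as in Lemma 2.9 (with m′ taken modulo $R = b^n$) rather than word by word as in Algorithm 2.14; the names `mul_R` and `mont_exp` are our own.

```python
def mul_R(x: int, y: int, m: int, R: int, m_prime: int) -> int:
    """Montgomery product x*y*R^(-1) mod m as in Lemma 2.9 (0 <= x, y < m)."""
    u = (x * y * m_prime) % R
    t = (x * y + u * m) // R        # exact division; R = b^n, so a shift
    return t - m if t >= m else t

def mont_exp(x: int, a: int, m: int, b: int, n: int) -> int:
    """x^a mod m using only Montgomery products after the precomputation."""
    R = b ** n
    m_prime = (-pow(m, -1, R)) % R          # precomputed once per modulus
    r1, r2 = R % m, (R * R) % m             # the only reductions modulo m
    z = r1                                  # Step 1: Montgomery form of 1
    xp = mul_R(x, r2, m, R, m_prime)        #         Montgomery form of x
    for bit in bin(a)[2:]:                  # Step 2: left to right
        z = mul_R(z, z, m, R, m_prime)
        if bit == "1":
            z = mul_R(xp, z, m, R, m_prime)
    return mul_R(z, 1, m, R, m_prime)       # Step 3: leave Montgomery form
```

For instance `mont_exp(x, a, 1001, 10, 4)` should agree with `pow(x, a, 1001)` for every reduced x.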

Note that step 1 converts to Montgomery representation (the inverse of Montgomery reduction), step 2 is just a straight Montgomerified Algorithm 2.13, and step 3 is Montgomery reduction.

Montgomery exponentiation still is cubic (complexity $O(n^2 t)$ word multiplications, so $O(n^3)$ when t ≈ n), but no divisions have to be done, apart from once in the precomputation. So in practice an important saving is accomplished.



2.2.4.3 Windowing

Finally a short note about windowing techniques. This is a variant of the binary left-to-right exponentiation method, where instead of scanning the exponent bit by bit, one takes ’windows’, i.e. a block of bits at once. The advantage is that fewer multiplications have to be done, the disadvantage that a precomputation is required, and storage of the precomputed numbers.

Algorithm 2.16 (Modular Exponentiation, window method)

Input: m ∈ N with m ≥ 2, x ∈ {0, 1, . . . , m − 1}, k ∈ N,
       a ∈ N with radix $2^k$ representation $[a]_{2^k} = [a_{n-1} \ldots a_0]_{2^k}$
       [precomputed] $x^i$ (mod m) for i = 2, 3, . . . , $2^k − 1$
Output: z ∈ {0, 1, . . . , m − 1} such that z ≡ $x^a$ (mod m)
Step 1: z ← 1
Step 2: for i from n − 1 down to 0 do
            z ← $z^{2^k}$ (mod m)
            if $a_i \ne 0$ then z ← $x^{a_i} z$ (mod m)
Step 3: output z
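A fixed-window version of Algorithm 2.16 can be sketched in Python as follows. The function name `modexp_window` and the default window size k = 4 are our own choices; the table of precomputed powers is built inside the function here, while in practice it would be kept when x is reused.

```python
def modexp_window(x: int, a: int, m: int, k: int = 4) -> int:
    """x^a mod m, scanning the exponent k bits at a time."""
    base = 2 ** k
    # Precomputation: table[i] = x^i mod m for i = 0, 1, ..., 2^k - 1.
    table = [1 % m]
    for _ in range(base - 1):
        table.append((table[-1] * x) % m)
    # Radix-2^k digits of a, most significant first.
    digits = []
    while a > 0:
        digits.append(a % base)
        a //= base
    digits.reverse()
    z = 1 % m                                # Step 1
    for d in digits:                         # Step 2
        for _ in range(k):                   # z <- z^(2^k) by k squarings
            z = (z * z) % m
        if d != 0:
            z = (table[d] * z) % m           # one multiplication per window
    return z                                 # Step 3
```

The number of squarings is the same as in the binary method, but there is at most one multiplication per block of k bits.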

Variants exist that e.g. optimize storage, and optimize multiplications e.g. by ’sliding windows’, where one skips blocks of bits that contain too many zeroes.

2.3 The Chinese Remainder Theorem

2.3.1 Theoretic viewpoint

An important topic to study is systems of linear congruence equations, where we have only one variable, but different moduli.

As an example, let us try to solve in a naive way

$$\begin{cases} 3x \equiv 7 \pmod{11}, \\ 2x \equiv 9 \pmod{13}. \end{cases}$$

The first congruence can be solved on noting that $3^{-1} \equiv 4 \pmod{11}$, hence x ≡ 7 × 4 ≡ 6 (mod 11). Hence we may write x = 6 + 11k for some k ∈ Z. Substituting this into the second congruence we get 12 + 22k ≡ 9 (mod 13), or, equivalently, 9k ≡ 10 (mod 13). Note that $9^{-1} \equiv 3 \pmod{13}$, hence k ≡ 10 × 3 ≡ 4 (mod 13). Hence we may write k = 4 + 13ℓ for some ℓ ∈ Z. Substituting this back into x = 6 + 11k we get x = 6 + 11(4 + 13ℓ) = 50 + 143ℓ, which comes down to x ≡ 50 (mod 143). Note that there is exactly one solution modulo the product of the original moduli.

In general, we want to have an efficient method for solving the following system:

$$\begin{cases} a_1 x \equiv b_1 \pmod{m_1}, \\ a_2 x \equiv b_2 \pmod{m_2}, \\ \quad\vdots \\ a_k x \equiv b_k \pmod{m_k}. \end{cases}$$

To start with, note that in order for solutions to exist we must have $\gcd(a_i, m_i) \mid b_i$ for all i. If that condition is satisfied, then in each congruence we can divide by $\gcd(a_i, m_i)$. Hence we may assume without loss of generality that $\gcd(a_i, m_i) = 1$ for all i. Then by multiplying each congruence by $a_i^{-1} \pmod{m_i}$ we may also assume without loss of generality that $a_i = 1$ for all i.



Next assume that there exist i ≠ j such that $d = \gcd(m_i, m_j) > 1$. Then $x \equiv b_i \pmod{d}$ and $x \equiv b_j \pmod{d}$, so a necessary condition then is that $b_i \equiv b_j \pmod{d}$. At least one of $m_i$ and $m_j$, say $m_j$, now satisfies $\gcd\left(d, \frac{m_j}{d}\right) = 1$, and the set of solutions to $x \equiv b_i \pmod{m_i}$, $x \equiv b_j \pmod{m_j}$ is then equal to the set of solutions to $x \equiv b_i \pmod{m_i}$, $x \equiv b_j \pmod{\frac{m_j}{d}}$ (exercise 2.13). Now we replace $m_j$ by $\frac{m_j}{d}$, and then we see that we may assume without loss of generality that all moduli are pairwise coprime.

Now we are in a position to formulate the main result, which was already known to mathematicians in ancient China.

Theorem 2.10 (Chinese Remainder Theorem) Let $m_1, m_2, \ldots, m_k$ be pairwise coprime integers ≥ 2, and let $b_1, b_2, \ldots, b_k \in \mathbb{Z}$. Let $m = \prod_{i=1}^{k} m_i$. There exists exactly one solution x (mod m) of the system

$$\begin{cases} x \equiv b_1 \pmod{m_1}, \\ x \equiv b_2 \pmod{m_2}, \\ \quad\vdots \\ x \equiv b_k \pmod{m_k}. \end{cases}$$

Proof. To show existence, we follow Gauss’ method. Put $\mu_i = \frac{m}{m_i} = \prod_{j=1,\, j \ne i}^{k} m_j$, and let $n_i = \mu_i^{-1} \pmod{m_i}$ (note that this inverse exists because all moduli are pairwise coprime). Then

$$\mu_j n_j \equiv \begin{cases} 0 \pmod{m_i} & \text{if } i \ne j, \\ 1 \pmod{m_i} & \text{if } i = j, \end{cases}$$

and so $x = \sum_{j=1}^{k} b_j \mu_j n_j$ is a solution.

To show uniqueness, let us take two solutions x, x′. Then $m_i \mid x - x'$ for all i, hence¹ $\mathrm{lcm}(m_1, m_2, \ldots, m_k) \mid x - x'$. But as the moduli are pairwise coprime we must have (by easily proved properties of the lcm) that $\mathrm{lcm}(m_1, m_2, \ldots, m_k) = m$. Hence x ≡ x′ (mod m). □

Note that Gauss’ method, as described in the proof above, can in principle be used to compute the solution. The complexity is $O((\log m)^2)$ bit operations.

Example: with x ≡ 6 (mod 11), x ≡ 11 (mod 13), x ≡ 16 (mod 17) we get $\mu_1 = 221$, $\mu_2 = 187$, $\mu_3 = 143$, $n_1 = 1$, $n_2 = 8$, $n_3 = 5$, so x = 6 × 221 × 1 + 11 × 187 × 8 + 16 × 143 × 5 = 29222 ≡ 50 (mod 2431).
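Gauss’ method from the proof of Theorem 2.10 translates almost literally into code. The Python sketch below uses our own function name `crt_gauss` and relies on `pow(mu, -1, m_i)` for the modular inverse and on `math.prod` (both available in Python 3.8 and later).

```python
from math import prod

def crt_gauss(residues, moduli):
    """Solve x = b_i (mod m_i) for pairwise coprime m_i by Gauss' method."""
    m = prod(moduli)
    x = 0
    for b_i, m_i in zip(residues, moduli):
        mu_i = m // m_i                 # product of all the other moduli
        n_i = pow(mu_i, -1, m_i)        # inverse of mu_i modulo m_i
        x += b_i * mu_i * n_i
    return x % m                        # one reduction modulo the large m

# The example from the text: x = 6 (11), x = 11 (13), x = 16 (17).
assert crt_gauss([6, 11, 16], [11, 13, 17]) == 50
```

Note that the intermediate sum is reduced modulo the full product m, which is the reduction that Garner’s Algorithm avoids.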

2.3.2 Algorithmic viewpoint

A more efficient method is Garner’s Algorithm, especially when a number of systems have to be solved with the same set of moduli. The main advantage is that only reductions modulo $m_i$ are required.

A case of particular importance for cryptography is the case of $x \equiv b_p \pmod{p}$, $x \equiv b_q \pmod{q}$, where p, q are distinct primes. The abbreviation CRT is used in the cryptographic literature a lot, and it usually refers to this special case. The solution is then often given as follows:

$u \leftarrow p^{-1} \pmod{q}$,
$x \leftarrow b_p + (b_q - b_p)\,u\,p \pmod{pq}$.

¹ lcm(a, b) is the least common multiple of a and b, which is equal to ab/gcd(a, b).



Algorithm 2.17 (Garner’s Algorithm)

Input: $m_1, m_2, \ldots, m_k \in \mathbb{Z}$ with $m_i \ge 2$ pairwise coprime,
       $b_1, b_2, \ldots, b_k \in \mathbb{Z}$,
       [precomputed] $C_i = \prod_{j=1}^{i-1} m_j^{-1} \pmod{m_i}$ for all i = 2, 3, . . . , k
Output: x such that $x \equiv b_i \pmod{m_i}$ for all i = 1, 2, . . . , k,
        with x ∈ {0, 1, . . . , m − 1}, where $m = \prod_{i=1}^{k} m_i$
Step 1: x ← $b_1$
Step 2: for i from 2 to k do
            u ← $(b_i - x)\, C_i \pmod{m_i}$
            x ← $x + u \prod_{j=1}^{i-1} m_j$
Step 3: output x
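Garner’s Algorithm can be sketched in Python as follows. The function name `garner` is our own; the constants $C_i$ are computed on the fly here instead of being precomputed, and $b_1$ is reduced modulo $m_1$ first (an assumption we add so that the output lies in {0, 1, . . . , m − 1} even for unreduced input).

```python
def garner(residues, moduli):
    """Solve x = b_i (mod m_i) using only reductions modulo the m_i."""
    x = residues[0] % moduli[0]                 # Step 1
    prefix = 1                                  # m_1 * ... * m_{i-1}
    for i in range(1, len(moduli)):             # Step 2
        prefix *= moduli[i - 1]
        c_i = pow(prefix, -1, moduli[i])        # C_i (precomputable)
        u = ((residues[i] - x) * c_i) % moduli[i]
        x += u * prefix                         # stays below m_1 * ... * m_i
    return x                                    # Step 3

assert garner([6, 11, 16], [11, 13, 17]) == 50
```

For k = 2 with moduli p, q this is exactly the formula $x = b_p + ((b_q - b_p)\,u \bmod q)\,p$ with $u = p^{-1} \pmod{q}$.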

2.3.3 Practical applications

The Chinese Remainder Theorem has many applications, of which we mention a few practical ones.

In RSA cryptography it is used to speed up the private key operation. See any introductory course on public key cryptography.

In [Kn, Section 4.3.3B] a method from Schönhage, different from the generalized Karatsuba method described in Section 1.3.2.3, is described in which modular arithmetic and the Chinese Remainder Theorem are used to speed up multiplication of large numbers to almost linear complexity.

In [Sh, Section 4.4] another application of the Chinese Remainder Theorem is presented that shows how sometimes specific computations can be sped up.

Exercises

2.1. Use the Extended Euclidean Algorithm to compute gcd(3507, 2181), and to find x, y ∈ Z such that 3507x + 2181y = gcd(3507, 2181).

2.2. Let a, b ∈ Z and d = gcd(a, b). Theorem 2.2 shows the existence of $x_0, y_0 \in \mathbb{Z}$ such that $d = x_0 a + y_0 b$. Find all solutions (x, y) to d = xa + yb, i.e. express them in terms of $(x_0, y_0)$.

2.3. Let d = gcd(a, b). Show that $\gcd\left(\frac{a}{d}, \frac{b}{d}\right) = 1$.

2.4. Find a way to implement the Extended Euclidean Algorithm with only enough memory for 7 numbers.

2.5. Write out a proof of Lemma 2.6, and of Corollary 2.7.

2.6. Devise an Extended Binary Euclidean Algorithm.

2.7. Compute again gcd(3507, 2181) (see Exercise 2.1), but this time using the Binary Euclidean Algorithm. If you did Exercise 2.6, also find x, y ∈ Z such that 3507x + 2181y = gcd(3507, 2181) by the Extended Binary Euclidean Algorithm. Try to compare the workload (number of iterations) to the original Euclidean Algorithm.

2.8. Let a = 2n + 1 and b = 2n − 1. Compute the number of iterations performed when using the Binary Euclidean Algorithm to compute gcd(a, b).

2.9.
(a) Give a proper definition of the greatest common divisor of three integers a, b, c.
(b) Show that three numbers can be relatively prime even if any two of them are not relatively prime.
(c) Show that gcd(a, b, c) = gcd(gcd(a, b), c).
(d) Show that there exist x, y, z ∈ Z such that xa + yb + zc = gcd(a, b, c).
(e) Devise an efficient algorithm to compute gcd(a, b, c).
(f) Generalize to the greatest common divisor of n integers $a_1, a_2, \ldots, a_n$.

2.10. Implement all algorithms given in Section 2.1, and the one you did in Exercise 2.6, in your favourite computer language or computer algebra system.

2.11. Prove that Barrett modular reduction (Algorithm 2.6) is correct. Hint: show that 0 ≤ x − qm < 3m, and that at the end of step 3 it is always true that y = x − qm.

2.12. Perform Algorithms 2.4, 2.5 and 2.6 on x = 3507449, m = 1913, b = 10. You can do this by hand, or write a program in your favourite language or computer algebra package.

2.13. Let a, b, m ∈ Z with m ≥ 2. Let d = gcd(a, m). Show that ax ≡ b (mod m) has solutions if and only if d | b, and that the solutions then are given by $x \equiv \left(\frac{a}{d}\right)^{-1} \frac{b}{d} \pmod{\frac{m}{d}}$. Show that there are exactly d solutions modulo m, and describe them.

2.14. Compute the product of 3507 and 2181 modulo m = 72639 by Algorithm 2.14 (Montgomery multiplication) using b = 10, n = 5.

2.15. Compute 28355 (mod 1001) by the binary right to left and left to right exponentiation algorithms.

2.16. In some cryptographic computations several modular exponentiations with the same modulus have to be performed, to produce one outcome, e.g. the computation of $x^a y^b$ (mod m). Devise an efficient algorithm that outputs $x^a y^b$ (mod m) on input of m, x, y, a, b.
Hint: adapt the binary left to right modular exponentiation algorithm. Your algorithm should be able to compute $x^{22} y^{13}$ (mod m) in at most 4 multiplications and 4 squarings, after a precomputation of 1 multiplication.

2.17. Prove the correctness of Garner’s Algorithm 2.17.

2.18. Solve x ≡ 2 (mod 5), x ≡ 1 (mod 7), x ≡ 3 (mod 11), x ≡ 8 (mod 13), both by Gauss’ method and Garner’s algorithm.

2.19. Find all x ∈ Z that satisfy x ≡ 3 (mod 8), x ≡ 7 (mod 15) and x ≡ 12 (mod 25).

2.20. Implement all algorithms given in Section 2.2 in your favourite computer language or computer algebra system.

2.21. Implement Gauss’ and Garner’s methods given in Section 2.3 in your favourite computer language or computer algebra system.



Chapter 3

Multiplicative structure of Z∗n

Introduction

General references for this chapter: [BS, Chapters 4, 5], [CP, Chapter 9], [GG, Chapters 3, 4, 5], [Sh, Chapters 4, 11]. And for those preferring Dutch: [Be, Chapter 7], [Ke, Chapter 11], [dW, Chapter 2].

In this Chapter we study the multiplicative properties of the integers modulo n. In particular the following concepts will be treated:

• order of an element,

• primitive root.

3.1 Euler (and Fermat)

We start with repeating the required concepts and results from elementary algebra that have been studied in the first year Algebra course.

By $\mathbb{Z}_n$ we denote the integers modulo n, for a given integer n ≥ 2, not necessarily prime. Usually we denote the elements of $\mathbb{Z}_n$ by 0, 1, . . . , n − 1. Formally speaking all this is abuse of notation, since the elements are not numbers, but residue classes, but confusion should not arise.

A set $\{a_1, a_2, \ldots, a_k\} \subset \mathbb{Z}$ is called a reduced residue system (mod n) if it contains exactly one representative for each congruence class of integers that are coprime to n. A reduced residue system modulo n has exactly φ(n) elements, where φ(n) is Euler’s Totient function. This function has the following properties.

Theorem 3.1 (Euler φ function)
(a) φ is a multiplicative function, i.e. if gcd(m, n) = 1 then φ(mn) = φ(m)φ(n).

(b) If $n = \prod_{i=1}^{k} p_i^{e_i}$ is the factorization of n into primes, then

$$\varphi(n) = \prod_{i=1}^{k} p_i^{e_i - 1}(p_i - 1) = n \prod_{i=1}^{k} \left(1 - \frac{1}{p_i}\right).$$



Proof. (a) Consider the map $\rho : \mathbb{Z}_{mn}^* \to \mathbb{Z}_m^* \times \mathbb{Z}_n^*$ defined by ρ(a (mod mn)) = (a (mod m), a (mod n)). It suffices to show that this map is bijective. That it is surjective is just the Chinese Remainder Theorem (Theorem 2.10). To show that it is injective, let $a, b \in \mathbb{Z}_{mn}^*$ be such that ρ(a) = ρ(b). Then a ≡ b (mod m) and a ≡ b (mod n), so m | a − b and n | a − b, and because m and n are relatively prime it follows that mn | a − b, hence a ≡ b (mod mn).

(b) In view of (a) it suffices to prove that $\varphi(p^e) = p^{e-1}(p - 1)$ for any prime p and any e ≥ 1. Note that there are $p^e$ elements in $\{0, 1, 2, \ldots, p^e - 1\}$, and exactly $p^{e-1}$ of them are divisible by p, namely $0, p, 2p, \ldots, p^e - p$. So $\varphi(p^e) = p^e - p^{e-1} = p^{e-1}(p - 1)$. □

One of the reasons why Euler’s φ(n) is important is the following result.

Theorem 3.2 (Euler’s Theorem) If $n \ge 2$ and $a \in \mathbb{Z}$ is coprime to $n$, then $a^{\varphi(n)} \equiv 1 \pmod n$.

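Proof. Let $\mathbb{Z}_n^* = \{a_1, a_2, \ldots, a_{\varphi(n)}\}$, and let $a \in \mathbb{Z}_n^*$. Note that if $a a_i \equiv a a_j \pmod n$, then the fact that $\gcd(a, n) = 1$ implies that $a_i \equiv a_j \pmod n$, and hence $a a_1, a a_2, \ldots, a a_{\varphi(n)}$ are all different (mod $n$). This implies $\mathbb{Z}_n^* = \{a a_1, a a_2, \ldots, a a_{\varphi(n)}\}$. Multiply all elements of $\mathbb{Z}_n^*$ in two ways:
$$\prod_{i=1}^{\varphi(n)} a_i \equiv \prod_{i=1}^{\varphi(n)} a a_i = a^{\varphi(n)} \prod_{i=1}^{\varphi(n)} a_i \pmod n,$$
and dividing out $\prod_{i=1}^{\varphi(n)} a_i$ the result follows. □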

An obvious special case of Euler’s theorem is Fermat’s theorem.

Theorem 3.3 (Fermat’s Theorem) If $p$ is prime and $a \in \mathbb{Z}$ is such that $p \nmid a$, then $a^{p-1} \equiv 1 \pmod p$.

Note that for some $a$ there may exist exponents smaller than $p - 1$ for which the power is already 1 modulo $p$.

An important consequence of Euler’s theorem is that when computing powers modulo $n$, we can take exponents modulo $\varphi(n)$. Another consequence is that we can compute modular inverses by modular exponentiation.

Corollary 3.4 Let $n \ge 2$ and $x \in \mathbb{Z}$ coprime to $n$.
(a) $x^a \equiv x^b \pmod n$ if $a \equiv b \pmod{\varphi(n)}$.
(b) The inverse $x^{-1}$ of $x$ modulo $n$ is given by $x^{-1} \equiv x^{\varphi(n)-1} \pmod n$.
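Both parts of Corollary 3.4 can be checked numerically. The sketch below uses the arbitrary illustrative values $n = 15$ and $x = 7$ (not taken from the notes) and Python's built-in three-argument `pow` for modular exponentiation.

```python
from math import gcd

n = 15
phi_n = 8          # phi(15) = phi(3) * phi(5) = 2 * 4
x = 7
assert gcd(x, n) == 1

# (a) the exponent may be reduced modulo phi(n)
exponent = 1_000_003
full_power = pow(x, exponent, n)
reduced_power = pow(x, exponent % phi_n, n)

# (b) the inverse of x is x^(phi(n) - 1) modulo n
inverse = pow(x, phi_n - 1, n)
```

Here `full_power` and `reduced_power` agree, and `x * inverse` reduces to 1 modulo 15.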

The next result is easy to prove, see Exercise 3.3.

Theorem 3.5 $\sum_{d \mid n} \varphi(d) = n$.

We will also need a few results on the number of solutions of polynomial congruences.

Theorem 3.6 Let $p$ be prime, and let $f \in \mathbb{Z}_p[X]$ be a polynomial of degree $n$. Then $f(x) \equiv 0 \pmod p$ has at most $n$ different solutions in $\mathbb{Z}_p$.

Proof. For $n = 1$ this is trivial. For $n > 1$ we apply induction. Suppose any polynomial in $\mathbb{Z}_p[X]$ of degree $\le n - 1$ has at most $n - 1$ zeroes. When $f$ has no zeroes, then the result is trivially true, so we assume that $f$ has at least one zero, say $\alpha$. By division with remainder, we see that there exists $q \in \mathbb{Z}_p[X]$ such that $f(X) = (X - \alpha)q(X)$, and clearly $q$ has degree $n - 1$. Any zero of $f$ different from $\alpha$ now is a zero of $q$ (note that here we use the fact that $p$ is prime; for a composite modulus this is no longer true). Induction now shows that the total number of zeroes of $f$ is $\le 1 + (n - 1)$. □



Theorem 3.7 Let $p$ be prime, and $d \mid p - 1$. Then $x^d - 1 \equiv 0 \pmod p$ has exactly $d$ solutions.

Proof. Fermat’s Theorem 3.3 shows that $x^{p-1} - 1 \equiv 0 \pmod p$ has exactly $p - 1$ solutions. Now write $p - 1 = kd$. Note that
$$x^{p-1} - 1 = (x^d - 1)\left(x^{(k-1)d} + x^{(k-2)d} + \cdots + x^d + 1\right),$$
and Theorem 3.6 shows that the total number of zeroes is $(\le d) + (\le (k-1)d)$. Since this total equals $p - 1 = kd$, both bounds must be attained, and the result follows at once. □

3.2 Order of an element

We know that $\mathbb{Z}_n$ is a commutative ring. It follows that $\mathbb{Z}_n^* = \{x \in \mathbb{Z} \mid 0 \le x \le n-1,\ \gcd(x, n) = 1\}$ is a multiplicative group. Note that $\#\mathbb{Z}_n^* = \varphi(n)$. In particular, when $p$ is prime then $\mathbb{Z}_p^* = \{1, 2, \ldots, p-1\}$ (in this case $\mathbb{Z}_p$ is even a field).

In group theory, the term "order" is used in two different ways.

In the first place the number of elements of a finite group $G$ (such as $\mathbb{Z}_n^*$) is called the order of the group, and we use the notation $\mathrm{ord}(G) = \#G$.

In the second place each element of a finite group has an order. This is defined as follows: if $G$ is a finite group and $a \in G$, then the order of the element $a$ is the smallest positive integer $e$ such that $a^e = 1$. Notation: $\mathrm{ord}(a)$.

In the case of $G = \mathbb{Z}_n^*$ the equation $a^e = 1$ really means $a^e \equiv 1 \pmod n$. For example, in $\mathbb{Z}_7^*$ we have $\mathrm{ord}(1) = 1$, $\mathrm{ord}(6) = 2$, $\mathrm{ord}(2) = \mathrm{ord}(4) = 3$, and $\mathrm{ord}(3) = \mathrm{ord}(5) = 6$.
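The orders in $\mathbb{Z}_7^*$ listed above can be reproduced by applying the definition literally. This is a minimal sketch; the helper name `naive_order` is illustrative, and the loop is only sensible for small moduli.

```python
def naive_order(a: int, n: int) -> int:
    """Smallest e >= 1 with a^e ≡ 1 (mod n); assumes n >= 2 and gcd(a, n) = 1."""
    e, power = 1, a % n
    while power != 1:
        power = power * a % n
        e += 1
    return e


orders_mod_7 = {a: naive_order(a, 7) for a in range(1, 7)}
# orders_mod_7 == {1: 1, 2: 3, 3: 6, 4: 3, 5: 6, 6: 2}
```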

Note that the cyclic subgroup of $G$ generated by $a$, which is denoted by $\langle a \rangle = \{1, a, a^2, a^3, \ldots\}$, has exactly $\mathrm{ord}(a)$ elements, i.e. $\mathrm{ord}(\langle a \rangle) = \mathrm{ord}(a)$, so the two concepts of order coincide in this case.

We now have the following results, giving useful properties of element orders, notably that the order of an element always divides the order of the group it lies in.

Lemma 3.8 Let $a \in \mathbb{Z}_n^*$.
(a) $\mathrm{ord}(a)$ exists.
(b) If $e \ge 1$ is such that $a^e \equiv 1 \pmod n$ then $\mathrm{ord}(a) \mid e$.
(c) $\mathrm{ord}(a) \mid \varphi(n)$.

Proof. (a) Euler’s Theorem 3.2 shows that there exists some $e \ge 1$ with $a^e \equiv 1 \pmod n$, namely $\varphi(n)$. But then there also exists a smallest such $e$.
(b) Apply division with remainder: there exist $q, r \in \mathbb{Z}$ with $e = q\,\mathrm{ord}(a) + r$ and $0 \le r < \mathrm{ord}(a)$. Clearly $a^r \equiv a^{e - q\,\mathrm{ord}(a)} \equiv a^e \left(a^{\mathrm{ord}(a)}\right)^{-q} \equiv 1 \cdot 1^{-q} = 1 \pmod n$. By the definition of $\mathrm{ord}(a)$ this means that $r = 0$.
(c) This uses (b) together with Euler’s Theorem 3.2. □

The set $\langle a \rangle$ generated by $a$ is a cyclic subgroup of $\mathbb{Z}_n^*$, with $\mathrm{ord}(\langle a \rangle) = \mathrm{ord}(a) \mid \varphi(n)$. Often this order is strictly smaller than $\varphi(n)$. The integer $\frac{\varphi(n)}{\mathrm{ord}(a)}$ is sometimes called the cofactor of $a$.

Next we count the number of elements in $\mathbb{Z}_p^*$ of a given order $d$, but for primes $p$ only.

Lemma 3.9 Let $p$ be prime, and let $d \mid p - 1$. There are exactly $\varphi(d)$ elements in $\mathbb{Z}_p^*$ of order $d$.



Proof. Let us write
$$A_d = \#\{b \in \mathbb{Z}_p^* \mid \mathrm{ord}(b) = d\}.$$
We use induction to prove $A_d = \varphi(d)$. For $d = 1$ the result is trivial, since 1 is the only element of order 1. Now assume the result holds for all $d' < d$. Theorem 3.7 and Lemma 3.8(b) tell us:
$$d = \#\{b \in \mathbb{Z}_p^* \mid b^d \equiv 1 \pmod p\} = \sum_{d' \mid d} A_{d'} = A_d + \sum_{d' \mid d,\ d' \ne d} A_{d'}.$$
By induction we find $d = A_d + \sum_{d' \mid d,\ d' \ne d} \varphi(d')$, and then Theorem 3.5 yields $d = A_d + (d - \varphi(d))$. This proves the result. See also Section 6.3. □
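Lemma 3.9 is easy to test by brute force for a small prime. The sketch below tallies element orders in $\mathbb{Z}_{13}^*$ and compares each tally with $\varphi(d)$; the prime 13 and the helper names are illustrative choices, and $\varphi$ is computed here by direct counting rather than by the product formula.

```python
from collections import Counter
from math import gcd


def naive_order(a: int, n: int) -> int:
    """Smallest e >= 1 with a^e ≡ 1 (mod n); assumes gcd(a, n) = 1."""
    e, power = 1, a % n
    while power != 1:
        power = power * a % n
        e += 1
    return e


def phi(d: int) -> int:
    """Euler's totient by counting the integers in 1..d coprime to d."""
    return sum(1 for k in range(1, d + 1) if gcd(k, d) == 1)


p = 13
# order_counts[d] = number of elements of Z_p^* with order exactly d
order_counts = Counter(naive_order(b, p) for b in range(1, p))
divisors = [d for d in range(1, p) if (p - 1) % d == 0]
matches_lemma = all(order_counts[d] == phi(d) for d in divisors)
```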

3.3 Primitive roots

In this section we study the multiplicative structure of the set of integers modulo n.

Sometimes the group $\mathbb{Z}_n^*$ itself is cyclic. Take for example $n = 7$. Then we have 3 as a generator, as $\{3^0, 3^1, 3^2, 3^3, 3^4, 3^5\} = \{1, 3, 2, 6, 4, 5\} = \mathbb{Z}_7^*$.

A generator of $\mathbb{Z}_n^*$ is also called a primitive root modulo $n$. In other words, a primitive root is by definition an element of order $\varphi(n)$.

Sometimes a primitive root does not exist. Take for example $n = 15$. Then a simple computation shows that for all $x \in \mathbb{Z}_{15}^*$ already $x^4 \equiv 1 \pmod{15}$, whereas $\#\mathbb{Z}_{15}^* = 8$.
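The "simple computation" for $n = 15$ can be carried out in a few lines. This is only an illustrative check; the variable names are not from the notes.

```python
from math import gcd

# The eight units modulo 15 ...
units_15 = [x for x in range(1, 15) if gcd(x, 15) == 1]
# ... and the set of their fourth powers modulo 15.
fourth_powers = {pow(x, 4, 15) for x in units_15}
# fourth_powers == {1}, so no element has order 8
```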

First we study the prime case.

Theorem 3.10 If $p$ is prime then $\mathbb{Z}_p^*$ is cyclic, i.e. has a primitive root.

Proof. Lemma 3.9 tells us that there are $\varphi(p - 1)$ elements of order $p - 1 = \varphi(p)$. □

Well, that was easy. Note however that this proof is not constructive. Next we study the case of odd prime powers.

Theorem 3.11 If $p$ is an odd prime and $k \ge 1$ then $\mathbb{Z}_{p^k}^*$ is cyclic, i.e. has a primitive root.

This is more difficult. We first have two lemmas.

Lemma 3.12 Let $p$ be prime and $a \equiv 1 \pmod{p^t}$ for some $t \ge 1$.
(a) Then $a^p \equiv 1 \pmod{p^{t+1}}$.
(b) Suppose $p > 2$ or $t > 1$. If $a \not\equiv 1 \pmod{p^{t+1}}$ then $a^p \not\equiv 1 \pmod{p^{t+2}}$.

Proof. Write $a = 1 + r p^t$. Note that $a^p = 1 + \binom{p}{1} r p^t + \binom{p}{2} r^2 p^{2t} + \cdots$. Because $\binom{p}{1} = p$ and $2t \ge t + 1$, (a) follows immediately. When $p > 2$ then $p \mid \binom{p}{2}$, and when $t > 1$ then $2t \ge t + 2$. In both cases we have $a^p \equiv 1 + r p^{t+1} \pmod{p^{t+2}}$. As $p \nmid r$, (b) follows. □

Lemma 3.13 Let $a_1, a_2 \in \mathbb{Z}_n^*$ with $m_1 = \mathrm{ord}(a_1)$, $m_2 = \mathrm{ord}(a_2)$. Assume that $\gcd(m_1, m_2) = 1$. Then $\mathrm{ord}(a_1 a_2) = m_1 m_2$.



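Proof. Let $t = \mathrm{ord}(a_1 a_2)$. On the one hand $(a_1 a_2)^{m_1 m_2} = (a_1^{m_1})^{m_2} (a_2^{m_2})^{m_1} \equiv 1 \pmod n$, so $t \mid m_1 m_2$. On the other hand, note that $1 \equiv (a_1 a_2)^{t m_2} = a_1^{t m_2} (a_2^{m_2})^t \equiv a_1^{t m_2} \pmod n$, so that $m_1 = \mathrm{ord}(a_1) \mid t m_2$, hence $m_1 \mid t$ because $\gcd(m_1, m_2) = 1$, and similarly we find $m_2 \mid t$. Thus also $m_1 m_2 \mid t$. □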

Proof of Theorem 3.11. Theorem 3.10 covers the case $k = 1$, so we assume $k \ge 2$. We are looking for an element of order $\varphi(p^k) = p^{k-1}(p - 1)$. We will construct such an element by first finding one of order $p^{k-1}$, then constructing one of order $p - 1$, and then applying Lemma 3.13.

We first show that $\mathrm{ord}(p + 1) = p^{k-1}$. Namely, applying Lemma 3.12(a) for $t = 1, 2, \ldots, k - 1$ we get
$$p + 1 \equiv 1 \pmod{p} \Rightarrow (p+1)^p \equiv 1 \pmod{p^2} \Rightarrow (p+1)^{p^2} \equiv 1 \pmod{p^3} \Rightarrow \cdots \Rightarrow (p+1)^{p^{k-1}} \equiv 1 \pmod{p^k},$$
so $\mathrm{ord}(p + 1) \mid p^{k-1}$ by Lemma 3.8(b). Then, applying Lemma 3.12(b) for $t = 1, 2, \ldots, k - 2$ we get
$$p + 1 \not\equiv 1 \pmod{p^2} \Rightarrow (p+1)^p \not\equiv 1 \pmod{p^3} \Rightarrow \cdots \Rightarrow (p+1)^{p^{k-2}} \not\equiv 1 \pmod{p^k},$$
so $\mathrm{ord}(p + 1) = p^{k-1}$.

Now let $g_1$ be a primitive root (mod $p$), which exists due to Theorem 3.10, and let $\ell = \mathrm{ord}(g_1)$ be its order in $\mathbb{Z}_{p^k}^*$. Lemma 3.8(c) says $\ell \mid \varphi(p^k) = p^{k-1}(p - 1)$. On the other hand, $g_1^\ell \equiv 1 \pmod{p^k}$, so $g_1^\ell \equiv 1 \pmod p$, and because $g_1$ is a primitive root modulo $p$, Lemma 3.8(b) says $p - 1 \mid \ell$. It follows that $\ell = (p - 1)p^s$ for some $s$ satisfying $0 \le s \le k - 1$.

We now take $g_2 = g_1^{p^s}$, and we show that $\mathrm{ord}(g_2) = p - 1$. Indeed, $h = p - 1$ is the smallest positive integer such that $g_2^h = g_1^{h p^s} \equiv 1 \pmod{p^k}$.

Finally we take $g = (p + 1) g_2$. Lemma 3.13 now implies $\mathrm{ord}(g) = p^{k-1}(p - 1) = \varphi(p^k)$, in other words: $g$ is a primitive root. □

Note that this proof is constructive, once a primitive root (mod p) is known.
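The construction in the proof can be followed step by step in code. The sketch below assumes a primitive root $g_1$ modulo the odd prime $p$ is supplied by the caller; the function names are illustrative, and `naive_order` is included only as a brute-force check that is practical for small moduli.

```python
def primitive_root_prime_power(g1: int, p: int, k: int) -> int:
    """Given a primitive root g1 modulo the odd prime p, return one modulo p**k."""
    n = p ** k
    # ord(g1) in Z_{p^k}^* equals (p - 1) * p**s; find s by trying s = 0, 1, ...
    s = 0
    while pow(g1, (p - 1) * p ** s, n) != 1:
        s += 1
    g2 = pow(g1, p ** s, n)      # element of order p - 1
    return (p + 1) * g2 % n      # p + 1 has order p**(k - 1); apply Lemma 3.13


def naive_order(a: int, n: int) -> int:
    """Brute-force order of a in Z_n^* (for checking only)."""
    e, power = 1, a % n
    while power != 1:
        power = power * a % n
        e += 1
    return e


g = primitive_root_prime_power(3, 7, 3)        # 3 is a primitive root mod 7
is_primitive = naive_order(g, 7 ** 3) == 7 ** 2 * 6
```

The values $p = 7$, $k = 3$ and $g_1 = 3$ are arbitrary illustrative inputs.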

Next we look at powers of 2.

Theorem 3.14 $\mathbb{Z}_{2^k}^*$ is cyclic, i.e. has a primitive root, only for $k = 1, 2$.

Proof. For $k = 1, 2$ this is trivial: 1 is a primitive root (mod 2), and 3 is a primitive root (mod 4). So assume $k \ge 3$. Any odd integer $a$ satisfies $a^2 \equiv 1 \pmod{2^3}$. Applying Lemma 3.12(a) for successively $t = 3, 4, \ldots$ we get
$$a^2 \equiv 1 \pmod{2^3} \Rightarrow a^{2^2} \equiv 1 \pmod{2^4} \Rightarrow a^{2^3} \equiv 1 \pmod{2^5} \Rightarrow \cdots \Rightarrow a^{2^{k-2}} \equiv 1 \pmod{2^k},$$
showing that for any $a \in \mathbb{Z}_{2^k}^*$ we have $\mathrm{ord}(a) \le 2^{k-2}$. As $\varphi(2^k) = 2^{k-1}$ this implies that primitive roots do not exist. □

Finally we get our main result.

Theorem 3.15 (Primitive Roots) $\mathbb{Z}_n^*$ is cyclic, i.e. has a primitive root, exactly when $n = 1, 2, 4, p^k$ or $2p^k$, where $p$ is any odd prime, and $k \ge 1$. The number of primitive roots, if they exist, is $\varphi(\varphi(n))$.

Proof. Theorems 3.11 and 3.14 cover the prime power cases.

The case of $2p^k$ is easy. Let $g_1$ be a primitive root (mod $p^k$). If $g_1$ is odd we take $g = g_1$, otherwise we take $g = g_1 + p^k$. Then $g$ is odd, so can be thought of as an element of $\mathbb{Z}_{2p^k}^*$. Now, if $g^h \equiv 1 \pmod{2p^k}$ then $g^h \equiv 1 \pmod{p^k}$, hence $g_1^h \equiv 1 \pmod{p^k}$, hence $\varphi(p^k) \mid h$. Note that $\varphi(2p^k) = \varphi(p^k)$, so we find $\varphi(2p^k) \mid h$, implying that $g$ is a primitive root.

Now all remaining cases can be covered as follows. When $n \ne 2^k, p^k$ or $2p^k$, then $n$ can be factored as $n = n_1 n_2$, with $n_1$ and $n_2$ both $> 2$, and coprime. We now use the fact that $\varphi(m)$ is even for all $m > 2$. Let $g$ be any element of $\mathbb{Z}_n^*$. Then $g^{\varphi(n_1)} \equiv 1 \pmod{n_1}$ and $g^{\varphi(n_2)} \equiv 1 \pmod{n_2}$, hence $g^{\mathrm{lcm}(\varphi(n_1), \varphi(n_2))} \equiv 1 \pmod n$. We find
$$\mathrm{ord}(g) \le \mathrm{lcm}(\varphi(n_1), \varphi(n_2)) = \frac{\varphi(n_1)\varphi(n_2)}{\gcd(\varphi(n_1), \varphi(n_2))} \le \frac{1}{2}\varphi(n),$$
and clearly $g$ cannot be a primitive root.

The number of primitive roots is found in Exercise 3.6. □



3.4 Algorithms

In the previous sections we treated the theory of primitive roots and orders. Now we will discuss practical ways of computing them.

The following lemma is useful.

Lemma 3.16 Let $a \in \mathbb{Z}_n^*$. Then $a$ is a primitive root if and only if $a^{\varphi(n)/p} \not\equiv 1 \pmod n$ for all primes $p \mid \varphi(n)$.

Proof. If $a$ is a primitive root then $a^e \not\equiv 1 \pmod n$ for all $1 \le e < \varphi(n)$, so certainly not for $e = \varphi(n)/p$. If $a$ is not a primitive root then its cofactor is $> 1$, hence has a prime divisor $p$. It follows that $\varphi(n)/p$ is a multiple of $\mathrm{ord}(a)$, hence $a^{\varphi(n)/p} \equiv 1 \pmod n$. □

Computing the order of an element in $\mathbb{Z}_n^*$ can very naively be done by simply computing all powers until 1 is met (or $n - 1 \equiv -1$, then you know you’re exactly halfway, see Exercise 3.8). This is completely out of the question for moduli $n$ that become bigger than a few thousand.

For large moduli $n$ computing orders is only practically possible when $\varphi(n)$ and its prime factorization are completely known. Here is a method.

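Algorithm 3.1 (Order of an element of $\mathbb{Z}_n^*$)

Input: $a, n \in \mathbb{Z}$ with $n \ge 2$ and $\gcd(a, n) = 1$,
       $P = \{p \mid p \text{ prime divisor of } \varphi(n)\}$
Output: the order $\mathrm{ord}(a)$ of $a \in \mathbb{Z}_n^*$
Step 1: $m \leftarrow \varphi(n)$
Step 2: for all $p \in P$ do
            while $p \mid m$ and $a^{m/p} \equiv 1 \pmod n$ do
                $m \leftarrow m/p$
Step 3: output $m$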
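Algorithm 3.1 can be transcribed almost line by line. The sketch below assumes, as the algorithm does, that $\varphi(n)$ and its prime divisors are supplied by the caller; the function and parameter names are illustrative.

```python
def element_order(a: int, n: int, phi_n: int, prime_divisors: list[int]) -> int:
    """Order of a in Z_n^*, given phi(n) and the primes dividing phi(n).

    Assumes n >= 2 and gcd(a, n) = 1.
    """
    m = phi_n
    for p in prime_divisors:
        # Strip factors p from m for as long as a^(m/p) is still 1 modulo n.
        while m % p == 0 and pow(a, m // p, n) == 1:
            m //= p
    return m


# Example in Z_7^*: phi(7) = 6 = 2 * 3
order_of_2 = element_order(2, 7, 6, [2, 3])   # 3
order_of_3 = element_order(3, 7, 6, [2, 3])   # 6
```

Each test `pow(a, m // p, n)` is a single modular exponentiation, so the number of exponentiations is bounded by the number of prime factors of $\varphi(n)$ counted with multiplicity, plus one per distinct prime.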

To find elements of a given order, including primitive roots, no really clever ideas are known. Basically one proceeds with trial and error. The only trick that we use in the algorithm below is that when we happen to have found an element whose order is a multiple of the requested order, we can simply take the proper power of that element.

Algorithm 3.2 (Finding an element of $\mathbb{Z}_n^*$ with given order)

Input: $n, m \in \mathbb{Z}$ with $n \ge 2$ and $m \ge 1$, $m \mid \varphi(n)$,
       $P = \{p \mid p \text{ prime},\ p \mid \varphi(n)\}$
Output: $a \in \mathbb{Z}_n^*$ with $\mathrm{ord}(a) = m$
Step 1: $b \leftarrow 1$
Step 2: while $m \nmid \mathrm{ord}(b)$ do
            pick a new $b$ (at random, or enumerating $2, 3, \ldots$)
            compute $\mathrm{ord}(b)$
Step 3: $a \leftarrow b^{\mathrm{ord}(b)/m}$, output $a$
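A possible rendering of Algorithm 3.2 is sketched below. It assumes the enumerating variant (candidates $1, 2, 3, \ldots$, skipping non-units) and returns `None` once all of $\mathbb{Z}_n^*$ has been exhausted; both choices, and all names, are assumptions made for illustration. The order computation repeats Algorithm 3.1 so that the block is self-contained.

```python
from math import gcd
from typing import Optional


def element_order(a: int, n: int, phi_n: int, prime_divisors: list[int]) -> int:
    """Order of a in Z_n^* (Algorithm 3.1), given phi(n) and its prime divisors."""
    m = phi_n
    for p in prime_divisors:
        while m % p == 0 and pow(a, m // p, n) == 1:
            m //= p
    return m


def find_element_of_order(
    n: int, m: int, phi_n: int, prime_divisors: list[int]
) -> Optional[int]:
    """Return some a in Z_n^* with ord(a) = m, or None if no such element exists."""
    for b in range(1, n):
        if gcd(b, n) != 1:
            continue  # b is not an element of Z_n^*
        order_b = element_order(b, n, phi_n, prime_divisors)
        if order_b % m == 0:
            # b has order a multiple of m; a suitable power has order exactly m
            return pow(b, order_b // m, n)
    return None


root_mod_7 = find_element_of_order(7, 6, 6, [2, 3])      # a primitive root mod 7
no_root_mod_15 = find_element_of_order(15, 8, 8, [2])    # None: Z_15^* is not cyclic
```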

This is a somewhat problematic algorithm, as it may decide that an element of the requested order does not exist only by exhausting the complete set $\mathbb{Z}_n^*$.

For finding primitive roots, apply the above algorithm with $m = \varphi(n)$, and of course only if $n$ is a power of an odd prime, or twice a power of an odd prime. In practice this algorithm then turns out to be pretty efficient.



Exercises

3.1. Make addition and multiplication tables for $\mathbb{Z}_7$ and $\mathbb{Z}_{15}^*$.

3.2. Compute $\varphi(n)$ for $n = 1, 2, \ldots, 25$.

3.3. Let $d \mid n$. How many $a \in \{1, 2, \ldots, n\}$ have $\gcd(a, n) = d$? (Hint: use Exercise 2.3.) Conclude that $\sum_{d \mid n} \varphi(d) = n$.

3.4. Let $n \ge 2$, and let $a$ be coprime to $n$. Prove the following: $a^k \equiv a^\ell \pmod n$ if and only if $\mathrm{ord}(a) \mid k - \ell$.

3.5.
(a) Find all primitive roots modulo 9 and modulo 11.
(b) Show that 14 is a primitive root (mod 29).
(c) Find (using (b) and the proof of Theorem 3.11) a primitive root (mod 841) ($841 = 29^2$).

3.6. Let $g$ be a primitive root modulo $n$. Show that $g^a$ is a primitive root modulo $n$ if and only if $\gcd(a, \varphi(n)) = 1$. Then show that the number of primitive roots (mod $n$) is $\varphi(\varphi(n))$.

3.7. If $\mathrm{ord}(a) = n$ and $d \mid n$ then show that $\mathrm{ord}(a^d) = n/d$.

3.8. Let $a \in \mathbb{Z}_n^*$ with $n > 2$. Assume that there exists a positive integer $e$ such that $a^e \equiv -1 \pmod n$. Then show that $\frac{1}{2}\mathrm{ord}(a)$ is the smallest such number.

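3.9. A set $\{A_1, A_2, \ldots, A_k\}$ in $\mathbb{Z}_n^*$ is called a basis if every element of $\mathbb{Z}_n^*$ can be written in exactly one way as $A_1^{t_1} A_2^{t_2} \cdots A_k^{t_k}$, with $0 \le t_i < \mathrm{ord}(A_i)$.

(a) Show that if a primitive root exists, it constitutes a basis on its own.

(b) Let $n$ be not divisible by 8, and let $n = p_1^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ be its factorization into primes. Let $g_i$ be a primitive root (mod $p_i^{e_i}$) for $i = 1, 2, \ldots, k$. Prove that a basis is given by
$$A_i \equiv \begin{cases} g_i \pmod{p_i^{e_i}} \\ 1 \pmod{p_j^{e_j}} \text{ for all } j \ne i \end{cases} \qquad (i = 1, 2, \ldots, k).$$

(c) Let $n$ be divisible by 8, and let $n = 2^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ be its factorization into primes, so with $e_1 \ge 3$. Let $g_i$ be a primitive root (mod $p_i^{e_i}$) for $i = 2, 3, \ldots, k$. Prove that a basis is given by
$$A_0 \equiv \begin{cases} -1 \pmod{2^{e_1}} \\ 1 \pmod{p_j^{e_j}} \text{ for all } j \ge 2 \end{cases}, \qquad A_1 \equiv \begin{cases} 5 \pmod{2^{e_1}} \\ 1 \pmod{p_j^{e_j}} \text{ for all } j \ge 2 \end{cases},$$
$$A_i \equiv \begin{cases} 1 \pmod{2^{e_1}} \\ g_i \pmod{p_i^{e_i}} \\ 1 \pmod{p_j^{e_j}} \text{ for all } j \ge 2,\ j \ne i \end{cases} \qquad (i = 2, 3, \ldots, k).$$

(d) Let $n$ have $n = 2^{e_1} p_2^{e_2} \cdots p_k^{e_k}$ as its factorization into primes, with $e_1 \ge 0$, and $e_i > 0$ for $i > 1$. Show that
$$\mathbb{Z}_n^* = \begin{cases} \mathbb{Z}_{p_2^{e_2}}^* \times \cdots \times \mathbb{Z}_{p_k^{e_k}}^* & \text{if } e_1 = 0, 1 \\ \mathbb{Z}_2^* \times \mathbb{Z}_{p_2^{e_2}}^* \times \cdots \times \mathbb{Z}_{p_k^{e_k}}^* & \text{if } e_1 = 2 \\ \mathbb{Z}_2^* \times \mathbb{Z}_{2^{e_1-2}}^* \times \mathbb{Z}_{p_2^{e_2}}^* \times \cdots \times \mathbb{Z}_{p_k^{e_k}}^* & \text{if } e_1 \ge 3 \end{cases}.$$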



Chapter 4

Quadratic Reciprocity

Introduction

General references for this chapter: [BS, Sections 5.7, 5.8, 5.9], [CP, Section 2.3], [Sh, Chapters 12, 13]. And for those preferring Dutch: [Be, Chapter 11], [Ke, Chapter 12].


• quadratic residues,

• the Legendre symbol,

• the Quadratic Reciprocity Law,

• the Jacobi symbol,

• modular square roots.

4.1 Quadratic residues and the Legendre symbol

In many applications, also in cryptography, it is useful to know which elements of $\mathbb{Z}_n^*$ can be squares, and which cannot.

Let $a \in \mathbb{Z}_n^*$. Then we say that $a$ is a quadratic residue (mod $n$) if there exists a $b \in \mathbb{Z}_n^*$ such that $a \equiv b^2 \pmod n$. If such $b$ does not exist, then we say that $a$ is a quadratic nonresidue (mod $n$).

Example: the quadratic residues modulo 11 are 1, 3, 4, 5, 9, and the quadratic nonresidues are 2, 6, 7, 8, 10.
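The example modulo 11 can be reproduced by squaring every nonzero element. The variable names below are illustrative.

```python
p = 11
# Quadratic residues are exactly the squares of the nonzero elements.
residues = sorted({b * b % p for b in range(1, p)})
nonresidues = [a for a in range(1, p) if a not in residues]
# residues    == [1, 3, 4, 5, 9]
# nonresidues == [2, 6, 7, 8, 10]
```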

When $n$ is not prime, then a quadratic residue modulo $n$ necessarily is a quadratic residue modulo any prime divisor of $n$. We therefore proceed with studying quadratic residues modulo an odd prime $p$ (even primes are not that interesting), and as quadratic residues modulo composite moduli turn out to be not too useful, we will further neglect them.

Lemma 4.1 Let $p$ be an odd prime.
(a) There are exactly $\frac{1}{2}(p - 1)$ quadratic residues, and $\frac{1}{2}(p - 1)$ quadratic nonresidues (mod $p$).
(b) Let $a$ be such that $p \nmid a$. Then $a$ is a quadratic residue if and only if $a^{\frac{1}{2}(p-1)} \equiv 1 \pmod p$, and $a$ is a quadratic nonresidue if and only if $a^{\frac{1}{2}(p-1)} \equiv -1 \pmod p$.



Proof. (a) Let $g$ be a primitive root (mod $p$). Then: $g^a$ is a quadratic residue (mod $p$) if and only if there is a $b$ such that $g^a \equiv (g^b)^2 = g^{2b} \pmod p$, if and only if $a \equiv 2b \pmod{p - 1}$ (Exercise 3.4), if and only if $a$ is even (because $p - 1$ is even). Clearly the quadratic residues are precisely $1, g^2, g^4, \ldots, g^{p-3}$.

An alternative proof: Let $g$ be a primitive root (mod $p$), so $\mathbb{Z}_p^* = \{1, g, g^2, \ldots, g^{p-2}\}$. Any quadratic residue $a$ satisfies $a \equiv b^2 \pmod p$ for some $b$, so by Fermat, $a^{(p-1)/2} \equiv b^{p-1} \equiv 1 \pmod p$. Theorem 3.7 says that there are precisely $\frac{1}{2}(p - 1)$ roots of $x^{(p-1)/2} - 1$. Clearly $1, g^2, g^4, \ldots, g^{p-3}$ are $\frac{1}{2}(p - 1)$ different quadratic residues, so these are all. Hence $g, g^3, \ldots, g^{p-2}$ are the quadratic nonresidues, and their number is $\frac{1}{2}(p - 1)$.

(b) Let $c = a^{\frac{1}{2}(p-1)} \pmod p$. By Fermat’s Theorem 3.3, $c$ satisfies $c^2 = a^{p-1} \equiv 1 \pmod p$. It follows that $p \mid c^2 - 1 = (c - 1)(c + 1)$, and because $p$ is prime, $c \equiv \pm 1 \pmod p$. Let $g$ be a primitive root (mod $p$). There exists a $y \in \mathbb{N}$ such that $a \equiv g^y \pmod p$. Hence $c \equiv g^{\frac{1}{2}(p-1)y} \equiv 1 \pmod p$, if and only if $\mathrm{ord}(g) \mid \frac{1}{2}(p - 1)y$, if and only if $p - 1 \mid \frac{1}{2}(p - 1)y$, if and only if $y$ is even, if and only if $a$ is a quadratic residue. □

In particular, the above lemma shows that $-1$ is a quadratic residue modulo $p$ if $p \equiv 1 \pmod 4$, and $-1$ is a quadratic nonresidue modulo $p$ if $p \equiv 3 \pmod 4$.

A convenient shorthand notation expressing the quadratic residuosity of a number $a$ modulo a prime $p$ is the Legendre symbol, defined (for all $a \in \mathbb{Z}$) as
$$\left(\frac{a}{p}\right) = \begin{cases} 1 & \text{if } a \text{ is a quadratic residue (mod } p), \\ -1 & \text{if } a \text{ is a quadratic nonresidue (mod } p), \\ 0 & \text{if } p \mid a. \end{cases}$$

Elementary properties of the Legendre symbol are given in the next result.

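Lemma 4.2 Let $p$ be an odd prime.

(a) $\left(\frac{a}{p}\right) \equiv a^{\frac{1}{2}(p-1)} \pmod p$.

(b) $\left(\frac{a}{p}\right) = \left(\frac{b}{p}\right)$ if $a \equiv b \pmod p$.

(c) $\left(\frac{ab}{p}\right) = \left(\frac{a}{p}\right)\left(\frac{b}{p}\right)$.

(d) If $p \nmid a$ then $\left(\frac{a^2}{p}\right) = 1$.

Proof. (a) is Lemma 4.1(b), (b) is trivial, (c) follows at once from (a), and (d) is trivial. □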
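Lemma 4.2(a) gives an immediate, if not the cheapest, way to evaluate a Legendre symbol by one modular exponentiation. The sketch below assumes $p$ is an odd prime (this is not checked); the function name is illustrative.

```python
def legendre(a: int, p: int) -> int:
    """Legendre symbol (a/p) for an odd prime p, via a^((p-1)/2) mod p."""
    c = pow(a, (p - 1) // 2, p)     # c is 0, 1 or p - 1
    return -1 if c == p - 1 else c


symbols_mod_11 = [legendre(a, 11) for a in range(11)]
# symbols_mod_11 == [0, 1, -1, 1, 1, 1, -1, -1, -1, 1, -1]
```

The value list agrees with the residues 1, 3, 4, 5, 9 modulo 11, and the multiplicativity of part (c) can be spot-checked, e.g. `legendre(6, 11) == legendre(2, 11) * legendre(3, 11)`.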

4.2 The Quadratic Reciprocity Law

A less elementary but also very useful property of the Legendre symbol is the so-called Quadratic Reciprocity Law, which we will prove below (it is kind of a sport for number theorists to come up with new proofs; more than 200 different proofs are known¹; we can safely take the risk to believe that the law is true).

Theorem 4.3 (Quadratic Reciprocity Law) Let $p, q$ be distinct odd primes. If $p \equiv q \equiv 3 \pmod 4$ then $\left(\frac{p}{q}\right) = -\left(\frac{q}{p}\right)$, otherwise $\left(\frac{p}{q}\right) = \left(\frac{q}{p}\right)$.

¹ See http://www.rzuser.uni-heidelberg.de/~hb3/fchrono.html.

Important special values are given in the next lemma.

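Lemma 4.4 Let $p$ be an odd prime.

(a) $\left(\frac{-1}{p}\right) = 1$ if $p \equiv 1 \pmod 4$, $\left(\frac{-1}{p}\right) = -1$ if $p \equiv 3 \pmod 4$.

(b) $\left(\frac{2}{p}\right) = 1$ if $p \equiv 1, 7 \pmod 8$, $\left(\frac{2}{p}\right) = -1$ if $p \equiv 3, 5 \pmod 8$.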

The remainder of this section is devoted to proving these results. We start with an auxiliary lemma due to Gauss.

Let $a \in \mathbb{Z}_p^*$, for an odd prime $p$. We look at the set $a, 2a, 3a, \ldots, \frac{p-1}{2}a$, and view them as integers (mod $p$), where we take representatives in the set
$$\left\{-\frac{p-1}{2}, -\frac{p-3}{2}, \ldots, -1, 0, 1, \ldots, \frac{p-3}{2}, \frac{p-1}{2}\right\}.$$
Those in $\left\{-\frac{p-1}{2}, \ldots, -1\right\}$ are called negative representatives, and those in $\left\{1, \ldots, \frac{p-1}{2}\right\}$ are called positive representatives.

The numbers $\pm a, \pm 2a, \ldots, \pm\frac{p-1}{2}a$ are (mod $p$) all nonzero and all different. Hence
$$\left\{\pm a, \pm 2a, \ldots, \pm\frac{p-1}{2}a\right\} = \left\{\pm 1, \pm 2, \ldots, \pm\frac{p-1}{2}\right\} \pmod p.$$
For each $s = 1, 2, \ldots, \frac{p-1}{2}$ the pair $\pm sa$ has as representatives numbers $\pm u_s$, where for $u_s$ we take the positive representative of the pair, i.e. $u_s \in \left\{1, 2, \ldots, \frac{p-1}{2}\right\}$. The observation that
$$\left\{u_1, u_2, \ldots, u_{(p-1)/2}\right\} = \left\{1, 2, \ldots, \frac{p-1}{2}\right\}$$
is used below.

Lemma 4.5 (Gauss’ Lemma) Let $a \in \mathbb{Z}_p^*$, for an odd prime $p$, and let $k_a$ be the number of $a, 2a, \ldots, \frac{p-1}{2}a$ that have negative representatives. Then $\left(\frac{a}{p}\right) = (-1)^{k_a}$.

Proof. From each pair $\pm u_s$ one is the representative of $sa$, and its sign tells us whether it is a positive or negative representative. Multiplying all these representatives and using the above observation, we find
$$\left(\frac{p-1}{2}\right)!\; a^{\frac{p-1}{2}} = \prod_{s=1}^{\frac{p-1}{2}} sa \equiv \prod_{s=1}^{\frac{p-1}{2}} \pm u_s = (-1)^{k_a} \left(\frac{p-1}{2}\right)! \pmod p,$$
and the result follows at once by Lemma 4.1(b). □
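Gauss’ Lemma can be verified numerically by counting negative representatives directly. In the sketch below a multiple $sa$ has a negative representative exactly when its least nonnegative residue exceeds $\frac{p-1}{2}$; the function names are illustrative, and $p$ is assumed to be an odd prime not dividing $a$.

```python
def gauss_count(a: int, p: int) -> int:
    """k_a: how many of a, 2a, ..., ((p-1)/2)a have a negative representative."""
    half = (p - 1) // 2
    return sum(1 for s in range(1, half + 1) if (s * a) % p > half)


def legendre_by_gauss(a: int, p: int) -> int:
    """Legendre symbol (a/p) computed as (-1)^(k_a)."""
    return (-1) ** gauss_count(a, p)


k_2 = gauss_count(2, 11)    # the multiples 6, 8, 10 lie above 5, so k_2 = 3
```

Comparing `legendre_by_gauss(a, p)` with Euler’s criterion $a^{(p-1)/2} \bmod p$ for all $a$ and a few small primes gives a quick consistency check.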

We can now find the Legendre symbols $\left(\frac{-1}{p}\right)$ and $\left(\frac{2}{p}\right)$.

Proof of Lemma 4.4. (a) follows from Lemma 4.2(a). To prove (b) we have, by Gauss’ Lemma 4.5, to count the number of $2, 4, \ldots, p - 1$ that have negative representatives. This is just the number $k_2$ of even numbers between $\frac{1}{2}p$ and $p$. They are:

$p = 8\ell + 1$: $4\ell + 2, \ldots, 8\ell$, so $k_2 = 2\ell$,
$p = 8\ell + 3$: $4\ell + 2, \ldots, 8\ell + 2$, so $k_2 = 2\ell + 1$,
$p = 8\ell + 5$: $4\ell + 4, \ldots, 8\ell + 4$, so $k_2 = 2\ell + 1$,
$p = 8\ell + 7$: $4\ell + 4, \ldots, 8\ell + 6$, so $k_2 = 2\ell + 2$.

The result now follows by Lemma 4.1(b). □



To prove the Quadratic Reciprocity Law, we follow an argument by Eisenstein. Note that in Gauss’ Lemma we are interested in $k_a$ (the number of negative representatives) only (mod 2). And note that if $sa$ has a positive representative $u_s$ then $sa = \left\lfloor \frac{sa}{p} \right\rfloor p + u_s$, whereas if $sa$ has a negative representative $-u_s$ then $sa = \left\lfloor \frac{sa}{p} \right\rfloor p + p - u_s$. It seems that adding all $sa$’s may yield information about $k_a$, and that’s why we introduce
$$S(a, p) = \sum_{s=1}^{\frac{p-1}{2}} \left\lfloor \frac{sa}{p} \right\rfloor.$$

Let $p, q$ be distinct odd prime numbers. We apply the above with $a = q$.

Lemma 4.6 (Eisenstein’s First Lemma)
$$S(q, p) \equiv k_q \pmod 2.$$

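Proof. The observation just above Gauss’ Lemma 4.5 says that $\sum_{s=1}^{\frac{p-1}{2}} s = \sum_{s=1}^{\frac{p-1}{2}} u_s$. Now we have
$$q \sum_{s=1}^{\frac{p-1}{2}} s = \sum_{s=1}^{\frac{p-1}{2}} sq = \sum_{s=1}^{\frac{p-1}{2}} \left\lfloor \frac{sq}{p} \right\rfloor p + k_q p + \sum_{s=1}^{\frac{p-1}{2}} \pm u_s = p\,(S(q, p) + k_q) + \sum_{s=1}^{\frac{p-1}{2}} \pm u_s.$$
Modulo 2 we have $p \equiv 1$, $q \equiv 1$, and $\pm 1 \equiv 1$, so we get
$$S(q, p) + k_q \equiv \sum_{s=1}^{\frac{p-1}{2}} s - \sum_{s=1}^{\frac{p-1}{2}} u_s = 0 \pmod 2,$$
and we’re done. □

Note that together with Gauss’ Lemma we now have $\left(\frac{q}{p}\right) = (-1)^{S(q,p)}$.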

Lemma 4.7 (Eisenstein’s Second Lemma)
$$S(p, q) + S(q, p) = \frac{p-1}{2} \cdot \frac{q-1}{2}.$$

Proof of Lemma 4.7. The figure below shows a rectangle with vertices $(0, 0)$, $(\frac{1}{2}p, 0)$, $(0, \frac{1}{2}q)$, $(\frac{1}{2}p, \frac{1}{2}q)$, in which we count lattice points, i.e. points with integral coordinates. There clearly are $\frac{p-1}{2} \cdot \frac{q-1}{2}$ such points.

The rectangle is divided into two triangles by the diagonal given by $py = qx$. We will count the lattice points inside the two triangles, and show that they equal $S(p, q)$, resp. $S(q, p)$.

[Figure: the rectangle with its diagonal $py = qx$, the upper left triangle in blue and the lower right triangle in green.]

Note that there are no lattice points on the edges of the triangles (i.e. also not on the diagonal).

The part of the $s$th row inside the upper left blue triangle is given by $y = s$ and $1 \le x < \frac{p}{q}s$. The number of lattice points on the $s$th row inside the blue triangle therefore is $\left\lfloor \frac{sp}{q} \right\rfloor$. So inside the blue triangle there are exactly $\sum_{s=1}^{\frac{q-1}{2}} \left\lfloor \frac{sp}{q} \right\rfloor$, i.e. $S(p, q)$ points.

The part of the $s$th column inside the lower right green triangle is given by $x = s$ and $1 \le y < \frac{q}{p}s$. The number of lattice points on the $s$th column inside the green triangle therefore is $\left\lfloor \frac{sq}{p} \right\rfloor$. So inside the green triangle there are exactly $\sum_{s=1}^{\frac{p-1}{2}} \left\lfloor \frac{sq}{p} \right\rfloor$, i.e. $S(q, p)$ points. □

Proof of Theorem 4.3. Combining Gauss’ Lemma 4.5 with Eisenstein’s Lemmas 4.6 and 4.7 we see that
$$\left(\frac{p}{q}\right)\left(\frac{q}{p}\right) = (-1)^{S(p,q) + S(q,p)} = (-1)^{\frac{p-1}{2} \cdot \frac{q-1}{2}},$$
and this suffices. □

4.3 Another proof

As said there are many proofs of the Quadratic Reciprocity Law. Here is another neat one, discovered in 2008 by Wouter Castryck from Leuven.

Let $p, q$ be distinct odd primes. For odd $n \in \mathbb{N}$, let $N_n$ be the number of solutions $(x_1, \ldots, x_n) \in \mathbb{Z}_q^n$ of
$$x_1^2 - x_2^2 + x_3^2 - \cdots + x_n^2 \equiv 1 \pmod q. \qquad (4.1)$$

The idea is to count $N_n$ in two different ways.

The number of solutions of (4.1) with $x_1 \equiv x_2 \pmod q$ is $q N_{n-2}$, because there are $q$ possibilities for $x_1 (\equiv x_2 \pmod q)$ and $N_{n-2}$ for $(x_3, \ldots, x_n)$. To count the number of solutions of (4.1) with $x_1 \not\equiv x_2 \pmod q$, note that there are $q^{n-2}$ possibilities for $(x_3, \ldots, x_n)$, and for any $c$ (in fact, $c = 1 - x_3^2 + \cdots - x_n^2$) we count the number of solutions of $x_1^2 - x_2^2 \equiv c \pmod q$. By writing $y = x_1 - x_2 \not\equiv 0 \pmod q$ and $z = x_1 + x_2$ (the mapping from $(x_1, x_2)$ to $(y, z)$ is one to one) we see that each possible $y$ leads to one solution: $z \equiv c y^{-1} \pmod q$, so there are $q - 1$ solutions of $x_1^2 - x_2^2 \equiv c \pmod q$. Hence the number of solutions of (4.1) with $x_1 \not\equiv x_2 \pmod q$ is $q^{n-2}(q - 1)$, and for the total number of solutions of (4.1) we thus find $N_n = q N_{n-2} + q^{n-2}(q - 1)$. So
$$N_n - q^{n-1} = q\left(N_{n-2} - q^{n-3}\right) = q^2\left(N_{n-4} - q^{n-5}\right) = \cdots = q^{(n-1)/2}(N_1 - 1),$$
and with $N_1 = 2$ this proves the expression
$$N_n = q^{n-1} + q^{(n-1)/2}. \qquad (4.2)$$
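Formula (4.2) can be checked by exhaustive enumeration for small $q$ and $n$. The sketch below counts solutions of (4.1) directly; the function name is illustrative, and the running time grows as $q^n$, so only tiny parameters are feasible.

```python
from itertools import product


def count_solutions(n: int, q: int) -> int:
    """Number of (x_1, ..., x_n) in Z_q^n with x_1^2 - x_2^2 + ... + x_n^2 ≡ 1 (mod q)."""
    total = 0
    for xs in product(range(q), repeat=n):
        # Signs alternate +, -, +, ...; n is odd, so the last term carries a plus sign.
        value = sum((-1) ** i * x * x for i, x in enumerate(xs))
        if value % q == 1:
            total += 1
    return total


n, q = 3, 5
brute_force = count_solutions(n, q)
closed_form = q ** (n - 1) + q ** ((n - 1) // 2)
```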

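Another way of counting $N_n$ is as follows. The number of solutions $x \in \mathbb{Z}_q$ of $x^2 \equiv t \pmod q$ is equal to $1 + \left(\frac{t}{q}\right)$. So
$$N_n = \sum_{\substack{t_1, \ldots, t_n \in \mathbb{Z}_q \\ t_1 + \cdots + t_n \equiv 1 \pmod q}} \#\{x_1 \in \mathbb{Z}_q \mid x_1^2 \equiv t_1 \pmod q\}\; \#\{x_2 \in \mathbb{Z}_q \mid x_2^2 \equiv -t_2 \pmod q\} \cdots \#\{x_n \in \mathbb{Z}_q \mid x_n^2 \equiv t_n \pmod q\}$$
$$= \sum_{\substack{t_1, \ldots, t_n \in \mathbb{Z}_q \\ t_1 + \cdots + t_n \equiv 1 \pmod q}} \left(1 + \left(\frac{t_1}{q}\right)\right)\left(1 + \left(\frac{-t_2}{q}\right)\right) \cdots \left(1 + \left(\frac{t_n}{q}\right)\right).$$

In the expanded product, all terms, except the first and last, are of the type
$$\sum_{\substack{t_1, \ldots, t_n \in \mathbb{Z}_q \\ t_1 + \cdots + t_n \equiv 1 \pmod q}} \left(\frac{\pm t_{i_1}}{q}\right)\left(\frac{\pm t_{i_2}}{q}\right) \cdots \left(\frac{\pm t_{i_r}}{q}\right) = \pm q^{n-1-r} \sum_{t_{i_1}, \ldots, t_{i_r} \in \mathbb{Z}_q} \left(\frac{t_{i_1}}{q}\right)\left(\frac{t_{i_2}}{q}\right) \cdots \left(\frac{t_{i_r}}{q}\right)$$
with $0 < r < n$. Because $\frac{1}{2}(q - 1)$ of the $\left(\frac{t}{q}\right)$ for $t \in \mathbb{Z}_q$ are $+1$, an equal number are $-1$, and one is 0, we have $\sum_{t \in \mathbb{Z}_q} \left(\frac{t}{q}\right) = 0$, so all the terms above vanish:
$$\sum_{t_{i_1}, \ldots, t_{i_r} \in \mathbb{Z}_q} \left(\frac{t_{i_1}}{q}\right) \cdots \left(\frac{t_{i_r}}{q}\right) = \left(\sum_{t_{i_1} \in \mathbb{Z}_q} \left(\frac{t_{i_1}}{q}\right)\right) \cdots \left(\sum_{t_{i_r} \in \mathbb{Z}_q} \left(\frac{t_{i_r}}{q}\right)\right) = 0.$$

From the expanded product only the first and last terms are left. They respectively are
$$\sum_{\substack{t_1, \ldots, t_n \in \mathbb{Z}_q \\ t_1 + \cdots + t_n \equiv 1 \pmod q}} 1 = q^{n-1} \quad \text{and} \quad \sum_{\substack{t_1, \ldots, t_n \in \mathbb{Z}_q \\ t_1 + \cdots + t_n \equiv 1 \pmod q}} \left(\frac{t_1 (-t_2) \cdots t_n}{q}\right).$$
Using $\left(\frac{-1}{q}\right) = (-1)^{(q-1)/2}$ we thus find
$$N_n = q^{n-1} + (-1)^{(q-1)(n-1)/4} \sum_{\substack{t_1, \ldots, t_n \in \mathbb{Z}_q \\ t_1 + \cdots + t_n \equiv 1 \pmod q}} \left(\frac{t_1 t_2 \cdots t_n}{q}\right). \qquad (4.3)$$

Expressions (4.2) and (4.3) imply that
$$\sum_{\substack{t_1, \ldots, t_n \in \mathbb{Z}_q \\ t_1 + \cdots + t_n \equiv 1 \pmod q}} \left(\frac{t_1 t_2 \cdots t_n}{q}\right) = (-1)^{(n-1)(q-1)/4}\, q^{(n-1)/2}.$$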

Now we take $n = p$ prime. If $t_1 \equiv t_2 \equiv \cdots \equiv t_p \pmod q$ then the condition $t_1 + \cdots + t_p \equiv 1 \pmod q$ implies $t_1 \equiv \cdots \equiv t_p \equiv p^{-1} \pmod q$. By taking cyclic shifts, all other $(t_1, t_2, \ldots, t_p)$ fall into disjoint sets of $p$ elements each, such that all elements in one set have identical $t_1 t_2 \cdots t_p$ (note that this is in accordance with Fermat’s theorem $q^{p-1} \equiv 1 \pmod p$). So taking the sum (mod $p$) all the contributions of those sets vanish, and we arrive at
$$\left(\frac{p^{-p}}{q}\right) \equiv (-1)^{(p-1)(q-1)/4}\, q^{(p-1)/2} \pmod p.$$

The Quadratic Reciprocity Law follows by noting that $\left(\frac{p^{-p}}{q}\right) = \left(\frac{p}{q}\right)$, that $q^{(p-1)/2} \equiv \left(\frac{q}{p}\right) \pmod p$, and that $(-1)^{(p-1)(q-1)/4}$ is just a cumbersome way of writing 1 if $p \equiv 1 \pmod 4$ or $q \equiv 1 \pmod 4$, and $-1$ if $p \equiv q \equiv 3 \pmod 4$.

4.4 The Jacobi symbol

With Lemma 4.2, Theorem 4.3 and Lemma 4.4 at our disposal we have a number of powerful tools to efficiently compute Legendre symbols, without having to do the expensive modular exponentiation $a^{\frac{1}{2}(p-1)} \pmod p$. The idea is to reduce $a$ modulo $p$ whenever $a > p$, to factor $a$ completely into its prime divisors, and to use quadratic reciprocity to move to smaller primes in the ’denominator’.

For example, say we want to know whether 70 is a quadratic residue modulo 107. Then we factor 70, and by the quadratic reciprocity law we have
$$\left(\frac{70}{107}\right) = \left(\frac{2}{107}\right)\left(\frac{5}{107}\right)\left(\frac{7}{107}\right) = (-1)\left(\frac{107}{5}\right)\left(-\left(\frac{107}{7}\right)\right).$$
Now we reduce 107 (mod 5) and (mod 7), and we get
$$\left(\frac{70}{107}\right) = \left(\frac{2}{5}\right)\left(\frac{2}{7}\right) = (-1) \cdot 1 = -1.$$
Hence 70 is a quadratic nonresidue modulo 107.

The main problem with the above method is that factoring is usually difficult. Therefore the Legendre symbol has been generalised to the so-called Jacobi symbol $\left(\frac{a}{n}\right)$, in which $n$ is not necessarily prime anymore. This is more convenient to compute.

For integers $m, n$ with $n \ge 3$ odd we define the Jacobi symbol $\left(\frac{m}{n}\right)$ in such a way that it has useful multiplicative properties. Namely, when the prime factorization of $n$ is given by $n = \prod_{i=1}^{k} p_i^{e_i}$, then we define
$$\left(\frac{m}{n}\right) = \prod_{i=1}^{k} \left(\frac{m}{p_i}\right)^{e_i},$$
where $\left(\frac{m}{p_i}\right)$ is the Legendre symbol.

You should however notice that the Jacobi symbol $\left(\frac{m}{n}\right)$ has no direct relation anymore to $m$ being a quadratic residue or nonresidue modulo $n$ (see Exercise 4.2). The only reason to introduce it is that it eases the computation of Legendre symbols.

The Jacobi symbol has the following properties.

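Lemma 4.8 Let $k, n$ be odd integers $\ge 3$, and let $m$ be any integer.

(a) If $n$ is prime then the Jacobi symbol $\left(\frac{m}{n}\right)$ equals the Legendre symbol $\left(\frac{m}{n}\right)$.

(b) If $\gcd(m, n) \ne 1$ then $\left(\frac{m}{n}\right) = 0$. And if $\gcd(m, n) = 1$ then $\left(\frac{m}{n}\right) = \pm 1$.

(c) If $\gcd(m, n) = 1$ then $\left(\frac{m^2}{n}\right) = 1$.

(d) $\left(\frac{m}{n}\right) = \left(\frac{\ell}{n}\right)$ if $m \equiv \ell \pmod n$.

(e) $\left(\frac{m\ell}{n}\right) = \left(\frac{m}{n}\right)\left(\frac{\ell}{n}\right)$.

(f) $\left(\frac{m}{kn}\right) = \left(\frac{m}{k}\right)\left(\frac{m}{n}\right)$.

(g) [Quadratic Reciprocity Law] If $k \equiv n \equiv 3 \pmod 4$ then $\left(\frac{k}{n}\right) = -\left(\frac{n}{k}\right)$, otherwise $\left(\frac{k}{n}\right) = \left(\frac{n}{k}\right)$.

(h) $\left(\frac{-1}{n}\right) = 1$ if $n \equiv 1 \pmod 4$, $\left(\frac{-1}{n}\right) = -1$ if $n \equiv 3 \pmod 4$.

(i) $\left(\frac{2}{n}\right) = 1$ if $n \equiv 1, 7 \pmod 8$, $\left(\frac{2}{n}\right) = -1$ if $n \equiv 3, 5 \pmod 8$.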

Proof. Parts (a) to (f) follow easily from the corresponding properties of the Legendre symbol.

For (g), let $n$ have the prime factorization $n = \prod_{i=1}^{r} p_i^{e_i}$, and let $k$ have the prime factorization $k = \prod_{j=1}^{s} q_j^{f_j}$. Without loss of generality we may assume that the set of the $p_i$ is disjoint from the set of the $q_j$, ensuring that all Legendre and Jacobi symbols below are nonzero.

Let $N = \#\{i \mid 1 \le i \le r \text{ and } p_i \equiv 3 \pmod 4 \text{ and } e_i \text{ is odd}\}$, and similarly let $K = \#\{j \mid 1 \le j \le s \text{ and } q_j \equiv 3 \pmod 4 \text{ and } f_j \text{ is odd}\}$. It easily follows that $n \equiv 1 \pmod 4$ if $N$ is even, and $n \equiv 3 \pmod 4$ if $N$ is odd, and similarly for $k$ and $K$.

The quadratic reciprocity law for Legendre symbols shows that $\left(\left(\frac{p_i}{q_j}\right)\left(\frac{q_j}{p_i}\right)\right)^{e_i f_j} = -1$ if and only if $p_i \equiv q_j \equiv 3 \pmod 4$ and both $e_i, f_j$ are odd. Now (e) and (f) imply
$$\left(\frac{n}{k}\right)\left(\frac{k}{n}\right) = \prod_{i=1}^{r} \prod_{j=1}^{s} \left(\left(\frac{p_i}{q_j}\right)\left(\frac{q_j}{p_i}\right)\right)^{e_i f_j} = (-1)^{NK},$$
which equals $-1$ if and only if both $N$ and $K$ are odd, which happens if and only if $n \equiv k \equiv 3 \pmod 4$.

Parts (h) and (i) are left as an exercise (Exercise 4.6). □

Now we can use the above properties to compute the Jacobi (and hence the Legendre) symbol without having to factor numbers (other than splitting off factors 2, which is easy). For example, let us redo the computation of $\left(\frac{70}{107}\right)$. This is now done as follows:
$$\left(\frac{70}{107}\right) = \left(\frac{2}{107}\right)\left(\frac{35}{107}\right) = (-1)\left(-\left(\frac{107}{35}\right)\right) = \left(\frac{2}{35}\right) = -1.$$
Note that we did not have to factor 35.

The following algorithm now should be clear. It can be used to compute Legendre symbols by using Jacobi symbols.



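Algorithm 4.1 (Jacobi Symbol)

Input: $m, n \in \mathbb{Z}$ with $n \ge 3$ odd
Output: $\left(\frac{m}{n}\right)$
Step 1: $m' \leftarrow m \pmod n$, $n' \leftarrow n$, $j \leftarrow 1$
Step 2: while $m' \ne 0$ and $n' > 1$ do
            while $m'$ is even do
                $m' \leftarrow m'/2$
                if $n' \equiv 3$ or $5 \pmod 8$ then $j \leftarrow -j$
            $(m', n') \leftarrow (n', m')$
            if $m' \equiv n' \equiv 3 \pmod 4$ then $j \leftarrow -j$
            if $n' > 1$ then $m' \leftarrow m' \pmod{n'}$
Step 3: if $m' = 0$ then $j \leftarrow 0$
        output $j$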
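Algorithm 4.1 maps onto Python almost verbatim. The sketch below is one possible transcription; the function name `jacobi` and the input validation are additions for illustration.

```python
def jacobi(m: int, n: int) -> int:
    """Jacobi symbol (m/n) for odd n >= 3, following Algorithm 4.1."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer >= 3")
    m, j = m % n, 1
    while m != 0 and n > 1:
        while m % 2 == 0:
            m //= 2
            if n % 8 in (3, 5):      # (2/n) = -1 exactly when n ≡ 3, 5 (mod 8)
                j = -j
        m, n = n, m                  # swap, then apply quadratic reciprocity
        if m % 4 == 3 and n % 4 == 3:
            j = -j
        if n > 1:
            m %= n
    return 0 if m == 0 else j


symbol = jacobi(70, 107)    # -1, matching the worked example above
```

Note that `jacobi(2, 15)` evaluates to 1 although 2 is not a square modulo 15, which illustrates the warning that the Jacobi symbol no longer decides quadratic residuosity for composite $n$.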

The complexity of this algorithm is comparable to the Euclidean Algorithm, which should not come as a surprise, since the algorithm has a somewhat similar structure: at each step one number is reduced modulo the other, and then they are swapped. The bit complexity is easily seen to be $O((\log m)(\log n))$.

To avoid long division in the reduction step it is also possible to use a binary variant of the above algorithm.

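Algorithm 4.2 (Jacobi Symbol, binary method)

Input: $m, n \in \mathbb{Z}$ with $n \ge 3$ odd
Output: $\left(\frac{m}{n}\right)$
Step 1: $m' \leftarrow |m|$, $n' \leftarrow n$
        if $m < 0$ and $n \equiv 3 \pmod 4$ then $j \leftarrow -1$ else $j \leftarrow 1$
        while $m' > 0$ and $m'$ is even do
            $m' \leftarrow m'/2$
            if $n' \equiv 3$ or $5 \pmod 8$ then $j \leftarrow -j$
Step 2: while $m' \ne 0$ and $n' > 1$ do
            while $m' \ge n'$ do
                $m' \leftarrow m' - n'$
                while $m' > 0$ and $m'$ is even do
                    $m' \leftarrow m'/2$
                    if $n' \equiv 3$ or $5 \pmod 8$ then $j \leftarrow -j$
            $(m', n') \leftarrow (n', m')$
            if $m' \equiv n' \equiv 3 \pmod 4$ then $j \leftarrow -j$
Step 3: if $m' = 0$ or $n' = 0$ then $j \leftarrow 0$
        output $j$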
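A sketch of the binary method in Python follows. It uses only subtraction, halving and comparisons in the main loop. Two points are assumptions of this sketch rather than statements from the notes: the body of the halving loop is taken to be the same as in Algorithm 4.1, and the final test returns 0 whenever the loop ends with the second number different from 1 (which covers the case $\gcd(m, n) > 1$). The function names are illustrative.

```python
def jacobi_binary(m: int, n: int) -> int:
    """Jacobi symbol (m/n) for odd n >= 3 using subtraction and halving only."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be an odd integer >= 3")
    j = -1 if (m < 0 and n % 4 == 3) else 1    # sign contributed by (-1/n)
    m = abs(m)

    def strip_twos(m: int, n: int, j: int) -> tuple[int, int]:
        """Remove factors 2 from m, flipping j when (2/n) = -1."""
        while m > 0 and m % 2 == 0:
            m //= 2
            if n % 8 in (3, 5):
                j = -j
        return m, j

    m, j = strip_twos(m, n, j)
    while m != 0 and n > 1:
        while m >= n:
            m -= n
            m, j = strip_twos(m, n, j)
        m, n = n, m                            # swap and apply reciprocity
        if m % 4 == 3 and n % 4 == 3:
            j = -j
    # The loop ends with n == 1 exactly when the original inputs were coprime.
    return j if n == 1 else 0


symbol = jacobi_binary(70, 107)
```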

To conclude this section we discuss the problem of finding numbers that are quadratic residues or quadratic nonresidues.

To find for given prime modulus p a (random) quadratic residue is trivial: take a random number and square it. To find a (random) quadratic nonresidue is easy when p ≡ 3 (mod 4) or p ≡ 5 (mod 8): simply take a random quadratic residue and multiply it by −1 resp. 2.

Given the above efficient algorithms, generating a quadratic nonresidue modulo p when p ≡ 1 (mod 8) is in practice possible by trial and error: simply take a random element and compute the Legendre symbol, repeat this until the Legendre symbol is −1. On average one should succeed after 2 trials.
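Both ways of producing a nonresidue can be sketched as follows. To keep the block self-contained, the Legendre symbol is computed here with Euler's criterion (a modular exponentiation) rather than with the Jacobi algorithm of this section; that substitution, and all function names, are choices of this sketch.

```python
import random


def legendre(a: int, p: int) -> int:
    """Legendre symbol via Euler's criterion a^((p-1)/2) mod p, for an odd prime p."""
    r = pow(a, (p - 1) // 2, p)
    return -1 if r == p - 1 else r


def random_nonresidue(p: int, rng: random.Random) -> int:
    """Trial and error: draw random elements until the Legendre symbol is -1."""
    while True:
        g = rng.randrange(1, p)
        if legendre(g, p) == -1:
            return g


def nonresidue_by_formula(p: int, rng: random.Random) -> int:
    """For p = 3 (mod 4) or p = 5 (mod 8): a random residue times -1 resp. 2."""
    square = pow(rng.randrange(1, p), 2, p)
    if p % 4 == 3:
        return (-square) % p
    if p % 8 == 5:
        return (2 * square) % p
    raise ValueError("the shortcut only applies when p is not 1 (mod 8)")


rng = random.Random(1)               # seeded only to make the sketch reproducible
print(random_nonresidue(41, rng))    # 41 = 1 (mod 8), so trial and error is used
```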



4.5 Modular Square Roots

With the Legendre symbol and the algorithm for computing it we have a way to find out whether a given number a is a quadratic residue (mod p) or not, but we do not yet have a method of finding the modular square root b such that $b^2 \equiv a \pmod p$.

For an odd prime modulus p we now give the method of computing the square roots of a, i.e. of computing b such that $b^2 \equiv a \pmod p$. Note that if a is a nonzero quadratic residue modulo a prime p, then there are 2 square roots (mod p), since $x^2 \equiv b^2 \pmod p$ implies $p \mid x^2 - b^2 = (x-b)(x+b)$, so $p \mid x-b$ or $p \mid x+b$, so the square roots are $x \equiv \pm b \pmod p$.

First we treat the cases where $p \not\equiv 1 \pmod 8$, as these turn out to be easy.

Lemma 4.9 Let p be an odd prime, and $a \not\equiv 0 \pmod p$ a quadratic residue (mod p).

(a) If p ≡ 3 (mod 4), then $b = a^{\frac14(p+1)}$ is a square root of a (mod p).

(b) If p ≡ 5 (mod 8), then $a^{\frac14(p-1)} \equiv \pm 1 \pmod p$.

In the case $a^{\frac14(p-1)} \equiv 1 \pmod p$ a square root of a (mod p) is given by $b = a^{\frac18(p+3)}$.

In the case $a^{\frac14(p-1)} \equiv -1 \pmod p$ a square root of a (mod p) is given by $b = 2^{\frac14(p-1)}\, a^{\frac18(p+3)}$.

Proof. See Exercise 4.7. □
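The formulas of Lemma 4.9 translate directly into code. The sketch below assumes that a is a quadratic residue modulo p (this is not checked), and the name `sqrt_mod_easy` is our own.

```python
def sqrt_mod_easy(a: int, p: int) -> int:
    """A square root of the quadratic residue a modulo a prime p with p != 1 (mod 8)."""
    a %= p
    if p % 4 == 3:                                  # Lemma 4.9(a)
        return pow(a, (p + 1) // 4, p)
    if p % 8 == 5:                                  # Lemma 4.9(b)
        b = pow(a, (p + 3) // 8, p)
        if pow(a, (p - 1) // 4, p) == 1:
            return b
        return pow(2, (p - 1) // 4, p) * b % p      # extra factor 2^((p-1)/4)
    raise ValueError("p must be 3 (mod 4) or 5 (mod 8)")


b = sqrt_mod_easy(10, 13)        # 10 = 6^2 (mod 13)
print(b, b * b % 13)             # the second value is 10
```

The other square root is p − b.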

If p ≡ 1 (mod 8) then no deterministic method was known until 2006, when Christiaan van de Woestijne described such a method in his University of Leiden PhD thesis. Probabilistic methods have been known for more than a century, and are very practical. We give the algorithm due to Tonelli (that, by the way, works for all odd primes p).

Algorithm 4.3 (Square Root modulo a prime)

Input:  p prime,
        a quadratic residue a modulo p
Output: b such that b^2 ≡ a (mod p)

Step 1: pick a random quadratic nonresidue g
        compute s, t such that p − 1 = 2^s·t with t odd
        e ← 0
Step 2: for i from 2 to s do
            if (a·g^(−e))^(2^(s−i)·t) ≢ 1 (mod p) then e ← e + 2^(i−1)
Step 3: h ← a·g^(−e), b ← g^(e/2)·h^((t+1)/2), output b

Lemma 4.10 Let p be an odd prime, and $a \not\equiv 0 \pmod p$ a quadratic residue (mod p). Then Algorithm 4.3 outputs a b such that $b^2 \equiv a \pmod p$.

Proof. First note that at the beginning of Step 2 $ag^{-e} = a$, and $(ag^{-e})^{2^{s-1}t} = a^{(p-1)/2} \equiv 1 \pmod p$ as a is a quadratic residue. It follows that $(ag^{-e})^{2^{s-2}t} \equiv \pm 1 \pmod p$, and if it is −1, then e is increased by 2. This implies that $(ag^{-e})^{2^{s-2}t} \equiv (-1)\,g^{-2^{s-1}t} \pmod p$, and since g is a quadratic nonresidue, we have $g^{-2^{s-1}t} = g^{-(p-1)/2} \equiv -1 \pmod p$, so that after applying the inner loop in Step 2 for i = 2 we always have $(ag^{-e})^{2^{s-2}t} \equiv 1 \pmod p$. Now by induction (see Exercise 4.8) it follows that $(ag^{-e})^{2^{s-i}t} \equiv 1 \pmod p$ is always true after the inner loop in Step 2 has been done for some i. In particular, at the beginning of Step 3 e has been engineered such that $(ag^{-e})^{t} \equiv 1 \pmod p$. Since $b^2 = a^{t+1}g^{-et}$ it follows that $b^2 \equiv a \pmod p$. □

Note that this algorithm is probabilistic since no provably efficient deterministic method is known to find a quadratic nonresidue.
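Algorithm 4.3 can be sketched in Python as follows. The nonresidue g is found by trial and error with Euler's criterion, and the inverse of g is computed with `pow(g, -1, p)` (available from Python 3.8); the function name and the optional `rng` parameter are assumptions of this sketch. The input a is assumed to be a nonzero quadratic residue.

```python
import random


def sqrt_mod_prime(a: int, p: int, rng=random) -> int:
    """A square root of the nonzero quadratic residue a modulo an odd prime p."""
    a %= p
    # Step 1: random quadratic nonresidue g, and p - 1 = 2^s * t with t odd.
    while True:
        g = rng.randrange(2, p)
        if pow(g, (p - 1) // 2, p) == p - 1:
            break
    s, t = 0, p - 1
    while t % 2 == 0:
        s, t = s + 1, t // 2
    g_inv = pow(g, -1, p)
    e = 0
    # Step 2: determine e bit by bit so that (a * g^(-e))^t becomes 1.
    for i in range(2, s + 1):
        h = a * pow(g_inv, e, p) % p
        if pow(h, 2 ** (s - i) * t, p) != 1:
            e += 2 ** (i - 1)
    # Step 3: b = g^(e/2) * h^((t+1)/2) with h = a * g^(-e).
    h = a * pow(g_inv, e, p) % p
    return pow(g, e // 2, p) * pow(h, (t + 1) // 2, p) % p


b = sqrt_mod_prime(2, 17)        # 17 = 1 (mod 8), and 2 = 6^2 (mod 17)
print(b * b % 17)                # 2
```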



For composite moduli of which the factorization is not known, the problem of finding square roots is believed to be very hard, i.e. an efficient method is not known. Even finding out whether or not a given number is a quadratic residue modulo such a composite modulus is not well understood in the case of the Jacobi symbol being 1.

When the factorization of n is known, say $n = \prod_{i=1}^{k} p_i^{e_i}$, the square roots modulo $p_i^{e_i}$ can be found, and then the Chinese Remainder Theorem can be applied to find all square roots modulo n. For each prime power there is only one pair of square roots of a, so the number of square roots modulo n is $2^k$ (unless some $p_i$ divides a).

We do not work out the details, but for n being the product of distinct primes (i.e. $e_i = 1$ for all i) you should be able to do this.
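For a product of distinct odd primes, the combination step can be sketched as follows. To keep the example short, the square roots modulo each prime are found by brute force (for large primes one would use the methods above), and the function name is hypothetical.

```python
from itertools import product


def sqrt_mod_squarefree(a: int, primes: list[int]) -> list[int]:
    """All x with x^2 = a (mod n), where n is the product of the given distinct odd primes."""
    n = 1
    for p in primes:
        n *= p
    # Square roots modulo each prime; brute force keeps the sketch short.
    roots_per_prime = [[x for x in range(p) if (x * x - a) % p == 0] for p in primes]
    solutions = []
    for combo in product(*roots_per_prime):
        # Chinese Remainder Theorem: x = sum of r_i * N_i * (N_i^(-1) mod p_i), N_i = n / p_i.
        x = 0
        for r, p in zip(combo, primes):
            N_i = n // p
            x += r * N_i * pow(N_i, -1, p)
        solutions.append(x % n)
    return sorted(solutions)


print(sqrt_mod_squarefree(4, [7, 11]))   # [2, 9, 68, 75]
```

With k primes not dividing a, each prime contributes a pair of roots, so the list has 2^k entries (or is empty when a is not a quadratic residue).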

Exercises

4.1. Prove that the product of two quadratic residues is a quadratic residue. Prove that the product of two quadratic nonresidues is a quadratic residue. Prove that the product of a quadratic residue and a quadratic nonresidue is a quadratic nonresidue.

4.2. Compute the Jacobi symbol $\left(\frac{8}{15}\right)$. Is 8 a quadratic residue modulo 15?

4.3. Compute the Jacobi symbol $\left(\frac{727}{1169}\right)$ by both the original and the binary algorithms.

4.4. Compute all odd primes p for which 6 is a quadratic residue (mod p).

4.5. Compute all square roots of 51 modulo 91 (note: there are 4 of them).

4.6. Complete the proof of Lemma 4.8.

4.7. Prove Lemma 4.9.

4.8. Write out the induction argument in the proof of Lemma 4.10.

4.9. Write computer programs that compute Legendre and Jacobi symbols in an efficient way.

4.10. Let n = pq for different odd primes p, q, and assume that you know n but you do not know p and q. Let a be a quadratic residue (mod n). Show that if you have a way to find the four solutions b of $b^2 \equiv a \pmod n$, then you can compute p and q.

4.11. (a) Let p be an odd prime, and let a ∈ Z with $p \nmid a$ be a quadratic residue (mod p). Let $b_1 \in \{1, 2, \ldots, p-1\}$ be such that $b_1^2 \equiv a \pmod p$. Let k ≥ 2. Show that there exists a unique $b_k \in \{1, 2, \ldots, p^k-1\}$ with $b_k \equiv b_1 \pmod p$ such that $b_k^2 \equiv a \pmod{p^k}$. Conclude that $x^2 \equiv a \pmod{p^k}$ has exactly 2 solutions.
Hint: show by induction that there exists a $b_k \in \{1, 2, \ldots, p^k-1\}$ with $b_k \equiv b_{k-1} \pmod{p^{k-1}}$ such that $b_k^2 \equiv a \pmod{p^k}$.
(b) Let n ≥ 3 be a positive odd integer, and let a be a quadratic residue (mod n) with gcd(a, n) = 1. Prove that there exist precisely $2^{\ell}$ solutions of $x^2 \equiv a \pmod n$, where $\ell$ is the number of distinct prime factors of n.



Chapter 5

Prime Numbers

Introduction

General references for this chapter: [BS, Chapters 8, 9], [CP, Chapters 3, 4], [GG, Chapter 18], [Sh, Chapter 5, Section 7.5, Chapter 10, 22]. And for those preferring Dutch: [Be, Chapters 8, 19], [Ke, Chapter 13], [dW, Chapter 3].


• prime number distribution,

• probabilistic primality tests,

• prime number generation.

5.1 Prime Number Distribution

5.1.1 The Prime Number Theorem

Prime numbers play an important role in discrete mathematics, especially in cryptography. In applications one often has to find large ‘random’ prime numbers (several thousands of bits). This means that it would be nice if number theory could guarantee that such large prime numbers exist in some abundance, and could provide efficient methods of generating them.

The first result in the theory of prime numbers is the fact that there are infinitely many.

Theorem 5.1 (Infinitude of primes) There are infinitely many prime numbers.

Proof. Let $p_1, p_2, \ldots, p_n$ be some set of primes. Then $P = p_1 p_2 \cdots p_n + 1$ is a number that is not divisible by any of $p_1, p_2, \ldots, p_n$. On the other hand, P has at least some prime factor p. Then p must be a prime different from $p_1, \ldots, p_n$. So for any finite set of primes there exists another prime that is not in that set. □

The main tool in studying the distribution of the primes is the prime counting function

π(x) = #{p ≤ x|p prime}.



There is an extensive literature about π(x), in an area of number theory called analytic number theory. This theory contains a large number of deep results, of which we can only touch the very surface, most of the time without proofs.

The main result from (analytic) prime number theory is the Prime Number Theorem, which estimates π(x) in terms of known functions of x. To be able to state it, we need the concept of asymptotically equivalent functions. This concept describes what it means that two functions show the same growth behaviour for x tending towards infinity.

We say that f(x) and g(x) are asymptotically equivalent as x → ∞, notation f(x) ∼ g(x), when their quotient tends to 1 as x → ∞. In a formula:

$$f(x) \sim g(x) \quad\text{means}\quad \lim_{x\to\infty} \frac{f(x)}{g(x)} = 1.$$

Now we can state the Prime Number Theorem. Proving it however is way beyond the scope of this course.

Theorem 5.2 (Prime Number Theorem, Hadamard and de la Vallée Poussin)

$$\pi(x) \sim \frac{x}{\log x}.$$

One way to look at this is in a probabilistic sense. Say we are looking for prime numbers in a certain interval [x, x + ∆x]. For example, we might be looking for prime numbers of 1024 bits, then we have the interval $[2^{1023}, 2^{1024}]$. The Prime Number Theorem then says that the number of primes in this interval, π(x + ∆x) − π(x), is roughly $\frac{\Delta x}{\log x}$. As the total number of integers in the interval is ∆x, the probability that a random number from the interval is prime is approximately $\frac{1}{\log x}$. As $\log 2^{1023} \approx 710$ this means that about one of every 710 numbers of 1024 bits is prime.

This may not look like much, but note that all primes other than 2 are odd, so we can immediately rule out all even numbers, and similarly we can e.g. rule out all multiples of 3. This leaves us with one out of every 237 numbers ≡ ±1 (mod 6) having 1024 bits that is prime. Note that there are approximately $\frac{1}{710}\,2^{1023} \approx 10^{305}$ prime numbers in this interval. That is an abundance indeed (the number of elementary particles in the universe is estimated at only $10^{80}$).
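The numbers in this estimate are easy to reproduce. The following sketch only evaluates the heuristic density 1/log x for 1024-bit numbers; the variable names are our own, and the small difference with the rounded figure 237 in the text comes from using 709.1 instead of 710.

```python
import math

bits = 1024
log_x = (bits - 1) * math.log(2)     # natural logarithm of 2^1023
print(round(log_x, 1))               # about 709.1, i.e. roughly 710
print(round(log_x / 3))              # among numbers = +-1 (mod 6): one in about 236

# Number of primes in [2^1023, 2^1024], about 2^1023 / log x, as a power of 10.
log10_count = (bits - 1) * math.log10(2) - math.log10(log_x)
print(round(log10_count, 1))         # about 305.1, i.e. roughly 10^305 primes
```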

The estimate $\frac{x}{\log x}$ for π(x) has a very simple form, the right asymptotic behaviour and is a reasonable approximation in practice. Nevertheless there are better estimates. One that is not very well known, but in practice almost as easy, and considerably better, is $\pi(x) \sim \frac{x}{\log x - 1}$. One also often sees the expression $\pi(x) \sim \mathrm{li}(x) = \int_2^x \frac{1}{\log t}\,dt$. This is an even closer approximation, but less practical.

To get an idea of the accuracy of the estimates for π(x), we show a small table and some graphs.

| log x / log 10 | π(x) | x/log x | x/(log x − 1) | li(x) | π(x) / (x/log x) | π(x) / (x/(log x − 1)) | π(x) / li(x) | π(x) − x/log x | π(x) − x/(log x − 1) | π(x) − li(x) |
|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 4 | 4.3429 | 7.6770 | 5.1204 | 0.9210 | 0.5210 | 0.7812 | −0.3429 | −3.6770 | −1.1204 |
| 2 | 25 | 21.715 | 27.738 | 29.081 | 1.1513 | 0.9013 | 0.8597 | 3.2853 | −2.7379 | −4.0810 |
| 3 | 168 | 144.77 | 169.27 | 176.56 | 1.1605 | 0.9925 | 0.9515 | 23.235 | −1.2690 | −8.5645 |
| 4 | 1229 | 1085.7 | 1218.0 | 1245.1 | 1.1320 | 1.0091 | 0.9871 | 143.26 | 11.024 | −16.092 |
| 5 | 9592 | 8685.9 | 9512.1 | 9628.8 | 1.1043 | 1.0084 | 0.9962 | 906.11 | 79.900 | −36.764 |
| 6 | 78498 | 72382. | 78030. | 78627. | 1.0845 | 1.0060 | 0.9984 | 6115.6 | 467.55 | −128.50 |
| 7 | 664579 | 6.204E5 | 6.615E5 | 6.649E5 | 1.0712 | 1.0047 | 0.9995 | 44158. | 3120.0 | −338.36 |
| 8 | 5761455 | 5.429E6 | 5.740E6 | 5.762E6 | 1.0613 | 1.0037 | 0.9999 | 3.328E5 | 21151. | −753.33 |
| 9 | 50847534 | 4.825E7 | 5.070E7 | 5.085E7 | 1.0537 | 1.0029 | 1.0000 | 2.593E6 | 1.460E5 | −1699.9 |
| 10 | 455052511 | 4.343E8 | 4.540E8 | 4.551E8 | 1.0478 | 1.0023 | 1.0000 | 2.076E7 | 1.041E6 | −3102.5 |

(notation: En stands for ×10^n)



[Figure: four graphs. Two panels show π(x), x/log x, x/(log x − 1) and li(x). One panel shows the quotients π(x)/(x/log x), π(x)/(x/(log x − 1)) and π(x)/li(x). One panel shows the differences π(x) − x/log x, π(x) − x/(log x − 1) and π(x) − li(x).]

The graphs above suggest that there is a lot of regularity in the distribution of prime numbers, and that is to some extent true. The prime number distribution seems to have a lot of properties that are very plausible from an experimental point of view, but have not been proved. But there are also many irregularities.

The main conjecture in this area is the Riemann Hypothesis, a long standing open problem with a lot of consequences. The Riemann Hypothesis can be formulated as a very good estimate for the error term π(x) − li(x), as follows.

Conjecture 5.3 (Riemann Hypothesis) For x ≥ 3 we have

$$|\pi(x) - \mathrm{li}(x)| < \sqrt{x}\,\log x.$$

Only the much weaker

Theorem 5.4 (Prime Number Theorem, error estimate) For any k > 0

$$\pi(x) = \mathrm{li}(x) + O\!\left(\frac{x}{(\log x)^k}\right)$$

is known.

The table above gives some experimental evidence for Conjecture 5.3: π(x) and li(x) coincide for about the first half of their digits.

The following lemma is also sometimes useful. It estimates the size of the n’th prime number.

Lemma 5.5 Let pn be the n’th prime number. Then pn ∼ n log n.



Also very explicit versions of the prime number theorem exist.

Theorem 5.6 (Prime Number Theorem, explicit version)

(a) $\dfrac{x}{2(\log x)^2} < \pi(x) - \dfrac{x}{\log x} < \dfrac{3x}{2(\log x)^2}$ for x ≥ 59.

(b) $-\dfrac{1}{2}n < p_n - n(\log n + \log\log n - 1) < \dfrac{1}{2}n$ for n ≥ 20.

There are gaps between consecutive primes as large as one wants, see Exercise 5.3. There is an extensive literature on the size of prime gaps. Bertrand’s Postulate says that there is always a prime between n and 2n. This has been improved to

Theorem 5.7 (Prime gaps, Baker and Harman) Let ε > 0, and let n be large enough. There always is a prime between n and $n + n^{0.535+\varepsilon}$. Equivalently, $p_n - p_{n-1} \le n^{0.535+\varepsilon}$.

It is conjectured that the above theorem still is very pessimistic, namely that the right bound should be $p_n - p_{n-1} = O((\log n)^2)$.

5.1.2 Probabilistic arguments

We conclude this section with a few lines about probabilistic (or heuristic) reasoning. Experimentally it usually appears to be true that different ‘events’ involving the primality of random numbers are independent, unless a good reason for dependence can be found. Some of the conjectures one can make as a result of such heuristic reasoning can be proved, while others remain unproven, though very probable.

An example of the first kind is the prime number theorem for arithmetic progressions. Let be given a modulus m, and a number a ∈ {0, 1, . . . , m − 1}. An arithmetic progression is a sequence like {a, a + m, a + 2m, a + 3m, . . .}. We would like to know how many primes up to x are in such an arithmetic progression, i.e. in a certain congruence class modulo m. We then look at the events ‘p is prime’ and ‘p ≡ a (mod m)’.

If gcd(a, m) ≠ 1 then the events clearly are dependent, as a prime p can be congruent to a modulo m only if gcd(a, m) | p, implying p | a and p | m. Clearly there are only finitely many such primes p, and this means that for a random p the probability of the two events happening at the same time is 0.

But if a and m are coprime, the events seem to be independent. Heuristic reasoning now suggests that the primes are equally distributed over the congruence classes of a reduced residue system modulo any m. The number of primes p in an interval of length ∆x around x is approximately $\frac{\Delta x}{\log x}$, so the mentioned heuristic means that for each a coprime to m the number of primes p ≡ a (mod m) in that interval is approximately $\frac{1}{\phi(m)}\frac{\Delta x}{\log x}$.

This heuristic is indeed known to be true. Put

$$\pi_{a,m}(x) = \#\{p \le x \mid p \text{ prime and } p \equiv a \pmod m\}.$$

Theorem 5.8 (Prime Number Theorem for Arithmetic Progressions, Dirichlet) Let m ∈ Z, m ≥ 2, and a ∈ {0, 1, . . . , m − 1} coprime to m. Then

$$\pi_{a,m}(x) \sim \frac{1}{\phi(m)}\,\frac{x}{\log x}.$$
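The equidistribution over the reduced residue classes is easy to observe experimentally. The sketch below counts the primes up to x in each class coprime to m, using a simple Sieve of Eratosthenes; the function names and the choice m = 10, x = 100000 are illustrative only.

```python
from math import gcd


def primes_up_to(x: int) -> list[int]:
    """All primes <= x by the Sieve of Eratosthenes."""
    sieve = bytearray([1]) * (x + 1)
    sieve[0:2] = b"\x00\x00"
    for i in range(2, int(x ** 0.5) + 1):
        if sieve[i]:
            # Cross out the multiples of i, starting at i*i.
            sieve[i * i :: i] = bytearray(len(range(i * i, x + 1, i)))
    return [i for i in range(x + 1) if sieve[i]]


def class_counts(x: int, m: int) -> dict[int, int]:
    """Number of primes p <= x with p = a (mod m), for each a coprime to m."""
    counts = {a: 0 for a in range(m) if gcd(a, m) == 1}
    for p in primes_up_to(x):
        if p % m in counts:
            counts[p % m] += 1
    return counts


counts = class_counts(100_000, 10)
print(counts)    # four counts for the classes 1, 3, 7, 9, all close to each other
```

The four counts add up to π(100000) − 2, since only the primes 2 and 5 fall outside the classes coprime to 10.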



A similar heuristic reasoning can be used to estimate the number of twin primes, i.e. the number of pairs p − 2, p below x that are both prime. As the events ‘p − 2 is prime’ and ‘p is prime’ can be assumed to be independent (when p ≡ 1 (mod 6)), the number of twin primes below x can be easily estimated to be $c\,\frac{x}{(\log x)^2}$ for a constant c. The correct value for c is the so called twin prime constant $c_2 = 2\prod_p \frac{p(p-2)}{(p-1)^2} = 1.3203\ldots$, where the product is taken over all odd primes. To obtain this value a more subtle probabilistic argument is required. A result like this however is not proven, it is not even known whether there are infinitely many twin primes. But experimentally the heuristic works very well.

Finally we mention Sophie Germain primes, which are primes p for which $\frac12(p-1)$ also is prime. Note that when p is a Sophie Germain prime then the only possible orders of elements in $\mathbb{Z}_p^*$ are 1, 2, $\frac12(p-1)$ and p − 1. For this reason Sophie Germain primes have gained some popularity in cryptographic applications, where they are also called safe primes or strong primes. Again assuming the heuristic that the events of p and $\frac12(p-1)$ being prime are independent (then of course p ≡ 3 (mod 4)), the number of Sophie Germain primes below x can be estimated at $c\,\frac{x}{(\log x)^2}$ for a constant c. In fact, this time $c = \frac12 c_2 = 0.66016\ldots$, where $c_2$ is the twin prime constant.

5.2 Probabilistic Primality Testing

5.2.1 Introduction to Primality Testing

We now move to primality testing, i.e. given a (large) integer n, trying to answer the question "Is this integer n prime or composite?". The naive approach would be to list all prime numbers up to $\sqrt{n}$, and do division for each of them (why only to $\sqrt{n}$?). This is an exponential algorithm: the runtime is exponential in the length of the input. For n larger than $10^{15}$ or so this clearly is out of the question.

Primality tests come in two flavors. Deterministic primality tests give a Yes-or-No result: "Yes, n is proven to be prime", or "No, n is proven to be composite". Probabilistic primality tests are a bit weaker, as they give only a Probably-or-No result: "Probably n is prime", or "No, n is proven to be composite". The probability that the algorithm gives a wrong answer should be provably small. The main advantage of probabilistic tests is that they are much more efficient.

In practice (at least in cryptography) one usually is happy with a probabilistic test only. When a deterministic test is done, one usually first applies a probabilistic test as well, to quickly discard composite numbers. Also, to quickly discard composites that have a small prime divisor, some more naive methods can also be useful, such as first computing $\gcd(n, p_1 p_2 p_3 \cdots p_k)$ for the smallest k primes (the value of the product of the first k primes can be precomputed and stored).

5.2.2 Pseudoprimes, Witnesses and Liars

The starting point for many probabilistic primality tests is Fermat’s Theorem 3.3. It says that if p is prime and a some integer not divisible by p, then $a^{p-1} \equiv 1 \pmod p$. The fact that $2^{14} \equiv 4 \not\equiv 1 \pmod{15}$ thus proves that 15 is composite (without revealing any factors).

However, Fermat’s Theorem cannot be used backwards. That is, for composite n there may exist integers a coprime to n satisfying $a^{n-1} \equiv 1 \pmod n$. For example, $4^{14} \equiv 1 \pmod{15}$.

Such a composite, that ‘behaves like a prime’ with respect to the base a, is called a pseudoprime or Fermat pseudoprime with respect to the base a. A number a coprime to n such that $a^{n-1} \not\equiv 1 \pmod n$ is called a Fermat witness for the compositeness of n. A number a coprime to n such that $a^{n-1} \equiv 1 \pmod n$ while n is composite is called a Fermat liar for the primality of n.

So it may happen that a primality test based on Fermat’s Theorem fails because one happens to have chosen a base a for which n is a pseudoprime. The situation is even worse: there exist numbers, the so called Carmichael numbers, that are pseudoprimes with respect to every base a coprime to n. The smallest is 561 (see Exercise 5.6). Though Carmichael numbers seem rather sparse, actually there are quite a lot of them: the number of Carmichael numbers up to x is at least proportional to $x^{2/7}$.
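The Fermat test and the notions of witness and liar can be explored with a few lines of Python. The helper names below are our own; the exhaustive loops are only meant for small n.

```python
from math import gcd


def coprime_bases(n: int) -> list[int]:
    """All a in 1..n-1 that are coprime to n."""
    return [a for a in range(1, n) if gcd(a, n) == 1]


def fermat_liars(n: int) -> list[int]:
    """Bases a coprime to n with a^(n-1) = 1 (mod n)."""
    return [a for a in coprime_bases(n) if pow(a, n - 1, n) == 1]


print(pow(2, 14, 15), pow(4, 14, 15))               # 4 1: base 2 is a witness, base 4 a liar
print(fermat_liars(15))                              # [1, 4, 11, 14]
print(fermat_liars(561) == coprime_bases(561))       # True: every coprime base is a liar
```

The last line is a numerical check, not a proof, that 561 behaves as a Carmichael number.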

5.2.3 The Miller-Rabin Test

Assume that we have a number n which we want to test for primality, and we have an integer a coprime to n, that satisfies $a^{n-1} \equiv 1 \pmod n$. So n is either prime or a pseudoprime with respect to a.

Another feature that distinguishes primes from composites is related to square roots. If n is prime, then there are only two square roots of 1 modulo n, namely ±1. But if n is composite (odd, and not a prime power), there are more square roots of 1 modulo n. We know that $a^{n-1} \equiv 1 \pmod n$, and n − 1 is even, and we can thus easily compute some square roots of $a^{n-1}$, namely $a^{\frac12(n-1)} \pmod n$ (thus by modular exponentiation). In general, we start for some r ∈ N (at first r = 1) with an even $\frac{1}{2^{r-1}}(n-1)$, and explicitly known values of $a^{\frac{1}{2^r}(n-1)} \pmod n$. Now there are three possibilities.

1. If it happens that $a^{\frac{1}{2^r}(n-1)} \equiv -1 \pmod n$ then we stop because we cannot exclude that n is prime, and we don’t know how to proceed with this a.

2. If it happens that $a^{\frac{1}{2^r}(n-1)} \not\equiv \pm 1 \pmod n$ then we have proved that n is composite.

3. If $a^{\frac{1}{2^r}(n-1)} \equiv 1 \pmod n$ then we increase r by 1, and if $\frac{1}{2^{r-1}}(n-1)$ (with the new r) is still even we compute the new $a^{\frac{1}{2^r}(n-1)} \pmod n$, and repeat the game; otherwise we stop.

In other words, we look at $a^{n-1}, a^{\frac12(n-1)}, a^{\frac14(n-1)}, \ldots, a^{\frac{1}{2^r}(n-1)} \pmod n$, until $\frac{1}{2^r}(n-1)$ has become odd (conclusion: n may be prime); or $a^{\frac{1}{2^r}(n-1)} \equiv -1 \pmod n$ (conclusion: n may be prime); or $a^{\frac{1}{2^r}(n-1)} \not\equiv \pm 1 \pmod n$ (conclusion: n is composite).

Again this is not yet a deterministic test: there are composite numbers n for which this test fails, the so called strong pseudoprimes with respect to the base a. Then a is called a strong liar for the primality of n. Similarly when the test succeeds in proving that n is composite, a is called a strong witness for the compositeness of n.

Example: n = 561 is a Carmichael number. Let us take a = 206, which by the Euclidean algorithm can be shown to be coprime to 561. We compute $206^{560} \equiv 1 \pmod{561}$ (no surprise, as 561 is Carmichael), then $206^{280} \equiv 1 \pmod{561}$, so we continue with $206^{140} \equiv 67 \pmod{561}$. The number 67 thus is a solution to $x^2 \equiv 1 \pmod{561}$ other than ±1, and it follows that 561 is composite. So 206 is a strong witness for 561.

Example: n = 3277 = 29 × 113 has a = 2 as a strong liar. Namely, $2^{3276} \equiv 1 \pmod{3277}$ and $2^{1638} \equiv -1 \pmod{3277}$.
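The sequence of successive "square roots" is easily computed by modular exponentiation. The following sketch recomputes the example with n = 561 and a = 206; the function names are our own, and n is assumed odd with a coprime to n.

```python
def halving_sequence(a: int, n: int) -> list[int]:
    """a^(n-1), a^((n-1)/2), a^((n-1)/4), ... (mod n), down to the first odd exponent."""
    values, e = [], n - 1
    while True:
        values.append(pow(a, e, n))
        if e % 2 == 1:
            return values
        e //= 2


def is_strong_witness(a: int, n: int) -> bool:
    """True if the sequence proves that n is composite."""
    values = halving_sequence(a, n)
    if values[0] != 1:
        return True                # a is already a Fermat witness
    for v in values[1:]:
        if v == n - 1:
            return False           # reached -1: n may be prime
        if v != 1:
            return True            # a square root of 1 other than 1 and -1
    return False                   # all values are 1: n may be prime


print(halving_sequence(206, 561)[:3])    # [1, 1, 67]
print(is_strong_witness(206, 561))       # True
```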



In practice it is easiest to do the computation in the other direction: by repeated squaring instead of repeated square rooting. The reason is that when x (mod n) has been computed, then computing $x^2 \pmod n$ is easy, whereas the other way is difficult. So write $n - 1 = 2^s t$ with t odd and s ≥ 1. Then compute $a^t$, and by repeated squaring successively compute $a^{2t}, a^{4t}, \ldots, a^{2^{s-1}t} = a^{\frac12(n-1)}, a^{2^s t} = a^{n-1} \pmod n$, until we hit $a^{2^k t} \equiv 1 \pmod n$. If the previous term $a^{2^{k-1}t} \pmod n$ is not −1, we have proved that n is composite.

This idea is turned into an algorithm below, due to Miller and Rabin, and thus known as the Miller-Rabin Primality Test. This algorithm accepts as input, next to the number n to be tested, a ‘security parameter’ R. This is the maximum number of different random a to be tested for being (Fermat or strong) liars.

We have the following result about the Miller-Rabin algorithm, which we cannot prove.

Theorem 5.9 (Miller-Rabin Primality Test)
(a) If n is an odd prime then the Miller-Rabin algorithm outputs "probably prime".
(b) If n is an odd composite, then the Miller-Rabin algorithm with security parameter R outputs "composite" with probability $> 1 - \frac{1}{4^R}$, and "probably prime" with probability $\le \frac{1}{4^R}$.
(c) The Miller-Rabin algorithm takes $O(R(\log n)^3)$ bit operations.

This result can be interpreted as follows, as the output phrases suggest:

Corollary 5.10
(a) If on input n the Miller-Rabin algorithm outputs "composite", then n is composite.
(b) If on input n the Miller-Rabin algorithm with security parameter R outputs "probably prime", then n is prime with probability $> 1 - \frac{1}{4^R}$, and composite with probability $\le \frac{1}{4^R}$.

The probability bound appears experimentally to be far from sharp, i.e. in practice the Miller-Rabin primality test performs much better than Theorem 5.9(b) suggests. See [MvOV, Section 4.4.1] for a discussion.

Algorithm 5.1 (Miller-Rabin Primality Test)

Input:  n ∈ Z with n ≥ 3 and odd
        R ∈ N (security parameter)
Output: "probably prime" or "composite"

Step 1: r ← 0, prob-prime ← true
        compute s, t such that n − 1 = 2^s·t and t odd
Step 2: while prob-prime and r < R do
Step 3:     choose a new random a ∈ {2, 3, . . . , n − 2}
            k ← 0, x ← a^t (mod n)
Step 4:     while k < s and x ≢ 1 (mod n) do
                k ← k + 1, z ← x, x ← x^2 (mod n)
Step 5:     if k = 0
                then r ← r + 1
                else
                    if k = s and x ≢ 1 (mod n)
                        then prob-prime ← false
                        else
                            if z ≡ −1 (mod n)
                                then r ← r + 1
                                else prob-prime ← false
Step 6: if prob-prime
            then output "probably prime"
            else output "composite"
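A direct Python transcription of Algorithm 5.1 might look as follows. It is a sketch under a few assumptions: the bases are drawn independently (so a base may repeat, whereas the algorithm asks for a new a each time), the case n = 3 is answered directly because the set {2, . . . , n − 2} is then empty, and the function name and `rng` parameter are our own.

```python
import random


def miller_rabin(n: int, R: int, rng=random) -> str:
    """Miller-Rabin test for odd n >= 3 with security parameter R."""
    if n < 3 or n % 2 == 0:
        raise ValueError("n must be odd and at least 3")
    if n == 3:
        return "probably prime"
    s, t = 0, n - 1                          # Step 1: n - 1 = 2^s * t with t odd
    while t % 2 == 0:
        s, t = s + 1, t // 2
    for _ in range(R):                       # Step 2: at most R bases
        a = rng.randrange(2, n - 1)          # Step 3: a in {2, ..., n-2}
        k, x, z = 0, pow(a, t, n), None
        while k < s and x != 1:              # Step 4: square until we hit 1
            k, z, x = k + 1, x, x * x % n
        if k == 0:                           # Step 5: a^t = 1, so a is no witness
            continue
        if x != 1 or z != n - 1:             # Fermat witness, or a square root of 1 other than -1
            return "composite"
    return "probably prime"                  # Step 6


print(miller_rabin(561, 10))        # "composite" (with overwhelming probability)
print(miller_rabin(2**61 - 1, 10))  # "probably prime" (2^61 - 1 is a Mersenne prime)
```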



5.3 Deterministic Primality Testing

5.3.1 The primitive root test

In this course we cannot go deep into the theory of deterministic primality tests. Some tests work very well for numbers of a special form. To give one example, consider the following lemma.

Lemma 5.11 The number n is prime if and only if there exists an $a \in \mathbb{Z}_n^*$ that has order n − 1.

Recall that to find elements of order n − 1 one has to check that $a^{n-1} \equiv 1 \pmod n$, and $a^{(n-1)/q} \not\equiv 1 \pmod n$ for all primes q | n − 1, see Algorithm 3.1.

This lemma can be used in a primality test for n by choosing a number of random integers a and testing their orders. When n is prime then there are φ(n − 1) primitive roots, and φ(n − 1) cannot be small. In fact, $\phi(m) > c\,\frac{m}{\log\log m}$ for some constant c, so after at most O(log log n) trials one expects to have found a primitive root. If that happens, the number n is proven to be a prime.

When a primitive root is not found in O(log log n) trials, then one suspects that n might be composite, and the Miller-Rabin test is then likely to spot this.

The advantage of this idea is that if the algorithm returns with the decision "prime" (from a primitive root being found) or "composite" (from the Miller-Rabin test), then the number is a proven prime or composite. The algorithm is however not guaranteed to terminate.

In practice the most troublesome property of the algorithm is that the factorization of n − 1 is required; this factorization is usually not available.

The idea has been developed further, e.g. such that only a partial factorization of n − 1 is required, but we will not go into details.
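The search for a primitive root can be sketched as follows, assuming that the distinct prime factors of n − 1 are supplied by the caller. The function name, the `trials` parameter and the example n = 1613 (with 1612 = 2^2 · 13 · 31) are choices of this sketch.

```python
import random


def find_primitive_root(n: int, prime_factors: list[int], trials: int, rng=random):
    """Return an a of order n - 1 modulo n (which proves n prime), or None if none was found.

    prime_factors must list all distinct primes dividing n - 1.
    """
    for _ in range(trials):
        a = rng.randrange(2, n)
        if pow(a, n - 1, n) != 1:
            continue                          # order of a does not divide n - 1
        if all(pow(a, (n - 1) // q, n) != 1 for q in prime_factors):
            return a                          # order is exactly n - 1
    return None


a = find_primitive_root(1613, [2, 13, 31], trials=50)
print(a is not None)                                    # True with overwhelming probability
print(find_primitive_root(561, [2, 5, 7], trials=50))   # None: 561 is composite
```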

5.3.2 Primality certificates

Some algorithms produce primality certificates. That is, for a given number n, after a long computation, a small set of small numbers is produced, with which a proof for the primality of n is easily verified. For example, a primitive root can act as a primality certificate.

Note that compositeness certificates are also possible, e.g. (Fermat or strong) witnesses, or a complete or partial factorization.

5.3.3 Other deterministic primality tests

For numbers of special type fast deterministic primality tests are known, e.g. for Mersenne numbers $2^n - 1$. We give no details here.

Other deterministic primality tests for arbitrary numbers have been developed, such as the Jacobi sum primality test and the elliptic curve primality test. They are usually rather time consuming, both in theory (not polynomial time) and in practice.

Until recently it was not known whether there exists a deterministic polynomial time primality proving algorithm. In 2002 three Indian computer scientists (Agrawal, Kayal and Saxena), who were unknown in the number theory community, came up with a completely new primality test, now known as the AKS Algorithm or AKS primality test, that is deterministic and runs in polynomial time. Their invention got world wide media coverage, and is a major breakthrough indeed. However, it seems that their method is too slow for practical purposes. Improvements have been found, notably by Dan Bernstein (http://cr.yp.to), showing quartic complexity, i.e. $O((\log n)^4)$. It is not yet clear if such algorithms will become practical for cryptographers. See [Sh, Chapter 22].

Unfortunately in this course we cannot go into details.

5.4 Prime Number Generation

5.4.1 Random Primes

The basic idea of generating primes is very simple: generate random numbers with the required properties (such as having a certain number of bits, lying in a certain interval, lying in a certain congruence class), and apply some primality test, until a (probable) prime is found. In practical cryptography often the following prime generation algorithm, based on Miller-Rabin, is used to (e.g.) generate primes of a given number of bits.

Algorithm 5.2 (Miller-Rabin Prime Number Generation)

Input:  k ∈ N (required number of bits)
        R ∈ N (security parameter)
Output: a k bit probable prime p

Step 1: repeat
            choose a new random k bit odd integer p
            mr ← the output of the Miller-Rabin primality test with input p, R
        until mr = "probably prime"
Step 2: output p

Of practical interest is the probability that this algorithm returns a composite p. Let this probability be $p_{k,R}$. Theorem 5.9(b) suggests that $p_{k,R} < 4^{-R}$. In fact the situation is much better. We quote the following result, showing that already one run of the main loop in the Miller-Rabin primality test (R = 1) gives better results in practice for moderate k.

Lemma 5.12 (Damgård, Landrock and Pomerance)

If k ≥ 2 then $p_{k,1} < k^2\, 4^{2-\sqrt{k}}$.

If k ≥ 88 and 2 ≤ R ≤ 9 then $p_{k,R} < k\sqrt{k}\; 2^R\, 4^{2-\sqrt{Rk}} / \sqrt{R}$.

See [MvOV, Sections 4.48, 4.49] for some more elaborate results.

In practice, when a probability of $2^{-80}$ is seen as acceptable and the bitsize of the primes is at least 512 (resp. 1024), a security parameter R = 6 (resp. 3) is already sufficient.

On present day personal computers generating primes of several thousands of bits should take at most a few seconds.
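Algorithm 5.2 can be sketched in a few lines. The block below carries its own compact Miller-Rabin routine so that it is self-contained; the function names are our own, k ≥ 3 is assumed, and the `random` module is used only for illustration (a cryptographic application would draw its candidates from a secure source such as the `secrets` module).

```python
import random


def is_probable_prime(n: int, R: int, rng: random.Random) -> bool:
    """Miller-Rabin with R random bases, for odd n >= 5."""
    s, t = 0, n - 1
    while t % 2 == 0:
        s, t = s + 1, t // 2
    for _ in range(R):
        x = pow(rng.randrange(2, n - 1), t, n)
        if x in (1, n - 1):
            continue
        for _ in range(s - 1):
            x = x * x % n
            if x == n - 1:
                break
        else:
            return False                 # never reached -1: n is composite
    return True


def random_probable_prime(k: int, R: int, rng: random.Random) -> int:
    """Draw random k-bit odd integers until one passes the test (k >= 3)."""
    while True:
        p = rng.getrandbits(k) | (1 << (k - 1)) | 1    # force the top bit and oddness
        if is_probable_prime(p, R, rng):
            return p


rng = random.Random(2012)                # seeded only to make the sketch reproducible
p = random_probable_prime(512, 6, rng)
print(p.bit_length())                    # 512
```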

5.4.2 Strong Primes

Sometimes primes with special properties are required, such as Sophie Germain primes, or primes that are in some other sense ‘strong primes’. Usually this refers to the prime divisors of p − 1 and p + 1 for the prime p, which should not all be small, to avoid certain cryptographic attacks.

Generating Sophie Germain primes is substantially slower than generating random primes. The only known way is to apply a primality test to both p and $\frac12(p-1)$, and of course this test fails much more often than for random primes.

Algorithm 5.3 (Sophie Germain Prime Number Generation)

Input:  k ∈ N (required number of bits)
        R ∈ N (security parameter)
Output: a k bit probable prime p such that (p − 1)/2 is also probable prime

Step 1: repeat
            choose a new random k bit odd integer p
            mr′ ← the output of the Miller-Rabin primality test with input (p − 1)/2, R
            if mr′ = "probably prime" then
                mr ← the output of the Miller-Rabin primality test with input p, R
            else mr ← "irrelevant"
        until mr = "probably prime" and mr′ = "probably prime"
Step 2: output p

The following algorithm generates primes p that are strong primes in the sense that p − 1 has a prime divisor of about half the bitlength of p. For different cryptographic applications different notions of ‘strong primes’ may exist, so this is only an example. Variations can be easily made. The algorithm below is about as fast as random prime number generation.

Algorithm 5.4 (Strong Prime Number Generation)

Input:  k ∈ N (required number of bits)
        R ∈ N (security parameter)
Output: a k bit probable prime p such that p − 1 has a k/2 bit probable prime divisor

Step 1: generate a k/2 bit probable prime q
Step 2: repeat
            choose a new (random) integer r with 2^(k−1)/q < r < 2^k/q
            p ← qr + 1
            mr ← the output of the Miller-Rabin primality test with input p, R
        until mr = "probably prime"
Step 3: output p

5.4.3 Constructive methods

We conclude the chapter on prime numbers with mentioning that there are also constructive methods for making prime numbers. One method is based on the following result known as Pocklington’s Theorem.

Theorem 5.13 (Pocklington’s Theorem) Let n ≥ 3 be an odd integer. Suppose that n = 1 + 2qR, where q is an odd prime and q > R. Suppose that there exists an a such that $a^{n-1} \equiv 1 \pmod n$ and $\gcd(a^{2R} - 1, n) = 1$. Then n is prime.

Proof. Assume that n is composite. Let p be an odd prime such that $p \mid n$ and $p \le \sqrt{n}$. Let r be the order of a modulo p. Then $r \mid p-1$ by Fermat’s Theorem, and $r \mid n-1$ because $a^{n-1} \equiv 1 \pmod p$. If $q \nmid r$ then $r \mid n-1 = 2qR$ implies $r \mid 2R$, hence $a^{2R} \equiv 1 \pmod p$, hence $p \mid a^{2R} - 1$. But this contradicts $\gcd(a^{2R} - 1, n) = 1$. Hence $q \mid r$. This implies $q \mid p-1$, and as q is odd we must have $q \le \frac12(p-1)$. Finally we have $n = 1 + 2qR < 1 + 2q^2 \le 1 + \frac12(p-1)^2 < p^2$, and this contradicts $p \le \sqrt{n}$. Hence n must be prime. □



The use of this theorem is as follows. Assume we have an odd prime q. Then we can try randomintegers R < q (but not much smaller than q) and random integers a, and check the conditions ofPocklington’s Theorem. If we succeed (and it can be shown that this is likely), we then have aproven prime p that is almost twice the size of q. We can then iterate this and build larger andlarger primes.

Each prime generated in this way comes with a certificate, consisting of q and a, and the certificateof q. So in fact a prime comes with a certificate chain.

Note that producing a certificate may be a long process, as it is hard to predict how many choicesfor R and a will fail. But once a certificate is found, verifying it is easy: only two modularexponentiations are required for checking each certificate in a certificate chain.

Example: with $q = 5$ we can take $R = 3$ and $a = 7$. Then $n = 31$. Indeed, $7^{30} \equiv 1 \pmod{31}$ and $7^{6} \equiv 4 \pmod{31}$. So Pocklington’s Theorem shows that 31 is prime, and a certificate for the primality of 31 is $\{5, 7\}$. The primality of 5 does not need a certificate.

We can then proceed with $q = 31$, and we can take e.g. $R = 26$, $a = 1184$, and thus obtain the prime 1613 with certificate chain $\{\{5, 7\}, \{31, 1184\}\}$. Going a few steps further, we easily find that 4999499696151619140572559421 is a prime, with certificate chain
$\{\{5, 7\}, \{31, 1184\}, \{1613, 3222331\}, \{5055143, 40018519077137\}, \{50720746959643, 4851332306743868645788396116\}\}$.
And we easily could have extended this a few more steps.
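The check of a single certificate step can be sketched in a few lines of Python. This is only an illustration under our own assumptions: the function name `pocklington_step` and the convention of returning `None` on failure are not part of the text, and the caller is assumed to already know that $q$ is an odd prime (for instance from the previous step in the chain).

```python
from math import gcd

def pocklington_step(q, R, a):
    """Return n = 1 + 2qR if (q, R, a) satisfies the conditions of
    Pocklington's Theorem, and None otherwise.
    Assumption: the caller already knows that q is an odd prime."""
    if q % 2 == 0 or not (1 <= R < q):
        return None
    n = 1 + 2 * q * R
    # First modular exponentiation: a^(n-1) mod n must equal 1.
    if pow(a, n - 1, n) != 1:
        return None
    # Second modular exponentiation: gcd(a^(2R) - 1, n) must equal 1.
    if gcd(pow(a, 2 * R, n) - 1, n) != 1:
        return None
    return n

print(pocklington_step(5, 3, 7))   # 31
```

A whole chain would be verified by calling this once per link, feeding each proven prime in as the next $q$; the value of $R$ for each link follows from the next prime in the chain via $R = (n - 1)/(2q)$.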

To conclude we note the drawbacks of this method: it cannot produce primality proofs for given numbers, it will generally be slower than probabilistic prime generating algorithms, and the primes that are produced are not really random.

Exercises

5.1. Make a list of all primes below 100. Do this as follows: write all natural numbers up to 100 in a table of e.g. 10 rows and 5 columns. Cross out 1, then cross out all multiples of 2 (except 2 itself), all multiples of 3 (except 3 itself), etcetera. Stop when you are certain that all numbers left are primes.
Describe a similar procedure to list all primes below $x$. The multiples of which numbers have to be crossed out? When can you stop and why? Can you estimate the complexity?
This procedure is called the Sieve of Eratosthenes.

5.2. Assuming the Prime Number Theorem, prove that $\pi(x) \sim \dfrac{x}{\log x - 1}$, and $\pi(x) \sim \mathrm{li}(x)$.

5.3. For each $k \in \mathbb{N}$, find a set of $k$ consecutive composite integers.

5.4. Prove Wilson’s Theorem: For $n \geq 2$, $n$ is prime if and only if $(n-1)! \equiv -1 \pmod{n}$. Why does this not give a good primality test?


5.6. Show that 561 is a Carmichael number. Hint: use $561 = 3 \times 11 \times 17$.

5.7. Let $n$ be a composite number with $a$ being a strong witness but not a Fermat witness. Then show how to find some nontrivial factors of $n$.

5.8. Let $n$ be odd, and $a$ coprime to $n$. Lemma 4.2(a) asserts that $\left(\frac{a}{n}\right) \equiv a^{(n-1)/2} \pmod{n}$ when $n$ is prime.
Compute $\left(\frac{13}{561}\right)$ and $13^{280} \pmod{561}$ (you may use a computer). What is your conclusion on the primality of 561?
The above should suggest a probabilistic primality test to you. Write it out. It is known as the Solovay-Strassen primality test.

5.9. Witnesses, liars and pseudoprimes for the Solovay-Strassen primality test (see Exercise 5.8) are called Euler witnesses, Euler liars and Euler pseudoprimes. Write out their definitions. Show that a strong liar is always a Fermat liar, and that an Euler liar is always a Fermat liar (in fact, a strong liar always is an Euler liar, but proving this is not asked here, see [BS, Theorem 9.3.10]). Conclude which of the tests of Miller-Rabin, Solovay-Strassen and the test based on Fermat’s Theorem is the strongest. Finally prove that if $n \equiv 3 \pmod{4}$ then an Euler liar always is a strong liar.

5.10. Let $n = 493$, and take $a = 2, 30, 86$ or $157$. For each of these $a$ find out whether it is a Fermat liar or witness, an Euler liar or witness, a strong liar or witness.




Chapter 6

Multiplicative functions

Introduction

General references for this chapter: [Sh, Section 2.6]. And for those preferring Dutch: [Be, Chapters 3, 6], [Ke, Chapter 8].

In this chapter the following topics will be treated


• arithmetic functions,

• multiplicative functions,

• the Mobius function,

• the Mobius inversion formula,

• the principle of inclusion and exclusion.

6.1 Multiplicative functions

In this section we study arithmetic functions, that is, functions $f$ defined on $\mathbb{N}$ that satisfy $f(1) \neq 0$. These functions may take values in any convenient subset of $\mathbb{C}$, often in $\mathbb{Z}$.

In particular we will be interested in multiplicative functions. An arithmetic function $f$ is called multiplicative if it satisfies $f(mn) = f(m)f(n)$ whenever $\gcd(m, n) = 1$. Note that multiplicative functions are determined completely by their values at prime powers $p^k$. We already met one important multiplicative function, namely Euler’s $\varphi(n)$, see Theorem 3.1(a).

Recall that $\varphi(n)$ satisfies $\sum_{d \mid n} \varphi(d) = n$ (see Theorem 3.5). For many functions $f$ on $\mathbb{N}$ the sum $g(n) = \sum_{d \mid n} f(d)$ is of interest. Our main goal in this section will be to invert this formula: to find an expression for $f(n)$ in terms of the above defined $g(n)$.

In order to do this in a neat way we define the convolution product $f * g$ of two arithmetic functions $f$ and $g$, by

$$(f * g)(n) = \sum_{d \mid n} f\left(\frac{n}{d}\right) g(d).$$

Note that also

$$(f * g)(n) = \sum_{d \mid n} f(d)\, g\left(\frac{n}{d}\right) = \sum_{d_1 d_2 = n} f(d_1) g(d_2),$$

which is useful as it gives a more symmetric definition of convolution.

We also introduce the following easy multiplicative functions on $\mathbb{N}$:

$$E(n) = \begin{cases} 1 & \text{if } n = 1 \\ 0 & \text{if } n > 1 \end{cases}, \qquad U(n) = 1 \text{ for all } n, \qquad I(n) = n \text{ for all } n.$$

Note that we now have

$$\sum_{d \mid n} f(d) = (U * f)(n)$$

as a convenient notation. For example, the formula $\sum_{d \mid n} \varphi(d) = n$ (Theorem 3.5) now amounts to

$$U * \varphi = I.$$
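To make the convolution product concrete, here is a small Python sketch that checks $U * \varphi = I$ numerically. The helper names (`divisors`, `convolve`, `phi`) are our own choices for illustration, and $\varphi$ is computed by a deliberately naive count rather than by any formula from the text.

```python
from math import gcd

def divisors(n):
    """All positive divisors of n (naive, fine for small n)."""
    return [d for d in range(1, n + 1) if n % d == 0]

def convolve(f, g):
    """Return the convolution product f * g as a new function."""
    return lambda n: sum(f(n // d) * g(d) for d in divisors(n))

E = lambda n: 1 if n == 1 else 0   # unit for the convolution product
U = lambda n: 1
I = lambda n: n

def phi(n):
    """Euler's phi by direct counting of the units modulo n."""
    return sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)

U_phi = convolve(U, phi)
print([U_phi(n) for n in range(1, 11)])   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
```

Because `convolve` returns an ordinary function, products can be nested, e.g. `convolve(convolve(U, U), phi)`, which is a convenient way to experiment with the lemmas that follow.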

The following lemmas give the basic properties of the convolution product. Basically they say that the set of arithmetic functions is a commutative group with the convolution product as group operation, and that the set of multiplicative functions is a subgroup.

Lemma 6.1 Let $f, g, h$ be arithmetic functions.
(a) The convolution product $f * g$ is also an arithmetic function.
(b) The convolution product is commutative, i.e. $f * g = g * f$.
(c) The convolution product is associative, i.e. $(f * g) * h = f * (g * h)$.
(d) The function $E$ is the unit for the convolution product, i.e. $f * E = E * f = f$.
(e) There exists a unique arithmetic function $f^{-1}$ that is the inverse of $f$ with respect to the convolution product, i.e. $f * f^{-1} = f^{-1} * f = E$.

Lemma 6.2 Let $f, g$ be multiplicative functions.
(a) $f(1) = 1$.
(b) The convolution product $f * g$ is also a multiplicative function.
(c) The inverse $f^{-1}$ is also a multiplicative function.

Proof of Lemma 6.1.
(a) All we have to check is that $(f * g)(1) = f(1)g(1) \neq 0$.

(b) This follows from $d \mid n \Leftrightarrow \frac{n}{d} \mid n$.

(c) Note that

$$((f * g) * h)(n) = \sum_{d \mid n} (f * g)(d)\, h\left(\frac{n}{d}\right) = \sum_{d d_3 = n} (f * g)(d)\, h(d_3) = \sum_{d d_3 = n} \sum_{d_1 d_2 = d} f(d_1) g(d_2) h(d_3) = \sum_{d_1 d_2 d_3 = n} f(d_1) g(d_2) h(d_3).$$

The last expression is symmetric, so it is easily seen to be equal to $(f * (g * h))(n)$.

(d) $(f * E)(n) = \sum_{d \mid n} f\left(\frac{n}{d}\right) E(d) = f(n)$.

(e) This can be done inductively (recursively) from the formula

$$(f * f^{-1})(n) = \sum_{d \mid n} f\left(\frac{n}{d}\right) f^{-1}(d) = E(n) = \begin{cases} 1 & \text{if } n = 1 \\ 0 & \text{if } n > 1 \end{cases},$$

as follows. First apply this with $n = 1$, to find $f^{-1}(1) = \dfrac{1}{f(1)}$, which is possible precisely because $f(1) \neq 0$. Next we assume that $f^{-1}(1), f^{-1}(2), \ldots, f^{-1}(n-1)$ are defined, and then from the above formula we have

$$f^{-1}(n) = -\frac{1}{f(1)} \sum_{d \mid n,\ d \neq n} f\left(\frac{n}{d}\right) f^{-1}(d).$$

As there is only one possible choice for $f^{-1}(n)$, uniqueness is guaranteed. □
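The recursion in part (e) translates directly into code. The following Python sketch is one possible rendering, not a prescribed implementation: the name `dirichlet_inverse` is ours, exact arithmetic is done with `Fraction` so that $f(1) \neq 1$ is also handled, and memoisation is added purely for speed.

```python
from fractions import Fraction
from functools import lru_cache

def dirichlet_inverse(f):
    """Return the inverse of the arithmetic function f with respect to the
    convolution product, computed by the recursion of Lemma 6.1(e).
    Assumption: f takes integer (or Fraction) values and f(1) != 0."""
    @lru_cache(maxsize=None)
    def inv(n):
        if n == 1:
            return Fraction(1, f(1))
        # Sum over the proper divisors d of n (d | n, d != n).
        total = sum(f(n // d) * inv(d) for d in range(1, n) if n % d == 0)
        return -total / f(1)
    return inv

U = lambda n: 1
U_inv = dirichlet_inverse(U)
print([int(U_inv(n)) for n in range(1, 13)])
# [1, -1, -1, 0, -1, 1, -1, 0, 0, 1, -1, 0]
```

The values printed for the inverse of $U$ can be compared by hand with the recursion for small $n$.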

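Proof of Lemma 6.2.
(a) 1 is coprime to itself, so $f(1) = f(1 \cdot 1) = f(1)f(1) = f(1)^2$. As $f(1) \neq 0$ we have $f(1) = 1$.
(b) Let $m, n \in \mathbb{N}$ be coprime. The set of divisors $d \mid mn$ is in one to one correspondence to the set of pairs $(d_m, d_n)$ such that $d_m \mid m$ and $d_n \mid n$, namely via $d = d_m d_n$. Note that $d_m$ and $d_n$ are coprime. So

$$\begin{aligned}
(f * g)(mn) &= \sum_{d \mid mn} f\left(\frac{mn}{d}\right) g(d) = \sum_{d_m \mid m} \sum_{d_n \mid n} f\left(\frac{m}{d_m} \frac{n}{d_n}\right) g(d_m d_n) \\
&= \sum_{d_m \mid m} \sum_{d_n \mid n} f\left(\frac{m}{d_m}\right) f\left(\frac{n}{d_n}\right) g(d_m) g(d_n) \\
&= \sum_{d_m \mid m} f\left(\frac{m}{d_m}\right) g(d_m) \sum_{d_n \mid n} f\left(\frac{n}{d_n}\right) g(d_n) \\
&= (f * g)(m)\,(f * g)(n).
\end{aligned}$$

(c) Define a multiplicative function $\tilde{f}$ by $\tilde{f}(p^k) = f^{-1}(p^k)$ for all primes $p$ and $k \geq 0$ (note that this defines $\tilde{f}(n)$ for all $n$). Then by (b) $f * \tilde{f}$ is also multiplicative, and it satisfies $(f * \tilde{f})(p^k) = (f * f^{-1})(p^k) = E(p^k)$. Hence $(f * \tilde{f})(n) = E(n)$ for all $n$, i.e. $f * \tilde{f} = E$, and so by the uniqueness of the inverse we have $f^{-1} = \tilde{f}$. Hence $f^{-1}$ is multiplicative. □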

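As examples of the recursion from the proof of Lemma 6.1(e), note that for multiplicative $f$ (so that $f(1) = 1$) and primes $p$

$$\begin{aligned}
f^{-1}(p) &= -f(p), \\
f^{-1}(p^2) &= -f(p^2) + f(p)^2, \\
f^{-1}(p^3) &= -f(p^3) + 2f(p)f(p^2) - f(p)^3, \\
f^{-1}(p^4) &= -f(p^4) + 2f(p^3)f(p) + f(p^2)^2 - 3f(p^2)f(p)^2 + f(p)^4,
\end{aligned}$$

etc.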

6.2 The Mobius function

The next step is to find the inverse for the function $U$. This function $U^{-1}$ is commonly called $\mu$. Applying the recursion, we find for all primes $p$

$$\begin{aligned}
\mu(p) &= -U(p)\mu(1) = -\mu(1) = -1, \\
\mu(p^2) &= -U(p^2)\mu(1) - U(p)\mu(p) = -\mu(1) - \mu(p) = 0, \\
\mu(p^3) &= -U(p^3)\mu(1) - U(p^2)\mu(p) - U(p)\mu(p^2) = -\mu(1) - \mu(p) - \mu(p^2) = 0, \\
\mu(p^4) &= -U(p^4)\mu(1) - U(p^3)\mu(p) - U(p^2)\mu(p^2) - U(p)\mu(p^3) = -\mu(1) - \mu(p) - \mu(p^2) - \mu(p^3) = 0,
\end{aligned}$$

etc. With induction we get $\mu(p^k) = 0$ whenever $k \geq 2$, namely

$$\mu(p^k) = -\sum_{i=0}^{k-1} U(p^{k-i}) \mu(p^i) = -\mu(1) - \mu(p) - \sum_{i=2}^{k-1} \mu(p^i) = -1 - (-1) - \sum_{i=2}^{k-1} 0 = 0.$$

Note that by Lemma 6.2(c) µ is multiplicative, which shows how to define µ(n) for all n.

A number $n \in \mathbb{N}$ is called squarefree if it is not divisible by a square other than 1. It is now clear that $\mu(n) = 0$ whenever $n$ is not squarefree, and if $n$ is squarefree, then $\mu(n) = 1$ or $-1$, according to the number of prime divisors of $n$ being even or odd.

The function $\mu$ is called the Mobius function. Summarizing, it can be defined by

$$\mu(n) = \begin{cases} 0 & \text{if } n \text{ is not squarefree,} \\ 1 & \text{if } n \text{ is squarefree and has an even number of prime divisors,} \\ -1 & \text{if } n \text{ is squarefree and has an odd number of prime divisors,} \end{cases}$$

is multiplicative, and satisfies $\mu * U = E$. This last property written out reads

$$\sum_{d \mid n} \mu(d) = \begin{cases} 1 & \text{if } n = 1 \\ 0 & \text{if } n > 1 \end{cases}.$$

Example: $\mu(15) = 1$ because $15 = 3 \times 5$ is squarefree and has 2 prime divisors, $\mu(30) = -1$ because $30 = 2 \times 3 \times 5$ is squarefree and has 3 prime divisors, and $\mu(60) = 0$ because $60 = 2^2 \times 3 \times 5$ is not squarefree.
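The case distinction in the definition can be evaluated directly by trial division. The sketch below is one straightforward way to do so in Python (the function name `mobius` is our choice, and trial division is only sensible for small $n$); it reproduces the three examples above.

```python
def mobius(n):
    """Mobius function of a positive integer n, by trial division."""
    result = 1
    p = 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:       # p^2 divided the original n: not squarefree
                return 0
            result = -result     # one more prime divisor
        p += 1
    if n > 1:                    # a single prime factor is left over
        result = -result
    return result

print(mobius(15), mobius(30), mobius(60))   # 1 -1 0
```

Summing `mobius(d)` over the divisors `d` of a number gives 1 for $n = 1$ and 0 otherwise, in line with $\mu * U = E$.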

6.3 Mobius inversion

Now we reach the main result of this section, the Mobius Inversion Formula.

Theorem 6.3 (Mobius Inversion Formula)
Let $f$ be an arithmetic function. Define the arithmetic function $g$ by $g = U * f$, i.e.

$$g(n) = \sum_{d \mid n} f(d) \quad \text{for all } n \in \mathbb{N}.$$

Then $f = \mu * g$, that is

$$f(n) = \sum_{d \mid n} \mu\left(\frac{n}{d}\right) g(d) \quad \text{for all } n \in \mathbb{N}.$$

Moreover, if $f$ is multiplicative, then so is $g$.

Proof. By the definition of $\mu$ we have $\mu * g = \mu * (U * f) = (\mu * U) * f = E * f = f$. Lemma 6.2 guarantees multiplicativity of $g$ based on that of $f$. □

Example: with $f = E$ we have $g = U * E = U$, and Mobius inversion now gives $E = \mu * U$, which is exactly the definition of $\mu$ as the inverse of $U$.


Example: Theorem 3.5 showed that $\sum_{d \mid n} \varphi(d) = n$, or, in our new language, $U * \varphi = I$. When we apply Mobius inversion we get $\varphi = \mu * I$, or

$$\varphi(n) = \sum_{d \mid n} \mu(d) \frac{n}{d}.$$

This gives a new proof for Lemma 3.9. Because $I$ and $\mu$ are multiplicative, so is $\varphi$. This is a new proof for Theorem 3.1(a). Now note that for primes $p$ and $i \geq 0$ we have $\mu(p^i) = 1$ if $i = 0$, $\mu(p^i) = -1$ if $i = 1$, and $\mu(p^i) = 0$ if $i \geq 2$, so for all $k \in \mathbb{N}$

$$\varphi(p^k) = \sum_{d \mid p^k} \mu(d) \frac{p^k}{d} = \sum_{i=0}^{k} \mu(p^i) p^{k-i} = p^k - p^{k-1}.$$

With the multiplicativity of $\varphi$ this provides a new proof of Theorem 3.1(b).

These observations on $\varphi$ will return in the next section.
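As a small worked instance of $\varphi = \mu * I$ (the choice $n = 12$ is ours, purely for illustration), the divisors of 12 are 1, 2, 3, 4, 6 and 12, of which 4 and 12 are not squarefree:

```latex
\begin{align*}
\varphi(12) &= \mu(1)\cdot 12 + \mu(2)\cdot 6 + \mu(3)\cdot 4 + \mu(4)\cdot 3 + \mu(6)\cdot 2 + \mu(12)\cdot 1 \\
            &= 12 - 6 - 4 + 0 + 2 + 0 = 4 .
\end{align*}
```

This agrees with a direct count: the integers in $\{0, 1, \ldots, 11\}$ coprime to 12 are 1, 5, 7 and 11.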

6.4 The Principle of Inclusion and Exclusion

Consider $n = pq$ where $p, q$ are distinct prime numbers. We will count the number of elements of $\mathbb{Z}_n^*$ by a combinatorial argument. We start with $\mathbb{Z}_n = \{0, 1, 2, \ldots, n-1\}$, which has $n$ elements. Then we note that we should not count all multiples of $p$, of which there are $q$ (namely $0, p, 2p, \ldots, (q-1)p$), and similarly we should not count all multiples of $q$, of which there are $p$ (namely $0, q, 2q, \ldots, (p-1)q$). Finally we should note that we now have subtracted twice the numbers that are divisible by both $p$ and $q$, of which there is exactly one (namely 0), and to compensate we should again add this number. Hence $\varphi(n) = n - p - q + 1$. This is in accordance with Theorem 3.1(b).

A bit more abstract, let $S$ be a finite set with subsets $S_1$ and $S_2$. Then the number of elements of $S$ that are not in either one of $S_1$ or $S_2$ is equal to

$$\#(S \setminus (S_1 \cup S_2)) = \#S - \#S_1 - \#S_2 + \#(S_1 \cap S_2).$$

And with three subsets $S_1, S_2, S_3$ we get

$$\#(S \setminus (S_1 \cup S_2 \cup S_3)) = \#S - \#S_1 - \#S_2 - \#S_3 + \#(S_1 \cap S_2) + \#(S_1 \cap S_3) + \#(S_2 \cap S_3) - \#(S_1 \cap S_2 \cap S_3).$$

In general, say that we have a finite set $S$ with $\#S = N$. Further we have properties $P_1, P_2, \ldots, P_k$ which elements of $S$ may or may not possess. For given $\{i_1, i_2, \ldots, i_m\} \subset \{1, 2, \ldots, k\}$ with all $i_j$ different we denote by $N_{i_1, \ldots, i_m}$ the number of elements of $S$ that possess properties $P_{i_1}, P_{i_2}, \ldots, P_{i_m}$ (and possibly other properties from $P_1, P_2, \ldots, P_k$ as well). And the number of elements of $S$ that possess none of the properties $P_i$ is denoted by $N_\emptyset$.

Theorem 6.4 (Inclusion-Exclusion Principle)

$$\begin{aligned}
N_\emptyset &= N - \sum_{1 \leq i_1 \leq k} N_{i_1} + \sum_{1 \leq i_1 < i_2 \leq k} N_{i_1, i_2} + \ldots + (-1)^k N_{1, 2, \ldots, k} \\
&= N + \sum_{m=1}^{k} (-1)^m \sum_{1 \leq i_1 < \ldots < i_m \leq k} N_{i_1, \ldots, i_m}.
\end{aligned}$$

Proof. Consider an element of $S$ that satisfies none of the properties. It is counted once on both sides of the equality.
Next consider an element of $S$ that satisfies exactly $r$ of the properties, with $1 \leq r \leq k$. On the left hand side it is not counted, and on the right hand side, in the term $\sum_{1 \leq i_1 < \ldots < i_m \leq k} N_{i_1, \ldots, i_m}$ it is counted $\binom{r}{m}$ times (which equals 0 when $r < m$). Together this amounts to $1 + \sum_{m=1}^{r} (-1)^m \binom{r}{m} = (1 - 1)^r = 0$ times. □

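Finally we will once more prove Theorem 3.1(b), and show the relation to Mobius inversion. Let $n$ have the prime factorization $n = \prod_{i=1}^{k} p_i^{e_i}$. Let $P_i$ be the property of divisibility by $p_i$, applied to elements of $\mathbb{Z}_n$. The Inclusion-Exclusion Principle then yields

$$\varphi(n) = n - \sum_{1 \leq i_1 \leq k} \frac{n}{p_{i_1}} + \sum_{1 \leq i_1 < i_2 \leq k} \frac{n}{p_{i_1} p_{i_2}} + \ldots + (-1)^k \frac{n}{p_1 p_2 \cdots p_k}.$$

On the one hand,

$$n - \sum_{1 \leq i_1 \leq k} \frac{n}{p_{i_1}} + \sum_{1 \leq i_1 < i_2 \leq k} \frac{n}{p_{i_1} p_{i_2}} + \ldots + (-1)^k \frac{n}{p_1 p_2 \cdots p_k} = n \left(1 - \frac{1}{p_1}\right) \cdots \left(1 - \frac{1}{p_k}\right),$$

as can be seen by expanding the product in the right hand side. This gives exactly Theorem 3.1(b). And on the other hand every squarefree divisor of $n$ is of the form $p_{i_1} \cdots p_{i_m}$, so

$$n - \sum_{1 \leq i_1 \leq k} \frac{n}{p_{i_1}} + \sum_{1 \leq i_1 < i_2 \leq k} \frac{n}{p_{i_1} p_{i_2}} + \ldots + (-1)^k \frac{n}{p_1 p_2 \cdots p_k} = \sum_{d \mid n} \mu(d) \frac{n}{d},$$

i.e. $\varphi(n) = \sum_{d \mid n} \mu(d) \frac{n}{d}$, in short $\varphi = \mu * I$, which we had already seen in the previous section. Again we have a new proof of $\sum_{d \mid n} \varphi(d) = n$.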
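The inclusion-exclusion count of $\varphi(n)$ can be checked mechanically by summing over all subsets of the prime divisors of $n$. The following Python sketch does this and compares with a naive count; the function names are ours, and the subset enumeration is exponential in the number of prime divisors, which is harmless for small examples.

```python
from itertools import combinations
from math import gcd, prod

def prime_divisors(n):
    """Distinct prime divisors of n, by trial division."""
    ps, p = [], 2
    while p * p <= n:
        if n % p == 0:
            ps.append(p)
            while n % p == 0:
                n //= p
        p += 1
    if n > 1:
        ps.append(n)
    return ps

def phi_inclusion_exclusion(n):
    """phi(n) as an alternating sum of n / (product of a subset of primes)."""
    ps = prime_divisors(n)
    total = 0
    for m in range(len(ps) + 1):
        for subset in combinations(ps, m):
            total += (-1) ** m * (n // prod(subset))
    return total

def phi_count(n):
    """phi(n) by direct counting, used here only as a cross-check."""
    return sum(1 for x in range(1, n + 1) if gcd(x, n) == 1)

print(phi_inclusion_exclusion(60), phi_count(60))   # 16 16
```

For $n = 60$ the alternating sum reads $60 - 30 - 20 - 12 + 10 + 6 + 4 - 2 = 16$.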

6.5 Fermat and Euler revisited

The usual way to generalize Fermat’s Little Theorem 3.2 to non-prime moduli is Euler’s Theorem 3.1. Another generalization of Fermat’s Little Theorem to non-prime moduli is the following elementary result, that seems to be not very well known.

Theorem 6.5 (Generalization of Fermat’s Little Theorem) Let $n > 1$ be an integer, and let $a \in \mathbb{Z}$. Then

$$a^n \equiv -\sum_{k \mid n,\ k < n} \mu(n/k)\, a^k \pmod{n}. \tag{6.1}$$

With $n$ having prime factors $p_1, p_2, \ldots, p_r$ (i.e. $n = \prod_{i=1}^{r} p_i^{e_i}$), we also can phrase (6.1) as

$$a^n \equiv \sum_{k=1}^{r} (-1)^{k+1} \sum_{\substack{S \subset \{1, 2, \ldots, r\} \\ \#S = k}} a^{n / \prod_{i \in S} p_i} \pmod{n}.$$

To get a feeling for this, let’s write out a few cases.

When $n = p$ is prime, (6.1) reads

$$a^p \equiv a \pmod{p},$$

which is just Fermat’s Little Theorem.

When $n = pq$ for $p, q$ different primes, (6.1) reads

$$a^n \equiv a^p + a^q - a \pmod{n}.$$

This statement seems already to be not well known. Note that we can infer the funny formula

$$a^{p+q-1} \equiv a^p + a^q - a \pmod{pq}.$$

When $n = pqr$ for $p, q, r$ different primes, (6.1) reads

$$a^n \equiv a^{pq} + a^{pr} + a^{qr} - a^p - a^q - a^r + a \pmod{n}.$$

Again we get a funny formula:

$$a^{pq+pr+qr-p-q-r+1} \equiv a^{pq} + a^{pr} + a^{qr} - a^p - a^q - a^r + a \pmod{pqr}.$$

Continuation is obvious.

With $n = p^e$ the Generalization of Fermat’s Little Theorem gives $a^{p^e} \equiv a^{p^{e-1}} \pmod{p^e}$, hence if $p \nmid a$ we have $a^{p^e - p^{e-1}} \equiv 1 \pmod{p^e}$, and Euler’s Theorem now readily follows as well.

Here is a short proof of the above Generalization of Fermat’s Little Theorem.

Proof. Consider sequences $(x_1, x_2, \ldots, x_n)$ of elements $x_i \in \mathbb{Z}_a$. Clearly there are $a^n$ possible sequences. We define two sequences to be equivalent when the one is a cyclic shift of the other. Let $\alpha_k$ be the number of equivalence classes consisting of $k$ elements. Clearly $\alpha_k$ is nonzero only when $k \mid n$, and we find

$$a^n = \sum_{k \mid n} k \alpha_k.$$

Mobius inversion yields

$$n \alpha_n = \sum_{k \mid n} \mu(n/k)\, a^k,$$

and this immediately gives the result. □
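Congruence (6.1) is easy to test numerically in the equivalent form $\sum_{k \mid n} \mu(n/k)\, a^k \equiv 0 \pmod{n}$. The Python sketch below does so for small $n$ and $a$; the function names are ours and the Mobius function is computed by plain trial division, so this is a sanity check rather than an efficient method.

```python
def mobius(n):
    """Mobius function by trial division."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def generalized_fermat_holds(a, n):
    """Check that the sum over k | n of mu(n/k) * a^k is divisible by n."""
    total = sum(mobius(n // k) * a ** k for k in range(1, n + 1) if n % k == 0)
    return total % n == 0

# n = 15 = 3 * 5: (6.1) reads a^15 = a^3 + a^5 - a (mod 15).
a = 7
print(pow(a, 15, 15) == (a ** 3 + a ** 5 - a) % 15)   # True
print(all(generalized_fermat_holds(a, n)
          for n in range(2, 40) for a in range(-5, 10)))   # True
```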

Exercises

6.1. Let $f : \mathbb{N} \to \mathbb{Z}$ be an arithmetic function. Show that if $f$ is multiplicative, then also $f^{-1}$ is a function from $\mathbb{N}$ to $\mathbb{Z}$. What if $f$ is not multiplicative?

6.2. Give interpretations for $U * U$ and $U * I$.

6.3. Compute the inverse $I^{-1}$ of $I$. Also show that the inverse $\varphi^{-1}$ of $\varphi$ is given by $\sum_{d \mid n} d\,\mu(d)$.

6.4. Count the number of positive integers below 1000 that are not divisible by 5, 6 or 7.

Chapter 7

Continued Fractions

Introduction

General reference for this chapter: [HW, Chapters XI and XXIII]. And for those preferring Dutch: [Be, Chapter 14], [Ke, Sections 7.6, 15.6, 15.7].

In this chapter the following topics will be treated

• continued fractions,

• lattice basis reduction,

• diophantine approximation.

7.1 The Euclidean Algorithm revisited

We reformulate the Extended Euclidean Algorithm. Say we want to apply it to the positive integers $s$ and $t$. Then we put

$$s_0 = t_{-1} = s, \quad t_0 = t, \quad p_{-2} = 0, \quad p_{-1} = 1, \quad q_{-2} = 1, \quad q_{-1} = 0,$$

and then do the following computation for $n = 0, 1, 2, \ldots$ until we reach $t_n = 0$:

$$\begin{aligned}
a_n &= \left\lfloor \frac{s_n}{t_n} \right\rfloor, \\
p_n &= a_n p_{n-1} + p_{n-2}, \\
q_n &= a_n q_{n-1} + q_{n-2}, \\
s_{n+1} &= t_n, \\
t_{n+1} &= s_n - a_n t_n.
\end{aligned}$$

The invariant now is $p_n t - q_n s = (-1)^{n+1} t_{n+1} = (-1)^{n+1} s_{n+2}$. We will not be interested anymore in the greatest common divisor, but in the numbers $a_n$, $p_n$ and $q_n$ that show up in this algorithm. Note that they depend on the fraction, and are independent of the gcd: if $s$ and $t$ are multiplied by the same integer, then the $a_n, p_n, q_n$ do not change at all.

Example: let us take $s = 23$ and $t = 16$. Then

$$\begin{array}{c|rrrrrrr}
n   & -2 & -1 &  0 &  1 &  2 &  3 & 4 \\ \hline
s_n &    &    & 23 & 16 &  7 &  2 & 1 \\
t_n &    & 23 & 16 &  7 &  2 &  1 & 0 \\
a_n &    &    &  1 &  2 &  3 &  2 &   \\
p_n &  0 &  1 &  1 &  3 & 10 & 23 &   \\
q_n &  1 &  0 &  1 &  2 &  7 & 16 &
\end{array}$$
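The reformulated algorithm is short enough to run by machine. The Python sketch below follows the recurrences above and checks the invariant $p_n t - q_n s = (-1)^{n+1} t_{n+1}$ at every step; the function name `euclid_table` and the choice to return three lists are ours.

```python
def euclid_table(s, t):
    """Run the reformulated Extended Euclidean Algorithm on positive
    integers s and t. Returns the lists (a_n), (p_n), (q_n) for n >= 0."""
    a_list, p_list, q_list = [], [], []
    p_prev2, p_prev1 = 0, 1          # p_{-2}, p_{-1}
    q_prev2, q_prev1 = 1, 0          # q_{-2}, q_{-1}
    s_n, t_n = s, t                  # s_0, t_0
    n = 0
    while t_n != 0:
        a = s_n // t_n
        p = a * p_prev1 + p_prev2
        q = a * q_prev1 + q_prev2
        s_n, t_n = t_n, s_n - a * t_n            # now s_{n+1}, t_{n+1}
        assert p * t - q * s == (-1) ** (n + 1) * t_n   # the invariant
        a_list.append(a); p_list.append(p); q_list.append(q)
        p_prev2, p_prev1 = p_prev1, p
        q_prev2, q_prev1 = q_prev1, q
        n += 1
    return a_list, p_list, q_list

print(euclid_table(23, 16))
# ([1, 2, 3, 2], [1, 3, 10, 23], [1, 2, 7, 16])
```

Calling `euclid_table(46, 32)` returns the same three lists, illustrating that the $a_n, p_n, q_n$ depend only on the fraction $s/t$.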

7.2 Continued Fractions

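If we write the successive divisions with remainder really as divisions, we get

$$\frac{23}{16} = 1 + \frac{7}{16}, \qquad \frac{16}{7} = 2 + \frac{2}{7}, \qquad \frac{7}{2} = 3 + \frac{1}{2}, \qquad \frac{2}{1} = 2.$$

Note that the fractions are chained, each time put upside down. When we substitute each next fraction in the previous one, we get

$$\frac{23}{16} = 1 + \frac{7}{16} = 1 + \cfrac{1}{\frac{16}{7}} = 1 + \cfrac{1}{2 + \frac{2}{7}} = 1 + \cfrac{1}{2 + \cfrac{1}{\frac{7}{2}}} = 1 + \cfrac{1}{2 + \cfrac{1}{3 + \cfrac{1}{2}}}.$$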

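In general we get

$$\frac{s}{t} = \frac{s_0}{t_0} = a_0 + \frac{t_1}{s_1} = a_0 + \cfrac{1}{\frac{s_1}{t_1}} = a_0 + \cfrac{1}{a_1 + \cfrac{1}{\frac{s_2}{t_2}}} = \ldots = a_0 + \cfrac{1}{a_1 + \cfrac{1}{a_2 + \cfrac{1}{a_3 + \cfrac{1}{\ddots + \cfrac{1}{a_n}}}}}.$$

This is a typographical nightmare, that’s why we introduce the notation

$$[a_0, a_1, a_2, \ldots, a_n] = a_0 + \cfrac{1}{a_1 + \cfrac{1}{\ddots + \cfrac{1}{a_n}}}.$$

Example: $\frac{23}{16} = [1, 2, 3, 2]$.

Next we look at the truncations:

$$[1] = 1 = \frac{1}{1}, \qquad [1, 2] = 1 + \frac{1}{2} = \frac{3}{2}, \qquad [1, 2, 3] = 1 + \cfrac{1}{2 + \frac{1}{3}} = \frac{10}{7}, \qquad [1, 2, 3, 2] = \frac{23}{16}.$$

It looks like

$$[a_0, a_1, a_2, \ldots, a_n] = \frac{p_n}{q_n}$$

for all $n$, and this is indeed true in general, as we’ll see below.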

Expressions like $[a_0, a_1, a_2, \ldots]$ are called continued fractions. We now introduce the continued fraction algorithm. When applied to rational numbers, it is just a restatement of the Extended Euclidean Algorithm applied to numerator and denominator. But it can also be applied to non-rational numbers. The algorithm is the same: each time take the integral part, put the remainder upside down and continue with that.

Take for example $\pi = 3.14159\ldots$, then we get

$$\pi = 3 + 0.14159\ldots, \quad \frac{1}{0.14159\ldots} = 7 + 0.06251\ldots, \quad \frac{1}{0.06251\ldots} = 15 + 0.99659\ldots,$$
$$\frac{1}{0.99659\ldots} = 1 + 0.00341\ldots, \quad \frac{1}{0.00341\ldots} = 292 + 0.63459\ldots, \quad \text{etc.}$$

So $\pi = [3, 7, 15, 1, 292, \ldots]$.

Rational numbers have finite continued fractions (why?), irrational numbers have infinite continued fractions (why?).

When $\alpha = [a_0, a_1, a_2, \ldots]$ (finite or infinite), the fractions that are equal to the truncated continued fractions, i.e. $\frac{p_n}{q_n} = [a_0, a_1, \ldots, a_n]$, are called convergents of $\alpha$. The numbers $a_0, a_1, a_2, \ldots$ are called partial quotients of the continued fraction.

The following algorithm computes partial quotients and convergents for any $\alpha \in \mathbb{R}_{>0}$ (for negative numbers everything goes through as well).

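Algorithm 7.1 (Continued Fraction Algorithm)

Input: $\alpha \in \mathbb{R}_{>0}$, $n \in \mathbb{N}$
Output: truncated continued fraction $[a_0, a_1, a_2, \ldots, a_m]$ of $\alpha$
    and the convergents $\frac{p_0}{q_0}, \frac{p_1}{q_1}, \ldots, \frac{p_m}{q_m}$, where $m = n$
    unless $\alpha \in \mathbb{Q}$ and $\alpha = [a_0, a_1, \ldots, a_m]$ with $m < n$
Step 1: $\alpha_0 \leftarrow \alpha$
    $p_{-2} \leftarrow 0$, $p_{-1} \leftarrow 1$
    $q_{-2} \leftarrow 1$, $q_{-1} \leftarrow 0$
    $m \leftarrow n$
Step 2: for $i$ from 0 to $m$ do
    $a_i \leftarrow \lfloor \alpha_i \rfloor$
    $p_i \leftarrow a_i p_{i-1} + p_{i-2}$
    $q_i \leftarrow a_i q_{i-1} + q_{i-2}$
    if $\alpha_i = a_i$ then $m \leftarrow i$
    if $i < m$ then $\alpha_{i+1} \leftarrow \dfrac{1}{\alpha_i - a_i}$
Step 3: output $[a_0, a_1, \ldots, a_m]$, $\frac{p_0}{q_0}, \frac{p_1}{q_1}, \ldots, \frac{p_m}{q_m}$

Note that usually $m = n$, except in the case where $\alpha \in \mathbb{Q}$ has a continued fraction that is shorter than $n$ partial quotients.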
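A direct transcription of Algorithm 7.1 into Python might look as follows. This is a sketch under one explicit assumption that is ours, not the text's: $\alpha$ is passed as an exact `Fraction`, so that the test $\alpha_i = a_i$ is reliable. An irrational number such as $\pi$ then has to be replaced by a sufficiently precise rational approximation, as is done in the second call.

```python
from fractions import Fraction
from math import floor

def continued_fraction(alpha, n):
    """Partial quotients [a_0..a_m] and convergents [(p_0, q_0)..(p_m, q_m)]
    of alpha, following Algorithm 7.1.
    Assumption: alpha is an exact Fraction > 0."""
    p_prev2, p_prev1 = 0, 1          # p_{-2}, p_{-1}
    q_prev2, q_prev1 = 1, 0          # q_{-2}, q_{-1}
    quotients, convergents = [], []
    x = alpha                        # alpha_0
    for _ in range(n + 1):
        a = floor(x)
        p = a * p_prev1 + p_prev2
        q = a * q_prev1 + q_prev2
        quotients.append(a)
        convergents.append((p, q))
        if x == a:                   # expansion has terminated: m = i
            break
        x = 1 / (x - a)              # alpha_{i+1}
        p_prev2, p_prev1 = p_prev1, p
        q_prev2, q_prev1 = q_prev1, q
    return quotients, convergents

print(continued_fraction(Fraction(23, 16), 10))
# ([1, 2, 3, 2], [(1, 1), (3, 2), (10, 7), (23, 16)])

PI_APPROX = Fraction("3.14159265358979323846264338327950288")
print(continued_fraction(PI_APPROX, 4)[0])
# [3, 7, 15, 1, 292]
```

With floating point input the equality test and the repeated inversion would quickly become unreliable, which is why exact rationals are used here.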

Let us show that the convergents can indeed be found by the recurrence formulas as given in the continued fraction algorithm.

Lemma 7.1 Let $\alpha = [a_0, a_1, a_2, \ldots]$. Let $p_n, q_n$ for $n = -2, -1, 0, 1, 2, \ldots$ be given by

$$\begin{cases} p_{-2} = 0, \quad p_{-1} = 1, \quad p_n = a_n p_{n-1} + p_{n-2} \\ q_{-2} = 1, \quad q_{-1} = 0, \quad q_n = a_n q_{n-1} + q_{n-2} \end{cases} \quad \text{for } n = 0, 1, 2, \ldots.$$

Then $\frac{p_n}{q_n} = [a_0, a_1, \ldots, a_n]$, i.e. the $\frac{p_n}{q_n}$ are the convergents of $\alpha$.

To prove this result we need the following auxiliary lemma.

Lemma 7.2 Let $a_0 \in \mathbb{Z}$, and $a_1, a_2, \ldots \in \mathbb{N}$ (finitely or infinitely many). Let $p_n, q_n$ for $n = 0, 1, 2, \ldots$ be given as in Lemma 7.1. Then for all $\xi \in \mathbb{R}$, $\xi > 0$ and all $n \geq 0$

$$\frac{\xi p_n + p_{n-1}}{\xi q_n + q_{n-1}} = [a_0, a_1, \ldots, a_n, \xi].$$

Further $p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}$ for all $n \geq -1$, and $p_n q_{n-2} - p_{n-2} q_n = (-1)^n a_n$ for all $n \geq 0$.

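Proof. By induction. For $n = 0$ we have $\dfrac{\xi p_0 + p_{-1}}{\xi q_0 + q_{-1}} = \dfrac{\xi a_0 + 1}{\xi} = a_0 + \dfrac{1}{\xi} = [a_0, \xi]$. Next assume that $\dfrac{\eta p_{n-1} + p_{n-2}}{\eta q_{n-1} + q_{n-2}} = [a_0, a_1, \ldots, a_{n-1}, \eta]$ for some $n \geq 1$ and all $\eta \in \mathbb{R}$, $\eta > 0$. Then we find

$$[a_0, a_1, \ldots, a_n, \xi] = \left[a_0, a_1, \ldots, a_{n-1}, a_n + \frac{1}{\xi}\right] = \frac{\left(a_n + \frac{1}{\xi}\right) p_{n-1} + p_{n-2}}{\left(a_n + \frac{1}{\xi}\right) q_{n-1} + q_{n-2}} = \frac{a_n p_{n-1} + p_{n-2} + \frac{1}{\xi} p_{n-1}}{a_n q_{n-1} + q_{n-2} + \frac{1}{\xi} q_{n-1}} = \frac{p_n + \frac{1}{\xi} p_{n-1}}{q_n + \frac{1}{\xi} q_{n-1}} = \frac{\xi p_n + p_{n-1}}{\xi q_n + q_{n-1}}.$$

The proofs of $p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}$ and $p_n q_{n-2} - p_{n-2} q_n = (-1)^n a_n$ are left as an exercise (7.6). □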

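Proof of Lemma 7.1. For $n = 0$ we have $\frac{p_0}{q_0} = \frac{a_0}{1} = [a_0]$. For $n \geq 1$, apply Lemma 7.2 with $n - 1$ in place of $n$ and with $\xi = a_n$, to find

$$[a_0, a_1, \ldots, a_{n-1}, a_n] = \frac{a_n p_{n-1} + p_{n-2}}{a_n q_{n-1} + q_{n-2}} = \frac{p_n}{q_n}.$$

□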

A consequence is that the denominators $q_n$ of the convergents grow at least exponentially, see Exercise 7.7. This implies that finding the convergents up to a certain large size of the numerator and denominator is computationally easy.

We return to the example of $\pi = [3, 7, 15, 1, 292, \ldots]$, and give the first few convergents:

$$\frac{p_0}{q_0} = 3, \qquad \frac{p_1}{q_1} = \frac{22}{7} = 3.14285\ldots, \qquad \frac{p_2}{q_2} = \frac{333}{106} = 3.14150\ldots, \qquad \frac{p_3}{q_3} = \frac{355}{113} = 3.14159\ldots.$$

This might clarify the term convergents. In antiquity the approximations $\frac{22}{7}$ and $\frac{355}{113}$ were already known as good approximations to $\pi$.

7.3 Diophantine approximation

Diophantine approximation is the area of number theory that studies how well real numbers can be approximated by rational numbers. Continued fractions play an important role here.

Theorem 7.3 (Inequality for convergents) Let $\frac{p_n}{q_n}$ be a convergent of $\alpha = [a_0, a_1, a_2, \ldots]$. Then

$$\frac{1}{(a_{n+1} + 2) q_n^2} < \left| \alpha - \frac{p_n}{q_n} \right| < \frac{1}{a_{n+1} q_n^2}.$$

Proof. Define $\alpha_{n+1} = [a_{n+1}, a_{n+2}, \ldots]$ so that $\alpha = [a_0, a_1, a_2, \ldots, a_n, \alpha_{n+1}]$. Lemma 7.2 with $\xi = \alpha_{n+1}$ gives

$$\left| \alpha - \frac{p_n}{q_n} \right| = \left| \frac{\alpha_{n+1} p_n + p_{n-1}}{\alpha_{n+1} q_n + q_{n-1}} - \frac{p_n}{q_n} \right| = \frac{|p_{n-1} q_n - p_n q_{n-1}|}{(\alpha_{n+1} q_n + q_{n-1}) q_n} = \frac{1}{(\alpha_{n+1} q_n + q_{n-1}) q_n},$$

and the result now follows by $a_{n+1} = \lfloor \alpha_{n+1} \rfloor$, because $a_{n+1} q_n \leq a_{n+1} q_n + q_{n-1} < \alpha_{n+1} q_n + q_{n-1} < (a_{n+1} + 1) q_n + q_{n-1} \leq (a_{n+1} + 2) q_n$. □
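The two-sided bound of Theorem 7.3 can be checked exactly on a rational stand-in for $\pi$. In the Python sketch below, `PI_APPROX` is a 35-digit decimal truncation of $\pi$ chosen by us; its first partial quotients coincide with those of $\pi$, and since everything is done with `Fraction` there is no rounding involved. The helper names are ours.

```python
from fractions import Fraction
from math import floor

def expansion(alpha, n):
    """Partial quotients and convergents (as Fractions) of an exact Fraction."""
    quotients, convergents = [], []
    p2, p1, q2, q1 = 0, 1, 1, 0      # p_{-2}, p_{-1}, q_{-2}, q_{-1}
    x = alpha
    for _ in range(n + 1):
        a = floor(x)
        p, q = a * p1 + p2, a * q1 + q2
        quotients.append(a)
        convergents.append(Fraction(p, q))
        if x == a:
            break
        x = 1 / (x - a)
        p2, p1, q2, q1 = p1, p, q1, q
    return quotients, convergents

def theorem_7_3_holds(alpha, n):
    """For each index i < n, test 1/((a_{i+1}+2) q_i^2) < |alpha - p_i/q_i|
    < 1/(a_{i+1} q_i^2). Returns one boolean per index."""
    quotients, convergents = expansion(alpha, n)
    results = []
    for i in range(len(quotients) - 1):
        q = convergents[i].denominator       # convergents are in lowest terms
        err = abs(alpha - convergents[i])
        a_next = quotients[i + 1]
        results.append(Fraction(1, (a_next + 2) * q * q) < err
                       < Fraction(1, a_next * q * q))
    return results

PI_APPROX = Fraction("3.14159265358979323846264338327950288")
print(theorem_7_3_holds(PI_APPROX, 4))   # [True, True, True, True]
```

The third entry corresponds to $\frac{333}{106}$ with $a_3 = 1$, and the fourth to $\frac{355}{113}$ with the large partial quotient $a_4 = 292$, where the two bounds are very close together.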

The following corollaries are immediate.

Theorem 7.4 (Convergents converge) Let $\frac{p_0}{q_0}, \frac{p_1}{q_1}, \frac{p_2}{q_2}, \ldots$ be the convergents of $\alpha \notin \mathbb{Q}$. Then $\alpha = \lim_{n \to \infty} \frac{p_n}{q_n}$.

Theorem 7.5 (Necessary condition for convergents) Let $\frac{p}{q}$ be a convergent of $\alpha$. Then $\left| \alpha - \frac{p}{q} \right| < \frac{1}{q^2}$.

Theorem 7.6 (Diophantine Approximation) If $\alpha \notin \mathbb{Q}$ the inequality $\left| \alpha - \frac{p}{q} \right| < \frac{1}{q^2}$ has infinitely many solutions $\frac{p}{q}$.

Theorem 7.3 shows that the convergents approximate the number $\alpha$ very well. When a large partial quotient occurs, the previous convergent is an extremely good approximation. This becomes clear in the example of $\pi$, where the approximations $\frac{22}{7}$ and $\frac{355}{113}$, already well known in antiquity¹, indeed correspond to large partial quotients, namely 15 and 292.

The following results give a criterion for an approximation to be a convergent (a converse to Theorem 7.5), and show that convergents are exactly the best approximations in some sense. First some definitions.

A rational number $\frac{p}{q}$ is called a best approximation to $\alpha$ if all rational numbers which are closer to $\alpha$ have larger numerator and denominator. In other words, $\frac{p}{q}$ is a best approximation to $\alpha$ if for all $\frac{p'}{q'}$ with $\left| \alpha - \frac{p'}{q'} \right| < \left| \alpha - \frac{p}{q} \right|$ it holds that $q' > q$.

A rational number $\frac{p}{q}$ is called a strong best approximation to $\alpha$ if for all $\frac{p'}{q'}$ with $|q'\alpha - p'| < |q\alpha - p|$ it holds that $q' > q$.

A strong best approximation always is a best approximation, but not necessarily the other way around, see Exercise 7.15. The concept of best approximation seems better from an intuitive point of view, but the concept of strong best approximation appears to have nicer properties.

Theorem 7.7 (Sufficient condition for convergents) Let $\alpha \in \mathbb{R}$, and let $\frac{p}{q}$ be a rational number satisfying $\left| \alpha - \frac{p}{q} \right| < \frac{1}{2q^2}$. Then $\frac{p}{q}$ is a convergent of $\alpha$.

Theorem 7.8 (Convergents are Strong Best Approximations) Let $\alpha \in \mathbb{R}$. The rational number $\frac{p}{q}$ is a convergent of $\alpha$ if and only if it is a strong best approximation to $\alpha$.

To prove these results we introduce a Lemma.

¹ $\frac{22}{7}$ as a good approximation to $\pi$ was known to Archimedes (Greece, 3rd century BC), and $\frac{355}{113}$ to Zu Chongzhi (China, 5th century AD; the first European appearance is from Adriaan Anthoniszoon, The Netherlands, 1585).



Lemma 7.9 Let $\alpha = [a_0, a_1, a_2, \ldots]$ have convergents $\frac{p_i}{q_i}$ for $i = 0, 1, 2, \ldots$. Let $\frac{p}{q}$ be a rational number such that $q_{n-1} < q < q_n$ for some $n \in \mathbb{N}$. Then $|p - q\alpha| > |p_{n-1} - q_{n-1}\alpha|$.

Proof. Let $x, y$ be the solution of the system

$$\begin{cases} p_{n-1} x + p_n y = p \\ q_{n-1} x + q_n y = q \end{cases},$$

in other words, using $p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}$, we take

$$x = (-1)^n (p q_n - p_n q), \qquad y = (-1)^{n-1} (p q_{n-1} - p_{n-1} q),$$

and we see that both are integers. When $y = 0$ we find that $p = x p_{n-1}$, $q = x q_{n-1}$ with $x > 1$, so $|p - q\alpha| = x |p_{n-1} - q_{n-1}\alpha| > |p_{n-1} - q_{n-1}\alpha|$. If $y > 0$, then $x q_{n-1} = q - y q_n \leq q - q_n < 0$, hence $x < 0$. And if $y < 0$, then $x q_{n-1} = q - y q_n > q$, hence $x > 0$. So $x$ and $y$ have opposite sign. Next we note that $p_{n-1} - q_{n-1}\alpha$ and $p_n - q_n\alpha$ also have opposite sign, see Exercise 7.8. So $(p_{n-1} - q_{n-1}\alpha)x$ and $(p_n - q_n\alpha)y$ have equal sign. Now we get

$$|p - q\alpha| = |(p_{n-1} - q_{n-1}\alpha)x + (p_n - q_n\alpha)y| = |(p_{n-1} - q_{n-1}\alpha)x| + |(p_n - q_n\alpha)y| > |p_{n-1} - q_{n-1}\alpha|,$$

unless $x = 0$, but that would imply $q = q_n y \geq q_n$, which is not true. □

Proof of Theorem 7.8. Let $\frac{p}{q}$ be a strong best approximation of $\alpha$, and let $\alpha$ have convergents $\frac{p_i}{q_i}$ for $i = 0, 1, 2, \ldots$. There is an index $n$ such that $q_{n-1} \leq q < q_n$. If $q > q_{n-1}$ then Lemma 7.9 shows that $|p_{n-1} - q_{n-1}\alpha| < |p - q\alpha|$, contradicting that $\frac{p}{q}$ is a strong best approximation of $\alpha$. So $q = q_{n-1}$, and then $p = p_{n-1}$, so $\frac{p}{q}$ is indeed a convergent.

Now, let $\frac{p_m}{q_m}$ be a convergent of $\alpha$, then we have to show that it is a strong best approximation. So let $\frac{p}{q}$ be a rational number with $q \leq q_m$, then we must show that $|p - q\alpha| \geq |p_m - q_m\alpha|$. When $q = q_m$ this is trivial. When $q < q_m$ there is an $n \leq m$ such that $q_{n-1} \leq q < q_n$. Lemma 7.9 shows that either $q = q_{n-1}$ or $|p - q\alpha| > |p_{n-1} - q_{n-1}\alpha|$, so in both cases $|p - q\alpha| \geq |p_{n-1} - q_{n-1}\alpha|$. Exercise 7.9 then shows that $|p - q\alpha| \geq |p_{n-1} - q_{n-1}\alpha| > |p_m - q_m\alpha|$. □

Proof of Theorem 7.7. Let $\frac{p'}{q'}$ be an approximation to $\alpha$ such that $|q'\alpha - p'| < |q\alpha - p|$. Then we have

$$1 \leq |qp' - pq'| \leq |qp' - qq'\alpha| + |qq'\alpha - pq'| = q|p' - q'\alpha| + q'|q\alpha - p| < (q + q')|q\alpha - p| = (q + q')\, q \left| \alpha - \frac{p}{q} \right| < (q + q')\, q\, \frac{1}{2q^2} = \frac{q + q'}{2q},$$

hence $q' > q$. This shows that $\frac{p}{q}$ is a strong best approximation, hence by Theorem 7.8 a convergent. □

We finally note that to compute all convergents of $\alpha$ with denominators up to $N$, the number $\alpha$ should be available with a precision of size at least $N^{-2}$. When large partial quotients occur then accordingly larger precision may be needed. See Exercise 7.16.

Exercises

7.1. Show that a finite continued fraction represents a rational number. Conversely, show that any rational number has a finite continued fraction. Show that this finite continued fraction is unique apart from the possibility $[a_0, a_1, \ldots, a_n] = [a_0, a_1, \ldots, a_n - 1, 1]$.

7.2. With $p_n, q_n, s_n, t_n, s, t$ as in Section 7.1, show that $p_n t - q_n s = (-1)^{n+1} t_{n+1}$.

7.3. Compute a few partial quotients and convergents of $e = 2.71828\ldots$. Do you notice any pattern? If so, do not attempt to prove it, as the proof is far from trivial.

7.4. Compute the continued fraction and convergents of $\frac{144}{89}$. Note that 89 and 144 are consecutive Fibonacci numbers. What is the continued fraction of $\frac{F_{n+1}}{F_n}$ for any $n$?

7.5. Compute a few partial quotients and convergents of $\alpha = \frac{1}{2}(1 + \sqrt{5})$. Find the pattern and prove it. Same questions for $\alpha = \sqrt{3}$.

In Exercises 7.6 – 7.9, let $\frac{p_n}{q_n}$ for $n = 0, 1, 2, \ldots$ be the convergents of $\alpha \in \mathbb{R}$.

7.6. Prove that $p_n q_{n-1} - p_{n-1} q_n = (-1)^{n-1}$ for all $n \geq -1$. Hint: use induction. Next prove that $\gcd(p_n, q_n) = 1$, and that $p_n q_{n-2} - p_{n-2} q_n = (-1)^n a_n$ for all $n \geq 0$.

7.7. Prove that $q_n \geq F_n$, where $F_n$ is the $n$th Fibonacci number.

7.8. Show that $\frac{p_0}{q_0} < \frac{p_2}{q_2} < \frac{p_4}{q_4} < \ldots < \alpha < \ldots < \frac{p_5}{q_5} < \frac{p_3}{q_3} < \frac{p_1}{q_1}$. Hint: use Exercise 7.6.

7.9. Prove that $\frac{1}{q_{n+2}} < |p_n - q_n\alpha| \leq \frac{1}{q_{n+1}}$. Hint: make a slight refinement in the proof of Theorem 7.3. Next show that $|p_n - q_n\alpha| < |p_{n-1} - q_{n-1}\alpha|$.

7.10. Is Theorem 7.6 true for $\alpha \in \mathbb{Q}$?

7.11. In your favourite computer language (C, Java, ...) or computer algebra system (Mathematica, Maple, ...) program the continued fraction algorithm (do not use built-in functions that do all the work for you). Compute all solutions $\frac{p}{q}$ to $\left| \pi - \frac{p}{q} \right| < \frac{1}{2q^2}$ with $q < 10^{12}$. Also compute all solutions $\frac{p}{q}$ to $\left| \pi - \frac{p}{q} \right| < \frac{1}{q^2}$ with $q < 10^4$.

7.12. Find all solutions $x, y \in \mathbb{Z}$ with $0 < y < 10^{12}$ satisfying $|2^x - 3^y| < 100 \cdot \dfrac{3^y}{y^2}$.

7.13. Using a computer program, compute up to some point the continued fractions of $\sqrt{d}$ for the squarefree $d$ up to 100, and find the pattern.

7.14. Prove that if the continued fraction of $\alpha$ is ultimately periodic, then $\alpha$ is the root of a quadratic polynomial with integer coefficients.

[[The converse is also true, i.e. every such so-called quadratic number has an ultimately periodic continued fraction. This is known as Lagrange’s theorem, it is more complicated to prove, and we do not dare to ask that from you. All kinds of patterns can be found in these periods.]]

7.15. Prove that a strong best approximation to a number $\alpha$ is a best approximation to $\alpha$. Give an example of a best approximation to some number $\alpha$ that is not a strong best approximation.

7.16. Let $\beta$ and $\gamma$ share the first $n$ partial quotients of their continued fraction expansions. Show that the continued fraction expansion of any number between $\beta$ and $\gamma$ also shares at least those first $n$ partial quotients. Show that $|\beta - \gamma| < \dfrac{2}{q_n^2}$.



Bibliography

[Be] Frits Beukers, Getaltheorie voor beginners, Epsilon Uitgaven No. 42, Utrecht, 1999, 4e druk: 2008.

[BS] Eric Bach and Jeffrey Shallit, Algorithmic Number Theory Vol. 1, Efficient Algorithms, MIT Press, Cambridge Mass., 1996.

[CP] Richard Crandall and Carl Pomerance, Prime Numbers, A Computational Perspective, 2nd ed., Springer Verlag, Berlin, 2005.

[Ga] Steven Galbraith, Mathematics of Public Key Cryptography, version 0.9, February 11, 2011, http://www.math.auckland.ac.nz/~sgal018/crypto-book/crypto-book.html.

[GG] Joachim von zur Gathen and Jürgen Gerhard, Modern Computer Algebra, 2nd ed., Cambridge University Press, Cambridge, 2003.

[HW] G.H. Hardy and E.M. Wright, An Introduction to the Theory of Numbers, 6th ed., Oxford University Press, Oxford, 2008.

[Ke] Frans Keune, Getallen - van natuurlijk naar imaginair, Epsilon Uitgaven No. 65, Utrecht, 2009.

[Kn] Donald E. Knuth, The Art of Computer Programming Vol. 2, Seminumerical Algorithms, Addison-Wesley, Reading Mass., 3rd ed., 1997.

[MvOV] Alfred J. Menezes, Paul van Oorschot and Scott Vanstone, Handbook of Applied Cryptography, CRC Press, 1996. An online version is available at http://www.cacr.math.uwaterloo.ca/hac/.

[NV] Phong Nguyen and Brigitte Vallée (eds.), The LLL Algorithm - Survey and Applications, Springer, 2010.

[Sh] Victor Shoup, A Computational Introduction to Number Theory and Algebra, 2nd ed., Cambridge University Press, Cambridge, 2008. An online version is available at http://www.shoup.net/ntb/.

[St] William Stein, Elementary Number Theory: Primes, Congruences, and Secrets, Springer, 2008, online version available at http://modular.math.washington.edu/ent/.

[Wa] Samuel S. Wagstaff, Cryptanalysis of Number Theoretic Ciphers, Chapman and Hall / CRC, 2002.

[dW] Benne de Weger, Elementaire getaltheorie en asymmetrische cryptografie, Epsilon Uitgaven No. 63, Utrecht, 2009. Second edition 2011. For the accompanying software, see http://www.win.tue.nl/~bdeweger/MCR/

