properties of universal hashingktiml.mff.cuni.cz/~babka/hashing/thesis.pdf · charles university in...

Charles University in PragueFaculty of Mathematics and Physics

MASTER THESIS

Martin Babka

Properties of Universal Hashing

Department of Theoretical Computer Scienceand Mathematical Logic

Supervisor: doc. RNDr. Vaclav Koubek, DrSc.

Study programme: Theoretical Computer Science

2010

At this place, I would like to thank my supervisor doc. RNDr. Vaclav Koubek, DrSc.for the invaluable advice he gave to me, for the great amount of knowledge he sharedand time he generously spent helping me when working on the thesis. I would alsolike to say thank you to my friends and family.

I hereby proclaim, that I worked out this master thesis myself, using only the citedresources. I agree that the thesis may be publicly available.

In Prague on 16th April, 2010 Martin Babka

2

Contents

1 Introduction 6

2 Hashing 82.1 Formalisms and Notation . . . . . . . . . . . . . . . . . . . . . . . . . 92.2 Assumptions of Classic Hashing . . . . . . . . . . . . . . . . . . . . . 102.3 Separate Chaining . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112.4 Coalesced Hashing . . . . . . . . . . . . . . . . . . . . . . . . . . . . 132.5 Open Addressing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.5.1 Linear Probing . . . . . . . . . . . . . . . . . . . . . . . . . . 142.5.2 Double Hashing . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.6 Universal Hashing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.7 Perfect Hashing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.8 Modern Approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152.9 Advantages and Disadvantages . . . . . . . . . . . . . . . . . . . . . . 16

3 Universal Classes of Functions 173.1 Universal classes of functions . . . . . . . . . . . . . . . . . . . . . . . 173.2 Examples of universal classes of functions . . . . . . . . . . . . . . . . 203.3 Properties of systems of universal functions . . . . . . . . . . . . . . . 27

4 Expected Length of the Longest Chain 304.1 Length of a Chain . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 304.2 Estimate for Separate Chaining . . . . . . . . . . . . . . . . . . . . . 314.3 Estimate for Universal Hashing . . . . . . . . . . . . . . . . . . . . . 33

5 The System of Linear Transformations 365.1 Models of the Random Uniform Choice . . . . . . . . . . . . . . . . . 375.2 Probabilistic Properties . . . . . . . . . . . . . . . . . . . . . . . . . . 395.3 Parametrisation of The Original Proofs . . . . . . . . . . . . . . . . . 495.4 Improving the Results . . . . . . . . . . . . . . . . . . . . . . . . . . 545.5 Probability Distribution of the Variable lpsl . . . . . . . . . . . . . . 575.6 The Expected Value of the Variable lpsl . . . . . . . . . . . . . . . . 655.7 The Achieved Bound . . . . . . . . . . . . . . . . . . . . . . . . . . . 675.8 Minimising the Integrals . . . . . . . . . . . . . . . . . . . . . . . . . 68

3

6 The Model of Universal Hashing 716.1 Time Complexity of Universal Hashing . . . . . . . . . . . . . . . . . 716.2 Consequences of Trimming Long Chains . . . . . . . . . . . . . . . . 726.3 Chain Length Limit . . . . . . . . . . . . . . . . . . . . . . . . . . . . 776.4 The Proposed Model . . . . . . . . . . . . . . . . . . . . . . . . . . . 816.5 Time Complexity of Computing a Hash Value . . . . . . . . . . . . . 816.6 Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 826.7 Potential Method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 856.8 Expected Amortised Complexity . . . . . . . . . . . . . . . . . . . . . 87

7 Conclusion 947.1 Future Work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

A Facts from Linear Algebra 99

B Facts Regarding Probability 102

4

Nazov prace: Vlastnosti univerzalneho hashovaniaAutor: Martin BabkaKatedra (ustav): Katedra teoreticke informatiky a matematicke logikyVeduci bakalarskej prace: doc. RNDr. Vaclav Koubek, DrSc.E-mail veduceho: [email protected]

Abstrakt: Ciel’om tejto prace je navrhnut’ model hashovania s akceptovatel’nou ca-sovou zlozitost’ou aj v najhorsom prıpade. Vytvoreny model je zalozeny na princıpeuniverzalneho hashovania a vysledna ocakavana casova zlozitost’ je konstantna. Akosystem univerzalnych funkciı sme zvolili mnozinu vsetkych linearnych zobrazenı me-dzi dvomi vektorovymi priestormi. Tento system uz bol studovany s vysledkom pred-povedajucim dobry asymptoticky odhad. Tato praca nadvazuje na predchadzajucivysledok a rozsiruje ho. Tiez ukazuje, ze ocakavany amortizovany cas navrhnutejschemy je konstantny.

Kl’ucove slova: univerzalne hashovanie, linearne zobrazenie, amortizovana casovazlozitost’

Title: Properties of Universal HashingAuthor: Martin BabkaDepartment: Department of Theoretical Computer Science and Mathematical LogicSupervisor: doc. RNDr. Vaclav Koubek, DrSc.Supervisor’s e-mail address: [email protected]

Abstract: The aim of this work is to create a model of hashing with an acceptableworst case time complexity. The model is based on the idea of universal hashingand the expected time of every operation is constant. The used universal class offunctions consists of linear transformations between vector spaces. This class hasalready been studied with a good asymptotic result. This work improves the resultand proves that the expected amortised time of the scheme is constant.

Keywords: universal hashing, linear transformation, amortised time complexity

5

Chapter 1

Introduction

One of the successful tasks done by the computers today is the data storage and itsrepresentation. By this term we not only mean the database management systemsstoring large amounts of data. This term includes various forms of the set represen-tation problem [23] – representation and storing sets of elements. There are a lot ofapproaches, that solve this problem and deal with the situations from representingsmall sets stored in the main memory to enormous databases stored distributivelyon distinct systems.

Solving this problem has many apparent practical applications from databases inbusiness environment to more hidden ones. Consider bus stops of a city representedby objects of a programming language. Think about solving the shortest path prob-lem by graph library. This library does not provide an association of the library’snode object with its city. But you need a fast mapping of the nodes onto stops tobuild the path. This mapping is usually provided by the data structures solving theset representation problem. Nowadays these data structures are commonly presentas HashTables, ArrayLists or Dictionaries in the standard libraries of many currentprogramming languages. Their fast implementation plays an important role in thequality of solutions provided by programmers.

The set representation problem, also known as the dictionary problem, is a storageof a set consisting of chosen elements of a universe. The universe includes all therepresentable elements. Basic operations provided by a data structure solving thisproblem allow a modification of the stored set and querying if an element is storedwithin it.

• Member(x) – returns true if the element x is contained by the represented set.

• Access(x) – returns the data associated with the element x if it is containedwithin the stored set.

• Insert(x) – adds the element into the stored set.

• Delete(x) – removes the element from the stored set.

There are plenty of different data structures solving the problems providing uswith various running times of these operations [8]. If our algorithms prefer some

6

operations to the others, we can gain an asymptotic improvement in their runningtimes by a good choice of the underlying data structure.

The data structures can also guarantee the worst case running time of everyoperation. Other promise to be quick in the average case but their worst case runningtime can be linear with the number of elements stored. For example the running timeof balanced trees, AVL trees [17], red-black trees [18] or B-trees [5] is logarithmic forevery operation. Hash tables [25], [23], [6], [22] should run in O(1) expected time butwe may get unlucky when iterating a chain of large length. The most simple datastructures such as arrays or lists are preferred because they have certain operationsrunning in constant times but the other are definitely linear.

The warranties, provided by the data structures, mean a trade off with the ex-pected times. For instance binary search trees are expected to be faster in theexpected case when assuming uniformity of the input than red black trees or AVLtrees. These comparisons are discussed in [4] and later results in [30] or [14]. Ifthe uniformity of input may not be assumed we are made to use the conservativesolution.

There are special data structures, such as heaps, that do not implement the basicoperations effectively. We can not consider it as a flaw, their use is also special. Inthe case of the heaps it means efficient finding of minimal or maximal element of theset. On the other hand some data structures may allow other operations.

• Ord(k), k ∈ N – returns the kth element of the stored set.

• Pred(x), Succ(x) – returns the predecessor or successor of x in the set.

• Min(x), Max(x) – returns the minimal or maximal element of the stored set.

In this work we try to design a hash table which has the constant expectedrunning time of every operation. In addition it also guarantees a reasonable worstcase bound, O (log n log log n). We can not expect running it as fast as classic hashingwhen assuming the input’s uniformity. The computations indicate that it runs muchfaster than balanced or binary search trees.

The concept of hashing is introduced in Chapter 2. It continues by descriptionof different models of hashing and finally mentions current approaches and fields ofinterests of many authors. In the third chapter the principle of universal hashing isdiscussed. It also introduces many universal classes of functions and states their basicproperties. In Chapter 4 we show how to compute the expected length of the longestchain and discuss its importance. The work later continues by the chapter where weshow the already known results regarding the system of linear transformations. Inthe following chapters we improve the results obtained when using universal hashingwith the system. In the last chapter, we show the properties of the proposed modelbased on the new results.

7

Chapter 2

Hashing

Hashing is one of the approaches designed for the efficient solving of the dictionaryproblem. Various implementations differ in many ways. However usage of a hashfunction and a quickly accessible table, typically represented by an array, is commonto most of them. Every hash function transforms the elements of the universe intothe addresses of the table.

Represented sets are always small when compared to the size of the universe.In order to prevent waste of space we are forced to use tables as small as possible.Typically the size of the hash table containing the represented elements is chosen sothat it is comparable to the size of the stored set.

The fact that the hash table is much smaller than the universe and any twoelements may represented means that the elements may share the same addressafter hashing. This event is called a collision. For hashing it is very importanthow collisions are handled. This is also the most interesting distinctive feature fordistinct hashing models. In fact, collision handling is crucial when determining thetime complexity of the scheme.

When two elements collide they should be stored in a single bucket, cell of thehash table. A bucket is often represented by the simplest data structure possible. Forinstance, a singly linked list should be sufficient in many cases. More sophisticatedschemes, like perfect hashing, represent every chain in another hash table withoutcollisions.

The find operation of a hash table works in the following way. Every element isplaced as an argument for the hash function. The element’s address is then computedand used as an index of the hash table. Then we look into the bucket lying at thereturned address if the element is stored inside or not.

If buckets are represented by linked lists, then the expected time of the findoperation is proportional to one half of the length of the list.

Lemma 2.1. Let S be a set represented by a linked list. Assume that for everyelement x ∈ S:

Pr (x is argument of the find operation) =1

|S| .

Then the expected time of the find operation is |S|+12

.

8

x ∈ U

h(x) = x mod p

0:

1:

...

x y

2:

3:

Figure 2.1: Concept of a hash table.

Proof. Let xi ∈ S be the i-th element of the list, 1 ≤ i ≤ |S|. Time to find theelement xi inside the list equals i. The expected time of the find operation can beexpressed directly from its definition

E (time of the find operation) =

|S|∑i=1

iPr (xi is the argument of the find operation)

=

∑|S|i=1 i

|S| =|S|(|S|+ 1)

2|S| =|S|+ 1

2.

As seen in the previous lemma the time complexity of an operation needs notto be measured by its worst case time. Compare |S|, which is the worst case, to

the expected value |S|+12

which is better. Considering only the worst case times doesnot tell much about the structure’s real behaviour. We should use probability basedcharacteristics that give more accurate results. These characteristics include theexpected operation’s time or its expected worst case time which is usually far moredifficult to analyse. For hashing this difference means an asymptotic improvement.

2.1 Formalisms and Notation

Notation and formalisms that mathematically describe a hash table are crucial partof an exact analysis. Assume that we are hashing a universe U on a hash table ofsize m. The table consists of buckets each having its unique address. The addressesare usually quite simple. When the table consists of m buckets the addresses arejust 0, . . . ,m− 1. Buckets are always identified by their addresses and thus we canrefer to the set B = 0, . . . ,m− 1 as a hash table.

Universe consists of all the representable elements, examples include the objectsof a programming language, strings over an alphabet, numbers or anything else. Hash

9

function is a way to transform these elements into an address of the table, usually anatural number. Hash function h can be described by a function h : U → B. Theother requirements placed on the function h are discussed later.

The letter S typically denotes the stored set. We often refer to the variable n asto the size of the stored set, n = |S|. As already mentioned we always assume thatS ⊂ U and |S| |U |.

Since the universes are really large, recall the previous examples, sizes of the tablesare much smaller when compared to those of the universes. Space waste is typicallycaused by allocating a large table containing many empty buckets. The definitionof the table’s load factor allows an exact analysis of the phenomena connected withthe table’s filling – its performance or waste of space.

Definition 2.2 (Load factor). Let n be the size of the represented set and m be thesize of the hash table. The variable α defined as

α =n

m

is called the load factor of the table.

To take control of the table’s overhead it is sufficient to keep the load factor in apredefined interval. Not only the empty buckets cause troubles. The overfilled onesare another extreme especially if there are too many collisions.

Definition 2.3 (Collision). Let h be a hash function and x, y ∈ U be two distinctelements. We say that the elements x and y collide if h(x) = h(y).

Above all the collision handling is what differentiates distinct hash tables anddetermines their performance.

To completely explain the notation we have to mention that we refer to the func-tion log as to the binary logarithm. The function ln denotes the natural logarithm.

2.2 Assumptions of Classic Hashing

In Lemma 2.1 we computed the expected time of the find operation and used itas a measure of its time complexity. Known probability distribution of the inputenables us to compute the expected value. For the lemma we assumed the uniformdistribution. In hashing similar situation occurs. If we want to find the expectedresults, then we need corresponding probabilistic assumptions on the input. Theassumptions we make may be found in [23] or in [24]. They are similar to ours.

• Hash function h distributes the elements of universe uniformly across the hashtable:

||h−1(x)| − |h−1(y)|| ≤ 1 for every x, y ∈ B.

• Every set has the same probability of being stored among the sets of the samesize:

Pr (S is stored) =1(|U ||S|) for every set S ⊂ U .

10

• Every element of the universe has the same probability of being an argumentof an operation:

Pr (x is used as an argument of an operation) =1

|U | for every x ∈ U .

These assumptions provide us with a probabilistic model for the average caseanalysis of classic hashing.

2.3 Separate Chaining

Separate chaining [22], [24] and [25] may be considered the most basic hash tableimplementation. However, it provides a sufficient framework for illustration of var-ious problems and analyses. Separate chaining usually utilises one hash functionmapping elements to an address – an index of the array representing the hash table.Every represented element is stored within a bucket given by its hash address. Everybucket is represented by a singly linked list.

Separate chaining, like the other classic models of hashing, is quite dependenton the stored input. The previous assumptions are required for its average caseanalysis. This model assumes hashing of the universe U = 0, . . . , N − 1 for N ∈ Ninto a table of size m. Number m is much smaller then N and moreover it ischosen to be a prime. Primality of m improves the ability of the hash function touniformly distribute the stored set across the table. The following example clarifieswhy primality is important for the choice of m.

Example 2.4. Consider the set S = 5n | n ∈ 0, . . . , 5 and a table consistingof 10 buckets. Set S contains only 6 elements, so the table’s load factor equals 0.6.If we use function x mod 10, then it maps the elements of the set S just onto twoaddresses, 0 and 5. The hash table is not used effectively. Choosing m to be a primenumber improves the results but is not a remedy.

The apparent disadvantage of classic hashing is usage of a single hash functionknown in advance. For example if the stored set S is chosen, by an adversary, so thatS ⊆ f−1(b) for a single bucket b ∈ B, we run into problems. Because our assumptionssay that the probability of having such an input is quite low, these inputs may beneglected.

To illustrate how separated chaining works the algorithms are presented. Al-though the model is quite simple the subroutines regarding the singly linked lists arenot discussed. They are considered elementary enough. We remark that the runningtime is proportional to length of the represented chain. Also notice that the insertprocedure has to check whether the element is already stored inside the list else itcan be stored twice.

11

Algorithm 1 Find operation of the separate chaining.

Require: x ∈ Uh = h(x)

if T [h] contains x thenreturn true operation is successful

elsereturn false operation is unsuccessful

end if

Algorithm 2 Insert operation of the separate chaining.


if T [h] does not contain x theninsert x into chain T [h]

end if

Algorithm 3 Delete operation of the separate chaining.


if T [h] contains x thenremove x from chain T [h]

end if

Theorem 2.5 (Average case of the separate chaining). Let α > 0 be the load factorof the hash table resolving collisions by separate chaining. Then the expected time ofthe successful find operation converges to 1 + α

2and to e−α + α in the unsuccessful

case.

Proof. The theorem is proved in [24].

Various modifications of separate chaining are known. If the universe is an or-dered set then ordering the elements in chains monotonously may bring a speedup.This is caused by the fact that the operations concerning the chains do not have toiterate throughout the whole list and stop earlier.

There is a modification utilising the move to front principle [2]. This principle ismotivated by the rule of the locality of reference [1]. The rule says that whenever anelement is accessed, it is very likely that it is going to be accessed again in a shortfuture. Therefore it is quite convenient to put the last accessed element to the frontof the list so that it is accessed fast.

12

0:

1:

2:

3:

4:

5:

6:

8

15

9

16

4

3

Figure 2.2: Singly linked lists represented directly in the hash table. The first numberis the bucket’s address, the second is the represented key and the last one is the nextpointer.

Another set of modifications changes the representation of chains. They are nolonger represented by common linked lists since their cache behaviour [20] is poor asstated in [32]. The chains are stored directly in the hash table. So when resolvinga collision an empty place in the hash table is taken. There are problems connectedwith this approach that need to be solved. Above all chains may not be mergedtogether. This appears when a new chain should be started but the place of the firstelement is already taken. It may be solved by moving the element, not belonging tothe chain, to another empty place. To perform the replacement quickly we are forcedto use double linkage. Or, instead of using doubly linked lists, at every address anadditional begin pointer may point to the first element of the chain. In addition,we have to deal with the fact that schemes using only the hash table suffer frombad behaviour when the load factor is almost one. And of course a uniform randomchoice of an empty bucket has to be provided when prolonging a chain.

2.4 Coalesced Hashing

In coalesced hashing, the chains of colliding elements are also stored directly in thehash table. However, they are allowed to fuse together. Every chain is representedby a singly linked list. The simpler methods such as LISCH and EISCH do not useany special additional memory. In addition, the EISCH method utilises the moveto front rule. Methods such as LICH, EICH and VICH use the additional memorywhich is not directly addressable by the hash function. Collision resolution uses theadditional memory to store growing chains. When an empty place for a collidingelement is needed, it is first sought in the additional memory. Only if the specialmemory is full, then the addressable memory is used. The methods are distinct in away they use the move to front rule. The EICH method always obeys it, VICH usesit only in the addressable memory and the LICH method does not obey the rule atall.

13

2.5 Open Addressing

Methods of coalesced hashing and separate chaining represent the chains explicitly bylinked lists. The additional pointer of the list not only consumes an extra memory. Italso worsens the cache behaviour of the hash table. Open addressing also merges thechains together. They are stored directly in the hash table and their representationis more implicit. To remove the need of an additional pointer, a secondary functionfor conflict resolution is used.

Models of open addressing assume existence of two hash functions h1 : U → Band h2 : U → B. Assume x ∈ U is the ith element of the chain and the order ofelements in a chain starts from zero. Its address is then determined by a compoundhash function h(x, i) = h1(x) + ih2(x). When we insert the element x ∈ U we find aminimal i such that the bucket at the address h(x, i) is empty.

This approach also removes the problem of the choice of a free bucket when acollision occurs. Unfortunately, there is no fast implementation of the delete opera-tion known so far. To delete an element we can mark its bucket as free and reuse itwhen possible. Marking the deleted element’s bucket is necessary if we do not wantto accidentally loose the elements lying behind the deleted one in the chains.

2.5.1 Linear Probing

Linear probing is a scheme of open addressing with h2(x) = 1 for every x ∈ U . Thelast element of the prolonged chain is placed into the first empty bucket behind theone with the address h1(x).

Another problem of linear hashing is its degradation. Clusters of non-empty buck-ets emerge when load factors reach one. Delete operation implemented by markingonly emphasises this problem. Despite of the mentioned problems, linear probingtakes the advantage of the good cache behaviour. No wonder, that its modificationsare studied nowadays, e.g hopscotch hashing mentioned in Section 2.8.

2.5.2 Double Hashing

If we want to distribute the stored set uniformly across the hash table, then thechoice of a free bucket should be random. Whenever function h(x, i) of argumenti is a random permutation of B, then the mentioned choice becomes random. Anecessary condition is that h2(x) and m are relatively prime. Therefore the size ofthe hash table is chosen so that it is a prime number again.

Double hashing has theoretical analyses yielding remarkable results.

Theorem 2.6. Let α ∈ (0, 1) be the load factor of the hash table resolving collisionsby double hashing. Then the running time of the find operation converges to 1

1−α in

the unsuccessful case and to 1α

log(

11−α

)when the operation is successful.

Proof. The proof is shown in [24].

14

2.6 Universal Hashing

Universal hashing uses a universal system of functions instead of a single function.This removes the dependence of universal hashing on the uniformity of its input. Itdoes not require the assumptions of standard hashing made in Section 2.2. Universalhashing is a randomised algorithm and the probability space is determined by theuniform choice of a hash function, instead of the selection of a stored set. We studyuniversal hashing and its systems in a more detailed way in Chapter 3.

2.7 Perfect Hashing

Assume that the stored set S is known in advance. The problem solved by perfecthashing is how to create a hash table and a hash function such that the find operationtakes only a constant time. Insert and delete operations are forbidden, the stored setS remains fixed. Additional requirements are placed on the size of the scheme. Sizeof the hash table is not the only problem. We must also take care about the spaceneeded to represent the constructed hash function since it becomes quite complex.The problem is solved in [16].

2.8 Modern Approaches

Modern approaches to hashing are nowadays often based on their probabilistic prop-erties. Authors adapt and change their models to improve the asymptotic results inthe average case and in the expected worst case. The algorithms still remain fastand are enriched by simple rules.

Straightforward use of a single function is not enough and various universal sys-tems are used. Theoretical analyses of the models do not necessarily involve theideas of universal hashing. Universal classes are used as a heuristic. Although thealgorithms are not complicated, their analyses become more and more difficult.

A model called robin hood hashing [7], [10] is an improvement of the doublehashing. In double hashing we are able to access the chain at an arbitrary positionin constant time. The main idea of robin hood hashing is to minimise the varianceof the expected chain length. If probing starts from the average chain length theexpected time of the find operation becomes constant.

Another model, hopscotch hashing [21], is a modification of the linear probingutilising the computer’s cache behaviour. A chain is kept together in one place aslong as possible. Algorithms in control of the processor cache store whole blocks ofmemory inside it when only a single part of the block is accessed. Provided that thechains are compact, storing them in the cache is quite probable. This optimisationmakes probing substantially faster.

Next investigated model is cuckoo hashing [9]. However, the interest fades thesedays since there are some problems connected with it [13], [12].

15

2.9 Advantages and Disadvantages

Hashing, compared to other data structures solving the dictionary problem, e.g.trees, does not provide the complete set of operations. If the universe is ordered,we can not query the hash table for the kth stored element. The hash functionsare supposed to distribute the elements across the table evenly. This assumption is,somehow, in the opposition to the preservation of the ordering. Especially, differentfunctions of a universal class can not preserve the ordering, they are required to bedifferent. So in general, hashing does not provide the operations based or relatedto ordering, such as Succ, Pred, Order of an element and the already mentionedoperation – kth stored element. Despite this, it can also be a rare advantage. Treesrequire an ordering of the elements they store, so when the universe is not naturallyordered a problem occurs. Usually we solve the problem by adding a unique identifierto every element before storing. This consumes a little memory and in distributedenvironments causes small troubles with uniqueness, too.

The obvious advantage of hashing is its simplicity. In addition, its constantexpected time of the find operation may seem far better than the logarithmic timeof trees. On the other hand, if provided with an unsuitable, input hashing oftendegrades. Balanced trees provide warranties for every input they are provided with.This is a tradeoff that has to be decided by the user – excellent expected and poorworst case time compared to the logarithmic time in every case.

To sum up, when we need a fast and simple to implement mapping of an objectto another one, hashing is a good solution. Especially, when the assumptions fromSection 2.2 are satisfied. These conditions are often met with objects having artificialidentifiers. However, the worst case guarantee is poor. Despite this, the expectedworst case warranty is similar to that of trees, as shown later in the work. Nowa-days simplicity does not mean any extra advantage. Standard libraries of currentprogramming languages provide us with many implementations of hashing and trees.

16

Chapter 3

Universal Classes of Functions

Universal classes of functions play an important role in hashing since they improveindependence of the associative memory on its input. Their greatest advantage isremoval of the two assumptions of classic hashing – uniformity of input. The firstone states that the probability of storing a set is distributed uniformly among setshaving the same size. The second one says that every element of the universe hasthe same probability of being hashed.

In addition, models of standard hashing take probability of collision relative to thechoice of the stored set. In models of universal hashing probability space correspondsto the choice of a random function selected from the universal class. Collision iscaused by an inconvenient choice of a universal function more than by an unsuitableinput. Notice that the relation between a collision and the input, the representedset, is not so tight as in the classic hashing. Another advantage of universal hashingis that any obtained bound holds for every stored set.

Probability that the selected universal function is not suitable for a stored setis low. So every function from a universal system is suitable for many stored sets.If the selected function is not suitable for a stored set we are free to pick anotherone. Rehashing the table using a new function is not likely to fail. The probabilityof failure for several independent choices of a hash function decreases exponentiallywith respect to the number of choices.

3.1 Universal classes of functions

Various definitions of universal classes may be found in [6], [23], [25] and [24]. Theirvarious application are shown in [34] and [27]. Originally for strongly n-universalclasses the equality in the probability estimate holds in [34]. Various situations maysatisfy only inequality and therefore their definition is a bit changed in this work.We also define nearly strongly n-universal classes.

Definition 3.1 (c-universal class of hash functions). Let c ∈ R be a positive numberand H be a multiset of functions h : U → B such that for each pair of elements

17

x, y ∈ U , x 6= y we have

|h ∈ H | h(x) = h(y)| ≤ c|H||B|

where the set on the left side is also considered a multiset. The system of functionsH is called c-universal class of hash functions.

First notice that the bound on the number of colliding functions for each pair ofelements is proportional to the size of the class H. More interesting is the inverseproportionality to the size of the hash table. A generalisation of the above definitionfor more than two elements provides us with a better estimate on the number ofcolliding functions. This count becomes inversely proportional to the power of thetable size.

Definition 3.2 (Nearly strongly n-universal class of hash functions). Let n ∈ N,c ∈ R, c ≥ 1 and H be a multiset of functions h : U → B such that for everychoice of n different elements x1, x2, . . . , xn ∈ U and n elements y1, y2, . . . , yn ∈ Bthe following inequality holds

|h ∈ H | h(x1) = y1, h(x2) = y2, . . . , h(xn) = yn| ≤ c|H||B|n

where the set on the left side is also considered a multiset. The system of functionsH is called nearly strongly n-universal class of hash functions with constant c.

Definition 3.3 (Strongly n-universal class of hash functions). Let n ∈ N and H bea multiset of functions h : U → B such that for every choice of n different elementsx1, x2, . . . , xn ∈ U and n elements y1, y2, . . . , yn ∈ B the following inequality holds

|h ∈ H | h(x1) = y1, h(x2) = y2, . . . , h(xn) = yn| ≤ |H||B|n

where the set on the left side is also considered a multiset. The system of functionsH is called strongly n-universal class of hash functions.

Every strongly n-universal class of hash functions is nearly strongly n-universalclass with constant one. Most of the presented strongly universal classes not onlysatisfy the inequality from Definition 3.3 but the bound equals the number of thecolliding functions.

Definition 3.4 (Strongly ω-universal class of hash functions). Family of functionsH is strongly ω-universal if it is strongly n-universal for every n ∈ N.

The strongly ω-universal systems provide us with an estimate of the expectedlength of the longest chain. This bound can be proved directly from the aboveproperty without regarding any other special attributes of a system. As shownlater the above property is difficult to be satisfied by a small class of functions.A straightforward example of a strongly ω-universal class is the set of all functions

18

from a domain U to a hash table B. It is obvious that the system is not veryconvenient because of its size. Simple construction of the system of all functions isbriefly discussed later and shown in [34].

One may ask if there are classes that are strongly ω-universal other than theempty class or the class of all functions. However, there are classes that are notnecessarily strongly ω-universal, but give the asymptotic bound on the expected

length of the longest chain O(

lognlog logn

). The bound is the same for standard hashing

and for strongly ω-universal systems as shown in Chapter 4. The families are basedon the system of polynomials or linear systems and are quite large and not veryefficient compared to the standard systems. ”Distinct large families of functionswere given by Siegel [11] and by Dietzfelbinger and Meyer auf der Heide [33]. Bothprovide families of size |U |nε . Their functions can be evaluated in a constant time ona RAM.” The families are somewhat more complex to implement when compared tothe class of linear functions as stated in [34]. However, these classes may be cosidered”nearly” strongly ω-universal, in fact they are strongly log n-universal.

The above definitions of universal classes can be rewritten in terms of probabil-ity as well. The probability space corresponds to the uniform choice of a functionfrom the system of functions. For example in Definition 3.1 of c-universal class H,probability of collision of two different elements x and y can be rewritten as

Pr (h(x) = h(y)) =|h ∈ H | h(x) = h(y)|

|H| =c

|B| .

In the same manner we can reformulate and use definitions for the other universalsystems.

In the remainder of this chapter we sum up some basic facts regarding universalsystems, show remarks about their combinations and derive some theoretic boundsof strongly k-universal systems.

Theorem 3.5. Every strongly 2-universal class of hash functions is also 1-universal.

Proof. Let H be strongly 2-universal class containing functions from a universe Uinto a set B. We have to find the number of functions in H which make a given pairof different elements x, y ∈ U collide.

First, define the set of functions that map both x and y on the image t ∈ BHt = h ∈ H | h(x) = h(y) = t.

These sets are disjoint and moreover |Ht| ≤ |H||B|2 . By summing throughout all t ∈ B

we have

|h ∈ H | h(x) = h(y)| =∑t∈B

|Ht| ≤ |B| |H||B|2 =H

|B| .

The number of all colliding functions is then less or equal to H|B| and the system H

is clearly 1-universal.

Theorem 3.6. Every strongly n-universal class of functions is strongly k-universalfor every 1 ≤ k ≤ n.

19

Proof. Let H be a n-universal class consisting of functions from a universe U intoa hash table B. Similarly to the previous proof there are k different elementsx1, . . . , xk ∈ U that should be mapped to the prescribed buckets y1, . . . , yk ∈ B.For every choice of their images y1, . . . , yk ∈ B we need to find the number of func-tions mapping the element xi to its image yi, 1 ≤ i ≤ k.

The only estimate we can use comes from the strong n-universality of H and holdsfor n elements only. We extend the set of images by other n−k elements xk+1, . . . , xn.They are let to be mapped on arbitrary element of B. Such extension must be donecarefully so that the final sequence consists of different elements x1, . . . , xn, too.From now on fix one such extension, xk+1, . . . , xn.

The set of functions mapping given k elements onto their prescribed images, H,may be seen as

H = h ∈ H | h(xi) = yi, i ∈ 1, . . . , k=

⋃yk+1,...,yn∈B

h ∈ H | h(xi) = yi, i ∈ 1, . . . , n.

Notice that sets of functions that map elements xk+1, . . . , xn onto different imagesyk+1, . . . , yn are disjoint. The sum of their sizes is then equal to |H|. It follows that

|H| =∣∣∣∣∣∣

⋃yk+1,...,yn∈B

h ∈ H | h(xi) = yi, i ∈ 1, . . . , n∣∣∣∣∣∣

=∑

yk+1,...,yn∈B

|h ∈ H | h(xi) = yi, i ∈ 1, . . . , n|

≤∑

yk+1,...,yn∈B

H

|B|n = |B|n−k H

|B|n =H

|B|k .

Now we can see that the class of functions H is strongly k-universal, too.

3.2 Examples of universal classes of functions

In this section we show some examples of common universal classes of hash functions.With each system we provide a simple proof of its universality or strong universality.Presented systems not only differ in contained functions but also in their domainsand co-domains. However every system can be thought of as a mapping from asubset of natural numbers onto another subset of natural numbers.

Linear system. System of linear functions was among the first systems of uni-versal hash functions that were discovered. They were introduced by Carter andWegman in [6] where they showed basic properties of universal hashing. Above allthey mentioned the expected constant time of the find operation of universal hash-ing. Nowadays various modifications of linear systems are known and are presentedin the following text. The system is also mentioned in [23] and [25].

20

Definition 3.7 (Linear system). Let N be a prime, m ∈ N and let U = 0, . . . , N−1and B = 0, . . . ,m− 1 be sets. Then the class of linear functions

LS = ha,b(x) : U → B | a, b ∈ U where ha,b(x) = ((ax+ b) mod N) mod m

is called linear system.

Remark 3.8. Linear system isdNme2(Nm)

2 -universal.

Proof. Consider two different elements x, y ∈ U . We need to find the number offunctions ha,b ∈ H such that

(ax+ b mod N) mod m = (ay + b mod N) mod m.

This is equivalent to the existence of numbers r, s, t that satisfy the followingconstraints:

r ∈ 0, . . . ,m− 1t, s ∈

0, . . . ,

⌈N

m

⌉− 1

(ax+ b) mod N = sm+ r

(ay + b) mod N = tm+ r.

Since N is a prime number and x 6= y for every choice of parameters r, s, t thereis an exactly unique solution; parameters a, b that satisfy the equalities.

Number of all possible choices of parameters r, s, t is m⌈Nm

⌉2. So there are exactly

m⌈Nm

⌉2functions ha,b that map the elements x, y to the same bucket.

We compute the system’s c-universality constant:

m

⌈N

m

⌉2

= m

⌈N

m

⌉2 N2

mN2

m

=

⌈Nm

⌉2

N2

m2

N2

m=

⌈Nm

⌉2

N2

m2

|LS||B| .

Hence the linear system isdNme2(Nm)

2 -universal.

For examining the properties of the original linear system simpler and smallermultiplicative system may be used. They have many common properties and sharesome same drawbacks. This multiplicative system may provide us with some insightsfor which choices of hash function and stored set they both fail.

Definition 3.9 (Multiplicative system). Let p be a prime and m, m < p be the sizeof the hash table. Then the system

Multm,p = h : Zp → Zm | h(x) = (kx mod p) mod m for k ∈ Zp − 0

is called the multiplicative system.

21

An important application of the multiplicative system is the construction of aperfect hash function shown in [16] by Fredman, Komlos and Szemeredi. The systemand its other properties are also mentioned in [25].

Remark 3.10. For a prime p and m < p the multiplicative system is 2-universal.

Proof. Let x and y be different elements of the universe. We have to find the numberof functions in the multiplicative system that make their images the same. For acollision function we have that

kx mod p mod m = ky mod p mod m

kx− ky mod p mod m = 0.

This may be rewritten as:

k(x− y) mod p = lm for l ∈−⌊ pm

⌋, . . . ,

⌊ pm

⌋− 0.

Notice that the parameter l 6= 0 since k 6= 0. Thus for every choice of x andy there are at most 2

⌊pm

⌋functions. For every value of l there is exactly one k

satisfying the equality because parameter p is a prime but k can be a solution fordistinct l. For the number of colliding functions we have

|h ∈ Multm,p | h(x) = h(y)| ≤ 2p

m

and the system is then 2-universal.

Previous systems contain only simple linear functions from natural numbers tonatural numbers. An analogous class can be constructed by using linear transfor-mations between vector spaces. The systems have various similar properties but thelatter one gives us a suitable bound on the size of the largest bucket.

System of linear transformations. Systems of linear transformations are stud-ied later in Chapter 5. We use them in Chapter 6 to create a new model of universalhashing. The system is discussed in a very detail way in [3].

Definition 3.11 (The set of all linear transformations, the set of all surjectivelinear transformations). Let U and B be two vector spaces. Denote the set of alllinear transformations as

LT (U,B) = T : U → B | T is a linear transformation

and let

LTS(U,B) = T : U → B | T is a surjective linear transformation

denote the set of all surjective linear transformations between vector spaces U and B.

22

Definition 3.12 (System of linear transformations). Let U and B be two vectorspaces over the field Z2. The set LT (U,B) is called the system of linear transforma-tions between vector spaces U and B.

Remark 3.13. System of linear transformations between vector spaces U and B is1-universal.

Proof. Let m and n denote the dimensions of vector spaces U and B respectively.Then every linear transformation L : U → B is associated with a unique n × mbinary matrix T .

Let ~x, ~y ∈ U be two distinct vectors mapped on a same image. We need to findthe number of linear mappings confirming this collision. Let k be a position suchthat xk 6= yk. Such k exists since ~x 6= ~y. We show the following statement. If thematrix T is fixed except the kth column, then the kth column is determined uniquely.We just lost n degrees of freedom when creating matrix T causing the collision of ~xand ~y. This lost implies 1-universality of the system, this is proved later, too.

In order to create a collision of ~x and ~y their images T~x, T~y must be the same.The system of equalities must then be true for every i, 1 ≤ i ≤ n:

m∑j=1

ti,jxj =m∑j=1

ti,jyj.

For ti,k we have:

ti,k = (xk − yk)−1

m∑j=1,j 6=k

ti,j(yj − xj).

The elements in the k-th column are uniquely determined by the previous equal-ity.

The number of all linear functions, equivalently the number of all binary matrices,is 2mn. Notice that the size of the target space B is 2n. The number of all linearfunctions mapping ~x and ~y to a same value is 2mn−n since we are allowed to arbitrarilychoose every element of the matrix T except the n ones in the k-th column.

System of linear transformations is 1-universal because

|L ∈ LT (U,B) | L(x) = L(y)| = 2mn−n =2mn

2n=|LT ||B| .

We are not limited only to vector spaces over field Z2. Systems of all lineartransformations between vector spaces over any finite field Zp are 1-universal, too.However the most interesting results are achieved for the choice of Z2.

23

Bit sum of a string over the alphabet 0, 1. Presented family is anotherlinear system that is 1-universal, too. It is clear that every natural number can bewritten as a string over alphabet consisting from two characters only, 0 and 1, itis the number’s binary form. Weighted digit sum is considered the number’s hashvalue. This reminds us of the system of linear transformations.

If we are hashing strings that are k + 1-bit long we also need k + 1 coefficients.Also assume hashing into a table of size p ∈ N where p is a prime. Coefficientsmay be chosen arbitrarily but are not greater than a parameter l, 0 < l ≤ p. Totransform a digit sum into an address of the hash table simple modulo p operation isused. Parameter l may seem quite artificial. However it sets the range for coefficientsand therefore determines the size of the whole system.

Definition 3.14 (Bit string system). Let p be a prime, k ∈ N, B = 0, . . . , p − 1and l ∈ N, l ≤ p. System of functions

BSSp,l =

h~c : 0, 1k+1 → B | h(x) =

(k∑i=0

cixi

)mod p,~c = (c0, . . . , ck) ∈ Zk+1

l

is called bit string system with parameters p and l.

Remark 3.15. Bit string system for binary numbers of length k + 1 modulo primep and constant l ∈ N, l ≤ p is p

l-universal.

Proof. Let x and y be two different bit strings x, y ∈ 0, 1k+1. We must estimatethe number of all sequences of coefficients 0 ≤ c0, . . . , ck < l that make the twoelements collide. Every collision sequence must satisfy:(

k∑i=0

cixi

)mod p =

(k∑i=0

ciyi

)mod p.

Since x 6= y there is an index j, 0 ≤ j ≤ k such that xj − yj 6= 0. This allows usto exactly determine the jth coefficient cj from the others:

(k−1∑i=0

cixi

)mod p =

(k−1∑i=0

ciyi

)mod p

cj(xj − yj) mod p =

(k−1∑

i=0,i 6=j

ci(xi − yi))

mod p

cj = (xj − yj)−1

(k−1∑

i=0,i 6=j

ci(xi − yi))

mod p.

Last equality holds since p is a prime number and the number xj−yj is invertiblein the field Zp. The computed cj may be from the set 0, . . . , p − 1 but we needit less then l. For such choice of the other coefficients ci, i 6= j there is no function

24

causing a collision. However we still have a valid upper bound on the number ofcolliding functions. There is one degree of freedom lost when choosing a functionthat makes x and y collide. The number of all functions is lk+1 and the size of thetable is p. The number of colliding functions can be computed as:

|h ∈ BSSp,l|h(x) = h(y)| ≤ lk =p

l

lk+1

p=p

l

|BSSp,l||B| .

The bit string system is thus pl-universal.

Polynomials over finite fields. The following system is a generalisation of so fardiscussed linear functions and linear transformations to polynomials. The advantagedelivered by the system of polynomials is its strong universality. Unfortunately it istraded for the bigger size of the system. The class is briefly mentioned in [34].

Definition 3.16 (System of polynomial functions). Let N be a prime number andn ∈ N. Define U = 0, . . . , N − 1 and B = 0, . . . ,m− 1 The system of functions

Pn = hc0,...,cn(x) : U → B | ci ∈ U, 0 ≤ i ≤ nwhere

hc0,...,cn(x) =

((n∑i=0

cixi

)mod N

)mod m

is called the system of polynomial functions of the n-th degree.

Remark 3.17. Let N be a prime number and n ∈ N. If B = U then the system ofpolynomial functions of the n-th degree is strongly n+ 1-universal. When B 6= U thesystem is nearly strongly n+ 1-universal.

Proof. Let x1, x2, . . . , xn+1 be different elements of U and y1, y2, . . . , yn+1 are theirprescribed images. We can write down a system of linear equations that can be usedto find the coefficients c0, c1, . . . , cn:

h(xi) = yi, 0 ≤ i ≤ n.

If U = B the function h(x) is reduced to the form:

h(x) =

(n∑i=0

cixi

)mod N .

Since N is a prime number and the elements xi are different the correspondingVandermond matrix is regular. There is exactly one solution of the above system,

let c0, . . . , cn denote the solution. Size of the system is |U |n+1 and 1 = |U |n+1

|B|n+1 = |U |n+1

|U |n+1 .Now it is clear that the system of polynomial functions is strongly n+ 1-universal.

Let U 6= B. We use the idea from the proof of c-universality of linear system.First we write down the equations h(xi) = yi in the field ZN . Instead using modNwe introduce new variables ri as in the proof for linear system such that

25

(n∑j=0

cjxji

)mod N = yi + rim ri ∈

0, . . . ,

⌈N

m

⌉− 1

.

For every choice of all ri, 0 ≤ i ≤ n, we obtain an unique solution of the abovesystem of equations since we are in the field ZN . The number of all choices of ri is⌈Nm

⌉n+1. From the number of functions which map xi to yi for 1 ≤ i ≤ n it follows

that

|h ∈ H | h(xi) = yi, i ∈ 1, . . . , n+ 1| ≤⌈N

m

⌉n+1

=

⌈Nm

⌉n+1(Nm

)n+1

(N

m

)n+1

.

Hence this system is nearly strongly n+1-universal with the constantdNmen+1

(Nm)n+1 .

Although the choice of U = B is not very practical one, however it gives us aclue that the system might be nearly strongly universal.

System of all functions. The strongly ω-universality is a powerful property. Thequestion is whether a system that satisfies the property exists. One example, al-though not very useful, can be immediately constructed – it is a set of all functionsbetween two sets. The applicable construction of the system is taken from [34].

Definition 3.18 (System of all functions). Let U and B be sets such that |B| < |U |.The family of functions

H = h : U → Bis called the system of all functions between U and B.

Remark 3.19. System of all functions between sets U and B is strongly ω-universal.

Proof. We are given n different elements x1, . . . , xn and their images y1, . . . , yn. Wemust compute size of the set Hn = h ∈ H | h(xi) = yi, i ∈ 1, . . . , n. Now note

that the system’s size is |H| = |B||U |. Since the values for elements x1, . . . , xn arefixed we have

|Hn| = |B||U |−n =|B||U ||B|n =

|H||B|n

and thus the system is strongly ω-universal.

The main problem when using this system is that fact that to encode a functionh ∈ H is highly inconvenient. To store one we need |U | log |B| bits. When using thissimple approach we need to encode the values of elements that with great probabilitywill never be stored in the hash table. An alternative is to construct random partialfunction from the set of all functions from B to U as shown in [34].

These ideas come from the existence of fast associative memory having expectedtime of find and insert operations O(1) and existence of a random number generator.Whenever we have to store an element we must determine its hash value. First, by

26

using the associative memory we find out if the stored element already has a hashvalue associated. If not we use the random number generator to obtain an elementfrom B and remember this connection, element – its hash value, to the associativememory. This sequentially constructs a random mapping from the system of allfunctions. Because the random number generator used chooses the elements of Buniformly created system is certainly strongly ω-universal. In addition we do notstore the hash values for irrelevant elements.

By using this idea to construct a fast hash table we are trapped in a cycle sincewe already supposed existence of a fast associative memory. But ω-universal classcan not only be used to construct a fast solution to the dictionary problem. It cansolve other problems like the set equality as shown in [34].

3.3 Properties of systems of universal functions

In this section we sum up some properties of strongly k-universal classes of functions.We concentrate on the generalisation of the known properties of the c-universalclasses to the strongly universal classes. These results may inspire us to create morepowerful or better behaving systems without losing strong universality.

The first theorems show the results of combinations of two strongly universalsystems.

Theorem 3.20. Let H and I be strongly k-universal systems both from a universe Uinto a hash table B with m = |B|. For h ∈ H and i ∈ I let us define

f(h,i)(x) = (h(x), i(x))

for all x ∈ U . Then the system

J = f(h,i) : U → B ×B | h ∈ H, i ∈ Iis strongly k-universal.

Proof. Let x1, . . . , xk be distinct elements of U and y1, . . . , yk, z1, . . . , zk ∈ B. Wefind the number of pair functions f(h,i) such that for both functions h and i wehave h(x1) = y1, . . . , h(xk) = yk, i(x1) = z1, . . . , i(xk) = zk. From strong k-

universality of both systems we know that there are at most |H|mk

functions h and|I|mk

functions i satisfying mentioned criteria. So the number of all functions f(h,i)

such that f(h,i)(x1) = (y1, z1), . . . , f(h,i)(xk) = (yk, zk) is at most |H||I|m2k . The size of

the table is m2 and thus the constructed system is strongly k-universal.

Combination of two universal systems does not have to increase the size of thetable. From the previous result we obtain a smaller probability of collision by payingthe expensive price of the table expansion. When using single class of hash functionsthe same effect on probability decrease can be achieved by simple squaring the table’ssize. We could also combine two strongly universal systems of functions and use asuitable operation to merge the given results into a small table. The created systemremains strongly universal as it is stated in the next theorem and the combinationdoes not bring us any obvious advantage.

27

Theorem 3.21. Let B denote the set of buckets of a hash table of size m, |B| = m.Let both H and I be strongly k-universal classes of functions over the table B. Let obe a binary left cancellative operation on B, ie. for every a, c ∈ B there exists exactlyone b ∈ B with o(a, b) = c. Then the system of functions

F = f(h,i)|h ∈ H, i ∈ I

wheref(h,i)(x) = o(h(x), i(x))

is strongly k-universal.

Proof. To prove the statement assume that x1, . . . , xk are pairwise distinct elementsof U and y1, . . . , yk are elements of B. For every yi, 1 ≤ i ≤ k choose a pair (ai, bi)with o(ai, bi) = yi. The number of such choices is mk. From Theorem 3.20 for afixed choice of (ai, bi), 1 ≤ i ≤ k, it follows that the probability of the event

h(x1) = a1, . . . , h(xk) = ak and i(x1) = b1, . . . , i(xk) = bk

is less than 1m2k . Hence Pr (f(xi) = yi, 1 ≤ i ≤ k) = mk

m2k = 1mk

. Now we see that thesystem F is strongly k-universal.

Corollary 3.22. Combining the results of pair functions created from strongly k-universal classes by operations

• xor where the table size is a power of 2

• +, − over an additive group Zl where l is the table size

gives strongly k-universal classes of functions.

Proof. Both of the operations are left cancellative because they are group operations.

Next step is to discover the dependence of the parameter k of strongly k-universalclasses on their size. Theorem 3.23 shows us the best possible probability boundfor the collision of k different elements depending on the size of the system whenconsidering the strongly k-universality of a system. Original proof for c-universalclasses may be found in [24] and [23]. We slightly modify it to obtain a lower boundon the size of a strongly k-universal class.

Theorem 3.23. Let H be a strongly k-universal system of functions from a universeU of size N to a table B of size m. Size of the system H is then bounded by:

|H| > mk−1 logm

(N

km

).

28

Proof. Let H = h1, . . . , h|H|. By induction we create sequence U0, U1, . . . of setssuch that U0 = U and Ui is a greatest subset of Ui−1 such that hi(Ui) is a singleton.

From the Dirichlet’s principle size of every set Ui satisfies |Ui| ≥ |Ui−1|m

. Hence we

obtain |Ui| ≥ |U |mi

.Let i be the greatest index with |Ui| ≥ k. Functions h1, h2, . . . , hi map at least

k elements of set Ui to a singleton. By Remark 3.6, the probability of this event isless or equal to 1

mk−1 . It follows that i ≤ |H|mk−1 . Since i is the greatest index with the

property |Ui| ≥ k we also have that |U |mi+1 ≤ |Ui+1| < k.

From Nmi+1 < k it follows N

km< mi and thus logm

(Nkm

)< i.

Obtaining the bound on the size of system of functions:

|H|mk−1

≥ i > logm

(N

km

)|H| > mk−1 logm

(N

km

).

Corollary 3.24. If H is a strongly k-universal system of functions from a universeU of size N to a table B of size m, then

k < 1 +log |H| − log logm

(Nkm

)logm

< 1 +log |H|logm

.

Proof. Theorem 3.23 can be reformulated to get a bound on k:

mk−1 <|H|

logm(Nkm

)k < 1 +

log |H| − log logm(Nkm

)logm

k < 1 +log |H|logm

.

29

Chapter 4

Expected Length of the LongestChain

4.1 Length of a Chain

In Lemma 2.1 we explain why the length of a chain is so important for hashing.Consider a model of hashing which assumes that chains, representing the elementsof a single slot, are stored by linked lists. To find an element in a bucket (or chain)we need to iterate it. Thus the time of the find operation is linear with respect to thenumber of elements of a chain. An approach, how to estimate the expected lengthof the longest chain, for both classic and universal hashing is shown.

Definition 4.1 (Equality indicator). Let x and y be two variables from the samedomain B. Then the function I : B ×B → 0, 1 such that

I(y = x) =

1 if x = y

0 if x 6= y

is called equality indicator.

Definition 4.2 (Length of a chain, length of the longest chain). Let U be a universe,B be a hash table and h : U → B be a hash function. Let b ∈ B be a bucket andS ⊂ U be a stored set. Then the length of the chain representing the bucket b when

hashing with the function h is denoted by psl(b, h) =∑x∈S

I(h(x) = b). The value

lpsl(h) = maxb∈B

(psl(b, h)) is called the length of the longest chain when hashing with

the function h.

Notation probe sequence comes from the open addressing. Open addressing as-sumes that chains are represented directly in the hash table. In addition, the chainsfrom various buckets may be merged together. Thus they may contain elements withdifferent hash values. Many authors refer to the process of finding an element in achain as to the probing a sequence. Authors denote corresponding random variablesby names psl and lpsl as well, for example in [7].

30

When taking the probability into an account we define random variables psl(b)and lpsl over the probability space determined by a choice of a hash from a universalclass.

Definition 4.3 (Randomised variables psl and lpsl). Random variable psl(b) de-notes the length of the chain representing a bucket b ∈ B. Its values are naturalnumbers and probability density function is defined as:

Pr (psl(b) = k) =

∑h∈H I(psl(b, h) = k)

|H| for k ∈ N.

Random variable lpsl denoting the length of the longest chain is defined as

lpsl = maxb∈B

(psl(b)).

In universal hashing the expected value of both random variables psl and lpsl istaken over the choice of a function h ∈ H. Hence the bounds on the variable lpslare valid for any stored set S ⊂ U . They frequently depend on the size of the storedset and on the size of the hash table but are independent of the stored elements.The value E (lpsl) is not only important characteristic of the expected worst case.As shown later in Chapter 6, if we are able to find a tight upper bound on E (lpsl),then we are also able to guarantee the worst case behaviour of the model.

4.2 Estimate for Separate Chaining

The standard result of classic hashing is the bound O(

lognlog logn

)on the length of the

longest chain. We can reuse calculations in its proof found in [24] to show the samebound for universal hashing with a strongly ω-universal class. In either case whenproving this result, we need to estimate the probability of collision of k elements.In the classic model of hashing the estimate follows from the assumptions given inSection 2.2. The assumptions are strong enough to solve the problem in the areaof classic hashing. However, they are not necessarily satisfied for many real worldsituations.

Definition 4.3 of the random variables lpsl and psl are made for models of uni-versal hashing. Yet, they are easy to understand with the standard hashing, too.

Remark, when using the standard model, we do not have to mention the addi-tional parameter h of the variables psl and lpsl because the standard model assumesa single hash function only. Recall that the probability space of this model corre-sponds to the choice of a stored set.

Additional restriction placed on the hash function requires that it distributes theelements of the universe uniformly across the hash table. Hence the other parameter,bucket b ∈ B, of the variable psl is no longer necessary, too. From this fact it followsthat for two arbitrary buckets a, b ∈ B the random variables psl(a) and psl(b) areequal:

Pr (psl(a) = k) = Pr (psl(b) = k) for every k ∈ N.

31

In this chapter we use the notation from Section 2.1. So U is a universe, mdenotes the size of the hash table B and n is the size of the stored set S ⊂ U .

Theorem 4.4. Assume the model of separate chaining with α < 1. Then

E (lpsl) ∈ O(

log n

log log n

).

Proof. The probability estimate for the variable lpsl is simply obtained from theprobability estimate of psl as:

Pr (lpsl ≥ i) ≤ mPr (psl(b) ≥ i for any b ∈ B) .

We use the definition of the expected value to find E (lpsl):

E (lpsl) =∞∑i=0

iPr (lpsl = i)

=∞∑i=0

i(Pr (lpsl ≥ i)−Pr (lpsl ≥ i+ 1))

=∞∑i=0

Pr (lpsl ≥ i) .

We obtained:

E (lpsl) ≤∞∑i=0

mPr (psl ≥ i) .

Since the hash function divides the universe U uniformly across the table, forevery x ∈ U , b ∈ B we have that Pr (h(x) = b) = 1

m. Now we use the assumption of

the independent and uniform choice of the hashed element. Provided the assumption,the probability of collision of i ∈ N elements may be estimated by a binomial randomvariable as. Hence for the collision of at least i elements it follows

Pr (psl ≥ i) ≤(n

i

)(1

m

)i.

Remark that this estimate holds only when the universe is substantially largerthan the stored set. Selection of the first element slightly changes the probability ofthe choice of the second one and so on. This fact is neglected for large or infiniteuniverses.

We can finish the estimate of the variable lpsl by computing:

E (lpsl) ≤∞∑i=0

mPr (psl ≥ i)

≤∞∑i=0

mmin

(1,

(n

i

)(1

m

)i)

= O

(log n

log log n

).

If we assume that the table’s load factor α < 1, we can complete the proof. Thedetails of the estimate can be found in [24] and in [23].

32

4.3 Estimate for Universal Hashing

In universal hashing we move to a different probability estimate of collision of kelements. The probability bound follows from strongly ω-universality of the classH, we use. We neither assume nor exploit specific properties other than stronglyω-universality.

Definition 4.5 (Set indicator). Let I be a function such that

I : 2U → 0, 1

I(M) =

0 M = ∅1 M 6= ∅.

Then I is called a set indicator.

The relation between the size of a set M and its indicator is described by thenext lemma.

Lemma 4.6.I(M) ≤ |M |.

Proof. We distinguish between the cases M = ∅ and M 6= ∅.• If M = ∅ then I(M) = 0 = |M | and the inequality holds.

• If M 6= ∅ then I(M) = 1 ≤ |M | so the statement is true.

Now we present a method how to estimate the probability of collision of k ele-ments.

Definition 4.7 (Collision sets of k elements). Let h ∈ H be a function, S ⊂ U be aset and b ∈ B. For a natural number k define

M≥k(h, b) = Y ⊆ S | |Y | ≥ k, h(Y ) = b,M=k(h, b) = Y ⊆ S | |Y | = k, h(Y ) = b.

When it is clear, we can omit parametrisation of the sets and use the notationM≥k or M=k. Simply said, sets M≥k and M=k denote the chains of length at least kand equal to k, respectively. Formally, chain of length k is a subset of the stored setS consisting of k elements which are hashed to a singleton.

Lemma 4.8. Let U be a universe, B represent a hash table, b ∈ B a bucket, S ⊂ Ube a stored set and h : U → B. Then the non-emptiness of the set M≥k(h, b) isequivalent to the non-emptiness of the set M=k(h, b). Equivalently

I(M≥k(h, b)) = I(M=k(h, b)).

33

Proof. Let M≥k be a non empty set, then:

Y ∈M≥k ⇒ ∀Y ′ ⊆ Y, |Y ′| = k : h(Y ′) = b⇔ ∀Y ′ ⊆ Y, |Y ′| = k : Y ′ ∈M=k

⇒ ∃Y ′ ⊆ Y, |Y ′| = k : Y ′ ∈M=k.

Let M=k be a non empty set:

Y ′ ∈M=k ⇒ Y ′ ∈M≥k.

Now we have:

M≥k 6= ∅ ⇔M=k 6= ∅I(M≥k) = I(M=k).

Now fix any arbitrary bucket b ∈ B. If the chain in the bucket b has psl(b, h) ≥ k,then there is a subset Y ⊆ S, |Y | ≥ k, such that h(Y ) = b. If psl(b, h) = k, thenset Y , |Y | = k, is maximal considering its size. It means that the set Y may notbe extended to a larger one hashed just onto b. We can write down the probabilitydensity functions of psl(b) as:

Pr (psl(b) ≥ k) =

∑h∈H I(Y ⊆ S | |Y | ≥ k, h(Y ) = b)

|H| ,

Pr (psl(b) = k) =

∑h∈H I(Y ⊆ S | |Y | = k, h(Y ) = b, Y is maximal)

|H| .

We add some remarks clarifying the following calculations:

• Every chain is determined by a bucket b ∈ B.

• Probability estimate is always computed for a fixed set S.

• We have to find an estimate

Pr (psl(b) ≥ k) ≤ p(B,U,H, |S|)

for some p(B,U,H, |S|). This estimate is then used to estimate the distributionfunction of lpsl:

Pr (lpsl ≥ k) ≤ |B|p(B,U,H, |S|).• The obtained estimate does not depend on the elements of the stored set S.

However, it depends on its size n = |S|.

34

From Lemma 4.8 it follows that the probability of collision of more than k ele-ments can be bound as

Pr (psl(b) ≥ k) =

∑h∈H I(Y ⊆ S | |Y | ≥ k, h(Y ) = b)

|H|=

∑h∈H I(Y ⊆ S | |Y | = k, h(Y ) = b)

|H|≤∑

h∈H |Y ⊆ S | |Y | = k, h(Y ) = b||H|

=|(h, Y ) | |Y | = k, h(Y ) = b|

|H|

=

∑Y⊆S,|Y |=k |h ∈ H | h(Y ) = b|

|H| .

Claim 4.9. Let k ∈ N, b ∈ B be a bucket and H be a universal class. If we assumeuniversal hashing with the class H, then

Pr (psl(b) ≥ k) ≤∑

Y⊆S,|Y |=k |h ∈ H | h(Y ) = b||H| .

Proof. Follows from the previous calculation.

We further estimate the above fraction from the properties of the class H. Inthis case we use strong ω-universality as stated in the next theorem.

Theorem 4.10. If H is a strongly ω-universal system, then E (lpsl) ∈ O(

lognlog logn

).

Proof. Estimate the probability of the existence of a chain consisting of at leastk ∈ N elements:

Pr (psl(b) ≥ k) ≤∑

Y⊆S,|Y |=k |h ∈ H|h(Y ) = b||H|

≤∑

Y⊆S,|Y |=k|H||B|k

|H|=

(|S|k

)1

|B|k .

Now we carry on as in the case of standard hashing. The probability estimateof collision of k elements is exactly the same. Thus we can obtain the same result

E (lpsl) = O(

lognlog logn

).

A construction of a small strongly ω-universal system is difficult. Later in thiswork we investigate the system of linear transformations. Although the system is notstrongly ω-universal, its specific properties, independent of the strong ω-universality,give an interesting upper bound on the distribution function of the variable lpsl.

35

Chapter 5

The System of LinearTransformations

From now on, we concentrate on the systems of linear transformations betweenvector spaces over the field Z2. The reason is simple, they allow a construction ofa remarkable model of universal hashing. Like for the other universal models, theexpected length of a chain is constant and so is the expected running time of thefind operation. In addition, the expected length of the longest chain is bounded bya sub-linear function of a single variable – the size of the hash table. In our model,we have also managed to prove that the expected amortised time of every operation,find, insert and delete, is constant.

Once again, the main result is the upper bound on the expected length of thelongest chain when a system of linear maps is used as a universal class. As alreadymentioned, probability of collision of k-elements is not estimated directly from theproperties common to all universal systems. Hence the proof of the bound can notbe based on the idea shown in the previous chapter.

First, we show a few models of the uniform random choice of a linear function.They describe other ways of the uniform random choice of a linear function fromthe universal class – choice of the universal hash function. Later, we prove sometechnical lemmas regarding the vector spaces. Finally, we state the required factsin terms of system of linear transformations. Then we interpret the facts in termsof hashing and propose a new model. Many theorems come from [3]. Our work liesin their correction, improvement and modification leading to a better and practicalresult.

In the following sections we restrict ourselves only to the vector spaces over thefield Z2. If vector spaces are taken over other finite field than the field Z2, thenwe can not expect such good results as shown in [3]. The original result proposeshashing of m logm elements into a hash table of size of m slots and leads to thebound O(logm log logm). Although the result may already be used for hashing melements, too, it is later improved for α ≤ 1.

When unsure about any exact definition or fact from linear algebra, see Ap-pendix A where we exactly summarise and accurately state them.

36

5.1 Models of the Random Uniform Choice

Our technical preparations for key statements include models of the random uniformchoice of a linear map. These models are used later to simplify proofs assuming therandom uniform choice of a linear transformation – choice of a hash function. Thesimple random choice is transformed to the random uniform selection of two or moreobjects that precisely correspond to the linear function.

We recall Definition 3.11 saying that for two vector spaces A and B, the sets

LT (A,B) = T : A→ B | T is a linear transformation,LTS(A,B) = T : A→ B | T is a surjective linear transformation

and Definition A.8 stating that for two affine vector spaces AA and BA, the set

LTA(AA, BA) = TA : AA → BA | TA is an affine linear transformation.Definition 5.1 (Uniform selection). Let S = s1, . . . , sn. By the random uniformselection of an element s ∈ S we understand the model of selection with

Pr (s ∈ S is selected) =1

n.

Definition 5.2 (Set of all permutations). Let b be a natural number. Then Πb

denotes the set of all permutations of the set 1, . . . , b.The first model shows the correspondence between the random uniform choice of

a basis of the source space that is mapped onto a fixed basis of the smaller targetspace and the random uniform choice of a surjective linear transformation.

Remark 5.3 (Surjective linear map selection). Let B be the set of all bases of thevector space Zf

2 and the set ~y1, . . . , ~yb be a basis of the vector space Zb2 where

b, f ∈ N and b ≤ f . Let us denote S the set of all subsets of 1, 2, . . . , f of size

f − b. For the random uniform choice of a basis β = ~b1, . . . , ~bf ∈ B, a set s ∈ Sand a permutation π ∈ Πb we define the linear map Tβ,s,π as

Tβ,s,π(bi) =

yπ(i) if i /∈ s0 if i ∈ s.

If we perform the random uniform choices of a basis β ∈ B, a set s ∈ S and apermutation π ∈ Πb, then we obtain a randomly and uniformly selected surjectivelinear map Tβ,s,π.

Proof. First, remark that Tβ,s,π is a surjective linear map for every choice of β ∈ B,s ∈ S and π ∈ Πb. From the fact that Tβ,s,π is defined for every vector of the basis βwe known it may be uniquely extended to a linear map. It is surjective since for everyvector ~yi, i ∈ 1, . . . , b, there is a vector ~bj, j ∈ 1, . . . , f, such that Tβ,s,π(~bj) = ~yi.

We show that for every surjective linear transformation T there is a choice of β, sand π such that Tβ,s,π = T . Consider set T−1(~0), it certainly contains f − b linearlyindependent vectors, denote them as β0.

37

Now for every i = 1, 2, . . . , b choose ~ci ∈ T−1(~yi), then β1 = ~c1, ~c2, . . . , ~cb is alinearly independent set because the set ~y1, . . . , ~yb is a basis of the vector spaceZb

2. Since T (β0) = ~0 we deduce that β = β0 ∪ β1 is a linearly independent set ofsize f , thus it is a basis.

Let δ denote the number of all bases in the vector space Zf−b2 and T be a fixed

surjective linear transformation. The number of choices generating the transforma-tion T equals f !δt2f−b. To choose a basis β of Zf

2 the sets β0 and β1 are selectedindependently of each other as already stated. Number of choices for β0 is δ. Theprocess of selection of β1 is already described and implies that the number of choicesis t2f−b. The factor f ! appears because the bases of the vector space Zf

2 are consid-ered ordered. Since we want to create the fixed function T the choice for s and π isalready exactly determined by the previous choice of the basis β.

Remark 5.4. Let u, f, b be natural numbers such that b ≤ f . Assume the randomuniform choices of a linear transformation T0 : Zu

2 → Zf2 among LT (Zu

2 ,Zf2) and a

surjective function T1 : Zf2 → Zb

2 among LTS(Zf2 ,Zb

2). Then we obtain the randomuniform choice of a linear transformation T = T1 T0 among LT (Zu

2 ,Zb2).

Proof. The idea of the proof is to show that there exists a natural number γ suchthat every linear map T is generated by γ choices of T0 and T1.

To prove this fact, let T1 be a fixed surjective linear map. First we show that thenumber of choices of T0, such that T = T1 T0, equals u2f−b. Let β be a fixed basisof the vector space Zu

2 . From Lemma A.10 it follows that for every ~y of β there are2f−b choices of T0(~y) ∈ T−1

1 (T (~y)). Clearly T = T1 T0.Now for every T there are exactly γ = u2f−b|LTS(Zf

2 ,Zb2)| choices of transfor-

mations T0 and T1.

See Appendix A for the definitions and facts regarding the orthogonal comple-ment, affine vector spaces and affine linear maps.

Remark 5.5. Let u, f, b ∈ N such that f ≥ b. Assume the random uniform choicesof T0 : Zu

2 → Zf2 among LT (Zu

2 ,Zf2) and T1 : Zf

2 → Zb2 among LTS(Zf

2 ,Zb2). Set

T = T1 T0. Assume that ~y ∈ Zb2 and denote UA = T−1(~y) and FA = T−1

1 (~y). IfUA 6= ∅, then T0|UA : UA → FA is a random affine linear map chosen uniformly fromthe set LTA(UA, FA).

Proof. First set U = Zu2 , U0 = T−1(~0), F0 = T−1

1 (~0) and U1 = U⊥0 .Now we show that there is a one-to-one relationship among the linear transfor-

mations T0 : U → F and pairs of mappings T0|U0 and T0|U1 . Assume that β0 andβ1 are orthonormal bases of the vector spaces U0 and U1. The set β = β0 ∪ β1 is anorthonormal basis of the vector space U , too. For a vector ~x ∈ β set the value T0(~x)equal to T0|U0(~x) or T0|U1(~x), according to the presence of ~x in β0 or β1. Thus theuniform choice of a mapping T0 corresponds to the uniform independent choices oflinear functions T0|U0 and T0|U1 .

From the observed fact it follows that the uniform choice T0 gives the uniformselection of a linear transformation T0|U0 . When ~y = ~0, then, by the previous state-ment, we simply perform the random uniform selection of a mapping T0|UA = T0|U0 .

38

When ~y 6= ~0, we have to randomly and uniformly select an affine mapping T0|UA .By setting T0|UA(~x) = T0(~v) + T0|U0(~x− ~v) for an arbitrary but fixed vector ~v ∈ UA,the uniform choice of T0|U0 implies the uniform choice of T0|UA .

5.2 Probabilistic Properties

Many of the following claims are taken from [3]. It is convenient to show the originalproofs and then modify them according to our future needs. Technical definitionsand statements which follow are useful in order to show our goal – the bound on thelength of the longest chain.

Once again, see Appendix A for the exact definitions of the set

~v + A = ~v + ~a | ~a ∈ A

and the setA+B = ~a+~b | ~a ∈ A,~b ∈ B.

Original Lemma 5.6 can be found in [24], however our proof is more exact andclear.

Lemma 5.6. Let V be a finite vector space and A be its subset. Define µ = 1− |A||V |as the inverse density of the set A in the vector space V . Let ~v ∈ V be a randomuniformly chosen vector independent of A. Then

E

(1− |A ∪ (~v + A)|

|V |)

= µ2

The expectation is taken over the possible choices of ~v ∈ V .

Proof. To simplify further computations define X~v = |A ∪ (~v + A)| as a randomvariable taken over the random uniform choice of a vector ~v ∈ V . The most difficultpart of the proof is to compute E (X~v). From its definition we have

E (X~v) =∑~v∈V

|A ∪ (~v + A)| ·Pr (~v is chosen) =∑~v∈V

|A ∪ (~v + A)||V | .

The size of the set A ∪ (~v + A) can be expressed using the indicator function as

|A ∪ (~v + A)| =∑~u∈V

I(~u ∈ A ∨ ~u ∈ (~v + A)).

To compute the sum∑

~v∈V |A∪ (~v+A)| we count the number of pairs of vectors~u,~v ∈ V satisfying the indicator’s condition. Notice, if ~u ∈ A, then there are exactly|V | vectors ~v ∈ V satisfying the above condition. To count the number of vectorsconfirming to the second one assume that ~u /∈ A and ~a ∈ A. There is exactly onevector ~v ∈ V for every choice of ~u and ~a such that ~a+~v = ~u. It follows that there are

39

exactly |A|(|V | − |A|) such vectors ~u and ~v satisfying the condition. The remainingchoices are refused. Thus

|(~u,~v) | ~u ∈ A ∨ ~u ∈ (~v + A), ~u,~v ∈ V | = |A|(|V | − |A|) + |V ||A|= 2|V ||A| − |A|2.

Substituting into the definition of E (X~v) and rewriting sums into the just com-puted number of pairs gives that

E (X~v) =

∑~v∈V

∑~u∈V I(~u ∈ A ∨ ~u ∈ (~v + A))

|V |=|(~u,~v) | ~u ∈ A ∨ ~u ∈ (~v + A), ~u,~v ∈ V |

|V |=

2|V ||A| − |A|2|V | .

Now we finally compute the expected value

E

(1− |A ∪ (~v + A)|

|V |)

= 1− E (|A ∪ (~v + A)|)|V |

= 1− E (X~v)

|V |= 1− 2|V ||A| − |A|2

|V |2

=|V |2 − 2|V ||A|+ |A|2

|V |2

=

(1− |A||V |

)2

= µ2.

The next lemma and its corollaries are so simple that one can think they should beomitted. However, they are used later in a few complicated situations. Sometimes itis hard to realise why the inequalities hold and recalling this lemma and its corollariesoften helps.

Lemma 5.7. Let f : (0, 1)→ R be a decreasing function. Then the function xf(x) isincreasing in the interval (0, 1).

Proof. To prove the lemma assume that a, b ∈ (0, 1) and a < b. It follows thatf(a) > f(b). Observe that the function ax of variable x is decreasing and thefunction xf(b) of variable x is increasing. Hence

af(a) < af(b) < bf(b).

40

Corollary 5.8. The function xc+log log( 1x) is increasing in the interval (0, 1) for every

c ∈ R.

Proof. Use Lemma 5.7 and note that the function log log(

1x

)is decreasing.

Corollary 5.9. The function xc−log x−log log x is decreasing in the interval (1,∞) forevery c ∈ R.

Proof. Set the function g(y) = y−c+log( 1y )+log log( 1

y ) for 0 < y < 1. From Lemma 5.7for g(y) it follows that g is increasing in (0, 1). Then for every x > 1 the functionf(x) = xc−log x−log log x = g

(1x

). Thus the function f is decreasing since for 1 < a < b

we have

f(a) = g

(1

a

)> g

(1

b

)= f(b).

The following lemma is a very technical one and is taken from [24]. It is laterused to estimate probabilities of events related to the system of linear maps. Theseevents correspond to the existence of a large set having a singleton image – to theexistence of a long chain.

Lemma 5.10. Let 0 < µ0 < 1 be a constant and for every i, 1 ≤ i ≤ k, let µi be arandom variable. Let us assume that for every random variable µi, 1 ≤ i ≤ k, thefollowing holds:

0 ≤ µi ≤ µi−1

E (µi | µi−1, . . . , µ1) = µ2i−1.

Then for every constant t ∈ [0, 1]

Pr (µk ≥ t) ≤ µk−log log( 1

t)+log log

“1µ0

”0 .

Proof. We prove the statement by induction over k.

The initial step, k = 0. Since µ0 is constant we have

Pr (µ0 ≥ t) =

0 if µ0 < t

1 if µ0 ≥ t.

From µ0 > 0 it follows that

µ0−log log( 1

t )+log log“

1µ0

”0 ≥ 0

and thus the estimate holds for 0 < µ0 < t.

If t ≤ µ0 < 1, then − log log(

1t

)+ log log

(1µ0

)≤ 0 and hence

µ0−log log( 1

t )+log log“

1µ0

”0 ≥ 1.

The statement thus holds for k = 0.

41

The induction step. We prove the statement for k+1 using the assumption thatthe lemma holds for k ≥ 0. Let t ∈ (0, 1) be fixed. For simplicity, let us denotec = k − log log

(1t

). Then we have to prove

Pr (µk+1 ≥ t) ≤ µc+1+log log

“1µ0

”0 .

Whenever exponent c + 1 + log log(

1µ0

)≤ 0, the estimate holds because µ0 < 1.

We can restrict ourselves to the case when c + 1 + log log(

1µ0

)> 0. To prove our

statement we fix the value of the variable µ1 and use the induction hypothesis fork. For a value a ∈ [0, µ0] define g(a) = Pr (µk+1 ≥ t | µ1 = a). First, we give thefollowing claim.

Claim 5.11.

Pr (µk+1 ≥ t) =

µ0∫0

Pr (µk+1 ≥ t | µ1 = a) Pr (µ1 = a) da = E (g(µ1)) .

Proof. This claim is a corollary of Lemma B.20.

The following functions are convenient for using the induction hypothesis. Forevery x ∈ (0, 1) define functions f and f0 as

f0(x) =

xc+log log( 1

x) if 0 < x < 1

0 if x = 0,

f(x) = min1, f0(x).If β0 = a ∈ [0, µ0] is constant and for i = 1, . . . , k βi are random variables

satisfying the conditions of this lemma, then apparently Pr (βk ≥ t) = g(a).

Claim 5.12. For every a ∈ [0, µ0] we have g(a) ≤ f(a).

Proof. We omit the constant µ0 from the sequence of random variables and use theprevious fact for βi = µi+1 for i = 0, . . . , k. The induction hypothesis for k then

states g(a) ≤ f(a) = ak−log log( 1t )+log log( 1

a).

These claims are used later in the proof. Now we investigate the behaviour ofthe function f0

xin the interval (0, 1). First, we compute the first derivation of the

function f0x

for every x ∈ (0, 1).(f0(x)

x

)′=xf0(x)

[ln(x)

(c+ log log

(1x

))]′ − f0(x)

x2

=

f0(x)

[c+ log log

(1x

)+ x · lnx

log( 1x) ln 2

· xln 2· −1x2

]− f0(x)

x2

=

(c− 1 + log log

(1

x

)+ log e

)f0(x)

x2.

42

The investigated function f0(x)x

has the unique stationary point xs = 2−2−c+1−log e.

First it is increasing in the interval (0, xs] and then decreasing in the interval [xs, 1).Let us also define x1 = 2−2−c , the point where f0(x) reaches 1 for the first time,

f0(x1) = x1c+log log

“1x1

”= x1

c−c = 1.

Since −c > −c + 1 − log e, the inequality x1 < xs holds and thus point x1 is stillin the increasing phase of the function f0(x)

x. By Corollary 5.8 the function f0(x)

is increasing in the interval [x1, 1). Therefore for every x ∈ [x1, 1) we have thatf0(x) ≥ f0(x1) = 1 and f(x) = min1, f0(x) = 1.

At last, define the third point x2 =√x1 = 2−2−c−1

. The inequality x2 > xs followsfrom −c− 1 < −c+ 1− log e. The point x2 then lies in the decreasing phase of thefunction f0(x)

x. For x2 the following statement holds:

f0(x2)

x2

=

(2−2−c−1

)c+log“− log

“2−2−c−1

””

2−2−c−1

=

(2−2−c−1

)c+log(2−c−1)

2−2−c−1

=

(2−2−c−1

)−1

2−2−c−1

=1

x22

=1

x1

.

Claim 5.13. The following holds:

(1) f0(x1)x1

= 1x1

,

(2) f0(x2)x2

= 1x1

,

(3) f(x)x≤ f0(µ0)

µ0if 0 < x ≤ µ0 ≤ xs,

(4) 1x1≤ f0(µ0)

µ0if µ0 ∈ [xs, x2].

Proof. The first fact f0(x1)x1

= 1x1

, follows from the definition of x1 because f0(x1) = 1.The proof of the second one is above.The third one comes from the observation that the function f0(x)

xis increasing in

the interval (0, xs]. In addition, f(x) ≤ f0(x) in the interval (0, 1).

The last one comes from the equality 1x1

= f0(x2)x2

and the fact that both µ0 andx2 lie in the decreasing phase. Thus the relation µ0 ≤ x2 implies the result

f0(x2)

x2

≤ f0(µ0)

µ0

.

43

The proof is now divided into three cases. The first and the second case take care

about the situation when the exponent c + 1 + log log(

1µ0

)is non-negative. In the

first two cases it is proved that f(x) ≤ f0(µ0)xµ0

for every x ∈ (0, µ0].

Constant µ0 is in the increasing phase, µ0 ≤ xs. From Claim 5.13(3) it followsthat

f(x)x

x≤ f0(µ0)x

µ0

.

Constant µ0 is in the decreasing phase, xs ≤ µ0 ≤ x2.

Assume that x ∈ (0, x1]. The Claim 5.13(1) and the fact that the function isincreasing in the interval imply

f(x)

x≤ f0(x)

x≤ f0(x1)

x1

=1

x1

.

Let x ∈ [x1, µ0]. Because of the choice x ≥ x1 we have that f(x) = 1 and itfollows that

f(x)

x=

1

x≤ 1

x1

.

For every x ∈ (0, µ0] we may estimate the function f(x) as

f(x) =f(x)x

x≤ x

x1

.

The Claim 5.13(4) proves that

f(x) ≤ x

x1

≤ f0(µ0)x

µ0

.

Proof of the lemma assuming either of the first two cases. In both casesfor every x ∈ (0, µ0] the following holds

f(x) ≤ f0(µ0)x

µ0

.

Now use the previous estimate, Claim 5.12, Claim 5.11 and properties of the expectedvalue stated in Lemma B.18 to prove the lemma,

Pr (µk+1 ≥ t) = E (g(µ1)) ≤ E (f(µ1)) ≤ E

(f0(µ0)µ1

µ0

)=f0(µ0)

µ0

E (µ1 | µ0)

=f0(µ0)

µ0

µ20 = µ0f0(µ0) = µ0

c+1+log log“

1µ0

”.

44

Constant µ0 is set so that the exponent is negative, µ0 ≥ x2. To prove the

lemma in this case, it is sufficient to show that the exponent, c + 1 + log log(

1µ0

),

is not positive. As already observed before the estimate is then at least 1. Theexponent is non-positive since

c+ 1 + log log

(1

µ0

)≤ c+ 1 + log log

(1

x2

)= c+ 1 + log

(− log

(2−2−c−1

))= c+ 1 + log

(2−c−1

)= c+ 1− c− 1 = 0.

The inequality, Pr (µk+1 ≥ t) ≤ µc+1+log log

“1µ0

”, holds for every of the three cases

and the induction step is thus complete.

Theorem 5.14 is also taken from [24] and is used to estimate the probability ofthe event that a subset of the domain is not mapped onto whole target space.

Theorem 5.14. Let f, b ∈ N such that b ≤ f . Let S be a proper and non-emptysubset of the vector space Zu

2 . Set µ = 1 − |S|2f

as the inverse density of the set S

in the vector space Zf2 . Then for a random uniformly chosen surjective linear map

T : Zf2 → Zb

2 we have

Pr(T (S) 6= Zb

2

) ≤ µf−b−log b+log log 1µ .

Proof. Set k = f − b. Choose vectors ~v1, . . . , ~vk ∈ Zf2 independently and randomly

using the uniform distribution. Note that the vectors ~v1, . . . , ~vk ∈ Zf2 are not neces-

sarily linearly independent.We perform the random uniform choice of a linear surjective transformation T

according to Model 5.3. First fix a basis of the target space Zb2. Secondly, maximal

linearly independent subset of vectors ~v1, . . . , ~vk is extended to a random basis β ofthe vector space Zf

2 . Now choose a random permutation π ∈ Πb as stated in thementioned model. Because of the random uniform selection of the vectors ~v1, . . . , ~vk,the basis β may be chosen uniformly as well. If we place the vectors ~v1, . . . , ~vk intothe kernel of the created mapping, then they define a subset s′ of the needed sets ∈ S. The set s′ is then uniformly extended to a set s ∈ S such that s′ ⊆ s sothat we perform the uniform choice of a set s ∈ S. The random uniform choice of asurjective linear function T is finally complete. Just note that we have T (~vi) = ~0 forall i = 1, . . . , k.

Let us define a bounded sequence of sets S0 = S and Si = Si−1 ∪ (Si−1 + ~vi) and

set µi = 1 − |Si|2f

for i ∈ 1, . . . , k. When considering µi random variables, then byusing Lemma 5.6 we can derive that E (µi) = µ2

i−1 for every i ∈ 1, . . . , k. Becauseevery set Si is an extension of the previous set Si−1 it follows that 0 < µ = µ0 < 1and µi ≤ µi−1 for all i = 1, . . . , k. The assumptions of Lemma 5.10 are now satisfied

45

and we obtain

Pr(µk ≥ 2−b

) ≤ µk−log log( 1

2−b )+log log( 1µ)

= µk−log b+log log( 1µ)

= µf−b−log b+log log( 1µ).

If we prove that the event T (Sk) = Zb2 occurs whenever µk < 2−b, then the

statement will be almost proved. Since µk = 1 − |Sk|2f

the size of the set Sk equals2f (1 − µk). Using the assumption µk < 2−b it follows that |Sk| > 2f − 2f−b. To getthe contradiction we assume that there is a vector ~x ∈ Zb

2 − T (Sk) or equivalentlyT (Sk) 6= Zb

2. Under these conditions it is clear that T−1(~x) and Sk are disjoint. FromLemma A.10 we have |T−1(~x)| = 2f−b. It follows that

2f = |Zf2 | ≥ |Sk ∪ T−1(~x)| > 2f − 2f−b + 2f−b = 2f .

This is a contradiction.To prove the theorem rewrite the statement, µk < 2−b ⇒ T (Sk) = Zb

2, in termsof probability:

Pr(µk < 2−b

) ≤ Pr(T (Sk) = Zb

2

).

Because Zf2 is a vector space over the field Z2, by induction over i ∈ 1, . . . , k,

we obtain that Si = Si−1 ∪ (~vi + Si−1) = S0 + span (~v1, . . . , ~vi). Note that T (~vi) = ~0because the mapping T is chosen so that every vector ~vi is placed in the kernel of T .This simply implies T (Sk) = T (S).

The proof of the theorem is finished by putting the previous notes together,

Pr(T (S) 6= Zb

2

)= Pr

(T (Sk) 6= Zb

2

)= 1−Pr

(T (Sk) = Zb

2

)≤ 1−Pr

(µk < 2−b

)= Pr

(µk ≥ 2−b

)≤ µf−b−log b+log log( 1

µ).

Theorem 5.15 shows the probability of the complementary event, T (S) = Zb2, if

the set S is large enough. It is taken from [24] and is consequently improved by ourStatements 5.18 and 5.23.

Theorem 5.15. Let T : Zu2 → Zb

2 be a random uniformly chosen linear map. Forevery ε ∈ (0, 1) there is a constant cε > 0 such that for every subset S of the domainZu

2 with |S| ≥ cεb2b we have that

Pr(T (S) = Zb

2

) ≥ 1− ε.

46

Proof. The idea of the proof is to factorise the transformation T through the factorspace Zf

2 , where the dimension f is specified later, into two linear functions T0 andT1 such that T = T1 T0. In fact, we estimate the probability of the complementaryevent, T (S) 6= Zb

2. To successfully use the estimate from Theorem 5.14 we mustbound the inverse density µ. The bound is obtained by using the Law of TotalProbability, Theorem B.13 as

Pr(T (S) 6= Zb

2

)= Pr

(T (S) 6= Zb

2 ∧ |T0(S)| ≤ |S|2

)+ Pr

(T (S) 6= Zb

2 ∧ |T0(S)| > |S|2

).

Then we use the Markov’s inequality to estimate the expression Pr(|T0(S)| ≤ |S|

2

)and Theorem 5.14 on the remaining one.

First set f =⌈log(2|S|

ε)⌉. Let T1 : Zf

2 → Zb2 be a random uniformly chosen

surjective linear mapping. Since cε is later chosen large enough, we have that f ≥ band thus such onto mapping T1 exists. Fix it. Secondly, perform the random uniformchoice of a linear mapping T0 : Zu

2 → Zf2 . From Model 5.4 it follows that the linear

mapping T = T1 T0 is chosen randomly and uniformly, too.Since the family of all linear mappings from Zu

2 into Zf2 is 1-universal we conclude

thatPr (T0(~x) = T0(~y)) = 2−f

for all distinct vectors ~x and ~y from the vector space Zu2 . If dS is the number of all

pairs of distinct vectors ~x, ~y ∈ S with T0(~x) = T0(~y), then the expected value of therandom variable dS is

E (dS) =

(|S|2

)2−f .

If |T0(S)| ≤ |S|2

then there exist at least |S|2

pairs of distinct vectors ~x, ~y ∈ S withT0(~x) = T0(~y). By the Markov’s inequality, Theorem B.21, for every k > 1 we have

Pr

(dS ≥ k

(|S|2

)2−f)≤ 1

k.

Thus if we set k = |S|2f

2(|S|2 ), then

k ≥ 2|S|2ε(|S|2 − |S|) ≥ 2ε−1 > 1

and

k

(|S|2

)2−f =

|S|2f2(|S|

2

)(|S|2

)2−f =

|S|2

and, by the Markov’s inequality, we obtain

Pr

(|T0(S)| ≤ |S|

2

)≤ Pr

(dS ≥ |S|

2

)≤ 2

(|S|2

)|S|2f =

|S| − 1

2f<|S|2f≤ ε|S|

2|S| =ε

2.

47

We can summarise that

Pr

(T (S) 6= Zb

2 ∧ |T0(S)| ≤ |S|2

)≤ ε

2.

Secondly we compute Pr(T (S) 6= Zb

2 ∧ |T0(S)| > |S|2

). By Theorem 5.14 used

for the mapping T1 : Zf2 → Zb

2 and the set T0(S) ⊆ Zf2 we have

Pr

(T (S) = T1(T0(S)) 6= Zb

2 ∧ |T0(S)| > |S|2


µ)

where µ = 1− |T0(S)|2f

. Since 2f ≤ 4|S|ε

, clearly

µ = 1− |T0(S)|2f

< 1− |S|2 · 2f ≤ 1− ε|S|

8|S| ≤ e−ε8 .

In the following constant cε is chosen as 4(

2ε

) 8ε . Then we can estimate:

− ε

8

(f − b− log b+ log log

(1

µ

))= − ε

8

(⌈log

(2|S|ε

)⌉− b− log b+ log log

(1

µ

))

≤ − ε8

log

8(

2ε

) 8ε b2b

ε

− b− log b+ log log

(1

µ

)≤ − ε

8

(3 +

8

εlog

2

ε− log ε+ log b+ b− b− log b+ log

(( ε8

)log e

))= − ε

8

(3− log ε+

8

εlog

2

ε+ log ε− 3 + log log e

)= − ε

8

(8

εlog

2

ε+ log log e

)= log

ε

2− ε

8log log e

≤ logε

2.

And for the calculated probability we derive:

Pr

(T (S) 6= Zb

2 ∧ |T (S)| > |S|2


µ)

≤ e−ε(f−b−log b+log log( 1

µ))8

≤ elog( ε2) ≤ eln( ε2) =ε

2.

If we put both alternatives together, we deduce that

Pr(T (S) = T1(T0(S)) 6= Zb

2

) ≤ ε

2+ε

2= ε.

48

From this observation follows the required estimate:

Pr(T (S) = Zb

2

) ≥ 1− ε.

The biggest disadvantage of Theorem 5.15 is the great value of constant cε. Asfollows from Theorem 5.38 and Theorem 5.40 it is necessary to decrease its value.Since the other constants are proportional to it, we need to find the smallest valueof cε. It is the aim of the following theorems and corollaries. When we successfullymanage the goal, then we are able to propose a reasonable chain limit rule, boundingthe length of the longest chain.

As observed in the next corollary, we may apply Theorem 5.15 for two affinevector subspaces and an affine linear transformation between them, too.

Corollary 5.16. Let UA and FA be affine vector subspaces of the vector spacesZu

2 and Zf2 . Let TA : UA → FA be a random uniformly chosen affine linear map,

ε ∈ (0, 1), constant cε be chosen so that Theorem 5.15 is satisfied and SA ⊆ UA suchthat |SA| ≥ cε|FA| log |FA|. Then

Pr (TA(S) = FA) ≥ 1− ε.

Proof. See Appendix A for the definitions of affine linear spaces and transformations.Recall Definition A.3 saying that UA = ~u + U0 for a vector subspace U0 ≤ U and avector ~u ∈ U . Similarly we can assume that FA = ~f+F0 for F0 ≤ F and ~f ∈ F . FromDefinition A.8 of the affine linear mapping TA it follows that TA(~x) = ~f + T0(~x− ~u)for a fixed linear transformation T0 : U0 → F0 and for every vector ~x ∈ FA.

We known that U0, F0 are vector spaces. By setting S0 = SA − ~u ⊆ U0 itfollows that |S0| ≥ cε|F0| log |F0|. We can apply Theorem 5.15 and we obtainPr (T0(S0) = F0) ≥ 1− ε.

Now we must realise that the result is valid in the affine case, too. This is simple,since T0(S0) = F0 if and only if TA(SA) = FA. From Lemma A.4 it follows thatFA = TA(~u) + F0. The relation T0(S0) = F0 ⇔ TA(SA) = FA holds since

( ~x0 ∈ S0 ⇒ T0( ~x0) ∈ F0)⇔ ( ~x0 ∈ S0 ⇒ TA(~u+ ~x0) ∈ TA(~u) + F0)

⇔ ( ~xA = ~u+ ~x0 ∈ SA, ~x0 ∈ S0 ⇒ TA( ~xA) ∈ FA).

5.3 Parametrisation of The Original Proofs

In order to obtain the minimal value of the constant cε we present a simple parametri-sation of the proof of the previous theorem. Combination of this simple powerfultechnique with the novel ideas of Statement 5.23 brings a new reasonable result. Theresult is used later in Theorem 5.41 and exploited by the proposed model of hashingin Chapter 6.

49

As mentioned, parametrisation of the proof of Theorem 5.15 is done. The opti-misation of the value cε itself is not performed analytically because of its complexitycaused by the emerged constraints. Instead, we created a straightforward computerprogram that makes it numerically. Each parameter is assigned a value from a pre-defined interval. The intervals were chosen manually so that the assumptions of thetheorems are satisfied and a reasonable result is achieved. Program enumerates thevalues of the intervals uniformly with the prescribed step set for each interval. Theconstant cε is then computed for every assignment and the minimal value is remem-bered along with the parametrisation. The parameters then allow the verification ofall the assumptions of all the claims stated in the proof of a parametrised theorem.

In the following text we use the assumptions and notation from the proof ofTheorem 5.15. The next lemma allows a parametrisation of the size of the set T0(S).In the moment when we use the Law of Total Probability we split the proof ofTheorem 5.15 into two cases. We distinguish between them according to the size ofthe set T0(S). The question is what happens if another value than |S|

2is chosen. The

parameter k, k ≥ 1, corresponds to the choice of a different size, now set to |S|k

.

Lemma 5.17. Let T0 : Zu2 → Zf

2 be a function, S ⊆ Zu2 and k ∈ R such that k ≥ 1.

If |T0(S)| ≤ |S|k

, then T0 has at least |S|(k−1)2

collisions of elements from S.

Proof. Define the family bi ∈ N0i∈T0(S) such that bi =∣∣S ∩ T−1

0 (i)∣∣. First note

that∑

i∈T0(S) bi = |S|. The Cauchy Bunyakovsky Schwarz inequality for the vectors

~bii∈T0(S) and ~1i∈T0(S) states that ∑i∈T0(S)

b2i

∑i∈T0(S)

12

≥ ∑i∈T0(S)

bi · 12

∑i∈T0(S)

b2i ≥

|S|2|T0(S)| .

The number of all colliding pairs can be computed now as

|x, y | x 6= y ∈ S, T0(x) = T0(y)| = 1

2

∑i∈T0(S)

bi(bi − 1)

≥ |S|2

( |S||T0(S)| − 1

)≥ |S|(k − 1)

2.

Statement 5.18. For every ε ∈ (0, 1) Theorem 5.15 holds if the constant cε forsome parameters k, l ≥ 1 satisfies the inequalities:

cε ≥ 2

log

0@„ε− ε(k−1)2l

«µ′

log ε−l−log log( 1µ′ )

1Alog µ′ where µ′ = 1− ε

2k2l(5.19)

50

andl + log cε + log b− logε ≥ 0.

Proof. We parametrise the original proof of Theorem 5.15 by two real variables k andl. As already mentioned in the introduction of Lemma 5.17, the parameter k controlsthe size of the set T0(S). The limit is changed to |S|

kfor k ∈ R, k ≥ 1. Another

constant is present in the dimension f of the factor space F , f =⌈log(

2|S|ε

)⌉. The

second parameter l is obtained by setting f =⌈log(

2l|S|ε

)⌉.

We summarise the claims of the original proof and recall its objects that arepresent in the current one. We factorise a random uniformly chosen linear transfor-mation T : Zu

2 → Zb2 through the vector space Zf

2 . From Model 5.4 we obtained twolinear mappings T0 : Zu

2 → Zf2 and a T1 from Zf

2 onto Zb2. For the existence of a

surjective transformation T1 we need f ≥ b. Since

f =

⌈log

(2l|S|ε

)⌉≥ l + log cε + b+ log b− log ε

then from the inequality l + log cε + log b− log ε ≥ 0 it follows f ≥ b.We continue by using the Law of Total Probability as in the original proof:

Pr(T (S) 6= Zb

2

)= Pr

(T (S) 6= Zb

2 ∧ |T0(S)| ≤ |S|k

)+ Pr

(T (S) 6= Zb

2 ∧ |T0(S)| > |S|k

)≤ Pr

(|T0(S)| ≤ |S|

k

)+ Pr

(T1(T0(S)) 6= Zb

2 ∧ |T0(S)| > |S|k

).

To satisfy the required statement we show Pr(T (S) 6= Zb

2

) ≤ ε. The requiredinequality is obtained by choosing a convenient value of cε. Of course, the value iscomputed according to the values of the parameters. The estimate of cε is furthermodified when compared to the original proof. Value of cε is computed directlywithout any inevitable estimates only specifying its accuracy.

The probability of the first event, |T0(S)| ≤ |S|k

, is estimated using the Markov’s

inequality, too. By Lemma 5.17 the number of collisions dS is at least |S|(k−1)2

.

The expected number of collisions remains equal to

(|S|2

)2−u. We estimate the

probability as

Pr

(|T0(S)| ≤ |S|

k

)≤ Pr

(dS ≥ |S|(k − 1)

2

)

≤

(|S|2

)2−f

|S|(k−1)2

=|S| − 1

(k − 1)2f.

We state the obtained bound as a standalone claim.

51

Claim 5.20.

Pr

(|T0(S)| ≤ |S|

k

)≤ |S| − 1

(k − 1)2f.

Our estimate of the probability of the event, T1(T0(S)) 6= Zb2 ∧ |T0(S)| > |S|

k,

is based on Theorem 5.14. This is also similar to the original proof. Recall thatTheorem 5.14 is used for the set T0(S), the transformation T1, the source vector space

Zf2 and the target vector space Zb

2. The corresponding inverse density is µ = 1− |T0(S)|2f

.Then

Pr

(T1(T0(S)) 6= Zb

2 ∧ |T0(S)| > |S|k


µ).

The following upper bound on the inverse density µ becomes soon very useful.In the last inequality we used a tight upper bound following from the definition of

the dimension f =⌈log(

2l|S|ε

)⌉,

µ = 1− |T0(S)|2f

≤ 1− |S|k2f≤ 1− ε

2k2l.

The assumption of Theorem 5.14 placed on the set S

|S| ≥ cε2bb

and the choice of the dimension f give that

f =

⌈log

(2l|S|ε

)⌉≥ l + log cε + b+ log b− log ε.

The bound on the investigated probability is obtained from Corollary 5.8 and byputting together all the above inequalities:

Pr

(T1(T0(S)) 6= Zb

2 ∧ |T0(S)| > |S|k


µ)

≤(

1− ε

2k2l

)l+log cε+b+log b−log ε−b−log b+log log

„1

1− ε2k2l

«

≤(

1− ε

2k2l

)l+log cε−log ε+log log

„1

1− ε2k2l

«.

Put µ′ = 1− ε2k2l

. These ideas completed an important part of the proof and aresummarised in the following claim.

Claim 5.21.

Pr

(T1(T0(S)) 6= Zb

2 ∧ |T0(S)| > |S|k

)≤ µ′

l+log cε−log ε+log log“

1µ′

”.

52

The probability estimate of Pr(T (S) 6= Zb

2

)follows from Claim 5.20 and Claim

5.21.

Pr(T (S) 6= Zb

2

)≤ Pr

(|T0(S)| ≤ |S|

k

)+ Pr

(T1(T0(S)) 6= Zb

2 ∧ |T0(S)| > |S|k

)≤ |S|

(k − 1)2f+ µl+log cε−log ε+log log( 1

µ)

≤ ε|S|(k − 1)|S|2l + µ′


1µ′

”

≤ ε

(k − 1)2l+ µ′


1µ′

”.

Our goal is to choose cε such that it is minimal possible and this probability willbe less than ε. Thus we can compute

ε

(k − 1)2l+ µ′

log cε+l−log ε+log log“

1µ′

”≤ ε

µ′log cεµ′

l−log ε+log log“

1µ′

”≤ ε− ε

(k − 1)2l(ε− ε

(k − 1)2l

)µ′

log ε−l−log log“

1µ′

”≥ µ′

log cε

log

((ε− ε

(k−1)2l

)µ′

log ε−l−log log“

1µ′

”)log µ′

≤ log cε.

Thus cε may be chosen minimal so that it satisfies Inequality 5.19:

cε ≥ 2

log

0@„ε− ε(k−1)2l

«µ′

log ε−l−log log( 1µ′ )

1Alog µ′ .

In order to obtain a better value of cε we parametrised the proof of Theorem5.15. The value is computed for every choice of parameters k and l directly from theabove expression. The next corollary states the best value of cε we have managed toachieve.

Corollary 5.22. For ε = 0.98 the value of the constant cε may be chosen as 67.77.

Proof. Choose k = 3.28 and l = 0.5, then certainly f ≥ b. Since there are no otherassumptions this choice is valid.

53

5.4 Improving the Results

When we use the estimate of Statement 5.18 in Theorem 5.38, of course, we getsmaller values of the multiplicative constant in the estimate of E (lpsl). However,the constant is on the border of being practical. More precisely said, usage of thebound becomes suitable only for hashing enormous sets. Although such sets are notuncommon, they are rarely present in everyday use. The worst case guarantee playseven more important role for such large sets. However the question is, if we areable to further improve it. And the answer is yes, if we bring some new ideas. Inaddition, the chain limit rule based on the proposed improvement starts beating thelinear one, lpsl ≤ n, for n in the order of hundreds.

We bring novel ideas and again improve the value of cε in the next statement.

Statement 5.23. For every ε ∈ (0, 1) Theorem 5.15 holds if the constant cε forsome parameters k, l ≥ 1 satisfies the inequalities:

cε ≥ 2

log

0@„ε− 2l

2lk−2

«·µ′l−log log( 1

µ′ )1A

log µ′ where µ′ = 1− 1

k(5.24)

andlog b+ log cε − l ≥ 0.

Proof. The proof we present is fully parametrised like that of Statement 5.18. Wecan choose values of its arguments to optimise the value of the constant cε. To brieflypresent the sketch of the proof recall that we need a factor space Zf

2 , f ≥ b. Thenwe decompose a uniformly chosen random linear map T into two mappings T0 anda surjective T1 such that T = T1 T0 going through the vector space Zf

2 .Instead of making the factor space Zf

2 larger than the set S, we bound its size

between |S|2l

and 2|S|2l

where l ≥ 1 is a parameter. In order to exist a surjective

linear transformation from Zf2 onto Zb

2 it must be |Zf2 | ≥ |Zb

2|. By the inequalitylog b+ log cε − l ≥ 0 and the requirement |S| ≥ cεb2

b it will be satisfied.

For every value |S|2l

, l ∈ R, l ≥ 1, there is an uniquely determined number f ∈ Nsuch that |S|

2l≤ 2f ≤ 2|S|

2l. More formally f =

⌈log(|S|2l

)⌉. Then

f = dlog |S|e − l ≥ dlog cε + log be − l + b ≥ b.

The second novel idea is not to make the size of the set T0(S) relative to thatof the set S. We refer to the size of the factor space Zf

2 instead. We introduce a

new parameter k ∈ R, k ≥ 1. We show, if |T0(S)| ≤ 2f

k, then there are at least

|S|2

(k|S|2f− 1)

collisions caused by T0. If |T0(S)| ≤ |S|k′

for some k′ ≥ 1 then, by

Lemma 5.17, there are at least |S|(k′−1)

2collisions for S and T0. Hence if |S|

k′= 2f

k,

then k′ = k|S|2f

and thus if |T0(S)| ≤ Sk′

= 2f

k, then there exist at least |S|

2(k|S|

2f− 1)

collisions for S and T0.

54

As in the previous proof we estimate the two probabilities obtained by the Lawof Total Probability. Recall that the random variable dS denotes the number of colli-

sions and E (dS) =

(|S|2

)2−f . The bound on the first probability, Pr

(|T0(S)| ≤ 2f

k

),

is found again using the Markov’s inequality,

Pr

(|T0(S)| ≤ 2f

k

)≤ Pr

(dS ≥ |S|

2

(k|S|2f− 1

))≤ |S|(|S| − 1)

2 · 2f |S|2

(k|S|2f− 1)

≤ |S|k|S| − 2f

≤ |S|k|S| − 2|S|

2l

=2l

2lk − 2.

As in the previous statement we state the fact in a claim.

Claim 5.25.

Pr

(|T0(S)| ≤ 2f

k

)≤ 2l

2lk − 2.

The remaining case occurs when |T0(S)| > 2f

k. Define µ as the inverse density of

the set T0(S) in the factor space Zf2 . Then the assumption |T0(S)| > 2f

kimplies

µ = 1− |T0(S)|2f

< 1− 1

k< 1.

From the choice of the dimension f and the assumption of Theorem 5.14

|S| ≥ cεb2b

it follows that

f =

⌈log

( |S|2l

)⌉≥ log cε + log b+ b− l.

From Theorem 5.14 used for the set T0(S), the source space Zf2 , the inverse

density µ, the target space Zb2 and the transformation T1 we have

Pr

(T (S) 6= Zb

2 ∧ |T0(S)| > 2f

k

)≤ µf−b−log b+log log 1

µ

≤ µlog cε+log b+b−l−b−log b+log log 1µ

≤(

1− 1

k

)log cε−l+log log

„1

1− 1k

«.

This result is worth remembering.

55

Claim 5.26.

Pr

(T (S) 6= Zb

2 ∧ |T0(S)| > 2f

k

)≤(

1− 1

k

)log cε−l+log log

„1

1− 1k

«.

Recall, we use the Law of Total Probability, Theorem B.13, to split the expressionPr(T (S) 6= Zb

2

). To prove the statement, the probability of event T (S) 6= Zb

2 mustbe less than ε. Hence it suffices

Pr(T (S) 6= Zb

2

)= Pr

(T (S) 6= Zb

2 ∧ |T0(S)| ≤ 2f

k

)+ Pr

(T (S) 6= Zb

2 ∧ |T0(S)| > 2f

k

)≤ Pr

(|T0(S)| ≤ 2f

k

)+ Pr

(T (S) 6= Zb

2 ∧ |T0(S)| > 2f

k

)≤ ε.

Put µ′ = 1− 1k. From Claims 5.25 and 5.26 we obtain

2l

2lk − 2+ µ′

log cε+log log“

1µ′

”−l ≤ ε(

ε− 2l

2lk − 2

)· µ′l−log log

“1µ′

”≥ µ′

log cε

log

((ε− 2l

2lk−2

)· µ′l−log log

“1µ′

”)log µ′

≤ log cε.

Thus the value of the constant cε may be chosen as the minimal one satisfyingfor some k, l ≥ 1 the inequality:

cε ≥ 2

log

0@„ε− 2l

2lk−2

«·µ′l−log log( 1

µ′ )1A

log µ′ .

The following corollary shows an interesting value of cε achieved by the previousstatement.

Corollary 5.27. For ε = 0.8967 the value of the constant cε may be chosen as 17.31.

Proof. Use the previous statement for k = 2.06 and l = 2. Clearly,

log cε + log b− l ≥ 0.

56

5.5 Probability Distribution of the Variable lpsl

Theorems 5.14 and 5.15 and Corollaries 5.8, 5.9 and 5.16 give us enough power toachieve our first goal – asymptotic restriction of the expected length of the chainlength. But first, we find the probability distribution of the variable lpsl for varioussituations depending on the size of the stored set.

In the original article [3] authors suppose hashing even super-linear amount ofm logm elements into a table consisting of m slots. Common models use load factorslower than one and we suppose hashing of αm elements. When storing sets of sizeequal to αm for a bounded load factor α, the expected length can not grow, ifcompared to hashing m logm elements. Every stored set can be further extendedinto m logm elements and the estimate must still remain valid. Unfortunately, thisargument does not bring a reasonable result.

Therefore we further generalise and refine our statements. We discover an impor-tant dependence of E (lpsl) on the table’s load factor. We later state that E (lpsl)is proportional to the load factor α.

First let us summarise what is shown in the following pages. We use two basicideas – factorisation and probability estimates of two highly correlated events. Thetwo events E1 and E2 allow us to estimate the probability of the existence of a chainwith length at least l, l ∈ N.

Then we show Remark 5.36 that comes from the original work [3]. It is immedi-ately improved and the results are later applied for hashing. The theorems we stateare not expressed in the notation of hashing. Rather we say them in general termsof linear algebra. Then we use the theorems in the model proposed by us.

The notation of the section is the following. The source space, represents laterthe universe, is the vector space Zu

2 . The target space, becomes a representation ofthe hash table, is denoted by Zb

2. As done many times before, we factor a randomuniformly chosen linear transformation T ∈ LT (Zu

2 ,Zb2) through the factor space Zf

2 .Model 5.4 shows the existence of two linear functions T0 : Zu

2 → Zf2 and T1 : Zf

2 → Zb2.

In addition the last one is surjective and they are chosen uniformly among LT (Zu2 ,Z

f2)

and LTS(Zf2 ,Zb

2). Definitions 5.28 and 5.29, Remarks 5.31, 5.33 and Remark 5.36stating the basic probability estimate and Theorem 5.38 giving the basic estimate ofE (lpsl) are from [24], too.

Definition 5.28 (Event E1(S, T, l)). Let l ∈ N, T : Zu2 → Zb

2 be a linear transfor-mation and S ⊂ Zu

2 . Event E1(S, T, l) occurs if there is a subset of S containing atleast l elements mapped by the function T on a singleton,

E1(S, T, l) ≡ ∃~y ∈ Zb2 : |T−1(~y) ∩ S| > l.

The event E1 corresponds to the existence of a chain of length at least l. Thesecond event is defined to simplify the estimate of probability of the event E1. Itmay seem quite unnatural but it fits the scheme of Theorem 5.14 as shown later.

Definition 5.29 (Event E2(S, T0, T1)). Let T0 : Zu2 → Zf

2 , T1 : Zf2 → Zb

2 be lineartransformations with T1 being surjective and S ⊂ Zu

2 . Event E2(S, T0, T1) occurs if

E2(S, T0, T1) ≡ ∃~y ∈ Zb2 : T−1

1 (~y) ⊆ T0(S).

57

Zb2

Zf2

T0(S)

~yT−1

1 (~y)

T1

Figure 5.1: Occurrence of the event E2.

Remember that when it is clear what we mean by S, T0, T1 and l we omit theparametrisation of the events and just use E1 or E2.

Now we will point an equivalent definition of the event E2 showing why it maybe used with Corollary 5.16.

Remark 5.30. Let T0 : Zu2 → Zf

2 , T1 : Zf2 → Zb

2 be linear transformations withT1 being surjective and S ⊂ Zu

2 . Then the event E2(S, T0, T1) occurs if and only ifT1(Zf

2 − T0(S)) 6= Zb2. Equivalently

(E2(S, T0, T1) ≡ ∃~y : T−11 (~y) ⊆ T0(S))⇔ (T1(Zf

2 − T0(S)) 6= Zb2).

Proof. To prove the direction from the left to the right assume that the event E2

occurs. This happens if there is a vector ~y ∈ Zb2 such that T−1

1 (~y) ⊆ T0(S). Hencetransformation T1 can not map the set Zf

2 − T0(S) onto the vector space Zb2 since

~y /∈ T1(Zf2 − T−1(~y)) ⊇ T1(Zf

2 − T0(S)) or equivalently Zb2 6= T1(Zf

2 − T0(S)).Now we show the reverse direction. If Zb

2 6= T1(Zf2−T0(S)), then there is a vector

~y ∈ Zb2 such that ~y /∈ T1(Zf

2−T0(S)). Since T1 is surjective we have that T1(Zf2) = Zb

2.Since there is no point in Zf

2 − T0(S) mapped onto ~y it follows that the preimage of~y must be contained in T0(S). Thus T−1

1 (~y) ⊆ T0(S).

Now, in the next remark, we use the previous equivalence to estimate the prob-ability of the event E2.


2 , T1 : Zf2 → Zb

2 be linear transformations with T1

being surjective and S ⊂ U . Set d =|Zf2 |b2b

. If |S| ≤ b2b and d > 1, then

Pr (E2(S, T0, T1))) ≤ d− log d−log log d.

Proof. We apply Theorem 5.14 for the transformation T1, the set Zf2 − T0(S), the

target space Zb2 and the inverse density µ = 1− |Zf2−T0(S)|

|Zf2 |and we obtain

Pr(T1(Zf

2 − T0(S)) 6= Zb2

)≤ µf−b−log b+log log 1

µ .

58

We can estimate µ using only the value d as

µ = 1− |Zf2 − T0(S)||Zf

2 |= 1− |Z

f2 | − |T0(S)||Zf

2 |=|T0(S)||Zf

2 |≤ |S||Zf

2 |≤ b2b

|Zf2 |

=1

d< 1.

To obtain the bound claimed by the remark rewrite the logarithm of the value d,

log d = log|Zf

2 |b2b

= log |Zf2 | − log(b2b) = f − b− log b.

In the following computation we use Corollary 5.8 to remove the inverse densityµ. Since E2 ≡ T1(Zf

2 − T0(S)) 6= Zb2 and µ ≤ 1

dwe have the needed bound,

Pr (E2) ≤ µf−b−log b+log log( 1µ)

= µlog d+log log( 1µ)

≤(

1

d

)log d+log log d

= d− log d−log log d.

To meet the assumptions of Theorem 5.14, we must have that ∅ 6= Zf2 − T0(S)

and Zf2 − T0(S) 6= Zf

2 . Since the S 6= ∅, it certainly holds that Zf2 − T0(S) 6= Zf

2 .

Because d =|Zf2 ||S| > 1, it follows that |Zf

2 | > |S| ≥ |T0(S)| and the set Zf2 − T0(S)

then can not be empty.

We show similar remarks fitting the situation when the size of the set S is pro-portional to that of the target space Zb

2. This corresponds to the situation whenhashing only αm elements.

Statement 5.32. Let T0 : Zu2 → Zf

2 , T1 : Zf2 → Zb

2 be linear transformations withT1 being surjective and S ⊂ U . Let α ∈ R, α > 0 and assume that |S| = α2b.

1. If d = 2f

αb2band d > 1, then Pr (E2(S, T0, T1))) ≤ d− logα−log d−log log d.

2. If d = 2f

α2band d > 1, then Pr (E2(S, T0, T1))) ≤ dlog b−logα−log d−log log d.

Proof. We slightly changed the choice of the variable d and naturally moved to adifferent size |S|. The proof itself remains almost the same.

First, we show µ ≤ 1d< 1 in either case. Observe that

µ = 1− |Zf2 − T0(S)||Zf

2 |=|T0(S)||Zf

2 |≤ |S||Zf

2 |=α2b

2f.

In the first case, d = 2f

αb2b, we have

µ ≤ α2b

2f≤ αb2b

2f=

1

d< 1.

59

In the second case, d = 2f

α2b:

µ ≤ α2b

2f=

1

d< 1.

The wanted estimate thus holds.Secondly, we find the log d in either case. In the first one, d = 2f

αb2b:

log d = f − logα− b− log b.

In the second one, d = 2f

α2b, we have

log d = f − logα− b.From µ < 1 and |S| = α2b > 0 we have the assumption ∅ 6= Zf

2 − T0(S) 6= Zf2

of Theorem 5.14 met. Now we use Theorem 5.14 for the transformation T1, the setZf

2 − T0(S) and the inverse density µ and obtain

Pr (E2) ≤ µf−b−log b−log log( 1µ).

To prove the statement, we continue similarly as in the proof of Remark 5.31.The next estimates follow from Corollary 5.8 and the fact that µ ≤ 1

d. In the first

case we have that


= µlog d+logα+log log( 1µ)

≤(

1

d

)log d+logα+log log d

= d− logα−log d−log log d.

In the second case we have that


= µlog d+logα−log b+log log( 1µ)

≤(

1

d

)log d+logα−log b+log log d

= dlog b−logα−log d−log log d.

A similar remark for an estimate of the conditional probability of the event E2 | E1

now follows.


2 and T1 : Zf2 → Zb

2 be random uniformly chosenlinear transformations with T1 being surjective and T = T1 T0. Let cε be a constantfor which Theorem 5.15 is satisfied, S ⊂ Zu

2 , ε ∈ (0, 1) and l ∈ N, l ≥ cε(f − b)2f−b.Then

Pr (E2(S, T0, T1) | E1(S, T, l)) ≥ 1− ε.

60

T0 T1

Zu2 Zf

2

Zb2

~y

T

~0 ~0

UA = T−1(~y)

FA = T−11 (~y)

S

SA = S ∩ UA

~0

TA|UA

T1

Figure 5.2: Image depicting the situation in the proof.

Proof. According to Model 5.4 we have that the linear transformation T = T1 T0

is chosen uniformly.Assume that the event E1 occurs, then there is a vector ~y ∈ Zb

2 : |T−1(~y)∩S| > l.We fix the vector ~y and let UA = T−1(~y) and FA = T−1

1 (~y). So there is a maximalsubset SA ⊆ S, |SA| > l mapped to ~y. Notice that SA = UA ∩ S. According toLemma A.10 the sets UA and FA are affine vector subspaces and |FA| = 2f−b.

Since T0 is a random uniformly chosen linear mapping from Model 5.5 it followsthat its restriction T0|UA is also a random uniformly chosen affine linear transforma-tion.

Now we use Corollary 5.16 for the source space UA, the set SA ⊆ UA, the targetspace FA and mapping T0|UA and obtain that

Pr (T0|UA(SA) = FA | E1) ≥ 1− ε.

If FA = T−11 (~y) ⊆ T0(S), then the event E2 certainly occurs. Stated in the

language of probability:

Pr (FA ⊆ T0(S) | E1) ≤ Pr (E2 | E1) .

From FA = T0|UA(S) ⊆ T0(S) it follows that:

Pr (E2 | E1) ≥ Pr (FA ⊆ T0(S) | E1) ≥ Pr (FA = T0|UA(SA) | E1) ≥ 1− ε.

Thus the proof of Remark 5.33 finishes.

61

The next corollary puts Remark 5.33 and Lemma B.10 together.

Corollary 5.34. Let T0 : Zu2 → Zf

2 and T1 : Zf2 → Zb

2 be a random uniformly chosenlinear transformations with T1 being surjective. Let ε ∈ (0, 1), S ⊂ Zu

2 and l ∈ N,l ≥ cε(f − b)2f−b. Then

Pr (E1(S, T, l)) ≤ 1

1− εPr (E2(S, T0, T1)) .

Proof. The proof is a straightforward use of Remark 5.33 and Lemma B.10. Theyimply that

Pr (E1) ≤ Pr (E2)

Pr (E2 | E1)≤ 1

1− εPr (E2) .

Definition 4.3 of the variable lpsl comes from the area of hashing and uses thenotation of chains and their lengths. However we can refer to it now, too. Assumethat the universe is equal to the vector space Zu

2 and the hash table is represented bythe vector space Zb

2 and S ⊂ U . The randomness is brought by the uniform choiceof a linear transformation T : Zu

2 → Zb2 among LT (Zu

2 ,Zb2). Realise that the random

variable psl(~b) = |T−1(~b) ∩ S| for a vector ~b ∈ Zb2. By setting lpsl = max

~b∈Zb2psl(~b)

their meanings remain the same.

Lemma 5.35. If T : Zu2 → Zb

2 is a random uniformly chosen linear transformation,S ⊂ Zu

2 and l ∈ N, then E1(S, T, l)⇔ lpsl > l.

Proof. The event E1(S, T, l) denotes the existence of a vector ~y ∈ Zb2 such that

|T−1(~y) ∩ S| > l. Observe that such vector ~y exists if and only if the variablelpsl > l because

(∃~y ∈ Zb2 : |T−1(~y) ∩ S| > l)⇔ (∃~y ∈ Zb

2 : psl(~y) > l)⇔ (lpsl > l).

We are able to bound the probability density function of the random variablelpsl when referring to its above definition.

Remark 5.36. Let T : Zu2 → Zb

2 be a random uniformly chosen linear map, ε ∈ (0, 1)and S ⊂ Zu

2 , |S| ≤ b2b. Then for every r ≥ 4

Pr (lpsl > 4cεrb log b) ≤ 1

1− ε(

r

log r

)− log( rlog r )−log log( r

log r ).

Proof. In the proof we conveniently use Corollary 5.34, Remark 5.31 and Lemma5.35. We only have to choose their parameters. First we create a factor spaceZf

2 , its dimension is specified later. From Model 5.5 and the random uniform andindependent selection of two linear transformations T0 : Zu

2 → Zf2 and surjective

62

T1 : Zf2 → Zb

2 it follows that the linear mapping T : Zu2 → Zb

2 such that T = T1 T0

is chosen uniformly as well. Fix the mappings T0, T1 and T .Now set

f = bb+ log b+ log r − log log r + 1c ,

l = 4cεrb log b,

d =2f

b2b≥ b2b

b2b· r

log r=

r

log r≥ 2.

The choice of f implies that |Zf2 | ≥ Zb

2| because

f ≥ b+ log b+ log r − log log r ≥ b.

Hence a surjective function T1 exists and may be fixed. Notice that the choices meetall the assumptions, d > 1, of Remark 5.31. To verify the condition l ≥ cε(f − b)2f−bof Remark 5.36 we first show that

2f−b ≤ 2b+log b+log r−log log r+1−b =2rb

log r.

From the fact, f − b ≤ log(

2rblog r

)≤ log b log r, we have that the assumption holds

since

cε(f − b)2f−b ≤ cε2rb

log rlog

(2rb

log r

)≤ 2cεb

r

log rlog b log r

≤ 4cεrb log b

= l.

From Corollary 5.34, Remark 5.31 and Corollary 5.9 it follows that

Pr (E1) ≤ 1

1− εPr (E2)

≤ 1

1− εd− log d−log log d

≤ 1

1− ε(

r

log r


log r ).

According to Lemma 5.35 the event E1 occurs if and only if lpsl > l. The proofis completed by writing down the facts obtained so far

Pr (lpsl > l) = Pr (lpsl > 4cεrb log b)

= Pr (E1(S, T, 4cεrb log b))

≤ 1

1− ε(

r

log r


log r ).

63

We modify the previous remark for the set S having |S| = α2b.

Remark 5.37. Let T : Zu2 → Zb

2 be a random uniformly chosen linear mapping,ε ∈ (0, 1), r ≥ 4 and α ∈ (0.5, log r

2). If S is a subset of Zu

2 such that |S| = α2b, then

Pr (lpsl > 4cεαrb log b) ≤ 1

1− ε(

r

log r

)− logα−log( rlog r )−log log( r

log r )and

Pr (lpsl > 2αcεr) ≤ 1

1− ε(

r

log r

)log b−logα−log( rlog r )−log log( r

log r ).

Proof. We do not repeat the whole proof of Remark 5.36 since the approach remainsthe same. Like in the previous proof, fix the linear mappings T0, surjective T1 withT = T1 T0 and let Zf

2 be the factor space. We just show the choices that prove theremark. The choices for the first claim are indexed with 1 and in the second casewe use the index 2. When we refer to a chosen variable without an index, then thestatement, in which it appears, must be valid for either choice. Now perform thechoices by setting

f1 = bb+ log b+ log r − log log r + logα + 1c ,

f2 = bb+ log r − log log r + logα + 1c ,

d1 =2f1

αb2b,

d2 =2f2

α2b,

l1 = 4cεαrb log b,

l2 = 2αcεr.

So that T1 exists, we have to verify that |Zf2 | ≥ |Zb

2|. It is enough to have f ≥ b.The inequality f ≥ b follows from the fact log r − log log r + logα ≥ 0 and from ourchoice of f since

f1 ≥ b+ log b+ log r − log log r + logα ≥ b,

f2 ≥ b+ log r − log log r + logα ≥ b.

Secondly, we show that the assumptions of Statement 5.32 are met for bothchoices of d. Precisely we need d > 1 and this is satisfied because

d1 =2f1

αb2b≥ r

log r≥ 2,

d2 =2f2

α2b≥ r

log r≥ 2.

Naturally, we want to meet the condition placed on the variable l of Corollary5.34. Recall the assumption α < log r

2, which is equivalent to log 2α

log r< 0, that is

used in either case. Let us discuss the first case:

f1 − b ≤ log b+ log r − log log r + logα + 1 ≤ log b+ log r ≤ log b log r.

64

Thus for the value of the variable l1 we have that

cε(f1 − b)2f1−b < cε

(2αbr

log r

)(log b log r)

≤ 4cεαrb log b

= l1.

The validity in the second case follows from

cε(f2 − b)2f2−b ≤ cε

(2αr

log r

)log

(2αr

log r

)≤ 2cεα

r

log rlog r

= 2cεαr = l2.

Thus we are able to refer to Corollary 5.34. Now use Lemma 5.35, Corollary 5.34,Remark 5.31 and Corollary 5.9 in either case. In the first one we have

Pr (lpsl > 4cεαrb log b) = Pr (lpsl > l1)

= Pr (E1(S, T, l1))

≤ 1

1− εPr (E2)

≤ 1

1− εd− logα−log d−log log d

=1

1− ε(

r

log r


log r ).

And in the second case we have

Pr (lpsl > 2αcεr) = Pr (lpsl > l2)

= Pr (E1(S, T, l2))

≤ 1

1− εPr (E2)

≤ 1

1− εdlog b−logα−log d−log log d

≤ 1

1− ε(

r

log r


log r ).

5.6 The Expected Value of the Variable lpsl

In this section we estimate the expected value of the variable lpsl, E (lpsl). We ex-ploit bounds from Section 5.5 which give its cumulative probability density function.We are interested in this expected value because it corresponds to the length of thelongest chain when using the system of linear transformations as a universal class.

65


2 be a random uniformly chosen linear transfor-mation and ε ∈ (0, 1). If S is a subset of the vector space Zu

2 such that |S| ≤ b2b,then

E (lpsl) ≤ 4cε(4 + Iε)b log b.

The value Iε is defined as:

Iε =

∞∫4

1

1− ε(

r

log r


log r )dr. (5.39)

Proof. First set Kε = 4cεb log b. In order to prove the theorem we use the probabilitydistribution of the variable lpsl from Remark 5.36. For every r ≥ 4 it states that

Pr (lpsl ≥ rKε) ≤ 1

1− ε(

1

log r


log r ).

Lemma B.24 implies

E (lpsl) =

∞∫0

Pr (lpsl > l) dl

≤ 4Kε +

∞∫4Kε

Pr (lpsl > l) dl

= 4Kε +Kε

∞∫4

Pr (lpsl > rKε) dr

≤ 4Kε +Kε

∞∫4

1

1− ε(

r

log r


log r )dr

= Kε(4 + Iε) ∈ O(Kε) = O(b log b).

The fact that the integral Iε is convergent for every ε ∈ (0, 1) is shown later inLemma 5.45.

Multiplicative constant 4cε(4 + Iε) plays an important role for a practical useof the result. For example when choosing ε equal to 1

2the value of the constant

cε equals 417 when using the original estimate. Our next goal is to show a betterestimate when assuming that α ≤ 1 and using our results.


2 be a random uniformly chosen linear transforma-tion, ε ∈ (0, 1) and α ∈ [0.5, 1]. If S is a subset of the vector space Zu

2 such that|S| = α2b, then

E (lpsl) ≤ 4αcε(4 + Iε)b log b.

The value Iε is defined in Equality 5.39.

66

Proof. Set Kε = 4cεb log b. When estimating Pr (lpsl > rαKε) we use Remark 5.37.Its assumptions are satisfied since we use it for r ≥ 4 and α ≤ 1 ≤ log r

2. It states

that

Pr (lpsl > rαKε) ≤ 1

1− ε(

r

log r


log r ).

Since α ≤ 1, then logα ≤ 0 and thus we conclude that

Pr (lpsl > rαKε) ≤ 1

1− ε(

r

log r


log r ).

Now we compute the expected value using Lemma B.24 as

E (lpsl) =

∞∫0

Pr (lpsl > l) dl

≤ 4αKε +

∞∫4αKε

Pr (lpsl > l) dl

= 4αKε + αKε

∞∫4

Pr (lpsl > rαKε) dr

≤ 4αKε + αKε

∞∫4

1

1− ε(

r

log r


log r )dr

= αKε(4 + Iε) ∈ O(αKε) = O(αb log b).

5.7 The Achieved Bound

The improved estimate of cε combined with the last claim of Remark 5.37 are usedtogether to show a tighter bound on E (lpsl).

Theorem 5.41. Let b ∈ N, b ≥ 4, T : Zu2 → Zb

2 be a random uniformly chosenlinear transformation, ε ∈ (0, 1) and α ∈ [0.5, 1]. If S is a subset of the vector spaceZu

2 such that |S| = α2b, then

E (lpsl) ≤ 4cεα

1− εb log b+ 2cεα

(Jα,b − 4

1− ε + 4

).

The value Jα,b is defined as

Jα,b =

∞∫2b log b

(r

log r


log r )dr. (5.42)

67

Proof. From Remark 5.37, Lemma B.24 and Lemma 5.46 it follows that

E (lpsl) =

∞∫0

Pr (lpsl > l) dl

≤ 8cεα +

∞∫8cεα

Pr (lpsl > l) dl

= 8cεα + 2cεα

∞∫4

Pr (lpsl > 2cεαr) dr

= 2cεα

4 +1

1− ε

∞∫4

min

(1,

(r

log r


log r ))dr

≤ 2cεα

(4 +

1

1− ε (2b log b− 4 + Jα,b)

).

Let the value J ′α,b denote the integral,

J ′α,b =

∞∫4

min

(1,

(r

log r


log r ))dr. (5.43)

Later, in Lemma 5.46 we estimate the integral J ′α,b by 2b log b− 4 + Jα,b, whereThe whole bound on the expected value, E (lpsl), then looks like:

E (lpsl) ≤ 4cεα

1− εb log b+ 2cεα

(Jα,b − 4

1− ε + 4

).

In advance, let us note that from Lemma 5.46 it follows that Jα,b ≤ 3.36. Thebound thus exists for every α ∈ [0.5, 1].

Corollary 5.44. Let b ∈ N, b ≥ 4, T : Zu2 → Zb

2 be a random uniformly chosenlinear transformation and α ∈ [0.5, 1]. If S is a subset of the vector space Zu

2 suchthat |S| = α2b, then

E (lpsl) ≤ 538αb log b+ 44.

Proof. The best estimate on the expected value E (lpsl) is achieved by the parametri-sation technique using the estimate of cε from Statement 5.23. For the choice ofε = 0.8 and by settings parameters k = 2.26, l = 2 of Statement 5.23 we get therequired bound.

5.8 Minimising the Integrals

This integral Iε is a part of the multiplicative constant 4cε(4 + Iε). The next lemmastates a way how to estimate it and compute the value of the multiplicative constantso that it is minimal.

68

In the original proof of Theorem 5.38 in [3] only the special case I 12

is considered.

We moved to Iε because choosing other values for ε ∈ (0, 1) may bring an improve-ment for the value of the multiplicative constant. Indeed, if we select different valueof ε and use better estimates for the constant cε, then we really obtain a smallervalue of the multiplicative constant and a tighter bound on E (lpsl).

Lemma 5.45. The integral Iε from Equality 5.39 converges for every ε ∈ (0, 1) and

Iε ≤ 4.8

1− ε .

Proof. We recall that the integral Iε equals

Iε =1

1− ε

∞∫4

(r

log r


log r ).

Evaluation of the integral Iε is split into two parts. We compare the integrandof Iε to a function f : R→ R which has a convergent improper integral

∫∞4f(x)dx.

The function f chosen here is x−1.5.The function f majors the integrand only for r ≥ 16. Therefore we bound the

value of the integral Iε in the interval [4, 16] by its upper Riemann sum.For r = 16, the function f equals f(r) = r−1.5 = 1

64. The value of the integrand

equals (16

log 16

)− log( 16log 16)−log log( 16

log 16)= 4−2−1 =

1

64.

By combining our estimates we obtain that

Iε ≤ 1

1− ε

15∑r=4

(r

log r


log r )+

∞∫16

1

r1.5dr

≤ 1

1− ε(

4.3 +1

2

)=

4.8

1− ε .

The estimate of the Riemann sum can be computed numerically.

Lemma 5.46. Let b ∈ N, b ≥ 4, α ∈ [0.5, 1], J ′α,b be the integral defined in Equality5.43 and Jα,b be the integral defined in Equality 5.42. Then

J ′α,b ≤ 2b log b− 4 + Jα,b and

Jα,b ≤ 3.36.

Proof. We estimate the integral

J ′α,b =

∞∫4

min

(1,

(r

log r


log r ))dr

69

in a similar way to the previous one.However, the situation is slightly complicated. First we must realise that if the

exponent is negative, the integrand is certainly less than one. We show that ifr > 2b log b, then the exponent is negative.

First we show that r > 2b log b implies log(

rlog r

)≥ log b since

r

log r≥ 2b log b

1 + log b+ log log b

=2b

1 + 1log b

+ log log blog b

≥ 2b

2= b.

Since we assume b ≥ 4, we have that r > 2b log b ≥ 16. So if r > 2b log b, thenfor the remaining part of the exponent we have that

− logα− log log

(r

log r

)< − log 0.5− log log

16

4= 0.

Indeed, if r > 2b log b, then the exponent is negative.Now the integral J ′α,b is split into two parts according to the value of the variable r.

If r ∈ [4, 2b log b], then we estimate the integrand by one. In the interval [2b log b,∞)we use the value

Jα,b =

∞∫2b log b

(r

log r


log r )dr.

SoJ ′α,b ≤ 2b log b− 4 + Jα,b.

The integral Jα,b is determined in the similar way as the value of the integral Iεin Lemma 5.45. The integrand is majored by the function x−1.3 for x ≥ 2048. In theinterval [16, 2048] we estimate the integral by its upper Riemann sum. We partitionthe interval uniformly with the norm equal to 0.1. For simplicity put g(r) = r

log r

and estimate the integral Jα,b as

Jα,b =

∫ ∞2b log b

g(r)log b−logα−log g(r)−log log g(r)dr

≤∞∫

16

g(r)1−log log g(r)dr

≤∑

r∈16,16.1,...,2048

g(r)1−log log g(r)

10+

∞∫2048

r−1.3dr

≤ 3.01 +2048−0.3

0.3≤ 3.36.

70

Chapter 6

The Model of Universal Hashing

The hashing scheme we propose later in this chapter is a solution to the set represen-tation problem. Solution of this problems usually provide some basic operations suchas Insert, Delete and Find. These operations allow querying if an element is storedwithin the represented set, accessing the stored elements and inserting an element.Some schemes, e.g. double hashing, do not implement the element removal at all orthe efficient implementation is not known.

The must important are the operations’ running times. Various solutions tomany algorithmic problems prefer different operations when representing sets. Forinstance some applications query the stored data rarely, e.g. log files. On the otherhand, other applications of the set representation problem may store the data thatis almost never changed – find operation is preferred. Therefore the running timesof only selected operations are considered crucial.

In the case of a simple array we have O(1) time for the find procedure providedthat we know an element’s index. But insertion or deletion may take O(n) time.Better bounds for dynamic arrays can be obtained using the amortised complexity.Another example are balanced trees, they have running times typically bounded bythe logarithmic time. As already mentioned, the right answer which data structureshould be used lies in the estimated structure of the operations. The right choicemay be an asymptotic improvement. Anyway, this does not change the fact thatshort running times are appreciated.

In this chapter we analyse the running times of the universal hashing. We startby mentioning the known facts. Then, we analyse the universal hashing using thesystem of linear transformations over vector spaces. Finally, we propose a modelthat guarantees the worst case complexity of the find operation.

6.1 Time Complexity of Universal Hashing

In this section we assume that the system of hash functions, which we use, is atleast c-universal. The running time of the find operation is certainly proportional tothe length of the chain of an element’s bucket. The obvious worst case time is O(n)where n is the number of elements hashed. The universal hashing gives a far better

71

expected estimate, O(1).Recall the definitions and notation from Chapter 2. The value n denotes the size

of the represented set and m is the size of the table. The load factor of a hash tableis denoted by α = n

m.

Theorem 6.1. Assume that we store a set S in a hash table using a c-universal classof functions H. Let the table’s load factor α be upper bounded. Then the expectedlength of a chain is lower or equals cα.

Proof. We find the expected length of a chain containing arbitrary element x ∈ U .The expectation is taken over the uniform choice of a function from the universalfamily of functions. From the definition of the expected value we have that

E (psl) =

∑h∈H psl(h(x), h)

|H|=

∑h∈H

∑y∈S I(h(x) = h(y))

|H|=

∑y∈S∑

h∈H I(h(x) = h(y))

|H|=∑y∈S

Pr (h(x) = h(y))

≤ cn

m= cα.

Corollary 6.2. If we hash using a c-universal class and α denotes the table’s loadfactor, then the expected time of Find operation is 1 + cα.

Proof. The running time of the find operation is proportional to the length of thechain in which the searched element belongs. In the worst case Find operationiterates the whole chain. We also add time for retrieving the element’s hash valueand for checking if the chain is not empty. These operations are usually performedin a constant time. From Theorem 6.1 it follows that the expected length of a chainis cα. Since we have no assumptions on the distribution of the input, the expectedrunning time is then bounded by 1 + cα.

Corollary 6.3. The expected time of Find operation in every model of universalhashing is O(1) when the load factor is bounded.

Proof. Follows from Corollary 6.2.

6.2 Consequences of Trimming Long Chains

The model of hashing we propose guarantees the worst case bound on the length ofthe longest chain. Hence it bounds the running times of the operations. Knowledge

72

E (lpsl) enables us to state a bound on the length of a chain. If a chain, whoselength is greater than our bound, is found, then we choose another function in ourc-universal system and we rehash the represented set using the new function. Thisbound corresponds to the probability of the existence of a long chain. In fact, we setthe bound according to this probability. Now let us examine consequences of suchlimits on models of universal hashing.

Following computations support and motivate our later ideas. By the Markov’sinequality, we have that for every k > 1

Pr (lpsl > kE (lpsl)) ≤ 1

k.

This fact means that less than half of all the universal functions create longest chainslonger than 2E (lpsl) for arbitrary stored set. For instance, in case of the system oflinear transformations we may choose the limit as 2E (lpsl) ≤ 1 076 logm log logm+88. This bound follows from Corollary 5.44.

The expected length of the longest chain gives us a generic hint when the tableshould be rehashed if the worst case time for the find operation has to be guaranteed.The lower the value E (lpsl), or the tighter its estimate is, the better worst case limitis achieved.

Mentioned bound on the length of a chain, which is computed using the Markov’sinequality and the expected value, can be further improved. If the probability densityfunction of the random variable lpsl is known, then the density function may be useddirectly. Such limit l is associated with a probability p ∈ (0, 1) that can be computedfrom the density function as Pr (lpsl > l) ≤ p. And we can also go the other way,for the probability p we may find a minimal limit l with Pr (lpsl > l) ≤ p.

In addition, for the system of linear transformations we already have the proba-bility density function and we use it in Section 6.3, indeed. To sum up, the approachwith the expected length and the Markov’s inequality is more general but achievesgreater limits. The approach with the probability density function gives better re-sults. It is less general because if we know the probability density function of lpsl,then we are able to find E (lpsl).

Definition 6.4 (Chain length limit function, long chain, p-trimmed-system, trim-ming rate). Let H be a universal system of functions that map a universe U to a hashtable B. Let m be the size of the hash table, S ⊂ U be the stored set with n = |S|and α = n

mbe the table’s load factor.

Then function l : N × R+0 → N of variables m and α, l(m,α), is a chain length

limit function.We say that function h ∈ H creates a long chain when hashing the set S if there

is a chain of length strictly greater than the limit value l(m,α).Moreover let p ∈ (0, 1) be probability such that Pr (lpsl > l(m,α)) ≤ p, then the

system

HSp = h ∈ H | h does not create a long chain when hashing the set S

is called a p-trimmed system. The probability p is called the trimming rate.

73

The probability bound p ∈ (0, 1) of the existence of a long chain has importantconsequences for the model that limits its chains.

• At most p|H| of all the functions in the original universal system H create longchains – longer than the prescribed limit l(m,α).

• The probability that the table needs to be rehashed, equivalently probability ofselecting an inconvenient function, is lower than p provided the uniform choiceof a hash function.

• During rehashing caused by an occurrence of a long chain, the probability offinding a suitable function is at least 1− p when assuming the uniform choiceof a hash function.

Lemma 6.5. If HSp is a p-trimmed system, then |HS

p | ≥ (1− p)|H|.Proof. Simply use the definition of HS

p and that of the trimming rate p:

|HSp | = |h ∈ H | h does not create a long chain|

= Pr (lpsl ≤ l(m,α)) |H|= (1−Pr (lpsl > l(m,α))) |H|≥ (1− p)|H|.

Regarding that every function is chosen uniformly and the unsuitable ones arediscarded, we still perform the uniform selection of a hash function. The choice isrestricted to the functions that do not create long chains; to the class HS

p . Note thatthe restriction to the functions of the original universal system H comes from aninformation about the stored set.

Previous remarks are quite interesting. So now, we ask, if it is possible to usethe family HS

p as a universal one.

Theorem 6.6. Let H be a c-universal system of hash functions, U be a universeand B be a hash table. Let p ∈ (0, 1) be the trimming rate and set m = |B|. Thenfor every S ⊂ U the system of functions HS

p is c1−p-universal. Equivalently:

Pr(h(x) = h(y) for h ∈ HS

p

) ≤ 1

1− pPr (h(x) = h(y) for h ∈ H) .

Proof. From Lemma 6.5 and from the assumption of c-universality of the original

74

system it follows that

Pr(h(x) = h(y) for h ∈ HS

p

)=|h ∈ H | h(x) = h(y) ∧ h does not create long chains|

|h ∈ H | h does not create long chains|≤ |h ∈ H | h(x) = h(y) ∧ h does not create long chains|

(1− p)|H|≤ |h ∈ H | h(x) = h(y)|

(1− p)|H|=

1

1− pPr (h(x) = h(y) for h ∈ H)

≤ c

(1− p)m .

Hence the system HSp is c

1−p -universal.

Similar statements hold for the strongly universal systems. The probability ofa collision in the system HS

p is always 11−p times higher than the probability of a

collision in the original system H.Next statement summarises results for the systems of linear transformations.

Corollary 6.7. For every trimming rate 0 < p < 1 the p-trimmed system of lineartransformations, LT (U,B)Sp , is 1

1−p-universal.

Proof. System of linear transformations is 1-universal as seen in Remark 3.13. Thefact then follows from Theorem 6.6.

Every chain length limit function l(m,α) comes with an associated trimmingrate, probability of the event lpsl > l(m,α). This probability not only determinesthe probability of failure for a single function but it also determines the expectednumber of trials to find a suitable function, as stated in Lemma 6.8.

Lemma 6.8. Let l be a chain length limit function and p ∈ (0, 1) be the trimmingrate such that Pr (lpsl > l(m,α)) ≤ p. Then the expected number of trials to find afunction, which does not create a long chain, is at most 1

1−p and the variance of thisnumber is bounded by p

(1−p)2 .

Proof. The probability of k independent unsuccessful searches of a function withbounded chains is at most pk and thus the distribution of the first successful attemptis bounded by the geometric distribution. For an estimate of the expected value andthe variance of the first successful attempt we need the following facts, that may be

75

found in [29],

∞∑i=0

pi =1

(1− p) ,

∞∑i=0

ipi =p

(1− p)2,

∞∑i=0

i2pi =p(1 + p)

(1− p)3.

The expected time of success is then given by:

∞∑i=0

(i+ 1)pi(1− p) = (1− p)∞∑i=0

pi + (1− p)∞∑i=0

ipi =1− p1− p +

(1− p)p(1− p)2

=1

(1− p) .

Now we estimate the variance:

∞∑i=0

(i+ 1− 1

(1− p))2

pi(1− p)

=∞∑i=0

(i+ 1)2 pi(1− p)−∞∑i=0

2(i+ 1)pi +∞∑i=0

pi

(1− p)

=1− pp

(∞∑i=0

i2pi

)− 2

(1− p)2+

1

(1− p)2

=1− pp

p(1 + p)

(1− p)3− 1

(1− p)2

=1 + p

(1− p)2− 1

(1− p)2

=p

(1− p)2.

So the schema to obtain a chain length limit function is as follows. For a pre-scribed probability of failure – trimming rate p we find a minimal chain length limitfunction l(m,α) such that Pr (lpsl > l(m,α)) ≤ p. The probability p is chosen ac-cording to the expected number of trials required to find a function which does notcreate a long chain. For example in our model, we choose two trails and thus p = 0.5.

The lower the trimming rate p is the greater values of l(m,α) are obtained inorder to meet Pr (lpsl > l(m,α)) ≤ p. In addition, from Corollary 6.7 it follows thatthe smaller the trimming rate p is, the better expected results are obtained. AndTheorem 6.1 implies that the expected chain length still remains constant providedthat the load factor α is bounded. So the small values of p give good expected results

76

and low number of trails required to obtain a function. On the other hand choosinga low trimming rate p gives only a poor worst case warranty.

The most interesting and the most important idea of trimming is that everyp-trimmed system is an adaptation of the original class H to the stored set S.

6.3 Chain Length Limit

From now we concentrate our effort to obtain the tightest bound on the chain lengthfor a given trimming rate p ∈ (0, 1). The corresponding bound is determined fromthe density function shown in Remark 5.37. This limit is used later in our model inSection 6.4.

In Theorem 6.9 the set Se is not directly the stored set. Instead, we use the setSe that comes from the analysis of our model shown in Theorems 6.18 and 6.20. Onthe other hand every set Se is a subset of the stored set S. The size of the set Se isbounded according to the requirements of our analysis.


2 be a random uniformly chosen linear transforma-tion, m = 2b, α′ ∈ 1, 1.5, Se ⊂ Zu

2 such that |Se| ≤ α′m and p ∈ (0, 1) be thetrimming rate.

Then there is a chain length limit function l(m) = a logm log logm+ b logm forsome a, b ∈ R depending only on α′ and p. For the chain length limit function l(m)we have Pr (lpsl > l(m)) ≤ p.

• When α′ = 1.5 and p = 0.5, set l(m) = 57.29 logm log logm.

• When α′ = 1 and p = 0.5, set l(m) = 47.63 logm log logm.

Proof. It is enough to prove the statement for |Se| = α′m. For a smaller set Se,with |Se| < α′m, it follows that Pr (lpsl > l(m)) ≤ p, since it may be extended to aset that has exactly α′m elements and confirms to the bound. Hence the statementholds for every Se such that |Se| ≤ α′m provided that it holds for every Se with|Se| = α′m.

Now assume that |Se| = α′m and set f(x) = xlog b−logα′−log x−log log x. From Corol-lary 5.9 it follows that the function f(x) is decreasing in the interval [2,∞).

First, we introduce a parameter ε ∈ (0, 1) which is used later in the proof to

minimise the value of the limit function. If r ≥ 4, α′ ∈(

0.5, rlog f

)and ε ∈ (0, 1),

then by Remark 5.37 we have

Pr (lpsl > 2α′cεr) ≤ 1

1− εf(

r

log r

).

We find the minimal value of r such that 11−εf

(r

log r

)≤ p. By setting the value of

the variable r we also obtain the chain limit confirming to the prescribed trimmingrate p.

77

Our next step is to define a lower bound d, d ≥ 2, of the expression rlog r

. Since

d ≤ rlog r

we have that f(d) ≥ f(

rlog r

)because the function f is decreasing in the

interval [2,∞). Whenever we have a value of the variable d such that f(d) ≤ (1−ε)p,then f

(r

log r

)≤ (1 − ε)p, too. If we manage to find the minimal value of r from

d ≤ rlog r

, then we set the chain limit to 2α′cεr and the trimming rate p is achievedas well.

First, we show a way how to estimate the value of the variable r for a given d ≥ 2.

Claim 6.10. If d ≥ 2 and r = 2d log d, then d ≤ rlog r

and r ≥ 4.

Proof. Since we selected d as a lower bound for rlog r

we have to find minimal r fromthe inequality r

log r≥ d. Observe that for every d ≥ 2 we have that log d ≥ 1+log log d

and hencelog d+ log d

1 + log d+ log log d≥ 1.

Putting r = 2d log d satisfies the inequality since

r

log r=

2d log d

1 + log d+ log log d=

d(log d+ log d)

1 + log d+ log log d≥ d.

The value of r is thus r = 2d log d ≥ 4.

The probability estimate of the event lpsl > 2cεα′r requires that r ≥ 4. For the

choice of the value r from Claim 6.10 it follows that r ≥ 4 when d ≥ 2. Because wechoose d ≥ 2, we no longer pay attention to the assumption of Remark 5.37. Theremaining one α′ < log r

2may cause a problem. Its validity must be verified at the

end, immediately after we state the exact choice of r.To finish the proof set the lower bound d = jb for a positive constant j. Instead of

finding exact value of d, it is sufficient to find a minimal value of j. The simplificationis motivated by the fact that the order of the asymptotic growth of E (lpsl) is b log b.In Claim 6.11 we show that putting d = jb respects this asymptotic growth. So nowwe just need to find the multiplicative constant as small as possible.

Claim 6.11. Let j be a positive constant such that d = jb ≥ 2, then the chain limitrule we propose has the form

4cεα′jb(log b+ log j).

Proof. We choose the value of r from Claim 6.10 as

r = 2d log d = 2jb(log b+ log j).

Hence our chain limit can be finally rewritten as:

4cεα′r = 4cεα

′jb(log b+ log j).

78

Claim 6.12. To find the minimal value of d, d ≥ 2 satisfying f(d) ≤ (1 − ε)p usethe inequality

(jb)− logα′−log j−log log(jb) ≤ (1− ε)p. (6.13)

Proof. From remark Remark 5.37 we obtain the following inequality that allows usto find the minimal suitable value of d:

f(d) = dlog b−logα′−log d−log log d ≤ (1− ε)p.

We substitute jb into d to get the required result:

dlog b−logα′−log d−log log d = (jb)− logα′−log j−log log(jb) ≤ (1− ε)p.

Recall that we use the hash table B = Zb2 and refer to m = 2b as to its size.

Inequality 6.13 may have various interpretations.

• For fixed 0 < p, ε < 1 and a positive constant j we get a lower bound onm when our estimate becomes valid. Realise, that we can obtain arbitrarilylow multiplicative constant. This follows from the fact that we are allowedto choose values for the constant j. Such estimates are valid only for large

numbers of stored elements, since we need d = jb ≥ 2 and n ≥ m2≥ 2

2j−1.

• Or we can find the parameters ε and j such that multiplicative constant 4cεα′j

is the smallest possible for the trimming rate p and the size of the hash tablem. This statement is used to find the required estimate for m ≥ 4 096.

We use Inequality 6.13 to obtain the chain limit. The limit is computed for tablesconsisting of at least 4 096 buckets and the probability bound is set to 0.5. Thesechoices were not random. When we used the formula for the first time, we gainedthe multiplicative constants in the order of tens. Estimates with the multiplicativeconstant in the order of tens start beating the most basic linear estimate, lpsl ≤ α′m,when hashing thousands of elements.

Program optimising the value of the multiplicative constant only minimises thevalue 4cεα

′j and does not pay any attention to the other constant. After the mini-mal value is retrieved, together with the values of the parameters, the value of theconstant, 4cεj log jb, is determined. To find the value of the constant cε according toInequality 5.24 of Statement 5.23, additional parameters k and l are required. Weuse Algorithm 5 to solve this optimisation problem.

79

Algorithm 4 Calculate the multiplicative constant for parameters p,m, α′, ε, k, l.

cε ← constant cε computed with parameters k, ε and lr ← (1− ε)p Right side, inevitably achieved bound.j ← 1

Lower the value of j so that the right side is skipped.l← (j logm)− log(α′)−log(j)−log(log(j logm))

while l < r and j > 0 doj ← j − STEP ;l← (j logm)− log(α′)−log(j)−log(log(j logm))

end while

j ← j + STEPreturn 4cεα

′j

Algorithm 5 Calculate the smallest limit for p = 0.5, m ≥ 4 096 and prescribed α′.c←∞for k ∈ [2, 4] with STEP do

for l ∈ [1.3, 3] with STEP dofor ε ∈ [0.85, 0.99] with STEP do

if c > multiplicative constant for p = 0.5,m = 4 096, α′, ε, k, l thenc← computed multiplicative constant

end ifend for

end forend for

return c

The form of the chain length limit function follows from Claim 6.11 and Algorithm5, which is able to compute the minimal constant a and the corresponding constantb.

The best result achieved for α′ = 1.5, m = 2b ≥ 4 096 is for ε = 0.96, j = 0.74and equals 4cεα

′j = 57.29. The same approach gives multiplicative constant 47.63for α′ = 1. The assumption α′ < r

2holds since α′ ≤ 1.5 < 6 < jb ≤ d log d = r

2.

Now compare this limit with the linear estimate, lpsl ≤ α′m, for the size of thehash table m ≥ 4 096 and the number of stored elements n ≥ α′m = 0.5 · 4 096 =2 048. Linear estimate on the length of the longest chain equals n = 2 048. Ourestimate equals approximately 57.29 logm log logm ≤ 57.29 · 12 · 2.27 ≤ 1 555 whichis already far better.

These limits may be improved since we neglected the part 4cεα′j log j logm which

is negative.

80

6.4 The Proposed Model

The model we propose is a slight modification of the simplest model of universalhashing scheme with separated chains. We distinguish the two cases – when Deleteoperation is not allowed or when the stored elements can be removed.

• Universal class. We use the system of linear transformations as the universalfamily of functions. The universe U = Zu

2 and the target space, the hash table,is referred to as B = Zb

2. We may imagine the situation as hashing u-bit binarynumbers to the space of b-bit binary numbers. We refer to the size of the hashtable as to m = |B| = 2b.

• Load factor rule. The load factor of the table is kept in the predefinedinterval. If the load factor is outside the interval, whole table is rehashed intoa greater or smaller table. New size is chosen so that the load factor was asnear 0.5 as possible. Load factor is maintained in the interval [0.5, 1] for thescheme without the delete operation. We need the interval [0.25, 1] if the deleteoperation is allowed.

• Chain limit rule. When there is a chain longer than the limit value l(m), thenthe hash table is rehashed. The value l(m) is chosen according to Theorem 6.9with the trimming rate p = 0.5. The chain length limit function of our modeldoes not depend on the table’s load factor so we omit the parameter α and usejust l(m).

The exact chain length limit function comes from Theorem 6.9. If Deleteoperation is forbidden, then set the limit value l(m) = 47.63 logm log logmand use α′ = 1. When Delete operation is allowed, the chain length limitfunction l(m) = 57.29 logm log logm and α′ = 1.5.

Estimates for the chain limit rule are valid for the tables consisting of at least4 096 slots. There are two ways how to deal with this problem. First, we may setthe initial size of the table to 4 096 buckets. If the size of the table is 4 096 buckets,then the lower bound of the load factor rule is not applied and we allow the table’sload factor to be in the interval [0, 1).

Since not every hash table grows to a size of 4 096 buckets, this overhead becomesunacceptable. The other option lies in turning off the chain limit rule when the tablehas less than 4 096 buckets.

6.5 Time Complexity of Computing a Hash Value

Since our model is based on the system of linear transformations, what is quiteunusual, we have to deal with the time complexity of computing an element’s hashvalue. We ask if the time required to compute the hash value is still constant. Whenwe bound the size of the universe |U |, certainly it is. But this time may be worse,when compared to the time required by linear system 3.7.

81

Linear system uses a constant number of arithmetic operations. If we do notbound the size of the universe |U |, we may compute them in O(log2 |U |) time.

Our system, system of linear transformations, represents every transformation bya binary matrix M ∈ Zb×u

2 . Every element ~x ∈ U is a vector consisting of u bits. Toobtain its hash the matrix-vector multiplication M~x is performed. We can certainlydo it in O(ub) = O(log |U | log |B|) time.

If the size of the universe is bounded, this time is constant. Despite the betterasymptotic bound, in practise, the time is worse than that of linear system. Assume,that we represent elements by a word determined by the computer’s architecture. Inthis case linear system needs just three operations, one multiplication, one additionand a modulo. Computing a matrix-vector multiplication can be optimised, but isnot so fast.

In addition, when hashing is applied, elements usually fit in the word of theunderlying architecture. Therefore the arithmetic operations become constant andthis is our case, too. To deal with the longer running times of computing a hash value,sometimes, it is possible to cache once computed hash value within the element. Thissolution is a trade-off that improves the time complexity but consumes additionalmemory.

6.6 Algorithms

In Section 6.4 we propose a model of universal hashing without any exact definition.Precise algorithms, showing how the operations work, are required. Despite the factthat the algorithms are very similar to those shown in Chapter 2, now they aredescribed exactly.

First, let us describe the hash table’s variables and notation. Hash table containsvariables storing its size Size, the number of represented elements Count and arrayT is the hash table itself. Function Limit denotes the chain length limit functionand Hash is the current universal function – a linear transformation represented bya binary matrix. Every bucket T [i] contains two variables T [i].Size, the number ofelements inside the slot, and T [i].Chain, the linked list of the elements in the bucket.

Initialisation. We use Algorithm 5 from the proof of Theorem 6.9 to compute thebound for α′ ∈ 1, 1.5, according to the status of Delete operation, and for p = 0.5.Theorem 6.9 states that the limit function is in the form a logm log logm + b logmfor a, b ∈ R depending only on α′ and p. Thus we need to store just two real numbersa and b to represent the chain length limit function. Let us note that Theorem 6.9gives the chain length limit function for a different parametrisation, too. However,in such case the amortised analysis must be changed.

Initialisation creates a new empty table with the prescribed size. It also choosesa universal function by the random initialisation of the bits in the matrix Hash.The values of bits are chosen randomly each bit having the same probability 0.5. Ifthe delete operation is allowed αmin = 0.25, αmax = 1 and α′ = 1.5, if it is not, thenαmin = 0.5, αmax = 1 and α′ = 1.

82

Algorithm 6 Initialisation of the hash table

Require: p ∈ (0, 1), default 0.5Require: m ∈ N, default 4 096

Ensure: the random uniform choice of a linear transformation Hash

initialise the value α′ according to the delete operationuse Algorithm 5 to compute the bound Limit(m) for the prescribed p and α′

initialise the variables αmin and αmaxcreate the table T of size mSize← mCount← 0choose the hash function – random binary matrix Hash

Algorithm 7 Rehash operation

Require: m ∈ N, default Size New size of the hash table.

repeatT ′.Initialise(m)turn off the chain limit rule in T ′

turn off the load factor rule in T ′

for all element x stored in the hash table doT ′.Insert(x)

end foruntil chain limit rule is satisfied in T ′

swap T and T ′

turn on the chain limit rule in T ′

turn on the load factor rule in T ′

free T

Algorithm 8 Find operation

Require: x ∈ Ui← h(x)if T[i].Chain.Contains(x) then

return true Successful find.else

return false Unsuccessful find.end if

83

Algorithm 9 Insert operation

Require: x ∈ Ui← h(x)if T[i].Chain.Insert(x) thenSet the number of stored elements.Count← Count+ 1T [i].Size← T [i].Size+ 1

Rehash← falseThe load factor rule may be violated.if Count > Size ∗ αmax thenNewSize← 2 ∗ SizeRehash← true

elseNewSize← Size

end if

The chain limit rule may be violated.if T [i].Size > Limit(Size) thenRehash← true

end if

Rehash the table if needed, new function is chosen.if Rehash thenRehash(NewSize)

end if

return true Successful insert.else

return false Unsuccessful insert.end if

84

Algorithm 10 Delete operation

Require: x ∈ Ui← h(x)if T[i].Chain.Remove(x) thenCount← Count− 1T [i].Size← T [i].Size− 1

if Count < Size ∗ αmin thenRehash(Size/2)

end if

return true Successful delete.else

return false Unsuccessful delete.end if

Rehash operation To enumerate the stored elements in Rehash operation, Algo-rithm 7, we iterate every chain of the table. A common optimisation is to place allthe stored elements into a linked list. This allows faster enumeration but causes aspace overhead.

Whenever a load factor rule or the chain limit rule is violated, the table is re-hashed using a new function chosen in initialisation. Both Initialisation and Rehashoperation, Algorithms 6 and 7, take an argument m specifying the new size, so thatit is possible to rehash the table into a larger or smaller one.

Find, Insert and Delete operations Find is very straightforward. Let us noticethat the linked list manipulation operations return true in the successful case, if anelement is found, deleted or inserted, and otherwise return false. Insert and Deleteoperations are slightly complicated compared to the original ones because of the bothrules that are required to hold.

6.7 Potential Method

In Section 6.8 we estimate the expected amortised time complexity using the poten-tial method. So let us explain it first. Assume that we have a data structure anda sequence of operations oini=1 performed on it. We want to estimate the runningtime of the sequence. Of course, we can use the worst case time for each operationbut this may be very misleading.

Consider a simple array. The elements are placed at the array’s end. If thereis a free position, we store the element. When the array is full, then we double itssize and store the element. Clearly, if we store n elements inside the array, thenthe worst case time required to insert another one is O(n). We ask, what time isrequired to store n elements in such an array. The estimate using the worst case timefor each operation gives the result O(n2) = O(

∑ni=1 i). Amortised analysis gives a

85

far better result, O(n). This result can be explained by the fact that the fast insertsaccumulate a potential that is used later by a following slow insert.

Now we describe the potential method – a method how to estimate the amortisedcomplexity. The potential of the data structure and an operation’s amortised costare tools which distribute the running time of a sequence among the operations moreevenly. Assume that we have a data structure at the initial state s0 and we performa sequence of n ∈ N operations oini=1. Every operation oi changes the state of thedata structure from si−1 to si, i ∈ 1, . . . , n.

Every state has a real-valued potential given by the potential function p. Hencewe obtain a sequence of potentials pi for every i ∈ 0, . . . , n. The potential p0 isthe initial potential and is often chosen as zero. So the operations not only changethe data structure, they change its potential, too. We define ai = ti + pi − pi−1 asthe amortised cost of the ith operation where ti is its running time.

Definition 6.14 (Amortised complexity). Assume that we perform an operation ofa data structure. Let t be the time required to perform the operation. Let pb be thepotential of the data structure before performing the operation and pa be the potentialafter. Then the amortised complexity of the operation

a = t+ pa − pb.The time taken by the operations in a sequence o, To =

∑ni=1ti, may be estimated

by the amortised time of the sequence, Ao =∑

ni=1ai, as

To =n∑i=1

ti =n∑i=1

(ai − pi + pi−1) = Ao + p0 − pn.

Let us show some facts regarding the potential functions and the amortised com-plexity of a sequence of operations.

Claim 6.15. Assume that we estimate the amortised complexity by a potential func-tion p. Let o = oini=1 be the performed sequence of operations, p0 be the startingpotential and pn be the potential after performing the last operation.

(1) If p ≥ 0, then To ≤ Ao + p0.

(2) If p ≤ 0, then To ≤ Ao − pn.

(3) If p0 = 0, then To = Ao − pn.

For randomised algorithms we may take the expected time consumed by an op-eration.

Definition 6.16 (Expected amortised complexity). Assume that an operation of arandomised data structure is performed and the operation runs in time t. Let pb bethe potential before performing the operation and pa be the potential after. Then theexpected amortised complexity of the operation is defined as

a = E (t+ pa − pb) .

A description of the potential method may be found in [23] and [25].

86

6.8 Expected Amortised Complexity

Expected amortised time complexity of the introduced scheme is analysed in thenext two theorems in either of the two cases – when Delete operation is allowed ornot.

Let us discuss the situation of Lemma 6.17. We are given a sequence of setsS1 ⊆ · · · ⊆ Sk ⊆ Se that should be represented in a hash table. We start with theinitial hash function h0. Set S1 causes violation of the chain limit rule for functionh0. In order to enforce the rule, we select random functions h1, h2, . . . until we find asuitable function for the set S1, denote it hi1 . Later, after some inserts, we obtain aset S2 and the function hi1 is no longer suitable. We continue by selecting functionshi1+1, . . . , hi2 with hi2 being suitable for the set S2. The chain length limit functionis chosen for the set Se and for the trimming rate p from Theorem 6.9. We needto find the expected number of trials, the number of selected functions, needed toenforce the chain limit rule for the sets S1, . . . , Sk. The set Se is a set that comesfrom the analysis, we use the same set Se is in Theorem 6.9.

Lemma 6.17. Let Zu2 be the universe, Zb

2 be the hash table with the size m = 2b,p ∈ (0, 1) be the trimming rate and α′ ∈ 1, 1.5. Let the chain length limit functionbe chosen according to Theorem 6.9 for α′ and p.

Let S1 ⊆ · · · ⊆ Sk be a sequence of represented sets and Se ⊂ Zu2 be a set such

that |Se| ≤ α′m and Sk ⊆ Se. Let h0, h1, . . . , hl be a sequence of random uniformlychosen linear transformations selected to enforce the chain limit rule for the sequenceof sets. Assume that 0 = i0 < · · · < ik = l is the sequence such that

(1) the functions hij , hij+1, . . . , hij+1−1 create a long chain for the set Sj+1 for everyj ∈ 0, . . . , k − 1,

(2) the function hij does not create a long chain for the set Sj, j ∈ 1, . . . , k.Then E (l) = 1

1−p .

Proof. First, observe that if a function h is suitable for the set Se, then it is suitablefor every set S1, . . . , Sk.

By Lemma 6.8 the expected number of trials needed to represent the set Sewithout a long chain is 1

1−p .The sequence of functions h1, . . . , hl is random and the selection of every function

is uniform. The chain length limit function is chosen according to Theorem 6.9 –it fits for the set Se. Hence if we do not consider the sets S1, . . . , Sk, we have thatE (l) = 1

1−p .If a function ha, 1 ≤ a < l creates a long chain for a set Sb, ib−1 ≤ a < ib, then it

certainly creates a long chain for the set Se. Thus if we fail with the function ha forthe set Sb, then we fail with the same function for the set Se, too. In this situation wecontinue by choosing functions ha+1, ha+2, . . . . The sequence of functions h1, . . . , hlfor the sequence of sets, remains the same for the single set Se, provided that therandom choice is the same. So the functions selected for the sequence of sets may

87

be selected when rehashing the set Se, too. Hence we do not need to consider thesequence of sets when choosing a right function for the set Se.

Thus E (l) = 11−p as observed before.

Theorem 6.18. Consider hashing with forbidden Delete operation. Then the opera-tions Find and Insert have the expected amortised time complexity O(1). Moreover,if the size of a hash table is m, then the find operation runs in O(logm log logm)time in the worst case.

Proof. For the expected amortised complexity analysis we use the potential method.Let the expression αm−m denote the potential of the scheme. This is the negativevalue of the number of the remaining successful insert operations which would makethe table’s load factor reach one. Whenever a successful insertion is performed wecheck if

• the prolonged chain does not violate the chain limit rule or

• the load factor is not greater than 1.

If either of the two conditions is violated the whole table is rehashed. We have fourcases to analyse.

• Find operation or an unsuccessful Insertion operation is performed. From The-orem 6.1 and Corollary 6.7 we know that it takes O(1) expected time only. Thepotential does not change. Since the chains are bounded by l(m), its worst caserunning time is O(logm log logm).

• Insert operation is performed and the Rehash operation is not necessary. Thenfrom Theorem 6.1 the operation takes O(1) time in the expected case. Thepotential is increased by one.

• Rehash operation after an Insert operation is required because the load factorexceeds one. The rehash itself takes the O(m) time and the size of the tableis doubled. Thus the resulting potential is −m. The potential before theoperation equals 0. So the operation took O(1) expected amortised time.

• Suppose that a Rehash operation after an Insert is needed because of theviolation of the chain limit rule. We seek for a new function satisfying therule without resizing the table. For convenience we define a sub-sequence ofoperations called a cycle.

Definition 6.19. Cycles create a partitioning of the original sequence of oper-ations. Each cycle is a sub-sequence consisting of the operations between twoimmediate inserts which cause the violation of the load factor rule. The firstinsert causing the violation is included in the cycle and the second one belongsto the next cycle.

88

We refer to the current cycle as to the cycle which contains the analysed op-eration. Now we find the number of rehash operations that are caused by thechain limit rule violation and occur in a single cycle. Let Se be the set repre-sented at the end of the current cycle. Then by Lemma 6.17 we need just 1

1−ptrials in the current cycle.

Now we compute the expected amortised complexity of the insert operationscausing the violation of the chain limit rule in a single cycle. For every tablethat consists of m slots at the beginning of the cycle, there are exactly 0.5minsert operations raising the load factor from 0.5 to 1. The expected time spentby fixing the chain limit rule in a cycle is, by Lemma 6.17, 1

(1−p)O(m). We candivide this amount of time along the cycle of 0.5m inserts. This distributionraises the expected amortised time for every insert operation only by a constant.The potential is incremented by one. The expected time of Insert operationwithout Rehash operation is O(1).

We have one issue to care about. In the last case we distributed the time evenlyalong a complete cycle. If the number of inserts is not a power of two, we maypossibly distribute a long time along a non-complete cycle. With this distributionwe can not obtain a constant amortised time for the insert operation. However, wecan distribute the time spent by fixing the chain limit rule from the last incompletecycle along all the insert operations performed so far.

Thus the expected amortised time complexity of every analysed operation isO(1).

One can find a better potential function proving the previous result more formally.We use such an explicitly expressed potential function in the next theorem. Weassume that Delete operation is allowed.

Theorem 6.20. If the initial hash table is empty and Delete operation is allowed,then the expected amortised time complexity of every operation is constant. Moreover,if the size of a hash table is m, then the find operation runs in O(logm log logm)time in the worst case.

Proof. In this proof we have two types of the operation cycles. We need to distinguishbetween the work required to enforce the load factor rule and the time spent bykeeping the chain limit rule. In the analysis with forbidden Delete operation thesituation is simpler and these cycles are the same. And recall the difference, thechain length limit function is computed for α′ = 1.5. The load factor α is now in theinterval [0.25, 1].

We deal with the amortised time of Find and unsuccessful Insert or Delete oper-ations in advance. Their expected running time is proportional to the expectedchain length. From Theorem 6.1 it follows that this value is constant. Sincechains are bounded by l(m) we have that the worst case time of Find operationis O(logm log logm). These operations do not change the potential1. Our analysisis thus simplified by omitting finds and unsuccessful delete and insert operations.

1The potential is defined later in the proof.

89

Let the sequence o = oini=1 denote the performed operations, the ith operationoi ∈ Insert,Delete. The following two definitions make the analysis more exactand clear.

Definition 6.21 (α-cycle). The α-cycle consists of the operations between the twoimmediate operations that cause the violation of the load factor rule. Every α-cyclecontains the second operation causing the violation and the first one is included inthe previous α-cycle. The operation o1 is contained in the first α-cycle.

First, notice that for this definition it is not important if the upper or lower boundof the load factor is the cause of the violation. When a rehash operation is executed,the table size is chosen so that the load factor α was as near 0.5 as possible.

The next definition of the α-cycle is intended for the analysis of the time spentby fixing the violations of the load factor rule.

Definition 6.22 (l-cycle). The l-cycles are the partitioning of the sequence o suchthat every l-cycle ends

• after an operation causing the violation of the load factor rule or

• when we performed 0.5m insertions from the beginning of the l-cycle and theload factor did not exceed the value of 1.

The l-cycle allows us to analyse the work caused by the chain limit rule. Bothl-cycles and α-cycles divide the sequence o. Notice that if an α-cycle ends at theposition i, the corresponding l-cycle also ends at the same position.

The analysis now takes the ith operation for every i = 1, . . . , n. We show thatthe expected amortised time complexity of the opertion oi is O(1) independentlyon its type. The potential now consists of the two parts, p1 and p2. Let e = 1

(1−p)denote the expected number of rehash operations, the expected number of trials,when finding a suitable function.

Above all we want every simple insertion and deletion to take O(1) time only.Thus the potential consists of the part p1 = 4eiα + 8edα where iα is the numberinserts and dα is the number of delete operations performed so far in the currentα-cycle.

Next, we need to distribute the work caused by the chain limit rule. The secondpart of the potential p2 = 2eil + (ce − r)m. The value il denotes the number ofinsertions performed so far in the current l-cycle. The variable r denotes the numberof performed Rehash operations, which are caused by the chain limit rule violation,from the initial state. The variable c denotes the number of started l-cycles from thebeginning so far. The overall potential p = p1 + p2.

The analysis of Delete operation is simpler and comes first. When a deletion isperformed we have to discuss the two cases:

• Simple successful deletion is expected to take O(1) time. We search in chainsthat have constant expected length. The potential is increased by 8e since thenumber of deletions in p1 gets increased by one.

90

• The load factor α violates the lower bound of the load factor rule. Realisethat there are at least 0.25m deletions performed in the current α-cycle, let usexplain why. At the beginning of the cycle the table has the size of m slots andthe load factor equals 0.5. At the end of the cycle the load factor decreasedto 0.25. Such a descent may be caused by at least 0.25m successful deleteoperations. Let us discuss the potential change. First, the potential differenceof the part p1 is less or equals −2em since iα and dα get zeroed. The secondpart p2 gets increased by at most em since a new l-cycle is started and thevalue il is zeroed. Rehashing of the table takes O(em) expected time. Hencethe expected amortised cost of the operation is O(1).

The analysis of the first two cases of Insert operation remains similar to the casewith forbidden Delete:

• Suppose no Rehash operation after an Insert is performed. The searched chainhas the constant expected length and the potential is increased by 4e+ 2e.

• If the load factor exceeds the upper bound, then there are at least 0.5m inser-tions in the current α-cycle. It follows that the potential p2 is certainly greateror equals 2em. After performing the operation a new l-cycle is started. Thepotential p2 is raised by em because the variable c gets incremented by one.But it is also lowered by at least 2em because il is set to zero. The rehash op-eration is expected to take O(em) time. Regarding the fact that the potentialp1 only decreases we expect O(1) amortised time for Insert operation violatingthe load factor rule.

• Insert operation is the last one in the l-cycle and the chain limit rule is notviolated. Then there are 0.5m insertions in the current l-cycle, equivalentlyil = 0.5m and p2 = em+ (ce− r)m. Since a new l-cycle is started, the il termis set to zero and the value c is incremented by one. Therefore there is nopotential change in the part p2. The part p1 only decreases.

• Analysis of Insert operation, which violates the chain limit rule, remains. Timespent by Rehash operation equals O(m) times the number of fails when findinga suitable function. The potential p2 is decreased by the same value, since thevariable r gets incremented by the number of fails. The potential increase inthe part p1 and p2 equals 6e. Without Rehash operation the expected timeof Insert is O(1). The expected amortised time complexity of the analysedoperation is thus constant provided that E ((ce− r)m) = 0.

First, we show why the expected number of fails in one l-cycle equals e. LetS be the set stored after performing the analysed operation. Let Se be the setcreated from the set S by the remaining inserts of the current l-cycle. Becausethe table’s load factor is maintained lower than 1 and there are at most 0.5minserts in every l-cycle, we have that |Se| ≤ 1.5m and S ⊆ Se. From Lemma6.17 we have that e = 1

1−p is the expected number trials needed to find asuitable function for every l-cycle.

91

Second, we show that E ((ce− r)m) = 0. Define the random variable Ci equalto the number of rehash operations required to fix the chain limit rule inthe ith l-cycle for i = 1, . . . , c. From the previous observation it follows thatE (Ci) = e. Clearly r =

∑ci=1 Ci. From Lemma B.18 it follows that

E ((ce− r)m) ≤ m

(ce− E

(c∑i=1

Ci

))= m(ce− cE (C1)) = mc(e− e) = 0.

Since E ((ce− r)m) = 0 and p0 = 0, the expected potential is always non-negative. Hence To = Ao + p0 − pn ≤ Ao and our analysis is thus correct.

We showed that the expected amortised complexity is constant for every opera-tion.

Let us note that the fact∣∣∣ (ce−r)mn

∣∣∣ n→∞−−−→ 0 indicates that it might be possible to

show that the amortised complexity is constant, without any expectation. Since uni-versal hashing is a randomised algorithm, such result would be certainly remarkable.However, if it was really true, there would be still many troubles showing the result.

So, one may ask if using the Law of Large Numbers, Theorem B.23, can not help.Indeed, from Lemma 6.8 we have that Var (Ci) is finite and it may be applied:∣∣∣∣(ce− r)mn

∣∣∣∣ ≤ m

∣∣∣∣ce−∑ci=1 Cic

∣∣∣∣≤ m

∣∣∣∣cE (C1)

c−∑c

i=1 Cic

∣∣∣∣= m

∣∣∣∣E (C1)−∑c

i=1 Cic

∣∣∣∣ .Clearly, the convergence in probability stated by Theorem B.23 holds:∣∣∣∣E (X1)−

∑ci=1Xi

c

∣∣∣∣ c→∞−−−→ 0.

But in the case of m∣∣∣E (C1)−

Pci=1 Cic

∣∣∣ we have to be careful. There is an infinite

class of sequences oini=1 for which m ∈ Ω(c). Thus for these sequences, if c → ∞,then m→∞ as well.

In order to obtain convergence in probability we have that

Pr

(m

∣∣∣∣E (C1)−∑c

i=1 Cic

∣∣∣∣ ≥ ε

)= Pr

(∣∣∣∣E (C1)−∑c

i=1 Cic

∣∣∣∣ ≥ ε

m

)≤ Var (Ci)m

2

mε2=mVar (Ci)

ε2.

This is a problem, since when m → ∞, we can not expect that the probabilityconverges to zero for a fixed ε.

92

For the class of sequences satisfyingm ∈ Ω(c) the inequalities we used for estimate

of∣∣∣ (ce−r)mn

∣∣∣ mean only a multiplicative factor. Thus the estimate is tight and we

see that the bound obtained from the Chebyshev’s inequality is not sufficient. Itsasymptotic rate is not high enough and the factor m appears. However, becausewe know the probability distribution of Ci, maybe we could exploit it and be moreaccurate. This problem remains open.

93

Chapter 7

Conclusion

In this work we present a model of universal hashing which preserves the constantexpected length of a chain. The running time of the find operation is then O(1), too.The model uses the system of linear transformations and exploits its remarkableproperties shown in [3]. In addition, these results are substantially improved sothat they allow the construction of the model. We are also able to show that theexpected amortised running time of the insert and delete operations is constant. Notonly that, our model bounds the worst case running time of the find operation inO(logm log logm).

Because the model is based on the system of linear transformations, the time tocompute the hash value of an element is worsened, despite its asymptotic behaviour.The solution that we propose is to store once computed hash values within the object.This optimisation takes the advantage of the warranties provided by the model, ifthe find operation is dominant.

7.1 Future Work

There are many ways how to improve the model. For example, we can go the way ofthe tighter estimates. Although, it may be very interesting to describe the behaviourof double hashing when it is used with a universal class of functions. Combined withthe system of linear transformations it may be possible to obtain a similar worst casebound without violating the expected running times.

Our next option is to use ideas of perfect hashing. Every chain may be representedby a hash table allowed to have a small load factor. Whenever the elements in thebucket can not be accessed in a constant time, then it might be possible to rehashonly the small table instead of the large one. This approach may bring anotherspeedup.

Another brilliant idea comes from the area of load balancing. We can hash bytwo functions simultaneously and a newly stored element is placed into a smallerbucket. The find operation has to search in both buckets associated with an ele-ment. However, as stated in [28], the expected worst case time for classic hashing isthen substantially better and the expected complexity is preserved. The question is

94

whether using two universal functions may help.Unfortunately, the thesis does not show the experimental results. A high quality

benchmark of the model is required. The benchmark needs to be performed with andwithout the mentioned optimisation – caching of the hash values. This should showthe influence of the universal system on the running times. The benchmark also hasto show when the warranty is needed in dependence on the operations compositionand the input distribution. If we try inputs created by a random number generator,then they are uniformly distributed. So the obtained chains are short even for classichashing. On the other hand such good inputs are seldom present. The real worldinputs, which are inconvenient for classic hashing, should be another output of thebenchmark. On the other hand we should point out when the classic hashing orthe other implementations outperform our model. When the chains are longer andthe find operation is the most frequent one, then it is convenient to have a worstcase warranty especially for large number of stored elements. The question is, howfrequent the find operation has to be when compared to the modifying ones.

Of course, usage of simpler classes brings faster times of computing the hashcodes. Linear classes are quite similar to each other. Maybe we could find a cor-respondence allowing the results found for the class of linear transformations to bebrought into a faster class.

Models providing a reasonable worst case warranty with a good expected timecomplexity may be a suitable solution for various set representation problems. Cur-rent models of hashing may provide such qualities when enriched by simple rules.Also the approach of relaxation, shown in [19] for red black trees, may be help-ful when achieving similar results. Data structures providing strong warranties areusually slower in the average case. Their relaxation worsens the warranties but im-proves the average case. Relaxation may be considered a reverse approach comparedto ours.

95

Bibliography

[1] Alfred V. Aho, Ravi Sethi, and Jeffrey D. Ullman. Compilers: Principles,Techniques, and Tools. Addison-Wesley, 1986.

[2] Susanne Albers and Jeffery Westbrook. Self-organizing data structures. InDevelopments from a June 1996 seminar on Online algorithms, pages 13–51,London, UK, 1998. Springer-Verlag.

[3] Noga Alon, Martin Dietzfelbinger, Peter Bro Miltersen, Erez Petrank, andGabor Tardos. Linear hash functions. J. ACM, 46(5):667–683, 1999.

[4] C.R. Aragon and R.G. Seidel. Randomized search trees. In Foundations ofComputer Science, 1989., 30th Annual Symposium on, pages 540 –545, oct-1nov 1989.

[5] Rudolf Bayer and Edward M. McCreight. Organization and maintenance oflarge ordered indices. Acta Inf., 1:173–189, 1972.

[6] Larry Carter and Mark N. Wegman. Universal classes of hash functions. J.Comput. Syst. Sci., 18(2):143–154, 1979.

[7] Pedro Celis, Per-Ake Larson, and J. Ian Munro. Robin hood hashing. Founda-tions of Computer Science, Annual IEEE Symposium on, 0:281–288, 1985.

[8] Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein.Introduction to algorithms, second edition, 2001.

[9] Luc Devroye and Pat Morin. Cuckoo hashing: further analysis. Inf. Process.Lett., 86(4):215–219, 2003.

[10] Luc Devroye, Pat Morin, and Alfredo Viola. On worst case robin-hood hashing,2004.

[11] M. Dietzfelbinger and Friedhelm Meyer auf der Heide. A new universal classof hash functions and dynamic hashing in real time. In Proceedings of theseventeenth international colloquium on Automata, languages and programming,pages 6–19, New York, NY, USA, 1990. Springer-Verlag New York, Inc.

[12] Martin Dietzfelbinger and Ulf Schellbach. On risks of using cuckoo hashing withsimple universal hash classes. In SODA ’09: Proceedings of the twentieth Annual

96

ACM-SIAM Symposium on Discrete Algorithms, pages 795–804, Philadelphia,PA, USA, 2009. Society for Industrial and Applied Mathematics.

[13] Martin Dietzfelbinger and Ulf Schellbach. Weaknesses of cuckoo hashing witha simple universal hash class: The case of large universes. In Mogens Nielsen,Antonın Kucera, Peter Bro Miltersen, Catuscia Palamidessi, Petr Tuma, andFrank D. Valencia, editors, SOFSEM, volume 5404 of Lecture Notes in ComputerScience, pages 217–228. Springer, 2009.

[14] Michael Drmota. An analytic approach to the height of binary search trees.Algorithmica, 29(1):89–119, 2001.

[15] W. Feller. An Introduction to Probability Theory and Its Applications, Vol. 1.Wiley, New York, NY, third edition, 1968.

[16] Michael L. Fredman, Janos Komlos, and Endre Szemeredi. Storing a sparsetable with 0(1) worst case access time. J. ACM, 31(3):538–544, 1984.

[17] E.M. Landis G.M. Adelson-Velskii. An algorithm for the organization of infor-mation. Sov. Math. Dokl. 3, pages 1259–1262, 1962.

[18] Leonidas J. Guibas and Robert Sedgewick. A dichromatic framework for bal-anced trees. In FOCS, pages 8–21, 1978.

[19] Sabine Hanke, Thomas Ottmann, and Eljas Soisalon-Soininen. Relaxed bal-anced red-black trees. In CIAC ’97: Proceedings of the Third Italian Conferenceon Algorithms and Complexity, pages 193–204, London, UK, 1997. Springer-Verlag.

[20] John L. Hennessy and David A. Patterson. Computer Architecture, FourthEdition: A Quantitative Approach. Morgan Kaufmann Publishers Inc., SanFrancisco, CA, USA, 2006.

[21] Maurice Herlihy, Nir Shavit, and Moran Tzafrir. Hopscotch hashing. In DISC,pages 350–364, 2008.

[22] Donald E. Knuth. The art of computer programming, volume 3: (2nd ed.)sorting and searching. Addison Wesley Longman Publishing Co., Inc., RedwoodCity, CA, USA, 1998.

[23] Alena Koubkova and Vaclav Koubek. Datove struktury. MATFYZPRESS,Prague, Czech Republic, 2010.

[24] Kurt Mehlhorn. Data Structures and Algorithms 1: Sorting and Searching,volume 1 of Monographs in Theoretical Computer Science. An EATCS Series.Springer, 1984.

[25] Kurt Mehlhorn and Peter Sanders. Algorithms and Data Structures: The BasicToolbox. Springer, 2008.

97

[26] Carl D. Meyer. Matrix Analysis and Applied Linear Algebra. SIAM, 2001.

[27] Peter B. Miltersen. Universal hashing, 1998.

[28] Michael Mitzenmacher and Eli Upfal. Probability and Computing: RandomizedAlgorithms and Probabilistic Analysis. Cambridge University Press, New York,NY, USA, 2005.

[29] Lennart Rade and Bertil Westergren. Mathematics handbook for science andengineering. Birkhauser Boston, Inc., Secaucus, NJ, USA, 1995.

[30] Bruce Reed. The height of a random binary search tree. J. ACM, 50(3):306–332,2003.

[31] Sheldon M. Ross. Probability Models for Computer Science. Academic Press,Inc., Orlando, FL, USA, 2001.

[32] Shai Rubin, David Bernstein, and Michael Rodeh. Virtual cache line: A newtechnique to improve cache exploitation for recursive data structures. In In Pro-ceedings of the 8th International Conference on Compiler Construction, pages259–273. Springer, 1999.

[33] A. Siegel. On universal classes of fast high performance hash functions, theirtime-space tradeoff, and their applications. In SFCS ’89: Proceedings of the30th Annual Symposium on Foundations of Computer Science, pages 20–25,Washington, DC, USA, 1989. IEEE Computer Society.

[34] Mark N. Wegman and J. Lawrence Carter. New classes and applications ofhash functions. In SFCS ’79: Proceedings of the 20th Annual Symposium onFoundations of Computer Science, pages 175–182, Washington, DC, USA, 1979.IEEE Computer Society.

98

Appendix A

Facts from Linear Algebra

Some definitions and facts, which are used in the work, are mentioned and provedin the appendix. Basic definitions of vector space, linear transformation and factsabout solving linear equations can be found for example in [26].

Definition A.1. Let V be a vector space and A be a subset of V . For a vector ~v ∈ Vdefine the set ~v + A as

~v + A = ~v + ~a | ~a ∈ A.Definition A.2. Let V be a vector space and A,B be subsets of V . Define the setA+B as

A+B = ~a+~b | ~a ∈ A,~b ∈ B.Definition A.3 (Affine subspace). Let V be a vector space, V ≤ U be its subspaceand ~a ∈ U . The set A = ~a+ V is an affine subspace of the vector space U .

Lemma A.4. Let A = ~a + ~V be an affine subspace of a vector space U determinedby a subspace V ≤ U and a vector ~a ∈ U . Then A = ~b+ V for every ~b ∈ A.

Proof. Every vector ~x ∈ A may be written in the form ~x = ~a+ ~vx for a vector ~vx ∈ V .We need to rewrite it in the form ~x = ~b + ~wx for a vector ~wx ∈ V . Since ~b ∈ Awe also have that ~b = ~a + ~vb. Substituting ~a = ~b − ~vb into ~x = ~a + ~vx gives that~x = ~a+ ~vx = ~b− ~vb + ~vx. Since V is a vector space we have that ~wx = ~vx − ~vb ∈ V .We just proved that A ⊂ ~b+V . The reverse inclusion comes from the fact that ~b+Vis an affine subspace and a ∈ ~b+ V and. The proof is symmetric.

Definition A.5 (Null-space of a matrix). Let T be a field, A ∈ Tm×n be a matrixwith n,m ∈ N. The set ~x ∈ T n | A~x = ~0 is called the null-space of the matrix A.

There is a one-to-one relationship between a matrix over a field T and a lineartransformation between arithmetic vector spaces over the same field. For everyvector ~x ∈ T n let [~x]β denote the coordinates of the vector ~x relative to the basisβ of T n. Consider two vector spaces Tm and T n and their two fixed ordered basesβ = ~b1, . . . ,~bn of T n and γ of Tm. Let ~c1, . . . ,~cn be a subset of Tm. Then

99

every linear transformation L : T n → Tm with L(~bi) = ~ci for every i ∈ 1, . . . , ncorresponds to a matrix A ∈ Tm×n such that

A =

[~c1]γ . . . [~cn]γ

.

For the matrix A we have that A[~x]β = [L(x)]γ for every ~x ∈ T n. Every matrix Afor two fixed bases thus gives a linear transformation, too.

Remark A.6. Null-space of every matrix A ∈ Tm×n is a subspace of the vector spaceT n.

Proof. Proof is a straightforward verification of the three properties of vector sub-spaces. In the following assume that ~x, ~y ∈ N (A).

The zero vector is in N (A) because A~0 = ~0.

The sum ~x+ ~y ∈ N (A),

A(~x+ ~y) = A~x+ A~y = ~0 +~0 = ~0.

For every t ∈ T we have that t~x ∈ N (A) because

At~x = tA~x = t~0 = ~0.

Definition A.7 (Orthogonal complement). Let V be a vector subspace of a vectorspace U and the operation 〈~u | ~v〉 denote the scalar product of vectors ~u,~v ∈ V .Orthogonal complement of subspace V , denoted by V ⊥, is defined as:

V ⊥ = ~u ∈ U | 〈~u | ~v〉 = 0 for all ~v ∈ V .Definition A.8 (Affine linear transformation). Let V, U be vector spaces, V0 ≤ V ,U0 ≤ U be their subspaces and VA = ~v + V0 and UA = ~u + U0 be affine subspacesof V and U with vectors ~v ∈ V and ~u ∈ U . Function TA : VA → UA is an affinelinear transformation if there is a linear transformation T0 : V0 → U0 such thatTA(~x) = ~u+ T0(~x− ~v), ~x ∈ VA.

Definition A.9 (Set of all affine linear maps). Let V, U be vector spaces and V0 ≤ V ,U0 ≤ U be their subspaces. Let VA = ~v + V0 and UA = ~u+ U0 be affine subspaces ofV and U respectively with vectors ~v ∈ V and ~u ∈ U . Set of all affine linear mappingsbetween affine spaces VA and UA, LTA(VA, UA), is defined as:

LTA(VA, UA) = T : VA → UA | T is an affine linear transformation.Lemma A.10. Let T : Zf

2 → Zb2 for f ≥ b be an onto linear map and ~y ∈ Zb

2. ThenT−1(~y) is an affine subspace of Zf

2 and |T−1(~y)| = 2f−b.

100

Proof. By Remark A.6, T−1(0) is a vector subspace of Zf2 . The set T−1(~y) is an

affine subspace of the vector space Zf2 since for every vector ~u ∈ T−1(~y) we have that

T−1(~y) = ~u+ T−1(~0). First we show, ~u+ T−1(~0) ⊆ T−1(~y):

~v ∈ T−1(~0)⇒ T (~u+ ~v) = T (~u) + T (~v) = ~y +~0 = ~y

⇒ ~u+ T−1(~0) ⊆ T−1(~y).

The reverse inclusion, T−1(~y) ⊆ ~u+ T−1(~0), holds as well since

~v ∈ T−1(~y)⇒ T (~v − ~u) = T (~v)− T (~u) = ~y − ~y = ~0

⇒ ~v − ~u ∈ T−1(~0)

⇒ T−1(~y) ⊆ ~u+ T−1(~0).

Now we use this fact to prove the second statement of the lemma, |T−1(~y)| = 2f−b.Fix arbitrary vector ~u ∈ T−1(~y). From Lemma A.4 then follows T−1(~y) = ~u+T−1(~0).In addition, the equality describes a one-to-one map between T−1(~0) and T−1(~y). Theconsequence of is that all sets T−1(~y) for every y ∈ Zb

2 have the same size, exactly|T−1(~y)| = 2f−b.

Lemma A.11. Let A ∈ Zm×n2 be a matrix such that rank (A) = m and m ≤ n. Then

the system of linear equations A~x = ~y has 2n−m solutions for every vector ~y ∈ Zm2 .

Proof. This is a consequence of Lemma A.10. We apply it for the linear transforma-tion given by the matrix A and vector ~y ∈ Zm

2 .

101

Appendix B

Facts Regarding Probability

In the appendix we summarise some facts regarding the probability theory. We showthe basic definitions and prove the statements used in the work.

First we discuss the discrete probability and discrete random variables. Thenwe move to continuous random variables. Finally we show the Markov’s and theChebyshev’s inequality.

Whenever we perform a random experiment we get a random outcome – a randomevent occurred. The probability theory models the uncertainty included in randomexperiments and deals with the probabilities of the random events. For exampleif the coin toss is the random experiment, then the set roll, die contains all thepossible outcomes. If the coin is fair, they share the same probability of occurrence.

Definition B.1 (Probability space, elementary event). We refer to the set Ω con-sisting of all the possible outcomes of a random experiment, Ω = ω1, . . . , ωn, asto the probability space. Its elements ωi, i ∈ 1, . . . , n, are the elementary events.The probability of every elementary event ωi, i ∈ 1, . . . , n, equals pi, 0 ≤ pi ≤ 1,and is written as:

Pr (ωi) = pi.

The sum of the probabilities of the elementary events must be equal to one:

n∑i=1

pi = 1.

In general the probabilities of the elementary events need not to be the same.However, this assumption is frequently satisfied and then pi = 1

nfor i = 1, . . . , n.

Definition B.2 (Event, probability). The set A = a1, . . . , am ⊆ Ω, containingonly elementary events, is called a compound event. The probability of the com-pound event A is

Pr (A) =∑ω∈A

Pr (ω) .

If the probabilities of the elementary events are the same, then

Pr (A) =|A||Ω| =

m

n.

102

Definition B.3 (Complementary event, the certain event, the impossible event).Let A ⊆ Ω be an event. The event Ω − A is the complementary event of the eventA. The certain event is the probability space Ω and its complementary event ∅ is theimpossible event.

Corollary B.4 (Probability of the complementary event). If A ⊆ Ω is an eventthen the probability of the complementary event is

Pr (Ω− A) = 1−Pr (A) .

Proof. Follows from the definition of probability:

1 = Pr (Ω) = Pr ((Ω− A) ∪ A)

=∑

ω∈Ω−A

Pr (ω) +∑ω∈A

Pr (ω)

= Pr (Ω− A) + Pr (A) .

We can compute the size of the set, consisting of elementary events, if we knowthe event’s probability and the size of the probability space.

Lemma B.5. Let A ⊆ Ω be an event such that Pr (A) = p. If the probabilities ofthe elementary events are the same, then |A| = p|Ω|.Proof. This statement is a direct use of the definition of the probability of the eventA.

p = Pr (A) =|A||Ω|

Definition B.6 (Intersection). Let A,B ⊆ Ω be events. The event that both A andB occur is denoted by A,B and equals the event A∩B. The probability of the eventA,B is

Pr (A,B) = Pr (A ∩B) .

Definition B.7 (Disjoint events). Let A,B ⊆ Ω be events. Events A and B aredisjoint if A ∩B = ∅. The corresponding compound probability then equals

Pr (A,B) = Pr (∅) = 0.

Definition B.8 (Independent events). Let A,B ⊆ Ω be events. Events A and B areindependent if

Pr (A,B) = Pr (A) Pr (B) .

The motivation of the following definition is a way how to compute the probabilityof event A inside the restricted probability space B.

103

Definition B.9 (Conditional probability). Let A,B ⊆ Ω be events and assume thatPr (B) 6= 0. Then the conditional probability of event A | B is defined as

Pr (A | B) =Pr (A,B)

Pr (B).

Lemma B.10. If A,B ⊆ Ω are non-disjoint events such that Pr (B) 6= 0, then

Pr (B) ≤ Pr(A)Pr(A|B)

.

Proof. From our assumptions we have that the conditional probability of the eventA | B is defined and positive. To prove the lemma the definition of the con-ditional probability and the straightforward bound on the compound probability,Pr (A,B) ≤ Pr (A), is used

Pr (B) =Pr (A,B)

Pr (A | B)≤ Pr (A)

Pr (A | B).

Remark B.11. Let A,B ⊆ Ω be events with B 6= ∅. The events are independent ifand only if Pr (A | B) = Pr (A).

Proof. If we assume the independence of the two events we have

Pr (A | B) =Pr (A,B)

Pr (B)=

Pr (A) Pr (B)

Pr (B)= Pr (A) .

The reverse implication holds since

Pr (A,B) = Pr (A | B) Pr (B) = Pr (A) Pr (B) .

Lemma B.12. Properties of Probability

(1) Pr (∅) = 0

(2) Pr (Ω) = 1

(3) 0 ≤ Pr (A) ≤ 1

(4) Pr (A ∪B) = Pr (A) + Pr (B)−Pr (A ∩B)

Theorem B.13 (The Law of Total Probability). Let A ⊆ Ω be an event andΩ1, . . . ,Ωk be a decomposition of the probability space Ω such that it does not containthe impossible event. Then

Pr (A) =k∑i=1

Pr (A | Ωi) Pr (Ωi) .

104

Proof. We perform a simple computation to decompose the event A. The decom-posed event is then rewritten using the definition of the conditional probability.

Pr (A) = Pr (A ∩ Ω)

=k∑i=1

Pr (A,Ωi)

=k∑i=1

Pr (A | Ωi) Pr (Ωi)

Definition B.14 (Discrete random variable). Let Ω be a probability space. Thendiscrete random variable X is a function X : Ω→ N. Probability of the event X = xequals

Pr (X = x) =∑ω∈Ω

X(ω)=x

Pr (ω) .

In the continuous probability we usually assume that the vector space Ω is anuncountable set.

Definition B.15 (Continuous random variable, probability density function, cumu-lative probability density function). Continuous random variable X is a functionX : Ω→ R.

Function F : R → [0, 1] is the cumulative probability density function of therandom variable X if the following conditions are satisfied

(1) limt→−∞

F (t) = 0,

(2) limt→∞

F (t) = 1.

(3) The function F is non-decreasing.

(4) F (t) = Pr (X ≤ t) for every t ∈ R.

Function f : R → R∗ such that f(t) = F ′(t) for every t ∈ R is the probabil-ity density function of the random variable X. The probability density function fsatisfies that

(1)∞∫−∞

f(t) dt = 1,

(2)t∫−∞

f(x) dx = F (t).

105

More formal definitions of random variables use the Lebesgue measure and Borelsets to measure the probability Pr (X = x). However previous definitions are suffi-cient for our needs.

If X is a continuous random variable with the cumulative density function Fthen the probability of X ∈ [a, b] equals

Pr (a ≤ X ≤ b) = F (b)− F (a) =

b∫a

f(x) dx.

The motivation for the definition of the probability density function is the factthat it corresponds to the probability of X being equal to a singleton. We seethat for every ε > 0 value εf(b) approximates Pr (b− ε ≤ X ≤ b), if it is smallenough. However for continuous variables the event X = b is impossible unless F isin-continuous in b and thus f(b) =∞.

After observing the previous fact we can rewrite F (t) to the form:

F (t) = Pr (X ≤ t) =

t∫−∞

Pr (X = t) dt.

Definition B.16 (Expected value). Let X be a discrete random variable. Its ex-pected value, E (X), is defined as

E (X) =∑x∈N

xPr (X = x) .

In the continuous case, assume that X is a continuous random variable, then itsexpected value, E (X), is defined as

E (X) =

∞∫−∞

xPr (X = x) dx.

Definition B.17 (Variance). Let X be a random variable. Its variance, Var (X),is defined as

Var (X) = E((X − E (X))2

).

We only show the basic properties of the expected value and variance withoutthe proof.

Lemma B.18. Assume that X and Y are random variables and a, b, c ∈ R. Then

(1) E (aX + bY + c) = aE (X) + bE (Y ) + c

(2) Var (aX + bY + c) = a2Var (X) + b2Var (Y )

106

Definition B.19 (Expected value of a function of a random variable). Let X be acontinuous random variable and g : R→ R be a function. The expected value of therandom variable Y = g(X) is defined as

E (Y ) =

∞∫−∞

yPr (Y = y) dy =

∞∫−∞

g(x)Pr (X = x) dx.

Lemma B.20. Let X, Y be random variables and t ∈ R. For every z ∈ [0, 1] setωt(z) = y ∈ R | z = Pr (X = t | Y = y). Let Zt be a random variable bounded inthe interval [0, 1] such that

Pr (Zt = z) =

∫ωt(z)

Pr (Y = y) dy.

Then Pr (X = t) = E (Zt).

Proof. By the Law of Total Probability we state the following fact:

Pr (X = t) =

∞∫−∞

Pr (X = t, Y = y) dy

=

∞∫−∞

Pr (X = t | Y = y) Pr (Y = y) dy

=

1∫0

zPr (Zt = z) dz

= E (Zt) .

So the expected value of the variable Z equals the probability Pr (X = t).

In Lemma 5.10 we refer to the modification of Lemma B.20 for X ≥ t. In thismodified version we assume that variable Z = Pr (X ≥ t | Y = y). A straightforwardinspection proves the consequence.

The following two theorems, the Markov’s [31] and Chebyshev’s [15] inequalities,are well known and we often refer to them in the work. Let us note that variousimprovements for the higher moments hold as well.

Theorem B.21 (Markov’s inequality). Let X be a random variable and t ∈ R+.Then the probability of the event X ≥ t is bounded as

Pr (|X| ≥ t) ≤ E (|X|)t

.

107

Proof. For the indicator I(|X| ≥ t) we have that tI(|X| ≥ t) ≤ |X|. If |X| ≥ t,than I(|X| ≥ t) = 1 and it follows that tI(|X| ≥ t) = t ≤ |X|. If |X| < t, then0 = I(|X| ≥ t) ≤ |X|.

E (|X|) =

∞∫0

|X|Pr (|X| = x) dx

≥∞∫

0

tI(|X| ≥ t)Pr (|X| = x) dx

= t

∞∫0

I(|X| ≥ t)Pr (|X| = x) dx

= tPr (|X| ≥ t) .

Theorem B.22 (Chebyshev’s inequality). If X is a random variable, then

Pr (|X − E (X) | ≥ ε) ≤ Var (X)

ε2.

Proof. First observe that the event |X − E (X) | ≥ ε is equivalent to the event(X − E (X))2 ≥ ε2. The theorem follows from the Markov’s inequality used for thelatter event and from the definition of variance.

Pr (|X − E (X) | ≥ ε) = Pr((X − E (X))2 ≥ ε2

)≤ E ((X − E (X))2)

ε2

=Var (X)

ε2.

From the Chebyshev’s inequality it follows that the average converges to the ex-pected value. This fact is commonly referred to as the Weak Law of Large Numbers.

Theorem B.23 (Weak Law of Large Numbers). Let n ∈ N, ε > 0 and X1, . . . , Xn

be independent identically distributed random variables. Then

Pr

(∣∣∣∣∑ni=1Xi

n− E (X1)

∣∣∣∣ ≥ ε

)≤ Var (X1)

nε2.

Proof. Define the average of X1, . . . , Xn as the random variable X =Pni=1Xin

. Fromthe properties of the expected value and variance stated in Lemma B.18 it followsthat

E(X)

= E (X1)

Var(X)

=Var (X1)

n.

108

The Chebyshev’s inequality used for the random variable X yields the requiredresult.

Following lemma enables us to compute the expected value of a continuous ran-dom variable if we know its cumulative probability density function.

Lemma B.24. Let X be a continuous random variable taking only non-negativevalues. Let F : R+

0 → [0, 1] be its cumulative probability density function. Then

E (X) =

∞∫0

1− F (x) dx

Proof. From the definition of the expected value and the cumulative probabilitydensity function we have that

E (X) =

∞∫0

tPr (X = t) dt

=

∞∫0

t∫0

Pr (X = t) dx dt

=

∞∫0

∞∫x

Pr (X = t) dt dx

=

∞∫0

Pr (X ≥ x) dx

=

∞∫0

1− F (x) dx.

109

properties of universal hashingktiml.mff.cuni.cz/~babka/hashing/thesis.pdf · charles university in...

Documents