Hash table: Theory and chaining

Formalism for hashing functions

Resolving collisions by chaining

Formalism


Hash table

Let’s denote by U the set of possible values of the keys.

Let’s denote by n the size of the hash table.

h: U → {0, 1, …, n – 1}

[Figure: the hashing function h maps a key in U to a position in the table.]

Example: keys are words of 6 characters, and the hash table has 13 cells.

h: U → {0, 1, …, 12}, with |U| = 26^6, where 26 is the size of the alphabet.
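To make the example concrete, here is a minimal sketch of one possible hashing function for 6-letter lowercase words and a table of 13 cells. The polynomial scheme and the multiplier 31 are assumptions made for illustration, not something the slides prescribe.

```python
TABLE_SIZE = 13   # number of cells in the example
MULTIPLIER = 31   # assumed multiplier; 26 would be a poor choice here because
                  # 26 is a multiple of 13, so only the last letter would matter

def h(word: str) -> int:
    """Illustrative hash: combine the letters polynomially, then reduce mod 13."""
    value = 0
    for ch in word:
        value = value * MULTIPLIER + (ord(ch) - ord("a"))
    return value % TABLE_SIZE

# Every key lands on a position in {0, 1, ..., 12}.
positions = [h(w) for w in ("banana", "cherry", "orange")]
```

Whatever the scheme, the only hard requirement is that h always returns a valid position of the table.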

h: U → {0, 1, …, n – 1}

Let’s say that you were living in a really perfect world. What’s the best feature of a hash function you could ever dream of?

The only problem that we have is when there is a collision: two keys mapped to the same position.

h is perfect iff there is no collision.

In which conditions can that happen?

If the hash table is at least as large as the number of different keys we’re expecting.

If the table is exactly as large, h is a minimal perfect hash function.
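The two definitions can be checked mechanically on a small key set. The sketch below is illustrative only: the helper names is_perfect and is_minimal_perfect and the sample keys are hypothetical.

```python
def is_perfect(h, keys) -> bool:
    """True iff h maps no two distinct keys to the same position."""
    distinct_keys = set(keys)
    positions = {h(key) for key in distinct_keys}
    return len(positions) == len(distinct_keys)

def is_minimal_perfect(h, keys, n: int) -> bool:
    """Perfect, and the table has exactly as many cells as there are keys."""
    return len(set(keys)) == n and is_perfect(h, keys)

keys = [10, 21, 32, 43]
fits = is_perfect(lambda k: k % 4, keys)    # positions 2, 1, 0, 3: no collision
clash = is_perfect(lambda k: k % 3, keys)   # 10 and 43 both map to 1
minimal = is_minimal_perfect(lambda k: k % 4, keys, 4)
```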

h: U → {0, 1, …, n – 1}

We are interested in hash tables smaller than the set S of possible keys.

S ⊆ U, |S| > n

Thus, we will have collisions.

What’s the best behaviour of the hash function we can hope for?

[Figure: histogram of the number of times each position gets hit, per position.]

Uniform distribution.

If x ≠ y then P[h(x) = h(y)] = 1/n
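The 1/n figure can be checked with a small simulation. This is a sketch under a strong assumption: h(x) and h(y) are modelled as independent uniform random draws, which is the idealised behaviour and not a real hashing function. The function name and the parameters are hypothetical.

```python
import random

def collision_rate(n: int, trials: int, seed: int = 0) -> float:
    """Estimate P[h(x) = h(y)] for x != y under an idealised uniform hash."""
    rng = random.Random(seed)
    collisions = 0
    for _ in range(trials):
        # Model h(x) and h(y) as independent uniform draws from {0, ..., n-1}.
        hx = rng.randrange(n)
        hy = rng.randrange(n)
        collisions += hx == hy
    return collisions / trials

estimate = collision_rate(n=13, trials=100_000)   # should be close to 1/13
```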


h: U → {0, 1, …, n – 1}

S ⊆ U, |S| > n. Uniform distribution.

Since there will be collisions, we will have to probe for empty spots.

<h₁(x), h₂(x), …, hₙ₋₁(x)> is the probe sequence for a key x.

• What is the probability that T[h₁(x)] is already used?

Let’s say we have m elements in the table.

All cells have the same probability of being used.

The probability is m/n.


• If we have to probe more than once, what could be the cells targeted by the remaining sequence?

Any permutation of {0, 1, 2, …, n – 1} \ {h₁(x)}

The expected number of probes is: E[T(n,m)] = 1 + (m/n) · E[T(n–1, m–1)]

The number of times you have to probe is the complexity of a lookup.
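The recurrence can be evaluated directly. The sketch below uses exact fractions and takes E[T(n,0)] = 1 as the base case, since the first probe into an empty table always succeeds. The function name expected_probes is hypothetical.

```python
from fractions import Fraction

def expected_probes(n: int, m: int) -> Fraction:
    """E[T(n, m)] = 1 + (m/n) * E[T(n-1, m-1)], with E[T(n, 0)] = 1."""
    if not 0 <= m <= n:
        raise ValueError("need 0 <= m <= n")
    if m == 0:
        return Fraction(1)
    return 1 + Fraction(m, n) * expected_probes(n - 1, m - 1)

half_full = expected_probes(13, 6)   # 13 cells, 6 of them used: 7/4 probes
```

With 6 of 13 cells used, the expected number of probes is still below 2.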


h: U → {0, 1, …, n – 1}

If the hash table is empty, how many extra probes will we need?

0: the first probe is the good one. So T(n,0) = 1.

The base case being proven, let’s prove by induction that E[T(n,m)] ≤ n/(n–m).

E[T(n,m)] = 1 + (m/n) · E[T(n–1, m–1)]

≤ 1 + (m/n) · (n–1)/(n–m), since by the induction hypothesis E[T(n–1, m–1)] ≤ (n–1)/(n–1–m+1) = (n–1)/(n–m)

≤ 1 + (m/n) · n/(n–m), because (n–1)/n ≤ 1

= 1 + m/(n–m)

= n/(n–m)


The ratio of used cells m / n is called the load factor and denoted α.

E[T(n,m)] ≤ n/(n–m) = 1/(1 – α) ∈ O(1), provided α is bounded by a constant smaller than 1.
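To see how quickly the bound grows with the load factor, here is a small sketch that evaluates n/(n–m) for a table of 100 cells. The function name probe_bound and the sample sizes are assumptions made for illustration.

```python
def probe_bound(n: int, m: int) -> float:
    """Upper bound n / (n - m) = 1 / (1 - alpha) on the expected probes."""
    if not 0 <= m < n:
        raise ValueError("need 0 <= m < n")
    return n / (n - m)

# Bound for a table of 100 cells at load factors 0.5, 0.75 and 0.9.
bounds = [probe_bound(100, m) for m in (50, 75, 90)]   # [2.0, 4.0, 10.0]
```

The bound stays small while α stays away from 1, which is why tables are usually resized before they fill up.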

Heuristics

We’ve been wandering in the realms of somewhat pure mathematics.

By now you probably all love it.

But let’s come back to reality for a minute.

The probe sequences that we generate are not totally random. We have:

• Linear probing: hᵢ(x) = (h(x) + i) mod n

• Quadratic probing: hᵢ(x) = (h(x) + i²) mod n

• Double hashing: hᵢ(x) = (h(x) + i·s(x)) mod n, where s(x) is a secondary hashing function, commonly s(k) = q – (k mod q) for an integer key k and a constant q.
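The three heuristics above can be sketched with one helper that only differs in the step added at probe i. Several things here are assumptions for illustration: i starts at 0 so that the first probe is h(x) itself, h is a plain "key mod n", the key 31 is arbitrary, and q = 7 is picked as a prime smaller than n.

```python
from typing import Callable, List

def probe_sequence(step: Callable[[int], int], home: int, n: int) -> List[int]:
    """Positions (home + step(i)) mod n for i = 0, 1, ..., n - 1."""
    return [(home + step(i)) % n for i in range(n)]

n = 13              # table size
key = 31            # hypothetical integer key
home = key % n      # h(x): simple modular hash, assumed for illustration
q = 7               # assumed prime smaller than n for the secondary hash
s = q - (key % q)   # s(x) = q - (k % q), always between 1 and q

linear = probe_sequence(lambda i: i, home, n)         # starts 5, 6, 7, ...
quadratic = probe_sequence(lambda i: i * i, home, n)  # starts 5, 6, 9, ...
double = probe_sequence(lambda i: i * s, home, n)     # starts 5, 9, 0, ...
```

With these values, linear probing and double hashing each visit all 13 cells, while quadratic probing only reaches 7 distinct cells, so the choice of heuristic affects which empty spots can be found.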

Chaining


You’re back at the train station.

Instead of a seat number, the ticket is a compartment number.

What do you do? You sit with the other people of your compartment.


[Figure: compartments numbered 0 to 4.]


Each cell is now a data structure. Which one?

AVL! LinkedList! HashTable!

Pretty much everything but an ArrayList.


If L(x) denotes the number of elements stored at T[h(x)], then:

• With a LinkedList, access is in O(1 + L(x)).

• With a balanced binary search tree, access is in O(1 + log L(x)).

The second part depends on the complexity of the structure you use.
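As a rough illustration of chaining with linked lists, here is a minimal sketch. The class and method names are hypothetical, Python's built-in hash() stands in for h, and each cell of T holds the head of a singly linked chain.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass
class Node:
    key: Any
    value: Any
    next: Optional["Node"] = None

class ChainedHashTable:
    def __init__(self, n: int = 13) -> None:
        self.heads = [None] * n          # T[0..n-1]: head node of each chain

    def _index(self, key) -> int:
        return hash(key) % len(self.heads)

    def put(self, key, value) -> None:
        i = self._index(key)
        node = self.heads[i]
        while node is not None:          # overwrite if the key is already there
            if node.key == key:
                node.value = value
                return
            node = node.next
        self.heads[i] = Node(key, value, self.heads[i])   # prepend in O(1)

    def get(self, key):
        node = self.heads[self._index(key)]
        while node is not None:          # O(1 + L(x)) walk along one chain
            if node.key == key:
                return node.value
            node = node.next
        raise KeyError(key)

    def chain_length(self, i: int) -> int:
        count, node = 0, self.heads[i]
        while node is not None:
            count, node = count + 1, node.next
        return count

table = ChainedHashTable(n=5)            # five compartments, numbered 0 to 4
for number in (3, 8, 13, 4):             # 3, 8 and 13 share compartment 3
    table.put(number, f"passenger {number}")
```

Swapping the linked chain for a balanced binary search tree would change the cost of the walk inside a cell, not the way the cell is chosen.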
