Hash Table Theory and chaining

Upload: terence-chandler

Post on 13-Jan-2016


Page 1: Hash Table Theory and chaining. Hash table: Theory and chaining Hash Table Formalism for hashing functions Resolving collisions by chaining

Hash Table

Theory and chaining


Hash table: Theory and chaining

Hash Table

Formalism for hashing functions

Resolving collisions by chaining


Formalism


Formalism for hashing functions

Hash table

Let's denote by U the set of possible key values.

Let's denote by n the size of the hash table.

h: U → {0, 1, …, n – 1}

[Figure: the hashing function h maps a key from U to a position in the table.]

Example: keys are words of 6 characters, and the hash table has 13 cells.

h: Σ^6 → {0, 1, …, 12}, where Σ is the alphabet and |Σ| = 26 is the size of the alphabet, so |U| = 26^6.
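As an illustration, here is one possible h for this example. It is a sketch that assumes lowercase words over a–z and reads each word as a base-26 number reduced modulo 13; the slides do not say which function they use, and the name `word_hash` is hypothetical.

```python
TABLE_SIZE = 13  # number of cells in the example table
ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumed 26-letter alphabet


def word_hash(word: str, n: int = TABLE_SIZE) -> int:
    """Map a 6-letter lowercase word to a position in {0, ..., n - 1}.

    Illustrative choice: treat the word as a base-26 number (Horner's rule)
    and reduce it modulo n.
    """
    if len(word) != 6 or any(c not in ALPHABET for c in word):
        raise ValueError("expected exactly 6 lowercase letters")
    value = 0
    for c in word:
        # Reducing at every step gives the same result as reducing at the
        # end, but keeps the intermediate value small.
        value = (value * 26 + ALPHABET.index(c)) % n
    return value


if __name__ == "__main__":
    for w in ["banana", "orange", "cherry"]:
        print(w, word_hash(w))
```

Since 26^6 words share only 13 positions, collisions are unavoidable: for instance "aaaaaa" and "aaaaan" both land in cell 0.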


Formalism for hashing functions

h: U → {0, 1, …, n – 1}

Let's say that you were living in a really perfect world. What's the best feature of a hash function you could ever dream of?

The only problem we have is a collision: two keys mapped to the same position.

h is perfect iff there is no collision.

Under which conditions can that happen?

Only if the hash table is at least as large as the number of different keys we're expecting (otherwise, by the pigeonhole principle, two keys must share a position).

If the table is exactly as large, h is a minimal perfect hash function.
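To make "perfect" and "minimal perfect" concrete, the sketch below checks whether a function is collision-free on a known set of integer keys, and searches a small, hypothetical family h_a(k) = ((a·k) mod p) mod n for a multiplier a that is minimal perfect (n equal to the number of keys). The family, the prime p = 101 and the function names are assumptions for illustration only; practical minimal perfect hashing schemes are more elaborate.

```python
from typing import Callable, Iterable, List, Optional


def is_perfect(h: Callable[[int], int], keys: Iterable[int], n: int) -> bool:
    """True iff h sends every key into {0, ..., n - 1} and no two keys collide."""
    keys = list(keys)
    positions = [h(k) for k in keys]
    in_range = all(0 <= pos < n for pos in positions)
    return in_range and len(set(positions)) == len(keys)


def find_minimal_perfect(keys: List[int], p: int = 101, max_a: int = 1000) -> Optional[int]:
    """Return the first multiplier a such that h_a(k) = ((a * k) mod p) mod n,
    with n = len(keys), is a minimal perfect hash function on keys; else None."""
    n = len(keys)  # minimal: the table has exactly as many cells as keys
    for a in range(1, max_a):
        # h_a is built and checked in the same iteration, so the lambda
        # sees the current value of a.
        if is_perfect(lambda k: (a * k % p) % n, keys, n):
            return a
    return None


if __name__ == "__main__":
    keys = [10, 22, 37, 41]
    print(find_minimal_perfect(keys))
```

For the keys 10, 22, 37 and 41 the search returns a = 5 (positions 2, 1, 0 and 3). By contrast, h(k) = k mod 7 is also collision-free on these keys, so it is perfect, but not minimal, since it uses 7 cells for 4 keys.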


Formalism for hashing functions

h: U → {0, 1, …, n – 1}

We are interested in hash tables smaller than the set S of keys we have to handle:

S ⊆ U, |S| > n

Thus, we will have collisions.

What's the best behaviour of the hash function we can hope for?

[Figure: number of times each position gets hit, plotted against the position.]

Uniform distribution: if x ≠ y then P[h(x) = h(y)] = 1/n.
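The uniform-distribution property can be checked empirically. The sketch below models an idealised h by drawing independent, uniformly random positions for two distinct keys (an assumption: real hash functions are deterministic) and estimates P[h(x) = h(y)] over many draws; the estimate should be close to 1/n. The function name is hypothetical.

```python
import random


def estimate_collision_probability(n: int, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate P[h(x) = h(y)] for x != y when h is a uniformly random function.

    Each trial draws fresh, independent positions h(x) and h(y) in
    {0, ..., n - 1}, which is how a uniformly random h treats two distinct keys.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if rng.randrange(n) == rng.randrange(n):
            hits += 1
    return hits / trials


if __name__ == "__main__":
    n = 13
    print(estimate_collision_probability(n), 1 / n)
```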


Formalism for hashing functions


h: U → {0, 1, …, n – 1}

S ⊆ U, |S| > n. Uniform distribution.

Since there will be collisions, we will have to probe for empty spots.

$\langle h_0(x), h_1(x), \ldots, h_{n-1}(x) \rangle$ is the probe sequence for a key x.

• What is the probability that $T[h_0(x)]$ is already used?

Let's say we have m elements in the table. All cells have the same probability of being used, so the probability is m/n.


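• If we have to probe more than once, which cells could the remaining sequence target?

Any permutation of $\{0, 1, 2, \ldots, n-1\} \setminus \{h_0(x)\}$.

We always make the first probe; with probability m/n it hits a used cell, and we are then left with the same problem on the n – 1 other cells, which hold m – 1 elements. The expected number of probes is therefore:

$$E[T(n,m)] = 1 + \frac{m}{n}\,E[T(n-1,m-1)]$$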

The number of times you have to probe is the complexity of a lookup.
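The recurrence assumes "uniform hashing": the probe sequence of a key is a uniformly random permutation of the n cells. Under that assumption, the sketch below simulates how many probes a lookup for an absent key (or an insertion) needs when m of the n cells are used. For n = 10 and m = 5, unrolling the recurrence gives 11/6 ≈ 1.83, and the simulated average should land close to it. The function names are hypothetical.

```python
import random


def probes_until_empty(n: int, m: int, rng: random.Random) -> int:
    """Count probes along a random probe sequence until an empty cell is hit.

    m of the n cells are used (chosen at random), and the probe sequence is a
    uniformly random permutation of all n cells (uniform hashing).
    """
    used = set(rng.sample(range(n), m))
    sequence = list(range(n))
    rng.shuffle(sequence)
    for probes, cell in enumerate(sequence, start=1):
        if cell not in used:
            return probes
    raise ValueError("table is full (m == n)")


def average_probes(n: int, m: int, trials: int = 20_000, seed: int = 0) -> float:
    """Average number of probes over many independent simulated lookups."""
    rng = random.Random(seed)
    return sum(probes_until_empty(n, m, rng) for _ in range(trials)) / trials


if __name__ == "__main__":
    print(average_probes(10, 5))
```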


If the hash table is empty, how many extra probes will we need?

0: the first probe is the right one. So E[T(n,0)] = 1.

With the base case proven, let's prove by induction on m that E[T(n,m)] ≤ n/(n – m):

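$$
\begin{aligned}
E[T(n,m)] &= 1 + \frac{m}{n}\,E[T(n-1,m-1)] \\
&\le 1 + \frac{m}{n}\cdot\frac{n-1}{(n-1)-(m-1)} = 1 + \frac{m}{n}\cdot\frac{n-1}{n-m} && \text{(induction hypothesis)} \\
&\le 1 + \frac{m}{n}\cdot\frac{n}{n-m} && \text{because } n-1 \le n \\
&= 1 + \frac{m}{n-m} = \frac{n}{n-m}
\end{aligned}
$$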
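The induction can be sanity-checked numerically. This sketch evaluates the recurrence exactly with rational arithmetic (Python's `fractions.Fraction`) and confirms E[T(n,m)] ≤ n/(n – m) for every 0 ≤ m < n up to a small n; `expected_probes` and `check_bound` are hypothetical helper names.

```python
from fractions import Fraction
from functools import lru_cache


@lru_cache(maxsize=None)
def expected_probes(n: int, m: int) -> Fraction:
    """E[T(n, m)] from the recurrence, with E[T(n, 0)] = 1 as the base case."""
    if m == 0:
        return Fraction(1)
    return 1 + Fraction(m, n) * expected_probes(n - 1, m - 1)


def check_bound(max_n: int) -> bool:
    """True iff E[T(n, m)] <= n / (n - m) for all 1 <= n <= max_n and 0 <= m < n."""
    return all(
        expected_probes(n, m) <= Fraction(n, n - m)
        for n in range(1, max_n + 1)
        for m in range(n)
    )


if __name__ == "__main__":
    print(expected_probes(10, 5), check_bound(50))
```

For n = 10 and m = 5 the exact value is 11/6, comfortably below the bound 10/5 = 2.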


The ratio of used cells m/n is called the load factor and denoted α.

E[T(n,m)] ≤ n/(n – m) = 1/(1 – α), which is in O(1) as long as α is kept below a constant strictly less than 1.
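Plugging a few load factors into the bound shows why α must stay away from 1: the bound doubles between α = 0.5 and α = 0.75 and reaches 10 at α = 0.9. A minimal sketch (the helper names are hypothetical):

```python
def load_factor(m: int, n: int) -> float:
    """alpha = m / n, the fraction of used cells."""
    return m / n


def probe_bound(alpha: float) -> float:
    """Upper bound 1 / (1 - alpha) on the expected number of probes."""
    if not 0 <= alpha < 1:
        raise ValueError("load factor must satisfy 0 <= alpha < 1")
    return 1 / (1 - alpha)


if __name__ == "__main__":
    for alpha in (0.25, 0.5, 0.75, 0.9, 0.99):
        print(f"alpha = {alpha:.2f}: at most {probe_bound(alpha):.1f} probes expected")
```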


Heuristics


We’ve been wandering in the realms of somewhat pure mathematics.

By now you probably all love it.

But let’s come back to reality for a minute.


In practice, the probe sequences that we generate are not truly random. We have:

• Linear probing: $h_i(x) = (h(x) + i) \bmod n$

• Quadratic probing: $h_i(x) = (h(x) + i^2) \bmod n$

• Double hashing: $h_i(x) = (h(x) + i \cdot s(x)) \bmod n$, where s(x) is a secondary hashing function, commonly $s(x) = q - (x \bmod q)$ for a prime q smaller than n.
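The three schemes differ only in how the i-th probe is computed. The sketch below assumes integer keys and h(x) = x mod n as the primary hash (an illustrative choice), generates each probe sequence, and uses it to insert into an open-addressing table. The default q = 5 and all function names are hypothetical.

```python
from typing import Callable, List, Optional


def h(x: int, n: int) -> int:
    """Primary hash (illustrative choice): x mod n."""
    return x % n


def s(x: int, q: int) -> int:
    """Secondary hash for double hashing: q - (x mod q), always in 1..q, never 0."""
    return q - (x % q)


# All probe functions share one signature; q is only used by double hashing.
def linear(x: int, i: int, n: int, q: int) -> int:
    return (h(x, n) + i) % n


def quadratic(x: int, i: int, n: int, q: int) -> int:
    return (h(x, n) + i * i) % n


def double_hashing(x: int, i: int, n: int, q: int) -> int:
    return (h(x, n) + i * s(x, q)) % n


Probe = Callable[[int, int, int, int], int]


def probe_sequence(x: int, n: int, probe: Probe, q: int = 5) -> List[int]:
    """The cells h_0(x), ..., h_{n-1}(x) visited for key x."""
    return [probe(x, i, n, q) for i in range(n)]


def insert(table: List[Optional[int]], x: int, probe: Probe, q: int = 5) -> int:
    """Store x in the first empty cell along its probe sequence; return that cell."""
    for cell in probe_sequence(x, len(table), probe, q):
        if table[cell] is None:
            table[cell] = x
            return cell
    raise RuntimeError("no empty cell found along the probe sequence")


if __name__ == "__main__":
    n = 7  # prime table size
    for probe in (linear, quadratic, double_hashing):
        print(probe.__name__, probe_sequence(12, n, probe))
```

With n = 7 and x = 12, quadratic probing only ever visits cells 5, 6, 2 and 0, whereas double hashing (s(12) = 3) visits all seven cells.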


Chaining


Resolving collisions by chaining


You’re back at the train station.

Instead of a seat number, the ticket is a compartment number.

What do you do? You sit with the other people of your compartment.


[Figure: compartments numbered 0 to 4.]


Each cell is now a data structure. Which one?

AVL! LinkedList! HashTable!

Pretty much everything but an ArrayList.

If L(x) denotes the length of the list at T[h(x)], then:

• With a LinkedList, access takes O(1 + L(x)).

• With a balanced binary search tree, access takes O(1 + log L(x)).

The 1 accounts for computing h(x) and jumping to the cell; the second part depends on the complexity of the structure you use.
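A minimal sketch of chaining with a linked list in every cell, assuming integer keys, h(x) = x mod n, and a hand-written singly linked list (`Node`) so that the O(1 + L(x)) cost is visible: one step to reach T[h(x)], then a walk along the list. The class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class Node:
    key: int
    value: Any
    next: Optional["Node"] = None


class ChainedHashTable:
    def __init__(self, n: int) -> None:
        self.n = n
        self.cells: List[Optional[Node]] = [None] * n  # T[0..n-1], each a list head
        self.m = 0  # number of stored elements

    def _h(self, key: int) -> int:
        return key % self.n  # illustrative hash function

    def put(self, key: int, value: Any) -> None:
        i = self._h(key)
        node = self.cells[i]
        while node is not None:  # walk the chain: O(L(x))
            if node.key == key:
                node.value = value  # key already present: overwrite
                return
            node = node.next
        self.cells[i] = Node(key, value, self.cells[i])  # prepend in O(1)
        self.m += 1

    def get(self, key: int) -> Optional[Any]:
        node = self.cells[self._h(key)]  # O(1) to reach the cell
        while node is not None:  # then O(L(x)) along the list
            if node.key == key:
                return node.value
            node = node.next
        return None

    def chain_length(self, key: int) -> int:
        """L(x): number of nodes in the list at T[h(x)]."""
        length, node = 0, self.cells[self._h(key)]
        while node is not None:
            length, node = length + 1, node.next
        return length

    def load_factor(self) -> float:
        return self.m / self.n


if __name__ == "__main__":
    t = ChainedHashTable(5)
    for k in (3, 8, 13, 4):
        t.put(k, str(k))
    print(t.get(8), t.chain_length(3), t.load_factor())
```

Here keys 3, 8 and 13 all hash to cell 3, so L(3) = 3. With uniform hashing the expected length of a chain is m/n = α, which is why chaining also stays O(1) on average while α is bounded.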