Hashing
Department of Computer Science, Islamia College University Peshawar
Fall 2012 Semester, BCS course: CS 00 Analysis of Algorithms
Course Instructor: Mr. Zahid
04/12/23 Lecture #9 Adapted from slides by Dr Onaiza Maqbol
Wednesday, March 18, 2009
Dictionary
A dictionary T holds n records
What data structure should be used to implement T?
Hashing
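Direct Addressing
Assumptions
The set of keys is drawn from the universe U = {0, 1, …, u-1}
Keys are distinct
Create a table T[0..u-1] in which slot k holds the record with key k
Benefit
Each operation takes constant time
Drawback
The range of keys can be large, so T may need far more slots than there are records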
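A minimal sketch of a direct-address table may make the idea concrete. The class name `DirectAddressTable` and its method names are assumptions chosen for illustration; they do not come from the slides.

```python
class DirectAddressTable:
    """Direct-address table: the record with key k lives in slot k."""

    def __init__(self, u):
        # One slot per possible key 0..u-1; None marks an empty slot.
        self.slots = [None] * u

    def insert(self, key, record):
        # Constant time: the key itself is the index.
        self.slots[key] = record

    def search(self, key):
        # Returns the stored record, or None if the slot is empty.
        return self.slots[key]

    def delete(self, key):
        self.slots[key] = None


table = DirectAddressTable(u=10)
table.insert(3, "record for key 3")
found = table.search(3)
missing = table.search(7)
```

The table always allocates u slots, even when only one record is stored, which is the drawback noted above.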
Hashing Solution
Use a hash function h to map the universe U of all keys into {0, 1, …, m–1}
Hash Table
The mapped keys are stored in a table called a hash table
The table consists of m cells (slots)
A hash table requires much less storage than a direct-address table when m is much smaller than |U|
With direct addressing, an element with key k is stored in slot k; with hashing, this element is stored in slot h(k)
So the hash function is h : U → {0, 1, …, m-1}
h(k) is also called the hash value of key k
Hashing Functions - Modulo Function
Several functions can be used to map keys into a set of integers. The choice is made on the basis of the computation time required and the simplicity of the computational steps. A common choice is the modulo function h(k), defined as:
h(k) = k mod m
where k is the key, m is some positive integer, and mod denotes the modulus operator, which computes the remainder when key k is divided by m.
It follows that the hash function h maps the set of keys {k1, k2, k3, …, kn} into the set of integers {0, 1, 2, …, m-1}
In essence, the modulo function is used to create a hash table of size m
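As a small illustration of the modulo function, the sketch below hashes a few keys into a table of size m. The table size 7 and the sample keys are assumptions picked for the example, not values from the slides.

```python
def mod_hash(k, m):
    """Modulo hash function h(k) = k mod m."""
    return k % m


m = 7                      # assumed table size for this example
keys = [10, 22, 37, 40]    # assumed sample keys
slots = [mod_hash(k, m) for k in keys]
# slots is [3, 1, 2, 5]: every hash value lies in 0..m-1
```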
Hashing Functions - Multiplication Method
Hashing of Strings
ASCII Sum Method
Radix Method
Universal Hashing
Universal Hashing (contd…)
The hash function h_{a,b}(k) = ((ak + b) mod p) mod m, where p is a prime large enough that every possible key k is in the range 0 to p-1, inclusive, and 0 < a < p and 0 ≤ b < p, belongs to the family of universal hash functions
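The family h_{a,b} can be sketched as follows. The helper names `universal_hash` and `pick_hash`, and the sample values p = 17, m = 6, a = 3, b = 4, are assumptions made for illustration; a and b would normally be drawn at random once, when the table is created.

```python
import random


def universal_hash(k, a, b, p, m):
    """h_{a,b}(k) = ((a*k + b) mod p) mod m, with p prime and p > k."""
    return ((a * k + b) % p) % m


def pick_hash(p, m, rng):
    """Draw one member of the family at random (hypothetical helper)."""
    a = rng.randrange(1, p)   # 0 < a < p
    b = rng.randrange(0, p)   # 0 <= b < p
    return lambda k: universal_hash(k, a, b, p, m)


# Fixed parameters: ((3*8 + 4) mod 17) mod 6 = (28 mod 17) mod 6 = 11 mod 6 = 5
example = universal_hash(8, a=3, b=4, p=17, m=6)

# Randomly chosen member of the family; a seeded generator keeps the run repeatable.
h = pick_hash(p=17, m=6, rng=random.Random(0))
```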
Perfect Hashing
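Using perfect hashing to store the keys {10, 22, 37, 40, 60, 70, 75}, the outer hash function is h_{a,b}(k) = ((ak + b) mod p) mod m, where a = 3, b = 42, p = 101, and m = 9. For example, h(75) = 2, so key 75 hashes to slot 2 of the primary table. Slot 2 has its own secondary hash table with its own hash function h2. Since h2(75) = 1, key 75 is stored in slot 1 of that secondary hash table
Primary table slots: 0, 1, 2, 3, …, 8
Secondary table for slot 2: m2 = 4, a2 = 10, b2 = 18, S2 holds the keys 60 and 75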
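The numbers in this example can be checked with a short sketch. It assumes that the secondary function for slot 2 has the same form as the outer one, with a2 = 10, b2 = 18, m2 = 4 and the same p = 101; the helper name `h_ab` is illustrative.

```python
def h_ab(k, a, b, p, m):
    """((a*k + b) mod p) mod m"""
    return ((a * k + b) % p) % m


keys = [10, 22, 37, 40, 60, 70, 75]
P = 101

# Outer (primary) hash with a = 3, b = 42, m = 9.
outer = {k: h_ab(k, 3, 42, P, 9) for k in keys}

# Group the keys by primary slot to see which ones collide.
buckets = {}
for k in keys:
    buckets.setdefault(outer[k], []).append(k)

# Secondary hash for primary slot 2 (assumed parameters a2 = 10, b2 = 18, m2 = 4).
secondary = {k: h_ab(k, 10, 18, P, 4) for k in buckets[2]}
```

Keys 60 and 75 share primary slot 2, and the secondary function sends them to different secondary slots, so there is no collision at the second level.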
Collisions
Two or more keys may hash to the same slot
When a record to be inserted maps to an already occupied slot in T, a collision occurs
Can we avoid collisions altogether?
Not if |U| > m
We need a method to resolve collisions that occur
Collision Resolution
Two basic approaches to collision resolution are called chained hashing and open address hashing
Chained Hashing: In chained hashing the elements of a hash table are stored in a set of linked lists. All colliding elements are kept in one linked list. The list head pointers are usually stored in an array. Chained hashing is also known as open hashing
Open Address Hashing: In open address hashing, the hashed keys are stored in the hash table itself. The colliding keys are allocated distinct cells in the table. Open address hashing is also referred to as closed hashing
Collision Resolution by Chaining
Records in the same slot are linked into a list
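A minimal sketch of chaining is given below. It uses Python lists in place of linked lists and h(k) = k mod m as the hash function; both are simplifying assumptions, and the class name `ChainedHashTable` is illustrative.

```python
class ChainedHashTable:
    """Hash table with chaining; each slot holds a list of colliding keys."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one (initially empty) chain per slot

    def _h(self, key):
        return key % self.m                   # assumed hash function

    def insert(self, key):
        chain = self.slots[self._h(key)]
        if key not in chain:                  # presence check walks the chain
            chain.insert(0, key)              # new keys go to the front of the chain

    def search(self, key):
        return key in self.slots[self._h(key)]

    def delete(self, key):
        chain = self.slots[self._h(key)]
        if key in chain:
            chain.remove(key)


t = ChainedHashTable(m=5)
for key in (12, 7, 22):     # all three keys hash to slot 2
    t.insert(key)
chain_at_2 = list(t.slots[2])
```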
Analysis of Hashing with Chaining
How long does it take to search for an element with a given key?
Let n be the number of keys in the table, and let m be the number of slots
Define the load factor of T to be α = n/m = average number of keys per slot
Analysis is in terms of α, which can be less than, equal to, or greater than 1
Worst-Case Hashing - Searching
All keys hash to the same slot, so they are all stored in a single list
This situation may be referred to as the worst distribution of hash keys
In practice this extreme situation may not arise, but the possibility does exist
The worst-case time for searching is thus Θ(n), plus the time to compute the hash function
The best search time is Θ(1), which occurs when the key is found in the front node
On average, half the list will be examined. Thus, the average search time is also Θ(n)
Worst-Case Hashing - Insertion
Adding a key at the front of its list takes Θ(1) time, even in the worst case
This assumes that the key is not already present in the table
To check for presence, a search for the key is required first. As just mentioned, the worst-case time of searching is Θ(n)
Thus the worst-case running time of insertion with a presence check is Θ(n)
The average running time of insertion with a presence check is also Θ(n)
Simple Uniform Hashing - Searching
The keys are uniformly distributed among all the linked lists, i.e. it is assumed that any given element is equally likely to hash into any of the m slots
Let n_j denote the length of the list T[j] for j = 0, 1, …, m-1, so that n = n_0 + n_1 + … + n_{m-1} and the expected value of n_j is E[n_j] = α = n/m
We assume that the hash value h(k) can be computed in O(1) time
So the time required to search for an element with key k depends linearly on the length n_{h(k)} of the list T[h(k)]
Simple Uniform Hashing - Searching (contd…)
Two cases: unsuccessful search and successful search
Unsuccessful search
The expected time to search unsuccessfully for a key k is the expected time to search to the end of list T[h(k)], which has expected length E[n_{h(k)}] = α
Thus the total time required, including the time to compute h(k), is Θ(1 + α)
Simple Uniform Hashing - Insertion
To find the average time for inserting a key, consider the case when the kth key is inserted. At that stage, the table already holds k-1 keys distributed uniformly over the m linked lists. Thus, prior to insertion of the kth key, the average length of each list is (k-1)/m, as shown in the diagram
Simple Uniform Hashing - Insertion (contd…)
The insertion of a new key requires probing (k-1)/m keys on average, plus the cost of adding the new key.
Thus, the overall cost of inserting the kth key is 1 + (k-1)/m, assuming that each operation takes unit time.
The expected cost of inserting a key is obtained by averaging over all possible values of k = 1, 2, …, n. Thus, the expected cost I is given by
I = (1/n) Σ_{k=1..n} (1 + (k-1)/m) = 1 + (n-1)/(2m)
The average cost of inserting a key is therefore 1 + α/2 - 1/(2m) = Θ(1 + α)
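The closed form 1 + α/2 - 1/(2m) can be checked numerically by averaging the per-key cost 1 + (k-1)/m over k = 1, …, n. The sketch below uses exact fractions; the function names and the sample sizes n = 8, m = 4 are assumptions for illustration.

```python
from fractions import Fraction


def average_insert_cost(n, m):
    """Average of 1 + (k-1)/m over k = 1..n, computed exactly."""
    total = sum(1 + Fraction(k - 1, m) for k in range(1, n + 1))
    return total / n


def closed_form(n, m):
    """1 + alpha/2 - 1/(2m), with alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - Fraction(1, 2 * m)


cost = average_insert_cost(8, 4)    # (8 + 28/4) / 8 = 15/8
formula = closed_form(8, 4)         # 1 + 1 - 1/8 = 15/8
```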
Simple Uniform Hashing - Searching (contd…)
Successful search
We assume that the element x being searched for is equally likely to be any of the n elements stored in the table
The number of elements examined is one more than the number of elements that appear before x in x's list
Elements before x in the list were all inserted after x was inserted, because new elements are placed at the front of the list
The total time required for a successful search is 1 + α/2 - α/(2n) = Θ(1 + α)
If n = O(m), then α = n/m = O(m)/m = O(1). Thus searching takes constant time on average
Open Addressing
All elements are stored in the hash table itself
In open addressing, the hash table can fill up, so that no further insertions can be made
The load factor α can never exceed 1
The advantage is that open addressing avoids pointers altogether
The memory freed by not storing pointers gives the hash table a larger number of slots for the same amount of memory
Insertion
We successively examine, or probe, the hash table until we find an empty slot in which to put the key
The sequence of positions probed depends upon the key being inserted
To determine which slots to probe, we extend the hash function to include the probe number as a second input. Thus the hash function becomes:
h : U × {0, 1, …, m-1} → {0, 1, …, m-1}
Pseudocode
HASH-INSERT(T, k)
  i ← 0
  repeat
    j ← h(k, i)
    if T[j] = NIL
      then T[j] ← k
           return j
      else i ← i + 1
  until i = m
  error "Table full"
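A runnable sketch of HASH-INSERT is shown below. The probe function is passed in as a parameter; the one used in the demo, `probe`, is only a placeholder assumption, and `None` plays the role of NIL.

```python
def hash_insert(table, key, h):
    """Insert key into an open-addressed table; h(key, i) gives the i-th probe slot."""
    m = len(table)
    for i in range(m):              # at most m probes
        j = h(key, i)
        if table[j] is None:        # empty slot found
            table[j] = key
            return j
    raise OverflowError("Table full")


m = 5
table = [None] * m


def probe(k, i):
    # Placeholder probe function for the demo (an assumption, not from the slides).
    return (k + i) % m


slots = [hash_insert(table, k, probe) for k in (7, 12, 3)]
# 7 goes to slot 2; 12 finds slot 2 taken and lands in 3; 3 finds slot 3 taken and lands in 4
```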
Linear Probing
In linear probing, the slot index is incremented by one on each successive probe, wrapping around at the end of the table. In general the hash function is defined as
h(k, i) = (h'(k) + i) mod m,
where h'(k) is an auxiliary hash function, i = 0, 1, …, m-1 is the probe number, and m is the table size.
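The probe sequence produced by this formula can be listed directly. The sketch assumes the auxiliary hash function h'(k) = k mod m, which the slides do not specify; the function names are illustrative.

```python
def h_aux(k, m):
    """Assumed auxiliary hash function h'(k) = k mod m."""
    return k % m


def linear_probe(k, i, m):
    """h(k, i) = (h'(k) + i) mod m"""
    return (h_aux(k, m) + i) % m


def probe_sequence(k, m):
    """All m slots in the order linear probing visits them for key k."""
    return [linear_probe(k, i, m) for i in range(m)]


sequence = probe_sequence(13, 5)
# h'(13) = 3, so the slots are visited in the order [3, 4, 0, 1, 2]
```

Every slot appears exactly once in the sequence, so an insertion fails only when the table is completely full.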
Searching
HASH-SEARCH(T, k)
  i ← 0
  repeat
    j ← h(k, i)
    if T[j] = k
      then return j
    i ← i + 1
  until T[j] = NIL or i = m
  return NIL
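HASH-SEARCH can be sketched in runnable form as follows. The table contents and the linear probe function `probe` are assumptions set up by hand for the demo, and `None` stands for NIL.

```python
def hash_search(table, key, h):
    """Return the slot holding key, or None if the key is not in the table."""
    m = len(table)
    for i in range(m):
        j = h(key, i)
        if table[j] == key:
            return j
        if table[j] is None:    # an empty slot ends the probe sequence
            break
    return None


m = 5
table = [None, None, 7, 12, None]   # assumed state: 7 and 12 both hashed to slot 2


def probe(k, i):
    # Linear probing with the assumed auxiliary hash k mod m.
    return (k % m + i) % m


slot_of_12 = hash_search(table, 12, probe)   # probes slot 2 (holds 7), then slot 3
slot_of_17 = hash_search(table, 17, probe)   # probes 2, 3, then the empty slot 4
```

The search for 17 stops at the first empty slot, because an insertion of 17 would have placed it there.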
Quadratic Probing