
Hashing

Department of Computer Science, Islamia College University Peshawar

Fall 2012 Semester, BCS course: CS 00 Analysis of Algorithms

Course Instructor: Mr. Zahid

04/12/23 Lecture #9 Adapted from slides by Dr Onaiza Maqbol

Wednesday, March 18, 2009

Dictionary: a table T that holds n records

What data structure should be used to implement T?


Hashing



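Direct Addressing

Assumptions: The keys are drawn from the universe U = {0, 1, …, u–1}, and keys are distinct

Create a table T[0..u–1] in which slot k holds the record with key k

Benefit: Each operation takes constant time

Drawback: The range of keys can be large, so the table may need far more storage than the n records actually held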
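As a rough illustration of direct addressing, the sketch below uses the key itself as the array index. The class name `DirectAddressTable` and its method names are hypothetical, chosen only for this example.

```python
class DirectAddressTable:
    """Direct-address table T[0..u-1]: the key itself is the slot index."""

    def __init__(self, u):
        # One slot per possible key, even if only a few keys are ever used.
        self.slots = [None] * u

    def insert(self, key, record):
        self.slots[key] = record      # constant time

    def search(self, key):
        return self.slots[key]        # constant time; None means "absent"

    def delete(self, key):
        self.slots[key] = None        # constant time


table = DirectAddressTable(u=10)
table.insert(3, "record for key 3")
print(table.search(3))
print(table.search(4))
```

Every operation is a single array access, but the table always occupies u slots, which is the drawback noted above when the key range is large.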



Hashing Solution

Use a hash function h to map the universe U of all keys into {0, 1, …, m–1}



Hash Table

The mapped keys are stored in a table called a hash table

The table consists of m cells

A hash table requires much less storage than a direct address table

With direct addressing, an element with key k is stored in slot k; with hashing, this element is stored in slot h(k)

So the hash function is h : U → {0, 1, …, m–1}

h(k) is also called the hash value of key k



Hashing Functions - Modulo Function

Several functions can be used to map keys into a set of integers. The choice is made on the basis of the amount of computation time required and the simplicity of the computational steps. A common choice is the modulo function h(k), defined as:

h(k) = k mod m

where k is the key, m is some positive integer and mod denotes the modulus operator which computes the remainder of key k divided by m.

It follows that the hash function h maps the set of keys {k1, k2, k3, …, kn} into the set of integers {0, 1, 2, …, m–1}

In essence, the modulo function is used to create a hash table of size m
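The modulo function can be sketched in a few lines. The table size m = 7 and the sample keys below are assumptions picked for illustration; they are not taken from the slides.

```python
def h(k, m):
    """Modulo hash function: remainder of key k divided by table size m."""
    return k % m


m = 7                                   # assumed table size
keys = [10, 22, 37, 40, 60, 70, 75]     # assumed sample keys
slots = {k: h(k, m) for k in keys}

for k in keys:
    print(f"h({k}) = {slots[k]}")
```

With these values, keys 40 and 75 both map to slot 5, which is an instance of two distinct keys sharing a hash value.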



Modulo Function (contd…)



Hashing Functions - Multiplication Method



Hashing of Strings



ASCII Sum Method



Radix Method



Universal Hashing



Universal Hashing (contd…)

The function h_a,b(k) = ((ak + b) mod p) mod m, where p is a prime large enough that every possible key k is in the range 0 to p–1, inclusive, and 0 < a < p and 0 <= b < p, belongs to the family of universal hash functions
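A minimal sketch of drawing one function from this family is given below. The helper names `universal_hash` and `random_universal_hash`, and the values p = 17 and m = 6, are assumptions made for the example.

```python
import random


def universal_hash(a, b, p, m):
    """Return h_a,b with h_a,b(k) = ((a*k + b) mod p) mod m."""
    def h(k):
        return ((a * k + b) % p) % m
    return h


def random_universal_hash(p, m, rng):
    """Pick a and b at random with 0 < a < p and 0 <= b < p."""
    a = rng.randrange(1, p)
    b = rng.randrange(0, p)
    return universal_hash(a, b, p, m)


# Fixed parameters (assumed for illustration): p = 17, m = 6, a = 3, b = 4.
h_3_4 = universal_hash(3, 4, 17, 6)
print(h_3_4(8))     # ((3*8 + 4) mod 17) mod 6 = (28 mod 17) mod 6 = 11 mod 6 = 5

# A randomly chosen member of the same family.
g = random_universal_hash(17, 6, random.Random(0))
print([g(k) for k in range(17)])
```

Choosing a and b at random when the table is created, rather than fixing them in advance, is what makes the scheme independent of any particular set of keys.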



Perfect Hashing



Perfect Hashing

Using perfect hashing to store {10, 22, 37, 40, 60, 70, 75}, the outer hash function is h_a,b(k) = ((ak + b) mod p) mod m, where a = 3, b = 42, p = 101, and m = 9. For example, h(75) = 2, so key 75 hashes to slot 2 of the outer table. Since h2(75) = 1, key 75 is stored in slot 1 of the secondary hash table S2

[Figure: outer table with slots 0 to 8. Slot 2 points to a secondary table with m2 = 4, a2 = 10, b2 = 18, and S2 holding keys 60 and 75]
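The example can be checked with a short script. The parameters a, b, p, m and a2, b2, m2 come from the slide; the helper name `make_hash` and the assumption that the secondary function uses the same p = 101 are mine.

```python
def make_hash(a, b, p, m):
    """h(k) = ((a*k + b) mod p) mod m."""
    return lambda k: ((a * k + b) % p) % m


keys = [10, 22, 37, 40, 60, 70, 75]
p = 101

# Outer hash function from the slide: a = 3, b = 42, m = 9.
outer = make_hash(3, 42, p, 9)

# Group the keys by their outer slot.
buckets = {}
for k in keys:
    buckets.setdefault(outer(k), []).append(k)
print(buckets)

# Secondary table for outer slot 2: m2 = 4, a2 = 10, b2 = 18 (p assumed unchanged).
h2 = make_hash(10, 18, p, 4)
S2 = [None] * 4
for k in buckets[2]:
    assert S2[h2(k)] is None      # no collision inside the secondary table
    S2[h2(k)] = k
print(S2)
```

Keys 60 and 75 share outer slot 2 but land in different secondary slots, so a lookup needs two hash evaluations and no chain traversal.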



Collisions

Two or more keys may hash to the same slot

When a record to be inserted maps to an already occupied slot in T, a collision occurs

Can we avoid collisions altogether?

Not if |U| > m

We need a method to resolve collisions that occur



Collisions



Collision Resolution

Two basic approaches to collision resolution are called chained hashing and open address hashing

Chained Hashing: In chained hashing the elements of a hash table are stored in a set of linked lists. All colliding elements are kept in one linked list. The list head pointers are usually stored in an array. Chained hashing is also known as open hashing

Open Address Hashing: In open address hashing, the hashed keys are stored in the hash table itself. The colliding keys are allocated distinct cells in the table. Open address hashing is also referred to as closed hashing



Collision Resolution by Chaining

Records that hash to the same slot are linked into a list
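A minimal sketch of chaining follows, assuming integer keys and the modulo hash function h(k) = k mod m. The class names `Node` and `ChainedHashTable` are hypothetical, and new records are placed at the head of their list.

```python
class Node:
    """One record in a singly linked chain."""
    def __init__(self, key, value, next_node=None):
        self.key, self.value, self.next = key, value, next_node


class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.heads = [None] * m           # array of list head pointers

    def _h(self, key):
        return key % self.m               # assumed hash function

    def search(self, key):
        node = self.heads[self._h(key)]
        while node is not None:           # walk the chain for this slot
            if node.key == key:
                return node
            node = node.next
        return None

    def insert(self, key, value):
        node = self.search(key)           # presence check
        if node is not None:
            node.value = value
            return
        j = self._h(key)
        self.heads[j] = Node(key, value, self.heads[j])   # insert at head

    def delete(self, key):
        j = self._h(key)
        prev, node = None, self.heads[j]
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return False
        if prev is None:
            self.heads[j] = node.next
        else:
            prev.next = node.next
        return True


t = ChainedHashTable(7)
t.insert(40, "a")
t.insert(75, "b")                         # 40 and 75 both hash to slot 5
print(t.search(75).value, t.heads[5].key)
```

Because 75 is inserted after 40, it sits at the front of the list for slot 5, ahead of the older record.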



Collision Resolution by Chaining (contd…)



Analysis of Hashing with Chaining

How long does it take to search for an element with a given key?

Let n be the number of keys in the table, and let m be the number of slots

Define the load factor of T to be α = n/m = average number of keys per slot

Analysis is in terms of α, which can be less than, equal to, or greater than 1



Worst Hashing - Searching

All hash keys are mapped to a single list.

This situation may be referred to as the worst distribution of hash keys

In practice, this extreme situation may not arise, but the possibility does exist

The worst-case time for searching is thus Θ(n), plus the time to compute the hash function

The best search time is Θ(1), when the key is found in the front node

On average, half of the list is examined. Thus the average search time under this worst distribution is Θ(n)



Worst Hashing - Insertion

Adding a key at the head of its list takes Θ(1) time in the worst case

This assumes that the key is not already present in the table

To check for presence, a search for the key is required. As just mentioned, the worst-case time of searching is Θ(n)

Thus the worst-case running time of insertion with a presence check is Θ(n)

The average running time of insertion with a presence check is also Θ(n)



Simple Uniform Hashing - Searching

The keys are uniformly distributed among all the linked lists, i.e. it is assumed that any given element is equally likely to hash into any of the m slots

Denote the length of the list T[j], for j = 0, 1, …, m–1, by n_j, so that n = n_0 + n_1 + … + n_{m–1} and the expected value of n_j is E[n_j] = α = n/m

We assume that the hash value h(k) can be computed in O(1) time

So the time required to search for an element with key k depends linearly on the length n_h(k) of the list T[h(k)]



Simple Uniform Hashing - Searching

Two cases: unsuccessful search and successful search

Unsuccessful search: The expected time to search unsuccessfully for a key k is the expected time to search to the end of list T[h(k)], which has expected length E[n_h(k)] = α

Thus the total time required, including the time to compute h(k), is Θ(1 + α)



Simple Uniform Hashing - Insertion

To find the average time for inserting a key, consider the case when the kth key is inserted. At that stage, the table already holds k–1 keys distributed uniformly over m linked lists. Thus, prior to insertion of the kth key, the average length of each list is (k–1)/m, as shown in the diagram



Simple Uniform Hashing - Insertion

The insertion of a new key requires probing (k–1)/m keys on average, plus the cost of adding the new key.

Thus, the overall cost of inserting the kth key is 1 + (k–1)/m, assuming that each operation takes unit time.

The expected cost of inserting a key is obtained by averaging over all possible values of k. Thus, the expected cost I is given by

I = (1/n) Σ_{k=1..n} [1 + (k–1)/m] = 1 + (n–1)/(2m)

Since α = n/m, the average cost of inserting a key is 1 + α/2 – 1/(2m) = Θ(1 + α)
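The closed form can be checked numerically by averaging the per-key cost 1 + (k–1)/m over k = 1, …, n. The function names and the sample values n = 20, m = 8 below are assumptions for illustration; exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def average_insertion_cost(n, m):
    """Average of 1 + (k-1)/m over k = 1..n."""
    total = sum(1 + Fraction(k - 1, m) for k in range(1, n + 1))
    return total / n


def closed_form(n, m):
    """1 + alpha/2 - 1/(2m), with alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - Fraction(1, 2 * m)


n, m = 20, 8    # assumed sample sizes
print(average_insertion_cost(n, m), closed_form(n, m))
```

Both expressions evaluate to 35/16 for these values, matching the derivation.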



Simple Uniform Hashing - Searching

Successful search: We assume that the element x being searched for is equally likely to be any of the n elements stored in the table

The number of elements examined is one more than the number of elements that appear before x in x's list

Elements before x in the list were all placed after x was inserted, because new elements are added at the head of the list

The total time required for a successful search is 1 + α/2 – α/(2n) = Θ(1 + α)

If n = O(m), then α = n/m = O(m)/m = O(1). Thus searching takes constant time on average
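The successful-search bound can be derived as sketched below, assuming elements are numbered 1 to n in insertion order and that a later element j lands in the same list as element i with probability 1/m. The `align*` environment assumes the amsmath package.

```latex
\begin{align*}
\mathrm{E}[\text{elements examined}]
  &= \frac{1}{n}\sum_{i=1}^{n}\Bigl(1 + \sum_{j=i+1}^{n}\frac{1}{m}\Bigr) \\
  &= 1 + \frac{1}{nm}\sum_{i=1}^{n}(n-i) \\
  &= 1 + \frac{n-1}{2m} \\
  &= 1 + \frac{\alpha}{2} - \frac{\alpha}{2n}
\end{align*}
```

The last step uses α = n/m, so that n/(2m) = α/2 and 1/(2m) = α/(2n).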



Open Addressing

All elements are stored in the hash table itself

In open addressing, the hash table can fill up, so that no further insertions can be made

The load factor α can never exceed 1

Advantage is that open addressing avoids pointers altogether

The memory freed by not storing pointers provides the hash table with a larger number of slots for the same amount of memory



Insertion

We successively examine or probe the hash table until we find an empty slot in which to put the key

The sequence of positions probed depends upon the key being inserted

To determine which slots to probe, we extend the hash function to include the probe number as a second input. Thus the hash function becomes:

h : U × {0, 1, …, m–1} → {0, 1, …, m–1}



Pseudo code

HASH-INSERT(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = NIL
            then T[j] ← k
                 return j
            else i ← i + 1
    until i = m
    error "Table full"
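The pseudocode can be turned into a runnable sketch. The probe function `h` below is an assumed placeholder (h(k, i) = (k mod m + i) mod m), empty slots are represented by `None`, and a full table raises `RuntimeError`.

```python
def h(k, i, m):
    """Assumed probe function: start at k mod m and step by one slot per probe."""
    return (k % m + i) % m


def hash_insert(T, k):
    """Probe T until an empty slot is found; return the slot index used."""
    m = len(T)
    for i in range(m):            # at most m probes
        j = h(k, i, m)
        if T[j] is None:          # empty slot found
            T[j] = k
            return j
    raise RuntimeError("Table full")


T = [None] * 7
for key in (10, 40, 75, 17):
    print(key, "->", hash_insert(T, key))
print(T)
```

Key 75 first probes slot 5, which already holds 40, and settles in slot 6. Key 17 first probes slot 3, which holds 10, and settles in slot 4.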



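Linear Probing

In linear probing, the slot index given by the auxiliary hash function is advanced by one position on each successive probe, wrapping around at the end of the table. In general the hash function is defined as

h(k, i) = (h’(k) + i) mod m,

where h’(k) is an auxiliary hash function, i is the probe number, and m is the table size.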
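The probe sequence produced by linear probing can be listed directly. The auxiliary function h'(k) = k mod m and the values k = 75, m = 7 are assumptions for this sketch.

```python
def h_prime(k, m):
    """Assumed auxiliary hash function."""
    return k % m


def h(k, i, m):
    """Linear probing: h(k, i) = (h'(k) + i) mod m."""
    return (h_prime(k, m) + i) % m


def probe_sequence(k, m):
    """All m slots in the order they would be probed for key k."""
    return [h(k, i, m) for i in range(m)]


print(probe_sequence(75, 7))
```

The sequence starts at h'(75) = 5 and visits every slot exactly once, so an empty slot is always found if one exists.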



Linear Probing (contd…)



Searching

HASH-SEARCH(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = k
            then return j
        i ← i + 1
    until T[j] = NIL or i = m
    return NIL
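A runnable sketch of the search follows, again with an assumed linear-probing function and `None` for empty slots. The sample table contents are an assumption, corresponding to inserting 10, 40, 75, 17 into a table of size 7 with that probe function.

```python
def h(k, i, m):
    """Assumed probe function: (k mod m + i) mod m."""
    return (k % m + i) % m


def hash_search(T, k):
    """Follow the probe sequence for k; return its slot index or None."""
    m = len(T)
    for i in range(m):
        j = h(k, i, m)
        if T[j] == k:
            return j              # key found
        if T[j] is None:
            return None           # empty slot reached: key is absent
    return None                   # all m slots probed without success


T = [None, None, None, 10, 17, 40, 75]   # assumed sample table
print(hash_search(T, 75))
print(hash_search(T, 17))
print(hash_search(T, 24))
```

The search for 24 probes slots 3, 4, 5, 6 and then the empty slot 0, where it stops and reports that the key is absent.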



Quadratic Probing
