hashing

Hashing: Department of Computer Science, Islamia College University Peshawar, Fall 2012 Semester, BCS course: CS 00 Analysis of Algorithms, Course Instructor: Mr. Zahid, 07/03/22, Lecture #9, Adapted from slides by Dr Onaiza Maqbol

Upload: abbas-ali

Post on 24-May-2015

Page 1: Hashing

Hashing

Department of Computer Science, Islamia College University Peshawar

Fall 2012 Semester, BCS course: CS 00 Analysis of Algorithms

Course Instructor: Mr. Zahid

04/12/23 Lecture #9 Adapted from slides by Dr Onaiza Maqbol

Page 2: Hashing

Wednesday, March 18, 2009, 04/12/23, Lecture #9, Adapted from slides by Dr Onaiza Maqbol

Dictionary T: holds n records

What data structure should be used to implement T?

Page 3: Hashing

Hashing

Page 4: Hashing

Direct Addressing

Assumptions: the keys are drawn from the universe U = {0, 1, …, u-1}, and keys are distinct

Create a table T[0..u-1]

Benefit: each operation takes constant time

Drawback: the range of keys can be large
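
The direct-address table above can be sketched in a few lines. This is only an illustrative sketch: the class name DirectAddressTable and its method names are assumptions, not part of the original slides, and keys are assumed to be integers in 0..u-1.

```python
class DirectAddressTable:
    """Illustrative direct-address table T[0..u-1] (names are hypothetical)."""

    def __init__(self, u):
        # One slot per possible key, so storage grows with the key range u.
        self.slots = [None] * u

    def insert(self, key, value):
        self.slots[key] = value      # constant time: the key is the index

    def search(self, key):
        return self.slots[key]       # constant time

    def delete(self, key):
        self.slots[key] = None       # constant time


table = DirectAddressTable(u=10)
table.insert(3, "record three")
table.insert(7, "record seven")
table.delete(7)
```

Every operation is a single array access, which is the benefit noted above; the drawback shows up in the constructor, which allocates u slots no matter how few records are stored.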

Page 5: Hashing

Hashing Solution

Use a hash function h to map the universe U of all keys into {0, 1, …, m–1}

Page 6: Hashing

Hash Table

The mapped keys are stored in a table called a hash table

The table consists of m cells

A hash table requires much less storage than a direct-address table

With direct addressing, an element with key k is stored in slot k; with hashing, this element is stored in slot h(k)

So the hash function is h : U → {0, 1, …, m-1}

h(k) is also called hash value of key k

Page 7: Hashing

Hashing Functions - Modulo Function

Several functions can be used to map keys into a set of integers. The choice is made on the basis of the computation time required and the simplicity of the computational steps. A common choice is the modulo function h(k), defined as:

h(k) = k mod m

where k is the key, m is some positive integer, and mod denotes the modulus operator, which computes the remainder of key k divided by m.

It follows that the hash function h(k) maps the set of keys {k1, k2, k3, …, kn} into the set of integers {0, 1, 2, …, m-1}

In essence, the modulo function is used to create a hash table of size m
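
A minimal sketch of the modulo function follows. The sample keys and the table size m = 10 are made up for illustration and do not come from the slides.

```python
def h(k, m):
    """Modulo hash: remainder of key k divided by table size m."""
    return k % m


m = 10                         # hypothetical table size
keys = [12, 25, 37, 48, 15]    # hypothetical keys
slots = [h(k, m) for k in keys]
print(slots)  # [2, 5, 7, 8, 5]
```

Keys 25 and 15 both land in slot 5, which shows that distinct keys can map to the same slot.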

Page 8: Hashing

Modulo Function (contd…)

Page 9: Hashing

Hashing Functions - Multiplication Method

Page 10: Hashing

Hashing of Strings

Page 11: Hashing

ASCII Sum Method

Page 12: Hashing

Radix Method

Page 13: Hashing

Universal Hashing

Page 14: Hashing

Universal Hashing (contd…)

h_{a,b}(k) = ((ak + b) mod p) mod m, where p is a prime large enough that every possible key k is in the range 0 to p-1, inclusive, and 0 < a < p and 0 ≤ b < p, belongs to the family of universal hash functions
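
Drawing one function at random from this family might look like the sketch below. The helper name make_universal_hash, the choice p = 101, and m = 9 are assumptions made for illustration; p is taken to be a prime larger than every key.

```python
import random


def make_universal_hash(p, m, rng=random):
    """Pick a and b at random and return h(k) = ((a*k + b) mod p) mod m.

    Assumes p is prime and every key k satisfies 0 <= k < p.
    """
    a = rng.randrange(1, p)   # 0 < a < p
    b = rng.randrange(0, p)   # 0 <= b < p

    def h(k):
        return ((a * k + b) % p) % m

    return h


# A seeded generator keeps the illustration reproducible.
h = make_universal_hash(p=101, m=9, rng=random.Random(0))
slots = [h(k) for k in range(101)]
```

Because a and b are chosen at run time, no fixed set of keys is bad for every function in the family.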

Page 15: Hashing

Perfect Hashing

Page 16: Hashing

Perfect Hashing

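Using perfect hashing to store {10, 22, 37, 40, 60, 70, 75}, the outer hash function is h_{a,b}(k) = ((ak + b) mod p) mod m, where a = 3, b = 42, p = 101, and m = 9. For example, h(75) = 2, so key 75 hashes to slot 2 of the outer table. Since h2(75) = 1, 75 is stored in slot 1 of the secondary hash table S2

[Figure: outer table with slots 0 to 8. Secondary table S2: m2 = 4, a2 = 10, b2 = 18, holding keys 60 and 75]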
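
The two hash values quoted in this example can be checked with a short sketch. It assumes that the secondary function has the same form as the outer one and reuses p = 101, which the slide does not state explicitly; the helper name make_hash is hypothetical.

```python
def make_hash(a, b, p, m):
    """Return h(k) = ((a*k + b) mod p) mod m."""
    def h(k):
        return ((a * k + b) % p) % m
    return h


outer = make_hash(a=3, b=42, p=101, m=9)
# Assumption: the secondary function for slot 2 also uses p = 101.
secondary_2 = make_hash(a=10, b=18, p=101, m=4)

print(outer(75))        # 2: key 75 goes to slot 2 of the outer table
print(secondary_2(75))  # 1: key 75 goes to slot 1 of S2
```

Under the same assumption, key 60 also maps to outer slot 2 and then to slot 0 of S2, so the two keys in S2 do not collide.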

Page 17: Hashing

Collisions

Two or more keys may hash to the same slot

When a record to be inserted maps to an already occupied slot in T, a collision occurs

Can we avoid collisions altogether?

Not if |U| > m

We need a method to resolve collisions that occur

Page 18: Hashing

Collisions

Page 19: Hashing

Collision Resolution

Two basic approaches to collision resolution are called chained hashing and open address hashing

Chained Hashing: In chained hashing the elements of a hash table are stored in a set of linked lists. All colliding elements are kept in one linked list. The list head pointers are usually stored in an array. Chained hashing is also known as open hashing

Open Address Hashing: In open address hashing, the hashed keys are stored in the hash table itself. The colliding keys are allocated distinct cells in the table. Open address hashing is also referred to as closed hashing

Page 20: Hashing

Collision Resolution by Chaining

Records in the same slot are linked into a list
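
A rough sketch of chaining is given below. For brevity it uses Python lists in place of linked lists and assumes the modulo hash function; the class and method names are illustrative only.

```python
class ChainedHashTable:
    """Illustrative chained hash table; Python lists stand in for linked lists."""

    def __init__(self, m):
        self.m = m
        self.slots = [[] for _ in range(m)]   # one chain per slot

    def _h(self, key):
        return key % self.m                   # assumed hash function

    def insert(self, key):
        chain = self.slots[self._h(key)]
        if key not in chain:                  # presence check scans the chain
            chain.insert(0, key)              # new keys go to the front

    def search(self, key):
        return key in self.slots[self._h(key)]

    def delete(self, key):
        chain = self.slots[self._h(key)]
        if key in chain:
            chain.remove(key)


t = ChainedHashTable(m=5)
for key in (3, 8, 13, 4):
    t.insert(key)
```

Keys 3, 8, and 13 all hash to slot 3 and end up in the same chain, while key 4 sits alone in slot 4.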

Page 21: Hashing

Collision Resolution by Chaining (contd…)

Page 22: Hashing

Analysis of Hashing with Chaining

How long does it take to search for an element with a given key?

Let n be the number of keys in the table, and let m be the number of slots

Define the load factor of T to be α = n/m = average number of keys per slot

Analysis is in terms of α, which can be less than, equal to, or greater than 1

Page 23: Hashing

Worst Hashing - Searching

All keys hash to a single list.

This situation may be referred to as the worst distribution of hash keys

In practice this extreme situation may not arise, but the possibility does exist

Worst case time for searching is thus Θ(n), plus the time to compute the hash function

The best search time is Θ(1), which occurs when the key is found in the front node

On average, half the list will be examined. Thus, average search time is Θ(n)

Page 24: Hashing

Worst Hashing - Insertion

The insertion step itself takes Θ(1) time in the worst case

This assumes that the key is not already present in the table

To check presence, a search for the key is required. As just mentioned, the worst case time of searching is Θ(n)

Thus, including the presence check, the worst case running time of insertion is Θ(n)

The average running time of insertion is also Θ(n)

Page 25: Hashing

Simple Uniform Hashing - Searching

The keys are uniformly distributed among all the linked lists, i.e. it is assumed that any given element is equally likely to hash into any of the m slots

Let us denote the length of the list T[j] for j = 0, 1, …, m-1 by n_j, so that n = n_0 + n_1 + … + n_{m-1} and the average value of n_j is E[n_j] = α = n/m

We assume that the hash value h(k) can be computed in O(1) time

So the time required to search for an element with key k depends linearly on the length n_{h(k)} of the list T[h(k)]

Page 26: Hashing

Simple Uniform Hashing - Searching

Two cases: unsuccessful search and successful search

Unsuccessful search

The expected time to search unsuccessfully for a key k is the expected time to search to the end of list T[h(k)], which has expected length E[n_{h(k)}] = α

Thus the total time required is Θ(1 + α)
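
The role of α here can be illustrated with a small experiment. The sketch below assumes a modulo hash function and randomly drawn integer keys, neither of which is specified on the slide; it simply counts chain lengths and averages them over the m slots.

```python
import random


def chain_lengths(keys, m):
    """Return the list of chain lengths n_0 .. n_{m-1} under h(k) = k mod m."""
    lengths = [0] * m
    for k in keys:
        lengths[k % m] += 1
    return lengths


rng = random.Random(1)                     # seeded for reproducibility
n, m = 200, 50
keys = rng.sample(range(1_000_000), n)     # n distinct hypothetical keys
lengths = chain_lengths(keys, m)

# An unsuccessful search scans one whole chain; averaged over all slots,
# the chain length is exactly n/m, the load factor.
average_length = sum(lengths) / m
print(average_length)  # 4.0
```

Individual chains can be longer or shorter than α, but the average over slots is always n/m, which is why the expected cost is stated in terms of the load factor.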

Page 27: Hashing

Simple Uniform Hashing - Insertion

In order to find the average time for inserting a key, let us consider the case when the kth key is inserted. At that stage, the table already holds k-1 keys distributed uniformly over m linked lists. Thus, prior to insertion of the kth key, the average length of each list is (k-1)/m, as shown in the diagram

Page 28: Hashing

Simple Uniform Hashing - Insertion

The insertion of a new key requires probing (k-1)/m keys on average, plus the cost of adding the new key.

Thus, the overall cost of inserting the kth key is 1 + (k-1)/m, assuming that each operation takes unit time.

The expected cost of inserting a key is obtained by averaging over all values of k from 1 to n. Thus, the expected cost I is given by

I = (1/n) Σ_{k=1..n} (1 + (k-1)/m) = 1 + (n-1)/(2m)

The average cost of inserting a key is 1 + α/2 − 1/(2m) = Θ(1 + α)
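
The closed form for the average insertion cost can be checked numerically. The sketch below assumes the per-key cost 1 + (k-1)/m stated above and averages it over k = 1..n with exact rational arithmetic; the function names are illustrative.

```python
from fractions import Fraction


def average_insert_cost(n, m):
    """Average of 1 + (k-1)/m over k = 1..n, computed exactly."""
    total = sum(Fraction(1) + Fraction(k - 1, m) for k in range(1, n + 1))
    return total / n


def closed_form(n, m):
    """1 + alpha/2 - 1/(2m), with alpha = n/m."""
    alpha = Fraction(n, m)
    return 1 + alpha / 2 - Fraction(1, 2 * m)


for n, m in [(1, 1), (10, 5), (200, 50), (7, 13)]:
    assert average_insert_cost(n, m) == closed_form(n, m)
```

The sum of (k-1) for k = 1..n is n(n-1)/2, so dividing by n·m gives (n-1)/(2m), which is the same as α/2 − 1/(2m).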

Page 29: Hashing

Simple Uniform Hashing - Searching

Successful search

We assume that the element x being searched for is equally likely to be any of the n elements stored in the table

The number of elements examined is one more than the number of elements that appear before x in x's list

Elements before x in the list were all inserted after x was inserted

Total time required for a successful search is 1 + α/2 − α/(2n) = Θ(1 + α)

If n = O(m), then α = n/m = O(m)/m = O(1). Thus searching takes constant time on average

Page 30: Hashing

Open Addressing

All elements are stored in the hash table itself

In open addressing, the hash table can fill up, so that no further insertions can be made

The load factor α can never exceed 1

Advantage is that open addressing avoids pointers altogether

The memory freed by not storing pointers gives the hash table a larger number of slots for the same amount of memory

Page 31: Hashing

Insertion

We successively examine or probe the hash table until we find an empty slot in which to put the key

The sequence of positions probed depends upon the key being inserted

To determine which slots to probe, we extend the hash function to include the probe number as a second input. Thus the hash function becomes:

h : U × {0, 1, …, m-1} → {0, 1, …, m-1}

Page 32: Hashing

Pseudo code

HASH-INSERT(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = NIL
            then T[j] ← k
                 return j
            else i ← i + 1
    until i = m
    error “Table full”
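
The pseudocode above translates fairly directly into Python. In this sketch the probe function h is passed in as a parameter; the particular h used in the example, (k + i) mod m, is an assumption chosen only to make the code runnable, and None plays the role of NIL.

```python
def hash_insert(table, key, h):
    """Probe slots h(key, 0), h(key, 1), ... until an empty one is found."""
    m = len(table)
    for i in range(m):
        j = h(key, i)
        if table[j] is None:       # empty slot: store the key here
            table[j] = key
            return j
    raise OverflowError("Table full")


m = 5
table = [None] * m


def h(k, i):
    return (k + i) % m             # hypothetical probe function


positions = [hash_insert(table, key, h) for key in (3, 8, 13)]
print(positions)  # [3, 4, 0]
```

Keys 8 and 13 both start at slot 3, find it occupied, and move on to the next slots in their probe sequences.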

Page 33: Hashing

Linear Probing

In linear probing, the slot index produced by the hash function is incremented by the probe number i. In general the hash function is defined as

h(k, i) = (h'(k) + i) mod m,

where h'(k) is an auxiliary hash function and m is the table size.
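
The probe sequence that this formula generates can be listed explicitly. The sketch assumes the auxiliary function h'(k) = k mod m, which the slide leaves unspecified; the function name is illustrative.

```python
def linear_probe_sequence(k, m):
    """Slots visited for key k: (h'(k) + i) mod m for i = 0 .. m-1."""
    start = k % m                  # assumed auxiliary hash h'(k)
    return [(start + i) % m for i in range(m)]


sequence = linear_probe_sequence(7, 5)
print(sequence)  # [2, 3, 4, 0, 1]
```

The sequence starts at h'(k), walks forward one slot at a time, and wraps around, so every slot is visited exactly once.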

Page 34: Hashing

Linear Probing (contd…)

Page 35: Hashing

Searching

HASH-SEARCH(T, k)
    i ← 0
    repeat
        j ← h(k, i)
        if T[j] = k
            then return j
        i ← i + 1
    until T[j] = NIL or i = m
    return NIL
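
A Python rendering of the search procedure might look as follows. It assumes a linear probe function with h'(k) = k mod m and a small table filled by hand, both chosen for illustration; None stands for NIL.

```python
def hash_search(table, key, h):
    """Follow the probe sequence of key; stop at the key, an empty slot, or after m probes."""
    m = len(table)
    for i in range(m):
        j = h(key, i)
        if table[j] is None:       # empty slot: the key cannot be further along
            return None
        if table[j] == key:
            return j
    return None


m = 7


def h(k, i):
    return (k % m + i) % m         # hypothetical linear probe function


# Keys 10, 17 and 24 all start at slot 3, so they occupy slots 3, 4 and 5.
table = [None, None, None, 10, 17, 24, None]

print(hash_search(table, 24, h))   # 5
print(hash_search(table, 31, h))   # None
```

The search for 31 probes slots 3, 4, and 5, then reaches the empty slot 6 and stops, since an insertion of 31 would have used that slot.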

Page 36: Hashing

Quadratic Probing

Page 37: Hashing

Quadratic Probing

Page 38: Hashing

Quadratic Probing