data structures and algorithms - anisnazer.com filedata structures and algorithms hashing eng. anis...

Data Structures And Algorithms

Hashing

Eng. Anis Nazer
First Semester 2016-2017

Searching

● Search: find if a “key” exists in a given set
● Searching algorithms:
  – linear (sequential) search
  – binary search
  – search based on a hash function

Linear/sequential Search
● Algorithm:
  – go through the elements one by one; if the key is found, return
● Code:

bool linearSearch(int A[], int size, int key)
{
    // check every element in turn
    for (int i = 0; i < size; i++)
        if (A[i] == key)
            return true;
    return false;   // reached the end without finding the key
}

● What is the complexity?

Binary Search
● Assumption: the array elements are sorted (in ascending order)
● Algorithm:
  – compare key with the element at the middle
    ● if ( key == element )
      – return true;
    ● if ( key > element )
      – search right sub array
    ● else
      – search left sub array
● Question: when to stop? How to determine that the key is not found?
● What is the complexity?

Binary Search

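bool binarySearch(int A[], int size, int key)
{
    int L = 0, R = size - 1;
    while (L <= R) {
        int M = (L + R) / 2;      // middle of the current range
        if (key == A[M])
            return true;
        else if (key > A[M])
            L = M + 1;            // key can only be in the right half
        else
            R = M - 1;            // key can only be in the left half
    }
    return false;                 // range is empty: key not found
}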

Hash function
● A hash function is a function that computes its result from the input or part of the input.
● Example of a hash function:
    f(x) = x % 10
● Assume we store the elements in an array based on the hash function
  – the index of value “x” is f(x)
  – A[ f(x) ] = x

Hash function
● Example: store the following in an array of size 10, given that the hash function is
    f(x) = x % 10
    1, 18, 15, 930, 77, 29

    index:   0    1    2    3    4    5    6    7    8    9
    value:  930   1    -    -    -   15    -   77   18   29     (- = empty)

● is 44 in the array?
    f(44) = 44 % 10 = 4, A[4] is empty → 44 not in array
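The example above can be sketched in C++ as follows. This is only a minimal sketch: it assumes that no two stored keys collide (true for this data set) and uses -1 as an assumed "empty" marker, so it only works for non-negative keys. The names `EMPTY`, `hashInsert` and `hashContains` are illustrative and do not come from the slides.

```cpp
const int SIZE  = 10;   // array size from the example
const int EMPTY = -1;   // assumed marker for an unused slot (keys must be >= 0)

int f(int x) { return x % SIZE; }   // hash function from the slide

// Store x at index f(x). Collisions are NOT handled in this sketch.
void hashInsert(int A[], int x) { A[f(x)] = x; }

// A key is present only if the slot it hashes to holds that key.
bool hashContains(const int A[], int x) { return A[f(x)] == x; }
```

After filling an array of 10 `EMPTY` slots with 1, 18, 15, 930, 77, 29, a call to `hashContains(A, 44)` inspects only A[4], which is still empty, so the search finishes in one step.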

Hash function
● What is the advantage of using a hash function?
● What is the problem when using a hash function?
  – two inputs can hash to the same value
    ● ex. f(x) = x % 10
        f(15) = 5
        f(225) = 5
● What to do if two values hash to the same index?

Collision
● Collision: when two distinct values v1 and v2 hash to the same index
● How to deal with collisions?
  – Use a perfect hash function:
    ● i.e. no two values hash to the same index
    ● this is practically impossible since the data is unknown
● A good hash function is a function that produces few collisions

Hash functions
● Some examples of hash functions:
  – Division
  – Folding
  – Mid-Square
  – Extraction
  – Radix transformation

Hash functions
● Division: based on the modulo operator:
  – h(x) = x % (array size)
  – It is better for the array size to be a prime number

Hash functions
● Folding: the key is divided into parts, and the parts are processed to generate the index (address)
  – Example: divide the key into parts of three digits, then add the parts, then take the sum modulo the array size
      ID = 199805535, array size = 101
      h(199805535) = (199 + 805 + 535) % 101 = 1539 % 101 = 24
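A possible C++ sketch of the folding example is shown below. It assumes a non-negative key and splits it into three-digit parts starting from the right; for a nine-digit ID this gives the same parts as splitting from the left. The function name `foldingHash` is illustrative.

```cpp
// Folding hash: split the decimal key into 3-digit parts (from the right),
// add the parts, and reduce the sum modulo the table size.
int foldingHash(long long key, int tableSize)
{
    long long sum = 0;
    while (key > 0) {
        sum += key % 1000;   // lowest three digits form one part
        key /= 1000;         // drop them and continue with the rest
    }
    return (int)(sum % tableSize);
}
```

With the values from the example, `foldingHash(199805535, 101)` adds 535 + 805 + 199 = 1539 and returns 24.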

Hash functions
● Mid-Square: the key is squared and the middle part of the result is taken
    Example: key = 3121, size = 1000
    3121^2 = 9740641,
    middle = 406
● It is better to use a power of 2 size and use the middle of the binary representation
    Example: key = 3121, size = 1024
    3121^2 = 9740641 = 100101001010000101100001 (binary)
    → h(3121) = 0101000010 (binary) = 322
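The binary variant of the mid-square method can be sketched as below. This is one possible implementation, not necessarily the one intended in the slides: it assumes the table size is 2^bits and, when the number of surplus bits is odd, it drops the extra bit from the high end. The name `midSquareHash` is illustrative.

```cpp
// Mid-square hash for a table of size 2^bits: square the key and
// return the middle `bits` bits of the square's binary representation.
unsigned midSquareHash(unsigned key, int bits)
{
    unsigned long long sq = (unsigned long long)key * key;

    int len = 0;                                   // significant bits in sq
    for (unsigned long long t = sq; t > 0; t >>= 1)
        len++;

    int shift = (len > bits) ? (len - bits) / 2 : 0;   // low bits to discard
    return (unsigned)((sq >> shift) & ((1ULL << bits) - 1));
}
```

For key = 3121 and a table of 1024 = 2^10 slots, the square has 24 bits, 7 bits are discarded on each side, and `midSquareHash(3121, 10)` returns 322.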

Hash functions
● Extraction: take a part of the key,
    Example: take the last 4 digits of the ID number:
    h(199805535) = 5535
● This method is useful when part of the key is common in the data,
  – ID numbers usually start with the same digits

Hash functions
● Radix transformation: the key is converted to another number system, and the result is taken modulo the array size:
    Example: key = 345, size = 100, base 9
    345 = 423 in base 9 → h(345) = 423 % 100 = 23
    245 = 302 in base 9 → h(245) = 302 % 100 = 2
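The radix transformation can be sketched in C++ as follows. The sketch assumes a non-negative key and a base between 2 and 10, so that every digit of the converted number is a valid decimal digit; the name `radixHash` is illustrative.

```cpp
// Radix-transformation hash: write the key in another base, read the
// resulting digit string as a decimal number, then take it modulo tableSize.
int radixHash(int key, int base, int tableSize)
{
    long long transformed = 0;   // base-`base` digits reinterpreted in base 10
    long long place = 1;         // decimal weight of the next digit
    while (key > 0) {
        transformed += (key % base) * place;
        place *= 10;
        key /= base;
    }
    return (int)(transformed % tableSize);
}
```

For key = 345 in base 9 the digits are 4, 2, 3 (since 4·81 + 2·9 + 3 = 345), so `radixHash(345, 9, 100)` computes 423 % 100 and returns 23.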

Collision resolution
● Collision: two keys hash to the same address (index)
● How to deal with collision:
  – Use a perfect hash function, not practical
  – Open addressing: find an available position to place the colliding key
    ● linear probing
    ● quadratic probing
    ● double hashing
  – Chaining: use a linked list to store the keys

Collision resolution
● Linear probing: look for the next available position, wrap around the end of the array
● Ex. h(x) = x % 10, size = 10
    16, 22, 77, 48, 35, 62, 47, 99

    index:  0  1  2  3  4  5  6  7  8  9
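Insertion with linear probing can be sketched as below, which may help when checking the exercise by hand. The sketch assumes non-negative keys and uses -1 as an assumed "empty" marker; the name `insertLinear` is illustrative.

```cpp
const int SIZE  = 10;
const int EMPTY = -1;   // assumed marker for a free slot (keys must be >= 0)

// Insert key using h(x) = x % SIZE with linear probing.
// Returns the index used, or -1 if the table is full.
int insertLinear(int A[], int key)
{
    int home = key % SIZE;
    for (int i = 0; i < SIZE; i++) {
        int pos = (home + i) % SIZE;   // next slot, wrapping around the end
        if (A[pos] == EMPTY) {
            A[pos] = key;
            return pos;
        }
    }
    return -1;
}
```

Inserting the keys in the order given, 62 collides with 22 at index 2 and moves to index 3, 47 collides at 7 and 8 and lands at 9, and 99 then wraps around from 9 to index 0.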

Collision resolution
● Linear probing tends to create clusters.
  – elements tend to group near each other
● The empty position following a cluster has a higher chance to be filled.
  – this is proportional to the cluster size,
  – the bigger the cluster, the higher the probability

Collision resolution
● Quadratic probing: look for positions using a quadratic formula:
    h(x) + i     (wrapping around, i.e. modulo the array size)
    i = 1, -1, 4, -4, 9, -9, …
● Ex. h(x) = x % 10, size = 10
    16, 22, 77, 48, 35, 62, 47, 99

    index:  0  1  2  3  4  5  6  7  8  9
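Insertion with quadratic probing can be sketched as follows. The sketch uses the offsets +1, -1, +4, -4, … from the slide, assumes non-negative keys with -1 as an assumed "empty" marker, and gives up after SIZE / 2 rounds, which is an assumed cut-off rather than something stated in the slides. The name `insertQuadratic` is illustrative.

```cpp
const int SIZE  = 10;
const int EMPTY = -1;   // assumed marker for a free slot (keys must be >= 0)

// Insert key using h(x) = x % SIZE and the offsets +1, -1, +4, -4, +9, -9, ...
// Returns the index used, or -1 if no free slot was found.
int insertQuadratic(int A[], int key)
{
    int home = key % SIZE;
    if (A[home] == EMPTY) { A[home] = key; return home; }

    for (int k = 1; k <= SIZE / 2; k++) {
        int sq = (k * k) % SIZE;
        int candidates[2] = { (home + sq) % SIZE,            // h(x) + k^2
                              (home - sq + SIZE) % SIZE };   // h(x) - k^2
        for (int c = 0; c < 2; c++) {
            if (A[candidates[c]] == EMPTY) {
                A[candidates[c]] = key;
                return candidates[c];
            }
        }
    }
    return -1;
}
```

With the data above, 62 finds index 3 on its first probe, while 47 tries 8 and 6 (both occupied) before landing at (7 + 4) % 10 = 1.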

Collision resolution
● Assume key = 9, h(x) = x % 19 and the array is full except A[3], what is the sequence of indices (probes) that are tried?
● Quadratic probing avoids the clustering of linear probing but will generate “secondary clusters”, since two elements that hash to the same index will generate the same probe sequence
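The probe sequence asked for above can be generated with a short helper. This is a sketch under the assumption that the offsets are +1, -1, +4, -4, … taken modulo the table size; `quadraticProbes` and its `target` parameter (the only free index) are illustrative names.

```cpp
#include <vector>

// Indices tried by quadratic probing for `key` in a table of `size` slots,
// stopping as soon as `target` is reached or all offsets are used.
std::vector<int> quadraticProbes(int key, int size, int target)
{
    std::vector<int> probes;
    int home = key % size;
    probes.push_back(home);
    for (int k = 1; k <= size / 2 && probes.back() != target; k++) {
        int sq = (k * k) % size;
        probes.push_back((home + sq) % size);          // h(x) + k^2
        if (probes.back() == target) break;
        probes.push_back((home - sq + size) % size);   // h(x) - k^2
    }
    return probes;
}
```

For key = 9, size = 19 and target = 3 the function returns 9, 10, 8, 13, 5, 18, 0, 6, 12, 15, 3. Any other key that hashes to 9 would follow exactly the same sequence, which is the secondary clustering effect.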

Collision resolution
● How to know when to stop if the key is not in the array?
● If the size of the array is a prime number of the form 4j + 3, where j is an integer, the probing sequence is guaranteed to cover all the indices

Collision resolution
● Double hashing: if a collision occurs, use another hash function
● probe sequence:
    h(x), h(x) + h2(x), h(x) + 2h2(x), h(x) + 3h2(x), …
● Example:
  – h(x) = x % 19
  – h2(x) = x % 13
  – What are the probe sequences for x = 3, x = 22?
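The probe sequence of double hashing can be generated as sketched below. The sketch assumes a table of 19 slots (matching h) and that every index is taken modulo 19; `doubleHashProbes` and its `count` parameter are illustrative names.

```cpp
#include <vector>

int h (int x) { return x % 19; }   // primary hash from the example
int h2(int x) { return x % 13; }   // secondary hash (step size) from the example

// First `count` indices probed for x in a table of 19 slots.
std::vector<int> doubleHashProbes(int x, int count)
{
    std::vector<int> probes;
    for (int i = 0; i < count; i++)
        probes.push_back((h(x) + i * h2(x)) % 19);
    return probes;
}
```

Both 3 and 22 hash to index 3, but their step sizes differ (3 and 9), so the sequences separate after the first probe: 3, 6, 9, 12, … for x = 3 and 3, 12, 2, 11, … for x = 22. Note that with this particular h2 a key that is a multiple of 13 gets a step of 0 and would never leave its home index, so a practical version would need a secondary hash that cannot return 0.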

Comparison

Collision resolution
● Chaining: store a pointer to a linked list in the array, and store the data in the linked list
● The list can be sorted for efficiency
● Chaining requires more space to store the pointers

Collision resolution
● Separate chaining:
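Separate chaining can be sketched in C++ with the standard containers, as below. The sketch assumes non-negative integer keys and uses h(x) = x % (table size); the struct name `ChainedTable` and its members are illustrative, and the lists are kept unsorted for brevity.

```cpp
#include <list>
#include <vector>

// Separate chaining: each table slot holds a linked list of the keys
// that hash to that index.
struct ChainedTable {
    std::vector< std::list<int> > slots;

    explicit ChainedTable(int size) : slots(size) {}

    int hash(int key) const { return key % (int)slots.size(); }

    void insert(int key) { slots[hash(key)].push_back(key); }

    bool contains(int key) const {
        const std::list<int>& chain = slots[hash(key)];
        for (std::list<int>::const_iterator it = chain.begin(); it != chain.end(); ++it)
            if (*it == key) return true;
        return false;
    }
};
```

With a table of size 10, the keys 15 and 225 both hash to index 5 and simply share the list stored there, so neither has to be moved to another slot.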

Collision resolution
● Coalesced chaining:
  – 2D array: Size x 2 → A[size][2]
  – the second column stores the index of the next element in the chain
● Example: store the following data,
    12, 23, 15, 72, 49, 35, 9, 22
    h(x) = x % 10
    -2 → position is available
    -1 → element is last in the chain
    collision resolution: linear probing

Example
    12, 23, 15, 72, 49, 35, 9, 22

    index   value   next
      0       9      -1
      1               -2
      2      12       4
      3      23      -1
      4      72       7
      5      15       6
      6      35      -1
      7      22      -1
      8               -2
      9      49       0
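The table above can be reproduced with the following sketch of coalesced chaining. It assumes non-negative keys and that the linear probe for a free slot starts from the key's home index; the slides do not say where the probe starts, so that choice is an assumption. The names `initTable`, `insertCoalesced`, `AVAILABLE` and `LAST` are illustrative.

```cpp
const int SIZE      = 10;
const int AVAILABLE = -2;   // position is available
const int LAST      = -1;   // element is last in its chain

// A[i][0] holds the key, A[i][1] the index of the next element in the chain.
void initTable(int A[][2])
{
    for (int i = 0; i < SIZE; i++) { A[i][0] = 0; A[i][1] = AVAILABLE; }
}

// Insert key with h(x) = x % SIZE; a colliding key is placed by linear
// probing and linked to the end of the chain. Returns false if the table is full.
bool insertCoalesced(int A[][2], int key)
{
    int home = key % SIZE;
    if (A[home][1] == AVAILABLE) {          // home slot is free
        A[home][0] = key;
        A[home][1] = LAST;
        return true;
    }
    int tail = home;                        // walk to the end of the chain
    while (A[tail][1] != LAST)
        tail = A[tail][1];

    for (int i = 1; i < SIZE; i++) {        // linear probing for a free slot
        int pos = (home + i) % SIZE;
        if (A[pos][1] == AVAILABLE) {
            A[pos][0] = key;
            A[pos][1] = LAST;
            A[tail][1] = pos;               // link the new element into the chain
            return true;
        }
    }
    return false;
}
```

Inserting 12, 23, 15, 72, 49, 35, 9, 22 in that order gives the chain 2 → 4 → 7 for the keys 12, 72, 22, and the chain 9 → 0 for the keys 49, 9.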

Deletion
● What happens if you delete a value from a hash table?
    Example: arrange the data: 11, 34, 62, 4, 91
  – use h(x) = x % 10, and linear probing
  – then delete data 34, 62
  – then search for 4

    index:  0  1  2  3  4  5  6  7  8  9

Deletion
● The position of the deleted item should not be marked as empty, why?
● Can we reuse the position of the deleted element?
● If you have many delete operations and few insert operations, you should rehash the table after a number of deletions
● Rehash: arrange the data using a different table size and/or a different hash function
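One common way to handle deletion with linear probing is to mark a removed slot with a special "deleted" value instead of resetting it to empty. The sketch below illustrates that idea; it is not necessarily the approach intended in the slides. It assumes non-negative keys, uses -1 and -2 as assumed markers, and `insertKey` assumes the key is not already in the table. The names `findKey`, `insertKey` and `deleteKey` are illustrative.

```cpp
const int SIZE    = 10;
const int EMPTY   = -1;   // slot never used (assumed marker; keys must be >= 0)
const int DELETED = -2;   // slot held a key that was removed

// Returns the index of key, or -1 if it is not in the table.
int findKey(const int A[], int key)
{
    int home = key % SIZE;
    for (int i = 0; i < SIZE; i++) {
        int pos = (home + i) % SIZE;
        if (A[pos] == EMPTY) return -1;   // a never-used slot ends the search
        if (A[pos] == key)   return pos;  // DELETED slots are skipped over
    }
    return -1;
}

// Places key in the first EMPTY or DELETED slot of its probe sequence.
bool insertKey(int A[], int key)
{
    int home = key % SIZE;
    for (int i = 0; i < SIZE; i++) {
        int pos = (home + i) % SIZE;
        if (A[pos] == EMPTY || A[pos] == DELETED) {   // deleted slots can be reused
            A[pos] = key;
            return true;
        }
    }
    return false;
}

bool deleteKey(int A[], int key)
{
    int pos = findKey(A, key);
    if (pos < 0) return false;
    A[pos] = DELETED;     // mark the slot, do not reset it to EMPTY
    return true;
}
```

In the example, 4 is stored at index 5 because index 4 was taken by 34. If deleting 34 reset A[4] to empty, a search for 4 would stop at A[4] and wrongly report that 4 is missing; with the `DELETED` marker the search continues and finds 4 at index 5.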

THE END
