Post on 31-Aug-2019
Data Structures And Algorithms
Hashing
Eng. Anis Nazer
First Semester 2016-2017
Searching
● Search: find if a “key” exists in a given set
● Searching algorithms:
– linear (sequential) search
– binary search
– search based on a hash function
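Linear/sequential Search
● Algorithm:
– go through the elements one by one; if the key is found, return true
● Code:
bool linearSearch( int A[], int size, int key )
{
    // compare the key with every element, from first to last
    for ( int i = 0 ; i < size ; i++ )
        if ( A[i] == key )
            return true;
    return false;   // reached the end without finding the key
}
● What is the complexity ?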
Binary Search
● Assumption: the array elements are sorted (in ascending order)
● Algorithm:
– compare the key with the element at the middle
  ● if ( key == element )
    – return true;
  ● if ( key > element )
    – search the right sub array
  ● else
    – search the left sub array
● Question: when to stop? How to determine that the key is not found?
● What is the complexity ?
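Binary Search
Code:
bool binarySearch( int A[], int size, int key )
{
    int L = 0 , R = size - 1;
    int M = (L+R) / 2;
    while ( L <= R )
    {
        if ( key == A[M] )
            return true;
        else if ( key > A[M] )
            L = M + 1;      // key can only be in the right part
        else
            R = M - 1;      // key can only be in the left part
        M = (L+R) / 2;
    }
    return false;           // L passed R: the key is not in the array
}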
Hash function
● A hash function is a function that computes its result from the input or from part of the input.
● Example of a hash function:
f(x) = x % 10
● Assume we store the elements in an array based on the hash function
– the index of value “x” is f(x)
– A[ f(x) ] = x
Hash function
● Example: store the following in an array of size 10, given that the hash function is
f(x) = x % 10
1 , 18 , 15, 930, 77, 29

index:   0    1    2    3    4    5    6    7    8    9
value:  930   1                   15        77   18   29

● is 44 in the array ?
f(44) = 44 % 10 = 4, A[4] is empty → 44 not in array
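The example above can be sketched in C++ as follows. This is only a minimal sketch: it assumes non-negative keys and no collisions, and the names `EMPTY`, `makeTable`, `store` and `contains` are illustrative choices that do not appear in the slides.

```cpp
#include <array>

const int TABLE_SIZE = 10;
const int EMPTY = -1;   // assumption: keys are non-negative, so -1 can mark an unused slot

// The hash function from the slide.
int f(int x) { return x % TABLE_SIZE; }

// Returns a table in which every slot is unused.
std::array<int, TABLE_SIZE> makeTable()
{
    std::array<int, TABLE_SIZE> A;
    A.fill(EMPTY);
    return A;
}

// Store x at index f(x). Collisions are not handled: a later value overwrites an earlier one.
void store(std::array<int, TABLE_SIZE>& A, int x) { A[f(x)] = x; }

// One index computation and one comparison, whatever the table size.
bool contains(const std::array<int, TABLE_SIZE>& A, int x) { return A[f(x)] == x; }
```

After storing 1, 18, 15, 930, 77 and 29, `contains(A, 44)` inspects only `A[4]` and reports that 44 is absent, without scanning the rest of the array.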
Hash function
● What is the advantage of using a hash function ?
● What is the problem when using a hash function ?
– two inputs hash to the same value
  ● ex. f(x) = x % 10
    f(15) = 5
    f(225) = 5
● What to do if two values hash to the same index?
Collision
● Collision: when two distinct values v1 and v2 hash to the same index
● How to deal with collisions?
– Use a perfect hash function:
  ● i.e. no two values hash to the same index
  ● this is practically impossible since the data is unknown
● A good hash function is a function that keeps collisions to a minimum
Hash functions
● Some examples of hash functions:
– Division
– Folding
– Mid-Square
– Extraction
– Radix transformation
Hash functions
● Division: based on the modulo operator:
– h(x) = x % (array size)
– It is better if the array size is a prime number
Hash functions
● Folding: the key is divided into parts, and the parts are processed to generate the index (address)
– Example: divide the key into parts of three digits, then add the parts, then take the sum modulo the array size
ID = 199805535, array size = 101
h(199805535) = (199 + 805 + 535) % 101 = 1539 % 101 = 24
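A possible C++ sketch of this folding scheme is shown below. The function name `foldingHash` is illustrative. The sketch assumes a non-negative decimal key and cuts the three-digit parts starting from the right, which gives the same parts as the slide when the key has nine digits.

```cpp
// Folding: split the decimal key into three-digit parts, add the parts,
// then take the sum modulo the table size.
// Assumption: key >= 0; parts are cut from the right-hand side of the key.
int foldingHash(long long key, int tableSize)
{
    long long sum = 0;
    while (key > 0)
    {
        sum += key % 1000;   // the lowest three digits form one part
        key /= 1000;         // drop that part and continue with the rest
    }
    return static_cast<int>(sum % tableSize);
}
```

For the ID in the example, `foldingHash(199805535, 101)` adds 535 + 805 + 199 = 1539 and returns 24.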
Hash functions
● Mid-Square: the key is squared and the middle part of the square is taken
Example: key = 3121 , size = 1000
3121^2 = 9740641,
middle = 406
● It is better to use a size that is a power of 2 and take the middle of the binary representation
Example: key = 3121 , size = 1024
3121^2 = 9740641 = 100101001010000101100001 (binary)
→ h(3121) = 0101000010 (binary) = 322
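The binary variant can be sketched with a shift and a mask. The sketch below is tied to the numbers of the example: it assumes the square fits in 24 bits and the table has 1024 = 2^10 slots, so the middle 10 bits are obtained by dropping the lowest 7 bits and keeping the next 10. The name `midSquareHash` is illustrative.

```cpp
// Mid-square hashing for a table of 1024 slots.
// Assumption: key * key fits in 24 bits, as in the slide example,
// so the "middle" is bits 7..16 of the square.
unsigned midSquareHash(unsigned key)
{
    unsigned long long square = static_cast<unsigned long long>(key) * key;
    return static_cast<unsigned>((square >> 7) & 0x3FF);   // 0x3FF keeps 10 bits
}
```

With this choice `midSquareHash(3121)` yields 322, matching the slide. Keys whose squares are wider or narrower than 24 bits would need a different shift.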
Hash functions
● Extraction: take a part of the key,
Example: take the last 4 digits of the ID number:
h(199805535) = 5535
● This method is useful when part of the key is common in the data,
– ID numbers usually start with the same digits
Hash functions
● Radix transformation: the key is converted to another number system, and the result is taken modulo the array size:
Example: key = 345 , size = 100, base 9
345 in base 9 is 423 → h(345) = 423 % 100 = 23
245 in base 9 is 302 → h(245) = 302 % 100 = 2
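The radix transformation can be sketched as below. The function name `radixHash` is illustrative, and the sketch assumes a non-negative key, a base between 2 and 10, and a transformed value small enough to fit in an `int`.

```cpp
// Radix transformation: write the key in another base, read the resulting
// digits as a decimal number, then take that number modulo the table size.
// Assumptions: key >= 0, 2 <= base <= 10, no overflow of int.
int radixHash(int key, int base, int tableSize)
{
    int transformed = 0;
    int place = 1;                              // decimal weight of the current digit
    while (key > 0)
    {
        transformed += (key % base) * place;    // next digit in the new base
        place *= 10;
        key /= base;
    }
    return transformed % tableSize;
}
```

For key 345 in base 9 the digits are 4, 2, 3, so `radixHash(345, 9, 100)` computes 423 % 100 and returns 23.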
Collision resolution
● Collision: two keys hash to the same address (index)
● How to deal with collisions:
– Use a perfect hash function (not practical)
– Open addressing: find an available position to place the colliding key
  ● linear probing
  ● quadratic probing
  ● double hashing
– Chaining: use a linked list to store the keys
Collision resolution
● Linear probing: look for the next available position, wrapping around at the end of the array
● Ex. h(x) = x % 10 , size = 10
16, 22, 77, 48, 35, 62, 47, 99
0 1 2 3 4 5 6 7 8 9
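Insertion with linear probing could look like the sketch below. It assumes non-negative keys and uses a hypothetical marker `EMPTY = -1` for free positions; the function name `insertLinear` is illustrative.

```cpp
#include <vector>

const int EMPTY = -1;   // assumption: keys are non-negative, so -1 can mark a free position

// Insert key using h(x) = x % size and linear probing.
// Returns the index where the key was placed, or -1 if the table is full.
int insertLinear(std::vector<int>& table, int key)
{
    int size = static_cast<int>(table.size());
    int home = key % size;
    for (int step = 0; step < size; step++)
    {
        int idx = (home + step) % size;   // next position, wrapping around the end
        if (table[idx] == EMPTY)
        {
            table[idx] = key;
            return idx;
        }
    }
    return -1;
}
```

Inserting 16, 22, 77, 48, 35, 62, 47, 99 in that order into a table of size 10 places 62 at index 3, 47 at index 9 and 99 at index 0. The final table, from index 0 to 9, is 99, (empty), 22, 62, (empty), 35, 16, 77, 48, 47.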
Collision resolution
● Linear probing tends to create clusters.
– elements tend to group near each other
● The empty position following a cluster has a higher chance to be filled.
– this is proportional to the cluster size,
– the bigger the cluster, the higher the probability
Collision resolution
● Quadratic probing: look for positions using a quadratic formula:
( h(x) + i ) % size
i = 1 , -1 , 4, -4, 9, -9, ….
● Ex. h(x) = x % 10 , size = 10
16, 22, 77, 48, 35, 62, 47, 99
0 1 2 3 4 5 6 7 8 9
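A sketch of insertion with this probe order (+1, -1, +4, -4, ...) is given below. The marker `EMPTY = -1`, the name `insertQuadratic` and the limit of size / 2 on the number of squares tried are assumptions of the sketch, not something stated in the slides.

```cpp
#include <vector>

const int EMPTY = -1;   // assumption: keys are non-negative, so -1 can mark a free position

// Insert key using h(x) = x % size and the probe offsets +1, -1, +4, -4, +9, -9, ...
// Returns the index used, or -1 if none of the tried positions was free.
int insertQuadratic(std::vector<int>& table, int key)
{
    int size = static_cast<int>(table.size());
    int home = key % size;
    if (table[home] == EMPTY)
    {
        table[home] = key;
        return home;
    }
    for (int i = 1; i <= size / 2; i++)
    {
        int offsets[2] = { i * i, -(i * i) };
        for (int offset : offsets)
        {
            // add size before the final % so that negative values wrap correctly
            int idx = ((home + offset) % size + size) % size;
            if (table[idx] == EMPTY)
            {
                table[idx] = key;
                return idx;
            }
        }
    }
    return -1;
}
```

For the data 16, 22, 77, 48, 35, 62, 47, 99 and size 10, key 62 lands at index 3 (offset +1) and key 47 lands at index 1 (offset +4, after 8 and 6 are found occupied). A return value of -1 does not necessarily mean the table is full, because for an arbitrary size the probes may not reach every index.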
Collision resolution
● Assume key = 9, h(x) = x % 19 and the array is full except A[3]. What is the sequence of indices (probes) that are tried?
● Quadratic probing avoids the clustering of linear probing but will generate “secondary clusters”, since two elements that hash to the same index will generate the same probe sequence
Collision resolution
● How to know when to stop if the key is not in the array ?
● If the size of the array is a prime number of the form 4j + 3 , where j is an integer, the probing sequence is guaranteed to cover all the indices
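The coverage claim can be checked experimentally with a small sketch like the one below. The helper names `quadraticProbes` and `coversAllIndices` are illustrative, and the sketch assumes the probe order h, h+1, h-1, h+4, h-4, ... with squares up to ((size - 1) / 2)^2.

```cpp
#include <vector>

// Quadratic probe sequence: home, home+1, home-1, home+4, home-4, ... (all modulo size).
std::vector<int> quadraticProbes(int home, int size)
{
    std::vector<int> probes;
    probes.push_back(home % size);
    for (int i = 1; i <= (size - 1) / 2; i++)
    {
        probes.push_back((home + i * i) % size);
        probes.push_back(((home - i * i) % size + size) % size);   // keep the index non-negative
    }
    return probes;
}

// True if the probe sequence starting at home visits every index of the table.
bool coversAllIndices(int home, int size)
{
    std::vector<bool> seen(size, false);
    for (int idx : quadraticProbes(home, size))
        seen[idx] = true;
    for (bool s : seen)
        if (!s)
            return false;
    return true;
}
```

For the earlier exercise (key 9, size 19 = 4·4 + 3) the sequence starts 9, 10, 8, 13, 5, 18, 0, 6, 12, 15, 3, so the free position A[3] is reached on the eleventh probe, and `coversAllIndices(9, 19)` is true. For size 17, a prime that is not of the form 4j + 3, the same check returns false.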
Collision resolution
● Double hashing: if a collision occurs, use another hash function to compute the step
● probe sequence (each index is taken modulo the array size):
h(x), h(x) + h2(x), h(x) + 2h2(x), h(x) + 3h2(x), ...
● Example:
– h(x) = x % 19
– h2(x) = x % 13
– What are the probe sequences for x = 3, x = 22 ?
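The probe sequence of this example can be generated with the sketch below. The function name `doubleHashProbes` and the `count` parameter are illustrative; the table size 19 is taken from h(x).

```cpp
#include <vector>

// First `count` probes for key x with h(x) = x % 19 and h2(x) = x % 13.
// Assumption: x >= 0 and the table has 19 positions.
std::vector<int> doubleHashProbes(int x, int count)
{
    const int size = 19;
    int h = x % 19;     // first position
    int h2 = x % 13;    // step between consecutive probes
    std::vector<int> probes;
    for (int i = 0; i < count; i++)
        probes.push_back((h + i * h2) % size);
    return probes;
}
```

Both keys start at index 3, but they follow different paths afterwards: x = 3 gives 3, 6, 9, 12, ... and x = 22 gives 3, 12, 2, 11, ... Note that with this h2 a key that is a multiple of 13 gets a step of 0 and never moves; a common remedy, which is not part of the slides, is to pick a second function that cannot return 0.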
Comparison
Collision resolution
● Chaining: store a pointer to a linked list in the array, and store the data in the linked list
● The list can be sorted for efficiency
● Chaining requires more space to store the pointers
Collision resolution
● Separate chaining:
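Since the figure for this slide is not available in the transcript, here is a minimal sketch of separate chaining. It assumes non-negative integer keys, uses `std::list` from the standard library for the chains instead of a hand-written linked list, and the type name `ChainedTable` is illustrative.

```cpp
#include <algorithm>
#include <list>
#include <vector>

// Separate chaining: every array position holds its own linked list of keys.
struct ChainedTable
{
    std::vector<std::list<int>> buckets;

    explicit ChainedTable(int size) : buckets(size) {}

    int hash(int key) const { return key % static_cast<int>(buckets.size()); }

    // Colliding keys are simply appended to the list at their index.
    void insert(int key) { buckets[hash(key)].push_back(key); }

    // Only the one list selected by the hash function is searched.
    bool contains(int key) const
    {
        const std::list<int>& chain = buckets[hash(key)];
        return std::find(chain.begin(), chain.end(), key) != chain.end();
    }
};
```

With a table of size 10, the keys 15 and 225 both hash to index 5 and end up in the same list, so neither has to be moved to another array position.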
Collision resolution
● Coalesced chaining:
– 2D array: Size x 2 → A[size][2]
– the second column stores the index of the next element in the chain
● Example: store the following data,
12, 23, 15, 72, 49, 35, 9, 22
h(x) = x % 10
-2 → position is available
-1 → element is last in the chain
collision resolution: linear probing
Example
12, 23, 15, 72, 49, 35, 9, 22
0
1
2
3
4
5
6
7
8
9
Example
12, 23, 15, 72, 49, 35, 9, 22

index  value  next
0      9      -1
1             -2
2      12      4
3      23     -1
4      72      7
5      15      6
6      35     -1
7      22     -1
8             -2
9      49      0
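The table above can be reproduced with the sketch below. It keeps the two columns as two parallel vectors instead of a 2D array, and it looks for the free position by linear probing starting from the key's home index; both are choices of this sketch, as are the names `CoalescedTable`, `AVAILABLE` and `LAST`.

```cpp
#include <vector>

const int AVAILABLE = -2;   // position is available (marker from the slide)
const int LAST = -1;        // element is last in its chain (marker from the slide)

struct CoalescedTable
{
    std::vector<int> value;   // first column: the stored key
    std::vector<int> next;    // second column: index of the next element in the chain

    explicit CoalescedTable(int size) : value(size, 0), next(size, AVAILABLE) {}

    // Insert key with h(x) = x % size. Returns false if the table is full.
    bool insert(int key)
    {
        int size = static_cast<int>(value.size());
        int home = key % size;
        if (next[home] == AVAILABLE)          // no collision: place the key at its home index
        {
            value[home] = key;
            next[home] = LAST;
            return true;
        }
        int slot = -1;                        // collision: find a free position by linear probing
        for (int step = 1; step < size; step++)
        {
            int idx = (home + step) % size;
            if (next[idx] == AVAILABLE) { slot = idx; break; }
        }
        if (slot == -1)
            return false;
        int tail = home;                      // walk to the end of the chain that passes through home
        while (next[tail] != LAST)
            tail = next[tail];
        value[slot] = key;
        next[slot] = LAST;
        next[tail] = slot;                    // link the new element to the end of the chain
        return true;
    }
};
```

Inserting 12, 23, 15, 72, 49, 35, 9, 22 into a table of size 10 gives the `next` column -1, -2, 4, -1, 7, 6, -1, -1, -2, 0, as in the table. For example, 22 collides at index 2, is stored at the free index 7, and is linked after 72 at the end of the chain 2 → 4 → 7.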
Deletion
● What happens if you delete a value from a hash table ?
Example: arrange the data: 11, 34, 62, 4, 91
– use h(x) = x % 10, and linear probing
– then delete data 34, 62
– then search for 4
0 1 2 3 4 5 6 7 8 9
Deletion
● The position of the deleted item should not be marked as empty, why ?
● Can we reuse the position of the deleted element ?
● If you have many delete operations and few insert operations, you should rehash the table after a number of deletions
● Rehash: arrange the data using a different table size and/or a different hash function
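One common way to handle deletion with linear probing is to mark a deleted position with a special value (often called a tombstone) instead of marking it empty. The sketch below illustrates the idea under stated assumptions: keys are non-negative and distinct, `EMPTY = -1` and `DELETED = -2` are hypothetical markers, and the function names are illustrative.

```cpp
#include <vector>

const int EMPTY = -1;     // position was never used
const int DELETED = -2;   // position held a key that was later deleted

// Returns the index of key, or -1 if it is not in the table.
int findSlot(const std::vector<int>& table, int key)
{
    int size = static_cast<int>(table.size());
    int home = key % size;
    for (int step = 0; step < size; step++)
    {
        int idx = (home + step) % size;
        if (table[idx] == key)
            return idx;
        if (table[idx] == EMPTY)
            return -1;        // a never-used position ends the search
        // a DELETED position does not end the search: keep probing
    }
    return -1;
}

// Inserts key at the first EMPTY or DELETED position (deleted positions are reused).
// Assumption: key is not already in the table.
bool insertKey(std::vector<int>& table, int key)
{
    int size = static_cast<int>(table.size());
    int home = key % size;
    for (int step = 0; step < size; step++)
    {
        int idx = (home + step) % size;
        if (table[idx] == EMPTY || table[idx] == DELETED)
        {
            table[idx] = key;
            return true;
        }
    }
    return false;
}

// Marks the position of key as DELETED rather than EMPTY.
bool removeKey(std::vector<int>& table, int key)
{
    int idx = findSlot(table, key);
    if (idx == -1)
        return false;
    table[idx] = DELETED;
    return true;
}
```

In the example, 4 is stored at index 5 because index 4 was taken by 34. If deleting 34 marked index 4 as empty, a search for 4 would stop at index 4 and wrongly report that 4 is missing. With the `DELETED` marker the search continues and finds 4 at index 5. Tombstones still lengthen searches, which is one reason to rehash after many deletions.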
THE END