Design and Analysis of Algorithms: Hash Tables
Haidong Xue, Summer 2012, at GSU
Dictionary operations
“A hash table is an effective data structure for implementing dictionaries” – textbook page 253
Operation   Very likely   Worst case
INSERT      O(1)          Θ(1)
DELETE      O(1)          O(1)
SEARCH      O(1)          Θ(n)
Direct-address tables
Keys: 2, 3, 6, 1, 7, 5
Direct-address table (slots 1 to 10): key k is stored in slot k
• SEARCH(S, 6): O(1)
• INSERT(S, 4): O(1)
• DELETE(S, 7): O(1)
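The direct-address idea can be sketched in a few lines of Python. This is only an illustration under assumptions: the class name `DirectAddressTable` and its methods are hypothetical, and keys are assumed to be integers from 1 to 10, as in the slide.

```python
class DirectAddressTable:
    """Direct-address table for integer keys in the range 1..size (illustrative)."""

    def __init__(self, size):
        # One slot per possible key; index 0 is unused so key k lives in slot k.
        self.slots = [None] * (size + 1)

    def insert(self, key):
        self.slots[key] = key   # O(1): the key is its own address

    def delete(self, key):
        self.slots[key] = None  # O(1)

    def search(self, key):
        return self.slots[key]  # O(1): the key, or None if it is absent


table = DirectAddressTable(10)
for k in [2, 3, 6, 1, 7, 5]:
    table.insert(k)

found = table.search(6)  # 6
table.insert(4)
table.delete(7)
```

Every operation is a single array access, which is why all three cost O(1). The price is one slot for every possible key, whether or not it is ever stored.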
What’s the problem here?
• Direct addressing: use keys as addresses
• Storage requirement = Θ(|U|), where U is the universe of keys
• When the range of elements is [1, 30000], the table needs 30000 slots, however few keys are actually stored
Hash tables
• Can we have O(1) INSERT, DELETE, and SEARCH with less storage?
Keys: 2, 3, 6, 1, 7, 5
Hash table: slots 0, 1, 2
Hash Function: h(x) = x mod 3
h(2) = 2 mod 3 = 2
h(3) = 3 mod 3 = 0
h(6) = 6 mod 3 = 0
h(1) = 1 mod 3 = 1
h(7) = 7 mod 3 = 1
h(5) = 5 mod 3 = 2
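Collision! Multiple elements land in one slot: 3 and 6 in slot 0, 1 and 7 in slot 1, 2 and 5 in slot 2.
So yes, O(1) operations with less storage are possible, provided collisions are handled.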
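As a quick check of the arithmetic above, the following sketch groups the six keys by slot under h(x) = x mod 3. The names `h`, `slots`, and `collisions` are chosen here for illustration only.

```python
def h(x, m=3):
    """Hash function from the slide: h(x) = x mod 3."""
    return x % m


keys = [2, 3, 6, 1, 7, 5]

# Group keys by the slot they hash to.
slots = {}
for k in keys:
    slots.setdefault(h(k), []).append(k)

# A slot holding more than one key is a collision.
collisions = {s: ks for s, ks in slots.items() if len(ks) > 1}
# slots == {2: [2, 5], 0: [3, 6], 1: [1, 7]}
```

With six keys and only three slots, collisions are unavoidable, and here every slot ends up with two keys.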
Hash tables
A common method is to put the elements that share a slot into a linked list, i.e. chaining.
Hash table (h(x) = x mod 3):
• slot 0: 3, 6
• slot 1: 1, 7
• slot 2: 2, 5
• SEARCH(S, 6): h(6) = 6 mod 3 = 0; SEARCH in the slot-0 linked list: O(1) + 2 (2 is the length of the linked list)
• INSERT(S, 4): h(4) = 4 mod 3 = 1; INSERT in the slot-1 linked list: O(1) + O(1) = O(1)
• DELETE(S, 7): h(7) = 7 mod 3 = 1; DELETE in the slot-1 linked list: O(1) + O(1) = O(1)
What is the upper bound on the length? What is the average length?
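A minimal sketch of chaining, assuming h(x) = x mod 3 as in the example. The class `ChainedHashTable` is hypothetical, and Python lists stand in for the linked lists, so this illustrates the idea rather than the exact costs.

```python
class ChainedHashTable:
    """Hash table with chaining; Python lists stand in for linked lists."""

    def __init__(self, m=3):
        self.m = m
        self.slots = [[] for _ in range(m)]  # one chain per slot

    def _h(self, key):
        return key % self.m

    def insert(self, key):
        # Hash to a slot, then add the key to that slot's chain.
        self.slots[self._h(key)].append(key)

    def search(self, key):
        # Hash to a slot, then walk the chain: cost grows with chain length.
        return key in self.slots[self._h(key)]

    def delete(self, key):
        # Deleting by key has to find the element first, so this scans the
        # chain. An O(1) delete presumably assumes the element's position in
        # a doubly linked list is already known.
        chain = self.slots[self._h(key)]
        if key in chain:
            chain.remove(key)


table = ChainedHashTable(3)
for k in [2, 3, 6, 1, 7, 5]:
    table.insert(k)

found_6 = table.search(6)  # True: 6 is in the slot-0 chain
table.insert(4)            # goes to slot 1
table.delete(7)            # removed from slot 1
```

After these operations the chains are [3, 6], [1, 4], and [2, 5]. Because SEARCH has to walk a chain, the two questions about chain length are the ones that decide its cost.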
Analysis of hash tables
Hash table: m slots (0, 1, 2, ..., m-1) holding n elements
Load factor: α = n/m, the average number of elements per slot
Uniform hashing: “each key is equally likely to hash to any of the m slots”
Analysis of hash tables
With the assumption of uniform hashing:
• Theorem 11.1, unsuccessful search: Θ(1 + α)
• Theorem 11.2, successful search: Θ(1 + α)
α = n/m, so T(n) = Θ(1 + n/m)
If n = O(m), T(n) = Θ(1 + O(m)/m) = O(1)
How to get uniform hashing?
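To make the load factor concrete, the sketch below computes α = n/m (n stored elements, m slots) and compares it with the average chain length. The keys 1..30, the 10 slots, and the function names are illustrative assumptions, not taken from the slides.

```python
def load_factor(n, m):
    """Load factor alpha = n / m for n elements in m slots."""
    return n / m


def chain_lengths(keys, m):
    """Length of each chain when keys are hashed with h(k) = k mod m."""
    lengths = [0] * m
    for k in keys:
        lengths[k % m] += 1
    return lengths


keys = list(range(1, 31))  # n = 30 hypothetical keys
m = 10

lengths = chain_lengths(keys, m)
alpha = load_factor(len(keys), m)  # 3.0
average_length = sum(lengths) / m  # 3.0
```

The average chain length equals α for any hash function, because the chain lengths always sum to n. What uniform hashing adds is that individual chains are expected to stay close to that average, which is where the Θ(1 + α) search bound comes from.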
Hash functions
How to get uniform hashing?
Uniform hashing: “each key is equally likely to hash to any of the m slots”
To achieve this goal, many hashing methods have been proposed:
• Division hashing
• Multiplication hashing
• Universal hashing
Hash functions – division hashing
• h(k) = k mod m, where k is the value of the key and m is the number of slots
• E.g.:
– Final grades of all my students with a hash table of 10 slots
– Items in grocery stores with a hash table of 10 slots
  • 99 cents, large soda
  • $1.99, ground beef
  • $6.99, lamb
What’s the problem here? What if we still use 10 slots?
Hash functions – division hashing
• h(k) = k mod m
• Choose m as a prime number: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53, 59, 61, 67, 71, 73, …
• E.g.: 99 mod 7 = 1, 199 mod 7 = 3, 699 mod 7 = 6
What’s the problem here?
• It is sometimes not very convenient to implement
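The effect of the choice of m can be seen by hashing the grocery prices, expressed in cents, with both 10 slots and 7 slots. The function name `division_hash` and the variable names are illustrative.

```python
def division_hash(k, m):
    """Division hashing: h(k) = k mod m."""
    return k % m


prices_in_cents = [99, 199, 699]  # large soda, ground beef, lamb

with_10_slots = [division_hash(k, 10) for k in prices_in_cents]  # [9, 9, 9]
with_7_slots = [division_hash(k, 7) for k in prices_in_cents]    # [1, 3, 6]
```

With m = 10 only the last digit matters, so prices ending in 9 all collide in slot 9. With the prime m = 7 the same keys spread over three different slots.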
Hash functions – multiplication hashing
• h(k) = floor(m(kA mod 1)), where m is the number of slots and A is a constant in (0, 1)
• E.g.: A = 0.123, m = 10
– 99 * 0.123 = 12.177, h(99) = floor(10 * 0.177) = 1
– 199 * 0.123 = 24.477, h(199) = floor(10 * 0.477) = 4
– 699 * 0.123 = 85.977, h(699) = floor(10 * 0.977) = 9
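The worked example can be reproduced with a short sketch. Using `fractions.Fraction` to represent A = 0.123 exactly is a choice made here to avoid floating-point rounding; it is not something the slides prescribe.

```python
from fractions import Fraction
from math import floor


def multiplication_hash(k, m, A):
    """Multiplication hashing: h(k) = floor(m * (k*A mod 1))."""
    fractional_part = (k * A) % 1  # keep only the fractional part of k*A
    return floor(m * fractional_part)


A = Fraction(123, 1000)  # A = 0.123, stored exactly
hashes = [multiplication_hash(k, 10, A) for k in (99, 199, 699)]
# hashes == [1, 4, 9]
```

Unlike division hashing with m = 10, the three prices land in three different slots even though they share the same last digit.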
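Hash functions – universal hashing
• H is a set of hash functions
• At the beginning of each execution, randomly choose a hash function h from H
• Universal: for distinct keys k and l, Pr[h(k) = h(l)] ≤ 1/m, where m is the number of slots
• Theorem 11.3, with chaining, where E[n_h(k)] is the expected length of the list at slot h(k):
– If k is not in the table, E[n_h(k)] ≤ α
– If k is in the table, E[n_h(k)] ≤ 1 + α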
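The slides do not name a concrete universal family. As an assumption, the sketch below uses the well-known family h_{a,b}(k) = ((a*k + b) mod p) mod m, with p a prime larger than every key, 1 ≤ a < p and 0 ≤ b < p. The function names and the values p = 17 and m = 6 are illustrative.

```python
import random


def make_universal_hash(p, m, rng=random):
    """Pick h(k) = ((a*k + b) mod p) mod m at random from the family."""
    a = rng.randrange(1, p)  # a in 1..p-1
    b = rng.randrange(0, p)  # b in 0..p-1

    def h(k):
        return ((a * k + b) % p) % m

    return h


def collision_count(k, l, p, m):
    """Count the (a, b) pairs for which keys k and l hash to the same slot."""
    return sum(
        1
        for a in range(1, p)
        for b in range(p)
        if ((a * k + b) % p) % m == ((a * l + b) % p) % m
    )


p, m = 17, 6
h = make_universal_hash(p, m, random.Random(0))  # seeded for repeatability

family_size = p * (p - 1)                # number of (a, b) pairs
collisions = collision_count(3, 8, p, m)
# Universality: at most family_size / m of the functions make 3 and 8 collide.
```

Because the function is drawn at random at the start of each execution, no fixed set of keys can force many collisions on every run. The exhaustive count illustrates the 1/m collision bound for one pair of distinct keys.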
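Another method to deal with collisions: open addressing
• No linked list
• Hash functions include the probe number i: h(k, i)
– Linear probing: h(k, i) = (h′(k) + i) mod m
– Quadratic probing: h(k, i) = (h′(k) + c1*i + c2*i^2) mod m
– Double hashing: h(k, i) = (h1(k) + i*h2(k)) mod m
• When h(k, i) does not work, use h(k, i+1)
• Expected number of probes for an unsuccessful search is at most 1/(1 − α), for α < 1 under uniform hashing
• Expected number of probes for a successful search is at most (1/α) ln(1/(1 − α))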
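The three probing schemes can be sketched as plain functions. The formulas are the standard textbook forms and are assumed here: linear (h′(k) + i) mod m, quadratic (h′(k) + c1*i + c2*i^2) mod m, and double hashing (h1(k) + i*h2(k)) mod m. The probe bounds 1/(1 − α) and (1/α) ln(1/(1 − α)) are likewise the usual expected-case bounds under uniform hashing. All function names and the choices m = 7, h1(k) = k mod 7, h2(k) = 1 + (k mod 6) are illustrative.

```python
from math import log


def linear_probe(h1, k, i, m):
    return (h1(k) + i) % m


def quadratic_probe(h1, k, i, m, c1=1, c2=1):
    return (h1(k) + c1 * i + c2 * i * i) % m


def double_hash_probe(h1, h2, k, i, m):
    return (h1(k) + i * h2(k)) % m


def unsuccessful_bound(alpha):
    """Bound on expected probes for an unsuccessful search (alpha < 1)."""
    return 1 / (1 - alpha)


def successful_bound(alpha):
    """Bound on expected probes for a successful search (0 < alpha < 1)."""
    return (1 / alpha) * log(1 / (1 - alpha))


m = 7
h1 = lambda k: k % 7        # primary hash (illustrative choice)
h2 = lambda k: 1 + (k % 6)  # secondary hash, never 0 (illustrative choice)

k = 10
linear = [linear_probe(h1, k, i, m) for i in range(m)]
quadratic = [quadratic_probe(h1, k, i, m) for i in range(m)]
double = [double_hash_probe(h1, h2, k, i, m) for i in range(m)]
# linear == [3, 4, 5, 6, 0, 1, 2]
# double == [3, 1, 6, 4, 2, 0, 5]
```

Linear probing and this double-hashing setup both visit every slot exactly once. The quadratic sequence with these constants revisits some slots, so c1, c2, and m need to be chosen with care. At α = 0.5 the bounds evaluate to 2 probes for an unsuccessful search and about 1.39 for a successful one.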
Open addressing:
h′(k) = k mod 3
h(k, i) = (h′(k) + i) mod 10
Table (slots 0 to 9) after inserting 2, 3, 6, 1: slot 0 = 3, slot 1 = 6, slot 2 = 2, slot 3 = 1
h(2, 0) = ((2 mod 3) + 0) mod 10 = 2
h(3, 0) = ((3 mod 3) + 0) mod 10 = 0
h(6, 0) = ((6 mod 3) + 0) mod 10 = 0 (occupied by 3)
h(6, 1) = ((6 mod 3) + 1) mod 10 = 1
h(1, 0) = ((1 mod 3) + 0) mod 10 = 1 (occupied by 6)
h(1, 1) = ((1 mod 3) + 1) mod 10 = 2 (occupied by 2)
h(1, 2) = ((1 mod 3) + 2) mod 10 = 3
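The probe computations above can be replayed with a small open-addressing table that uses linear probing, h′(k) = k mod 3, and 10 slots. The class `OpenAddressTable` is a hypothetical sketch, and deletion is left out because it needs extra care under open addressing.

```python
class OpenAddressTable:
    """Open addressing with linear probing: h(k, i) = ((k mod 3) + i) mod m."""

    def __init__(self, m=10):
        self.m = m
        self.slots = [None] * m

    def _probe(self, key, i):
        return ((key % 3) + i) % self.m

    def insert(self, key):
        # Try probe numbers 0, 1, 2, ... until an empty slot is found.
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None:
                self.slots[j] = key
                return j
        raise OverflowError("hash table is full")

    def search(self, key):
        # Follow the same probe sequence; an empty slot means the key is absent.
        for i in range(self.m):
            j = self._probe(key, i)
            if self.slots[j] is None:
                return None
            if self.slots[j] == key:
                return j
        return None


table = OpenAddressTable()
positions = [table.insert(k) for k in (2, 3, 6, 1)]
# positions == [2, 0, 1, 3]
```

Key 6 needs two probes and key 1 needs three, matching the h(6, 1) and h(1, 2) lines of the example.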