![Page 1: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/1.jpg)
Hash Tables
CS 310 – Professor Roch
Weiss Chapter 20
All figures marked with a chapter and section number are copyrighted © 2006 by
Pearson Addison-Wesley unless otherwise indicated. All rights reserved.
![Page 2: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/2.jpg)
Hash tables
• Suppose we decide that the average cost of O(log N) for the operations of a binary search tree is too slow.
• Hash tables provide a way to insert, delete and find in average O(1) time.
• Why did we even bother with binary search trees?
![Page 3: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/3.jpg)
No free lunch
• The constant time comes with a cost:
• Hash table elements have no order, so
– visiting according to an ordering property
– finding the minimum or maximum elements
– etc.
are all expensive
![Page 4: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/4.jpg)
Foundations of hashing
• Much like binary search trees, we choose some field of a record to serve as a key.
• A function maps the key to an index.
HashFunction(key) → index
• The index is used in an array, and the array entries are sometimes referred to as “hash buckets.”
![Page 5: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/5.jpg)
Hash functions
• A naïve hash function for a string might build a polynomial from the string’s encoding:
Example:
$$\text{"Cool"} \rightarrow 67 \cdot 128^3 + 111 \cdot 128^2 + 111 \cdot 128^1 + 108 \cdot 128^0$$
![Page 6: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/6.jpg)
Hash functions
• We can index an array by hash function index:
$$\text{HashTable}[\,67 \cdot 128^3 + 111 \cdot 128^2 + 111 \cdot 128^1 + 108 \cdot 128^0\,]$$
‘Cool’ + any other information
![Page 7: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/7.jpg)
Uh-oh
• For a 4 character string we need over 268,000,000 entries in the array.
• We can reduce the size to something manageable by using the modulo operator:
142342124 % 10000 = 2124
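The arithmetic on the last few slides can be checked with a minimal Python sketch. The name `poly_hash` and the use of `ord` as the character encoding are assumptions made for illustration.

```python
def poly_hash(key, x=128):
    """Polynomial hash: the first character gets the highest power of x."""
    n = len(key)
    return sum(ord(ch) * x ** (n - 1 - i) for i, ch in enumerate(key))

full = poly_hash("Cool")      # 67*128**3 + 111*128**2 + 111*128 + 108
index = full % 10000          # reduce to a table with 10,000 entries
print(full, index)            # 142342124 2124
print(128 ** 4)               # 268435456 possible values for 4 characters
```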
![Page 8: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/8.jpg)
More about the hash function
• If we consider the hash function to be a polynomial in the variable X, e.g. for strings:
$$\text{hash}(A) = \sum_{i=0}^{\text{length}(A)-1} A_i X^i$$
$$\text{e.g. } \text{hash}(\text{"moe"}) = \text{encoding}(\text{'m'})X^0 + \text{encoding}(\text{'o'})X^1 + \text{encoding}(\text{'e'})X^2$$
we can reduce the number of multiplications by incrementally computing the hash function.
![Page 9: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/9.jpg)
Overflowing the hash function
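• Consider X = 128 as in our previous example for hashing strings, and assume that we are using 64-bit unsigned integers:
$$\text{unsigned int} \in [0, 1, 2, \ldots, 2^{64} - 1]$$
Recall that $128 = 2^7$.
Consider a string of length 10: $128^9 = 2^{7 \cdot 9} = 2^{63}$
so the highest-order term of a 10-character string is a multiple of $2^{63}$, and essentially any 10-character string overflows a 64-bit unsigned integer (a 9-character string of 7-bit codes still fits in 63 bits).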
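A minimal Python sketch of the overflow argument, assuming `ord` as the character encoding. Python integers never overflow, so the exact value can be compared against the 64-bit limit; `LIMIT` and `poly_hash` are names chosen here for illustration.

```python
LIMIT = 2 ** 64               # a 64-bit unsigned integer holds 0 .. 2**64 - 1

def poly_hash(key, x=128):
    """Exact polynomial hash, computed with unbounded Python integers."""
    value = 0
    for ch in key:
        value = value * x + ord(ch)
    return value

exact = poly_hash("abcdefghij")        # a 10-character key
print(128 ** 9 == 2 ** 63)             # True
print(exact >= LIMIT)                  # True: the value does not fit in 64 bits
print(exact % LIMIT)                   # what a 64-bit register would keep
```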
![Page 10: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/10.jpg)
Resolving hash overflow
hash_value = 0
for i = 0 to length(A) - 1
    hash_value = hash_value*X + encoding(A[i])
• Avoids computing X^i explicitly, but the sum can still overflow…
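The loop on this slide is Horner's rule. Below is a minimal Python sketch, assuming `ord` as the encoding; `horner_hash` and `explicit_hash` are illustrative names. Evaluated this way, the first character of the key ends up multiplied by the highest power of X.

```python
def horner_hash(key, x=128):
    """Evaluate the polynomial with one multiplication per character."""
    hash_value = 0
    for ch in key:
        hash_value = hash_value * x + ord(ch)
    return hash_value

def explicit_hash(key, x=128):
    """Same polynomial, computing each power of x explicitly."""
    n = len(key)
    return sum(ord(ch) * x ** (n - 1 - i) for i, ch in enumerate(key))

print(horner_hash("Cool") == explicit_hash("Cool"))   # True
```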
![Page 11: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/11.jpg)
Resolving hash overflow
1. Apply modulo after each operation
hash_value = 0
for i = 0 to length(A) - 1
    hash_value = /* modulo is expensive */
        (hash_value*X + encoding(A[i])) % TableSize
2. Allow overflow. We need to be careful, though, as long polynomials will shift the first elements of the key out of range.
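Both options can be sketched in Python. This is an illustration under stated assumptions, not the textbook code: `ord` is the encoding, the table size 10007 is arbitrary, and 64-bit wraparound is simulated with a bit mask because Python integers do not overflow on their own.

```python
MASK = 2 ** 64 - 1            # simulates 64-bit unsigned wraparound

def hash_mod_each_step(key, table_size, x=128):
    """Option 1: reduce modulo the table size after every step."""
    hash_value = 0
    for ch in key:
        hash_value = (hash_value * x + ord(ch)) % table_size
    return hash_value

def hash_allow_overflow(key, table_size, x=128):
    """Option 2: let the value wrap at 64 bits, reduce once at the end."""
    hash_value = 0
    for ch in key:
        hash_value = (hash_value * x + ord(ch)) & MASK
    return hash_value % table_size

# With x = 128 = 2**7, the first character of an 11-character key is
# multiplied by 128**10 = 2**70, which is 0 modulo 2**64.
a = hash_allow_overflow("Ahashtables", 10007)
b = hash_allow_overflow("Bhashtables", 10007)
print(a == b)                 # True: the first character was shifted out
```

With option 1 the same two keys hash to different indices, because every character still contributes to the result.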
![Page 12: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/12.jpg)
Avoiding overflow
![Page 13: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/13.jpg)
Allowing overflow
![Page 14: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/14.jpg)
Going to extremes…
• Here, we have effectively set X in our polynomial to the value 1.
• What are the implications of this?
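One way to explore the question is a small Python sketch with X = 1, where the hash is just the sum of the character codes. The name `sum_hash`, the use of `ord`, and the table size are assumptions for illustration.

```python
def sum_hash(key, table_size):
    """Polynomial hash with X = 1: the sum of the character codes."""
    return sum(ord(ch) for ch in key) % table_size

# Any rearrangement of the same characters collides.
print(sum_hash("listen", 10007) == sum_hash("silent", 10007))   # True

# Short keys only reach small indices, so a large table is poorly used.
print(sum_hash("z" * 8, 10007))                                 # 976
```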
![Page 15: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/15.jpg)
Collisions
• Our hash function no longer maps each key to a unique index: different keys can hash to the same bucket (a collision).
• If we choose our hash function carefully, this will not happen too often.
• Nonetheless, we still need to handle it and we will investigate different ways to do so.
![Page 16: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/16.jpg)
Linear probing
• Simple idea: When a collision occurs, look for the next empty hash bucket.
[Figure: the key hashes to a used bucket, the following bucket is also used, and the next empty bucket is the one used.]
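A minimal Python sketch of linear probing, assuming a fixed-size list in which `None` marks an empty bucket. Integer keys are used because Python's built-in `hash` of a small non-negative integer is the integer itself, which keeps the example deterministic; `insert` and `find` are illustrative names.

```python
def insert(table, key):
    """Insert key using linear probing; returns the bucket index used."""
    size = len(table)
    index = hash(key) % size
    for _ in range(size):
        if table[index] is None or table[index] == key:
            table[index] = key
            return index
        index = (index + 1) % size      # collision: try the next bucket
    raise RuntimeError("hash table is full")

def find(table, key):
    """Return the bucket index holding key, or None if it is absent."""
    size = len(table)
    index = hash(key) % size
    for _ in range(size):
        if table[index] is None:
            return None                 # reached an empty bucket: not present
        if table[index] == key:
            return index
        index = (index + 1) % size
    return None

table = [None] * 7
print(insert(table, 10))   # 3  (10 % 7)
print(insert(table, 17))   # 4  (17 % 7 is 3, which is used)
print(insert(table, 24))   # 5  (3 and 4 are used)
```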
![Page 17: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/17.jpg)
Linear probing analysis
• The load factor is defined as
$$\lambda = \frac{\text{\# of used hash table buckets}}{\text{\# of buckets}}$$
• Let us assume that
1. Each insertion/access of the hash table is independent of other ones (very naïve assumption)
2. The hash table is large (reasonable assumption)
![Page 18: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/18.jpg)
Naïve analysis
• Assuming independence of probes, the average number of buckets examined in a linear probing insertion is 1/(1 − λ).
Proof: Pr(empty bucket) = 1 − λ. If an event occurs with Pr(event) = p on each independent trial, the expected number of trials until it first occurs is 1/p.
So, we expect to try 1/(1 − λ) times before we see an empty bucket.
![Page 19: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/19.jpg)
Primary clustering
• Hash insertions and finds are not independent.
• Results in “primary clustering”
![Page 20: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/20.jpg)
Linear probing complexity
• Given a load factor of λ, the number of cells examined in a linear probing insertion is approximately:
$$\frac{1}{2}\left(1 + \frac{1}{(1-\lambda)^2}\right)$$
• We will accept this without proof.
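To get a feel for the numbers, the approximation ½(1 + 1/(1 − λ)²) can be tabulated next to the naïve estimate 1/(1 − λ). This is a small Python sketch; the function names are chosen here for illustration.

```python
def insertion_probes(load_factor):
    """Approximate cells examined by a linear probing insertion."""
    return 0.5 * (1 + 1 / (1 - load_factor) ** 2)

def naive_probes(load_factor):
    """Estimate under the independence assumption, for comparison."""
    return 1 / (1 - load_factor)

for lam in (0.25, 0.5, 0.75, 0.9):
    print(lam, round(naive_probes(lam), 2), round(insertion_probes(lam), 2))
```

At λ = 0.5 the two estimates are 2 and 2.5 probes; at λ = 0.9 they are 10 and 50.5.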
![Page 21: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/21.jpg)
Analysis of find
• Unsuccessful find
– Same as cost of insertion.
• Successful find
– Same as the cost of inserting the item at the time it was inserted.
– If the bin was unused, only 1 probe is needed.
– As more collisions occur, the number of probes increases.
![Page 22: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/22.jpg)
Analysis of successful find when primary clustering is present
• Need to average over all load factors up to the current one:
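The averaging can be written out as a short derivation. It is a sketch that assumes the linear probing insertion cost I(x) = ½(1 + 1/(1 − x)²) at load factor x; the symbols I, S, and x are introduced here for illustration.

$$S(\lambda) = \frac{1}{\lambda}\int_0^{\lambda} \frac{1}{2}\left(1 + \frac{1}{(1-x)^2}\right)dx = \frac{1}{2\lambda}\left[x + \frac{1}{1-x}\right]_0^{\lambda} = \frac{1}{2}\left(1 + \frac{1}{1-\lambda}\right)$$

At λ = 0.5 this gives 1.5 probes for a successful find.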
![Page 23: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/23.jpg)
Deletion
• Cost similar to that of find.
• We cannot simply delete a node.
– Why not?
![Page 24: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/24.jpg)
Lazy deletion
• Instead of clearing an entry, we mark it as deleted.
• A new insertion may place a new value there and mark it active.
• Hash bins are either: unused, active, or deleted.
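The three bucket states can be sketched in Python as follows. This is an illustration with linear probing and integer keys, not the textbook implementation; the class and method names are assumptions.

```python
UNUSED, ACTIVE, DELETED = "unused", "active", "deleted"

class LinearProbingTable:
    def __init__(self, size=11):
        self.keys = [None] * size
        self.state = [UNUSED] * size

    def _probe(self, key):
        """Yield bucket indices in linear probing order."""
        size = len(self.keys)
        start = hash(key) % size
        for i in range(size):
            yield (start + i) % size

    def find(self, key):
        for index in self._probe(key):
            if self.state[index] == UNUSED:
                return None                  # end of the probe chain
            if self.state[index] == ACTIVE and self.keys[index] == key:
                return index
        return None

    def insert(self, key):
        if self.find(key) is not None:
            return
        for index in self._probe(key):
            if self.state[index] != ACTIVE:  # unused or deleted: reuse it
                self.keys[index], self.state[index] = key, ACTIVE
                return
        raise RuntimeError("hash table is full")

    def remove(self, key):
        index = self.find(key)
        if index is not None:
            self.state[index] = DELETED      # mark, do not clear

table = LinearProbingTable(size=7)
for key in (10, 17, 24):                     # all hash to bucket 3
    table.insert(key)
table.remove(17)
print(table.find(24))                        # 5: the search passes the deleted bucket
```

Had `remove` reset the bucket to unused, the search for 24 would stop at bucket 4 and wrongly report the key as missing.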
![Page 25: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/25.jpg)
Perhaps we can do better…
• Linear probing is not bad:
– Average number of probes for an insertion into a hash table that is 50% loaded is 2.5.
– Begins to be problematic as λ approaches 1 (λ = .90 → 50.5).
– Note that this is independent of the table size.
• Any algorithm that aims to reduce this must be inexpensive enough that it is cheaper than the small number of probes typically needed.
![Page 26: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/26.jpg)
Quadratic probing
• Basic idea: Scatter the collisions so they do not group near one another.
• Suppose hash(n) = H and bin H is used.
– Try (H + i^2) % TableSize for i = 1, 2, 3, …
– Note that linear probing used (H + i) % TableSize for i = 1, 2, 3, …
• Works best when the table size is a prime number.
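The two probe sequences can be compared with a short Python sketch. The function names and the example values (H = 3, table size 11) are chosen here for illustration.

```python
def linear_probes(h, table_size, count):
    """First `count` buckets tried after a collision at bucket h."""
    return [(h + i) % table_size for i in range(1, count + 1)]

def quadratic_probes(h, table_size, count):
    """Same, but stepping by i squared instead of i."""
    return [(h + i * i) % table_size for i in range(1, count + 1)]

print(linear_probes(3, 11, 5))      # [4, 5, 6, 7, 8]
print(quadratic_probes(3, 11, 5))   # [4, 7, 1, 8, 6]
```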
![Page 27: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/27.jpg)
![Page 28: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/28.jpg)
Quadratic probing
• Thm 20.4 – When inserting into a hash table of prime size that is at least half empty using quadratic probing, a new element can always be inserted, and no hash bucket is probed more than one time.
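A spot check of the claim, not a proof: the Python sketch below counts how many distinct buckets appear among the first table_size // 2 + 1 probes (including the initial one at i = 0). The function name and the table sizes 11 and 16 are assumptions for illustration.

```python
def distinct_probe_count(table_size, h=0):
    """Return (distinct buckets, probes made) for i = 0 .. table_size // 2."""
    probes = [(h + i * i) % table_size for i in range(table_size // 2 + 1)]
    return len(set(probes)), len(probes)

print(distinct_probe_count(11))   # (6, 6): all probes differ for prime size 11
print(distinct_probe_count(16))   # (4, 9): a table of size 16 repeats buckets
```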
![Page 29: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/29.jpg)
Insertion with quadratic probing
![Page 30: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/30.jpg)
Insertion with quadratic probing
![Page 31: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/31.jpg)
Insertion with quadratic probing
![Page 32: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/32.jpg)
Insertion with quadratic probing
![Page 33: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/33.jpg)
What does this buy us?
• For a hash table which is less than half full, we have removed the primary clustering.
• Consequently, we are closer to our naïve analysis.
• On average, when the table is half full, this saves us:
– .5 probes for each insertion
– .1 probes for each successful search
• In addition, long chains are avoided.
![Page 34: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/34.jpg)
What does this cost us?
• The squaring operation and the modulo are relatively expensive, given that on average we do not save much.
• Fortunately, we can improve this ...
![Page 35: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/35.jpg)
Efficient quadratic probing
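$$
\begin{aligned}
H_i &= (H_0 + i^2) \mathbin{\%} M \\
H_{i-1} &= (H_0 + (i-1)^2) \mathbin{\%} M
\end{aligned}
$$
Subtracting the second equation from the first:
$$
\begin{aligned}
H_i - H_{i-1} &\equiv (H_0 + i^2) - (H_0 + (i-1)^2) \pmod{M} \\
&\equiv i^2 - (i-1)^2 \pmod{M} \\
&\equiv i^2 - (i^2 - 2i + 1) \pmod{M} \\
&\equiv 2i - 1 \pmod{M} \\
H_i &= (H_{i-1} + 2i - 1) \mathbin{\%} M
\end{aligned}
$$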
![Page 36: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/36.jpg)
Effective quadratic probing
• Multiplication by 2 can be implemented trivially by a shift.
• 2i − 1 < M as we never insert into a table that is more than half full.
• So, H_{i−1} + 2i − 1 is either less than M or is < 2M and can be adjusted by subtracting M.
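The recurrence H_i = (H_{i−1} + 2i − 1) % M can be sketched in Python as follows. The function name is an assumption, and instead of a shift this version keeps a running odd offset (1, 3, 5, …), which is one simple way to obtain 2i − 1 without multiplying.

```python
def quadratic_probes_incremental(h0, table_size, count):
    """Generate H_1 .. H_count from H_i = H_{i-1} + 2i - 1, no multiply or %."""
    probes = []
    current = h0
    offset = 1                       # 2i - 1 for i = 1
    for _ in range(count):
        current += offset
        if current >= table_size:    # enough while 2i - 1 < table_size
            current -= table_size
        probes.append(current)
        offset += 2                  # next odd number
    return probes

direct = [(3 + i * i) % 11 for i in range(1, 6)]
print(quadratic_probes_incremental(3, 11, 5) == direct)   # True
```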
![Page 37: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/37.jpg)
More than M/2 entries?
• Increase the size of the table to the next prime number.
• Figure 20.7 (read) shows a prime number generation subroutine that is at most O(N^0.5 log N). This is less than O(N).
• Copying the table takes O(N) time, and has an amortized cost of O(1).
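A possible sketch of the prime search in Python. This is not Figure 20.7 itself, only a trial-division version in the same spirit; roughly doubling the size before taking the next prime is an assumption made here, since geometric growth is what keeps the amortized cost of copying constant.

```python
def is_prime(n):
    """Trial division by odd numbers up to sqrt(n)."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True

def next_prime(n):
    """Smallest prime greater than or equal to n."""
    while not is_prime(n):
        n += 1
    return n

old_size = 101
print(next_prime(2 * old_size))   # 211
```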
![Page 38: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/38.jpg)
Copying the hash bins
• We do not use the same entries.
– Why not?
• Instead we rehash each item to a new position.
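Because the bucket index is the hash value reduced modulo the table size, a key's position generally changes when the size changes. The Python sketch below illustrates this with linear probing and integer keys; the helper names and the sizes 7 and 17 are assumptions.

```python
def insert(table, key):
    """Linear probing insertion; assumes the table has an empty bucket."""
    size = len(table)
    index = hash(key) % size
    while table[index] is not None:
        index = (index + 1) % size
    table[index] = key

def rehash(old_table, new_size):
    """Build a larger table by reinserting every key, not by copying buckets."""
    new_table = [None] * new_size
    for key in old_table:
        if key is not None:
            insert(new_table, key)
    return new_table

old = [None] * 7
for key in (10, 17, 24):
    insert(old, key)                          # buckets 3, 4, 5 in the old table
new = rehash(old, 17)
print([new.index(k) for k in (10, 17, 24)])   # [10, 0, 7]
```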
![Page 39: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/39.jpg)
![Page 40: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/40.jpg)
![Page 41: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/41.jpg)
![Page 42: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/42.jpg)
![Page 43: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/43.jpg)
![Page 44: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/44.jpg)
![Page 45: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/45.jpg)
![Page 46: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/46.jpg)
![Page 47: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/47.jpg)
![Page 48: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/48.jpg)
Read
• Read the remainder of the code online and make sure that you understand it.
• In addition, read the iterator class code.
![Page 49: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/49.jpg)
Complexity of quadratic probing
• No known analysis
• Eliminates primary clustering
• Introduces secondary clustering
![Page 50: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/50.jpg)
Alternatives
• Double hashing – Resolve collisions with a second hash function
• Separate chain hashing – Place collisions on a linked list.
![Page 51: Hash Tables CS 310 – Professor Roch Weiss Chapter 20 All figures marked with a chapter and section number are copyrighted © 2006 by Pearson Addison-Wesley](https://reader035.vdocument.in/reader035/viewer/2022062511/5519b1ad5503465b578b4604/html5/thumbnails/51.jpg)
Applications
• Content addressable tables
• Symbol tables
• Game playing – Caching state
• Song recognition