data structures uri zwick january 2014 hashing. 2 dictionaries d dictionary() – create an empty...
TRANSCRIPT
![Page 1: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/1.jpg)
Data Structures
Uri ZwickJanuary 2014
Hashing
![Page 2: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/2.jpg)
2
Dictionaries
D Dictionary() – Create an empty dictionary
Insert(D,x) – Insert item x into D
Find(D,k) – Find an item with key k in D
Delete(D,k) – Delete item with key k from D
Can use balanced search trees O(log n) time per operation
(Predecessors and successors, etc., not supported)
Can we do better? YES !!!
![Page 3: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/3.jpg)
3
Dictionaries with “small keys”Suppose all keys are in [m] = {0,1,…,m−1}, where m = O(n)
Can implement a dictionary using an array D of length m.
(Assume different items have different keys.)
O(1) time per operation (after initialization)
What if m>>n ? Use a hash function
0 1 m-1
Special case: Sets D is a bit vector
Direct addressing
![Page 4: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/4.jpg)
HashingHuge
universe U
Hash table0 1 m-1
Hash function h Collisions
![Page 5: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/5.jpg)
Hashing with chaining [Luhn (1953)] [Dumey (1956)]
Each cell points to a linked list of items
0 1 m-1i
![Page 6: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/6.jpg)
Hashing with chainingwith a random hash function
Balls in BinsThrow n balls randomly into m bins
![Page 7: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/7.jpg)
Balls in Bins
Throw n balls randomly into m bins
All throws are uniform and independent
![Page 8: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/8.jpg)
Balls in Bins
Throw n balls randomly into m bins
Expected number of balls in each bin is n/m
When n=(m), with probability ofat least 11/n, all bins contain
at most O(log n/(log log n)) balls
![Page 9: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/9.jpg)
What makes a hash function good?
Behaves like a “random function”Has a succinct representation
Easy to compute
A single hash function cannot satisfy the first condition
![Page 10: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/10.jpg)
Families of hash functionsWe cannot choose a “truly random” hash function
Compromise:Choose a random hash function h from a
carefully chosen family H of hash functions
Each function h from H should have a succinct representation and should be easy to compute
Goal:For every sequence of operations, the running time
of the data structure, when a random hash function h from H is chosen, is expected to be small
Using a fixed hash function is usually not a good idea
![Page 11: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/11.jpg)
Modular hash functions [Carter-Wegman (1979)]
p – prime number
Form a “Universal Family” (see below)
Require (slow) divisions
![Page 12: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/12.jpg)
Multiplicative hash functions [Dietzfelbinger-Hagerup-Katajainen-Penttonen (1997)]
Extremely fast in practice!
Form an “almost-universal” family (see below)
![Page 13: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/13.jpg)
Tabulation based hash functions [Patrascu-Thorup (2012)]
+
A variant can also be used to hash strings
hi can be stored in a small table
“byte”
Very efficient in practice
Very good theoretical properties
![Page 14: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/14.jpg)
Universal families of hash functions [Carter-Wegman (1979)]
A family H of hash functions from U to [m] is said to be universal if and only if
A family H of hash functions from U to [m] is said to be almost universal if and only if
![Page 15: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/15.jpg)
k-independent families of hash functions
A family H of hash functions from U to [m] is said to be k-independent if and only if
A family H of hash functions from U to [m] is said to be almost k-independent if and only if
![Page 16: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/16.jpg)
A simple universal family[Carter-Wegman (1979)]
To represent a function from the family we only need two numbers, a and b.
The size m of the hash table can be arbitrary.
![Page 17: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/17.jpg)
A simple universal family[Carter-Wegman (1979)]
![Page 18: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/18.jpg)
Probabilistic analysis of chainingn – number of elements in dictionary D
m – size of hash table
Assume that h is randomly chosen from a universal family H
Expected Worst-case
Successful SearchDelete
Unsuccessful Search(Verified) Insert
=n/m – load factor
![Page 19: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/19.jpg)
Chaining: pros and cons
Pros:Simple to implement (and analyze)
Constant time per operation (O(1+))Fairly insensitive to table size
Simple hash functions suffice
Cons:Space wasted on pointers
Dynamic allocations required
Many cache misses
![Page 20: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/20.jpg)
Hashing with open addressingHashing without pointers
Insert key k in the first free position among
Assumed to be a permutation
To search, follow the same orderNo room found Table is full
![Page 21: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/21.jpg)
Hashing with open addressing
![Page 22: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/22.jpg)
How do we delete elements?
Caution: When we delete elements, do not set the corresponding cells to null!
“deleted”
Problematic solution…
![Page 23: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/23.jpg)
Probabilistic analysis of open addressing
n – number of elements in dictionary Dm – size of hash table
Uniform probing: Assume that for every k,h(k,0),…,h(k,m−1) is random permutation
=n/m – load factor (Note: 1)
Expected time forunsuccessful search
Expected time forsuccessful search
![Page 24: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/24.jpg)
Probabilistic analysis of open addressing
Claim: Expected no. of probes for an unsuccessful search is at most:
If we probe a random cell in the table, the probability that it is full is .
The probability that the first i cells probed are all occupied is at most i.
![Page 25: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/25.jpg)
Open addressing variants
Linear probing:
Quadratic probing:
Double hashing:
How do we define h(k,i) ?
![Page 26: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/26.jpg)
Linear probing“The most important hashing technique”
But, much less cache misses
More probes than uniform probing,as probe sequences “merge”
More complicated analysis(Requires 5-independence or tabulation hashing)
Extremely efficient in practice
![Page 27: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/27.jpg)
Linear probing – Deletions
Can the key in cell j be moved to cell i?
![Page 28: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/28.jpg)
Linear probing – Deletions
When an item is deleted, the hash table is in exactly the state it would have been
if the item were not inserted!
![Page 29: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/29.jpg)
Expected number of probesAssuming random hash functions
SuccessfulSearch
UnsuccessfulSearch
Uniform Probing
Linear Probing
When, say, 0.6, all small constants
[Knuth (1962)]
![Page 30: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/30.jpg)
Expected number of probes
0.5
![Page 31: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/31.jpg)
Perfect hashingSuppose that D is static.
We want to implement Find is O(1) worst case time.
Perfect hashing:No collisions
Can we achieve it?
![Page 32: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/32.jpg)
Expected no. of collisions
![Page 33: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/33.jpg)
Expected no. of collisions
No collisions!If we are willing to use m=n2, then any universal family contains a
perfect hash function.
![Page 34: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/34.jpg)
Two level hashing[Fredman, Komlós, Szemerédi (1984)]
![Page 35: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/35.jpg)
Two level hashing[Fredman, Komlós, Szemerédi (1984)]
![Page 36: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/36.jpg)
Total size:
Assume that each hi can be represented
using 2 words
Two level hashing[Fredman, Komlós, Szemerédi (1984)]
![Page 37: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/37.jpg)
A randomized algorithm for constructing a perfect two level hash table:
Choose a random h from H(p,n) and compute the number of collisions. If there
are more than n collisions, repeat.
For each cell i,if ni>1, choose a random hash function hi from H(p,ni
2). If there are any collisions, repeat.
Expected construction time – O(n)
Worst case search time – O(1)
![Page 38: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/38.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
![Page 39: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/39.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
O(1) worst case search time!What is the (expected) insert time?
![Page 40: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/40.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
Difficult insertion
How likely are difficult insertion?
![Page 41: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/41.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
Difficult insertion
![Page 42: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/42.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
Difficult insertion
![Page 43: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/43.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
Difficult insertion
![Page 44: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/44.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
Difficult insertion
How likely are difficult insertion?
![Page 45: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/45.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
A more difficult insertion
![Page 46: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/46.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
A more difficult insertion
![Page 47: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/47.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
A more difficult insertion
![Page 48: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/48.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
A more difficult insertion
![Page 49: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/49.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
A more difficult insertion
![Page 50: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/50.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
A more difficult insertion
![Page 51: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/51.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
A more difficult insertion
![Page 52: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/52.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
A failed insertion
If Insertion takes more than MAX steps, rehash
![Page 53: Data Structures Uri Zwick January 2014 Hashing. 2 Dictionaries D Dictionary() – Create an empty dictionary Insert(D,x) – Insert item x into D Find(D,k)](https://reader037.vdocument.in/reader037/viewer/2022103123/56649dac5503460f94a9be36/html5/thumbnails/53.jpg)
Cuckoo Hashing[Pagh-Rodler (2004)]
With hash functions chosen at random from an appropriate family of hash functions, the amortized expected insert time is O(1)