hash table - nthucswkhon/algo08-tutorials/tutorial-hashing.pdf · if k was the (i + 1)st key...
Hash table
Speaker : MARK
2008/4/10 L.O.A.D.S. 1
Outline
- Introduction
- Direct addressing table
- Hash table
- Hash function
  - Division
  - Mid-square
  - Folding
- Collision & Overflow handling
  - Chaining
  - Open addressing
Introduction
Many applications require a dynamic set S that supports the following dictionary operations:
- Search(k): check if k is in S
- Insert(k): insert k into S
- Delete(k): delete k from S
Hash table: an effective data structure for implementing dictionaries
Definitions
U: the universe of all possible keys.
K: the dynamic set of keys actually in use. For example, an application needs a dynamic set in which each element has a key drawn from the universe U = {0, 1, ..., m-1}.
T: the table, denoted by T[0 .. m-1], in which each position, or slot, corresponds to a key in the universe U.
Direct addressing table
Ex. [Figure: a direct-address table; the slot for Key = 2 holds the record with Name = John]
Search time = Insert time = Delete time = O(1)
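A minimal sketch of a direct-address table, assuming integer keys drawn from {0, ..., m-1}. The class name `DirectAddressTable` and the use of `None` for an empty slot are illustrative choices, not taken from the slides.

```python
class DirectAddressTable:
    """Direct-address table for integer keys in the universe {0, ..., m-1}."""

    def __init__(self, m):
        # One slot per possible key; None marks an empty slot.
        self.slots = [None] * m

    def insert(self, key, value):
        self.slots[key] = value      # O(1): the key itself is the index

    def search(self, key):
        return self.slots[key]       # O(1); None means "not present"

    def delete(self, key):
        self.slots[key] = None       # O(1)


table = DirectAddressTable(8)
table.insert(2, "John")
```

Every operation is a single array access, which is where the O(1) bounds come from. The cost is that the array must have one slot for every key in U.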
Direct addressing table
The difficulty with direct addressing is obvious:
- The table T has size Θ(|U|).
- If |K| << |U|, most of that space is wasted.
Time is money! Space is money, too!
What is hashing?
Hashing has the following advantages:
- Searching by hashing does not require the data to be sorted.
- Without collision & overflow, a search takes only O(1) time.
- The search time does not depend on the data size.
- Some obscurity: without knowing the hash function, it is hard to locate a record directly. This is not a real security guarantee.
Hash table
With direct addressing, an element with key k is stored in slot k.
With hashing, this element is stored in slot h(k).
Hash function
A good hash function satisfies (approximately) the assumption of simple uniform hashing:
Each key is equally likely to hash to any of the m slots, independently of where any other key has hashed to.
Hash function
For example, if the keys k are known to be random real numbers independently and uniformly distributed in the range 0 ≤ k < 1, the hash function
h(k) = ⌊km⌋
satisfies the condition of simple uniform hashing.
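The following sketch illustrates this hash function on simulated keys. The function name `hash_unit_interval`, the table size m = 10, and the sample size are assumptions made for the demonstration only.

```python
import math
import random


def hash_unit_interval(k, m):
    """h(k) = floor(k * m) for a real key k with 0 <= k < 1."""
    return math.floor(k * m)


random.seed(0)          # fixed seed so the run is repeatable
m = 10
counts = [0] * m        # counts[j] = number of keys hashed to slot j
for _ in range(100_000):
    counts[hash_unit_interval(random.random(), m)] += 1
```

With uniformly distributed keys, each entry of `counts` should come out close to 100000 / m, which is what simple uniform hashing asks for.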
Hash function
Interpreting keys as natural numbers
Most hash functions assume that the universe of keys is the set N = {0, 1, 2, ...} of natural numbers.
Ex. Key ‘pt’: p = 112 and t = 116 in the ASCII table, so as a radix-128 integer, ‘pt’ = (112·128) + 116 = 14452
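The radix-128 interpretation can be sketched as below. The helper name `key_to_natural` is hypothetical, and the code assumes the key contains only ASCII characters.

```python
def key_to_natural(s, radix=128):
    """Interpret an ASCII string as a natural number written in the given radix."""
    n = 0
    for ch in s:
        code = ord(ch)
        if code >= radix:
            raise ValueError(f"character {ch!r} does not fit in radix {radix}")
        n = n * radix + code   # shift previous digits left, append this one
    return n


value = key_to_natural("pt")   # (112 * 128) + 116
```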
(1) Division
Mapping a key k into one of m slots by taking the remainder of k divided by m
h(k) = k mod m
Ex. m = 12, k = 100, then h(k) = 4
A prime m is often a good choice, especially one not too close to an exact power of 2.
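A small sketch of the division method, with one illustration of why the choice of m matters. The key set (multiples of 16) is made up for the demonstration.

```python
def hash_division(k, m):
    """Division method: h(k) = k mod m."""
    return k % m


example = hash_division(100, 12)

# Keys that are all multiples of 16 (a made-up, patterned key set).
keys = [16 * i for i in range(13)]
slots_pow2 = {hash_division(k, 16) for k in keys}   # m is a power of 2
slots_prime = {hash_division(k, 13) for k in keys}  # m is prime
```

With m = 16 every one of these keys lands in the same slot, because k mod 16 looks only at the low-order 4 bits. With the prime m = 13 the same keys spread over 13 different slots.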
(2) Mid-square
Mapping a key k into one of m slots by taking some of the middle digits of the value k²
h(k) = middle (log₁₀ m) digits of k²
Ex. m = 10000, k = 113586, log₁₀ m = 4
h(k) = middle 4 digits of 113586²
= middle 4 digits of 12901779396
= 1779
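A sketch of the mid-square method on decimal digits. The function name `hash_mid_square` is hypothetical. When the square has an odd number of surplus digits, "middle" is ambiguous; this sketch drops the extra digit on the left, an assumption chosen so that it reproduces the slide's result for k = 113586.

```python
def hash_mid_square(k, r):
    """Return the middle r decimal digits of k*k as an integer."""
    digits = str(k * k)
    if len(digits) <= r:
        return int(digits)          # nothing to trim
    # Surplus digits are split between both ends; if the surplus is odd,
    # the extra digit is dropped from the left (an assumption).
    start = (len(digits) - r + 1) // 2
    return int(digits[start:start + r])


h = hash_mid_square(113586, 4)
```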
(3) Folding
Divide k into sections of the same length (the last section may be shorter), then add these sections together. There are two ways to add them:
a. shift folding
b. folding at the boundaries
h(k) = ∑ (sections of k), added by method a or b
(3) Folding
Ex. k = 12320324111220, section length = 3, so the sections are 123, 203, 241, 112, 20
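The worked figure for this example did not survive, so here is a sketch of both folding variants applied to the same key. It assumes the usual textbook definitions: shift folding adds the sections as they are, while folding at the boundaries reverses the digits of every second section (the 2nd, 4th, ...) before adding. The function names are hypothetical.

```python
def split_sections(k, length):
    """Split the decimal digits of k into sections of the given length."""
    digits = str(k)
    return [digits[i:i + length] for i in range(0, len(digits), length)]


def shift_folding(k, length):
    """Add the sections as they are."""
    return sum(int(s) for s in split_sections(k, length))


def boundary_folding(k, length):
    """Reverse every second section (as if folding paper), then add."""
    total = 0
    for index, section in enumerate(split_sections(k, length)):
        if index % 2 == 1:
            section = section[::-1]
        total += int(section)
    return total


k = 12320324111220
sections = split_sections(k, 3)      # 123, 203, 241, 112, 20
shifted = shift_folding(k, 3)        # 123 + 203 + 241 + 112 + 20
folded = boundary_folding(k, 3)      # 123 + 302 + 241 + 211 + 20
```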
Collision & Overflow handling
Collision! Two distinct keys hash to the same slot.
(1) Chaining
In chaining, we put all the elements that hash to the same slot in a linked list
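A minimal sketch of a chained hash table. The class names, the integer keys, and the use of the division method for h are assumptions made for illustration. New elements go at the head of the list, so insertion takes O(1) time.

```python
class Node:
    """One element of a singly linked chain."""

    def __init__(self, key, value, next_node):
        self.key = key
        self.value = value
        self.next = next_node


class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        self.heads = [None] * m          # heads[j] is the first node of chain j

    def _h(self, key):
        return key % self.m              # division method (an assumption)

    def insert(self, key, value):
        j = self._h(key)
        self.heads[j] = Node(key, value, self.heads[j])   # push at the head

    def search(self, key):
        node = self.heads[self._h(key)]
        while node is not None and node.key != key:
            node = node.next
        return node.value if node is not None else None

    def delete(self, key):
        j = self._h(key)
        prev, node = None, self.heads[j]
        while node is not None and node.key != key:
            prev, node = node, node.next
        if node is None:
            return False                 # key not present
        if prev is None:
            self.heads[j] = node.next    # removed the head of the chain
        else:
            prev.next = node.next
        return True


t = ChainedHashTable(4)
for key, value in [(1, "a"), (5, "b"), (9, "c")]:   # all three collide in slot 1
    t.insert(key, value)
```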
(1) Chaining analysis
Worst-case insert time = O(1): insert at the head of the linked list
Worst-case search time = Θ(n): every key maps to the same slot
Ex. h(1) = h(2) = h(3) = … = h(n) = x, then search for key ‘1’ (inserted first, so it sits at the tail of the list)
(1) Chaining analysis
For j = 0, 1, ..., m-1, let us denote the length of the list T[j] by n_j, so that
n = n_0 + n_1 + … + n_{m-1}
and the average value of n_j is E[n_j] = α = n/m.
Average search time = Θ(1 + α)
(1) Chaining analysis
Unsuccessful search time = Θ(1 + α)
The expected time to search unsuccessfully for a key k is the expected time to search to the end of list T[h(k)], which has expected length E[n_{h(k)}] = α.
(1) Chaining analysis
Successful search time = Θ(1 + α)
The situation for a successful search is slightly different, since each list is not equally likely to be searched.
Instead, the probability that a list is searched is proportional to the number of elements it contains.
(1) Chaining analysis
For keys k_i and k_j, we define the indicator random variable X_ij = I{h(k_i) = h(k_j)}
Under the assumption of simple uniform hashing, we have Pr{h(k_i) = h(k_j)} = 1/m, and E[X_ij] = 1/m
The expected number of elements examined in a successful search is:
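A reconstruction of this expectation, assuming the standard textbook argument: k_1, ..., k_n are the keys in insertion order, new elements are placed at the head of a list, and each of the n keys is equally likely to be the one searched for. The elements examined when searching for k_i are then k_i itself plus every later-inserted key k_j (j > i) that hashes to the same slot.

$$
\begin{aligned}
E\left[\frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}X_{ij}\right)\right]
&= \frac{1}{n}\sum_{i=1}^{n}\left(1+\sum_{j=i+1}^{n}\frac{1}{m}\right) \\
&= 1+\frac{1}{nm}\sum_{i=1}^{n}(n-i) \\
&= 1+\frac{n-1}{2m} \\
&= 1+\frac{\alpha}{2}-\frac{\alpha}{2n}
\end{aligned}
$$

Adding Θ(1) for computing the hash function gives the total time for a successful search.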
Θ(2 + α/2 − α/(2n)) = Θ(1 + α)
(1) Chaining analysis
What does Θ(1 + α) mean?
If the number of hash-table slots is at least proportional to the number of elements in the table, we have n = O(m) and, consequently, α = n/m = O(m)/m = O(1).
Thus, searching takes constant time on average.
(2) Open addressing
In open addressing, all elements are stored in the hash table itself.
That is, each table slot contains either an element of the dynamic set or NIL.
The hash table can "fill up" => no further insertions can be made;
load factor α = n/m ≤ 1.
(2) Open addressing
The assumption of uniform hashing: we assume that each key is equally likely to have any of the m! permutations of <0, 1, ..., m–1> as its probe sequence.
Linear probing, quadratic probing, and double hashing are commonly used to compute the probe sequences required for open addressing.
(2.1) Linear Probing
h(k, i) = (h’(k) + i) mod m
h’: auxiliary hash function
i: 0, 1, ..., m-1
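A sketch of insertion and search with linear probing. It assumes integer keys, the auxiliary hash function h'(k) = k mod m, and `None` as the empty-slot marker (NIL). Deletion is left out because it needs extra care in open addressing.

```python
def linear_probe(h0, i, m):
    """h(k, i) = (h'(k) + i) mod m, where h0 = h'(k)."""
    return (h0 + i) % m


def insert_linear(table, key):
    """Place key in the first empty slot of its probe sequence; return the slot."""
    m = len(table)
    for i in range(m):
        j = linear_probe(key % m, i, m)   # h'(k) = k mod m (an assumption)
        if table[j] is None:
            table[j] = key
            return j
    raise OverflowError("hash table is full")


def search_linear(table, key):
    """Return the slot holding key, or None if the key is absent."""
    m = len(table)
    for i in range(m):
        j = linear_probe(key % m, i, m)
        if table[j] is None:
            return None                   # reached an empty slot: not present
        if table[j] == key:
            return j
    return None


table = [None] * 7
slots = [insert_linear(table, key) for key in (10, 17, 24)]  # all have h' = 3
```

The three keys end up in consecutive slots, a small instance of the clustering that linear probing tends to produce.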
(2.2) Quadratic Probing
h(k, i) = (h’(k) + c_1·i + c_2·i²) mod m
h’: auxiliary hash function
c_1, c_2 ≠ 0: auxiliary constants
i: 0, 1, ..., m-1
This method works much better than linear probing, but to make full use of the hash table, the values of c_1, c_2, and m are constrained.
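The constraint on c_1, c_2, and m can be seen in a small experiment. The sketch below compares two parameter choices for m = 8; both choices are examples picked for illustration, not values from the slides. With c_1 = c_2 = 1/2 and m a power of 2, the probe sequence visits every slot. With c_1 = c_2 = 1 it never leaves the even slots.

```python
from fractions import Fraction


def quadratic_probe_sequence(h0, m, c1, c2):
    """Return [h(k, 0), ..., h(k, m-1)] for h(k, i) = (h0 + c1*i + c2*i^2) mod m."""
    seq = []
    for i in range(m):
        offset = c1 * i + c2 * i * i
        assert offset == int(offset), "offset must be an integer"
        seq.append((h0 + int(offset)) % m)
    return seq


half = Fraction(1, 2)
full = quadratic_probe_sequence(0, 8, half, half)   # offsets 0, 1, 3, 6, 10, ...
partial = quadratic_probe_sequence(0, 8, 1, 1)      # offsets i + i^2 are always even
```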
(2.3) Double hashing
h(k, i) = (h_1(k) + i·h_2(k)) mod m
h_1, h_2: auxiliary hash functions
i: 0, 1, ..., m-1
Double hashing is one of the best methods available for open addressing because the permutations produced have many of the characteristics of randomly chosen permutations.
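A sketch of a double-hashing probe sequence. The slides do not give h_1 and h_2; the choices h_1(k) = k mod m and h_2(k) = 1 + (k mod (m - 1)) with m prime are a common textbook pairing and are used here as an assumption. Because m is prime and 1 ≤ h_2(k) < m, the step size is relatively prime to m, so the sequence visits every slot.

```python
def double_hash_sequence(k, m):
    """Return [h(k, 0), ..., h(k, m-1)] with h(k, i) = (h1(k) + i*h2(k)) mod m."""
    h1 = k % m                  # assumed first auxiliary hash function
    h2 = 1 + (k % (m - 1))      # assumed second one; never 0, always < m
    return [(h1 + i * h2) % m for i in range(m)]


seq = double_hash_sequence(14, 13)   # h1 = 1, h2 = 3
```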
(2) Open addressing
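These techniques all guarantee that <h(k, 0), h(k, 1), ..., h(k, m-1)> is a permutation of <0, 1, ..., m–1> for each key k, provided the parameters of quadratic probing and double hashing are chosen suitably.
None of these techniques fulfills the assumption of uniform hashing: none of them can generate more than m² different probe sequences, instead of the m! that uniform hashing requires.
Double hashing has the greatest number of probe sequences and, as one might expect, seems to give the best results.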
(2) Open addressing analysis
Given an open-address hash table with load factor α = n/m < 1, the expected number of probes in an unsuccessful search is at most 1/(1-α), assuming uniform hashing.
Define the random variable X to be the number of probes made in an unsuccessful search.
Define the event A_i, for i = 1, 2, ..., to be the event that there is an ith probe and it is to an occupied slot.
(2) Open addressing analysis
Then the event {X ≥ i} = A_1 ∩ A_2 ∩ ··· ∩ A_{i-1}. We will bound Pr{X ≥ i} by bounding
Pr{A_1 ∩ A_2 ∩ ··· ∩ A_{i-1}} = Pr{A_1} · Pr{A_2 | A_1} · Pr{A_3 | A_1 ∩ A_2} ··· Pr{A_{i-1} | A_1 ∩ A_2 ∩ ··· ∩ A_{i-2}}
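The remaining steps can be sketched as follows, assuming the standard argument under uniform hashing. The first probe hits an occupied slot with probability n/m. Given that the first j-1 probes all hit occupied slots, the jth probe goes to one of the remaining m-j+1 slots, of which n-j+1 are occupied.

$$
\begin{aligned}
\Pr\{X \ge i\}
&= \frac{n}{m}\cdot\frac{n-1}{m-1}\cdot\frac{n-2}{m-2}\cdots\frac{n-i+2}{m-i+2}
\le \left(\frac{n}{m}\right)^{i-1} = \alpha^{i-1} \\
E[X]
&= \sum_{i=1}^{\infty}\Pr\{X \ge i\}
\le \sum_{i=1}^{\infty}\alpha^{i-1}
= \frac{1}{1-\alpha}
\end{aligned}
$$

The inequality in the first line uses (n-j)/(m-j) ≤ n/m, which holds because n < m.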
(2) Open addressing analysis
If α is a constant, an unsuccessful search runs in O(1) time.
Ex. average number of probes in an unsuccessful search:
- If the hash table is half full: at most 1/(1 - 0.5) = 2
- If the hash table is 90% full: at most 1/(1 - 0.9) = 10
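To see how tight the bound 1/(1 - α) is, the sketch below evaluates the expectation directly as the sum of Pr{X ≥ i} under uniform hashing, for a small table. The function name and the table size m = 10 are assumptions for the demonstration.

```python
def expected_unsuccessful_probes(n, m):
    """E[X] = sum over i >= 1 of Pr{X >= i}, for n keys in m slots (n < m)."""
    total = 0.0
    prob = 1.0                      # Pr{X >= 1} = 1: there is always a first probe
    for i in range(1, n + 2):       # Pr{X >= i} is 0 once i > n + 1
        total += prob
        # Pr{X >= i+1} = Pr{X >= i} * (n - i + 1) / (m - i + 1)
        prob *= (n - (i - 1)) / (m - (i - 1))
    return total


half_full = expected_unsuccessful_probes(5, 10)     # bound: 1 / (1 - 0.5) = 2
ninety_full = expected_unsuccessful_probes(9, 10)   # bound: 1 / (1 - 0.9) = 10
```

For such a small table the exact expectations sit noticeably below the bounds; the bound is an upper limit, not an estimate.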
(2) Open addressing analysis
Inserting an element into an open-address hash table with load factor α requires at most 1/(1 - α) probes on average, assuming uniform hashing.
Inserting a key requires an unsuccessful search followed by placement of the key in the first empty slot found.
Thus, the expected number of probes is at most 1/(1 - α).
(2) Open addressing analysis
Given an open-address hash table with load factor α < 1, the expected number of probes in a successful search is at most (1/α) ln(1/(1 - α)), assuming uniform hashing and assuming that each key in the table is equally likely to be searched for.
(2) Open addressing analysis
If k was the (i + 1)st key inserted into the hash table, the expected number of probes made in a search for k is at most 1/(1 - i/m) = m/(m-i).
Averaging over all n keys in the hash table gives us the average number of probes in a successful search:
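A reconstruction of the averaging step, assuming the standard argument that bounds the resulting harmonic sum by an integral:

$$
\begin{aligned}
\frac{1}{n}\sum_{i=0}^{n-1}\frac{m}{m-i}
&= \frac{1}{\alpha}\sum_{k=m-n+1}^{m}\frac{1}{k} \\
&\le \frac{1}{\alpha}\int_{m-n}^{m}\frac{dx}{x} \\
&= \frac{1}{\alpha}\ln\frac{m}{m-n} \\
&= \frac{1}{\alpha}\ln\frac{1}{1-\alpha}
\end{aligned}
$$

The first equality substitutes k = m - i and uses m/n = 1/α. The inequality holds because 1/x is decreasing.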
(2) Open addressing analysis
Ex. the expected number of probes in a successful search is:
- If the hash table is half full: less than 1.387
- If the hash table is 90% full: less than 2.559
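These two figures can be checked numerically. The sketch assumes the bound in question is the standard (1/α) ln(1/(1 - α)), and also evaluates the average (1/n) Σ m/(m - i) for a small table; the function names and the sizes n = 5, m = 10 are hypothetical.

```python
import math


def successful_search_bound(alpha):
    """Upper bound (1/alpha) * ln(1 / (1 - alpha)) for 0 < alpha < 1."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    return math.log(1 / (1 - alpha)) / alpha


def average_probe_bound(n, m):
    """(1/n) * sum of m / (m - i) for i = 0 .. n-1, the average before the log bound."""
    return sum(m / (m - i) for i in range(n)) / n


half_full = successful_search_bound(0.5)
ninety_full = successful_search_bound(0.9)
small_table = average_probe_bound(5, 10)
```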