chapter 8 hashing
DESCRIPTION
Chapter 8 Hashing. Part II. Dynamic Hashing. Also called extendible hashing Motivation Limitations of static hashing When the table is to be full, overflows increase. As overflows increase, the overall performance decreases. - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/1.jpg)
Part II
Chapter 8 Hashing
![Page 2: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/2.jpg)
Dynamic Hashing Also called extendible hashingMotivation
Limitations of static hashingWhen the table is to be full, overflows increase. As
overflows increase, the overall performance decreases.We cannot just copy entries from smaller into a
corresponding buckets of a bigger table.The use of memory space is not flexible.
k1k2k3…
Hash table0
1
2
n
Keys
h (Hash function)
![Page 3: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/3.jpg)
Properties of Dynamic HashingAllow the size of dictionary to grow and shrink.
The size of hash table can be changed dynamically.The term “dynamically” implies the following two
things can be modified:Hash functionThe size of hash table
k1k2k3…
Hash table0
m
Keys
h k1k2k3…
Hash table0
m
Keys
h’
m’
![Page 4: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/4.jpg)
8.3.2 Dynamic Hashing Using DirectoriesUse an auxilinary table to record the pointer of
each bucket.
k1k2k3…
(Directory)
KeysAuxilinary
table
d
Bucket 1
Bucket 2
Bucket 3
Disk
![Page 5: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/5.jpg)
Dynamic Hashing Using DirectoriesDefine the hash function h(k) transforms k into 6-
bit binary integer.For example:
k h(k)
A0 100 000
A1 100 001
B0 101 000
B1 101 001
C1 110 001
C2 110 010
C3 110 011
C5 110 101
![Page 6: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/6.jpg)
Dynamic Hashing Using DirectoriesThe size of d is 2r, where r is the number of bits
used to identify all h(x).Initially, Let r = 2. Thus, the size of d = 22 = 4.
Suppose h(k, p) is defined as the p least significant bits in h(k), where p is also called dictionary depth.
E. g. h(C5) = 110 101h(C5, 2) = 01h(C5, 3) = 101
![Page 7: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/7.jpg)
Process to Expand the Directory Consider the following keys have been already
stored. The least r is 2 to differentiate all the input keys.
k h(k)
A0 100 000
A1 100 001
B0 101 000
B1 101 001
C2 110 010
C3 110 011
d
A0 B0
Directory of pointers to
buckets00
01
10
11
A1 B1
C2
C3
![Page 8: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/8.jpg)
When C5 (110101) is to enter
1. Since r=2 and h(C5, 2) = 01, follow the pointer of d[01].
2. A1 and B1 have been at d[01]. Bucket overflows.
Find the least u such that h(C5, u) is not the same with some keys in h(C5, 2) (01) bucket.
In this case, u = 3. Step 2-1
Since u > r, expand the size of d to 2u and duplicate the pointers to the new half (why?).
![Page 9: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/9.jpg)
When C5 (110101) is to enter Table 的 size變大後照理說每個 entry都必須重新根據新的
hash function重算其所在的 bucket。但由於重算耗費的時間太多,所以暫時保留在原來的 bucket,在新的 bucket中保留舊的 pointer,直到下次出現 overflow時再重算。
A0 B0
A1 B1
C2
C3
000
001
010
011
100
101
110
111
![Page 10: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/10.jpg)
When C5 (110101) is to enter Step 2-2
Rehash identifiers 01 (A1 and B1) and C5 using new hash function h(k, u).
Step 2-3 Let r = u = 3.
A0 B0
A1 B1
C2
C3
000
001
010
011
100
101
110
111
C5
![Page 11: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/11.jpg)
When C1 (110001) is to enter
1. Since r=3 and h(C1, 3) = 001, follow the pointer of d[001].
2. A1 and B1 have been at d[001]. Bucket overflows.
Find the least u such that h(C1, u) is not the same with some keys in h(C1, 3) (001) bucket.
In this case, u = 4. Step 2-1
Since u > r, expand the size of d to 2u and duplicate the pointers to the new half.
![Page 12: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/12.jpg)
A0 B0
A1 B1
C2
C3
0000
0001
0010
0011
0100
0101
0110
0111
C5
1000
1001
1010
1011
1100
1101
1110
1111
![Page 13: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/13.jpg)
A0 B0
C2
C3
0000
0001
0010
0011
0100
0101
0110
0111
C5
1000
1001
1010
1011
1100
1101
1110
1111
Step 2-2 Rehash identifiers 001
(A1 and B1) and C1 using new hash function h(k, u).
Step 2-3 Let r = u = 4.
B1
A1 C1
![Page 14: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/14.jpg)
When C4 (110100) is to enter
1. Since r=4 and h(C4, 4) = 0100, follow the pointer of d[0100].
2. A0 (100000) and B0 ((101000)) have been at d[0100]. Bucket overflows.
Find the least u such that h(C1, u) is not the same with some keys in h(C1, 4) (0100) bucket.
In this case, u = 3. Step 2-1
Since u = 3 < r = 4, d is not required to expand its size.
![Page 15: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/15.jpg)
A0 B0
A1 C1
C2
C3
0000
0001
0010
0011
0100
0101
0110
0111
C5
1000
1001
1010
1011
1100
1101
1110
1111
B1
B0
C4
![Page 16: Chapter 8 Hashing](https://reader035.vdocument.in/reader035/viewer/2022081419/568139de550346895da19431/html5/thumbnails/16.jpg)
AdvantagesOnly doubling directory rather than the whole
hash table used in static hashing.Only rehash the entries in the buckets that
overflows.