hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · open addressing hash tables. hash table...
TRANSCRIPT
![Page 1: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/1.jpg)
Hash tables
Hash tables
![Page 2: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/2.jpg)
Dictionary
Definition
A dictionary is a data structure that stores a set of elementswhere each element has a unique key, and supports thefollowing operations:
Search(S , k) Return the element whose key is k .
Insert(S , x) Add x to S .
Delete(S , x) Remove x from S
Assume that the keys are from U = 0, . . . , u − 1.
Hash tables
![Page 3: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/3.jpg)
Direct addressing
S is stored in an array T [0..u − 1]. The entry T [k]contains a pointer to the element with key k , if suchelement exists, and NULL otherwise.
Search(T , k): return T [k].
Insert(T , x): T [x .key]← x .
Delete(T , x): T [x .key]← NULL.
What is the problem with this structure?0123456789
23
5
8
keysatellitedata
1
94
07
6
25
38
U
S
Hash tables
![Page 4: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/4.jpg)
Hashing
In order to reduce the space complexity of directaddressing, map the keys to a smaller range0, . . .m − 1 using a hash function.
There is a problem of collisions (two or more keys thatare mapped to the same value).
There are several methods to handle collisions.
ChainingOpen addressing
Hash tables
![Page 5: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/5.jpg)
Hash table with chaining
Let h : U → 0, . . . ,m − 1 be a hash function (m < u).
S is stored in a table T [0..m − 1] of linked lists. Theelement x ∈ S is stored in the list T [h(x .key)].
Search(T , k): Search the list T [h(k)].
Insert(T , x): Insert x at the head of T [h(x .key)].
Delete(T , x): Delete x from the list containing x .
S=30,9,26,19,1,6
m=5, h(x)=x mod 5
Hash tables
![Page 6: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/6.jpg)
Hash table with chaining
Let h : U → 0, . . . ,m − 1 be a hash function (m < u).
S is stored in a table T [0..m − 1] of linked lists. Theelement x ∈ S is stored in the list T [h(x .key)].
Search(T , k): Search the list T [h(k)].
Insert(T , x): Insert x at the head of T [h(x .key)].
Delete(T , x): Delete x from the list containing x .
S=30,9,26,19,1,6
m=5, h(x)=x mod 5
30
Hash tables
![Page 7: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/7.jpg)
Hash table with chaining
Let h : U → 0, . . . ,m − 1 be a hash function (m < u).
S is stored in a table T [0..m − 1] of linked lists. Theelement x ∈ S is stored in the list T [h(x .key)].
Search(T , k): Search the list T [h(k)].
Insert(T , x): Insert x at the head of T [h(x .key)].
Delete(T , x): Delete x from the list containing x .
S=30,9,26,19,1,6
m=5, h(x)=x mod 5
30
9
Hash tables
![Page 8: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/8.jpg)
Hash table with chaining
Let h : U → 0, . . . ,m − 1 be a hash function (m < u).
S is stored in a table T [0..m − 1] of linked lists. Theelement x ∈ S is stored in the list T [h(x .key)].
Search(T , k): Search the list T [h(k)].
Insert(T , x): Insert x at the head of T [h(x .key)].
Delete(T , x): Delete x from the list containing x .
S=30,9,26,19,1,6
m=5, h(x)=x mod 5
9
26
30
Hash tables
![Page 9: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/9.jpg)
Hash table with chaining
Let h : U → 0, . . . ,m − 1 be a hash function (m < u).
S is stored in a table T [0..m − 1] of linked lists. Theelement x ∈ S is stored in the list T [h(x .key)].
Search(T , k): Search the list T [h(k)].
Insert(T , x): Insert x at the head of T [h(x .key)].
Delete(T , x): Delete x from the list containing x .
S=30,9,26,19,1,6
m=5, h(x)=x mod 5
19 9
26
30
Hash tables
![Page 10: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/10.jpg)
Hash table with chaining
Let h : U → 0, . . . ,m − 1 be a hash function (m < u).
S is stored in a table T [0..m − 1] of linked lists. Theelement x ∈ S is stored in the list T [h(x .key)].
Search(T , k): Search the list T [h(k)].
Insert(T , x): Insert x at the head of T [h(x .key)].
Delete(T , x): Delete x from the list containing x .
S=30,9,26,19,1,6
m=5, h(x)=x mod 5
19 9
261
30
Hash tables
![Page 11: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/11.jpg)
Hash table with chaining
Let h : U → 0, . . . ,m − 1 be a hash function (m < u).
S is stored in a table T [0..m − 1] of linked lists. Theelement x ∈ S is stored in the list T [h(x .key)].
Search(T , k): Search the list T [h(k)].
Insert(T , x): Insert x at the head of T [h(x .key)].
Delete(T , x): Delete x from the list containing x .
S=30,9,26,19,1,6
m=5, h(x)=x mod 5
19 9
6 261
30
Hash tables
![Page 12: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/12.jpg)
Analysis — Worst case
The worst case running times of the operations are:
Search: Θ(n).
Insert: Θ(1).
Delete: Θ(n) (can be reduced to Θ(1) using doublylinked list).
Hash tables
![Page 13: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/13.jpg)
Analysis
Assumption of simple uniform hashing: any element isequally likely to hash into any of the m slots,independently of where other elements have hashed into.
The above assumption is true when the keys are chosenuniformly and independently at random (withrepetitions), h(x) = x mod m, and u is divisible by m.
We want to analyze the performance of hashing under theassumption of simple uniform hashing. This is the ballsinto bins problem.
Suppose we randomly place n balls into m bins. Let X bethe number of balls in bin 0.
The expected time of an unsuccessful search in a hashtable is Θ(1 + E [X ]).
Hash tables
![Page 14: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/14.jpg)
The expectation of X
Each ball has probability 1/m to be in bin 0.
The random variable X has binomial distribution withparameters n and 1/m.
Therefore, E [X ] = n/m.
The expected time of an unsuccessful search in a hashtable is Θ(1 + α), where α = n/m.
The value α is called the load factor.
The expected time of searching a random key from S isalso Θ(1 + α).
Hash tables
![Page 15: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/15.jpg)
Example
If α = 1 and n is large:
Pr[X = 0] ≈ 0.368 Pr[X = 4] ≈ 0.015
Pr[X = 1] ≈ 0.368 Pr[X = 5] ≈ 0.003
Pr[X = 2] ≈ 0.184 Pr[X = 6] ≈ 0.0005
Pr[X = 3] ≈ 0.061 Pr[X = 7] ≈ 0.00007
Hash tables
![Page 16: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/16.jpg)
The maximum number of balls in a bin
Suppose that n = m = 106. Most bins contain 0–3 balls.
The probability that a specific bin contains at least 8 ballsis ≈ 0.000009.
However, it is very likely that some bin will contain 8balls.
Theorem
If m ∈ Θ(n) then the size of the largest bin isΘ(log n/ log log n) with probability at least 1− 1/nΘ(1).
Hash tables
![Page 17: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/17.jpg)
Universal hash functions
Random keys + fixed hash function (e.g. mod)⇒ The hashed keys are random numbers.
A fixed set of keys + random hash function (selectedfrom a universal set)⇒ The hashed keys are semi-random numbers.
Hash tables
![Page 18: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/18.jpg)
Universal hash functions
Definition
A collection H of hash functions is a universal if for every pairof distinct keys x , y ∈ U , Prh∈H[h(x) = h(y)] ≤ 1
m.
Example
Let p be a prime number larger than u.fa,b(x) = ((ax + b) mod p) mod mHp,m = fa,b|a ∈ 1, 2, . . . , p − 1, b ∈ 0, 1, . . . , p − 1
![Page 19: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/19.jpg)
Universal hash functions
Theorem
Suppose that H is a universal collection of hash functions.If a hash table for S is built using a randomly chosen h ∈ H,then for every k ∈ U, the expected length of T [h(k)] is atmost 1 + n/m.
Proof.
Let X = length of T [h(k)].X =
∑y∈S Iy where Iy = 1 if h(y .key) = h(k) and Iy = 0
otherwise.
E [X ] = E
[∑y∈S
Iy
]=
∑y∈S
E [Iy ] =∑y∈S
Prh∈H
[h(y .key) = h(k)]
If k /∈ S , E [X ] ≤ n · 1m
. Otherwise, E [X ] ≤ 1 + (n − 1) 1m
.
Hash tables
![Page 20: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/20.jpg)
Rehashing
We wish to maintain n ∈ O(m) in order to have Θ(1)expected search time.
This can be achieved by rehashing. Suppose we wantn ≤ m. When the table has m elements and a newelement is inserted, create a new table of size 2m andcopy all elements into the new table.
19 34
6 21
h(x)=x mod 5
30
34
21
6
30
h'(x)=x mod 10
19
Hash tables
![Page 21: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/21.jpg)
Rehashing
We wish to maintain n ∈ O(m) in order to have Θ(1)expected search time.
This can be achieved by rehashing. Suppose we wantn ≤ m. When the table has m elements and a newelement is inserted, create a new table of size 2m andcopy all elements into the new table.The cost of rehashing is Θ(n).
Hash tables
![Page 22: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/22.jpg)
Rehashing
In order to keep the space usage Θ(n), we needn ∈ Ω(m).
We can keep α ∈ [ 14, 1]. If an insert would make α > 1,
create a new table of size 2m. If a delete would makeα < 1
4, create a new table of size m/2.
Hash tables
![Page 23: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/23.jpg)
Time complexity of chaining
Under the assumption of simple uniform hashing, theexpected time of a search is Θ(1 + α) time.
If α = Θ(1), and under the assumption of simple uniformhashing, the worst case time of a search isΘ(log n/ log log n), with probability at least 1− 1/nΘ(1).
If the hash function is chosen from a universal collectionat random, the expected time of a search is Θ(1 + α).
The worst case time of insert is Θ(1) if there is norehashing.
Using doubly linked lists, the worst case time of delete isΘ(1).
Hash tables
![Page 24: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/24.jpg)
Linear probing
In the following, we assume that the elementsin the hash table are keys with no satelliteinformation.
To insert element k , try inserting to T [h(k)].If T [h(k)] is not empty, try T [h(k) + 1 mod m],then try T [h(k) + 2 mod m] etc.
Insert(T , k)
(1) j ← h(k)(2) for i = 0 to m − 1(3) if T [j ] = NULL(4) T [j ]← k(5) return(6) j ← j + 1 mod m(7) error “hash table overflow”
0123456789
h(k)=k mod 10
Hash tables
![Page 25: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/25.jpg)
Linear probing
In the following, we assume that the elementsin the hash table are keys with no satelliteinformation.
To insert element k , try inserting to T [h(k)].If T [h(k)] is not empty, try T [h(k) + 1 mod m],then try T [h(k) + 2 mod m] etc.
Insert(T , k)
(1) j ← h(k)(2) for i = 0 to m − 1(3) if T [j ] = NULL(4) T [j ]← k(5) return(6) j ← j + 1 mod m(7) error “hash table overflow”
0123456789
14
insert(T,14)
Hash tables
![Page 26: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/26.jpg)
Linear probing
In the following, we assume that the elementsin the hash table are keys with no satelliteinformation.
To insert element k , try inserting to T [h(k)].If T [h(k)] is not empty, try T [h(k) + 1 mod m],then try T [h(k) + 2 mod m] etc.
Insert(T , k)
(1) j ← h(k)(2) for i = 0 to m − 1(3) if T [j ] = NULL(4) T [j ]← k(5) return(6) j ← j + 1 mod m(7) error “hash table overflow”
0123456789
14
20
insert(T,20)
Hash tables
![Page 27: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/27.jpg)
Linear probing
In the following, we assume that the elementsin the hash table are keys with no satelliteinformation.
To insert element k , try inserting to T [h(k)].If T [h(k)] is not empty, try T [h(k) + 1 mod m],then try T [h(k) + 2 mod m] etc.
Insert(T , k)
(1) j ← h(k)(2) for i = 0 to m − 1(3) if T [j ] = NULL(4) T [j ]← k(5) return(6) j ← j + 1 mod m(7) error “hash table overflow”
0123456789
14
20
34
insert(T,34)
Hash tables
![Page 28: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/28.jpg)
Linear probing
In the following, we assume that the elementsin the hash table are keys with no satelliteinformation.
To insert element k , try inserting to T [h(k)].If T [h(k)] is not empty, try T [h(k) + 1 mod m],then try T [h(k) + 2 mod m] etc.
Insert(T , k)
(1) j ← h(k)(2) for i = 0 to m − 1(3) if T [j ] = NULL(4) T [j ]← k(5) return(6) j ← j + 1 mod m(7) error “hash table overflow”
0123456789
14
20
34
19insert(T,19)
Hash tables
![Page 29: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/29.jpg)
Linear probing
In the following, we assume that the elementsin the hash table are keys with no satelliteinformation.
To insert element k , try inserting to T [h(k)].If T [h(k)] is not empty, try T [h(k) + 1 mod m],then try T [h(k) + 2 mod m] etc.
Insert(T , k)
(1) j ← h(k)(2) for i = 0 to m − 1(3) if T [j ] = NULL(4) T [j ]← k(5) return(6) j ← j + 1 mod m(7) error “hash table overflow”
0123456789
14
20
34
19
55
insert(T,55)
Hash tables
![Page 30: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/30.jpg)
Linear probing
In the following, we assume that the elementsin the hash table are keys with no satelliteinformation.
To insert element k , try inserting to T [h(k)].If T [h(k)] is not empty, try T [h(k) + 1 mod m],then try T [h(k) + 2 mod m] etc.
Insert(T , k)
(1) j ← h(k)(2) for i = 0 to m − 1(3) if T [j ] = NULL(4) T [j ]← k(5) return(6) j ← j + 1 mod m(7) error “hash table overflow”
0123456789
14
20
34
19
55
49
insert(T,49)
Hash tables
![Page 31: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/31.jpg)
Searching
Search(T , k)
(1) j ← h(k)(2) for i = 0 to m − 1(3) if T [j ] = k(4) return j(5) if T [j ] = NULL(6) return NULL(7) j ← j + 1 mod m(8) return NULL
0123456789
14
20
34
19
55
49
search(T,55)Hash tables
![Page 32: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/32.jpg)
Searching
Search(T , k)
(1) j ← h(k)(2) for i = 0 to m − 1(3) if T [j ] = k(4) return j(5) if T [j ] = NULL(6) return NULL(7) j ← j + 1 mod m(8) return NULL
0123456789
14
20
34
19
55
49
search(T,25)Hash tables
![Page 33: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/33.jpg)
Deletion
If we want to delete the element T [i ], we cannotwrite NULL into T [i ].
0123456789
14
20
34
19
55
49
23
delete 14
Hash tables
![Page 34: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/34.jpg)
Deletion
If we want to delete the element T [i ], we cannotwrite NULL into T [i ].
0123456789
20
34
19
55
49
23
delete 14
Hash tables
![Page 35: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/35.jpg)
Deletion
Delete method 1: To delete element T [i ], storein T [i ] a special value DELETED.
Change line (3) in Insert to:
if T [j ] = NULL OR T [j ] = DELETED
0123456789
20
34
19
55
49
23
delete 14
D
Hash tables
![Page 36: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/36.jpg)
Deletion
Delete method 2: Erase T [i ] from the table (re-place it by NULL) and also erase the consecutiveblock of elements following T [i ]. Then, reinsertthe latter elements to the table.
0123456789
14
20
34
19
55
49
23
delete 14
Hash tables
![Page 37: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/37.jpg)
Deletion
Delete method 2: Erase T [i ] from the table (re-place it by NULL) and also erase the consecutiveblock of elements following T [i ]. Then, reinsertthe latter elements to the table.
0123456789
20
19
49
23
delete 14
Hash tables
![Page 38: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/38.jpg)
Deletion
Delete method 2: Erase T [i ] from the table (re-place it by NULL) and also erase the consecutiveblock of elements following T [i ]. Then, reinsertthe latter elements to the table.
0123456789
20
19
49
23
delete 14
34
Hash tables
![Page 39: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/39.jpg)
Deletion
Delete method 2: Erase T [i ] from the table (re-place it by NULL) and also erase the consecutiveblock of elements following T [i ]. Then, reinsertthe latter elements to the table.
0123456789
20
19
49
23
delete 14
3455
Hash tables
![Page 40: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/40.jpg)
Time complexity of linear probing
Assuming simple uniform hashing and that α ≤ 1− ε for somefixed constant ε > 0:
The expected time of search is Θ(1).
The expected time of insert/delete without rehashing isΘ(1).
Hash tables
![Page 41: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/41.jpg)
Open addressing
Open addressing is a generalization oflinear probing.
Leth : U × 0, . . .m − 1 → 0, . . .m − 1be a hash function such thath(k , 0), h(k , 1), . . . h(k ,m − 1) is apermutation of 0, . . .m − 1 for everyk ∈ U .
The slots examined during search/insertare h(k , 0), then h(k , 1), h(k , 2) etc.
In the example on the right,h(k , i) = (h1(k) + ih2(k)) mod 13 whereh1(k) = k mod 13h2(k) = 1 + (k mod 11)
0123456789
101112
79
6998
72
50
14
insert(T,14)
h(14,0)
h(14,1)
h(14,2)
Hash tables
![Page 42: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/42.jpg)
Open addressing
Insertion is the same as in linear probing:Insert(T , k)
(1) for i = 0 to m − 1(2) j ← h(k , i)(3) if T [j ] = NULL OR T [j ] = DELETED(4) T [j ]← k(5) return(6) error “hash table overflow”
Deletion is done using delete method 1 defined above(using special value DELETED).
Hash tables
![Page 43: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/43.jpg)
Double hashing
In the double hashing method,
h(k , i) = (h1(k) + ih2(k)) mod m
for some hash functions h1 and h2.The value h2(k) must be relatively prime to m for theentire hash table to be searched. This can be ensured byeither
Taking m to be a power of 2, and the image of h2
contains only odd numbers.Taking m to be a prime number, and the image of h2
contains integers from 1, . . . ,m − 1.For example,
h1(k) = k mod m
h2(k) = 1 + (k mod m′)
where m is prime and m′ < m.Hash tables
![Page 44: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/44.jpg)
Comparison of hash table methods – Search
8 10 12 14 16 18 20 220
50
100
150
200
250
300
log n
Clo
ck C
ycle
s
Successful Lookup
CuckooTwo−Way ChainingChained HashingLinear Probing
Hash tables
![Page 45: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/45.jpg)
Comparison of hash table methods – Search
8 10 12 14 16 18 20 220
50
100
150
200
250
300
log n
Clo
ck C
ycle
s
Unsuccessful Lookup
CuckooTwo−Way ChainingChained HashingLinear Probing
Hash tables
![Page 46: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/46.jpg)
Comparison of hash table methods – Insert
8 10 12 14 16 18 20 220
50
100
150
200
250
300
350
400
450
log n
Clo
ck C
ycle
s
Insert
CuckooTwo−Way ChainingChained HashingLinear Probing
Hash tables
![Page 47: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/47.jpg)
Comparison of hash table methods – Delete
8 10 12 14 16 18 20 220
50
100
150
200
250
300
350
log n
Clo
ck C
ycle
s
Delete
CuckooTwo−Way ChainingChained HashingLinear Probing
Hash tables
![Page 48: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/48.jpg)
Hash function for strings
A string can be viewed as a sequence of integer, whereeach character is replaced by its ASCII value.
For example, the string ABXY is the sequence65,66,88,89.
To hash a sequence of integers x0, . . . , xr−1 from1, . . . , u, we use a prime number p > u. We select arandom number z ∈ 0, . . . , p − 1 and use the functionhz(x0, . . . , xr−1) = (x0z0 + x1z1 + · · ·+ xr−1z r−1) mod p.
For example h300(65, 66, 88, 89) =(65 + 66 · 300 + 88 · 3002 + 89 · 3003) mod p.
Hash tables
![Page 49: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/49.jpg)
Hash function for strings
We definedhz(x0, . . . , xr−1) = (x0z0 + x1z1 + · · ·+ xr−1z r−1) mod p
Theorem
Given two different sequences x0, . . . , xr−1 and y0, . . . , yr−1
over 1, . . . , u,Pr[hz(x0, . . . , xr−1) = hz(y0, . . . , yr−1)] ≤ (r − 1)/p when z ischosen randomly from 0, . . . , p − 1.
Proof.
A non-trivial polynomial of degree d over the field Zp has atmost d roots. Therefore, the polynomialQ(z) = hz(x0, . . . , xr−1)− hz(y0, . . . , yr−1) =(x0 − y0)z0 + · · ·+ (xr−1 − yr−1)z r−1
has at most r − 1 roots over the field Zp.The prob. of choosing z to be a root of Q is ≤ (r − 1)/p.
Hash tables
![Page 50: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/50.jpg)
Applications of hash functions
Applications of hash functions
![Page 51: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/51.jpg)
Applications of hash functions
Data deduplication: Suppose that we have many files, andsome files have duplicates. In order to save storage space,we want to store only one instance of each distinct file.
Rabin-Karp pattern matching algorithm To find theoccurrences of a string P of length m in a string T ,compute a hash function on every substring of T oflength m and compare with h(P).
Distributed storage: Storing files on several servers.
Applications of hash functions
![Page 52: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/52.jpg)
Rabin-Karp pattern matching algorithm
Suppose we are given strings P and T , and we want tofind all occurrences of P in T .
The Rabin-Karp algorithm is as follows:Compute h(P)for every substring T ′ of T of length |P |
if h(T ′) = h(P) check whether T ′ = P .
The values h(T ′) for all T ′ can be computed in Θ(|T |)time using rolling hash.
Example
Let T = CABXYZ, P = ABXY.
h300(CABX ) = (67 + 65 · 300 + 66 · 3002 + 88 · 3003) mod p
h300(ABXY ) = (65 + 66 · 300 + 88 · 3002 + 89 · 3003) mod p
= ((h300(CABX )− 67)/300 + 89 · 3003) mod p
Applications of hash functions
![Page 53: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/53.jpg)
Distributed storage
Suppose we have many files, and we want to store themon several servers.
Let m denote the number of servers.
The simple solution is to use a hash functionh : U → 1, . . . ,m, and assign file x to server h(x).
The problem with this solution is that if we add orremove a server, we need to do rehashing which will movemost files between servers.
Applications of hash functions
![Page 54: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/54.jpg)
Distributed storage: Consistent hashing
Suppose that the server have identifiers s1, . . . , sm.
Let h : U → [0, 1] be a hash function.
For each server i associate a point h(si) on the unit circle.
For each file f , assign f to the server whose point is thefirst point encountered when traversing the unit cycleanti-clockwise starting from h(f ).
0
server 1
server 3
server 2
Applications of hash functions
![Page 55: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/55.jpg)
Distributed storage: Consistent hashing
Suppose that the server have identifiers s1, . . . , sm.
Let h : U → [0, 1] be a hash function.
For each server i associate a point h(si) on the unit circle.
For each file f , assign f to the server whose point is thefirst point encountered when traversing the unit cycleanti-clockwise starting from h(f ).
0
server 1
server 3
server 2
Applications of hash functions
![Page 56: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/56.jpg)
Distributed storage: Consistent hashing
When a new server m + 1 is added, let i be the serverwhose point is the first server point after h(sm+1).
We only need to reassign some of the files that wereassigned to server i .
The expected number of files reassignments is n/(m + 1).
0
server 1
server 3
server 2
server 4
Applications of hash functions
![Page 57: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/57.jpg)
Bloom Filter
Bloom Filter
![Page 58: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/58.jpg)
Blocking malicious web pages
Suppose we want to implement blocking of malicious webpages in a web browser.
Assume we have a list of 10,000,000 malicious pages.
We can store all pages in a hash table, but this requires alarge amount of memory.
Bloom filter is a solution for the problem that uses smallamount of memory.
Bloom filter stores a set of elements and supportMembership and Insert operations.
For an element that is in the set, the Membershipoperation always returns a positive answer.
For an element that is not in the set, the Membershipoperation can return either a negative answer, or apositive answer (false positive).
Bloom Filter
![Page 59: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/59.jpg)
Blocking malicious web pages
Suppose that the list of malicious pages is stored in aBloom filter.
For a query URL, if the answer is negative, we know thatthe URL is not a malicious page.
If the answer is positive, we do not know the correctanswer. In this case, we get the answer from a server.
We want a small probability for a false positive.
Bloom Filter
![Page 60: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/60.jpg)
Bloom filter — simple version
Suppose we want to store a set S .
We use an array A of size m initialized to 0.
We then choose a hash function h, and set A[h(x)]← 1for every x ∈ S .
For a query y , if A[h(y)] = 0, we know that y /∈ S .
If A[h(y)] = 1, either y is in S or not.
000 0 0 0 0 0 0 00 1 2 3 4 5 6 7 8 9
0
Bloom Filter
![Page 61: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/61.jpg)
Bloom filter — simple version
Suppose we want to store a set S .
We use an array A of size m initialized to 0.
We then choose a hash function h, and set A[h(x)]← 1for every x ∈ S .
For a query y , if A[h(y)] = 0, we know that y /∈ S .
If A[h(y)] = 1, either y is in S or not.
000 0 0 0 0 0 0 00 1 2 3 4 5 6 7 8 9
01
Bloom Filter
![Page 62: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/62.jpg)
Bloom filter — simple version
Suppose we want to store a set S .
We use an array A of size m initialized to 0.
We then choose a hash function h, and set A[h(x)]← 1for every x ∈ S .
For a query y , if A[h(y)] = 0, we know that y /∈ S .
If A[h(y)] = 1, either y is in S or not.
000 0 0 0 0 0 0 00 1 2 3 4 5 6 7 8 9
011
Bloom Filter
![Page 63: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/63.jpg)
Bloom filter — simple version
Suppose we want to store a set S .
We use an array A of size m initialized to 0.
We then choose a hash function h, and set A[h(x)]← 1for every x ∈ S .
For a query y , if A[h(y)] = 0, we know that y /∈ S .
If A[h(y)] = 1, either y is in S or not.
000 0 0 0 0 0 0 00 1 2 3 4 5 6 7 8 9
011
Bloom Filter
![Page 64: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/64.jpg)
Bloom filter — simple version
Suppose we want to store a set S .
We use an array A of size m initialized to 0.
We then choose a hash function h, and set A[h(x)]← 1for every x ∈ S .
For a query y , if A[h(y)] = 0, we know that y /∈ S .
If A[h(y)] = 1, either y is in S or not.
000 0 0 0 0 0 0 00 1 2 3 4 5 6 7 8 9
011
Bloom Filter
![Page 65: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/65.jpg)
Probability of false positive
Assume simple uniform hashing: any element is equallylikely to hash into any of the m slots, independently ofwhere other elements have hashed into.
For fixed i , the probability that A[i ] = 0 isp = (1− 1/m)n.
For y /∈ S , the probability that A[h(y)] = 1 is 1− p.
000 0 0 0 0 0 0 00 1 2 3 4 5 6 7 8 9
011
Bloom Filter
![Page 66: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/66.jpg)
Reducing the false positive probability
We choose t hash functions h1, h2, . . . , ht , and setA[hj(x)]← 1 for j = 1, 2, . . . t, for every x ∈ S .
For a query y , if A[hj(y)] = 0 for some j , we know thaty /∈ S .
If A[hj(y)] = 1 for all j , either y is in S or not.
000 0 0 0 0 0 0 00 1 2 3 4 5 6 7 8 9
0
Bloom Filter
![Page 67: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/67.jpg)
Reducing the false positive probability
We choose t hash functions h1, h2, . . . , ht , and setA[hj(x)]← 1 for j = 1, 2, . . . t, for every x ∈ S .
For a query y , if A[hj(y)] = 0 for some j , we know thaty /∈ S .
If A[hj(y)] = 1 for all j , either y is in S or not.
000 0 0 0 0 0 0 00 1 2 3 4 5 6 7 8 9
01 11
Bloom Filter
![Page 68: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/68.jpg)
Reducing the false positive probability
We choose t hash functions h1, h2, . . . , ht , and setA[hj(x)]← 1 for j = 1, 2, . . . t, for every x ∈ S .
For a query y , if A[hj(y)] = 0 for some j , we know thaty /∈ S .
If A[hj(y)] = 1 for all j , either y is in S or not.
000 0 0 0 0 0 0 00 1 2 3 4 5 6 7 8 9
01 1111
Bloom Filter
![Page 69: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/69.jpg)
Reducing the false positive probability
We choose t hash functions h1, h2, . . . , ht , and setA[hj(x)]← 1 for j = 1, 2, . . . t, for every x ∈ S .
For a query y , if A[hj(y)] = 0 for some j , we know thaty /∈ S .
If A[hj(y)] = 1 for all j , either y is in S or not.
000 0 0 0 0 0 0 00 1 2 3 4 5 6 7 8 9
01 1111
Bloom Filter
![Page 70: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/70.jpg)
Reducing the false positive probability
We choose t hash functions h1, h2, . . . , ht , and setA[hj(x)]← 1 for j = 1, 2, . . . t, for every x ∈ S .
For a query y , if A[hj(y)] = 0 for some j , we know thaty /∈ S .
If A[hj(y)] = 1 for all j , either y is in S or not.
000 0 0 0 0 0 0 00 1 2 3 4 5 6 7 8 9
01 1111
Bloom Filter
![Page 71: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/71.jpg)
Probability of false positive
For fixed i , the probability that A[i ] = 0 isp = (1− 1/m)tn.
For y /∈ S , the probability that A[h(y)] = 1 is (1− p)t .
000 0 0 0 0 0 0 00 1 2 3 4 5 6 7 8 9
01 1111
Bloom Filter
![Page 72: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/72.jpg)
Optimizing the probability of false positive
For the previous example (n = 2, m = 10):
For t = 1, p = 0.81, (1− p)1 = 0.19.
For t = 2, p ≈ 0.656, (1− p)2 ≈ 0.118.
For t = 3, p ≈ 0.531, (1− p)3 ≈ 0.103.
For t = 4, p ≈ 0.430, (1− p)4 ≈ 0.105.
For t = 5, p ≈ 0.349, (1− p)5 ≈ 0.117.
Bloom Filter
![Page 73: Hash tablesds202/wiki.files/08-hash.pdf · 2020-04-29 · Open addressing Hash tables. Hash table with chaining Let h : U !f0;:::;m 1gbe a hash function (m](https://reader034.vdocument.in/reader034/viewer/2022042803/5f472a89e8d3484a366e4468/html5/thumbnails/73.jpg)
Optimizing the probability of false positive
Let f (t) be the probability of false positive when using thash functions.
p = (1− 1/m)tn ≈ e−tn/m
f (t) = (1− p)t ≈ (1− e−tn/m)t = et·ln(1−e−tn/m)
Minimizing f (t) is equivalent to minimizingg(t) = t · ln(1− e−tn/m).dgdt
= ln(1− e−tn/m) + tnm
e−tn/m
1−e−tn/m .
The minimum of g(t) is when dg/dt = 0 =⇒ t = mn
ln 2.
The minimum of f (t) is ≈ 0.6185m/n.
Example: For m = 8n, mn
ln 2 ≈ 5.55. f (5) ≈ 0.0217,f (6) ≈ 0.0216.
Bloom Filter