Bloom filter
DESCRIPTION
A Bloom filter is a data structure that supports very fast membership queries while using very compact storage.
TRANSCRIPT
![Page 2: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/2.jpg)
Agenda
• A Membership Query Problem
• What is Bloom Filter
• Bloom Filter Math Theory
• Compression
• Application Scenario
![Page 3: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/3.jpg)
Membership Query Problem
Problem Description
Given an element E, determine whether it belongs to a large set S.
– As fast as possible
– As small as possible
![Page 4: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/4.jpg)
Membership Query Problem
Some Solutions
– Hash table: fast, but a big data structure
– Bitmap index
Can it be smaller?
![Page 5: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/5.jpg)
Membership Query Problem
Tradeoff Solutions
To obtain speed and size improvements, allow some probability of error.
Bloom Filter
![Page 6: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/6.jpg)
What is Bloom Filter
Support approximate set membership.
Given a set S = {x1, x2, …, xn}, construct a data structure to answer queries of the form “Is y in S?”
The data structure should be:
– Fast (faster than searching through S).
– Small (smaller than an explicit representation).
To obtain speed and size improvements, allow some probability of error.
– False positives: y ∉ S but we report y ∈ S
– False negatives: y ∈ S but we report y ∉ S
![Page 7: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/7.jpg)
What is Bloom Filter
Start with an m-bit array B, filled with 0s.
B: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Hash each item xj in S k times. If Hi(xj) = a, set B[a] = 1.
B: 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0
To check if y is in S, check B at Hi(y). All k values must be 1.
B: 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0
Possible to have a false positive: all k values are 1, but y is not in S.
B: 0 1 0 0 1 0 1 0 0 1 1 1 0 1 1 0
n items m = cn bits k hash functions
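The construction and query steps on this slide can be sketched in Python as follows. This is a minimal illustration, not a production implementation: the class name `BloomFilter` and the choice of deriving the k hash functions by salting SHA-256 with the index i are assumptions made for the example, and the array stores one byte per bit for readability.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter: an m-bit array B and k hash functions."""

    def __init__(self, m: int, k: int) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray(m)  # one byte per bit, all 0 initially

    def _positions(self, item: str):
        # Assumption: H_1..H_k are obtained by salting SHA-256 with i.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, item: str) -> None:
        # If H_i(x) = a, set B[a] = 1.
        for a in self._positions(item):
            self.bits[a] = 1

    def __contains__(self, item: str) -> bool:
        # y is reported as a member only if all k probed bits are 1.
        return all(self.bits[a] for a in self._positions(item))


bf = BloomFilter(m=16, k=3)
for x in ("apple", "banana"):
    bf.add(x)

print("apple" in bf)   # an inserted item is always reported as present
print("cherry" in bf)  # usually False, but may be a false positive
```

Because bits are only ever set and never cleared, an inserted item can never be reported as absent: this structure has false positives but no false negatives.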
![Page 8: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/8.jpg)
What is Bloom Filter
False Positive
[Figure: an element A is mapped by hash1, hash2 and hash3 to positions in the bit array B, illustrating a false positive.]
![Page 9: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/9.jpg)
Bloom Filter Math Theory
Pr(specific bit of filter is 0) is
$p' = (1 - 1/m)^{kn} \approx e^{-kn/m} = p$
If $\rho$ is the fraction of 0 bits in the filter, then the false positive probability is
$(1 - \rho)^k \approx (1 - p')^k \approx (1 - p)^k = (1 - e^{-k/c})^k$
Approximations are valid as $\rho$ is concentrated around $E[\rho]$.
– A martingale argument suffices.
Find the optimum at $k = (\ln 2)\, m/n$ by calculus.
– So the optimal false positive probability is about $(0.6185)^{m/n}$.
n items m = cn bits k hash functions
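As a numerical check of the formulas on this slide, the sketch below evaluates the approximation f = (1 - e^{-k/c})^k with c = m/n and compares the value at k = (ln 2) m/n with (0.6185)^{m/n}. The function names are illustrative, and c = 8 is chosen only as an example.

```python
import math


def false_positive_rate(c: float, k: float) -> float:
    """Approximate f = (1 - e^{-k/c})^k, where c = m/n bits per item."""
    return (1.0 - math.exp(-k / c)) ** k


def optimal_k(c: float) -> float:
    """k = (ln 2) * m/n, the minimizer of the approximation above."""
    return math.log(2) * c


c = 8  # example: 8 bits per item
k_opt = optimal_k(c)
print(f"optimal k       = {k_opt:.3f}")
print(f"f at optimal k  = {false_positive_rate(c, k_opt):.5f}")
print(f"0.6185^(m/n)    = {0.6185 ** c:.5f}")

# In practice k must be an integer, so sweep the integer choices.
for k in range(1, 11):
    print(f"k = {k:2d}  f = {false_positive_rate(c, k):.5f}")
```

At the optimum, e^{-k/c} = 1/2, so f = (1/2)^k = (1/2)^{(ln 2) m/n}, which is where the constant 0.6185 ≈ (1/2)^{ln 2} comes from.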
![Page 10: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/10.jpg)
Bloom Filter Math Theory
[Figure: false positive rate (0 to 0.1) versus number of hash functions (0 to 10) for m/n = 8. Optimal k = 8 ln 2 = 5.545…]
n items m = cn bits k hash functions
![Page 11: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/11.jpg)
Bloom Filter Compression
Use BF in Network Transmission
– A BF sent as a message should be small enough to be transmitted over the network.
Compressing a bit vector is easy
– Arithmetic coding gets close to entropy.
Can Bloom filters be compressed?
![Page 12: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/12.jpg)
Bloom Filter Compression
• Optimize to minimize false positives.
– Pr[cell is empty]: $p = (1 - 1/m)^{kn} \approx e^{-kn/m}$
– Pr[false pos]: $f = (1 - p)^k \approx (1 - e^{-kn/m})^k$
– Minimized at $k = (\ln 2)\, m/n$
• At k = m (ln 2) / n, p = 1/2.
• The Bloom filter looks like a random string.
– Can’t compress it.
– $H(p) = -p \log_2 p - (1 - p) \log_2 (1 - p)$
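To make the "looks like a random string" point concrete, the sketch below evaluates the entropy function H(p) at the fraction of empty cells p = e^{-kn/m} for a few values of k. The helper names and the choice m/n = 8 are assumptions for illustration only.

```python
import math


def binary_entropy(p: float) -> float:
    """H(p) = -p log2 p - (1 - p) log2 (1 - p), in bits per table bit."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def zero_fraction(c: float, k: float) -> float:
    """p = e^{-k/c}: approximate probability that a cell is empty (c = m/n)."""
    return math.exp(-k / c)


c = 8  # example: 8 bits per item
for k in (1, 2, 4, math.log(2) * c):
    p = zero_fraction(c, k)
    print(f"k = {k:.3f}  p = {p:.3f}  H(p) = {binary_entropy(p):.3f}")
```

H(p) reaches its maximum of 1 bit per table bit exactly at p = 1/2, which is the value p takes at k = (ln 2) m/n. For smaller k the table is mostly zeros and H(p) drops below 1, so there is room to compress.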
![Page 13: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/13.jpg)
Bloom Filter Compression
With a larger decompressed size (storage), we can achieve compression.
• Assumption: optimal compressor, z = mH(p).
– H(p) is the entropy function; optimally we get H(p) compressed bits per original table bit.
– Arithmetic coding is close to optimal.
• Optimization: given z bits for the compressed filter and n elements, choose table size m and number of hash functions k to minimize f.
$p \approx e^{-kn/m}; \quad f \approx (1 - e^{-kn/m})^k; \quad z \approx m H(p)$
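The optimization stated above can be explored numerically. The following sketch fixes a compressed budget of z/n bits per item, and for each integer k finds the table size m/n that satisfies z = mH(p) by bisection, then reports the resulting f. It assumes an ideal compressor (z = mH(p) exactly) and assumes that (m/n) H(e^{-kn/m}) increases with m/n; the function names and the budget z/n = 8 are illustrative.

```python
import math


def entropy(p: float) -> float:
    """Binary entropy H(p) in bits, for 0 < p < 1."""
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)


def compressed_bits_per_item(c: float, k: int) -> float:
    """z/n = (m/n) * H(p) with p = e^{-k/c} and c = m/n."""
    return c * entropy(math.exp(-k / c))


def table_size_for_budget(z_per_item: float, k: int) -> float:
    """Find c = m/n such that c * H(e^{-k/c}) = z/n, by bisection.

    Assumes the left-hand side increases with c and z/n > 0.5.
    """
    lo, hi = 0.5, 1.0
    while compressed_bits_per_item(hi, k) < z_per_item:
        hi *= 2  # grow the bracket until it contains the solution
    for _ in range(100):
        mid = (lo + hi) / 2
        if compressed_bits_per_item(mid, k) < z_per_item:
            lo = mid
        else:
            hi = mid
    return hi


z_per_item = 8  # compressed bits per item
for k in range(1, 7):
    c = table_size_for_budget(z_per_item, k)
    f = (1 - math.exp(-k / c)) ** k
    print(f"k = {k}  m/n = {c:6.1f}  f = {f:.4f}")

print(f"uncompressed optimum at m/n = 8: f = {0.5 ** (math.log(2) * 8):.4f}")
```

Under these assumptions, small k with a large, sparse table gives a lower false positive rate for the same number of transmitted bits than the uncompressed optimum, at the cost of more memory after decompression.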
![Page 14: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/14.jpg)
Bloom Filter Compression
[Figure: false positive rate (0 to 0.1) versus number of hash functions (0 to 10) at z/n = 8, comparing the original and the compressed Bloom filter.]
![Page 15: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/15.jpg)
Bloom Filter Compression
Conclusion
• At k = m (ln 2) / n, false positives are maximized with a compressed Bloom filter.
– Best case without compression is worst case with compression; compression always helps.
– Side benefit: use fewer hash functions with compression; possible speedup.
![Page 16: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/16.jpg)
Application Scenario
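Speed up answers in a key-value-like system
[Figure: lookups pass through "filter (memory)" before reaching "storage (memory)". key1: filter answers no. key2: filter answers yes, disk access succeeds. key3: filter answers yes, disk access fails.]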
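The three cases in the figure can be sketched as a toy key-value store guarded by an in-memory filter. Everything here is hypothetical scaffolding for illustration: the class `FilteredStore`, the `disk_reads` counter, the use of a Python dict as a stand-in for slow disk storage, and the SHA-256 based hash positions.

```python
import hashlib


def positions(key: str, m: int, k: int):
    """Yield k bit positions for key (assumption: salted SHA-256)."""
    for i in range(k):
        digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:8], "big") % m


class FilteredStore:
    """Key-value store that consults a Bloom filter before touching disk."""

    def __init__(self, m: int = 1024, k: int = 3) -> None:
        self.m, self.k = m, k
        self.bits = bytearray(m)  # in-memory filter
        self.disk = {}            # stand-in for slow on-disk storage
        self.disk_reads = 0

    def put(self, key: str, value: str) -> None:
        self.disk[key] = value
        for a in positions(key, self.m, self.k):
            self.bits[a] = 1

    def get(self, key: str):
        if not all(self.bits[a] for a in positions(key, self.m, self.k)):
            return None           # filter says no: skip the disk entirely
        self.disk_reads += 1      # filter says yes: pay for a disk access
        return self.disk.get(key) # None here means a false positive


store = FilteredStore()
store.put("key2", "value2")
print(store.get("key1"), store.disk_reads)
print(store.get("key2"), store.disk_reads)
```

A lookup like key3 in the figure corresponds to the last line of `get`: the filter answered yes, the disk was read, and the key turned out to be absent. That wasted access is the cost of a false positive, which is why the false positive rate matters in this setting.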
![Page 17: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/17.jpg)
Application Scenario
Web Cache
[Figure: cache1, cache2, cache3, … in front of a Web Server.]
![Page 18: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/18.jpg)
Q&A
![Page 19: Bloom filter](https://reader033.vdocument.in/reader033/viewer/2022061207/5486013fb4af9fac198b4592/html5/thumbnails/19.jpg)