![Page 1: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/1.jpg)
Mining of Massive DatasetsJure Leskovec, Anand Rajaraman, Jeff UllmanStanford University
http://www.mmds.org
Note to other teachers and users of these slides: We would be delighted if you found this our material useful in giving your own lectures. Feel free to use these slides verbatim, or to modify them to fit your own needs. If you make use of a significant portion of these slides in your own lecture, please include this message, or a link to our web site: http://www.mmds.org
![Page 2: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/2.jpg)
More algorithms for streams:(1) Filtering a data stream: Bloom filtersSelect elements with property x from stream
(2) Counting distinct elements: Flajolet‐MartinNumber of distinct elements in the last k elements of the stream
(3) Estimating frequency moments: AMS methodEstimate std. dev. of last k elements
(4) Counting frequent itemsets: Exponentially Decaying Windows
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 2
![Page 3: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/3.jpg)
![Page 4: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/4.jpg)
4J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 5: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/5.jpg)
5J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 6: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/6.jpg)
Given a set of keys S that we want to filterCreate a bit array B of n bits, initially all 0sChoose a hash function h with range [0,n)Hash each member of s∈ S to one of n buckets, and set that bit to 1, i.e., B[h(s)]=1Hash each element a of the stream and output only those that hash to bit that was set to 1Output a if B[h(a)] == 1
6J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 7: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/7.jpg)
Creates false positives but no false negativesIf the item is in S we surely output it, if not we may still output it
7
FilterItem
0010001011000
Output the item since it may be in S.Item hashes to a bucket that at least one of the items in S hashed to.
Hash func h
Drop the item.It hashes to a bucket set to 0 so it is surely not in S.
Bit array B
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 8: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/8.jpg)
8J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 9: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/9.jpg)
9J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 10: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/10.jpg)
We have m darts, n targetsWhat is the probability that a target gets at least one dart?
10
(1 – 1/n)
Probability sometarget X not hit
by a dart
m
1 -
Probability atleast one darthits target X
n( / n)
EquivalentEquals 1/eas n →∞
1 – e–m/n
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 11: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/11.jpg)
Fraction of 1s in the array B == probability of false positive = 1 – e‐m/n
Example: 109 darts, 8∙109 targetsFraction of 1s in B = 1 – e‐1/8 = 0.1175Compare with our earlier estimate: 1/8 = 0.125
11J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 12: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/12.jpg)
Consider: |S| = m, |B| = nUse k independent hash functions h1 ,…, hkInitialization:Set B to all 0sHash each element s∈ S using each hash function hi, set B[hi(s)] = 1 (for each i = 1,.., k)
Run‐time:When a stream element with key x arrivesIf B[hi(x)] = 1 for all i = 1,..., k then declare that x is in SThat is, x hashes to a bucket set to 1 for every hash function hi(x)
Otherwise discard the element xJ. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 12
(note: we have a single array B!)
![Page 13: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/13.jpg)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 13
![Page 14: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/14.jpg)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 14
0 2 4 6 8 10 12 14 16 18 200.02
0.04
0.06
0.08
0.1
0.12
0.14
0.16
0.18
0.2
Number of hash functions, k
Fals
e po
sitiv
e pr
ob.
![Page 15: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/15.jpg)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 15
![Page 16: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/16.jpg)
![Page 17: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/17.jpg)
17J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 18: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/18.jpg)
18J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 19: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/19.jpg)
19J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 20: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/20.jpg)
Each element x is placed in bucket k with probability 1/2k
E[#elements before bucket k is non‐empty] = 2k
R = maximum index of non‐empty bucket. R is only information we need to maintain.
Estimated number of distinct elements = 2R
20J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 21: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/21.jpg)
21J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 22: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/22.jpg)
Very very rough and heuristic intuition why Flajolet‐Martin works:h(a) hashes a with equal prob. to any of N valuesThen h(a) is a sequence of log2 N bits, where 2‐r fraction of all as have a tail of r zeros About 50% of as hash to ***0About 25% of as hash to **00So, if we saw the longest tail of r=2 (i.e., item hash ending *100) then we have probably seen about 4 distinct items so far
So, it takes to hash about 2r items before we see one with zero‐suffix of length r
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 22
![Page 23: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/23.jpg)
23J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 24: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/24.jpg)
24
Prob. that given h(a) ends in at most r zeros
Prob. all end in at most r zeros.
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 25: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/25.jpg)
25J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
1)21( 2 =≈−−−− rmmr e
0)21( 2 =≈−−−− rmmr e
rrr mmrmr e−− −−− ≈−=− 2)2(2)21()21(
![Page 26: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/26.jpg)
26J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 27: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/27.jpg)
![Page 28: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/28.jpg)
28
∑∈Aik
im )(
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 29: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/29.jpg)
0thmoment = number of distinct elementsThe problem just considered
1st moment = count of the numbers of elements = length of the streamEasy to compute
2nd moment = surprise number S =a measure of how uneven the distribution is
29J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
∑∈Aik
im )(
![Page 30: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/30.jpg)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 30
![Page 31: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/31.jpg)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 31
[Alon, Matias, and Szegedy]
![Page 32: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/32.jpg)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 32
![Page 33: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/33.jpg)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 33
Time t whenthe last i is seen (ct=1)
Time twhenthe penultimatei is seen (ct=2)
Time twhenthe first i is seen (ct=mi)
Group timesby the valueseen
a a a a
1 32 ma
b b b b
Count:
Stream:
mi … total count of item i in the stream
(we are assuming stream has length n)
![Page 34: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/34.jpg)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 34
a a a a
1 32 ma
b b b bStream:
Count:
![Page 35: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/35.jpg)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 35
![Page 36: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/36.jpg)
36J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 37: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/37.jpg)
(1) The variables X have n as a factor –keep n separately; just hold the count in X(2) Suppose we can only store k counts. We must throw some Xs out as time goes on:Objective: Each starting time t is selected with probability k/nSolution: (fixed‐size sampling!)Choose the first k times for k variablesWhen the nth element arrives (n > k), choose it with probability k/nIf you choose it, throw one of the previously stored variables X out, with equal probability
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 37
![Page 38: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/38.jpg)
![Page 39: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/39.jpg)
New Problem: Given a stream, which items appear more than s times in the window?Possible solution: Think of the stream of baskets as one binary stream per item1 = item present; 0 = not presentUse DGIM to estimate counts of 1s for all items
39J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
0 1 0 0 1 1 1 0 0 0 1 0 1 0 0 1 0 0 0 1 0 1 1 0 1 1 0 1 1 1 0 0 1 0 1 0 1 1 0 0 1 1 0 1 0N
0112
234
106
![Page 40: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/40.jpg)
40J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 41: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/41.jpg)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 41
![Page 42: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/42.jpg)
42J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 43: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/43.jpg)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 43
1/c
. . .
![Page 44: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/44.jpg)
44J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 45: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/45.jpg)
Count (some) itemsets in an E.D.W.What are currently “hot” itemsets?Problem: Too many itemsets to keep counts of all of them in memory
When a basket B comes in:Multiply all counts by (1‐c)For uncounted items in B, create new countAdd 1 to count of any item in B and to any itemsetcontained in B that is already being countedDrop counts < ½Initiate new counts (next slide)
J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org 45
![Page 46: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/46.jpg)
Start a count for an itemset S ⊆ B if every proper subset of S had a count prior to arrival of basket BIntuitively: If all subsets of S are being counted this means they are “frequent/hot” and thus S has a potential to be “hot”
Example:Start counting S={i, j} iff both i and j were counted prior to seeing BStart counting S={i, j, k} iff {i, j}, {i, k}, and {j, k}were all counted prior to seeing B
46J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org
![Page 47: Stanford University ://courses.corelab.ntua.gr/.../course/section/712/ch04-streams2.… · More algorithms for streams: (1)Filtering a data stream: Bloom filters Select elements with](https://reader036.vdocument.in/reader036/viewer/2022063007/5fb924d5dc48c7192260c8aa/html5/thumbnails/47.jpg)
47J. Leskovec, A. Rajaraman, J. Ullman: Mining of Massive Datasets, http://www.mmds.org