Mining Frequent Closed Graphs on Evolving Data Streams
DESCRIPTION
Graph mining is a challenging task by itself, and even more so when processing data streams which evolve in real time. Data stream mining faces hard constraints regarding time and space for processing, and also needs to provide for concept drift detection. In this talk we present a framework for studying graph pattern mining on time-varying streams and large datasets.
TRANSCRIPT
![Page 1: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/1.jpg)
Mining Frequent Closed Graphs on Evolving Data Streams
A. Bifet, G. Holmes, B. Pfahringer and R. Gavalda
University of Waikato, Hamilton, New Zealand
Laboratory for Relational Algorithmics, Complexity and Learning (LARCA), UPC-Barcelona Tech, Catalonia
San Diego, 24 August 2011
17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2011
![Page 2: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/2.jpg)
Mining Evolving Graph Data Streams

Problem
Given a data stream D of graphs, find frequent closed graphs.

[Figure: three example graph transactions (ids 1, 2, 3), molecule-like graphs over the atoms C, S, N and O.]

We provide three algorithms, of increasing power:
- Incremental
- Sliding Window
- Adaptive
![Page 3: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/3.jpg)
Non-incremental Frequent Closed Graph Mining

- CloseGraph (Xifeng Yan, Jiawei Han): right-most extension based on depth-first search; based on gSpan (ICDM'02)
- MoSS (Christian Borgelt, Michael R. Berthold): breadth-first search; based on MoFa (ICDM'02)
![Page 4: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/4.jpg)
Outline
1 Streaming Data
2 Frequent Pattern Mining
   - Mining Evolving Graph Streams
   - ADAGRAPHMINER
3 Experimental Evaluation
4 Summary and Future Work
![Page 6: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/6.jpg)
Mining Massive Data
Source: IDC’s Digital Universe Study (EMC), June 2011
![Page 7: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/7.jpg)
Data Streams

- The sequence is potentially infinite
- High amount of data
- High speed of arrival
- Once an element from a data stream has been processed, it is discarded or archived

Tools:
- approximation
- randomization, sampling
- sketching
![Page 8: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/8.jpg)
Data Streams Approximation Algorithms

1011000111 1010101

Sliding Window
We can maintain simple statistics over sliding windows, using $O(\frac{1}{\varepsilon} \log^2 N)$ space, where
- N is the length of the sliding window
- ε is the accuracy parameter

M. Datar, A. Gionis, P. Indyk, and R. Motwani. Maintaining stream statistics over sliding windows. 2002
![Page 15: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/15.jpg)
Pattern Mining

Dataset Example

| Document | Patterns |
|---|---|
| d1 | abce |
| d2 | cde |
| d3 | abce |
| d4 | acde |
| d5 | abcde |
| d6 | bcd |
![Page 16: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/16.jpg)
Itemset Mining

Dataset: d1 = abce, d2 = cde, d3 = abce, d4 = acde, d5 = abcde, d6 = bcd

| Supporting documents | Support | Frequent | Gen | Closed | Max |
|---|---|---|---|---|---|
| d1,d2,d3,d4,d5,d6 | 6 | c | c | c | |
| d1,d2,d3,d4,d5 | 5 | e, ce | e | ce | |
| d1,d3,d4,d5 | 4 | a, ac, ae, ace | a | ace | |
| d1,d3,d5,d6 | 4 | b, bc | b | bc | |
| d2,d4,d5,d6 | 4 | d, cd | d | cd | |
| d1,d3,d5 | 3 | ab, abc, abe, be, bce, abce | ab, be | abce | abce |
| d2,d4,d5 | 3 | de, cde | de | cde | cde |
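The itemset table can be reproduced with a short brute-force sketch. The minimum support of 3 is an assumption read off the table (the slide never states it), and the names `docs`, `MIN_SUP`, `support` and `show` are illustrative only.

```python
from itertools import combinations

# Dataset from the slide: document id -> items it contains.
docs = {"d1": "abce", "d2": "cde", "d3": "abce",
        "d4": "acde", "d5": "abcde", "d6": "bcd"}
MIN_SUP = 3  # assumed: the smallest support that appears in the table


def support(itemset):
    """Number of documents that contain every item of `itemset`."""
    return sum(1 for items in docs.values() if itemset <= set(items))


# Enumerate every candidate itemset (fine for five items, exponential in general).
items = sorted(set("".join(docs.values())))
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        s = support(frozenset(combo))
        if s >= MIN_SUP:
            frequent[frozenset(combo)] = s

# Closed: no proper superset has the same support.
closed = {p for p, s in frequent.items()
          if not any(p < q and frequent[q] == s for q in frequent)}
# Maximal: no proper superset is frequent at all.
maximal = {p for p in frequent if not any(p < q for q in frequent)}


def show(patterns):
    """Render itemsets as sorted strings such as 'abce'."""
    return sorted("".join(sorted(p)) for p in patterns)


print(len(frequent), show(closed), show(maximal))
```

Checking closedness against frequent supersets only is enough here, because a superset with the same support as a frequent itemset is itself frequent.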
![Page 21: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/21.jpg)
Graph Dataset

[Table: four graph transactions (ids 1 to 4), molecule-like graphs over the atoms C, S, N and O, each with weight 1.]
![Page 22: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/22.jpg)
Graph Coresets

Coreset of a set P with respect to some problem
A small subset that approximates the original set P. Solving the problem for the coreset provides an approximate solution for the problem on P.

δ-tolerance Closed Graph
A graph g is δ-tolerance closed if none of its proper frequent supergraphs has a weighted support ≥ (1 − δ) · support(g).

- Maximal graph: 1-tolerance closed graph
- Closed graph: 0-tolerance closed graph
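The δ-tolerance definition can be checked on the itemset example from the earlier slides, with itemsets standing in for graphs and subset standing in for subgraph (both are simplifying assumptions, as are the names `table`, `supports` and `tolerance_closed`). A minimal sketch:

```python
from fractions import Fraction

# Frequent itemsets and supports, copied from the itemset mining table.
table = {6: "c", 5: "e ce", 4: "a ac ae ace b bc d cd",
         3: "ab abc abe be bce abce de cde"}
supports = {frozenset(p): s for s, row in table.items() for p in row.split()}


def is_delta_tolerance_closed(pattern, supports, delta):
    """True if no proper frequent superset has support >= (1 - delta) * support(pattern)."""
    threshold = (1 - delta) * supports[pattern]
    return not any(pattern < other and s >= threshold
                   for other, s in supports.items())


def tolerance_closed(delta):
    """Sorted string form of all delta-tolerance closed patterns."""
    return sorted("".join(sorted(p)) for p in supports
                  if is_delta_tolerance_closed(p, supports, delta))


print(tolerance_closed(0))               # the closed itemsets
print(tolerance_closed(Fraction(1, 5)))  # an intermediate tolerance
print(tolerance_closed(1))               # the maximal itemsets
```

`Fraction` is used for the intermediate tolerance so that the borderline comparison is exact rather than subject to floating-point rounding.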
![Page 24: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/24.jpg)
Graph Coresets

Relative support of a closed graph
The support of a graph minus the relative support of its closed supergraphs. Equivalently, the relative support of a graph plus the sum of the relative supports of its closed supergraphs equals its own support.

(s, δ)-coreset for the problem of computing closed graphs
A weighted multiset of frequent δ-tolerance closed graphs with minimum support s, using their relative support as a weight.
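A small sketch of the relative support computation, again using itemsets as a stand-in for graphs. It assumes no support threshold (every closed itemset is kept); with a threshold the weights differ. In this unthresholded case the nonzero relative supports are exactly the multiplicities of the original transactions. All names are illustrative.

```python
from collections import Counter
from itertools import combinations

transactions = ["abce", "cde", "abce", "acde", "abcde", "bcd"]
sets = [frozenset(t) for t in transactions]

# Every closed itemset is the intersection of some non-empty group of transactions.
closed = set()
for size in range(1, len(sets) + 1):
    for group in combinations(sets, size):
        closed.add(frozenset.intersection(*group))
closed.discard(frozenset())

support = {g: sum(1 for t in sets if g <= t) for g in closed}

# Process larger patterns first so every closed superset already has its weight.
relative = {}
for g in sorted(closed, key=len, reverse=True):
    relative[g] = support[g] - sum(w for c, w in relative.items() if g < c)

for g, w in relative.items():
    if w:
        print("".join(sorted(g)), w)
```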
![Page 27: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/27.jpg)
Graph Coresets

| Graph (atoms; the drawings are not recoverable) | Relative Support | Support |
|---|---|---|
| C, C, S, N | 3 | 3 |
| C, S, N, O | 3 | 3 |
| C, S, N | 3 | 3 |

Table: Example of a coreset with minimum support 50% and δ = 1
![Page 28: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/28.jpg)
Graph Coresets
Figure: Number of graphs in a (40%,δ )-coreset for NCI.
![Page 30: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/30.jpg)
INCGRAPHMINER

INCGRAPHMINER(D, min_sup)

Input: A graph dataset D, and min_sup.
Output: The frequent graph set G.

    G ← ∅
    for every batch b_t of graphs in D
        do C ← CORESET(b_t, min_sup)
           G ← CORESET(G ∪ C, min_sup)
    return G
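The incremental loop can be sketched in Python under heavy simplifications: itemsets replace graphs, subset replaces subgraph isomorphism, `min_sup` is treated as an absolute weight, and `coreset` is a brute-force toy (exponential in the number of distinct patterns), not the CORESET procedure used in the talk. All function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def coreset(weighted, min_sup):
    """Toy CORESET: closed itemsets with weighted support >= min_sup,
    each weighted by its relative support."""
    patterns = list(weighted)
    closed = set()
    for size in range(1, len(patterns) + 1):
        for group in combinations(patterns, size):
            closed.add(frozenset.intersection(*group))
    closed.discard(frozenset())
    support = {g: sum(w for p, w in weighted.items() if g <= p) for g in closed}
    result = {}
    for g in sorted(closed, key=len, reverse=True):  # supersets first
        if support[g] >= min_sup:
            rel = support[g] - sum(w for c, w in result.items() if g < c)
            if rel != 0:
                result[g] = rel
    return result


def inc_graph_miner(batches, min_sup):
    """G <- CORESET(G ∪ C) for each batch, as in the pseudocode above."""
    G = {}
    for batch in batches:
        C = coreset(Counter(frozenset(t) for t in batch), min_sup)
        merged = Counter(G)
        merged.update(C)  # multiset union: weights add up
        G = coreset(merged, min_sup)
    return G


batches = [["abce", "cde", "abce"], ["acde", "abcde", "bcd"]]
mined = inc_graph_miner(batches, min_sup=1)
for g, w in sorted(mined.items(), key=lambda kv: sorted(kv[0])):
    print("".join(sorted(g)), w)
```

With `min_sup=1` nothing is pruned, so the result is just the weighted multiset of all transactions seen so far. Larger thresholds prune per batch, which is where the approximation enters.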
![Page 31: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/31.jpg)
WINGRAPHMINER

WINGRAPHMINER(D, W, min_sup)

Input: A graph dataset D, a window size W and min_sup.
Output: The frequent graph set G.

    G ← ∅
    for every batch b_t of graphs in D
        do C ← CORESET(b_t, min_sup)
           Store C in sliding window
           if sliding window is full
               then R ← Oldest C stored in sliding window, negate all support values
               else R ← ∅
           G ← CORESET(G ∪ C ∪ R, min_sup)
    return G
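The window bookkeeping (store each batch coreset, and cancel the oldest one by adding it back with negated supports) can be sketched as follows. Graphs are represented by plain strings, `toy_coreset` is a deliberately trivial stand-in that only filters by total weight, and "full" is interpreted as "holds more than `window_size` batches". These are all assumptions made for illustration.

```python
from collections import Counter, deque


def toy_coreset(weighted, min_sup):
    """Stand-in for CORESET: keep patterns whose total weight reaches min_sup."""
    return {p: w for p, w in weighted.items() if w >= min_sup}


def win_graph_miner(batches, window_size, min_sup, coreset=toy_coreset):
    G = {}
    window = deque()  # batch coresets currently inside the sliding window
    for batch in batches:
        C = coreset(Counter(batch), min_sup)
        window.append(C)
        R = {}
        if len(window) > window_size:
            oldest = window.popleft()
            R = {p: -w for p, w in oldest.items()}  # negate all support values
        merged = Counter(G)
        for part in (C, R):
            for p, w in part.items():
                merged[p] += w
        G = coreset(merged, min_sup)
    return G


batches = [["CCSN", "CSN"], ["CCSN"], ["CSN", "CS"], ["CS"]]
result = win_graph_miner(batches, window_size=2, min_sup=1)
print(result)
```

After the fourth batch only the last two batches contribute, so the patterns that occurred solely in the first two batches have been cancelled out.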
![Page 32: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/32.jpg)
Algorithm ADaptive Sliding WINdow

Example

W = 101010110111111
W0 = 1

ADWIN: ADAPTIVE WINDOWING ALGORITHM

    Initialize Window W
    for each t > 0
        do W ← W ∪ {x_t} (i.e., add x_t to the head of W)
           repeat Drop elements from the tail of W
           until |µW0 − µW1| < εc holds for every split of W into W = W0 · W1
           Output µW
![Page 41: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/41.jpg)
Algorithm ADaptive Sliding WINdow

Example (continued)

W = 101010110111111
W0 = 101010110, W1 = 111111

For this split, |µW0 − µW1| ≥ εc: change detected! ADWIN then drops elements from the tail of W, and the window becomes W = 01010110111111.
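The windowing loop can be sketched naively in Python. This version keeps every element and rescans all splits on each step, so it does not have the logarithmic memory and time of the bucket-based ADWIN. The slides do not give a formula for εc; the threshold below is the Hoeffding-style bound from the ADWIN paper (harmonic mean m of the two sub-window sizes, δ' = δ/n), used here as an assumption. The class name `NaiveAdwin` is hypothetical.

```python
import math
from collections import deque


class NaiveAdwin:
    """Naive adaptive window: O(n) work per split scan, for illustration only."""

    def __init__(self, delta=0.01):
        self.delta = delta
        self.window = deque()  # left end = tail (oldest), right end = head (newest)

    def _change_detected(self):
        n = len(self.window)
        if n < 2:
            return False
        total = sum(self.window)
        delta_prime = self.delta / n
        older_sum = 0.0
        for i, x in enumerate(self.window, start=1):
            if i == n:
                break  # W1 must not be empty
            older_sum += x
            n0, n1 = i, n - i                      # W0 = oldest i items, W1 = the rest
            mean0 = older_sum / n0
            mean1 = (total - older_sum) / n1
            m = 1.0 / (1.0 / n0 + 1.0 / n1)        # harmonic-mean term
            eps_cut = math.sqrt(math.log(4.0 / delta_prime) / (2.0 * m))
            if abs(mean0 - mean1) >= eps_cut:
                return True
        return False

    def add(self, x):
        """Add x at the head, then drop from the tail while some split shows change."""
        self.window.append(x)
        shrunk = False
        while self._change_detected():
            self.window.popleft()
            shrunk = True
        return shrunk

    def mean(self):
        return sum(self.window) / len(self.window)


adwin = NaiveAdwin(delta=0.01)
for _ in range(300):
    adwin.add(0.0)
length_before_change = len(adwin.window)
for _ in range(300):
    adwin.add(1.0)
print(length_before_change, len(adwin.window), round(adwin.mean(), 3))
```

On a constant stream the window only grows; once the mean shifts, old elements are dropped until the two halves of every split look alike again.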
![Page 44: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/44.jpg)
Algorithm ADaptive Sliding WINdow

Theorem
At every time step we have:

1. (False positive rate bound). If µt remains constant within W, the probability that ADWIN shrinks the window at this step is at most δ.
2. (False negative rate bound). Suppose that for some partition of W in two parts W0W1 (where W1 contains the most recent items) we have |µW0 − µW1| > 2εc. Then with probability 1 − δ ADWIN shrinks W to W1, or shorter.

ADWIN tunes itself to the data stream at hand, with no need for the user to hardwire or precompute parameters.
![Page 45: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/45.jpg)
Algorithm ADaptive Sliding WINdow

ADWIN, using a Data Stream Sliding Window Model:
- can provide the exact counts of 1's in O(1) time per point
- tries O(log W) cutpoints
- uses $O(\frac{1}{\varepsilon} \log W)$ memory words
- the processing time per example is O(log W) (amortized and worst-case)

Sliding Window Model

| | 1010101 | 101 | 11 | 1 | 1 |
|---|---|---|---|---|---|
| Content | 4 | 2 | 2 | 1 | 1 |
| Capacity | 7 | 3 | 2 | 1 | 1 |
![Page 46: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/46.jpg)
ADAGRAPHMINER

ADAGRAPHMINER(D, Mode, min_sup)

    G ← ∅
    Init ADWIN
    for every batch b_t of graphs in D
        do C ← CORESET(b_t, min_sup)
           R ← ∅
           if Mode is Sliding Window
               then Store C in sliding window
                    if ADWIN detected change
                        then R ← Batches to remove in sliding window with negative support
           G ← CORESET(G ∪ C ∪ R, min_sup)
           if Mode is Sliding Window
               then Insert # closed graphs into ADWIN
               else for every g in G update g's ADWIN
    return G
![Page 50: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/50.jpg)
Experimental Evaluation

ChemDB dataset
- Public dataset
- 4 million molecules
- Institute for Genomics and Bioinformatics at the University of California, Irvine

Open NCI Database
- Public domain
- 250,000 structures
- National Cancer Institute
![Page 51: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/51.jpg)
What is MOA?

{M}assive {O}nline {A}nalysis is a framework for online learning from data streams.

- It is closely related to WEKA
- It includes a collection of offline and online methods as well as tools for evaluation:
  - classification
  - clustering
- Easy to extend
- Easy to design and run experiments
![Page 52: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/52.jpg)
WEKA: the bird
![Page 53: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/53.jpg)
MOA: the bird

The Moa (another native NZ bird) is not only flightless, like the Weka, but also extinct.
![Page 56: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/56.jpg)
Classification Experimental Setting
![Page 57: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/57.jpg)
Classification Experimental Setting

Classifiers
- Naive Bayes
- Decision stumps
- Hoeffding Tree
- Hoeffding Option Tree
- Bagging and Boosting
- ADWIN Bagging and Leveraging Bagging

Prediction strategies
- Majority class
- Naive Bayes Leaves
- Adaptive Hybrid
![Page 58: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/58.jpg)
Clustering Experimental Setting
![Page 59: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/59.jpg)
Clustering Experimental Setting

Clusterers
- StreamKM++
- CluStream
- ClusTree
- Den-Stream
- CobWeb
![Page 60: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/60.jpg)
Clustering Experimental Setting

| Internal measures | External measures |
|---|---|
| Gamma | Rand statistic |
| C Index | Jaccard coefficient |
| Point-Biserial | Fowlkes and Mallows Index |
| Log Likelihood | Hubert Γ statistics |
| Dunn's Index | Minkowski score |
| Tau | Purity |
| Tau A | van Dongen criterion |
| Tau C | V-measure |
| Somer's Gamma | Completeness |
| Ratio of Repetition | Homogeneity |
| Modified Ratio of Repetition | Variation of information |
| Adjusted Ratio of Clustering | Mutual information |
| Fagan's Index | Class-based entropy |
| Deviation Index | Cluster-based entropy |
| Z-Score Index | Precision |
| D Index | Recall |
| Silhouette coefficient | F-measure |

Table: Internal and external clustering evaluation measures.
![Page 61: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/61.jpg)
Cluster Mapping Measure

Hardy Kremer, Philipp Kranen, Timm Jansen, Thomas Seidl, Albert Bifet, Geoff Holmes and Bernhard Pfahringer. An Effective Evaluation Measure for Clustering on Evolving Data Streams. KDD'11

CMM: Cluster Mapping Measure
A novel evaluation measure for stream clustering on evolving streams

$$\mathrm{CMM}(C, CL) = 1 - \frac{\sum_{o \in F} w(o) \cdot \mathrm{pen}(o, C)}{\sum_{o \in F} w(o) \cdot \mathrm{con}(o, Cl(o))}$$
![Page 62: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/62.jpg)
Extensions of MOA

- Multi-label Classification
- Active Learning
- Regression
- Closed Frequent Graph Mining
- Twitter Sentiment Analysis

Challenges for bigger data streams
Sampling and distributed systems (Map-Reduce, Hadoop, S4)
![Page 63: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/63.jpg)
Open NCI dataset

[Figure: Time, NCI Dataset. x-axis: Instances (10,000 to 250,000); y-axis: Seconds (0 to 120). Series: IncGraphMiner, IncGraphMiner-C, MoSS, closeGraph.]
![Page 64: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/64.jpg)
Open NCI dataset

[Figure: Memory, NCI Dataset. x-axis: Instances (10,000 to 250,000); y-axis: Megabytes (0 to 9000). Series: IncGraphMiner, IncGraphMiner-C, MoSS, closeGraph.]
![Page 65: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/65.jpg)
ChemDB dataset

[Figure: Memory, ChemDB Dataset. x-axis: Instances (10,000 to 3,920,000); y-axis: Megabytes (0 to 50000). Series: IncGraphMiner, IncGraphMiner-C, MoSS, closeGraph.]
![Page 66: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/66.jpg)
ChemDB dataset

[Figure: Time, ChemDB Dataset. x-axis: Instances (10,000 to 3,920,000); y-axis: Seconds (0 to 5000). Series: IncGraphMiner, IncGraphMiner-C, MoSS, closeGraph.]
![Page 67: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/67.jpg)
ADAGRAPHMINER

[Figure: x-axis: Instances (10,000 to 960,000); y-axis: Number of Closed Graphs (0 to 60). Series: ADAGRAPHMINER, ADAGRAPHMINER-Window, IncGraphMiner.]
![Page 69: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/69.jpg)
Summary

Mining Evolving Graph Data Streams
Given a data stream D of graphs, find frequent closed graphs.

[Figure: three example graph transactions (ids 1, 2, 3), molecule-like graphs over the atoms C, S, N and O.]

We provide three algorithms, of increasing power:
- Incremental
- Sliding Window
- Adaptive
![Page 70: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/70.jpg)
Summary

{M}assive {O}nline {A}nalysis is a framework for online learning from data streams.

http://moa.cs.waikato.ac.nz

- It is closely related to WEKA
- It includes a collection of offline and online methods as well as tools for evaluation:
  - classification
  - clustering
  - frequent pattern mining
- MOA deals with evolving data streams
- MOA is easy to use and extend
![Page 71: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/71.jpg)
Future work

Structured massive data mining
- sampling, sketching
- parallel techniques
![Page 72: Mining Frequent Closed Graphs on Evolving Data Streams](https://reader033.vdocument.in/reader033/viewer/2022052618/554c83b0b4c905df3c8b500a/html5/thumbnails/72.jpg)
Thanks!