![Page 1: 1 Mining Tree Queries in a Graph Bart Goethals, Eveline Hoekx and Jan Van den Bussche KDD ’ 05 presentor: Ming Jing Tsai](https://reader035.vdocument.in/reader035/viewer/2022062407/56649e265503460f94b15694/html5/thumbnails/1.jpg)
1
Mining Tree Queries in a Graph
Bart Goethals, Eveline Hoekx and Jan Van den Bussche
KDD ’05
Presenter: Ming Jing Tsai
2
Introduction
Mining tree patterns T in a single graph
Trees are generated incrementally in the number of nodes
Trees are unordered and rooted
For each tree T, all conjunctive queries based on T are generated
Queries are formulated in SQL
3
Tree query pattern example
Selected nodes (constants): 0, 8
Existential node: ∃
Distinguished node: x
4
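Matching
A query Q matches in a graph G via a homomorphism h
For every edge (i, j) ∈ Q, (h(i), h(j)) ∈ G
Matchings are distinguished by the values they assign to the distinguished nodes x
Matchings that differ only in the values of existential nodes are not counted separately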
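
The counting rule above can be illustrated with a small brute-force sketch. It is not the authors' implementation (the slides say the actual system evaluates queries in SQL). The function name `frequency`, the node kinds `"X"`, `"E"`, `"C"`, the use of directed edges, and the toy graph are all assumptions made for illustration; selected nodes are assumed to map to their constant.

```python
from itertools import product

def frequency(query_edges, kinds, labels, graph_edges):
    """Count distinct images of the distinguished nodes over all matchings.

    kinds maps each query node to "X" (distinguished), "E" (existential)
    or "C" (selected, i.e. a constant given in labels). Hypothetical encoding.
    """
    q_nodes = sorted(kinds)
    g_nodes = sorted({v for edge in graph_edges for v in edge})
    edge_set = set(graph_edges)
    distinguished = [n for n in q_nodes if kinds[n] == "X"]
    answers = set()
    # Brute force: try every mapping h from query nodes to graph nodes.
    for image in product(g_nodes, repeat=len(q_nodes)):
        h = dict(zip(q_nodes, image))
        # Assumption: a selected node must be mapped to its constant.
        if any(kinds[n] == "C" and h[n] != labels[n] for n in q_nodes):
            continue
        # Homomorphism condition: every query edge maps to a graph edge.
        if all((h[i], h[j]) in edge_set for i, j in query_edges):
            answers.add(tuple(h[n] for n in distinguished))
    return len(answers)

# Toy query: x -> (existential node) -> constant 8
kinds = {"x": "X", "e": "E", "c": "C"}
labels = {"c": 8}
query_edges = [("x", "e"), ("e", "c")]
graph_edges = [(4, 1), (5, 1), (5, 2), (1, 8), (2, 8), (6, 7)]
freq = frequency(query_edges, kinds, labels, graph_edges)
```

In this toy graph, node 5 reaches 8 through two different intermediate nodes, but it contributes only once to the frequency, because only the value of x is counted.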
5
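[Figure: an example query Q, with an existential node ∃ and selected nodes 0 and 8, matched in a graph G]
Frequency = 3 (values 4, 5, 8)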
6
Generate all trees
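Trees are generated in increasing number of nodes
Each tree is canonically ordered
Level sequence: the i-th number is the depth of the i-th node in preorder
Canonical ordering: the one whose level sequence is lexicographically maximal
Example: level sequence 012212 > 012122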
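
As a minimal sketch of the canonical ordering, the function below computes the lexicographically maximal level sequence of an unordered rooted tree by recursively sorting child subtrees in descending order. The name `canonical_level_sequence` and the `children` dictionary encoding are assumptions for illustration, not taken from the paper.

```python
def canonical_level_sequence(children, root, depth=0):
    """Return the lexicographically maximal level sequence of the subtree at root.

    children maps a node to the list of its child nodes (hypothetical encoding).
    """
    # Canonical sequences of all child subtrees, one level deeper.
    subsequences = [
        canonical_level_sequence(children, child, depth + 1)
        for child in children.get(root, [])
    ]
    # Placing the largest subtree sequences first maximizes the whole sequence.
    subsequences.sort(reverse=True)
    sequence = [depth]
    for sub in subsequences:
        sequence.extend(sub)
    return sequence

# The tree from the example: the root has one child with a single child
# and another child with two children.
children = {0: [1, 2], 1: [3], 2: [4, 5]}
canonical = "".join(map(str, canonical_level_sequence(children, 0)))
```

Listing the children of the root in the other order gives the same result, which is what makes the sequence usable as a canonical form.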
7
Queries
Levelwise search
Fix a tree T, and find all queries based on T whose frequency in G is at least k
Each query Q is described by {∏, ∑, λ}
∏: existential nodes
∑: selected nodes
λ: labels of the selected nodes
9
To generate candidates efficiently, candidacy tables and frequency tables are used
10
CanTab ∏,∑
Parents
Each candidacy table CanTab ∏,∑ can be computed by taking the natural join of the frequency tables of its parents (∏’, ∑’)
Base case: CanTab ∅,{x} is the table with a single column x, holding all nodes of the graph G being mined
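
The natural-join step can be sketched in plain Python, with each table held as a list of rows (dictionaries keyed by column name). This is only an illustration under assumptions: the helper names `natural_join` and `candidacy_table`, the column names, and the sample rows are hypothetical, and the system described in the slides performs this step in SQL.

```python
def natural_join(left, right):
    """Join two tables (lists of dict rows) on all columns they share."""
    if not left or not right:
        return []
    shared = sorted(set(left[0]) & set(right[0]))
    # Index the right table by its values on the shared columns.
    index = {}
    for row in right:
        index.setdefault(tuple(row[c] for c in shared), []).append(row)
    joined = []
    for row in left:
        for match in index.get(tuple(row[c] for c in shared), []):
            joined.append({**row, **match})
    return joined

def candidacy_table(parent_freq_tables):
    """Natural join of all parent frequency tables (assumes at least one)."""
    result = parent_freq_tables[0]
    for table in parent_freq_tables[1:]:
        result = natural_join(result, table)
    return result

# Hypothetical frequency tables of three parents of a query with ∑ = {x1, x2, x3}.
freq_12 = [{"x1": 0, "x2": 8}, {"x1": 0, "x2": 5}]
freq_23 = [{"x2": 8, "x3": 4}, {"x2": 8, "x3": 7}, {"x2": 5, "x3": 4}]
freq_13 = [{"x1": 0, "x3": 4}]
candidates = candidacy_table([freq_12, freq_23, freq_13])
```

A label combination survives only if each of its projections appears in the corresponding parent frequency table, so the combination with x3 = 7 is pruned here.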
11
Example: ∏ = {x2}, ∑ = {x1, x3}
Formulate the expression and translate it to SQL
Candidacy table
Frequency table
12
Equivalent queries
Goal: avoid generating a query Q2 that is equivalent to an earlier query Q1
A containment mapping from Q1 to Q2 is a homomorphism that maps the distinguished variables of Q1 one-to-one to those of Q2, and likewise for the selected nodes
Case 1: Q1 has fewer nodes than Q2
Case 2: Q1 and Q2 have the same number of nodes
13
Case 1: redundancy checking
Q2 contains redundant subtrees such that removing them yields an equivalent query
Redundancy: a subtree C in the form of a linear chain of existential nodes, such that the parent of C has another subtree that is at least as deep as C
[Figure: example queries Q1 and Q2]
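
The redundancy condition can be checked directly on a tree. The sketch below is a literal reading of the definition on this slide, not the authors' code; the function names, the `children` dictionary, and the node kinds `"X"`, `"E"`, `"C"` are assumptions, and depth is measured here as the number of nodes on the longest downward path.

```python
def chain_length(children, kind, node):
    """Number of nodes if the subtree at node is a linear chain of
    existential nodes, otherwise None."""
    length = 0
    while True:
        if kind[node] != "E":
            return None
        length += 1
        kids = children.get(node, [])
        if not kids:
            return length
        if len(kids) > 1:
            return None
        node = kids[0]

def height(children, node):
    """Number of nodes on the longest downward path starting at node."""
    return 1 + max((height(children, c) for c in children.get(node, [])), default=0)

def has_redundant_subtree(children, kind, root):
    """True if some child subtree is an existential chain and a sibling
    subtree is at least as deep."""
    stack = [root]
    while stack:
        parent = stack.pop()
        kids = children.get(parent, [])
        for child in kids:
            n = chain_length(children, kind, child)
            if n is not None and any(
                height(children, s) >= n for s in kids if s != child
            ):
                return True
        stack.extend(kids)
    return False

# x with an existential child and a selected child: the existential child is redundant.
redundant = has_redundant_subtree({0: [1, 2]}, {0: "X", 1: "E", 2: "C"}, 0)
# An existential chain of two nodes next to a single selected node is not redundant.
not_redundant = has_redundant_subtree(
    {0: [1, 3], 1: [2]}, {0: "X", 1: "E", 2: "E", 3: "C"}, 0
)
```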
14
Case 2: canonical forms
Q1 and Q2 are isomorphic trees
Canonical forms mark each node:
Existential nodes → ∃
Selected nodes → C
Distinguished nodes → X
[Figure: node-mark pairs shown in the example: (C, ∃), (∃, C), (∃, X), (C, X), (X, C), (X, X)]
15
Experiment
Pentium 4, 2.8 GHz
1 GB main memory
Linux 2.6
C++ with embedded SQL
Relational database: DB2 UDB v8.2
16
Real datasets
A food web, a protein interaction graph, and a citation graph
k: frequency threshold
Size: maximal size of trees in the run
All runs take several hours
17
Food web
154 species dependent on Scotch Broom
Label 20 occurs in many frequent patterns: Orthotylus adenocarpi (an omnivorous plant pest)
Frequency 176
18
Protein interaction graph
1870 proteins of Saccharomyces cerevisiae, a fermenting yeast (it helps bread rise)
A small number of highly connected nodes occur
19
Citation graph
KDD Cup 2003
2500 papers on high-energy physics
350,000 cross-references
Frequency 1655
20
Synthetic data: web graphs
Tree size: 5
Minsup: 4, 10, 25
21
Uniform random graphs
Dense, uniform
Minsup: 10, 25
Edges: 47,264,997