Mining Frequent pattern in a set of graph using sub-graph Mining - gSpan with closed graph

Ankita Sambhare ([email protected])

Advisor: Dr. Carlos Rivero

Rochester Institute Of Technology

Background Research

Graph Mining Domains:
1. Frequent subgraph mining
2. Approximate graph pattern mining
3. Graph pattern summarization
4. Graph classification
5. Graph clustering
6. Graph indexing
7. Graph searching
8. Correlated graph pattern mining
9. Optimal graph pattern mining
10. Graph kernels
11. Link mining
12. Web structure mining
13. Workflow mining
14. Biological network mining

Approach

gSpan maps each graph to a DFS code, defines a lexicographic ordering on these codes, and constructs a search tree based on that ordering. The search tree is organized by edge count: each child node extends its parent's pattern by one edge.

Example

Figure 2: A simple example of patterns mined from 2 graphs

CONCLUSIONS

The results indicate that gSpan runs faster than the other branch-and-bound candidate graph generation algorithm, owing to the DFS codes it introduces.

gSpan also mines more relevant subgraph patterns than Gaston because it allows closed mining.

REFERENCES

1. X. Yan and J. Han. gSpan: Graph-based substructure pattern mining. UIUC-CS Tech. Report: R-2002-2296, (a 4-page short version in


Goal

Extract all frequently occurring patterns from a set of graphs in order to study the most common, behaviorally significant patterns shared among the graphs. The mined patterns can then be used to analyze the set of graphs further on the basis of their similarities and to identify their significance.
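To make "frequently occurring" concrete, the sketch below counts the simplest possible patterns, single labeled edges, and keeps those that meet a minimum frequency. It assumes that a pattern's frequency is the fraction of graphs in the set that contain it; the function name `frequent_edge_patterns` and the `(vertex_labels, edges)` graph representation are hypothetical choices made for this illustration, not taken from the poster.

```python
from collections import Counter

def frequent_edge_patterns(graphs, min_frequency):
    """Return {pattern: frequency} for one-edge patterns meeting the threshold.

    graphs: list of (vertex_labels, edges), where vertex_labels maps a vertex
    id to its label and edges is a list of (u, v, edge_label) tuples.
    Assumption: frequency = fraction of graphs containing the pattern.
    """
    counts = Counter()
    for vertex_labels, edges in graphs:
        seen = set()  # count each pattern at most once per graph
        for u, v, edge_label in edges:
            # Sort endpoint labels so the undirected edge has one canonical form.
            a, b = sorted((vertex_labels[u], vertex_labels[v]))
            seen.add((a, edge_label, b))
        counts.update(seen)
    total = len(graphs)
    return {p: c / total for p, c in counts.items() if c / total >= min_frequency}

g1 = ({0: "A", 1: "B", 2: "C"}, [(0, 1, "x"), (1, 2, "y")])
g2 = ({0: "B", 1: "A", 2: "D"}, [(0, 1, "x"), (1, 2, "z")])
print(frequent_edge_patterns([g1, g2], min_frequency=1.0))
```

With the two toy graphs above, only the edge pattern `('A', 'x', 'B')` occurs in both graphs, so it is the only one kept at a minimum frequency of 100%. Larger patterns would be grown from such frequent edges one edge at a time.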

RESULTS

[Figure: Output fragments per algorithm (gSpan, Gaston) at minimum frequencies of 5%, 10%, and 15%; 340 graphs (dense edges). Data labels in extraction order: 2770, 10027, 736, 1363, 401, 706.]

[Figure: Run time versus minimum frequency (5%, 10%, 15%) for gSpan and Gaston; 340 graphs (dense edges).]

[Figure: Output fragments per algorithm (gSpan, Gaston) at minimum frequencies of 5%, 10%, 15%, and 20%; 10000 graphs (sparse edges). Data labels in extraction order: 1795, -1, 460, 460, 225, 225, 126, 126.]

[Figure: Run time versus minimum frequency (5%, 10%, 15%, 20%) for gSpan and Gaston; 10000 graphs (sparse edges).]

FUTURE WORK

Build approximate graph mining on top of frequent subgraph mining. Approximation in the mined patterns is required because of the noise and diversity of the data.

Handle complex data, such as program data, where each node is a complex structure.

Algorithm

Steps:
1. DFS subscripting with rightmost extension
2. DFS codes
3. Lexicographical ordering of DFS codes
4. Minimum DFS code
5. Perform DFS on the DFS code tree
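The algorithm steps listed above (DFS subscripts, DFS codes, their lexicographic ordering, and the minimum DFS code) can be illustrated with a small brute-force sketch. It enumerates every depth-first traversal of a connected, undirected labeled graph, writes each traversal as a DFS code of `(i, j, label_i, edge_label, label_j)` tuples, and picks the smallest code. All function names here are hypothetical, and `edge_key` encodes one reading of the DFS lexicographic order from the gSpan paper (backward edges before forward edges, forward edges from deeper vertices first); treat that ordering as an assumption. gSpan itself avoids this exhaustive enumeration.

```python
def build_adjacency(vertex_labels, edges):
    """Adjacency map {u: {v: edge_label}} for an undirected graph."""
    adj = {v: {} for v in vertex_labels}
    for u, v, edge_label in edges:
        adj[u][v] = edge_label
        adj[v][u] = edge_label
    return adj

def dfs_codes(vertex_labels, edges):
    """Yield one DFS code per depth-first traversal (graph assumed connected)."""
    adj = build_adjacency(vertex_labels, edges)

    def walk(stack, index, code):
        # Backtrack past vertices whose neighbours are all discovered.
        while stack and all(w in index for w in adj[stack[-1]]):
            stack = stack[:-1]
        if not stack:
            yield tuple(code)
            return
        u = stack[-1]
        for w in adj[u]:
            if w in index:
                continue
            i = len(index)  # DFS subscript of the newly discovered vertex
            forward = (index[u], i, vertex_labels[u], adj[u][w], vertex_labels[w])
            # Backward edges from w to earlier vertices, smallest subscript first.
            back = sorted((index[x], x) for x in adj[w] if x in index and x != u)
            backward = [(i, j, vertex_labels[w], adj[w][x], vertex_labels[x])
                        for j, x in back]
            yield from walk(stack + [w], {**index, w: i}, code + [forward] + backward)

    for start in vertex_labels:
        yield from walk([start], {start: 0}, [])

def edge_key(edge):
    """Sort key for one code entry (assumed reading of the gSpan edge order)."""
    i, j, label_i, edge_label, label_j = edge
    if j < i:  # backward edge
        return (0, j, edge_label)
    return (1, -i, label_i, edge_label, label_j)  # forward edge

def min_dfs_code(vertex_labels, edges):
    """Smallest DFS code under edge_key, used as a canonical label."""
    return min(dfs_codes(vertex_labels, edges),
               key=lambda code: [edge_key(e) for e in code])

# The same triangle with two different vertex numberings.
g = ({0: "X", 1: "Y", 2: "Z"}, [(0, 1, "a"), (1, 2, "b"), (0, 2, "c")])
h = ({2: "X", 0: "Y", 1: "Z"}, [(2, 0, "a"), (0, 1, "b"), (2, 1, "c")])
print(min_dfs_code(*g) == min_dfs_code(*h))
```

Because a DFS code depends only on labels and discovery order, two isomorphic graphs produce the same set of codes and therefore the same minimum code, which is what lets the search tree discard duplicate patterns.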