Post on 18-Dec-2015

Our New Progress on Frequent/Sequential Pattern Mining

We develop new frequent/sequential pattern mining methods

Performance studies on both synthetic and real data sets show that our methods outperform conventional ones by wide margins

                                 Our new methods        Conventional methods
Frequent pattern mining          FP-growth              Apriori, TreeProjection
Sequential pattern mining        PrefixSpan, FreeSpan   GSP
Frequent closed pattern mining   CLOSET                 A-close, CHARM

Mining Complete Set of Frequent Patterns on T10I4D100k

[Figure: runtime (seconds, 0 to 140) vs. support threshold (0.00% to 0.15%) for Apriori, TreeProjection, and FP-growth]
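To make concrete what "mining the complete set of frequent patterns" at a given support threshold means, here is a minimal level-wise (Apriori-style) sketch in Python. It is an illustration only, not the implementation benchmarked on these slides; the function name `frequent_itemsets`, the toy database, and the use of an absolute support count instead of a percentage are assumptions of this sketch.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support count} for every itemset with count >= min_support."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)

    k = 2
    while current:
        # Candidate generation: join frequent (k-1)-itemsets and prune any
        # candidate that has an infrequent (k-1)-subset.
        prev = set(current)
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k and all(
                    frozenset(sub) in prev for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)

        # Support counting: one pass over the transactions per level.
        current = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count >= min_support:
                current[c] = count
        result.update(current)
        k += 1
    return result


# Hypothetical toy database of five transactions.
toy_db = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
patterns = frequent_itemsets(toy_db, min_support=3)
```

With a minimum support count of 3, the three single items and the three pairs are frequent, while {a, b, c} (support 2) is not. Lowering the threshold enlarges the candidate sets at every level, which is the regime the runtime charts explore.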

Mining Complete Set of Frequent Patterns on T25I20D100k

[Figure: runtime (seconds, 0 to 200) vs. support threshold (0.00% to 1.50%) for Apriori, TreeProjection, and FP-growth]

Mining Complete Set of Frequent Patterns on Connect-4

[Figure: runtime (seconds, 0 to 400) vs. support threshold (70% to 95%) for Apriori, TreeProjection, and FP-growth]

Mining Sequential Patterns on C10T4S16I4

[Figure: runtime (seconds, 0 to 800) vs. support threshold (0.00% to 2.00%) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2]

Mining Sequential Patterns on C10T8S8I8

[Figure: runtime (seconds, 0 to 200) vs. support threshold (0.00% to 2.00%) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2]

Scalability of Mining Sequential Patterns on C10-100T8S8I8

[Figure: runtime (seconds, 0 to 800) vs. number of sequences (0 to 100000) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2]

Scalability of Mining Sequential Patterns on C10-100T4S16I4

[Figure: runtime (seconds, 0 to 1600) vs. number of sequences (0 to 100000) for PrefixSpan-1, PrefixSpan-2, GSP, and FreeSpan-2]

Why Is PrefixSpan Faster Than GSP?

[Figure, two panels: Dataset C10T4S16I4 and Dataset C10T8S8I8. Each panel plots "# cand/pattern in GSP" and "Runtime/proj. db in PrefixSpan" against support threshold (0.00% to 2.00%); y-axis ticks run from 0.001 to 100 in powers of ten]
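The comparison above sets the number of candidates GSP generates per pattern against the cost PrefixSpan pays per projected database. The following Python sketch shows the prefix-projection idea in its simplest form: patterns are grown one item at a time from projected suffixes, and no candidate sequences are generated. It is a simplified illustration, not the authors' implementation. It assumes every sequence is a list of single, sortable items (no itemset elements), and the names `prefixspan` and `toy_sequences` are hypothetical.

```python
def prefixspan(sequences, min_support):
    """Return a list of (pattern, support) for all frequent sequential patterns.

    Simplification (an assumption of this sketch): each sequence is a list of
    single items, so itemset elements inside a sequence are not handled.
    """
    results = []

    def mine(prefix, projected):
        # projected holds (sequence index, start offset) pairs, i.e. the suffix
        # of each sequence that follows the first occurrence of the prefix.
        support = {}
        for seq_id, start in projected:
            seen = set()
            for item in sequences[seq_id][start:]:
                if item not in seen:
                    seen.add(item)
                    support[item] = support.get(item, 0) + 1

        for item in sorted(support):
            count = support[item]
            if count < min_support:
                continue
            pattern = prefix + [item]
            results.append((tuple(pattern), count))

            # Build the projected database for the extended prefix.
            next_projected = []
            for seq_id, start in projected:
                seq = sequences[seq_id]
                for pos in range(start, len(seq)):
                    if seq[pos] == item:
                        next_projected.append((seq_id, pos + 1))
                        break
            mine(pattern, next_projected)

    mine([], [(i, 0) for i in range(len(sequences))])
    return results


# Hypothetical toy sequence database.
toy_sequences = [
    ["a", "b", "c"],
    ["a", "c"],
    ["b", "a", "c"],
    ["a", "b"],
]
found = dict(prefixspan(toy_sequences, min_support=3))
```

On this toy input the frequent patterns are (a), (b), (c), and (a, c). Each recursive call scans only the suffixes in its projected database, so the work shrinks as the prefix grows.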

Mining Frequent Closed Itemsets on T25I20D100k

[Figure: runtime (seconds, 0 to 100) vs. support threshold (0.7% to 1.5%) for A-CLOSE, CLOSET, and ChARM]
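For readers unfamiliar with the term, a frequent itemset is closed when no proper superset has the same support. The Python sketch below applies that definition as a post-filter over an already mined set of frequent itemsets. It only illustrates the definition and is not how CLOSET, A-close, or ChARM compute their results; the names `closed_itemsets` and `toy_frequent` and the support counts are hypothetical.

```python
def closed_itemsets(frequent):
    """Keep only the itemsets that have no proper superset of equal support.

    `frequent` maps frozenset itemsets to support counts and is assumed to
    contain every frequent itemset for the chosen threshold.
    """
    closed = {}
    for itemset, count in frequent.items():
        has_equal_superset = any(
            itemset < other and count == other_count
            for other, other_count in frequent.items()
        )
        if not has_equal_superset:
            closed[itemset] = count
    return closed


# Hypothetical frequent itemsets, e.g. from the transactions
# {a, b}, {a, b}, {a, c}, {c} with a minimum support count of 2.
toy_frequent = {
    frozenset("a"): 3,
    frozenset("b"): 2,
    frozenset("ab"): 2,
    frozenset("c"): 2,
}
toy_closed = closed_itemsets(toy_frequent)
```

Here {b} is dropped because {a, b} has the same support, so three closed itemsets carry the same support information as the four frequent ones. On dense data sets the reduction is typically much larger.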

Mining Frequent Closed Itemsets on Connect-4

[Figure: runtime (seconds, y-axis ticks 1 to 10000 in powers of ten) vs. support threshold (40% to 100%) for A-CLOSE, CLOSET, and ChARM]

Mining Frequent Closed Itemsets on Pumsb

[Figure: runtime (seconds, 0 to 300) vs. support threshold (75% to 95%) for A-CLOSE, CLOSET, and ChARM]

References

R. Agarwal, C. Aggarwal, and V. V. V. Prasad. A tree projection algorithm for generation of frequent itemsets. In Journal of Parallel and Distributed Computing (Special Issue on High Performance Data Mining), (to appear), 2000.

R. Agrawal and R. Srikant. Fast algorithms for mining association rules. In Proc. 1994 Int. Conf. Very Large Data Bases, pages 487--499, Santiago, Chile, September 1994.

J. Han, J. Pei, B. Mortazavi-Asl, Q. Chen, U. Dayal, and M. Hsu. FreeSpan: Frequent pattern-projected sequential pattern mining. In Proc. KDD'2000, Boston, August 2000.

J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proc. SIGMOD'2000, Dallas, TX, May 2000.

J. Pei, J. Han, H. Pinto, Q. Chen, U. Dayal, and M. Hsu. PrefixSpan: Mining sequential patterns efficiently by prefix-projected pattern growth. Submitted for publication.

R. Srikant and R. Agrawal. Mining sequential patterns: Generalizations and performance improvements. In Proc. 5th Int. Conf. Extending Database Technology (EDBT), pages 3--17, Avignon, France, March 1996.

N. Pasquier, Y. Bastide, R. Taouil, and L. Lakhal. Discovering frequent closed itemsets for association rules. In Proc. ICDT'99, Israel, January 1999.

M. J. Zaki and C. Hsiao. ChARM: An efficient algorithm for closed association rule mining. In Proc. KDD'2000, Boston, August 2000.

DBMiner Version 2.5 (Beta)

DBMiner Technology Inc., B.C., Canada

What we had for DBMiner 2.0…

Association module on data cubes

Classification module on data cubes

Clustering module on data cubes

OLAP browser

3D Cube browser

What we will do in DBMiner 2.5…

Keep the existing association module and classification module in version 2.0

Change the existing clustering module

Add new visual classification module, both on SQL Server and OLAP

Add new sequential pattern modules on SQL Server using FP algorithm

What we have done…

We have incorporated the existing association module and added the OLAP browser module

We have added the visual classification module

We have changed the existing clustering module

We have added the sequential pattern module

We are still in the development stage

Association module on data cubes

New sequential pattern module on SQL Server

New visual classification module on data cubes

New clustering module on data cubes