• Data that are not given as pre-defined feature vectors from which predictive models can be constructed.

Applications:

• Transactional database

• Sequence database

• Graph database

Frequent patterns are good candidates for discriminative features, especially for data with complicated structure.

Motivation:

Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree

Wei Fan, Kun Zhang, Hong Cheng, Jing Gao, Xifeng Yan, Jiawei Han, Philip S. Yu, Olivier Verscheure

Why Frequent Patterns?

• A non-linear conjunctive combination of single features

• Increase the expressive and discriminative power of the feature space

Examples:

• Exclusive OR problem & Solution

Data is not linearly separable in (x, y)

X  Y  XY  C
0  0  0   0
0  1  0   1
1  0  0   1
1  1  1   0

transform: (x, y) → (x, y, xy)

Data is linearly separable in (x, y, xy)

[Figure: 3D Projection Using XY]
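The XOR example can be checked with a small sketch. It searches an arbitrary, hypothetical grid of weights and biases for a separating hyperplane, first in (x, y) and then in (x, y, xy). The helper names `separates` and `find_separator` are illustrative and not from the slides. Finding nothing on the grid in 2D is consistent with XOR not being linearly separable, but it is not by itself a proof.

```python
from itertools import product

# XOR truth table from the slide: ((x, y), class C).
ROWS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def separates(weights, bias, points):
    """True if the sign of (w . x + bias) matches the class of every point."""
    return all(
        (sum(w * v for w, v in zip(weights, x)) + bias > 0) == bool(c)
        for x, c in points
    )

def find_separator(points, dim):
    """Search a small, arbitrary grid of weights and biases (an assumption)."""
    for weights in product([-2, -1, 0, 1, 2], repeat=dim):
        for bias in [-1.5, -0.5, 0.5, 1.5]:
            if separates(weights, bias, points):
                return weights, bias
    return None

points_2d = ROWS
# Add the conjunctive feature xy as a third coordinate.
points_3d = [((x, y, x * y), c) for (x, y), c in ROWS]

sep_2d = find_separator(points_2d, 2)  # nothing on the grid separates (x, y)
sep_3d = find_separator(points_3d, 3)  # some hyperplane separates (x, y, xy)
print(sep_2d, sep_3d)
```

One separating choice in the extended space is the plane x + y - 2xy = 0.5, which the grid above includes.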

Conventional Frequent Pattern-Based Classification: Two-Step Batch Method

1. Mine frequent patterns;

2. Select most discriminative patterns;

3. Represent data in the feature space using such patterns;

4. Build classification models.
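The first three steps of the batch method can be sketched as follows. This is a minimal illustration under several assumptions: a brute-force itemset miner, information gain as the selection measure, and a toy dataset with hypothetical items and labels. None of the function names come from the slides, and step 4 is left to whichever classifier is preferred.

```python
from itertools import combinations
from math import log2

def mine_frequent(transactions, min_support):
    """Step 1: brute-force all itemsets with absolute support >= min_support."""
    items = sorted(set().union(*transactions))
    frequent = []
    for size in range(1, len(items) + 1):
        level = [frozenset(c) for c in combinations(items, size)
                 if sum(frozenset(c) <= t for t in transactions) >= min_support]
        if not level:  # no frequent itemset of this size, so none larger
            break
        frequent.extend(level)
    return frequent

def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(pattern, transactions, labels):
    """Gain of splitting the data on 'contains pattern' vs. 'does not'."""
    inside = [y for t, y in zip(transactions, labels) if pattern <= t]
    outside = [y for t, y in zip(transactions, labels) if not pattern <= t]
    n = len(labels)
    return (entropy(labels)
            - len(inside) / n * entropy(inside)
            - len(outside) / n * entropy(outside))

# Toy data (hypothetical): four transactions with binary class labels.
transactions = [{"a", "b"}, {"a", "c"}, {"a", "b"}, {"c"}]
labels = [1, 0, 1, 0]

frequent = mine_frequent(transactions, min_support=2)            # step 1
ranked = sorted(frequent, reverse=True,
                key=lambda p: info_gain(p, transactions, labels))
selected = ranked[:2]                                            # step 2
features = [[int(p <= t) for p in selected] for t in transactions]  # step 3
print(selected, features)
```

The rows of `features` are what a downstream classifier would consume. Note that the miner runs to completion before any label information is used, which is the separation the later slides criticise.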

[Figure: batch pipeline. DataSet → mine → Frequent Patterns (1 to 7) → select → Mined Discriminative Patterns (1, 2, 4) → represent → feature table → any classifiers you can name, e.g. a decision tree with splits Petal.Length < 2.45 and Petal.Width < 1.75 over setosa, versicolor, virginica.]

        F1  F2  F4
Data1   1   1   0
Data2   1   0   1
Data3   1   1   0
Data4   0   0   1
………

Basic Flows: Problems of Separated Mine & Select in Batch Method

1. Mine step: Issues of scalability and combinatorial explosion

• Dilemma of setting minsupport

• Promising discriminative candidate patterns?

• Tremendous number of candidate patterns?

2. Select step: Issue of discriminative power
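The minsupport dilemma can be illustrated with a toy count. The sketch below uses a hypothetical nested dataset of five transactions and a brute-force counter. It is only meant to show how the number of frequent itemsets grows as the threshold is lowered, not to reproduce any experiment from the slides.

```python
from itertools import combinations

# Hypothetical nested transactions over the items a..e.
transactions = [set("abcde"), set("abcd"), set("abc"), set("ab"), set("a")]
items = sorted(set().union(*transactions))

def count_frequent(min_support):
    """Number of itemsets contained in at least min_support transactions."""
    return sum(
        1
        for size in range(1, len(items) + 1)
        for cand in combinations(items, size)
        if sum(set(cand) <= t for t in transactions) >= min_support
    )

counts = {s: count_frequent(s) for s in (5, 4, 3, 2, 1)}
print(counts)  # {5: 1, 4: 3, 3: 7, 2: 15, 1: 31}
```

Each step down in support roughly doubles the number of patterns here. A high threshold keeps the mining step tractable but may miss low-support patterns that are discriminative, while a low threshold surfaces them along with a very large number of other candidates.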

• 5 Datasets: UCI Machine Learning Repository

• Scalability Study:

[Figure: Log(#Pat) and Log(AbsSupport) for DT vs. MbT on Adult, Chess, Hypo, Sick, Sonar]

Datasets   #Pat using MbT sup   Ratio (MbT #Pat / #Pat using MbT sup)
Adult      252809               0.41%
Chess      +∞                   ~0%
Hypo       423439               0.0035%
Sick       4818391              0.00032%
Sonar      95507                0.00775%
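As a back-of-the-envelope reading of the table, the ratio column can be inverted to estimate how many patterns MbT returned on each dataset. The figures below are derived only from the rounded values shown above, so they are approximate and are not reported results. Chess is omitted because its pattern count is listed as +∞.

```python
# (#Pat using MbT sup, ratio in percent), copied from the table above.
rows = {
    "Adult": (252809, 0.41),
    "Hypo": (423439, 0.0035),
    "Sick": (4818391, 0.00032),
    "Sonar": (95507, 0.00775),
}

# Implied MbT #Pat = (#Pat using MbT sup) * ratio; approximate due to rounding.
implied = {name: n_pat * pct / 100 for name, (n_pat, pct) in rows.items()}

for name, value in implied.items():
    print(f"{name}: about {value:.0f} patterns")
```

On these numbers MbT keeps on the order of a thousand patterns for Adult and roughly ten to twenty for the other three datasets.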

Itemset Mining

• Accuracy of Mined Itemsets

[Figure: DT Accuracy vs. MbT Accuracy]

Graph Mining

• 11 Datasets:

  • 9 NCI anti-cancer screen datasets

    • PubChem Project.

    • Positive class: 1% - 8.3%

  • 2 AIDS anti-viral screen datasets

    • URL: http://dtp.nci.nih.gov.

    • H1: 3.5%, H2: 1%

• Scalability Study

[Figure: DT #Pat vs. MbT #Pat and Log(DT Abs Support) vs. Log(MbT Abs Support) on NCI1, NCI33, NCI41, NCI47, NCI81, NCI83, NCI109, NCI123, NCI145, H1, H2]

• Predictive Quality of Mined Frequent Subgraphs

[Figures: DT vs. MbT Accuracy; DT vs. MbT AUC; AUC of MbT, DT MbT vs. Benchmarks]

• Case Study

Motivation

Problems

Proposed Algorithm

Experiments

Divide-and-Conquer Based Frequent Pattern Mining

[Figure: a search tree over the dataset with nodes 1 to 7; "mine & select" is applied at the nodes until "Few Data" remain; output: Mined Discriminative Patterns.]

1. Mine and Select most discriminative patterns;

2. Represent data in the feature space using such patterns;

3. Build classification models.
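A divide-and-conquer "mine & select" step can be sketched as a recursive tree builder. This is a minimal illustration, not the authors' implementation. It assumes a brute-force miner limited to itemsets of size 2, information gain as the selection measure, a support threshold relative to the data at the current node, and a stop when a node is pure or has few records. The names `build_tree`, `rel_support` and `min_node_size` are hypothetical.

```python
from itertools import combinations
from math import log2

def entropy(labels):
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((labels.count(c) / n) * log2(labels.count(c) / n)
                for c in set(labels))

def info_gain(pattern, data):
    labels = [y for _, y in data]
    inside = [y for t, y in data if pattern <= t]
    outside = [y for t, y in data if not pattern <= t]
    n = len(data)
    return (entropy(labels)
            - len(inside) / n * entropy(inside)
            - len(outside) / n * entropy(outside))

def mine_frequent(data, rel_support, max_size=2):
    """Itemsets whose support is >= rel_support of the data at THIS node."""
    items = sorted(set().union(*(t for t, _ in data)))
    min_count = rel_support * len(data)
    return [frozenset(c)
            for size in range(1, max_size + 1)
            for c in combinations(items, size)
            if sum(frozenset(c) <= t for t, _ in data) >= min_count]

def build_tree(data, rel_support=0.5, min_node_size=2, selected=None):
    """Mine on the node's data, keep the best pattern, split, and recurse."""
    selected = [] if selected is None else selected
    if len(data) < min_node_size or len({y for _, y in data}) <= 1:
        return selected  # few data or pure node: stop
    candidates = mine_frequent(data, rel_support)
    if not candidates:
        return selected
    best = max(candidates, key=lambda p: info_gain(p, data))
    left = [(t, y) for t, y in data if best <= t]
    right = [(t, y) for t, y in data if not best <= t]
    if not left or not right:
        return selected  # pattern does not split the node
    selected.append(best)
    build_tree(left, rel_support, min_node_size, selected)
    build_tree(right, rel_support, min_node_size, selected)
    return selected

# Toy data (hypothetical): (transaction, class label).
data = [({"a", "b"}, 1), ({"a", "c"}, 1), ({"a", "b"}, 1), ({"a"}, 1),
        ({"c", "d"}, 1), ({"b", "d"}, 1), ({"c"}, 0), ({"b"}, 0)]

patterns = build_tree(data)
print(patterns)
```

In this toy run the item `d` appears in only 2 of 8 records, so mining the whole dataset at 50% support would not return it. It becomes frequent once the tree has split off the records containing `a`, which is the sense in which the same relative threshold reaches low global support deeper in the tree.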

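[Figure: direct pipeline. Mined Discriminative Patterns → represent → feature table → any classifiers you can name, e.g. a decision tree with splits Petal.Length < 2.45 and Petal.Width < 1.75 over setosa, versicolor, virginica.]

        F1  F2  F4
Data1   1   1   0
Data2   1   0   1
Data3   1   1   0
Data4   0   0   1
………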

Direct Mining & Selection via Model-based Search Tree

The procedure can serve as a feature miner or be used as a classifier itself.

Analyses:

1. Scalability of pattern enumeration

• Upper bound

• “Scale down” ratio

2. Bound on number of returned features

3. Subspace pattern selection

4. Non-overfitting

5. Optimality under exhaustive search

Take Home Message:

1. Highly compact and discriminative frequent patterns can be mined directly through the Model-based Search Tree without worrying about combinatorial explosion.

2. Software and datasets are available by contacting the authors.
