
Page 1: Lazy Association Classification


Lazy Associative Classification

Adriano Veloso, Wagner Meira Jr, Mohammed J. Zaki
Computer Science Dept, Federal University of Minas Gerais, Brazil
Computer Science Dept, Rensselaer Polytechnic Institute, Troy, USA

ICDM'06

Reporter: Chieh-Chang Yang
Date: 2007.03.19

Page 2: Lazy Association Classification


Outline

Introduction
Related Work
Eager Associative Classifiers
Lazy Associative Classifiers
Experiments
Conclusions

Page 3: Lazy Association Classification


Introduction

Classification is a well-studied problem, and several models have been proposed. Among these models, decision tree classifiers are particularly attractive because they are relatively fast and simple.

Decision trees perform a greedy search for rules by selecting the most promising features. Such a greedy search may prune important rules.

Page 4: Lazy Association Classification


Introduction

As an alternative model, associative classifiers first mine association rules from the training data and then use these rules to build a classifier.

Associative classifiers perform a global search for rules satisfying some quality constraints. However, this global search may generate a large number of rules.

Page 5: Lazy Association Classification


Introduction

In this paper we propose a novel lazy associative classifier, in which the computation is performed on a demand-driven basis: while generating rules, it focuses on the features that actually occur within the test instance.

We assess the performance of the lazy associative classifier and show that it outperforms both the eager associative classifier and the decision tree classifier.

Page 6: Lazy Association Classification


Related Work

Most existing work on associative classification relies on developing new algorithms to improve overall accuracy.

CBA generates a single rule-set and ranks the rules according to their confidence/support. It selects the best rule to apply to each test instance.

HARMONY uses an instance-centric rule-generation approach that ensures the inclusion of at least one rule for each training instance in the final rule-set.

CMAR uses multiple rules to perform the classification.

CPAR adopts a greedy technique to generate smaller rule-sets.

CAEP explores the concept of emerging patterns, which usually predict all classes accurately even if their populations are unbalanced.

Page 7: Lazy Association Classification


Related Work

Rule induction classifiers include RISE, RIPPER, and SLEEPER.

RISE performs a complete overfitting by considering each instance as a rule, and then generalizes the rules.

RIPPER and SLEEPER extend the "overfit and prune" paradigm; that is, they start with a large rule-set and prune it using several heuristics.

SLEEPER also associates a probability with each rule, weighting the contribution of the rule during classification.

Page 8: Lazy Association Classification


Eager Associative Classifier

Decision Trees and Decision Rules
Entropy-based Associative Classifier

Page 9: Lazy Association Classification


Decision trees and decision rules

Given any subset of training instances S, let s_i denote the number of instances with class c_i, so that |S| = Σ s_i. Then p_i = s_i / |S| denotes the probability of class c_i in S.

The entropy of S is E(S) = -Σ p_i log p_i.

For any partition of S into m subsets, with S = ∪ S_i, the split entropy is E({S_i}) = Σ (|S_i|/|S|) E(S_i).

The information gain for the split is I(S, {S_i}) = E(S) - E({S_i}).
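These definitions translate directly into code. Below is a minimal Python sketch (the function names are ours, not the paper's):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """E(S) = -sum(p_i * log2(p_i)) over the classes in S."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_entropy(partitions):
    """E({S_i}) = sum(|S_i|/|S| * E(S_i)) over the parts of a split."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * entropy(p) for p in partitions)

def information_gain(labels, partitions):
    """I(S, {S_i}) = E(S) - E({S_i})."""
    return entropy(labels) - split_entropy(partitions)

# Example: 10 instances split into two subsets by some feature.
labels = ["yes"] * 6 + ["no"] * 4
parts = [["yes"] * 5 + ["no"], ["yes"] + ["no"] * 3]
print(information_gain(labels, parts))  # ~0.26
```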

Page 10: Lazy Association Classification


Decision trees and decision rules

A decision tree is built using a greedy, recursive splitting strategy, where the best split is chosen at each internal node according to the information gain.

The splitting at a node stops when all instances are from a single class or if the size of the node falls below a minimum support threshold, called minsup.
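As a sketch of how this greedy, recursive strategy fits together (illustrative only; it reuses `information_gain` from the previous sketch, and rows are assumed to be dicts mapping feature to value):

```python
from collections import Counter, defaultdict

def build_tree(rows, labels, features, minsup):
    # Stop when the node is pure, no features remain, or the node's size
    # falls below the minimum support threshold (minsup).
    if len(set(labels)) == 1 or not features or len(rows) < minsup:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class

    def label_parts(f):
        """Group the class labels by the value each row takes on feature f."""
        groups = defaultdict(list)
        for row, y in zip(rows, labels):
            groups[row[f]].append(y)
        return list(groups.values())

    # Greedy step: split on the feature with the highest information gain.
    best = max(features, key=lambda f: information_gain(labels, label_parts(f)))
    branches = defaultdict(lambda: ([], []))
    for row, y in zip(rows, labels):
        branches[row[best]][0].append(row)
        branches[row[best]][1].append(y)
    return {"split_on": best,
            "children": {v: build_tree(rx, ry, features - {best}, minsup)
                         for v, (rx, ry) in branches.items()}}
```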

Page 11: Lazy Association Classification


Decision trees and decision rules

Page 12: Lazy Association Classification


Entropy-based Associative Classifier

We denote as class association rules (CARs) those association rules of the form X -> c, where the antecedent X is composed of feature variables and the consequent c is just a class.

CARs may be generated by a slightly modified association rule mining algorithm: each itemset must contain a class, and rule generation also follows a template in which the consequent is just a class.

CARs are ranked in decreasing order of information gain. During the testing phase, the associative classifier simply checks whether each CAR matches the test instance; the class associated with the first match is chosen.
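The testing phase is then a first-match scan over the ranked rule-set. A minimal sketch (the rules and scores below are made up for illustration; rule mining itself is omitted):

```python
# A CAR is (antecedent, class, score); items are (attribute, value) pairs.
def classify_eager(ranked_cars, instance, default):
    """Return the class of the first CAR whose antecedent X is contained
    in the test instance A (X <= A); fall back to a default class."""
    for antecedent, klass, _score in ranked_cars:
        if antecedent <= instance:  # frozenset containment
            return klass
    return default

cars = sorted([
    (frozenset({("windy", "false"), ("temperature", "cool")}), "yes", 0.42),
    (frozenset({("outlook", "sunny"), ("humidity", "high")}), "no", 0.31),
], key=lambda car: car[2], reverse=True)  # decreasing information gain

test = frozenset({("windy", "false"), ("temperature", "cool"),
                  ("outlook", "sunny"), ("humidity", "high")})
print(classify_eager(cars, test, default="yes"))  # first match -> "yes"
```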

Page 13: Lazy Association Classification


Entropy-based Associative Classifier

Page 14: Lazy Association Classification


Entropy-based Associative Classifier

Three CARs match the test instance of our example using EAC:

1. {windy=false and temperature=cool -> play=yes}
2. {outlook=sunny and humidity=high -> play=no}
3. {outlook=sunny and temperature=cool -> play=yes}

The first rule is selected. In our example, the test case is recognized by only one rule in the decision tree, while the same test case is recognized by three CARs in the associative classifier.

Page 15: Lazy Association Classification


Entropy-based Associative Classifier

The authors state and prove two theorems about the relative performance of decision trees and eager associative classifiers:

1. The rules derived from a decision tree are a subset of the CARs mined using an eager associative classifier based on information gain.

2. CARs perform no worse than decision tree rules, according to the information gain principle.

Page 16: Lazy Association Classification


Entropy-based Associative Classifier

Page 17: Lazy Association Classification


Lazy Associative Classifier

Page 18: Lazy Association Classification


Lazy Associative Classifier

By definition, both C^e_A and C^l_A (the rule-sets induced by the eager and the lazy classifier for test instance A) are composed of CARs {X -> c} in which X ⊆ A. Because D_A ⊆ D, for a given minsup, if a rule {X -> c} is frequent in D, then it must also be frequent in D_A. Since C^l_A is generated from D_A and C^e_A is generated from D (and D_A ⊆ D), it follows that C^e_A ⊆ C^l_A.
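A demand-driven sketch of the whole lazy step, under two stated assumptions: items are (attribute, value) pairs, and D_A keeps only training instances that share at least one feature with A (our reading of the paper's example); ranking by support stands in for the paper's information-gain ranking:

```python
from collections import Counter
from itertools import combinations

def classify_lazy(train_rows, train_labels, instance, minsup_frac, max_len=2):
    # Project D on A: restrict every training instance to the features
    # that also occur in the test instance, dropping empty projections.
    projected = [(row & instance, y)
                 for row, y in zip(train_rows, train_labels)
                 if row & instance]
    minsup = minsup_frac * len(projected)
    # Count the support of every candidate CAR {X -> c} with X <= A.
    support = Counter()
    for shared, y in projected:
        for k in range(1, max_len + 1):
            for antecedent in combinations(sorted(shared), k):
                support[(antecedent, y)] += 1
    frequent = {car: s for car, s in support.items() if s >= minsup}
    if not frequent:
        return None
    return max(frequent, key=frequent.get)[1]  # class of best-supported CAR
```

Because the candidate antecedents are enumerated from the projection, every rule found here satisfies X ⊆ A by construction.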

Page 19: Lazy Association Classification


Lazy Associative Classifier

Page 20: Lazy Association Classification


Lazy Associative Classifier & Eager Associative Classifier

Suppose minsup is set to 40% (|D| = 10, so a rule must occur at least 4 times in D). The set of CARs found by the eager classifier is composed of these two:

1. {windy=false and humidity=normal -> play=yes}
2. {windy=false and temperature=cool -> play=yes}

Neither CAR matches the test instance. The lazy classifier finds two CARs in D_A (where, at the same 40%, a rule need occur only twice):

1. {outlook=overcast -> play=yes}
2. {temperature=hot -> play=yes}

Page 21: Lazy Association Classification


Lazy Associative Classifier & Eager Associative Classifier

Intuitively, lazy classifiers perform better than eager classifiers because of two characteristics:

1. Missing CARs: Eager classifiers search for CARs in a large search space, which is induced by all features of the training data. While this strategy generates a large rule-set, CARs that are important to some specific test instances may be missed.

2. Highly Disjunctive Spaces: Eager classifiers generate CARs before the test instance is even known. For this reason, eager classifiers often combine small disjuncts in order to generate more general predictions. This can reduce classification performance in highly disjunctive spaces.

Page 22: Lazy Association Classification


Problems of Lazy Associative Classifier

The aforementioned discussion suggests an intuitive concept: the more CARs are generated, the better the classifier.

However, the same concept also leads to overfitting, reducing generalization and hurting classification accuracy.

Page 23: Lazy Association Classification


Problems of Lazy Associative Classifier

In fact, overfitting and high sensitivity to irrelevant features are known shortcomings of lazy classifiers.

A natural solution is to identify and discard the irrelevant features; thus, feature selection methods may be used.

In the experiments we show that our lazy classifiers were not seriously affected by overfitting, because only the best and most general CARs are used.

Page 24: Lazy Association Classification


Problems of Lazy Associative Classifier

Another disadvantage is that lazy classifiers typically require more work to classify all test instances.

However, simple caching mechanisms are very effective at decreasing this workload. The basic idea is that different test instances may induce different rule-sets, but those rule-sets may share common CARs.
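A minimal sketch of such a cache (hypothetical; the paper says only that simple caching mechanisms help, not how the cache is keyed): memoize rule statistics by antecedent, so a CAR shared by many induced rule-sets is mined only once.

```python
from functools import lru_cache

TRAIN = [  # toy training data, purely illustrative: (feature items, class)
    (frozenset({("outlook", "sunny"), ("humidity", "high")}), "no"),
    (frozenset({("outlook", "overcast"), ("windy", "false")}), "yes"),
]

@lru_cache(maxsize=None)  # the cache: keyed by the antecedent itself
def car_class_counts(antecedent):
    """Support count per class for one candidate antecedent; computed at
    most once even if many test instances induce the same antecedent."""
    counts = {}
    for items, label in TRAIN:
        if antecedent <= items:  # antecedent contained in the instance
            counts[label] = counts.get(label, 0) + 1
    return tuple(sorted(counts.items()))
```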

Page 25: Lazy Association Classification


Experimental Evaluation

In this section we show the experimental results for the evaluation of the proposed classifiers in terms of classification effectiveness and computational performance.

Our evaluation is based on a comparison against C4.5 and LazyDT decision tree classifiers. We also compare our numbers to some results from other associative classifiers, such as CPAR, CMAR, and HARMONY, and to some results from rule induction classifiers, such as RISE, RIPPER, and SLEEPER.

Page 26: Lazy Association Classification


Experimental Evaluation

We used 26 datasets from the UCI Machine Learning Repository to compare the effectiveness of the classifiers.

In all experiments we used 10-fold cross-validation.

We quantify the classification effectiveness of the classifiers through the conventional error rate.

We used the entropy method to discretize continuous attributes.

In the experiments we set minimum confidence to 50% and minsup to 1%.
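For concreteness, a sketch of this evaluation protocol (hedged: `fit` is a placeholder for any of the classifiers above, and the fold assignment is ours):

```python
import random

def cross_validated_error(rows, labels, fit, k=10, seed=0):
    """k-fold cross-validation error rate: train on k-1 folds, count
    misclassifications on the held-out fold, pool over all folds.
    fit(rows, labels) is assumed to return a classify(row) function."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        classify = fit([rows[i] for i in train], [labels[i] for i in train])
        errors += sum(classify(rows[i]) != labels[i] for i in fold)
    return errors / len(rows)
```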

Page 27: Lazy Association Classification


Comparison Between Decision Trees, Eager Classifiers, and Lazy Classifiers

Page 28: Lazy Association Classification


Comparison Between Decision Trees, Eager Classifiers, and Lazy Classifiers

Page 29: Lazy Association Classification


Comparison Between Rule Induction and Associative Classifiers

Page 30: Lazy Association Classification


Overfitting and Underfitting

Page 31: Lazy Association Classification


Execution Times

Page 32: Lazy Association Classification


Conclusions

We present an assessment of associative classification and propose improvements to it by introducing a novel lazy classifier.

An important feature of the proposed lazy classifier is its ability to deal with the small-disjuncts problem.

We also compare the proposed classifiers against three other associative classifiers and three rule induction classifiers, outperforming them in most cases.