
Page 1: Lazy Association Classification


Lazy Associative Classification

Adriano Veloso, Wagner Meira Jr, Mohammed J. Zaki
Computer Science Dept, Federal University of Minas Gerais, Brazil
Computer Science Dept, Rensselaer Polytechnic Institute, Troy, USA

ICDM'06

Reporter: Chieh-Chang Yang
Date: 2007.03.19

Page 2: Lazy Association Classification


Outline

Introduction
Related Work
Eager Associative Classifiers
Lazy Associative Classifiers
Experiments
Conclusions

Page 3: Lazy Association Classification


Introduction

Classification is a well-studied problem, and several models have been proposed. Among these models, decision tree classifiers are particularly attractive because they are relatively fast and simple.

Decision trees perform a greedy search for rules by selecting the most promising features. Such a greedy search may prune important rules.

Page 4: Lazy Association Classification


Introduction

As an alternative model, associative classifiers first mine association rules from the training data and then use these rules to build a classifier.

Associative classifiers perform a global search for rules satisfying some quality constraints. However, this global search may generate a large number of rules.

Page 5: Lazy Association Classification


Introduction

In this paper we propose a novel lazy associative classifier, in which the computation is performed on a demand-driven basis: while generating rules, it focuses on the features that actually occur within the test instance.

We assess the performance of the lazy associative classifier and show that it outperforms both the eager associative classifier and the decision tree classifier.

Page 6: Lazy Association Classification


Related Work

Most existing work on associative classification relies on developing new algorithms to improve overall accuracy.

CBA generates a single rule-set and ranks the rules according to their confidence/support. It selects the best rule to apply to each test instance.

HARMONY uses an instance-centric rule-generation approach that ensures the inclusion of at least one rule for each training instance in the final rule-set.

CMAR uses multiple rules to perform the classification.

CPAR adopts a greedy technique to generate smaller rule-sets.

CAEP explores the concept of emerging patterns, which usually predict all classes accurately even if their populations are unbalanced.

Page 7: Lazy Association Classification


Related Work

Rule induction classifiers include RISE, RIPPER, and SLEEPER.

RISE performs a complete overfitting by considering each instance as a rule, and then generalizes the rules.

RIPPER and SLEEPER extend the "overfit and prune" paradigm; that is, they start with a large rule-set and prune it using several heuristics.

SLEEPER also associates a probability with each rule, weighting the contribution of the rule during classification.

Page 8: Lazy Association Classification


Eager Associative Classifier

Decision Trees and Decision Rules
Entropy-based Associative Classifier

Page 9: Lazy Association Classification


Decision trees and decision rules

Given any subset of training instances S, let s_i denote the number of instances with class c_i, so that |S| = Σ s_i. Then p_i = s_i / |S| denotes the probability of class c_i in S.

The entropy of S is E(S) = -Σ p_i log p_i.

For any partition of S into m subsets, with S = ∪ S_i, the split entropy is E({S_i}) = Σ (|S_i|/|S|) E(S_i).

The information gain for the split is I(S, {S_i}) = E(S) - E({S_i}).
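These definitions translate directly into code. Below is a minimal Python sketch (the function names are ours, not the paper's):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """E(S) = -sum(p_i * log2(p_i)) over the classes in S."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def split_entropy(partitions):
    """E({S_i}) = sum(|S_i|/|S| * E(S_i)) over the parts of a split."""
    total = sum(len(p) for p in partitions)
    return sum(len(p) / total * entropy(p) for p in partitions)

def information_gain(labels, partitions):
    """I(S, {S_i}) = E(S) - E({S_i})."""
    return entropy(labels) - split_entropy(partitions)

# Example: 10 instances split into two subsets by some feature.
labels = ["yes"] * 6 + ["no"] * 4
parts = [["yes"] * 5 + ["no"], ["yes"] + ["no"] * 3]
print(information_gain(labels, parts))  # ~0.26
```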

Page 10: Lazy Association Classification


Decision trees and decision rules

A decision tree is built using a greedy, recursive splitting strategy, where the best split is chosen at each internal node according to the information gain.

The splitting at a node stops when all instances are from a single class or if the size of the node falls below a minimum support threshold, called minsup.
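As a sketch of how this greedy, recursive strategy fits together (illustrative only; it reuses `information_gain` from the previous sketch, and rows are assumed to be dicts mapping feature to value):

```python
from collections import Counter, defaultdict

def build_tree(rows, labels, features, minsup):
    # Stop when the node is pure, no features remain, or the node's size
    # falls below the minimum support threshold (minsup).
    if len(set(labels)) == 1 or not features or len(rows) < minsup:
        return Counter(labels).most_common(1)[0][0]  # leaf: majority class

    def label_parts(f):
        """Group the class labels by the value each row takes on feature f."""
        groups = defaultdict(list)
        for row, y in zip(rows, labels):
            groups[row[f]].append(y)
        return list(groups.values())

    # Greedy step: split on the feature with the highest information gain.
    best = max(features, key=lambda f: information_gain(labels, label_parts(f)))
    branches = defaultdict(lambda: ([], []))
    for row, y in zip(rows, labels):
        branches[row[best]][0].append(row)
        branches[row[best]][1].append(y)
    return {"split_on": best,
            "children": {v: build_tree(rx, ry, features - {best}, minsup)
                         for v, (rx, ry) in branches.items()}}
```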

Page 11: Lazy Association Classification


Decision trees and decision rules

Page 12: Lazy Association Classification


Entropy-based Associative Classifier

We denote as class association rules (CARs) those association rules of the form X -> c, where the antecedent X is composed of feature variables and the consequent c is just a class.

CARs may be generated by a slightly modified association rule mining algorithm: each itemset must contain a class, and rule generation also follows a template in which the consequent is just a class.

CARs are ranked in decreasing order of information gain. During the testing phase, the associative classifier simply checks whether each CAR matches the test instance; the class associated with the first match is chosen.
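The testing phase is then a first-match scan over the ranked rule-set. A minimal sketch (the rules and scores below are made up for illustration; rule mining itself is omitted):

```python
# A CAR is (antecedent, class, score); items are (attribute, value) pairs.
def classify_eager(ranked_cars, instance, default):
    """Return the class of the first CAR whose antecedent X is contained
    in the test instance A (X <= A); fall back to a default class."""
    for antecedent, klass, _score in ranked_cars:
        if antecedent <= instance:  # frozenset containment
            return klass
    return default

cars = sorted([
    (frozenset({("windy", "false"), ("temperature", "cool")}), "yes", 0.42),
    (frozenset({("outlook", "sunny"), ("humidity", "high")}), "no", 0.31),
], key=lambda car: car[2], reverse=True)  # decreasing information gain

test = frozenset({("windy", "false"), ("temperature", "cool"),
                  ("outlook", "sunny"), ("humidity", "high")})
print(classify_eager(cars, test, default="yes"))  # first match -> "yes"
```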

Page 13: Lazy Association Classification


Entropy-based Associative Classifier

Page 14: Lazy Association Classification


Entropy-based Associative Classifier

Three CARs match the test instance of our example using EAC:

1. {windy=false and temperature=cool -> play=yes}
2. {outlook=sunny and humidity=high -> play=no}
3. {outlook=sunny and temperature=cool -> play=yes}

The first rule is selected. In our example, the test case is recognized by only one rule in the decision tree, while the same test case is recognized by three CARs in the associative classifier.

Page 15: Lazy Association Classification


Entropy-based Associative Classifier

The authors state and prove two theorems about the relative performance of decision trees and eager associative classifiers:

1. The rules derived from a decision tree are a subset of the CARs mined using an eager associative classifier based on information gain.

2. CARs perform no worse than decision tree rules, according to the information gain principle.

Page 16: Lazy Association Classification


Entropy-based Associative Classifier

Page 17: Lazy Association Classification


Lazy Associative Classifier

Page 18: Lazy Association Classification


Lazy Associative Classifier

By definition, both C^e_A and C^l_A (the rule-sets induced by the eager and the lazy classifier for test instance A) are composed of CARs {X -> c} in which X ⊆ A. Because D_A ⊆ D, for a given minsup, if a rule {X -> c} is frequent in D, then it must also be frequent in D_A. Since C^l_A is generated from D_A and C^e_A is generated from D (and D_A ⊆ D), it follows that C^e_A ⊆ C^l_A.
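A demand-driven sketch of the whole lazy step, under two stated assumptions: items are (attribute, value) pairs, and D_A keeps only training instances that share at least one feature with A (our reading of the paper's example); ranking by support stands in for the paper's information-gain ranking:

```python
from collections import Counter
from itertools import combinations

def classify_lazy(train_rows, train_labels, instance, minsup_frac, max_len=2):
    # Project D on A: restrict every training instance to the features
    # that also occur in the test instance, dropping empty projections.
    projected = [(row & instance, y)
                 for row, y in zip(train_rows, train_labels)
                 if row & instance]
    minsup = minsup_frac * len(projected)
    # Count the support of every candidate CAR {X -> c} with X <= A.
    support = Counter()
    for shared, y in projected:
        for k in range(1, max_len + 1):
            for antecedent in combinations(sorted(shared), k):
                support[(antecedent, y)] += 1
    frequent = {car: s for car, s in support.items() if s >= minsup}
    if not frequent:
        return None
    return max(frequent, key=frequent.get)[1]  # class of best-supported CAR
```

Because the candidate antecedents are enumerated from the projection, every rule found here satisfies X ⊆ A by construction.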

Page 19: Lazy Association Classification


Lazy Associative Classifier

Page 20: Lazy Association Classification


Lazy Associative Classifier & Eager Associative Classifier

Suppose minsup is set to 40% (|D| = 10, so a rule must occur at least 4 times in D). The set of CARs found by the eager classifier is composed of these two:

1. {windy=false and humidity=normal -> play=yes}
2. {windy=false and temperature=cool -> play=yes}

Neither CAR matches the test instance. The lazy classifier finds two CARs in D_A (where, at the same 40%, a rule need occur only twice):

1. {outlook=overcast -> play=yes}
2. {temperature=hot -> play=yes}

Page 21: Lazy Association Classification


Lazy Associative Classifier & Eager Associative Classifier

Intuitively, lazy classifiers perform better than eager classifiers because of two characteristics:

1. Missing CARs: Eager classifiers search for CARs in a large search space, which is induced by all features of the training data. While this strategy generates a large rule-set, CARs that are important to some specific test instances may be missed.

2. Highly Disjunctive Spaces: Eager classifiers generate CARs before the test instance is even known. For this reason, eager classifiers often combine small disjuncts in order to generate more general predictions. This can reduce classification performance in highly disjunctive spaces.

Page 22: Lazy Association Classification


Problems of Lazy Associative Classifier

The aforementioned discussion suggests an intuitive concept: the more CARs are generated, the better the classifier.

However, the same concept also leads to overfitting, reducing generalization and hurting classification accuracy.

Page 23: Lazy Association Classification


Problems of Lazy Associative Classifier

In fact, overfitting and high sensitivity to irrelevant features are known shortcomings of lazy classifiers.

A natural solution is to identify and discard the irrelevant features; thus, feature selection methods may be used.

In the experiments we show that our lazy classifiers were not seriously affected by overfitting, because only the best and most general CARs are used.

Page 24: Lazy Association Classification


Problems of Lazy Associative Classifier

Another disadvantage is that lazy classifiers typically require more work to classify all test instances.

However, simple caching mechanisms are very effective at decreasing this workload. The basic idea is that different test instances may induce different rule-sets, but those rule-sets may share common CARs.
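A minimal sketch of such a cache (hypothetical; the paper says only that simple caching mechanisms help, not how the cache is keyed): memoize rule statistics by antecedent, so a CAR shared by many induced rule-sets is mined only once.

```python
from functools import lru_cache

TRAIN = [  # toy training data, purely illustrative: (feature items, class)
    (frozenset({("outlook", "sunny"), ("humidity", "high")}), "no"),
    (frozenset({("outlook", "overcast"), ("windy", "false")}), "yes"),
]

@lru_cache(maxsize=None)  # the cache: keyed by the antecedent itself
def car_class_counts(antecedent):
    """Support count per class for one candidate antecedent; computed at
    most once even if many test instances induce the same antecedent."""
    counts = {}
    for items, label in TRAIN:
        if antecedent <= items:  # antecedent contained in the instance
            counts[label] = counts.get(label, 0) + 1
    return tuple(sorted(counts.items()))
```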

Page 25: Lazy Association Classification


Experimental Evaluation

In this section we show the experimental results for the evaluation of the proposed classifiers in terms of classification effectiveness and computational performance.

Our evaluation is based on a comparison against C4.5 and LazyDT decision tree classifiers. We also compare our numbers to some results from other associative classifiers, such as CPAR, CMAR, and HARMONY, and to some results from rule induction classifiers, such as RISE, RIPPER, and SLEEPER.

Page 26: Lazy Association Classification


Experimental Evaluation

We used 26 datasets from the UCI Machine Learning Repository to compare the effectiveness of the classifiers.

In all experiments we used 10-fold cross-validation.

We quantify the classification effectiveness of the classifiers through the conventional error rate.

We used the entropy method to discretize continuous attributes.

In the experiments we set minimum confidence to 50% and minsup to 1%.
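For concreteness, a sketch of this evaluation protocol (hedged: `fit` is a placeholder for any of the classifiers above, and the fold assignment is ours):

```python
import random

def cross_validated_error(rows, labels, fit, k=10, seed=0):
    """k-fold cross-validation error rate: train on k-1 folds, count
    misclassifications on the held-out fold, pool over all folds.
    fit(rows, labels) is assumed to return a classify(row) function."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    errors = 0
    for fold in folds:
        held = set(fold)
        train = [i for i in idx if i not in held]
        classify = fit([rows[i] for i in train], [labels[i] for i in train])
        errors += sum(classify(rows[i]) != labels[i] for i in fold)
    return errors / len(rows)
```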

Page 27: Lazy Association Classification


Comparison Between Decision Trees, Eager Classifiers, and Lazy Classifiers

Page 28: Lazy Association Classification


Comparison Between Decision Trees, Eager Classifiers, and Lazy Classifiers

Page 29: Lazy Association Classification


Comparison Between Rule Induction and Associative Classifiers

Page 30: Lazy Association Classification


Overfitting and Underfitting

Page 31: Lazy Association Classification


Execution Times

Page 32: Lazy Association Classification


Conclusions

We present an assessment of associative classification and propose improvements to it by introducing a novel lazy classifier.

An important feature of the proposed lazy classifier is its ability to deal with the small-disjuncts problem.

We also compare the proposed classifiers against three other associative classifiers and three rule induction classifiers, outperforming them in most cases.