Classification & Regression


Classification & Regression

Data Preprocessing

Classification & Regression

Rule-Based Classifiers

• Technique for classifying records using a collection of “if…then…” rules

• Rule set R = (r1 ∨ r2 ∨ ⋯ ∨ rk)

– The ri's are the classification rules or disjuncts


r1: Gives Birth = no ∧ Aerial Creature = yes ⟶ Birds
r2: Gives Birth = no ∧ Aquatic Creature = yes ⟶ Fishes
r3: Gives Birth = yes ∧ Body Temperature = warm-blooded ⟶ Mammals
r4: Gives Birth = no ∧ Aerial Creature = no ⟶ Reptiles
r5: Aquatic Creature = semi ⟶ Amphibians


Classification Rules

• Each rule is expressed in the following way: ri: (Condition) ⟶ yi

– Left-hand side is called rule antecedent or precondition

Condition = (A1 op v1) ∧ (A2 op v2) ∧ ⋯ ∧ (Ak op vk)

– op is a comparison operator chosen from the set {=, ≠, <, >, ≤, ≥}

– The right-hand side yi is called the rule consequent and contains the predicted class

• A rule r covers a record x if its precondition matches the attributes of x


Classification Rules


r1: Gives Birth = no ∧ Aerial Creature = yes ⟶ Birds
r2: Gives Birth = no ∧ Aquatic Creature = yes ⟶ Fishes
r3: Gives Birth = yes ∧ Body Temperature = warm-blooded ⟶ Mammals
r4: Gives Birth = no ∧ Aerial Creature = no ⟶ Reptiles
r5: Aquatic Creature = semi ⟶ Amphibians

Name         | Body Temperature | Skin Cover | Gives Birth | Aquatic Creature | Aerial Creature | Has Legs | Hibernates
Hawk         | Warm-blooded     | Feather    | No          | No               | Yes             | Yes      | No
Grizzly Bear | Warm-blooded     | Fur        | Yes         | No               | No              | Yes      | Yes

• 𝑟1 covers the first vertebrate

• Is the second instance covered?
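The covering check is easy to sketch in code. The following Python fragment is illustrative only (the dict-based encoding of rules and records is an assumption, not a standard API): a rule covers a record when every condition in its antecedent matches the record's attributes.

```python
def covers(rule, record):
    """Return True if the record satisfies every condition of the rule."""
    antecedent, _label = rule
    return all(record.get(attr) == value for attr, value in antecedent.items())

# r1 and r3 from the rule set above, encoded as (antecedent, consequent)
r1 = ({"Gives Birth": "no", "Aerial Creature": "yes"}, "Birds")
r3 = ({"Gives Birth": "yes", "Body Temperature": "warm-blooded"}, "Mammals")

hawk = {"Gives Birth": "no", "Aquatic Creature": "no",
        "Aerial Creature": "yes", "Body Temperature": "warm-blooded"}
grizzly = {"Gives Birth": "yes", "Aquatic Creature": "no",
           "Aerial Creature": "no", "Body Temperature": "warm-blooded"}

print(covers(r1, hawk))     # True: r1 covers the hawk
print(covers(r1, grizzly))  # False: r1 does not cover the grizzly bear
print(covers(r3, grizzly))  # True: but r3 does
```

So the second instance is not covered by r1, but it is covered by r3.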


Evaluating Rules


• Given a dataset D and a classification rule 𝑟: 𝐴 → 𝑦, we can evaluate it based on the following two metrics:

Coverage(r) = |A| / |D|

Accuracy(r) = |A ∩ y| / |A|

|A|: number of records that satisfy the antecedent

|A ∩ y|: number of records that satisfy both the antecedent and the consequent

|D|: total number of records
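As a sketch, the two metrics can be computed as follows; the four-record dataset here is made up for illustration and is not the weather data the slides evaluate.

```python
def evaluate(rule, dataset):
    """Return (coverage, accuracy) of a (antecedent, label) rule on a dataset."""
    antecedent, label = rule
    # A: records that satisfy the antecedent
    covered = [rec for rec in dataset
               if all(rec.get(a) == v for a, v in antecedent.items())]
    # A ∩ y: covered records whose class matches the consequent
    correct = [rec for rec in covered if rec["class"] == label]
    coverage = len(covered) / len(dataset)
    accuracy = len(correct) / len(covered) if covered else 0.0
    return coverage, accuracy

data = [
    {"Outlook": "sunny", "class": "no"},
    {"Outlook": "sunny", "class": "no"},
    {"Outlook": "rain",  "class": "yes"},
    {"Outlook": "rain",  "class": "yes"},
]
rule = ({"Outlook": "sunny"}, "no")
print(evaluate(rule, data))  # (0.5, 1.0): coverage 50%, accuracy 100%
```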


Example


Outlook = sunny ∧ Temperature ≥ 80 ⟶ no

Coverage = 50%, Accuracy = 100%


How a Rule-Based Classifier Works


• The lemur triggers r3 and is classified as a mammal

• The turtle triggers both r4 and r5, which make conflicting predictions; this conflict must be resolved

• The shark triggers no rules at all, yet we must ensure that the classifier can still make a prediction

r1: Gives Birth = no ∧ Aerial Creature = yes ⟶ Birds
r2: Gives Birth = no ∧ Aquatic Creature = yes ⟶ Fishes
r3: Gives Birth = yes ∧ Body Temperature = warm-blooded ⟶ Mammals
r4: Gives Birth = no ∧ Aerial Creature = no ⟶ Reptiles
r5: Aquatic Creature = semi ⟶ Amphibians

Name   | Body Temperature | Skin Cover | Gives Birth | Aquatic Creature | Aerial Creature | Has Legs | Hibernates
Lemur  | Warm-blooded     | Fur        | Yes         | No               | No              | Yes      | Yes
Turtle | Cold-blooded     | Scales     | No          | Semi             | No              | Yes      | No
Shark  | Cold-blooded     | Scales     | Yes         | Yes              | No              | No       | No


Important Properties

• Mutually Exclusive Rules: No two rules in a rule set R are triggered by the same record. This ensures that each record triggers at most one rule.

• Exhaustive Rules: Every record is covered by at least one rule in R.

• Together, these properties ensure that every record is covered by exactly one rule.

• If a set of rules is not exhaustive, we can assign the remaining cases to a default class (usually the majority class) using a default rule with an empty antecedent:

– rd: ( ) ⟶ yd


Important Properties

• If a set of rules is not mutually exclusive (as in the previous example), we must resolve the conflict in one of two ways:

– Ordered rules: Rules are sorted in order of their priority, and predictions are made accordingly

– Unordered rules: The consequent of each triggered rule counts as a vote for the record; votes are tallied, and the class with the most votes is assigned to the record. This scheme is less susceptible to errors from any single rule.
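The unordered (voting) scheme can be sketched as follows, reusing the vertebrate rules from earlier; the dict encoding of rules and records is an illustrative assumption.

```python
from collections import Counter

def classify_by_vote(rules, record, default="unknown"):
    """Tally one vote per triggered rule; return the majority class."""
    votes = Counter(label for antecedent, label in rules
                    if all(record.get(a) == v for a, v in antecedent.items()))
    return votes.most_common(1)[0][0] if votes else default

rules = [
    ({"Gives Birth": "no", "Aerial Creature": "yes"}, "Birds"),
    ({"Gives Birth": "no", "Aquatic Creature": "yes"}, "Fishes"),
    ({"Gives Birth": "yes", "Body Temperature": "warm-blooded"}, "Mammals"),
]
lemur = {"Gives Birth": "yes", "Body Temperature": "warm-blooded",
         "Aerial Creature": "no", "Aquatic Creature": "no"}
shark = {"Gives Birth": "yes", "Body Temperature": "cold-blooded",
         "Aerial Creature": "no", "Aquatic Creature": "yes"}

print(classify_by_vote(rules, lemur))  # Mammals
print(classify_by_vote(rules, shark))  # unknown: no rule fires, default used
```

The `default` argument plays the role of the default rule rd for non-exhaustive rule sets.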


Rule-Ordering Schemes


• Rule-based ordering

– Individual rules are ranked based on their quality

• Class-based ordering

– Rules that belong to the same class are grouped together


Building Classification Rules


• Direct Method:

– Extract rules directly from the data

– E.g., RIPPER, CN2, 1R

• Indirect Method:

– Extract rules from other classification models (e.g., decision trees, neural networks)

– E.g., C4.5rules


Direct Methods for Rule Extraction


• Sequential covering:

1. Start from an empty rule set

2. Grow a rule using the Learn-One-Rule function

3. Remove the training records covered by the rule

4. Repeat steps (2) and (3) until a stopping criterion is met
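The four steps above can be sketched as a short loop. The Learn-One-Rule stand-in below simply picks the single attribute = value test with the highest accuracy for the target class; real implementations (RIPPER, CN2) grow conjunctions greedily instead, so this is a simplified assumption.

```python
def learn_one_rule(records, target):
    """Stand-in for Learn-One-Rule: best single attribute = value test."""
    best, best_acc = None, -1.0
    for rec in records:
        for attr, value in rec.items():
            if attr == "class":
                continue
            covered = [r for r in records if r.get(attr) == value]
            acc = sum(r["class"] == target for r in covered) / len(covered)
            if acc > best_acc:
                best, best_acc = ({attr: value}, target), acc
    return best

def sequential_covering(records, target):
    rules, remaining = [], list(records)
    # keep learning rules while uncovered positive examples remain
    while any(r["class"] == target for r in remaining):
        rule = learn_one_rule(remaining, target)          # step 2
        antecedent, _ = rule
        rules.append(rule)
        remaining = [r for r in remaining                 # step 3
                     if not all(r.get(a) == v for a, v in antecedent.items())]
    return rules

data = [
    {"Aerial": "yes", "class": "bird"},
    {"Aerial": "yes", "class": "bird"},
    {"Aerial": "no",  "class": "mammal"},
]
print(sequential_covering(data, "bird"))  # [({'Aerial': 'yes'}, 'bird')]
```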


Aspects of Sequential Covering


• Rule growing

• Instance elimination

• Rule evaluation

• Stop criterion

• Rule pruning


Rule Growing



Instance Elimination


• Why eliminate instances at all?

– Otherwise the next rule learned would be identical to the previous one

• Why eliminate positive (+) instances?

– To ensure the next rule is different

• Why eliminate negative (−) instances?

– To prevent underestimating the accuracy of the rule


Rule Evaluation


Accuracy = f+ / n

Laplace = (f+ + 1) / (n + k)

m-estimate = (f+ + k · p+) / (n + k)

n: number of examples covered by the rule

f+: number of positive examples covered by the rule

k: total number of classes

p+: prior probability of the positive class
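The three estimates are one-liners; the counts below (n = 10 covered examples, f+ = 8 positives, k = 2 classes, prior p+ = 0.5) are made up for illustration.

```python
def accuracy(f_pos, n):
    """Raw accuracy f+ / n."""
    return f_pos / n

def laplace(f_pos, n, k):
    """Laplace estimate: smooths accuracy toward 1/k for small n."""
    return (f_pos + 1) / (n + k)

def m_estimate(f_pos, n, k, p_pos):
    """m-estimate: smooths accuracy toward the class prior p+."""
    return (f_pos + k * p_pos) / (n + k)

print(accuracy(8, 10))            # 0.8
print(laplace(8, 10, 2))          # 9/12 = 0.75
print(m_estimate(8, 10, 2, 0.5))  # 9/12 = 0.75
```

Note that with p+ = 1/k the m-estimate reduces to the Laplace estimate, which is why the last two values agree here.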


Rule Evaluation


Suppose the rule r: A ⟶ + covers p0 positive records and n0 negative records. After adding a new conjunct B, the extended rule r′: A ∧ B ⟶ + covers p1 positive records and n1 negative records. Then

FOIL's information gain = p1 × ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )

Note that this metric is proportional both to p1 and to p1 / (p1 + n1). It therefore assigns higher scores to rules with high support and high accuracy.
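The gain is a direct transcription of the formula; the counts below are invented for illustration.

```python
import math

def foil_gain(p0, n0, p1, n1):
    """FOIL's information gain for extending A -> + with a conjunct B."""
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# A conjunct that keeps 90 of 100 positives while cutting negatives
# from 400 down to 10 scores highly: support stays large and the
# accuracy term jumps from 0.2 to 0.9.
print(foil_gain(p0=100, n0=400, p1=90, n1=10))  # ≈ 195.3
```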


Stopping Criterion and Rule Pruning


• Stopping criterion:

– Compute the gain of the new rule

– If the gain is not significant, discard the rule

• Rule pruning:

– Done to reduce generalization error

– Remove one of the conjuncts in the rule

– Compare the error on a validation set before and after pruning

– If the error improves, prune the conjunct


Summary of Direct Method


1. Grow a single rule

2. Remove the instances covered by the rule

3. Prune the rule if needed

4. Add rule to current rule set

5. Repeat


RIPPER


• For a binary class problem:

– Choose one of the classes as the negative class

– Learn rules for the positive (typically the minority) class

– The negative class becomes the default class

• For a multi-class problem:

– Order the classes according to their frequencies

– Learn the rule set for the smallest class first, treating the rest as the negative class

– Repeat with the next smallest class as the positive class


RIPPER


• Growing a rule:

– Start from an empty rule

– Add conjuncts as long as they improve FOIL's information gain

– Stop when the rule no longer covers any negative examples

– Prune the rule based on the metric v = (p − n) / (p + n), where p is the number of positive examples and n the number of negative examples covered by the rule in the validation set

– If v improves after removing a conjunct, the conjunct is pruned

– Pruning starts from the most recently added conjunct: for a rule ABCD ⟶ y, first check whether D should be pruned, then CD, then BCD
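The pruning test can be illustrated numerically; the validation-set counts below are invented for the sketch.

```python
def v_metric(p, n):
    """RIPPER's pruning metric v = (p - n) / (p + n)."""
    return (p - n) / (p + n)

# Suppose ABCD -> y covers p = 20 positives and n = 10 negatives on the
# validation set, while the shortened rule ABC -> y covers p = 30
# positives and n = 12 negatives:
before = v_metric(20, 10)  # 10/30 ≈ 0.33
after = v_metric(30, 12)   # 18/42 ≈ 0.43
print(after > before)      # True -> conjunct D is pruned
```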


RIPPER


• Building a rule set:

– After generating a rule, remove all positive and negative examples covered by it

– Add the rule to the rule set as long as it does not violate the stopping conditions

– If a new rule increases the description length* of the rule set by more than d bits, RIPPER stops adding rules (d is 64 bits by default)

– Another stopping condition: the rule's error rate on the validation set must not exceed 50%

*Description length = the number of bits needed to encode the current rule set and its exceptions


RIPPER


• Pros:

– Very efficient (scales almost linearly with the number of records)

– Works well with imbalanced data

– Works well with noisy data, since the validation set prevents overfitting