First-Order Rule Learning


Page 1:

First-Order Rule Learning

Page 2:

Sequential Covering (I)

Learning consists of iteratively learning rules that cover as-yet-uncovered training instances.

Assume the existence of a Learn_one_Rule function:
Input: a set of training instances
Output: a single high-accuracy (not necessarily high-coverage) rule

Page 3:

Sequential Covering (II)

Algorithm Sequential_Covering(Instances)
  Learned_rules ← {}
  Rule ← Learn_one_Rule(Instances)
  While Quality(Rule, Instances) > Threshold Do
    Learned_rules ← Learned_rules + Rule
    Instances ← Instances - {instances correctly classified by Rule}
    Rule ← Learn_one_Rule(Instances)
  Sort Learned_rules by Quality over Instances
  # Quality is a user-defined rule quality evaluation function
  Return Learned_rules
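The loop above can be sketched directly in Python. This is a minimal sketch, not from the slides: a rule is treated abstractly as a callable mapping an instance to a predicted label, and learn_one_rule and quality are supplied by the caller (for example, a CN2-style learner as on the next slides).

def sequential_covering(instances, labels, learn_one_rule, quality, threshold):
    # Iteratively learn one rule, remove the instances it classifies
    # correctly, and repeat while the learned rule is good enough.
    all_instances, all_labels = list(instances), list(labels)
    learned_rules = []
    rule = learn_one_rule(instances, labels)
    while quality(rule, instances, labels) > threshold:
        learned_rules.append(rule)
        # Drop the instances the new rule classifies correctly.
        remaining = [(x, y) for x, y in zip(instances, labels) if rule(x) != y]
        instances = [x for x, _ in remaining]
        labels = [y for _, y in remaining]
        rule = learn_one_rule(instances, labels)
    # Sort by quality ("over Instances" in the pseudocode; the full training
    # set is used here) so that better rules are tried first when classifying.
    learned_rules.sort(key=lambda r: quality(r, all_instances, all_labels),
                       reverse=True)
    return learned_rules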

Page 4:

CN2 (I)

Algorithm Learn_one_Rule_CN2(Instances, k)
  Best_hypo ← ∅
  Candidate_hypo ← {Best_hypo}
  While Candidate_hypo ≠ ∅ Do
    All_constraints ← {(a = v): a is an attribute and v is a value of a found in Instances}
    New_candidate_hypo ←
      For each h ∈ Candidate_hypo
        For each c ∈ All_constraints, specialize h by adding c
    Remove from New_candidate_hypo any hypotheses that are duplicates, inconsistent, or not maximally specific
    For all h ∈ New_candidate_hypo
      If Quality_CN2(h, Instances) > Quality_CN2(Best_hypo, Instances)
        Best_hypo ← h
    Candidate_hypo ← the k best members of New_candidate_hypo, as per Quality_CN2
  Return a rule of the form "IF Best_hypo THEN Pred"
  # Pred = the most frequent value of the target attribute among the instances that match Best_hypo

Page 5:

CN2 (II)

Algorithm Quality_CN2(h, Instances)
  h_instances ← {i ∈ Instances: i matches h}
  Return -Entropy(h_instances)
  # Entropy is computed with respect to the target attribute

Note that CN2 performs a general-to-specific beam search, keeping not the single best candidate at each step, but a list of the k best candidates
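The two procedures above can be put together in a short Python sketch. It is only an illustration of the idea, assuming instances are dicts mapping attribute names to values and labels is a parallel list of target values; the function and variable names are mine, not part of CN2.

import math
from collections import Counter

def entropy(labels):
    # Entropy of the target-attribute distribution over a set of instances.
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def quality_cn2(hypo, instances, labels):
    # Quality_CN2: negated entropy of the instances matching the hypothesis.
    matched = [y for x, y in zip(instances, labels)
               if all(x[a] == v for a, v in hypo)]
    if not matched:
        return float("-inf")   # a hypothesis matching nothing is useless
    return -entropy(matched)

def learn_one_rule_cn2(instances, labels, k):
    all_constraints = {(a, v) for x in instances for a, v in x.items()}
    best_hypo = frozenset()          # the empty hypothesis matches everything
    candidate_hypos = {best_hypo}
    while candidate_hypos:
        new_candidates = set()
        for h in candidate_hypos:
            for c in all_constraints:
                s = h | {c}
                # Keep only specializations that add a genuinely new constraint
                # and do not constrain the same attribute to two values.
                attrs = [a for a, _ in s]
                if len(s) == len(h) + 1 and len(attrs) == len(set(attrs)):
                    new_candidates.add(s)
        for h in new_candidates:
            if quality_cn2(h, instances, labels) > quality_cn2(best_hypo, instances, labels):
                best_hypo = h
        # Beam: keep only the k best specializations for the next round.
        candidate_hypos = set(sorted(new_candidates,
                                     key=lambda h: quality_cn2(h, instances, labels),
                                     reverse=True)[:k])
    matched = [y for x, y in zip(instances, labels)
               if all(x[a] == v for a, v in best_hypo)]
    prediction = Counter(matched).most_common(1)[0][0]
    return dict(best_hypo), prediction

On the loan table of the next slide, with k = 2 this should return ({'Income Level': 'Low'}, 'HIGH'), i.e. the rule found in the first pass of the walkthrough.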

Page 6:

Illustrative Training Set
Risk Assessment for Loan Applications

Client # Credit History Debt Level Collateral Income Level RISK LEVEL

1 Bad High None Low HIGH

2 Unknown High None Medium HIGH

3 Unknown Low None Medium MODERATE

4 Unknown Low None Low HIGH

5 Unknown Low None High LOW

6 Unknown Low Adequate High LOW

7 Bad Low None Low HIGH

8 Bad Low Adequate High MODERATE

9 Good Low None High LOW

10 Good High Adequate High LOW

11 Good High None Low HIGH

12 Good High None Medium MODERATE

13 Good High None High LOW

14 Bad High None Medium HIGH

Page 7:

CN2 Example (I)


First pass:

Full instance set

2-best1 (the k = 2 best candidates at depth 1): « Income Level = Low » (4-0-0), « Income Level = High » (0-1-5); each triple counts the matching instances per class (HIGH-MODERATE-LOW)

Can't do better than (4-0-0): a pure class distribution already has entropy 0, the best possible quality

Best_hypo: « Income Level = Low »

First rule:

IF Income Level = Low THEN HIGH

Page 8:

CN2 Example (II)


Second pass:

Instances 2-3, 5-6, 8-10, 12-14

2-best1: « Income Level = High » (0-1-5), « Credit History = Good » (0-1-3)

Best_hypo: « Income Level = High »

2-best2: « Income Level = High AND Credit History = Good » (0-0-3), « Income level = High AND Collateral = None » (0-0-3)

Best_hypo: « Income Level = High AND Credit History = Good »

Can’t do better than (0-0-3)

Second rule:

IF Income Level = High AND Credit History = Good THEN LOW

Page 9:

CN2 Example (III)


Third pass:

Instances 2-3, 5-6, 8, 12, 14

2-best1: « Credit History = Good » (0-1-0), « Debt level = High » (2-1-0)

Best_hypo: « Credit History = Good »

Can’t do better than (0-1-0)

Third rule:

IF Credit History = Good THEN MODERATE

Page 10:

CN2 Example (IV)


Fourth pass:

Instances 2-3, 5-6, 8, 14

2-best1: « Debt level = High » (2-0-0), « Income Level = Medium » (2-1-0)

Best_hypo: « Debt Level = High »

Can’t do better than (2-0-0)

Fourth rule:

IF Debt Level = High THEN HIGH

Page 11:

CN2 Example (V)


Fifth pass:

Instances 3, 5-6, 8

2-best1: « Credit History = Bad » (0-1-0), « Income Level = Medium » (0-1-0)

Best_hypo: «  Credit History = Bad »

Can’t do better than (0-1-0)

Fifth rule:

IF Credit History = Bad THEN MODERATE

Page 12:

CN2 Example (VI)


Sixth pass:

Instances 3, 5-6

2-best1: « Income Level = High » (0-0-2), « Collateral = Adequate » (0-0-1)

Best_hypo: «  Income Level = High  »

Can’t do better than (0-0-2)

Sixth rule:

IF Income Level = High THEN LOW

Page 13:

CN2 Example (VII)


Seventh pass:

Instance 3

2-best1: « Credit History = Unknown » (0-1-0), « Debt level = Low » (0-1-0)

Best_hypo: « Credit History = Unknown »

Can’t do better than (0-1-0)

Seventh rule:

IF Credit History = Unknown THEN MODERATE

Page 14:

CN2 Example (VIII)

Quality: -Entropy, where Entropy = -Σ pi log2(pi), summed over the target classes

Rule 1: (4-0-0) - Rank 1
Rule 2: (0-0-3) - Rank 2
Rule 3: (1-1-3) - Rank 5
Rule 4: (4-1-2) - Rank 6
Rule 5: (3-1-0) - Rank 4
Rule 6: (0-1-5) - Rank 3
Rule 7: (2-1-2) - Rank 7
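The ranking can be checked numerically. A small Python sketch (the helper name quality is mine) recomputes Quality_CN2 = -Entropy from each rule's (HIGH-MODERATE-LOW) counts and sorts by decreasing quality:

import math

counts = {
    "Rule 1": (4, 0, 0), "Rule 2": (0, 0, 3), "Rule 3": (1, 1, 3),
    "Rule 4": (4, 1, 2), "Rule 5": (3, 1, 0), "Rule 6": (0, 1, 5),
    "Rule 7": (2, 1, 2),
}

def quality(c):
    # -Entropy of the class distribution given by the count triple.
    total = sum(c)
    return sum((n / total) * math.log2(n / total) for n in c if n > 0)

for rank, (name, c) in enumerate(sorted(counts.items(),
                                        key=lambda kv: quality(kv[1]),
                                        reverse=True), start=1):
    print(f"{name}: quality {quality(c):.3f} -> rank {rank}")

The entropies come out as 0 for Rules 1 and 2, about 0.65 for Rule 6, 0.81 for Rule 5, 1.37 for Rule 3, 1.38 for Rule 4 and 1.52 for Rule 7, which reproduces the ranks above.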

Page 15:

CN2 Example (IX)

The learned rules, sorted by decreasing quality (rank order from the previous slide):

IF Income Level = Low THEN HIGH
IF Income Level = High AND Credit History = Good THEN LOW
IF Income Level = High THEN LOW
IF Credit History = Bad THEN MODERATE
IF Credit History = Good THEN MODERATE
IF Debt Level = High THEN HIGH
IF Credit History = Unknown THEN MODERATE

Page 16:

Limitations of AVL (I)

(AVL: attribute-value languages, the representation used so far)

Consider the MONK1 problem:

6 attributes:
A1: 1, 2, 3
A2: 1, 2, 3
A3: 1, 2
A4: 1, 2, 3
A5: 1, 2, 3, 4
A6: 1, 2

2 classes: 0, 1

Target concept: If (A1 = A2 or A5 = 1) then Class = 1

Page 17:

Limitations of AVL (II)

Can you build a decision tree for this concept?

Page 18:

Limitations of AVL (III)

Can you build a rule set for this concept?

If A1=1 and A2=1 then Class=1
If A1=2 and A2=2 then Class=1
If A1=3 and A2=3 then Class=1
If A5=1 then Class=1
Otherwise Class=0
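The contrast can be made concrete with a small Python check (a sketch, not part of the original slides): the attribute-value rule set can only express the relational condition A1 = A2 by enumerating every equal pair.

from itertools import product

def target(a1, a2, a5):
    # The MONK1 target concept.
    return 1 if (a1 == a2 or a5 == 1) else 0

def rule_set(a1, a2, a5):
    # The enumerated attribute-value rule set above.
    if a1 == 1 and a2 == 1: return 1
    if a1 == 2 and a2 == 2: return 1
    if a1 == 3 and a2 == 3: return 1
    if a5 == 1: return 1
    return 0

# The two agree on every combination of the relevant attributes A1, A2, A5
# (A3, A4 and A6 are irrelevant to the concept).
assert all(target(a1, a2, a5) == rule_set(a1, a2, a5)
           for a1, a2, a5 in product([1, 2, 3], [1, 2, 3], [1, 2, 3, 4]))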

Page 19:

First-order Language

Supports first-order concepts, so relations between attributes are accounted for in a natural way

For simplicity, restrict attention to Horn clauses:
A clause is any disjunction of literals whose variables are universally quantified
Horn clauses have a single non-negated literal:
  H ∨ ¬L1 ∨ … ∨ ¬Ln, usually written as the rule H ← L1 ∧ … ∧ Ln

Page 20:

FOIL (I)

Algorithm FOIL(Target_predicate, Predicates, Examples)
  Pos ← those Examples for which Target_predicate is true
  Neg ← those Examples for which Target_predicate is false
  Learned_rules ← {}
  While Pos ≠ ∅ Do
    New_rule ← the rule that predicts Target_predicate with no precondition
    New_rule_neg ← Neg
    While New_rule_neg ≠ ∅ Do
      Candidate_literals ← GenCandidateLit(New_rule, Predicates)
      Best_literal ← argmax L ∈ Candidate_literals of FoilGain(L, New_rule)
      Add Best_literal to New_rule's preconditions
      New_rule_neg ← the subset of New_rule_neg that satisfies New_rule's preconditions
    Learned_rules ← Learned_rules + New_rule
    Pos ← Pos - {members of Pos covered by New_rule}
  Return Learned_rules

Page 21:

FOIL (II)

Algorithm GenCandidateLit(Rule, Predicates)
  Let Rule be P(x1, …, xk) ← L1, …, Ln
  Return all literals of the form:
    Q(v1, …, vr), where Q is any predicate in Predicates and the vi's are either new variables or variables already present in Rule, with the constraint that at least one of the vi's must already be a variable in Rule
    Equal(xj, xk), where xj and xk are variables already present in Rule
    the negations of all of the above forms of literals
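A rough Python sketch of this generator, under illustrative assumptions of mine: a rule is summarized by the set of variables it already uses, Predicates maps predicate names to arities, negation is marked with a leading "~", and a single fresh variable is introduced per literal.

from itertools import product

def gen_candidate_literals(rule_vars, predicates, n_new=1):
    # Each argument is either an existing rule variable or one of n_new fresh
    # variables, and at least one argument must be an existing variable.
    new_vars = [f"v{i}" for i in range(n_new)]
    candidates = []
    for pred, arity in predicates.items():
        for args in product(list(rule_vars) + new_vars, repeat=arity):
            if any(a in rule_vars for a in args):
                candidates.append((pred, args))
    # Equality literals between variables already in the rule.
    vars_list = sorted(rule_vars)
    for i, a in enumerate(vars_list):
        for b in vars_list[i + 1:]:
            candidates.append(("Equal", (a, b)))
    # Negations of all of the above.
    candidates += [("~" + pred, args) for pred, args in candidates]
    return candidates

# For GrandDaughter(x, y) with Father/2 and Female/1 this yields the Father,
# Female and Equal literals listed on the Illustration (III) slide (plus
# reflexive ones such as Father(x, x)); the fresh variable is named v0 here
# rather than z.
print(gen_candidate_literals({"x", "y"}, {"Father": 2, "Female": 1}))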

Page 22:

FOIL (III)

Algorithm FoilGain(L, Rule)
  Return t * ( log2( p1 / (p1 + n1) ) - log2( p0 / (p0 + n0) ) )

  where
    p0 is the number of positive bindings of Rule
    n0 is the number of negative bindings of Rule
    p1 is the number of positive bindings of Rule + L
    n1 is the number of negative bindings of Rule + L
    t is the number of positive bindings of Rule that are still covered after adding L to Rule
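FoilGain is easy to compute directly; the sketch below uses the binding counts reported in the GrandDaughter illustration that follows. The guard for p1 = 0 is how the cases the slides score as 0 (Illustrations IV and VII) come out here, since log2(0) is undefined.

import math

def foil_gain(p0, n0, p1, n1, t):
    if p1 == 0:
        return 0.0   # the specialization covers no positive bindings
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# GrandDaughter(x, y) <- Father(y, z):                     log2(16/12) ~ 0.415
print(foil_gain(p0=1, n0=15, p1=1, n1=11, t=1))
# ... <- Father(y, z), Father(z, x):                       ~ 2.585
print(foil_gain(p0=1, n0=11, p1=1, n1=1, t=1))
# ... <- Father(y, z), Father(z, x), Female(y):            = 1
print(foil_gain(p0=1, n0=1, p1=1, n1=0, t=1))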

Page 23:

Illustration (I)

Consider the data:
  GrandDaughter(Victor, Sharon)
  Father(Sharon, Bob)
  Father(Tom, Bob)
  Female(Sharon)
  Father(Bob, Victor)

Target concept: GrandDaughter(x, y)

Closed-world assumption

Page 24:

Illustration (II)

Training set:

Positive examples:
  GrandDaughter(Victor, Sharon)

Negative examples (all other pairs, by the closed-world assumption):
  GrandDaughter(Victor, Victor), GrandDaughter(Victor, Bob), GrandDaughter(Victor, Tom),
  GrandDaughter(Sharon, Victor), GrandDaughter(Sharon, Sharon), GrandDaughter(Sharon, Bob),
  GrandDaughter(Sharon, Tom), GrandDaughter(Bob, Victor), GrandDaughter(Bob, Sharon),
  GrandDaughter(Bob, Bob), GrandDaughter(Bob, Tom), GrandDaughter(Tom, Victor),
  GrandDaughter(Tom, Sharon), GrandDaughter(Tom, Bob), GrandDaughter(Tom, Tom)
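A small sketch of how the closed-world assumption produces this training set from the five facts (representation and names are mine): every GrandDaughter pair over the four known people that is not asserted is treated as a negative example.

from itertools import product

people = ["Victor", "Sharon", "Bob", "Tom"]
positives = {("Victor", "Sharon")}

# All unasserted GrandDaughter(a, b) facts become negative examples.
negatives = [("GrandDaughter", a, b)
             for a, b in product(people, repeat=2)
             if (a, b) not in positives]

print(len(negatives))   # 15 of the 16 possible (a, b) pairs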

Page 25:

Illustration (III)

Most general rule:
  GrandDaughter(x, y) ←

Specializations:
  Father(x, y), Father(x, z), Father(y, x), Father(y, z), Father(z, x), Father(z, y),
  Female(x), Female(y), Equal(x, y),
  and the negations of each of the above

Page 26:

Illustration (IV)

Consider the 1st specialization:
  GrandDaughter(x, y) ← Father(x, y)

16 possible bindings: {x/Victor, y/Victor}, {x/Victor, y/Sharon}, …, {x/Tom, y/Tom}

FoilGain:
  p0 = 1 (x/Victor, y/Sharon)
  n0 = 15
  p1 = 0
  n1 = 16
  t = 0
So GainFoil(1st specialization) = 0

Page 27:

Illustration (V)

Consider the 4th specialization:
  GrandDaughter(x, y) ← Father(y, z)

64 possible bindings: {x/Victor, y/Victor, z/Victor}, {x/Victor, y/Victor, z/Sharon}, …, {x/Tom, y/Tom, z/Tom}

FoilGain:
  p0 = 1 (x/Victor, y/Sharon)
  n0 = 15
  p1 = 1 (x/Victor, y/Sharon, z/Bob)
  n1 = 11: (x/Victor, y/Bob, z/Victor), (x/Victor, y/Tom, z/Bob), (x/Sharon, y/Bob, z/Victor), (x/Sharon, y/Tom, z/Bob), (x/Bob, y/Tom, z/Bob), (x/Bob, y/Sharon, z/Bob), (x/Tom, y/Sharon, z/Bob), (x/Tom, y/Bob, z/Victor), (x/Sharon, y/Sharon, z/Bob), (x/Bob, y/Bob, z/Victor), (x/Tom, y/Tom, z/Bob)
  t = 1
So GainFoil(4th specialization) = 0.415

Page 28:

Illustration (VI)

Assume the 4th specialization is indeed selected

Partial rule: GrandDaughter(x, y) ← Father(y, z)
Still covers 11 negative examples

New set of candidate literals:
  All of the previous ones
  Female(z), Equal(x, z), Equal(y, z), Father(z, w), Father(w, z)
  Negations of each of the above

Page 29:

Illustration (VII)

Consider the specialization:
  GrandDaughter(x, y) ← Father(y, z), Equal(x, z)

64 possible bindings: {x/Victor, y/Victor, z/Victor}, {x/Victor, y/Victor, z/Sharon}, …, {x/Tom, y/Tom, z/Tom}

FoilGain:
  p0 = 1 (x/Victor, y/Sharon, z/Bob)
  n0 = 11
  p1 = 0
  n1 = 3: (x/Victor, y/Bob, z/Victor), (x/Bob, y/Tom, z/Bob), (x/Bob, y/Sharon, z/Bob)
  t = 0
So GainFoil(specialization) = 0

Page 30:

Illustration (VIII)

Consider the specialization:
  GrandDaughter(x, y) ← Father(y, z), Father(z, x)

64 possible bindings: {x/Victor, y/Victor, z/Victor}, {x/Victor, y/Victor, z/Sharon}, …, {x/Tom, y/Tom, z/Tom}

FoilGain:
  p0 = 1 (x/Victor, y/Sharon, z/Bob)
  n0 = 11
  p1 = 1 (x/Victor, y/Sharon, z/Bob)
  n1 = 1 (x/Victor, y/Tom, z/Bob)
  t = 1
So GainFoil(specialization) = 2.585

Page 31:

Illustration (IX)

Assume that specialization is indeed selected

Partial rule: GrandDaughter(x, y) ← Father(y, z), Father(z, x)
Still covers 1 negative example

No new set of candidate literals: use all of the previous ones

Page 32:

Illustration (X)

Consider the specialization:
  GrandDaughter(x, y) ← Father(y, z), Father(z, x), Female(y)

64 possible bindings: {x/Victor, y/Victor, z/Victor}, {x/Victor, y/Victor, z/Sharon}, …, {x/Tom, y/Tom, z/Tom}

FoilGain:
  p0 = 1 (x/Victor, y/Sharon, z/Bob)
  n0 = 1
  p1 = 1 (x/Victor, y/Sharon, z/Bob)
  n1 = 0
  t = 1
So GainFoil(specialization) = 1

Page 33:

Illustration (XI)

No negative examples are covered and all positive examples are covered

So we get the final, correct rule:

GrandDaughter(x, y) ← Father(y, z), Father(z, x), Female(y)
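As a closing check, a short Python sketch (representation is mine, not from the slides) verifies that this rule covers exactly the positive example over the given facts.

from itertools import product

people = ["Victor", "Sharon", "Bob", "Tom"]
father = {("Sharon", "Bob"), ("Tom", "Bob"), ("Bob", "Victor")}
female = {"Sharon"}
positives = {("Victor", "Sharon")}

def rule_covers(x, y):
    # GrandDaughter(x, y) <- Father(y, z), Father(z, x), Female(y):
    # some z must make all three body literals true.
    return any((y, z) in father and (z, x) in father and y in female
               for z in people)

covered = {(x, y) for x, y in product(people, repeat=2) if rule_covers(x, y)}
print(covered == positives)   # True: only the positive example is covered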