First-Order Rule Learning


Page 1:

First-Order Rule Learning

Page 2:

Sequential Covering (I)

Learning consists of iteratively learning rules that cover as-yet-uncovered training instances.

Assume the existence of a Learn_one_Rule function:
Input: a set of training instances
Output: a single high-accuracy (not necessarily high-coverage) rule

Page 3:

Sequential Covering (II)

Algorithm Sequential_Covering(Instances)
  Learned_rules ← {}
  Rule ← Learn_one_Rule(Instances)
  While Quality(Rule, Instances) > Threshold Do
    Learned_rules ← Learned_rules + Rule
    Instances ← Instances - {instances correctly classified by Rule}
    Rule ← Learn_one_Rule(Instances)
  Sort Learned_rules by Quality over Instances
  # Quality is a user-defined rule quality evaluation function
  Return Learned_rules
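The loop above can be sketched directly in Python. This is a minimal sketch, not from the slides: a rule is treated abstractly as a callable mapping an instance to a predicted label, and learn_one_rule and quality are supplied by the caller (for example, a CN2-style learner as on the next slides).

def sequential_covering(instances, labels, learn_one_rule, quality, threshold):
    # Iteratively learn one rule, remove the instances it classifies
    # correctly, and repeat while the learned rule is good enough.
    all_instances, all_labels = list(instances), list(labels)
    learned_rules = []
    rule = learn_one_rule(instances, labels)
    while quality(rule, instances, labels) > threshold:
        learned_rules.append(rule)
        # Drop the instances the new rule classifies correctly.
        remaining = [(x, y) for x, y in zip(instances, labels) if rule(x) != y]
        instances = [x for x, _ in remaining]
        labels = [y for _, y in remaining]
        rule = learn_one_rule(instances, labels)
    # Sort by quality ("over Instances" in the pseudocode; the full training
    # set is used here) so that better rules are tried first when classifying.
    learned_rules.sort(key=lambda r: quality(r, all_instances, all_labels),
                       reverse=True)
    return learned_rules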

Page 4:

CN2 (I)

Algorithm Learn_one_Rule_CN2(Instances, k)
  Best_hypo ← ∅
  Candidate_hypo ← {Best_hypo}
  While Candidate_hypo ≠ ∅ Do
    All_constraints ← {(a = v): a is an attribute and v is a value of a found in Instances}
    New_candidate_hypo ←
      For each h ∈ Candidate_hypo
        For each c ∈ All_constraints, specialize h by adding c
    Remove from New_candidate_hypo any hypotheses that are duplicates, inconsistent, or not maximally specific
    For all h ∈ New_candidate_hypo
      If Quality_CN2(h, Instances) > Quality_CN2(Best_hypo, Instances)
        Best_hypo ← h
    Candidate_hypo ← the k best members of New_candidate_hypo, as per Quality_CN2
  Return a rule of the form "IF Best_hypo THEN Pred"
  # Pred = the most frequent value of the target attribute among the instances that match Best_hypo

Page 5:

CN2 (II)

Algorithm Quality_CN2(h, Instances)
  h_instances ← {i ∈ Instances: i matches h}
  Return -Entropy(h_instances)
  # Entropy is computed with respect to the target attribute

Note that CN2 performs a general-to-specific beam search, keeping not the single best candidate at each step, but a list of the k best candidates
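The two procedures above can be put together in a short Python sketch. It is only an illustration of the idea, assuming instances are dicts mapping attribute names to values and labels is a parallel list of target values; the function and variable names are mine, not part of CN2.

import math
from collections import Counter

def entropy(labels):
    # Entropy of the target-attribute distribution over a set of instances.
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def quality_cn2(hypo, instances, labels):
    # Quality_CN2: negated entropy of the instances matching the hypothesis.
    matched = [y for x, y in zip(instances, labels)
               if all(x[a] == v for a, v in hypo)]
    if not matched:
        return float("-inf")   # a hypothesis matching nothing is useless
    return -entropy(matched)

def learn_one_rule_cn2(instances, labels, k):
    all_constraints = {(a, v) for x in instances for a, v in x.items()}
    best_hypo = frozenset()          # the empty hypothesis matches everything
    candidate_hypos = {best_hypo}
    while candidate_hypos:
        new_candidates = set()
        for h in candidate_hypos:
            for c in all_constraints:
                s = h | {c}
                # Keep only specializations that add a genuinely new constraint
                # and do not constrain the same attribute to two values.
                attrs = [a for a, _ in s]
                if len(s) == len(h) + 1 and len(attrs) == len(set(attrs)):
                    new_candidates.add(s)
        for h in new_candidates:
            if quality_cn2(h, instances, labels) > quality_cn2(best_hypo, instances, labels):
                best_hypo = h
        # Beam: keep only the k best specializations for the next round.
        candidate_hypos = set(sorted(new_candidates,
                                     key=lambda h: quality_cn2(h, instances, labels),
                                     reverse=True)[:k])
    matched = [y for x, y in zip(instances, labels)
               if all(x[a] == v for a, v in best_hypo)]
    prediction = Counter(matched).most_common(1)[0][0]
    return dict(best_hypo), prediction

On the loan table of the next slide, with k = 2 this should return ({'Income Level': 'Low'}, 'HIGH'), i.e. the rule found in the first pass of the walkthrough.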

Page 6:

Illustrative Training Set
Risk Assessment for Loan Applications

Client # Credit History Debt Level Collateral Income Level RISK LEVEL

1 Bad High None Low HIGH

2 Unknown High None Medium HIGH

3 Unknown Low None Medium MODERATE

4 Unknown Low None Low HIGH

5 Unknown Low None High LOW

6 Unknown Low Adequate High LOW

7 Bad Low None Low HIGH

8 Bad Low Adequate High MODERATE

9 Good Low None High LOW

10 Good High Adequate High LOW

11 Good High None Low HIGH

12 Good High None Medium MODERATE

13 Good High None High LOW

14 Bad High None Medium HIGH

Page 7:

CN2 Example (I)


First pass:

Full instance set

2-best1 (the k = 2 best candidates at depth 1): « Income Level = Low » (4-0-0), « Income Level = High » (0-1-5); each triple counts the matching instances per class (HIGH-MODERATE-LOW)

Can't do better than (4-0-0): a pure class distribution already has entropy 0, the best possible quality

Best_hypo: « Income Level = Low »

First rule:

IF Income Level = Low THEN HIGH

Page 8:

CN2 Example (II)


Second pass:

Instances 2-3, 5-6, 8-10, 12-14

2-best1: « Income Level = High » (0-1-5), « Credit History = Good » (0-1-3)

Best_hypo: « Income Level = High »

2-best2: « Income Level = High AND Credit History = Good » (0-0-3), « Income level = High AND Collateral = None » (0-0-3)

Best_hypo: « Income Level = High AND Credit History = Good »

Can’t do better than (0-0-3)

Second rule:

IF Income Level = High AND Credit History = Good THEN LOW

Page 9:

CN2 Example (III)


Third pass:

Instances 2-3, 5-6, 8, 12, 14

2-best1: « Credit History = Good » (0-1-0), « Debt level = High » (2-1-0)

Best_hypo: « Credit History = Good »

Can’t do better than (0-1-0)

Third rule:

IF Credit History = Good THEN MODERATE

Page 10:

CN2 Example (IV)


Fourth pass:

Instances 2-3, 5-6, 8, 14

2-best1: « Debt level = High » (2-0-0), « Income Level = Medium » (2-1-0)

Best_hypo: « Debt Level = High »

Can’t do better than (2-0-0)

Fourth rule:

IF Debt Level = High THEN HIGH

Page 11:

CN2 Example (V)


Fifth pass:

Instances 3, 5-6, 8

2-best1: « Credit History = Bad » (0-1-0), « Income Level = Medium » (0-1-0)

Best_hypo: «  Credit History = Bad »

Can’t do better than (0-1-0)

Fifth rule:

IF Credit History = Bad THEN MODERATE

Page 12:

CN2 Example (VI)


Sixth pass:

Instances 3, 5-6

2-best1: « Income Level = High » (0-0-2), « Collateral = Adequate » (0-0-1)

Best_hypo: «  Income Level = High  »

Can’t do better than (0-0-2)

Sixth rule:

IF Income Level = High THEN LOW

Page 13:

CN2 Example (VII)


Seventh pass:

Instance 3

2-best1: « Credit History = Unknown » (0-1-0), « Debt level = Low » (0-1-0)

Best_hypo: « Credit History = Unknown »

Can’t do better than (0-1-0)

Seventh rule:

IF Credit History = Unknown THEN MODERATE

Page 14:

CN2 Example (VIII)

Quality: -Entropy, where Entropy = -Σ pi log2(pi), summed over the target classes

Rule 1: (4-0-0) - Rank 1
Rule 2: (0-0-3) - Rank 2
Rule 3: (1-1-3) - Rank 5
Rule 4: (4-1-2) - Rank 6
Rule 5: (3-1-0) - Rank 4
Rule 6: (0-1-5) - Rank 3
Rule 7: (2-1-2) - Rank 7
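The ranking can be checked numerically. A small Python sketch (the helper name quality is mine) recomputes Quality_CN2 = -Entropy from each rule's (HIGH-MODERATE-LOW) counts and sorts by decreasing quality:

import math

counts = {
    "Rule 1": (4, 0, 0), "Rule 2": (0, 0, 3), "Rule 3": (1, 1, 3),
    "Rule 4": (4, 1, 2), "Rule 5": (3, 1, 0), "Rule 6": (0, 1, 5),
    "Rule 7": (2, 1, 2),
}

def quality(c):
    # -Entropy of the class distribution given by the count triple.
    total = sum(c)
    return sum((n / total) * math.log2(n / total) for n in c if n > 0)

for rank, (name, c) in enumerate(sorted(counts.items(),
                                        key=lambda kv: quality(kv[1]),
                                        reverse=True), start=1):
    print(f"{name}: quality {quality(c):.3f} -> rank {rank}")

The entropies come out as 0 for Rules 1 and 2, about 0.65 for Rule 6, 0.81 for Rule 5, 1.37 for Rule 3, 1.38 for Rule 4 and 1.52 for Rule 7, which reproduces the ranks above.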

Page 15:

CN2 Example (IX)

The learned rules, sorted by decreasing quality (rank order from the previous slide):

IF Income Level = Low THEN HIGH
IF Income Level = High AND Credit History = Good THEN LOW
IF Income Level = High THEN LOW
IF Credit History = Bad THEN MODERATE
IF Credit History = Good THEN MODERATE
IF Debt Level = High THEN HIGH
IF Credit History = Unknown THEN MODERATE

Page 16:

Limitations of AVL (I)

(AVL: attribute-value languages, the representation used so far)

Consider the MONK1 problem:

6 attributes:
A1: 1, 2, 3
A2: 1, 2, 3
A3: 1, 2
A4: 1, 2, 3
A5: 1, 2, 3, 4
A6: 1, 2

2 classes: 0, 1

Target concept: If (A1 = A2 or A5 = 1) then Class = 1

Page 17:

Limitations of AVL (II)

Can you build a decision tree for this concept?

Page 18:

Limitations of AVL (III)

Can you build a rule set for this concept?

If A1=1 and A2=1 then Class=1
If A1=2 and A2=2 then Class=1
If A1=3 and A2=3 then Class=1
If A5=1 then Class=1
Otherwise Class=0
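The contrast can be made concrete with a small Python check (a sketch, not part of the original slides): the attribute-value rule set can only express the relational condition A1 = A2 by enumerating every equal pair.

from itertools import product

def target(a1, a2, a5):
    # The MONK1 target concept.
    return 1 if (a1 == a2 or a5 == 1) else 0

def rule_set(a1, a2, a5):
    # The enumerated attribute-value rule set above.
    if a1 == 1 and a2 == 1: return 1
    if a1 == 2 and a2 == 2: return 1
    if a1 == 3 and a2 == 3: return 1
    if a5 == 1: return 1
    return 0

# The two agree on every combination of the relevant attributes A1, A2, A5
# (A3, A4 and A6 are irrelevant to the concept).
assert all(target(a1, a2, a5) == rule_set(a1, a2, a5)
           for a1, a2, a5 in product([1, 2, 3], [1, 2, 3], [1, 2, 3, 4]))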

Page 19:

First-order Language

Supports first-order concepts, so relations between attributes are accounted for in a natural way

For simplicity, restrict attention to Horn clauses:
A clause is any disjunction of literals whose variables are universally quantified
Horn clauses have a single non-negated literal:
  H ∨ ¬L1 ∨ … ∨ ¬Ln, usually written as the rule H ← L1 ∧ … ∧ Ln

Page 20:

FOIL (I)

Algorithm FOIL(Target_predicate, Predicates, Examples)
  Pos ← those Examples for which Target_predicate is true
  Neg ← those Examples for which Target_predicate is false
  Learned_rules ← {}
  While Pos ≠ ∅ Do
    New_rule ← the rule that predicts Target_predicate with no precondition
    New_rule_neg ← Neg
    While New_rule_neg ≠ ∅ Do
      Candidate_literals ← GenCandidateLit(New_rule, Predicates)
      Best_literal ← argmax L ∈ Candidate_literals of FoilGain(L, New_rule)
      Add Best_literal to New_rule's preconditions
      New_rule_neg ← the subset of New_rule_neg that satisfies New_rule's preconditions
    Learned_rules ← Learned_rules + New_rule
    Pos ← Pos - {members of Pos covered by New_rule}
  Return Learned_rules

Page 21:

FOIL (II)

Algorithm GenCandidateLit(Rule, Predicates)
  Let Rule be P(x1, …, xk) ← L1, …, Ln
  Return all literals of the form:
    Q(v1, …, vr), where Q is any predicate in Predicates and the vi's are either new variables or variables already present in Rule, with the constraint that at least one of the vi's must already be a variable in Rule
    Equal(xj, xk), where xj and xk are variables already present in Rule
    the negations of all of the above forms of literals
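A rough Python sketch of this generator, under illustrative assumptions of mine: a rule is summarized by the set of variables it already uses, Predicates maps predicate names to arities, negation is marked with a leading "~", and a single fresh variable is introduced per literal.

from itertools import product

def gen_candidate_literals(rule_vars, predicates, n_new=1):
    # Each argument is either an existing rule variable or one of n_new fresh
    # variables, and at least one argument must be an existing variable.
    new_vars = [f"v{i}" for i in range(n_new)]
    candidates = []
    for pred, arity in predicates.items():
        for args in product(list(rule_vars) + new_vars, repeat=arity):
            if any(a in rule_vars for a in args):
                candidates.append((pred, args))
    # Equality literals between variables already in the rule.
    vars_list = sorted(rule_vars)
    for i, a in enumerate(vars_list):
        for b in vars_list[i + 1:]:
            candidates.append(("Equal", (a, b)))
    # Negations of all of the above.
    candidates += [("~" + pred, args) for pred, args in candidates]
    return candidates

# For GrandDaughter(x, y) with Father/2 and Female/1 this yields the Father,
# Female and Equal literals listed on the Illustration (III) slide (plus
# reflexive ones such as Father(x, x)); the fresh variable is named v0 here
# rather than z.
print(gen_candidate_literals({"x", "y"}, {"Father": 2, "Female": 1}))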

Page 22:

FOIL (III)

Algorithm FoilGain(L, Rule)
  Return t * ( log2( p1 / (p1 + n1) ) - log2( p0 / (p0 + n0) ) )

  where
    p0 is the number of positive bindings of Rule
    n0 is the number of negative bindings of Rule
    p1 is the number of positive bindings of Rule + L
    n1 is the number of negative bindings of Rule + L
    t is the number of positive bindings of Rule that are still covered after adding L to Rule
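FoilGain is easy to compute directly; the sketch below uses the binding counts reported in the GrandDaughter illustration that follows. The guard for p1 = 0 is how the cases the slides score as 0 (Illustrations IV and VII) come out here, since log2(0) is undefined.

import math

def foil_gain(p0, n0, p1, n1, t):
    if p1 == 0:
        return 0.0   # the specialization covers no positive bindings
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# GrandDaughter(x, y) <- Father(y, z):                     log2(16/12) ~ 0.415
print(foil_gain(p0=1, n0=15, p1=1, n1=11, t=1))
# ... <- Father(y, z), Father(z, x):                       ~ 2.585
print(foil_gain(p0=1, n0=11, p1=1, n1=1, t=1))
# ... <- Father(y, z), Father(z, x), Female(y):            = 1
print(foil_gain(p0=1, n0=1, p1=1, n1=0, t=1))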

Page 23:

Illustration (I)

Consider the data:
  GrandDaughter(Victor, Sharon)
  Father(Sharon, Bob)
  Father(Tom, Bob)
  Female(Sharon)
  Father(Bob, Victor)

Target concept: GrandDaughter(x, y)

Closed-world assumption

Page 24:

Illustration (II)

Training set:

Positive examples:
  GrandDaughter(Victor, Sharon)

Negative examples (all other pairs, by the closed-world assumption):
  GrandDaughter(Victor, Victor), GrandDaughter(Victor, Bob), GrandDaughter(Victor, Tom),
  GrandDaughter(Sharon, Victor), GrandDaughter(Sharon, Sharon), GrandDaughter(Sharon, Bob),
  GrandDaughter(Sharon, Tom), GrandDaughter(Bob, Victor), GrandDaughter(Bob, Sharon),
  GrandDaughter(Bob, Bob), GrandDaughter(Bob, Tom), GrandDaughter(Tom, Victor),
  GrandDaughter(Tom, Sharon), GrandDaughter(Tom, Bob), GrandDaughter(Tom, Tom)
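A small sketch of how the closed-world assumption produces this training set from the five facts (representation and names are mine): every GrandDaughter pair over the four known people that is not asserted is treated as a negative example.

from itertools import product

people = ["Victor", "Sharon", "Bob", "Tom"]
positives = {("Victor", "Sharon")}

# All unasserted GrandDaughter(a, b) facts become negative examples.
negatives = [("GrandDaughter", a, b)
             for a, b in product(people, repeat=2)
             if (a, b) not in positives]

print(len(negatives))   # 15 of the 16 possible (a, b) pairs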

Page 25:

Illustration (III)

Most general rule:
  GrandDaughter(x, y) ←

Specializations:
  Father(x, y), Father(x, z), Father(y, x), Father(y, z), Father(z, x), Father(z, y),
  Female(x), Female(y), Equal(x, y),
  and the negations of each of the above

Page 26:

Illustration (IV)

Consider the 1st specialization:
  GrandDaughter(x, y) ← Father(x, y)

16 possible bindings: {x/Victor, y/Victor}, {x/Victor, y/Sharon}, …, {x/Tom, y/Tom}

FoilGain:
  p0 = 1 (x/Victor, y/Sharon)
  n0 = 15
  p1 = 0
  n1 = 16
  t = 0
So GainFoil(1st specialization) = 0

Page 27:

Illustration (V)

Consider the 4th specialization:
  GrandDaughter(x, y) ← Father(y, z)

64 possible bindings: {x/Victor, y/Victor, z/Victor}, {x/Victor, y/Victor, z/Sharon}, …, {x/Tom, y/Tom, z/Tom}

FoilGain:
  p0 = 1 (x/Victor, y/Sharon)
  n0 = 15
  p1 = 1 (x/Victor, y/Sharon, z/Bob)
  n1 = 11: (x/Victor, y/Bob, z/Victor), (x/Victor, y/Tom, z/Bob), (x/Sharon, y/Bob, z/Victor), (x/Sharon, y/Tom, z/Bob), (x/Bob, y/Tom, z/Bob), (x/Bob, y/Sharon, z/Bob), (x/Tom, y/Sharon, z/Bob), (x/Tom, y/Bob, z/Victor), (x/Sharon, y/Sharon, z/Bob), (x/Bob, y/Bob, z/Victor), (x/Tom, y/Tom, z/Bob)
  t = 1
So GainFoil(4th specialization) = 0.415

Page 28:

Illustration (VI)

Assume the 4th specialization is indeed selected

Partial rule: GrandDaughter(x, y) ← Father(y, z)
Still covers 11 negative examples

New set of candidate literals:
  All of the previous ones
  Female(z), Equal(x, z), Equal(y, z), Father(z, w), Father(w, z)
  Negations of each of the above

Page 29:

Illustration (VII)

Consider the specialization:
  GrandDaughter(x, y) ← Father(y, z), Equal(x, z)

64 possible bindings: {x/Victor, y/Victor, z/Victor}, {x/Victor, y/Victor, z/Sharon}, …, {x/Tom, y/Tom, z/Tom}

FoilGain:
  p0 = 1 (x/Victor, y/Sharon, z/Bob)
  n0 = 11
  p1 = 0
  n1 = 3: (x/Victor, y/Bob, z/Victor), (x/Bob, y/Tom, z/Bob), (x/Bob, y/Sharon, z/Bob)
  t = 0
So GainFoil(specialization) = 0

Page 30:

Illustration (VIII)

Consider the specialization:
  GrandDaughter(x, y) ← Father(y, z), Father(z, x)

64 possible bindings: {x/Victor, y/Victor, z/Victor}, {x/Victor, y/Victor, z/Sharon}, …, {x/Tom, y/Tom, z/Tom}

FoilGain:
  p0 = 1 (x/Victor, y/Sharon, z/Bob)
  n0 = 11
  p1 = 1 (x/Victor, y/Sharon, z/Bob)
  n1 = 1 (x/Victor, y/Tom, z/Bob)
  t = 1
So GainFoil(specialization) = 2.585

Page 31:

Illustration (IX)

Assume that specialization is indeed selected

Partial rule: GrandDaughter(x, y) ← Father(y, z), Father(z, x)
Still covers 1 negative example

No new set of candidate literals: use all of the previous ones

Page 32:

Illustration (X)

Consider the specialization:
  GrandDaughter(x, y) ← Father(y, z), Father(z, x), Female(y)

64 possible bindings: {x/Victor, y/Victor, z/Victor}, {x/Victor, y/Victor, z/Sharon}, …, {x/Tom, y/Tom, z/Tom}

FoilGain:
  p0 = 1 (x/Victor, y/Sharon, z/Bob)
  n0 = 1
  p1 = 1 (x/Victor, y/Sharon, z/Bob)
  n1 = 0
  t = 1
So GainFoil(specialization) = 1

Page 33:

Illustration (XI)

No negative examples are covered and all positive examples are covered

So we get the final, correct rule:

GrandDaughter(x, y) ← Father(y, z), Father(z, x), Female(y)
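As a closing check, a short Python sketch (representation is mine, not from the slides) verifies that this rule covers exactly the positive example over the given facts.

from itertools import product

people = ["Victor", "Sharon", "Bob", "Tom"]
father = {("Sharon", "Bob"), ("Tom", "Bob"), ("Bob", "Victor")}
female = {"Sharon"}
positives = {("Victor", "Sharon")}

def rule_covers(x, y):
    # GrandDaughter(x, y) <- Father(y, z), Father(z, x), Female(y):
    # some z must make all three body literals true.
    return any((y, z) in father and (z, x) in father and y in female
               for z in people)

covered = {(x, y) for x, y in product(people, repeat=2) if rule_covers(x, y)}
print(covered == positives)   # True: only the positive example is covered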