First-Order Rule Learning
Sequential Covering (I)
Learning consists of iteratively learning rules that cover yet-uncovered training instances
Assume the existence of a Learn_one_Rule function:
  Input: a set of training instances
  Output: a single high-accuracy (not necessarily high-coverage) rule
Sequential Covering (II)
Algorithm Sequential_Covering(Instances)
  Learned_rules ← ∅
  Rule ← Learn_one_Rule(Instances)
  While Quality(Rule, Instances) > Threshold Do
    Learned_rules ← Learned_rules + Rule
    Instances ← Instances − {instances correctly classified by Rule}
    Rule ← Learn_one_Rule(Instances)
  Sort Learned_rules by Quality over Instances  # Quality is a user-defined rule quality evaluation function
  Return Learned_rules
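The covering loop above can be sketched in Python. This is a minimal illustration under simplifying assumptions (all names are mine, not from any library): here learn_one_rule greedily picks the single best « attribute = value » test by accuracy, whereas a real learner such as CN2 searches a much richer hypothesis space.

```python
# Minimal sketch of sequential covering; all names are illustrative.
# Instances are (feature_dict, label) pairs.

def learn_one_rule(instances):
    """Greedy stand-in for Learn_one_Rule: return the ((attr, val), prediction)
    single-constraint rule with the highest accuracy, plus that accuracy."""
    best, best_acc = None, -1.0
    for feats, _ in instances:
        for a, v in feats.items():
            matched = [y for f, y in instances if f.get(a) == v]
            pred = max(set(matched), key=matched.count)
            acc = matched.count(pred) / len(matched)
            if acc > best_acc:
                best, best_acc = ((a, v), pred), acc
    return best, best_acc

def sequential_covering(instances, threshold=0.9):
    learned = []
    rule, quality = learn_one_rule(instances)
    while instances and quality > threshold:
        learned.append(rule)
        (a, v), pred = rule
        # Remove the instances the new rule classifies correctly.
        instances = [(f, y) for f, y in instances
                     if not (f.get(a) == v and y == pred)]
        if not instances:
            break
        rule, quality = learn_one_rule(instances)
    return learned
```

The Quality function and threshold control when the loop stops adding rules; here accuracy plays both roles.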
CN2 (I)
Algorithm Learn_one_Rule_CN2(Instances, k)
  Best_hypo ← ∅
  Candidate_hypo ← {Best_hypo}
  While Candidate_hypo ≠ ∅ Do
    All_constraints ← {(a = v): a is an attribute and v is a value of a found in Instances}
    New_candidate_hypo ← for each h ∈ Candidate_hypo, for each c ∈ All_constraints, specialize h by adding c
    Remove from New_candidate_hypo any hypotheses that are duplicates, inconsistent, or not maximally specific
    For all h ∈ New_candidate_hypo
      If Quality_CN2(h, Instances) > Quality_CN2(Best_hypo, Instances)
        Best_hypo ← h
    Candidate_hypo ← the k best members of New_candidate_hypo as per Quality_CN2
  Return a rule of the form “IF Best_hypo THEN Pred”  # Pred = the most frequent value of the target attribute among the instances that match Best_hypo
CN2 (II)
Algorithm Quality_CN2(h, Instances)
  h_instances ← {i ∈ Instances: i matches h}
  Return −Entropy(h_instances)
  where Entropy is computed with respect to the target attribute
Note that CN2 performs a general-to-specific beam search, keeping not the single best candidate at each step, but a list of the k best candidates
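Quality_CN2 can be sketched directly from its definition: the quality of a hypothesis is the negated entropy of the target-class distribution over the instances it matches. The function name and the counts-tuple input are illustrative choices, not CN2's original implementation.

```python
import math

# Sketch of Quality_CN2, assuming a hypothesis is summarized by the counts of
# matching instances per target-class value (e.g. (4, 0, 0) for HIGH-MODERATE-LOW).

def quality_cn2(class_counts):
    total = sum(class_counts)
    if total == 0:
        return float("-inf")  # a hypothesis matching no instances is useless
    entropy = -sum((c / total) * math.log2(c / total)
                   for c in class_counts if c > 0)
    return -entropy  # quality = -Entropy: pure coverage scores highest (0.0)
```

On the loan example below, quality_cn2((4, 0, 0)) = 0.0 beats quality_cn2((0, 1, 5)) ≈ −0.65, which is why « Income Level = Low » wins the first pass.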
Illustrative Training Set
Risk Assessment for Loan Applications
Client # Credit History Debt Level Collateral Income Level RISK LEVEL
1 Bad High None Low HIGH
2 Unknown High None Medium HIGH
3 Unknown Low None Medium MODERATE
4 Unknown Low None Low HIGH
5 Unknown Low None High LOW
6 Unknown Low Adequate High LOW
7 Bad Low None Low HIGH
8 Bad Low Adequate High MODERATE
9 Good Low None High LOW
10 Good High Adequate High LOW
11 Good High None Low HIGH
12 Good High None Medium MODERATE
13 Good High None High LOW
14 Bad High None Medium HIGH
CN2 Example (I)
First pass:
Full instance set
2-best1 (the k = 2 best 1-constraint hypotheses; counts are HIGH-MODERATE-LOW over matching instances): « Income Level = Low » (4-0-0), « Income Level = High » (0-1-5)
Can’t do better than (4-0-0)
Best_hypo: « Income Level = Low »
First rule:
IF Income Level = Low THEN HIGH
CN2 Example (II)
Second pass:
Instances 2-3, 5-6, 8-10, 12-14
2-best1: « Income Level = High » (0-1-5), « Credit History = Good » (0-1-3)
Best_hypo: « Income Level = High »
2-best2: « Income Level = High AND Credit History = Good » (0-0-3), « Income Level = High AND Collateral = None » (0-0-3)
Best_hypo: « Income Level = High AND Credit History = Good »
Can’t do better than (0-0-3)
Second rule:
IF Income Level = High AND Credit History = Good THEN LOW
CN2 Example (III)
Third pass:
Instances 2-3, 5-6, 8, 12, 14
2-best1: « Credit History = Good » (0-1-0), « Debt Level = High » (2-1-0)
Best_hypo: « Credit History = Good »
Can’t do better than (0-1-0)
Third rule:
IF Credit History = Good THEN MODERATE
CN2 Example (IV)
Fourth pass:
Instances 2-3, 5-6, 8, 14
2-best1: « Debt Level = High » (2-0-0), « Income Level = Medium » (2-1-0)
Best_hypo: « Debt Level = High »
Can’t do better than (2-0-0)
Fourth rule:
IF Debt Level = High THEN HIGH
CN2 Example (V)
Fifth pass:
Instances 3, 5-6, 8
2-best1: « Credit History = Bad » (0-1-0), « Income Level = Medium » (0-1-0)
Best_hypo: « Credit History = Bad »
Can’t do better than (0-1-0)
Fifth rule:
IF Credit History = Bad THEN MODERATE
CN2 Example (VI)
Sixth pass:
Instances 3, 5-6
2-best1: « Income Level = High » (0-0-2), « Collateral = Adequate » (0-0-1)
Best_hypo: « Income Level = High »
Can’t do better than (0-0-2)
Sixth rule:
IF Income Level = High THEN LOW
CN2 Example (VII)
Seventh pass:
Instance 3
2-best1: « Credit History = Unknown » (0-1-0), « Debt Level = Low » (0-1-0)
Best_hypo: « Credit History = Unknown »
Can’t do better than (0-1-0)
Seventh rule:
IF Credit History = Unknown THEN MODERATE
CN2 Example (VIII)
Rules ranked by Quality_CN2 = −Entropy, where Entropy = −Σi pi log2(pi) (lower entropy ⇒ higher quality)
Rule 1: (4-0-0) - Rank 1
Rule 2: (0-0-3) - Rank 2
Rule 3: (1-1-3) - Rank 5
Rule 4: (4-1-2) - Rank 6
Rule 5: (3-1-0) - Rank 4
Rule 6: (0-1-5) - Rank 3
Rule 7: (2-1-2) - Rank 7
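The ranks above can be reproduced by sorting the rules' (HIGH-MODERATE-LOW) coverage counts by increasing entropy. A sanity-check sketch, assuming ties (Rules 1 and 2, both with zero entropy) are broken by rule number:

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# (HIGH, MODERATE, LOW) coverage of each rule, from the trace above.
coverages = {1: (4, 0, 0), 2: (0, 0, 3), 3: (1, 1, 3), 4: (4, 1, 2),
             5: (3, 1, 0), 6: (0, 1, 5), 7: (2, 1, 2)}
ranked = sorted(coverages, key=lambda r: (entropy(coverages[r]), r))
print(ranked)  # [1, 2, 6, 5, 3, 4, 7] -- i.e. ranks 1..7 as on the slide
```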
CN2 Example (IX)
Final rule set, sorted by Quality:
IF Income Level = Low
THEN HIGH
IF Income Level = High AND Credit History = Good
THEN LOW
IF Income Level = High
THEN LOW
IF Credit History = Bad
THEN MODERATE
IF Credit History = Good
THEN MODERATE
IF Debt Level = High
THEN HIGH
IF Credit History = Unknown
THEN MODERATE
Limitations of AVL (I)
Consider the MONK1 problem:
  6 attributes:
    A1: 1, 2, 3
    A2: 1, 2, 3
    A3: 1, 2
    A4: 1, 2, 3
    A5: 1, 2, 3, 4
    A6: 1, 2
  2 classes: 0, 1
  Target concept: If (A1 = A2 or A5 = 1) then Class = 1
Limitations of AVL (II)
Can you build a decision tree for this concept?
Limitations of AVL (III)
Can you build a rule set for this concept?
If A1=1 and A2=1 then Class=1
If A1=2 and A2=2 then Class=1
If A1=3 and A2=3 then Class=1
If A5=1 then Class=1
Class=0
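This rule set (rule order matters, with Class=0 as the default) can be verified exhaustively: MONK1 has only 3·3·2·3·4·2 = 432 instances. A quick check, with hypothetical names:

```python
from itertools import product

# Attribute domains of the MONK1 problem.
domains = {"A1": [1, 2, 3], "A2": [1, 2, 3], "A3": [1, 2],
           "A4": [1, 2, 3], "A5": [1, 2, 3, 4], "A6": [1, 2]}

def rules_class(a1, a2, a3, a4, a5, a6):
    """Apply the rule set above in order; the last rule is the default."""
    if a1 == 1 and a2 == 1: return 1
    if a1 == 2 and a2 == 2: return 1
    if a1 == 3 and a2 == 3: return 1
    if a5 == 1: return 1
    return 0

# The rules reproduce the target concept on all 432 instances.
assert all(rules_class(*x) == (1 if x[0] == x[1] or x[4] == 1 else 0)
           for x in product(*domains.values()))
```

The slide's point stands: an attribute-value language needs one rule per A1=A2 case, whereas a first-order language can express the relation A1=A2 directly.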
First-Order Language
Supports first-order concepts → relations between attributes are accounted for in a natural way
For simplicity, restrict to Horn clauses
A clause is any disjunction of literals whose variables are universally quantified
Horn clauses (single non-negated literal): H ∨ ¬L1 ∨ … ∨ ¬Ln, equivalently written H ← L1 ∧ … ∧ Ln
FOIL (I)
Algorithm FOIL(Target_predicate, Predicates, Examples)
  Pos ← those Examples for which Target_predicate is true
  Neg ← those Examples for which Target_predicate is false
  Learned_rules ← ∅
  While Pos ≠ ∅ Do
    New_rule ← the rule that predicts Target_predicate with no precondition
    New_rule_neg ← Neg
    While New_rule_neg ≠ ∅ Do
      Candidate_literals ← GenCandidateLit(New_rule, Predicates)
      Best_literal ← argmax_{L ∈ Candidate_literals} FoilGain(L, New_rule)
      Add Best_literal to New_rule’s preconditions
      New_rule_neg ← subset of New_rule_neg that satisfies New_rule’s preconditions
    Learned_rules ← Learned_rules + New_rule
    Pos ← Pos − {members of Pos covered by New_rule}
  Return Learned_rules
FOIL (II)
Algorithm GenCandidateLit(Rule, Predicates)
  Let Rule = P(x1, …, xk) ← L1, …, Ln
  Return all literals of the form:
    Q(v1, …, vr), where Q is any predicate in Predicates and the vi’s are either new variables or variables already present in Rule, with the constraint that at least one of the vi’s must already exist as a variable in Rule
    Equal(xj, xk), where xj and xk are variables already present in Rule
    The negation of each of the above forms of literals
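A rough sketch of the first form of candidate literal (the Equal literals and negations are omitted for brevity). The fresh-variable naming is an illustrative choice, and literals that duplicate each other up to renaming of the new variables are not pruned:

```python
from itertools import product

def gen_candidate_literals(rule_vars, predicates):
    """rule_vars: variables already in Rule, e.g. ["x", "y"].
    predicates: mapping from predicate name to arity, e.g. {"Father": 2}."""
    literals = []
    for name, arity in predicates.items():
        fresh = [f"v{i}" for i in range(arity)]  # candidate new variables
        for args in product(rule_vars + fresh, repeat=arity):
            # Constraint: at least one argument must already occur in Rule.
            if any(a in rule_vars for a in args):
                literals.append((name, args))
    return literals
```

For Rule = GrandDaughter(x, y) and predicates Father/2 and Female/1, this yields literals such as Father(y, v0) and Female(x), mirroring the specializations listed in the illustration below.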
FOIL (III)
Algorithm FoilGain(L, Rule)
  Return t · ( log2( p1 / (p1 + n1) ) − log2( p0 / (p0 + n0) ) )
  where
    p0 is the number of positive bindings of Rule
    n0 is the number of negative bindings of Rule
    p1 is the number of positive bindings of Rule+L
    n1 is the number of negative bindings of Rule+L
    t is the number of positive bindings of Rule that are still covered after adding L to Rule
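FoilGain follows directly from this definition; the helper below (an illustrative sketch, treating the no-surviving-positive-bindings case as zero gain) reproduces the gains computed in the GrandDaughter illustration that follows.

```python
import math

# Direct implementation of FoilGain from the definition above.
# p0/n0: positive/negative bindings of Rule; p1/n1: of Rule+L;
# t: positive bindings of Rule still covered after adding L.

def foil_gain(p0, n0, p1, n1, t):
    if t == 0 or p1 == 0:
        return 0.0  # no surviving positive bindings => no gain
    return t * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

# Values from the GrandDaughter illustration:
print(round(foil_gain(1, 15, 1, 11, 1), 3))  # 0.415  (adding Father(y, z))
print(round(foil_gain(1, 11, 1, 1, 1), 3))   # 2.585  (adding Father(z, x))
print(foil_gain(1, 1, 1, 0, 1))              # 1.0    (adding Female(y))
```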
Illustration (I)
Consider the data:
  GrandDaughter(Victor, Sharon)
  Father(Sharon, Bob)
  Father(Tom, Bob)
  Female(Sharon)
  Father(Bob, Victor)
Target concept: GrandDaughter(x, y)
Closed-world assumption
Illustration (II)
Training set:
Positive examples:
  GrandDaughter(Victor, Sharon)
Negative examples:
  GrandDaughter(Victor, Victor), GrandDaughter(Victor, Bob), GrandDaughter(Victor, Tom), GrandDaughter(Sharon, Victor),
  GrandDaughter(Sharon, Sharon), GrandDaughter(Sharon, Bob), GrandDaughter(Sharon, Tom), GrandDaughter(Bob, Victor),
  GrandDaughter(Bob, Sharon), GrandDaughter(Bob, Bob), GrandDaughter(Bob, Tom), GrandDaughter(Tom, Victor),
  GrandDaughter(Tom, Sharon), GrandDaughter(Tom, Bob), GrandDaughter(Tom, Tom)
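Under the closed-world assumption, the negatives need not be listed by hand: every ground instance of the target predicate not asserted as true is taken to be a negative example. A minimal sketch:

```python
from itertools import product

# Closed-world assumption: enumerate all GrandDaughter(a, b) ground facts
# over the known constants; anything not listed as true is negative.
constants = ["Victor", "Sharon", "Bob", "Tom"]
positives = {("Victor", "Sharon")}
negatives = [pair for pair in product(constants, repeat=2)
             if pair not in positives]
# 16 candidate pairs in total: 1 positive, 15 negatives
```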
Illustration (III)
Most general rule:
  GrandDaughter(x, y) ←
Specializations:
  Father(x, y), Father(x, z), Father(y, x), Father(y, z), Father(z, x), Father(z, y)
  Female(x), Female(y)
  Equal(x, y)
  The negations of each of the above
Illustration (IV)
Consider the 1st specialization:
  GrandDaughter(x, y) ← Father(x, y)
16 possible bindings: x/Victor, y/Victor; x/Victor, y/Sharon; …; x/Tom, y/Tom
FoilGain:
  p0 = 1 (x/Victor, y/Sharon)
  n0 = 15
  p1 = 0
  n1 = 16
  t = 0
So FoilGain(1st specialization) = 0
Illustration (V)
Consider the 4th specialization:
  GrandDaughter(x, y) ← Father(y, z)
64 possible bindings: x/Victor, y/Victor, z/Victor; x/Victor, y/Victor, z/Sharon; …; x/Tom, y/Tom, z/Tom
FoilGain:
  p0 = 1 (x/Victor, y/Sharon)
  n0 = 15
  p1 = 1 (x/Victor, y/Sharon, z/Bob)
  n1 = 11: (x/Victor, y/Bob, z/Victor), (x/Victor, y/Tom, z/Bob), (x/Sharon, y/Bob, z/Victor), (x/Sharon, y/Tom, z/Bob), (x/Bob, y/Tom, z/Bob), (x/Bob, y/Sharon, z/Bob), (x/Tom, y/Sharon, z/Bob), (x/Tom, y/Bob, z/Victor), (x/Sharon, y/Sharon, z/Bob), (x/Bob, y/Bob, z/Victor), (x/Tom, y/Tom, z/Bob)
  t = 1
So FoilGain(4th specialization) = 1 · (log2(1/12) − log2(1/16)) ≈ 0.415
Illustration (VI)
Assume the 4th specialization is indeed selected
Partial rule: GrandDaughter(x, y) ← Father(y, z)
Still covers 11 negative examples
New set of candidate literals:
  All of the previous ones
  Female(z), Equal(x, z), Equal(y, z), Father(z, w), Father(w, z)
  The negations of each of the above
Illustration (VII)
Consider the specialization
  GrandDaughter(x, y) ← Father(y, z), Equal(x, z)
64 possible bindings: x/Victor, y/Victor, z/Victor; …; x/Tom, y/Tom, z/Tom
FoilGain:
  p0 = 1 (x/Victor, y/Sharon, z/Bob)
  n0 = 11
  p1 = 0
  n1 = 3: (x/Victor, y/Bob, z/Victor), (x/Bob, y/Tom, z/Bob), (x/Bob, y/Sharon, z/Bob)
  t = 0
So FoilGain(specialization) = 0
Illustration (VIII)
Consider the specialization
  GrandDaughter(x, y) ← Father(y, z), Father(z, x)
64 possible bindings: x/Victor, y/Victor, z/Victor; …; x/Tom, y/Tom, z/Tom
FoilGain:
  p0 = 1 (x/Victor, y/Sharon, z/Bob)
  n0 = 11
  p1 = 1 (x/Victor, y/Sharon, z/Bob)
  n1 = 1 (x/Victor, y/Tom, z/Bob)
  t = 1
So FoilGain(specialization) = 1 · (log2(1/2) − log2(1/12)) ≈ 2.585
Illustration (IX)
Assume that specialization is indeed selected
Partial rule: GrandDaughter(x, y) ← Father(y, z), Father(z, x)
Still covers 1 negative example
No new candidate literals arise (no new variables were introduced): use all of the previous ones
Illustration (X)
Consider the specialization
  GrandDaughter(x, y) ← Father(y, z), Father(z, x), Female(y)
64 possible bindings: x/Victor, y/Victor, z/Victor; …; x/Tom, y/Tom, z/Tom
FoilGain:
  p0 = 1 (x/Victor, y/Sharon, z/Bob)
  n0 = 1
  p1 = 1 (x/Victor, y/Sharon, z/Bob)
  n1 = 0
  t = 1
So FoilGain(specialization) = 1 · (log2(1/1) − log2(1/2)) = 1
Illustration (XI)
No negative examples are covered and all positive examples are covered
So we get the final correct rule:
  GrandDaughter(x, y) ← Father(y, z), Father(z, x), Female(y)