1/23
Learning from positive examples
Main ideas and the particular case of CProgol4.2
Daniel Fredouille, CIG talk,11/2005
2/23
What is it all about?
• Symbolic machine learning.
• Learning from positive examples instead of positive and negative examples.
• The talk contains two parts:
1. General ideas and tactics to learn from positives.
2. How the particular ILP system CProgol4.2 of S. Muggleton (1997) deals with positive-only learning.
3/23
Disclaimer
• This talk is not extracted from a survey or any article in particular: it is more of a patchwork of my experiences in the domain and how I interpret them.
• Feel free to criticise: I would like feedback on these ideas since I have never shared them before.
• I would really appreciate comments on the slides marked with the ? sign.
4/23
Definitions
[Figure: the concept space and the instance space, related by a generality ordering; the concept space contains the target concept C and the inferred concept C', the instance space contains positive/negative examples of C.]
• "Is more general than" / "is less specific than": the concept space is usually partially ordered by this relation.
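One concrete (and deliberately simplified) reading of this ordering, assuming concepts are represented extensionally by the set of instances they cover:

```python
# A sketch of the generality ordering, with concepts represented
# extensionally as coverage sets (illustrative names and data).
def more_general(c_prime, c):
    """True when concept c_prime covers every instance covered by c."""
    return c_prime >= c  # superset test on coverage sets

target = {"aa", "ab"}
inferred = {"aa", "ab", "ba"}
print(more_general(inferred, target))  # True
print(more_general(target, inferred))  # False: the ordering is only partial
```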
5/23
Positive and Negative Learning
Possibility 1: Discrimination of classes
• Characterise the difference between the positive and negative examples.
• No model of the positive concept!
?
6/23
Positive and Negative Learning
Possibility 2: Characterisation of a class
• Use negative examples to prevent over-generalisation.
• Needs negative examples "close" to the concept border.
?
7/23
Positive Only Learning
Aim: Characterisation of a class
Choice ?
8/23
Positive Only Learning
• Two strategies:
1. Bias in the search space: choose a space with a (very) strong structure.
2. Bias in the evaluation function: choose a concept making a compromise between:
– Generality/specificity of the concept
– Coverage of the positives by the concept
– Complexity of the hypothesis representing the concept
?
9/23
Search space bias approach
• Main idea: consider strongly organised concept spaces.
• Possible inference algorithm:
– Select the least general concept covering all examples.
– The constraints on the search space ensure there is only one such concept.
Trivial example (generally not useful), "tree organisation":
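As a sketch of this strategy (using integer intervals rather than the slide's tree example; the space and names are illustrative), a strongly structured space makes the least general covering concept unique and cheap to find:

```python
# Search-space-bias sketch: in the space of integer intervals [lo, hi],
# the least general concept covering all positive examples exists, is
# unique, and is trivial to compute from the examples alone.
def least_general_interval(examples):
    return (min(examples), max(examples))

positives = [3, 7, 5, 4]
print(least_general_interval(positives))  # (3, 7)
```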
10/23
Search space bias approach
• Advantages:
– Strong theoretical convergence results are possible.
– Can lead to (very) fast inference algorithms.
• Drawbacks:
– Not available for all concept spaces!
– Theorem: super-finite classes of concepts are not inferable in the limit this way (Gold 69). Super-finite = contains all concepts covering a finite number of examples and at least one concept covering an infinity of them.
11/23
Heuristic Approach
• Scoring makes a compromise between:
1. Specificity of the concept
2. Coverage of the positives by the concept
3. Complexity of the concept
• Implementations:
– Ad-hoc measures of points 1, 2, 3, combined in a formula, e.g.: Score = Coverage + Specificity – Complexity
– Minimum Message Length ideas (~MDL)
?
12/23
Heuristic Approach: Ad-hoc implementation
• Elements of the score:
– Coverage: counting covered instances.
– Specificity: measure of the "proportion" of instances of the space covered.
– Complexity: the size of the concept representation (e.g., number of rules).
• Advantages:
– Usually easy to implement.
– Usually provides parameters to tune the compromise.
• Disadvantages:
– No theory.
– Bias not always clear.
– How to combine coverage/specificity/complexity?
?
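A minimal sketch of such an ad-hoc combination (the weights and the individual measures are illustrative assumptions; as the slide notes, the combination itself is an arbitrary choice):

```python
import math

# Ad-hoc positive-only score: reward coverage and specificity,
# penalise complexity, with tunable weights.
def score(covered_pos, n_pos, covered_space, space_size, n_rules,
          w_cov=1.0, w_spec=1.0, w_comp=0.1):
    coverage = covered_pos / n_pos                       # fraction of positives covered
    specificity = -math.log(covered_space / space_size)  # small footprint = specific
    complexity = n_rules                                 # size of the representation
    return w_cov * coverage + w_spec * specificity - w_comp * complexity

# At equal coverage and size, the more specific hypothesis wins:
print(score(10, 10, 50, 1000, 2) > score(10, 10, 900, 1000, 2))  # True
```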
13/23
Heuristic Approach: MML implementation
[Figure: two message-passing diagrams over a channel.
MML for discrimination: send the hypothesis, then the examples' classes encoded given the hypothesis (Hyp, then "examples' classes ¦ Hyp").
MML for characterisation: send the hypothesis, then the examples and their classes encoded given the hypothesis (Hyp, then "examples and classes ¦ Hyp").]
Gain = number of bits needed to send the message without compression – number of bits needed to send the message with compression.
?
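A toy version of this gain, under assumed uniform codes (the code lengths and numbers are purely illustrative, not the encodings used by any MML system):

```python
import math

# Toy MML gain: sending N examples drawn from an instance space of size S
# costs N*log2(S) bits raw; with a hypothesis h (hyp_bits to transmit)
# covering a sub-space of size s, each example then costs log2(s) bits.
def mml_gain(n_examples, space_size, hyp_bits, covered_size):
    without = n_examples * math.log2(space_size)
    with_h = hyp_bits + n_examples * math.log2(covered_size)
    return without - with_h  # positive gain: h compresses the data

print(mml_gain(n_examples=100, space_size=1024, hyp_bits=50, covered_size=16))
# 1000.0 - 450.0 = 550.0
```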
14/23
Heuristic Approach: MML implementation
• Advantages:
– Some theoretical justifications in the works of Kolmogorov, Solomonoff, Occam, Bayes, and Chaitin.
– Absolute and meaningful score.
• Disadvantages:
– Limit of the theory: the optimal code can NOT be computed!
– Difficult implementation: the choices of encoding create the inference biases, which is not very intuitive.
15/23
Positive only learning in ILP with CProgol4.2
16/23
Positive only learning in ILP
• The following is not a survey! This is from what I have already encountered, but I have not looked for further references.
• MML implementations:
– Muggleton [88]
– Srinivasan, Muggleton, Bain [93]
– Stahl [96]
• Other implementations:
– Muggleton, CProgol4.2 [97]
– A heuristic ad-hoc method.
– Somehow based on MML, but the implementation details make it quite different.
17/23
CProgol4.2 uses Bayes
• Distributions: D_H over hypotheses h ∈ H, D_I over instances i ∈ I, and D_I¦h over the instances covered by h.
• Score: P(h ¦ E) = P(h) * P(E ¦ h) / P(E)
• Fixing the distributions and computing P(h), P(E ¦ h), P(E).
18/23
Assumptions for the distributions
• P(h) = e^(-size(h))
– Large theories are less probable than small ones.
– size(h) = the sum, over the rules ci of h, of the number of literals in the body of ci.
• P(E ¦ h) = Π(e ∈ E) D_I¦h(e) = Π(e ∈ E) D_I(e) / D_I(h)
– Assumption that D_I and D_H give D_I¦h.
– Independence assumption between the examples.
19/23
Replacing in Bayes
• P(h ¦ E) = e^(-size(h)) * [ Π(e ∈ E) D_I(e) / D_I(h) ] / P(E)
• As we want to compare hypotheses:
P(h ¦ E) = [ e^(-size(h)) / D_I(h)^|E| ] * Cste1
• Take the log:
ln(P(h ¦ E)) = -size(h) + |E| * ln(1 / D_I(h)) + Cste2
• We still have to compute D_I(h) ...
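The resulting score is easy to state in code (a transcription of the slide's formula, up to the additive constant Cste2; the argument values are illustrative):

```python
import math

# ln P(h | E) = -size(h) + |E| * ln(1 / D_I(h)), dropping the constant.
def log_posterior(size_h, n_examples, d_i_h):
    return -size_h + n_examples * math.log(1.0 / d_i_h)

# At equal size, a more specific hypothesis (smaller D_I(h)) scores higher:
print(log_posterior(5, 20, 0.1) > log_posterior(5, 20, 0.5))  # True
```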
20/23
D_I(h): weight of h in the instance set
• Computing D_I:
– Using a stochastic logic program S trained with the BK to model D_I (not included in the talk).
• Computing D_I(h):
– Generate R instances from D_I.
– h covers r of them.
– D_I(h) = (r+1) / (R+2)
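This sampling step can be sketched as follows (a toy stand-in: a uniform integer distribution replaces the stochastic logic program, and `covers` replaces the coverage test):

```python
import random

# Monte Carlo estimate of D_I(h) with the slide's Laplace-style
# correction: draw R instances from D_I, count the r covered by h,
# return (r+1)/(R+2).
def estimate_coverage(covers, sample, R, rng):
    r = sum(covers(sample(rng)) for _ in range(R))
    return (r + 1) / (R + 2)

# Toy D_I: uniform over 0..99; h covers instances below 25 (true weight 0.25).
rng = random.Random(0)
d = estimate_coverage(lambda x: x < 25, lambda g: g.randrange(100), 1000, rng)
print(0.2 < d < 0.3)  # close to 0.25
```

The +1/+2 correction keeps the estimate strictly inside (0, 1) even when h covers none or all of the R samples, so the logarithms in the score stay finite.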
21/23
Formula for a whole theory covering E
• ln(P(h ¦ E)) = -size(h) - |E| * ln( (r+1)/(R+2) ) + C2
(terms: complexity = size(h), coverage = |E|, specificity = ln((r+1)/(R+2)))
Estimation of the final theory score from a partially inferred theory h' covering p of the |E| examples:
• ln(P(h' ¦ E)) = -|E|/p * size(h') - |E| * ln( |E|/p * (r'+1)/(R+2) ) + C3
22/23
Final evaluation
• Suppression of |E| and C2:
– f(h') = size(h')/p + ln(p) - ln( |E| * (r'+1)/(R+2) )
• Possible boost of the positives with a factor k:
– f(h') = size(h')/(k*p) + ln(k*p) - ln( |E| * (r'+1)/(R+2) )
• This formula is not written anywhere (the above one is my best guess!).
• The papers are hard to understand.
• But it seems to work ...
(terms: complexity, coverage, specificity)
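For reference, the guessed evaluation transcribes directly (hedged as above: this is the speaker's best guess at CProgol4.2's formula, not a published one; the argument values are purely illustrative):

```python
import math

# f(h') = size(h')/p + ln(p) - ln(|E| * (r'+1)/(R+2))
# size_h: size of the partial theory; p: positives it covers;
# n_examples = |E|; r of R sampled instances are covered.
def f(size_h, p, n_examples, r, R):
    return size_h / p + math.log(p) - math.log(n_examples * (r + 1) / (R + 2))

print(round(f(size_h=4, p=10, n_examples=50, r=20, R=1000), 2))
```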
23/23
Conclusion
• Learning from positives only is a real challenge, and methods designed for positive and negative examples can hardly be adapted to it.
• Some nice theoretical frameworks exist.
• When it comes to implementing heuristic frameworks:
– The theory is often lost in approximations and implementation choices.
– Useful systems can be created, but tuning and understanding the biases have to be considered as very important stages of inference.