Transcript of gabis/DocDiplome/FCA/kuzicfca04.pdf (2010-04-19)
Machine Learning and Formal Concept Analysis
Sergei O. Kuznetsov
All-Russia Institute for Scientific and Technical Information (VINITI), Moscow
Institut für Algebra, Technische Universität Dresden
Machine Learning... [1/79]
Contents
1. Brief historical survey
2. JSM-method
3. Learning with Pattern Structures
4. Decision trees
5. Version spaces
6. Conclusions
Machine learning vs. Conceptual (FCA-based) Knowledge Discovery
Machine learning is “concerned with the question of how to construct computer
programs that automatically improve with experience” (T. Mitchell).
Conceptual (FCA-based) knowledge discovery is a “human-centered discovery
process”. “Turning information into knowledge is best supported when the
information with its collective meaning is represented according to the social and cultural
patterns of understanding of the community whose individuals are supposed to create the
knowledge.” (R. Wille)
Lattices in machine learning. Antiunification
Antiunification, in the finite term case, was introduced by G. Plotkin and J. C. Reynolds.
The antiunification algorithm was studied in
J. C. Reynolds, Transformational systems and the algebraic structure of atomic formulas,
Machine Intelligence, vol. 5, pp. 135-151, Edinburgh University Press, 1970.
as the least upper bound operation in a lattice of terms.
Example: the least upper bound of two terms replaces their disagreeing subterms by variables. [The concrete terms of the example are garbled in the transcript.]
Antiunification was used by Plotkin
G.D. Plotkin, A Note on inductive generalization, Machine Intelligence, vol. 5, pp. 153-163,
Edinburgh University Press, 1970.
as a method of generalization and later this work was extended to form a theory of inductive
generalization and hypothesis formation.
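The antiunification operation described above can be sketched in a few lines. This is a minimal illustration, not from the slides: terms are represented as nested tuples (functor, arg1, ...) or constant strings, and the same pair of disagreeing subterms is always mapped to the same variable, as the least general generalization requires.

```python
def antiunify(s, t, pairs=None):
    """Least general generalization (anti-unification) of two first-order
    terms. Terms are constants (strings) or tuples (functor, arg1, ...)."""
    if pairs is None:
        pairs = {}          # maps a pair of disagreeing subterms to one variable
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and s[0] == t[0] and len(s) == len(t)):
        # same functor and arity: generalize argument-wise
        return (s[0],) + tuple(antiunify(a, b, pairs)
                               for a, b in zip(s[1:], t[1:]))
    # disagreement: introduce (or reuse) a variable for this pair
    if (s, t) not in pairs:
        pairs[(s, t)] = f"X{len(pairs)}"
    return pairs[(s, t)]
```

For instance, `antiunify(('f','a','a'), ('f','b','b'))` yields `('f','X0','X0')`: the repeated disagreement is generalized by one shared variable, which is what makes the result the *least* upper bound in the lattice of terms.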
Formal Concept Analysis [Wille 1982; Ganter, Wille 1996]

- G, a set of objects,
- M, a set of attributes,
- a relation I ⊆ G × M such that (g, m) ∈ I if and only if object g has the attribute m.

K = (G, M, I) is a formal context.

Derivation operators:

A′ := {m ∈ M | ∀g ∈ A: (g, m) ∈ I} for A ⊆ G,
B′ := {g ∈ G | ∀m ∈ B: (g, m) ∈ I} for B ⊆ M.

A formal concept is a pair (A, B) with A ⊆ G, B ⊆ M such that A′ = B and B′ = A.

- A is the extent and B is the intent of the concept (A, B).
- The concepts, ordered by (A₁, B₁) ≤ (A₂, B₂) ⟺ A₁ ⊆ A₂ (⟺ B₂ ⊆ B₁), form a complete lattice, called the concept lattice B(G, M, I).
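The derivation operators and the concept test can be sketched directly. A minimal Python illustration, using a small hypothetical context loosely based on the fruit example later in the slides (the object and attribute names are illustrative, not the slides' exact context):

```python
from itertools import combinations

OBJECTS = {"apple", "grapefruit", "kiwi", "plum"}
ATTRS = {"yellow", "green", "round", "oval", "smooth"}
I = {  # the incidence relation: (object, attribute) pairs
    ("apple", "yellow"), ("apple", "round"), ("apple", "smooth"),
    ("grapefruit", "yellow"), ("grapefruit", "round"),
    ("kiwi", "green"), ("kiwi", "oval"),
    ("plum", "oval"), ("plum", "smooth"),
}

def up(A):
    """A' : attributes shared by all objects in A."""
    return {m for m in ATTRS if all((g, m) in I for g in A)}

def down(B):
    """B' : objects having all attributes in B."""
    return {g for g in OBJECTS if all((g, m) in I for m in B)}

def is_concept(A, B):
    return up(A) == B and down(B) == A

def concepts():
    """All formal concepts, by closing every subset of objects."""
    seen = set()
    for r in range(len(OBJECTS) + 1):
        for combo in combinations(sorted(OBJECTS), r):
            ext = down(up(set(combo)))      # closure of the object set
            seen.add((frozenset(ext), frozenset(up(ext))))
    return seen
```

Here `is_concept({"apple", "grapefruit"}, {"yellow", "round"})` holds: those two objects share exactly those attributes, and those attributes pick out exactly those objects.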
Implications and attribute exploration

- An implication B₁ → B₂ for B₁, B₂ ⊆ M holds if B₁′ ⊆ B₂′, i.e., every object that has all attributes from B₁ also has all attributes from B₂.
- Implications obey the Armstrong rules:

  X → X;    if X → Y, then X ∪ Z → Y;    if X → Y and Y ∪ Z → W, then X ∪ Z → W.

Learning aspects

- Next Closure, an incremental algorithm for constructing implication bases.
- Attribute exploration is an interactive learning procedure.
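The test "B₁ → B₂ holds iff B₁′ ⊆ B₂′" is a one-liner once derivation is available. A self-contained sketch with a hypothetical toy context (names illustrative):

```python
# An implication B1 -> B2 holds iff every object having all of B1
# also has all of B2, i.e. B1' is a subset of B2'.
CONTEXT = {
    "apple":      {"yellow", "round", "smooth"},
    "grapefruit": {"yellow", "round"},
    "kiwi":       {"green", "oval"},
}

def extent(B):
    """The derivation B' : all objects having every attribute of B."""
    return {g for g, attrs in CONTEXT.items() if B <= attrs}

def implication_holds(B1, B2):
    return extent(B1) <= extent(B2)
```

In this context `{"yellow"} → {"round"}` holds (both yellow objects are round), while `{"round"} → {"smooth"}` fails, falsified by the grapefruit.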
Lattice-based machine learning models. 1980s

- Closure systems: the JSM-method [V. Finn, 1983], similarity as a meet operation.
- Galois connections and non-minimal implication bases: the CHARADE system [J. Ganascia, 1987].
- Dedekind-MacNeille closure of a generality order and implications: the GRAND system [G. D. Oosthuizen, 1988].
Lattices in machine learning. 1990s

- In the 1990s the idea of a version space was elaborated by means of logic programming within Inductive Logic Programming (ILP),
  S.-H. Nienhuys-Cheng and R. de Wolf, Foundations of Inductive Logic Programming, Lecture Notes in Artificial Intelligence, 1228, 1997,
  where the notion of a subsumption lattice plays an important role.
- In the late 1990s the notion of a lattice of "closed itemsets" became important in the data mining community, since it helps to construct bases of association rules.
JSM-method. 1

One of the first models of machine learning that used lattices (closure systems) was the JSM-method by V. Finn.

V. K. Finn, On Machine-Oriented Formalization of Plausible Reasoning in the Style of F. Bacon and J. S. Mill, Semiotika i Informatika, 20 (1983), 35-101 [in Russian]

Method of Agreement (first canon of inductive logic):

"If two or more instances of the phenomenon under investigation have only one circumstance in common, ... [it] is the cause (or effect) of the given phenomenon."

John Stuart Mill, A System of Logic, Ratiocinative and Inductive, London, 1843

In the JSM-method, positive hypotheses are sought among intersections of positive examples given as sets of attributes; the same holds for negative hypotheses. Various additional conditions can be imposed on these intersections.
JSM-method. 2

Logical means of the JSM-method: a many-valued, many-sorted extension of first-order predicate logic with quantifiers over tuples of variable length (weak second order).

Example: formalization of Mill's Method of Agreement. [Formula garbled in the transcript.]

The predicate defines a closure system (w.r.t. ⊓) generated by descriptions of positive examples. At the same time, ⊓ is a means of expressing "similarity" of objects given by attribute sets.
FCA translation [Ganter, Kuznetsov 2000]

A target attribute ω ∉ M:

- positive examples: a set G₊ ⊆ G of objects known to have ω,
- negative examples: a set G₋ ⊆ G of objects known not to have ω,
- undetermined examples: a set Gτ ⊆ G of objects for which it is unknown whether they have the target attribute or do not have it.

Three subcontexts of K = (G, M ∪ {ω}, I):

K_ε := (G_ε, M, I_ε), for ε ∈ {+, −, τ}.

A positive hypothesis H ⊆ M is an intent of K₊ not contained in the intent g′ of any negative example g ∈ G₋:

H ⊈ g′ for all g ∈ G₋.
Example of a learning context

G \ M         color    firm   smooth   form    fruit
apple         yellow   no     yes      round    +
grapefruit    yellow   no     no       round    +
kiwi          green    no     no       oval     +
plum          blue     no     yes      oval     +
toy cube      green    yes    yes      cubic    −
egg           white    yes    yes      oval     −
tennis ball   white    no     no       round    −
Natural scaling of the context

G \ M         w  y  g  b  f  f̄  s  s̄  r  r̄  fruit
apple            ×           ×  ×     ×        +
grapefruit       ×           ×     ×  ×        +
kiwi                ×        ×     ×     ×     +
plum                   ×     ×  ×        ×     +
toy cube            ×     ×     ×        ×     −
egg           ×           ×     ×        ×     −
tennis ball   ×              ×     ×  ×        −

Abbreviations: "w" for white, "y" for yellow, "g" for green, "b" for blue, "f" for firm, "f̄" for nonfirm, "s" for smooth, "s̄" for nonsmooth, "r" for round, "r̄" for nonround.
Positive Concept Lattice

[Diagram: the concept lattice of the positive context. Top node ({1,2,3,4}, {f̄}); below it ({1,2}, {y, f̄, r}), ({1,4}, {f̄, s}), ({2,3}, {f̄, s̄}), ({3,4}, {f̄, r̄}); at the bottom the object concepts ({1}, {1}′), ({2}, {2}′), ({3}, {3}′), ({4}, {4}′). The diagram distinguishes minimal (+)-hypotheses from falsified (+)-generalizations; {7}′ = {w, f̄, s̄, r} (the tennis ball) is the falsifying negative intent.]
Classification of an undetermined example g ∈ Gτ

- If g′ contains a positive and no negative hypothesis, g is classified positively (predicted to have ω).
- If g′ contains a negative and no positive hypothesis, g is classified negatively.
- If g′ contains hypotheses of both kinds, or if g′ contains no hypothesis at all, then the classification is contradictory or undetermined, respectively.

For classification purposes it suffices to have all minimal (w.r.t. ⊆) hypotheses.
Classifying the undetermined example mango

G \ M           w  y  g  b  f  f̄  s  s̄  r  r̄  fruit
1 apple            ×           ×  ×     ×        +
2 grapefruit       ×           ×     ×  ×        +
3 kiwi                ×        ×     ×     ×     +
4 plum                   ×     ×  ×        ×     +
5 toy cube            ×     ×     ×        ×     −
6 egg           ×           ×     ×        ×     −
7 tennis ball   ×              ×     ×  ×        −
8 mango            ×           ×  ×        ×     ?

The object mango is classified positively:

- {f̄, s} is a (+)-hypothesis and {f̄, s} ⊆ mango′ = {y, f̄, s, r̄};
- for the (−)-hypotheses {w} and {f, s, r̄}: {w} ⊈ mango′ and {f, s, r̄} ⊈ mango′.
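The full classification procedure fits in a short script. A self-contained sketch of this JSM-style classification, using attribute codes in the spirit of the slides' scaling (`f_` for nonfirm, etc.; the data are the fruit example as best reconstructible):

```python
from itertools import combinations

POS = [frozenset(s) for s in (
    {"y", "f_", "s", "r"},    # apple
    {"y", "f_", "s_", "r"},   # grapefruit
    {"g", "f_", "s_", "r_"},  # kiwi
    {"b", "f_", "s", "r_"},   # plum
)]
NEG = [frozenset(s) for s in (
    {"g", "f", "s", "r_"},    # toy cube
    {"w", "f", "s", "r_"},    # egg
    {"w", "f_", "s_", "r"},   # tennis ball
)]

def hypotheses(own, other):
    """Intents of `own` not contained in any example of `other`."""
    hyps = set()
    for r in range(1, len(own) + 1):
        for combo in combinations(own, r):
            h = frozenset.intersection(*combo)
            if not any(h <= g for g in other):
                hyps.add(h)
    return hyps

def classify(example):
    pos = any(h <= example for h in hypotheses(POS, NEG))
    neg = any(h <= example for h in hypotheses(NEG, POS))
    if pos and not neg:
        return "positive"
    if neg and not pos:
        return "negative"
    return "contradictory" if pos and neg else "undetermined"

mango = frozenset({"y", "f_", "s", "r_"})
```

Running `classify(mango)` yields `"positive"`: the (+)-hypothesis {f̄, s} fits mango, while no (−)-hypothesis does.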
Variations of the learning model

- allowing for a certain number of counterexamples (for hypotheses and/or classifications),
- imposing other logical conditions (e.g., of the "Method of Difference" of J. S. Mill): Finn's "lattice of methods",
- nonsymmetric classification (applying only (+)-hypotheses),
- and so on.

The invariant: hypotheses are sought among positive and negative intents.
Toxicology analysis by means of the JSM-method (Bioinformatics, 19, 2003)

V. G. Blinova, D. A. Dobrynin, V. K. Finn, S. O. Kuznetsov and E. S. Pankratova

Predictive Toxicology Challenge (PTC): a workshop at the joint 5th European Conference on Knowledge Discovery in Databases (KDD'2001) and the 12th European Conference on Machine Learning (ECML'2001), Freiburg.

Organizers: machine learning groups of Freiburg University, Oxford University, and the University of Wales.

Toxicology experts: US Environmental Protection Agency, US National Institute of Environmental Health Sciences.
Toxicology analysis by means of the JSM-method (Bioinformatics, 19, 2003)

Training sample: data of the National Toxicology Program (NTP), with 120 to 150 positive examples and 190 to 230 negative examples of toxicity: molecular graphs with an indication of whether a substance is toxic for four sex/species groups: {male, female} × {mice, rats}.

Testing sample: data of the Food and Drug Administration (FDA): about 200 chemical compounds with known molecular structures, whose (non)toxicity, known to the organizers, was to be predicted by the participants.

Participants: 12 research groups (world-wide), each with up to 4 prediction models for every sex/species group.

Evaluation: ROC diagrams.

Stages of the competition:
1. Encoding of chemical structures in terms of attributes,
2. Generation of classification rules,
3. Prediction by means of classification rules.

Results of each stage were made public by the organizers. In particular, encodings of chemical structures made by a participant were made available to all participants.
Example of Coding

[Figure: a chemical structure together with its complete list of FCCS descriptors, including cyclic descriptors (e.g., 6,06) and linear descriptors (e.g., 0200331, 20200331, 21300241, 22400331). The structure drawings are garbled in the transcript.]
Some positive hypotheses

[Table: molecular graphs with their FCCS descriptors (encodings) and the number of predictions in sex/species group(s); e.g., descriptors 6,06 and 0200021 with 2 predictions in group FR, and descriptors 0201131 and 0202410 with 1 prediction in FR and 1 in MM. The molecular graphs are garbled in the transcript.]
ROC diagrams: Rats

[Figure: ROC diagram for the rat groups; not reproduced in the transcript.]

ROC diagrams: Mice

[Figure: ROC diagram for the mouse groups; not reproduced in the transcript.]
Order on labeled graphs

Γ₁ := ((V₁, l₁), E₁) dominates Γ₂ := ((V₂, l₂), E₂), written Γ₂ ≤ Γ₁, if there exists a one-to-one mapping φ: V₂ → V₁ such that it

- respects edges: (v, w) ∈ E₂ ⟹ (φ(v), φ(w)) ∈ E₁,
- fits under labels: l₂(v) ≤ l₁(φ(v)).

Example: vertex labels are unordered, i.e., l ≤ l′ only for l = l′. [The example graphs are garbled in the transcript.]
Semilattice on graph sets

{Γ₁} ⊓ {Γ₂} := MAX≤({Γ | Γ ≤ Γ₁, Γ ≤ Γ₂})
= the set of all maximal common subgraphs of Γ₁ and Γ₂.

[Example graphs garbled in the transcript.]
Meet of graph sets

For sets of graphs X = {Γ₁, ..., Γₖ} and Y = {Γ′₁, ..., Γ′ₘ},

X ⊓ Y := MAX≤( ∪ᵢ,ⱼ ({Γᵢ} ⊓ {Γ′ⱼ}) ).

⊓ is idempotent, commutative, and associative.

[Example graphs garbled in the transcript.]
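The structure of this meet can be illustrated without graph machinery. A sketch using sets instead of graphs: a "pattern" is a set of items, p ≤ q means p ⊆ q, the meet of two single patterns is their intersection, and MAX≤ keeps only the maximal results, exactly as with maximal common subgraphs. This is a structural analogy, not the slides' graph algorithm.

```python
def maximal(patterns):
    """MAX: drop every pattern strictly contained in another one."""
    return {p for p in patterns
            if not any(p < q for q in patterns)}

def meet(X, Y):
    """X ⊓ Y: maximal pairwise meets of patterns from X and Y."""
    return maximal({a & b for a in X for b in Y})
```

For example, with X = {{a,b,c}, {c,d}} and Y = {{a,b,d}}, the pairwise intersections are {a,b} and {d}, both maximal, so X ⊓ Y = {{a,b}, {d}}; and meet(X, X) returns X itself, showing idempotency on antichains.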
Examples

[Figure: positive example graphs and negative example graphs for a goal attribute; the drawings are garbled in the transcript.]
Positive (semi)lattice

[Diagram: the semilattice of pattern extents {1,2,3,4}; {1,2,3}, {2,3,4}; {1,2}, {2,3}, {3,4}; {1}, {2}, {3}, {4}, each labeled with its set of maximal common subgraphs; positive examples 1, 2, 3, 4; negative example 6. The graph drawings are garbled in the transcript.]
Positive lattice

[Diagram: the lattice of pattern extents {1,2,3,4}; {1,2,3}, {2,3,4}; {1,2}, {2,3}, {3,4}; {1}, {2}, {3}, {4} with their pattern intents; positive examples 1, 2, 3, 4; negative example 6. The graph drawings are garbled in the transcript.]
Pattern Structures [Ganter, Kuznetsov 2001]

(G, D, δ) is a pattern structure if

- G is a set ("set of objects");
- D = (D, ⊓) is a meet-semilattice;
- δ: G → D is a mapping;
- the set δ(G) := {δ(g) | g ∈ G} generates a complete subsemilattice (D_δ, ⊓) of (D, ⊓).

Possible origins of the ⊓ operation:

- a set of objects G, each with a description from D;
- a partially ordered set (D, ≤) of "descriptions" (≤ is a "more general than" relation);
- the (distributive) lattice of order ideals of the ordered set (D, ≤).
Pattern Structures

A pattern structure is a triple (G, (D, ⊓), δ), where

- G is a set of "examples",
- δ is a mapping of examples to "descriptions",
- δ(G) := {δ(g) | g ∈ G}.

The subsumption order: c ⊑ d :⟺ c ⊓ d = c.

Derivation operators:

A□ := ⊓_{g ∈ A} δ(g) for A ⊆ G,
d□ := {g ∈ G | d ⊑ δ(g)} for d ∈ D.

A pair (A, d) is a pattern concept of (G, (D, ⊓), δ) if A□ = d and d□ = A; A is the extent and d is the pattern intent.
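The two derivation operators can be sketched for the simplest pattern structure: descriptions are sets, the meet ⊓ is set intersection, and c ⊑ d means c ⊆ d. The object descriptions below are hypothetical, chosen only to illustrate the definitions.

```python
from functools import reduce

# delta: object -> description (here, a set; meet = intersection)
DELTA = {
    1: frozenset({"a", "b", "c"}),
    2: frozenset({"a", "b"}),
    3: frozenset({"b", "c"}),
}

def box_ext(A):
    """A□ : the meet of the descriptions of all objects in A."""
    return reduce(frozenset.__and__, (DELTA[g] for g in A))

def box_pat(d):
    """d□ : all objects whose description subsumes d (d ⊑ δ(g))."""
    return {g for g, desc in DELTA.items() if d <= desc}

def is_pattern_concept(A, d):
    return box_ext(A) == d and box_pat(d) == set(A)
```

Here ({1, 2}, {a, b}) is a pattern concept: the common pattern of objects 1 and 2 is {a, b}, and exactly those two objects subsume it.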
Pattern-based Hypotheses

G₊ and G₋ are the positive and negative examples for some goal attribute, with G₊ ∪ G₋ ⊆ G and G₊ ∩ G₋ = ∅.

A positive hypothesis h is a pattern intent of (G₊, (D, ⊓), δ) not subsumed by any negative example:

h ⋢ δ(g) for all g ∈ G₋.
Projections as an Approximation Tool

Motivation: complexity of computations in (D, ⊓); e.g., SUBGRAPH ISOMORPHISM, i.e., testing ⊑ for graphs, is NP-complete.

ψ is a projection (kernel operator) on an ordered set (D, ≤) if ψ is

- monotone: if x ≤ y, then ψ(x) ≤ ψ(y),
- contractive: ψ(x) ≤ x,
- idempotent: ψ(ψ(x)) = ψ(x).
Projections as an Approximation Tool

Example: a projection for labeled graphs: ψₖ takes a graph Γ to the set of its k-chains not dominated by other k-chains.

[Example graphs garbled in the transcript.]
Property of projections

Any projection of a complete semilattice (D, ⊓) is ⊓-preserving, i.e., for any X, Y ∈ D,

ψ(X ⊓ Y) = ψ(ψ(X) ⊓ ψ(Y)).

Example: the projection ψₖ for labeled graphs, which takes a graph to the set of its k-chains not dominated by other k-chains. [Example graphs garbled in the transcript.]
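The ⊓-preservation identity can be checked mechanically on a toy semilattice. A sketch under the same set-based assumption as before (descriptions are sets, ⊓ is intersection, ψ(S) = S ∩ KEEP):

```python
from itertools import chain, combinations

ITEMS = {"a", "b", "c"}
KEEP = {"a", "c"}

def psi(S):
    return S & KEEP

def subsets(universe):
    xs = sorted(universe)
    return [frozenset(c) for c in chain.from_iterable(
        combinations(xs, r) for r in range(len(xs) + 1))]

# psi(X ⊓ Y) == psi(psi(X) ⊓ psi(Y)) for every X, Y in the powerset:
preserving = all(
    psi(X & Y) == psi(psi(X) & psi(Y))
    for X in subsets(ITEMS) for Y in subsets(ITEMS)
)
```

The identity holds here because both sides reduce to X ∩ Y ∩ KEEP; the theorem says the same compatibility holds for any projection of a complete semilattice, which is what lets one compute with projected patterns instead of full ones.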
Projections and Representation Context

[Diagram: graphs and their projections; the lattice of graph sets and the lattice of projected graph sets; a representation context and a representation subcontext over attributes a, ..., f plus the goal attribute, with objects 1-7; the sides are connected by projections and related to the lattices by the Basic Theorem of FCA. The tables are garbled in the transcript.]
4-Projections

[Diagram: the lattice of extents {1,2,3,4}; {1,2,3}, {2,3,4}; {1,2}, {2,3}, {3,4}; {1}, {2}, {3}, {4} under 4-projections of the example graphs; positive examples 1, 2, 3, 4; negative example 6. The drawings are garbled in the transcript.]
3-Projections

[Diagram: the lattice of extents {1,2,3,4}; {1,2,3}; {1,2}, {3,4}; {1}, {2}, {3}, {4} under 3-projections; positive examples 1, 2, 3, 4; negative example 6. The drawings are garbled in the transcript.]
2-Projections

[Diagram: the lattice of extents {1,2,3,4}; {1,2}, {3,4} under 2-projections; positive examples 1, 2, 3, 4; negative example 6. The drawings are garbled in the transcript.]
Spam filtering

- First successful applications of concept-based hypotheses for filtering spam:
  L. Chaudron and N. Maille, Generalized Formal Concept Analysis, in Proc. 8th Int. Conf. on Conceptual Structures, ICCS'2000, G. Mineau and B. Ganter, Eds., Lecture Notes in Artificial Intelligence, 1867, 2000, pp. 357-370.
- Data Mining Cup (DMC, April-May 2003), http://www.data-mining-cup.de
- Organized by Technical University Chemnitz, the European Knowledge Discovery Network, and PrudSys AG.
- 514 participants from 199 universities in 39 countries.
- Training dataset: 8000 e-mail messages, 39% qualified as spam (positive examples) and the remaining 61% as nonspam (negative examples); 832 binary attributes and one numerical attribute (ID).
- Test dataset: 11177 messages.
Spam filtering

- The sixth place was taken by a model of F. Hutter (TU Darmstadt), which combined the "Naive Bayes" approach with concept-based (JSM) hypotheses. This was the best model among those that did not use the first (numerical) ID attribute, which was implicit time (it could be scaled ordinally).
- The sixteenth and seventeenth places in the competition were taken by models from TU Darmstadt that combined concept-based (JSM) hypotheses, decision trees, and Naive Bayes approaches using the majority-vote strategy.
Contents
1. Brief historical survey
2. JSM-method
3. Learning with Pattern Structures
4. Decision trees
5. Version spaces
6. Conclusions
Decision trees

Input: descriptions of positive and negative examples as sets of attribute values.

All vertices (except the root and the leaves) are labeled by attributes, and edges are labeled by values of the attributes (e.g., 0 or 1 in the case of binary attributes). Each leaf is labeled by a class, + or -: examples with all attribute values in the path leading from the root to the leaf belong to a certain class, either + or -.

Systems like ID3 [R. Quinlan 86] compute the value of the information gain (IG), or negentropy, for each vertex and each attribute not chosen in the branch above.

The algorithm sequentially extends branches of the tree by choosing an attribute with the highest information gain (one that "most strongly separates" objects from classes + and -).

Extension of a branch terminates when the next attribute value, together with the attribute values chosen before, uniquely classifies examples into one of the classes + or -. An algorithm can stop earlier to avoid overfitting.
Entropy

In real systems (like ID3, C4.5) the next chosen attribute should maximize some information functional, e.g., information gain (IG), based on the entropy w.r.t. the target attribute:

Ent(X) = - Sum_i p(c_i | X) * log2 p(c_i | X)

where the c_i are the values of the target attribute and p(c_i | X) is the conditional sample probability (for the training set) that an object having the set of attributes X belongs to the class c_i.
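The entropy and IG-based attribute choice described above can be sketched in a few lines; the toy training set, attribute names, and class labels below are made up for illustration, not taken from the slides:

```python
from math import log2
from collections import Counter

def entropy(labels):
    # Ent = -sum p_i * log2(p_i) over the class frequencies of `labels`
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attr):
    # IG of splitting `examples` (pairs of attribute-dict and class label) on `attr`
    labels = [y for _, y in examples]
    gain = entropy(labels)
    for value in {x[attr] for x, _ in examples}:
        subset = [y for x, y in examples if x[attr] == value]
        gain -= len(subset) / len(examples) * entropy(subset)
    return gain

# made-up training set with classes '+' and '-'
data = [({"w": 0, "f": 0}, "+"), ({"w": 0, "f": 0}, "+"),
        ({"w": 0, "f": 1}, "-"), ({"w": 1, "f": 1}, "-")]
```

An ID3-style algorithm would pick, at each vertex, the attribute with the largest `information_gain` among those not yet used on the branch.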
An example of a decision tree

Decision tree obtained by the IG-based algorithm:

[Table: training context with objects 1-7 (apple, grapefruit, kiwi, plum, toy cube, egg, tennis ball), attributes M = {w, y, g, b, f, s, r}, and target attribute "fruit"; the crosses of the table are not recoverable from the extraction.]

The tree: the root tests w (yes: examples 6, 7; no: test f); the test of f yields example 5 for "yes" and examples 1, 2, 3, 4 for "no".

- Note that the attributes f and w have the same IG value (a similar tree with f at the root is also optimal); IG-based algorithms usually take the first attribute with the same value of IG.
- The tree corresponds to three implications: {w} -> (-), {¬w, f} -> (-), {¬w, ¬f} -> (+).
An example of a decision tree

Decision tree obtained by the IG-based algorithm (same training context and tree as above):

- The closures of the implication premises make the corresponding negative and positive hypotheses.
- Note that the hypothesis obtained from {¬w, f} is not minimal, since there is a minimal hypothesis obtained from {f} contained in it. The minimal hypothesis {f} corresponds to a decision path of the IG-optimal tree with the attribute f at the root.
Decision trees in FCA terms

Training data is given by the context K± = (G+ ∪ G-, M, I+ ∪ I-) with the derivation operator (·)±. In FCA terms K± is the subposition of K+ and K-.

Assumption. The set of attributes M is dichotomized: for each attribute m ∈ M there is an attribute ¬m, a "negation" of m: g I ¬m iff not g I m.

- A subset of attributes A ⊆ M is noncontradictory if for no m both m ∈ A and ¬m ∈ A.
- A subset of attributes A ⊆ M is complete if for every m one has m ∈ A or ¬m ∈ A.

The construction of an arbitrary decision tree proceeds by sequentially choosing attributes. First we ignore the optimization aspect related to the information gain.

A sequence of attributes m1, ..., mk is called a decision path if {m1, ..., mk} is noncontradictory and there exists an object g ∈ G+ ∪ G- such that {m1, ..., mk} ⊆ g± (i.e., there is an example with this set of attributes).
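A minimal sketch of the noncontradictory/complete tests, assuming a hypothetical string encoding in which "~m" stands for the negated attribute ¬m:

```python
def negation(m):
    # hypothetical encoding of dichotomized attributes: "~w" is the negation of "w"
    return m[1:] if m.startswith("~") else "~" + m

def noncontradictory(A):
    # no attribute occurs together with its negation
    return all(negation(m) not in A for m in A)

def complete(A, M):
    # for every attribute m of M, either m or its negation is present in A
    return all(m in A or negation(m) in A for m in M)
```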
Decision trees in FCA terms

- A decision path m1, ..., mj is a (proper) subpath of a decision path n1, ..., nk if {m1, ..., mj} ⊆ {n1, ..., nk} (with proper inclusion, respectively).
- A decision path m1, ..., mk is called full if the objects having attributes m1, ..., mk are all either positive or negative examples.
- A full decision path is irredundant if none of its proper subpaths is a full decision path. The set of all chosen attributes in a full decision path can be considered as a sufficient condition for an object to belong to a class (+ or -).

A decision tree is a set of full decision paths.

- The closure of a decision path m1, ..., mk is the closure of the corresponding set of attributes, i.e., {m1, ..., mk}±±.
- A sequence of concepts with decreasing extents is called a descending chain.
- A chain starting at the top element of the lattice is called rooted.
Semiproduct of dichotomic scales

The semiproduct of two contexts K1 = (G1, M1, I1) and K2 = (G2, M2, I2) is defined by

K1 ⨅ K2 := (G1 × G2, M1 ∪̇ M2, ∇), where (g1, g2) ∇ m :⟺ gi Ii m for m ∈ Mi, i ∈ {1, 2}.

For example, the semiproduct of three dichotomic scales looks as follows (up to the order of rows, which is not recoverable from the extraction):

      a   ¬a  b   ¬b  c   ¬c
  1   ×       ×       ×
  2   ×       ×           ×
  3   ×           ×   ×
  4   ×           ×       ×
  5       ×   ×       ×
  6       ×   ×           ×
  7       ×       ×   ×
  8       ×       ×       ×
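The object intents of the semiproduct of dichotomic scales are exactly the complete noncontradictory subsets, so they can be enumerated directly; the "~" prefix encoding of negated attributes is an assumption of this sketch:

```python
from itertools import product

def semiproduct_intents(attrs):
    # object intents of the semiproduct of one dichotomic scale per attribute:
    # exactly the complete noncontradictory attribute subsets
    return [frozenset(choice) for choice in product(*[(m, "~" + m) for m in attrs])]

rows = semiproduct_intents(["a", "b", "c"])
```

For three attributes this yields the 8 rows of the table above.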
Semiproduct of dichotomic scales

[Figure: concept lattice of the semiproduct of three dichotomic scales; diagram vertices are labeled by intents, with atoms corresponding to the objects 1-8 and coatoms labeled a, ¬a, b, ¬b, c, ¬c. The diagram itself is not recoverable from the extraction.]
Decision trees vs. semiproducts of dichotomic scales

Consider the following context K = (G, M, I):

The set of objects G is of size |G| = 2^(|M|/2), and the relation I is such that the set of object intents is exactly the set of complete noncontradictory subsets of attributes.

In terms of FCA the context K is the semiproduct of n = |M|/2 dichotomic scales, K = D ⨅ ... ⨅ D (denoted by ⨅ⁿ D for short), where each dichotomic scale stands for a pair of attributes m, ¬m.

Proposition. Every decision path is a rooted descending chain in B(⨅ⁿ D), and every rooted descending chain consisting of concepts with nonempty extents in B(⨅ⁿ D) is a decision path.
Decision trees vs. semiproducts of dichotomic scales

To relate decision trees to the hypotheses introduced above, we consider again the contexts K+ = (G+, M, I+), K- = (G-, M, I-), and K± = (G+ ∪ G-, M, I+ ∪ I-). The context K± can be much smaller than ⨅ⁿ D, because the latter always has 2ⁿ objects, while the number of objects in the former is the number of examples. Also the lattice B(K±) can be much smaller than B(⨅ⁿ D).

Proposition. A full decision path m1, ..., mk corresponds to a rooted descending chain

({m1}±, {m1}±±) ≥ ({m1, m2}±, {m1, m2}±±) ≥ ... ≥ ({m1, ..., mk}±, {m1, ..., mk}±±)

of the line diagram of B(K±), and the closure of each full decision path m1, ..., mk is a hypothesis, either positive or negative. Moreover, for each minimal hypothesis H there is a full irredundant path m1, ..., mk such that {m1, ..., mk}±± = H.
Discussion of the propositions

The propositions illustrate the difference between hypotheses and irredundant decision paths.

- Hypotheses correspond to the "most cautious" (most specific) classifiers consistent with the data: they are least general generalizations of the descriptions of positive examples (i.e., of object intents).
- The shortest decision paths (those for which no decision tree contains full paths with proper subsets of their attribute values) correspond to the "most courageous" (or "most discriminant") classifiers: being the shortest possible rules, they are most general generalizations of positive example descriptions.
- It is not guaranteed that for a given training set there is a decision tree such that minimal hypotheses are among the closures of its paths.
- In general, to obtain all minimal hypotheses as closures of decision paths one needs to consider not only the paths optimal w.r.t. the information gain functional.

The issues of generality of generalizations, e.g., the relation between most specific and most general generalizations, are naturally captured in terms of version spaces.
Recalling the Information Gain

For dichotomized attributes the information gain is natural to define for a pair of attributes m, ¬m. For a decision path m1, ..., mk with X = {m1, ..., mk}:

IG(X, m) = Ent(X) - (|Xm±| / |X±|) * Ent(Xm) - (|X¬m±| / |X±|) * Ent(X¬m)

where Xm = X ∪ {m}, X¬m = X ∪ {¬m}, and for Y ⊆ M

Ent(Y) = - Sum_i p(c_i | Y) * log2 p(c_i | Y),

where the c_i are the values of the target attribute and p(c_i | Y) is the conditional sample probability (for the training set) that an object having the set of attributes Y belongs to the class c_i.
Information Gain is nonsensitive to closure

If the derivation operator (·)± is associated with the context K± = (G+ ∪ G-, M, I+ ∪ I-), then

{m1, ..., mk}± = ({m1, ..., mk}±±)±

by the property of the derivation operator (·)±: A± = A±±±.

Hence,

- Instead of considering decision paths, one can consider their closures without affecting the values of the information gain.
  In FCA terms: instead of the concept lattice B(⨅ⁿ D) one can consider the concept lattice B(K±) = B(G+ ∪ G-, M, I+ ∪ I-), which can be much smaller.
- If the implication m -> n holds in the context K± = (G+ ∪ G-, M, I+ ∪ I-), then an IG-based algorithm will not choose attribute n in the branch below a chosen m, and (by contraposition) will not choose ¬m in the branch below a chosen ¬n.
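The closure-invariance argument rests on the derivation-operator identity B± = (B±±)±. A small self-contained check, with a made-up context given as a dict from objects to intents:

```python
def extent(B, ctx):
    # B' : all objects whose intent contains every attribute of B
    return {g for g, intent in ctx.items() if B <= intent}

def closure(B, ctx):
    # B'' : intersection of the intents of the objects in B'
    objs = extent(B, ctx)
    if not objs:
        return set().union(*ctx.values())
    return set.intersection(*(ctx[g] for g in objs))

# made-up context: object -> intent
ctx = {1: {"m", "n"}, 2: {"m", "n", "p"}, 3: {"p"}}
```

Since the extent (and hence the class distribution entering IG) of a set of attributes equals the extent of its closure, closing a decision path changes nothing in the IG computation.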
Contents
1. Brief historical survey
2. JSM-method
3. Learning with Pattern Structures
4. Decision trees
5. Version spaces
6. Conclusions
Version spaces

T. Mitchell, Generalization as Search, Artificial Intelligence 18, no. 2, 1982.
T. Mitchell, Machine Learning, The McGraw-Hill Companies, 1997.

- An example language L_E that describes a set E of examples;
- A classifier language L_C that describes a set C of classifiers (elsewhere called concepts);
- A matching predicate M(c, e): we have M(c, e) iff e is an example of classifier c, or c matches e. The set of classifiers is (partially) ordered by a subsumption order: for c1, c2 ∈ C,
  c1 ⊑ c2 :⟺ for all e ∈ E, M(c1, e) implies M(c2, e);
- Sets E+ and E- of positive and negative examples of a target attribute, with E+ ∩ E- = ∅.
Version spaces

- Consistency predicate cons(c): cons(c) holds if for every e ∈ E+ the matching predicate M(c, e) holds and for every e ∈ E- the negation ¬M(c, e) holds.
- A version space is the set of all consistent classifiers: VS(L_C, L_E, M, E+, E-) := {c ∈ C | cons(c)}.
- Learning problem:
  Given L_C, L_E, M, E+, E-.
  Find the version space VS(L_C, L_E, M, E+, E-).
- Classification:
  A classifier c ∈ VS classifies an example e positively if c matches e; otherwise it classifies it negatively.
  An example e is n%-classified if no less than n% * |VS| classifiers classify it positively.
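A brute-force sketch of the consistency predicate and the version space, under the assumption (not the slides' general setting) that classifiers are conjunctions over binary attributes, represented as attribute sets, with "matches" = containment; the example data are made up:

```python
from itertools import chain, combinations

def matches(classifier, example):
    # a conjunctive classifier (an attribute set) matches an example iff it is contained in it
    return classifier <= example

def consistent(c, pos, neg):
    # cons(c): c matches every positive example and no negative one
    return all(matches(c, e) for e in pos) and not any(matches(c, e) for e in neg)

def version_space(attrs, pos, neg):
    # enumerate all conjunctions over `attrs` and keep the consistent ones
    cands = chain.from_iterable(combinations(sorted(attrs), k) for k in range(len(attrs) + 1))
    return {frozenset(c) for c in cands if consistent(frozenset(c), pos, neg)}

pos = [{"a", "b", "c"}, {"a", "b"}]
neg = [{"b", "c"}]
vs = version_space({"a", "b", "c"}, pos, neg)
```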
Version spaces in terms of boundary sets

T. Mitchell, Generalization as Search, Artificial Intelligence 18, no. 2, 1982.
T. Mitchell, Machine Learning, The McGraw-Hill Companies, 1997.

If every chain in the subsumption order has a minimal and a maximal element, a version space can be described by the sets of its most specific (S_VS) and most general (G_VS) elements:

S_VS := MIN(VS) = {c ∈ VS | there is no c' ∈ VS with c' ⊏ c},
G_VS := MAX(VS) = {c ∈ VS | there is no c' ∈ VS with c ⊏ c'}.
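With conjunctive classifiers as attribute sets, the boundary sets are just the maximal and minimal sets of the version space. A sketch, using a made-up version space:

```python
def s_set(vs):
    # most specific consistent classifiers: the maximal attribute sets of the version space
    return {c for c in vs if not any(c < d for d in vs)}

def g_set(vs):
    # most general consistent classifiers: the minimal attribute sets of the version space
    return {c for c in vs if not any(d < c for d in vs)}

# made-up version space of conjunctive classifiers
vs = {frozenset({"a"}), frozenset({"a", "b"})}
```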
Version spaces in terms of Galois connections

Formal context (E, C, I):

- E is the set of examples, containing disjoint sets of observed positive and negative examples of a target attribute: E ⊇ E+ ∪ E-, E+ ∩ E- = ∅;
- C is the set of classifiers;
- the relation I corresponds to the matching predicate M(c, e): for e ∈ E, c ∈ C, the relation e I c holds iff M(c, e);
- Ī is the complementary relation: e Ī c holds iff ¬M(c, e).

Proposition.
VS(E+, E-) = (E+)^I ∩ (E-)^Ī
Corollary: Merging version spaces

H. Hirsh, Generalizing Version Spaces, Machine Learning 17, 5-46, 1994.

For fixed L_E, L_C, M(c, e) and two sets E1+, E1- and E2+, E2- of positive and negative examples one has

VS(E1+ ∪ E2+, E1- ∪ E2-) = VS(E1+, E1-) ∩ VS(E2+, E2-).

Proof. By the property (A ∪ B)^I = A^I ∩ B^I,

VS(E1+ ∪ E2+, E1- ∪ E2-) = (E1+ ∪ E2+)^I ∩ (E1- ∪ E2-)^Ī
  = (E1+)^I ∩ (E2+)^I ∩ (E1-)^Ī ∩ (E2-)^Ī
  = ((E1+)^I ∩ (E1-)^Ī) ∩ ((E2+)^I ∩ (E2-)^Ī)
  = VS(E1+, E1-) ∩ VS(E2+, E2-).
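The merging corollary can be checked mechanically in the brute-force conjunctive-classifier model (a modeling assumption, not the slides' general languages L_E, L_C); the example data are made up:

```python
from itertools import chain, combinations

def version_space(attrs, pos, neg):
    # all conjunctive classifiers (attribute subsets) consistent with pos / neg
    cands = chain.from_iterable(combinations(sorted(attrs), k) for k in range(len(attrs) + 1))
    return {frozenset(c) for c in cands
            if all(set(c) <= e for e in pos) and not any(set(c) <= e for e in neg)}

attrs = {"a", "b", "c"}
p1, n1 = [{"a", "b"}], [{"c"}]
p2, n2 = [{"a", "b", "c"}], [{"b", "c"}]

# merging the two training sets intersects the two version spaces
merged = version_space(attrs, p1 + p2, n1 + n2)
```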
More corollaries: Classifications and closed sets

Proposition. The set of all 100%-classified examples defined by the version space VS(E+, E-) is given by the closure (E+)^II of the set of positive examples.

Interpretation of a closed set of examples:

Proposition. If (E+)^II = E+ (i.e., the set of positive examples is closed), then there cannot be any 100%-classified undetermined example.

Proposition. The set of examples that are classified positively by at least one element of the version space VS(E+, E-) is given by E ∖ (E-)^ĪĪ.
Classifier semilattices

Proposition. If the classifiers, ordered by subsumption, form a complete semilattice, then the version space is a complete subsemilattice for any sets of examples E+ and E-.

We use pattern structures here again:

B. Ganter and S. O. Kuznetsov, Pattern Structures and Their Projections, Proc. 9th Int. Conf. on Conceptual Structures, ICCS'01, G. Stumme and H. Delugach, Eds., Lecture Notes in Artificial Intelligence 2120, 2001, pp. 129-142.

Assumption: the set of all classifiers forms a complete meet-semilattice (D, ⊓).

Corollary: a dual join operation ⊔ is definable.
Pattern Structures

A pattern structure is a triple (G, (D, ⊓), δ), where

- G is a set of "examples",
- (D, ⊓) is a meet-semilattice of "descriptions",
- δ: G -> D maps each example to its description.

The subsumption order: c ⊑ d :⟺ c ⊓ d = c.

Derivation operators:

A□ := ⊓_{g ∈ A} δ(g)  for A ⊆ G,
d□ := {g ∈ G | d ⊑ δ(g)}  for d ∈ D.

A pair (A, d) is a pattern concept of (G, (D, ⊓), δ) if A□ = d and d□ = A; A is its extent and d its pattern intent.
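A sketch of the two derivation operators, instantiated with set intersection as the meet ⊓ (which recovers ordinary FCA); the example descriptions are made up:

```python
def box_set(A, delta, meet):
    # A□ : the meet of the descriptions of all examples in A
    descs = [delta[g] for g in A]
    d = descs[0]
    for x in descs[1:]:
        d = meet(d, x)
    return d

def box_desc(d, delta, subsumes):
    # d□ : all examples g with d ⊑ δ(g)
    return {g for g in delta if subsumes(d, delta[g])}

# set intersection as ⊓, containment as ⊑ (c ⊑ d iff c ⊓ d = c)
meet = lambda x, y: x & y
subsumes = lambda c, d: c <= d
delta = {1: frozenset({"m", "n"}), 2: frozenset({"m", "p"})}
```

Here ({1, 2}, {m}) is a pattern concept: the meet of the two descriptions is {m}, and {m} is subsumed exactly by the descriptions of 1 and 2.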
Pattern-based Hypotheses

G+ and G- are sets of positive and negative examples for a target attribute, G+ ∩ G- = ∅, G+ ∪ G- ⊆ G.

A positive hypothesis h is a pattern intent of (G+, (D, ⊓), δ) not subsumed by any negative example:

h ⋢ δ(g) for all g ∈ G-.
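A brute-force sketch of positive hypotheses under the set-intersection instantiation of ⊓; the example descriptions (including a negative example numbered 6, in the spirit of the slides' running example) are made up:

```python
from itertools import combinations

delta_pos = {1: {"m", "n"}, 2: {"m", "p"}}   # made-up descriptions of positive examples
delta_neg = {6: {"n", "p"}}                  # made-up description of the negative example

def positive_hypotheses():
    # meets (here: intersections) of descriptions of nonempty subsets of positives
    # that are not subsumed by any negative example's description
    hyps = set()
    for k in range(1, len(delta_pos) + 1):
        for A in combinations(delta_pos, k):
            d = set.intersection(*(delta_pos[g] for g in A))
            if not any(d <= e for e in delta_neg.values()):
                hyps.add(frozenset(d))
    return hyps
```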
Hypotheses vs. version spaces [Ganter, Kuznetsov 2003]

Definition. A positive example g ∈ G+ is hopeless iff δ(g)□ ∩ G- ≠ ∅.

Interpretation: g has a negative counterpart h ∈ G- such that every classifier which matches g also matches h.

Theorem 1. Suppose that the classifiers, ordered by subsumption, form a complete meet-semilattice (D, ⊓), and let (G, (D, ⊓), δ) denote the corresponding pattern structure. Then the following are equivalent:

1. The version space VS(G+, G-) is not empty.
2. (G+)□□ ∩ G- = ∅.
3. There are no hopeless positive examples and there is a unique minimal positive hypothesis h_min.

In this case, h_min = (G+)□, and the version space is a convex set in the lattice of all pattern intents ordered by subsumption, with maximal element h_min.
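The hopelessness test under the set-intersection instantiation: every classifier c with c ⊑ δ(g) also satisfies c ⊑ δ(h) exactly when δ(g) ⊑ δ(h), which here becomes set containment. The example descriptions are made up:

```python
subsumes = lambda c, d: c <= d   # set containment plays the role of ⊑

delta_pos = {1: frozenset({"m"}), 2: frozenset({"m", "n"})}  # made-up positives
delta_neg = {3: frozenset({"m", "p"})}                       # made-up negative

def hopeless(g):
    # g is hopeless iff some negative description subsumes δ(g): every classifier
    # matching g then matches that negative example too
    return any(subsumes(delta_pos[g], e) for e in delta_neg.values())
```

Here example 1 is hopeless (its description {m} is contained in the negative description {m, p}), so by Theorem 1 the version space for this data is empty.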
Hypotheses vs. version spaces

d ∈ D is a proper (positive) predictor if

- d□ ∩ G- = ∅ (d matches no negative example),
- d ⊑ δ(g) for some g ∈ G+, and
- e□ ∩ G- ≠ ∅ for every e ⊏ d.

Theorem 2. Let G+, G- be sets of positive and negative examples, and let MH(G+, G-) and PP(G+, G-) denote the sets of minimal positive hypotheses and proper positive predictors, respectively. Then

{d ∈ D | p ⊑ d ⊑ h for some p ∈ PP(G+, G-), h ∈ MH(G+, G-)} ⊆ VS(G+, G-),

and in the finite case every element of VS(G+, G-) lies between a proper predictor and a minimal hypothesis in the subsumption order, so the two sets coincide.
Proper predictors

[Figure: for the running example, a diagram relating the minimal hypothesis, the proper predictors, and the falsified generalizations (the (-)-intents), with negative example 6. The graph descriptions themselves are not recoverable from the extraction.]
Example. Boundaries of the Version Space

If disjunction is not allowed, then the most general boundary of the version space may be empty. If disjunction is allowed, then the most general boundary is equivalent to the disjunction of all positive example descriptions (the trivial generalization), but in general it can be of exponential size in the number of examples.

[The formulas and graph pictures of this slide are not recoverable from the extraction.]
Computing a Version Space

We order the set E = {e1, ..., en} of all examples so that the positive examples precede the negative ones.

The following notation is adapted from the standard formulation of the NextClosure algorithm:

- For A, B ⊆ E and i ∈ {1, ..., n} define A <i B iff ei ∈ B ∖ A and A ∩ {e1, ..., ei-1} = B ∩ {e1, ..., ei-1} (the lectic order).
- For A ⊆ E and i ∈ {1, ..., n} with ei ∉ A, define A ⊕ i := ((A ∩ {e1, ..., ei-1}) ∪ {ei})□□.
An algorithm

If the classifiers, ordered by subsumption, form a finite meet-semilattice, then the version space can be computed as follows:

1. If (E+)□□ ∩ E- ≠ ∅, then the version space is empty; else
2. the first element is h_min = (E+)□;
3. if d is an element of the version space, then the "next" element is d_next = (A ⊕ i)□, where A = d□ and i is the largest index such that A <i A ⊕ i and A ⊕ i contains no negative example.
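For comparison, here is a minimal sketch of the standard NextClosure enumeration that the slide adapts, listing all intents of a small made-up formal context in lectic order:

```python
def closure(B, ctx):
    # B'' in the context ctx (a dict mapping objects to intents)
    objs = [g for g, intent in ctx.items() if B <= intent]
    if not objs:
        return set().union(*ctx.values())
    return set.intersection(*(ctx[g] for g in objs))

def next_closure(B, M, ctx):
    # lectically next closed attribute set after B; M is a list fixing the order
    for i in range(len(M) - 1, -1, -1):
        m = M[i]
        if m in B:
            B = B - {m}
        else:
            C = closure(B | {m}, ctx)
            # accept only if no attribute smaller than m was newly added
            if all(x in B for x in C if x in M[:i]):
                return C
    return None

def all_intents(ctx):
    M = sorted(set().union(*ctx.values()))
    B = closure(set(), ctx)
    intents = [frozenset(B)]
    while (B := next_closure(B, M, ctx)) is not None:
        intents.append(frozenset(B))
    return intents

# made-up context: object -> intent
ctx = {1: {"a", "b"}, 2: {"b", "c"}, 3: {"a", "c"}}
```

The slide's algorithm replaces the plain closure B'' by the example-set closure A□□ and additionally skips closed sets containing negative examples.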
Conclusions

- Decision trees and version spaces are neatly expressed in terms of Galois connections and formal concepts.
- Under reasonable assumptions, version spaces can be computed as concept lattices.
- The set of classifiers between (in the sense of the generalization order) minimal hypotheses and proper predictors can be more interesting and/or more compact than a version space, since it introduces "restricted" disjunction over minimal hypotheses.
- Generally, FCA is a convenient tool for formalizing symbolic models of machine learning based on a generality relation.
Thank you