Induction of Concepts in Web Ontologies through Terminological Decision Trees
Nicola Fanizzi, Claudia d’Amato, Floriana Esposito
LACAM – Dipartimento di Informatica, Università degli Studi di Bari “Aldo Moro”
ECML/PKDD 2010 – Barcelona, Spain
Preliminaries Motivation
Context
In the context of the Semantic Web:
  next-generation knowledge bases expressed as ontologies
Problem with building ontologies:
  burdensome task
  Domain Expert ≠ Knowledge Engineer
Then:
  automated methods for learning concepts
  expressed in standard SW representations
  founded on Description Logics
Fanizzi, d’Amato, Esposito UniBa.IT Induction of Terminological Decision Trees ECML/PKDD 2010 2 / 32
Preliminaries State of the Art
Related Work
Early works
  focused on learnability, LCS op. for the CLASSIC family (ancestors of the DL languages) [Cohen et al., 1992, Cohen and Hirsh, 1994]
  KLUSTER: conceptual clustering in BACK [Kietz and Morik, 1994]
Approaches based on refinement operators
  ALER refinement operators [Badea and Nienhuys-Cheng, 2000]
  YINYANG: downward operator based on the notion of counterfactuals; examples expressed as most specific concepts: complex concept definitions [Iannone et al., 2007]
  DL-LEARNER: top-down GP algorithm, based on new downward operators, with heuristics that favor definitions of limited complexity [Lehmann and Hitzler, 2008, Lehmann and Hitzler, 2010]
  DL-FOIL: adapts FOIL to DL representations [Fanizzi et al., 2008]
Other approaches: hybrid languages [Rouveirol and Ventos, 2000, Kietz, 2002, Lisi and Esposito, 2008]
In this work
Introduce Terminological Decision Trees
  Induction, Classification, Conversion
  Evaluation
Outline
1 DL: Representation & Inference
    Syntax & Semantics
    DL Knowledge Bases
    Inference
2 Learning Concepts through TDTs
    Learning Problem
    Terminological Decision Trees
    Induction
    Classification
    Conversion
3 Evaluation
    Setup
    Results
4 Conclusions
DL: Representation & Inference
DL: Representation & Inference Syntax & Semantics
DLs Preliminaries I
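In DLs, axioms are inductively defined building on a vocabulary of
  $N_C$: set of primitive concept names
  $N_R$: set of primitive role names
  $N_I$: set of individual names
and syntax constructors

Set-theoretic semantics defined by interpretations $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$
  $\Delta^{\mathcal{I}}$: domain of the interpretation (non-empty)
  $\cdot^{\mathcal{I}}$: interpretation function that maps names to extensions:
    each $A \in N_C$ to a set $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$ and
    each $R \in N_R$ to $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$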
DLs Preliminaries II
ALC Syntax
grammar rules     names                        examples
C, D → ⊤          top concept
     | ⊥          bottom concept
     | A          primitive concept            Animal
     | ¬C         (full) concept negation      ¬Parent
     | C ⊓ D      concept conjunction          Person ⊓ Male
     | C ⊔ D      concept disjunction          Male ⊔ Female
     | ∃R.C       existential restriction      ∃hasChild.Male
     | ∀R.C       universal restriction        ∀hasChild.Female
DLs Preliminaries III
ALC Semantics
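construct = interpretation    OWL
$\top^{\mathcal{I}} = \Delta^{\mathcal{I}}$    owl:Thing
$\bot^{\mathcal{I}} = \emptyset$    owl:Nothing
$(\neg C)^{\mathcal{I}} = \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}}$    owl:complementOf
$(C \sqcap D)^{\mathcal{I}} = C^{\mathcal{I}} \cap D^{\mathcal{I}}$    owl:intersectionOf
$(C \sqcup D)^{\mathcal{I}} = C^{\mathcal{I}} \cup D^{\mathcal{I}}$    owl:unionOf
$(\exists R.C)^{\mathcal{I}} = \{x \mid \exists y : (x, y) \in R^{\mathcal{I}} \wedge y \in C^{\mathcal{I}}\}$    owl:someValuesFrom
$(\forall R.C)^{\mathcal{I}} = \{x \mid \forall y : (x, y) \in R^{\mathcal{I}} \rightarrow y \in C^{\mathcal{I}}\}$    owl:allValuesFrom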
In SW/DL: Open world assumption made
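The semantics above can be sketched as a small evaluator that computes the extension of an ALC concept in one given finite interpretation. The nested-tuple encoding of concepts and the names `extension`, `concept_ext` and `role_ext` are assumptions of this sketch, not part of the slides. Note that evaluating a single interpretation is not the same as entailment under the open world assumption, which has to consider all models.

```python
def extension(concept, domain, concept_ext, role_ext):
    """Extension of an ALC concept in one finite interpretation.

    Concepts are nested tuples (an encoding assumed for this sketch):
    "TOP", "BOTTOM", a primitive name, ("not", C), ("and", C, D),
    ("or", C, D), ("some", R, C), ("all", R, C).
    """
    def ext(c):
        return extension(c, domain, concept_ext, role_ext)

    if concept == "TOP":
        return set(domain)
    if concept == "BOTTOM":
        return set()
    if isinstance(concept, str):  # primitive concept name
        return set(concept_ext.get(concept, ()))
    op = concept[0]
    if op == "not":
        return set(domain) - ext(concept[1])
    if op == "and":
        return ext(concept[1]) & ext(concept[2])
    if op == "or":
        return ext(concept[1]) | ext(concept[2])
    if op in ("some", "all"):
        pairs = role_ext.get(concept[1], set())
        filler = ext(concept[2])
        if op == "some":
            # x has at least one R-successor in the filler
            return {x for x in domain
                    if any(a == x and b in filler for a, b in pairs)}
        # every R-successor of x is in the filler (vacuously true if none)
        return {x for x in domain
                if all(b in filler for a, b in pairs if a == x)}
    raise ValueError(f"unknown constructor: {op!r}")


# A hypothetical interpretation over four individuals.
domain = {"jocasta", "oedipus", "polyneikes", "thersandros"}
concept_ext = {"Male": {"oedipus", "thersandros"},
               "Female": {"jocasta", "polyneikes"}}
role_ext = {"hasChild": {("jocasta", "oedipus"), ("jocasta", "polyneikes"),
                         ("oedipus", "polyneikes"), ("polyneikes", "thersandros")}}

some_male_child = extension(("some", "hasChild", "Male"), domain, concept_ext, role_ext)
all_female_children = extension(("all", "hasChild", "Female"), domain, concept_ext, role_ext)
```

In this interpretation the universal restriction also holds for individuals with no children at all, which is the vacuous case of the implication in its definition.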
DL: Representation & Inference DL Knowledge Bases
Knowledge Bases I
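A knowledge base $\mathcal{K} = \langle \mathcal{T}, \mathcal{A} \rangle$ contains
  TBox $\mathcal{T}$: set of axioms $C \sqsubseteq D$ (resp. $C \equiv D$), meaning $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ (resp. $C^{\mathcal{I}} = D^{\mathcal{I}}$), where $C$ is atomic and $D$ is a concept description
  ABox $\mathcal{A}$: set of assertions like $C(a)$ and $R(a, b)$, meaning that $a^{\mathcal{I}} \in C^{\mathcal{I}}$ and $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in R^{\mathcal{I}}$
  $\mathsf{Ind}(\mathcal{A})$: set of individuals occurring in $\mathcal{A}$
Interpretations of interest (models) satisfy all axioms in $\mathcal{K}$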
Knowledge Bases II
Example (Œdipus’ family)
T = {
  Female ≡ ¬Male,
  Father ≡ Male ⊓ ∃hasChild.⊤,
  Mother ≡ Female ⊓ ∃hasChild.⊤,
  Parent ≡ Mother ⊔ Father,
  MotherWithNoDaughter ≡ Mother ⊓ ∀hasChild.¬Female
}
A = {
  Female(JOCASTA), Female(POLYNEIKES),
  Male(OEDIPUS), Male(THERSANDROS),
  hasChild(JOCASTA, OEDIPUS), hasChild(JOCASTA, POLYNEIKES),
  hasChild(OEDIPUS, POLYNEIKES), hasChild(POLYNEIKES, THERSANDROS),
  Parricide(OEDIPUS), ¬Parricide(THERSANDROS)
}
Parent(OEDIPUS): true
MotherWithNoDaughter(POLYNEIKES): ? (whether POLYNEIKES has a daughter is not known)
DL: Representation & Inference Inference
Inference & OWA
Q = ∃hasChild.(Parricide ⊓ ∃hasChild.¬Parricide)
(class of individuals with a child who is a parricide and has a child who is not a parricide)
K ⊨ Q(JOCASTA) ?
Problem of incomplete knowledge about the truth of α = Parricide(POLYNEIKES)
OWA (reasoning on the possible models): true
dividing interpretations (of K) into two classes:
  1 models of α and
  2 models of ¬α
In both cases JOCASTA satisfies Q (J-P-T / J-O-P)
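The case analysis can be replayed with a few lines of Python. This is only an illustrative sketch of the reasoning-by-cases argument for this specific query, not a DL reasoner: the function `satisfies_q` and the set names are assumptions, and each case simply adds POLYNEIKES to the known parricides or to the known non-parricides before looking for a witness chain.

```python
# Assertions from the example ABox that matter for Q.
has_child = {
    ("JOCASTA", "OEDIPUS"), ("JOCASTA", "POLYNEIKES"),
    ("OEDIPUS", "POLYNEIKES"), ("POLYNEIKES", "THERSANDROS"),
}
known_parricide = {"OEDIPUS"}
known_not_parricide = {"THERSANDROS"}


def satisfies_q(x, parricide, not_parricide):
    """True if x has a child y known to be a parricide who in turn has a
    child z known not to be a parricide (a witness chain x-y-z for Q)."""
    return any(
        parent == x and y in parricide
        and any(p == y and z in not_parricide for (p, z) in has_child)
        for (parent, y) in has_child
    )


# Without any assumption on POLYNEIKES no witness chain is known.
without_split = satisfies_q("JOCASTA", known_parricide, known_not_parricide)

# Case 1: models of alpha = Parricide(POLYNEIKES)  -> chain J-P-T.
case_alpha = satisfies_q(
    "JOCASTA", known_parricide | {"POLYNEIKES"}, known_not_parricide)

# Case 2: models of not alpha                      -> chain J-O-P.
case_not_alpha = satisfies_q(
    "JOCASTA", known_parricide, known_not_parricide | {"POLYNEIKES"})

# Q(JOCASTA) holds in every model because it holds in both classes.
entailed = case_alpha and case_not_alpha
```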
Learning Concepts through TDTs
Learning Concepts through TDTs Learning Problem
Concept Induction I
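Let $\mathcal{K} = (\mathcal{T}, \mathcal{A})$ be a DL knowledge base (acting as BK)

Definition (DL concept learning problem)
Given
  a target concept name $C$;
  a set of positive and negative examples for $C$:
    $S_C^+(\mathcal{A}) = \{a \in \mathsf{Ind}(\mathcal{A}) \mid \mathcal{K} \models C(a)\}$ and
    $S_C^-(\mathcal{A}) = \{b \in \mathsf{Ind}(\mathcal{A}) \mid \mathcal{K} \models \neg C(b)\}$
Find a concept description $D$ that satisfies
  $\mathcal{K} \models D(a) \quad \forall a \in S_C^+(\mathcal{A})$ and
  $\mathcal{K} \models \neg D(b) \quad \forall b \in S_C^-(\mathcal{A})$
Then the induced axiom $C \equiv D$ can be added to $\mathcal{K}$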
Concept Induction II
Example (car checking [Blockeel and De Raedt, 1997])
T = {
  Gear ⊑ Replaceable,
  Chain ⊑ Replaceable,
  Engine ⊑ ¬Replaceable,
  Wheel ⊑ ¬Replaceable,
  SendBack ⊑ ¬(Fix ⊔ Ok),
  Fix ⊑ ¬(Ok ⊔ SendBack),
  Ok ⊑ ¬(SendBack ⊔ Fix)
}
Concept Induction III
Example (cont’d)
The original examples can be encoded as assertions:
{
  Machine(M1), hasPart(M1, G1), Gear(G1), Worn(G1),
  hasPart(M1, C1), Chain(C1), Worn(C1),
  Machine(M2), hasPart(M2, E2), Engine(E2), Worn(E2),
  hasPart(M2, C2), Chain(C2), Worn(C2),
  Machine(M3), hasPart(M3, W3), Wheel(W3), Worn(W3),
  Machine(M4)
} ⊆ A
Given this KB and the example sets $S_C^+(\mathcal{A}) = \{M2, M3\}$ and $S_C^-(\mathcal{A}) = \{M1, M4\}$,
a good definition for C = SendBack may be:
SendBack ≡ Machine ⊓ ∃hasPart.(Worn ⊓ ¬Replaceable)
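As a quick sanity check, the proposed definition can be evaluated against the asserted facts. The sketch below is a closed-world check over the listed assertions only (an assumption made for simplicity; a DL reasoner would decide entailment under the open world assumption). The dictionary layout and the name `send_back` are hypothetical, and the part of M3 is taken to be the wheel W3.

```python
machines = {"M1", "M2", "M3", "M4"}
has_part = {"M1": {"G1", "C1"}, "M2": {"E2", "C2"}, "M3": {"W3"}, "M4": set()}
part_type = {"G1": "Gear", "C1": "Chain", "E2": "Engine",
             "C2": "Chain", "W3": "Wheel"}
worn = {"G1", "C1", "E2", "C2", "W3"}

# From the TBox: Gear and Chain are Replaceable, Engine and Wheel are not.
not_replaceable_types = {"Engine", "Wheel"}


def send_back(machine):
    """Machine AND EXISTS hasPart.(Worn AND NOT Replaceable), closed-world."""
    return machine in machines and any(
        part in worn and part_type[part] in not_replaceable_types
        for part in has_part[machine]
    )


covered = {m for m in machines if send_back(m)}
not_covered = machines - covered
```

On these assertions the definition covers M2 (worn engine) and M3 (worn wheel), while M1 has only replaceable worn parts and M4 has no known parts.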
Learning Concepts through TDTs Terminological Decision Trees
Terminological Decision Trees I
First-order logical decision trees (FOLDTs) are defined [Blockeel and De Raedt, 1998] as binary decision trees in which
  1 the nodes contain tests in the form of FOL formulae;
  2 left and right branches stand, resp., for the truth values true and false determined by the test evaluation;
  3 different nodes may share variables, with some limitations

Terminological decision trees (TDTs) extend this definition, allowing DL concept descriptions as (variable-free) node tests
Terminological Decision Trees II
A TDT providing the definition of the SendBack concept
(left branch: test true; right branch: test false)

∃hasPart.⊤
  true:  ∃hasPart.Worn
    true:  ∃hasPart.(Worn ⊓ ¬Replaceable)
      true:  SendBack
      false: ¬SendBack (⊑ Fix)
    false: ¬SendBack (⊑ Ok)
  false: ¬SendBack (⊑ Machine)
Learning Concepts through TDTs Induction
Induction of TDTs – base case
function INDUCETDTREE(C; D; Ps, Ns, Us): TDT;
  C: concept name;
  D: current description;
  Ps, Ns, Us: sets of (positive, negative, unlabeled) training individuals;
const θ: purity threshold
begin
  Initialize new TDT T;
  if |Ps| = 0 and |Ns| = 0 then
    begin
      if Pr+ ≥ Pr− then T.root ← C else T.root ← ¬C;
      return T;
    end
  if |Ns| = 0 and |Ps|/(|Ps| + |Us|) > θ then
    begin T.root ← C; return T; end
  if |Ps| = 0 and |Ns|/(|Ns| + |Us|) > θ then
    begin T.root ← ¬C; return T; end
  { ... }
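The three stopping tests of the base case can be sketched as a single function. The name `leaf_label` and the string labels are assumptions of this sketch; `prior_pos` and `prior_neg` stand for the quantities written Pr+ and Pr− in the pseudocode, whose computation is not given on the slide.

```python
def leaf_label(n_pos, n_neg, n_unl, theta, prior_pos, prior_neg):
    """Return "C" or "not C" if the node becomes a leaf, else None.

    n_pos, n_neg, n_unl: sizes of Ps, Ns, Us; theta: purity threshold.
    """
    if n_pos == 0 and n_neg == 0:
        # No labeled individuals left: fall back on the prior probabilities.
        return "C" if prior_pos >= prior_neg else "not C"
    if n_neg == 0 and n_pos / (n_pos + n_unl) > theta:
        return "C"
    if n_pos == 0 and n_neg / (n_neg + n_unl) > theta:
        return "not C"
    return None  # the recursive case applies
```

The divisions are safe because the first test has already handled the situation in which both labeled sets are empty.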
Induction of TDTs – recursive case
  { ... }
  Specs ← GENERATENEWCONCEPTS(D, Ps, Ns);
  D_best ← SELECTBESTCONCEPT(Specs, Ps, Ns, Us);
  ((P^l, N^l, U^l), (P^r, N^r, U^r)) ← SPLIT(D_best, Ps, Ns, Us);
  T.root ← D_best;
  T.left ← INDUCETDTREE(C, D ⊓ D_best, P^l, N^l, U^l);
  T.right ← INDUCETDTREE(C, D ⊓ ¬D_best, P^r, N^r, U^r);
  return T;
end

The (im)purity measure is based on the Gini index
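The slide only states that the measure is based on the Gini index, so the following is a sketch of the textbook two-class Gini impurity and of a size-weighted split score, not the exact heuristic used by the authors. In particular, ignoring the unlabeled individuals is an assumption of this sketch, and the function names are hypothetical.

```python
def gini(n_pos, n_neg):
    """Two-class Gini impurity 1 - p^2 - (1 - p)^2; 0.0 for an empty set."""
    total = n_pos + n_neg
    if total == 0:
        return 0.0
    p = n_pos / total
    return 1.0 - p * p - (1.0 - p) ** 2


def split_impurity(left, right):
    """Size-weighted impurity of a binary split; each side is (n_pos, n_neg)."""
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    if n == 0:
        return 0.0
    return (n_left * gini(*left) + n_right * gini(*right)) / n


def select_best(candidates):
    """candidates maps a concept name to its (left, right) counts;
    return the name whose split has the lowest weighted impurity."""
    return min(candidates, key=lambda name: split_impurity(*candidates[name]))


best = select_best({
    "A": ((2, 0), (0, 2)),   # separates positives from negatives perfectly
    "B": ((1, 1), (1, 1)),   # leaves both sides fully mixed
})
```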
Learning Concepts through TDTs Classification
TDTs – Classification of individuals
function CLASSIFY(a: individual, T: TDT, K: KB): concept;
begin
  1 N ← ROOT(T);
  2 while ¬LEAF(N, T) do
      2.1 (D, T_left, T_right) ← INODE(N);
      2.2 if K ⊨ D(a) then N ← ROOT(T_left)
      2.3 elseif K ⊨ ¬D(a) then N ← ROOT(T_right)
      2.4 else return ⊤
  3 (D, ·, ·) ← INODE(N);
  4 return D;
end

Observation: To avoid unknown answers due to the OWA (test failure on both branches), use a weaker right-branch test (2.3): K ⊭ D(a)
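A minimal Python sketch of the classification walk follows. The reasoner is replaced by a hypothetical three-valued `membership` callback that returns `True` when K entails D(a), `False` when K entails ¬D(a), and `None` when neither can be derived; the class `Node`, the string encoding of concepts and the `weak_right` flag (standing for the weaker right-branch test) are assumptions of this sketch.

```python
class Node:
    """A TDT node: inner nodes carry a test concept, leaves carry a label."""

    def __init__(self, concept, left=None, right=None):
        self.concept, self.left, self.right = concept, left, right

    def is_leaf(self):
        return self.left is None and self.right is None


def classify(individual, tree, membership, weak_right=False):
    """Follow the tests from the root; return the leaf label, or "TOP"
    when a test is undecided and the weak right-branch test is off."""
    node = tree
    while not node.is_leaf():
        answer = membership(node.concept, individual)
        if answer is True:
            node = node.left
        elif answer is False or weak_right:
            node = node.right
        else:
            return "TOP"
    return node.concept


tree = Node("some hasPart.TOP",
            Node("some hasPart.Worn",
                 Node("some hasPart.(Worn and not Replaceable)",
                      Node("SendBack"), Node("not SendBack")),
                 Node("not SendBack")),
            Node("not SendBack"))

# Hypothetical reasoner answers; missing entries mean "unknown".
answers = {
    ("some hasPart.TOP", "M2"): True,
    ("some hasPart.Worn", "M2"): True,
    ("some hasPart.(Worn and not Replaceable)", "M2"): True,
}


def membership(concept, individual):
    return answers.get((concept, individual))


label_m2 = classify("M2", tree, membership)
label_m4_strict = classify("M4", tree, membership)
label_m4_weak = classify("M4", tree, membership, weak_right=True)
```

With the strict test the machine with no known parts gets the uninformative answer, while the weak test sends it down the right branch at every undecided node.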
Learning Concepts through TDTs Conversion
Conversion – TDTs to DL Concepts I
function DERIVEDEFINITION(C, T): concept description;
  C: concept name;
  T: TDT;
begin
  1 S ← ASSOCIATE(C, T, ⊤);
  2 return $\bigsqcup_{D \in S} D$;
end
Conversion – TDTs to DL Concepts II
function ASSOCIATE(C; T; Dc): set of descriptions;
  C: concept name;
  T: TDT;
  Dc: current concept description
begin
  1 N ← ROOT(T);
  2 (Dn, T_left, T_right) ← INODE(N);
  3 if LEAF(N, T) then
      1 if Dn = C then return {Dc}; else return ∅;
    else
      1 S_left ← ASSOCIATE(C, T_left, Dc ⊓ Dn);
      2 S_right ← ASSOCIATE(C, T_right, Dc ⊓ ¬Dn);
      3 return S_left ∪ S_right;
end
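The two conversion functions can be sketched in Python as follows. Concepts are plain strings and each path is kept as a tuple of conjuncts, which is an encoding chosen for this sketch; `Node`, `associate` and `derive_definition` are hypothetical names. Returning `"BOTTOM"` when no leaf carries the target label reflects the fact that an empty disjunction is equivalent to ⊥.

```python
class Node:
    """A TDT node: inner nodes carry a test concept, leaves carry a label."""

    def __init__(self, concept, left=None, right=None):
        self.concept, self.left, self.right = concept, left, right

    def is_leaf(self):
        return self.left is None and self.right is None


def associate(target, tree, path=("TOP",)):
    """Collect, for every leaf labeled `target`, the conjuncts on its path."""
    if tree.is_leaf():
        return [path] if tree.concept == target else []
    positive = path + (tree.concept,)               # left branch: test holds
    negative = path + (f"not ({tree.concept})",)    # right branch: test fails
    return (associate(target, tree.left, positive)
            + associate(target, tree.right, negative))


def derive_definition(target, tree):
    """Disjunction, over the target leaves, of the conjunction of path tests."""
    disjuncts = ["(" + " and ".join(p) + ")" for p in associate(target, tree)]
    return " or ".join(disjuncts) if disjuncts else "BOTTOM"


tree = Node("some hasPart.TOP",
            Node("some hasPart.Worn",
                 Node("some hasPart.(Worn and not Replaceable)",
                      Node("SendBack"), Node("not SendBack")),
                 Node("not SendBack")),
            Node("not SendBack"))

definition = derive_definition("SendBack", tree)
```

The derived description is typically redundant (here the first two conjuncts are implied by the last one); simplification is left out of the sketch.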
Evaluation
Evaluation Setup
Evaluation – Setup
System TermiTIS applied to classification problems
  50 random queries per ontology, generated by composition of 2 through 8 concepts built by means of ALC constructors
  .632 bootstrap strategy
  DL reasoner PELLET ver. 2 employed to decide the actual class-membership w.r.t. the queries
  Default threshold (θ = .05)
  OWL ontologies selected from standard repositories

ontology    DL language   #concepts  #object prop's  #datatype prop's  #individuals
FSM         SOF(D)        20         10              7                 37
MDM0.73     ALCHOF(D)     196        22              3                 112
WINES       ALCOF(D)      75         12              1                 161
BIOPAX      ALCIF(D)      74         70              40                323
HDISEASE    ALCIF(D)      1498       10              15                639
NTN         SHIF(D)       47         27              8                 676
FINANCIAL   ALCIF         60         16              0                 1000
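For readers unfamiliar with the .632 bootstrap, a generic sketch is given below; how TermiTIS applies it in detail is not stated on the slide, so this shows only the standard procedure. A training sample of the same size as the data is drawn with replacement, the individuals never drawn form the test set, and the usual .632 estimator weights the two error rates. The function names are assumptions, and the items are assumed to be hashable.

```python
import random


def bootstrap_split(items, rng):
    """Draw len(items) training items with replacement; the items that were
    never drawn (out-of-bag) form the test set."""
    n = len(items)
    train = [items[rng.randrange(n)] for _ in range(n)]
    drawn = set(train)
    test = [x for x in items if x not in drawn]
    return train, test


def estimate_632(resubstitution_error, out_of_bag_error):
    """Standard .632 bootstrap combination of the two error rates."""
    return 0.368 * resubstitution_error + 0.632 * out_of_bag_error


rng = random.Random(0)            # fixed seed so the sketch is repeatable
items = list(range(100))          # stand-ins for the individuals of an ontology
train, test = bootstrap_split(items, rng)
```

On average about 63.2% of the distinct items end up in the training sample, which is where the name comes from.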
Evaluation Results
Performance
Compare the classification of the test individuals given by the induced trees with the deductive one provided by a reasoner

inductive vs. deductive classification
  match case: −1 vs. −1, 0 vs. 0, +1 vs. +1;
  omission error case: 0 vs. −1, 0 vs. +1;
  commission error case: −1 vs. +1, +1 vs. −1;
  induction case: −1 vs. 0, +1 vs. 0
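The four cases can be turned into rates with a short helper. This is a sketch under the assumption that each test individual yields a pair (inductive answer, deductive answer) with values in {−1, 0, +1}; the names `outcome` and `rates` are hypothetical.

```python
from collections import Counter


def outcome(inductive, deductive):
    """Map a pair of answers in {-1, 0, +1} to one of the four cases."""
    if inductive == deductive:
        return "match"
    if inductive == 0:
        return "omission"      # tree undecided, reasoner definite
    if deductive == 0:
        return "induction"     # tree definite, reasoner undecided
    return "commission"        # opposite definite answers


def rates(pairs):
    """Percentage of each case over a non-empty list of answer pairs."""
    if not pairs:
        raise ValueError("at least one test individual is required")
    counts = Counter(outcome(i, d) for i, d in pairs)
    return {case: 100.0 * counts[case] / len(pairs)
            for case in ("match", "commission", "omission", "induction")}


example = rates([(1, 1), (0, 0), (-1, 1), (0, -1),
                 (1, 0), (-1, -1), (1, 1), (0, 1)])
```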
Results I
ontology    match rate (%)  commission rate (%)  omission rate (%)  induction rate (%)
FSM         96.68±01.98     00.99±01.35          00.02±00.18        02.31±00.51
MDM0.73     93.96±05.44     00.39±00.61          03.50±04.16        02.15±01.47
WINES       74.36±25.63     00.67±04.63          12.46±14.28        12.13±23.49
BIOPAX      96.51±06.03     01.30±05.72          02.19±00.51        00.00±00.00
HDISEASE    78.60±39.79     00.02±00.10          01.54±06.01        19.82±39.17
NTN         91.65±15.89     00.01±00.09          00.36±01.58        07.98±14.60
FINANCIAL   96.21±10.48     02.14±10.07          00.16±00.55        01.49±00.16
Results II
Examples of induced concepts and original queries
BIOPAX
  induced:  (Or (And physicalEntity protein) dataSource)
  original: (Or (And (And dataSource externalReferenceUtilityClass)
                     (ForAll ORGANISM (ForAll CONTROLLED physicalInteraction)))
                protein)

NTN
  induced:  (Or EvilSupernaturalBeing (Not God))
  original: (Not God)

FINANCIAL
  induced:  (Or (Not Finished) NotPaidFinishedLoan Weekly)
  original: (Or LoanPayment (Not NoProblemsFinishedLoan))
Conclusions
Conclusions & Ongoing Work

Introduced terminological decision trees, plus a new method for learning concepts in DLs that support the standard SW ontology languages
TERMITIS system
  top-down tree induction, an adaptation of standard tree-induction methods
  classification
  conversion
Experiments on various ontologies show the method to be effective and robust (high match rate, few commission errors)

Ongoing work:
  Experiments with domain experts (ontology population)
  More expressive DLs (plus new refinement operators)
    currently KBs are represented with expressive DLs, but concepts are built with ALCQ constructors using concept names as atoms
  Impurity indices to exploit the uncertainty related to the unlabeled individuals
  Derive new hierarchical clustering algorithms
time for questions
Many thanks for attending this talk
comments / questions ?
(also, meet me @ Poster Session)
Offline:
Nicola Fanizzi [email protected]
Claudia d’Amato [email protected]
Floriana Esposito [email protected]
References

Badea, L. and Nienhuys-Cheng, S.-H. (2000). A refinement operator for description logics. In Cussens, J. and Frisch, A., editors, Proceedings of the 10th International Conference on Inductive Logic Programming, volume 1866 of LNAI, pages 40–59. Springer.

Blockeel, H. and De Raedt, L. (1997). Experiments with top-down induction of first order decision trees. Technical Report CW 247, Dept. of Computer Science, K.U. Leuven.

Blockeel, H. and De Raedt, L. (1998). Top-down induction of first-order logical decision trees. Artificial Intelligence, 101(1-2):285–297.

Cohen, W., Borgida, A., and Hirsh, H. (1992). Computing the least common subsumers in description logic. In Swartout, W., editor, Proceedings of the 10th National Conference on Artificial Intelligence, pages 754–760. MIT Press.

Cohen, W. and Hirsh, H. (1994). Learning the CLASSIC description logic. In Torasso, P. et al., editors, Proceedings of the 4th International Conference on the Principles of Knowledge Representation and Reasoning, pages 121–133. Morgan Kaufmann.

Fanizzi, N., d’Amato, C., and Esposito, F. (2008). DL-FOIL: Concept learning in Description Logics. In Zelezny, F. and Lavrac, N., editors, Proceedings of the 18th International Conference on Inductive Logic Programming, ILP2008, volume 5194 of LNAI, pages 107–121. Springer.

Iannone, L., Palmisano, I., and Fanizzi, N. (2007). An algorithm based on counterfactuals for concept learning in the semantic web. Applied Intelligence, 26(2):139–159.

Kietz, J.-U. (2002). Learnability of description logic programs. In Matwin, S. and Sammut, C., editors, Proceedings of the 12th International Conference on Inductive Logic Programming, volume 2583 of LNAI, pages 117–132, Sydney. Springer.

Kietz, J.-U. and Morik, K. (1994). A polynomial approach to the constructive induction of structural knowledge. Machine Learning, 14(2):193–218.

Lehmann, J. and Hitzler, P. (2008). Foundations of refinement operators for description logics. In Blockeel, H. et al., editors, Proceedings of the 17th International Conference on Inductive Logic Programming, ILP2007, volume 4894 of LNCS, pages 161–174. Springer.

Lehmann, J. and Hitzler, P. (2010). Concept learning in description logics using refinement operators. Machine Learning, 78(1-2):203–250.

Lisi, F. and Esposito, F. (2008). Foundations of onto-relational learning. In Zelezny, F. and Lavrac, N., editors, Proceedings of the 18th International Conference on Inductive Logic Programming, ILP2008, volume 5194 of LNAI, pages 158–175.

Rouveirol, C. and Ventos, V. (2000). Towards learning in CARIN-ALN. In Cussens, J. and Frisch, A., editors, Proceedings of the 10th International Conference on Inductive Logic Programming, volume 1866 of LNAI, pages 191–208. Springer.