ecml2010 slides

Induction of Concepts in Web Ontologies through Terminological Decision Trees. Nicola Fanizzi, Claudia d’Amato, Floriana Esposito. LACAM – Dipartimento di Informatica, Università degli Studi di Bari “Aldo Moro”. ECML/PKDD 2010 – Barcelona, Spain


Page 1: Ecml2010 Slides

Induction of Concepts in Web Ontologies through Terminological Decision Trees

Nicola Fanizzi Claudia d’Amato Floriana Esposito

LACAM – Dipartimento di Informatica, Università degli Studi di Bari “Aldo Moro”

ECML/PKDD 2010 – Barcelona, Spain

Page 2: Ecml2010 Slides

Preliminaries Motivation

Context

In the context of the Semantic Web: next-generation knowledge bases expressed as ontologies

Problem with building ontologies: a burdensome task; Domain Expert ≠ Knowledge Engineer

Hence: automated methods for learning concepts expressed in standard SW representations founded on Description Logics

Fanizzi, d’Amato, Esposito UniBa.IT Induction of Terminological Decision Trees ECML/PKDD 2010 2 / 32

Page 3: Ecml2010 Slides

Preliminaries State of the Art

Related Work

Early works

focused on learnability; LCS operator for the CLASSIC family (ancestors of the DL languages) [Cohen et al., 1992, Cohen and Hirsh, 1994]
KLUSTER: conceptual clustering in BACK [Kietz and Morik, 1994]

approaches based on refinement operators

ALER refinement operators [Badea and Nienhuys-Cheng, 2000]
YINYANG: downward operator based on the notion of counterfactuals; examples expressed as most specific concepts: complex concept definitions [Iannone et al., 2007]
DL-LEARNER: top-down GP algorithm, based on new downward operators, with a heuristic that favors definitions of limited complexity [Lehmann and Hitzler, 2008, Lehmann and Hitzler, 2010]
DL-FOIL: adapts FOIL to DL representations [Fanizzi et al., 2008]

Other approaches: hybrid languages [Rouveirol and Ventos, 2000, Kietz, 2002, Lisi and Esposito, 2008]


Page 4: Ecml2010 Slides

Preliminaries State of the Art

In this work

Introduce Terminological Decision Trees (TDTs): induction, classification, conversion; evaluation


Page 5: Ecml2010 Slides

Preliminaries State of the Art

Outline

1 DL: Representation & Inference
    Syntax & Semantics
    DL Knowledge Bases
    Inference

2 Learning Concepts through TDTs
    Learning Problem
    Terminological Decision Trees
    Induction
    Classification
    Conversion

3 Evaluation
    Setup
    Results

4 Conclusions


Page 6: Ecml2010 Slides

DL: Representation & Inference


Page 7: Ecml2010 Slides

DL: Representation & Inference Syntax & Semantics

DLs Preliminaries I

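In DLs, concept descriptions and axioms are defined inductively, building on a vocabulary of

$N_C$: set of primitive concept names
$N_R$: set of primitive role names
$N_I$: set of individual names

and a set of syntax constructors

Set-theoretic semantics is defined by interpretations $\mathcal{I} = (\Delta^{\mathcal{I}}, \cdot^{\mathcal{I}})$:
$\Delta^{\mathcal{I}}$: domain of the interpretation (non-empty)
$\cdot^{\mathcal{I}}$: interpretation function, mapping names to extensions:

each $A \in N_C$ to a set $A^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}}$, each $R \in N_R$ to $R^{\mathcal{I}} \subseteq \Delta^{\mathcal{I}} \times \Delta^{\mathcal{I}}$, and each $a \in N_I$ to an element $a^{\mathcal{I}} \in \Delta^{\mathcal{I}}$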


Page 8: Ecml2010 Slides

DL: Representation & Inference Syntax & Semantics

DLs Preliminaries II

ALC Syntax

grammar rule : name, example

C, D → ⊤ : top concept
  | ⊥ : bottom concept
  | A : primitive concept, e.g. Animal
  | ¬C : (full) concept negation, e.g. ¬Parent
  | C ⊓ D : concept conjunction, e.g. Person ⊓ Male
  | C ⊔ D : concept disjunction, e.g. Male ⊔ Female
  | ∃R.C : existential restriction, e.g. ∃hasChild.Male
  | ∀R.C : universal restriction, e.g. ∀hasChild.Female


Page 9: Ecml2010 Slides

DL: Representation & Inference Syntax & Semantics

DLs Preliminaries III

ALC Semantics

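construct interpretation : OWL

$\top^{\mathcal{I}} = \Delta^{\mathcal{I}}$ : owl:Thing

$\bot^{\mathcal{I}} = \emptyset$ : owl:Nothing

$(\neg C)^{\mathcal{I}} = \Delta^{\mathcal{I}} \setminus C^{\mathcal{I}}$ : owl:complementOf

$(C \sqcap D)^{\mathcal{I}} = C^{\mathcal{I}} \cap D^{\mathcal{I}}$ : owl:intersectionOf

$(C \sqcup D)^{\mathcal{I}} = C^{\mathcal{I}} \cup D^{\mathcal{I}}$ : owl:unionOf

$(\exists R.C)^{\mathcal{I}} = \{x \mid \exists y : (x, y) \in R^{\mathcal{I}} \wedge y \in C^{\mathcal{I}}\}$ : owl:someValuesFrom

$(\forall R.C)^{\mathcal{I}} = \{x \mid \forall y : (x, y) \in R^{\mathcal{I}} \rightarrow y \in C^{\mathcal{I}}\}$ : owl:allValuesFrom

In SW/DL reasoning, the Open World Assumption (OWA) is made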
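To make the semantics table concrete, here is a minimal Python sketch that computes the extension $C^{\mathcal{I}}$ of an ALC concept in one given finite interpretation, clause by clause. The class names (Top, Atom, Exists, Interpretation, ...) are illustrative assumptions, not part of any DL library. Note that this evaluates a concept in a single interpretation; entailment under the OWA quantifies over all models of the KB, which is what a DL reasoner decides.

```python
from dataclasses import dataclass
from typing import Dict, Set, Tuple, Union


# One immutable class per ALC constructor (abstract syntax of concepts).
@dataclass(frozen=True)
class Top:
    pass

@dataclass(frozen=True)
class Bottom:
    pass

@dataclass(frozen=True)
class Atom:
    name: str

@dataclass(frozen=True)
class Not:
    arg: "Concept"

@dataclass(frozen=True)
class And:
    left: "Concept"
    right: "Concept"

@dataclass(frozen=True)
class Or:
    left: "Concept"
    right: "Concept"

@dataclass(frozen=True)
class Exists:
    role: str
    filler: "Concept"

@dataclass(frozen=True)
class Forall:
    role: str
    filler: "Concept"

Concept = Union[Top, Bottom, Atom, Not, And, Or, Exists, Forall]


@dataclass
class Interpretation:
    domain: Set[str]                        # Δ^I (non-empty)
    concepts: Dict[str, Set[str]]           # A^I ⊆ Δ^I for each concept name A
    roles: Dict[str, Set[Tuple[str, str]]]  # R^I ⊆ Δ^I × Δ^I for each role name R

    def successors(self, role: str, x: str) -> Set[str]:
        """All y with (x, y) ∈ R^I."""
        return {y for (s, y) in self.roles.get(role, set()) if s == x}

    def ext(self, c: Concept) -> Set[str]:
        """Compute C^I following the semantics table."""
        if isinstance(c, Top):
            return set(self.domain)
        if isinstance(c, Bottom):
            return set()
        if isinstance(c, Atom):
            return set(self.concepts.get(c.name, set()))
        if isinstance(c, Not):
            return self.domain - self.ext(c.arg)
        if isinstance(c, And):
            return self.ext(c.left) & self.ext(c.right)
        if isinstance(c, Or):
            return self.ext(c.left) | self.ext(c.right)
        if isinstance(c, Exists):
            filler = self.ext(c.filler)
            return {x for x in self.domain if self.successors(c.role, x) & filler}
        if isinstance(c, Forall):
            filler = self.ext(c.filler)
            return {x for x in self.domain if self.successors(c.role, x) <= filler}
        raise TypeError(f"not an ALC concept: {c!r}")


interp = Interpretation(
    domain={"ann", "bob", "carl"},
    concepts={"Male": {"bob", "carl"}},
    roles={"hasChild": {("ann", "bob"), ("bob", "carl")}},
)
print(sorted(interp.ext(Exists("hasChild", Atom("Male")))))   # ['ann', 'bob']
print(sorted(interp.ext(Forall("hasChild", Atom("Female")))))  # ['carl']
```

The second result shows the vacuous case of the universal restriction: carl has no children, so carl satisfies ∀hasChild.Female in this interpretation.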


Page 10: Ecml2010 Slides

DL: Representation & Inference DL Knowledge Bases

Knowledge Bases I

A knowledge base K = ⟨T, A⟩ contains

TBox T: set of axioms C ⊑ D (resp. C ≡ D), meaning $C^{\mathcal{I}} \subseteq D^{\mathcal{I}}$ (resp. $C^{\mathcal{I}} = D^{\mathcal{I}}$), where C is atomic and D is a concept description

ABox A: set of assertions like C(a) and R(a, b), meaning that $a^{\mathcal{I}} \in C^{\mathcal{I}}$ and $(a^{\mathcal{I}}, b^{\mathcal{I}}) \in R^{\mathcal{I}}$

Ind(A): set of individuals occurring in A

Interpretations of interest (models) satisfy all axioms and assertions in K


Page 11: Ecml2010 Slides

DL: Representation & Inference DL Knowledge Bases

Knowledge Bases II

Example (Œdipus’ family)

T = {
Female ≡ ¬Male, Father ≡ Male ⊓ ∃hasChild.⊤, Mother ≡ Female ⊓ ∃hasChild.⊤, Parent ≡ Mother ⊔ Father,
MotherWithNoDaughter ≡ Mother ⊓ ∀hasChild.¬Female
}

A = {
Female(JOCASTA), Female(POLYNEIKES), Male(OEDIPUS), Male(THERSANDROS), hasChild(JOCASTA, OEDIPUS), hasChild(JOCASTA, POLYNEIKES), hasChild(OEDIPUS, POLYNEIKES), hasChild(POLYNEIKES, THERSANDROS), Parricide(OEDIPUS), ¬Parricide(THERSANDROS)
}

Parent(OEDIPUS): true (OEDIPUS is Male and has a child, hence a Father)
MotherWithNoDaughter(POLYNEIKES): ? (unknown: under the OWA, POLYNEIKES may have further children, possibly daughters, not stated in A)


Page 12: Ecml2010 Slides

DL: Representation & Inference Inference

Inference & OWA

Q = ∃hasChild.(Parricide ⊓ ∃hasChild.¬Parricide)
(class of individuals with a child who is a parricide and has a child who is not a parricide)

K ⊨ Q(JOCASTA)?

Problem: incomplete knowledge about the truth of α = Parricide(POLYNEIKES)

OWA (reasoning on the possible models): true, dividing the interpretations (models of K) into two classes:

1 models of α
2 models of ¬α

In both cases JOCASTA satisfies Q: if α holds, via JOCASTA → POLYNEIKES → THERSANDROS; if ¬α holds, via JOCASTA → OEDIPUS → POLYNEIKES
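The case analysis above can be replayed mechanically. The Python sketch below (tuple-encoded concepts; every name in it is illustrative) builds the two interpretations over the four named individuals that differ only in whether α = Parricide(POLYNEIKES) holds, and checks Q in each. It inspects only these two minimal interpretations and ignores the TBox, so it illustrates the argument rather than deciding entailment, which requires a DL reasoner considering all models.

```python
def ext(c, domain, concepts, roles):
    """Extension of a tuple-encoded concept in one finite interpretation."""
    op = c[0]
    if op == "atom":
        return concepts.get(c[1], set())
    if op == "not":
        return domain - ext(c[1], domain, concepts, roles)
    if op == "and":
        return ext(c[1], domain, concepts, roles) & ext(c[2], domain, concepts, roles)
    if op == "some":
        filler = ext(c[2], domain, concepts, roles)
        return {x for x in domain
                if any(s == x and y in filler for (s, y) in roles.get(c[1], set()))}
    raise ValueError(f"unsupported constructor {op!r}")


DOMAIN = {"JOCASTA", "OEDIPUS", "POLYNEIKES", "THERSANDROS"}
HAS_CHILD = {("JOCASTA", "OEDIPUS"), ("JOCASTA", "POLYNEIKES"),
             ("OEDIPUS", "POLYNEIKES"), ("POLYNEIKES", "THERSANDROS")}

# Q = ∃hasChild.(Parricide ⊓ ∃hasChild.¬Parricide)
Q = ("some", "hasChild",
     ("and", ("atom", "Parricide"),
             ("some", "hasChild", ("not", ("atom", "Parricide")))))


def q_in_each_case(individual):
    """Membership in Q when α holds, then when ¬α holds.
    OEDIPUS is asserted Parricide and THERSANDROS ¬Parricide in both cases."""
    results = []
    for alpha in (True, False):
        parricides = {"OEDIPUS"} | ({"POLYNEIKES"} if alpha else set())
        concepts = {"Parricide": parricides}
        results.append(individual in ext(Q, DOMAIN, concepts, {"hasChild": HAS_CHILD}))
    return results


print(q_in_each_case("JOCASTA"))   # [True, True]
print(q_in_each_case("OEDIPUS"))   # [True, False]
```

For JOCASTA both cases agree, matching the argument on the slide; for OEDIPUS the same case split does not settle Q.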


Page 13: Ecml2010 Slides

Learning Concepts through TDTs


Page 14: Ecml2010 Slides

Learning Concepts through TDTs Learning Problem

Concept Induction I

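Let K = ⟨T, A⟩ be a DL knowledge base (acting as background knowledge, BK)

Definition (DL concept learning problem). Given

a target concept name C;
a set of positive and negative examples for C: $S^+_C(\mathcal{A}) = \{a \in \mathrm{Ind}(\mathcal{A}) \mid \mathcal{K} \models C(a)\}$ and $S^-_C(\mathcal{A}) = \{b \in \mathrm{Ind}(\mathcal{A}) \mid \mathcal{K} \models \neg C(b)\}$,

find a concept description D that satisfies $\mathcal{K} \models D(a)\ \ \forall a \in S^+_C(\mathcal{A})$ and $\mathcal{K} \models \neg D(b)\ \ \forall b \in S^-_C(\mathcal{A})$

Then the induced axiom C ≡ D can be added to K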


Page 15: Ecml2010 Slides

Learning Concepts through TDTs Learning Problem

Concept Induction II

Example (car checking [Blockeel and De Raedt, 1997])

T = {
Gear ⊑ Replaceable, Chain ⊑ Replaceable, Engine ⊑ ¬Replaceable, Wheel ⊑ ¬Replaceable,
SendBack ⊑ ¬(Fix ⊔ Ok), Fix ⊑ ¬(Ok ⊔ SendBack), Ok ⊑ ¬(SendBack ⊔ Fix)
}


Page 16: Ecml2010 Slides

Learning Concepts through TDTs Learning Problem

Concept Induction III

Example (cont’d). The original examples can be encoded as assertions:

{ Machine(M1), hasPart(M1, G1), Gear(G1), Worn(G1), hasPart(M1, C1), Chain(C1), Worn(C1),
Machine(M2), hasPart(M2, E2), Engine(E2), Worn(E2), hasPart(M2, C2), Chain(C2), Worn(C2),
Machine(M3), hasPart(M3, W3), Wheel(W3), Worn(W3),
Machine(M4) } ⊆ A

Given this KB and the example sets $S^+_C(\mathcal{A}) = \{\mathrm{M2}, \mathrm{M3}\}$ and $S^-_C(\mathcal{A}) = \{\mathrm{M1}, \mathrm{M4}\}$ (M2 and M3 each have a worn part that is not replaceable, an engine and a wheel respectively),

a good definition for C = SendBack may be:

SendBack ≡ Machine ⊓ ∃hasPart.(Worn ⊓ ¬Replaceable)
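The definition can be checked against the four machines with a closed-world Python sketch: an illustrative reading of the assertions above (variable names are hypothetical), not what a DL reasoner does under the OWA. It confirms that the definition holds for M2 and M3 (worn engine, worn wheel) and fails for M1 (only replaceable worn parts) and M4 (no parts).

```python
# Closed-world sketch: asserted facts are treated as complete, and the TBox
# inclusions Gear ⊑ Replaceable and Chain ⊑ Replaceable are applied by hand.
# This is NOT open-world DL entailment, which would need a reasoner.
MACHINES = {"M1", "M2", "M3", "M4"}
HAS_PART = {("M1", "G1"), ("M1", "C1"), ("M2", "E2"), ("M2", "C2"), ("M3", "W3")}
PART_TYPE = {"G1": "Gear", "C1": "Chain", "E2": "Engine", "C2": "Chain", "W3": "Wheel"}
WORN = {"G1", "C1", "E2", "C2", "W3"}
REPLACEABLE = {p for p, t in PART_TYPE.items() if t in ("Gear", "Chain")}


def send_back(x: str) -> bool:
    """Machine ⊓ ∃hasPart.(Worn ⊓ ¬Replaceable), evaluated on the asserted facts."""
    return x in MACHINES and any(
        m == x and p in WORN and p not in REPLACEABLE for (m, p) in HAS_PART)


covered = {m for m in MACHINES if send_back(m)}
print(sorted(covered))   # ['M2', 'M3']
```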


Page 17: Ecml2010 Slides

Learning Concepts through TDTs Terminological Decision Trees

Terminological Decision Trees I

First-order logical decision trees (FOLDTs) are defined [Blockeel and De Raedt, 1998] as binary decision trees in which

1 the nodes contain tests in the form of FOL formulae;
2 the left and right branches stand for the truth values true and false, respectively, determined by the test evaluation;
3 different nodes may share variables, with some limitations.

Terminological decision trees (TDTs) extend this definition, allowing DL concept descriptions as (variable-free) node tests


Page 18: Ecml2010 Slides

Learning Concepts through TDTs Terminological Decision Trees

Terminological Decision Trees II

A TDT providing the definition of the SendBack concept (left branch: test true; right branch: test false)

∃hasPart.⊤
  true: ∃hasPart.Worn
    true: ∃hasPart.(Worn ⊓ ¬Replaceable)
      true: SendBack
      false: ¬SendBack (⊑ Fix)
    false: ¬SendBack (⊑ Ok)
  false: ¬SendBack (⊑ Machine)


Page 19: Ecml2010 Slides

Learning Concepts through TDTs Induction

Induction of TDTs – base case

function INDUCETDTREE(C; D; Ps, Ns, Us): TDT;
    C: concept name;
    D: current description;
    Ps, Ns, Us: sets of (positive, negative, unlabeled) training individuals;
    const θ: purity threshold
begin
    Initialize new TDT T;
    if |Ps| = 0 and |Ns| = 0 then
    begin
        if Pr+ ≥ Pr− then T.root ← C else T.root ← ¬C;
        return T;
    end
    if |Ns| = 0 and |Ps|/(|Ps| + |Us|) > θ then
    begin T.root ← C; return T; end
    if |Ps| = 0 and |Ns|/(|Ns| + |Us|) > θ then
    begin T.root ← ¬C; return T; end

{ ... }
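The three stopping conditions of the base case can be written as a small Python function. This is a sketch under assumptions: the function name and arguments are hypothetical, leaf labels are plain strings, and Pr+ and Pr− are simply passed in as prior probabilities (the slide does not say how they are estimated; one plausible choice is the proportion of positive and negative examples in the whole training set).

```python
from typing import Optional


def base_case(n_pos: int, n_neg: int, n_unl: int,
              prior_pos: float, prior_neg: float,
              theta: float = 0.05) -> Optional[str]:
    """Return the leaf label ('C' or '¬C') if a stopping condition applies,
    or None if INDUCETDTREE should go on with the recursive case."""
    # No labeled examples left: fall back on the priors Pr+ / Pr−.
    if n_pos == 0 and n_neg == 0:
        return "C" if prior_pos >= prior_neg else "¬C"
    # Only positives (plus unlabeled), and enough of them: positive leaf.
    if n_neg == 0 and n_pos / (n_pos + n_unl) > theta:
        return "C"
    # Only negatives (plus unlabeled), and enough of them: negative leaf.
    if n_pos == 0 and n_neg / (n_neg + n_unl) > theta:
        return "¬C"
    return None


print(base_case(3, 0, 10, prior_pos=0.5, prior_neg=0.5))   # C
print(base_case(0, 2, 100, prior_pos=0.5, prior_neg=0.5))  # None
```

The divisions are safe: when the second test is reached with |Ns| = 0, the first test has already ruled out |Ps| = 0 (and symmetrically for the third).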


Page 20: Ecml2010 Slides

Learning Concepts through TDTs Induction

Induction of TDTs – recursive case

{ ... }
    Specs ← GENERATENEWCONCEPTS(D, Ps, Ns);
    Dbest ← SELECTBESTCONCEPT(Specs, Ps, Ns, Us);
    ((P^l, N^l, U^l), (P^r, N^r, U^r)) ← SPLIT(Dbest, Ps, Ns, Us);
    T.root ← Dbest;
    T.left ← INDUCETDTREE(C, D ⊓ Dbest, P^l, N^l, U^l);
    T.right ← INDUCETDTREE(C, D ⊓ ¬Dbest, P^r, N^r, U^r);
    return T;
end

The (im)purity measure is based on the Gini index
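A minimal Python sketch of Gini-based selection of the best refinement follows. It is an assumption-laden illustration: it uses the plain two-class Gini index on positive and negative counts, weighted by branch size, and ignores unlabeled individuals (the slide does not show how TermiTIS accounts for them). Each hypothetical candidate comes with the set of examples assumed to satisfy it; in the actual system that membership would presumably be decided by a DL reasoner.

```python
from typing import Dict, Set, Tuple


def gini(n_pos: int, n_neg: int) -> float:
    """Two-class Gini index 1 - p² - (1 - p)²; 0 for a pure (or empty) node."""
    n = n_pos + n_neg
    if n == 0:
        return 0.0
    p = n_pos / n
    return 1.0 - p * p - (1.0 - p) ** 2


def split_impurity(extension: Set[str], pos: Set[str], neg: Set[str]) -> float:
    """Size-weighted Gini of the split induced by a candidate test:
    left = examples satisfying the test, right = the others."""
    n = len(pos) + len(neg)
    if n == 0:
        return 0.0
    branches: Tuple[Tuple[int, int], ...] = (
        (len(pos & extension), len(neg & extension)),   # left branch
        (len(pos - extension), len(neg - extension)),   # right branch
    )
    return sum((p + q) / n * gini(p, q) for p, q in branches)


def select_best_concept(candidates: Dict[str, Set[str]],
                        pos: Set[str], neg: Set[str]) -> str:
    """Pick the candidate whose split has the lowest weighted impurity."""
    return min(candidates, key=lambda d: split_impurity(candidates[d], pos, neg))


# Hypothetical candidates for the car example, with the examples satisfying each.
candidates = {
    "∃hasPart.⊤": {"M1", "M2", "M3"},
    "∃hasPart.Worn": {"M1", "M2", "M3"},
    "∃hasPart.(Worn ⊓ ¬Replaceable)": {"M2", "M3"},
}
print(select_best_concept(candidates, pos={"M2", "M3"}, neg={"M1", "M4"}))
# ∃hasPart.(Worn ⊓ ¬Replaceable)
```

The first two candidates leave M1 mixed with the positives (weighted impurity 1/3), while the third separates the examples perfectly (impurity 0).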


Page 21: Ecml2010 Slides

Learning Concepts through TDTs Classification

TDTs – Classification of individuals

function CLASSIFY(a: individual, T: TDT, K: KB): concept;
begin
1 N ← ROOT(T);
2 while ¬LEAF(N, T) do
    2.1 (D, Tleft, Tright) ← INODE(N);
    2.2 if K ⊨ D(a) then N ← ROOT(Tleft)
    2.3 elseif K ⊨ ¬D(a) then N ← ROOT(Tright)
    2.4 else return ⊤
3 (D, ·, ·) ← INODE(N);
4 return D;
end

Observation: to avoid unknown answers due to the OWA (a test failing on both branches), use the weaker right-branch test K ⊭ D(a) in step 2.3
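CLASSIFY and its weaker right-branch variant can be sketched in Python as follows. The tree is the SendBack TDT as read from the earlier slide; the two entailment oracles stand in for a DL reasoner and are hypothetical, as are the answers assumed for the new machine M5 (the first two tests entailed, the third undecided under the OWA).

```python
from dataclasses import dataclass
from typing import Callable, Set, Tuple, Union


@dataclass
class Leaf:
    label: str             # concept returned for individuals reaching this leaf


@dataclass
class Node:
    test: str              # concept description D used as the node test
    left: "Tree"           # followed when K ⊨ D(a)
    right: "Tree"          # followed when K ⊨ ¬D(a) (or K ⊭ D(a), weak variant)


Tree = Union[Leaf, Node]
Oracle = Callable[[str, str], bool]   # hypothetical: oracle(D, a) answers K ⊨ ...


def classify(a: str, tree: Tree, entails: Oracle, entails_not: Oracle,
             weak_right_test: bool = False) -> str:
    """Walk the TDT from the root; return the leaf concept, or '⊤' when the
    strict tests leave the individual's membership undecided."""
    node = tree
    while isinstance(node, Node):
        if entails(node.test, a):
            node = node.left
        elif weak_right_test or entails_not(node.test, a):
            node = node.right   # weak variant: K ⊭ D(a) already holds here
        else:
            return "⊤"          # neither K ⊨ D(a) nor K ⊨ ¬D(a)
    return node.label


TREE = Node("∃hasPart.⊤",
            Node("∃hasPart.Worn",
                 Node("∃hasPart.(Worn ⊓ ¬Replaceable)",
                      Leaf("SendBack"), Leaf("¬SendBack")),
                 Leaf("¬SendBack")),
            Leaf("¬SendBack"))

POSITIVE: Set[Tuple[str, str]] = {("∃hasPart.⊤", "M5"), ("∃hasPart.Worn", "M5")}
NEGATIVE: Set[Tuple[str, str]] = set()


def entails(d: str, a: str) -> bool:
    return (d, a) in POSITIVE


def entails_not(d: str, a: str) -> bool:
    return (d, a) in NEGATIVE


print(classify("M5", TREE, entails, entails_not))                        # ⊤
print(classify("M5", TREE, entails, entails_not, weak_right_test=True))  # ¬SendBack
```

With the strict tests the undecided third test makes CLASSIFY return ⊤; the weak variant instead sends M5 down the right branch.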


Page 22: Ecml2010 Slides

Learning Concepts through TDTs Conversion

Conversion – TDTs to DL Concepts I

function DERIVEDEFINITION(C, T): concept description;
    C: concept name;
    T: TDT;
begin
1 S ← ASSOCIATE(C, T, ⊤);
2 return $\bigsqcup_{D \in S} D$;

end


Page 23: Ecml2010 Slides

Learning Concepts through TDTs Conversion

Conversion – TDTs to DL Concepts II

function ASSOCIATE(C; T; Dc): set of descriptions;
    C: concept name;
    T: TDT;
    Dc: current concept description
begin
1 N ← ROOT(T);
2 (Dn, Tleft, Tright) ← INODE(N);
3 if LEAF(N, T) then
    1 if Dn = C then return {Dc}; else return ∅;
  else
    1 Sleft ← ASSOCIATE(C, Tleft, Dc ⊓ Dn);
    2 Sright ← ASSOCIATE(C, Tright, Dc ⊓ ¬Dn);
    3 return Sleft ∪ Sright;

end
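DERIVEDEFINITION and ASSOCIATE can be sketched in Python on the SendBack tree from the earlier slide. Assumptions: concepts are plain strings, a list replaces the set so that the order of the disjuncts is deterministic, ⊤ ⊓ D is simplified to D, and an empty disjunction is rendered as ⊥; all function names are illustrative.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    test: str
    left: "Tree"
    right: "Tree"


Tree = Union[Leaf, Node]


def conj(dc: str, d: str) -> str:
    """Dc ⊓ D, simplifying ⊤ ⊓ D to D."""
    return d if dc == "⊤" else f"{dc} ⊓ {d}"


def associate(c: str, t: Tree, dc: str) -> List[str]:
    """Descriptions of the root-to-leaf paths ending in a leaf labelled c."""
    if isinstance(t, Leaf):
        return [dc] if t.label == c else []
    return (associate(c, t.left, conj(dc, t.test)) +
            associate(c, t.right, conj(dc, f"¬({t.test})")))


def derive_definition(c: str, t: Tree) -> str:
    """Disjunction of the path descriptions collected by associate."""
    paths = associate(c, t, "⊤")
    return " ⊔ ".join(f"({p})" for p in paths) if paths else "⊥"


TREE = Node("∃hasPart.⊤",
            Node("∃hasPart.Worn",
                 Node("∃hasPart.(Worn ⊓ ¬Replaceable)",
                      Leaf("SendBack"), Leaf("¬SendBack")),
                 Leaf("¬SendBack")),
            Leaf("¬SendBack"))

print(derive_definition("SendBack", TREE))
# (∃hasPart.⊤ ⊓ ∃hasPart.Worn ⊓ ∃hasPart.(Worn ⊓ ¬Replaceable))
```

The sketch does not simplify further: since ∃hasPart.(Worn ⊓ ¬Replaceable) is subsumed by the other two conjuncts, the derived definition is equivalent to that conjunct alone.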


Page 24: Ecml2010 Slides

Evaluation


Page 25: Ecml2010 Slides

Evaluation Setup

Evaluation – Setup

System TermiTIS applied to classification problems:

50 random queries per ontology, generated by composing 2 to 8 concepts by means of ALC constructors
.632 bootstrap strategy
DL reasoner Pellet (ver. 2) employed to decide the actual class membership w.r.t. the queries
default threshold θ = .05
OWL ontologies selected from standard repositories

ontology    DL language   #concepts   #obj. prop's   #d-type prop's   #ind's
FSM         SOF(D)               20             10                7       37
MDM0.73     ALCHOF(D)           196             22                3      112
WINES       ALCOF(D)             75             12                1      161
BIOPAX      ALCIF(D)             74             70               40      323
HDISEASE    ALCIF(D)           1498             10               15      639
NTN         SHIF(D)              47             27                8      676
FINANCIAL   ALCIF                60             16                0     1000
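The slide only names the ".632 bootstrap strategy". The Python sketch below shows the standard bootstrap split under that name: draw as many training individuals as there are, with replacement, and test on the out-of-bag individuals (on average about 1 − 1/e ≈ 63.2% of the distinct individuals end up in the training sample). `point632_error` is Efron's .632 combination of resubstitution and out-of-bag error, included for reference; whether TermiTIS's evaluation combines errors this way is not stated, and the individual names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def bootstrap_split(individuals: Sequence[str], seed: int = 0) -> Tuple[List[str], List[str]]:
    """One bootstrap replicate: sample |individuals| training examples with
    replacement; the individuals never drawn (out-of-bag) form the test set."""
    rng = random.Random(seed)
    train = rng.choices(individuals, k=len(individuals))
    drawn = set(train)
    test = [a for a in individuals if a not in drawn]
    return train, test


def point632_error(train_error: float, oob_error: float) -> float:
    """Efron's .632 estimate: 0.368 * resubstitution error + 0.632 * out-of-bag error."""
    return 0.368 * train_error + 0.632 * oob_error


inds = [f"ind{i}" for i in range(1000)]   # hypothetical pool, e.g. FINANCIAL's size
train, test = bootstrap_split(inds, seed=42)
print(len(set(train)) / len(inds))         # close to 0.632
```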


Page 26: Ecml2010 Slides

Evaluation Results

Performance

Compare the classification of the test individuals performed by the induced trees with the deductive classification provided by a reasoner:

inductive vs. deductive classification

match case: −1 vs. −1, 0 vs. 0, +1 vs. +1;
omission error case: 0 vs. −1, 0 vs. +1;
commission error case: −1 vs. +1, +1 vs. −1;
induction case: −1 vs. 0, +1 vs. 0.
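The four cases above can be computed as follows in Python. The reading of the labels is an assumption (+1: the individual belongs to the query concept, −1: it belongs to its complement, 0: membership unknown), and the function names are illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

CATEGORIES = ("match", "commission", "omission", "induction")


def outcome(inductive: int, deductive: int) -> str:
    """Compare the tree's answer with the reasoner's, both in {-1, 0, +1}."""
    if inductive == deductive:
        return "match"
    if inductive == 0:           # tree abstains, reasoner decides
        return "omission"
    if deductive == 0:           # tree decides, reasoner cannot
        return "induction"
    return "commission"          # opposite definite answers


def rates(pairs: Iterable[Tuple[int, int]]) -> Dict[str, float]:
    """Percentage of (inductive, deductive) test cases in each category."""
    counts = Counter(outcome(i, d) for i, d in pairs)
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in CATEGORIES}
    return {k: 100.0 * counts[k] / total for k in CATEGORIES}


print(rates([(+1, +1), (-1, -1), (0, +1), (+1, 0)]))
# {'match': 50.0, 'commission': 0.0, 'omission': 25.0, 'induction': 25.0}
```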


Page 27: Ecml2010 Slides

Evaluation Results

Results I

ontology    match rate      commission rate   omission rate    induction rate
FSM         96.68 ± 1.98    0.99 ± 1.35       0.02 ± 0.18      2.31 ± 0.51
MDM0.73     93.96 ± 5.44    0.39 ± 0.61       3.50 ± 4.16      2.15 ± 1.47
WINES       74.36 ± 25.63   0.67 ± 4.63       12.46 ± 14.28    12.13 ± 23.49
BIOPAX      96.51 ± 6.03    1.30 ± 5.72       2.19 ± 0.51      0.00 ± 0.00
HDISEASE    78.60 ± 39.79   0.02 ± 0.10       1.54 ± 6.01      19.82 ± 39.17
NTN         91.65 ± 15.89   0.01 ± 0.09       0.36 ± 1.58      7.98 ± 14.60
FINANCIAL   96.21 ± 10.48   2.14 ± 10.07      0.16 ± 0.55      1.49 ± 0.16
(all rates in %)


Page 28: Ecml2010 Slides

Evaluation Results

Results II

Examples of induced concepts and original queries

BIOPAX

induced: (Or (And physicalEntity protein) dataSource)

original:(Or (And (And dataSource externalReferenceUtilityClass)

(ForAll ORGANISM (ForAll CONTROLLED physicalInteraction)))

protein)

NTN

induced: (Or EvilSupernaturalBeing (Not God))

original: (Not God)

FINANCIAL

induced: (Or (Not Finished) NotPaidFinishedLoan Weekly)

original: (Or LoanPayment (Not NoProblemsFinishedLoan))
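The induced and original concepts are printed in a parenthesised prefix syntax (And, Or, Not, ForAll). For comparison with the infix notation used on the earlier slides, a small converter can be sketched in Python; it handles only the operators that appear on this slide and is an illustrative helper, not part of TermiTIS.

```python
import re
from typing import List, Union

SExpr = Union[str, List["SExpr"]]


def parse(text: str) -> SExpr:
    """Parse a parenthesised prefix expression such as '(Or (Not A) B)'."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read() -> SExpr:
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok != "(":
            return tok
        items: List[SExpr] = []
        while tokens[pos] != ")":
            items.append(read())
        pos += 1  # consume the closing ")"
        return items

    return read()


INFIX = {"And": " ⊓ ", "Or": " ⊔ "}


def to_dl(e: SExpr) -> str:
    """Render the prefix form in infix DL notation."""
    if isinstance(e, str):
        return e
    op, *args = e
    if op in INFIX:
        return "(" + INFIX[op].join(to_dl(a) for a in args) + ")"
    if op == "Not":
        return "¬" + to_dl(args[0])
    if op == "ForAll":
        return f"∀{args[0]}.{to_dl(args[1])}"
    raise ValueError(f"operator not handled in this sketch: {op}")


print(to_dl(parse("(Or (Not Finished) NotPaidFinishedLoan Weekly)")))
# (¬Finished ⊔ NotPaidFinishedLoan ⊔ Weekly)
```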


Page 29: Ecml2010 Slides

Conclusions


Page 30: Ecml2010 Slides

Conclusions

Conclusions & Ongoing Work

Introduced terminological decision trees and a new method for learning concepts in DLs that support the standard SW ontology languages: the TermiTIS system

top-down tree induction (adaptation of standard tree-induction methods), classification, conversion

Experiments on various ontologies show the method to be effective and robust (high match rate, few commission errors)

Ongoing work:

experiments with domain experts (ontology population)
more expressive DLs (+ new refinement operators): currently KBs are represented with expressive DLs, but concepts are built with ALCQ constructors, using concept names as atoms

impurity indices that exploit the uncertainty related to the unlabeled individuals
derive new hierarchical clustering algorithms


Page 31: Ecml2010 Slides

time for questions

Many thanks for attending this talk

comments / questions ?

(also, meet me @ Poster Session)

Nicola Fanizzi

Claudia d’Amato

Floriana Esposito

Page 32: Ecml2010 Slides

References

Badea, L. and Nienhuys-Cheng, S.-H. (2000). A refinement operator for description logics. In Cussens, J. and Frisch, A., editors, Proceedings of the 10th International Conference on Inductive Logic Programming, volume 1866 of LNAI, pages 40–59. Springer.

Blockeel, H. and De Raedt, L. (1997). Experiments with top-down induction of first order decision trees. Technical Report CW 247, Dept. of Computer Science, K.U. Leuven.

Blockeel, H. and De Raedt, L. (1998). Top-down induction of first-order logical decision trees. Artificial Intelligence, 101(1-2):285–297.

Cohen, W., Borgida, A., and Hirsh, H. (1992). Computing the least common subsumers in description logic. In Swartout, W., editor, Proceedings of the 10th National Conference on Artificial Intelligence, pages 754–760. MIT Press.

Cohen, W. and Hirsh, H. (1994). Learning the CLASSIC description logic. In Torasso, P. et al., editors, Proceedings of the 4th International Conference on the Principles of Knowledge Representation and Reasoning, pages 121–133. Morgan Kaufmann.

Fanizzi, N., d’Amato, C., and Esposito, F. (2008). DL-FOIL: Concept learning in Description Logics. In Zelezny, F. and Lavrac, N., editors, Proceedings of the 18th International Conference on Inductive Logic Programming, ILP 2008, volume 5194 of LNAI, pages 107–121. Springer.

Iannone, L., Palmisano, I., and Fanizzi, N. (2007). An algorithm based on counterfactuals for concept learning in the semantic web. Applied Intelligence, 26(2):139–159.

Kietz, J.-U. (2002). Learnability of description logic programs. In Matwin, S. and Sammut, C., editors, Proceedings of the 12th International Conference on Inductive Logic Programming, volume 2583 of LNAI, pages 117–132, Sydney. Springer.

Kietz, J.-U. and Morik, K. (1994). A polynomial approach to the constructive induction of structural knowledge. Machine Learning, 14(2):193–218.

Lehmann, J. and Hitzler, P. (2008). Foundations of refinement operators for description logics. In Blockeel, H. et al., editors, Proceedings of the 17th International Conference on Inductive Logic Programming, ILP 2007, volume 4894 of LNCS, pages 161–174. Springer.

Lehmann, J. and Hitzler, P. (2010). Concept learning in description logics using refinement operators. Machine Learning, 78(1-2):203–250.

Lisi, F. and Esposito, F. (2008). Foundations of onto-relational learning. In Zelezny, F. and Lavrac, N., editors, Proceedings of the 18th International Conference on Inductive Logic Programming, ILP 2008, volume 5194 of LNAI, pages 158–175.

Rouveirol, C. and Ventos, V. (2000). Towards learning in CARIN-ALN. In Cussens, J. and Frisch, A., editors, Proceedings of the 10th International Conference on Inductive Logic Programming, volume 1866 of LNAI, pages 191–208. Springer.