TRANSCRIPT
Learning from Ordinal Data with ILP in Description Logic
Nunung Nurul Qomariyah and Dimitar Kazakov
Computer Science, University of York, UK
Presented at the 27th International Conference on Inductive Logic Programming, Orléans, France, 4-6 September 2017
THE OUTLINE
1. Introduction
2. Problem representation
3. Proposed algorithm
4. Evaluation
5. Conclusions and further work
INTRODUCTION
ILP algorithms using DL representation:
– Have the potential to be applied to large volumes of linked open data
– Benefit from the tools available for such data, e.g. IDEs such as Protégé, DB systems such as triple stores, and ontology reasoners
INTRODUCTION
Previous work on ILP in DL includes:
– DL-Learner [1], YinYang [2], DL-FOIL [3], Kietz's [4] and Konstantopoulos's [5] work
– These mostly aim to learn concept descriptions
INTRODUCTION
The application area: Preference Learning (PL)
– PL [6] aims to induce predictive user preference models from empirical data.
several benefits of representing both data and models in a logic-based language, as this allows for the use of reasoner tools that can infer logical consequences from a given knowledge base. Our method employs an ontology reasoner to recommend items consistent with the user preference hypotheses produced by ILP. This approach has the potential to make suggestions about items that have never been explicitly discussed with the user.
In summary, our contribution in this paper is as follows:
• We propose a new approach to learning preferences from multi-attribute items for recommender systems based on Inductive Logic Programming (ILP).
• We propose a new architecture for knowledge representation and inference using the Semantic Web Rule Language (SWRL) and an ontology reasoner.
• We also describe a way to tune the settings of our learning algorithm based on experiments with a real-world dataset in order to improve system performance.
We divide the rest of the paper as follows. In Section 2, we explain the details of our proposed approach. We describe the dataset and the experiments carried out in Section 3. Then, we discuss the results in Section 4. We explore related work in Section 5. Finally, we conclude and suggest future work in Section 6.
2. PROPOSED APPROACH
We have guided our choice of learning algorithm by the need for an expressive representation formalism and a learning algorithm capable of handling a variety of hypotheses based on the user preferences, along with the desire to be able to learn robust hypotheses from a limited number of examples and express the result in a human-readable form. While other researchers [9] have used linear SVM to approximate user preferences, we opted for the flexibility of Inductive Logic Programming. In addition, the performance of our system is boosted through the use of constraints on the range of hypotheses considered, which reduces the time complexity of the learning task. We divide this section into four subsections explaining each step in more detail.
2.1 Problem Formalization
Arguably, the use of data that genuinely reflects the user preferences is essential for the success of any recommender system. Therefore we have opted for a form of knowledge elicitation that minimises the subjectivity of the user's replies by limiting the complexity of the query asked and restricting the feedback provided to qualitative information alone. In practice, this is achieved through queries consisting of pairs of items along with their descriptions, where the user only needs to select the better of the two items. Such pairwise comparisons are used to learn which items will be classified as "Good", i.e. ones that the user would consider buying. We use the user's answers to classify the unlabelled data and make a prediction about classes. We illustrate the general annotation process in Figure 1.
The figure shows how we derive conclusions about preferences regarding individual attributes from data pairs of the form "Car 1 is-better-than Car 2". The bold arrow represents the annotation from the user and the dotted arrows show possible implications about individual attributes that the learning algorithm will consider. Note that in general, ILP makes it possible to compare combinations of attributes, e.g. ⟨price1, mileage1⟩ vs. ⟨price2, mileage2⟩, through the use of appropriately defined relations (so-called background knowledge), but this aspect of ILP is not explored here. The way in which we build hypotheses is explained in more detail in Section 2.2.
[Figure 1: User annotation — two cars linked by better than, each described by its mileage, price, year and type.]
Definition 1 (Items). An item I is described by a set of attribute names and their values: {A1 = v1, A2 = v2, . . . , An = vn}.
Definition 2 (Comparison Pair). Given a set of items E, we define a comparison pair P as any (e, e′) ∈ E × E, e ≠ e′. We shall then refer to the variable represented by the attribute A of the first element of the pair as A_first, while A_second will refer to the variable represented by the attribute A of the second element of the pair. The values of these variables will be denoted as value(A_first), resp. value(A_second).
Definition 3 (User Annotation). The annotation provided by the user is binary: a given pair (e, e′) is given a class label 1 if the user considers e to be better than e′, or the class label is set to 0 if the user considers e′ to be better than e. The relationship better than represents strict inequality, and the user is forced to choose between these two alternatives. We therefore define a predicate C, such that:
C(⟨e, e′⟩) = { 1 if e is better than e′; 0 if e′ is better than e }  (1)
Definition 4 (Training Examples). The set of training examples S consists of the union of all pairs ⟨e, e′⟩ such that C(⟨e, e′⟩) = 1, along with all pairs ⟨e′, e⟩ such that C(⟨e, e′⟩) = 0.
Now we state our main learning task as:
Definition 5 (Learning Problem). Find a model T thatis consistent with the set of training examples S.
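Definitions 3 and 4 can be sketched in a few lines of Python (an illustrative toy of ours, not part of the paper's implementation; the function and variable names are our own):

```python
# Build the training set S of Definition 4 from the user annotation C of
# Definition 3: label 1 keeps the pair <e, e'>, label 0 adds the swapped
# pair <e', e>, so every pair in S reads "first item is better than second".
def training_examples(annotations):
    """annotations maps an ordered pair (e, e2) to its class label C in {0, 1}."""
    S = []
    for (e, e2), label in annotations.items():
        S.append((e, e2) if label == 1 else (e2, e))
    return S

S = training_examples({("car1", "car2"): 1, ("car3", "car4"): 0})
# S == [("car1", "car2"), ("car4", "car3")]
```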
2.2 Learning Algorithm
We use the annotated data as input for our learning algorithm. We build an algorithm that searches the space of possible hypotheses starting from the most general hypotheses, i.e. the ones based on the least number of constraints, and progresses towards the most specific rule possible, given by a Progol-like bottom clause [8]. The difference between the ILP system Progol and ours is that Progol searches the hypothesis space in a greedy way, throwing away all positive examples that are already covered by the hypothesis, while we derive all parts of the hypothesis in a cautious way,
INTRODUCTION
The application area: Recommender System
– We have previously published a workshop paper at ACM RecSys 2017, Como, Italy
PROBLEM REPRESENTATION
The objective of this work: to learn about transitive anti-reflexive relations:
– Transitive:
• We use the examples provided by the user, along with their transitive closure, in their correct order (e.g. "car A is better than car B") as positive examples
• Example of transitive closure:
– User provides: "car A is better than car B"
– User also provides: "car B is better than car C"
– We add a closure: "car A is better than car C" as a positive example.
– Anti-reflexive:
• We use the same examples in reverse order as negative examples (e.g. "car B is better than car A")
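The two rules above can be sketched in Python (a toy of our own; in the actual system the closure is derived by the ontology reasoner over the transitive object property):

```python
def closure_and_negatives(pairs):
    """Positives: the user's pairs plus their transitive closure.
    Negatives: every positive pair in reverse order."""
    positives = set(pairs)
    changed = True
    while changed:                      # naive fixpoint for the transitive closure
        changed = False
        for (a, b) in list(positives):
            for (c, d) in list(positives):
                if b == c and a != d and (a, d) not in positives:
                    positives.add((a, d))
                    changed = True
    negatives = {(b, a) for (a, b) in positives}   # reversed order
    return positives, negatives

pos, neg = closure_and_negatives({("carA", "carB"), ("carB", "carC")})
# pos gains the closure ("carA", "carC"); neg holds the three reversed pairs.
```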
PROBLEM REPRESENTATION
Hypothesis language
In Aleph
As object property in RDF/XML:
5.1 Introduction
quantification. DL-Learner is quite close to our work: while it learns about concepts, our goal is to learn the Domain and Range axioms of a specific object property. DL-Learner is an improvement over previous ILP implementations in DL, such as YinYang [9] and DL-FOIL [10]. Kietz [11] and Konstantopoulos [12] also performed similar work in DL.
5.1.3 Problem Representation
We develop an algorithm by adapting ILP to learn relations in DLs. In DL terms, our problem is defined as follows:
Given: a set of individuals and classes (represented in an ontology hierarchy) and a set of preference values (represented as an asymmetric transitive object property),
Find: the best Domain and Range axioms (in the form of a conjunction of classes) for a specific object property (betterthan) that are complete and consistent with the given preferences.
This problem is categorised as a supervised learning problem, where the user provides a label for each item pair by comparing their attributes. We then use the labels as guidance for searching for a class description. The class description that maps the left side of the object property to its right side is known as the Domain and Range axioms. In our preference learning problem, we can say that any individual belonging to the class description in the Domain is preferred by the user over any individual belonging to the class in the Range. The order of preferences here is anti-symmetric, which means that if item A is better than item B, then in no case can item B be better than item A. It is also transitive, which means that whenever item A is better than item B, and item B is better than item C, then item A is better than item C. We use the reasoner to infer the full transitive chains of the object property, treat them as additional pairs, and include them in the positive examples. For the anti-symmetric property, we handle it by taking all relations in the opposite order as negative examples. Figure 5.1 shows how we represent our problem in Protégé.
5.1.3.1 Hypothesis language
The aim of ILP is to find a theory that is complete (it covers all the given positive examples) and consistent (it covers no negative examples). We show how the hypothesis language is represented as mode declarations in Aleph:
:- modeh(1,betterthan(+car,+car)).
:- modeb(1,hasbodytype(+car,#bodytype)).
We build the hypothesis by specifying which object property in the given ontology we want to learn. For example, to learn the object property betterthan:
<owl:ObjectProperty rdf:ID="betterthan"/>
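For illustration only, such a declaration can be located programmatically. The following stand-alone Python sketch (our own, using the standard library rather than the OWL API used by the actual system) parses a minimal RDF/XML document and lists its object properties:

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
OWL = "http://www.w3.org/2002/07/owl#"

# A minimal RDF/XML fragment wrapped in an rdf:RDF root with its namespaces.
doc = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
                  xmlns:owl="http://www.w3.org/2002/07/owl#">
  <owl:ObjectProperty rdf:ID="betterthan"/>
</rdf:RDF>"""

root = ET.fromstring(doc)
# ElementTree qualifies tags and attributes with {namespace} prefixes.
properties = [el.attrib["{%s}ID" % RDF]
              for el in root.iter("{%s}ObjectProperty" % OWL)]
# properties == ["betterthan"]
```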
PROBLEM REPRESENTATION
Hypothesis language
In Protégé
Figure 5.1: Problem representation in Protégé — (a) background knowledge represented as a class hierarchy; (b) examples represented as individuals; (c) the hypothesis language represented as an object property.
5.1.3.2 Background knowledge
The difference lies in how the attributes are represented: in Aleph they are written as predicates with two arguments, while in our algorithm we treat them as classes and their individual members (e.g. a class of cars with sedan body type; we specify that any individual with a body type of sedan is a member of that class). In Aleph, the background knowledge is written as below:
car(car1). car(car2).
bodytype(sedan). bodytype(suv).
hasbodytype(car1,sedan). hasbodytype(car2,suv).
In our algorithm, the background knowledge is written in RDF/XML as below:
<owl:Class rdf:ID="sedan"/>
<owl:Class rdf:ID="suv"/>
<sedan rdf:ID="car1"></sedan>
<suv rdf:ID="car2"></suv>
5.1.3.3 Examples
In ILP, the examples are represented as ground facts with the predicate betterthan/2, where the arguments are of type car. A positive example is written as betterthan(car1,car2), while
PROBLEM REPRESENTATION
Background knowledge
In Aleph
As class hierarchy in RDF/XML:
PROBLEM REPRESENTATION
Background knowledge
In Protégé
Examples
In Aleph
As relations between individuals in RDF/XML:
PROBLEM REPRESENTATION
the negative examples are generated by reversing the positive examples: :- betterthan(car2,car1). In our algorithm, the set of examples given by the user's labels is translated into a set of individuals that stand in a betterthan relationship with other individuals. We show a representation of the above example in RDF/XML below:
<sedan rdf:ID="car1">
<betterthan> <suv rdf:ID="car2"></suv> </betterthan>
</sedan>
5.2 The Proposed Algorithm
In this section we describe our proposed algorithm. We implement it in Java with the OWL API library for the DL part. Progol is an implementation of ILP in the C language by Muggleton [13]. Aleph [14] is an ILP implementation in Prolog that follows the same procedure as Progol. We follow the four basic steps used in Progol/Aleph, as below:
1. Select a positive example. Select an example to be generalized based on their order in
the examples file. Each instance of the relation can be seen as a pair of object IDs.
2. Build the bottom clause. The bottom clause is the conjunction of all non-disjoint class
memberships for each object in the pair.
3. Search. Find a clause more general than the bottom clause. This step uses greedy best-first
search to find a clause consistent with the data.
4. Remove covered examples. Our algorithm is greedy, removing all covered examples once
each highest-scoring clause is added to the current theory.
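The four steps above can be sketched as a covering loop. This is a hypothetical skeleton of ours: `generalise` and `covers` stand in for the bottom-clause construction, search, and coverage test, which in the actual system are answered with the OWL reasoner.

```python
def learn_theory(positives, negatives, generalise, covers):
    theory, remaining = [], list(positives)
    while remaining:
        seed = remaining[0]                              # 1. select a positive example
        clause = generalise(seed, remaining, negatives)  # 2-3. bottom clause + search
        theory.append(clause)
        # 4. greedily remove every positive example the new clause covers
        remaining = [p for p in remaining if not covers(clause, p)]
    return theory

# Toy stand-ins: a "clause" is just the first component of the seed pair.
toy_generalise = lambda seed, pos, neg: seed[0]
toy_covers = lambda clause, ex: ex[0] == clause
theory = learn_theory([("a", 1), ("a", 2), ("b", 3)], [], toy_generalise, toy_covers)
# theory == ["a", "b"]
```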
5.2.1 Search and refinement operator
We use a top-down approach similar to Aleph. The algorithm starts the generalisation from each class in the bottom clause. The bottom clause contains a conjunction of n classes on the Domain side and a conjunction of n classes on the Range side. This produces n × n possible class-pair combinations on the first level of generalisation. We evaluate all of them, except those that compare the same classes and hypotheses that have already been checked. This is illustrated in Figure 5.2.
We use a common ILP scoring function, P × (P − N), where P is the number of positive examples covered, and N – the number of negative examples covered. In the case that a solution has the same score as another alternative, Aleph will only return the first solution found. In our algorithm, we consider all the non-redundant hypotheses that are consistent with the examples (i.e. they cover zero negative examples and more than 2 positive examples). The search will not stop until all the possible combinations have been considered.
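The scoring function and consistency test just described can be sketched as follows (illustrative only; `covers` is an assumed coverage predicate, which the real system answers with the reasoner):

```python
def score(hypothesis, positives, negatives, covers):
    """Common ILP score P * (P - N): P/N = covered positive/negative counts."""
    P = sum(1 for e in positives if covers(hypothesis, e))
    N = sum(1 for e in negatives if covers(hypothesis, e))
    return P * (P - N)

def is_consistent(hypothesis, positives, negatives, covers):
    """Consistent per the text: zero negatives and more than 2 positives covered."""
    P = sum(1 for e in positives if covers(hypothesis, e))
    N = sum(1 for e in negatives if covers(hypothesis, e))
    return N == 0 and P > 2

covers = lambda h, e: e in h          # toy: a hypothesis is the set of pairs it covers
h = {("carA", "carB"), ("carA", "carC"), ("carB", "carC")}
print(score(h, list(h), [("carB", "carA")], covers))   # 3 * (3 - 0) = 9
```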
Examples
PROBLEM REPRESENTATION
In Protégé
PROPOSED ALGORITHM
We follow the four basic steps used in the Progol [7]/Aleph [8] greedy learning approach:
1. Select a positive example.
• Each instance of the relation can be seen as a pair of object IDs.
2. Build the bottom clause.
• The bottom clause is the conjunction of all non-disjoint class memberships for each object in the pair.
3. Search.
• This step uses greedy best-first search to find a clause consistent with the data.
4. Remove covered positive examples.
• Our algorithm is greedy, removing all covered examples once each highest-scoring clause is added to the current theory.
PROPOSED ALGORITHM
bottom clause contains the conjunction of n constraints (of type class membership) on the Domain side, and the same number of constraints again on the Range side of the relation. This will produce n × n possible pairs on the first level of generalisation. (We have chosen not to consider hypotheses only constraining one of the arguments.) We evaluate all combinations of constraints, except the ones that imply the same class membership of both arguments (i.e. X is better than Y because they both share the same property/class membership) and those that have already been considered. This is illustrated in Figure 1.
Fig. 1: Refinement operator — the search starts from (Thing) betterthan (Thing), refines it to single-class pairs such as (Manual) betterthan (LargeCar) or (Manual) betterthan (NonHybrid), then to conjunctions such as (Manual ⊓ NonHybrid) betterthan (LargeCar ⊓ Manual), down to the bottom clause (Manual ⊓ NonHybrid ⊓ SmallCar ⊓ Sedan) betterthan (LargeCar ⊓ Manual ⊓ NonHybrid ⊓ Suv).
We use a common ILP scoring function, P × (P − N), where P is the number of positive examples covered, and N – the number of negative examples covered. In the case that a solution has the same score as another alternative, Aleph will only return the first solution found. In our algorithm, we consider all the non-redundant hypotheses that are consistent with the examples (i.e. they cover zero negative examples and more than 2 positive examples). The search will not stop until all the possible combinations have been considered.
If we have not yet found a consistent hypothesis, we continue to refine the one with the highest non-negative score, which means that we add a pair of literals to constrain each of the two objects in the relation. We stop at 2 literals each for Domain and Range (this is the same as Aleph's default clause length of 5). Similarly to Aleph, we also treat any examples for which we cannot find a consistent generalisation as exceptions. In this case, we add the bottom clause as the consistent rule.
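The first level of this refinement can be sketched as follows (our illustrative code, with class names taken from the refinement-operator figure):

```python
from itertools import product

def first_level(domain_classes, range_classes):
    """All n x n single-class (Domain, Range) refinements of
    (Thing) betterthan (Thing), skipping pairs that would constrain
    both arguments with the same class."""
    return [(d, r) for d, r in product(domain_classes, range_classes) if d != r]

pairs = first_level(["Manual", "NonHybrid"], ["LargeCar", "NonHybrid"])
# [('Manual', 'LargeCar'), ('Manual', 'NonHybrid'), ('NonHybrid', 'LargeCar')]
```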
4 Algorithm complexity
We implement our algorithm in one of the DL family of languages, namely ALC (attributive language with complement) [14], the basic DL language which has
PROPOSED ALGORITHM
Search and refinement operator:
Refinement:
– If we have not yet found a consistent hypothesis, we continue to refine the one with the highest non-negative score,
– We add a pair of literals to constrain each of the two objects in the relation.
– We stop at 2 literals each for Domain and Range (this is the same as Aleph's default clause length of 5).
Exception:
– Similarly to Aleph, we also consider any examples where we cannot find a consistent generalisation as exceptions. In this case, we add the bottom clause as the consistent rule.
PROPOSED ALGORITHM
Search and refinement operator:
We evaluate all combinations of constraints, except:
– the ones that imply the same class membership of both arguments
• e.g. car A has body type sedan and car B has body type sedan
– those that have already been considered.
EVALUATION
Dataset
We use two publicly available datasets, car preferences [8] and sushi preferences [9], with the statistics below:
– Number of items: 10 items
– Number of pairs: 45 preference pairs (2-combinations of 10)
– Number of participants: 60 users
– Number of attributes:
• the car dataset has 4 attributes (body type, transmission, fuel consumption and engine size)
• the sushi dataset has 7 attributes (style, major, minor, heaviness, how frequently consumed by a user, price and how frequently sold)
EVALUATION
Evaluation method
The goal:
– to assess the accuracy of the predictive power of each algorithm to solve the preference learning problem.
We compare our algorithm with three other machine learning algorithms:
– SVM, the Matlab CART Decision Tree (DT) learner, and Aleph
EVALUATION
the least expressivity. ALC allows one to construct complex concepts from simpler ones using various language constructs. The capabilities include direct or indirect expression of, e.g., concept disjointness and the domain and range of roles, including the empty role.
The most expensive process is the membership checking performed for every possible hypothesis, which is used for scoring. For every single hypothesis, the reasoner needs to check its coverage. One possible way to reduce the complexity is to minimise the search tree and check for redundancy without reducing the accuracy.
5 Evaluation
Dataset. We use two publicly available preference datasets [3] [4]. Both the sushi and the car datasets have 10 items to rank, which leads to 45 preference pairs per user. We take 60 users from each dataset and perform 10-fold cross validation for each user's individual preferences. The car dataset has 4 attributes: body type, transmission, fuel consumption and engine size, while the sushi dataset has 7 attributes: style, major, minor, heaviness, how frequently consumed by a user, price and how frequently sold. Despite the difference in the number of attributes in the two datasets, we found that a maximum clause length of 4 (in Aleph and in our algorithm) is sufficient to produce consistent hypotheses.
Evaluation method. The goal of this evaluation is to assess the accuracy of the predictive power of each algorithm to solve the preference learning problem. We compare our algorithm with three other machine learning algorithms: SVM, the Matlab CART Decision Tree (DT) learner, and Aleph. SVM is a very common statistical classification algorithm used in many domains. Similar work on pairwise preference learning by Qian et al. [8] shows that SVM can also be used to learn in this domain. Both DT and Aleph can be included in the evaluation since both are logic-based learners, the first in propositional logic and the latter in First Order Logic.
We learn each individual's preferences and test them using 10-fold cross validation. The results are shown in Table 1 and Figure 2a. According to the ANOVA test, there is a significant difference amongst the algorithms, with a p-value of 2.0949 × 10^-21 for the car dataset and 7.3234 × 10^-36 for the sushi dataset.
Table 1: Mean and standard deviation of 10-fold cross validation test
                SVM          DT           Aleph        Our algorithm
car dataset     0.8317±0.12  0.7470±0.10  0.7292±0.08  0.8936±0.05
sushi dataset   0.7604±0.09  0.8094±0.06  0.7789±0.06  0.9302±0.03
Fig. 2: Evaluation results — (a) 10-fold cross validation accuracy of SVM, Decision Tree, Aleph and our algorithm on the car and sushi datasets; (b) accuracy by varying number of training examples.
We also perform several experiments with the algorithms by varying the proportion of training examples and testing on 10% of the examples. For a more robust result, we validate each cycle with 10-fold cross validation. The result of these experiments is shown in Figure 2b. We show that our algorithm still works better even with a smaller number of training examples.
Sample solutions found. Our algorithm can produce more readable results for a novice user compared to Aleph. An example of a consistent hypothesis found by our algorithm is shown below:
Automatic ⊓ Hybrid betterthan MediumCar ⊓ Suv
While Aleph produces rules such as:
betterthan(A,B) :- hasfuelcons(B,nonhybrid), hasbodytype(B,suv).
6 Conclusion and Further Work
In this paper, we have shown that the implementation of ILP in DL can be useful to learn a user's preferences from pairwise comparisons. We are currently working to address the following limitations of our algorithm:
– We only consider a one-level class hierarchy in the ontology for simplicity. In the real world, the class hierarchy can be more complex.
– Currently, our algorithm uses the Closed World Assumption, which makes it easier to find a consistent hypothesis. This is not in line with the fact that most DL-based knowledge bases and their reasoners operate under the Open World Assumption.
Results
Experiment 2:
We set different proportions of training examples and test on 10% of test data.
For a more robust result, we validate each cycle with 10-fold cross validation.
EVALUATION
Proportion of training examples
EVALUATION
Sample solutions
– By using DL representation, we can produce more readable results for a novice user
– Aleph produces rules, such as:
Figure 5.4: Accuracy vs number of training examples on both datasets
We also perform several experiments with the algorithms by varying the proportion of training examples and testing on 10% of the examples. For a more robust result, we validate each cycle with 10-fold cross validation. The result of these experiments is shown in Figure 5.4. We show that our algorithm still works better even with a smaller number of training examples.
5.5.3 Sample solutions found
Our algorithm can produce more readable results for a novice user compared to Aleph. An example of a consistent hypothesis found by our algorithm is shown below:
Automatic ⊓ Hybrid betterthan MediumCar ⊓ Suv
While Aleph produces rules such as:
betterthan(A,B) :- hasfuelcons(B,nonhybrid), hasbodytype(B,suv).
5.6 Conclusion and Further Work
In this paper, we have shown that the implementation of ILP in DL can be very beneficial for estimating preferences from pairwise comparisons. We are currently working on the following limitations of our algorithm:
• We only consider a one-level class hierarchy in the ontology for simplicity. In the real world, the class hierarchy can be more complex.
CONCLUSIONS
– We have shown that the implementation of ILP in DL can be useful to learn a user's preferences from pairwise comparisons.
– Currently, our algorithm uses the Closed World Assumption, which makes it easier to find a consistent hypothesis and test the coverage.
– In terms of accuracy, we have shown that our proposed algorithm outperformed the other algorithms even with a smaller number of training examples.
FUTURE WORK
We are currently working to address the following limitations of our algorithm:
– Working with a more complex class hierarchy (including negations, unions and quantifiers)
– Working under the Open World Assumption (OWA)
– Improving system performance and scalability, e.g. implementing a triple store
We plan to implement our algorithm in a Recommender System application and invite real users to evaluate it using a larger real dataset from Autotrader (7361 cars)
REFERENCES
[1] Lehmann, J.: DL-Learner: Learning Concepts in Description Logics. In: The Journal of Machine Learning Research, vol. 10, pp. 2639–2642 (2009)
[2] Iannone, L., Palmisano, I., Fanizzi, N.: An Algorithm Based on Counterfactuals for Concept Learning in the Semantic Web. In: Applied Intelligence, vol. 26, no. 2, pp. 139–159. Springer, Heidelberg (2007)
[3] Fanizzi, N., d'Amato, C., Esposito, F.: DL-FOIL Concept Learning in Description Logics. In: Proceedings of Inductive Logic Programming, ser. LNCS, vol. 5194, pp. 107–121. Springer, Heidelberg (2008)
[4] Kietz, J.: Learnability of Description Logic Programs. In: Proceedings of Inductive Logic Programming, ser. LNCS, vol. 2583, pp. 117–132. Springer, Heidelberg (2002)
[5] Konstantopoulos, S., Charalambidis, A.: Formulating Description Logic Learning as an Inductive Logic Programming Task. In: Proceedings of IEEE World Congress on Computational Intelligence, pp. 1–7 (2010)
[6] Fürnkranz, J., Hüllermeier, E.: Preference Learning: An Introduction. Springer (2010)
[7] Muggleton, S.: Inverse Entailment and Progol. In: New Generation Computing, vol. 13, no. 3–4, pp. 245–286 (1995)
[8] Srinivasan, A.: The Aleph Manual. Technical Report, Computing Laboratory, Oxford University (2000), http://web.comlab.ox.ac.uk/oucl/research/areas/machlearn/Aleph/
[9] Abbasnejad, E., Sanner, S., Bonilla, E.V., Poupart, P.: Learning Community-based Preferences via Dirichlet Process Mixtures of Gaussian Processes. In: Proceedings of the 23rd International Joint Conference on Artificial Intelligence (IJCAI) (2013)
[10] Kamishima, T.: Nantonac Collaborative Filtering: Recommendation Based on Order Responses. In: Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 583–588. ACM, New York (2003)
Thank you
Any questions or feedback?