questions and answers: reasoning and querying in description …tessaris/papers/phd-thesis.pdf ·...

QUESTIONS AND ANSWERS:

REASONING AND QUERYING IN

DESCRIPTION LOGIC

A THESIS SUBMITTED TO THE UNIVERSITY OF MANCHESTER

FOR THE DEGREE OF DOCTOR OF PHILOSOPHY

IN THE FACULTY OF SCIENCE AND ENGINEERING

April 2001

Sergio Tessaris

Department of Computer Science

Contents

Abstract 5

Declaration 6

Copyright 7

Acknowledgements 8

1 Introduction 9

1.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.1.1 Logic based knowledge representation . . . . . . . . . . . . . 10

1.1.2 Description logics . . . . . . . . . . . . . . . . . . . . . . . 11

1.2 Reasoning with DL knowledge bases . . . . . . . . . . . . . . . . . . 13

1.3 Thesis outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15

2 Preliminaries 17

2.1 Description Logics . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

2.1.1 Syntax and Semantics . . . . . . . . . . . . . . . . . . . . . 18

2.1.2 DL based systems . . . . . . . . . . . . . . . . . . . . . . . 22

2.2 Reasoning with DLs . . . . . . . . . . . . . . . . . . . . . . . . . . 25

2.2.1 Terminological reasoning . . . . . . . . . . . . . . . . . . . . 26

2.2.2 Hybrid reasoning . . . . . . . . . . . . . . . . . . . . . . . . 30

2.3 Role axioms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

2.4 The description logic ��f . . . . . . . . . . . . . . . . . . . . . . . 35

3 Model characterisation of ��f 36

3.1 Quasi transitive shrub interpretations . . . . . . . . . . . . . . . . . . 37

3.1.1 Canonical interpretations . . . . . . . . . . . . . . . . . . . . 41

3.2 Transforming interpretations into q.t. shrubs . . . . . . . . . . . . . . 46

3.2.1 Unravelling the interpretation . . . . . . . . . . . . . . . . . 48

3.2.2 Properties of unravelled interpretations . . . . . . . . . . . . 50

3.3 Completeness of q.t. shrub interpretations . . . . . . . . . . . . . . . 59

3.4 Application to terminological reasoning . . . . . . . . . . . . . . . . 61

3.5 Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

4 KB satisfiability algorithms 64

4.1 Alternative Abox reasoning techniques . . . . . . . . . . . . . . . . . 64

4.1.1 Direct tableaux . . . . . . . . . . . . . . . . . . . . . . . . . 64

4.1.2 Encoding . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65

4.1.3 Resolution based methods . . . . . . . . . . . . . . . . . . . 66

4.2 Precompletion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

4.2.1 KB satisfiability algorithm . . . . . . . . . . . . . . . . . . . 68

4.2.2 Correctness and completeness of the algorithm . . . . . . . . 71

4.2.3 Pros and cons of precompletion . . . . . . . . . . . . . . . . 74

5 Precompletion algorithm for ��f 77

5.1 Precompletions of knowledge bases . . . . . . . . . . . . . . . . . . 77

5.2 Satisfiability of precompletions . . . . . . . . . . . . . . . . . . . . . 83

5.2.1 Interpretation of roles . . . . . . . . . . . . . . . . . . . . . . 87

5.2.2 Interpretation of concepts . . . . . . . . . . . . . . . . . . . 95

5.2.3 Satisfiability of a precompleted KB . . . . . . . . . . . . . . 98

5.3 Knowledge base satisfiability . . . . . . . . . . . . . . . . . . . . . . 100

6 Querying Aboxes 101

6.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

6.2 Conjunctive Queries in DL . . . . . . . . . . . . . . . . . . . . . . . 103

6.2.1 Queries with multiple terms . . . . . . . . . . . . . . . . . . 104

6.2.2 Queries with variables . . . . . . . . . . . . . . . . . . . . . 106

6.2.3 Extending the framework . . . . . . . . . . . . . . . . . . . . 110

6.3 Relation with other works . . . . . . . . . . . . . . . . . . . . . . . . 111

7 Answering conjunctive queries in ��f 113

7.1 Query language . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

7.1.1 Syntax . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 113

7.1.2 Semantics . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

7.2 Completeness of quasi transitive shrub interpretations . . . . . . . . . 115

7.3 Answering boolean queries . . . . . . . . . . . . . . . . . . . . . . . 117

7.3.1 Query normal form . . . . . . . . . . . . . . . . . . . . . . . 118

7.3.2 Conjunctive queries . . . . . . . . . . . . . . . . . . . . . . . 120

7.3.3 Disjunction of conjunctive queries . . . . . . . . . . . . . . . 121

7.4 Answering queries in normal form . . . . . . . . . . . . . . . . . . . 123

7.4.1 Query transformation rules . . . . . . . . . . . . . . . . . . . 123

7.4.2 Notes on transformation rules . . . . . . . . . . . . . . . . . 129

7.4.3 Limitation of the technique . . . . . . . . . . . . . . . . . . . 139

7.4.4 Example of query answering . . . . . . . . . . . . . . . . . . 142

7.4.5 Termination . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

7.4.6 Correctness and completeness . . . . . . . . . . . . . . . . . 153

7.5 Speeding up the answer . . . . . . . . . . . . . . . . . . . . . . . . . 168

8 Implementation and testing 170

8.1 Description of the system . . . . . . . . . . . . . . . . . . . . . . . . 170

8.2 Optimising the algorithm . . . . . . . . . . . . . . . . . . . . . . . . 171

8.2.1 Evaluation strategy . . . . . . . . . . . . . . . . . . . . . . . 171

8.2.2 Axiom absorption and lazy expansion . . . . . . . . . . . . . 172

8.2.3 Lexical normalisation . . . . . . . . . . . . . . . . . . . . . . 172

8.2.4 Backjumping . . . . . . . . . . . . . . . . . . . . . . . . . . 173

8.2.5 Abox partitioning . . . . . . . . . . . . . . . . . . . . . . . . 175

8.2.6 Using the terminological reasoner . . . . . . . . . . . . . . . 176

8.2.7 Other techniques . . . . . . . . . . . . . . . . . . . . . . . . 179

8.3 Experiments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 180

8.3.1 DL benchmark suite . . . . . . . . . . . . . . . . . . . . . . 180

8.3.2 Measurements . . . . . . . . . . . . . . . . . . . . . . . . . 182

8.3.3 Notes on results . . . . . . . . . . . . . . . . . . . . . . . . . 184

9 Conclusions 190

9.1 Thesis contributions . . . . . . . . . . . . . . . . . . . . . . . . . . . 190

9.1.1 KB satisfiability . . . . . . . . . . . . . . . . . . . . . . . . 190

9.1.2 Query answering . . . . . . . . . . . . . . . . . . . . . . . . 191

9.2 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

Bibliography 193

Abstract

Description Logics (DLs) are a family of formal languages for describing complexstructured classes. These languages contain boolean operators and quantification overclass attributes, as well as the specification of elements of the classes and their proper-ties. DL knowledge bases (KBs) consist of a terminological part which describes thegeneral organisation of the classes, and an assertional part for describing the propertiesof individuals. In spite of its inherent intractability, practical algorithms have been re-cently developed for purely terminological reasoning; when individuals are introducedthere is still a lack of results.

This thesis constitutes an advance in the direction of the development of DL sys-tems providing efficient and powerful reasoning in presence of individuals. We choosean expressive DL (��f), which has been previously studied from the terminologicalperspective (i.e. without individuals), and we investigate the problem of reasoning withindividuals.

��f extends the standard DL �� with transitive roles, role hierarchy, and at-tributes. This extension enables the representation of general inclusion axioms usinga technique called internalisation. This thesis investigates two automated reasoningproblems: KB satisfiability and query answering.

KB satisfiability is the fundamental problem of deciding whether a given KB doesnot contain any contradiction. This problem is not only important for delivering con-sistent KBs, but most of the reasoning tasks for DL systems can be reduced to KBsatisfiability. We show how the technique of precompletion, which has previously onlybe used in less expressive DLs, can be used in our expressive DL.

A serious shortcoming of many Description Logic based knowledge representationsystems is the inadequacy of their query languages. We present a novel technique thatcan be used to provide an expressive query language for such systems. One of themain advantages of this approach is that, being based on a reduction to knowledgebase satisfiability, it can easily be adapted to most existing (and future) DescriptionLogic implementations. We believe that providing Description Logic systems withan expressive query language for interrogating the knowledge base will significantlyincrease their utility.

Declaration

No portion of the work referred to in this thesis has been

submitted in support of an application for another degree or

qualification of this or any other university or other institu-

tion of learning.

Copyright

Copyright in text of this thesis rests with the Author. Copies (by any process) either

in full, or of extracts, may be made only in accordance with instructions given by the

Author and lodged in the John Rylands University Library of Manchester. Details may

be obtained from the Librarian. This page must form part of any such copies made.

Further copies (by any process) of copies made in accordance with such instructions

may not be made without the permission (in writing) of the Author.

The ownership of any intellectual property rights which may be described in this

thesis is vested in the University of Manchester, subject to any prior agreement to the

contrary, and may not be made available for use by third parties without the written

permission of the University, which will prescribe the terms and conditions of any

such agreement.

Further information on the conditions under which disclosures and exploitation

may take place is available from the head of Department of Computer Science.

Acknowledgements

I would like to thank all the people who helped and inspired me during the period ofmy PhD: my supervisors (in order of appearance) Carole Goble, Graham Gough, andIan Horrocks, the present and past members of Information Management Group, andseveral other people in the Department of Computer Science (the list would be toolong).

I am particularly grateful to Ian Horrocks and Graham Gough for their support,both scientific and human. Moreover, without their tireless proofreading this thesiswould have been hardly readable.

I would like to mention Enrico Franconi as the principal cause for my involvementin the DL community, and to thank him for his advice.

This work has been supported by a grant from the Engineering and Physical Sci-ences Research Council.

Chapter 1

Introduction

Description logics are knowledge representation systems evolved from early semantic

networks and frame systems. They have been developed as a logical reconstruction

of the KL-ONE–like knowledge representation systems (see Woods and Schmolze

[1992]). They differ from their ancestors in that they place a strong emphasis on a

precise semantic characterisation of the modelling language. The need for a unequiv-

ocally defined semantics stems from the fact that having a clear semantics enables the

description and justification of the automated deduction processes.

Description logics have been proved effective in several application fields including

automated language understanding and generation, configuration and data modelling

(see Franconi [1994], Calvanese et al. [1998b]). In addition, their formal semantics

enables the exact characterisation of the expressiveness of a DL system against its

computational properties.

Although good results have recently been obtained in the development of optimised

algorithms for DL knowledge bases without individuals, little progress has been made

with reasoning involving individuals. This research is an attempt to fill this gap, by

exploring the possibility of developing optimised algorithms.

A serious shortcoming of many Description Logic based knowledge representation

systems is the inadequacy of their query language. In this thesis we present a novel

technique that can be used to provide an expressive query language for such systems.

This chapter presents an overview of background of the thesis, together with the

overall description of the document.

1.1. BACKGROUND 10

1.1 Background

1.1.1 Logic based knowledge representation

This thesis concerns knowledge representation systems that represent the domain using

a formal language. In these systems, a precise semantics unambiguously defines the

meaning of a knowledge base written using the language. In addition, the knowledge

representation system provides a set of automated reasoning services to manipulate

and/or interrogate knowledge bases.

In particular it is assumed that the knowledge can be partitioned into terminological

and assertional parts. The terminological part (or simply terminology) describes the

structural properties of the terms of the domain, while the assertional part depicts a

particular configuration of the domain by introducing individuals and asserting their

properties using the definitions in the terminology.

Example 1.1

TerminologyBirds are Animals.

Penguins are Birds and not Flyers.

Assertionstweety is-a Bird and not Flyer and

has FRIEND which-is Flyer.

This example shows a simple knowledge base written using a natural language–

like syntax. The terminological part defines the properties of Birds and Penguins,

while the assertional part states some properties of a particular individual (tweety).

This individual is described as a bird which cannot fly, having at least one friend which

is a flyer.

Partitioning knowledge in this way is quite a common methodology in computer

science. For instance, in the database setting the distinction between schemata and an

actual database could be seen as the dichotomy between the terminology and the as-

sertional part. Generally speaking there are two main reasons to adopt the partitioning

technique; firstly, the same terminology could be used for several knowledge bases,

and secondly, different algorithms can be used on each part, exploiting the different

characteristics of the two components.

Two other important features of knowledge representation systems are the possi-

bility of adopting an open world semantics for the knowledge base (see Reiter [1977]),

as well as the ability to represent incomplete information.

1.1. BACKGROUND 11

The adoption of an open world semantics leaves a degree of uncertainty regarding

facts that are not explicitly asserted. For example, individuals not mentioned in the

knowledge base could be part of the domain and each individual mentioned could have

more properties than the asserted ones. Referring to Example 1.1, tweety could

either be a penguin or not, and there could also be additional birds or animals in that

little world.

Incomplete information can appear in different forms. Firstly, the implied existence

of individuals without explicitly mentioning them: the knowledge base in Example 1.1

implies the existence of at least one flyer which is friend of tweety. Secondly, in

the form of disjunctive information, for example it could be specified that tweety’s

friend should be either an auk or a puffin.

Description logics knowledge representation systems take advantage of the above

properties. The next section gives an overview of the main features of this Description

Logic formalism, while a more formal description is provided in Chapter 2.

1.1.2 Description logics

There are several ways to give an intuition about what a description logic is. For a

knowledge representation audience, DLs could be viewed as a logical reconstruction

of frame systems and semantic networks. An object oriented database audience can

look at DLs as the static (declarative) part of a data definition language, while a lo-

gician could look at them as a syntactic variant of a multi-modal modal logic. Given

this intuition, dipping into the literature reveals a strong emphasis on the formal logic

nature of DLs (see KRSS). In fact a DL consists of a formal language, described by a

clear set theoretical semantics, together with a calculus (see Chapter 2).

Language expressions of a DL describe classes (concepts) of individuals that share

some properties. Properties can also be specified by means of relations (roles) between

individuals. The language is compositional, i.e. the concept descriptions are built by

combining different subexpressions using constructors. For instance, if the concepts

STAFF and TECHNICIAN are defined, the notion of technical staff can be represented

by putting together the two concepts in the expression �� . The kind

of expressions that can be built using a DL language are

Bird � �Flyer � ��FRIEND�Flyer�

Vehicle � ��PART-OF�Engine� � �� PART-OF�Wheel�

where elements in typewriter style are terms of the language (concepts in small letters

1.1. BACKGROUND 12

and roles in capital letters), while �, �, � are language constructors.1

The semantics of the language is given in a set theoretical way, over a domain which

is a set of elements. A concept expression corresponds to a subset of the domain, while

a role expression corresponds to a binary relation over the domain. As their shape

suggests, some of these constructors have a strong relationship with boolean operators

and logical quantifiers.

A DL knowledge base consists of a terminology, traditionally called the Tbox,

and an assertional part, called Abox. The Tbox contains the definitions of the terms

(concept definitions), while the Abox contains a set of membership and role assertions.

Membership assertions relate an individual to a concept, stating that the individual

involved is an instance of the concept. Role assertions state that two individuals are

linked by a given role. Example 1.1 could be expressed by the DL knowledge base in

Example 1.2.

Example 1.2

TboxBird Animal

Penguin Bird � �Flyer

Abox tweety��Bird � �Flyer � ��FRIEND�Flyer��

Reasoning with DL knowledge bases is a deduction process which extracts not only

the facts explicitly asserted in a knowledge base, but also their logical consequences.

For instance, from the knowledge base of Example 1.2 can be derived the fact that

tweety is an Animal. When only the terminology is involved in a deduction, the

reasoning is said to be terminological; otherwise it is said to be hybrid.

DL systems provide automated reasoning services for interacting with the knowl-

edge base. Typical reasoning services provided are: verifying the non–contradiction of

assertions in the knowledge base, checking that an individual satisfies a concept expres-

sion, retrieval of all the individuals satisfying a concept expressions and subsumption

checking, which verifies whether one expression is more general than another.2 Nev-

ertheless, it has been proved that all services can be reduced in polynomial time to the

problem of knowledge base satisfiability (see Schaerf [1994]), that is the checking for

the absence of contradicting assertions in a given knowledge base.

1The meaning of the different elements of the language will be clarified in Chapter 2.2An expression is more general than another if its corresponding set always includes that of the

second concept.

1.2. REASONING WITH DL KNOWLEDGE BASES 13

The term complexity of reasoning in this context refers to the complexity of the

knowledge base satisfiability problem, and it characterises the computational proper-

ties of a DL system. The complexity actually depends on the expressiveness of the

underlying DL language. Moreover, the language (and its expressiveness) is defined

by the kinds of constructors it provides. Therefore the complexity is determined by the

constructors provided by the language.

One example of the influence of the constructors on the computational properties

is the interaction of the role conjunction constructor with the number restriction con-

structor. The role conjunction constructor builds a new role expression from the inter-

section of atomic roles; while number restriction constructor constrains the number of

individuals related by a given role. These two kinds of constructor do not cause any

problem separately, but when they are taken together (i.e. allowing role conjunction in

the role of the number restriction) they cause the undecidability of the KB satisfiability

problem (see Baader and Sattler [1996]).

Different applications need different language constructors, and the modularity of

the language facilitates the addition of new constructors. The result is that there is not

a single DL language, but a family of different languages. A big effort has been spent

in extending the expressiveness of DL languages, while maintaining the soundness and

decidability of the reasoning services (see for example Buchheit et al. [1993]).

Nowadays the undecidability boundaries and the complexity of the reasoning are

well understood on the theoretical side (see Donini et al. [1997]). However there is still

a lot work to do on the definition of “practical” algorithms, and the analysis of the in-

teraction between the different constructors is still an active topic in the DL community

(see Franconi et al. [1998a]).

1.2 Reasoning with DL knowledge bases

Description logics have proved to be effective in various applications, both those that

require an assertional part and those that do not. In fact, while DLs were initially

developed to be the knowledge representation part of an AI system, recent work has

showed that their ability to reason in an abstract way about a domain3 could be ex-

tremely useful for knowledge modelling tasks. For example, DLs could be used as

tools for checking the consistency of entity–relationship or object oriented schemata

(see Calvanese et al. [1998b]).

3Without the necessity to populate the domain with individuals.

1.2. REASONING WITH DL KNOWLEDGE BASES 14

Recent years have seen significant advances in the design of sound and complete

reasoning algorithms for DLs with both expressive logical languages and unrestricted

Tboxes, i.e., those allowing arbitrary concept inclusion axioms (see Baader [1991],

Horrocks and Sattler [1999], De Giacomo and Massacci [1998]). Moreover, systems

using highly optimised implementations of (some of) these algorithms have also been

developed, and have been show to work well in realistic applications (see Horrocks

[1998], Patel-Schneider [1998]). While most of these have been restricted to termino-

logical reasoning (i.e., the Abox is assumed to be empty), attention is now turning to

the development of both algorithms and (optimised) implementations that also support

Abox reasoning (see Haarslev and Moller [1999], Tessaris and Gough [1999]).

This research starts with the premise that there are still several key applications

for a DL hybrid system. Generally speaking, applications that require the ability to

state incomplete knowledge about the individuals in the domain are good candidates

for the use of a hybrid DL; one example is natural language processing (see Franconi

[1994]). The optimisations recently applied to Tbox algorithms (see Horrocks and

Patel-Schneider [1998a]) give pointers towards the improvement or redesign of current

Abox algorithms in order to enhance performance in realistic applications.

In this thesis we discuss techniques and algorithms used to deal with two essential

reasoning tasks: knowledge base satisfiability and query answering. We focus on a

particular DL language, which is implemented in the terminological system4 FaCT

(see Horrocks [1998]). This DL, called ��f,5 possesses most of the relevant features

of an expressive DL language. In addition, experience with the FaCT system has shown

that ��f is well suited for knowledge representation modelling.

Several techniques have been proposed for the verification of the satisfiability of a

KB, and many of them have lead to the development of DL systems. However, some

of these techniques have been developed for DLs less expressive than is required for a

modern system. We investigate the extension of one of these techniques, the so called

precompletion technique (see Hollunder [1996]), for reasoning with ��f.

Although modern DL systems provide sound and complete Abox reasoning for

very expressive logics, their utility is limited compared to earlier DL systems by their

very weak Abox query languages. Typically, these only support instantiation (is an

4DL system without the Abox.5��f has been previously called ��f�� , but the new name follows a more compact naming

scheme recently introduced in Horrocks et al. [1999b].

1.3. THESIS OUTLINE 15

individual � an instance of a concept �), realisation (what are the most specific con-

cepts � is an instance of) and retrieval (which individuals are instances of �). This is in

contrast to a system such as Loom where a full first order query language is provided,

although based on incomplete reasoning algorithms (see MacGregor and Brill [1992]).

We present a technique for answering conjunctive queries over DL KBs. This

work has been inspired by the use of Abox reasoning to decide conjunctive query

containment (see Horrocks et al. [1999a], Calvanese et al. [1998a]). However, the

characteristics of ��f forces the use of an algorithm which is much more complicated

of those presented in these previous works.

Our main contributions can be summarised in the following points.

The extension of the already known precompletion technique for KB satisfia-

bility to the DL language ��f. Previously the technique was applied to DLs

without general inclusion axioms, transitive roles, or role hierarchies.

A query answering algorithm for disjunctions of conjunctive queries for ��f

KBs with Aboxes.

A prototype software system performing KB satisfiability for ��f has been de-

veloped. The system has been used for verifying the practical feasibility of the

precompletion technique, and the impact of different optimisations on the per-

formance of the system.

A characterisation of the class of models of the logic ��f. This was previously

done (implicitly) for the problem of KB satisfiability, but to our knowledge, it is

the first result for the query answering.

The next section gives an overview of the structure of this thesis.

1.3 Thesis outline

The thesis contains both introductory and technical chapters, in particular the reader

can get an overview of the work presented by reading the even numbered chapters only.

Chapter 2 gives a formal description of the Description Logics in general, and in

particular the DL ��f which is used in the following chapters.

1.3. THESIS OUTLINE 16

Chapter 3 describes the class of interpretations which characterise the logic ��f.

The results from this chapter are used in the proofs presented in Chapter 5 and Chap-

ter 7. This model–theoretic characterisation of ��f is not only essential for most of

the proofs, but it provides an insight on the expressiveness of ��f.

Chapter 4 and Chapter 5 investigate the problem of Knowledge Base satisfiability

for ��f. The first gives an overview of the methods which have been used for tackling

the problem, and provides an introduction to the technique we investigated. Chapter 5

provides a formal description of the precompletion algorithm, and shows the proofs

for its correctness and completeness.

Chapter 6 and Chapter 7 introduce and provide a solution to the problem of answer-

ing conjunctive queries over ��f knowledge bases. The first chapter gives an intuitive

presentation of the problem and our solution; while Chapter 7 provides the full details.

Chapter 8 describes the experiments we performed with an implementation of the

algorithm for KB satisfiability. Finally, Chapter 9 draws conclusions from this work

and presents further lines of research suggested by the results achieved so far.

Chapter 2

Preliminaries

In this chapter Description Logics (DLs) are formally introduced, together with the

description of what we intend by the term DL based system. In Section 2.1 we present

the syntax and semantics of the logics belonging to the class of DLs, followed by the

description of DL based systems. Section 2.2 introduces common reasoning problems

over DL knowledge bases (KBs), and the automation of the reasoning itself. Sec-

tion 2.3 describes an extension of DL KBs with the introduction of axioms on roles.

Finally, we describe the DL language we investigate, and which will be used towards

this thesis.

2.1 Description Logics

The core of Description Logics is the concept language, a formal language designed to

describe classes and relationships between elements of the classes. DL based knowl-

edge bases are built using concept languages expressions, and they are usually divided

in two distinct parts: intensional and extensional. The intensional part describes the

general schema of the classes and relationships, while the extensional part constitutes

a (partial) instantiation of the schema, since it contains assertions about a set of indi-

viduals. Historically the intensional part takes the name of terminology or Tbox, and

the extensional part the name of assertional part or Abox. Queries on a DL knowl-

edge base are answered by an inferential process involving both the Tbox and Abox

parts. The answer to a query is deduced as logical consequences of the content of the

knowledge base according to the formal semantics.

2.1. DESCRIPTION LOGICS 18

2.1.1 Syntax and Semantics

All description logic systems are based on a common family of languages, called con-

cept languages, for describing structured classes of objects. The foundations of con-

cept languages are concepts and roles: a concept represents a class of objects sharing

some common characteristics, while a role represents a binary relation between objects

or, in other words, attributes attached to objects.

The language is completely described by a formal syntax and a Tarsky-like seman-

tics. A formal definition of the language is essential for knowledge bases characterisa-

tion, and for the definition of reasoning services.

A concept language is composed of an alphabet of distinct concept names (�� ),

role names (�� ) and individual names ( ); together with a set of constructors for

building concept and role expressions. Concept expressions (or simply concepts) de-

scribe subsets of the domain; for example Motorcycles � Cars denotes the union

of the elements in Motorcycles and in Cars. Describing the abstract syntax, con-

cept names are indicated by letter � or �, role names by � , and individual names by

lowercase letters �, �. Concept expressions are indicated by letters � or �, and role

expressions by or .

Description logics form a family of different logics, distinguished by the set of con-

structors they provide. Each language is named according to a convention introduced

in Schmidt-Schauss and Smolka [1991]: each constructor is associated to a different

capital letter (e.g. � to disjunction and � to negation) and the name of a language is

composed by the prefix �� (acronym for attributive language) followed by the letters

correspondent to the constructors in the language (e.g. �� or ��). One of the rea-

sons for having such a fine granularity on the classification of concept languages is that

the computational properties of the reasoning changes dramatically with the presence

or absence of a particular constructor; therefore the ability to indicate precisely the

constructors which are in a DL is very useful.

The very basic language denoted by the prefix �� is close to the expressivity

of frame based representation systems. It enables the specification of hierarchies of

concepts by means of the conjunction of two concepts (�). The expression � � �

denotes the intersection of the elements in � and in �. The hierarchy comes from the

fact that � �� is more specific than both � and �, because it denotes a smaller set of

elements. A second group of constructors in �� enables the specification of attributes

by the unqualified existential (��) and qualified universal (��) quantifiers. These

expressions build new concepts in terms of the roles: expression �� denotes the set

of elements related to some other element by the role , while �� denotes the set

of elements which are related by the role exclusively to elements of the concept

�. Using frame systems nomenclature, the existential quantifier represents the fact

of having attribute , while the universal quantifier specifies the type restriction of

that attribute. In addition, the concept language �� has the ability to represent the

complement of concept names by the expression ��,1 and it provides the symbols �

and � for the full domain and the empty set respectively.

The semantics of DL constructors is defined in terms of an interpretation � �

�� consisting of a nonempty domain �� and a interpretation function �� . The in-

terpretation function maps concept names into subsets of the domain, role names into

subsets of the cartesian product of the domain (��), and individual names into el-

ements of the domain. The only restriction on the interpretations is the so called unique

name assumption, which imposes that different individual names must be mapped into

distinct elements of the domain. Given a concept name� (role name � ) the set denoted

by �� (� �) is called the interpretation, or extension, of � () w.r.t. �.

Example 2.1

For example, given the concept names Even, Odd, the role SUCC, and the individuals

one,two,three and four. A possible interpretation is composed by the natural

numbers �� and the interpretation function defined by:

Odd� � ��

Even� � ��

SUCC� � ��

one� � �

two� � �

three� � �

four� � �

Note that at this stage there is no way to say that an interpretation is correct or

wrong. In this example the interpretation �� “looks correct” because of the natu-

ral language semantics we associate to the names, but a different interpretation where

the role SUCC is mapped to �� would be a valid interpretation as well.

1But not the negation of a general concept expression.

Syntax Semantics Description

� �� concept name� �� top� � bottom

� �� conjunction��

��

�universal quantification

��

�existential quantification (�)

� �� general negation (�)� �� disjunction (�)� ��

��

�number restriction (� )

� ��

��

� ��

��

�qualified number restriction(�)

� ��

��

� ��

��

one-of (�)

Table 2.1: Syntax and Semantics of concept expression constructors.

Syntax Semantics Description

� � � � �� role name � � �� role conjunction (�) � � �� role disjunction��

��

�converse role (�)

Æ � Æ� role composition�

��

�test

��

�� reflexive transitive closure: ��

�� , and �� Æ�

Table 2.2: Syntax and Semantics of role expression constructors.

Interpretations can be naturally seen as directed labeled graphs; the nodes are the

elements of the interpretation domain, while the interpretation function provides both

the labeled edges and the set of node labels. For example, the interpretation � of

Example 2.1 is represented by the graph

� � � �

one two three four

�Odd� �Odd� �Odd��Even� �Even�

�SUCC� �SUCC� �SUCC� �SUCC�

�SUCC�

Note that the node and edge labels are sets of concept and role names respectively,

because a single element of the domain can be in the extension of more than one con-

cept name (and analogously a pair of elements in the extension of several roles). In the

graph we indicated the mapping for the individual names with the dotted lines, but in

general this will be omitted when we are interested in the structure of the interpretation

Expressions are interpreted according the semantics given in Table 2.1 and Ta-

ble 2.2; for instance, with respect to the Example 2.1 the expression ��SUCC�Even��

is interpreted as the set �� . The interpretation of a concept expression can be the

empty set as well; for instance, with respect to the interpretation of Example 2.1 the

set �Even � ��SUCC�Even�� is empty.

An interpretation is said to be a model for a concept expression if the extension of

that expression is nonempty. The existence of a model for a concept expression defines

the satisfiability of the concept itself; i.e. a concept expression � is satisfiable if and

only if there exists an interpretation � such that �� is nonempty.

Different concept expressions can be compared with respect to their models. Two

concepts are equivalent if and only if their extensions are equal with respect to every

interpretation. In a similar way, a concept is subsumed by another if the extension

of the first one is included in that of the second with respect to every interpretation.

Formally, concept � is subsumed by concept � (written as � �) if and only if for

any interpretation � the inclusion �� holds.

2.1.2 DL based systems

Traditionally, the knowledge base of a DL-based system is composed of two distinct

parts: the intensional (Tbox) and the extensional (Abox) components.

Terminology

The intensional part describes the relation between concepts and roles expressions. It

can be seen under two different lights: as a collection of definitions for role and concept

names, or as a set of axioms that restrict the models for the knowledge base. The Tbox

is composed of a set of statements of the forms:

�� (2.1)

� � (2.2)

The first statement asserts that the concept expressions � and � are equivalent,

while the second that concept expression � is more specific that (or included in) ex-

pression �. When the left hand side of the statement is a concept name (�� or

� �), the statement is said to be definitional, because it defines the characteristics

of the concept name. Examples of general statements are �Odd � Even� � and

� �� SUCC�, while Integer�� Odd � Even� � �� SUCC� is a definition.

The semantics of Tbox statements is given in terms of interpretations analogously

to the subsumption and equivalence seen in Section 2.1.1. An interpretation � satisfies

(is a model of) the statement�� if and only if�� , and it satisfies�� if and

only if�� . The generalisation to a full Tbox is straightforward, an interpretation

is a model for a Tbox if it satisfies all the statements contained in the Tbox.

DL systems are also characterised by the kind of statements they allow on the

Tbox. Systems with free Tboxes allow any kind of assertions, others restrict to the

definitional statements or to acyclic assertions only. A terminological cycle in a Tbox

is a recursive definition, or one or more mutually recursive definitions. Examples of

recursive definitions are Human �OFFSPRINGS�Human or the pair of definitions�Odd

�� Even � �SUCC�Even��

Even �SUCC�Odd

��

Allowing terminological cycles in the Tbox rises several problems, on both the se-

mantic and computational aspects (see Nebel [1991]). For this reason, most of the early

DL systems do not allow cycles in the terminology (see Borgida and Patel-Schneider

[1994], Baader and Hollunder [1991b]). On the other hand, recursive definitions are

ubiquitous in computer science (e.g. data structures), and a very natural way to model

application domains (e.g. the Human definition above). For these reasons recursive

definitions have been widely studied (see Nebel [1991], De Giacomo and Lenzerini

[1997]), and all modern systems provide unrestricted concept inclusion assertions.

The first problem for cyclic definitions is providing a semantics; the different pro-

posed semantics are either fixed point based or descriptive. Using the descriptive se-

mantics, all the interpretations satisfying the Tbox statements are models for the Tbox

itself. While under the fixed point semantics, interpretations are filtered according a

fixed point operator (see Nebel [1991], De Giacomo and Lenzerini [1997] for more

details). The descriptive semantics is adopted here because of its wide acceptance as

the most appropriate one (see Donini et al. [1996b]).

We distinguish three different types of terminologies: definitional acyclic, defini-

tional, and free. The latter (free) type is the most general one: every kind of termi-

nological axiom of the form � � can be put in the terminology2. The definitional

approach restricts the form of the axioms in such a way that only concept names are

allowed on the left hand side of an axiom, and each name can appear only once in a

left hand side. Under the definitional restriction the axioms are called definitions.3 The

most restrictive approach imposes that the definitions must be acyclic.

Many systems take the acyclic definitional approach because it is the simplest case,

and it exhibits good computational properties. In addition, it can be shown that a

knowledge base containing only acyclic definitions can be transformed in an equivalent

one with an empty terminology.

The transformation proceeds in two phases. Firstly, all the inclusion definitions

of the form � � are transformed into equivalent equality definitions �� ,

introducing a new concept name for each inclusion. Finally, all the defined concept

names in concept expressions are substituted with their definition. After the transfor-

mation the Tbox can be ignored and all the reasoning can be carried out with the fully

expanded expressions.

2The inclusion axiom is general enough for representing definitions (��) as well. In fact, a single

definition �� can be expressed by the pair of inclusion axioms � �� and � � �.

3Actually, in DLs which provide the disjunctive constructor, free terminologies can always be trans-formed in (possibly cyclical) definitional terminologies (see Buchheit et al. [1993]).

Abox assertions

The extensional part describes a particular configuration of a domain, introducing in-

dividual names and their properties. Aboxes are composed of statements of the form:

�� (2.3)

�� (2.4)

The assertion �� expresses the fact that the interpretation of individual � is in the

extension of a concept expression �. The role assertion �� states that the pair

composed by the interpretations of individuals � and � is in the interpretation of role

expression . The semantics of an assertion is given with respect to interpretations.

An interpretation � satisfies �� (�� ) if and only if �� (�� ).

Analogously to the Tbox, an interpretation � is a model for an Abox if it satisfies

every assertion in the Abox.

DL knowledge bases

A DL knowledge base is a pair � � �� , where � is a Tbox and � an Abox.

Given the definition of models for a Tbox and an Abox, a model for a knowledge base

� � �� is a model for both � and �. The definition of models of a knowledge

base naturally introduces the concept of logical implication: a statement� is a logically

implied (or entailed) by the knowledge base � (written as � �� ) if and only if � is

true in every model of �. For example the statement �� is a logical implication of the

knowledge base � � �� .

Two knowledge bases are equivalent if the models for one are models for the

other and vice versa. This equivalence is a powerful tool for manipulating knowl-

edge bases: as long as the equivalence is guaranteed, a knowledge base can be modi-

fied without affecting its logical implications. For example the knowledge base � �

�� is equivalent to ��

�� ,

and the statement �� is logically implied by both � and � �.

2.2. REASONING WITH DLS 25

2.2 Reasoning with DLs

Reasoning with DL knowledge bases is the fundamental process of discovering the

facts entailed by the knowledge base. The interaction with a DL system is performed

by a collection of reasoning services implemented employing automated reasoning

techniques. The reasoning services may differ according to the purpose of the DL

system, for example checking the truth value of boolean queries, or verifying the con-

sistency of the knowledge base.

The reasoning services provided by a DL system are formally defined in terms of

the logical consequence. Reasoning services can be classified as basic services, which

involve the checking of the truth value for a statement, and complex services. Among

complex reasoning services are tasks such as finding all the individuals being in a

concept expression, or organising the concept names appearing in the terminology in a

taxonomy. Basic services are for instance the verification of the subsumption between

two concepts or the satisfiability of a concept. The principal basic services are:

Knowledge base satisfiability written as � �� !" �, is the problem of checking

whether there is at least a nonempty model for �.

Concept satisfiability written as � �� !" �, is the problem of checking whether

there exists a model of � in which the extension of the concept � is not empty.

Subsumption written as � �� , is the problem of verifying that in every model

of � the extension of � is included in that of �.

Instance Checking written as � �� , is the problem of verifying that in every

model of � the interpretation of � is in the extension of �.

Reasoning services are not independent of one another, on the contrary each can be

often reduced to another. The reducibility actually depends on the underlying concept

language: using a language closed under negation, all the basic reasoning services can

be reduced to knowledge base satisfiability (see Schaerf [1994] for the details).

The complex reasoning tasks provided vary from system to system, and are de-

fined on top of the basic services. The most common are classification and retrieval.

Classification consists of explicitly representing the concept taxonomy entailed by the

knowledge bases. The concept taxonomy is a graph whose nodes are the concept

names that appear in the knowledge base, and the edges represent the subsumption

relation between them. This graph can be constructed by checking the subsumption

between every pair of concept names. Retrieval (or query answering) is the collecting

of all individuals in the knowledge base that are instance of a given concept in every

model of the knowledge base. It is formally defined by the set �� ,

where � is the querying concept.

2.2.1 Terminological reasoning

Terminological reasoning involves only the terminology (i.e. without considering Abox

assertions). In spite of the fact that all the reasoning services can be reduced to knowl-

edge base satisfiability, terminological reasoning is important because of the fact that

for most of the concept languages the terminology does not depend on the assertional

part. This independence relies on the fact that some services which require dealing

with concept expressions only (e.g. subsumption) can be performed ignoring the asser-

tions. This assumption is extremely useful; for example, it enables the verification that

an Elephant is a Mammal without considering the complete knowledge base about

wild life.

Unfortunately, not all concept languages enjoy this property; among these are the

concept languages in which individuals can be used in concept expressions (e.g. the

ones with one-of). In fact, while in general the expression �� is not sub-

sumed by ��, with respect to a knowledge base containing the assertions ��

the subsumption actually holds:

� � ��

Since most of the languages enjoy the property of terminology independence, and

terminological reasoning services are fundamental for knowledge representation sys-

tems (classification, for example), a big effort has been spent on the development of

efficient concept satisfiability algorithms (see Horrocks and Patel-Schneider [1998a]).

The concept satisfiability problem is general enough, because concept subsumption

can be reduced to the unsatisfiability of a concept expressions: � � if and only if

� � �� is unsatisfiable.4

The main idea of the concept satisfiability algorithm, presented in Baader and Hol-

lunder [1991a], is based on a notational variant of the first–order tableaux calculus. It

is based on the general idea that, since presenting a model is a way of showing that the

4If �� is unsatisfiable � intersected with �� it is empty in each model, so � has to be includedin �. The other way around is analogous.

concept is satisfiable, a tableau can be used to build a model of the given concept.

Example 2.2

Looking for a model of expression � � �� means building an interpretation � in

which the expression’s extension is not empty. The process starts “creating” an object

in the domain �� , which belong to the extension of the concept � � ��:

� ��

should be in both �� and �� because of the semantics of �, so two more

constraints are induced:

� ��

The semantics of the existential constructor guides the generation of a new individual

� � �� (potentially different from ) such that:

� � ��

From the set of constraints above an interpretation can be derived:

��

� � ��

which can be easily verified as being a model for the initial concept expression � �

��.

Example 2.2 can be generalised by the notion of constraint system as defined in

Schmidt-Schauss and Smolka [1991]. A constraint system is a set of constraints, which

are syntactic elements of the forms:

�� concept constraint

� � �� role constraint

!� � inequality constraint

Although the syntax is similar to the one for the Abox assertions, there is a difference in

the meaning of the and � items. These are not individuals, but variables, so the unique

name assumption does not apply to them unless stated by an inequality constraint. The

similarity with Abox assertions is that both kind of expressions restrict their arguments

to be in a concept or role expression.

When in a constraint system between two variables the inequality constraint holds,

these two variables are said to be separated. A constraint system can be seen as a graph

where the edges are labelled with the roles, and the nodes are the variable names; when

a variable is connected by a role to a variable �, � is said to be an -successor

of . Given a constraint system � the notation �� represents the constraint system

obtained by substituting each occurrence of variable � with variable � in the constraint

system �.

The process of looking for a model proceeds by extending (or completing) a con-

straint system using a set of completion rules. These rules modify a constraint system,

adding or rewriting the constraints contained on it. For example the rule for the exis-

tential quantification (see Example 2.2) is defined as:

� #� ��

if �� in �, � is a new variable

and there is no � s.t. both � � �� and �� in �.

The meaning of the rule is that the constraint system � should be expanded by

adding the constraints � � �� and ��, if the conditions in the following lines are

verified. The set of completion rules varies according to the different concept lan-

guages, for example the rules for �� are shown in Figure 2.1.

The concept satisfiability algorithm starts with the constraint system � ��, where

� is the concept to check, then it applies the completion rules as long as the precon-

ditions are satisfied. When no rule is applicable the constraint system is said to be

completed, and if there are not contradictory constraints, a model for the concept �

can be derived (see Donini et al. [1996b]). Contradictions in a constraint system are

detected by the so–called clashes, which are constraint systems containing constraints

of one of these forms:

1. � ��

2. � ��

3. � �$ ��

� �� !� �� !� ��

� #� � �� if �� in �,and both �� and �� are not in �.

� #� � �� if �� in �,neither �� or �� is in �and � � �� or � � ��.

� #� �� if �� in �, and � � �� is in �,and �� not in �.

� #� �� if �� in �, � is a new variableand there is no � s.t. both � � �� and �� in �.

� #� �� !� �� !� �� if �� in �, �� are new variablesand has not � pairwise disjoint -successors in �.

� # �� if �$ � in �, has no more than � -successors in �,� � ��, � � �� are in �,and � !� � is not in �.

Figure 2.1: Completion rules for ��

Not all the completion rules are deterministically applicable (i.e. only in one way).

For example in the disjunction rule#� one of the two concepts can be nondeterministi-

cally chosen to be inserted in the augmented constraint system; the choice can generate

results in two different constraint systems. One is chosen and if it leads to a clash, the

algorithm backtracks and tries the second one. In the case of �� the nondetermin-

istic rules are those associated with disjunction, and with the upper number restriction

(the variables to be merged are nondeterministically chosen).

The algorithm based on the completion rules has been proven to solve the concept

consistency checking problem; i.e. it produce a clash free completed constraint system

if and only if the concept is satisfiable (see Donini et al. [1996b]). Presenting the for-

mal proof of that result is not in the scope of this work, but an overview of how it is

structured can be useful because of it can be generalised to other concept languages

with different set of rules. The first step consists in showing that the deterministic rules

application preserves the satisfiability of the constraint system, and that the nondeter-

ministic rules can be applied in such a way that the satisfiability is preserved. The

second step consists in showing that the rule application always terminates; i.e. all

the possible completed constraint systems which can be generated are finite. Finally,

the last step shows that from a completed clash-free constraint system a model can be

constructed.

2.2.2 Hybrid reasoning

Hybrid reasoning takes account of both the parts of a knowledge base. We consider al-

gorithms for solving the problem of knowledge base satisfiability. In principle, this ap-

proach is general enough because all reasoning services can be reduced to knowledge

base satisfiability (see Section 2.2). Indeed, most of the DL systems available pro-

vide different reasoning services by reducing them to KB satisfiability, and although

alternative approaches have been suggested (see for example Rousset [1999]), no other

satisfactory alternative has yet been found.

For a large class of concept languages, including �� , the algorithm can be

a generalisation of that used for terminological reasoning. The notion of constraint

system is extended, allowing the presence of constraints for individuals as well as vari-

ables. In addition, due to the unique name assumption, each pair of different individu-

als implicitly introduces an inequality constraint which forces them to be separated.

The initial constraint system is not composed by a single constraint as the termi-

nological case, but the full Abox is included in the constraint system. Each role or

concept assertion is added to the constraint system as a corresponding role or concept

constraint. Then the algorithm proceeds, completing the constraint system as in the

terminological case. In Buchheit et al. [1993] the proof of correctness and complete-

ness of the technique has been provided. Example 2.3 below, shows an knowledge

base completion for an instance checking problem.

Example 2.3

Consider the following knowledge base with an empty terminology:

� � � �

��susan�Female�bill��Female�

�sarah�susan��FRIEND� �sarah�andrea��FRIEND�

�susan�andrea��LOVES� �andrea�bill��LOVES

��

This is one of the classical examples for showing the necessity of reasoning by case

in DL. The example pivots on the fact that the sex of the individual andrea is not

specified, so the KB contains incomplete knowledge.5

We want to check whether sarah has a female friend loving a male, i.e. verifying

the instance checking problem

� �� sarah��FRIEND��Female � ��LOVES��Female��

It is easy to verify the instance checking problem using reasoning by case: we can

assume that andrea is either in the concept Female or �Female. In the first case,

andrea is the female friend of sarah loving a male (bill). In the second case,

susan is the female friend of sarah loving a male (andrea).

This problem can be reformulated in term of KB satisfiability by verifying that the

knowledge base augmented by the assertion

sarah��FRIEND��Female � ��LOVES��Female��

is unsatisfiable (see Donini et al. [1994]). The corresponding initial constraint system

is:6 ��sarah��FRIEND��Female � ��LOVES�Female��

susan�Female�bill��Female�

�sarah�susan��FRIEND� �sarah�andrea��FRIEND�

�susan�andrea��LOVES� �andrea�bill��LOVES

��Using the universal rule (#�) applied to the first constraint and the role constraints,

new constraints for susan and andrea are introduced:�susan��Female � ��LOVES�Female��

andrea��Female � ��LOVES�Female��

With the new constraint for susan the nondeterministic #� rule can be applied.

However choosing the susan��Female constraint leads to a clash with the initial

susan�Female assertion; therefore the second concept is chosen, and the constraint

system is extended with: �susan��LOVES�Female�

5The name Andrea is considered a male name in Italy, and female in most of the rest of the world.6All the negations have been pushed in front of concept names.

2.3. ROLE AXIOMS 32

This constraint, together with the role constraint �susan�andrea��LOVES can be

used with the universal rule leading to the new constraint:�andrea�Female

Again the #� can be applied to the not yet considered constraint

containing a disjunction. As in the previous case, choosing andrea��Female leads

to a clash, so the added constraint is:�andrea��LOVES�Female�

which can be used with the �andrea�bill��LOVES constraint to add the new con-

straint bill�Female, which generates a new clash. At this point all the possible

nondeterministic choices have been explored, and all the possibilities leaded to a clash.

Therefore the constraint system is not satisfiable, and the answer to the instance check-

ing problem is affirmative.

2.3 Role axioms

The DLs we have introduced up to now are powerful languages for describing classes

of elements, but the possibility of specifying “global” properties of the roles is limited.

For instance, a DL with the transitive closure constructor (see Table 2.2) can “talk”

about the transitive closure of a role but it cannot restrict the interpretation of a role to

be transitive.

Most of the recently developed DL systems have introduced the possibility of

adding axioms about roles into the terminology (see Horrocks [1998], Haarslev and

Moller [2000b]). Usually these axioms state the inclusion relationship between role

names; but, in addition, transitivity or functional restrictions can be imposed on role

names. The choice of role axioms provided by a DL system depends on the balance

between the representional usefulness of the construct, and the practical tractability of

the KB satisfiability problem.

Although some of the role restrictions can be imposed by appropriate concept ax-

ioms, having an algorithm which handles them directly is usually more efficient. A

2.3. ROLE AXIOMS 33

typical case is the functional restriction; i.e. the fact that every element has at most one

successor. This restriction can be enforced by a role axiom which states that a role

� is functional, or by using the concept axiom � $ �� (in DLs providing the

number restriction constructor). In spite of the fact that they are equivalent, the latter

method may introduce unnecessary nondeterminism in the tableau construction (see

Section 2.2.1).

In our work we concentrate on three kinds of role restrictions: transitivity, func-

tional, and inclusion (or role hierarchy). Indicated in the name of the logic respectively

by the letters �� , �, and f.

Transitivity This restriction on role names states that the transitive closure of a role

must be contained in the role itself. Formally, this can be expressed as a requirement

for interpretations, in the same way as in the case of concept axioms (see Section 2.1.2).

For example, if the role is transitive then an interpretation � � �� satisfies the

transitivity of iff

�� implies � � ��

Transitivity can be extremely useful for modelling part-of relations (see Sattler [1996]

for more details). The reader may have noticed the similarity of the transitive re-

striction with the transitive closure constructor on roles (see Table 2.2). In fact, the

latter can be used in place of the transitive restriction (the converse is not true). How-

ever, transitive restriction has proved to be practically more tractable than the transi-

tive closure constructor, leading to simpler satisfiability algorithms (see Horrocks et al.

[1999b]).

Generally, the transitivity restrictions are not directly represented as axioms in the

terminology; instead, we distinguish a subset of the role names as the set of transitive

roles (i.e. � �� ).

Functional If a role name is functional, in every interpretation satisfying the restric-

tion the relation associated to the role name must be a partial function (i.e. it relate

any element of the domain to no more than one element). It is easy to see that this

is equivalent to a concept axiom of the form � $ ��. However, functional re-

strictions are common in domain modelling, therefore they are usually handled by the

satisfiability algorithm directly.

Analogusly to the transitivity restrictions, functional roles are considered a subset

2.3. ROLE AXIOMS 34

of the set of role names (i.e. %�� ). In addition, we impose the limitation (as

in the FaCT system) that the sets of functional roles and transitive roles are disjoint.

The reason for this limitation lies in the fact that is not clear whether transitive func-

tional role are of any usefulness in the modelling process; moreover, the decidability

of such a DL is still an open problem (we know that allowing transitive roles in number

restriction leads to undecidability, see Horrocks et al. [1999b]).

Inclusion This kind of axiom is the role counterpart of the concept axioms shown

in Section 2.1.2, with the difference that they relate sets of pair of elements of the

domain instead of sets of elements. Role axioms are expressed by statements of the

form � �, where � and � are role expressions. An interpretation � � ��

satisfies the role axiom iff

��

� �

We consider only simple axioms where the role expressions are simply role names, in

this way the role axioms express only a taxonomy of role names. It is worth notic-

ing that axioms hide an implicit negation,7 therefore without restrictions on the role

axioms full boolean role expressions can be obtained if the DL has conjunction or dis-

junction role constructors. The problem is that having a language for role expressions

closed w.r.t. negation push the DL close to the decidability border (see Lutz and Sattler

[2000]).

Obviously, sub-roles of functional roles must be functional as well; therefore if the

DL provides these three restrictions a transitive role cannot be included in a functional

role (see the discussion about transitive functional roles above).

Role hierarchy can be “simulated” by using either role conjunction or disjunction.

Roughly speaking, the idea is to substitute each node of the role taxonomy by either

the conjunction of all the top level role names “above” the node, or the disjunction of

the leaves below it. Each role name in then replaced by the corresponding expression;

this is more or less the same process as the “unfolding” of the concept definitions. This

role unfolding can be performed only if the DL does not provide the qualified num-

ber restriction constructor (i.e. $ ��) because for keeping decidability complex

role expressions are not allowed as argument of this constructor (see De Giacomo and

Lenzerini [1994]).

An example of the expressivity that role restrictions can provide is given by the fact

7Just consider the concept case in which the axiom � �� is equivalent to assert that every elementof the domain must belong to the interpretation of the concept ��

2.4. THE DESCRIPTION LOGIC ��F 35

that arbitrary concept axioms can be represented by the combination of transitivity and

role inclusion (see Baader [1991]).

2.4 The description logic ��f

In the rest of the thesis we consider the DL ��f; this is the very same logic handled

by the DL system FaCT (see Horrocks [1998]) with the addition of the Abox.

��f is an extension of the logic �� to include transitive roles, role hierarchy and

functional restriction. Following the naming convention introduced in Horrocks et al.

[1999b], the DL �� extended with transitive role (i.e. ��) is abbreviated as �.

This is due to the relationship with the propositional (multi) modal logic �� (see

Schild [1991]).

Summarising, the DL ��f is built over a signature of distinct concept (�� ), role

(�� ) and individual ( ) set of names; in addition we distinguish two non–overlapping

subsets � �� , %�� of �� which denote the transitive and the functional roles.

Valid concept expressions are defined by the abstract syntax:

� � � � � � � � � � ��

where � is a concept name chosen from the set �� and a role name from �� .

A ��f knowledge base is composed by a Tbox containing axioms of the form

�� or � �, and an Abox containing individual assertions of the form �� or

�� . The semantics is the one described in the previous sections.

Since role inclusion axioms contain role names only, from the set of role inclusion

assertions a taxonomy of role names can be trivially build. In addition, if there is a

cycle in the axioms, then all the names involved in the cycle must correspond to the

same binary relation in every interpretation (satisfying the axioms). For this reason,

we assume that the role taxonomy is acyclical, and we represent it by a partial order &

defined over the set of role names.

In the next chapter we characterise the structure of the models for a ��f KB. The

idea is to recognise a class of interpretations which completely describes ��f, in the

sense that whenever a KB has a model it has a model of this class as well.

This characterisation of the logic will be crucial for the following chapters describ-

ing the KB satisfiability and query answering algorithms.

Chapter 3

Model characterisation of ��f

As anticipated in the previous chapters, many of the convenient properties of Descrip-

tion Logics stem from their tree model property; i.e. the fact that every satisfiable

formula has a model that is a tree (see Vardi [1997], Gradel [1999]). In the logic we

consider the picture is slightly more complicated by the fact that Tbox assertions on

roles and the Abox can force non-tree shaped models. In particular, the transitivity

forces the presence of “shortcuts” corresponding to the transitive closure of parts of

the tree; while the Abox assertions can force any graph shape among the nodes corre-

sponding to individual names. However, we still have a tree–like model property; the

models look like a cloud containing the nodes corresponding to individual names with

(possibly transitive) trees hanging off the nodes in the cloud (see Figure 3.1).

Figure 3.1: Structure of interpretations for ��f

3.1. QUASI TRANSITIVE SHRUB INTERPRETATIONS 37

We call this kind of interpretations shrubs, because we distinguish a central inter-

connected part with branches coming out of it.1 There are a few properties of these

interpretations which are fundamental for the proofs we are going to present in the

following chapters. The first one is that elements in distinct trees are not connected by

any link; i.e. elements of the trees cannot “see” elements of other trees. The second

is that each tree is “attached” to a single node corresponding to an individual; if there

are links starting from other nodes in the cloud, then these edges must be the result of

a transitive closure. This means that all the connections of the tree with the rest of the

graph are “mediated” by the individual the tree is attached to.

This chapter will formally present the properties of this class of interpretations

which we have called quasi transitive shrubs. In addition, we define a transformation

for arbitrary interpretations which enables us to present a completeness result for both

the problem of satisfiability (see Section 3.3) and logical implication (see Section 7.2)

for ��f.

Note that completeness w.r.t. KB satisfiability can be proved by using a technique

simpler than the one we are presenting in this chapter. In fact, it is well know that

the DL ��f has the finite model property (see Horrocks [1998]),2 while our technique

always generates infinite interpretations (even uncountable if the starting domain is

uncountable). So the natural question is whether we need a different mechanism. Our

answer to this doubt is obviously positive, because we need to show the completeness

w.r.t. a problem not directly reducible to KB satisfiability (see Section 7.2). We will

come back to this latter point in the last section of this chapter.

3.1 Quasi transitive shrub interpretations

As explained above we are interested in interpretations being structured as a collection

of trees whose roots are possibly interconnected by links. We start from a set of trees,

then we add the required links connecting the roots to the rest of elements.

There are different way of building a tree; the one we have chosen consists of

selecting an alphabet (without any restriction of cardinality), and then consider the set

of all finite sequences of elements of that alphabet. When it comes to actually build

the tree by adding the edges, we are going to connect two sequences only if the second

one is equal to the first one with one more element of the alphabet (i.e. a sequence

1Strictly speaking a proper shrub does not have the interconnected core, but from an adequate dis-tance they looks like balls of leaves with a few branches coming off them.

2Without Abox assertions, however Abox is not affecting the finiteness.

�� can have only sequences �� as successors). This process naturally

leads to a forest where each element of the selected alphabet (or more precisely the

sequence containing a single element) can be the root of one of the trees. Note that

any node of the trees can have as many successors as the cardinality of the alphabet

(which is not restricted in any way). Note that this method does not satisfy completely

our requirements, because there are still the problems of transitive relations, and Abox

role assertions. These two aspects are indeed considered in Definition 3.2.

We chosen this method because what we want to do is to take an arbitrary inter-

pretation and transforming it into a quasi transitive shrub interpretation still having the

same properties (in terms of satisfiability of a KB). In addition, we would like to main-

tain a strict relation between the elements of the original interpretation domain and

those of the new interpretation; the idea is taking the original interpretation domain as

an alphabet (see Section 3.2.1 for the details).

Let us start with the definition of the set of sequences of elements from a given

domain.

Definition 3.1. Let � be an arbitrary domain then'#� is the set of all the finite se-

quences of elements of �. We indicate the empty sequence as �, which by assumption

is not in'#� .

To distinguish the elements of the “alphabet” from the sequences we are going to

use the letters � � for elements of the “alphabet” domain, and the letters �� for

sequences. We indicate with � �� the sequence containing the elements �� of �; while � represents the sequence resulting from the concatenation of the the

element to the sequence �.

We use the set of finite sequences as the domain for the quasi transitive shrub

interpretations; in addition, we impose some restriction on the interpretation function.

In particular, we require that individual names are mapped to sequences of length 1,

and we add a few requirements on the interpretation of roles, which we will detail

below.

Definition 3.2 (Quasi transitive shrubs). Let � be an arbitrary domain. An interpre-

tation � � �� over a set of individual names , role names �� , and concept

names �� is a quasi transitive shrub interpretation iff it satisfies the following prop-

erties. Let �� be arbitrary elements of �� , an element of �, and a name in �� :

�� '#� (3.2a)

if � � � then there is � � � such that � � �� (3.2b)

if �� then �� (3.2c)

if �� then either � � � or �� (3.2d)

if �� then �� (3.2e)

The first two restrictions of Definition 3.2 correspond to the two above mentioned

properties that the domain is composed by finite sequences of given elements, and

the individual names are always interpreted as “singleton” sequences. The third prop-

erty (3.2c) ensures that there is not any link going back from one of the trees to the root

of any other tree; i.e. the interaction between different trees is always mediated by the

roots, and is restricted to elements mapped from individual names.

The last two properties ((3.2d) and (3.2e)) ensure the tree–like shape, extended to

the transitive case. Property (3.2d) extends the tree–shape idea to cover the transitive

parts of the interpretations. In fact, if the transitive closure is not considered, the

presence of an edge �� implies that � � � (the successors of a node are sequences

containing just one more element). However, we need to consider the case in which

a role is transitive or contains a transitive sub-role; therefore we must allow the case

in which �� is a shortcut for the two edges �� and �� . The last property

relates the elements contained into a subtree with elements outside the given subtree.

In fact, given a node �, all the nodes in the form � �� (with � � �) are in

the subtree rooted in �. The property describes the case in which there is an edge

connecting a node outside this subtree (�) to both the root (�) and an element of the

subtree.3 In this case it is ensured that the edge �� is a shortcut for the path

�� and �� .

Clearly, the the structures characterised by Definition 3.2 are neither trees nor

acyclic graphs. However, we can distinguished the tree–like structures defined by all

the elements beginning with the same sequence. With an abuse of terminology we call

these substructures trees. In fact, if we consider their intuitive support tree, the added

edges are only the result of transitive closure of the support tree. This restriction does

not apply to the roots of these trees, which can be connected in any way (even with

cycles). The idea is that the relation among the roots reflects the role assertions in the

3The case in which � � � is trivially satisfied.

The restriction on the structure of q.t. shrub interpretations are strong enough to

characterise two sequences connected by a role. This is shown by the next lemma

which is used in Chapter 7 for showing the completeness of the proposed algorithm

for query answering (see Proposition 7.16).

Lemma 3.3. Let � � �� be a quasi transitive shrub interpretation built from the

domain � (i.e. �� '#�). Let be an arbitrary role name, � and � � � �� be

elements of �� ( � � � for � � �� ) for � � �.4 Then the pair �� is in �

only if:

there is an index � � � s.t. � � � �� (3.3a)

or �� and �� (3.3b)

Proof. Since the length of the sequence � is finite, then we prove the lemma by induc-

tion on the length of �.

Let be � � � � �, then by (3.2d) either � � � � or �� ;

indeed, these are exactly the two cases (3.3a) and (3.3b).

Let us assume that the lemma holds for any sequence whose length is less than

�, and let us consider �� . By (3.2d) either � � � �� or

�� . In the first case (3.3a) is sat-

isfied by � � � ' �, while for the latter case we can use the inductive hypothesis

for the pair �� . Therefore, either there is an index � � � ' �

such that � � � �� or �� . In the first case

(3.3a) is satisfied, while in the second we have �� . We need to show

that �� . This is verified by (3.2e) because �� and

�� by hypothesis.

The previous lemma describes the situation in which there is a relation � be-

tween an arbitrary element � and an element � which is “inside” one of the trees (it

cannot be one of the roots because it is a sequence longer than a single element by

assumption). When this is the case then either � is contained in the subtree rooted in

� (condition (3.3a)), or the relation is “coming” via the root of the tree containing �

(i.e. the sequence � � (condition (3.3b)). Note that by Definition (3.2c), when the latter

condition is verified the element � must correspond to one of the individual names.

4We exclude the case � � ��, because this is trivially covered by the definition in (3.2c).

This property is important because it guarantees that inside one of the trees links

can only go downward. Therefore there cannot be any link going back to one of the

roots, or to elements contained in any other tree.

3.1.1 Canonical interpretations

The definition of a quasi transitive shrub interpretation is independent of any given

knowledge base. In particular it does not take into account restrictions on the inter-

pretations imposed by a particular KB, like transitivity restrictions on role names, or

inclusion relations between roles. Therefore, by using Definition 3.2 we cannot prevent

the transitive closure of relations or multiple relations connecting two elements, even

when they are not required (by the kb). The point is that we really need to minimise

the non–tree characteristics of the structure of an interpretation (for reasons which will

be clear in Chapter 6 and Chapter 7).

Let us consider for example the assertion � . In terms of the structure of

interpretations, the axiom guarantees that whenever there is an edge labelled with �

between two nodes then there must be an edge labelled as well. This is not a general

property of the ��f logic, but only of interpretations satisfying the knowledge base

containing the given assertion.

This problem is specifically related to the possibility of specifying terminological

axioms about roles. For example with ��, or even a DL as expressive as PDL, the

tree–model property can be stated independently from any particular kb.

In our case the DL ��f allows the presence of role axioms; therefore, what we

really need is a definition characterising an interpretation w.r.t. a given KB. For this

reason we introduce the notion of canonical interpretations w.r.t. a given KB (Def-

inition 3.6). Roughly speaking, the definition considers the transitivity and the role

ordering induced by the KB in order to restrict the structure of allowed interpretations.

It is important to notice that a canonical interpretation does not necessarily satisfy the

knowledge base; the two notions are orthogonal one to the other.

Given a knowledge base �, the role inclusion axioms in the terminology define a

partial order over the set of role names �� , we use the symbol &� to denote this

reflexive order (by definition). For the sake of simplicity, wherever there is only

a single terminology the KB is omitted and the ordering of role names is denoted by

the symbol & alone. Moreover, some of the role names have additional restrictions:

namely being functional or transitive. Note that the functionality restriction propagates

w.r.t. the role ordering, forcing each subrole of a functional role to be functional as

well. We distinguish these two non-overlapping subsets of �� as %�� (functional)

and � �� (transitive).

Let us consider the requirement that two elements of the domain of a canonical q.t.

shrub interpretation are related by two different roles only if this is required in every

interpretation. Suppose that we have two elements � � in the original interpretation �

related by two different roles (� � �� ); in the q.t. shrub interpretation, all

the sequences of the form � and � � would be related by the same pair of roles. In

order to avoid this, we can duplicate the element � into �� and ��, then add the pair

�� to the interpretation of and �� to the interpretation of �. This

technique cannot be applied in every case; for example, if the role � is included in

(i.e. � & ), whenever a pair � � �� is in �� it must be in � as well.

Therefore we must consider groups of role names that must “stay together”. Fortu-

nately, this can be discovered by looking at the structure of the relation & induced by

a knowledge base, together with the functional restrictions. We identify a subset of all

the subsets of �� , calling the elements of it labels. Intuitively, we are going to make

as many copies of elements of the original domain as the number of possible labels.

The following definition provides the formal description of the set of labels w.r.t. a

given knowledge base. It is important to stress the fact that this set is induced by the

assertions about the roles in a knowledge base, therefore they are not intrinsic to the

logic ��f.

The first thing we need to do is to formalise the notion of the fact that two roles

can “stay together”; i.e. the fact that when we have different roles connecting two

elements they cannot be split. The simplest case is when a role name is included in an

other (i.e. & �), but the interaction between role hierarchy and functional restriction

makes the story slightly more complicated. For example, let us consider two role

names � and � both included in a functional role � . In an arbitrary interpretation

�, whenever � � �� and � � ��

� we know that both � � �� and � � ��

must be in � � because of the inclusion; in addition, � � is functional therefore � � �.

This scenario may be even more elaborated by considering three roles �, �, �

and two unrelated functional roles ��, �� having the inclusion relation like � & ��,

� & ��, � & ��, and � & ��. With arguments which are similar to the simpler

case, we can conclude that if in all three relations an element has a successor (i.e.

� � �� , � � ��

� , and � � �� ) then this successor must be the same

(i.e. �� ).

Note that we are not saying that � � �� implies the presence of � � ��

as well (like the case in which � & �), but only that if they are in the interpretation

then they must coincide. This property is used in the proofs for the query answering

algorithm presented in Chapter 7 (see Proposition 7.15).

The broad idea in the definition is the fact that for a non–functional role every

“preceding” role in the hierarchy must stay with it; while when is functional we

must take the “successors” as well. This should be propagated for covering the cases

shown above. So we define a relation � describing the “staying together” of the roles.

Note that this relation is not always symmetric because if it is the case that � &

and !& �, then � � but not the converse, while when there are functional roles

involved it becomes symmetric (e.g. in the first of the example above, we have that

� � � and the converse as well).

Definition 3.4. The relation � is recursively defined by

� � � iff

��

� & �� or

� & � and � is functional, or

� � and � � for some role if neither

� & � nor � & �

(3.4a)

The relation is well defined because it can be build by using the structure of the role

hierarchy. We use the newly defined relation for introducing the set of labels of a KB.

A label � is a subset of �� (� � �� ). For a fixed ordering & and set of func-

tional roles %�� we define the set of labels of �� as a maximal5 subset � of ��

satisfying the properties that for any label � � �

if �� then � � or � � ; (3.4b)

if � � then �� . (3.4c)

First we show a few properties of the sets of labels of a given KB.

5Maximal w.r.t. set inclusion.

Proposition 3.5. Let us consider a set of labels � of a given KB:

� �� (3.5a)

for all � �� there is � � � s.t. � �; (3.5b)

the set of labels is unique; (3.5c)

if two labels share a functional role, then they are equal. (3.5d)

Proof. 3.5a Let assume that !� �, clearly satisfies the properties in Definition 3.4.

Therefore � � � � is a set of labels which strictly includes �, contradicting the

hypothesis that � is maximal.

3.5b Let us assume that there is a role s.t. !� � for any label � in �. Let us

build the new label �� , note that �� satisfies both the properties

in Definition 3.4; therefore �� is a set of labels which strictly includes �,

contradicting the hypothesis that � is maximal.

3.5c Let ��, �� be two sets of labels satisfying the Definition 3.4. We are going to

show that �� , by choosing a label � in �� and proving that it must be in

�� as well.

If � is the empty set, then it is contained in �� by (3.5a), so we consider the case

in which � !� . Let us assume that � !� ��; the label � satisfies the properties

in Definition 3.4 because �� is a set of labels. Therefore �� is a set of

labels which strictly includes �, contradicting the hypothesis that � is maximal.

Using the very same arguments we can show that �� , therefore they are

equal.

3.5a Let ��,�� be two labels in � sharing a functional role � .

First we show that �� . Let us consider an arbitrary role name in ��; if

is equal to � , then it is in �� by definition. Let us assume that is different

from � ; by Property (3.4b) we have that either � � or � � . If � �

then � �� by Property (3.4c). Let us suppose that � � , since & we

can use Definition (3.4a) for showing that � � as well (substitute � with � ,

and �� with . Therefore � �� by Property (3.4c).

Analogously, we can show that �� , therefore �� .

Definition 3.6 (Canonical Interpretations). A quasi transitive shrub interpretation

� � �� is said to be canonical w.r.t. a KB � if the following two conditions

apply. Given arbitrary role names �� , and different elements �� of ��

with � !� � :

if �� then

there is a transitive role � & such that �� (3.6a)

if ��

� then there is a label � s.t. �� (3.6b)

The definition of a canonical interpretations is given in such a way that it tries

to minimize the non–tree characteristics of the structure of an interpretation. In fact,

given the ��f DL we cannot restrict ourselves to tree structures; for example the role

hierarchy can force two different edges connecting the very same pair of elements, or

the transitivity can force “shortcuts” of role paths.

The first restriction (3.6a) ensures that a canonical interpretation contains only the

necessary shortcuts; i.e. those required for satisfying the transitivity restrictions. The

property states that if the interpretation contains a shortcut (i.e. ��

�), this must be caused by a transitive role. Because of the interaction between tran-

sitivity and role inclusion, it is not necessary that the role is transitive: a transitive

role included in can force the shortcut as well.

This is not enough, in fact the restriction (3.6a) does not require the expected

�� . The reason comes from the interaction of assertions about the individuals

with the role structure; in fact, we know that Abox assertions can impose any shape in

the part of interpretations concerning the individuals.

Example 3.1

Let us consider the Abox assertions

��

together with the inclusion assertions � � �, � , and the role names �,� � being

transitive. One interpretation (�) satisfying the assertions is depicted as the graph

3.2. TRANSFORMING INTERPRETATIONS INTO Q.T. SHRUBS 46

below�

��

where the element � corresponds to � (i.e. �� ), � to �, and �� to �; in addition

� is an anonymous element. It is not difficult to realise that � is a sort of “minimal”

interpretation satisfying the set of assertions; in addition, it does not force any transitive

subrole of connecting the elements corresponding to � and �.

The restriction (3.6b) concerns the case of multiple edges connecting the very same

pair. Again, these cases are restricted to only those which are necessarily imposed by

the knowledge base.

3.2 Transforming interpretations into q.t. shrubs

In this section we show how to build a canonical q.t. shrub interpretation from an

arbitrary interpretation satisfying a given knowledge base. First we define the transfor-

mation, leading to what we call the unravelled interpretation and its transitive closure

(Section 3.2.1), then we highlight some of their properties (Section 3.2.2). These re-

sults are used to show the completeness of the class of q.t. shrub interpretation w.r.t.

the problem of kb satisfiability (Section 3.3), and conjunctive query answering as pre-

sented in Chapter 6 and Chapter 7 (see Section 7.2).

The underlying idea of the transformation is simple: we take the domain of the

interpretation as the starting alphabet for the sequences, then we define the interpre-

tation function according to the properties, w.r.t. the original interpretation, of the last

element of each sequence. Roughly speaking, a sequence is in the extension of a con-

cept name if the last element is in the extension of the same concept w.r.t. the original

interpretation. Analogously, two sequences are related if the second one is equal to the

first one with the addition of an element (see Definition 3.2), and their last elements

are related in the original interpretation (see Definition 3.7).

Example 3.2

Let us consider the following simple interpretation without individual names:

� � ��

��

The idea is transforming it into into an interpretation containing an infinite number of

trees like:

��

where � is an arbitrary sequence of elements from the set �� or the empty se-

quence (�).6

As shown in the example above, even starting from a finite interpretation we always

build a new infinite interpretation (because of the arbitrary sequence � in the example).

It is conceivable to impose more restrictions in order to obtain smaller interpretations

(e.g. in the example we may restrict the prefix � to the empty sequence �); however,

more restrictions means more complications and the construction we use is sufficient

for our purpose.

The construction we sketched makes the new interpretations violating any transitive

restriction. For instance, the original interpretation of the example above satisfies a

transitive restriction on the role �, but the newly build interpretation do not satisfies it.

Note that this is something intrinsic in the construction; therefore in a subsequent step

the appropriate relations are transitively closed (see Definition 3.8). In addition, this

simple initial idea is complicated by the fact that the interpretation must be canonical

w.r.t. a given knowledge base, together with the pernicious interaction of transitive and

functional restrictions with the role hierarchy.

6Actually, the transformation is more involved as we are going to show in the following sections.This example is giving the intuition behind the transformation.

3.2.1 Unravelling the interpretation

We assume that the interpretation we are transforming is defined over a set of individual

names , of concept names �� , and role names �� . The set of role names includes

two non-overlapping sets: the functional (%�� ) and transitive (� �� ) names. In

addition, a partial ordering & is defined on �� ; this ordering must satisfy the con-

ditions that names included in functional names are functional as well, and transitive

roles cannot be included into functional roles. Formally, given two names �� in ��

if � & then � %�� implies � � %�� , and � � � �� implies !� %�� .

Now we define the unravelled interpretation of �. We start by building as many

copies of � as the number of labels in � (�� for any � � �). For each label � there

is a bijection Æ� mapping each copy in �� to the original element in �. We define a

new domain �� by taking the union of all the copies of � (i.e. ��

�� ). For

any element of this new set we define two mappings: the first (Æ) maps any element

to the original element of � it was copy of (i.e. Æ ��

�� Æ�); while the second ( )

maps any element to the label corresponding to the given copy of � (i.e. � � � # �,

and � � � � iff � ��). Using the domain �� we define the set'#�� of all the finite

nonempty sequences of elements of �� (see Definition 3.1). The mappings Æ and can

easily be extended to'#�� by using the value of the last element of the given sequence

(i.e. Æ�� Æ� � and �� for any � �'#� � ��).

Elements of'#�� can be naturally associated to nodes in a forest, where � are the

roots and � is a successor of �. We then use the relation structure in the original

interpretation � for adding the necessary edges between nodes. Since we did not make

any assumption on the original domain �, we cannot fix the width of the resulting tree.

In fact it can be even uncountable. The following definition formally describes the

unravelled interpretation of �.

To simplify the notation we indicate with � the set of elements of � to which

the individual names in are mapped by the interpretation � � �� (i.e. � �� ! � �!� �

Definition 3.7 (Unravelled interpretation). The unravelled interpretation ��

��

of � � �� is defined as:

the domain �� is the set of all the finite nonempty sequences of elements of ��

��

'#��

given an individual name ! �

!�� for some � �� s.t. Æ� � � !� and � � � ;

given a role name � �� ,

��

�� Æ�� Æ��

(3.7a)

� ��

�� Æ�� Æ� ��

and � !� �� or Æ� � !� ��

(3.7b)

Note that the two sets (3.7a) and (3.7b) are distinct because in the first one all

the pairs are of the form �� , while in the second �� . The condition

“� !� �� or Æ� � !� �” in the (3.7b) part prevents pairs of the form ��

being added to the interpretation when an analogous pair can be added by the

(3.7a) part. Without the condition, an assertion like �� in the Abox with

� functional would cause the functional restriction on � to be violated in the

unravelled interpretation.

given an atomic concept � � ��

��

�� Æ��

The mapping �� is well defined; in particular the only problem can arise from the

non–uniqueness of the element in the interpretation of individual names. However,

it is easy to realise that we map an individual name ! to a single copy of the original

element !� (the one in the domain ��).

The unravelling is a well known technique for showing that a modal logic has the

tree model property (see Vardi [1997], Streett [1982], Gradel [1999]). Unfortunately,

the standard technique is not enough in our case, in fact the unravelling of an interpre-

tation would not satisfies the transitive restriction on roles. It is easy to see the problem

by considering the fact that only pairs like �� are added to the interpretation of a

role by the (3.7b) set; therefore if both �� and �� are in the interpreta-

tion of a transitive role, the required �� pair is obviously missing. We overcome

this shortcoming by simply adding the transitive closure of relations corresponding to

transitive role names. We call this operation the transitive closure of the interpretation.

Definition 3.8. Let ��

�� be the unravelled interpretation of �. The transitive

closure of �� , written as ��

��, is defined as following:

the domain �� is equal to the unravelled domain �

��;

individuals are mapped as in ��!�� !

��

given an atomic concept � � ��

��

given a non–transitive role name � �� ( � ��

��

given a transitive role name � � ��

��

�� s.t. �� is finite and

�� ' ��

The definition of transitive closure of �� is recursive and well defined because we

assume an acyclic ordering &, therefore �� can be build “bottom-up” w.r.t. the ordering

over role names. It is easy to realise that the transitive closure satisfies the transitive

restrictions on role names, while it does not lose the other properties of the unravelled

interpretation. This will enable us to conclude that given an arbitrary interpretation

we have an effective way of providing an example of an interpretation satisfying the

required properties. This is the fundamental brick which provides the foundations for

the completeness proof as we show in the following sections.

3.2.2 Properties of unravelled interpretations

In this section we present two results which constitute the foundation for the complete-

ness proof. The first two propositions highlight some of the properties of the (transitive

closure of) unravelled interpretations; while the last one shows that these generated in-

terpretations are indeed canonical q.t. shrubs.

We start by showing that the unravelled interpretation �� satisfies some fundamental

properties; then in Proposition 3.10 we show that the same properties are maintained

even after the transformation in transitive closure.

Proposition 3.9. Let ��

�� be the unravelled interpretation of � � �� .

Then, �� satisfies the following properties:

For any ! � , Æ�!�� !� (3.9a)

For any �� and role , if ��

�� then �Æ�� Æ�� (3.9b)

For any � � �� , � �, and role , if �Æ�� then

there is � � �� s.t. ��

�� and Æ�� (3.9c)

For any � � �� and concept �, � � �

�� iff Æ�� (3.9d)

For any � � �� and functional role � %�� ,

"��

�� "� � �Æ��

� (3.9e)

For any �� in �� , and role names � � s.t. � & ,

if �� and �Æ�� Æ�� , then ��

��(3.9f)

Proof.

(3.9a) This is true by definition, because Æ�!�� Æ� � � !� .

(3.9b) If �� then by Definition 3.7 in both the two sets (3.7a) and (3.7b), the

condition �Æ�� Æ�� must be satisfied.

(3.9c) Let us assume that �Æ�� with � �. We distinguish the two cases in

which � is or is not mapped from an individual name (i.e. � � ��).

– Let assume that � � �� . If � � , then there is an element � � �� such

that � �� , and Æ� �� ( � � by Proposition (3.5a) ). In addition,

� � � �� by definition; therefore ��

�� because it is in the set

(3.7a). The element � we are looking for is � �.

If !� � , then there is an element � � �� such that � � �� and

Æ� �� (by Proposition (3.5b) there is at least a label containing ). We

are going to show that the element � � is the � we need. Clearly � � !�

�� because !� �; in addition, �Æ�� Æ�� because Æ��

Æ� �� . Therefore the pair �� is in the set (3.7b), so it is in �� as

– Now we assume that � !� �� . By definition there is an element � � ��

such that � � �� and Æ� �� . We can proceed as in the previous case

by considering � �. Clearly Æ�� Æ� �� , so �Æ�� Æ�� ;

therefore all the conditions for the set (3.7b) are satisfied (we assumed that

� !� ��) and ��

�� .

(3.9e) Let us consider the mapping Æ restricted to the domain��

��

. First

we show that the codomain is equal to the set� � �Æ��

�, and then that

the restriction of Æ is bijective. This is enough for prooving that the cardinality

of the two sets are equal.7

If � ��

��

then �Æ�� Æ�� by the already shown Propo-

sition (3.9b), therefore Æ�� Æ��

�. For the other direction,

let be an element of� � �Æ��

�. By Proposition (3.9c) there is an

element � such that �� and Æ�� ; therefore � �

��

(and the restriction of Æ we are considering is surjective).

For showing that the restriction of Æ is bijective we only need to show that it is in-

jective. This is trivially true in the case that the cardinality of��

��

is less than 2. Therefore we assume that it contains at least two elements. Let

�� and �� be two elements of the first set such that �� !� ��. We reason by

contradiction by assuming that Æ�� Æ��.

The pair �� either belongs to the set (3.7a) or (3.7b), and the same applies

to �� . If both of them belong to (3.7a), then �� and �� , with

both � �� and � �� equal to . This means that � and � belong to ��; in

addition Æ�� Æ�� because Æ�� Æ��. Therefore � � � because Æ�is a bijection: this is clearly in contradiction with the hypothesis that �� !� ��.

If �� belongs to the set in (3.7a), then � � �� ; therefore �� cannot be

in the set in (3.7b). The converse applies if we assume �� in the set (3.7a).

Therefore both �� and �� must be in the set in (3.7b).

This means that �� and �� with � !� � because �� !� �� by

hypothesis. By definition of the domain �� there should be two labels �� and

�� such that � � �� and � � �� . In addition, Æ�� Æ�� because

7Note that we do not make any assumption on the initial interpretation .

Æ�� Æ�� therefore �� !� �� because both Æ�� and Æ�� are bijective. By

definition of the set (3.7b) the role must be in both the labels �� and ��.

The role is functional by hypothesis, and we already showed that if two labels

have a functional role in common they must coincide. This is in contradiction

with the fact that �� and �� are different. Therefore, the assumption that Æ��

can be equal to Æ�� is false, so Æ restricted to the two given sets of (3.9e) is

injective as well as surjective.

The existence of a bijection between two sets guarantees that the cardinality of

the two sets is the same.

(3.9f) If �� is in �� then, by Definition 3.7, the pair �� is either in the set (3.7a)

or in the set (3.7b). In the first case ��

�� and �Æ�� Æ�� by

assumption8 therefore �� .

In the second case � � � for some in ��, � � ��, and it is not the case that

� � �� and Æ� � � � . Note that � � (see Definition (3.4a)), so � �� as

well by Definition (3.4c). This enable us to conclude that �� by using

Definition 3.7.

(3.9d) We prove this point by induction on the structure of the concept expression �.

For atomic concept names it is true by definition.

For the inductive step we assume that (3.9d) holds for the concepts ��, then

we show that it holds also for ��, � ��, and ��.

�� For any �, � � �� iff � !� �

�� . For the induction hypothesis this is true

iff Æ�� !� �� , which is the case iff Æ�� .

� �� For any �, � � �� iff � � �

�� and � � �� . As in the previous

case this is the case iff Æ�� and Æ�� , which is true iff Æ��

�� .

�� If � � �� , then there is a � � �

�� such that �� and � � �

�� .

By (3.9b) �Æ�� Æ�� , and Æ�� by the inductive hypothesis.

Therefore Æ�� .

For the other direction, let us assume that Æ�� . Then there is

� � such that �Æ�� and � �� . By (3.9c) there is � � ��

8Note that since we do not assume that the interpretation satisfies the role inclusion, we explicitlyadd �Æ�� Æ�� as a prerequisite.

such that �� and Æ�� . Since � � �

�� by induction hypothesis

� � �� .

Now we turn our attention to the transitive closure of the unravelled interpretation,

showing that this operation would not destroy the properties described in Proposi-

tion 3.9, provided that the original interpretation satisfies the transitive and inclusion

restrictions for role names. In addition, in the case of the transitive closure we as-

sume that the original interpretation satisfies the restriction on the roles imposed by

the knowledge base. This is necessary because Definition 3.8 adds the pairs for ensur-

ing the satisfiability of these restrictions to the interpretation of roles. If these are not

satisfied in the original interpretation, then there is a dichotomy between the original

and the result interpretations which may invalidate part of the properties we stated in

Proposition 3.9.

�� be the transitive closure of the unravelled

interpretation ��

��. If the original interpretation � satisfies the transitive and

inclusion restrictions then �� satisfies all the properties in Proposition 3.9.

Proof.

(3.9a) Since !�� !

�� then Æ�!�� !� .

(3.9b) If �� then either ��

�� or there is a transitive role � &

such that there is a sequence � � �� of elements of �� such that

�� . In the first case �Æ�� Æ�� by (3.9b). In

the second ��Æ�� Æ�� ; therefore �Æ�� Æ��

because �� is transitive by hypothesis (transitive restrictions are satisfied). In

addition, �� because the inclusion restrictions are satisfied; therefore

�Æ�� Æ�� .

(3.9c) This is trivially satisfied because �� for any .

(3.9e) If a role is functional, then by the restriction on the language it has not any

transitive subroles (i.e. � !& for any � � � �� . Therefore ��

�� by

definition, so the property is satisfied because �� satisfies (3.9e).

(3.9f) Let �� , we distinguish the two cases in which � is or is not transitive.

If � is transitive and is not transitive, then ��

�� by Definition 3.8; so

let us assume that is transitive as well. By Definition 3.8 there is a sequence

� � �� of elements of �� such that �� .

By Proposition (3.9b)

��Æ�� Æ�� and

��Æ�� Æ��

because � & and the inclusion restrictions are satisfied by hypothesis. We

can use Proposition (3.9f) for showing that �� ,

therefore �� by construction (see Definition 3.8).

If � is not transitive and �� , then either ��

�� or there is a

transitive role � � s.t. � � & �, � !& � �, and �� . In the first case

�� by Proposition (3.9f) and

�� . In the second case � � &

because � & and we can proceed as in the previous case in which � was

transitive, by using � � in place of �.

(3.9d) We can proceed exactly as in the corresponding proof in Proposition 3.9. In fact,

the proof relies on the previously shown (3.9b) and (3.9c).

Before verifying that whenever an interpretation satisfies a KB, the transitive clo-

sure of its unravelled interpretation satisfies the very same KB; we want to be sure that

the interpretation we defined is a canonical quasi transitive shrub. This is what we are

going to prove with the next proposition.

�� be the transitive closure of the unravelled

interpretation ��

��. Then if � satisfies the role ordering, �� is a canonical

quasi transitive shrub interpretation.

Proof. First we show that �� is a quasi transitive shrub (Definition 3.2). The first

properties (3.2a) and (3.2b) are trivially satisfied by definition.

(3.2c) Let �� be a pair in �� for an arbitrary role name . Summarising the

Definition 3.8, either �� or there is a transitive role � & and a finite

sequence of elements �� of �� such that

�� ' ��

In the first case �� must be in the set (3.7a) (because � !� � by definition);

therefore �� . In the second case, the same property can be shown for

all the elements �� .

(3.2d) Let �� be a pair in �� for an arbitrary role name .

If is transitive then there is a finite sequence of element �� of �� such

that �� ' �� . Note that ��

because � cannot be the empty sequence (see (3.7b)). If � � � then � � ��

�; otherwise �� is in �� and �� is in

�� by

definition (take the sequence �� in Definition 3.8).

If is not transitive, then �� or ��

��

��.

In the first case � � � by definition (see (3.7b)). In the latter case, there is a

transitive role � s.t. � & , !& � and �� . We just showed that

if the role is transitive then either � � � or �� . Moreover,

�� &

�� because � & .

(3.2e) Suppose �� , we distinguish two cases where � � �

and � � �.

– If � � � then �� . Since we already showed that (3.2d) holds,

we can use it to conclude that either � � � (i.e. �� )

or �� . In both cases the property is satisfied.

– Let us assume � � �. Since �� , then � !� � �� for all

� � �� by construction (see Definitions 3.8 and 3.7).

Firstly we show that for any role , when � !� � �� for all � �

�� and �� , then ��

�� for

� � �� . Since �� , then either � � � ��

or �� . The first case can

be ruled out because � !� � �� for all � � �� ; therefore both

�� and �� are in ��. We can apply the

same arguments to �� to conclude that both ��

and �� are in �� as well; this inductive argument

can be carried on for each � �� , as long as � ' # � �. Therefore

�� for any � � �� .

In addition, we can use the fact that �� to conclude that

�� by using the same arguments.

If is transitive then, �� and ��

��

for any � � �� (by construction, see Definition 3.8). Therefore

�� , so ��

��.

If is not transitive, then there is a transitive role � & such that

�� because � � � by assumption, so �� can-

not be in �� . By using the same arguments as the transitive case we can

conclude that �� , therefore ��

�� as well.

The next step is to show that �� is canonical (Definiton 3.6).

(3.6b) Let �� be role names and �� elements of �� such that � !� � . We

have to show that if ��

� then there is a label � s.t.

�� . Since � !� � and �� , then there are two

elements � in �� and in �� such that � � � (by Definition (3.2c), because

it has been proved above that �� is a q.t. shrub). By using Definition (3.2d)

together with the fact that �� , for each role �, we can conclude

that either � � � or �� . Note that if � � �, this must be

true for all the roles �� and the hypotesis would become ��

��

�; on the other hand, � would be the same for all the roles again,

therefore ��

� . This shows that we can restrict ourselves

to the case ��

� without loss of generality, which allows

us to consider only the Definition 3.7 of unravelled interpretation, by ignoring

the transitive closure.9

By the fact that the pair has the structure �� , we know that for every role �

��

�� Æ�� Æ� ��

� and � � � �

therefore � � � � for all roles �� , which shows that there is a label

( � �) which includes all the roles.

9The role hierarchy does not add any pair �� to a role because of the Property (3.9c) and thefact that satisfies the role ordering by hypothesis.

(3.6a) Let �� be different elements in �� such that � !� � and

��

If is transitive then (3.6a) is verified by definition because & . So let

be non–transitive: this means that either �� or there is a transitive role

� s.t. � & , !& �, and �� (see Definition 3.8). Analogously,

either �� or there is a transitive role � � s.t. � � & , !& � �, and

�� . Let us consider this case by case.

– If �� is in �� , then � � � for some � �� because � !�

�� (see

Definition 3.7). If �� then � � � as well, therefore � � �. This

is in contradiction with the assumption that � and � are distinct.

– Let �� be in �� and �� be in � �

��. Then � � � and there are

�� in �� s.t. �� , �� , and �� ' ��

� �� . Since � � � then �� , therefore �� is in � �

��

which is included in � ��.

– If �� is in ��, and �� is in �� , then we can conclude that �� is

in �� as well by using the very same arguments as for the previous case.

– Let �� be in ��and �� be in � �

�� . Then there are �� and

�� in �

�� s.t. �� , �� , �� ,

�� ' ��

��

As in the previous cases we note that �� ; therefore ��

��

�� . Since �� by definition (see Definition 3.8), and � �

��

� �� , then ��

�� .

By using the already proved Property (3.6b), we know that there is a label

� containing both � and � �; by Definition (3.4b), either � � � � or � �

� �. Note that by assumption neither � nor � � can have any functional role

which includes them (or be functional themselves). Let us assume that

� � � �; since � � is not functional then either � � & � or there is a role �

such that � � & � and � � �. The first case is our goal, while in the

3.3. COMPLETENESS OF Q.T. SHRUB INTERPRETATIONS 59

latter case we can assume that � must not be functional because � � cannot

be included into a functional role. Since the ordering � is well defined,

sooner or later we are going to hit � without reaching any functional role;

therefore the result is a chain of inclusions which shows that � � & � by the

transitivity of &.10 Assuming that � � � � we can conclude that � & � � by

using the same arguments we used for the dual case.

Therefore we have just shown that either � � & � or � & � �. Let us consider

the case in which � & � �. We assumed that � satisfies the role ordering,

then by using (3.9b) and (3.9f) it is easy to show that if � & � � then ��

� �� (see the proof for Lemma 3.12). Therefore both �� and �� are

in � ��. If we consider the case where �� & � we obtain the same result

with � instead of � �.

3.3 Completeness of q.t. shrub interpretations

Now all the pieces are in place for showing the completeness of the described models.

Obviously, if a knowledge base has a quasi transitive shrub model it has a model.

The difficult bit is showing that having an unrestricted model implies that there is a

quasi transitive shrub model as well. Clearly, the complex translation which has been

described in the previous sections provides the basis for such a proof. We start by

showing that given an arbitrary interpretation satisfying a KB, the transitive closure of

its unravelled interpretation satisfies the KB as well.

Lemma 3.12. Let � be a knowledge base, & be the role ordering induced by �, and

� �� %�� be the transitive and functional role names. We assume that � ��

and %�� are consistent w.r.t. &. If an arbitrary interpretation � � �� satisfies

�, then Then the transitive closure of its unravelled interpretation ��

��

satisfies � as well.

Proof. We prove the lemma by considering all the assertions in � and showing that

they are satisfied by ��. Since � satisfies �, the requirement for Proposition 3.10 are

satisfied. Therefore we are guaranteed that the properties in Proposition 3.9 hold for��.10The last point can be demostrated more formally by induction on �, but we think that it is clear

enough.

3.3. COMPLETENESS OF Q.T. SHRUB INTERPRETATIONS 60

Let �� be a concept axiom in �. Assume that the axiom is not satisfied, then

there is an element � such that � � �� and � !� �

��. By (3.9d), Æ��

and Æ�� !� �� therefore � do not satisfies �. This is in contradiction with the

hypothesis, therefore � � must be satisfied.

Let � be a role axiom in �; then � & by definition. Let �� be a pair

in ��, then �Æ�� Æ�� by (3.9b). We can conclude that �� by

using (3.9f).

If � � �� , then �� is transitive by construction (see Definition 3.8). This

can be proved by considering that if �� , then we have two

paths in �� : one connecting � to �, and a second connecting � to �. Therefore

there is a path in �� connecting � to �.

If is functional, then for any � � �� ,

"��

�� "� � �Æ��

�by (3.9e). In addition, "

� � �Æ��

�$ � because � is functional;

therefore �� is functional as well.

Let �� be a concept assertion in �. It is easy to see that �� , therefore

Æ�� . The assumption that ��

�!� �

�� implies that Æ�� !� �� by

(3.9d). Obviously this is a contradiction with the hypothesis that � satisfies �;

therefore ��

�� .

Let �� be a role assertion in �. Let us consider the pair ��

��: by

Definition 3.8 ��

�� and ��

�� . Note that ��

�� , and

�Æ�� Æ��

�� . Moreover, �� because � satisfies the role

assertion. Therefore, ��

�� because the pair is in the set (3.7a), and

��

�� because ��

�� .

We can now conclude by stating the completeness of quasi transitive shrub models

w.r.t. our logic. In Chapter 7 we present a slightly different version of the present

theorem for deduction instead of satisfiability.

Theorem 3.13. A knowledge base � is satisfiable iff there is a canonical quasi transi-

tive shrub interpretation satisfying it.

3.4. APPLICATION TO TERMINOLOGICAL REASONING 61

Proof. The “only if” direction direction is trivial because if there is a quasi transitive

shrub interpretation satisfying �, then � is satisfiable by definition. For the “if” direc-

tion, let be � an interpretation satisfying �, and ��

�� the transitive closure

of its unravelled interpretation. According to Lemma 3.12 �� satisfies � as well; in

addition �� is a canonical quasi transitive shrub interpretation by Proposition 3.11.

3.4 Application to terminological reasoning

We can use the very same technique to establish the completeness of the class of inter-

pretations we described w.r.t. terminological reasoning. In particular we concentrate

on the problem of verifying the satisfiability of a concept expression w.r.t. a given ter-

minology (see Chapter 2). We have already shown that the structure of models for a

generic kb is a set of quasi transitive trees “glued” together through their roots. For

terminological reasoning (i.e. without individuals) models are much simpler because

they are single transitive trees. This property is exploited in Chapter 5 to show the

completeness of the presented kb satisfiability algorithm.

A given concept � is satisfiable w.r.t. a terminology � iff there is an interpretation

� satisfying � , such that the set �� is non empty. The property we want to show is

that this is the case iff there is a quasi transitive shrub interpretation � s.t.: it satisfies

� , it is a single quasi transitive tree, and the root of this tree is in � �� . These properties

will be used in Chapter 5.

We notice that the given satisfiability problem can easily be reduced to knowledge

base satisfiability (see Schaerf [1994]). In fact,� is satisfiable w.r.t. a terminology � iff

the knowledge base � � �� is satisfiable (where � is an arbitrary individual

name). This can easily be proved by considering that if � is an interpretation satisfying

� , such that �� is not empty; then, we can build an intrepretation for � by mapping

� to one of the elements of �� . Conversly, if � satisfies � then �� is an element of

��; therefore � is satisfiable w.r.t. � .

We use this reduction to establish the completeness result. Let us start from an

arbitrary interpretation � satisfying �. As we have just shown, there is a direct way

of obtaining an interpretation � � satisfying � , by taking � � as equal to � but for the

addition of the mapping from � to a given element of � � . As shown in Section 3.2.1

we build �� as the transitive closure of the unravelled interpretation of � �.

The main idea is “extracting”, from �� the single tree rooted in ��

. Since in

� there are not any role assertion, all the q.t. trees in �� are unconnected, and there

3.5. REMARKS 62

cannot be any loop on the root � ��

(see Definition 3.7). Formally, we extract an

interpretation � � �� by selecting all the elements of � ��

connected to � ��

��

, or there is a finite sequence ��

( ��

and roles �� s.t.

��

� �� and �� for � � �� ' ��

and then we define the new interpretation function �� restricting the original � ��

to the

new domain:

��

By construction � is connected, in the sense that there is a path from the root � ��

to any other individual. It is not difficult to verify that � is still a canonical quasi

transitive shrub interpretation satisfying � , and the root is in � �� .

3.5 Remarks

At this point we need to clarify the reason why the complicate mechanism that has

been put in place was really necessary. The alternative technique can be derived from

the algorithm used for showing the satisfiability of DL formulae w.r.t. a terminology

(see Horrocks [1998]). It can be shown that if the formula is satisfiable the algorithm

produces a pseudo–model from which an interpretation can be trivially built. It is not

difficult to show that the interpretations generated by the pseudo–models satisfy the

properties we are interested in. This method is very good for showing the completeness

of the class of interpretations w.r.t. the problem of kb satisfiability (Section 3.3), but it

has the disadvantage that the connection between an arbitrary interpretation and one of

the interpretations belonging to the restricted class cannot easily be established.

This connection is not necessary for kb satisfiability, but we need it for the prob-

lem of query answering (see Horrocks and Tessaris [2000]) when the query language is

different from the assertional language (see Section 7.1). As we will see in Chapter 6,

query answering is strongly related to logical deduction; which, roughly speaking, is

3.5. REMARKS 63

the problem of deciding whether a formula is verified in all the interpretations satisfy-

ing a knowledge base (written as � �� $).

In general, when the knowledge base and the formulae share a common language

closed under negation the problem � �� $ is simply tranformed into the kb (un) satis-

fiability problem � � ��$�. In the case of our language, this is not directly possible

because the formula $ is written in a language strictly more expressive than the one

of �. However, our goal is to reduce the problem to KB (un) satisfiability anyway, by

introducing a transformation of the original formula $ into a kind of negation $, such

that � �� $ iff � � $ is not satisfiable.

The required transformation is inspired by the “tree–model property” of the DLs

we consider (see Section 6.2.2); therefore we show a restricted version of the logical

deduction problem in which we consider only interpretations belonging to a given class

�.11 I.e. deciding whether a formula is verified in all the interpretations belonging to

the class � and satisfying a knowledge base (written as � �� $).

The class of interpretations cannot be chosen arbitrarly, in fact we are interested in

a class � such that � �� $ iff � �� $. Our candidate class is the one presented in

this chapter, therefore we must show that it satisfies the required property. The “only

if” direction is the easy bit, since � �� $ takes into account all the interpretations, we

must show that if � �� $ then � �� $.

The technique we will use for this proof (which will be presented in detail in Chap-

ter 7) relies on the interpretation transformation presented in Section 3.2; in particular,

on the ability to relate the original and the transformed interpretations in both direc-

tions. If we were interested in kb satisfiability only, we needed only the first direction;

therefore the mere existence of an arbitrary interpretation belonging to the required

class would have been enough.

11In particular we consider the class of q.t. shrub canonical w.r.t. the kb (see Section 3.1).

Chapter 4

KB satisfiability algorithms

In this chapter we present a brief overview of the different methods for KB satisfiability

presented in the literature, then we describe the precompletion technique, investigated

in this thesis. In addition, we provide the motivation for our decision to concentrate on

the precompletion method.

4.1 Alternative Abox reasoning techniques

In our analysis we concentrate on complete algorithms suitable for DLs whose expres-

sivity is at least �� with general axioms. This choice is dictated by the acknowledge-

ment that the ability to express general axioms is essential for modern DL systems. For

this reason, for example, we are not going to describe the structural algorithm used by

the CLASSIC DL system.

4.1.1 Direct tableaux

In Section 2.2.2 we provided an example of verification for the satisfiability of a KB

based on an extension of the tableaux–based technique presented in Section 2.2.1. In

general, tableaux algorithms for terminological reasoning can be extended to hybrid

reasoning without major changes.

The main idea is that the expansion starts with an initial set of constraints (see

Equation 2.5) corresponding to the assertions in the Abox (see Buchheit et al. [1993],

4.1. ALTERNATIVE ABOX REASONING TECHNIQUES 65

Haarslev and Moller [2000b], Horrocks et al. [2000b]).1 Individual names are sub-

stituted by different variables in the constraint system; therefore if the unique name

assumption holds for individuals, inequality constraints are added for imposing that

two different individual names are never merged.

It is worth mentioning that up until now the fastest DL system providing Abox

reasoning (described in Haarslev and Moller [2000a]) uses a direct tableaux technique.

We are interested to experiment with a technique which maintains the algorithmic

distinction between Abox and Tbox. In addition we want to reuse terminological rea-

soners as they are (if possible). So we decided to not investigate the direct tableaux

technique but a variation of it, called the precompletion technique, which will be de-

scribed later on.

4.1.2 Encoding

The encoding technique reduces the KB satisfiability problem to the problem of veri-

fying the satisfiability of a concept w.r.t. a terminology (see De Giacomo and Lenzerini

[1996]). This is achieved by treating individuals as mutually disjoint newly introduced

concept names, and encoding the Abox assertions as terminological axioms. For ex-

ample the assertions �� and �� are encoded as the two axioms �� and

�� respectively.

This idea alone does not provide a complete solution, since when concepts are used

instead of individuals we are relaxing the fundamental property of an individual of be-

ing unique. In fact we cannot restrict the cardinality of the concept the individual has

been substituted with, therefore it is possible that a knowledge base which is unsatisfi-

able because of this uniqueness restriction results in a satisfiable one once encoded.

A similar idea was exploited in Borgida and Patel-Schneider [1994] and Era and

Donini [1992]; however in the first case a different semantics was adopted for the

individual name,2 while the second approach works only for not very expressive DLs.

In De Giacomo and Lenzerini [1996] the proposed algorithm deals with very ex-

pressive DLs, providing correct and complete reasoning. The trick they use is to ensure

that, given an interpretation for the encoded KB, all the elements belonging to the same

individual representative concept are indistinguishable. To guarantee this property they

use terminology axioms to impose that if an element of a representative concept �� has

1Note that terminological reasoning always starts with a single concept formula� to check, thereforecorresponding to the single constraint ��.

2In fact they relaxed the assumption that an individual is unique.

4.1. ALTERNATIVE ABOX REASONING TECHNIQUES 66

a property (i.e. belongs to a concept) � then all the elements of �� have the same prop-

Note that the concept � is not specified, and in principle there are infinitely many

different expressions; therefore, infinitely many new axioms for each representative

concept. However, they show that the number of different concept expressions they

need is finite and polynomialy bound by the size of the original knowledge base.

Although the technique is a very powerful theoretical mechanism for investigating

the computational properties of very expressive DLs, this method has not lead to any

practical implementations. The encoding they suggests requires a very expressive un-

derlying language (i.e. Converse PDL), for which no efficient reasoner is yet available;3

it is also the case that the size of the generated terminology, although polynomial, soon

becomes unmanageable when the number of individuals grows.

4.1.3 Resolution based methods

In spite of the fact that the DL community is strongly focused on tableaux–based meth-

ods, recently there have been some attempts to apply resolution–based methods to the

KB satisfiability problem. We will not spend much time here on this approach since

reasoning with Aboxes using resolution seems to be still at an early stage. However,

some results have been obtained by translational methods4 applied to terminological

and Abox reasoning (see Hustadt and Schmidt [2000], Tammet [1995]). Note that a

resolution–based terminological reasoner can be used in conjunction with the tech-

nique we are investigating (see Section 4.2).

Recently, a direct resolution–based method has been proposed for KB satisfiability

(see Areces et al. [1999]). Although the DL they cover lacks some important fea-

tures (for example the possibility of expressing general inclusion axioms) we think

that more work in this subject can produce interesting results, since several modern

computational logic tools are based on resolution methods rather than tableaux.

3In fact, no attempts have been made to investigate whether the encoding technique is applicable todifferent DLs.

4Modal or Description logics are translated into First Order Logic and then resolution methods areapplied with an appropriate strategy for guaranteeing termination.

4.2. PRECOMPLETION 67

4.2 Precompletion

The main idea behind the precompletion technique is to split the KB satisfiability algo-

rithm in two parts. In the first part all the information implicit in the role assertions is

made explicit, generating what we call a “precompletion” of the knowledge base, then

a terminological reasoner is used for verifying the consistency of the concept assertions

for each individual name (see Hollunder [1996], Donini et al. [1994]).

The precompletion approach has been successfully used for both providing cor-

rect and complete algorithms, and analysing the complexity of the KB satisfiability

problem. The works cited above focused on DL knowledge bases with empty termi-

nologies, and languages not including transitive or functional roles.5 With the work

presented in this thesis we generalised the precompletion technique for KBs with gen-

eral inclusion axioms, transitive roles, and functional roles (i.e. the DL ��f).

Let us consider for example a very simple Abox containing only the assertions:

� � ��

The two first assertions can be used to derive the new assertion ��; it is easy to realise

that � is satisfiable iff �� is satisfiable as well (which is not the case,

because � cannot be in � and �� at the same time). The interesting point is that when

we check the satisfiability of �� we do not need to consider the role assertion, because

its effects have been made “explicit” in the new assertion. Therefore we can verify

the KB satisfiability by checking the satisfiability of the concepts �� and � � ��

separately.

This technique consists of a correctness–preserving process which eliminates spe-

cific information regarding dependencies between individuals, while maintaining the

consequences of such information. Once these dependencies are eliminated, the as-

sertions about a single individual can be independently verified, ignoring the fact that

an individual is involved. The precompletion, or elimination of the dependencies, is

performed by adding new assertions using a set of nondeterministic syntactic rules.

Because of the nondeterminism of the rules, many different precompletions can be de-

rived from a single knowledge base, which is satisfiable if and only if at least one of

these precompletions is satisfiable.

A subset of the tableaux rules for ��f (see Horrocks [1998]) is used to transform

5Even if the DLs included number restriction constructors, general axioms are needed for simulatingfunctional restrictions.

the KB into one of its precompletions (see Figure 5.1). In the case of ��f, the only

“dropped” rule is the �-rule (see Figure 2.1), and this means that new variables are

never generated.

The nondeterminism is introduced by the �-rule and possibly an exponential num-

ber of precompletions can be generated. On the other hand, since the number of vari-

ables is constant, the size of a precompletion is alway polynomial in the size of the

original KB, and, most importantly, we do not need to worry about the possible non–

termination of the algorithm (see Section 5.1).

The main advantage of this technique lies in the fact that the concept satisfiability

tests can be performed by using any available terminological reasoner (providing a suf-

ficiently expressive DL language); in particular, impressive results have been recently

obtained by optimised tableau–based systems like FaCT (see Franconi et al. [1998b]).

Indeed, one of the primary motivations for reexamining the precompletion approach is

the availability of such optimised systems.

The formal proof of correctness and completeness of the algorithm is provided in

Chapter 5, here we sketch the actual algorithm and the technique used for proving its

completeness.6

4.2.1 KB satisfiability algorithm

The input consists of a set containing the constraints (see Formulae 2.5) corresponding

to the assertions in the Abox. In this section we introduce the algorithm for the less

expressive language �� (i.e. without functional roles). The rules for this language are

simpler and more intuitive, the full details for ��f are presented in the next chapter.

The rules in Figure 4.1 are repeatedly applied to the initial set until either no rule is

applicable or a contradictory combination of constraints is detected (a so-called clash).

The �-rule generates several alternatives branches, these are exhaustively explored

by means of backtrack points where the algorithm restarts in case of a failure.

Intuitively, a clash corresponds to an unsatisfiable combination of constraints in

the constraint set. When the precompletion process encounters a clash there is no need

to continue adding new constraints, the precompletion will be unsatisfiable anyway;

therefore the algorithm backtracks to the most recent backtrack point and continues

the precompletion from that state.

If none of the rules is applicable a precompletion has been generated, and can

6In this case the correctness is the easy part.

� #� �!�� if ! is in , � � is in �and !�� is not in �.

� #� �!�� !�� if !�� in �,and either !�� or !�� is not in �.

� #� �!�� if !�� in �,and � � �� or � � ��

and !�� is not in �.

� #� �!�� if !�� in �, and �!� !�� is in �, and � & ,and !�� is not in �.

� #�� !�� if !��%�� in �, �!� !�� is in �,and there is � � �� such that � & & % ,and !�� is not in �.

Figure 4.1: Precompletion rules for ��

be checked for its satisfiability. This is done by gathering all the concept assertions

concerning an individual, the concepts are then conjoined generating a new single

concept associated to each single individual. These concepts are then verified by using

an external call to a terminological reasoner; if they are all satisfiable then the algorithm

terminates with success, otherwise it backtracks as in the case of clash detection.

If the process fails to find a satisfiable precompletion after all the nondeterministic

branches have been explored, then it terminates with failure.

As presented here the algorithm sounds rather naive and prone to inefficient explo-

ration of the search space. In fact, we need a few “tricks” to avoid a hopelessly slow

algorithm. In Chapter 8 we will come back to this point and show how to improve the

algorithm to obtain acceptable behaviour in most of the cases. Note that the theoretical

worst case complexity of reasoning is EXPTIME, therefore there can be pathological

KBs manifesting this complexity.

Example 4.1

We use a slightly modified version of Example 2.3 to illustrate the precompletion al-

gorithm. Let us consider the problem of checking the satisfiability of the Abox��

sarah��FRIEND��Female � ��LOVES�Female��

susan�Female�

andrea��LOVES��Female�

�sarah�susan��FRIEND�

�sarah�andrea��FRIEND�

�susan�andrea��LOVES

��

with an empty terminology. This Abox corresponds to the one of Example 2.3 where

the two constraints bill��Female and �andrea�bill��LOVES have been replaced

by the constraint andrea��LOVES��Female. Note that this Abox is still unsatisfi-

able, it can be easily verified by using similar arguments to those used for the previ-

ously seen example; however we want to show its unsatisfiability by using the precom-

pletion algorithm.

Using the �–rule with the first concept constraint and the two role constraints

�sarah�susan��FRIEND and �sarah�andrea��FRIEND we add the two con-

straints �susan��Female � ��LOVES�Female��

��

Both the newly added constraints are possible branching points where the �–rule is

applicable; however if we choose to add the concept constraint susan��Female we

obtain a contradiction straight away. Therefore we ignore that branch and we add the

constraint �susan��LOVES�Female�

We can carry on with the deterministic rules and apply the �–rule with the latest added

constraint and the role constraint �susan�andrea��LOVES; this results in the further

addition of the constraint �andrea�Female

At this point, the only option is applying the �–rule to the constraint

The choice of adding the constraint andrea��Female can be immediately ruled out

by the fact that it is in contradiction with the previously added andrea�Female,

therefore we select the other disjunct and we add the constraint�andrea��LOVES�Female�

It is easy to verify that we have obtained a precompletion, since there are no further

applicable rules; in addition, all the search space has been explored. Therefore, the

original Abox is satisfiable iff this precompletion is satisfiable. The satisfiability check

is performed by verifying the satisfiability of the concept constraints associated to each

individual:

Individual Concept

sarah ��FRIEND��Female � ��LOVES�Female��

susan �Female � ��Female � ��LOVES�Female�� LOVES�Female��

andrea ��LOVES��Female� � ��Female � ��LOVES�Female��

� Female � ��LOVES�Female��

The first two concepts are satisfiable, while the third one is not because of the conjunc-

tion of ��LOVES��Female� and ��LOVES�Female�. Therefore the initial Abox is

unsatisfiable.

4.2.2 Correctness and completeness of the algorithm

This section presents an overview of the proof for correctness and completeness of the

precompletion technique; the full formal proof is provided in the next chapter.

The proof is divided in two parts: in the first we show that the set of precompletions

characterises the satisfiability of the original knowledge base. The second part gives a

method for checking the satisfiability of such a precompletion using a terminological

reasoner.

The definition of a precompletion for a knowledge base � � �� is given in

a procedural way as a new KB � � � �� where the ABox � � is obtained by

extending � using the nondeterministic syntactic rules in Figure 5.1 as long as they

are applicable.

The precompletion rules are designed in such a way that, whatever strategy of

application is chosen, the process of completing a knowledge base always terminates,

with the same set of precompletions. In fact the only rule that does not introduce a

smaller assertion in the knowledge base is the ��–rule, but its applicability is bounded

by the number of role assertions, which is invariant. The number of precompletions of

a KB can be exponential because of the presence of a nondeterministic rule, however

the size of each precompletion is polynomial with respect to the size of the original

The set of all precompletions of a knowledge base characterises its satisfiability.

If one of the precompletions is satisfiable then the original KB is trivially satisfiable

because the process never removes constraints, therefore the assertions in the Abox are

still in the precompletion. The other direction is proved by using a technique similar

to that used in Horrocks and Sattler [1998]. A model � of � which witnesses its

satisfiability is used to deterministically guide the application of precompletion rules

to a unique satisfiable precompletion. For this purpose the only nondeterministic rule

is transformed into its deterministic counterpart:

�#� �!��

if !�� in �,

and � � �� if !� � �� , � � �� otherwise

Satisfiability is preserved by rule application since if � is a model for the ABox

before the application of the rule, then it is a model for the extended ABox as well.

Therefore, given the fact that the terminology does not change, � is a model for the

generated precompletion.

Now we know that we can concentrate on precompleted KBs without loss of gen-

erality. In �� all the contradictions can be detected by the terminological reasoner,

but in ��f there can be clashes involving role assertions (see Definition 5.7). For this

reason in Chapter 5 we will assume clash–free precompletions, since the presence of a

clash makes a precompletion trivially unsatisfiable.

Given a precompleted knowledge base � �, for each individual ! in the KB we

consider the individual concept�� !�, which is defined as the conjunction of

all the concept expressions in the set �� !�� , or � if there are no assertions

about !. It is clear that a model for the knowledge base is a model for every individual

concept. We now show how, given models for the individual concepts, we can build a

model for the knowledge base.

If each individual concept�� !� is separately satisfiable with respect to

the terminology, then for every individual name ! there is an individual model ��

�� , which witnesses the satisfiability. Interpretations can be infinite, but with-

out loss of generality it can be assumed that their interpretation domains are pairwise

disjoint, as well as having tree structures,7 whose root elements (indicated by ��) are

in the extension of the corresponding individual concept�� !� (as shown in

Section 3.4).

The idea is to build a union interpretation � � �� by “gluing” together all the

tree interpretations, and adding new pairs to the interpretation of roles according to the

role assertions.

The fact that the union interpretation is a model for the precompleted KB is in-

tuitive for the propositional part (concept conjunction, disjunction and negation), but

not so for the modal part. In particular, trouble can arise from the interaction between

the universal quantification and the newly added pairs. However, the domain disjoint-

ness and tree structure assumptions guarantee that the properties of role interpretations

overcome this problem.

First, new links are added only to root nodes (those associated to individual

names); in fact for each pair in � � �� , � �� (�!� � implies that � � ��

is in �� . An important consequence of this property is that if a role connects

two individuals of different individual domains, then the first element of the pair

must be a root individual.

Second, there cannot be a connection between elements from different individual

domains without a path passing through a root node. That is, given two distinct

individuals � and �, whenever there is a pair �� with � ��, there

should be a transitive role � included in such that both pairs �� , ��

are in �� , or � �� .

This property ensures that elements in different individual domains can interact

only via individual names in the Abox (i.e. the roots). Therefore, restrictions

like �� holding inside an individual model cannot affect elements belonging

to other individual domains.

It is easy to see that interpretation of roles satisfies the role assertions in the KB,

because of their simple form; however concept assertions are more problematic. In

fact, although the interpretation of a concept name is simply the union of the interpre-

tations of individual models, the extension of some concept expressions can violate the

7Actually, the structure is more tree–like, as explained in Chapter 3, but here we can consider itsimply as a tree.

semantics of concept forming constructors because of the presence of new pairs in the

interpretation of roles.

Example 4.2

For example, given the simple precompleted knowledge base � � �� , we can

build an individual model �� for the label of �. We choose the interpretation domain

�� with the interpretation function �� , and �� . The union

interpretation maps to �� and � to �.

Let us consider the concept expression ��, by the semantics of the universal

constructor (see Table 2.1) �� because �� is not related to any other

element via . However, in the union interpretation this is no longer true because ��

is related to itself via �� , but �� cannot be in �� .

This problem is localised to root individuals, because the interpretation of roles re-

stricted to non–root individuals does not change. In fact, the interpretation of arbitrary

concept expressions, restricted to non–root individuals is a monotonic extension of the

one in individual models; i.e. for any concept expression �, �� (�!�

� �� for

each individual name !.

For root individuals this property does not hold for arbitrary concept expressions;

however, the property is still valid for concept expressions which appear as assertions

in the precompleted KB. In fact, for each assertion !�� the individual �� is in the

extension �� . This restricted property (together with the more general applicability to

non–roots) is sufficient to prove that all the axioms and assertions in the precompleted

knowledge base are satisfied.

4.2.3 Pros and cons of precompletion

The initial motivation of our research was exploring the feasibility of providing an

Abox reasoner for the DL of the FaCT system (��f). In particular we are interested in

the differences in developing DL systems with and without Abox. For this reason we

decided to investigate the precompletion technique, since it enables a clear distinction

between the hybrid reasoning and the terminological component of the algorithm.

The algorithm is completely independent from the terminological reasoner which

is used for the concept satisfiability test. In this way, an Abox can be easily added to

most existing DL systems without re-implementing the whole system. Moreover, op-

timisation strategies implemented at the terminological level do not adversely interact

with the precompletion algorithm.

A further advantage of precompletion is that it confines the exponential blow of the

complexity to the terminological reasoning. In fact, the size of precompletions is poly-

nomial in the size of the KB. This may indicate that a system based on precompletion

can run with a smaller memory footprint compared to a system based on an extension

of the Tableaux method.8

On the other hand, precompletion presents two big disadvantages w.r.t. other tech-

niques. The first one is related to the fact that the precompletion process cannot obtain

useful information from the terminological reasoner (just a binary yes/no). This means

that in case of failure of a concept satisfiability test, the precompletion process cannot

use additional information for improving heuristics in the exploration of the search

space.

However, we think that the major drawback lies in the fact that the limits on the ex-

pressiveness of DLs suitable for this approach are not clear. For example, the extension

of precompletion to the inverse role constructor is problematic.

Example 4.3

Consider for example the Abox

��

This Abox is unsatisfiable, as can be seen by considering the following diagram

� �

��

The key point is the combination of the existential and universal quantification

�� which pushes “new” restrictions on the individuals � and � (i.e. the

concept �� on �, and consequently the concept � on �). Note that the Abox is

already precompleted, and the concept �� is satisfiable; therefore a

naive application of precompletion leads to a wrong result.

Precompletion is failing in this case because it does not take into account the fact

that new concepts may have to be added to the label of an individual because of the

terminological reasoning.

8This actually depends on the kind of optimisation implemented in the DL system. As we are goingto suggest later on, the precompletion technique can be used as a testbed for improving strategies forTableaux based systems.

In the next chapter we prove that the precompletion technique can be used with

the DL ��f. From the precise description of the technique an actual algorithm can

be easily devised. We have implemented a prototype of an Abox reasoner based on

precompletion, and we performed some evaluation experiments with it. The system

and the results are described in Chapter 8.

Chapter 5

Precompletion algorithm for ��f

The proof for correctness and completeness of the technique is in two parts: in Sec-

tion 5.1 we present a method for deriving a set of simpler knowledge bases called

precompletions, and we show that this set characterises the satisfiability of the origi-

nal knowledge base. The second part, in Section 5.2 shows that the satisfiability of a

precompletion can be verified using a terminological reasoner.

The purpose of the precompletion process is to generate a simpler (not smaller)

knowledge base where the role assertions about individuals can be ignored. Precom-

pletions of knowledge bases are built using a set of nondeterministic syntactic rules

which extend the Abox of the original knowledge base. It will be proved that a knowl-

edge base is satisfiable if and only if a satisfiable precompletion can be derived.

In Section 5.2 we show that for checking the satisfiability of a precompletion the

role assertions can be ignored. In a precompletion the relevant elements are the concept

assertions and the terminology. Each individual is associated to the conjunction of

the concepts appearing in its concept assertions. If all those individual concepts are

satisfiable with respect to the terminology, then we show that their models can be

combined in an interpretation satisfying the precompleted knowledge base.

5.1 Precompletions of knowledge bases

Without loss of generality we assume that all the concept axioms in the Tbox are in the

form � �, where � is a concept expression. This assumption is not restrictive at all

in DLs closed under negation. An arbitrary assertion �� can be transformed into

the equivalent assertion � �� , which is in the required form.

A second assumption we adopt is that concept expressions are in negation normal

5.1. PRECOMPLETIONS OF KNOWLEDGE BASES 78

form, where the �� constructor can appear only in front of concept names. Any concept

expression can be transformed into an equivalent expression in normal form using the

following rewriting rules.

�� " � �� " �� " ��

�� " �� " ��

This assumption simplifies the description of the algorithm and the proofs detailed in

this chapter.

Intuitively, a knowledge base is precompleted if all the information entailed by

the presence of role assertions is exhibited in the form of concept assertions. The

definition of a precompletion for a knowledge base is given in a procedural way as a

new KB where the Abox is extended using the syntactic rules of Figure 5.1 as long as

they are applicable.

We define a set of new binary operators � )� � (one for each individual name !), to

simplify the formulae in the rules. Intuitively, they take into account the possible in-

teraction between the role hierarchy and the functional restrictions (see Section 3.1.1).

Two role names and � are � )� � related if they are functional, and the Abox asser-

tions force the and � successors of the individual name ! to be the same element.

Strictly speaking, the operator depends on the KB as well, but for simplifying the no-

tation we are omitting it. In places where the relation is used, it will be clear which is

the KB we are referring to.

Definition 5.1. Given two roles and �, an individual !, and a KB � � �� , the

relation )� � holds iff:

there is a role � & s.t. either !�� or �!� !�� is in �; and

there is a role �� & � s.t. either !�� or �!� !�� is in �; and

there is a set of roles �� , and a set of functional roles �� ,

– either !�� or �!� !�� is in �, for any � � �� ' � and

– & ��, � & ��, � & ��, � & ��, � & ��, . . . , �� & ��.

The � )� � relation can be better understood by considering that it is describing a

situation in which part of the role taxonomy looks like

��

� � � � ��

� ��

and the individual name ! has a successor for each of the role names�� .

In this case, the functional restrictions cause all these successors to be interpreted as

the very same element. By the way it is defined, the relation � )� � is symmetric (i.e.

)� � * � )� ).

Proposition 5.2. Let � � �� be a KB, and � � �� be an interpretation

satisfying �. For any individual name !, role name , �, and elements � � of �� . If

�!�� , �!�� , and )� �, then � �.

Proof. All the roles � �� of Definition 5.1 are functional because they

are included in functional roles (�� ); therefore !� has at most one successor

for any of these roles. We are going to show that all these successors are equal.

Note that for any �, the constraint !�� (or �!� !��) implies the existence of

an element � in �� s.t. �!�� .

Let us consider �!�� , and �!�� . Since �

� � �� and � �

�� , then

��!� � �� !

��

� . From the functionality of �� we can conclude

that � � .

The very same arguments can be applied to all pair of roles �, ��, including the

last ��, �; therefore � � � � � � � � � �� .

Given this introductory definition, the transformation rules of Figure 5.1 should be

clear, and we are going to introduce the formal definition of a precompletion of a KB.

Definition 5.3. A precompletion of a knowledge base � � �� is defined as a

knowledge base � � � �� , where � � is an extension of � built by applying the

syntactic rules in Figure 5.1 as long as their pre–conditions are satisfied.

The precompletion rules are designed in such a way that, whatever strategy of

application is chosen, the process of completing a knowledge base always terminates

(Proposition 5.4).

� #� �!�� if ! is in , � � is in �and !�� is not in �.

� #� �!�� !�� if !�� is in �,and either !�� or !�� if not in �.

� #� �!�� if !�� is in �,and � � �� or � � ��

� #�� !�� if !�� and �!� !�� are in �, )� �, and !�� is not in �.

� #�� !�� if !�� and �!� !�� are in �,there is � & s.t. � )� �and !�� is not in �.

� #� �!�� if !�� is in �, and �!� !�� is in �, and � & ,and !�� is not in �.

� #�� !�� if !��%�� in �, �!� !�� is in �,and there is � � �� such that � & & % ,and !�� is not in �.

Figure 5.1: Precompletion rules for ��f

Proposition 5.4. The precompletion process always terminates, and any precomple-

tion has a size which is polynomial w.r.t. the size of the knowledge base.

(Sketched). The number of new assertions introduced by the terminology via the #�

rule is equal to the number of individual names in the KB multiplied by the number

of concept axioms in the Tbox. The rules #�, #�, #�� , #��, and #� always in-

troduce assertions smaller than the original ones. The only rule that introduces non

decreasing assertions is #��; however, its applicability is bounded by the number of

role assertions, which is invariant.

For estimating the size of each precompletion we can use the argument that the

number of different concept expressions that can be generated is polynomial w.r.t. the

size of the KB.1 Therefore the size of a precompletion cannot exceed the number of

individual names multiplied by the number of concept expressions, and this number is

still polynomial w.r.t. the size of the KB.

The precompletion algorithm generates a finite, but possibly exponential, number

of precompletions; which can be individually checked for their consistency. The ad-

vantage over the original knowledge base is that they are simpler, enabling the use of

techniques based on terminological reasoning.

The satisfiability of a knowledge base and the satisfiability of its precompletions

are strictly related. In fact, the knowledge base is satisfiable if and only if at least one of

its precompletions is satisfiable (Proposition 5.5). Since Proposition 5.4 ensures that

the precompletions of a given knowledge base can be enumerated, the satisfiability

checking can be performed on precompleted knowledge bases.

Proposition 5.5. A knowledge base � � �� is satisfiable if and only if it has a

satisfiable precompletion.

Proof. + Since � is included in all its precompletions, a model for a precompletion

� � is a model for � as well.

* Given a model � � �� for �, a satisfiable precompletion of � can be built.

This precompletion � � � �� is built by extending � using a set of rules

constrained by the model �. The rules are the same as Figure 5.1 apart from

the nondeterministic #� rule, which is transformed into a deterministic one by

using the model �:

1Again the only problem may come from the �� rule, but the number of formulae that it cangenerate is limited by the number of transitive role names.

� #� �!��

if !�� in �,

and � � �� if !� � �� and � � �� otherwise

All the rules preserve satisfiability, in the sense that if � is a model for the Abox

before the application of the rule, then it is a model for the extended Abox as

#� If � is a model for �� , then every element of the domain is in ��

(including !� for each ! � ). Therefore the assertion !�� is satisfied.

#� If � is a model for �� , then !� � �� . Therefore !� � ��

� and

!� � �� .

#� If � is a model for �� , then !� � �� ; therefore either !� � ��

or !� � �� . Suppose that !� � ��

� , then � is extended by adding the

assertion !�� which is satisfied by �, and analogously for the case in which

!� � �� .

#�� If � is a model for �� , then �!�� !�� , and there is an element

� �� s.t. �!�� . By Proposition 5.2 � !�� ; therefore the

interpretation � satisfies the assertion !�� as well.

#�� If � is a model for �� , then �!�� !�� , and every element s.t.

�!� � � � � must be in �� .

Since � )� �, there is a role� & � and an element such that �!��

�� . In addition, � & therefore �!�� , which means that �

�� .

Finally, we can use Proposition 5.2 with � and � for concluding that �

!�� , so � satisfies the assertion !��.

#� If � is a model for �� , then !� � �� , �!� � !�� and ��

� . So �!�� !�� as well, therefore !�� .

#�� If � is a model for �� , then !� � ��%�� , �!� � !��

% � and � � ��. If there is in �� such that �!�� , then

�!� � � � � as well because � is transitive. The element should be in

�� because !� � ��%�� and � � % � ; therefore !�� .

5.2. SATISFIABILITY OF PRECOMPLETIONS 83

Because of the preserved satisfiability, � must be a model for the precompleted

knowledge base � � as well.

The following Proposition 5.6 makes explicit the logical properties of precomple-

tions implicit in the rules of Figure 5.1.

Proposition 5.6. A precompleted ��f knowledge base � � � �� , containing

assertions about the set of individual names , satisfies all the following conditions:

#� if ! � and � � � � � then !�� ;

#� if �� then �� and �� ;

#� if �� then �� or �� ;

#�� if �!�� , and )� �, then �� ;

#�� if �!�� , and there is � & s.t. � )� �, then �� ;

#� if �� and � & then �� ;

#�� if ��%�� and there is a transitive role � � �� such that

� & & % , then �� .

Proof. Each property corresponds to the pre–conditions of one of the rules of Fig-

ure 5.1. So if the property is not satisfied, the corresponding rule is still applicable and

� � is not yet a precompletion.

At this point we have reduced the problem of checking the satisfiability of a ��f

knowledge base to the problem of verifying whether one of its precompletions is satisfi-

able. Now we turn our attention to precompleted KBs. The advantage in concentrating

on such KBs is that the consistency can be verified using a terminological reasoner

(Theorem 5.20).

5.2 Satisfiability of precompletions

In this section we show how the satisfiability of a precompletion can be reduced to the

concept satisfiability problem. This reduction enables us to close the circle and leads

to the main result for a general KB in Section 5.3.

Since we are going to ignore the role assertions of a precompleted KB, first we

must make sure that a precompleted KB does not contain any contradiction caused by

role assertions. In ��f this case is restricted to assertions involving functional roles.

Definition 5.7. A precompletion � � � �� contains a clash iff there are two

roles �� , and individual names �� with � !� �, such that )� �, and

��

A precompletion containing a clash is not satisfiable because it violates some the

functional restrictions; in fact, by Proposition 5.2 the interpretations of � and � must

coincide, which is in contradiction with the unique name assumption. Precompletions

not containing clashes are said to be clash-free.

Given a knowledge base, the label of an individual is the set of concept expressions

in the assertions on the individual itself. This is formally defined in the next definition.

Definition 5.8. Given a knowledge base � � �� , the label of an individual ! �

with respect to � (written as �� !�) is defined as the set of all concept expressions in

the assertions about the individual name !:

�� !� �

�� !�� if this set is not empty, or

�� otherwise

By using the label of an individual, the concept expression�� !� is defined as the

conjunction of all the concept expressions in �� !�:

�� !� � ��

where �� !�.

Since in precompleted knowledge bases the information carried by role assertions

is made explicit, all the relevant properties of an individual are in the form of concept

assertions. The label of an individual completely characterises the properties of the

individual, and it can be used to verify that these properties are not contradictory.

If each individual concept�� !� is separately satisfiable with respect to

the terminology, then for every individual name ! there is an individual model ��

�� , which witnesses the satisfiability. As seen in Section 3.4, it can be assumed

that each individual model has a quasi transitive tree structure.

The domains of the models can be considered pairwise disjoint without loss of

generality. Let �� and �� be two models of�� and

�� respectively,

whose domains �� and �� overlap. A new domain �� can be chosen having the

same cardinality as ��, and being disjoint from ��, and a bijection can be defined

between the sets �� and ��. Moreover the defined bijection can be used to build a

new interpretation in which every element of �� is substituted by the mapped element

in ��. It is easy to show that that new interpretation � ��

�� is a model for the

corresponding concept�� with respect to the given terminology.

Starting from the individual models ��, an interpretation for the overall knowledge

base can be build by combining them. The domain of the union interpretation is the

union of all the domains from each individual model. The interpretation function is

defined in terms of the interpretation functions of each individual model:

Each individual name is interpreted as the root of its corresponding model (i.e.

!��), which is defined because to each individual name corresponds a label (see

Definition 5.8).

Concept names are interpreted as the union of their interpretations in the different

models.

Interpretation of role names is slightly more involved because the union of the

single interpretations is not enough. In fact, the pairs corresponding to the role

assertions in the Abox must be added as well (including those induced by both

the role hierarchy, and transitive restriction on roles).

As anticipated, the interpretation of roles involves something more than the union

of individual interpretations. First, the role assertions should be added; then, we must

ensure that functional roles are still functional. The latter point can be understood

by considering the precompletion of the KB � � � � �� , where �

is a functional role. The precompletion of � is � � �� , which

is satisfiable. From the precompletion, two individual models can be derived: �� for

the concept �� and �� for �. If we try to merge these two models, together with

the pair generated by the role assertion �� , the resulting interpretation would not

satisfy the functional restriction on � . The solution to this problem relies on a more

careful definition of the union interpretation, which takes into account the functional

restrictions on role extensions. For this purpose we introduce the notion of restricted

interpretation of roles (Definition 5.9).

The idea is to remove the links that will be added later on by means of role asser-

tions in the Abox. We perform this operation only on functional roles, because they

are the problematic ones.

Definition 5.9. Let � � � �� be a clash–free precompleted KB, and ��

�� be the individual model for the individual ! in � �. The restricted individ-

ual interpretation function �� for ! is defined as:

!�� !��

��

�� (

��!�� !��

�if there are �� s.t. & ��

�!� !�� and � )� ��

�� otherwise.

Since the restricted interpretation is different from the original one only for the role

names, in the following we are going to use it only with role names.

Note that if is transitive, then�� because does not have any functional

super–role (see Definition 5.1).

Note that the restricted role interpretation depends on the knowledge base as well

as on each individual model. It is used instead of the original interpretation function

in Definition 5.10, where the extension of role names is defined in such a way that

pairs removed by means of Definition 5.9 are substituted by pairs induced by Abox

assertions. This is necessary only for non–transitive roles, because transitive roles

cannot have functional super–roles by assumption, so the � )� � relations do not hold

among transitive roles.

Then, the following Sections provide the formal proofs showing that this union

interpretation is indeed a model for the precompleted KB.

Definition 5.10. Given a precompleted knowledge base � � � �� and the indi-

vidual models �� for each individual ! � , then the union interpretation

� � �� is defined as:

� ��

��

!� � !��

��

�� )�

��

if is not transitive,

� ��

��

� & �� otherwise.

For each ! � , � � �� , � �� . The operator �� builds the transitive closure

of a relation. For an arbitrary binary relation &, it is defined as &� ��

�� &�; where

&� � &, and &�� is the composition of & and &�, i.e. &�� & Æ &�.

Note that the definition is recursive because the union interpretation is used to

build the interpretation for roles. Given the fact that we assume that the role hierarchy

is acyclic (see Section 2.4), the interpretation of the roles is well defined.

The fact that the union interpretation is a model for the precompleted KB is intuitive

for the propositional part (concept conjunction, disjunction and negation), but not so

for the modal part. In particular, troubles can arise from the interaction between the

universal quantification and the newly added pairs. However, we are going to prove

that the domain disjointness and tree–like structure of the individual interpretations

guarantee that the union interpretation is indeed a model for the precompleted KB.

5.2.1 Interpretation of roles

The disjointness of the domains guarantees that the individual model trees do not over-

lap, so the role interpretations �� are pairwise disjoint. This property is crucial for

the structure of the role extension in the union interpretation, since it ensures that new

links are added only to root nodes (i.e. to those associated to individual names).

First we are going to show that restricted interpretations still satisfy the role hier-

archy.

Proposition 5.11. Let � � � �� be a clash–free precompleted KB, and ��

�� be the individual model for the individual ! in � �. For any pair of role

names � and , individual name !, if � & then �� .

Proof. Since �� satisfies the role hierarcy by hypothesis, �� ; therefore the

only case in which �� !� �� is the one in which a pair is removed from �� but

not from �� . We are going to reason by contradiction, assuming that there is a pair

�!�� s.t. there are �� , & �, �!� !�� , and � )� ��

(see Definition 5.9). This generates a contradiction because � & & �, therefore

the pair should have been eliminated from �� as well.

Now we consider the relation between the union interpretation and the individual

interpretations. We start by verifying that no pair of elements, coming from the same

individual domain, is in the union interpretation of a role unless it is in the individual

interpretation of the role itself. Naturally, we must exclude the case of a pair induced

by an Abox interpretation like �� , which forces the pair �� in � .

Proposition 5.12. Given a precompleted knowledge base � � � �� , and the

union interpretation � � �� from the individual models �� with ! � .

For each role � �� , individual name ! � , and elements � � �� :

� � �� and � !� !� implies � � ��

Proof. We prove this proposition by induction using the partial order & over the set

of roles �� . By the structure of Definition 5.10 we can start from transitive roles

(no induction is necessary in this case), and then prove the property for non–transitive

roles.

If is transitive, we consider the set

��

� & �

and we show that for any integer �, � � �� , � � �� , and � !� !� implies

� � �� . We show this by induction on �.

If � � � then either

� � ��

�� or

� � ��

� & �

The second case can be ruled out because � !� � . In addition, the sets �� and

�� are disjoint for any !� � different from !, because the individual domains

are disjoint by assumption; therefore � � �� .

As an inductive hypothesis we assume that the property holds for any number

less or equal than �, and we show that it holds for � � �.

If � � ��

there is an element � such that � � �� and �� . Note

that � must be different from !� , because we assumed that �� is a quasi transitive

shrub (see Section 3.1). We can use the inductive hypothesis to conclude that

both � � �� and �� are in ��; therefore � � �� , because ��

and �� is transitive by assumption. In addition, �� because cannot

have any functional super–role (see Definition 5.9 and Definition 5.1). Therefore

� � �� .

If is not transitive then we have four cases:

� � ��

��

� � �� )�

� �

� � ��

� & �

� � ��

��

The three initial cases are analogous to the one for transitive roles and � � �: the

second and third cases can be ruled out because � !� � , while the first case satisfies

the proposition because of the disjointness of the individual domains.

In the fourth case there is a transitive role � included in (and different, because

is not transitive) s.t. � � �� . Since we already proved the property for transitive

roles, then � � �� ; therefore � � �� by Proposition 5.11.

The following proposition shows that pairs of root elements come from Abox as-

sertions only.

For any role � �� and elements � � of �, if � � �� and � � � then:

� � ��

� & � or

� � �� )�

� � or

� � ��

� & � �� for some transitive role � & .

Proof. If is transitive, let us consider the set

��

� & �

We show by induction on � that if � � ��

and � � � then

� � ��

� & ��

For � � � it is true because � � �� !��

�� since �� is a quasi transitive tree

interpretation for any ! (see Section 3.3). Let us assume the property for any integer

less or equal than �, and we consider ��

. If � � ��

, then there is an element

� s.t. � � �� and �� . By the inductive hypothesis applied to ��

we conclude that

��

� & ��

therefore � � � . We can then use the inductive hypothesis with � � �� conclud-

ing that

� � ��

� & ��

as well; therefore

� � ��

� & ��

If is not transitive and � � �� , then one of the following cases must be

satisfied:

� � ��

��

� � �� )�

� �

� � ��

� & �

� � ��

��

We can rule out the first case because of the structure of the individual interpretations.

The second and third cases trivially satisfy the proposition, while the fourth case falls

into the previously proved property for transitive roles.

The second important property of role extensions in union interpretation is that

there cannot be a connection between elements from different individual domains with-

out a path passing through a root node (Proposition 5.14). This property ensures that

all the restrictions applying to a non–root element can only be induced through the root

of its own individual domain, instead of by being directly propagated from different

individual domains.

For any role � �� , different individual names �� , and elements � ��, � � ��,

if � � �� then � �� , and � � �� or there is a role � � � �� such that � &

and��

� �� .

Proof. The proof follows the same structure of the previous Proposition 5.12.

If is transitive let us consider the set

��

� & �

We show by induction on � that that for any integer, if � ��, � � �� with � !� � and

� � �� is in �, then � �� , and � � �� or � � ��, �� are in � (this satisfies the

condition, because is transitive and & ).

If � � � then either

� � ��

�� or

� � ��

� & �

The first case can be ruled out because of the disjointness of the individual do-

mains, so � � . In addition, �� is the only element in both �� and �;

therefore � �� (and analogously � � �� ).

As an inductive hypothesis we assume that the property holds for any number

less or equal than �, and we show that it holds for � � �.

If � � ��

there is an element � such that � � �� and �� .

If � � ��, we can use the inductive hypothesis with � � � for concluding that

� �� . We then distinguish the two cases � � �� and � !� �� . In the first

case the condition is trivially satisfied (note that �� for any �). In the

second case,��

� � by the inductive hypothesis; in addition

the transitivity of � and the fact that��

� � , guarantees the

satisfiability of the condition.

If � !� ��, then there is an individual ! different from � s.t. � � ��. We can use

the induction hypothesis with ! and � for �� , concluding that � � !�

and either � � �� or��

� � . Since � � !�� , � �

by Proposition 5.13; therefore � �� . If � � �� the condition is trivially

satisfied, while if��

� � ,

��

� � because�

� � �� and � is transitive.

If is not transitive then have four cases:

� � ��

��

� � �� )�

� �

� � ��

� & �

� � ��

��

The first case can be ruled out because of the disjointness of the individual domains.

In the second and third case � �� because �� is the only element in both �� and

� (and analogously � � �� ). Finally, for the fourth case there is a transitive role �

included in s.t. � � �� , and we can conclude that the proposition is true because

we have already shown that it holds for transitive roles.

Before considering the interpretation of concepts, we must verify that the func-

tional restrictions, and the role hierarchy are still satisfied in the union interpretation.

If a role is included in a functional role � (i.e. & � ), then "��

for any element of �.

Proof. Note that there are no transitive roles included in (and !� � �� ), since

this is forbidden by the language definition. First of all we are going to show that if

is included in a functional role � then the definition of � (see Definition 5.10) can be

simplified by the formula:

� ��

�� )�

� � (,)

Since is included in a functional role, then it is not transitive and it does not include

any transitive role; therefore the component�

�� is empty. We just need

to show that��

� & �� )�

� �

by proving that if � & , then � )� � implies )� � for any role � (see

Definition 5.1). But this is actually the case because � & , and by assumption there

is a functional role (� ) including both � and. The sequence of roles in Definition 5.1

can be extended for by using � and � , and if there is a constraint �� (or

�� !��) with �� & �, this is good for as well because �� & � & .

Now that we proved Equation (,) we are going to show that the functional restric-

tion on is satisfied. We proceed by contradiction, assuming that there are � � ��,

� � �� s.t. � !� � and �� . For doing this we use the alternative

definition of � shown in Equation (,).

Note that it cannot be the case that ��

�� because �� is

functional for any !, and the individual domains are disjoint. If

�� )�

then there is ! � s.t. � !� , and )� because is included in a functional

role (see Definition 5.1); therefore, there is a contradiction between the assumption

that � !� � and Proposition 5.2.

Therefore

� � ��

�� and

� � �� )�

(or the converse, since using � or � is symmetric). Note that there must be two individ-

ual names ! and !�, s.t. � !� , � � !�� , �!� !�� is in � �, and )� �.

Since �!��

�� , then �!� � �� because of the disjointness of the

individual domains. This is in contradiction with the fact that )� � (see Defini-

tion 5.9).

Now we show that the inclusion restrictions among role names are satisfied as well.

For two arbitrary roles and �, if � & then �� .

Proof. Let be � � �� , we need to show that � � �� as well.

If � � � , then � � �� because of Proposition 5.13 and the way the union

interpretation is constructed (see Definition 5.10).

We assume that � !� � , and we distinguish the two cases in which � � ��

for same individual !, and in which � ��, � � �� with ! !� !�. In the first case

� � �� by Proposition 5.12, therefore � � �� by Proposition 5.11. If

� �� and � � �� with ! !� !�, then by Proposition 5.14 either � � !�� (which is

in contradiction with the assumption that � !� �), or there is a transitive role � � & �

s.t.�� !�� !��

� � �� . If is not transitive, then � �� & � by construction

(see Definition 5.10); therefore we assume that is transitive. In addition, by using

Proposition 5.12 we can conclude that �!�� , therefore �!�� by

Proposition 5.11. By Proposition 5.13

� � !��

��

� & � � ��

because � � does not have any functional super–role; therefore � � !�� as well.

Finally, we can conclude that � � �� because � is transitive by construction (see

Definition 5.10).

Now we are going to look at the concepts, by considering the properties of the

union interpretation.

5.2.2 Interpretation of concepts

It is easy to prove that the interpretation of roles satisfies the role assertions in the

KB (because of their relatively low expressivity); however concept assertions are more

problematic. In fact, the presence of new pairs in the interpretation of roles can modify

the extension of concept forming constructors (see Example 4.2).

This “non-monotonicity” problem does not involve any element of the domain, but

it is localised to root elements. Roughly speaking, the reason for this lies on the fact

that the interpretation of roles, restricted to non–root elements does not change (see

Proposition 5.12). Proposition 5.17 shows that, if we restrict to non–root elements, the

extension of arbitrary concept expressions w.r.t. the union interpretation is monotonic

w.r.t the individual models.

Proposition 5.17. For any individual name � � and concept expression �, �� (�� .

Proof. The proposition is proved by induction on the structure of the concept expres-

sion �. The basic case of a concept name is true by construction of � (see Defini-

tion 5.10). If it is true for the concept expressions �� and ��, then for each more

complex expression:

�� Let � �� (��

be such that !� �� , then !� �� by construction (the

individual domains are disjoint), so � �� .

�� By definition ��

� � �� and ��

� � �� , ��

� � �� for the

induction hypothesis; therefore ��

�� It is analogous to the � case.

�� If � �� (

��

, then there exists � � �� such that � � ��

and � � �� . Note that � � �� , because only pairs whose first element is

the root element are removed in the restricted individual interpretation function

(see Definition 5.9). Therefore � � �� (see Definition 5.10). In addition,

� !� �� because �� is a q.t. tree, so � � �� by the induction hypothesis; therefore

� �� .

�� Let be such that � �� (

��

, and � � �� . By using

Proposition 5.13 and Proposition 5.12 we can conclude that � � �� is in �� so

� � �� , and � !� �� . Therefore � � ��

� for the induction hypothesis.

As shown in Example 4.2, this monotonicity property does not hold for arbitrary

concept expressions applied to root elements. However, it is still valid for concept

expressions which appear explicitly as assertions in the precompleted KB (Proposi-

tion 5.18). In fact, the concept in Example 4.2 does not appear in the KB as an assertion

involving the individual �. Moreover, this restricted property of the union interpreta-

tion (together with the more general one applying to non–roots) is sufficient to prove

that all the axioms and assertions in the precompleted knowledge base are satisfied by

the union interpretation (Proposition 5.19).

Proposition 5.18. For any individual name � � , �� implies �� .

Proof. The Proposition is proved by induction on the structure of the concept expres-

sion �.

The basic case is a concept name: If �� is in � � then � � �� so � �

��; therefore �� because �� .

If it is true for the concept expressions �� and ��, then for each more complex

expression:

�� If �� , then �� so �� !� �� . From the disjointness

of the individual domains �� !� �� (see Definition 5.10), therefore ��

��

�� If �� , then both �� and �� are in � � (Proposi-

tion 5.6). For the induction hypothesis �� is in both �� and ��

� , so it is in

�� as well.

�� If �� , then either �� or �� is in � � (Propo-

sition 5.6). If �� is in � �, then for induction hypothesis �� ;

therefore �� as well. The same if �� is in � �.

�� If �� is in � � then �� ; therefore there is

an element � in �� s.t. �� and � � �� . The problem is that

�� , so the pair �� can be removed in the restricted individual

interpretation function (see Definition 5.9).

If �� , then �� as well, and since � � �� (��

because of the tree–like structure of ��, then � � �� by Proposition 5.17.

This concludes that �� .

If �� !� �� , then there are �� s.t. & �, �� !�� and

� )� ��. Since � )�

�� and & �, there is a functional role � such

that & � and � & � ; in addition, the constraint �� is in � �, so

)� �� (see Definition 5.1). This means that

�� )�

��

therefore there is an individual � s.t. � � �� and �� (see Defini-

tion 5.10).

The only remaining thing is to show that �� . However, the constraint

�� must be in � � because of the #�� rule (by Proposition 5.6); therefore

we can use the inductive hypothesis to conclude that �� .

�� Suppose that �� , then we distinguish three different cases

according to the element : is one of the root elements (i.e. � �),

� �� (��

, or there is � !� � s.t. � �� (��

– If � !� then the pair �� must be introduced by means of Abox

assertions (see Proposition 5.13). Let us consider the three different

cases.

��

there is a constraint �� with � & s.t. � �� . The con-

straint �� must be in � � because of the #� rule (by Proposi-

tion 5.6). Therefore we can use the inductive hypothesis for con-

cluding that � �� .

�� )�

there is a constraint �� with )� � s.t. � �� . The

constraint �� must be in � � because of the #�� rule (by Propo-

sition 5.6). Therefore we can use the inductive hypothesis for con-

cluding that � �� .

��

� & � ��

there is a set of constraints

�� !�� !�� !�� !�� !��

and a transitive role � s.t. �� & � & for all � � �� , and

� !�� . By Proposition 5.6 each � � �� the contraints

!�� and !�� are in � � (because of the #�� rule and

the #�). Therefore !�� is � � and � �� by the inductive

hypothesis.

– If � �� (��

then �� , because both �� and are in the

same individual domain (see Definition 5.10). In addition, ��

�� so � �� because �� is a model for the label �� .

Using Proposition 5.17 we can conclude the � �� .

– If there is � !� � s.t. � �� (��

, then by Proposition 5.14 there is

a transitive role � s.t. � & and both �� , �� are in �� . By

using the very same arguments as the first case we can show that the

constraint �� is in � �; therefore we can conclude that � ��

by using the same arguments as in the previous case.

5.2.3 Satisfiability of a precompleted KB

It is clear that a model for the knowledge base is a model for every individual con-

cept. Given the results of the previous sections, Propositions 5.19 shows that the union

interpretation is a model for the precompleted knowledge base.

Proposition 5.19. A clash-free ��f precompleted knowledge base � � � ��

is satisfiable if and only if for each individual name ! � the concept expression�� !� (see Definition 5.8) is satisfiable with respect the terminology � �.

Proof. * Let � be a model for � �, then for any individual name ! and for any concept

� in �� !� the individual !� is in the extension �� (see Definition 5.8).

Therefore !� � �� !��

� so its interpretation is non–empty with respect

to the terminology � �.

+ If for every named individual ! the concept expression�� !� is satisfiable

with respect to the terminology � �, there is a quasi transitive tree interpreta-

tion �� which satisfies � � and having the root element in �� !��

�� (see

Section 3.4). Therefore the union interpretation � � �� can be defined as

Definition 5.10. Verifying that interpretation � satisfies every assertion in � � it

can be proved that it is a model for � �.

Let �� be a concept assertion in � �, then by Proposition 5.18 �� .

Let �� be a role assertion in � �, then �� by construction

of � in Definition 5.10.

Let � � be an axiom in � �, then � satisfies the axiom if and only if

every element in the domain � is in the extension of � with respect to �.

Let be an element of �, then there exists � � such that � ��, and

� �� because the model �� satisfies � �. There are two cases: � ��

or � �� (��

– If � �� then � �� because of Proposition 5.18 (�� is in the

precompleted ABox � � by Proposition 5.6).

– If !� �� , then � �� by Proposition 5.17.

If � � �� then � is transitive by construction (see Definition 5.10).

If the role is functional (i.e. � %�� , or there is a role � � %��

such that � � ), then its functional restriction is satisfied by Proposi-

tion 5.15.

If � is an axiom in � �, then � & and �� by Proposition 5.16.

5.3. KNOWLEDGE BASE SATISFIABILITY 100

5.3 Knowledge base satisfiability

As summarised in Theorem 5.20, the satisfiability of a general knowledge base can

be verified by checking the satisfiability of its precompletions via a terminological

reasoner.

Theorem 5.20. The satisfiability checking problem for an ��f knowledge base can be

reduced to concept satisfiability with respect to a terminology.

Proof. A knowledge base � is satisfiable if and only if one of the precompletions

�� generated by the algorithm in Definition 5.3 is satisfiable (Proposition 5.5).

Moreover, the problem of checking the satisfiability of each one of the precompletions

can be solved by a concept satisfiability checking for each named individual in the

knowledge base (Proposition 5.19).

This means that we now have a KB satisfiability algorithm: the precompletion

rules of Figure 5.1 provide an algorithmic way to enumerate the precompletions of

a knowledge base, while the satisfiability of each precompletion is verified using an

external terminological system such as FaCT (see Horrocks [1998]).

Chapter 6

Querying Aboxes

In earlier chapters we have shown how to verify the satisfiability of a ��f KB (Chap-

ter 5) and how to answer to simple DL queries (Section 2.2). This chapter introduces

the reader to a novel query answering technique for DL KBs. This technique provides

correct and complete answers for disjunctions of conjunctive queries. The aim of this

chapter is to introduce the ideas underlying this technique for the simpler DL ��.

The full formal discussion of the extension to the language ��f is in the next Chapter.

6.1 Introduction

Typically, the only kind of query DL systems support is instantiation (is an individual

� an instance of a concept �?), realisation (what are the most specific concepts � is an

instance of?) and retrieval (which individuals are instances of �?). The reason for this

weakness is that, in these expressive logics, all reasoning tasks are reduced to that of

determining KB satisfiability (consistency). In particular, instantiation is reduced to

KB (un)satisfiability by transforming the query into a negated assertion; however, this

technique cannot be used (directly) for queries involving roles because these logics do

not support role negation.

For example, it can be inferred that John is an instance of Animal if and only if

the KB is not satisfiable when an axiom is added to the Abox asserting that John is not

an instance of Animal (i.e., that John is an instance of the negation of Animal). Re-

alisation and retrieval can, in turn, be achieved through repeated application of instan-

tiation tests. However, this technique cannot be used (directly) to infer from the above

axioms that the pair �John�Bill� is an instance of the transitive role Brother, be-

cause these logics do not support role negation, i.e., it is not possible to assert that

6.1. INTRODUCTION 102

�John�Bill� is an instance of the negation of Brother.

In this chapter we introduce a technique for answering such queries using a more

sophisticated reduction to KB satisfiability. We then show how this technique can be

extended to determine if an arbitrary tuple of individuals (i.e., not just a singleton or

pair) satisfies a disjunction of conjunctions of concept and role membership assertions

that can contain both constants (i.e., individual names) and variables. This provides

a powerful query language, similar to the conjunctive queries typically supported by

relational databases,1 that allows complex Abox structures (e.g., cyclical structures)

to be retrieved by using variables to enforce co-reference (see Chandra and Merlin

[1977]). For example, the query

�� Bill��Parent - �� Parent -

�� Parent - � � ��Hates�� (6.1)

would retrieve all the pairs of hostile siblings in Bill’s family.2

In our work we focus on answering boolean queries, i.e., determining if a query is

true with respect to a KB. Retrieval can be easily (although inefficiently) turned into

a set of boolean queries for all candidate tuples. Note that in available systems, the

retrieval problem is similarly reduced to instantiation.3 It is important to stress the

fact that, given the expressivity of DLs, query answering cannot simply be reduced to

model checking as in the database framework (we illustrate this point in Example 6.1

below). This is because KBs may contain nondeterminism and/or incompleteness,

making it infeasible to use an approach based on minimal models. In fact, query an-

swering in the DL setting requires the same reasoning machinery as logical derivation.

Example 6.1

For using model checking technique for query answering we must be able to as-

sociate a “preferred” model to a given KBs, and this is quite difficult for arbitrary

DL KBs. Let us consider for example a simple KB containing the single axiom

Elephant �Mouse, stating that elephants and mice are disjoint, and the Abox

assertion hathi��Elephant � Mouse�.

We can identify two minimal (w.r.t. inclusion) interpretations satisfying the KB:

1The work is inspired by the use of Abox reasoning to decide conjunctive query containment (seeHorrocks et al. [1999a], Calvanese et al. [1998a]).

2Note that a sound and complete KB satisfiability algorithm will guarantee sound and completequery answers.

3Apart from not very expressive DL languages (see for example Rousset [1999]).

6.2. CONJUNCTIVE QUERIES IN DL 103

in the first the element mapped from the individual hathi is in the extension of the

concept Elephant, while in the second it is in the extension of the concept Mouse.

Which of the two interpretations can be considered the “preferred” one? The point is

that there is not any general mechanism for choosing one, even with this trivial KB.

An important advantage with the technique presented here is that it is quite generic,

and can be used with any DL providing general inclusion axioms where instantiation

can be reduced to KB satisfiability. It could therefore be used to significantly increase

the utility of Abox reasoning in a wide range of existing (and future) DL implementa-

tions.

6.2 Conjunctive Queries in DL

In this chapter we will focus on conjunctive queries only (i.e. queries containing only

conjunctions of terms), and the DL ��. This choice has been made for simplifying

the description of the underlying ideas. In the next chapter we are going to formally

present the technique for a wider range of queries and the more expressive DL ��f.

A key feature of conjunctive queries is that they may contain variables, and we

will assume the existence of a set of variables that is disjoint from the set of individual

names. A conjunctive query is a formula � �� '� - � � � - '��, where '�� '�

are concept or role query terms. Concept and role terms are syntactically equal to

Abox assertions of the form �� or � � �� where � is a concept, is a role and

� � are either individual names or variables.4 A query formula generally contains free

variables which correspond to the tuple the query is meant to retrieve; these variables

are also called distinguished. Queries without free variables are said to be boolean,

and the answer to these queries is not a set of tuples but a true or false value.

In this chapter we are going to introduce the semantics of query answering in a

intuitive way; a formal definition will be presented in the next chapter. The semantics

relies on the the definition of interpretations for a DL: an interpretation � satisfies a

query iff the interpretation function can be extended to the variables in in such a

way that � satisfies every term in. In addition, we require that free variables in are

interpreted as element corresponding to individual names; this restriction ensure that

the set corresponding to a query answer contains only individual names. Using this

definition we can introduce the knowledge base into the framework: a query is true

4We decided to avoid the more common notation �� and �� because it is confusing when �and � are complex concept and role expressions.

w.r.t. � (written � �� ) iff every interpretation that satisfies � also satisfies . The

answer to a non–boolean query will be the set of tuples of individual names that, once

substituted to the distinguished variables, make the query true w.r.t. the knowledge

base. For example, the query

" �Bill� ��Parent - �� Parent - ��Male (6.2)

is true w.r.t. a KB � iff it can be inferred from � that Bill has a grandson. Note that

query truth value and the idea of logical consequence are strictly related. In fact, a

boolean query is true w.r.t. a KB iff it is a logical consequence of the KB.

In the following, we will only consider how to answer boolean queries. Retrieving

sets of tuples can be achieved by repeated application of boolean queries with differ-

ent tuples of individual names substituted for variables. For example, the answer to

the retrieval query �� w.r.t. a KB � is a set of tuples �� , where �� are

individual names occurring in �, such that � �� for the boolean query � obtained

by substituting �� for �� in . Of course the naive evaluation of such a retrieval

could be prohibitively expensive, but would clearly be amenable to optimisation. For

example, let us consider the query (6.1) of the previous section. In many DLs the ex-

pressivity of the Abox for roles is very limited: in ��, for example, a KB implies a

query term like ��Bill��Parent, where the second argument is an individual name,

only if there is an explicit assertion of this form in the Abox. Therefore we may use

role assertions in the Abox to reduce the number of candidates among the individual

names.

We will show how to answer boolean queries in two steps. Firstly, we will consider

conjunctions of terms containing only individual names appearing in the KB; secondly,

we will show how this basic technique can be extended to deal with variables. For

simplifying the notation, we represent the knowledge bases as a single set containing

both the Tbox and Abox assertions (instead of a pair as in Chapter 2).

6.2.1 Queries with multiple terms

In this section we will consider queries expressed as a conjunction of concept and

role terms built using only names appearing in the KB (i.e. without variables); e.g.,

Tom�Student or �Tom�CS710��Enrolled.

As we have already seen, logical consequence can easily be reduced to a KB satis-

fiability problem if the query contains only a single concept term (this is the standard

instantiation problem). For example, Tom�Person is a logical consequence of the KB

�Student Person�Tom�Student� iff the KB

�Student Person�Tom�Student�Tom��Person�

is not satisfiable. This can be generalised to queries containing conjunctions of concept

terms simply by transforming the query test into a set of (un)satisfiability problems: a

conjunction �� - � � � - �� is a logical consequence of a KB iff each �� is a

logical consequence of the KB.

However, this simple approach cannot be used in our case since a query may also

contain role terms. Instead, we will show how simple transformations can be used to

convert every role term into a concept term. We call this procedure rolling up a query.

The rationale behind rolling up can easily be understood by imagining the availabil-

ity of the DL one-of operator, which allows the construction of a concept containing

only a single named individual (see Schaerf [1994]). The standard notation for such

a concept is � � �, where � is an individual name, and the semantics is given by the

equation � � ��

. For example, the expression � Bill � represents a concept

containing only the individual Bill (i.e., � Bill ��

Bill��

Using the one-of constructor, the role term �John�Bill��Brother can be

transformed into the equivalent concept term John��Brother�� Bill ��. Further-

more, other concept terms asserting additional facts about the individual being rolled

up (Bill in this case) can be absorbed into the rolled up concept term. For example,

the conjunction

�John�Sally��Parent - Sally�Female - Sally�PhD

can be transformed into John��Parent�� Sally � � Female � PhD�� The ab-

sorption transformation is not strictly necessary for queries without variables, but it

serves to reduce the number of satisfiability tests needed to answer the query (by re-

ducing the number of conjuncts), and it will be required with queries containing vari-

ables. By applying rolling up to each role term, an arbitrary query can be reduced to an

equivalent one which contains only concept terms, and which can be answered using a

set of satisfiability tests as described above.

However, the logic we are using does not include the one-of constructor, and

most of the state of the art DL systems do not provide the constructor (with the excep-

tion of the latest version of the DLP reasoner, see Patel-Schneider [1998]). Fortunately,

we do not need the full expressivity of one-of, and in our case it can be “simulated”

without the need of adding new axioms (see Section 4.1.2). The technique used is to

substitute each occurrence of one-of with a new concept name not appearing in the

knowledge base. These new concept names must be different for each individual in

the query, and are called the representative concepts of the individuals (written ��,

where � is the individual name). In addition, assertions which ensure that each in-

dividual is an instance of its representative concept must be added to the knowledge

base (e.g., Bill��Bill). In general, a representative concept cannot be used in place

of one-of because it can have instances other than the individual which it represents

(i.e., �� .

��

). However, representative concepts can be used instead of one-

of in our reduced setting. In particular, the conjunction �� - �� is a logical

consequence of a given knowledge base if and only if �� is a logical con-

sequence of the very same knowledge base augmented by the assertion ��. Note

that the trick of using a concept as representative for an individual has been used in a

few other cases in the literature (see De Giacomo and Lenzerini [1996], Borgida and

Patel-Schneider [1994], Era and Donini [1992]).

6.2.2 Queries with variables

In this section we show how variables can be introduced in this framework by using

a more complex rolling up procedure in order to obtain a similar reduction to the KB

(un)satisfiability problem. Variables can be used exactly as individual names, but their

meaning is as “place-holders” for unknown elements of the domain. Because variables

may be interpreted as any element of the domain, they cannot simply be considered as

individual names to which the unique name assumption does not apply; nor can they

be treated as referring only to named individuals, giving the possibility of nondeter-

ministically substituting them with names in the KB. In fact the query

�Bill� ��Parent - �� Parent - ��Male (6.3)

is true w.r.t. both the KBs��Bill�Mary��Parent�

�Mary�Tom��Parent�

Tom�Male

�� and �Bill��Parent��Parent�Male��

but for the first KB the variables can be substituted by the individual names Mary and

Tom, while in the second case the variables may need to be interpreted as elements of

the domain that are not the interpretations of any named individuals.

Answering queries containing variables involves a more sophisticated rolling up

technique. For example, let us consider the terms �� Parent and ��Male of

query (6.3). If � were an individual name, then the terms could be rolled up as

��Parent�� Male�, but this is not an equivalent query when � is a variable

name because � can be interpreted as any element of the domain, not just an element

of �� . However, since in this case � is no longer referred to in any other place in

the query, there is no other constraint on how an interpretation can be extended w.r.t.

�, so the concept � (whose interpretation is always the whole domain) can be used

instead of ��. The resulting concept term is ��Parent�� Male�, which can be

simplified to ��Parent�Male. The same procedure can now be applied to �, thereby

reducing query (6.3) to the single concept term Bill��Parent��Parent�Male�.

In order to show how this procedure can be more generally applied, it will be useful

to consider the directed graph induced by the query, i.e., a graph in which there is a

node for each individual or variable in the query, and an edge from node to

node � for each role term � � �� in the query. For instance, Query (6.3) corresponds

to the graph5

Bill � �MaleParent Parent

It is easy to see that the rolling up procedure can be used to eliminate variables from

any tree-shaped part of a query by starting at the leaves and working back towards the

root (this is similar to the notion of descriptive support described in Rousset [1999]).

The fact that rolling up should start from leaves is essential for correctness: for ex-

ample, rolling up query (6.3) in the reverse order would lead to the non-equivalent

Bill��Parent�� - ��Parent�� - ��Male.

However, this simple procedure cannot be applied to parts of the query that contain

cycles, or where more than one edge enters a node corresponding to a variable (i.e.,

with terms like � � �� - �� ).

Let us consider the case where a variable is involved in a cycle, e.g., the simple

5More details are provided in the next chapter.

� � ��Path - �� Path - �� Path (6.4)�

Path Path

which tests the KB for the presence of a particular type of loop involving the role

Path. Rolling up one of the terms does not help, because the resulting query

� � ��Path - �� Path - ��Path��

Path Path

�Path��

still contains another reference to the variable , and replacing �� with � would result

in a non-equivalent query that no longer contained a cycle. Moreover, it is obvious that

there is no way to roll up the query in order to obtain a single occurrence of any of the

three variables.

This problem can be solved by exploiting the tree model property of the logic.

Given this property, we know that Tbox assertions alone cannot constrain all mod-

els to be cyclical (if there is a model, then there is a tree model), so any cycle that

might satisfy a cyclical query must be explicitly asserted in the Abox. Moreover, given

the restricted expressivity of role assertions (i.e., that they apply only to atomic role

names), cycles enforced in every interpretation must be composed only of elements

interpreting individual names occurring in the knowledge base. Therefore, before ap-

plying the rolling up procedure, a variable occurring in a cycle should be substituted

by the individual names in the KB. This procedure generates a number of alternative

queries corresponding to each individual in the KB. In the next chapter we will see

how they are handled. The intuition behind this property can be understood by consid-

ering that, given an arbitrary interpretation satisfying the cycle only with elements not

corresponding to individual names, a new interpretation can be build where the cycle

is split by duplicating one or more of the involved elements. This new interpretation

can be defined in such a way that it still satisfies the KB, but no longer contains the

required cycle. This duplication can be performed only if the elements are not “fixed”

individual names and by assertions in the Abox. For example, if in the query (6.4) the

variable is substituted by the individual name �, then it can be transformed into the

�� Path - �� Path - ��Path��

which no longer contains a cycle composed only of variables. Consequently, it can be

rolled up into the single concept term

��Path��Path��Path��

where the concept �� is used to close the cycle.

For DLs which can enforce more restrictions on role structure (for example the

transitivity or hierarchy in ��f) the simple formulation of the tree model property is

not longer valid and a more involved definition must be adopted (see Chapter 3). This

fact complicates the basic algorithm we are sketching in this chapter as we show in

Chapter 7.

A similar argument can be used when variables appear as the second argument of

more than one role term, e.g., the variable � in the query � � ��-�� . Such vari-

ables can also be dealt with by nondeterministically substituting them with individual

names occurring in the Abox.

We have seen how role terms containing variables can be rolled up into concept

terms, but these may still be of the form ��, where is a variable. For example, the

query � � ��Parent, where and � are variables, can only be reduced to the single

term ��Parent��. In this case we need to verify that the interpretation of the concept

�Parent�� is nonempty in every interpretation that satisfies the KB. In general, the

interpretation of a concept � is nonempty in every interpretation that satisfies the KB

� iff adding the assertion � �� to � makes it unsatisfiable.

Summarising, the procedure for answering an arbitrary boolean conjunctive query

is divided into two phases. Firstly, the role terms are eliminated by repeatedly applying

the following rules: (i) if the graph induced by the query contains a leaf node �, then

the role term � � �� is rolled up, and the edge � � �� is removed from the graph; (ii)

otherwise, if the graph contains a node � with multiple incoming edges, then all role

terms � � �� are rolled up,6 and the corresponding edges are removed from the graph;

6If is a variable, then it is first replaced with an individual name chosen nondeterministically fromthose occurring in the the KB.

(iii) if the graph still contains edges but no leaf nodes and no confluent nodes, then it

must contain a cycle. In this case a node � in a cycle is chosen (preferably an individual

as this reduces nondeterminism) and rolled up as in case (ii) above. Secondly, the

query (which now contains only concept terms) evaluates to true iff there is at least

one nondeterministic replacement of variables with individual names such that every

term is a logical consequence of the KB.

6.2.3 Extending the framework

In the previous sections we sketched the very basic idea of the query answering al-

gorithm. This intuitive procedure cannot be applied straight away to the ��f logic

we are considering. In fact, role hierarchy and transitivity complicate the structure of

interpretations (see Chapter 3). In addition, the presence of potential cycles introduces

implicit disjunctions which we have not yet shown how to handle.

We need to add a further note about the one-of constructor (or the so called

nominals in the Modal Logic community). From Section 6.2.1 the reader may have

drawn the erroneous conclusion that this constructor can be used to represent variables

as well as individual names. This is false, and providing a logic with one-of would

not eliminate the creation of alternative queries in presence of cycles as described in

Section 6.2.2. The reason for this is that, even though both variables and one-of

represent a single element in the interpretation, the name of the individual in the one-

of is more restrictive than the name of a variable.

Let us consider for example the query � � �� - � � ��. We may be

tempted to use the one-of constructor for representing the joining variable � by in-

troducing a new individual name !� and “rolling up” as in Section 6.2.1, giving the

new query � � �� !� �� - �� !� ��. The semantics of the new query is

completely different from the original one, because it is satisfied by an interpretation

only if the interpretation function maps !� to the appropriate element.7 On the other

hand, if the actual name !� does not appear in the knowledge base, the element mapped

from !� by an interpretation function is arbitrary. Therefore, there is always at least

an interpretation where !� is mapped to a “wrong” element. This means that the new

query would be never satisfied, even if the original one would be.8

7I.e. if the interpretation is , satisfying the condition that �� and ��

�� for some

�� .8The query must be satisfied in every interpretation satisfying the KB.

6.3. RELATION WITH OTHER WORKS 111

6.3 Relation with other works

In the DL community soundness and completeness of reasoning services are consid-

ered essential; therefore we consider only approaches which satisfy this requirement.

This assumption excludes for example the work done with LOOM which provides a

first order query language with an incomplete algorithm (see MacGregor and Brill

[1992]).

As we mentioned early on, our work on conjunctive query answering has been in-

spired by the application of Abox reasoning for solving the problem of query contain-

ment under constraints (see Horrocks et al. [2000a] and Calvanese et al. [1998a]). The

similarity between the two problems comes from the fact that the query containment

problem can be reformulated into a query answering problem. The trick, well known

in the database community (see Chandra and Merlin [1977]), consists in transforming

the less general query into a database by considering the variables as constants. If

querying this simple database with the more general query provides an answer, then

the inclusion is verified. This simple technique can be generalised to the case where

the valid databases are restricted by a DL based constraint language (see Calvanese

et al. [1998a]). The query containment problem is reduced to query answering in the

DL setting where the constraints are transformed into the terminology.

The relation between the two problems is also exploited in Calvanese et al. [2000].

In their framework the query answering problem is reduced to satisfiability of Con-

verse PDL formulae (see Chapter 4). Their technique is very similar to the one we

are presenting, in fact we have been inspired by their work, particularly the idea of

representing a query as a graph and using a “rolling-up” technique.

The difference lies on the underlying DL they use and the scope of their work. They

rely heavily on Converse PDL logic for the transformation, therefore their approach is

only applicable to systems providing a Converse PDL reasoner. In contrast, we want to

present a general technique which is applicable to a wide range of DL systems with a

minimum effort. For this reason, for example, we develop a mechanism for simulating

the inverse role constructor which is provided by Converse PDL but not implemented

in most of the available DL reasoners.

Even if Converse PDL is, for most aspects, more expressive than the logic ��f

we are focusing on, the presence of transitivity and functional restrictions as well as

role hierarchy in ��f makes the structure of its interpretations more convoluted (see

Chapter 3). This complexity is reflected on the query answering algorithm, as shown

in the next chapter.

6.3. RELATION WITH OTHER WORKS 112

The problem of enhancing the query language for Abox KBs has also been attacked

in the context of the system CARIN (see Levy and Rousset [1996a]). This DL based

system integrates a DL with a query language which is essentially (recursive) Datalog.

Their combination provides a very powerful query language which subsumes conjunc-

tive queries;9 however, its applicability is limited by the fact that combining recursive

Datalog with an expressive DL leads to undecidability (see Levy and Rousset [1996b]).

In addition, the technique for query answering uses an ad hoc algorithm which is not

easily portable across different DL reasoners.

On the same track as the CARIN approach, a proposal has been made to use backward-

chaining for evaluating conjunctive queries based on query expansion (see Rousset

[1999]). The problem with this approach is the extreme restriction on the expressive-

ness of the DL language (�� with empty terminologies). The advantage of this

technique is that the query answer is built bottom-up, leading to much better perfor-

mance. The downside is that the the procedure relies on the fact that the underlying DL

has the property of having some sort of “minimal model” against which the query can

be evaluated. As soon that the DL expressivity is enriched by constructors enabling the

representation of disjunctive information the underpinning idea behind the technique

cannot no longer be applied.

In the next chapter we give a formal description of the technique outlined in this

chapter. Formal proofs are provided for supporting the correctness and completeness

of the described technique.

9Using non–recursive Datalog yields to a language which is essentially equivalent to the one pre-sented in Calvanese et al. [2000].

Chapter 7

Answering conjunctive queries in ��f

7.1 Query language

The query language we consider is an adaptation of the basic conjunctive query lan-

guage as defined in the database setting (see Abiteboul et al. [1995], Chandra and

Merlin [1977]). The main difference in the DL case lies in the fact that in DL there are

only relations of arity one (concepts) or two (roles). Individual names can be viewed

as the constants in the database case.

In the following we adopt the same notation as the one for the conjunctive calculus

as described in [Abiteboul et al., 1995, section 4.2].

7.1.1 Syntax

Definition 7.1. Let �� be the set of valid concept expressions build over a set �� of

atomic concept names, and a set �� of role names. We assume a set of individual

names, and a distinct set / of variables. The set 0�� of conjunctive formulae over

�� is the set of all formulae defined by the following abstract syntax:

$ 1 �� ! � � � �

$� - $� � $� 2 $� � � $

where � is a concept expression in ��, �� are role names in �� , ! is an

individual name in ), and � � are variable names in / .

The atomic formulae of the form ��, � � �� , � !, and �

� are called terms. We distinguish the terms into concept terms ( ��), role terms

7.1. QUERY LANGUAGE 114

(� � �� ), and equality terms.

Given a formula of 0�� we use the natural definition of free variables as

those not bound by an existential quantifier. We indicate the set of free variables of

a given formula $ as ��$�. For the sake of simplicity we introduce the notation

� � � � � � �$ as equivalent to the formula � �� $� � � ��.

Given the conjunctive set of formulae 0�� we define a query as a formula

containing free variables. Intuitively, the free variables correspond to the “results” of

the query. We distinguish a special form of query when the formula does not contain

any free variable.

Definition 7.2. A conjunctive query over a DL �� is an expression

� �� $�

where $ is a conjunctive formula ($ � 0��), and �� are called dis-

tinguished variables and are exactly the free variable names of $ (� ��

��$�). Among the conjunctive queries we distinguish the boolean conjunctive queries

as those without free variables.

7.1.2 Semantics

The semantics of the conjunctive formulae is given in the same way as for the DL by

means of interpretations composed by a domain and an interpretation function (see

Chapter 2).

Definition 7.3. Let � � �� be an interpretation over the set of individual names

, role names �� , and concept names �� . Given a formula $ � 0��, and a

mapping ( from the names in ��$� to the domain �, we say that the interpretation

� models $ w.r.t. ( (� �� $) when:

� �� iff (� � � ��

� �� iff �(� �� (��

� �� ! iff (� � � !�

� �� iff (� � � (��

� �� $� - $� iff � �� $� and � �� $�

� �� $� 2 $� iff � �� $� or � �� $�

� �� $ iff there is ) � �� s.t. (� � (� �) and � �� $

7.2. COMPLETENESS OF QUASI TRANSITIVE SHRUB INTERPRETATIONS115

Where the notation (� �) represents the mapping ( extended by the pair � � )� if

is not in the domain of (, otherwise the original value for is replaced by ) (i.e.

(� �) � �� )�� )�� ( � � !" �). With the simplified notation � �� $,

we say that � �� $ iff there is an extension (� of ( with a mapping for

all the variable names �� such that � �� $.

When there is at least a mapping ( such that � �� $, then the interpretation �

models $.

We can use the above defined semantics for specifying the meaning of answering

a query w.r.t. a given knowledge base �. We start from boolean queries; then we are

going to extend the definition to general queries.

Definition 7.4. Let � be a DL knowledge base, and$ a conjunctive formula in0��

without free variables. We say that � answers (implies) $ (written as � �� $) iff any

interpretation satisfying � models $: i.e. for any interpretation �, � �� implies

� �� $.

The definition of logical implication for boolean queries provides the basis for

the formal semantics of query answering. Analogously to the database setting, the

answer to a query is a set of tuples of individual names (see Reiter [1984]). Each tuple

corresponds to an instantiation of the conjunctive query into a boolean query where

the free variables are bound to the corresponding names in the tuple. In other words, a

tuple �!�� !�� belongs to the answer of the query � �� $� w.r.t. the kb �

iff � �� !� - � � � - � � !� - $�.

It is obvious from the definition that query answering can be reduced to logical im-

plication by repeated application of boolean queries with different tuples of individual

names substituted for variables. Of course the naive evaluation of such a retrieval could

be prohibitively expensive, but would clearly be amenable to optimisation (see Sec-

tion 7.5 for more details). However, from now on we concentrate on boolean queries

by showing how to decide if a query formula without free variables is a logical impli-

cation of a given knowledge base.

7.2 Completeness of quasi transitive shrub interpreta-

As anticipated in Section 6.2.2 we need to show that cycles can be enforced only by

means of Abox assertions (see Proposition 7.15 and Proposition 7.16). For this purpose

7.2. COMPLETENESS OF QUASI TRANSITIVE SHRUB INTERPRETATIONS116

we are going to use a result of completeness similar to the one already presented in

Chapter 3. Canonical transitive shrub interpretations provides a formal characterisation

of this “tree-like” structure of the parts of an interpretation which are not constrained

by Abox assertions.

Note that we cannot just reuse the completeness result presented in Section 3.13

because the problem of logical implication is different from the one of kb satisfiability.

In many logics these two problems can be trivially reduced to each other (for example

using negation), but in our case the fact that the query language is different from the

assertional language prevents us from taking this simple shortcut.

Our goal is to show that for providing the evidence that a given knowledge base im-

plies a formula, we do not need to consider any arbitrary interpretation but only quasi

transitive shrubs. For this purpose we are going to use the very same transformation

for interpretations defined in Section 3.2.1.

Theorem 7.5. Let � be a ��f knowledge base, and $ a conjunctive formula in

0�� without free variables. Then � �� $ iff � �� $ w.r.t. canonical quasi transi-

tive shrub interpretations.

Proof. The “only if” direction is trivial because quasi transitive shrub interpretations

are interpretations as well.

For the “if” direction let us assume that � �� $ w.r.t. canonical quasi transitive

shrub interpretations. Let � � �� be an arbitrary interpretation satisfying � (i.e.

� �� ); we have to show that � models $ as well (Definition 7.3).

Let ��

�� be the transitive closure of the unravelled interpretation of

�; in addition, the mapping Æ maps elements of �� to � (See Definition 3.7 and

Definition 3.8). Since we assumed that � �� $ w.r.t. canonical quasi transitive shrub

interpretations, then �� models $ (�� is a canonical quasi transitive interpretation by

Proposition 3.11).

Let ( be an arbitrary mapping ( � / # ��; the mapping ( � is a mapping from

/ to � defined by (� � �� Æ�� (�. We are going to show by induction

on the structure of the formula $ that if �� $ then � �� $. Note that since �

satisfies �, then �� satisfies the properties in Proposition 3.9 (see Proposition 3.10).

First let us consider the basic cases.

�� If �� then (� � � ��, and (�� Æ�(� �� by (3.9d); therefore

� �� .

7.3. ANSWERING BOOLEAN QUERIES 117

� � �� If �� then �(� �� (��

� for

� � �� and �(�� (�� Æ�(� �� Æ�(�� by (3.9b); therefore

� �� .

� ! If �� ! then (� � � !��, and (�� Æ�(� �� !� by (3.9a);

therefore � �� !.

� � If �� then (� � � (��, and (�� Æ�(� �� Æ�(�� (��;

therefore � �� .

As inductive hypothesis let us assume that, for an arbitrary mapping � � / # ��,

if �� $ then � �� $ (and the same holds for $�, $�). We are going to show that

the very same property holds for the formulae $� - $�, $� 2 $�, and � $.

$� - $� If �� $� - $� then �� $� and �� $�. By inductive hypothesis

� �� $� and � �� $�; therefore � �� $� - $�.

$� 2 $� If �� $� 2$� then �� $� or �� $�. If �� $� then � �� $�

by inductive hypothesis; therefore � �� $� 2 $�. Analogously, we come to the

very same result if we assume that �� $�.

� $ If �� $ there is an element � of �� such that �� $with � � (� �� .

For the inductive hypothesis � �� $, where �� Æ�� .

It is easy to show that � � � (�� Æ�� : if �� )� � ��, then either � " and

) � Æ��, therefore �� )� � (�� Æ�� ; or �� with Æ�� ), therefore

�� )� � (�� Æ�� because � � (� �� . Analogusly, it can be shown that if

�� )� � (�� Æ�� then �� )� � ��.

This means that � ��Æ�� $, therefore � �� $.

By Definition 7.3 there is a mapping ( � / # �� such that �� $. We just

shown that there is a mapping ( � � / # � such that � �� $. Therefore from � �� $

for the arbitrariness of � we can conclude that � �� $.

7.3 Answering boolean queries

We are going to present a technique which enables us to decide whether a query for-

mula without free variables is logical consequence of a given knowledge base. Our

algorithm cannot answer arbitrary query formulae but only a restricted class repre-

sentable in a normal form we are going to define. This normal form is general enough

for answering the kind of queries we are focusing on: conjunctive queries and disjunc-

tions of conjunctive queries.

First we are going to present the normal form, then we show how connected and

unconnected conjunctive queries can be verified by using the normal form.1 Finally,

we show how disjunctions of conjunctive queries can be verified by a slightly more

sophisticated reduction to the normal form.

7.3.1 Query normal form

We consider a disjunctive normal form

�� $� 2 � � � 2 � �$�� (7.1)

where each $� is a conjunction of terms of the form of ��, � � �� ,

� ! and � � (i.e. it does not contain any disjunction or existential quantification).

In addition, we require that its free variables are exactly � and �, and all the variables

in each $� are connected. As the normal form suggests, we differentiate a special

variable � whose purpose is “joining” all the different disjuncts, which otherwise do

not share any variable.

It is easy to realise that this normal form is not general enough to cover all the valid

query formulae as described in Section 7.1.1. The simple formula � �� 2

� � �� for example cannot be transformed into the normal form. However, conjunc-

tive queries (i.e. not containing any disjunction) or disjunctions of conjunctive queries

(not sharing any variable) can be transformed into the normal form by renaming of

variables.2

The reduction to boolean queries shown in Section 7.1.2 can generate queries that

apparently cannot be transformed in the normal form; e.g. the query � �� $� 2 $��,

once instantiated into a boolean query, becomes � � �� !� - � � !� - �$� 2$��

for some individual names !� and !�. However, variables being identified with an indi-

vidual name can be distributed among the disjuncts without altering the semantics, be-

cause the uniqueness of the individual name ensures that they will be identified with the

1Intuitively, being connected means that there is a path constituted by the role terms connecting eachpair of variables (we provide a more formal definition at the end of next section).

2Actually the transformation is a little more involved in the case of non connected or disjunctivequeries, as described in the next sections.

same element of the domain. Therefore the formula in the example can be transformed

into the equivalent � � �� !� - � � !� - $�� 2 � � �� !� - � � !� - $��,

which is a disjunction of conjunctive queries.

For a better explanation of the algorithm we use an alternative equivalent represen-

tation of the formula (7.1) by using a set of sets of query terms. The idea behind this

alternative representation is that since all the variables are existentially quantified we

do not really need to specify the quantification operator for binding a variable name.

Therefore we drop the quantification at the beginning of each $� and we use the set

of terms in it instead of the formula itself; i.e. the formula �� $� 2 � � � 2 � �$��

is represented by �*�� *�� , where *� is the set of all the conjuncts in $�. The

notation �� specifies that � is the special variable joining the disjuncts.

Given the one-to-one mapping between the set and explicit forms of the query for-

mula, we use the same notation for describing the notion that an interpretation models

a formula (see Definition 7.3). Given the formula �*�� *�� we say that an in-

terpretation � models the formula w.r.t a mapping ( (� �� *�� *�� ) iff there

is an element (disjunct) *� such that the domain of ( includes all the variables in *�

and for each term + in *�, � �� +. It is easy to see that, for the class of queries rep-

resentable in the given normal form, the two notions of model for a formula coincide.

To simplify the notation, when there is no ambiguity we are going to write formulae

having a single element (disjunct) in the the form � �� * instead of � �� *� ��.

The set representation enables us to provide a formal definition of the property of

connectedness for a formula. First we introduce the notion of a path connecting two

variables.

Definition 7.6. A path connects two variables and it is defined in the following recur-

sive way:

the set �� is a path connecting to �;

if the set $ is a path connecting to �, the set $� is a path connecting � to �, and

$ � $� is empty, then $ � $� is a path connecting to �;3

if $ is a path from to �, then it is a path connecting � to as well.

A cycle is a path connecting a variable to itself. A set of terms contains a path (cycle)

if a subset of it is a path (cycle).

3We require the disjointness of the two paths because we want to prevent cases like � � �� from being wrongly classified as paths starting and ending with the same variable.

By using the previous definition we can now provide a formal characterisation of

a connected formula. A query formula in the normal form is connected iff in every

element (disjunct) there is a path between any pair of variables.

7.3.2 Conjunctive queries

Since nested existential quantifiers can easily be prepended to the formula by renaming

of the variables, a connected conjunctive boolean query is trivially in query normal

containing a single element. In this section we consider boolean queries in the form

� ��

��$�� - � � � - � ��

��$��

where the formulae $�� $�� are connected and do not contain any existential

quantifier nor disjunctions. Without loss of generality we can assume that all the vari-

able names are distinct in each formula.

It is intuitively the fact that, since formulae do not interact one to each other, such

queries may be answered by considering one formula at a time. This is indeed the case,

and we are going to show it formally in the following paragraphs.

Since we can prove a more general result about unconnected conjunction, which

will be used in the next section, we are going to relax the constraint on the form of the

query. Let us consider the query answering problem

� ��

��$�� - � � � - � ��

��$�� (,)

where there is not any restriction on the form of $�� $�� other than the fact that

they do not share any free variables. We are going to show that the query answering

problem (,) is verified iff � �� $�� for each � � �� . The uncon-

nected conjunctive queries we are considering in this section are a particular case of

Since the “only if” direction (i.e. *) is trivially satisfied by definition, we are going

to concentrate on the other direction. Let us assume that the variable names are distinct

in each formula � �� $��, and � ��

��$�� for each � � �� .

Let � be an arbitrary interpretation satisfying � (� �� ); we have to show that �

models the query formula in (,) as well.

Since � �� $�� for each � � �� , then � ��

��$��;

therefore there are � mappings (�� (� for the variable names such that � �� $��.

Let us consider the formula

� ��

��

��$�� - � � � - $��

which is equivalent to the original query formula in (,), because the variable names are

distinct. Since the variable names are distinct, making the union of the mappings does

not alter the variables relevant for each single formula $��; therefore, the mapping

( � (�� (� is such that � models �$��- � � �-$�� w.r.t. (. Therefore � models

the query formula in (,) as well.

7.3.3 Disjunction of conjunctive queries

The last class of queries we consider is the disjunction of conjunctive queries. In this

class, the query formula is a disjuntion of formulae like *� 2 � � � 2 *�, where each *�is in the form

� ��

��$�� - � � � - � ��

��$��

and the formulae $�� $�� are conjunctive queries; i.e. do not contain any ex-

istential quantifier nor disjunctions. A disjunctive query answering problem can be

formulated as

� �� $�� - � � � - $��

2 � � �

2 �$�� - � � � - $��

where each $�� is a connected conjunctive formula. We can assume without loss

of generality that all the existential quantifiers appear at the beginning of each sub-

formula $��. First the query formula is transformed into the conjunctive normal form

by using the distributive property of boolean constructors (i.e. �� - �� 2 � is equiva-

lent to �� 2 �� - �� 2 ��). In this way we obtain an equivalent query formula where

the conjunctions appear at the top level, like

� �� $�� 2 � � � 2 $��

- � � �

- �$�� 2 � � � 2 $��

By using the result shown in Section 7.3.2 we can consider each conjunct �$�� 2

� � � 2 $�� independently from the rest of the formula. Therefore, we need to show

how to answer to a query in the form

� �� $� 2 � � � 2 $�

where $�� $� are connected conjunctive formulae. The formula $� 2 � � � 2 $� has

a close resemblance with the query normal form described in Section 7.3.1. What is

missing is a common “joining” variable connecting the disjuncts (i.e. the variable � in

Formula (7.1)).

The idea is to transform the query into the normal form by choosing arbitrarily one

variable from each disjunct and “promoting” it as the joining variable name. In the

next Proposition we are going to show that this approach is indeed correct and allows

us to solve the query answering problem.

Proposition 7.7. Let � be a knowledge base and ��$� 2 � � �2��$� a query formula

such that ��$�� $� are connected conjunctive query formulae not containing

the variable name �; then � �� $� 2 � � � 2 ��$� iff � �� $�� 2 � � � 2

$�� .4

Proof. Let us assume that � �� $� 2 � � � 2 ��$� and let � be an interpretation

satisfying �. By assumption � �� $� 2 � � � 2 ��$�; therefore there is a mapping

( such that � �� $� 2 � � � 2 ��$�. By definition there is one of the disjunct

��$� such that � �� $�; therefore � �� $�, where (� � (��) for some

) � �� . Let (�� be a mapping such that ( �� (��(�� . The mapping (�� satisfies

� �� $�� because $� does not contain the variable �, so � �� $�� 2 � � �2

$�� , therefore � �� $�� 2 � � � 2 $�� by definition. This means

that � �� $�� 2 � � �2$�� . By the arbitrariness of the choice of � we can

conclude that � �� $�� 2 � � � 2 $�� .

For the other direction, let us assume that � �� $�� 2 � � � 2 $�� ,

and � be an interpretation satisfying �. By assumption there is a mapping ( such

that � �� $�� 2 � � � 2 $�� ; therefore there is a disjunct $�� and

a mapping (� � (��) , for some ) � �� , such that � �� $�� . Let (�� be a

new mapping such that ( �� (��(�� . Since �� is not appearing as a free variable

in $�� , then � �� $�� . In addition, since � is not appearing in $�,

4The formula �� indicates the formula in which all the free occurrences of the variable �, �itself is substituted by .

7.4. ANSWERING QUERIES IN NORMAL FORM 123

�$�� " $� so � �� $� (the only differences between ( and (�� are the

mapping of variables � and ��). Therefore � �� $�2 � � �2��$� and consequently

� �� $� 2 � � �2��$�. By the arbitrariness of the choice of � we can conclude that

� �� $� 2 � � � 2 ��$�.

7.4 Answering queries in normal form

In this section we describe the actual algorithm for answering queries in the normal

form described in Section 7.3.1. The algorithm is described in terms of a set of non-

deterministic rules which transform the initial implication by modifying the query for-

mulae and possibly the KB on the left hand side. The goal of the rules is producing an

implication test of the form � �� ; when such a form is reached

the rules are no longer applied. Such a simple implication can be verified by checking

whether the knowledge base �, extended with the axiom �� is

unsatisfiable. In fact, the formula �� is implied by the KB � iff in every inter-

pretation satisfying � the extension of the concept � is non-empty. We can rephrase

the question by verifying whether there is an interpretation in which the extension is

empty (i.e. the negation of the concept itself is equal to the whole domain).

We require that the algorithm is correct, complete, and terminating. We are going

to build these transformation rules according to these requirements. In the next sections

we prove that this is actually the case.

7.4.1 Query transformation rules

Rules transform query problems of the form � �� *�� *�� into “simpler”5 prob-

lems of the same form. An element *� is selected such that it matches the preconditions

of one or more rules. One of the matching rules is selected according to a strategy, and

the query is transformed as described in the rule. The procedure is repeated until no

elements of the query formula satisfy the preconditions of any rule. At this point the

resulting query formula will be of the form

��

which is equivalent to the formula �� .

5Simpler in the sense that the query after the application of a rule is closer to the target formula�� .

In general, rules substitute the element (disjunct) satisfying the precondition with

a new disjunct, and possibly they modify the KB � in the left hand side:

� �� * � *� � � � � � *� ��

�� *� � *� � � � � � *� ��

In some cases, the selected element * may be substituted by more than one elements

*�� *��; but since this transformation is performed by the last two rules only, we

maintain the simpler single element substitution for the rest of the rules.

Before presenting the rules we need to introduce some notation which is going to

simplify the following exposition. As introduced in Chapter 6, we are considering

knowledge bases as sets of Tbox and Abox assertions instead of a pair of sets. This

simplifies the notation, when new axioms are added to the KB (see nominal elimination

rule).

A further simplification is made to the role syntax in the role terms, which are

represented in a set like notation instead of a conjunction; i.e. the expression��

� is represented by �� . In cases where the actual elements of the sets

are not used, the set will be indicated by a calligraphic character like � or �; i.e. the

conjunction is �� (see for example the role collapsing rule).

New concept names are introduced in some of the rules; we assume that these new

names do not appear in the knowledge base to which the rule is applied. Some of these

new names are uniquely associated to individual names and are called representative

concepts; they are denoted as ��, where ! is the associated individual name (see the

nominal elimination rule for an example). It is important to note that the representative

concepts are unique w.r.t. each individual name. Therefore, if the assertion !�� is

already in �, then the knowledge base �� !�� is not modified. New concept

names, different from the representative concept, can be introduced by appropriate

rules (e.g. the simple inverse rolling up rule); they are unique w.r.t. the actual rule

application.

Finally, in some of the rules the presence a concept term of the form �� may be

required among the other conditions (e.g. the rolling up ones). If such term is not in

the formula, we assume an implicit �� (see the simple rolling up rule).

The ordering in which the rules are presented is not arbitrary, since the strategy

for their application gives the precedence according to this order. This assumption is

essential for some of the proofs we are going to show.

1. Equality elimination: if � � �� * (or �� *), ,� are variable

names, !" �, and *� �� indicates the set * in which all the variables are

substituted by �; then

��

*� � *� �� ( ��

Note that � � � is used because has been substituted by � in *� �� .

2. Conjunction elimination: if � �� *, then

��

*� � � �� * ( � ��

3. Role collapsing: if �� * then

��

*� � ��

� �* ( ��

4. Contradiction elimination: if � � !�� !�� *, and !� is different from

!�; then

��

*� � ��

5. Nominal elimination: if �� !� � * and there is no term of the form � ��

in *; then

�� !��

*� � �� * ( �� !��

Where �� is the representative concept name associated to the individual !.

6. Role elimination: if �� *, � � �, and there are two

indexes #� � s.t. � & �; then

��

*� � �� ( ��

� �* ( ��

7. Shortcut elimination: If

�� *�

and there is an index # $ � and a transitive role � such that � & �, and in every

set of roles ��, there is at least a role �� such that �� & �; then

��

*� �

�� ( �� if � � �, or

� �* ( ��

* ( �� if � � �.

8. Simple rolling up: if �� *, � !" �,6 and * does not contain any

other term involving �; then

��

*� � � �� * ( ��

9. Simple inverse rolling up: if �� *, � !" �, and * does not

contain any other term involving �; then

��

*� � � �� * ( ��

Where �� is a new concept name not appearing in �.

10. Functional rolling up: if �� *, � � �, � !" �,

6We are going to use the symbol �, instead of �, to denote syntactic equivalence for avoidingconfusion with the use of � in query terms (see section 7.1.1).

there is a label � s.t. �� ,7 and * does not contain any other term

involving �; then

��

*� � � ��

� �* ( ��

11. Functional inverse rolling up: if �� *, � � �,

� !" �, there is a label � s.t. �� , and * does not contain any

other term involving �; then

��

*� � ��

� �* ( ��

12. Nominal rolling up: if �� !� � � �� *, � !" �, and *

does not contain any other term involving �; then

�� !��

*� � � ��

� �* ( �� !� � � ��

13. Nominal inverse rolling up: if �� !� �� *, � !"

�, and * does not contain any other term involving �; then

�� !��

��

*� ��

��

� �* ( �� !� ��

Where �� is the representative concept name associated to the individual !, and

��

�� are new concept names not appearing in �.

7The set of labels of a given KB are defined in Definition 3.4.

14. Cycle breaking: if �� !� � � �� *, and in

�* ( ��

there is a path from to � (see Definition 7.6), or " �; then

�� !��

*� � � ��

� �* ( ��

15. Simple cycle breaking: if �� !� � � �� *, and � � �;

�� !��

*� � ��

� �* ( ��

Where �� is the representative concept name associated to the individual !.8

16. Inverse cycle breaking: if �� !� �� *, and in �* (

�� there is a path from to �, or " �; then

�� !��

��

*� ��

�� * ( ��

Where �� is the representative concept name associated to the individual !, and

��,. . . ,��

�� are new concept names not appearing in �.

17. Simple nominal introduction: if �� *, � � �, there

is not any label � such that �� , and there is not any term like

� � ! in *, and !�� !� are the individual names appearing in �; then � new

disjuncts are added to the query formula in place of *:

� �� * � �� !�� * � �� !�� *�� *��

8The term �� is left in the query for avoiding the problem of breaking the connectedness ofthe formula.

where *�� *� are the original disjuncts.

18. Cyclic nominal introduction: if * contains a cycle involving the variable �,

none of the variables �� involved in the cycle appear in a term like �� ! in *,

and !�� !� are the individual names appearing in �, then new disjuncts are

added to the query formula in place of *:

� �� * � �� !�� * � �� !��

)'�� *� � � � � )'�� *� *�� *��

where *�� *� are the original disjuncts, and )'�� )'�� are all the possi-

ble combinations of equality terms among pairs of variable names �� involved

in the cycle; i.e. )'�� .

7.4.2 Notes on transformation rules

In this section we explain in detail the rationale behind the rules introduced in Sec-

tion 7.4.1; we provide the intuition on their correctness, while the formal proofs are in

the following sections.

The underlying idea is that a conjunctive formula, represented by a set of terms,

can be visualised as a graph. The nodes of the graph are the variable names, labeled by

the set of concept terms applying to the variable node. Nodes are connected by edges

representing the role terms. In addition, equality terms are explicitly attached to the

appropriate node. For example, the query expression

�� !� ��

is represented as the graph

� � !

��

A formula in normal form is a set of conjunctive formulae sharing a common variable;

by the graph analogy it corresponds to a set of graphs “hooked” on the joining variable

(the � in Formula (7.1)). Each graph corresponds to one of the elements in the normal

Note the similarity between the graph representation of a query and of an interpre-

tation (see Chapter 2). The main difference is that, in the case of interpretations, nodes

are elements of the domain and are labelled by concept names only (in interpretation

graphs, edges are labelled by sets of names, but this is just a different notation for the

conjunction constructor). The parallel between the two representations enables us to

view query answering under the different perspective of graph matching. In fact, given

an interpretation and a conjunctive query, we can see the question of whether the in-

terpretation satisfies the query as a problem of verifying if the graph corresponding to

the query can be matched against the graph representing the interpretation. The pres-

ence of disjunction in the query language slightly blurs this perspective; however, since

we restrict ourselves to disjunctions of conjunctive queries we can consider the non-

deterministic matching with one of the graphs in the disjunctive formula. The graph

matching perspective is very useful for understanding the rationale behind the algo-

rithm. In particular, it enables us to build and explain examples9 in a very intuitive

The purpose of the transformation rules is to collapse each graph contained in a

disjunctive formula into a graph composed of a single node and without any edges.

The rules are grouped according to the kind of transformation they perform on the

graph.

Normalisation rules

Due to the role conjunction constructor, more than one formula may correspond to the

very same graph; e.g. both the formulae �� and �� are

represented by the very same graph. Moreover, there are valid formulae that do not

fit nicely with the graph idea; e.g. formulae having multiple concept or equality terms

applied to the very same variable name (like �� or � � �� !�). For this

reason, the first five rules are designed to transform the formula in a normal form which

correspond univocally to a graph representation. We call these five rules normalisation

rules.

The equality elimination rule eliminates equality terms by renaming the appropriate

variables. Note that the rule is written in such a way that it never eliminates the joining

9And counterexamples as well. Most of the problems discovered in the development of the algorithmhave been spotted by using the graph matching analogy.

variable � (for this reason, both the � � and � � cases must be considered).

The conjunction elimination and role collapsing rules serve the same purpose of

gathering together concept (role) terms applying to the vary same variable (pair of

variables) by means of the DL conjunction constructor.

The contradiction elimination rule takes care of formulae which cannot be satisfied

by any interpretation because of two individual names. In fact, by the unique name

assumption two different individual names cannot be interpreted by the very same

element of the domain. The contradictory formula is substituted by the simpler ��

which has no models either. The choice of the substituting formula is driven by the

fact that the resulting formula must contain the connecting variable �, and should be

connected (see the definition of normal form in Section 7.3.1). An alternative choice

would have been the elimination of the element (i.e. the disjunct * is simply eliminated

from the query). However, this choice has been ruled out to avoid the necessity of

taking into account cases in which the formula contains a single disjunct (e.g. like

�� ).

The last normalisation rule, the nominal elimination rule, is needed for transform-

ing a formula in which the “linking” variable is identified with an individual name (we

will come back to this case later on, when we described the rest of the rules). It is easy

to see that the number of times the normalisation rules can be sequentially applied to

the same disjunct is bounded by the number of terms plus one (because of the nominal

elimination rule).

Role elimination rules

The role elimination and shortcut elimination rules remove redundant edges from the

graph. This is necessary because parts of the graph that may appear as a cycle (or as

multiple edges connecting two nodes) may not really be cycles. If these “false” cycles

are not correctly handled then the nominal introduction rules would be incomplete (see

the proofs in Proposition 7.15 and Proposition 7.16). The following example shows

why the absence of the role elimination rule would cause problems.

Example 7.1

Let us consider for example the query �� against the knowledge base � �

�� . Clearly, each interpretation satisfying� is a model for �� ;

therefore � �� (i.e. � �� ). On the other hand, the

formula �� is a candidate for the simple nominal introduction rule, which

transforms the query into � �� - � ��. However, this new query

formula is not implied by �. The problem is that, even though the query graph shows

two entering edges into the node , the fact that � is included in implies that an edge

labeled � is always an edge as well.

The shortcut elimination rule is conceptually similar, but it covers the cases in

which the transitivity restriction forces the existence of edges not explicitly appearing

in the query formula itself.

In figure 7.4.2 these “phantom” edges are depicted as dotted lines. Solid lines

denote explicit edges, while dashed lines represent implied edges. Finally, undirected

edges represent arbitrary connection with the rest of the graph.

��

� ��

Figure 7.1: Prerequisites for shortcut elimination rule.

Since the proposed technique works only when in cycles there are no transitives

role (or roles including transitive ones). The shortcut elimination rule can transform a

query which is apparently not covered by the algorithm into a suitable one.

Rolling up rules

The next three pairs of rules (rolling up and its inverse version) are the core of the graph

collapsing procedure. They act on “leaves” of the graph,10 eliminating the leaf node

and the edge(s) connecting it to the parent node. We call them “rolling up” because the

information about the removed node and edge(s) is incorporated into the parent node

via a concept term.

The simple rolling up rule is the more intuitive one; and the easiest way of under-

standing it is to consider the first order translation of the DL constructor ��, which

is � � �� - �� (see Borgida [1994]). The translation correspond exactly

to the part of the query �� involved in the rule.

10The condition of being leaves is enforced by the requirement “. . .� does not contain any other terminvolving ”.

��

Figure 7.2: Application of simple rolling up rule.

Note that any reference of the fact that the edge was connecting to � is lost in the

application of the rule; therefore the rule can be applied only if there are not any other

terms involving � itself (i.e. � is a leaf node). Rolling up a non–leaf node may produce

a wrong aswer to the query as shown in the following example.

Example 7.2

To show why this can cause problems, let us consider the simple query

��

and the kb � � �� . Clearly, � does not imply the given query

because there is a model satisfying � and not having a path with the two roles in

sequence (e.g. the interpretation �� , � � �� , and �� satisfies

�). However, if the simple rolling up rule is applied to the term �� , the resulting

query formula is �� . It is easy to realise that the latter query is

a logical implication of � (i.e. identifying both � and with �); therefore the rule

application would not be correct.

The simple rolling up rule is applied only if there is a single edge connecting the

leaf node; e.g. the rule cannot be a applied to the term to the term � � �� . Again,

the reason lies in the loss of information which would hide the fact that there is only one

confluent node (i.e. �). However, there are cases in which this uniqueness is redundant

because there are other contraints that guarantee it. This is the rationale behind the

functional rolling up and nominal rolling up rules.

Let us consider the case when the leaf node � is connected to the “father” node

by a set of roles (i.e. � � �� ) all being subroles of a common functional

role � (see Figure 7.3 (a)). Variable � can be duplicated � ' � times, because the

common functional role � forces � and its copies to be interpreted as the very same

value (Figure 7.3 (b)). All the generated nodes can then be rolled up as described for

the simple rolling up rule.

��

� ��

�� Figure 7.3: Functional rolling up rule.

The nominal rolling up rule works in the same way (Figure 7.4); but in this case the

uniqueness is guaranteed by the equality term. Note that the basic rolling up procedure

cannot be used directly in this case because there is an equality term (i.e. � � !).

To cover these cases we use the representative concept of the individual name in-

volved. As introduced in Section 6.2.1, the representative concept is used in place

of the DL one-of constructor. In fact, using the one-of constructor, the terms

�� !� � � �� (when � is a leaf node) can be substituted by the

single term assertion �� ! �� ! � � � � � � �� ! ��. The nomi-

nal rolling up rule uses the concept �� in place of the concept expression � ! �. In

addition, an assertion which ensure that the individual is an instance of its representa-

tive concept is added to the knowledge base (e.g., !��) if it is not already there.

��

��!

� � � ��!

� ��

��

� � � �� Figure 7.4: Nominal rolling up rule.

Since our underlying language does not allow the use of the inverse role construc-

tor, all the rolling up rules have their inverse counterpart. This is necessary because

we cannot guarantee that the graph can be collapsed by following the direction of the

edges. In fact, is quite easy to come up with an example of graph in which all nodes

have both entering and exiting edges. The idea is not dissimilar to the one already ex-

ploited for the representative concept: the introduction of a new concept representing

the existance of the appropriate edge and node.

Consider the basic simple inverse rolling up rule for a disjunct having the terms

�� (node � is a leaf). If the underlying DL logic provided the inverse role

constructor, the pair of term could have been substituted by the single concept term

�� in the same way as the simple rolling up rule. In order to overcome the defi-

ciency of the logic we are going to substitute the expression with a new concept name

(��). The technique is similar to the one used for reducing the satisfiability problem

of Converse PDL to the very same problem in PDL (see De Giacomo [1996]). In our

case we are not interested in enriching the language with the inverse role constructor,

but only in representing a particular “shape” in the interpretations. Therefore, we can

localise the scope of the newly introduced concept name to the given expression � by

using the axiom � �� (see Proposition 7.18).

The nominal inverse rolling up rule is essentially a combination of the simple in-

verse rolling up rule, combined with the duplicating arguments we already presented

in the case of the nominal rolling up rule. The functional inverse rolling up rule takes

a different approach, exploiting the functional restriction on the role names. Consider

the transformation shown in Figure 7.5, which is exactly the same pattern as seen in

the functional rolling up rule (see Figure 7.3).

��

Figure 7.5: Functional inverse rolling up rule.

In this case we duplicate the variable ; the new nodes �� can then be rolled

up by using the same arguments as the simple rolling up rule. The resulting structure

shown in Figure 7.5 (c) is then suitable for the application of a simple inverse rolling

up rule in a subsequent step.

The rolling up rules eliminate nodes and edges from the graph, but we should

make sure that the two conditions, maintaining the joining variable � and maintaining

the connectedness, are still satisfied by the resulting graph. The second condition is

guaranteed by the fact that rules only operate on leaves, while the first one is explicitly

stated on each rule (with the condition � !" �). However, this latter requirement for

the rules clashes with the necessity of eliminating the equality terms as well as the

role terms. In fact, equalities with individual names are eliminated by the rolling up

procedure (by the nominal rolling up and nominal inverse rolling up rules; but they

never apply to the joining variable �. For this reason we have the nominal elimination

rule which applies to � only. Note that this rule must be applied only when there are

no role terms in the conjunctive formula, otherwise the fact that � must be identified

with a particular individual name would be lost.

Cycles in query formulae

Until this point we have shown the rules which handle tree–like structures; in fact,

when the graph is a tree we can simply roll it up starting from the leaves.11 When the

query graph contains cycles we need a means for “breaking” these cycles and rolling

up the graph using the rules described above. This means is provided by the cycle

breaking rules toghether with the nominal introduction rules.

Let us first examine the three cycle breaking rules, starting from the cycle breaking

and inverse cycle breaking rules. They work using the very same principle which has

been exploited for the nominal rolling up rules. The fact that the variable � is identified

with an individual name allows us to introduce a new variable forced to have the very

same interpretation as � (because the new variable is identified with the very same

individual name). In addition, the path still connecting the two variables guarantees

the connectedness of the resulting graph (see Figure 7.6). The inverse cycle breaking

rule is similar, the only difference being the direction of the edges connecting the two

variables. The two rules also cover the case in which the two variables coincide; in

that case the graph contains a loop centered on the variable.

��!

��

�(c)

��!

��

Figure 7.6: Cycle breaking rule.

The simple cycle breaking rule is similar, in principle, to the cycle breaking rule; but

it does not assume that removing the edges will leave the graph connected; therefore,

11This is not completely precise because, even in tree–like graphs, we may still need to use the simplenominal introduction rule.

one of the role names (�) is left unchanged in order to maintain the connectedness.

At a first sight, simple cycle breaking looks redundant because of the nominal rolling

up rule. However, the rule is necessary when the rolling up is contrary to the direction

of the edges (for example when � " �, or the leaf node is ) because the nominal

inverse rolling up rule covers only cases in which the starting node is identified with

an individual name.

The rule is part of the ones handling cycles because, although the role term (i.e.

� � �� ) is not necessarily part of a cycle (according to Definition 7.6)

when role names appearing in a role term are unrelated (see the role labels in Defini-

tion 3.4), the term itself can be considered a sort of a cycle. In fact, we can traverse

one of the edges in one direction and a different edge in the other direction, coming

back to the very same variable. This is not true for the “false cycles” as explained in

the normalisation rules.

In the query there can be cycles (even the “cycles” composed by a single term)

composed by variables only; i.e. in which none of the variables are identified with an

individual name by an equality term. In these cases we cannot use the cycle breaking

rules, and one of the nominal introduction rules must be used.

These two rules exploit the completeness of the quasi transitive shrub interpreta-

tions (Section 3.1), since the properties of these interpretations guarantee that cycles

can only be enforced by assertions about the individual names. Consider the simple

nominal introduction rule applied to a term � � �� . The idea consists

in identifying the second variable (�) with one of the individual names. The choice

cannot simply be nondeterministic, as shown in Example 7.3; therefore we must be

able to try all the possibilities at once. This is the reason why maintaining a connect-

ing variable (the � variable in Section 7.3.1) across all the disjuncts is essential for the

completeness of the algorithm.

The problem with the nondeterministic approach is that once the (nondeterminis-

tic) choice is made, it must be the correct choice for all the possible interpretations

satisfying the knowledge base. In fact, we are going to show an example of knowledge

base in which for some interpretations a variable must be identified with one individ-

ual, while for others a different individual must be chosen. Trying all the possibilities

at once allows all the different cases to be covered.

Example 7.3

Let us consider the knowledge base

� � ��

� forces two different cycles with the same role names:

��

in addition, the assertion �� forces either � or � to be an instance of � in

every model. Therefore the query formula � �� - � � �� - �� is a logical

consequence of �. On the other hand, we cannot nondeterministically choose one of

the two individual names to be identified with (or �). In fact, in this way all the

possible choices will lead to query formulae which are not logical consequences of �.

When a true cycle is present in the query, the cyclic nominal introduction rule is

used; its rationale relies on the very same arguments as for the simple nominal in-

troduction rule. However, in this case we must be more careful in order to avoid a

different kind of “false cycle”, as shown in the next example.

Example 7.4

Let us consider the formula * � �� having a

diamond shape:

and the knowledge base � � ��. By interpreting both variables �and � as the very same element, it is easy to see that � �� *�. In addition, the graph

is cyclic and there are no applicable rules other than the cyclic nominal introduction

Suppose that we are going to apply the rule to the variable � (choosing a different

variable would not make any real difference), without adding all the possible combi-

nations of equality terms among the variables in the query. In this case we obtain the

new formula

*� � ��

because � is the only individual name in �. It is easy to see that �* �� is not a logical

consequence of � (i.e. � !�� *�).

Applying the rule as it is defined, we obtain several disjuncts and the formula

� � � �� among them. This disjunct is sim-

plified into the formula �� by the equality elimination rule, and

subsequently rolled up into the formula *�� . It is easy to see that

the query �� *�� is a logical consequence of �.

As shown by the example above, the cases of false cycles are covered by the intro-

duction of the equalities in the cyclic nominal introduction rule. If two (or more) of

the variable names in a cycle must be interpreted as the very some element, there is at

least one of the disjuncts which has the required equality terms explicitly stated.

7.4.3 Limitation of the technique

Unfortunately the presented technique works only under some restrictions on the form

of the queries and/or terminologies. We have two main sources of problems, namely

the transitivity restrictions and the interaction between functionality and the role hier-

archy. This is symptomatic of the fact that the restriction on role interpretation compli-

cates the structure of the models as shown in Chapter 3. In the following two section

we are going to describe the limitations of the query answering algorithm.

Functionality and hierarchy

As highlighted by the Definition 3.6 of a canonical interpretation, we can assume that if

a pair is in the interpretation of two different roles these two roles must be in a common

label (let us ignore Abox assertions). However, the contrary is not necessarly true; i.e.

interpretations of roles do not have to coincide if two roles are in the very same label.

For the application of the simple nominal introduction rule it is essential to know

whether a variable must be identified with an individual in the presence of two variables

connected by more than one edge (i.e. the node � with a term like � � �� ).

Trivial cases are covered by the role elimination rule, since the inclusion relation be-

tween two roles guarantees that every pair in the included role must be in the including

role as well. However, the interaction between functional restrictions and role hierar-

chy may cause more subtle “terminological” coreferences.

For example, in the case in which two unrelated (via &) roles have a common func-

tional super–role. Let � and � be the two roles both included in a functional role

� ; in an arbitrary interpretation � satisfying the restrictions on the roles, whenever

there is a pair �� in �� and a pair �� in �

� then � � � because they must be

both in the interpretation of � . Note that this is different from saying that whenever a

pair is in one of the two roles it must be in the second as well (as in the case if inclu-

sion). This property allows us to “roll up” terms like � � �� into something

like �� without loosing the information that the two successors must

be the same element.

The cases shown until now have in common the fact that the “terminological”

coreference can be discovered by looking at the relation among the role names. Un-

fortunately there are cases in which the role structure is not enough and the single

interpretation makes the difference.

Example 7.5

Let ��, �� be two unrelated functional roles and � �� % three different roles s.t. &

��, � & ��, � & ��, and % & �� (assuming that the the converse elations do not

hold). Given an arbitrary interpretation �, and two pairs �� in � and �� in

% � , whether or not � and � must be the same actually depends on the existance of an

�–successor of �.

If there is an element �� s.t. �� , then by using the same arguments as in

the previous case we can show that � � �� and �� ; therefore � � �. Otherwise,

nothing is forcing � and� to be the same. Here we have a “terminological” coreference

that does not depend on the roles structure alone.

How this affects the presented algorithm can be shown by a simple example. Let

us consider the following three Aboxes sharing the same role terminology described

�� %�� %� (7.5a)

�� %�� (7.5b)

�� %�� (7.5c)

We want to decide whether the formula �� % is a logical consequence of the

assertions.12

It is easy to realise that the given formula is a logical consequence of both the first

two Aboxes (7.5a) and (7.5b), but not of the third one. In fact, an interpretation �

where �� , �� % � � ��

� , �� , and �� ), satisfies the

Abox assertions in (7.5c) but not the formula � �� % �.

The problem in devising an algorithm to discern the different cases comes down to

figuring out whether the variable should or should not be identified with one of the in-

dividuals of the Abox. For the Abox (7.5a) we can use a transformation rule like simple

nominal introduction which adds, (amongst others) the disjunct �� % � � ��.

However, this is not sufficient to detect the case of the Abox (7.5b); this last case can

be covered by a rule like functional rolling up, which transforms the formula into

�� %��.

On the other end, if the latter transformation is indiscriminatevily used, the Abox

(7.5c) would be wrongly recognised as satisfying �� % . In this particular case,

the right trasformation should be something that takes into account the necessity of an

�–successor for �, like the term �� %�� .

The last example suggests that a combination of the two mentioned rules can be

used to detect these tricky cases of interaction between role hierarcy and functional

restrictions. However, we still have to generalise this intuition to the general case, and

for the time being we are forced to restrict the expressivity of the role terminology.

Roughly speaking, we forbid cases like the described one by ensuring that for any

pair of roles in a single label either they are included one in the other, or they share

a common functional super–role. As shown in the proof of Proposition 7.14, this

restriction is enough to prove correctness and completeness of the functional rolling

up rules.

Transitivity and cycles

The kind of queries we are able to answer using this technique must not contain any

cycle which includes a transitive role name.13 The problem with cycles including tran-

sitive roles is that there may be variables in the cycle that are not identified with one

of the individual names in the knowledge base. The reason is that transitivity pro-

vides a way of creating shortcuts for arbitrarily long directed paths. Let us consider for

12With an abuse of notation we used the individual � instead of a variable for simplifying the formula.13In fact, the requirement can be less restrictive, as highlighted by the proof of Proposition 7.16.

However, this formulation is much simpler and clearer.

example a knowledge base forcing interpretations having a structure like:

� �

��

where �, �, � represent elements of the domain corresponding to individual names

and the dot an “anonymous” element. When the role name is transitive the dotted

shortcuts are presents whenever the non–dotted structure is present (for example by

means of an assertion like ��). Therefore there is always a cycle which includes

an element not corresponding to an individual name (the “anonymous” element). The

scenario can be complicated by the role hierarchy, since these transitive shortcuts can

be forced into non trasitive roles as well.14

7.4.4 Example of query answering

For a better undersanding of the algorithm we show the rule application process on a

concrete example. Let � � � �� !�� be a knowledge

base and we like to know whether the answer to the boolean query

�� - �� - � ! - �� - ��

is positive or negative. We can guess that the answer must be negative (i.e. � does not

imply the given formula). The reason for this is that the constraint �� does not

force a loop on � but only the existence of a element related with � by role � (it may

be � itself, �, !, or even a fourth one not specified).

Since the query formula is conjuntive and connected, it is already in normal form.

We choose � as the joining variable:

�� !� ��

14We have the conjecture that the problem can be solved by introducing more disjuncts with the cyclicnominal introduction rule. The idea is that, since the underlying DL does not provide the inverse roleconstructor, at least one of the variables in a cycle must be coreferenced with one of the individual namesin the KB. However, this hypothesis is still being investigated.

To simplify the exposition we start from a knowledge base � � already containing the

assertions for the representative concepts (i.e. �� !�� ). In this way

the knowledge base will be unmodified by the rules we are going to use. Initally the

query formula contains a single graph, to which are applied first the role elimination,

followed by the nominal rolling up rules:

� � !

��

� � !

��

At this point we can only apply the cyclic nominal introduction rule to the loop in-

volving the variable �. The rule generates three different isomorphic graphs where

the variable � is identified with each one of the three different individual names in the

knowledge base. The query now is composed of the three graphs:

� ��

��

� ��

��

� �!

��

Since the graphs have the same shape, they are transformed in the very same way; with

the only difference of the representative concept used for the variable �.First, the cycle

breaking rule is applied to the loop on �, then the nominal rolling up rule concludes

the collapsing:

� ��

��

� ��

��

� � ��

� ��

The query formula we are now going to verify is composed by three disjuncts in the

form of �� ; where � � is substituted by ��, ��, or ��.

Since the variable � is the same we can collapse the disjunction into a single concept

expression by using the � constructor.

��

� ��

By using simple transformations on the concept expressions we obtain the query an-

swering problem

��

which can be verified by checking whether the knowledge base

��

is unsatisfiable. We can push the negation down into the concept expression to obtain

the kb

��

This kb is indeed satisfiable, and this can be verified by using a KB satisfiability al-

gorithm. However, here we demonstrate its satisfiability by exhibiting an example of

interpretation satisfying the constraints. Let us consider the following interpretation �:

��

!��

��

Interpretation � has been constructed in such a way that it satisfies all the constraints in

�, and all the assertions for the representative concepts. To satisfy the added inclusion

assertion derived from the query formula, every element of the domain must be in the

interpretation of the concept. The elements !, �, and �� satisfy the constraint because

they are not in �� (therefore they are in ��). The element � obviously is not in

�� , nor in ��; therefore we should verify that � is in ��

�� . Clearly, this is the case because it is in ��

in �� , and in ��

� .15

Since the transformed KB is satisfiable, the conclusion is that the query formula is

not a logical consequence of the KB �.

7.4.5 Termination

Transformation rules are applied by following a simple “ordering” strategy: if more

than a rule can be applied to the same disjunct, then the first one (in the order they have

been presented) is applied.

Given this strategy the rule application process terminates; i.e. the transformation

always leads to a formula where no rules are applicable, and the resulting formula has

the form � �� .

The proof of termination is divided in three parts. First we show that the application

of any rule maintains the connectivity of the query formula. Secondly, we show that

the variable � is never eliminated; finally, we prove that in a finite number of steps

we are going to obtain a query formula without role terms. If an element (disjunct) of

the query is connected, contains the variable �, and does not contain any role term it

must be in the form �� (possibly containing a term in the form � � !

as well). From that is collapsed into �� by the conjunction elimination rule (and

nominal elimination rule if � � ! is present).

Proposition 7.8. Let � �� *�� *�� be a query where all the elements *�� *�are connected, and �� *�� *

�� be the resulting query after the application

of one of the rules. Then all the disjuncts *�� *�� are connected.

Proof. Given the assumption that each one of the starting disjuncts is connected, we

have to show that the connectivity is maintained by the rules which remove role terms.

15Note that, the interpretation provides also an example of interpretation satisfying �, but notmodelling the query formula. Therefore an alternative way of showing that the formula is not a logicalconsequence of the KB �.

Equality elimination: even if all the occurrences of variable are renamed to

�, the disjunct is still connected because no role terms are removed.

Role collapsing: two role terms are eliminated between � and �, but a new role

term is inserted connecting the very same variables. Therefore if the disjunct

was connected before the rule application, then it is still connected after the

application.

Contradiction elimination: the new disjunct *� � �� is trivially connected.

Role elimination: since � � �, then the role term is substituted with a different

role term between the very same variables.

Shortcut elimination: even if a role term between variables � and � is re-

moved, there still is a path connecting the two of them by hypothesis.

Simple rolling up: the term � � �� is removed from *; but since the variable

� no longer appears in *�, then if the disjunct * is connected, so must be the new

Simple inverse rolling up: this case is analogous to the previous one.

Functional rolling up: again, this case is analogous to the simple rolling up.

Functional inverse rolling up: a role term connecting variables � and is re-

moved from *; but a new role term connecting the very same variables is added.

Nominal rolling up: this case is analogous to simple rolling up.

Nominal inverse rolling up: this case is analogous to simple rolling up.

Cycle breaking, inverse cycle breaking: a role term connecting and � is

removed from *; however, in the remaining formula there is a path connecting

and � by hypothesis (or � " ). Therefore *� is still connected.

Simple cycle breaking: a role term connecting and � still remains in *�, so the

connectedness is maintained.

Proposition 7.9. Let � �� *�� *�� be a query answering problem where the

variable � appears in all the disjuncts *�, and �� *�� *�� be the resulting

query after the application of one of the rules. Then the variable � appears in all the

disjuncts *��.

Proof. Analogously to the previous proposition we consider the rules that might delete

terms.

Equality elimination: � is never renamed by definition.

Conjunction elimination: the variable is still in *� because of the fact that a

new concept term is added.

Role collapsing: both � and � are still kept in *�.

Nominal elimination: this rule does not affect � by definition.

Contradiction elimination: variable � is still in *� by definition.

Nominal elimination: a concept term containing � is still in the disjunct * �.

Role elimination, shortcut elimination: both � ( �) and � ( �) are still in *�.

Simple rolling up, simple inverse rolling up, functional rolling up, functional

inverse rolling up, nominal inverse rolling up, and nominal rolling up: these

rules do not remove � by definition.

Cycle breaking, simple cycle breaking, and inverse cycle breaking: even

thought a role term containing � is removed, in *� there is at least an equality

term containing �.

Proposition 7.10. Let � �� *�� *�� be a query where all the disjuncts *� are

connected, and � is appearing in all of them. Then, there is a strategy for applying the

rules such that a new query �� containing only concept

terms can be obtained in a finite number of application of the rules.

Proof. The idea behind the proof is showing that we only eliminate role terms by

using the rules, but never introduce new ones; therefore we obtain a query without role

terms. Moreover each disjunct is connected and contains the variable � as shown in the

previous propositions; therefore, if the query does not contain any role term, then each

disjunct must contain only terms like � � ! or ��. Elements containing only concept

terms can be normalised into the form �� by using the nominal elimination and

conjunction elimination rules.

Given a set of terms *, we define the number of edges of * as the total number of

different role names connecting each pair of variable names in role terms. Formally,

this can be defined as the cardinality of the set

�� *� � ��

Note that the rules never increase this number per disjunct, at most they multiply the

number of disjuncts in the query (the nominal introduction rules). Given a query for-

mula �*�� *�� , we consider the number of edges of the formula as the maximum

among the number of edges of each disjunct *�. We are going to show that this number

decrease in a finite number of applications of the rules.

A given disjunct in the query can be transformed (by the substitutions performed by

the rules) into a form in which no rules are applicable or only the nominal introduction

rules are applicable, in a number of steps which is linear w.r.t. the size of the disjunct

itself.

Let us consider the query answering problem � �� *�� *�� ; we can assume

that none of the normalisation rules can be applied to any of the *� (if it is not the

case this can be achieved in a number of steps linear w.r.t. the sum�

�� "*�, where

"*� denotes the cardinality of the set *�). If the number of edges of �*�� *�� is

greater than 0, there should be a subset of the disjuncts having that number of edges.

We are going to take one of those disjuncts and show that the number of edges of

disjuncts “derived” from the chosen one can be decreased in a finite number of steps.

In addition we are going to show that we can limit the number of “derivable” dis-

juncts. We can apply this arguments to all the “maximal” disjuncts in the original

query; therefore in a finite number of steps we obtain a new query answering problem

with a smaller number of edges.

First we must specify precisely what we intend by the fact that a disjunct is “de-

rived” from another one in the query. As we described in Section 7.4.1, most of the

rules take one of the disjuncts * from the query and replace it with a modified version

*� of it; we say that *� is “derived” from *. The case of nominal introduction rules is

different; the disjunct * is substituted by a finite number of new disjuncts, and all these

are said to be derived from *.

Let * be one of the disjuncts in the query �*�� *�� . Clearly all the derived

disjuncts have a number of edges which is less or equal than the one of *. We assumed

that none of the normalisation rules is applicable; in addition none of the role elimina-

tion rules is applicable as well, otherwise we already obtained a derived disjunct with

a smaller number of edges.

First we must show that, under these assumptions, when there is at least a role

term in * one of the nominal introduction rules is applicable. We distinguish two cases

according to the fact that there is or there is not any cycle in *.

If there is no cycle in *, then there is a role term � � �� being the

last term of a path not terminating with �; i.e. either there is not any different role

term containing � and � !" �, or there is not any different role term containing

and !" �.

Let us assume that in * there is not any different role term containing � !" �.

Since none of the normalisation rule are applicable, there can be only two other

terms containing �: � � ! for some individual name ! and ��. If � � ! is

in *, then the nominal rolling up would be applicable; which is not the case

by assumption, therefore � � ! is not in *. If , � � (i.e. � � ��), then the

simple rolling up would be applicable, so , � �. Finally, there is not any label

� s.t. �� , otherwise the functional rolling up rule would be

applicable. Therefore the simple nominal introduction rile is applicable because

all the preconditions are met.

Let us assume that in * there is not any different role term containing !" �. As

before we can exclude the case in which , � � and all the roles are included in

a single label (otherwise the simple inverse rolling up or the Functional inverse

rolling up rule would be applicable). In the same way we should exclude the

case in which either a term like � ! or � � !� are in *, otherwise the inverse

cycle breaking or simple cycle breaking rule would be applicable. Therefore the

simple nominal introduction rule is again applicable.

If there are cycles in *, let us consider each role term � � �� in-

volved in at least a cycle. Since neither the cycle breaking nor the inverse cycle

breaking rules are applicable, then there are no terms like � ! or � � ! in *;

therefore the cyclic nominal introduction rule is applicable.

Now we consider the cases in which the nominal introduction rules must be ap-

plied. We show that in a finite number of steps a new query is obtained where all the

disjuncts derived from * have a number of edges less than that of * (i.e. a rule which

eliminates edges is applied). We proceed by induction on the number of variables, not

identified with any individual name (by a term like � !), which are involved in at

least a cycle or in a term like � � �� where �� are not included

in a single label.

First let us consider the case in which there is only one variable � satisfying the

above conditions. Then, either the simple nominal introduction rule is applicable

or there is a loop on � and the cyclic nominal introduction is applied. In both

cases the disjuncts introduced in place of * are of the form *�� !�� (because

adding an equality term between the same variable does not make any sense). In

the case of the simple nominal introduction rule either the nominal rolling up

or simple cycle breaking rules are then applicable. If the cyclic nominal intro-

duction rule has been applied, then there is the possibility of applying the cycle

breaking rule as well. Anyway, in all three cases the rules eliminate one or more

edge from the derived disjunct * � �� !��.

If there is no cycle, then there must be a term like � � �� , and we

can apply the very same arguments we used in the basic case. So let us assume

that there is a cycle involving more than one variable (not identified with any

individual name) and one of the nominal introduction rules is applied to one of

these variables �. We consider the two cases separately.

Simple nominal introduction Let us consider one of the disjuncts *� � * �

�� !�� inserted in place of *. Let � � �� be the role term

matching the rule application. In this case the nominal rolling up rule is

not applicable because � is part of a cycle, but one of the cycle breaking or

simple cycle breaking rules can be applied to any of the resulting disjuncts,

diminushing their number of edges.

Cyclic nominal introduction Since the cycle is longer than �, either there is

a role term � � �� or �� in the path, with

!" �. In addition, the derived disjuncts are either in the form *�� !��

or )'�� *, where )'�� is a set of equalities between variables.

Let us consider the disjunct *� � * � �� !��. As we mentioned, either

� � �� or �� are in the path.

If � � �� is in the path then the preconditions of the cycle

breaking rule are satisfied (and none of the preceding ones); therefore in the

next step the (unique) disjunct derived from *� will have a smaller number

of edges. Let us now consider the case in which �� is in

the path. The cycle breaking rule is not applicable because by assumption

*� does not contain any term like � !, and the same is true for the simple

cycle breaking rule. However, the inverse cycle breaking rule is applicable,

which is a role elimination rule.

Let us consider a disjunct in the form *� � )'�� *, where )'�� is a

set of equalities between variables. The query answering problem can be

transformed into a new problem in which *� is substituted by a single new

disjunct at each step, by applying the normalisation or role elimination

rules. In a number of steps which is linear w.r.t. the size of *� we obtain a

single disjunct *�� derived from *� to which only the nominal introduction

rules are applicable (or none of the rules).

Note that the number of variables involved in a cycle in *�� is decreased be-

cause of the fact that two of them have been explicitly made equal by a term

� � � (so one of them has been eliminated by the nominal elimination

rule). Therefore, we can use the inductive hypothesis and conclude that in

a finite number of steps we obtain a formula in which all the disjuncts that

are derived from *�� have a number of edges smaller than *� (and conse-

quently than *). The arguments are valid for all the disjuncts *� � )'�� *

derived from *.

Worst case complexity analysis

The complexity16 of the actual rolling up procedure is polynomial in the size of the

query formula. This can be seen by considering the graph corresponding to each dis-

junct; the rolling up eliminates the leaves of the graph (after a linear normalisation

by the first five rules). Even if new assertions are added to the knowledge base, these

are linearly bounded by the sum of the number of edges of each graph representing a

disjunct in the query formula.

Unfortunately, the nominal introduction rules multiply the number of disjuncts

(graphs) in the query. As the reader can guess from the last section, the number of

16In this section, we use the word complexity intending the worst case complexity.

newly introduced disjuncts can be exponential in the size of the initial problem.17 The

worst case is related to the cyclical nominal introduction rule, because the other rule

introduce a lesser number of elements (the equalities are not introduced). Therefore,

the worst case complexity can be calculated by analysing the cyclical nominal intro-

duction rule.

The proof for termination (see Proposition 7.10) provides the hints for the analysis

of the complexity. The point of the proof to look at is where the inductive arguments

are used for the cases in which the cyclical nominal introduction rule can possibly be

applied recursively. At each application of this rule one of the variables involved in a

cycle is “eliminated” from the derived new disjuncts.18 Therefore, if at the next step

the cyclical nominal introduction rule is applied again, the number of equalities to be

introduced will be smaller than the one at the previous step.

The complexity is of the order of the number of graphs that can be generated by

recursive application of the nominal introduction rules. This number can be calculated

by considering the tree whose root is a disjunct and the nodes the new elements gener-

ated by successive applications of the rule from the root element. The number of all the

generated graphs is equal to the number of leaves in this tree; so we need to estimate

the maximum number of successors for each node.

By looking at the rule we can estimate the number of successors of each node by

considering the sum of the number of individual names in the knowledge base (we

denote this number by " ), with the number of equalities that can be generated by

the number of variables in the cycle (we can consider this number as the total number

of variables). Although the number of variables can be tightened to a more precise

lower value, we assume that it is equal to the size of the single disjunct (we denote this

number as �). The number of equalities generated from � variables can be calculated

by considering a square matrix of size � where each row (column) is marked by a

variable. Each element of this matrix denotes an equality between the correspondent

two variables. We do not need the whole matrix because we are not interested in the

principal diagonal (terms like � ), and we need only half of the matrix (there is not

point in having both � � and � � ); therefore the number of equalities is ��

The maximum number of successor of the root are " � ��

, at the next level

we have �" � ��

��" � ��

�, and so on until the number of variables is

17Unfortunately it is not only on the size of the formula because of the identification of the variableswith the individual names in the knowledge base.

18Or it is identified with an individual, which is the same from the perspective of the cyclical nominalintroduction rule.

equal to one:

�" ��' ��

��" �

��' ��' ��

�� " �

��' ��

��" � ��

� times.

Therefore, the number of disjuncts is bounded by the formula �" � ��. The size of

each disjunct is bounded by the size of the original formula, and in addition the rolling

up procedure can add to the knowledge base as many new assertions as the number of

inverse rolling up rule applications. This does not alter the order, which is still-�. ��

w.r.t. the size of the query problem.

At the end of the collapsing procedure the algorithm uses the knowledge base sa-

tisfiability procedure whose complexity is EXPTIME (see Chapter 4). Note that we

already estimated that the size of the input formula is of the order of -�.��, therefore

the worst case complexity for the whole problem is double exponential time (2EXP-

TIME).

This is more than intractable, the question at this point is—can we do better than

that? The answer can only be found by the reduction of a different problem known to be

2EXPTIME–hard to our query answering problem. However, there is good evidence

that this is the intrinsic complexity of the problem, since it is the same order as the

results presented in Calvanese et al. [1998a] for a similar problem. Although the DL

presented in that paper is more expressive than the one we are presenting here, we

already know that the complexity of the satisfiability problem is the same for both the

logics. The reader may object that in the same work they do not show hardness results,

however the nondeterministic nature of the problem suggests that the addition of an

exponential step to the basic satisfiability problem cannot be avoided.

7.4.6 Correctness and completeness

The correctness and completeness of a rule are defined in the usual way. A rule is cor-

rect if a positive answer to the query problem is still positive for the new query problem

generated by the rule itself; i.e. if� �� *� *�� *�� then�� *�� *�� *�� .

A rule is complete if negative answers are preserved; i.e. if �� *�� *�� *��

then � �� *� *�� *�� ).

For rules that do not modify the KB on the left hand side (� � ��) the pattern of

the proof is “simple” because the set of interpretations satisfying the KB before and

after the rule application does not change. On the other hand, when a rule modifies the

KB the set of satisfying interpretations is reduced because new constraints are added to

the KB. In this case we must pay attention to the introduction of new concept names;

in particular regarding the representative concept. In fact, a representative concept

is an effective way of representing a single element only if the reasoning mechanism

is unable to distinguish the case where the interpretation of a representative concept

contains more than one element from the case in which it contains a single element.

Therefore we want to show a completeness result for the class of interpretations in

which the representative concepts are restricted to contain a single element (this is

analogous to the completeness of a class of models as seen in Section 7.2).

We call an interpretation nominal representative (n.r.) if the interpretation func-

tion maps each representative concept into the set containing only the interpretation

of the corresponding individual name (i.e. ��

�!��

). Using the definition of n.r.

interpretations we introduce the notion of nominal safe queries, as those completely

characterised by n.r. interpretations. A query answering problem � �� $ is nominal

safe when � �� $ w.r.t. n.r. interpretations implies that � �� $ (note that the converse

is trivially true).

We have to show that the manipulations of the query answering problem done by

the rules keep the result nominal safe. Otherwise, the use of representative concepts

will can produce wrong answers. The initial query answering problem is trivially nom-

inal safe because no representative concepts appear neither in the knowledge base nor

in query itself.19

Some of the proofs we provide rely on the fact that the query problem remains

nominal safe after the application of any rule; this is an additional requirement to the

completeness and correctness. Let � �� $ be the query answering problem matching

a rule, and �� $� the problem resulting from the application of the rule itself. We

want to show that:

1. the rule is complete: �� $� implies � �� $;

2. the rule is correct: � �� $ implies �� $�;

3. the resulting problem �� $� is nominal safe: �� $� w.r.t. n.r. interpretations

implies �� $�.

19An interpretation satisfying the KB can just be extend by adding the representative concepts, andmapping them to the set containing the element of the domain corresponding to the appropriate individ-ual (i.e. starting from the interpretation , we make a new � from by adding � �

��

for everyindividual name �).

If we start from the assumption that � �� $ is nominal safe, we can simplify the proofs

by showing that the two conditions

�� $� w.r.t. n.r. interpretations implies

� �� $ w.r.t. n.r. interpretations, and(7.3a)

� �� $ implies �� $� (7.3b)

hold for every rule. Note that the second condition is just the correctness as defined

above; the real difference is the first restricted version of completeness. Firstly, we

show that the two conditions are sufficient when the problem � �� $ is nominal safe.

1. Completeness: if �� $� then �� $� w.r.t. n.r. interpretations. We can

then use the Condition (7.3a) to conclude that � �� $ w.r.t. n.r. interpretations.

In addition � �� $ is nominal safe by hypothesis, therefore � �� $ w.r.t. n.r.

interpretations implies that � �� $.

2. Correctness: it is exactly the Condition (7.3b).

3. Safeness: if �� $� w.r.t. n.r. interpretations, then � �� $ w.r.t. n.r. interpreta-

tions by Condition (7.3a). In addition, � �� $ w.r.t. n.r. interpretations implies

that � �� $ because the problem is nominal safe by assumption. Therefore, we

can use Condition (7.3b) to conclude that �� $�.

In the following propositions we assume that the query problem before the applica-

tion of the rules is nominal safe, and by completeness we intend the Condition (7.3a)

instead of the classical definition. The proofs are organised by rules, first the rules

which do not modify the knowledge base, and then the ones which do modify it.

The pattern of all the proofs is similar, so we are going to introduce it to help the

reader in following the proofs. The rules transform the problem � �� *� *�� *��

into the new problem �� *�� *�� *�� *��, where the element * is substi-

tuted by one or more elements *�� *��. Let us consider completeness (correctness

is similar), and a rule that substitute the element with a single new element.

We have to show that assuming �� *�� *�� *��, it is the case that � ��

�*� *�� *��. We choose an arbitrary n.r. interpretation � satisfying � and we

show that it models the formula �*� *�� *�� (see Definition 7.3). If necessary

for a rule modifying the kb, we first extend � to a new interpretation � � satisfying ��.

In general, this new interpretation is equal to the original one with mappings for the

newly introduced concept names; we can then easily go back to � by discarding the

appropriate mapping. Since � � satisfies ��, then � � �� *�� *�� *�� by hypothesis;

so there is a mapping ( such that � � �� *� or � � �� *�� *��. In the latter case,

we have that � � �� *� *�� *�� as well, and in general it is easy to show that

if � � �� *�� *��, then � �� *�� *��. This is because either the new

names in � � do not appear in any of the formulae *�� *�, or they are already in

� (representative concept) and � � � �. Therefore in the actual proof we show that if

� � �� *� then there is a mapping ( � such that � �� *.

Rules that do not modify the KB

Proposition 7.11. The equality elimination, conjunction elimination, role collapsing,

role elimination, and shortcut elimination rules are complete and correct.

Proof. Correctness and completeness of these rules are trivially verified by the fact

that for every interpretation � and mapping (, � �� * iff � �� *�; therefore � ��

�*� *�� *�� iff � �� *�� *�� *��.

Let us consider the shortcut elimination rule as an example. Since the two query

formulae differ only for the disjuncts * and *�, we just need to show that � �� * iff

� �� *� for an arbitrary mapping (.

If � �� * then � �� , therefore

� �� ( ��

as well. The rest of the terms in *� are the same of those in *, therefore � �� *� as

Let us assume that � �� *�, we need to show that � �� .

We consider the case in which � � �, the other is not really different. Since � �� *�,

then � �� ( ��; therefore we must verify that � ��

� �� . But this is true because

�� *�

therefore in every set of roles ��, there is at least a role �� such that �� & �, which

means that � �� for � � �� ' �. For the transitivity restriction on �

� �� , therefore � �� because � & �.

Proposition 7.12. The contradiction elimination rule is correct and complete.

Proof. If � � !�� !�� *, and !� is different from !�, then there is no interpreta-

tion satisfying *. The very same applies to *� " ��. If � �� *� *�� *�� , then

an arbitrary interpretation satisfying � must model �*�� *�� (without the dis-

junct *); therefore it models �*�� *�� *�� . The other direction is analogous.

Proposition 7.13. The simple rolling up rule is correct and complete.

Proof. Let � �� *� *�� *�� be the query before the application of the simple

rolling up rule, such that * satisfies the rule conditions, and � �� * �� *�� *�� is

the resulting query.

completeness Let � be a n.r. interpretation satisfying �. We assume that � ��

�*�� *�� *��, so there is a mapping ( such that � �� *�� *�� *��. If

� �� *� for � � �� then � �� *� *�� *�� as well; so, we are going to

show that if � �� *�, then there is a mapping ( � s.t. � �� *.

If � �� *� then � �� , therefore there is an element � � �� s.t.

�(� �� and � � �� . The mapping (� � (�� in which � is mapped to

� satisfies the required property that � �� and � �� . The rest of terms

in * are satisfied because � does not appear in any other term, therefore � �� *.

Correctness Let � be an interpretation satisfying �; for reasons analogous to those

in the completeness proof, we are going to show that if � �� * then � �� *�.

If � �� * then � �� and � �� , it is easy to see that this implies that

� �� (the element (�� is the “witness” for the property). For the remaining

terms in *�, since they are in * as well they are modeled by � w.r.t. (; therefore

� �� *�.

Proposition 7.14. If the role hierarchy is such that for every unrelated pair of roles20 in

the same label there is a functional role that includes both of them, then the functional

rolling up and functional inverse rolling up rules are correct and complete.

Proof. The proof for these two rules are very similar to the previous one for the simple

rolling up rule. In particular, the correctness part is almost identical and is left to the

intuition of the reader. So we are going to concentrate on the completeness.

Let � �� *� *�� *�� be the query before the application of the functional

rolling up rule, such that * satisfies the rule conditions, and � �� * �� *�� *�� is

20They are not included one in each other.

completeness Again, the structure is the same as the previous completeness for the

simple rolling up rule. The crucial point here is that although we “split” the role con-

junction into several existential assertions, the interpretations are such that all the suc-

cessors must be the very same element.

The point we need to show is that, in any interpretation satisfying �, if an element

is related to two elements via two different role names being in the very same label

and not being included one in the other, then the two related elements must coincide.

Formally, let � be an interpretation satisfying �. Let � be an element of �� , and �,

� two role names in the same label s.t. neither � & � nor � & �. Then, if

�� and ��

� , �� .

Note that this is not true if � & �, because this implies that ��

� but

not the converse. The intuition behind the proof is that if the two roles are not related,

then there must be an interaction with a functional role that forces the collapsing of

successors of � (i.e. there is a functional role � s.t. � & � and � & � ).

If � and � belongs to the very same label, then either � � � or � � � (by

Definition 3.4b); let us assume that � � �. We prove the claim by induction on the

definition of � (Definition 3.4a).

The two basic cases � & � and � & � are ruled out by assumption, so there

must be a functional role � such that � & � and � & � . This means that both

�� and �� are in � � , and �� because � satisfies the functional restriction

on � .

The property we have just shown applies to the functional (inverse) rolling up rule

because we assumed that the role elimination rule is not applicable. Therefore, if the

set of role names �� is included in a label and there is more than one role

in it, they must be unrelated.

Proposition 7.15. The simple nominal introduction rule is correct and complete.

Proof. We assume that the query before the application rule is � �� *� *�� *��,

* satisfies the conditions of simple nominal introduction rule, and none of the rules

with a higher priority can be applied to *.

completeness Let � be a canonical (w.r.t. �) quasi transitive shrub interpretation

satisfying �, which is n.r. as well. We assume that � �� * � �� !�� * �

�� !�� *�� *��, so there is a mapping( such that � �� *�� !�� *�

�� !�� *�� *��. By definition there is one of the disjuncts such that � �� +

for each term + contained in it. Moreover, each of the disjuncts in �*�� !�� *�

�� !�� *�� *�� includes at least one of the disjuncts in �*� *�� *�� ;

therefore � �� *� *�� *�� as well.

Correctness Let � be a canonical (w.r.t. �) quasi transitive shrub interpretation sat-

isfying �. We assume that � �� *� *�� *�� , so there is a mapping ( such that

� �� *� *�� *�� . We proceed by contradiction by assuming that �!��* �

�� !�� * � �� !�� *�� *��.

If � �� *� for some � � �� , then

��* � �� !�� * � �� !�� *�� *��

as well; so we must assume only that � �� *. If (�� , there is a contradiction

with the assumption that none of the disjuncts *� �� !�� is satisfied. If (�� !� � ,

and since we assumed than � � �, then �(� �� (��

� � � � � � �� .

Therefore, by (3.6b) there is a label � s.t. �� , which is in contradiction

with the conditions for the applicability of the rule itself.

Given the fact that we exhausted all the possibilities, finding only contradictions,

we must reject the given assumption that

�!��* � �� !�� * � �� !�� *�� *��

Therefore we conclude that

� �� * � �� !�� * � �� !�� *�� *��

Proposition 7.16. If the roles involved in the cycle are not transitive and do not have

any transitive sub-role, then the cyclic nominal introduction rule is correct and com-

plete.

Proof. We assume that the query before the rule application is � �� *� *�� *��,

* satisfies the conditions of cyclic nominal introduction rule, and none of the rules with

an higher priority can be applied to *. The query after the application of the rule is

� �� * � �� !�� * � �� !��

)'�� *� � � � � )'�� *� *�� *�� (,)

completeness We assume that (,) is satisfied. Let � be a canonical (w.r.t. �) quasi

transitive shrub interpretation satisfying �, which is n.r. � models the query formula in

(,) by hypotheses; therefore there is a mapping( such that � models the query formula

w.r.t. (. By definition there should be one of the disjuncts such that � is a model for

it w.r.t. (. However, all the disjuncts in (,) are supersets of at least one disjunct in

�*� *�� *��. Therefore � models �*� *�� *�� w.r.t. ( as well.

Correctness We must show that if � �� *� *�� *��, then (,) is satisfied. Let

� be a canonical (w.r.t. �) quasi transitive shrub interpretation satisfying �. By hy-

pothesis � �� *� *�� *��, therefore there is a mapping ( such that � ��

�*� *�� *��. We proceed by contradiction assuming that � does not model the

query formula in (,) w.r.t. (.

By definition either � �� *� or � �� *�� for � � �� (since no *� contains

an existential quantifier, we can assume that ( is a mapping for all the variables). If it

is the latter case, then � models the query formula in (,) w.r.t. ( as well, leading to a

contradiction; therefore we must assume that � �� *�.

Note that, since � is not a model for (,), then the mapping ( must not satisfy

any term + in �� ! � ! � � or in�

� )'��, otherwise � would be a model for the

corresponding disjunct * � �+� in (,). Therefore

(�� !� �;

(� �� !� (� �� for any pair of variables �� occurring in a cycle with �.

Let us consider a cycle *� � * involving the variable � in *. Obviously � �� *��;

we are going to show that this is in contradiction with the assumptions we made.

Before analysing the cases we show a property that will be used in the rest of the

proof. Given the assumptions, it is the case that for each term � ��

in *� either �(� �� (� �� or (� �� (� ��) for some element ) (see

Definition 3.2). If (� �� and �(� �� (� �� then (� �� as

well by (3.2c). Let us assume that (� �� !� � , then (� �� ) for some el-

ement ) of � and � of �� (See Definition 3.2). By (3.2d) either � � (� �� or

��(� �� (� �� . The first case is exactly the result we want to prove;

therefore we assume that ��(� �� (� �� , and we show that this is a

contradiction. Since � is canonical, then we can use the property (3.6a) to infer that

there is a transitive role � & �. This is in contradiction with the fact that we assumed

that none of the role names involved in any cycle in * is transitive or has a transitive

subrole.

We consider three cases according to the length � of the cycle (cardinality of *�).

� � � If the cycle contains a single term then it must be like �� . This

means that �(�� (�� , which is impossible by Lemma 3.3 because we

assumed that (�� !� �

� � � If the cycle contains two terms, then they must be of the form

��

This is because if there is not any variable different from �, then there is a smaller

cycle. On the other hand, with more than two different variables and two role

terms, having a cycle is impossible. Moreover, we assumed that no other rules

can be applied to *, therefore there cannot be any equality like � � in *;

nor can the variables and � appear in the same order in the two role terms in

*� (i.e. like �� , and �� ), otherwise the role

collapsing rule would be applicable. Since (�� !� � we already proved that

(�� (� �) and (� � � (��)�, which is a contradiction.

� � � Let � and � be the two “neighbour” variables of � (i.e. there is a role term

connecting � and � and a different term connecting � and �). The two variables

must be different, otherwise there will be a cycle of length �. We can distinguish

four cases according to the ordering of the variables in the terms; let us consider

each one of them.

– Let us assume that the two terms are

�� and ��

We showed that (� �� (��)�� and (� �� (��)�� for some )�� , )��;

therefore (� �� is not in � . This applies to the next variable connected

to �: either (� �� (� �)�� or (� � � (� ��)�. The first case is ruled

out because (� � cannot be equal to (�� (we assumed this at the begin-

ning); therefore (� � � (� ��)�. The same arguments can be carried on

for all the subsequent variables in the cycle, and this is clearly in contra-

diction with the fact that, for the last variable �, it must be the case that

(� �� (��)�� .

�� and � ��

Therefore (� �� (��)�� and (�� (� ��)� for some )�� , )�. Using

the same argument of the previous case we can show that every variable in

the cycle has the property of being in the form (� � � (��)�� )�)�. This

is in contradiction with the fact that (�� (� ��)�.

� �� and ��

Therefore (� �� (��)�� and (�� (� ��)� for some )�� , )�. Using

the arguments of the previous case starting from � instead of � we get a

similar contradiction.

� �� and � ��

Therefore (�� (� ��)� and (�� (� ��)� for some )�. This is in

contradiction with the fact that (� �� and (� �� must be different.

At this point we exhausted all the possibilities, finding only contradictions. There-

fore we must reject the original assumption that � does not model (,) w.r.t. (. Given

the arbitrariness of the choice of interpretation � we conclude that (,) is satisfied.

Rules that modify the KB

Proposition 7.17. The nominal elimination rule is correct and complete.

Proof. Let � �� *� *�� *�� be the query before the application of the nominal

elimination rule, such that * satisfies the rule conditions, and � � �� *�� *�� *��

is the resulting query.

completeness We assume that �� *�� *�� *�� w.r.t. n.r. interpretations.

Let � be a n.r. canonical (w.r.t. �) quasi transitive shrub interpretation satisfying �.

We have to show that � �� *� *�� *��. We can distinguish the case in which

�!�� from the case in which it is not.

If �!�� then �� ; therefore there is a mapping ( such that � ��

�*�� *�� *�� by the assumption that �� *�� *�� *��. As seen in

the previous proofs we concentrate in the case in which � �� *�; we just need

to show that � �� !. Since we assumed that � is n.r., then ��

�!��

because �!�� . Therefore (�� !� because � �� , which proves

that � �� !.

If �!�� !� � then the concept name �� does not appear in �, because the

assertion !�� is added to the left hand side whenever the representative concept

�� is introduced in the query formula. We define a new interpretation � � by

adding (or redefining) the mapping ��

�!��

to �.21 � � �� because � does

not contain any reference to ��; so � � �� !��. By assumption there is a

mapping ( such that � � �� *�� *�� *��. Let us consider the case in which

� � �� *�: as in the previous case (�� !��

� !� . In addition we can use the

very same mapping ( with �, therefore � �� !; this shows that � �� *.

Correctness We assume that � �� *� *�� *��. Let � be a canonical (w.r.t.

��) quasi transitive shrub interpretation satisfying ��. We have to show that � ��

�*�� *�� *��. Again we distinguish the two cases according to whether !�� is

already in �. However, in both cases � �� because � � ��; therefore there is a

mapping ( such that � �� *� *�� *��. We just need to show that if � �� *,

then � �� *�, by showing that � �� . However, this is trivially verified because

� �� ! (by the assumption that � �� *), and !� � �� because � �� .

Proposition 7.18. The simple inverse rolling up rule is correct and complete.

Proof. Let � �� *� *�� *�� be the query before the application of the simple in-

verse rolling up rule, such that * satisfies the rule conditions, and � � �� *�� *�� *��

21The domain of the interpretation function �� contains � by definition; therefore the value �� isdefined.

completeness We assume that �� *�� *�� *�� w.r.t. n.r. interpretations. Let

� be a n.r. canonical (w.r.t. �) quasi transitive shrub interpretation satisfying �. We

build a new n.r. interpretation � � equal to � with the addition/substitution of the inter-

pretation for the new concept name �� such that

��

�� and � � ��

��

Note that since �� does not appear either in � or in �*� *�� *��, � � still sat-

isfies �, and if � � �� *� *�� *�� then � �� *� *�� *��. It is easy to see

that � � satisfies � �� ; i.e. �� and � � ��

�(note

that �� because �� does not appear in �). Therefore � � satisfies ��, and by

assumption there is a mapping ( such that � � �� *�� *�� *��. We just need to

show that if � � �� *�, then there is a mapping ( � s.t. � � �� *.

Let us assume that � � �� *�, then (� � � ��; therefore there is an element

� of �� such that �� (� �� and � � �� . We define (� as (�� ; note that

still � � �� *� because � does not appear in *� by hypothesis. By construction � � ��

�� and � � �� ; therefore � � �� *.

��) quasi transitive shrub interpretation satisfying ��. Obviously � satisfies � as

well because it is a subset of ��; therefore there is a mapping ( such that � ��

�*� *�� *��. We need to show that if � �� *, then � �� *�.

Let us assume that � �� *, then (�� ; therefore (�� because

� satisfies ��. In addition, �(�� (� �� because � �� *; therefore (� � �

�� , which means that � �� *

Proposition 7.19. The nominal rolling up rule is correct and complete.

rolling up rule, such that * satisfies the rule conditions, and � � �� *�� *�� *�� is

completeness We assume that �� *�� *�� *�� w.r.t. n.r. interpretations. Let

� be a n.r. canonical (w.r.t. �) quasi transitive shrub interpretation satisfying �. We

build a new interpretation � � by adding (or redefining) the mapping ��

�!��

�. The interpretation � � is still n.r. In addition, if !�� is already in �, then � � � �

(see the proof of Proposition 7.17). If on the other hand !�� is not in �, it is easy to

see that if � � �� *� *�� *�� then � �� *� *�� *�� as well because �� does

not appear either in � or in �*� *�� *��. The interpretation � � trivially satisfies

��; therefore there is a mapping ( such that � � �� *�� *�� *��. We just need to

show that if � � �� *�, then there is a mapping ( � s.t. � �� *.

Let us assume that � � �� *�. Since � � �� ,

then (� � � �� ; therefore there is an element � of �� such that

�(� ��

� , � � �� (because either � does not include �� or

��

�) and � � �� . Note that ��

�� !��

, therefore � � !� � !��

very same arguments, applied to (� � � �� for � � �� , allow us to con-

clude that �(� �� . We define (� as (�� . By construction � �� !,

� �� , and � �� ; therefore � � �� *.

��) quasi transitive shrub interpretation satisfying ��. Obviously � satisfies � as

well because it is a subset of ��; therefore there is a mapping ( such that � ��

�*� *�� *��. We need to show that if � �� * then � �� *�.

Let us assume that � �� *, therefore there is an element (�� of �� such that

(�� !� (because � �� !), (�� (because � �� ), and �(� �� (��

��

� (because � �� ). In addition, (�� because

� satisfies �� (!�� is in ��). Therefore (��

�� and � �� *

Proposition 7.20. The cycle breaking and simple cycle breaking rules are correct and

complete.

Proof. Let � �� *� *�� *�� be the query before the application of the rule, such

that * satisfies the rule conditions, and �� *�� *�� *�� is the resulting query

after the application of one of the two rules.

completeness We proceed as in the previous Proposition 7.19. Let � be a n.r. canoni-

cal (w.r.t. �) quasi transitive shrub interpretation satisfying �, and � � defined as � with

the mapping ��

�!��

. As in the previous proposition, � � is still n.r. and it satisfies

��. Therefore by hypothesis there is a mapping ( such that � � �� *�� *�� *��.

As before we assume that � � �� *�. We are going to show that � �� * as well. Note

that in the application of both the rules the variables and � are still in * � (in the cycle

breaking rule either because of the path or because " �); therefore both (� � and

(�� are defined.

Let us consider the simple cycle breaking rule; since � � �� *�, then �(� �� (��

��

� and there are � ' � elements )�� )� of �� such that )� � ��

and �(� �� )��

� . Since � � is n.r., then )� � � � � � )� � (��, so

�(� �� (��

�; therefore � �� . The cycle

breaking rule is the same.

Correctness The proof is exactly the same as for proposition 7.19 for the nominal

rolling up rule.

Proposition 7.21. The nominal inverse rolling up rule is correct and complete.

inverse rolling up rule, such that * satisfies the rule conditions, and � � �� *�� *�� *��

completeness We assume that �� *�� *�� *�� w.r.t. n.r. interpretations.

Let � be a n.r. canonical (w.r.t. �) quasi transitive shrub interpretation satisfying �.

Again, we proceed as in Proposition 7.19 and Proposition 7.18 by considering � � as

equal to � with the mapping ��

�!��

, and for each � � ��

� � �� !� � ��

if !� � �� or �� otherwise. This interpreta-

tion is still n.r.; in addition, it satisfies � because all the �� do not appear in �,

and either �� appears in � and ��

� because � is n.r., or �� does not ap-

pear in �. The interpretation � � satisfies the assertion !�� by definition; now we

are going to show that it satisfies ��

�� as

well. Note that �� because either �� does not syntactically appear in � or

��

� ; therefore either ��

�!�

��

or �� . In the first

case, let us consider the role name � for some � � �� . If ) � �� is such that

�!��

� )� � �� then ) �

�� !��

��

; therefore ) � ��

�� by defini-

tion, and !��

� ��

�� because of the arbitrariness of ). Given the fact that this

is true for all � � �� then ��

��

�� . In the

case that �� the inclusion constraint is trivially satisfied. This shows that

� � satisfies ��. By assumption there is a mapping ( such that � � �� *�� *�� *��.

We need to show that if we assume � � �� *�, then there is a mapping ( � s.t. � �� *.

Since we assumed that � � �� *�, then (� � � ��

��

�� . Therefore

�!�� (� ��

� for all � � �� and !� � �� because ��

is not empty. The new mapping ( � � (��!� is clearly such that � �� * because �

does not appear in other terms in * by hypothesis.

Correctness We assume that � �� *� *�� *��. Let � be a canonical (w.r.t. ��)

quasi transitive shrub interpretation satisfying ��. Obviously � satisfies � as well be-

cause is a subset of ��; therefore there is a mapping( such that � �� *� *�� *��.

We need to show that if we assume that � �� * then � �� *�.

Since � �� *, then (�� !� , (�� , and �(�� (� �� for all

� � �� . In addition, (��

�� because � satisfies

the inclusion ��

��, and (�� .

Therefore for all � � �� we have the combination �(�� (� �� and (��

��

� , which implies that (� � � ��

� . This allows us to conclude that

� �� *�.

Proposition 7.22. The inverse cycle breaking rule is correct and complete.

Proof. Let � �� *� *�� *�� be the query before the application of the inverse cy-

cle breaking rule, such that * satisfies the rule conditions, and �� *�� *�� *��

completeness Let us assume that �� *�� *�� *�� w.r.t. n.r. interpretations,

and let � be a n.r. canonical (w.r.t. �) quasi transitive shrub interpretation satisfying

�. We proceed as in proposition 7.21 by considering � � as equal to � with the map-

ping ��

�!��

, and for each � � ��

�� !��

��

Again, interpretation � � is still n.r. and satisfies �. Using the same arguments as for

Proposition 7.21 we can show that � � satisfies �� as well. By assumption there is a

mapping mapping ( such that � � �� *�� *�� *��. We are going to show that if

we assume � � �� *�, then � �� * (note that both the variables and � maching the

precondition are still in *�).

Since we assumed that � � �� *�, then (� � � ��

�� , so

�!�� (� ��

� for all � � �� . In addition, (�� !� because

� � ! is still in *�; therefore � �� (i.e. � �� * because

* � *� � �� ).

Correctness We assume that � �� *� *�� *��. Let � be a canonical (w.r.t. ��)

quasi transitive shrub interpretation satisfying ��. Obviously � satisfies � as well be-

cause is a subset of ��; therefore there is a mapping( such that � �� *� *�� *��.

We need to show that if we assume that � �� *, then � �� *�.

Since � �� * then (�� !� , and �(�� (� �� for all � � �� . In ad-

dition, since (�� !� � �� we derive that (��

�� for all � � �� ,

7.5. SPEEDING UP THE ANSWER 168

because � satisfies the inclusion ��

��. This means

that (� � � ��

� for all � � �� ; therefore � ��

�� (i.e.

� �� *�).

7.5 Speeding up the answer

The major problem for the performances of the query algorithm lies in the nondeter-

ministic substitution of the variables with the individual names. Note this is true for

both the transformation of a “retrieval” query into a boolean query (see Section 7.1.2),

and the actual boolean query answering (see Section 7.4). Therefore, reducing the

number variable substitutions to perform, and the number of individuals which take

part in each substitution is the way to go for a better performing algorithm.

The variables which need to be substituted by individual names cannot be reduced

in the phase of transformation of a query into a boolean query, but there are a lot of

possibilities in the application of the transformation rules.

Example 7.6

Let us consider a cyclical query like

��

This query contains two cycles: one involving the variables , �, and �, and a second

constituted by the loop in (i.e. � � ��). The cyclic nominal introduction rule can

be applied to any of the tree variables. However, if it is applied to , both the cycles

are “eliminated” at once; while applying the rule to either � or � leaves the loop in

in each one of the newly generated disjuncts.

Note that, in this example, choosing reduces the size of the transformed query of

a logarithmic factor.

For reducing the number of candidate individuals for a variable name, we are con-

sidering two different techniques. The first one relies on the concept terms contained

in the query, while the second on the structure of role terms.

They both rely on the observation that if a subset of the terms in a disjunct are never

satisfied, then neither the disjunct will be satisfied. Therefore, if in a query contains a

concept term like ��, and the variable must be substituted by an individual name, we

can just consider the subset of individual names which can be instance of � (verifiable

by instance checking tests like � �� , see Section 2.2). Note that for variable

7.5. SPEEDING UP THE ANSWER 169

name different from the joining variable � (see Section 7.3.1) we can even consider the

more restricted set of individual names which are instance of �.

We can make an analogous consideration for role terms, with the difference that the

kind of role assertions in a ��f KB are not very expressive. In particular, they do not

allow to express incomplete information as the concept assertions do. Using concept

assertions we can states something like �� which leave a degree of uncertainty

about the properties of �.22 This is impossible to do for role assertions, which are

usually limited to simple statements like �� . This limited expressivity for roles is

shared by most of the DLs studied and/or implemented.

This limitation can be used in order to restrict the number of candidate individu-

als. Let us consider for example a term like � � ��, where the variable � must be

substituted by an individual name. In DL like ��f we can restrict to names which

appear explicitly in assertions like �� as second element of the pair.23 Note that

this argument is no longer valid for DLs providing the inverse role constructor; on the

other hand we are confident that effective optimisations can be devised in most of the

cases.

We did not investigate these ideas yet, and all these optimisations must be carefully

verified in order to avoid incomplete or incorrect answers. This will be subject of our

future work.

22Note that the incomplete information may be stated in more subtle ways and/or implied by theterminology.

23Note that the role hierarchy must be taken into account as well, therefore considering role assertionswith sub–roles of � as well.

Chapter 8

Implementation and testing

The main goal of our DL system implementation is not the delivery of a complete and

robust system, but the development of a prototype to be used as test bed for verifying

the validity of our approach and for suggesting new directions to investigate.

The first objective was the analysis of the overall performances of the system com-

pared to other state of the art DL reasoners. This point is important because we know

that the naive implementation of the tableaux–like method for DLs leads to unaccept-

able performance (see Horrocks [1997]). Modern DL systems, on the other hand,

adopt a series of optimization techniques which provide exceptional results (see Hor-

rocks and Patel-Schneider [1999], Haarslev and Moller [1999, 2000c]). Unsurpris-

ingly, these techniques provide better results if the algorithm has control over the whole

reasoning process. In our case, on the other hand, we have a sharp separation between

the precompletion phase (dealing with Abox assertions) and the invocation of the ter-

minological reasoner. The other goal of the experimentation was to understand the

impact of different optimisation techniques applied at the Abox precompletion level.

The focus of this chapter is not to describe the actual software artifact we have

developed, but rather on the underlying ideas and techniques used.

8.1 Description of the system

The DL system has been written in Common Lisp because it is an excellent language

for symbolic manipulation, and it enables the programmer to easily extend and modify

the code to experiment with different approaches.

Most available DL systems have been written using Lisp, mainly for the above

mentioned reasons, but also because we are dealing with highly intractable algorithms;

8.2. OPTIMISING THE ALGORITHM 171

therefore, twiddling with a low level programming language for gaining an order of

magnitude (which is not that easy against modern Lisp compilers) is pointless without

first investigating higher level optimisations, which can lead to gains of several order

of magnitude.

Interaction with the system is performed by means of Lisp function and macros

designed to be conform to the KRSS standard (see KRSS), as well as compatible with

the syntax used by the FaCT system (see Horrocks [1997]).

The terminological reasoner used is FaCT. Since it is written in Common Lisp as

well, its integration in the code as a library was very easy, and a natural choice since it

is the system used in our research group. However, in principle, any DL terminological

reasoner can be integrated as a library.

8.2 Optimising the algorithm

The experience with previous DL systems shows that the direct implementation of the

tableaux–based satisfiability algorithms provides very poor performance, unaccept-

able for any real application. Even if the experiments with other DL systems have

been mainly at the terminological level, we can try to extract some lessons from those

experiences (see Haarslev and Moller [1999], Horrocks and Patel-Schneider [1999]).

One of the advantages of the precompletion technique is that the precompletion

phase is completely separated from the terminological reasoning, so optimisation tech-

niques implemented at the two levels do not interact adversely. The terminological

reasoner we are using (i.e. FaCT) is already optimised, therefore we focus on the opti-

misation we can perform during the precompletion.

Among the answers we are looking for from the experiments is whether optimisa-

tions made at the precompletion level make a real difference, and which ones produce

the best results. Given the fact that during the precompletion no new individuals are

created, the source of complexity must be related to the nondeterminism in the �–rule.

8.2.1 Evaluation strategy

According to the formal description of the algorithm in Chapter 5, the order in which

the rules are selected is irrelevant to the correctness and completeness of the algo-

rithm.1 However, for specifying the actual algorithm we decided to consider all the

1Note that with ��f this is no longer true for terminological reasoning. The difference is the absenceof the �–rule in the generation of precompletions.

constraints associated to a given individual before moving to a different individual.

Note that there is always the possibility that constraints associated with different in-

dividuals cause the addition of new constraints for the individual which has been pre-

viously considered. In this case, eventually the individual in question will be selected

again.

Intuitively, this strategy may give good results when there is not a strong interde-

pendency between individuals (i.e. role assertions). On the other hand, it may cause an

unnecessary overhead if there are “easy” contradictions in individuals not yet consid-

8.2.2 Axiom absorption and lazy expansion

General axioms are one of the major sources of nondeterminism; in fact, in every

axiom � � is hidden the disjunctive formula � � �� applied to every individual

name. One the most effective ways of reducing the effects of axioms is the so called

absorption technique used in conjunction with lazy expansion of concept names (see

Horrocks and Tobies [2000]).

Roughly speaking, the idea behind this technique is to transform a general axiom

into the special form � � where � is a concept name. Then the axiom is treated as

a sort of definition for the name � and ignored until a concept constraint of the form

!�� is examined; at this point the “definition” � of � is added to the label of ! (i.e.

the new constraint !��). This basic idea can be extended to negated concept names as

well; i.e. having definitions of the form �� . However, their combination must

be used carefully to avoid incorrect results. We implemented the absorption algorithm

described in Horrocks and Tobies [2000].

8.2.3 Lexical normalisation

Concept expressions are normalised and encoded according to the transformation rules

described in Horrocks and Patel-Schneider [1999]. In the normal form, concept expres-

sions can be concept names, conjunctions of normal form concepts, universal quantifi-

cation constructors, and the negation of a normal form. Conjunctions are represented

as sets, so the order of the conjuncts does not affect the syntactic equivalence of dif-

ferent expressions; in addition, nested conjunctions are flattened (e.g. expressions like

�� are transformed into �� ).

Normal form expressions are uniquely associated with an identifier which is used

Concept expression Normal form� ��

Table 8.1: Lexical normalisation rules

in place of the complex expression itself. A table is used to relate the normalised

expressions to their respective identifiers. In this way, every time an expression is

encountered a second time during the parsing, the very same identifier is used to rep-

resent it. This mechanism enables the possibility of detecting trivial inconsistencies at

an early stage, without the necessity of expanding the concept expressions.

For example, let us consider the expression �� . The

parser is recursively invoked over its subexpressions, first the expression ��

is transformed into the negation of an universal quantification ��.

During the transformation each subexpression is associated to an identifier, and the

identifier table will contain the mappings �/� 3# � ��, and �/� 3# ��/�.2

During the parsing of the second top level conjunct �� the parsed subex-

pressions are recognised as stored in the table and substituted by their identifiers. So

the resulting normal form is ��/�� /�� which is immediately normalised as ��.

8.2.4 Backjumping

Inherent unsatisfiability concealed in sub-problems can lead to large amounts of un-

productive backtracking search known as thrashing.

Example 8.1

2There will also be identifiers for the other subexpressions, but these are sufficient to show our point.

Consider the set of constraints��

��

which may cause the exploration of �� alternative combinations of constraints on �

deriving from the concepts �� , while the true cause for the

failure is related to the individual �. For example, this will happen if the rule application

strategy forces the evaluation of all the constraints associated with an individual before

considering a different individual.

This problem is addressed by adapting a form of dependency directed backtracking

called backjumping, which has been used in solving constraint satisfiability problems

(see Baker [1995]). Backjumping works by labeling concept constraints with a depen-

dency set indicating the branching points on which they depend. A concept constraint

�� depends on a branching point if �� was added to the label by the �–rule gener-

ating the branching point or if �� was generated by a different rule and the concept

constraints involved in the rule depends on the branching point.3

For example, the constraints �� generate the new constraint ��

by means of the �–rule. The constraint �� inherits the very same dependency set as

the constraint ��.

When a clash is discovered, the dependency sets of the clashing concept constraints

can be used to identify the most recent branching point where exploring the other

branch might alleviate the cause of the clash. The algorithm can then jump back over

intervening branching points without exploring alternative branches.

In more detail, when a clash is detected the union of the dependency sets of the

clashing concepts is taken, and backtracking is performed. During backtracking, each

branching point encountered is checked against the dependency set to see if it is a

member. If it is not in the dependency set, then the other branch is ignored and back-

tracking continues. If the branching point is in the dependency set, and the other branch

has not been explored, then backtracking stops and the algorithm proceeds with the ex-

ploration of the second branch. If both branches have already been explored, then the

dependency sets from the two branches are unioned and backtracking continues.

Note that when a contradiction is discovered by the verification of a label, there

is no way for our algorithm of knowing the source of the contradiction; therefore, is

3Since new role assertions are never introduced, the dependency is only related to concept expres-sions.

returned the union of the dependency set of all constraints of the label. As we are

going to show in Section 8.2.6, we can modify the algorithm in order to make the

backjumping more effective.

With respect to the given example, the precompletion algorithm generates the first

precompletion ��

��

by choosing the first concept for each constraint containing the disjunction, and ap-

plying the �–rule to the constraints �� and �� . In this process, the

algorithm generates � different branching points, and every constraint ��

is labelled with a different branching point. The constraints associated with � have no

branching point associated with them (this indicate a dependency with the top level),

so when the terminological reasoner discovers the inconsistency of the label associated

with � there is no reason to explore the alternative options �� .

8.2.5 Abox partitioning

Analysing the precompletion algorithm, it is easy to realise that individuals can in-

fluence each other only by means of role assertions. In addition, these assertions are

“static”, in the sense that they do not change during the precompletion process. This

guarantees that unconnected parts of the Abox can be precompleted independently.

The whole KB is satisfiable iff each connected component is independently satisfiable.

This property makes little difference in case of KB satisfiability, since the only ad-

vantage is on the memory occupation.4 However, it can be a significant improvement

in case of the instance checking problem (i.e. verifying whether an individual is mem-

ber of a concept in every model of the KB, see 2.2). In fact, this problem is usually

reduced to KB (un)satisfiability: given a KB �� , an individual �, and a concept �,

�� iff the KB �� is unsatisfiable. Knowing that the initial

KB �� is satisfiable (this is usually the case) allows us to verify the satisfiability

of the KB by considering only the assertions about individuals “connected” with the

name in question (the individual �). The rest of the Abox can be ignored.

4Even though one can use the property for loading into the main memory only the part of the Aboxrelevant for the current precompletion (see for example Elhaik and Rousset [1998]).

8.2.6 Using the terminological reasoner

According to the described algorithms the terminological reasoner is used only when

a precompletion is generated. However, it can be used in different phases of the algo-

rithm in a more sophisticated way. The starting assumption is that the terminological

reasoner is more efficient than the precompletion phase; therefore it can be used not

only for verifying the consistency of a precompletion but also for guiding the algo-

rithm. In addition, is reasonable to expect that the Abox is much bigger than the Tbox.

Result caching As pointed out in Donini et al. [1996a], caching the satisfiability

results for the tested concept expressions is essential for maintaining the complexity of

the actual algorithm in the EXPTIME theoretical complexity. In addition, in our case it

can allow us to avoid the overhead of converting a concept expression from the internal

representation to a format suitable for the terminological reasoner.

As described in Section 8.2.3, the system keeps a table of all the normalised con-

cept expressions the system comes across during the precompletion. When the ter-

minological reasoner is invoked for checking the satisfiability of a concept expression

(the conjunction of one or more concepts in a label), the result is stored in the table

as well. If a concept expression, having a cached satisfiability value, must be checked

again, its cached value is returned without invoking the terminological reasoner again.

Early inconsistency detection The first alternative use of the terminological rea-

soner is as an early inconsistency detector. This is based on the observation that if the

constraints applied to an individual are inconsistent they will be inconsistent in any

generated precompletion. Therefore the label of an individual can be verified even

before an actual precompletion is generated. If an individual is found having an incon-

sistent label, the algorithm stops exploring the current search branch and backtracks to

the appropriate saved state of the precompletion process.

Taking this approach too far can be damaging if the terminological reasoner is

called too often. We decided to use the focusing/un-focusing of the precompletion

process on an individual as a trigger for the label verification. The label is verified when

all the constraints associated to an individual have been considered, before considering

a different individual. If during subsequent precompletion of the rest of the KB no new

constraints are added to an individual which as been already precompleted, there is no

reason to verify its label again.

Modal verification Backjumping is an essential technique for pruning the search

space, however it works properly only if we have precise knowledge of the constraints

which generate a clash. Our problem is that in case of a clash detected by the termi-

nological reasoner, we cannot get the information about the constraints responsible for

the clash. We could assume that any concept in the label can be the cause of the clash,

but this would make the backjumping technique much less effective.

For example, the label �� is inconsistent if the terminol-

ogy contains an axiom like � ��. However, the inconsistency can only be detected

by the terminological reasoner. Once the unsatisfiability of the label is discovered, the

system cannot tell which element of the label caused the inconsistency; in principle it

can even be among the set ��.

We can improve the algorithm observing that if during the precompletion a clash

has not been detected, the only possible cause for an inconsistency must be associated

with an “anonymous” element whose existence is enforced by an existential quantifi-

cation. This is because the tree–like model property of the logic guarantees that no

constraints can be “pushed back” from these “anonymous” elements, and contradic-

tions generated in the propositional part of the labels (e.g. like � and ��) are detected

during the precompletion process.

When there is the necessity of verifying the satisfiability of a label, the standard

algorithm builds a new concept by conjoining all the concepts in the label. With the

modal verification technique, the only concepts considered are the universal and exis-

tential quantifications (i.e. �� and ��). The algorithm selects the smallest subsets

of the label which can be independently verified without compromising the complete-

ness of the algorithm. In the previous example the terminological reasoner is used to

verify the two concepts �� and �� independently.

The basic idea for selecting these subsets is to consider each existential concept in a

different subset, and adding the universal restrictions which can apply to it (i.e. having

the same or a more general role name). However, this simple approach does not work

with ��f, because of the interaction between functional role and role hierachy. In

fact, two (or more) of those “anonymous” elements may be forced to co-refer by a

common functional super-role (see Section 3.1.1, and Example 7.5). For example,

a label containing the two concepts �� and �� , is not satisfiable if there is a

functional role � such that & � and � & � . This contradiction would not be

detected by considering the two existential concepts separately.

The solution for this problem is to check existential concepts whose role share com-

mon functional super–roles together. This must be extended to “chains” of roles like

& ��, � & ��, � & ��, and % & �� as well (see Example 7.5). Universal concepts

alone do not generate contradictions without the actual existence of a successor; i.e.

two concepts like �� and �� are not contradictory, unless they are combined

with an corresponding existential concept (or Abox assertion). Therefore they do not

need to be verified by themselves (they are considered by means of exitential concepts,

or of the corresponding precompletion rule).

Using this technique, in case of unsatisfiability we can narrow the set of involved

constraints. Therefore it can be extremely useful in conjunction with backjumping,

because it enables a more precise identification of the backtrack point responsible for

the clash.

Example 8.2

Let us consider the following variation of Example 8.1��

��

In this case backjumping alone would not provide any improvement because the gen-

erated precompletion ��

��

is associated with the single individual �. Therefore, the terminological reasoner is

invoked with the concept

��

and the algorithm is unable to detect which constraints are the cause of the unsatisfia-

bility.

By checking the modal part only, the terminological reasoner is invoked with the

concept �� . When the unsatisfiability is reported, the algorithm

knows that the clash is generated by one of the two constraints ��

and the backjumping is more effective.

Instance checking by subsumption The terminological reasoner can sometimes be

used to avoid precompletion in case of instance checking, by using an approximation

of the Most Specific Concept technique (see Era and Donini [1992]). The idea is to

build a concept expression which is guaranteed to contain the individual, and is as

specific as possible. The trivial way of doing it is by conjoining all the concepts in the

label of the individual, but the role constraints can be considered as well to narrow the

expression by using existential quantification.

For example, from the set of constraints �� we know that the in-

dividual �must be contained in the expression ��. This process can be carried

on for different levels of role assertions (e.g. � can be related to a third individual � or

even to � itself); we decided to build the MSC by stopping when a cycle is detected.

Now, let us assume that we calculated the MSC associated to the element � and

we call it 0��. If we are interested in verifying whether the individual � is in the

extension of the concept � in every interpretation, we can first check if 0�� is

subsumed by � or by �� before performing any precompletion. In fact, if 0�� is

subsumed by � then � is an instance of �, while if it is subsumed by �� we know for

sure that � cannot be an instance of � in any interpretation satisfying the KB.

8.2.7 Other techniques

Query caching If an instance checking like �� is verified, the assertion

�� is a logical consequence of the KB; therefore the new KB �� is

equivalent to the original one. In addition, the proof for the original instance checking

problem can be rather involved.

The idea behind the instance checking is to add the results of positive instance

checking problems as new assertions in the KB. This can be useful in cases in which

the same query is issued again, or when a contradiction can be detected earlier because

of the new assertions.

Semantic branching Semantic branching is a technique for exploring disjoint bran-

ches of the search space, when a nondeterministic rule must be applied (see Horrocks

and Patel-Schneider [1999]). When a constraint containing a disjunction constructor

(i.e. �� ) must be expanded, the �–rule selects the first disjunct and adds it

to the constraint system (i.e. ��). If the new constraint causes a contradiction, the al-

gorithm backtracks and the second disjunct �� is choosen. The process continues until

either a satisfiable precompletion is found, or all the possibilities have been exhausted.

8.3. EXPERIMENTS 180

This method is called syntactic branching, because it follows the syntactic order of the

disjuncts in the concept expression. However, the options are not necessarily disjoint,

so there is nothing to prevent the recurrence of unsatisfiable constraints in different

branches.

We have not implemented semanting branching in the way it is described in Hor-

rocks and Patel-Schneider [1999]. This is because it requires a completely different

backtracking mechanism, and we wanted a technique which could be easily activated

or deactivated. What we do instead, is a modified version of the syntactic branching

where every time a new disjunct is tried, the conjunction of the negation of previous

failed choices are added as well. For example, if the first option �� fails, the new

constraints �� and �� are tried. If the second choice fails as well, the third option

is �� with the constraint �� , and so on util the last option. The idea is

that the negated concept inserted would cause contradictions to be detected at an early

stage, pruning the useless part of the search space.

Unfortunately, this implementation is far from optimal, because it may suffer the

overhead of evaluating the newly introduced constraints; in fact, with the tests we used

the optimisation did not improve the performance of the system (although, it has not

degraded either).

8.3 Experiments

The main purpose of the experiments we performed with the system was to verify the

feasability of the approach in terms of performance. We compared our implementation

with Race (see Haarslev and Moller [2000a]) because it is the fastest available Abox

rasoner, and it provides a superset of ��f (it has unqualified number restrictions as

well). The comparison of Race with other DL systems can be found in Franconi et al.

[1998b].

8.3.1 DL benchmark suite

The main problem in performing experiments with an Abox reasoner is the lack of

real data (i.e. KBs). For our experiments we used the synthetic Abox tests of the DL

benchmark suite.5 The DL benchmark suite is a collection of tests for DL systems

which has been introduced for the DL’98 workshop (see Horrocks and Patel-Schneider

5It is available at the URL http://kogs-www.informatik.uni-hamburg.de/˜moeller/dl-benchmark-suite.html.

[1998b]). The test suite was designed according to a structure which has been first

used for the Comparison of Theorem Provers for Modal Logics at Tableaux ’98 (see

de Swart [1998]).

The test suite is mainly oriented towards terminological reasoning, because the

tests were originated in the modal logic community. There are 9 classes of tests

(k branch n, k d4 n, k dum n, k grz n, k lin n, k path n, k ph n, k poly n, k t4p n)

each one containing different instances of problems of increasing difficulty. Each in-

stance is automatically generated according to a schema related to the class, and it

consists of a Tbox, an Abox and a set of Abox queries.

Each instance is evaluated by loading the knowledge base (i.e. Tbox and Abox),

then the KB satisfiability is checked and all the Abox queries are evaluated. If the

system is unable to evaluate completely all the queries within a CPU time limit of 500

seconds, the test for the current instance is aborted.

For each class the benchmark starts with the easiest instance (the first) and con-

tinues until all the instances for the class are evaluated, or a timeout interrupts the

evaluation. The number of the latest instance the system has been able to evaluate is

recorded together with the CPU time spent to evaluate it, and a new class of problems

is considered.

Abox test instances derive from the concepts generated for the Concept Satisfia-

bility Tests (see Horrocks and Patel-Schneider [1998b]). The Tbox is generated by

naming all the sub-concepts appearing in the test concept, while the Abox is generated

from the model generated by a satisfiability test over the concept itself.

This approach has the advantage of being well defined, so the results of each Abox

query is known in advance; this is crucial for verifying the correctness of the DL sys-

tem. On the other hand, it has several disadvantages that make the tests unsatisfactory

as examples of realistic problems. Firstly, the originating concepts contain a single role

name, and this is very unlikely in any real KB. The second problem is the terminology.

Since all the sub–concepts are recursively named the resulting Tbox is unfoldable (i.e.

acyclic and definitional, see Section 2.1.2), and this means that it can be considered

as an empty terminology. The last problem of the synthetic Abox tests is the fact that

Aboxes are built starting from a model generated by a tableaux based DL system for a

concept. The kind of models these systems generate are not unstructured, but they are

always tree shaped and mostly fully connected. Obviously they cannot be taken as a

representative sample for generic KBs.

As we are going to show in the next section, the inadequacy of the data tests is

highlighted by some of the results of our experiments. Unfortunately, these synthetic

Abox tests are the only coherent suite for testing DL systems, so we must make the

most of them. In performing the experiments we were not interested in the absolute

performance, but on the actual behaviour of the system with different optimisation fea-

tures activated. However, the results must be considered in the light of their limitations.

We do not think that the available Abox tests reflect realistic KBs, so they cannot give

an estimate of the size/type of KB our system can handle.

8.3.2 Measurements

Our objective is to etablish a correlation between the performance of the system on

different tests and the behaviour of the optimization techniques. Some of the different

optimisations mentioned in Section 8.2 can be turned on and off by flags, and sev-

eral counters have been placed in crucial parts of the code to obtain a picture of the

behaviour of the system during the evaluation of a tests. For example, we counted

the number of times the knowledge base is precompleted, in contrast with the times

the MSC test is sufficient to answer an instance check. Another counter registers the

number of backjumps which have been performed.

We assumed that each class of tests presents a different pattern, therefore we took

separate measurements for each class. The results are summarised in Table 8.2, Ta-

ble 8.3, and Table 8.4.

Results of the experiments are grouped by the class of tests, and each row shows

the results for the system with a particular set of optimisations. As a reference, the first

row of each class shows the results of the Race system. The following rows contain the

results obtained by our system with different configuration of optimisation techniques.

The results obtained by our system appear in decreasing order of performance, so the

first row contains always the best result, and the last the worse. We chosen this order

because it highlights the impact of the different optimisations on the performance of

the system.

All the experiments have been performed on a machine equipped with a Celeron

450 MHz processor, with 256 Mb of main memory and running Linux with the kernel

version 2.2.17. Both the systems are written in Common Lisp and run using Allegro

Common Lisp version 5.0.1.

We will first explain the meaning of each column in the tables, then we will try to

draw some conclusions from the results of the experiments.

The System field shows the active optimisations with our system (aFaCT) or the

version of the Race system that has been used. The six different optimisations which

can be activated are6

LABEL-MODAL-CHECK enable the modal verification discussed in Section 8.2.6;

CHECK-BEFORE-PRECOMP enable the verification of the satisfiability of the la-

bel of an individual before starting to precomplete it (the verification at the end

is always performed);

MSC-CHECKING the Most Specific Concept of an individual is build and used for

the instance checking before the actual precompletion of the Abox;

BACKJUMPING enable the backjumping;

QUERY-CACHING enable query caching (see Section 8.2.7);

SEMANTIC-BRANCHING activate the semantic branching (see Section 8.2.7).

As we said above, the configuration opt6 has all these optimisation switches ac-

tivated, then at each different configuration one of the switches is turned off. So in

the configuration opt5 the MSC-CHECKING is turned off, in opt4 the QUERY-

CACHING, in opt3 the SEMANTIC-BRANCHING, in opt2 the CHECK-BEFORE-

PRECOMP, in opt1 the LABEL-MODAL-CHECK, and finally in opt0 the BACK-

JUMPING is turned off as well.

We could have tried every combination of the six flags, but it would have taken

too much time to run, and a volume of result data rather unmanageable. Therefore we

decided to start with all the optimisation turned on, and disable a feature at each step.

This gives us seven different configuration of optimisations, named from opt6 to opt0.

With opt6, all the optimisations are active, while with opt0 they are all switched off.

In the configuration opt5 the MSC-CHECKING is turned off, in opt4 the QUERY-

CACHING, in opt3 the SEMANTIC-BRANCHING, in opt2 the CHECK-BEFORE-

PRECOMP, in opt1 the LABEL-MODAL-CHECK, and finally in opt0 the BACK-

JUMPING is turned off as well. The order is based on the our experience with the

system, we first eliminated the options that we thought did not impact the performance.

The first flag MSC-CHECKING is an exception, because the use of subsumption to

perform instance checking is not an optimisation which impacts on the behaviour of

the algorithm.

6The remaining techniques we mentioned in the previous section are always active.

The next two columns Last Instance and Duration show the last instance of the

class that has been solved, and how many seconds of CPU time have been used for

solving that instance. The following fields are the value of the different counters.

All-Instance-Checks shows the total number of instance checks that have been

performed, while Prec-Instance-Checks indicates the number of them which have

been performed by precompletion. Note that unless the MSC-CHECKING is active,

these two counters have the same value.

Backjumping counts the number of times the following options of a backtracking

point have been ignored because of the backjumping optimisation. The counter is in-

cremented at every backatrack point, so for a single failure more than one backtrack

points can be “skipped” (see the example in Section 8.2.4). Therefore its quantitative

value is not very significant; however, the order of magnitude gives an idea of how

much the backjumping is “active”. Its value compared with the number of clashes de-

tected gives an idea of how “deep” the backjumps are (how many backtracking points

are ignored in a single backtrack).

Call-Reasoner-Full-Label and Call-Reasoner-Modal-Part count the number of

times the terminological reasoner has been invoked to verify the satisfiability of a label.

Sat-Label on the other hand counts the total number of times the label has been ver-

ified. This number is typically higher than the sum of the two previous ones, because

the satisfiability value of the label could have been cached (counters Cached-Clash-

Label and Cached-Sat-Label), or a contradiction is immediately detected among the

concepts of the label (Simple-Clash or Bottom-Detect, this latter one detects the

presence of an explicit � contraint in the label, usually added by a concept normalisa-

tion.) Finally, the counter Label-Clash shows haw many times the verification of the

satisfiability of a label returned a failure.

8.3.3 Notes on results

The first thing worth noticing is the fact that most of the instance checking tests can

be solved by terminological reasoning only, without precompleting the KB. This can

be observed by considering the value of the counter Prec-Instance-Checks when

the system has the MSC-CHECKING enabled (configuration opt6). In all classes

except k d4 n and k grz n all the instance checks are solved without precompleting

the Abox, and even in those two classes over 98% of the instance checks are solved

All-Ins

Backju

r-Full

Call-R

l-Clas

d-Clas

kbranchn

All-Ins

Backju

r-Full

Call-R

l-Clas

d-Clas

kpathn

All-Ins

Backju

r-Full

Call-R

l-Clas

d-Clas

el Simple

h Botto

kpolyn

by using the MSC.7 This is a clear indication that the tests are not adequate for fully

evaluating an Abox reasoner.

Note that using the MSC is not always the best way (in terms of efficiency) of

performing the instance checking. Although, in some cases it provides remarkable

good results (e.g. k dum n and k t4p n), in other it actually manifests a significant

loss of performance (e.g. k poly n and k ph n).

Unsurprisingly, no particular configuration of optimizations is best in all experi-

ments. Different classes of problems seem to require different optimisations for better

performance, and this may suggests that a good approach could be an adaptive system

which activates or deactivates some optimizations according to the behaviour of the

system with a given KB.

Having said that, the nondeterminism is definitely the main source of slow down of

the system, and optimisations that are directed at reducing the nondeterminism (such

as backjumping) seem to provide uniformily better results. This is clearly indicated

by the fact that the “activeness” of the backjumping (counter Backjumping) is one

of the few indicators that can be clearly related to the performance (see for example

k branch n, but the same pattern can be spotted in other classes). Moreover, as pre-

dicted in Section 8.2.4, the backjumping alone (configuration opt1) does not seems

to provide good results without the LABEL-MODAL-CHECK activated as well. This

is clearly indicated by the ratio of the two counters Call-Reasoner-Full-Label and

Call-Reasoner-Modal-Part w.r.t. the counter Backjumping.

The rest of the counters do not seem to show a direct correlation with performance.

This probably indicates that the optimisations they are monitoring do not significantly

influence the behaviour of the system (at least with the available Abox tests).

The last thing we would like to mention is the fact that we really did not expect our

system to be faster than the Race system; even in a few classes. We had this impression

because we know that Race is no slower than FaCT as a terminological reasoner, and

it uses the very same engine for both Abox and terminological reasoning. If you look

closely at the precompletion technique, it uses transformation rules which are very

similar to the ones implemented in Race; the difference lies the fact that our system

made a sharp distiction between rules working with the Abox (the precompletion rules)

and rules working at the terminological level (delegated to the terminological reasoner

FaCT).

7The reader may be puzzled by the fact that the rest of the counters are not equal to zero; this isexplained by the fact that the satisfiability of the KB is verified before performing any instance checkingtests.

A system without this separation (like Race) has the possibility to maximise the

effect of optimisations like backjumping because it has better information about con-

straints causing contradictions (something we tried to simulate by verifying the modal

part of the label only). We think that with classes like k branch n it is this fact that

makes the real difference in performance. In addition, Race is a more mature system

and it has more optimisation techniques implemented (for example model merging,

see Haarslev and Moller [2000a]).

For these reasons, we did not expect to outperform Race for any test having a small

and almost fully connected Abox, where partitioning and memory occupation do not

make any difference. Obviously, it could be the case that some of the results exhibited

by our system depend on the peculiar structure of the test problems. However, if this

behaviour is exhibited even with general KBs, it means that there is some unneces-

sary overhead in the hybrid reasoning in Race which can be reduced by separating the

treatment of Abox and “terminological” constraints. This would suggest that the per-

formance of tableaux–based algorithms, such as the one implemented in Race, can be

improved by using an evaluation strategy more similar to the precompletion strategy

(e.g. work on individual first).

Chapter 9

Conclusions

This chapter summarise the results presented in this thesis, and suggests some direc-

tions for future work.

9.1 Thesis contributions

The aim of our work is to study basic reasoning techniques for DL knowledge bases

with Abox, and to determine whether they can lead to practical tractability. This thesis

investigates two reasoning problems, namely KB satisfiability and query answering.

9.1.1 KB satisfiability

Different techniques have been considered, and we chose to investigate the possibility

of extending the precompletion technique in order to provide an algorithm for the DL

��f.

By exploiting the tree–like model property of ��f, it has been proved that the tech-

nique leads to a correct and complete algorithm for KB satisfiability. An experimental

DL reasoner based on the studied algorithm has been developed, and some experiments

have been performed to evaluate the behaviour of the system.

The experiments provided evidence of the fact that, even with an optimised termi-

nological reasoner, different optimisation techniques must be used in the precomple-

tion phase in order to obtain acceptable performance. Although we have not provided

formal proofs of correctness and completeness of the optimisations used, we have pro-

vided arguments supporting the fact that their use does not affect the correctness and

completeness of the underlying algorithm.

9.2. FUTURE WORK 191

The system developed, although experimental, exhibits good performance com-

pared to a state of the art DL system. As pointed out in Chapter 8, we think that the

ideas behind the precompletion technique may be used to improve the performance of

tableaux based systems like Race.

9.1.2 Query answering

The language for querying KBs has always been one of the weakest points of DL

systems; recently, there have been some attempts to find a solution to this inadequacy.

In this thesis we develop a technique for answering conjunctive queries over ��f KBs.

Again, the technique relies on a careful use of the model–theoretic properties of

��f. A class of interpretations has been identified which is complete w.r.t. the prob-

lem of conjunctive query answering in ��f. The properties of this class have been

used to devise an algorithm for query answering, and for proving its correctness and

completeness.

Although the results which have been presented here are specific to the DL ��f,

the technique is general enough to provide query answering algorithms for a wide

range of DLs. In addition, since the problem is reduced to KB satisfiability, the algo-

rithm described can easily be implemented on top of most of the available DL systems.

9.2 Future work

There are several research directions which stem from the work presented in this thesis.

KB satisfiability On the KB satisfiability front it is natural to ask ourselves whether

the precompletion technique can be extended to more expressive DLs. We think that

this technique can easily be adapted to most of the DLs having the tree–model property

(or a tree–like property like the one detailed in Chapter 3). For example, the addition of

(qualified) number restrictions should not be too difficult. On the other hand, there are

DL constructors that appear to be much more difficult (or even impossible) to handle

in the precompletion framework as, for example, the inverse role constructor.

The experiments with the developed DL system show that there is still the neces-

sity to produce an adequate test suite for Abox reasoners. The fact that DL systems

providing Abox reasoning are becoming fast enough to be used in applications may

encourage the production of such tests.

9.2. FUTURE WORK 192

Since the experiments show that different techniques must be incorporated in the

basic algorithm to obtain acceptable performance, there is still some theoretical work

to do. We provided arguments supporting the correctness and completeness of the

modified algorithm, but ideally this should be shown by means of formal proofs.

Query answering Concerning the query answering problem, there is still a long way

to go before having usable systems. Moreover, there are still some restrictions on the

query language for ��f which should be lifted (if possible). Some conjectures about

the solution to these problems have been made, and they will be investigated in the

near future.

Providing a completeness result for the complexity of query answering is an impor-

tant task that would provide a useful theoretical tool for evaluating query algorithms.

Before the actual implementation of an experimental system, we think that the pos-

sibility of optimising the nondeterminism in the query transformation must be carefully

considered. In fact, although the algorithm as it is can probably handle small Aboxes,

it is not suitable for large ones. In addition, the experience with the results obtained by

optimising the terminological and hybrid reasoning encourages research in this promis-

ing field.

Bibliography

S. Abiteboul, R. Hull, and V. Vianu. Foundations of databases. Addison-Wesley, 1995.

C. Areces, H. de Nivelle, and M. de Rijke. Prefixed resolution: A resolution method for

modal and description logics. In Proceedings of CADE 99, pages 187–201, January

F. Baader. Augmenting concept languages by transitive closure of roles: An alternative

to terminological cycles. In Proceedings of the 12th International Joint Conference

on Artificial Intelligence, pages 446–451. Morgan Kaufmann, 1991.

F. Baader and B. Hollunder. A terminological knowledge representation system with

complete inference algorithm. In Proceedings of the Workshop on Processing

Declarative Knowledge, pages 67–86, 1991a.

F. Baader and B. Hollunder. KRIS: Knowledge representation and inference system.

SIGART Bulletin, 2(3):8–14, 1991b.

F. Baader and U. Sattler. Number restrictions on complex roles in description logics:

A preliminary report. In Luigia Carlucci Aiello, Jon Doyle, and Stuart C. Shapiro,

editors, Proc. of the fifth International Conference on Principles of Knowledge Rep-

resentation and Reasoning, Cambridge, Massachusetts, USA, 1996. Morgan Kauf-

A. B. Baker. Intelligent Backtracking on Constraint Satisfaction Problems: Experi-

mental and Theoretical Results. PhD thesis, University of Oregon, 1995.

A. Borgida. On the relationship between description logic and predicate logic. In Pro-

ceedings of the Third International Conference on Information and Knowledge Man-

agement (CIKM’94), Gaithersburg, Maryland, November 29 - December 2, 1994,

pages 219–225. ACM, 1994.

BIBLIOGRAPHY 194

A. Borgida and P. F. Patel-Schneider. A semantic and complete algorithm for subsump-

tion in the classic description logic. Journal of Artificial Intelligence Research, 1,

M. Buchheit, F. M. Donini, and A. Schaerf. Decidable reasoning in terminological

knowledge representation systems. Journal of Artificial Intelligence Research, 1:

109–138, 1993.

D. Calvanese, G. De Giacomo, and M. Lenzerini. On the decidability of query contain-

ment under constraints. In Proceedings of the Seventeenth ACM SIGACT SIGMOD

SIGART Symposium on Principles of Database Systems (PODS-98), 1998a.

D. Calvanese, G. De Giacomo, and M. Lenzerini. Answering queries using views over

description logics knowledge bases. In Proceedings of the 16th National Conference

on Artificial Intelligence (AAAI 2000), pages 386–391, 2000.

D. Calvanese, M. Lenzerini, and D. Nardi. Description logics for conceptual data

modeling. In Jan Chomicki and Gunter Saake, editors, Logics for Databases and

Information Systems, pages 229–264. Kluwer Academic Publisher, 1998b.

A. K. Chandra and P. M. Merlin. Optimal implementation of conjunctive queries in

relational data bases. In Conference Record of the Ninth Annual ACM Symposium

on Theory of Computing, pages 77–90, 1977.

G. De Giacomo. Eliminating “converse” from converse pdl. Journal of Logic, Lan-

guage and Information, 5:193–208, 1996.

G. De Giacomo and M. Lenzerini. Boosting the correspondence between description

logics and propositional dynamic logics. In AAAI-Press/MIT-Press, editor, Pro-

ceedings of the 12th National Conference on Artificial Intelligence (AAAI’94), pages

205–212, 1994.

G. De Giacomo and M. Lenzerini. TBox and ABox reasoning in expressive descrip-

tion logics. In Proceedings of the Fifth International Conference on the Principles of

Knowledge Representation and Reasoning (KR-96), pages 316–327. Morgan Kauf-

mann Publishers, 1996.

G. De Giacomo and M. Lenzerini. A uniform framework for concept definitions in

description logics. Journal of Artificial Intelligence Research, 6:87–110, 1997.

BIBLIOGRAPHY 195

G. De Giacomo and F. Massacci. Combining deduction and model checking into

tableaux and algorithms for converse-pdl. Information and Computation, 1998.

H. de Swart, editor. Automated Reasoning with Analytic Tableaux and Related Meth-

ods: International Conference Tableaux’98, number 1397 in LNAI, 1998. Springer.

F. Donini, G. De Giacomo, and F. Massacci. EXPTIME tableaux for ��. In

L. Padgham, E. Franconi, M. Gehrke, D. L. McGuinness, and P. F. Patel-Schneider,

editors, Collected Papers from the International Description Logics Workshop

(DL’96), number WS-96-05 in AAAI Technical Report, pages 107–110. AAAI

Press, Menlo Park, California, 1996a.

F. M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf. Reasoning in description logics.

In Foundation of Knowledge Representation, pages 191–236. CSLI-Publications,

1996b.

F.M. Donini, M. Lenzerini, D. Nardi, and A. Schaerf. Deduction in concept languages:

From subsumption to instance checking. Journal of Logic and Computation, 4(4):

423–452, 1994.

Francesco M. Donini, M. Lenzerini, D. Nardi, and W. Nutt. The complexity of concept

languages. Information and Computation, 134(1):1–58, 1997.

Q. Elhaik and M. C. Rousset. Making an abox persistent. In Franconi et al. [1998b].

A. Era and F. Donini. Most specific concepts for knowledge bases with incomplete

information. In International Conference on Information and Knowledge Manage-

ment, Baltimore, November 1992.

E. Franconi. Description logics for natural language processing. In Working Notes of

the 1994 AAAI Fall Symposium on Knowledge Representation for Natural Language

Processing in Implemented Systems, 1994.

E. Franconi, G. De Giacomo, I. R. Horrocks, D. L. McGuinness, W. Nutt, P. F. Patel-

Schneider, and C. A. Welty. Report on the 1998 intl. workshop on description logics

(dl’98), 1998a.

Enrico Franconi, G. De Giacomo, Robert M. MacGregor, Werner Nutt, and Christo-

pher A. Welty, editors. Proceeding of the 1998 International Workshop on Descrip-

tion Logics (DL’98), 1998b. CEUR Pubblication.

BIBLIOGRAPHY 196

E. Gradel. Why are modal logics so robustly decidable? Bulletin of the European

Association for Theoretical Computer Science, 68:90–103, 1999.

V. Haarslev and R. Moller. An empirical evaluation of optimization strategies for abox

reasoning in expressive description logics. In Lambrix et al. [1999], pages 115–119.

V. Haarslev and R. Moller. Consistency testing: The race experience. In Proceedings

of TABLEAUX 2000, volume 1847 of Lecture Notes in Computer Science, pages

57–61. Springer, 2000a.

V. Haarslev and R. Moller. Expressive abox reasoning with number restrictions, role hi-

erarchies, and transitively closed roles. In A.G. Cohn, F. Giunchiglia, and B. Selman,

editors, Proceedings of the Seventh International Conference on Knowledge Repre-

sentation and Reasoning (KR2000), pages 273–284. Morgan Kaufmann, 2000b.

V. Haarslev and R. Moller. High performance reasoning with very large knowledge

bases. In Franz Baader and Ulrike Sattler, editors, Proceedings of the 2000 Interna-

tional Workshop on Description Logics (DL2000), pages 143–152, 2000c.

B. Hollunder. Consistency checking reduced to satisfiability of concepts in terminolog-

ical systems. Annals of Mathematics and Artificial Intelligence, 18:95–131, 1996.

I. Horrocks. Optimising Tableaux Decision Procedures for Description Logics. PhD

thesis, University of Manchester, 1997.

I. Horrocks. Using an expressive description logic: FaCT or fiction? In Proc. of the 6th

Int. Conf. on the Principles of Knowledge Representation and Reasoning (KR’98),

I. Horrocks and P. F. Patel-Schneider. Comparing subsumption optimizations. In Fran-

coni et al. [1998b].

I. Horrocks and P. F. Patel-Schneider. Dl system comparison. In Franconi et al.

[1998b].

I. Horrocks and P. F. Patel-Schneider. Optimising description logic subsumption. Jour-

nal of Logic and Computation, 9(3):267–293, 1999.

I. Horrocks and U. Sattler. A description logic with transitive and inverse roles and

role hierarchies. In Franconi et al. [1998b].

BIBLIOGRAPHY 197

I. Horrocks and U. Sattler. A description logic with transitive and inverse roles and

role hierarchies. Journal of Logic and Computation, 9(3):385–410, 1999.

I. Horrocks, U. Sattler, S. Tessaris, and S. Tobies. Query containment using a DLR

ABox. LTCS-Report 99-15, LuFG Theoretical Computer Science, RWTH Aachen,

Germany, 1999a.

I. Horrocks, U. Sattler, S. Tessaris, and S. Tobies. How to decide query containment un-

der constraints using a description logic. In Logic for Programming and Automated

Reasoning (LPAR 2000), volume 1955 of Lecture Notes in Computer Science, pages

326–343. Springer, 2000a.

I. Horrocks, U. Sattler, and S. Tobies. Practical reasoning for expressive description

logics. In H. Ganzinger, D. McAllester, and A. Voronkov, editors, Proceedings of the

6th International Conference on Logic for Programming and Automated Reasoning

(LPAR’99), number 1705 in Lecture Notes in Artificial Intelligence, pages 161–180.

Springer-Verlag, 1999b.

I. Horrocks, U. Sattler, and S. Tobies. Reasoning with individuals for the description

logic ��0. In David MacAllester, editor, Proceedings of the 17th International

Conference on Automated Deduction (CADE-17), number 1831 in Lecture Notes In

Artificial Intelligence, pages 482–496. Springer-Verlag, 2000b.

I. Horrocks and S. Tessaris. A conjunctive query language for description logic aboxes.

In National Conference on Artificial Intelligence (AAAI 2000), pages 399–404.

American Association for Artificial Intelligence, 2000.

I. Horrocks and S. Tobies. Reasoning with axioms: Theory and practice. In Proceed-

ings of the 7th International Conference on the Principles of Knowledge Represen-

tation and Reasoning (KR’2000), pages 285–296, 2000.

U. Hustadt and R. A. Schmidt. Issues of decidability for description logics in the

framework of resolution. In R. Caferra and G. Salzer, editors, Automated Deduction

in Classical and Non-Classical Logics, volume 1761 of Lecture Notes in Artificial

Intelligence, pages 191–205. Springer, 2000.

KRSS. Description-logic knowledge representation system specification, November

BIBLIOGRAPHY 198

Patrick Lambrix, Alex Borgida, M. Lenzerini, Ralf Moller, and P. Patel-Schneider, edi-

tors. Proceeding of the 1999 International Workshop on Description Logics (DL’99),

1999. CEUR Pubblication.

A. Y. Levy and M. C. Rousset. Carin: A representation language combining horn

rules and description logics. In Proceedings of the Twelfth European Conference on

Artificial Intelligence (ECAI-96). John Wiley and Sons, 1996a.

A. Y. Levy and M. C. Rousset. The limits on combining recursive horn rules and

description logics. In Proceedings of the AAAI Thirteenth National Conference on

Artificial Intelligence 1996, 1996b.

C. Lutz and U. Sattler. The complexity of reasoning with boolean modal logic. In

Advances in Modal Logic 2000 (AiML 2000), Leipzig, Germany, 2000.

R. M. MacGregor and D. Brill. Recognition algorithms for the LOOM classifier. In

Proceedings of AAAI-92, pages 774–779. AAAI Press, 1992.

B. Nebel. Terminological cycles: Semantics and computational properties. In J. Sowa,

editor, Principles of Semantic Networks. Morgan Kaufmann, 1991.

P. F. Patel-Schneider. DLP system description. In Franconi et al. [1998b], pages 87–89.

R. Reiter. On closed world data bases. In H. Gallaire and J. Minker, editors, Logic and

Data Bases, pages 55–76, 1977.

R. Reiter. Towards a logical reconstruction of relational database theory. In Michael L.

Brodie, John Mylopoulos, and Joachim W. Schmidt, editors, On Conceptual Mod-

elling, Perspectives from Artificial Intelligence, Databases, and Programming Lan-

guages, Topics in Information Systems, pages 191–233. Springer, 1984.

M. C. Rousset. Backward reasoning in aboxes for query answering. In Enrico

Franconi and Michael Kifer, editors, Knowledge Representation meets Databases

(KRDB’99), volume CEUR-WS/Vol-21. CEUR, 1999.

U. Sattler. A concept language extended with different kinds of transitive roles. In

G. Gorz and S. Holldobler, editors, 20. Deutsche Jahrestagung fur Kunstliche In-

telligenz, number 1137 in Lecture Notes in Artificial Intelligence. Springer Verlag,

BIBLIOGRAPHY 199

A. Schaerf. Reasoning with individuals in concept languages. Data and Knowledge

Engineering, 13(2):141–176, 1994.

K. Schild. A correspondence theory for terminological logics: Preliminary report. In

Proceedings of the 12th International Joint Conference on Artificial Intelligence,

pages 466–471, 1991.

M. Schmidt-Schauss and G. Smolka. Attributive concept descriptions with comple-

ments. Artificial Intelligence, 48:1–26, 1991.

R. S. Streett. Propositional dynamic logic of looping and converse is elementarily

decidable. Information and Control, 54(1/2):121–141, July/August 1982.

T. Tammet. Using resolution for extending kl-one-type languages. In CIKM ’95,

Proceedings of the 1995 International Conference on Information and Knowledge

Management, November 28 - December 2, 1995, Baltimore, Maryland, USA, pages

326–332. ACM, 1995.

S. Tessaris and G. Gough. Abox reasoning with transitive roles and axioms. In Lambrix

et al. [1999].

M. Y. Vardi. Why is modal logic so robustly decidable? Technical Report TR97-274,

Rice University, 1997.

W. A. Woods and J. G. Schmolze. The KL-ONE family. Computer and Mathematics

with Applications, special issue: Semantic Networks in Artificial Intelligence, 23

(2-5):133–177, March-May 1992.

questions and answers: reasoning and querying in description …tessaris/papers/phd-thesis.pdf ·...

Documents

verbal reasoning - aim4aiims · pdf fileverbal reasoning...

logical reasoning answers

reasoning visual reasoning questions, answers & …reasoning...

querying - ics.uci.edu

acm sigmod sbd2016 - querying and reasoning over large scale...

7/2/20151 jena. rdf stores jena (jena.semanticweb.org)...

verbal reasoning c1 - lpf- · pdf fileverbal reasoning c1...

section 2 logical reasoning - old.ivyglobal.ca · preptests...

associate)professor)...

sbi po reasoning previous question and answers

distribution of semantic reasoning on the edge of internet...

reasoning across species: querying for genes in similar...

science, society, and criminological researchscience,...

general intelligence & reasoning questions, …...general...

mental ability & logical reasoning question &...

towards ontology based event processingtowards expressive...

trigger querying

analytical reasoning questions and answers

foundation of practice specimen paper answers and...

querying and reasoning over large scale building datasets:...