[IEEE 2012 7th International Forum on Strategic Technology (IFOST) - Tomsk, Russia (2012.09.18-2012.09.21)]

An Ontology Development and Maintenance System

Ivan Zaikin Tomsk Polytechnic University, Tomsk, Russia

[email protected]

Abstract—Modern intelligent information systems use web ontologies for greater flexibility. The number of ontologies in the Linked Open Data cloud grows every year, and there are many tools related to web ontologies, e.g. editors, inference engines, and triple stores. Ontology development is now done by groups of people including ontology engineers, domain specialists, quality assurance staff, and others. This paper discusses the problems and tasks of collaborative development and maintenance of web ontologies and introduces a system that facilitates these tasks. We suggest an approach based on using issue tracking systems together with distributed source code management systems for ontology development and maintenance. This approach supports collaborative work on semantic web ontologies, versioning them, posting and discussing issues linked to specific versions of ontologies, and more. Compared to other approaches (such as web-based editors), our approach allows the use of any familiar ontology editor and provides all the advantages of distributed source code management, such as creating new versions without a network connection. However, some problems prevent source code management systems from being used with ontologies on their own. We also discuss these problems and propose a solution that relies on the extensibility of source code management systems.

Keywords—Ontology, knowledge base, collaborative development, version control.

I. INTRODUCTION

An ontology is a formal description of a specific domain in a standardized language readable by both humans and computers. Like software, ontologies need to be maintained after development. There are various reasons to change an ontology: changes in the domain, changes in the understanding of the domain, fixing errors in how the domain is represented, and ontology refactoring.

Like software, ontologies are often designed to be modular, and a set of interconnected ontologies is called a network of ontologies. Ontologies are developed and maintained by groups of specialists, including ontology engineers, domain specialists, quality assurance staff, and others. Collaborative development of complex ontologies is impractical without version control tools that allow developers to roll back changes, compare different versions of files, merge changes made by different developers, and resolve conflicts. Ontologies are also of little use unless they are published (either on a web server or in a triple store), and before publication they should be checked automatically for syntax errors and consistency.

Popular version control systems can be integrated with issue tracking systems (see [1] and [2]). These provide features such as discussion of project issues, email notifications about new issues and changes, viewing changes directly in a web browser, searching and filtering issues and changes, project wiki pages, time tracking, and automatic generation of Gantt charts. Such systems play an important role in the software development lifecycle, and they can also be very useful for developing and maintaining ontologies. They would be even more useful with ontology-specific features such as comparing ontologies directly in the browser, searching and viewing the history of ontology elements, and linking to ontology elements from wiki pages and commit messages.

We therefore need a system that facilitates version management, collaborative editing, and publishing of ontologies. During requirements analysis, we formulated the following requirements. The system should allow users to view previous versions of ontologies, compare them, and create new versions. It should let users search for the version in which a specific concept was introduced or removed, and edit ontologies with any familiar ontology editor (such as Protégé [3]). It should also support creating new versions without a connection to the repository, merging changes made by different users, and resolving conflicts semi-automatically.

II. THEORY

We can think of an ontology $O = \langle E, S \rangle$ as a combination of entities $E$ and statements $S$. Entities are classes, data types, properties, and individuals. Statements define the ontology format, namespace prefixes, imports, ontology identifiers, and the logical constituent (axioms), which defines relations between entities. Note that an ontology can only contain entities that are used by one or more statements. Therefore, removing an entity from an ontology implies removing all statements that refer to it. Let us define the signature $\sigma(s)$ of a statement $s$ as the finite set of entities referred to by $s$. The signature of a set of statements is the union of the signatures of the individual statements:

$$\sigma(\{s_1, s_2, \dots, s_n\}) = \sigma(s_1) \cup \sigma(s_2) \cup \dots \cup \sigma(s_n). \tag{1}$$
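The following minimal Python sketch makes the signature definition and equation (1) concrete. It uses a simplified, hypothetical model rather than the OWL API objects used by the implementation. In this model a statement is a hashable tuple: the first element is the statement type and the remaining elements are the IRIs of the entities it refers to. The names `Statement`, `statement_signature`, and `signature` are illustrative only.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical simplified model: (statement type, entity IRI, entity IRI, ...).
# A statement that refers to no entities is modeled as the 1-tuple (type,).
Statement = Tuple[str, ...]


def statement_signature(s: Statement) -> FrozenSet[str]:
    """sigma(s): the finite set of entities referred to by statement s."""
    return frozenset(s[1:])


def signature(statements: Iterable[Statement]) -> FrozenSet[str]:
    """Equation (1): the union of the signatures of the individual statements."""
    result: set = set()
    for s in statements:
        result |= statement_signature(s)
    return frozenset(result)


ontology = {
    ("SubClassOf", "pizza:Country", "owl:Thing"),
    ("ClassAssertion", "pizza:Country", "pizza:Italy"),
}
print(sorted(signature(ontology)))  # ['owl:Thing', 'pizza:Country', 'pizza:Italy']
```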

Let $v_1 \subseteq S$ and $v_2 \subseteq S$ be the first version of an ontology and the second one (modified by a user), respectively. Define a change set $C$ as the set of changes between $v_1$ and $v_2$. Each change can be represented as a pair $(op, s)$, where $op$ is an operation applied to the statement $s$. With two operations, “+” (add) and “–” (remove), a change set can be generated from a pair of statement sets $(s^-, s^+)$. Here $s^- = v_1 \setminus v_2$ is the set of statements removed from $v_1$, and $s^+ = v_2 \setminus v_1$ is the set of statements added to $v_1$ to obtain $v_2$. That is, $C = \{(-, s) \mid s \in s^-\} \cup \{(+, s) \mid s \in s^+\}$. The signature of a change equals the signature of the corresponding statement, and the signature of a change set is the union of the signatures of its changes. Let $\delta(v_1, v_2)$ denote a function that compares two ontology versions and returns the change set between them. Let $\varepsilon$ denote a function that applies a change set $C$ to $v_1$, so that $\varepsilon(v_1, C) = v_2$.

978-1-4673-1773-3/12/$31.00 ©2013 IEEE

Figure 1. Three-way merge (diagram of versions v1, v2, v3 and the merge result v4).

Figure 2. Components of the system (diagram labels: Remote Repository, Local Repository, Local Tools, Browser, Issue Tracker, Ontology Editors, Remote Services).
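The next sketch implements the change-set function δ and the apply function ε defined above. It uses the same hypothetical tuple model of statements. The names `delta`, `epsilon`, and `change_set_signature` are illustrative assumptions, and the ASCII "-" stands for the paper's “–” operation.

```python
from typing import FrozenSet, Set, Tuple

Statement = Tuple[str, ...]      # hypothetical: (type, entity IRI, ...)
Change = Tuple[str, Statement]   # (op, s); op is "+" (add) or "-" (remove)


def delta(v1: Set[Statement], v2: Set[Statement]) -> FrozenSet[Change]:
    """Change set between versions: remove s- = v1 minus v2, add s+ = v2 minus v1."""
    removed = {("-", s) for s in v1 - v2}
    added = {("+", s) for s in v2 - v1}
    return frozenset(removed | added)


def epsilon(v: Set[Statement], changes: FrozenSet[Change]) -> FrozenSet[Statement]:
    """Apply a change set: drop every removed statement, then add every added one."""
    to_remove = {s for op, s in changes if op == "-"}
    to_add = {s for op, s in changes if op == "+"}
    return frozenset((set(v) - to_remove) | to_add)


def change_set_signature(changes: FrozenSet[Change]) -> FrozenSet[str]:
    """A change has its statement's signature; a change set has the union of them."""
    return frozenset(e for _, s in changes for e in s[1:])


v1 = {("ClassAssertion", "pizza:Country", "pizza:Italy")}
v2 = v1 | {("ClassAssertion", "pizza:Country", "pizza:Russia")}
c = delta(v1, v2)
print(c)  # frozenset({('+', ('ClassAssertion', 'pizza:Country', 'pizza:Russia'))})
print(epsilon(v1, c) == v2)  # True
```

Because versions are plain sets, applying $\delta(v_1, v_2)$ to $v_1$ always reproduces $v_2$ in this model.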

When another user modifies $v_1$, a third version $v_3$ is produced (see Figure 1). The changes made by both users, $C = \delta(v_1, v_2) \cup \delta(v_1, v_3)$, can be separated into three categories:

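common changes

$$C_m = \delta(v_1, v_2) \cap \delta(v_1, v_3), \tag{2}$$

conflicting changes

$$C_{C1} = \{\, c \in \delta(v_1, v_2) \mid \sigma(c) \cap \sigma(\delta(v_1, v_3) \setminus C_m) \neq \emptyset \,\}, \tag{3}$$

$$C_{C2} = \{\, c \in \delta(v_1, v_3) \mid \sigma(c) \cap \sigma(\delta(v_1, v_2) \setminus C_m) \neq \emptyset \,\}, \tag{4}$$

and other changes

$$C_{O1} = (\delta(v_1, v_2) \setminus C_m) \setminus C_{C1}, \tag{5}$$

$$C_{O2} = (\delta(v_1, v_3) \setminus C_m) \setminus C_{C2}. \tag{6}$$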

Common changes are changes made by both users, i.e. statements added by both users or removed by both users. Conflicting changes of the first user, $C_{C1}$, are changes made by the first user that refer to entities also referred to by the non-common changes of the second user. Similarly, conflicting changes of the second user, $C_{C2}$, are changes that refer to entities also referred to by the non-common changes of the first user. The intuition behind these equations is that if both users make different changes to the same entity, these changes should be considered conflicting and resolved manually by the users. Common changes and other changes can be applied automatically. If there are no conflicting changes, the resulting version $v_4$ can be found as $\varepsilon(v_1, C)$ with $C = \delta(v_1, v_2) \cup \delta(v_1, v_3)$, i.e. by applying the changes of both users to $v_1$.
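The classification into equations (2)-(6) can be sketched in Python as follows, again using the hypothetical tuple model of statements. When no conflicting changes exist, `merge_without_conflicts` builds $v_4$ by applying the common and other changes of both users to $v_1$. Raising an error on conflicts is an illustrative stand-in for the manual resolution described above. All function names are hypothetical.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Statement = Tuple[str, ...]      # hypothetical: (type, entity IRI, ...)
Change = Tuple[str, Statement]   # (op, s); op is "+" or "-"


def delta(v1: Set[Statement], v2: Set[Statement]) -> FrozenSet[Change]:
    return frozenset({("-", s) for s in v1 - v2} | {("+", s) for s in v2 - v1})


def sig(changes: Iterable[Change]) -> Set[str]:
    """Signature of a set of changes (union of the statements' entities)."""
    return {e for _, s in changes for e in s[1:]}


def classify(v1, v2, v3):
    """Split both users' changes into the categories of equations (2)-(6)."""
    d12, d13 = delta(v1, v2), delta(v1, v3)
    cm = d12 & d13                                    # (2) common changes
    touched1, touched2 = sig(d12 - cm), sig(d13 - cm)
    cc1 = {c for c in d12 if sig([c]) & touched2}     # (3)
    cc2 = {c for c in d13 if sig([c]) & touched1}     # (4)
    co1 = (d12 - cm) - cc1                            # (5)
    co2 = (d13 - cm) - cc2                            # (6)
    return cm, cc1, cc2, co1, co2


def merge_without_conflicts(v1, v2, v3) -> Set[Statement]:
    """Apply the common and other changes of both users to v1."""
    cm, cc1, cc2, co1, co2 = classify(v1, v2, v3)
    if cc1 or cc2:
        raise ValueError("conflicting changes must be resolved manually")
    changes = cm | co1 | co2   # equals delta(v1, v2) | delta(v1, v3) here
    removed = {s for op, s in changes if op == "-"}
    added = {s for op, s in changes if op == "+"}
    return (set(v1) - removed) | added


italy = ("ClassAssertion", "pizza:Country", "pizza:Italy")
russia = ("ClassAssertion", "pizza:Country", "pizza:Russia")
spain = ("Declaration", "pizza:Spain")
v1 = {italy}
# User 1 touches pizza:Country and pizza:Russia; user 2 touches only pizza:Spain.
print(merge_without_conflicts(v1, v1 | {russia}, v1 | {spain}) == {italy, russia, spain})  # True
```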

To understand the changes made to an ontology, it is not enough to see the removed and added statements. One may also want to know which entities have been added, removed, or modified. New entities can be found using the following equation:

$$E^{+} = \sigma(\delta(v_1, v_2)) \setminus \sigma(v_1). \tag{7}$$

This equation means that every entity that has associated changes and is not mentioned in $v_1$ is considered new. Removed entities can be found using the following equation:

$$E^{-} = \sigma(\delta(v_1, v_2)) \setminus \sigma(v_2). \tag{8}$$

Every entity that has associated changes and is not mentioned in $v_2$ is considered removed. Finally, modified entities can be found using a slightly more complicated equation:

$$E^{*} = \sigma(v_1) \cap \sigma(v_2) \cap \sigma(\delta(v_1, v_2)). \tag{9}$$

That is, every entity that has associated changes and is found in both $v_1$ and $v_2$ is considered modified.
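Equations (7)-(9) translate directly into set operations, as in the hypothetical sketch below. Here $\sigma(\delta(v_1, v_2))$ is computed as the signature of the symmetric difference of the two versions. This works because the change set consists exactly of the statements in $v_1 \setminus v_2$ and $v_2 \setminus v_1$. The example is a reduced, made-up variant of the pizza ontology change shown in Section IV, and `classify_entities` is an illustrative name.

```python
from typing import Iterable, Set, Tuple

Statement = Tuple[str, ...]   # hypothetical: (type, entity IRI, ...)


def sig(statements: Iterable[Statement]) -> Set[str]:
    return {e for s in statements for e in s[1:]}


def classify_entities(v1: Set[Statement], v2: Set[Statement]):
    """Return (new, removed, modified) entities per equations (7)-(9)."""
    changed = sig(v1 ^ v2)   # sigma(delta(v1, v2)): entities of removed/added statements
    sig1, sig2 = sig(v1), sig(v2)
    new = changed - sig1                 # (7)
    removed = changed - sig2             # (8)
    modified = sig1 & sig2 & changed     # (9)
    return new, removed, modified


v1 = {
    ("ClassAssertion", "pizza:Country", "pizza:Italy"),
    ("DifferentIndividuals", "pizza:America", "pizza:Italy"),
}
v2 = {
    ("ClassAssertion", "pizza:Country", "pizza:Italy"),
    ("ClassAssertion", "pizza:Country", "pizza:Russia"),
    ("DifferentIndividuals", "pizza:America", "pizza:Italy", "pizza:Russia"),
}
new, removed, modified = classify_entities(v1, v2)
print(sorted(new), sorted(removed), sorted(modified))
# ['pizza:Russia'] [] ['pizza:America', 'pizza:Country', 'pizza:Italy']
```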

III. SYSTEM DESIGN

We propose to use a distributed source code management system [4] to version control ontologies. This considerably simplifies meeting requirements such as the ability to edit ontologies with any familiar editor and to create new versions without a connection to the repository. However, the built-in tools for comparing and merging files turn out to be poorly suited to ontologies, for several reasons. One is that an ontology is, by definition, an unordered set of statements, so the order of statements does not matter. Another is that the serialization format does not matter either: the same ontology can be serialized in different formats, say, RDF/XML [5] or OWL/XML [6]. Even with the same format, different ontology editors may produce different representations of the same ontology. The representations differ, but the model remains the same. A line-based diff tool of the kind commonly used for source code would report many spurious “changes” between such files. Built-in three-way merge tools fail for the same reason. That is why separate tools for these tasks were developed based on equations (1)-(9).
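The Python sketch below illustrates why a line-based diff is unsuitable. It assumes the two files have already been parsed into one canonical statement string per line. In practice a parser such as the OWL API would perform that step, and parsing is also what removes differences between RDF/XML and OWL/XML. The file contents are hypothetical. Python's standard `difflib.unified_diff` reports a removal and an addition caused purely by reordering, while a set comparison reports nothing.

```python
import difflib

# Hypothetical example: the same three statements written by two editors
# in different orders (parsing real RDF/XML or OWL/XML files is assumed).
file_a = [
    "Declaration(Class(pizza:Country))",
    "ClassAssertion(pizza:Country pizza:Italy)",
    "ClassAssertion(pizza:Country pizza:France)",
]
file_b = [
    "ClassAssertion(pizza:Country pizza:France)",
    "Declaration(Class(pizza:Country))",
    "ClassAssertion(pizza:Country pizza:Italy)",
]

# Line-based diff, as used for source code: keep only "+"/"-" content lines.
line_changes = [
    line
    for line in difflib.unified_diff(file_a, file_b, lineterm="")
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]

# Set-based comparison: the ontology is an unordered set of statements.
removed = set(file_a) - set(file_b)
added = set(file_b) - set(file_a)

print(len(line_changes), len(removed), len(added))  # 2 0 0
```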

Figure 2 shows an overview of the system. It has two main parts: local tools and remote services. Local tools are used to compare versions, perform three-way merges, and publish ontologies. Remote services provide the issue tracker with ontology-specific information such as the ontology hierarchy, entity history, and entity usage across all ontologies in the repository. Both parts use a common library that implements the algorithms described above.


Figure 3. Main window of owl2merge, the three-way merge tool

IV. IMPLEMENTATION

The common diff and merge library is implemented in Java using the OWL API library [7], which fully supports the OWL 2 specification [8]. The local diff and merge tools are also implemented in Java, together with shell scripts that integrate them with the version control system. This part is open source and available online (see footnote 1). Here is an example output of the diff tool:

    Total additions: 3
    Total removals: 3
    New:
    NamedIndividual: pizza:Russia
    Modified:
    Class: pizza:Country
    Class: owl:Thing
    NamedIndividual: pizza:America
    NamedIndividual: pizza:England
    NamedIndividual: pizza:France
    NamedIndividual: pizza:Germany
    NamedIndividual: pizza:Italy
    - Prefix(owl11:=<http://www.w3.org/2006/12/owl11#>)
    - Prefix(owl11xml:=<http://www.w3.org/2006/12/owl11-xml#>)
    - DifferentIndividuals(pizza:America pizza:England pizza:France pizza:Germany pizza:Italy )
    + ClassAssertion(owl:Thing pizza:Russia)
    + ClassAssertion(pizza:Country pizza:Russia)
    + DifferentIndividuals(pizza:America pizza:England pizza:France pizza:Germany pizza:Italy pizza:Russia )

Here we can see the total numbers of added and removed statements, one new entity, and seven modified entities. These are followed by three removed statements and three added statements. Each statement is encoded as its type name followed by its parameters in parentheses. The six statement types are listed in Table I.

TABLE I. STATEMENT TYPES

| Name | Notation | Arguments |
|---|---|---|
| Ontology format | OntologyFormat | One of the following values: RDF/XML; Turtle; OWL/XML; OWL Functional Syntax; Manchester OWL Syntax |
| Namespace prefix | NamespacePrefix | Expression in the form `Prefix:=<Namespace>` |
| Ontology import | OntologyImport | IRI of an imported ontology |
| Ontology IRI | OntologyIRI | IRI of the ontology |
| Version IRI | VersionIRI | IRI of the ontology version |
| Axiom | Depends on the axiom type | OWL Functional Syntax is used |
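The hypothetical sketch below shows how a report shaped like the example output above could be assembled from equations (7)-(9). Each statement is rendered as its type name followed by its arguments in parentheses. The sketch does not reproduce owl2diff. The tuple model carries no entity types, so entities are printed without the `Class:`/`NamedIndividual:` labels. The `Removed:` heading is also an assumption, since the example contains no removed entities. `render` and `diff_report` are illustrative names.

```python
from typing import Set, Tuple

Statement = Tuple[str, ...]   # hypothetical: (type, entity IRI, ...)


def render(s: Statement) -> str:
    """Type name followed by the arguments in parentheses."""
    return f"{s[0]}({' '.join(s[1:])})"


def sig(statements) -> Set[str]:
    return {e for s in statements for e in s[1:]}


def diff_report(v1: Set[Statement], v2: Set[Statement]) -> str:
    removed, added = sorted(v1 - v2), sorted(v2 - v1)
    changed = sig(v1 ^ v2)
    new = sorted(changed - sig(v1))                  # equation (7)
    gone = sorted(changed - sig(v2))                 # equation (8)
    modified = sorted(changed & sig(v1) & sig(v2))   # equation (9)
    lines = [f"Total additions: {len(added)}", f"Total removals: {len(removed)}"]
    for title, entities in (("New:", new), ("Removed:", gone), ("Modified:", modified)):
        if entities:
            lines.append(title)
            lines.extend(entities)
    lines += [f"- {render(s)}" for s in removed]
    lines += [f"+ {render(s)}" for s in added]
    return "\n".join(lines)


v1 = {("ClassAssertion", "pizza:Country", "pizza:Italy")}
v2 = v1 | {("ClassAssertion", "pizza:Country", "pizza:Russia")}
print(diff_report(v1, v2))
```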

The main window of the three-way merge tool is shown in Figure 3. It contains four tabs that display common changes, conflicting changes, other changes, and the merge result, and that allow specific changes to be included or excluded. Common changes and other changes are included (checked) by default. Conflicting changes are excluded by default, and the user is expected to select the appropriate ones to include in the result.
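The default selection in the merge window can be sketched as follows. The sketch assumes the change categories have already been computed from equations (2)-(6). `merge_result` and its parameters are hypothetical and are not owl2merge's interface. `other` stands for $C_{O1} \cup C_{O2}$, and `chosen` holds the conflicting changes the user ticked.

```python
from typing import AbstractSet, FrozenSet, Tuple

Statement = Tuple[str, ...]      # hypothetical: (type, entity IRI, ...)
Change = Tuple[str, Statement]   # (op, s); op is "+" or "-"


def merge_result(
    v1: AbstractSet[Statement],
    common: AbstractSet[Change],
    other: AbstractSet[Change],
    conflicting: AbstractSet[Change],
    chosen: AbstractSet[Change] = frozenset(),
) -> FrozenSet[Statement]:
    """Apply the default selection plus the conflicting changes the user ticked.

    Common and other changes are included by default; a conflicting change is
    included only if it also appears in `chosen`.
    """
    if not set(chosen) <= set(conflicting):
        raise ValueError("only conflicting changes need to be chosen explicitly")
    selected = set(common) | set(other) | set(chosen)
    to_remove = {s for op, s in selected if op == "-"}
    to_add = {s for op, s in selected if op == "+"}
    return frozenset((set(v1) - to_remove) | to_add)


italy = ("ClassAssertion", "pizza:Country", "pizza:Italy")
russia = ("ClassAssertion", "pizza:Country", "pizza:Russia")
spain = ("Declaration", "pizza:Spain")
v1 = {italy}
conflicts = {("+", russia), ("+", ("SubClassOf", "pizza:Country", "owl:Thing"))}
default = merge_result(v1, set(), {("+", spain)}, conflicts)          # conflicts left out
with_russia = merge_result(v1, set(), {("+", spain)}, conflicts, {("+", russia)})
```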

1 http://ontovcs.googlecode.com

These tools have been integrated with popular distributed version control systems (Git [9] and Mercurial [10]). This makes all features of distributed version control available for ontology development. For example, when a user runs the “git diff” command, the two versions are passed to the owl2diff tool. When a merge conflict occurs, the version control system passes the three versions to the owl2merge tool.

V. EVALUATION

The comparison algorithm has been evaluated on large OWL files from the NCI Thesaurus (see footnote 2). Each file contained about 1,210,000 axioms and about 88,000 classes. Table II shows the results of this evaluation. Loading the files into memory took 65 seconds, whereas the comparison itself took only 8.3 seconds. One way to reduce the amount of RAM needed to compare versions is to divide large ontologies into smaller modules, which can also make them easier to understand and organize.

TABLE II. EVALUATION RESULTS

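| Metric | Thesaurus_11.01e | Thesaurus_11.03d |
|---|---|---|
| File size | 217 MB | 219 MB |
| Axiom count | 1,209,302 | 1,219,319 |
| Class count | 87,396 | 88,064 |

| Comparison run | Value |
|---|---|
| Change count | 20,653 (1.7%) |
| RAM used | 2.76 GB |
| CPU frequency | 1.6 GHz |
| Time to load | 65 seconds |
| Time to compare | 8.3 seconds |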

2 http://ncit.nci.nih.gov/


REFERENCES

[1] J. P. Lang. Redmine. 2012. URL: http://redmine.org/.

[2] The Trac Project. 2012. URL: http://trac.edgewall.org/.

[3] The Protégé Ontology Editor and Knowledge Acquisition System // Protégé. 2012. URL: http://protege.stanford.edu/.

[4] Intro to Distributed Version Control (Illustrated) // BetterExplained. 2007. URL: http://betterexplained.com/articles/intro-to-distributed-version-control-illustrated/.

[5] RDF/XML Syntax Specification // World Wide Web Consortium. 2004. URL: http://www.w3.org/TR/REC-rdf-syntax/.

[6] B. Motik, B. Parsia, P. F. Patel-Schneider. OWL 2 Web Ontology Language. XML Serialization. URL: http://www.w3.org/TR/owl2-xml-serialization/.

[7] M. Horridge, S. Bechhofer. The OWL API: A Java API for Working with OWL 2 Ontologies // OWL Experiences and Directions: Proc. 6th Intern. Workshop. Chantilly, 2009. Vol. 529. P. 53-62.

[8] B. Motik, P. F. Patel-Schneider, B. Parsia. OWL 2 Web Ontology Language Structural Specification and Functional-Style Syntax // World Wide Web Consortium. 2009. URL: http://www.w3.org/TR/owl2-syntax/.

[9] S. Chacon. Pro Git. 2009. URL: http://git-scm.com/book/.

[10] A. Babenhauserheide. Learning Mercurial in Workflows. 2011. URL: http://mercurial.selenic.com/guide/.