seminar work – rdf databases - dfkisauermann/papers/seminarworkr... · seminar work – rdf...

Page 1: Seminar Work – RDF Databases - DFKIsauermann/papers/SeminarWorkR... · Seminar Work – RDF Databases Using the three RDF Databases FORTH-RDFSuite, Sesame, RDF Gateway in development


Seminar Work – RDF Databases

Using the three RDF Databases FORTH-RDFSuite, Sesame, RDF Gateway

in development projects

Leo Sauermann (9526103) Hardtgasse 27/11, A-1190 Wien

May 28th, 2003

Abstract

The Semantic Web is the next generation of the World Wide Web, enriching documents with metadata and providing ways to express semantic information in a general way. The basis of the Semantic Web is the Resource Description Framework (RDF), which defines a syntax and rules for metadata about resources. Various systems based on RDF already exist, and more will follow. Some applications will need to store RDF information or search in it. This paper describes three databases that provide permanent storage of RDF data and offer querying facilities. It lists the features of each database and gives information about using them in development projects. Finally, a detailed chart compares the servers, and a conclusion describes how each of them meets different demands.

Keywords: Semantic Web, RDF, storage, server

1 Introduction

Many systems are using Semantic Web technology to handle their metadata. RDF is an upcoming standard that provides the basis for storage and exchange of this metadata.

The Semantic Web is a field of interest for different parties. Web sites are enhanced with semantic functionality and provide their existing information in this new standard. Different academic groups use it in their research work on numerous topics: knowledge management, database systems, B2B communication, search engines, knowledge portals, inference engines, artificial intelligence – and many more.

When stepping into Semantic Web development, a developer tries to get an overview of the existing technology and projects. The market is still new, yet many tools are already available: some are outdated or abandoned; others have been developed for years, are mature and have been used in different projects; there are also new tools that have a promising start. The systems are evaluated, and a decision has to be made on which system to base one's own development.

This paper describes three interesting RDF databases, systems that Semantic Web applications need in order to store metadata. Only a few products are available today. This paper provides an introduction to the field and can be used as a starting point for finding the right system for a project.

The remainder of the paper is organised as follows. Section 2 gives an overview of the Semantic Web and the storage problem. The three systems are described in Sections 3, 4 and 5. Section 6 gives a short listing of other storage systems. Section 7 compares the systems. The last section is the appendix, which contains the references and acknowledgements.

Acknowledgements

I thank Gerald Reif for reviewing the work again and again and for providing me with many documents and ideas on how to improve it.

Contents

1 Introduction
2 Semantic Web and RDF Storage
   2.1 Semantic Web
   2.2 Storing RDF data
   2.3 The three servers
3 FORTH-RDFSuite
   3.1 Architecture of RDFSuite
   3.2 Features
4 Sesame
   4.1 Architecture of Sesame
   4.2 Features
5 RDF Gateway
   5.1 Architecture of RDF Gateway
   5.2 Features
6 Other RDF Databases
7 Comparison
References


2 Semantic Web and RDF Storage

2.1 Semantic Web

The Semantic Web is a vision of an improved kind of Web with enhanced functionality which will permit semantic-based representation and processing of Web information. W3C has proposed a series of technologies that can be applied to achieve this vision. The Semantic Web extends the current Web by giving the web content a well-defined meaning, better enabling computers and people to work in cooperation [10].

XML is aimed at communicating and storing data so that different systems can understand and interpret the information. XML is focused on the syntax of a document, and it essentially provides a mechanism to declare and use simple data structures. However, there is no way for a program to actually understand the knowledge contained in the XML documents.

Resource Description Framework (RDF) [6, 7] provides a common framework for expressing the semantic information so it can be exchanged between applications without loss of meaning. RDF uses XML to exchange descriptions of Web resources and emphasizes facilities to enable automated processing. The RDF descriptions provide a simple ontology system to support the exchange of knowledge and semantic information on the Web.

RDF is based on the idea of identifying things using Web identifiers (URIs) and describing resources in terms of simple properties and property values. A basic RDF statement consists of three parts. The first is the subject, a resource that is to be further described. The second is the predicate, the property of the subject that the statement describes. The third is the object, a value of this property. The object can be either a literal (a number, a name) or another resource identified by a URI. Subject and predicate are always URIs. Together, subject, predicate and object form a statement that is called a triple. This enables RDF to represent simple statements about resources as a graph of nodes and arcs representing the resources, their properties and their values.
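The triple model can be illustrated with a minimal Python sketch. The `Triple` type, the helper functions and the `http://example.org/` URIs are hypothetical names chosen for illustration; they are not taken from any of the systems described in this paper.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str                  # URI of the resource being described
    predicate: str                # URI of the property
    obj: str                      # URI of another resource, or a literal value
    obj_is_literal: bool = False  # True when obj is a literal, not a resource


# Hypothetical example namespace, used only for illustration.
EX = "http://example.org/"

triples = [
    Triple(EX + "paper1", EX + "author", EX + "leo"),
    Triple(EX + "paper1", EX + "title", "RDF Databases", obj_is_literal=True),
    Triple(EX + "leo", EX + "name", "Leo Sauermann", obj_is_literal=True),
]


def outgoing_arcs(graph, node):
    """Return (predicate, object) pairs for every arc leaving `node`."""
    return [(t.predicate, t.obj) for t in graph if t.subject == node]


def resource_nodes(graph):
    """Collect all resource nodes: every subject plus every non-literal object."""
    nodes = {t.subject for t in graph}
    nodes |= {t.obj for t in graph if not t.obj_is_literal}
    return nodes
```

Viewed this way, a set of triples is a directed, labelled graph: subjects and resource-valued objects are nodes, predicates are arcs, and literals are leaf values.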

RDF Schema, abbreviated RDF(S) [8], provides the basic vocabulary to describe the structure of RDF documents. RDF Schema can be used to define properties and types of web resources. Similar to XML Schema, which gives specific constraints on the structure of an XML document, RDF Schema provides information about the interpretation of RDF statements.

The DARPA Agent Markup Language (DAML) [1] is an AI-inspired, description-logic-based language for describing taxonomic information. DAML has been combined with the Ontology Interchange Language (OIL) and is now called DAML+OIL. The DAML+OIL language builds on top of RDF(S) to provide a language with both well-defined semantics and a set of language constructs including classes, subclasses and typed properties. DAML+OIL can further express restrictions on membership in classes and restrictions on domains and ranges of properties.

The Semantic Web is highly distributed, and different parties may have different understandings of the same concept. This is central to the concept of an ontology. The ontology of a Semantic Web service is a document that formally defines the relations among terms. The most typical kind of ontology has a taxonomy for classification and a set of inference rules to derive new information from existing data. RDF(S) and DAML+OIL supply the language to define the ontology.

2.2 Storing RDF data

The basic building blocks of the Semantic Web are RDF triples, consisting of subject, predicate and object. The subject contains a URI that identifies a resource, and the predicate is a URI that refers to a taxonomy. The object contains either a URI or a literal value. Through an ontology, an object can be restricted to a given type, for example a string, a date or a floating point number. This is the syntax to express information about Web resources and other resources. An ontology defined in a language like RDF(S) can itself be expressed using RDF triples.

A persistent RDF database stores RDF description data about resources as well as ontologies such as RDF Schema. Clients then contact the database to request the stored information.

The simplest way to store RDF data is to write all statements in XML files – the RDF/XML standard [9] describes how to mark up RDF in XML. Another syntax is Notation3 [2], which defines a way to write RDF in plain text files. XML and N3 files cannot be used for large amounts of data because of their low performance when inserting and searching.

A database that stores RDF data therefore has to use other ways to store the RDF statements. Existing relational databases (RDBMS) can be used to store the triples in tables; in this way the optimized indices and SQL query functionality can be reused. The design of how the RDF triples are represented in the RDBMS tables, and which indices are built, influences the performance and behaviour of the RDF database. An RDF database can also use a proprietary file format and be independent of an external storage system.
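As a rough illustration of the relational approach, the following sketch stores triples in a single table and adds two indices. It uses Python's built-in `sqlite3` module as a stand-in for a full RDBMS. The table layout, index names and helper functions are assumptions made for this example and do not reproduce the schema of any of the three servers.

```python
import sqlite3


def open_store():
    """Create an in-memory database with one three-column triple table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE triples ("
        " subject TEXT NOT NULL,"
        " predicate TEXT NOT NULL,"
        " object TEXT NOT NULL)"
    )
    # Indices chosen for two common access paths:
    # "all values of property P for resource S" and "who has value O for P".
    conn.execute("CREATE INDEX idx_sp ON triples (subject, predicate)")
    conn.execute("CREATE INDEX idx_po ON triples (predicate, object)")
    return conn


def add_triple(conn, subject, predicate, obj):
    conn.execute("INSERT INTO triples VALUES (?, ?, ?)", (subject, predicate, obj))


def objects_of(conn, subject, predicate):
    """Return all objects for a given subject and predicate, sorted."""
    rows = conn.execute(
        "SELECT object FROM triples"
        " WHERE subject = ? AND predicate = ? ORDER BY object",
        (subject, predicate),
    )
    return [row[0] for row in rows]
```

A single wide table is the simplest possible design. Real systems often normalise URIs and namespaces into separate tables, which trades extra joins for smaller rows and indices.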

Different institutions and individuals from the RDF community have created tools to store RDF data. Most contributions come from academic institutions. RDF Gateway is the first commercial development.

The focus of many Semantic Web systems is to publish existing data in RDF format. These systems do not need an RDF repository to store data, as they extract the information from the existing databases. As the Semantic Web keeps evolving, the need for native RDF storage systems will grow, because applications will be created that are entirely based on RDF as their data format. At the moment RDF has not reached its potential.

2.3 The three servers

In this work three different systems are described, each having distinct and characteristic features. RDFSuite was the first framework with a repository and laid the groundwork for all following products. Sesame is powerful, open source software. RDF Gateway is the first commercial product and has no academic roots.

Because the market is still new and small, these applications are representative of all applications in this sector. The three databases were first published in the years 2000, 2001 and 2003 and illustrate the evolution of this market.


3 FORTH-RDFSuite

RDFSuite [11, 21, 22] is a framework of high-level scalable tools created by “The Institute of Computer Science of the Foundation for Research and Technology-Hellas” (ICS-FORTH), located in Heraklion, Greece. RDFSuite has been partially supported by the EU projects C-Web (IST-1999-13479), MesMuses (IST-2001-26074) and QUESTION-HOW (IST-2000-28767).

Most development work on RDFSuite was done during the years 1999 and 2000. RDFSuite was the first framework of its kind and paved the way for other RDF projects. The project participants used RDFSuite to do fundamental research in the area of RDF storage [11].

3.1 Architecture of RDFSuite

Three applications form the suite (Figure 1). The Validating RDF Parser (VRP) reads RDF data and validates it. The RDF Schema Specific Data Base (RSSDB) can read the output of VRP and is able to store RDF triples. The RDF Query Language (RQL) parses queries and retrieves data from the RSSDB.

All three applications are loosely coupled and can be used independently of each other.

Figure 1 The ICS-FORTH RDFSuite Architecture

VRP – Validating RDF Parser

The VRP is designed to analyze, validate and process RDF descriptions. It is based on the compiler generator tools CUP/JFlex [33], which distinguishes it from other RDF tools that use XML parsers to read RDF data. The stream-based parsing support of JFlex and the quick LALR grammar of CUP make good performance possible when parsing large volumes of RDF descriptions.

VRP understands embedded RDF in HTML or XML and provides full Unicode support. It is programmed in Java and used as a command line tool.

FORTH adapted RDF Model & Syntax and RDF Schema by adding some restrictions. A property should only be defined once and the domain and range of the property have to be defined. A property can contain either a literal or object value but not both. All constraints are listed in [11].

RDF Schema data is separated from the RDF triple data. The schema data is verified against the RDF Schema specification and the additional constraints defined by FORTH. The RDF data is verified against the ontology information in an RDF(S) schema.


The output of the VRP consists of triples that can be used by other applications, especially to feed the RSSDB.

RSSDB – RDF Schema Specific Data Base

The core of RSSDB consists of two Java modules for loading and updating. The modules implement primitive functions for inserting, deleting and modifying RDF triples. The triples are stored in a relational database, and the Java modules use JDBC to connect to it.

RSSDB creates tables in the relational database. Namespaces, schemas and RDF triples are stored in different tables; the triple table contains only references to the namespace and schema tables. There is a distinction between unary and binary relations. Indices are constructed on the tables to speed up joins and the selection of specific triples.

RQL – RDF Query Language

RQL is a typed language that follows a functional approach and supports generalized path expressions. It features variables for nodes (resources) and edges (properties). RQL relies on a formal graph model. The syntax of RQL has constructs for both schema and data querying, and schema information can be used when searching for triples [23].

The Query Language consists of three modules, first the Parser, analyzing the syntax of queries; second the Graph Constructor, capturing the semantics of queries in terms of typing and interdependencies of involved expressions; and third the Evaluation Engine, accessing RDF descriptions from the underlying database via SQL3 queries.

3.2 Features

Parser Input and Output

The parser has various input formats:
• Embedded RDF in HTML or XML
• Full support of XML Schema Data Types (XSD)

It is able to parse voluminous files because it does not rely on an XML representation in memory and instead implements its own lexical analyser. No XML software needs to be installed. It can produce statistics on validated schemas and resource descriptions, covering time and characteristics (e.g., the distribution of descendants and ancestors).

The standard output format is a stream of triples. An unusual feature of the parser is its Scalable Vector Graphics (SVG) output: the parser can produce a graph visualisation of the RDF model and save it in SVG syntax. These graphics can be used for documentation or other purposes.

Validation

The parser has various validation options. A syntactic validation checks for XML and RDF correctness. When validating RDF statements that are written according to an ontology, the parser checks whether the statements are correct by including the files that contain the ontology; the ontology can be defined in RDF(S) or DAML+OIL. The result of the validation is a list of warnings and errors about:
• Class Hierarchy Loops
• Property Hierarchy Loops
• Domain/Range of SubProperties
• Source/Target Resources of properties
• Types of Resources
• Validation
• Customization of Semantic Validation Constraints
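One of the checks listed above, the detection of class hierarchy loops, can be sketched as a depth-first search over the subclass relation. This is a generic cycle-detection technique shown for illustration only; it is not claimed to be how VRP implements the check, and the function name and input format are assumptions.

```python
def find_hierarchy_loop(subclass_of):
    """Find a loop in a class hierarchy.

    `subclass_of` maps a class name to the list of its direct superclasses.
    Returns the classes on a loop (first class repeated at the end),
    or None when the hierarchy is loop-free.
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, finished
    state = {}
    path = []

    def visit(cls):
        state[cls] = GREY
        path.append(cls)
        for parent in subclass_of.get(cls, []):
            parent_state = state.get(parent, WHITE)
            if parent_state == GREY:
                # The parent is already on the current path: a loop.
                return path[path.index(parent):] + [parent]
            if parent_state == WHITE:
                loop = visit(parent)
                if loop:
                    return loop
        path.pop()
        state[cls] = BLACK
        return None

    for cls in list(subclass_of):
        if state.get(cls, WHITE) == WHITE:
            loop = visit(cls)
            if loop:
                return loop
    return None
```

The same function works for property hierarchy loops if the mapping holds sub-property relations instead of subclass relations.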

Enumerations

RSSDB extends the RDF Schema functionality by adding the enumeration data type. An enumeration is used to restrict a property to values from a predefined set. Enumerations are bound to the string data type. For example, a weekday data type could be restricted to the values “Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday”.
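The weekday example can be expressed as a small validation routine. This is only a sketch of the idea of an enumeration constraint. The function name, the tuple representation of triples and the `ex:` identifiers are hypothetical and do not reflect the RSSDB implementation.

```python
WEEKDAYS = frozenset([
    "Monday", "Tuesday", "Wednesday", "Thursday",
    "Friday", "Saturday", "Sunday",
])


def check_enumeration(triples, predicate, allowed):
    """Return the triples that use `predicate` with a value outside `allowed`.

    Each triple is a (subject, predicate, object) tuple of strings.
    """
    return [
        t for t in triples
        if t[1] == predicate and t[2] not in allowed
    ]


# Hypothetical data: one valid and one invalid weekday value.
data = [
    ("ex:seminar", "ex:weekday", "Monday"),
    ("ex:lecture", "ex:weekday", "Someday"),
    ("ex:lecture", "ex:title", "RDF Storage"),
]
violations = check_enumeration(data, "ex:weekday", WEEKDAYS)
```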

Thesauri

A thesaurus can be integrated with RSSDB. An example is given in which a thesaurus is used to describe the regions of France.

Data types

XML Schema data types are validated when used in RDF data. It is possible to use derived or faceted data types (i.e., Thesaurus, Enumeration).

Meta schemas

User-defined meta-schemas (e.g., DAML+OIL) are supported. Peculiarities of RDF schemas and description bases (e.g., number of classes)

Query functionality

• XML Schema data types (for filtering literal values)
• grouping primitives (for constructing nested XML results)
• arithmetic operations (for converting literal values)
• aggregate functions (for extracting statistics)
• namespace facilities (for handling different schemas)
• metaschema querying (for browsing schemas)
• recursive traversal of class and property hierarchies (for advanced matchmaking)

The RQL query module pushes as much of the query evaluation as possible to the underlying DBMS and makes extensive use of database indices. Other features of the query engine are:
• Results are provided in a generic RDF/XML form.
• XSL/XSLT processing for customized rendering
• Easy to couple with commercial ORDBMSs
• RDF querying APIs (SQL3/C++ functions)
• Easy to integrate with Web application servers
• C++ or Java clients to RQL servers (XDR based, SOAP services)

Debugging

• Incremental loading (detecting changes) of RDF namespaces
• Statistics (loading time, characteristics) of stored schemas and resource descriptions


Focus on Scalability

The design and implementation of the system are optimised for scalability. The size of the database scales linearly with the number of loaded triples. The system was able to handle a test with an Open Directory RDF dump that contained about 6 million triples.

Documentation

The tarball distribution includes an HTML file that describes the installation process. There is no documentation package for download; the documentation consists of several files on the FORTH website and in the distribution. VRP, RSSDB and RQL are documented separately.

There are some example files. RQL uses a museum example similar to the one used by Sesame. The RDFSuite webpage provides a collection of schemas for testing RDFSuite.

A detailed description of RDFSuite can be found in the academic publications of the participating persons and institutions.

The documentation is not exhaustive, and the user may have to search in different places to find the needed information.

Installation

The documentation describes an installation on a Linux or Solaris system. It may be possible to install RDFSuite on a Windows operating system using the Cygwin framework [29], but a Windows installation is not documented and was not tested.

RDFSuite relies on a PostgreSQL RDBMS [31], minimum version 7.1.3. One part of RDFSuite is the SPI-API; it is a source distribution and has to be compiled using the PostgreSQL source. RQL is distributed as C files.

RSSDB is distributed as Java source files and a compiled JAR-file version. Installation of these works according to the documentation. Some file and path instructions in the documentation differ from the actual distribution of RSSDB and the documentation itself is very short.

Requirements

The Validating RDF Parser requires Java 1.4. To install and run RSSDB and RQL:
• Java 1.2
• PostgreSQL v7.1.3 (or higher)
• C/C++ compiler (tested with gcc-2.95.1)

Extensibility

Programming of RDFSuite started in 1999, and despite its age the package is maintained and supports the latest RDF(S) working draft.

The Validating RDF Parser (VRP) is provided as a Java application; all sources are available and distributed. A demonstration application shows most of the features of VRP. Together with the exhaustive Javadoc-generated documentation, this gives a stable foundation for extending the VRP and reusing its classes.

RSSDB can be installed on many Unix or Linux based systems. It is provided as a source distribution and has to be compiled for the target system. The architecture is documented.


RQL is distributed with its source, consisting of core libraries and client applications that use the libraries. RQL provides interfaces for C, Java and SOAP.

There are numerous scientific papers about all three applications of RDFSuite, written by the developers of the software.

License

RDFSuite is distributed under the C-WEB License, which is GPL-compatible, so it is possible to redistribute software based on RDFSuite sources under a GPL license.

4 Sesame

Sesame is an RDF Schema-based Repository and Querying facility. It was a deliverable in the European IST project On-To-Knowledge (EU-IST-1999-10132) and has been developed by the project partner Aidministrator Nederland. It serves as a central tool for RDF storage and querying, and several other deliverables of the project use Sesame. The tools OnToShare, RDF Ferret, Spectacle (also by Aidministrator), IsaViz, OntoEdit and Jena can interact with Sesame [26].

The first publication about Sesame was released on October 1st, 2001, and the development of the server continued as part of the On-To-Knowledge project. On March 1st, 2002 Aidministrator released Sesame as an open source project under the GNU Lesser General Public License (LGPL).

When the whole On-To-Knowledge project ended in October 2002, the foundation “Stichting NLnet” [28] began to support the development of Sesame; NLnet’s policy is to fund open source projects. The project will be continued by NLnet for 20 months, until 31st October, 2003.

Ongoing contributions are made by the Bulgarian company OntoText, which developed the Ontology Management Module of Sesame as part of the On-To-Knowledge project. The company will keep supporting the project by investing more resources into its development.

The project is managed by Arjohn Kampman and Jeen Broekstra, both software developers at Aidministrator. A source distribution is published on SourceForge [27].

Sesame is partly connected to RDFSuite: both were deliverables of a European IST project. The RDF query language RQL [23] of RDFSuite was adapted and is now a part of Sesame. The FORTH institute is not a partner in the On-To-Knowledge project [3], but the researchers meet at the workshops and conferences that emerged around OntoWeb [5] and other projects.

4.1 Architecture of Sesame

Sesame consists of several parts that are clearly separated from each other; together they provide the functionality of the server (Figure 2).


Figure 2 Sesame Architecture

Storage and Inference Layer - SAIL

Sesame can store RDF data in different relational databases. The functional modules do not access the database directly; a Storage and Inference Layer (SAIL) provides an abstract interface to the underlying database.

The SAIL consists of modules. These are responsible for access to persistent storage systems such as an RDBMS or files, and they also contain the inference functionality. A SAIL can be stacked on top of another SAIL, and this stacking makes it possible to add functionality to the server. Calls to stacked SAIL modules are propagated down the stack until the last SAIL handles the actual request; the result is then propagated back up again. A module in the stack can be used for logging or for access restrictions.
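The stacking idea can be sketched in a few lines of Python. Sesame itself is written in Java, and the class and method names below (`MemorySail`, `LoggingSail`, `ReadOnlySail`, `add`, `statements`) are hypothetical; they illustrate the delegation pattern only and are not Sesame's actual SAIL interface.

```python
class MemorySail:
    """Bottom of the stack: keeps statements in an in-memory set."""

    def __init__(self):
        self._statements = set()

    def add(self, s, p, o):
        self._statements.add((s, p, o))

    def statements(self):
        return sorted(self._statements)


class LoggingSail:
    """Stacked layer: records every call, then passes it down the stack."""

    def __init__(self, base, log):
        self._base = base
        self._log = log

    def add(self, s, p, o):
        self._log.append(("add", s, p, o))
        self._base.add(s, p, o)

    def statements(self):
        self._log.append(("statements",))
        # The result travels back up the stack unchanged.
        return self._base.statements()


class ReadOnlySail:
    """Stacked layer: an access restriction that blocks all writes."""

    def __init__(self, base):
        self._base = base

    def add(self, s, p, o):
        raise PermissionError("repository is read-only")

    def statements(self):
        return self._base.statements()
```

Because every layer exposes the same methods, the code on top of the stack does not need to know how many layers sit between it and the storage.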

There are different SAIL implementations for the different database systems: PostgreSQL, MySQL and Oracle databases are supported, and support for DB2 and Poet is under development. All database SAILs are derived from one generic class, and it is possible to create one's own modules based on it.

Functional Modules

All functions that are available to applications using Sesame are separated into functional modules that work independently of each other. Currently five functional modules are available:
• A data administration module for adding and removing data.
• An RDF Export module that can be used to export (parts of) the data in a repository as an RDF document.
• An RQL query engine that can be used to evaluate RQL queries. RQL was created by FORTH [23].
• An RDQL query engine that can be used to evaluate RDQL queries. RDQL is part of Jena [20].
• A versioning module for ontology versioning.

The functional modules access RDF data stored in Sesame through the SAIL modules.

Protocol Handlers

The server publishes its functionality through different protocols. Each protocol has its own handler running in the server. Three protocol handlers are available:
• HTTP
• Java RMI
• SOAP

When a request is accepted by a Protocol Handler, it is forwarded to the Request Router.

Request Router

All requests from outside the server reach the Request Router. This module decides which Functional Module is responsible for the request.
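The path from protocol handler to functional module can be pictured as a simple dispatch table. The sketch below is a hypothetical illustration in Python. The class name, the action names and the handler function are assumptions and do not correspond to Sesame's Java classes.

```python
class RequestRouter:
    """Maps an action name to the functional module responsible for it."""

    def __init__(self):
        self._modules = {}

    def register(self, action, module):
        self._modules[action] = module

    def route(self, action, payload):
        if action not in self._modules:
            raise KeyError("no functional module for action: " + action)
        return self._modules[action](payload)


def http_handler(router, form):
    """A protocol handler: unpack the request, then forward it to the router."""
    return router.route(form["action"], form.get("data"))


# Hypothetical functional modules, reduced to plain functions.
router = RequestRouter()
router.register("export", lambda data: "exported repository")
router.register("query", lambda data: "evaluated: " + data)
```

Handlers for other protocols would unpack their own request format and call the same `route` method, so the functional modules stay independent of the protocol.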

Runtime environment

The server runs as a Java web servlet, which needs a servlet engine as its environment. The free Apache Tomcat server is recommended for use with Sesame.

4.2 Features

RDF Schema

Sesame has basic support for RDF Schema. Classes, properties and inheritance relations are stored in specialised tables in the underlying DBMS. The RQL query language has a syntax that makes it easy to query for classes and properties. The queries are evaluated in accordance with the RDF Schema recommendation.

Data Uploads

RDF datasets can be uploaded in two ways: either by passing the server a URL that points to a document, or by posting the RDF statements as text to the server. Sesame accepts RDF data in RDF/XML or N-Triples syntax.


RQL queries

RQL was created by ICS-FORTH for the RDFSuite. It is a proposal for a declarative language for RDF and RDFS. The language was adapted for Sesame, and the implementation of the query engine differs from the approach in RDFSuite.

A query is parsed and translated into a query model, in which the query is optimized. After that, each node of the resulting tree is evaluated. Every node is translated into a single call to the SAIL. This means that all data needed for the query is loaded first and then evaluated, so the optimizations of the underlying DBMS are not used.

A query contains triples with named variables, and the query engine searches for triples whose values fit the variables.

The result of a query is a list of values that fit the variables of the query. The result format is either HTML, RDF/XML or a special XML notation. A major drawback of Sesame is that the result does not contain the names of the variables; the programmer has to match the result values to the names by using the index of the variable. This may change in the future.
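The matching of variables against stored triples, and the positional nature of the result, can be illustrated with a simplified Python sketch. It handles a single triple pattern with `?name` variables, which is an assumption made for brevity. It is neither RQL syntax nor Sesame's evaluation strategy, and all names and data are hypothetical.

```python
def is_var(term):
    """Variables are written with a leading question mark, e.g. '?artist'."""
    return term.startswith("?")


def match(pattern, triples):
    """Match one (subject, predicate, object) pattern against the triples.

    Returns (variables, rows): the variable names in order of first
    appearance, and one tuple of values per matching triple. The rows
    carry no names, only positions.
    """
    variables = list(dict.fromkeys(t for t in pattern if is_var(t)))
    rows = []
    for triple in triples:
        binding = {}
        matched = True
        for term, value in zip(pattern, triple):
            if is_var(term):
                # A variable used twice must bind to the same value.
                if binding.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            rows.append(tuple(binding[v] for v in variables))
    return variables, rows


data = [
    ("ex:rembrandt", "ex:paints", "ex:nightwatch"),
    ("ex:picasso", "ex:paints", "ex:guernica"),
    ("ex:rembrandt", "ex:name", "Rembrandt"),
]
variables, rows = match(("?artist", "ex:paints", "?work"), data)

# The client has to pair names and values by index itself.
first_row = dict(zip(variables, rows[0]))
```

The last line shows the workaround described above: since the rows are purely positional, the caller keeps the variable order from the query and zips it with each row.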

RDQL queries

Hewlett Packard created the RDQL query language for the Jena framework. It is based on SquishQL and rdfDB [20]. The syntax of RDQL is different from RQL, but the evaluation process is similar.

Data Extraction

Single statements can be extracted from the repository through queries. To extract the whole repository, there is a function that returns all data and schema information; the format is RDF/XML, N-Triples or Notation3.

Data Removal RDF statements can be removed from the repository. The statements are selected by specifying the subject, predicate and object of the statement to be removed. To remove all statements with a known subject, the remove command is called with that subject and an empty predicate and object. Empty values act as wildcards, so one call can delete many statements.
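The wildcard behaviour of the remove command can be illustrated with a short Python sketch. It is a model of the selection logic only, under the assumption that statements are tuples in memory and that `None` stands for an empty value; the function is hypothetical and does not show Sesame's actual API.

```python
def remove(triples, subject=None, predicate=None, obj=None):
    """Return the statements that survive; None means 'match any value'."""
    def selected(triple):
        wanted = (subject, predicate, obj)
        return all(w is None or w == have for w, have in zip(wanted, triple))
    return [t for t in triples if not selected(t)]


# Hypothetical repository content in (subject, predicate, object) order.
repo = [
    ("abraham.jpg", "technique", "Oil on canvas"),
    ("abraham.jpg", "paintedBy", "rembrandt"),
    ("nightwatch.jpg", "paintedBy", "rembrandt"),
]
remaining = remove(repo, subject="abraham.jpg")
```

Calling `remove(repo)` with all three values empty selects every statement, which is why empty values have to be used with care.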

Web Interface There is an HTML interface that allows a user to log on to the server, select a repository and access most of the functions of Sesame. This helps in administration and testing.

RDF Explorer Part of the web interface is an explorer for browsing through a repository. The user enters the URI of a resource and gets all statements that include the resource. These statements are linked, so the user can browse to connected resources. The screenshot in Figure 3 shows the Museum example.

Figure 3: Sesame RDF Explorer

Multiple Repositories One server can handle multiple repositories with different security options. Each repository can use a different SAIL implementation and needs its own database.

PeerToPeer A Sesame server can access repositories on other servers and integrate them. This enables the aggregation of a set of databases. A new PeerToPeer repository is configured with a list of existing repositories; the new repository itself is empty and gathers its data from the other servers. PeerToPeer mode is restricted to read-only access and has no caching. In addition, the repositories are queried sequentially, so the query engine cannot find results that require information from two repositories to be combined.
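The limitation of sequential querying can be made concrete with a small Python model. It is a sketch under stated assumptions, not Sesame code: repositories are lists of tuples, a query is a list of triple patterns that share variables, and all names and data are hypothetical.

```python
def solve(patterns, triples, binding=None):
    """Yield variable bindings that satisfy all patterns within one triple set."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        new = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                if new.setdefault(term, value) != value:
                    break  # variable already bound to a different value
            elif term != value:
                break      # constant does not match
        else:
            yield from solve(rest, triples, new)


repo_a = [("abraham.jpg", "paintedBy", "rembrandt")]
repo_b = [("rembrandt", "bornIn", "Leiden")]
query = [("?painting", "paintedBy", "?artist"), ("?artist", "bornIn", "?city")]

# Sequential evaluation: each repository answers the whole query on its own.
sequential = [b for repo in (repo_a, repo_b) for b in solve(query, repo)]
# Combined evaluation: the join may span both repositories.
combined = list(solve(query, repo_a + repo_b))
```

In this model `sequential` stays empty, because neither repository alone contains both statements, while `combined` finds the painting together with the birthplace of its artist.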

Ontology Middleware Module The Ontology Middleware Module (OMM) is an enterprise back-end for formal knowledge management. It extends the inference capabilities of Sesame and enables the use of RDF(S) and ontology languages that are structurally compatible with it, such as DAML+OIL and OWL. In addition, OMM provides a number of middleware features such as versioning, change tracking, fine-grained access control, meta-information, and more interfaces (Built-In, RMI, SOAP).

OMM is an extension of Sesame and is available as a development of the Ontotext project [4].

Security User rights can be defined on a “per repository” level. Read and write access is configurable. Users and rights are administered in the configuration file, which has to be edited manually.

Configuration All configuration data of the server is stored in one XML file, “system.conf”. The settings for repositories, users, rights and the underlying DBMS can be changed in this file. The JDBC driver connection strings for the underlying databases are also specified here. There are extensive examples of how to edit this file.

Documentation A “user guide” document covers installation, getting started, using the features of Sesame and connecting to the server. Some chapters of the “user guide” contain only a “to be done” note, but the document is very useful for getting the server running.

A separate description of RQL gives an introduction and examples of its use. Aidministrator provides some further documents about the query languages, as well as fact sheets. For the software itself there is JavaDoc documentation of all classes of the server and client applications; it is needed for extending Sesame or for a deep understanding of how the server works. Finally, there is a scientific paper about Sesame that was written for the On-To-Knowledge project.

Installation Sesame relies on Tomcat as its servlet engine, which has to be installed. Tomcat is part of the Apache Jakarta project and is freely available.

A DBMS that is compatible with Sesame also has to be available; MySQL is a common database and is easily installed. Sesame needs a user account and at least one database on the DBMS. Each repository needs a new database. The JDBC driver for the DBMS is also required.

Requirements Sesame needs:
- Java2 runtime
- Tomcat 4 or another servlet engine that implements Servlet 2.2 and JSP 1.1
- Optionally a database system: MySQL, PostgreSQL, or any JDBC compliant RDBMS

Extensibility There are many ways to use and extend Sesame. It was designed to be adapted and extended in the On-To-Knowledge project.

The basic way to use Sesame is to install it on a Tomcat server with a supported database system such as MySQL. An application can then access Sesame through its protocols to store and retrieve data.

Application developers can install a Sesame server and integrate their own modules into it. OMM is an example of how a SAIL was used to extend the functionality. Protocol Handlers can also be added easily to support more protocols, and new Functional Modules can be integrated to extend the functionality.

Because of the structured architecture of Sesame and its Open Source distribution, it is possible to build a custom runtime environment or to integrate the whole Sesame functionality into one's own application servers.

5 RDF Gateway

The company Intellidimension [12], located in Windsor, Vermont (USA), created a commercial RDF platform called RDF Gateway. The focus is on ease of use and interoperability. The framework is restricted to the Microsoft Windows platform; a version for Linux or other operating systems is not planned.

Development of RDF Gateway started with the foundation of Intellidimension in June 2000. The first beta testing took place in 2001. The programmers promoted and discussed the features of the system in public discussion forums of the W3C. Version 1.0 was commercially released on March 3rd, 2003, and the product was announced in public forums the next day. To my knowledge it is the first RDF product that was created in a private company and not, like Ontobroker, within a public project or university.

As it is a commercial product, a license is required. Free licenses for academic purposes and for developers who support packages for RDF Gateway are available.

5.1 Architecture of RDF Gateway

RDF Gateway is a lightweight and fast server that combines the feature sets of a database management system and a web server. It is designed as a platform for gathering, querying, transforming and delivering RDF data. [Figure 4]

A Script Processor is the centrepiece of RDF Gateway. All functions of RDF Gateway can be accessed through its scripting language, whose syntax is similar to JavaScript. The language includes some extensions to run queries on the database engine and provides objects to access other functions of the server.

RDF Gateway uses its own database format and driver to store RDF data. All configuration settings are also stored in this RDF Database, making it easy to access and change the configuration.

A web server is included in RDF Gateway. It serves HTML pages with embedded RDFQL script, similar to other common dynamic page technologies such as ASP or PHP.

Applications that use the server can be programmed completely in RDFQL Script and hosted on the web server part of RDF Gateway. Another common way to interact with RDF Gateway is to send RDFQL Script queries over ADO or JDBC; RDF triples can be edited and queried in this way. Code packages can be integrated with the server, and the server can be integrated into applications in many other ways, including COM and other technologies.

RDF Gateway uses the work of some open source projects. The RDF parser Raptor by David Beckett [19] is used for parsing input RDF statements. XML is parsed with Expat [30], and XMP by Adobe [32] enables access to metadata embedded in files.

Figure 4: Architecture of RDF Gateway

RDFQL Script Processor The RDFQL script processor is a preemptive virtual machine that compiles, caches, and executes RDFQL scripts. RDFQL is a server-side scripting language based on ECMAScript (JavaScript). RDFQL integrates SQL-like query extensions to provide easy access to RDF Gateway's deductive database engine. The RDFQL script processor allows pages to contain a combination of script and static content, similar to Microsoft Active Server Pages (ASP). Server functionality is exposed to RDFQL scripts through a library of intrinsic objects (Server, Session, Request, Response, ...).

Database Engine RDF Gateway has a deductive database engine that was designed from scratch to support RDF. It performs a bottom-up query evaluation that is federated across all specified data sources. The logical inference capabilities of the engine provide support for RDFQL's declarative rule syntax. The deductive database engine implements its own native file persistence with full-text search. The database engine does not access an external data management system.

Data Service Interface The data service interface allows external data sources to be integrated with RDF Gateway. A data service provider is a module that implements this interface and exposes the contents of a specific type of data source as RDF data. RDFQL allows federated queries to be performed over multiple data services. This open interface makes it possible to use any of the currently available data service providers or develop a custom provider for a data source.

Authentication/Security RDF Gateway has a security model based on rights and permissions to control access to server and database resources. RDF Gateway supports its own users and roles as well as NT users and groups. An NT user is always authenticated using the NT credentials for the account. RDF Gateway's support for NT users and groups makes external security administration possible.

Network IO The network interface supports both HTTP and a proprietary TCP/IP based protocol. The network IO layer supports secure network authentication schemes such as NT Challenge/Response (NTLM). A client connects to the server through this interface.

Package Management RDF Gateway allows complete applications to be developed and deployed as packages. A package consists of RDF server pages, HTML pages, images or any other file type.

Component Management RDFQL supports COM in its server-side scripts. This allows the functionality of RDF Gateway to be extended and applications to be integrated with RDF Gateway.

Session Management Session management allows user state to be maintained on the server.

5.2 Features

Representation of RDF triples in tables The RDBMS paradigm of storing data in tables was adapted to store RDF triples. The data model of a table is a triple that contains predicate, subject and object. Table columns do not have names but always contain these three items in the noted order. Note that the predicate is the first item. There is an optional fourth column for storing metadata about the triple; this metadata is called the “context” of the triple. The context field can store a resource identifier that can be used to handle security issues, to identify the source of the triple, or for any custom functionality.
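The table layout described above can be modelled as a small Python data structure. This is only a sketch of the data model, assuming that a row is a tuple in the order predicate, subject, object with an optional context; the class, the helper function and the shortened second row are hypothetical and say nothing about the proprietary storage format.

```python
from typing import NamedTuple, Optional


class Row(NamedTuple):
    predicate: str                 # note: the predicate comes first
    subject: str
    obj: str
    context: Optional[str] = None  # optional fourth column


museum = [
    Row("http://www.icom.com/schema.rdf#technique",
        "http://www.artchive.com/rembrandt/abraham.jpg",
        "Oil on canvas",
        context="http://www.artchive.com/"),
    # Hypothetical row without a context value.
    Row("paintedBy", "http://www.artchive.com/rembrandt/abraham.jpg", "rembrandt"),
]


def from_source(table, context):
    """Select the rows whose context names the given source."""
    return [row for row in table if row.context == context]


artchive_rows = from_source(museum, "http://www.artchive.com/")
```

Using the context as the source of a statement, as in `from_source`, is one of the uses named in the text; security checks can be built on the same column.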

Other data sources External data sources and native databases that are accessed from the server are wrapped in datasource objects. A datasource object has the same structure as a table, containing triples in rows. There is support for in-memory tables, and it is possible to create custom wrappers for external data.

Databases Storage of tables is partitioned into databases. A server can contain different databases, and a table has to be created in a database. The database format is a proprietary file format; each database is stored in one file.

RDFQL script language The scripting language is based on ECMAScript, commonly known as JavaScript. The following concepts are implemented:
- Functions
- Variables and arrays
- Loops and the if-statement
- Exception handling
- Importing of other script files
- Comments
- RDF Gateway specific statements

The statements for RDF Gateway include every aspect of the server and enable the programmer to access all functionality. An example is the server configuration tool which is a website written in RDFQL that is interpreted by the integrated web server and allows access to all server objects like tables, databases, users, packages.

To navigate through RDF triple datasets, an RDF node object is provided. It collects all predicates and subjects of a given object and makes it possible to change the values of subjects.

To run queries on the server, a set of database commands is available. The database commands are embedded in RDFQL script, an approach commonly known from SQL commands embedded in C source files that are processed by a precompiler.

Access to ActiveX and COM Objects is enabled through an ActiveXObject language construct.

If RDFQL script is evaluated in the web server context, objects containing session, request and response data are provided.

Adding and retrieving data Data manipulation commands are similar to SQL syntax commands. The functionality is extended for RDF specific needs. There are INSERT, SELECT and DELETE statements. The statements use variables to bind data, similar to the RQL language used by RDFSuite.

INSERT {[http://www.artchive.com/]
        [http://www.icom.com/schema.rdf#technique]
        [http://www.artchive.com/rembrandt/abraham.jpg]
        'Oil on canvas'}
INTO museum;

This example shows how to insert a triple into the table “museum”. The triple is written between curly brackets and contains four values:
- Context
- Predicate
- Subject
- Object or literal

The semantic information of this triple is that the specified painting “abraham.jpg” was made in the technique “Oil on canvas” and that this information is taken from “www.artchive.com”.

SELECT ?a, ?b, ?c USING museum WHERE {?a ?b ?c} AND ?c LIKE "Oil";

To retrieve triples from a table, a SELECT statement is used. The example retrieves all triples that contain the term “oil” in a literal object value. Note that the triple between curly brackets contains only three values; the context is omitted. Data can be taken from external data sources or transferred from one table to another.

var doc = new DataSource("inet?url=file://c:/Museum.xml&parsetype=rdf");
SELECT ?a, ?b, ?c USING #doc WHERE {?a ?b ?c};
INSERT {?p ?s ?o} INTO museum USING #doc WHERE {?p ?s ?o};

In this example, RDF data is read from a file and inserted into the table museum. Note that in RDFQL, JavaScript code is mixed with SQL-like code: the JavaScript variable “doc” is used in the database commands as “#doc”.

Built-in Webserver RDF Gateway has a built-in web server. The configuration and management interface is published as a website. Application developers can create websites with this web server, using the RDFQL script language. This feature can be used for debugging and development but also to build entire web applications using RDF Gateway. Given that it is possible to use ActiveX objects through RDFQL, the web server is a powerful feature.

RDF Query Analyzer RDFQL statements and queries can be created using this visual application. [Figure 5] The query analyzer is similar to the query tools of popular SQL servers. Complex scripts can be authored here and used in web pages or other applications. Queries can be evaluated against a local or remote RDF Gateway server. The text editor has syntax highlighting and can save and open queries.

Inference Engine The RDF Gateway database engine contains an inference engine. New RDF triple statements can be generated dynamically, based on inference rules and the existing triples. Functions can be defined that extract data from the database based on rules. The rules are defined in RDFQL script language and can be used in data manipulation commands.

RULEBASE schema
{
  INFER {[rdf:type] ?s ?class}
  FROM {[rdf:type] ?s ?subclass}
  AND {[rdfs:subClassOf] ?subclass ?class};
};

SELECT ?p ?s ?o USING #ds RULEBASE schema
WHERE {[rdf:type] ?s ?o} AND {?p ?s ?o};

This example defines the RDF Schema rule about subclasses: if a subject is of type X and X is defined as a subclass of Y, then the subject is also of type Y. The rule is then used in a SELECT statement to retrieve all classes and derived classes of all subjects.
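The effect of the subclass rule can be reproduced with a minimal bottom-up evaluation in Python. The sketch assumes an in-memory set of rows in the order predicate, subject, object and applies the rule until no new triple is derived; it illustrates the idea of the rule and is not a description of how the RDF Gateway engine is implemented. The function name and the sample data are hypothetical.

```python
RDF_TYPE, SUBCLASS_OF = "rdf:type", "rdfs:subClassOf"


def infer_types(triples):
    """Apply the subclass rule repeatedly until no new triple appears."""
    facts = set(triples)  # rows are (predicate, subject, object)
    while True:
        new = {
            (RDF_TYPE, s, cls)
            for (p1, s, sub) in facts if p1 == RDF_TYPE
            for (p2, sub2, cls) in facts if p2 == SUBCLASS_OF and sub2 == sub
        } - facts
        if not new:
            return facts
        facts |= new


# Hypothetical sample data.
data = [
    (RDF_TYPE, "abraham.jpg", "Painting"),
    (SUBCLASS_OF, "Painting", "Artifact"),
    (SUBCLASS_OF, "Artifact", "Resource"),
]
closure = infer_types(data)
```

Because the rule is applied again to the triples it has just produced, the painting is also typed with classes that are reachable only over several subclass steps.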

RDF Schema is not supported natively by RDF Gateway; it has to be described in inference rules.

Figure 5: RDF Query Analyzer

Client Libraries RDF Gateway has client drivers for Microsoft ADO and Sun Microsystems JDBC. This allows RDF Gateway to support a wide variety of clients such as web browsers, Windows applications, Java applications, and XML or RDF based clients.

Security When accessing RDF Gateway through HTTP, ADO or other protocols, the user has to be identified by username and password. An anonymous user account is provided for public access.

The security system uses both the Windows security database, to authenticate Windows users, and an internal user database. Together with Internet Explorer, NT Authentication can be used over HTTP.

Every item that is managed by RDF Gateway can be restricted to defined users. These items include packages, tables, data sources and components. On table level it is possible to modify read, write and delete rights for individual users.

A row-based security concept for RDF statements in tables is based on the context column, the additional fourth field next to subject, predicate and object. A user can be given read, write and delete rights for a specific context.
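The row-based concept can be pictured as a filter on the context column. The Python sketch below assumes rows of the form (predicate, subject, object, context) and a simple mapping from user names to the contexts they may read; the function, the user names and the rule that rows without a permitted context stay hidden are assumptions made for illustration, not documented behaviour of RDF Gateway.

```python
def visible_rows(table, user, read_rights):
    """Keep only rows whose context the user is allowed to read."""
    allowed = read_rights.get(user, set())
    # Assumption: a row whose context is not listed for the user is hidden.
    return [row for row in table if row[3] in allowed]


# Hypothetical table content and rights.
table = [
    ("technique", "abraham.jpg", "Oil on canvas", "http://www.artchive.com/"),
    ("price", "abraham.jpg", "confidential", "internal"),
]
read_rights = {"anonymous": {"http://www.artchive.com/"},
               "curator": {"http://www.artchive.com/", "internal"}}

public_view = visible_rows(table, "anonymous", read_rights)
```

Write and delete rights could be modelled in the same way with one mapping per kind of right.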

There is no support for user groups.

Configuration and Management The detailed configuration settings are accessed through a web interface that is hosted by the built-in web server. The user has to log on using a Windows administrator account. The web application is labelled “RDF Gateway Management Utility” and provides access to Databases, Tables, Users, Contexts, ActiveX Components, Data Services, Roles, Packages, MimeTypes and Timers. For most of the items, permissions can be set.

The management utility is implemented as an RDF Gateway web package.

Documentation The documentation consists of the Intellidimension website and a Windows help file in the distribution; both are based on the same data. There is a brief description of the architecture and internals of RDF Gateway. The manual contains a brief introduction, a developer's guide that explains some examples together with best practice solutions, and an exhaustive developer's reference. The manual contains only a limited number of code examples.

Problems or deeper questions can be posted on a user mailing list, where developers of Intellidimension answer questions. There is also email support. The company is currently able to respond to emails within a day, but this might change when the product is used by a wider public.

Small Binary, Easy Installation The trial version is distributed as a 6 MB executable that contains an InstallShield installer for Windows. Setting up the server takes a few minutes. The whole server, including all described features, is contained in a lean 1.2 MB binary. It uses some dynamic link libraries (DLL) to connect to other data sources like web pages and POP email accounts.

Requirements RDF Gateway can be installed on Windows XP, 2000 and NT.

Extensibility RDF Gateway is provided as a binary distribution. The source cannot be changed. The product is not designed for reprogramming; it provides a framework for applications that can be built on top of RDF Gateway.

The easiest way to build applications is using the RDFQL scripting language to create websites. RDFQL can also be used from within the various interfaces like ADO or JDBC and this is the proposed way to build applications using RDF Gateway.

It is also possible to extend the functionality of RDF Gateway by adding components to the server. It can be extended with ActiveX components. Custom data sources can be accessed through DLL libraries that implement a defined interface. Both ActiveX components and custom data sources can be managed through the web-based management utility.

Bugs RDF Gateway was first released in March 2003. The evaluated version has been in public use for two months and contains some minor bugs. On the German language version of Windows Professional the security authentication system did not work properly. Some other internationalisation and Unicode related issues remain: the software has major problems with German umlauts and other special characters. These bugs will probably be removed in the next releases.

Keeping in mind the complex and numerous features and the fact that the release was only two months ago, these bugs are minimal.

6 Other RDF Databases

There are other RDF databases available at the moment. Fortunately, the community of RDF developers is still small, so it is possible to get an overall view of the projects that are in progress. The following systems show major directions that can be found today.

OntoBroker Ontobroker started as an academic project in 1997 at the AIFB (Institut für Angewandte Informatik und Formale Beschreibungsverfahren) at the University of Karlsruhe, Germany. The basic idea was to enrich parts of the WWW with semantic information and allow “intelligent” knowledge retrieval [16, 17, 18].

There are many tools around Ontobroker, like OntoEdit, which has a plugin for Sesame.

Ontobroker is sold in a high price class. There are no free trial versions, and academic users can work with a special academic distribution that also has a respectable price.

The main functionality of Ontobroker is to infer new information based on rules and data that are entered into the system. The input is either RDF or F-Logic (a Datalog/Prolog like syntax) or a combination of both. It can be used as a command line tool that reads input data and a query and writes the result of the query. All data and rules can be held in main memory.

To be used as a repository, Ontobroker runs in a server mode and stores the fact base in an external database. Today MySQL and MS-SQL can be used by the server.

Storing with Jena and Joseki The Jena [13] Toolkit is an RDF framework and has widespread use in academic and commercial projects. It is developed by the Hewlett-Packard Laboratories in Bristol, UK. Jena is not a server but a framework for RDF. Other similar frameworks also exist; I decided to list Jena here as an example of them. It is based on Java, is open source and widely used, and is developed by a company that can ensure the future of the framework.

HP published a Jena-based server called Joseki [14] that can store RDF triples and answer RDQL queries through an HTTP interface.

Jena works with RDF triples that are represented as Java objects and reside in main memory. A collection of RDF triples is represented as a model, and one application can work with different models. Jena does not directly support RDF Schema but has a DAML+OIL engine that allows the use of ontologies.

It is widely used for storage because applications often use Jena’s RDF API to represent RDF data in main memory and do not need a dedicated server application.

Empolis Semantic Web Server This is an example of a Semantic Web application that had good ideas and started at the right moment but is no longer developed.

On the homepage of SemanticWebServer it is possible to download version v1.1, dated March 13th, 2002. The product has not changed since then and the homepage is also frozen. The company that created SemanticWebServer, Empolis, is focused on knowledge management and may use SemanticWebServer in its own projects [24].

7 Comparison

In this chapter the features of all servers are listed in a chart to compare the systems. The chart is followed by a discussion of the individual advantages and operational area of each product. [Figure 6] The basic features of all three servers are the same. Permanent storage and retrieval of RDF triples is the core functionality of an RDF repository. The ability to create new information based on existing data in combination with ontologies or rules is also included in all servers.

Each repository was created for a different goal and each provides special features. A rating that compares the applications to each other is not useful – instead, a developer has to find the repository that fits the needs of a given project best.

RDF Suite   Sesame   RDF Gateway

Client Interfaces
Java RMI: ✓ ✓
HTTP: ✓ ✓
SOAP: ✓ ✓
ADO: ✓
JDBC: ✓

Input Formats
RDF/XML: ✓ ✓ ✓
HTML-embedded: ✓
N-Triple: ✓ ✓

Features
RDF-Schema: ✓ ✓
DAML+OIL: ✓ ✓
Ontology Validation: ✓
RQL Query: ✓ ✓
Other Query: RDQL (1), RDFQL (2)
Inference engine: ✓ ✓ ✓ (3)
Webserver: ✓ (4) ✓
Thesauri: ✓

Databases supported: PostgreSQL JDBC compliant (5); PostgreSQL, MySQL, Oracle, + more to come; based on own database
Extendable: complicated; Java Modules; ActiveX, DLL
Open Source: ✓ ✓
Easy Installation: ✓ ✓
Security: Users, server level; Users, repository level; Users, fine grained, Win compatible, easy administration
Operating System: Unix; Java based; Windows 32
License Model: GPL compatible; LGPL; Commercial

(1) RDQL is by Jena.
(2) RDFQL is proprietary.
(3) Has to be customized.
(4) Only a basic webserver.
(5) No example is given for JDBC.

Figure 6: Comparison Chart

RDF Suite RDF Suite was created in a scientific community and was used as a testbed for the first RDF storage concepts. A focus of RDF Suite is high performance: each of the three modules can handle large volumes of RDF descriptions.
- The parser VRP uses its own lexical analyser and can handle data of any size in an efficient way
- For RSSDB, different database layouts were tested for the best performance
- The query engine translates the queries to database queries, which use all optimizations PostgreSQL can offer

The server was created in a Unix environment and will work best on this operating system.

The validation services provide the best and widest features of all tested products, and the validator has the most detailed output available today.

The included documentation needs to be studied very deeply to understand how to install, use and change the server.

RDF Suite has its strength in projects that handle large amounts of data. The system was tested with the data of the Open Directory Project, which included 6 million triples, and it is highly scalable. The fast and validating parser can handle a project that integrates many voluminous data sources, such as a knowledge portal.

Sesame Like RDF Suite, Sesame was developed in a scientific research project. The repository was designed to be the basis of many different applications that were built in the same project.

Sesame can be used on different platforms. It is written completely in Java and connects to common products like Tomcat and MySQL. There are already many other projects that use Sesame; these serve as examples of how to build complex and distributed Semantic Web applications. Its proven strengths are:
- Easy installation
- Platform independence
- Easily extended functionality
- Support for common access protocols
- Widespread use

A major drawback of Sesame is the performance when it comes to large volumes of RDF triples. The modular architecture of Sesame does not support the optimization of queries to fit the underlying database, but this may come in the future.

The security system is still basic and to have a fine grained security, some modules like OMM have to be installed or programmed.

Sesame can be easily used for small projects and it may be modified for many other uses. It can be used as a repository for Semantic Web tools.

RDF Gateway The first commercial RDF storage system focuses on the ability to build systems with it. Intellidimension provides ideas and examples of how to use the server in one's own projects. The server is easy to use and to install.

RDF Gateway is restricted to the Windows platform. It can include ADO data sources and connect to other platforms through modules. Clients can access RDF Gateway by using ADO, JDBC and HTTP, so the server is able to support a heterogeneous client environment.

It is easy to create basic Semantic Web applications that integrate data sources and distribute information using the built-in features:
- Easy installation
- No dependence on other libraries
- Web server
- Scripting language
- ActiveX integration
- Clients can connect through various interfaces
- Native RDF storage, fast processing of queries

A good example of the use of RDF Gateway could be a knowledge portal tool for a small business intranet. In a small company, different systems can be integrated and the information published by RDF Gateway. A website can be written and tested in a short time. The server integrates the Windows NT authentication system and supports the NT challenge/response functionality of Internet Explorer, so RDF applications can handle existing intranet users and their access rights.

Building larger applications is restricted by the table and database architecture. Every query is bound to a table.

The inference engine does not know RDF Scheme and uses the proprietary format of inference rules. A special rulebase command has to be included in every query so that the server runs inferences when evaluating the query. The lack of native ontology support and the missing validation functionality make it hard to integrate new ontologies and data files from unknown sources.

For building ad hoc applications and small solutions on the Windows platform, RDF Gateway is perfect, for other projects some customising has to be done.

Who will provide RDF storage systems?

European IST projects are one basis for Semantic Web development. These projects are meant to support companies in creating and selling the technology that the Semantic Web needs.

Ontoprise [18] was founded in 1999 as a spin-off company of the University of Karlsruhe. It benefits from the experience gained in many projects by the AIFB group at the university. Its OntoBroker software was converted from an open-source development into a commercial product.

Companies such as Ontoprise that emerge from an academic environment will be able to provide solutions.

As noted in the beginning, commercial developments will follow. In particular, the important software industry in the United States will bring its own products to the market. Hewlett-Packard has created the JENA framework and may build commercial applications.

One company that is known for its ability to integrate its numerous products and make them work together will surely play a major role when it publishes its own view of the Semantic Web: Microsoft.

References

Semantic Web

1. DAML http://www.daml.org/

2. Tim Berners-Lee. Notation 3 http://www.w3.org/DesignIssues/Notation3.html

3. On-To-Knowledge EU Project: Content-driven Knowledge-Management Tools through Evolving Ontologies http://www.ontoknowledge.org/

4. Ontotext: a Sirma laboratory for Knowledge and Language Engineering http://www.ontotext.com/

5. Ontoweb Project http://ontoweb.aifb.uni-karlsruhe.de/

6. RDF http://www.w3.org/RDF/

7. Frank Manola, Eric Miller (editors). RDF Primer. http://www.w3.org/TR/rdf-primer/

8. Dan Brickley, R.V. Guha, (editors). RDF Vocabulary Description Language 1.0: RDF Schema http://www.w3.org/TR/rdf-schema/

9. Dave Beckett (editor). RDF/XML Syntax Specification http://www.w3.org/TR/rdf-syntax-grammar/

10. W3C Semantic Web Activity http://www.w3.org/2001/sw/

RDF & Storage

11. S. Alexaki, V. Christophides, G. Karvounarakis, D. Plexousakis, K. Tolle. The ICS-FORTH RDF Suite: Managing Voluminous RDF Description Bases. 2nd International Workshop on the Semantic Web (SemWeb'01), in conjunction with the Tenth International World Wide Web Conference (WWW10), pp. 1-13, Hong Kong, May 1, 2001

12. Intellidimension: Delivering a Platform for the Semantic Web http://www.intellidimension.com/

13. JENA: Semantic Web Toolkit http://www.hpl.hp.com/semweb/jena.htm

14. Joseki: The Jena RDF Server http://www.joseki.org/

15. Ontology Middleware Module http://www.sirma.bg/OntoText/omm/index.html

16. Ontobroker http://ontobroker.semanticweb.org/ http://www.ontoprise.de/products/ontobroker_en

17. Stefan Decker, Michael Erdmann, Dieter Fensel, Rudi Studer. Ontobroker: Ontology Based Access to Distributed and Semi-Structured Information. In R. Meersman et al. (editors): Semantic Issues in Multimedia Systems. Proceedings of DS-8. Kluwer Academic Publisher, Boston, 1999, pp. 351-369.

18. Ontoprise http://www.ontoprise.com/

19. Dave Beckett. Raptor RDF Parser Toolkit http://www.redland.opensource.ac.uk/raptor/

20. RDQL: RDF Data Query Language http://www.hpl.hp.com/semweb/rdql.htm

21. RDFSuite related publications http://athena.ics.forth.gr:9090/RDF/publications/index.html

22. RDF Suite http://athena.ics.forth.gr:9090/RDF/

23. G. Karvounarakis, A. Magkanaraki, S. Alexaki, V. Christophides, D. Plexousakis, M. Scholl, K. Tolle. RQL: A Functional Query Language for RDF. To be published in Functional Approaches to Computing With Data, P.M.D. Gray, L. Kerschberg, P.J.H. King, A. Poulovassilis (editors), LNCS Series, Springer-Verlag http://139.91.183.30:9090/RDF/publications/FuncBook.pdf

24. Semantic Web Server http://www.semanticwebserver.com

25. Sesame http://sesame.aidministrator.nl/

26. Jeen Broekstra, Arjohn Kampman. Sesame: A generic Architecture for Storing and Querying RDF and RDF Schema. Deliverable 10 in the EU-IST On-To-Knowledge Project

27. Sesame on Sourceforge http://sourceforge.net/projects/sesame/

28. Stichting NLnet Foundation http://www.nlnet.nl/foundation/ http://www.nlnet.nl/project/sesame/description.html

Other

29. Cygwin http://www.cygwin.com/

30. James Clark. Expat XML Parser Toolkit http://www.jclark.com/xml/expat.html

31. PostgreSQL http://www.postgresql.org/

32. XMP – Extensible Metadata Platform http://partners.adobe.com/asn/tech/xmp/index.jsp

33. CUP/JFlex http://www.jflex.de/