1 ios press bringing relational databases into the semantic web: … · 2012. 10. 30. · 2 d.e....

Semantic Web 0 (0) 1–41 1IOS Press

Bringing Relational Databases into theSemantic Web: A SurveyEditor(s): Jie Tang, Tsinghua University, ChinaSolicited review(s): Zi Yang, Carnegie Mellon University, USA; Yuan An, Drexel University, USA; Ling Chen, University of Technology,Sydney, Australia; Juanzi Li, Tsinghua University, China

Dimitrios-Emmanuel Spanos a,∗, Periklis Stavrou a and Nikolas Mitrou a

a National Technical University of Athens, School of Electrical & Computer Engineering, 9, Heroon Polytechneioustr., 15773 Zografou, Athens, GreeceE-mail: {dspanos,pstavrou}@cn.ntua.gr,[email protected]

Abstract. Relational databases are considered one of the most popular storage solutions for various kinds of data and they havebeen recognized as a key factor in generating huge amounts of data for Semantic Web applications. Ontologies, on the other hand,are one of the key concepts and main vehicle of knowledge in the Semantic Web research area. The problem of bridging the gapbetween relational databases and ontologies has attracted the interest of the Semantic Web community, even from the early yearsof its existence and is commonly referred to as the database-to-ontology mapping problem. However, this term has been usedinterchangeably for referring to two distinct problems: namely, the creation of an ontology from an existing database instanceand the discovery of mappings between an existing database instance and an existing ontology. In this paper, we clearly definethese two problems and present the motivation, benefits, challenges and solutions for each one of them. We attempt to gatherthe most notable approaches proposed so far in the literature, present them concisely in tabular format and group them under aclassification scheme. We finally explore the perspectives and future research steps for a seamless and meaningful integration ofdatabases into the Semantic Web.

Keywords: Relational Database, Ontology, Mapping, OWL, Survey

1. Introduction

Over the last decade and more, the Semantic Web(SW) has grown from an abstract futuristic vision,mainly existing in the head of its inspirer, Tim Berners-Lee, into an ever approaching reality of a global web ofinterlinked data with well-defined meaning. Standardlanguages and technologies have been proposed andare constantly evolving in order to serve as the build-ing blocks for this “next generation” Web, relevanttools are being developed and gradually reaching ma-turity, while numerous real world applications alreadygive an early taste of the benefits the Semantic Webis about to bring in various domains, as diverse as life

*Corresponding author. E-mail: [email protected].

sciences, environmental monitoring, cultural heritage,e-Government and business process management. Thisevident progress is the result of years-long researchand it comes as no surprise that, nowadays, Seman-tic Web is perceived as a multidisciplinary researchfield on its own, combining and gaining expertise fromother scientific fields, such as artificial intelligence, in-formation science, algorithm and complexity theory,database theory and computer networks, to name a few.

The participation of databases and their role in thisevolving Web setting has been investigated from thevery beginning of the Semantic Web conception, notonly because it was initially compared to “a globaldatabase” [23], but also because this – new at thetime – research field could take advantage of the greatexperience and maturity of the database field. How-ever, the collaboration and exchange of ideas between

0000-0000/0-1900/$00.00 c⃝ 0 – IOS Press and the authors. All rights reserved

2 D.E. Spanos et al. / Bringing Relational Databases into the Semantic Web: A Survey

these two fields was not unidirectional: the databasecommunity quickly recognized the opportunities aris-ing from a close cooperation with the Semantic Webfield [113] and how the latter could offer solutions tolong-standing issues and provide inspiration to sev-eral database subcommunities, interested in heteroge-neous database integration and interoperability, dis-tributed architectures, deductive databases, conceptualmodelling and so on.

Attempts to combine these two different worldsoriginally focused on the reconciliation of the discrep-ancies among the two most representative and domi-nant technologies of each world: relational databasesand ontologies. This problem is also known as thedatabase to ontology mapping problem, which is sub-sumed by the broader object-relational impedancemismatch problem and is due to the structural differ-ences among relational and object-oriented models.Correspondences between the relational model and theRDF graph model, which is a key component of the Se-mantic Web, were also investigated and a W3C Work-ing Group1 has been formed to examine this issue andpropose related standards. Nevertheless, the definitionof a theoretically sound mapping or transformation be-tween the mentioned models is not an end on its own.The motivation driving the consideration of mappingsamong relational databases and Semantic Web tech-nologies is multifold, leading to separate problems,where mappings are discovered, defined and used in adifferent way for each problem case.

Originally, database systems were considered by theSemantic Web community as an excellent means forefficient ontology storage [19], because of their knownand well evidenced performance benefits. This consid-eration has led to the development and production ofseveral database systems, especially optimized for thepersistent storage, maintenance and querying of SWdata [47]. Such systems are informally known as triplestores, since they are specifically tailored for the stor-age of RDF statements, which are also referred to astriples. This sort of collaboration between database andSemantic Web specifies a data and information flowfrom the latter to the former, in the form of populationof specialized databases, that often have some prede-fined structure, with SW data.

A different, perhaps more interesting research linetakes as starting point an existing and fully functional

1RDB2RDF Working Group: http://www.w3.org/2001/sw/rdb2rdf/

relational database and seeks ways to extract informa-tion and render it suitable for use from a Semantic Webperspective. In this case, motivation is shifted fromthe efficient storage and querying of existing ontolog-ical structures to problems, such as database integra-tion, ontology learning, mass generation of SW data,ontology-based data access and semantic annotation ofdynamic Web pages. These problems have been inves-tigated in the relevant literature, each one touching ona different aspect of the database to ontology mappingproblem. Unfortunately, this term has been freely usedto describe most of the aforementioned issues, creatingslight confusion regarding the goal and the challengesfaced for each one of them. Hence, in this paper, wetake a look at approaches that do one or more of thefollowing:

– create from scratch a new ontology based on in-formation extracted from a relational database,

– generate RDF statements that conform to one ormore predefined ontologies and reflect the con-tents of a relational database,

– answer semantic queries directly against a rela-tional database, and

– discover correspondences between a relationaldatabase and a given ontology

We attempt to define and distinguish between theseclosely related but distinct problems, analyze meth-ods and techniques employed in order to deal withthem, identify features of proposed approaches, clas-sify them accordingly and present a future outlook forthis much researched issue.

The paper is structured as follows: Section 2 men-tions the motivations and benefits of the databaseto ontology mapping problem, while Section 3 givesa short background and defines some terminologyof database systems and Semantic Web technologies.Section 4 presents a total classification of relationaldatabase to ontology mapping approaches and lists de-scriptive parameters used for the description of eachapproach. The structure of the rest of paper is largelybased on this taxonomy. Section 5 deals with theproblem of creating a new ontology from a relationaldatabase and is further divided in subsections that dis-tinguish among the creation of a database schema on-tology (Section 5.1) and a domain-specific ontology(Section 5.2) as well as among approaches that rely ex-tensively on the analysis of the database schema (Sec-tion 5.2.2) and approaches that do not (Section 5.2.1).Section 6 investigates the problem of discovering anddefining mappings between a relational database and

http://www.w3.org/2001/sw/rdb2rdf/

http://www.w3.org/2001/sw/rdb2rdf/

D.E. Spanos et al. / Bringing Relational Databases into the Semantic Web: A Survey 3

one or more existing ontologies. Section 7 sums up themain points of the paper, giving emphasis on the chal-lenges faced by each category of approaches, whileSection 8 gives some insight on future directions andrequirements for database to ontology mapping solu-tions.

2. Motivation and Benefits

The significance of databases from a Semantic Webperspective is evident from the multiple benefits anduse cases a database to ontology mapping can be usedin. After all, the problem of mapping a database intoSemantic Web did not emerge as a mere exercise oftransition from one representation model to another. Itis important to identify the different motivations andproblems implicating interactions between relationaldatabases and SW technologies, in order to succeed aclear separation of goals and challenges. That is not tosay that methods and approaches presented here cor-respond strictly to one particular use case, as there arequite a few versatile tools that kill two (or more) birdswith one stone. In the following, we present some ofthe benefits that can be achieved from the interconnec-tion of databases and ontologies.

Semantic annotation of dynamic web pages. An as-pect of the Semantic Web vision is the transformationof the current Web of documents to a Web of data. Astraightforward way to achieve this would be to anno-tate HTML pages, which specify the way their con-tent is presented and are only suitable for human con-sumption. HTML pages can be semantically annotatedwith terms from ontologies, making their content suit-able for processing by software agents and web ser-vices. Such annotations have been facilitated consid-erably since the proposal of the RDFa recommenda-tion2 that embeds in XHTML tags references to ontol-ogy terms. However, this scenario does not work quitewell for dynamic web pages that retrieve their contentdirectly from underlying databases: this is the case forcontent management systems (CMS), fora, wikis andother Web 2.0 sites [14]. Dynamic web pages repre-sent the biggest part of the World Wide Web, formingthe so called Deep Web [54], which is not accessible tosearch engines and software agents, since these pagesare generated in response to a web service or a web

2RDFa Primer: http://www.w3.org/TR/xhtml-rdfa-primer/

form interface request. It has been argued that, due tothe infeasibility of manual annotation of every singledynamic page, a possible solution would be to “anno-tate” directly the underlying database schema, insofaras the web page owner is willing to reveal the structureof his database. This “annotation” is simply a set ofcorrespondences between the elements of the databaseschema and an already existing ontology that fits thedomain of the dynamic page content [125]. Once suchmappings are defined, it would be fairly trivial to gen-erate dynamic semantically annotated pages with theembedded content annotations derived in an automaticfashion.

Heterogeneous database integration. The resolutionof heterogeneity is one of the most popular, long-standing issues in the database research field, that re-mains, to a large degree, unsolved. Heterogeneity oc-curs between two or more database systems when theyuse different software or hardware infrastructure, fol-low different syntactic conventions and representationmodels, or when they interpret differently the sameor similar data [114]. Resolution of the above formsof heterogeneity allows multiple databases to be inte-grated and their contents to be uniformly queried. Intypical database integration architectures, one or moreconceptual models are used to describe the contentsof every source database, queries are posed againsta global conceptual schema and, for each sourcedatabase, a wrapper is responsible to reformulate thequery and retrieve the appropriate data. Ontology-based integration employs ontologies in lieu of con-ceptual schemas and therefore, correspondences be-tween source databases and one or more ontologieshave to be defined [126]. Such correspondences con-sist of mapping formulas that express the terms of asource database as a conjunctive query against the on-tology (Local as view or LAV mapping), express on-tology terms as a conjunctive query against the sourcedatabase (Global as view or GAV mapping), or statean equivalence of two queries against both the sourcedatabase and the ontology (Global Local as view orGLAV mapping) [80]. The type of mappings used inan integration architecture influences both the com-plexity of query processing and the extensibility of theentire system (e.g. in the case of GAV mappings, queryprocessing is trivial, but the addition of a new sourcedatabase requires redefinition of all the mappings; theinverse holds for LAV mappings). Thus, the discov-ery and representation of mappings between relationaldatabase schemas and ontologies constitute an integral

http://www.w3.org/TR/xhtml-rdfa-primer/

http://www.w3.org/TR/xhtml-rdfa-primer/


part of a heterogeneous database integration scenario[8].

Ontology-based data access. Much like in a databaseintegration architecture, ontology-based data access(OBDA) assumes that an ontology is linked to a sourcedatabase, thus acting as an intermediate layer betweenthe user and the stored data. The objective of an OBDAsystem is to offer high-level services to the end userof an information system who does not need to beaware of the obscure storage details of the underlyingdata source [98]. The ontology provides an abstractionof the database contents, allowing users to formulatequeries in terms of a high-level description of a domainof interest. In some way, an OBDA engine resembles awrapper in an information integration scenario in thatit hides the data source-specific details from the up-per levels by transforming queries against a concep-tual schema to queries against the local data source.This query rewriting is performed by the OBDA en-gine, taking into account mappings between a databaseand a relevant ontology describing the domain of in-terest. The main advantage of an OBDA architecture isthe fact that semantic queries are posed directly againsta database, without the need to replicate its entire con-tents in RDF.

Apart from OBDA applications, a database to ontol-ogy mapping can be useful for semantic rewriting ofSQL queries, where the output is a reformulated SQLquery better capturing the intention of the user [21].This rewriting is performed by substitution of termsused in the original SQL query with synonyms and re-lated terms from the ontology. Another notable relatedapplication is the ability to query relational data usingas context external ontologies [45]. This feature hasbeen implemented in some database management sys-tems3, allowing SQL queries to contain conditions ex-pressed in terms of an ontology.

Mass generation of Semantic Web data. It has beenargued that one of the reasons delaying the Seman-tic Web realization is the lack of successful tools andapplications showcasing the advantages of SW tech-nologies [73]. The success of such tools, though, isdirectly correlated to the availability of a sufficientlylarge quantity of SW data, leading to a “chicken-and-egg problem” [62], where cause and effect form a vi-cious circle. Since relational databases are one of themost popular storage media holding the majority of

3Oracle and OpenLink Virtuoso are the most representative ex-amples.

data on the World Wide Web, a solution for the genera-tion of a critical mass of SW data would be the, prefer-ably automatic, extraction of relational databases’ con-tents in RDF. This would create a significant pool ofSW data, that would alleviate the inhibitions of soft-ware developers and tool manufacturers and, in turn,an increased production of SW applications would beanticipated. The term database to ontology mappinghas been used in the literature to describe such trans-formations as well.

Ontology learning. The process of manually de-veloping from scratch an ontology is difficult, time-consuming and error-prone. Several semi-automaticontology learning methods have been proposed, ex-tracting knowledge from free and semi-structured textdocuments, vocabularies and thesauri, domain expertsand other sources [56]. Relational databases are struc-tured information sources and, in case their schemahas been modelled following standard practices [51](i.e. based on the design of a conceptual model, suchas UML or the Extended Entity Relationship Model),they constitute significant and reliable sources of do-main knowledge. This is true especially for busi-ness environments, where enterprise databases are fre-quently maintained and contain timely data [129].Therefore, rich ontologies can be extracted from rela-tional databases by gathering information from theirschemas, contents, queries and stored procedures, aslong as a domain expert supervises the learning pro-cess and enriches the final outcome. Ontology learn-ing is a common motivation driving database to ontol-ogy mapping when there is not an existing ontologyfor a particular domain of interest, a situation that fre-quently arose not so many years ago. Nevertheless, asyears pass by, ontology learning techniques are mainlyused to create a wrapping ontology for a source rela-tional database in an ontology-based data access [110]or database integration [29] context.

Definition of the intended meaning of a relationalschema. As already mentioned, standard databasedesign practices begin with the design of a conceptualmodel, which is then transformed, in a step known aslogical design, to the desired relational model. How-ever, the initial conceptual model is often not keptalongside the implemented relational database schemaand subsequent changes to the latter are not propa-gated back to the former, while most of the times thesechanges are not even documented at all. Usually, thisresults in databases that have lost the original intentionof their designer and are very hard to be extended or


re-engineered to another logical model (e.g. an object-oriented one). Establishing correspondences between arelational database and an ontology grounds the orig-inal meaning of the former in terms of an expres-sive conceptual model, which is crucial not only fordatabase maintenance but also for the integration withother data sources [38], and for the discovery of map-pings between two or more database schemas [7,49].In the latter case, the mappings between the databaseand the ontology are used as an intermediate step anda reference point for the construction of inter-databaseschema mappings.

Integration of database content with other data sources.Transforming relational databases into a universal de-scription model, as RDF aspires to be, enables seam-less integration of their contents with informationalready represented in RDF. This information canoriginate from both structured and unstructured datasources that have exported their contents in RDF, thusovercoming possible syntactic disparities among them.The Linked Data paradigm [60], which encouragesRDF publishers to reuse popular vocabularies (i.e. on-tologies), to define links between their dataset andother published datasets and reuse identifiers that de-scribe the same real-world entity, further facilitatesglobal data source integration, regardless of the datasource nature. Given the uptake of the Linked Datamovement during the last few years, which has re-sulted in the publication of voluminous RDF content(in the order of billion statements) from several do-mains of interest, the anticipated benefits of the inte-gration of this content with data currently residing inrelational databases as well as the number of potentialapplications harnessing it are endless.

3. Preliminaries

In this section, we make a short introduction onsome aspects of database theory and Semantic Webtechnologies and define concepts that will be usedthroughout the paper. We first focus on conceptualand logical database design by visiting the Entity-Relationship and relational models and then, we brieflytouch upon the RDF graph model and OWL ontologylanguage.

3.1. Database theory fundamentals

The standard database design process mainly con-sists of three steps: a) conceptual, b) logical and c)

physical database design [102]. Conceptual designusually follows after a preliminary phase where dataand functional requirements for potential databaseusers are analyzed. The output of the conceptual de-sign is a conceptual schema, which describes from ahigh-level perspective the data that will be stored in thedatabase and the constraints holding over it. The con-ceptual schema is expressed in terms of a conceptualmodel, the most widely used for this purpose being theEntity-Relationship (ER) model, proposed by PeterChen [39]. The main features of the ER model are:

– entity sets, which are collections of similar ob-jects, called entities.

– attributes associated with each entity set. Anattribute can be either single-valued or multi-valued. All the entities belonging to a given entityset are described by the same attributes. For everyentity set, a key is specified as a minimal set ofattributes, the values of which uniquely identifyan entity.

– relationship sets, which are sets of relationshipinstances, connecting n entities with each other.n is called the degree of the relationship set. A re-lationship set can have descriptive attributes, justlike an entity set.

– cardinality constraints, which specify the mini-mum and maximum number of entities from agiven entity set that can participate in a relation-ship instance from a given relationship set.

– weak entity sets, which consist of weak entities,the existence of which depends on the presenceof another entity, called the owner entity. The re-lationship that connects a weak entity with itsowner entity is called identifying relationship.

Since its original definition, the ER model has beenfurther extended to include more characteristics thatenhanced its expressiveness. The extended versionof the ER model is known as the Extended Entity-Relationship (EER) model, which allows the defini-tion of subclass/superclass relationships among entitysets and introduces, among others, the union type andaggregation features. A formal definition of the ERmodel can be found in [33].

Logical database design consists of building a log-ical schema, which conforms to a representationalmodel that conceptually sits between the ER modeland the physical model describing the low-level stor-age details of the database. Such models include thenetwork, hierarchical and relational models, the most


popular one being the latter, proposed by Edgar Codd[41]. The main elements of the relational model are:

– relations, which intuitively can be thought of astables.

– attributes, which can be regarded as columns of atable. Every attribute belongs to a given relation.A minimal set of attributes that uniquely definean element of a relation (i.e. a tuple or informally,a row of the table) is a candidate key for that rela-tion. In case there exist more than one candidatekeys, one of them is specified as the primary keyof the relation and is used for the identification ofthis relation’s tuples. A foreign key for a given re-lation, called parent relation, is a set of attributesthat reference a tuple in another relation, calledchild relation.

– domains, which are sets of atomic constant val-ues. Every attribute is related to a specific domainand all of its values are members of this domain.

Again, a formal definition of the relational model canbe found in [3]. A relation schema is simply the nameof a relation, combined with the names and domains ofits attributes and a set of primary key and foreign keyconstraints. A relation instance is a set of tuples thathave the same attributes as the corresponding relationschema and satisfy the relation schema constraints.A relational database schema is the set of relationschemas for all relations in a database, while a rela-tional database instance is a set of relation instances,one for every relation schema in the database schema.

The transition from a conceptual schema expressedin terms of the EER model to a relational schema isperformed according to well established standard algo-rithms [51], that define correspondences between con-structs of the two models. A very rough sketch of thesecorrespondences is presented in Table 1, which, al-though incomplete, clearly reveals that the transforma-tion of the EER model to the relational one is not re-versible. This is the source of the database reverse en-gineering problem i.e. the problem of recovering theinitial conceptual schema from the logical one.

The relational schema obtained from a direct trans-lation of the conceptual schema is not always opti-mal for data storage and may cause problems duringdata update operations. Thus, the relational schemais further refined in an additional normalization step,which minimizes the redundancy of stored informa-tion. In short, the normalization process decomposesrelations of a relational database schema, in order toeliminate dependencies among relation attributes. The

most prominent types of dependencies met in rela-tional schemas are functional, inclusion, multi-valuedand join dependencies. Depending on the type of de-pendencies eliminated, the decomposed relations aresaid to be in a specific normal form. Normal formsbased on functional dependencies are the first, thesecond, the third and the Boyce-Codd normal form.Fourth and fifth normal form are based on multi-valuedand join dependencies respectively4. Without goinginto further detail, it suffices to say that normalization,as an additional design step, incurs further complica-tions to the database reverse engineering problem.

Much of the success of relational databases is dueto SQL, which stands for Structured Query Language[1] and is the standard query language for relationaldatabases. Despite the fact that SQL does not strictlyfollow Codd’s relational model5, it is relationally com-plete, i.e. all of the relational algebra operations (e.g.selection, projection, join, union) can be expressed asan SQL query. SQL serves not only as a declarativequery language – the subset of SQL called Data Ma-nipulation Language (DML) – but also as a databaseschema definition language – the subset of SQL calledData Definition Language (DDL). Thus, a relationaldatabase schema can often be expressed as an SQLDDL statement.

3.2. Semantic Web technologies

Moving on to the Semantic Web field, we begin withRDF, probably one of the most significant SW build-ing blocks. The RDF model is a graph-based modelbased on the concepts of resources and triples or state-ments. A resource can be anything that can be talkedabout: a web document, a real-world entity or an ab-stract concept and is identified by a URI (Uniform Re-source Identifier). RDF triples consist of three compo-nents: a subject, a predicate and an object. An RDFtriple is an assertion stating that a relationship, de-noted by the predicate, holds between the subject andthe object. An RDF graph is a set of RDF triples thatform a directed labelled graph. Subjects and objects ofRDF triples are represented by nodes, while predicates

4For more information on dependency types and normalization,see standard database theory textbooks [3,51,102].

5The SQL standard uses the terms table, column and row, insteadof the equivalent relational model terminology of relation, attributeand tuple. It also allows the presence of identical rows in the same ta-ble, a case frequently observed in derived tables that hold the resultsof an SQL query.


Table 1EER to relational model transformation

EER model Relational model

Entity set Relation

AttributeSingle-valued AttributeMulti-valued Relation and foreign key

Binaryrelationship set

1:1 or 1:N Foreign key or relationM:N Relation

n-degree relationship set (n > 2) Relation and n foreign keys

Hierarchy of entity setsOne relation per entity set and foreign keys

or one relation for all entity sets

are arcs pointing from the subject to the object of thetriple. A special case of nodes are blank nodes, whichare not identified by a URI. An RDF graph can be se-rialized in various formats, some of which are suitablemainly for machine consumption (such as the XML-based RDF/XML) and some are more easily under-standable by humans (such as the plain text Turtle andN3 syntaxes6).

Another fundamental concept not only for Seman-tic Web, but information science in general, is the con-cept of ontology. In the context of computer science,several definitions have been given for ontology, fromTom Gruber’s “ontology is an explicit specification ofa conceptualization”[58] and Nicola Guarino’s “ontol-ogy is a logical theory accounting for the intendedmeaning of a formal vocabulary”[59] to more concreteformal ones [46,69]. The abundance of ontology defi-nitions stems from the broadness of the term, which in-cludes ontologies of several logic formalisms and rep-resentation languages. Therefore, we slightly adapt thedefinition of Maedche and Staab [86] which is generalenough to describe most ontologies. According to [86],an ontology is a set of primitives containing:

– a set of concepts.– a set of relationships between concepts, described

by domain and range restrictions.– a taxonomy of concepts with multiple inheritance.– a taxonomy of relationships with multiple inheri-

tance.– a set of axioms describing additional constraints

on the ontology that allow to infer new facts fromexplicitly stated ones.

Ontologies can be classified according to the subjectof the conceptualization: hence, among others, we can

6The most popular among the two, Turtle, has been put on W3CRecommendation track (Working Draft: http://www.w3.org/TR/turtle/).

distinguish between knowledge representation, upper-level, domain and application ontologies [56]. Domain(or domain-specific) ontologies are the most commonones and provide a vocabulary for concepts of a spe-cific domain of interest and their relationships. We willsee in Section 5 that there are approaches that map re-lational databases to domain ontologies, where the do-main can be either the relational model itself or thetopic on which the contents of the database touch.

Several ontology languages have been proposed forthe implementation of ontologies, but RDF Schema(RDFS) and Web Ontology Language (OWL) are themost prominent ones, especially in the Semantic Webdomain. RDFS is a minimal ontology language builton top of RDF that allows the definition of:

– classes of individual resources.– properties, connecting two resources.– hierarchies of classes.– hierarchies of properties.– domain and range constraints on properties.

OWL, on the other hand, provides ontology design-ers with much more complex constructs than RDFSdoes and its popularity is constantly rising. As inRDFS, the basic elements of OWL are classes, prop-erties and individuals, which are members of classes.OWL properties are binary relationships and are dis-tinguished in object and datatype properties. Objectproperties relate two individuals, while datatype prop-erties relate an individual with a literal value. Hierar-chies of classes and properties, property domain andrange restrictions, as well as value, cardinality, existen-tial and universal quantification restrictions on the in-dividuals of a specific class can be defined, among oth-ers. OWL is based on Description Logics (DL) [15], afamily of languages that are subsets of first-order logic,and offers several dialects of differing expressiveness.Following the DL paradigm, statements in OWL can

http://www.w3.org/TR/turtle/

http://www.w3.org/TR/turtle/


be divided in two groups: the TBox, containing inten-sional knowledge (i.e. axioms about classes and prop-erties) and the ABox, containing extensional knowl-edge (i.e. statements involving individuals). Daring ananalogy with database systems, TBox could be com-pared to the database schema and ABox to the databaseinstance. The combination of TBox and ABox, to-gether with a provider of reasoning services, is oftenalso called a knowledge base.

The first generation of the OWL language family,also known as OWL 1, defined three OWL species:OWL Lite, OWL DL and OWL Full, in order ofincreasing expressiveness. However, these speciesturned out not to be very practical for real-world ap-plications. Therefore, the second generation of OWLlanguage, OWL 2, introduced three profiles of dif-ferent expressiveness, with overlapping feature sets:OWL 2 EL, OWL 2 RL and OWL 2 QL, each onebest suited for a different application scenario7. Fur-thermore, OWL 2 DL and Full were defined as syntac-tic extensions of OWL 1 DL and Full respectively in-cluding additional features, such as punning, propertychains and qualified cardinality constraints.

The expressiveness of an ontology language influ-ences the complexity of the reasoning algorithms thatcan be applied to an ontology. Such algorithms can,among others, infer new implicit axioms based onexplicitly specified ones, check for satisfiability ofclasses, compute hierarchies of classes and propertiesand check consistency of the entire ontology. As thereis a trade-off between the richness of an ontology lan-guage and the complexity of reasoning tasks, thereis also an ongoing quest for ontology languages thatstrike a balance between these two measures.

As with ontology definition languages, more than afew ontology query languages exist, but the de factoquery language for RDF graphs is SPARQL. SPARQLuses RDF graphs expressed in Turtle syntax as querypatterns and can return as output variable bindings(SELECT queries), RDF graphs (CONSTRUCT andDESCRIBE queries) or yes/no answers (ASK queries).SPARQL is constantly evolving so as to allow updateoperations as well8 and it has already been proved tobe as expressive as relational algebra [10].

An alternative way of modeling knowledge arerules, which can sometimes express knowledge that

7For more details on OWL 2 Profiles, see http://www.w3.org/TR/owl2-profiles/.

8SPARQL 1.1 Update: http://www.w3.org/TR/sparql11-update/

can not expressed in OWL. The related W3C Recom-mendation is RIF9 which introduces three dialects thatshare a lot in common with several existing rule lan-guages. The goal is to provide a uniform format forexchange of rules of different kinds. Despite its Rec-ommendation status, RIF has not been widely adoptedyet, and instead, the Semantic Web community seemsto favour the de facto standard of SWRL10, which al-lows rules to be expressed as OWL axioms and thus,be distributed as part of OWL ontologies.

4. A Classification of Approaches

The term database to ontology mapping has beenloosely used in the related literature, encompassing di-verse approaches and solutions to different problems.In this section, we give a classification which will helpus categorize and analyze in an orderly manner theseapproaches. Furthermore, we introduce the descriptiveparameters to be used for the presentation of each ap-proach.

Classification schemes and descriptive parametersfor database to ontology mapping methods have al-ready been proposed in related work. The distinctionbetween classification criteria and merely descriptivemeasures is often not clear. Measures that can act asclassification criteria should have a finite number ofvalues and ideally, should separate approaches in nonoverlapping sets. Such restrictions are not necessaryfor descriptive features, which can sometimes be qual-itative by nature instead of quantifiable measures. Ta-ble 2 summarizes classification criteria and descrip-tive parameters proposed in the related literature, de-spite the fact that some of the works mentioned have asomewhat different (either narrower or broader) scopethan ours. As it can be seen from Table 2, there is acertain level of consensus among classification efforts,regarding the most prominent measures characterizingdatabase to ontology mapping approaches.

The taxonomy we propose and which we are go-ing to follow in the course of our analysis is based onsome of the criteria mentioned in Table 2. The selec-tion of these criteria is made so as to create a total clas-sification of all relevant solutions in mutually disjointclasses. In other words, we partition the database to

9Rule Interchange Format Overview: http://www.w3.org/2005/rules/wiki/Overview

10Semantic Web Rule Language: http://www.w3.org/Submission/SWRL/

http://www.w3.org/TR/owl2-profiles/

http://www.w3.org/TR/owl2-profiles/

http://www.w3.org/TR/sparql11-update/

http://www.w3.org/TR/sparql11-update/

http://www.w3.org/2005/rules/wiki/Overview

http://www.w3.org/2005/rules/wiki/Overview

http://www.w3.org/Submission/SWRL/

http://www.w3.org/Submission/SWRL/


Table 2Classification criteria and descriptive parameters identified in literature

Work Classification Criteria Values Descriptive Parameters

Auer et al.(2009) [14]a

Automation in the creation of mappingSource of semantics consideredAccess paradigmDomain reliance

Automatic, Semi-automatic, ManualExisting domain ontologies, Database, Database and UserExtract-Transform-Load (ETL), SPARQL, Linked DataGeneral, Dependent

Mapping representationlanguage

Barrasa Ro-driguez,Gomez-Perez(2006) [17]

Existence of ontologyArchitectureMapping exploitation

Yes (ontology reuse), No (created ad-hoc)Wrapper, Generic engine and declarative definitionMassive upgrade (batch), Query driven (on demand)

–

Ghawi, Cullot(2007) [55]

Existence of ontologyComplexity of mapping definitionOntology population processAutomation in the creation of mapping

Yes, NoComplex, DirectMassive dump, Query drivenAutomatic, Semi-automatic, Manual

Automation in the instanceexport process

Hellmann et al.(2011) [61]

– – Data sourceb, Data exposi-tion, Data synchronization,Mapping language, Vocab-ulary reuse, Mapping au-tomation, Requirement ofdomain ontology, Existenceof GUI

Konstantinou,Spanos, Mitrou(2008) [72]

Existence of ontologyAutomation in the creation of mappingOntology development

Yes, NoAutomatic, Semi-automatic, ManualStructure driven, Semantics driven

Ontology language,RDBMS supported, Se-mantic query language,Database componentsmapped, Availability ofconsistency checks, Userinteraction

Sahoo et al.(2009) [105]

Same as in Auer et al. (2009) with the addition of:Query implementationData integration

SPARQL, SPARQL→SQLYes, No

Mapping accessibility, Ap-plication domain

Sequeda et al.(2009) [111]c

– – Correlation of primary andforeign keys, OWL andRDFS elements mapped

Zhao, Chang(2007) [129]

Database schema analysis Yes, No Purpose, Input, Output,Correlation analysis ofdatabase schema elements,Consideration of databaseinstance, application sourcecode and other sources

aIn this work, approaches are also divided in 4 categories: a) alignment, b) database mining, c) integration approaches and d) mappinglanguages/servers.

bApart from relational data sources, this report reviews knowledge extraction from XML and CSV data sources as well.cThis report investigates only approaches that create a new ontology by reverse engineering the database schema.

ontology mapping problem space in distinct categoriescontaining uniform approaches. The few exceptionswe come across are customizable software tools thatincorporate multiple workflows, with each one fallingunder a different category.

We categorize solutions to the database to ontologymapping problem to the classes shown in Figure 1.Next to each class, we append the applicable motiva-tions and benefits, as mentioned in Section 2. Associ-ation of a specific benefit to a given class denotes thatthere is at least one work in this class mentioning thisbenefit.

The first major division of approaches is based onwhether an existing ontology is required for the perfor-mance of a method. Therefore, the first classificationcriterion we employ is the existence of an ontologyas a requirement for the mapping process, distinguish-ing among methods that establish mappings between agiven relational database and a given existing ontology(presented in Section 6) and methods that create a newontology from a given relational database (presentedin Section 5). In the former case, the ontology to bemapped to the database should model a domain that iscompatible with the domain of the database contents inorder to be able to define meaningful mappings. This


Relational DBs to Semantic Web

New ontology Existing ontology

Database schema ontology

Domain-specific ontology

No database reverse engineering

Database reverse engineering

√ ontology based data access√ mass generation of SW data√ heterogeneous database integration√ integration with other data sources

√ semantic annotation of dynamic web pages√ mass generation of SW data√ definition of meaning of relational schema√ heterogeneous database integration√ ontology based data access√ integration with other data sources√ semantic annotation of dynamic web pages

√ ontology based data access√ mass generation of SW data√ heterogeneous database integration

√ heterogeneous database integration√ ontology learning√ ontology based data access

Fig. 1. Classification of approaches.

is the reason why the ontology, in this case, is usuallyselected by a human user who is aware of the nature ofdata contained in the database. In the case of methodscreating a new ontology, the existence of a domain on-tology is not a prerequisite, including use cases wherean ontology for the domain covered by the database isnot available yet or even where the human user is notfamiliar with the domain of the database and relies onthe mapping process to discover the semantics of thedatabase contents.

The class of methods that create a new ontologyis further subdivided in two subclasses, dependingon the domain of the generated ontology. On theone hand, there are approaches (presented in Section5.1) that develop an ontology with domain the rela-tional model. The generated ontology consists of con-cepts and relationships that reflect the constructs ofthe relational model, mirroring the structure of the in-put relational database. We will call such an ontol-ogy a “database schema ontology”. Since the gen-erated ontology does not implicate any other knowl-edge domain, these approaches are mainly automaticrelying on the input database schema alone. On theother hand, there are methods that generate domain-specific ontologies (presented in Section 5.2), depend-ing on the domain described by the contents of the in-put database.

The last classification criterion we consider is theapplication of database reverse engineering tech-niques in the process of creating a new domain-specificontology. Approaches that apply reverse engineeringtry to recover the initial conceptual schema from therelational schema and translate it to an ontology ex-pressed in a target language (see Section 5.2.2). Onthe other hand, there are methods that, instead of re-verse engineering techniques, apply few basic transla-tion rules from the relational to the RDF model (de-tailed in Section 5.2.1) and/or rely on the human expertfor the definition of complex mappings and the enrich-ment of the generated ontology model.

By inspection of Figure 1, we can also see that,with a few exceptions, there is not significant corre-lation between the taxonomy classes and the motiva-tions and benefits presented in Section 2. This happensbecause, as we argued above, this taxonomy catego-rizes approaches based on the nature of the mappingand the techniques applied to establish the mapping.On the contrary, most benefits state the applicationsof the already established mappings (e.g. database in-tegration, web page annotation), which do not de-pend on the mapping process details. Notable excep-tions to the above observation are the ontology learn-ing and definition of semantics of database contentsmotivations. The former, by definition, calls for ap-proaches that produce an ontology by analyzing a re-


lational database, while the latter calls for approachesthat ground the meaning of a relational database toclearly defined concepts and relationships of an exist-ing ontology.

Given the taxonomy of Figure 1 and the three clas-sification criteria employed in it, we choose among therest of the features and criteria mentioned in Table 2the most significant ones as descriptive parameters forthe approaches reviewed in this paper. The classifica-tion criteria and descriptive parameters adopted in thispaper are shown in Figure 2 and elaborated in the fol-lowing:

Level of automation. This variable denotes the ex-tent of human user involvement in the process. Pos-sible values for it are automatic, semi-automatic andmanual. Automatic methods do not involve the humanuser in the process, while semi-automatic ones requiresome input from the user. This input may be necessaryfor the operation of the process, constituting an integralpart of the algorithm, or it may be optional feedbackfor the validation and enrichment of the result of themapping process. In manual approaches, the mappingis defined in its entirety by the human user withoutany feedback or suggestions from the application. Insome cases, automation level is a characteristic that iscommon among approaches within the same taxonomyclass. For instance, methods that create a new databaseschema ontology are purely automatic, while the ma-jority of approaches that map a relational database toan existing ontology are manual, given the complexityof the discovery of correspondences between the two.

Data accessibility. Data accessibility describes howthe mapping result is accessed. This variable is alsoknown in the literature as access paradigm, mappingimplementation or data exposition. Possible valuesfor this variable are ETL (Extract, Transform, Load),SPARQL or another ontology query language andLinked Data. ETL means that the result of the map-ping process, be it an entire new ontology or a groupof RDF statements referencing existing ontologies, isgenerated and stored as a whole in an external stor-age medium (often a triple store but it can be an ex-ternal file as well). In this case, the mapping result issaid to be materialized. Other terms used for ETL isbatch transformation or massive dump. An alternativeway of accessing the result of the mapping process isthrough SPARQL or some alternative semantic querylanguage. In such a case, only a part (i.e. the answerto the query) of the entire mapping result is accessedand no additional storage medium is required sincethe database contents are not replicated in a material-

ized group of RDF statements. The semantic query isrewritten to an SQL query, which is executed and itsresults are transformed back as a response to the se-mantic query. This access mode is also known as querydriven or on demand and characterizes ontology-baseddata access approaches. Finally, the Linked Data valuemeans that the result of the mapping is published ac-cording to the Linked Data principles: all URIs use theHTTP scheme and, when dereferenced, provide usefulinformation for the resource they identify11.

The data accessibility feature is strongly related tothe data synchronization measure, according to which,methods are categorized in static and dynamic ones,depending on whether the mapping is executed onlyonce or on every incoming user query, respectively. Webelieve that the inclusion of data synchronization asan additional descriptive feature would be redundant,since the value of data accessibility uniquely deter-mines whether a method maintains static or dynamicmappings. Hence, an ETL value signifies static map-pings, while SPARQL and Linked Data values denotedynamic mappings.

Mapping language. This feature refers to the lan-guage in which the mapping is represented. The val-ues for this feature vary considerably, given that, un-til recently, no standard representation language ex-isted for this purpose and most approaches used propri-etary formats. W3C recognized this gap and proposedR2RML12 (RDB to RDF Mapping Language) as a rep-resentation language for relational database to ontol-ogy mappings. As R2RML is currently under develop-ment, it is expected to be adopted by software tools inthe near future. The mapping language feature is ap-plicable only to methods that need to reuse the map-ping. A counterexample is the class of ontology learn-ing methods that create a new domain ontology. Meth-ods of this class usually do not represent the mappingin any form, since their intention is simply the gen-eration of a new domain ontology, having no interestin storing correspondences with database schema ele-ments.

Ontology language. This parameter states the lan-guage on which the involved ontology is expressed.

11The use of the term Linked Data may be slightly misleading,as both the ETL export of RDF data dumps and the SPARQL ac-cess scheme are considered alternative ways of Linked Data pub-lishing [60]. Nevertheless, we follow the convention of [14,61,105],where the term Linked Data denotes an RDF access scheme basedon HTTP dereferenceable URIs.

12R2RML Specification: http://www.w3.org/TR/r2rml/

http://www.w3.org/TR/r2rml/

http://www.w3.org/TR/r2rml/


Existence of ontology• Yes• No

Ontology domain• Relational model• Other

Application of database reverse engineering

• Yes• No

Automation level• Automatic• Semi-automatic• Manual

Data Accessibility• ETL• SPARQL• Linked Data

Mapping Language• SQL• RDF/XML• Custom language etc.

Ontology Language• RDFS• OWL dialect

Vocabulary Reuse• Yes• No

Software availability• Yes• No• Commercial

Graphical user interface• Yes• No

Purpose• Mass generation of SW data• Ontology learning• Ontology based data access

• Database integration etc .

Fig. 2. Classification criteria and descriptive parameters used.

Depending on the kind of approach, it may refer to thelanguage of the ontology generated by the approach orto the language of the existing ontology required. Sincethe majority of methods reviewed implicate SemanticWeb ontologies, the values for this feature range fromRDFS to all flavours of OWL. When the expressive-ness of the OWL language used is not clear from thedescription of the method, no specific profile or speciesis mentioned.

Vocabulary reuse. This parameter states whether arelational database can be mapped to more than oneexisting ontologies or not. This is mainly true for man-ual approaches where the human user is usually free toreuse terms from existing ontologies at will. Reusingterms from popular vocabularies is one of the prereq-uisites for Semantic Web success and hence, is an im-portant aspect for database to ontology mapping ap-proaches. Of course, methods allowing for vocabularyreuse do not enforce the user to do so, thus this vari-able should not be mistaken with the existence of on-tology requirement that was used as a classification cri-terion in Figure 1. Once again, values of this param-eter may be common among all approaches within acertain class, e.g. methods generating a new databaseschema ontology do not directly allow for vocabularyreuse, since the terms used are specified a priori by themethod and are not subject to change.

Software availability. This variable denotes whetheran approach is accompanied with freely accessiblesoftware, thus separating purely theoretical methods orunreleased research prototypes from approaches pro-

viding practical solutions to the database to ontologymapping problem. Therefore, this feature serves (to-gether with the purpose feature) as an index of theavailable software that should be considered from apotential user seeking support for a particular problem.Commercial software, which is not offering freely itsfull functionality to the general public, is indicated ap-propriately.

Graphical user interface. This feature applies toapproaches accompanied with an accessible softwareimplementation (with the exception of unavailable re-search prototypes that have been stated to have a userinterface) and shows whether the user can interact withthe system via a graphical user interface.This param-eter is significant especially for casual users or evendatabase experts who may not be familiar with Se-mantic Web technologies but still want to bring theirdatabase into Semantic Web. A graphical interface thatguides the end user through the various steps of themapping process and provides suggestions is consid-ered as essential for inexperienced users.

Purpose. This is the main motivation stated by theauthors of each approach. This is not to say that themotivation stated is the only applicable and that the ap-proach cannot be used in some other context. Usually,all the benefits and motivations showcased in Figure 1apply to all tools and approaches of the same category.

We note once again that any of the above measurescould have been considered as a criterion for the def-inition of an alternative taxonomy. However, we ar-gue that the criteria selected lead to a partition of the


entire problem space in clusters of approaches thatpresent increased intra-class similarity. For example,some works [14,17,55,105] categorize methods basedon the data accessibility criterion to static and dynamicones. This segregation is not complete though, as thereare methods that restrict themselves in finding corre-spondences between a relational database and an on-tology without migrating data to the ontology nor of-fering access to the database through the ontology.This was the main reason behind the omission of thisdistinction in the classification of Figure 1. The valuesof the above descriptive parameters for all approachesmentioned in this paper are presented in Table 8.

Summing up the categories of methods recognized,we will deal with the following, in the next sections ofthe paper:

1. Creation of database schema ontology (Section5.1)

2. Creation of domain-specific ontology withoutdatabase reverse engineering (Section 5.2.1)

3. Creation of domain-specific ontology with databasereverse engineering (Section 5.2.2)

4. Definition or discovery of mappings betweena relational database and an existing ontology(Section 6)

5. Ontology Creation from Existing Database

In this section, we deal with the issue of generatinga new ontology from a relational database and option-ally populating the former with RDF data that are ex-tracted from the latter. Schematically, this process isroughly shown on Figure 3. The mapping engine inter-faces with the database to extract the necessary infor-mation and, according to manually defined mappingsor internal heuristic rules, generates an ontology whichcan also be viewed as a syntactic equivalent of an RDFgraph. This RDF graph can be accessed in three ways,as described in Section 4: ETL, SPARQL or LinkedData (lines 1, 3 and 2 in Figure 3, respectively). InETL, the RDF graph can be stored in a file or in a per-sistent triple store and be accessed directly from there,while in SPARQL and Linked Data access modes, theRDF graph is generated on the fly from the contentsresiding in the database. In these modes, a SPARQLquery or an HTTP request respectively are transformedto an appropriate SQL query. Finally, the mapping en-gine retrieves the results of this SQL query (this is notdepicted in Figure 3) and formulates a SPARQL so-

Relational database

Mapping engine

Mappings

Rules

File storage

Persistent storage

1

2

2 3

SQL

3

Fig. 3. Generation of ontology from a relational database.

lution response or an appropriate HTTP response, de-pending on the occasion.

Before we delve into the different categories of on-tology creation approaches, we describe as briefly aspossible an elementary transformation method of a re-lational database to an RDF graph proposed by TimBerners-Lee [22] that, albeit simple, has served as areference point for several ontology creation meth-ods. This method has been further extended [19] soas to produce an RDFS ontology and is also infor-mally known as “table-to-class, column-to-predicate”,but for brevity we are going to refer to it as the basicapproach throughout this paper. According to the basicapproach:

1. Every relation R maps to an RDFS class C(R).2. Every tuple of a relation R maps to an RDF node

of type C(R).3. Every attribute att of a relation maps to an RDF

property P (att).4. For every tuple R[t], the value of an attribute att

maps to a value of the property P (att) for thenode corresponding to the tuple R[t].

As subjects and predicates of RDF statements mustbe identified with a URI, it is important to come upwith a URI generation scheme that guarantees unique-ness for distinct database schema elements. As ex-pected, there is not a universal URI generation scheme,but most of them follow a hierarchical pattern, muchlike the one presented in Table 3, where for conve-nience multi-column primary keys are not considered.An important property that the URI generation schememust possess and is indispensable for SPARQL andLinked Data access is reversibility, i.e. every generatedURI should contain enough information to recognizethe database element or value it identifies.

The basic approach is generic enough to be appliedto any relational database instance and to a large de-


Table 3Typical URI generation scheme

Database Element URI Templatea Example

Database {base_URI}/{db}/ http://www.example.org/univ/Relation {base_URI}/{db}/{rel} http://www.example.org/univ/staff/Attribute {base_URI}/{db}/{rel}#{attr} http://www.example.org/univ/staff#addressTuple {base_URI}/{db}/{rel}/{pk} = {pkval} http://www.example.org/univ/staff/id=278

aThe name of the database is denoted db, rel refers to the name of a relation, attr is the name of an attribute and pk is the name ofthe primary key of a relation, while pkval is the value of the primary key for a given tuple of a relation.

gree automatic, since it does not require any input fromthe human user, except from the base URI. However, itaccounts for a very crude export of relational databasesto RDFS ontologies that does not permit more complexmappings nor does it support more expressive ontol-ogy languages. Moreover, the resulting ontology looksa lot like a copy of the database relational schema asall relations are translated to RDFS classes, even theones which are mere artifacts of the logical databasedesign (e.g. relations representing a many-to-many re-lationship between two entities). Other notable short-comings of the basic approach include its URI gen-eration mechanism, where new URIs are minted evenwhen there are existing URIs fitting for the purposeand the excessive use of literal values, which seriouslydegrade the quality of the RDF graph and complicatethe process of linking it with other RDF graphs [31].Nevertheless, as already mentioned, several methodsrefine and build on the basic approach, by allowing formore complex mappings and discovering domain se-mantics hidden in the database structure. Such meth-ods are presented in the next subsections.

5.1. Generation of a database schema ontology

We first review the class of approaches that create anew database schema ontology. Database schema on-tologies have as domain the relational model and donot model the domain that is referred to by the contentsof the database. In other words, a database schema on-tology contains concepts and relationships pertainingto the relational model, such as relation, attribute, pri-mary key and foreign key and simply mirrors the struc-ture of a relational database schema.

The process of generating such an ontology is en-tirely automated, since all the information needed forthe process is contained in the database schema andadditional domain knowledge from a human user orexternal information sources is not needed. Both ETLand SPARQL based approaches are met in this cate-

gory regarding data accessibility, while the mapping it-self is rarely represented in some form. This is due tothe fact that the correspondences between the databaseschema and the generated ontology are straightfor-ward, as every element of the former is usually mappedto an instance of the respective class in the latter. Forexample, a relation and each one of its attributes ina database schema are mapped to instances of someRelation and Attribute classes in the generated ontol-ogy respectively, with the Relation instance being con-nected with the Attribute instances through an appro-priate hasAttribute property. The names of classes andproperty in the above example are of course figurative,as every approach defines its own database schema on-tology.

An issue frequently arising when modeling a databaseschema ontology is the need to represent, apart fromthe database schema, the schema constraints thatare applied to the database contents. In the major-ity of cases, the motivation is the check of databaseschema constraints using ontology reasoning engines.This requires not only representation of the databaseschema but also description of the data itself. Themost straightforward and natural design pattern thatachieves this represents schema-level database con-cepts as instances of the database schema ontologyclasses and data is then represented as instances of theschema layer concepts. This design makes use of moreadvanced metamodeling ontology features. Informally,metamodeling allows an ontology resource to be usedat the same time as a class, property and individual.Metamodeling violates the separation restriction be-tween classes, properties, individuals and data valuesthat is enforced in OWL (1/2) DL, the decidable frag-ment of the OWL language, and leads to OWL Fullontologies, which have been shown to be undecidable[89]. The introduction of the punning feature in OWL2 DL may allow different usages of the same term, e.g.both as a class and an individual, but the standard OWL2 DL semantics and reasoners supporting it consider


these different usages as distinct terms and thus, antic-ipated inferences based on OWL (1/2) Full semanticsare missed.

Approaches that make use of metamodeling forbuilding a database schema ontology [95,121] createOWL Full ontologies and do not go as far as to checkdatabase constraints with appropriate ontology reason-ers. This may be due to the popular misbelief that sucha task can be carried out only by reasoners that cancompute all OWL Full entailments of an ontology. Onthe contrary, widely available OWL 2 RL reasoners13

or first order logic theorem proving systems may suf-fice for the task [108].

The vast majority of methods in this group mapa relational database schema to strictly one ontol-ogy and do not reuse terms from external vocabu-laries, since their domain of interest is the relationalmodel. The obvious disadvantage of this group of ap-proaches is the lack of data semantics in the gener-ated database schema ontology, a fact that makes themonly marginally useful for integration with other datasources. Thereby, some approaches complement thedatabase schema ontology with manually defined map-pings to external domain-specific ontologies. Theseontology mappings may take the form of ontology(rdfs:subClassOf and owl:equivalentClass) axioms[74,75], SPARQL CONSTRUCT queries that outputa new RDF graph with terms from known ontolo-gies [96] or SWRL rules that contain terms from thedatabase schema ontology in their body and termsfrom popular ontologies in their head [53]. Such ap-proaches, even indirectly, manage to capture the se-mantics of database contents and thus, increase thesemantic interoperability among independently devel-oped relational databases.

The most popular and dominant work in this cat-egory is Relational.OWL [95], which is also reusedin other approaches and tools (e.g. ROSEX [43] andDataMaster [93]). Relational.OWL embraces the de-sign pattern commented above and makes use of theOWL Full metamodeling features. Every relation andattribute in the relational model gives rise to the cre-ation of a corresponding instance of meta-classes Ta-ble and Column. Data contained in the database is rep-

13OWL 2 RL Rules (http://www.w3.org/TR/owl2-profiles/#Reasoning_in_OWL_2_RL_and_RDF_Graphs_using_Rules) offer a partial axiomatization of theOWL 2 Full semantics and thus, can be used for developingincomplete reasoning systems that extract some OWL 2 Fullentailments.

resented according to the rules of the basic approach:every tuple of a relation is viewed as an instance ofa schema representation class, while tuple values asviewed as values of properties-instances of the Columnclass. This lack of separation among classes, propertiesand individuals makes the produced ontology OWLFull. Moreover, OWL classes denoting the concepts ofprimary key and database schema are defined, as wellas OWL properties that tie together database schemaelements (such as the hasColumn property connectinginstances of the Table and Column classes and the ref-erences property describing foreign key links). As withmost database schema ontologies, the generation of theontology is performed automatically, without any userintervention. To overcome the lack of domain seman-tics in the database schema ontology, the authors alsopropose the use of Relational.OWL as an intermediatestep for the creation of an RDF graph that reuses termsfrom external ontologies [96]. The RDF graph is pro-duced as the result of SPARQL CONSTRUCT queriesexecuted on the Relational.OWL ontology. Apart fromstandard ETL access, the addition of the RDQuerywrapper system [97] allows for querying a relationaldatabase through SPARQL queries using terms fromthe Relational.OWL ontology.

DataMaster [93] is a tool that exports the schemaand the contents of a relational database in an ontologyusing the Relational.OWL terminology. The noveltythough is that it also offers two alternative modelingversions of Relational.OWL that manage to stay withinthe syntactic bounds of OWL DL. In the first one,the schema representation layer of Relational.OWLis completely dropped and thereby, database relationsand attributes are translated to OWL classes and prop-erties respectively (much like in the basic approach). Inthe second one, it is the data representation layer thatis missing, since relations and attributes are translatedto instances of the Table and Column Relational.OWLclasses respectively, but the latter are not translated toproperties as well, therefore avoiding the metamodel-ing feature used in Relational.OWL.

ROSEX [43] also uses a slightly modified versionof the Relational.OWL ontology to represent the re-lational schema of a database as an OWL ontology,thus ignoring the data representation layer of the latter.The generated database schema ontology is mappedto a domain-specific ontology that is generated auto-matically by reverse engineering the database schema(much like approaches of Section 5.2.2 do) and thismapping is used for the rewriting of SPARQL queriesexpressed over the domain ontology to SQL queries

http://www.w3.org/TR/owl2-profiles/#Reasoning_in_OWL_2_RL_and_RDF_Graphs_using_Rules




expressed over the relational database. Although it isargued that the insertion of the database schema on-tology as an intermediate layer between the domainontology and the database can contribute in track-ing changes of the domain ontology as the databaseschema evolves, it is not clear how this is actuallyachieved.

An approach similar to Relational.OWL that pro-duces a new database schema OWL ontology isRDB2ONT [121]. The classes and properties de-fined follow the same metamodeling paradigm as inRelational.OWL, but the attempt to model relationalschema constraints as OWL constraints is erroneous.Although relational schema constraints refer to thecontents of a relation, the authors translate these con-straints by applying them on the schema-level OWLclasses, in an unsuccessful effort to avoid OWL Full.Furthermore, it is not clear whether information fromthe ER conceptual schema of the database is used inthe design of ontology, since the exact cardinalitiesof relationships are assumed in this work to be given,while it is known that they cannot always be deducedby inspection of the relational schema alone.

Avoidance of metamodeling is successful in thework of Kupfer and colleagues [75], where repre-sentation of the database contents is not considered.The output of this approach is an OWL-Lite ontol-ogy reflecting the database schema, which is latermanually enriched with terms from a domain ontol-ogy through the use of standard rdfs:subClassOf andowl:equivalentClass axioms.

Automapper [53] (now part of the commercial AsioSBRD tool) allows a relational database to be queriedthrough SPARQL using terms from a domain ontol-ogy. Despite the fact that it is mainly an ontology-based data access solution, we present it in this sec-tion, since it involves a database schema ontology aswell. Automapper creates an OWL ontology enhancedwith SWRL rules to describe the relational schemaand express its constraints, such as the uniqueness ofthe primary key or attribute datatype restrictions. Thegeneration of the schema ontology is automatic andfollows the basic approach of one class per relationand one property per attribute. Automapper is the onlytool in this category that stores explicitly the mappingamong the database schema and the generated ontol-ogy. The mapping is expressed in terms of a mappingontology heavily influenced by D2RQ [27], which isdescribed in Section 5.2. Once again, the construc-tion of a database schema ontology is supplementedwith connections to a domain ontology, this time via

SWRL rules. SPARQL queries using terms from thedomain ontology are translated to SQL queries, takinginto account details of the stored mapping, such as theschema-specific URI generation mechanism.

A similar workflow is described in the work of Dol-bear and Hart [48], which focuses on spatial databases.Their automatically generated data ontology is en-riched with manually defined ontological expressionsthat essentially correspond to SQL queries in theunderlying database. These complex expressions arethen linked through ordinary rdfs:subClassOf andowl:equivalentClass axioms with an existing domainontology.

FDR2 [74] diverges considerably from the basic ap-proach: every attribute of a relation is mapped to anRDFS class, a relation corresponds to an RDF listof these classes, while every pair of attributes of thesame relation is mapped to a so-called virtual prop-erty. A database value is mapped as instance of its re-spective “column class” and a tuple is represented in-directly, with the interlinkage of values of the samerow via virtual properties. This design choice leadsto a certain amount of redundancy in the representa-tion of database contents, especially in the case of re-lations with n attributes, where n(n − 1) virtual at-tributes are created. Correspondences of the createddatabase schema ontology with an existing domain on-tology are defined manually as rdfs::subClassOf andrdfs:subPropertyOf axioms.

An interesting summary of solutions for the trans-formation of relational tuples and hence relationalschemas in RDFS ontologies is given in the work ofLausen [78], with ultimate goal the identification ofthe optimal scheme allowing for database constraintchecking, while staying clear of OWL Full solutions.In this work, four different alternatives are presented.The tuple-based mapping is essentially the same as thebasic approach, with the only difference being the useof blank nodes for each tuple, instead of a generatedURI. The obvious disadvantage of this approach is theencoding of all database values as literals, which ig-nores the presence of foreign keys and leads to dupli-cation of data in the output RDF graph. This is over-come in value-based mapping, where URIs are gener-ated for the values of key attributes and thus, “foreignkey” properties can directly reference these values inthe output RDF graph. The URI-based mapping pro-duces a URI for every tuple of a relation, based on thevalues of the primary key of the relation, pretty muchfollowing the URI scheme of Table 3. Also, since pri-mary key values are encoded in the tuple URI, the pri-


mary key of a relation is not explicitly mapped to anRDF property. The drawback of this type of mappingis the fact that foreign key constraints may also im-plicitly be encoded as dependencies between URIs inthe RDF graph, in case the primary key of a relationis also a foreign key to some other relation, a resultthat is highly undesirable. The object-based mappingsticks to the assignment of URIs to the values of keyattributes (as in the case of value-based mapping), butforeign keys are represented as properties linking theblank node identifiers of the tuples being linked viathe foreign key relationship. Again, foreign key con-straints are difficult to be checked in a straightforwardway, since a property path should be followed in theRDF graph in order to ensure for referential integrity.Therefore, the solution adopted by Lausen is an RDFSontology that employs the value-based mapping ap-proach and uses a custom vocabulary of classes andproperties for the denotation of primary and foreignkeys. This generated database schema ontology is infact used to check database constraints via executionof appropriate SPARQL queries in [79].

Levshin [81] focuses on database schema constraintchecking as well. His approach constructs an OWLontology that follows the basic approach, but he alsoenriches it with vocabulary for database concepts andconstraint checking. Special care is taken for null val-ues, which are explicitly marked in the generated OWLontology, contrary to common practice, which simplyignores database null values during the transformationof database contents in RDF. Not null, primary, foreignkey and anti-key14 constraints are encoded as SWRLrules, which are executed as soon as the target on-tology is generated. Then, the source of violation ofthese constraints, if any, is retrieved through SPARQLqueries. This approach is slightly different to the oneof Lausen and colleagues [79], in which only SPARQLqueries are used for the inspection of constraint vio-lations. Both of these works distinguish constraints topositive and negative ones, depending on whether thepresence or the lack of some data causes the violationof the constraint. While Lausen and colleagues pro-pose an extension to SPARQL so as to be able to checkthe violation of all possible constraints, Levshin suc-ceeds in doing so by definition of two distinct prop-erties (violates and satisfies) for the two type of con-

14An anti-key constraint for a number of attributes in a relation issatisfied if there is at least one pair of tuples with identical values forthese attributes.

straints and then, by combined use of SWRL rules withSPARQL queries for spotting violations.

The complications arising when considering valida-tion of database schema constraints through ontologylenses stem from two fundamental differences in thesemantics of relational databases and description log-ics ontologies. In database systems, both the closedworld assumption (CWA) and the unique name as-sumption (UNA) hold, while DL ontologies follow theopen world assumption (OWA) and do not adopt UNA.This means that, what is not stated explicitly in a rela-tional database instance is known to be false, while ina DL ontology, when something is not explicitly statedit is simply not known whether it holds or not. Fur-thermore, the lack of UNA in an ontology allows dif-ferent names to refer to the same individual, in con-trast with databases where every instance is identifiedby a unique name. These differences in semantics be-tween databases and ontologies have been extensivelyinvestigated in the works of Motik and colleagues [90]and Sirin and Tao [115], proposing different seman-tics for the interpretation of ontology axioms that actas integrity constraints. In fact, in [115], it is arguedthat validation of database integrity constraints canbe performed through the execution of SPARQL con-straints, therefore giving right to the efforts describedin [79,81].

Slight variations of the tuple-based mapping are pre-sented in CROSS [35] and in the work of Chatter-jee and Krishna [36]. CROSS exports an OWL rep-resentation of a relational database schema and actsas a wrapper for the data contained in the database,i.e. it employs ETL access to the schema and querybased access to the data. For the data component ofthe created ontology, CROSS adopts a variant of thebasic approach: one class for every database relationis constructed, tuples are translated to instances of therelation’s class, and values are translated to literalsthat are not directly linked to the tuple instances, butthrough an intermediate blank node instead. Databaseschema constraints are interpreted as OWL axiomsand thus, their validity (with the exception of foreignkey constraints) can be checked with application of anOWL reasoner. Chatterjee and Krishna [36] constructan RDF graph that follows the principles of tuple-based mapping and they enrich it with vocabulary per-taining to the relational and ER models. No hints aregiven though as to how this RDF representation couldbe used for ensuring the validity of data with respectto database schema constraints.


OntoMat-Reverse [125] is a semantic annotationtool for dynamic web pages (now part of the OntoMat-Annotizer tool), supporting two different workflowscenarios: one for the annotation of the web presenta-tion and another for direct annotation of the databaseschema. While the latter workflow is a typical exampleof a mapping discovery approach between a databaseschema and an existing ontology (and is therefore de-scribed in Section 6), the former involves the construc-tion of a database schema ontology and is of interestin the current Section. The database schema ontologyconstructed in this approach is a simple RDFS ontol-ogy that represents just the structure of the database.Terms from this ontology are used in RDFS represen-tations of SQL queries that, when executed, return val-ues to be used in dynamically generated web pages andwhich are also embedded in the web presentation. Themanual annotation that is then performed through agraphical interface essentially relates an ontology termto the column of the result set of an SQL query. In thiscase, the constructed database schema ontology is usedas a mere provider of column identifiers for the anno-tation task, thus one could question its necessity, sincesimpler solutions could have been equally successfulfor this task.

The relational schema ontology of Spyder15 is an-other example of a database schema ontology. Spyderis a tool that outputs a domain-specific ontology ac-cording to manually specified mappings and therefore,does not quite fit in this category. However, it is men-tioned here because, as an intermediate step, it gen-erates an ontology that describes a database schema.This ontology provides a vocabulary for the databaseterms that are going to be referred to in the specifica-tion of mappings.

In Table 4, we summarize the design choices ofthe mentioned approaches by explicitly presenting theontology constructs used for each of the relationaldatabase schema elements (approach-specific ontologyvocabulary is marked with italics), including schemaconstraints for attributes, such as domain and not nullconstraints. We also state whether approaches considerthe enrichment of their generated database schema on-tology with domain semantics, usually in the form ofcorrespondences with an external domain ontology.

15Spyder homepage: http://www.revelytix.com/content/spyder-beta

5.2. Generation of a domain-specific ontology

Instead of creating an ontology that mirrors thedatabase instance and adding at a later stage domainsemantics, it is preferable to create directly an ontol-ogy that refers to the subject domain of the contents ofthe database. In this section, we deal with approachesthat create domain-specific ontologies, in contrast withthe database schema ontologies considered in Section5.1. These domain-specific ontologies do not containany concepts or relationships that are related to the re-lational or ER models but instead, concepts and rela-tionships pertaining to the domain of interest that isdescribed by a relational database instance. Naturally,the expressiveness of the generated ontology largelydepends on the amount of domain knowledge incorpo-rated in the procedure. The two main sources of do-main knowledge are the human user and the relationaldatabase instance itself. This is the reason why we dis-tinguish between approaches that mainly rely on re-verse engineering the relational schema of a databasefor the generation of a domain-specific ontology (butcan also accept input from a human user) and ap-proaches that do not perform extensive schema analy-sis but instead are mainly based on the basic approachand optionally, input from a human expert.

5.2.1. Approaches not performing reverseengineering

In this section, we take a look at approaches thatgenerate a new domain-specific ontology from an ex-isting relational database without performing extensivedatabase schema analysis for the extraction of domainsemantics. Approaches of this category mainly use thebasic approach for exporting a relational database in-stance to an RDFS ontology and, in the majority ofcases, allow a human user who is familiar with themeaning of the database contents to define semanti-cally sound mappings, expressed in a custom mappingrepresentation language. Approaches in this categoryare usually accompanied with working software appli-cations and this is one of the reasons why this categoryof tools is by far the most popular and why several rel-evant commercial tools also begin to emerge16.

The automation level of this category of tools variesand depends on the level of involvement of the humanuser in the process. Most of these tools support all ofthe three possible workflows: from the fully automated

16A notable example is Anzo Connect: http://www.cambridgesemantics.com/products/anzo_connect

http://www.revelytix.com/content/spyder-beta

http://www.revelytix.com/content/spyder-beta

http://www.cambridgesemantics.com/products/anzo_connect

http://www.cambridgesemantics.com/products/anzo_connect


Table 4Overview of approaches of Section 5.1

ApproachRelational schema elements Link with

domainontology

Relation Attribute Tuple Primary Key Foreign Key Attribute Constraints

Automapper[53]

Class Datatype prop-erty

Instance of a re-lation’s class

SWRL rule – owl:allValuesFrom andowl:cardinality = 1 ax-ioms for domain and notnull constraints respec-tively

Yes

CROSS[35]

Subclass of Rowclass

Datatype prop-erty

Instance of Rowclass

Object property Object property rdfs:range restriction onthe attribute’s propertyfor domain constraints

No

DataMaster[93]a

Class Datatype prop-erty

Instance of a re-lation class

– Instance of For-eignKey class

– No

Instance of Ta-ble class

Instance of Col-umn class

– Instance of Pri-maryKey class

references prop-erty

rdfs:range restriction onhasXSDType propertyfor domain constraints

No

FDR2 [74] RDF list of at-tributes’ classes

Class RDF list of val-ues

– – – Yes

Kupfer etal. (2006)[75]

Instance of Re-lation class

Instance of At-tribute class

– – isAssociatedWithproperty

consistsOf property fordomain constraints

Yes

Lausen(2007) [78]

Class Property Blank node Instance of Keyclass

Instance ofFKey class

– No

Levshin(2009) [81]

Subclass of Tu-ple class

Datatype prop-erty


SWRL rule SWRL rules rdfs:range restriction onthe attribute’s propertyfor domain constraints

No

OntoMat-Reverse[125]




– type property for domainconstraints

Yes

RDB2ONT[121]

Instance of theRelation class

Property and in-stance of the At-tribute class


Instance ofPrimaryKeyAt-tribute class

referenced At-tribute property

isNullable and typeproperties for not nulland domain constraintsrespectively

Yes

RelationalOWL [95]

Class and in-stance of Tablemeta-class

Datatype prop-erty and in-stance of Col-umn meta-class


Instance of Pri-maryKey class

references prop-erty

rdfs:range restriction onthe attribute’s propertyfor domain constraints

Yes [96]

ROSEX[43]




Instance of For-eignKey class

– Yes

Spyder[88]




Instance of For-eignKey class

dataType, defaultValueand nullable propertiesfor domain, defaultvalue and not nullconstraints respectively

No

aDataMaster offers three alternative operation modes, one of them being the same as Relational.OWL and therefore not mentioned in the table.

standard basic approach with or without inspection andfine-tuning of the final result by the human user tomanually defined mappings. Regarding data accessi-bility, all three modes are again met among reviewedapproaches with a strong inclination towards SPARQLbased data access.

The language used for the representation of the map-ping is particularly relevant for methods of this group.Unlike the case of tools outputting a database schemaontology, where the correspondences with database el-ements are obvious, concepts and relationships of adomain-specific ontology may correspond to arbitrar-ily complex database expressions. To express thesecorrespondences, a rich mapping representation lan-guage that contains the necessary features to cover

real world use cases is needed. Typical mapping rep-resentation languages are able to specify sets of datafrom the database as well as transformation on thesesets that, eventually, define the form of the resultingRDF graph. It comes as no surprise that, until today,with R2RML not being yet finalized, every tool usedto devise its own native mapping language, each withunique syntax and features. This resulted in the lock-in of mappings that were created with a specific tooland could not be freely exchanged between differentparties and reused across different mapping engines. Itwas this multilingualism that has propelled the devel-opment of R2RML as a common language for express-ing database to ontology mappings. Plans for adoptionof R2RML are already under way for some tools and


the number of adopters is about to increase in the nearfuture.

Since the majority of the tools we deal with in thissection rely on the basic approach for the generationof an ontology, the output ontology language is usuallysimple RDFS. The goal of these tools is rather to gen-erate a lightweight ontology that possibly reuses termsfrom other vocabularies for increased semantic inter-operability, than to create a highly expressive ontologystructure. Vocabulary reuse is possible in the case ofmanually defined mappings, but the obvious drawbackis the fact that, in order to select the appropriate on-tology terms that better describe the semantics of thedatabase contents, the user should also be familiar withpopular Semantic Web vocabularies. The main motiva-tion that is driving the tools of this Section is the massgeneration of RDF data from existing large quantitiesof data residing in relational databases, that will in turnallow for easier integration with other heterogeneousdata.

D2RQ [27] is one of the most prominent tools inthe field of relational database to ontology mapping.It supports both automatic and user-assisted operationmodes. In the automatic mode, an RDFS ontology isgenerated according to the rules of the basic approachand additional rules, common among reverse engi-neering methodologies, for the translation of foreignkeys to object properties and the identification of M:N(many-to-many) relationships. In the manual mode,the contents of the relational database are exported toan RDF graph, according to mappings specified in acustom mapping language also expressed in RDF. Asemi-automatic mode is also possible, in case the userbuilds on the automatically generated mapping in orderto modify it at will. D2RQ’s mapping language offersgreat flexibility, since it allows for mapping virtuallyany subset of a relation in an ontology class and severaluseful features, such as specification of the URI gen-eration mechanism for ontology individuals or defini-tion of translation schemes for database values. D2RQsupports both ETL and query based access to the data.Therefore, it can serve as a programming interface andas a gateway that offers ontology-based data accessto the contents of a database through either Seman-tic Web browsers or SPARQL endpoints. The enginethat uses D2RQ mappings to translate requests fromexternal applications to SQL queries on the relationaldatabase is called D2R Server [25]. Vocabulary reuseis also supported in D2RQ through manual editing ofthe mapping representation files.

OpenLink Virtuoso Universal Server is an integra-tion platform that comes in commercial and open-source flavours and offers an RDF view over a rela-tional database with its RDF Views feature [28], thatoffers similar functionality to D2RQ. That is, it sup-ports both automatic and manual operation modes. Inthe former, an RDFS ontology is created followingthe basic approach, while in the latter, a mapping ex-pressed in the proprietary Virtuoso Meta-Schema lan-guage is manually defined. This mapping can covercomplex mapping cases as well, since Virtuoso’s map-ping language has the same expressiveness to D2RQ’s,allowing to assign any subset of a relation to an RDFSclass and to define the pattern of the generated URIs.ETL, SPARQL based and Linked Data access modesare supported. One downside of both D2RQ and Virtu-oso RDF Views is the fact that a user should familiarizehimself with these mapping languages in order to per-form the desired transformation of data into RDF, un-less he chooses to apply the automatic basic approach.

In the same vein, SquirrelRDF [109] and Spydercreate an RDFS view of a relational database, allow-ing for the execution of SPARQL queries against it.Both tools support all the workflows mentioned ear-lier, that cover the entire range of automation levels.Their automatic mode is typically based on the basicapproach and they define their own mapping represen-tation languages, although very different in their ex-pressiveness. SquirrelRDF’s mapping language is ex-tremely lightweight and lacks several essential fea-tures, as it only supports mapping relations to RDFSclasses and attributes to RDF properties. On the con-trary, Spyder’s mapping language is much richer, whilethe tool also supports a fair amount of R2RML fea-tures. The main data access scheme for both tools isSPARQL based, however SquirrelRDF does not sup-port certain kinds of SPARQL queries that, when trans-formed to SQL, prove to be very computationally ex-pensive.

Tether [31] also outputs an RDF graph from a rela-tional database. In this work, several shortcomings ofthe basic approach are identified, some of them men-tioned in the introduction of Section 5, and fixes forthese shortcomings are proposed. However, this workis targeted specifically to cultural heritage databasesand the majority of the suggestions mentioned dependlargely on the database schema under examination andthe cultural heritage domain itself. The decision as towhether the proposed suggestions are applicable andmeaningful in the database under consideration is to bemade by a domain expert.


As the basic approach is an integral part of the work-flow of most tools in this category, the RDB2RDFWorking Group, which is responsible for the R2RMLspecification, has been chartered in order to also pro-pose a basic transformation of relational databases toRDFS ontologies, called direct mapping [24]. The di-rect mapping is supposed to standardize the informalbasic approach that has been used so far with slightvariations from various tools. Therefore, taking intoaccount the structure of a database, the direct mappingproduces an RDF graph that can be later modified toachieve more complex mappings.

Triplify [14] is another notable RDF extraction toolfrom relational schemas. SQL queries are used to se-lect subsets of the database contents and map them toURIs of ontology classes and properties. The mappingis stored in a configuration file, the modification ofwhich can be performed manually and later enrichedso as to reuse terms from existing popular vocabular-ies. The generated RDF triples can be either materi-alized or published as Linked Data, thus allowing fordynamic access. Triplify also accommodates for ex-ternal crawling engines, which ideally would want toknow the update time of published RDF statements.To achieve this, Triplify also publishes update logs inRDF that contain all RDF resources that have been up-dated during a specific time period, which can be con-figured depending on the frequency of updates. Thepopularity of the Triplify tool is due to the fact thatit offers predefined configurations for the (rather sta-ble) database schemas used by popular Web applica-tions. An additional advantage is the use of SQL queryfor the mapping representation, which does not requireusers to learn a new custom mapping language.

A tool that follows a somehow different rationaleis METAmorphoses [118]. METAmorphoses uses atwo-layer architecture for the production of a material-ized RDF graph that can potentially reuse terms fromother popular ontologies. Apart from the typical map-ping layer, where a mapping is defined in a customXML-based language, there is also a template layer,where the user can specify in terms of another XML-based language the exact output of the RDF graph byreusing definitions from the mapping layer. This ar-chitecture introduces unnecessary complexity for thebenefit of reusing mapping definitions, a feature that isachieved much more elegantly in R2RML.

Another approach that needs to be mentioned atthis point is OntoAccess [63], which also introducesa mapping language, R3M, for the generation of anRDF graph from a relational database. Although R3M

shows considerably less features from other more ad-vanced mapping languages, it contains all the neces-sary information for propagating updates of the RDFgraph back to the database. OntoAccess offers dataaccess only through SPARQL and is one of the fewapproaches that focus on the aspect of updating RDFviews and ensuring that database contents are modi-fied accordingly. Nevertheless, it does not allow the se-lection of an arbitrary subset of database contents likemost approaches do, but only supports trivial relationto class and attribute and link relations (representingmany-to-many relationships) to property mappings.

A very crude overview of some of the features sup-ported by the mapping definition languages of the toolsmentioned in this section is shown at Table 5. Amongthe features included are:

– Support for conditional mappings, allowing forthe definition of complex mappings that go be-yond the trivial “relation-to-class” case, by en-abling the selection of a subset of the tuples of therelation, according to some stated condition.

– Customization of the URI generation scheme forclass individuals, allowing for the definition ofURI patterns that may potentially use values fromattributes other than the primary key (therefore,generalizing the basic approach URI generationscheme).

– Application of transformation functions to at-tribute values for the customization of the RDFrepresentation. Such transformations may includestring patterns combining two or more databasevalues or more complex SQL functions.

– Support for creation of classes from attribute val-ues.

– Foreign key support, which interprets foreign keyattribute values as properties that link two indi-viduals with each other, instead of simple literals.

A more in-depth comparison of mapping representa-tion languages can be found at [64].

5.2.2. Approaches performing reverse engineeringThe tools presented in Section 5.2.1 mainly rely on

the human user as a source of domain knowledge andsupport him on the mapping definition task, by pro-viding an initial elementary transformation often basedon the basic approach, that can be later customized.On the contrary, approaches presented in this sectionconsider the relational database as the main sourceof domain knowledge, complemented with knowledgefrom other external sources and optionally, a human


Table 5Comparison of language features for tools of Section 5.2.1

Language Conditionalmappings

URI genera-tion schemefor individu-als

Transformationfunctionsfor attributevalues

Attribute val-ues to classes

Foreign keysupport

D2RQ [27]√ √ √ √ √

METAmorphoses[118]

√– – –

√

R2RML√ √ √ √ √

R3M [63] –√

– –√

Spyder√ √ √ √ √

SquirrelRDF [109] – – – – –Virtuoso Meta-Schema Language[28]

√ √ √ √ √

expert as well. Examples of such sources are databasequeries, data manipulation statements, application pro-grams accessing the database, thesauri, lexical reposi-tories, external ontologies and so on.

The majority of methods in this category gain in-spiration from traditional database reverse engineeringwork. The process of reverse engineering the logicalschema of a database to acquire a conceptual one isa well known problem of database systems research,having attracted the interest of many researchers sincethe early 90s. Since the nature of the problem is ratherbroad, as it encompasses different logical and concep-tual models, the related literature is extremely rich andsolutions proposed show considerable variance. Sev-eral methodologies for the reverse engineering of a re-lational database to a conceptual schema that followsthe Entity-Relationship or other object-oriented mod-els have been proposed and have later been adaptedaccordingly for providing solution to the relationaldatabase to ontology mapping problem. Still though,there exists no silver bullet for the accurate retrievalof the conceptual schema that underpins a relationaldatabase. This is due not only to the unavoidable lossof semantics that takes place in the logical databasedesign phase, as different constructs of the ER modelmap to elements of the same type in the relationalmodel, but also to the fact that database designers oftendo not use standard design practices nor do they namerelations and attributes in a way that gives insight totheir meaning [101]. As a consequence, reverse engi-neering a relational schema is more of an art and lessof a formal algorithm with guaranteed results.

Traditional relational database reverse engineeringsolutions vary significantly because of the different as-

sumptions they are based on. These assumptions re-fer to the normal form of the input relational schema,the amount of information that is known with respectto the relational schema, the target model the concep-tual schema is expressed in and the external sourcesof information used. Most methods assume that the in-put relational schema is on third normal form (3NF),which is the least strict normal form that allows for de-composing relations into relations that preserve someinteresting properties [102]. Nevertheless, there aremethods that are less restrictive and are able to man-age relational schemas in second normal form (2NF)[103]. Moreover, some approaches do not consider asgiven the candidate, primary and foreign keys of eachrelation and therefore, try to discover as well func-tional and inclusion dependencies by analyzing thedata of the database (e.g. [6,40]). The knowledge ofkeys in a relational schema is a very important is-sue, that greatly simplifies the reverse engineering pro-cess. As such, it is indispensable for most approaches(e.g. [67,87]), although exceptions do exist [101,103].Some methodologies focus on specific aspects of theconceptual model, such as cardinalities of participationconstraints in relationships [6] or generalization hierar-chies [77], and attempt to identify them. Finally, meth-ods differ with respect to the target output model con-sidered: some extract an EER model (e.g. [40]), whileothers target object-oriented models (e.g. [20,101]). Ashinted earlier, most reverse engineering methods aresemi-automatic, as they require validation from a hu-man expert and can use additional information sources,such as queries that are executed against the databaseor relevant domain thesauri [70].


Despite the variance of the above methodologies,they can all prove quite useful and relevant when con-sidering the generation of a domain ontology fromrelational databases, since Description Logics – onwhich OWL is based – bear some similarities withobject-oriented models. The approaches mentioned inthis category accept as input a relational schema, usu-ally in the form of SQL Data Definition Language(DDL) statements, which contain information aboutprimary and foreign keys as well as domain constraintsfor relational attributes, and generate a domain ontol-ogy by mining information not only from the databaseinstance, but also from other sources of domain knowl-edge. The generation of the ontology is based on a setof heuristic rules, that essentially try to implement theinverse transform of the logical database design pro-cess, sketched in Table 1. The variety of assumptionsof traditional reverse engineering approaches that wasmentioned before is less evident here, as the vast ma-jority of methodologies imply the existence of a 3NFrelational schema with full knowledge of keys and do-main constraints, while the output is an ontology usu-ally expressed in OWL or another DL language, in thegeneral case.

Apart from this typical line of work, there are alsoa couple of other relevant groups of methods. The firstone considers an ER model as the input to the ontol-ogy creation process, while the second one uses inter-mediate models for the transition from the relational tothe ontology model. In the first case, the input for theprocess is often not available, as conceptual schemasusually get lost after the logical design phase or sim-ply exist in the minds of database designers. There-fore, the practical utility of these tools is arguable, al-though they tend to generate more expressive ontolo-gies, compared to approaches that work on a relationalschema. This is natural, since an ER model is a con-ceptual model and most, if not all, of its features canbe represented in an OWL ontology. Approaches, suchas ER2WO [127], ERONTO [122], ER2OWL [52] andthe work of Myroshnichenko and Murphy [92] gen-erate ontologies that cover the entire range of OWLspecies – from OWL Lite to OWL Full – by applyinga set of rules that translate constructs of the ER modelto OWL axioms. Thus, these tools do not need to ap-ply reverse engineering in the relational schema andare restricted to a simple model translation. The sec-ond group of approaches that deserves a mention usesan intermediate model for the generation of an OWLontology from a relational database [13,65] and con-

stitutes an odd minority, compared to the large numberof approaches that directly create a domain ontology.

Approaches of this category are mainly automatic,since they rely, to a large extent, on a set of generalrules that are applied on every possible input relationalschema. Of course, the result of this procedure cannotbe always correct and represent faithfully the mean-ing of the relational schema and thereby, a human ex-pert is often involved in the process to validate the re-sulting ontology and, optionally, enrich it with linksto other domain ontologies either manually or semi-automatically by following suggestions produced bysome algorithm [82]. In some cases, the discovery ofcorrespondences between the generated ontology andother domain ontologies or lexical databases, such asWordNet, is performed automatically (e.g. in [16] and[68] respectively). An ER model can also be takeninto account for the validation of the resulting ontol-ogy [4]. However, in contrast with early reverse en-gineering methods, no approach utilizes queries andSQL DML statements in order to extract more accu-rate domain ontologies. A summary of the informationsources considered by approaches of this category forthe generation of a domain ontology is depicted in Ta-ble 6.

Regarding data accessibility, all approaches outputa materialized ontology. This is due to the fact thatthe main motivation of this group of approaches is thelearning of a new ontology that can be later used toother applications and thus, needs to be materialized.This is also reflected in the absence of mapping rep-resentation for the majority of these approaches, sincethe relational database is used as a knowledge sourcein this context and the storage of mappings is of littleinterest as they will not be reused. As far as the ontol-ogy language of the resulting ontology, there seems tobe consensus about the use of OWL, with very few ex-ceptions [11,84,117]. Vocabulary reuse is not possiblein this category of approaches, although some of them,as mentioned before, try to establish links with existingdomain ontologies, in order to better ground the termsof the newly generated ontology. A shortcoming that iscommon among all approaches that apply reverse en-gineering techniques is the structural similarity of thegenerated ontology with the database schema.

In contrast with the tools of Section 5.2.1, whichexport the database contents as RDF graphs and, infact, create instances of RDFS classes, approachesthat perform reverse engineering do not always popu-late the generated ontology classes with data from thedatabase, thus restricting themselves in the generation


Table 6List of input information sources for approaches of Section 5.2.2

ApproachInput sources

Schema Data OtherSources

User

Astrova (2004)[11]

√ √ √

Astrova (2009)[12]

√

Buccella et al.(2004) [29]

√ √

DB2OWL [55]√

DM-2-OWL [5]√

Juric, Banek,Skocir [68]

√WordNet

Lubyte, Tessaris(2009) [84]

√ √

R2O [116]√

ER model forvalidation

RDBToOnto [34]√ √ √

ROSEX [43]√

Shen et al. (2006)[112]

√

SOAM [82]√

Lexicalrepositories

√

SQL2OWL [4]√ √

ER model forvalidation

√

L. Stojanovic, N.Stojanovic, Volz(2002) [117]

√ √

Tirmizi, Sequeda,Miranker (2008)[120]

√

of an ontology TBox. Approaches that do populate thegenerated ontology with instances tend to materializethem, with the exception of DB2OWL [55] and Ultra-wrap [110] tools that offer dynamic SPARQL basedaccess.

The heuristic rules on which the ontology genera-tion are often based on the separation of relations, ac-cording to the number of attributes that constitute theprimary key, the number of foreign keys and the inter-section of primary and foreign keys of a relation. Inmost cases, these rules are described informally [29,55,82,112,116,117] and sometimes, only through con-venient examples that cannot be generalized [11,12].Moreover, few methods yield semantically sound re-sults, as they often contain faulty rules that misin-terpret the semantics of the relational schema, whilein other cases, OWL constraints are misused. Twoof the most complete and formalized approaches areSQL2OWL [4] and the work of Tirmizi and colleagues

[120], in which a set of Horn rules that covers all pos-sible relation cases is proposed.

As the sets of rules are, to a large extent, repeatedacross different approaches, we refrain from present-ing separately each approach and instead present in-formally the gist of the most significant categories ofrules proposed:

1. Default rules. These are the rules describing andextending the basic approach for the case of theOWL model, i.e. a relation maps to an OWLclass, non-foreign key attributes are mapped toOWL datatype properties and foreign key at-tributes are mapped to OWL object propertieswith domain and range the OWL classes corre-sponding to the parent and child relations respec-tively. Moreover, a relation tuple maps to an in-dividual of the respective OWL class.

2. Hierarchy rules. Such rules identify hierarchiesof classes in the relational schema. Accordingto them, when two relations have their primarykeys linked with a foreign key constraint, theyare mapped to two OWL classes, the one beinga superclass of the other. More specifically, theclass that corresponds to the parent relation is su-perclass of the class that corresponds to the childrelation. This kind of rules often conflicts withfragmentation rules.

3. Binary relationships rules. These rules identifya relation that can be mapped to an OWL objectproperty. The primary key of such a relation iscomposed of foreign keys to two other relationsthat are mapped to OWL classes, while the re-lation itself has no additional attributes. One ofthese OWL classes constitutes the domain andthe other one the range of the generated objectproperty, with the choice being left to the hu-man user who may supervise the procedure. Inorder to overcome this obstacle, automatic meth-ods often create two inverse object properties. Incase a relation has additional attributes, it cor-responds to a binary relationship with descrip-tive attributes. Nevertheless, as this feature is notdirectly supported in OWL, it is mapped to anOWL class with appropriate datatype properties,much like in the default rules case.

4. Weak entity rules. These rules identify weak en-tities and their corresponding owner entities. Aweak entity is usually represented with a relationthat has a composite primary key that contains aforeign key to another relation, which represents


the owner entity. Such relations are still mappedto OWL classes, however the semantics of theidentifying relationship that connects a weak en-tity with its owner is difficult to be specified.Some rules, in fact, choose to interpret the re-lationship between the two relational as a pairof inverse hasPart and isPartOf object proper-ties. This kind of rules also conflicts with multi-valued property rules.

5. n-ary relationships rules. These rules identifyn-ary relationships, usually in cases where theprimary key of a relation is composed of foreignkeys to more than two other relations that aremapped to OWL classes. Essentially, such rulesare not different from default rules, given that n-ary relationships cannot be directly expressed inOWL, hence they are mapped to an OWL class.

6. Fragmentation rules. These rules identify rela-tions that have been vertically partitioned and,in fact, describe the same entity. Relations iden-tified are mapped to the same OWL class. Theidentification relies on the presence of the sameprimary key in all implicated relations, much likein the rules that discover hierarchies of classes.

7. Multi-valued property rules. These rules tryto identify relations that act as multi-valued at-tributes. As there are quite a few ways to encodea multi-valued attribute in a relational schema,it is difficult to recognize such an attribute. Thiskind of rules often assume that a relation that hasa composite primary key containing a foreign keyreferencing a parent relation can be translated toa datatype property with domain the class thatcorresponds to the parent relation. As mentionedbefore, these rules are based on the same assump-tions with weak entity rules.

8. Constraint rules. These rules exploit additionalschema constraints, which are present in SQLDDL statements. Such schema constraints in-clude restrictions on non-nullable and unique at-tributes, as well as domain constraints regard-ing an attribute’s datatype. These are usuallymapped to appropriate OWL cardinality axiomson datatype properties, inverse functional prop-erties and either global (rdfs:range axioms) orlocal universal quantification axioms (e.g. anowl:allValuesFrom axiom applied to a specificOWL class) respectively.

9. Datatype translation rules. These rules are es-sentially translation tables defining correspon-dences between SQL datatypes on the one hand

and XML Schema Types, which are typicallyused as datatypes in RDF and OWL, on the other.Such correspondences are defined in detail in theSQL standard [2].

The rule categories used by every method perform-ing reverse engineering for the generation of a domain-specific ontology are shown in Table 7, together with anote on whether the created ontology is populated withinstances from the database. It should be noted thatthis categorization of rules is somewhat rough, thusslight differences between approaches (e.g. how ap-proaches handle specific constraints, such as primarykey, uniqueness, not null constraints etc.) cannot be re-vealed in Table 7. A similar, more analytic overviewof a limited number of approaches, as well as a rathertechnical table that gathers the precise RDFS and OWLelements used by each approach can be found at [111].Approaches that include conflicting rule categories, asthe ones pinpointed earlier, often rely on the decisionof the human user or introduce further premises in or-der to differentiate between them. However, the va-lidity of these proposals is arguable, since relationalschemas that serve as counterexamples can be easilythought of.

It can be easily seen that the rules mentioned be-fore only investigate the relational database schemaand do not consider database contents, as some re-verse engineering approaches do by exploiting datacorrelations in the database instance and employingdata mining techniques. Until now, very few meth-ods belonging to this category have incorporated someform of data analysis. Astrova [11] examines correla-tion among key, non-key attributes and data in key at-tributes between two relations, in order to extract var-ious types of class hierarchies (e.g. both single andmultiple inheritance hierarchies). However, this workis structured around informal convenient examples andits results cannot be generalized.

A much more interesting effort is RDBToOnto [34],which is the only tool of this category that proposes aformal algorithm for exploiting data from the databasein order to extract class hierarchies. The main intuitionbehind RDBToOnto is the discovery of attributes thatpresent the lowest variance of containing values. Theassumption is that these attributes are probably cate-gorical attributes that can split horizontally a relationR initially mapped to a class C(R), separating its tu-ples in disjoint sets, which in turn can be mapped todisjoint subclasses of the initial C(R) class. This dataanalysis step is applied to attributes that contain some


Table 7Rules applied to each approach of Section 5.2.2

ApproachRule categories

Instance population1 2 3 4 5 6 7 8 9

Astrova (2004) [11]√ √ √ √ √ √

–√

– YesAstrova (2009) [12]

√ √ √ √ √– –

√ √Yes

Buccella et al. (2004) [29]√

–√ √ √

– –√ √

NoDB2OWL [55]

√ √ √– – – –

√ √Yes

DM-2-OWL [5]√ √ √

– – – –√ √

NoJuric, Banek, Skocir [68]

√–

√– – – –

√ √No

Lubyte, Tessaris (2009) [84]√

–√

–√

– –√

– NoR2O [116]

√ √ √ √ √–

√ √– No

RDBToOnto [34]√ √ √

– – – – – – YesROSEX [43]

√–

√ √– – – – – No

Shen et al. (2006) [112]√ √ √

– –√

–√ √

YesSOAM [82]

√ √ √ √ √ √–

√ √Yes

SQL2OWL [4]√ √ √

–√ √ √ √ √

OptionalL. Stojanovic, N. Stojanovic, Volz (2002) [117]

√ √ √–

√ √–

√– Yes

Tirmizi, Sequeda, Miranker (2008) [120]√ √ √

–√

– –√ √

Yes

lexical clues in their name, which potentially can clas-sify them as categorical attributes. RDBToOnto com-bines this data analysis step with some of the mostcommon heuristic rules already mentioned and gener-ates an OWL ontology.

As most of the approaches that belong to this groupare based on heuristics, their efficiency and abilityto interpret correctly the full meaning of a relationalschema is doubtful, since they cannot capture all possi-ble database design practices and guess the original in-tention of the designer. The most significant problemswith this kind of approaches are:

Identification of class hierarchies. Typically, hier-archies in relational databases are modeled by the pres-ence of a primary key that is also a foreign key refer-ring to the primary key of another relation. We have al-ready pointed out the fact that this design can also im-ply that a relation has been vertically partitioned andinformation on the same instance has been split acrossseveral relations. However, this is only one of the pos-sible alternative patterns that can be used for express-ing generalization hierarchies in the relational model[77]. As we have seen, identification of categorical at-tributes and analysis of data may provide hints for min-ing subclasses.

Interpretation of primary keys. Primary keys andgenerally, attributes with unique values are usuallytransformed to inverse functional datatype properties.The introduction of such axioms leads to OWL Fullontologies, which seem to intimidate ontology design-

ers. Despite the fact that state of the art reasoners, suchas Pellet, support inverse functional datatype proper-ties, some methods, in order to avoid OWL Full, modelthe primary key of a relation as a separate class andtranslate the primary key constraint to an inverse func-tional object property. It is also quite surprising thatthe OWL 2 hasKey property is not considered by anyapproach for the modeling of primary key attributes,although this may be due to the fact that most methodsof this group were proposed before the introduction ofOWL 2. The owl:hasKey property also provides a nat-ural solution for composite primary keys, the transfor-mation of which has not been mentioned in any of themethods reviewed.

Elicitation of highly expressive ontology axioms.So far, no method exists that is able to map success-fully expressive OWL features, such as symmetric andtransitive properties or even more advanced OWL 2constructs like property chains, since these featurescannot be elicited by inspection of the database schemaalone. Furthermore, among the approaches reviewed,there are divergent opinions on the expression of do-main constraints for attributes. Some approaches sug-gest translating these constraints to rdfs:range axiomson the corresponding OWL datatype property, whileothers suggest the use of universal quantification ax-ioms for a specific class with the owl:allValuesFromconstruct.


Domain ontology

Relational database

Mapping engineMappings1

1

Schema matching algorithm

Mapping execution module

Mappings

2

RDF Graph

Fig. 4. Mapping an existing ontology to a relational database.

6. Mapping a Database to an Existing Ontology

So far, we have come across tools and methods thatdo not require a priori knowledge on existing ontolo-gies. In this section, we examine approaches that takeas granted the existence of one or more ontologies andeither try to establish mappings between a relationaldatabase and these ontologies or exploit manually de-fined mappings for generating an RDF graph that usesterms from these ontologies. These two alternatives aredepicted in Figure 4 (lines 2 and 1 respectively). Theunderlying assumption adopted by all approaches isthat the selected ontologies model the same domainas the one modeled by the relational database schema.Therefore, on the one hand, there are approaches thatapply schema matching algorithms or augment reverseengineering techniques as the ones discussed in Sec-tion 5.2.2 with linguistic similarity measures, in or-der to discover mappings between a relational databaseand an appropriately selected ontology. These map-pings can then be exploited in various contexts and ap-plications, such as heterogeneous database integrationor even database schema matching [7,49]. On the otherhand, there are tools that share quite a lot with toolsof Section 5.2.1, receiving user-defined mappings be-tween the database schema and one or more existingontologies and generating an RDF graph that essen-tially populates ontologies with instance data from thedatabase.

Given that the database schema and the ontology aredeveloped independently, they are expected to differsignificantly and often, the domains modeled by thetwo are not identical, but overlapping. Hence, map-pings are usually not trivial or straightforward and, in

the general case, they are expressed as complex ex-pressions and formulas. Thus, in any case, tools shouldbe able to discover and express complex mappings,with the first task being particularly challenging andvery hard to be automated. As a consequence, thevast majority of approaches in this category are ei-ther manual or semi-automatic. In the latter case, someuser input is required by the matching algorithm, usu-ally in the form of initial simple correspondences be-tween elements of the database schema and the on-tology. A purely automated procedure would need tomake oversimplifying assumptions on the lexical prox-imity of corresponding element names in the databaseschema and the ontology that are not always true.Such assumptions tend to overestimate the overall effi-ciency of mapping discovery methods, as for examplein MARSON [66] and OntoMat-Reverse [125].

As far as data accessibility is concerned, unlike allcategories of tools encountered so far, it is not appli-cable to approaches that simply output a set of map-ping expressions (line 2 in Figure 4), an observationthat prevented us from including data accessibility as aclassification criterion in Figure 1. For such tools, themain aim is the generation of mappings and thereby,they do not offer RDF dumps of database contents orsemantic query based access. These tools left aside,this category supports all three data access scenarios.

Regarding the mapping language parameter, al-though reusability of mappings is important for thiscategory of approaches, the same comment can bemade as in Section 5.2.1, i.e. every tool uses its ownproprietary mapping language. Again, expectations arethat the level of adoption of R2RML will increase inthe near future. All approaches support OWL ontolo-gies, while vocabulary reuse is not possible for sometools that require strictly one ontology to carry outthe mapping process. That being said, the vocabularyreuse parameter is also not applicable to approachesthat merely discover mappings between a relationaldatabase and an ontology, for the same reason as indata accessibility. Another distinguishing feature oftools in this category is the presence of a graphicaluser interface in the majority of cases. Usually, the in-terface allows the user to either graphically define themappings between the database and the ontology, sup-ported by the visualization of their structure or validateautomatically discovered mappings.

Starting off with approaches where mappings aremanually defined, we mention R2O [18], a declarativemapping language that allows a domain expert to de-fine complex mappings between a relational database


instance and one or more existing ontologies. R2O isan XML-based language that permits the mapping ofan arbitrary set of data from the database – as it con-tains features that can approximate an SQL query – toan ontology class. R2O’s expressiveness is comparableto the most advanced mapping languages discussed inSection 5.2.1, supporting features such as conditionalmappings or specification of URI generation schemes.The ODEMapster engine [17] exploits mappings ex-pressed in R2O in order to create ontology individualsfrom the contents of the database allowing for eithermaterialized or query-driven access.

Manual mappings between a database schema anda given ontology are defined in DartGrid [38] andits successor Linked Data Mapper [130]. DartGriddefines a typical database integration architecture,a critical component of which is the definition ofmappings between each local database and a globalRDFS ontology. Correspondences between compo-nents of the two models are defined via a graphi-cal user interface and Local as View (LAV) map-pings (database relations expressed as a conjunctivequery against the ontology) are generated in the back-ground and stored in an RDF/XML format. DartGridalso allows ontology-based data access, by perform-ing translation of SPARQL queries to SQL, by ex-ploiting the mappings defined. Linked Data Mapperintegrates DartGrid’s interface with Virtuoso Univer-sal Server, therefore taking advantage of the capabili-ties and rich features of the latter and enabling, amongothers, ontology-based data access to the contents of arelational database.

Graphical user interfaces for the manual definitionof mappings are also included in the VisAVis [71] andRDOTE [123] tools. In both approaches, SQL is usedas the means for the specification of the data subset thatwill be mapped to instances of an ontology class. Inthe case of VisAVis, an SQL query is stored as a literalvalue of a special purpose datatype property for thecorresponding ontology class, while in RDOTE the ap-propriate SQL query is stored in a configuration file to-gether with possible transformation functions appliedto elements of the database. VisAVis allows for querybased access, while RDOTE outputs an RDF graphdump that uses classes and properties from the exist-ing ontology. The results of SQL queries are also usedfor the population of a template of an RDF/XML filein the RDB2Onto tool [76].

The RDB2OWL framework [30] adopts a slightlymore complex road for the storage and execution ofcomplex mappings between a relational database and

one or more existing ontologies. RB2OWL stores themappings in an external database schema, especiallytailored for the purpose. This database schema is, infact, a mapping meta-schema and contains relationsand attributes that store the description of the databaseschema and ontologies to be mapped as well as themappings themselves. RDB2OWL allows, similarly topreviously mentioned tools, for the selection of a datasubset from the database, thus simulating a basic SQLquery. Such SQL queries are in fact produced by SQLmeta-level queries and executed for the population ofthe OWL ontologies considered.

Moving on to approaches that discover mappings ina semi-automatic or even automatic way, we first men-tion the MAPONTO tool [8]. MAPONTO is a semi-automatic tool that receives as input from a human ex-pert initial correspondences between relation attributesand properties. The goal of MAPONTO is the extrac-tion of LAV mappings of the form R(X) : −Φ(X, Y ),where R(X) is some relation and Φ(X, Y ) is a con-junctive formula over ontology concepts with X ,Ybeing sets of variables or constants. Such mappingsdefine the meaning of database relations in terms ofan ontology and can be harnessed in a database in-tegration context. In short, MAPONTO views the re-lational schema as a graph (where foreign key linksare the connecting edges) and for each relation in it,tries to build a tree that defines the semantics of therelation. Foreign key links are traversed, adding vis-ited relations to the tree, while the algorithm is awareof the various relation structures that may representbinary, n-ary relationships or weak entities (follow-ing the typical reverse engineering rules mentionedin Section 5.2.2). By taking into account informationinitially provided by the user on the correspondencesbetween relation attributes and ontology properties,MAPONTO is able to encode, in a straightforwardmanner, the generated tree in the relational graph ina conjunctive formula that uses ontology terms. Withslight modifications, the same algorithm can be usedfor investigating the opposite direction, i.e. the discov-ery of Global as view (GAV) mappings that expressconcepts of the ontology as conjunctive queries overthe database. Such GAV mappings can then be directlyexploited for the population of an existing ontologywith database instances.

Another interesting approach in this group is theone used in MARSON [66]. At first, relations are cat-egorized on groups, according to whether they rep-resent entities or relationships, following the reverseengineering process of Chiang and colleagues [40].


Vectors of coefficients, called virtual documents, arethen constructed for every relation and attribute of thedatabase schema. The description of a relation takesinto account the description of relations connected toit via foreign keys, while the description of attributesincorporates the description of its parent relation andother relations connected to the latter as well as the at-tribute’s domain. Likewise, virtual documents for ev-ery class and property of the ontology are constructedand similarity between the elements of the databaseschema and the ontology is computed with pairwisecalculation of their identifying vectors’ cosine dis-tance. Essentially, the intuition is that the relationalschema is interpreted as a graph and some form ofelementary graph matching is performed. MARSONadopts the simplest case, in which an element is de-scribed only by its name, an obvious shortcoming thateliminates the possibility of matching e.g. a relationnamed academic_staff with a faculty class. As a nextstep, MARSON uses these simple correspondences be-tween relations and OWL classes in order to test theconsistency of the attribute-to-property mappings dis-covered, while it exploits database data and ontologyindividuals to discover more complex mappings, e.g.finding the most appropriate categorical attribute of arelation that splits it into subsets, which in turn cor-respond to disjoint OWL subclasses of a superclass.The discovery of the categorical attribute in a relationis accomplished by computing the information gain (ameasure known from the information theory field) forevery attribute in a relation, a procedure that raisesquestions regarding the computational efficiency of theapproach, as the number of string similarity checksneeded for the computation of the information gainis directly related to the data volume of the database.Nevertheless, MARSON is the only effort made so farto completely automatize the process of discoveringmappings between a relational database and a givenontology. As such, it relies on the assumption of lexicalproximity between names of elements in the databaseand the ontology during the calculation of similaritycoefficients and the assumption of existence of a suf-ficient number of individuals in the ontology, requiredfor the computation of information gain.

OntoMat-Reverse [125] is part of the OntoMat-Annotizer semantic annotation framework, alreadymentioned in Section 5.1, where scenario of the in-direct annotation of web presentation was described.The direct annotation of the database schema, fromwhich the data of a dynamic web page is retrieved, isthe second possible workflow scenario of this frame-

work. To this end, OntoMat-Reverse discovers map-pings between the database schema and an existingontology by complementing the reverse engineeringrules of Stojanovic and colleagues [117] with a lex-ical similarity metric. In essence, it recognizes spe-cial structures in the database schema (in much thesame way as the methods in Section 5.2.2) and tries tomatch them lexically with the classes and properties ofthe existing ontology. The discovery of mappings thentrigger the population of the existing ontology with in-stances from the database. The efficiency of this tool isbounded by both the inherent inability of reverse engi-neering methods to interpret correctly all database de-sign choices and the reliance on lexical methods to findthe best match. Naturally, validation from an expert isrequired in the end of the process.

Other tools that rely on lexical similarity mea-sures between database and ontology elements areRONTO [94], D2OMapper [128] and AuReLi [100].In RONTO, linguistic similarity measures are com-puted between corresponding structural elements ofa database and an ontology. Therefore, relation, nonforeign key and foreign key attribute names in thedatabase are compared to class, datatype and objectproperty names in the ontology, respectively, to findthe closest lexical matches, which are then validated bythe user. A similar matching approach is followed byD2OMapper, which uses the ER2WO [127] translationrules from an ER model to an OWL ontology. How-ever, as these rules present some peculiarities, such asthe mapping of binary relationships to OWL classes,it is arguable whether D2OMapper is able to discovermappings between a relational database and an ontol-ogy that has not been produced by the ER2WO tool.AuReLi, on the other hand, uses several string similar-ity measures to compare relation attributes with prop-erties of several specified a priori ontologies. Further-more, AuReLi tries to link database values with on-tology individuals by issuing queries to the DBPediadataset17. The results of the algorithm are then pre-sented to the human user for validation. AuReLi isimplemented as a D2R Server [25] extension offeringLinked Data and SPARQL data access.

In contrast with the string-based techniques de-scribed in the previous paragraph, StdTrip [106] pro-poses a structure-based framework that incorporatesexisting ontology alignment tools, in order to find on-tology mappings between a rudimentary ontology that

17DBPedia homepage: http://dbpedia.org/

http://dbpedia.org/


is generated from a relational database following thebasic approach and a set of given ontologies. As in Au-ReLi, the results of the ontology alignment algorithmare presented as suggestions to the human user, whoselects the most suitable ontology mapping.

We close this section by mentioning MASTRO[32], a framework that enables the definition of map-pings between a relational database and an ontologyand the issue of conjunctive queries expressed in EQL,an SQL-like language. The MASTRO framework isan ontology-based data access tool suite that reformu-lates a conjunctive query expressed over a DL-LiteAontology18 to an SQL query that is evaluated by therelational database. DL-LiteA has been proved to al-low for tractable reasoning, while an algorithm hasbeen proposed for the reformulation of a conjunctivequery to SQL [98]. This algorithm differs consider-ably from typical SPARQL to SQL rewriting methods[37,44,50,83] that are employed by other tools offeringdynamic data access. MASTRO is also complementedwith the OBDA plugin for the popular ontology edi-tor Protégé [99] that allows for the definition of GAVmappings between a database and an ontology via agraphical interface.

7. Discussion

Having gone through all different categories of ap-proaches that pertain to the database to ontology map-ping issue, we attempt to summarize and stress someof the main points addressed throughout the paper re-garding the main challenges faced by each group of ap-proaches. We also argue on why a common evaluationmethodology for all proposed solutions is very diffi-cult, if not impossible, to be developed, as some cate-gories of approaches can only be evaluated in a quali-tative basis, rather than in terms of an impartial quanti-fied metric. Table 8 lists all database to ontology map-ping tools and approaches mentioned throughout thepaper, together with values for each of the descriptiveparameters identified in Section 4.

Except for the points already discussed in Sections5 and 6, we point out the limited software availabil-ity, as less than half of the approaches are accompa-nied by publicly available software and even feweramong them are actively maintained and have reacheda certain level of maturity. We can also make an infor-

18DL-LiteA belongs to the DL-Lite family of languages, on whichthe OWL 2 QL profile is based and is a proper subset of OWL DL.

mal observation with respect to the correlation of thelevel of automation of approaches with their seman-tic awareness, i.e. the amount of data semantics theyelicit correctly. This happens mainly because a humanexpert is the most reliable source of semantics of adatabase schema. Hence, purely manual tools, wheremappings are given by the human user, capture by defi-nition the true meaning of the logical schema, whereaspurely automatic tools can rarely achieve such levels ofsuccessful interpretation. Semi-automatic tools, whichinteract with the human user, fall somewhere in be-tween, thus we could say that there is some trade-offbetween the level of automation of an approach and thelevel of semantic awareness it achieves.

Going into further detail with respect to challengesand efficiency of each category of approaches, wecould observe that the generation of a databaseschema ontology (Section 5.1) is rarely a goal perse, and the majority of methods reviewed corrobo-rate this statement. This happens because a databaseschema ontology is a faithful representation of the re-lational schema (and in some cases, of the contents aswell) of a database. As such, it does not contain anyother domain semantics that are usually of interest toa human user or an external application, which wouldrather abstract the low-level organization details of adatabase and interact with it in terms of higher-levelconcepts. That is why most approaches complementthe generated ontology with additional axioms thatlink its terms with concepts from a domain-specificontology and offer access to the database contents byallowing an external human or software agent to issuequeries expressed over this domain-specific ontology.The transformation of the issued semantic queries toSQL queries over the database is performed by tak-ing into account the defined correspondences betweenthe terms of the database schema ontology and thedomain-specific one. However, the benefits of adopt-ing a database schema ontology as an in-between layerinstead of selecting a direct mapping definition amonga database and an ontology have not yet been shown inpractice for the task of query rewriting.

On the contrary, reasoning mechanisms are used incases where an ontological representation of a rela-tional database is employed to check the validity ofdatabase schema constraints. The challenge with thiskind of methods (e.g. [79,81]) is, on the one hand, tobridge the gap between the closed world semantics ofdatabase systems with the open world assumption ofontologies and, on the other hand, to achieve an onto-logical representation that does not increase the


Tabl

e8:

Des

crip

tive

para

met

ers

ofap

proa

ches

revi

ewed

.

App

roac

hL

evel

ofA

utom

a-tio

nD

ata

Acc

essi

bilit

yM

appi

ngL

an-

guag

eO

ntol

ogy

Lan

-gu

age

Voca

bula

ryR

euse

Soft

war

eA

vaila

bilit

yG

raph

ical

Use

rInt

erfa

ceM

ain

Purp

ose

Ast

rova

(200

4,20

09)[

11,1

2]Se

mi-

auto

mat

ic/

Aut

omat

icE

TL

–F-

Log

ic/

OW

LD

LN

oN

oN

oO

ntol

ogy

lear

ning

AuR

eLi[

100]

Sem

i-au

tom

atic

SPA

RQ

L/L

inke

dD

ata

D2R

Qm

appi

ngla

ngua

ge(c

usto

mR

DF-

base

d)

OW

LY

esN

oY

esO

ntol

ogy-

base

dda

taac

cess

Aut

omap

per(

Asi

oSB

RD

)[53

]Se

mi-

auto

mat

icSP

AR

QL

cust

om(R

DF-

base

d)O

WL

+SW

RL

No

Com

mer

cial

Yes

Ont

olog

y-ba

sed

data

acce

ssB

ucce

llaet

al.

(200

4)[2

9]Se

mi-

auto

mat

icE

TL

–O

WL

DL

No

No

No

Dat

abas

ein

tegr

atio

n

CR

OSS

[35]

Sem

i-au

tom

atic

cust

omac

cess

prot

ocol

–O

WL

Yes

Yes

No

Dat

abas

ein

tegr

atio

n

D2O

Map

per[

128]

Sem

i-A

utom

atic

–cu

stom

repr

esen

-ta

tion

(XM

Lfil

e)O

WL

DL

No

No

No

Sem

antic

anno

tatio

nof

web

page

sD

2RQ

[ 27]

/D

2RSe

rver

[25]

Aut

omat

ic/

Man

-ua

lE

TL

/SPA

RQ

L/

Lin

ked

Dat

aD

2RQ

map

ping

lang

uage

(cus

tom

RD

F-ba

sed)

RD

FSY

esY

esN

oO

ntol

ogy-

base

dda

taac

cess

Dar

tGri

d[3

8]M

anua

lSP

AR

QL

cust

omre

pres

en-

tatio

n(R

DF/

XM

Lfil

e)

OW

LD

LN

oN

oY

esD

atab

ase

inte

grat

ion

Dat

aMas

ter[

93]

Aut

omat

icE

TL

–O

WL

DL

/Ful

lN

oY

esY

esG

ener

atio

nof

SWda

taD

B2O

WL

[ 55]

Aut

omat

icE

TL

/SPA

RQ

LR2

O(c

usto

mte

xt-

base

d)O

WL

DL

No

No

No

Dat

abas

ein

tegr

atio

n

DM

-2-O

WL

[5]

Aut

omat

icE

TL

–O

WL

Full

No

No

No

Ont

olog

yle

arni

ngD

olbe

ar,

Har

t(2

008)

[48]

Sem

i-au

tom

atic

SPA

RQ

L–

OW

LD

LN

oN

oN

oSp

atia

ldat

abas

ein

te-

grat

ion

FDR

2[ 7

4]Se

mi-

auto

mat

icE

TL

–R

DFS

Yes

No

No

Dat

abas

ein

tero

per-

abili

tyK

upfe

ret

al.

(200

6)[7

5]Se

mi-

auto

mat

icE

TL

–O

WL

-Lite

Yes

No

No

Dat

abas

ein

tegr

atio

n

Juri

c,B

anek

,Sk

ocir

[68]

Aut

omat

icE

TL

–O

WL

Full

No

No

No

Ont

olog

yle

arni

ng

Lau

sen

(200

7)[ 7

8]A

utom

atic

ET

L–

RD

FSN

oN

oN

oD

atab

ase

inte

grat

ion

Lev

shin

(200

9)[8

1]A

utom

atic

ET

L–

OW

L+S

WR

LN

oN

oN

oC

onst

rain

tval

idat

ion

Lin

ked

Dat

aM

ap-

per[

130]

Man

ual

SPA

RQ

LV

irtu

oso

Met

a-Sc

hem

aL

angu

age

OW

LN

oY

esY

esD

atab

ase

inte

grat

ion

Con

tinue

don

next

page


App

roac

hL

evel

ofA

utom

a-tio

nD

ata

Acc

essi

bilit

yM

appi

ngL

an-

guag

eO

ntol

ogy

Lan

-gu

age

Voca

bula

ryR

euse

Soft

war

eA

vaila

bilit

yG

raph

ical

Use

rInt

erfa

ceM

ain

Purp

ose

Lub

yte,

Tess

aris

(200

9)[8

4]Se

mi-

auto

mat

icSP

AR

QL

–D

LR-D

BN

oN

oN

oO

ntol

ogy-

base

dda

taac

cess

MA

PON

TO[8

]Se

mi-

Aut

omat

ic–

cust

omre

pres

en-

tatio

n(X

ML

file)

OW

LD

LN

oY

esY

esM

eani

ngde

finiti

onof

data

base

MA

RSO

N[ 6

6]A

utom

atic

–cu

stom

repr

esen

-ta

tion

(XM

Lfil

e)O

WL

DL

No

No

No

Dat

abas

ein

tero

per-

abili

tyM

AST

RO

[32]

/O

BD

Apl

ugin

for

Prot

égé

[99]

Man

ual

SPA

RQ

Lcu

stom

repr

esen

-ta

tion

DL-

LiteA

/O

WL

2Q

LN

oY

esY

esO

ntol

ogy-

base

dda

taac

cess

ME

TAm

orph

oses

[ 118

]M

anua

lE

TL

cust

om(X

ML

-ba

sed)

RD

FY

esY

esY

esG

ener

atio

nof

SWda

taO

ntoA

cces

s[6

3]M

anua

lSP

AR

QL

Upd

ate

cust

om(R

DF-

base

d)R

DFS

Yes

Yes

No

Ont

olog

y-ba

sed

data

upda

teO

ntoM

at-R

ever

se[ 1

25]

(Ont

oMat

-A

nnot

izer

)

Sem

i-A

utom

atic

ET

L/F

-Log

icF-

Log

icF-

Log

icN

oY

esY

esSe

man

tican

nota

tion

ofw

ebpa

ges

R2O

[116

]Se

mi-

auto

mat

icE

TL

–O

WL

DL

No

No

No

Dat

ain

tegr

atio

nR2

O/

OD

EM

ap-

ster

[17]

Man

ual

ET

L/

cust

omqu

ery

lang

uage

(OD

EM

QL

)

R2

O(c

usto

mte

xt-

base

d)O

WL

Yes

Yes

No

Ont

olog

y-ba

sed

data

acce

ss

RD

B2O

NT

[121

]A

utom

atic

ET

L–

OW

LFu

llN

oN

oN

oD

atab

ase

inte

grat

ion

RD

B2O

WL

[30]

Man

ual

ET

LSQ

LO

WL

Yes

No

No

Gen

erat

ion

ofSW

data

RD

BTo

Ont

o[3

4]Se

mi-

auto

mat

icE

TL

–O

WL

No

Yes

Yes

Ont

olog

yle

arni

ngR

DO

TE

[ 123

]M

anua

lE

TL

SQL

OW

LY

esY

esY

esG

ener

atio

nof

SWda

taR

elat

iona

l.OW

L[9

5]/

RD

Que

ry[9

7]

Aut

omat

icE

TL

/SPA

RQ

L–

OW

LFu

llN

oY

esY

esD

ata

exch

ange

ina

P2P

netw

ork

RO

NTO

[ 94]

Sem

i-au

tom

atic

–cu

stom

repr

esen

-ta

tion

RD

FS/O

WL

No

No

Yes

Gen

erat

ion

ofSW

data

RO

SEX

[43]

Aut

omat

icSP

AR

QL

cust

omon

tolo

gyO

WL

DL

No

No

No

Ont

olog

y-ba

sed

data

acce

ssSh

enet

al.

(200

6)[1

12]

Aut

omat

icE

TL

–O

WL

DL

No

No

No

Dat

abas

ein

tegr

atio

n

SOA

M[8

2]Se

mi-

Aut

omat

icE

TL

–O

WL

DL

No

No

No

Ont

olog

yle

arni

ngSQ

L2O

WL

[4]

Sem

i-A

utom

atic

ET

L–

OW

LD

LN

oN

oY

esO

ntol

ogy

lear

ning

Spyd

erA

utom

atic

/M

an-

ual

ET

L/S

PAR

QL

cust

om(R

DF-

base

d)/R

2RM

LR

DFS

Yes

Yes

No

Ont

olog

y-ba

sed

data

acce

ssSq

uirr

elR

DF

[109

]A

utom

atic

/M

an-

ual

SPA

RQ

Lcu

stom

(RD

F-ba

sed)

RD

FSY

esY

esN

oO

ntol

ogy-

base

dda

taac

cess

Con

tinue

don

next

page


App

roac

hL

evel

ofA

utom

a-tio

nD

ata

Acc

essi

bilit

yM

appi

ngL

an-

guag

eO

ntol

ogy

Lan

-gu

age

Voca

bula

ryR

euse

Soft

war

eA

vaila

bilit

yG

raph

ical

Use

rInt

erfa

ceM

ain

Purp

ose

StdT

rip

[106

]Se

mi-

auto

mat

icE

TL

cust

omre

pres

en-

tatio

n/S

QL

OW

LY

esN

oY

esG

ener

atio

nof

SWda

taL

.St

ojan

ovic

,N

.St

ojan

ovic

,Vo

lz(2

002)

[117

]

Sem

i-au

tom

atic

ET

L–

F-L

ogic

/RD

FSN

oN

oN

oSe

man

tican

nota

tion

ofw

ebpa

ges

Teth

er[3

1]Se

mi-

Aut

omat

icE

TL

–R

DFS

Yes

No

No

Gen

erat

ion

ofSW

data

Tirm

izi,

Sequ

eda,

Mir

anke

r(2

008)

[ 120

]

Aut

omat

icE

TL

–O

WL

DL

No

No

No

Gen

erat

ion

ofSW

data

Trip

lify

[ 14]

Man

ual

ET

L/L

inke

dD

ata

SQL

RD

FSY

esY

esN

oG

ener

atio

nof

SWda

taU

ltraw

rap

[110

]A

utom

atic

SPA

RQ

L–

OW

LD

LN

oN

oN

oO

ntol

ogy-

base

dda

taac

cess

Vir

tuos

oR

DF

Vie

ws

[ 28]

Aut

omat

ic/

Man

-ua

lE

TL

/SPA

RQ

L/

Lin

ked

Dat

aV

irtu

oso

Met

a-Sc

hem

aL

angu

age

RD

FSY

esY

esY

esO

ntol

ogy-

base

dda

taac

cess

Vis

AVis

[71]

Man

ual

RD

QL

SQL

OW

LD

LN

oN

oY

esM

eani

ngde

finiti

onof

data

base


complexity of reasoning tasks beyond acceptable lev-els. Methods dealing with this problem use a varietyof technologies, including rules, SPARQL queries aswell as the metamodeling and the inverse functionaldatatype property features of OWL. However, there arestrong doubts regarding the practical value of such so-lutions, given that validation of database schema con-straints is already performed efficiently by databasemanagement systems, does not constitute an impor-tant database performance factor and it is highly un-likely that solutions involving ontology reasoning en-gines will prove more efficient. In fact, none of theseworks offers a practical evaluation of their proposals,which seem more like theoretical exercises examin-ing the interactions of the relational model with OWLontologies. A suitable evaluation methodology wouldprobably need to assess the efficiency of each approachwith respect to the number of relations and constraintsin a database schema as well as the size of the data inthe database. For each test case, it could then calcu-late the total time needed for constraint validation andcompare it with the time needed by a database man-agement system.

Moving on to approaches that create a new domainontology (Section 5), the main challenges faced bytools of this category include the definition of feature-rich mapping languages that allow for a fully cus-tomizable export of database contents to RDFS on-tologies and for the storage of information requiredby query rewriting algorithms, the search for efficientquery rewriting algorithms, the extraction of domainknowledge from the database schema and the, as au-tomated as possible, enrichment of the generated on-tology with knowledge from other relevant domainsources. However, these high-level goals depend on themotivation considered by each approach and usually,cannot be all sought after together. The first two ofthem are normally faced by tools of Section 5.2.1, themajority of which support dynamic query-based dataaccess, while the third one is the subject of approachespresented in Section 5.2.2.

The definition of new expressive and user-friendlydatabase to ontology mapping languages has been agoal for researchers in the past, but the standardiza-tion process of R2RML that is currently under way hasalready moved the focus away from research on newlanguages. On the contrary, research on efficient queryrewriting algorithms still remains a challenge for toolsaiming at ontology-based data access and consti-tutes an important success factor for them. In fact, theemergence of efficient strategies for the transforma-

tion of SPARQL queries to equivalent SQL ones (e.g.[37,50]) has led to the consideration of such databaseto ontology mapping tools as an alternative to triplestores, at least in the case of data that resides in re-lational databases and is not native RDF. Since rela-tional database management systems, due to their ro-bust optimization strategies, outperform triple stores inthe query answering task by factors up to one hundred[26], it is not surprising that, as long as SPARQL toSQL rewriting algorithms do not introduce a large de-lay, the combination of a relational database with a dy-namic access database to ontology mapping tool andwill still outperform triple stores. This is true for atleast two such tools, namely D2R Server [25] and theRDF Views feature of Virtuoso Universal Server [28],which have been found to perform better than conven-tional triple stores [26] for large datasets.

Although the efficiency of database to ontologymapping systems that offer SPARQL data access canbe quantified by calculating the response time to aquery, it is not a trivial task to come up with an ob-jective benchmark procedure for the fair and thor-ough comparison of the performance of such systems.Indeed, some of the evaluation methodologies thathave been proposed present contradictory results re-garding the performance of mapping tools and nativetriple stores with respect to SPARQL query answer-ing [26,57]. A proper benchmarking methodology forthis category of tools usually involves the construc-tion of data generators outputting datasets of varioussizes with desired properties and the proposal of se-quences of queries of different complexity that are is-sued against these datasets. The efficiency of a sys-tem is then evaluated in terms of the response time fora sequence of these queries or equivalently, in termsof the number of queries that can be executed by thesystem in a given time period. A procedure like theabove is implemented in the Berlin SPARQL Bench-mark [26], which is one of the most acclaimed bench-marks for storage systems that expose SPARQL end-points and, as such, it also includes D2R Server andVirtuoso RDF Views in its tests. Another similar no-table benchmark is the SP2Bench SPARQL perfor-mance benchmark [107], which can be applied to bothtriple stores and SPARQL to SQL rewriting systems,although none of the latter have been included in itstests.

Another measure that is more relevant to approachesthat materialize the generated RDF graph is the timeneeded for the production of an RDF statement. Twoof the factors that influence this measure is the to-


tal size and the complexity of the RDF graph gener-ated [119]. Therefore, a suitable evaluation strategywould, once again, involve the considerations of RDFgraphs of different sizes and complexities. However,this measure is usually of little interest, especially toapproaches that are driven by the motivation of learn-ing a domain ontology from a relational database andreusing it in some other application. In such cases, theproduction time of the ontology is not the key factorfor success, since this procedure will usually be per-formed only once. What is more interesting for theend user and, in the same time, challenging for thetool is the elicitation of domain knowledge from therelational database instance as well as knowledge ex-traction from additional information sources in casethe relational database proves insufficient for the pur-pose. As far as the above goals are considered, the effi-ciency of tools that learn an ontology from a relationaldatabase is very hard to be quantified and measuredobjectively. Some systems measure the efficiency oftheir approach by comparing the generated ontologywith an a priori known ER model corresponding to theinput relational database. The above rationale couldbe somehow quantified, if pairs of relational databaseschemas and “gold standard” ontologies were speci-fied, with the latter capturing the intended domain se-mantics contained in the database. Ideally, these testpairs will need to include relational schemas followingnon-standard design practices often met in real worldapplications. The efficiency of an approach would thenbe dependent on the amount of ontology constructsidentified as expected and on the total number of testcases handled correctly.

Such a collection of test cases could also be em-ployed for the evaluation of the performance of toolsthat discover mappings between a relational databaseand an ontology, presented in Section 6. In fact, a shortcollection of pairs of database schemas and ontolo-gies has been created for the purpose of the evalua-tion of the MAPONTO tool [8] and has also been ap-plied in the evaluation of MARSON [66]. This collec-tion of schemas and ontologies together with the re-lated collection gathered in the relational-to-RDF ver-sion of THALIA testbed19 could serve as the basis forthe development of a full benchmark that will incorpo-

19Relational-to-RDF version of THALIA bench-mark: http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/THALIATestbed

rate more complex real world schemas and ontologiesfrom a variety of subject domains.

As the goal of mapping discovery tools is to au-tomatize as much as possible the mapping discoveryprocess, a true challenge for them is to rely less onlexical similarity techniques that pose restrictions ontheir effectiveness and try to solve the synonym andhomonym problems with the aid of appropriate lexi-cal reference sources and techniques that compare thestructure of the two models. As far as approaches thatsupport the manual definition of mappings between adatabase and an existing ontology, also presented inSection 6, the challenges are more or less the sameas the ones previously mentioned for the tools of Sec-tion 5.2.1, i.e. the search for an expressive mappinglanguage and the efficient query rewriting for the caseof tools that support dynamic data access. In additionto these high-level goals, tools of this kind also facethe challenge of presenting to the end user an intu-itive graphical interface that will allow him to definecomplex mappings, without requiring him to be famil-iar with the underlying mapping representation lan-guage. This is a very important prerequisite for thewidespread use of such tools, that unfortunately is usu-ally not given enough attention, in favour of other moretangible indicators of performance.

From the above, it becomes evident that a non-superficial and thorough analysis of the efficiency ofreviewed approaches needs special care and cannot beperformed in the context of the current paper. An ad-ditional practical difficulty to the ones already men-tioned is the lack of a common standard application in-terface for software implementations, with the excep-tion of tools that expose a SPARQL endpoint, whichare easily pluggable to an external benchmark applica-tion, regardless of the implementation’s programminglanguage. This is not true for other categories of tools,which express their output in various forms, i.e. differ-ent ontology languages and/or different mapping rep-resentation formats, making their comparison cumber-some. As already stated, the only relevant compara-tive evaluation available is the Berlin SPARQL Bench-mark and we encourage the interested reader to con-sult [26] for the detailed performance analysis of D2RServer and Virtuoso RDF Views. As far as evaluationof other tool categories is concerned, we gathered atone place20 all experimental datasets mentioned in the

20Test datasets for relational to ontology mapping: http://orpheus.cn.ece.ntua.gr/rdb-rdf

http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/THALIATestbed



http://orpheus.cn.ece.ntua.gr/rdb-rdf

http://orpheus.cn.ece.ntua.gr/rdb-rdf


literature and used in the evaluation of individual ap-proaches, hoping that they come of use for the evalua-tion of new relevant methods.

8. Future Directions

In this paper, we tried to present the wealth of re-search work marrying the worlds of relational databasesand Semantic Web. We illustrated the variety of dif-ferent approaches and identified the main challengesthat researchers of this field face as well as proposedsolutions. We close this paper by mentioning someproblems that have only been lightly touched uponby database to ontology mapping solutions as well assome aspects that need to be considered by future ap-proaches.

1. Ontology-based data update. A lot of ap-proaches mentioned offer SPARQL based ac-cess to the contents of the database. However,this access is unidirectional. Since the emer-gence of SPARQL Update that allows updateoperations on an RDF graph, the idea of issu-ing SPARQL Update requests that will be trans-formed to appropriate SQL statements and exe-cuted on the underlying relational database hasbecome more and more popular. Some earlywork has already appeared in the OntoAccessprototype [63] and the extensions on the D2RQtool, D2RQ/Update21 and D2RQ++ [104]. How-ever, as SPARQL Update is still under devel-opment and its semantics is not yet well de-fined, there is some ambiguity regarding thetransformation of some SPARQL Update state-ments. Moreover, only basic (relation-to-classand attribute-to-property) mappings have beeninvestigated so far. The issue of updating rela-tional data through SPARQL Update is similar tothe classic database view update problem, there-fore porting already proposed solutions wouldcontribute significantly in dealing with this issue.

2. Mapping update. Database schemas and ontolo-gies constantly evolve to suit the changing ap-plication and user needs. Therefore, establishedmappings between the two should also evolve,instead of being redefined or rediscovered fromscratch. This issue is closely related to the pre-

21D2RQ/Update and D2R/Update Server homepage: http://d2rqupdate.cs.technion.ac.il/

vious one, since modifications in either partic-ipating model do not simply incur adaptationsto the mapping but also cause some necessarychanges to the other model as well. So far, onlyfew solutions have been proposed for the caseof the unidirectional propagation of databaseschema changes to a generated ontology [42]and the consequent adaptation of the mapping[9]. The inverse direction, i.e. modification ofthe database as a result of changes in the on-tology has not been investigated thoroughly yet.On a practical note, both database trigger func-tions and mechanisms like the Link MaintenanceProtocol (WOD-LMP) from the Silk framework[124] could prove useful for solutions to this is-sue.

3. Generation of Linked Data. A fair number ofapproaches support vocabulary reuse, a factorthat has always been important for the progressof the Semantic Web, while a few other ap-proaches try to discover automatically the mostsuitable classes or properties from popular vo-cabularies that can be mapped to a given databaseschema. Nonetheless, these efforts are still notadequate for the generation of RDF graphs thatcan be smoothly integrated in the Linking OpenData (LOD) Cloud22. For the generation of trueLinked Data, the real world entities that databasevalues represent should be identified and linksbetween them should be established, in con-trast with the majority of current methods, whichtranslate database values to RDF literals. Lately,a few interesting tools that handle the transfor-mation of spreadsheets to Linked RDF data byanalyzing the content of spreadsheet tables havebeen presented, with the most notable examplesbeing the RDF extension for Google Refine [85]and T2LD [91]. Techniques as the ones appliedin these tools can certainly be adapted to the re-lational database paradigm.

These aspects, together with the challenges enumer-ated in Section 7, mark the next steps for databaseto ontology mapping approaches. Although a lot ofground has been covered during the last decade, itlooks like there is definitely some interesting roadahead in order to seamlessly integrate relational databaseswith the Semantic Web, turning it into reality at last.

22LOD Cloud diagram: http://richard.cyganiak.de/2007/10/lod/

http://d2rqupdate.cs.technion.ac.il/

http://d2rqupdate.cs.technion.ac.il/

http://richard.cyganiak.de/2007/10/lod/

http://richard.cyganiak.de/2007/10/lod/


Acknowledgements

Dimitrios-Emmanuel Spanos wishes to acknowl-edge the financial support of ‘Alexander S. Onas-sis Public Benefit Foundation’, under its ScholarshipsProgramme.

References

[1] ISO/IEC 9075-1:2008, SQL Part 1: Framework(SQL/Framework), International Organization for Standard-ization, 27 January 2009.

[2] ISO/IEC 9075-14:2008, SQL Part 14: XML-Related Specifi-cations (SQL/XML), International Organization for Standard-ization, 27 January 2009.

[3] S. Abiteboul, R. Hull and V. Vianu, Foundations ofDatabases, Addison-Wesley, New York City, NY, 1st ed.,1995.

[4] N. Alalwan, H. Zedan and F. Siewe, Generating OWL Ontol-ogy for Database Integration, in P. Dini, J. Hendler, J. Nollet al., eds., Proceedings of the Third International Conferenceon Advances in Semantic Processing (SEMAPRO 2009), pp.22–31, IEEE, 2009.

[5] K. M. Albarrak and E. H. Sibley, Translating Relational &Object-Relational Database Models into OWL Models, inS. Rubin and S.-C. Chen, eds., Proceedings of the 2009 IEEEInternational Conference on Information Reuse & Integration(IRI 2009), pp. 336–341, IEEE, 2009.

[6] R. Alhajj, Extracting the Extended Entity-Relationship Modelfrom a Legacy Relational Database, Information Systems,28(6), pp. 597–618, 2003.

[7] Y. An, A. Borgida, R. J. Miller and J. Mylopoulos, A Seman-tic Approach to Discovering Schema Mapping Expressions,in R. Chirkova and V. Oria, eds., Proceeding of 2007 IEEE23rd International Conference on Data Engineering (ICDE2007), pp. 206–215, IEEE, 2007.

[8] Y. An, A. Borgida and J. Mylopoulos, Discovering the Se-mantics of Relational Tables through Mappings, Journal onData Semantics, VII, pp. 1–32, 2006.

[9] Y. An, X. Hu and I.-Y. Song, Round-Trip Engineering forMaintaining Conceptual-Relational Mappings, in Z. Bellah-sène and M. Léonard, eds., Advanced Information SystemsEngineering: 20th International Conference (CAiSE 2008),Lecture Notes on Computer Science, vol. 5074, pp. 296–311,Springer, 2008.

[10] R. Angles and C. Gutierrez, The Expressive Power ofSPARQL, in A. Sheth, S. Staab, M. Dean, M. Paolucci,D. Maynard, T. Finin and K. Thirunarayan, eds., The Seman-tic Web - ISWC 2008: 7th International Semantic Web Con-ference, Lecture Notes in Computer Science, vol. 5318, pp.114–129, Springer, 2008.

[11] I. Astrova, Reverse Engineering of Relational Databasesto Ontologies, in C. J. Bussler, J. Davies, D. Fensel andR. Studer, eds., The Semantic Web: Research and Appli-cations: First European Semantic Web Symposium (ESWS2004), Lecture Notes in Computer Science, vol. 3053, pp.327–341, Springer, 2004.

[12] I. Astrova, Rules for Mapping SQL Relational Databases toOWL Ontologies, in M.-A. Sicilia and M. D. Lytras, eds.,Metadata and Semantics, pp. 415–424, Springer, 2009.

[13] P. Atzeni, S. Paolozzi and P. Del Nostro, Ontologies andDatabases: Going Back and Forth, in Proceedings of the4th International VLDB Workshop on Ontology-based Tech-niques for Databases in Information Systems and KnowledgeSystems (ODBIS 2008), pp. 9 – 16, 2008.

[14] S. Auer, S. Dietzold, J. Lehmann, S. Hellmann and D. Au-mueller, Triplify: Light-Weight Linked Data Publication fromRelational Databases, in Proceedings of the 18th Interna-tional Conference on World Wide Web (WWW’09), pp. 621–630, ACM, 2009.

[15] F. Baader, D. L. McGuinness, P. F. Patel-Schneider andD. Nardi, The Description Logic Handbook: Theory, Imple-mentation, and Applications, Cambridge University Press,2nd ed., 2007.

[16] M. Baglioni, M. V. Masserotti, C. Renso and L. Spin-santi, Building Geospatial Ontologies from GeographicalDatabases, in F. Fonseca, M. A. Rodríguez and S. Levashkin,eds., GeoSpatial Semantics: Second International Conference(GeoS 2007), Lecture Notes in Computer Science, vol. 4853,pp. 195–209, Springer, 2007.

[17] J. Barrasa - Rodriguez and A. Gómez-Pérez, Upgrading Re-lational Legacy Data to the Semantic Web, in Proceedingsof the 15th International Conference on World Wide Web(WWW’06), pp. 1069–1070, ACM, 2006.

[18] J. Barrasa, O. Corcho and A. Gómez-Pérez, R2O, an Ex-tensible and Semantically Based Database-to-Ontology Map-ping Language, in Second International Workshop on Seman-tic Web and Databases (SWDB 2004), 2004.

[19] D. Beckett and J. Grant, SWAD-Europe Deliverable 10.2:Mapping Semantic Web Data with RDBMSes, avail-able at: http://www.w3.org/2001/sw/Europe/reports/scalable_rdbms_mapping_report/,Technical Report, 2003.

[20] A. Behm, A. Geppert and K. R. Dittrich, On The Migrationof Relational Schemas and Data to Object-Oriented DatabaseSystems, in Proceeding of the 5th International Conferenceon Re-Technologies for Information Systems (ReTIS 1997),pp. 13–33, 1997.

[21] C. Ben Necib and J.-C. Freytag, Semantic Query Transforma-tion Using Ontologies, in B. C. Desai and G. Vossen, eds.,Proceedings of 9th International Database Engineering &Application Symposium (IDEAS 2005), pp. 187–199, IEEE,2005.

[22] T. Berners-Lee, Relational Databases on the Semantic Web,available at: http://www.w3.org/DesignIssues/RDB-RDF.html, 1998.

[23] T. Berners-Lee, Semantic Web Road map, available at:http://www.w3.org/DesignIssues/Semantic.html, 1998.

[24] A. Bertails and E. G. Prud’hommeaux, Interpreting RelationalDatabases in the RDF Domain, in M. A. Musen and O. Cor-cho, eds., Proceedings of the 2011 Knowledge Capture Con-ference (K-CAP 2011), pp. 129–135, ACM, 2011.

[25] C. Bizer and R. Cyganiak, D2R Server - Publishing RelationalDatabases on the Semantic Web, poster in 5th InternationalSemantic Web Conference (ISWC 2006), 2006.

[26] C. Bizer and A. Schultz, The Berlin SPARQL Benchmark, In-ternational Journal On Semantic Web and Information Sys-

http://www.w3.org/2001/sw/Europe/reports/scalable_rdbms_mapping_report/

http://www.w3.org/2001/sw/Europe/reports/scalable_rdbms_mapping_report/

http://www.w3.org/DesignIssues/RDB-RDF.html

http://www.w3.org/DesignIssues/RDB-RDF.html

http://www.w3.org/DesignIssues/Semantic.html

http://www.w3.org/DesignIssues/Semantic.html


tems, 5(2), pp. 1–24, 2009.[27] C. Bizer and A. Seaborne, D2RQ - Treating non-RDF

Databases as Virtual RDF Graphs, poster in 3rd InternationalSemantic Web Conference (ISWC 2004), 2004.

[28] C. Blakeley, Virtuoso RDF Views Getting Started Guide,available at: http://www.openlinksw.co.uk/virtuoso/Whitepapers/pdf/Virtuoso_SQL_to_RDF_Mapping.pdf, OpenLink Software, 2007.

[29] A. Buccella, M. R. Penabad, F. R. Rodriguez, A. Farina andA. Cechich, From Relational Databases to OWL Ontologies,in Proceedings of 6th Russian Conference on Digital Li-braries (RCDL 2004), 2004.

[30] G. Bumans and K. Cerans, RDB2OWL : a Practical Approachfor Transforming RDB Data into RDF/OWL, in A. Paschke,N. Henze and T. Pellegrini, eds., Proceedings of the 6th In-ternational Conference on Semantic Systems (I-SEMANTICS2010), ACM, 2010.

[31] K. Byrne, Having Triplets - Holding Cultural Data as RDF,in M. Larson, K. Fernie, O. J. and J. Cigarran, eds., Proceed-ings of the ECDL 2008 Workshop on Information Access toCultural Heritage, 2008.

[32] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini,A. Poggi, M. Rodriguez-Muro, R. Rosati, M. Ruzzi and D. F.Savo, The MASTRO System for Ontology-based Data Ac-cess, Semantic Web Journal, 2(1), pp. 43–53, 2011.

[33] D. Calvanese, M. Lenzerini and D. Nardi, Unifying Class-Based Representation Formalisms, Journal of Artificial Intel-ligence Research, 11, pp. 199–240, 1999.

[34] F. Cerbah, Mining the Content of Relational Databases toLearn Ontologies with Deeper Taxonomies, in Y. Li, G. Pasi,C. Zhang, N. Cercone and L. Cao, eds., Proceedings of 2008IEEE/WIC/ACM International Conference on Web Intelli-gence and Intelligent Agent Technology Workshops(WI-IAT2008), pp. 553–557, IEEE, 2008.

[35] P.-A. Champin, G.-J. Houben and P. Thiran, CROSS: an OWLWrapper for Reasoning on Relational Databases, in C. Par-ent, K.-D. Schewe, V. C. Storey and B. Thalheim, eds., Con-ceptual Modeling - ER 2007: 26th International Conferenceon Conceptual Modeling, Lecture Notes in Computer Science,vol. 4801, pp. 502–517, Springer, 2007.

[36] N. Chatterjee and M. Krishna, Semantic Integration of Het-erogeneous Databases on the Web, in B. B. Bhattacharyya,C. A. Murthy, B. B. Chaudhuri and B. Chanda, eds., Inter-national Conference on Computing: Theory and Applications(ICCTA 2007), pp. 325–329, IEEE, 2007.

[37] A. Chebotko, S. Lu and F. Fotouhi, Semantics PreservingSPARQL-to-SQL Translation, Data & Knowledge Engineer-ing, 68(10), pp. 973–1000, 2009.

[38] H. Chen, Z. Wu, Y. Mao and G. Zheng, DartGrid: a SemanticInfrastructure for Building Database Grid Applications, Con-currency and Computation: Practice and Experience, 18(14),pp. 1811–1828, 2006.

[39] P. P.-S. Chen, The Entity-Relationship Model - Toward a Uni-fied View of Data, ACM Transactions on Database Systems(TODS), 1(1), pp. 9–36, 1976.

[40] R. H. Chiang, T. M. Barron and V. C. Storey, Reverse Engi-neering of Relational Databases: Extraction of an EER Modelfrom a Relational Database, Data & Knowledge Engineering,12(2), pp. 107–142, 1994.

[41] E. F. Codd, A Relational Model of Data for Large SharedData Banks, Communications of the ACM, 13(6), pp. 377–

387, 1970.[42] O. Curé and R. Squelbut, A Database Trigger Strategy to

Maintain Knowledge Bases Developed via Data Migration, inC. Bento, A. Cardoso and G. Dias, eds., Progress in ArtificialIntelligence: 12th Portuguese Conference on Artificial Intelli-gence (EPIA 2005), Lecture Notes in Computer Science, vol.3808, pp. 206–217, Springer, 2005.

[43] C. Curino, G. Orsi, E. Panigati and L. Tanca, Accessing andDocumenting Relational Databases through OWL Ontolo-gies, in T. Andreasen, R. R. Yager, H. Bulskov, H. Chris-tiansen and H. Legind Larsen, eds., Flexible Query AnsweringSystems: 8th International Conference (FQAS 2009), LectureNotes in Computer Science, vol. 5822, pp. 431–442, Springer,2009.

[44] R. Cyganiak, A Relational Algebra for SPARQL, TechnicalReport HPL-2005-170, Hewlett-Packard, 2005.

[45] S. Das and J. Srinivasan, Database Technologies for RDF, inS. Tessaris, E. Franconi, T. Eiter, C. Gutierrez, S. Handschuh,M.-C. Rousset and R. A. Schmidt, eds., Reasoning Web. Se-mantic Technologies for Information Systems: 5th Interna-tional Summer School 2009, Lecture Notes in Computer Sci-ence, vol. 5689, pp. 205–221, Springer, 2009.

[46] J. de Bruijn, F. Martin-Recuerda, D. Manov and M. Ehrig,State-of-the-art Survey on Ontology Merging and Aligning,SEKT Project, Deliverable D4.2.1, 2004.

[47] M. del Mar Roldan-Garcia and J. F. Aldana-Montes, A Sur-vey on Disk Oriented Querying and Reasoning on the Seman-tic Web, in R. S. Barga and X. Zhou, eds., Proceedings of22nd International Conference on Data Engineering Work-shops (ICDEW’06), IEEE, 2006.

[48] C. Dolbear and G. Hart, Ontological Bridge Building - UsingOntologies to Merge Spatial Datasets, in D. L. McGuinness,P. Fox and B. Brodaric, eds., Semantic Scientific KnowledgeIntegration: AAAI Spring Symposium (AAAI/SSS Workshop),pp. 15–20, 2008.

[49] E. Dragut and R. Lawrence, Composing Mappings betweenSchemas using a Reference Ontology, in R. Meersman andZ. Tari, eds., On the Move to Meaningful Internet Systems2004: CoopIS, DOA, and ODBASE, Lecture Notes in Com-puter Science, vol. 3290, pp. 783–800, Springer, 2004.

[50] B. Elliott, E. Cheng, C. Thomas-Ogbuji and Z. M. Ozsoyoglu,A Complete Translation from SPARQL into Efficient SQL,in B. C. Desai, ed., Proceedings of the 2009 InternationalDatabase Engineering & Applications Symposium (IDEAS’09), pp. 31–42, ACM, 2009.

[51] R. Elmasri and S. B. Navathe, Fundamentals of Database Sys-tems, The Benjamin/Cummings Publishing Company, Inc.,San Francisco, CA, USA, 6th ed., 2010.

[52] M. Fahad, ER2OWL: Generating OWL Ontology from ERDiagram, in Z. Shi, E. Mercier-Laurent and D. Leake, eds.,Intelligent Information Processing IV: 5th IFIP InternationalConference on Intelligent Information Processing, pp. 28–37,Springer, 2008.

[53] M. Fisher, M. Dean and G. Joiner, Use of OWL and SWRLfor Semantic Relational Database Translation, in K. Clark andP. F. Patel-Schneider, eds., Proceedings of the Fourth OWLEDWorkshop on OWL: Experiences and Directions, 2008.

[54] J. Geller, S. A. Chun and Y. J. An, Toward the Semantic DeepWeb, Computer, 41(9), pp. 95–97, 2008.

[55] R. Ghawi and N. Cullot, Database-to-Ontology MappingGeneration for Semantic Interoperability, in 3rd International

http://www.openlinksw.co.uk/virtuoso/Whitepapers/pdf/Virtuoso_SQL_to_RDF_Mapping.pdf




Workshop on Database Interoperability (InterDB 2007), heldin conjunction with VLDB 2007, 2007.

[56] A. Gomez-Perez, O. Corcho-Garcia and M. Fernandez-Lopez, Ontological Engineering, Springer-Verlag New York,Inc., Secaucus, NJ, USA, 1st ed., 2003.

[57] A. J. G. Gray, N. Gray and I. Ounis, Can RDB2RDF ToolsFeasibly Expose Large Science Archives for Data Integra-tion?, in L. Aroyo, P. Traverso, F. Ciravegna et al., eds., TheSemantic Web: Research and Applications: 6th European Se-mantic Web Conference (ESWC 2009), Lecture Notes in Com-puter Science, vol. 5554, pp. 491–505, Springer, 2009.

[58] T. Gruber, Toward Principles for the Design of Ontolo-gies Used for Knowledge Sharing, International Journal ofHuman-Computer Studies, 43(5-6), pp. 907–928, 1995.

[59] N. Guarino, Formal Ontology and Information Systems, inFormal Ontologies in Information Systems: Proceedings ofthe 1st International Conference (FOIS’98), pp. 3–15, IOSPress, 1998.

[60] T. Heath and C. Bizer, Linked Data: Evolving the Web intoa Global Data Space, Morgan & Claypool Publishers, SanRafael, 2011.

[61] S. Hellmann, J. Unbehauen, A. Zaveri, J. Lehmann, S. Auer,S. Tramp, H. Williams, O. Erling, T. Thibodeau Jr, K. Ide-hen, A. Blumauer and H. Nagy, Report on KnowledgeExtraction from Structured Sources, LOD2 Project, Deliv-erable 3.1.1, available at: http://static.lod2.eu/Deliverables/deliverable-3.1.1.pdf, 2011.

[62] J. Hendler, Web 3.0: Chicken Farms on the Semantic Web,IEEE Computer, 41(1), pp. 106–108, 2008.

[63] M. Hert, G. Reif and H. C. Gall, Updating Relational Datavia SPARQL/Update, in F. Daniel, L. Delcambre, F. Fotouhiet al., eds., Proceedings of the 2010 EDBT/ICDT Workshops,ACM, 2010.

[64] M. Hert, G. Reif and H. C. Gall, A Comparison ofRDB-to-RDF Mapping Languages, in C. Ghidini, A.-C.Ngonga Ngomo, S. Lindstaedt and T. Pellegrini, eds., Pro-ceedings of the 7th International Conference on SemanticSystems (I-SEMANTICS 2011), pp. 25–32, ACM, 2011.

[65] G. Hillairet, F. Bertrand and J. Y. Lafaye, MDE for Publish-ing Data on the Semantic Web, in F. Silva Parreiras, J. Z. Pan,U. Assmann and J. Henriksson, eds., Transforming and Weav-ing Ontologies in Model Driven Engineering: Proceedings ofthe 1st International Workshop (TWOMDE 2008), pp. 32–46,2008.

[66] W. Hu and Y. Qu, Discovering Simple Mappings between Re-lational Database Schemas and Ontologies, in K. Aberer, K.-S. Choi, N. Noy et al., eds., The Semantic Web: 6th Inter-national Semantic Web Conference, 2nd Asian Semantic WebConference (ISWCN 2007 + ASWC 2007), Lecture Notes inComputer Science, vol. 4825, pp. 225–238, Springer, 2007.

[67] P. Johannesson, A Method for Transforming RelationalSchemas into Conceptual Schemas, in Proceedings of the10th International Conference on Data Engineering (ICDE1994), pp. 190–201, IEEE, 1994.

[68] D. Juric, M. Banek and Z. Skocir, Uncovering the Deep Web:Transferring Relational Database Content and Metadata toOWL Ontologies, in I. Lovrek, R. J. Howlett and L. C. Jain,eds., Knowledge-Based Intelligent Information and Engineer-ing Systems: 12th International Conference (KES 2008), Lec-ture Notes in Computer Science, vol. 5177, pp. 456–463,Springer, 2008.

[69] Y. Kalfoglou and M. Schorlemmer, Ontology Mapping: theState of the Art, The Knowledge Engineering Review, 18(1),pp. 1–31, 2003.

[70] V. Kashyap, Design and Creation of Ontologies for Environ-mental Information Retrieval, in First Agricultural ServiceOntology (AOS) Workshop, 2001.

[71] N. Konstantinou, D.-E. Spanos, M. Chalas, E. Solidakis andN. Mitrou, VisAVis: An Approach to an Intermediate Layerbetween Ontologies and Relational Database Contents, inF. Frasincar, G.-J. Houben and P. Thiran, eds., Proceedings ofthe CAiSE’06 3rd International Workshop on Web Informa-tion Systems Modeling (WISM’06), pp. 1050–1061, 2006.

[72] N. Konstantinou, D.-E. Spanos and N. Mitrou, Ontology andDatabase Mapping: A Survey of Current Implementationsand Future Directions, Journal of Web Engineering, 7(1), pp.1–24, 2008.

[73] N. Konstantinou, D.-E. Spanos, P. Stavrou and N. Mitrou,Technically Approaching the Semantic Web Bottleneck, In-ternational Journal of Web Engineering and Technology,6(1), pp. 83–111, 2010.

[74] M. Korotkiy and J. L. Top, From Relational Data to RDFSModels, in N. Koch, P. Fraternali and M. Wirsing, eds., WebEngineering: 4th International Conference (ICWE 2004),Lecture Notes in Computer Science, vol. 3140, pp. 430–434,Springer, 2004.

[75] A. Kupfer, S. Eckstein, K. Neumann and B. Mathiak, Han-dling Changes of Database Schemas and Corresponding On-tologies, in J. F. Roddick, V. R. Benjamins, S. S.-S. Cherfiet al., eds., Advances in Conceptual Modeling - Theory andPractice: ER 2006 Workshops, Lecture Notes in ComputerScience, vol. 4231, pp. 227–236, Springer, 2006.

[76] M. Laclavík, RDB2Onto: Relational Database Data toOntology Individuals Mapping, in P. Návrat, P. Bartoš,M. Bieliková, L. Hluchý and P. Vojtáš, eds., Tools for Acquisi-tion, Organisation and Presenting of Information and Knowl-edge, pp. 86–89, 2006.

[77] N. Lammari, I. Comyn-Wattiau and J. Akoka, ExtractingGeneralization Hierarchies from Relational Databases: A Re-verse Engineering Approach, Data & Knowledge Engineer-ing, 63(2), pp. 568–589, 2007.

[78] G. Lausen, Relational Databases in RDF: Keys and ForeignKeys, in V. Christophides, M. Collard and C. Gutierrez, eds.,Semantic Web, Ontologies and Databases: VLDB Workshop(SWDB-ODBIS 2007), Lecture Notes in Computer Science,vol. 5005, pp. 43–56, Springer, 2007.

[79] G. Lausen, M. Meier and M. Schmidt, SPARQLing Con-straints for RDF, in A. Kemper, P. Valduriez, N. Mouaddibet al., eds., Advances in Database Technology: Proceedingsof the 11th International conference on Extending DatabaseTechnology (EDBT ’08), pp. 499–509, ACM, 2008.

[80] M. Lenzerini, Data Integration: A Theoretical Perspective,in Proceedings of the twenty-first ACM SIGMOD-SIGACT-SIGART Symposium on Principles of Database Systems, pp.233–246, 2002.

[81] D. Levshin, Mapping Relational Databases to the SemanticWeb with Original Meaning, in D. Karagiannis and Z. Jin,eds., Knowledge Science, Engineering and Management:Third International Conference (KSEM 2009), Lecture Notesin Computer Science, vol. 5914, pp. 5–16, Springer, 2009.

[82] M. Li, X. Du and S. Wang, A Semi-Automatic Ontology Ac-quisition Method for the Semantic Web, in W. Fan, Z. Wu

http://static.lod2.eu/Deliverables/deliverable-3.1.1.pdf

http://static.lod2.eu/Deliverables/deliverable-3.1.1.pdf


and J. Yang, eds., Advances in Web-Age Information Man-agement: 6th International Conference (WAIM 2005), LectureNotes in Computer Science, vol. 3739, pp. 209–220, Springer,2005.

[83] J. Lu, F. Cao, L. Ma, Y. Yu and Y. Pan, An Effective SPARQLSupport over Relational Databases, in V. Christophides,M. Collard and C. Gutierrez, eds., Semantic Web, Ontolo-gies and Databases: VLDB Workshop (SWDB-ODBIS 2007),Lecture Notes in Computer Science, vol. 5005, pp. 57–76,Springer, 2007.

[84] L. Lubyte and S. Tessaris, Automatic Extraction of Ontolo-gies Wrapping Relational Data Sources, in S. S. Bhowmick,J. Küng and R. Wagner, eds., Database and Expert SystemsApplications: 20th International Conference (DEXA 2009),Lecture Notes in Computer Science, vol. 5690, pp. 128–142,Springer, 2009.

[85] F. Maali, R. Cyganiak and V. Peristeras, Re-using Cool URIs:Entity Reconciliation Against LOD Hubs, in Proceedings ofthe 4th Linked Data on the Web Workshop (LDOW 2011),2011.

[86] A. Maedche and S. Staab, Ontology Learning for the Seman-tic Web, IEEE Intelligent Systems, 16(2), pp. 72–79, 2001.

[87] P. Martin, J. R. Cordy and R. Abu-Hamdeh, Information Ca-pacity Preserving Translations of Relational Schemas UsingStructural Transformation, External Technical Report, ISSN0836-0227-95-392, Ontario, Canada, 1995.

[88] A. Miller and D. McNeil, Revelytix RDB Map-ping Language Specification, available at: http://www.knoodl.com/ui/groups/Mapping_Ontology_Community/wiki/User_Guide/media/RDB_Mapping_Specification_v0.2,Revelytix, 2010.

[89] B. Motik, On the Properties of Metamodeling in OWL, Jour-nal of Logic and Computation, 17(4), pp. 617–637, 2007.

[90] B. Motik, I. Horrocks and U. Sattler, Bridging the Gap be-tween OWL and Relational Databases, in C. Williamson andM. E. Zurko, eds., Proceedings of the 16th International Con-ference on World Wide Web (WWW 2007), pp. 807–816, ACMPress, 2007.

[91] V. Mulwad, T. Finin, Z. Syed and A. Joshi, Using Linked Datato Interpret Tables, in O. Hartig, A. Harth and J. Sequeda,eds., Proceedings of the First International Workshop on Con-suming Linked Data (COLD 2010), 2010.

[92] I. Myroshnichenko and M. C. Murphy, Mapping ER Schemasto OWL Ontologies, in S.-C. Chen, R. Glasberg, J. Heflinet al., eds., Proceedings of the 2009 IEEE International Con-ference on Semantic Computing (ICSC 2009), pp. 324–329,IEEE, 2009.

[93] C. Nyulas, M. O’Connor and S. Tu, DataMaster - a Plug-infor Importing Schemas and Data from Relational Databasesinto Protégé, in 10th International Protégé Conference, 2007.

[94] P. Papapanagiotou, P. Katsiouli, V. Tsetsos, C. Anagnostopou-los and S. Hadjiefthymiades, RONTO: Relational to Ontol-ogy Schema Matching, AIS SIGSEMIS Bulletin, 3(3-4), pp.32–36, 2006.

[95] C. Pérez De Laborda and S. Conrad, Relational.OWL - AData and Schema Representation Format Based on OWL,in S. Hartmann and M. Stumptner, eds., Proceedings ofthe Second Asia-Pacific Conference on Conceptual Modeling(APCCM2005), pp. 89–96, 2005.

[96] C. Pérez De Laborda and S. Conrad, Database to Semantic

Web Mapping using RDF Query Languages, in D. W. Em-bley, A. Olivé and S. Ram, eds., Conceptual Modeling - ER2006: 25th International Conference on Conceptual Model-ing, Lecture Notes in Computer Science, vol. 4215, pp. 241–254, 2006.

[97] C. Pérez De Laborda, M. Zloch and S. Conrad, RDQuery- Querying Relational Databases on-the-fly with RDF-QL,poster in the 15th International Conference on KnowledgeEngineering and Knowledge Management (EKAW 2006),2006.

[98] A. Poggi, D. Lembo, D. Calvanese, G. De Giacomo, M. Lenz-erini and R. Rosati, Linking Data to Ontologies, Journal onData Semantics, 10, pp. 133–173, 2008.

[99] A. Poggi, M. Rodriguez-Muro and M. Ruzzi, Ontology-basedDatabase Access with DIG-MASTRO and the OBDA Pluginfor Protégé, in K. Clark and P. F. Patel-Schneider, eds., Pro-ceedings of the Fourth OWLED Workshop on OWL: Experi-ences and Directions, 2008.

[100] S. Polfliet and R. Ichise, Automated Mapping Generationfor Converting Databases into Linked Data, in A. Polleresand H. Chen, eds., Proceedings of the ISWC 2010 Posters& Demonstrations Track: Collected Abstracts, pp. 173–176,2010.

[101] W. J. Premerlani and M. R. Blaha, An Approach for ReverseEngineering of Relational Databases, Communications of theACM, 37(5), pp. 42–49, 1994.

[102] R. Ramakrishnan and J. Gehrke, Database Management Sys-tems, McGraw-Hill, New York City, NY, 3rd ed., 2002.

[103] S. Ramanathan and J. Hodges, Extraction of Object-OrientedStructures from Existing Relational Databases, ACM SIG-MOD Record, 26(1), pp. 59–64, 1997.

[104] S. Ramanujam, V. Khadilkar, L. Khan, M. Kantarcioglu,B. Thuraisingham and S. Seida, Update-Enabled Triplifica-tion of Relational Data into Virtual RDF Stores, InternationalJournal of Semantic Computing, 4(4), pp. 423–451, 2010.

[105] S. Sahoo, W. Halb, S. Hellmann, K. Idehen, T. Thibodeau,S. Auer, J. Sequeda and A. Ezzat, A Survey of Current Ap-proaches for Mapping of Relational Databases to RDF, W3CRDB2RDF Incubator Group Report, 2009.

[106] P. E. Salas, K. K. Breitman, J. F. Viterbo and M. A. Casanova,Interoperability by Design using the StdTrip Tool: an a prioriApproach, in A. Paschke, N. Henze and T. Pellegrini, eds.,Proceedings of the 6th International Conference on SemanticSystems (I-SEMANTICS 2010), ACM, 2010.

[107] M. Schmidt, T. Hornung, G. Lausen and C. Pinkel,SP2Bench: A SPARQL Performance Benchmark, in J. Li andP. S. Yu, eds., Proceedings of the 25th International Confer-ence on Data Engineering (ICDE 2009), pp. 222–233, IEEE,2008.

[108] M. Schneider and G. Sutcliffe, Reasoning in the OWL 2 FullOntology Language using First-Order Automated TheoremProving, in N. Bjørner and V. Sofronie-Stokkermans, eds.,Automated Deduction - CADE-23: 23rd International Con-ference on Automated Deduction, Lecture Notes in ComputerScience, vol. 6803, pp. 461–475, Springer, 2011.

[109] A. Seaborne, D. Steer and S. Williams, SQL-RDF, in W3CWorkshop on RDF Access to Relational Databases, 2007.

[110] J. F. Sequeda, R. Depena and D. P. Miranker, Ultrawrap: Us-ing SQL Views for RDB2RDF, poster in 8th InternationalSemantic Web Conference (ISWC 2009), 2009.

[111] J. F. Sequeda, S. H. Tirmizi, O. Corcho and D. P. Miranker,

http://www.knoodl.com/ui/groups/Mapping_Ontology_Community/wiki/User_Guide/media/RDB_Mapping_Specification_v0.2





Direct Mapping SQL Datababases to the Semantic Web: ASurvey (Technical Report TR-09-04), University of Texas,Austin, Department of Computer Sciences, 2009.

[112] G. Shen, Z. Huang, X. Zhu and X. Zhao, Research onthe Rules of Mapping from Relational Model to OWL, inB. Cuenca-Grau, P. Hitzler, C. Shankey and E. Wallace, eds.,Proceedings of the OWLED’06 Workshop on OWL: Experi-ences and Directions, 2006.

[113] A. Sheth and R. Meersman, Amicalola Report: Database andInformation Systems Research Challenges and Opportunitiesin Semantic Web and Enterprises, ACM SIGMOD Record,31(4), pp. 98–106, 2002.

[114] A. P. Sheth and J. A. Larson, Federated Database Systemsfor Managing Distributed, Heterogeneous, and AutonomousDatabases, ACM Computing Surveys, 22(3), pp. 183–236,1990.

[115] E. Sirin and J. Tao, Towards Integrity Constraints in OWL,in R. Hoekstra and P. F. Patel-Schneider, eds., Proceedingsof the 6th International Workshop on OWL: Experiences andDirections (OWLED 2009), 2009.

[116] K. Sonia and S. Khan, R2O Transformation System: Relationto Ontology Transformation for Scalable Data Integration, inJ. Bernardino and B. C. Desai, eds., Proceedings of the 2008International Database Engineering & Applications Sympo-sium (IDEAS ’08), pp. 291–295, ACM, 2008.

[117] L. Stojanovic, N. Stojanovic and R. Volz, Migrating Data-Intensive Web Sites into the Semantic Web, in G. B. Lam-ont, ed., Proceedings of the 2002 ACM Symposium on AppliedComputing (SAC’02), pp. 1100–1107, ACM, 2002.

[118] M. Svihla and I. Jelinek, Two Layer Mapping from Databaseto RDF, in Proceedinfs of the Sixth International Scien-tific Conference Electronic Computers and Informatics (ECI2004), pp. 270–275, 2004.

[119] M. Svihla and I. Jelinek, Benchmarking RDF ProductionTools, in R. Wagner, N. Revell and G. Pernul, eds., Databaseand Expert Systems Applications: 18th International Confer-ence(DEXA 2007), Lecture Notes in Computer Science, vol.4653, pp. 700–709, Springer, 2007.

[120] S. H. Tirmizi, J. F. Sequeda and D. P. Miranker, TranslatingSQL Applications to the Semantic Web, in S. S. Bhowmick,J. Küng and R. Wagner, eds., Database and Expert SystemsApplications: 19th International Conference (DEXA 2008),Lecture Notes in Computer Science, vol. 5181, pp. 450–464,Springer, 2008.

[121] Q. Trinh, K. Barker and R. Alhajj, RDB2ONT: A Tool forGenerating OWL Ontologies From Relational Database Sys-tems, in T. Atmaca, P. Dini, P. Lorenz and J. Neuman deSousa, eds., Proceedings of the Advanced International Con-ference on Telecommunications and International Confer-ence on Internet and Web Applications and Services (AICT-

ICIW’06), IEEE, 2006.[122] S. R. Upadhyaya and P. S. Kumar, ERONTO: a Tool for Ex-

tracting Ontologies from Extended E/R Diagrams, in L. M.Liebrock, ed., Proceedings of the 2005 ACM Symposium onApplied Computing (SAC’05), pp. 666 – 670, ACM, 2005.

[123] K. N. Vavliakis, T. K. Grollios and P. A. Mitkas, RDOTE -Transforming Relational Databases into Semantic Web Data,in A. Polleres and H. Chen, eds., Proceedings of the ISWC2010 Posters & Demonstrations Track: Collected Abstracts,pp. 121–124, 2010.

[124] J. Volz, C. Bizer, M. Gaedke and G. Kobilarov, Discoveringand Maintaining Links on the Web of Data, in A. Bernstein,D. R. Karger, T. Heath, L. Feigenbaum, D. Maynard, E. Mottaand K. Thirunarayan, eds., The Semantic Web - ISWC 2009:8th International Semantic Web Conference, Lecture Notes inComputer Science, vol. 5823, pp. 650–665, Springer, 2009.

[125] R. Volz, S. Handschuh, S. Staab, L. Stojanovic and N. Sto-janovic, Unveiling the Hidden Bride: Deep Annotation forMapping and Migrating Legacy Data to the Semantic Web,Web Semantics: Science, Services and Agents on the WorldWide Web, 1(2), pp. 187–206, 2004.

[126] H. Wache, T. Vögele, U. Visser, H. Stuckenschmidt,G. Schuster, H. Neumann and S. Hübner, Ontology-Based In-tegration of Information - A Survey of Existing Approaches,in A. Gómez-Pérez, M. Gruninger, H. Stuckenschmidt andM. Uschold, eds., Proceedings of the IJCAI-01 Workshop onOntologies and Information Sharing, pp. 108–117, 2001.

[127] Z. Xu, X. Cao, Y. Dong and W. Su, Formal Approach andAutomated Tool for Translating ER Schemata into OWL On-tologies, in H. Dai, R. Srikant and C. Zhang, eds., Advancesin Knowledge Discovery and Data Mining: 8th Pacific-AsiaConference (PAKDD 2004), Lecture Notes in Computer Sci-ence, vol. 3056, pp. 464–475, Springer, 2004.

[128] Z. Xu, S. Zhang and Y. Dong, Mapping between RelationalDatabase Schema and OWL Ontology for Deep Annotation,in T. Nishida, Z. Shi, U. Visser et al., eds., Proceedings ofthe 2006 IEEE/WIC/ACM International Conference on WebIntelligence, pp. 548–552, IEEE, 2006.

[129] S. Zhao and E. Chang, From Database to Semantic Web On-tology: An Overview, in R. Meersman, Z. Tari and P. Herrero,eds., On the Move to Meaningful Internet Systems: OTM 2007Workshops, Lecture Notes in Computer Science, vol. 4806,pp. 1205–1214, Springer, 2007.

[130] C. Zhou, C. Xu, H. Chen and K. Idehen, Browser-based Se-mantic Mapping Tool for Linked Data in Semantic Web, inC. Bizer, T. Heath, K. Idehen and T. Berners-Lee, eds., Pro-ceedings of the WWW 2008 Workshop on Linked Data on theWeb (LDOW 2008), 2008.

1 ios press bringing relational databases into the semantic web: … · 2012. 10. 30. · 2 d.e....

Documents