benchmarking reasoners for multi-ontology applications

Ameet N Chitnis, Abir Qasem and Jeff Heflin 11 November 2007


TRANSCRIPT

Page 1: Benchmarking Reasoners for Multi-Ontology Applications

Ameet N Chitnis, Abir Qasem and Jeff Heflin

11 November 2007

Page 2: Benchmarking Reasoners for Multi-Ontology Applications

Talk Organization
• Motivation (a.k.a. why yet another benchmark?) and Influences
• The Workload
  • Domain ontologies, map ontologies, data sources, queries
• The Metrics
• How do we generate things?
  • Domain ontology generation
  • Map ontology generation
    • Parameters & relationships
    • Map generator algorithm
  • Data source generation
  • Query generation
• Sample Workload
• Conclusion & Future Work

Page 3: Benchmarking Reasoners for Multi-Ontology Applications

Motivation
As the Semantic Web matures…
• OWL ontologies and data from various organizations will gain commercial value
• Alignment of different ontologies and integration of the data that commits to them will be a viable business enterprise
• Quite possibly we will have post-development alignments between ontologies (alignment tools, third parties, etc.)
• Currently DBpedia and Hawkeye provide some form of third-party alignments (non-commercial)
• We wanted to develop a benchmark that reflects this reality

Page 4: Benchmarking  Reasoners  for Multi-Ontology Applications

Influences Lehigh University Benchmark (LUBM) by Y. Guo, Z.

Pan, and J. Heflin. (ISWC 2004) Extended LUBM (can support both OWL Lite and OWL

DL) by L. Ma, Y. Yang, Z. Qiu, G, Xie and Y. Pan. (ESWC 2006)

Statistical Analysis of the available Semantic Web ontologies by Tempich, C. and Volz, R. (ISWC 2003)

Benchmarking DL systems by I. Horrocks and P. Patel-Schneider. (DL Workshop 1998)

Internet topology generator by J. Winick and S. Jamin. (University of Michigan)

Page 5: Benchmarking Reasoners for Multi-Ontology Applications

The Workload (1)
• Domain ontologies: "simple" ontologies. We can control the number of classes, the number of properties, and the branching factor of the hierarchies.
• Data sources: we can control the number of data sources that commit to a given ontology, the number of classes that will have individuals, the number of properties that will connect those individuals, and the number of triples.
• Queries: extensional queries in SPARQL. We can control the mix of classes, properties, and individuals, and we can control selectivity.

Page 6: Benchmarking Reasoners for Multi-Ontology Applications

The Workload (2)
Map ontologies: the main focus of this work.
• In our work a map ontology consists solely of "mapping" axioms that establish an alignment between two domain ontologies.
• This is just for convenience of generation and analysis; semantically they are not much different from the domain ontologies.
• Macro level: we generate a directed acyclic graph of domain ontologies, where every edge represents a map ontology.
• Micro level: we can control the type of axioms that are used to map two domain ontologies.

Page 7: Benchmarking Reasoners for Multi-Ontology Applications

Metrics

Metric              | Systems with Centralized Approach     | Systems with Distributed Approach
--------------------|---------------------------------------|-----------------------------------------------
Initialization Time | Time taken to load the knowledge base | Time taken to read the index (e.g. meta-data)
Query Response Time | Reasoning time                        | Load time + reasoning time
Query Completeness  | Consider queries that entail at least one answer; relative completeness is determined against a reference answer set (both approaches)
Repository Size     | Number of triples                     | N/A

Page 8: Benchmarking Reasoners for Multi-Ontology Applications

Domain Ontology Generation
• Simple taxonomy
• The number of terms to generate varies in a normal distribution around a user-supplied mean
• Given a branching factor and a number of terms, we generate a balanced tree
• Complex axioms are left for the map ontologies
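
A minimal sketch of this generation step, assuming hypothetical names (the slides do not give the actual implementation): draw the term count from a normal distribution around the user-supplied mean, then lay the classes out as a balanced tree with the given branching factor.

```python
import random

def generate_taxonomy(branching_factor, mean_terms, stddev=5):
    """Generate a balanced class taxonomy as (parent, child) edges.

    The number of terms is drawn from a normal distribution around the
    user-supplied mean, as described on the slide. Class names (C0, C1, ...)
    are hypothetical placeholders.
    """
    n_terms = max(1, int(random.gauss(mean_terms, stddev)))
    classes = [f"C{i}" for i in range(n_terms)]
    edges = []
    # Balanced tree: the parent of child i is class (i - 1) // branching_factor.
    for i in range(1, n_terms):
        parent = classes[(i - 1) // branching_factor]
        edges.append((parent, classes[i]))
    return classes, edges
```

A balanced tree keeps the class hierarchy shape predictable, so the only knobs that matter are the branching factor and the (randomized) term count.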

Page 9: Benchmarking Reasoners for Multi-Ontology Applications

Map Ontology Generation
Inputs:
• Number of ontologies we want in the workload (onts)
• Average out-degree (referred to as out below)
• Diameter

The number of maps created is approximately
  maps ≈ (total onts − terminal onts) × out
However, we do not have the number of terminal ontologies as a parameter. A reasonable approximation is
  terminal onts ≈ (onts × out) / (diameter + out)
Thus
  maps ≈ (onts × out × diameter) / (diameter + out)
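
The approximation above can be checked numerically; a small helper (hypothetical name) makes the relationship between the two forms explicit:

```python
def expected_maps(onts, out, diameter):
    """Estimate terminal ontologies and map count from the slide's formulas.

    terminal onts ~ (onts * out) / (diameter + out)
    maps          ~ (onts - terminal onts) * out
                  = (onts * out * diameter) / (diameter + out)
    """
    terminal = (onts * out) / (diameter + out)
    maps = (onts - terminal) * out  # algebraically equals the closed form
    return terminal, maps
```

For example, 100 ontologies with average out-degree 2 and diameter 8 give roughly 20 terminal ontologies and 160 maps, matching the closed form 100·2·8 / (8+2).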

Page 10: Benchmarking Reasoners for Multi-Ontology Applications

Map Generator Algorithm
1. Determine and mark the number of terminal nodes
2. Create a path of diameter length
3. Choose targets for every non-terminal ontology, subject to the constraints:
   a. No cycles
   b. No path longer than the diameter
   c. Non-terminal nodes should not become terminal
   Then create the corresponding map ontologies by generating mapping axioms
4. Update the parameters of the source and the target
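
The core of steps 2 and 3 can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it lays down a path of diameter length, then gives each node extra targets while rejecting edges that would create a cycle or stretch a path beyond the diameter. Step 1 (marking terminal nodes) and constraint (c) are omitted for brevity.

```python
import random

def generate_map_dag(n_onts, out_degree, diameter, seed=None):
    """Sketch of the map generator: a DAG of ontology indices where each
    edge (source, target) stands for one map ontology."""
    rng = random.Random(seed)
    nodes = list(range(n_onts))
    edges = set()

    # Step 2: a path of `diameter` edges fixes the longest chain.
    for i in range(min(diameter, n_onts - 1)):
        edges.add((i, i + 1))

    def longest_path_from(v):
        succs = [t for (s, t) in edges if s == v]
        return 0 if not succs else 1 + max(longest_path_from(t) for t in succs)

    def reaches(a, b):  # is there a directed path a -> ... -> b?
        if a == b:
            return True
        return any(reaches(t, b) for (s, t) in edges if s == a)

    # Step 3: extra targets per node, respecting the constraints.
    for v in nodes:
        while len([1 for (s, _) in edges if s == v]) < out_degree:
            t = rng.choice(nodes)
            if t == v or (v, t) in edges:
                break
            if reaches(t, v):              # constraint (a): no cycles
                break
            edges.add((v, t))
            if max(longest_path_from(n) for n in nodes) > diameter:
                edges.remove((v, t))       # constraint (b): diameter bound
                break
    return sorted(edges)
```

The rejection-based checks are the simplest way to keep the graph acyclic and diameter-bounded; a production generator would precompute reachability instead of re-deriving it per candidate edge.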

Page 11: Benchmarking Reasoners for Multi-Ontology Applications

Mapping Axioms
• Given two domain ontologies and a desired distribution of OWL constructors and restrictions, we choose terms from the domain ontologies and create an axiom that connects them
• We can generate fairly complex axioms, e.g. O1:A ⊔ O1:B ⊑ ∃O2:P.O2:C ⊓ ∀O2:Q.O2:D
• Currently the algorithm is restricted to generating axioms that keep the ontology within OWLII (a subset of OWL used by OBII; Qasem et al. 2007, ISWC NFR workshop)
• But this is NOT a limitation of our approach
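
A toy sketch of the term-picking step, purely illustrative (the real generator would sample constructors from the desired distribution and respect the OWLII grammar): draw terms from the two ontologies and assemble them into an axiom of the shape shown above.

```python
import random

def random_mapping_axiom(o1_classes, o2_classes, o2_props, rng=None):
    """Build one mapping axiom string in the spirit of the slide's example
    O1:A ⊔ O1:B ⊑ ∃O2:P.O2:C ⊓ ∀O2:Q.O2:D. All inputs are hypothetical
    term lists; the fixed axiom shape is a simplification."""
    rng = rng or random.Random()
    a, b = rng.sample(o1_classes, 2)   # subclass side: union of O1 classes
    p, q = rng.sample(o2_props, 2)     # superclass side: O2 restrictions
    c, d = rng.sample(o2_classes, 2)
    lhs = f"O1:{a} ⊔ O1:{b}"
    rhs = f"∃O2:{p}.O2:{c} ⊓ ∀O2:{q}.O2:{d}"
    return f"{lhs} ⊑ {rhs}"
```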

Page 12: Benchmarking Reasoners for Multi-Ontology Applications

Source Generation
1. Choose an ontology
2. Choose the number of classes for which to create individuals
3. Generate triples
• We can either generate random individuals, or use the domain and range information to connect the individuals with properties
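
The steps above can be sketched as a small generator, under the assumption (not stated on the slide) that properties are given with their domain and range classes; all names are hypothetical placeholders:

```python
import random

def generate_source(classes, properties, n_triples, rng=None):
    """Sketch of data-source generation: mint individuals for the chosen
    classes, then connect them with properties using domain/range info.
    `properties` maps a property name to its (domain, range) classes."""
    rng = rng or random.Random()
    triples = []
    individuals = {}
    # One individual per populated class, asserted via rdf:type.
    for cls in classes:
        ind = f"{cls.lower()}_1"
        individuals.setdefault(cls, []).append(ind)
        triples.append((ind, "rdf:type", cls))
    # Fill up to n_triples by linking individuals through properties whose
    # domain and range both have individuals.
    while len(triples) < n_triples and properties:
        prop, (dom, rng_cls) = rng.choice(list(properties.items()))
        if dom in individuals and rng_cls in individuals:
            s = rng.choice(individuals[dom])
            o = rng.choice(individuals[rng_cls])
            triples.append((s, prop, o))
        else:
            break
    return triples
```

Using domain/range information (rather than fully random individuals) yields sources whose property triples are consistent with the ontology, which matters for completeness measurements.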

Page 13: Benchmarking Reasoners for Multi-Ontology Applications

Query Generation
SPARQL queries (SELECT):
1. Choose the first predicate from the classes of an ontology.
2. Bias each subsequent predicate with a 75% chance of being one of the properties from the ontology.
3. Use shared variables in order to implement "joins". A shared variable is equally likely to appear in the subject position as in the object position.
4. For single-predicate queries, all the variables are distinguished. For others, on average 2/3 of the variables are distinguished and the rest are non-distinguished.
5. There is a 10% chance of using a constant.
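
A sketch of steps 1-3 and 5, using the probabilities from the slide (the distinguished/non-distinguished split of step 4 is omitted, and the variable-naming scheme is an assumption):

```python
import random

def generate_query(classes, properties, n_patterns, rng=None):
    """Sketch of the SPARQL SELECT generator: the first triple pattern uses
    a class; each later pattern has a 75% chance of using a property, joins
    on an earlier variable (equally likely in subject or object position),
    and with 10% probability uses a constant object."""
    rng = rng or random.Random()
    patterns = [f"?x0 rdf:type {rng.choice(classes)}"]
    for i in range(1, n_patterns):
        shared = f"?x{rng.randrange(i)}"   # join on an earlier variable
        fresh = f"?x{i}"
        pred = rng.choice(properties) if rng.random() < 0.75 else "rdf:type"
        if rng.random() < 0.5:             # shared variable as subject ...
            subj, obj = shared, fresh
        else:                              # ... or as object
            subj, obj = fresh, shared
        if rng.random() < 0.10:            # occasional constant
            obj = rng.choice(classes)
        patterns.append(f"{subj} {pred} {obj}")
    return "SELECT * WHERE { " + " . ".join(patterns) + " }"
```

Reusing an earlier variable in each new pattern is what makes the patterns form a connected join graph rather than a cross product.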

Page 14: Benchmarking Reasoners for Multi-Ontology Applications

A Sample Workload
• We used the benchmark to evaluate OBII, a distributed query answering system
• We compared it with a "baseline" system, which was essentially a KAON2 wrapper
• Some characteristics of the workload:
  • 50% of the classes had individuals
  • On average we generated 75 triples per source
  • We generated configurations as large as 100 domain ontologies with about 1000 data sources

Page 15: Benchmarking Reasoners for Multi-Ontology Applications

Conclusion and Future Work
• A focus on a workload that accounts for post-development alignments
  • Micro level: controlling the mapping axioms
  • Macro level: controlling how the ontologies are mapped
• Domain ontology synthesis can be expanded to support complex axioms
• Experiment with different characteristics, e.g. hubs and authorities (different in-degree/out-degree patterns)