semantic web 0 (0) 1 1 ios press a stitch in time saves ... · semantic web 0 (0) 1 1 ios press a...

24
Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals Harsh Thakkar a,* , Dharmen Punjani b Yashwant Keswani c Jens Lehmann a,d and Sören Auer e a Smart Data Analytics, University of Bonn, Germany, E-mail: {thakkar, jens.lehmann}@cs.uni-bonn.de b Department, National and Kapodistrian University of Athens, Greece, E-mail: [email protected] c DA-IICT, India, E-mail: [email protected] d Fraunhofer IAIS, Germany, E-mail: [email protected] e TIB Technische Informationsbibliothek & L3S Research Center, Leibniz University of Hannover, Germany, E-mail: [email protected] Abstract. Knowledge graphs have become popular over the past years and frequently rely on the Resource Description Frame- work (RDF) or Property Graphs (PG) as underlying data models. However, the query languages for these two data models – SPARQL for RDF and Gremlin for property graph traversal – are lacking interoperability. We present Gremlinator, a novel SPARQL to Gremlin translator. Gremlinator translates SPARQL queries to Gremlin traversals for executing graph pattern match- ing queries over graph databases. This allows to access and query a wide variety of Graph Data Management Systems (DMS) using the W3C standardized SPARQL query language and avoid the learning curve of a new Graph Query Language. Grem- lin is a system agnostic traversal language covering both OLTP graph database or OLAP graph processors, thus making it a desirable choice for supporting interoperability wrt. querying Graph DMSs. We present a comprehensive empirical evaluation of Gremlinator and demonstrate its validity and applicability by executing SPARQL queries on top of the leading graph stores Neo4J, Sparksee and Apache TinkerGraph and compare the performance with the RDF stores Virtuoso, 4Store and JenaTDB. Our evaluation demonstrates the substantial performance gain obtained by the Gremlin counterparts of the SPARQL queries, especially for star-shaped and complex queries. Keywords: SPARQL, Gremlin, Pattern Matching, Graph Traversal, Query Translator, RDF Graph, Property Graph, Gremlinator 1. Introduction Knowledge graphs have become increasingly pop- ular over the past years. The two most popular data models for representing and storing knowledge graphs are property graphs (PG) and the Resource Description Framework (RDF). For RDF, the SPARQL query lan- guage was standardized by W3C, whereas for PGs sev- eral languages are frequently used, including Grem- lin [1]. Both data models and the corresponding data management techniques have distinct and complemen- tary characteristics: RDF is suited for distributed data * Corresponding author. E-mail: {thakkar, jens.lehmann}@cs.uni- bonn.de. integration with built-in world-wide unique identifiers and the expressive SPARQL query language; PGs on the other hand support extremely scalable storage and querying and are meanwhile widely used for modern Web applications. In this article, we present an approach for execut- ing SPARQL queries over graph databases via Grem- lin traversals – Gremlinator, thus building a bridge be- tween the currently still largely disjoint semantic and graph data technology ecosystems and thus addressing the query interoperability problem. A SPARQL-PG query translation renders several benefits: (1) Applications based on W3C Semantic Web standards, like SPARQL and RDF, can use prop- erty graph databases in a non-intrusive fashion. (2) The 1570-0844/0-1900/$35.00 © 0 – IOS Press and the authors. All rights reserved arXiv:1801.02911v2 [cs.DB] 12 Feb 2018

Upload: others

Post on 03-Oct-2020

0 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

Semantic Web 0 (0) 1 1IOS Press

A Stitch in Time Saves Nine – SPARQLQuerying of Property Graphs using GremlinTraversalsHarsh Thakkar a,*, Dharmen Punjani b Yashwant Keswani c Jens Lehmann a,d and Sören Auer e

a Smart Data Analytics, University of Bonn, Germany, E-mail: {thakkar, jens.lehmann}@cs.uni-bonn.deb Department, National and Kapodistrian University of Athens, Greece, E-mail: [email protected] DA-IICT, India, E-mail: [email protected] Fraunhofer IAIS, Germany, E-mail: [email protected] TIB Technische Informationsbibliothek & L3S Research Center, Leibniz University of Hannover, Germany,E-mail: [email protected]

Abstract. Knowledge graphs have become popular over the past years and frequently rely on the Resource Description Frame-work (RDF) or Property Graphs (PG) as underlying data models. However, the query languages for these two data models –SPARQL for RDF and Gremlin for property graph traversal – are lacking interoperability. We present Gremlinator, a novelSPARQL to Gremlin translator. Gremlinator translates SPARQL queries to Gremlin traversals for executing graph pattern match-ing queries over graph databases. This allows to access and query a wide variety of Graph Data Management Systems (DMS)using the W3C standardized SPARQL query language and avoid the learning curve of a new Graph Query Language. Grem-lin is a system agnostic traversal language covering both OLTP graph database or OLAP graph processors, thus making it adesirable choice for supporting interoperability wrt. querying Graph DMSs. We present a comprehensive empirical evaluationof Gremlinator and demonstrate its validity and applicability by executing SPARQL queries on top of the leading graph storesNeo4J, Sparksee and Apache TinkerGraph and compare the performance with the RDF stores Virtuoso, 4Store and JenaTDB.Our evaluation demonstrates the substantial performance gain obtained by the Gremlin counterparts of the SPARQL queries,especially for star-shaped and complex queries.

Keywords: SPARQL, Gremlin, Pattern Matching, Graph Traversal, Query Translator, RDF Graph, Property Graph, Gremlinator

1. Introduction

Knowledge graphs have become increasingly pop-ular over the past years. The two most popular datamodels for representing and storing knowledge graphsare property graphs (PG) and the Resource DescriptionFramework (RDF). For RDF, the SPARQL query lan-guage was standardized by W3C, whereas for PGs sev-eral languages are frequently used, including Grem-lin [1]. Both data models and the corresponding datamanagement techniques have distinct and complemen-tary characteristics: RDF is suited for distributed data

*Corresponding author. E-mail: {thakkar, jens.lehmann}@cs.uni-bonn.de.

integration with built-in world-wide unique identifiersand the expressive SPARQL query language; PGs onthe other hand support extremely scalable storage andquerying and are meanwhile widely used for modernWeb applications.

In this article, we present an approach for execut-ing SPARQL queries over graph databases via Grem-lin traversals – Gremlinator, thus building a bridge be-tween the currently still largely disjoint semantic andgraph data technology ecosystems and thus addressingthe query interoperability problem.

A SPARQL-PG query translation renders severalbenefits: (1) Applications based on W3C SemanticWeb standards, like SPARQL and RDF, can use prop-erty graph databases in a non-intrusive fashion. (2) The

1570-0844/0-1900/$35.00 © 0 – IOS Press and the authors. All rights reserved

arX

iv:1

801.

0291

1v2

[cs

.DB

] 1

2 Fe

b 20

18

Page 2: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

2 H. Thakkar et al. / Gremlinator

query translation lays the foundation for a hybrid useof RDF triple stores and property graph DMS – wherea particular query can be dispatched to the DMS ca-pable to answer the query more efficiently [2]. In par-ticular, property graph databases have been shown towork very well for a wide range of queries which ben-efit from locality in a graph. Rather than performingexpensive joins, property graph databases use microindices to perform traversals. (3) Users familiar withthe W3C SPARQL query language can avoid learninganother query language.

To the best of our knowledge, this is the firstwork addressing the query interoperability (transla-tion) problem. Related work (cf. Section 2) mostlycovers the area of SPARQL to SQL conversion andvice versa. In contrast to those previous efforts, wehave to overcome the challenge of mediating be-tween two very different execution paradigms: WhileSPARQL uses pattern matching techniques, Grem-lin is based on performing graph traversals. Morespecifically, previous efforts applied query rewritingtechniques between formalisms, which are ultimatelyrooted in relational algebra operations, whereas we hadto bridge more disparate query paradigms. While thisis a significant challenge, it is also the reason why sub-stantial performance improvements can be made de-pending on the query characteristics: Whereas directSPARQL query execution can be expected to be suit-able for large analytical joins over the entire dataset,the Gremlin conversion can significantly speed up allqueries that require exploiting the graph locality.

We selected TinkerPop Gremlin as target language,since it is more general than, e.g. CYPHER, as it sup-ported by a wide range of property graph databases(including OLTP and OLAP processors (see Figure 1(a)). Moreover, Gremlin supports both the imperative(graph traversal) and declarative (graph pattern match-ing) style [1], for addressing the query interoperabil-ity issue. Lastly, together with the Apache TinkerPopframework, Gremlin is a language and a virtual ma-chine, enabling to design another traversal languagethat compiles to the Gremlin traversal machine (analo-gous to how Scala compiles to the JVM), ref. Figure 1(b).

We map SPARQL queries to the pattern matchingGremlin traversals (i.e. we map declarative SPARQLqueries to declarative Gremlin constructs and not theimperative ones). This ensures a level of fairnesswhen comparing the performance of both Graph QueryLanguages (GQLs). Furthermore, instead of translat-ing SPARQL queries to a specific dialect of Grem-

Figure 1. The Gremlin Traversal Language and Machine.

lin (e.g. Gremlin-Java8, Gremlin-Python etc.), we mapeach corresponding operation within a SPARQL ba-sic graph pattern (BGP) to its corresponding traver-sal step in the Gremlin instruction library (i.e. a singlestep traversal operation). As a result, we build complexpattern matching traversals, in an analogous fashion toSPARQL style querying wherein multiple BGPs formcomplex graph patterns (CGP). Thus, it is possible toconstruct a corresponding Gremlin traversal for eachSPARQL query.

Overall, we make the following contributions:– We propose a novel approach for mapping SPARQL

queries to Gremlin pattern matching traversals, ,which is the first work converting an RDF to aproperty graph query language to the best of ourknowledge.

– Our Gremlinator implementation for executingSPARQL queries over a plethora of third partygraph DMS such as Neo4J, Sparksee, OrientDB,etc. using the Apache TinkerPop framework isopenly available.

– We report the results of a comprehensive em-pirical evaluation of the proposed translation ap-proach comprising a variety of state-of-the-artproperty graph databases and triple stores on theNorthwind and BSBM datasets.

The remainder of the article is organized as follows:Section 2 covers related query conversion efforts. Sec-tion 3 introduces preliminary notions. Section 4 de-scribes the relationship between SPARQL graph pat-tern matching and Gremlin traversal steps. Section 5explains our mapping approach. Section 6 presents theexperimental evaluation on two famous datasets, dis-cusses the results and observations. Finally, Section 7concludes the article and describes future work.

Page 3: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

H. Thakkar et al. / Gremlinator 3

2. Related Work

In this section we briefly survey the related workwith regard to techniques and tools that support thetranslation and execution of GQLs.

SPARQL → SQL: There is a substantial amountof work been done for conversion of SPARQL queriesto SQL queries [3–8]. Ontop [3]1 exposes relationaldatabases as virtual RDF graphs by linking the terms(classes and properties) in the ontology to the datasources through mappings. This virtual RDF graphcan then be queried using SPARQL by dynamicallyand transparently translating the SPARQL queriesinto SQL queries over the relational databases. Thework presented in [4] generates SQL that is optimizedand also provides a well-defined specification of theSPARQL semantics used in the translation. In addition,Ontop also supports R2RML mappings over generalrelational schemas. The authors show that their imple-mentation can outperform other well known SPARQL-to-SQL systems, as well as commercial triple storesby large margin. In [5] a SPARQL-to-SQL translationtechnique is introduced, that focuses on the genera-tion of efficient SQL queries. It relies on a mappinglanguage that lacks support for URI templates and isless expressive than R2RML. [6] proposes a transla-tion function that takes a query and two many-to-onemappings: (i) a mapping between the triples and thetables, and (ii) a mapping between pairs of the form(triple, pattern, position) and relational attributes. Inaddition, the approach in [6] assumes that the under-lying relational DB is denormalized, and stores RDFterms. The two semantics deviate in the definition ofthe OPTIONAL algebra operator. The work in [8] isthe extension of work in [6] to include R2RML map-pings. [7] makes use of non-standard SQL constructsfor SPARQL–SQL translation and lacks the formalproof that the translation is correct and an empiricalevaluation with realistic data is missing.

SQL→ SPARQL: The work in [9] presents a for-mal semantics preserving the translation from SQL toSPARQL. RETRO [9] deals only with schema map-ping and query mapping rather than to transform thedata physically. Schema mapping derives a domain-specific relational schema from RDF data. Query map-ping transforms an SQL query over the schema into anequivalent SPARQL query, which in turn is executedagainst the RDF store.

1Ontop system (http://ontop.inf.unibz.it/)

SQL → CYPHER: CYPHER2 is the graph querylanguage used to query the Neo4j3 graph database.There has been no work yet aiming to convert theRDBMS to CYPHER. However, there are some exam-ples4 that show the equivalent CYPHER queries forcertain SQL queries.

3. Preliminaries

In this section, we recall and summarize the mathe-matical concepts which will be used in this article. Ournotation closely follows [10] and extends [11] by in-troducing the notion of vertex labels, a detailed discus-sion on which can be found in [12].

3.1. Graph Data Models

3.1.1. Edge-labeled Graphs.The Resource Description Framework (RDF) is a

well-known W3C standard, which is used for datamodeling and encoding machine readable content onthe Web [13] and within intranets. An RDF graphcan be seen as a set of triples, roughly analogous tonodes and edges in a graph database. However, RDF ismore specific in defining disjoint vertex-sets of blanknodes, literals and IRIs. In the rudimentary form, anRDF graph is a directed, edge-labeled, multi-graph orsimply an edge-labeled graph. In our current context,we do not consider blank nodes.5 Edge-labeled graphscan be used to encode complex information despitetheir elementary structure [10]. Edge-labeled graphshave been formally defined in a wide variety of texts,such as [10, 14–17]. We adapt the definition providedby [10], which is the closest to our current context:

Definition 3.1 (Edge-labeled Graph). An edge-labeledgraph is defined as G = {V, E}; where:

– V is the set of vertices,– E is the set of directed edges such that E ⊆ (V ×

Lab× V) where Lab is the set of Labels.

2CYPHER Query Language (https://neo4j.com/developer/cypher-query-language/)

3Neo4j (https://neo4j.com/)4SQL to CYPHER (https://neo4j.com/developer/

guide-sql-to-cypher/)5The treatment of blank nodes is orthogonal to our current goal,

as they related to the translation RDF graphs to property graphs. Wefocus on the pattern matching features and semantics of SPARQLand Gremlin.

Page 4: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

4 H. Thakkar et al. / Gremlinator

Figure 2. RDF graph example. This figure portrays a collaboration network of employees in a software company.

3.1.2. Property GraphsProperty graphs, also referred to as directed, edge-

labeled, attributed multi-graphs, have been formallydefined in a wide variety of texts, such as [10, 18–21].We adapt the definition of property graphs presentedby [20]:

Definition 3.2 (Property Graph). A property graph isdefined as G = {V, E, λ, µ}; where:

– V is the set of vertices,– E is the set of directed edges such that E ⊆ (V ×

Lab× V) where Lab is the set of Labels,– λ is a function that assigns labels to the edges and

vertices (i.e. λ : V ∪ E → Σ∗)6, and– µ is a partial function that maps elements and keys

to values (i.e. µ : (V∪E)×R→ S ) i.e. properties(key r ∈ R, value s ∈ S ).

Figures 2 and 3, present different data model visual-izations of the Apache TinkerPop modern crew graph7.

6A finite set of strings (Σ∗)7TinkerPop Modern Crew property graph (http://tinkerpop.

apache.org/docs/3.2.3/reference/#intro)

Figure 3. Property Graph example. This figure presents the prop-erty graph version of the RDF graph as in Figure 2

We use these as running examples throughout this ar-ticle.

3.2. Graph Pattern Matching

The Graph Pattern Matching (GPM) problem is gen-erally perceived as a subgraph matching problem (akasubgraph isomorphism problem) [16]. GPM can bedone over both undirected and directed graphs respec-tively8. Traditionally GPM is a computational task that

8In this work we will only focus on directed graphs.

Page 5: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

H. Thakkar et al. / Gremlinator 5

can be defined as the evaluation of graph patterns overa graph database [10]. The most trivial form of a graphpattern is the basic graph pattern (BGP). A BGP cou-pled with features such as projection, union, differenceand optional forms a complex graph pattern (CGP). Weillustrate these concepts in brief with respect to. con-text of SPARQL and Gremlin GQLs in Section 4. De-tailed information on GPM is available in [10, 16, 18].

GPM is carried out by matching (also referred toas evaluation), a sub-graph pattern over a graph G.Matching has been formally defined in various textsand we summarize a formal definition in our contextwhich closely follows the definition provided by [10,18].

Definition 3.3 (Match of a Graph Pattern JPKG). Agraph pattern P = (Vp, Ep, λp, µp); is matching thegraph G = (V, E, λ, µ), iff the following conditions aresatisfied:

1. there exist mappings µp and λp such that, all vari-ables are mapped to constants, and all constantsare mapped to themselves (i.e. λp ∈ λ, µp ∈ µ),

2. each edge é ∈ Ep in P is mapped to an edge e ∈ Ein G, each vertex v́ ∈ Vp in P is mapped to avertex v ∈ V in G, and

3. the structure of P is preserved in G (i.e. P is asub-graph of G)

The definition for matching for edge-labeled graphsis analogous to that of the property graph (ref. Def. 3.3):(i) a mapping m maps the constants to themselves andvariables to constants; and (ii) the structure of P ispreserved in G (example illustration ref. Figure 3).

3.3. SPARQL Query Language

SPARQL is a declarative GQL which is a W3C rec-ommendation and the query language of the RDF triplestores. The building blocks of SPARQL are RDF triplepatterns, consisting of subject, predicate, and object,where either of it can be a variable, literal value or IRI.In this work, we do not consider the blank node seman-tics.

Definition 3.4 (SPARQL BGP). A SPARQL querydefines a graph pattern to be matched against a givenRDF graph. A basic graph pattern (BGP) is a setof triple patterns, tp = (s

′, p

′, o

′) where s

′ ∈{s, ?s}, p

′ ∈ {p, ?p} and o′ ∈ {o, ?o}.

3.3.1. Evaluation of a graph pattern in SPARQLSPARQL operates over homomorphism-based bag

semantics defined in [22, 23]. In the context of SPARQL,the evaluation of a graph pattern P against an RDFgraph G has been well defined in literature. We referto [10, 15, 21, 23, 24] for the formal definitions. In thelater sections we illustrate the evaluation of a SPARQLgraph pattern with examples.

3.4. The Gremlin Graph Traversal Language andMachine

Gremlin is the query language of Apache Tinker-Pop9 graph computing framework. Gremlin is systemagnostic, and enables both – pattern matching (declar-ative) and graph traversal (imperative) style of query-ing over graphs.

3.4.1. The Machine.Theoretically, a set of traversers in T move (tra-

verse) over a graph G (property graph, cf. Section 3.2)according to the instruction in Ψ, and this computationis said to have completed when there are either:

1. no more existing traversers (t), or2. no more existing instructions (ψ) that are refer-

enced by the traversers (i.e. program has halted).

The result of the computation is either an empty set(i.e. former case) or the multiset union of the graph lo-cations (vertices, edges, labels, properties, etc.) of thehalted traversers which they reference. Rodriguez [1]formally define the operation of a traverser t as fol-lows:

G ← µt ∈ T{β, ς} ψ → Ψ [1] (1)

where, µ: T → U is a mapping from the traverser toits location in G; ψ: T → Ψ maps a traverser to a stepin Ψ; β: T → N maps a traverser to its bulk10; ς: T →U maps a traverser to its sack (i.e. local variable of atraverser) value.

3.4.2. The Traversal.A Gremlin graph traversal can be represented in any

host language that supports function composition andfunction nesting. These steps are either of:

1. Linear motif - f ◦ g ◦ h, where the traversal is alinear chain of steps; or

9Gremlin: Apache TinkerPop’s graph traversal language and ma-chine (https://tinkerpop.apache.org/)

10 The bulk of a traverser is number of equivalent traversers aparticular traverser represents.

Page 6: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

6 H. Thakkar et al. / Gremlinator

2. Nested motif - f ◦(g◦h) where, the nested traver-sal g ◦ h is passed as an argument to step f [1].

A step f ∈ Ψ can be, defined as f : A? → B?11.Where, f maps a set of traversers of type A (located atobjects of A) to a set of traversers of type B (located atobjects of B). Given that Gremlin is a language and avirtual machine, it is possible to design another traver-sal language that compiles to the Gremlin traversal ma-chine (analogous to how Scala compiles to the JVM).As a result, there exist various Gremlin dialects suchas Gremlin-Groovy, Gremlin-Python, etc.

3.4.3. Evaluation of a graph pattern in GremlinIn Gremlin, GPM is performed by traversing12 over

a graph G. A traversal t over G derives paths of arbi-trary length. Therefore, a GPM query in Gremlin canbe perceived as a path traversal. Rodriguez et al. [11]define a path as:

Definition 3.5 (Path). A path p is a sequence or astring, where p ∈ E? and E ⊂ (V × Le × V)13. A pathallows for repeated edges and the length of a path isdenoted by ||p||, which is equal to the number of edgesin p.

Moreover, from [20] we also know that these pathqueries are comprised of several atomic operationscalled the single-step traversals. We discuss these inbrief in Section 4.2. The evaluation of an input graphpattern in Gremlin is taken care by two functions:

1. the recursively defined match() function, whichevaluates each constituting graph pattern andkeeps track of the traverser’s location in the graph(i.e. path history), and,

2. the bind() function, which maps the declaredvariables (elements and keys) to their respectivevalues.

The evaluation (also know as matching, ref. Def. 3.3)of a graph pattern in Gremlin is carried out by thematch()-step. We borrow the notation of the eval-uation of a graph pattern (JQKG) from [15] for repre-senting the evaluation of a Gremlin traversal Ψ overa graph G, i.e. JΨKgml

G . Details of execution of thematch()-step in Gremlin are described in [1].

11The Kleene star notation (A?, B?) denotes that multiple tra-versers can be in the same element (A,B).

12The act of visiting of vertices (v ∈ V) and edges (e ∈ E) in agraph in an alternating manner (in some algorithmic fashion) [20].

13The kleene star operation ? constructs the free monoid E? =⋃∞n=0 Ei. where E0 = {ε}; ε is the identity/empty element.

4. SPARQL↔ Gremlin homology

In this section we present the correspondence be-tween SPARQL BGPs/CGPs with Gremlin patternmatching path traversals. In doing so we devise a for-mal analogy borrowing the evaluation semantics ofa SPARQL query [10, 15, 22] (referring to the wellestablished bag semantics) and put them in contextof Gremlin traversals [1, 11, 20]. A detailed discus-sion on the one-to-one operator level mapping betweenSPARQL and Gremlin can be found in the study [12].Furthermore, we illustrate the applicability of theseconcepts with respect to the running examples (asshown in Figures 2 and 3).

4.1. Graph Pattern Matching via Traversing

A SPARQL query consists of several BGPs whichwhen used with features such as projection or union,form a CGP (as we discussed in section 3.3). BGPs(ref. Definition 3.4) are comprised of triple patterns,which match to RDF triples that constitute the RDFdataset. Moreover, the RDF data model resembles es-sentially a directed, edge-labeled, multi-graph or RDFgraph. It is, therefore, possible to traverse an RDFgraph with Gremlin (i.e. construct traversals), regard-less of it being an edge-labeled graph or a propertygraph as the core-concept of traversing remains unal-tered.

Analogous to SPARQL, Gremlin also provides theGPM construct, using the Match()-step. This en-ables the user to represent the query using multiple in-dividually connected or disconnected graph patterns.Each of these graph patterns can be perceived as a sim-ple path traversal, to-and-from a specific source anddestination, over the graph.

In Gremlin, each traversal can be perceived as apath query, starting from a particular source (A) andterminating at a destination (B) by visiting verticesv ∈ V and edges e ∈ E(V × V). Each path query iscomposed of one or more single-step traversals (SST)as shown by [1]. Through function composition andcurrying, it is possible to define a query of arbitrarylength [1]. Furthermore, just as multiple BGPs form aCGP in SPARQL, the corresponding SSTs can be cou-pled with features such as projection, union, optional,etc. to form a complex path traversal query in Grem-lin. These path queries can be a combination of eithera source, destination, labeled traversal or all of them ina varying fashion, depending on the information needof the user.

Page 7: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

H. Thakkar et al. / Gremlinator 7

4.2. SPARQL BGPs as Gremlin Single StepTraversals

In this section we establish the exact analogy be-tween SPARQL BGPs and Gremlin SSTs. In SPARQL,GPM is conducted by matching BGPs which consistof triple patterns (TP), that form a sub-graph, againstan RDF graph G (i.e. checking whether a sub-graph iscontained in G). We can represent BGPs notationallyas:

BGP = {T P }+ ; T P = {s p o . }∗14 (2)

In this unique representation, each subject (s) and ob-ject (o) (i.e. nodes) in a triple is connected throughonly one predicate (p) relation (i.e. edges). Figure 2presents an example of the graph representation of asample RDF dataset.

In Gremlin, GPM is conducted by the match()-step, wherein each above graph patterns, representedas pattern matching traversals are evaluated against agraph G. We already know that Gremlin allows a userto form traversals of arbitrary length using functioncurrying and composition. Due to this functionalityand given the nature of the information represented ina triple, it is possible to represent the underlying traver-sal operation using a SST, which represented by itspredicate/edge.

For instance, consider the BGP in listing 1, wherethe information need is to find what marko has created.This pattern, i.e. a subgraph formed by the BGP willbe matched against a graph (ref. Figure 2) to bind thevalues of the variables labeled as "x" to "marko", and"y" to the name property of the node connected by theedge/predicate labeled "created" by "x". Listing 1 rep-resents the SPARQL BGP as shown in Figure 4(a).

1 { ? x a : name " marko " . ? x a : C r e a t e d ? y . }

Listing 1: What has marko created?

The corresponding Gremlin traversal for the aboveSPARQL query is shown in listing 2 from Figure 3.Here the underlying SSTs required are.has(’name’,’marko’) and .out(’Created’)that map to the HasStep() and VertexStep()instructions in the Gremlin instruction-set library [1,11] respectively.

14The ∗ symbolizes that a TP can also be an empty graph pattern,whereas + symbolizes that a BGP can consists of more than one TPs(i.e. a set of triple patterns).

1 g .V( ) . a s ( ’ x ’ ) . has ( ’ name ’ , ’ marko ’ ) . o u t ( ’C r e a t e d ’ ) . a s ( ’ y ’ )

Listing 2: An outgoing traversal from the vertex"marko" via an edge labeled "created".

Here, g.V() i.e. Vg is the traverser definition bijec-tive to V where, ]iµ((Vg)i) = V. Thus, each predicatein a triple pattern of a SPARQL BGP manifests theSST required for the matching the graph pattern. Wedescribe the different types of Gremlin SSTs and theircorrespondence with the SPARQL BGPs and summa-rize them in Table 1.

In [1], Rodriguez presents an itemization of theGremlin SSTs which can be combined together toform a complex path traversal (analogous to CGP inSPARQL). We classify these SSTs into four categoriesdepending on the whether the predicate-object combi-nation (s p o) of the corresponding SPARQL BGPis a literal/value of a vertex/edge label (L) or a ver-tex/edge property (P1) or a variable representing aproperty value (P2) or a traversal to and from a vertexgiven an edge label (E). These four categories are:

– Case L – Traversal to access the label values of avertex or an edge (Lv/Le)

– Case P1/P2 – Traversal to access the propertyvalues of a vertex or an edge (Pv/Pe)

– Case E – Incoming/outgoing traversal betweentwo adjacent vertices given an edge label (Ein/Eout)

We consider the above mentioned four cases as ourbase cases for constructing complex/composite traver-sals from SSTs with respect to their correspondingSPARQL CGPs. Now, lets recall the SPARQL BGPfrom listing 1. Here, the corresponding Gremlin SSTsfor the SPARQL BGPs are the cases – Pv (as the traver-sal is to access the property value of a vertex) and Eout

(as the traversal is from a vertex named "marko" via theedge labelled "Created"). Table 1 connects the dots bymapping the the Gremlin SSTs [notationally ψs] (de-fined in [1, 11]) to the SPARQL BGPs.

4.3. SPARQL Queries as Gremlin Pattern MatchingTraversals

In SPARQL query language, as we have mentionedin earlier sections, a query comprises of one or moreCGPs which in turn are formed by a combination ofBGPs.

Similarly, in Gremlin traversal language, a patternmatching traversal comprises of one or more path

Page 8: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

8 H. Thakkar et al. / Gremlinator

S.S.T. [1] Basic Graph Pattern (BGP) Corresponding Gremlin Traversal Step σ(BGP) = ψs Case

Lv { ?x v:label "person" .} [MatchStartStep(x), HasStep([∼label.eq(person)]), MatchEndStep]Le { ?x e:label "knows" .} [MatchStartStep(x), HasStep([∼label.eq(knows)]), MatchEndStep]

L

Pv { ?x v:name "marko" .} [MatchStartStep(x), HasStep([name.eq(marko)], MatchEndStep]Pe { ?x e:weight "0.8" .} [MatchStartStep(x), HasStep([weight.eq(0.8)]), MatchEndStep]

P1

Pe { ?x e:weight ?y .} [MatchStartStep(x), PropertiesStep([name],value), MatchEndStep(y)]Pv { ?x v:name ?y .} [MatchStartStep(x), PropertiesStep([name],value), MatchEndStep(y)]

P2

EOUT { ?x e:knows ?x .} [MatchStartStep(x), VertexStep(OUT,[knows],vertex), MatchEndStep(y)]EIN { ?y e:knows ?x .}* [MatchStartStep(y), VertexStep(IN,[knows],vertex), MatchEndStep(x)]

E

Table 1Mapping between the SPARQL BGPs, Gremlin SSTs and their corresponding Traversal steps. Each SPARQL BGP can be mapped to a corre-sponding Gremlin SST as described in Sect. 4.2.

Figure 4. Example GPM evaluation notion of a, (a) BGP and (b)CGP, SPARQL query over an RDF graph in Figure 2.

Figure 5. Corresponding GPM evaluation notion of a, (a) BGP and(b) CGP, in a Gremlin traversal over a property graph in Figure 3.

traversals which in turn are comprised of a combina-tion of several SSTs.

From the already well established semantics ofSPARQL query language [10, 14, 15, 23, 25], a query(Q) can be notationally represented as:

Q = {[PROJ.] BGP [UNION/DIFF./OPT.]

BGP [Filter (c)]}+(3)

Where, Proj., Union, Diff. and Filter are the relationaloperators defined on the BGPs.

We have already established from Table 1, that eachBGP can be mapped to a corresponding Gremlin sin-gle step traversal (σ(BGP) = ψs). Thus, from equations(2, 3), we can create a mapping function σ, such that:

σ(BGP) = ψs (4)

Therefore, building on equations (4, 3) a SPARQLquery Q can be mapped as:

σ(Q) = σ({[PROJ.] BGP [UNION/DIFF./OPT.] BGP [Filter (c)]}

)+={σ([PROJ.]) ψs σ(UNION/DIFF./OPT.) ψs σ(Filter (c))

}+

= Ψ

(5)

Where, σ([PROJ.]), σ([UNION/DIFF./OPT.]) andσ([FILTER (C)]) represent the respective Gremlin in-struction steps for the operators such as Projection,Union, etc. We present a consolidated summary ofthe correspondence between SPARQL features/key-words and their corresponding Gremlin instructionsteps in Table 2. Furthermore, we also present theSPARQL query language constructs and their corre-sponding Gremlin traversal language constructs in Ta-ble 2.

The evaluation of a SPARQL query Q is carried outby matching or evaluating the graph patterns withinQ, against a graph G (an RDF graph in this case), de-noted as JPKsparql

G ). Similarly, in Gremlin traversal lan-guage and machine, the evaluation of a pattern match-ing traversal Ψ is carried out, by the match()-step,by matching or evaluating the SSTs within Ψ againsta graph G (a property graph in this case). We borrow

Page 9: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

H. Thakkar et al. / Gremlinator 9

Operation SPARQL keyword Gremlin Step SPARQL construct (Q) Gremlin construct σ(Q) = Ψ

Graph Pattern(s) { s p o . } ψ (i.e. σ(s p o .)) BGP ψ (single step traversal [list of ψ])Matching WHERE { ... } MatchStep(AND,[]) WHERE { BGP1 . BGP2 . } [MatchStep(AND,[[ψ1],[ψ2]]Restriction FILTER(C) WhereTraversalStep(p(C)) FILTER (?v1 <30) WhereTraversalStep([value(v1), IsStep(lt(30))])Join JOIN AndStep() BGP1 * BGP2 AndStep([[ψ1], [ψ2]])Projection SELECT SelectStep() SELECT ?v1 ?v2 SelectStep([a, b,])Combination UNION UnionStep() BGP1 UNION BGP2 UnionStep(p(BGP1),p(BGP2))Deduplication DISTINCT DedupStep() DISTINCT ?v1 DedupStep([a,b])Restriction LIMIT(M) RangeStep(0,M) LIMIT 2 RangeStep(0,2)Restriction OFFSET(N) RangeStep(N,M+N) OFFSET 10 RangeStep(10,12)Sorting ORDER BY() OrderStep() ORDER BY DESC(?a) OrderStep([[value(a), desc]])Grouping GROUP BY() GroupStep() GROUP BY(?a) GroupStep(value(a))

Table 2A consolidated list of SPARQL features/keywords & their corresponding Instruction steps in Gremlin.

the same notation from [14, 15] to fit our purpose anddenote as JΨKgml

G ).We display brevity in constructing our arguments by

quick examples instead of re-inventing the wheel byre-defining formal concepts and proofs (which alreadyhave been addressed in the works [10, 23, 26, 27]).Moreover, we illustrate using examples, the semanticanalogy between the evaluation of Gremlin traversalfeatures in a homologous fashion to that of the multi-set semantics of SPARQL queries defined by [23]who extend the work of [14, 15]. We show, by struc-tural analogy created with the evaluation semantics ofSPARQL15, that:

JQKsparqlG ≡ JΨKgml

G (∵ σ(Q) = Ψ, Eqn : 5) (6)

Projection. The projection operator projects/selectsthe values of a specific set of variables (x, y, .., n), fromthe solution of a matched graph pattern P, against thegraph G. Furthermore, it is possible to declare vari-ables in Gremlin using .as() steps, which serve assyntactic sugars. For instance, in the CGP as shown inFigures 4(b) and 5(b), we project only variable ?c de-spite using (?a, ?b & ?c) in the query, since we are onlyinterested knowing the value binded to it. It is carriedout using the SELECT keyword in SPARQL, and cor-responding .select() step in Gremlin. The corre-sponding evaluation of a select step in Gremlin can beillustrated as:

15This is because both Gremlin and SPARQL operate over bag se-mantics and works such as [22, 23, 27] have already debated and for-mally established the equivalence between underlying semantics ofrelational and graph-specific operators for RDF and Property graphs

� JSELECT ?x1 ?x2 {BGP}KsparqlG

= Jσ(SELECT?x1 ?x2 {BGP})KgmlG

= Jσ(BGP)σ(SELECT ?x1 ?x2)KgmlG

= Jψs SelectStep([x1,x2])KgmlG = JΨKgml

G(7)

Here, ψs (SST) and SelectStep([x1,x2]) col-lectively form the final pattern matching traversal(analogous to a collection of BGPs and BGPs form-ing). Moreover, ψs is mapped from Table 1, dependingon the case it corresponds.

Optional. The optional operator in corresponds toa left-join operation (in relational sense). The optionalgraph patterns in a query are declared using this oper-ator. For instance, given the CGP: BGP1 OPT. BGP2;if the optional BGP2 does not match with graph G,then the results of BGP1 are returned unchanged, elseadditional bindings of BGP2 are added to the solu-tion. It is present in both SPARQL (as OPTIONAL)and Gremlin (as .optional() keyword which cor-responds to ChooseStep() in the Gremlin instruc-tion library).

� JBGP1 OPT. BGP2KsparqlG

= Jσ(BGP1 OPT. BGP2)KgmlG

= Jσ(BGP1) ChooseStep(σ(BGP2))KgmlG

= Jψs1,ChooseStep(ψs2)KgmlG = JΨKgml

G

(8)

Union. The union operator combines the solutionsets of the two input graph patterns. In SPARQL, unionoccurs between two BGPs or CGPs, analogously in

Page 10: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

10 H. Thakkar et al. / Gremlinator

Gremlin, it occurs between two SSTs and Traversals(i.e. the result set of two traversers). The solution setreturned after the union operation is not de-duplicatedby default, because of the governing bag semantics.Thus, all possible solutions are returned. Formally, theevaluation of a union can be illustrated as:

� JBGP1 UNION BGP2KsparqlG

= Jσ(BGP1 UNION BGP2)KgmlG

= JUnionStep(σ(BGP1),σ(BGP2)]KgmlG

= JUnionStep([ψs1, ψs2])KgmlG = JΨKgml

G

(9)

For instance, consider the sample SPARQL CGP withUNION over the graph G (ref. Figure 3) as illustratedin the example below. The idea is to find the all thesoftware created by "marko" which are in "java" lan-guage.

Illustration of a CGP with UnionSPARQL CGP Gremlin Traversal (σ(BGP))

{ ?soft v:lang"java" .} UNION{ ?person v:name"marko" .}

UnionStep ([[StartStep(soft),PropertiesStep([lang],value),IsStep(eq(java)), EndStep],[StartStep@[person],PropertiesStep([name],value),IsStep(eq(marko)), EndStep]])

FILTERs. The filter keyword (or a group of op-erators) is used to restrict the results based on user-defined criteria. Filters declare one or more constraintson the variables in the query, depending on the need ofthe user, and limit the solution of the overall group ofBGPs with respect to specified equality/inequality/reg-ular expressions (i.e. constraints). It is present in bothSPARQL (as FILTER C, where C is the declared con-straint) and Gremlin (as .where(C), where C is theconstraint). In Gremlin the .where(C) keyword cor-responds to the WhereTraversalStep() from theinstruction set library

� JBGP FILTER CKsparqlG

= Jσ(BGP FILTER C)KgmlG

= Jσ(BGP)σ(FILTER C)KgmlG

= Jψs, WhereTraversalStep(ψc)KgmlG = JΨKgml

G(10)

Here, ψc denotes the corresponding Gremlin logicaloperator steps (i.e. .eq() for = , .neq() for 6=,

.gte(), etc.). The Gremlin traversal language sup-ports all the logical operators defined in SPARQLquery language (as described here16), which can befound at the online documentation17. However, the cur-rent version of the Gremlin traversal language doesnot support regular expression matching REGEX oper-ators, although specific graph databases that leverageTinkerPop framework may provide a partial match ex-tension.

Illustration of a CGP with FILTERSPARQL CGP Gremlin Traversal (σ(BGP))

{ ?a v:name?b . ?av:age ?d .FILTER(?d<30)}

[MatchStartStep(a),PropertiesStep([name],value),MatchEndStep(b)],[MatchStartStep(a),PropertiesStep([age],value)@[d],MatchEndStep(d)],WhereTraversalStep([WhereStartStep(d),IsStep(lt(30))])

Like in SPARQL, it is possible to declare multiple con-straints inside a single FILTER clause:FILTER (C1 && C2)→WhereTraversalStep(AndStep[(C1, C2)]FILTER (C1 || C2)→WhereTraversalStep(OrStep[(C1, C2)]For brevity we skip the illustration of this step, as itbeing perceptible.

Query Modifiers. The solution set returned by theevaluated graph patterns is NOT de-duplicated or or-dered by default, as both the languages operate on bagsemantics. Therefore, query modifiers or solution se-quence modifiers are used for presenting the resultsin the desired order. We list out query modifiers, theircorresponding keywords and language constructs inTable 2. Examples of query modifiers include DIS-TINCT (for result de-duplication), LIMIT & OFF-SET (for restricting no. of results), GROUP BY (forgrouping manipulation of result stream), ORDER BY(for ordering manipulation of result stream).

For brevity we skip the formal definitions of eachmodifier, rather illustrate their correspondence and ap-plicability in Table 2.

Subgraphs. Like in SPARQL query language, itis also possible to load/create and query NAMEDgraphs. This can be achieved using the Gremlin

16SPARQL operator definitions – (https://www.w3.org/TR/rdf-sparql-query/#SparqlOps)

17Gremlin logical operators (predicates) – (http://tinkerpop.apache.org/docs/current/reference/#a-note-on-predicates)

Page 11: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

H. Thakkar et al. / Gremlinator 11

Subgraph()-step. It allows a user to create cus-tom graphs based on specific graph patterns (vertices,edges and properties) and later query them using thesame approach as described earlier in this section.

5. Approach

In this section we discuss our proposed approach –Gremlinator, its execution pipeline and limitations inbrief.

5.1. Encoding SPARQL prefixes

We encode the prefixes of SPARQL queries withinGremlinator implementation, in order to aid the SPARQLto Gremlin translation process. We define custom pre-fixes keeping in mind the four categories of SSTs(as stated in sec. 4.2). For instance, the standardrdfs:label prefix (which is generally a predicate)is represented as e:label or v:label (where e =edge and v = vertex). A similar procedure is followedfor other three cases.

5.2. Gremlinator Architecture & Algorithm

We now present the architectural overview of Grem-linator in Fig. 6 and discuss the role of each of the four-step execution pipeline.

Step 1. The input SPARQL query is first parsed us-ing the Jena ARQ module, thereby: (i) validating thequery and (ii) generating its abstract syntax tree (AST)representation.

Step 2. From the obtained AST of the parsedSPARQL query, Gremlinator then visits each BGPs,mapping them to the corresponding Gremlin SSTs (ψs,ref. Table 1).

Step 3. Thereafter, depending on the operator prece-dence obtained from the AST of the parsed SPARQLquery, each of the corresponding SPARQL keywordsare mapped to their corresponding instruction stepsfrom the Gremlin instruction library (ref. Table 2).Thus, a final conjunctive Traversal (Ψ) is generated ap-pending the SSTs and instruction steps. This can beperceived analogous to the SPARQL query language,wherein a set of BGPs form a single complex graphpattern (CGP).

Step 4. This final conjunctive traversal (Ψ) is usedto generate bytecode18 which can be used on multiple

18Bytecode is simply serialized representation of a traversal, i.e. alist of ordered instructions where an instruction is a string operatorand a (flattened) array of arguments.

Figure 6. The architectural overview of Gremlinator.

language and platform variants of the Apache Tinker-Pop Gremlin family.

Algorithm. The SPARQL to Gremlin translation al-gorithm is presented in Algorithm 1

5.3. Limitations

The current version of Gremlinator supports theSPARQL 1.0 SELECT queries with the following ex-ceptions: 1.) REGEX (regular expressions) in FILTER(restrictions) of a graph pattern are currently not sup-ported19. 2.) Gremlinator does not support variablesfor the property predicate, i.e. the predicate {p} ina graph pattern {s p o .} has to be defined orknown for the traversal to be generated. This is becausetraversing a graph is not possible without knowing theprecise traversal operation to the destination (vertex oredge) from the source (vertex or edge).

6. Empirical Evaluation

We now shed light on the empirical evaluation set-tings of our experiments. These include the dataset andquery descriptions, a carefully curated experimentalsetup (keeping in mind the various settings native toboth RDF and Graph DMSs), a brief note on the cor-rectness of our approach, the reported results and theirmeticulous discussion. Finally, with a brief note onthe curated public demonstration of Gremlinator whichpromotes the users to get a first hand experience of theproposed system, we conclude the section.

6.1. Datasets

Northwind – is a synthetic-dataset with an e-commerce scenario between a fictional company "North-wind Traders", its Customers, and Suppliers. Origi-

19This is because the REGEX feature is not supported in Tinker-Pop Gremlin as of now. Thus, it is Gremlin’s limitation and not ofour approach.

Page 12: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

12 H. Thakkar et al. / Gremlinator

Algorithm 1: SPARQL2Gremlininput : S Q : SPARQL Queryoutput: GT : Gremlin Traversal

1 GT ← ∅; T← ∅ // list of single step

traversals T

2 ; AS T ← getAST(SQ); BGPs← getAllBGPs(AST)

3 foreach bgpi ∈ BGPs do4 T ← T ∪ ψs // mapping BGP to Gremlin

S.S.T. (ψs = σ(bgpi)) ∵ Table 1

5 end6 // mapping the corresponding Gremlin

operators in the A.S.T. cf. Table 2

7 if c← AS T.FILT ER, ∃c 6= ∅ then8 foreach c ∈ AS T do9 T ← T ∪ WhereTraversalStep(ψc)

10 end11 end12 if c← AS T.UNION then13 GT ← UnionStep(Match(T ))14 end15 if |T | > 1 then16 GT ← Match(T)17 else18 GT ← GT ∪ T19 end20 if c← AS T.ORDERBY then21 GT ← T ∪ OrderStep(ψc)22 end23 if c← AS T.GROUPBY then24 GT ← T ∪ GroupByStep(ψc)25 end26 if c← AS T.LIMIT then27 if k← AS T.OFFS ET then28 GT ← T ∪ RangeStep(k, c + k)29 else30 GT ← T ∪ RangeStep(c)31 end32 end33 return GT

nated as a sample dataset shipped with Microsoft Ac-cess20, it raised to fame with an enormous demand fore-commerce use cases in benchmarking DMSs. In Fig-ure 7(a) we present the dataset schema. We obtained

20Northwind Database (https://northwinddatabase.codeplex.com/)

Table 3Dataset statistics

Criterion Northwind BSBMRDF PG RDF PG

Classes 11 - 159 -Entities & Nodes 4413 3209 71015 92757Distinct subjects 4413 - 71017 -Distinct objects 8187 - 166384 -Properties 55 55 40 40Number of Triples & Edges 33003 6177 1000313 238309

graph version of the dataset from here21.Berlin SPARQL Benchmark [28] (BSBM) – is asynthetic dataset, which is built around an e-commerceuse case, between a set of products, their vendors, con-sumers who review the products. It is a widely famousfor benchmarking RDF DMSs as it offers the flexibil-ity of generating graphs of custom size and density.We generated a standard 1M triples dataset using theirdata generation script, which makes it available in var-ious formats (e.g. .nt, .csv, .sql, .ttl, etc) Figure 7(b)describes the schema of the BSBM Dataset.

Table 3 summarizes the statistics of both Northwindand BSBM-1M dataset.

Figure 7. The dataset schema of (a) Northwind and (b) BSBM.

6.2. Pre-defined Queries

We created a pre-defined set of 30 SPARQL queries,for each dataset, which cover 10 different query fea-tures (i.e. three queries per feature with a combina-tion of various modifiers). These features were se-lected after a systematic study of SPARQL query se-mantics [10, 15, 22] and from BSBM [28] explore usecases22 and Watdiv Query templates23. A gold stan-

21SQL2Gremlin website – (http://www.sql2gremlin.com)22BSBM Explore Use Cases (https://goo.gl/y1ObNN)23Watdiv Query Features (http://dsg.uwaterloo.ca/watdiv/

basic-testing.shtml)

Page 13: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

H. Thakkar et al. / Gremlinator 13

dard set of corresponding Gremlin traversals of theSPARQL queries was created by three Gremlin ex-pert users, for a twofold validation of the traversalsgenerated by our approach. We elaborate on the ap-proach evaluation and correctness in the following sub-section. Table 4, summarizes the query design and thefeature distribution within them that was used for ourexperiment.

6.3. Experimental Correctness

In order to validate the correctness of our approachempirically, we – (a) loaded the RDF datasets in thethree top of the line RDF DMSs and the correspond-ing Property graph datasets in the three top of the lineGraph DMSs (cf. Section 6.4). Thereafter, we executedthe SPARQL queries against the RDF DMSs and thecorresponding Gremlinator translated Gremlin traver-sals against the Graph DMSs. We then compared theresults returned by each of these queries for correct-ness; (b) compared the results returned by the Gremli-nator translated traversals with respect to that returnedby the hand crafted Gremlin traversals (gold standardqueries curated by three Gremlin experts), for the cor-responding SPARQL queries, for all the three GraphDMSs.

Having conducted the above validation, we ob-served that the results returned by the – (a) RDF andGraph DMSs were equal. However, the representationof the returned results were distinct. The results re-turned by the SPARQL queries were in a tabular for-mat, whereas those returned by the Graph DMSs werein a list of sets format. We report this using a subsetof results for BSBM dataset over both RDF and GraphDMSs in Table 5 of Appendix A for reference. Here,we can clearly observe that the results in both the casesare equal though having two different representations.A complete set of all the results for both the datasetscan be referred by vising the online resource describedin the caption of Table 5. (b) Gremlin translated traver-sals and the hand crafted Gremlin traversals were alsoequal. Thus, ensuring that the proposed SPARQL →Gremlin translation approach is correct, as it preservesthe meaning of the original query (i.e. the informationneed of the input SPARQL is not manipulated in thetranslation process.)

6.4. Experimental Setup

We selected the following DMSs for the experi-ments: RDF DMS: Openlink Virtuoso [29] [v7.2.4],

Table 4Query feature design and description

Que

ry

Feat

ure

FILT

ER

CO

UN

T

LIM

IT

DIS

TIN

CT

#Pa

tter

ns

#Pr

oj.V

ars.

C1 CGP X X 2 2C2 CGP X 1 1C3 CGP X 1 1F1 CONDITION X(1) 3 3F2 CONDITION X(2) 3 3F3 CONDITION X(1) X 2 1L1 RESTRICTION X(1) X X 4 2L2 RESTRICTION X 2 2L3 RESTRICTION X 2 2G1 GROUP BY X X 2 2G2 GROUP BY X(1) 6 2G3 GROUP BY X 1 2Gc1 GROUP COUNT X X 3 2Gc2 GROUP COUNT X 2 2Gc3 GROUP COUNT X X 1 2O1 ORDER BY X 1 1O2 ORDER BY X(1) 4 3O3 ORDER BY X X 1 1U1 UNION X(2) X 8 1U2 UNION X(2) 6 2U3 UNION X(2) X 4 1Op1 OPTIONAL X(1) 3 3Op2 OPTIONAL X X 6 2Op3 OPTIONAL X(2) 8 3M1 MIXED X X 3 2M2 MIXED X X X 2 2M3 MIXED X X 4 2S1 STAR X(1) X 12 11S2 STAR X(1) X 5 4S3 STAR X(1) 10 9

TOTAL 30 Q. - - - - - -

JenaTDB24 [v3.2.0], 4Store [30] [v1.1.5]; GraphDMS: TinkerGraph [31] [v3.2.3], Neo4J25 [v1.9.6],Sparksee26 [v5.1]. All the experiments were performedon a machine with the following configuration: CPU:Intel® Xeon® CPU E5-2660 v3 (20 cores @2.60GHz),RAM: 128 GB DDR3, HDD: 512 GB SSD, OS:Linux 4.2-generic.

24Apache Jena TDB (https://jena.apache.org/documentation/tdb/index.html)

25Neo4J (https://neo4j.com/)26Sparksee – formerly DEX (http://sparsity-technologies.com/

#sparksee)

Page 14: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

14 H. Thakkar et al. / Gremlinator

Evaluation Metrics. The following conditions andparameters were considered for reporting all results.

– Query execution time (in milliseconds or ms)considered is the average of 10 runs for eachquery (of both SPARQL and translated Gremlintraversals).

– Queries executed in both cold and warm cachesettings for respective DMSs. Where a warmcache: implies that the cache is not cleared af-ter each query run, and cold cache: impliesthat the cache is cleared using the ’echo 3> /proc/sys/vm/drop_caches’ UNIXcommand after each query execution.

– For Graph DMSs, query execution time is recordedfor both with and without creating explicit in-dices. We elaborate on the reason for the same,next.

Indexing in RDF Triple Stores vs Graph DMS.RDF triple stores typically index data employing pre-defined indices. However, it is theoretically possible tohave an RDF DMS totally index-free, but this wouldimply performing a linear search through the entiredataset (set of triples) for each query that is executed.For this reason, having some pre-defined index set-ting within a RDF DMS by default is salient. Thesame, however, cannot be said for Graph DMS whereinthese indices have to be created manually, depend-ing upon the use case. For instance, Openlink Vir-tuoso maintains two all-purpose full (bitmap indicesover PSOG, POGS) and three partial indices (over SP,OP GS) in the default configuration27. Furthermore,4Store in its default setting maintains a set of threefull indices (R, P, M) [30], where – the R-index is ahash-map index over RDF resources (URIs, Literals,and Blank Nodes); the P-index consists of a set of tworadix trees per predicate, using a 4-bit radix; the M-index is a hash-map based indexing scheme over RDFGraphs (G). Lastly, Apache Jena TDB maintains threeindices using a custom persistent implementation ofB+ Trees28.

On the other hand, Graph DMSs rarely maintain anydefault indexing scheme. They rather offer the possi-bility of creating explicit indexes over custom graphelements, using a variety of data structures, depend-

27RDF indexing scheme in Virtuoso (http://docs.openlinksw.com/virtuoso/rdfperfrdfscheme/)

28RDF indexing scheme in Apache Jena TDB (https://jena.apache.org/documentation/tdb/architecture.html#triple-and-quad-indexes)

ing on the implementation. For instance, TinkerGraphsupports the creation of regular and composite hash-map indices (multiple key-value pairs) on graph ele-ments (node and edge attributes). Neo4J allows declar-ing regular indices (composite indices are supportedfrom v3.5 onwards) on graph elements (including la-bels). It offers a variety of indices ranging from Luceneindex (for textual attributes) and as SBTREE-based in-dex (numeric ones, such as IDs), which is based oncustom implementation of B-Trees with several opti-mizations related to data insertion and range queries 29.Lastly, like other Graph DMSs, Sparksee also offersuser-defined indices on attributes. It uses a bitmap in-dex implemented using sorted B-trees [32].

As we pointed out earlier, it is not possible to havea completely index-free RDF DMS. Thus, in order tograsp a better understanding of query execution per-formance with respect to various factors (such as in-dexing schemes, query typology and cache configura-tion) and also for the sake of fairness (towards GraphDMSs) we run all the experiments with two settingsof Graph DMSs, i.e. with (i.e. manually created) andwithout indices.

6.5. Results

The detailed results including the queries, datasetstatistics, plots and full configuration settings can beobtained from here30. The complete source code ofGremlinator is made publicly available, along witha recorded demonstration of Gremlinator in action,which can be accessed here31. The complete setupincluding all the datasets, scripts, and DMSs canbe found here32. The average time for translating aSPARQL query to Gremlin traversal is 14 ms forBSBM and 12.5 ms for Northwind queries respec-tively.

Figures 8 and 9, presents the plots of our experi-mental results, in all four settings, for the BSBM andNorthwind datasets respectively. The plots follow logscale for execution time (in ms). Furthermore, we alsoreport the detailed query-wise results in tabular for-mat in Appendix B, for more comprehensive under-

29Indexing in Neo4J (http://neo4j.com/docs/developer-manual/current/cypher/schema/index/)

30Detailed results can be found at (https://goo.gl/CSSVzZ)31Gremlinator source code (https://github.com/

LITMUS-Benchmark-Suite/sparql-to-gremlin)32Experimental setup (https://github.com/harsh9t/

SWJ-2018-Experiments)

Page 15: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

H. Thakkar et al. / Gremlinator 15

Figure 8. Performance comparison of SPARQL queries vs Gremlin (pattern matching) traversals for BSBM dataset with respect to RDFand Graph DMSs in different configuration settings.

Page 16: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

16 H. Thakkar et al. / Gremlinator

Figure 9. Performance comparison of SPARQL queries vs Gremlin (pattern matching) traversals for Northwind dataset with respect toRDF and Graph DMSs in different configuration settings.

Page 17: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

H. Thakkar et al. / Gremlinator 17

standing. We observe similar trend of performance ofSPARQL vs. Gremlin queries over both the datasets,which is evident from Figures 8 and 9 and also the Ta-bles 6 and 7 of Appendix B. Therefore, we present thedetailed performance analysis of SPARQL vs. Grem-lin only for the BSBM dataset. We organize our obser-vations on the performances of participating DMSs asfollows, and present our discussion.

Graph DMSs without index: We categorize ourfindings in two groups – cold cache and warm cache.We observe that for –

1. Cold cache: SPARQL queries report a compara-tive advantage with respect to Gremlin traversals,leveraging the advantage of indexing schemesof RDF DMSs. SPARQL performs moderatelyfaster (1x-2x) for simple queries (C1, C2) and or-der by (O1, O3); substantially faster (3x-5x) forunion and mixed queries (U1-3, M1-3). Whereas,Gremlin traversals benefit from only the graph lo-cality inherent to Graph DMSs. Gremlin traver-sals perform moderately faster (1x-2x) for restric-tion (L1, L3), group by (G1-3) and conditional(F1-3) queries; substantially faster (3x-5x) forgroup count (Gc1-3) and star (S1-3) queries. Wealso note that aggregation queries (counts, groupcounts) in Graph DMSs are an order of mag-nitude faster as compared to RDF DMSs sincethey do not have to execute multiple inner joinsin addition to the aggregation operations. More-over, for star-shaped queries (queries with bushyplans having >=5 triple patterns, >=1 filter and>=4 projection variables) Gremlin pattern match-ing traversals outperform their SPARQL counter-parts by at least an order of one magnitude for S1,S2 and at least an order of two magnitudes for S3(with 10 triple patterns, 1 filter and 9 projectionvariables).

2. Warm cache: SPARQL queries reap the mostbenefits of warm caching from RDF DMSs ascompared to the Gremlin traversals from GraphDMSs. We observe that on average, in this set-ting, the improvement is up to 1x-1.8x for star andmixed queries, 2x-3x for aggregation (counts),condition (filter) and re-ordering (order by, groupby) queries, and 3x-5x for CGPs and unionqueries. We also note that SPARQL queries arealmost an order of magnitude faster than the cor-responding Gremlin traversals for queries hav-ing a union operator, and are comparable formixed, CGPs, and order by queries. Further-

more, we also note that SPARQL star-shape basedqueries do not register substantial improvement inwarm cache execution. On the other hand, Grem-lin traversals receive little benefit, from GraphDMSs, in warm cache. We report that on aver-age, in this setting, the improvement is up to 1.3xfor aggregation (count, group count) and star-shaped queries; up to 1.5x for re-ordering (order-by, group-by) and condition (filters) queries; upto 2x for mixed, union and restriction (limit)queries.

Graph DMSs with indexing: We manually createdcomposite indices for each Graph DMS on attributessuch as "name", "customerId", "unitPrice","unitsInStock", "unitsOnOrder" for BSBMdataset. Similarly, on "type", "productID","reviewerID", "productTypeID" for North-wind dataset, on the node attributes (numeric) 33. Theindices use the hash-map data structure. We did not re-execute SPARQL queries on RDF DMSs, as there wasno change in the indexing setting for the same.

1. Cold cache: Gremlin traversals perform signifi-cantly faster when executed on Graph DMSs withcomposite indices. We observe that, as comparedto the previous (cold cache + without index) set-ting, the improvement reported on an average isup to 1x-2x for union, mixed and group by traver-sals; up to 2x-3x for re-ordering (group-by, order-by) traversals; up to 3x-5x for regular and restric-tion traversals; and >5x for aggregation and star-shaped traversals.

2. Warm cache: In this setting the Graph DMSs(i.e. Gremlin traversals) register similar perfor-mance gains to that in non-indexed configuration.

6.6. Discussion.

We now discuss the findings of our experiments withrespect to the factors which influence the query execu-tion performance of a particular DMS and summarizeour observations. We categorize our findings based onthe following factors:

– Query typology: We report that for – (i) sim-ple/linear queries (such as C1-3, F1-3, L1-3) bothSPARQL and Gremlin traversal performancesare comparable; (ii) SPARQL outperforms cor-responding Gremlin traversals for union queries.

33We have provided all the groovy scripts used for creating com-posite indices in the Github repository pointed earlier

Page 18: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

18 H. Thakkar et al. / Gremlinator

This is so because in SPARQL a union oc-curs between two or more sets of triple pat-terns. Whereas in the declarative construct (pat-tern matching) of Gremlin, a union occurs be-tween two .match()-steps (i.e. Gremlin treatseach .match()-step as a distinct traversal andthen executes a union on top of it); (iii) Whereas,for complex queries (such as star-shaped and ag-gregation based queries), Gremlin traversals out-perform their SPARQL counterparts. As men-tioned before (ref. 6.5 – cold cache section), thisis because Graph DMSs do not have to performexpensive joins (like RDF DMSs) on top of exe-cuting aggregation operations. (iv) Lastly, we alsoobserve that for queries with greater number ofprojection variables (Proj. vars >= 3) and querymodifiers (count, distinct, limit + offset, filter),Gremlin traversals show a distinctive advantage(more than an order of magnitude) in terms of per-formance with respect to corresponding SPARQLqueries (e.g. for F1, F2, O2, S1, S2, S3). This ad-vantage, while still exists, is not as pronouncedwhen comparing queries with a fewer number ofprojection variables and query modifiers.

– Query caching – Cold vs Warm: Despite thefact that both DMSs benefit from warm cachequery execution (as compared to cold cache),SPARQL queries receive the most advantage ascompared to corresponding Gremlin traversals.One reason for this is that Gremlin traversalsperform considerately better (except in cases ofunion queries) by leveraging the advantage ofunderlying property graph data model (locality)and cannot be optimized further without explic-itly creating regular or composite indices. Out ofall the three RDF DMSs, Jena shows the mostgain in warm execution time, which receives up to5x boost in cases such as union and CGP queries.

– Indexing scheme: It does not go without notic-ing, the one-sided dominance of Openlink Virtu-oso, amongst all the evaluated RDF DMSs. Asmentioned earlier, Virtuoso maintains a variety offull and partial indices. Moreover, we also knowthat virtuoso employs custom partition cluster-ing and caching schemes on top of these indicesto provide an adaptable solution to all kinds ofworkloads. One distinctive advantage in virtu-oso is that the indices are column-wise by de-

fault34, which takes one-third amount of space ascompared to the row-wise indices. On the con-trary, similar claims cannot be made about otherRDF DMSs such as 4Store and JenaTDB. GraphDMSs, have a limited number options in termsof underlying indexing data structures implemen-tation for creating manual indexes in the chosenversion. One reason can be deduced that there hasnot been an explicit need for using complex indexschemes (as in Virtuoso), since composite indicesbased on B+ trees and hash-maps provide suffi-cient performance boost for graph traversal oper-ations.

Thus, based on our findings, we summarize thatfor complex queries (such as aggregation, star-shaped,and queries with higher number of projection vari-ables + query modifiers) corresponding Gremlin pat-tern matching traversals outperform SPARQL queries.Whereas, for union-based queries SPARQL registersignificant performance advantage.

6.7. Hands-On Gremlinator

For the demonstration of our approach – Gremli-nator, we provide the entire setup including both thedatasets and the entire set of pre-defined SPARQLqueries for interested users to get a first hand experi-ence [33]. Furthermore, we encourage the end user towrite and execute custom SPARQL queries for boththe datasets, for further exploration. As a part of thedemonstration of our system [33], we provide– (i) anonline screencast35 for an introductory video tutorialon how to use the demonstration (ii) a web applica-tion accessible at36 (iii) a desktop application of Grem-linator (standalone .jar bundle) which requires Java1.8 JRE installed on the corresponding host machine,downloadable from the web demo website.

7. Conclusion and Future Work

In this paper, we presented Gremlinator, a novelapproach for supporting the execution of SPARQLqueries on property graphs using Gremlin patternmatching traversals. Furthermore, we presented an

34Indexing scheme in Openlink Virtuoso (http://docs.openlinksw.com/virtuoso/rdfperfrdfscheme/)

35Gremlinator Demo Tutorial – https://youtu.be/Z0ETx2IBamw36Gremlinator Web Demo – http://gremlinator.iai.uni-bonn.de:

8080/Demo

Page 19: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

H. Thakkar et al. / Gremlinator 19

empirical evaluation of our approach using state-of-the-art RDF and Graph DMSs, demonstrating the va-lidity and applicability of our approach. The evalu-ation demonstrates the substantial performance gainobtained by translating SPARQL queries to Grem-lin traversals, especially for star-shaped and complexqueries. Gremlinator has obtained clearance by theApache Tinkerpop development team and is currentlyin production phase to be released as a plugin duringTinkerPop’s next framework cycle. Gremlinator hasalso been integrated into the SANSA Stack [34] (v0.3)framework as an experimental plugin. Furthermore,Gremlinator is freely available under the Apache 2.0license for public use from the Maven Central reposi-tory.

As future work, we aim to – (i) extend our cur-rent work by enabling support for SPARQL 1.1 fea-tureset, such as Property Paths, regex in restrictions(i.e. FILTERs) and variables for property predicates;(ii) integrate Gremlinator within frameworks suchas LITMUS [35–37], to enable automatic executionof SPARQL queries over property graphs for robustbenchmarking diverse RDF and Graph DMSs.

AcknowledgementsThis work is supported by the funding received from

EU-H2020 WDAqua ITN (GA. 642795). We wouldlike to thank Dr. Marko Rodriguez, Mr. Stephen Mal-lette, and Mr. Daniel Kuppitz, of the Apache Tinker-Pop project, for their support and quality insights forintegrating Gremlinator.

Appendix A. SPARQL - Gremlin Results

In this section we demonstrate the correctness ofGremlinator empirically, as already discussed in Sec-tion 6.3. We present a subset of the results in Table 5,which validate our claim that the proposed SPARQL→ Gremlin translation is correct.

Appendix B. SPARQL - Gremlin PerformanceComparison

In this section, we present the query-wise detailedresults in tabular format of the same plots reported pre-viously in Figures 8 and 9.

Page 20: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

20 H. Thakkar et al. / Gremlinator

Q.# SPARQL Query Feature SPARQL Query Result Gremlin Traversal Result

C1 SELECT (COUNT (DISTINCT (?product)) as ?total)WHERE { ?a v:type "review" . ?a e:edge ?product . }

BGP 2787 2787

F3 SELECT DISTINCT ?pid WHERE { ?a v:productID?pid . ?a v:ProductPropertyNumeric_1 ?property1 .FILTER ( ?property1 = 1 ) }

FILTER ?pid bsbm:inst/Product1636 bsbm:inst/Product2295 { pid=1636 } { pid=2295 }

L2 SELECT ?rating1 WHERE { ?a v:type "review" .?a v:Rating_1 ?rating. ?a e:edge ?product. ?productv:productID ?pid . FILTER ( ?pid = 343 ) .} LIMIT 2

LIMIT ?rating1 9 7 { rating1=9 } { rating1=7 }

G2 SELECT ?product WHERE { ?a v:type "reviewer". ?a v:reviewerID ?rid. ?a e:edge ?review . ?reviewv:Rating_1 ?rating1. ?review e:edge ?product. ?productv:productID ?pid. FILTER ( ?rid = 86). } GROUP BY(?rating1)

GROUPBY

?product bsbm:inst/Product1107 bsbm:inst/Product1301bsbm:inst//Product1852 bsbm:inst/Product2291bsbm:inst/Product1098 bsbm:inst/Product1954bsbm:inst/Product1994 bsbm:inst/Product1355bsbm:inst/Product734 bsbm:inst/Product1448bsbm:inst/Product1426 bsbm:inst/Product1817bsbm:inst/Product1141 bsbm:inst/Product1194bsbm:inst/Product451 bsbm:inst/Product1294bsbm:inst/Product1532

{ product=1107 } { product=1301 } { product=1852 } {product=2291 } { product=1098 } { product=1954 } { prod-uct=1994 } { product=1355 } { product=734 } { product=1448} { product=1426 } { product=1817 } { product=1141 } {product=1194 } { product=451 } { product=1294 } { prod-uct=1532 }

Gc2 SELECT ?product (COUNT (?review) as ?total)WHERE { ?review v:type "review" . ?review e:edge?product . ?product v:productID ?pid. } GROUP BY(?product) LIMIT 10

GROUPCOUNT

?product ?total bsbm:inst/Product2588 1 bsbm:inst/Product31 bsbm:inst/Product2331 2 bsbm:inst/Product25533 bsbm:inst/Product1803 5 bsbm:inst/Product24407 bsbm:inst/Product2201 5 bsbm:inst/Product316 3bsbm:inst/Product2210 7

{Product=2588, Total=1} {Product=3, Total=1} {Prod-uct=2331, Total=2} {Product=2553, Total=3} { Prod-uct=1803,Total=5 } { Product=2440, Total=7 } { Prod-uct=2201, Total=5 } { Product=316, Total=3 } { Prod-uct=2210, Total=7 }

O2 SELECT DISTINCT ?product ?label WHERE {?a v:productTypeID ?tid. FILTER(?tid = 58). ?ae:edge ?product. ?product v:productID ?pid. ?productv:label_n ?label. } ORDER BY (?product) LIMIT 5

ORDERBY

product label bsbm:inst/Product11 "pipers pests"bsbm:inst/Product18 "boondogglers" bsbm:inst/Product489"airsickness simplices skiing" bsbm:inst/Product694 "nahuatlsterrifiers direr" bsbm:inst/Product709 "jacinth medusoids"

{pid=11, lab=pipers pests} {pid=18, lab=boondogglers}{pid=489, lab=airsickness simplices skiing} {pid=694,lab=nahuatls terrifiers direr} {pid=709, lab=jacinth medu-soids}

U1 SELECT ?label WHERE { { ?a v:productTypeID?tid. FILTER(?tid = 58). ?a e:edge ?product. ?productv:productID ?pid. ?product v:label_n ?label. }UNION{ ?a v:productTypeID ?tid. FILTER(?tid = 102). ?ae:edge ?product. ?product v:productID ?pid. ?productv:label_n ?label. }} LIMIT 10

UNION ?label "airsickness simplices skiing" "nahuatls terrifiers direr""jacinth medusoids" "slowed cloche" "meshwork" "nonradi-cal warehousing" "furnacing" "accommodator" "collectivizedmathematics" "brachiate writeoff"

{ label=airsickness simplices skiing } { label=nahuatls ter-rifiers direr } { label=jacinth medusoids } { label=slowedcloche } { label=meshwork } { label=nonradical } { la-bel=warehousing } { label=furnacing } { label=accommodator} { label=collectivized mathematics } { label=brachiate write-off }

Op1 SELECT ?pTex2 ?pText3 ?pNum2 WHERE{ ?product v:productID ?pid . FILTER ( ?pid= 343 ) . ?product rdfs:label ?label. ?productv:ProductPropertyTextual2 ?propertyTextual_2 . ?prod-uct v:ProductPropertyTextual3 ?propertyTextual_3 .OPTIONAL { ?product v:productID ?pid . FILTER( ?pid = 350 ) . ?product rdfs:label ?label. ?productv:ProductPropertyNumeric_2 ?propertyNumeric2 .?product v:ProductPropertyTextual3 ?propertyTex-tual_3 .}}

OPT. pText2 pText3 pNum2 ""cyanided uncharged gametes"" ""flu-orosis appeasing railheads criticizers satirizer controllers"" 758

{pText_2=cyanided uncharged gametes, pText_3=fluorosisappeasing railheads criticizers satirizer controllers,pNum2_2=758}

M1 SELECT ?reviewer (COUNT (?product) as ?total)WHERE { ?reviewer v:type "reviewer". ?reviewere:edge ?review. ?review e:edge ?product . } GROUP BY(?reviewer) ORDER BY DESC (?total) LIMIT 10

MIX bsbm:inst/Reviewer1294 42 bsbm:inst/Reviewer501 41bsbm:inst/Reviewer424 39 bsbm:inst/Reviewer281 38bsbm:inst/Reviewer1263 38

[1294:42, 501:41, 424:39, 281:38, 1263:38]

S1 SELECT ?plabel ?label ?flabel ?proptext1 ?proptext2?proptext3 ?propnum1 ?propnum2 ?comment WHERE{ ?producer v:type "producer". ?producer v:label_n?plabel. ?producer e:edge ?product. ?product v:type"product". ?product v:productID ?pid. FILTER(?pid =343). ?product v:label_n ?label. ?product v:comment?comment. ?product v:ProductPropertyTextual_1?proptext1. ?product v:ProductPropertyTextual_2?proptext2. ?product v:ProductPropertyTextual_3?proptext3. ?product v:ProductPropertyNumeric_1?propnum1. ?product v:ProductPropertyNumeric_2?propnum2. ?product e:edge ?pfeature. ?pfeature v:type"product_feature". ?pfeature v:label_n ?flabel. } LIMIT1

STAR ?label ?comment ?p ?f ?productFeature ?producer ?prop-ertyTextual1 ?propertyTextual2 ?propertyTextual3 ?proper-tyNumeric_1 ?propertyNumeric1_2 "ors" "sobbers kynurenicundergoing remained horsed sidings hutzpa continenceflighty japingly semiretired crispest chukkers bamboozlershivah lagged miggs snickering arbitrators propped os-mic mismeeting dissimulate fraudulently cabled yeller trun-cheons sigil expatriating viceless merrymakers fetas recom-penses disreputability taperer multiplexed toddler disaffili-ating radiating worshipper flamboyance waggly botheringswindlers eucharistical enserfing lightfaced tench trampingmargraves bewilderment deuteronomy contravened fourpennycoveralls traitorousness millpond redetermine jeremiad re-sealable abreaction marblers whisks" bsbm:inst/Producer8bsbm:inst/ProductFeature11 "entoiling" "assignat disrobe""housewifeliness neoliths proselytizers infirmable meditationsbedchair maschera hagfish saplings prearranges debacles be-dews straying grouter stereophonically" "cyanided unchargedgametes" "fluorosis appeasing railheads criticizers satirizercontrollers" 1165 1526

[ProductPropertyNumeric_1:[1165], productID:[343], Pro-ductPropertyTextual_1:[cyanided uncharged gametes],ProductPropertyNumeric_2:[1526], ProductPropertyTex-tual_2:[fluorosis appeasing railheads criticizers satirizercontrollers], label_n:[ors], comment:[sobbers kynurenicundergoing remained horsed sidings hutzpa continenceflighty japingly semiretired crispest chukkers bamboozlershivah lagged miggs snickering arbitrators propped osmicmismeeting dissimulate fraudulently cabled yeller truncheonssigil expatriating viceless merrymakers fetas recompensesdisreputability taperer multiplexed toddler disaffiliating ra-diating worshipper flamboyance waggly bothering swindlerseucharistical enserfing lightfaced tench tramping margravesbewilderment deuteronomy contravened fourpenny coverallstraitorousness millpond redetermine jeremiad resealableabreaction marblers whisks],type:[product]],label:hedgehogsbarstools,label_prod:assignat disrobe

Table 5Comparison of results of a subset of SPARQL queries and their corresponding Gremlin traversals for BSBM dataset. The complete listof all the queries and their corresponding results can be accessed from the spreadsheet available at (https://goo.gl/CSSVzZ).

Page 21: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

H. Thakkar et al. / Gremlinator 21

Que

ryG

rem

linTr

aver

salE

xecu

tion

Tim

e(m

s,w

ithou

tind

exes

)SP

AR

QL

Que

ryE

xecu

tion

Tim

e(m

s,w

ithin

dexe

s)G

rem

linTr

aver

salE

xecu

tion

Tim

e(m

s,w

ithin

dexe

s)Ti

nker

(c)

Tink

er(w

)N

eo4J

(c)

Neo

4J(w

)Sp

akse

e(c

)Sp

akse

e(w

)V

irtu

oso

(c)

Vir

tuos

o(w

)Je

na(c

)Je

na(w

)4s

tore

(c)

4sto

re(w

)Ti

nker

(c)

Tink

er(w

)N

eo4J

(c)

Neo

4J(w

)Sp

akse

e(c

)Sp

akse

e(w

)C

122

0.12

136.

717

7.60

152.

7230

6.8

272.

6745

8.15

167.

2516

5216

1426

015

2.5

191.

610

7.69

157.

380

.05

253.

8218

4C

230

2.6

187.

327

2.67

231.

3634

0.84

224.

4861

.25

21.5

390

361.

2523

2.5

6523

8.1

100.

5221

8.46

97.1

427

9.01

203.

7C

317

.715

.418

.50

15.2

938

.86

13.8

127

170

.25

404.

6338

422

7.5

701.

50.

339.

432.

8629

.53.

68F1

16.3

215

.419

.72

16.4

032

.66

22.0

817

2.25

45.5

448

417.

527

5.25

72.5

17.8

315

.27

11.9

34.

7526

.23.

49F2

45.6

33.5

67.0

653

.17

102.

6173

.70

84.1

523

.75

945.

587

1.5

632

148.

251.

120.

934

.67

16.3

49.7

23.8

2F3

22.8

20.1

29.3

725

.81

44.9

325

.34

43.3

1581

2.75

904.

365

513

2.5

1.15

0.72

2.4

1.15

293.

77L

117

.616

34.7

333

.09

39.2

019

.97

14.4

57.

2537

9.15

369.

2524

572

.50.

246

0.16

93.

851.

7417

.85.

3L

210

7.17

56.5

771

.37

32.0

510

5.26

68.4

244

.521

.25

456.

7541

6.5

287.

555

.377

.949

.41

45.2

617

.98

43.8

21.5

2L

36.

846.

418

.80

14.7

79.

098.

538.

34.

7543

342

9.75

382.

2517

50.

790.

633.

251.

7212

.75

4.29

G1

42.5

728

.87

50.1

422

.49

48.1

143

.54

61.4

522

.581

1.34

544.

2146

5.19

132.

7518

.52

8.01

28.3

4.18

29.3

9.29

G2

19.1

516

.23

35.7

016

.58

20.7

618

.87

272.

369

.25

434.

338

3.75

212.

560

.34

18.5

16.5

18.2

3.98

11.6

73.

89G

326

0.47

149.

124

8.53

156.

5830

6.06

287.

5477

.25

50.5

916.

585

868

2.5

130

269.

516

3.3

133.

454

.52

212.

8217

4.25

Gc1

24.6

22.3

615

.85

11.2

729

.59

25.2

289

171.

2515

3514

5042

530

50.

330.

268

4.92

2.08

7.92

3.05

Gc2

23.4

721

.36

21.4

917

.09

27.0

424

.85

110.

394

.25

2809

.75

2717

.327

27.5

1110

.25

0.73

70.

679

6.07

2.34

5.88

3G

c323

.98

21.6

19.2

715

.62

28.7

825

.01

102

3010

6310

45.7

528

7.5

120.

750.

762

0.66

75.

192.

256.

312.

79O

132

8.57

192.

7723

8.3

173.

9546

1.59

402.

6418

3.5

164.

712

34.2

512

0020

7.45

130.

529

4.68

193.

4421

0.2

77.7

533

1.9

227.

24O

220

17.3

32.9

325

.32

56.6

435

.68

44.5

15.3

530

403.

2527

887

.54.

562.

638.

123.

9626

.73.

39O

352

5.81

357.

9848

9.63

369.

261

2.04

483.

5160

451

926

45.5

2109

650.

932

1.65

205.

115

8.18

196.

8715

3.76

338.

7417

9.82

U1

45.7

223

.17

53.1

548

.611

1.62

58.7

844

.89

13.2

549

8.5

504

207.

590

18.9

89.

4228

.51

13.4

947

.515

.59

U2

373.

9520

3.61

402.

4037

8.1

503

286.

2434

10.7

544

4.5

403

190

5525

7.5

173.

215

9.01

68.6

330

7.53

152.

7U

327

8.89

173.

0132

8.60

287.

2956

5.19

329.

115

1.4

40.7

511

1111

6950

7.5

190

218.

7315

3.9

141.

5247

.14

224.

172

.67

M1

1221

.776

5.7

246.

2511

6.57

453.

8236

7.93

74.1

558

.534

45.1

534

17.5

1535

498

899

618.

2512

6.33

75.7

922

6.23

130.

7M

280

6.1

537.

428

3.76

161.

4332

7.01

276.

0412

0.6

98.2

535

13.7

534

18.2

528

57.5

1412

.565

7.6

451

143.

6187

.81

139.

655

.84

M3

551.

630

9.32

161.

7281

.63

261.

9216

8.91

8475

.533

84.2

533

6710

8034

340

2.7

235.

966

.48

34.3

610

2.35

39.3

7S1

20.7

515

.525

.38

16.7

238

.83

32.7

938

8.15

108.

559

363

025

370

1.55

0.79

2.54

1.97

12.9

13.

94S2

17.5

15.4

21.1

614

.826

.45

21.8

933

289

.544

241

320

2.5

801.

450.

822.

391.

7110

.18

3.42

S334

.02

26.6

540

.75

31.5

452

.57

40.2

4064

140

329

2266

321

180

2175

1730

7.02

4.46

15.9

28.

0231

.26.

84

Tabl

e6

Perf

orm

ance

com

pari

son

ofSP

AR

QL

quer

ies

vstr

ansl

ated

Gre

mlin

trav

ersa

lsov

erB

SBM

data

set.

Page 22: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

22 H. Thakkar et al. / Gremlinator

Que

ryG

rem

linTr

aver

salE

xecu

tion

Tim

e(m

s,w

ithou

tind

exes

)SP

AR

QL

Que

ryE

xecu

tion

Tim

e(m

s,w

ithin

dexe

s)G

rem

linTr

aver

salE

xecu

tion

Tim

e(m

s,w

ithin

dexe

s)Ti

nker

(c)

Tink

er(w

)N

eo4J

(c)

Neo

4J(w

)Sp

akse

e(c

)Sp

akse

e(w

)V

irtu

oso

(c)

Vir

tuos

o(w

)Je

na(c

)Je

na(w

)4s

tore

(c)

4sto

re(w

)Ti

nker

(c)

Tink

er(w

)N

eo4J

(c)

Neo

4J(w

)Sp

akse

e(c

)Sp

akse

e(w

)C

111

.49

5.45

17.7

13.3

23.8

913

.836

.25

3.75

364

374

298

937.

754.

2512

.87

6.17

14.4

6.64

C2

20.7

5711

.828

.321

.231

.13

15.7

252.

7539

538

929

080

11.5

26.

1418

.45.

2623

.910

.52

C3

10.7

65.

8418

.912

.624

11.4

524.

2539

939

231

890

7.5

4.25

12.3

84.

5815

.17

7.1

F10.

770.

83.

492.

74.

452.

412

4.33

363

369

273

115

0.53

0.29

2.16

1.03

2.45

1.32

F215

.45

8.47

24.5

12.0

934

.317

.62

62

370

427

270

124

9.6

6.25

19.5

29.

2624

.411

.08

F32.

41.

246.

685.

4111

.28

9.7

52.

6547

037

325

211

51

0.72

8.15

3.38

7.6

3.46

L1

1.5

0.73

4.49

3.6

5.3

4.09

7.75

543

934

928

011

20.

550.

43.

011.

394.

52.

16L

22.

960.

477.

434.

64.

752.

812

2.5

364

364

295

113

0.39

0.38

4.12

1.83

4.57

2.25

L3

17.0

18.

9829

.118

.42

37.5

24.2

42.

541

037

827

910

811

.43

6.42

21.6

88.

0927

.115

.32

G1

62.

1512

.16

8.81

4.39

3.2

6418

.25

433

461

245

103

1.46

1.11

3.22

1.26

3.04

2.27

G2

5.14

1.8

10.6

5.5

3.62

2.7

279.

7534

836

928

592

2.29

0.9

3.72

2.35

2.8

2.19

G3

6.32

1.5

13.2

79.

764.

863.

82.

752

471

476

328

157

1.34

0.85

5.25

3.99

3.95

3.04

Gc1

10.2

5.45

17.3

13.6

15.4

99.

529

6.33

440

467

247

887.

253.

910

.83

7.05

9.66

4.6

Gc2

2514

.09

38.1

23.9

36.6

15.6

76

2.75

385

354

250

105

15.9

29.

2523

.62

9.84

18.9

7.01

Gc3

0.78

30.

686.

893.

74.

613.

847

.75

4.25

753

775

248

118

0.52

0.47

3.42

3.33

3.52

3.43

O1

18.2

512

.23

2817

.440

.625

.34.

52.

2544

648

326

712

512

.32

7.5

19.1

8.44

27.4

612

.2O

228

17.4

39.2

2354

.327

.81

208.

2532

135

427

095

19.2

11.9

322

.36.

8535

.49

13.8

O3

18.4

510

.44

28.3

18.0

242

.78

24.6

43.

752.

2554

557

330

012

010

.65

6.6

18.4

5.41

28.2

711

.3U

167

.25

36.7

570

.646

.25

81.7

239

.78

27.7

53.

7537

736

824

011

543

.75

24.1

135

.64

15.4

645

.34

28.7

6U

268

.535

.45

89.8

42.6

110

7.76

48.2

816

.52.

7552

953

625

2.5

130

4023

.25

53.2

129

.71

69.2

33.7

1U

375

.542

.698

.554

.415

7.6

62.9

131

.54.

534

643

426

011

2.5

44.5

23.5

865

.31

28.3

312

0.49

73.7

9M

122

.45

12.4

37.8

23.4

49.1

424

.512

6.25

457

473

325

140

12.8

76.

918

.09

7.97

24.1

13.8

M2

2.56

1.65

6.94

3.67

51.2

26.3

83

373

423

301

118

1.6

1.13

6.02

3.26

28.6

15.3

6M

32.

51.

927

4.16

50.8

25.4

3514

.556

354

729

811

01.

81.

254.

073.

3827

.815

.59

S135

.828

.442

.620

.935

.615

.917

667

.545

137

227

813

532

.517

.326

.47

12.1

521

.410

.63

S27.

855.

0311

.85.

321

.410

.850

24.7

532

836

925

398

4.09

3.25

2.25

2.02

18.9

8.51

S34.

231.

426.

93.

817

.68.

390

282

436

240

625

080

1.25

0.82

9.15

5.01

11.9

6.02

Tabl

e7

Perf

orm

ance

com

pari

son

ofSP

AR

QL

quer

ies

vstr

ansl

ated

Gre

mlin

trav

ersa

lsov

erN

orth

win

dda

tase

t.

Page 23: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

H. Thakkar et al. / Gremlinator 23

References

[1] M.A. Rodriguez, The Gremlin graph traversal machine andlanguage (invited talk), in: Proceedings of the 15th Symposiumon Database Programming Languages, Pittsburgh, PA, USA,October 25-30, 2015, 2015.

[2] S. Das, J. Srinivasan, M. Perry, E.I. Chong and J. Banerjee, ATale of Two Graphs: Property Graphs as RDF in Oracle., in:EDBT, 2014.

[3] D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov,D. Lanti, M. Rezk, M. Rodriguez-Muro and G. Xiao, Ontop:Answering SPARQL queries over relational databases, Seman-tic Web 8(3) (2017), 471–487.

[4] M. Rodriguez-Muro and M. Rezk, Efficient SPARQL-to-SQLwith R2RML mappings, Web Semantics: Science, Services andAgents on the World Wide Web 33 (2015).

[5] B. Elliott, E. Cheng, C. Thomas-Ogbuji and Z.M. Ozsoyoglu,A complete translation from SPARQL into efficient SQL, in:Proceedings of the 2009 International Database Engineering& Applications Symposium, ACM, 2009.

[6] A. Chebotko, S. Lu and F. Fotouhi, Semantics preservingSPARQL-to-SQL translation, Data & Knowledge Engineering68(10) (2009).

[7] F. Zemke, Converting sparql to sql, Technical Report, Techni-cal Report, October 2006., 2006.

[8] F. Priyatna, O. Corcho and J. Sequeda, Formalisation and ex-periences of R2RML-based SPARQL to SQL query translationusing Morph, in: Proceedings of the 23rd international confer-ence on World wide web, ACM, 2014.

[9] J. Rachapalli, V. Khadilkar, M. Kantarcioglu and B. Thu-raisingham, RETRO: A Framework for Semantics PreservingSQL-to-SPARQL Translation, The University of Texas at Dal-las 800 (2011).

[10] R. Angles, M. Arenas, P. Barceló, A. Hogan, J.L. Reutter andD. Vrgoc, Foundations of Modern Graph Query Languages,CoRR abs/1610.06264 (2016).

[11] M.A. Rodriguez and P. Neubauer, A path algebra for multi-relational graphs, in: Workshops Proceedings of the 27th Inter-national Conference on Data Engineering, ICDE 2011, 2011.

[12] H. Thakkar, D. Punjani, M.-E. Vidal and S. Auer, Towardsan Integrated Graph Algebra for Graph Pattern Matching withGremlin, in: Proceedings of the 28th International Conference,DEXA 2017, Lyon, France, August 28-31, 2017, Proceedings,Part I, Springer, 2017, pp. 81–91.

[13] W.W.W. Consortium et al., RDF 1.1 concepts and abstract syn-tax, WC3 Archive (2014).

[14] R. Cyganiak, A relational algebra for SPARQL, Digital MediaSystems Laboratory HP Laboratories Bristol. HPL-2005-170(2005).

[15] J. Pérez, M. Arenas and C. Gutierrez, Semantics and Complex-ity of SPARQL, in: International semantic web conference,Springer, 2006.

[16] J.L. Reutter, Graph Patterns: Structure, Query Answering andApplications in Schema Mappings and Formal Language The-ory, Edinburgh Research Archive (2013).

[17] V. Nguyen, J. Leeka, O. Bodenreider et al., A Formal GraphModel for RDF and Its Implementation, CoRR abs/1606.00480(2016).

[18] A. Gubichev and M. Then, Graph Pattern Matching - Do WeHave to Reinvent the Wheel?, in: Second International Work-shop on Graph Data Management Experiences and Systems,GRADES 2014, co-loated with SIGMOD/PODS 2014, Snow-bird, Utah, USA, June 22, 2014, 2014.

[19] C. Krause, D. Johannsen, R. Deeb et al., An SQL-based querylanguage and engine for graph pattern matching, in: Interna-tional Conference on Graph Transformation, Springer, 2016.

[20] M.A. Rodriguez and P. Neubauer, The Graph Traversal Pattern,in: Graph Data Management: Techniques and Applications.,IGI Global, 2011.

[21] E. Prud, A. Seaborne et al., SPARQL query language for RDF,Citeulike Online Archive (2006).

[22] M. Schmidt, M. Meier and G. Lausen, Foundations ofSPARQL query optimization, in: Proceedings of the 13th In-ternational Conference on Database Theory, ACM, 2010.

[23] R. Angles and C. Gutierrez, The Multiset Semantics ofSPARQL Patterns, in: The Semantic Web - ISWC 2016 - 15thInternational Semantic Web Conference, Kobe, Japan, October17-21, 2016, Proceedings, Part I, 2016.

[24] R. Angles and C. Gutierrez, The expressive power of SPARQL,in: International Semantic Web Conference, Springer, 2008.

[25] S. Harris, A. Seaborne and E. Prud’hommeaux, SPARQL 1.1query language, W3C recommendation 21(10) (2013).

[26] G.S.J. Marton, Formalizing openCypher Graph Queries in Re-lational Algebra, Published online on FTSRG archive (2017).

[27] J. Hölsch and M. Grossniklaus, An Algebra and Equivalencesto Transform Graph Patterns in Neo4j, in: EDBT/ICDT 2016Workshops: EDBT Workshop on Querying Graph StructuredData (GraphQ), 2016.

[28] C. Bizer and A. Schultz, The berlin sparql benchmark, 2009.[29] O. Erling, Virtuoso, a Hybrid RDBMS/Graph Column Store.,

IEEE Data Eng. Bull. 35(1) (2012).[30] S. Harris, N. Lamb and N. Shadbolt, 4store: The design and

implementation of a clustered RDF store, in: 5th InternationalWorkshop on Scalable Semantic Web Knowledge Base Systems(SSWS2009), 2009, pp. 94–109.

[31] A.T.P. Home, Apache tinkerpop home, Web Page (2016).[32] N. Martinez-Bazan, S. Gomez-Villamor and F. Escale-

Claveras, DEX: A high-performance graph database manage-ment system, in: Data Engineering Workshops (ICDEW), 2011IEEE 27th International Conference on, IEEE, 2011, pp. 124–127.

[33] H. Thakkar, D. Punjani, J. Lehmann and S. Auer, KillingTwo Birds with One Stone – Querying Property Graphs us-ing SPARQL via GREMLINATOR, CoRR abs/1801.09556(2018).

[34] J. Lehmann, G. Sejdiu, L. Bühmann, P. Westphal, C. Stadler,I. Ermilov, S. Bin, N. Chakraborty, M. Saleem and A.-C.N. Ngomo, Distributed Semantic Analytics using theSANSA Stack, in: Proceedings of the 16th International Se-mantic Web Conference (ISWC), Springer, 2017, pp. 147–155.

[35] H. Thakkar, Towards an Open Extensible Framework for Em-pirical Benchmarking of Data Management Solutions: LIT-MUS, in: The Semantic Web - 14th International Conference,ESWC 2017, Portorož, Slovenia, May 28 - June 1, 2017, Pro-ceedings, Part II, 2017, pp. 256–266.

[36] H. Thakkar, Y. Keswani, M. Dubey, J. Lehmann and S. Auer,Trying Not to Die Benchmarking–Orchestrating RDF andGraph Data Management Solution Benchmarks Using LIT-MUS (2017).

Page 24: Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves ... · Semantic Web 0 (0) 1 1 IOS Press A Stitch in Time Saves Nine – SPARQL Querying of Property Graphs using Gremlin Traversals

24 H. Thakkar et al. / Gremlinator

[37] Y. Keswani, H. Thakkar, M. Dubey, J. Lehmann and S. Auer,The LITMUS Test: Benchmarking RDF and Graph Data Man-agement Systems.

[38] A. Gubichev, Query Processing and Optimization in GraphDatabases, PhD thesis, München, Technische UniversitätMünchen, Diss., 2015, 2015.

[39] R. Angles, A comparison of current graph database models,in: Data Engineering Workshops (ICDEW), 2012 IEEE 28thInternational Conference on, IEEE, 2012.

[40] R. Nambiar, N. Wakou, F. Carman et al., Transaction Process-ing Performance Council (TPC): State of the Council 2010,in: Performance Evaluation, Measurement and Characteriza-tion of Complex Systems: Second TPC Technology Conference,TPCTC 2010, Revised Selected Papers, Springer Berlin Hei-delberg, 2011. ISBN 978-3-642-18206-8.

[41] M. Saleem, Y. Khan, A. Hasnain, I. Ermilov and A.N. Ngomo,A fine-grained evaluation of SPARQL endpoint federation sys-tems, Semantic Web 7 (2015).

[42] G. Tsatsaronis, G. Balikas, P. Malakasiotis et al., An overviewof the BIOASQ large-scale biomedical semantic indexingand question answering competition, BMC Bioinformatics 16(2015).

[43] R. Usbeck, M. Röder, A.N. Ngomo et al., GERBIL: Gen-eral Entity Annotator Benchmarking Framework, in: Proceed-ings of the 24th International Conference on World Wide Web,WWW 2015, 2015.

[44] C. Unger, C. Forascu, V. Lopez et al., Question Answeringover Linked Data (QALD-5), in: Working Notes of CLEF 2015,Toulouse, France, 2015.

[45] R. Angles, P.A. Boncz, J. Larriba-Pey et al., The linked databenchmark council: a graph and RDF industry benchmarkingeffort, SIGMOD Record (2014).

[46] X. Zhang and J. Van den Bussche, On the power of SPARQLin expressing navigational queries, The Computer Journal 58(2015).

[47] D. Dominguez-Sal, P. Urbón-Bayes, A. Giménez-Vañó et al.,Survey of Graph Database Performance on the HPC ScalableGraph Analysis Benchmark, in: Proceedings of the 2010 In-ternational Conference on Web-age Information Management,WAIM’10, Springer-Verlag, 2010. ISBN 3-642-16719-5, 978-3-642-16719-5.

[48] R. Usbeck, M. Röder, A.-C. Ngonga Ngomo et al., GERBIL:General Entity Annotator Benchmarking Framework, in: Pro-

ceedings of the 24th International Conference on World WideWeb, WWW ’15, ACM, 2015. ISBN 978-1-4503-3469-3.

[49] M. Morsey, J. Lehmann, S. Auer et al., DBpedia SPARQLBenchmark – Performance Assessment with Real Queries onReal Data, in: The Semantic Web – ISWC 2011: 10th Interna-tional Semantic Web Conference, Proceedings, Part I, SpringerBerlin Heidelberg, 2011. ISBN 978-3-642-25073-6.

[50] R.C. Murphy, K.B. Wheeler, B.W. Barrett and J.A. Ang, Intro-ducing the GRAPH 500, Cray User’s Group (CUG) (2010).

[51] M. Dayarathna and T. Suzumura, XGDBench: A benchmark-ing platform for graph stores in exascale clouds., in: Cloud-Com, IEEE Computer Society, 2012. ISBN 978-1-4673-4511-8.

[52] C. Bizer and A. Schultz, Benchmarking the performance ofstorage systems that expose SPARQL endpoints, World WideWeb Internet And Web Information Systems (2008).

[53] A.-C.N. Ngomo and M. Röder, HOBBIT: Holistic Benchmark-ing for Big Linked Data, ERCIM News 2016 (2016).

[54] G. Aluç, O. Hartig, M.T. Özsu et al., Diversified stress testingof RDF data management systems, in: International SemanticWeb Conference, Springer, 2014.

[55] A. Flores, G. Palma, M.-E. Vidal et al., GRAPHIUM: visualiz-ing performance of graph and RDF engines on linked data, in:Proceedings of the 2013th International Conference on Posters& Demonstrations Track-Volume 1035, CEUR-WS. org, 2013.

[56] C. Bizer and A. Schultz, The Berlin SPARQL Benchmark, Int.J. of Semantic Web Inf. Syst. 5 (2009).

[57] Y. Guo, Z. Pan and J. Heflin, LUBM: A Benchmark for OWLKnowledge Base Systems, Web Semant. 3 (2005).

[58] M. Schmidt, T. Hornung, M. Meier et al., SP2Bench: ASPARQL Performance Benchmark., in: Semantic Web Infor-mation Management, Springer, 2009. ISBN 978-3-642-04328-4.

[59] H. Thakkar, M. Dubey, G. Sejdiu et al., LITMUS: An OpenExtensible Framework for Benchmarking RDF Data Manage-ment Solutions, CoRR abs/1608.02800 (2016).

[60] M.A. Rodriguez and J.H. Watkins, Quantum Walks with Grem-lin, CoRR abs/1511.06278 (2015).

[61] O. Hartig, Reconciliation of RDF* and Property Graphs, CoRRabs/1409.3288 (2014).