geographica: a benchmark for geospatial rdf stores

Post on 10-May-2015

735 Views

Category:

Education

3 Downloads

Preview:

Click to see full reader

DESCRIPTION

Geospatial extensions of SPARQL like GeoSPARQL and stSPARQL have recently been defined and corresponding geospatial RDF stores have been implemented. However, there is no widely used benchmark for evaluating geospatial RDF stores which takes into account recent advances to the state of the art in this area. In this paper, we develop a benchmark, called Geographica, which uses both real-world and synthetic data to test the o ffered functionality and the performance of some prominent geospatial RDF stores.

TRANSCRIPT

Benchmarking Geospatial RDF Stores

George Garbis, Kostis Kyzirakos, Manolis Koubarakis

Dept. of Informatics and Telecommunications, National and Kapodistrian University of Athens, Greece

1st International Workshop on Benchmarking RDF Systems (BeRSys 2013)

25/26/2013

Outline

• Motivation

• SPARQL extensions for querying geospatial data expressed in RDF

• State-of-the-art geospatial RDF stores

• Related benchmarks

• The benchmark Geographica

• Evaluating the performance of geospatial RDF stores using Geographica

• Conclusions

4

Geospatial data on the Web

5

Open Government Data

6

Linked geospatial data – Ordnance Survey

7

Linked geospatial data – Open Street Map

8

Linked geospatial data – http://www.linkedopendata.gr

9

Linked geospatial data – Wildfire Monitoring

105/26/2013

Outline

• Motivation

• SPARQL extensions for querying geospatial data expressed in RDF

• State-of-the-art geospatial RDF stores

• Related benchmarks

• The benchmark Geographica

• Evaluating the performance of geospatial RDF stores using Geographica

• Conclusions

Our Contributions

• The data model stRDF and the query language stSPARQL

• The system Strabon

11

[ ISWC 2012 ]

The Data Model stRDF

• stRDF stands for spatiotemporal RDF.

• It is an extension of the W3C standard RDF for the representation of geospatial data that may change over time.

• stRDF extends RDF with:

• Spatial literals encoded in OGC standards Well-Known Text or GML

• New datatypes for spatial literals (strdf:WKT, strdf:GML and strdf:geometry)

• Valid time of triples (ignored in this talk) [ ESWC 2013 ]

stRDF: An example

stRDF: An example (WKT)

ex:BurntArea1 rdf:type noa:BurntArea.

ex:BurntArea1 noa:hasID "1"^^xsd:decimal.

ex:BurntArea1 noa:hasArea "23.7636"^^xsd:double.

ex:BurntArea1 strdf:hasGeometry "POLYGON(( 38.16 23.7, 38.18 23.7, 38.18 23.8, 38.16 23.8, 38.16 23.7)); <http://spatialreference.org/ref/epsg/4121/>"^^strdf:WKT .

Spatial Literal (OpenGIS Simple Features)

Spatial Data Type Well-Known Text

stRDF: An example (GML)

ex:BurntArea1 rdf:type noa:BurntArea.

ex:BurntArea1 noa:hasID "1"^^xsd:decimal.

ex:BurntArea1 noa:hasArea "23.7636"^^xsd:double.

ex:BurntArea1 strdf:hasGeometry

"<gml:Polygon

srsName='http://www.opengis.net/def/crs/EPSG/0/4121'>

<gml:outerBoundaryIs>

<gml:LinearRing>

<gml:coordinates>38.16,23.70 38.18,23.70 38.18,

23.80 38.16,23.80,38.16 23.70

</gml:coordinates>

</gml:LinearRing>

</gml:outerBoundaryIs>

</gml:Polygon>"^^strdf:GML .

Spatial Literal (GML Simple Features Profile)

Spatial Data Type GML

• Find all burnt forests close to a citySELECT ?BA ?BAGEO

WHERE {

?R rdf:type noa:Region .

?R strdf:hasGeometry ?RGEO .

?R noa:hasLandCover ?F .

?F rdfs:subClassOf clc:Forests .

?CITY rdf:type dbpedia:City .

?CITY strdf:hasGeometry ?CGEO .

?BA rdf:type noa:BurntArea .

?BA strdf:hasGeometry ?BAGEO .

FILTER(strdf:anyInteract(?RGEO,?BAGEO) &&

strdf:distance(?BAGEO,?CGEO) < 0.02) }

stSPARQL: An example

Spatial Functions(OGC Simple Feature Access)

stSPARQL: Geospatial SPARQL 1.1

• We start from SPARQL 1.1.

• We add a SPARQL extension function for each function defined in the OGC standard OpenGIS Simple Feature Access – Part 2: SQL option (ISO 19125) for adding geospatial data to relational DBMSs and SQL.

• We add appropriate geospatial extensions to SPARQL 1.1 Update language

stSPARQL (cont’d)

• Basic functions

• Get a property of a geometry (e.g., strdf:srid)

• Get the desired representation of a geometry (e.g., strdf:AsText)

• Test whether a certain condition holds (e.g., strdf:IsEmpty, strdf:IsSimple)

• Functions for testing topological spatial relationships (e.g., strdf:equals, strdf:intersects)

• OGC Simple Features Access, Egenhofer, RCC-8

• Spatial analysis functions

• Construct new geometric objects from existing geometric objects (e.g., strdf:buffer, strdf:intersection, strdf:convexHull)

• Spatial metric functions (e.g., strdf:distance, strdf:area)

• Spatial aggregate functions (e.g., strdf:union, strdf:extent)

stSPARQL (cont’d)

• SELECT clause

Construction of new geometries (e.g., strdf:buffer(?geo, 0.1))

Spatial aggregate functions (e.g., strdf:extent(?geo))

Metric functions (e.g., strdf:area(?geo))

• FILTER clause

Functions for testing topological spatial relationships between spatial terms (e.g., strdf:contains(?G1, strdf:union(?G2, ?G3)))

Numeric expressions involving spatial metric functions (e.g.,strdf:area(?G1)<=2*strdf:area(?G2)+1)

• HAVING clause

Spatial aggregate functions and spatial metric functions or functions testing for topological relationships between spatial terms (e.g., strdf:area(strdf:union(?geo))>1)

• Isolate the parts of the burnt areas that lie in coniferous forests.

SELECT ?burntArea (strdf:intersection(?baGeom, strdf:union(?fGeom)) AS ?burntForest) WHERE {

?burntArea rdf:type noa:BurntArea; noa:hasGeometry [ noa:hasSerialization ?baGeo ].

?forest rdf:type noa:Region; clc:hasLandCover noa:coniferousForest;

clc:hasGeometry [ clc:hasSerialization ?fGeom ].

FILTER(strdf:intersects(?baGeom,?fGeom)) } GROUP BY ?burntArea ?baGeom

stSPARQL: An example

GeoSPARQL

Core

Topology VocabularyExtension

- relation family

Geometry Extension - serialization - version

Geometry TopologyExtension

- serialization - version - relation family

Query RewriteExtension

- serialization - version - relation family

RDFS Entailment Extension

- serialization - version - relation family

Parameters• Serialization

• WKT• GML

• Relation Family• Simple

Features• RCC8• Egenhofer

System Language Index Geometries CRS support

Comments on Functionality

Strabon stSPARQL/ GeoSPARQL*

R-tree-over-GiST

WKT / GML support

Yes • OGC-SFA• Egenhofer• RCC-8

Parliament GeoSPARQL R-Tree WKT / GML support

Yes •OGC-SFA•Egenhofer•RCC-8

Brodt et al. (RDF-3X)

SPARQL R-Tree WKT support No OGC-SFA

Perry SPARQL-ST R-Tree GeoRSS GML Yes RCC8

AllegroGraph Extended SPARQL

Distribution sweeping technique

2D point geometries

Partial •Buffer•Bounding Box•Distance

OWLIM Extended SPARQL

Custom 2D point geometries (W3C Basic Geo Vocabulary)

No •Point-in-polygon•Buffer•Distance

Virtuoso SPARQL R-Tree 2D point geometries (in WKT)

Yes SQL/MM (subset)

uSeekM GeoSPARQL R-tree-over GiST

WKT support No OGC-SFA

235/26/2013

Outline

• Motivation

• SPARQL extensions for querying geospatial data expressed in RDF

• State-of-the-art geospatial RDF stores

• Related benchmarks

• The benchmark Geographica

• Evaluating the performance of geospatial RDF stores using Geographica

• Conclusions

245/26/2013

Related Work

• Benchmarks for SPARQL query processing:

• LUBM (JWS 2005)

• DBpedia SPARQL Benchmark (ISWC 2011)

• …

• Benchmarks for geospatial relational DBMS:

• Sequoia 2000 (SIGMOD 1993)

• VESPA (BNCOD 2000)

• Jackpine (ICDE 2011)

• Benchmarks for spatial indexing and query processing operations

• Benchmarks for geospatial RDF stores:

• A Benchmark for Spatial Semantic Web Systems (Kolas, SSWS 2008)

255/26/2013

The Benchmark Geographica

• Aim: measure the performance of today’s geospatial RDF stores

• Organized around two workloads:

• Real-world workload:

• Based on existing linked geospatial datasets and known application scenarios

• Synthetic workload:

• Measure performance in a controlled environment where we can play around with properties of the data and the queries.

• Γεωγραφικά: 17-volume geographical

encyclopedia by Στράβων (AD 17)

265/26/2013

Real-World Workload

• Datasets: Real-world datasets for the geographic area of Greece playing an important role in the LOD cloud or having complex geometries

• LinkedGeoData (LGD) for rivers and roads in Greece

• GeoNames for Greece

• DBpedia for Greece

• Greek Administrative Geography (GAG)

• CORINE land cover (CLC) for Greece

• Hotspots

275/26/2013

Real-World WorkloadDatasets

Dataset Size # of Triples

# of Points

# of Lines(max/min/avg points/line)

# of Polygons(max/min/avg

points/polygon)

GAG 33MB 4K - - 325 (15K/4/400)

CLC 401MB 630K - - 45K (5K/4/140)

LGD 29MB 150K - 12K (1.6K/2/21)

-

GeoNames 45MB 400K 22K - -

DBpedia 89MB 430K 8K - -

Hotspots 90MB 450K - - 37K (4/4/4)

user
Να κανατσεκάρω τα νούμερα στις παρενθέσεις.Μήπως να τα βάλω σε ξεχωριστή στήλη??

285/26/2013

Real-World WorkloadParts

• For this workload, Geographica has two parts (following Jackpine):

• Micro part: Tests primitive spatial functions offered by geospatial RDF stores

• Macro part: Simulates some typical application scenarios

295/26/2013

Real-World WorkloadMicro part

• 29 queries that consist of one or two triple patterns and a spatial function.

• Functions included:

• Spatial analysis: boundary, envelope, convex hull, buffer, area

• Topological: equals, intersects, overlaps, crosses, within, distance, disjoint

• As used in spatial selections and spatial joins

• Spatial aggregates: extent, union

• Functions are applied to many representative types of geometries .

305/26/2013

Example – spatial analysis

• Construct the boundary of all polygons of CLC

PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ( geof:boundary(?o1) as ?ret )

WHERE {

GRAPH <http://geographica.di.uoa.gr/dataset/clc>

{ ?s1 <http://geo.linkedopendata.gr/corine/ontology#asWKT> ?o1} }

315/26/2013

Example – spatial selection

• Find all points in Geonames that are contained in a given polygon.

PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?s1 ?o1

WHERE {

GRAPH <http://geographica.di.uoa.gr/dataset/geonames>

{ ?s1 <http://www.geonames.org/ontology#asWKT> ?o1 }

FILTER( geof:sfWithin(?o1, "GIVEN_POLYGON_IN_WKT"^^<http://www.opengis.net/ont/geosparql#wktLiteral>)). }

325/26/2013

Example – spatial join

• Find all pairs of GAG polygons that overlap

PREFIX geof: <http://www.opengis.net/def/function/geosparql/>

SELECT ?s1 ?s2

WHERE {

GRAPH <http://geographica.di.uoa.gr/dataset/gag>

{?s1 <http://geo.linkedopendata.gr/gag/ontology/asWKT> ?o1}

GRAPH <http://geographica.di.uoa.gr/dataset/clc>

{?s2 <http://geo.linkedopendata.gr/corine/ontology#asWKT> ?o2}

FILTER( geof:sfOverlaps(?o1, ?o2) )

}

Query Point Query Line Query Polygon

Point Within BufferDistance

WithinDisjoint

Line EqualsCrosses

IntersectsDisjoint

Polygon Intersects EqualsOverlaps

Real-World WorkloadMicro part

• Spatial Selections

• Spatial JoinsPoint Line Polygon

Point Equals Intersects IntersectsWithin

Line IntersectsWithinCrosses

Polygon WithinTouchesOverlaps

345/26/2013

Real-World WorkloadMacro part

• Reverse Geocoding: Attribute a street address and place to a given point.

• Queries:

• Find the closest populated place (from GeoNames)

• Find the closest street (from LGD)

355/26/2013

Real-World WorkloadMacro part

• Web Map Search and Browsing

• Queries:

• Find the co-ordinates of a given POI based on thematic criteria (from GeoNames)

• Find roads in a given bounding box around these co-ordinates (from LGD)

• Find other POI in a given bounding box around these co-ordinates (from LGD)

365/26/2013

Real-World WorkloadMacro part

• Rapid Mapping for Fire Monitoring: representative of typical rapid mapping tasks carried out by space agencies in the case of an emergency

375/26/2013

Real-World WorkloadMacro part

• Rapid Mapping for Fire Monitoring

• Queries:

• Simple tasks retrieving background mapping information:

• Find the land cover of areas inside a given bounding box (from CLC)

• Find primary roads inside a given bounding box (from LGD)

• Find capitals of prefectures inside a given bounding box (from GAG)

• (Often) more complex main mapping tasks:

• Find municipality boundaries inside a bounding box (from GAG)

• Find coniferous forests which are on fire (from CLC and Hotspots)

• Find road segments which may be damaged by fire (from LGD and Hotspots)

385/26/2013

Experimental EvaluationReal-world workload

• Geospatial RDF stores tested: Strabon, Parliament, uSeekM

• Machine: Intel Xeon E5620, 12MB L3 cache, 2.4GHz, 24GB RAM, 4 HDD with RAID-5

• Micro part:

• Metric: response time

• Run 3 times and compute the median

• Time out: 1 hour

• Run both on warm caches and cold caches

• Macro part:

• Run each scenario many times for one hour with warm caches

• Metric: Average time for a complete execution

395/26/2013

ResultsData Storage

• Strabon is the slowest given the PostGIS tables and indices it is building.

• uSeekM does much better using the Sesame native store

• Parliament slows down due to geo:asWKT subproperty inferencing

System Strabon uSeekM Parliament

Storage time 550 sec 214 sec 250 sec

405/26/2013

ResultsMicro part (cold caches)

415/26/2013

ResultsMacro part

Scenario Strabon uSeekM Parliament

Reverse Geocoding 65 sec 0.77 sec 2.6 sec

Map Search and Browsing

0.9 sec 0.6 sec 22.2 sec

Rapid Mapping for Fire Monitoring

207.4 sec - -

425/26/2013

Synthetic Workload

• Goal: Evaluate performance in a controlled environment where we can vary the thematic and spatial selectivity of queries

• Thematic selectivity: the fraction of the total geographic features of a dataset that satisfy the non-spatial part of a query

• Spatial selectivity: the fraction of the total geographic features of a dataset which satisfy the topological relation in the FILTER clause of a query

435/26/2013

Synthetic Workload

• Dataset: As in VESPA, the produced datasets are geographic features on a synthetic map:

• States in a country ( (n/3)2 )

• Land ownership (n2)

• Roads (n)

• POI (n2)

445/26/2013

Synthetic WorkloadOntology

• Based roughly on the ontology of OpenStreetMap and the GeoSPARQL vocabulary

• Tagging each feature with a key enables us to select a known fraction of features in a uniform way

455/26/2013

Synthetic WorkloadQuery template for spatial selections

SELECT ?s

WHERE {?s ns:hasGeometry ?g.?s c:hasTag ?tag.?g ns:asWKT ?wkt.?tag ns:hasKey “THEMA”

FILTER(FUNCTION(?wkt, “GEOM”))}

• Parameters:

• ns: specifies the kind of feature (and geometry type) examined

• THEMA: defines the thematic selectivity of the query using another parameter k

• FUNCTION: specifies the topological function examined

• GEOM: specifies a rectangle that controls the spatial selectivity of the query

465/26/2013

Synthetic WorkloadQuery template for spatial joins

SELECT ?s1 ?s2

WHERE {?s1 ns1:hasGeometry ?g1.?s1 c:hasTag ?tag1.?g1 ns:asWKT ?wkt1.?tag1 ns:hasKey “THEMA”

?s2 ns2:hasGeometry ?g2.?s2 c:hasTag ?tag2.?g2 ns2:asWKT ?wkt2.?tag2 ns2:hasKey “THEMA’”

FILTER(FUNCTION(?wkt1, ?wkt2))}

475/26/2013

Experimental EvaluationSynthetic workload

• Geospatial RDF stores tested: Strabon, Parliament, uSeekM

• Machine: Intel Xeon E5620, 12MB L3 cache, 2.4GHz, 24GB RAM, 4 HDD with RAID-5

• Details:

• Metric: response time

• Run 3 times and compute the median

• Time out: 1 hour

• Run both on warm caches and cold caches

485/26/2013

ResultsData Storage

• We generate the synthetic dataset with n=512 and k=9. This results in:

• 28900 states

• 262144 land ownerships

• 512 roads

• 262144 points of interest

• Size: 3,880,224 triples (745 MB)

System Strabon uSeekM Parliament

Storage time 221 sec 406 sec 462 sec

495/26/2013

ResultsSynthetic Workload (Spatial Selections)

IntersectsTag 1, cold caches

IntersectsTag 512, cold caches

505/26/2013

ResultsSynthetic Workload (Spatial Joins)

Intersects

515/26/2013

Outline

• Motivation

• SPARQL extensions for querying geospatial data expressed in RDF

• State-of-the-art geospatial RDF stores

• Related benchmarks

• The benchmark Geographica

• Evaluating the performance of geospatial RDF stores using Geographica

• Conclusions

525/26/2013

Conclusions

• We defined Geographica, a new comprehensive benchmark for geospatial RDF stores, and used it to compare 3 relevant systems (Strabon, Parliament, uSeekM).

• More implementation work is necessary in adding features to other geospatial RDF stores beyond the ones tested.

• More real-world scenarios can be added.

• Next target: spatiotemporal RDF stores

535/26/2013

Advertisement

Strabon: http://strabon.di.uoa.gr

Geographica: http://geographica.di.uoa.gr

Tutorials/Survey paper

More at ESWC

Paper: Storing and Querying the Valid Time of Triples in Linked Geospatial Data

Demo: Sextant, a web tool for browsing and mapping Linked Geospatial Data http://strabon.di.uoa.gr:8080/sextant/

Project networking: TELEIOS http://www.earthobservatory.eu

top related