linked open geodata management in the cloud k. kritikos, y. roussakis ics-forth d. kotzinos...
TRANSCRIPT
![Page 1: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/1.jpg)
Linked Open GeoData Management in the Cloud
K. Kritikos, Y. Roussakis ICS-FORTHD. Kotzinos ICS-FORTH & TEI of Serres
![Page 2: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/2.jpg)
Cloud Computing
2
Better (faster, reliable, etc.) infrastructure - IaaS
Development infrastructure – PaaS
Software infrastructure – SaaS
![Page 3: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/3.jpg)
Cloud Computing
3
DataData as a Service (DaaS)Data as a Service (DaaS)• Publication• Querying• Updating
![Page 4: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/4.jpg)
Linked (Open) Data as a Service
• Publishing Linked Data– URI construction– Conceptual Model– Storage as RDF files or SPARQL endpoints
• Querying Linked Data– SPARQL– GeoSPARQL
• Updating Linked Data– SPARUL– Synchronization with original sources
4
![Page 5: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/5.jpg)
Problem Introduction (I)
• INGeoCloudS FP7 Pilot B Project (www.ingeoclouds.eu)• Geophysical data from different sources and in different
formats (excel, xml, relational, nothing …)• Borehole and Groundwater Water Analysis
– Boreholes located in Mygdonia/Thriasio of Greece, whole country in Denmark and France and their features (static data over time)
– Chemical analyses of ground waters sampled from their boreholes (data updated over time)
• Earthquake events and features• Landslides
5
![Page 6: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/6.jpg)
Data granularity
• Data refer to different levels of granularity, e.g. susceptibility maps refer to a country-wide area while earthquakes or boreholes are point-level data
• Data might need to be aggregated by such aggregation is based on the spatial dimension, i.e. points contained within a polygon
• Some problems of aggregation do exist since phenomena outside the area of concern may affect it, so spatial aggregation might not be enough
![Page 7: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/7.jpg)
Earthquakes:•How much back in time should we go?•What information should be kept/would be relevant?•How should we query the repository to get the relevant information?
Earthquakes:•How much back in time should we go?•What information should be kept/would be relevant?•How should we query the repository to get the relevant information?
Landslides:•Which area and how much is it affected?•How does this change over time?•Is the earthquake effect cumulative or fades over time?
Landslides:•Which area and how much is it affected?•How does this change over time?•Is the earthquake effect cumulative or fades over time?
![Page 8: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/8.jpg)
Problem Introduction (II)
• Data/Metadata Standards– INSPIRE standard proposes generic conceptual schema for scientific
data + models for 34 spatial data themes
• Deal with geospatial data & maintaining schemas/ontologies becomes difficult– Challenge is to exploit semantic heterogeneity
• Need to offer seamless & transparent LOD as a service (LODaaS) way to manage LOD data– Lack of tools for mapping, transforming & synchronizing geo-spatial
LD– Generic LOD management independent of way LOD are stored
![Page 9: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/9.jpg)
Points of interest
• GeoData get bigger and more important– Used in a variety of applications in different fields
• Size & high demand impose considerable requirements in infrastructure storage size & compute power
• Need to be reused and linked with other data sets– Go beyond current Web paradigm of isolated data silos
• Current geo-spatial open data management work does not offer such effort– Cloud-based approaches:
• do not provide geo-spatial support • Some do not fully support SPARQL or offer SPARQL end-points
– Centralized approaches offer geo-spatial support but:• Do not enable automatic mapping between relational and RDF data• Worse performance in general (with the exception of Strabon wrt geo-
spatial query support)
![Page 10: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/10.jpg)
Proposed Solution (I)
• A specific set of LODaaS services for geo-spatial LOD publishing, integration & querying
• Cloud is offering its scalability & elasticity of computation, 24/7 availability & multiple data storage and integration offerings
• Our cloud-based service-oriented system:– Exhibits good LOD management performance– Exposes a LOD management service that abstracts away
RDF Store peculiarities & provides a generic way for LOD access and management
![Page 11: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/11.jpg)
Proposed Solution (II)
• A particular solution is adopted for mapping geo-spatial data in different formats to RDF data
• The latter conform to extensible conceptual models that accurately capture thematic areas and are integrated via GeoScientific Observation Model– This allows imposing queries across providers and thematic fields
• Our solution is part of the system, developed in the context of the InGeoCloudS project, that exploits cloud capabilities & LD technology to integrate & store heterogeneous geo-spatial data sets of different thematic fields + host & execute applications that exploit these data sets
![Page 12: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/12.jpg)
Architecture (I)
• System is scalable and elastic by exploiting cloud facilities• An extensive application pool can be built on top that
exploits the offered services to perform various added-value and high-demanding tasks:– LO GeoData visualization, discovery & composition of data-sets,
LO GeoData analytics– System could be extended to host such applications & offer
various (geo-spatial) LO GeoData processing services and pre-built applications
![Page 13: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/13.jpg)
Architecture (II)
• Distributor: equally distributes generic queries & collects back the results, non-generic queries are sent to instances with the appropriate data, data distribution achieved by assigning new data to the less loaded wrt storage space scaling layer, exploits CPU monitoring & elasticity facilities of Amazon
• Scaling Layer: comprises one or more LOD management components, data are replicated across these components to enhance reliability & enable layer-based load balancing
• LOD Management Component: comprises LOD Management Service (LMS) instance & Virtuoso server for storage
• LMS: provides methods for data providers to manage LOD & for other users to query & export the LOD stored
• Virtuoso: underlying RDF triple store also allowing the mapping & synchronization between relational and RDF data
![Page 14: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/14.jpg)
![Page 15: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/15.jpg)
General Query Evaluation Behavior
Response Time
Time passed
2nd instance involvement
![Page 16: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/16.jpg)
LOD Integration & Publishing (I)
• Extension of the high-level CIDOC-CRM conceptual model• New model is called Geo-Scientific Spatial Observation
Model (GSOM) & expressed in RDF/S• It enables to capture all information coming different
fields & countries + link data across different providers• INSPIRE was not exploited as did not cover all
requirements:– Capturing of scientific events– Complicated and cumbersome for information integration– In some cases, does not cover all appropriate information
required by the data providers in particular thematic fields• GSOM-to-INSPIRE mapping specification to enable
exporting INSPIRE-compliant data
![Page 17: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/17.jpg)
LOD Integration & Publishing (II)
• Two alternatives for publishing LOD:1. Create and import RDF-based descriptions of data-sets via
particular LMS method– Data update process must be controlled by performing SPARUL
updates via particular LMS method– Data provider responsibility to keep synchronized relational &
RDF data• A perfect synchronization may be also not required as it may incur costs
-> second alternative becomes more preferable
![Page 18: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/18.jpg)
LOD Integration & Publishing (III)
2. Data provider publishes relational data of his/her data sets + provides a mapping file in R2RML to enable the synchronization of relational to RDF data (by executing LMS method)
– System takes care of this synchronization– Relational storage in the way used many years + additional
RDF storage for the data with automatic one-way synchronization between the two
– Provider should have a good knowledge of GSOM & RDF
![Page 19: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/19.jpg)
LOD Integration & Publishing (IV)
• R2RML: – W3C recommendation since 2012– Can specify customized mappings between RDB & RDF data– R2RML specification is just a RDF graph in Turtle– No specific implementation is imposed
• Virtuoso supports R2RML by processing the R2RML specification & creating the respective RDB2RDF triggers (used for creating/updating RDF data from relational ones)– An RDF view or physical RDF graph can be created with the
second option mapping to far better performance
![Page 20: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/20.jpg)
BoreholeName
Sample ID
WaterLevel
B1 XYZ Level1
Borehole Relational Model
E42.IdentifierSample_IDSample_ID,
E41.AppelationBorehole_NamBorehole_Namee
S16.Borehole
E54.DimensionWaterlevelWaterlevel
P43F.has_dimension
E26.Physical_Feature
S15.Acquifer_ConceptIntake
P121F.overlaps_with
P1F.is_identified_by
P1F.is_identified_by
S13.Sample
O5.removed
S2.SampleTaking
O4.sampled_from
URI Identification:http://orgURL/SampleID/XYZ
R2RML
GSOM
Publication
Synchronization
RDB
![Page 21: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/21.jpg)
LOD Management Service (I)
• REST-based service with API exposing all appropriate management functionality needed by geo-spatial LOD users– Abstracts away from peculiarities of RDF triple stores– Enables simple & intuitive use of a specific set of LOD
management methods– Programmatic or form-based access to methods– Production of query results in different forms, such as WKT,
GML, & KML– Imporing/exporting capabilities in different formats
(RDF/XML, NTriples, Turtle)
![Page 22: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/22.jpg)
LOD Management Service (II)
• The provided methods are:– meta_query (SPARQL string, timeout (opt.), row limit (opt.)): user-
requested format (e.g., JSON)– meta_update (SPARUL string, baseURI, timeout (opt.), row limit
(opt.))– meta_addMappings(R2RML string, graphURI) -> initiates mapping
procedure– meta_export(graphURI, subjURI, predURI, objURI, internal): user-
requested format -> last param indicates if result will be inline in the response
– meta_import(url, graphUri, format, blocking): ImportStatus -> RDF data are imported by downloading them via provided URL or inline in user-request + method can be blocking or non-blocking
– import_status(importID): ImportStatus -> in case of blocking import request, the user can inquire the status of his/her import by exploiting the value of a specific field (importID) returned from the previous method as input to this method
![Page 23: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/23.jpg)
LOD Management Service (III)
• Each method accessible via specific URL + produces meaningful exception messages (e.g., in case user input is wrong)
• User-friendly HTML Documentation produced via Enunciate
• Implementation exploited Sesame RDF Data Management API, Virtuoso’s JDBC Driver & Jersey
![Page 24: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/24.jpg)
Open Issues (I)
• Model:– Extend it to capture other thematic fields– Data published in our system could fulfill all requirements to be
5-star LOD if respective owners decide to do so
• Data mapping:– Cloud-based Virtuoso version supports native Relational DB for
RDB2RDF synchronization• Trade-off between LOD management completeness & cost
– Mapping tools are needed to allow visual-based editing of R2RML without needing from data providers to have good knowledge of RDF
– Research issue: support bi-directional RDB2RDF mappings
![Page 25: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/25.jpg)
Open Issues (II)
• Geo-spatial query support:– Virtuoso does not support GeoSPARQL– Virtuoso has limited geo-spatial query support only in
commercial versions• 2D geometries + limited set of topological relation operators
– Additional support in terms of geometry dimensionality + feature aggregation operators
• Could extend Virtuoso via frameworks, such as uSeekM, which provide adequate geo-spatial support along with the capability of evaluating GeoSPARQL queries
– Such solutions require processing all RDF data stored to create geo-spatial indices as well as deploy another DB -> do not fit well with automatic geo-spatial LOD management
– Could resolve problem by: (a) performing re-indexing in infrequent time intervals, (b) create specialized triggers which trigger re-indexing only when RDF data are updated
![Page 26: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/26.jpg)
Open Issues (III)
• Quality & provenance:– Original input data sets may not have the appropriate quality -
> resulting RDF data can have the same or lower quality level– Proposed infrastructure must be extended with quality
resolving procedures & methods (e.g., data cleansing methods for correcting the data exploited)
– Provenance information can ensure the correct updating of LD + assist in LD reasoning process by deriving additional facts
– Thus, provenance information should be exploited by our system, especially if we consider that such exploitation is not enabled by most LOD management systems
![Page 27: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/27.jpg)
Conclusions
• Proposed a scalable, geo-spatial LOD as-a-Service management system deployed on Amazon cloud– Distributes query load + scales-up/down when CPU utilization
surpasses specific thresholds– Exposes REST-based service with LOD management methods– Provides two different ways for publishing open geo-spatial data sets
• Advance geo-spatial support level by following two directions:– Realize GSOM-to-INSPIRE mapping to enable producing INSPIRE-
compliant data– Extend Virtuoso with geo-spatial indexing & query systems to enable
the efficient processing of rich & expressive geo-spatial queries, expressed either in SPARQL or GeoSPARQL
![Page 28: Linked Open GeoData Management in the Cloud K. Kritikos, Y. Roussakis ICS-FORTH D. Kotzinos ICS-FORTH & TEI of Serres](https://reader035.vdocument.in/reader035/viewer/2022062321/56649de95503460f94ae3aed/html5/thumbnails/28.jpg)
28