
RDF Database-as-a-Service with S4

Marin Dimitrov, CTO of Ontotext

Apr 27th, 2015

RDF DBaaS with S4 / AKSW Colloquium #1 Apr 2015

Contents

• Self-Service Semantic Suite (S4)
• RDF DBaaS on AWS
• Demo

About Ontotext

• Provides products & solutions for content enrichment and metadata management
  – 70 employees, headquarters in Sofia (Bulgaria)
  – Sales presence in London, Washington & Boston
• Major clients and industries
  – Media & Publishing
  – Health Care & Life Sciences
  – Cultural Heritage & Digital Libraries
  – Government
  – Education

The Self-Service Semantic Suite (S4)

What is S4?

• On-demand capabilities for text analytics, content enrichment and metadata management
  – Text analytics for news, life sciences and social media
  – RDF graph database as-a-service
  – Access to large open knowledge graphs
• Available anytime, anywhere
  – Simple RESTful services
• Simple, pay-per-use pricing
  – No upfront commitments

S4 benefits

• Enables quick prototyping
  – Instantly available, no provisioning & operations required
  – Focus on building applications, don’t worry about infrastructure
• Free tier
  – Even bigger free quotas for research groups & projects
• Easy to start, shorter learning curve
  – Various add-ons, SDKs and demo code
• Based on enterprise semantic technology by Ontotext

Text analytics with S4

• Text analytics services
  – News annotation
  – News categorisation
  – Biomedical
  – Twitter
• Entity linking & disambiguation
  – Mappings to DBpedia & GeoNames instances
  – Mappings to biomedical data sources (LinkedLifeData)
• HTML, MS Word, XML, plain text input
• Simple JSON output

News analytics example

[Screenshot: news analytics example with the S4 result]

Self-managed RDF DB in the Cloud

• Available from AWS Marketplace
• Variety of hardware configurations
  – 2 to 8 CPU cores / 8 to 61 GB RAM
  – IOPS performance & encryption (EBS)
• Manage large data volumes
• Pay-per-hour pricing

Fully managed RDF DB in the Cloud

• Low-cost DBaaS available 24/7
• Ideal for small & moderate data volumes
• Instantly deploy new databases when needed
• Zero administration: automated operations, maintenance & upgrades
• Users pay only for the actual database utilisation
  – Number of triples stored + number of queries per month

Knowledge graphs with S4

• SPARQL query endpoint to the FactForge knowledge graph
  – 500 million entities / 5 billion triples
• Key LOD datasets integrated
  – DBpedia, Freebase, GeoNames, WordNet
  – Dublin Core, SKOS, PROTON ontologies and vocabularies
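A SPARQL endpoint such as the FactForge one is normally queried over HTTP with the SPARQL 1.1 Protocol. The following is a minimal sketch using only the Python standard library. The endpoint URL is a hypothetical placeholder (the slide does not give the real address), and the query is a generic example rather than one from the talk.

```python
import urllib.parse
import urllib.request

# Hypothetical placeholder: substitute the endpoint URL from your own S4 account.
ENDPOINT = "https://example.org/sparql"

# Generic illustrative query, not taken from the slides.
QUERY = """
SELECT ?s ?p ?o
WHERE { ?s ?p ?o }
LIMIT 5
"""


def build_sparql_request(endpoint: str, query: str) -> urllib.request.Request:
    """Build a SPARQL 1.1 Protocol 'query via GET' request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url,
        headers={"Accept": "application/sparql-results+json"},
        method="GET",
    )


request = build_sparql_request(ENDPOINT, QUERY)
# The request is only built here; sending it would be
# urllib.request.urlopen(request).read()
```

Building the request separately from sending it keeps the sketch testable without network access.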

Knowledge graphs with S4 (available soon)

• Knowledge Graph bundles
  – DBpedia, Wikidata, GeoNames, …
  – GraphDB RDF database (self-managed @ AWS)
  – 3rd party interactive data exploration tool (faceted search, data navigation, dynamic charts)
• Get instant & reliable access to KGs without dealing with provisioning, data import, maintenance, …

S4 for developers

• Java & C# SDKs
• Sample code
  – Java, C#, NodeJS, JavaScript, Python, PHP, Groovy
  – Curl examples for the most impatient
• GATE & UIMA plugins
• Firefox & Chrome add-ons
• Online documentation

Research projects using S4

• DaPaaS & ProDataMarket
  – Goal: Open Data / Linked Data publishing & hosting
  – S4 role: scalable Linked Data hosting infrastructure
• KConnect
  – Goal: semantic annotation, search & analytics for healthcare data
  – S4 role: scalable text analytics & RDF data management infrastructure

Fully Managed RDF Database-as-a-Service

Requirements

• Elastic
  – Dynamically adapt to data & query volumes
• High availability & resilience
  – No single points of failure, “graceful degradation” of performance upon failures
• Cost efficient
  – Cost-aware architecture
  – Key aspect for Open Data scenarios like DaPaaS & ProDataMarket
• Isolation of the multi-tenant databases
• Fair use of shared resources

RDF DBaaS options on S4

• Micro DB
  – Up to 1M triples
  – FREE, available now
• Extra Small DB (10M triples)
• Small DB (50M)
• Medium DB (250M)
• Large DB (1B)
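The size tiers above can be read as a simple lookup: pick the smallest option whose triple limit covers the dataset. A minimal sketch, assuming the limits are inclusive upper bounds (the slide only says "up to"); the function name is made up for illustration.

```python
from typing import Optional

# Capacity limits in triples, as listed on the slide.
TIERS = [
    ("Micro", 1_000_000),
    ("Extra Small", 10_000_000),
    ("Small", 50_000_000),
    ("Medium", 250_000_000),
    ("Large", 1_000_000_000),
]


def smallest_tier(triple_count: int) -> Optional[str]:
    """Return the smallest tier that can hold triple_count triples, or None."""
    for name, limit in TIERS:
        if triple_count <= limit:  # assumption: limits are inclusive
            return name
    return None  # larger than the biggest managed option
```

For example, `smallest_tier(60_000_000)` selects the Medium option, and anything above 1B triples returns `None`.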

Implementation

• AWS based
  – Storage, compute, load balancing, integration services…
• Ontotext GraphDB for the database instances
• OpenRDF REST services
• Docker for containerisation
• Network-attached volumes (EBS) for data storage
• A DBaaS on S4 is…
  – A GraphDB instance
  – Running within a Docker container
  – With a private EBS data volume

S4 DBaaS architecture

• Routing nodes
  – Expose OpenRDF RESTful services to apps
  – Access control & quota checks
  – Forward client requests to the proper data node
  – Temporarily queue requests when necessary
• Data nodes
  – Multiple Docker containers (GraphDB + EBS volume) per node
• Coordinator (single)
  – Distributes DB initialisation / creation tasks to data nodes
• Management Console

S4 DBaaS architecture

[Architecture diagram. Components shown: REST apps, 3rd party RDF tools, Quota & Access Control, routers, data nodes, coordinator, EBS, backups, SNS, Docker Repository (images), Account management, Quota management, reporting, Monitoring & Logging, Dynamo, Amazon S3]

Normal operations

• CRUD
  – Router node receives a request
  – Routes it to the proper data node & container
  – Receives a response, forwards it back to the client app
• Routing updates
  – Data nodes push notifications via SNS: “heartbeats” plus changes regarding the hosted DBs (if any)
  – Each routing node receives the notifications (via SNS) and updates its routing tables
  – Coordinator also receives the notifications and learns which DBs are operational / down for maintenance
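The routing-update flow above can be sketched as a routing table that is refreshed by data-node notifications and consulted for each request. This is an illustrative model only, not the S4 implementation: the message shape, the class names and the heartbeat timeout are all assumptions, and SNS delivery is replaced by a direct method call.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Route:
    node: str         # data node hosting the database
    container: str    # container on that node
    last_seen: float  # time of the most recent heartbeat


@dataclass
class RoutingTable:
    ttl_seconds: float = 30.0  # assumed heartbeat timeout, not from the slides
    routes: Dict[str, Route] = field(default_factory=dict)

    def on_notification(self, message: dict, now: Optional[float] = None) -> None:
        """Apply one heartbeat/update message pushed by a data node."""
        now = time.time() if now is None else now
        # Assumed message shape: {"node": ..., "databases": {db_id: container}}
        for db_id, container in message["databases"].items():
            self.routes[db_id] = Route(message["node"], container, now)

    def lookup(self, db_id: str, now: Optional[float] = None) -> Optional[Route]:
        """Return the live route for db_id, or None if unknown or stale."""
        now = time.time() if now is None else now
        route = self.routes.get(db_id)
        if route is None or now - route.last_seen > self.ttl_seconds:
            return None
        return route
```

Treating a route as stale once heartbeats stop is one simple way for a router to notice a crashed data node without a separate failure detector.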

Failure case #1 – data node crash

[Diagram with numbered steps 1–3. Components shown: REST apps, 3rd party RDF tools, Quota & Access Control, routers, data nodes, coordinator, EBS, SNS, Docker Repository]

Recovery from a data node crash

[Diagram with numbered steps 1–7. Components shown: REST apps, 3rd party RDF Visualisation, Quota & Access Control, routers, data nodes, Coordinator, EBS, SNS, Docker Repository, Auto Scaling]

Failure case #2 – router crash & recovery

[Diagram with numbered steps 1–8. Components shown: REST apps, 3rd party RDF tools, Quota & Access Control, routers, data nodes, coordinator, EBS, SNS, Docker Repository, Auto Scaling]

Recovery from a router crash

• (Open connections from client apps to the crashed node are terminated)
• Auto-scaler starts a new router node
  – New router subscribes to SNS for heartbeats & updates
• Load balancer starts sending new client requests to the new router
  – Router puts them in its local queue (if the routing table is still incomplete)
• Heartbeats from data nodes are received
  – Routing information is now complete
  – Router starts sending the queued requests to data nodes
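The queue-then-drain behaviour of a freshly started router can be sketched as follows. This is a simplified model under stated assumptions: class and method names are hypothetical, requests are plain strings, and heartbeats arrive as direct method calls rather than SNS notifications.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class RecoveringRouter:
    """Router that queues requests until it learns where each DB lives."""

    def __init__(self, forward: Callable[[str, str], None]) -> None:
        self.forward = forward                          # forward(node, request)
        self.routes: Dict[str, str] = {}                # db id -> data node
        self.pending: Deque[Tuple[str, str]] = deque()  # (db id, request)

    def handle(self, db_id: str, request: str) -> None:
        """Forward the request if the route is known, otherwise queue it."""
        node = self.routes.get(db_id)
        if node is None:
            self.pending.append((db_id, request))
        else:
            self.forward(node, request)

    def on_heartbeat(self, node: str, db_ids: List[str]) -> None:
        """Record the node's databases, then replay requests that can now be routed."""
        for db_id in db_ids:
            self.routes[db_id] = node
        still_waiting: Deque[Tuple[str, str]] = deque()
        while self.pending:
            db_id, request = self.pending.popleft()
            target = self.routes.get(db_id)
            if target is None:
                still_waiting.append((db_id, request))  # keep original order
            else:
                self.forward(target, request)
        self.pending = still_waiting
```

A production router would also bound the queue and time out requests that wait too long; the sketch leaves both out for brevity.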

Failure case #3 – coordinator crash & recovery

[Diagram with numbered steps 1–6, starting from a “Create DB” request. Components shown: REST apps, 3rd party RDF tools, Quota & Access Control, routers, data nodes, coordinator, EBS, SNS, Docker Repository, Auto Scaling]

Failure case #3 – coordinator crash & recovery

• Routers can route requests to data nodes as usual
  – … but new DBs temporarily cannot be created
  – … and data nodes with free container slots can’t get info on DBs waiting for initialisation
• AWS Auto-scaler starts a new Coordinator node
  – Coordinator reads a list of all registered DBs from the metadata store & subscribes to SNS
• Coordinator starts receiving heartbeats & updates from data nodes
  – … learns which DBs are operational / pending
  – … and resumes distributing initialisation tasks for new / pending DBs to the data nodes with free slots
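The coordinator's recovery logic can be sketched as a small state machine: start with every registered DB marked as pending, move DBs to operational as heartbeats report them, and hand pending DBs to nodes with free slots. The class below is a hypothetical illustration; in particular, folding the "free slots" report into the heartbeat call is an assumption, and the metadata store is replaced by a plain list.

```python
from typing import Dict, List, Set


class Coordinator:
    def __init__(self, registered_dbs: List[str]) -> None:
        # After a restart nothing is known to be running, so every
        # registered DB is pending until a heartbeat says otherwise.
        self.pending: List[str] = list(registered_dbs)
        self.operational: Set[str] = set()
        self.assignments: Dict[str, str] = {}  # db id -> node asked to initialise it

    def on_heartbeat(self, node: str, hosted_dbs: List[str], free_slots: int) -> List[str]:
        """Process one heartbeat; return the DBs this node should initialise."""
        for db in hosted_dbs:
            self.operational.add(db)
            if db in self.pending:
                self.pending.remove(db)
        assigned: List[str] = []
        while free_slots > 0 and self.pending:
            db = self.pending.pop(0)
            self.assignments[db] = node
            assigned.append(db)
            free_slots -= 1
        return assigned
```

The sketch does not re-queue a DB if the node it was assigned to crashes before initialisation finishes; a real coordinator would need to handle that case.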

Composite failure & recovery

• Combination of coordinator + data node + routing node crash: same as #1 + #2 + #3
• Routers depend on data nodes
• Data nodes depend on the Coordinator
• Coordinator does not depend on other nodes
  – No heartbeats arriving means all DBs are down
  – Starts distributing DB initialisation tasks whenever a request comes from a working data node
  – Eventually, all data nodes are up, DBs are initialised, heartbeats & routing updates start coming
  – … and routers can start routing client requests

Management interface

[Screenshot of the management interface. Callouts: Micro, XS, S, M, or L; I/O performance; R/O access to Open Data services or open knowledge graphs]

Management interface

[Screenshot of the management interface. Callouts: DBaaS endpoint; DB details summary; Backup, export, change settings, delete; Run a test query]

Roadmap

• Gradually introduce XS, S, M and L instances
• Integration with the GraphDB Workbench management UI
• LDF-based containers
• Multi-datacenter deployment
• Replication across datacenters (single master)

More Details

• “On-demand Text Analytics and Metadata Management with S4” (ESaaSA @ CLOSER’2015)
• “Text Analytics and Linked Data Management As-a-Service with S4” (Wasabi @ ESWC’2015)
• “Low-cost Open Data As-a-Service in the Cloud” (SemDev @ ESWC’2015)

Demo

Demo scenario

• (Create an account & generate an API key pair)
• Create a new DB
• Create a new repository in the DB
  – via the REST API / OpenRDF Java SDK / curl
  – … or via UI tools like the OpenRDF Workbench
• Import sample data (REST / OpenRDF Workbench)
• Run a query through the public SPARQL endpoint

Demo data – Universities in Saxony

#1 Create a database

#2a Create a repository & load data (curl)

Repository configuration file (config.ttl)

• Repository name: "test01"
• OWL-Horst reasoning ruleset

    @prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
    @prefix rep: <http://www.openrdf.org/config/repository#>.
    @prefix sr: <http://www.openrdf.org/config/repository/sail#>.
    @prefix sail: <http://www.openrdf.org/config/sail#>.
    @prefix graphdb: <http://www.ontotext.com/trree/owlim#>.

    [] a rep:Repository ;
        rep:repositoryID "test01" ;
        rdfs:label "Description of my repository" ;
        rep:repositoryImpl [
            rep:repositoryType "openrdf:SailRepository" ;
            sr:sailImpl [
                graphdb:ruleset "owl-horst-optimized" ;
                sail:sailType "owlim:Sail" ;
                graphdb:base-URL "http://example.org/graphdb#" ;
                graphdb:repository-type "file-repository" ;
            ]
        ].
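When several repositories are needed, the configuration file can be generated from a template rather than edited by hand. The sketch below reuses the Turtle from the slide and only parameterises the repository ID, label and ruleset. The function name and the input validation are illustrative assumptions, not part of S4.

```python
CONFIG_TEMPLATE = """\
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#>.
@prefix rep: <http://www.openrdf.org/config/repository#>.
@prefix sr: <http://www.openrdf.org/config/repository/sail#>.
@prefix sail: <http://www.openrdf.org/config/sail#>.
@prefix graphdb: <http://www.ontotext.com/trree/owlim#>.

[] a rep:Repository ;
    rep:repositoryID "{repository_id}" ;
    rdfs:label "{label}" ;
    rep:repositoryImpl [
        rep:repositoryType "openrdf:SailRepository" ;
        sr:sailImpl [
            graphdb:ruleset "{ruleset}" ;
            sail:sailType "owlim:Sail" ;
            graphdb:base-URL "http://example.org/graphdb#" ;
            graphdb:repository-type "file-repository" ;
        ]
    ].
"""


def render_config(repository_id: str, label: str,
                  ruleset: str = "owl-horst-optimized") -> str:
    """Fill the slide's repository configuration with the given values."""
    for value in (repository_id, label, ruleset):
        # Reject characters that would break out of a Turtle string literal.
        if any(ch in value for ch in '"\\\n\r'):
            raise ValueError(f"unsupported character in {value!r}")
    return CONFIG_TEMPLATE.format(
        repository_id=repository_id, label=label, ruleset=ruleset
    )
```

Writing `render_config("test01", "Description of my repository")` to `config.ttl` reproduces the configuration shown on the slide.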

#2a Create a repository & load data (curl)

• User: 4730361296
• Database: demo01
• Repository: test01
• Configuration: config.ttl

    API_KEY=…
    KEY_SECRET=…
    USER=…
    DATABASE=…
    REPOSITORY=…
    SERVICE_ENDPOINT="https://$API_KEY:[email protected]/$USER/$DATABASE"

    # Create a repository
    curl -X POST -H "Content-Type:application/x-turtle" -T config.ttl \
        "$SERVICE_ENDPOINT/repositories/SYSTEM/rdf-graphs/service?graph=http://example.com#g1"
    curl -X POST -H "Content-Type:application/x-turtle" \
        -d "<http://example.com#g1> a <http://www.openrdf.org/config/repository#RepositoryContext>." \
        "$SERVICE_ENDPOINT/repositories/SYSTEM/statements"

    # Upload sample data from example.rdf
    curl -X POST -H "Content-Type:application/rdf+xml;charset=UTF-8" -T example.rdf \
        "$SERVICE_ENDPOINT/repositories/$REPOSITORY/statements"
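The data upload step can also be scripted without curl. The sketch below builds the equivalent POST to the repository's `statements` resource with the Python standard library, sending the API key pair as HTTP Basic credentials (which is what curl does with `user:password@` in the URL). The function name is hypothetical, and the service endpoint is passed in by the caller because the slide does not show it in full.

```python
import base64
import urllib.request


def build_upload_request(service_endpoint: str, repository: str, data: bytes,
                         api_key: str, key_secret: str,
                         content_type: str = "application/rdf+xml;charset=UTF-8",
                         ) -> urllib.request.Request:
    """Build a POST that adds RDF data to <endpoint>/repositories/<repo>/statements."""
    credentials = f"{api_key}:{key_secret}".encode("utf-8")
    token = base64.b64encode(credentials).decode("ascii")
    return urllib.request.Request(
        f"{service_endpoint}/repositories/{repository}/statements",
        data=data,
        headers={
            "Content-Type": content_type,
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )

# Sending the request is then a single call:
#   with open("example.rdf", "rb") as f:
#       req = build_upload_request(endpoint, "test01", f.read(), key, secret)
#   urllib.request.urlopen(req)
```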

#2b Create a repository & load data (OpenRDF Workbench)

[Screenshots: OpenRDF Workbench, with the DBaaS endpoint labelled]

#3a SPARQL query (OpenRDF Workbench)

[Screenshots: SPARQL query in the OpenRDF Workbench]

#3b SPARQL query (from the S4 Management Console)

    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX dbpedia: <http://dbpedia.org/resource/>
    PREFIX dbp-prop: <http://dbpedia.org/property/>
    PREFIX dbp-ont: <http://dbpedia.org/ontology/>

    SELECT ?name ?numberOfStudents ?staff ?established
    WHERE {
      dbpedia:University_of_Leipzig rdfs:label ?name ;
          dbp-prop:students ?numberOfStudents ;
          dbp-prop:staff ?staff ;
          dbp-prop:established ?established .
    }
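When a query like this is sent with an `Accept: application/sparql-results+json` header, the endpoint replies in the standard SPARQL 1.1 Query Results JSON format. The sketch below flattens such a reply into one dictionary per row. The sample document is hand-written with placeholder values purely to exercise the parser; it is not real output from S4.

```python
import json
from typing import Dict, List


def rows_from_sparql_json(document: str) -> List[Dict[str, str]]:
    """Flatten a SPARQL JSON results document into a list of {variable: value}."""
    payload = json.loads(document)
    variables = payload["head"]["vars"]
    rows = []
    for binding in payload["results"]["bindings"]:
        # A variable may be unbound in a given row, so skip missing keys.
        rows.append({v: binding[v]["value"] for v in variables if v in binding})
    return rows


# Placeholder data for illustration only (not an actual query result).
SAMPLE = json.dumps({
    "head": {"vars": ["name", "numberOfStudents", "staff", "established"]},
    "results": {"bindings": [
        {
            "name": {"type": "literal", "value": "placeholder name"},
            "numberOfStudents": {"type": "literal", "value": "0"},
            "established": {"type": "literal", "value": "placeholder year"},
        }
    ]},
})

rows = rows_from_sparql_json(SAMPLE)
```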

Key takeaways

• S4 provides an enterprise RDF DBaaS
• Resilient design, high availability
• Instantly available whenever needed, easy to use, OpenRDF REST services
• Zero administration: automated operations, maintenance & upgrades
• Free DBs up to 1M triples (even more for research teams & projects)
• Check out http://s4.ontotext.com

Thank you!