Building a High Performance Environment for RDF Publishing

Pascal Christoph

Upload: dr0i

Post on 22-May-2015

DESCRIPTION

SWIB12, Cologne, 2012-11-27. Video recording: http://www.scivee.tv/node/55329

TRANSCRIPT

Page 1: Building a High Performance Environment for RDF Publishing

Building a High Performance Environment for RDF Publishing

Pascal Christoph

Page 2: Building a High Performance Environment for RDF Publishing

These slides and all graphics made by the author or taken from https://openclipart.org/ are dedicated to the public domain: https://creativecommons.org/about/cc0

All marks mentioned may be trademarks or registered trademarks of their respective owners.

For the license of "The Scream" by Edvard Munch, see https://en.wikipedia.org/wiki/File:The_Scream.jpg

Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/

Page 3: Building a High Performance Environment for RDF Publishing

Overview

Publishing is for Consuming
• Mandatory
• Nice to have

Story so far - experiences with lobid.org
• What is lobid.org?
• Storing the data
• Getting the data

Publishing RDF through elasticsearch
• Benefits
• Some more details
• Caveats

Future prospects

Page 5: Building a High Performance Environment for RDF Publishing

Publishing is for Consuming

Page 8: Building a High Performance Environment for RDF Publishing

Mandatory

A resource:

gets a dereferenceable URI:

which provides RDF:

<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/title> "With reference to reference" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/issued> "1983" .
<http://lobid.org/resource/HT002948556> <http://purl.org/ontology/bibo/isbn13> "9780915145539" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/elements/1.1/creator> <http://d-nb.info/gnd/135539897> .
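A minimal sketch of how a consumer might read the N-Triples shown on this slide, using only the Python standard library. The regular expression covers just the two term shapes that occur here (IRIs and plain string literals without escapes); it is an illustration, not a full N-Triples parser, and the name `parse_ntriples` is made up for this example.

```python
import re

# One N-Triples statement: <subject> <predicate> (<object IRI> | "literal") .
# Assumption for this sketch: no blank nodes, datatypes, language tags or escapes.
TRIPLE = re.compile(r'<([^>]+)>\s*<([^>]+)>\s*(?:<([^>]+)>|"([^"]*)")\s*\.')

NTRIPLES = '''
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/title> "With reference to reference" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/issued> "1983" .
<http://lobid.org/resource/HT002948556> <http://purl.org/ontology/bibo/isbn13> "9780915145539" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/elements/1.1/creator> <http://d-nb.info/gnd/135539897> .
'''


def parse_ntriples(text):
    """Return a list of (subject, predicate, object) tuples."""
    triples = []
    for match in TRIPLE.finditer(text):
        subject, predicate, iri, literal = match.groups()
        # Exactly one of iri/literal is set, depending on the object form.
        triples.append((subject, predicate, iri if iri is not None else literal))
    return triples


triples = parse_ntriples(NTRIPLES)
titles = [o for s, p, o in triples if p == "http://purl.org/dc/terms/title"]
```

For real data an RDF library is the safer choice, since literals may carry escapes, language tags and datatypes that this pattern ignores.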

Page 9: Building a High Performance Environment for RDF Publishing

Mandatory

=> Basic LOD publishing is very simple: you just need a web server.
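To make "you just need a web server" concrete, here is a small sketch with Python's built-in `http.server`. It is not how lobid.org is implemented; the in-memory `DOCUMENTS` table, the `lookup` helper, and the port are assumptions chosen for the example.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical store: request path -> N-Triples document for that resource.
DOCUMENTS = {
    "/resource/HT002948556": (
        "<http://lobid.org/resource/HT002948556> "
        '<http://purl.org/dc/terms/title> "With reference to reference" .\n'
    ),
}


def lookup(path):
    """Return (status, content_type, body) for a request path."""
    body = DOCUMENTS.get(path)
    if body is None:
        return 404, "text/plain; charset=utf-8", "not found\n"
    return 200, "application/n-triples", body


class LodHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, content_type, body = lookup(self.path)
        payload = body.encode("utf-8")
        self.send_response(status)
        self.send_header("Content-Type", content_type)
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def main():
    # Call main() to serve on http://localhost:8000/ (port is an arbitrary choice).
    HTTPServer(("localhost", 8000), LodHandler).serve_forever()
```

Keeping the lookup separate from the handler makes the mapping from URI to RDF easy to check without opening a socket.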

Page 10: Building a High Performance Environment for RDF Publishing

Nice to have

• Dumps
• Content Negotiation (different RDF serializations)
• SPARQL
• Human readable representation (best: RDFa in HTML)
• Data searchable
• Timely updates
• High Availability
• Versioning
• Web developers want simple APIs providing JSON
• ...

Page 12: Building a High Performance Environment for RDF Publishing

SPARQL Endpoint

• (Dumps): but may be painfully slow with lots of data
• Content Negotiation (different RDF serializations)
• SPARQL
• Human readable representation (best: RDFa in HTML)
• (Data searchable): may be painfully slow
• Timely updates
• High Availability
• Versioning
• Web developers want simple APIs providing JSON
  • most triple stores provide JSON/RDF
  • Simple powerful API: too powerful/complex?
• ...

Page 13: Building a High Performance Environment for RDF Publishing

Nice to have

In principle, web developers already have simple APIs:

LOD is the API!

Page 14: Building a High Performance Environment for RDF Publishing

Nice to have

In principle, web developers already have simple APIs.

Remember:

Page 15: Building a High Performance Environment for RDF Publishing

Mandatory

A resource:

gets a dereferenceable URI:

which provides the data (in RDF):

<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/title> "With reference to reference" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/terms/issued> "1983" .
<http://lobid.org/resource/HT002948556> <http://purl.org/ontology/bibo/isbn13> "9780915145539" .
<http://lobid.org/resource/HT002948556> <http://purl.org/dc/elements/1.1/creator> <http://d-nb.info/gnd/135539897> .

Page 16: Building a High Performance Environment for RDF Publishing

Nice to have

In principle, web developers already have powerful APIs:

RESTful SPARQL

Page 17: Building a High Performance Environment for RDF Publishing

RESTful SPARQL example

Getting all data of all resources having a particular ISBN:

curl -H "Accept: application/json" \
  --data-urlencode 'query=prefix bibo: <http://purl.org/ontology/bibo/> SELECT * WHERE { ?s bibo:isbn13 "9780851706238" ; ?p ?o . } LIMIT 100' \
  http://lobid.org/sparql/
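The same request can be prepared from Python's standard library. This is a sketch only: it builds the request object without sending it, the endpoint URL is the one from the slide and may no longer be reachable, and the helper names `build_isbn_query` and `build_request` are invented for illustration.

```python
from urllib.parse import urlencode
from urllib.request import Request

SPARQL_ENDPOINT = "http://lobid.org/sparql/"  # taken from the slide


def build_isbn_query(isbn13):
    """Return the SPARQL query from the slide for a given ISBN-13."""
    if not isbn13.isdigit():
        # Guard against injecting arbitrary text into the query string.
        raise ValueError("ISBN-13 must consist of digits only")
    return (
        "PREFIX bibo: <http://purl.org/ontology/bibo/>\n"
        'SELECT * WHERE { ?s bibo:isbn13 "%s" ; ?p ?o . } LIMIT 100' % isbn13
    )


def build_request(isbn13):
    """Build a form-encoded POST, like curl --data-urlencode does."""
    body = urlencode({"query": build_isbn_query(isbn13)}).encode("ascii")
    return Request(SPARQL_ENDPOINT, data=body,
                   headers={"Accept": "application/json"})


request = build_request("9780851706238")
# urllib.request.urlopen(request) would send it; left out on purpose.
```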

Page 18: Building a High Performance Environment for RDF Publishing

Nice to have

Page 19: Building a High Performance Environment for RDF Publishing

RESTful SPARQL example

… and the JSON/RDF result:

{
  "head": { "vars": [ "s", "p", "o" ] },
  "results": {
    "bindings": [
      {
        "o": { "type": "uri", "value": "http://openlibrary.org/works/OL2109573W" },
        "p": { "type": "uri", "value": "http://rdvocab.info/RDARelationshipsWEMI/workManifested" },
        "s": { "type": "uri", "value": "http://lobid.org/resource/HT007824357" }
      },
      { "o": { ...
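A short sketch of what a client has to do with such a result: walk `results.bindings` and pull out the `value` of each variable listed in `head.vars`. The sample contains only the one complete binding visible on the slide; the function name `rows` is an assumption for this example.

```python
import json

RESULT = json.loads('''
{
  "head": { "vars": ["s", "p", "o"] },
  "results": { "bindings": [
    { "o": { "type": "uri", "value": "http://openlibrary.org/works/OL2109573W" },
      "p": { "type": "uri", "value": "http://rdvocab.info/RDARelationshipsWEMI/workManifested" },
      "s": { "type": "uri", "value": "http://lobid.org/resource/HT007824357" } }
  ] }
}
''')


def rows(result):
    """Yield one tuple per binding, ordered like head.vars."""
    variables = result["head"]["vars"]
    for binding in result["results"]["bindings"]:
        # A variable can be unbound in a row, hence the defaults.
        yield tuple(binding.get(name, {}).get("value") for name in variables)


table = list(rows(RESULT))
```

Even this small step shows the extra layer a web developer has to handle compared with a plain JSON document per resource.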

Page 20: Building a High Performance Environment for RDF Publishing

Nice to have

As it is, web developers don't like SPARQL.

[Image: web developer]

Page 21: Building a High Performance Environment for RDF Publishing

Nice to have

Web developers want APIs like:

http://lobid.org/resources/api/isbn/$isbn

Page 22: Building a High Performance Environment for RDF Publishing

Happy web developer

Page 24: Building a High Performance Environment for RDF Publishing

What is lobid.org?

lobid.org

Page 25: Building a High Performance Environment for RDF Publishing

What is lobid.org?

● lobid := linking open bibliographic data
● LOD services of the hbz
● lobid-resources:
  ● exposes 85% of the hbz cooperative catalogue
  ● entries coming from > 200 scientific German libraries
  ● ~ 16 M records with 700 M triples
  ● with links to ~ 5 M other resources
  ● with links to ~ 32 M items (consisting of 300 M triples)
● lobid-organisations:
  ● exposes German Sigelverzeichnis and MARC-Isil directory
  ● ~ 40 k descriptions of institutions

Page 26: Building a High Performance Environment for RDF Publishing

What's missing?

• Dumps
• Content Negotiation (different RDF serializations)
• SPARQL
• Human readable representation (RDFa in HTML)
• Data searchable
• Timely updates
• High Availability
• Versioning
• Web developers want simple APIs providing JSON
• ...

Page 28: Building a High Performance Environment for RDF Publishing

Storing the data

2010 - 2011, lobid-organisations

Filesystem:
+ easy to maintain
+ reliable
+ fast
- no search
- no SPARQL
- ...

Page 29: Building a High Performance Environment for RDF Publishing

lobid today

Triple Store (4store):
+ power of SPARQL
+/- depending on the query: fast to horribly slow
+/- search (but string searches are often slow and limited)
- sometimes gets stuck!

Page 30: Building a High Performance Environment for RDF Publishing

lobid today

Search engine (elasticsearch):
+ fast search
+ stemming, linguistics …
+ wildcard searching
+ facets
+ geo search
+ JSON
+ schema-less
+ simple RESTful API
+ many plugins
+ ...
+ easy to achieve High Availability
+ scales nicely

Page 31: Building a High Performance Environment for RDF Publishing

Storing/getting the data

lobid today

Page 33: Building a High Performance Environment for RDF Publishing

Getting the data

lobid: technology/dependency stack

Search Engine
Webapp
Triple Store

Page 34: Building a High Performance Environment for RDF Publishing

Getting the data

lobid: technology/dependency stack

Search Engine
Webapp
Triple Store: sometimes gets stuck!

highly available! (we can do that)

Page 38: Building a High Performance Environment for RDF Publishing

Storing/getting the data

Variant 1: technology/dependency stack

Triple Store: for external access. Sometimes gets stuck!
Triple Store: closed, internal. Will be safe from malicious queries.

=> redundant, complex …

Page 40: Building a High Performance Environment for RDF Publishing

Publishing LOD with elasticsearch

Variant 2: technology/dependency stack

Search Engine
Webapp
Triple Store: for external access and some fancy nice-to-have stuff. Sometimes gets stuck!

highly available! (we can do that)

Basic LOD functionality (and some other APIs) is highly available

Page 41: Building a High Performance Environment for RDF Publishing

Benefits

• Dumps
• Content Negotiation (different RDF serializations)
• SPARQL
• Human readable representation (RDFa in HTML)
• Data searchable
• Near Real Time updates
• High Availability
• (Versioning)
• Web developers want simple APIs returning JSON
• ...

Page 42: Building a High Performance Environment for RDF Publishing

Benefits

Fast, scalable search engine

Page 43: Building a High Performance Environment for RDF Publishing

Performance test

Data: 10 M records <=> 300 M triples
Case-insensitive query: "beach"

SELECT ?s
WHERE {
  ?s <http://purl.org/dc/terms/title> ?o
  FILTER regex(str(?o), "beach", "i")
}

#### => SPARQL execution time for Q8316: 108.7s, returned 2815 rows.

http://$ip:9200/_search?q=beach&from=0&size=2800
# => Elasticsearch needed 0.4s

=> Elasticsearch is roughly 270 times faster (108.7 s / 0.4 s)
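One plausible reason for a gap of this size is that a regex filter has to test every title literal, while a search engine looks the term up in an inverted index built beforehand. The toy sketch below contrasts the two approaches; it says nothing about the internals of 4store or Elasticsearch, and the sample titles and function names are made up.

```python
import re
from collections import defaultdict

TITLES = {  # hypothetical sample data
    "r1": "On the Beach",
    "r2": "With reference to reference",
    "r3": "Beach ecology",
}


def regex_scan(titles, needle):
    """Test every title, like FILTER regex(..., "i") over all literals."""
    pattern = re.compile(re.escape(needle), re.IGNORECASE)
    return sorted(rid for rid, title in titles.items() if pattern.search(title))


def build_inverted_index(titles):
    """Map each lower-cased token to the set of records containing it."""
    index = defaultdict(set)
    for rid, title in titles.items():
        for token in re.findall(r"\w+", title.lower()):
            index[token].add(rid)
    return index


def index_lookup(index, term):
    """Look up a single term; cost does not grow with the number of titles scanned."""
    return sorted(index.get(term.lower(), set()))


INDEX = build_inverted_index(TITLES)
scan_hits = regex_scan(TITLES, "beach")
index_hits = index_lookup(INDEX, "beach")
```

The two are not strictly equivalent: the regex also matches substrings such as "beaches", whereas this index only matches whole tokens unless stemming is added.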

Page 44: Building a High Performance Environment for RDF Publishing

Performance test

(4store has support for text indexing; I have not tested that.)

Page 45: Building a High Performance Environment for RDF Publishing

Performance test

Elasticsearch: 18 M records, 6 GB RAM: 5 hours
4store: 1 B triples, 72 GB RAM: 7 hours

CPU: quad core at 2.4 GHz with Hyper-Threading => 8 CPUs
HD: 6 x 2.5" 10k rpm, 146 GB each

(Don't take benchmarks too seriously: they just give a clue!)

Page 47: Building a High Performance Environment for RDF Publishing

Benefits

Built to be easily made highly available!

Page 49: Building a High Performance Environment for RDF Publishing

Benefits

Versioning with elasticsearch:

Not available out of the box, but elasticsearch comes with, e.g.:

* concurrency control
* documents have a version number

=> implementing versioning is not hard
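The two ingredients named above, a per-document version number and a concurrency check, can be sketched independently of any search engine. The class below is a plain in-memory illustration; it does not use or imitate the Elasticsearch API, and every name in it is hypothetical.

```python
class VersionConflict(Exception):
    """Raised when a writer's expected version is stale."""


class VersionedStore:
    def __init__(self):
        self._current = {}  # doc_id -> (version, document)
        self._history = {}  # doc_id -> list of (version, document)

    def put(self, doc_id, document, expected_version=None):
        """Store a new version; optionally fail if someone wrote in between."""
        version, _ = self._current.get(doc_id, (0, None))
        if expected_version is not None and expected_version != version:
            raise VersionConflict(
                "expected version %d, found %d" % (expected_version, version))
        new_version = version + 1
        self._current[doc_id] = (new_version, document)
        # Keeping every version is what turns version numbers into versioning.
        self._history.setdefault(doc_id, []).append((new_version, document))
        return new_version

    def get(self, doc_id, version=None):
        """Return (version, document); the latest one unless a version is given."""
        if version is None:
            return self._current[doc_id]
        for stored_version, document in self._history[doc_id]:
            if stored_version == version:
                return stored_version, document
        raise KeyError(version)


store = VersionedStore()
first = store.put("HT002948556", {"issued": "1983"})
second = store.put("HT002948556", {"issued": "1984"}, expected_version=first)
```

A second writer that still holds `first` as its expected version would now get a `VersionConflict` instead of silently overwriting the newer document.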

Page 50: Building a High Performance Environment for RDF Publishing

Benefits, relying on elasticsearch as basic LOD storage

• Dumps
• Content Negotiation (different RDF serializations)
• SPARQL
• Human readable representation (RDFa in HTML)
• Data searchable
• Near Real Time updates
• High Availability
• Versioning
• Web developers want:
  • JSON(-LD)
  • Simple APIs
• ...

Page 52: Building a High Performance Environment for RDF Publishing

Why JSON-LD?

JSON is :

• stored natively by many tools (e.g. elasticsearch)

• loved by consumers (web developers)

JSON-LD is :

• supported by RDF libraries (e.g. transforming to NTriples)
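To illustrate why JSON-LD sits comfortably between both worlds, here is a hand-rolled sketch that turns one JSON-LD node in expanded form into N-Triples. It handles only IRI references and plain string values and skips keywords such as `@type`; a real pipeline would use an RDF library instead. The function names are assumptions for this example.

```python
def escape(text):
    """Escape the characters that must not appear raw in an N-Triples literal."""
    return (text.replace("\\", "\\\\").replace('"', '\\"')
                .replace("\n", "\\n").replace("\r", "\\r"))


def to_ntriples(node):
    """Serialize one expanded JSON-LD node object as N-Triples lines."""
    subject = node["@id"]
    lines = []
    for predicate, values in node.items():
        if predicate.startswith("@"):
            continue  # @id, @type etc. are not handled in this sketch
        for value in values:
            if "@id" in value:
                obj = "<%s>" % value["@id"]
            else:
                # Assumption: @value is a plain string without datatype or language.
                obj = '"%s"' % escape(value["@value"])
            lines.append("<%s> <%s> %s ." % (subject, predicate, obj))
    return lines


DOC = {
    "@id": "http://lobid.org/resource/HT002948556",
    "http://purl.org/dc/terms/title": [{"@value": "With reference to reference"}],
    "http://purl.org/dc/elements/1.1/creator": [{"@id": "http://d-nb.info/gnd/135539897"}],
}

ntriples = to_ntriples(DOC)
```

The same `DOC` dictionary can be stored as it is in a JSON document store, which is the point of the slide.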

Page 54: Building a High Performance Environment for RDF Publishing

Benefits

RESTful elasticsearch API, e.g.:

http://lobid.org/resources/_search?q=isbn:$isbn

Page 55: Building a High Performance Environment for RDF Publishing

Benefits

• … and many other nice things come with elasticsearch
• geo-search: "Query only libraries/items located within 10 km of me."
• …
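What such a geo query computes can be sketched with the haversine formula. This is a plain-Python illustration of the distance filter, not the way Elasticsearch implements it; the coordinates are approximate and the library names are placeholders.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = phi2 - phi1
    dlambda = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


LIBRARIES = {  # hypothetical entries with approximate coordinates
    "library in Cologne": (50.9280, 6.9290),
    "library in Bonn": (50.7374, 7.0982),
}


def within(origin, places, max_km):
    """Return the names of all places at most max_km away from origin."""
    lat, lon = origin
    return sorted(name for name, (plat, plon) in places.items()
                  if haversine_km(lat, lon, plat, plon) <= max_km)


# "Me" is assumed to stand in the centre of Cologne.
nearby = within((50.9375, 6.9603), LIBRARIES, 10)
```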

Page 56: Building a High Performance Environment for RDF Publishing

Benefits

• Dumps
• Content Negotiation (different RDF serializations)
• SPARQL
• Human readable representation (RDFa in HTML)
• Data searchable
• Near Real Time updates
• High Availability
• (Versioning)
• Web developers want simple APIs providing JSON
• ...

Mission accomplished!

Page 57: Building a High Performance Environment for RDF Publishing

( … ok, something is left to be done! )

• Dumps
• Content Negotiation (different RDF serializations)
• SPARQL
• Human readable representation (RDFa in HTML)
• Data searchable
• Near Real Time updates
• High Availability
• Versioning
• Web developers want simple APIs providing JSON
• ...

Page 58: Building a High Performance Environment for RDF Publishing

Overview

Publishing is for Consuming
• Mandatory
• Nice to have

Story so far - experiences with lobid.org
• What is lobid.org?
• Storing the data
• Getting the data

Publishing RDF through elasticsearch
• Benefits
• Caveats
• Auto suggest demo

Conclusion

Page 59: Building a High Performance Environment for RDF Publishing

Caveats

!?

• Dumps
• Content Negotiation (different RDF serializations)
• SPARQL
• Human readable representation (RDFa in HTML)
• Data searchable
• Near Real Time updates
• High Availability
• Versioning
• Web developers want simple APIs providing JSON
• ...

Page 60: Building a High Performance Environment for RDF Publishing

Caveats

How to integrate semantic search into a document store?

dct:contributor --------> dct:creator -------> dc:creator
                \---------> dc:contributor
                \--------> bibo:translator …

There is no inferencing as comes with SPARQL!
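One common workaround when the store cannot infer over a property hierarchy is to expand the query at search time. The sketch below assumes only that dct:creator and bibo:translator are sub-properties of dct:contributor; the documents, person names, and function names are placeholders, and this is not necessarily what lobid does.

```python
# Broader property -> narrower properties (a small, assumed excerpt).
SUBPROPERTIES = {
    "dct:contributor": ["dct:creator", "bibo:translator"],
}


def expand(prop, hierarchy=SUBPROPERTIES):
    """Return prop plus all transitive sub-properties, without duplicates."""
    seen, stack = [], [prop]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.append(current)
            stack.extend(hierarchy.get(current, []))
    return seen


def search(documents, prop, value):
    """Match value in the field for prop or in any of its sub-property fields."""
    fields = expand(prop)
    return [doc["id"] for doc in documents
            if any(value in doc.get(field, []) for field in fields)]


DOCS = [  # hypothetical flat documents as they might sit in a document store
    {"id": "doc1", "dct:creator": ["Person A"]},
    {"id": "doc2", "bibo:translator": ["Person A"]},
    {"id": "doc3", "dct:creator": ["Person B"]},
]

contributors = search(DOCS, "dct:contributor", "Person A")
creators = search(DOCS, "dct:creator", "Person A")
```

The alternative is to materialize the broader fields at indexing time, which trades index size for simpler queries.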

Page 61: Building a High Performance Environment for RDF Publishing

Caveats

Our data flow:

from records to RDF triples to records

Page 64: Building a High Performance Environment for RDF Publishing

Caveats

From records to RDF triples |-----> graph-database
                            '-----> computing ---> record-database

MARC/MAB/PICA ...                                  JSON-LD

Page 65: Building a High Performance Environment for RDF Publishing

Caveats

Tree-based vs. graph-based:

Pre-render the whole document?

What is the document?

Page 66: Building a High Performance Environment for RDF Publishing

Caveats

Page 68: Building a High Performance Environment for RDF Publishing

Caveats

What is the document? Only the top-level node?

… but then you couldn't even search for the author's name!

Page 69: Building a High Performance Environment for RDF Publishing

Caveats

Searching requires integrating some fields from subgraphs into the document.
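A sketch of that integration step: group triples by subject, build the document for the top-level node, and copy a selected field from each linked node into it so that it becomes searchable. The predicate `http://example.org/preferredName`, the name value, and the field name `creatorName` are placeholders invented for the example.

```python
from collections import defaultdict

RESOURCE = "http://lobid.org/resource/HT002948556"
CREATOR = "http://purl.org/dc/elements/1.1/creator"
NAME = "http://example.org/preferredName"  # placeholder predicate

TRIPLES = [
    (RESOURCE, "http://purl.org/dc/terms/title", "With reference to reference"),
    (RESOURCE, CREATOR, "http://d-nb.info/gnd/135539897"),
    # Hypothetical description of the linked authority record:
    ("http://d-nb.info/gnd/135539897", NAME, "Example, Author"),
]


def group_by_subject(triples):
    """Return subject -> predicate -> list of objects."""
    nodes = defaultdict(lambda: defaultdict(list))
    for subject, predicate, obj in triples:
        nodes[subject][predicate].append(obj)
    return nodes


def build_document(subject, nodes, embed):
    """Build one search document for subject.

    embed maps a link predicate to (predicate to copy from the linked
    node, name of the field that receives the copied values).
    """
    doc = {"@id": subject}
    for predicate, objects in nodes[subject].items():
        doc[predicate] = list(objects)
        if predicate in embed:
            label_predicate, field = embed[predicate]
            for linked in objects:
                labels = nodes.get(linked, {}).get(label_predicate, [])
                doc.setdefault(field, []).extend(labels)
    return doc


NODES = group_by_subject(TRIPLES)
document = build_document(RESOURCE, NODES, {CREATOR: (NAME, "creatorName")})
```

The price of this denormalization is that a change to the authority record requires re-rendering every document that embeds it.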

Page 72: Building a High Performance Environment for RDF Publishing

Auto suggest

Authority IDs must be easy to find

=> auto suggest is needed

Page 73: Building a High Performance Environment for RDF Publishing

Auto suggest

Auto suggest needs fast searching
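Why speed matters becomes clear once a lookup runs on every keystroke. As a rough illustration (not the Elasticsearch-backed implementation behind the demo), a prefix lookup over a sorted list of names can be done with a binary search; the sample list and the function name `suggest` are made up.

```python
from bisect import bisect_left

NAMES = sorted([
    "Schmidt, Karl",
    "Schmidt, Karl (1894-1945)",
    "Schmidt, Karl J.",
    "Schmidt, L. F. Karl",
    "Schneider, Anna",  # hypothetical extra entry
])


def suggest(prefix, names, limit=10):
    """Return up to limit names starting with prefix (case-sensitive)."""
    start = bisect_left(names, prefix)  # first candidate in sorted order
    results = []
    for name in names[start:]:
        if not name.startswith(prefix):
            break  # sorted input: no later name can match
        results.append(name)
        if len(results) == limit:
            break
    return results


suggestions = suggest("Schmidt, K", NAMES)
```

A production service would also need case folding and tolerance for typos, which is where a search engine's analysis chain helps.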

Page 74: Building a High Performance Environment for RDF Publishing

Auto suggest

Demo

Page 76: Building a High Performance Environment for RDF Publishing

Auto suggest

RESTful APIs:

http://demo.lobid.org/search?format=short&index=gnd-index&author=Schmidt%2C+Karl
http://demo.lobid.org/search?format=page&index=gnd-index&author=Schmidt%2C+Karl
http://demo.lobid.org/search?format=full&index=gnd-index&author=Schmidt%2C+Karl
…

API usage: GET /search?format=<page|full|short>&index=<lobid-index|gnd-index>&author=<query>

Easy to enhance with the Play framework and the elasticsearch API
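A small client-side sketch of the API usage line above: it validates the two enumerated parameters and lets the standard library do the URL encoding. The host is the demo address from the slide, which may no longer be online, and `suggest_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

BASE_URL = "http://demo.lobid.org/search"  # from the slide
FORMATS = {"page", "full", "short"}
INDEXES = {"lobid-index", "gnd-index"}


def suggest_url(author, fmt="short", index="gnd-index"):
    """Build a request URL for the auto suggest API described on the slide."""
    if fmt not in FORMATS:
        raise ValueError("unknown format: %r" % fmt)
    if index not in INDEXES:
        raise ValueError("unknown index: %r" % index)
    # urlencode turns "Schmidt, Karl" into "Schmidt%2C+Karl".
    query = urlencode({"format": fmt, "index": index, "author": author})
    return BASE_URL + "?" + query


url = suggest_url("Schmidt, Karl")
```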

Page 77: Building a High Performance Environment for RDF Publishing

Auto suggest

RESTful API: http://demo.lobid.org/search?format=short&index=gnd-index&author=Schmidt%2C+Karl

[
  "Schmidt, Karl (1894-1945)",
  "Schmidt, Karl",
  "Schmidt, Karl (1910-)",
  "Schmidt, Karl (1846-1928)",
  "Schmidt, Karl (1913-)",
  "Schmidt, Karl (1899-)",
  "Schmidt, Karl (1924-)",
  "Schmidt, Karl (1836-1888)",
  "Schmidt, L. F. Karl",
  "Schmidt, Karl (1902-1945)",
  "Schmidt, Karl J.",
  "Schmidt, Karl (1848-1905)",
  "Schmidt, Karl (1817-1882)",
  "Schmidt, Karl R.",
  "Schmidt, Karl (1954-)",
  "Schmidt, Karl (1888-)",
  "Schmidt, Karl (1867-)",
  ...]

Page 78: Building a High Performance Environment for RDF Publishing

Auto suggest

GND authority file in lobid-resources

Page 82: Building a High Performance Environment for RDF Publishing

Conclusion

A highly customizable/reliable/feature-rich LOD service

Search Engine
Webapp
Triple Store: for external access and some fancy nice-to-have stuff. Sometimes gets stuck!

highly available! (we can do that)

Basic LOD functionality (and some other APIs) is highly available

Page 83: Building a High Performance Environment for RDF Publishing

The software is open source:

https://github.com/lobid/
http://elasticsearch.org/
https://hadoop.apache.org/
http://www.playframework.org/
http://4store.org/

Page 84: Building a High Performance Environment for RDF Publishing

Any questions?

Pascal Christoph

[email protected]

[email protected]

Page 85: Building a High Performance Environment for RDF Publishing

Using a dark background, this presentation saves maybe 70% of energy