Techniques used in RDF Data Publishing at Nature Publishing Group Tony Hammond Data Architect, NPG March 5, 2013

Upload: tony-hammond

Post on 11-Jun-2015


DESCRIPTION

Lotico London Semweb Meetup - March 2013

TRANSCRIPT

Page 1: Techniques used in RDF Data Publishing at Nature Publishing Group

Techniques used in RDF Data Publishing at Nature Publishing Group

Tony Hammond
Data Architect, NPG

March 5, 2013

Page 2

Nature Publishing Group

● NPG is a division of Macmillan (a privately owned company)
● Publishes ~120 titles in all
    ● 34 Nature-branded titles
    ● 53 academic and society journals
    ● 16 magazines (incl. Scientific American)
● ~1,000 employees, 17 offices (5 continents)
● ~30 society partners
● Databases, conferences/events, multimedia

Page 3

Semantic Publishing at NPG

• Prior Work
    • RSS 1.0 webfeeds
    • HTML metadata
    • PDF metadata (XMP)
    • Urchin – RSS aggregator
    • OAI-PMH, OpenSearch (SRU), OpenURL
• Linked Data Apps
    • Public Data: test viability of data publishing
    • Hub: application of technology internally

Page 4

Public Data

Page 5

NPG by Numbers

Page 6

NPG Ontology

Page 7

Cloud Hosting

• TSO OpenUp® SaaS platform
• Offers 5store as a triplestore
• Scale-out architecture (C/C++)
• Supports up to a trillion triples
• 150,000 tps load speed
• SPARQL 1.0, with 1.1 features (aggregates, etc.)

Page 8

data.nature.com

Page 9

data.nature.com/query
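
As a rough illustration of how a client might address this query endpoint, the sketch below builds a request URL for an aggregate query of the kind the hosting slide mentions. It assumes the endpoint follows the standard SPARQL Protocol (query text passed in a `query` parameter) and is reachable over plain `http://`; neither detail is stated on the slides, and the query itself is a generic example rather than one from the talk. No network call is made.

```python
from urllib.parse import urlencode

# Assumption: standard SPARQL Protocol endpoint, scheme guessed as http.
ENDPOINT = "http://data.nature.com/query"

# Generic example query: count instances per class (uses a SPARQL 1.1 aggregate).
QUERY = """
SELECT ?class (COUNT(?s) AS ?n)
WHERE { ?s a ?class }
GROUP BY ?class
ORDER BY DESC(?n)
LIMIT 10
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Return a GET URL carrying the query in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)
print(url)
```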

Page 10

Hub

Page 11

Hub: Problem

Page 12

Hub: Solution

Page 13

Hub: Method

Page 14

XMP

Page 15

Building the Graph

Page 16

Local Hosting

• Apache TDB
• Single-node architecture (Java)
• Supports up to ~1.5b triples (tested)
• SPARQL 1.1

Page 17

Data Publishing

Page 18

Hub Finder

Page 19

Hub Finder: Results

Page 20

Techniques

Page 21

Naming Architecture

Page 22

Naming Policy

Object           Example        Usage
Graph            npgg:gadgets   gadgets:33 ex:title "Title" npgg:gadgets .
Class            npg:Gadget     gadgets:33 a npg:Gadget npgg:gadgets .
Object Property  npg:hasGadget  _:12 npg:hasGadget gadgets:33 npgg:_ .
Data Property    ex:title       gadgets:33 ex:title "Title" npgg:gadgets .
Instance         gadgets:33     gadgets:33 ex:title "Title" npgg:gadgets .

npg:  http://ns.nature.com/terms/
npgg: http://ns.nature.com/graphs/
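
To make the naming policy concrete, here is a minimal sketch that expands the prefixed names from the usage examples into full URIs and renders them as N-Quads lines (subject, predicate, object, graph). Only the `npg:` and `npgg:` namespaces come from the slide; the `gadgets:` and `ex:` namespaces are placeholders invented for illustration, and the helper names `expand` and `quad` are hypothetical.

```python
PREFIXES = {
    "npg": "http://ns.nature.com/terms/",    # given on the slide
    "npgg": "http://ns.nature.com/graphs/",  # given on the slide
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    # Assumptions: the slide does not define these two namespaces.
    "gadgets": "http://example.org/gadgets/",
    "ex": "http://example.org/terms/",
}


def expand(name: str) -> str:
    """Expand a prefixed name such as 'npg:Gadget' into a full URI."""
    prefix, _, local = name.partition(":")
    return PREFIXES[prefix] + local


def quad(s: str, p: str, o: str, g: str) -> str:
    """Render one N-Quads line; `o` is a literal if it starts with a quote."""
    obj = o if o.startswith('"') else f"<{expand(o)}>"
    return f"<{expand(s)}> <{expand(p)}> {obj} <{expand(g)}> ."


# Turtle's `a` keyword is shorthand for rdf:type.
print(quad("gadgets:33", "rdf:type", "npg:Gadget", "npgg:gadgets"))
print(quad("gadgets:33", "ex:title", '"Title"', "npgg:gadgets"))
```

Keeping classes and properties under one namespace and graphs under another, as the slide does, means the role of a name is visible from its prefix alone.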

Page 23

Publishing

Page 24

Monitoring

Page 25

ETL Process

Page 26

Datastore: Imports

Page 27

Datastore: Exports

Page 28

Contracts

npgg:affiliations a npg:Graph, void:Dataset ;
    dcterms:description "Graph of npg:Affiliation objects" ;
    dcterms:issued "2013-02-15"^^xsd:date ;
    dcterms:modified "2013-02-15"^^xsd:date ;
    dcterms:publisher [ a foaf:Organization ;
        foaf:mbox <mailto:[email protected]> ;
        foaf:name "Nature Publishing Group" ] ;
    dcterms:source "extractor-xml" ;
    dcterms:title "npgg:affiliations" ;
    rdfs:label "npgg:affiliations" ;
    void:classPartition [ void:class npg:Affiliation ; void:entities "973208"^^xsd:int ] ;
    void:propertyPartition
        [ void:property vcard:url ; void:triples "326"^^xsd:int ],
        [ void:property vcard:street-address ; void:triples "82638"^^xsd:int ],
        [ void:property vcard:region ; void:triples "183483"^^xsd:int ],
        [ void:property vcard:organisation-name ; void:triples "694290"^^xsd:int ],
        [ void:property vcard:locality ; void:triples "412042"^^xsd:int ],
        [ void:property vcard:email ; void:triples "21650"^^xsd:int ],
        [ void:property vcard:country-name ; void:triples 0 ],
        [ void:property rdfs:label ; void:triples "973208"^^xsd:int ],
        [ void:property rdf:type ; void:triples "973208"^^xsd:int ] ;
    void:triples "3340845"^^xsd:int ;
    void:vocabulary npg:, rdf:, rdfs:, void: .
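
The slide does not spell out how such a graph description is used as a contract. One plausible reading is that the published counts can be checked mechanically. The sketch below, using the numbers from the `npgg:affiliations` description, verifies that the per-property triple counts add up to the graph's `void:triples` total and flags properties that are declared but empty. The function name and both checks are illustrative assumptions, not NPG's actual validation.

```python
# Counts copied from the npgg:affiliations description above.
PROPERTY_TRIPLES = {
    "vcard:url": 326,
    "vcard:street-address": 82638,
    "vcard:region": 183483,
    "vcard:organisation-name": 694290,
    "vcard:locality": 412042,
    "vcard:email": 21650,
    "vcard:country-name": 0,
    "rdfs:label": 973208,
    "rdf:type": 973208,
}
TOTAL_TRIPLES = 3340845  # void:triples for the whole graph


def check_description(property_triples: dict, total_triples: int) -> list:
    """Return human-readable findings (hypothetical checks, not NPG's own)."""
    findings = []
    partition_sum = sum(property_triples.values())
    if partition_sum != total_triples:
        findings.append(
            f"partition sum {partition_sum} != void:triples {total_triples}"
        )
    for prop, count in property_triples.items():
        if count == 0:
            findings.append(f"{prop} is declared but has no triples")
    return findings


print(sum(PROPERTY_TRIPLES.values()))
print(check_description(PROPERTY_TRIPLES, TOTAL_TRIPLES))
```

For this graph the nine property counts do sum to 3,340,845, so the only finding is the empty `vcard:country-name` partition.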

Page 29

Linked Data API

• ./api/articles [.json, .rdf, .xml]
• ./api/articles?hasProduct.pcode=ng
• ./api/contributors?familyName=Smith
• ./api/products.json?pcode=ng&_page=2
• ./api/products?_view=none&_properties=pcode
• ./api/search?title=black+hole
• ./api/tree/subjects/children.xml?_sort=title
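
The example URLs suggest a common shape: a resource path, an optional format suffix, plain parameters that filter on properties (including dotted paths such as `hasProduct.pcode`), and underscore-prefixed parameters that control paging, views and sorting. The sketch below splits a request along those lines. That reading of the URLs, and the function name `parse_api_request`, are assumptions made for illustration; the slide itself only lists the URLs.

```python
from urllib.parse import parse_qsl, urlsplit

FORMATS = {"json", "rdf", "xml"}  # suffixes listed on the slide


def parse_api_request(url: str) -> dict:
    """Split an API URL into resource, format, filters and control parameters."""
    parts = urlsplit(url)
    path, fmt = parts.path, None

    # A recognised suffix on the last path segment selects the output format.
    last_segment = path.rsplit("/", 1)[-1]
    if "." in last_segment:
        stem, ext = path.rsplit(".", 1)
        if ext in FORMATS:
            path, fmt = stem, ext

    # Assumption: underscore-prefixed parameters steer the response,
    # everything else filters on a property (or dotted property path).
    filters, controls = {}, {}
    for key, value in parse_qsl(parts.query):
        (controls if key.startswith("_") else filters)[key] = value

    return {
        "resource": path.rsplit("/api/", 1)[-1],
        "format": fmt,
        "filters": filters,
        "controls": controls,
    }


print(parse_api_request("./api/products.json?pcode=ng&_page=2"))
print(parse_api_request("./api/articles?hasProduct.pcode=ng"))
```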

Page 30

Closing

Page 31

Positions Available

goo.gl/bYIt8
www.linkedin.com/jobs?jobId=4890057&viewJob