Techniques Used in RDF Data Publishing at Nature Publishing Group

Post on 11-Jun-2015


DESCRIPTION

Lotico London Semweb Meetup - March 2013

TRANSCRIPT

Techniques used in RDF Data Publishing at Nature Publishing Group

Tony Hammond, Data Architect, NPG

March 5, 2013

Nature Publishing Group

● NPG a division of Macmillan (a privately owned company)
● Publishes ~120 titles in all
  ● 34 Nature branded titles
  ● 53 academic and society journals
  ● 16 magazines (incl. Scientific American)
● ~1000 employees, 17 offices (5 continents)
● ~30 society partners
● Databases, conferences/events, multimedia

Semantic Publishing at NPG

• Prior Work
  • RSS 1.0 webfeeds
  • HTML metadata
  • PDF metadata (XMP)
  • Urchin – RSS aggregator
  • OAI-PMH, OpenSearch (SRU), OpenURL
• Linked Data Apps
  • Public Data: test viability of data publishing
  • Hub: application of technology internally

Public Data

NPG by Numbers

NPG Ontology

Cloud Hosting

• TSO OpenUp® SaaS platform
• Offers 5store as a triplestore
• Scale-out architecture (C/C++)
• Supports up to a trillion triples
• 150,000 tps load speed
• SPARQL 1.0, with 1.1 features (aggregates, etc)
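
To put the load speed in context, here is a rough back-of-the-envelope sketch. It assumes "tps" means triples per second and that the rate is sustained for the whole load; neither assumption is confirmed by the slide. The function name is illustrative only.

```python
# Rough load-time estimate from the slide's headline figure.
# Assumption: "tps" means triples per second, sustained for the whole load.
LOAD_RATE_TPS = 150_000


def load_time_seconds(triple_count: int, rate_tps: int = LOAD_RATE_TPS) -> float:
    """Seconds needed to load `triple_count` triples at a constant rate."""
    if rate_tps <= 0:
        raise ValueError("rate_tps must be positive")
    return triple_count / rate_tps


if __name__ == "__main__":
    for label, count in [("1 billion", 10**9), ("1 trillion", 10**12)]:
        hours = load_time_seconds(count) / 3600
        print(f"{label} triples: about {hours:.1f} hours")
```

Under these assumptions a billion triples would load in a couple of hours, while a full trillion would take on the order of months, so the capacity and load-speed figures describe different regimes.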

data.nature.com

data.nature.com/query

Hub

Hub: Problem

Hub: Solution

Hub: Method

XMP

Building the Graph

Local Hosting

• Apache TDB
• Single-node architecture (Java)
• Supports up to ~1.5b triples (tested)
• SPARQL 1.1

Data Publishing

Hub Finder

Hub Finder: Results

Techniques

Naming Architecture

Naming Policy

Object            Example         Usage
Graph             npgg:gadgets    gadgets:33 ex:title "Title" npgg:gadgets .
Class             npg:Gadget      gadgets:33 a npg:Gadget npgg:gadgets .
Object Property   npg:hasGadget   _:12 npg:hasGadget gadgets:33 npgg:_ .
Data Property     ex:title        gadgets:33 ex:title "Title" npgg:gadgets .
Instance          gadgets:33      gadgets:33 ex:title "Title" npgg:gadgets .

npg:  http://ns.nature.com/terms/
npgg: http://ns.nature.com/graphs/
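
The usage column reads as quads: subject, predicate, object, then the named graph. As a minimal sketch of that pattern, the Python below expands prefixed names and emits N-Quads lines. The `npg:` and `npgg:` namespaces come from the slide; the `gadgets:` and `ex:` namespaces are not given there, so the example.org values are placeholders, and the helper names `expand` and `quad` are hypothetical. Blank nodes such as `_:12` are not handled.

```python
PREFIXES = {
    # From the slide.
    "npg": "http://ns.nature.com/terms/",
    "npgg": "http://ns.nature.com/graphs/",
    # Standard RDF namespace, used for the Turtle keyword "a".
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    # Assumptions: placeholder namespaces, not given on the slide.
    "gadgets": "http://example.org/gadgets/",
    "ex": "http://example.org/terms/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'npg:Gadget' to an IRI in angle brackets."""
    prefix, _, local = curie.partition(":")
    return f"<{PREFIXES[prefix]}{local}>"


def quad(subject: str, predicate: str, obj: str, graph: str,
         literal: bool = False) -> str:
    """Serialize one statement as an N-Quads line in the given named graph."""
    if literal:
        escaped = obj.replace("\\", "\\\\").replace('"', '\\"')
        obj_term = f'"{escaped}"'
    else:
        obj_term = expand(obj)
    return f"{expand(subject)} {expand(predicate)} {obj_term} {expand(graph)} ."


if __name__ == "__main__":
    # Class row: gadgets:33 a npg:Gadget npgg:gadgets .
    print(quad("gadgets:33", "rdf:type", "npg:Gadget", "npgg:gadgets"))
    # Data Property row: gadgets:33 ex:title "Title" npgg:gadgets .
    print(quad("gadgets:33", "ex:title", "Title", "npgg:gadgets", literal=True))
```

Keeping the graph name in its own namespace (`npgg:`) separate from the terms namespace (`npg:`) means a graph IRI can never collide with a class or property IRI.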

Publishing

Monitoring

ETL Process

Datastore: Imports

Datastore: Exports

Contracts

npgg:affiliations a npg:Graph, void:Dataset ;
    dcterms:description "Graph of npg:Affiliation objects" ;
    dcterms:issued "2013-02-15"^^xsd:date ;
    dcterms:modified "2013-02-15"^^xsd:date ;
    dcterms:publisher [
        a foaf:Organization ;
        foaf:mbox <mailto:developers@nature.com> ;
        foaf:name "Nature Publishing Group"
    ] ;
    dcterms:source "extractor-xml" ;
    dcterms:title "npgg:affiliations" ;
    rdfs:label "npgg:affiliations" ;
    void:classPartition [ void:class npg:Affiliation ; void:entities "973208"^^xsd:int ] ;
    void:propertyPartition
        [ void:property vcard:url ; void:triples "326"^^xsd:int ],
        [ void:property vcard:street-address ; void:triples "82638"^^xsd:int ],
        [ void:property vcard:region ; void:triples "183483"^^xsd:int ],
        [ void:property vcard:organisation-name ; void:triples "694290"^^xsd:int ],
        [ void:property vcard:locality ; void:triples "412042"^^xsd:int ],
        [ void:property vcard:email ; void:triples "21650"^^xsd:int ],
        [ void:property vcard:country-name ; void:triples 0 ],
        [ void:property rdfs:label ; void:triples "973208"^^xsd:int ],
        [ void:property rdf:type ; void:triples "973208"^^xsd:int ] ;
    void:triples "3340845"^^xsd:int ;
    void:vocabulary npg:, rdf:, rdfs:, void: .
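
One way such a contract could be used is as a self-consistency check: the per-property triple counts should add up to the graph total. The sketch below copies the numbers from the description above into plain Python. The checks themselves are an assumption about how a contract might be validated, not something the slide states: that the partitions sum to the total, and that every entity carries exactly one `rdf:type` and one `rdfs:label`. The function name `check_contract` is hypothetical.

```python
# Counts copied from the npgg:affiliations description.
PROPERTY_TRIPLES = {
    "vcard:url": 326,
    "vcard:street-address": 82638,
    "vcard:region": 183483,
    "vcard:organisation-name": 694290,
    "vcard:locality": 412042,
    "vcard:email": 21650,
    "vcard:country-name": 0,
    "rdfs:label": 973208,
    "rdf:type": 973208,
}
TOTAL_TRIPLES = 3340845  # void:triples for the whole graph
ENTITIES = 973208        # void:entities for npg:Affiliation


def check_contract(property_triples, total_triples, entities):
    """Return a list of problems found; an empty list means consistent."""
    problems = []
    partition_sum = sum(property_triples.values())
    if partition_sum != total_triples:
        problems.append(
            f"property partitions sum to {partition_sum}, expected {total_triples}"
        )
    # Assumed rule: each entity has exactly one type and one label.
    for prop in ("rdf:type", "rdfs:label"):
        if property_triples.get(prop) != entities:
            problems.append(f"{prop} count differs from entity count {entities}")
    return problems


if __name__ == "__main__":
    print(check_contract(PROPERTY_TRIPLES, TOTAL_TRIPLES, ENTITIES))
```

For the figures on the slide both checks pass: the nine property counts sum to 3,340,845, matching `void:triples`.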

Linked Data API

• ./api/articles [.json, .rdf, .xml]
• ./api/articles?hasProduct.pcode=ng
• ./api/contributors?familyName=Smith
• ./api/products.json?pcode=ng&_page=2
• ./api/products?_view=none&_properties=pcode
• ./api/search?title=black+hole
• ./api/tree/subjects/children.xml?_sort=title
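
The request patterns above follow a regular shape: a resource path, an optional format extension, then filter parameters and underscore-prefixed control parameters. A minimal sketch of a URL builder for that shape follows. The helper name `api_url` is hypothetical, and the relative `./api/` base is kept as on the slide because the host is not given.

```python
from urllib.parse import urlencode


def api_url(resource, fmt="", params=None):
    """Build a relative API URL: ./api/<resource>[.<fmt>][?<query>]."""
    path = f"./api/{resource}"
    if fmt:
        path += f".{fmt}"
    # urlencode encodes spaces as '+', matching the search example.
    query = urlencode(params or {})
    return f"{path}?{query}" if query else path


if __name__ == "__main__":
    print(api_url("articles", params={"hasProduct.pcode": "ng"}))
    print(api_url("products", fmt="json", params={"pcode": "ng", "_page": 2}))
    print(api_url("search", params={"title": "black hole"}))
    print(api_url("tree/subjects/children", fmt="xml", params={"_sort": "title"}))
```

Passing parameters as a dict rather than keyword arguments allows dotted names such as `hasProduct.pcode`, which are not valid Python identifiers.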

Closing

Positions Available

goo.gl/bYIt8
www.linkedin.com/jobs?jobId=4890057&viewJob
