RDF/SPARQL: a UniProtKB/Swiss-Prot practical perspective

Jerven Bolleman, Developer, Swiss-Prot Group

Post on 22-Jan-2018

Our Goals

• Provide core bioinformatics resources

– UniProtKB/

– …

• Provide services and infrastructure

– Vital-IT: HPC for the life sciences

– …

Genetic Variations and Diseases in UniProtKB/Swiss-Prot: The Ins and Outs of Expert Manual Curation

Famiglietti et al.

We annotate a lot of disease/variants!

http://europepmc.org/abstract/MED/24848695

Why provide a public SPARQL endpoint

• A 10-man wet laboratory cannot afford:

– to host their own database in-house holding all, or even a bit of all, life science data.

– not to have access to, and use, existing life science information.

The right kind of optimisation

Not CPU time... but brain time

Why provide a public SPARQL endpoint

• Classical SQL can be provided on the web

– Is not practical

– No federation

– Poor standards conformance

• Local SQL is expensive

• Local JSON is no better

• Nor is local XML
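
The standards point can be made concrete. SPARQL queries travel over plain HTTP as defined by the SPARQL 1.1 Protocol, so any HTTP client can talk to any conforming endpoint. The following is a minimal Python sketch that builds (but does not send) such a request. The endpoint URL and the query are assumptions chosen for illustration and do not come from the slides.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint URL for illustration; it is not given in the slides.
ENDPOINT = "https://sparql.uniprot.org/sparql"


def build_sparql_request(endpoint, query):
    """Build a SPARQL 1.1 Protocol GET request; nothing is sent here."""
    # The protocol passes the query text in the "query" URL parameter.
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard JSON serialisation of SELECT/ASK results.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


# A deliberately generic query: any five subjects in the store.
request = build_sparql_request(ENDPOINT, "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5")
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it. The same function should work unchanged against any other endpoint that follows the protocol.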

Data Integration Traditional

[Diagram: Pathway.txt → Pathway Parser → Pathway Schema and UniProt.txt → UniProt Parser → UniProt Schema, combined with Own Lab Data in a Data warehouse, accessed via SQL queries. Six "$" markers appear on the diagram.]

Data Integration RDF/SPARQL

[Diagram: Pathway.rdf, UniProt.rdf and Own Lab Data loaded into a Triple Store, accessed via SPARQL Queries. Marked "$" and "$?".]
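
The RDF integration idea can be sketched in a few lines of Python. This is a toy model, not a real triple store: triples are plain tuples in a set, and every IRI, predicate and value below is hypothetical. It only illustrates why no per-source parser or schema mapping is needed when all sources use the same identifiers.

```python
# Toy illustration only: a "triple store" modelled as a set of
# (subject, predicate, object) tuples. All IRIs and values are hypothetical.
EX = "http://example.org/"
NAME, IN_PATHWAY, MEASURED = EX + "name", EX + "inPathway", EX + "measuredLevel"

uniprot_rdf = {(EX + "protein/P1", NAME, "Protein one"),
               (EX + "protein/P2", NAME, "Protein two")}
pathway_rdf = {(EX + "protein/P1", IN_PATHWAY, EX + "pathway/A"),
               (EX + "protein/P2", IN_PATHWAY, EX + "pathway/B")}
own_lab_data = {(EX + "protein/P1", MEASURED, 4.2)}

# Integration step: just a set union, because all three sources
# name the same protein with the same IRI.
store = uniprot_rdf | pathway_rdf | own_lab_data


def match(store, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return [t for t in store
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def measured_proteins_in(store, pathway):
    """Names and measured levels of proteins in the given pathway."""
    rows = []
    for protein, _, _ in match(store, p=IN_PATHWAY, o=pathway):
        for _, _, level in match(store, s=protein, p=MEASURED):
            for _, _, name in match(store, s=protein, p=NAME):
                rows.append((name, level))
    return sorted(rows)


print(measured_proteins_in(store, EX + "pathway/A"))
```

The nested pattern matches play the role that a SPARQL basic graph pattern would play in a real store. A production setup would load the RDF files into an actual triple store instead.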

Why not some other graph database?

Ecosystem

RDF enables sharing and reuse of data at low cost

Identity, Precision, Standards

Why provide a public SPARQL endpoint

• Document-centric REST is not enough

– Swiss-Prot available as REST

– (over e-mail !!) since 1986

– expasy.ch since 1993

– www.uniprot.org since 2002

• Most users use a GUI, not a CLI

• Developers build a GUI on a CLI

© 2015 SIB

Queries per month in 2015 (peak: 4 million per month)

[Chart: queries per month, 2015-01 to 2015-09; y-axis ticks at 100, 10'000 and 1'000'000; series: queries, ask, select, construct, describe.]
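
A breakdown by query form like the one in the chart could be produced from an endpoint's query log roughly as follows. This is a heuristic sketch, not the method used for the slide: the log format (month, query text) and the sample entries are hypothetical, and the form detection skips PREFIX/BASE declarations with a regular expression instead of a real SPARQL parser, so it ignores edge cases such as comments.

```python
import re
from collections import Counter

# Matches one leading PREFIX or BASE declaration (heuristic, not a full parser).
_PROLOGUE = re.compile(
    r"\s*(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s*<[^>]*>)", re.IGNORECASE)
QUERY_FORMS = {"select", "ask", "construct", "describe"}


def query_form(query):
    """Return 'select', 'ask', 'construct', 'describe', or 'other'."""
    rest = query
    match = _PROLOGUE.match(rest)
    while match:  # strip the prologue one declaration at a time
        rest = rest[match.end():]
        match = _PROLOGUE.match(rest)
    first_word = re.match(r"\s*([A-Za-z]+)", rest)
    form = first_word.group(1).lower() if first_word else ""
    return form if form in QUERY_FORMS else "other"


def count_by_month(log):
    """Count queries per (month, query form) pair."""
    return Counter((month, query_form(query)) for month, query in log)


# Hypothetical log entries: (month, query text).
log = [
    ("2015-01", "PREFIX ex: <http://example.org/> SELECT ?s WHERE { ?s ex:p ?o }"),
    ("2015-01", "ASK { ?s ?p ?o }"),
    ("2015-02", "describe <http://example.org/thing>"),
    ("2015-02", "BASE <http://example.org/> CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }"),
    ("2015-02", "select * where { ?s ?p ?o }"),
]
counts = count_by_month(log)
for (month, form), n in sorted(counts.items()):
    print(month, form, n)
```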

Real users

Mix between hard analytics and super specific queries

Estimate: somewhere between 400 and 1200 real humans per month

We know they are real because they take holidays ;)

Questions?