Introduction to RDF and the Semantic Web for the life sciences
Simon Jupp
Sample Phenotypes and Ontologies Team
European Bioinformatics Institute
Practical sessions
• Converting data to RDF
• Three questions
1. What types of things are in my data?
2. Can I identify these things?
3. How are these things related to other things?
Gene expression data example
| Experiment | Gene name | Ensembl id | organism | organism_part | expression | t-stat | p-value |
|---|---|---|---|---|---|---|---|
| E-TABM-865 | Ms4a1 | ENSMUSG00000024673 | mus musculus | liver | DOWN | -140.00183 | 8.40E-34 |
| E-TABM-865 | Ms4a1 | ENSMUSG00000024673 | mus musculus | spleen | UP | 140.00183 | 8.40E-34 |
| E-TABM-865 | Mttp | ENSMUSG00000028158 | mus musculus | liver | UP | 138.82608 | 8.40E-34 |
| E-TABM-865 | Mttp | ENSMUSG00000028158 | mus musculus | spleen | DOWN | -138.82608 | 8.40E-34 |
| E-TABM-865 | Akr1c14 | ENSMUSG00000033715 | mus musculus | liver | UP | 132.92674 | 1.69E-33 |
| E-TABM-865 | Akr1c14 | ENSMUSG00000033715 | mus musculus | spleen | DOWN | -132.92674 | 1.69E-33 |
| E-TABM-865 | Gulo | ENSMUSG00000034450 | mus musculus | liver | UP | 126.44113 | 4.51E-33 |
| E-TABM-865 | Gulo | ENSMUSG00000034450 | mus musculus | spleen | DOWN | -126.44113 | 4.51E-33 |
| E-TABM-865 | Marc1 | ENSMUSG00000026621 | mus musculus | liver | UP | 124.45381 | 4.66E-33 |
| E-TABM-865 | Marc1 | ENSMUSG00000026621 | mus musculus | spleen | DOWN | -124.45381 | 4.66E-33 |
| E-GEOD-2852 | Gulo | ENSRNOG00000016648 | rattus norvegicus | kidney | DOWN | -32.518154 | 1.09E-42 |
| E-GEOD-2852 | Gulo | ENSRNOG00000016648 | rattus norvegicus | liver | UP | 32.518154 | 1.09E-42 |
| E-GEOD-2852 | Akr1c14 | ENSRNOG00000017672 | rattus norvegicus | kidney | DOWN | -28.861328 | 2.29E-39 |
| E-GEOD-2852 | Akr1c14 | ENSRNOG00000017672 | rattus norvegicus | liver | UP | 28.861328 | 2.29E-39 |
| E-GEOD-2852 | Akr1c14 | ENSRNOG00000017672 | rattus norvegicus | kidney | DOWN | -16.854948 | 2.25E-25 |
| E-GEOD-2852 | Akr1c14 | ENSRNOG00000017672 | rattus norvegicus | liver | UP | 16.854948 | 2.25E-25 |
| E-GEOD-2852 | Amacr | ENSRNOG00000018662 | rattus norvegicus | kidney | DOWN | -6.296967 | 7.45E-08 |
| E-GEOD-2852 | Amacr | ENSRNOG00000018662 | rattus norvegicus | liver | UP | 6.296967 | 7.45E-08 |
What is it?
• What concepts do we have in this dataset?
• Some hints are already in the column names
Column names: Experiment, Gene name, Ensembl id, organism, organism_part, expression, t-stat, p-value
Exercise 1 – Concept maps
• Write down the concepts represented in this data (e.g. Experiment)
• Organise the concepts into a graph and write down some relationships between the concepts
Exercise 1 solution
• Concepts: Experiment, Expression Value, Experimental factor, Organism, T-statistic, P-value, Ensembl id, Gene name
• Relationships: Has result, Factor value (twice), T-statistic, P-value, Ensembl gene, Gene name
Congratulations on building your first Ontology!
Instance vs Types
• The world (of information) is made up of things and lots of them
• Instances, individuals, objects, tokens, particulars.
• The Earth is an instance of Planet
• Simon Jupp (NE 67 41 58 A) is a Person
• E-MTAB-62 is an instance of Experiment
• Your liver is an instance of Organ
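Exercise 2 – Identify Types vs Instance data

| Item | Instance | Type |
|---|---|---|
| Experiment | | Y |
| E-TABM-865 | | |
| Ms4a1 | | |
| Gene name | | |
| ensembl id | | |
| ENSMUSG00000024673 | | |
| organism | | |
| mus musculus | | |
| organism_part | | |
| liver | | |
| expression | | |
| DOWN | | |
| -140.00183 | | |
| t-stat | | |
| p-value | | |
| 8.40E-34 | | |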
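Exercise 2 solution

| Item | Instance | Type |
|---|---|---|
| Experiment | | Y |
| E-TABM-865 | Y | |
| Ms4a1 | Y | |
| Gene name | | Y |
| ensembl id | | Y |
| ENSMUSG00000024673 | Y | |
| organism | | Y |
| mus musculus | | Y |
| organism_part | | Y |
| liver | | Y |
| expression | | Y |
| DOWN | | Y |
| -140.00183 | Y | |
| t-stat | | Y |
| p-value | | Y |
| 8.40E-34 | Y | |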
Giving things identity
• Choose a URI scheme for resources.
• Re-use URIs for types of things where possible
• Shared URIs for the same things make integration happen
• General rule
1. If it’s your data, give it a URI in your namespace.
2. If it’s someone else's data (e.g. UniProt) use a URI from them (if they have one)
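Exercise 3 – your data vs shared data

| Item | Instance | Type | Mine |
|---|---|---|---|
| Experiment | | Y | |
| E-TABM-865 | Y | | |
| Ms4a1 | Y | | |
| Gene name | | Y | |
| ensembl id | | Y | |
| ENSMUSG00000024673 | Y | | N |
| organism | | Y | |
| mus musculus | | Y | |
| organism_part | | Y | |
| liver | | Y | |
| expression | | Y | |
| DOWN | | Y | |
| -140.00183 | Y | | |
| t-stat | | Y | |
| p-value | | Y | |
| 8.40E-34 | Y | | |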
Exercise 3 solution

| Item | Instance | Type | Mine |
|---|---|---|---|
| Experiment | | Y | N |
| E-TABM-865 | Y | | Y |
| Ms4a1 | Y | | N |
| Gene name | | Y | N |
| ensembl id | | Y | N |
| ENSMUSG00000024673 | Y | | N |
| organism | | Y | N |
| mus musculus | | Y | N |
| organism_part | | Y | N |
| liver | | Y | N |
| expression | | Y | N |
| DOWN | | Y | N |
| -140.00183 | Y | | Y |
| t-stat | | Y | N |
| p-value | | Y | N |
| 8.40E-34 | Y | | Y |

Types of things usually belong in external reference ontologies. It is good practice to connect your data to these ontologies.
Your data is usually the instance data (the experiment or the results)
URI for an instance
• Ensembl Gene ENSMUSG00000024673
• http://www.ensembl.org/Mus_musculus/Gene/Summary?g=ENSMUSG00000024673
• Is this a good URI?
• Is it stable? What does it represent?
• This is a URL for the web page, it may change
• It doesn’t return RDF
Identifiers.org
• http://identifiers.org
• Identifiers.org is a system providing resolvable persistent URIs used to identify data for the scientific community, with a current focus on the Life Sciences domain. The provision of resolvable identifiers (URLs) fits well with the Semantic Web vision and the Linked Data initiative.
Exercise 4
• Use the identifiers.org website to find the URI for ENSMUSG00000024673
Exercise 4 solution
• Search identifiers.org for ensembl
• Go to http://www.ebi.ac.uk/miriam/main/collections/MIR:00000003
• Find root URL
• http://identifiers.org/ensembl/ENSMUSG00000024673
• See what it resolves to
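The general rule (your own namespace for your data, someone else's URI for their data) can be sketched in a few lines of Python. This is only an illustration: the namespace http://www.mydomain.com/mydata# is a placeholder, and the helper names my_uri and ensembl_uri are made up for this example.

```python
from urllib.parse import quote

# Placeholder namespace for our own data (an assumption, not a real site).
MYDATA = "http://www.mydomain.com/mydata#"
# identifiers.org root URL for Ensembl records, as found in Exercise 4.
IDENTIFIERS_ENSEMBL = "http://identifiers.org/ensembl/"


def my_uri(local_id: str) -> str:
    """Mint a URI in our own namespace for something that is our data."""
    return MYDATA + quote(local_id, safe="")


def ensembl_uri(gene_id: str) -> str:
    """Reuse the identifiers.org URI for an Ensembl record we do not own."""
    return IDENTIFIERS_ENSEMBL + quote(gene_id, safe="")


print(my_uri("E-TABM-865"))
print(ensembl_uri("ENSMUSG00000024673"))
```

Percent-encoding the local part with quote keeps the URI valid if an identifier ever contains a space or other reserved character.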
URI for types
• Experimental factor “liver”
• “liver” is an organ. We would expect to find an ontology term that describes what a liver is
• BioPortal is a repository of biomedical ontologies
• https://bioportal.bioontology.org
Exercise 5
• Go to https://bioportal.bioontology.org and find ontologies that contain terms for “liver”, “spleen” and “kidney”
• Get the URIs for liver, spleen and kidney from the Experimental Factor Ontology (EFO)
Exercise 5 solution
• “liver”
• http://purl.obolibrary.org/obo/UBERON_0002107
• “spleen”
• http://purl.obolibrary.org/obo/UBERON_0002106
• “kidney”
• http://purl.obolibrary.org/obo/UBERON_0002113
Exercise 5 – Find URIs using BioPortal for types and identifiers.org for instances
Restrict types search to EFO, UBERON, SIO, OBI and EDAM Ontology

| Item | Instance | Type | Mine | URI |
|---|---|---|---|---|
| Experiment | | Y | N | |
| E-TABM-865 | Y | | Y | |
| Ms4a1 | Y | | N | |
| Gene name | | Y | N | |
| ensembl id | | Y | N | |
| ENSMUSG00000024673 | Y | | N | http://identifiers.org/ensembl/ENSMUSG00000024673 |
| organism | | Y | N | |
| mus musculus | | Y | N | |
| organism_part | | Y | N | |
| liver | | Y | N | http://purl.obolibrary.org/obo/UBERON_0002107 |
| expression | | Y | N | |
| DOWN | | Y | N | |
| -140.00183 | Y | | Y | |
| t-stat | | Y | N | |
| p-value | | Y | N | |
| 8.40E-34 | Y | | Y | |
Exercise 5 solution – find some more URIs

| Item | Instance | Type | Mine | URI |
|---|---|---|---|---|
| Experiment | | Y | N | http://www.ebi.ac.uk/efo/EFO_0004033 |
| E-TABM-865 | Y | | Y | N/A |
| Ms4a1 | Y | | N | N/A (this is just a label for the Ensembl gene) |
| Gene name | | Y | N | http://edamontology.org/data_2299 |
| ensembl id | | Y | N | http://edamontology.org/data_2610 |
| ENSMUSG00000024673 | Y | | N | http://identifiers.org/ensembl/ENSMUSG00000024673 |
| organism | | Y | N | http://purl.obolibrary.org/obo/OBI_0100026 |
| mus musculus | | Y | N | http://purl.obolibrary.org/obo/NCBITaxon_10090 |
| organism_part | | Y | N | http://www.ebi.ac.uk/efo/EFO_0000635 |
| liver | | Y | N | http://purl.obolibrary.org/obo/UBERON_0002107 |
| expression | | Y | N | http://edamontology.org/topic_0203 |
| DOWN | | Y | N | http://semanticscience.org/resource/SIO_001078 |
| -140.00183 | Y | | Y | N/A |
| t-stat | | Y | N | http://semanticscience.org/resource/SIO_001074 |
| p-value | | Y | N | http://semanticscience.org/resource/SIO_000765 |
| 8.40E-34 | Y | | Y | N/A |
Building the RDF graph
• We have identified our types with URIs
• We know what data is ours
• Now we need to translate each row in the file to an RDF representation using N-triples
• <Subject> <Predicate> <Object>
• Remember the Object can be a URI or a value
• For predicates create URIs in our own namespace
• http://www.mydomain.com/mydata#
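As a rough sketch of what one N-Triples statement looks like when written from a script: URIs are written in full inside angle brackets, values are written as quoted literals, and each statement ends with a full stop. The helper names (uri, literal, triple) and the predicate name t-stat are illustrative choices for this example, not part of any library.

```python
MYDATA = "http://www.mydomain.com/mydata#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EFO = "http://www.ebi.ac.uk/efo/"


def uri(value: str) -> str:
    """Write a URI term: the full URI inside angle brackets."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Write a plain literal: double-quoted, with special characters escaped."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'


def triple(subject: str, predicate: str, obj: str) -> str:
    """Join three already-formatted terms into one N-Triples statement."""
    return f"{subject} {predicate} {obj} ."


experiment = uri(MYDATA + "E-TABM-865")
result = uri(MYDATA + "result1")
lines = [
    # Object is a URI: the experiment is typed with an EFO class.
    triple(experiment, uri(RDF_TYPE), uri(EFO + "EFO_0004033")),
    # Object is a value: a predicate from our own namespace points to a literal.
    triple(result, uri(MYDATA + "t-stat"), literal("-140.00183")),
]
print("\n".join(lines))
```

Prefixed names such as rdf:type are a Turtle convenience; in N-Triples every URI is spelled out.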
Example row conversion to RDF
Data row: E-TABM-865 Ms4a1 ENSMUSG00000024673 mus musculus liver DOWN -140.00183 8.40E-34
Graph: E-TABM-865 --type--> Experiment

RDF Triples

| SUBJECT | PREDICATE | OBJECT |
|---|---|---|
| mydata:E-TABM-865 | rdf:type | efo:EFO_0004033 |
Example row conversion to RDF
Data row: E-TABM-865 Ms4a1 ENSMUSG00000024673 mus musculus liver DOWN -140.00183 8.40E-34
Graph: E-TABM-865 --type--> Experiment; E-TABM-865 --has result--> mydata:result1; mydata:result1 --type--> Down Expression Value

RDF Triples

| SUBJECT | PREDICATE | OBJECT |
|---|---|---|
| mydata:E-TABM-865 | rdf:type | efo:EFO_0004033 |
| mydata:E-TABM-865 | mydata:hasResult | mydata:result1 |
| mydata:result1 | rdf:type | sio:SIO_001078 |
Exercise 6
• Using the following schema write out some RDF in N-triples to represent this single row of data
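Schema:
• Experiment --has result--> Expression Value
• Expression Value --Factor value--> Experimental factor
• Expression Value --Factor value--> Organism
• Expression Value --T-stat--> T-statistic
• Expression Value --P-value--> P-value
• Expression Value --dbxref--> Ensembl id
• Ensembl id --label--> Gene name

Data row: E-TABM-865 Ms4a1 ENSMUSG00000024673 mus musculus liver DOWN -140.00183 8.40E-34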
Exercise 6 solution
Data row: E-TABM-865 Ms4a1 ENSMUSG00000024673 mus musculus liver DOWN -140.00183 8.40E-34

RDF Triples

| SUBJECT | PREDICATE | OBJECT |
|---|---|---|
| mydata:E-TABM-865 | rdf:type | efo:EFO_0004033 |
| mydata:E-TABM-865 | mydata:hasResult | mydata:result1 |
| mydata:result1 | rdf:type | sio:SIO_001078 |
| mydata:result1 | mydata:factorValue | obo:NCBITaxon_10090 |
| mydata:result1 | mydata:factorValue | obo:UBERON_0002107 |
| mydata:result1 | mydata:t-stat | "-140.00183" |
| mydata:result1 | mydata:p-value | "8.40E-34" |
| mydata:result1 | mydata:dbxref | identifiers:ENSMUSG00000024673 |
| identifiers:ENSMUSG00000024673 | rdfs:label | "Ms4a1" |
Generating RDF
• CSV2RDF
• OpenRefine
• Scripts
• Output serialised RDF
• Simple to print out N3 to files
• Use an RDF API
• Most programming languages will have RDF libraries
• Other options
• RDB2RDF: Work directly off your relational database
A simple CSV 2 RDF in Perl
• Example script data2rdf.pl
• Read input file (raw-data.csv)
• Convert rows into triple statements according to my schema
• Generate appropriate URIs for things
• Print out triple statement in simple N3 format
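The steps above can be sketched in Python as well. This is not the data2rdf.pl script from the course, only a minimal illustration of the same idea under several assumptions: the CSV header names match the column names on the slides, the predicate names follow the Exercise 6 solution, results are numbered result1, result2, and so on, and the lookup tables cover only the values in the sample.

```python
import csv
import io

MYDATA = "http://www.mydomain.com/mydata#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
EXPERIMENT = "http://www.ebi.ac.uk/efo/EFO_0004033"
ENSEMBL = "http://identifiers.org/ensembl/"
OBO = "http://purl.obolibrary.org/obo/"
SIO = "http://semanticscience.org/resource/"

# Lookup tables from cell values to the ontology URIs used in these slides.
EXPRESSION = {"DOWN": SIO + "SIO_001078", "UP": SIO + "SIO_001081"}
ORGANISM = {"mus musculus": OBO + "NCBITaxon_10090"}
ORGAN = {"liver": OBO + "UBERON_0002107", "spleen": OBO + "UBERON_0002106"}

# Two rows of the example data; the real input would be raw-data.csv.
SAMPLE = """Experiment,Gene name,Ensembl id,organism,organism_part,expression,t-stat,p-value
E-TABM-865,Ms4a1,ENSMUSG00000024673,mus musculus,liver,DOWN,-140.00183,8.40E-34
E-TABM-865,Ms4a1,ENSMUSG00000024673,mus musculus,spleen,UP,140.00183,8.40E-34
"""


def nt(subject, predicate, obj, is_literal=False):
    """Format one N-Triples statement; obj is a URI unless is_literal is set.

    Literal escaping is omitted because the sample values contain no quotes.
    """
    obj_term = '"%s"' % obj if is_literal else "<%s>" % obj
    return "<%s> <%s> %s ." % (subject, predicate, obj_term)


def rows_to_ntriples(handle):
    """Yield N-Triples lines for every data row read from a CSV file handle."""
    for number, row in enumerate(csv.DictReader(handle), start=1):
        experiment = MYDATA + row["Experiment"]
        result = MYDATA + "result%d" % number
        gene = ENSEMBL + row["Ensembl id"]
        # Repeated statements (e.g. the experiment type) are harmless in RDF.
        yield nt(experiment, RDF_TYPE, EXPERIMENT)
        yield nt(experiment, MYDATA + "hasResult", result)
        yield nt(result, RDF_TYPE, EXPRESSION[row["expression"]])
        yield nt(result, MYDATA + "factorValue", ORGANISM[row["organism"]])
        yield nt(result, MYDATA + "factorValue", ORGAN[row["organism_part"]])
        yield nt(result, MYDATA + "t-stat", row["t-stat"], is_literal=True)
        yield nt(result, MYDATA + "p-value", row["p-value"], is_literal=True)
        yield nt(result, MYDATA + "dbxref", gene)
        yield nt(gene, RDFS_LABEL, row["Gene name"], is_literal=True)


triples = list(rows_to_ntriples(io.StringIO(SAMPLE)))
print("\n".join(triples))
```

To run it on a real file, pass an open file handle instead of the io.StringIO wrapper and extend the lookup tables to cover every organism and organ in the data.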
Exercise 7
• Look at the N-Triples file generated (raw-data.rdf)
• See if you understand how that translates to the schema
• Convert this file to RDF/XML using an online converter
• http://www.rdfabout.com/demo/validator
Blank nodes (bnode)
• You can use an anonymous resource in RDF
• They can be the subject or object of any triple
• Denote the existence of a “thing” but you don’t have to explicitly give it a URI
• In our scenario we created a URI for the Gene expression value, we didn’t have to
• Using turtle syntax we could have said
mydata:E-TABM-865 rdf:type efo:EFO_0004033 .
mydata:E-TABM-865 mydata:hasResult [
    rdf:type sio:SIO_001078 ;
    mydata:factorValue obo:UBERON_0002107 ;
    mydata:t-stat "-140.00183"
] .
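The same idea can be sketched for N-Triples output, where a blank node is written as a label of the form _:name instead of a URI. The helper names new_bnode and result_triples and the b0, b1 numbering are illustrative assumptions; the labels only have meaning inside the file that contains them.

```python
import itertools

MYDATA = "http://www.mydomain.com/mydata#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SIO_DOWN = "http://semanticscience.org/resource/SIO_001078"
LIVER = "http://purl.obolibrary.org/obo/UBERON_0002107"

_counter = itertools.count()


def new_bnode() -> str:
    """Return a fresh blank node label such as _:b0."""
    return "_:b%d" % next(_counter)


def result_triples(experiment_uri: str, tstat: str) -> list:
    """Describe one expression result with a blank node instead of a minted URI."""
    node = new_bnode()
    return [
        "<%s> <%s> %s ." % (experiment_uri, MYDATA + "hasResult", node),
        "%s <%s> <%s> ." % (node, RDF_TYPE, SIO_DOWN),
        "%s <%s> <%s> ." % (node, MYDATA + "factorValue", LIVER),
        '%s <%s> "%s" .' % (node, MYDATA + "t-stat", tstat),
    ]


lines = result_triples(MYDATA + "E-TABM-865", "-140.00183")
print("\n".join(lines))
```

The trade-off is that a blank node cannot be referred to from outside the file, so nothing else can link to that result later.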
Querying RDF
• Specialised databases for indexing RDF graphs
• Stardog
• Apache Jena
• Sesame
• Virtuoso
• AllegroGraph
• OWLIM
OpenRDF Sesame
• http://www.openrdf.org
• OpenRDF Sesame is a de-facto standard framework for processing RDF data. This includes parsers, storage solutions (RDF databases a.k.a. triplestores), reasoning and querying, using the SPARQL query language. It offers a flexible and easy to use Java API that can be connected to all leading RDF storage solutions.
• Easy to deploy (Java servlet)
• Provides SPARQL endpoint and workbench for administration tasks
• Scalable to millions of triples
• Other more scalable implementations of the storage and inference layer available
• OWLIM
• Virtuoso
• Bigdata
The Sesame workbench
• We have a workbench online for you to play with
• http://goo.gl/K5wmIe
• (http://ec2-54-72-241-21.eu-west-1.compute.amazonaws.com/openrdf-workbench)
• Use this to create a repository
• Upload data
• Test queries
Exercise 8
• Create a new in-memory store repository for your data
Exercise 9
• Load RDF Data file (use raw-data.rdf from the Dropbox folder)
• Set Data format to N-Triples
• Set base URI to
• http://www.mydomain.com/mydata#
SPARQL endpoint
Exploring a SPARQL endpoint
• Show me some triples
• Select all data = not a very friendly query!
• Find the types of things
• http://www.w3.org/TR/rdf-sparql-query/
SELECT *
WHERE {
  ?subject ?predicate ?object
}

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?type
WHERE {
  ?subject rdf:type ?type
}
LIMIT 10
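Outside the workbench, a SPARQL endpoint is queried over HTTP: under the SPARQL Protocol the query text is sent in a parameter called query. The sketch below only builds the request URL and does not contact a server. The endpoint address is a made-up placeholder and query_url is a hypothetical helper name.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical endpoint address; replace it with your repository's URL.
ENDPOINT = "http://localhost:8080/openrdf-sesame/repositories/mydata"

QUERY = """PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?type
WHERE { ?subject rdf:type ?type }
LIMIT 10"""


def query_url(endpoint: str, query: str) -> str:
    """Build a SPARQL Protocol GET URL with the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = query_url(ENDPOINT, QUERY)
print(url)

# Decoding the URL again gives back exactly the query text that was sent.
print(parse_qs(urlsplit(url).query)["query"][0])
```

To actually run the query you could open this URL with urllib.request and an Accept header such as application/sparql-results+json, assuming the endpoint supports that result format.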
Describing a resource
• What is known about
• http://www.mydomain.com/mydata#E-TABM-865
DESCRIBE <http://www.mydomain.com/mydata#E-TABM-865>
Exercise 11 – SPARQL endpoint
• Try some of the previous queries on the SPARQL endpoint
• Explore clicking around URIs to follow links through the data
• Explore download formats
• SPARQL query results XML, JSON, CSV
Binding variables
• Get all things that are types of experiment
• Experiment URI http://www.ebi.ac.uk/efo/EFO_0004033
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
SELECT DISTINCT ?thing
WHERE {
  ?thing rdf:type efo:EFO_0004033
}
LIMIT 10
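To see what "binding a variable" means, here is a toy triple-pattern matcher in Python. It is a simplification for illustration, not how a triple store is implemented: the triples are a small hand-written sample using prefixed names as plain strings, and match is a hypothetical helper.

```python
# A tiny in-memory "graph": each triple is (subject, predicate, object).
TRIPLES = [
    ("mydata:E-TABM-865", "rdf:type", "efo:EFO_0004033"),
    ("mydata:E-GEOD-2852", "rdf:type", "efo:EFO_0004033"),
    ("mydata:result1", "rdf:type", "sio:SIO_001078"),
    ("mydata:E-TABM-865", "mydata:hasResult", "mydata:result1"),
]


def match(pattern, triples):
    """Return one dict of variable bindings per triple that fits the pattern.

    Terms starting with '?' are variables; any other term must match exactly.
    A variable used twice in the pattern must take the same value both times.
    """
    solutions = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            solutions.append(binding)
    return solutions


# Same shape as the query above: ?thing rdf:type efo:EFO_0004033
things = [b["?thing"] for b in match(("?thing", "rdf:type", "efo:EFO_0004033"), TRIPLES)]
print(things)
```

Each solution is a set of variable-to-value bindings, which is what the rows of a SPARQL result table represent.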
Exercise 12
• Write a SPARQL query to get the labels for all experiments (hint: Use the rdfs:label relation)
• Tip: Store SPARQL queries that work in a text file, easier to edit and re-use previous queries
Exercise 12 solution
• Select labels for all experiments
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
SELECT DISTINCT ?label
WHERE {
  ?thing rdf:type efo:EFO_0004033 .
  ?thing rdfs:label ?label
}
Exercise 13
• Explore the raw-data.rdf files and try and write a SPARQL query that would show you all the genes UP in “liver” samples
• Hint:
• UP = http://semanticscience.org/resource/SIO_001081
• “liver” = http://purl.obolibrary.org/obo/UBERON_0002107
Exercise 13 solution
• Get genes up regulated in liver samples
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mydata: <http://www.mydomain.com/mydata#>
PREFIX sio: <http://semanticscience.org/resource/>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT DISTINCT ?geneid ?label
WHERE {
  ?result mydata:dbXref ?geneid .
  ?geneid rdfs:label ?label .
  ?result rdf:type sio:SIO_001081 .
  ?result mydata:hasFactorValue obo:UBERON_0002107
}
Filtering SPARQL queries
• Restrict values in results from matches in the graph patterns
• String matching
• FILTER regex(?x, "pattern" [, "flags"])
• E.g. FILTER regex(?label, "E-TABM-865")
• Testing values
• FILTER (?tstat >0 24)
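As a rough analogy, a FILTER behaves like a test applied to each candidate result row: rows that fail the test are dropped. The Python sketch below mimics a regex filter with the "i" (case-insensitive) flag and a numeric comparison on a hand-made list of rows. The row data and helper names are illustrative, and Python's re module is not identical to the regular expression dialect SPARQL uses.

```python
import re

# Toy result rows, as if produced by matching a graph pattern.
ROWS = [
    {"label": "E-TABM-865", "tstat": -140.00183},
    {"label": "E-GEOD-2852", "tstat": 32.518154},
]


def regex_filter(rows, variable, pattern, flags=""):
    """Keep rows whose value for `variable` contains a match for `pattern`.

    Only the "i" flag (ignore case) is handled in this sketch.
    """
    re_flags = re.IGNORECASE if "i" in flags else 0
    return [row for row in rows if re.search(pattern, row[variable], re_flags)]


def value_filter(rows, test):
    """Keep rows for which the test function returns True."""
    return [row for row in rows if test(row)]


# Like: FILTER regex(?label, "geod", "i")
geod = regex_filter(ROWS, "label", "geod", "i")
# Like: FILTER (?tstat < 0)
negative = value_filter(ROWS, lambda row: row["tstat"] < 0)

print([row["label"] for row in geod])
print([row["label"] for row in negative])
```

Without the "i" flag the pattern "geod" would not match "E-GEOD-2852", because regex matching is case-sensitive by default.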
Exercise 14
• Get all experiments whose label contains “GEOD”
• Get all genes up regulated with a t-statistic < 0
Exercise 14 solutions
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
SELECT DISTINCT ?label
WHERE {
  ?thing rdf:type efo:EFO_0004033 .
  ?thing rdfs:label ?label .
  FILTER regex(?label, "geod", "i")
}

PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mydata: <http://www.mydomain.com/mydata#>
SELECT DISTINCT ?geneid ?label ?tstat
WHERE {
  ?result mydata:dbXref ?geneid .
  ?geneid rdfs:label ?label .
  ?result mydata:hasTStatistic ?tstat .
  FILTER (?tstat < 0)
}
Enriching data
• Our dataset is still a bit sparse
• e.g. no labels or descriptions for sample information
• We used URIs from external ontologies to define some concepts
• Let’s integrate our dataset with those ontologies and do some querying
Exercise 15
• Find the Experimental Factor Ontology (EFO) file
• You can get it from the Web, or use efo.owl in the course material
• Load the ontology file into the same repository as your raw data RDF
• Now describe the liver URI
• http://purl.obolibrary.org/obo/UBERON_0002107
• Create a SPARQL query to pull out labels for all of the factor values
Exercise 15 solution
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mydata: <http://www.mydomain.com/mydata#>
SELECT DISTINCT ?factor ?label
WHERE {
  ?result mydata:hasFactorValue ?factor .
  ?factor rdfs:label ?label
}

DESCRIBE <http://purl.obolibrary.org/obo/UBERON_0002107>
Exploiting knowledge
• As an ontology, EFO contains lots of biological domain knowledge
• E.g. classification of diseases, organism parts etc.
• We can exploit this knowledge to enhance queries over our datasets
• E.g. What are all the parent types (or categories) for liver in EFO?
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT DISTINCT ?parent ?label
WHERE {
  obo:UBERON_0002107 rdfs:subClassOf ?parent .
  ?parent rdfs:label ?label
}
Property paths
• We can query along paths of relations using SPARQL
• This is useful for exploiting transitive relationships
• Special SPARQL 1.1 syntax for property paths “*”
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX obo: <http://purl.obolibrary.org/obo/>
SELECT DISTINCT ?parent ?label
WHERE {
  obo:UBERON_0002107 rdfs:subClassOf* ?parent .
  ?parent rdfs:label ?label
}
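What the "*" path computes can be sketched in plain Python: start from a term and keep following subClassOf links, collecting everything reached in zero or more steps. The small hierarchy below is invented for illustration and is not the real EFO or UBERON classification; ancestors_or_self is a hypothetical helper.

```python
# Made-up subClassOf links: child -> list of direct parents.
SUBCLASS_OF = {
    "liver": ["abdomen organ"],
    "abdomen organ": ["organ"],
    "spleen": ["organ"],
    "organ": ["organism part"],
}


def ancestors_or_self(term, subclass_of):
    """Return every class reachable from term by zero or more subClassOf steps.

    "Zero steps" means the starting term itself is included, as with the
    SPARQL path rdfs:subClassOf*.
    """
    seen = [term]
    queue = [term]
    while queue:
        current = queue.pop(0)
        for parent in subclass_of.get(current, []):
            if parent not in seen:
                seen.append(parent)
                queue.append(parent)
    return seen


print(ancestors_or_self("liver", SUBCLASS_OF))
```

SPARQL 1.1 also offers "+" for one or more steps, which would leave the starting term out of the results.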
Exercise 16 – Ontology query
• Get all genes expressed in your data where the factor value is a child of “organism part” (efo:EFO_0000635)
Exercise 16 solution
• Get all genes expressed in your data where the factor value is a child of “organism part” (efo:EFO_0000635)
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX mydata: <http://www.mydomain.com/mydata#>
PREFIX efo: <http://www.ebi.ac.uk/efo/>
SELECT DISTINCT ?geneid ?label ?factor
WHERE {
  ?result mydata:dbXref ?geneid .
  ?geneid rdfs:label ?label .
  ?result mydata:hasFactorValue ?factor .
  ?factor rdfs:subClassOf* efo:EFO_0000635
}
End of 1st practical session
• Introduced modeling data in RDF
• Three questions I always ask of data
• What is it (types)?
• What is it (id)?
• What is it related to?
• Generating RDF statements in N-Triples
• Loading RDF into a triple store
• Basic querying with SPARQL