migrating to the semantic web: bioinformatics as a case study. phillip lord, dept of computer...
Post on 15-Jan-2016
![Page 1: Migrating to the Semantic Web: Bioinformatics as a case study. Phillip Lord, Dept of Computer Science, University of Manchester](https://reader036.vdocument.in/reader036/viewer/2022062309/56649d775503460f94a5944f/html5/thumbnails/1.jpg)
Migrating to the Semantic Web: Bioinformatics as a case study.
Phillip Lord,
Dept of Computer Science,
University of Manchester
What is the Semantic Web
OWL
RDF
XML
We are here!
The talk
• Three (and a half) example case studies.
• Two different technologies.
• Why we chose the different technologies.
RDF in a nutshell
Tim Berners-Lee’s original vision… 1989
OWL in a nutshell
The Motivation
“At the doctor’s office, Lucy instructed her semantic web agent. It promptly retrieved information about her Mom’s prescribed treatment, looked up a list of several providers within 20 miles of home, with a good trust rating.”
Scientific American, May 2001:
Beware of the Hype!
The Motivating Example
Lucy
Doctor
myGrid
• UK e-Science Pilot Project.
• Oct 2001 – April 2005.
• £3.4 million.
• £0.4 million studentships.
[Map: Newcastle, Nottingham, Manchester, Southampton, Hinxton, Sheffield]
Data(type)-intensive bioinformatics
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
OC   BACTERIA; FIRMICUTES; BACILLUS/CLOSTRIDIUM GROUP; BACILLACEAE;
OC   BACILLUS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
FT   ACT_SITE    116    116       BINDS PEP (BY SIMILARITY).
FT   CONFLICT    374    374       S -> A (IN REF. 3).
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP ATILANSEVT IEGLPEISDI ETLRDLLKEI
     GGNVHFENGE MVVDPTSMIS MPLPNGKVKK LRASYYLMGA MLGRFKQAVI GLPGGCHLGP
     RPIDQHIKGF EALGAEVTNE QGAIYLRAER LRGARIYLDV VSVGATINIM LAAVLAEGKT
     IIENAAKEPE IIDVATLLTS MGAKIKGAGT NVIRIDGVKE LHGCKHTIIP DRIEAGTFMI
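The entry on this slide mixes many kinds of data (identifiers, free-text descriptions, taxonomy, keywords, features, sequence) in one flat file, each line tagged with a two-letter code. As a rough illustration of what a program has to do before it can use such a record, here is a minimal Python sketch that groups lines by their line code. It assumes the record is laid out one line per code, as in the original flat file; the function name `parse_flat_record` and the abbreviated sample are illustrative assumptions, not part of the talk or of any myGrid tool.

```python
from collections import defaultdict

# Abbreviated copy of the slide's entry, one line per line code (assumed layout).
SAMPLE = """\
ID   MURA_BACSU     STANDARD;      PRT;   429 AA.
DE   PROBABLE UDP-N-ACETYLGLUCOSAMINE 1-CARBOXYVINYLTRANSFERASE
DE   (EC 2.5.1.7) (ENOYLPYRUVATE TRANSFERASE) (UDP-N-ACETYLGLUCOSAMINE
DE   ENOLPYRUVYL TRANSFERASE) (EPT).
GN   MURA OR MURZ.
OS   BACILLUS SUBTILIS.
KW   PEPTIDOGLYCAN SYNTHESIS; CELL WALL; TRANSFERASE.
SQ   SEQUENCE   429 AA;  46016 MW;  02018C5C CRC32;
     MEKLNIAGGD SLNGTVHISG AKNSAVALIP
"""


def parse_flat_record(text):
    """Group the lines of one flat-file entry by their two-letter line code.

    Lines that begin with spaces carry sequence data and are stored under "seq".
    """
    fields = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith(" "):
            # Sequence line: remove the spacing between 10-residue blocks.
            fields["seq"].append(line.replace(" ", ""))
        else:
            code, _, value = line.partition(" ")
            fields[code].append(value.strip())
    return dict(fields)


record = parse_flat_record(SAMPLE)
entry_name = record["ID"][0].split()[0]
description = " ".join(record["DE"])
keywords = [k.strip() for k in record["KW"][0].rstrip(".").split(";")]
```

Even after this step, every value is still an untyped string; nothing tells a program that `KW` holds keywords or that `seq` is a protein sequence. That gap is what the semantic annotation discussed later in the talk is meant to fill.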
[Figure: myGrid service stack, built on a Web Service (Grid Service) communication fabric]
Roles: Bioinformaticians; Tool Providers; Service Providers
Layers: Applications; Core services; External services
Components: AMBIT Text Extraction Service; Provenance; Personalisation; Event Notification; Gateway; Service and Workflow Discovery; myGrid Information Repository; Ontology Mgt; Metadata Mgt; Work bench; Taverna; Talisman; Native Web Services; SoapLab; Web Portal; Legacy apps; Registries; Ontologies; FreeFluo Workflow Enactment Engine; OGSA-DQP Distributed Query Processor; Views; GowLab
WBS Workflows:
[Figure: workflow diagram. Labels recovered from the figure are listed below.]
Data items: GenBank Accession No; GenBank Entry; Nucleotide seq (Fasta); Coding sequence; ORFs; URL inc GB identifier; Query nucleotide sequence
Services and what they report:
• Seqret
• GenScan
• prettyseq: Translation/sequence file. Good for records and publications
• restrict: Restriction enzyme map
• cpgreport: CpG Island locations and %
• RepeatMasker: Repetitive elements
• ncbiBlastWrapper: Blastn Vs nr, est databases
• sixpack: 6 ORFs
• transeq: Amino Acid translation
• epestfind: Identifies PEST seq
• pepcoil: Predicts Coiled-coil regions
• pepstats: MW, length, charge, pI, etc
• pscan: Identifies FingerPRINTS
• SignalP, TargetP, PSORTII: Predicts cellular location
• InterPro, PFAM, Prosite, Smart: Identifies functional and structural domains/motifs
• Pepwindow? Octanol?: Hydrophobic regions
• ncbiBlastWrapper: tblastn Vs nr, est, est_mouse, est_human databases. Blastp Vs nr
Other steps: Sort for appropriate Sequences only
Colour key: Pink: Outputs/inputs of a service. Purple: Tailor-made services. Green: Emboss soaplab services. Yellow: Manchester soaplab services. Grey: Unknowns.
Semantic discovery
• Query-ontology – discovering workflows and services described in the registry by building a query in Taverna.
• A common ontology is used to annotate and query.
• Example query: look for all workflows that accept an input of semantic type nucleotide sequence.
• Aim to have semantic discovery over a public view on the Web.
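To make the query bullet concrete, here is a minimal Python sketch of discovery by semantic type. It is not the myGrid registry or the Taverna query builder; the ontology fragment, the registry entries, and the function names (`ancestors`, `is_kind_of`, `discover`) are all hypothetical. The one design choice it illustrates is that a query for a type also matches entries annotated with a more specific type, because both annotation and query use the same ontology.

```python
# Hypothetical fragment of a shared ontology: child term -> parent term.
IS_A = {
    "nucleotide sequence": "biological sequence",
    "protein sequence": "biological sequence",
    "DNA sequence": "nucleotide sequence",
}

# Hypothetical registry entries, each annotated with the semantic type of its input.
REGISTRY = [
    {"name": "blast_search", "input_type": "nucleotide sequence"},
    {"name": "find_cpg_islands", "input_type": "DNA sequence"},
    {"name": "protein_stats", "input_type": "protein sequence"},
]


def ancestors(term):
    """Return every term reachable from `term` by following is-a links upwards."""
    found = []
    while term in IS_A:
        term = IS_A[term]
        found.append(term)
    return found


def is_kind_of(term, target):
    """True if `term` is `target` itself or one of its descendants."""
    return term == target or target in ancestors(term)


def discover(registry, wanted_input):
    """Names of registered workflows whose input type is a kind of `wanted_input`."""
    return [
        entry["name"]
        for entry in registry
        if is_kind_of(entry["input_type"], wanted_input)
    ]


matches = discover(REGISTRY, "nucleotide sequence")
```

A plain keyword search for "nucleotide sequence" would miss `find_cpg_islands`, which is annotated with the narrower term; the is-a lookup is what recovers it.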
Service annotation
• Adding structured metadata to a workflow registration to enable others to discover and reuse it more effectively. E.g. what semantic type of input does it accept.
Semantic Discovery
View annotations on workflow
Pedro data capture tool
Drag a workflow entry into the explorer pane and the workflow loads.
Drag a service/workflow to the scavenger window for inclusion into the workflow.
Biologist
Ontologist
Service Providers
Problems when doing In Silico Experiments
Experiments are performed repeatedly, at different sites, at different times, by different users or groups.
Scientists
In silico experiments:
A large repository of records about experiments!!
• verification of data;
• “recipes” for experiment designs;
• explanation for the impact of changes;
• ownership;
• performance of services;
• data quality;
The Current State of the Art
Tim Berners-Lee’s original vision… 1989
A Semantic Web of Provenance
[Figure: web of linked provenance documents. Labels recovered from the figure are listed below.]
• Question labels: what; why; how; who; how/which/when/where
• DAML+OIL Ontologies linking provenance documents
• Literature relevant to provenance study or data in this workflow
• Experiment Notes
• Interlinking graph of the workflow that generates the provenance logs
• Web page of people who have related interests as the owner of the workflow
• Provenance record of a workflow run
• Document formats shown: XML, HTML
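The idea of a web of provenance is that separate records (a workflow run, its outputs, its owner, related literature) become statements that link to each other, so the who/what/how/when questions turn into graph lookups. The Python sketch below shows that idea with plain subject-predicate-object tuples. It is only an illustration under stated assumptions: every identifier and predicate name (`ex:run42`, `ex:launchedBy`, and so on) is invented here, and it does not reproduce the myGrid provenance schema or any RDF library.

```python
# Hypothetical provenance statements as (subject, predicate, object) triples.
TRIPLES = [
    ("ex:run42", "ex:ranWorkflow", "ex:wbs_workflow"),
    ("ex:run42", "ex:launchedBy", "ex:alice"),
    ("ex:run42", "ex:startedAt", "2004-03-01T10:15:00"),
    ("ex:run42", "ex:hasOutput", "ex:blast_report_7"),
    ("ex:blast_report_7", "ex:producedBy", "ex:ncbiBlastWrapper"),
    ("ex:wbs_workflow", "ex:relatedLiterature", "ex:paper_1"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None acts as a wildcard."""
    return [
        t
        for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def objects(triples, s, p):
    """Convenience lookup: all objects for a given subject and predicate."""
    return [o for _, _, o in match(triples, s=s, p=p)]


# "who" launched the run?
who = objects(TRIPLES, "ex:run42", "ex:launchedBy")

# "how" was each output produced? Follow a link from the run to its outputs.
how = {
    output: objects(TRIPLES, output, "ex:producedBy")
    for output in objects(TRIPLES, "ex:run42", "ex:hasOutput")
}
```

Because every statement uses the same identifiers, a record written by one tool can be followed from a record written by another, which is the linking role the slide gives to the ontologies.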
Population Semantic Data
Web Services
Taverna
FreeFluo
Metadata Repository
Data Repository
LaunchPad Haystack
Haystack from IBM
Biologist
Biologist
Database
Biologist
Gene Ontology Next Generation Project (GONG)
• Demonstrate the utility of finer grained concept descriptions in DAML+OIL (OWL-DL)
• Develop methodologies and tools to support the process
Translating theory into practice
• Gene Ontology provides a service to the model organism database community
• Description logic (DL) is a technology born out of computer science research
• OWL is a standard ontology interchange language underpinned by DL
GONG - proof of concept
• Maintaining an exhaustive is-a structure
GO concept
Is-a relationship
Parent
Axis 1:
Chemicals
[chemical] biosynthesis (GO:0009058)
[i] carbohydrate biosynthesis (GO:0016051)
[i] aminoglycan biosynthesis (GO:0006023)
[i] heparin biosynthesis (GO:0030210)
Example: heparin biosynthesis
Axis 1:
Chemicals
Axis 2:
Process
[chemical] biosynthesis (GO:0009058)
[i] carbohydrate biosynthesis (GO:0016051)
[i] aminoglycan biosynthesis (GO:0006023)
[i] heparin biosynthesis (GO:0030210)
[i] heparin metabolism (GO:0030202)
[i] heparin biosynthesis (GO:0030210)
Example: heparin biosynthesis
Axis 1:
Chemicals
Axis 2:
Process
[chemical] biosynthesis (GO:0009058)
[i] carbohydrate biosynthesis (GO:0016051)
[i] aminoglycan biosynthesis (GO:0006023)
[i] heparin biosynthesis (GO:0030210)
[i] glycosaminoglycan biosynthesis (GO:0006024)
[i] heparin metabolism (GO:0030202)
[i] heparin biosynthesis (GO:0030210)
Example: heparin biosynthesis
Is this important?
• Missing is-a not noticed by users
• BUT… improves fidelity of DB record retrieval.
– Asking for gene products involved in ‘glycosaminoglycan biosynthesis’ will lead to an additional result:
O94923 SPTr ISS - D-glucuronyl C5-epimerase (Fragment)
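The effect of a missing is-a link on retrieval can be shown with a small Python sketch. A query for a term normally returns gene products annotated to that term or to any of its descendants, so a missing link silently hides results. The hierarchy fragment follows the heparin example from the earlier slides; the parent chosen for glycosaminoglycan biosynthesis, the annotation of O94923 to heparin biosynthesis, and the function names are assumptions made for illustration.

```python
def ancestors(term, is_a):
    """All terms reachable from `term` through is-a links (transitive closure)."""
    seen = set()
    stack = [term]
    while stack:
        for parent in is_a.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def gene_products(query, annotations, is_a):
    """Gene products annotated to `query` or to any of its descendants."""
    return sorted(
        product
        for product, term in annotations
        if term == query or query in ancestors(term, is_a)
    )


# is-a links before the missing link is added (child -> set of parents).
BEFORE = {
    "heparin biosynthesis": {"aminoglycan biosynthesis", "heparin metabolism"},
    # Assumed parent, for illustration only.
    "glycosaminoglycan biosynthesis": {"aminoglycan biosynthesis"},
    "aminoglycan biosynthesis": {"carbohydrate biosynthesis"},
    "carbohydrate biosynthesis": {"biosynthesis"},
}

# The same hierarchy with the new link in place.
AFTER = {
    **BEFORE,
    "heparin biosynthesis": BEFORE["heparin biosynthesis"]
    | {"glycosaminoglycan biosynthesis"},
}

# Assumed annotation: the slide implies O94923 is annotated to heparin biosynthesis.
ANNOTATIONS = [("O94923", "heparin biosynthesis")]

without_link = gene_products("glycosaminoglycan biosynthesis", ANNOTATIONS, BEFORE)
with_link = gene_products("glycosaminoglycan biosynthesis", ANNOTATIONS, AFTER)
```

With the `BEFORE` hierarchy the query returns nothing; with `AFTER` it returns O94923, which is the additional result the slide refers to.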
Paraphrased reasoning process
• heparin biosynthesis
– class heparin biosynthesis defined
subClassOf biosynthesis
restriction onProperty acts_on hasClass heparin
• glycosaminoglycan biosynthesis
– class glycosaminoglycan biosynthesis defined
subClassOf biosynthesis
restriction onProperty acts_on hasClass glycosaminoglycan
Is-a (asserted): heparin is-a glycosaminoglycan
Inferring a new is-a link
• heparin biosynthesis
– class heparin biosynthesis defined
subClassOf biosynthesis
restriction onProperty acts_on hasClass heparin
• glycosaminoglycan biosynthesis
– class glycosaminoglycan biosynthesis defined
subClassOf biosynthesis
restriction onProperty acts_on hasClass glycosaminoglycan
Is-a (asserted): heparin is-a glycosaminoglycan
Is-a (inferred): heparin biosynthesis is-a glycosaminoglycan biosynthesis
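The inference on this slide can be paraphrased in code. The Python sketch below is not a description logic reasoner and is not how OilEd or its reasoner works internally; it is a structural check that only handles the single pattern used here, a defined class made of a named superclass plus one property restriction. The `Definition` dataclass and the function names are hypothetical, and the asserted chemical hierarchy is assumed to be acyclic.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Definition:
    genus: str   # named superclass, e.g. "biosynthesis"
    prop: str    # restricted property, e.g. "acts_on"
    filler: str  # class the property points at, e.g. "heparin"


# Asserted is-a links between chemicals (child -> set of parents).
ASSERTED_IS_A = {"heparin": {"glycosaminoglycan"}}

# The two defined classes from the slide.
DEFINITIONS = {
    "heparin biosynthesis": Definition("biosynthesis", "acts_on", "heparin"),
    "glycosaminoglycan biosynthesis": Definition(
        "biosynthesis", "acts_on", "glycosaminoglycan"
    ),
}


def subsumed_by(child, parent, is_a):
    """True if `child` is `parent` or reaches it through asserted is-a links."""
    if child == parent:
        return True
    return any(subsumed_by(p, parent, is_a) for p in is_a.get(child, ()))


def infer_is_a(definitions, is_a):
    """Infer is-a links between defined classes.

    One defined class is a kind of another when its genus and its filler are
    each subsumed by the other's, and both restrict the same property.
    """
    inferred = set()
    for sub_name, sub in definitions.items():
        for sup_name, sup in definitions.items():
            if sub_name == sup_name:
                continue
            if (
                sub.prop == sup.prop
                and subsumed_by(sub.genus, sup.genus, is_a)
                and subsumed_by(sub.filler, sup.filler, is_a)
            ):
                inferred.add((sub_name, sup_name))
    return inferred


new_links = infer_is_a(DEFINITIONS, ASSERTED_IS_A)
```

Here `new_links` contains the single pair linking heparin biosynthesis to glycosaminoglycan biosynthesis. The link follows from the definitions and the one asserted fact about the chemicals; nobody has to place the process term in the hierarchy by hand.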
Results
• Carbohydrate metabolism, ~250 concepts
– 22 additional is-a links, 17 of which are now in GO
• Amino acid metabolism, ~250 concepts
– A further 17 additional is-a links, now in GO
• GO team will be reviewing results for metabolism as a whole once we have the tools to support the process
• Useful results come from even a partial coverage
Build a practical environment
• Tools needed for:
– Creating OWL definitions
– Tracking changes
– Reporting reasoning results
– Viewing definitions
Reporting tools
OWL for GONG
Biologist
Ontologist
Conclusions
• Three problems, three different solutions, all making use of semantic web technologies.
• A little semantics can go a long way.
• The expressivity of the language has to be chosen at least in part based on the tasks to be performed, and the user base.
• Tools, tools, tools.
Acknowledgments
• Jane Lomax and Midori Harris of the GO editorial team for help and advice and responding to the suggested changes
• UMLS and MeSH which provided valuable resources for chemical information
• Sean Bechhofer for development on OilEd
• Project funded as a subcontract of the DARPA DAML programme
Chris Wroe, Robert Stevens, Carole Goble
University of Manchester, UK
Michael Ashburner
EBI, Hinxton, UK
Acknowledgements
myGrid is an EPSRC funded UK eScience Program Pilot Project
Particular thanks to the other members of the Taverna project, http://taverna.sf.net
myGrid People
Core
• Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizaukaus, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users
• Simon Pearce and Claire Jennings, Institute of Human Genetics School of Clinical Medical Sciences, University of Newcastle, UK
• Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
Postgraduates
• Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial
• Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
• Robin McEntire (GSK)
Collaborators
• Keith Decker