![Page 1: ART: ontology based annotation of · SAPIENT demo - by Maria Liakata. 3 Ontologies An ontology is “a concise and unambiguous description of what principle entities are relevant](https://reader033.vdocument.in/reader033/viewer/2022052804/605052ec1268864a9878a457/html5/thumbnails/1.jpg)
Larisa Soldatova
RCUK Fellow, The Department of Computer Science, The University of Wales, Aberystwyth
ART: ontology based annotation of scientific papers
Manchester, December 2, 2008
Plan of the talk:
1. Introduction to ontology.
2. An example: classification of biomedical text by Hagit Shatkay.
3. The Robot Scientist project and EXPO.
4. LABORS, EXACT (protocols), DD (drug discovery), OntoDM (data mining).
5. The ART project.
6. SAPIENT demo - by Maria Liakata.
Ontologies
An ontology is “a concise and unambiguous description of what principal entities are relevant to an application domain and the relationship between them”*.
*Schulze-Kremer, S., 2001, Computer and Information Sci. 6(21)
Soldatova, UWA
1. Introduction
Ontology parts
• Classes and instances;
• Is-a relations;
• Other relations (part-of, located-in, has-agent);
• Axioms.
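As a rough illustration of how these parts fit together, the Python sketch below models classes, instances, is-a links, other relations and one axiom. The container `TinyOntology` and all example terms are hypothetical and chosen only for illustration; a real ontology would normally be written in a dedicated language such as OWL.

```python
from dataclasses import dataclass, field


@dataclass
class TinyOntology:
    """Toy container for the ontology parts listed above (not a real ontology API)."""
    is_a: dict = field(default_factory=dict)        # class -> parent class
    instances: dict = field(default_factory=dict)   # instance -> class
    relations: set = field(default_factory=set)     # (subject, relation, object)

    def ancestors(self, cls):
        """Follow is-a links upwards and return every superclass of cls."""
        found = []
        while cls in self.is_a:
            cls = self.is_a[cls]
            found.append(cls)
        return found

    def is_instance_of(self, instance, cls):
        """An instance belongs to its own class and to every superclass of it."""
        own = self.instances[instance]
        return cls == own or cls in self.ancestors(own)

    def part_of_is_irreflexive(self):
        """Example axiom: nothing is a part of itself."""
        return all(s != o for s, r, o in self.relations if r == "part-of")


# Hypothetical example terms.
onto = TinyOntology()
onto.is_a["yeast cell"] = "cell"
onto.is_a["cell"] = "material entity"
onto.instances["cell_001"] = "yeast cell"
onto.relations.add(("nucleus", "part-of", "cell"))
onto.relations.add(("nucleus", "located-in", "cell"))

print(onto.ancestors("yeast cell"))
print(onto.is_instance_of("cell_001", "cell"))
print(onto.part_of_is_irreflexive())
```

The axiom is written here as an ordinary function; in an ontology language it would be a formal statement that a reasoner can check.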
The EXACT description of protocols
Ontologies in life sciences: positive examples
• OBI (biomedical investigations): http://obi-ontology.org/page/Main_Page
• FMA (Foundational Model of Anatomy ontology): http://sig.biostr.washington.edu/projects/fm/AboutFM.html
• MSI for metabolomics experiments*: http://msi-ontology.sourceforge.net/
* Sansone, S., Schober, D., Atherton, H.J., Fiehn, O., Jenkins, H., Rocca-Serra, Ph., Rubtsov, D.V., Spasic, I., Soldatova, L.N., Taylor Ch., Tseng, A., Viant, M.R. and the Ontology Working Group Members. (2007) Metabolomics Standards Initiative - Ontology Working Group. Work in Progress. Metabolomics 3/3: 249-256.
Ontologies in life sciences: negative examples
• MGED ontology for microarray experiments*
• mmCIF for protein data bank (PDB)**
*Soldatova, L.N., King, R.D., (2005) Are the Current Ontologies used in Biology Good Ontologies? Nature Biotechnology 9/23: 1096-1098.
Soldatova, LN & King, RD. (2006) Reply to Wrestling with SUMO and bio-ontologies. Nature Biotechnology. 24/ 23.
** Schierz, A.C., Soldatova, L.N. and King, R.D. (2007) Overhauling the PDB. Nature Biotechnology 25/4: 437-442.
Schierz, A.C., Soldatova, L.N. and King, R.D. (2007) The reply: Overhauling the PDB. Nature Biotechnology 25/8: 846.
An example: classification of biomedical text by Hagit Shatkay.
• Focus (scientific, generic, methodology);
• Polarity (affirmative/negative);
• Certainty (0-3);
• Evidence (E0-E3);
• Direction/Trend (increase/decrease).
Shatkay et al. (2008) Multidimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users. Bioinformatics 24/18: 2086-2093
Problems?
• Polarity, Certainty and Direction/Trend are properties of some entities;
• The values scientific, generic and methodology have overlapping semantics;
• Evidence reinvents the wheel: see ECO (evidence codes) http://www.obofoundry.org/cgi-bin/detail.cgi?id=evidence_code
ECO vs Hagit
E0 – no stated evidence or lack of evidence
E1 – mentions of evidence with no explicit reference
E2 – statement is backed by a reference to a supporting publication
E3 – experimental evidence is directly given in the text
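To make the scheme concrete, the sketch below encodes the evidence levels E0-E3 together with the other dimensions of the classification as a small Python data structure. The class and field names (`Evidence`, `FragmentLabel`, `trend`) and the example label are assumptions made for illustration; they are not taken from Shatkay et al. or from ECO.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Evidence(Enum):
    """Evidence levels E0-E3, worded as on the slide."""
    E0 = "no stated evidence or lack of evidence"
    E1 = "mentions of evidence with no explicit reference"
    E2 = "statement is backed by a reference to a supporting publication"
    E3 = "experimental evidence is directly given in the text"


FOCUS_VALUES = {"scientific", "generic", "methodology"}
POLARITY_VALUES = {"affirmative", "negative"}
TREND_VALUES = {"increase", "decrease", None}


@dataclass(frozen=True)
class FragmentLabel:
    """Label of one text fragment along the five dimensions (hypothetical structure)."""
    focus: str
    polarity: str
    certainty: int               # integer in the range 0-3
    evidence: Evidence
    trend: Optional[str] = None  # direction/trend, if any

    def __post_init__(self):
        # Reject values outside the ranges listed on the slide.
        if self.focus not in FOCUS_VALUES:
            raise ValueError(f"unknown focus: {self.focus}")
        if self.polarity not in POLARITY_VALUES:
            raise ValueError(f"unknown polarity: {self.polarity}")
        if self.certainty not in range(4):
            raise ValueError(f"certainty must be 0-3, got {self.certainty}")
        if self.trend not in TREND_VALUES:
            raise ValueError(f"unknown trend: {self.trend}")


# Hypothetical label for an imagined fragment.
label = FragmentLabel("scientific", "affirmative", 3, Evidence.E2, "increase")
print(label.evidence.name, "-", label.evidence.value)
```

Keeping each dimension as a separate field makes it visible that Polarity, Certainty and Direction/Trend are properties attached to a fragment rather than independent classes.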
Example definition of a relation:
is_concretization_of relates a generically dependent continuant to a specifically dependent continuant. A generically dependent continuant may inhere in more than one entity. It does so by virtue of the fact that there is, for each entity that it inheres, a specifically dependent *concretization* of the generically dependent continuant that is specifically dependent.
2. Ontology of scientific experiments EXPO
EXPO* v.1
Concepts: 220
Language: OWL
http://sourceforge.net/projects/expo
Tool: Hozo Ontology Editor
*Soldatova, LN & King, RD (2006) An Ontology of Scientific Experiments. Journal of the Royal Society Interface 3/11: 795-803.
2. EXPO
EXPO concepts
Solenodons
The paper investigates the phylogenetic status of the mammalian species Solenodon cubanus and Solenodon paradoxus, i.e. the evolutionary relationship of these animals with all others.
Solenodons have been isolated since the age of the dinosaurs!
Scientific Experiment: Hypothesis-forming, Hypothesis-driven
Admin info about experiment:
Title: Mesozoic Origin of West Indian Insectivores
Author: Roca, A.L., Bar-Gal, G.K., Eizirik, E., Helgen, M.K., ……
Organisation: 1. National Cancer Institute, Frederick, USA …
Status: public academic
Reference: Roca, A.L., Bar-Gal, G.K., Eizirik, E., Helgen, M.K., Maria, R. et al. Mesozoic origin for West Indian insectivores. Nature, 429, 649-651 (2004).
Classification of experiment:
Taxonomy: DDC (Dewey): 575 Evolution and Genetics; Library of Congress: QH 367.5 molecular phylogenetics
Zoology: DDC (Dewey): 599 mammalogy; Library of Congress: QL351-QL352 Zoology-Classification
Experimental goal: To discover the phylogeny of the species Solenodon paradoxus and Solenodon cubanus
Null hypothesis H01: explicit
Representation style: text
Linguistic expression: natural language: “Some have suggested a close relationship to soricids (shrews) but not to talpids”
Linguistic expression: artificial language: predicate calculus
…………………………
experimental action 1.1.1 extraction and purification
object: sample of DNA
parent group: DNA from Solenodon paradoxus
sampling: random sampling
instrument: Qiagen DNA cleanup kit
experimental action 1.1.2 DNA amplification
…………………………
Experimental Conclusions (Formed Hypotheses)
C1) Hypothesis
Representation style: text
Linguistic expression: natural language: There existed a mammal that is the ancestor of: Solenodons, Soricoidea, Talpoidea, Erinaceidea, and which is not the ancestor of any other mammal.
Linguistic expression: artificial language: predicate calculus
…………………………
Prolog:
% Query: is An an ancestor of So and Sh but not of T?
?- instantiation(solenodon, So), instantiation(soricoidea, Sh), instantiation(talpoidea, T), instantiation(mammalia, An), shared_ancestor([So, Sh], [T], An).
% shared_ancestor(Shared, Not_shared, An): An is an ancestor of every member of Shared and of no member of Not_shared.
shared_ancestor([X], [Y], An) :- ancestor(An, X), \+ ancestor(An, Y).
shared_ancestor([X|Lx], Ly, An) :- shared_ancestor(Lx, Ly, An), ancestor(An, X).
shared_ancestor(Lx, [Y|Ly], An) :- shared_ancestor(Lx, Ly, An), \+ ancestor(An, Y).
EXPO: A scientific experiment is a research method which permits the investigation of cause-effect relations between known and unknown (target) variables of the field of study (domain). An experimental result cannot be known with certainty in advance.
EXPO: A classification of experiments is a hierarchical system of categories – types of experiments – according to their domains or used models of experiments.
EXPO: A null hypothesis is an experimental hypothesis that states that a known controlled variable or variables does not have a specified effect on the unknown (target) variable or variables of the domain.
XML:
</rdfs:Class>
<rdfs:Class rdf:ID="classification of experiments">
<rdfs:label>classification of experiments</rdfs:label>
<rdfs:subClassOf rdf:resource="#classification" />
<rdfs:comment>Def: A classification of experiments is a hierarchical system of categories - types of experiments - according to their domains or used models of experiments. Axiom: </rdfs:comment>
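A class definition of this kind can be read back with a standard XML parser. The sketch below is only an illustration: it wraps one class in a complete `rdf:RDF` element with namespace declarations (an assumption, since the excerpt above is a fragment), and it writes the identifier with underscores because the RDF/XML syntax expects `rdf:ID` values without spaces. The function name `read_classes` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumed wrapper: the excerpt in the slides is a fragment, so a complete,
# well-formed document is constructed here for illustration only.
DOC = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:rdfs="{RDFS}">
  <rdfs:Class rdf:ID="classification_of_experiments">
    <rdfs:label>classification of experiments</rdfs:label>
    <rdfs:subClassOf rdf:resource="#classification" />
    <rdfs:comment>Def: A classification of experiments is a hierarchical system of categories - types of experiments - according to their domains or used models of experiments.</rdfs:comment>
  </rdfs:Class>
</rdf:RDF>"""


def read_classes(xml_text):
    """Return {class id: {label, parent, comment}} for every rdfs:Class element."""
    root = ET.fromstring(xml_text)
    classes = {}
    for cls in root.findall(f"{{{RDFS}}}Class"):
        parent = cls.find(f"{{{RDFS}}}subClassOf")
        classes[cls.get(f"{{{RDF}}}ID")] = {
            "label": cls.findtext(f"{{{RDFS}}}label"),
            "parent": parent.get(f"{{{RDF}}}resource") if parent is not None else None,
            "comment": cls.findtext(f"{{{RDFS}}}comment"),
        }
    return classes


for class_id, info in read_classes(DOC).items():
    print(class_id, "is-a", info["parent"])
```

A dedicated RDF library would be the more usual choice for real ontologies; the standard-library parser is used here only to keep the example self-contained.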
Problems Highlighted by Annotation
The use of EXPO makes explicit the different hypotheses described in the paper. The research conclusions are not mentioned as hypotheses in the text. This contrasts with seven null-hypotheses mentioned explicitly in the main text.
The DNA sequences produced during the experiment were stored in the EMBL database using the taxonomic term “Insectivora”. This taxon is now generally recognised to be polyphyletic, and its use contradicts the actual conclusions of the paper.
The authors’ conclusion: “Cuban Solenodons should be classified in a distinct genus, Atopogale”. Our analysis shows that it would be more internally consistent to classify Cuban Solenodons as a distinct family.
etc…..
EXPO dissemination:
Soldatova, LN & King, RD. (2006) An Ontology of Scientific Experiments. Journal of the Royal Society Interface 3/11: 795-803.
2006 nomination for World Technology Award (software).
Articles in the New Scientist and the Chronicle of Higher Education.
The Concept of a Robot Scientist:
The robot scientist project aims to develop a computer system that is capable of originating its own experiments, physically doing them, interpreting the results, and then repeating the cycle.
[Figure: the robot scientist cycle. Diagram labels: Background Knowledge, Hypothesis Formation, Consistent Hypotheses, Experiment selection, Robot, Experiment, Results Interpretation, Analysis, Final Theory]
*King et al. (2004) Nature, 427, 247-252.
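The cycle of originating experiments, doing them, interpreting the results and repeating can be sketched as a simple loop. The code below is a toy illustration under stated assumptions: the function names, the gene labels `g1` to `g4` and the "split the hypotheses evenly" selection heuristic are all hypothetical, and they are not the selection strategy or the software described in King et al.

```python
def robot_scientist_cycle(hypotheses, experiments, run_experiment, predict):
    """Repeat: select an experiment, run it, keep only the consistent hypotheses."""
    consistent = list(hypotheses)
    remaining = list(experiments)
    while len(consistent) > 1 and remaining:
        # Experiment selection (assumed heuristic): prefer the experiment whose
        # predicted outcomes split the surviving hypotheses most evenly.
        def split_quality(exp):
            positives = sum(1 for h in consistent if predict(h, exp))
            return abs(2 * positives - len(consistent))

        exp = min(remaining, key=split_quality)
        remaining.remove(exp)
        result = run_experiment(exp)  # carried out physically by the robot
        # Results interpretation: drop hypotheses that predicted otherwise.
        consistent = [h for h in consistent if predict(h, exp) == result]
    return consistent


# Toy domain: exactly one of four hypothetical genes is needed for growth.
HIDDEN_TRUTH = "g3"
genes = ["g1", "g2", "g3", "g4"]


def predict(hypothesis, knocked_out_gene):
    """Hypothesis 'gene X is needed' predicts no growth only when X is removed."""
    return hypothesis == knocked_out_gene


def run_experiment(knocked_out_gene):
    """Stand-in for the laboratory robot: True means growth stopped."""
    return knocked_out_gene == HIDDEN_TRUTH


final_theory = robot_scientist_cycle(genes, genes, run_experiment, predict)
print(final_theory)
```

The loop stops when a single hypothesis remains or no experiments are left, which mirrors the idea of a final theory emerging from repeated cycles.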
The Application Domain
Systems Biology
Yeast (S. cerevisiae) – best understood Eukaryotic organism.
Strain libraries, e.g. EUROFAN 2 has knocked out each of the 6,000 genes.
Task to learn models of yeast metabolism using selected mutant strains and quantitative growth experiments.
Soldatova et al., UWA
The Robot During Commissioning
3. The ART project (an ontology based ARticle preparation Tool)
• Translating scientific papers into a format with an explicit semantics.
• Explicit linking of repository papers to data and metadata.
• Creation of an example intelligent digital repository.
* Soldatova, L.N., Batchelor, C.R., Liakata, M., Fielding, H.H., Lewis, S. and King, R.D. (2007) ART: An ontology based tool for the translation of papers into Semantic Web format. SIG/ ISMB Proceedings.
**http://www.jisc.ac.uk/whatwedo/programmes/programme_rep_pres/tools/art.aspx
Motivation:
• to improve information retrieval;
• to provide semantic clarity and explicitness of represented information and knowledge;
• to promote the sharing of research results;
• to facilitate text mining and knowledge discovery applications.
3. ART
The Core Information about Scientific Papers (CISP):
<goal of investigation>
<object of investigation>
<motivation for investigation>
<method>
<model>
<experiment>
<observation>
<result>
<conclusion>
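One way to picture sentence-level annotation with these nine concepts is an enumeration plus a list of (sentence, concept) pairs. The sketch below is a minimal illustration; the names `CISP`, `annotate` and `concept_counts` and the three example sentences are hypothetical and are not part of the ART tools.

```python
from collections import Counter
from enum import Enum


class CISP(Enum):
    """The nine core concepts listed above."""
    GOAL = "goal of investigation"
    OBJECT = "object of investigation"
    MOTIVATION = "motivation for investigation"
    METHOD = "method"
    MODEL = "model"
    EXPERIMENT = "experiment"
    OBSERVATION = "observation"
    RESULT = "result"
    CONCLUSION = "conclusion"


def annotate(sentences, labels):
    """Pair each sentence with one CISP concept (one label per sentence)."""
    if len(sentences) != len(labels):
        raise ValueError("need exactly one label per sentence")
    return list(zip(sentences, labels))


def concept_counts(annotated):
    """Count how often each concept occurs in an annotated paper."""
    return Counter(concept for _, concept in annotated)


# Invented sentences, used only to show the data shape.
sentences = [
    "We aim to determine the phylogeny of two species.",
    "DNA was extracted and amplified.",
    "The two species form a distinct lineage.",
]
annotated = annotate(sentences, [CISP.GOAL, CISP.METHOD, CISP.CONCLUSION])
print(concept_counts(annotated)[CISP.METHOD])
```

Counts of this kind could show at a glance which core concepts a paper states explicitly and which it leaves out.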
Please tell us your opinion:
http://www.aber.ac.uk/compsci/Research/bio/art/news/survey/
Colin R. Batchelor, Royal Society of Chemistry
The related projects
DEMO by Dr Maria Liakata: SAPIENT
SAPIENT: Semantic Annotation of Papers: Interface and ENrichment Tool
• A web-based tool for sentence-by-sentence annotation of full papers
• Developed at UWA by Maria Liakata and Claire Q
• SAPIENT can be used to annotate papers with CISP (it also incorporates OSCAR annotations)
• SAPIENT can also be used with other sentence based annotation schemes
• SAPIENT is currently suitable for manual annotation, to facilitate the creation of a corpus
• SAPIENT is currently used by 16 experts to create a corpus of full papers from Chemistry/Biochemistry annotated with CISP concepts.
• Papers provided by the RSC
• Corpus creation consists of 3 phases. Now at the start of phase 2.
• Software and manual available on-line.
SAPIENT Architecture
[Figure: architecture diagram with User INPUT, Browser and Server components. Diagram labels:]
• Page for paper upload and links to uploaded papers
• Paper saved as source.xml
• 1) Paper is split into sentences with SSSplit; 2) Paper saved as mode2.xml
• Click on paper; XMLHttpRequest; paper in .xml
• Processing with .xsl; paper displayed in dynamic HTML
• JavaScript-based annotation with CISP; OSCAR annotations
• Click on Save; annotations saved in mode2.xml
SSSplit: SAPIENT Sentence Splitting
• Rule based sentence splitter developed in Java by Maria Liakata and Claire Q at UWA
• SSSplit developed to take as input full papers in XML
• It fully respects XML annotations pertaining to paper structure, references and formatting.
• Can be used independently of SAPIENT from the command line or imported as a package
• Has been tested successfully on 130 papers
• Software available on-line.
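The general idea of rule-based sentence splitting over XML input can be sketched as follows. This is not SSSplit, which is written in Java and whose rules and output format are not given here; the boundary rule, the abbreviation list and the element names `p` and `s` are assumptions made for illustration, and the sketch ignores inline markup inside paragraphs, which the real tool is stated to respect.

```python
import re
import xml.etree.ElementTree as ET

# Assumed rules: a sentence ends at . ! or ? followed by whitespace and a
# capital letter, unless the text before the boundary ends in an abbreviation.
ABBREVIATIONS = ("et al.", "Fig.", "e.g.", "i.e.")
BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Split plain text into sentences using the assumed rules above."""
    sentences, start = [], 0
    for match in BOUNDARY.finditer(text):
        candidate = text[start:match.start()]
        if candidate.endswith(ABBREVIATIONS):
            continue  # the full stop belongs to an abbreviation, keep reading
        sentences.append(candidate.strip())
        start = match.end()
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences


def mark_sentences(paper_xml):
    """Wrap each sentence of every <p> element in a numbered <s> element."""
    root = ET.fromstring(paper_xml)
    for para in list(root.iter("p")):
        text, para.text = para.text or "", None
        for number, sentence in enumerate(split_sentences(text), start=1):
            s = ET.SubElement(para, "s", {"id": str(number)})
            s.text = sentence
    return ET.tostring(root, encoding="unicode")


example = "<paper><p>The yeast grew. See Fig. 3 for details.</p></paper>"
print(mark_sentences(example))
```

Giving each sentence its own element is what makes sentence-by-sentence annotation possible, since a label can then be attached to a stable unit.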
The Future
• Perform machine learning on corpus of papers annotated with CISP
• Automate SAPIENT to suggest CISP annotations in new papers
• Use CISP metadata to generate digital abstract
• Incorporate SAPIENT in publishers’ workflow as tool for editors, reviewers and authors of scientific papers.
SAPIENT and SSSplit can be downloaded from:
http://www.aber.ac.uk/compsci/Research/bio/art/
For comments or questions contact [email protected]