Making Semantics do Some Work
DESCRIPTION
Keynote talk at Practical Semantic Astronomy (SemAst09), Glasgow, 2009
TRANSCRIPT
Making Semantics do Some Work
Robert Stevens
BioHealth Informatics Group
School of Computer Science
University of Manchester
Introduction
• What’s the use of highly axiomatised ontological descriptions?
• Two use cases:
  – Classifying instances based on their features: new discoveries;
  – Building a complex terminology.
• Cost and benefit.
• Conclusions
Protein Classification
• Proteins are divided into broad functional classes: “protein families”
• Families are sub-divided to give family classifications
• Class membership can be determined by “protein features”, such as domains
• Resources exist for feature detection via the primary sequence
  – but not for class membership
• Current limitation of automated tools:
  – needs human knowledge to recognise class membership
Finding Domains on a Sequence
A search of the linear sequence of protein tyrosine phosphatase type K identified 9 functional domains
>uniprot|Q15262|PTPK_HUMAN Receptor-type protein-tyrosine phosphatase kappa precursor (EC 3.1.3.48) (R-PTP-kappa).
MDTTAAAALPAFVALLLLSPWPLLGSAQGQFSAGGCTFDDGPGACDYHQDLYDDFEWVHVSAQEPHYLPPEMPQGSYMIVDSSDHDPGEKARLQLPTMKENDTHCIDFSYLLYSQKGLNPGTLNILVRVNKGPLANPIWNVTGFTGRDWLRAELAVSSFWPNEYQVIFEAEVSGGRSGYIAIDDIQVLSYPCDKSPHFLRLGDVEVNAGQNATFQCIATGRDAVHNKLWLQRRNGEDIPV………..
Why Classify?
• Classification and curation of a genome is the first step in understanding the processes and functions happening in an organism
• Classification enables comparative genomic studies - what is already known in other organisms
• The similarities and differences between processes and functions in related organisms often provide the greatest insight into the biology
• In silico characterisation is the current bottleneck
Phosphatase Classification
• Diagnostic phosphatase domains/motifs – sufficient for membership of the protein phosphatase superfamily
• Any protein having a phosphatase domain is a member of the phosphatase super-family
• Other motifs determine a protein’s place within the family
• Usually needs a human to recognise that the features detected imply class membership
• Can these be captured in an ontology?
OWL represents classes of instances
(Diagram: classes A, B and C as collections of instances.)
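Under OWL's set-based semantics, a class stands for a set of individuals, and "X SubClassOf Y" means every member of X is also a member of Y. A minimal Python sketch of that reading, using made-up individuals p1 to p4 that are not taken from the slide:

```python
# Classes modelled as sets of (hypothetical) individuals.
A = {"p1", "p2", "p3", "p4"}
B = {"p1", "p2"}
C = {"p2", "p3"}


def is_subclass(sub: set, sup: set) -> bool:
    """SubClassOf(sub, sup) holds when every member of sub is also a member of sup."""
    return sub <= sup


print(is_subclass(B, A))  # True
print(is_subclass(B, C))  # False: p1 is in B but not in C
print(B & C)              # {'p2'}: the individuals that are in both B and C
```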
Necessity and Sufficiency
• An R2A phosphatase must have a fibronectin domain
• Having a fibronectin domain does not a phosphatase make
• Necessity: what must a class instance have?
• Any protein that has a phosphatase catalytic domain is a phosphatase enzyme
• All phosphatase enzymes have a catalytic domain
• Sufficiency: how is an instance recognised to be a member of a class?
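The difference between a necessary and a sufficient condition can be sketched as two checks over the set of domains detected on a protein. This is a plain-Python illustration, not OWL; the domain labels are hypothetical, and the phosphatase condition is treated as both sufficient and necessary, following the statements that any protein with a catalytic domain is a phosphatase and that all phosphatases have one.

```python
def phosphatase_status(domains: set) -> str:
    # Sufficient: a phosphatase catalytic domain is enough to rule a protein in.
    # Also necessary (all phosphatases have one), so its absence rules a protein out.
    if "PhosphataseCatalyticDomain" in domains:
        return "phosphatase"
    return "not a phosphatase"


def r2a_status(domains: set) -> str:
    # Necessary only: without a fibronectin domain a protein cannot be R2A ...
    if "FibronectinDomain" not in domains:
        return "not R2A"
    # ... but having one is not sufficient, so membership stays undecided.
    return "undecided"


print(phosphatase_status({"FibronectinDomain"}))   # not a phosphatase
print(r2a_status({"FibronectinDomain"}))           # undecided
print(r2a_status({"PhosphataseCatalyticDomain"}))  # not R2A
```

A necessary condition on its own can only exclude candidates; recognising members needs a sufficient one, which is what a defined (EquivalentTo) class supplies.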
Definition of Tyrosine Phosphatase
Class: TyrosineReceptorProteinPhosphatase
EquivalentTo: Protein
that contains min 1 ProteinTyrosinePhosphataseDomain
and contains exactly 1 TransmembraneDomain
…there are known knowns; there are things we know we know. We also know there are
known unknowns; that is to say we know there are some things we do not know. But
there are also unknown unknowns -- the ones we don't know we don't know.
Definition for R2A Phosphatase
Class: R2A
EquivalentTo: Protein
that contains exactly 2 ProteinTyrosinePhosphataseDomain
and contains exactly 1 TransmembraneDomain
and contains exactly 4 FibronectinDomain
and contains exactly 1 ImmunoglobulinDomain
and contains exactly 1 MAMDomain
and contains exactly 1 Cadherin-LikeDomain
and contains only (ProteinTyrosinePhosphataseDomain or TransmembraneDomain or FibronectinDomain or ImmunoglobulinDomain or Cadherin-LikeDomain or MAMDomain)
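The R2A definition combines exact domain counts with a closure axiom ("contains only ..."). A minimal sketch of checking a detected domain profile against it, assuming the list of detected domains is complete. That is a closed-world assumption: an OWL reasoner works under the open-world assumption and would only conclude R2A membership for an individual whose own description is closed in the same way.

```python
from collections import Counter

# Exact domain counts from the R2A definition.
R2A_COUNTS = {
    "ProteinTyrosinePhosphataseDomain": 2,
    "TransmembraneDomain": 1,
    "FibronectinDomain": 4,
    "ImmunoglobulinDomain": 1,
    "MAMDomain": 1,
    "Cadherin-LikeDomain": 1,
}
# Closure ("contains only ..."): no other domain types are allowed.
R2A_ALLOWED = set(R2A_COUNTS)


def matches_r2a(domains: Counter) -> bool:
    """Closed-world check: every exact count holds and no unlisted domain type is present."""
    exact = all(domains[d] == n for d, n in R2A_COUNTS.items())
    closed = {d for d, n in domains.items() if n > 0} <= R2A_ALLOWED
    return exact and closed


p21592 = Counter(R2A_COUNTS)  # mirrors the domain composition given for P21592
print(matches_r2a(p21592))                                     # True
print(matches_r2a(p21592 + Counter({"ZincFingerDomain": 1})))  # False: breaks the closure
print(matches_r2a(p21592 - Counter({"FibronectinDomain": 1}))) # False: only 3 fibronectin domains
```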
Automated Reasoning
• An OWL-DL ontology is mapped to its description logic (DL) form as a collection of axioms
• An automated reasoner checks satisfiability, flags inconsistent classes and infers subsumption
• Defined classes (those with necessary and sufficient restrictions) enable a reasoner to infer subclass axioms
• The reasoner can also infer to which classes an individual belongs
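For the narrow fragment used in these phosphatase definitions (a conjunction of cardinality restrictions on one property, each over a distinct domain type), subsumption between defined classes can be approximated by comparing the count range each class allows per domain type. This is a structural-subsumption sketch under those assumptions, not a description logic reasoner: it ignores closure axioms and any hierarchy among domain types, and it reads "contains 1 TransmembraneDomain" as exactly one.

```python
import math

# Each defined class: domain type -> (min, max) count allowed (hypothetical encoding).
TYROSINE_RECEPTOR_PTP = {
    "ProteinTyrosinePhosphataseDomain": (1, math.inf),  # min 1
    "TransmembraneDomain": (1, 1),                      # exactly 1
}
R2A = {
    "ProteinTyrosinePhosphataseDomain": (2, 2),
    "TransmembraneDomain": (1, 1),
    "FibronectinDomain": (4, 4),
    "ImmunoglobulinDomain": (1, 1),
    "MAMDomain": (1, 1),
    "Cadherin-LikeDomain": (1, 1),
}


def subsumes(sup: dict, sub: dict) -> bool:
    """True if every count range required by sup contains the range allowed by sub."""
    for domain, (low, high) in sup.items():
        sub_low, sub_high = sub.get(domain, (0, math.inf))  # unconstrained if sub is silent
        if sub_low < low or sub_high > high:
            return False
    return True


print(subsumes(TYROSINE_RECEPTOR_PTP, R2A))  # True: R2A is inferred to be a subclass
print(subsumes(R2A, TYROSINE_RECEPTOR_PTP))  # False
```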
Incremental Addition of Protein Functional Domains
(Figure: protein functional domains added incrementally: phosphatase catalytic, cadherin-like, immunoglobulin, MAM, cellular retinaldehyde, adhesion recognition, transmembrane, fibronectin III, glycosylation.)
Classification of the Classical Tyrosine Phosphatases
What is the Ontology Telling Us?
• Each class of phosphatase is defined in terms of its domain composition
• We know the characteristics by which an individual protein can be recognised to be a member of a particular class of phosphatase
• We have this knowledge in a computational form
• If we had protein instances described in terms of the ontology, we could classify those individual proteins
• A catalogue of phosphatases
Description of an Instance of a Protein
Individual: P21592
Types: Protein,
hasDomain exactly 2 ProteinTyrosinePhosphataseDomain,
hasDomain exactly 1 TransmembraneDomain,
hasDomain exactly 4 FibronectinDomain,
hasDomain exactly 1 ImmunoglobulinDomain,
hasDomain exactly 1 MAMDomain,
hasDomain exactly 1 Cadherin-LikeDomain
Instance: P21592
TypeOf: Protein that
Fact: hasDomain 2 ProteinTyrosinePhosphataseDomain and
Fact: hasDomain 1 TransmembraneDomain and
Fact: hasDomain 4 FibronectinDomain and
Fact: hasDomain 1 ImmunoglobulinDomain and
Fact: hasDomain 1 MAMDomain and
Fact: hasDomain 1 Cadherin-LikeDomain
Tyrosine Phosphatase
(containsDomain some TransmembraneDomain) and
(containsDomain at least 1 ProteinTyrosinePhosphataseDomain)
R2A Phosphatase
(containsDomain some MAMDomain) and
(containsDomain some ProteinTyrosineCatalyticDomain or ImmunoglobulinDomain) and
(containsDomain some FibronectinDomain or FibronectinTypeIIIFoldDomain) and
(containsDomain exactly 2 ProteinTyrosinePhosphataseDomain)
Classifying Proteins
>uniprot|Q15262|PTPK_HUMAN Receptor-type protein-tyrosine phosphatase kappa precursor (EC 3.1.3.48) (R-PTP-kappa).
MDTTAAAALPAFVALLLLSPWPLLGSAQGQFSAGGCTFDDGPGACDYHQDLYDDFEWVHVSAQEPHYLPPEMPQGSYMIVDSSDHDPGEKARLQLPTMKENDTHCIDFSYLLYSQKGLNPGTLNILVRVNKGPLANPIWNVTGFTGRDWLRAELAVSSFWPNEYQVIFEAEVSGGRSGYIAIDDIQVLSYPCDKSPHFLRLGDVEVNAGQNATFQCIATGRDAVHNKLWLQRRNGEDIPV………..
(Diagram components: InterPro, Instance Store, Reasoner, Translate, Codify.)
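The Translate and Codify steps turn feature-detector output into an ontology description of an individual that a reasoner can then classify. A sketch of those two steps only, assuming hypothetical detector labels (not real InterPro identifiers) and a hand-written label-to-class mapping; the hit list mirrors the domain composition given for P21592, in arbitrary order.

```python
from collections import Counter

# Hypothetical detector labels mapped to ontology class names (the "Translate" step).
LABEL_TO_CLASS = {
    "tyr_phosphatase": "ProteinTyrosinePhosphataseDomain",
    "transmembrane": "TransmembraneDomain",
    "fn3": "FibronectinDomain",
    "ig": "ImmunoglobulinDomain",
    "mam": "MAMDomain",
    "cadherin_like": "Cadherin-LikeDomain",
}


def codify(accession: str, hits: list) -> str:
    """Build a Manchester-syntax individual frame from detected domain labels (the "Codify" step).

    Raises KeyError for a label that has no ontology class yet.
    """
    counts = Counter(LABEL_TO_CLASS[label] for label in hits)
    types = ["Protein"] + [f"hasDomain exactly {n} {cls}" for cls, n in sorted(counts.items())]
    return f"Individual: {accession}\nTypes: " + ",\n".join(types)


hits = ["tyr_phosphatase", "tyr_phosphatase", "transmembrane", "fn3", "fn3",
        "fn3", "fn3", "ig", "mam", "cadherin_like"]
print(codify("P21592", hits))
```

Keeping the mapping explicit means an unmapped label fails loudly instead of silently dropping a feature the ontology does not yet cover.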
So Far…
• Human phosphatases have been classified using the system
• The ontology classification performed as well as expert classification
• The ontology system refined the classification:
  – DUSC contains a zinc finger domain: characterised and conserved, but not in the classification
  – DUSA contains a disintegrin domain: previously uncharacterised, evolutionarily conserved
• A new kind of phosphatase?
Aspergillus fumigatus
• Phosphatase complement very different from human
  – >100 in human, <50 in A. fumigatus
• Whole subfamilies ‘missing’
  – Different fungi-specific phosphorylation pathways?
  – No requirement for tissue-specific variations?
• Novel serine/threonine phosphatase with a homeobox
  – Conserved in Aspergillus and closely related species, but not in any others
Again, a new phosphatase?
Generic Technique
• Feature detection
• Categories defined in terms of those features
• Produce catalogue of what you currently know
• Highlight cases that don’t match current knowledge
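The four steps above can be sketched generically: categories are defined by the features they require, each item is placed in every category whose requirements it meets, and items that fit no category, or carry features no category mentions, are flagged for a human to examine. The category definitions, feature names and items below are all hypothetical.

```python
# Hypothetical category definitions: the features an item must have.
CATEGORIES = {
    "ReceptorTyrosinePhosphatase": {"ProteinTyrosinePhosphataseDomain", "TransmembraneDomain"},
    "DualSpecificityPhosphatase": {"DualSpecificityCatalyticDomain"},
}


def catalogue(items: dict):
    """Place each item in every category it satisfies and flag items that do not fit."""
    known = set().union(*CATEGORIES.values())  # every feature some category mentions
    placed, flagged = {}, {}
    for name, features in items.items():
        placed[name] = sorted(c for c, required in CATEGORIES.items() if required <= features)
        unexplained = features - known
        if not placed[name] or unexplained:
            flagged[name] = sorted(unexplained)
    return placed, flagged


items = {
    "protein_x": {"DualSpecificityCatalyticDomain", "DisintegrinDomain"},
    "protein_y": {"ProteinTyrosinePhosphataseDomain", "TransmembraneDomain"},
    "protein_z": {"HomeoboxDomain"},
}
placed, flagged = catalogue(items)
print(placed)
print(flagged)  # {'protein_x': ['DisintegrinDomain'], 'protein_z': ['HomeoboxDomain']}
```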
The Cell type Ontology
• Some 880 terms
• Describing cell function, lineage, developmental stage, ploidy, secretion, species, …
• Not explicitly classified according to anatomy
• Uses is-a and developsFrom
• Used to describe cell types used in experiments
OBO Cell Type Ontology
Issues with Current CTO
• History: a need was seen, and a few days were spent “lashing” together an ontology by hand
• Contains lots of knowledge
• Asserted multiple inheritance: it is difficult, and humans will make slips
• Some biological mistakes
• All the knowledge is within the “is-a” relationships and implicit in the cell names
CTO Axes of Classification
• Histology: what cells look like
• Lineage: whence a given cell develops
• Ploidy: how many sets of chromosomes are in a cell
• Nucleation: how many nuclei
• Secretion & accumulation: what chemicals a cell secretes or accumulates
• Function: what the cell does
• Location: in anatomy
• Species: in what taxa the cell exists
• And some others
Implicit Knowledge
• Anatomy: muscle cell; red blood cell
• Maturity: immature T-lymphocyte
• Cell surface protein: CD45 positive lymphocyte
• Size
• Shape
Problems
• Tangles
• Hard to maintain
• Difficult to add a new cell
• Inflexible queries: what about hormone-secreting mesodermal cells?
• Information hidden inside term names
A Tangled Ontology of Cars
Describing a Big Blue Ford Car
Class: BigBlueFordCar
SubClassOf: Car
that hasColour some Blue
and hasSize some Big
and hasManufacturer some Ford
Modules
• Choose a primary axis: In this case Vehicle
• Other axes are represented in separate modules (Colour, Size (qualities) and manufacturer)
• Represent other aspects of classes through restrictions
• (Spot the ontological howler in this toy example)
Definition of a Red Car
Class: RedCar
EquivalentTo: Car
that hasColour some Red
• Any car that has the colour red is recognised to be a member of the class RedCar
• The reasoner works it all out and builds the hierarchy for you
Normalisation
• This technique of “pulling” apart tangled ontologies is “normalisation”
• Makes for cleaner modelling
• Makes for re-usable components
• The reasoner builds the taxonomy “completely”
• A new car (e.g., a yellow Saab) is described, and it just appears in the right place
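A minimal sketch of why a newly described class "just appears in the right place" after normalisation. It assumes each defined class is a conjunction of `property some Value` restrictions on Car with atomic values and no hierarchy among those values, so a plain subset test decides subsumption; the class names are hypothetical, in the spirit of the toy car example.

```python
# Each defined class: the (property, value) restrictions it places on Car (hypothetical).
DEFINED = {
    "Car": frozenset(),
    "RedCar": frozenset({("hasColour", "Red")}),
    "YellowCar": frozenset({("hasColour", "Yellow")}),
    "SaabCar": frozenset({("hasManufacturer", "Saab")}),
    "FordCar": frozenset({("hasManufacturer", "Ford")}),
}


def parents(new_reqs: frozenset) -> list:
    """Most specific existing classes whose restrictions are all implied by new_reqs."""
    supers = [c for c, reqs in DEFINED.items() if reqs <= new_reqs]
    # Keep a superclass only if no other superclass is strictly more specific
    # (`<` on frozensets is the proper-subset test).
    return sorted(c for c in supers if not any(DEFINED[c] < DEFINED[d] for d in supers))


yellow_saab = frozenset({("hasColour", "Yellow"), ("hasManufacturer", "Saab")})
print(parents(yellow_saab))  # ['SaabCar', 'YellowCar']
```

The multiple parents are computed rather than asserted, which is the point of keeping colour and manufacturer in separate modules.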
What We Did
• Examined CTO
• Chose primary axis of classification
• All other axes added as restrictions on class membership
• Describe cells
• Build ontology
• Use reasoner
Ontologies Used
(Diagram: the CTO ontology linked to the PATO ontology (nucleation, morphology, size, ploidy), GO Biological Process (muscle contraction, secretion), GO Cellular Component (chloroplast, cell membrane), a species taxonomy (Bacillus anthracis str. Ames) and anatomy (epithelium, kidney).)
Mammalian Red Blood Cell
Class: RedBloodCell
SubClassOf: Cell
that hasNucleation some Anucleate
and participatesIn some OxygenTransport
and existsIn some mammalia
and part_of some BloodTissue
and developsFrom some Reticulocyte
Mesodermal Lineage Cells
Class: MesodermalLineageCell
EquivalentTo: Cell
that developsFrom some MesodermalCell
(developsFrom is transitive)
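Because developsFrom is transitive, a cell counts as a MesodermalLineageCell whenever a chain of developsFrom links reaches MesodermalCell, even if no single asserted link says so. A sketch of that closure computation; the lineage edges are a simplified, hypothetical example, not content from the CTO.

```python
# Hypothetical asserted developsFrom edges: cell -> cells it directly develops from.
DEVELOPS_FROM = {
    "Erythrocyte": {"Reticulocyte"},
    "Reticulocyte": {"Erythroblast"},
    "Erythroblast": {"HaematopoieticStemCell"},
    "HaematopoieticStemCell": {"MesodermalCell"},
    "Neuron": {"Neuroblast"},
    "Neuroblast": {"EctodermalCell"},
}


def ancestors(cell: str) -> set:
    """All cells reachable from `cell` by following developsFrom (its transitive closure)."""
    seen, stack = set(), [cell]
    while stack:
        for parent in DEVELOPS_FROM.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


mesodermal_lineage = sorted(c for c in DEVELOPS_FROM if "MesodermalCell" in ancestors(c))
print(mesodermal_lineage)  # ['Erythroblast', 'Erythrocyte', 'HaematopoieticStemCell', 'Reticulocyte']
```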
Spreadsheet
Workflow
(Workflow diagram: Spreadsheet → CSV → OPPL → OWL ontology → reasoned ontology.)
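In the workflow, spreadsheet rows become OWL class descriptions via OPPL. The sketch below swaps in plain Python string templating for OPPL, purely to show the row-to-axiom mapping; it assumes the spreadsheet is exported as CSV with hypothetical column names `name`, `nucleation`, `ploidy` and `process`.

```python
import csv
import io

# Hypothetical spreadsheet export: one row per cell type, one column per axis.
CSV_TEXT = """name,nucleation,ploidy,process
ProlactinSecretingCell,mononucleate,diploid,ProlactinSecretion
EpinephrineSecretingCell,mononucleate,diploid,EpinephrineSecretion
"""


def row_to_frame(row: dict) -> str:
    """Render one spreadsheet row as a Manchester-syntax class frame (a stand-in for OPPL)."""
    return (f"Class: {row['name']}\n"
            f"SubClassOf: Cell\n"
            f"that has_nucleation some {row['nucleation']}\n"
            f"and has_ploidy some {row['ploidy']}\n"
            f"and participates_in some {row['process']}")


frames = [row_to_frame(r) for r in csv.DictReader(io.StringIO(CSV_TEXT))]
print("\n\n".join(frames))
```

Keeping one column per axis lets biologists edit the spreadsheet while the generated axioms stay uniform.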
Secreting Cells
Class: EpinephrineSecretingCell
SubClassOf: Cell
that belongs_to_line some Somatic
and has_nucleation some mononucleate
and has_ploidy some diploid
and potentiality some TerminallyDifferentiated
and participates_in some EpinephrineBiosyntheticProcess
and participates_in some EpinephrineSecretion
Class: ProlactinSecretingCell
SubClassOf: Cell
that belongs_to_line some Somatic
and has_nucleation some mononucleate
and has_ploidy some diploid
and potentiality some TerminallyDifferentiated
and participates_in some PeptideHormoneSecretion
and participates_in some ProlactinSecretion
Defined Cells
Class: SecretoryCell
EquivalentTo: Cell
that participates_in some (secretion or (part_of some secretion))
Class: EndocrineCell
EquivalentTo: Cell
that participates_in some (EndocrineProcess or (part_of some EndocrineProcess))
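SecretoryCell is a defined class, so no cell has to be asserted under it: any cell that participates in a secretion process, or in part of one, is classified there. A sketch of that check over a small process hierarchy; the is_a and part_of edges and the process HormoneVesicleExocytosis are illustrative assumptions, not taken from GO. It treats is_a as transitive but follows only one part_of step.

```python
# Hypothetical process hierarchy (illustrative, not copied from GO).
IS_A = {
    "ProlactinSecretion": {"PeptideHormoneSecretion"},
    "PeptideHormoneSecretion": {"secretion"},
}
PART_OF = {
    "HormoneVesicleExocytosis": {"PeptideHormoneSecretion"},  # hypothetical process
}


def is_a_star(process: str, target: str) -> bool:
    """True if process is target or reaches it through one or more is_a edges."""
    if process == target:
        return True
    return any(is_a_star(parent, target) for parent in IS_A.get(process, ()))


def counts_as_secretion(process: str) -> bool:
    """Mirror of the filler: secretion or (part_of some secretion)."""
    return is_a_star(process, "secretion") or any(
        is_a_star(whole, "secretion") for whole in PART_OF.get(process, ()))


# Hypothetical cells and the processes they participate in.
CELLS = {
    "ProlactinSecretingCell": {"PeptideHormoneSecretion", "ProlactinSecretion"},
    "MuscleCell": {"MuscleContraction"},
    "ExampleCell": {"HormoneVesicleExocytosis"},
}

secretory = sorted(c for c, procs in CELLS.items() if any(counts_as_secretion(p) for p in procs))
print(secretory)  # ['ExampleCell', 'ProlactinSecretingCell']
```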
Asserted Hierarchy
Inferred Hierarchy
What We Found
• More subsumption relationships
• The “is-a” hierarchy is complete
• Explicitness made us ask questions
• Found bad structure
• Can just slip in a new cell
• Can make arbitrary queries based on any of the axes of classification
Conclusions
• Can use strict semantics and automated reasoning to build structurally sound ontologies
• Can catalogue instances and make discoveries
• If an object can be recognised by its features, and those features can be computationally generated, classification can be automated
• High cost and high benefit
Acknowledgements
• Katy Wolstencroft did the protein phosphatase work as part of her Ph.D.
• The work on the cell type ontology was undertaken by members of the EPSRC OntoGenesis Network
• All the ontology work at Manchester relies on the support and input of the wider BioHealth and Information Management Groups