The Pragmatics and Formality of Authoring Ontologies, ODLS 2016
Post on 13-Apr-2017
Formality and Pragmatics in Authoring Ontologies
Robert Stevens
ODLS 2016
School of Computer Science
The University of Manchester
Manchester
United Kingdom
M13 9PL
Robert.stevens@manchester.ac.uk
Acknowledgements
• On-going work with Phil Lord on normalising the Gene Ontology
• The Gene Ontology folk for making GO
• Nico Matentzoglu for my slides
• Mercedes Casteleiro for numbers
Formality and Pragmatics
• Formality: Acting strictly according to procedure or rules
– Ontological formality
– Representational formality
• Pragmatics: Behaviour driven by practical consequences rather than dogma
• There’s a tension between the two
Gene Ontology Molecular Function
• D-alanyl carrier activity
• acetylcholine receptor regulator activity
• antioxidant activity
• binding
• calcium channel regulator activity
• catalytic activity
• channel regulator activity
• chemoattractant activity
• chemorepellent activity
• core DNA-dependent RNA polymerase binding promoter specificity activity
• electron carrier activity
• enzyme regulator activity
• guanyl-nucleotide exchange factor activity
• metallochaperone activity
• mitochondrial RNA polymerase binding promoter specificity activity
• molecular function regulator
• molecular transducer activity
• morphogen activity
• negative regulation of molecular function
• neurotransmitter receptor regulator activity
• nucleic acid binding transcription factor activity
• nutrient reservoir activity
• positive regulation of molecular function
• protein tag
• receptor regulator activity
• regulation of molecular function
• signal transducer activity
• structural molecule activity
• transcription factor activity, core RNA polymerase binding
• transcription factor activity, protein binding
• transcription factor activity, transcription factor binding
• translation regulator activity
• transporter activity
NUMBER OF TERMS: ~10k
http://geneontology.org/
What is Molecular Function in GO?
• Describes “function”…?
GO:0003674 molecular_function
Elemental activities, such as catalysis or binding, describing the actions of a gene product at the molecular level. A given gene product may exhibit one or more molecular functions.
Motivation
• Is GO’s molecular function ontology really function, “little” processes or both?
• Documented as a function
• Sometimes looks like a process
• Sometimes treated like a process
• Confusion of the thing that has a function with the function itself
• This can make modelling harder than it need be
A Couple of Observations
• Pragmatically, we commit to GO – it’s the only show in town and it works
• There are a lot of chemicals around in GO MF
• We are biochemistry….!
• Probably few functions – strip out all the “non-function” stuff and see what’s left
• Then we can look at the ontological nature of GO MF
• Also, re-create in a more sustainable form
It’s all work in progress
A “tangled” ontology of amino acids
There are several dimensions of classification here
• The amino acids themselves – a chemical dimension
• The size of the amino acid’s side chain
• The charge on the side chain
• The polarity of the side chain
• The hydrophobicity of the side chain
• We can normalise these into separate hierarchies then put them back together again
• Our goal is to put entities into separate trees all formed on the same basis
• Size only talks about size; amino acid only talks about chemical composition (based on an alpha-carbon with an amino and carboxylic acid group); and so on
The dimensions separated
Amino Acids: Alanine, Arginine, Asparagine, Cysteine, Glutamate, Glutamine, Glycine, Histidine, Isoleucine, Leucine, Lysine, Methionine, Phenylalanine, Proline, Serine, Threonine, Tryptophan, Tyrosine, Valine
Charge: Negative, Neutral, Positive
Size: Tiny, Small, Medium, Large
Polarity: Polar, Nonpolar
Hydrophobicity: Hydrophobic, Hydrophilic
The process
• Hand-crafted ontologies with a polyhierarchy are “tangled”
• Usually axiomatically lean
• We classify along one axis and use “restrictions” to other modules to capture other axes
• Then re-build the polyhierarchy using the axiomatically rich ontology
“Pulling out” dimensions
• Each separate tree must be the same kind of thing
• We don’t mix continuants, processes, qualities, etc
• We don’t mix our classification by, for instance, structure and then charge
• We do that compositionally via defined classes and automated reasoners
The amino acid pattern
Class: AminoAcid
    SubClassOf:
        hasSize some Size,
        hasPolarity some Polarity,
        hasCharge some Charge,
        hasHydrophobicity some Hydrophobicity
An amino acid
Class: Lysine
    SubClassOf:
        AminoAcid,
        hasSize some Large,
        hasCharge some Positive,
        hasPolarity some Polar,
        hasHydrophobicity some Hydrophilic
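The pattern can be filled in mechanically for each amino acid. The Python sketch below is an illustration only: the `amino_acid_frame` helper and the dictionary layout are assumptions made for this example and are not part of the talk or of any OWL library. It renders a Manchester-syntax frame as a string from one value per dimension.

```python
# Hypothetical helper: render the amino acid pattern as a Manchester-syntax frame.
# Property names mirror the slides; the function itself is an assumption.
DIMENSIONS = ("Size", "Charge", "Polarity", "Hydrophobicity")


def amino_acid_frame(name, values):
    """Return a Class frame with one existential restriction per dimension."""
    missing = [d for d in DIMENSIONS if d not in values]
    if missing:
        raise ValueError(f"{name} lacks a value for: {', '.join(missing)}")
    restrictions = [f"has{d} some {values[d]}" for d in DIMENSIONS]
    body = ",\n        ".join(["AminoAcid"] + restrictions)
    return f"Class: {name}\n    SubClassOf:\n        {body}"


# Values for Lysine are taken from the slide above.
lysine = amino_acid_frame(
    "Lysine",
    {"Size": "Large", "Charge": "Positive",
     "Polarity": "Polar", "Hydrophobicity": "Hydrophilic"},
)
print(lysine)
```

Refusing to emit a frame when a dimension is missing keeps every amino acid described along all four axes, which is what lets the defined classes be rebuilt uniformly later.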
Rebuilding the hierarchy
• Class: LargeAminoAcid
– EquivalentTo: AminoAcid and hasSize some Large
• Class: PositiveAminoAcid
– EquivalentTo: AminoAcid and hasCharge some Positive
• Class: LargePositiveAminoAcid
– EquivalentTo: LargeAminoAcid and PositiveAminoAcid
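To see how defined classes let the polyhierarchy be recomputed rather than asserted, here is a toy Python sketch. It is a stand-in for an automated reasoner, not OWL semantics: each defined class is flattened to a set of (property, filler) pairs, and classification is plain subset testing. Lysine's values come from the slides; Glycine's values and all function names are assumptions added for contrast.

```python
# Toy stand-in for an automated reasoner (illustration only, not OWL semantics).
# Each defined class is flattened to the restrictions that are jointly
# necessary and sufficient; LargePositiveAminoAcid is expanded by hand.
DEFINED = {
    "LargeAminoAcid": {("hasSize", "Large")},
    "PositiveAminoAcid": {("hasCharge", "Positive")},
    "LargePositiveAminoAcid": {("hasSize", "Large"), ("hasCharge", "Positive")},
}

# Asserted restrictions on the primitive classes.
ASSERTED = {
    "Lysine": {("hasSize", "Large"), ("hasCharge", "Positive"),
               ("hasPolarity", "Polar"), ("hasHydrophobicity", "Hydrophilic")},
    # Assumed values, included only to show a class that matches nothing.
    "Glycine": {("hasSize", "Tiny"), ("hasCharge", "Neutral")},
}


def inferred_parents(restrictions):
    """Defined classes whose conditions are all met by the restrictions."""
    return sorted(name for name, needed in DEFINED.items()
                  if needed <= restrictions)


def defined_superclasses(name):
    """Defined classes whose conditions are a strict subset of this one's."""
    return sorted(other for other, needed in DEFINED.items()
                  if other != name and needed < DEFINED[name])


for amino_acid, restrictions in ASSERTED.items():
    print(amino_acid, "->", inferred_parents(restrictions))
print("LargePositiveAminoAcid ->",
      defined_superclasses("LargePositiveAminoAcid"))
```

Nothing in `ASSERTED` mentions the defined classes; the multiple parents of Lysine fall out of the restrictions, which is the point of the normalisation. A real reasoner would also handle fillers with their own subclass hierarchies, which this sketch ignores.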
A “tangled” ontology of amino acids
Other Ontology Topics as Factors in GO MF
[Figure: molecular function and the other ontology topics it draws on: chemical, chemical role, reaction, biological process, cellular component, cell, protein, sequence]
40-60% of terms mention chemicals
Some GO Terms
GO MF
• glucose import
• cytosolic calcium ion transport
• hydrolase activity
• tyrosine binding
• retroviral strand transfer activity
• electron carrier activity
Binding
• ~2k terms in the binding bit of GO MF
• Remove the chemicals
• Leaves “binding”
• There is a function “to bind”
• There is a process of “binding”
• Linguistically – an infinitive and a gerund/nominalised verb
More “to bind” Functions?
• “to bind” is the basic function
• Specialise to “to bind covalently”, “to bind via hydrogen”, “to bind electrostatically”, but these are built compositionally with reference to other ontologies
Chemorepellent - chemoattractant activity
GO:0042056 chemoattractant activity
Providing the environmental signal that initiates the directed movement of a motile cell or organism towards a higher concentration of that signal.
GO:0045499 chemorepellent activity
Providing the environmental signal that initiates the directed movement of a motile cell or organism towards a lower concentration of that signal.
To diffuse
GO realisable entities
RealizableEntity
ToCatalyse
ToBind
ToMark
ToStore
ToDiffuse
ToTransport
ToMaintainIntegrity
ToProtect
ToModulate
ToRegulate
ToTransduce
Angels on the head of a pin
Distinctions with no (practical) difference
• “Distinction without a difference” – making a distinction where none exists
• Distinctions may exist, but does one need to make them?
• Does a distinction make a practical difference to the use case in hand?
• Make no distinction unless it makes a difference
• Beware of consistency…
New function hierarchy
• RealizableEntity
  – ToCatalyse
  – ToBind
    • ToMark
  – ToStore
  – ToDiffuse
  – ToTransport
  – ToMaintainIntegrity
  – ToProtect
  – ToModulate
    • ToRegulate
  – ToTransduce
Standard pattern – some and only
[Diagram: Gene product -> (has realizable entity) -> Realisable entity -> (is realized in) -> Biological process]
RO candidate: capable_of = shortcut
[Diagram: Gene product -> (is capable of) -> Biological process]
Some patterns
• hasRealisableEntity some (to_bind and realisedIn only (binding and hasInput some chemical))
• Add “playsrole some role” for a chemical role like drug
• hasRealisableEntity some (to_catalyse and realisedIn only (catalysis and hasInput some chemical and hasOutput some chemical))
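Because both patterns share one shape, they can be generated rather than typed by hand. The Python sketch below assembles the class expressions as plain strings. The property names follow the slides; the helper functions (`some`, `only`, `conj`, `function_pattern`) are assumptions made for illustration and are not Tawny-OWL or any other library's API.

```python
# Hypothetical helpers that assemble the class expressions above as strings.
# Property names (hasRealisableEntity, realisedIn, hasInput, hasOutput) follow
# the slides; the functions themselves are illustrative assumptions.

def some(prop, filler):
    """Existential restriction."""
    return f"{prop} some {filler}"


def only(prop, filler):
    """Universal restriction."""
    return f"{prop} only {filler}"


def conj(*parts):
    """Parenthesised conjunction of class expressions."""
    return "(" + " and ".join(parts) + ")"


def function_pattern(realisable, process, inputs=(), outputs=()):
    """'Some and only' pattern: a realisable realised only in one process."""
    participants = [some("hasInput", i) for i in inputs]
    participants += [some("hasOutput", o) for o in outputs]
    realised = only("realisedIn", conj(process, *participants))
    return some("hasRealisableEntity", conj(realisable, realised))


binding = function_pattern("to_bind", "binding", inputs=["chemical"])
catalysis = function_pattern("to_catalyse", "catalysis",
                             inputs=["chemical"], outputs=["chemical"])
print(binding)
print(catalysis)
```

Generating the expression from one function keeps the parentheses balanced and guarantees that every term uses the same pattern, which matters once thousands of defined classes are involved.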
Actually doing it
• Programmatically using Tawny-OWL
• Asserted tree of molecular realisables and molecular processes
• Defined classes for the actual terms
• May have to restrict to OWL EL for practical reasons
• We shall see…
Strategies for Defined Classes
• Total post co-ordination
• Total pre co-ordination
• Pre co-ordinate those classes that have been used in annotation
How many GO MF terms are used?
Annotation files: (a) Homo sapiens, canonical accessions from UniProt (goa_human.gaf.gz); (b) unfiltered GOA UniProt gene association file (goa_uniprot_all.gaf.gz)
• Total number of GO-UniProt annotations: (a) 354 515 (~354K); (b) 294 208 149 (~294M)
• Unique UniProt IDs: (a) 19 055 (~19K); (b) 45 968 890 (~46M)
• Unique active Molecular Function classes: (a) 3 947 (~4K); (b) 7 521 (~7K)
• Unique active Molecular Function classes used more than 5 times: 1 313 (~1K)
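Counts of this kind can be derived from a GAF file with a few lines of Python. The sketch below is not the method used to produce the numbers above, which the slides do not describe. It assumes the standard GAF layout (tab-separated, "!" comment lines, GO ID in column 5, aspect in column 9 with "F" for molecular function) and counts annotation rows per class; counting distinct gene products instead would be an equally plausible reading. The function names and the sample rows are invented for illustration.

```python
import gzip
from collections import Counter


def count_mf_usage(lines):
    """Count annotation rows per GO ID for the molecular function aspect."""
    usage = Counter()
    for line in lines:
        if line.startswith("!"):            # GAF header/comment line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) > 8 and cols[8] == "F":  # column 9: aspect
            usage[cols[4]] += 1               # column 5: GO ID
    return usage


def used_more_than(usage, threshold):
    """GO IDs annotated more than `threshold` times."""
    return {go_id for go_id, n in usage.items() if n > threshold}


def count_file(path):
    """Convenience wrapper for a gzipped GAF file such as goa_human.gaf.gz."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        return count_mf_usage(handle)


def fake_row(accession, go_id, aspect):
    """Build a minimal made-up GAF row (first nine columns only)."""
    return "\t".join(["UniProtKB", accession, "SYM", "", go_id,
                      "REF", "IEA", "", aspect])


# Invented sample data: accessions are placeholders, not real annotations.
sample = ["!gaf-version: 2.2"]
sample += [fake_row(f"X{i:05d}", "GO:0042056", "F") for i in range(6)]
sample += [fake_row("X00006", "GO:0045499", "F"),
           fake_row("X00007", "GO:0042056", "P")]  # non-F aspect is ignored

usage = count_mf_usage(sample)
print(len(usage), "molecular function classes in use")
print(sorted(used_more_than(usage, 5)))
```

The set returned by `used_more_than` is the candidate list for the third strategy: pre co-ordinate only the classes that annotation actually uses.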
What have we found?
• Very few functions
• … and some look dispositional
• It looks like physics
• Most functions involve binding – makes sense
• We separate realisables and processes
• We live with a bit of “replication”
• With molecular processes, do we need molecular function?
• We change the upper reaches of GO MF, but…
• Does it make any practical difference?
Formality
• Ontological formality
• Making the right distinctions drives consistent use of relationships
• Facilitates the kind of analysis we’ve done
• Can also be a barrier to progress
• Representational formality
• Knowing what is being said is useful
• Allows clean interpretation
• Enables useful reasoning
Pragmatic Decisions
• Commit enough to achieve goals
• If re-using take on the commitments of that ontology
– If using OBO commit to OBO
– If what you’re using uses something with which you disagree – get over it
• Axiom pragmatics
• Don’t represent that which isn’t needed
• Truth and beauty
• A counsel of perfection is a counsel of despair
• I’d make “gene product” explicit