Phenotype annotation Chris Mungall Lawrence Berkeley Labs NCBO GO

Upload: cody-higgins

Post on 26-Dec-2015


Phenotype annotation

Chris Mungall

Lawrence Berkeley Labs

NCBO GO

Outline

• Principles of Compositionality
• Tour of PATO
• Pre vs post composition
• Quantitative phenotypes
• Next steps

Phenotype annotation: why?

• To shed light on the relationships between genes, environment and phenotype

• To compare genes and phenotypes across organisms

• To improve human health and wellbeing

Difficulties

• Phenotypes can be complex
– Descriptions are often composite
– Encompass relationships between different kinds of entities, at different levels of granularity
– Different ways of describing the same thing
• Descriptions must be rigorous and unambiguous
– Ensures meaningful analyses and comparisons within and between organisms

Compositionality is essential for describing phenotypes

• Compositionality is a principle of good ontology design
– aka building blocks, cross-products, normalised/modular design
– Create complex descriptions (definitions) from simpler ones
• Descriptions can be composed at any time
– Ontology construction time (pre-composition)
– Annotation time (post-composition)

An example of compositionality

• Plasma membrane of spermatocyte
– Plasma membrane [GO CC]
– Spermatocyte [OBO Cell]
• Formal means of composition
– Genus-differentia

a plasma membrane which is part_of a spermatocyte

Genus: plasma membrane [GO-CC]
Differentia: part_of [OBO-REL] spermatocyte [Cell]
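A minimal sketch of how a genus-differentia description could be held as data and rendered in the form shown above. The class name `GenusDifferentia` and its field names are hypothetical, chosen only for illustration; they are not part of any OBO tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GenusDifferentia:
    """A composed description: a genus term narrowed by one relation to another term."""
    genus: str     # e.g. a GO cellular component term
    relation: str  # e.g. an OBO relation such as part_of
    filler: str    # e.g. an OBO Cell term

    def definition(self) -> str:
        # Render in the "a G which is R a F" pattern used on the slide.
        return f"a {self.genus} which is {self.relation} a {self.filler}"


pm_of_spermatocyte = GenusDifferentia("plasma membrane", "part_of", "spermatocyte")
print(pm_of_spermatocyte.definition())
```

The same three fields work whether the description is composed while building the ontology or while annotating.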

Compositionality and ontology tools

• Composition supported by:
– Phenote
– OBO-Edit
  • Cross-product plugin
– Protégé-OWL
– SWOOP
– …and others

Advantage: Automatic DAG calculation

a plasma membrane which is part_of a spermatocyte

a membrane which is part_of a germ cell
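The placement of one composed description under another can be computed from the placement of its parts. The sketch below assumes two is_a links (plasma membrane is_a membrane, spermatocyte is_a germ cell) purely for illustration, and assumes each term has a single parent; a real reasoner would read these links from the ontologies and handle multiple parents.

```python
# Assumed is_a links, for illustration only (not read from GO or the Cell ontology).
IS_A = {
    "plasma membrane": "membrane",
    "spermatocyte": "germ cell",
}


def ancestors(term):
    """Return the term itself plus every term reachable through is_a links."""
    seen = [term]
    while term in IS_A:
        term = IS_A[term]
        seen.append(term)
    return seen


def subsumes(general, specific):
    """True if the (genus, relation, filler) triple `general` subsumes `specific`."""
    g_genus, g_rel, g_filler = general
    s_genus, s_rel, s_filler = specific
    return (
        g_rel == s_rel
        and g_genus in ancestors(s_genus)
        and g_filler in ancestors(s_filler)
    )


specific = ("plasma membrane", "part_of", "spermatocyte")
general = ("membrane", "part_of", "germ cell")
print(subsumes(general, specific))
print(subsumes(specific, general))
```

Under these assumptions the first description is placed beneath the second without anyone asserting that link by hand.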

The building blocks of phenotype descriptions: EQ

• Entities and qualities (EQ)
– (Bearer) Entity
  • E.g.: compound eye, spermatocyte, blood, wing growth, scale morphogenesis
– Quality (aka property, attribute)
  • A kind of dependent continuant
  • Defined in PATO
  • E.g.: green, hot, squamous, rugose, edematous, light-sensitivity, luminescent, ectopic, arrested, decomposed

Formal treatment of EQ

• We must be clear about what we mean when we compose an E and a Q
– Otherwise we will have incomplete query results and erroneous statistics in annotations
– The meaning must be computable
• Formally, an EQ description defines: a Quality which inheres_in a bearer entity

Example

[Figure: normal eye compared with eya[1]/eya[1] eye]

E= cell death in eye Q= increased rate
E= eye disc cell Q= small
E= eye disc cell Q= refractile

Kinds of entities which can be bearers of biological qualities

• Continuants (3D entities)
– Cell parts (GO)
– Cells (OBO Cell ontology)
– Gross anatomical entities (CARO, FMA, flyAO, MA, zfishAO, …)
– Aggregates of organisms (?)
• Occurrents (4D entities)
– Biological processes (GO)

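[Figure: normal eye compared with eya[1]/eya[1] eye, with source ontologies marked: entities from GO and FlyAO, qualities from PATO]

E= cell death in eye Q= increased rate
E= eye disc cell Q= small
E= eye disc cell Q= refractile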

Tour of PATO

• Tour from the top-down
• The top level of PATO has been built according to formal ontological principles
– This helps us define terms in a consistent and unambiguous way
– The top level can be hidden from end-users by means of ontology views (aka slims)
– Still subject to change
  • Feedback welcome!

PATO: Top level division

Quality
– Quality of a continuant: a quality which inheresIn a continuant
– Quality of an occurrent: a quality which inheresIn a process or spatiotemporal region

[Figure: PATO top-level hierarchy. Nodes shown: physical quality, cellular quality, color, morphology, density, shape, size, structure, arrested, premature, delayed, duration, rate. Note: some nodes omitted for brevity.]

Divisions by granularity

Monadic quality of a continuant
– Physical quality: a quality that exists through action of continuants at the physical level of organisation
  • color: green, pink, yellow
  • temperature: hot, cold
  • mass: large mass, small mass
  • …
– Cellular quality: a quality that exists at the cellular level of organisation
  • potency: multipotent, totipotent, oligopotent
  • nucleate quality: anucleate, binucleate
  • ploidy: diploid, haploid, aneuploid
  • …

Monadic vs relational quality of a continuant

– Monadic quality of a C: a quality of a C that inheres solely in the bearer and does not require another entity
  • Physical quality
  • Cellular quality
  • Morphology: shape, size, structure
  • …
– Relational quality of a C: a quality of a C that requires another entity apart from its bearer to exist
  • Displacement (with)
  • Connected-ness (to)
  • Sensitivity (to)
  • …

Example relational quality

• Sensitivity
– Directed towards some entity type
• E.g.
– Sensitivity of an eye to red light
  • The quality inheres_in the eye
  • With respect to (towards) red light
– Pheno-syntax:
  • E= eye Q= sensitivity E2= red_light
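The E / Q / E2 pattern above can be treated as a small record with an optional third slot. The sketch below is an assumption-laden illustration: the `EQ` class and `parse_pheno_syntax` function are hypothetical names, and the parser only handles the one-line `E= … Q= … E2= …` form with values that contain no spaces. It is not the Pheno-syntax specification.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EQ:
    """An entity-quality description; entity2 is set only for relational qualities."""
    entity: str
    quality: str
    entity2: Optional[str] = None

    def pheno_syntax(self) -> str:
        parts = [f"E= {self.entity}", f"Q= {self.quality}"]
        if self.entity2 is not None:
            parts.append(f"E2= {self.entity2}")
        return " ".join(parts)


def parse_pheno_syntax(text: str) -> EQ:
    # Try "E2" before "E" so the longer tag wins; values must not contain spaces.
    fields = dict(re.findall(r"\b(E2|E|Q)=\s*(\S+)", text))
    return EQ(fields["E"], fields["Q"], fields.get("E2"))


sensitivity = EQ("eye", "sensitivity", "red_light")
print(sensitivity.pheno_syntax())
print(parse_pheno_syntax("E= eye Q= sensitivity E2= red_light") == sensitivity)
```

A monadic quality simply leaves `entity2` unset, so both kinds of description share one record shape.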

On absence

• Annotation patterns for absence and counts are currently under discussion
• “spermatocyte devoid of asters”
– E= CL:spermatocyte
  • Inheres in the spermatocyte
– Q= PATO:lacks_part
  • The quality/relation of missing some part or parts
– E2= GO-CC:aster
  • The quality is with respect to the type “aster”

Pre- vs post- composition

• When do we build the phenotype description?
– In the ontology
– During annotation?
• Reconciling pre and post composition: An analysis of the plant_trait ontology

When do we build the phenotype description?

• Early?
– Pre-composed phenotype definitions
  • MP:0000017 “big ears”
  • TO:0000227 “root length”
  • TO:0000029 “chlorine sensitivity”
• Late?
– Post-composed phenotype definitions
  • E= MA:ear Q= PATO:big
  • E= PO:root Q= PATO:length
  • E= organism Q= PATO:sensitivity E2= CHEBI:chlorine

Is this comparable?

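Pre-composed terms shown: MP:0000285 “abnormal cardiac valve morphology”, MP:0000287 “heart valve hypoplasia”

Post-composed description: E= MA:heart_valve Q= PATO:hypoplastic

PATO terms shown: PATO:0000051 “morphology”, PATO:0000141 “structure”, PATO:0000645 “hypoplastic”

MP:0000287 “heart valve hypoplasia” ? E= MA:heart_valve Q= PATO:hypoplastic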

Yes: if term is decomposable

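Pre-composed terms shown: MP:0000285 “abnormal cardiac valve morphology”, MP:0000287 “heart valve hypoplasia”

Post-composed description: E= MA:heart_valve Q= PATO:hypoplastic

PATO terms shown: PATO:0000051 “morphology”, PATO:0000141 “structure”, PATO:0000645 “hypoplastic”

MP:0000287 “heart valve hypoplasia” = E= MA:heart_valve Q= PATO:hypoplastic

Def: a hypoplasticity which inheres_in a heart valve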
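Once a pre-composed term has a decomposition, it can be compared with post-composed annotations mechanically. The sketch below is illustrative only: the decomposition given for MP:0000285 and the is_a chain hypoplastic → structure → morphology are assumptions made for the example, not statements taken from MP or PATO, and the quality identifiers are written as readable names rather than numeric IDs.

```python
# Hypothetical decompositions of pre-composed terms into (entity, quality) pairs.
# The MP:0000287 entry mirrors the definition above; the MP:0000285 entry is assumed.
DECOMPOSED = {
    "MP:0000287": ("MA:heart_valve", "PATO:hypoplastic"),
    "MP:0000285": ("MA:heart_valve", "PATO:morphology"),
}

# Assumed is_a chain among the PATO terms shown on the slide.
QUALITY_IS_A = {
    "PATO:hypoplastic": "PATO:structure",
    "PATO:structure": "PATO:morphology",
}


def quality_ancestors(quality):
    """Return the quality plus everything above it in the assumed is_a chain."""
    found = {quality}
    while quality in QUALITY_IS_A:
        quality = QUALITY_IS_A[quality]
        found.add(quality)
    return found


def matches(pre_composed_id, post_composed):
    """True if the post-composed EQ equals, or is more specific than, the decomposed term."""
    entity, quality = DECOMPOSED[pre_composed_id]
    post_entity, post_quality = post_composed
    return entity == post_entity and quality in quality_ancestors(post_quality)


annotation = ("MA:heart_valve", "PATO:hypoplastic")
print(matches("MP:0000287", annotation))
print(matches("MP:0000285", annotation))
```

With these assumptions, a query for the broader morphology term also retrieves the post-composed hypoplasia annotation, which is the kind of comparison that decomposition makes possible.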

Comparing phenotypes

• We want to compare and query both within and across species
– For gross anatomical phenotypes to be compared across species, descriptions must be decomposed or decomposable to anatomical terms
• Anatomical terms must be comparable
– Homology links
– CARO: Common Anatomy Reference Ontology

Case study: Defining plant traits with PATO

• OBO Plant Trait ontology
• Pre-composed phenotype terms
– Analogous to OBO mammalian_phenotype ontology
• Task: Define these terms with PATO
– A good test of PATO
– Demonstration of compositional approach
– Allows meaningful comparison across plant species
– Pilot study before applying to metazoans

http://www.bioontology.org/wiki/index.php/PATO:Pre_vs_Post_Coordinating

Methods

• Creation of genus-differentia definitions
– First pass: Obol
– Second pass: manual editing
• Ontologies used
– PATO
– Plant anatomical entities (PO)
– Gramene environment (GEO)
– Chemical entities of biological interest (CHEBI)
– GO

Basic phenotype terms

• “root length” (TO:0000227)
– E= PO:root Q= PATO:length
– Formally:

Def: a length which inheres_in a root


Relational qualities involving types of chemical

• “Chlorine sensitivity” [TO:0000029]
– Directed towards an additional entity type
– Q= PATO:sensitivity E2= CHEBI:chlorine

Def: a sensitivity which is directed towards chlorine [ inheres_in organism ]

Relational qualities involving the environment

• “drought sensitivity” [TO:0000029]
– Directed towards an additional entity type
– Q= PATO:sensitivity E2= EO:drought

Def: a sensitivity which is directed towards drought [ inheres_in organism ]

OBO needs a good environment ontology

Complex phenotypes

• “Chinsura boro”
– “Abortion of microspore development at trinucleate stage”

Def: a arrested which inheres_in ( microspore development which during trinucleate stage )
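A complex definition like this one nests one composed description inside another. A minimal sketch of that nesting, using plain tuples and a hypothetical `render` helper (an illustration, not Obol's output routine):

```python
def render(expr):
    """Render a term name, or a (genus, relation, filler) tuple whose filler may itself be a tuple."""
    if isinstance(expr, str):
        return expr
    genus, relation, filler = expr
    inner = render(filler)
    if not isinstance(filler, str):
        # Parenthesise nested descriptions, matching the layout of the definition above.
        inner = f"( {inner} )"
    return f"{genus} which {relation} {inner}"


chinsura_boro = (
    "arrested",
    "inheres_in",
    ("microspore development", "during", "trinucleate stage"),
)
print("Def: a " + render(chinsura_boro))
```

The bearer of the quality here is itself a composed description: a process restricted to a developmental stage.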

Results of plant_trait analysis

• 252/784 terms provided with genus-differentia definitions so far

• Helped find inconsistencies and problems in the ontology

• New term suggestions for PATO
– proportionality

• Approach should work for animal phenotype ontologies

Bacterial phenotypes

• Performed similar analysis on bacterial phenotype terms
– Provided by Garrity & Hozzein
• Results (morphological only):
– 26 new terms added to PATO
– Rugose, rhizoidal, lobate, filamentous, …
– Todo: chemical utilization phenotypes
• Required:
– Ontologies for aggregates of organisms
– Assay ontology

Measurements

• Ontologies provide qualitative partitions on the kinds of entities we find in nature

• We may also want to record quantitative information
– Comes from measurements of qualities
– The measurement is not the phenotype
  • Phenotypes exist independently of our measurements of them

Measurement schema

• A measurement record consists of
– The quality being measured
  • E.g. the length of a particular mouse tail
– The unit type
  • From PATO UO
– A magnitude
  • Floating point number
  • Error measure [optional]
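The schema above can be sketched as a small record type. Everything named here is an assumption for illustration: the `Measurement` class, its field names, the bearer label, and the numeric values are made up, and the unit is given as a plain string rather than a PATO UO identifier.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Measurement:
    """One measurement of a quality instance; the measurement is not the phenotype itself."""
    quality: str                   # the kind of quality measured, e.g. length
    bearer: str                    # the particular entity bearing the quality
    unit: str                      # unit type (a unit ontology term in practice)
    magnitude: float               # the measured value
    error: Optional[float] = None  # optional error measure

    def describe(self) -> str:
        err = "" if self.error is None else f" +/- {self.error}"
        return f"{self.quality} of {self.bearer}: {self.magnitude}{err} {self.unit}"


# Illustrative values only.
tail = Measurement("length", "mouse tail #1", "centimeter", 8.5, 0.1)
print(tail.describe())
```

Keeping the measured quality separate from the magnitude and unit reflects the point that the same quality can be measured many times without changing the phenotype.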

Sample of PATO UO

• Unit
– Base unit
  • Length unit
    – Angstrom
    – meter
  • Mass unit
    – Dalton
    – Gram
  • Substance unit
– Derived unit
  • Concentration unit
    – pH
• Quality
– Morphology
  • Size
    – length
– Physical quality
  • Mass

Phenotype exchange formats

• Genotypes and phenotypes:
– Pheno-syntax
– Pheno-XML
• General purpose
– OWL (using canonical EQ encoding)
  • Also has Obo equivalent
• GO annotation files
– Works with pre-coordinated terms only

OBD-Phenotype

• A database for phenotype associations
• Built on OBD framework
– Tuned for inference and reasoning
– Graph traversal built in from the start
• Results
– Annotations on data from OMIM, ZFIN and FlyBase
– Currently too small a dataset to do analysis

Next steps

• Get PATO & Phenote used across multiple organisms and projects
– MODs, BIRN, OMIM,
• Collect annotation data from multiple sources in one repository (OBD)
– Both pre + post composed
– Demonstrated improved analysis of annotation data using PATO

• filamentous - having thin filamentous extensions at its edge
• pleomorphic - a quality inhering in a cell by virtue of its ability to take on two or more different shapes during its life cycle
• pulvinate - shaped like a cushion or has a marked convex cushion-like form
• umbonate - having a knob or knoblike protuberance
• rugose - having many wrinkles or creases on the surface
• glistening - emitting or reflecting lots of light
• dull - emitting or reflecting little or no light
• viscid - covered with a sticky or clammy coating
• mucoid - consistency of mucus
• spiral - plane curve traced by a point circling about the center but at increasing distances from the center
• rhizoidal - having root-like extensions radiating from its center
• spiny - having spines, thorns or similar stiff projections on its surface
• warty - having a hard rough surface; not smooth
• curled - having parallel chains in undulate fashion on the border
• fragile - easily damaged or disrupted; brittle
• butyraceous - resembling butter in appearance and consistency
• undulate - having a wavy, shallow edge
• punctiform - small and resembling a point
• lobate - a morphological quality in which the bearer has deeply undulated edges forming lobes
• erose - having an irregularly toothed edge
• raised - a thick colony that appears above the medium surface with terraced edges
• convex - a shape that obtains by virtue of having inward facing edges; having a surface or boundary that curves or bulges outward, as the exterior of a sphere

Proportions

• “amylose to amylopectin ratio” TO:0000372

Def: a compositionality which is directed towards amylose relative_to amylopectin [ inheres_in organism ]
