ontology and phylogeny: ontologies as research tools linking phylogenies, systematics, phenotypes,...

Ontology and Phylogeny: Ontologies as research tools linking phylogenies, systematics, phenotypes, and genomics. Brent D. Mishler, University of California, Berkeley; Jepson Herbarium; University Herbarium

Upload: russell-barker

Post on 30-Dec-2015


TRANSCRIPT

Page 1: Ontology and Phylogeny: Ontologies as research tools linking phylogenies, systematics, phenotypes, and genomics Brent D. Mishler University of California,

Ontology and Phylogeny: Ontologies as research tools linking phylogenies, systematics, phenotypes, and genomics

Brent D. Mishler
University of California, Berkeley

Jepson Herbarium
University Herbarium

Page 2

Ontologies in general
• Are classifications
• Naming things, and organizing them in databases, is critical in all mature sciences
• Need for frameworks for understanding
• Two organizing forces in biology:
  – current function
  – history (homology)
• Uses of cladograms for untangling these
• The role of systematics in relation to molecular, cellular, and developmental biology -- once estranged, now vitally interlinked.

Page 3

Some things to think about in the “annotation” process
• So what does it mean to say I have the “same” or “related” genes in two different genomes?
• Or for that matter, the “same” or “related” genes in the same genome?
• Three ways to go:
  – Name gene haphazardly by whatever criteria the discoverer thinks best -- common practice, unfortunately!
  – Name gene by functional criteria
  – Name gene by phylogenetic criteria
• The need for ontologies (a formal classification)

Page 4

Approach taken by Gene Ontology Consortium:

“The Gene Ontology project provides an ontology of defined terms representing gene product properties. The ontology covers three domains: cellular component, the parts of a cell or its extracellular environment; molecular function, the elemental activities of a gene product at the molecular level, such as binding or catalysis; and biological process, operations or sets of molecular events with a defined beginning and end, pertinent to the functioning of integrated living units: cells, tissues, organs, and organisms.

For example, the gene product cytochrome c can be described by the molecular function term oxidoreductase activity, the biological process terms oxidative phosphorylation and induction of cell death, and the cellular component terms mitochondrial matrix and mitochondrial inner membrane.”

From: http://www.geneontology.org/index.shtml
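The cytochrome c example can be pictured as a small annotation record keyed by the three GO domains. This is a minimal Python sketch, assuming a plain in-memory dictionary layout of our own (not the GO Consortium's file format) and using term names rather than GO identifiers; `GODomain` and `terms_for` are hypothetical names.

```python
from enum import Enum


class GODomain(Enum):
    """The three GO domains named in the quotation."""
    MOLECULAR_FUNCTION = "molecular function"
    BIOLOGICAL_PROCESS = "biological process"
    CELLULAR_COMPONENT = "cellular component"


# Hypothetical layout: gene product -> domain -> set of term names.
annotations: dict[str, dict[GODomain, set[str]]] = {
    "cytochrome c": {
        GODomain.MOLECULAR_FUNCTION: {"oxidoreductase activity"},
        GODomain.BIOLOGICAL_PROCESS: {
            "oxidative phosphorylation",
            "induction of cell death",
        },
        GODomain.CELLULAR_COMPONENT: {
            "mitochondrial matrix",
            "mitochondrial inner membrane",
        },
    }
}


def terms_for(product: str, domain: GODomain) -> set[str]:
    """Return the terms annotated to a gene product in one domain (empty if none)."""
    return annotations.get(product, {}).get(domain, set())


print(sorted(terms_for("cytochrome c", GODomain.BIOLOGICAL_PROCESS)))
# ['induction of cell death', 'oxidative phosphorylation']
```

Every term in this record describes current function or location; nothing in it says how cytochrome c genes in different organisms are related by descent.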

Page 5

What would a phylogenetic approach look like?
• We need to add a gene ontology reflecting history!
• This would not be to the exclusion of functional ontologies, but rather an addition.
• We want to be able to look at function and history in light of each other, i.e., the evolution of function.
• The classic homology-analogy distinction

Page 6

Homology
• Homology can reside at any level; it requires historical passage of information from ancestor to descendant.
• Two subcategories of homology:
  – paralogy (e.g., homology due to duplications of a gene within one genome)
  – orthology (e.g., homology due to sharing of the same gene between different organisms)
• Homology is a statement of historical relationship (it implies “sameness” in a yes/no sense).
• Thus we really shouldn’t say things like “gene x is 85% homologous to gene y” or “the closest homolog to gene x is gene y”
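The distinction can be made concrete: percent identity is a graded similarity score computed from an alignment, while homology is a yes/no claim about shared ancestry. This is a minimal Python sketch; the aligned sequences, the `HomologyStatement` record, and the choice to skip gapped columns are illustrative assumptions.

```python
from dataclasses import dataclass


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of ungapped alignment columns where the two sequences match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    # Skip any column where either sequence has a gap ("-").
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


@dataclass(frozen=True)
class HomologyStatement:
    """A historical hypothesis: the two genes either share ancestry or they do not."""
    gene_x: str
    gene_y: str
    homologous: bool  # yes/no, never a percentage


similarity = percent_identity("ATGC-TA", "ATGCATT")
claim = HomologyStatement("gene x", "gene y", homologous=True)
print(f"{similarity:.1f}% identical; homologous: {claim.homologous}")
# 83.3% identical; homologous: True
```

Reporting the first number as “83% homologous” would conflate a measured similarity with a historical hypothesis.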

Page 7

The history of genes

Genes and “species,” or other higher-order lineages, may have different histories.

[Figure: gene tree with some gene pairs labeled “these are orthologs” and others labeled “these are paralogs”]
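The “orthologs” and “paralogs” labels on this slide follow a standard rule that the slide does not spell out: two genes are orthologs if their most recent common ancestor in the gene tree is a speciation event, and paralogs if it is a duplication event. This is a minimal Python sketch, assuming a hand-built gene tree whose internal nodes are already labeled with their event type; the gene names (`alpha_human` and so on) are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(eq=False)  # compare nodes by identity; parent links make field-wise equality recursive
class Node:
    name: str
    event: str | None = None  # "speciation" or "duplication" on internal nodes
    children: list[Node] = field(default_factory=list)
    parent: Node | None = None

    def add(self, *kids: Node) -> Node:
        """Attach child nodes and return self, so trees can be built inline."""
        for kid in kids:
            kid.parent = self
            self.children.append(kid)
        return self


def ancestors(node: Node | None) -> list[Node]:
    """The node itself followed by each ancestor up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = node.parent
    return path


def mrca(a: Node, b: Node) -> Node:
    """Most recent common ancestor: the first ancestor of b that is also an ancestor of a."""
    on_path_of_a = ancestors(a)
    return next(n for n in ancestors(b) if n in on_path_of_a)


def relationship(a: Node, b: Node) -> str:
    """Classify two genes by the event at their most recent common ancestor."""
    event = mrca(a, b).event
    if event == "speciation":
        return "orthologs"
    if event == "duplication":
        return "paralogs"
    raise ValueError("the common ancestor carries no event label")


# Hypothetical gene tree: a duplication produced copies alpha and beta,
# then a speciation split each copy between two species.
alpha_human, alpha_mouse = Node("alpha_human"), Node("alpha_mouse")
beta_human, beta_mouse = Node("beta_human"), Node("beta_mouse")
alpha = Node("alpha", "speciation").add(alpha_human, alpha_mouse)
beta = Node("beta", "speciation").add(beta_human, beta_mouse)
root = Node("root", "duplication").add(alpha, beta)

print(relationship(alpha_human, alpha_mouse))  # orthologs
print(relationship(alpha_human, beta_human))   # paralogs
```

Note that `alpha_human` and `beta_mouse` also come out as paralogs even though they sit in different organisms, which is why the “within one genome” and “between different organisms” glosses on the homology slide are examples rather than definitions.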

Page 8

The names of genes
So what does it mean to say I have the “same” gene, in a phylogenetic sense, in two different organisms?

Problems:
• nucleotide evolution keeps happening, so genes are not identical.
• genes evolve at different rates, therefore the most similar genes may not be the most closely related.
• gene conversion
• extinction of gene copies
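The rate problem can be shown on a three-gene toy tree in which genes A and B are sisters but B's lineage evolved much faster. This is a minimal Python sketch, assuming patristic distance (branch lengths summed along the path between two genes) as a stand-in for how different two sequences are; the tree and its branch lengths are invented for illustration.

```python
# Hypothetical rooted gene tree as child -> (parent, branch length).
# A and B are sister genes under node X; B's branch is long (fast evolution).
TREE: dict[str, tuple[str, float]] = {
    "A": ("X", 0.1),
    "B": ("X", 0.9),
    "X": ("root", 0.1),
    "C": ("root", 0.1),
}
GENES = ["A", "B", "C"]


def path_to_root(node: str) -> list[str]:
    """The node followed by each ancestor up to the root."""
    path = [node]
    while node in TREE:
        node = TREE[node][0]
        path.append(node)
    return path


def mrca(a: str, b: str) -> str:
    on_a = set(path_to_root(a))
    return next(n for n in path_to_root(b) if n in on_a)


def length_up_to(node: str, ancestor: str) -> float:
    """Sum of branch lengths from a node up to one of its ancestors."""
    total = 0.0
    while node != ancestor:
        node, length = TREE[node]
        total += length
    return total


def patristic(a: str, b: str) -> float:
    """Path length between two genes through their most recent common ancestor."""
    anc = mrca(a, b)
    return length_up_to(a, anc) + length_up_to(b, anc)


others = [g for g in GENES if g != "A"]
# Most similar: smallest patristic distance to A.
most_similar = min(others, key=lambda g: patristic("A", g))
# Most closely related: shares the most recent (deepest) common ancestor with A.
closest = max(others, key=lambda g: len(path_to_root(mrca("A", g))))
print(f"most similar to A: {most_similar}; most closely related to A: {closest}")
# most similar to A: C; most closely related to A: B
```

A gene name assigned by “best hit” similarity would group A with C here, even though the tree says A's nearest relative is B.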

Page 9

The names of genes
So what does it mean to say I have the “same” gene, in a phylogenetic sense, in two different organisms?

Solution:
• phylogeny of gene copies, without regard to “host” genome
• compare with “host” phylogeny
• need good sampling!
• need whole genomes!
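One standard way to carry out the comparison, not named on the slide, is LCA mapping (gene tree/species tree reconciliation). Each gene-tree node is mapped to the most recent common ancestor, in the host tree, of the genomes its genes came from. A node that maps to the same host node as one of its children is inferred to be a duplication; otherwise it is a speciation. This is a minimal Python sketch; the host tree, gene copies (`h1`, `m1`, ...), and gene tree are hypothetical.

```python
# Hypothetical host (species) tree as child -> parent links.
SPECIES_PARENT = {
    "human": "mammals",
    "mouse": "mammals",
    "mammals": "amniotes",
    "chicken": "amniotes",
}

# Each gene copy is assigned to the genome it was sampled from.
GENE_SPECIES = {"h1": "human", "m1": "mouse", "h2": "human", "m2": "mouse", "c1": "chicken"}

# Gene tree built from the gene copies alone, as nested (binary) tuples.
GENE_TREE = ((("h1", "m1"), ("h2", "m2")), "c1")


def lineage(species: str) -> list[str]:
    """A host-tree node followed by its ancestors up to the root."""
    path = [species]
    while path[-1] in SPECIES_PARENT:
        path.append(SPECIES_PARENT[path[-1]])
    return path


def species_lca(a: str, b: str) -> str:
    on_a = set(lineage(a))
    return next(s for s in lineage(b) if s in on_a)


def reconcile(node, events: list) -> str:
    """Map a gene-tree node to a host-tree node, recording one event per internal node."""
    if isinstance(node, str):
        return GENE_SPECIES[node]
    left = reconcile(node[0], events)
    right = reconcile(node[1], events)
    mapped = species_lca(left, right)
    kind = "duplication" if mapped in (left, right) else "speciation"
    events.append((node, mapped, kind))
    return mapped


events: list = []
reconcile(GENE_TREE, events)
for node, mapped, kind in events:
    print(f"{kind} at {mapped}: {node}")
```

The inferred duplications depend on which copies and genomes were sampled, which is one reason the slide stresses good sampling and whole genomes.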

Page 10

A node-based name: “I name the gene clade that contains A, B, and all the descendants of their most recent common ancestor”

[Figure: gene tree with genes A, B, and Z; a label marks the node in the gene tree being named]

A, B, and Z here are called specifiers: A & B are internal specifiers, while Z is an external specifier. In this system, these would be genes.

The main contribution of the Phylocode is to provide an unambiguous way to name clades: this could work for gene clades!
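The node-based definition can be applied mechanically to any tree: find the most recent common ancestor of the internal specifiers, take all of its descendants, and check that no external specifier falls inside. This is a minimal Python sketch; the tree and gene labels are invented, and treating an included external specifier as “the name does not apply on this tree” is our assumption rather than quoted PhyloCode text.

```python
from __future__ import annotations

# Hypothetical gene tree as child -> parent links:
# root -> (n1, Z); n1 -> (n2, D); n2 -> (A, n3); n3 -> (B, C)
PARENT = {
    "A": "n2", "B": "n3", "C": "n3",
    "n3": "n2", "n2": "n1", "D": "n1",
    "n1": "root", "Z": "root",
}
GENES = set(PARENT) - set(PARENT.values())  # tips: nodes that are never a parent


def ancestors(node: str) -> list[str]:
    """The node followed by each ancestor up to the root."""
    path = [node]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def mrca(nodes: list[str]) -> str:
    """Most recent common ancestor of all listed nodes."""
    common = set(ancestors(nodes[0]))
    for n in nodes[1:]:
        common &= set(ancestors(n))
    return next(a for a in ancestors(nodes[0]) if a in common)


def descendant_genes(node: str) -> set[str]:
    return {g for g in GENES if node in ancestors(g)}


def resolve_node_based(internal: list[str], external: list[str]) -> set[str] | None:
    """Genes in the named clade, or None if an external specifier falls inside it."""
    clade = descendant_genes(mrca(internal))
    if any(z in clade for z in external):
        return None
    return clade


print(sorted(resolve_node_based(["A", "B"], ["Z"])))  # ['A', 'B', 'C']
print(resolve_node_based(["A", "Z"], ["D"]))          # None
```

Gene C ends up in the clade although it is not a specifier, which is the point of defining the clade by ancestry rather than by listing its members.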

Page 11

A Phylogenetic Classification of Genes
• new proposal: need a unique phylogenetic identifier for each gene and gene clade (distinct from the associated taxon name!!)
• internal and external specifiers (other named genes)
• registered in a database (GO associated?)
• Parallel to the developing Phylocode for taxonomy of organism lineages (an interesting and unanticipated convergence)

http://www.ohiou.edu/phylocode/
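The slide leaves the registry's design open. This is a minimal Python sketch of one possible shape, assuming a neutral sequential identifier with a hypothetical `GC` prefix, definitions stored as sets of internal and external specifier genes, and the rule that re-registering an identical definition returns the existing identifier. All class names and rules here are hypothetical, not part of GO or the PhyloCode.

```python
from __future__ import annotations

import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class GeneCladeRecord:
    identifier: str                       # neutral ID; says nothing about the host taxon
    internal_specifiers: frozenset[str]   # previously named genes inside the clade
    external_specifiers: frozenset[str]   # previously named genes outside the clade


class GeneCladeRegistry:
    """Hypothetical registry handing out one unique identifier per clade definition."""

    def __init__(self, prefix: str = "GC") -> None:
        self._prefix = prefix
        self._counter = itertools.count(1)
        self._by_definition: dict[tuple[frozenset[str], frozenset[str]], GeneCladeRecord] = {}

    def register(self, internal: set[str], external: set[str]) -> GeneCladeRecord:
        if len(internal) < 2:
            raise ValueError("a node-based definition needs at least two internal specifiers")
        if internal & external:
            raise ValueError("a gene cannot be both an internal and an external specifier")
        key = (frozenset(internal), frozenset(external))
        if key in self._by_definition:  # same definition -> same identifier
            return self._by_definition[key]
        record = GeneCladeRecord(f"{self._prefix}{next(self._counter):06d}", *key)
        self._by_definition[key] = record
        return record


registry = GeneCladeRegistry()
record = registry.register({"geneA", "geneB"}, {"geneZ"})
again = registry.register({"geneB", "geneA"}, {"geneZ"})
print(record.identifier, again is record)  # GC000001 True
```

Keeping the identifier free of any taxon name follows the slide's point that a gene clade's name should be distinct from the name of the organisms carrying it.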

Page 12

What about phenotypes?
• Like genes, need a primary name, plus inclusive classifications.
• Primary name, by analogy with genes, should be a neutral identifier (e.g., GenBank accession)
• Linked to a specific data point, a particular organism, i.e., a specimen, photo, anatomical prep, plus metadata.
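The primary-name-plus-classification idea can be sketched as an immutable record whose neutral identifier never changes while classification terms are attached separately. This is a minimal Python sketch; the field names, the `PH-` identifier format, and the example values are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class PhenotypeRecord:
    """One phenotype data point; field names and ID format are hypothetical."""
    accession: str    # neutral primary name, analogous to a GenBank accession
    organism_id: str  # the particular organism the observation comes from
    evidence: str     # e.g. "specimen", "photo", "anatomical prep"
    metadata: tuple[tuple[str, str], ...] = ()  # (key, value) pairs; tuples keep the record immutable
    terms: frozenset[str] = frozenset()         # inclusive classification, kept apart from the name

    def with_term(self, term: str) -> PhenotypeRecord:
        """Return a copy filed under one more term; the accession never changes."""
        return replace(self, terms=self.terms | {term})


record = PhenotypeRecord("PH-000001", "organism-17", "photo", (("collector", "unknown"),))
classified = record.with_term("leaf").with_term("basal")
print(classified.accession, sorted(classified.terms))  # PH-000001 ['basal', 'leaf']
```

Because reclassification produces a new copy with the same accession, anything that cites the primary name stays valid when the classification is revised.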

Page 13

What about phenotypes?
• Classification could be based on:
  – development (e.g., "seedling," "anthesis")
  – location (e.g., "axillary," "basal")
  – function (e.g., "leaf," "scale," "stem," "spine")
  – history (e.g., "microphyll," "phyllid")
  – structure per se?? (probably not a good idea)
• Classifications can easily be cross-cutting, but basis of term needs to be clear to computer (and user!); meta-tags
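Meta-tagging can be as simple as storing, with each term, the basis it rests on, so that one structure can carry cross-cutting terms without the computer confusing them. This is a minimal Python sketch using the example terms from the slide; the `Basis` enum and `by_basis` helper are hypothetical, and “structure per se” is left out, following the slide's doubt about it.

```python
from __future__ import annotations

from enum import Enum


class Basis(Enum):
    DEVELOPMENT = "development"
    LOCATION = "location"
    FUNCTION = "function"
    HISTORY = "history"


# Hypothetical vocabulary: each term carries a meta-tag saying what it is based on.
VOCABULARY = {
    "seedling": Basis.DEVELOPMENT, "anthesis": Basis.DEVELOPMENT,
    "axillary": Basis.LOCATION, "basal": Basis.LOCATION,
    "leaf": Basis.FUNCTION, "scale": Basis.FUNCTION,
    "stem": Basis.FUNCTION, "spine": Basis.FUNCTION,
    "microphyll": Basis.HISTORY, "phyllid": Basis.HISTORY,
}


def by_basis(terms: set[str]) -> dict[Basis, set[str]]:
    """Group one structure's terms by basis; untagged terms are rejected."""
    grouped: dict[Basis, set[str]] = {}
    for term in terms:
        if term not in VOCABULARY:
            raise KeyError(f"untagged term: {term!r}")
        grouped.setdefault(VOCABULARY[term], set()).add(term)
    return grouped


# One hypothetical structure described under three different criteria at once.
structure = {"leaf", "basal", "microphyll"}
for basis, terms in sorted(by_basis(structure).items(), key=lambda kv: kv[0].value):
    print(f"{basis.value}: {sorted(terms)}")
```

Rejecting untagged terms is a design choice that forces every term's basis to be explicit, as the slide asks.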

Page 14

The historical criterion for phenotype ontologies needs work
• Based on homology, thus based on current best hypothesis of phylogenetic tree.
• Therefore subject to change, as phylogenies change.
• Needs to be clearly specified (i.e., linked to a specific clade); a Phylocode-type approach could be used to triangulate to clade where name applies
• When Phylocode is active, phenotype ontologies could reference RegNum for clade names.
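Because the historical criterion rests on the current best tree, a term anchored to a clade definition (rather than to a fixed list of taxa) changes its scope when the tree changes. This is a minimal Python sketch, assuming a hypothetical term `term_X` whose scope is the clade of two specifier taxa, evaluated on two invented tree hypotheses; the taxon labels are placeholders, and no RegNum lookup is modeled.

```python
from __future__ import annotations

# A tree is either a taxon name (str) or a tuple of subtrees.


def taxa(tree) -> set[str]:
    """All terminal taxa in a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(taxa(child) for child in tree))


def clade_of(tree, specifiers: set[str]) -> set[str]:
    """Taxa descended from the specifiers' most recent common ancestor."""
    if not specifiers <= taxa(tree):
        raise ValueError("specifier missing from tree")
    if not isinstance(tree, str):
        for child in tree:
            if specifiers <= taxa(child):  # a smaller subtree still holds every specifier
                return clade_of(child, specifiers)
    return taxa(tree)


# Hypothetical term record: linked to a clade definition, not to a taxon list.
TERM = {"name": "term_X", "specifiers": {"P", "Q"}}

# Two invented hypotheses about the same four taxa.
TREE_V1 = ((("P", "Q"), "R"), "S")
TREE_V2 = (("P", ("Q", "S")), "R")

before = clade_of(TREE_V1, TERM["specifiers"])
after = clade_of(TREE_V2, TERM["specifiers"])
print(sorted(before), "->", sorted(after), "gained:", sorted(after - before))
# ['P', 'Q'] -> ['P', 'Q', 'S'] gained: ['S']
```

The term's definition never changes; only the set of taxa it applies to is recomputed from whichever phylogeny is current.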