flora phenotype ontology

Introduction Extracting trait information Ontology generation Flora Phenotype Ontology George Gosline, Quentin Groom, Thomas Hamann, Robert Hoehndorf, Claus Weiland pro-iBiosphere Final Event 11 June 2014

Upload: robert-hoehndorf

Post on 26-Jan-2015

TRANSCRIPT

Page 1: Flora Phenotype Ontology

Introduction Extracting trait information Ontology generation

Flora Phenotype Ontology

George Gosline, Quentin Groom, Thomas Hamann, Robert Hoehndorf, Claus Weiland

pro-iBiosphere Final Event

11 June 2014

Page 2: Flora Phenotype Ontology

Kadsura heteroclita (Flora Malesiana)

Male flowers with 39-62(-72) stamens, red, absent from apex of torus, anthers sessile, closely appressed in subglobose to ellipsoid head, 2.0-4.5 mm diameter, connectives broad, with lateral thecae so that the thecae of adjacent stamens contiguous; pollen hexacolpate.

Phytolacca dodecandra (Flora of Central Africa)

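Fleurs odorantes, rouges, blanches, jaunes ou vertes; tépales ovales-triangulaires, obtus ou subaigus au sommet, de 1,53 mm de long et de la moitié de large, pubescents extérieurement, à marge membraneuse, libres, réfractés à maturité.

(English: Flowers fragrant, red, white, yellow or green; tepals ovate-triangular, obtuse or subacute at the apex, 1,53 mm long and half as wide, pubescent on the outside, with a membranous margin, free, bent back at maturity.)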

Page 4: Flora Phenotype Ontology

Entity-Quality model

Traits are decomposed into

entity: anatomy, function/process

quality: attribute, value
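The decomposition above can be pictured as a small data structure. This is only an illustrative sketch: the class names `Entity`, `Quality` and `Trait` and the `describe` helper are assumptions made for this example, not part of the presented system.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Entity:
    label: str  # what the trait is about, e.g. "flower"
    kind: str   # "anatomy" or "function/process"


@dataclass(frozen=True)
class Quality:
    attribute: str               # e.g. "color"
    value: Optional[str] = None  # e.g. "red"; None if only the attribute is known


@dataclass(frozen=True)
class Trait:
    entity: Entity
    quality: Quality

    def describe(self) -> str:
        # Prefer the specific value; fall back to the attribute.
        specific = self.quality.value or self.quality.attribute
        return f"{self.entity.label} {specific}"


red_flower = Trait(Entity("flower", "anatomy"), Quality("color", "red"))
coloured_flower = Trait(Entity("flower", "anatomy"), Quality("color"))
print(red_flower.describe())
print(coloured_flower.describe())
```

Keeping the attribute and the value separate is what later allows a specific trait ("flower red") to be grouped under a more general one ("flower color").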

Page 5: Flora Phenotype Ontology

Plant Ontology

plant anatomy and development

labels, definitions, synonyms (Spanish, Japanese)

relations: parthood, development

Page 6: Flora Phenotype Ontology

Plant Ontology: flower

subclass of: reproductive shoot system, flower meristem

develops from: flower meristem

137 flower parts:

flower nectary, androecium, palea awn, tepal base, petal tip, ...

Page 7: Flora Phenotype Ontology

PATO ontology

attributes and values

labels, definitions, synonyms

Page 8: Flora Phenotype Ontology

PATO: Color red

value: red

attribute: color

Page 9: Flora Phenotype Ontology

Extracting trait information

1. text processing

2. ontology generation

Page 10: Flora Phenotype Ontology

Text processing

Kadsura heteroclita

Male flowers with 39-62(-72) stamens, red, absent from apex of torus, anthers sessile, closely appressed in subglobose to ellipsoid head, 2.0-4.5 mm diameter, connectives broad, with lateral thecae so that the thecae of adjacent stamens contiguous; pollen hexacolpate.

sentence identification (Apache OpenNLP toolkit)

stemming, stop word removal, etc. (Apache Lucene)

entity identification (Apache Lucene, Plant Ontology)

quality identification (Apache Lucene, PATO)

semantic relationships (Stanford parser)
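The first four steps listed above can be approximated in a few lines of plain Python. This is a rough stand-in, not the presented pipeline: it does not use Apache OpenNLP, Apache Lucene or the Stanford parser, the term sets are tiny hypothetical samples rather than the real Plant Ontology and PATO vocabularies, and the final step (linking each quality to its entity) is left out.

```python
import re

# Hypothetical mini-lexicons; real labels would come from PO and PATO.
ENTITY_TERMS = {"flower", "stamen", "anther", "torus", "connective", "theca", "pollen"}
QUALITY_TERMS = {"red", "sessile", "broad", "subglobose", "ellipsoid", "contiguous"}
STOP_WORDS = {"with", "from", "of", "in", "to", "so", "that", "the"}
IRREGULAR = {"thecae": "theca"}  # plural forms the simple rule below would miss


def split_sentences(text):
    # Naive splitter: break after '.' or ';' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.;])\s+", text) if s.strip()]


def candidate_forms(token):
    # Crude normalisation standing in for a real stemmer.
    token = token.lower()
    forms = [token, IRREGULAR.get(token, token)]
    if token.endswith("s"):
        forms.append(token[:-1])
    return forms


def annotate(sentence):
    # Tokenise, drop stop words, then look each token up in both lexicons.
    tokens = [t for t in re.findall(r"[A-Za-z]+", sentence)
              if t.lower() not in STOP_WORDS]
    entities, qualities = [], []
    for token in tokens:
        for form in candidate_forms(token):
            if form in ENTITY_TERMS:
                entities.append(form)
                break
            if form in QUALITY_TERMS:
                qualities.append(form)
                break
    return entities, qualities


text = ("Male flowers with 39-62(-72) stamens, red, absent from apex of torus, "
        "anthers sessile, closely appressed in subglobose to ellipsoid head, "
        "2.0-4.5 mm diameter, connectives broad, with lateral thecae so that "
        "the thecae of adjacent stamens contiguous; pollen hexacolpate.")

sentences = split_sentences(text)
for sentence in sentences:
    print(annotate(sentence))
```

Note that "hexacolpate" is not matched because it is absent from the sample quality set, which mirrors the slide deck's later point about adding missing terms to PO and PATO.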

Page 15: Flora Phenotype Ontology

Ontology generation

Entity: flower (PO:0009046)

Quality (value): red (PATO:0000322)

Attribute(red): color (PATO:0000014)

from PATO ontology

3 classes in FLOPO:

flower red

EquivalentTo: phenotype-of some (flower and has-quality some red)

flower color

EquivalentTo: phenotype-of some (flower and has-quality some color)

flower phenotype

EquivalentTo: phenotype-of some (part-of some flower)

automatic reasoning to generate ontology structure
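The pattern on this slide can be written as a small generator that turns one entity and one quality value into the three class definitions. This is a minimal sketch: the function name `flopo_classes` and the lookup table `PATO_ATTRIBUTE_OF` are hypothetical, and the definitions are produced as plain strings in the notation used on the slide rather than as real OWL axioms.

```python
# Value -> attribute lookup, as it would be read from PATO (sample entry only).
PATO_ATTRIBUTE_OF = {"red": "color"}


def flopo_classes(entity, value):
    """Return the three class definitions for one entity/value pair."""
    attribute = PATO_ATTRIBUTE_OF[value]
    return {
        f"{entity} {value}":
            f"phenotype-of some ({entity} and has-quality some {value})",
        f"{entity} {attribute}":
            f"phenotype-of some ({entity} and has-quality some {attribute})",
        f"{entity} phenotype":
            f"phenotype-of some (part-of some {entity})",
    }


classes = flopo_classes("flower", "red")
for name, definition in classes.items():
    print(f"{name}\n    EquivalentTo: {definition}")
```

Because red is a kind of color in PATO, a reasoner given these definitions can infer that "flower red" falls under "flower color" without anyone stating that link by hand.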

Page 17: Flora Phenotype Ontology

Ontology generation

Page 18: Flora Phenotype Ontology

Summary

data-driven generation of FLOPO

every class in FLOPO has at least one annotation: “fits the data”

over 25,000 classes

over 460,000 annotations

Flora Malesiana, Flora Gabon, Flora of Central Africa, African Floras

(but not all annotations are correct, sorry)

Page 19: Flora Phenotype Ontology

Summary

integration of plant traits

multi-language, multi-taxon, multi-flora

enables trait-based search

comparative analysis
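Trait-based search could work roughly as follows: a query for a general class also returns taxa annotated with any of its more specific subclasses. The sketch below uses toy data throughout; the taxon names, the annotations and the hand-written class hierarchy are all invented for illustration and are not taken from FLOPO.

```python
# Assumed toy hierarchy: child class -> parent class.
SUBCLASS_OF = {
    "flower red": "flower color",
    "flower color": "flower phenotype",
}

# Toy annotations: taxon -> classes it is annotated with.
ANNOTATIONS = {
    "taxon A": {"flower red"},
    "taxon B": {"flower color"},
    "taxon C": set(),
}


def with_ancestors(cls):
    # Walk up the hierarchy, collecting the class and all of its parents.
    found = {cls}
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        found.add(cls)
    return found


def search(query):
    # A taxon matches if any of its annotations is the query or a subclass of it.
    return sorted(
        taxon
        for taxon, classes in ANNOTATIONS.items()
        if any(query in with_ancestors(c) for c in classes)
    )


print(search("flower color"))
print(search("flower red"))
```

The same lookup applied to two taxa side by side gives a simple form of comparative analysis: the classes they share and the classes in which they differ.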

Page 20: Flora Phenotype Ontology

What’s next?

improve annotation pipeline

text processing: CharaParser, languages

add missing terms to PO, PATO

quantitative traits

better ontology

must be usable by domain experts

less “artificial”

more data

the “World Flora”

link to genetics/genomics (mutants, GWAS)

environment and habitat

continuous improvement