Introduction Extracting trait information Ontology generation
Flora Phenotype Ontology
George Gosline, Quentin Groom, Thomas Hamann, Robert Hoehndorf, Claus Weiland
pro-iBiosphere Final Event
11 June 2014
Kadsura heteroclita (Flora Malesiana)
Male flowers with 39-62(-72) stamens, red, absent from apex of torus, anthers sessile, closely appressed in subglobose to ellipsoid head, 2.0-4.5 mm diameter, connectives broad, with lateral thecae so that the thecae of adjacent stamens contiguous; pollen hexacolpate.
Phytolacca dodecandra (Flora of Central Africa)
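Fleurs odorantes, rouges, blanches, jaunes ou vertes; tépales ovales-triangulaires, obtus ou subaigus au sommet, de 1,53 mm de long et de la moitié de large, pubescents extérieurement, à marge membraneuse, libres, réfractés à maturité.
(English: Flowers fragrant, red, white, yellow or green; tepals ovate-triangular, obtuse or subacute at the apex, 1,53 mm long and half as wide, pubescent on the outside, with a membranous margin, free, reflexed at maturity.)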
Entity-Quality model
Traits are decomposed into:
entity: anatomy, function/process
quality: attribute, value
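As a rough illustration of the Entity-Quality decomposition, the sketch below models a trait as an entity plus a quality, where the quality is a value that falls under an attribute. The class and field names (`Entity`, `Quality`, `Trait`) are hypothetical and chosen only for this example; the term identifiers are the ones quoted in the ontology generation slide of this talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    label: str    # anatomy or function/process term, e.g. "flower"
    term_id: str  # identifier of the entity term


@dataclass(frozen=True)
class Quality:
    value: str         # e.g. "red"
    value_id: str
    attribute: str     # the attribute the value belongs to, e.g. "color"
    attribute_id: str


@dataclass(frozen=True)
class Trait:
    entity: Entity
    quality: Quality

    def describe(self) -> str:
        # Human-readable label built from entity and quality value.
        return f"{self.entity.label} {self.quality.value}"


flower = Entity("flower", "PO:0009046")
red = Quality("red", "PATO:0000322", "color", "PATO:0000014")
trait = Trait(flower, red)
print(trait.describe())
```

Keeping the attribute next to the value is one possible design; it makes it easy to derive both a specific class ("flower red") and a more general one ("flower color") from a single extracted pair.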
Plant Ontology
plant anatomy and development
labels, definitions, synonyms (Spanish, Japanese)
relations: parthood, development
Plant Ontology: flower
subclass of: reproductive shoot system, flower meristem
develops from: flower meristem
137 flower parts:
flower nectary, androecium, palea awn, tepal base, petal tip, ...
PATO ontology
attributes and values
labels, definitions, synonyms
PATO: Color red
value: red
attribute: color
Extracting trait information
1 text processing
2 ontology generation
Text processing
Kadsura heteroclita
Male flowers with 39-62(-72) stamens, red, absent from apex of torus, anthers sessile, closely appressed in subglobose to ellipsoid head, 2.0-4.5 mm diameter, connectives broad, with lateral thecae so that the thecae of adjacent stamens contiguous; pollen hexacolpate.
sentence identification (Apache OpenNLP toolkit)
stemming, stop word removal, etc. (Apache Lucene)
entity identification (Apache Lucene, Plant Ontology)
quality identification (Apache Lucene, PATO)
semantic relationships (Stanford parser)
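The steps listed above can be approximated in a few lines to show the general shape of the pipeline. This is a toy sketch using only the Python standard library, not the tools named on the slide: the regular-expression sentence splitter, the plural-stripping `normalize` function and the two tiny lexicons are stand-ins (assumptions) for OpenNLP, Lucene and the full Plant Ontology and PATO vocabularies. The semantic relationship step is not attempted; the sketch only reports which entity and quality terms occur in the same sentence.

```python
import re

# Toy lexicons standing in for Plant Ontology and PATO labels (assumption:
# a real pipeline would index the complete ontologies).
PO_TERMS = {"flower": "PO:0009046"}
PATO_TERMS = {"red": "PATO:0000322"}
STOP_WORDS = {"with", "from", "of", "in", "to", "so", "that", "the"}


def split_sentences(text):
    # Naive split: a period, whitespace, then a capital letter.
    parts = re.split(r"(?<=\.)\s+(?=[A-Z])", text)
    return [p.strip() for p in parts if p.strip()]


def normalize(token):
    # Crude plural stripping in place of a real stemmer.
    token = token.lower()
    if len(token) > 3 and token.endswith("s") and not token.endswith(("us", "ss")):
        return token[:-1]
    return token


def tokens(sentence):
    # Keep alphabetic words only, drop stop words, then normalize.
    words = re.findall(r"[A-Za-z]+", sentence)
    return [normalize(w) for w in words if w.lower() not in STOP_WORDS]


def annotate(text):
    results = []
    for sentence in split_sentences(text):
        toks = tokens(sentence)
        results.append({
            "sentence": sentence,
            "entities": [(t, PO_TERMS[t]) for t in toks if t in PO_TERMS],
            "qualities": [(t, PATO_TERMS[t]) for t in toks if t in PATO_TERMS],
        })
    return results


text = "Male flowers with 39-62(-72) stamens, red, absent from apex of torus."
annotations = annotate(text)
print(annotations[0]["entities"], annotations[0]["qualities"])
```

Deciding whether "red" modifies the flowers or the stamens is exactly what the parsing step is for, which is why co-occurrence alone is only a starting point.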
Ontology generation
Entity: flower (PO:0009046)
Quality (value): red (PATO:0000322)
Attribute(red): color (PATO:0000014)
from PATO ontology
3 classes in FLOPO:
flower red
EquivalentTo: phenotype-of some (flower and has-quality some red)
flower color
EquivalentTo: phenotype-of some (flower and has-quality some color)
flower phenotype
EquivalentTo: phenotype-of some (part-of some flower)
automatic reasoning to generate ontology structure
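The pattern on this slide, one entity-quality pair yielding three class definitions, can be sketched as a small template function. The function name `flopo_classes` is hypothetical, and the sketch only builds the class expressions as strings in the notation shown on the slide; arranging the classes into a hierarchy is left to an OWL reasoner and is not shown here.

```python
def flopo_classes(entity, value, attribute):
    """Return label -> class expression for one entity-quality pair."""
    return {
        # Most specific class: the entity bearing the observed value.
        f"{entity} {value}":
            f"phenotype-of some ({entity} and has-quality some {value})",
        # Intermediate class: the entity bearing any value of the attribute.
        f"{entity} {attribute}":
            f"phenotype-of some ({entity} and has-quality some {attribute})",
        # Most general class: any phenotype of a part of the entity.
        f"{entity} phenotype":
            f"phenotype-of some (part-of some {entity})",
    }


classes = flopo_classes("flower", "red", "color")
for label, expression in classes.items():
    print(f"{label}\n  EquivalentTo: {expression}")
```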
Summary
data-driven generation of FLOPO
every class in FLOPO has at least one annotation
“fits the data”
over 25,000 classes
over 460,000 annotations
Flora Malesiana, Flora Gabon, Flora of Central Africa, African Floras
(but not all annotations are correct, sorry)
Summary
integration of plant traits
multi-language
multi-taxon
multi-flora
enables trait-based search
comparative analysis
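To illustrate what trait-based search could look like, the sketch below looks up taxa by a general class and also finds taxa annotated with more specific subclasses. Everything here is an assumption made for the example: the `SUBCLASS_OF` fragment is written by hand (in FLOPO the hierarchy would come from automated reasoning), and the annotations are toy values based on the two example descriptions from the introduction, not actual FLOPO output.

```python
# Hand-written fragment of a class hierarchy (assumption, see lead-in).
SUBCLASS_OF = {
    "flower red": "flower color",
    "flower white": "flower color",
    "flower color": "flower phenotype",
}

# Illustrative annotations based on the two example descriptions.
ANNOTATIONS = {
    "Kadsura heteroclita": {"flower red"},
    "Phytolacca dodecandra": {"flower red", "flower white"},
}


def ancestors(cls):
    """Yield cls and every superclass reachable through SUBCLASS_OF."""
    while cls is not None:
        yield cls
        cls = SUBCLASS_OF.get(cls)


def search(query):
    """Return taxa annotated with the query class or any of its subclasses."""
    return sorted(
        taxon
        for taxon, classes in ANNOTATIONS.items()
        if any(query in ancestors(c) for c in classes)
    )


hits_color = search("flower color")
hits_white = search("flower white")
print(hits_color, hits_white)
```

Because the lookup walks up the hierarchy, a query for a general class also returns taxa described in different floras or languages, as long as their descriptions were annotated with a subclass of it.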
What’s next?
improve annotation pipeline
text processing: CharaParser, languages
add missing terms to PO, PATO
quantitative traits
better ontology
must be usable by domain experts
less “artificial”
more data
the “World Flora”
link to genetics/genomics (mutants, GWAS)
environment and habitat
continuous improvement