Building the Hymenoptera Anatomy Ontology through exploration of the Journal of Hymenoptera Research
DESCRIPTION
The Hymenoptera Anatomy Ontology (HAO) project aims to capture the complex lexica used to describe Hymenoptera anatomy. Our core data are extracted from the corpus of published works, particularly descriptions of new taxa. We reviewed the Journal of Hymenoptera Research (JHR) to extract new labels and ontological classes, explored the completeness of the present version of the HAO, and reflected upon community language trends. Three hundred fifty-three (353) Journal of Hymenoptera Research articles, accessed through the Biodiversity Heritage Library, were parsed and vetted against the present ontology. New labels (2121) were collected during this process, including about 650 adjectives used to qualify morphological features. The process also revealed language trends, showing how often anatomical labels occur in the literature and possibly reflecting the character systems and qualifiers we most often use to describe novel taxa. Additionally, the novel software used for text extraction is reviewed, outlining possible improvements and useful tools resulting from this effort.
TRANSCRIPT
Building the Hymenoptera Anatomy Ontology through exploration of the Journal of Hymenoptera Research
Katja Seltmann, Matthew Bertone, Matthew J. Yoder, István Mikó, Elizabeth Macleod, Andrew Ernst, Andrew R. Deans
Volumes: 1-16
Years: 1992-2007
The opportunity…
1. Database (infrastructure)
2. Terms used in hymenoptera morphology
3. JHR Volumes 1-16 are online and processed using optical character recognition (OCR) software through the Biodiversity Heritage Library
We were wondering…
1. Can we find new terms for the HAO by text extraction?
2. Can we find patterns in how we as a community use terms? Is it really true that terminology follows phylogeny?
How we captured terms from the Journal of Hymenoptera Research…
1. Download articles from the Biodiversity Heritage Library (http://www.biodiversitylibrary.org)
2. Put the text in a database (MX)
3. Match the article text to the words we know are terms (also cataloged in the same database)
4. Add new terms based on what is NOT matched
5. People made decisions
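The matching step above can be sketched in a few lines of Python. This is only an illustration of the idea, not the MX code used in the project: the function names, the simple word-level tokenizer, and the sample sentence are all assumptions made for the example.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def split_matched(article_text, known_terms):
    """Return (matched, unmatched) word sets for one article.

    matched   -- words that are already cataloged as terms
    unmatched -- everything else, i.e. candidates for a person to review
    """
    words = set(tokenize(article_text))
    known = {term.lower() for term in known_terms}
    return words & known, words - known

# Hypothetical inputs: a tiny term list and one sentence of OCR text.
known_terms = {"carina", "propodeum", "clypeus"}
article = "Propodeum with a median carina; notauli absent."

matched, candidates = split_matched(article, known_terms)
print(sorted(matched))
print(sorted(candidates))
```

The unmatched set still contains ordinary words such as "with", which is why the last step, people making decisions, remains necessary.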
From 353 articles:
2121 morphological terms
643 qualifying terms
2065 terms from JHR are not defined as concepts. Floating without definition!
As of June 1, 2010…
carina (3638, 160) wing (3297, 194) setae (3294, 171) vein (2891, 141) cell (2855, 202) seta (2545, 55) eye (2438, 186) segment (2415, 159) tergum (2381, 137) hind (2209, 172) larva (1751, 113) propodeum (1617, 184)
tooth (1604, 110) punctures (1490, 96) clypeus (1482, 175) segments (1422, 159) flagellomere (1392, 91) tergite (1371, 87) mandible (1369, 143) antenna (1365, 164) body (1359, 244) region (1289, 214) tibia (1261, 129) leg (1244, 101) ovipositor (1230, 127) ocellus (1218, 116)
larvae (1214, 161) scutellum (1201, 159) line (1166, 147) lobe (1160, 133) mesosoma (1137, 159) longitudinal (1131, 161) scape (1127, 133) legs (1072, 202) carinae (1014, 118) pronotum (1011, 162) terga (1002, 122) forewing (988, 132) antennal (966, 168) metasoma (960, 168)
Qualifying terms: spatial, adjectives, comparative
posterior (2694, 216) dorsal (2654, 216) anterior (2475, 221) slightly (2247, 227) small (2048, 284) short (1930, 249) apex (1894, 192) smooth (1817, 174) large (1629, 266) distinct (1487, 201)
transverse (1486, 173) similar (1476, 276) base (1471, 200) broad (1394, 178) half (1357, 207) separated (1217, 182) single (1097, 243) rounded (1037, 158) dorsally (1017, 146) nearly (990, 185) shiny (980, 83)
inner (950, 158) shorter (938, 177) few (874, 239) elongate (859, 147) lower (834, 188)
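The slides do not say what the two numbers after each term mean. One plausible reading, assumed here and not confirmed by the source, is (total occurrences, number of articles containing the term). Under that assumption, pairs like these could be produced roughly as follows; the function name and sample texts are made up for the example.

```python
import re
from collections import Counter

def term_counts(articles):
    """Map each word to (total occurrences, number of articles containing it)."""
    occurrences = Counter()
    article_counts = Counter()
    for text in articles:
        words = re.findall(r"[a-z]+", text.lower())
        occurrences.update(words)          # every occurrence counts
        article_counts.update(set(words))  # each article counts once per word
    return {w: (occurrences[w], article_counts[w]) for w in occurrences}

# Hypothetical two-article corpus.
articles = [
    "Carina present. Carina long.",
    "Wing hyaline; carina absent.",
]
counts = term_counts(articles)
print(counts["carina"])
print(counts["wing"])
```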
Look at the data a different way…
1. Terminals are taxa discussed in articles
• Use only articles that have the words “description of” in the title
• Holes: Ichneumonoidea (49), Chalcidoidea (38), Vespoidea (36), Apoidea (36), Symphyta (9), Cynipoidea (7), Chrysidoidea (4), Stephanidae (1), Mymarommatidae (1)
2. Characters are presence or absence of a term
• Use only terms that occurred in more than one article
3. Created a matrix excluding spatial and qualifying words
• (1162 terms, 181 terminals)
4. TNT analysis
• xmult /level 7 replications 5 hits 5
• nelsen
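The matrix construction in steps 1 to 3 can be sketched as below. This is a minimal illustration under stated assumptions, not the code used for the talk: the function names, taxon names, and term sets are hypothetical, and the `to_xread` helper only approximates the basic layout of a TNT `xread` block (title, then character and taxon counts, then one row per taxon).

```python
from collections import Counter

def presence_absence(article_terms, excluded=frozenset()):
    """Build a 0/1 matrix from {terminal name: set of terms in its article}.

    Keeps only terms found in more than one article and drops any term
    listed in `excluded` (e.g. spatial and qualifying words).
    """
    counts = Counter(t for terms in article_terms.values() for t in set(terms))
    characters = sorted(t for t, n in counts.items() if n > 1 and t not in excluded)
    matrix = {
        name: "".join("1" if t in terms else "0" for t in characters)
        for name, terms in article_terms.items()
    }
    return characters, matrix

def to_xread(characters, matrix, title="term matrix"):
    """Format the matrix as text resembling a TNT xread block (assumed layout)."""
    lines = ["xread", f"'{title}'", f"{len(characters)} {len(matrix)}"]
    lines += [f"{name} {row}" for name, row in matrix.items()]
    lines.append(";")
    return "\n".join(lines)

# Hypothetical terminals and their extracted terms.
article_terms = {
    "Taxon_A": {"carina", "petiole", "posterior"},
    "Taxon_B": {"carina", "notaulus"},
    "Taxon_C": {"petiole", "notaulus", "posterior"},
}
characters, matrix = presence_absence(article_terms, excluded={"posterior"})
print(to_xread(characters, matrix))
```

Each column is one term and each row one terminal, so a shared "1" simply means two descriptions used the same word.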
http://tiny.cc/p0aan
Petiole: http://tiny.cc/p0aan
What does this mean to ISH…
1. Next session addresses this…moving to open access journal
2. Things we can do in our publications (in the form of annotations) that make data synthesis easier and reduce the need to repeat work.
Acknowledgments
Funding: Advances in Biological Informatics (NSF DBI-0850223), NESCent (NSF EF-0423641), Morphbank (NSF DBI-0446224), HymAToL (NSF EF-0337220), PEET: Monographic research on parasitic Hymenoptera (NSF DEB-0328922)
Intellect and enthusiasm: Biodiversity Heritage Library, Rick Prelinger, International Society of Hymenopterists, NESCent, other ontology projects, Deans Lab (Barb Sharanowski, Trish Mullins, Bob Blinn, Rinchhuanawma, Lydia Abernethy)
http://tiny.cc/[email protected]