Moving beyond free text

Posted on 22-Feb-2016


Authors

Old Paradigm:

Scientist does research

Scientist publishes research results in journal article

Want:

All genes involved in seed development (name, species, protein sequence)

Read 3,404 articles???

Read 592,000 articles???

Old Paradigm - extended:

Scientist does research

Scientist publishes research results as free text

Results extracted from free text and converted to a structured format (ontology annotations)

manual curation (+ NLP…?)

Database

Structured data combined with other data for queries, further analysis
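The extended workflow ends with structured annotations that can be combined with other data and queried. Below is a minimal Python sketch of that last step, assuming a simple in-memory table. Every gene name, species assignment, and protein sequence is an invented placeholder; only the PO term IDs come from this deck. It answers the "Want" question from the Old Paradigm slide (name, species, protein sequence) for genes annotated to PO:0009010 (seed) by joining an annotation table with a separate sequence table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One curated result: a gene mapped to an ontology term."""
    gene: str
    species: str
    term_id: str


# Hypothetical curated annotations. Gene names are made-up placeholders;
# the PO term IDs are the ones used in this deck.
ANNOTATIONS = [
    Annotation("geneA", "Arabidopsis thaliana", "PO:0009010"),  # seed
    Annotation("geneB", "Arabidopsis thaliana", "PO:0025034"),  # leaf
    Annotation("geneC", "Zea mays", "PO:0009010"),              # seed
]

# "Other data": a separate table of protein sequences.
# These strings are placeholders, not real sequences.
SEQUENCES = {
    "geneA": "MKTAYIAK",
    "geneB": "MSLLTEVE",
    "geneC": "MGHHHHHH",
}


def genes_for_term(term_id, annotations, sequences):
    """Join annotations with sequences; return (name, species, sequence) rows."""
    rows = []
    for ann in annotations:
        if ann.term_id == term_id:
            rows.append((ann.gene, ann.species, sequences.get(ann.gene, "")))
    return rows


if __name__ == "__main__":
    for row in genes_for_term("PO:0009010", ANNOTATIONS, SEQUENCES):
        print(row)
```

This sketch matches the term ID exactly. A real ontology query would also follow the ontology's is_a and part_of relations so that annotations to more specific seed structures are included.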

Example – Journal article about gene function

The goal: an annotation that captures the result

Manual curation: time-consuming, does not scale well

NLP: very challenging

Example – phylogenetic treatment

http://www.mobot.org/mobot/research/apweb/welcome.html

Relatively high degree of structure compared to a journal article

May be more amenable to natural language processing, but still very challenging: the information is complex

Scientist does research

Scientist publishes research results as free text

Results extracted from free text and converted to a structured format (ontology annotations)

manual curation (+ NLP)

Database

Structured data combined with other data for queries, further analysis

Link to external resource

Can we get authors involved?

Scientific publishers are interested in this problem…

Science Direct: http://www.sciencedirect.com/science/article/pii/S0378111910001502

Databases are interested in this problem…

What if we had a good general tool for authors to do this themselves?

Example: Morphological description of species

http://herbarium.usu.edu/webmanual/

Example: Mutant phenotype description

PO:0025034 (leaf), PATO:0000599 (decreased width)

PO:0020003 (ovule), PATO:0000460 (abnormal)

PO:0009010 (seed), PATO:0001997 (reduced)
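Each line in the mutant phenotype example pairs a Plant Ontology (PO) term for a plant structure with a PATO term for a quality. The Python sketch below assumes that "PO term, PATO term" pattern always holds (an assumption, not a stated format) and parses such lines into entity-quality records that software can index and query. The names Term, EQAnnotation, TERM_RE, and parse_eq are hypothetical.

```python
import re
from dataclasses import dataclass

# Matches one "PREFIX:digits (label)" chunk, e.g. "PO:0025034 (leaf)".
TERM_RE = re.compile(r"([A-Z]+):(\d+)\s*\(([^)]*)\)")


@dataclass(frozen=True)
class Term:
    prefix: str    # ontology prefix, e.g. "PO" or "PATO"
    local_id: str  # numeric part of the ID, kept as text to preserve zeros
    label: str     # human-readable label, e.g. "leaf"

    @property
    def curie(self):
        return f"{self.prefix}:{self.local_id}"


@dataclass(frozen=True)
class EQAnnotation:
    entity: Term   # plant structure (PO)
    quality: Term  # quality of that structure (PATO)


def parse_eq(line):
    """Parse 'PO:nnnnnnn (label), PATO:nnnnnnn (label)' into an EQAnnotation."""
    terms = [Term(*m.groups()) for m in TERM_RE.finditer(line)]
    if len(terms) != 2:
        raise ValueError(f"expected two terms, got {len(terms)}: {line!r}")
    entity, quality = terms
    if entity.prefix != "PO" or quality.prefix != "PATO":
        raise ValueError(f"expected a PO term then a PATO term: {line!r}")
    return EQAnnotation(entity, quality)


if __name__ == "__main__":
    lines = [
        "PO:0025034 (leaf), PATO:0000599 (decreased width)",
        "PO:0020003 (ovule), PATO:0000460 (abnormal)",
        "PO:0009010 (seed), PATO:0001997 (reduced)",
    ]
    for line in lines:
        ann = parse_eq(line)
        print(f"{ann.entity.label} ({ann.entity.curie}) -> "
              f"{ann.quality.label} ({ann.quality.curie})")
```

Keeping the label next to the ID preserves the human-readable meaning, while the CURIE (for example PO:0009010) is the stable key a database would index on.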

New Paradigm:

Scientist does research

Scientist publishes research results as free text and as annotations using ontology terms

Benefit to scientist – wider exposure and reuse of results

Benefit to publishers – tagged text allows enhanced presentation for subscribers

Benefit to research community – better access to data
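Under the new paradigm the author supplies the free text and the ontology annotations together. The record layout below is a hypothetical illustration, not a publisher or database format. This Python sketch bundles one free-text sentence (invented for this example) with its entity-quality term pairs and serializes the result as JSON, the kind of payload an author-facing tool might produce. The function name make_record and the keys "text", "annotations", "entity", and "quality" are assumptions.

```python
import json


def make_record(free_text, annotations):
    """Bundle free text with (entity_id, quality_id) annotation pairs.

    The dictionary layout is a hypothetical example, not a standard.
    """
    return {
        "text": free_text,
        "annotations": [
            {"entity": entity, "quality": quality}
            for entity, quality in annotations
        ],
    }


# Hypothetical example sentence; the term pair is the seed/reduced
# example from the mutant phenotype slide.
record = make_record(
    "Hypothetical example: the mutant shows a reduced seed phenotype.",
    [("PO:0009010", "PATO:0001997")],
)

if __name__ == "__main__":
    print(json.dumps(record, indent=2))
```

Because the text and the annotations travel in one record, the publisher can render the prose while a database ingests the term pairs directly, without a separate curation step.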
