23.02.2012
Assisted Curation: Does Text Mining Really Help? (Alex et al. 2008)
by Benedict Fehringer
Seminar: "Unlocking the Secrets of the Past: Text Mining for Historical Documents"
Supervisor: Dr. Caroline Sporleder (and Martin Schreiber)
Thursday, 23 February 2012
Outline
! Introduction
! Related Work
! Assisted Curation
! Text Mining Pipeline
! Curation Experiments
! Discussion and Conclusion
! References
Basic study elements - Content -
! Curation of biomedical literature
! For example, protein-protein interaction recognition:
  1. Which proteins are there?
  2. If two proteins are named, do they interact?
Example of protein-protein interaction recognition
Source: Schwikowski, Uetz, & Fields (p. 1259, 2000)
[...] An example is YHR105W, which interacts with one protein involved in vesicular transport, Akr2, and with YGL161C, an uncharacterized protein that interacts with two transport proteins, Yip1 and Pep12. YHR105W also interacts with YPL246C, another uncharacterized protein that interacts with Ypt1 and Vam7, proteins implicated in vesicular transport and membrane fusion, respectively. [...]
1. Which proteins are there?
2. If two proteins are named, do they interact?
Basic study elements - Research Question -
! Task should be supported by text mining
Related Work
! Increasing development of information extraction systems (spurred on by the BioCreAtIvE II competition; Krallinger, Leitner, & Valencia, 2007)
  - Studies suggest a reduction in curation time
! But: lack of user studies for extrinsic evaluation
  - No validation through curator feedback on how the systems affect their work and how useful they are
Basic study elements - Evaluation -
! Evaluation by:
  - objective performance metrics (e.g., speed improvement, number of records)
  - user feedback as well
Curation Scenario - General -
! Goal: Curators should identify protein-protein interactions (PPIs)
! Initial step: providing a set of matching papers
! Middle step: filtering the papers down to candidates
How can NLP help the curators in their work?
! Basic assumption: Information Extraction (IE) techniques are likely to be effective in identifying entities and relations
  → More specifically: NLP can propose candidate PPIs
Curation Scenario - Concrete -
Information Flow in the Curation Process
Source: Alex et al. (p. 558, 2008)
NLP Engine - Main Components -
Concrete Subtasks
1. Is a protein name mentioned in the sentence?
2. Which protein does the name refer to?
3. If two proteins are named, do they interact?
NLP-Components
1. Named Entity Recognition
2. Term Identification
3. Relation Extraction
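The three subtasks map one-to-one onto the three components. A minimal sketch of that data flow, assuming a toy lexicon with placeholder identifiers and a naive co-occurrence rule for relation extraction (hypothetical stand-ins, not the trained components of the actual pipeline), applied to a shortened version of the first example sentence:

```python
from dataclasses import dataclass
from itertools import combinations

# Hypothetical toy lexicon: protein name -> placeholder database identifier.
LEXICON = {"YHR105W": "ID:YHR105W", "Akr2": "ID:AKR2", "YGL161C": "ID:YGL161C"}


@dataclass(frozen=True)
class Mention:
    text: str
    start: int  # character offset within the sentence


def recognize_entities(sentence):
    """Subtask 1 (NER): find protein names; naive lookup of the first occurrence."""
    found = [Mention(name, sentence.find(name)) for name in LEXICON if name in sentence]
    return sorted(found, key=lambda m: m.start)


def identify_terms(mentions):
    """Subtask 2 (TI): map each recognized name to an identifier."""
    return {m.text: LEXICON[m.text] for m in mentions}


def extract_relations(mentions):
    """Subtask 3 (RE): naive baseline that proposes every co-occurring pair."""
    return [(a.text, b.text) for a, b in combinations(mentions, 2)]


sentence = ("An example is YHR105W, which interacts with one protein involved in "
            "vesicular transport, Akr2, and with YGL161C, an uncharacterized protein.")
mentions = recognize_entities(sentence)
ids = identify_terms(mentions)
pairs = extract_relations(mentions)
print(ids)
print(pairs)
```

The baseline also proposes an Akr2-YGL161C pair, although the quoted text states no interaction between them; false positives like this are what a trained relation extractor, and ultimately the curator, has to filter out.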
NLP Engine - Creation details -
! What should the interface design look like?
! How should the labour be divided between the human curator and the software?
For example: deciding which species is associated with which protein should be quite simple for an expert, but not necessarily for the software.
! Which functional characteristics of the NLP engine would be optimal?
For example: should recall or precision be improved?
The focus will be on the third question: which functional characteristics of the NLP engine would be optimal?
Pipeline-Components
Corpus | Pre-processing | Named Entity Recognition | Term Identification | Relation Extraction | Component Performance
Pipeline-Components - Corpus -
217 papers, annotated with:
! 9 entity types (entities were normalized)
! PPI relations
! FRAG* relations
! Relations enriched with attributes and properties
Inter-annotator agreement: 84.9, 88.4, 64.8, 59.6, 87.1
*FRAG relations link fragments and mutants to their parents
The corpus consists of 2 million tokens:
- TRAIN (66%)
- DEVTEST (17%)
- TEST (17%)
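A sketch of how such a split might be produced, assuming (this is not stated on the slide) that whole papers are assigned to one section and that the percentages refer to token counts; `split_corpus` and the uniform paper sizes below are hypothetical:

```python
import random


def split_corpus(doc_tokens, ratios=(0.66, 0.17, 0.17), seed=0):
    """Assign whole documents to TRAIN/DEVTEST/TEST by share of tokens (hypothetical helper).

    doc_tokens: dict document id -> number of tokens in that document.
    """
    names = ("TRAIN", "DEVTEST", "TEST")
    order = list(doc_tokens)
    random.Random(seed).shuffle(order)  # reproducible random order of papers
    total = sum(doc_tokens.values())
    # Cumulative token boundaries: 66% and 83% of the total, then the end.
    bounds = [total * sum(ratios[: i + 1]) for i in range(len(ratios))]
    splits = {name: [] for name in names}
    seen = 0
    for doc in order:
        midpoint = seen + doc_tokens[doc] / 2  # where this paper sits in the token stream
        index = next((i for i, b in enumerate(bounds) if midpoint <= b), len(names) - 1)
        splits[names[index]].append(doc)
        seen += doc_tokens[doc]
    return splits


# 217 hypothetical papers of 9,000 tokens each (roughly 2 million tokens in total).
docs = {f"paper{i:03d}": 9000 for i in range(217)}
splits = split_corpus(docs)
print({name: len(ids) for name, ids in splits.items()})
```

Splitting by paper rather than by sentence keeps sentences of one paper out of both TRAIN and TEST, which would otherwise tend to inflate test scores.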
Pipeline-Components - Pre-processing -
Sentence boundary detection
Tokenization
Adding useful linguistic markup
Attaching NCBI* taxonomy identifiers
*National Center for Biotechnology Information
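The pre-processing steps can be illustrated with a deliberately naive sketch: a regular-expression sentence splitter and tokenizer, plus a lookup that attaches NCBI Taxonomy identifiers to species words. The tiny species lexicon and the splitting rules are assumptions for illustration only, and the linguistic markup step is omitted.

```python
import re

# Hypothetical species lexicon: word -> NCBI Taxonomy identifier.
SPECIES_TAXIDS = {"human": 9606, "mouse": 10090, "rat": 10116}

# Split after . ! ? when whitespace and an upper-case letter follow.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")
# Words (keeping internal hyphens) or single punctuation marks.
TOKEN = re.compile(r"\w+(?:-\w+)*|[^\w\s]")


def split_sentences(text):
    """Very naive sentence boundary detection."""
    return [s for s in SENTENCE_END.split(text.strip()) if s]


def tokenize(sentence):
    return TOKEN.findall(sentence)


def tag_species(tokens):
    """Pair each token with an NCBI Taxonomy ID if it is a known species word, else None."""
    return [(tok, SPECIES_TAXIDS.get(tok.lower())) for tok in tokens]


text = "Akr2 binds YHR105W in mouse cells. The human homologue was not tested."
for sentence in split_sentences(text):
    print(tag_species(tokenize(sentence)))
```

The splitter would wrongly break after abbreviations such as "Dr." followed by a capitalized name, which is one reason real pipelines rely on trained or more elaborate sentence splitters.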
Pipeline-Components - Named Entity Recognition -
Two classes: entity vs. no entity
Example A:

| | entity (pred) | no entity (pred) | Sum |
|---|---|---|---|
| entity (real) | 9 | 3 | 12 |
| no entity (real) | 1 | 11 | 12 |
| Sum | 10 | 14 | 24 |

Recall: 9/12 = 0.75, Precision: 9/10 = 0.9
Example B:

| | entity (pred) | no entity (pred) | Sum |
|---|---|---|---|
| entity (real) | 12 | 0 | 12 |
| no entity (real) | 5 | 7 | 12 |
| Sum | 17 | 7 | 24 |

Recall: 12/12 = 1, Precision: 12/17 ≈ 0.71
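The recall and precision values of the two examples can be checked with a few lines of code, which also apply the F1 formula F1 = 2 * (precision * recall) / (precision + recall). Only true positives (TP), false positives (FP) and false negatives (FN) enter these metrics; the true negatives (11 and 7) do not.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Example A: TP = 9, FP = 1 (no entity predicted as entity), FN = 3 (entities missed)
p_a, r_a, f_a = precision_recall_f1(tp=9, fp=1, fn=3)
# Example B: TP = 12, FP = 5, FN = 0
p_b, r_b, f_b = precision_recall_f1(tp=12, fp=5, fn=0)

print(f"A: P={p_a:.2f} R={r_a:.2f} F1={f_a:.2f}")  # A: P=0.90 R=0.75 F1=0.82
print(f"B: P={p_b:.2f} R={r_b:.2f} F1={f_b:.2f}")  # B: P=0.71 R=1.00 F1=0.83
```

Example A favours precision and Example B favours recall, yet their F1 scores are close (18/22 ≈ 0.82 vs. 24/29 ≈ 0.83), so F1 alone does not reveal which trade-off a curator would prefer.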
Pipeline-Components - Term Identification -
Producing a set of candidate identifiers for each protein
Species assigned by heuristics
Bag accuracy as evaluation metric
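A minimal sketch of bag accuracy, assuming the reading that a mention counts as correct when its gold-standard identifier is contained in the bag (set) of candidate identifiers produced for it; the mentions and identifiers below are placeholders:

```python
def bag_accuracy(candidates, gold):
    """Fraction of mentions whose gold identifier is in their candidate set (the "bag").

    candidates: dict mention id -> set of candidate identifiers
    gold: dict mention id -> gold-standard identifier
    """
    if not gold:
        return 0.0
    hits = sum(1 for mention, g in gold.items() if g in candidates.get(mention, set()))
    return hits / len(gold)


# Hypothetical mentions with placeholder identifiers.
candidates = {
    "m1": {"ID:A", "ID:B"},
    "m2": {"ID:C"},
    "m3": {"ID:D", "ID:E", "ID:F"},
    "m4": set(),
}
gold = {"m1": "ID:B", "m2": "ID:X", "m3": "ID:F", "m4": "ID:G"}
print(bag_accuracy(candidates, gold))  # 0.5 (hits for m1 and m3)
```

Because larger bags make a hit more likely, bag accuracy is best read together with the average bag size; narrowing the candidates (for example by the assigned species) is what keeps the choice manageable for the curator.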
Pipeline-Components - Relation Extraction -
Intra-sentential PPI and FRAG relations
Inter-sentential FRAG relations
Enriched with attributes and properties
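Candidate generation for the two relation types can be sketched as follows. The entity labels, the two-sentence window for inter-sentential FRAG links and the example mentions are hypothetical choices for illustration; in a typical setup a trained classifier then decides which candidates are real relations.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Entity:
    text: str
    etype: str     # hypothetical labels: "Protein", "Fragment", "Mutant"
    sentence: int  # index of the sentence that contains the mention


def candidate_pairs(entities, frag_window=2):
    """Generate PPI and FRAG relation candidates (sketch).

    PPI: pairs of proteins in the same sentence (intra-sentential only).
    FRAG: a fragment or mutant paired with a protein in the same sentence or
    in one of the `frag_window` preceding sentences (intra- and inter-sentential).
    """
    ppi = [(a.text, b.text) for a, b in combinations(entities, 2)
           if a.sentence == b.sentence and a.etype == b.etype == "Protein"]
    frag = [(child.text, parent.text)
            for child in entities if child.etype in ("Fragment", "Mutant")
            for parent in entities
            if parent.etype == "Protein" and 0 <= child.sentence - parent.sentence <= frag_window]
    return ppi, frag


entities = [
    Entity("Akr2", "Protein", 0),
    Entity("YHR105W", "Protein", 0),
    Entity("Akr2 C-terminal fragment", "Fragment", 1),  # hypothetical mention
    Entity("Pep12", "Protein", 3),
]
ppi, frag = candidate_pairs(entities)
print(ppi)   # [('Akr2', 'YHR105W')]
print(frag)  # [('Akr2 C-terminal fragment', 'Akr2'), ('Akr2 C-terminal fragment', 'YHR105W')]
```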
Pipeline-Components - Component Performance -
Performance on DEVTEST (components trained on TRAIN)
F1 = 2 * (precision * recall) / (precision + recall)
Inter-annotator agreement: 84.9/88.4, 64.8, 87.1, 59.6
Experiment 1: Manual vs. Assisted Curation
! 4 curators
! 4 papers
! 3 conditions:
  - Manual: without assistance
  - GSA-assisted: with integrated gold standard annotation
  - NLP-assisted: with integrated NLP pipeline output
Experiment 1: Results
Total number of records and average curation speed per record
Scores range from (1) for "strongly agree" to (5) for "strongly disagree"
Experiment 2: NLP Consistency
! 1 curator
! 10 papers
! 2 conditions:
  - Consistency 1: all recognized named entities (NEs) were propagated (5 papers)
  - Consistency 2: only the most frequent recognized NEs were propagated (5 papers)
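One possible reading of the two consistency conditions, sketched on single-token mentions: in Consistency 1 every string the tagger labelled as an entity at least once receives that label at all its occurrences in the paper; in Consistency 2 a string is propagated only when its most frequent label in the paper is an entity label. The slides do not spell out these details, so the hypothetical `propagate` function and both rules are assumptions.

```python
from collections import Counter, defaultdict


def propagate(tagged, mode="all"):
    """Propagate NE labels to every occurrence of the same string in one paper.

    tagged: list of (token, label) pairs; label "O" means no entity.
    mode="all": a string tagged as an entity at least once gets that entity type
        everywhere (if it has several types, the most frequent one).
    mode="majority": a string is propagated only if its most frequent label
        (counting "O") is an entity type; otherwise its labels stay unchanged.
    """
    counts = defaultdict(Counter)
    for token, label in tagged:
        counts[token][label] += 1
    chosen = {}
    for token, c in counts.items():
        entity_labels = Counter({lab: n for lab, n in c.items() if lab != "O"})
        if not entity_labels:
            continue
        if mode == "all":
            chosen[token] = entity_labels.most_common(1)[0][0]
        elif mode == "majority":
            top_label = c.most_common(1)[0][0]
            if top_label != "O":
                chosen[token] = top_label
        else:
            raise ValueError(f"unknown mode: {mode}")
    return [(token, chosen.get(token, label)) for token, label in tagged]


# Hypothetical tagger output: Akr2 is mostly tagged, Pep12 mostly missed.
tagged = [
    ("Akr2", "Protein"), ("binds", "O"), ("Pep12", "O"), (".", "O"),
    ("Akr2", "O"), ("and", "O"), ("Akr2", "Protein"), (".", "O"),
    ("Pep12", "Protein"), ("Pep12", "O"),
]
print(propagate(tagged, mode="all"))
print(propagate(tagged, mode="majority"))
```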
Experiment 2: Results I
Total number of records and average curation speed per record
Experiment 2: Results II
Scores range from (1) for "strongly agree" to (5) for "strongly disagree"
A: consistent NLP output (Consistency 1/2)
B: baseline NLP
Experiment 3: Optimizing for Precision or Recall
! 1 curator
! 10 papers
! 3 conditions:
  - High R: NLP output with high recall (5 papers)
  - High P: NLP output with high precision (5 papers)
  - High F1: NLP output with high F1 score (subsequently shown for all papers; viewing only)
F1 = 2 * (precision * recall) / (precision + recall)
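The slides do not say how the High P, High R and High F1 outputs were obtained. One common approach, assumed here, is to vary a confidence threshold on the extracted candidates; the scores and the gold set below are hypothetical:

```python
def precision_recall_at(scores, gold, threshold):
    """Precision and recall when every candidate scoring >= threshold is proposed."""
    proposed = {cand for cand, score in scores.items() if score >= threshold}
    tp = len(proposed & gold)
    precision = tp / len(proposed) if proposed else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall


def f1(precision, recall):
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Hypothetical confidence scores for six candidate PPIs; c1, c2 and c3 are correct.
scores = {"c1": 0.95, "c2": 0.80, "c3": 0.40, "c4": 0.70, "c5": 0.30, "c6": 0.20}
gold = {"c1", "c2", "c3"}

thresholds = sorted(set(scores.values()), reverse=True)
for t in thresholds:
    p, r = precision_recall_at(scores, gold, t)
    print(f"threshold={t:.2f}  P={p:.2f}  R={r:.2f}  F1={f1(p, r):.2f}")

# Threshold that maximizes F1 (the "High F1" setting in this toy example).
best = max(thresholds, key=lambda t: f1(*precision_recall_at(scores, gold, t)))
print("best threshold for F1:", best)
```

Raising the threshold trades recall for precision (0.95 gives P = 1, R = 1/3; 0.20 gives P = 0.5, R = 1), while the F1-optimal threshold sits between the two extremes.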
Experiment 3: Results I
Comparison between High F1, High P and High R
TP: true positive
FP: false positive
FN: false negative
Experiment 3: Results II
Scores range from (1) for "strongly agree" to (5) for "strongly disagree"
A: High P/High R
B: High F1
Discussion I
! Experiment 1:
  - Maximum time reduction of 1/3 if the NLP output is perfectly accurate
  - NLP assistance leads to more records (but their validity remains to be proven)
  - In the questionnaire, all conditions were rated about equally
Discussion II
! Experiment 2:
  - The curator prefers consistency with all NEs (Consistency 1)
  - But: the objective metrics suggest that the other condition (Consistency 2) is preferable
! Experiment 3:
  - The curator prefers high recall
  → Must be repeated with other curators (different curation styles)
Conclusion
! Curation time alone is not a sufficient measure of NLP's usefulness
! Close collaboration with users is necessary
  → Identifying helpful and hindering aspects
! Future work:
  - Further research regarding the merit of high recall and high precision
  - Implementing confidence values for extracted information
  - ... with more curators
References
! Alex, B., Grover, C., Haddow, B., Kabadjov, M., Klein, E., Matthews, M., Roebuck, S., Tobin, R., & Wang, X. (2008). Assisted curation: does text mining really help? In Pacific Symposium on Biocomputing, pp. 556-567.
! Krallinger, M., Leitner, F., & Valencia, A. (2007). Assessment of the second BioCreative PPI task: Automatic extraction of protein-protein interactions. In Proceedings of the Second BioCreative Challenge Evaluation Workshop, pp. 41-54, Madrid, Spain.
! Schwikowski, B., Uetz, P., & Fields, S. (2000). A network of protein-protein interactions in yeast. Nature Biotechnology, 18, pp. 1257-1261.