ODIE Toolkit
Slide # 2/35
Outline
• Overview of the Project: Aims, People, Organization, Domain, Philosophy
• Specific Aims from a use case approach: Information Extraction, Ontology Enrichment
• First steps, synergies, year 1 work, and working together
Slide # 3/35
Project Overview
• Funded by National Cancer Center
• Develop tools for
  • Information extraction from clinical text using ontologies
  • Enrichment of ontologies using clinical text
• Project Period: 9/27/2007 – 7/31/2011
• Collaboration with National Center for Biomedical Ontology
• Subcontract to Stanford (consultation on BioPortal)
• Subcontract to Mayo (Terminologies, NLP)
Slide # 4/35
Specific Aims
Specific Aim 1: Develop and evaluate methods for information extraction (IE) tasks using existing OBO ontologies, including:
  1. Named Entity Recognition
  2. Co-reference Resolution
  3. Discourse Reasoning
  4. Attribute Value Extraction
Specific Aim 2: Develop and evaluate general methods for clinical-text mining to assist in ontology development, including:
  1. Preprocessing
  2. Concept Discovery and Clustering
  3. Suggest taxonomic positioning and relationships
Specific Aim 3: Develop reusable software for performing information extraction and ontology development, leveraging existing NCBO tools and compatible with NCBO architecture.
Specific Aim 4: Enhance the National Cancer Institute Thesaurus ontology using the ODIE toolkit.
Specific Aim 5: Test the ability of the resulting software and ontologies to address important translational research questions in hematologic cancers.
Year 1 development goals
Slide # 5/35
Dual Proposal Goals
Slide # 6/35
People @pitt
• Wendy Chapman, co-I
• Rebecca Crowley, PI
• Preet Chaudhary, co-I
• Kaihong Liu, Graduate Student
• Kevin Mitchell, Architect
• Girish Chavan, Interfaces
• John Dowling, Annotation
Slide # 7/35
Organization
• Annotations (Rebecca Crowley, Wendy Chapman, Kaihong Liu, John Dowling): Develop manually annotated sets for training and testing
• Algorithms (Rebecca Crowley, Wendy Chapman, Kaihong Liu, Kevin Mitchell): Consider and test existing algorithms; design, implement, and test new algorithms
• Architecture (Rebecca Crowley, Kevin Mitchell, Girish Chavan): Develop and implement architecture
Slide # 8/35
Domain
• Will attempt to develop general tools whenever possible
• Priorities for evaluation of components in:
  • Radiology and pathology reports
  • NCIT as well as other clinically relevant OBO ontologies
  • Cancer domains (including hematologic oncology)
Slide # 9/35
Philosophy
• Toolkit for developers of NLP applications and ontologies
• Support interaction and experimentation
• Package systems at the conclusion of working with ODIE
• Foster the cycle of enrichment and extraction needed to advance development of NLP systems
• Ontology enrichment as opposed to de novo development
• Human-machine collaboration as opposed to fully automated learning
Slide # 10/35
Specific Aims
Specific Aim 1: Develop and evaluate methods for information extraction (IE) tasks using existing OBO ontologies, including:
  1. Named Entity Recognition
  2. Co-reference Resolution
  3. Discourse Reasoning
  4. Attribute Value Extraction
Specific Aim 2: Develop and evaluate general methods for clinical-text mining to assist in ontology development, including:
  1. Preprocessing
  2. Concept Discovery and Clustering
  3. Suggest taxonomic positioning and relationships
Specific Aim 3: Develop reusable software for performing information extraction and ontology development, leveraging existing NCBO tools and compatible with NCBO architecture.
Specific Aim 4: Enhance the National Cancer Institute Thesaurus ontology using the ODIE toolkit.
Specific Aim 5: Test the ability of the resulting software and ontologies to address important translational research questions in hematologic cancers.
Key ODIE Functionality
Slide # 11/35
Named Entity Recognition
User has
• clinical documents
• one or more ontologies
• (and/or) one or more lexical resources (synonyms, POS)
• (optionally) a reference standard of human annotations
User wants to
• determine how well different ontologies cover the text
• determine the degree of overlap between annotations generated from different ontologies
• (optionally) test accuracy of NER with different ontologies to choose the ‘best’ ontology to annotate text with
• tag an existing document set with concepts from an ontology (optionally using the synonyms from their synonym source if not in the ontology)
System produces
• annotated clinical documents and descriptive statistics
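The use case above can be sketched in a few lines. This is only an illustration of dictionary-based matching with coverage and overlap statistics, not ODIE's actual method: the concept identifiers, the two toy ontologies, the synonym table, and all function names are hypothetical, and a real system would use a proper tokenizer and lexical resources.

```python
import re

def build_lexicon(ontology, synonyms=None):
    """Map each lower-cased label or synonym to its concept id."""
    lexicon = {}
    for concept_id, label in ontology.items():
        lexicon[label.lower()] = concept_id
        for syn in (synonyms or {}).get(concept_id, []):
            lexicon.setdefault(syn.lower(), concept_id)
    return lexicon

def annotate(text, lexicon):
    """Return sorted (start, end, concept_id) spans; longer terms win over shorter ones."""
    spans = []
    lowered = text.lower()
    for term in sorted(lexicon, key=len, reverse=True):
        for m in re.finditer(r"\b" + re.escape(term) + r"\b", lowered):
            # skip matches that overlap a span already claimed by a longer term
            if not any(m.start() < e and s < m.end() for s, e, _ in spans):
                spans.append((m.start(), m.end(), lexicon[term]))
    return sorted(spans)

def coverage(text, spans):
    """Fraction of word tokens that fall inside at least one annotation."""
    tokens = list(re.finditer(r"\w+", text))
    covered = sum(any(s <= t.start() and t.end() <= e for s, e, _ in spans) for t in tokens)
    return covered / len(tokens) if tokens else 0.0

def span_overlap(spans_a, spans_b):
    """Jaccard overlap between the character spans produced by two ontologies."""
    a = {(s, e) for s, e, _ in spans_a}
    b = {(s, e) for s, e, _ in spans_b}
    return len(a & b) / len(a | b) if a | b else 0.0

# Hypothetical toy inputs (ids and contents are made up for illustration).
ONTOLOGY_A = {"A:1": "Invasive Ductal Carcinoma", "A:2": "Breast"}
SYNONYMS_A = {"A:1": ["Infiltrating Ductal Carcinoma"]}
ONTOLOGY_B = {"B:1": "Carcinoma", "B:2": "Breast"}

text = "Breast, Right, Lumpectomy: Infiltrating Ductal Carcinoma"
spans_a = annotate(text, build_lexicon(ONTOLOGY_A, SYNONYMS_A))
spans_b = annotate(text, build_lexicon(ONTOLOGY_B))

print(coverage(text, spans_a), coverage(text, spans_b), span_overlap(spans_a, spans_b))
```

In this toy run the synonym source lets ontology A tag the whole diagnosis phrase, so it covers more tokens than ontology B, which only matches the head noun.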
Slide # 12/35
Named Entity Recognition
[Diagram: inputs to NER]
• Clinical Document
• Ontology
• Lexical Resource: Metathesaurus (synonyms), SPECIALIST (POS information)
Slide # 13/35
Named Entity Recognition
[Screenshot: View Annotated Concepts From A Single Ontology]
Slide # 14/35
Named Entity Recognition
[Screenshot: Compare Annotations from Multiple Ontologies]
Slide # 15/35
Co-reference Resolution
User has
• clinical documents with NER annotations
• one or more ontologies
• (optionally) a reference standard of co-reference annotations
User wants to
• visualize co-references detected using one or more ontologies
• (optionally) test accuracy of CR with different ontologies to choose an ontology for annotations
• tag an existing document set with co-references from an ontology
System produces
• annotated clinical documents and descriptive statistics
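Testing CR accuracy against a reference standard can be illustrated with pairwise link scoring. The slides do not say which metric ODIE uses, so this is one common choice shown under that assumption; the mention ids and chains below are made up.

```python
from itertools import combinations

def links(chains):
    """Expand co-reference chains into the set of unordered mention pairs they imply."""
    pairs = set()
    for chain in chains:
        pairs.update(frozenset(p) for p in combinations(sorted(chain), 2))
    return pairs

def pairwise_scores(system_chains, reference_chains):
    """Precision, recall, and F1 over co-reference links."""
    sys_links, ref_links = links(system_chains), links(reference_chains)
    true_positives = len(sys_links & ref_links)
    precision = true_positives / len(sys_links) if sys_links else 0.0
    recall = true_positives / len(ref_links) if ref_links else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical mention ids: the reference groups m1-m3 together, the system splits them.
reference = [{"m1", "m2", "m3"}, {"m4", "m5"}]
system = [{"m1", "m2"}, {"m3", "m4", "m5"}]

print(pairwise_scores(system, reference))  # (0.5, 0.5, 0.5)
```

Running the same scoring once per ontology gives the per-ontology comparison described in the use case.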
Slide # 16/35
Co-reference Resolution
Slide # 17/35
Discourse Reasoning
User has
• a set of clinical documents with NER and CR annotations
• a set of information models about those documents
User wants to
• determine which information model (or parts of them) should be used for which clinical document
Slide # 18/35
Discourse Reasoning
BRAIN, RIGHT PARIETAL, STEREOTACTIC BIOPSY: Mucinous Adenocarcinoma, consistent with previous history of colon primary
• BRAIN: Site, Morphology
• COLON: Location, Grade, Size, TNM Stage
Slide # 19/35
Attribute Value Extraction
User has
• clinical documents with NER, CR, DR annotations
• an information model of a specific subset of documents
User wants to
• extract attributes and values from clinical text conforming to the model
• analyze data using common tools
• possibly search later for particular cases
Slide # 20/35
Attribute Value Extraction
• Histologic Type
• Clark’s Level
• Breslow Depth
• Mitoses
• Ulcer
• Perineural Invasion
• Angiolymphatic Invasion
• Regression
Slide # 21/35
Attribute Value Extraction
• Histologic Type – Superficial Spreading
• Clark’s Level – IV
• Breslow Depth – 1.75 mm
• Mitoses – Greater than 2 per HLP
• Ulcer – None
• Perineural Invasion – None
• Angiolymphatic Invasion – None
• Regression – None
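Filling an information model from text can be sketched with one pattern per attribute. This is a deliberately simple, regex-based illustration and not the method the project describes; the template patterns, the sample sentence, and the function name are all assumptions, and attributes with no match are left empty rather than guessed.

```python
import re

# Hypothetical template: attribute name -> pattern with a named "value" group.
MELANOMA_TEMPLATE = {
    "Histologic Type": r"histologic type\s*[:\-]\s*(?P<value>[^.;\n]+)",
    "Clark's Level": r"clark'?s level\s*[:\-]?\s*(?P<value>[IV]+)\b",
    "Breslow Depth": r"breslow (?:depth|thickness)\s*[:\-]?\s*(?P<value>\d+(?:\.\d+)?\s*mm)",
    "Mitoses": r"mitoses\s*[:\-]\s*(?P<value>[^.;\n]+)",
    "Ulcer": r"ulcer(?:ation)?\s*[:\-]\s*(?P<value>[^.;\n]+)",
}

def extract(text, template):
    """Return {attribute: value or None} for every attribute in the template."""
    result = {}
    for attribute, pattern in template.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        result[attribute] = match.group("value").strip() if match else None
    return result

# Made-up report text for illustration only.
report = ("Histologic type: superficial spreading. Clark's level IV. "
          "Breslow depth: 1.75 mm. Ulceration: none.")

for attribute, value in extract(report, MELANOMA_TEMPLATE).items():
    print(f"{attribute}: {value}")
```

Because the output is a plain attribute-to-value mapping, it can be written to a table and analyzed or searched with common tools, as the use case asks.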
Slide # 22/35
Ontology Enrichment
User has
• clinical documents
• an ontology
User wants to
• identify potential candidate concepts from the documents to include in the ontology
• see them visualized in a manner that eases search and recognition of the presence or absence of those concepts in the ontology
• get suggestions for where in the taxonomy the concept should be placed
• get suggestions for relationships
Slide # 23/35
Ontology Enrichment
Clinical text:
• Breast, Left, Excisional Biopsy: Mucinous Carcinoma
• Breast, Right, Lumpectomy: Infiltrating Ductal Carcinoma
• Breast, Left: Invasive Ductal Carcinoma
• Breast, Left, Excisional Biopsy: Malignant Phylloides Tumor. Tumor shows osseous and lipomatous metaplasia
Ontology (most general first): Disease or Disorder > Breast Disorder > Breast Neoplasm > Malignant Breast Neoplasm > Breast Carcinoma > Ductal Breast Carcinoma > Invasive Ductal Carcinoma
Slide # 24/35
Concept Discovery
Clinical text:
• Breast, Left, Excisional Biopsy: Mucinous Carcinoma
• Breast, Right, Lumpectomy: Infiltrating Ductal Carcinoma
• Breast, Left: Invasive Ductal Carcinoma
• Breast, Left, Excisional Biopsy: Malignant Phylloides Tumor. Tumor shows osseous and lipomatous metaplasia
Ontology (most general first): Disease or Disorder > Breast Disorder > Breast Neoplasm > Malignant Breast Neoplasm > Breast Carcinoma > Ductal Breast Carcinoma > Invasive Ductal Carcinoma
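A minimal sketch of concept discovery on this slide's example: take the diagnosis phrase from each report line and keep the phrases the ontology does not already know. The "text after the last colon" rule, the function names, and the assumption that "Infiltrating Ductal Carcinoma" is a known synonym are all illustrative choices, not something the slides specify. The trailing sentence about metaplasia is left out of the input for simplicity.

```python
from collections import Counter

def diagnosis_phrase(report_line):
    """Text after the last colon in a 'site, procedure: diagnosis' line."""
    return report_line.rsplit(":", 1)[-1].strip()

def discover_candidates(report_lines, known_terms):
    """Return (phrase, count) pairs for diagnosis phrases missing from the ontology."""
    known = {term.lower() for term in known_terms}
    counts = Counter(diagnosis_phrase(line).lower() for line in report_lines)
    return [(phrase, n) for phrase, n in counts.most_common() if phrase not in known]

REPORT_LINES = [
    "Breast, Left, Excisional Biopsy: Mucinous Carcinoma",
    "Breast, Right, Lumpectomy: Infiltrating Ductal Carcinoma",
    "Breast, Left: Invasive Ductal Carcinoma",
    "Breast, Left, Excisional Biopsy: Malignant Phylloides Tumor",
]

# Concept labels from the slide, plus one synonym assumed to be known already.
KNOWN_TERMS = [
    "Disease or Disorder", "Breast Disorder", "Breast Neoplasm",
    "Malignant Breast Neoplasm", "Breast Carcinoma", "Ductal Breast Carcinoma",
    "Invasive Ductal Carcinoma",
    "Infiltrating Ductal Carcinoma",  # assumption: supplied by a synonym source
]

candidates = discover_candidates(REPORT_LINES, KNOWN_TERMS)
print(candidates)
```

Under these assumptions the two surviving candidates are "mucinous carcinoma" and "malignant phylloides tumor", which a curator would then review before anything is added to the ontology.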
Slide # 25/35
Taxonomic Positioning
Clinical text:
• Breast, Left, Excisional Biopsy: Mucinous Carcinoma
• Breast, Right, Lumpectomy: Infiltrating Ductal Carcinoma
• Breast, Left: Invasive Ductal Carcinoma
• Breast, Left, Excisional Biopsy: Malignant Phylloides Tumor. Tumor shows osseous and lipomatous metaplasia
Ontology (most general first): Disease or Disorder > Breast Disorder > Breast Neoplasm > Malignant Breast Neoplasm > Breast Carcinoma > Ductal Breast Carcinoma > Invasive Ductal Carcinoma
Candidate concepts to position: Mucinous Carcinoma; Malignant Phylloides Tumor
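One simple way to suggest a position is a head-word heuristic: propose the most specific existing concept that shares the candidate's head noun and whose other words all appear in the candidate or its anatomic site. This is a sketch of that heuristic only, not the technique the project commits to. The parent links are read off the slide's concept list and are an assumption about how the diagram was drawn, and the tumor/neoplasm equivalence is also assumed.

```python
# Assumed child -> parent links for the concepts named on the slide.
PARENT = {
    "Invasive Ductal Carcinoma": "Ductal Breast Carcinoma",
    "Ductal Breast Carcinoma": "Breast Carcinoma",
    "Breast Carcinoma": "Malignant Breast Neoplasm",
    "Malignant Breast Neoplasm": "Breast Neoplasm",
    "Breast Neoplasm": "Breast Disorder",
    "Breast Disorder": "Disease or Disorder",
    "Disease or Disorder": None,
}

# Assumption: treat "tumor" as equivalent to "neoplasm" when comparing words.
WORD_SYNONYMS = {"tumor": "neoplasm"}

def normalize(phrase):
    """Lower-case, split on whitespace, and map assumed synonyms to one form."""
    return [WORD_SYNONYMS.get(word, word) for word in phrase.lower().split()]

def depth(concept):
    """Number of ancestors between the concept and the root."""
    steps = 0
    while PARENT[concept] is not None:
        concept = PARENT[concept]
        steps += 1
    return steps

def suggest_parent(candidate, site):
    """Deepest concept sharing the candidate's head word, or None if nothing matches."""
    candidate_words = normalize(candidate)
    context = set(candidate_words) | set(normalize(site))
    head = candidate_words[-1]
    matches = [
        concept for concept in PARENT
        if normalize(concept)[-1] == head and set(normalize(concept)) <= context
    ]
    return max(matches, key=depth) if matches else None

print(suggest_parent("Mucinous Carcinoma", "Breast"))          # Breast Carcinoma
print(suggest_parent("Malignant Phylloides Tumor", "Breast"))  # Malignant Breast Neoplasm
```

The output is a suggestion for a human curator, in keeping with the human-machine collaboration stance of the project.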
Slide # 26/35
Relationships
Clinical text:
• Breast, Left, Excisional Biopsy: Mucinous Carcinoma
• Breast, Right, Lumpectomy: Infiltrating Ductal Carcinoma
• Breast, Left: Invasive Ductal Carcinoma
• Breast, Left, Excisional Biopsy: Malignant Phylloides Tumor. Tumor shows osseous and lipomatous metaplasia
Ontology (most general first): Disease or Disorder > Breast Disorder > Breast Neoplasm > Malignant Breast Neoplasm > Breast Carcinoma > Ductal Breast Carcinoma > Invasive Ductal Carcinoma
Candidate concepts: Mucinous Carcinoma; Malignant Phylloides Tumor
Relationship: has-Finding
Morphologic Finding > Metaplasia > Osseous metaplasia, Lipomatous metaplasia, Cartilaginous metaplasia
Slide # 27/35
First Steps
• Use cases
• Survey of BioPortal, LexBio, GATE and UIMA
• Survey of ontology enrichment techniques
• Architectural assumptions and notional architecture
• Started discussions with Stanford and Mayo
• Delineated first year work
• Annotation software and document sets
Slide # 28/35
Architecture Decisions
• The primary goal of ODIE is to serve as a workbench for building and refining text processing pipelines and ontologies.
• Information retrieval is not a primary goal. However, ODIE may have a rudimentary search feature for annotated document collections.
• ODIE Toolkit will be a desktop application.
• ODIE UI will be based on the Eclipse Rich Client Platform.
• ODIE will use UIMA as the Language Engineering Platform.
  • GATE processing resources will be usable in ODIE by wrapping them in UIMA text analysis engines (TAEs).
  • UIMA is highly configurable using XML descriptor files.
  • UIMA has better documentation and community support.
  • We will use GATE in the first year for rapid prototyping and manual annotation.
• ODIE will have the ability to easily import and use UIMA TAEs developed by others. This may be expanded to GATE processing resources.
• ODIE will allow for packaging a pipeline for deployment in a production environment.
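The wrapping idea can be shown with a small adapter sketch. None of the classes below are UIMA or GATE APIs (both are Java frameworks); every name here is hypothetical and stands in for the real thing. The point is only the shape: a resource with a foreign interface is wrapped so the pipeline sees a uniform engine, and the pipeline can emit a description of itself for packaging. UIMA uses XML descriptor files; JSON is used here purely for brevity.

```python
import json
import re
from dataclasses import dataclass, field

@dataclass
class Document:
    text: str
    annotations: list = field(default_factory=list)  # (type, start, end) tuples

class AnalysisEngine:
    """Hypothetical stand-in for a text analysis engine: processes a document in place."""
    name = "engine"
    def process(self, doc):
        raise NotImplementedError

class ForeignTokenizer:
    """Hypothetical stand-in for a processing resource with a different interface."""
    def execute(self, text):
        return [("Token", m.start(), m.end()) for m in re.finditer(r"\w+", text)]

class WrappedResource(AnalysisEngine):
    """Adapter: exposes a foreign resource through the AnalysisEngine interface."""
    def __init__(self, resource, name):
        self.resource = resource
        self.name = name
    def process(self, doc):
        doc.annotations.extend(self.resource.execute(doc.text))

class UppercaseTagger(AnalysisEngine):
    """Marks all-uppercase tokens, as a toy downstream engine."""
    name = "uppercase-tagger"
    def process(self, doc):
        for kind, start, end in list(doc.annotations):
            if kind == "Token" and doc.text[start:end].isupper():
                doc.annotations.append(("Header", start, end))

class Pipeline:
    def __init__(self, engines):
        self.engines = list(engines)
    def run(self, text):
        doc = Document(text)
        for engine in self.engines:
            engine.process(doc)
        return doc
    def describe(self):
        """Serialize the engine order so the pipeline can be rebuilt elsewhere."""
        return json.dumps({"engines": [engine.name for engine in self.engines]})

pipeline = Pipeline([WrappedResource(ForeignTokenizer(), "wrapped-tokenizer"), UppercaseTagger()])
doc = pipeline.run("BRAIN biopsy")
print(doc.annotations)
print(pipeline.describe())
```

Keeping the adapter thin means a third-party engine can be dropped into the same list without the rest of the pipeline changing.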
Slide # 29/35
Notional Architecture
Slide # 30/35
Synergies: Ontrez
Ontrez
• Information Retrieval
• Range of inputs
• Other kinds of annotation
ODIE
• Information Extraction
• Ontology Enrichment
• Clinical Documents
Overlap
• Annotation
• Named Entity Recognition
Questions
• Enhance annotation of Ontrez?
• Use inference and indexing on clinical documents?
Slide # 31/35
Synergies: Mayo
• NER and Co-reference resolution
• Clustering, discovery of synonyms
• LexGrid
• Using similar tools, focused on larger range of document types
• More – to be explored
Slide # 32/35
First Year Work
• NER and co-reference modules
• Concept discovery
• Develop manually annotated reference standards for NER and CR
• Focus on testing and developing algorithms
• ODIE 1.0 will include basic architecture and modules for NER, CR and concept discovery, statistics
Slide # 33/35
Working Together
• Work with Mayo to scope first year collaboration (NER, CR, synonym discovery)
• Decisions regarding terminology access
• Better define what NCBO resources we will use
Slide # 34/35
Working Together
• SourceForge site, ODIE website and Wiki
• All our meetings are open and we are happy to arrange teleconferences: Mondays 2-4 pm (EST)
• Schedule visits with Mayo and Stanford for early spring ’08
• Anticipate providing monthly progress updates at the ODIE website starting in January ’08
• Other ideas? What’s the expectation of the Council?
Slide # 35/35
Questions?
Comments?