Ontology Generation Based on a User-Specified Ontology Seed
Cui Tao
Data Extraction Research Group
Department of Computer Science
Brigham Young University
Supported by NSF
www.deg.byu.edu
Introduction
Motivation:
• Traditional search engines return documents
• Ontology-based data extraction returns information
Problem: Build extraction ontologies that meet users' needs
Goal: Automatically build ontologies for users' needs
Example
A biologist is interested in information about large proteins in humans and their functions.
Possible queries:
• Find proteins in humans that are >20 kDa
• Find all the proteins in humans that serve as receptors
• ...
Information sources: various online databases
• NCBI
• Gene Cards
• The Gene Ontology
• GPM Proteomics Database
• …
Extraction Ontology
Molecular Weight
Regular Expression: ^\d{1,5}(\.\d{1,2})?
Unit: kilodaltons?|kdas?|kds?|das?|daltons?
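As a rough illustration of how a data frame like this could recognize values, the Python sketch below combines the value pattern and the unit pattern from this slide. The function name `find_molecular_weights` is hypothetical, the leading `^` anchor is dropped so that values can be found inside running text, and the unit alternation is written as `kds?|das?`, which is an assumption about the intended pattern.

```python
import re

# Value pattern from the slide, without the leading ^ so it can match mid-text.
VALUE = r"\d{1,5}(?:\.\d{1,2})?"
# Unit pattern from the slide; the "kds?|das?" part is an assumed spelling.
UNIT = r"(?:kilodaltons?|kdas?|kds?|das?|daltons?)"

MOLECULAR_WEIGHT = re.compile(rf"\b({VALUE})\s*({UNIT})\b", re.IGNORECASE)


def find_molecular_weights(text):
    """Return (value, unit) pairs for every molecular-weight mention in text."""
    return [(float(m.group(1)), m.group(2)) for m in MOLECULAR_WEIGHT.finditer(text)]
```

For example, `find_molecular_weights("Molecular weight: 29175 Daltons")` yields one pair holding the number and the unit string as written in the text.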
User Interface
Select a title for the forms
User Interface: Binary Relationship
[Figure: form "Protein" with field "Name"]
User Interface: Binary Relationship
[Figure: form "Protein" with fields "Name" and "Molecular Weight"]
User Interface: N-ary Relationship
[Figure: field "Chromosome location" with columns "Chromosome number", "Start", "End", and "Orientation"]
User Interface: N-ary Relationship
[Figure: field "GO" with columns "GO ID" and "GO term"; figure labels also include "GO phrase"]
Overall Form
[Figure: form "Protein" with fields Name, Molecular Weight, Chromosome location (Chromosome number, Start, End, Orientation), and GO (GO ID, GO term)]
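One way to picture the seed that this form defines is as a small nested data structure. The Python sketch below is only an illustration: the class names `Field` and `FormSeed` are hypothetical and are not taken from the tool itself.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """One form field; nested columns model an n-ary relationship."""
    label: str
    columns: List[str] = field(default_factory=list)


@dataclass
class FormSeed:
    """A titled form; the title names the main concept of the seed."""
    title: str
    fields: List[Field] = field(default_factory=list)


# The overall form from the slide, written out as a seed.
protein_form = FormSeed(
    title="Protein",
    fields=[
        Field("Name"),
        Field("Molecular Weight"),
        Field("Chromosome location",
              ["Chromosome number", "Start", "End", "Orientation"]),
        Field("GO", ["GO ID", "GO term"]),
    ],
)
```

Single-valued fields keep an empty column list, so binary and n-ary relationships share one representation.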
Ontology View
[Figure: ontology with object set Protein connected to Name, Molecular weight, Chromosome location (Chromosome number, Start, End, Orientation), and GO (GO ID, GO phrase)]
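The step from a form to an ontology view can be sketched in a few lines of Python. This is a guess at the general idea, not the tool's algorithm: the function name `seed_to_ontology` is hypothetical, and treating each multi-column field as a single n-ary relationship set is an assumption.

```python
def seed_to_ontology(title, fields):
    """Derive object sets and relationship sets from a form description.

    fields maps each field label to a list of column labels
    (an empty list means a single-valued field).
    """
    object_sets = [title]
    relationship_sets = []
    for label, columns in fields.items():
        object_sets.append(label)
        # Every field is related to the form's main concept.
        relationship_sets.append((title, label))
        if columns:
            # Assumption: a multi-column field becomes one n-ary relationship.
            object_sets.extend(columns)
            relationship_sets.append((label, *columns))
    return object_sets, relationship_sets


fields = {
    "Name": [],
    "Molecular weight": [],
    "Chromosome location": ["Chromosome number", "Start", "End", "Orientation"],
    "GO": ["GO ID", "GO phrase"],
}
object_sets, relationship_sets = seed_to_ontology("Protein", fields)
```

Here `relationship_sets` holds tuples such as `("Protein", "Name")` for binary relationships and longer tuples for the n-ary ones.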
Fill in the Form
[Figure: empty "Protein" form with fields Name, Molecular Weight, Chromosome location (Chromosome number, Start, End, Orientation), and GO (GO ID, GO term)]
Fill in the Form
Protein
Name: 14-3-3 protein epsilon; Mitochondrial import stimulation factor L subunit; Protein kinase C inhibitor protein-1; KCIP-1; 14-3-3E
Molecular Weight: 29175 Daltons
Chromosome location:
  Chromosome number: 17
  Start: 1,250,267
  End: 1,194,558
  Orientation: minus
GO:
  GO ID: GO:0019899, GO term: enzyme binding
  GO ID: GO:0019904, GO term: protein domain specific binding
Mapping
[Figure sequence over three slides: the Name field with its sample values (14-3-3 protein epsilon; Mitochondrial import stimulation factor L subunit; Protein kinase C inhibitor protein-1; KCIP-1; 14-3-3E)]
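A simple way to think about this mapping is to look for the sample values from the form inside a source record and pick the field where most of them occur. The Python sketch below is a hypothetical illustration of that idea only: the function name and the source field names (`aliases`, `mass`, `chromosome`) are invented, and the actual mapping procedure may differ.

```python
def map_concept_to_source_field(sample_values, source_record):
    """Return the source field whose text contains the most sample values.

    Returns None when no sample value occurs anywhere in the record.
    """
    best_field, best_hits = None, 0
    for field_name, text in source_record.items():
        lowered = text.lower()
        hits = sum(1 for value in sample_values if value.lower() in lowered)
        if hits > best_hits:
            best_field, best_hits = field_name, hits
    return best_field


name_samples = ["14-3-3 protein epsilon", "KCIP-1", "14-3-3E"]
# Hypothetical source record; the field names are invented for illustration.
source_record = {
    "aliases": "14-3-3 protein epsilon, KCIP-1, 14-3-3E",
    "mass": "29175 Daltons",
    "chromosome": "17",
}
```

With these inputs, the Name concept would be mapped to the record's `aliases` field, since all three sample values occur there.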
Data Frame Generation
• Choose from data frame library
  • Data frames for basic values
    • Numbers within different ranges
    • Integers, floats, etc.
    • Emails, phone numbers, addresses, etc.
  • Domain-specific values (DNA sequences)
  • Units
• Build lexicon files
Data Frame Generation
• Find the best-matching data frame from the library
• Find the correct units
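The two steps above can be sketched as follows. This is a minimal Python illustration under stated assumptions: the miniature library, the unit table, and the function names `best_data_frame` and `find_unit` are all hypothetical, and scoring by the number of fully matched sample values is just one plausible criterion.

```python
import re

# Hypothetical miniature data frame library: name -> value pattern.
DATA_FRAME_LIBRARY = {
    "integer": r"\d+",
    "float": r"\d+\.\d+",
    "email": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
}

# Hypothetical unit table: unit family -> pattern for its spellings.
UNIT_LIBRARY = {
    "dalton": r"kilodaltons?|kdas?|kds?|das?|daltons?",
    "base pair": r"bps?|kb|mb",
}


def best_data_frame(samples):
    """Pick the library entry whose pattern fully matches the most samples."""
    def score(name):
        pattern = re.compile(DATA_FRAME_LIBRARY[name])
        return sum(1 for s in samples if pattern.fullmatch(s))
    best = max(DATA_FRAME_LIBRARY, key=score)
    return best if score(best) > 0 else None


def find_unit(sample):
    """Return the unit family whose pattern matches the sample's last word."""
    words = sample.split()
    if not words:
        return None
    for family, pattern in UNIT_LIBRARY.items():
        if re.fullmatch(pattern, words[-1], re.IGNORECASE):
            return family
    return None
```

In this sketch the numeric part and the unit are checked separately, so a sample such as "29175 Daltons" would have its number passed to `best_data_frame` and the whole string passed to `find_unit`.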
Build Lexicon Files
[Figure: the Name field]
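For a field such as Name, where values are drawn from a vocabulary and not described by a pattern, a lexicon could be built from the collected values. The Python sketch below assumes a plain-text lexicon with one entry per line; that file format and the function name `build_lexicon` are assumptions, not details given in the slides.

```python
from pathlib import Path


def build_lexicon(values, path):
    """Write unique, non-empty values to a lexicon file, one entry per line."""
    seen = set()
    entries = []
    for value in values:
        entry = value.strip()
        key = entry.lower()  # Compare case-insensitively to drop duplicates.
        if entry and key not in seen:
            seen.add(key)
            entries.append(entry)
    Path(path).write_text("\n".join(entries) + "\n", encoding="utf-8")
    return entries
```

Calling it with the Name values from the filled-in form would produce a file listing each protein name once.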
Contribution
Automatically generates ontologies depending on users’ requests
Provides a tool for users to easily provide ontology seeds
Automatically generates ontology views from ontology seeds
Automatically maps ontology concepts to source databases