Post on 19-Dec-2015
Alignment of Ontologies for Biological Research
Judith A. Blake, Ph.D.
Bioinformatics and Computational Biology
The Jackson Laboratory
Dagstuhl - 2007
What is my perspective?
Biological data is voluminous and complex
Data integration is hard work
Bio-ontologies provide semantic structure and standards that aid in data analysis and hypothesis generation.
There are many challenges to the effective use of bio-ontologies (in addition to challenges to the development of ontologies)
What is my approach?
Goal is to facilitate ‘translational research’ through effective integration of experimental data from mouse models of human conditions with human clinical data from disease studies
Bio-ontologies provide a mechanism to support comprehensive data integration and analysis
Interesting….
- Refine Relations Ontology (RO)
- Identify critical datasets
- Focus on bottlenecks
- Create views
Overview of Mouse Genome Informatics
Phenotype
• mutant allele definitions
• QTL
• strain characteristics
• phenotype vocabularies
• disease models (human)
• comparative phenotypes
Genes & Gene Products
• nomenclature
• gene characterization
• transcripts, proteins, gene products
• functional annotation
• orthologs & paralogs
Sequences & Maps
• sequence representation
• C57BL/6J genomic sequence
• SNPs and strain variants
• adding biological context to computational gene models
Gene Expression
• mouse anatomy
• time, tissue, level of expression
• range of assays & results
• emphasis on embryonic stages
Tumor Biology
• tumor classifications & descriptions
• strain incidence
• histopathology images
• tumor genetics
Data acquisition is constant
Load Program: Summary of Data Loaded
Mouse EntrezGene: EntrezGene IDs for mouse markers, plus marker-to-sequence associations from EntrezGene not already in MGD
Human/Rat EntrezGene: Nomenclature, map position and other data regarding human and rat genes; OMIM associations for human
GenBank Seq: Mouse sequence records from GenBank
RefSeq Seq: Mouse sequence records from RefSeq
UniProt/TrEMBL Seq: Mouse sequence records from UniProt and TrEMBL
TIGR/DoTS/NIA Seq: Mouse consensus sequence records from TIGR/DoTS/NIA clusters
TIGR/DoTS/NIA Association: Associations between TIGR/DoTS/NIA cluster sequences and markers
Ensembl Gene Model: Ensembl gene model sequences, coordinates, and associations between these and markers
NCBI Gene Model: NCBI gene model sequences, coordinates, and associations between these and markers
UniProt Association: UniProt/TrEMBL IDs and additional GenBank IDs for mouse markers, plus GO and InterPro annotations
UniGene Association: UniGene cluster IDs for mouse markers
EST cDNA Clone: Mouse IMAGE, NIA, MGC, Riken, cDNAs and EST sequence associations
MGC Association: MGC IDs and associations between MGC full length sequences and MGC cDNAs
RPCI Clone: RPCI 23/24 BAC clones and sequence associations
GO Vocabulary: Updated Gene Ontology (GO) vocabularies from the central GO site
OMIM Vocabulary: Updated OMIM disease terms
MP Vocabulary: Updated MP vocabulary (from OBO-Edit)
Anatomy: Updated adult mouse anatomy ontology (from OBO-Edit)
Mapping panel: JAX, EUCIB, Copeland-Jenkins and many others
PIRSF: Mouse PIR superfamily terms and associations to markers
SNPs: Mouse SNPs from dbSNP and associations between SNPs and markers
Snapshot of MGI data content
MGI data statistics, March 2007
Number of genes with sequence data 28,292
Number of genes (incl. unmapped mutants) 35,733
Number of markers (including genes) 69,639
Number of markers mapped 65,345
Number of genes with protein sequence information 24,293
Number of genes with GO annotations 17,664
Number of mouse/human orthologies 16,127
Number of mouse/rat orthologies 15,802
Number of genes with one or more phenotypic alleles 6,979
Number of cataloged phenotypic alleles 17,494
Number of references 113,508
Number of integrated mouse nucleotide sequences (+ ESTs) 8,3574,701
Build 36: Ensembl and NCBI
Ensembl: 28807; NCBI: 24237
Unification (Exon Overlap Detection)
Equivalent: 22182; Unique to Ensembl: 6910; Unique to NCBI: 2646
Equivalent, by mapping type: 1:1 20663; 1:n 365; n:1 874; n:m 280
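The slide does not describe how exon overlap detection was implemented. As a minimal sketch of the idea, assume each gene model is a dictionary with hypothetical `chrom`, `strand` and `exons` fields, link two models from different sources whenever any of their exons overlap, and label each connected group by how many models it holds from each source. Reading `1:n` as one Ensembl model to several NCBI models is also an assumption.

```python
from collections import Counter

def exons_overlap(a, b):
    """True if two gene models on the same chromosome and strand share an overlapping exon."""
    if a["chrom"] != b["chrom"] or a["strand"] != b["strand"]:
        return False
    return any(s1 <= e2 and s2 <= e1
               for s1, e1 in a["exons"]
               for s2, e2 in b["exons"])

def classify_models(ensembl, ncbi):
    """Count equivalence classes (1:1, 1:n, n:1, n:m) and source-unique models."""
    # Bipartite graph: nodes are ("E", id) or ("N", id); edges join overlapping models.
    adj = {("E", k): set() for k in ensembl}
    adj.update({("N", k): set() for k in ncbi})
    for ek, em in ensembl.items():
        for nk, nm in ncbi.items():
            if exons_overlap(em, nm):
                adj[("E", ek)].add(("N", nk))
                adj[("N", nk)].add(("E", ek))

    counts, seen = Counter(), set()
    for start in adj:
        if start in seen:
            continue
        # Depth-first walk collects one connected component.
        stack, component = [start], []
        seen.add(start)
        while stack:
            node = stack.pop()
            component.append(node)
            for nxt in adj[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        n_ens = sum(1 for source, _ in component if source == "E")
        n_ncbi = len(component) - n_ens
        if n_ncbi == 0:
            counts["unique_ensembl"] += 1
        elif n_ens == 0:
            counts["unique_ncbi"] += 1
        elif n_ens == 1 and n_ncbi == 1:
            counts["1:1"] += 1
        elif n_ens == 1:
            counts["1:n"] += 1
        elif n_ncbi == 1:
            counts["n:1"] += 1
        else:
            counts["n:m"] += 1
    return counts

def model(chrom, *exons):
    # Helper for the made-up example data below (all on the "+" strand).
    return {"chrom": chrom, "strand": "+", "exons": list(exons)}

# Invented coordinates, for illustration only.
ENSEMBL = {"E1": model("1", (100, 200), (300, 400)), "E2": model("1", (1000, 1100)),
           "E3": model("2", (50, 80)), "E4": model("1", (5000, 5100)),
           "E5": model("1", (5200, 5300))}
NCBI = {"N1": model("1", (150, 250)), "N2": model("1", (1050, 1080)),
        "N3": model("1", (1090, 1200)), "N4": model("3", (1, 10)),
        "N5": model("1", (5050, 5250))}

counts = classify_models(ENSEMBL, NCBI)
```

The all-pairs comparison is quadratic and only suits toy inputs; at genome scale one would sort the models or use an interval index.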
Multiple Controlled Vocabularies in MGI
• Gene Nomenclature
• Gene/Marker Type
• Allele Type
• Developmental and Adult Anatomies
• Assay Type (Expression, Mapping)
• Molecular Mutation
• Inheritance Mode
• Gene Ontology
• Mammalian Phenotype Ontology
• Tissue Types
• Cell Types
• Cell Lines
• Units (Cytogenetic, Molecular)
• ES Cell Line
• Strain Nomenclature
Mammalian Phenotype Ontology
• Compositional terms
• ‘Working’ ontology
• Projected xref to ‘core’ ontologies (Anatomy, GO)
• Built with attention to ontological principles, but with the primary goal of supporting annotation of diverse experimental results from many research groups and perspectives
We are exploring ontological representations that relate human clinical data with mouse phenotypes
Create compositional view for annotation of mouse models and human clinical data
Provide xref / RO back to core ontologies
Support both annotation and ontology alignment efforts
Develop tools to support complex queries
We modeled gangliosidoses as a test case. Two types of gangliosidoses are Sandhoff and Tay-Sachs diseases.
Curators use controlled terms from structured vocabularies (ontologies) to curate complex biological systems described in the literature
The knowledge is in the details
Different core ontologies need to be combined to describe complex biological systems
Chemical Ontology: Dopamine (CHEBI:18243)
Cell Type Ontology: Dopaminergic Neuron (CL:0000700)
Biological Process: Synaptic transmission (GO:0007268)
Anatomical Dictionary: Brain (MA:0000168)
Dilemma: No formal links currently exist between the separate ontologies
Solution?
1. Generate cross-products (compositional terms) as necessary for annotations of characteristics of disease cases and disease models;
2. Annotate specific instances of human cases and mouse models;
3. Visualize and mine co-annotated data
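The three steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the MGI implementation: the `CompositionalTerm` class, the `compose` and `co_annotated` helpers, the relation names `occurs_in` and `has_participant`, and the subject identifiers are all hypothetical. Only the ontology term IDs come from the slides.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass(frozen=True)
class CompositionalTerm:
    """A cross-product: a core term refined by (relation, core term) pairs."""
    genus: str                # e.g. a GO biological process ID
    differentiae: frozenset   # set of (relation_name, term_id) tuples

def compose(genus, **relations):
    # Step 1: build a compositional term from core ontology IDs.
    # A frozenset makes the term hashable and independent of argument order.
    return CompositionalTerm(genus, frozenset(relations.items()))

def co_annotated(annotations):
    """Step 3: keep the terms used to annotate both human and mouse subjects."""
    by_term = defaultdict(lambda: defaultdict(set))
    for subject, species, term in annotations:
        by_term[term][species].add(subject)
    return {term: {sp: sorted(subjects) for sp, subjects in groups.items()}
            for term, groups in by_term.items()
            if {"human", "mouse"} <= set(groups)}

# Synaptic transmission (GO:0007268) in brain (MA:0000168)
# involving dopaminergic neurons (CL:0000700).
dopaminergic_transmission = compose(
    "GO:0007268", occurs_in="MA:0000168", has_participant="CL:0000700")
# Synaptic transmission involving dopamine (CHEBI:18243).
dopamine_transmission = compose("GO:0007268", has_participant="CHEBI:18243")

# Step 2: annotate specific instances (subject IDs are invented).
annotations = [
    ("mouse_model_1", "mouse", dopaminergic_transmission),
    ("human_case_1", "human", dopaminergic_transmission),
    ("mouse_model_2", "mouse", dopamine_transmission),
]

shared = co_annotated(annotations)
```

Here `shared` holds only the first compositional term, since it is the only one attached to both a human case and a mouse model. Exact-match grouping is the simplest choice; a fuller approach would also use the ontology hierarchies to match related terms.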
Next Steps
Perspective (views)
• Lung Cancer
• Provide Disease Ontology
• Build compositional view
Mouse Data
• Curate comprehensive annotations for genes implicated in lung phenotypes
Human Data
• Curate clinical data for ontology annotation
Data Analysis
• Use ontological structures to facilitate data exploration and hypothesis generation
Next conference?
“enabling technologies for ontological access to clinical and animal model data”
A hands-on problem solving workshop – a problem use case