Analysis Environments for Functional Genomics
Bruce R. Schatz
Institute for Genomic Biology
University of Illinois at Urbana-Champaign
[email protected], www.beespace.uiuc.edu
Informatics Research
First Annual BeeSpace Workshop
June 6, 2005
What are Analysis Environments
Functional Analysis: find the underlying mechanisms of genes, behaviors, diseases
Comparative Analysis: top-down data mining (vs. bottom-up), multiple sources, especially literature
Building Analysis Environments
Manual, by humans: Interaction (user navigation), Classification (collection indexing)
Automatic, by computers: Federation (search bridges), Integration (results links)
Needles and Haystacks
Genes: honey bees have 13K genes; perhaps 100 have known functions
Paths: perhaps 30K protein families exist; KEGG has 200 known pathways
Statistical Clustering for Interactive Discovery Across Two Orders of Magnitude!
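The "two orders of magnitude" figure follows from the counts on this slide. A quick check, using only the slide's round numbers (the variable names are illustrative):

```python
import math

# Round figures taken from the slide.
bee_genes, genes_with_known_function = 13_000, 100
protein_families, kegg_pathways = 30_000, 200

gene_ratio = bee_genes / genes_with_known_function  # haystack vs. needles, genes
path_ratio = protein_families / kegg_pathways       # haystack vs. needles, paths

# Express each gap as a power of ten.
gene_orders = math.log10(gene_ratio)
path_orders = math.log10(path_ratio)

print(f"genes: {gene_ratio:.0f}x (~10^{gene_orders:.1f})")
print(f"paths: {path_ratio:.0f}x (~10^{path_orders:.1f})")
```

Both gaps come out slightly above 10^2, which is the sense in which the known cases are needles in a haystack.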
Trends in Analysis Environments
Central versus Distributed Viewpoints
The 90s, Pre-Genome: Entrez (NIH NCBI) versus WCS (NSF Arizona)
The 00s, Post-Genome: GO (NIH curators) versus BeeSpace (NSF Illinois)
Pre-Genome Environments
Focused on Syntax, pre-Web
WCS (Worm Community System): search words across sources; follow links across sources; words automatic, links manual
Towards Uniform Searching
Post-Genome Environments
Focused on Semantics, post-Web
BeeSpace (Honey Bee Interspace): navigate concepts across sources; integrate data across sources; concepts automatic, links automatic
Towards Question Answering
Worm Community System
WCS Information:
Literature: BIOSIS, MEDLINE, newsletters, meetings
Data: genes, maps, sequences, strains, cells
WCS Functionality:
Browsing: search, navigation
Filtering: selection, analysis
Sharing: linking, publishing
WCS: 250 users at 50 labs across Internet (1991)
WCS Molecular
WCS Cellular
WCS invokes gm
WCS vis-à-vis acedb
Towards the Interspace
From Objects to Concepts
From Syntax to Semantics
Infrastructure is Interaction with Abstraction
Internet is packet transmission across computers
Interspace is concept navigation across repositories
THE THIRD WAVE OF NET EVOLUTION
[Figure: packets, objects, concepts]
LEVELS OF INDEXES
[Figure: Technology, Engineering, Electrical; formal (manual) and informal (automatic) indexes; IEEE, communities, groups, individuals]
Navigation in MEDSPACE
For a patient with Rheumatoid Arthritis, find a drug that reduces the pain (analgesic) but does not cause stomach (gastrointestinal) bleeding
Choose Domain
Concept Search
Concept Navigation
Retrieve Document
Navigate Document
Post-Genome Informatics I
Comparative Analysis within the Dry Lab of Biological Knowledge
Classical Organisms have Genetic Descriptions. There will be NO more classical organisms beyond Mice and Men, Worms and Flies, Yeasts and Weeds.
Must use comparative genomics on classical organisms, via sequence homologies and literature analysis.
Post-Genome Informatics II
Functional Analysis within the Dry Lab of Biological Knowledge
Automatic annotation of genes to standard classifications, e.g. Gene Ontology via homology on computed protein sequences.
Automatic analysis of functions to scientific literature, e.g. concept spaces via text extractions. Thus must use functions in literature descriptions.
Conceptual Navigation in BeeSpace
[Figure: users (Behavioral Biologist, Molecular Biologist, Neuroscientist) and sources (Neuroscience Literature, Molecular Biology Literature, Bee Literature, FlyBase, WormBase, Bee Genome, Brain Region Localization, Brain Gene Expression Profiles)]
BeeSpace Analysis Environment
Build Concept Space of Biomedical Literature for Functional Analysis of Bee Genes
- Partition Literature into Community Collections
- Extract and Index Concepts within Collections
- Navigate Concepts within Documents
- Follow Links from Documents into Databases
Locate Candidate Genes in Related Literatures, then follow links into Genome Databases
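The four steps listed on this slide can be sketched on toy data. This is a minimal illustration under stated assumptions, not the BeeSpace implementation: the sample abstracts, the community labels, the `GENE_LINKS` table with its record identifiers, and the use of single words as "concepts" are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy abstracts: (id, community label, text).
ABSTRACTS = [
    ("a1", "bee", "foraging gene regulates onset of foraging behavior"),
    ("a2", "fly", "foraging gene encodes protein kinase"),
    ("a3", "fly", "malvolio gene encodes manganese transporter"),
]
# Hypothetical links from a concept to a database record identifier.
GENE_LINKS = {"foraging": "FlyBase:for", "malvolio": "FlyBase:mvl"}
STOPWORDS = {"of", "gene"}


def partition(abstracts):
    """Step 1: group abstracts into community collections."""
    collections = defaultdict(list)
    for doc_id, community, text in abstracts:
        collections[community].append((doc_id, text))
    return collections


def index_concepts(collection):
    """Step 2: build an inverted index from concept to document ids.
    Single words stand in for real extracted concepts here."""
    index = defaultdict(set)
    for doc_id, text in collection:
        for word in text.split():
            if word not in STOPWORDS:
                index[word].add(doc_id)
    return index


def navigate(index, concept):
    """Step 3: list the documents that mention a concept."""
    return sorted(index.get(concept, ()))


def follow_link(concept):
    """Step 4: follow a concept into a genome database, if a link exists."""
    return GENE_LINKS.get(concept)


collections = partition(ABSTRACTS)
fly_index = index_concepts(collections["fly"])
print(navigate(fly_index, "encodes"))
print(follow_link("malvolio"))
```

A real system would replace the word split with phrase extraction and the dictionary lookup with database queries; the sketch only shows how the four stages hand data to one another.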
Question Answering
Behaviour | Organism | Gene | Molecular Function | Reference
Foraging
Rover vs sitter phenotype | Drosophila melanogaster | for | Protein kinase G | 8
Roamer vs dweller phenotype | C. elegans | egl-4 | Protein kinase G | 16
Division of labour: age at onset of foraging | Apis mellifera | for | Protein kinase G | 9
Division of labour: age at onset of foraging | Apis mellifera | mvl | Mn transporter | 19
Division of labour: foraging-related? | Apis mellifera | per | Transcription cofactor | 68
Division of labour: foraging-related? | Apis mellifera | ache | Acetylcholine esterase | 69
Division of labour: foraging-related? | Apis mellifera | IP(3)K | Inositol signaling | 70
Foraging specialization: nectar vs. pollen | Apis mellifera | pkc | Protein kinase C | 71
Social feeding | Drosophila melanogaster | dnpf | Neuropeptide Y (NPY) homolog | 21
Social feeding (aggregation) | C. elegans | npr-1 | Receptor for NPY | 22, 23
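One kind of question the rows above can answer is "which molecular functions are tied to behaviour in more than one organism?". The sketch below hard-codes a few of the rows as tuples; the function names and the tuple layout are assumptions made for illustration, not part of any BeeSpace interface.

```python
from collections import defaultdict

# A subset of the rows above: (behaviour, organism, gene, molecular function).
ROWS = [
    ("Rover vs sitter phenotype", "Drosophila melanogaster", "for", "Protein kinase G"),
    ("Roamer vs dweller phenotype", "C. elegans", "egl-4", "Protein kinase G"),
    ("Division of labour: age at onset of foraging", "Apis mellifera", "for", "Protein kinase G"),
    ("Foraging specialization: nectar vs. pollen", "Apis mellifera", "pkc", "Protein kinase C"),
    ("Social feeding (aggregation)", "C. elegans", "npr-1", "Receptor for NPY"),
]


def organisms_by_function(rows):
    """Map each molecular function to the organisms where it is tied to a behaviour."""
    result = defaultdict(set)
    for _behaviour, organism, _gene, function in rows:
        result[function].add(organism)
    return result


def conserved_functions(rows, min_organisms=2):
    """Return functions linked to behaviour in at least `min_organisms` organisms."""
    grouped = organisms_by_function(rows)
    return sorted(f for f, orgs in grouped.items() if len(orgs) >= min_organisms)


print(conserved_functions(ROWS))
```

On these rows only Protein kinase G qualifies, which matches the comparative reading of the table: the same function underlies foraging-related behaviour in fly, worm, and bee.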
Functional Phrases
<gene> encodes <chemical>
- Sokolowski and colleagues demonstrated in Drosophila melanogaster that the foraging gene (for) encodes a cGMP dependent protein kinase (PKG).
- The dg2 gene encodes a cyclic guanosine monophosphate (cGMP)-dependent protein kinase (PKG).
<chemical> affects/causes <behavior>
- Thus, PKG levels affected food-search behavior.
- cGMP treatment elevated PKG activity and caused foraging behavior.
<gene> regulates <behavior>
- Amfor, an ortholog of the Drosophila for gene, is involved in the regulation of age at onset of foraging in honey bees.
- This idea is supported by results for malvolio (mvl), which encodes a manganese transporter and is involved in regulating Drosophila feeding and age at onset of foraging in honey bees.
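The `<gene> encodes <chemical>` template can be approximated with a regular expression. This is a rough sketch only: the pattern is an assumption made for illustration, it handles just the "X gene (symbol) encodes a ..." wording of the first two example sentences, and it is not the language parsing used in BeeSpace.

```python
import re

# <gene> encodes <chemical>: gene name, optional symbol in parentheses, product.
ENCODES = re.compile(r"\b(\w+) gene(?: \((\w+)\))? encodes an? ([^.]+)\.")

# Example sentences from the slide (whitespace normalized).
SENTENCES = [
    "Sokolowski and colleagues demonstrated in Drosophila melanogaster that "
    "the foraging gene (for) encodes a cGMP dependent protein kinase (PKG).",
    "The dg2 gene encodes a cyclic guanosine monophosphate (cGMP)-dependent "
    "protein kinase (PKG).",
    "Thus, PKG levels affected food-search behavior.",
]


def extract_encodes(sentence):
    """Return (gene, product) if the sentence matches the template, else None."""
    match = ENCODES.search(sentence)
    if match is None:
        return None
    name, symbol, product = match.groups()
    # Prefer the parenthesized symbol when the sentence gives one.
    return (symbol or name, product)


for sentence in SENTENCES:
    print(extract_encodes(sentence))
```

The third sentence belongs to a different template and yields `None`, and a relative-clause wording such as "malvolio (mvl), which encodes ..." is also missed. That gap is why surface patterns alone are a weak substitute for real parsing and entity recognition.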
Data Integration (FlyBase Gene)
D. melanogaster gene foraging, abbreviated as for, is reported here. It has also been known in FlyBase as BcDNA:GM08338, CG10033 and l(2)06860. It encodes a product with cGMP-dependent protein kinase activity (EC:2.7.1.-) involved in protein amino acid phosphorylation which is a component of the cellular_component unknown. It has been sequenced and its amino acid sequence contains an eukaryotic protein kinase, a protein kinase C-terminal domain, a tyrosine kinase catalytic domain, a serine/threonine protein kinase family active site, a cAMP-dependent protein kinase and a cGMP-dependent protein kinase. It has been mapped by recombination to 2-10 and cytologically to 24A2--4. It interacts genetically with Csr. There are 27 recorded alleles: 1 in vitro construct (not available from the public stock centers), 25 classical mutants (3 available from the public stock centers) and 1 wild-type. Mutations have been isolated which affect the larval nerve terminal and are behavioral, pupal recessive lethal, hyperactive, larval neurophysiology defective and larval neuroanatomy defective. for is discussed in 80 references (excluding sequence accessions), dated between 1988 and 2003. These include at least 6 studies of mutant phenotypes, 2 studies of wild-type function, 3 studies of natural polymorphisms and 7 molecular studies. Among findings on for function, for activity levels influence adult olfactory trap response to a food medium attractant. Among findings on for polymorphisms, the frequency of for^R and for^s strains in three natural populations are studied to determine the contribution of the local parasitoid community to the differences in for^R and for^s frequencies.
BeeSpace Information Sources
Biomedical Literature
- Medline (medicine)
- Biosis (biology)
- Agricola, CAB Abstracts, Agris (agriculture)
Model Organisms (heredity)
- Gene Descriptions (FlyBase, WormBase)
Natural Histories (environment)
- BeeKeeping Books (Cornell, Harvard)
Medical Concept Spaces (1998)
Medical Literature (Medline, 10M abstracts)
Partition with Medical Subject Headings (MeSH)
Community is all abstracts classified by core term
40M abstracts containing 280M concepts
Computation is 2 days on NCSA Origin 2000
Simulating World of Medical Communities
10K repositories with > 1K abstracts (1K with > 10K)
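Partitioning by subject heading can be sketched as follows. One reading of the figures above (10M Medline abstracts, 40M abstracts across communities) is that an abstract indexed under several headings is counted once per community; that interpretation is an assumption, and the identifiers and heading assignments below are hypothetical toy data.

```python
from collections import defaultdict

# Hypothetical abstracts with the subject headings assigned to each.
ABSTRACTS = {
    "pmid1": ["Arthritis, Rheumatoid", "Analgesics"],
    "pmid2": ["Analgesics", "Gastrointestinal Hemorrhage"],
    "pmid3": ["Arthritis, Rheumatoid"],
}


def partition_by_heading(abstracts):
    """Build one community per heading: all abstracts classified under it."""
    communities = defaultdict(set)
    for doc_id, headings in abstracts.items():
        for heading in headings:
            communities[heading].add(doc_id)
    return communities


communities = partition_by_heading(ABSTRACTS)

# An abstract with several headings lands in several communities, so the
# summed community sizes exceed the number of distinct abstracts.
total_memberships = sum(len(docs) for docs in communities.values())
print(len(ABSTRACTS), "distinct abstracts,", total_memberships, "community memberships")
```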
Biological Concept Spaces (2006)
Compute concept spaces for All of Biology
BioSpace across entire biomedical literature
50M abstracts across 50K repositories
Use Gene Ontology to partition literature into biological communities for functional analysis
GO same scale as MeSH but adequate coverage?
GO light on social behavior (biological process)
Concept Switching
In the Interspace…
Each Community maintains its own repository
Switching is navigating across repositories
Use your specialty vocabulary to search another specialty
CONCEPT SWITCHING
“Concept” versus “Term”: a concept is a set of “semantically” equivalent terms
Concept switching: region to region (set to set) match
[Figure labels: term, semantic region]
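A set-to-set match between semantic regions can be sketched with a set-overlap score. The slide does not say which measure is used; Jaccard similarity is swapped in here as an assumption, and the term sets and region names are hypothetical toy data.

```python
def jaccard(a, b):
    """Overlap between two term sets: |intersection| / |union|."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def switch_concept(source_region, target_space):
    """Return (region name, score) of the target region that best
    overlaps the source region."""
    scored = ((name, jaccard(source_region, terms))
              for name, terms in target_space.items())
    return max(scored, key=lambda pair: pair[1])


# Hypothetical semantic region from one community's vocabulary.
bee_region = {"foraging", "food search", "feeding"}

# Hypothetical concept space of another community's repository.
fly_space = {
    "locomotion": {"rover", "sitter", "crawling"},
    "feeding behavior": {"feeding", "food search", "rover", "sitter"},
}

print(switch_concept(bee_region, fly_space))
```

Matching regions rather than single terms is the point of the slide: "foraging" has no counterpart in the target vocabulary, yet the region around it still lands on the related concept.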
Concept Space
Biomedical Session
Categories and Concepts
Concept Switching
Document Retrieval
Interactive Functional Analysis
BeeSpace will enable users to navigate a uniform space of diverse databases and literature sources for hypothesis development and testing, with a software system beyond a searchable database, using literature analyses to discover functional relationships between genes and behavior.
- Genes to Behaviors
- Behaviors to Genes
- Concepts to Concepts
- Clusters to Clusters
- Navigation across Sources
BeeSpace Information Sources
General for All Spaces:
Scientific Literature
- Medline, Biosis, Agricola, Agris, CAB Abstracts
- partitioned by organisms and by functions
Model Organisms
- Gene Descriptions (FlyBase, WormBase, MGI, OMIM, SGD, TAIR)
Special Sources for BeeSpace:
- Natural History Books (Cornell Library, Harvard Press)
XSpace Information Sources
Organize Genome Databases (XBase)
Compute Gene Descriptions from Model Organisms
Partition Scientific Literature for Organism X
Compute XSpace using Semantic Indexing
Boost the Functional Analysis from Special Sources
Collecting Useful Data about Natural Histories
e.g. CowSpace Leverage in AIPL Databases
Towards the Interspace
The Analysis Environment technology is GENERAL!
BirdSpace? BeeSpace?
PigSpace? CowSpace?
BehaviorSpace? BrainSpace?
BioSpace… Interspace
Prototype System
Overall Architecture and Interface: Todd Littell
Language Parsing and Entity Recognition: Jing Jiang
Normalization and Theme Clustering: Qiaozhu Mei
Concept Navigation and Switching: Azadeh Shakery
Gene Summarization and Linking: Xu Ling
Collection Development and Navigation: Xin He
Specialty Systems
Question Answering: Eugene Grois
Annotation Pipeline: Pouya Kheradpour