Pathway/Genome Databases and Software Tools
Peter D. Karp, Ph.D.
Bioinformatics Research Group
SRI International
http://ecocyc.DoubleTwist.com/ecocyc/
SRI International Bioinformatics
Overview
Overview of bioinformatics
Motivations for the EcoCyc project
EcoCyc demo
Description of EcoCyc database and Pathway Tools software
Underlying technologies
  Ocelot object database
  GKB Editor
  X-windows to WWW translator
Definition of Bioinformatics
Computational techniques for management and analysis of biological data and knowledge
Methods for disseminating, archiving, interpreting, and mining scientific information
Motivations for Bioinformatics
Growth in molecular-biology knowledge
Industrialization of biological experimentation
High-throughput biology
  Genome sequences
  Gene and protein expression data
  Protein-protein interaction data
  Protein 3-D structures
  …
Motivations for EcoCyc: E. coli Encyclopedia
Integrate E. coli information dispersed in the literature
New paradigm of scientific publishing
Model the full metabolic network of an organism
Integrate genomic data with functional data
Develop algorithms for computing with function
Provide a challenging domain for computer-science research
Definitions
A chemical reaction interconverts chemical compounds
An enzyme is a protein that accelerates chemical reactions
A pathway is a linked set of reactions
  A conceptual unit of the cell's biochemical machine
Example reaction: A + B = C + D
Example pathway: A → C → E
Organism-Specific Pathway/Genome Databases
Layer functional information above the genome
Rich ontology to encode biological information with high fidelity
  Chromosomes, genes, operons, gene products, reactions, pathways
Curated by experts for that organism
Integrate literature and computational predictions
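The layering described on this slide can be pictured as a small object model. The Python sketch below is illustrative only: the class names, field names, and identifiers such as `geneX` are assumptions made for the example and are not the EcoCyc schema. It reuses the reaction A + B = C + D from the Definitions slide and shows how a pathway can be traced back to the genes whose products catalyze its reactions.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Reaction:
    name: str
    substrates: List[str]
    products: List[str]


@dataclass
class GeneProduct:
    name: str
    catalyzes: List[Reaction] = field(default_factory=list)


@dataclass
class Gene:
    name: str
    product: GeneProduct


@dataclass
class Pathway:
    name: str
    reactions: List[Reaction]

    def is_linked(self) -> bool:
        """True if each reaction consumes a product of the one before it."""
        return all(
            set(prev.products) & set(nxt.substrates)
            for prev, nxt in zip(self.reactions, self.reactions[1:])
        )


def genes_for_pathway(pathway: Pathway, genes: List[Gene]) -> List[str]:
    """Walk pathway -> reactions -> gene products -> genes."""
    wanted = {r.name for r in pathway.reactions}
    return [
        g.name
        for g in genes
        if any(r.name in wanted for r in g.product.catalyzes)
    ]


# Hypothetical example data.
r1 = Reaction("r1", ["A", "B"], ["C", "D"])
r2 = Reaction("r2", ["C"], ["E"])
gene_x = Gene("geneX", GeneProduct("enzymeX", [r1]))
gene_y = Gene("geneY", GeneProduct("enzymeY", [r2]))
gene_z = Gene("geneZ", GeneProduct("proteinZ"))  # no known reaction
pathway = Pathway("example-pathway", [r1, r2])
```

Calling `genes_for_pathway(pathway, [gene_x, gene_y, gene_z])` returns only the two genes whose products catalyze a reaction in the pathway, which is the kind of genome-to-function query the layering is meant to support.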
Pathway Tools Software
Pathway/Genome Navigator
  WWW publishing of Pathway/Genome Databases (PGDBs)
  Graphic depictions of pathways, chromosomes, operons
  Pathway visualization of gene-expression data
Pathway/Genome Editors
  Distributed curation of genome annotations
  Distributed object database system
  Interactive editing tools
PathoLogic
  Prediction of metabolic network from genome
EcoCyc = E. coli Dataset + Pathway/Genome Navigator
Genes: 4,393
Gene Products: 4,393
Reactions: 1,117
Pathways: 158
Metabolic Network
Compounds: 1,887
Operons: 375
http://ecocyc.DoubleTwist.com/ecocyc/
EcoCyc
Collaborative development via Internet
  Karp: Bioinformatics architect
  Riley: Metabolic pathways, signal transduction
  Saier and Paulsen: Transport
  Collado: Regulation of gene expression
Ontology of 1,000 biological classes
  14,000 instances
Over 2,600 registered users
Pathway Tools Software
[Diagram components: Pathway/Genome Databases, Pathway/Genome Navigator, PathoLogic Pathway Predictor, Pathway/Genome Editors]
Creation of the Overview Graph
Run layout algorithms on individual pathway graphs
  Automatically determine topology of pathway graph
  Apply associated layout algorithm (linear, circular, tidy tree)
Use superpathways to create hierarchical layouts
  Treat each individual pathway as a single node
  Pathway connections are edges
  Run appropriate layout algorithm
Manually position the resulting pathway clusters
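The step "automatically determine topology of pathway graph" can be sketched in a few lines. The Python below is a minimal illustration, not the Pathway Tools algorithm: the edge-list input, the function name `classify_topology`, and the rule that anything with a branch or a disconnected piece falls back to the tidy-tree layout are all assumptions made for the example.

```python
def classify_topology(edges):
    """Classify a directed pathway graph given as (source, target) pairs.

    Returns "linear", "circular", or "tidy tree" (used here as the fallback).
    """
    if not edges:
        raise ValueError("pathway graph has no edges")

    nodes = {n for edge in edges for n in edge}
    succ = {n: [] for n in nodes}
    in_deg = {n: 0 for n in nodes}
    for src, dst in edges:
        succ[src].append(dst)
        in_deg[dst] += 1

    # Any branch or merge point rules out both linear and circular shapes.
    if any(len(s) > 1 for s in succ.values()) or any(d > 1 for d in in_deg.values()):
        return "tidy tree"

    # Follow the unique successor chain from a start node (or any node
    # if every node has a predecessor, which indicates a cycle).
    starts = [n for n in nodes if in_deg[n] == 0]
    node = starts[0] if starts else next(iter(nodes))
    seen = []
    while node is not None and node not in seen:
        seen.append(node)
        node = succ[node][0] if succ[node] else None

    if len(seen) != len(nodes):
        return "tidy tree"  # disconnected pieces: use the fallback layout
    return "linear" if starts else "circular"


linear = classify_topology([("A", "C"), ("C", "E")])
circular = classify_topology([("A", "B"), ("B", "C"), ("C", "A")])
branched = classify_topology([("A", "B"), ("A", "C")])

# For a superpathway, each node stands for a whole pathway
# (hypothetical pathway names).
superpathway = classify_topology([("pathway-1", "pathway-2"), ("pathway-2", "pathway-3")])
```

The same function serves both levels of the hierarchy because a superpathway is treated as a graph whose nodes are pathways and whose edges are pathway connections.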
Inference of Metabolic Pathways
[Diagram: PathoLogic takes an annotated genome (a structured ASCII text file listing genes/ORFs and gene products, plus the DNA sequence) together with MetaCyc, and produces reports and a Pathway/Genome Database containing the genomic map, genes, gene products, reactions, compounds, pathways, and the metabolic network]
Summary of H. pylori Analysis
For 121 E. coli pathways, what is the evidence that each pathway occurs in H. pylori?
  Strong evidence: 41
  Medium evidence: 29
  Little or no evidence: 51
31 reactions catalyzed by H. pylori but not by E. coli
H. pylori has partial abilities to synthesize cofactors and amino acids, extremely limited carbohydrate catabolism, some amino acid utilization, and a reductive citric-acid pathway
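The slide does not say how a pathway is assigned to an evidence category. One plausible reading, sketched below in Python, is to bucket each pathway by the fraction of its reactions for which the genome encodes an enzyme. The thresholds, the function name `evidence_level`, and the reaction identifiers are assumptions for illustration and are not PathoLogic's actual rules. The final lines only check that the reported tallies account for all 121 pathways.

```python
def evidence_level(pathway_reactions, catalyzed, strong=0.75, medium=0.4):
    """Bucket a pathway by the fraction of its reactions that have an enzyme.

    The thresholds are illustrative assumptions, not PathoLogic's values.
    """
    if not pathway_reactions:
        raise ValueError("pathway has no reactions")
    found = sum(1 for r in pathway_reactions if r in catalyzed)
    fraction = found / len(pathway_reactions)
    if fraction >= strong:
        return "strong"
    if fraction >= medium:
        return "medium"
    return "little or none"


# Hypothetical reaction identifiers for reactions the genome can catalyze.
catalyzed = {"rxn1", "rxn2", "rxn3", "rxn5"}

levels = {
    "p1": evidence_level(["rxn1", "rxn2", "rxn3"], catalyzed),          # 3 of 3
    "p2": evidence_level(["rxn3", "rxn4"], catalyzed),                  # 1 of 2
    "p3": evidence_level(["rxn5", "rxn6", "rxn7", "rxn8"], catalyzed),  # 1 of 4
}

# Tallies as reported on the slide.
reported = {"strong": 41, "medium": 29, "little or none": 51}
total = sum(reported.values())
```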
Microbial Pathway/Genome DBs
Literature-based Datasets:
  MetaCyc
  Escherichia coli
PathoLogic-based Datasets:
  Bacillus subtilis
  Mycobacterium tuberculosis
  Helicobacter pylori
  Haemophilus influenzae
  Mycoplasma pneumoniae
  Treponema pallidum
  Chlamydia trachomatis
  Saccharomyces cerevisiae
Pathway Tools Software Architecture
Implemented in Common Lisp
WWW server runs as a single Unix process with a separate thread to service each query
Grasper-CL graph manager
Ocelot object database
GKB Editor schema-driven editor
EcoCyc WWW Server
Pathway Tools Architecture: Development Configuration
[Diagram components: Oracle, Ocelot DBMS, GFP API, Pathway/Genome Navigator, WWW Server, X-Windows Graphics, Object Editor, Pathway Editor, Reaction Editor]
Ocelot Database System
Object Database Manager
Persistence via filesystem or relational DBMS
Demand and background faulting of objects from RDBMS
Two-level object caching
Extensive bioinformatics schema
Stored transaction history
  Inspect object history
Ocelot Knowledge Server Architecture
Frame data model
Persistent storage via
  Disk files
  Oracle DBMS
Optimistic concurrency-control protocol
Schema evolution
Logging facility
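An optimistic concurrency-control protocol lets editors work without locks and checks for conflicts only at commit time. The Python sketch below shows one common form of that idea, validation against a per-frame version counter. It is an assumption-laden illustration: the class and method names are hypothetical, and the slides do not state how Ocelot detects conflicts.

```python
class ConflictError(Exception):
    """Raised when a frame changed after the editor read it."""


class Store:
    def __init__(self):
        self._data = {}  # frame id -> (version, value)

    def read(self, frame_id):
        """Return (version, value); unknown frames start at version 0."""
        return self._data.get(frame_id, (0, None))

    def commit(self, frame_id, read_version, new_value):
        """Apply the update only if nobody committed since read_version."""
        current_version, _ = self._data.get(frame_id, (0, None))
        if current_version != read_version:
            raise ConflictError(
                f"{frame_id}: read version {read_version}, "
                f"but store is at version {current_version}"
            )
        self._data[frame_id] = (current_version + 1, new_value)
        return current_version + 1


store = Store()

# Two curators read the same (hypothetical) frame without taking a lock.
version_a, _ = store.read("gene-1")
version_b, _ = store.read("gene-1")

# The first commit succeeds; the second is rejected and must be retried.
store.commit("gene-1", version_a, {"map-position": 10.0})
try:
    store.commit("gene-1", version_b, {"map-position": 12.5})
    second_commit_ok = True
except ConflictError:
    second_commit_ok = False
```

The design choice suits collaborative curation, where conflicting edits to the same object are expected to be rare and blocking every reader would cost more than an occasional retry.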
The Frame Data Model
Frames are of two types: classes, instances
Frames have slots that define their properties, attributes, relationships
A slot has one or more values
Each value can be any Lisp datatype
Slotunits define metadata about slots:
  Domain, range, inverse
  Collection type, number of values, value constraints
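The frame data model maps naturally onto a small amount of code. The Python sketch below is a simplified illustration, not Ocelot's implementation (which is in Common Lisp): the `Frame` and `SlotUnit` classes and their field names are assumptions, and the slot name `map-position` is borrowed from the Frame Faulting slide. The slotunit records a slot's domain, range, inverse, and maximum number of values, and reports problems as a list rather than raising an error.

```python
from dataclasses import dataclass
from typing import Optional


class Frame:
    """A class or instance frame holding multi-valued slots."""

    def __init__(self, name, parents=(), is_class=False):
        self.name = name
        self.parents = list(parents)  # parent class frames
        self.is_class = is_class
        self.slots = {}               # slot name -> list of values

    def put(self, slot, *values):
        self.slots[slot] = list(values)

    def get(self, slot):
        return self.slots.get(slot, [])

    def is_a(self, class_name):
        """True if this frame is, or descends from, the named class."""
        if self.name == class_name:
            return True
        return any(p.is_a(class_name) for p in self.parents)


@dataclass
class SlotUnit:
    """Metadata about one slot (hypothetical field names)."""
    name: str
    domain: str                      # class whose frames may carry the slot
    value_type: type                 # the slot's range
    inverse: Optional[str] = None    # name of the inverse slot, if any
    max_values: Optional[int] = None

    def violations(self, frame):
        """Return a list of problems instead of raising an error."""
        problems = []
        values = frame.get(self.name)
        if values and not frame.is_a(self.domain):
            problems.append(f"{frame.name} is outside domain {self.domain}")
        if self.max_values is not None and len(values) > self.max_values:
            problems.append(f"{self.name} has more than {self.max_values} value(s)")
        problems += [
            f"{v!r} is not a {self.value_type.__name__}"
            for v in values
            if not isinstance(v, self.value_type)
        ]
        return problems


genes = Frame("Genes", is_class=True)
gene = Frame("gene-1", parents=[genes])   # hypothetical instance
gene.put("map-position", 12.5)            # hypothetical value

map_position = SlotUnit("map-position", domain="Genes", value_type=float, max_values=1)
clean = map_position.violations(gene)
```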
Inference Capabilities
Inheritance of defaults
Slot values computed via attached procedures
Maintenance of inverse relationships
Constraint system
  Deferred evaluation
  Tolerant of nonconformant data
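The first three capabilities on this slide can be sketched together. The Python below is a minimal illustration under stated assumptions: the `Frame` class, the `INVERSES` table, and the slot names (`organism`, `left-end`, `right-end`, `length`, `product`, `gene`) are hypothetical and chosen only for the example. It shows a default inherited from a class frame, a slot value computed by an attached procedure, and an inverse link maintained automatically; the constraint system is not covered.

```python
class Frame:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.slots = {}       # locally stored values: slot -> list
        self.procedures = {}  # attached procedures: slot -> function(frame)

    def get(self, slot):
        """Look up a slot locally, then up the class chain."""
        frame = self
        while frame is not None:
            if slot in frame.slots:
                return frame.slots[slot]
            if slot in frame.procedures:
                # Compute the value for the frame that was asked.
                return frame.procedures[slot](self)
            frame = frame.parent
        return []


# Pairs of slots that are inverses of each other (hypothetical names).
INVERSES = {"product": "gene", "gene": "product"}


def put(frame, slot, value):
    """Add a value and keep the inverse relationship in step."""
    frame.slots.setdefault(slot, []).append(value)
    inverse = INVERSES.get(slot)
    if inverse is not None and isinstance(value, Frame):
        back = value.slots.setdefault(inverse, [])
        if frame not in back:
            back.append(frame)


genes = Frame("Genes")
genes.slots["organism"] = ["E. coli"]  # default inherited by instances
genes.procedures["length"] = lambda f: [
    f.get("right-end")[0] - f.get("left-end")[0] + 1
]

gene = Frame("gene-1", parent=genes)
gene.slots["left-end"] = [100]
gene.slots["right-end"] = [1000]

protein = Frame("protein-1")
put(gene, "product", protein)
```

After the last line, `protein.get("gene")` contains `gene` even though only the `product` slot was written, which is the bookkeeping that inverse maintenance saves a curator from doing by hand.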
Storage System Architecture
Oracle KBs
DBMS is submerged within the frame representation system (FRS)
Relational schema is domain independent, supports multiple KBs simultaneously
Frames transferred from DBMS to Ocelot
  On demand
  By background prefetcher
Memory cache
Persistent disk cache to speed performance via Internet
Frame Faulting
(get-slot-value gene 'map-position)
Gene present in in-memory object cache?
Gene present in cache on local disk?
Query Oracle DBMS
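The three checks on this slide form a tiered lookup. The Python sketch below illustrates that order under simplifying assumptions: plain dictionaries stand in for the local disk cache and for the Oracle DBMS, and the class name `FaultingStore`, the frame id `gene-1`, and the stored value are hypothetical. A `trace` list records which tier answered each request.

```python
class FaultingStore:
    def __init__(self, disk_cache, dbms):
        self.memory = {}        # in-memory object cache
        self.disk = disk_cache  # stand-in for the cache on local disk
        self.dbms = dbms        # stand-in for the Oracle DBMS
        self.trace = []         # which tier served each fetch

    def fetch_frame(self, frame_id):
        # 1. Frame present in the in-memory object cache?
        if frame_id in self.memory:
            self.trace.append("memory")
            return self.memory[frame_id]
        # 2. Frame present in the cache on local disk?
        if frame_id in self.disk:
            self.trace.append("disk")
            frame = self.disk[frame_id]
        else:
            # 3. Query the DBMS (raises KeyError for an unknown frame).
            self.trace.append("dbms")
            frame = self.dbms[frame_id]
            self.disk[frame_id] = frame
        self.memory[frame_id] = frame
        return frame

    def get_slot_value(self, frame_id, slot):
        return self.fetch_frame(frame_id)[slot]


dbms = {"gene-1": {"map-position": 42.0}}  # hypothetical contents
disk = {}

store = FaultingStore(disk, dbms)
first = store.get_slot_value("gene-1", "map-position")   # faults in from the DBMS
second = store.get_slot_value("gene-1", "map-position")  # served from memory

# A new session that shares the disk cache avoids the DBMS round trip.
restarted = FaultingStore(disk, dbms)
third = restarted.get_slot_value("gene-1", "map-position")
```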
Logging
Oracle DBMS stores:
  The latest version of each frame
  A history of all OKBC operations applied to KB
Reconstruct earlier versions of KB
View history of changes to an object
Update replicates
Concurrency control
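Storing a history of operations is what makes it possible to reconstruct earlier versions and to view the changes to one object. The Python sketch below illustrates the idea by replaying a log: the tuple layout, the function names `replay` and `history`, and the sample entries are assumptions for the example, and the operation names are used only as illustrative strings rather than as a faithful OKBC encoding.

```python
def replay(log, upto=None):
    """Rebuild the KB state by applying the first `upto` logged operations."""
    kb = {}
    for op, frame_id, slot, value in log[:upto]:
        if op == "put-slot-value":
            kb.setdefault(frame_id, {})[slot] = value
        elif op == "delete-frame":
            kb.pop(frame_id, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return kb


def history(log, frame_id):
    """Return (position, entry) pairs for every operation on one frame."""
    return [(i, entry) for i, entry in enumerate(log) if entry[1] == frame_id]


# Hypothetical log entries: (operation, frame id, slot, value).
log = [
    ("put-slot-value", "gene-1", "map-position", 10.0),
    ("put-slot-value", "gene-2", "map-position", 55.0),
    ("put-slot-value", "gene-1", "map-position", 12.5),
    ("delete-frame", "gene-2", None, None),
]

earlier = replay(log, upto=2)   # the KB as it stood after two operations
latest = replay(log)            # the current KB
gene_1_changes = history(log, "gene-1")
```

Keeping the latest version of each frame alongside the log, as the slide describes, means ordinary reads never pay the cost of a replay; the log is consulted only for history and reconstruction.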
Schema Management
FRSs store and process class and instance information similarly
Applications can query schema information as easily as they can query instances
GKB Editor
Browser and editor for KBs and ontologies
Four editing tools
GKB Editor reusable with multiple FRSs
  All database queries via OKBC/GFP API
  Interoperability achieved with Ocelot, LOOM, Ontolingua
All operations are schema driven
http://www.ai.sri.com/~gkb/overview.html
Editors
Taxonomy editor
Frame editor
Relationships editor
Spreadsheet editor
Results
Ocelot in use in the EcoCyc project for 5 years
Supports collaborative development of EcoCyc by four groups in North America
Distributed architecture
GKB Editor in active use
Supports development of 8 Pathway/Genome Databases
Summary
Pathway/Genome Databases
Pathway Tools software
  Extract pathways from genomes
  Distributed curation tools
  Query, visualization, WWW publishing
  Analysis algorithms
Computer Science Results
Extend scalability and multiuser access for knowledge representation systems
Reusable, schema-driven KB editor
Hierarchical graph layout algorithms
Dynamic translation from X-windows to HTML+GIF
Importance of ontologies and of content:
  Discovery = Algorithm + Database
Problem Solving Depends on Algorithms and Content
[Figure labels: Database Size and Quality, Solution Quality, Algorithm Quality, Compute Time]
Bioinformatics Results: Content
The EcoCyc database describes the full metabolic map of an organism
The MetaCyc database describes over 300 metabolic pathways
Ontology spans genome to pathway information
Bioinformatics Results: Algorithms
Software environment for genome and pathway information
  Query and visualization
  Distributed database development
PathoLogic algorithm predicts the metabolic network of an organism from its genome
Algorithms under development for qualitative modeling of the cell
Acknowledgements
Funding sources:
  NIH National Center for Research Resources
Collaborators:
  Monica Riley, Marine Biological Laboratory
  Milton Saier, UC San Diego
  Julio Collado, UNAM
  Christos Ouzounis, European Bioinformatics Institute
Peter D. Karp, Ph.D.
http://www.ai.sri.com/pkarp/