Editing Pathway/Genome Databases
Compounds, Reactions and Pathways
Ron Caspi
SRI International Bioinformatics
Activate Editing Mode
Type (enable/disable-editors t) in the listener pane
Why Curation is Important!
Curators need jobs
“In silico” information is less solid than experimental evidence
Database curation greatly enhances the usefulness of the data
Pathway Tools Paradigms
Separate database from user interface
Navigator provides one interface to the DB
Editors provide an alternative interface to the DB
• Reuse information whenever possible!
• A PGDB should not describe the same biological or chemical entity more than once
• A tool helps to prevent creation of duplicate reactions
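The "describe each entity only once" rule can be illustrated with a small sketch. It is not how Pathway Tools detects duplicates; it only assumes that two reactions count as the same when they have the same compounds on each side, regardless of order or direction. The class name `ReactionRegistry` and the frame IDs are hypothetical, and stoichiometric coefficients are ignored.

```python
def reaction_key(left, right):
    """Build a key that ignores compound order and reaction direction."""
    # A frozenset of the two sides makes 'A = B' and 'B = A' compare equal.
    return frozenset([frozenset(left), frozenset(right)])


class ReactionRegistry:
    """Keeps one frame ID per distinct reaction (hypothetical helper)."""

    def __init__(self):
        self._by_key = {}

    def add(self, frame_id, left, right):
        """Register a reaction; return the ID of the frame that represents it."""
        key = reaction_key(left, right)
        existing = self._by_key.get(key)
        if existing is not None:
            return existing          # reuse the existing frame, no duplicate
        self._by_key[key] = frame_id
        return frame_id


registry = ReactionRegistry()
first = registry.add("RXN-A", ["ascorbate", "H2O"], ["3-keto-L-gulonate"])
again = registry.add("RXN-B", ["3-keto-L-gulonate"], ["H2O", "ascorbate"])
print(first, again)
```

In this sketch the second call returns the ID of the first reaction, because the reversed, reordered equation maps to the same key.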
Editing rules: Support Policy
Do not modify the EcoCyc or MetaCyc datasets
Do not alter the DB schema, e.g. do not add or remove classes or slots
List of Editors
• Compound Editor
• Compound Structure Editors
• Reaction Editor
• Synonym Editor
• Publication Editor
• Pathway Editor and Pathway Info Editor
• Protein/Subunit structure/Enzymatic Reaction Editors
• Gene Editor
• Intron Editor (Eukaryotes only)
• Transcription Unit Editor
• Frame Editor
• Relationships Editor
• Ontology Editor
Saving Changes
The user must save changes explicitly:
• File => Save Current DB
• Save DB button
• List Unsaved Changes in Current DB
• Revert Current DB
• Checkpoint Current DB Updates to File
• Restore Updates from Checkpoint File
Other DB commands under the File menu
• Summarize databases
• Summarize current organism
• Refresh DB list
• Refresh All Current DBs
• Delete a DB
• Attempt to Reconnect to Database Server
Invoking the Editors
1. New Object: Use the “New” command
2. Existing Object: Right-click on the object handle
Compound Editor
Create or edit a compound
• Specify Class
• Common Name and Synonyms
• Comments, citations
• Links to other DBs
The Synonym Editor
Lets you easily edit the synonyms and set the common name
Citations
• Citation boxes
• The CITS field
• File => Import Citations from PubMed
• Publication editor (invoke by right-clicking on a citation at the bottom)
• Non-PubMed citation: enter it in a citation box in the form Smith06, then invoke the editor by clicking outside the citation box
More Compound Editing
• Compound Structure Editors (Marvin, JME)
• Mol files
• Exporting to other DBs
• Merging
• Duplicate Frame and Edit
Reaction Editor
• Enter or edit a reaction equation
• EC number (official?)
• Check for balance
• Compound Resolver
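To make "check for balance" concrete, here is a minimal sketch of an elemental balance check. It is not the Reaction Editor's own algorithm: it assumes simple formulas without parentheses, ignores charge, and takes the neutral-form formulas below as illustrative inputs. The function names are hypothetical.

```python
import re
from collections import Counter


def atom_counts(formula):
    """Count atoms in a simple formula such as 'C6H8O6' (no parentheses, no charge)."""
    counts = Counter()
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(number) if number else 1
    return counts


def side_counts(formulas):
    """Sum the atom counts of all compounds on one side of a reaction."""
    total = Counter()
    for formula in formulas:
        total += atom_counts(formula)
    return total


def is_balanced(left, right):
    """True when both sides contain the same number of each element."""
    return side_counts(left) == side_counts(right)


# ascorbate + H2O = 3-keto-L-gulonate, written with neutral-form formulas
print(is_balanced(["C6H8O6", "H2O"], ["C6H10O7"]))
# The same equation with the water left out no longer balances
print(is_balanced(["C6H8O6"], ["C6H10O7"]))
```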
Pathway Info Editor
• Class (variant class)
• Common Name
• Synonyms
• Evidence code
• Citations (CIT)
• Comments
• External Links
• Hypothetical reactions
• Author credits
Pathway Editor
Graphically create and modify pathways
Two tools:
• Connections Editor: add reactions one by one
• Segment Editor: enter a linear pathway segment
Connections Editor Operations
Two main display panes:
• left: unconnected pathway reactions
• right: draws connected reactions (looks like the regular Pathway display window)
Connecting reactions:
• select initial reaction (in either pane) ===> red and green reactions
• select a green reaction
Useful Commands:
• choose main compounds for reaction
• disconnect all reactions
In circular pathways, specify which compound should be at the top
Add links to other pathways, reactions, or comments
Pathway Segment Editor
Used to enter a linear sequence of reactions simultaneously
Reactions are specified by EC numbers or reaction substrates
One segment may contain up to 7 reactions
Pathway Editor Limitations
Complex situations can cause ambiguity:
• link may be ignored
• dialog box for disambiguating
• pathway drawn in bizarre arrangement
Fix: try removing the offending link and adding links in a different order
Pathway Editor does not handle polymerization pathways easily.
Evidence Codes for Pathways
http://brg.ai.sri.com/ptools/evidence-ontology.html
• EV-COMP: Inferred from computation
   • HINF - Human inference
   • AINF - Artificial inference
• EV-AS: Author statement
   • TAS - traceable
   • NAS - non-traceable
• EV-IC: Inferred by curator
• EV-EXP: Inferred from experiment
   • IDA - inferred from direct assay
   • IPI - inferred from physical interaction
   • TAS - inferred from traceable data (review)
   • IEP - inferred from expression pattern
   • IGI - inferred from genetic interaction
   • IMP - inferred from mutant phenotype
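The two-level structure of the evidence codes can be written down as a small lookup table. This is only a sketch: it assumes that a full code is formed by joining the top-level code and the sub-code with a hyphen (for example EV-EXP-IDA), and the `describe` helper is hypothetical.

```python
# Top-level code -> (label, {sub-code: label}), copied from the list above
EVIDENCE = {
    "EV-COMP": ("Inferred from computation",
                {"HINF": "Human inference", "AINF": "Artificial inference"}),
    "EV-AS": ("Author statement",
              {"TAS": "traceable", "NAS": "non-traceable"}),
    "EV-IC": ("Inferred by curator", {}),
    "EV-EXP": ("Inferred from experiment",
               {"IDA": "inferred from direct assay",
                "IPI": "inferred from physical interaction",
                "TAS": "inferred from traceable data (review)",
                "IEP": "inferred from expression pattern",
                "IGI": "inferred from genetic interaction",
                "IMP": "inferred from mutant phenotype"}),
}


def describe(code):
    """Describe a code such as 'EV-IC' or 'EV-EXP-IDA' (assumed joined form)."""
    for top, (label, subcodes) in EVIDENCE.items():
        if code == top:
            return label
        if code.startswith(top + "-"):
            sub = code[len(top) + 1:]
            if sub in subcodes:
                return f"{label}: {subcodes[sub]}"
    raise KeyError(code)


print(describe("EV-EXP-IDA"))
print(describe("EV-IC"))
```

Keeping the sub-codes under their parent avoids the ambiguity of TAS, which appears under both EV-AS and EV-EXP with different meanings.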
Enzyme/Protein Editors
To add an enzyme to a reaction:
• Right-click the reaction, choose Edit => Create/Add enzyme.
• “Choose Protein”: specify ID, or “Search by genes or create new protein” => Protein subunit structure editor
• Protein Editor
Check the Curator Guide at
http://bioinformatics.ai.sri.com/ptools/curatorsguide.pdf
Protein Editor
Enzymatic Reaction Editor
Protein Subunit Editor
Super Pathways
Need to keep pathways within well-defined end points
Link pathways to upstream or downstream pathways with pathway links.
Create more complex metabolic networks using superpathways
Example: superpathway of aromatic compound degradation (aerobic) is composed of:
• catechol degradation II
• mandelate degradation I
• benzoate degradation (aerobic)
• β-ketoadipate degradation
• protocatechuate degradation II
• shikimate degradation
• quinate degradation
• 4-hydroxymandelate degradation
• tryptophan degradation I
Pathway Export
Export
• Edit => Add Pathway to File Export List
• File => Export => Selected Pathways to File
Import
File => Import => Pathways from File
Creating Links to External Databases
Creating links from a PGDB to external databases
To define a new external database:
• Tools => Ontology Browser
• View => Browse from new root / type Databases
• Highlight Databases
• Frame => Create => Instance
• Enter frame name, frame edit
• Enter Common Name, Static-Search-URL (e.g. http://gene.pharma.com/dbquery?)
• Enter a value for Search-Object-Class (e.g. Proteins)
Creating links to a PGDB: see http://biocyc.org/linking.shtml
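As a rough illustration of what a Static-Search-URL is for, the sketch below builds a link by appending an object's external ID to the URL. That appending behavior is an assumption made for this example, not something stated on the slide; the function `external_link` and the ID `P12345` are hypothetical.

```python
from urllib.parse import quote


def external_link(static_search_url, object_id):
    """Append a URL-escaped external ID to a static search URL (assumed behavior)."""
    return static_search_url + quote(object_id, safe="")


# The example URL from the slide, with a made-up object ID
print(external_link("http://gene.pharma.com/dbquery?", "P12345"))
```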
Constraint Checking
General rules that constrain the valid relationships among instances
Constraints are checked when new facts are asserted to ensure that the DB remains logically consistent
Constraints on slots:
• Domain violation: checks to make sure the slots are in instances of the appropriate class
• Range violation:
   • value type
   • value cardinality
• Inverse
• Cardinality
• Lisp-predicate
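The domain and range checks can be sketched as follows. This is a simplified illustration, not the Pathway Tools constraint checker: the class hierarchy, the `SlotConstraint` type, and the example slot are all hypothetical, and only the domain, value-type, and cardinality checks are shown.

```python
from dataclasses import dataclass


@dataclass
class SlotConstraint:
    domain: str           # class whose instances may carry the slot
    value_type: type      # required Python type of each value
    max_cardinality: int  # maximum number of values allowed


# Hypothetical class hierarchy: child class -> parent class
PARENTS = {"Reactions": "Frames", "Compounds": "Frames", "Frames": None}


def is_a(cls, ancestor):
    """True if cls equals ancestor or inherits from it."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = PARENTS.get(cls)
    return False


def check_slot(constraint, instance_class, values):
    """Return a list of violation messages; empty when the values are valid."""
    violations = []
    if not is_a(instance_class, constraint.domain):
        violations.append("domain violation")
    if any(not isinstance(v, constraint.value_type) for v in values):
        violations.append("range violation: value type")
    if len(values) > constraint.max_cardinality:
        violations.append("range violation: cardinality")
    return violations


# A made-up slot: allowed on Reactions, string-valued, at most one value
example_slot = SlotConstraint(domain="Reactions", value_type=str, max_cardinality=1)
print(check_slot(example_slot, "Reactions", ["some value"]))
print(check_slot(example_slot, "Compounds", ["some value", 42]))
```

Running the check at assertion time, before a value is stored, is what keeps an invalid fact from ever entering the DB.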
Consistency Checking
(correctify-kb)
• Removes newlines from names
• Converts “<” to “|” in string citations
• Checks isozyme sequence similarity
• Fixes references between polypeptides and genes
• Changes compound names to ids in a variety of slots
• Matches physiological regulators to other regulators
• Cross-references compounds to reactions
• Checks pathways predecessors/reactions/subs
• Checks reaction balancing
• Checks compound structures
• Calculates sub- and super-pathways
• Finds missing sub-pathways links
• Verifies chromosome components and positions
Update your computers!
To install a patch:
Tools => Instant Patch => Download and Activate All Patches
Make sure that…
You perform all exercises on the Hb. pylori database, not on your own!!!
Creating New Reactions
Don’t forget to include spaces between chemical names and terms such as “+” and “=”:
1. ascorbate + H2O = 3-keto-L-gulonate
2. 3-keto-L-gulonate + ATP = 3-keto-L-gulonate-6-phosphate + ADP
3. 3-keto-L-gulonate-6-phosphate = L-xylulose-5-phosphate + CO2
4. L-xylulose-5-phosphate = L-ribulose-5-phosphate
5. L-ribulose-5-phosphate = xylulose-5-phosphate
6. xylulose-5-phosphate = D-ribulose-5-phosphate
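The spacing rule matters because compound names can themselves contain hyphens, digits, or a trailing “+”. The sketch below shows one way to split an equation on space-delimited “ + ” and “ = ” only. It is not the Reaction Editor's actual parser, and `parse_equation` is a hypothetical name.

```python
def parse_equation(equation):
    """Split 'A + B = C + D' into (left compounds, right compounds).

    Only ' + ' and ' = ' surrounded by spaces act as separators, so names such
    as '3-keto-L-gulonate-6-phosphate' or 'H+' are kept intact.
    """
    if " = " not in equation:
        raise ValueError("expected ' = ' with spaces between the two sides")
    left, right = equation.split(" = ", 1)
    left_names = [name.strip() for name in left.split(" + ")]
    right_names = [name.strip() for name in right.split(" + ")]
    return left_names, right_names


print(parse_equation("3-keto-L-gulonate + ATP = 3-keto-L-gulonate-6-phosphate + ADP"))

# Without spaces the separators cannot be told apart from the names
try:
    parse_equation("ascorbate+H2O=3-keto-L-gulonate")
except ValueError as err:
    print("rejected:", err)
```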
Fill in the reaction frame IDs in your handout
Reaction                                                        Frame ID
ascorbate + H2O = 3-keto-L-gulonate                             XXX
3-keto-L-gulonate + ATP = 3-keto-L-gulonate-6-phosphate + ADP
3-keto-L-gulonate-6-phosphate = L-xylulose-5-phosphate + CO2
L-xylulose-5-phosphate = L-ribulose-5-phosphate
L-ribulose-5-phosphate = xylulose-5-phosphate
xylulose-5-phosphate = D-ribulose-5-phosphate
Duplicate Reaction?
Frame ID of the new reaction to be created. This frame will NOT be created unless you choose “Keep”
Frame ID of the existing reaction. This reaction will NOT be transferred into your database until you click “Import”!
Record this BEFORE you click “Import”
Define a New Pathway
Define the pathway L-ascorbate degradation to D-ribulose-5-phosphate by connecting the reactions together
Assign class:
(Pathways -> Degradation/Utilization/Assimilation -> Carboxylates, Other)
Add the reactions, connect them, and add a link to the pathway non-oxidative branch of the pentose phosphate pathway (Generation of precursor metabolites and energy => Pentose phosphate pathways =>)
Add a reverse link from non-oxidative branch of the pentose phosphate pathway to the new pathway
Run (correctify-kb)
Open the database Hb. pylori (HypCyc): (so 'hyp)
Run (correctify-kb)
Analyze output