Post on 18-Dec-2015
The Pathway Tools Schema
SRI International Bioinformatics
Motivations for Understanding Schema
Pathway Tools visualizations and analyses depend upon the software being able to find precise information in precise places within a Pathway/Genome DB
When writing complex queries to PGDBs, those queries must name classes and slots within the schema
A Pathway/Genome Database is a web of interconnected objects; each object represents a biological entity
Reference
Pathway Tools User’s Guide, Volume I, Appendix A: Guide to the Pathway Tools Schema
Web of Relationships for One Enzyme
Sdh-flavo Sdh-Fe-S Sdh-membrane-1 Sdh-membrane-2
sdhA sdhB sdhC sdhD
Succinate + FAD = fumarate + FADH2
Enzymatic-reaction
Succinate dehydrogenase
TCA Cycle
Frame Data Model
Frame Data Model -- organizational structure for a PGDB
Knowledge base (KB, Database, DB)
Frames
Slots
Facets
Annotations
Knowledge Base
Collection of frames and their associated slots, values, facets, and annotations
AKA: Database, PGDB
Can be stored within: an Oracle or MySQL DB; a disk file; the Pathway Tools binary program
Frames
Entities with which facts are associated
Kinds of frames:
Classes: Genes, Pathways, Biosynthetic Pathways
Instances (objects): trpA, TCA cycle
Classes have: Superclass(es), Subclass(es), Instance(s)
A symbolic frame name (id, key) uniquely identifies each frame
Slots
Encode attributes/properties of a frame: integer, real number, string
Represent relationships between frames: the value of a slot is the identifier of another frame
Every slot is described by a “slot frame” in a KB that defines meta information about that slot
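As a rough illustration of the frame and slot ideas above, the following Python sketch models a tiny knowledge base as a dictionary of frames keyed by frame ID. The `Frame` class, `add_frame`, `follow`, and the `kb` dictionary are hypothetical stand-ins and not the Pathway Tools API; the frame IDs are borrowed from the examples later in this deck, and the Common-Name value is a placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    """A frame: a uniquely named entity with slots (hypothetical model)."""
    frame_id: str                               # symbolic frame name, unique in the KB
    parent_class: str                           # class this instance belongs to
    slots: dict = field(default_factory=dict)   # slot name -> list of values


# The knowledge base maps each frame ID to its frame.
kb = {}


def add_frame(frame):
    """Store a frame, refusing IDs that are already in use."""
    if frame.frame_id in kb:
        raise ValueError(f"duplicate frame ID: {frame.frame_id}")
    kb[frame.frame_id] = frame


add_frame(Frame("AG10045", "Genes", {
    "Common-Name": ["placeholder gene name"],   # attribute slot: a plain string
    "Product": ["AG10045-MONOMER"],             # relationship slot: a frame ID
}))
add_frame(Frame("AG10045-MONOMER", "Polypeptides"))


def follow(frame_id, slot):
    """Resolve a relationship slot to the frames its values name."""
    return [kb[value] for value in kb[frame_id].slots.get(slot, [])]


product = follow("AG10045", "Product")[0]
print(product.frame_id, product.parent_class)
```

The point of the sketch is that a relationship is stored as a frame ID, so following it is just a second lookup in the same knowledge base.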
Slot Links
Sdh-flavo Sdh-Fe-S Sdh-membrane-1 Sdh-membrane-2
sdhA sdhB sdhC sdhD
Succinate + FAD = fumarate + FADH2
Enzymatic-reaction
Succinate dehydrogenase
TCA Cycle
product
component-of
catalyzes
reaction
in-pathway
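The slot links named on this slide can be read as a path from a gene to a pathway. The sketch below walks that path over a toy dictionary. It is an illustration only: the dictionary layout and `follow_path` are hypothetical, the pairing of sdhA with Sdh-flavo is assumed from the order in which the slide lists them, and the slide's labels are used in place of real frame IDs.

```python
# Toy KB: frame ID -> {slot name -> list of frame IDs}.
# Labels come from the slide; they are not real frame IDs.
kb = {
    "sdhA": {"product": ["Sdh-flavo"]},
    "Sdh-flavo": {"component-of": ["Succinate dehydrogenase"]},
    "Succinate dehydrogenase": {"catalyzes": ["Enzymatic-reaction"]},
    "Enzymatic-reaction": {"reaction": ["Succinate + FAD = fumarate + FADH2"]},
    "Succinate + FAD = fumarate + FADH2": {"in-pathway": ["TCA Cycle"]},
    "TCA Cycle": {},
}


def follow_path(kb, start, slot_path):
    """Follow a sequence of slots from one frame; any slot may be multivalued."""
    current = [start]
    for slot in slot_path:
        reached = []
        for frame_id in current:
            for value in kb[frame_id].get(slot, []):
                if value not in reached:      # keep each frame once, in order
                    reached.append(value)
        current = reached
    return current


GENE_TO_PATHWAY = ["product", "component-of", "catalyzes", "reaction", "in-pathway"]
print(follow_path(kb, "sdhA", GENE_TO_PATHWAY))  # ['TCA Cycle']
```

Because every step returns a list, the same function would also cope with a protein that catalyzes more than one reaction.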
Slots
Number of values: single-valued, or multivalued (sets, bags)
Slot values: any Lisp object (integer, real, string, symbol (frame name))
Slotunits define properties of slots: datatypes, classes, constraints
Two slots are inverses if they encode opposite relationships
Example: slot Product in class Genes and slot Gene in class Polypeptides
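One way to picture inverse slots is that writing a value into one slot also writes the reverse link into the other. The sketch below assumes this bookkeeping and is not taken from Pathway Tools; `INVERSES` and `put_value` are hypothetical names, and the frame IDs are the gene and monomer IDs used as examples elsewhere in this deck.

```python
from collections import defaultdict

# Hypothetical table of inverse slot pairs, based on the Product/Gene example.
INVERSES = {"Product": "Gene", "Gene": "Product"}

# frame ID -> slot name -> list of values
kb = defaultdict(lambda: defaultdict(list))


def put_value(frame_id, slot, value):
    """Add a slot value and, if the slot has an inverse, the reverse link too."""
    if value not in kb[frame_id][slot]:
        kb[frame_id][slot].append(value)
    inverse = INVERSES.get(slot)
    if inverse and frame_id not in kb[value][inverse]:
        kb[value][inverse].append(frame_id)


put_value("AG10045", "Product", "AG10045-MONOMER")
print(kb["AG10045-MONOMER"]["Gene"])  # ['AG10045']
```

With this arrangement a query can start from either end of the relationship and find the other.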
Representation of Function
Sdh-flavo Sdh-Fe-S Sdh-membrane-1 Sdh-membrane-2
sdhA sdhB sdhC sdhD
Succinate + FAD = fumarate + FADH2
Enzymatic-reaction
Succinate dehydrogenase
TCA Cycle
EC#
Keq
Cofactors
Inhibitors
Molecular wt
pI
Left-end-position
Monofunctional Monomer
Gene
Reaction
Enzymatic-reaction
Monomer
Pathway
Bifunctional Monomer
Gene
Reaction
Enzymatic-reaction
Monomer
Pathway
Reaction
Enzymatic-reaction
Monofunctional Multimer
Monomer Monomer Monomer Monomer
Gene Gene Gene Gene
Reaction
Enzymatic-reaction
Multimer
Pathway
Pathway and Substrates
Reactant-1
Reaction
Pathway
Reaction Reaction Reaction
Reactant-2
Product-2
Product-1
in-pathway
left
right
Transcriptional Regulation
site001
pro001
trpE
trpD
trpC
trpB
trpA
trpL
Int003 RpoSig70
TrpR*trp Int001
trpLEDCBA
trp
apoTrpR Int005
Principal Classes
Class names are capitalized, plural, separated by dashes
Genetic-Elements, with subclasses: Chromosomes, Plasmids
Genes
Transcription-Units
RNAs, with subclasses: rRNAs, snRNAs, tRNAs, Charged-tRNAs
Proteins, with subclasses: Polypeptides, Protein-Complexes
Principal Classes
Reactions, with subclasses: Transport-Reactions
Enzymatic-Reactions
Pathways
Compounds-And-Elements
Frame IDs of Instances
Instance frame ID conventions have evolved over time
Examples:
Pathways: TRPSYN-PWY, P23-PWY
Genes: AG10045
Monomers: TRPA-MONOMER, AG10045-MONOMER
Slots in Multiple Classes
Common-Name
Synonyms
Names (computed as union of Common-Name, Synonyms)
Comment
Citations
DB-Links
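The computed Names slot can be pictured as a function over the two stored slots. The sketch below is a minimal illustration, assuming a frame is a plain dictionary of slot name to value list; the function `names` is hypothetical, and the synonym shown is an illustrative value, not data read from a PGDB.

```python
def names(frame_slots):
    """Return Common-Name values followed by Synonyms, without duplicates."""
    seen = []
    for slot in ("Common-Name", "Synonyms"):
        for value in frame_slots.get(slot, []):
            if value not in seen:
                seen.append(value)
    return seen


# Illustrative slot values for a pathway frame.
pathway = {
    "Common-Name": ["TCA Cycle"],
    "Synonyms": ["citric acid cycle", "TCA Cycle"],
}
print(names(pathway))  # ['TCA Cycle', 'citric acid cycle']
```

Searching Names rather than Common-Name alone lets a query match an object by any of its names.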
Genes Slots
Component-Of (links to replicon, transcription unit)
Left-End-Position
Right-End-Position
Centisome-Position
Transcription-Direction
Product
Proteins Slots
Molecular-Weight-Seq
Molecular-Weight-Exp
pI
Locations
Modified-Form
Unmodified-Form
Component-Of
Polypeptides Slots
Gene
Protein-Complexes Slots
Components
Reactions Slots
EC-Number
Left, Right
Substrates (computed as union of Left, Right)
DeltaG0
Keq
Spontaneous?
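For a reaction, the relationship between Left, Right, and the computed Substrates slot can be sketched with the succinate dehydrogenase reaction shown earlier (Succinate + FAD = fumarate + FADH2). The `Reaction` class below is a hypothetical model, not the Pathway Tools representation, and compound names stand in for compound frame IDs.

```python
from dataclasses import dataclass, field


@dataclass
class Reaction:
    """Hypothetical reaction frame holding only the Left and Right slots."""
    left: list = field(default_factory=list)    # compounds on the left side
    right: list = field(default_factory=list)   # compounds on the right side

    @property
    def substrates(self):
        """Computed, not stored: every compound on either side, in order."""
        return list(dict.fromkeys(self.left + self.right))


rxn = Reaction(left=["succinate", "FAD"], right=["fumarate", "FADH2"])
print(rxn.substrates)  # ['succinate', 'FAD', 'fumarate', 'FADH2']
```

Note that Substrates here covers both sides of the equation, so a query for every reaction involving a compound does not need to check Left and Right separately.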
Enzymatic-Reactions Slots
Enzyme
Reaction
Activators
Inhibitors
Physiologically-Relevant
Cofactors
Prosthetic-Groups
Alternative-Substrates
Alternative-Cofactors
Pathways Slots
Reaction-List
Predecessors
Primaries