ShewCyc and BeoCyc: discovery platforms for environmental
and bioenergy research
Tatiana Karpinets, Gretta Serres, and Michael Leuze
Oak Ridge National Lab, Marine Biological Laboratory
Pathway Tools Workshop 2010
ShewCyc and Shewanella Knowledgebase
http://shewanella-knowledgebase.org:8080/Shewanella/
Biological insight
Analytical and visualization tools
Experimental data; Computational predictions
• Manual reannotation
• Localization prediction
• Regulon predictions (http://regprecise.lbl.gov/RegPrecise/)
• Capture information from the literature, gene expression data, and proteomics
Shewanella oneidensis Metabolic Pathway Viewer
developed by Erich Baker, Baylor University, TX
http://watson.ecs.baylor.edu/4360/
Improved Individual Genome Editors
Multi-Genome Annotation Solution:
Ortholog Editor in Combination with Genome Editors
Manual Curation
Ortholog Table Tools
Edit View
Evidence View
Alignments View
Consistency Check View
Search
Sort
Download
Table Overview
MUSCLE (3.6) multiple sequence alignment
Sfri_3956 MKIRVLISLATAFFMLNTSSAFAKDPADTAVQPLLVKPKVIIFDVNETLLDLENMRASVG
Swoo_1992 ---------------------MTLELRDTSIIKDF--PKAVIFDTDNTLYPYHYSHQQAS
:: : **:: : **.:***.::** . ...
Sfri_3956 KALNGREDLLPLWFSTMLHHSLVVSATGDYQTFGSIGVA---------SLQMVAEINGIA
Swoo_1992 LAVQQKAEKILGIKQSRFSDALKISKREIKERLGETASSHSRLLYFQRTIELLGLKTQIM
*:: . : : .: : :* :* : :*. . : :::::. . *
Sfri_3956 ITPEQAKTAILTPLRSLPAHPDVAEGLAKLKAQGYKLVTLTNSSLEGVTLQLKNANLSQY
Swoo_1992 TTLDLEQTYWRTFLTNSQLFPEMHEFLHDLRAHGIQSAVVTDLTAQIQFRKLVYFGLHEA
* : :* * * . .*:: * * .*.*:* : ..:*: : : :* .* :
Sfri_3956 FDANLSIESVGVFKPHLKTYQWAIKDLGVNADEAL-MVAAHGW-DIAGADKAGLQTAFIR
Swoo_1992 FDYIVTTEEAGADKPNPLPFQLARSKLGLEKGDNLWMIGDHPVKDIQGAKKT-LGAITLQ
** :: *..*. **: .:* * ..**:: .: * *:. * ** **.*: * : :.
Sfri_3956 RQGKVLFPLAAQPDYNVL--DVNELASTLAKFN-----
Swoo_1992 KNHKDVKVLKGKEGPDILFDKYSELRELLGEISSNKGK
.: * : * .: . ::* . .** . *.::.
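The conservation line under each block of the MUSCLE output follows the ClustalW convention: "*" marks a fully conserved column, ":" a column of strongly similar residues, and "." a column of weakly similar residues. As a minimal sketch (not the Ortholog Editor's own code; the function names are hypothetical), the identity-only part of that line and a gap-excluded percent identity can be computed from two aligned sequences:

```python
def identity_line(aligned_a: str, aligned_b: str) -> str:
    """Mark identical, ungapped columns with '*' (ClustalW style); other columns get a space."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return "".join(
        "*" if a == b and a != "-" else " " for a, b in zip(aligned_a, aligned_b)
    )


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    # Skip gap columns so that indels are not counted as mismatches.
    columns = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not columns:
        return 0.0
    identical = sum(1 for a, b in columns if a == b)
    return 100.0 * identical / len(columns)


if __name__ == "__main__":
    # Toy aligned pair (not taken from the alignment above).
    print(identity_line("MKV-LA", "MRVQLA"))              # "* * **"
    print(f"{percent_identity('MKV-LA', 'MRVQLA'):.1f}")  # 80.0
```

Excluding gap columns is only one convention; other tools divide by the full alignment length or by the shorter sequence, which gives lower values for gapped pairs.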
Alignment View
Original annotations
Group annotation
Consistency Check View
Protein length consistency
Domain consistency
Automatic identification of bad grouping
Protein Length Consistency; Domain Consistency
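The slides do not say how the Consistency Check View scores protein length. One simple way to flag a suspicious ortholog group is sketched below, under the assumption that a member is an outlier when its length differs from the group median by more than a fixed fraction; the 20% default and the function names are hypothetical.

```python
from statistics import median


def length_outliers(lengths: dict[str, int], tolerance: float = 0.2) -> list[str]:
    """Return members whose length differs from the group median by more than
    `tolerance` (a fraction of the median). The 0.2 default is a hypothetical cutoff."""
    if not lengths:
        return []
    mid = median(lengths.values())
    return sorted(tag for tag, n in lengths.items() if abs(n - mid) > tolerance * mid)


def is_length_consistent(lengths: dict[str, int], tolerance: float = 0.2) -> bool:
    """A group passes the check when no member is a length outlier."""
    return not length_outliers(lengths, tolerance)


if __name__ == "__main__":
    # Hypothetical ortholog group in which one member looks truncated.
    group = {"geneA": 298, "geneB": 305, "geneC": 301, "geneD": 142}
    print(length_outliers(group))       # ['geneD']
    print(is_length_consistent(group))  # False
```

The median is used rather than the mean so that a single truncated or fused gene model does not shift the reference length for the whole group.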
Probing Intergenic Regions (IGs) in S. oneidensis using microarrays
Experimental data (Many Microbe Microarrays Database): CRP mutant vs. wild type MR1 at various time points during the transition from aerobic growth with lactate to anaerobic growth with lactate and fumarate.
An Affymetrix microarray was designed to probe transcripts derived from both genes and IGs.
Examples: IG SO0016_SO0022; IG SO0017-SO0015
A regulatory effect of IG transcription (figure: upstream gene, IG, downstream gene)
Subset I: IG regions whose direction of change in expression is the same as that of their neighboring genes (1466)
Subset A: IG regions whose direction of change in expression is opposite to that of the upstream gene (805)
Subset B: IG regions whose direction of change in expression is opposite to that of the downstream gene (820)
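The three subsets above amount to a sign comparison between the log2 fold change of an IG transcript and those of its flanking genes. The sketch below is an illustration only: it assumes fold changes have already been filtered for significance, treats a zero change as having no direction, and lets an IG fall into both A and B, since the slides do not say whether those subsets overlap. The function names are hypothetical.

```python
from collections import Counter


def direction(log2_fc: float) -> int:
    """+1 for up, -1 for down, 0 for no change."""
    return (log2_fc > 0) - (log2_fc < 0)


def ig_subsets(ig: float, upstream: float, downstream: float) -> list[str]:
    """Assign an IG region to subsets I, A and/or B from log2 fold changes
    (mutant vs. wild type) of the IG transcript and its flanking genes."""
    d_ig, d_up, d_down = direction(ig), direction(upstream), direction(downstream)
    if d_ig == 0:
        return []  # the IG itself does not change, so no subset applies
    subsets = []
    if d_ig == d_up == d_down:
        subsets.append("I")  # same direction as both neighbors
    if d_up == -d_ig:
        subsets.append("A")  # opposite to the upstream gene
    if d_down == -d_ig:
        subsets.append("B")  # opposite to the downstream gene
    return subsets


if __name__ == "__main__":
    # Hypothetical (IG, upstream, downstream) log2 fold changes.
    rows = [(1.2, 0.8, 2.0), (1.2, -0.8, 2.0), (-1.0, 0.5, 0.7)]
    counts = Counter(s for row in rows for s in ig_subsets(*row))
    print(dict(sorted(counts.items())))  # {'A': 2, 'B': 1, 'I': 1}
```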
Revealing a biological role of intergenic region transcription using Pathway Tools
IG (SO2490_SO2491)
SO2490 (HexR)
SO2491 (PykA)
Enzymes of the Entner–Doudoroff (ED) pathway
BioEnergy Science Center (BESC)
Cellulosic biomass → breakdown into sugars → sugar fermentation → fuel(s)
(1) designing plant cell walls for rapid deconstruction and
(2) engineering microbes for converting plants into biofuel in a single step (consolidated bioprocessing)
BESC’s approach:
Analysis framework
BESC knowledgebase: a discovery platform for bioenergy research
Private Public portal
Computational predictions:
• Orthologs/inparalogs
• Protein domains
• Protein localization
• Metabolic enzymes and pathways
• Carbohydrate-active enzymes
• and more
Manually curated (NREL, UGA) pathway/genome database for Populus trichocarpa
Microbial Phenotypes comparison toolkit
Integrating Experimental Data from LIMS and external resources:
Genome comparison, analysis, and visualization tools: genome browsers, comparative chromosome maps (CMAP), metabolic maps, Omics Viewers, and more
Metabolic reconstructions for BESC-relevant microorganisms (BeoCyc)
Usage Summary
http://besckb.ornl.gov
CAZymes Analysis Toolkit (CAT)
A novel approach based on association analysis to discover links between CAZy families and Pfam domains:
• Assign CAZy families to a sequence with high specificity and sensitivity
• Find conserved associations between CAZy families and Pfam domains
• Find CAZymes among hypothetical proteins
• Suggest novel CAZy families
• Assign carbohydrate activity to unknown protein domains
Web site: http://cricket.ornl.gov/cgi-bin/cat.cgi
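The slides describe CAT's links between CAZy families and Pfam domains only as "association analysis". A minimal sketch of that idea follows, assuming support (the number of proteins carrying both annotations) and confidence (the fraction of proteins in a CAZy family that also carry the Pfam domain) as the statistics; CAT's actual scoring may differ, the thresholds are hypothetical, and the annotations in the usage example are toy data.

```python
from collections import Counter
from itertools import product


def cazy_pfam_associations(proteins, min_support=2, min_confidence=0.5):
    """Score CAZy family -> Pfam domain associations.

    proteins:   iterable of (cazy_families, pfam_domains) pairs, one per protein.
    support:    number of proteins annotated with both the family and the domain.
    confidence: support / number of proteins annotated with the family.
    """
    family_counts = Counter()
    pair_counts = Counter()
    for families, domains in proteins:
        families, domains = set(families), set(domains)
        family_counts.update(families)
        pair_counts.update(product(families, domains))
    rules = {}
    for (family, domain), support in pair_counts.items():
        confidence = support / family_counts[family]
        if support >= min_support and confidence >= min_confidence:
            rules[(family, domain)] = (support, confidence)
    return rules


if __name__ == "__main__":
    # Toy annotations; the accessions serve only as example labels.
    proteins = [
        ({"GH5"}, {"PF00150"}),
        ({"GH5", "CBM2"}, {"PF00150", "PF00553"}),
        ({"GH5"}, {"PF00150"}),
        ({"CBM2"}, {"PF00553"}),
    ]
    for (family, domain), (support, conf) in sorted(cazy_pfam_associations(proteins).items()):
        print(f"{family} -> {domain}: support={support}, confidence={conf:.2f}")
```

Once such associations are found, a conserved Pfam domain in a hypothetical protein can point to a candidate CAZy family, which is the use case listed in the bullets above.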
Private BeoCyc hosts a P. trichocarpa PGDB manually curated by the NREL team:
1. GDP-mannose biosynthesis II
2. GDP-L-fucose biosynthesis I (from GDP-D-mannose)
3. GDP-L-fucose biosynthesis II (from L-fucose)
4. UDP-D-galactose biosynthesis
5. UDP-D-galacturonate biosynthesis I (from UDP-D-glucuronate)
6. UDP-D-galacturonate biosynthesis II (from D-galacturonate)
7. UDP-D-glucose biosynthesis (from sucrose)
8. UDP-D-glucuronate biosynthesis (from myo-inositol)
9. UDP-D-xylose biosynthesis (compartmentalized)
10. UDP-L-arabinose biosynthesis I (from UDP-xylose) in endoplasmic reticulum
11. UDP-L-arabinose biosynthesis I (from UDP-xylose) in cytosol
12. UDP-L-arabinose biosynthesis I (from UDP-xylose) in Golgi lumen
13. UDP-L-arabinose biosynthesis II (from L-arabinose) in cytosol
BeoCyc and BESC knowledgebase: http://bobcat.ornl.gov/besc/index.jsp
Figure panels: Arabidopsis and Populus (sequences, genes, EC numbers); ortholog search; BLAST; KEGG (Kyoto Encyclopedia of Genes and Genomes): genomic and molecular information
Improving Populus trichocarpa genome annotation
Poor annotation of the poplar genome (gene models and predicted enzymes)
Poor representation of the cell wall biosynthesis and related pathways in the reference databases (MetaCyc, KEGG, and PlantCyc)
RESD & PESD
Future!!!
Enzyme information: KEGG, CAZy
RefSeq files from the NCBI
Input files for PathoLogic
Pathway/genome databases (PGDBs)
Refine the PGDBs
Create MySQL tables
Supplement databases with additional annotations
Compare phenotypes of the organisms in terms of their genomic and metabolic characteristics
Integration of the metabolic reconstructions into the BESC knowledgebase
Challenge: automatic PGDB generation for draft genomes using one table of ORF predictions and FASTA contigs
>C0
ATAAAGACGAAAAGCACCGGATCGAACACCGCCACTTCGAAAACTTCGAACGTCTACGG …
>C1r
AGTGCGGCTAGGCCGTCGATGGAGCTAGGCCGTCGA …
>C3r
GACGAAAAGCAGACGAAAAGCAGACGAAAAGCAGCT …
…
Replicon | Locus | Start | Stop | Product | EC1 | EC2 | EC3 | EC4 | EC5
C0 | or4062 | 1176 | 709 | Polyketide cyclase/dehydrase
C0 | or4063 | 4667 | 1206 | L-threonine ammonia-lyase (2-oxobutanoate-forming) | 4.3.1.19
C0 | or4064 | 5611 | 4682 | glutaryl-CoA dehydrogenase | 1.3.99.7
C0 | or4065 | 6384 | 5608 | hypothetical protein
C0 | or4066 | 7869 | 7120 | naphthoate synthase | 4.1.3.36
C0 | or4067 | 8597 | 7887 | GntR domain protein
C0 | or4068 | 8852 | 8643 | adenosylcobinamide-phosphate guanylyltransferase | 2.7.7.62
C0 | or4069 | 9812 | 8973 | hypothetical protein
C1r | or2287 | 343 | 1230 | protein of unknown function DUF6 transmembrane
C1r | or2288 | 1398 | 1679 | putative lipoprotein
C1r | or2289 | 2852 | 1854 | Alcohol dehydrogenase GroES domain protein
C1r | or2290 | 2985 | 3881 | LysR substrate-binding
C1r | or2291 | 5705 | 3897 | peptidase M24
C1r | or2292 | 6933 | 5764 | Cysteine desulfurase
C1r | or2293 | 7743 | 7129 | Lysine exporter protein (LYSE/YGGA)
C1r | or2294 | 8110 | 7841 | 3-hydroxybutyryl-CoA epimerase | 5.1.2.3 | 1.1.1.35
C3r | or2604 | 401 | 222 | phage tail sheath protein FI
C3r | or2605 | 1499 | 903 | phosphonate metabolism protein/1,5-bisphosphokinase (PRPP-forming) PhnN
C3r | or2606 | 1870 | 2196 | Arc-like DNA binding
C3r | or2607 | 3807 | 2365 | glutaryl-CoA dehydrogenase | 1.3.99.7
C3r | or2608 | 4128 | 4640 | hypothetical protein
C3r | …
C3r | …
C4r | …
!!! Automatically predict transcription units (TUs), protein complexes, and transporters for each contig
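One way to approach the challenge above is to split the single ORF table by replicon and write one PathoLogic input record per ORF. The sketch assumes the attribute-value PathoLogic format (tab-separated ID, NAME, STARTBASE, ENDBASE, FUNCTION, PRODUCT-TYPE and EC lines, each record ending in "//") as described in the Pathway Tools User's Guide; check the guide for the exact fields your version expects. The function name is hypothetical, and writing the files and the accompanying genetic-element descriptions is left out.

```python
from collections import defaultdict


def orf_table_to_pathologic(rows):
    """Group ORF rows by replicon and render one PathoLogic (.pf) text per replicon.

    rows: iterable of (replicon, locus, start, stop, product, ec_numbers).
    Start > stop is passed through unchanged; the table above appears to use it
    for ORFs on the reverse strand (an assumption).
    """
    records = defaultdict(list)
    for replicon, locus, start, stop, product, ec_numbers in rows:
        lines = [
            f"ID\t{locus}",
            f"NAME\t{locus}",
            f"STARTBASE\t{start}",
            f"ENDBASE\t{stop}",
            f"FUNCTION\t{product}",
            "PRODUCT-TYPE\tP",  # protein-coding gene
        ]
        lines.extend(f"EC\t{ec}" for ec in ec_numbers)  # zero or more EC numbers
        lines.append("//")  # end of record
        records[replicon].append("\n".join(lines))
    return {replicon: "\n".join(recs) + "\n" for replicon, recs in records.items()}


if __name__ == "__main__":
    # Three rows from the ORF table above.
    rows = [
        ("C0", "or4064", 5611, 4682, "glutaryl-CoA dehydrogenase", ["1.3.99.7"]),
        ("C0", "or4065", 6384, 5608, "hypothetical protein", []),
        ("C1r", "or2294", 8110, 7841, "3-hydroxybutyryl-CoA epimerase", ["5.1.2.3", "1.1.1.35"]),
    ]
    for replicon, text in orf_table_to_pathologic(rows).items():
        # In a real pipeline each text would be written to its own .pf file.
        print(f"== {replicon}.pf ==")
        print(text)
```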
Involvement of Single-Genotype Consortia in Degradation of Aromatic Compounds by
Rhodopseudomonas palustris
Benzoate
p-Coumarate
- anoxygenic photosynthesis
- aerobic or anaerobic respiration and fermentation
- fixation of nitrogen gas
- utilization of carbon through CO2 reduction using H2 as an electron donor
Average log2 ratio of the expression of nitrogenases with
different cofactors in the growth on p-coumarate and
benzoate versus succinate
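The averages in this figure can be read as the mean, over the genes of each nitrogenase, of log2(expression on the aromatic substrate / expression on succinate). A minimal sketch follows, assuming normalized expression values per gene; the gene groupings and numbers are hypothetical, not data from the study. Averaging log2 ratios equals the log2 of the geometric mean ratio, which is not the same as the log2 of the arithmetic mean ratio.

```python
import math
from statistics import mean


def average_log2_ratio(genes, condition, reference):
    """Mean over `genes` of log2(condition / reference) expression."""
    return mean(math.log2(condition[g] / reference[g]) for g in genes)


if __name__ == "__main__":
    # Hypothetical normalized expression values.
    succinate = {"nifH": 2.0, "nifD": 2.0, "nifK": 2.0, "vnfH": 4.0, "vnfD": 4.0}
    coumarate = {"nifH": 8.0, "nifD": 4.0, "nifK": 2.0, "vnfH": 2.0, "vnfD": 4.0}
    nitrogenases = {"Mo (nif)": ["nifH", "nifD", "nifK"], "V (vnf)": ["vnfH", "vnfD"]}
    for name, genes in nitrogenases.items():
        value = average_log2_ratio(genes, coumarate, succinate)
        print(f"{name}: {value:+.2f}")  # Mo (nif): +1.00, then V (vnf): -0.50
```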
• Transporters
• Chemotaxis operons
• Curli formation operon
Expression of R. palustris phenotypes under p-coumarate (black columns) and benzoate (white columns) degrading conditions compared with growth on succinate.
Structures of R. palustris consortia mediating anaerobic growth on p-coumarate (A) and on benzoate (B)
Putative electron donor and electron acceptor reactions under different modes of Rhodopseudomonas palustris growth
Changes in total nitrogen, ammonium, and dissolved nitrogen gas during benzoate degradation as a function of OD660
Acknowledgements
ShewCyc and Shewanella Knowledgebase
PNNL: Margaret Romine
Marine Biological Laboratory: Margrethe Serres
ORNL: Denise Schmoyer, Guruprasad Kora, Mustafa Syed, Erich Baker, Hoony Park, Nagiza Samatova, and Edward Uberbacher
BeoCyc and BESC Knowledgebase
NREL: Ambarish Nag, Christopher Chang
UGA: Maor Bar-Peled
ORNL: Mustafa Syed, Hoony Park, Morrey Parang, Denise Schmoyer, and Edward C. Uberbacher