ShewCyc and BeoCyc: discovery platforms for environmental and bioenergy research

Tatiana Karpinets, Gretta Serres, and Michael Leuze

Oak Ridge National Lab, Marine Biological Laboratory

Pathway Tools Workshop 2010



ShewCyc and Shewanella Knowledgebase

http://shewanella-knowledgebase.org:8080/Shewanella/

Biological insight

Analytical and visualization tools

Experimental data

Computational predictions

Manually curated PGDB for Shewanella oneidensis MR-1

• Manual reannotation

• Localization prediction

• Regulon predictions (http://regprecise.lbl.gov/RegPrecise/)

• Capture information from literature, gene expression data, proteomics

Yang et al. J Biol Chem. 281:29872-85

Crp, Fur, Fnr, ArgR

Shewanella oneidensis Metabolic Pathway Viewer

developed by Erich Baker, Baylor University, TX

http://watson.ecs.baylor.edu/4360/

Improved Individual Genome Editors

Multi-Genome Annotation Solution: Ortholog Editor in Combination with Genome Editors

Manual Curation


Ortholog Table Tools

Edit View

Evidence View

Alignments View

Consistency Check View

Search

Sort

Download

Table Overview

Edit View

MUSCLE (3.6) multiple sequence alignment

Sfri_3956 MKIRVLISLATAFFMLNTSSAFAKDPADTAVQPLLVKPKVIIFDVNETLLDLENMRASVG

Swoo_1992 ---------------------MTLELRDTSIIKDF--PKAVIFDTDNTLYPYHYSHQQAS

:: : **:: : **.:***.::** . ...

Sfri_3956 KALNGREDLLPLWFSTMLHHSLVVSATGDYQTFGSIGVA---------SLQMVAEINGIA

Swoo_1992 LAVQQKAEKILGIKQSRFSDALKISKREIKERLGETASSHSRLLYFQRTIELLGLKTQIM

*:: . : : .: : :* :* : :*. . : :::::. . *

Sfri_3956 ITPEQAKTAILTPLRSLPAHPDVAEGLAKLKAQGYKLVTLTNSSLEGVTLQLKNANLSQY

Swoo_1992 TTLDLEQTYWRTFLTNSQLFPEMHEFLHDLRAHGIQSAVVTDLTAQIQFRKLVYFGLHEA

* : :* * * . .*:: * * .*.*:* : ..:*: : : :* .* :

Sfri_3956 FDANLSIESVGVFKPHLKTYQWAIKDLGVNADEAL-MVAAHGW-DIAGADKAGLQTAFIR

Swoo_1992 FDYIVTTEEAGADKPNPLPFQLARSKLGLEKGDNLWMIGDHPVKDIQGAKKT-LGAITLQ

** :: *..*. **: .:* * ..**:: .: * *:. * ** **.*: * : :.

Sfri_3956 RQGKVLFPLAAQPDYNVL--DVNELASTLAKFN-----

Swoo_1992 KNHKDVKVLKGKEGPDILFDKYSELRELLGEISSNKGK

.: * : * .: . ::* . .** . *.::.

Alignment View

Evidence View

Original annotations

Group annotation

Consistency Check View

Protein length consistency

Domain consistency

Automatic identification of bad grouping

Figure panels: Protein Length Consistency; Domain Consistency
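The two consistency checks named above can be sketched as follows. This is a minimal illustration, not the Ortholog Editor's implementation: the 20% length tolerance, the function names, and the locus tags and domain labels are all hypothetical.

```python
from statistics import median

def flag_length_outliers(lengths, tolerance=0.2):
    """Return locus tags whose protein length deviates from the group
    median by more than `tolerance` (a fraction of the median).
    The 20% default is an assumed threshold, not taken from the slides."""
    med = median(lengths.values())
    return sorted(
        locus for locus, length in lengths.items()
        if abs(length - med) > tolerance * med
    )

def domain_consistent(domains_by_locus):
    """True when every member of the group carries the same set of domains."""
    return len({frozenset(d) for d in domains_by_locus.values()}) <= 1

# Hypothetical ortholog group: locus tag -> protein length (amino acids).
lengths = {"gene_a": 250, "gene_b": 245, "gene_c": 255, "gene_d": 120}
outliers = flag_length_outliers(lengths)

# Hypothetical domain assignments for the same group.
domains = {"gene_a": ["domA"], "gene_b": ["domA"], "gene_d": ["domB"]}
consistent = domain_consistent(domains)
```

A group with any length outlier or mismatched domain set would be a candidate for the "bad grouping" flag and manual review.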

ShewCyc

Probing Intergenic Regions (IGs) in S. oneidensis using microarray

Experimental data (Many Microbe Microarrays Database): CRP mutant vs. wild-type MR-1 at various time points during the transition from aerobic growth with lactate to anaerobic growth with lactate and fumarate.

The Affymetrix microarray was designed to probe transcripts derived from both genes and IGs.

Examples: IG SO0016_SO0022; IG SO0017-SO0015

A regulatory effect of IG transcription

Figure labels: upstream gene, IG, downstream gene

Subset I: IG regions with the same direction of change in gene expression as their neighboring genes (1466)

Subset A: IG regions with directions of changes in gene expression that are opposite to upstream genes (805)

Subset B: IG regions with directions of changes in gene expression that are opposite to downstream genes (820)
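The subset definitions above amount to comparing the sign of the expression change in an IG region with the signs of its neighbors. A minimal sketch, under two assumptions: expression changes are given as log ratios (mutant vs. wild type), and "same direction as their neighboring genes" means both neighbors. The function names are hypothetical.

```python
def sign(x):
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)

def classify_ig(ig_lr, upstream_lr, downstream_lr):
    """Return the set of subset labels ("I", "A", "B") for one IG region.

    Arguments are log ratios for the IG region and for its upstream and
    downstream genes. A region can fall into both A and B.
    """
    labels = set()
    s_ig, s_up, s_down = sign(ig_lr), sign(upstream_lr), sign(downstream_lr)
    if s_ig == 0:
        return labels  # no change in the IG region: nothing to compare
    if s_ig == s_up and s_ig == s_down:
        labels.add("I")  # same direction as both neighbors
    if s_up != 0 and s_ig == -s_up:
        labels.add("A")  # opposite to the upstream gene
    if s_down != 0 and s_ig == -s_down:
        labels.add("B")  # opposite to the downstream gene
    return labels
```

In practice a significance or fold-change cutoff would replace the bare sign test; the slides do not state which cutoff was used.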

Revealing a biological role of intergenic region transcription using Pathway Tools

PykA, HexR

IG (SO2490_SO2491)

SO2490 (HexR)

SO2491 (PykA)

Enzymes of the Entner–Doudoroff (ED) pathway

BioEnergy research Science Center (BESC)

Cellulosic biomass → Breakdown into sugars → Sugar → Fermentation → Fuel(s)

BESC’s approach:

(1) designing plant cell walls for rapid deconstruction and

(2) engineering microbes for converting plants into biofuel in a single step (consolidated bioprocessing)

Analysis Framework

BESC knowledgebase: a discovery platform for bioenergy research

Private and public portals

Computational predictions:
• Orthologs/Inparalogs
• Protein Domains
• Protein Localization
• Metabolic enzymes and pathways
• Carbohydrate-Active Enzymes
• and more

Manually curated (NREL, UGA) pathway genome database for Populus trichocarpa

Microbial Phenotypes comparison toolkit

Integrating Experimental Data from LIMS and external resources

Genome comparison, analysis, and visualization tools:
• Genome browsers
• Comparative chromosome maps (CMAP)
• Metabolic maps
• Omic Viewers
• and more

Metabolic reconstructions for BESC relevant microorganisms (BeoCyc)

Usage Summary

http://besckb.ornl.gov

CAZymes Analysis Toolkit (CAT)

Novel approach based on association analysis to discover links between CAZy families and Pfam domains

Assign CAZy families to a sequence with high specificity and sensitivity

Find conserved associations between CAZy families and Pfam domains

Find CAZymes among hypothetical proteins

Suggest novel CAZy families

Assign carbohydrate activity to unknown protein domains

Web site: http://cricket.ornl.gov/cgi-bin/cat.cgi
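One way to picture association analysis between Pfam domains and CAZy families is as rule mining with support and confidence. The sketch below is an assumption about the general technique, not CAT's actual algorithm: the slides do not say which association measure CAT uses, and the thresholds, function name, and toy proteins are hypothetical.

```python
from collections import Counter
from itertools import product

def association_rules(proteins, min_support=2, min_confidence=0.8):
    """Find rules "domain -> family" from annotated proteins.

    `proteins` is a list of (domains, families) pairs of sets.
    support    = number of proteins carrying both the domain and the family
    confidence = support / number of proteins carrying the domain
    Returns {(domain, family): (support, confidence)}.
    """
    domain_counts = Counter()
    pair_counts = Counter()
    for domains, families in proteins:
        domain_counts.update(set(domains))
        pair_counts.update(product(set(domains), set(families)))
    rules = {}
    for (domain, family), support in pair_counts.items():
        confidence = support / domain_counts[domain]
        if support >= min_support and confidence >= min_confidence:
            rules[(domain, family)] = (support, confidence)
    return rules

# Toy data: hypothetical domain labels paired with CAZy family labels.
proteins = [
    ({"domA"}, {"GH5"}),
    ({"domA", "domB"}, {"GH5"}),
    ({"domA"}, {"GH5"}),
    ({"domB"}, {"GT2"}),
    ({"domB"}, set()),
]
rules = association_rules(proteins)
```

A rule learned this way could then be applied in the other direction: a hypothetical protein carrying `domA` but no CAZy annotation becomes a candidate member of the associated family.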

Private BeoCyc hosts a P. trichocarpa PGDB manually curated by NREL team

1. GDP-mannose biosynthesis II
2. GDP-L-fucose biosynthesis I (from GDP-D-mannose)
3. GDP-L-fucose biosynthesis II (from L-fucose)
4. UDP-D-galactose biosynthesis
5. UDP-D-galacturonate biosynthesis I (from UDP-D-glucuronate)
6. UDP-D-galacturonate biosynthesis II (from D-galacturonate)
7. UDP-D-glucose biosynthesis (from sucrose)
8. UDP-D-glucuronate biosynthesis (from myo-inositol)
9. UDP-D-xylose biosynthesis (compartmentalized)
10. UDP-L-arabinose biosynthesis I (from UDP-xylose) in Endoplasmic Reticulum
11. UDP-L-arabinose biosynthesis I (from UDP-xylose) in Cytosol
12. UDP-L-arabinose biosynthesis I (from UDP-xylose) in Golgi lumen
13. UDP-L-arabinose biosynthesis II (from L-arabinose) in Cytosol

BeoCyc and BESC knowledgebase

http://bobcat.ornl.gov/besc/index.jsp

Sequences

Genes EC numbers

Arabidopsis Populus

Sequences

Genes EC numbers

Ortholog search

Blast

Kyoto Encyclopedia of Genes and Genomes

genomic and molecular information

Improving Populus trichocarpa genome annotation

Poor annotation of the poplar genome (gene models and predicted enzymes)

Poor representation of the cell wall biosynthesis and related pathways in the reference databases (MetaCyc, KEGG, and PlantCyc)

RESD & PESD

Future!!!

Enzyme information KEGG, CAZy

RefSeq files from the NCBI

Input files for PathoLogic

Pathway Genome Databases

Refine the PGDBs

Create MySQL tables

Supplement databases by additional annotations

Compare phenotypes of the organisms in terms of their genomic and metabolic characteristics

Integration of the metabolic reconstructions into BESC knowledgebase

Challenge: automatic PGDB generation for draft genomes using one table of ORF predictions and FASTA contigs

>C0
ATAAAGACGAAAAGCACCGGATCGAACACCGCCACTTCGAAAACTTCGAACGTCTACGG ….
>C1r
AGTGCGGCTAGGCCGTCGATGGAGCTAGGCCGTCGA ….
>C3r
GACGAAAAGCAGACGAAAAGCAGACGAAAAGCAGCT….
….

Replicon  Locus   Start  Stop  Product                                                                  EC1       EC2       EC3       EC4       EC5
C0        or4062  1176   709   Polyketide cyclase/dehydrase
C0        or4063  4667   1206  L-threonine ammonia-lyase (2-oxobutanoate-forming)                       4.3.1.19
C0        or4064  5611   4682  glutaryl-CoA dehydrogenase                                               1.3.99.7
C0        or4065  6384   5608  hypothetical protein
C0        or4066  7869   7120  naphthoate synthase                                                      4.1.3.36
C0        or4067  8597   7887  GntR domain protein
C0        or4068  8852   8643  adenosylcobinamide-phosphate guanylyltransferase                         2.7.7.62
C0        or4069  9812   8973  hypothetical protein
C1r       or2287  343    1230  protein of unknown function DUF6 transmembrane
C1r       or2288  1398   1679  putative lipoprotein
C1r       or2289  2852   1854  Alcohol dehydrogenase GroES domain protein
C1r       or2290  2985   3881  LysR substrate-binding
C1r       or2291  5705   3897  peptidase M24
C1r       or2292  6933   5764  Cysteine desulfurase
C1r       or2293  7743   7129  Lysine exporter protein (LYSE/YGGA)
C1r       or2294  8110   7841  3-hydroxybutyryl-CoA epimerase                                           5.1.2.3   1.1.1.35
C3r       or2604  401    222   phage tail sheath protein FI
C3r       or2605  1499   903   phosphonate metabolism protein/1,5-bisphosphokinase (PRPP-forming) PhnN
C3r       or2606  1870   2196  Arc-like DNA binding
C3r       or2607  3807   2365  glutaryl-CoA dehydrogenase                                               1.3.99.7
C3r       or2608  4128   4640  hypothetical protein
C3r
C3r
C4r

!!! Predict automatically TU, complexes, transporters for each contig
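The "one table plus FASTA contigs" input described above could be split into per-contig gene records roughly as follows. This is a sketch under several assumptions: the table is tab-separated with the columns shown on the slide, all products are treated as proteins, and the attribute-value output (ID, STARTBASE, ENDBASE, FUNCTION, PRODUCT-TYPE, EC, `//`) is modeled on the annotation file layout PathoLogic accepts and should be checked against the Pathway Tools documentation. The function names and the shortened sequences are hypothetical.

```python
from collections import defaultdict

def parse_fasta(text):
    """Map each contig name (first word after '>') to its full sequence."""
    contigs, name = {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            contigs[name] = []
        elif name is not None:
            contigs[name].append(line)
    return {n: "".join(parts) for n, parts in contigs.items()}

def parse_orf_table(text):
    """Group tab-separated ORF rows by replicon; the first line is the header."""
    by_replicon = defaultdict(list)
    for line in text.splitlines()[1:]:
        fields = line.split("\t")
        if len(fields) < 5 or not fields[1].strip():
            continue  # skip empty placeholder rows
        replicon, locus, start, stop, product = (f.strip() for f in fields[:5])
        by_replicon[replicon].append({
            "locus": locus, "start": int(start), "stop": int(stop),
            "product": product,
            "ecs": [ec.strip() for ec in fields[5:] if ec.strip()],
        })
    return dict(by_replicon)

def to_gene_records(orfs):
    """Render ORFs as attribute-value records separated by '//' lines."""
    lines = []
    for orf in orfs:
        lines += [f"ID\t{orf['locus']}", f"STARTBASE\t{orf['start']}",
                  f"ENDBASE\t{orf['stop']}", f"FUNCTION\t{orf['product']}",
                  "PRODUCT-TYPE\tP"]  # assumption: every ORF encodes a protein
        lines += [f"EC\t{ec}" for ec in orf["ecs"]]
        lines.append("//")
    return "\n".join(lines) + "\n"

# Shortened toy input in the shape shown on the slide.
fasta_text = ">C0\nATAAAGACG\nAAAAGC\n>C1r\nAGTGCGGC\n"
table_text = (
    "Replicon\tLocus\tStart\tStop\tProduct\tEC1\tEC2\n"
    "C0\tor4063\t4667\t1206\tL-threonine ammonia-lyase (2-oxobutanoate-forming)\t4.3.1.19\t\n"
    "C1r\tor2294\t8110\t7841\t3-hydroxybutyryl-CoA epimerase\t5.1.2.3\t1.1.1.35\n"
    "C3r\t\t\t\t\t\t\n"
)
contigs = parse_fasta(fasta_text)
orfs = parse_orf_table(table_text)
records = {name: to_gene_records(orfs.get(name, [])) for name in contigs}
```

Each key of `records` would correspond to one annotation file, paired with a sequence file holding the matching entry from `contigs`.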

Involvement of Single-Genotype Consortia in Degradation of Aromatic Compounds by Rhodopseudomonas palustris

Benzoate

p-Coumarate -

- anoxygenic photosynthesis

- aerobic or anaerobic respiration and fermentation

- fixation of nitrogen gas

- utilization of carbon through CO2 reduction using H2 as an electron donor

Average log2 ratio of the expression of nitrogenases with different cofactors during growth on p-coumarate and benzoate versus succinate

• Transporters
• Chemotaxis operons
• Curli formation operon

Expression of R. palustris phenotypes under p-coumarate (black columns) and benzoate (white columns) degrading conditions, compared with growth on succinate.

Benzoate

p-Coumarate -

Structures of R. palustris consortia mediating anaerobic growth on p-coumarate (A) and on benzoate (B)

Putative electron donor and electron acceptor reactions under different modes of Rhodopseudomonas palustris growth

Changes in total nitrogen, ammonium, and dissolved nitrogen gas during benzoate degradation as functions of OD660

Acknowledgements

ShewCyc and Shewanella Knowledgebase

PNNL: Margaret Romine

Marine Biological Laboratory: Margrethe Serres

ORNL: Denise Schmoyer, Guruprasad Kora, Mustafa Syed, Erich Baker, Hoony Park, Nagiza Samatova, and Edward Uberbacher

BeoCyc and BESC Knowledgebase

NREL: Ambarish Nag, Christopher Chang

UGA: Maor Bar-Peled

ORNL: Mustafa Syed, Hoony Park, Morrey Parang, Denise Schmoyer, and Edward C. Uberbacher