metacyc and aracyc: plant metabolic databases hartmut foerster carnegie institution

53
AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Upload: abraham-hart

Post on 13-Jan-2016

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc and AraCyc:

Plant Metabolic Databases

Hartmut FoersterCarnegie Institution

Page 2: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Overview

MetaCyc: Explore the Database

AraCyc: The Metabolic Network of Arabidopsis

Page 3: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc

http://www.metacyc.org

Page 4: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

What is MetaCyc ?

MetaCyc is a multi-organism database that collects any known pathways across all kingdoms

MetaCyc is a curated, literature-based biochemical pathways database

Collaboration between SRI International and Carnegie Institution

Page 5: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Goal and ApplicationsGoal

Universal repository of metabolic pathways Up-to-date, literature-curated catalogue of

commented enzymes and pathways for use in research, metabolic engineering and education

Applications Database of reference used to generate

predicted Pathway/Genome DataBases (PGDBs)

Page 6: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

The Content of MetaCyc

Pathways from primary and secondary (specialized) metabolism Reactions with compound structures

Proteins Genes

Mostly from microorganism and plant kingdoms, and several animal pathways

Does not contain sequence information

Page 7: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Curation Team

PhD-level curators

Extract data from literature oNLY experimentally verified data ideally protein information

Follow MetaCyc Curator’s guide http://bioinformatics.ai.sri.com/ptools/curatorsguide.pdf

Page 8: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc Version 10.5to be released in August, 2006

Note: The statistics for each year pertain to the last MetaCyc version released in that year.

Note: The statistics for each year pertain to the last MetaCyc version released in that year

Page 9: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc – Browse the Database

Explore class hierarchy of pathways, compounds, reactions,

genes and cell components

Page 10: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc – Browse the Database (cont’d)

Explore class hierarchy of pathways, compounds, reactions,

genes and cell components

Page 11: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc – Explore the Metabolic Universe

Query the database (pathways, reactions, compounds, genes)

Page 12: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc – Query Page

Type your search term

and click submit

Luteolin

Page 13: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc – Query Result Page

Page 14: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc – Pathway Detail Page Part I

The pathway diagram shows compounds,

reactions and metabolic links

Page 15: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc – Pathway Detail Page Part II

Pathway commentary comprises general and

specific information about the pathway

Page 16: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc – More Detail

Extend or collapse the detail level of the pathway detail page

Page 17: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc – More Detail (cont’d)

In depth information about reaction, EC number, enzymes, genes, regulatory

aspects, and metabolic links to related

pathways

Page 18: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc – More Detail (cont’d)

More detail reveals structural information about

compounds

Page 19: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc – Reaction Detail Page

Page 20: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc – Enzymes and Genes

Contains enzyme commentary,

references, and physico-chemical properties of the

enzyme

Page 21: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc – Enzymes and Genes (cont’d)

Page 22: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc – Evidence Codes

Intuitive icons

Pathway Level

Evidence codes provide assessment of data quality, i.e. the affirmation for the existence of an pathway

Evidence codes provide assessment of data quality, i.e. the affirmation for the catalytic activity of an enzyme

Enzyme Level

Page 23: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

MetaCyc – Evidence Codes (cont’d)

Page 24: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Variants, Related Pathways and Links

Pathway variants are created as separate pathways

IAA biosynthesis I (tryptophane-dependent) IAA biosynthesis II (tryptophane-independent)

Links are added between interconnected pathways

Related pathways are grouped into superpathways

e.g.superpathway of choline biosynthesis

Page 25: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Creation of a Superpathway

L-serine

ethanolamine

phosphoryl-ethanolamine

N-methylethanolamine phosphate

N-dimethylethanolamine phosphate

phosphoryl-choline

cholinecholine

biosynthesis III

Choline biosynthesis

II

ethanolaminephosphoryl-choline

CDP-choline-choline

a phosphatidylcholine

choline biosynthesis III choline biosynthesis IIcholine biosynthesis I

N-monomethylethanolamine

N-dimethylethanolamine

choline

choline

Superpathway choline biosynthesis

Page 26: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Applications: Pathways Prediction

Goal Universal repository of metabolic pathways

Up-to-date, literature-curated catalogue of commented enzymes and pathways for use in research, metabolic engineering and education

Applications Database of reference used to generate

predicted Pathway/Genome DataBases (PGDBs)

Page 27: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Pathway Tools Software Suite

Software for generating, curating, querying, displaying PGDBs

Developed by Peter Karp and team PathoLogic – Infers pathways from genome or transcripts

sequencing Pathway/Genome Editors – Curation interface Pathway/Genome Navigator – Query, visualization,

analysis and Web publishing OMICS Viewer

Page 28: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

The PGDB Family

Annotated GenomeArabidopsis thaliana

PathoLogic

SoftwareReference PathwayDatabase (MetaCyc)

Reactions

Pathways

compounds

Gene products

genes

Pathway/Genome Database (AraCyc)

Page 29: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

AraCyc: The Arabidopsis thaliana

specific metabolic database

http://www.arabidopsis.org/tools/aracyc

Page 30: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

The Computational Build of AraCyc In 2004, the Arabidopsis genome

contained 7900 genes annotated to the GO term ‘catalytic activity’

4900 enzymes in small molecule metabolism (19% of the total genome)

PathoLogic inferred 219 pathways and mapped 940 (19% enzyme-coding) genes to the pathways

Page 31: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Cleaning of a Newborn Database

PathoLogic errs on the side of over-prediction

First round of curation to remove false-positives

Add missing pathways

Improve the quality of information Introduce new pathways Increase number of pathway and protein comments Refine computational assignment of protein

Page 32: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Pathway Validation Criteria

A pathway that is described in the Arabidopsis literature

A pathway whose critical metabolites are described in the Arabidopsis literature

A pathway that contains unique reactions and having genes assigned to those unique reactions

Page 33: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Validation Procedure

Delete non-plant pathways: Pathway variants of bacteria-origin Pathways not operating in plants at all

(e.g. glycogen biosynthesis)

Add new plant-specific pathways: Pathway variants of plant-origin Plant-specific metabolites (e.g.

plant hormones)

Plant-specific metabolism (e.g. xanthophyll cycle)

Page 34: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

AraCyc - Curation Progress

AraCyc (2.1)Apr 2005

AraCyc (2.5)Oct 2005

AraCyc (2.6) Feb 2006

Total pathways 221 197 228

New - 37 35

Updated - 0 4

Deleted - 61 6

Pathways manually reviewed with literature evidence

71 (32%) 170 (86%) 201 (88%)

Additional 66 new pathways were added to MetaCyc

Page 35: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

How to Link to AraCyc

From the TAIR home page click on the link to AraCyc

pathways

Page 36: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

AraCyc – The Home Page

Browse pathways,

enzymes, genes, compounds

Display of the Arabidopsis

metabolic network

Paint data from high-throughput experiments on the metabolic

map

User submission

form

Page 37: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

AraCyc – All the Help you can get

Page 38: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

AraCyc’s Content

Page 39: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Computationally Predicted Enzymes

Page 40: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Experimentally Verified Enzymes

Page 41: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

AraCyc: Pathway Detail Page (cont’d)

Page 42: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

AraCyc: OmicsViewer

Page 43: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

OMICS Viewer

Part of the Pathway Tools Software Suite

Displays bird-eye view of the Metabolic Overview diagram

Allows to paint data values from the user's high-throughput onto the Metabolic Overview diagram

Microarray Expression Data Proteomics Data Metabolomics Data

Page 44: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

OmicsViewer Submission Page

Step 1

Step 2

Load sample file and provide information

about your data

Sample data file(text tab-delimited)

0 1 2 3 4

Page 45: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

OmicsViewer Submission Page (cont’d)

Step 3

Step 6

Step 4

Step 5

Choose relative or absolute valuesCheck the box if

you have log values or

negative fold change numbers

Choose to display a single or

multiple step experiment

Select the type of data you want to display (refers to your loading

file)

Page 46: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

OmicsViewer Submission Page (cont’d)

Step 7

For single or multiple time points add the

corresponding column number(s)

Page 47: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Step 8

Step 9

OmicsViewer Submission Page (cont’d)

Choose your cutoff to

visualize your expression

values

Page 48: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

The Omics Viewer Result Page

reactions (lines) arecolor-coded accordingto the gene expression level

compounds (icons) are color-coded according to the concentration of compounds

Page 49: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

The Omics Viewer Result Page (cont’d)

The statistics for the expression map

(single time points only) is provided at the bottom of the

page

Page 50: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution
Page 51: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Saving Results

Page 52: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Curation Flow

Page 53: MetaCyc and AraCyc: Plant Metabolic Databases Hartmut Foerster Carnegie Institution

Acknowledgements TAIR

- Sue Rhee (PI)- Peifen Zhang (curator)- Christophe Tissier (curator)- Hartmut Foerster (curator)- Mary Montoya (software developer)- Joe Filla (sysadmin)

SRI - Peter Karp (PI)- Ron Caspi (curator)- Carol Fulcher (curator)- Suzanne Paley (software developer)- Pallavi Kaipa (programmer)- Markus Krummenacker (programmer)- Mario Latendresse (programmer)

NIH, NSF, Pioneer Hi-Bred

Previous contributers- Lukas Müller

- Aleksey Kleytman (curator assistant – TAIR)- Thomas Yan (programmer – TAIR)- John Pick (software developer - SRI)