-
An Automated System for Deep Proteome Annotation
Gary Van Domselaar
September 27, 2003
-
The Problem
• Most existing biological databases cover a narrow biological aspect.
  – PDB: biomolecular coordinate data
  – Ensembl: human gene predictions
  – GO: Gene Ontology (process, function, location)
• Each has a custom interface.
• Each can answer questions in its own domain but cannot answer questions that span multiple domain boundaries, for example: ‘Which human gene products located in the endoplasmic reticulum have experimental coordinate data?’
-
The Solution: Integrated Biological Databases
3 main approaches:
1. Link Integration. Researchers begin their query with one data source, then follow hypertext links to related information in other data sources. Examples: DAS, NCBI LinkOut.
2. View Integration. A ‘super interface’ is created that makes the source databases appear as one. Example: Kleisli.
3. Data Warehousing. All the data is brought under one roof. Examples: GeneCards, GeneMine, CyberCell database.
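The data warehousing approach can be illustrated with a small sketch. It assumes a hypothetical warehouse in which each protein record already carries fields merged from several sources. The record layout, gene names, and values below are invented for illustration and are not taken from any of the databases named in these slides. Under that assumption, a question that spans several domains (organism, cellular location, structure availability) becomes a single filter over one table:

```python
from dataclasses import dataclass


@dataclass
class ProteinRecord:
    """Hypothetical warehouse row that merges fields from several sources."""
    gene_name: str         # e.g. from a gene-prediction source
    organism: str
    location: str          # e.g. from an ontology-style annotation
    has_coordinates: bool  # True if experimental coordinate data exist


# Toy data: invented placeholders, not real annotations.
WAREHOUSE = [
    ProteinRecord("geneA", "human", "endoplasmic reticulum", True),
    ProteinRecord("geneB", "human", "endoplasmic reticulum", False),
    ProteinRecord("geneC", "human", "nucleus", True),
    ProteinRecord("geneD", "mouse", "endoplasmic reticulum", True),
]


def er_proteins_with_structures(records):
    """Return names of human ER gene products that have coordinate data."""
    return [
        r.gene_name
        for r in records
        if r.organism == "human"
        and r.location == "endoplasmic reticulum"
        and r.has_coordinates
    ]


print(er_proteins_with_structures(WAREHOUSE))
```

With link or view integration the same question would need one lookup per source; here the join has already been done when the warehouse was loaded.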
-
An Automated Proteome Annotation System for Proteome Analyst
• Proteome Analyst provides annotations in the form of a ‘PA Card’
• This system will provide a much fuller set of annotations
-
Annotations
• 2D_Gel_Image
• Accession_No.
• Alternate_Names
• Availability
• Centisome Position
• Cofactors
• Copy Number
• Cys/Met_Content
• EC_Number
• Entry_ID
• Following_Gene
• Gene_Name
• Gene_Ontology
• Gene_Position
• General_Function
• General_Reaction
• Gene_Sequence
• Homologues
• Important_Sites
• Inhibitor
• Interacting_Partners
• Kcat_Value_[1/min]
• Km_Value_[mM]
• Location
• Metabolic_Importance
• Metals_Ions
• Molecular_Weight
• No._of_Amino_Acids
• Other_Databases
• Paralogues
• Pfam_Domain/Function
• Preceding_Gene
• Products
• PROSITE_Motif
• Quaternary_Structure
• Resolution
• Riley_Cell_Function
• Riley_Gene_Function
• RNA_Copy_No.
• Secondary_Structure
• Sequence
• Similarity
• Specific_Activity
• Specific_Function
• Specific_Reaction
• Structure_CLASS
• Substrates
• SWISS_PROT_(AC_&_ID)
• Theoretical_pI
• Transmembrane
• Upstream_100_bases
-
Concept
Genomic Sequence Data
Genomic data analysis must be tailored to the major kingdoms:
• Viruses and prokaryotes: Glimmer
• Eukaryotes: Genscan
-
Concept
[Diagram: Genomic Sequence Data → Gene Identification and Translation → Proteomic Sequence Data]
-
Concept
[Diagram: Genomic Sequence Data → Gene Identification and Translation → Proteomic Sequence Data → Processing]
Internal Processing
• Secondary Structure
• Homology Modeling
• Mol. Wt
• pI
• Etc.
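As one concrete instance of an internal processing step, molecular weight can be computed directly from a protein sequence. The sketch below is illustrative only and is not the module used in this system: the function name is hypothetical, and the table holds commonly quoted average residue masses (in daltons) for the 20 standard amino acids, with one water added for the free termini.

```python
# Average residue masses in daltons (amino acid minus one water molecule).
RESIDUE_MASS = {
    "G": 57.0519, "A": 71.0788, "S": 87.0782, "P": 97.1167, "V": 99.1326,
    "T": 101.1051, "C": 103.1388, "L": 113.1594, "I": 113.1594, "N": 114.1038,
    "D": 115.0886, "Q": 128.1307, "K": 128.1741, "E": 129.1155, "M": 131.1926,
    "H": 137.1411, "F": 147.1766, "R": 156.1875, "Y": 163.1760, "W": 186.2132,
}
WATER = 18.01528  # added once for the N-terminal H and C-terminal OH


def molecular_weight(sequence: str) -> float:
    """Return the average molecular weight of an unmodified peptide chain."""
    seq = sequence.upper()
    unknown = set(seq) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    if not seq:
        return 0.0
    return sum(RESIDUE_MASS[aa] for aa in seq) + WATER


print(round(molecular_weight("GG"), 2))
```

The sketch ignores post-translational modifications and ambiguous residue codes, which a production module would have to handle. A theoretical pI calculation needs a set of pKa values and is not shown.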
-
Concept
[Diagram: Genomic Sequence Data → Gene Identification and Translation → Proteomic Sequence Data → Processing. Processing draws on Internal Processing and Internal DBs.]
Internal DBs
• CCDB: a deeply annotated database for E. coli.
• CCDB++: other deeply annotated model organisms from each kingdom
• SWISS-PROT
• PDB
-
CyberCell (CCDB)
• A comprehensive collection of detailed enzymatic, biological, chemical, genetic, and molecular biological data about E. coli (strain K12, MG1655).
-
Concept
[Diagram: Genomic Sequence Data → Gene Identification and Translation → Proteomic Sequence Data → Processing. Processing draws on Internal Processing and Internal DBs, and on External Processing and External DBs.]
-
Data SourcesData Sources
• GenBank • SwissProt • Prosite • pI/MW Tool • Geneiz • PIR PEC/Shigen • Echobase • Wisconsin • ExpressDB • GeneOntology • GenProtEC • EcoGene • PsiPred
• EcoCyc • PDB • CATH • Swiss2D PAGE • SwissModel • BRENDA • TargetDB • Rosetta • PsortB • KEGG • Chemfinder • Babel
-
Concept
[Diagram: Genomic Sequence Data → Gene Identification and Translation → Proteomic Sequence Data → Processing → Annotated Proteomic Sequence Data → Viewing and Mining Software. Processing draws on Internal Processing and Internal DBs, and on External Processing and External DBs.]
Viewing and Mining Software
• Proteome Analyst
• Multiple Protein Extraction and Report System
-
Data Mining and Visualization
-
Concept
[Diagram: Genomic Sequence Data → Gene Identification and Translation → Proteomic Sequence Data → Processing → Annotated Proteomic Sequence Data → Viewing and Mining Software → Discoveries. Processing draws on Internal Processing and Internal DBs, and on External Processing and External DBs.]
-
Progress
• Currently working on the H. influenzae reference genome.
• Written modules for generating protein sequence data from gene predictions (using Glimmer).
• Currently writing the analysis modules and automation scripts.
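The step of turning a gene prediction into a protein sequence can be sketched as follows. This is a minimal illustration, not the module described above. It assumes each prediction has already been reduced to a hypothetical (start, end, strand) triple with 1-based inclusive coordinates on the forward strand and start <= end; mapping a gene finder's actual output onto that convention is left to the caller. It uses the standard genetic code and emits methionine for the common initiator codons ATG, GTG, and TTG.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
START_CODONS = {"ATG", "GTG", "TTG"}  # initiator codons read as Met
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate_prediction(genome: str, start: int, end: int, strand: str) -> str:
    """Translate one predicted gene.

    Assumed convention (hypothetical): start and end are 1-based, inclusive,
    on the forward strand, with start <= end; strand is '+' or '-'.
    """
    gene = genome[start - 1:end].upper()
    if strand == "-":
        gene = reverse_complement(gene)
    protein = []
    for i in range(0, len(gene) - 2, 3):
        aa = CODON_TABLE.get(gene[i:i + 3], "X")  # X marks ambiguous codons
        if aa == "*":  # stop codon ends the protein
            break
        protein.append(aa)
    if protein and gene[:3] in START_CODONS:
        protein[0] = "M"
    return "".join(protein)


print(translate_prediction("CCATGGCTAAATAACC", 3, 14, "+"))
```

Building the codon table from the two strings keeps the sketch short; a fuller module would also let the caller choose a different translation table.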
-
Acknowledgments
P.I.s: David Wishart, Dwayne Szaffron, Paul Lu, Russel Greiner
CyberCell Database: Shan Sundararaj, An Chi Guo, Bahram Habibi Nazhad
Proteome Analyst: Alona Fyshe, David Meeuwis, Roman Eisner, Brett Poulin, Zhiyong Lu, John Anvik, Cam Macdonnel