Page 1: Bio-Trac 25 (Proteomics: Principles and Methods)

March 24, 2006

Zhang-Zhi Hu, M.D.
Senior Bioinformatics Scientist, Protein Information Resource
Research Assistant Professor, Department of Biochemistry and Molecular Biology
Georgetown University Medical Center

Tutorial: Bioinformatics Resources (http://pir.georgetown.edu/~huz/class/bioinfo_resource.html)

Page 2: What is Bioinformatics?

computer (information) + mouse (biology) = bioinformatics

NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2000): Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

Page 3: Molecular Biology Database Collection

858 key databases in 15 categories

(http://nar.oxfordjournals.org/cgi/content/full/34/suppl_1/D3/DC1)

Page 4: Database Collection in Nucleic Acids Res.

Page 5: Online Access to Database Collection

http://pir.georgetown.edu/~huz/class/2005_database_update.html

http://www.oxfordjournals.org/nar/database/cap/

2006

Page 6: Overview

Database Contents, Search and Retrieval

I. Text search / Information retrieval
II. Sequence & genomics databases
III. Protein family databases
IV. Database of protein functions
V. Databases of protein structures
VI. Proteomics databases

Page 7: Entrez Text Searches

(http://www.ncbi.nlm.nih.gov/Entrez/)

Page 8: PubMed Literature Database

(http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?CMD=Search&DB=PubMed)

Page 9: UniProt Text Search

(http://www.pir.uniprot.org/cgi-bin/textSearch)

Google-type search vs. Boolean searches: AND, OR, NOT
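The difference between free-text matching and Boolean operators can be sketched in a few lines of Python. The records, their text, and the `boolean_search` helper below are made up for illustration; this is not how the UniProt or PIR search engines are implemented.

```python
# Toy records standing in for database entries (illustrative text only).
RECORDS = {
    "entry1": "alpha crystallin A chain eye lens protein",
    "entry2": "delta crystallin II argininosuccinate lyase enzyme",
    "entry3": "argininosuccinate lyase urea cycle enzyme",
}


def boolean_search(records, all_of=(), any_of=(), none_of=()):
    """Return sorted IDs of records that contain every all_of term (AND),
    at least one any_of term if any are given (OR), and no none_of term (NOT)."""
    hits = []
    for rec_id, text in records.items():
        words = set(text.lower().split())
        if not all(term.lower() in words for term in all_of):
            continue
        if any_of and not any(term.lower() in words for term in any_of):
            continue
        if any(term.lower() in words for term in none_of):
            continue
        hits.append(rec_id)
    return sorted(hits)


# crystallin AND enzyme
print(boolean_search(RECORDS, all_of=["crystallin", "enzyme"]))
# lyase NOT crystallin
print(boolean_search(RECORDS, all_of=["lyase"], none_of=["crystallin"]))
```

AND narrows the result set, OR widens it, and NOT excludes entries, which is the control a plain keyword box does not give.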

Page 10: PIR Text Search (I)

(http://pir.georgetown.edu/pirwww/search/textsearch.html)

Search: Alpha crystallin A chain and protein family?

Page 11: PIR Text Search (II)

Can you find which crystallins have a 3D structure determined?

Search: Crystallins that are enzymes?

Page 12: II. Sequence & Genomics Databases

GenBank: An annotated collection of all publicly available nucleotide and protein sequences.
RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products.
UniProt Consortium Database: Universal protein knowledgebase, a central resource of protein sequence and function from Swiss-Prot, TrEMBL and PIR.
Entrez Gene: Gene-centered information at NCBI.
UniGene: Unified clusters of ESTs and full-length mRNA sequences.
OMIM: Online Mendelian Inheritance in Man, a catalog of human genetic and genomic disorders.
Model Organism Genome Databases: MGD, RGD, SGD, FlyBase…
GeneCards: Integrated database of human genes, maps, proteins and diseases.
SNP Consortium Database

Page 13: UniProt Consortium Databases

(http://www.uniprot.org)

2.85 million

Universal Protein Resource

UniProtKB, UniRef, UniParc

Page 14: UniProt Sequence Report (I)

(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=CRYAA_RABIT)

What’s the difference between CRYAA_RABIT & CYRBAA?

Page 15: UniProt Sequence Report (II)

(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef100_P02489)

(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef90_P02489)
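The two report URLs differ only in the cluster identifier. UniRef100 merges identical sequences, while UniRef90 and UniRef50 cluster at the 90% and 50% sequence identity levels. A minimal sketch of pulling the level and the representative ID out of such an identifier follows; the helper name `parse_uniref_id` is hypothetical and not part of any UniProt tool.

```python
import re

# "UniRef" + identity level + "_" + ID of the cluster's representative sequence.
UNIREF_ID = re.compile(r"^UniRef(100|90|50)_(\w+)$")


def parse_uniref_id(cluster_id):
    """Split a UniRef cluster ID into (identity level in %, representative ID)."""
    match = UNIREF_ID.match(cluster_id)
    if match is None:
        raise ValueError(f"not a UniRef cluster ID: {cluster_id!r}")
    return int(match.group(1)), match.group(2)


for cluster_id in ("UniRef100_P02489", "UniRef90_P02489"):
    print(cluster_id, parse_uniref_id(cluster_id))
```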

Page 17: OMIM: Online Mendelian Inheritance in Man

(http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)

Page 18: III. Protein Family Databases

Whole Proteins
  PIRSF: A Network Classification System of Protein Families
  COG (Clusters of Orthologous Groups) of Complete Genomes
  ProtoNet: Automated Hierarchical Classification of Proteins

Protein Domains
  Pfam: Alignments and HMM Models of Protein Domains
  SMART: Protein Domain Families
  CDD: Conserved Domain Database

Protein Motifs
  PROSITE: Protein Patterns and Profiles
  BLOCKS: Protein Sequence Motifs and Alignments
  PRINTS: Protein Sequence Motifs and Signatures

Integrated Family Databases
  iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
  InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SUPERFAMILY

Page 19: Protein Clustering

COGs: (http://www.ncbi.nlm.nih.gov/COG/)

Page 20: KOGs: Eukaryotic Clusters

(http://www.ncbi.nlm.nih.gov/COG/new/shokog.cgi?KOG3591)

Page 21: Domain Classification

(http://pir.georgetown.edu/cgi-bin/ipcEntry?id=CRYAA_RABIT)

(http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=CRYAA_RABIT)

Page 22: Pfam Domain

(http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)

Page 23: Integrated Family Classification

InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF. (http://www.ebi.ac.uk/interpro/search.html)

Page 24: PIRSF: Full Length Classification

iProClass Family Report

(http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280)

Page 25: Protein Motifs

PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles. (http://us.expasy.org/prosite/)
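A PROSITE pattern is written in a compact syntax: elements are joined by `-`, `x` means any residue, `[ST]` lists allowed residues, `{P}` lists forbidden ones, and `x(2,4)` gives a repeat range. The sketch below translates that subset into a Python regular expression. The function name `prosite_to_regex` is an assumption for illustration, and rarer syntax (for example an anchor placed inside square brackets) is not handled.

```python
import re


def prosite_to_regex(pattern):
    """Translate a simple PROSITE pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):       # '<' anchors the pattern at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):         # '>' anchors the pattern at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate the residue specification from an optional (n) or (n,m) repeat.
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.groups()
        if core == "x":                   # x: any residue
            core = "."
        elif core.startswith("{"):        # {ABC}: any residue except A, B or C
            core = "[^" + core[1:-1] + "]"
        # [ABC] and single letters are already valid regex syntax.
        repeat = ""
        if low is not None:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


# N-glycosylation site pattern: N, then anything but P, then S or T, then anything but P.
regex = prosite_to_regex("N-{P}-[ST]-{P}.")
print(regex)
print(re.search(regex, "AANGSAA"))
```

Scanning a sequence with the resulting expression is a reasonable mental model of a pattern match, though PROSITE profiles are scored differently and are not covered here.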

Page 26: IV. Databases of Protein Functions

Metabolic Pathways, Enzymes, and Compounds
  Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
  KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
  LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
  EcoCyc: Encyclopedia of E. coli Genes and Metabolism
  MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
  BRENDA: Enzyme Database
  UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways

Cellular Regulation and Gene Networks
  EpoDB: Genes Expressed during Human Erythropoiesis
  BIND: Descriptions of interactions, molecular complexes and pathways
  DIP: Catalogs experimentally determined interactions between proteins
  BioCarta: Biological pathways of human and mouse
  GO: Gene Ontology Consortium Database

Page 27: KEGG Metabolic & Regulatory Pathways

(http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1)

KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)
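The pathway URL above packs three pieces of information into its query string. A minimal sketch of taking it apart follows, assuming the query has the shape `<organism code><5-digit map number>+<object to highlight>`; that reading is inferred from the URL itself, and the helper names are hypothetical. `hsa` is KEGG's organism code for human, and the first digit of an EC number gives the enzyme class.

```python
from urllib.parse import urlsplit

# Top-level EC classes as defined when this tutorial was given.
EC_CLASSES = {
    1: "oxidoreductases",
    2: "transferases",
    3: "hydrolases",
    4: "lyases",
    5: "isomerases",
    6: "ligases",
}


def parse_show_pathway(url):
    """Split a show_pathway URL into (organism code, map number, highlighted objects)."""
    query = urlsplit(url).query              # e.g. "hsa00220+4.3.2.1"
    map_id, *objects = query.split("+")
    return map_id[:-5], map_id[-5:], objects


def ec_class(ec_number):
    """Name the top-level enzyme class of an EC number such as '4.3.2.1'."""
    return EC_CLASSES[int(ec_number.split(".")[0])]


url = "http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1"
organism, map_number, objects = parse_show_pathway(url)
print(organism, map_number, objects, ec_class(objects[0]))
```

EC 4.3.2.1 falls in class 4, the lyases, which agrees with the argininosuccinate lyase entry used in the lab.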

Page 28: BioCyc (EcoCyc/MetaCyc Metabolic Pathways)

The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)

Page 29: BioCarta Cellular Pathways

(http://www.biocarta.com/index.asp)

Page 30: Protein-Protein Interaction: BIND

(http://www.bind.ca/)

Page 31: Gene Ontology

(http://www.geneontology.org/)

Three GOs: Molecular Function, Biological Process, Cellular Component
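GO annotation files mark each term with a one-letter aspect code: F for Molecular Function, P for Biological Process, C for Cellular Component. The sketch below groups a handful of terms by aspect. The annotation list is illustrative only and is not asserted for any particular protein.

```python
from collections import defaultdict

# One-letter aspect codes used in GO annotation files.
ASPECTS = {
    "F": "Molecular Function",
    "P": "Biological Process",
    "C": "Cellular Component",
}

# Illustrative (GO ID, term name, aspect code) tuples; not a real annotation set.
ANNOTATIONS = [
    ("GO:0051082", "unfolded protein binding", "F"),
    ("GO:0006457", "protein folding", "P"),
    ("GO:0005737", "cytoplasm", "C"),
    ("GO:0005634", "nucleus", "C"),
]


def group_by_aspect(annotations):
    """Group (GO ID, name) pairs under the full name of their ontology."""
    grouped = defaultdict(list)
    for go_id, name, aspect in annotations:
        grouped[ASPECTS[aspect]].append((go_id, name))
    return dict(grouped)


for ontology, terms in group_by_aspect(ANNOTATIONS).items():
    print(ontology, terms)
```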

Page 32: V. Databases of Protein Structures

Protein Structure
  PDB: Structure Determined by X-ray Crystallography and NMR
  PDBsum: Summaries and analyses of PDB structures
  MMDB: NCBI’s database of 3D structures, part of NCBI Entrez
  SWISS-MODEL Repository: Database of annotated protein 3D models
  ModBase: Annotated comparative protein structure models

Structure Classification
  CATH: Hierarchical Classification of Protein Domain Structures
  SCOP: Familial and Structural Protein Relationships
  FSSP: Protein Fold Classification Based on Structure-Structure Alignment

Page 33: PDB: Experimental 3D Structure Repository

(http://www.rcsb.org/pdb/)

Rat gamma-crystallin, chain A, B.

Can you do a text search at PIR to find this?

Page 34: PDBsum: Summary and Analysis

(http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/)

Search 3-D structure summary

2-D structure

Page 35: Protein Structural Classification (1)

CATH: Hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)

Page 36: Protein Structural Classification (2)

(http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)

SCOP: A comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known.

Page 37: SWISS-MODEL Repository

A database of annotated three-dimensional comparative protein structure models (http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRGE_RAT&job=2)

Page 38: VI. Proteomic Resources

GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes
SWISS-2DPAGE (http://www.expasy.org/ch2d/)
PEP (http://cubic.bioc.columbia.edu/pep/): Predictions for Entire Proteomes: summarized analyses of protein sequences
Integr8 (http://www.ebi.ac.uk/integr8/): A browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets
PRIDE (http://www.ebi.ac.uk/pride/): PRoteomics IDEntifications database
Expression Profiling databases
GPMdb (http://gpmdb.thegpm.org/): Mass Spec Proteomics Databases

Page 39: 2D-Gel Image Databases (1)

(http://us.expasy.org/ch2d/2d-index.html)

(http://us.expasy.org/cgi-bin/nice2dpage.pl?P02489)

Page 40: 2D-Gel Image Databases (2)

(http://gelbank.anl.gov/2dgels/index.asp)

Page 41: GPMdb MS Data Search

http://gpmdb.thegpm.org/

Craig, et al., J Proteome Res. 2004, 3:1234-42.

Page 42: iProLINK: Protein Literature Mining Resource

http://pir.georgetown.edu/iprolink/

Text mining of protein phosphorylation

Gene/protein name thesaurus: synonyms, ambiguous names…

Page 43: Lab

Choose additional protein IDs to browse the variety of molecular biology databases each sequence report links to.

Delta crystallin II (Argininosuccinate lyase) (UniProt: ARLY2_ANAPL/P24058)

Alpha crystallin A (UniProt: CRYAA_RABIT/P02493)
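Each lab protein is given as an entry name and an accession number separated by a slash. When pasting IDs into search boxes it helps to know which kind you hold. The sketch below tells the two apart by format; the accession expression follows the format published by UniProt, while the entry-name expression and the function name `classify_id` are simplifying assumptions.

```python
import re

# UniProt accession format (6 or 10 characters).
ACCESSION = re.compile(
    r"^(?:[OPQ][0-9][A-Z0-9]{3}[0-9]"
    r"|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2})$"
)
# Entry name: protein mnemonic, underscore, species code (simplified assumption).
ENTRY_NAME = re.compile(r"^[A-Z0-9]{1,10}_[A-Z0-9]{1,5}$")


def classify_id(identifier):
    """Label an identifier as 'accession', 'entry name', or 'unknown'."""
    if ACCESSION.match(identifier):
        return "accession"
    if ENTRY_NAME.match(identifier):
        return "entry name"
    return "unknown"


for pair in ("ARLY2_ANAPL/P24058", "CRYAA_RABIT/P02493"):
    for identifier in pair.split("/"):
        print(identifier, "->", classify_id(identifier))
```

An identifier such as CYRBAA, from the UniProt Sequence Report (I) slide, matches neither UniProt format under these rules.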