Bio-Trac 25 (Proteomics: Principles and Methods)

March 23, 2007

Zhang-Zhi Hu, M.D.
Research Associate Professor
Protein Information Resource, Department of Biochemistry and Molecular & Cellular Biology
Georgetown University Medical Center

Tutorial: Bioinformatics Resources (http://pir.georgetown.edu/pirwww/workshop/bioinfo_resource.html)

What is Bioinformatics?

computer (information) + mouse (biology) = bioinformatics

• NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2000): Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

Molecular Biology Database Collection

(http://nar.oxfordjournals.org/cgi/content/full/35/suppl_1/D3/DC1)

968 key databases in 14 categories

Database Collection in Nucleic Acids Res.

Online Access to Database Collection

http://pir.georgetown.edu/pirwww/workshop/2005_database_update.html

http://www.oxfordjournals.org/nar/database/cap/

2007

Overview

Database Contents, Search and Retrieval

I. Text search / Information retrieval

II. Sequence & genomics databases

III. Protein family databases

IV. Database of protein functions

V. Databases of protein structures

VI. Proteomics databases

Entrez Text Searches

(http://www.ncbi.nlm.nih.gov/Entrez/)

iProLINK: Protein Literature Mining Resource

http://pir.georgetown.edu/iprolink/

• Text mining for protein phosphorylation

• Gene/protein name thesaurus: synonyms, ambiguous names…

BioThesaurus: Gene/protein name searches - synonyms, ambiguous names…

http://pir.georgetown.edu/iprolink/biothesaurus

Synonyms: CRYAA; crystallin, alpha A; CRYA1; HSPB4…
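A name thesaurus of this kind can be pictured as a lookup from any synonym to a canonical name. The sketch below is only a toy illustration built from the four synonyms listed on this slide; the dictionary, `normalize`, `build_index` and `lookup` are hypothetical names and are not the BioThesaurus interface.

```python
# Toy synonym lookup; NOT the BioThesaurus API.
# The only data used are the synonyms shown on the slide.
SYNONYMS = {
    "CRYAA": ["CRYAA", "crystallin, alpha A", "CRYA1", "HSPB4"],
}


def normalize(name):
    """Lower-case and collapse whitespace so lookups ignore case and spacing."""
    return " ".join(name.lower().split())


def build_index(synonyms):
    """Map every normalized synonym to the set of canonical names using it."""
    index = {}
    for canonical, names in synonyms.items():
        for name in names:
            index.setdefault(normalize(name), set()).add(canonical)
    return index


INDEX = build_index(SYNONYMS)


def lookup(name):
    """Return canonical names for a synonym; more than one hit means the name is ambiguous."""
    return sorted(INDEX.get(normalize(name), set()))
```

With this toy index, `lookup("hspb4")` and `lookup("CRYA1")` both resolve to the same canonical name, while an unknown name returns an empty list.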

RLIMS-P: Text mining for protein phosphorylation

http://pir.georgetown.edu/iprolink/rlimsp/

UniProt Text Search (http://www.pir.uniprot.org/cgi-bin/textSearch)

Google-type search vs. Boolean searches: AND, OR, NOT
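To make the three Boolean operators concrete, here is a minimal sketch that filters a handful of in-memory text records. The records are toy strings and the `search` function with its `all_of`, `any_of` and `none_of` parameters is hypothetical; it does not reproduce how the UniProt text search is implemented.

```python
# Toy records (hypothetical text, not real database entries).
RECORDS = {
    "rec1": "alpha crystallin A chain",
    "rec2": "delta crystallin II argininosuccinate lyase",
    "rec3": "argininosuccinate lyase",
}


def tokens(text):
    """Split a record into a set of lower-case words."""
    return set(text.lower().split())


def search(records, all_of=(), any_of=(), none_of=()):
    """AND = all_of, OR = any_of, NOT = none_of (case-insensitive whole words)."""
    hits = []
    for rec_id, text in records.items():
        words = tokens(text)
        if not all(term.lower() in words for term in all_of):
            continue  # AND: every term must be present
        if any_of and not any(term.lower() in words for term in any_of):
            continue  # OR: at least one term must be present
        if any(term.lower() in words for term in none_of):
            continue  # NOT: no excluded term may be present
        hits.append(rec_id)
    return sorted(hits)
```

For example, `search(RECORDS, all_of=["lyase"], none_of=["crystallin"])` corresponds to the query "lyase NOT crystallin" and keeps only the record that mentions the enzyme without the crystallin name.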

PIR Text Search (I)

(http://pir.georgetown.edu/pirwww/search/textsearch.html)

Search: alpha crystallin A chains that are classified in protein families

Search for synonyms

PIR Text Search (II)

Search: which crystallins are enzymes, and which families do they belong to?

Can you find which crystallins have 3D structure determined?

II. Sequence & Genomics Databases

• GenBank: An annotated collection of all publicly available nucleotide and protein sequences.

• RefSeq: NCBI non-redundant set of reference sequences, including genomic DNA, transcript (RNA), and protein products

• UniProt Consortium Database: Universal protein resource, a central repository of protein sequence and function.

• Entrez Gene: Gene-centered information at NCBI.

• UniGene: Unified clusters of ESTs and full-length mRNA sequences.

• OMIM: Online Mendelian Inheritance in Man: a catalog of human genetic and genomic disorders.

• Model Organism Genome Databases: MGD, RGD, SGD, FlyBase…

• GeneCards: Integrated database of human genes, maps, proteins and diseases.

• SNP Consortium Database; International HapMap Project: Genes associated with human disease

(http://www.oxfordjournals.org/nar/database/cap/)

UniProt Consortium Databases

(http://www.uniprot.org)

Universal Protein Resource

4.1 million

UniProtKB UniRef UniParc

UniProt Sequence Report (I)

(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=CRYAA_RABIT)

What’s the difference between CRYAA_RABIT & CYRBAA?

UniProtKB
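Part of telling identifiers apart is purely syntactic. The sketch below classifies a string as a UniProtKB accession (such as P02493) or an entry name (such as CRYAA_RABIT) using regular expressions. The accession pattern follows the published UniProt accession format as I understand it; the entry-name pattern and the function names are simplifying assumptions. CYRBAA matches neither pattern, and the sketch does not try to say which database it comes from.

```python
import re

# UniProtKB accession format (6 or 10 characters), e.g. P02493.
ACCESSION = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)
# Entry name: protein mnemonic, underscore, species code (assumed limits).
ENTRY_NAME = re.compile(r"([A-Z0-9]{1,10})_([A-Z0-9]{1,5})")


def classify(identifier):
    """Return 'accession', 'entry name', or 'other' based on syntax alone."""
    if ACCESSION.fullmatch(identifier):
        return "accession"
    if ENTRY_NAME.fullmatch(identifier):
        return "entry name"
    return "other"


def split_entry_name(identifier):
    """Split an entry name into (protein mnemonic, species code)."""
    match = ENTRY_NAME.fullmatch(identifier)
    if match is None:
        raise ValueError(f"not an entry name: {identifier!r}")
    return match.group(1), match.group(2)
```

Here `split_entry_name("CRYAA_RABIT")` separates the protein mnemonic from the species code, which is why the same protein in another species shares the first half of the name but has a different accession.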

UniProt Report (II): UniRef100 & 90

(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef90_P02489)

(http://www.pir.uniprot.org/cgi-bin/unipEntry?id=UniRef100_P02489)

UniRef100

UniRef90
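The idea behind the two cluster levels (identical sequences merged at 100%, near-identical ones at 90% or more) can be sketched with a toy greedy clustering. Everything here is a simplification under stated assumptions: the sequences are artificial, identity is computed position by position on equal-length strings rather than from alignments, and the greedy procedure is not the method used to build the real UniRef sets.

```python
def identity(a, b):
    """Fraction of identical positions (toy version: equal-length strings only)."""
    if len(a) != len(b):
        raise ValueError("toy identity needs equal-length sequences")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster(seqs, threshold):
    """Greedy clustering: a sequence joins the first cluster whose
    representative it matches at >= threshold, else it starts a new cluster."""
    clusters = []  # list of (representative_id, [member_ids])
    for seq_id, seq in seqs.items():
        for rep_id, members in clusters:
            if identity(seq, seqs[rep_id]) >= threshold:
                members.append(seq_id)
                break
        else:
            clusters.append((seq_id, [seq_id]))
    return clusters


# Artificial 10-residue sequences (not real proteins).
TOY = {
    "seqA": "ACDEFGHIKL",
    "seqB": "ACDEFGHIKL",  # identical to seqA
    "seqC": "ACDEFGHIKM",  # 9 of 10 positions identical to seqA
    "seqD": "AWDEYGHSKM",  # more diverged
}
```

Calling `cluster(TOY, 1.0)` and `cluster(TOY, 0.9)` shows the same effect as moving from the UniRef100 report to the UniRef90 report: the lower threshold gives fewer, larger clusters.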

OMIM: Online Mendelian Inheritance in Man

(http://www.ncbi.nlm.nih.gov/entrez/dispomim.cgi?id=123580)

III. Protein Family Databases

• Whole Proteins
– PIRSF: Network Classification Based on Evolutionary Relationship of Whole Protein
– COG (Clusters of Orthologous Groups) of Complete Genomes
– PANTHER: Proteins Classified into Families/Subfamilies of Shared Function
– ProtoNet: Automated Hierarchical Classification of Proteins

• Protein Domains
– Pfam: Alignments and HMM Models of Protein Domains
– SMART: Protein Domain Families
– CDD: Conserved Domain Database

• Protein Motifs
– PROSITE: Protein Patterns and Profiles
– BLOCKS: Protein Sequence Motifs and Alignments
– PRINTS: Compendium of Protein Fingerprints (a group of conserved motifs)

• Integrated Family Databases
– InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART, PIRSF, SuperFamily…

Protein Clustering

COGs: (http://www.ncbi.nlm.nih.gov/COG/)

Initial version

New version: Includes Eukaryotic Clusters - KOGs

PIRSF: Full Length Classification

iProClass Family Report

(http://pir.georgetown.edu/cgi-bin/ipcSF?id=SF002280)

Domain Classification – Pfam Domain

(http://pir.georgetown.edu/cgi-bin/ipcEntry?id=P02493)

(http://www.sanger.ac.uk/cgi-bin/Pfam/swisspfamget.pl?name=CRYAA_RABIT)

Pfam Domain (http://www.sanger.ac.uk/cgi-bin/Pfam/getacc?PF00525)

Protein Motifs: PROSITE – A database of protein families and domains. It consists of biologically significant sites, patterns and profiles.

(http://us.expasy.org/prosite/)
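A PROSITE pattern is essentially a compact regular expression over amino acids. As a rough sketch, the function below translates the core pattern syntax (`x` for any residue, `[..]` for allowed residues, `{..}` for forbidden residues, `(n)` or `(n,m)` for repetition, `<` and `>` for the sequence ends) into a Python regex. The function name is hypothetical and less common syntax variants are not handled. The example pattern `N-{P}-[ST]-{P}` is the PROSITE N-glycosylation site pattern; the test sequences are made up.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):   # anchored at the N-terminus
            prefix, element = "^", element[1:]
        if element.endswith(">"):     # anchored at the C-terminus
            suffix, element = "$", element[:-1]
        # Separate an optional repetition such as (2) or (2,4).
        match = re.fullmatch(r"(.+?)(?:\((\d+)(?:,(\d+))?\))?", element)
        core, low, high = match.group(1), match.group(2), match.group(3)
        if core == "x":
            core = "."                          # any residue
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"      # any residue except these
        # [ABC] and single residue letters are already valid regex.
        repeat = ""
        if low:
            repeat = "{" + low + ("," + high if high else "") + "}"
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


N_GLYCOSYLATION = prosite_to_regex("N-{P}-[ST]-{P}")
```

The resulting regex can be used with `re.search` or `re.finditer` to scan a protein sequence for candidate sites.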

Integrated Family Classification

InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, TIGRFAMs, and PIRSF. (http://www.ebi.ac.uk/interpro/search.html)

Mapping of families

IV. Databases of Protein Functions

• Metabolic Pathways, Enzymes, and Compounds
– Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
– KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
– LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
– EcoCyc: Encyclopedia of E. coli Genes and Metabolism
– MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
– BRENDA: Enzyme Database
– UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways

• Inter-Molecular Interactions and Regulatory Pathways
– IntAct: Protein interaction data from literature and user submission
– BIND: Descriptions of interactions, molecular complexes and pathways
– DIP: Catalogs experimentally determined interactions between proteins
– Reactome: A curated knowledgebase of biological pathways
– BioCarta: Biological pathways of human and mouse
– GO: Gene Ontology Consortium Database

• Pathway Resources: Pathguide

Biological Pathway Resource Collection

http://www.pathguide.org/

• Protein-protein interactions

• Metabolic pathways

• Signaling pathways

• Pathway diagrams

• Transcription factors / gene regulatory networks

• Protein-compound interactions

• Genetic interaction networks

KEGG Metabolic & Regulatory Pathways

(http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00220+4.3.2.1)

KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)
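The pathway URL on this slide highlights an enzyme by its EC number (4.3.2.1). As a small illustration of how the four-level EC hierarchy reads, the sketch below parses an EC number and reports its top-level class. The function names are hypothetical; the class table lists the six classes in use at the time of this tutorial (a seventh class, translocases, was added to the nomenclature later).

```python
# Top-level EC classes as of 2007.
EC_CLASSES = {
    1: "Oxidoreductases",
    2: "Transferases",
    3: "Hydrolases",
    4: "Lyases",
    5: "Isomerases",
    6: "Ligases",
}


def parse_ec(ec):
    """Split a fully specified EC number such as '4.3.2.1' into four integers."""
    text = ec.strip()
    if text.upper().startswith("EC"):
        text = text[2:].strip()
    fields = text.split(".")
    if len(fields) != 4 or not all(field.isdigit() for field in fields):
        raise ValueError(f"not a fully specified EC number: {ec!r}")
    return tuple(int(field) for field in fields)


def top_level_class(ec):
    """Name of the first-level EC class, or 'unknown' if it is not in the table."""
    return EC_CLASSES.get(parse_ec(ec)[0], "unknown")
```

For the enzyme in the URL, `top_level_class("4.3.2.1")` gives the lyase class, consistent with argininosuccinate lyase being one of the lab's example proteins.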

BioCyc: EcoCyc/MetaCyc Metabolic Pathways

The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)

BioCarta Cellular Pathways (http://www.biocarta.com/index.asp)

Reactome: http://www.reactome.org/

• Collaboration of CSHL, EBI and GO Consortium

• Curated resource of core pathways and reactions in human biology

• Authored by biological researchers who are experts in their fields

• Cross-referenced with NCBI, Ensembl, UniProt, HapMap, KEGG…

• Inferred orthologous events in 22 non-human species (mouse, rat…)

Reactome: events and objects (including modified forms and complexes)

Transforming Growth Factor (TGF) beta signaling [Homo sapiens]

Event -> REACT_6879.1: Activated type I receptor phosphorylates R-SMAD directly [Homo sapiens]
Object -> REACT_7364.1: Phospho-R-SMAD [cytosol]
Event -> REACT_6760.1: Phospho-R-SMAD forms a complex with CO-SMAD [Homo sapiens]
Object -> REACT_7344.1: Phospho-R-SMAD:CO-SMAD complex [cytosol]
Event -> REACT_6726.1: The phospho-R-SMAD:CO-SMAD transfers to the nucleus
Object -> REACT_7382.2: Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]
……

(http://reactome.org/cgi-bin/eventbrowser?DB=gk_current&FOCUS_SPECIES=Homo%20sapiens&ID=170834&)
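The event/object chain for this pathway can be represented as a simple ordered list in which events (reactions) alternate with the objects (modified proteins or complexes) they produce. The sketch below uses only the identifiers and names shown on this slide. The `Step` class, its field names, and the reading of the `.1`/`.2` suffix as a version number are assumptions made for illustration; this is not the Reactome data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    kind: str        # "Event" or "Object"
    identifier: str  # e.g. "REACT_6879.1"
    name: str


TGF_BETA_STEPS = [
    Step("Event", "REACT_6879.1", "Activated type I receptor phosphorylates R-SMAD directly"),
    Step("Object", "REACT_7364.1", "Phospho-R-SMAD [cytosol]"),
    Step("Event", "REACT_6760.1", "Phospho-R-SMAD forms a complex with CO-SMAD"),
    Step("Object", "REACT_7344.1", "Phospho-R-SMAD:CO-SMAD complex [cytosol]"),
    Step("Event", "REACT_6726.1", "The phospho-R-SMAD:CO-SMAD transfers to the nucleus"),
    Step("Object", "REACT_7382.2", "Phospho-R-SMAD:CO-SMAD complex [nucleoplasm]"),
]


def split_version(identifier):
    """Split 'REACT_7382.2' into ('REACT_7382', 2); the suffix is assumed to be a version."""
    base, version = identifier.rsplit(".", 1)
    return base, int(version)


def alternates(steps):
    """True if events and objects strictly alternate along the chain."""
    return all(a.kind != b.kind for a, b in zip(steps, steps[1:]))
```

Keeping events and objects as separate kinds makes it explicit that the same complex in two compartments (cytosol and nucleoplasm) is recorded as two distinct objects.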

Protein-Protein Interaction Database - IntAct (http://www.ebi.ac.uk/intact/)

Gene Ontology (GO)

- Molecular Function
- Biological Process
- Cellular Component

(http://www.geneontology.org/)
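The three GO aspects are often abbreviated with one-letter codes (F, P, C), each corresponding to one root term of the ontology. The sketch below groups a list of annotations by aspect. The root term identifiers are the GO root terms; the `group_by_aspect` function and the example labels are hypothetical and are not curated annotations for any protein.

```python
# One-letter aspect codes mapped to (aspect name, GO root term).
GO_ASPECTS = {
    "F": ("molecular_function", "GO:0003674"),
    "P": ("biological_process", "GO:0008150"),
    "C": ("cellular_component", "GO:0005575"),
}


def group_by_aspect(annotations):
    """Group (label, aspect_code) pairs under the three aspect names."""
    grouped = {name: [] for name, _root in GO_ASPECTS.values()}
    for label, code in annotations:
        name, _root = GO_ASPECTS[code]
        grouped[name].append(label)
    return grouped


# Illustrative labels only (hypothetical, not real annotations).
EXAMPLE = [
    ("lyase activity", "F"),
    ("amino acid metabolic process", "P"),
    ("cytoplasm", "C"),
    ("structural molecule activity", "F"),
]
```

Grouping this way mirrors how a protein report usually presents GO annotation: what the protein does, which process it takes part in, and where in the cell it is found.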

V. Databases of Protein Structures

• Protein Structure
– PDB: Structure Determined by X-ray Crystallography and NMR
– PDBsum: Summaries and analyses of PDB structures
– MMDB: NCBI’s database of 3D structures, part of NCBI Entrez
– SWISS-MODEL Repository: Database of annotated protein 3D models
– ModBase: Annotated comparative protein structure models

• Structure Classification
– CATH: Hierarchical Classification of Protein Domain Structures
– SCOP: Familial and Structural Protein Relationships
– FSSP: Protein Fold Classification Based on Structure-Structure Alignment

PDB: Experimental 3D Structure Repository

(http://www.rcsb.org/pdb/)

Rat gamma-crystallin (chain A, B)

Can you do a text search at PIR to find this (CRGE_RAT)?

PDBsum: Pictorial Database Providing Summaries and Analyses of PDB Entries

Search 3-D structure summary

2-D structure

(http://www.ebi.ac.uk/thornton-srv/databases/pdbsum/)

Protein Structural Classification (1)

CATH: Hierarchical domain classification of protein structures (http://www.cathdb.info/latest/index.html)

Protein Structural Classification (2)

(http://scop.mrc-lmb.cam.ac.uk/scop/data/scop.b.html)

SCOP: comprehensive description of structural and evolutionary relationships between all proteins whose structure is known.

SWISS-MODEL Repository

A database of annotated three-dimensional comparative protein structure models

(http://swissmodel.expasy.org/repository/smr.php?sptr_ac=CRGE_RAT&job=2)

VI. Proteomic Resources

• GELBANK (http://gelbank.anl.gov): 2D-gel patterns of species with completed genomes.

• SWISS-2DPAGE (http://www.expasy.org/ch2d/): index of 2D-gels

• PEP (http://cubic.bioc.columbia.edu/pep/): Predictions for Entire Proteomes: summarized analyses of protein sequences

• Integr8 (http://www.ebi.ac.uk/integr8/): A browser for information relating to completed genomes and proteomes, based on data contained in Genome Reviews and the UniProt proteome sets

• PRIDE (http://www.ebi.ac.uk/pride/): PRoteomics IDEntifications database; Expression Profiling databases

• GPMdb (http://gpmdb.thegpm.org/): Mass Spec Proteomics Databases

2D-Gel Image Databases

(http://us.expasy.org/swiss-2dpage/ac=P02489)

Part of WORLD-2DPAGE: index to 2-D PAGE databases and services

(http://us.expasy.org/ch2d/)

GPMdb: MS Data Search (http://gpmdb.thegpm.org/)

Craig, et al., J Proteome Res. 2004, 3:1234-42.

PRIDE: centralized, standards compliant, public data repository for proteomics data

http://www.ebi.ac.uk/pride/

HUPO Plasma Proteome Project

Protein Examples

• Rabbit alpha crystallin A (UniProtKB: CRYAA_RABIT/P02493)

• Delta crystallin II (Argininosuccinate lyase) (UniProtKB: ARLY2_ANAPL/P24058)

• Any additional proteins of your interest for search and retrieval

Lab:

I. Text search / Information retrieval
1. Literature search and text mining
– Finding synonyms (BioThesaurus)
– Information extraction (e.g., protein phosphorylation sites)
2. Find the sequence for the rabbit alpha crystallin A chain
3. Find all alpha crystallin A chains classified in protein families
4. Search for crystallins that have enzyme activities
5. Find crystallins that have determined 3D structures

II. Database contents (reports)
1. Sequence & genomics databases (UniProt)
2. Protein family databases (PIRSF)
3. Database of protein functions (KEGG)
4. Databases of protein structures (PDB)
5. Proteomics databases (Swiss-2D)