
BIO-TRAC 25 (Proteomics: Principles and Methods)
March 28, 2003
NIH, Bethesda, MD

Zhang-Zhi Hu, M.D.
Bioinformatics Scientist, Protein Information Resource
National Biomedical Research Foundation

Tutorial: Bioinformatics Resources

2

What is Bioinformatics?

NIH Biomedical Information Science and Technology Initiative (BISTI) Working Definition (2002) - Research, development, or application of computational tools and approaches for expanding the use of biological, medical, behavioral or health data, including those to acquire, store, organize, archive, analyze, or visualize such data.

Bioinformatics is the application of information technology to the analysis, organization and distribution of biological data in order to answer complex biological questions.

3

Bioinformatics Resources

The Molecular Biology Database Collection: An Online Compilation of Relevant Database Resources
  2003 update: http://www3.oup.co.uk/nar/database/
  Nucleic Acids Research Database Issues (January, annually) (2003 - http://nar.oupjournals.org/content/vol31/issue1/)

DBcat: A Catalog of > 500 Biological Databases
  http://www.infobiogen.fr/services/dbcat/

4

Molecular Biology Database Collection
(http://nar.oupjournals.org/cgi/content/full/31/1/1#GKG120TB1)

5

The Molecular Biology Database Collection: 2003 update (Baxevanis, A.D.)

An online resource of 386 key databases of 18 categories

  Major Sequence Repositories
  Comparative Genomics
  Gene Expression
  Gene Identification and Structure
  Genetic and Physical Maps
  Genomic Databases
  Intermolecular Interactions
  Metabolic Pathways and Cellular Regulation
  Mutation Databases
  Pathology
  Protein Sequence Motifs
  Proteome Resources
  Retrieval Systems and Database Structure
  RNA Sequences
  Structure
  Transgenics
  Varied Biomedical Content

6

Overview

Protein Sequence Analysis
  I. Sequence Similarity Search and Alignment
  II. Family Classification Methods
  III. Structure Prediction Methods

Molecular Biology Databases
  IV. Protein Family Databases
  V. Databases of Protein Functions
  VI. Databases of Protein Structures

Proteomic Resources
  VII. 2D-gel databases
  VIII. Proteomic analyses

7

I. Sequence Similarity Search

Find a protein sequence: text search

Based on Pair-Wise Comparisons
  BLOSUM scoring matrix
  PAM scoring matrix

Dynamic Programming Algorithms
  Global Similarity: Needleman-Wunsch (GAP/BestFit)
  Local Similarity: Smith-Waterman (SSEARCH)

Heuristic Algorithms (Sequence Database Searching)
  FASTA: Based on K-Tuples (2-Amino Acid)
  BLAST: Triples of Conserved Amino Acids
  Gapped-BLAST: Allow Gaps in Segment Pairs (NREF)
  PHI-BLAST: Pattern-Hit Initiated Search (NCBI)
  PSI-BLAST: Iterative Search (NCBI)

8

Sequence Search by Text or Unique ID

(http://www.ncbi.nlm.nih.gov/Entrez/)

(http://pir.georgetown.edu/pirwww/search/textsearch.html)

9

Pair-Wise Comparisons

Scoring matrix

Global and local similarity: Dynamic Programming (Needleman-Wunsch, Smith-Waterman)

(http://www.ebi.ac.uk/emboss/align/)
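
The difference between global and local dynamic programming can be sketched in a few lines. This is only an illustration under simplifying assumptions: a flat match/mismatch score stands in for a BLOSUM or PAM matrix, the gap penalty is linear, and the function names `global_score` and `local_score` are made up for this example. It is not the code used by the EMBOSS, GAP, or SSEARCH programs.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch: best score for aligning all of a with all of b."""
    # Row 0: aligning a prefix of b against nothing costs one gap per residue.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # column 0: prefix of a against nothing
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(prev[j - 1] + s,   # align a[i-1] with b[j-1]
                            prev[j] + gap,     # gap in b
                            curr[j - 1] + gap))  # gap in a
        prev = curr
    return prev[-1]


def local_score(a, b, match=1, mismatch=-1, gap=-2):
    """Smith-Waterman: best score over all pairs of substrings of a and b."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            # The extra 0 lets an alignment restart anywhere.
            cell = max(0, prev[j - 1] + s, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


if __name__ == "__main__":
    print(global_score("WWWACDEWWW", "KKACDEKK"))
    print(local_score("WWWACDEWWW", "KKACDEKK"))
```

The only structural differences are the initialization (gap penalties versus zeros) and the floor at zero, which is what lets the local version ignore unrelated flanking regions.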

10

FASTA Search

(http://www.ebi.ac.uk/fasta33/)

(http://pir.georgetown.edu/pirwww/search/fasta.html)
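
The k-tuple idea behind FASTA can be illustrated with a small sketch: index every k-residue word of the target, look up the query's words, and count hits per diagonal (target position minus query position). The function names and the choice to report only the single best diagonal are assumptions made for this example; the real FASTA program goes on to rescore and join regions, which is not shown here.

```python
from collections import defaultdict


def ktuple_index(seq, k=2):
    """Map each k-residue word in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def best_diagonal(query, target, k=2):
    """Return (offset, hits) for the diagonal sharing the most k-tuples."""
    index = ktuple_index(target, k)
    diagonals = defaultdict(int)
    for i in range(len(query) - k + 1):
        # index.get avoids creating empty entries for words absent from target.
        for j in index.get(query[i:i + k], ()):
            diagonals[j - i] += 1
    if not diagonals:
        return None, 0
    offset = max(diagonals, key=diagonals.get)
    return offset, diagonals[offset]


if __name__ == "__main__":
    print(best_diagonal("ACDEF", "GGACDEFGG"))
```

A diagonal with many hits marks a region where the two sequences share a run of words at a constant offset, which is where a more expensive alignment is worth attempting.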

11

Gapped-BLAST Search

(http://pir.georgetown.edu/pirwww/search/pirnref.shtml)

(http://www.ncbi.nlm.nih.gov/BLAST/)

12

PSI-BLAST Iterative Search

(http://www.ncbi.nlm.nih.gov/BLAST/)

13

PSI-BLAST

14

II. Family Classification Methods

Multiple Sequence Alignment and Phylogenetic Analysis
  ClustalW Multiple Sequence Alignment
  Alignment Editor & Phylogenetic Trees

Based on Family Information
  PROSITE Pattern Search
  Motif and Profile Search
  Hidden Markov Models (HMMs)

15

Multiple Sequence Alignment

ClustalW (http://pir.georgetown.edu/pirwww/search/multaln.html)

16

Alignment Editor (Jalview)

(http://www.ebi.ac.uk/clustalw/)

17

Alignment Editor (GeneDoc)

(http://www.psc.edu/biomed/genedoc/)

18

Phylogenetic Analysis

Tree Programs: (http://evolution.genetics.washington.edu/phylip.html)

Tree Searches: (http://pauling.mbu.iisc.ernet.in/~pali/index.html)
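
Distance-based tree programs start from a matrix of pairwise distances computed from a multiple alignment. As a rough sketch of that first step, the code below computes the uncorrected p-distance (fraction of differing positions, ignoring gapped columns). The function names and the choice of p-distance are assumptions for illustration; the tree packages linked above apply their own, more elaborate substitution models.

```python
def p_distance(a, b):
    """Fraction of aligned, ungapped positions at which a and b differ."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(x != y for x, y in pairs) / len(pairs)


def distance_matrix(aligned):
    """Pairwise p-distances for a dict of {name: aligned sequence}."""
    names = list(aligned)
    return {(m, n): p_distance(aligned[m], aligned[n])
            for m in names for n in names}


if __name__ == "__main__":
    # Made-up toy alignment, not real protein data.
    toy = {"s1": "ACDE", "s2": "ACDF", "s3": "AC-E"}
    for pair, d in distance_matrix(toy).items():
        print(pair, d)
```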

19

PROSITE Pattern Search

(http://pir.georgetown.edu/pirwww/search/patmatch.html)
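
A PROSITE pattern is close to a regular expression, so a pattern search can be sketched by translating the pattern syntax and scanning a sequence. The translation below covers the common elements (`x` for any residue, `[..]` for allowed residues, `{..}` for excluded residues, `(n)` or `(n,m)` for repeats, `<` and `>` for the termini); the function names are made up for this example, and this is not the code behind the search page linked above. The example uses the N-glycosylation site pattern `N-{P}-[ST]-{P}`.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE-style pattern into a Python regular expression."""
    pattern = pattern.rstrip(".")  # patterns conventionally end with a period
    parts = []
    for element in pattern.split("-"):
        # Order matters: convert {..} before reusing braces for repeats.
        element = element.replace("{", "[^").replace("}", "]")
        element = element.replace("(", "{").replace(")", "}")
        element = element.replace("x", ".")
        element = element.replace("<", "^").replace(">", "$")
        parts.append(element)
    return "".join(parts)


def find_sites(pattern, seq):
    """Start positions (0-based) of non-overlapping matches in seq."""
    regex = re.compile(prosite_to_regex(pattern))
    return [m.start() for m in regex.finditer(seq)]


if __name__ == "__main__":
    print(prosite_to_regex("N-{P}-[ST]-{P}."))
    print(find_sites("N-{P}-[ST]-{P}.", "AANGSAA"))
```

Note that `finditer` reports non-overlapping matches only; overlapping sites would need a lookahead-based scan.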

20

Profile Search

(http://bmerc-www.bu.edu/bioinformatics/profile_request.html)
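
A profile generalizes a fixed pattern by giving every residue a position-specific score. The sketch below builds a simple log-odds profile from an ungapped block of aligned sequences and slides it along a target sequence. The uniform background frequency, the pseudocount of 1, and the function names are all assumptions for illustration; the profile server linked above uses its own construction and scoring.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_profile(aligned, pseudocount=1.0):
    """Per-column log2 odds scores from equal-length, ungapped sequences."""
    background = 1.0 / len(AMINO_ACIDS)  # assumption: uniform background
    total = len(aligned) + pseudocount * len(AMINO_ACIDS)
    profile = []
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned)
        profile.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / background)
            for aa in AMINO_ACIDS
        })
    return profile


def best_window(profile, seq):
    """Return (score, start) of the best-scoring window in seq."""
    width = len(profile)
    best = (float("-inf"), -1)
    for start in range(len(seq) - width + 1):
        score = sum(profile[i][seq[start + i]] for i in range(width))
        best = max(best, (score, start))
    return best


if __name__ == "__main__":
    # Made-up toy block, not taken from any database.
    profile = build_profile(["GKS", "GKT", "GKS"])
    print(best_window(profile, "AAAGKSAAA"))
```

Sequences containing letters outside the 20 standard amino acids would raise a `KeyError` here; a fuller version would need a policy for those.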

21

Hidden Markov Model Search

(http://www.sanger.ac.uk/Software/Pfam/search.shtml)

(http://smart.embl-heidelberg.de)

22

III. Structural Prediction Methods

Signal Peptide (e.g. http://www.cbs.dtu.dk/services/)

Transmembrane Helix (e.g. http://www.cbs.dtu.dk/services/)

2D Prediction (e.g. http://cubic.bioc.columbia.edu/predictprotein/, http://www.compbio.dundee.ac.uk/WWW_Servers/JPred/jpred.html)

3D Modeling (e.g. http://guitar.rockefeller.edu/modeller/modeller.html)

23

Structure Prediction: A Guide

(www.bmm.icnet.uk/people/rob/CCP11BBS/flowchart2.html)

24

Protein Prediction Server

(http://www.cbs.dtu.dk/services/)

25

Signal Peptide Prediction

(http://www.stepc.gr/~synaptic/sigfind.html)

(http://www.cbs.dtu.dk/services/SignalP)

26

Transmembrane Helix

(http://www.cbs.dtu.dk/services/TMHMM/)
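
The TMHMM server is based on a hidden Markov model, which is too large to reproduce here. As a much simpler stand-in, the sketch below uses a sliding-window hydropathy average on the Kyte-Doolittle scale to flag candidate membrane-spanning stretches. The window of 19 residues and the cutoff of 1.6 are commonly cited values for this kind of plot and are treated here as assumptions; the function names are made up for this example.

```python
# Kyte-Doolittle hydropathy values per amino acid.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of each window of the given length along seq."""
    scores = [KYTE_DOOLITTLE[aa] for aa in seq]
    return [sum(scores[i:i + window]) / window
            for i in range(len(seq) - window + 1)]


def candidate_segments(seq, window=19, threshold=1.6):
    """Start positions of windows whose mean hydropathy meets the threshold."""
    return [i for i, value in enumerate(hydropathy_profile(seq, window))
            if value >= threshold]


if __name__ == "__main__":
    # Made-up toy sequence: a hydrophobic core flanked by charged residues.
    toy = "D" * 10 + "L" * 19 + "D" * 10
    print(candidate_segments(toy))
```

A hydropathy plot only suggests where helices might lie; it does not predict topology the way an HMM-based method does.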

27

Protein Structure Prediction

(http://cmgm.stanford.edu/WWW/www_predict.html)

(http://restools.sdsc.edu/biotools/biotools9.html)

28

Structure Prediction Server

(http://cubic.bioc.columbia.edu/predictprotein/)

(http://www.compbio.dundee.ac.uk/WWW_Servers/JPred/jpred.html)

30

IV. Protein Family Databases

Whole Proteins
  PIR: Superfamilies and Families
  COG (Clusters of Orthologous Groups) of Complete Genomes
  ProtoNet: Automated Hierarchical Classification of Proteins

Protein Domains
  Pfam: Alignments and HMM Models of Protein Domains
  SMART: Protein Domain Families

Protein Motifs
  PROSITE: Protein Patterns and Profiles
  BLOCKS: Protein Sequence Motifs and Alignments
  PRINTS: Protein Sequence Motifs and Signatures

Integrated Family Databases
  iProClass: Superfamilies/Families, Domains, Motifs, Rich Links
  InterPro: Integrates Pfam, PRINTS, PROSITE, ProDom, SMART

31

Protein Clustering

(http://www.ncbi.nlm.nih.gov/COG/)

32

Protein Domains

Pfam (http://www.sanger.ac.uk/Software/Pfam/)

SMART (http://smart.embl-heidelberg.de/smart/show_motifs.pl)

33

Protein Motifs

PROSITE is a database of protein families and domains. It consists of biologically significant sites, patterns and profiles. (http://www.expasy.ch/prosite/)

34

Integrated Family Classification

InterPro: An integrated resource unifying PROSITE, PRINTS, ProDom, Pfam, SMART, and TIGRFAMs. (http://www.ebi.ac.uk/interpro/search.html)

35

V. Databases of Protein Functions

Metabolic Pathways, Enzymes, and Compounds
  Enzyme Classification: Classification and Nomenclature of Enzyme-Catalysed Reactions (EC-IUBMB)
  KEGG (Kyoto Encyclopedia of Genes and Genomes): Metabolic Pathways
  LIGAND (at KEGG): Chemical Compounds, Reactions and Enzymes
  EcoCyc: Encyclopedia of E. coli Genes and Metabolism
  MetaCyc: Metabolic Encyclopedia (Metabolic Pathways)
  WIT: Functional Curation and Metabolic Models
  BRENDA: Enzyme Database
  UM-BBD: Microbial Biocatalytic Reactions and Biodegradation Pathways
  Klotho: Collection and Categorization of Biological Compounds

Cellular Regulation and Gene Networks
  EpoDB: Genes Expressed during Human Erythropoiesis
  BIND: Descriptions of interactions, molecular complexes and pathways
  DIP: Catalogs experimentally determined interactions between proteins
  RegulonDB: Escherichia coli Pathways and Regulation

36

KEGG Metabolic & Regulatory Pathways

(http://www.genome.ad.jp/dbget-bin/show_pathway?hsa00590+874)

KEGG is a suite of databases and associated software, integrating our current knowledge on molecular interaction networks, the information of genes and proteins, and of chemical compounds and reactions. (http://www.genome.ad.jp/kegg/kegg2.html)

37

BioCyc (EcoCyc/MetaCyc Metabolic Pathways)

The BioCyc Knowledge Library is a collection of Pathway/Genome Databases (http://biocyc.org/)

38

Protein-Protein Interactions: DIP

(http://dip.doe-mbi.ucla.edu/)

39

Protein-Protein Interaction: BIND

(http://www.bind.ca/)

40

BioCarta Cellular Pathways

(http://www.biocarta.com/index.asp)

41

VI. Databases of Protein Structures

Protein Structure and Classification
  PDB: Structure Determined by X-ray Crystallography and NMR
  CATH: Hierarchical Classification of Protein Domain Structures
  SCOP: Familial and Structural Protein Relationships
  FSSP: Protein Fold Family Database

Protein Sequence-Structure Relationship
  PIR-NRL3D: Protein Sequence-Structure Database
  PIR-RESID: Protein Structure/Post-Translational Modifications
  HSSP: Families and Alignments of Structurally-Conserved Regions

42

PDB Structure Data

(http://www.rcsb.org/pdb/)

43

PDBsum: Summary and Analysis

(http://www.biochem.ucl.ac.uk/bsm/pdbsum)

44

Protein Structural Classification

CATH: Hierarchical domain classification of protein structures (http://www.biochem.ucl.ac.uk/bsm/cath_new/)

45

Protein Structural Classification

(http://scop.mrc-lmb.cam.ac.uk/scop/)

The SCOP database aims to provide a detailed and comprehensive description of the structural and evolutionary relationships between all proteins whose structure is known, including all entries in the PDB.

46

Proteomic Resources

GELBANK (http://gelbank.anl.gov): 2D-gel patterns from completed genomes; SWISS-2DPAGE (http://www.expasy.org/ch2d/)

PEP: Predictions for Entire Proteomes (http://cubic.bioc.columbia.edu/pep/): Summarized analyses of protein sequences

Proteome BioKnowledge Library (http://www.proteome.com): Detailed information on human, mouse and rat proteomes

Proteome Analysis Database (http://www.ebi.ac.uk/proteome/): Online application of InterPro and CluSTr for the functional classification of proteins in whole genomes

Expression Profiling databases:
  GNF (http://expression.gnf.org/cgi-bin/index.cgi): human and mouse transcriptome
  SMD (http://genome-www5.stanford.edu/MicroArray/SMD/): Stanford microarray data analysis
  EBI Microarray Informatics (http://www.ebi.ac.uk/microarray/index.html): managing, storing and analyzing microarray data

47

VII. 2D-Gel Image Databases

(http://www-lecb.ncifcrf.gov/2dwgDB)

(http://gelbank.anl.gov/2dgels/index.asp) (2D-gel of human ventricle proteins)

48

VIII. Proteome Analysis

(http://www.ebi.ac.uk/proteome)

49

Expression Profiling

Human and Mouse Transcriptome (http://expression.gnf.org/cgi-bin/index.cgi)

(http://genome-www.stanford.edu/serum/)

50

Lab: Visit selected websites and analyze some protein sequences of your own choice.

A list of the bioinformatics resources in this tutorial is available at: http://pir.georgetown.edu/~huz/bioinfo_resource.html

Try some of the following sequences for analysis:
  1) well characterized proteins: PIR:A26366 (CYP17), JS0747 (Sp1)
  2) less characterized proteins: PIR:A59000 (MATER), TrEMBL:Q9QY16 (GRTH)
  3) hypothetical proteins: PIR:T12515, T00338, T47130; SWISS-PROT:Q9BWT7
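
Before submitting a sequence to any of the servers, it can help to check it locally. The sketch below parses FASTA-formatted text and reports amino acid composition. The function names are made up for this example, and the sequence shown is a made-up placeholder, not one of the entries listed above; the real sequences would need to be retrieved from the databases by their identifiers.

```python
from collections import Counter


def parse_fasta(text):
    """Return {identifier: sequence} from FASTA-formatted text."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Use the first word of the header line as the identifier.
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def composition(seq):
    """Fraction of each residue type present in seq."""
    counts = Counter(seq)
    return {aa: counts[aa] / len(seq) for aa in sorted(counts)}


if __name__ == "__main__":
    # Placeholder record for illustration only.
    demo = ">demo hypothetical example\nMKTAYIAK\nQRQISFVK\n"
    for name, seq in parse_fasta(demo).items():
        print(name, len(seq), composition(seq))
```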