Biology 224, Dr. Tom Peavy, Sept 28 & 30
Protein Structure & Analysis
(Images from Bioinformatics and Functional Genomics by Jonathan Pevsner)
Aspects of a protein to analyze:
• Protein families
• Protein function
• Physical properties
• Protein localization
• Gene ontology (GO): cellular component, biological process, molecular function
The Human Proteome Organisation (HUPO) Proteomics Standards Initiative (PSI)
Work groups
• Gel Electrophoresis
• Mass Spectrometry
• Molecular Interactions
• Protein Modifications
• Proteomics Informatics
• Sample Processing
Themes
• Controlled vocabularies
• MIAPE: Minimum Information About a Proteomics Experiment
HUPO PSI website: http://www.psidev.info/
Protein domains, motifs & signatures
Definitions
Signature:
• a protein category such as a domain or motif (a defining property of the protein or family)
Domain:
• a region of a protein that can adopt a 3D structure
• a fold
• a family is a group of proteins that share a domain
• examples: zinc finger domain, immunoglobulin domain
Motif (or fingerprint):
• a short, conserved region of a protein
• typically 10 to 20 contiguous amino acid residues
Definition of a domain
According to InterPro at EBI (http://www.ebi.ac.uk/interpro/):
A domain is an independent structural unit, found alone or in conjunction with other domains or repeats. Domains are evolutionarily related.
According to SMART (http://smart.embl-heidelberg.de):
A domain is a conserved structural entity with distinctive secondary structure content and a hydrophobic core. Homologous domains with common functions usually show sequence similarities.
15 most common domains in human proteins (number of proteins containing the domain):
• Zn finger, C2H2 type: 1093
• Immunoglobulin: 1032
• EGF-like: 471
• Zn finger, RING: 458
• Homeobox: 417
• Pleckstrin-like: 405
• RNA-binding region RNP-1: 400
• SH3: 394
• Calcium-binding EF-hand: 392
• Fibronectin, type III: 300
• PDZ/DHR/GLGF: 280
• Small GTP-binding protein: 261
• BTB/POZ: 236
• bHLH: 226
• Cadherin: 226
Varieties of protein domains
Extending along the length of a protein
Occupying a subset of a protein sequence
Occurring one or more times
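These three varieties can be made concrete with a small sketch. It assumes domain annotations are given as hypothetical (name, start, end) tuples with 1-based inclusive coordinates, and it uses an arbitrary 90% cutoff for calling a domain "full-length"; real resources such as SMART or InterPro report architectures in their own formats.

```python
from collections import Counter


def classify_domains(length, domains, full_length_fraction=0.9):
    """Return (name, start, end, extent, copies) for each domain, N- to C-terminal.

    extent is "full-length" when the domain covers at least `full_length_fraction`
    of the protein (an arbitrary, assumed cutoff), otherwise "subset";
    copies is how many times that domain occurs in the protein.
    """
    copies = Counter(name for name, _, _ in domains)
    report = []
    for name, start, end in sorted(domains, key=lambda d: d[1]):
        span = end - start + 1  # coordinates are 1-based and inclusive
        extent = "full-length" if span >= full_length_fraction * length else "subset"
        report.append((name, start, end, extent, copies[name]))
    return report


def architecture(domains):
    """Domain architecture as a string in N- to C-terminal order, e.g. 'SH3-SH3-PDZ'."""
    return "-".join(name for name, _, _ in sorted(domains, key=lambda d: d[1]))


# Hypothetical annotations for a made-up 400-residue protein
example = [("PDZ", 200, 290), ("SH3", 10, 70), ("SH3", 100, 160)]
print(architecture(example))  # SH3-SH3-PDZ
for row in classify_domains(400, example):
    print(row)
```

Sorting by start position is what turns an unordered annotation list into an architecture that can be compared between proteins.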
Example of a protein with domains: Methyl CpG binding protein 2 (MeCP2)
Figure: MeCP2 domain diagram (MBD, TRD)
The protein includes a methylated DNA binding domain (MBD) and a transcriptional repression domain (TRD). MeCP2 is a transcriptional repressor.
Mutations in the gene encoding MeCP2 cause Rett syndrome, a neurological disorder that primarily affects girls.
Result of an MeCP2 blastp search: a methyl-binding domain shared by several proteins
Are proteins that share only a domain homologous?
Proteins can have both domains and patterns (motifs)
• Domain (aspartyl protease)
• Domain (reverse transcriptase)
• Pattern (several residues)
• Pattern (several residues)
The SwissProt entry for any protein provides highly useful information…
SwissProt entry for HIV-1 pol links to many databases
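In the Swiss-Prot flat-file (text) format, these links to other databases appear as DR (database cross-reference) lines. A minimal sketch that groups them by target database, assuming the usual layout of a two-letter line code, three spaces, then semicolon-separated fields ending in a period. The sample entry below uses made-up identifiers, not real accessions.

```python
from collections import defaultdict


def cross_references(entry_text):
    """Group the DR lines of a Swiss-Prot flat-file entry by target database."""
    refs = defaultdict(list)
    for line in entry_text.splitlines():
        if not line.startswith("DR   "):
            continue
        # Data begins in column 6; fields are separated by ';' and the line ends with '.'
        fields = [field.strip() for field in line[5:].rstrip(".").split(";")]
        database, primary_id = fields[0], fields[1]
        refs[database].append(primary_id)
    return dict(refs)


# Placeholder entry: every identifier below is made up for illustration
SAMPLE_ENTRY = """ID   EXAMPLE_HUMAN           Reviewed;         100 AA.
DR   EMBL; X00001; CAA00001.1; -; mRNA.
DR   PDB; 1ABC; X-ray; 2.00 A; A=1-100.
DR   PDB; 2XYZ; NMR; -; A=1-100.
DR   Pfam; PF00000; Example_domain; 1.
//"""

print(cross_references(SAMPLE_ENTRY))
# {'EMBL': ['X00001'], 'PDB': ['1ABC', '2XYZ'], 'Pfam': ['PF00000']}
```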
Definition of a motif
A motif (or fingerprint) is a short, conserved region of a protein. Its size is often 10 to 20 amino acids.
Simple motifs include transmembrane domains and phosphorylation sites. These do not imply homology when found in a group of proteins.
PROSITE (www.expasy.org/prosite) is a dictionary of motifs (about 1600 entries at the time of this lecture). In PROSITE, a pattern is a qualitative motif description (a protein either matches a pattern or it does not). In contrast, a profile is a quantitative motif description. Profiles are found in Pfam, ProDom, SMART, and other databases.
Page 231-233
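The qualitative vs quantitative distinction can be sketched in code. The first function translates a PROSITE-style pattern into a regular expression (a yes/no match); the second slides a position-specific score table along a sequence (a graded score). The pattern is the N-glycosylation site pattern (PROSITE PS00001); the three-column profile is made up, and real profiles (e.g. Pfam HMMs) are far richer than this toy table.

```python
import re


def prosite_to_regex(pattern):
    """Translate a PROSITE pattern such as 'N-{P}-[ST]-{P}.' into a Python regex.

    Covers x, [..], {..}, (n) and (n,m) repeats, and the < / > terminal anchors;
    rarer corner cases of the syntax (e.g. '>' inside brackets) are not handled.
    """
    parts = []
    for element in pattern.strip().rstrip(".").split("-"):
        prefix = suffix = ""
        if element.startswith("<"):
            prefix, element = "^", element[1:]
        if element.endswith(">"):
            suffix, element = "$", element[:-1]
        repeat = ""
        m = re.fullmatch(r"(.+?)\((\d+(?:,\d+)?)\)", element)
        if m:
            element, repeat = m.group(1), "{" + m.group(2) + "}"
        if element == "x":
            core = "."                         # any residue
        elif element.startswith("{"):
            core = "[^" + element[1:-1] + "]"  # any residue except these
        else:
            core = element                     # one residue or [..], same in regex
        parts.append(prefix + core + repeat + suffix)
    return "".join(parts)


def find_pattern(pattern, sequence):
    """Pattern (qualitative): 1-based start of every match, overlaps included."""
    regex = re.compile("(?=(" + prosite_to_regex(pattern) + "))")
    return [m.start() + 1 for m in regex.finditer(sequence)]


def best_profile_hit(profile, sequence, default=-1.0):
    """Profile (quantitative): slide a position-specific score table along the
    sequence and return (best score, 1-based start). Residues missing from a
    column score `default`."""
    width = len(profile)
    best_score, best_start = float("-inf"), 0
    for i in range(len(sequence) - width + 1):
        window = sequence[i:i + width]
        score = sum(column.get(aa, default) for column, aa in zip(profile, window))
        if score > best_score:
            best_score, best_start = score, i + 1
    return best_score, best_start


print(find_pattern("N-{P}-[ST]-{P}.", "MNGSANPSA"))  # [2]
TOY_PROFILE = [{"N": 3.0}, {"G": 1.0, "A": 0.5}, {"S": 2.0, "T": 2.0}]
print(best_profile_hit(TOY_PROFILE, "MNGSANPSA"))    # (6.0, 2)
```

The window "NPS" at position 6 is rejected outright by the pattern (P is forbidden), while the profile still gives it a lower but nonzero score of 4.0; that graded behavior is what makes profiles more sensitive to divergent family members.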
http://www.ebi.ac.uk/Databases/
ExPASy Proteomics Server: The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE.
http://ca.expasy.org/
InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.
http://www.ebi.ac.uk/interpro/
InterPro
PROSITE: database of protein families and domains http://ca.expasy.org/prosite/
Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains. http://www.sanger.ac.uk/Software/Pfam/index.shtml
PRINTS is a compendium of protein fingerprints http://umber.sbs.man.ac.uk/dbbrowser/PRINTS/
The ProDom protein domain database consists of an automatic compilation of homologous domains. http://prodes.toulouse.inra.fr/prodom/current/html/home.php
Page 231
ProDom entry for HIV-1 pol shows many related proteins
SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures.
http://smart.embl-heidelberg.de/
PIR (Protein Information Resource): houses the PIRSF, ProClass and ProLINK databases http://pir.georgetown.edu/
www.uniprot.org
Three protein databases were merged to form UniProt:
• SwissProt
• TrEMBL (translated EMBL nucleotide sequence database)
• Protein Information Resource (PIR)
You can search for information on your favorite protein there; a BLAST server is provided.
Page 230
1. Go to ExPASy (http://www.expasy.ch/).
2. If you know the SwissProt accession of your protein, enter it at the top.
3. Otherwise, go into Swiss-Prot/TrEMBL, click SRS (Sequence Retrieval System), click Start, then Continue, then search for your protein of interest.
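The SRS route above reflects the web interface of the time. As a sketch of retrieving an entry programmatically by accession, the code below assumes UniProt's REST endpoint of the form https://rest.uniprot.org/uniprotkb/{accession}.fasta and includes a small FASTA parser that works offline on any FASTA text.

```python
import urllib.request

# Assumed URL template for UniProt's REST service (FASTA output)
UNIPROT_FASTA_URL = "https://rest.uniprot.org/uniprotkb/{accession}.fasta"


def fetch_fasta(accession):
    """Download one UniProtKB entry as FASTA text (requires network access)."""
    with urllib.request.urlopen(UNIPROT_FASTA_URL.format(accession=accession)) as response:
        return response.read().decode("utf-8")


def parse_fasta(text):
    """Return a list of (header, sequence) pairs from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Usage (network required), e.g. with P04637, the accession of human p53:
# for header, seq in parse_fasta(fetch_fasta("P04637")):
#     print(header, len(seq))
```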
Protein family classification and databases
• PIRSF: http://pir.georgetown.edu/iproclass/
• TIGRFAMs: http://www.tigr.org/TIGRFAMs/index.shtml
• SUPERFAMILY: http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/
• Gene3D: http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/
• PANTHER: http://www.pantherdb.org/
Physical properties of proteins
Many websites are available for the analysis of individual proteins. ExPASy and ISREC are two excellent resources.
The accuracy of these programs varies. Predictions based on the primary amino acid sequence (such as molecular weight) are likely to be more trustworthy. For many other properties (such as posttranslational modification of proteins by specific sugars), experimental evidence may be required rather than prediction algorithms.
Page 236
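Molecular weight is a good example of a prediction that follows directly from the primary sequence: it is the sum of the residue masses plus one water for the free termini. A minimal sketch using standard average residue masses; tools such as ExPASy's may use slightly different mass tables, and this calculation ignores posttranslational modifications.

```python
# Average residue masses in daltons (free amino acid minus one water)
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # added once, for the free N- and C-termini


def molecular_weight(sequence):
    """Average molecular weight (Da) of an unmodified polypeptide."""
    seq = sequence.upper()
    unknown = set(seq) - set(AVG_RESIDUE_MASS)
    if unknown:
        raise ValueError(f"unsupported residue codes: {sorted(unknown)}")
    return sum(AVG_RESIDUE_MASS[aa] for aa in seq) + WATER


print(round(molecular_weight("GG"), 2))  # 132.12 (glycylglycine)
```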
Page 230
http://www.expasy.ch/
Page 235
Access a variety of protein analysis programs from the top right of the ExPASy home page
Page 244
Page 244
Proteomics: High throughput protein analysis
Proteomics is the study of the entire collection of proteins encoded by a genome
“Proteomics” refers to all the proteins in a cell and/or all the proteins in an organism
Large-scale protein analysis:
• 2D protein gels
• Yeast two-hybrid
• Rosetta Stone approach
• Pathways
Page 247
Two-dimensional protein gels
First dimension: isoelectric focusing
Second dimension: SDS-PAGE
Page 248
Two-dimensional protein gels
First dimension: isoelectric focusing
Electrophorese ampholytes to establish a pH gradient
Can use a pre-made strip
Proteins migrate to their isoelectric point (pI), then stop (net charge is zero)
Range of pI typically 4-9 (5-8 most common)
Page 248
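Because a protein stops where its net charge is zero, its pI can be estimated from sequence: sum the Henderson-Hasselbalch charge of each ionizable group and find the pH where the total crosses zero. A minimal sketch; the pKa values are one assumed textbook-style set, and published tools use different sets, so their predicted pI values differ.

```python
# Assumed pKa values; other tools use other sets and get somewhat different pI values
PKA_POSITIVE = {"Nterm": 9.69, "K": 10.5, "R": 12.4, "H": 6.0}
PKA_NEGATIVE = {"Cterm": 2.34, "D": 3.86, "E": 4.25, "C": 8.33, "Y": 10.07}


def net_charge(sequence, ph):
    """Henderson-Hasselbalch estimate of a polypeptide's net charge at a given pH."""
    seq = sequence.upper()
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["Nterm"] = counts["Cterm"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in PKA_POSITIVE.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in PKA_NEGATIVE.items())
    return positive - negative


def isoelectric_point(sequence, low=0.0, high=14.0, tol=1e-4):
    """Bisection: net charge falls as pH rises, so locate where it crosses zero."""
    while high - low > tol:
        mid = (low + high) / 2
        if net_charge(sequence, mid) > 0:
            low = mid   # still positive: the pI lies at higher pH
        else:
            high = mid
    return (low + high) / 2


print(round(isoelectric_point("KKKK"), 2))  # basic peptide, high pI
print(round(isoelectric_point("DDDD"), 2))  # acidic peptide, low pI
```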
Two-dimensional protein gels
Second dimension: SDS-PAGE
Electrophorese proteins through an acrylamide matrix
SDS coats the proteins with negative charge, so they migrate through the electric field according to size (molecular weight)
Conditions are denaturing (SDS) and reducing (2-mercaptoethanol)
Can resolve hundreds to thousands of proteins
Page 248
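In the second dimension, a protein's size is usually read off a standard curve: over the gel's resolving range, log10(MW) is approximately linear in relative mobility (Rf, distance migrated divided by dye-front distance). A minimal sketch of fitting marker proteins by least squares and estimating an unknown band; the marker Rf values below are hypothetical.

```python
import math


def fit_standard_curve(markers):
    """Least-squares fit of log10(MW) = a + b * Rf from (Rf, MW) marker pairs."""
    xs = [rf for rf, _ in markers]
    ys = [math.log10(mw) for _, mw in markers]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx           # slope (negative: larger proteins migrate less)
    a = mean_y - b * mean_x
    return a, b


def estimate_mw(rf, curve):
    """Molecular weight (same units as the markers) predicted for a band at `rf`."""
    a, b = curve
    return 10 ** (a + b * rf)


# Hypothetical ladder: (Rf, MW in Da)
ladder = [(0.15, 97000), (0.30, 66000), (0.45, 45000),
          (0.62, 30000), (0.80, 20100), (0.95, 14400)]
curve = fit_standard_curve(ladder)
print(round(estimate_mw(0.50, curve)))  # estimated MW of an unknown band at Rf 0.50
```

Estimates outside the range spanned by the markers are extrapolations and are less reliable.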
Proteins identified on 2D gels (IEF/SDS-PAGE)
Direct protein microsequencing by Edman degradation
-- done at many core facilities (e.g., UC Davis)
-- typically need 5 picomoles
-- often get 10 to 20 amino acids sequenced
Protein mass analysis by MALDI-TOF
-- done at core facilities
-- often detects posttranslational modifications
-- MALDI-TOF: matrix-assisted laser desorption/ionization time-of-flight mass spectrometry
Page 250-1
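A common way to identify a gel spot with MALDI-TOF is peptide mass fingerprinting: the protein is digested with trypsin, the peptide masses are measured, and they are compared with masses predicted for candidate sequences. A minimal sketch of the prediction side, assuming the classic trypsin rule (cleave after K or R unless the next residue is P), no missed cleavages, no modifications, singly protonated [M+H]+ ions, and an arbitrary 0.5 Da tolerance.

```python
import re

# Monoisotopic residue masses (Da)
MONO_RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def trypsin_digest(sequence):
    """Split after K or R unless followed by P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", sequence.upper()) if p]


def mh_mass(peptide):
    """Monoisotopic mass of the singly protonated peptide, [M+H]+."""
    return sum(MONO_RESIDUE_MASS[aa] for aa in peptide) + WATER + PROTON


def match_peaks(sequence, observed, tolerance=0.5):
    """Theoretical tryptic peptides whose [M+H]+ lies within `tolerance` of an observed peak."""
    hits = []
    for peptide in trypsin_digest(sequence):
        mass = mh_mass(peptide)
        if any(abs(mass - peak) <= tolerance for peak in observed):
            hits.append((peptide, mass))
    return hits


print(trypsin_digest("AGKLLRPAKW"))  # ['AGK', 'LLRPAK', 'W']
print(match_peaks("AGKLLRPAKW", [205.1]))
```

Real search tools score how many peaks match each database protein; this sketch only lists the matches for one candidate sequence.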
Page 252
Evaluation of 2D gels (IEF/SDS-PAGE)
Advantages:
• Visualize hundreds to thousands of proteins
• Improved identification of protein spots
Disadvantages:
• Limited number of samples can be processed
• Mostly abundant proteins are visualized
• Technically difficult
Page 251
Gene Ontology (GO) Consortium
The Gene Ontology Consortium
An ontology is a description of concepts. The GO Consortium compiles a dynamic, controlled vocabulary of terms related to gene products.
There are three organizing principles:
• Molecular function
• Biological process
• Cellular component
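GO terms are arranged as a graph in which each term has one or more parents, ending at the three root terms. By GO convention (often called the true path rule), annotating a gene product to a term also annotates it to all of that term's ancestors. A minimal sketch over a simplified, hypothetical fragment of the graph: real GO terms carry GO:nnnnnnn identifiers and typed relations (is_a, part_of, ...), which this sketch ignores.

```python
# Simplified, hypothetical fragment of the GO graph (child term -> parent terms)
PARENTS = {
    "electron transporter activity": ["molecular_function"],
    "oxidative phosphorylation": ["biological_process"],
    "induction of cell death": ["biological_process"],
    "mitochondrial inner membrane": ["mitochondrial membrane"],
    "mitochondrial membrane": ["mitochondrion"],
    "mitochondrial matrix": ["mitochondrion"],
    "mitochondrion": ["cellular_component"],
}
ROOTS = {"molecular_function", "biological_process", "cellular_component"}


def ancestors(term):
    """All terms reachable by following parent links upward from `term`."""
    seen, stack = set(), [term]
    while stack:
        for parent in PARENTS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(terms):
    """A gene product annotated to a term is implicitly annotated to all its ancestors."""
    result = set(terms)
    for term in terms:
        result |= ancestors(term)
    return result


def aspect(term):
    """Which of the three organizing principles (root terms) a term falls under."""
    if term in ROOTS:
        return term
    roots = ancestors(term) & ROOTS
    return roots.pop() if len(roots) == 1 else None


print(sorted(propagate(["mitochondrial inner membrane"])))
print(aspect("oxidative phosphorylation"))  # biological_process
```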
Page 241
GO terms are assigned to Entrez Gene entries
Page 241
Example: GO terms for the gene product cytochrome c
• molecular function: electron transporter activity
• biological process: oxidative phosphorylation and induction of cell death
• cellular component: mitochondrial matrix and mitochondrial inner membrane
GO consortium (http://www.geneontology.org): there is no centralized GO database. Instead, curators of organism-specific databases assign GO terms to gene products for each organism.
AmiGO is the search interface for GO: gene symbols, gene names, UniProt accession numbers, and free-text searches can be used to find GO entries.
The Gene Ontology Consortium: Evidence Codes
• IC: Inferred by curator
• IDA: Inferred from direct assay
• IEA: Inferred from electronic annotation
• IEP: Inferred from expression pattern
• IGI: Inferred from genetic interaction
• IMP: Inferred from mutant phenotype
• IPI: Inferred from physical interaction
• ISS: Inferred from sequence or structural similarity
• NAS: Non-traceable author statement
• ND: No biological data
• TAS: Traceable author statement
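Evidence codes let users decide how much to trust an annotation, for example by keeping only experimentally supported terms or by dropping electronic (IEA) annotations. A minimal sketch of such filtering. The evidence codes attached to the cytochrome c terms here are hypothetical, made up for illustration rather than copied from a real annotation file.

```python
from collections import defaultdict

# The evidence codes from the list above
ALL_CODES = {"IC", "IDA", "IEA", "IEP", "IGI", "IMP", "IPI", "ISS", "NAS", "ND", "TAS"}
# Of those, the codes that rest on experimental evidence
EXPERIMENTAL = {"IDA", "IEP", "IGI", "IMP", "IPI"}

# Hypothetical (term, aspect, evidence code) triples for cytochrome c
EXAMPLE_ANNOTATIONS = [
    ("electron transporter activity", "molecular_function", "IDA"),
    ("oxidative phosphorylation", "biological_process", "TAS"),
    ("induction of cell death", "biological_process", "IMP"),
    ("mitochondrial matrix", "cellular_component", "IEA"),
    ("mitochondrial inner membrane", "cellular_component", "IDA"),
]


def filter_annotations(annotations, keep=EXPERIMENTAL):
    """Group (term, aspect, evidence) triples by aspect, keeping only the chosen codes."""
    by_aspect = defaultdict(list)
    for term, go_aspect, evidence in annotations:
        if evidence in keep:
            by_aspect[go_aspect].append(term)
    return dict(by_aspect)


print(filter_annotations(EXAMPLE_ANNOTATIONS))                           # experimental only
print(filter_annotations(EXAMPLE_ANNOTATIONS, keep=ALL_CODES - {"IEA"}))  # drop electronic
```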