Biology 224 Instructor: Tom Peavy

Feb 21 & 26, 2008

<Images from Bioinformatics and Functional Genomics by Jonathan Pevsner>

Protein Structure & Analysis

protein

Protein families

Protein function

Physical properties

Protein localization

Gene ontology (GO):
-- cellular component
-- biological process
-- molecular function

Protein domains, motifs & signatures

Definitions

Signature:
• a protein category such as a domain or motif

Domain:
• a region of a protein that can adopt a 3D structure
• a fold
• a family is a group of proteins that share a domain
• examples: zinc finger domain, immunoglobulin domain

Motif (or fingerprint):
• a short, conserved region of a protein
• typically 10 to 20 contiguous amino acid residues

Definition of a domain

According to InterPro at EBI (http://www.ebi.ac.uk/interpro/):

A domain is an independent structural unit, found alone or in conjunction with other domains or repeats. Domains are evolutionarily related.

According to SMART (http://smart.embl-heidelberg.de):

A domain is a conserved structural entity with distinctive secondary structure content and a hydrophobic core. Homologous domains with common functions usually show sequence similarities.

15 most common domains (human)

Domain                          Proteins
Zn finger, C2H2 type            1093
Immunoglobulin                  1032
EGF-like                        471
Zn-finger, RING                 458
Homeobox                        417
Pleckstrin-like                 405
RNA-binding region RNP-1        400
SH3                             394
Calcium-binding EF-hand         392
Fibronectin, type III           300
PDZ/DHR/GLGF                    280
Small GTP-binding protein       261
BTB/POZ                         236
bHLH                            226
Cadherin                        226

Varieties of protein domains

Extending along the length of a protein

Occupying a subset of a protein sequence

Occurring one or more times

Example of a protein with domains: Methyl CpG binding protein 2 (MeCP2)

<Figure: MeCP2 domain diagram showing the MBD and TRD>

The protein includes a methylated DNA binding domain (MBD) and a transcriptional repression domain (TRD). MeCP2 is a transcriptional repressor.

Mutations in the gene encoding MeCP2 cause Rett syndrome, a neurological disorder primarily affecting girls.

Result of an MeCP2 blastp search: a methyl-binding domain shared by several proteins

Are proteins that share only a domain homologous?

Proteins can have both domains and patterns (motifs)

Domain (aspartyl protease)

Domain (reverse transcriptase)

Pattern (several residues)

Pattern (several residues)

SwissProt entry for HIV-1 pol links to many databases

http://www.ebi.ac.uk/Databases/

ExPASy Proteomics Server: The ExPASy (Expert Protein Analysis System) proteomics server of the Swiss Institute of Bioinformatics (SIB) is dedicated to the analysis of protein sequences and structures as well as 2-D PAGE.

http://ca.expasy.org/

InterPro is a database of protein families, domains and functional sites in which identifiable features found in known proteins can be applied to unknown protein sequences.

http://www.ebi.ac.uk/interpro/

InterPro

PROSITE: Database of protein families and domains
http://ca.expasy.org/prosite/

Pfam is a large collection of multiple sequence alignments and hidden Markov models covering many common protein domains.
http://www.sanger.ac.uk/Software/Pfam/index.shtml

PRINTS is a compendium of protein fingerprints.
http://umber.sbs.man.ac.uk/dbbrowser/PRINTS/

The ProDom protein domain database consists of an automatic compilation of homologous domains.
http://prodes.toulouse.inra.fr/prodom/current/html/home.php

SMART (a Simple Modular Architecture Research Tool) allows the identification and annotation of genetically mobile domains and the analysis of domain architectures.

http://smart.embl-heidelberg.de/

The Protein Information Resource (PIR) houses the PIRSF, ProClass and ProLINK databases.
http://pir.georgetown.edu/

Protein family classification and databases

PIRSF: http://pir.georgetown.edu/iproclass/
TIGRFAMs: http://www.tigr.org/TIGRFAMs/index.shtml
SUPERFAMILY: http://supfam.mrc-lmb.cam.ac.uk/SUPERFAMILY/
Gene3D: http://www.biochem.ucl.ac.uk/bsm/cath/Gene3D/
PANTHER: http://www.pantherdb.org/

Definition of a motif

A motif (or fingerprint) is a short, conserved region of a protein. Its size is often 10 to 20 amino acids.

Simple motifs include transmembrane domains and phosphorylation sites. These do not imply homology when found in a group of proteins.
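One way to see why a short motif such as a phosphorylation site says little about homology is to estimate how often it would occur by chance. The sketch below assumes a random sequence in which all 20 amino acids are equally likely and independent (real compositions differ), and uses a PKC-style site written as [ST]-x-[RK] purely for illustration; the function names are hypothetical.

```python
"""Expected chance matches of a short motif in a random protein sequence.

Assumption: residues are independent and uniformly distributed over the
20 standard amino acids; real proteins deviate from this.
"""
from fractions import Fraction

N_AMINO_ACIDS = 20


def position_probability(allowed: int) -> Fraction:
    """Probability that one random residue falls in a set of `allowed` residues."""
    return Fraction(allowed, N_AMINO_ACIDS)


def expected_matches(allowed_per_position: list[int], protein_length: int) -> Fraction:
    """Expected number of motif occurrences in a random sequence.

    allowed_per_position: for each motif position, how many residues are
    accepted (20 means 'any residue', an x in PROSITE notation).
    """
    p = Fraction(1)
    for allowed in allowed_per_position:
        p *= position_probability(allowed)
    # Number of start positions at which the whole motif fits.
    windows = max(protein_length - len(allowed_per_position) + 1, 0)
    return p * windows


# Illustrative motif [ST]-x-[RK]: 2 choices, any residue, 2 choices.
motif = [2, 20, 2]
print(float(expected_matches(motif, 400)))
```

Under these assumptions a 400-residue protein is expected to contain about 3.98 chance matches, so finding this kind of site in several proteins is not evidence that they share ancestry.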

In PROSITE, a pattern is a qualitative motif description (a protein either matches a pattern, or not).
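Because a PROSITE pattern gives a yes/no answer, it behaves much like a regular expression. The sketch below translates the basic PROSITE syntax (x for any residue, [..] for allowed residues, {..} for excluded residues, (n) or (n,m) repeats, < and > terminal anchors, final period) into a Python regex. It is not an official PROSITE tool and skips rarer syntax such as > inside brackets; the zinc-finger pattern string is modelled on the PROSITE C2H2 signature and is used only for illustration, and the demo sequence is made up.

```python
import re

# One PROSITE element: a residue, x, [allowed] or {excluded}, plus optional (n) or (n,m).
_ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Translate a basic PROSITE pattern into a Python regular expression."""
    pattern = pattern.strip().rstrip(".")
    prefix = suffix = ""
    if pattern.startswith("<"):  # anchored at the N-terminus
        prefix, pattern = "^", pattern[1:]
    if pattern.endswith(">"):  # anchored at the C-terminus
        suffix, pattern = "$", pattern[:-1]
    parts = []
    for element in pattern.split("-"):
        m = _ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported PROSITE element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # excluded residues
        else:
            regex = core  # single residue or [..] set is already valid regex
        if low is not None:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return prefix + "".join(parts) + suffix


def matches(pattern: str, sequence: str) -> bool:
    """Qualitative test: the sequence either contains the pattern or it does not."""
    return re.search(prosite_to_regex(pattern), sequence) is not None


zinc_finger = "C-x(2,4)-C-x(3)-[LIVMFYWC]-x(8)-H-x(3,5)-H."
# A made-up sequence built to satisfy the pattern.
demo = "AA" + "C" + "AA" + "C" + "AAA" + "F" + "A" * 8 + "H" + "AAA" + "H" + "AA"
print(prosite_to_regex(zinc_finger))  # C.{2,4}C.{3}[LIVMFYWC].{8}H.{3,5}H
print(matches(zinc_finger, demo))  # True
print(matches(zinc_finger, demo.replace("F", "A")))  # False
```

Replacing the single F with A breaks the [LIVMFYWC] position, and the match disappears: there is no partial score, which is exactly the qualitative behaviour described above.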

Gene Ontology (GO) Consortium

The Gene Ontology Consortium

An ontology is a description of concepts. The GO Consortium compiles a dynamic, controlled vocabulary of terms related to gene products.

There are three organizing principles:
• Molecular function
• Biological process
• Cellular component

Example

Gene product cytochrome c, GO entry terms:

molecular function = electron transporter activity

biological process = oxidative phosphorylation and induction of cell death

cellular component = mitochondrial matrix and mitochondrial inner membrane
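The three organizing principles map naturally onto a record keyed by aspect. The sketch below stores the cytochrome c terms from this slide under GO's namespace names (molecular_function, biological_process, cellular_component); the GeneProduct class and its annotate method are hypothetical, and GO term identifiers are deliberately omitted.

```python
from dataclasses import dataclass, field

# The three GO organizing principles, spelled as GO namespace names.
ASPECTS = ("molecular_function", "biological_process", "cellular_component")


@dataclass
class GeneProduct:
    """Hypothetical container: one list of term names per GO aspect."""
    name: str
    annotations: dict[str, list[str]] = field(
        default_factory=lambda: {aspect: [] for aspect in ASPECTS}
    )

    def annotate(self, aspect: str, term: str) -> None:
        """Attach a term to one aspect, rejecting anything outside the three."""
        if aspect not in ASPECTS:
            raise ValueError(f"unknown GO aspect: {aspect}")
        self.annotations[aspect].append(term)


cyt_c = GeneProduct("cytochrome c")
cyt_c.annotate("molecular_function", "electron transporter activity")
cyt_c.annotate("biological_process", "oxidative phosphorylation")
cyt_c.annotate("biological_process", "induction of cell death")
cyt_c.annotate("cellular_component", "mitochondrial matrix")
cyt_c.annotate("cellular_component", "mitochondrial inner membrane")

for aspect in ASPECTS:
    print(f"{aspect}: {', '.join(cyt_c.annotations[aspect])}")
```

A single gene product can carry several terms per aspect, as cytochrome c does for biological process and cellular component.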

GO Consortium (http://www.geneontology.org): there is no centralized GO database. Instead, curators of organism-specific databases assign GO terms to gene products for each organism.

AmiGO is the searchable portion of GO: gene symbol, gene name, UniProt accession number, and text searches can be used to find GO entries.

The Gene Ontology Consortium: Evidence Codes

IC    Inferred by curator
IDA   Inferred from direct assay
IEA   Inferred from electronic annotation
IEP   Inferred from expression pattern
IGI   Inferred from genetic interaction
IMP   Inferred from mutant phenotype
IPI   Inferred from physical interaction
ISS   Inferred from sequence or structural similarity
NAS   Non-traceable author statement
ND    No biological data
TAS   Traceable author statement
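Evidence codes let a user decide how much to trust each annotation. A common choice is to set aside electronically inferred (IEA) annotations; the sketch below shows that filtering step using the codes listed on this slide (the current GO list has changed since 2008). The function name and the example annotations are hypothetical placeholders, not real GO records.

```python
# Evidence codes as listed on this slide (2008).
EVIDENCE_CODES = {
    "IC": "Inferred by curator",
    "IDA": "Inferred from direct assay",
    "IEA": "Inferred from electronic annotation",
    "IEP": "Inferred from expression pattern",
    "IGI": "Inferred from genetic interaction",
    "IMP": "Inferred from mutant phenotype",
    "IPI": "Inferred from physical interaction",
    "ISS": "Inferred from sequence or structural similarity",
    "NAS": "Non-traceable author statement",
    "ND": "No biological data",
    "TAS": "Traceable author statement",
}


def filter_by_evidence(annotations, exclude=frozenset({"IEA"})):
    """Return (term, code) pairs whose evidence code is not in `exclude`.

    Raises ValueError for a code that is not in EVIDENCE_CODES.
    """
    kept = []
    for term, code in annotations:
        if code not in EVIDENCE_CODES:
            raise ValueError(f"unknown evidence code: {code}")
        if code not in exclude:
            kept.append((term, code))
    return kept


# Hypothetical annotations: placeholder terms, not real GO records.
example = [("term_a", "IDA"), ("term_b", "IEA"), ("term_c", "TAS")]
for term, code in filter_by_evidence(example):
    print(term, code, "-", EVIDENCE_CODES[code])
```

Passing a different `exclude` set (for example, also dropping NAS) tightens or loosens the filter without changing the code.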
