Chapter 14 Protein Secondary Structure Prediction

Upload: todd

Post on 13-Jan-2016

Page 1: Chapter 14 Protein Secondary Structure Prediction

Chapter 14

Protein Secondary Structure Prediction

Page 2: Chapter 14 Protein Secondary Structure Prediction

Refresher

Proteins have secondary structures
These structures are essential to maintain the 3D structure of the protein
Secondary structure can be one of:
• α-helix
• β-strand
• Coil

α-helix
• H-bond between the C=O of residue i and the N-H of residue i+4
• 3.6 aa per turn
• 1.5 Å / aa (= 5.4 Å per turn)
• (fully extended peptide backbone = 3.5 Å / aa)

β-strand
• H-bonds between C=O and N-H of distant regions
• Parallel or anti-parallel

Coiled coil
• Hydrophobic amino acids interact

Page 3: Chapter 14 Protein Secondary Structure Prediction

Secondary Structure Predictions

Prediction of the conformation of each amino acid:
• H: α-helix
• E: β-strand
• C: Coil (no defined 2° structure)

Used for classification of proteins
Defining domains and motifs
Intermediary step towards 3° structure prediction

Globular and trans-membrane proteins are structurally very different
Different algorithms are required to predict these two classes of proteins

Page 4: Chapter 14 Protein Secondary Structure Prediction

• Problem is not trivial
• α-helix based on short-distance interactions (residue i with residue i+4)
• β-strand based on long-distance interactions (5 – 50+ residues apart)
• Long-range interaction predictions are less accurate
• Accuracy about 75%

Ab initio based
• Statistical calculation of residues in a single query sequence

Homology-based
• Common 2° structure patterns in homologous sequences

Page 5: Chapter 14 Protein Secondary Structure Prediction

Ab initio Methods

A.A.   Helix                 Sheet
       Designation   P       Designation   P
Ala    H             1.42    i             0.83
Cys    i             0.70    h             1.19
Asp    I             1.01    B             0.54
Glu    H             1.51    B             0.37
Phe    h             1.13    h             1.38
Gly    B             0.57    b             0.75
His    I             1.00    h             0.87
Ile    h             1.08    H             1.60
Lys    h             1.16    b             0.74
Leu    H             1.21    h             1.30
Met    H             1.45    h             1.05
Asn    b             0.67    b             0.89
Pro    B             0.57    B             0.55
Gln    h             1.11    h             1.10
Arg    i             0.98    i             0.93
Ser    i             0.77    b             0.75
Thr    i             0.83    h             1.19
Val    h             1.06    H             1.70
Trp    h             1.08    h             1.37
Tyr    b             0.69    H             1.47

Chou-Fasman
• Intrinsic propensity of each residue to be in a helix, strand or turn structure
• A, E, M are common in α-helices

N: residues in all protein structures
M: residues in α-helices
Y: total Ala in protein structures
X: Ala in α-helices

Propensity of Ala in α-helix: (X/Y)/(M/N)

Value = 1: same distribution as average
Value > 1: more often in α-helix than average
Value < 1: less often in α-helix than average

• 6-residue window of which 4 are H → α-helix
• Window extended bidirectionally until P < 1.0
• 5-residue window of which 3 are E → β-strand
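The propensity formula and the helix nucleation rule can be sketched in Python. This is a minimal illustration, not the full Chou-Fasman procedure: the P(helix) values are copied from the table above, while the function names and the choice to treat a residue as a helix former when P > 1.05 are assumptions made for this example. Window extension and β-strand nucleation are left out.

```python
# P(helix) values copied from the Chou-Fasman table above.
P_HELIX = {
    "A": 1.42, "C": 0.70, "D": 1.01, "E": 1.51, "F": 1.13,
    "G": 0.57, "H": 1.00, "I": 1.08, "K": 1.16, "L": 1.21,
    "M": 1.45, "N": 0.67, "P": 0.57, "Q": 1.11, "R": 0.98,
    "S": 0.77, "T": 0.83, "V": 1.06, "W": 1.08, "Y": 0.69,
}


def propensity(x, y, m, n):
    """(X/Y)/(M/N): X = residue count in helices, Y = residue count overall,
    M = all residues in helices, N = all residues in the structure set."""
    return (x / y) / (m / n)


def helix_nuclei(seq, window=6, min_formers=4, former_cutoff=1.05):
    """Return 0-based start positions of windows that could nucleate a helix.

    A residue counts as a helix former when P(helix) > former_cutoff.
    The cutoff is an assumption of this sketch, not a value from the slides.
    """
    starts = []
    for i in range(len(seq) - window + 1):
        formers = sum(1 for aa in seq[i:i + window] if P_HELIX[aa] > former_cutoff)
        if formers >= min_formers:
            starts.append(i)
    return starts
```

For example, `helix_nuclei("GAEMLKQ")` returns `[0, 1]`: both 6-residue windows contain at least four helix formers.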

Page 6: Chapter 14 Protein Secondary Structure Prediction

http://fasta.bioch.virginia.edu/fasta_www2/fasta_www.cgi?rm=misc1

Page 7: Chapter 14 Protein Secondary Structure Prediction

Example Chou-Fasman

        10         20         30         40         50         60
SRRSASHPTY SEMIAAAIRA EKSRGGSSRQ SIQKYIKSHY KVGHNADLQI KLSIRRLLAA

        70         80         90
GVLKQTKGVG ASGSFRLAKS DKAKRSPGKK

Prediction, residues 1-60:
helix <--------> <-----> <-----------------
sheet EEEEEEEEE EEEEEE EEEEEEEEEEEEE
turns T T T T T

Prediction, residues 61-90:
helix -------> <------->
sheet EEEEEEEEE
turns T T TT T

HELIX and SHEET records (PDB format):
HELIX 1 HA1 SER A 29 ALA A 38
HELIX 2 HA2 ARG A 47 SER A 56
HELIX 3 HA3 ALA A 64 ALA A 78
SHEET 1 SA 3 SER A 45 SER A 46
SHEET 2 SA 3 GLY A 91 ARG A 94
SHEET 3 SA 3 LEU A 81 GLY A 86

Page 8: Chapter 14 Protein Secondary Structure Prediction

Garnier-Osguthorpe-Robson (GOR)

• Makes use of distant influences on propensity
• Uses a 17-residue window
• Adds up propensities for four 2° structure states (H, E, T, C)
• Highest value defines the 2° structure state of the central residue in the window
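The window rule in the bullets above can be sketched as follows. This is an illustration only: the `info` table layout, the function name and the toy scores are assumptions made for this example and are not real GOR parameters.

```python
STATES = "HETC"  # helix, strand, turn, coil


def gor_predict(seq, info, half_window=8):
    """Assign one state per residue from a 17-residue window (8 + 1 + 8).

    info[state][(offset, amino_acid)] is the contribution that an amino acid
    at a given offset from the central residue makes to that state.
    Missing entries count as 0.
    """
    prediction = []
    for center in range(len(seq)):
        totals = dict.fromkeys(STATES, 0.0)
        for offset in range(-half_window, half_window + 1):
            pos = center + offset
            if 0 <= pos < len(seq):
                for state in STATES:
                    totals[state] += info[state].get((offset, seq[pos]), 0.0)
        # The state with the highest summed score wins for the central residue.
        prediction.append(max(STATES, key=totals.get))
    return "".join(prediction)


# Toy table (hypothetical): Ala favours helix, Val favours strand,
# and the influence fades with distance from the central residue.
toy_info = {
    "H": {(o, "A"): 1.0 / (1 + abs(o)) for o in range(-8, 9)},
    "E": {(o, "V"): 1.0 / (1 + abs(o)) for o in range(-8, 9)},
    "T": {},
    "C": {},
}

print(gor_predict("AAAAVVVV", toy_info))  # prints HHHHEEEE
```

Each residue is scored by its neighbours as well as by itself, which is how the method captures influences from up to 8 positions away on either side.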

Residues 1-60:
SRRSASHPTYSEMIAAAIRAEKSRGGSSRQSIQKYIKSHYKVGHNADLQIKLSIRRLLAA
helix HHHHHHHHHHH HHHHHH HHHH
sheet EEEEEEEE E EEEEEE
turns TTTT TTTTT T TTTT
coil C CCCCC CCC C

Residues 61-90:
GVLKQTKGVGASGSFRLAKSDKAKRSPGKK
helix HHHH HHHHHHHHHHH
sheet EEEEE E
turns TTT
coil CCCC C C

Residue totals: H: 36 E: 21 T: 17 C: 16
percent: H: 48.6 E: 28.4 T: 23.0 C: 21.6

Page 9: Chapter 14 Protein Secondary Structure Prediction

Expansion using larger crystal structure databases

Algorithms based on a larger database of crystal structure information:
• GOR II, III and IV
• SOPM

http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_server.html

SRRSASHPTYSEMIAAAIRAEKSRGGSSRQSIQKYIKSHYKVGHNADLQIKLSIRRLLAAGVLKQTKGVG
cccccccchhhhhhhhhhhhtccttcccchhhhhhhhhtcccccccthhhhhhhhhhhhhhhhhttttcc

ASGSFRLAKSDKAKRSPGKK
cccceeeecccccccccccc

Page 10: Chapter 14 Protein Secondary Structure Prediction

Homology based methods

Page 11: Chapter 14 Protein Secondary Structure Prediction

Neural Network programs

• A neural net has an input layer, hidden layers composed of nodes given different weights, and an output layer

• Neural net trained with multiply aligned sequences
• Accuracy >75%

PHD
1. BLASTP
2. MAXHOM (sequence alignment)
3. Neural net
• Layer one: 13-residue window
• Layer two: 17-residue window
• Layer three: “Jury layer” – removes very short stretches
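To make the 13-residue input window concrete, here is a minimal sketch of how one window could be turned into a numeric input vector. It is a simplification: PHD feeds its network a profile derived from the multiple alignment, whereas this example one-hot encodes a single sequence. The names `encode_window`, `AMINO_ACIDS` and `PAD` are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = len(AMINO_ACIDS)  # extra slot marking positions beyond the sequence ends


def encode_window(seq, center, width=13):
    """One-hot encode the residues in a window centred on `center`.

    Returns a flat list of width * 21 values (20 amino acids + 1 padding slot).
    """
    half = width // 2
    features = []
    for pos in range(center - half, center + half + 1):
        one_hot = [0] * (len(AMINO_ACIDS) + 1)
        if 0 <= pos < len(seq):
            one_hot[AMINO_ACIDS.index(seq[pos])] = 1
        else:
            one_hot[PAD] = 1  # window hangs over the N- or C-terminus
        features.extend(one_hot)
    return features
```

Sliding `center` along the sequence yields one input vector per residue, and the network outputs a state for the central residue of each window.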

PSIPRED
1. PSI-BLAST
2. Neural net

SSpro
PORTER
PROF
HMMSTR

Page 12: Chapter 14 Protein Secondary Structure Prediction

Predictions with Multiple Methods

No single prediction program is always correct, so it is generally good practice to compare the output from several programs

Some web servers do this:

JPred
• Combines PHD, PREDATOR, DSC, NNSSP, Jnet and ZPred
• Query is first submitted to PSI-BLAST
• Multiple alignment is built
• Alignment is submitted to the above 6 programs
• Consensus is returned
• If there is no consensus, the PHD prediction is used

SRRSASHPTYSEMIAAAIRAEKSRGGSSRQSIQKYIKSHYKVGHNADLQIKLSIRRLLAAGVLKQTKGVGASGSFRLAKSDKAKRSPGKK
---------HHHHHHHHHHH--------HHHHHHHHHH-------HHHHHHHHHHHHH---EEEEE------EEEE--------------
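A per-residue consensus with a fallback to PHD could look like the sketch below. How JPred itself decides that there is "no consensus" is not stated on the slide; here a tie for the most frequent state is assumed. The function name and the toy prediction strings are hypothetical.

```python
from collections import Counter


def consensus(predictions, fallback="PHD"):
    """Combine per-residue predictions by majority vote.

    predictions maps a program name to its state string; all strings must
    have the same length. On a tie (assumed meaning of "no consensus"),
    the state predicted by the fallback program is used.
    """
    lengths = {len(p) for p in predictions.values()}
    if len(lengths) != 1:
        raise ValueError("all predictions must have the same length")
    result = []
    for i in range(lengths.pop()):
        votes = Counter(p[i] for p in predictions.values()).most_common()
        top_state, top_count = votes[0]
        if len(votes) > 1 and votes[1][1] == top_count:
            result.append(predictions[fallback][i])  # tie: defer to fallback
        else:
            result.append(top_state)
    return "".join(result)


# Toy strings, not real program output.
toy = {"PHD": "HHE-", "DSC": "HE--", "NNSSP": "H-EE", "PREDATOR": "HEE-"}
print(consensus(toy))  # prints HEE-
```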

Page 13: Chapter 14 Protein Secondary Structure Prediction

How accurate?
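One common way to put a number on this question is the per-residue accuracy over the three states H, E and C, often called Q3: the percentage of residues whose predicted state matches the observed one. The sketch below assumes both strings use the same three-letter alphabet; the function name and the toy strings are hypothetical.

```python
def q3_accuracy(predicted, observed):
    """Percentage of positions where the predicted state equals the observed state."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have the same length")
    if not observed:
        raise ValueError("empty input")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


print(q3_accuracy("HHHHEECC", "HHHEEECC"))  # prints 87.5
```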

Page 14: Chapter 14 Protein Secondary Structure Prediction

Trans-membrane proteins

Two types of trans-membrane proteins
• α-helix
• β-barrel

• Many consist solely of α-helices and are found in the cytoplasmic membrane
• β-barrels are normally found in the outer membrane of Gram-negative bacteria

• Difficult to get an X-ray or NMR structure

Page 15: Chapter 14 Protein Secondary Structure Prediction

• α-helix perpendicular to membrane, 17-25 residues
• Hydrophobic segments separated by hydrophilic loops (<60 residues)
• Residues bordering the hydrophobic module are generally charged
• Inner cytosolic region is most often highly charged (orientation info)

• Positive inside rule
• Scan a window of 17-25 residues and calculate a hydrophobicity score
• Many false positives
• Signal peptide sequences confuse the algorithm
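The window scan can be sketched as below. The slide does not name a hydrophobicity scale or a cutoff; this example assumes the Kyte-Doolittle hydropathy scale, a 19-residue window and a mean score of 1.6 as the threshold. The function names are hypothetical.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(seq, window=19):
    """Mean hydropathy of every window along the sequence."""
    return [
        sum(KYTE_DOOLITTLE[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def candidate_tm_windows(seq, window=19, cutoff=1.6):
    """0-based start positions of windows whose mean hydropathy reaches the cutoff."""
    return [i for i, score in enumerate(hydropathy_profile(seq, window))
            if score >= cutoff]
```

A hydrophobic signal peptide scores just as highly as a real trans-membrane helix in such a scan, which is one source of the false positives noted above.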

Page 16: Chapter 14 Protein Secondary Structure Prediction

TMHMM

• Trained with 160 known TM sequences
• Probability of having an α-helix is given
• Orientation of α-helix based on positive inside rule

Phobius

• Incorporates distinct HMM models for signal peptides and TM helices
• Signal peptide sequence is ignored
• Can use sequence homologs and multiply aligned sequences

Page 17: Chapter 14 Protein Secondary Structure Prediction

Prediction of β-barrel proteins

• β-strand forming the trans-membrane section is amphipathic
• 10-22 residues
• Alternating hydrophobic and hydrophilic sequence arrangement
• α-helix TM prediction programs are thus not applicable to β-barrel proteins
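The alternating pattern can be measured with a very simple score, sketched below. This is not how TBBpred works; it only illustrates the amphipathic signal. The set of residues treated as hydrophobic and the function name are assumptions.

```python
HYDROPHOBIC = set("AVLIMFWC")  # assumed classification for this sketch


def alternation_score(segment):
    """Fraction of adjacent residue pairs that switch between
    hydrophobic and hydrophilic (1.0 = perfectly alternating)."""
    if len(segment) < 2:
        return 0.0
    flags = [aa in HYDROPHOBIC for aa in segment]
    switches = sum(a != b for a, b in zip(flags, flags[1:]))
    return switches / (len(segment) - 1)


print(alternation_score("VTVSLNVT"))  # prints 1.0
print(alternation_score("LLLLLLLL"))  # prints 0.0
```

A uniformly hydrophobic helix scores 0.0 here, while an alternating strand scores close to 1.0, so the two kinds of trans-membrane segment need different detectors.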

TBBpred

• Neural net trained with β-barrel protein sequences

Page 18: Chapter 14 Protein Secondary Structure Prediction

Coiled coil prediction

Two or more α-helices winding around each other
In every 7 residues (heptad), positions 1 and 4 are hydrophobic and face the central core

Coils
• Scan window of 14, 21 or 28 residues
• Compares residues to a probability matrix based on known coiled coils
• Accurate for left-handed coils, but not right-handed coils

Multicoil
• Scoring matrix based on 2-strand and 3-strand coils
• Used in several genome-wide studies

Leucine zippers
• Sub-class of coiled coils
• L-X6-L-X6-L-
• Found in transcription factors
• Parallel α-helices stabilized by a leucine core
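The L-X6-L-X6-L pattern translates directly into a regular expression. The sketch below is only a pattern search and will also report matches that are not real zippers. The function name and the test sequence are hypothetical.

```python
import re

# L, any 6 residues, L, any 6 residues, L.
# The lookahead (?=...) lets overlapping matches be reported.
ZIPPER = re.compile(r"(?=(L.{6}L.{6}L))")


def find_leucine_zippers(seq):
    """Return (1-based start position, matched segment) for each pattern hit."""
    return [(m.start() + 1, m.group(1)) for m in ZIPPER.finditer(seq)]


print(find_leucine_zippers("MKLAAAAAALAAAAAALGG"))
# prints [(3, 'LAAAAAALAAAAAAL')]
```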

Page 19: Chapter 14 Protein Secondary Structure Prediction

Chapter 13

Protein Tertiary Structure Prediction

Page 20: Chapter 14 Protein Secondary Structure Prediction

The need for predicting 3D structures

• X-ray crystallography is extremely tedious
• DNA sequences, and therefore protein sequences, are rapidly generated
• The gap between sequence and structure is widening
• Protein structure often provides insight into function

Three main methods for 3D prediction

1. Homology modeling
2. Threading
3. Ab initio

Page 21: Chapter 14 Protein Secondary Structure Prediction

Homology Modeling

Page 22: Chapter 14 Protein Secondary Structure Prediction

Template Selection

• Search PDB for homologous sequences with BLAST or FASTA
• Should have >30% sequence identity (20% at a stretch)
• In case of multiple hits, choose
• Highest identity
• Highest resolution
• Most appropriate co-factors
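The selection rules can be sketched as a filter followed by a ranking. The dictionary keys, the function name and the example hits are hypothetical; co-factor suitability needs human judgement and is left out.

```python
def choose_template(hits, min_identity=30.0):
    """Pick a template from a list of search hits.

    Each hit is a dict with 'name', 'identity' (percent) and 'resolution'
    (in Å; a smaller value means higher resolution).
    Returns None when no hit exceeds the identity threshold.
    """
    usable = [h for h in hits if h["identity"] > min_identity]
    if not usable:
        return None
    # Highest identity first; ties broken by the best (lowest) resolution.
    return min(usable, key=lambda h: (-h["identity"], h["resolution"]))


# Made-up hits for illustration only.
hits = [
    {"name": "hitA", "identity": 45.0, "resolution": 2.8},
    {"name": "hitB", "identity": 45.0, "resolution": 1.9},
    {"name": "hitC", "identity": 25.0, "resolution": 1.2},
]
print(choose_template(hits)["name"])  # prints hitB
```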

Sequence Alignment

• Critical
• Incorrectly aligned residues will give an incorrect model
• Use Praline or T-Coffee for alignment
• Inspect visually to confirm alignment of key residues

Page 23: Chapter 14 Protein Secondary Structure Prediction

Backbone Model Building

• Copy the backbone atom coordinates of each aligned template residue to the corresponding query residue
• If the residues are identical, the coordinates of the whole residue can be copied
• If the residues are different, only the Cα coordinates are copied
• The remaining atoms of the residue are modeled later

Loop Modeling

It often happens that there are “gaps” in the aligned sequences
Two techniques to connect the protein on either side of the gap:

Database
• Search a database for fragments that fit the gap
• Measure coordinates and orientation of the backbone on either side of the gap
• Search for fragments that can fit
• Best loop gives no steric clash with the structure

Ab initio
• Generate a random loop
• No clash with nearby side-chains
• φ and ψ angles in an acceptable region of the Ramachandran plot
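The "no steric clash" test used by both techniques can be sketched as a distance check between loop atoms and the rest of the model. The 3.0 Å cutoff, the function names and the coordinates are assumptions for illustration. A real check would also use atom-specific radii and the Ramachandran criterion.

```python
import math


def has_steric_clash(loop_atoms, environment_atoms, min_distance=3.0):
    """True if any loop atom lies closer than min_distance (Å) to any other atom.

    Atoms are (x, y, z) coordinate tuples.
    """
    return any(
        math.dist(a, b) < min_distance
        for a in loop_atoms
        for b in environment_atoms
    )


def pick_loop(candidates, environment_atoms, min_distance=3.0):
    """Return the name of the first candidate loop without a clash, or None."""
    for name, atoms in candidates:
        if not has_steric_clash(atoms, environment_atoms, min_distance):
            return name
    return None
```

With one environment atom at the origin, a candidate loop atom at (1, 0, 0) is rejected and one at (5, 0, 0) is accepted.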

Page 24: Chapter 14 Protein Secondary Structure Prediction

Side Chain Refinement

• Need to model side-chains where these differ from the aligned template sequence
• Search database for all occurrences of the given side-chain in the backbone conformation, with minimal clash with neighbouring residues
• Computationally prohibitive
• Library of rotamers
• Collection of conformations for each residue that are most often observed in the structure database
• Select the rotamer with the conformation that best fits the backbone
• Minimal interference with neighbouring side-chains
• SCWRL

Page 25: Chapter 14 Protein Secondary Structure Prediction

Model Refinement using Energy Function

• After loop modeling and side-chain refinement the following remain:
• Unfavourable torsion angles
• Unacceptable proximity of atoms

• Use energy minimization to alleviate such problems
• Limit the number of iterations (<100) to ensure that the entire model does not drift away from the template
• Molecular dynamics can be used to search for a global minimum

Model Evaluation

• Check consistency in φ-ψ angles
• Bond lengths
• Close contacts
• Flag regions below acceptability threshold

• Procheck
• WHATIF
• ANOLEA
• Verify3D

Page 26: Chapter 14 Protein Secondary Structure Prediction

Comprehensive Modeling Programs

• Modeller
• Swiss-Model
• 3D-Jigsaw

Page 27: Chapter 14 Protein Secondary Structure Prediction

Threading and Fold Recognition

Pairwise Energy Method
• Fit sequence to each fold in database
• Use local alignment to improve fit
• Calculate energies: pairwise residue interaction, solvation, hydrophobic

Profile Method
• Fit sequence to fold
• Calculate propensity of each amino acid to be present at each profile position: secondary structure types, solvent exposure, hydrophobicity
• Use the structure fold that best fits the profile of parameters

Page 28: Chapter 14 Protein Secondary Structure Prediction

Ab Initio Prediction

• Proteins fold into a native, low-energy state
• The mechanism driving this process is poorly understood
• Computationally untenable to explore all possible states and calculate energies
• A 40-residue peptide will require 10^20 years to calculate all states using a 1×10^12 FLOPS computer
• Not a realistic approach currently
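The order of magnitude of that estimate can be checked with a short calculation. The slide does not say how many conformations are counted per residue. This sketch assumes 10 states per residue and one conformation evaluated per floating-point operation, which reproduces a figure of about 10^20 years.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def enumeration_years(n_residues, states_per_residue=10, evaluations_per_second=1e12):
    """Years needed to visit every conformation once (assumed counting model)."""
    total_states = states_per_residue ** n_residues
    return total_states / evaluations_per_second / SECONDS_PER_YEAR


print(f"{enumeration_years(40):.1e}")  # prints 3.2e+20
```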