Protein Secondary Structure Predictions
DESCRIPTION
Structural Bioinformatics. Protein Secondary Structure Predictions. The first high-resolution structure of a protein, myoglobin, was solved in 1958 by Max Perutz and John Kendrew of Cambridge University (who won the 1962 Nobel Prize in Chemistry).
Protein Secondary Structure Predictions
Structural Bioinformatics
As of 12.12.2013 there were 89,110 protein structures in the protein structure database. This is a great increase, but still an order of magnitude lower than the total number of protein sequences in the sequence databases (close to 1,000,000).
The first high-resolution structure of a protein: myoglobin
Solved in 1958 by Max Perutz and John Kendrew of Cambridge University
(they won the 1962 Nobel Prize in Chemistry)
Predicting the three-dimensional structure of a protein from its sequence is very hard
(sometimes impossible)
However, we can predict the secondary structure with relatively high precision
MERFGYTRAANCEAP….
What can we do to bridge the gap?
What do we mean by secondary structure?
Secondary structure elements are the building blocks of protein structure.
What do we mean by secondary structure?
Secondary structure is usually divided into three categories:
• Alpha helix
• Beta strand (sheet)
• Anything else: turn/loop
The different secondary structures are combined to form the tertiary structure of the protein
[Figure: RBP and globin, shown at the secondary and tertiary structure levels]
Secondary Structure Prediction
• Given a primary sequence
ADSGHYRFASGFTYKKMNCTEAA
what secondary structure will it adopt
(alpha helix, beta strand or random coil)?
Secondary Structure Prediction Methods
• Statistical methods
– Based on amino acid frequencies
– HMM (Hidden Markov Model)
• Machine learning methods
– SVM, neural networks
Chou and Fasman (1974)
Name            P(a)   P(b)   P(turn)
Alanine          142     83      66
Arginine          98     93      95
Aspartic Acid    101     54     146
Asparagine        67     89     156
Cysteine          70    119     119
Glutamic Acid    151     37      74
Glutamine        111    110      98
Glycine           57     75     156
Histidine        100     87      95
Isoleucine       108    160      47
Leucine          121    130      59
Lysine           114     74     101
Methionine       145    105      60
Phenylalanine    113    138      60
Proline           57     55     152
Serine            77     75     143
Threonine         83    119      96
Tryptophan       108    137      96
Tyrosine          69    147     114
Valine           106    170      50
The table gives the propensity of each amino acid to be part of a certain secondary structure (e.g. proline has a low propensity for being in an alpha helix or a beta sheet: it is a "breaker")
Success rate of 50%
Statistical Methods for SS prediction
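The propensity table can be used directly for a rough guess: average each column over a short segment and see which structure scores highest. The sketch below is only an illustration under stated assumptions. The function names are hypothetical, the "037" entry for glutamic acid is read as 37, and the full Chou and Fasman procedure (nucleation and extension rules) is not reproduced.

```python
# Chou-Fasman propensities (P(a), P(b), P(turn)) copied from the table above,
# keyed by one-letter amino acid code. The slide's "037" for Glu is read as 37.
PROPENSITY = {
    "A": (142, 83, 66),   "R": (98, 93, 95),    "D": (101, 54, 146),
    "N": (67, 89, 156),   "C": (70, 119, 119),  "E": (151, 37, 74),
    "Q": (111, 110, 98),  "G": (57, 75, 156),   "H": (100, 87, 95),
    "I": (108, 160, 47),  "L": (121, 130, 59),  "K": (114, 74, 101),
    "M": (145, 105, 60),  "F": (113, 138, 60),  "P": (57, 55, 152),
    "S": (77, 75, 143),   "T": (83, 119, 96),   "W": (108, 137, 96),
    "Y": (69, 147, 114),  "V": (106, 170, 50),
}
STATES = ("helix", "strand", "turn")


def mean_propensities(segment):
    """Average each propensity column over the residues of a segment."""
    rows = [PROPENSITY[aa] for aa in segment]
    return {state: sum(row[i] for row in rows) / len(rows)
            for i, state in enumerate(STATES)}


def favoured_state(segment):
    """Return the state with the highest mean propensity (hypothetical helper)."""
    means = mean_propensities(segment)
    return max(means, key=means.get)


print(favoured_state("AEML"))  # residues with high P(a)
print(favoured_state("VIYV"))  # residues with high P(b)
```

A single-segment average like this ignores neighbouring residues, which is one reason the success rate of the purely statistical approach is limited.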
Secondary Structure Method Improvements
‘Sliding window’ approach
• Most alpha helices are ~12 residues long
• Most beta strands are ~6 residues long
• Look at all windows of size 6/12
• Calculate a score for each window. If the score is above a threshold, predict that the window is an alpha helix/beta sheet
TGTAGPQLKCHIQWMLPLKK
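A minimal sketch of the sliding-window idea, assuming the window score is the mean Chou-Fasman P(a) value and the threshold is 100. Both choices are assumptions made for illustration; the slides do not say how the score or the threshold is defined. The function name `scan_windows` is hypothetical.

```python
# P(a) column of the Chou-Fasman table, keyed by one-letter amino acid code.
HELIX_PROPENSITY = {
    "A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
    "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
    "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106,
}


def scan_windows(seq, table=HELIX_PROPENSITY, window=12, threshold=100.0):
    """Return (start, mean score) for every window whose mean exceeds threshold.

    window=12 follows the typical helix length quoted on the slide;
    the scoring rule and threshold are assumptions.
    """
    hits = []
    for start in range(len(seq) - window + 1):
        segment = seq[start:start + window]
        score = sum(table[aa] for aa in segment) / window
        if score > threshold:
            hits.append((start, score))
    return hits


# Scan the example sequence from the slide for helix-like windows.
for start, score in scan_windows("TGTAGPQLKCHIQWMLPLKK"):
    print(start, round(score, 1))
```

For beta strands the same function could be called with a P(b) table and `window=6`.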
Improvements since the 1980s
• Adding information from conservation in MSA
• Smarter algorithms (e.g. Machine learning, HMM).
HMM (Hidden Markov Model) approach for predicting secondary structure
• An HMM enables us to calculate the probability of assigning a sequence to a secondary structure
TGTAGPQLKCHIQWML
HHHHHHHLLLLBBBBB
p = ?
Table built from a large database of known secondary structures. Labelled entries:
• The probability of observing a residue which belongs to an α-helix followed by a residue belonging to a turn = 0.15
• The probability of observing alanine as part of a β-sheet
• α-helix followed by α-helix
• Beginning with an α-helix
• Example
What is the probability that the sequence TGQ will be in a helical structure?
TGQ
HHH
p = 0.45 × 0.041 × 0.8 × 0.028 × 0.8 × 0.0635 ≈ 0.000021 (2.1 × 10^-5)
Success rate of HMM-based methods: 75%-80%
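The TGQ/HHH example is a product of one start probability, the helix-to-helix transition probabilities, and one emission probability per residue. The sketch below reproduces it under an assumption: the six factors are read in order as start(H) = 0.45, emit(T|H) = 0.041, H→H = 0.8, emit(G|H) = 0.028, H→H = 0.8, emit(Q|H) = 0.0635. That mapping is inferred from the order of the factors, not stated on the slide, and `path_probability` is a hypothetical helper name. Multiplying the six factors gives about 2.1 × 10^-5.

```python
def path_probability(seq, states, start, trans, emit):
    """Joint probability of a residue sequence and a given state path in an HMM."""
    p = start[states[0]] * emit[states[0]][seq[0]]
    for i in range(1, len(seq)):
        p *= trans[states[i - 1]][states[i]]  # state-to-state transition
        p *= emit[states[i]][seq[i]]          # residue emitted by the new state
    return p


# Only the helix entries used in the example are filled in (assumed mapping).
start = {"H": 0.45}
trans = {"H": {"H": 0.8}}
emit = {"H": {"T": 0.041, "G": 0.028, "Q": 0.0635}}

p = path_probability("TGQ", "HHH", start, trans, emit)
print(f"{p:.3e}")
```

Scoring a single path like this is the building block; a predictor would compare many candidate paths (for example with the Viterbi algorithm) and keep the most probable one.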
• What can we learn from secondary structure predictions?
Mad cow disease: PrPc to PrPsc
[Figure: PrPc and PrPsc]
How does protein structure relate to the primary protein sequence?
- Early experiments have shown that the sequence of a protein is sufficient to determine its structure (Anfinsen)
- Protein structure is more conserved than protein sequence and more closely related to function.
SEQUENCE
How (can) different amino acid sequences determine similar protein structure?
Lesk and Chothia 1980
The Globin Family
Different sequences can result in similar structures
1ecd 2hhd
We can learn about the important features which determine structure and function by comparing the sequences and structures.
The Globin Family
Why is proline 36 conserved throughout the globin family?
Where are the gaps?
The gaps in the pairwise alignment are mapped to the loop regions
How are remote homologs related in terms of their structure?
retinol-binding protein
odorant-binding protein
apolipoprotein D
β-lactoglobulin
RBD
PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3
Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

Query: 3   WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
           V L+ LA A + S V+ENFD ++ G WY + K
Sbjct: 1   MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59

Query: 55  DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
           + I A +S+ E G + K V + ++ +PAK +++++ +
Sbjct: 60  NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112

Query: 115 KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
           +WI+ TDY+ YA+ YSC + ++ R+P LPPE
Sbjct: 113 MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159
The retinol-binding protein
β-lactoglobulin
MERFGYTRAANCEAP….
Taken together
FUNCTION
Pfam
Database that contains a large collection of multiple sequence alignments of protein families (common structures). Very useful for function prediction.
http://pfam.sanger.ac.uk/
The zinc-finger family (domain)
Known family of transcription factors
ZINC FINGER DOMAIN
Protein sequence
Pfam
Based on profile hidden Markov models (HMMs), which represent the protein family.
In comparison to a PSSM, an HMM is a model which considers dependencies between the different columns in the matrix (different residues) and is thus much more powerful.
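Profile HMM (Hidden Markov Model) can accurately represent an MSA

MSA (columns 16-19):
16 17 18 19
D  R  T  R
D  R  T  S
S  -  -  S
S  P  T  R
D  R  T  R
D  P  T  S
D  -  -  S
D  -  -  S
D  -  -  S
D  -  -  R

Match-state emission probabilities:
M16: D 0.8, S 0.2
M17: P 0.4, R 0.6
M18: T 1.0
M19: R 0.4, S 0.6

States: M16-M19 (match), D16-D19 (delete), I16-I19 (insert; emit X)
[Figure: profile HMM state diagram; transition labels 50%, 50%, 100%, 100%, 100%, 100%]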
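The match-state emission probabilities of a profile HMM can be estimated by counting residues in each MSA column. The sketch below does this for the ten sequences shown on this slide (columns 16-19), assuming plain relative frequencies with gaps ignored. Real profile HMMs such as those behind Pfam also use pseudocounts and model insert states, which are left out here, and the function names are hypothetical.

```python
from collections import Counter

# The ten aligned sequences from the slide, columns 16-19 ("-" is a gap).
MSA = ["DRTR", "DRTS", "S--S", "SPTR", "DRTR",
       "DPTS", "D--S", "D--S", "D--S", "D--R"]


def match_emissions(msa):
    """Per-column residue frequencies, ignoring gaps (no pseudocounts)."""
    profile = []
    for column in zip(*msa):
        counts = Counter(aa for aa in column if aa != "-")
        total = sum(counts.values())
        profile.append({aa: n / total for aa, n in counts.items()})
    return profile


def gap_fraction(msa):
    """Fraction of sequences with a gap in each column (use of the delete state)."""
    return [column.count("-") / len(column) for column in zip(*msa)]


for position, emissions in zip(range(16, 20), match_emissions(MSA)):
    print(position, emissions)
print(gap_fraction(MSA))
```

Half of the sequences have a gap in columns 17 and 18, which corresponds to a 50% split between the match and delete states after column 16.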
Extra Slides (for your interest)
[Figure: α-helix; 3.6 residues per turn, pitch 5.4 Å]
Alpha Helix: Pauling (1951)
• A consecutive stretch of 5-40 amino acids (average 10).
• A right-handed spiral conformation.
• 3.6 amino acids per turn.
• Stabilized by hydrogen bonds.
Beta Strand: Pauling and Corey (1951)
• An extended polypeptide chain is called a β-strand (consists of 5-10 amino acids).
• The strands are connected together by hydrogen bonds to form a β-sheet.
[Figure: β-strand and β-sheet]
Loops
• Connect the secondary structure elements (alpha helix and beta strands).
• Have various lengths and shapes.