Proteins Secondary Structure Predictions

Upload: vevina

Post on 19-Jan-2016

Page 1: Proteins  Secondary Structure Predictions

Proteins Secondary Structure Predictions

Structural Bioinformatics

Page 2: Proteins  Secondary Structure Predictions

The first high-resolution structure of a protein: myoglobin

Was solved in 1958 by Max Perutz and John Kendrew of Cambridge University.

(They won the 1962 Nobel Prize in Chemistry.)

On 12.12.2013 there were 89,110 protein structures in the protein structure database. A great increase, but still an order of magnitude lower than the total number of protein sequences in sequence databases (close to 1,000,000).

Page 3: Proteins  Secondary Structure Predictions

Predicting the three-dimensional structure of a protein from its sequence is very hard (sometimes impossible).

However, we can predict the secondary structure with relatively high precision.

MERFGYTRAANCEAP….

What can we do to bridge the gap?

Page 4: Proteins  Secondary Structure Predictions

What do we mean by Secondary Structure ?

Secondary structures are the building blocks of the protein structure.

Page 5: Proteins  Secondary Structure Predictions

What do we mean by Secondary Structure ?

Secondary structure is usually divided into three categories:

• Alpha helix
• Beta strand (sheet)
• Anything else: turn/loop

Page 6: Proteins  Secondary Structure Predictions

The different secondary structures are combined together to form the tertiary structure of the protein.

Page 7: Proteins  Secondary Structure Predictions

[Figure: RBP and globin, secondary vs. tertiary structure]

Page 8: Proteins  Secondary Structure Predictions

Secondary Structure Prediction

• Given a primary sequence

ADSGHYRFASGFTYKKMNCTEAA

what secondary structure will it adopt (alpha helix, beta strand or random coil)?

Page 9: Proteins  Secondary Structure Predictions

Secondary Structure Prediction Methods

• Statistical methods
  – Based on amino acid frequencies
  – HMM (Hidden Markov Model)

• Machine learning methods
  – SVM, neural networks

Page 10: Proteins  Secondary Structure Predictions

Chou and Fasman (1974)

Name            P(a)   P(b)   P(turn)
Alanine          142     83      66
Arginine          98     93      95
Aspartic Acid    101     54     146
Asparagine        67     89     156
Cysteine          70    119     119
Glutamic Acid    151     37      74
Glutamine        111    110      98
Glycine           57     75     156
Histidine        100     87      95
Isoleucine       108    160      47
Leucine          121    130      59
Lysine           114     74     101
Methionine       145    105      60
Phenylalanine    113    138      60
Proline           57     55     152
Serine            77     75     143
Threonine         83    119      96
Tryptophan       108    137      96
Tyrosine          69    147     114
Valine           106    170      50

The propensity of an amino acid to be part of a certain secondary structure (e.g. proline has a low propensity to be in an alpha helix or beta sheet: it is a "breaker")

Success rate of 50%

Statistical Methods for SS prediction

Page 11: Proteins  Secondary Structure Predictions

Secondary Structure Method Improvements

‘Sliding window’ approach

• Most alpha helices are ~12 residues long
• Most beta strands are ~6 residues long
• Look at all windows of size 6/12
• Calculate a score for each window. If the score > threshold, predict that this is an alpha helix/beta sheet

TGTAGPQLKCHIQWMLPLKK
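The sliding-window idea can be sketched in a few lines of Python. This is only an illustration, not the original Chou and Fasman procedure: the window score is taken to be the mean propensity, the threshold of 100 is an assumed value, and the function names `window_scores` and `predict` are made up for this example. The propensities are transcribed from the Chou and Fasman table in this deck (the "037" entry for glutamic acid is read as 37).

```python
# Chou-Fasman propensities (x100) keyed by one-letter amino acid code,
# transcribed from the table in this deck ("037" for Glu is read as 37).
P_ALPHA = {
    "A": 142, "R": 98, "D": 101, "N": 67, "C": 70, "E": 151, "Q": 111,
    "G": 57, "H": 100, "I": 108, "L": 121, "K": 114, "M": 145, "F": 113,
    "P": 57, "S": 77, "T": 83, "W": 108, "Y": 69, "V": 106,
}
P_BETA = {
    "A": 83, "R": 93, "D": 54, "N": 89, "C": 119, "E": 37, "Q": 110,
    "G": 75, "H": 87, "I": 160, "L": 130, "K": 74, "M": 105, "F": 138,
    "P": 55, "S": 75, "T": 119, "W": 137, "Y": 147, "V": 170,
}


def window_scores(seq, table, size):
    """Mean propensity of every window of `size` residues along `seq`."""
    return [
        sum(table[res] for res in seq[i:i + size]) / size
        for i in range(len(seq) - size + 1)
    ]


def predict(seq, table, size, threshold=100.0):
    """Mark residues covered by at least one window scoring above threshold.

    The default threshold of 100 is an assumption for illustration only;
    the slides do not state a value.
    """
    marked = [False] * len(seq)
    for start, score in enumerate(window_scores(seq, table, size)):
        if score > threshold:
            for j in range(start, start + size):
                marked[j] = True
    return marked


if __name__ == "__main__":
    seq = "TGTAGPQLKCHIQWMLPLKK"  # example sequence from the slide
    helix = predict(seq, P_ALPHA, 12)   # helices: windows of 12
    strand = predict(seq, P_BETA, 6)    # strands: windows of 6
    print(seq)
    print("".join("H" if h else "-" for h in helix))
    print("".join("B" if b else "-" for b in strand))
```

A residue may be marked as both helix and strand under this simple scheme; resolving such overlaps would need an extra rule that the slides do not describe.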

Page 12: Proteins  Secondary Structure Predictions

Improvements since the 1980s

• Adding information from conservation in multiple sequence alignments (MSA)

• Smarter algorithms (e.g. machine learning, HMM).

Page 13: Proteins  Secondary Structure Predictions

HMM (Hidden Markov Model) approach for predicting Secondary Structure

• HMM enables us to calculate the probability of assigning a sequence to a secondary structure

TGTAGPQLKCHIQWML
HHHHHHHLLLLBBBBB

p = ?

Page 14: Proteins  Secondary Structure Predictions

The probability of observing a residue which belongs to an α-helix followed by a residue belonging to a turn = 0.15

The probability of observing Alanine as part of a β-sheet

Table built according to a large database of known secondary structures

α-helix followed by α-helix

Beginning with an α-helix

Page 15: Proteins  Secondary Structure Predictions

• Example

What is the probability that the sequence TGQ will be in a helical structure?

TGQ
HHH

p = 0.45 x 0.041 x 0.8 x 0.028 x 0.8 x 0.0635 ≈ 0.000021

Success rate of HMM-based methods: 75%-80%
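The TGQ example can be reproduced with a short Python sketch. The assignment of the six factors to model parameters is an assumption read off the order of the product: 0.45 as the probability of beginning with an α-helix, 0.8 as the helix-to-helix transition, and 0.041, 0.028 and 0.0635 as the probabilities of observing T, G and Q in a helix. The function name `path_probability` is made up for this example. Multiplying the six factors gives about 2.1 x 10^-5.

```python
# Parameters assumed from the worked example; only the helix state (H)
# is filled in because the example path is HHH.
START = {"H": 0.45}                     # P(path begins in a helix)
TRANS = {("H", "H"): 0.8}               # P(helix followed by helix)
EMIT = {("H", "T"): 0.041,              # P(residue | helix)
        ("H", "G"): 0.028,
        ("H", "Q"): 0.0635}


def path_probability(seq, states):
    """Joint probability of a residue sequence and a given state path."""
    if len(seq) != len(states):
        raise ValueError("sequence and state path must have equal length")
    # First position: start probability times emission probability.
    p = START[states[0]] * EMIT[(states[0], seq[0])]
    # Each later position: transition from previous state, then emission.
    for i in range(1, len(seq)):
        p *= TRANS[(states[i - 1], states[i])]
        p *= EMIT[(states[i], seq[i])]
    return p


if __name__ == "__main__":
    print(path_probability("TGQ", "HHH"))
```

This scores one fixed state path. Predicting the most probable path over all three states would additionally need the full transition and emission tables and a dynamic-programming search such as the Viterbi algorithm.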

Page 16: Proteins  Secondary Structure Predictions

• What can we learn from secondary structure predictions?

Page 17: Proteins  Secondary Structure Predictions

Mad Cow Disease: PrPc to PrPsc

PrPc PrPsc

Page 18: Proteins  Secondary Structure Predictions

How does the protein structure relate to the primary protein sequence?

Page 19: Proteins  Secondary Structure Predictions

- Early experiments have shown that the sequence of the protein is sufficient to determine its structure (Anfinsen)

- Protein structure is more conserved than protein sequence and more closely related to function.

SEQUENCE

Page 20: Proteins  Secondary Structure Predictions

How (can) different amino acid sequences determine similar protein structure?

Lesk and Chothia 1980

Page 21: Proteins  Secondary Structure Predictions

The Globin Family

Page 22: Proteins  Secondary Structure Predictions

Different sequences can result in similar structures

1ecd 2hhd

Page 23: Proteins  Secondary Structure Predictions

We can learn about the important features which determine structure and function by comparing the sequences and structures.

Page 24: Proteins  Secondary Structure Predictions

The Globin Family

Page 25: Proteins  Secondary Structure Predictions

Why is Proline 36 conserved in all the globin family ?

Page 26: Proteins  Secondary Structure Predictions

Where are the gaps??

The gaps in the pairwise alignment are mapped to the loop regions

Page 27: Proteins  Secondary Structure Predictions

How are remote homologs related in terms of their structure?

retinol-binding protein

odorant-binding protein

apolipoprotein D

β-lactoglobulin

RBD

Page 28: Proteins  Secondary Structure Predictions

PSI-BLAST alignment of RBP and β-lactoglobulin: iteration 3

Score = 159 bits (404), Expect = 1e-38
Identities = 41/170 (24%), Positives = 69/170 (40%), Gaps = 19/170 (11%)

Query: 3    WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ 54
             V  L+ LA  A              +  S  V+ENFD  ++ G WY + K
Sbjct: 1    MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG 59

Query: 55   DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ 114
            + I A +S+ E G +    K          V  +     ++  +PAK +++++ +
Sbjct: 60   NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL----- 112

Query: 115  KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA 164
                 +WI+ TDY+ YA+ YSC            + ++  R+P  LPPE
Sbjct: 113  MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET 159
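The Identities and Gaps figures in the summary line can be checked directly from the aligned rows. The sketch below is a minimal illustration in Python: the two strings are the Query and Sbjct rows of the three alignment blocks joined end to end, and the function name `alignment_stats` is made up for this example. The Positives count is not reproduced because it depends on the substitution matrix used by PSI-BLAST.

```python
# Query and Sbjct rows of the three alignment blocks, concatenated.
QUERY = (
    "WVWALLLLAAWAAAERD--------CRVSSFRVKENFDKARFSGTWYAMAKKDPEGLFLQ"
    "DNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTEDPAKFKMKYWGVASFLQ"
    "KGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADSYSFVFSRDPNGLPPEA"
)
SBJCT = (
    "MVTMLMFLATLAGLFTTAKGQNFHLGKCPSPPVQENFDVKKYLGRWYEIEKIPASFE-KG"
    "NCIQANYSLMENGNIEVLNKELSPDGTMNQVKGE--AKQSNVSEPAKLEVQFFPL-----"
    "MPPAPYWILATDYENYALVYSCTTFFWL--FHVDFFWILGRNPY-LPPET"
)


def alignment_stats(query, sbjct):
    """Return (alignment columns, identical columns, gapped columns)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    pairs = list(zip(query, sbjct))
    identities = sum(q == s and q != "-" for q, s in pairs)
    gaps = sum(q == "-" or s == "-" for q, s in pairs)
    return len(pairs), identities, gaps


if __name__ == "__main__":
    columns, identities, gaps = alignment_stats(QUERY, SBJCT)
    print(f"Identities = {identities}/{columns} ({identities / columns:.0%})")
    print(f"Gaps = {gaps}/{columns} ({gaps / columns:.0%})")
```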

Page 29: Proteins  Secondary Structure Predictions

The Retinol Binding Protein

β-lactoglobulin

Page 30: Proteins  Secondary Structure Predictions

MERFGYTRAANCEAP….

Taken together

FUNCTION

Page 31: Proteins  Secondary Structure Predictions

Pfam

Database that contains a large collection of multiple sequence alignments of protein families (common structures). Very useful for function prediction.

http://pfam.sanger.ac.uk/

Page 32: Proteins  Secondary Structure Predictions

The zinc-finger family (domain)

Known family of transcription factors

ZINC FINGER DOMAIN

Protein sequence

Page 33: Proteins  Secondary Structure Predictions

Pfam

Based on profile hidden Markov models (HMMs), which represent the protein family

An HMM, in comparison to a PSSM, is a model which considers dependencies between the different columns in the matrix (different residues) and is thus much more powerful.

http://pfam.sanger.ac.uk/

Page 34: Proteins  Secondary Structure Predictions

Profile HMM (Hidden Markov Model) can accurately represent an MSA

MSA (columns 16-19):

16 17 18 19
D  R  T  R
D  R  T  S
S  -  -  S
S  P  T  R
D  R  T  R
D  P  T  S
D  -  -  S
D  -  -  S
D  -  -  S
D  -  -  R

Match states (emission probabilities):

M16: D 0.8, S 0.2
M17: P 0.4, R 0.6
M18: T 1.0
M19: R 0.4, S 0.6

[Figure: profile HMM with match (M16-M19), insert (I16-I19) and delete (D16-D19) states; transitions labeled 100% and 50%]
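The emission probabilities of the match states on this slide can be recovered by counting residues in each alignment column. The following is a minimal sketch in Python under simplifying assumptions: plain frequency counts with gaps ignored and no pseudocounts, which real profile HMM software would normally add. The function names `match_emissions` and `gap_fractions` are made up for this example.

```python
from collections import Counter

# The ten aligned rows for columns 16-19, transcribed from the slide.
MSA = [
    "DRTR", "DRTS", "S--S", "SPTR", "DRTR",
    "DPTS", "D--S", "D--S", "D--S", "D--R",
]


def match_emissions(msa):
    """Per-column residue frequencies, ignoring gap characters."""
    emissions = []
    for column in zip(*msa):
        counts = Counter(res for res in column if res != "-")
        total = sum(counts.values())
        emissions.append({res: n / total for res, n in counts.items()})
    return emissions


def gap_fractions(msa):
    """Fraction of sequences with a gap in each column."""
    return [column.count("-") / len(column) for column in zip(*msa)]


if __name__ == "__main__":
    for pos, probs in zip(range(16, 20), match_emissions(MSA)):
        print(f"M{pos}: {probs}")
    print("gap fraction per column:", gap_fractions(MSA))
```

Half of the sequences have gaps in columns 17 and 18, which is consistent with the 50% transitions shown in the figure.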

Page 35: Proteins  Secondary Structure Predictions

Extra Slides (for your interest)

Page 36: Proteins  Secondary Structure Predictions

Alpha Helix: Pauling (1951)

• A consecutive stretch of 5-40 amino acids (average 10).

• A right-handed spiral conformation.

• 3.6 amino acids per turn.

• Stabilized by hydrogen bonds

3.6 residues

5.4 Å

Page 37: Proteins  Secondary Structure Predictions

Beta Strand: Pauling and Corey (1951)

> An extended polypeptide chain is called a β-strand (consists of 5-10 amino acids)

> The chains are connected together by hydrogen bonds to form a β-sheet

β-strand

β-sheet

Page 38: Proteins  Secondary Structure Predictions

Loops

• Connect the secondary structure elements (alpha helix and beta strands).

• Have various lengths and shapes.