Secondary structure prediction from amino acid sequence

Upload: gabriella-minus

Post on 14-Dec-2015

Homology: Paralogs and orthologs

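[Figure: an ancestral gene a is duplicated to give a and b; after speciation, species 1 and species 2 each carry copies of a and b.]

Paralogs = homologous genes that arose by duplication (gene families in the same species)

Orthologs = homologous genes separated by speciation (the same gene in different species)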

DNA sequence

Automatic translation

Amino acid primary sequence

1. Search for sequence homologue(s) and construct an alignment (primary database searches: FASTA, BLAST)

2. Homologue(s) with known 3D structure? If so, homology modelling is available.

3. Motif recognition: search secondary databases

Secondary structure prediction

Fold assignment

Physico-chemical properties (e.g., using the EMBOSS suite)

Chou-Fasman Parameters

• Amino acid propensities
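
A propensity of this kind is usually defined as the fraction of a residue type found in a given state, divided by the overall fraction of residues in that state. The sketch below computes such ratios from sequences paired with per-residue state strings. The function name, the H/C state letters and the toy data are illustrative assumptions, not the published Chou-Fasman parameters.

```python
from collections import Counter

def propensities(sequences, structures):
    """Return {(residue, state): propensity} from paired sequence/state strings."""
    pair_counts = Counter()     # n(residue, state)
    residue_counts = Counter()  # n(residue)
    state_counts = Counter()    # n(state)
    for seq, ss in zip(sequences, structures):
        if len(seq) != len(ss):
            raise ValueError("sequence and state strings must have equal length")
        for aa, state in zip(seq, ss):
            pair_counts[(aa, state)] += 1
            residue_counts[aa] += 1
            state_counts[state] += 1
    total = sum(state_counts.values())
    result = {}
    for (aa, state), n in pair_counts.items():
        fraction_in_state = n / residue_counts[aa]  # how often this residue is in the state
        background = state_counts[state] / total    # how common the state is overall
        result[(aa, state)] = fraction_in_state / background
    return result

# Made-up toy data (H = helix, C = coil), for illustration only.
toy_sequences = ["AAEGG", "AEGGA"]
toy_structures = ["HHHCC", "HHCCH"]
toy_propensities = propensities(toy_sequences, toy_structures)
```

A value above 1 means the residue is found in that state more often than expected from the state's overall frequency.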

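Accuracy of prediction

• Q3 score

Q3 = (qα + qβ + qcoil) / (total no. of residues) × 100%

where qα, qβ and qcoil are the numbers of residues correctly predicted as helix, strand and coil.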
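
As a minimal sketch of the Q3 calculation, assuming predicted and observed structures are given as equal-length strings over three state letters (H, E, C is an assumed alphabet, and the function name is illustrative):

```python
def q3_score(predicted, observed):
    """Percentage of residues whose predicted state matches the observed state."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    # Summing matches over all residues equals q_alpha + q_beta + q_coil.
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)

# Toy example: 8 of 10 residues agree.
example_q3 = q3_score("HHHCEEECCC", "HHHHEEEECC")
```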

Recent improvements

• The availability of large families of homologous sequences has greatly enhanced secondary structure prediction.

• The combination of sequence data in multiple alignments with sophisticated computing techniques such as neural networks has led to accuracies well in excess of 70%.

• The limit of 70-80% may be a function of secondary structure variation within homologous proteins.

Stereochemical analysis

Patterns of residue conservation are indicative of particular secondary structure types.

Alpha helices have a periodicity of 3.6 residues per turn. Many alpha helices in proteins are amphipathic, meaning that one face points towards the hydrophobic core and the other towards the solvent.

Patterns of hydrophobic residue conservation showing the i, i+3, i+4, i+7 pattern are highly indicative of an alpha helix.

XOOXXOOX (X = hydrophobic, O = polar)
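
One common way to quantify this periodicity is a hydrophobic moment: each hydrophobic residue contributes a unit vector rotated by 100° per position (360° / 3.6 residues), and the length of the vector sum is large when hydrophobic residues cluster on one face. The sketch below uses a binary hydrophobic/polar classification. The residue set and function name are assumptions made for illustration, and a real analysis would normally use a graded hydrophobicity scale.

```python
import math

# Assumed hydrophobic set; other classifications are equally defensible.
HYDROPHOBIC = set("AVLIMFWC")

def hydrophobic_moment(sequence, angle_deg=100.0):
    """Length of the summed unit vectors of hydrophobic residues,
    with successive residues rotated by angle_deg around the axis."""
    x = y = 0.0
    for i, aa in enumerate(sequence.upper()):
        if aa in HYDROPHOBIC:
            theta = math.radians(i * angle_deg)
            x += math.cos(theta)
            y += math.sin(theta)
    return math.hypot(x, y)

helix_like = "LKKLLKKL"  # hydrophobic at i, i+3, i+4, i+7
moment_at_helix_angle = hydrophobic_moment(helix_like, 100.0)
moment_at_strand_angle = hydrophobic_moment(helix_like, 180.0)
```

For this toy peptide the moment is high at 100° and close to zero at 180°, which is the signature of a helical rather than a strand-like pattern.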

Stereochemical analysis

The geometry of beta strands means that adjacent residues have their side chains pointing in opposite directions.

Beta strands that are half buried in the protein core will tend to have hydrophobic residues at positions i, i+2, i+4, i+6, etc., and polar residues at positions i+1, i+3, i+5, etc.

XOXOXOXOXO

Stereochemical analysis

Beta strands that are completely buried (as is often the case in proteins containing both alpha helices and beta strands) usually contain a run of hydrophobic residues.

XXXXXXXXXXXX
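
The two strand patterns can be looked for directly once a sequence is reduced to the X/O notation used on these slides. The following is a rough sketch: the hydrophobic residue set and the function names are assumptions, and in practice the patterns would be read from conservation across an alignment rather than from a single sequence.

```python
# Assumed hydrophobic set; the slides do not specify one.
HYDROPHOBIC = set("AVLIMFWC")

def to_xo(sequence):
    """Map each residue to X (hydrophobic) or O (anything else)."""
    return "".join("X" if aa in HYDROPHOBIC else "O" for aa in sequence.upper())

def longest_run(pattern):
    """Longest stretch of consecutive X, as expected for a fully buried strand."""
    best = current = 0
    for ch in pattern:
        current = current + 1 if ch == "X" else 0
        best = max(best, current)
    return best

def longest_alternation(pattern):
    """Longest stretch where X and O strictly alternate (half-buried strand)."""
    if not pattern:
        return 0
    best = current = 1
    for prev, ch in zip(pattern, pattern[1:]):
        current = current + 1 if ch != prev else 1
        best = max(best, current)
    return best

half_buried = to_xo("VTVSVKV")  # toy peptide
buried = to_xo("LVIAVL")        # toy peptide
```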

Helical transmembrane proteins

• Strong hydrophobicity signal from membrane spanning regions, each ~25 residues in length

• Predominance of positively charged amino acid residues on cytoplasmic side

• Prediction accuracy with multiple alignment = 95%
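
The hydrophobicity signal can be illustrated with a sliding-window hydropathy average. The sketch below uses the Kyte-Doolittle scale with a 19-residue window and a threshold of 1.6. These are conventional choices assumed here, not values taken from the slides, and the function names are illustrative. It is not how TMpred or TMHMM work internally.

```python
# Kyte-Doolittle hydropathy values.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=19):
    """Mean hydropathy of every full window (raises KeyError on unknown residues)."""
    values = [KD[aa] for aa in sequence.upper()]
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

def candidate_tm_cores(sequence, window=19, threshold=1.6):
    """Return (start, end) spans (0-based, end exclusive) of window centres
    whose mean hydropathy reaches the threshold."""
    profile = hydropathy_profile(sequence, window)
    half = window // 2
    segments, start = [], None
    for i, score in enumerate(profile):
        if score >= threshold and start is None:
            start = i
        elif score < threshold and start is not None:
            segments.append((start + half, i + half))
            start = None
    if start is not None:
        segments.append((start + half, len(profile) + half))
    return segments

# Hypothetical sequence: a 22-residue leucine stretch flanked by lysines.
toy_protein = "K" * 10 + "L" * 22 + "K" * 10
toy_cores = candidate_tm_cores(toy_protein)
```

The reported span covers only the window centres, so it marks the core of the hydrophobic stretch rather than its exact boundaries.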

Helical transmembrane proteins

• ~30% of top 100 drugs bind to membrane proteins

• Difficult to determine experimentally

• But much easier to predict than globular proteins!

• TMpred – based on statistical analysis of transmembrane proteins

• TMHMM – based on a hidden Markov model

Protein Structure Classification

http://www.cathdb.info/latest/index.html

Class (C): secondary structure content (type): mainly alpha, mainly beta, alpha/beta, few secondary structures

Architecture (A): gross arrangement of secondary structure elements (type and number of SS elements)

Topology (T): shape and connectivity of SS (type, number and order of SS elements)

Homologous superfamily (H)
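
CATH identifies each superfamily with a dotted four-number code, one number per level (C.A.T.H). The sketch below parses such codes and reports the deepest level two domains share. The helper names are hypothetical and the example codes should be treated as illustrative inputs.

```python
from typing import NamedTuple

# Class numbering as used for the four main CATH classes.
CLASS_NAMES = {
    1: "mainly alpha",
    2: "mainly beta",
    3: "alpha/beta",
    4: "few secondary structures",
}

class CathCode(NamedTuple):
    cath_class: int
    architecture: int
    topology: int
    superfamily: int

def parse_cath_code(code):
    """Split a dotted 'C.A.T.H' string into its four integer levels."""
    parts = code.split(".")
    if len(parts) != 4:
        raise ValueError("expected four dot-separated numbers, got %r" % code)
    return CathCode(*(int(p) for p in parts))

def deepest_shared_level(a, b):
    """Return '', 'C', 'A', 'T' or 'H' for the deepest level at which a and b agree."""
    depth = 0
    for x, y in zip(a, b):
        if x != y:
            break
        depth += 1
    return ["", "C", "A", "T", "H"][depth]

first = parse_cath_code("3.40.50.720")
second = parse_cath_code("3.40.50.300")
shared = deepest_shared_level(first, second)
```

Two domains that agree down to T share a fold but are not assigned to the same homologous superfamily.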

Topology

Class → Architecture → Topology → H-level

Topology: fold families

H-level: homologous domains that share a common ancestor

In CATH, the assignments of structures to fold groups and homologous superfamilies are made by sequence and structure comparisons.

Architecture: ‘Barrel’

9 topologies: type of SS, number and order

Homologous domain family?

Secondary structure prediction methods

• PSI-pred (PSI-BLAST profiles used for prediction; David Jones, Warwick)

• JPRED Consensus prediction (includes many of the methods given below; Cuff & Barton, EBI)

• DSC (King & Sternberg)

• PREDATOR (Frishman & Argos, EMBL)

• PHD home page (Rost & Sander, EMBL, Germany)

• ZPRED server (Zvelebil et al., Ludwig, U.K.)

• nnPredict (Cohen et al., UCSF, USA)

• BMERC PSA Server (Boston University, USA)

• SSP (nearest-neighbor) (Solovyev and Salamov, Baylor College, USA)

http://speedy.embl-heidelberg.de/gtsp/secstrucpred.html

Consensus prediction method - JPRED

[Figure labels: hydrophobic; amphipathic; highly conserved; b = buried, e = exposed]
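
The idea of a consensus can be sketched as a per-residue majority vote over the outputs of several predictors. This is a simplified illustration, not JPRED's actual combination scheme. The function name, the H/E/C alphabet and the rule of falling back to coil on ties are all assumptions.

```python
from collections import Counter

def consensus(predictions, tie_state="C"):
    """Per-residue majority vote over equal-length prediction strings.
    Ties between the most common states fall back to tie_state."""
    if len({len(p) for p in predictions}) != 1:
        raise ValueError("need at least one prediction, all of equal length")
    result = []
    for column in zip(*predictions):
        ranked = Counter(column).most_common()
        top_state, top_count = ranked[0]
        tied = len(ranked) > 1 and ranked[1][1] == top_count
        result.append(tie_state if tied else top_state)
    return "".join(result)

# Three made-up predictions for the same 10-residue sequence.
votes = ["HHHHCCEEEE", "HHHCCCEEEC", "CHHHHCCEEE"]
voted = consensus(votes)
```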

Neural network prediction - PHD

Multiple alignment of protein family

SS profile for window of adjacent residues
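
The input side of such a network can be sketched as follows: each alignment column becomes a vector of amino acid frequencies, and the vectors for a window of adjacent columns are concatenated into one feature vector for the central residue. The 13-column default window, the zero padding at the ends and the function names are assumptions for illustration. The network itself is not shown.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def column_profile(alignment):
    """Per-column amino acid frequencies for equal-length aligned rows.
    Gaps and unknown characters are ignored."""
    profile = []
    for column in zip(*alignment):
        counts = Counter(aa for aa in column if aa in AMINO_ACIDS)
        total = sum(counts.values())
        profile.append([counts[aa] / total if total else 0.0
                        for aa in AMINO_ACIDS])
    return profile

def window_input(profile, centre, window=13):
    """Concatenate the profile columns in a window around `centre`,
    padding with zeros beyond either end of the alignment."""
    half = window // 2
    padding = [0.0] * len(AMINO_ACIDS)
    features = []
    for pos in range(centre - half, centre + half + 1):
        features.extend(profile[pos] if 0 <= pos < len(profile) else padding)
    return features

toy_alignment = ["ACD", "AC-", "GCD"]  # made-up three-column alignment
toy_profile = column_profile(toy_alignment)
toy_features = window_input(toy_profile, centre=0, window=3)
```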

Hidden Markov Models - HMMSTR

[Figure labels: amino acid; secondary structure element; structural context; Markov state]

• Recurrent local features of protein sequences

• Accuracy of 74%

Bystroff et al., 2000
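
To show how a hidden Markov model assigns states to a sequence in general, here is a small Viterbi decoder run on a toy two-state model. This is not the HMMSTR model: the states, the X/O observation alphabet and every probability below are made up for illustration.

```python
import math

def _log(p):
    return math.log(p) if p > 0 else float("-inf")

def viterbi(observations, states, start_p, trans_p, emit_p):
    """Most probable state path for the observations, computed in log space."""
    if not observations:
        return []
    scores = [{s: _log(start_p[s]) + _log(emit_p[s][observations[0]])
               for s in states}]
    back = []
    for obs in observations[1:]:
        prev = scores[-1]
        current, pointers = {}, {}
        for s in states:
            # Best predecessor state for arriving in state s.
            best = max(states, key=lambda r: prev[r] + _log(trans_p[r][s]))
            current[s] = prev[best] + _log(trans_p[best][s]) + _log(emit_p[s][obs])
            pointers[s] = best
        scores.append(current)
        back.append(pointers)
    # Trace back from the best final state.
    path = [max(states, key=lambda s: scores[-1][s])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Hypothetical toy model: H (helix-like) favours X, C (coil-like) favours O.
toy_states = ["H", "C"]
toy_start = {"H": 0.5, "C": 0.5}
toy_trans = {"H": {"H": 0.9, "C": 0.1}, "C": {"H": 0.1, "C": 0.9}}
toy_emit = {"H": {"X": 0.7, "O": 0.3}, "C": {"X": 0.2, "O": 0.8}}
toy_path = "".join(viterbi("XXXOOO", toy_states, toy_start, toy_trans, toy_emit))
```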