Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Upload: virginia-smith

Post on 18-Jan-2016


Page 1: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Protein Folding & Biospectroscopy

Lecture 6

F14PFB

David Robinson

Page 2: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Protein Folding

1. Introduction

2. Protein Structure

3. Interactions

4. Protein Folding Models

5. Biomolecular Modelling

6. Bioinformatics

Page 3: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson
Page 4: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Homology Modelling

• Based on the observation that “Similar sequences exhibit similar structures”

• Known structure is used as a template to model an unknown (but likely similar) structure with known sequence

• First applied in the late 1970s using early computer imaging methods (Tom Blundell)

Page 5: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Homology Modelling

• Offers a method to “Predict” the 3D structure of proteins for which it is not possible to obtain X-ray or NMR data

• Can be used in understanding function, activity, specificity, etc.

• Of interest to drug companies wishing to do structure-aided drug design

Page 6: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Terminology

• target: sequence of a protein with unknown 3D structure

• template: sequence of a protein with known 3D structure (usually determined experimentally)

• alignment: rearrangement of subsets of a sequence to find the maximum similarity between sequences

• rigid bodies: conserved α-helices & β-sheets

• sidechains: decoration on the backbone (very important for interactions); the residue side chains attached to the rigid bodies

• loops: connect the core regions (α-helices & β-sheets)

• spatial restraints: bond lengths, bond angles, etc.

• PDB: Protein Data Bank

Page 7: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Homology Modelling

• Identify homologous sequences in PDB

• Align query sequence with homologues

• Find Structurally Conserved Regions (SCRs)

• Identify Structurally Variable Regions (SVRs)

• Generate coordinates for core region

• Generate coordinates for loops

• Add side chains (Check rotamer library)

• Refine structure using energy minimization

• Validate structure

Page 8: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Step 1: ID Homologues in PDB

[Figure: the Query Sequence is compared against the sequences stored in the PDB]

Page 9: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Step 1: ID Homologues in PDB

[Figure: the Query Sequence is compared against the sequences stored in the PDB; two matching entries are returned as Hit #1 and Hit #2]
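The database search in Step 1 can be illustrated with a toy example. The sketch below ranks sequences by the number of three-residue words they share with the query. This scoring scheme, the function names and the three-entry database are all assumptions made for illustration; real searches against the PDB use dedicated tools and statistical scoring.

```python
def kmers(seq, k=3):
    """Return the set of all overlapping words of length k in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rank_hits(query, database, k=3, top=2):
    """Rank database entries by the number of k-residue words shared with the query.

    `database` is a hypothetical {name: sequence} dictionary.
    Entries that share no words with the query are dropped.
    """
    query_words = kmers(query, k)
    scored = [(len(query_words & kmers(seq, k)), name)
              for name, seq in database.items()]
    scored.sort(reverse=True)  # highest number of shared words first
    return [name for score, name in scored[:top] if score > 0]


# Toy data, invented for this example.
toy_database = {
    "entry_A": "PRTEINSEQUENCE",
    "entry_B": "QWERYTRASDFHG",
    "entry_C": "PRTEINSEQNCE",
}
hits = rank_hits("PRTEINSEQENCE", toy_database)
print(hits)
```

The two best-scoring entries play the role of Hit #1 and Hit #2 in the later steps.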

Page 10: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Step 2: Align Sequences

Dynamic Programming

Identity Matrix

     G   E   N   E   T   I   C   S
G   10   0   0   0   0   0   0   0
E    0  10   0  10   0   0   0   0
N    0   0  10   0   0   0   0   0
E    0  10   0  10   0   0   0   0
S    0   0   0   0   0   0   0  10
I    0   0   0   0   0  10   0   0
S    0   0   0   0   0   0   0  10

Sum Matrix

     G   E   N   E   T   I   C   S
G   60  40  30  20  20  10  10   0
E   40  50  30  30  20  10  10   0
N   30  30  40  20  20  10  10   0
E   20  30  20  30  20  10  10   0
S   20  20  20  20  20  10  10  10
I   10  10  10  10  10  20  10   0
S    0   0   0   0   0   0   0  10
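The Sum Matrix on this slide can be reproduced from the Identity Matrix with a short script. The sketch below assumes the summation rule of the original Needleman-Wunsch method without gap penalties: working from the bottom-right corner, each cell receives its own identity score plus the largest value found in the next row (to the right of the cell) or the next column (below the cell). That rule is inferred from the numbers on the slide rather than stated there, and the function names are illustrative.

```python
def identity_matrix(row_seq, col_seq, match=10):
    """Score `match` where the row and column residues are identical, else 0."""
    return [[match if r == c else 0 for c in col_seq] for r in row_seq]


def sum_matrix(ident):
    """Fill the sum matrix from the bottom-right corner towards the top-left."""
    n_rows, n_cols = len(ident), len(ident[0])
    total = [row[:] for row in ident]  # last row and last column stay as they are
    for i in range(n_rows - 2, -1, -1):
        for j in range(n_cols - 2, -1, -1):
            next_row = total[i + 1][j + 1:]                       # row below, to the right
            next_col = [total[k][j + 1] for k in range(i + 1, n_rows)]  # column to the right, below
            total[i][j] = ident[i][j] + max(next_row + next_col)
    return total


ident = identity_matrix("GENESIS", "GENETICS")
sums = sum_matrix(ident)
for residue, row in zip("GENESIS", sums):
    print(residue, " ".join(f"{value:3d}" for value in row))
```

The top-left cell holds the best achievable total score, and following the largest values from that corner down towards the bottom-right traces the alignment.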

Pages 11-17: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Step 2: Align Sequences

Dynamic Programming

Identity Matrix

     G   E   N   E   T   I   C   S
G   10   0   0   0   0   0   0   0
E    0  10   0  10   0   0   0   0
N    0   0  10   0   0   0   0   0
E    0  10   0  10   0   0   0   0
S    0   0   0   0   0   0   0  10
I    0   0   0   0   0  10   0   0
S    0   0   0   0   0   0   0  10

Sum Matrix (starting point: only the last row and last column are filled in; . = not yet filled)

     G   E   N   E   T   I   C   S
G    .   .   .   .   .   .   .   0
E    .   .   .   .   .   .   .   0
N    .   .   .   .   .   .   .   0
E    .   .   .   .   .   .   .   0
S    .   .   .   .   .   .   .  10
I    .   .   .   .   .   .   .   0
S    0   0   0   0   0   0   0  10

[Animation, pages 12-17: the remaining cells of the Sum Matrix are filled in one at a time, starting with the I row (values 10 and 20)]

Page 18: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Step 2: Align Sequences

Query   ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG
Hit #1  ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA
Hit #2  MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA

[Figure labels: Hit #1, Hit #2]
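One simple way to compare the hits with the query is the percent identity of each aligned pair. The sketch below counts identical residues over the columns in which both sequences have a residue. Ignoring gapped columns is an assumption (other definitions of percent identity exist), and the function name is illustrative.

```python
def percent_identity(row_a, row_b):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


query = "ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG"
hit_1 = "ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA"
hit_2 = "MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA"

for name, hit in (("Hit #1", hit_1), ("Hit #2", hit_2)):
    print(name, round(percent_identity(query, hit), 1))
```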

Page 19: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Alignment

• Key step in Homology Modelling

• Global (Needleman-Wunsch) alignment is absolutely required

• Small error in alignment can lead to big error in structural model

• Multiple alignments are usually better than pairwise alignments

Page 20: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Alignment Thresholds

Page 21: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Step 3: Find SCRs

Query   ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG
Hit #1  ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA
Hit #2  MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA
2°      HHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCBBBBBBBBB

(H = helix, C = coil, B = β-strand)

[Figure labels: Hit #1, Hit #2, SCR #1, SCR #2]

Page 22: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Structurally Conserved Regions (SCRs)

• Correspond to the most stable regions (usually interior) of protein

• Correspond to sequence regions with lowest level of gapping, highest level of sequence conservation

• Usually correspond to secondary structures

Page 23: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Step 4: Find SVRs

Query   ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG
Hit #1  ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA
Hit #2  MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA
2°      HHHHHHHHHHHHHCCCCCCCCCCCCCCCCCCBBBBBBBBB

(H = helix, C = coil, B = β-strand)

[Figure labels: Hit #1, Hit #2, SVR (loop)]

Page 24: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Structurally Variable Regions (SVRs)

• Correspond to the least stable or most flexible regions (usually exterior) of protein

• Correspond to sequence regions with highest level of gapping, lowest level of sequence conservation

• Usually correspond to loops and turns
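The gapping criterion for SCRs and SVRs can be applied directly to the three-sequence alignment from Step 2. The sketch below treats every run of gap-free columns of at least five residues as a candidate SCR and everything in between as a candidate SVR. The five-residue cut-off and the function names are assumptions for illustration; sequence conservation and secondary structure, which the slides also mention, are not considered here.

```python
ALIGNMENT = [
    "ACDEFGHIKLMNPQRST--FGHQWERT-----TYREWYEG",  # Query
    "ASDEYAHLRILDPQRSTVAYAYE--KSFAPPGSFKWEYEA",  # Hit #1
    "MCDEYAHIRLMNPERSTVAGGHQWERT----GSFKEWYAA",  # Hit #2
]


def conserved_blocks(rows, min_len=5):
    """Return (start, end) column ranges (0-based, end exclusive) without any gap."""
    length = len(rows[0])
    blocks, start = [], None
    for col in range(length + 1):
        # The extra iteration at col == length closes a block that runs to the end.
        gapped = col == length or any(row[col] == "-" for row in rows)
        if not gapped and start is None:
            start = col
        elif gapped and start is not None:
            if col - start >= min_len:
                blocks.append((start, col))
            start = None
    return blocks


def variable_regions(blocks, length):
    """Return the column ranges not covered by any conserved block."""
    regions, previous_end = [], 0
    for start, end in blocks:
        if start > previous_end:
            regions.append((previous_end, start))
        previous_end = end
    if previous_end < length:
        regions.append((previous_end, length))
    return regions


scrs = conserved_blocks(ALIGNMENT)
svrs = variable_regions(scrs, len(ALIGNMENT[0]))
print("SCR candidates:", scrs)
print("SVR candidates:", svrs)
```

With this cut-off the alignment yields two conserved blocks separated by one variable region, in line with the SCR #1, SCR #2 and SVR (loop) labels on the slides.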

Page 25: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Step 5: Generate Coordinates

ATOM   1  N   ALA A 1   21.389  25.406  -4.628  1.00 23.22  2TRX 152
ATOM   2  CA  ALA A 1   21.628  26.691  -3.983  1.00 24.42  2TRX 153
ATOM   3  C   ALA A 1   20.937  26.944  -2.679  1.00 24.21  2TRX 154
ATOM   4  O   ALA A 1   21.072  28.079  -2.093  1.00 24.97  2TRX 155
ATOM   5  CB  ALA A 1   21.117  27.770  -5.002  1.00 28.27  2TRX 156
ATOM   6  OG  SER A 1   22.276  27.925  -5.861  1.00 32.61  2TRX 157
ATOM   7  N   GLU A 2   20.173  26.028  -2.163  1.00 21.39  2TRX 158
ATOM   8  CA  GLU A 2   19.395  26.125  -0.949  1.00 21.57  2TRX 159
ATOM   9  C   GLU A 2   20.264  26.214   0.297  1.00 20.89  2TRX 160
ATOM  10  O   GLU A 2   19.760  26.575   1.371  1.00 21.49  2TRX 161

ATOM   1  N   SER A 1   21.389  25.406  -4.628  1.00 23.22  2TRX 152
ATOM   2  CA  SER A 1   21.628  26.691  -3.983  1.00 24.42  2TRX 153
ATOM   3  C   SER A 1   20.937  26.944  -2.679  1.00 24.21  2TRX 154
ATOM   4  O   SER A 1   21.072  28.079  -2.093  1.00 24.97  2TRX 155
ATOM   5  CB  SER A 1   21.117  27.770  -5.002  1.00 28.27  2TRX 156
ATOM   6  OG  SER A 1   22.276  27.925  -5.861  1.00 32.61  2TRX 157
ATOM   7  N   ASP A 2   20.173  26.028  -2.163  1.00 21.39  2TRX 158
ATOM   8  CA  ASP A 2   19.395  26.125  -0.949  1.00 21.57  2TRX 159
ATOM   9  C   ASP A 2   20.264  26.214   0.297  1.00 20.89  2TRX 160
ATOM  10  O   ASP A 2   19.760  26.575   1.371  1.00 21.49  2TRX 161

ALA

Page 26: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Step 5: Generate Core Coordinates

• For identical amino acids, transfer all atom coordinates (XYZ) to query protein

• For similar amino acids, transfer backbone coordinates & replace side chain atoms while respecting χ angles

• For different amino acids, transfer only the backbone coordinates (XYZ) to query sequence
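The transfer rules for identical and for different amino acids can be sketched in a few lines. The example below reads whitespace-separated ATOM records like those on the previous slide, copies every atom when the residue is unchanged, and copies only the backbone atoms (N, CA, C, O) when it differs. The "similar amino acids" case, which requires rebuilding the side chain, is left out. The parser relies on whitespace splitting rather than the fixed columns of real PDB files, and the target residue names are hypothetical.

```python
BACKBONE = {"N", "CA", "C", "O"}

# Records copied from the 2TRX excerpt on the slide (first eight fields only).
TEMPLATE_RECORDS = """\
ATOM 1 N SER A 1 21.389 25.406 -4.628
ATOM 2 CA SER A 1 21.628 26.691 -3.983
ATOM 3 C SER A 1 20.937 26.944 -2.679
ATOM 4 O SER A 1 21.072 28.079 -2.093
ATOM 5 CB SER A 1 21.117 27.770 -5.002
ATOM 6 OG SER A 1 22.276 27.925 -5.861
ATOM 7 N ASP A 2 20.173 26.028 -2.163
ATOM 8 CA ASP A 2 19.395 26.125 -0.949
"""


def parse_atom(line):
    """Split one whitespace-separated ATOM record into a dictionary."""
    fields = line.split()
    return {
        "name": fields[2],
        "res": fields[3],
        "resnum": int(fields[5]),
        "xyz": tuple(float(value) for value in fields[6:9]),
    }


def transfer_core(template_atoms, target_residues):
    """Copy coordinates from the template to the target, residue by residue.

    `target_residues` maps residue number to the target's three-letter code.
    """
    model = []
    for atom in template_atoms:
        new_res = target_residues[atom["resnum"]]
        identical = new_res == atom["res"]
        if identical or atom["name"] in BACKBONE:
            model.append({**atom, "res": new_res})
    return model


template = [parse_atom(line) for line in TEMPLATE_RECORDS.splitlines()]
# Hypothetical target: residue 1 changes to ALA, residue 2 stays ASP.
model = transfer_core(template, {1: "ALA", 2: "ASP"})
for atom in model:
    print(atom["name"], atom["res"], atom["resnum"], atom["xyz"])
```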

Page 27: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Step 6: Replace SVRs (loops)

Query   FGHQWERT
Hit #1  YAYE--KS

Page 28: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Loop Library

• Loops extracted from PDB using high resolution (<2 Å) X-ray structures

• Typically thousands of loops in DB

• Includes loop coordinates, sequence, # residues in loop, Cα-Cα distance, preceding 2° structure and following 2° structure (or their Cα coordinates)

Page 29: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Step 6: Replace SVRs (loops)

• Must match desired # residues

• Must match Cα-Cα distance (<0.5 Å)

• Must not bump into other parts of protein (no Cα-Cα distance <3.0 Å)

• Preceding and following Cα’s (3 residues) from loop should match well with corresponding Cα coordinates in template structure
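The first three criteria translate into a simple filter over candidate loops. The sketch below assumes that each candidate is given as a list of Cα coordinates already placed in the coordinate frame of the model, and that the end-to-end Cα-Cα distance of the loop is compared with the distance spanned by the gap. Both are simplifications, and the function name and data layout are hypothetical.

```python
import math


def loop_fits(loop_ca, n_residues, gap_distance, protein_ca,
              distance_tol=0.5, bump_limit=3.0):
    """Check one candidate loop against the length, distance and bump criteria.

    loop_ca      -- Calpha coordinates of the candidate loop
    n_residues   -- number of residues the loop must contain
    gap_distance -- Calpha-Calpha distance the loop has to span (in Angstrom)
    protein_ca   -- Calpha coordinates of the rest of the protein
    """
    if len(loop_ca) != n_residues:
        return False
    end_to_end = math.dist(loop_ca[0], loop_ca[-1])
    if abs(end_to_end - gap_distance) >= distance_tol:
        return False
    # Bump check: no loop Calpha may come closer than bump_limit to the protein.
    return all(math.dist(a, b) >= bump_limit
               for a in loop_ca for b in protein_ca)


# Invented coordinates for a three-residue loop.
candidate = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
print(loop_fits(candidate, 3, 7.5, [(0.0, 10.0, 0.0)]))
print(loop_fits(candidate, 3, 7.5, [(3.8, 1.0, 0.0)]))
```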

Page 30: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Step 6: Replace SVRs (loops)

• Loop placement and positioning is done using superposition algorithm

• Loop fits are evaluated using RMSD calculations and standard “bump checking”

• If no “good” loop is found, some algorithms create loops using randomly generated angles
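The RMSD used to evaluate a loop fit is the root-mean-square distance between matched atoms. The sketch below assumes the two coordinate sets have already been superposed and are listed in the same atom order; the superposition step itself is not shown.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equally long coordinate lists."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


# Invented example: every atom is displaced by 1 Angstrom along z.
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
shifted = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(rmsd(reference, shifted))
```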

Page 31: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Step 7: Add Side Chains

Page 32: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Amino Acid Side Chains

[Figure: chemical structures of the amino acid side chains; recoverable labels: H2N, C, COOH, H, NH3+]

Page 33: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Step 7: Add Side Chains

• Done primarily for SVRs (not SCRs)

• Rotamer placement and positioning is done via a superposition algorithm using rotamers taken from a standardized library (Trial & Error)

• Rotamer fits are evaluated using simple “bump checking” methods
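The trial-and-error placement can be sketched as a loop over candidate rotamers with a bump count for each. The example below assumes every rotamer is a list of side-chain atom coordinates already positioned on the backbone, and reuses a 3.0 Å distance as the bump limit. Both the data layout and that limit are assumptions, and no real rotamer library is used.

```python
import math


def count_bumps(side_chain, environment, bump_limit=3.0):
    """Count atom pairs that are closer together than bump_limit."""
    return sum(1 for a in side_chain for b in environment
               if math.dist(a, b) < bump_limit)


def pick_rotamer(rotamers, environment, bump_limit=3.0):
    """Try each rotamer in turn and keep the one with the fewest bumps.

    When several rotamers tie, the first one in the list is kept.
    """
    return min(rotamers,
               key=lambda rotamer: count_bumps(rotamer, environment, bump_limit))


# Invented coordinates: two single-atom rotamers and one neighbouring atom.
rotamers = [[(0.0, 0.0, 0.0)], [(5.0, 0.0, 0.0)]]
neighbours = [(0.0, 1.0, 0.0)]
best = pick_rotamer(rotamers, neighbours)
print(best)
```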

Page 34: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Step 8: Energy Minimization

Page 35: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Energy Minimization

• Removes atomic overlaps and unnatural strains in the structure

• Stabilizes or reinforces strong hydrogen bonds, breaks weak ones

• Brings protein to a nearby local energy minimum (not necessarily the global minimum) in about 1-2 minutes CPU time
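The idea behind energy minimization can be shown on the smallest possible system: two particles interacting through a Lennard-Jones (van der Waals) potential. The sketch below moves the separation downhill along the energy gradient (steepest descent) until the gradient vanishes. The reduced units (ε = σ = 1), the step size and the starting separation are assumptions chosen for illustration; real minimizers act on all atoms of the protein with a full force field.

```python
def lj_energy(r, epsilon=1.0, sigma=1.0):
    """Lennard-Jones energy of two particles at separation r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (s6 * s6 - s6)


def lj_gradient(r, epsilon=1.0, sigma=1.0):
    """Derivative of the Lennard-Jones energy with respect to r."""
    s6 = (sigma / r) ** 6
    return 4.0 * epsilon * (-12.0 * s6 * s6 + 6.0 * s6) / r


def minimise(r, step=0.01, tol=1e-10, max_iter=10000):
    """Steepest descent: repeatedly step against the gradient."""
    for _ in range(max_iter):
        gradient = lj_gradient(r)
        if abs(gradient) < tol:
            break
        r -= step * gradient
    return r


r_min = minimise(1.5)
print(r_min, lj_energy(r_min))
```

In these units the minimum lies at a separation of 2^(1/6) σ with an energy of -ε. At shorter separations the repulsive term dominates, which is how minimization removes atomic overlaps.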

Page 36: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

The Final Result

[Figure: Actual and Modelled structures shown side by side]

Page 37: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

Summary

• Identify homologous sequences in PDB

• Align query sequence with homologues

• Find Structurally Conserved Regions (SCRs)

• Identify Structurally Variable Regions (SVRs)

• Generate coordinates for core region

• Generate coordinates for loops

• Add side chains (Check rotamer library)

• Refine structure using energy minimization

• Validate structure

Page 38: Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson

(a) Sketch the chemical structure corresponding to the atoms listed in the excerpt from the Protein Data Bank (PDB) file below. Add the missing side chain atoms for Ala. Indicate the position of the peptide bond on your sketch.

ATOM   1  N   SER A 1   37.713  48.836  48.018
ATOM   2  CA  SER A 1   38.297  49.795  48.973
ATOM   3  C   SER A 1   37.542  49.799  50.268
ATOM   4  O   SER A 1   38.112  49.890  51.357
ATOM   5  CB  SER A 1   38.285  51.207  48.322
ATOM   6  OG  SER A 1   39.140  51.101  47.211
ATOM   7  N   ALA A 2   36.185  49.706  50.220
ATOM   8  CA  ALA A 2   35.421  49.702  51.450
ATOM   9  C   ALA A 2   35.725  48.497  52.368
ATOM  10  O   ALA A 2   35.884  48.638  53.572
ATOM  11  CB  ALA A 2   33.910  49.733  51.233

(b) Indicate on your sketch in part (a) the N-terminus and C-terminus, and the main-chain dihedral angles, φ, ψ, and ω.

(c) Place the following main-chain bond lengths in order of decreasing length (longest first): Cα-C, C-O, C-N and N-Cα. Explain the physical reason for this ordering.

(d) Sketch a graph showing the variation of energy versus separation of two particles for a van der Waals interaction.