![Page 1: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/1.jpg)
From Sequences to Structure
Illustrations from: C Branden and J Tooze, Introduction to Protein Structure, 2nd ed. Garland Pub. ISBN 0815302703
![Page 2: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/2.jpg)
Protein Functions
•Mechanoenzymes: myosin, actin
•Rhodopsin: allows vision
•Globins: transport oxygen
•Antibodies: immune system
•Enzymes: pepsin, renin, carboxypeptidase A
•Receptors: transmembrane signaling
•Vitellogenin: molecular velcro
  –And hundreds of thousands more…
![Page 3: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/3.jpg)
Proteins are Chains of Amino Acids
•Polymer – a molecule composed of repeating units
![Page 4: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/4.jpg)
The Peptide Bond
•Dehydration synthesis
•Repeating backbone: N–Cα–C(=O)–N–Cα–C(=O)
  –Convention – start at amino terminus and proceed to carboxy terminus
![Page 5: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/5.jpg)
Peptidyl polymers
•A few amino acids in a chain are called a polypeptide. A protein is usually composed of 50 to 400+ amino acids.
•Since part of the amino acid is lost during dehydration synthesis, we call the units of a protein amino acid residues.
Figure labels: carbonyl carbon, amide nitrogen
![Page 6: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/6.jpg)
Side Chain Properties
•Recall that the electronegativity of carbon is at about the middle of the scale for light elements
  –Carbon does not make hydrogen bonds with water easily – hydrophobic
  –O and N are generally more likely than C to h-bond to water – hydrophilic
•We group the amino acids into three general groups:
  –Hydrophobic
  –Charged (positive/basic & negative/acidic)
  –Polar
![Page 7: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/7.jpg)
The Hydrophobic Amino Acids
Proline severely limits allowable conformations!
![Page 8: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/8.jpg)
The Charged Amino Acids
![Page 9: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/9.jpg)
The Polar Amino Acids
![Page 10: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/10.jpg)
More Polar Amino Acids
And then there’s…
![Page 11: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/11.jpg)
Planarity of the Peptide Bond
![Page 12: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/12.jpg)
Phi and psi
•φ = ψ = 180° is extended conformation
•φ: Cα to N–H
•ψ: C=O to Cα
OCCBIO 2006 – Fundamental Bioinformatics
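The phi and psi angles on this slide are ordinary dihedral (torsion) angles, so they can be computed from four backbone atom positions. The sketch below is one common way to do that; the function name `dihedral` and the tuple-based point format are assumptions made for illustration, not something the slides define. For phi one would pass C(i−1), N(i), Cα(i), C(i); for psi, N(i), Cα(i), C(i), N(i+1).

```python
import math

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle (degrees) for four 3-D points given as tuples."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b1 = sub(p1, p0)           # first bond vector
    b2 = sub(p2, p1)           # central bond (the rotation axis)
    b3 = sub(p3, p2)           # last bond vector
    n1 = cross(b1, b2)         # normal of the plane through p0, p1, p2
    n2 = cross(b2, b3)         # normal of the plane through p1, p2, p3
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

if __name__ == "__main__":
    # Fully extended (trans) arrangement: magnitude 180 degrees.
    print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1)))
```

An extended strand corresponds to angles near ±180°, while eclipsed (cis) atoms give 0°.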
![Page 13: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/13.jpg)
The Ramachandran Plot
•G. N. Ramachandran – first calculations of sterically allowed regions of phi and psi
•Note the structural importance of glycine
Figure panels: Calculated; Observed (non-glycine); Observed (glycine)
![Page 14: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/14.jpg)
Primary and Secondary Structure
•Primary structure = the linear sequence of amino acids comprising a protein:
AGVGTVPMTAYGNDIQYYGQVT…
•Secondary structure
Regular patterns of hydrogen bonding in proteins result in two patterns that emerge in nearly every protein structure known: the α-helix and the β-sheet.
The location and direction of these periodic, repeating structures is known as the secondary structure of the protein.
![Page 15: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/15.jpg)
The alpha Helix
φ ≈ ψ ≈ −60°
![Page 16: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/16.jpg)
Properties of the alpha helix
•φ ≈ ψ ≈ −60°
•Hydrogen bonds between C=O of residue n and NH of residue n+4
•3.6 residues/turn
•1.5 Å/residue rise
•100°/residue turn
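The numbers on this slide are mutually consistent, which a few lines of arithmetic make visible: 3.6 residues per turn at 100° per residue is one full 360° turn, and 3.6 residues at 1.5 Å each gives a pitch of about 5.4 Å. The sketch below only encodes the slide's values; the function names are illustrative assumptions.

```python
RISE_PER_RESIDUE = 1.5     # Å per residue (from the slide)
RESIDUES_PER_TURN = 3.6    # residues per turn (from the slide)
TURN_PER_RESIDUE = 100.0   # degrees per residue (from the slide)

def helix_length(n_residues):
    """Length of an n-residue helix along its axis, in Å."""
    return n_residues * RISE_PER_RESIDUE

def helix_pitch():
    """Rise along the axis for one full turn, in Å."""
    return RISE_PER_RESIDUE * RESIDUES_PER_TURN

def wheel_angle(i):
    """Angular position of residue i around the helix axis, in degrees."""
    return (i * TURN_PER_RESIDUE) % 360.0

def hbond_partner(n):
    """Residue whose N-H accepts the hydrogen bond from the C=O of residue n."""
    return n + 4

if __name__ == "__main__":
    print("pitch (Å):", helix_pitch())
    print("length of a 20-residue helix (Å):", helix_length(20))
    print("wheel angles:", [wheel_angle(i) for i in range(8)])
```

Plotting `wheel_angle` for each residue gives the usual helical-wheel view, which is a convenient way to see whether hydrophobic residues cluster on one face of a helix.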
![Page 17: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/17.jpg)
Properties of α-helices
•4 – 40+ residues in length
•Often amphipathic or “dual-natured”
  –Half hydrophobic and half hydrophilic
  –Mostly when surface-exposed
•If we examine many α-helices, we find trends…
  –Helix formers: Ala, Glu, Leu, Met
  –Helix breakers: Pro, Gly, Tyr, Ser
![Page 18: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/18.jpg)
The beta Strand (and Sheet)
φ ≈ −135°, ψ ≈ +135°
![Page 19: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/19.jpg)
Properties of beta sheets
•Formed of stretches of 5-10 residues in extended conformation
•Pleated – each Cα a bit above or below the previous
•Parallel/antiparallel, contiguous/non-contiguous
![Page 20: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/20.jpg)
Parallel and anti-parallel β-sheets
•Anti-parallel is slightly energetically favored
Figure labels: Anti-parallel, Parallel
![Page 21: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/21.jpg)
Turns and Loops
•Secondary structure elements are connected by regions of turns and loops
•Turns – short regions of non-α, non-β conformation
•Loops – larger stretches with no secondary structure. Often disordered.
  •“Random coil”
  •Sequences vary much more than secondary structure regions
![Page 22: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/22.jpg)
Levels of Protein Structure
•Secondary structure elements combine to form tertiary structure
•Quaternary structure occurs in multienzyme complexes
  –Many proteins are active only as homodimers, homotetramers, etc.
![Page 23: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/23.jpg)
Disulfide Bonds
•Two cysteines in close proximity will form a covalent bond
•Disulfide bond, disulfide bridge, or dicysteine bond.
•Significantly stabilizes tertiary structure.
![Page 24: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/24.jpg)
Protein Structure Examples
![Page 25: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/25.jpg)
Determining Protein Structure
•There are ~100,000 distinct proteins in the human proteome.
•3D structures have been determined for 14,000 proteins, from all organisms
  –Includes duplicates with different ligands bound, etc.
•Coordinates are determined by X-ray crystallography
![Page 26: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/26.jpg)
X-Ray diffraction
•Image is averaged over:
  –Space (many copies)
  –Time (of the diffraction experiment)
![Page 27: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/27.jpg)
Electron Density Maps
•Resolution is dependent on the quality/regularity of the crystal
•R-factor is a measure of “leftover” electron density
•Solvent fitting
•Refinement
![Page 28: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/28.jpg)
The Protein Data Bank
•http://www.rcsb.org/pdb/

ATOM 1 N ALA E 1 22.382 47.782 112.975 1.00 24.09 3APR 213
ATOM 2 CA ALA E 1 22.957 47.648 111.613 1.00 22.40 3APR 214
ATOM 3 C ALA E 1 23.572 46.251 111.545 1.00 21.32 3APR 215
ATOM 4 O ALA E 1 23.948 45.688 112.603 1.00 21.54 3APR 216
ATOM 5 CB ALA E 1 23.932 48.787 111.380 1.00 22.79 3APR 217
ATOM 6 N GLY E 2 23.656 45.723 110.336 1.00 19.17 3APR 218
ATOM 7 CA GLY E 2 24.216 44.393 110.087 1.00 17.35 3APR 219
ATOM 8 C GLY E 2 25.653 44.308 110.579 1.00 16.49 3APR 220
ATOM 9 O GLY E 2 26.258 45.296 110.994 1.00 15.35 3APR 221
ATOM 10 N VAL E 3 26.213 43.110 110.521 1.00 16.21 3APR 222
ATOM 11 CA VAL E 3 27.594 42.879 110.975 1.00 16.02 3APR 223
ATOM 12 C VAL E 3 28.569 43.613 110.055 1.00 15.69 3APR 224
ATOM 13 O VAL E 3 28.429 43.444 108.822 1.00 16.43 3APR 225
ATOM 14 CB VAL E 3 27.834 41.363 110.979 1.00 16.66 3APR 226
ATOM 15 CG1 VAL E 3 29.259 41.013 111.404 1.00 17.35 3APR 227
ATOM 16 CG2 VAL E 3 26.811 40.649 111.850 1.00 17.03 3APR 228
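Each ATOM record above lists, in order: atom serial number, atom name, residue name, chain, residue number, x/y/z coordinates in Å, occupancy, and temperature factor, followed by the entry ID and a record counter. The sketch below reads records shaped like the excerpt by splitting on whitespace. This is an assumption that suits the excerpt only: real PDB files are fixed-column, and fields can run together, so a production parser should slice by column or use an established library. The `Atom` tuple and function names are hypothetical.

```python
import math
from typing import NamedTuple

class Atom(NamedTuple):
    serial: int
    name: str
    res_name: str
    chain: str
    res_seq: int
    x: float
    y: float
    z: float
    occupancy: float
    b_factor: float

def parse_atom_line(line):
    """Parse one whitespace-separated ATOM record like those in the excerpt."""
    f = line.split()
    if len(f) < 11 or f[0] != "ATOM":
        raise ValueError("not an ATOM record: %r" % line)
    return Atom(int(f[1]), f[2], f[3], f[4], int(f[5]),
                float(f[6]), float(f[7]), float(f[8]),
                float(f[9]), float(f[10]))

def distance(a, b):
    """Euclidean distance between two atoms, in Å."""
    return math.sqrt((a.x - b.x) ** 2 + (a.y - b.y) ** 2 + (a.z - b.z) ** 2)

RECORDS = """\
ATOM 2 CA ALA E 1 22.957 47.648 111.613 1.00 22.40 3APR 214
ATOM 7 CA GLY E 2 24.216 44.393 110.087 1.00 17.35 3APR 219
ATOM 11 CA VAL E 3 27.594 42.879 110.975 1.00 16.02 3APR 223
"""

if __name__ == "__main__":
    atoms = [parse_atom_line(l) for l in RECORDS.splitlines()]
    for a, b in zip(atoms, atoms[1:]):
        print(a.res_name, b.res_name, round(distance(a, b), 2))
```

Consecutive Cα atoms in the excerpt come out roughly 3.8 Å apart, which is the spacing expected for neighboring residues joined by a trans peptide bond.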
![Page 29: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/29.jpg)
Views of a Protein
Wireframe
Ball and stick
![Page 30: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/30.jpg)
Views of a Protein
Spacefill
Cartoon
CPK colors:
Carbon = green, black
Nitrogen = blue
Oxygen = red
Sulfur = yellow
Hydrogen = white
![Page 31: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/31.jpg)
The Protein Folding Problem
•Central question of molecular biology: “Given a particular sequence of amino acid residues (primary structure), what will the tertiary/quaternary structure of the resulting protein be?”
•Input: AAVIKYGCAL…
•Output: φ1ψ1, φ2ψ2… = backbone conformation (no side chains yet)
![Page 32: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/32.jpg)
Forces Driving Protein Folding
•It is believed that hydrophobic collapse is a key driving force for protein folding
  –Hydrophobic core
  –Polar surface interacting with solvent
•Minimum volume (no cavities)
•Disulfide bond formation stabilizes
•Hydrogen bonds
•Polar and electrostatic interactions
![Page 33: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/33.jpg)
Folding Help
•Proteins are, in fact, only marginally stable
  –Native state is typically only 5 to 10 kcal/mole more stable than the unfolded form
•Many proteins help in folding
  –Protein disulfide isomerase – catalyzes shuffling of disulfide bonds
  –Chaperones – break up aggregates and (in theory) unfold misfolded proteins
![Page 34: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/34.jpg)
The Hydrophobic Core
•Hemoglobin A is the protein in red blood cells (erythrocytes) responsible for binding oxygen.
•The mutation E6V in the β chain places a hydrophobic Val on the surface of hemoglobin
•The resulting “sticky patch” causes hemoglobin S to agglutinate (stick together) and form fibers which deform the red blood cell and do not carry oxygen efficiently
•Sickle cell anemia was the first identified molecular disease
![Page 35: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/35.jpg)
Sickle Cell Anemia
Sequestering hydrophobic residues in the protein core protects proteins from hydrophobic agglutination.
![Page 36: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/36.jpg)
Computational Problems in Protein Folding
•Two key questions:
  –Evaluation – how can we tell a correctly folded protein from an incorrectly folded protein?
    •H-bonds, electrostatics, hydrophobic effect, etc.
    •Derive a function, see how well it does on “real” proteins
  –Optimization – once we get an evaluation function, can we optimize it?
    •Simulated annealing/Monte Carlo
    •EC (evolutionary computation)
    •Heuristics
![Page 37: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/37.jpg)
Fold Optimization
•Simple lattice models (HP-models)
  –Two types of residues: hydrophobic and polar
  –2-D or 3-D lattice
  –The only force is hydrophobic collapse
  –Score = number of H–H contacts
![Page 38: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/38.jpg)
Scoring Lattice Models
•H/P model scoring: count noncovalent hydrophobic interactions.
•Sometimes: penalize for buried polar or surface hydrophobic residues
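The basic H/P score can be written in a few lines. The sketch below assumes a 2-D square lattice, a sequence given as a string of `H` and `P`, and a fold given as a list of `(x, y)` coordinates, one per residue; those representations and the function name `hh_contacts` are illustrative choices, not taken from the slides. It counts only noncovalent contacts, that is, H residues that are lattice neighbors but not adjacent in the chain, and it leaves out the optional penalties.

```python
def hh_contacts(sequence, coords):
    """Count noncovalent H-H contacts for a fold on the 2-D square lattice."""
    if len(sequence) != len(coords):
        raise ValueError("one coordinate is needed per residue")
    index = {c: i for i, c in enumerate(coords)}
    if len(index) != len(coords):
        raise ValueError("fold has a bump: two residues share a lattice site")
    total = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        # Look only right and up so that every pair is counted exactly once.
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                total += 1
    return total

if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    print(hh_contacts("HPPH", square))   # the two H ends touch
```

Because chain neighbors are always lattice neighbors, excluding `abs(i - j) == 1` is what keeps covalent bonds out of the score.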
![Page 39: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/39.jpg)
What can we do with lattice models?
•For smaller polypeptides, exhaustive search can be used
  –Looking at the “best” fold, even in such a simple model, can teach us interesting things about the protein folding process
•For larger chains, other optimization and search methods must be used
  –Greedy, branch and bound
  –Evolutionary computing, simulated annealing
  –Graph theoretical methods
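For short chains, exhaustive search is simple enough to sketch directly: enumerate every self-avoiding walk on the lattice and keep the one with the most H–H contacts. The code below assumes a 2-D square lattice and fixes the first bond to point right, which removes rotated duplicates; the names `best_fold` and `hh_contacts` are illustrative. The number of walks grows exponentially with chain length, so this is only practical for roughly 15 to 20 residues.

```python
MOVES = ((1, 0), (-1, 0), (0, 1), (0, -1))

def hh_contacts(sequence, coords):
    """Count H-H lattice neighbours that are not chain neighbours."""
    index = {c: i for i, c in enumerate(coords)}
    total = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for neighbour in ((x + 1, y), (x, y + 1)):
            j = index.get(neighbour)
            if j is not None and sequence[j] == "H" and abs(i - j) > 1:
                total += 1
    return total

def best_fold(sequence):
    """Return (score, coords) of a best-scoring self-avoiding fold."""
    if not sequence:
        return 0, []
    best_score, best_coords = -1, None

    def extend(coords, occupied):
        nonlocal best_score, best_coords
        if len(coords) == len(sequence):
            score = hh_contacts(sequence, coords)
            if score > best_score:
                best_score, best_coords = score, list(coords)
            return
        x, y = coords[-1]
        for dx, dy in MOVES:
            nxt = (x + dx, y + dy)
            if nxt not in occupied:          # self-avoidance: no bumps
                coords.append(nxt)
                occupied.add(nxt)
                extend(coords, occupied)
                coords.pop()
                occupied.remove(nxt)

    start = [(0, 0)]
    if len(sequence) > 1:
        start.append((1, 0))                 # fix the first bond direction
    extend(start, set(start))
    return best_score, best_coords

if __name__ == "__main__":
    print(best_fold("HHPPHH"))
```

Branch and bound, mentioned on the slide, would prune `extend` as soon as the contacts still achievable cannot beat `best_score`; that refinement is left out here.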
![Page 40: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/40.jpg)
Learning from Lattice Models
The “hydrophobic zipper” effect:
Ken Dill ~ 1997
![Page 41: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/41.jpg)
Representing a lattice model
•Absolute directions: UURRDLDRRU
•Relative directions: LFRFRRLLFFL
  –Advantage: immediate reversals (UD or RL in absolute directions) cannot occur
  –Only three directions: L, R, F
•What about bumps? LFRRR
  –Bad score
  –Use a better representation
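Both encodings can be decoded into lattice coordinates with a small amount of code, which also makes the bump problem concrete. The sketch below assumes a 2-D square lattice, one letter per bond, the first residue at the origin, and an initial heading of "up" for the relative encoding; those conventions and the function names are assumptions for illustration.

```python
ABSOLUTE = {"U": (0, 1), "D": (0, -1), "L": (-1, 0), "R": (1, 0)}

def turn(heading, move):
    """Rotate a heading vector for a relative move: L(eft), R(ight), F(orward)."""
    dx, dy = heading
    if move == "F":
        return (dx, dy)
    if move == "L":
        return (-dy, dx)
    if move == "R":
        return (dy, -dx)
    raise ValueError("unknown relative move: %r" % move)

def decode_absolute(moves):
    """Coordinates visited by a string of absolute moves (U, D, L, R)."""
    pos, coords = (0, 0), [(0, 0)]
    for m in moves:
        dx, dy = ABSOLUTE[m]
        pos = (pos[0] + dx, pos[1] + dy)
        coords.append(pos)
    return coords

def decode_relative(moves, heading=(0, 1)):
    """Coordinates visited by a string of relative moves (L, R, F)."""
    pos, coords = (0, 0), [(0, 0)]
    for m in moves:
        heading = turn(heading, m)
        pos = (pos[0] + heading[0], pos[1] + heading[1])
        coords.append(pos)
    return coords

def has_bump(coords):
    """True if two residues occupy the same lattice site."""
    return len(set(coords)) != len(coords)

if __name__ == "__main__":
    for label, coords in (("UURRDLDRRU", decode_absolute("UURRDLDRRU")),
                          ("LFRFRRLLFFL", decode_relative("LFRFRRLLFFL")),
                          ("LFRRR", decode_relative("LFRRR"))):
        print(label, "bump" if has_bump(coords) else "ok")
```

Under these conventions the slide's two example strings decode to self-avoiding walks, while LFRRR curls back onto the second residue.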
![Page 42: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/42.jpg)
Preference-order representation
•Each position has two “preferences”
  –If it can’t have either of the two, it will take the “least favorite” path if possible
•Example: {LR},{FL},{RL},{FR},{RL},{RL},{FR},{RF}
•Can still cause bumps: {LF},{FR},{RL},{FL},{RL},{FL},{RF},{RL},{FL}
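One reading of the preference-order scheme is: at each step try the first preferred relative direction, then the second, then the remaining third direction, and report a bump only when all three target sites are occupied. The sketch below implements that reading; it is an interpretation of the slide rather than a confirmed specification, and the initial "up" heading, the two-letter gene strings, and the function names are assumptions.

```python
def turn(heading, move):
    """Rotate a heading vector for a relative move: L(eft), R(ight), F(orward)."""
    dx, dy = heading
    return {"F": (dx, dy), "L": (-dy, dx), "R": (dy, -dx)}[move]

def decode_preferences(genes, heading=(0, 1)):
    """Decode genes such as ["LR", "FL"] into coordinates.

    Returns (coords, ok). ok is False if some step had no free site,
    in which case coords holds the residues placed before the bump.
    """
    pos = (0, 0)
    coords, occupied = [pos], {pos}
    for first, second in genes:
        if first == second or {first, second} - set("LRF"):
            raise ValueError("a gene needs two distinct moves from L, R, F")
        least = (set("LRF") - {first, second}).pop()   # the "least favorite"
        for choice in (first, second, least):
            h = turn(heading, choice)
            nxt = (pos[0] + h[0], pos[1] + h[1])
            if nxt not in occupied:
                heading, pos = h, nxt
                break
        else:
            return coords, False        # all three neighbours occupied
        coords.append(pos)
        occupied.add(pos)
    return coords, True

if __name__ == "__main__":
    good = ["LR", "FL", "RL", "FR", "RL", "RL", "FR", "RF"]
    bad = ["LF", "FR", "RL", "FL", "RL", "FL", "RF", "RL", "FL"]
    print(decode_preferences(good)[1], decode_preferences(bad)[1])
```

With this reading, the slide's first example decodes without collision (the seventh gene falls back to its least favorite direction), and the second example dead-ends on its last gene.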
![Page 43: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/43.jpg)
More Realistic Models
•Higher resolution lattices (45° lattice, etc.)
•Off-lattice models
  –Local moves
  –Optimization/search methods and φ/ψ representations
    •Greedy search
    •Branch and bound
    •EC, Monte Carlo, simulated annealing, etc.
![Page 44: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/44.jpg)
The Other Half of the Picture
•Now that we have a more realistic off-lattice model, we need a better energy function to evaluate a conformation (fold).
•Theoretical force field: $G = G_{\text{van der Waals}} + G_{\text{h-bonds}} + G_{\text{solvent}} + G_{\text{coulomb}}$
•Empirical force fields
  –Start with a database
  –Look at neighboring residues – similar to known protein folds?
![Page 45: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/45.jpg)
Threading: Fold recognition
•Given:
  –Sequence: IVACIVSTEYDVMKAAR…
  –A database of molecular coordinates
•Map the sequence onto each fold
•Evaluate
  –Objective 1: improve scoring function
  –Objective 2: folding
![Page 46: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/46.jpg)
Secondary Structure Prediction
AGVGTVPMTAYGNDIQYYGQVT…
A-VGIVPM-AYGQDIQY-GQVT…
AG-GIIP--AYGNELQ--GQVT…
AGVCTVPMTA---ELQYYG--T…

AGVGTVPMTAYGNDIQYYGQVT…
----hhhHHHHHHhhh--eeEE…
![Page 47: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/47.jpg)
Secondary Structure Prediction
•Easier than folding
  –Current algorithms can predict secondary structure with 70-80% accuracy
•Chou, P.Y. & Fasman, G.D. (1974). Biochemistry, 13, 211-222.
  –Based on frequencies of occurrence of residues in helices and sheets
•PHD – Neural network based
  –Uses a multiple sequence alignment
  –Rost & Sander, Proteins, 1994, 19, 55-72
![Page 48: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/48.jpg)
Chou-Fasman Parameters
| Name | Abbrv | P(a) | P(b) | P(turn) | f(i) | f(i+1) | f(i+2) | f(i+3) |
|---|---|---|---|---|---|---|---|---|
| Alanine | A | 142 | 83 | 66 | 0.06 | 0.076 | 0.035 | 0.058 |
| Arginine | R | 98 | 93 | 95 | 0.07 | 0.106 | 0.099 | 0.085 |
| Aspartic Acid | D | 101 | 54 | 146 | 0.147 | 0.11 | 0.179 | 0.081 |
| Asparagine | N | 67 | 89 | 156 | 0.161 | 0.083 | 0.191 | 0.091 |
| Cysteine | C | 70 | 119 | 119 | 0.149 | 0.05 | 0.117 | 0.128 |
| Glutamic Acid | E | 151 | 37 | 74 | 0.056 | 0.06 | 0.077 | 0.064 |
| Glutamine | Q | 111 | 110 | 98 | 0.074 | 0.098 | 0.037 | 0.098 |
| Glycine | G | 57 | 75 | 156 | 0.102 | 0.085 | 0.19 | 0.152 |
| Histidine | H | 100 | 87 | 95 | 0.14 | 0.047 | 0.093 | 0.054 |
| Isoleucine | I | 108 | 160 | 47 | 0.043 | 0.034 | 0.013 | 0.056 |
| Leucine | L | 121 | 130 | 59 | 0.061 | 0.025 | 0.036 | 0.07 |
| Lysine | K | 114 | 74 | 101 | 0.055 | 0.115 | 0.072 | 0.095 |
| Methionine | M | 145 | 105 | 60 | 0.068 | 0.082 | 0.014 | 0.055 |
| Phenylalanine | F | 113 | 138 | 60 | 0.059 | 0.041 | 0.065 | 0.065 |
| Proline | P | 57 | 55 | 152 | 0.102 | 0.301 | 0.034 | 0.068 |
| Serine | S | 77 | 75 | 143 | 0.12 | 0.139 | 0.125 | 0.106 |
| Threonine | T | 83 | 119 | 96 | 0.086 | 0.108 | 0.065 | 0.079 |
| Tryptophan | W | 108 | 137 | 96 | 0.077 | 0.013 | 0.064 | 0.167 |
| Tyrosine | Y | 69 | 147 | 114 | 0.082 | 0.065 | 0.114 | 0.125 |
| Valine | V | 106 | 170 | 50 | 0.062 | 0.048 | 0.028 | 0.053 |
![Page 49: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/49.jpg)
Chou-Fasman Algorithm
•Identify α-helices
  –4 out of 6 contiguous amino acids that have P(a) > 100
  –Extend the region until 4 amino acids with P(a) < 100 found
  –Compute P(a) and P(b); if the region is >5 residues and P(a) > P(b), identify as a helix
•Repeat for β-sheets [use P(b)]
•If an α and a β region overlap, the overlapping region is predicted according to P(a) and P(b)
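The helix step can be sketched as follows, using the P(a) and P(b) columns of the Chou-Fasman parameter table. Several details are assumptions where the slide is not specific: the "4 amino acids with P(a) < 100" are taken to be contiguous, P(a) and P(b) for a region are summed over its residues, and a region is not allowed to grow leftward into residues already scanned. The function names are illustrative.

```python
# residue: (P(a), P(b)), copied from the Chou-Fasman parameter table
CF = {
    "A": (142, 83), "R": (98, 93), "D": (101, 54), "N": (67, 89),
    "C": (70, 119), "E": (151, 37), "Q": (111, 110), "G": (57, 75),
    "H": (100, 87), "I": (108, 160), "L": (121, 130), "K": (114, 74),
    "M": (145, 105), "F": (113, 138), "P": (57, 55), "S": (77, 75),
    "T": (83, 119), "W": (108, 137), "Y": (69, 147), "V": (106, 170),
}

def all_weak(segment, size):
    """True if segment has exactly `size` residues, all with P(a) < 100."""
    return len(segment) == size and all(CF[r][0] < 100 for r in segment)

def find_helices(seq, window=6, needed=4, stop=4):
    """Return a list of (start, end, sum P(a), sum P(b)) for predicted helices.

    start and end are 0-based and end is exclusive.
    """
    helices = []
    n = len(seq)
    i = 0
    floor = 0          # leftmost position a new region may extend back to
    while i + window <= n:
        formers = sum(CF[r][0] > 100 for r in seq[i:i + window])
        if formers < needed:
            i += 1
            continue
        start, end = i, i + window
        # Extend right until the next `stop` residues are all weak formers.
        while end < n and not all_weak(seq[end:end + stop], stop):
            end += 1
        # Extend left in the same way, but not past residues already scanned.
        while start > floor and not all_weak(seq[max(floor, start - stop):start], stop):
            start -= 1
        region = seq[start:end]
        p_a = sum(CF[r][0] for r in region)
        p_b = sum(CF[r][1] for r in region)
        if len(region) > 5 and p_a > p_b:
            helices.append((start, end, p_a, p_b))
        i = floor = end
    return helices

if __name__ == "__main__":
    print(find_helices("CAENKLDHVRGPTCILFMTWYNDGP"))
```

The same scan with the P(b) column and sheet-specific thresholds would give the β-sheet pass; resolving overlaps between the two passes is not shown.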
![Page 50: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/50.jpg)
Chou-Fasman, cont’d
•Identify hairpin turns:
  –P(t) = f(i) of the residue × f(i+1) of the next residue × f(i+2) of the following residue × f(i+3) of the residue at position (i+3)
  –Predict a hairpin turn starting at positions where:
    •P(t) > 0.000075
    •The average P(turn) for the four residues > 100
    •P(a) < P(turn) > P(b) for the four residues
•Accuracy 60-65%
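The turn test is a product of four positional frequencies plus three averages, so it is easy to check by hand or in code. The sketch below copies the full parameter table from the earlier slide and evaluates one four-residue window; the tuple layout and the function names `turn_stats` and `is_turn` are assumptions for illustration.

```python
# residue: (P(a), P(b), P(turn), f(i), f(i+1), f(i+2), f(i+3)), from the table
PARAMS = {
    "A": (142, 83, 66, 0.06, 0.076, 0.035, 0.058),
    "R": (98, 93, 95, 0.07, 0.106, 0.099, 0.085),
    "D": (101, 54, 146, 0.147, 0.11, 0.179, 0.081),
    "N": (67, 89, 156, 0.161, 0.083, 0.191, 0.091),
    "C": (70, 119, 119, 0.149, 0.05, 0.117, 0.128),
    "E": (151, 37, 74, 0.056, 0.06, 0.077, 0.064),
    "Q": (111, 110, 98, 0.074, 0.098, 0.037, 0.098),
    "G": (57, 75, 156, 0.102, 0.085, 0.19, 0.152),
    "H": (100, 87, 95, 0.14, 0.047, 0.093, 0.054),
    "I": (108, 160, 47, 0.043, 0.034, 0.013, 0.056),
    "L": (121, 130, 59, 0.061, 0.025, 0.036, 0.07),
    "K": (114, 74, 101, 0.055, 0.115, 0.072, 0.095),
    "M": (145, 105, 60, 0.068, 0.082, 0.014, 0.055),
    "F": (113, 138, 60, 0.059, 0.041, 0.065, 0.065),
    "P": (57, 55, 152, 0.102, 0.301, 0.034, 0.068),
    "S": (77, 75, 143, 0.12, 0.139, 0.125, 0.106),
    "T": (83, 119, 96, 0.086, 0.108, 0.065, 0.079),
    "W": (108, 137, 96, 0.077, 0.013, 0.064, 0.167),
    "Y": (69, 147, 114, 0.082, 0.065, 0.114, 0.125),
    "V": (106, 170, 50, 0.062, 0.048, 0.028, 0.053),
}

def turn_stats(tetra):
    """Return (P(t), avg P(a), avg P(b), avg P(turn)) for four residues."""
    if len(tetra) != 4:
        raise ValueError("a turn window has exactly four residues")
    p_t = 1.0
    for k, residue in enumerate(tetra):
        p_t *= PARAMS[residue][3 + k]      # f(i+k) of the k-th residue
    def average(column):
        return sum(PARAMS[r][column] for r in tetra) / 4
    return p_t, average(0), average(1), average(2)

def is_turn(tetra):
    """Apply the three turn criteria from the slide to one window."""
    p_t, avg_a, avg_b, avg_turn = turn_stats(tetra)
    return p_t > 0.000075 and avg_turn > 100 and avg_a < avg_turn > avg_b

if __name__ == "__main__":
    print(turn_stats("VRGP"), is_turn("VRGP"))
```

Sliding `is_turn` over every four-residue window of a sequence gives the candidate turn positions.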
![Page 51: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/51.jpg)
Chou-Fasman Example
•CAENKLDHVRGPTCILFMTWYNDGP
•CAENKL – Potential helix (all but C and N have P(a) > 100)
•Residues with P(a) < 100: RNCGPSTY
  –Extend: When we reach RGPT, we must stop
  –CAENKLDHV: summed P(a) = 972, summed P(b) = 843
  –Declare alpha helix
•Identifying a hairpin turn
  –VRGP: P(t) = 0.000085
  –Average P(turn) = 113.25
    •Avg P(a) = 79.5, Avg P(b) = 98.25
![Page 52: From Sequences to Structure](https://reader036.vdocument.in/reader036/viewer/2022062407/56812bc9550346895d901d87/html5/thumbnails/52.jpg)
Lots More to Come
•Microarray analysis
•Mass Spectrometry
•Interactions/Knockouts
•Synthetic Lethality
•RPPA
•…