Proteins - eb.tuebingen.mpg.de
1
Proteins: Evolution & Design
2
Outline
● What have we learned from nature → protein evolution
  – Permanence of proteins in evolution
    ● Reconstruction of evolutionary history
  – Change of proteins in evolution
● How can we apply this knowledge → protein design
  – Computational design
  – Design of proteins
  – Design of enzymes
3
Similarity suggests evolutionary relationships
Similarity in
● Sequence
● Structure
● Function / mechanism
→ Homology or analogy
4
Sequence determines structure
Sequence specifies structure
C. B. Anfinsen, 1950s
5
Protein sequence space is BIG
[Figure: positions A1, A2, …, A7 of a polypeptide chain, each with 20 choices: 20 × 20 × 20 × …]
Each of the 20 amino acids can be at each position of the polypeptide chain
→ $20^{N_{res}}$ possible protein sequences
A polypeptide of 100 residues → more possible sequences than particles in the universe
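As a quick check of the last claim, the count $20^{N_{res}}$ can be compared on a log scale. This is a minimal sketch; the figure of roughly $10^{80}$ particles in the observable universe is a commonly quoted order-of-magnitude estimate assumed here, not a number from the slides.

```python
import math


def log10_sequence_count(n_res: int, alphabet_size: int = 20) -> float:
    """log10 of alphabet_size ** n_res, i.e. of the number of possible sequences."""
    return n_res * math.log10(alphabet_size)


# Commonly quoted order of magnitude for particles in the observable universe
# (an assumption for this comparison, not a value from the slides).
LOG10_PARTICLES_IN_UNIVERSE = 80

exponent = log10_sequence_count(100)
print(f"20^100 ~ 10^{exponent:.1f}")           # 20^100 ~ 10^130.1
print(exponent > LOG10_PARTICLES_IN_UNIVERSE)  # True
```

Working in log10 avoids building the 131-digit integer and makes the comparison with the particle estimate a simple subtraction of exponents.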
6
Most polypeptides don’t fold
Enormous number of possible amino acid sequences
Only a tiny subset folds reliably into a functional native state
Keefe and Szostak (2001) Nature 410; Wei et al. (2003) PNAS 100
7
Levinthal's paradox
● Protein conformational space is huge
  – Two torsion angles (Φ, Ψ) per residue
  – m values per torsion angle
  – $m^{2N_{res}}$ possible backbone conformations
● Yet proteins fold quickly
  – within a few ms or μs
wikipedia
8
How did proteins originate?
9
Evolution is cumulative
● Cumulative selection
  – Evolution is not a completely random search
  – Partially correct intermediates are retained
● Eventually, they build up to large, stable folds
  – Duplication, fusion, recombination, accretion
Berg JM et al. Biochemistry. 5th ed.
[Figure (Berg et al.): number of keystrokes]
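The difference between random search and cumulative selection can be sketched with a toy keystroke model in the spirit of the figure: a phrase is typed at random, but every correct letter is retained and only the wrong positions are retyped. The target phrase, the 26-letter alphabet and the function names are assumptions for illustration; this is an analogy, not a model of protein evolution.

```python
import random
import string

ALPHABET = string.ascii_uppercase
TARGET = "TOBEORNOTTOBE"  # hypothetical target phrase, chosen for illustration


def cumulative_selection(target: str, seed: int = 0, max_rounds: int = 100_000) -> int:
    """Count keystrokes until `target` is reached when correct letters are retained."""
    rng = random.Random(seed)
    current = [rng.choice(ALPHABET) for _ in target]
    keystrokes = len(target)  # the first, fully random attempt
    for _ in range(max_rounds):
        if "".join(current) == target:
            return keystrokes
        # Only positions that are still wrong are retyped; partially correct
        # intermediates are kept, as in cumulative selection.
        for i, wanted in enumerate(target):
            if current[i] != wanted:
                current[i] = rng.choice(ALPHABET)
                keystrokes += 1
    raise RuntimeError("target not reached within max_rounds")


def expected_random_keystrokes(target: str) -> int:
    """Expected keystrokes if every attempt retypes the whole phrase from scratch."""
    return len(target) * len(ALPHABET) ** len(target)


print(cumulative_selection(TARGET))
print(expected_random_keystrokes(TARGET))  # 13 * 26**13, about 3e19
```

With retention, each position needs on average about 26 tries, so the whole phrase typically takes a few hundred keystrokes; without retention, the expected cost grows exponentially with phrase length.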
10
Protein evolution is cumulative
● “Correctness”
  – Native proteins are only marginally stable
    ● ca. 0.1 kcal/mol per residue
  – Intermediates are not necessarily stable on their own
    ● Rather, they show significant structural preference
  – Worked as cofactors of ribozymes
    ● Assume preferred structure upon binding
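To see what “marginally stable” means in practice, the per-residue figure can be turned into a total unfolding free energy and a folded fraction. This sketch assumes a simple two-state (folded ⇌ unfolded) model at 25 °C; the model and the function names are illustrative assumptions, not part of the slide.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def unfolding_free_energy(n_res: int, per_residue: float = 0.1) -> float:
    """Total stability (kcal/mol) using the ~0.1 kcal/mol per residue rule of thumb."""
    return n_res * per_residue


def fraction_folded(dg_unfold: float, temp_k: float = 298.15) -> float:
    """Two-state model: K = [folded]/[unfolded] = exp(dG_unfold / RT)."""
    k = math.exp(dg_unfold / (R_KCAL * temp_k))
    return k / (1.0 + k)


dg = unfolding_free_energy(100)  # ~10 kcal/mol for a 100-residue protein
print(f"dG_unfold = {dg:.1f} kcal/mol, unfolded fraction = {1 - fraction_folded(dg):.1e}")
```

So a protein can be "marginal" on a per-residue basis and still be almost fully folded; conversely, because the equilibrium constant depends exponentially on ΔG, losing a few kcal/mol shifts the folded/unfolded ratio by orders of magnitude.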
11
Ss shared between different folds
12
“Piecing together the evolution of proteins”
Alva et al. (2015) eLife
13
Reconstruction of Evolutionary Events
● Evolution cannot be proven
  – direct observation of intermediate forms is impossible
● We study the likelihood of certain types of events and use it to extrapolate from traits observed today
  – We use protein sequences as documents of evolutionary events
● Why can we extrapolate so far back in time?
  – Because of the high evolutionary permanence of protein domains
    ● Sequence / structure / function / mechanism
Lupas & Koretke (2008) Evolution of protein folds
...
14
Permanence of proteins across time: Ubiquitin
human UBI   MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG  76
96% id      |||||||||||||||||| |||| ||| ||||||||||||||||||||||||||||||||||||||||||||||||
yeast UBI   MQIFVKTLTGKTITLEVESSDTIDNVKSKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG  76
13% id          |       |                     |   |  |    |       |  |                ||
yeast SUMO  INLKVSD-GSSEIFFKIKKTTPLRRLMEAFAKRQGKEMDSLRFLYDGIRIQADQTPEDLDMEDNDIIEAHREQIGG  86
Human & yeast UBI
● diverged > 2 billion years ago
● vital for eukaryotic protein degradation
Yeast UBI & SUMO (paralogs)
● only 13% sequence identity
● but key residues conserved (found by profile-based methods, e.g. PSI-BLAST)
Lupas & Koretke (2008) Evolution of protein folds
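The identity values in the ubiquitin/SUMO alignment on this slide can be recomputed directly from the aligned sequences. This sketch assumes one common convention (identical columns divided by alignment length, gaps counted as mismatches); other tools normalize differently, so reported values can shift slightly.

```python
def percent_identity(a: str, b: str) -> float:
    """Identical aligned columns / alignment length; gap columns never count as identical."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    matches = sum(x == y and x != "-" for x, y in zip(a, b))
    return 100.0 * matches / len(a)


human_ubi = "MQIFVKTLTGKTITLEVEPSDTIENVKAKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
yeast_ubi = "MQIFVKTLTGKTITLEVESSDTIDNVKSKIQDKEGIPPDQQRLIFAGKQLEDGRTLSDYNIQKESTLHLVLRLRGG"
yeast_sumo = "INLKVSD-GSSEIFFKIKKTTPLRRLMEAFAKRQGKEMDSLRFLYDGIRIQADQTPEDLDMEDNDIIEAHREQIGG"

print(f"{percent_identity(human_ubi, yeast_ubi):.0f}%")   # 96%  (73 of 76 columns)
print(f"{percent_identity(yeast_ubi, yeast_sumo):.0f}%")  # 13%  (10 of 76 columns)
```

Pairwise identity this low is why the UBI/SUMO relationship needs profile-based methods: a single pairwise comparison at ~13% is hard to distinguish from chance.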
15
Permanence of proteins across time: Ribosomal proteins
Lupas & Koretke (2008) Evolution of protein folds
The sequences of core ribosomal proteins are still more than 40% identical between all organisms.
Ribosomal proteins are so central to cellular processes that their modification has become nearly impossible
→ living fossils
David Goodsell & RCSB
16
Permanence of proteins across time: Point mutations do not alter the fold
● Hundreds of crystal structures of wild-type proteins and their mutants show that point mutations generally do not alter the fold of a protein.
  – e.g. look at any SCOP family
[Figure: RNA-binding protein ROP, a left-handed, antiparallel helical bundle (PDB 1ROP) → 1 mutation (A31P) → right-handed, mixed parallel & antiparallel bundle (PDB 1B6Q); termini labeled N, C and N′, C′]
Glykos, Cesareni & Kokkinidis (1999) Structure 7:597
● Exceptions do exist, e.g. in coiled coils and helical bundles
20
Reconstruction of ancient protein sequences
● It allows us to investigate directly in the lab
  – the evolutionary past of present-day protein structure and function
  – evolutionary pathways
  – adaptive selection
  – functional divergence
● Suggested by Pauling and Zuckerkandl in 1963
● Only became feasible recently owing to the development of
  – genome sequence databases
  – phylogenetic inference methods
● Methods:
  – Parsimony
  – Maximum likelihood
  – Bayesian inference
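Of the methods listed, parsimony is the easiest to sketch. The example below implements the bottom-up pass of Fitch's small-parsimony algorithm for each alignment column on a fixed, rooted binary tree: it returns the most parsimonious residue set at the root and the minimum number of substitutions. The tree topology and sequences are hypothetical, and most current ancestral reconstructions rely on maximum likelihood or Bayesian methods rather than on this toy.

```python
from typing import Dict, List, Set, Tuple, Union

# A topology is either a leaf name or a (left, right) pair of subtopologies.
Topology = Union[str, Tuple["Topology", "Topology"]]


def fitch(topology: Topology, column: Dict[str, str]) -> Tuple[Set[str], int]:
    """Bottom-up Fitch pass for one column: (candidate states, substitutions)."""
    if isinstance(topology, str):
        return {column[topology]}, 0
    left_states, left_cost = fitch(topology[0], column)
    right_states, right_cost = fitch(topology[1], column)
    common = left_states & right_states
    if common:  # children agree: no extra substitution needed
        return common, left_cost + right_cost
    # children disagree: take the union and count one substitution
    return left_states | right_states, left_cost + right_cost + 1


def reconstruct_root(topology: Topology, seqs: Dict[str, str]) -> Tuple[List[List[str]], int]:
    """Apply Fitch column by column; return root candidate states and total cost."""
    length = len(next(iter(seqs.values())))
    root_states, total = [], 0
    for i in range(length):
        states, cost = fitch(topology, {name: s[i] for name, s in seqs.items()})
        root_states.append(sorted(states))
        total += cost
    return root_states, total


# Hypothetical four-taxon example with topology ((A, B), (C, D)).
seqs = {"A": "MKV", "B": "MKI", "C": "MRI", "D": "LRI"}
print(reconstruct_root((("A", "B"), ("C", "D")), seqs))
# ([['M'], ['K', 'R'], ['I']], 3)
```

The ambiguous middle column ({K, R}) is where a top-down pass, or a probabilistic method that weighs branch lengths and substitution rates, is needed to pick a single ancestral state.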
21
Horizontal vs. vertical analysis of protein families
Harms & Thornton (2010) Curr Opin Struct Biol,20:360
The functional change was caused by a subset of sequence changes along branch C (black box), while permissive mutations on branch B (star) were required to allow the protein to tolerate the function-switching mutations. Restrictive mutations incompatible with the ancestral function accumulated on branch D (cross). Swapping residues between modern proteins (arrow) is inefficient because the sequences differ by all mutations along A, B, C, and D. Protein X does not have the permissive mutations and cannot take on the derived function, while protein Y has restrictive mutations that do not allow it to tolerate the ancestral function.
ancestral function (filled circle), derived function (open circle)
22
Origin and evolution of thermophily: Ancestral reconstruction of LeuB
Thermophily is thought to be a primitive trait, characteristic of early forms of life on Earth
Optimal growth temperatures: 20 °C, 25–30 °C, 37 °C, 45–50 °C, 60–80 °C
Hobbs et al. (2012) Mol Biol Evol,29:825
23
Origin and evolution of thermophily: Ancestral reconstruction of LeuB
Hobbs et al. (2012) Mol Biol Evol,29:825
Trend in thermal adaptation for reconstructed ancestral LeuB enzymes over evolutionary time/estimated age.
Unusual catalytic properties of ANC4 compared with other ancestral LeuB enzymes.
‘Structural analysis suggests that the determinants of thermophily in LeuB from the LCA of Bacillus and the most recent ancestor (ANC1) are distinct and that thermophily has arisen in this genus at least twice via independent evolutionary paths.’
[Figure: reconstructed ancestral nodes ANC1–ANC4]
24
Fold change in evolution
● Fold space is small
  – Thus, structural similarity is more likely than sequence similarity to reflect analogy
● Homologous proteins can also show major structural differences
● Mechanisms
  – Insertions / deletions
  – Circular permutations
  – Point mutations
  – Topological substitutions
  – Swaps & strand invasions
  – Duplication & fusion
  – Environment
Grishin (2001) JSB,134:167
[Figure: substitutions of SSEs]
25
Mechanisms of fold change: Deletion
[Figure: bacterial luciferase (PDB 1LUC) → deletion → nonfluorescent flavoprotein (PDB 1NFP); N- and C-termini labeled]
Grishin (2001) JSB,134:167
26
Mechanisms of fold change: Circular permutation
[Figure: C2 domain of synaptotagmin I (PDB 1RSY) → circular permutation → C2 domain of phospholipase C (PDB 1QAS); N- and C-termini labeled]
Grishin (2001) JSB,134:167
27
Mechanisms of fold change: Circular permutation
● Tandem, in-frame gene duplication
● Loss of stop codon
● 3' deletion
● Resolution through further deletion:
  – I: Gene returns to previous state
  – II: Circular permutation
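The duplication-and-deletion route to a circular permutation can be mimicked on a string in which each letter stands for one segment of the protein. This is a minimal sketch under that simplifying assumption; the function names and the example "gene" are hypothetical.

```python
def tandem_fusion(gene: str) -> str:
    """Tandem, in-frame duplication with the stop codon between the copies lost."""
    return gene + gene


def resolve_by_deletion(fused: str, start: int) -> str:
    """Delete everything 5' of `start` and everything 3' of one gene length later."""
    length = len(fused) // 2
    return fused[start:start + length]


def is_circular_permutation(a: str, b: str) -> bool:
    """b is a circular permutation of a if it is a same-length window of a + a."""
    return len(a) == len(b) and b in a + a


gene = "ABCDEFGH"
fused = tandem_fusion(gene)                        # "ABCDEFGHABCDEFGH"
print(resolve_by_deletion(fused, 0))               # ABCDEFGH  (I: gene returns to previous state)
print(resolve_by_deletion(fused, 3))               # DEFGHABC  (II: circular permutation)
print(is_circular_permutation(gene, "DEFGHABC"))   # True
```

The window view makes the outcome easy to see: whichever length-preserving window survives the deletions, the result is either the original order or a rotation of it, with new termini and the old termini joined.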
28
Mechanisms of fold change: Target binding
Transcription factor NusG: NTD & CTD independent (PDBs 2K06 & 2JVV)
Transcription factor RfaH (paralog of NusG): CTD shields the RNA polymerase (RNAP)-binding site in the NTD (PDB 2OUG)
Belogurov et al. (2007) Mol Cell,26:117
The domains dissociate upon DNA binding
29
Mechanisms of fold change: Target binding (cont.)
Transcription factor NusG: NTD & CTD independent (PDBs 2K06 & 2JVV)
Burmann et al. (2012) Cell
X-ray structure of RfaH (PDB 2OUG)
NMR structure of RfaH-CTD (PDB 2LCL)
Switch from a transcription to a translation factor: upon interaction with its target DNA, the CTD opens up an RNAP-binding site
30
Mechanisms of fold change: Duplication, differentiation & swapping
[Figure: duplication and differentiation; 3D domain swapping; decoration, duplication and differentiation. Examples: AAA+ C-domain (PDB 1IN4), ClpA N-domain (PDB 1K6K), histone dimer (PDB 1B67); N- and C-termini labeled]
Alva et al. (2007) BMC Struct Biol,7:17
31
Fold change based on environment: Lymphotactin
[Figure: monomeric chemokine fold (PDB 1J8I) ⇌ dimeric β-sandwich fold (PDB 2JP1), shifted by salt/temperature]
● Monomeric chemokine fold: agonist of the G-protein-coupled receptor XCR1
● Dimeric β-sandwich fold: binds glycosaminoglycans
Under normal conditions lymphotactin exists in both forms. The equilibrium between these two states can be shifted by varying salt and temperature conditions. Other chemokines are restricted to a single conformation by two conserved disulphide bonds, one of which is absent in lymphotactin. When this disulphide bond is engineered into lymphotactin, the protein is locked in the monomeric state.
32
Fold change based on environment: Influenza hemagglutinin
[Figure: α-helical homotrimer (PDB 1HGG) → (low pH) → three-helix bundle (PDB 1HTM), which mediates membrane fusion; N- and C-termini labeled]
Bullough et al. (1994) Nature,371:37
Under acidic conditions, hemagglutinin rearranges by extending the N-terminal α-helices of the monomers. The rearranged helical segments form a long three-helix bundle. The change mediates fusion of the viral and host-cell membranes.
33
Fold change through engineering
Alexander et al. (2009) PNAS,106:21149
GA95: albumin-binding 3α fold (PDB 2KDL) → mutation → GB95: IgG-binding 4β+α fold (PDB 2KDM)
34
Fold change through engineering
Alexander et al. (2009) PNAS,106:21149
GA77: albumin-binding
GB77: IgG-binding
35
Outline
● What have we learned from nature → protein evolution
  – Permanence of proteins in evolution
    ● Reconstruction of evolutionary history
  – Change of proteins in evolution
● How can we apply this knowledge → protein design
  – Computational design
  – Fold change by engineering
  – Enzyme design
36
Protein design: The inverse folding problem
Sequence determines structure
Design a sequence for a predefined structure
37
Native structures are conformational free-energy minima
[Figure: energy landscape, native vs. unfolded states]
Protein design: for which sequence is this structure an energy minimum?
Anfinsen's dogma:
The native structure corresponds to a
1. unique
2. stable
3. kinetically accessible
free-energy minimum.
38
Conformational degrees of freedom
Figure: Harder et al. (2010) BMC Bioinformatics
Degrees of freedom in glutamic acid:
• Backbone dihedrals Φ and Ψ
• Side-chain dihedrals χ1, χ2, χ3
→ ~$3^{2N_{res}}$ polypeptide conformations (about 3 states per backbone dihedral)
If $N_{res}$ = 100:
● $3^{198}$ possibilities
→ an exhaustive search would take longer than the age of the universe
● This contradicts the observed folding times of ms or μs
→ Levinthal’s paradox
[Figure: glutamic acid with atoms Cα, Cβ, Cγ, Cδ labeled]
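The arithmetic behind the paradox can be checked on a log scale. The sketch keeps the slide's assumption of about 3 states per backbone dihedral and additionally assumes that one conformation can be sampled every $10^{-13}$ s, a commonly used illustrative rate; the rate, the constants and the function name are choices made for this example.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_S = 13.8e9 * SECONDS_PER_YEAR  # ~13.8 billion years


def log10_search_time(n_dihedrals: int, states_per_dihedral: int = 3,
                      samples_per_second: float = 1e13) -> float:
    """log10 of the seconds needed to visit every backbone conformation once."""
    log10_conformations = n_dihedrals * math.log10(states_per_dihedral)
    return log10_conformations - math.log10(samples_per_second)


t = log10_search_time(198)                # 3^198 conformations, as on the slide
print(f"~10^{t:.1f} s")                   # ~10^81.5 s
print(t > math.log10(AGE_OF_UNIVERSE_S))  # True (the universe is ~10^17.6 s old)
```

Even granting a sampling rate many orders of magnitude faster, the gap of roughly 64 orders of magnitude remains, which is why folding cannot be an unbiased exhaustive search.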
39
… and many possible amino acid combinations
[Figure: positions A1, A2, …, A7 of a polypeptide chain, each with 20 choices: 20 × 20 × 20 × …]
Each of the 20 amino acids can be at each position of the polypeptide chain
→ $20^{N_{res}}$ possible protein sequences
A polypeptide of 100 residues → more possible sequences than particles in the universe
40
Abstractions in energy calculations: Molecular mechanics
● Nuclei and electrons are lumped into atom-like particles
● Atom-like particles are spherical (radii obtained from measurements or theory) and have a net charge (obtained from theory)
● Interactions are based on springs and classical potentials
● Interactions must be pre-assigned to specific sets of atoms
● Interactions are transferable between different molecules
● Interactions determine the spatial distribution of atom-like particles and their energies
41
Molecular mechanics: Force field
Atoms → spheres
Bonds → springs (can stretch, bend, or twist)
Non-bonded interactions:
● van der Waals attractions
● electrostatic attractions/repulsions
Energy = Stretching energy + Bending energy + Torsion energy + Non-bonded interaction energy
● Absolute energies have no meaning
● Only the differences between conformations have meaning
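A minimal sketch of this energy sum, assuming common functional forms (harmonic bonds and angles, a periodic torsion, Coulomb electrostatics); the van der Waals term is left out here for brevity. All parameters are hypothetical. The example compares two conformations that differ only in one dihedral, which illustrates why the energy difference, not either absolute value, is the meaningful quantity.

```python
import math

COULOMB_K = 332.06  # kcal*Å/(mol*e^2), approximate conversion factor


def bond_energy(r, r0, k):
    """Bond stretching as a harmonic spring: k (r - r0)^2."""
    return k * (r - r0) ** 2


def angle_energy(theta, theta0, k):
    """Angle bending as a harmonic spring (radians): k (theta - theta0)^2."""
    return k * (theta - theta0) ** 2


def torsion_energy(phi, v_n, n, gamma):
    """Periodic torsion: (V_n / 2) [1 + cos(n*phi - gamma)]."""
    return 0.5 * v_n * (1.0 + math.cos(n * phi - gamma))


def coulomb_energy(q_i, q_j, r, dielectric=1.0):
    """Electrostatics between point charges (in e) at distance r (in Å)."""
    return COULOMB_K * q_i * q_j / (dielectric * r)


def total_energy(bonds, angles, torsions, charges):
    return (sum(bond_energy(*b) for b in bonds)
            + sum(angle_energy(*a) for a in angles)
            + sum(torsion_energy(*t) for t in torsions)
            + sum(coulomb_energy(*c) for c in charges))


# Hypothetical terms shared by both conformations.
bonds = [(1.54, 1.53, 310.0)]
angles = [(math.radians(112.0), math.radians(109.5), 63.0)]
charges = [(-0.1, 0.1, 3.0)]

# Only the central dihedral differs: staggered (180°) vs eclipsed (0°), threefold term.
e_staggered = total_energy(bonds, angles, [(math.pi, 1.4, 3, 0.0)], charges)
e_eclipsed = total_energy(bonds, angles, [(0.0, 1.4, 3, 0.0)], charges)
delta = e_eclipsed - e_staggered
print(f"dE = {delta:.2f} kcal/mol")  # dE = 1.40 kcal/mol
```

The shared terms cancel in the difference, so the result depends only on the torsion barrier; adding any constant offset to all terms would change both absolute energies but not ΔE.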
42
Quick energy calculation: Rosetta design
Figure: D.Baker, youtube
(1) Lennard-Jones Potential, prefers atoms close, but not too close
(2) Implicit solvation model (penalizes polar residues in the core of a protein)
(3) Hydrogen bonding (allows polar residues in the core of the protein)
(4) Close electrostatic interactions
(5) Preference of torsion angles
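Term (1) can be illustrated with the textbook 12-6 Lennard-Jones form, whose minimum lies at $r = 2^{1/6}\sigma$: closer than that, repulsion dominates; farther away, the weak attraction fades. Rosetta uses its own modified version of this potential, so this is a sketch of the idea only, with hypothetical ε and σ.

```python
def lennard_jones(r: float, epsilon: float, sigma: float) -> float:
    """12-6 Lennard-Jones energy: 4*eps*[(sigma/r)^12 - (sigma/r)^6]."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)


epsilon, sigma = 0.1, 3.4      # hypothetical well depth (kcal/mol) and size (Å)
r_min = 2 ** (1 / 6) * sigma   # distance of the energy minimum, about 3.82 Å

for r in (3.0, r_min, 5.0, 8.0):
    print(f"r = {r:4.2f} Å   E = {lennard_jones(r, epsilon, sigma):+.4f} kcal/mol")
```

At `r_min` the energy equals −ε; below σ it turns steeply positive, which is the "not too close" part of the slide's description.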
44
Rotamers & conformers
● Conformational isomers (conformers) can be interconverted by rotations around single bonds
● Rotamers are conformers that differ only by one dihedral angle
● A rotamer library is a collection of conformers for each residue type in proteins with side-chain degrees of freedom
● It usually contains information on both the conformation and the frequency of each conformation
Ponder & Richards, JMB, 1987 – analysis of internal packing
Dunbrack & Karplus, JMB, 1993 – application to side-chain prediction
Lovell et al., Proteins, 2000 – the penultimate rotamer library
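On a toy scale, a rotamer library turns side-chain placement into a discrete search. In the sketch below each rotamer gets a self-energy from its library frequency (-ln f, a statistical-potential style assumption), pairs of rotamers get interaction energies, and the lowest-energy combination is found by exhaustive enumeration. The residues, frequencies and pair energies are hypothetical; real packers face far too many combinations for enumeration and use methods such as dead-end elimination or Monte Carlo instead.

```python
import math
from itertools import product

# Hypothetical rotamer library: residue -> {rotamer label: observed frequency}.
LIBRARY = {
    "A": {"g+": 0.6, "t": 0.3, "g-": 0.1},
    "B": {"g+": 0.2, "t": 0.7, "g-": 0.1},
    "C": {"t": 0.5, "g-": 0.5},
}

# Hypothetical pairwise energies (kcal/mol): a steric clash and a favorable contact.
PAIR_ENERGY = {
    (("A", "g+"), ("B", "t")): 5.0,
    (("B", "t"), ("C", "g-")): -0.5,
}


def packing_energy(assignment):
    """Sum of rotamer self-energies (-ln frequency) and pairwise terms."""
    energy = sum(-math.log(LIBRARY[res][rot]) for res, rot in assignment.items())
    for ((res_a, rot_a), (res_b, rot_b)), value in PAIR_ENERGY.items():
        if assignment[res_a] == rot_a and assignment[res_b] == rot_b:
            energy += value
    return energy


def pack_exhaustively():
    """Enumerate every rotamer combination and keep the lowest-energy one."""
    residues = list(LIBRARY)
    best_assignment, best_energy = None, math.inf
    for rotamers in product(*(LIBRARY[res] for res in residues)):
        assignment = dict(zip(residues, rotamers))
        energy = packing_energy(assignment)
        if energy < best_energy:
            best_assignment, best_energy = assignment, energy
    return best_assignment, best_energy


print(pack_exhaustively())  # best: A=t, B=t, C=g- (energy ≈ 1.75)
```

Note that residue A does not end up in its most frequent rotamer: the clash with B's preferred rotamer makes a less common choice optimal, which is exactly why packing has to be solved jointly rather than residue by residue.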
45
Protein design – workflow
● Computational calculation of possible sequences for a desired structure or function
● Translate into a DNA sequence and generate the gene
● Produce the protein and test it
● New calculations based on test results
46
Design of a novel protein fold
Kuhlman et al. (2003) Science,302:1364
2D schematic of target fold (arrows define constraints): 93-residue α/β protein with a novel topology
3D models were generated by assembling 3- and 9-residue fragments from the PDB → 172 backbone-only models within 2–3 Å RMSD of each other.
For each model, a sequence was designed using the RosettaDesign Monte Carlo search protocol and energy function. All amino acids except cysteine were allowed at 71 positions; the 22 surface β-sheet positions were restricted to polar amino acids.
Simultaneous optimization of sequence and structure: cycling between sequence design and backbone optimization (to identify the lowest-energy structure for a given sequence)
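The sequence-design half of that cycle can be sketched as a Metropolis Monte Carlo search over sequences on a fixed backbone. This is a toy: the energy function (reward hydrophobic residues at core positions and polar residues on the surface), the core/surface layout, the polar alphabet and the temperature are all hypothetical stand-ins, not RosettaDesign's energy function. It does mirror the constraints on the slide: cysteine is never allowed, and surface positions may only take polar amino acids.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
HYDROPHOBIC = set("AFILMVWY")  # hypothetical grouping for this toy energy
POLAR = "DEHKNQRST"            # hypothetical polar set allowed at surface positions
LAYOUT = ["core", "surface", "core", "core", "surface", "surface", "core"]


def allowed(kind):
    """Surface positions: polar residues only; elsewhere: all amino acids except Cys."""
    return POLAR if kind == "surface" else AMINO_ACIDS.replace("C", "")


def toy_energy(seq, layout):
    """Hypothetical energy: hydrophobics favored in the core, polar residues on the surface."""
    energy = 0.0
    for aa, kind in zip(seq, layout):
        if kind == "core":
            energy += -1.0 if aa in HYDROPHOBIC else 1.0
        else:
            energy += -0.5 if aa in POLAR else 0.5
    return energy


def monte_carlo_design(layout, steps=5000, temperature=0.5, seed=0):
    rng = random.Random(seed)
    seq = [rng.choice(allowed(kind)) for kind in layout]
    current = toy_energy(seq, layout)
    best_seq, best_energy = list(seq), current
    for _ in range(steps):
        i = rng.randrange(len(seq))
        old = seq[i]
        seq[i] = rng.choice(allowed(layout[i]))  # propose a point mutation
        new = toy_energy(seq, layout)
        # Metropolis criterion: accept downhill moves, uphill ones with Boltzmann probability.
        if new <= current or rng.random() < math.exp((current - new) / temperature):
            current = new
            if current < best_energy:
                best_seq, best_energy = list(seq), current
        else:
            seq[i] = old  # reject: undo the mutation
    return "".join(best_seq), best_energy


print(monte_carlo_design(LAYOUT))  # best energy -5.5, the optimum for this toy layout
```

The occasional acceptance of uphill moves is what lets a Monte Carlo search escape local minima in a rugged energy landscape; the backbone-optimization half of the published protocol is omitted here.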
47
Design of a novel protein fold
Kuhlman et al. (2003) Science,302:1364
Representative part of Top7 in unbiased SAD density: computational model vs. 2.5 Å X-ray structure
Comparison of computationally designed model and solved X-ray structure (PDB 1QYS)
Blue: designed model; red: X-ray structure; backbone RMSD = 1.17 Å
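A backbone RMSD such as the 1.17 Å reported here is computed after optimally superimposing the model onto the experimental structure. One standard way to find that superposition is the Kabsch algorithm (an SVD of the covariance matrix), sketched here with NumPy. It assumes the two coordinate sets are already paired atom by atom (e.g. backbone atoms of the same residues); the test coordinates are synthetic.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between paired N x 3 coordinates after optimal rigid superposition."""
    p = p - p.mean(axis=0)  # remove translation
    q = q - q.mean(axis=0)
    h = p.T @ q             # 3 x 3 covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Correct for a possible reflection so that the result is a proper rotation.
    d = 1.0 if np.linalg.det(vt.T @ u.T) > 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p @ rotation.T - q
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))


# Synthetic check: a rotated and shifted copy should superimpose perfectly.
rng = np.random.default_rng(0)
model = rng.normal(size=(20, 3))
angle = np.radians(30.0)
rot_z = np.array([[np.cos(angle), -np.sin(angle), 0.0],
                  [np.sin(angle), np.cos(angle), 0.0],
                  [0.0, 0.0, 1.0]])
xray = model @ rot_z.T + np.array([5.0, -2.0, 1.0])
print(f"{kabsch_rmsd(model, xray):.3f}")                                      # 0.000
print(kabsch_rmsd(model, xray + rng.normal(scale=0.5, size=xray.shape)) > 0)  # True
```

Superposing first matters: without it, a perfect design placed in a different frame would report a large, meaningless RMSD.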
48
Design of idealized proteins
Koga et al. (2012) Nature,491:222
Fundamental rules for the connection of ββ, βα, and αβ
Emergent rules for the connection of ββα, αββ, and βαβ
49
Design of idealized proteins
Koga et al. (2012) Nature,491:222
[Figure: NMR structure vs. design (superposition, RMSD); Rosetta energies]
50
Fold change through engineering
TIM or (βα)8-barrel: IGP synthase HisF (PDB 1THF)
Flavodoxin-like fold: response regulator CheY (PDB 1TMY)
Illegitimate recombination → 9-stranded barrel: CheY–HisF chimera (PDB 3CWO)
Computational design → 8-stranded barrel: CheY–HisF chimera (PDB 2LLE)
Bharat et al. (2008) PNAS,105:9942; Eisenbeis et al. (2012) JACS,134:4019
52
Computational receptor design
Changes in the binding pocket
[Figure: protein with wild-type ligand vs. new ligand]
A binding pocket is rearranged in order to recognize a new ligand
→ new therapeutics, vaccines, biosensors
53
Computational enzyme design
Malisi et al. (2011) BIOspektrum,17:736
[Figure: substrate or transition state; catalytic motif; structure database; scaffold; designed enzyme with shell 1 (mutations) and shell 2 (flexibility only)]
54
Computational enzyme design
[Figure panels: catalytic motif in new scaffold; steric conflicts (arrows); search space of possible mutants; solution with conflicts resolved]
Malisi et al. (2011) BIOspektrum,17:736
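The steric conflicts flagged in the figure can be approximated by a simple distance check: two atoms clash when they are closer than the sum of their van der Waals radii minus a tolerance. The radii (close to Bondi's values), the 0.4 Å tolerance and the atom list are illustrative assumptions, and the sketch does not skip covalently bonded neighbors, which a real check must do.

```python
import math
from itertools import combinations

# Approximate van der Waals radii in Å (illustrative values).
VDW_RADIUS = {"C": 1.70, "N": 1.55, "O": 1.52, "S": 1.80}


def steric_conflicts(atoms, tolerance=0.4):
    """Return index pairs of atoms that overlap by more than `tolerance` Å.

    atoms: list of (element, (x, y, z)). Bonded pairs are NOT excluded here.
    """
    clashes = []
    for (i, (el_i, xyz_i)), (j, (el_j, xyz_j)) in combinations(enumerate(atoms), 2):
        allowed = VDW_RADIUS[el_i] + VDW_RADIUS[el_j] - tolerance
        if math.dist(xyz_i, xyz_j) < allowed:
            clashes.append((i, j))
    return clashes


# Hypothetical atoms of a catalytic motif placed in a new scaffold.
atoms = [("C", (0.0, 0.0, 0.0)), ("O", (2.0, 0.0, 0.0)), ("N", (6.0, 0.0, 0.0))]
print(steric_conflicts(atoms))  # [(0, 1)]
```

A design search over the mutant space would then keep only those combinations whose clash list is empty, or penalize clashes in the score.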
57
Outline
● What have we learned from nature → protein evolution
  – Permanence of proteins in evolution
    ● Reconstruction of evolutionary history
  – Change of proteins in evolution
● How can we apply this knowledge → protein design
  – Computational design
  – Design of proteins
  – Design of enzymes
58
“Piecing together the evolution of proteins”
Alva et al. (2015) eLife
59
Vocabulary of primordial peptides
Alva et al. (2015) eLife
60
Vocabulary of primordial peptides: ap28
Alva et al. (2015) eLife
TPR
RPS20
61
Helical hairpin in TPR and RPs
TPR RPS20
Zhu et al. (2016) eLife
62
From helical hairpin to TPR
M4N, a designed TPR from RPS20-hh
Zhu et al. (2016) eLife
63
Origin of a folded protein from an intrinsically disordered ancestor
64
Outline
● What have we learned from nature → protein evolution
  – Permanence of proteins in evolution
    ● Reconstruction of evolutionary history
  – Change of proteins in evolution
● How can we apply this knowledge → protein design
  – Computational design
  – Design of proteins
  – Design of enzymes