Selected topic in Physical Biology
Lectures 2-3
F. Piazza Center for Molecular Biophysics and University of Orléans, France
Elastic network models of proteins. Theory, applications
and much more than this
The force field
The first thing: a force field
Structure-energy relation
Energy landscapes: local versus global minima
2D representation of a 3nD-dimensional landscape
Energy minimization
Molecular dynamics simulations: stuck in a local minimum…
At present, computational costs generally allow only local minima to be explored. For small systems, ever-growing computational power makes it possible to observe a few barrier-crossing events
Molecular dynamics simulations: integrate Newton’s equation
Nothing more than F = ma
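A minimal sketch of what "integrating F = ma" means in practice, assuming the velocity Verlet scheme (a common choice in MD packages, not necessarily the one used in the lecture). The function name, the toy spring force and all numerical values are illustrative assumptions.

```python
def velocity_verlet(force, x, v, mass, dt, n_steps):
    """Integrate m * a = F(x) with the velocity Verlet scheme; return final (x, v)."""
    a = force(x) / mass
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # position update
        a_new = force(x) / mass              # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # velocity update with averaged acceleration
        a = a_new
    return x, v


# Toy system (assumption): one particle on a Hookean spring, F = -k x.
k, mass = 1.0, 1.0
x, v = velocity_verlet(lambda q: -k * q, x=1.0, v=0.0, mass=mass, dt=0.01, n_steps=1000)
energy = 0.5 * mass * v**2 + 0.5 * k * x**2  # should stay close to the initial 0.5
```

The total energy is a convenient sanity check: a good integrator keeps it nearly constant over the run.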
Many force fields and full simulation packages are available
You can install one of these packages on your local work-station and run your molecular dynamics simulation right away
A recent innovation: ANTON
A microchip designed and optimized only to run molecular dynamics simulations
Coarse-grained models
Given the system coordinates, a “coordinate mapping,” M , determines the configuration, R, of the CG model as a function of the configuration, r, of an underlying atomistic model. The Cartesian coordinates, RI , of site I are typically determined as a linear combination of atomic Cartesian coordinates, ri, with constant, positive coefficients that often correspond to, e.g., the center of mass or geometry for the associated atomic group.
The Journal of Chemical Physics 139 , 090901 (2013) W. G. Noid, Perspective: Coarse-grained models for biomolecular systems
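A minimal sketch of such a coordinate mapping, assuming each CG site is the centre of mass of its atom group. The function name `cg_map`, the grouping and the toy coordinates are hypothetical.

```python
import numpy as np


def cg_map(atom_coords, atom_masses, groups):
    """Map atomistic coordinates r (n x 3) to CG site coordinates R (N x 3).

    Each site is the centre of mass of its atom group, i.e. a linear
    combination of atomic coordinates with constant, positive coefficients.
    """
    sites = []
    for idx in groups:
        weights = atom_masses[idx] / atom_masses[idx].sum()
        sites.append(weights @ atom_coords[idx])
    return np.array(sites)


# Toy example (assumption): three atoms mapped onto two CG sites.
r = np.array([[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 4.0, 0.0]])
m = np.array([1.0, 1.0, 2.0])
R = cg_map(r, m, groups=[[0, 1], [2]])
```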
A long story, starting in 1975 and recently culminated in a Nobel prize!
Abstract A new and very simple representation of protein conformations has been used together with energy minimisation and thermalisation to simulate protein folding. Under certain conditions, the method succeeds in ‘renaturing’ bovine pancreatic trypsin inhibitor from an open-chain conformation into a folded conformation close to that of the native molecule.
The Go model: the birth of native-centric modeling strategies
Independently, in that same year Nobuhiro Go and his collaborators proposed an (even simpler) model where the chain of beads is mounted on a lattice (initially they took a 2D lattice). Each bead would correspond to a residue or even to a secondary structure element of a protein (e.g. an α-helix).
The protein is restricted to fluctuate about the imposed structure: native-centric
Native-centric and off-lattice: the class of Elastic Network Models (ENMs)
The story begins in 1996… Monique Tirion shows that the low-frequency normal modes (NM) of a protein are not significantly altered when interatomic interactions are replaced by identical Hookean springs (one-parameter model)!
Normal Mode Analysis (NMA)
Typical all-atom potential energy
V
For small enough displacements about the equilibrium position:
Normal Mode Analysis (NMA)
The equations of motion read
where we have introduced the Hessian matrix
(1)
The system of coupled ODEs (1) can be transformed into a set of uncoupled ODEs by the Normal Mode coordinate transformation
Displacements
Mass-weighted Hessian. Diagonalization
Coordinate transformation: the Normal Modes
Normal Mode Analysis (NMA)
Perform the change of variable and use the orthonormality and completeness relations
We get 3N uncoupled ODEs. It is the harmonic oscillator…
The solutions are readily determined
So that the bead displacements can be written as
Normal Mode Analysis (NMA)
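We note that the NM change of variables is such that the potential-energy and kinetic-energy quadratic forms are diagonalized simultaneously. As a consequence, one has
$$\frac{1}{2}\sum_{i\alpha} m_i \dot{u}_{i\alpha}^2 + \frac{1}{2}\sum_{i\alpha,j\beta} u_{i\alpha} K^{\alpha\beta}_{ij} u_{j\beta} = \sum_k \epsilon_k$$
where we have introduced the normal mode energies
$$\epsilon_k = \frac{1}{2}\left(\dot{Q}_k^2 + \omega_k^2 Q_k^2\right)$$
At thermodynamic equilibrium
$$\langle \epsilon_k \rangle = k_B T, \qquad \langle H \rangle = \frac{1}{2}\sum_k C_k^2 \omega_k^2 \;\Longrightarrow\; C_k = \frac{\sqrt{2 k_B T}}{\omega_k}$$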
The amplitude of the k-th mode goes as the inverse of its frequency. In general, for proteins, modes with frequencies below 30-100 cm⁻¹ are responsible for 90-95 % of displacements.
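The NMA recipe (mass-weight the Hessian, diagonalize, read off frequencies and thermal amplitudes) can be sketched as follows. This is an illustration under stated assumptions: reduced units with k_B = 1, a toy 2x2 stiffness matrix, and hypothetical function names.

```python
import numpy as np

KB = 1.0  # Boltzmann constant in reduced units (assumption)


def normal_modes(K, masses):
    """Diagonalize the mass-weighted Hessian K_ij / sqrt(m_i m_j).

    Returns the frequencies omega_k (ascending) and the modes as columns.
    """
    inv_sqrt_m = 1.0 / np.sqrt(masses)
    H = K * np.outer(inv_sqrt_m, inv_sqrt_m)
    eigvals, modes = np.linalg.eigh(H)
    return np.sqrt(eigvals), modes


def thermal_amplitudes(omega, temperature):
    """C_k = sqrt(2 k_B T) / omega_k, from equipartition."""
    return np.sqrt(2.0 * KB * temperature) / omega


# Toy system (assumption): two unit masses, each tethered and mutually coupled.
K = np.array([[2.0, -1.0], [-1.0, 2.0]])
omega, modes = normal_modes(K, masses=np.array([1.0, 1.0]))
C = thermal_amplitudes(omega, temperature=1.0)
```

The lowest-frequency mode carries the largest amplitude, which is the point made above. For a free ENM the six zero-frequency rigid-body modes must be discarded before dividing by omega.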
Elastic Network Models (ENMs)
Elastic Network Models (ENMs): the potential energy
i and j can be atoms (Tirion) or groups of atoms … e.g. aggregated particles, e.g. amino acids…
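Anisotropic Network Model (ANM): nearly pure central forces
$$V^A = \frac{1}{2}\sum_{i>j} k_{ij}\left(|\mathbf{r}_{ij}| - |\mathbf{r}^0_{ij}|\right)^2$$
Gaussian Network Model (GNM): angular forces of the same order as central ones
$$V^G = \frac{1}{2}\sum_{i>j} k_{ij}\,|\mathbf{r}_{ij} - \mathbf{r}^0_{ij}|^2 = \frac{1}{2}\sum_{i>j} k_{ij}\,|\mathbf{u}_i - \mathbf{u}_j|^2$$
with
$$\mathbf{r}_{ij} = \mathbf{r}_i - \mathbf{r}_j, \qquad \mathbf{r}^0_{ij} = \mathbf{r}^0_i - \mathbf{r}^0_j, \qquad \mathbf{r}_{ij} - \mathbf{r}^0_{ij} = \mathbf{u}_i - \mathbf{u}_j$$
$k_{ij} = k\,f(|\mathbf{r}^0_{ij}|)$ is the force constant between “particles” i and j
Popular choices for the force constants include:
$$f(|\mathbf{r}^0_{ij}|) = c_{ij} \equiv \begin{cases} 1 & \text{for } |\mathbf{r}^0_{ij}| < R_c \\ 0 & \text{otherwise} \end{cases}$$
$$f(|\mathbf{r}^0_{ij}|) \propto |\mathbf{r}^0_{ij}|^{-\alpha}, \quad \alpha > 0$$
$$f(|\mathbf{r}^0_{ij}|) \propto \exp\left[-(|\mathbf{r}^0_{ij}|/\sigma)^2\right]$$
Two-parameter models, in the sense that there is
1. One physical force scale gauging homogeneously inter-particle force constants
2. One reference length specifying the range of inter-particle interactions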
Elastic Network Models (ENMs): the cutoff issue
The sharp cutoff model: Rc varies between 8 and 16 Å
The Gaussian network model is equivalent to a scalar model
The GNM is intrinsically a scalar model
Elastic Network Models (ENMs): the forces. The ANM scheme
Elastic Network Models (ENMs): the forces. The GNM scheme
Elastic Network Models (ENMs): angular versus central forces
Case (a) is meant to illustrate the magnitude of angle-bending forces, case (b) illustrates bond stretching. We want to compare the two kinds of forces in the ANM and in the GNM schemes
ANM   GNM
Elastic Network Models (ENMs): angular versus central forces
ANM: $f^{(a)}_{iy} / f^{(b)}_{ix} = O[(u/R)^2]$
GNM: $f^{(a)}_{iy} / f^{(b)}_{ix} = O[1]$
Coupling along the direction of displacement
An energy minimization is no longer required. This means that the equilibrium structure is assumed to coincide with the experimentally resolved structure (X-ray Crystallography, NMR). This can be risky!
1. First step: go to a structure repository and download a file containing the atomic coordinates of the macromolecule
http://pdb.org/pdb/home/home.do
The extension of such files is .pdb (for Protein Data Bank)
2. Use a computer program to read the coordinates from the PDB file, which has its own specific format. These will be your $\mathbf{r}^0_i$ vectors
Building an ENM
[Figure: excerpt of a PDB file. Columns: atom number, atom type, amino acid (three-letter code), chain, chain number, x, y, z, B factor]
PDB files contain a wealth of information on the protein
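A minimal sketch of step 2, assuming the standard fixed-column layout of PDB ATOM records and a coarse-graining at the Cα level. The function name is hypothetical; alternate locations, multiple models and HETATM records are deliberately ignored here.

```python
def read_ca_atoms(lines):
    """Extract C-alpha coordinates and B-factors from PDB ATOM records.

    Relies on the fixed-column PDB layout: atom name in columns 13-16,
    x, y, z in columns 31-54, B-factor in columns 61-66 (1-based).
    """
    coords, bfactors = [], []
    for line in lines:
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            coords.append((float(line[30:38]), float(line[38:46]), float(line[46:54])))
            bfactors.append(float(line[60:66]))
    return coords, bfactors


# Usage (the file name is a placeholder):
# with open("protein.pdb") as handle:
#     r0, b_exp = read_ca_atoms(handle)
```

Keeping the experimental B-factors alongside the coordinates is convenient, since they are later compared with the fluctuations predicted by the model.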
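Calculating the mass-weighted Hessian is simple for ENMs
ANM
$$H^{\alpha\beta}_{ij} = -\frac{1}{\sqrt{m_i m_j}}\left[k_{ij} R^{\alpha}_{ij} R^{\beta}_{ij} - \delta_{ij}\sum_{m \neq i} k_{im} R^{\alpha}_{im} R^{\beta}_{im}\right] \qquad (k_{ii} = 0)$$
$$R^{\alpha}_{ij} \equiv \frac{r^0_{i,\alpha} - r^0_{j,\alpha}}{|\mathbf{r}^0_{ij}|}$$
Cartesian components (α = x, y, z) of the equilibrium unit vectors of inter-particle bonds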
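A sketch of how the ANM mass-weighted Hessian can be assembled from the equilibrium coordinates, assuming a sharp cutoff and a single force constant k. The function name, the bead-major ordering of the 3N coordinates and the toy tetrahedron are assumptions.

```python
import numpy as np


def anm_hessian(r0, masses, k=1.0, rc=8.0):
    """Mass-weighted ANM Hessian (3N x 3N), sharp cutoff rc, bead-major ordering."""
    n = len(r0)
    H = np.zeros((3 * n, 3 * n))
    for i in range(n):
        for j in range(i + 1, n):
            d = r0[i] - r0[j]
            dist = np.linalg.norm(d)
            if dist >= rc:
                continue  # no spring between i and j
            block = k * np.outer(d, d) / dist**2  # k_ij R^a_ij R^b_ij
            off = -block / np.sqrt(masses[i] * masses[j])
            H[3*i:3*i+3, 3*j:3*j+3] = off
            H[3*j:3*j+3, 3*i:3*i+3] = off
            H[3*i:3*i+3, 3*i:3*i+3] += block / masses[i]
            H[3*j:3*j+3, 3*j:3*j+3] += block / masses[j]
    return H


# Toy structure (assumption): four beads on a tetrahedron, all pairs connected.
r0 = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1]], dtype=float)
H = anm_hessian(r0, masses=np.ones(4), rc=2.0)
eigvals = np.linalg.eigvalsh(H)
```

A useful check is that a sufficiently connected structure has exactly six zero eigenvalues (rigid translations and rotations); more than six signals a cutoff that leaves the network floppy.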
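GNM
$$V^G = \frac{1}{2}\sum_{\alpha} u_{\alpha}^T K u_{\alpha} = \frac{1}{2} u^T (I_3 \otimes K)\, u$$
$$H^{\alpha\beta}_{ij} = \frac{K_{ij}\,\delta_{\alpha\beta}}{\sqrt{m_i m_j}}$$
$$K_{ij} = \begin{cases} -k_{ij} & \text{for } i \neq j \\ -\sum_{l \neq i} K_{il} & \text{for } i = j \end{cases}$$
u is the 3N-dimensional vector of particle displacements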
Inter-particle correlation: the covariance matrix.
The covariance matrix can be computed analytically!
Zu is the partition function
Let us introduce the matrix of eigenvectors of the mass-weighted Hessian matrix
Then we have
Inter-particle correlation: the covariance matrix
Let us perform the following change of variables
where
Inter-particle correlation: the covariance matrix
where
Inter-particle correlation: the covariance matrix
The crystallographic B-factors: definition
The crystallographic Debye-Waller factors (so-called isotropic B-factors) are related to atomic fluctuations. The picture is not quite that simple, however: they are refinement parameters used in fitting the X-ray diffraction spectra, where they measure line-widths. As such, they also contain (large!) contributions from roto-translations of the protein as a whole in the crystal and from static disorder (the atoms of the protein in different crystal cells are not in exactly the same position)
$$B_i = \frac{8\pi^2}{3}\sum_{\alpha} \langle u_{i\alpha}^2 \rangle = \frac{8\pi^2 k_B T}{3 m_i}\sum_{\alpha} (H^{-1})^{\alpha\alpha}_{ii}$$
$$(H^{-1})^{\alpha\beta}_{ij} = \sqrt{m_i m_j}\; K^{-1}_{ij}\,\delta_{\alpha\beta} \;\Longrightarrow\; B_i = 8\pi^2 k_B T\, K^{-1}_{ii}$$
In the GNM the problem is effectively N-dimensional and not 3N-dimensional
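A sketch of the GNM B-factor calculation, assuming a sharp cutoff, the standard crystallographic prefactor 8π², and the Moore-Penrose pseudo-inverse to discard the zero mode of the connectivity matrix K. Function names, the default cutoff and the reduced units are assumptions.

```python
import numpy as np


def kirchhoff(r0, k=1.0, rc=7.0):
    """GNM matrix K: -k for pairs closer than rc, minus the row sum on the diagonal."""
    dist = np.linalg.norm(r0[:, None, :] - r0[None, :, :], axis=-1)
    K = -k * ((dist < rc) & (dist > 0)).astype(float)
    np.fill_diagonal(K, -K.sum(axis=1))
    return K


def gnm_bfactors(K, kBT=1.0):
    """B_i = 8 pi^2 k_B T (K^-1)_ii, with the zero mode removed by the pseudo-inverse."""
    K_inv = np.linalg.pinv(K)
    return 8.0 * np.pi**2 * kBT * np.diag(K_inv)


# Toy chain (assumption): three beads on a line, nearest neighbours connected.
r0 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
B = gnm_bfactors(kirchhoff(r0, rc=1.5))
```

Only an N x N matrix is inverted, which is the practical advantage of the scalar model. In this toy chain the end beads fluctuate more than the central one, as expected for less connected sites.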
Atomic fluctuations in the GNM
PnB-Esterase 13: The protein appears to be almost rigid, except that the C-terminal region exhibits the largest correlated movement within the whole protein. This could be a linker or an entrance mechanism to the active site. Interestingly, the residues around the proposed active site (HIS 399, GLU 308 and SER 187) also show high slow-mode amplitudes (in contrast to the protein as a whole)
Atomic fluctuations provide insight into biological function
β-lactamase (1BLC) (a) and penicillopepsin (1BXO) (b), illustrating the mobility of residues in the first (lowest frequency) GNM mode. The color code is blue-red-yellow-green in the order of increasing mobility. Both enzymes contain an inhibitor (shown in space filling, gray) bound near the most constrained (lowest mobility) region. (c) and (d) Corresponding square fluctuation profiles and positions of catalytic and inhibitor-binding residues.
Residues directly involved in catalytic function at active sites are shown by the green open circles, inhibitor-binding residues are shown by the red squares and residues serving both catalytic and inhibitor-binding functions are marked by the orange diamond. Catalytic residues tend to lie in the stiffest portions of the protein structures
Elastic network models for understanding biomolecular machinery: from enzymes to supramolecular assemblies, Chakra Chennubhotla, A J Rader, Lee-Wei Yang and Ivet Bahar, Phys. Biol. 2 (2005) S173-S180
Catalytic Residues Coincide or communicate with Global Hinge Regions
Fluctuation profiles in the global mode (k = 1) and position of catalytic and inhibitor binding residues illustrated for six enzymes. Residues involved in catalytic function are marked with an open circle, inhibitors binding sites are marked with a closed square, and residues serving both catalytic and inhibitor binding functions are marked with a closed circle. Arrows indicate the hinge sites.
Lee-Wei Yang and Ivet Bahar, Coupling between Catalytic Site and Collective Dynamics: A Requirement for Mechanochemical Activity of Enzymes, Structure, 13, 893–904, June, 2005
With care: B factors do not only contain internal motions!
Conformational changes in proteins
Many proteins exist in open (apo) and closed (holo, liganded) form. The biological function is closely related to the conformational change brought about by the apo-holo transition
Open (top and middle) and closed (bottom) forms of lysine-arginine-ornithine (LAO) binding protein as shown usually (top) and modeled as an ENM coarse-grained at the Cα level with a cutoff Rc = 8 Å.
F. Tama and Y.-H. Sanejouand, Conformational changes in proteins arising from normal mode calculations, Prot. Eng., 14, 1-6 (2001).
One can ask the following question: how many NMs (a complete orthonormal basis in E3N) are required to describe (reconstruct) the conformational change, or a given percentage of it? Will the required number be of order N or of order 1?
$$\Delta R_{i\alpha} \equiv \frac{R^{\rm holo}_{i\alpha} - R^{\rm apo}_{i\alpha}}{|R^{\rm holo} - R^{\rm apo}|} = \sum_{k=1}^{3N-6} I_k\, a^k_{i\alpha}$$
Conformational changes in proteins reconstructed through NMs: the overlap coefficients
If I use the whole basis of NMs, the (normalized) conformational change is reconstructed exactly
Let us consider our NMs as normalized
$$I_k \equiv \sum_{i\alpha} \Delta R_{i\alpha}\, a^k_{i\alpha}$$
Overlap coefficients
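The overlap coefficients are plain projections of the normalized conformational change onto the modes. A minimal sketch, assuming the two structures are already superimposed, the modes are stored as orthonormal columns, and coordinates are flattened bead by bead. The function name and the random toy data are hypothetical.

```python
import numpy as np


def overlap_coefficients(modes, r_apo, r_holo):
    """I_k = sum_{i,alpha} DeltaR_{i alpha} a^k_{i alpha}, with DeltaR normalized.

    modes: (3N, M) array whose columns are orthonormal normal modes a^k.
    r_apo, r_holo: (N, 3) coordinates of the two (superimposed) conformations.
    """
    dR = (r_holo - r_apo).ravel()
    dR = dR / np.linalg.norm(dR)
    return modes.T @ dR


# Toy check (assumption): displace a 3-bead structure along mode number 2.
rng = np.random.default_rng(0)
modes, _ = np.linalg.qr(rng.normal(size=(9, 9)))  # stand-in orthonormal basis
r_apo = rng.normal(size=(3, 3))
r_holo = r_apo + 0.7 * modes[:, 2].reshape(3, 3)
I = overlap_coefficients(modes, r_apo, r_holo)
```

With a complete basis the squared overlaps sum to one, so each squared coefficient can be read as the fraction of the conformational change captured by mode k.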
The surprising answer
There exists a single mode m for which Im is of the order 0.5-0.8!
Normal modes of the open conformations perform better than those of the closed conformations
In the open conformations domains are better separated, hence better defined. Therefore a coarse-grained description of the large-scale dynamics works better.
This is usually true for collective motions (low-frequency NMs), but there exist conformational changes that are captured by more localized modes
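Im = 0.3
Overlap coefficients
$$I_k \equiv \sum_{i\alpha} \Delta R_{i\alpha}\, a^k_{i\alpha}$$
Correlation coefficients
$$c_k = \frac{1}{3N}\,\frac{\sum_{i\alpha} (a^k_{i\alpha} - \bar{a}^k)(\Delta R_{i\alpha} - \overline{\Delta R})}{\sigma(a^k)\,\sigma(\Delta R)}$$
Collectivity index. The more collective the conformational change, the better the one-mode overlap
$$\eta = \frac{1}{3N}\exp\left[-\sum_{j\beta} \Delta R^2_{j\beta} \log \Delta R^2_{j\beta}\right]$$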
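A minimal sketch of the collectivity index, assuming the displacement is normalized to unit length so that the squared components sum to one, and using the convention 0 log 0 = 0. The function name is hypothetical.

```python
import numpy as np


def collectivity(dR):
    """eta = exp(-sum p log p) / (3N), with p the squared normalized components."""
    p = np.ravel(dR) ** 2
    p = p / p.sum()          # enforce sum_j p_j = 1
    nz = p > 0               # 0 * log 0 is taken as 0
    return np.exp(-np.sum(p[nz] * np.log(p[nz]))) / p.size


eta_collective = collectivity(np.ones(12))   # all components move equally
eta_localized = collectivity(np.eye(12)[0])  # a single component moves
```

The two limiting cases bracket the index: it equals 1 for a perfectly collective change and 1/(3N) for a change localized on a single coordinate.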
Case study: predicting active sites in enzymes
We have seen that catalytic sites tend to lie at hinge-like regions in enzyme structures. This indication comes from the fact that these sites surprisingly often coincide with nodes of low-frequency normal modes.
Is it possible to devise specific indicators that help identify and predict active sites?
Simple tools from network theory: the connectivity graph
Connectivity. How many neighbors at each node
$$C_i = \sum_j c_{ij}$$
Closeness centrality. Inverse of the sum of the shortest paths from a given node to all other nodes
$$CC_i = \left[\sum_j \ell_{ij}\right]^{-1}$$
Spectral stiffness. Contribution of a reduced subset of high-frequency NMs to the local fluctuation of a given node
$$\chi_i = \sum_{k \in S_{hf}} |\xi^k_i|^2$$
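The three indicators can be sketched as follows, assuming a connected sharp-cutoff graph, shortest paths counted in number of links (breadth-first search), and modes stored as columns in order of increasing frequency with bead-major coordinate ordering. All function names are hypothetical.

```python
import numpy as np
from collections import deque


def contact_matrix(r0, rc):
    """c_ij = 1 if beads i and j are closer than rc, 0 otherwise."""
    dist = np.linalg.norm(r0[:, None, :] - r0[None, :, :], axis=-1)
    return ((dist < rc) & (dist > 0)).astype(int)


def connectivity(c):
    """C_i: number of neighbours of node i."""
    return c.sum(axis=1)


def closeness_centrality(c):
    """CC_i: inverse of the summed shortest-path lengths from node i (connected graph)."""
    n = len(c)
    cc = np.zeros(n)
    for src in range(n):
        dist = {src: 0}
        queue = deque([src])
        while queue:  # breadth-first search
            u = queue.popleft()
            for v in np.flatnonzero(c[u]):
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        cc[src] = 1.0 / sum(dist.values())
    return cc


def spectral_stiffness(modes, n_high):
    """chi_i: weight of bead i in the n_high highest-frequency modes (3N x M input)."""
    n = modes.shape[0] // 3
    xi = modes[:, -n_high:].reshape(n, 3, -1)
    return np.sum(xi**2, axis=(1, 2))


# Toy chain (assumption): three beads on a line, nearest neighbours connected.
r0 = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [2.0, 0.0, 0.0]])
c = contact_matrix(r0, rc=1.5)
C, CC = connectivity(c), closeness_centrality(c)
```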
Predicting active sites in enzymes: high-pass filter
Filtering procedure. We compute the indicator patterns but apply a high-pass filter, so as to only retain a reduced number of peaks
Arginine Glycineaminotransferase. PDB code 1JDW
Predicting active sites in enzymes: the cutoff lensing idea
Study the patterns of our structural indicators as functions of the cutoff used to build the connectivity graph. For the sake of the argument, we can also push it to values that may be considered unphysical (excessively connected structures)
Rc = 10 Å   Rc = 20 Å
Arginine Kinase. PDB code 1BG0
Cutoff lensing effect. Some irrelevant peaks disappear and an additional peak appears, flagging an active site.
Predicting active sites in enzymes: varying the cutoff…
The three indicators behave differently as the cutoff is increased. In particular, the connectivity pattern becomes less interesting at high cutoff as the structures become more and more connected.
Arginine Glycineaminotransferase. PDB code 1JDW
We need to study the number of peaks as a function of the cutoff
Average peak fraction (number of peaks divided by number of residues) computed over the CSA
Analysis of the enzyme database: the catalytic site atlas (CSA)
Fraction of catalytic residues within Δn sites from the nearest peak versus cutoff, as computed over the ensemble of enzymes from the CSA
Reliability of the stiffness indicator
The reliability is defined as the fraction of predicted catalytic sites (within Δn amino acids along the sequence) divided by the fraction of stiffness peaks (number of peaks per amino acid).
Average number of peaks in the reduced stiffness patterns per catalytic site. The optimal cutoff corresponds to nearly one peak per catalytic site. Extreme predictive precision!
Predicting active sites in enzymes: size matters!
Fraction of catalytic sites within Δn sites from the nearest peak of the reduced stiffness patterns, computed over three different size classes in the CSA database.
[Figure: panels for Δn = 1 and Δn = 2; three size classes with cutoffs Rc = 20 Å, Rc = 22 Å and Rc = 28 Å]
The best predictions can be obtained by combining the 3 indicators at optimal cutoff in a sequential way
The connectivity profiles should be examined first. These are the ones with the largest number of peaks, often coalescing to highlight extended regions. The search should be subsequently narrowed down with the corresponding closeness profile, typically featuring more localized peaks, albeit many of them likely to be orphan ones. The prediction should then be refined through the reduced stiffness patterns, the ones with the least number of peaks.
Scientific Reports 5, Article number: 14874 (2015) doi:10.1038/srep14874
Normal modes strictly refer to very low temperature… The thermal overlap coefficients
The thermal overlap coefficients
Aggregated spectral weight over a few modes is a good temperature-insensitive indicator
There is a redistribution of spectral weight at the working temperature. Caution is needed when using T=0 normal modes
The overall shape matters for spectral reconstructions of conformational fluctuations at non-zero temperature
1G2F
The less globular, the less cooperative, the worse
Proteins live immersed in a solvent. There is friction!
Langevin modes
is the block covariance matrix
The block covariance matrix can be shown to obey the following equation (this is a straightforward consequence of imposing a Gaussian ansatz for the Fokker-Planck equation)
(9)
Eq. (9) has the following solution (see e.g. book by Risken, “The Fokker-Planck equation”)
Intramolecular energy flux in proteins… or within other complex three-dimensional molecular structures.
General question
2-electrode setup: inject energy at some site and monitor energy outflux at a different site.
1. What are the energy transduction pathways?
2. Are specific site pairs characterized by low impedance?
A typical non-equilibrium setup
[Figure: hot thermostat T1 at site i, cold thermostat T2 at site j; generalized impedance for i → j]
Energy flux
We first need to introduce a measure of local energy flow in protein structures. This comes naturally if we take the time derivative of the local energies
Taking the time derivative
Let us introduce the current from i to j (positive if energy flows from i to j), then
which leads to
Note that at equilibrium the following relations hold
Hence the total incoming and outgoing energy current is zero at each site
In the harmonic approximation the expression for the energy current can be simplified further. One has
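This expression can be used to measure the energy current between two given sites in a non-equilibrium setting such as the one we are interested in
$$\langle J_{i\to j}\rangle_{ss} = \frac{1}{2}\sum_{\alpha\beta} K^{\alpha\beta}_{ij}\left[\langle \dot{u}_{i\alpha} u_{j\beta}\rangle_{ss} - \langle \dot{u}_{j\beta} u_{i\alpha}\rangle_{ss}\right]$$
where $\langle \cdot \rangle_{ss}$ (ss = steady-state) is an average computed with respect to the non-equilibrium steady-state measure
$\beta_i^{-1}\Gamma_{ij}$: the imposed temperature field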
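Practically, the sequence of operations is the following:
1. Specify the imposed temperature field, which enters through $\beta_i^{-1}\Gamma_{ij}$
2. Compute the non-equilibrium covariance matrix
3. Isolate the 3x3 blocks in the off-diagonal block that correspond to the two “electrode” beads and compute the current
$$\langle J_{i\to j}\rangle_{ss} = \frac{1}{2}\sum_{\alpha\beta} K^{\alpha\beta}_{ij}\left[\langle \dot{u}_{i\alpha} u_{j\beta}\rangle_{ss} - \langle \dot{u}_{j\beta} u_{i\alpha}\rangle_{ss}\right]$$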
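The sequence of operations can be illustrated on a toy model. This is a sketch under loudly stated assumptions, not the lecture's own implementation: beads move in one dimension, each bead feels the same friction gamma and its own bath temperature (underdamped Langevin dynamics), and the steady-state covariance is obtained by solving a Lyapunov equation A C + C Aᵀ = -D with Kronecker products. All names and values are hypothetical.

```python
import numpy as np


def steady_state_covariance(K, masses, gamma, kBT):
    """Covariance of the state (u, v) for Langevin beads with local temperatures kBT[i]."""
    n = len(masses)
    A = np.block([[np.zeros((n, n)), np.eye(n)],
                  [-K / masses[:, None], -gamma * np.eye(n)]])  # drift matrix
    D = np.zeros((2 * n, 2 * n))
    D[n:, n:] = np.diag(2.0 * gamma * kBT / masses)             # noise strength
    # Solve A C + C A^T = -D as a linear system via Kronecker products.
    L = np.kron(np.eye(2 * n), A) + np.kron(A, np.eye(2 * n))
    return np.linalg.solve(L, -D.ravel()).reshape(2 * n, 2 * n)


def energy_current(K, C, i, j):
    """<J_{i->j}> = (1/2) K_ij [<v_i u_j> - <v_j u_i>] for scalar displacements."""
    n = K.shape[0]
    return 0.5 * K[i, j] * (C[n + i, j] - C[n + j, i])


# Two tethered, coupled beads (assumption): bead 0 is hot, bead 1 is cold.
K = np.array([[2.0, -1.0], [-1.0, 2.0]])
masses = np.ones(2)
gamma = 1.0
kBT = np.array([2.0, 1.0])
C = steady_state_covariance(K, masses, gamma, kBT)
J01 = energy_current(K, C, 0, 1)
```

In this toy setup the current flows from the hot to the cold bead and vanishes when the two temperatures are equal. A free ENM has zero-frequency rigid-body modes, which would have to be removed or tethered before the covariance is well defined.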
Questions?