Practical 15: MDAnalysis Documentation Release 1.0 Oliver Beckstein April 24, 2013


CONTENTS
1 Contents
1.1 Installing MDAnalysis
1.2 Basics
1.3 Working with AtomGroups
1.4 Trajectory analysis
1.5 Intermediate Level MDAnalysis hacks
2 References
Bibliography
Index
Practical 15: MDAnalysis Documentation, Release 1.0
MDAnalysis is an open source Python library that helps you to quickly write your own analysis algorithm for studying trajectories produced by the most popular simulation packages [Michaud-Agrawal2011].
The online documentation together with the interactive python documentation should help you while you are using the library.
DISCLAIMER: Your instructor is one of the main authors of the package and might be overly enthusiastic in promoting it...
1.1 Installing MDAnalysis
The following notes are specific to the SimBioNano course; in general, if you have a C compiler and a few other packages installed, you should be able to follow the installation notes.
1.1.1 Installing the binary distribution on the iMacs
The iMacs do not have a C-compiler so a special binary distribution was prepared as a so-called “egg” file, which contains Python code and compiled code. To install a Python egg you need the helper script ez_setup.py, which installs some infrastructure, and the egg file itself.
On the iMacs, do the following to install the latest released version of MDAnalysis (0.7.7).
First create the directory where you will install packages:
mkdir -p ~/Library/Python/2.7/lib/python/site-packages
(This directory is automatically searched by the Python installation on the iMacs so you don’t have to manipulate PYTHONPATH.) Then create a file ~/.pydistutils.cfg that will tell the Python installation tools (distutils) that you always want to put Python packages into your own private directory (a so-called Mac OS X user installation):
cat > ~/.pydistutils.cfg <<'EOF'
# Mac OS X user installation:
# http://peak.telecommunity.com/DevCenter/EasyInstall#mac-os-x-user-installation
# http://peak.telecommunity.com/DevCenter/EasyInstall#downloading-and-installing-a-package

# for Mac OS X framework installations (such as EPD)
# use site.USER_SITE http://docs.python.org/2/library/site.html

[install]
install_lib = ~/Library/Python/$py_version_short/lib/python/site-packages
install_scripts = ~/bin
EOF
Then download the packages from the SimBioNano/15/eggs directory (in general you can find source files on the MDAnalysis download page):
curl -O http://becksteinlab.physics.asu.edu/pages/courses/2013/SimBioNano/15/ez_setup.py
curl -O http://becksteinlab.physics.asu.edu/pages/courses/2013/SimBioNano/15/eggs/MDAnalysis-0.7.7-py2.7-macosx-10.6-i386.egg
curl -O http://becksteinlab.physics.asu.edu/pages/courses/2013/SimBioNano/15/eggs/MDAnalysisTests-0.7.7-py2.7.egg
Finally, install the packages and a few dependencies (which are downloaded from the internet):
easy_install --no-deps ./MDAnalysis-0.7.7-py2.7-macosx-10.6-i386.egg ./MDAnalysisTests-0.7.7-py2.7.egg
easy_install networkx gridDataFormats
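As a quick check of where a per-user installation is looked up, one can ask Python's standard site module. This is a minimal sketch using only the standard library; the path it prints depends on the Python build and is not guaranteed to match the directory created above.

```python
import site
import sys

# Directory that this interpreter treats as the per-user site-packages.
user_site = site.getusersitepackages()
print("user site-packages:", user_site)

# True only if that directory is on the module search path of this session.
on_path = user_site in sys.path
print("on sys.path:", on_path)
```

If the second line prints False, the directory either does not exist yet or user site-packages are disabled for this interpreter.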
1.1.2 Installing on saguaro from source
saguaro has an installation of all the important Python packages and the GNU compilers as part of the Software Library and Packages. Thus it is very easy to install MDAnalysis from Python packages:
module load python
easy_install --user -U MDAnalysis MDAnalysisTests
Just remember to module load python before starting to use MDAnalysis.
1.1.3 Testing the installation
MDAnalysis comes with over 500 test cases that check its functionality. These test cases can be run with the command
python -c 'from MDAnalysis.tests import test; test(label="full", verbose=3, extra_argv=["--exe"])'
This can take a few minutes. Ideally, you should only get passing tests (“ok” or just a single dot “.” when using verbose=1) or “KnownFailures”.
1.2 Basics
• use the interactive help (command? or command??)
• TAB completion, e.g. MDAnalysis.U<TAB> will autocomplete to MDAnalysis.Universe.
MDAnalysis.Universe.<TAB> will show all methods and attributes.
• quick plotting with matplotlib (and array manipulations with numpy)
1.2.1 Loading modules
Load MDAnalysis:
import MDAnalysis
MDAnalysis comes with a bunch of test files and trajectories. One is the AdK trajectory from Practical 10 that samples a transition from a closed to an open conformation [Beckstein2009]. The topology file (CHARMM psf format) and trajectory (CHARMM/NAMD dcd format) can be loaded into the variables PSF and DCD:
from MDAnalysis.tests.datafiles import PSF, DCD
Finally, also load numpy:
import numpy as np
1.2.2 Universe and AtomGroup
MDAnalysis is object oriented. Molecular systems consist of Atom objects (“instances” of the “class” MDAnalysis.core.AtomGroup.Atom), which are grouped in AtomGroup instances. You build the AtomGroup of your system by loading a topology (list of atoms and possibly their connectivity) together with a trajectory (coordinate information) into the central data structure, the Universe object:
>>> u = MDAnalysis.Universe(PSF, DCD)
>>> print(u)
<Universe with 3341 atoms>
The atoms are stored in the “attribute” MDAnalysis.core.AtomGroup.Universe.atoms
>>> print(u.atoms)
<AtomGroup with 3341 atoms>
>>> list(u.atoms[:5])
[< Atom 1: name 'N' of type '56' of resname 'MET', resid 1 and segid '4AKE'>,
 < Atom 2: name 'HT1' of type '2' of resname 'MET', resid 1 and segid '4AKE'>,
 < Atom 3: name 'HT2' of type '2' of resname 'MET', resid 1 and segid '4AKE'>,
 < Atom 4: name 'HT3' of type '2' of resname 'MET', resid 1 and segid '4AKE'>,
 < Atom 5: name 'CA' of type '22' of resname 'MET', resid 1 and segid '4AKE'>]
Any AtomGroup knows the residues that the atoms belong to via the attribute residues, which produces a ResidueGroup. A ResidueGroup acts like a list of Residue objects:
>>> u.atoms[100:130].residues
<ResidueGroup [<Residue 'LEU', 6>, <Residue 'GLY', 7>, <Residue 'ALA', 8>]>
Larger organizational units are Segment instances, for example one protein or all the solvent molecules or simply the whole system. Atom, AtomGroup, Residue, and ResidueGroup have an attribute segments that will list the segment IDs (“segids”) as a SegmentGroup:
>>> u.atoms.segments
<SegmentGroup [<Segment '4AKE'>]>
The converse is also true: each “higher” level in the hierarchy also knows about the Residue and Atom instances it contains. For example, to list the atoms of the ResidueGroup we had before:
>>> r = u.atoms[100:130].residues
>>> r.atoms
<AtomGroup with 36 atoms>
Exercise 1
1. What residue (“resname”) does the last atom belong to in the above example?
>>> r = u.atoms[100:130].residues
>>> r.atoms[-1]
< Atom 136: name 'O' of type '70' of resname 'ALA', resid 8 and segid '4AKE'>
2. Why does the expression
len(u.atoms[100:130]) == len(u.atoms[100:130].residues.atoms)
return False?
Because the complete residues contain more atoms than the arbitrary slice of atoms.
3. How many residues are in the Universe u?
>>> len(u.atoms.residues)
>>> u.atoms.numberOfResidues()
214
How do you get a list of the residue names (such as ["Ala", "Gly", "Gly", "Asp", ...]) and residue numbers (“resid”) for atoms 1000 to 1300? And how do you get them as a list of tuples (resname, resid)? (Hint: zip())
>>> resnames = u.atoms[999:1300].resnames()
>>> resids = u.atoms[999:1300].resids()
>>> zip(resnames, resids)
How do you obtain the resid and the resname for the 100th residue? (Hint: investigate the Residue object interactively with TAB completion)
>>> r100 = u.atoms.residues[99]
>>> print(r100.id, r100.name)
100 GLY
4. How many segments are there?
>>> len(u.segments)
>>> len(u.atoms.segments)
>>> u.atoms.numberOfSegments()
1
>>> s1 = u.segments[0]
>>> s1.id
'4AKE'
See Also:
• numberOfResidues() and numberOfAtoms()
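The relationship between an arbitrary atom slice and its complete residues (question 2) and the pairing of residue names with residue numbers through zip() can be mimicked in plain Python. The sketch below does not use MDAnalysis; the atom records and the helper atoms_of_residues() are made up for illustration.

```python
# Toy "topology": one (atom_name, resname, resid) record per atom.
atoms = [
    ("N", "LEU", 6), ("CA", "LEU", 6), ("C", "LEU", 6), ("O", "LEU", 6),
    ("N", "GLY", 7), ("CA", "GLY", 7), ("C", "GLY", 7), ("O", "GLY", 7),
    ("N", "ALA", 8), ("CA", "ALA", 8), ("C", "ALA", 8), ("O", "ALA", 8),
]

def atoms_of_residues(selection, all_atoms):
    """Return every atom whose residue is touched by the selection."""
    resids = {resid for (_, _, resid) in selection}
    return [atom for atom in all_atoms if atom[2] in resids]

# An arbitrary slice that cuts through the first and the last residue.
atom_slice = atoms[2:10]
complete = atoms_of_residues(atom_slice, atoms)
print(len(atom_slice), len(complete))

# Pair residue names with residue numbers, one tuple per atom in the slice.
resnames = [resname for (_, resname, _) in atom_slice]
resids = [resid for (_, _, resid) in atom_slice]
pairs = list(zip(resnames, resids))
print(pairs[0], pairs[-1])
```

The slice holds 8 atoms but the residues it touches hold 12, which is why the two lengths differ. In Python 3, zip() returns an iterator, so list() is needed to see the tuples.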
1.2.3 Selections
MDAnalysis comes with a fairly complete atom selection facility. Primarily, one uses the method selectAtoms() of a Universe:
>>> CA = u.selectAtoms("protein and name CA")
>>> CA
<AtomGroup with 214 atoms>
but really any AtomGroup has a selectAtoms() method:
>>> acidic = CA.selectAtoms("resname ASP or resname GLU")
>>> acidic
<AtomGroup with 35 atoms>
>>> acidic.residues
<ResidueGroup [<Residue 'GLU', 22>, <Residue 'ASP', 33>, <Residue 'GLU', 44>, <Residue 'ASP', 51>,
 <Residue 'ASP', 54>, <Residue 'ASP', 61>, <Residue 'GLU', 62>, <Residue 'GLU', 70>,
 <Residue 'GLU', 75>, <Residue 'ASP', 76>, <Residue 'ASP', 84>, <Residue 'ASP', 94>,
 <Residue 'GLU', 98>, <Residue 'ASP', 104>, <Residue 'GLU', 108>, <Residue 'ASP', 110>,
 <Residue 'ASP', 113>, <Residue 'GLU', 114>, <Residue 'ASP', 118>, <Residue 'GLU', 143>,
 <Residue 'ASP', 146>, <Residue 'ASP', 147>, <Residue 'GLU', 151>, <Residue 'GLU', 152>,
 <Residue 'ASP', 158>, <Residue 'ASP', 159>, <Residue 'GLU', 161>, <Residue 'GLU', 162>,
 <Residue 'GLU', 170>, <Residue 'GLU', 185>, <Residue 'GLU', 187>, <Residue 'ASP', 197>,
 <Residue 'GLU', 204>, <Residue 'ASP', 208>, <Residue 'GLU', 210>]>
See Also:
All the selection keywords are described in the documentation.
Selections can be combined with boolean expressions and it is also possible to select by geometric criteria, e.g. with the around distance selection keyword:
u.selectAtoms("((resname ASP or resname GLU) and not (backbone or name CB or name CG)) \
               and around 4.0 ((resname LYS or resname ARG) \
               and not (backbone or name CB or name CG))").residues
What is this selection trying to accomplish?
Exercises 2
1. Select the range of resids 100 to 200 (“100-200”) with a selection. Compare the result to what you get by slicing the u.atoms.residues appropriately.
Which approach would you prefer to use in an analysis script?
Solution:
>>> u.selectAtoms("resid 100-200")
<AtomGroup with 1609 atoms>
Compare to the slicing solution (doing an element-wise comparison, i.e. residue by residue in each list()):
>>> list(u.selectAtoms("resid 100-200").residues) == list(u.atoms.residues[99:200])
If one wants to get specific residues in scripts one typically uses selections instead of slicing because the index in the slice might not correspond to the actual residue ids (minus 1): if a number of residues (e.g. 150-160) are missing from the structure then the selection will simply give you residues 100-149 and 161-200, but the slice 99:200 would give you residues 100-149 and 161-211.
2. Select all residues that do not contain a Cβ (“CB”) atom. How many are there? What residue names did you find?
Solution:
>>> sel = u.selectAtoms("(byres name CA) and not (byres name CB)").residues
>>> len(sel)
20
These are all Glycines, as can be seen by comparing the residue groups element-wise:
>>> glycines = u.selectAtoms("resname GLY")
>>> list(sel) == list(glycines.residues)
True
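The difference between selecting by residue number and slicing by position (exercise 1) can be reproduced with a plain list of resids. This sketch assumes a hypothetical structure numbered 1 to 214 in which residues 150-160 are missing; it does not use MDAnalysis.

```python
# Hypothetical structure: resids 1..214 with residues 150-160 missing.
resids = [r for r in range(1, 215) if not 150 <= r <= 160]

# "Selection": pick residues by their actual residue number.
selected = [r for r in resids if 100 <= r <= 200]

# "Slice": pick residues by their position in the list.
sliced = resids[99:200]

print(selected[0], selected[-1], len(selected))
print(sliced[0], sliced[-1], len(sliced))
```

The selection ends at resid 200 as requested, whereas the slice always returns 101 entries and therefore runs on to resid 211.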
1.3 Working with AtomGroups
An AtomGroup has a large number of methods and attributes defined that provide information about the atoms such as names, indices, or the coordinates in the positions attribute:
>>> CA = u.selectAtoms("protein and name CA")
>>> r = CA.positions
>>> r.shape
(214, 3)
The resulting output is a numpy.ndarray. The main purpose of MDAnalysis is to get trajectory data into numpy arrays!
1.3.1 Important methods and attributes of AtomGroup
The positions attribute, which holds the coordinates, is probably the most important information that you can get from an AtomGroup.
Other quantities that can be easily calculated for an AtomGroup are
• the center of mass centerOfMass() and the center of geometry (or centroid) centerOfGeometry() (equivalent to centroid());
• the total mass totalMass();
• the total charge totalCharge() (if partial charges are defined in the topology);
• the radius of gyration
$$R_\mathrm{gyr} = \sqrt{\frac{1}{M}\sum_{i=1}^{N} m_i (\mathbf{r}_i - \mathbf{R})^2}$$
with radiusOfGyration(), where $M$ is the total mass and $\mathbf{R}$ is the center of mass;
• the principal axes $\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3$ from principalAxes() via a diagonalization of the tensor of inertia momentOfInertia(),
$$\Lambda = U^T I U, \quad \text{with } U = (\mathbf{p}_1, \mathbf{p}_2, \mathbf{p}_3)$$
where $U$ is a rotation matrix whose columns are the eigenvectors that form the principal axes, $\Lambda$ is the diagonal matrix of eigenvalues (sorted from largest to smallest) known as the principal moments of inertia, and
$$I = \sum_{i=1}^{N} m_i \left[ (\mathbf{r}_i \cdot \mathbf{r}_i) \sum_{\alpha=1}^{3} \mathbf{e}_\alpha \otimes \mathbf{e}_\alpha - \mathbf{r}_i \otimes \mathbf{r}_i \right]$$
is the tensor of inertia.
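The definitions of the center of mass, the center of geometry, and the radius of gyration can be checked by hand on a tiny system. The following is a plain-Python sketch with made-up masses and coordinates; the function names describe the quantities and are not the MDAnalysis methods.

```python
import math

# Made-up system: two particles on the x axis with different masses.
masses = [1.0, 3.0]
coords = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0)]

def center_of_geometry(coords):
    """Unweighted mean position (centroid)."""
    n = len(coords)
    return tuple(sum(r[k] for r in coords) / n for k in range(3))

def center_of_mass(masses, coords):
    """Mass-weighted mean position R."""
    total = sum(masses)
    return tuple(sum(m * r[k] for m, r in zip(masses, coords)) / total
                 for k in range(3))

def radius_of_gyration(masses, coords):
    """sqrt( sum_i m_i (r_i - R)^2 / M ) with R the center of mass."""
    R = center_of_mass(masses, coords)
    total = sum(masses)
    weighted = sum(m * sum((r[k] - R[k]) ** 2 for k in range(3))
                   for m, r in zip(masses, coords))
    return math.sqrt(weighted / total)

print(center_of_geometry(coords))
print(center_of_mass(masses, coords))
print(radius_of_gyration(masses, coords))
```

The heavier particle pulls the center of mass to x = 3 while the centroid stays at x = 2, and the radius of gyration comes out as the square root of 3.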
1.3.2 Exercises 3
AdK consists of three domains:
• CORE residues 1-29, 60-121, 160-214 (gray)
• NMP residues 30-59 (blue)
• LID residues 122-159 (yellow)
1. Calculate the center of mass and the center of geometry for each of the three domains.
• What are the distances between the centers of mass?
(Hint: you can use numpy.linalg.norm() or use a function like veclength() that you defined previously)
• Does it matter whether you use the center of mass or the center of geometry?
AdK undergoes a conformational transition during which CORE and LID move relative to each other. The movement can be characterized by two angles, θNMP and θLID, each defined by the centers of geometry of the backbone and Cβ atoms of three groups of residues A, B, and C [Beckstein2009]:
definition of θNMP A: 115-125, B: 90-100, C: 35-55
definition of θLID A: 179-185, B: 115-125, C: 125-153
The angle between vectors $\vec{BA}$ and $\vec{BC}$ is
$$\theta = \arccos\left( \frac{\vec{BA} \cdot \vec{BC}}{|\vec{BA}|\,|\vec{BC}|} \right)$$
2. Write a function theta_NMP() that takes a Universe as an argument and computes θNMP:
theta_NMP(u) Calculate the NMP-CORE angle for E. coli AdK in degrees from Universe u
Use the following incomplete code as a starting point:
import numpy as np
from numpy.linalg import norm

def theta_NMP(u):
    """Calculate the NMP-CORE angle for E. coli AdK in degrees"""
    A = u.selectAtoms("resid 115:125 and (backbone or name CB)").centerOfGeometry()
    B =
    C =
    BA = A - B
    BC =
    theta = np.arccos(
    return np.rad2deg(theta)
Write the function in a file adk.py and use ipython %run adk.py to load the function while working on it.
Test it on the AdK simulation (actually, the first frame):
>>> theta_NMP(u)
44.124821
3. Add the corresponding function theta_LID(u) to adk.py.
Test it:
>>> theta_LID(u)
107.00881
1.3.3 Processing AtomGroups
You can directly write an AtomGroup to a file with the write() method:
CORE = u.selectAtoms("resid 1:29 or resid 60:121 or resid 160:214")
CORE.write("AdK_CORE.pdb")
(The extension determines the file type.)
You can do fairly complicated things on the fly, such as writing the hydration shell around a protein to a file
u.selectAtoms("byres (name OW and around 4.0 protein)").write("hydration_shell.pdb")
for further analysis or visualization.
You can also write Gromacs index files (in case you don’t like make_ndx...) with the write_selection() method:
CORE.write_selection("CORE.ndx", name="CORE")
1.4 Trajectory analysis
The Universe binds together the static topology (which atoms, how are they connected, what un-changing properties do the atoms possess (such as partial charge), ...) and the changing coordinate information, which is stored in the trajectory.
The length of a trajectory (number of frames) is
len(u.trajectory)
The standard way to access each time step (or frame) in a trajectory is to iterate over the Universe.trajectory attribute (which is an instance of a Reader class):
for ts in u.trajectory:
    print("Frame: %5d, Time: %8.3f ps" % (ts.frame, u.trajectory.time))
    print("Rgyr: %g A" % (u.atoms.radiusOfGyration(), ))
The time attribute contains the time of the current time step. The Reader only contains information about one time step: imagine a cursor or pointer moving along the trajectory file. Where the cursor points, there are your current coordinates, frame number, and time.
Normally you will collect the data in a list or array, e.g.
Rgyr = []
protein = u.selectAtoms("protein")
for ts in u.trajectory:
    Rgyr.append((u.trajectory.time, protein.radiusOfGyration()))
Rgyr = np.array(Rgyr)
Note: The coordinates, and properties calculated from them such as the radius of gyration, change when moving through a trajectory, whereas selections such as protein in the example do not: you can define the selection once and then recalculate the property of interest for each frame of the trajectory.
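The cursor picture can be made concrete with a toy reader. The class below is a hypothetical stand-in written for this illustration (it is not the MDAnalysis Reader, and its time convention is made up): it keeps only the current frame, while the selection, a list of atom indices, is defined once and reused for every frame.

```python
import math

class ToyReader:
    """Hypothetical stand-in for a trajectory reader (not the MDAnalysis class)."""

    def __init__(self, frames, dt=1.0):
        self._frames = frames   # list of frames; each frame is a list of (x, y, z)
        self.dt = dt            # time between frames in ps
        self.frame = 0          # 1-based number of the current frame
        self.time = 0.0         # time of the current frame
        self.positions = []     # coordinates of the current frame only

    def __len__(self):
        return len(self._frames)

    def __iter__(self):
        for i, xyz in enumerate(self._frames):
            # Move the cursor: overwrite the single "current frame" state.
            self.frame = i + 1
            self.time = i * self.dt
            self.positions = xyz
            yield self

# Two atoms that move apart by 1 A per frame.
frames = [[(0.0, 0.0, 0.0), (float(d), 0.0, 0.0)] for d in (1, 2, 3)]
trajectory = ToyReader(frames, dt=2.0)

selection = [0, 1]  # atom indices, defined once and valid for every frame
series = []
for ts in trajectory:
    a, b = (ts.positions[i] for i in selection)
    distance = math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))
    series.append((ts.time, distance))

print(series)
```

Because the reader holds only one frame at a time, anything that should survive the loop has to be computed and stored inside it, here in the list series.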
The data can be plotted to give the graph below:
# quick plot
from pylab import *
plot(Rgyr[:,0], Rgyr[:,1], 'r--', lw=2, label=r"$R_G$")
xlabel("time (ps)")
ylabel(r"radius of gyration $R_G$ ($\AA$)")
What does the shape of the RG(t) time series indicate?
[Figure: radius of gyration RG (Å) against time (ps)]
1.4.1 Exercises 4
1. Take the functions to calculate θNMP and θLID
import numpy as np
from numpy.linalg import norm

def theta_NMP(u):
    """Calculate the NMP-CORE angle for E. coli AdK in degrees"""
    C = u.selectAtoms("resid 115:125 and (backbone or name CB)").centerOfGeometry()
    B = u.selectAtoms("resid 90:100 and (backbone or name CB)").centerOfGeometry()
    A = u.selectAtoms("resid 35:55 and (backbone or name CB)").centerOfGeometry()
    BA = A - B
    BC = C - B
    theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
    return np.rad2deg(theta)

def theta_LID(u):
    """Calculate the LID-CORE angle for E. coli AdK in degrees"""
    C = u.selectAtoms("resid 179:185 and (backbone or name CB)").centerOfGeometry()
    B = u.selectAtoms("resid 115:125 and (backbone or name CB)").centerOfGeometry()
    A = u.selectAtoms("resid 125:153 and (backbone or name CB)").centerOfGeometry()
    BA = A - B
    BC = C - B
    theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
    return np.rad2deg(theta)
and calculate the time series θNMP(t) and θLID(t).
Plot them together in one plot.
2. Plot θNMP(t) against θLID(t).
What does the plot show?
Why could such a plot be useful?
[Figure: θNMP(t) and θLID(t) against time t (ps), and θLID against the NMP-CORE angle θNMP]
The code to generate the figure contains theta_LID() and theta_NMP().
import numpy as np
from numpy.linalg import norm

def theta_NMP(u):
    """Calculate the NMP-CORE angle for E. coli AdK in degrees"""
    C = u.selectAtoms("resid 115:125 and (backbone or name CB)").centerOfGeometry()
    B = u.selectAtoms("resid 90:100 and (backbone or name CB)").centerOfGeometry()
    A = u.selectAtoms("resid 35:55 and (backbone or name CB)").centerOfGeometry()
    BA = A - B
    BC = C - B
    theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
    return np.rad2deg(theta)

def theta_LID(u):
    """Calculate the LID-CORE angle for E. coli AdK in degrees"""
    C = u.selectAtoms("resid 179:185 and (backbone or name CB)").centerOfGeometry()
    B = u.selectAtoms("resid 115:125 and (backbone or name CB)").centerOfGeometry()
    A = u.selectAtoms("resid 125:153 and (backbone or name CB)").centerOfGeometry()
    BA = A - B
    BC = C - B
    theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
    return np.rad2deg(theta)

if __name__ == "__main__":
    import MDAnalysis
    from MDAnalysis.tests.datafiles import PSF, DCD
    import matplotlib
    import matplotlib.pyplot as plt

    u = MDAnalysis.Universe(PSF, DCD)
    data = np.array([(u.trajectory.time, theta_NMP(u), theta_LID(u)) for ts in u.trajectory])
    time, NMP, LID = data.T

    # plotting
    degreeFormatter = matplotlib.ticker.FormatStrFormatter(r"%g$^\circ$")
    fig = plt.figure(figsize=(6,3))

    ax1 = fig.add_subplot(121)
    ax1.plot(time, NMP, 'b-', lw=2, label=r"$\theta_{\mathrm{NMP}}$")
    ax1.plot(time, LID, 'r-', lw=2, label=r"$\theta_{\mathrm{LID}}$")
    ax1.set_xlabel(r"time $t$ (ps)")
    ax1.set_ylabel(r"angle $\theta$")
    ax1.yaxis.set_major_formatter(degreeFormatter)
    ax1.legend(loc="best")

    ax2 = fig.add_subplot(122)
    ax2.plot(NMP, LID, 'k-', lw=3)
    ax2.set_xlabel(r"NMP-CORE angle $\theta_{\mathrm{NMP}}$")
    ax2.set_ylabel(r"LID-CORE angle $\theta_{\mathrm{LID}}$")
    ax2.xaxis.set_major_formatter(degreeFormatter)
    ax2.yaxis.set_major_formatter(degreeFormatter)
    ax2.yaxis.tick_right()
    ax2.yaxis.set_label_position("right")

    fig.subplots_adjust(left=0.12, right=0.88, bottom=0.2, wspace=0.15)

    for ext in ('svg', 'pdf', 'png'):
        fig.savefig("NMP_LID_angle_projection.{0}".format(ext))
Note that one would normally write the code more efficiently and generate the atom groups once and then pass them to a simple function to calculate the angle
def theta(A, B, C):
    """Calculate the angle between BA and BC for AtomGroups A, B, C"""
    B_center = B.centroid()
    BA = A.centroid() - B_center
    BC = C.centroid() - B_center
    theta = np.arccos(np.dot(BA, BC)/(norm(BA)*norm(BC)))
    return np.rad2deg(theta)
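The angle formula itself can be tested independently of any trajectory on points with a known geometry. The sketch below is plain Python; angle_deg() is an illustrative helper, and the clamping of the cosine is an added safeguard against rounding errors that is not part of the functions above.

```python
import math

def angle_deg(a, b, c):
    """Angle at b between the vectors b->a and b->c, in degrees."""
    ba = [p - q for p, q in zip(a, b)]
    bc = [p - q for p, q in zip(c, b)]
    dot = sum(p * q for p, q in zip(ba, bc))
    norm_ba = math.sqrt(sum(p * p for p in ba))
    norm_bc = math.sqrt(sum(p * p for p in bc))
    cos_theta = dot / (norm_ba * norm_bc)
    # Guard against rounding errors pushing the cosine just outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, cos_theta))
    return math.degrees(math.acos(cos_theta))

# Perpendicular vectors and exactly opposite vectors.
right_angle = angle_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0))
straight = angle_deg((1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (-2.0, 0.0, 0.0))
print(right_angle, straight)
```

Checking a right angle and a straight line is a cheap way to catch a swapped vector or a missing normalization before running the function on a full trajectory.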
1.4.2 Bells and whistles
List comprehensions (implicit for loops) are especially useful for interactive analysis in ipython --pylab:
protein = u.selectAtoms("protein")
data = np.array([(u.trajectory.time, protein.radiusOfGyration()) for ts in u.trajectory])
time, RG = data.T
plot(time, RG)
More on the trajectory iterator
One can directly jump to a frame by using “indexing syntax”:
>>> u.trajectory[50]
< Timestep 51 with unit cell dimensions array([ 0., 0., 0., 90., 90., 90.], dtype=float32) >
>>> ts.frame
51
You can also slice trajectories, e.g. if you want to start at the 10th frame, stop at the 10th frame before the end, and only use every 5th frame:
for ts in u.trajectory[9:-10:5]:
    print(ts.frame)
    ...
(although doing this on Gromacs XTC and TRR trajectories is currently much slower than for DCDs.)
Note: Trajectory indexing and slicing uses 0-based indices (as in standard Python) but MDAnalysis numbers frames starting with 1 (for historical reasons and according to the practice of all MD codes).
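The interplay of 0-based indices and 1-based frame numbers can be worked out with an ordinary list. This sketch assumes a hypothetical trajectory of 100 frames and uses plain Python slicing only; it says nothing about how a particular reader implements slicing.

```python
# Hypothetical trajectory with 100 frames, numbered 1..100 as in the Note.
n_frames = 100
frame_numbers = list(range(1, n_frames + 1))

# Index 50 is the 51st frame.
print(frame_numbers[50])

# Start at the 10th frame, stop before the 10th-from-last, take every 5th.
visited = frame_numbers[9:-10:5]
print(visited[0], visited[-1], len(visited))
```

With these numbers the slice visits frames 10, 15, ..., 90, which is 17 frames in total.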
1.5 Intermediate Level MDAnalysis hacks
MDAnalysis comes with a collection of existing analysis code in the MDAnalysis.analysis module and example scripts (see also the Examples on the MDAnalysis wiki).
1.5.1 RMSD
As an example we will use the MDAnalysis.analysis.rms.rmsd() function from the MDAnalysis.analysis.rms module. It computes the coordinate root mean square distance between two sets of coordinates. For example for the AdK trajectory the backbone RMSD between first and last frame is
>>> from MDAnalysis import Universe
>>> from MDAnalysis.analysis.rms import rmsd
>>> u = Universe(PSF, DCD)
>>> bb = u.selectAtoms('backbone')
>>> A = bb.positions    # coordinates of first frame
>>> u.trajectory[-1]    # forward to last frame
>>> B = bb.positions    # coordinates of last frame
>>> rmsd(A, B)
6.8342494129169804
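What the number means can be seen from a direct implementation of the definition, the square root of the mean squared distance between corresponding atoms. The following is a plain-Python sketch of the unweighted quantity on made-up coordinates; it is not the MDAnalysis implementation and performs no superposition.

```python
import math

def rmsd(a, b):
    """Root mean square distance between two equally long coordinate lists.

    No superposition is performed: the coordinates are compared as given.
    """
    if len(a) != len(b):
        raise ValueError("coordinate sets must have the same number of atoms")
    total = 0.0
    for ra, rb in zip(a, b):
        total += sum((p - q) ** 2 for p, q in zip(ra, rb))
    return math.sqrt(total / len(a))

A = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
B = [(x + 3.0, y + 4.0, z) for (x, y, z) in A]  # A shifted rigidly by (3, 4, 0)
print(rmsd(A, A), rmsd(A, B))
```

A rigid shift by a vector of length 5 gives an RMSD of 5 although the two structures have the same shape, which is why translations (and rotations) are usually removed before comparing conformations.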
1.5.2 Superposition of structure
In order to superimpose two structures in a way that minimizes the RMSD we have functions in the MDAnalysis.analysis.align module.
The example uses files provided as part of the MDAnalysis test suite (in the variables PSF, DCD, and PDB_small). For all further examples execute first
>>> from MDAnalysis import Universe
>>> from MDAnalysis.analysis.align import *
>>> from MDAnalysis.tests.datafiles import PSF, DCD, PDB_small
In the simplest case, we can simply calculate the C-alpha RMSD between two structures, using rmsd():
>>> ref = Universe(PDB_small)
>>> mobile = Universe(PSF, DCD)
>>> rmsd(mobile.atoms.CA.positions, ref.atoms.CA.positions)
18.858259026820352
Note that in this example translations have not been removed. In order to look at the pure rotation one needs to superimpose the centres of mass (or geometry) first:
>>> ref0 = ref.atoms.CA.positions - ref.atoms.CA.centerOfMass()
>>> mobile0 = mobile.atoms.CA.positions - mobile.atoms.CA.centerOfMass()
>>> rmsd(mobile0, ref0)
6.8093965864717951
The rotation matrix that superimposes mobile on ref while minimizing the CA-RMSD is obtained with the rotation_matrix() function
>>> R, rmsd = rotation_matrix(mobile0, ref0)
>>> print(rmsd)
6.8093965864717951
>>> print(R)
[[ 0.14514539 -0.27259113  0.95111876]
 [ 0.88652593  0.46267112 -0.00268642]
 [-0.43932289  0.84358136  0.30881368]]
Putting all this together one can superimpose all of mobile onto ref :
>>> mobile.atoms.translate(-mobile.atoms.CA.centerOfMass())
>>> mobile.atoms.rotate(R)
>>> mobile.atoms.translate(ref.atoms.CA.centerOfMass())
>>> mobile.atoms.write("mobile_on_ref.pdb")
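The three steps (shift to the origin, rotate, shift onto the reference) can be followed on a toy system where the correct rotation is known in advance. This is a plain-Python sketch under stated assumptions: all masses are equal, so the centroid stands in for the center of mass; the helpers centroid(), translate(), and rotate() are written for this example; and the matrix R is constructed by hand rather than computed by rotation_matrix().

```python
def centroid(coords):
    """Unweighted mean position of a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(r[k] for r in coords) / n for k in range(3))

def translate(coords, t):
    """Shift every coordinate by the vector t."""
    return [tuple(r[k] + t[k] for k in range(3)) for r in coords]

def rotate(coords, R):
    """Apply r' = R r to every coordinate (R is a 3x3 nested list)."""
    return [tuple(sum(R[j][k] * r[k] for k in range(3)) for j in range(3))
            for r in coords]

# Reference structure and a mobile copy: rotated by 90 degrees about z, then shifted.
ref = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.0), (0.0, 0.0, 3.0), (1.0, 1.0, 1.0)]
Rz = [[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]]
mobile = translate(rotate(ref, Rz), (5.0, -2.0, 7.0))

# Rotation that undoes Rz (its transpose), written down by hand here.
R = [[Rz[k][j] for k in range(3)] for j in range(3)]

# 1. move mobile to the origin, 2. rotate, 3. move onto the reference center.
step1 = translate(mobile, tuple(-c for c in centroid(mobile)))
step2 = rotate(step1, R)
fitted = translate(step2, centroid(ref))

max_dev = max(abs(p - q) for a, b in zip(fitted, ref) for p, q in zip(a, b))
print(max_dev)
```

The order of the steps matters: rotating before centering would rotate about the coordinate origin instead of about the center of the mobile structure.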
1.5.3 Exercise 5
Use the above in order to investigate how rigid the CORE, NMP, and LID domains are during the transition: Compute time series of the CA RMSD of each domain relative to its own starting structure, when superimposed on the starting structure.
• You will need to make a copy of the starting reference coordinates that are needed for the shifts, e.g.
NMP = u.selectAtoms("resid 30:59")
u.trajectory[0]   # make sure to be on initial frame
ref_com = NMP.selectAtoms("name CA").centerOfMass()
ref0 = NMP.positions - ref_com
which is then used instead of ref.atoms.CA.centerOfMass() (which would change for each time step).
• I suggest writing a function that does the superposition for a given time step, reference, and mobile AtomGroup to make the code more manageable (or use MDAnalysis.analysis.align.alignto())
[Figure: Cα RMSD against time t (ps) for the CORE, NMP, and LID domains]
Possible solution
The code contains the functions superpose() and rmsd(). The latter is marginally faster because we only need the calculated RMSD and not the full rotation matrix. (We are calling the lower-level function MDAnalysis.core.qcprot.CalcRMSDRotationalMatrix() directly, which has somewhat non-intuitive calling conventions.) superpose() also superimposes the mobile group on the reference (but alignto() is actually a more flexible tool for doing this). Otherwise it is mostly book-keeping, which is solved by organizing everything in dictionaries with keys “CORE”, “NMP”, “LID”.
import numpy as np
from MDAnalysis.analysis.align import rotation_matrix
from MDAnalysis.core.qcprot import CalcRMSDRotationalMatrix

def superpose(mobile, xref0, xref_com=None):
    """Superpose the AtomGroup *mobile* onto the coordinates *xref0* centered at the origin.

    The original center of mass of the reference group *xref_com* must
    be supplied or the superposition is done at the origin of the
    coordinate system.
    """
    # 995 us
    xref_com = xref_com if xref_com is not None else np.array([0., 0., 0.])
    xmobile0 = mobile.positions - mobile.centerOfMass()
    R, rmsd = rotation_matrix(xmobile0, xref0)
    mobile.rotate(R)
    mobile.translate(xref_com)
    return rmsd

def rmsd(mobile, xref0):
    """Calculate optimal RMSD for AtomGroup *mobile* onto the coordinates *xref0* centered at the origin.

    The coordinates are not changed. No mass weighting.
    """
    # 738 us
    xmobile0 = mobile.positions - mobile.centerOfMass()
    return CalcRMSDRotationalMatrix(xref0.T.astype(np.float64), xmobile0.T.astype(np.float64), mobile.numberOfAtoms(), None, None)


if __name__ == "__main__":
    import MDAnalysis
    import matplotlib
    import matplotlib.pyplot as plt

    # load AdK DIMS trajectory
    from MDAnalysis.tests.datafiles import PSF, DCD
    u = MDAnalysis.Universe(PSF, DCD)

    # one AtomGroup per domain
    domains = {
        'CORE': u.selectAtoms("(resid 1:29 or resid 60:121 or resid 160:214) and name CA"),
        'LID': u.selectAtoms("resid 122-159 and name CA"),
        'NMP': u.selectAtoms("resid 30-59 and name CA"),
        }
    colors = {'CORE': 'black', 'NMP': 'blue', 'LID': 'red'}

    u.trajectory[0]   # rewind trajectory
    xref0 = dict((name, g.positions - g.centerOfMass()) for name,g in domains.iteritems())

    nframes = len(u.trajectory)
    results = dict((name, np.zeros((nframes, 2), dtype=np.float64)) for name in domains)

    for iframe,ts in enumerate(u.trajectory):
        for name, g in domains.iteritems():
            results[name][iframe, :] = u.trajectory.time, rmsd(g, xref0[name])

    # plot
    fig = plt.figure(figsize=(5,5))
    ax = fig.add_subplot(111)
    for name in "CORE", "NMP", "LID":
        data = results[name]
        ax.plot(data[:,0], data[:,1], linestyle="-", color=colors[name], lw=2, label=name)
    ax.legend(loc="best")
    ax.set_xlabel(r"time $t$ (ps)")
    ax.set_ylabel(r"C$_\alpha$ RMSD from $t=0$, $\rho_{\mathrm{C}_\alpha}$ ($\AA$)")

    for ext in ('svg', 'pdf', 'png'):
        fig.savefig("AdK_domain_rigidity.{0}".format(ext))
BIBLIOGRAPHY
[Michaud-Agrawal2011] N. Michaud-Agrawal, E. J. Denning, T. B. Woolf, and O. Beckstein. MDAnalysis: A Toolkit for the Analysis of Molecular Dynamics Simulations. J. Comput. Chem. 32 (2011), 2319–2327, doi:10.1002/jcc.21787 PMCID:PMC3144279
[Beckstein2009] O. Beckstein, E. J. Denning, J. R. Perilla, and T. B. Woolf. Zipping and Unzipping of Adenylate Kinase: Atomistic Insights into the Ensemble of Open/Closed Transitions. J. Mol. Biol. 394 (2009), 160–176, doi:10.1016/j.jmb.2009.09.009