user manual for ereg - hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 computational...

95
User Manual for ereg Michael J. Dudek Hexadecapole September 2018 Copyright © 2018 Michael J. Dudek 1 INTRODUCTION ereg is a software package for modeling of proteins and nucleic acids. The energy functions, which include a careful treatment of electrostatic energy, provide an alter- native to standard models. The initial functionality of the package was molecular mechanics-based prediction of protein structure. Over a period of several years, the energy functions and conforma- tional search algorithms evolved under the selective pressure of success in this initial application. Sequence design functionality was obtained from the initial structure pre- diction functionality by adding reference states —for example the unfolded state for design of thermodynamic stability, or the unbound conformation for design of bind- ing affinity— along with search through residue sequence space. The same methods added to enable sequence design also enable prediction of pK a for ionizable residues. Most recently, by introducing a generalization of the concept of residue, much of the protein functionality has been extended to oligonucleotides, including in the residue data set some of the more common nucleotide chemical modifications. The user interface is designed with a goal of creating an easily-usable tool set, provid- ing functionality in factored units of utility that should be combinable to accomplish a range of modeling studies. In version sep2018 of the program, modeling is limited to systems consisting of pro- tein and oligonucleotide chains. Plans for future versions include extension of the residue data set to include additional post translationally modified amino acids, addi- tional chemically modified nucleotides, lipids, carbohydrates, and other biomolecules. 1

Upload: others

Post on 11-Jul-2020

1 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

User Manual for ereg

Michael J. Dudek

Hexadecapole

September 2018Copyright © 2018 Michael J. Dudek

1 INTRODUCTION

ereg is a software package for modeling of proteins and nucleic acids. The energyfunctions, which include a careful treatment of electrostatic energy, provide an alter-native to standard models.

The initial functionality of the package was molecular mechanics-based prediction ofprotein structure. Over a period of several years, the energy functions and conforma-tional search algorithms evolved under the selective pressure of success in this initialapplication. Sequence design functionality was obtained from the initial structure pre-diction functionality by adding reference states —for example the unfolded state fordesign of thermodynamic stability, or the unbound conformation for design of bind-ing affinity— along with search through residue sequence space. The same methodsadded to enable sequence design also enable prediction of pKa for ionizable residues.Most recently, by introducing a generalization of the concept of residue, much of theprotein functionality has been extended to oligonucleotides, including in the residuedata set some of the more common nucleotide chemical modifications.

The user interface is designed with a goal of creating an easily-usable tool set, provid-ing functionality in factored units of utility that should be combinable to accomplish arange of modeling studies.

In version sep2018 of the program, modeling is limited to systems consisting of pro-tein and oligonucleotide chains. Plans for future versions include extension of theresidue data set to include additional post translationally modified amino acids, addi-tional chemically modified nucleotides, lipids, carbohydrates, and other biomolecules.

1

Page 2: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

2 COMPUTATIONAL REQUIREMENTS

The program, which consists of roughly 100,000 lines of C++ code, was developed for alinux (or macOS) workstation with a requirement of 4, and preferably more, Gigabytesof memory. The source code is compiled using gcc. The calculations are computation-ally intensive.

3 INSTALLATION

1) From the company website, download the current version of the program:ereg_sep2018.tar.gz.

2) Restore the directory structure.% gunzip ereg_sep2018.tar.gz% tar xvf ereg_sep2018.tar

The top level directory, ereg_sep2018, can be renamed as desired.

3) Activate LAPACK options.% cd ereg_sep2018/src% zzlapack_mac

For 2 source code files, better performance can be achieved by interfacing to LAPACK.The script zzlapack_mac replaces these 2 files with versions that interface correctlyto the LAPACK available with Xcode installed on a macOS system. Optionally, alterna-tive script zzlpack_linux should interface to LAPACK installed on a linux system, andzzlpack_noblas uses equivalent linear algebra routines contained within the eregcodebase.

For each of the 3 LAPACK options (macOS, linux, and none), the directorysrc/zzcodebase contains a consistent variant of the 2 source code files, and of 3 as-sociated makefiles. On a linux system, before running the script zzlpack_linux to en-able compilation using LAPACK, the user must edit the 3 makefiles associated with thisoption (LIN_med_makefile, LIN_tra_makefile, and LIN_str_makefile). Replace thecompiler flags

-L/home/mike/enviro/LAPACK/lib -llapack -lcblas -lf77blas -latlas-I/home/mike/enviro/LAPACK/include

for consistency with the system-specific LAPACK installation.

4) Compile the codebase.% cd src% zzmake

2

Page 3: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

5) Compile the program commands.% cd src/str% make compile

If no errors are encountered, the program has installed successfully.

4 DIRECTORY STRUCTURE

The top level directory contains the following 5 subdirectories.

The src directory tree contains the C++ source code, the object files created by com-pilation of the source code, and all permanent data files. Following compilation, theexecutable user commands are contained in subdirectory src/str.

The man directory contains the user manual in ascii and PDF formats.

The fam directory tree contains the path structure required by ereg commands for lo-cating user-supplied and program-generated data files. A new user project is initiatedby copying and renaming this directory tree structure. The subdirectories fam/arg,fam/car, fam/dgn, fam/exp, fam/seq, fam/stp, and fam/tor organize all data files as-sociated with a single user project.

The scop directory is a user project with the distinction from all other user projectsthat data files generated and stored in this directory tree are required by the igorcommand. Experimental pdb structures for 5529 domains of the SCOP40 set of do-main definitions were entered into subdirectory scop/exp. For each of these domains,geometry was regularized using the greg command. As a component of igor sampling,coordinates of regularized SCOP domains are used in conversion of chain states of theigor model to full atom structures. This directory should not be renamed or deleted.

The test directory is a user project containing input files for each of the example testcases included in this manual. For each ereg command, the corresponding examplesdemonstrate the functionality, verify proper execution, and benchmark computationalrequirements.

5 USER INTERFACE

At the expense of a graphical interface, development resources have been focused onenergy functions, conformational search algorithms, and a handful of supporting util-ities. The ereg package interfaces to programs (such as pymol) for graphical output

3

Page 4: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

display through pdb-format files. The user interface consists of 12 easily learned com-mands typed to the linux prompt.

The following command line arguments, each replaced by the user with a meaningfulstring of characters, specify the associated quantities.

FAM A project name (i.e. test, scop, ...). A project name could be chosen to identify afamily of systems, related in sequence and structure, (i.e. mab, kinases, ...).

MOL A molecule or set of molecules that compose a system (i.e. 1crn, 1pga, ...).

CNF A structure or conformation of the system.

SUB A subset of the set of rigid-geometry degrees of freedom of the system.

GRP A collection of templates to be used in homology model building.

6 CENTRAL COMMANDS

The functionality of the package is accessed through the following 10 commands.

6.1 greg

FUNCTIONALITYgeometry regularization

SYNTAXgreg FAM MOL

INPUT FILES OUTPUT FILES/FAM/exp/MOL.pdb /FAM/dgn/greg.MOL

/FAM/seq/seq.MOL/FAM/tor/tor.MOL.01/FAM/car/MOL.01.pdb

SUMMARY

1) inputs a pdb-format file, FAM/exp/MOL.pdb, usually an experimental structure

4

Page 5: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

2) regularizes geometry, meaning bond lengths, bond angles, and some torsion anglesare adjusted to standard values with minimal movement of atom coordinates

3) outputs, in file FAM/seq/seq.MOL, residue sequence and disulfide crosslinks, a spec-ification of the covalent connectivity of the system independent of conformation

4) outputs 2 alternative specifications of the geometry regularized structure, cartesiancoordinates in pdb-format file FAM/car/MOL.01.pdb, and torsion angle coordinates infile FAM/tor/tor.MOL.01

5) outputs, in file FAM/dgn/greg.MOL, a diagnostic summary of the execution of thecommand

NOTES

The energy surface is defined for a rigid-geometry, nonpolarizable model. To accessthe energy surface for structure prediction, a generalized structure of a molecule orsystem of molecules must be moved into the subspace of structures consistent withregularized geometry.

The primary use of this command is, for a collection of templates in preparation forhomology model building, to convert experimental structures into geometry regular-ized structures. A second, less common, use is geometry regularization of large struc-tures, as an alternative to the ereg command, in preparation for application of struc-ture prediction to localized regions.

Preparation for the command consists of entering a pdb-format file MOL.pdb into direc-tory FAM/exp, and possibly editing MOL.pdb to include only the chains or fragments ofinterest. The only diagnostic information associated with the command is the heavyatom RMSD between the final geometry regularized structure and the initial structure.The conformation name 01 is assigned by the program to the conformation that re-sults from initial geometry regularization of an experimental structure.

The greg command does not access the energy surface and requires only a few sec-onds of computation time.

EXAMPLE TEST CASE

File 1crn.pdb, a crystal structure of a small 46 residue protein, has been entered intodirectory test/exp.

To execute the command, open a window to directory src/str, and type the followingline to the linux (or macOS) prompt.% greg test 1crn

The program outputs files test/seq/seq.1crn, test/tor/tor.1crn.01,

5

Page 6: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

test/car/1crn.01.pdb, and test/dgn/greg.1crn.

The heavy atom RMSD between experimental and geometry regularized structures is.104 angstroms.

6.2 ereg

FUNCTIONALITYlocal energy minimization

SYNTAXereg FAM MOL

INPUT FILES OUTPUT FILES/FAM/exp/MOL.pdb /FAM/dgn/ereg.MOL

/FAM/seq/seq.MOL/FAM/tor/tor.MOL.CNF/FAM/car/MOL.CNF.pdb

SUMMARY

1) inputs a pdb-format file, FAM/exp/MOL.pdb, usually an experimental structure

2) regularizes geometry, meaning bond lengths, bond angles, and some torsion anglesare adjusted to standard values with minimal movement of atom coordinates

3) starting from the initial geometry-regularized structure, minimizes energy locallywith respect to all torsion angle degrees of freedom

4) to prevent movement out of the well of the initial conformation, minimization on theenergy surface is accomplished by generating a sequence of up to 9 local minimiza-tion trajectories, gradually reducing to zero the weighting coefficient for a set of har-monic distance constraints taken from the initial structure

5) outputs, in file FAM/seq/seq.MOL, a specification of residue sequence and disulfidecrosslinks

6) outputs, using CNF=01 to denote the initial geometry-regularized structure, andCNF={02,03,04,...,10} to denote the energy minima corresponding to the endpointsof the sequence of local minimization trajectories, 2 alternative specifications of con-formation, cartesian coordinates in pdb-format file FAM/car/MOL.CNF.pdb, and torsionangle coordinates in file FAM/tor/tor.MOL.CNF

6

Page 7: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

7) outputs, in file FAM/dgn/ereg.MOL, a diagnostic summary of the execution of thecommand, including a compact analysis of the single point energy and RMSD evalua-tions at the end points of the sequence of local minimizations

NOTES

The ereg command is a better alternative to the greg command for bringing an exper-imental structure into regularized geometry in preparation for sequence design, pre-diction of pKa values of ionizable groups, or structure prediction of localized regions.Since this command accesses the restbchm energy surface, it is also useful for scoringstructures based on evaluation of total energy.

Preparation for the command consists of entering a pdb-format file MOL.pdb into direc-tory FAM/exp, and possibly editing MOL.pdb to include only the chains or fragments ofinterest. Consistent with the greg command, the conformation name 01 is assignedby the program to the conformation that results from initial geometry regularization.If the input structure, FAM/exp/MOL.pdb, was created by the ereg program, then ge-ometry regularization should not alter the input structure, and the CNF=01 structure isidentical to the input structure.

Table I. Components of Total Energy.

Component Name Functional Form

repulsion+dispersion Fr 3-parameter buf14-7electrostatic Fe multipole expansiondisulfide crosslink Fs harmonicintrinsic torsional Ft 1D, 2D, or 3D fourier seriesring closure crosslinka Fb harmonicdistance constraint Fc harmonichydration entropy Fh gaussian volumedielectric medium Fm boundary element solution to Poisson equationconformation penaltyb Fg step functionpolarizationc Fp

aThe Fb component, which contributes to the energy of the proline ring and to the energies of theribose and deoxyribose rings could logically be considered a part of Ft. It is currently being split outas a separate component to assist in analysis of this initial parameterization of nucleotide residues.

bThe Fg component is an exploratory term used to penalize residue conformations that occur toofrequently in structure prediction. It has not yet been described in publication.

cThe Fp component is an exploratory term and has not yet been described in publication.

The notation used to refer to components of the total energy is defined in Table I. En-

7

Page 8: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

ergy minimization is carried out on the (Fr+Fe+Fs+Ft+Fb+Fc+Fh) energy surface, withcharges on ionized groups scaled to 3/8 of full values. At the endpoint of local min-imization, charges on ionized groups are restored to full values, and Fe is recalcu-lated along with Fm as a single point evaluation. Additional components (Fg+Fp), alsoevaluated only at the endpoint of local minimization, are added to complete the to-tal energy. The sequence of local minimizations on the approximate energy surface(Fr+Fe+Fs+Ft+Fb+Fc+Fh), using reduced charges on ionized groups, and gradually re-ducing to zero the weighting coefficient for Fc, is ended if the single point evaluation oftotal energy (excluding Fc) increases.

For systems in which the number of torsion angle degrees of freedom is greater than1700, the command chooses a collection of subsets of degrees of freedom, eachcontaining less than 1700 degrees of freedom, such that the union spans the entiremolecule or system of molecules. Local energy minimization is accomplished by cy-cling through this collection, minimizing locally with respect to each subset beforemoving to the next. For energy minimizations with respect to subsets of the full setof rigid geometry degrees of freedom, the function minimized is a partial sum over thefull set of terms that contribute to the total energy of the system. Those terms thatdepend only on degrees of freedom outside of the subset are not included. Becausepartial energies are of limited utility for predicting the relative stability of 2 conforma-tions, the single point evaluation of (Fe+Fm+Fg+Fp) at the endpoint of local minimiza-tion is not included. For accomplishing geometry regularization of very large struc-tures, the greg command can be used as a fast alternative to the ereg command.

EXAMPLE TEST CASE

File 1pga.pdb, a crystal structure of a small 56 residue protein, has been entered intodirectory test/exp.

To execute the command, open a window to directory src/str, and type the followingline to the linux prompt.% ereg test 1pga

The program outputs files test/seq/seq.1pga, test/tor/tor.1pga.CNF forCNF={01,02,03,...,07}, test/car/1pga.CNF.pdb for CNF={01,02,03,...,07}, andtest/dgn/ereg.1pga.

In the following abbreviated listing, a vertical line in column 1 indicates a line that hasbeen added for the purpose of description.

File=test/dgn/ereg.1pga

continued on next page

8

Page 9: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

Diagnostic file created by the command:% ereg test 1pga

REGULARIZE GEOMETRY

heavy atom RMSD= 0.124

For CNF=01, created by geometry regularization with no consideration of energy.

CONSTRUCT MECHANICAL SYSTEM

subset name=a00

Variable subset name is a name assigned by the program to the subset of degreesof freedom with respect to which energy is being minimized. For a protein chaincomposed of less than about 150 residues, a single subset is sufficient to contain alltorsion angle degrees of freedom.

nZ0 nS0

1 0

cQ1 cQ1bb cU1 cJ1 cY1

322 168 154 205 145

cQ2 cQ2bb

322 168

cF1 cG2 cB2 cH1 cG1 cB1

3970 3938 32 1750 3938 32 cQ3 cX2

322 321cE0 cC1

51681 2126

Diagnostic set sizes. For program design utility, set names consist of a 2-character(capital letter+number) combination. Set name definitions can be found in filesrc/str/dimensions. For example, Z0 is the set of chains in the system, Q1 is theset of torsion degrees of freedom. This information, which has utility for algorithmdevelopment, is best ignored for application use.

Z0sub

0

iR0 R0aa Q0sub

1 eMET 111111

2 THR 1111113 TYR 111111

4 LYS 11111111

5 LEU 1111111

6 ILE 1111111

.

.

.

51 THR 111111

52 PHE 11111

53 THR 111111

continued on next page

9

Page 10: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

54 VAL 111111

55 THR 111111

56 GLUe 111111

A profile of the set of torsion angle degrees of freedom. Here, chain transla-tion+rotation is not used, all torsion degrees of freedom are variable. For eachresidue, the order of torsion angles used in mechanical system objects can be foundin data file src/dat/residue_mappings, input to variable Q0tor.

subset name=a00

F

-1022.126

Fr Fe Fs Ft

333.543 -1203.029 0.000 16.139Fc Fh Fw Fb

46.956 -103.415 -112.320 0.000

cQ2 cQ2bb cF2 cG3 cB3 cH2 cJ2 cY2 cE1 cC1

322 168 3870 3841 29 1750 205 145 51681 2126

Diagnostic output following mechanical system contraction, meaning energy contri-butions that can not change using this subset of degrees of freedom are removed.

MINIMIZE RESTCHM LOCALLY

distance constraint coeff index= 0

Affecting energy component Fc, the weighting coefficient for the sum of harmonicdistance constraints is set at 1.00 kcal/(mol-bohr2). Weights corresponding to otherindexes can be found in data file src/dat/energy_params, input to program vari-able W0a.

subset name=a00

steps remaining=201 steps taken= 55 endstate=0

Variable steps remaining is a counter, initiated as the maximum number of steps,and decremented following each step. A trajectory ends either when the RMS of thegradient falls below a threshold value, or when steps remaining=0.

For variable endstate, a nonzero value indicates abnormal termination.

i F z b lam2 s2 d2 delF

0-1.02213e+03 3.9301e+01 6.2720e-02 3.1633e+01 6.2750e-02 6.2750e-02-1.07231e+03 MACRO

1-1.06317e+03 8.3210e+00 7.8400e-02 1.3295e+01 7.8469e-02 1.2280e-01-1.09075e+03 MICRO

2-1.08632e+03 3.7034e+00 7.8400e-02 9.4882e+00 7.8449e-02 1.7725e-01-1.10373e+03 MICRO

3-1.10013e+03 1.6439e+00 7.5200e-02 1.1186e+01 7.5204e-02 2.1443e-01-1.11565e+03 MICRO4-1.10825e+03 1.0619e+00 4.9482e-02 1.1737e+01 4.9486e-02 2.3886e-01-1.11573e+03 MICRO

5-1.11291e+03 7.4547e-01 2.4939e-02 1.7368e+01 2.4939e-02 2.5677e-01-1.11601e+03 MICRO|inc

6-1.11527e+03 4.8317e-01 9.6489e-03 6.5000e+01 9.6490e-03 2.5890e-01-1.11677e+03 MICRO

7-1.11372e+03 7.1366e-01 8.0000e-02 6.5736e+00 8.0085e-02 8.0085e-02-1.12597e+03 MACRO

8-1.12348e+03 1.4693e+00 6.1440e-02 5.2975e+00 6.1471e-02 1.2642e-01-1.13049e+03 MICRO

continued on next page

10

Page 11: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

.

.

.

47-1.14649e+03 1.6356e-01 5.7273e-02 1.6798e-01 5.7278e-02 9.5295e-02-1.14702e+03 MICRO

48-1.14697e+03 4.1082e-01 6.8728e-02 0.0000e+00 4.3389e-02 1.0981e-01-1.14716e+03 MICRO49-1.14692e+03 3.0975e-01 3.6029e-02 1.6979e+00 3.6029e-02 3.6029e-02-1.14754e+03 MACRO

50-1.14731e+03 9.9589e-02 3.3777e-02 5.8839e-02 3.3779e-02 5.1374e-02-1.14735e+03 MICRO

51-1.14734e+03 2.0394e-02 4.0532e-02 0.0000e+00 1.1850e-02 6.0157e-02-1.14735e+03 MICRO

52-1.14729e+03 6.3691e-02 3.2696e-02 3.3250e-04 3.2696e-02 3.2696e-02-1.14731e+03 MACRO

53-1.14731e+03 2.1654e-02 3.2696e-02 0.0000e+00 7.3116e-03 2.5543e-02-1.14731e+03 MICRO54-1.14731e+03 3.1055e-03 3.9235e-02 0.0000e+00 4.0092e-03 4.0092e-03-1.14731e+03 MACRO

55-1.14731e+03 7.0216e-04

At each point of the trajectory, the program calculates energy, 1st derivatives, and2nd derivatives.

These quantities define a harmonic approximation to the actual energy surface.

Steps are calculated using Newton’s method to minimize the harmonic surfacewithin a trust region.

For the above compact characterization of the minimization trajectory, columnlabels are defined as follows:

i Minimization step.

F Total energy.

z RMS of the gradient.

b Radius of the trust region, or equivalently RMS of a step to the boundary ofthe trust region.

lam2 Lagrange multiplier used to enforce the condition that the step minimize theharmonic surface on the boundary of the trust region. A value of zero indi-cates the minimum of the harmonic surface is inside of the boundary of thetrust region.

s2 RMS of the calculated step.

delF Value of F estimated by the harmonic surface at the position of the calculatedstep.

At each step i>1, the radius of the trust region, b(i), is adjusted based on a com-parison of the actual change in energy, F(i)−F(i−1), to the decrease in energy,delF(i−1)−F(i−1), predicted by the harmonic surface of the previous step.

F Fr Fe Fs Ft Fc Fb Fh Fw

-1147.31 302.90 -1224.49 0.00 -9.00 31.05 0.00 -123.90 -123.86

continued on next page

11

Page 12: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

Decomposition of energy at the endpoint of local minimization.

Variable F is the sum of components (Fr+Fe+Fs+Ft+Fb+Fc+Fh+Fw), where Fw is agreatly simplified estimate of Fm based on volume exclusion from a hydration shell.

Affecting component Fe, charges on ionized groups are scaled to 3/8 of full values.

single point energy evaluation

nA5 nB5 nC5 nH5 nV5 nE5 nE6 nF5 nF6 nF7

6 10 6 14 18 18 18 6 9 5

nDOT nD5 nHIT

758 53 48nA5 nB5 nC5 nH5 nV5 nE5 nE6 nF5 nF6 nF7

1180 43105 610 1796 1830 1830 1812 610 906 304

nDOT nD5 nHIT

37298 2891 4737

Diagnostic set sizes, following generation of a molecular surface, and partitioning ofthe surface into elements.

Fm Fps Fss

-365.87666 -622.83559 241.06665

Single point evaluation of Fm. When placed in a dielectric continuum, the chargedistribution of the protein (or, more generally, the system of molecules) induces acharge distribution on the boundary surface between the protein and the dielectriccontinuum. Fps is the interaction energy of the charge distribution of the proteinwith the charge distribution of the boundary surface. Fssis the interaction energyof the charge distribution of the boundary surface with itself. Fm is calculated as alinear combination of components Fps and Fss.

F

-1631.64

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

302.90 -1434.07 0.00 -9.00 31.05 0.00 -123.90 -365.88 0.00 -1.68

Decomposition of full energy at the endpoint of local minimization.

Charges on ionized groups are restored to full values. F is the sum of all energycomponents excluding Fc.

heavy atom RMSD= 0.356

For CNF=02, the endpoint of local minimization.

distance constraint coeff index=-1

The weighting coefficient for the sum of harmonic distance constraints is assignedsuccessively smaller values for minimization trajectories CNF={03,04,05,...}.

continued on next page

12

Page 13: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

.

.

.

F

-1646.09Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

295.86 -1436.00 0.00 -11.18 17.74 0.00 -127.69 -365.61 0.60 -2.07

heavy atom RMSD= 0.390

For CNF=03.

distance constraint coeff index=-2

.

.

.

F-1655.99

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

290.71 -1439.01 0.00 -11.07 12.58 0.00 -133.95 -361.17 0.60 -2.10

heavy atom RMSD= 0.485

For CNF=04.

distance constraint coeff index=-3

.

.

.

F-1663.70

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

291.70 -1441.44 0.00 -12.43 6.72 0.00 -140.58 -359.53 0.60 -2.02

heavy atom RMSD= 0.604

For CNF=05.

distance constraint coeff index=-4

.

.

.

F-1669.28

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

290.56 -1443.63 0.00 -11.43 3.67 0.00 -145.31 -358.09 0.60 -1.97

heavy atom RMSD= 0.716

For CNF=06.

distance constraint coeff index=-5

.

.

.

F

continued on next page

13

Page 14: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

-1672.75

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

290.58 -1443.37 0.00 -9.40 1.89 0.00 -150.64 -358.59 0.60 -1.93

heavy atom RMSD= 0.845

For CNF=07.

distance constraint coeff index=-6

.

.

.

F-1671.74

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

291.01 -1443.25 0.00 -9.39 0.49 0.00 -151.09 -357.69 0.60 -1.92

Because the full energy (excluding Fc) increases, the sequence of local minimizationtrajectories is ended, and the endpoint of this final minimization is not accepted.

COMPACT ANALYSIS

W0 F Fr Fe Fs Ft Fc Fb Fh Fm RMSD

0 -1631.64 302.90 -1434.07 0.00 -9.00 31.05 0.00 -123.90 -365.88 0.36

-1 -1646.09 295.86 -1436.00 0.00 -11.18 17.74 0.00 -127.69 -365.61 0.39

-2 -1655.99 290.71 -1439.01 0.00 -11.07 12.58 0.00 -133.95 -361.17 0.49-3 -1663.70 291.70 -1441.44 0.00 -12.43 6.72 0.00 -140.58 -359.53 0.60

-4 -1669.28 290.56 -1443.63 0.00 -11.43 3.67 0.00 -145.31 -358.09 0.72

-5 -1672.75 290.58 -1443.37 0.00 -9.40 1.89 0.00 -150.64 -358.59 0.84________ ________ ________ ________ ________ ________ ________ ________ ________

-41.11 -12.32 -9.30 0.00 -0.40 -29.16 0.00 -26.73 7.28W0 Fm Fps Fss -Fss/Fps

0 -365.88 -622.84 241.07 0.3870

-1 -365.61 -622.94 241.56 0.3878

-2 -361.17 -614.94 238.10 0.3872

-3 -359.53 -613.03 238.08 0.3884-4 -358.09 -610.47 237.00 0.3882

-5 -358.59 -610.84 236.76 0.3876________ ________ ________

7.28 11.99 -4.31

A compact summary of how the full energy components and RMSD change withminimization.

The local energy minimum conformation is characterized by a heavy atom RMSDfrom the experimental structure of .84 angstroms, and by a total energy of -1672.75kcal/mol.

The ereg command accesses the energy surface. Using a 4.0GHz Intel Core i7 proces-sor, local energy minimization of 1pga requires computation time of about 8 minutes.

14

Page 15: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

6.3 estp

FUNCTIONALITYstructure prediction and sequence design

SYNTAXestp FAM MOL CNF SUB

INPUT FILES OUTPUT FILES/FAM/seq/seq.MOL /FAM/dgn/estp.MOL.CNF.SUB/FAM/car/MOL.CNF.pdb /FAM/car/MOL.CNF_SUB_???.pdb/FAM/stp/stp.SUB.MOL /FAM/tor/tor.MOL.CNF_SUB_???

/FAM/car/MOL.g???.pdb

SUMMARY

1) inputs a sequence, FAM/seq/seq.MOL

2) inputs an initial conformation in pdb format, FAM/car/MOL.CNF.pdb

3) inputs a compact file, FAM/stp/stp.SUB.MOL, specifying the subset of degrees offreedom to be searched and, at each sequence position, the set of residues to be sub-stituted

4) for each sequence variant, generates a collection of local minima on the energy sur-face by minimizing globally with respect to a subset of torsion angle degrees of free-dom concentrated in one or more chain segments

5) for each local energy minimum conformation, here using ??? to denote 3 digitsspecifying the order of the minimum based on energy, outputs cartesian coordinatesin pdb-format file FAM/car/MOL.CNF_SUB_???.pdb, and torsion angle coordinates infile FAM/tor/tor.MOL.CNF_SUB_???

6) outputs, in file FAM/dgn/estp.MOL.CNF.SUB, a diagnostic summary of the executionof the command

7) for each of the most stable sequences found in the search through sequence space,here using ??? to denote the order of the sequence variant based on calculatedfree energy of folding, outputs the lowest-energy conformation in pdb-format fileFAM/car/MOL.g???.pdb

NOTES

The most useful functionality of the ereg program is accessed using the estp com-

15

Page 16: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

mand. Structure prediction for segments of proteins is achieved by search throughconformation space. Sequence design of thermodynamic stability is achieved by ad-ditional search through sequence space.

In the context of a trajectory of local energy minima, used to accomplish global min-imization with respect to subsets of torsion angle degrees of freedom larger than asingle segment, the estp command generates the component steps to the next con-formation of the trajectory.

Using conformation CNF as a starting structure, the estp command searches for theconformation that minimizes the detailed energy function, globally, within a subspaceof the full space of motion for the rigid geometry system MOL. The subspace SUB ofconformations to be searched consists of from 1 to 8 segments, each segment 7 to 13residues in length, plus a collection of side chains.

The search through the specified subspace of conformations consists of the followingsequence of calculations. A segment, or segments, of the initial backbone structureis deformed, analytically, resulting in a large collection of alternative backbone struc-tures. Local energy minimization is applied, resulting in a collection of optimized back-bone structures. Interactions involving side chains that will be adjusted later are notyet included in the energy. For each optimized backbone structure, one optimized sidechain structure is added, the result of a trajectory of local minima search through thesub-subspace of side chain motion. Local energy minimization is applied to the back-bone plus side chain combination, resulting in a collection of local minima for the de-formed segment, or segments.

To achieve efficiency in applications to protein surface loops, local energy minimiza-tion uses distance constraints between rigid segments to maintain fixed the structureof the region outside of the segments specified as variable. Efficiency is enhanced fur-ther by original algorithms for deforming segments of the protein chain, and for re-moving overlaps from starting conformations.

In the default mode of execution, the subset of degrees of freedom specified in filetest/stp/stp.SUB.MOL is expanded, following input to the program, to include sidechain torsions for all side chains that can contact a deformable segment. This automa-tion of subset expansion has been found preferable in comparison to manual subsetexpansion, which requires of the user graphical examination of the initial structure.Also specified in file test/stp/stp.SUB.MOL is the set of residue types allowed ateach sequence position. In the default mode of execution, the space of sequencessearched consists of all possible combinations. For example, 5 allowed residue typesat 3 sequence positions produces 125 possible combinations. The 1st charcter ofcommand line argument SUB is used to control 3 variant modes of execution. Thismechanism of control has been found more convenient in comparison to addition ofa command line argument. For 1st character ’l’ or ’h’, automated subset expansion

16

Page 17: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

is skipped, thus allowing complete subset specification by manual preparation of filetest/stp/stp.SUB.MOL. For 1st character ’h’ or ’i’, the combinatorial search throughsequence space skips over all but the initial and final sequences, thus allowing a fastcomparison of the stability of 2 sequences. A common use is for evaluation of the sta-bility of a mutant sequence in comparison to native.

For each sequence of the specified space of sequences, the program searches throughthe specified space of conformations. The lowest-energy conformation found is usedto evaluate dG, an estimate of free energy of folding. Because different models areused to represent the folded state of the protein and the reference unfolded state, dGis not, for a single sequence, a physically meaningful measure of stability. However,ddG=( dG(sequence1) -dG(sequence0)), logo+name2.xcfthe change in dG with se-quence, does provide a meaningfull measure of relative stabilty.

The free energy of the folded state is calculated as ( cpp( Fr+Fe+Fs+Ft+Fb+Fc+Fg+Fp)+chFh+cpsFps+cssFss), where { cpp, ch, cps, css} are coefficients optimized such that cal-culated ddG matches observed ddG over a large dataset of experimental measure-ments, and where energy components are evaluated for the lowest-energy conforma-tion found in the search through conformation space. The free energy of the unfoldedstate, Gunf, is estimated as the free energy of an Ising model representation of the un-folded state.

EXAMPLE TEST CASE

File 4pti.pdb, a crystal structure of a 58-residue inhibitor of the serine proteasetrypsin, has been entered into directory test/exp. File stp.l5.4pti, a subset of tor-sion angle degrees of freedom corresponding to a surface loop (44-50) of 4pti, hasbeen created by hand in directory test/stp.

To execute the command, open a window to directory src/str, and type the followinglines to the linux prompt.% ereg test 4pti% estp test 4pti 06 l5

The former command is executed for the purpose of creating input filestest/seq/seq.4pti and test/car/4pti.06.pdb.

In the following abbreviated listing, a vertical line in column 1 indicates a line that hasbeen added for the purpose of description.

file=test/dgn/estp.4pti.06.l5

continued on next page

17

Page 18: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

Diagnostic file created by the command:% estp test 4pti 06 l5

REGULARIZE GEOMETRY

heavy atom RMSD= 0.008

In this example, the input conformation test/car/4pti.06.pdb is already regu-larized. The small RMSD results from a truncation of atom positions imposed bystorage of the conformation as a pdb format file.

SEARCH COMBINATORIAL SEQUENCE SPACE

This output line marks entry into the program’s outer loop, which controls searchthrough sequence space. In this example, the functionality being accessed is struc-ture prediction, as opposed to the more general functionality of sequence design.File test/stp/stp.l5.4pti directs search through conformation space for a sin-gle 7-residue segment. Because the input sequence test/seq/seq.4pti is notchanged, the program outer loop reduces to a single iteration.

MINIMIZE RESTCHM GLOBALLY

For each iteration of the outer loop, this output line marks entry into search throughconformation space.

CONSTRUCT MECHANICAL SYSTEM

subset name=l5

The fourth command line argument is used as the name for this initial subset ofdegrees of freedom.

nZ0 nS0

1 3

cQ1 cQ1bb cU1 cJ1 cY1

343 174 169 228 162

cQ2 cQ2bb

31 22

cF1 cG2 cB2 cH1 cG1 cB1

4152 1059 3093 1859 4120 32

cQ3 cX2

81 81cE0 cC1

1920 16

Diagnostic set sizes.

Z0sub

0

iR0 R0aa Q0sub

1 eARG 000000000

2 PRO 000000

continued on next page

18

Page 19: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

3 ASP 00000

4 PHE 00000

5 CYS 00000

6 LEU 0000000

7 GLU 0000008 PRO 000000

9 PRO 000000

10 TYR 000000

11 THR 000000

12 GLY 00013 PRO 000000

14 CYS 00000

15 LYS 00000000

16 ALA 0000

17 ARG 000000000018 ILE 0000000

19 ILE 0000000

20 ARG 0010000000

21 TYR 000000

22 PHE 0000023 TYR 000000

24 ASN 000000

25 ALA 0000

26 LYS 00000000

27 ALA 000028 GLY 000

29 LEU 0000000

30 CYS 00000

31 GLN 0000000

32 THR 00000033 PHE 00000

34 VAL 000000

35 TYR 000000

36 GLY 000

37 GLY 00038 CYS 00000

39 ARG 0000000000

40 ALA 0000

41 LYS 00000000

42 ARG 000000000043 ASN 000000

44 ASN 111001

45 PHE 11101

46 LYS 11100001

47 SER 1110148 ALA 1101

49 GLU 111001

50 ASP 11101

51 CYS 00000

52 MET 0000000

continued on next page

19

Page 20: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

53 ARG 0010000000

54 THR 001000

55 CYS 00000

56 GLY 000

57 GLY 000

58 ALAe 0000

A profile of the set of torsion angle degrees of freedom. Chain translation+rotationis not used. Torsion degrees of freedom include all backbone torsions for the 7-residue segment {44 ASN–50 ASP}. Also included are the χ1 torsions for residuesof the segment, and for residues {20 ARG,53 ARG,54 THR} whose side chains con-tact the segment. Inclusion of the χ1 torsions provides a mechanism for neglect ofside chain interactions, beyond the Cβ atom, in the search over segment backboneconformations.

subset name=l5

F

-324.673

Fr Fe Fs Ft

18.633 -177.401 0.000 1.235Fc Fh Fw Fb

0.000 -14.872 -152.268 0.000

cQ2 cQ2bb cF2 cG3 cB3 cH2 cJ2 cY2 cE1 cC1

31 22 3792 781 3011 1719 19 6 1230 16

Diagnostic output following mechanical system contraction.

iJ3= 0 [ 0]BACKBONE DEFORMATIONS, segment= 0

def -kT*ln(p) dchi cnf0

Column headings:

def= index of deformation in order of probability, the initial undeformed conforma-tion is assigned index 0

-kT*ln(p)= estimate of energy of deformation based on probabilities of occurrenceof dipeptide conformations

dchi= RMS difference (in degrees) of φ and ψ torsion angles from nondeformedconformation

cnf0= sequence of single residue conformational regions

0 8.26 0.0 E E A E A A A

1 8.20 26.7 X A X X E A*A

2 9.01 27.5 X A C E C A*A

3 9.03 22.0 X E E X X A A

.

continued on next page

20

Page 21: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

.

.

288 13.77 29.2 X X*E X E X*A

In this example, a single segment is deformed. Deformations are exact meaningcoordinates for adjacent nondeformed segments are unchanged. Coverage of thespace of segment deformations is approximately uniform. The initial conformationis included as the first deformation before ordering. For each new deformation, RMSdistance from all previously accepted deformations is required to be greater than 9degrees.

BACKBONE OVERLAP REMOVAL, SCORE= 0.500[restbhw] +SCREEN

iD2 tot nE3 S1 S2 S3 S4 S5 Z2iD1

2048 17.8 0 -147.9 8.3 -40.7 167.0 31.1 0

cnf1=[E E A E A A A ]

cnf0=[E E A E A A A ]

2049 91.7 0 -130.3 8.2 -25.0 180.7 58.0 1cnf1=[X*A C X C A*A ]

cnf0=[X A X X E A*A ]

2050 99.2 0 -126.4 9.0 -23.6 182.9 57.4 2

cnf1=[X*A E X C A*X ]

cnf0=[X A C E C A*A ]

2051 20.1 0 -127.8 9.0 -39.9 154.8 24.0 3

cnf1=[E E E A X A A ]

cnf0=[X E E X X A A ]

.

.

.

2333 61.6 0 -126.0 13.8 -26.0 168.1 31.6 288

cnf1=[E A*E X E X*A ]

cnf0=[X X*E X E X*A ]

continued on next page

21

Page 22: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

tot=( S1+S2+S3+S4+S5) is a score evaluated at the endpoint of overlap removalfor the purpose of ordering deformations, enabling selection of those likely to bemost productive.

S1= .50( Fr+Fe+Fs+Ft+Fb+Fh+Fw+Fg+Fp), using charges on ionized groups scaledto .375 of full values.

Components S2-S5 are a fast screen.

S2= energy contributed by local conformation

S3= energy contributed by residue-residue contacts

S4= energy contributed by ionized groups

S5= energy contributed by hydrophobic groups

nE3= number of overlapping atom pairs remaining following overlap removal

cnf1= sequence of conformational regions at the endpoint of overlap removal

Deformations for which overlaps can not be removed are excluded.For consistency with the more general case of multiple segment deformations, theindex used here for iD2 is a temporary value, indexing scratch locations of D2, theset of combined backbone deformations. For the case of multiple segment deforma-tions, search through all combinations of individual segment deformations produceslarge numbers of combined backbone deformations. These combined deformationsare processed, using overlap removal and evaluation of screen score, in groups of2048, before being assimilated into the ordered set of D2 locations.

SORTED ORDER

iD2 tot S1 S2 S3 S4 S5 Z2iD1

0 7.75 -120.9 12.7 -41.5 142.5 14.9 229

cnf1=[E X X X A X A ]

1 15.45 -125.2 11.3 -46.4 150.6 25.1 108cnf1=[E X E X X A A ]

2 16.51 -117.9 10.2 -44.8 147.4 21.5 37

cnf1=[E X X*X X A A ]

3 17.77 -147.9 8.3 -40.7 167.0 31.1 0

cnf1=[E E A E A A A ]

.

.

.

255 126.48 -121.2 11.9 -26.3 194.0 68.1 148

cnf1=[X*X C E A*A*X ]

continued on next page

22

Page 23: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

The number of overlap free deformations has been reduced slightly by clustering.For single segment searches, the number of deformations passed to the next stageof local minimization on the energy surface is limited to 512.

BACKBONE LOCAL ENERGY MINIMIZATION,400 STEPS, SCORE= 0.750[restbchm] +SCREEN

iD2= 0 Z2iD1= 229

steps taken=406 rms of gradient= 5.8029e-01

S S1 S2 S3 S4 S5

-310.5 -387.0 6.4 -17.7 75.8 12.0Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

23.4 -166.6 0.0 5.8 0.0 0.0 -13.8 -363.9 0.6 -1.5

cnf2=[E X E A A E A ]

cnf0=[X A X X A A A ]

iD2= 1 Z2iD1= 108steps taken=406 rms of gradient= 4.0489e-01

S S1 S2 S3 S4 S5

-302.2 -382.8 5.7 -18.6 79.0 14.5

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

30.8 -168.3 0.0 5.2 0.0 0.0 -18.1 -359.7 1.2 -1.5cnf2=[E X E A A X A ]

cnf0=[X X E X X A A ]

iD2= 2 Z2iD1= 37

steps taken=406 rms of gradient= 5.6856e-01

S S1 S2 S3 S4 S5-304.0 -383.1 5.1 -19.3 78.8 14.5

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

23.7 -167.1 0.0 10.1 0.0 0.0 -16.1 -360.5 0.6 -1.6

cnf2=[E E E C A A A ]

cnf0=[X A X*X E X A ]

.

.

.

iD2= 255 Z2iD1= 148

steps taken=406 rms of gradient= 9.5953e-01

S S1 S2 S3 S4 S5

-292.9 -377.5 6.0 -15.4 87.5 6.4

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

28.7 -161.5 0.0 2.9 0.0 0.0 -14.1 -358.6 0.0 -0.7

cnf2=[X*A E E A*C*A ]

cnf0=[E X X E A*A*A ]

A local minimization trajectory, limited to approximately 400 steps, is generatedon the [restbchw] energy surface with charges on ionized groups scaled to .375of full values. At the endpoint of local minimization, ( Fe+Fm+Fg+Fp) are evaluatedwith charges on ionized groups restored to full values. Variable rms of gradientprovides a measure of the extent of convergence.

continued on next page

23

Page 24: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

S=( S1+S2+S3+S4+S5) is a score, evaluated at the endpoint of the local mini-mization trajectory, enabling selection of those conformations most likely to beproductive.

S1= .75( Fr+Fe+Fs+Ft+Fb+Fc+Fh+Fm+Fg+Fp)

Components S2-S5 are described above.

cnf2= sequence of conformational regions following energy minimization

In addition to the deformation corresponding to the initial conformation, 4 otherdeformations minimize to the initial conformation indicating that the extent of cov-erage of backbone deformations is probably sufficient.

For single segment searches, the number of backbone conformations passed on tocomplete minimization is limited to 128.

BACKBONE LOCAL ENERGY MINIMIZATION,800 STEPS, SCORE= 1.000[restbchm] +SCREEN

iD2= 0 Z2iD1= 6

steps taken=100 rms of gradient= 4.1540e-04

S S1 S2 S3 S4 S5

-451.3 -533.4 4.7 -19.5 82.5 14.5Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

17.1 -179.0 0.0 2.9 0.0 0.0 -15.9 -357.5 0.0 -1.1

cnf2=[E E E E*A A A ]

cnf0=[X E E X X X A ]

iD2= 1 Z2iD1= 47steps taken= 0 rms of gradient= 8.1414e-04

S S1 S2 S3 S4 S5

-449.0 -519.8 5.2 -20.8 76.7 9.7

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

27.2 -170.2 0.0 0.9 0.0 0.0 -16.5 -361.7 1.2 -0.7cnf2=[E X E E A A A ]

cnf0=[X X X*A X A A ]

iD2= 2 Z2iD1= 3

steps taken= 0 rms of gradient= 8.1746e-04

S S1 S2 S3 S4 S5-449.7 -524.4 4.5 -17.5 76.6 11.2

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

21.2 -173.6 0.0 4.6 0.0 0.0 -15.2 -361.6 1.2 -1.1

cnf2=[E E E X A C X ]

cnf0=[X E E X X A A ]

.

.

.

iD2= 127 Z2iD1= 124

steps taken=418 rms of gradient= 4.1432e-03

continued on next page

24

Page 25: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

S S1 S2 S3 S4 S5

-411.0 -506.2 5.8 -12.1 83.3 18.3

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

26.2 -161.7 0.0 2.6 0.0 0.0 -15.3 -358.5 1.2 -0.7

cnf2=[E E E*A*E*A A ]

cnf0=[E E X X*X*X A ]

The number of backbone conformations passed on to side chain conformationalsearch is limited to 64.

DEFORMATIONS FOR MODE=bbNumber of conformations= 64

iD0= 0 S=?? initial cnf=[E E A E A A A ]

S1=?? S2=?? S3=?? S4=?? S5=??ARG -135.85 172.28 176.68 -70.25 -75.17 178.02 100.45 -8.88 177.38 176.27ASN -79.56 78.81-178.59-164.48 -6.89-176.57ASN E -161.48 114.18-170.07-174.92 -25.74 179.41PHE E -128.46 155.00-171.05 -63.31 95.28LYS A -88.89 -12.02-176.64 -66.31-166.34-158.13-172.00 179.87SER E -158.35 166.72-179.46 66.97 -64.97ALA A -69.67 -37.55 179.67 60.00GLU A -60.71 -51.62 179.06 -97.51 73.80-113.08ASP A -53.77 -36.38-166.28 -71.55 -15.59ARG -67.67 -37.33-177.99-172.28 128.88 71.93-163.48 -2.00 178.11-178.86THR -80.69 -51.76-160.11 -75.16 88.30 60.00iD0= 1 S=-4.51309e+02 cnf=[E E E E*A A A ]

S1= -533.43 S2= 4.74 S3= -19.54 S4= 82.46 S5= 14.45ARG -135.85 172.28 176.68 -70.25 -75.17 178.02 100.45 -8.88 177.38 176.27ASN -79.56 78.81 168.68-164.48 -6.89-176.57ASN E -160.76 117.53-166.34-174.92 -25.74 179.41PHE E -119.84 159.30 176.22 -63.31 95.28LYS E -93.74 113.79-174.14 -66.31-166.34-158.13-172.00 179.87SER E* 68.24 171.05 176.91 66.97 -64.97ALA A -74.11 -29.06-176.98 60.00GLU A -57.73 -49.89-176.67 -97.51 73.80-113.08ASP A -64.14 -31.66-166.28 -71.55 -15.59ARG -67.67 -37.33-177.99-172.28 128.88 71.93-163.48 -2.00 178.11-178.86THR -80.69 -51.76-160.11 -75.16 88.30 60.00

iD0= 2 S=-4.50288e+02 cnf=[E E A E A A A ]

S1= -535.23 S2= 6.24 S3= -19.84 S4= 83.82 S5= 14.73ARG -135.85 172.28 176.68 -70.25 -75.17 178.02 100.45 -8.88 177.38 176.27ASN -79.56 78.81 172.24-164.48 -6.89-176.57ASN E -159.15 120.95-177.17-174.92 -25.74 179.41PHE E -121.30 146.52 179.76 -63.31 95.28LYS A -67.93 -30.13-172.25 -66.31-166.34-158.13-172.00 179.87SER E -148.26 157.05-179.21 66.97 -64.97ALA A -61.29 -38.79 177.78 60.00GLU A -52.21 -46.93-176.90 -97.51 73.80-113.08ASP A -67.53 -31.31-166.28 -71.55 -15.59ARG -67.67 -37.33-177.99-172.28 128.88 71.93-163.48 -2.00 178.11-178.86THR -80.69 -51.76-160.11 -75.16 88.30 60.00

.

.

.

iD0= 63 S=-4.17722e+02 cnf=[E*A X A*E A*A ]

continued on next page

25

Page 26: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

S1= -508.84 S2= 6.57 S3= -21.48 S4= 87.88 S5= 18.15ARG -135.85 172.28 176.68 -70.25 -75.17 178.02 100.45 -8.88 177.38 176.27ASN -79.56 78.81-165.96-164.48 -6.89-176.57ASN E* 157.99-165.94 -19.20-174.92 -25.74 179.41PHE A -121.09 28.48 154.23 -63.31 95.28LYS X -127.31 -65.29 165.73 -66.31-166.34-158.13-172.00 179.87SER A* 65.72 25.07-178.65 66.97 -64.97ALA E -71.13 107.31-177.67 60.00GLU A* 75.22 -0.98-168.81 -97.51 73.80-113.08ASP A -110.13 -10.90-166.28 -71.55 -15.59ARG -67.67 -37.33-177.99-172.28 128.88 71.93-163.48 -2.00 178.11-178.86THR -80.69 -51.76-160.11 -75.16 88.30 60.00

Local energy minima conformations for the segment backbone. The initial confor-mation is included, associated with index 0. No screen score has been evaluated forthe initial conformation. Other conformations are ordered based on screen score.

SIDE CHAIN GLOBAL ENERGY MINIMIZATION, SCORE= [resthm]

For each backbone conformation, a low-energy side chain conformation is gener-ated as follows. A rotamer lattice dead-end elimination search is used to obtain thecollection of combined rotamer conformations that have lowest energies on thelattice. A better scoring of rotamer combinations is obtained by moving off latticeusing local energy minimization with respect to variable side chain torsions.

iD0= 1CONSTRUCT MECHANICAL SYSTEM

subset name=l5

nZ0 nS0

1 3cQ1 cQ1bb cU1 cJ1 cY1

343 174 169 228 162

cQ2 cQ2bb

35 1

cF1 cG2 cB2 cH1 cG1 cB14152 389 3763 1859 4120 32

cQ3 cX2

92 92

cE0 cC1

2601 0Z0sub

0

iR0 R0aa Q0sub

1 eARG 000000000

2 PRO 0000003 ASP 00000

4 PHE 00000

5 CYS 00000

6 LEU 0000000

7 GLU 000000

continued on next page

26

Page 27: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

8 PRO 000000

9 PRO 000000

10 TYR 000000

11 THR 000000

12 GLY 00013 PRO 000000

14 CYS 00000

15 LYS 00000000

16 ALA 0000

17 ARG 000000000018 ILE 0000000

19 ILE 0000000

20 ARG 0011111110

21 TYR 000000

22 PHE 0000023 TYR 000000

24 ASN 000000

25 ALA 0000

26 LYS 00000000

27 ALA 000028 GLY 000

29 LEU 0000000

30 CYS 00000

31 GLN 0000000

32 THR 00000033 PHE 00000

34 VAL 000000

35 TYR 000000

36 GLY 000

37 GLY 00038 CYS 00000

39 ARG 0000000000

40 ALA 0000

41 LYS 00000000

42 ARG 000000000043 ASN 000000

44 ASN 001110

45 PHE 00110

46 LYS 00111110

47 SER 0011048 ALA 0000

49 GLU 001110

50 ASP 00110

51 CYS 00000

52 MET 000000053 ARG 0011111110

54 THR 001110

55 CYS 00000

56 GLY 000

57 GLY 00058 ALAe 0000

continued on next page

27

Page 28: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

The subset of degrees of freedom consists of side chain torsions for residues ofthe deformable segment and for the 3 side chains that contact the segment. Themethyl side chain of ALA48 is not variable. For this backbone conformation, all sidechains to be searched are contained within a single packing unit. Packing units areassigned based on a clustering of side chain ellipsoids, with a maximum size limita-tion of 12 side chains.

subset name=l5

F

-69.612

Fr Fe Fs Ft

69.110 15.539 0.000 -11.557Fc Fh Fw Fb

0.000 -17.684 -125.019 0.000

cQ2 cQ2bb cF2 cG3 cB3 cH2 cJ2 cY2 cE1 cC1

35 1 4037 372 3665 1853 16 7 2601 0

side chain packing unit= 0

[ 1: 20] ARG[ 1: 44] ASN[ 1: 45] PHE[ 1: 46] LYS[ 1: 47] SER

[ 1: 49] GLU[ 1: 50] ASP[ 1: 53] ARG[ 1: 54] THR

steps taken= 13 rms of gradient= 2.0778e-03

iD3 tot Fr Fe Fs Ft Fb Fh Fm

0 -422.5 39.0 32.0 0.0 -9.1 0.0 -17.5 -466.8

Y3iC2

32 10 2 10 14 5 4 31 1

steps taken= 12 rms of gradient= 1.9565e-03

iD3 tot Fr Fe Fs Ft Fb Fh Fm

1 -423.9 37.3 38.9 0.0 -12.3 0.0 -16.3 -471.6

Y3iC2

32 10 2 10 14 5 4 22 1

steps taken= 7 rms of gradient= 1.7413e-03

iD3 tot Fr Fe Fs Ft Fb Fh Fm

2 -423.2 35.3 51.1 0.0 -14.0 0.0 -15.0 -480.6

Y3iC2

32 10 2 12 14 5 4 17 1

.

.

.

steps taken= 7 rms of gradient= 1.3538e-03

iD3 tot Fr Fe Fs Ft Fb Fh Fm

19 -424.0 35.0 52.4 0.0 -12.8 0.0 -15.5 -483.2Y3iC2

32 10 2 9 13 5 4 22 1

tot=( Fr+Fe+Fs+Ft+Fb+Fh+Fm)

continued on next page

28

Page 29: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

Fg and Fp are constant, and therefore excluded, over subspaces of conformationsrestricted to side chain movement.

For the current backbone conformation and packing unit, the collection of lowest-energy local minima are characterized. Energy components are for end points oflocal minimization trajectories starting from the corresponding combination of ro-tamers.

selected combination of rotamers= 32 10 2 1 5 5 4 24 1

This is the rotamer combination that, following off lattice energy minimization, wasselected to be carried forward. Selection is random, assuming a boltzmann prob-ability distribution over the collection of local minima. The corresponding energy-minimized side chain conformation is added to the current backbone conformation.If there was more than a single packing unit, then search through side chain confor-mations would continue with the next packing unit. In this example, for this back-bone conformation, the algorithm proceeds to the next backbone conformation.

.

.

.CONSTRUCT MECHANICAL SYSTEM

subset name=l5nZ0 nS0

1 3

cQ1 cQ1bb cU1 cJ1 cY1

343 174 169 228 162

cQ2 cQ2bb

57 22

cF1 cG2 cB2 cH1 cG1 cB1

4152 1059 3093 1859 4120 32

cQ3 cX2

107 107cE0 cC1

4379 16

Z0sub

0

iR0 R0aa Q0sub

1 eARG 000000000

2 PRO 000000

3 ASP 00000

4 PHE 00000

5 CYS 000006 LEU 0000000

7 GLU 000000

8 PRO 000000

9 PRO 000000

10 TYR 000000

continued on next page

29

Page 30: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

11 THR 000000

12 GLY 000

13 PRO 000000

14 CYS 00000

15 LYS 0000000016 ALA 0000

17 ARG 0000000000

18 ILE 0000000

19 ILE 0000000

20 ARG 001111111021 TYR 000000

22 PHE 00000

23 TYR 000000

24 ASN 000000

25 ALA 000026 LYS 00000000

27 ALA 0000

28 GLY 000

29 LEU 0000000

30 CYS 0000031 GLN 0000000

32 THR 000000

33 PHE 00000

34 VAL 000000

35 TYR 00000036 GLY 000

37 GLY 000

38 CYS 00000

39 ARG 0000000000

40 ALA 000041 LYS 00000000

42 ARG 0000000000

43 ASN 000000

44 ASN 111111

45 PHE 1111146 LYS 11111111

47 SER 11111

48 ALA 1111

49 GLU 111111

50 ASP 1111151 CYS 00000

52 MET 0000000

53 ARG 0011111110

54 THR 001110

55 CYS 0000056 GLY 000

57 GLY 000

58 ALAe 0000

The subset of degrees of freedom includes both backbone and side chain torsions.

continued on next page

30

Page 31: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

subset name=l5

F

-290.994

Fr Fe Fs Ft

38.749 -162.742 0.000 -9.994Fc Fh Fw Fb

0.000 -30.038 -126.969 0.000

cQ2 cQ2bb cF2 cG3 cB3 cH2 cJ2 cY2 cE1 cC1

57 22 4037 1026 3011 1853 35 25 4379 16BACKBONE +SIDE CHAIN LOCAL ENERGY MINIMIZATION,800 STEPS, SCORE= [restbchm]

iD0= 0

steps taken= 9 rms of gradient= 4.1034e-05

tot Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

-621.6 38.5 -130.8 0.0 -9.9 0.0 0.0 -30.0 -488.7 0.0 -0.9

iD0= 1steps taken= 58 rms of gradient= 4.3950e-04

tot Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

-613.4 39.9 -134.0 0.0 -7.7 0.0 0.0 -30.1 -480.5 0.0 -1.1

iD0= 2

steps taken= 85 rms of gradient= 9.9651e-05

tot Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

-622.3 39.3 -129.4 0.0 -11.8 0.0 0.0 -30.5 -488.9 0.0 -0.9

.

.

.iD0= 63

steps taken=699 rms of gradient= 3.7641e-03

tot Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

-485.5 146.1 -124.2 0.0 6.4 0.0 0.0 -38.7 -475.9 1.8 -1.1

tot=( Fr+Fe+Fs+Ft+Fb+Fc+Fh+Fm+Fg+Fp)

Energy evaluation is at the end point of local minimization on the simplified energysurface [restbchw].

DEFORMATIONS FOR MODE=lpNumber of conformations= 23

iD0= 0 F=-6.22348e+02 cnf=[D E B E A A A ]

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

39.32 -129.45 0.00 -11.84 0.00 0.00 -30.54 -488.93 0.00 -0.90Fps Fss sol Qha Qpa Qps Qss Qtq Qfq

-871.77 376.38 -0.08 5.39 -2.59 -3.09 1.17 -0.10 -0.85ARG -135.85 172.28 176.68 -71.01 -74.01 178.46 100.92 -9.41 177.30 176.79ASN -79.56 78.81-179.06-164.48 -6.89-176.57ASN D -160.89 115.55-170.95-174.99 -25.58 179.11PHE E -129.78 158.03-172.81 -62.62 94.58LYS B -90.04 -13.01-175.78 -68.55 153.65 64.49 151.79 178.66SER E -158.61 157.54 173.86 177.84 172.45ALA A -60.93 -32.12 179.08 60.37GLU A -59.92 -52.59-178.27-158.07 175.29 126.14

continued on next page

31

Page 32: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

ASP A -63.58 -31.73-166.28 -61.27 116.91ARG -67.67 -37.33-177.99-176.58 137.10 173.86 77.82 3.34-179.64 179.92THR -80.69 -51.76-160.11 -76.26 87.48 50.25

iD0= 1 F=-6.21628e+02 cnf=[D E B E A A A ]

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

38.54 -130.78 0.00 -9.87 0.00 0.00 -29.98 -488.65 0.00 -0.89

Fps Fss sol Qha Qpa Qps Qss Qtq Qfq

-872.23 377.31 0.02 5.41 -2.52 -3.09 1.17 -0.10 -0.85ARG -135.85 172.28 176.68 -70.29 -75.10 178.03 100.35 -8.84 177.43 176.25ASN -79.56 78.81-178.78-164.48 -6.89-176.57ASN D -161.51 114.26-169.96-175.04 -25.94 179.17PHE E -128.35 154.74-170.70 -63.46 95.65LYS B -89.01 -11.75-176.70 -66.24-166.32-158.20-172.00 179.86SER E -158.62 166.84-179.19 67.20 -65.73ALA A -69.34 -38.79 179.68 62.76GLU A -59.88 -51.60 179.42 -97.83 73.47-113.14ASP A -53.54 -36.57-166.28 -71.85 -15.50ARG -67.67 -37.33-177.99-172.20 128.67 71.94-163.51 -2.00 178.14-178.83THR -80.69 -51.76-160.11 -75.47 87.96 50.04

iD0= 2 F=-6.20003e+02 cnf=[D E A E A A A ]

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

39.59 -129.78 0.00 -10.36 0.00 0.00 -31.76 -486.83 0.00 -0.87

Fps Fss sol Qha Qpa Qps Qss Qtq Qfq

-866.67 373.14 -0.01 5.40 -2.55 -3.07 1.16 -0.10 -0.85ARG -135.85 172.28 176.68 -69.13 -75.25 178.35 100.24 -8.00 176.72 176.46ASN -79.56 78.81-178.77-164.48 -6.89-176.57ASN D -161.79 113.96-167.96-175.07 -25.62 179.13PHE E -128.30 153.00-175.90 -65.59 97.08LYS A -73.82 -37.29-171.83-135.78 78.20 67.19 50.19 176.50SER E -140.70 171.23 176.51 71.30-144.55ALA A -68.06 -33.98-179.25 62.38GLU A -64.29 -51.40 178.60-156.38-179.66 128.61ASP A -55.45 -35.69-166.28 -70.59 -9.91ARG -67.67 -37.33-177.99-176.04 137.41 173.71 77.97 3.25-179.65 179.92THR -80.69 -51.76-160.11 -75.80 87.41 49.79

.

.

.

iD0= 22 F=-5.91996e+02 cnf=[D E D E C A*A ]

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

46.30 -121.02 0.00 -5.89 0.00 0.00 -29.52 -480.38 0.00 -1.48

Fps Fss sol Qha Qpa Qps Qss Qtq Qfq

-851.71 364.01 0.00 5.44 -2.60 -3.02 1.13 -0.10 -0.85ARG -135.85 172.28 176.68 -67.72 -75.14-179.62 101.70 -10.91 176.57 175.65ASN -79.56 78.81-179.22-164.48 -6.89-176.57ASN D -162.52 113.95-168.16-175.02 -23.57 178.95PHE E -128.58 142.25 162.57 -63.19 97.18LYS D -147.42 97.30 171.44-158.48 73.89-162.02 71.84 157.18SER E -177.85-179.83-176.42-116.45 88.18ALA C -62.61 108.22 174.71 60.01GLU A* 62.43 13.91-165.80-144.52-179.68-111.06ASP A -56.03 -38.08-166.28-171.93 -0.15ARG -67.67 -37.33-177.99-174.78 137.31 174.39 78.01 3.62-179.37 179.98THR -80.69 -51.76-160.11 -76.66 90.77 49.54

continued on next page

32

Page 33: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

For the final collection of low-energy local minima, conformations are characterizedby energy, by sequence of conformational regions, and by torsion angle values forthe set of residues specified in file test/stp/stp.l5.4pti to contain torsion anglesallowed to vary in the search.

The conformation having the 2nd lowest energy is the crystal structure conforma-tion. The conformation having the lowest energy results from side chain placementonto the undeformed backbone. The conformations having the 3rd, 4th, and 5thlowest energies have the same backbone conformation as the crystal structure withdiffering side chain conformations.

dG

535.70

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

39.32 -129.45 0.00 -11.84 0.00 0.00 -30.54 -488.93 0.00 -0.90

Gunf Sunf Qsol

-792.00 2147.47 -0.08

dG=(G−Gu) is an estimate of free energy of folding that is not meaningful for asingle sequence.

Gunf and Sunf are the free energy and entropy, respectively, of an Ising model rep-resentation of the unfolded state.

In a search through sequence space, ddG=( dG(sequence1) −dG(sequence0)) en-ables meaningful comparison of stability between different sequences.

dG is calculated as the free energy of the folded state ( cpp(Fr+Fe+Fs+Ft+Fb+Fc+Fg+Fp) +chFh+cpsFps+cssFss) minus the free energy of theunfolded state ( Gunf), where { cpp, ch, cps, css} are coefficients optimized such thatcalculated ddG matches observed ddG over a large dataset of experimental mea-surements.

COMPACT ANALYSIS

energy minima in sequence-structure space

iM1 Fr Fe Fs Ft Fb Fh Fm Fg Fp Fc

0 39.32 -129.45 0.00 -11.84 0.00 -30.54 -488.93 0.00 -0.90 0.00

iM1 G -Gu Gu Qsol

0 535.70 -792.00 -0.08

iM1 sequence substitutions

20 43 44 45 46 47 48 49 50 53 54

0 ARG ASN ASN PHE LYS SER ALA GLU ASP ARG THR

The collection of most stable sequences found in order of stability.

In this example, the lowest-energy conformation found has the backbone conforma-tion of the crystal structure.

33

Page 34: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

Using a 4.0GHz Intel Core i7 processor, application of global energy minimization tothis 7-residue surface loop requires computation time of about 13.5 hours.

As a second example, input file test/stp/stp.c2.1pga directs a search through se-quence space. At 3 residue positions { 5_LEU,34_ALA,43_TRP}, each of which con-tributes to the hydrophobic core of 1pga, a small set of alternative side chains is sub-stituted. Other files needed as input for this example were created previously in theexample test case for the ereg command.

% estp test 1pga 06 c2

File test/stp/stp.c2.1pga specifies 10 positions that contain torsion angles vari-able in the search. Backbone is variable but not searched for residue positions{ 5, 7,34,43,54}. Because backbone variabilty includes the ω torsion of the previousresidue, in addition to φ and ψ of the residue marked as variable, residue positions{ 4, 6,33,42,53} must also be included as residues containing 1 or more torsions thatwill vary with conformational search. The side chain is searched for residue positions{ 5, 7,34,43,54}.

Automated expansion of the search subspace, to include all side chains thatcan contact a side chain initially marked as searchable, adds 8 side chains{12,16,30,31,33,39,40,52} to the searchable collection. To avoid automated expan-sion of the search subspace, a name must be chosen for the subset of degrees of free-dom such that the 1st character is either ’l’ or ’h’.

The collection of most stable sequences found in the search is listed, in order of sta-bility, at the end of the diagnostic output file test/dgn/estp.1pga.06.c2. For themost stable sequence found in the search through sequence space, the correspondinglowest-energy conformation found in the search through conformation space is output,in pdb format, to file test/car/MOL.g000.pdb.

In this example, the native sequence is found to be the most stable of the collectionexamined.

In the following abbreviated listing, a vertical line in column 1 indicates a line that hasbeen added for the purpose of description.

file=test/dgn/estp.1pga.06.c2

Diagnostic file created by the command:% estp test 1pga 06 c2

continued on next page

34

Page 35: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

Input file test/stp/stp.c2.1pga directs search through 64 sequences. At 3 posi-tions that contribute to the hydrophobic core { 5 LEU, 34 ALA, 43 TRP}, alternativehydrophobic side chains are substituted.

.

.

.

For each sequence, the program outputs a description of the search through con-formational space. The lowest-energy conformation found is used to evaluate dG.Relative stabilty is calculated as ddG, the change in dG with sequence.

COMPACT ANALYSIS

energy minima in sequence-structure space

iM1 Fr Fe Fs Ft Fb Fh Fm Fg Fp Fc

0 18.19 -10.73 0.00 4.18 0.00 -119.21 -370.14 0.60 -1.97 0.01

1 35.57 10.67 0.00 0.16 0.00 -124.34 -372.72 0.60 -1.95 0.012 50.10 -1.21 0.00 4.54 0.00 -143.77 -373.77 0.60 -1.97 0.02

3 33.71 23.68 0.00 0.14 0.00 -114.98 -374.57 0.60 -1.95 0.01

4 23.84 3.76 0.00 3.67 0.00 -117.52 -370.90 0.60 -1.98 0.03

.

.

.

63 27.68 3.79 0.00 5.61 0.00 -88.68 -378.29 0.60 -1.97 0.01

Energy components for sequences ordered by stability.

In rare instances, evaluation of Fm fails due to occurance of a “cycle of edges error”in calculation of a face of the molecular surface. In such cases, Fm is set to zero.

iM1 G -Gu Gu Qsol

0 704.01 -911.32 0.04

1 706.49 -903.55 -0.02

2 707.35 -911.87 -0.03

3 708.23 -896.87 -0.024 708.82 -907.74 0.03

.

.

.

63 721.49 -906.53 -0.00

dG=(G−Gu) is an estimate of free energy of folding that is not meaningful for asingle sequence.

Gu is the free energy of an Ising model representation of the unfolded state.

In a search through sequence space, ddG=( dG(sequence1) −dG(sequence0)) en-ables meaningful comparison of stability between different sequences.

continued on next page

35

Page 36: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

dG is calculated as the free energy of the folded state ( cpp(Fr+Fe+Fs+Ft+Fb+Fc+Fg+Fp) +chFh+cpsFps+cssFss) minus the free energy of theunfolded state ( Gu), where { cpp, ch, cps, css} are coefficients optimized such thatcalculated ddG matches observed ddG over a large dataset of experimental mea-surements.

iM1 sequence substitutions

4 5 6 7 12 16 30 31 33 34 39 40 42 43 52 53 54

0 LYS LEU ILE LEU LEU THR PHE LYS TYR ALA VAL ASP GLU TRP PHE THR VAL

1 LYS LEU ILE LEU LEU THR PHE LYS TYR MET VAL ASP GLU PHE PHE THR VAL

2 LYS LEU ILE LEU LEU THR PHE LYS TYR ILE VAL ASP GLU TRP PHE THR VAL3 LYS LEU ILE LEU LEU THR PHE LYS TYR MET VAL ASP GLU MET PHE THR VAL

4 LYS MET ILE LEU LEU THR PHE LYS TYR ALA VAL ASP GLU TRP PHE THR VAL

.

.

.63 LYS VAL ILE LEU LEU THR PHE LYS TYR ALA VAL ASP GLU LEU PHE THR VAL

The collection of most stable sequences found in order of stability.

Input file test/stp/stp.c2.1pga specifies 10 positions that contain torsion anglesvariable in the search.

Backbone is variable but not searched for residue positions { 5, 7,34,43,54}. Be-cause backbone variabilty includes the ω torsion of the previous residue in addi-tion to the φ and ψ torsions of the residue specified as variable, residue positions{ 4, 6,33,42,53} must also be included as residues containing 1 or more torsionsthat will vary with conformational search.

The side chain is searched for residue positions { 5, 7,34,43,54}. Automated ex-pansion of the search subspace, to include side chains that can contact the sidechains specified as searchable, adds 8 side chains {12 LEU,16 THR,30 PHE,31LYS,33 TYR,39 VAL,40 ASP,52 PHE}.

To avoid automated expansion of the search subspace, choose a name for the sub-set of degrees of freedom such that the 1st character is ’l’ or ’h’.

6.4 prof

FUNCTIONALITYstructure quality assessment

SYNTAXprof FAM MOL CNF

36

Page 37: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

INPUT FILES OUTPUT FILES/FAM/seq/seq.MOL /FAM/dgn/prof.MOL.CNF/FAM/tor/tor.MOL.CNF /FAM/car/dprof.MOL.CNF/FAM/car/MOL.CNF.pdb

SUMMARY

1) inputs a sequence, file FAM/seq/seq.MOL

2) inputs a conformation in 2 alternative formats: torsion angle format in fileFAM/tor/tor.MOL.CNF, and pdb format in file FAM/car/MOL.CNF.pdb

3) identifies defects in structures, associates with these defects differential free ener-gies of folding, and accumulates a profile along the sequence of defect energy density

4) outputs the defect energy density profile in file FAM/car/dprof.MOL.CNF

5) outputs, in file FAM/dgn/prof.MOL.CNF, additional characterization of identified de-fects

NOTES

Structure quality assessment identifies chain segments likely to be improved by appli-cations of molecular mechanics-based structure prediction.

A common use of the prof command is in preparation for the hlog command, whichuses defect energy densities of template structures in deciding the best alignment ofa target sequence to a group of structurally aligned templates. For each template, thehlog command inputs, along with the geometry regularized structure, the defect en-ergy density profile. Creation of these input files requires execution of the prof com-mand following the greg command for each template structure.

The prof command does not access the energy surface and requires only a few sec-onds of computation time.

EXAMPLE TEST CASE

To execute the command, open a window to directory src/str, and type the followingline to the linux prompt.% prof test 1pga 07

Files needed as input for this command were created previously in the example testcase for the ereg command.

Because the pdb entry 1pga (resolution=2.07Å) should be free of significant defects,profile test/car/dprof.1pga.07 provides an example of a baseline level of defect

37

Page 38: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

energy density that can be expected for a good structure.

In the following abbreviated listing, a vertical line in column 1 indicates a line that hasbeen added for the purpose of description.

file=test/dgn/prof.1pga.07

Diagnostic file created by the command:% prof test 1pga 07

OVERLAP

.

.

.DISALLOWED (PHI,PSI,OMG,CHI)

.

.

.EXPOSED HYDROPHOBIC SURFACE

.

.

.CAVITY

.

.

.BURIED CHARGE

.

.

.UNPAIRED HBOND DONORS AND ACCEPTORS

.

.

.IMPROBABLE SECONDARY STRUCTURE

.

.

.OUTLIER STATISTICAL CONTACT

.

.

.ELECTROSTATIC

.

.

.

Output consists of a listing of defects. For each defect, a mapping is included toresidues of the polymer chains.

The prof command assigns energies to defects. By partitioning these energies ontothe contributing residues, the command accumulates a defect energy density alongthe chains.

continued on next page

38

Page 39: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

The calculated energy density is output as a compact profile in filetest/car/dprof.PRO.CNF.

6.5 rcyc

FUNCTIONALITYenergy refinement of a homology model

SYNTAXrcyc FAM MOL CNF

INPUT FILES OUTPUT FILES/FAM/seq/seq.MOL /FAM/dgn/rcyc.MOL/FAM/tor/tor.MOL.CNF /FAM/stp/stp.g???.MOL

/FAM/car/MOL.g???.pdb/FAM/tor/tor.MOL.g???

SUMMARY

1) inputs a sequence, FAM/seq/seq.MOL

2) inputs an initial conformation in torsion angle format, FAM/tor/tor.MOL.CNF

3) generates a collection of 7-residue segments chosen such that the union spans theentire chain (or chains) and, using notation SUB=g??? where ? is a digit, outputs tofiles FAM/stp/stp.g???.MOL the corresponding subsets of degrees of freedom to besearched

4) cycles through this collection of segments performing a limited conformationalsearch with respect to each subset of degrees of freedom

5) outputs a sequence of conformations in 2 alternative formats, files FAM/car/MOL.g???.pdband FAM/tor/tor.MOL.g???, one for each subset of degrees of freedom, whereCNF=g??? is the lowest energy conformation obtained in the search with respect toSUB=g???

6) outputs, in file FAM/dgn/rcyc.MOL, a diagnostic summary of the execution of thecommand

39

Page 40: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

EXAMPLE TEST CASE

The 20 residue trp cage mini protein is an unusual example of a short, uncrosslinkedpolypeptide chain that folds to a single stable conformation. An NMR structure of thefolded state is available in pdb entry 2jof. This system, because the space of con-formations can be searched extensively, provides an easily interpreted test of energyfunctions and conformational search procedures.

In this example, following local energy minimization of the experimental structure, thercyc command is used to search the space of conformations in the neighborhood ofthe experimental structure, thereby establishing a more meaningful energy for the na-tive state.

To execute the command, open a window to directory src/str, and type the followinglines to the linux prompt.% ereg test 2jof% rcyc test 2jof 06% cp ../../test/car/2jof.g004.pdb ../../test/exp/2jof_r.pdb% ereg test 2jof_r

The initial command, directing local energy minimization of the NMR structure, estab-lishes an energy for the experimental conformation. The rcyc command first creates 5subsets of torsion angle degrees of freedom, SUB={ g000, g001, g002, g003, g004},then cycles through these subsets, accomplishing a limited conformational searchwith respect to each subset.

Following energy refinement of the experimental structure, the final conformation(MOL=2jof, CNF=g004) is renamed using the more descriptive MOL=2jof_r. The finalcommand, directing local energy minimization with respect to the full set of torsionangle degrees of freedom, establishes an energy for the region of the experimentalstructure. We note that, because the subspace searches exclude energy terms that donot change with respect to the subset of degrees of freedom, the partial energies as-sociated with a subset of degrees of freedom are not comparable between subsets, orto a full energy associated with the full set of torsion angle degrees of freedom. Theenergy of the refined structure is more relevant than the energy of the experimentalstructure as a point of reference in searches of the conformational space for regionshaving lower energy than the native conformation. Existence of such regions wouldindicate errors remaining in the energy functions.

From file test/dgn/ereg.2jof, the total energy of the energy minimized NMR struc-ture, test/car/2jof.06.pdb, is -384.16 kcal/mol. From file test/dgn/ereg.2jof_r,the total energy of the energy refined native conformation, test/car/2jof_r.06.pdb,is -395.88 kcal/mol.

The rcyc command accesses the energy surface. Using a 4.0GHz Intel Core i7 proces-

40

Page 41: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

sor, refinement search for the 20 residue mini protein 2jof requires about 32 minutesof computation time.

As a third example using the estp command, we complete 2 steps of a trajectory oflocal minima intended to accomplish global energy minimization for the trp cage miniprotein.

The sequence of the mini protein test/seq/seq.2jof was created by the initialereg command of the current test case. A collection of 15 search subspaces of formtest/stp/stp.SUB.2jof have been created by hand.

As an initial conformation, test/car/2jof.s000.pdb will be created here by repro-ducing the test case for the utility command initcar. The command generates anoften-poor chain conformation by assigning to each residue 1 of 3 possible backboneconformations chosen for consistency with a secondary structure prediction.

% initcar test 2jof i% cp ../../test/car/2jof.i.pdb ../../test/exp/2jof_i.pdb% ereg test 2jof_i% cp ../../test/tor/tor.2jof_i.06 ../../test/tor/tor.2jof.s000% cp ../../test/car/2jof_i.06.pdb ../../test/car/2jof.s000.pdb

From file test/dgn/ereg.2jof_i, the total energy of the initial conformation(MOL=2jof, CNF=s000) is -340.07 kcal/mol, higher by 55 kcal/mol than the energy ofthe native conformation.

The following sequence of commands generates the trajectory.% estp test 2jof s000 h07c% cp ../../test/tor/tor.2jof.s000_h07c_000 ../../test/tor/tor.2jof.s001% cp ../../test/car/2jof.s000_h07c_000.pdb ../../test/car/2jof.s001.pdb% estp test 2jof s001 h07f% cp ../../test/tor/tor.2jof.s001_h07f_000 ../../test/tor/tor.2jof.s002% cp ../../test/car/2jof.s001_h07f_000.pdb ../../test/car/2jof.s002.pdb

Energies for the local minima produced by these initial steps of the trajec-tory are -378.25 and -386.01 kcal/mol. These energies, along with a resolu-tion into components, are written to the associated diagnostic output filestest/dgn/estp.2jof.s000.h07c and test/dgn/estp.2jof.s001.h07f. A full searchof the space of conformations for this mini protein requires further computation tocontinue the trajectory.

In addition to demonstrating more fully the utility of the estp command, this exam-ple shows the process of setting stp.SUB.MOL files to enable conformational searcheswith respect to moderately large subsets of torsion angle degrees of freedom. On par-allel computer systems, a strategy found to be effective is parallel search of all 15subspaces distributed, one subspace per node, over 15 nodes. The resulting collec-

41

Page 42: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

tions of local minima are combined, and the step taken is to the local minimum havinglowest energy.

In this example, each of the 15 stp.SUB.MOL files specifies that all torsion angles arevariable. As a consequence, because no energy contributions are excluded in con-traction of the mechanical system, energies can be meaningfully compared betweensearches, and with energies evaluated using the ereg command. In contrast, for thecollection of stp.SUB.MOL files that is generated by the rcyc command, each filespecifies a different subset of the full set of torsion angles. In such cases, because dif-ferent subsets of degrees of freedom exclude different energy contributions, energiescan not be meaningfully compared between searches.

file=test/dgn/estp.2jof.s000.h07c

Diagnostic file created by the command:% estp test 2jof s000 h07c

.

.

.

The above sections of output are analagous to those generated by the test casefor the estp command in which a search through the conformation space of a sin-gle segment was directed by file test/stp/stp.l5.4pti. In this current example,a search through a more complex space of conformations consisting of 2 adja-cent segments is directed by file test/stp/stp.h07c.2jof. The output becomeslengthy, roughly proportional to the size of the subset of degrees of freedom that isbeing searched. The result is a collection of local energy minima characterized byenergy (with a partitioning into components) and torsion angle coordinates.

DEFORMATIONS FOR MODE=lpNumber of conformations= 29

iD0= 0 F=-3.78254e+02 cnf=[C B A D A F A ][A B A*F F F A*G F ]

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

169.76 -404.40 0.00 -16.32 0.00 34.40 -25.19 -137.98 0.60 0.88Fps Fss sol Qha Qpa Qps Qss Qtq Qfq

-226.82 83.11 0.26 2.28 -1.15 -0.80 0.26 -0.00 -0.32eASP -56.67 126.55-176.55 -49.32 125.46ALA C -86.13 77.52 178.47 63.50TYR B -69.18 -7.66 167.73-142.67 -73.54 17.94ALA A -50.82 -39.93-174.60 58.64GLN D -117.79 39.20-176.23 -55.57 161.12 -41.02-175.55TRP A -69.10 -19.37 173.09 73.67 -68.89LEU F -107.60 150.61-175.24 -60.63 175.94 59.04 65.88LYS A -67.67 -14.80-177.60 66.48 179.24 -58.81-165.21-179.56ASP A -70.77 -15.72-178.66 -57.79 113.70GLY B -90.53 -4.45 179.83GLY A* 64.81 60.46 177.69PRO F -66.65 136.62 168.23 17.32 -19.77 13.79SER F -77.41 177.58 179.72 -66.43 -17.11

continued on next page

42

Page 43: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

SER F -61.23 126.70 172.36-179.28 157.11GLY A* 64.29 36.78 177.35ARG G -94.96 -57.32-166.07-100.74 -77.20 178.64 103.37 -8.19 175.73 179.56PRO F -83.18 158.79 174.93 27.47 -22.14 7.76PRO A -64.59 -49.52 167.55 3.14 -2.98 2.26PRO F -83.22 113.93-178.41 27.19 -22.08 7.92SERe -131.11 124.09 173.42-166.52

iD0= 1 F=-3.76514e+02 cnf=[A B A X A C X ][F B*F*A E F A*E F ]

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

150.34 -395.89 0.00 5.07 0.00 25.03 -22.56 -140.25 1.20 0.53Fps Fss sol Qha Qpa Qps Qss Qtq Qfq

-230.34 84.23 0.18 2.26 -1.20 -0.82 0.26 -0.00 -0.32eASP -59.79 143.16 179.57-178.20-140.20ALA A -52.91 -45.36-168.13 60.76TYR B -90.89 15.03 174.54 -49.33 -76.38 -1.58ALA A -75.48 -27.28-174.99 59.28GLN X -159.45 -51.32-171.15-157.24 173.27-122.51-177.18TRP A -63.71 -57.68-178.57 161.50 42.70LEU C -96.29 49.50-174.56 -60.16 178.94 66.33 68.04LYS X -123.00 -68.36-174.56 179.54 144.74 176.27 164.49-172.79ASP F -68.89 118.13-178.05-169.41 -53.01GLY B* 109.11 -26.06-173.85GLY F* 99.61 127.97 172.88PRO A -66.15 -17.73 173.23 16.42 -19.09 13.59SER E -162.07 167.89-177.97-136.65 70.34SER F -61.34 127.84 175.73-179.31 156.29GLY A* 64.81 40.25-173.94ARG E -135.70 141.70 163.89-158.30-178.50-159.92-165.12 -2.56-179.03-179.97PRO F -83.18 177.36 171.35 26.88 -21.77 7.71PRO F -83.05 117.37-179.89 26.61 -21.55 7.62PRO F -71.22 116.61 175.13 20.28 -20.69 12.25SERe -152.90 124.96 174.01 179.12

.

.

.iD0= 28 F=-3.47299e+02 cnf=[A B A B G C A ][E E*D F C D*F X*F ]

Fr Fe Fs Ft Fc Fb Fh Fm Fg Fp

175.27 -393.42 0.00 -6.53 0.00 32.85 -22.35 -134.35 0.60 0.64

Fps Fss sol Qha Qpa Qps Qss Qtq Qfq

-222.02 82.33 0.24 2.26 -1.16 -0.79 0.26 -0.00 -0.32eASP -77.76 149.50 174.93-153.95-165.58ALA A -51.74 -41.62-170.23 59.29TYR B -95.60 15.97-173.03 -63.43 -68.59 -3.01ALA A -67.52 -25.85-179.80 52.61GLN B -74.25 -8.99-178.94-149.68 176.37-115.20-174.05TRP G -112.97 -46.03-168.05 146.98 43.39LEU C -84.33 69.15 166.91 -66.82 168.21 63.34 65.46LYS A -67.18 -33.30 174.81-177.51 138.80 177.56 170.67-178.31ASP E -133.88 131.88-175.46-168.21 -53.42GLY E* 140.84-154.71 175.76GLY D -121.02 85.03 157.36PRO F -83.19 128.13 178.66 26.83 -21.82 7.82SER C -100.82 69.10 173.55-170.42-177.60SER D* 179.12 111.01-167.46-110.25 79.88GLY F -58.62 123.15-176.52

continued on next page

43

Page 44: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

ARG X* 175.23 -42.01 169.16-164.30-135.48-178.53 164.85 -0.79 179.21 179.96PRO F -83.19 175.31-176.90 27.08 -21.93 7.78PRO A -62.20 -31.74 151.42 11.82 -17.20 15.35PRO C -66.00 104.72-178.36 4.27 -3.37 1.70SERe -126.86 123.65 177.11 125.38

Because this example includes no search through sequence space, the final section,the collection of most stable sequences found in order of stability, is not meaning-ful.

.

.

.

NOTES

A more complete search of the region of the experimental structure of 2jof locateda conformation having an energy of -398.67 kcal/mol. As an addtional point of refer-ence, this conformation is included in file test/exp/2jof_n.pdb.

6.6 hlog

FUNCTIONALITYhomology model building

SYNTAXhlog FAM MOL GRP

INPUT FILES OUTPUT FILES/FAM/arg/exptstructs.GRP /FAM/dgn/hlog.MOL.GRP/FAM/seq/seq.MOL /FAM/car/TEM.GRP.pdb/FAM/seq/seq.TEM /FAM/tor/tor.MOL.GRP/FAM/tor/tor.TEM.01 /FAM/car/MOL.GRP.pdb/FAM/car/TEM.01.pdb /FAM/car/dprof.MOL.GRP/FAM/car/dprof.TEM.01 /FAM/stp/stp.hot.MOL

SUMMARY

1) inputs a target sequence, file FAM/seq/seq.MOL

2) inputs a group of templates, file FAM/arg/exptstructs.GRP, to be used in buildingthe homology model

44

Page 45: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

3) for each template in group GRP, inputs a sequence (file FAM/seq/seq.TEM), 2 alter-native specifications of the geometry-regularized structure (files FAM/car/TEM.01.pdband FAM/tor/tor.TEM.01), and the defect energy density profile of the geometry-regularized structure (file FAM/car/dprof.TEM.01)

4) aligns all template structures contained in group GRP and, considering also defectenergy densities along the alignment, partitions the alignment into structurally con-served and non-conserved regions

5) aligns the target sequence to the multiple structure alignment, selecting, withineach region, the template that best matches the target

6) constructs a homology model by transferring coordinates for atoms aligned to atemplate structure, and by generating coordinates for unaligned residues and for sub-stituted side chains

7) for each template in group GRP, outputs, in pdb format, file FAM/car/TEM.GRP.pdb,the geometry-regularized structure of the template applying the translation and rota-tion obtained from multiple structural alignment of the group of templates

8) outputs 2 alternative specifications of the homology model structure, filesFAM/car/MOL.GRP.pdb and FAM/tor/tor.MOL.GRP, and the corresponding defect en-ergy density profile, file FAM/car/dprof.MOL.GRP

9) outputs, in file FAM/stp/stp.hot.MOL, specification of a subset of degrees of free-dom, in the format required for input to the estp command, chosen such that con-formational search with respect to this subset is likely to reduce defects in the modelstructure

10) outputs, in file FAM/dgn/hlog.MOL.GRP, a diagnostic summary of the execution ofthe command

NOTES

Homology model building leads to one of the major applications of molecularmechanics-based structure prediction, structure prediction of surface loops for whichknowledge-based structure prediction may not be reliable.

At the expense of added preparation, geometry regulation of templates and subse-quent defect energy density calculation have been separated from the hlog com-mand. Effective use of the command often requires some experimentation and feed-back to select a group of templates that align well structurally. By separating geome-try regulation and defect energy density calculation, these computations can be per-formed only once for an entire family of templates.

45

Page 46: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

The hlog command does not access the energy surface and requires only a few min-utes of computation time.

EXAMPLE TEST CASE

In this example, a homology model is constructed for the sh3 domain of the humangene sequence grb2 (growth factor bound protein 2). Although the set of experimen-tal structures for this family of gene sequences exceeds 60, we limit here the group oftemplates to 6 structures: SCOP domains 1ckaA_, 1bbzA_, 1cskA_, 1semA_, 1oebA_,and 1jo8A_. We assign to this group the name sparse. For comparison, a partialstructure exists for this target sequence, SCOP domain 1griA1 determined by x-raycrystallography at 3.1A resolution.

File test/arg/exptstructs.sparse, a listing of the templates for GRP=sparse,was created by hand editing. Files test/exp/TEM.pdb, where TEM is the 6-character domain name, were created by copying and editing the corresponding 4-character pdb entry to extract the domain homologous to the target sequence. Filetest/seq/seq.grb2 was created, starting from the SCOP definition of the domainwhich includes a 1-letter code specification of the amino acid sequence, using theutility command seqformat. In preparation for the seqformat command, input filetest/exp/seq.grb2, the target residue sequence in 1-letter code format, was createdby hand editing.

To execute the command, open a window to directory src/str, and type the followinglines to the linux prompt.% seqformat test grb2% greg test 1ckaA_

% greg test 1bbzA_

% greg test 1cskA_

% greg test 1semA_

% greg test 1oebA_

% greg test 1jo8A_

% prof test 1ckaA_ 01% prof test 1bbzA_ 01% prof test 1cskA_ 01% prof test 1semA_ 01% prof test 1oebA_ 01% prof test 1jo8A_ 01% hlog test grb2 sparse

For each template TEM, input files test/seq/seq.TEM, test/tor/tor.TEM.01,test/car/TEM.01.pdb, and test/car/dprof.TEM.01 are created by executing thegreg command followed by the prof command.

46

Page 47: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

Graphical viewing of the homology model, file test/car/grb2.sparse.pdb, showsa conformation with many stabilizing hydrophobic and H-bond interactions, but alsowith some destabilizing atom pair overlaps localized to segment 40-45, and with somequestionable conformations of charged side chains. Supplementing the user’s eye,the defect energy density profile, file test/car/dprof.grb2.sparse, identifies a peakover residues 40-45 originating from overlaps, disallowed (φ, ψ) values, and unpairedHydrogen bond donors and acceptors.

The subset of torsion angle degrees of freedom SUB=hot, specified in filetest/stp/stp.hot.grb2, was selected by the program as that most likely to reducedefect energy. Alternatively, a stp.SUB.MOL file can be created by hand and confor-mational search with respect to this subset directed by the estp command.

The diagnostic file contains the multiple structure alignment and the alignment of thetarget sequence to the multiple structure alignment. In the following abbreviated list-ing, a vertical line in column 1 indicates a line that has been added for the purpose ofdescription.

file=test/dgn/hlog.grb2.sparse

Diagnostic file created by the command:% hlog test grb2 sparse

INPUT TEMPLATE STRUCTURESALIGN TEMPLATE STRUCTURESstructure-structure alignment of 1ckaA_ to 1oebA_

C-alpha RMSD of aligned substructures= 1.35number of residues aligned= 44

GLU A 135 ARG A 2TYR A 136 TRP A 3VAL A 137 ALA A 4

.

.

.GLU A 188 ALA A 54structure-structure alignment of 1bbzA_ to 1oebA_

C-alpha RMSD of aligned substructures= 1.25number of residues aligned= 50

.

.

.structure-structure alignment of 1cskA_ to 1oebA_

C-alpha RMSD of aligned substructures= 1.63

number of residues aligned= 45

.

.

.structure-structure alignment of 1semA_ to 1oebA_

continued on next page

47

Page 48: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

C-alpha RMSD of aligned substructures= 1.01number of residues aligned= 53

.

.

.structure-structure alignment of 1jo8A_ to 1oebA_

C-alpha RMSD of aligned substructures= 0.93number of residues aligned= 50

.

.

.

A multiple structure alignment is generated for the templates of group GRP=sparse.The longest template is identified to be 1oebA_. For all remaining templates, thestructure is aligned to the structure of the longest template. If one or a few tem-plates fail to align over a large fraction of their length, then a better model will likelybe obtained by reducing the size of the group of templates.

Number of Aligned Structures= 6Number of Alignment Elements= 2Alignment Element= 1 Length= 261ckaA_ ( 1: 2- 27) EYVRALFDFNGNDEEDLPFKKGDILR1bbzA_ ( 1: 1- 26) NLFVALYDFVASGDNTLSITKGEKLR

1cskA_ ( 1: 2- 27) TECIAKYNFHGTAEQDLPFCKGDVLT1semA_ ( 1: 3- 28) KFVQALFDFNPQESGELAFKRGDVIT1oebA_ ( 1: 6- 31) RWARALYDFEALEEDELGFRSGEVVE1jo8A_ ( 1: 1- 26) PWATAEYDYDAAEDNELTFVENDKIIAlignment Element= 2 Length= 13

1ckaA_ ( 1: 43- 55) EGKRGMIPVPYVE1bbzA_ ( 1: 42- 54) KNGQGWVPSNYIT1cskA_ ( 1: 44- 56) VGREGIIPANYVQ1semA_ ( 1: 43- 55) NNRRGIFPSNYVC1oebA_ ( 1: 46- 58) HNKLGLFPANYVA

1jo8A_ ( 1: 43- 55) DGSKGLFPSNYVS

Regions are identified for which structure is conserved over all templates. In theabove listing, structurally conserved regions are referred to as alignment elements.

ALIGN TARGET SEQUENCE TO TEMPLATE STRUCTURESAlignment of Target Sequence to Family of Templatesnumber of templates= 6target sequence=grb2%identical= 44.64 template sequence=1cskA_

-MEAIAKYDF KATADDELSF KRGDILKVLN EECDQNWYKA E-LNGKDGFI PKNYIEMK

GTECIAKYNF HGTAEQDLPF CKGDVLTIVA VTKDPNWYKA KNKVGREGII PANYVQKR

######### ########## ####### ####### ######

.

.

.target sequence=grb2%identical= 33.93 template sequence=1bbzA_

MEAIAKYDFK ATADDELSFK RGDILKVLNE ECDQNWYKAE LNGKDGFIPK NYIEMK-

NLFVALYDFV ASGDNTLSIT KGEKLRVLGY NHNGEWCEAQ TKNGQGWVPS NYITPVNS

continued on next page

48

Page 49: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

########## ########## ###### ######### ####

The target sequence grb2 is aligned to each template of the group. A pound charac-ter below the alignment indicates the above template residue is contained within astructurally conserved region.

Structurally Conserved Element= 0, Length= 26grb2 ( 1: 1- 26) MEAIAKYDFKATADDELSFKRGDILK1cskA_ ( 1: 2- 27) TECIAKYNFHGTAEQDLPFCKGDVLTPreceding Structurally Conserved Element= 1, Length= 15grb2 ( 1: 27- 41) VLNEECDQNWYKAEL

1bbzA_ ( 1: 27- 41) VLGYNHNGEWCEAQTStructurally Conserved Element= 1, Length= 13grb2 ( 1: 42- 54) NGKDGFIPKNYIE1cskA_ ( 1: 44- 56) VGREGIIPANYVQFollowing Structurally Conserved Element= 1, Length= 2

grb2 ( 1: 55- 56) MK1cskA_ ( 1: 57- 58) KR

For each region, structurally conserved and intervening, a template is selected tooptimize alignment with the target sequence.

Segments Targeted for Backbone Structure Generation

For each target residue aligned to a template, backbone coordinates are copiedfrom the aligned template residue. If insertions or deletions occur, constraints ofchain connectivity and polypeptide geometry require backbone structure gener-ation, including some searching of conformations, for segments of the chain sur-rounding an insertion or deletion. This section of output specifies those segmentsfor which backbone structure has been generated as opposed to transferred from atemplate. In this example, there are no insertions or deletions.

BUILD TARGET HOMOLOGY MODELHomology Substitution pR0min= 1 pR0max= 56

For each target side chain, coordinates are copied from the aligned residue of thetemplate for all atoms of the target side chain for which a homologous atom of thetemplate side chain exists. When mismatches occur for which some atoms of thetarget side chain have no homologous atom in the aligned template side chain,coordinates are generated using on-lattice rotamer dead end elimination search.

For program design utility, set names consist of a 2-character (capitalletter+number) combination. Set name definitions can be found in filesrc/str/dimensions. For example:

Z0= chains in system of molecules

R0= residues in chains

continued on next page

49

Page 50: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

The following set name definitions are relevant to side chain placement:

R6= residues with variable side chains in homology model building

Q6= side chain torsions variable in rotamers

V6= side chain rotamers consistent with substitution

P6= incomplete side chain heavy atoms variable in homology model building

W6= packing units for side chain placement in homology model building

J6= residues of a packing unit in homology model building

nR6= 27

iR6= 0 R6Z0= 1 R6R0= 1 aa=eMET R6cQ6=3 R6cV6=105

iR6= 1 R6Z0= 1 R6R0= 10 aa=LYS R6cQ6=3 R6cV6= 27

iR6= 2 R6Z0= 1 R6R0= 14 aa=ASP R6cQ6=1 R6cV6= 3

.

.

.

iR6= 26 R6Z0= 1 R6R0= 56 aa=LYSe R6cQ6=2 R6cV6= 9

Characterization of incomplete side chains. Note that Q6 (the side chain torsionsvariable in rotamers) and V6 (the side chain rotamers consistent with substitution)are reduced when some of the target side chain atoms have been fixed by homol-ogy.

For example, target residue 10 Lysine is aligned to a template Histidine. Here, co-ordinates for the {Cβ,Cγ} atoms are copied from the template. This fixes the χ1

torsion, leaving only 3 torsions {χ2,χ3,χ4} variable. Also, coordinates remain unde-fined for only 3 heavy atoms {Cδ,Cε,Nζ}, and rotamers consistent with the value ofχ1 transferred from the template are reduced from 81 to 27.

iR6= 0 R6Z0= 1 R6R0= 1 aa=eMET R6cV6=105 V6cP6= 3

iR6= 1 R6Z0= 1 R6R0= 10 aa=LYS R6cV6= 27 V6cP6= 3

iR6= 2 R6Z0= 1 R6R0= 14 aa=ASP R6cV6= 3 V6cP6= 2

.

.

.

iR6= 26 R6Z0= 1 R6R0= 56 aa=LYSe R6cV6= 9 V6cP6= 2

Further characterization of incomplete side chains.

nW6= 6

iW6= 0 W6nJ6= 6

iW6= 0 iJ6= 0 W6J6R6= 0 [ 1: 1] eMET

iW6= 0 iJ6= 1 W6J6R6= 9 [ 1: 26] LYS

iW6= 0 iJ6= 2 W6J6R6= 16 [ 1: 37] TYR

continued on next page

50

Page 51: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

iW6= 0 iJ6= 3 W6J6R6= 25 [ 1: 55] MET

iW6= 0 iJ6= 4 W6J6R6= 26 [ 1: 56] LYSe

iW6= 0 iJ6= 5 W6J6R6= 8 [ 1: 24] ILE

iW6= 1 W6nJ6= 7

iW6= 1 iJ6= 0 W6J6R6= 1 [ 1: 10] LYS

iW6= 1 iJ6= 1 W6J6R6= 5 [ 1: 18] SER

iW6= 1 iJ6= 2 W6J6R6= 6 [ 1: 20] LYS

iW6= 1 iJ6= 3 W6J6R6= 7 [ 1: 21] ARG

iW6= 1 iJ6= 4 W6J6R6= 18 [ 1: 41] LEU

iW6= 1 iJ6= 5 W6J6R6= 19 [ 1: 42] ASN

iW6= 1 iJ6= 6 W6J6R6= 20 [ 1: 44] LYS

.

.

.

iW6= 5 W6nJ6= 1iW6= 5 iJ6= 0 W6J6R6= 21 [ 1: 45] ASP

The partitioning of incomplete side chains into packing units.

iW6= 0 nJ6= 6

iJ6= 0 iZ0= 1 iR0= 1 aa=eMET J6V6pck= 62

iJ6= 1 iZ0= 1 iR0= 26 aa=LYS J6V6pck= 0

iJ6= 2 iZ0= 1 iR0= 37 aa=TYR J6V6pck= 9

iJ6= 3 iZ0= 1 iR0= 55 aa=MET J6V6pck= 6

iJ6= 4 iZ0= 1 iR0= 56 aa=LYSe J6V6pck= 0

iJ6= 5 iZ0= 1 iR0= 24 aa=ILE J6V6pck= 2

iW6= 1 nJ6= 7

iJ6= 0 iZ0= 1 iR0= 10 aa=LYS J6V6pck= 0

iJ6= 1 iZ0= 1 iR0= 18 aa=SER J6V6pck= 0

iJ6= 2 iZ0= 1 iR0= 20 aa=LYS J6V6pck= 3

iJ6= 3 iZ0= 1 iR0= 21 aa=ARG J6V6pck= 5

iJ6= 4 iZ0= 1 iR0= 41 aa=LEU J6V6pck= 20

iJ6= 5 iZ0= 1 iR0= 42 aa=ASN J6V6pck= 5

iJ6= 6 iZ0= 1 iR0= 44 aa=LYS J6V6pck= 5

.

.

.

iW6= 5 nJ6= 1

iJ6= 0 iZ0= 1 iR0= 45 aa=ASP J6V6pck= 0

For each incomplete side chain of each packing unit, a listing of the rotamer se-lected.

EVALUATE DEFECT PROFILE

.

.

.

A listing of defects for the model structure.

As a second example using the rcyc command, we perform 1 cycle of energy refine-ment to the homology model characterized by MOL=grb2 and CNF=sparse. The follow-

51

Page 52: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

ing command executes the search.

% rcyc test grb2 sparse% cp ../../test/car/grb2.g013.pdb ../../test/car/grb2.hlog.pdb

The copy command assigns a more meaningful name to the final energy refined con-formation.

Visual comparison of the energy refined conformation, test/car/grb2.hlog.pdb,with the initial homology derived conformation, test/car/grb2.sparse.pdb, showsthat the quality is much improved by the refinement. Although changes to the struc-ture are localized, the packing of the hydrophobic core, the distribution of ionizedgroups on the protein surface, secondary structure elements, and the number of Hy-drogen bonds all move noticably toward patterns seen in native protein structures.

As an alternative to the rcyc command,%% estp test grb2 sparse hotuses energy-based structure prediction to patch the initial homology model. The al-tered prompt is meant to indicate that the above command is not a part of this testcase.

6.7 igor

FUNCTIONALITYfold recognition

SYNTAXigor FAM MOL

INPUT FILES OUTPUT FILES/FAM/exp/seq.MOL /FAM/dgn/igor.MOL

/FAM/seq/seq.MOL/FAM/tor/tor.MOL.CNF/FAM/car/MOL.CNF.pdb/FAM/car/MOL_folds.pdb

SUMMARY

1) inputs a target sequence in 1-letter code format, file FAM/exp/seq.MOL

2) threads the low-resolution igor model onto structures of SCOP domains, generatinga sample of structures consistent with the target sequence

3) outputs, in file FAM/seq/seq.MOL, a specification of residue sequence and disulfide

52

Page 53: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

crosslinks

4) for a collection of homology model conformations, using CNF set equal to the nameof the SCOP domain template, outputs 2 alternative specifications of the model struc-ture, files FAM/car/MOL.CNF.pdb and FAM/tor/tor.MOL.CNF

5) outputs, in file FAM/dgn/igor.MOL, a diagnostic summary of the execution of thecommand

6) outputs, in pdb-format file FAM/car/MOL_folds.pdb, Cα trace structures for all ofthe homology models of the collection, with each model structure translated such thatthe centers of mass form a 2 dimensional grid.

NOTES

For a polypeptide chain, the igor model consists of a paritioning of the continuousspace of conformations into a discrete set of chain states, and an energy function de-fined over the set of chain states. For each residue of a chain, the space of conforma-tions is partitioned into 3 states {H,E,C}, abbreviations for the elements of secondarystructure {helix, extended, coil} to which the residue can contribute. The set of chainstates is the set of mappings from the set of residues into {H,E,C}. Letting n be thenumber of residues, the number of chain states is 3n.

The igor model, described in detail in publications available on the company website,is a generalization of a residue Ising model. The residue Ising model is short range inthe sense that residue i interacts directly only with neighboring residues in the range{i-5,...,i+5}. The igor model extends the residue Ising model to include interactions ofintermediate range. Unlike the residue Ising model, for which integration over chainstates has a fast analytic solution, the igor model requires use of a trajectory searchover the space of element compositions to accomplish integration over chain states.For each element composition, integration over the subset of chain states consis-tent with the element composition has a fast analytic solution. The trajectory searchthrough the space of element compositions enables isolation of the collection of ele-ment compositions that contribute the largest statistical weights to the integration.

The igor command generates a sample of conformations for a single protein chain.These conformations are obtained as homology models based on a collection of SCOPdomain templates. The collection of templates is determined by screening the largeSCOP40 set of domains for hits, measured as compatibility of the SCOP domain struc-ture with the igor model of the protein chain. Other uses of the igor model include pre-diction of residue state probabilities, and generation of a sample of conformations fora segment of a protein chain.

The resolution of the igor model is too low to form the basis of a sensitive method forremote homolog detection. While not competitive for remote homolog detection, the

53

Page 54: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

igor command is sometimes useful in cases for which no remote homolog can be de-tected using psi-blast and more sensitive methods. Another application is for generat-ing an initial conformation for a polypeptide sequence in preparation for testing capa-bilities of the energy function and conformational search procedure.

The order of creation of model structures corresponds to decreasing values of the igormodel score when aligned to template. To clarify, the igor model has highest scorewhen aligned to the best template. The igor model scores associated with hits to SCOPdomains are included in the diagnostic summary.

File FAM/car/MOL_folds.pdb is useful for viewing the range of folds which producehits for consistency with the igor model.

An error message (ERROR: No backbone conf found for gap segment. Targetsequence incompatible with SCOP domain template. Continuing to nextSCOP domain template.) is printed when homology model building fails. This type offailure, caused by a large insertion or deletion in the alignment of the target sequenceto a SCOP domain template, occurs commonly. Failure to build models for question-able templates has little impact on the conformation sampling functionality of theigor command. In the context of the igor command, this condition should more prop-erly be ignored as opposed to being flagged as an ERROR. The ERROR notification re-flects use of the same homology model building code in the context of the hlog com-mand, where failure to build a model represents a true loss of functionality.

EXAMPLE TEST CASE

In this example, MOL=T0215 refers to target sequence T0215 from the casp6 (criticalassessment of structure prediction) blinded prediction experiment. The protein has53 residues. For comparison, a structure for this sequence, determined by NMR spec-troscopy, is included in file test/exp/1x9b.pdb. File seq.T0215, the residue sequencein 1-letter code format, has been entered into directory test/exp.

To execute the command, open a window to directory src/str, and type the followingline to the linux prompt.% igor test T0215

The igor command does not access the energy surface but can, for long targetchains, be computationally intensive. Using a 4.0GHz Intel Core i7 processor, foldrecognition for the 53 residue protein T0215 requires computation time of about 18minutes.

In the following abbreviated listing, a vertical line in column 1 indicates a line that hasbeen added for the purpose of description.

54

Page 55: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

file=test/dgn/igor.T0215

Diagnostic file created by the command:% igor test T0215

CALCULATE SECONDARY STRUCTURE PROBABILITY PROFILEIGOR SAMPLE

pZ0 pR0min pR0max

1 1 53

pZ0 is the chain index, and (pR0min,pR0max) are the (start,end) residue indexes forthe chain segment being considered.

For the remainder of this diagnostic file, residue indexes are internal to this seg-ment such that indexes 0 and 52 refer to the first and last residue of the segment.

________

CALCULATE RESIDUE ISING MODEL RESIDUE PROBABILITIESResiduesRNLSDRAKFESMINSPSKSVFVRNLNELEALAVRLGKSYRIQLDQAKEKWKVKStates Predicted by Residue Ising Model

CCCCHHHHHHHHHCCCCHHHHHHHHHHHHHHHHHHCCCHEECHHHHHHHHCCC

iR0 aa (H,E,C) probability (H,E,C) energy

0 ARG C 0.0591 0.0074 0.9335 -1.73 -3.81 1.03

1 ASN C 0.0441 0.0206 0.9353 -2.02 -2.78 1.03

2 LEU C 0.2198 0.0294 0.7508 -0.42 -2.43 0.813 SER C 0.2324 0.0250 0.7426 -0.36 -2.59 0.80

.

.

.

52 LYS C 0.0089 0.0017 0.9894 -3.63 -5.25 1.09

The igor model is a generalization of a residue Ising model. For the residue Isingmodel, integration over chain states has a fast analytic solution.

The above listing shows residue state probabilities calculated using a residue Isingmodel. This profile of residue state probabilities is used as input to a screen of alarge set of SCOP domains, in which SCOP domains are excluded when the fold ofthe domain is not compatible with the target sequence.

Model energies are defined as ln(probability) plus a constant. Because this defini-tion differs from physical energy by a factor of -kT, model energy and its compo-nents are also referred to as scores.

________

EXCLUDE SCOP DOMAINS INCOMPATIBLE WITH RESIDUE ISING MODEL

scop40: passed= 204

scopdomain scopfamily respacksco res%id

1bl0A2 a.4.1.8 0.76 8.00

continued on next page

55

Page 56: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

1e7lA1 a.140.4.1 0.76 12.50

1j75A_ a.4.5.19 0.76 18.37

1efuB3 a.5.2.2 0.75 7.69

.

.

.

1iufA1 a.4.1.7 0.32 3.77

A set of 5529 SCOP domains is screened for compatibility with the residue Isingmodel. A collection of 204 hits is retained. This collection will be reordered basedon threading of the igor model onto the domain structures.

________

TRAJECTORY SEARCH THROUGH SPACE OF ELEMENT COMPOSITIONS WITH NO TEMPLATE

SCORE SCO0 SCO1 SCO1b SCO2 SCO2b SCO3

31.44 13.86 6.31 6.58 3.02 1.68 0.00

sequence of most probable residue states obtained using residue ising model=

CCCCHHHHHHHHHCCCCHHHHHHHHHHHHHHHHHHCCCHEECHHHHHHHHCCC

chain state of most probable element composition=CCCCHHHHHHHHHHCCCHHHHHHHHHHHHHHHHHHCCCCCCCHHHHHHHHHHC

The igor model requires use of a trajectory search over element compositions toaccomplish integration over chain states. For each element composition, integra-tion over the subset of chain states consistent with the element composition has afast analytic solution. The trajectory search through the space of element composi-tions enables isolation of the collection of element compositions that contribute thelargest statistical weights to the integration.

The above listing shows the most probable state of the most probable elementcomposition. The score for this most probable composition is 31.44. SCO3, a contri-bution attributed to long range interactions, is evaluated by threading of the igormodel onto the fold of a template. Because no template was used here, SCO3 iszero.

________

TRAJECTORY SEARCH THROUGH SPACE OF ELEMENT COMPOSITIONS WITH TEMPLATE

template=1bl0A2 scopfamily=a.4.1.8

SCORE SCO0 SCO1 SCO1b SCO2 SCO2b SCO3

51.25 11.41 -5.24 9.21 3.69 3.69 28.48chain state of most probable element composition=CCCCHHHHHHHHHCCCCHHHHHHHCCCHHHHHHHHHHHHCCCHHHHHCCCCCC

sequence of elements alignment:

element type and predicted state for most probable element composition=pvpvuidqe

CHCECHCHC

element type and observed state for template=tipvdideiCHCHCHCHC

.

continued on next page

56

Page 57: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

.

.

template=1iufA1 scopfamily=a.4.1.7

SCORE SCO0 SCO1 SCO1b SCO2 SCO2b SCO3

30.20 10.55 -7.43 9.21 5.27 2.57 10.04chain state of most probable element composition=CCHHHHHHHHHHHHCCCCCCCHHHHHHHHHHHCCCCCCHHHHHHCHHHHHHHC

sequence of elements alignment:

element type and predicted state for most probable element composition=ufvieetetCHCHCECHC

element type and observed state for template=iiqihvtvrCHCHCHCHC

For each template in the collection of hits, the igor model was threaded onto thefold of the template. Note that the final template scores worse than than the modelwith no template, thus indicating a fold incompatible with the target sequence.

For the collection of top scoring templates, the alignments obtained here, of targetto template, are input to homology model building to generate the sample of targetchain conformations that constitutes the functionality of the igor command. Theoutput of the following sections, describing the integration over element composi-tions to obtain improved residue state probabilities, is used currently by the profcommand and the initcnf command.

________

PROFILE OF POSITION SPECIFIC IMPULSES ASSOCIATED LONG RANGE PATTERNS#

scopdomain scopfamily iQ7 SCORE SCO0 SCO1 SCO1b SCO2 SCO2b SCO3

1jeqA1 a.140.2.1 4 53.18 13.46 4.45 6.58 3.68 1.50 23.53

1akhA_ a.4.1.1 14 52.74 13.93 2.95 6.58 4.37 1.05 23.87

1bl0A2 a.4.1.8 0 51.25 11.41 -5.24 9.21 3.69 3.69 28.48

1qbjA_ a.4.5.19 5 50.73 7.48 -8.22 10.52 5.25 1.59 34.10

.

.

.

1sfe_1 a.4.2.1 187 3.06 7.58 -12.90 11.84 6.96 2.32 -12.73#residue ising model CCCCHHHHHHHHHCCCCHHHHHHHHHHHHHHHHHHCCCHEECHHHHHHHHCCC1jeqA1 a.140.2.1 4 CCCCHHHHHHHHHHCCCCCCCCCCHHHHHHHHHHHCCCCCCCHHHHHHHHHHC

000000000000000000000000222222222221111111000000000021akhA_ a.4.1.1 14 CCHHHHHHHHHHHHCCCCCCCCCCHHHHHHHHHHHCCCHHHHHHHHHHHHHHC

00000000000000AAAAAAAAAA333333333330000000000000000031bl0A2 a.4.1.8 0 CCCCHHHHHHHHHCCCCHHHHHHHCCCHHHHHHHHHHHHCCCHHHHHCCCCCC

DDDD00000000000000000000AAA44444444444433333333CCCCCC1qbjA_ a.4.5.19 5 HHHHHHHHHHHHHHCCCHHHHHHHHCHHHHHHHHHHHHCCEEEECCCCCEEEC

11111111111111III33333333W222222222222006666000002222

.

.

.

continued on next page

57

Page 58: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

1sfe_1 a.4.2.1 187 CCHHHHHHCHHHHHCCCHHHHHHHHHHCCCCCCCEECCHHHHHHHHHHHHCCC

00GGGGGG*AAAAAGGG2222222222BBBBBBB33QQ000000000000000

The collection of templates is reordered based on SCORE. iQ7 is the index of thetemplate before reordering.

In the lower half of the above listing, the string of characters shown below thestring of residue states is a representation of the energy density of SCO3, the longrange packing energy of the igor model threaded onto the fold of the template. Thestring of characters,

ZYXWVUTSRQPONMLKJIHGFEDCBA0123456789abcdefghijklmnopqrstuvwxyz,

shows the 1-character code progression from very unfavorable to very favorable,with the zero of energy density corresponding to character 0.

At this point, an attempt is made to incorporate favorable long range contribu-tions to energy, identified in the threading of the igor model to templates, into theoriginal igor model (without templates). To accomplish this goal, position specificresidue and element impulses are added to the igor model. These position specificimpulses are applicable only to the igor model constructed for the current targetsequence, and would not be useful for any other target sequence.

________

CALCULATE RESIDUE ISING MODEL RESIDUE PROBABILITIESResiduesRNLSDRAKFESMINSPSKSVFVRNLNELEALAVRLGKSYRIQLDQAKEKWKVKStates Predicted by Residue Ising Model

CCCCHHHHHHHHHHCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCC

iR0 aa (H,E,C) probability (H,E,C) energy

0 ARG C 0.0649 0.0069 0.9283 -1.64 -3.88 1.02

1 ASN C 0.0543 0.0156 0.9301 -1.82 -3.06 1.03

2 LEU C 0.3674 0.0216 0.6110 0.10 -2.74 0.613 SER C 0.3823 0.0178 0.5999 0.14 -2.93 0.59

.

.

.

52 LYS C 0.0099 0.0016 0.9885 -3.52 -5.35 1.09

Following addition of position specific impulses to the residue Ising model, residuestate probabilities are recalculated.

________

TRAJECTORY SEARCH THROUGH SPACE OF ELEMENT COMPOSITIONS WITH NO TEMPLATE

SCORE SCO0 SCO1 SCO1b SCO2 SCO2b SCO3

37.27 13.06 4.63 6.58 13.01 0.00 0.00

sequence of most probable residue states obtained using residue ising model=

CCCCHHHHHHHHHHCCCHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHHCCC

chain state of most probable element composition=CCHHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHHHHHHCCCHHHHHHHHHHC

continued on next page

58

Page 59: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

Following addition of position specific impulses to the igor model, the most proba-ble state of the most probable element composition, is recalculated. Note that thescore has changed from the previous calculation.

________

PARTIAL INTEGRATION OVER ELEMENT COMPOSITIONS WITHIN LOCAL REGIONS

region=( 0, 41)

free en <SCO > <SCO0> <SCO1> <SCO1b> <SCO2> <SCO2b> #comps

region state

3.08 38.46 13.58 3.95 6.58 14.35 0.00 35CCCCHHHHHHHHHHCCCCCCCCCCHHHHHHHHHHHCCCCCCC pvvie CHCHC

0.36 36.77 13.14 4.05 6.58 13.01 0.00 9CCHHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHHHHHHCCC uvvid CHCHC

-0.75 35.24 13.51 3.19 6.58 11.86 0.11 26CCHHHHHHHHHHCCCCCHHHHHHHHHHHHHHHHHHCCCCCCC uvvie CHCHC

-1.00 34.74 11.54 3.43 9.21 10.48 0.09 19CCCCHHHHHHHHHHCCCCCEEECCHHHHHHHHHHHHHHHCCC pvqduid CHCECHC

.

.

.

-8.76 28.34 10.35 1.38 9.21 6.62 0.79 1CCHHHHHHHHHHHCCCCHHHHHHHHHHCCHHHHHHHHHHCCC ufpiuid CHCHCHC

region=( 12, 52)

free en <SCO > <SCO0> <SCO1> <SCO1b> <SCO2> <SCO2b> #comps

region state

0.94 36.43 13.25 3.59 6.58 13.01 0.00 80CCCCCHHHHHHHHHHHHHHHHHHHHHHCCCHHHHHHHHHHC vidit CHCHC

-0.07 35.17 13.63 3.00 6.58 11.85 0.11 103CCCCCHHHHHHHHHHHHHHHCCCCCCCCCCHHHHHHHHHHC vieit CHCHC

-0.55 34.93 13.93 2.25 6.58 12.18 0.00 63CCCCCHHHHHHHHHHHHHHHCCCCHHHHHHHHHHHHHHCCC vidid CHCHC

-3.42 31.36 13.61 0.66 6.58 10.48 0.03 146CCCCCHHHHHHHHHHHHHHHCCCCHHHHHHHHHHHCCCCCC vidie CHCHC

.

.

.

-9.17 27.43 11.02 -2.09 9.21 6.33 2.96 2CCCCCEEEEECCHHHHHHHHHHHCCCHHHHHHHHHHHHCCC vvuipid CECHCHC

A search through the space of element compositions identifies a collection of ele-ment compositions having the largest statistical weights. Bacause the number ofelement compositions grows exponentially with chain length, search through thespace of element compositions is accomplished as a sequence of searches, witheach search focused on a localized region of the chain. The regions overlap, andthe union of the set of regions spans the chain. For the current example, 2 regionsare required: ( 0,41) and (12,52).

Residue state probabilities are calculated separately for each region, by integra-tion over the collection of most stable element compositions associated with thatregion, then merged to give full chain residue state probabilities.

continued on next page

59

Page 60: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

________

RESIDUE PROBABILITIES FROM LOCAL INTEGRATIONResiduesRNLSDRAKFESMINSPSKSVFVRNLNELEALAVRLGKSYRIQLDQAKEKWKVKStates Predicted by Integration Over Element Compositions

CCCCHHHHHHHHHHCCCCCCCCCHHHHHHHHHHHHCCCCCCHHHHHHHHHHHC

iR0 aa (H,E,C) probability

0 ARG C 0.0004 0.0004 0.9992

1 ASN C 0.0004 0.0004 0.9992

2 LEU C 0.1124 0.0010 0.88663 SER C 0.1137 0.0010 0.8853

.

.

.

52 LYS C 0.0002 0.0000 0.9998

The above residue state probabilities are a best estimate obtained by integrationover element compositions.

________

GENERATION OF C-ALPHA CHAINS

scopdomain scopfamily iQ7 frac of target

1jeqA1 a.140.2.1 4 0.887

1akhA_ a.4.1.1 14 0.8871bl0A2 a.4.1.8 0 0.943

1qbjA_ a.4.5.19 5 1.000

.

.

.1cipA2 c.37.1.8 152 0.547

Output file test/car/T0215_folds.pdb contains a visual summary of the foldscompatible with the igor model of the target sequence. Cα trace structures of thehomology models based on the highest-scoring templates are positioned on a 2-dimensional grid.

For each template, the above listing shows the fraction of the target aligned to thetemplate.

INPUT SCOP DOMAIN TEMPLATE STRUCTUREALIGN TARGET SEQUENCE TO TEMPLATE STRUCTUREBUILD TARGET HOMOLOGY MODELEVALUATE DEFECT PROFILE

.

.

.

Comparison of the model structures with the NMR structure, pdb entry 1x9b, showsthat the model ranked second in the sample, and built from template 1akhA_ repre-senting SCOP family a.4.1.1, approximates the native conformation in secondarystructure and topology. Use of this model as a starting point for conformational search

60

Page 61: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

would be expected to be productive.

6.8 ionstate

FUNCTIONALITYcalculate pKa of ionizable groups

SYNTAXionstate FAM MOL

INPUT FILES OUTPUT FILES/FAM/exp/MOL.pdb /FAM/dgn/ionstate.MOL

/FAM/dgn/ionstate_titration.MOL/FAM/seq/seq.MOL???/FAM/tor/tor.MOL???.ph/FAM/car/MOL???.ph.pdb

SUMMARY

1) inputs a pdb-format file, FAM/exp/MOL.pdb, most often a structure prepared by theereg command and copied from directory FAM/car

2) regularizes geometry, meaning bond lengths, bond angles, and some torsion anglesare adjusted to standard values with minimal movement of atom coordinates

3) iterates over a range of pH values

4) for each pH, calculates a probability distribution over the set of ionization states

5) for each ionizable group, calculates pKa

6) for each pH ??.? (where ? is a digit of pH expressed to 1 decimal place), outputs(in file FAM/seq/seq.MOL???) residue sequence and disulfide crosslinks for the mostprobable ionization state at pH=??.?

7) for each pH ??.?, outputs (using CNF=ph) 2 alternative specifications (in pdb andtorsion angle formats) of the most probable conformation for the sequence of the mostprobable ionization state at pH=??.?

8) outputs a diagnostic summary of the execution of the command in 2 files:FAM/dgn/ionstate.MOL and FAM/dgn/ionstate_titration.MOL

NOTES

61

Page 62: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

A common use of the ionstate command is to calculate the most probable ionizationstate for a specified pH, and to generate a most probable conformation consistent withthis ionization state. This usage enables subsequent modeling of the most probablesequence, including protonation state, for a given pH.

The outer loop of the command is a titration over a wide range of pH values. For eachpH value, search through the combinatorial space of whole-protein ionization statesis used to isolate the collection of ionization states having the lowest free energies.For each whole-protein ionization state x, the free energy dG(x,pH) is calculated as asum of free energies for 2 component processes: g(x,pH), the free energy required tocreate x in the absence of the protein environment, and dF(x), the energy contributedby the protein environment to stabilization of x. For each ionizable group, the titrationdata is used to calculate pKa in the environment of the protein.

g(x,pH) is calculated as a sum over the subset of ionizable groups such that the ion-ization state in x differs from the ionization state observed for a model peptide at thecurrent pH. The function being integrated is the experimental free energy required tochange the state of the ionizable group in a model peptide at the current pH. dF(x) iscalculated as the model energy of ionization state x minus the model energy of thefully protonated reference state minus the sum, over ionizable groups deprotonatedin x, of the change in model energy for the process of deprotonation for the ionizablegroup in a model peptide. dG(x,pH) is calculated as g(x,pH) plus h(dF(x)), where thefunction h reduces differences in dF between ionization states. The damping functionh() is included to account for relaxation of structure with respect to degrees of free-dom held fixed in the model. Details of the calculation are described in a publicationavailable on the company website.

EXAMPLE TEST CASE

File 1lni.pdb, a crystal structure of a 96-residue protein, ribonuclease from strepto-myces aureofaciens, has been entered into directory test/exp. To best model pro-tonation of the carboxyl groups of ASP and GLU the original pdb file was modified asfollows. The structure was viewed graphically. For each ASP (or GLU) residue, coordi-nates for the pair of atoms {OD1,OD2} of ASP (or {OE1,OE2} of GLU) are optionally ex-changed such that the OD2 of ASP (or OE2 of GLU) is the atom of the pair more freelyaccessible to addition of a proton.

To execute the command, open a window to directory src/str, and type the followingline to the linux prompt.% ereg test 1lni% cp ../../test/car/1lni.07.pdb ../../test/exp/x1lni.pdb% ionstate test x1lni

The ionstate command accesses the energy surface. Using a 4.0GHz Intel Core i7processor, titration of the 96 residue protein 1lni requires computation time of about

62

Page 63: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

42 hours.

The output diagnostic file /FAM/dgn/ionstate.MOL contains information that, whileuseful for algorithm development, is best ignored for application usage of the com-mand. File /FAM/dgn/ionstate_titration.MOL contains a compact summary of thetitration.

In the following abbreviated listing, a vertical line in column 1 indicates a line that hasbeen added for the purpose of description.

file=test/dgn/ionstate_titration.x1lni

Diagnostic file created by the command:% ionstate test x1lni

pKa OF IONIZABLE GROUPS

res pept prot

1 eASP [ 7.50] 8.55

1 eASP [ 4.00] 2.95

14 GLU [ 4.40] 5.20

17 ASP [ 4.00] 3.35

25 ASP [ 4.00] 4.05

30 TYR [ 9.60] 10.35

33 ASP [ 4.00] 1.80

41 GLU [ 4.40] 3.85

49 TYR [ 9.60] 9.80

51 TYR [ 9.60] 11.70

52 TYR [ 9.60] 12.50

53 HIS [ 6.30] 7.45

54 GLU [ 4.40] 3.00

55 TYR [ 9.60] 10.90

74 GLU [ 4.40] 3.70

78 GLU [ 4.40] 3.20

79 ASZ [ 4.00] 7.25

80 TYR [ 9.60] 12.25

81 TYR [ 9.60] 11.00

84 ASP [ 4.00] 3.60

85 HIS [ 6.30] 6.55

86 TYR [ 9.60] 11.05

93 ASP [ 4.00] 3.50

96 CYSe [ 3.80] 2.80

For each ionizable group, the above listing shows pKa calculated in the environ-ment of the protein. For comparison, the value in brackets is experimental for thefunctional group in a peptide.

continued on next page

63

Page 64: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

TITRATION

pH DDEDDYDEYYYHEYEEDYYDHYDC dG dF g p pp h ps ss

0.00 000000000000000000000000 -22.40 -0.07 0.00 0.992 -116.55 -83.13 -393.11 150.41

0.10 000000000000000000000000 -22.40 -0.07 0.00 0.990 -116.55 -83.13 -393.11 150.41

0.20 000000000000000000000000 -22.40 -0.07 0.00 0.988 -116.55 -83.13 -393.11 150.410.30 000000000000000000000000 -22.40 -0.07 0.00 0.985 -116.55 -83.13 -393.11 150.41

.

.

.

3.10 000000100000000000000001 -22.82 -19.70 2.18 0.117 -185.21 -83.36 -314.82 116.733.20 010000100000100000000001 -23.28 -27.70 4.64 0.096 -241.60 -83.41 -234.64 81.55

3.30 010000100000100000000001 -23.82 -27.70 4.09 0.067 -241.60 -83.41 -234.64 81.55

3.40 010100100000100100000011 -24.70 -31.90 6.55 0.064 -244.16 -83.12 -240.48 80.48

3.50 010100100000100100000011 -25.66 -31.90 5.59 0.059 -244.16 -83.12 -240.48 80.48

3.60 010100100000101100010011 -26.88 -33.80 6.28 0.069 -237.44 -83.44 -257.41 85.703.70 010100100000101100010011 -28.10 -33.80 5.05 0.080 -237.44 -83.44 -257.41 85.70

3.80 010100100000101100010011 -29.33 -33.80 3.82 0.078 -237.44 -83.44 -257.41 85.70

.

.

.7.10 011110110000101100011011 -26.82 -32.79 5.32 0.441 -192.78 -83.37 -334.27 116.49

7.20 011110110000101100011011 -26.55 -32.79 5.59 0.398 -192.78 -83.37 -334.27 116.49

7.30 011110110000101100011011 -26.27 -32.79 5.87 0.343 -192.78 -83.37 -334.27 116.49

7.40 011110110001101110011011 -26.03 -23.95 0.00 0.298 -114.92 -83.44 -449.48 164.16

7.50 011110110001101110011011 -26.03 -23.95 0.00 0.371 -114.92 -83.44 -449.48 164.167.60 011110110001101110011011 -25.89 -23.95 0.14 0.442 -114.92 -83.44 -449.48 164.16

7.70 011110110001101110011011 -25.76 -23.95 0.27 0.504 -114.92 -83.44 -449.48 164.16

7.80 011110110001101110011011 -25.62 -23.95 0.41 0.552 -114.92 -83.44 -449.48 164.16

.

.

.

10.10 111110111001101110011011 -19.56 -17.77 4.77 0.119 -38.82 -83.35 -580.12 218.39

10.20 111110111001101110011011 -18.61 -17.77 5.73 0.093 -38.82 -83.35 -580.12 218.39

10.30 111111111001101110011011 -17.78 -13.44 5.73 0.088 -11.88 -83.68 -634.35 240.84

10.40 111111111001101110011111 -17.04 -6.57 5.46 0.095 33.79 -83.74 -715.93 274.1910.50 111111111001101110011111 -16.36 -6.57 6.14 0.108 33.79 -83.74 -715.93 274.19

10.60 111111111001101110011111 -15.68 -6.57 6.82 0.111 33.79 -83.74 -715.93 274.19

10.70 111111111001111110111011 -15.51 2.41 6.00 0.212 95.09 -83.52 -821.72 317.95

10.80 111111111001111110011111 -14.99 2.10 6.55 0.205 100.12 -83.69 -829.51 320.58

.

.

.

The 1st column specifies the pH.

For each pH of the titration, the 2nd column specifies the ionization state having thelowest free energy. The header of this column is a string of 1-letter code names forthe set of ionizable groups listed in the previous section. The binary choice of {0,1}is used to indicate {protonated,unprotonated}.

The remaining columns characterize the most-probable ionization state (specified incolumn 2). Column headings use the following notation.

continued on next page

64

Page 65: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

dG is the calculated free energy.

dF, calculated as f(ionstate) −f(fully protonated) −df(peptide), is the model energyof the ionization state minus the model energy of the fully protonated referencestate minus the sum, over deprotonated ionizable groups, of the change in modelenergy for the process of deprotonation for the ionizable group in a model peptide.

p is the probability.

g is a sum (over the subset of ionizable groups such that the ionization state differsform the ionization state observed for a model peptide at the current pH) of theexperimental free energy required to change the state of the ionizable group in amodel peptide.

More conceptually, g is the free energy required to create the whole protein ion-ization state in the absence of the protein environment, and dF is the energy con-tributed by the protein environment to stabilization of this ionization state. dG isobtained by a merging of dF and g.

pp=( Fr+Fe+Fs+Ft+Fb+Fg+Fp), the interaction energy of the protein with itself.

h= Fh, the hydrophobic component of the hydration free energy.

ps= Fps, the electrostatic interaction energy of the protein with the boundary sur-face.

ss= Fss, the electrostatic interaction energy of the boundary surface with itself.

As shown in the above section, 7 of the 12 ASP and GLU side chains change proto-nation state in the pH range ( 3.1, 3.8).

pH -DEDD-DE---E-EED-D-D-3.10 -0000-10---0-000-0-0-3.20 -1000-10---1-000-0-0-3.30 -1000-10---1-000-0-0-3.40 -1010-10---1-010-0-1-3.50 -1010-10---1-010-0-1-3.60 -1010-10---1-110-1-1-3.70 -1010-10---1-110-1-1-3.80 -1010-10---1-110-1-1-

continued on next page

65

Page 66: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

In the pH range ( 7.1, 7.8), HIS 53 and ASP 79 change protonation state.pH -------H---D-----7.10 -------0---0-----7.20 -------0---0-----7.30 -------0---0-----7.40 -------1---1-----7.50 -------1---1-----7.60 -------1---1-----7.70 -------1---1-----7.80 -------1---1-----

In the pH range (10.1,10.8), 4 of the 8 TYR side chains change protonation state.pH ---Y-YYY-Y--YY-Y-

10.10 ---0-100-0--00-0-10.20 ---0-100-0--00-0-10.30 ---1-100-0--00-0-10.40 ---1-100-0--00-1-10.50 ---1-100-0--00-1-10.60 ---1-100-0--00-1-10.70 ---1-100-1--01-0-10.80 ---1-100-1--00-1-

An unusual property of this protein is the observed protonation of an ASP residue (ASP79) at pH 7.0. Consistent with this observed protonation, the calculated pKa value forAsp 79 is 7.25. In Table II, calculated values of pKa are compared to experimental val-ues for all ionizable groups of 1lni. Also included in Table II are experimental values ofpKa obtained for the ionizable group in a model peptide.

66

Page 67: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

Table II. Comparison of calculation to experiment for pKavalues of ionizable groupsof 1lni (a crystal structure of Ribonuclease SA).

pKa

residue pepta exptb calcc

1 eASP [ 7.50] 9.14 8.55

1 eASP [ 4.00] 3.44 2.95

14 GLU [ 4.40] 5.02 5.20

17 ASP [ 4.00] 3.72 3.35

25 ASP [ 4.00] 4.87 4.05

30 TYR [ 9.60] 11.30 10.35

33 ASP [ 4.00] 2.39 1.80

41 GLU [ 4.40] 4.14 3.85

49 TYR [ 9.60] 10.60 9.80

51 TYR [ 9.60] 11.50 11.70

52 TYR [ 9.60] 11.50 12.50

53 HIS [ 6.30] 8.27 7.45

54 GLU [ 4.40] 3.42 3.00

55 TYR [ 9.60] 11.50 10.90

74 GLU [ 4.40] 3.47 3.70

78 GLU [ 4.40] 3.13 3.20

79 ASZ [ 4.00] 7.37 7.25

80 TYR [ 9.60] 11.50 12.25

81 TYR [ 9.60] 11.50 11.00

84 ASP [ 4.00] 3.01 3.60

85 HIS [ 6.30] 6.35 6.55

86 TYR [ 9.60] 11.50 11.05

93 ASP [ 4.00] 3.09 3.50

96 CYSe [ 3.80] 2.42 2.80

aExperimental value for the ionizable group in a model peptide.bExperimental value in the protein.cCalculated value in the protein.

6.9 ptra

FUNCTIONALITYglobal search on the restbch_m energy surface

67

Page 68: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

SYNTAXptra FAM MOL CNF SUB

INPUT FILES OUTPUT FILES/FAM/seq/seq.MOL /FAM/dgn/ptra.MOL.CNF.SUB/FAM/tor/tor.MOL.CNF /FAM/car/MOL.g???.pdb/FAM/stp/stp.SUB.MOL /FAM/tor/tor.MOL.g???

SUMMARY

1) inputs a sequence, FAM/seq/seq.MOL

2) inputs an initial conformation in torsion angle format, FAM/tor/tor.MOL.CNF

3) inputs a subset of degrees of freedom, FAM/stp/stp.SUB.MOL, for which variationwill be allowed

4) generates a trajectory of local minima on the restbch_m energy surface

5) at each step of the trajectory, generates a neighboring local minimum, with a biastoward lowering the electrostatic (Fe+F_m) and hydrophobic (Fh) components, and de-cides acceptance based on metropolis monte carlo

6) for each local minimum of the trajectory, outputs 2 alternative specifications of con-formation (in pdb and torsion angle formats) using CNF=g??? (where ? is a digit)

7) outputs, in file FAM/dgn/ptra.MOL.CNF.SUB, a diagnostic summary of the executionof the command, including a compact summary of the energies and acceptance deci-sions over the sequence of local minima generated on the restbch_m energy surface

NOTES

The ptra and etra commands use a simplified alternative to the full energy (definedas Fr+Fe+Fs+Ft+Fb+Fc+Fh+Fm+Fg+Fp). Fm, the dielectric medium component, is re-placed with an approximation, F_m, in which some of the concave and saddle facescontributing to the molecular surface are neglected. Also neglected from the full en-ergy are the Fg and Fp components. This approximation enables calculation of analytic1st and 2nd derivatives of F_m. In contrast, because of its complexity, analytic deriva-tives of Fm are not available. As a consequence, full energy is obtained, always, as asingle point evaluation at the endpoint of minimization on an energy surface that ne-glects this component.

Let p be the variable coordinates for a subset of repulsion parameters. More specifi-cally, each parameter of the subset controls, for a pair of atom types, the interatomic

68

Page 69: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

distance of the minimum of the Fr potential energy well. And let q be the variable co-ordinates for a subset of torsion angle degrees of freedom.

A trajectory through the combined space of parameters and torsion angles is gener-ated by minimization of a target function F(p)=( Fy(q(p)) +Fz(q(p)) +Fw(p)) with re-spect to p. The torsion angles q also vary, but are dependent on p. The function q(p)is determined by the constraint that q be a local minimum conformation of the energysurface restbch_m with respect to q, which depends on p through the dependence ofFr(p,q), the repulsion +dispersion component, on atomic radii.

A search through local energy minima is accomplished as a sequence of cycles, eachof which produces a local minimum conformation. Each cycle consists of 3 local mini-mization trajectories: local minimization of an inflation target function with respect top, local minimization of an equilibration target function with respect to p, and finally,following restoration of the original (physically meaningful) atomic radii, local mini-mization of the restbch_m energy with respect to q.

Components of the inflation target function have physical meaning as follows. Fy=−(Fe+Fh+Ft+Fb+F_m) is the negative of electrostatic energy +hydrophobic energy +in-trinsic torsional energy. Fz is proportional to the length (meaning the root meansquare) of the displacement of q, the local energy minimum conformation on the rest-bch_m surface. Fw drives inflation by decreasing in value as atomic radii expand. Theinflation trajectory consists of a maximum of 40 steps through parameter space. Stop-ping criteria include the change in Fy(q(p)) becoming larger than a threshold value.

The combined effect of minimizing these target function components is expansion ofatomic radii, pushed by decrease of Fw(p), on a path through parameter space suchthat the electrostatic +hydrophobic energy climbs the maximum slope. Inflation ofatomic radii smooths Fy(q) by altering the balance between the electrostatic interac-tion energy contributed by contacting atom pairs and the electrostatic interaction en-ergy contributed by all other, more distantly separated, atom pairs.

The equilibration trajectory differs from the inflation trajectory by a change to the tar-get function. Components Fy and Fz change by a reversal of sign relative to the defi-nitions used for the inflation trajectory. Component Fw remains unchanged. The equili-bration trajectory consists of a maximum of 80 steps through parameter space. Stop-ping criteria include the change in Fy(q(p)) reaching a threshold value.

The equilibration trajectory through parameter space is an attempt to facilitate tran-sition between local minima on the restbch_m energy surface. In contrast, using thephysically meaningful values of atomic radii, and not allowing variation, generatingtransitions between local minima is complicated by the existence of energy barriers.Here, a path through parameter space is driven by target function components Fw(p)and Fz(q(p)). Component Fy(q(p)) adds a bias pushing the path toward lower electro-

69

Page 70: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

static +hydrophobic energy. In this way, the energy, as measured by component Fy,pumped into the mechanical system by components Fw and Fz is analagous to kineticenergy in the physical equations of motion.

Following completion of the 2 trajectories, inflation followed by equilibration, gener-ated by target function minimization with respect to p, atomic radii are restored totheir original, physically meaningful, values. The cycle ends with local minimizationon the restbch_m energy surface with respect to q. The resulting local minimum con-formation is either accepted or rejected as the next point of the trajectory of local min-ima based on a metropolis monte carlo acceptance criteria.

Iteration over cycles generates a trajectory of local minima on the restbch_m energysurface. For each new cycle, the algorithm selects a different collection of pairs ofatom types to be used for parameter variation.

EXAMPLE TEST CASE

We generate 4 steps of a trajectory of local minima, intended to lower elec-trostatic +hydrophobic energy, for the trp cage mini protein. The input filestest/seq/seq.2jof and test/tor/tor.2jof.s002 have been created in earlier ex-amples. The file test/stp/stp.full.2jof, specifying all torsion angles as variable,has been created by hand. As a point of reference, the restbch_m energy for the re-gion of the nmr structure, calculated in the test case for the etra command, is -387.51kcal/mol.

To execute the command, open a window to directory src/str, and type the followinglines to the linux prompt.% ptra test 2jof s002 full% cp ../../test/tor/tor.2jof.g003 ../../test/tor/tor.2jof.s003% cp ../../test/car/2jof.g003.pdb ../../test/car/2jof.s003.pdb

To speed execution, the ptra command uses calls to LAPACK library functions to ac-complish matrix inversion. The LAPACK code introduces into the calculation a lackof reproducibility, possibly originating from the use of multithreaded execution. Re-placement of the calls to LAPACK with equivalent linear algebra functionality providedwithin the ereg codebase restores reproducibility. The variability with successive runs,which begins in the 15th decimal place of a small number of elements of the invertedmatrix, builds analogous to roundoff error until the computed trajectories diverge. Thelisting provided was chosen as an example of the diagnostic output produced by theptra command. Use of the optimized LAPACK code was considered preferable to useof the slower, but reproducible, code internal to the ereg package.

70

Page 71: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

file test/dgn/ptra.2jof.s002.full

Diagnostic file created by the command:% ptra test 2jof s002 full

CONSTRUCT MECHANICAL SYSTEM

subset name=full

nZ0 nS0

1 0

cQ1 cQ1bb cU1 cJ1 cY1

109 60 49 78 50

cQ2 cQ2bb

109 60

cF1 cG2 cB2 cH1 cG1 cB1

1328 1296 32 516 1296 32cQ3 cX2

109 108

cE0 cC1

5886 28

Z0sub0

iR0 R0aa Q0sub

1 eASP 1111

2 ALA 1111

3 TYR 1111114 ALA 1111

5 GLN 1111111

6 TRP 11111

7 LEU 1111111

8 LYS 111111119 ASP 11111

10 GLY 111

11 GLY 111

12 PRO 111111

13 SER 1111114 SER 11111

15 GLY 111

16 ARG 1111111111

17 PRO 111111

18 PRO 11111119 PRO 111111

20 SERe 11111

.

.

.

All torsion angle degrees of freedom are variable.

INFLATE RESTCH_M

i F z b lam2 s2 delF

0 5.66561e+03 1.0994e+03 2.4000e-01 3.5461e+03 2.4003e-01 4.73084e+03

1 4.72427e+03 8.7835e+02 2.6400e-01 2.5408e+03 2.6400e-01 3.90753e+03

2 3.89933e+03 6.9420e+02 2.9040e-01 1.7838e+03 2.9142e-01 3.19259e+03

continued on next page

71

Page 72: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

3 3.18488e+03 5.4043e+02 3.1944e-01 1.2340e+03 3.2029e-01 2.58609e+03

4 2.57827e+03 4.1332e+02 3.5138e-01 8.3350e+02 3.5208e-01 2.08094e+03

.

.

.16 5.63668e+02 1.3182e+01 4.0000e-01 1.8559e+01 4.0169e-01 5.47110e+02

17 5.44474e+02 8.1270e+00 4.0000e-01 1.4755e+01 4.0013e-01 5.33246e+02

18 5.27276e+02 5.8114e+00

tot Fr Fe Fs Ft Fc Fb Fh F_m

-367.86 162.17 -423.04 0.00 -4.24 0.00 37.43 -34.10 -106.08-370.03 162.34 -426.23 0.00 -4.17 0.00 36.71 -34.33 -104.36

-371.01 161.85 -426.87 0.00 -3.99 0.00 36.56 -34.43 -104.14

-372.23 160.85 -426.95 0.00 -3.93 0.00 36.50 -34.50 -104.21

-373.74 159.82 -427.60 0.00 -3.81 0.00 36.43 -34.66 -103.92

.

.

.

-335.77 177.53 -419.83 0.00 -9.10 0.00 38.93 -19.62 -103.67

-321.66 183.44 -415.16 0.00 -8.78 0.00 39.54 -15.12 -105.58

-307.63 191.10 -410.19 0.00 -7.00 0.00 39.11 -8.92 -111.72________ ________ ________ ________ ________ ________ ________ ________ ________

60.23 28.93 12.84 0.00 -2.76 0.00 1.68 25.18 -5.64

F Fy Fz Fw lamda s delf

5665.61 534.78 0.22 5130.61 6.58 0.020 -0.29

4724.27 536.99 0.34 4186.94 3.79 0.020 -0.163899.33 537.24 0.36 3361.73 2.76 0.020 -0.13

3184.88 537.78 0.38 2646.73 3.60 0.020 -0.15

2578.27 537.96 0.39 2039.92 2.54 0.020 -0.12

.

.

.

563.67 517.00 0.97 45.70 4.30 0.020 -0.20

544.47 510.28 1.04 33.15 266.29 0.010 -1.66

527.28 502.08 1.10 24.09 12.64 0.020 -0.58________ ________ ________ ________ __________ ______ ________

-5138.33 -32.70 0.88 -5106.51

Energy component F_m is an approximation to Fm in which some of theconcave and saddle faces contributing to the molecular surface are ne-glected. All convex faces are included. Also neglected from the full energy (Fr+Fe+Fs+Ft+Fb+Fc+Fh+Fm+Fg+Fp) are the Fg and Fp components.

The above listing shows the inflation trajectory of the 1st cycle.

continued on next page

72

Page 73: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

Let (p,q) be a pair of variable coordinates, where p are coordinates for a set of re-pulsion parameters, essentially atomic radii for a subset of atom types, and q arecoordinates for the complete set of torsion angle degrees of freedom. An inflationtrajectory is generated by varying coordinates p to minimize the target functionF=( Fy+Fz+Fw). The torsion angles q also vary, but not independently of p, q is afunction of p, determined by the constraint that q be a local minimum conformationof the energy surface restbch_m. The energy surface depends on p through therepulsion+dispersion component Fr dependence on atomic radii.

Components of the target function have physical meaning as follows. Fy=−(Fe+Fh+Ft+Fb+F_m), is the negative of electrostatic energy +hydrophobic energy+intrinsic torsional energy. Fz is proportional to the length of the displacement of q,the local energy minimum conformation on the restbch_m surface. Fw drives infla-tion by decreasing as atomic radii expand.

For H-bonding atom pairs, the interaction energy between the donor Hydrogen andthe electronegative acceptor is evaluated using 2 functional forms: the multipoleexpansion accumulated to Fe, and the buf14-7 term accumulated to Fr. In this case,the attractive energy contribution being captured by the Fr functional form resultsfrom a flow of electron density that is electrostatic in nature. In contrast, for allother atom pairs, the attractive energy modeled by the Fr functional form originatesfrom the dispersion force. For this reason, the contributions accumulated to Fr fromH-bonding atom pairs are included in Fy. These added terms, taken from the Frcomponent, account for the difference between Fy and the sum of the components (Fe+Fh+Ft+Fb+F_m).

The net effect of the target function is expansion of atomic radii, driven by decreaseof Fw, in a coordinated way such that the electrostatic +hydrophobic energy climbsthe maximum slope. The energy pumped into the system by Fw is analagous to thekinetic energy of the physical equations of motion, smoothing the function Fy, andfacilitating the transitioning between local minima.

The trajectory ends if the number of steps reaches 40. A second stopping crite-ria would occur if the change in Fy reaches a threshold value of (-.30)*(number oftorsions). As seen from the set sizes for the mechanical system, the number oftorsions is 108.

change in RHO

C08 6.6199

C09 6.1451

N11 7.3247

O13 6.4089heavy atom RMSD= 1.726

continued on next page

73

Page 74: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

For each type of atom for which the radius is variable, the above section shows thechange (in Angstroms) in the parameter RHO corresponding to the inflation tra-jectory. Parameters (EPS,RHO,ALP) control, respectively, the ( depth, interatomicdistance of the minimum, softness) of the Fr atom pair interaction energy.

In all cases, heavy atom RMS deviation is from the starting structure for the currentcycle. Here the inflated structure differs from the starting structure by 1.726 Å.

EQUILIBRATE RESTCH_M

i F z b lam2 s2 delF

0-4.62478e+02 1.4256e+01 4.0000e-02 3.3805e+02 4.0000e-02-4.64700e+02

1-4.66216e+02 1.2972e+01 4.4000e-02 2.7766e+02 4.4027e-02-4.68435e+02

2-4.69538e+02 1.2088e+01 4.8400e-02 2.3578e+02 4.8431e-02-4.71815e+023-4.71295e+02 1.1187e+01 4.3560e-02 2.4339e+02 4.3583e-02-4.73194e+02

4-4.73525e+02 1.4917e+01 4.7916e-02 3.3655e+02 4.7916e-02-4.76498e+02

.

.

.78-5.07688e+02 2.3663e+00 2.1258e-02 1.0716e+02 2.1258e-02-5.07885e+02

79-5.07850e+02 2.2688e+00 1.9132e-02 1.1456e+02 1.9132e-02-5.08021e+02

80-5.03668e+02 4.1773e+00

tot Fr Fe Fs Ft Fc Fb Fh F_m

-326.72 191.27 -406.40 0.00 6.63 0.00 8.23 -9.46 -116.99-331.21 187.39 -401.60 0.00 8.02 0.00 7.69 -10.98 -121.72

-331.70 186.72 -400.72 0.00 8.55 0.00 7.54 -11.71 -122.08

-330.84 187.33 -400.50 0.00 8.76 0.00 7.50 -11.75 -122.18

-330.80 186.86 -399.77 0.00 9.23 0.00 7.40 -12.45 -122.09

.

.

.

-298.72 198.38 -402.14 0.00 17.82 0.00 7.17 -5.11 -114.84

-298.44 198.52 -402.06 0.00 17.86 0.00 7.18 -5.02 -114.91

-298.21 198.60 -401.97 0.00 17.88 0.00 7.18 -4.95 -114.96________ ________ ________ ________ ________ ________ ________ ________ ________

28.51 7.33 4.44 0.00 11.25 0.00 -1.05 4.52 2.03

F Fy Fz Fw lamda s delf

-462.48 -521.90 -1.04 60.46 14.01 0.020 -0.61

-466.22 -522.31 -1.97 58.07 7.91 0.020 -0.36-469.54 -522.82 -2.30 55.58 97.48 0.010 -0.61

-471.29 -521.79 -2.49 52.98 8.10 0.020 -0.36

-473.52 -521.59 -2.72 50.78 34.68 0.020 -0.84

.

.

.

-507.69 -500.43 -19.40 12.15 2.65 0.020 -0.12

-507.85 -500.28 -19.49 11.92 2.68 0.020 -0.12

-503.67 -500.13 -19.58 16.04 2.76 0.020 -0.12________ ________ ________ ________ __________ ______ ________

-41.19 21.76 -18.54 -44.41

continued on next page

74

Page 75: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

For each cycle, the trajectory generated consists of 2 parts: inflation followed byequilibration. Equilibration differs from inflation by a change to the target function.For target function component Fy, the only change from the definition used in theinflation trajectory is a reversal of sign. Component Fz changes in 3 ways: a rever-sal of sign, an increase in the magnitude of the factor by which RMS displacement isscaled, and the reference conformation from which RMS displacement is calculated.For the inflation trajectory, displacement is calculated from the initial conformation.For the equilibration trajectory, displacement is calculated from the endpoint of theinflation trajectory. Component Fw remains unchanged.

The net effect of the target function, mediated by coordinated changes of atomicradii, is to decrease electrostatic +hydrophobic energy. The trajectory through pa-rameter space is an attempt to facilitate transition between local minima on therestbch_m energy surface. In contrast, using the physically meaningful values ofatomic radii, and not allowing variation, generating transitions between local min-ima is greatly complicated by the energy barriers.

Here, the equilibration trajectory appears driven by the continued decrease in Fw.Fy, although targeted to decrease, gets pushed up by about half of the decrease inFw. The trajectory ends if the number of steps reaches 80. Other (less common)stopping criteria include either the change in Fy or the change in Fz reaching athreshold value.

change in RHO

C08 8.5002

C09 8.1882

N11 9.1503

O13 8.7490heavy atom RMSD= 4.479

The parameters continue to move, here driven by the decrease in Fw.

DEFLATE RESTCH_M________________________________________________________________________________

i F z b lam2 s2 delF

0-3.28395e+02 1.7824e+00 2.0000e-02 2.9797e+01 2.0001e-02-3.30355e+02

1-3.30074e+02 1.0235e+00 1.6000e-02 3.0449e+01 1.6025e-02-3.31000e+022-3.30818e+02 4.2606e-01 1.2800e-02 2.6923e+01 1.2845e-02-3.31333e+02

3-3.31311e+02 3.3613e-01 1.5360e-02 1.9845e+01 1.5360e-02-3.31835e+02

4-3.31832e+02 3.0740e-01 1.8432e-02 1.5858e+01 1.8432e-02-3.32426e+02

.

.

.

38-3.49374e+02 6.3605e-01 7.8604e-03 1.8682e+01 7.8609e-03-3.49537e+02

39-3.49540e+02 1.6344e-01 9.4325e-03 1.5244e+01 9.4325e-03-3.49690e+02

40-3.49672e+02 1.4691e-01

F Fr Fe Fs Ft Fc Fb Fh F_m

continued on next page

75

Page 76: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

-349.67 162.31 -412.94 0.00 -8.70 0.00 43.05 -21.52 -111.88

heavy atom RMSD= 4.863

To end the cycle, the original (physically meaningful) radii are restored. Local mini-mization on the restbch_m energy surface produces a local minimum conformationthat can be compared to the initial conformation (total energy= -367.86 kcal/molas seen from the initial evaluation of the inflation trajectory), and to conformationsproduced by previous cycles.

.

.

.

F Fr Fe Fs Ft Fc Fb Fh F_m

-353.58 167.49 -431.35 0.00 6.12 0.00 32.85 -31.73 -96.96heavy atom RMSD= 1.918

.

.

.

F Fr Fe Fs Ft Fc Fb Fh F_m-373.62 171.31 -473.32 0.00 -4.78 0.00 43.33 -31.63 -78.53

heavy atom RMSD= 2.686

.

.

.F Fr Fe Fs Ft Fc Fb Fh F_m

-385.35 165.08 -466.18 0.00 -2.92 0.00 42.78 -37.45 -86.65

heavy atom RMSD= 1.820

Over 4 cycles, total energy has decreased by 17 kcal/mol, electrostatic energy de-fined as (Fe+Fh+F_m) has decreased by 22 kcal/mol. As a point of reference, the nmrstructure for 2jof, following local energy minimization using the etra command, hasa total restbch_m energy of -405.92 kcal/mol.

COMPACT ANALYSIS

RMSD Energy Decomposition Acceptance

Ri Re Rd Ftot Frstbc Fe Fh F_m dtot z rn ACCEPT

0 1.73 4.48 4.86 -349.67 196.67 -412.94 -21.52 -111.88 18.18 0.00047 0.68506 0

1 2.96 2.53 1.92 -353.58 206.47 -431.35 -31.73 -96.96 14.28 0.00242 0.41610 02 1.94 4.27 2.69 -373.62 209.86 -473.32 -31.63 -78.53 -5.77 0.00000 0.00000 1

3 2.57 2.77 1.82 -385.35 204.93 -466.18 -37.45 -86.65 -11.73 0.00000 0.00000 1

continued on next page

76

Page 77: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

For the above compact summary of the trajectory of local minima, column headingsare defined as follows:

Ri,Re,Rd heavy atom RMSD (in Angstroms) from start of cycle following infla-tion, equilibration, and deflation

Ftot total energy (in kcal/mol)

Frstbc sum of energy components ( Fr+Fs+Ft+Fb+Fc)

Fe, Fh, F_m energy components electrostatic, hydrophobic, and differentiableapproximation to dielectric medium

dtot change in total energy from previous accepted point

z exp( -(.25)*dtot/kT)

rn random number in range [ 0, 1]

ACCEPT binary acceptance decision { 0=reject, 1=accept}

For cycles 0 and 1, Ftot increases relative to the initial conformation of the trajec-tory by 18.18 and by 14.28 kcal/mol, respectively. Both of these local minima arerejected based on the metropolis monte carlo acceptance decision. For cycles 2 and3, Ftot decreases by 5.77 and by 11.73 kcal/mol, respectively. Both local minima areaccepted as steps of the trajectory.

The 4 step trajectory of local minima lowers total energy by -17.49 kcal/mol. In thetrajectory through parameter space, the Fy component of the target function biasestoward lower values for energy contibutions that can be considered electrostatic innature. These include energy components Fe, F_m, and the contribution to Fr from H-bonding atom pairs. In addition, energy component Fh can be considered electrostaticin nature, acting through entropic contributions to hydration free energy. Consistentwith this bias, the change, from the start of the trajectory of local minima to the end,in the sum Fe +Fh+F_m, is -27.06 kcal/mol.

The search of conformation space funcionality provided by the ptra command is ex-ploratory. Utility will depend on finding applications for which this command performsbetter in comparison to the alternative (and much more highly evolved) estp com-mand. In this sense, the example test case provided for this command is not a gooddemonstration of utility. A hoped for application is protein-protein or protein-nucleicacid docking. In such cases, the subset of degrees of freedom would allow transla-tion and rotation between molecules in addition to some surface flexibility for eachmolecule.

77

Page 78: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

6.10 etra

FUNCTIONALITYlocal minimization on the restbch_m energy surface

SYNTAXetra FAM MOL

INPUT FILES OUTPUT FILES/FAM/exp/MOL.pdb /FAM/dgn/etra.MOL

/FAM/seq/seq.MOL/FAM/tor/tor.MOL.CNF/FAM/car/MOL.CNF.pdb

SUMMARY

1) inputs a pdb-format file, FAM/exp/MOL.pdb

2) regularizes geometry, meaning bond lengths, bond angles, and some torsion anglesare adjusted to standard values with minimal movement of atom coordinates

3) starting from the initial geometry-regularized structure, minimizes energy locallywith respect to all torsion angle degrees of freedom

4) to prevent movement out of the well of the initial conformation, minimization on theenergy surface is accomplished by generating a sequence of up to 9 local minimiza-tion trajectories, gradually reducing to zero the weighting coefficient for a set of har-monic distance constraints taken from the initial structure

5) outputs, in file FAM/seq/seq.MOL, a specification of residue sequence and disulfidecrosslinks

6) outputs, using CNF=01 to denote the initial geometry-regularized structure, andCNF=02,03,04,...,10 to denote the energy minima corresponding to the endpoints ofthe sequence of local minimization trajectories, 2 alternative specifications of confor-mation: files FAM/car/MOL.CNF.pdb and FAM/tor/tor.MOL.CNF

7) outputs, in file FAM/dgn/etra.MOL, a diagnostic summary of the execution of thecommand, including a compact analysis of the sequence of energies and RMS devia-tions evaluated at the end points of the local minimization trajectories

NOTES

The primary use of the etra command is to support development of the ptra com-mand by facilitating energy evaluation of conformations. Both commands use the

78

Page 79: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

restbch_m energy surface, which substitutes for the Fm component of the total energya differentiable approximation F_m.

Input and output are analagous to the ereg command. Consistent with the greg com-mand, the conformation name 01 is assigned by the program to the conformation thatresults from initial geometry regularization.

Because the computational effort required to calculate derivatives of F_m is large, andbecause the substitution of F_m introduces approximation, the etra command is muchless useful for applications in comparison to the ereg command.

EXAMPLE TEST CASE

For the trp cage mini protein, we calculate the restch_m energy of the nmr structure.This energy value is useful in testing of the ptra command. In general, when a confor-mation can be found having lower energy than the experimental structure, this oftenindicates a problem remaining in the energy surface being used.

File 2jof_r.pdb, a conformation representing the native state for the trp cage miniprotein, has been created in the test case for the rcyc command and entered intodirectory test/exp. To avoid overwriting of files previously output by the ereg com-mand, the input file is first copied and renamed.

To execute the command, open a window to directory src/str, and type the followinglines to the linux prompt.% cp ../../test/exp/2jof_r.pdb ../../test/exp/2jof_alt.pdb% etra test 2jof_alt

The program outputs files test/seq/seq.2jof_alt, test/tor/tor.2jof_alt.CNFfor CNF={01,...,06}, test/car/2jof_alt.CNF.pdb for CNF={01,...,06}, andtest/dgn/etra.2jof_alt.

The local energy minimum conformation is characterized by a heavy atom RMSD fromthe starting structure of .23 angstroms, and by a total energy of -387.51 kcal/mol.

file=test/dgn/etra.2jof_alt

Diagnostic file created by the command:% etra test 2jof_altOutput generated by the etra command is closely analagous to output generatedby the ereg command.

REGULARIZE GEOMETRY

heavy atom RMSD= 0.019

continued on next page

79

Page 80: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

For CNF=01, created by geometry regularization with no consideration of energy.

CONSTRUCT MECHANICAL SYSTEM

.

.

.MINIMIZE RESTCH_M LOCALLY

distance constraint coeff index= 0

Affecting energy component Fc, the weighting coefficient for the sum of harmonicdistance constraints is set at 1.00 kcal/(mol-bohr2). Weights corresponding to otherindexes can be found in data file src/dat/energy_params, input to program vari-able W0a.

i F z b lam2 s2 delF

0-3.66481e+02 6.9034e+00 1.0000e-02 1.9493e+02 1.0002e-02-3.70082e+02

1-3.70212e+02 2.5411e+00 1.2000e-02 1.0672e+02 1.2003e-02-3.72308e+02

2-3.72329e+02 1.3901e+00 1.4400e-02 6.8201e+01 1.4401e-02-3.74054e+02

3-3.74050e+02 9.7865e-01 1.7280e-02 6.8664e+02 1.7291e-02-3.85299e+024-3.73865e+02 1.1936e+00 1.2960e-02 6.3836e+01 1.2980e-02-3.75141e+02

5-3.75139e+02 8.0426e-01 1.5552e-02 4.3546e+01 1.5552e-02-3.76354e+02

.

.

.49-3.82704e+02 2.4093e-02 1.7609e-05 1.3319e+03 1.7622e-05-3.82704e+02

50-3.82704e+02 4.0180e-02 1.3207e-05 3.0051e+03 1.3231e-05-3.82704e+02

51-3.82704e+02 2.3921e-02 1.0000e-05 2.3546e+03 1.0000e-05-3.82704e+02

52-3.82704e+02 2.3547e-02

At each point of the trajectory, the program calculates energy, 1st derivatives, and2nd derivatives.

These quantities define a harmonic approximation to the actual energy surface.

Steps are calculated using Newton’s method to minimize the harmonic surfacewithin a trust region.

continued on next page

80

Page 81: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

For the above compact characterization of the minimization trajectory, columnlabels are defined as follows:

i Minimization step.

F Total energy.

z RMS of the gradient.

b Radius of the trust region, or equivalently RMS of a step to the boundary ofthe trust region.

lam2 Lagrange multiplier used to enforce the condition that the step minimize theharmonic surface on the boundary of the trust region. A value of zero indi-cates the minimum of the harmonic surface is inside of the boundary of thetrust region.

s2 RMS of the calculated step.

delF Value of F estimated by the harmonic surface at the position of the calculatedstep.

At each step i>1, the radius of the trust region, b(i), is adjusted based on a com-parison of the actual change in energy, F(i)−F(i−1), to the decrease in energy,delF(i−1)−F(i−1), predicted by the harmonic surface of the previous step.

The maximum number of steps allowed in this minimization is 64. Convergence,defined as a value of z less than (1.00e-4), is not achieved. Although the function Fcontinues to decrease, the step size b becomes small as the trajectory approachesthe local minimum on the energy surface. The analytical derivatives are correctbased on comparison to numerical derivatives. The remaining problem originatesfrom faces of the boundary surface (convex, concave, and saddle) being createdand destroyed by very small changes in atom positions. Further work would be re-quired to enable efficient convergence. Because the ptra and etra commands areexploratory, seeking an application in which these commands perform better thanthe alternative estp and ereg commands, further work to increase performancehas not yet been attempted.

F Fr Fe Fs Ft Fc Fb Fh F_m

-383.46 168.16 -450.66 0.00 -7.74 0.76 20.79 -19.03 -94.99

Decomposition of full energy at the endpoint of local minimization. Charges on ion-ized groups are full values. F is the sum of all energy components excluding Fc.Energy component F_m is a slightly simplified estimate of Fm that enables calculationof analytic 1st and 2nd derivatives.

continued on next page

81

Page 82: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

heavy atom RMSD= 0.061

For CNF=02, the endpoint of local minimization.

distance constraint coeff index=-1

The weighting coefficient for the sum of harmonic distance constraints is assignedsuccessively smaller values for minimization trajectories CNF={03,04,05,...}.

.

.

.

F Fr Fe Fs Ft Fc Fb Fh F_m

-384.55 168.35 -451.12 0.00 -8.47 0.72 20.90 -18.90 -95.30heavy atom RMSD= 0.085

.

.

.

distance constraint coeff index=-5.

.

.

F Fr Fe Fs Ft Fc Fb Fh F_m

-387.49 170.94 -451.91 0.00 -12.02 0.06 21.78 -19.79 -96.50heavy atom RMSD= 0.240

If the full energy (excluding Fc) obtained at the endpoint of a local minimizationtrajectory increases relative to the full energy obtained for the previous trajectory,then the sequence of local minimization trajectories is ended, and the endpoint ofthe final minimization is not accepted.

Alternatively, the sequence of trajectories ends when the weighting coefficient forFc becomes zero, corresponding to index -8.

Each of the 6 local minimization trajectories characterized in the preceding sec-tions is generated on the (Fr+Fe+Fs+Ft+Fb+Fc+Fh+F_m) energy surface. Althoughnot shown in this diagnostic file, the first of these trajectories is preceded by a lo-cal minimization trajectory generated on the (Fr+Fe+Fs+Ft+Fb+Fc+Fh+Fw) energysurface. Intended as a method to accelerate convergence, minimization on therestbchw energy surface creates a starting point for continued minimization on therestbch_m energy surface.

COMPACT ANALYSIS

W0 F Fr Fe Fs Ft Fc Fb Fh F_m RMSD

0 -383.46 168.16 -450.66 0.00 -7.74 0.76 20.79 -19.03 -94.99 0.06

-1 -384.55 168.35 -451.12 0.00 -8.47 0.72 20.90 -18.90 -95.30 0.08

-2 -385.64 168.82 -451.37 0.00 -9.30 0.60 21.02 -19.01 -95.80 0.13

-3 -387.09 170.01 -451.52 0.00 -11.42 0.51 21.66 -19.33 -96.48 0.20

continued on next page

82

Page 83: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

-4 -387.51 170.77 -451.82 0.00 -12.00 0.21 21.78 -19.68 -96.55 0.23________ ________ ________ ________ ________ ________ ________ ________ ________

-4.04 2.60 -1.16 0.00 -4.27 -0.55 1.00 -0.65 -1.57

A compact summary of how the full energy components and RMSD change withminimization.

7 CONTEXT

Because the ereg, estp, rcyc, ionstate, ptra, and etra commands access the en-ergy surface and the conformational search algorithms, the functionality provided bythese commands is unique to this package. Also, the functionality of the greg com-mand is, we believe, unique. In contrast, for the prof, hlog, and igor commands,there exist alternative, possibly better, almost certainly faster, software packages ca-pable of accomplishing structure quality assessment, homology model building, andfold recognition. However, as opposed to interfacing to external software, the abovecommands have the benefit of maintaining consistent rigid geometry along with thefile and directory organization of the ereg package.

8 UTILITY COMMANDS

The following 2 commands facilitate format conversion.

8.1 seqformat

FUNCTIONALITYsequence format conversion

SYNTAXseqformat FAM MOL

INPUT FILES OUTPUT FILES/FAM/exp/seq.MOL /FAM/seq/seq.MOL

SUMMARY

83

Page 84: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

1) inputs the sequence of a single polypeptide chain in 1-letter code format

2) outputs the sequence in an alternative format that allows specification of disulfidecrosslinks, multiple chains, alternative ionization states, and residues outside the setof 20 naturally occurring amino acids

NOTES

The 1-letter code C is translated to residue CYH, protonated free cysteine. To specifydisulfide bond crosslinks, edit by hand file FAM/seq/seq.MOL. Add a line specifying thepair of residues forming the disulfide bond and, for each residue of this pair, changeCYH to CYS.

EXAMPLE TEST CASE

In this example, MOL=grb2 is the sh3 domain of growth factor bound protein 2.

To execute the command, open a window to directory src/str, and type the followingline to the linux prompt.% seqformat test grb2

8.2 initcar

FUNCTIONALITYinitial structure generation

SYNTAXinitcar FAM MOL CNF

INPUT FILES OUTPUT FILES/FAM/seq/seq.MOL /FAM/dgn/initcar.MOL/FAM/tor/tor.MOL.CNF (optional) /FAM/car/MOL.CNF.pdb

/FAM/tor/tor.MOL.CNF (if notpresent)

SUMMARY

1) inputs a sequence, FAM/seq/seq.MOL

2) inputs (if present) an initial conformation in torsion angle format, FAM/tor/tor.MOL.CNF

3) if torsion angle input is not present, assigns, for each residue, backbone torsion an-gles consistent with the most probable residue state calculated using the igor model,and an extended conformation of the side chain

84

Page 85: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

4) generates cartesian coordinates from torsion angle coordinates

5) outputs, in file FAM/dgn/initcar.MOL, a diagnostic summary of the execution ofthe command

6) outputs, in pdb-format file FAM/car/MOL.CNF.pdb, cartesian coordinates of the ini-tial conformation

7) if not present initially, outputs, in file FAM/tor/tor.MOL.CNF, torsion angle coordi-nates of the initial conformation

NOTES

To initiate a new conformation by hand editing, or to alter an existing program-generated conformation, the easiest way is by creating or modifying a torsion angleformat file test/tor/tor.MOL.CNF.

The most common use of the initcar command is to generate a corresponding pdb-format file for a hand modified torsion angle format file. This functionality is a simpleformat conversion for the specification of a conformation.

A second use of the command is to generate an initial conformation for a sequenceconsistent with the predicted secondary structure. Although far from the native con-formation for the sequence, this choice for an initial conformation is sometimes prefer-able to a random conformation.

EXAMPLE TEST CASE

As an example, an initial conformation CNF=s000 is prepared for protein MOL=2jof.This conformation has been used in a previous example as input to commandestp. The most probable secondary structure state is shown in diagnostic filetest/dgn/initcar.2jof.

To execute the command, open a window to directory src/str, and type the followinglines to the linux prompt.% initcar test 2jof i% cp ../../test/car/2jof.i.pdb ../../test/exp/2jof_i.pdb% ereg test 2jof_i% cp ../../test/tor/tor.2jof_i.06 ../../test/tor/tor.2jof.s000% cp ../../test/car/2jof_i.06.pdb ../../test/car/2jof.s000.pdb

9 FILE FORMATS

In this section, we describe, using examples, the input file types of Table III.

85

Page 86: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

Table III. Types of Input Files.

General Name Content

/FAM/seq/seq.MOL residue sequence/FAM/tor/tor.MOL.CNF torsion angles/FAM/stp/stp.SUB.MOL subset of degrees of freedom/FAM/car/dprof.MOL.CNF defect energy density profile/FAM/arg/exptstructs.GRP group of templates/FAM/exp/seq.MOL 1-letter code residue sequence

9.1 residue sequence

Thick vertical lines on the boundaries with the left and right margins distinguish linesthat have been added for the purpose of description. These added lines should not beincluded in actual input files. Formats for input lines are specified using the compactfortran notation for fields and repeat counts.

file=test/seq/seq.1crn

1

Number of polymer chains. FORMAT=(1x,i4).

46

Number of residues in first chain. FORMAT=(1x,i4).

|eTHR|THR |CYS |CYS |PRO |SER |ILE |VAL |ALA |ARG |SER |ASN |PHE |ASN |VAL |CYS|ARG |LEU |PRO |GLY |THR |PRO |GLU |ALA |ILE |CYS |ALA |THR |TYR |THR |GLY |CYS|ILE |ILE |ILE |PRO |GLY |ALA |THR |CYS |PRO |GLY |ASP |TYR |ALA |ASNe

Sequence of residues. FORMAT=(16(1x,a4)). N- and C-terminal residues, becausethese differ in composition and connectivity, are distinct from interior residues hav-ing the same amino acid side chain.

3

Number of disulfide bonds. FORMAT=(1x,i4).

1 3 1 401 4 1 321 16 1 26

(chain index, residue index) of 1st then 2nd half cystine. One disulfide bond perline. FORMAT=(4(1x,i4)).

86

Page 87: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

If a system of molecules contains n chains, the residue sequences of chains 2 to n arespecified following the specification for chain 1, and prior to specification of disulfidebonds.

For nucleic acid chains, the most natural choice for the unit of composition, the nu-cleotide, was not used. Instead, each nucleotide is partitioned into an ordered tripleof residues (a pentose ring, a nucleic acid base, a backbone phosphate). This choice,although it imposes a burden of required preprocessing for structures taken from thePDB database, was considered preferable to a combinatorial explosion of residuetypes.

Table IV. Currently Available Residues.

Num Res Classa Descriptionb

1 ALA a alanine2 ASP a aspartic acid (ionized)3 CYS a half cystine4 GLU a glutamic acid (ionized)5 PHE a phenylalanine6 GLY a glycine7 HIS a histidine (N-delta protonated)8 ILE a isoleucine9 LYS a lysine (ionized)

10 LEU a leucine11 MET a methionine12 ASN a asparagine13 PRO a proline14 GLN a glutamine15 ARG a arginine (ionized)16 SER a serine17 THR a threonine18 VAL a valine19 TRP a tryptophan20 TYR a tyrosine21 AIB a aminoisobutyric acid22 ABU a aminobutyric acid23 NLE a norleucine24 ORN a ornithine (ionized)25 CYH a cysteine26 HIE a histidine (N-epsilon protonated)27 HIP a histidine (ionized)28 ASZ a aspartic acid (neutral, protonated)

continued on next page

87

Page 88: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

Num Res Class Description

29 CYZ a cysteine (ionized, deprotonated)30 GLZ a glutamic acid (neutral, protonated)31 LYZ a lysine (neutral, deprotonated)32 TYZ a tyrosine (ionized, deprotonated)33 ACE e N-terminal acetyl34 NME e C-terminal N-methyl35 eALA a N-terminal ALA36 eASP a N-terminal ASP37 eCYS a N-terminal CYS38 eGLU a N-terminal GLU39 ePHE a N-terminal PHE40 eGLY a N-terminal GLY41 eHIS a N-terminal HIS42 eILE a N-terminal ILE43 eLYS a N-terminal LYS44 eLEU a N-terminal LEU45 eMET a N-terminal MET46 eASN a N-terminal ASN47 ePRO a N-terminal PRO48 eGLN a N-terminal GLN49 eARG a N-terminal ARG50 eSER a N-terminal SER51 eTHR a N-terminal THR52 eVAL a N-terminal VAL53 eTRP a N-terminal TRP54 eTYR a N-terminal TYR55 eAIB a N-terminal AIB56 eABU a N-terminal ABU57 eNLE a N-terminal NLE58 eORN a N-terminal ORN59 eCYH a N-terminal CYH60 eHIE a N-terminal HIE61 eHIP a N-terminal HIP62 eASZ a N-terminal ASZ63 eCYZ a N-terminal CYZ64 eGLZ a N-terminal GLZ65 eLYZ a N-terminal LYZ66 eTYZ a N-terminal TYZ67 ALAe a C-terminal ALA

continued on next page

88

Page 89: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

Num Res Class Description

68 ASPe a C-terminal ASP69 CYSe a C-terminal CYS70 GLUe a C-terminal GLU71 PHEe a C-terminal PHE72 GLYe a C-terminal GLY73 HISe a C-terminal HIS74 ILEe a C-terminal ILE75 LYSe a C-terminal LYS76 LEUe a C-terminal LEU77 METe a C-terminal MET78 ASNe a C-terminal ASN79 PROe a C-terminal PRO80 GLNe a C-terminal GLN81 ARGe a C-terminal ARG82 SERe a C-terminal SER83 THRe a C-terminal THR84 VALe a C-terminal VAL85 TRPe a C-terminal TRP86 TYRe a C-terminal TYR87 AIBe a C-terminal AIB88 ABUe a C-terminal ABU89 NLEe a C-terminal NLE90 ORNe a C-terminal ORN91 CYHe a C-terminal CYH92 HIEe a C-terminal HIE93 HIPe a C-terminal HIP94 ASZe a C-terminal ASZ95 CYZe a C-terminal CYZ96 GLZe a C-terminal GLZ97 LYZe a C-terminal LYZ98 TYZe a C-terminal TYZ99 H2O s water

100 NH2 e C-terminal NH2101 UNK a unknown102 D r deoxyribose103 R r ribose104 RME r 2’O-methyl105 RF r 2’flouro106 MOE r 2’O-methoxyethyl

continued on next page

89

Page 90: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

Num Res Class Description

107 LNA r locked nucleic acid108 CET r 2’O-constrained ethyl109 PO p phosphodiester110 PSR p phosphorothioate R isomer111 PSS p phosphorothioate S isomer112 5OH p 5’OH113 3OH p 3’OH114 5PO p 5’phosphate115 3PO p 3’phosphate116 N b NH2 (a minimal replacement of base)117 A b adenine118 AP b ionized adenine (N1 protonated)119 G b guanine120 GP b ionized guanine (N7 protonated)121 GM b ionized guanine (N1 deprotonated)122 T b thymine123 TM b ionized thymine (N3 deprotonated)124 C b cytosine125 CP b ionized cytosine (N3 protonated)126 U b uracil127 UM b ionized uracil (N3 deprotonated)

a classes: a= amino acid, r= nucleic acid pentose ring, b= nucleic acid base, p= nu-cleic acid backbone phosphate, c= carbohydrate, l= lipid, e= end group, s= smallmoleculeb All N-terminal amino and C-terminal carboxyl groups are ionized. For each N- andC-terminal amino acid residue, a corresponding residue, with e replaced by z in thename, having a neutralized amino or carboxyl group is created by the program.

9.2 torsion angles

file=test/tor/tor.1crn.01

1 17.0804470 13.9565390 3.5632209 -26.1457966 41.5224739 128.6902053

Chain translation and rotation, one chain per line.

continued on next page

90

Page 91: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

chain index| x translation (angstrom)| | y translation| | | z translation| | | | euler angle α (degree)| | | | | euler angle β| | | | | | euler angle γ

FORMAT=(i2,1x,f12.7,f12.7,f12.7,f12.7,f12.7,f12.7)__________________________________________________________________________________________

Horizontal lines separate chains.

eTHR 180.0000 140.3473 174.1240 -61.0830 150.0000 60.0000THR -104.0522 138.8087 173.6611 56.7180 30.0000 60.0000CYS -128.5749 141.1154-171.3352 -34.5296 -89.7740...

TYR -126.3966 79.0567-159.8522 -71.4305 -94.8297 -0.0000ALA -104.3896 -2.8579-168.6423 60.0000ASNe -121.4300 22.5508 -61.2574 114.7476-180.0000__________________________________________________________________________________________

Torsion angles, IUPAC definitions, one residue per line.

residue name| φ| | ψ| | | ω| | | | χ1

| | | | | χ2

| | | | | | χ3

| | | | | | | χ4

| | | | | | | | χ5

| | | | | | | | | χ6

| | | | | | | | | | χ7

FORMAT=(a4,1x,f9.4,f9.4,f9.4,f9.4,f9.4,f9.4,f9.4,f9.4,f9.4,f9.4)

Torsion angles that do not exist for a residue, for example ω for a C-terminalresidue, can be left blank.

For residues of the classes r, b, and p that compose nucleic acid chains, the order usedfor torsion angles can be found in data file src/dat/residue_interface, input tovariable T0tor.

Because the pdb file format, used in this package to store conformations, is a widelyused standard, it is, with the following exception, defined elsewhere. For nucleic acidchains, the order used for atoms within residues of classes r, b, and p can be found indata file src/dat/residue_interface, input to variable P0atm.

91

Page 92: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

9.3 subset of degrees of freedom

file=test/stp/stp.l5.4pti

11

Number of residues containing torsion angles that will be allowed to vary duringenergy minimization. FORMAT=(i4).

[ 1: 20] 0 2 0 ARG[ 1: 43] 0 0 0 ASN[ 1: 44] 2 2 7 ASN[ 1: 45] 2 2 0 PHE[ 1: 46] 2 2 0 LYS[ 1: 47] 2 2 0 SER[ 1: 48] 2 1 0 ALA[ 1: 49] 2 2 0 GLU[ 1: 50] 2 2 0 ASP[ 1: 53] 0 2 0 ARG[ 1: 54] 0 2 0 THR

One line per residue.

’[’ character| chain index| | ’:’ character| | | residue index| | | | ’]’ character| | | | || | | | | backbone search parameter, controls| | | | | conformational search characteristics of| | | | | torsion angles (ω,φ,ψ), where ω is the| | | | | final backbone torsion angle of the| | | | | preceding residue, [0=held fixed, 1=allowed to| | | | | vary, 2=actively assigned alternative values]| | | | | || | | | | | side chain search parameter, controls| | | | | | conformational search characteristics of| | | | | | side chain torsion angles, [0=held fixed,| | | | | | 1=allowed to vary, 2=actively assigned| | | | | | alternative values]| | | | | | || | | | | | | for the first residue of a segment to be| | | | | | | deformed, set equal to the length of the| | | | | | | segment, zero otherwise| | | | | | | || | | | | | | | concatenated string of names of| | | | | | | | residues to be substituted at this| | | | | | | | position

FORMAT=((1x,i2,1x,i4,1x),1x,i2,i2,i2,2x,32a4).

Rules:

continued on next page

92

Page 93: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

1) The length of a deformed segment must be an odd number equal to{ 5, 7, 9,11,13}.

2) For each string of residues such that the backbone search parameter is 1 or 2,if the residue preceding this string is not already included in this file, due to sidechain torsion angles that will be allowed to vary, then it must be added with back-bone and side chain search parameters set to zero. In addition to φ and ψ, the back-bone search parameter controls the ω torsion angle associated with the precedingresidue. As a consequence, for each residue having backbone search parameterequal 1 or 2, the set of residues containing torsion angles that will be allowed tovary during energy minimization must include the preceding residue.

3) For convenience, when used with the command estp, if the first character of thesubset name is ’l’ or ’h’, then automated expansion of the subset is bypassed. Ex-pansion, if not bypassed, adds search of side chains that contact the original set ofbackbone segments and side chains marked as searchable,

4) For convenience, when used with the command estp, if the first character of thesubset name is ’h’ or ’i’, then the combinatorial search through residue sequencespace is truncated to include only the initial and final sequences.

5) For segments of length 13 residues or greater, because the space of possiblebackbone deformations increases combinatorially with length, and because thespace of productive backbone deformations, defined as those that minimize to lowenergy conformations, remains small, the sample of deformations generated isusually not productive. As a more highly evolved alternative, a segment of length13 residues or greater can be partitioned into a sequence of adjacent segments oflengths 7 to 11 residues, and conformational search can be directed as simultane-ous deformation of the multiple small segments.

Examples of this file type can be found in directory test/stp/.

9.4 defect profile

file=test/car/dprof.1pga_.07

continued on next page

93

Page 94: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

continued from previous page

Residue..................Overlap...................................Unpaired Hbond Acceptor......................|.........bb Conformation......Disallowed (phi,psi)................|.....Exposed Hydrophobic Surface............|.........|.sc Conformation....|.....Disallowed omega..............|.....|.....Improbable Secondary Structure...|.........|.|.Total Energy.....|.....|.....Disallowed chi..........|.....|.....|.....Outlier Statistical Contact|.........|.|.|......isCoil....|.....|.....|.....Buried Charge.....|.....|.....|.....|.....Electrostatic........|.........|.|.|......|isExposed|.....|.....|.....|.....Cavity......|.....|.....|.....|.....|....................|.........|.|.|......||isLoop..|.....|.....|.....|.....|.....Unpaired HBond Donor....|.....|....................|.........|.|.|......|||.|.....|.....|.....|.....|.....|.....|.....|.....|.....|.....|.....|....................

The above headings describe the columns below.

____________________________________________________________________________________________________

eMET 1 + 0.56 111 0.00 0.00 0.00 0.12 0.00 0.03 0.40 0.00 0.00 0.00 0.00 0.00THR 2 E - 1.63 010 0.00 0.00 0.00 0.62 0.00 0.00 0.50 0.50 0.00 0.00 0.00 0.00TYR 3 E + 0.58 010 0.00 0.00 0.00 0.12 0.00 0.00 0.20 0.25 0.00 0.00 0.00 0.00LYS 4 E + 0.12 010 0.00 0.00 0.00 0.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

.

.

.VAL 54 E + 0.12 000 0.00 0.00 0.00 0.12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00THR 55 E + 0.73 010 0.00 0.00 0.00 0.12 0.00 0.00 0.60 0.00 0.00 0.00 0.00 0.00GLUe 56 - 3.75 111 0.00 0.00 0.00 0.25 0.00 0.00 0.00 0.50 0.00 0.00 0.00 3.00____________________________________________________________________________________________________

Decimal numbers are free energies of chain folding in kcal/mol units. These areattributed to defects in structure given in the column headings.

For side chain conformation, a value of ’-’ indicates the conformation differs from aset of commonly observed rotamers.

9.5 group of templates

file=test/arg/exptstructs.sparse

1ckaA_

1bbzA_

1cskA_

1semA_

1oebA_

1jo8A_

Template structures, one per line. FORMAT=(a32).

9.6 1-letter code sequence

94

Page 95: User Manual for ereg - Hexadecapolehexadecapole.com/usermanual/usermanual.pdf · 2 COMPUTATIONAL REQUIREMENTS The program, which consists of roughly 100,000 lines of C++ code, was

file=test/exp/seq.grb2

56

Number of residues in chain. FORMAT=(i4).

MEAIAKYDFKATADDELSFKRGDILKVLNEECDQNWYKAELNGKDGFIPKNYIEMK

1-letter code string of residues. FORMAT=(100a1).

SUPPORT

We offer support and responsiveness to user input. Contact information is provided onthe company website.

95