DOT 2.0.1 User Guide

Victoria A. Roberts
Michael E. Pique
Susan Lindsey
Martin S. Perez
Elaine E. Thompson
Lynn F. Ten Eyck

San Diego Supercomputer Center
University of California, San Diego
9500 Gilman Drive
La Jolla, CA 92093

http://www.sdsc.edu/CCMS
Email: [email protected]

May 30, 2013

Contents

1 Introduction
    1.A What DOT does
    1.B Key DOT Chapters and how to learn more
    1.C Acknowledgements

2 Try DOT: a Tutorial Example
    2.A Set up user environment
    2.B Copy the tutorial files to your directory
    2.C Run DOT with prepared input files
    2.D Prepare DOT input files using supplied PDB files
        2.D.1 Make sure necessary programs are installed
        2.D.2 Run “prepscript”

3 DOT Quick Start Guide
    3.A Set up user environment
        3.A.1 $DOT_ROOT
        3.A.2 APBS, Reduce, MSMS
    3.B Set up your molecular system
        3.B.1 Create Working Directories
        3.B.2 Select the stationary and moving molecules
        3.B.3 Create prepscript for your molecular system
        3.B.4 Run prepscript
        3.B.5 Files created by prepscript
        3.B.6 Most likely problems
    3.C Run DOT
        3.C.1 Running DOT
        3.C.2 Running DOT on multiple computers/processors
        3.C.3 Where DOT output goes
        3.C.4 DOT output and evaluation
        3.C.5 Testing DOT on multiple computers/processors

4 Prepscript - creating input files for DOT
    4.A Outline of setup steps
    4.B Accessing DOT scripts and utilities and auxiliary programs
    4.C Examine and complete starting PDB files
        4.C.1 Missing atoms in side chains
        4.C.2 Missing loops and chain ends
    4.D Assigning the stationary and moving molecules
        4.D.1 Computation time
        4.D.2 Suitability of potentials
        4.D.3 Other factors
    4.E Set up directory structure with starting coordinates
    4.F Set up customized library for new functional groups
        4.F.1 Standard residue library for proteins
        4.F.2 Additional residue libraries
    4.G Create Prepscript
    4.H Prepscript general
        4.H.1 General Approach
        4.H.2 Running prepscript
        4.H.3 Assignment of file names
        4.H.4 The “pause” command, pausing during creation of DOT input files
    4.I Prepscript: Steps common to both molecules – dot2-prep-mol-common
        4.I.1 Check on range for total molecular charge
        4.I.2 Creating no-hydrogen files (noh files)
        4.I.3 Centering molecules
        4.I.4 Protonation with Reduce (allh files)
        4.I.5 Create files with heavy and polar H atoms (polh) and RES names for charge library
        4.I.6 Moving molecule shape (.xyz)
        4.I.7 Moving molecule partial atomic charges (.xyzq)
        4.I.8 Creating new functional groups
    4.J Determining grid size
        4.J.1 Default method based on molecular diameters
        4.J.2 Sizes efficient for the calculation
        4.J.3 User selection of grid dimension
        4.J.4 Grid need not be a cube
    4.K Prepscript: Stationary molecule
        4.K.1 Calculate Electrostatic Potential for Stationary Molecule (dx)
        4.K.2 Stationary Shape Description (xyzcrv)
        4.K.3 Electrostatic Clamping Values
    4.L DOT Parameter File (.parm)
    4.M Prepscript checks throughout processing
        4.M.1 Molecular description files not made correctly (most common problem is file is empty)
        4.M.2 Problems with centering
        4.M.3 Problems running APBS
        4.M.4 Problems running MSMS
        4.M.5 Problems running Reduce
        4.M.6 Problems calculating charges: PDB to XYZQ

5 DOT
    5.A rundot
    5.B Parameters most commonly changed
    5.C Molecular description files needed by DOT
    5.D Running DOT
        5.D.1 Single processor mode
        5.D.2 Multiple computers on a local network
        5.D.3 Multiple processors, single supercomputer
    5.E DOT actions on input files
        5.E.1 Stationary molecule shape potential (.xyzcrv)
        5.E.2 Moving molecule shape (.xyz)
        5.E.3 Stationary molecule electrostatic potential (.apbsgrd)
        5.E.4 Moving molecule atomic partial charges (.xyzq)
    5.F What DOT computes
        5.F.1 van der Waals energy
        5.F.2 Electrostatic energy
    5.G DOT parameters
        5.G.1 Sample Parameter file
        5.G.2 Parameter Reference (Table)
        5.G.3 Parameter Guide

6 Evaluation
    6.A After DOT run:
    6.B evaluate_dot_run
    6.C How to customize evaluate_dot_run
    6.D log files
    6.E e6d output files
    6.F Quick comparison of center and orientation
    6.G Creating PDB files
    6.H RMSD values between PDB files
    6.I Bump-checking PDB files
    6.J Residue-residue interactions
    6.K Re-ranking by ACE scores
    6.L Distance filtering
    6.M Creating visualization input files

7 DOT Utility Programs
    7.A Introduction
    7.B Descriptions

8 Installing DOT2
    8.A Introduction
    8.B DOT Installation Quick Start Guide
    8.C What is in the DOT distribution
    8.D External programs needed to create DOT input files
        8.D.1 General utilities: shell, awk/gawk, Ruby
        8.D.2 MSMS
        8.D.3 Reduce
        8.D.4 APBS
    8.E External libraries needed to run DOT
        8.E.1 MPICH

9 Recompiling DOT2 and its components
    9.A Recommended directory layout
    9.B GNU Autoconf/Automake
    9.C How to recompile FFTW
    9.D How to recompile MPICH
    9.E How to recompile Reduce
    9.F How to recompile DOT2 Utilities
    9.G How to recompile DOT2

References at end.

Chapter 1

Introduction

1.A What DOT does

DOT is a macromolecular docking program that carries out a complete, systematic rigid-body search for two molecules. Intermolecular interaction energies are calculated as the sum of electrostatic and van der Waals terms, which are efficiently evaluated as correlation functions. One molecule (S) is kept stationary and a second molecule (M) is moved about S. The electrostatic and shape properties of both molecules are mapped onto equal-sized grids. M is translated by being centered at each grid point of S, interaction energies are calculated, M is rotated, and the very efficient translational search is repeated. The computation time depends on the size of the grid and the number of orientations applied to M. The output is a ranked list of placements of M about S, from which coordinate files in PDB format can be generated. The resulting configurations can be analyzed visually with computer graphics, filtered by biochemical or spectroscopic data, analyzed to find clustering, or subjected to methods that introduce flexibility.
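The phrase “evaluated as correlation functions” can be made concrete with the standard discrete correlation theorem. What follows is a sketch in notation assumed here for illustration, not taken from the DOT source: P_S is a grid property of S (for example its electrostatic potential), Q_M is the matching real-valued grid property of M (for example its charges), t is the translation of M, N is the number of grid points per side, and R is the number of orientations of M.

```latex
% Interaction energy for one orientation of M, as a function of translation t:
E(\mathbf{t}) = \sum_{\mathbf{x}} P_S(\mathbf{x})\, Q_M(\mathbf{x} - \mathbf{t})

% On a periodic grid with real-valued Q_M, the discrete Fourier transforms satisfy
\hat{E}(\mathbf{k}) = \hat{P}_S(\mathbf{k})\, \overline{\hat{Q}_M(\mathbf{k})}

% so all N^3 translations for one orientation cost O(N^3 \log N) instead of O(N^6),
% and a full search over R orientations costs roughly
O\!\left(R \, N^3 \log N\right)
```

Under these assumptions, this is why the run time grows with both the grid size and the number of orientations applied to M.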

The spring 2013 2.0.1 release of the DOT2 software package is the version described in the Journal of Computational Chemistry article, “DOT2: Macromolecular docking with improved biophysical models” [9], currently available in Early View.

The package provides the following:

• Automated setup of DOT input files starting with protein coordinate files from the PDB.

• Improvements in molecular potentials that have been described in the literature are now part of the automated setup.

• Error checking during setup of input files to detect potential problems before the docking calculation is run.

• Portability: runs on Linux, Mac OS X, and Solaris.

• Reevaluation of top-ranked DOT protein-protein complexes with ACE (pairwise atomic contact energy) and with RAASP (Ruben Abagyan lab’s atomic solvation potentials), which take into account desolvation energy.

• Simplified installation with no need to install or configure MPICH or FFTW3.

1.B Key DOT Chapters and how to learn more

• Chapter 8 - DOT Installation.

• Chapter 2 - DOT Tutorial, try out first.

• Chapter 3 - Docking your own system, the basics.

• For help - Send email to the DOT help line, [email protected].

• Join the DOT Users Mailing List ([email protected]); see http://lists.sdsc.edu/mailman/listinfo/dot-users to subscribe to the list. You must be a subscriber to the mailing list to post.

1.C Acknowledgements

We thank the Department of Energy (DOE), the National Science Foundation (NSF), and the National Institutes of Health (NIH) for funding.

DOT setup depends upon the programs APBS, Reduce, and MSMS. DOT analysis depends upon ACE and RAASP scoring functions. We appreciate the work of the authors of these programs and functions and strongly encourage you to acknowledge their efforts should you publish work aided by DOT. The appropriate acknowledgements are:

• APBS [1]: Baker NA, Sept D, Joseph S, Holst MJ, McCammon JA. Electrostatics of nanosystems: application to microtubules and the ribosome. Proc. Natl. Acad. Sci. USA 98, 10037-10041 (2001).

• Reduce [13]: J. Michael Word, Simon C. Lovell, Jane S. Richardson, David C. Richardson. Asparagine and Glutamine: Using hydrogen atom contacts in the choice of side-chain amide orientation. J. Mol. Biol. 285, 1735-1747 (1999).

• MSMS [10]: Michel Sanner, Arthur J. Olson, Jean Claude Spehner. Reduced Surface: an Efficient Way to Compute Molecular Surfaces. Biopolymers 38, 305-320 (1996).

• ACE [14]: C. Zhang, G. Vasmatzis, J. Cornette, C. DeLisi. Determination of Atomic Desolvation Energies From the Structures of Crystallized Proteins. J. Mol. Biol. 267, 707-726 (1997).

• RAASP [2]: Juan Fernandez-Recio, Max Totrov, Constantin Skorodumov, and Ruben Abagyan. Optimal Docking Area: A New Method for Predicting Protein-Protein Interaction Sites. Proteins 58, 134-143 (2005).

The DOT software is built on

• The FFTW Fourier-transform library [3]: Matteo Frigo and Steven G. Johnson. The Design and Implementation of FFTW3. Proc. IEEE 93(2), 216-231 (2005).

• The MPICH Multiprocessing library from the US Department of Energy Argonne National Laboratory, http://www.mpich.org

Note that throughout this manual, full references to works cited as, for example, [2], may be found at the end of the manual.

Chapter 2

Try DOT: a Tutorial Example

We assume DOT and its utilities have been installed; see Chapter 8 for instructions. For this tutorial, you will also need the programs APBS, Reduce, and MSMS installed on your computer. In this tutorial, you will set up your environment for running DOT, run DOT with prepared input files to test that DOT is properly located and running, and prepare DOT input files to test that the DOT utilities and the programs APBS, Reduce, and MSMS are properly located and running.

2.A Set up user environment

To find DOT, you must set the DOT_ROOT environment variable to where DOT is installed. For example, if DOT is installed in /usr/local/dot2, to set your program-search path to include DOT programs, do:

If you are running csh/tcsh (see Section 3.A.1 for help on what this means):

setenv DOT_ROOT /usr/local/dot2
source $DOT_ROOT/bin/share/dot2.setup.csh

If you put these two lines into the .cshrc file in your home directory, you will not have to type them again.

If you are running sh/bash:

export DOT_ROOT=/usr/local/dot2
source $DOT_ROOT/bin/share/dot2.setup.bash

If you put these two lines into the .login or .bashrc file in your home directory, you will not have to type them again.

To check that DOT can now be located, type:

rundot --help

You should see a single line giving a “Usage:” message. If not, check that DOT_ROOT matches the location where the DOT distribution is installed by typing

ls $DOT_ROOT

which gives a listing that should include the directories bin, data, src, and examples, among others.
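The same check can be scripted. The following is a minimal sketch, assuming only what the text above states (that a DOT installation contains the directories bin, data, src, and examples); check_dot_root is a hypothetical helper name and is not part of the DOT distribution.

```sh
# Sketch only: check_dot_root is a hypothetical helper, not a DOT utility.
# It tests the directory given as $1, or $DOT_ROOT when no argument is given.
check_dot_root() {
    root="${1:-${DOT_ROOT:-}}"
    if [ -z "$root" ]; then
        echo "DOT_ROOT is not set" >&2
        return 1
    fi
    status=0
    # The guide lists bin, data, src, and examples among the expected directories.
    for d in bin data src examples; do
        if [ ! -d "$root/$d" ]; then
            echo "missing directory: $root/$d" >&2
            status=1
        fi
    done
    return $status
}
```

After sourcing the setup file, running check_dot_root with no arguments prints nothing and returns success when the listing looks as described.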

2.B Copy the tutorial files to your directory

Make a new directory, for example “testdot”, and copy the DOT tutorial files to it:

mkdir testdot
cd testdot
dot_tutorial

This puts the tutorial files in two directories named “test-rundot” and “test-prepscript”.

2.C Run DOT with prepared input files

You can now do a short DOT run to check that all is well. Type the two lines:

cd test-rundot
rundot udgugi.deg72.nb0.parm

This will start a DOT run on your computer, docking udg (uracil-DNA glycosylase) with ugi (an inhibitor protein), using a coarse (thus fast) 72-degree rotation increment.

You will see a page or two of log file output, followed by an opportunity for you to record an entry in a lab notebook file named “dotruns”:

Result for dotruns log file (Type control-D to end)

You can then type remarks, such as

first example run, took 2 minutes

and then type ENTER (or RETURN), then hold down “control” and type “d” on a line by itself to complete your entry.

The DOT output for all runs started in a given directory goes into a subdirectory named “runs”. Rundot creates a subdirectory in “runs” starting with a numerical time-stamp and ending with “udgugi.deg72.nb0”, taken from the name of the .parm file. For example, by doing the DOT run on October 1, 2007 at 7:32 PM, we got the directory name “20071001.1932.udgugi.deg72.nb0”. If you go there, you will find the directory “pdb/ugi.top30”, which contains PDB files of the ugi inhibitor:

cd runs/20071001.1932.udgugi.deg72.nb0/pdb/ugi.top30

These PDB files of ugi are relative to the udg coordinates in the file ‘udg.cen.noh.pdb’. Examine the file ‘udg.cen.noh.pdb’ and the ugi files in “pdb/ugi.top30” with your favorite molecular visualization program to see the 30 top-ranked complexes found by DOT.
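Because run directory names begin with a time-stamp, the most recent run for a given parameter file can be located without typing the stamp. The following is a sketch under that assumption; latest_run is a hypothetical helper name, not a DOT utility, and it relies only on the naming pattern shown in the example above.

```sh
# Sketch only: latest_run is a hypothetical helper, not a DOT utility.
# Print the newest directory under "runs" whose name ends with the given
# .parm base name, for example: latest_run udgugi.deg72.nb0
latest_run() {
    base="$1"
    newest=""
    for d in runs/*."$base"; do
        [ -d "$d" ] || continue   # skip the literal pattern when nothing matches
        newest="$d"               # glob results are sorted, so the last match is the latest stamp
    done
    [ -n "$newest" ] && echo "$newest"
}
```

For example, from the directory where rundot was started, cd "$(latest_run udgugi.deg72.nb0)/pdb/ugi.top30" would take you to the PDB files of the latest 72-degree run.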

2.D Prepare DOT input files using supplied PDB files

Next, you will prepare the DOT input files, starting with two PDB files we supply in the distribution.

2.D.1 Make sure necessary programs are installed

You need three programs, MSMS, APBS, and Reduce, to prepare your molecules for DOT, although DOT itself runs without them. First, we will check to see if these three programs are already available. Type

which msms
which apbs
which reduce

We supply copies of APBS and Reduce in the distribution, in $DOT_ROOT/bin/$ARCHOSV, so if your computer platform is one of the ones supported in the current DOT distribution, the path to these two programs should print out after the “which” command.

We do not supply a copy of MSMS, so you will need to download it if you do not already have it. See Chapter 8 of this manual for instructions. Once you have installed MSMS (and APBS and Reduce if you do not use the ones in the DOT distribution), you must add its location to your shell PATH. For example, if MSMS is installed in /usr/local/bin, type:

For csh/tcsh:

set path=(/usr/local/bin $path)

For sh/bash:

export PATH="/usr/local/bin:$PATH"
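The three lookups can be combined into one check that also reports what is missing. This is a minimal sketch for sh/bash; check_programs is a hypothetical helper name, and it uses only the standard command -v builtin.

```sh
# Sketch only: check_programs is a hypothetical helper, not a DOT utility.
# It reports, for each program name given, whether it is on the current PATH.
check_programs() {
    missing=0
    for prog in "$@"; do
        if command -v "$prog" >/dev/null 2>&1; then
            echo "found:   $prog -> $(command -v "$prog")"
        else
            echo "missing: $prog"
            missing=1
        fi
    done
    return $missing
}
```

Calling check_programs msms apbs reduce returns success only when all three programs can be found.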

2.D.2 Run “prepscript”

Go to the supplied prepscript test directory:

cd testdot/test-prepscript

Then go to the subdirectory coords:

cd coords
ls

In the coords directory, there are two PDB files, udg.pdb and ugi.pdb, and the ready-to-run script, ‘prepscript’. Now, run prepscript, keeping a log file:

For csh:

./prepscript |& tee prepscript.log

For bash:

./prepscript 2>&1 | tee prepscript.log

This will take several minutes and generate a few pages of log file output, including a few warning messages you can disregard. The last few lines of the output should resemble:

Stationary potential clamp high will be 1.8275
Stationary potential clamp low will be -1.7861
Stationary molecule files are complete
/test-prepscript/coords
Generating sample DOT parm file udgugi.zero-rotation.nb0.parm
Generating sample DOT parm file udgugi.deg72.nb0.parm
Generating sample DOT parm file udgugi.deg06.nb0.parm
Generating sample DOT parm file udgugi.deg06.nb10.parm
To remove all files normally created by prepscript, type
rm *.parm *.cen.* *.center.* *.minmax *.log *.com

DOT input files for the stationary molecule will be in the directory coords/udg; files for the moving molecule will be in coords/ugi. DOT parameter files (.parm) will be created in the parent ‘test-prepscript’ directory. If you like, you can go up to that directory now and run DOT using the DOT input files you just prepared:

cd ..
rundot udgugi.deg72.nb0.parm

Chapter 3

DOT Quick Start Guide

DOT, its utilities, and the programs MSMS, APBS, and Reduce must already be installed; see Chapter 8 for instructions.

3.A Set up user environment

Goal: The goal of this section is to make the DOT program, DOT utilities, and the auxiliary programs MSMS, APBS, and Reduce available to the user.

3.A.1 $DOT_ROOT

To find DOT and its utilities, the DOT_ROOT environment variable must be set to where DOT is installed. For example, if DOT is installed in “/usr/local/dot2”, type:

For csh/tcsh:

setenv DOT_ROOT /usr/local/dot2

For sh/bash:

export DOT_ROOT=/usr/local/dot2

Your path must be set to access DOT utilities, scripts, and data:

For csh/tcsh: source $DOT_ROOT/bin/share/dot2.setup.csh
For sh/bash: source $DOT_ROOT/bin/share/dot2.setup.bash

What is a “shell” and what shell am I running? The “shell” is the program that runs programs whose names you type in: you type “date” and the shell runs a program by that name that prints the date and time. Different people prefer different shell programs; we support the “bash” family (sh, bash) and the “csh” family (csh, tcsh). To find out what shell you are running, type

echo $SHELL

If you see “/bin/sh” or “/bin/bash”, use the instructions “for bash”. If you see “/bin/csh” or “/bin/tcsh”, use the instructions “for csh”.

3.A.2 APBS, Reduce, MSMS

You need the three programs APBS, Reduce, and MSMS to prepare your molecules for DOT, although DOT itself runs without them. If you have successfully run the tutorial (Chapter 2), these programs are already in your shell PATH. To verify, type:

which apbs
which reduce
which msms

Copies of APBS and Reduce are supplied in the DOT distribution in $DOT_ROOT/bin/$ARCHOSV, so if your computer platform is one of the ones supported in the current DOT distribution, the path to these two programs should print out after the “which” command. If MSMS is not installed, see Section 8.D.2 of this manual for instructions. After installing MSMS, unless you copied it into $DOT_ROOT/bin/$ARCHOSV/msms, you must add its location to your shell PATH. For example, if MSMS is installed in /usr/local/bin, type:

For csh/tcsh: set path=(/usr/local/bin $path)
For sh/bash: export PATH="/usr/local/bin:$PATH"

3.B Set up your molecular system

Goal: In this section, you create a directory structure for your project, and generate and run the scripts necessary to create the input files for DOT.

3.B.1 Create Working Directories

To create the directory structure for your project, first create a main directory, and under this a subdirectory “coords”. For example, if the project directory is named “projdir”, type:

mkdir projdir
cd projdir
mkdir coords

3.B.2 Select the stationary and moving molecules

Copy the coordinates (in PDB format) of the two molecules to be docked to the “coords” subdirectory of your project directory. You must decide which molecule will be stationary and which molecule will be moved about the stationary molecule. Generally, the molecule with the largest dimension should be the stationary molecule and the smaller molecule should be the moving molecule. This is computationally most efficient (see Section 4.D), resulting in both a smaller grid size and fewer orientations of the moving molecule to get reasonable rotational sampling. Throughout this manual, stat.pdb will be the stationary molecule and mov.pdb will be the moving molecule. Typically we use names from the PDB, like 1cco.pdb. If you are only using one molecule from a PDB file that contains multiple copies of the molecule, edit the file in “coords” so that it contains only the single molecule.

So now we have

projdir/coords/stat.pdb
projdir/coords/mov.pdb
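When a PDB file holds several copies of a molecule, the edit described above can often be done by chain ID instead of by hand. The following is a sketch under the assumption that the copies are distinguished by chain ID; extract_chain is a hypothetical helper name, not a DOT utility. It relies only on the fixed-column PDB layout (record name in columns 1-6, chain ID in column 22) and keeps ATOM and HETATM records only.

```sh
# Sketch only: extract_chain is a hypothetical helper, not a DOT utility.
# Usage: extract_chain CHAIN_ID input.pdb > output.pdb
extract_chain() {
    awk -v chain="$1" '
        # PDB fixed columns: record name in columns 1-6, chain ID in column 22.
        (substr($0, 1, 4) == "ATOM" || substr($0, 1, 6) == "HETATM") && substr($0, 22, 1) == chain { print }
        END { print "END" }
    ' "$2"
}
```

For example, extract_chain A 1cco.pdb > stat.pdb would keep only chain A. Inspect the result before using it, since waters or ligands assigned to that chain are kept as well.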

3.B.3 Create prepscript for your molecular system

In the coords subdirectory, create a copy of prepscript customized for your molecular system by

gen_dot_prepscript -m movingpdb -s stationarypdb [-d griddim] [-l residue_library] [-o prepscript]

where “movingpdb” and “stationarypdb” are the PDB files for your two molecules (required), in our example mov.pdb and stat.pdb. By default, gen_dot_prepscript will generate a prepscript that figures out a reasonable size for the grid, uses the default residue library, and is named “prepscript”. The optional flags can be used to override these defaults.

Example 1: I want to make prepscript for molecules stat.pdb and mov.pdb, let prepscript determine the grid size, and use the default residue library.

gen_dot_prepscript -m mov.pdb -s stat.pdb

Example 2: I want to make prepscript for molecules stat.pdb and mov.pdb, use a 128³ grid with 1 Å spacing, and use my library /home/me/misc/myreslib.rlb.

gen_dot_prepscript -m mov.pdb -s stat.pdb -d 128 -l /home/me/misc/myreslib.rlb

Both examples create a new file “prepscript” in the coords directory, in our case

projdir/coords/prepscript

that is customized for the input PDB files mov.pdb and stat.pdb.

3.B.4 Run prepscript

To run prepscript, keeping a log file, do:

For bash: ./prepscript 2>&1 | tee prepscript.log

For csh: ./prepscript |& tee prepscript.log

For an outline of the steps performed by prepscript and a detailed description of each step, see Chapter 4.

3.B.5 Files created by prepscript

Prepscript creates two subdirectories in “coords”, with names based on your molecules. Each directory contains the needed DOT input files for each molecule. In addition, parameter files for running DOT will be put in the main project directory.

In our example with stat.pdb and mov.pdb, where prepscript determined we should use a grid 128 Å on each side (1 Å grid spacing) and the ionic strength for the electrostatic potential calculation is 150 mM (default), the following DOT input files are created:

1. Stationary molecule files, in projdir/coords/stat

(a) stat.128.150m.dx – the electrostatic potential of the stationary molecule, where the general name of the file is molname.[griddim].[ionic strength].dx.

(b) stat.cen.noh.xyzcrv – the shape potential of the stationary molecule, based on heavy atoms only.

(c) stat.cen.noh.pdb – the PDB file of the stationary molecule, heavy atoms only, for evaluation.

2. Moving molecule files, in projdir/coords/mov

(a) mov.cen.polh.xyzq – partial atomic charges for the moving molecule, including polar hydrogen atoms.

(b) mov.cen.noh.xyz – shape of moving molecule, represented by the atomic centers of heavy atoms only.

(c) mov.cen.noh.pdb – the PDB file of the moving molecule, heavy atoms only, for evaluation.

3. Parameter files, in projdir

(a) statmov.zero-rotation.nb0.parm – Single orientation for mov.pdb, no penetrations of mov into stat allowed.

(b) statmov.deg72.nb0.parm – 72° orientational search (60 orientations) for mov.pdb, no penetrations of mov into stat allowed.

(c) statmov.deg06.nb0.parm – 6° orientational search (54,000 orientations) for mov.pdb, no penetrations of mov into stat allowed.

(d) statmov.deg06.nb10.parm – 6° orientational search (54,000 orientations) for mov.pdb, up to 10 penetrations of mov into stat allowed.

These are the files that DOT uses. Additional files are made in the molecule subdirectories, generated as prepscript proceeds; see Section 4.G.
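Since the contents list notes that an empty molecular description file is the most common problem, a quick check that every file listed above exists and is non-empty can save a failed run. The following is a sketch that uses only the file names given in this section; check_dot_inputs is a hypothetical helper name, not a DOT utility, and it must be run from the main project directory.

```sh
# Sketch only: check_dot_inputs is a hypothetical helper, not a DOT utility.
# Usage (from the project directory): check_dot_inputs stat mov
check_dot_inputs() {
    s_name="$1"; m_name="$2"; bad=0
    for f in \
        "coords/$s_name/$s_name.cen.noh.xyzcrv" \
        "coords/$s_name/$s_name.cen.noh.pdb" \
        "coords/$m_name/$m_name.cen.polh.xyzq" \
        "coords/$m_name/$m_name.cen.noh.xyz" \
        "coords/$m_name/$m_name.cen.noh.pdb"
    do
        # -s is true only for files that exist and are not empty
        if [ ! -s "$f" ]; then echo "missing or empty: $f"; bad=1; fi
    done
    # The .dx name embeds grid size and ionic strength, so match it by pattern.
    set -- "coords/$s_name/$s_name".*.dx
    if [ ! -s "$1" ]; then echo "missing or empty: coords/$s_name/$s_name.*.dx"; bad=1; fi
    return $bad
}
```

The function prints one line per problem file and returns a non-zero status if any is found, so it can gate a later rundot command.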

3.B.6 Most likely problems

The prepscript script should run with no intervention needed IF

1. The two molecules are proteins with standard residues. The programs in prepscript can handle the various protonation states of His and Cys (free and disulfide).

2. Both molecules begin with residue ID 1 (any chain ID is okay), and you want each N-terminus to be a positively charged amino group.

3. The checks done in prepscript are appropriate for your system; in particular, the total charge is not huge.

Why? These problems ALL revolve around the need to assign atomic charges and molecular radii for the stationary molecule to do the electrostatic potential calculations and to assign atomic charges for the moving molecule. The program Reduce, which adds hydrogen atoms depending on the local environment, handles standard amino acid residues. The default DOT residue library contains atomic charges and molecular radii for standard amino acids, including the typical protonation states of His, Cys free or in a disulfide, and charged N- and C-termini. Prepscript assigns a unique residue name for each uniquely charged and protonated state of each standard amino acid. If a residue does not appear in the DOT residue library, prepscript will stop. Prepscript will also stop if the total charge on a molecule is not an integer. This is most likely to happen when a known residue type is not properly protonated. This check is a KEY check that you are building your system properly.

Residues other than standard amino acids. Appropriate charges are also available for HEM and DNA residues; the libraries can be customized to handle new residues (see Section 4.I.8). The program Reduce can also handle DNA residues, and may also add hydrogen atoms properly to unusual functional groups or modified amino acid residues (see Section 4.I.4).

N-terminal residues of a protein. If the N-terminal residue does not have a residue ID of 1, see Section 4.I.4. You must decide if you want it neutral (beginning with a standard residue, which has a single hydrogen atom on the N atom) or positively charged. The most likely problem is a missing hydrogen atom at the beginning of a peptide chain when you intended the terminus to be neutral. If the total charge is off by 0.248, suspect a missing main chain amide hydrogen atom!

Prepscript checks that may not apply to your particular system. One example where checks done in prepscript may not be appropriate for your system is if the total charge on one molecule is very large, which is quite likely for DNA (see Section 4.M).

3.C Run DOT

Goal: To run DOT and examine DOT output.

3.C.1 Running DOT

To run the program DOT, first go to the main directory of your project. For our example “projdir”:

cd projdir

The sample DOT parameter files (.parm) made by prepscript are in this directory. The command to run DOT is:

rundot parameterfile [-h hostfile] [optional comments]

The parameter file is required. If no hostfile is specified, DOT will run on one processor on the computer that you are logged into. If your optional comments contain special characters, you should surround them with single quotes. Examples are given below.

For each new molecular system, we STRONGLY suggest first running a test case that uses one processor and does only a single rotation for the moving molecule. This will check that DOT found the input files, that $DOT_ROOT is set properly, and that DOT is running okay. For our project, starting with stat.pdb and mov.pdb, the command to run a single rotation of the moving molecule is:

rundot statmov.zerorot.nb0.parm

For your system, the .parm files will have your molecule names, as made by prepscript. This runs DOT on one processor, the one you are logged into.

At the end of the run, you are prompted to enter any additional comments, ending with control-D.

3.C.2 Running DOT on multiple computers/processors

For your project, you will want to do a full rotational search. You could do this on a single processor, but the run can be done much quicker on multiple computers or processors. DOT is designed to run very efficiently on multiple computers in parallel using MPI (Message Passing Interface). All of the computers that you use must have access to the appropriately compiled DOT program. Each computer must also have access to your project directory. Furthermore, you must be able to either “rsh” or “ssh” to each computer. Getting ssh or rsh access may require help from a systems administrator. See Chapter 5, Section 5.D.2.

ALERT! If this is the first time you have used DOT with multiple computers or processors, see Section 3.C.5 below before running a time-consuming production run!

To run on multiple computers, create a text file in your project directory that is a list of the computers, one name per line. For example, for 3 computers I make a file called ‘myhosts’:

dopey
sleepy
grumpy

If I want to use multiple processors on a single computer, I put one line for each processor. For example, if I have an 8-processor computer dopey and I want to use 3 processors, my “myhosts” file would contain

dopey
dopey
dopey
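Because the hostfile is just one name per line, it can be generated instead of typed by hand. The following is a minimal sketch only: the helper name `make_hostfile` and its `HOST:COUNT` argument form are hypothetical and are not part of the DOT distribution.

```shell
#!/usr/bin/env bash
# make_hostfile is a hypothetical helper, not a DOT utility.
# Usage: make_hostfile OUTFILE HOST:COUNT [HOST:COUNT ...]
# Writes each HOST on COUNT separate lines, one line per processor to use.
make_hostfile() {
    local out=$1 spec host count i
    shift
    : > "$out"                      # start with an empty file
    for spec in "$@"; do
        host=${spec%%:*}            # text before the colon
        count=${spec##*:}           # text after the colon
        for ((i = 0; i < count; i++)); do
            printf '%s\n' "$host" >> "$out"
        done
    done
}

# Example (three processors on dopey, as in the text):
#   make_hostfile myhosts dopey:3
```

Every argument must carry an explicit count, for example `sleepy:1` for a single processor on sleepy.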

For our big production run, we want to sample the rotational search for the moving molecule with a fineness equivalent to that of the translational search. The number of orientations needed is dependent on the size of the moving molecule. We have found that the 6 degree search is appropriate for globular proteins with up to 120 residues (diameter of up to 40 Å). Prepscript creates two .parm files that do a 6 degree rotational search: in one, no atoms of the moving molecule may penetrate the interior of the stationary molecule; in the other, up to 10 moving molecule atoms are allowed to penetrate the stationary molecule. For our system starting with stat.pdb and mov.pdb, to allow no penetrations (or bumps) by the moving molecule, type:

rundot statmov.deg06.nb0.parm -h myhosts 'my 0 bump production run'

This DOT run will do 54,000 evenly spaced orientations of the moving molecule, which should take a few hours on 5 to 10 processors. For a finer rotational search, examine your .parm file.

To do the same rotational search, but allow up to 10 atoms of the moving molecule to penetrate the stationary molecule, do:

rundot statmov.deg06.nb10.parm -h myhosts 'my 10 bump production run'

Allowing penetrations (bumps) can be useful for unbound molecules when some structural rearrangement occurs in the complex.

3.C.3 Where DOT output goes

Each invocation of “rundot” in projdir creates a subdirectory in projdir/runs in this format:

projdir/runs/DATE.TIME.prefix_from_parm_file

DOT output is put in this subdirectory. For example, for ‘my 10 bump production run’ above, started on August 2, 2007 at 10:20 AM, output will be in

projdir/runs/20070802.1020.statmov.deg06.nb10

The directory name is constructed from the date and time the run was started, the names of the molecules used, the number of orientations applied to the moving molecule, and the number of bumps allowed. The last three fields are taken directly from the name of the .parm file, in this case statmov.deg06.nb10.parm.
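The naming pattern can be reproduced in a few lines of shell, which is handy when a script needs to predict or search for a run directory. This is a sketch only: `run_dir_name` is a hypothetical helper, and it assumes that the prefix is simply the .parm file name with its `.parm` suffix removed, as the example above suggests.

```shell
#!/usr/bin/env bash
# run_dir_name is a hypothetical helper that mimics the naming pattern
# described in the text; it is not part of rundot.
# Usage: run_dir_name PARMFILE [YYYYMMDD.HHMM]
run_dir_name() {
    local parm=$1
    local stamp=${2:-$(date +%Y%m%d.%H%M)}   # default: current date and time
    local prefix
    prefix=$(basename "$parm" .parm)          # drop any directory and the .parm suffix
    printf 'runs/%s.%s\n' "$stamp" "$prefix"
}

run_dir_name statmov.deg06.nb10.parm 20070802.1020
```

With the arguments shown, the function prints the same relative path as the August 2, 2007 example.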

In your project directory, the file

dotruns

is created, which is a cumulative log of all DOT runs started in projdir. This file logs the directory names and your comments at the beginning and end of each run. Very handy!

3.C.4 DOT output and evaluation

The basic output of DOT, put into the runs/DATE.TIME.prefix_from_parm_file output directory, is an .e6d file. For our example starting with stat.pdb and mov.pdb:

statmov.top2000.e6d

This file contains the information for generating the 2000 (default) top-ranked placements of the moving molecule.

These placements are relative to the centered coordinates of the stationary molecule, in our case stat.cen.noh.pdb, which is copied into the output directory from projdir/coords/stat.

“Rundot” also performs a preliminary evaluation of the DOT run by running the script “evaluate_dot_run”, which processes the information in the .e6d file. The evaluation includes creating PDB files of the 30 top-ranked placements of the moving molecule in the subdirectory “pdb/mov.top30”. See Chapter 6 for a detailed description of the evaluation performed by “evaluate_dot_run” and further evaluations that you can do.

3.C.5 Testing DOT on multiple computers/processors

If this is the first time you have run DOT on multiple machines, we suggest that you first test parallel processing on a small DOT run. Prepscript makes a parameter file for doing just 60 rotations of the moving molecule. For your test run, do:

rundot statmov.deg72.nb0.parm -h hosts 'test of multiple processors'

This will test that all of the machines can access an appropriately compiled DOT program and its utilities, that all machines can access your project directory, and that you can either “rsh” or “ssh” to each computer.

Bug Warning: currently the number of computers must be no more than the number of orientations of the moving molecule. Since you probably have only a dozen or so computers available and usually have hundreds or thousands of orientations, this is not likely to be a problem except for the single-rotation test case.
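A quick pre-flight check can catch this limit before a run is launched. The sketch below is illustrative only: `check_host_count` is a hypothetical helper, the number of orientations is supplied by hand (60 for the 72 degree set, 54,000 for the 6 degree set), and it assumes that every non-blank line of the hostfile counts as one computer or processor.

```shell
#!/usr/bin/env bash
# check_host_count is a hypothetical pre-flight check, not a DOT utility.
# Usage: check_host_count HOSTFILE N_ORIENTATIONS
check_host_count() {
    local hostfile=$1 norient=$2 nhosts
    # Count non-blank lines; each is assumed to be one processor entry.
    nhosts=$(grep -c '[^[:space:]]' "$hostfile")
    if [ "$nhosts" -gt "$norient" ]; then
        echo "too many hosts: $nhosts entries for $norient orientations" >&2
        return 1
    fi
    echo "ok: $nhosts entries for $norient orientations"
}

# Example: check_host_count myhosts 60
```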

Chapter 4

Prepscript - creating input files for DOT

Goal: This chapter gives detailed descriptions of the steps needed to set up your molecular system for DOT docking runs. The scripts described in this section are

• gen_dot_prepscript – generates a prepscript customized for your system

• prepscript – creates molecular input files and parameter files for DOT

and the utilities and scripts called by these.

4.A Outline of setup steps

1. Accessing DOT scripts and utilities and auxiliary programs

2. Examine the molecule structures for completeness

3. Select stationary and moving molecules

4. Set up directory structure

5. Set up customized library for new functional groups

6. Create prepscript: gen_dot_prepscript

7. prepscript general features

(a) Tools that prepscript uses

(b) Assigns systematic naming to needed files: cen = centered, noh = no hydrogen atoms, polh = with polar hydrogen atoms, allh = with all hydrogen atoms

(c) Assigns file suffixes to indicate file type

(d) Pause command for user intervention

8. Prepscript: steps common to both molecules – dot2-prep-mol-common

(a) Define an acceptable range for the total charge on each molecule

(b) Remove waters, H atoms, alternate locations from PDB files

(c) Center both molecules with no H atoms (cen.noh) [pdb_make_centered]

(d) Add hydrogen atoms to PDB coords using Reduce (cen.allh)

(e) Create files with heavy atoms and polar H atoms with RES names to match library (cen.polh) [pdb_renameres_by_hydrogens, striph]

(f) Moving molecule: Create shape file (.xyz) based on heavy atoms

(g) Moving molecule: Create partial atomic charge file (.xyzq)

9. Prepscript: Calculate diameters of molecules to determine grid size

10. Prepscript: Stationary molecule

(a) Stationary molecule: Create electrostatic potential (.dx)

(b) Create APBS command and parameter files

(c) Run APBS

(d) Stationary molecule: Create shape potential (.xyzcrv)

(e) Run MSMS (heavy atoms only) to define excluded and favorable volumes.

(f) Determine electrostatic clamping values

11. Prepscript: Create parameter files for DOT run (.parm)

4.B Accessing DOT scripts and utilities and auxiliary programs

It is assumed that you have set up your user environment to access

• DOT scripts and utilities – see Chapter 2, section 2.A

• the programs APBS, Reduce, and MSMS – see Chapter 2, section 2.D.1. Note the DOT distribution does not include MSMS.

If you successfully ran the tutorials (Chapter 2), your user environment includes the environment variable $DOT_ROOT and the following directories are in your path. Platform-independent bash and csh scripts are in:

$DOT_ROOT/bin/share

Compiled C/C++ utilities are in:

$DOT_ROOT/bin/$ARCHOSV

where $ARCHOSV is replaced by your platform/operating system name, which is determined by the script $DOT_ROOT/bin/share/archosv. If you are using the APBS and Reduce programs distributed with DOT, they are in

$DOT_ROOT/bin/$ARCHOSV

If you followed the installation instructions for MSMS (Chapter 8, Section 8.D.2), MSMS will also be in

$DOT_ROOT/bin/$ARCHOSV

The directory

$DOT_ROOT/data

includes residue atomic charge libraries (.rlb), rotation sets that are applied to the moving molecule (.eul), auxiliary files for MSMS (calculation of molecular surfaces) and ACE (for evaluation), and sample files.

4.C Examine and complete starting PDB files

Prepscript, the script that creates the DOT input files, takes as input two starting PDB files, each with a single copy of a macromolecule. If multiple copies of the macromolecule are present in the asymmetric unit of a crystallographic file, the user must decide which set of coordinates is to be used for docking and create an input PDB file that contains just that molecule. The molecule can include multiple chains. Prepscript will remove water molecules, hydrogen atoms, and multiple positions of a residue in the PDB files unless the user explicitly does not want this to happen and edits prepscript appropriately. Each residue in the PDB file must include all of the heavy (non-hydrogen) atoms that correspond to that residue in the residue library. It is okay if the macromolecule is missing an entire residue or a region consisting of many residues, such as missing residues at the N- or C-termini or disordered loop regions that do not appear in the PDB file.

4.C.1 Missing atoms in side chains.

If a PDB file lacks the full set of heavy atoms that define a residue, the PDB curators usually indicate this in REMARK 470, MISSING ATOMS. The heavy atoms must match those associated with the residue name, hence the user has two choices: (1) build the full side chain or (2) rename the residue to match the existing atoms. For example, given a lysine residue with atoms only to Cγ, the user could either build the full Lys side chain by adding atoms Cδ, Cε, and Nζ, or could remove the Cγ atom, leaving only the Cβ atom of the side chain, and rename the residue “ALA”. In this case, the advantage of building the full side chain is that the correct total charge is retained. The disadvantage is that the side chain was probably disordered, and may have multiple, similar-energy conformations. Generally, we build the full side chain. Building side chains is outside the scope of the DOT program suite, but proprietary programs, such as InsightII (Accelrys), and open source tools, such as the Swiss-Pdb-Viewer [5], can be used to do this task. Since residues with disordered atoms are usually on the surface of the protein, they can affect the shape and electrostatic properties of interaction surfaces and hence should be built carefully. For example, added atoms should be checked for steric clashes with other atoms in the molecule.

If the missing heavy atoms are not added, prepscript will find that the protein’s total charge is not an integer, report the error, and stop.

4.C.2 Missing loops and chain ends

Missing residues are often indicated in the PDB file as REMARK 465, MISSING RESIDUES. In general, we do not build in missing residues for rigid-body docking. The user may decide that missing regions need to be built to make a complete model, but these regions usually are missing because they are disordered and probably have multiple conformations. If the missing loop or N- or C-termini regions are not built, there are two considerations. First, the resulting termini should be uncharged so that spurious charged groups, which can significantly affect the overall charge distribution, are not introduced. One way the user can approach this is to leave the termini as standard amino acids. This results in incomplete covalent bonding for the terminal main-chain atoms, N at the N-terminus and C at the C-terminus, which is not a problem for the computational docking. Alternatively, the user could use a molecular modeling program to build acetyl (ACE) groups onto N-termini or N-methyl groups (NME) onto C-termini, also leaving both ends neutral. Second, missing regions, especially those at the N- and C-termini, can indicate surface regions where an incoming molecule is unlikely to bind. Docked configurations found by DOT that contact these incomplete termini may be eliminated as likely complexes.

4.D Assigning the stationary and moving molecules

In the DOT docking calculation, one molecule is stationary and the other molecule is moved about it. There are several criteria for selecting which molecule will be stationary and which will be moved. (1) The computational time required for a translational and orientational search of a given fineness. The computational time is dependent on the size of the grid and the number of orientations applied to the moving molecule. (2) Suitability or convenience of the potentials used to describe the molecules, since the stationary and moving molecule potentials are quite different from each other. (3) Other factors specific to the molecular system.

4.D.1 Computation time.

Computational time for a given resolution of the search is shortest when the larger molecule is stationary (S) and the smaller molecule is assigned as moving (M). The computational time is dependent on two factors: the number of points in the grid and the number of orientations applied to the moving molecule.

The grid representing S is repeated in all directions (periodic boundary conditions). Given the default grid spacing used by DOT of 1 Å, grids that are 32, 64, 96, 128, 160, 192, and 224 Å on each side are the most efficient for the DOT algorithm. The goal is to select the minimum grid size where M does not see adjacent copies of S or their properties. A good rule of thumb is that when M contacts S, M lies entirely inside the grid. A generous measure of this is that the grid dimension should be larger than $D_S + 2D_M$, where $D_S$ and $D_M$ are the largest diameters for the two molecules. $D_S + 2D_M$ will be smaller when the larger molecule is assigned as S.

The properties of the molecules may be a factor in choosing the size of the grid. For example, if S creates a strong electrostatic potential, there may be significant values at the edges of the grid. This is most likely when S is highly charged or has a large region with a high concentration of residues with similar charge. When M is centered on a grid point distant from S, M may see electrostatic potential from copies of S that distorts the electrostatic energy. In these cases, a grid size larger than $D_S + 2D_M$ may be worth considering.
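The grid-size rule of thumb can be turned into a small calculation. The sketch below is not how prepscript chooses the grid; `pick_grid_size` is a hypothetical helper that assumes 1 Å spacing and reads the rule as "grid edge larger than the stationary diameter plus twice the moving diameter", then returns the smallest of the efficient sizes listed above that satisfies it.

```shell
#!/usr/bin/env bash
# pick_grid_size is a hypothetical helper, not part of prepscript.
# Assumptions: 1 A grid spacing; grid edge must exceed D_S + 2*D_M.
# Usage: pick_grid_size D_STATIONARY D_MOVING   (diameters in Angstroms)
pick_grid_size() {
    awk -v ds="$1" -v dm="$2" 'BEGIN {
        need = ds + 2 * dm
        n = split("32 64 96 128 160 192 224", sizes, " ")
        for (i = 1; i <= n; i++) {
            if (sizes[i] > need) { print sizes[i]; exit 0 }
        }
        print "no listed grid size exceeds " need " A" > "/dev/stderr"
        exit 1
    }'
}

pick_grid_size 60 30    # larger molecule stationary
pick_grid_size 30 60    # same pair with the roles swapped
```

Swapping the roles of a 60 Å and a 30 Å molecule moves the requirement from 120 Å to 150 Å, which is why assigning the larger molecule as S gives the smaller grid.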

The computational time is linearly dependent on the number of orientations applied to the moving molecule. For a given set of orientations, the rotational space of the moving molecule will be more finely sampled for a smaller molecule. For example, for two molecules, one with twice the diameter of the other, the larger molecule would require 8 times the orientations to give the same orientational sampling of the smaller molecule.

4.D.2 Suitability of potentials.

The potentials of S are calculated once, whereas the potentials of M must be recalculated for each orientation of M. Since the potentials of S are calculated only once, they can be quite detailed and computationally intensive. The shape potential of S requires calculation of molecular surfaces and determination of the volumes of the grid that represent the excluded and favorable regions. The electrostatic potential of S is calculated by Poisson-Boltzmann methods, which take into account ionic strength effects and the dielectric boundaries between solute and solvent. Both the shape and electrostatic potential calculations for the stationary molecule take several minutes. For computational feasibility, the shape and electrostatic properties of the moving molecule must be rapidly calculated. In DOT, the shape is represented by the atomic coordinates of all non-hydrogen atoms and the charge distribution by point charges at the atomic coordinates, including polar hydrogen atoms. Both are rapidly mapped onto the grid for each orientation. If the user wants one molecule to be defined in more detail than the other, that molecule should be assigned as S.

We have found the reverse case when a fragment of double-stranded DNA is used to represent part of a long DNA strand [7, 8]. The DNA fragment works best as M, where its charge distribution is represented by atomic point charges. When the electrostatic potential for a DNA fragment is calculated by Poisson-Boltzmann methods, the greater solvent accessibility of the ends creates a nonuniform electrostatic potential along the DNA strand. Near the ends of the fragment, the electrostatic potential is neutral; only in the center of the fragment is the potential due to the phosphate atoms constant. Thus the electrostatic potential of the DNA fragment is highly dependent on its length. When the DNA fragment is represented as point charges, the charge distribution is consistent over the full fragment.

4.D.3 Other factors.

One protein environment may be more suitably represented as either S or M. For example, if one molecule is embedded in a membrane environment, this could be included as part of the description of the stationary molecule.

4.E Set up directory structure with starting coordinates

First, create a main directory for your project, then create a subdirectory “coords”. For example, if we decide the main project directory will be named “projdir”, we move to the directory where “projdir” is to be created and type:

mkdir projdir
cd projdir
mkdir coords

This makes the directory structure:

projdir/coords/

Put the prepared PDB files of the moving and stationary molecules in “projdir/coords”. For example, if our starting PDB files are stat.pdb (stationary molecule) and mov.pdb (moving molecule), we have

projdir/coords/stat.pdb
projdir/coords/mov.pdb

You will probably want to name these files according to the molecules within them, such as 1udg.pdb and 3ugi.pdb.

What characters can my file names use? DOT and Prepscript file names are allowed to contain only printable ASCII characters and may not contain spaces. Because of conventions that some of the tools use, avoid using any =, $, *, ?, or & characters in file names. You are OK using letters, numbers, periods, commas, underscores, hyphens, and any of !, @, #, or %.
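These naming rules are easy to check before prepscript is generated. The following sketch is an illustration only: `is_safe_name` is a hypothetical helper, not a DOT utility, and it tests exactly the rules stated above (printable ASCII, no spaces, none of =, $, *, ?, or &).

```shell
#!/usr/bin/env bash
# is_safe_name is a hypothetical checker for the file-name rules in the text.
# Returns 0 if NAME is acceptable, 1 otherwise.
is_safe_name() {
    local name=$1
    [ -n "$name" ] || return 1
    # Reject spaces, control characters, and anything outside printable ASCII.
    if printf '%s' "$name" | LC_ALL=C grep -q '[^!-~]'; then
        return 1
    fi
    # Reject the characters the guide says to avoid: = $ * ? &
    if printf '%s' "$name" | grep -q '[$*?&=]'; then
        return 1
    fi
    return 0
}

# Example: is_safe_name 1udg.pdb && echo "1udg.pdb is fine"
```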

4.F Set up customized library for new functional groups

4.F.1 Standard residue library for proteins

The default residue library is

$DOT_ROOT/data/uhbd.amber84.prot.rlb

This residue library contains atomic partial charges and radii for

• the 20 amino acid residues as part of a peptide chain,

• the 20 amino acid residues as N-terminal residues beginning a peptide chain,

• the 20 amino acid residues as C-terminal residues ending a peptide chain,

• HIS protonated on NE2 alone and ND1 alone (neutral) and protonated on both,

• CYS, both free and as part of a disulfide,

• ACE, acetyl group that sometimes begins a peptide chain,

• NME, methylamino group that sometimes ends a peptide chain

For a list and description of the residue names, see

$DOT_ROOT/data/uhbd.amber84.prot.README

The library includes polar hydrogen atoms, but not nonpolar hydrogen atoms. Partial atomic charges match those from AMBER [12] atoms only. The library format is that used by the UHBD [4] (University of Houston Brownian Dynamics) program. The library has two sections. The first, “EQUIVALENCE”, allows the user to map an atom name in their PDB files to the atom name used in the library. For example, the main-chain amide ‘H’ used to sometimes be called ‘HN’, so for ALA, the line in the “EQUIVALENCE” section is

ALA HN ALA H

which maps HN in ALA in the input PDB file to H in ALA in the residue library. These lines are less needed now, with the remediation and standardization of atom names in the PDB.

The second section, “AMBER”, contains the residue name (resi), atom name (atom), charge (chrg), and radius (radi). The library currently includes 2 additional fields for UHBD Brownian dynamics, epsi and sigm, but these are not needed for DOT and should be set to 0.0 when new residues are added to the library.

4.F.2 Additional residue libraries

The directory

$DOT_ROOT/data/cofactors

contains residue libraries for DNA (including 5′ and 3′ termini), RNA (with 5′ termini beginning with either the sugar or including a preceding phosphate group, and 3′ termini), GTP, ATP, and heme (Fe(II)) that are consistent with the polar atom atomic charges of AMBER. Currently, only a single library file can be used, so these libraries need to be appended to the standard protein library. For example, for a DNA/protein system in our directory “projdir”, I make a subdirectory data, copy the standard protein library and the DNA library into it, and append the DNA library to make a new library file uhbd.amber84.prot.dna.rlb.

cd projdir
mkdir data
cd data
cp $DOT_ROOT/data/uhbd.amber84.prot.rlb .
cp $DOT_ROOT/data/cofactors/dna.amber.rlb .
cat uhbd.amber84.prot.rlb dna.amber.rlb > uhbd.amber84.prot.dna.rlb

To use your new library in prepscript, you must add the -l flag and the FULL pathname of your new library in your gen_dot_prepscript command (Section 4.G).

4.G Create Prepscript

“Prepscript” is an executable script that produces all the needed DOT input files, including molecular descriptions for the stationary and moving molecules and parameter files for running DOT (see Section 5.C for a list of files). You will use $DOT_ROOT/bin/gen_dot_prepscript to generate a “prepscript” customized for your system. For gen_dot_prepscript and prepscript to work, you need to

• Have your path set up to access DOT scripts and utilities (Chapter 2, section 2.A).

• Have your path set up to access the programs Reduce, MSMS, and APBS (Chapter 2, section 2.D.1).

• Set up the directory structure with starting coordinates (Section 4.E).

• If needed, make a customized residue library for non-standard protein residues and other functional groups, such as cofactors (Section 4.F).

For our example, with “projdir” as our project directory and the starting PDB files stat.pdb (stationary molecule) and mov.pdb (moving molecule), both PDB files are copied into projdir/coords, as set up in 4.E.

The script “gen_dot_prepscript” will create prepscript for your system. Syntax is:

gen_dot_prepscript -m movingpdb -s stationarypdb [-d griddim] [-l residue_library]

An appropriate grid dimension is calculated by prepscript if it is not specified; see section 4.J. The default library assigns atomic charges and radii and is in:

$DOT_ROOT/data/uhbd.amber84.prot.rlb

Example 1. Working in the project directory “projdir” with the PDB files stat.pdb and mov.pdb in “projdir/coords”, using the default library, and letting prepscript calculate the grid size.

cd projdir/coords
gen_dot_prepscript -m mov.pdb -s stat.pdb

Example 2: Same coordinates, but we want to use a 128-cube grid with 1 Å spacing and the customized residue library with atomic charges “myreslib.rlb” in projdir/lib:

gen_dot_prepscript -m mov.pdb -s stat.pdb -d 128 -l /home/me/projdir/lib/myreslib.rlb

Note the full pathname (starting with a ‘/’) must be specified for the library. A customized library can be built by adding your new functional groups to a copy of the default library. For example, our researcher copied the default protein residue library

$DOT_ROOT/data/uhbd.amber84.prot.rlb

into the directory “/home/me/projdir/lib” and named this residue library file myreslib.rlb. The library assigns partial atomic charges and atomic radii to each atom of a residue. To add a new residue to the library, the user should assign radii consistent with those in the default library and obtain reasonable partial atomic charges consistent with those of the AMBER polar hydrogen force field [12], following the format of the library.

4.H Prepscript general

4.H.1 GeneralApproach

Prepscript will work without intervention if the two molecules are proteins with standard residues, contain no cofactors not in the current charge library, and both start with residue number 1, which should be treated as a positively charged N-terminus. If intervention is needed, the PAUSE command can be inserted with a comment to remind the user that a file needs to be adjusted before proceeding.

It is important to check the log files for errors! They will be located in the “stat” and “mov” directories (where, again, ’stat’ and ’mov’ are example names used in this manual: normally you will have called your stationary and moving molecules names like ’1abc.pdb’ and ’5gfd.chainB.pdb’).

4.H.2 Running prepscript

For prepscript to work, you need to

• Have your path set up to access DOT scripts and utilities (Chapter 2, section 2.A).

• Have your path set up to access the programs Reduce, MSMS, and APBS (Chapter 2, section 2.D.1).

• Set up the directory structure with starting coordinates (Section 4.E).

“Prepscript” is run in the “coords” subdirectory of your project. We highly recommend you make a log file when running prepscript. To run prepscript, type:

For csh:

./prepscript |& tee prepscript.log

For bash:

./prepscript 2>&1 | tee prepscript.log

Example. Working in the project directory “projdir” using csh:

cd projdir/coords
./prepscript |& tee prepscript.log

4.H.3 Assignment of file names

Prepscript automatically adds to the file name of the original input PDB files to indicate what is in the file and the file type. Systematic names include

• cen = centered

• noh = heavy atoms with no hydrogen atoms

• polh = heavy atoms with only polar hydrogen atoms

• allh = heavy atoms with all hydrogen atoms

Suffixes that indicate file types include

• xyz = Contains X, Y, and Z coordinates only

• xyzq = Contains X, Y, and Z coordinates and partial atomic charge

• xyzqr = Contains X, Y, and Z coordinates, partial atomic charge, and radius

• xyzqr.xml = as xyzqr but in format for APBS input

• uhbdgrd = electrostatic potential grid created in UHBD format

• dx = electrostatic potential grid created in APBS format

• xyzcrv = shape file

• e6d = DOT output, see Section 6.E, page 49

If the preparation script succeeds, then the following files will be generated.

$projdir/coords/mov/mov.pdb
$projdir/coords/mov/mov.noh.pdb
$projdir/coords/mov/mov.noh.center.xyz
$projdir/coords/mov/mov.cen.noh.pdb
$projdir/coords/mov/mov.cen.noh.xyz
$projdir/coords/mov/mov.cen.allh.pdb
$projdir/coords/mov/mov.cen.noh.reduce.log
$projdir/coords/mov/mov.cen.polh.pdb
$projdir/coords/mov/mov.cen.polh.pdb_to_xyzq.log
$projdir/coords/mov/mov.cen.polh.xyzq

$projdir/coords/stat/stat.pdb
$projdir/coords/stat/stat.noh.center.xyz
$projdir/coords/stat/stat.cen.pdb
$projdir/coords/stat/stat.cen.polh.pdb
$projdir/coords/stat/stat.cen.noh.pdb
$projdir/coords/stat/stat.cen.allh.pdb
$projdir/coords/stat/stat.cen.noh.reduce.log
$projdir/coords/stat/stat.cen.polh.pdb
$projdir/coords/stat/stat.cen.polh.pdb_to_xyzq.log
$projdir/coords/stat/stat.cen.polh.xyzqr.xml
$projdir/coords/stat/stat.cen.polh.apbs.log
$projdir/coords/stat/stat.cen.128.150.0m.dx (or similar name)
$projdir/coords/stat/stat.cen.noh.xyzcrv
$projdir/coords/stat/stat.cen.noh.r+1.4.p=1.4.xyzcrv
$projdir/coords/stat/stat.cen.noh.clamp.minmax

4.H.4 The “pause” command, pausing during creation of DOT input files

If you insert

pause 'reason'

into prepscript or the scripts called by prepscript, such as “dot2-prep-mol-common”, the script will pause at this statement, with the 'reason' reminding you of what you need to do. NOTE: To delineate the comment, use the straight single quote ('), the character right of the semicolon on most keyboards. Typing enter (or return) will make your script resume.

Example 1: I want to check the DOT input files for the moving molecule before I move on to processing the stationary molecule. I put a “pause” statement in prepscript after the moving molecule is processed:

dot2-prep-mol-common -p mov.pdb -l $reslib -w moving
exit_if_error $? dot2-prep-mol-common reported an error, prepscript quitting
popd
echo Moving molecule files are complete

pause 'check moving molecule files mov.cen.noh.xyz and mov.cen.polh.xyzq'   # add this line

Example 2: I have a CYS residue that is bound to a Cu atom in plastocyanin, my moving molecule. Since there is no hydrogen atom on the S atom, it will be automatically assigned as CYX, a neutral residue that is part of a disulfide. However, I have created a residue CYM that has the appropriate charge of -1 needed for CYS as a metal ligand. To make sure the residue gets assigned properly, I have to intervene after all hydrogen atoms have been added by Reduce, but before atomic charges are assigned. This requires making a customized version of 'dot2-prep-mol-common' for my molecule mov.pdb. After all hydrogen atoms are added with Reduce, but before non-polar hydrogens are removed, I put:

pause 'edit mov.cen.allh.pdb to replace Cu-bound CYS 84 with CYM'

I edit the file and then type enter (or return) to continue the processing of DOT input files.
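For readers curious about what such a pause does, the behavior described above (print the reason, wait for enter, resume) can be imitated in a few lines of bash. This is a stand-in for illustration only: `pause_sketch` is a hypothetical name, and the code is not the pause command shipped with DOT.

```shell
#!/usr/bin/env bash
# pause_sketch is a hypothetical stand-in that only illustrates the behavior
# described in the text; it is not the pause command shipped with DOT.
pause_sketch() {
    local reply
    printf 'PAUSED: %s\n' "$1" >&2
    printf 'Press Enter (or Return) to continue... ' >&2
    read -r reply                  # block until a line arrives on stdin
    printf 'Resuming.\n' >&2
}

# Example: pause_sketch 'check moving molecule files'
```

Messages go to standard error so that they still reach the terminal when standard output alone is redirected to a file.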

4.I Prepscript: Steps common to both molecules – dot2-prep-mol-common

The script “dot2-prep-mol-common” is run by prepscript for both the moving and the stationary molecules. The script “dot2-prep-mol-common” does the following steps:

• removes hydrogen atoms and water molecules from the starting PDB files

• centers the molecules

• calls the program Reduce to build hydrogen atoms

• creates the centered PDB files with heavy atoms and polar hydrogen atoms, with each protonation/charge state having a unique residue name

• creates the moving molecule shape file (.xyz)

• creates the atomic charge file (.xyzq) and checks total molecular charge

These steps create all the required DOT input files for the moving molecule and create the files needed to calculate the shape and electrostatic potentials of the stationary molecule.

4.I.1 Check on range for total molecular charge

Before running “dot2-prep-mol-common”, prepscript sets the parameters “qtotmin” and “qtotmax” to define an acceptable range for the total charge on each molecule. The default in prepscript is

qtotmin=-20
qtotmax=20

If the total charge of either molecule is not in this range, prepscript will quit with an error message. If you know the total charge of your molecule is an integer outside this range, adjust the limits accordingly. For example, a fragment of double-stranded DNA with 12 base pairs has a total molecular charge of -22, so changing “qtotmin” to -30 will allow prepscript to run without an error. Within prepscript, the total molecular charge is checked for each molecule by the script “dot2-prep-mol-common” through “xyzq_checkok”, and is checked again in the log file made by either APBS or UHBD when the electrostatic potential for the stationary molecule is calculated.

If the total charge of a molecule (stationary or moving) is in the range defined by “qtotmin” and “qtotmax” but is not an integer, prepscript will quit with an error message. This turns out to be an excellent check of whether atomic charges were assigned correctly. Residues or associated residues have integral charges in the residue library. A non-integral charge usually indicates missing atoms, either because atoms are missing in the original PDB files or because hydrogen atoms were not added correctly.

To turn off the charge range checking, set “qtotmin” equal to “qtotmax”, with any value, e.g.

qtotmin=0; qtotmax=0

This will also turn the non-integer check into a warning rather than a fatal error.
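The logic of the two checks described above can be sketched in a few lines of Python. This is only an illustration, not the code in “dot2-prep-mol-common”: the function name `check_total_charge` and the tolerance `tol` used to decide whether a sum of partial charges counts as an integer are assumptions made for the example.

```python
def check_total_charge(charges, qtotmin=-20, qtotmax=20, tol=0.01):
    """Return (total, status); status is 'ok', 'warning', or 'error'.

    charges : iterable of partial atomic charges for one molecule
    tol     : hypothetical tolerance for calling the total an integer
    """
    total = sum(charges)
    # Setting qtotmin equal to qtotmax switches the range check off.
    range_check_on = qtotmin != qtotmax
    if range_check_on and not (qtotmin <= total <= qtotmax):
        return total, "error"
    if abs(total - round(total)) > tol:
        # Non-integer total: fatal normally, only a warning when the
        # range check has been switched off.
        status = "error" if range_check_on else "warning"
        return total, status
    return total, "ok"
```

With the default limits, a molecule with a total charge of -22 is rejected; widening the range with `qtotmin=-30` lets it pass.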

4.I.2 Creating no-hydrogen files (noh files)

The script “dot2-prep-mol-common”, called by prepscript, removes water molecules, hydrogen atoms, and alternate positions for the same atom from the input PDB file. For a starting molecule named mol.pdb, the file created is:

• mol.noh.pdb

in PDB format.

Why? Input PDB files can include small molecules that would not be considered part of the molecular shape presented to docking partners. These molecules include water molecules, ions from buffer, and some metal ions. Prepscript removes water molecules, but the user must decide if other molecules are an intrinsic part of the molecular structure and remove those that are not. For example, a metal ion may be an important cofactor (keep) or present between molecules in the crystal as a heavy atom derivative (remove). Some crystallographic structures and most NMR structures include some or all hydrogen atoms. These atoms may have nonstandard names or geometries, depending on the method of refinement. This step of prepscript will remove them, and they will be replaced in later steps. If the user wants to retain hydrogen atoms from the PDB file, skip this step. Some PDB files have multiple locations for some atoms, say a mobile side chain. Prepscript selects only the first listed alternate location for an atom.

If multiple molecules are present in the asymmetric unit of a crystallographic file, the user must decide which molecule is to be used for docking, and edit the input file accordingly to contain just one molecule.

4.I.3 Centering molecules

The script “dot2-prep-mol-common”, called by prepscript, finds the geometric center of each molecule and translates the coordinates so that the center lies at (0,0,0). Files of centered coordinates have “cen” added to their name. In prepscript, the centering is applied to the molecule with no hydrogen atoms, so for a starting molecule named mol.pdb, the files created are:

• mol.cen.noh.pdb - the centered PDB file with no hydrogen atoms

• mol.noh.center.xyz - the geometric center of the original PDB file, 1 line

Why? It is especially important that the moving molecule be centered. This ensures that, when rotated by DOT, it will not swing out of the grid box. Centering the stationary molecule provides the largest distance between the molecule and the edges of the grid for the moving molecule to fit into. It may be convenient to move the stationary molecule from a centered position. For example, in dockings to a fragment of cytochrome c oxidase, the face of the fragment that is connected to the membrane-bound portion was translated to the bottom of the grid, leaving more space available in the grid for the molecule being docked.

3a. Scripts, utilities used

$DOT_ROOT/bin/share/pdb_make_centered mol.noh.pdb > mol.cen.noh.pdb

This finds the geometric center of the molecule and writes a new PDB file with the same atoms but moved so that the molecule’s center is at the point (0,0,0). It creates mol.center.xyz and mol.cen.noh.pdb. The --round option rounds the X, Y, and Z coordinates of the geometric center to the nearest integer, so that centering to (0,0,0) involves integral translations in the 3 directions.
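As an illustration of the centering step, the Python sketch below translates a list of coordinates so that their center lies at the origin. It is not the pdb_make_centered source. It assumes that “geometric center” means the unweighted mean of the atom positions, and the function name `center_coordinates` and the `round_center` argument (mimicking the rounding option) are made up for the example.

```python
def center_coordinates(coords, round_center=False):
    """Return (center, centered_coords) for a list of (x, y, z) tuples.

    Assumption: the geometric center is the unweighted mean position.
    round_center=True rounds the center to whole numbers first, so the
    translation is integral in all three directions.
    """
    n = len(coords)
    center = [sum(c[i] for c in coords) / n for i in range(3)]
    if round_center:
        center = [float(round(v)) for v in center]
    centered = [tuple(c[i] - center[i] for i in range(3)) for c in coords]
    return center, centered
```

An integral translation keeps every atom at the same offset from the grid points that it had in the original file, which can make the original and centered coordinates easier to compare.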

4.I.4 Protonation with Reduce (allh files)

The script “dot2-prep-mol-common”, called by prepscript, uses the program Reduce to add hydrogen atoms, both polar and nonpolar, to the already centered coordinates. Within prepscript, given a starting PDB file named mol.pdb, the PDB file created by Reduce with all hydrogen atoms will be named mol.cen.allh.pdb. When Reduce protonates the macromolecule, it determines a reasonable protonation state for His and Cys side chains from the local environment.

For protonation, 3 problems need to be considered:

1. The appropriate state of the N-terminal residue of a peptide chain.

2. Protonation state of residues like His and Cys.

3. Protonation state of unusual functional groups.

Reduce automatically assigns N-termini as positively charged IF the residue has residue number 1. Other cases require special consideration, see below. INCORRECT PROTONATION OF N-TERMINI is the most likely reason for a nonintegral charge on a protein when the atomic point charges are assigned (see Section 4a.).

For a true N-terminal residue, the N-terminus should be an amino group (-NH3+), consisting of atoms N, H1, H2, and H3, that contributes an overall charge of +1 to the residue. For a true C-terminal residue, the C-terminus should be a carboxylate group (-CO2-), consisting of atoms C, O, and OXT, that contributes an overall charge of -1 to the residue. The termini of the coordinates often do not represent the true ends of the protein; there can be disordered regions at one or both termini that are not in the PDB coordinates. These missing residues should be indicated by a REMARK 465, MISSING RESIDUES in the PDB file. If the termini are not the true termini, they should be uncharged. The easiest way to do this is to leave the termini as standard amino acids. Creating a neutral N-terminus requires manipulation of the PDB files created by Reduce (see Section 4a.). In some protein coordinates, the last residue of a peptide chain includes an OXT atom, even though it is not the true terminus. If you know the C-terminus should be uncharged, remove the OXT atom.

Reduce also checks for the proper orientation of Gln and Asn side chains, determines the protonation state of His residues from the local environment, and looks for and corrects steric clashes within the macromolecule.


4a. Protonation of N-terminal amino acids in proteins

In proteins, we call N-terminal residues all those that do not have a covalent bond between the main-chain N atom and a preceding C=O group, defined in Reduce by a C–N distance between 1.1 and 1.7 Å. N-terminal residues include

Case 1. the true N-terminus of a peptide chain, with residue number 1.

Case 2. the true N-terminus of a peptide chain, with residue number other than 1, as when preceding residues are not present in the molecule.

Case 3. a residue at the beginning of a peptide chain in the PDB file but not a true N-terminus in the molecule itself.

Case 4. a residue in the middle of a peptide chain when coordinates for the preceding, covalently bonded residue are not in the PDB file, as in a disordered loop.

For true N-termini, we want a -NH3+ group (positively charged); for other N-termini we want a -NH group (neutral), forming a complete residue but with an incomplete covalent shell for the N atom.
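The distance criterion that defines an N-terminal residue can be written out as a short Python sketch. The 1.1 to 1.7 Å window is taken from the text above; the function name `has_preceding_carbonyl` and the way coordinates are passed in are assumptions for illustration and do not reflect how Reduce is implemented.

```python
import math

def has_preceding_carbonyl(prev_c, n, dmin=1.1, dmax=1.7):
    """True if the previous residue's C atom is bonded to this N atom.

    prev_c : (x, y, z) of the preceding residue's carbonyl C, or None
             when no preceding residue is present in the file
    n      : (x, y, z) of this residue's main-chain N
    A residue for which this returns False is "N-terminal" in the
    sense used here (Cases 1 to 4).
    """
    if prev_c is None:
        return False
    return dmin <= math.dist(prev_c, n) <= dmax
```

A typical peptide bond is about 1.33 Å long, so residues on either side of a missing loop, which are several Ångstroms apart, fail the test and are treated as chain starts.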

Case 1: Residue number 1 is the true N-terminus of the peptide chain. This case requires no user intervention. Reduce will, by default, add 3 hydrogen atoms to residue 1 of a peptide chain, creating a positively charged -NH3+ group. Three hydrogen atoms will ONLY be added if Reduce determines that the N atom is not attached to a preceding C=O group (distance criterion). For example, if the protein chain is numbered 1A, 1B, 1, 2,..., Reduce puts the -NH3+ group on the first residue, 1A. In the next step of “dot2-prep-mol-common”, the presence of the N-terminal hydrogen atoms will be used to determine that this is a positively charged N-terminal residue, and the corresponding partial atomic charges will be assigned.

Case 2: The peptide chain begins with a residue having a residue number other than 1 and you want this residue to have a positively charged -NH3+ group at the N-terminus. This arises, for example, when the coordinates lack part of the N-terminal region but are numbered as in the original protein. This case requires editing a customized prepscript so that Reduce will add hydrogen atoms as desired. Edit the call to dot2-prep-mol-common for the appropriate molecule by adding the “Nterm” flag, as indicated in the comment line above the call. The “Nterm” flag is passed along into the Reduce command line.

Example for Case 2: My stationary molecule stat.pdb begins with residue number 82 and this residue should have an -NH3+ group. Within “prepscript”, I edit the line that calls “dot2-prep-mol-common” to process the stationary molecule:

dot2-prep-mol-common -r Nterm82 -p stat.pdb -l $reslib -w stationary

With this command, all residues with residue numbers less than or equal to 82 will have 3 hydrogen atoms added to their N atom, but only if there is no preceding, covalently bonded C=O in the coordinate file. Warning: If there are multiple chains in the molecule, Reduce will add 3 hydrogen atoms to all the N-terminal residues with residue numbers 82 or less. Reduce does not consider chain IDs. See the “Renumbering residues” box (page 31) for a tool that can help.

Case 3: The N-terminal residue is not the true start of the chain, so it should be neutral to avoid introducing an inappropriate positive charge when atomic charges are assigned. The goal is to put a single hydrogen atom on the N atom of this residue. This case requires

• making a customized version of “dot2-prep-mol-common” for the molecule

• editing “prepscript”

• editing the PDB file created by Reduce.

Currently, Reduce adds no protons to a backbone N atom when there is no preceding C=O AND the residue number is not 1 AND the -Nterm option is not specified. To achieve our goal, we must make Reduce add 3 hydrogen atoms to the backbone N atom, then edit the PDB file made by Reduce with all hydrogen atoms.

Example for Case 3. Our starting stationary molecule coordinates, stat.pdb, begin with residue 82, which is preceded by a disordered N-terminal peptide that is missing in the PDB coordinate file. Therefore, we want residue 82 to have a neutral N-terminus. First, copy dot2-prep-mol-common from $DOT_ROOT/bin/share to the coords directory of your project and give it a unique name:

cd projdir/coords

cp $DOT_ROOT/bin/share/dot2-prep-mol-common dot2-prep-mol-stat

Second, edit “dot2-prep-mol-stat” by adding a “pause” command to allow editing of

stat.cen.allh.pdb

the PDB file created by Reduce, while prepscript is paused. You must insert the “pause” command at the point in the script after Reduce is run (creating the stat.cen.allh.pdb file) but before non-polar hydrogens are removed. For our case, the pause command would be:

pause 'correct protonation state of residue 82'

You would insert this in your customized “dot2-prep-mol-stat” to read as follows:

grep ERROR $reduce_log > /dev/null
exit_if_matches_found $? reduce reported an error, $pgm quitting, see $reduce_log
exit_if_file_missing_or_empty $cen_allh_pdb
pause 'correct protonation state of residue 82'    # <-- add this
echo Remove non-polar hydrogens

Third, edit prepscript to call your customized “dot2-prep-mol-stat”. Replace

dot2-prep-mol-common -p stat.pdb -l $reslib -w stationary

with

../dot2-prep-mol-stat -r Nterm82 -p stat.pdb -l $reslib -w stationary

Now run prepscript. Prepscript will pause after running Reduce, with the message

correct protonation state of residue 82

Reduce will have added 3 hydrogen atoms named H1, H2, and H3 to the N atom of residue 82. To make the neutral N-terminus, atoms H1, H2, and H3 of residue 82 need to be replaced by a single H. Edit stat.cen.allh.pdb to remove H2 and H3, and change the name of H1 to H, the standard backbone amide proton name. Make sure the ‘H’ and the rest of the fields on the line are in the correct columns. Save your file using its original name. Then, resume prepscript by typing enter (or return).
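The manual edit described above can also be scripted, which avoids column-alignment mistakes. The Python sketch below is a hypothetical helper, not part of the DOT utilities. It relies only on the fixed-column PDB layout (atom name in columns 13-16, residue number in columns 23-26) and, like Reduce's Nterm handling, ignores chain IDs, so use it only when the residue number is unique in the file.

```python
def neutralize_n_terminus(pdb_lines, resnum):
    """Turn an -NH3+ terminus on residue `resnum` into a neutral -NH.

    Drops atoms H2 and H3 and renames H1 to H, keeping every other
    column of the ATOM record unchanged.
    """
    out = []
    for line in pdb_lines:
        if line.startswith(("ATOM", "HETATM")) and int(line[22:26]) == resnum:
            name = line[12:16].strip()
            if name in ("H2", "H3"):
                continue  # delete the two extra protons
            if name == "H1":
                # " H  " is the standard column placement for the
                # backbone amide proton name
                line = line[:12] + " H  " + line[16:]
        out.append(line)
    return out
```

For example, reading stat.cen.allh.pdb into a list of lines while prepscript is paused, passing it through `neutralize_n_terminus(lines, 82)`, and writing the result back under the original file name reproduces the edit for residue 82.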

Case 4: There is a disordered loop in the middle of the protein chain, such that a peptide segment is missing from the PDB file but not from the molecule itself. This creates a residue with no preceding, covalently bonded C=O group in the middle of the peptide chain that should be built as a neutral residue to avoid introducing inappropriate positive charge in the protein. The goal is to have a single hydrogen atom on the N atom for this residue. Very similar to Case 3, this case requires

• Making a customized version of “dot2-prep-mol-common” for the molecule in which a “pause” command is inserted after the “allh” PDB file is made by Reduce.

• Editing “prepscript” to add the appropriate flag for Reduce and call the customized version of “dot2-prep-mol-common”.

• Editing the PDB file (...allh.pdb) created by Reduce when the “pause” command pauses prepscript.

If prepscript is run without these changes, Reduce will not add any hydrogen atoms to the N atom lacking the preceding, covalently bonded C=O group, and the total charge for the residue will not be an integer (a warning in prepscript) AND the total charge for the peptide chain will not be an integer (an error that will cause “prepscript” to halt).

Example for mixed case 3 and case 4: Coordinates of my stationary stat.pdb are missing the N-terminal peptide, with the chain beginning at residue 82, and missing the loop segment, residues 100-120. So we want residues 82 and 121 each to have a neutral N-terminus with the N atom having a single proton. As in Case 3, make a customized copy of “dot2-prep-mol-common” that includes a “pause” statement after stat.cen.allh.pdb is created:

cd projdir/coords
cp $DOT_ROOT/bin/share/dot2-prep-mol-common dot2-prep-mol-stat

inserting the “pause” command

pause 'correct protonation state of residues 82 and 121'

Then, edit prepscript to call the customized “dot2-prep-mol-stat”.

../dot2-prep-mol-stat -r Nterm121 -p stat.pdb -l $reslib -w stationary

Reduce will then add 3 hydrogen atoms to the N-terminus of any amino acid with a residue number up to 121 that is not preceded by a C=O group. When you run prepscript, it will pause after the stat.cen.allh.pdb file is created; edit both residues 82 and 121, replacing the atom name “H1” with “H ” and deleting “H2” and “H3”.

Example for mixed case 2 and case 4: Coordinates of my stationary stat.pdb begin at residue 82, which is the true N-terminus, and are also missing the loop segment, residues 100-120. So we want residue 82 to have a positively charged N-terminus and residue 121 to have a neutral N-terminus with a single hydrogen atom. Proceed as in the previous Example, but when “prepscript” pauses, edit only residue 121, replacing the atom name “H1” with “H ” and deleting “H2” and “H3”.

Renumbering residues: Sometimes you will need to renumber a block of residues in a PDB file to work around limitations in various tools, including Reduce. The DOT utilities include pdb_replace_resnum, which renumbers residues consecutively, starting at 1 or any other number. For example,

pdb_replace_resnum --start 20 in.pdb > in.renum.pdb

will renumber starting at 20. Utilities pdb_chain and pdb_resrange can help select residues to be renumbered. For example, to renumber chain B starting at 1000 without renumbering chain A:

( pdb_chain A in.pdb ; pdb_chain B in.pdb | pdb_replace_resnum --start 1000 ) > in.renumB.pdb

4b. Protonation states of His and Cys

Reduce handles protonation of standard states of His and Cys without intervention. Depending on pH and environment, the imidazole ring of His can be protonated on either ND1 or NE2 (neutral) or on both ND1 and NE2 (positively charged). In addition, the two possible orientations of the imidazole ring should be examined (180 degree ring flip), since, at the typical resolution of crystallographic protein structures, the two orientations cannot be distinguished from the electron density alone. Reduce considers both orientations of the imidazole ring when it determines the most likely protonation state [13]. For DOT, we suggest making His residues positively charged only when there is a compelling reason for doing so, such as a His residue in which both ND1 and NE2 form hydrogen bonds with carboxylate groups, such as those of Asp and Glu. Reduce takes this conservative approach, usually adding only one H atom to the two N atoms of the imidazole ring. Reduce also recognizes when His is ligated to a metal ion, and adds hydrogen atoms appropriately. The user can use other methods to determine the His protonation state, including algorithms that attempt to determine the pKa. For an isolated, neutral His residue at pH 7, protonation on NE2 is slightly more favorable than protonation on ND1. Some molecular modeling programs, such as Insight (Accelrys), use this as the default protonation state at pH 7. So far, Reduce’s method of basing the protonation state on the local environment has given the best docking results with DOT.

Cys can be either free (protonated on SG) or part of a disulfide (no hydrogen atom). Reduce determines the protonation state based on local environment. Reduce also recognizes when Cys is ligated to a metal ion, and adds hydrogen atoms appropriately.

4c. Reduce geometry check: HIS, ASN, and GLN side chains

At the resolution of crystallographic protein structures, the two possible orientations of the His imidazole ring and the side chain amides of Asn and Gln cannot be distinguished. Reduce checks the hydrogen bonding environment of the crystallographic orientation and the ‘flipped’ position (180 degree rotation), and scores which one is best. Reduce then creates the flipped coordinates, if needed, and adds hydrogen atoms. In the His imidazole ring, the positions of CD2 and ND1 are exchanged and the positions of CE1 and NE2 are exchanged in the flipped conformation. For Asn and Gln, the N and O atoms of the side chain amide are exchanged. In the Reduce mode used by prepscript, the side chains are flipped only when the environment indicates a clear preference for the ‘flipped’ orientation. We have found that the defaults used by Reduce work well. The user may be interested in using Reduce interactively through the Molprobity website (http://molprobity.biochem.duke.edu/), where the user can view the two orientations in the protein environment and select which should be used.

4d. Reduce geometry check: Steric problems

Reduce uses the hydrogen atoms that it adds to check for steric problems and applies small motions to the side chains to relieve them.

4e. Residues other than standard amino acids

These groups include functionalized amino acids, cofactors, DNA residues, metal ions, etc. Reduce is likely to do a reasonable job, especially if the CONECT records from the PDB file are retained. We have tested Reduce on HEM and DNA residues, although, with the changes over time in the PDB, the names of residues in DNA and RNA are inconsistent. Reduce has a utility to add new functional groups (see Reduce documentation). The user may want to try adding H atoms first to new functional groups and then make a customized version of dot2-prep-mol-common in which the call to pdb_dehydrogen, which removes hydrogen atoms in the starting PDB files, is commented out.

4.I.5 Create files with heavy and polar H atoms (polh) and RES names for charge library

The script “dot2-prep-mol-common”, called by prepscript, takes the centered PDB files with all hydrogen atoms created by Reduce (.allh files), and selects all the heavy atoms plus the polar hydrogen atoms (.polh files), thereby removing all of the nonpolar hydrogen atoms. Polar hydrogen atoms are those attached to N, O, and S. For common variants of standard amino acids, prepscript automatically assigns unique residue names based on the protonation pattern. Currently, prepscript can figure out common variants of His and Cys residues and whether an amino acid is at the N- or C-terminus of a peptide chain. For a starting molecule named mol.pdb, the input file is the centered PDB file with all hydrogen atoms:

mol.cen.allh.pdb

and the file created is the centered PDB file with polar hydrogen atoms and adjusted residue names:

mol.cen.polh.pdb

Why? The current DOT energy terms and charge libraries are balanced for using atomic charge sets based on heavy atoms plus polar hydrogen atoms. Therefore, nonpolar hydrogen atoms must be stripped from the coordinate files. Since each variant of a residue has its own charge distribution, each must have a unique residue name that corresponds to its entry in the charge library, so that partial atomic charges can be correctly assigned. Prepscript uses the traditional AMBER names. For compatibility with other tools, this .polh file also has chain IDs removed and 4-letter H atom names modified to, say, HE11 from the PDB name 1HE1.


5a. His, Cys, and N-terminal amino acid residues

Prepscript automatically assigns unique residue names to common variants of His and Cys and to charged amino acid residues at the N- or C-terminus of a peptide chain. For His, the variants are recognized by the protonation patterns of the imidazole ring.

HIS or HIE   Protonated on NE2
HID          Protonated on ND1
HIP          Protonated on both ND1 and NE2

For Cys, variants are recognized by the protonation of SG:

CYS          Free cysteine, protonated on SG
CYX          Cysteine that is part of a disulfide, no hydrogen atom added to SG

Charged N- and C-termini of peptide chains are identified by

N-terminus, example ALAN   Presence of atoms 1H, 2H, and 3H
C-terminus, example ALAC   Presence of atom OXT

For example, if an amino acid residue, say Ala, includes the atom types 1H, 2H, and 3H, prepscript will assign it as being a positively charged N-terminal residue with the residue name ALAN.
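The naming rules in the tables above amount to a lookup on which atoms are present in a residue. The Python sketch below illustrates the idea only. The function `variant_name` is hypothetical, and the hydrogen atom names it looks for (HD1 on ND1, HE2 on NE2, HG on SG) are the usual PDB names, assumed here rather than taken from the prepscript source. Appending N or C to any residue name generalizes the ALAN and ALAC examples and is also an assumption.

```python
def variant_name(resname, atom_names):
    """Guess a charge-library residue name from the atoms present."""
    atoms = set(atom_names)
    name = resname
    if resname == "HIS":
        on_nd1 = "HD1" in atoms   # proton on ND1 (assumed atom name)
        on_ne2 = "HE2" in atoms   # proton on NE2 (assumed atom name)
        if on_nd1 and on_ne2:
            name = "HIP"
        elif on_nd1:
            name = "HID"
        elif on_ne2:
            name = "HIE"
    elif resname == "CYS":
        # HG present: free cysteine; absent: treated as a disulfide
        name = "CYS" if "HG" in atoms else "CYX"
    if {"1H", "2H", "3H"} <= atoms:
        name += "N"               # charged N-terminus, e.g. ALAN
    elif "OXT" in atoms:
        name += "C"               # charged C-terminus, e.g. ALAC
    return name
```

Because the name is derived purely from the protonation pattern, a Cys bound to a metal ion (no proton on SG) comes out as CYX, which is why such residues need the manual renaming described in section 4.I.8.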

5b. Charges - should add up to correct formal charge

4.I.6 Moving molecule shape (.xyz)

The script “dot2-prep-mol-common”, called by prepscript, creates the “.xyz” file, which contains the centered coordinates of the heavy atoms of the molecule. This DOT input file defines the molecular shape of the moving molecule. Each line of the “.xyz” file has fields X, Y, Z (in Angstroms). Example .xyz file fragment:

-24.004 10.540 13.354
14.847 -1.450 26.429
23.264 -49.230 39.562

For example, for our starting moving molecule coordinates, mov.pdb, the centered .xyz file is

mov.cen.noh.xyz

which has the same number of atoms as mov.cen.noh.pdb. This file is created in dot2-prep-mol-common.

pdb_to_xyz mov.cen.noh.pdb > mov.cen.noh.xyz

DOT will read the .xyz file and then map each coordinate of the moving molecule (heavy atoms only) onto the nearest grid point.
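Mapping an atom onto its nearest grid point is a rounding operation. The minimal Python sketch below assumes a grid whose points sit at integer multiples of the spacing, with the origin at the center of the molecule; how DOT indexes its grid internally is not described here, so the function `nearest_grid_point` is an illustration only.

```python
def nearest_grid_point(xyz, spacing=1.0):
    """Return the integer (i, j, k) grid index closest to xyz.

    Assumes grid points lie at i*spacing, j*spacing, k*spacing
    relative to the origin of the centered coordinates.
    """
    return tuple(int(round(v / spacing)) for v in xyz)
```

With the default 1 Å spacing, the first atom of the example fragment above, (-24.004, 10.540, 13.354), maps to grid point (-24, 11, 13).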

4.I.7 Moving molecule partial atomic charges (.xyzq)

The script “dot2-prep-mol-common”, called by prepscript, creates the “.xyzq” file, which contains all heavy atoms and polar hydrogen atoms and their partial atomic charges. This DOT input file defines the charge distribution of the moving molecule. In the “.xyzq” file, each line has fields X, Y, Z (in Angstroms), and partial atomic charge. Example .xyzq file fragment:

-24.004 10.540 13.354 -0.9
14.847 -1.450 26.429 0.8
23.264 -49.230 39.562 1.2

For example, for our starting moving molecule coordinates, mov.pdb, the centered .xyzq file is

mov.cen.polh.xyzq

which has the same number of atoms as mov.cen.polh.pdb. This file is created in dot2-prep-mol-common.

pdb_to_xyzq mov.cen.polh.pdb > mov.cen.polh.xyzq

DOT will read the .xyzq file and then interpolate the partial atomic charges of the moving molecule (heavy and polar hydrogen atoms) over the eight nearest grid points.
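One common way to spread a point charge over the eight surrounding grid points is trilinear weighting, sketched below in Python. Whether DOT uses exactly this weighting is an assumption; the text above only states that the charge is interpolated over the eight nearest grid points. The function `spread_charge` and its grid convention (points at integer multiples of the spacing) are hypothetical.

```python
import math

def spread_charge(xyz, q, spacing=1.0):
    """Distribute charge q over the 8 grid points enclosing xyz.

    Returns {(i, j, k): partial charge}. The weights are trilinear,
    so they sum to 1 and the total charge is conserved.
    """
    base = [math.floor(v / spacing) for v in xyz]
    frac = [v / spacing - b for v, b in zip(xyz, base)]
    weights = {}
    for dx in (0, 1):
        for dy in (0, 1):
            for dz in (0, 1):
                w = ((frac[0] if dx else 1.0 - frac[0]) *
                     (frac[1] if dy else 1.0 - frac[1]) *
                     (frac[2] if dz else 1.0 - frac[2]))
                key = (base[0] + dx, base[1] + dy, base[2] + dz)
                weights[key] = q * w
    return weights
```

An atom sitting exactly on a grid point puts all of its charge on that point, while an atom at the center of a grid cell contributes one eighth of its charge to each corner.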

4.I.8 Creating new functional groups

For new functional groups the user will have to edit the output polh file to assign new residue names that identify the functional group uniquely. To do this, the user must

1. Make a customized atomic charge library that contains the new functional group

2. Make a customized version of dot2-prep-mol-common that includes a “pause” command

3. Edit prepscript to call the customized dot2-prep-mol-common

Add the pause command (section 4.H.4, page 26) to the prepscript file, so that prepscript pauses while the user edits the polh file. For example, a PDB file contains a Cys that is a Cu ligand. The CYS residue name is in the original PDB file and Reduce has CYS in its library and correctly protonates the residue. Reduce correctly adds no proton to the CYS SG because Reduce identifies the SG-Cu bond. Prepscript automatically identifies this CYS as CYX due to the protonation pattern. However, the user knows that CYS as a Cu ligand should have an overall charge of -1, rather than being neutral as it is in a disulfide. The user creates a new residue entry in the charge library, CYM, with appropriate partial atomic charges. The user then edits the output polh file to change the CYX residue to CYM, so that the correct atomic charges are assigned.

8a. Other functional groups

If there are other functional groups or molecules, such as cofactors, present in the PDB file, the user should

• Check to see if this group is in the charge library.

• Check that the atom and residue names in the library and the PDB file match.

• If needed, create a new entry in the charge library with a unique RES name, partial charges, and atomic radii.

• Check that the total charge on the residue is as expected.

• Check that Reduce adds hydrogen atoms appropriately.

• Check that the correct polar hydrogen atoms appear in the polh file.

Assignment of partial atomic charges can be straightforward from examination of the current charge library. For example, if I had the partial atomic charges assigned for CYM, a Cys ligated to Cu, but I need CYM with a charged C-terminus, I would create CYMC, which adds typical charges for the -CO2- group.

On the other hand, determining the atomic charges for CYM itself is more difficult and beyond the scope of the DOT software package. For a cofactor, a quantum mechanics program such as Gaussian could be used. Some cofactors, such as HEM, have published sets of point charges, because parametrization of these functional groups has been done for molecular dynamics simulations. Complex functional groups, such as metal clusters, require complex calculations.

The DOT team has charge libraries for DNA bases and HEM groups that we are happy to share with you; just email us.


4.J Determining grid size

Optimally, the moving molecule in any orientation should fit in the grid surrounding the stationary molecule when the two molecules are close. This ensures that the moving molecule, when close to the stationary molecule, will not be influenced by the shape or electrostatic properties of the stationary molecules in adjacent cells. The default grid spacing is currently 1 Å. We have found that a grid spacing of 2 Å is too coarse to give good results with macromolecules and also gives a poor approximation of the electrostatic potential of the stationary molecule. A grid spacing smaller than 1 Å is not computationally efficient for macromolecules. If the user is looking at small molecule docking over a smaller region of space, a finer grid spacing may be needed. The rotation sets provided will give a rotational search for a small molecule comparable to the translational search with small grid spacing. The default is a cubic grid.

4.J.1 Default method based on molecular diameters

To ensure that the moving molecule sits inside the grid surrounding the stationary molecule, prepscript calculates

$L_s + D_m$

where $L_s$ is the largest length of the stationary molecule in X, Y, or Z and $D_m$ is the largest diameter of the moving molecule. Prepscript then selects the cubical grid size that is closest to, but larger than, this sum.
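The selection rule can be illustrated with a short Python sketch. It assumes that candidate sizes are drawn from the FFT-efficient sizes listed in section 4.J.2; whether prepscript uses exactly this list is not stated, and the function name `choose_grid_size` is hypothetical.

```python
# Candidate cubic grid sizes (taken from section 4.J.2; using this
# list for the selection is an assumption of the sketch).
EFFICIENT_SIZES = (64, 128, 160, 192, 224, 256)

def choose_grid_size(stationary_extent, moving_diameter,
                     spacing=1.0, sizes=EFFICIENT_SIZES):
    """Smallest candidate size larger than the molecule size sum.

    stationary_extent : largest X, Y, or Z length of the stationary
                        molecule, in Angstroms
    moving_diameter   : largest diameter of the moving molecule
    """
    needed = (stationary_extent + moving_diameter) / spacing
    for size in sorted(sizes):
        if size > needed:
            return size
    raise ValueError("molecules need a grid larger than %d" % max(sizes))
```

For a stationary molecule 60 Å long and a moving molecule 40 Å across, the sum is 100 Å, so the sketch returns a grid of 128 at 1 Å spacing.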

4.J.2 Sizes efficient for the calculation.

Grid sizes of 64, 128, 160, 192, 224, and 256 are most efficient for the FFT calculation. The calculation time is proportional to $N \log N$, where $N$ is the number of grid points. A grid size of 128 takes about 9 times longer than a grid of 64 and a grid size of 160 takes about 2 times longer than a grid of 128.
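The timing ratios quoted above follow directly from the $N \log N$ scaling with $N$ equal to the cube of the grid size. A small Python check (the function name `relative_fft_cost` is made up for the example, and constant factors that depend on the FFT implementation are ignored):

```python
import math

def relative_fft_cost(size_a, size_b):
    """Estimated FFT time for a size_a cube relative to a size_b cube,
    assuming time is proportional to N log N with N = size**3."""
    na, nb = size_a ** 3, size_b ** 3
    return (na * math.log(na)) / (nb * math.log(nb))
```

`relative_fft_cost(128, 64)` evaluates to 8 × 7/6, about 9.3, and `relative_fft_cost(160, 128)` to about 2.04, in line with the factors of 9 and 2 given in the text.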

Larger grids also take longer for the electrostatic potential calculation (using APBS or UHBD), and grids larger than about 200-cubed need to run the electrostatic calculation on computers with more than 2 gigabytes of memory (not just swap space) to finish in a reasonable time (an hour). For a given grid size, APBS takes several-fold less time than UHBD, but has much larger memory requirements. For example, running APBS using a 256-cubed grid needs 3.6 gigabytes whereas a 192-cubed grid needs only 1.6 gigabytes. Finally, when you use the options that prepscript supplies, grid dimensions that are multiples of 32 work best for APBS. We have successfully run APBS and DOT on grids as large as 448 × 512 × 576, 132 million points.

4.J.3 User selection of grid dimension

The user can override the grid size calculated by prepscript by using the -d parameter in the gen_dot_prepscript command line. For example

gen_dot_prepscript -m moving.pdb -s stat.pdb -d 128

will create a prepscript with a grid size of 128. For a couple of test cases, we have found that the grid size nearest, but smaller than, the molecule diameter sum above gives almost identical results at a decreased computational cost.

4.J.4 Grid need not be a cube

By editing prepscript, you can use grid boxes that are not cubes; that is, they have different dimensions in X, Y, and Z. This is useful when your stationary molecule is highly non-spherical, as when docking to protein fibers. The grid spacing must still be the same in each dimension. See the comments in prepscript, “if you need a noncubic grid”. Once you set xdim, ydim, and zdim in prepscript to differing values, the rest of prepscript and the utilities know how to handle these values.


4.K Prepscript: Stationary molecule

The shape and electrostatic potentials describing the stationary molecule are more detailed and more time-consuming to calculate than those of the moving molecule. NOTE: It is essential that both potentials are calculated in the same coordinate space! Prepscript tries to ensure this by controlling the input file coordinates that are used. For example, for the starting stationary molecule, stat.pdb, the coordinate file

stat.cen.polh.pdb

is used to create the electrostatic potential grid, and the coordinate file

stat.cen.noh.pdb

is used to create the shape potential. Besides the necessary DOT utilities, creating the potential files requires the programs MSMS, for creating molecular surfaces, and APBS (or UHBD), for calculating the electrostatic potential.

4.K.1 Calculate Electrostatic Potential for Stationary Molecule (dx)

The electrostatic potential grid is normally calculated by APBS. Prepscript creates the parameter (or ‘command’) file and runs APBS. Prepscript can use UHBD instead; to do so, run

prepscript --uhbd

The following parameters for calculating the electrostatic potential are set up in prepscript and can be edited by the user:

pot_ionstr=150    Ionic strength, millimolar
pot_maxits=500    For UHBD only

The APBS calculation can take many minutes to run. Prepscript runs the script “dot2-prep-potgrid-apbs”, which lists the default APBS parameters, sets up the APBS parameter file (.in file), sets up the coordinate file read by APBS, which contains the atomic coordinates, partial charges, and atomic radii for each atom (heavy and polar hydrogen atoms) in the stationary molecule, and runs APBS, creating a log file and the electrostatic potential grid (.dx file).

Example: For our starting stationary molecule stat.pdb, the coordinates used for the electrostatic calculation are the centered coordinates with polar hydrogen atoms

stat.cen.polh.pdb

For a grid 128 Å on a side (1 Å spacing) with the calculation done at 150 mM ionic strength, the APBS input files created by “dot2-prep-potgrid-apbs” are the parameter file

stat.cen.128.150m.apbs.in

and the atomic position/charge/radius file

stat.cen.polh.xyzqr

Output files are the electrostatic grid

stat.cen.128.150m.dx

and the log file

stat.cen.128.150m.apbs.log


Prepscript checks that the total charge calculated by APBS is an integer by looking at the APBS log file. If the total charge in the APBS log file is not an integer, prepscript will stop. Even if the total charge is an integer, it is a good idea to check that the charge is correct; look for “Net charge” in the APBS log file. This charge should be the same as the one prepscript calculated in the last line of the .xyzq file. Note: As a stand-alone program, APBS calculates the electrostatic potential for any set of coordinates, taking X, Y, Z, charge, and radius as input. APBS therefore can do no internal checking of whether residues are complete. The wise user will check the expected charge on a molecule given the number of charged side chains (Lys, Arg, Asp, Glu), the charge state of the termini, the protonation state of His (as determined by Reduce), and the charge due to cofactors, and compare that charge with those calculated by APBS and in the .xyzq file.
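The hand check recommended above is simple bookkeeping, sketched here in Python. The function `expected_formal_charge` is hypothetical and counts only the contributions named in the text: +1 for each Lys, Arg, and doubly protonated His (HIP), -1 for each Asp and Glu, plus the charged termini and any cofactor charge supplied by the user.

```python
def expected_formal_charge(residue_counts, charged_n_termini=0,
                           charged_c_termini=0, cofactor_charge=0):
    """Expected integer charge of a protein from its composition.

    residue_counts : dict mapping residue name to number of residues,
                     e.g. {"LYS": 10, "ASP": 8}
    """
    positive = (residue_counts.get("LYS", 0) +
                residue_counts.get("ARG", 0) +
                residue_counts.get("HIP", 0))
    negative = residue_counts.get("ASP", 0) + residue_counts.get("GLU", 0)
    return (positive - negative
            + charged_n_termini - charged_c_termini
            + cofactor_charge)
```

For instance, a chain with 10 Lys, 5 Arg, 1 HIP, 8 Asp, 9 Glu, and both termini charged has an expected charge of -1; if the “Net charge” in the APBS log or the total in the .xyzq file differs from this number, atoms are probably missing or a protonation state is wrong.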

Prepscript controls the size of the grid used by APBS so that it is compatible with the DOT calculation. The grid dimensions in APBS are the DOT dimensions plus one. For example, for our DOT calculation done on a grid 128 Å on a side with 1 Å grid spacing for stat.pdb, the APBS input file made by prepscript

stat.cen.128.150m.apbs.in

has parameters

grid 1.0            Grid spacing
dime 129 129 129    Grid dimensions in the x, y, and z directions

Other parameters for APBS are set to be similar to the UHBD parameters that were used to balance the electrostatic and shape terms in DOT. The user can decide to alter these APBS parameters, but we have found that different parameters influence the magnitude of the electrostatic potential over parts of the grid, particularly near the molecular surface. Disrupting this would require changing the balance of the electrostatic and van der Waals energy terms calculated by DOT (the DOT vdw_weight and electrostatic_weight parameters).

4.K.2 Stationary ShapeDescription (xyzcrv)

The shapeof the stationarymoleculeis describedby an excludedvolumesurroundedby a 3 A favorablelayer (the.xyzcrv file). TheprogramMSMS is usedto createthesurfacesthatareusedto definethesevolumes.MSMS rolls aprobesphere(default radiusof 1.4A) overthevanderWaalsradii to generateacontinuoussmoothsurfacerepresentationof themolecule.

PrepscriptinvokesDOT utilities to createtheneededMSMSsurfaces,determinethegrid pointsinsideandbetweenthevolumes,assignvalues,andcreatethestationarymoleculeshapepotentialvolumefile readby DOT.

• Spheres with center, radius, and value that will be mapped onto the grid by DOT. The volume is specified by a list of spheres that DOT reads and fills in, in the order they appear in the file.

• Excluded volume: all grid points inside the molecular surface. The shape potential is based on volumes inside and between molecular surfaces made with the MSMS program. Two surfaces are made: 1) the solvent-excluded (‘molecular’ or Connolly) surface, and 2) a molecular surface formed using atoms with their van der Waals radii expanded by 3 Å. As above, only the non-hydrogen atoms of the coordinates are used. All grid points within these two surfaces are determined; those between the two surfaces are assigned a favorable value, those inside the solvent-excluded surface are assigned an unfavorable value. The resulting list of grid points and values is the xyzcrv file read by DOT to build the shape potential grid. These files typically have 50,000 to 100,000 lines.

• Favorable volume: all grid points between the molecular surface and the surface made with 3 Å-extended atomic radii.

• Additional atom-related properties can be added.

The molecular surface (solvent-excluded or Connolly surface) is used to define the excluded volume of the molecule, with all grid points inside this volume defined as “forbidden” (F). An expanded molecular surface is made by adding 3 Å to the radius of each atom. Grid points inside this surface, but outside the solvent-excluded surface, are defined as “attractive” (A). NOTE: All surfaces are made using the heavy atoms only; no hydrogen atom positions are included in the calculation.

Example: For our starting stationary molecule stat.pdb, the input file for the molecular surfaces is:

stat.cen.noh.pdb

The call to generate the .xyzcrv file in prepscript is:

gen_xyzcrvs stat.cen.noh.pdb $xdim $ydim $zdim $xyzstep 2>&1 | tee gen_xyzcrvs.log

where $xdim, $ydim, and $zdim are the x, y, and z dimensions of the grid and $xyzstep is the grid spacing (prepscript assumes a default of 1 Å).

The resulting .xyzcrv file, in our example

stat.cen.noh.xyzcrv

is a free-format text file, each line of which has fields X, Y, Z, Action Code, radius, and an optional value to be filled into the sphere with that center and radius.

1. X, Y, and Z are the coordinates of the sphere center, in Angstroms, in the centered stationary molecule’s coordinate system.

2. The Action Code is one letter:

• F for ‘forbidden’ (the excluded interior volume, replaces any previous values in the sphere’s volume)
• A for ‘attractive’ (favorable van der Waals, does not replace forbidden region)
• R for ‘replace with attractive’ (favorable van der Waals, regardless of previous value)
• S for ‘sum’ (replace with sum of previous value and specified value, does not replace forbidden region)

3. The radius of the sphere in Angstroms. If smaller than the grid spacing, only the single nearest grid point will be filled in.

4. An optional value can be used. Normally this is omitted because it is determined by the Action Code; see dotio.c for details and dotio.h for the actual numeric values. Currently, these are 1 for attractive (A) and 1000 for forbidden (F).

Prepscript runs gen_xyzcrvs to produce two lists of grid points: ones that are within the inner volume and are assigned as “forbidden”, and ones that are within the outer (3 Å-extended) volume and are assigned as “attractive”.

Example .xyzcrv file fragment:

-24 1 13 A 0.1
-24 2 -4 A 0.1
-24 2 -3 A 0.1
-24 2 -2 A 0.1
-24 2 5 A 0.1
17 2 -2 F 0.1
17 2 -1 F 0.1
17 2 0 F 0.1
17 2 1 F 0.1
17 3 -6 F 0.1

In this typical case, the spheres all have a radius of 0.1 Å and therefore, for the default grid spacing of 1 Å, each includes only a single grid point. These are the forbidden grid points that were determined to lie inside the solvent-excluded surface and the attractive grid points that were determined to lie between the solvent-excluded and expanded surfaces.


The .xyzcrv file typically has 50,000 to 100,000 lines, which are all the grid points assigned as forbidden or attractive. The grid points that are not assigned an Action Code or value in the .xyzcrv file have the value 0.
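As a quick sanity check on a generated file, the field layout described above (X, Y, Z, Action Code, radius, optional value) can be read with a few lines of Python. This is a sketch, not DOT’s own reader (that lives in dotio.c); parse_xyzcrv is a hypothetical name, and skipping lines that begin with “#” is an assumption.

```python
from collections import Counter


def parse_xyzcrv(text):
    """Parse .xyzcrv lines of the form: X Y Z ActionCode radius [value]."""
    spheres = []
    for line in text.splitlines():
        fields = line.split()
        # Skipping blank and '#' lines is an assumption, not documented.
        if not fields or fields[0].startswith("#"):
            continue
        x, y, z = (float(v) for v in fields[:3])
        action = fields[3]
        radius = float(fields[4])
        value = float(fields[5]) if len(fields) > 5 else None
        spheres.append((x, y, z, action, radius, value))
    return spheres


fragment = """\
-24 1 13 A 0.1
-24 2 -4 A 0.1
17 2 -2 F 0.1
17 2 -1 F 0.1
17 3 -6 F 0.1
"""
spheres = parse_xyzcrv(fragment)
counts = Counter(s[3] for s in spheres)
print("attractive:", counts["A"], "forbidden:", counts["F"])
```

Counting the A and F entries this way gives a fast check that the file is not empty and that both layers are present.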

The construction of the .xyzcrv file makes it easy to add additional atom-related properties. Example: if it is known that a region of the stationary molecule is not available for the incoming molecule, the automatically assigned attractive layer in this region can be overwritten by:

• Selecting the atoms or secondary structure elements that represent this surface.

• Creating spheres of forbidden area centered at these atoms.

• Appending these spheres to the .xyzcrv file created by prepscript.
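The three steps above might be scripted as follows. This is a sketch under stated assumptions: the atom coordinates and the 4.0 Å sphere radius are invented for illustration, forbidden_sphere_lines and append_spheres are hypothetical helpers, and the coordinates must already be in the centered stationary molecule’s frame. Because DOT fills spheres in file order and F replaces any previous value, appended F spheres override the attractive layer they cover.

```python
def forbidden_sphere_lines(atom_centers, radius):
    """Format one 'F' (forbidden) sphere line per atom: X Y Z F radius."""
    return ["%.3f %.3f %.3f F %.1f" % (x, y, z, radius)
            for (x, y, z) in atom_centers]


def append_spheres(xyzcrv_text, new_lines):
    """Return the .xyzcrv text with the new sphere lines added at the end."""
    if xyzcrv_text and not xyzcrv_text.endswith("\n"):
        xyzcrv_text += "\n"
    return xyzcrv_text + "\n".join(new_lines) + "\n"


# Hypothetical atoms marking the blocked region (centered coordinates).
blocked_atoms = [(12.5, -3.0, 7.25), (13.0, -2.5, 8.0)]
lines = forbidden_sphere_lines(blocked_atoms, radius=4.0)
updated = append_spheres("-24 1 13 A 0.1\n", lines)
print(updated)
```

Work on a copy of the prepscript output so the original .xyzcrv file remains available for comparison runs.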

4.K.3 Electrostatic Clamping Values

The electrostatic clamping values are calculated by prepscript and inserted into the parameter file it generates. Prepscript instructs DOT to perform a technique known as electrostatic clamping; the interested reader may consult Roberts et al. [6] for a thorough explanation of the rationale. We use the van der Waals volume file for the steric part of the docking problem. However, there are well-documented problems with using the electrostatic potential at the van der Waals surface. We also define a solvent accessible surface, which is an expansion of the first based on a solvent radius of 1.4 Angstroms defined in gen_xyzcrvs. We calculate the electrostatic potential at the solvent excluded surface. The extremes of the potential at this surface (its maximum range) are then used as the clamp limits for the potential at the van der Waals surface. This has the effect of smearing out unrealistically concentrated charge at the surface.
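The effect of clamping on individual grid values can be illustrated in a few lines. This is a minimal sketch of the clamping operation itself, not of how prepscript derives the limits; clamp_potential is a hypothetical name, the grid values are made up, and the limits are the ones that appear in the sample parameter file in Section 5.G.1.

```python
def clamp_potential(values, clamp_low, clamp_high):
    """Limit each potential value to the range [clamp_low, clamp_high]."""
    return [min(max(v, clamp_low), clamp_high) for v in values]


# Made-up potential values; limits taken from the sample .parm file.
grid_values = [-5.2, -1.0, 0.3, 2.4, 9.9]
clamped = clamp_potential(grid_values, clamp_low=-1.7861, clamp_high=1.8275)
print(clamped)
```

Values already inside the range pass through unchanged; only the extreme values near the surface are pulled back to the limits.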

4.L DOT Parameter File (.parm)

Once all the molecule input files have been generated successfully, as indicated by the presence of the files above and any messages displayed, prepscript creates sample DOT ‘.parm’ files that specify the DOT calculation to be performed.

The DOT distribution includes a template for the DOT 2.0 parameter file in $DOT_ROOT/data/dot_parm_template. The default values are used unless they are set explicitly in prepscript. The grid dimensions, grid spacings, the location of the four structure files, and clamping range are set by prepscript. When proposed docked structures penetrate the surface of the stationary molecule’s excluded volume, the molecules may be thought to bump into each other. In fact, to allow for flexibility not captured by our rigid model, we may allow a specified number of “bumps” in our results. This number is specified in the dot.parm file.


4.M Prepscript checks throughout processing

Although at this point the user has most assuredly provided appropriate input files, a little error checking couldn’t hurt. Prepscript will check for the existence of water and a particular, but common, non-polar hydrogen. Also, since the structure should contain the polar hydrogens, we also check that the file contains at least some hydrogens. These tests will only find some common errors in the pdb files.

4.M.1 Molecular description files not made correctly (most common problem is file is empty)

4.M.2 Problems with centering

Since prepscript centers the coordinates with added polar H atoms and places the center of mass at the nearest grid point, the midpoint of these centered files should be off by no more than 1 Å from 0,0,0. The files without polar H atoms will vary slightly from 0,0,0. Note that if you end up recentering (say, you discover later that you were missing some atoms, add them, and recenter), ALL of the DOT input files for those recentered coordinates must be remade.

It is very important that the moving molecule is quite close to centered. How to check the midpoints:

pdb_to_xyz centered_pdbfile | minmax > pdbfile.minmax


Also, compare the first few lines of the centered PDB files, with and without polar H atoms. The x,y,z coordinates of the heavy atoms should be exactly the same.

4.M.3 Problems running APBS

We have tried to supply runnable copies of APBS for each of our supported computer platforms, but this is tricky and we are unable to test these on every possible computer. If you try to run APBS and get messages about “missing library” or “GLIBC2.7 not available”, we may be able to help. First, see if your computer can run one of the older alternate versions that we include in $DOT_ROOT/bin/$ARCHOSV for your particular $ARCHOSV (such as “i86Linux2” for 32-bit Intel Linux). If one of those runs, say, “apbs-1.0.0-ia32”, do this so that the name “apbs” will point to the program that runs for you:

cd $DOT_ROOT/bin/$ARCHOSV; rm apbs; ln -s apbs-1.0.0-ia32 apbs

If none of them run, it is likely you will need to update the operating system software on your computer; email us if you need help deciding if this is the case.

4.M.4 Problems running MSMS

If you have problems running MSMS, check first on the Sanner lab web site to make sure you have their current version (see section 8.D.2, page 60). If you get the error message “msms: command not found” or “/lib/ld-linux.so.2: bad ELF interpreter”, see that same page. If these do not solve the problem, let us know and we will work with you, and with the Sanner lab if necessary.

4.M.5 Problems running Reduce

As with APBS, we have tried to supply copies that run without recompiling. If Reduce fails to run for you, you can recompile it on your computer (see instructions in 9.E, page 64), look for a newer version at the Richardson lab site http://kinemage.biochem.duke.edu/software, or email us and we will help you if we can.

4.M.6 Problems calculating charges: PDB to XYZQ

Typically, programs that are used to calculate the electrostatic potential around a molecule utilize a PDB file for the molecule and an appropriate residue charge library. Here we assume the use of the included charge library. The residues and atoms are looked up in this library and written to an XYZQ file which contains the coordinates and charges of the moving molecule.

Verification: For the .xyzq file (moving molecule), check that the total charge on the last line of the .xyzq file agrees with your calculation. Check that there were no error messages in the .xyzq file. Check the beginning of the file: are the coordinates exactly the same as in the corresponding PDB file? For this test of prepscript, also remove all the comment lines from the .xyzq file and check the center point, which should be exactly the same as that of the centered PDB file, and check that the 4th column (charge) adds up properly.

For example:

grep -v "#" mov.cen.xyzq | minmax

which will report the center, and

grep -v "#" mov.cen.xyzq | awk '{t+=$4}; END {print t}'

which will sum the 4th column of the non-comment lines in file mov.cen.xyzq.
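Where the minmax utility or awk is not at hand, the same two checks can be approximated in Python. This is a sketch: sum_charge_and_center is a hypothetical helper, the sample lines are invented, and taking the “center” to be the midpoint of the minimum and maximum coordinate on each axis is an assumption about what minmax reports.

```python
def sum_charge_and_center(xyzq_text):
    """Sum column 4 and return the min/max midpoint of columns 1 to 3."""
    xs, ys, zs, total = [], [], [], 0.0
    for line in xyzq_text.splitlines():
        # Same filter as grep -v "#": drop any line containing '#'.
        if "#" in line or not line.strip():
            continue
        x, y, z, q = (float(v) for v in line.split()[:4])
        xs.append(x)
        ys.append(y)
        zs.append(z)
        total += q
    midpoint = tuple((min(c) + max(c)) / 2.0 for c in (xs, ys, zs))
    return total, midpoint


# Invented sample data in X Y Z charge order.
sample = """\
# comment line
-1.0 0.0 2.0 0.25
1.0 4.0 -2.0 -0.75
0.0 2.0 0.0 0.5
"""
total, midpoint = sum_charge_and_center(sample)
print(total, midpoint)
```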


Chapter 5

DOT

5.A rundot

The easiest way to start DOT is by invoking rundot, a script we provide. You always need a DOT parameter file and the four input files discussed just below (Section 5.C). If you are running in ‘parallel mode’, on more than one computer, you also need a ‘hosts’ file; see section 5.D.2. rundot will examine the parameter file and use it to create a new directory (folder) for each run, named according to the date, time, and parameters used. These folders will be placed in the project’s ‘runs’ folder, which will be created if it does not itself already exist.

5.B Parameters most commonly changed

DOT’s functions are controlled from a parameter file, which DOT reads at the beginning of a run. You will likely use several parameter files during a research project, with different numbers of allowed bumps (atom-atom clashes), fewer or more rotations to try, and saving different numbers of results. In general, start out with no bumps and just a few hundred rotations (best: just one rotation), and save only the best 2000 results. After you have examined the output of the initial shake-down runs, increase the number of rotations (a smaller degree spacing implies more rotations) and save perhaps the best 20,000 or more results.

Small moving molecules need fewer rotations to sample their orientation adequately; large molecules need more. Table DOTROT shows the rotation files provided in the DOT distribution. As a rule of thumb, choose a rotation step that moves the outer atoms of the moving molecule by about the 3-D diagonal of one grid cell, which is sqrt(3) * grid spacing. For example, if your moving molecule has a radius of 15 angstroms, 1 angstrom grid spacing (1.7 angstrom 3-D diagonal) would subtend 1.7 * 60 degrees (approximately, per radian) / 15 = about 6.8 degrees, so the “deg06.eul” file would be appropriate.
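The rule of thumb is easy to evaluate for other radii and grid spacings. The sketch below is illustrative only: rotation_step_degrees and choose_spacing are hypothetical helpers, and picking the largest provided spacing that does not exceed the computed angle is an assumption about how to round, not a DOT rule.

```python
import math

# Rotation spacings (degrees) of the files listed in Table DOTROT.
AVAILABLE_SPACINGS = [4, 6, 8, 10, 12, 20, 72, 90]


def rotation_step_degrees(radius, grid_step=1.0):
    """Angle subtended by one grid cell's 3-D diagonal at the given radius."""
    diagonal = math.sqrt(3.0) * grid_step
    return math.degrees(diagonal / radius)


def choose_spacing(radius, grid_step=1.0):
    """Largest available spacing not exceeding the rule-of-thumb angle."""
    target = rotation_step_degrees(radius, grid_step)
    candidates = [s for s in AVAILABLE_SPACINGS if s <= target]
    return max(candidates) if candidates else min(AVAILABLE_SPACINGS)


step = rotation_step_degrees(15.0)
print(round(step, 2), "deg%02d.eul" % choose_spacing(15.0))
```

With the exact factors (sqrt(3) and 180/pi) the 15 angstrom example comes out slightly under the rounded figure in the text, and still selects the 6 degree file.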

It is best to make your parameter files with names meaningful to you; we find a pattern like “udgugi.deg06.nb10.parm” to work well, indicating that the stationary molecule is udg, the moving molecule is ugi, the rotation spacing is 6 degrees, and the number of allowed bumps is 10. Note, however, that DOT ignores the file name and just looks at the content.

Table DOTROT: Rotation files included in DOT distribution

Spacing    Number of      File name
(degrees)  Orientations   (in $DOT_ROOT/data)
4          232020         deg04.eul
4          12869          deg04.near180.eul    Rotations in range 175 to 180 degrees
6          54000          deg06.eul
8          27000          deg08.eul
10         14400          deg10.eul
12         9000           deg12.eul
20         1800           deg20.eul
72         60             deg72.eul
90         24             deg90.eul
360        1              zero-rotation.eul


We find it easiest to make changes by copying to a new name and editing the copy. For example, prepscript’s parm files save the 2000 best-ranked placements. To save 20000 of them, copy the sample parameter file (that prepscript will have made for you as statmov.deg06.nb0.parm) to a name such as statmov.deg06.nb0.top20000.parm. Edit that copy and change the line

output_how_many_best_values 2000

to

output_how_many_best_values 20000

If your moving molecule is small, you might not need as fine a rotation spacing as the 6 degrees that prepscript’s parm files use. Suppose you decide 10 degrees is sufficient; copy statmov.deg06.nb0.parm to statmov.deg10.nb0.parm and change deg06 to deg10 in the line

rot_file $DOT_ROOT/data/deg06.eul

To allow ten bumps, change the 0 to 10 in the line

mov_atoms_in_stat_interior_limit 0

Section 5.G.2 gives details on the many other parameters for DOT, but these three are (by far) the ones most commonly changed.
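If you make many variants, the copy-and-edit step can be scripted. The following is a sketch, not a DOT tool: set_parm is a hypothetical helper that rewrites the value on the one line beginning with a given parameter name and refuses to guess if that line is missing or duplicated.

```python
import re


def set_parm(parm_text, name, value):
    """Replace the value on the single line that starts with `name`."""
    pattern = re.compile(r"^([ \t]*%s[ \t]+)\S+" % re.escape(name),
                         re.MULTILINE)
    # A function replacement avoids special handling of '$' or '\'.
    new_text, count = pattern.subn(lambda m: m.group(1) + str(value),
                                   parm_text)
    if count != 1:
        raise ValueError("expected exactly one %r line, found %d"
                         % (name, count))
    return new_text


parm = """\
rot_file $DOT_ROOT/data/deg06.eul
mov_atoms_in_stat_interior_limit 0
output_how_many_best_values 2000
"""
parm = set_parm(parm, "rot_file", "$DOT_ROOT/data/deg10.eul")
parm = set_parm(parm, "mov_atoms_in_stat_interior_limit", 10)
parm = set_parm(parm, "output_how_many_best_values", 20000)
print(parm)
```

Write the result to a new file name that records the change (for example deg10 and nb10), in keeping with the naming pattern suggested above.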

5.C Molecular description files needed by DOT

The parameter file also names the four necessary input files for DOT:

mov.cen.polh.xyzq      electrostatic charges of atoms of moving molecule
stat.cen.polh.*.dx     electrostatic potential grid of stationary molecule
mov.cen.noh.xyz        van der Waals spheres of moving molecule
stat.cen.noh.xyzcrv    shape potential of stationary molecule

Because the electrostatic potential file can be large, you may want to compress it (as .Z file) or gzip it (as .gz file). DOT will automatically look for and read these compressed files if the parameter file specifies the uncompressed name and that file is not found.

5.D Running DOT

5.D.1 Single processor mode

You need only a DOT parameter file and the four molecular-data-derived input files.

rundot parmfile optional comments for log

These optional comments will be appended to $projdir/dotruns.txt

5.D.2 Multiple computers on a local network

You need the same DOT parameter file, the same four input files discussed above, and a “hosts” file that lists the names of the computers to run DOT on, one per line.


rundot parmfile --hosts dot.hosts optional comments for log

If you have a ‘dual processor’ (or ‘quad’, or more) computer, like a recent Macintosh, you can use all the processing cores by making a hosts file that has your computer name (type “hostname” to see what it is) in it multiple times. For example, if your dual-core computer is named “dopey”, make a hosts file with two lines like this (but use “dopey.local” on a Macintosh, or try “localhost” if neither of these works for you):

dopey
dopey

Try four copies on a quad core, and so on. [The MPICH manual tells other ways to do this, but this is simple and works quite well for DOT’s loose parallelism.]

You can mix different computer platforms in your hosts file as long as DOT has been compiled and installed for that computer type; the DOT 2.0 release includes Intel Linux 32-bit and 64-bit, Power-PC and Intel Mac OS X (Darwin), and Sun SPARC Solaris. For example, if your local network has an Intel Linux named “dopey”, a Power-PC Macintosh named “happy”, and a Sun Solaris named “bashful”, just make a 3-line text file named dot.hosts in your DOT project directory containing:

dopey
happy
bashful

We have had best results putting first the name of the computer you’re starting DOT on, but please let us know how this works for you. (Once you get going and you have many machines listed, you can comment out ones you don’t want for a particular run by putting a “#” in front of those lines.)

Be sure the $DOT_ROOT directory as well as your DOT project directory are mounted on all the hosts. We use NFS (Sun’s network file system) for this but any local area shared file system should work (let us know of problems or successes please).

Any computer you list must have the “ssh” (secure shell) service (or the older “rsh”, remote shell, see below) turned on and enabled for your login id. On Solaris or Linux, this is usually an entry in /etc/services; type “man sshd” for help. On Macs, open “System Preferences > Sharing” and check the item “Remote Login”. You can allow remote logins from all users or only those you list.

Finally, you must be able to run programs on all of these hosts from the start-up host without having to type your password. Be sure of this by typing (for example):

ssh happy date ; ssh bashful date

If you cannot do this using ssh, try again using rsh:

rsh happy date ; rsh bashful date

If rsh works but ssh does not, set the environment variable P4_RSHCOMMAND to rsh (sh/bash: export P4_RSHCOMMAND=rsh; csh/tcsh: setenv P4_RSHCOMMAND rsh) before trying to run DOT in parallel.

To learn how to set up your ssh and rsh environments so they do not require passwords, see the MPICH web documentation, especially http://www-unix.mcs.anl.gov/mpi/mpich1/docs/mpichman-chp4/node127.htm#Node127 and http://www-unix.mcs.anl.gov/mpi/mpich1/docs/mpichman-chp4/node128.htm#Node128. See also the discussion in the “Installation” chapter of this manual.

If you are unable to get either “ssh” or “rsh” to work without passwords, see the MPICH “chp4” manual to set up a secure server. The MPICH manual also describes several fancier and more flexible ways to run DOT in parallel; in particular, you can set up (once) a “machines” file and then run DOT using “mpirun -np ...”; see http://www-unix.mcs.anl.gov/mpi/mpich1/docs/mpichman-chp4/mpichman-chp4.htm. You can also run DOT on a Beowulf-style computer cluster or on a massively parallel supercomputer such as an IBM BlueGene; please email the DOT help line ([email protected]) for help.


5.D.3 Multiple processors, single supercomputer

The procedure varies among the supercomputer centers. CCMS had experience with BlueGene and we are eager to help you but cannot offer written instructions yet.

5.E DOT actions on input files

5.E.1 Stationary molecule shape potential (.xyzcrv)

1. Map onto grid

2. Forbidden - default value 1000

3. Why another Forbidden value might be needed

4. Forbidden - needs to be larger than max number of M atoms in favorable layer

5. Attractive - default value 1

5.E.2 Moving molecule shape (.xyz)

1. Coordinates of moving mol. heavy atoms are mapped onto nearest grid point by DOT

2. Why not interpolated? Problems with excluded stat. mol volume

5.E.3 Stationary molecule electrostatic potential (.apbsgrd)

1. Electrostatic clamping - what and why

2. Interior zeroing - what and why

5.E.4 Moving molecule atomic partial charges (.xyzq)

1. Atomic charges interpolated onto nearest 8 grid points by DOT

2. Distributes M atomic charge better over grid - slightly better answers
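To make the idea of spreading one atomic charge over its 8 surrounding grid points concrete, here is a small sketch using trilinear weights. The outline above does not say which weighting DOT uses, so trilinear interpolation is an assumption, as is the convention that grid index equals coordinate divided by grid spacing; distribute_charge is a hypothetical name.

```python
import math


def distribute_charge(x, y, z, q, grid_step=1.0):
    """Return {(i, j, k): charge} for the 8 grid points around (x, y, z)."""
    gx, gy, gz = x / grid_step, y / grid_step, z / grid_step
    i0, j0, k0 = math.floor(gx), math.floor(gy), math.floor(gz)
    fx, fy, fz = gx - i0, gy - j0, gz - k0   # fractional offsets, 0 to 1
    shares = {}
    for di, wx in ((0, 1.0 - fx), (1, fx)):
        for dj, wy in ((0, 1.0 - fy), (1, fy)):
            for dk, wz in ((0, 1.0 - fz), (1, fz)):
                shares[(i0 + di, j0 + dj, k0 + dk)] = q * wx * wy * wz
    return shares


# Invented atom position and partial charge.
shares = distribute_charge(2.25, -1.5, 0.0, q=-0.4)
print(sum(shares.values()))
```

The weights always sum to one, so the total charge placed on the grid equals the atomic charge regardless of where the atom sits within the cell.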

5.F What DOT computes

5.F.1 van der Waals energy

• Count of number of M heavy atoms in favorable layer surrounding S

• Count times -0.1 kcal/mol to give interaction energy

• S favorable layer 3 Å thick.
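For a single placement, the count-based term outlined above can be written as a direct loop. This is an illustration only and makes no claim about how DOT evaluates the count internally over the whole grid; vdw_term is a hypothetical helper, the grid points are invented, and the default weight and bump limit mirror the vdw_weight and mov_atoms_in_stat_interior_limit defaults.

```python
def vdw_term(mov_grid_points, attractive, forbidden,
             vdw_weight=-0.1, bump_limit=0):
    """Return (energy, bumps); energy is None if bumps exceed the limit."""
    bumps = sum(1 for p in mov_grid_points if p in forbidden)
    if bumps > bump_limit:
        return None, bumps
    contacts = sum(1 for p in mov_grid_points if p in attractive)
    return vdw_weight * contacts, bumps


# Invented grid points for the stationary molecule's two volumes.
attractive = {(0, 0, 0), (1, 0, 0), (2, 0, 0)}
forbidden = {(5, 5, 5)}
# Invented grid positions of four moving-molecule heavy atoms.
mov = [(0, 0, 0), (1, 0, 0), (9, 9, 9), (5, 5, 5)]

print(vdw_term(mov, attractive, forbidden, bump_limit=0))
print(vdw_term(mov, attractive, forbidden, bump_limit=1))
```

With no bumps allowed the placement is rejected; allowing one bump keeps it, with two atoms in the favorable layer contributing -0.1 kcal/mol each.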

5.F.2 Electrostatic energy

• Stationary molecule: elec. pot. grid made using Poisson-Boltzmann methods

• Moving molecule: atomic partial charges

• Intermolecular elec. energy calculated as mov. mol partial charges in elec. pot. of stationary mol.

• Elec. pot. grid is created by APBS

• Charge library used for both S and M has charges for heavy atoms plus polar H model

• Library currently has amino acids, plus charged N and C termini for all.

• Library currently has DNA nucleic acids, plus 5' and 3' termini for all

• APBS command file generated by prepscript

5.G DOT parameters

dot2.parm: DOT parameter file. DOT’s functions are controlled by a parameter file, which DOT reads at the beginning of a run. The “rundot” script arranges to expand shell-style variables (like $DOT_PROJECT, $DOT_ROOT) before handing the file to DOT.

5.G.1 Sample Parameter file

Here is a sample parameter file; the ‘#’ marks begin comments:

# grid size (number of points in each dimension)
dot_grid_size 64

# grid spacing (in Angstrom)
dot_grid_step 1.0

# Input files.
# stationary molecule electrostatic potential grid (UHBD or APBS):
stat_pot_file $DOT_PROJECT/coords/udg/udg.cen.64.150m.dx

# stationary molecule shape potential:
stat_vdw_file $DOT_PROJECT/coords/udg/udg.cen.noh.xyzcrv

# moving molecule atomic point charges:
mov_charge_file $DOT_PROJECT/coords/ugi/ugi.cen.polh.xyzq

# moving molecule coordinates
mov_vdw_file $DOT_PROJECT/coords/ugi/ugi.cen.noh.xyz

# rotation list file:
rot_file $DOT_ROOT/data/deg72.eul

# DOT computation parameters

stat_pot_clamp_high 1.8275
stat_pot_clamp_low -1.7861

# number of moving mol atoms allowed to penetrate stat mol (bumps), default 0
mov_atoms_in_stat_interior_limit 10

# DOT cautious approach:
fussy on

# DOT output
out_base udgugi # output file base name
output_log_detail 5 # how chatty DOT is. Values 0 to 8 are reasonable
output_how_many_best_values 2000
mov_pdb_file $DOT_PROJECT/coords/ugi/ugi.cen.noh.pdb # passed to evaluate_dot_run
stat_pdb_file $DOT_PROJECT/coords/udg/udg.cen.noh.pdb # passed to evaluate_dot_run
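For scripts that need to inspect a parameter file (for example, to record which rotation file a run used), the layout above can be read as follows. This is a sketch and not DOT’s own parser: parse_parm is a hypothetical helper, “/opt/dot” is a made-up location, and the variable expansion only imitates what rundot is described as doing before it hands the file to DOT.

```python
from string import Template


def parse_parm(text, variables):
    """Return {name: value} with comments removed and $VARS expanded."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # '#' begins a comment
        if not line:
            continue
        parts = line.split(None, 1)
        name = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        params[name] = Template(value).safe_substitute(variables)
    return params


sample = """\
# grid size (number of points in each dimension)
dot_grid_size 64
rot_file $DOT_ROOT/data/deg72.eul
out_base udgugi # output file base name
fussy on
"""
# "/opt/dot" is a hypothetical install location.
params = parse_parm(sample, {"DOT_ROOT": "/opt/dot"})
print(params)
```

All values are kept as strings; converting them to numbers or booleans is left to the caller, since the expected type depends on the parameter.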


5.G.2 Parameter Reference (Table)

Dot 2 parameters
Commonly-used and required parameters are in bold.

parameter                                   type      default     notes
dot_version                                 string    none        Parameter file format identifier (ignored)
fftw_plan                                   string    "patient"   Experts only, also "estimate", "measure"
do_logNmerge                                boolean   false       Use for massively parallel runs (BlueGene)
fussy                                       boolean   true        Halt if likely mistakes are detected

dot_grid_size                               int       128         X,Y,Z size of the grid in grid points
sizex                                       int       128         X size of the grid in grid points
sizey                                       int       128         Y size of the grid in grid points
sizez                                       int       128         Z size of the grid in grid points
dot_grid_step                               float     1.          (Angstroms)

stat_pot_file                               filename  ""          APBS, UHBD, or DelPhi grid file
stat_vdw_file                               filename  ""          xyzcrv shape file
stat_pdb_file                               filename  ""          pdb file - passed to evaluate_dot_run
mov_charge_file                             filename  ""          xyzq file
mov_vdw_file                                filename  ""          xyz file, default is mov_charge_file data
mov_pdb_file                                filename  ""          pdb file - passed to evaluate_dot_run
rot_file                                    filename  ""          Euler angle file
out_base                                    string    ""          base name for all output files

stat_pot_clamp_high                         float     +6.         Max electrostatic potential value used
stat_pot_clamp_low                          float     -6.         Min electrostatic potential value used
stat_pot_interior_scale                     float     0.          Experts only, interior electrostatic scaling
stat_pot_interior_zero                      boolean   true        Synonym for stat_pot_interior_scale 0.0
stat_vdw_interior                           float     1000.       Experts only, sets "forbidden" value

vdw_weight                                  float     -0.1        van der Waals energy term weighting
electrostatic_weight                        float     1.0         electrostatic energy term weighting
mov_atoms_in_stat_interior_limit            integer   0           how many bumps to allow

do_partition_sum                            boolean   false       compute partition sum ("free energy")
partition_sum_temp                          float     300.0       Kelvin

do_energy                                   boolean   false       Retain grid of best energy per grid cell
do_bgrids                                   boolean   false       Automatically implies do_energy
do_histograms                               boolean   false
output_log_detail                           integer   1           Values 4 to 8 are reasonable
output_how_many_best_values                 integer   200         how many globally-best energies to retain
output_how_many_per_gridcell_best_values    integer   0           Also called "saved best values"
output_how_many_partition_sum_best_values   integer   unlimited   Effective only if do_partition_sum is true
output_all_Ethreshold                       float     -1000.      Experts only, report every energy < this

Note: Dot accepts “true”, “yes”, “on”, or “enabled” to set booleans true; Dot interprets anything else as false.

Note: Dot source code generally uses variable names different from these parms.
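The boolean rule is worth keeping in mind when writing parameter files by hand, because a typo silently turns an option off. A sketch of the rule as stated (whether DOT ignores letter case is not documented here, so this hypothetical parse_boolean compares the word exactly):

```python
TRUE_WORDS = {"true", "yes", "on", "enabled"}


def parse_boolean(word):
    """True only for the four accepted words; anything else is false."""
    return word in TRUE_WORDS


for word in ("on", "yes", "off", "1", "ture"):
    print(word, parse_boolean(word))
```

Note that “1” and the misspelling “ture” both come out false under this rule.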

5.G.3 Parameter Guide

Parameter ‘mov_atoms_in_stat_interior_limit’. Sets the maximum number of moving molecule atoms allowed to penetrate the stationary molecule’s excluded volume. Informally called the number of bumps.

Parameter ‘vdw_weight’. Sets the weighting in the composite energy of the van der Waals count (number of moving molecule atoms inside the stationary molecule’s favorable layer). The default is -0.1, so that each moving molecule atom in the favorable layer contributes -0.1 kcal/mol to the van der Waals energy. We have found that this default provides a good balance with the electrostatic energy term.

Parameter ‘output_how_many_best_values’. Saves the top N solutions, including multiple solutions centered at the same grid point with different orientations. This better reveals clusters that may indicate correct solutions. Not available if DOT was compiled without the HEAP option.

Parameter ‘output_all_Ethreshold’. Saves all solutions, including multiple solutions centered at the same grid point with different orientations, with energies more favorable than the specified threshold. Needs to be used carefully to avoid outputting a very large number of solutions.

Parameter ‘output_how_many_per_gridcell_best_values’. Saves the top N placements taken from the list of the best ranked solution at each grid point. In other words, if two solutions centered at the same grid point but with different orientations are highly ranked, only the one with the best energy is included in the list. Not available if DOT was compiled without the OLDSORT option.

Parameter ‘fussy’. By default, DOT performs strict checks for possible problems in the parameter file options, such as for no more than 200,000 rotations, no fewer than 20 moving atoms, no more than 20,000 moving atoms, and a non-positive potential grid low clamp. If you are running large problems, or using DOT for things like fitting to atomic-force-microscopy data [11] instead of for protein-protein docking, you must tell DOT to be less fussy. To do this, simply include the line

fussy off

anywhere in the parameter file.


Chapter 6

Evaluation

6.A After DOT run:

$projdir > cd runs
$projdir/runs > ls
$projdir/runs > (date.time).statmov.(rotations).nb(# of bumps)

There will be a DOT results directory for each simulation run; each will contain the following files.

date.rundot.parm    dot parms used
statmov.topN.e6d
statmov.topN.top2000.ACE6.e6d
log

6.B evaluate_dot_run

If DOT finishes without reporting errors, the rundot script automatically runs a second script, evaluate_dot_run, to create PDB files for an initial evaluation. You can also run evaluate_dot_run later, as many times as you want, by giving it the name of the directory to do the evaluation in (such as runs/20070927.2359.1jcg2rsa.deg06.nb0).

The evaluate_dot_run script will rescore the top 2000 DOT placements using the sum of the DOT electrostatic energy and an empirical “ACE” term based on atom types [14], with a 6 Angstrom cutoff.

The evaluate_dot_run script will then rescore the top 5 DOT placements and the top 5 ACE+electrostatic placements using the sum of the DOT electrostatic energy and an exposed-area-based “RAASP” term based on atom types [2], with a 1.4 Angstrom surface probe radius.

The “e6d” DOT result files that evaluate_dot_run makes follow a naming convention we have found useful:

(DOT_RUN).(select topN).(rescore).(select topN).(rescore)...e6d

For example,

udgugi.top2000.e6d are the top 2000 results from DOT.
udgugi.top2000.top2000.ACE6.e6d are those, all rescored with ACE6, and sorted by ACE6 + DOT electrostatics.
udgugi.top2000.top2000.ACE6.top5.RAASP1.4.e6d are those, the top 5 selected and rescored by RAASP desolvation.
udgugi.top2000.top5.RAASP1.4.e6d are the top 5 selected from the original DOT results, rescored by RAASP desolvation.

In our convention, which you should feel free to vary from, we rescore using the named scoring function (ACE6, RAASP1.4, ...) and then sort (best to worst) by the sum of that term with the original DOT electrostatic term.

The evaluate_dot_run script then generates PDB files of the docked moving molecule so you can look at them with computer graphics or, perhaps, minimize them or check for bumps. These PDB files will be placed in new directories in the runs/pdb/... directory named mov.top30, mov.top2000.ACE6, mov.top5.RAASP1.4, and mov.top2000.ACE6.top5.RAASP1.4. The top 30 raw DOT moving molecules will be in mov.top30, named mov.0001.pdb, mov.0002.pdb, etc., where ‘mov’ is the name you gave to your moving molecule. These are the top 30 as reported by DOT’s built-in electrostatic + quick van der Waals term. The ACE-rescored top 30 moving molecules will be in pdb/mov.top2000.ACE6 named mov.NNNN.pdb, where NNNN is their original DOT ranking and 6 is the ACE cutoff distance (6 Angstroms). The DOT+RAASP-rescored top 5 moving molecules will be in pdb/mov.top5.RAASP1.4 named mov.NNNN.pdb, and the DOT+ACE+RAASP rescored top 5 moving molecules will be in pdb/mov.top2000.ACE6.top5.RAASP1.4 named mov.NNNN.pdb, where NNNN is their original DOT ranking.

Why only 5? Our current RAASP evaluation is a scripting lash-up and is very slow and not parallelized. Therefore, to avoid lengthy computations, the default evaluate_dot_run sets “maxnRAASP=5”. When you have assured yourself that your protein models are appropriate, make your own copy of evaluate_dot_run (see below) and customize it by editing the “# defaults:” section to increase “maxnACE” to as many DOT results as you are saving, perhaps 20,000 or even 200,000, and increase “maxnRAASP” to perhaps 2000.

The evaluate_dot_run script also makes, in these four directories, a combo file containing the C-alpha backbone atoms only, separated by TER records, named mov.top30.ca.pdb, mov.top30.ACE6.ca.pdb, mov.top5.RAASP1.4.top5.ca.pdb, and mov.top2000.ACE6.top5.RAASP1.4.top5.ca.pdb.

You must examine and compare all these PDB files against the centered stationary molecule, not the original one you supplied to prepscript. You will find this file in your coords/stat/stat.cen.noh.pdb (where ‘stat’ is the name you gave to your stationary molecule). The rundot script also makes a copy of it, by that name, in the runs/... directory at the conclusion of each successful DOT run.

6.C How to customize evaluate_dot_run

You eventually will want to customize evaluate_dot_run. To do this, copy it from $DOT_ROOT/bin/share into your project directory (cp $DOT_ROOT/bin/share/evaluate_dot_run .), make it executable (chmod +x evaluate_dot_run), and modify it as you wish. When rundot finds an evaluate_dot_run there, rundot will run it in preference to the default version in $DOT_ROOT/bin/share (or, in fact, in preference to any other found in your $PATH).

The ACE 6 Å cutoff is appropriate for evaluating dockings of bound structures or of those with little conformational change. Some researchers suggest a 9 Å cutoff works better for systems showing more change upon binding [14]. You can edit evaluate_dot_run to do either or both (or any other): change the line acecutofflist="6" to acecutofflist="6 9", for example.

6.D log files

A simple log file from DOT is saved as “log”, with numbered log files for each computer if the DOT run was parallelized.

6.E e6d output files

The e6d format is, basically, a database stored in a simple text file. After a short descriptive header, each line specifies one DOT moving molecule placement relative to the centered stationary molecule. The first seven fields of each line are always the DOT total energy (kcal/mol), the three X, Y, Z translation values (Angstroms), and the three Euler angles representing the orientation (radians). By convention, the next four fields are the placement’s rank in the original DOT scoring, the electrostatic and van der Waals components of the DOT energy, and the number of “bumps” of this placement. Fields beyond these eleven are added by various DOT2 evaluation utilities, including ACE and RAASP.

All e6d files include a three-line header describing the fields’ name, data type, and units. For example, the DOT “statmov.top2000.e6d” output from a sample run might begin as:

## fields Energy X Y Z Phi Theta Psi Rank E_elec E_vdw Nbumps
## types float float float float float float float int float float int
## units kcal/mol ang ang ang radian radian radian 1-origin kcal/mol kcal/mol 0-origin
-27.8619 -11 19 11 -0.209 0.411 0.270 1 -14.1619 -13.7000 9
-27.8573 -11 19 10 0.105 0.471 0.014 2 -13.7573 -14.1000 10
-27.7278 14 13 3 -0.209 1.427 2.102 3 -13.3278 -14.4000 8
-27.7065 4 21 -2 0.314 1.567 2.643 4 -13.1065 -14.6000 8
...

After processing through ACE6 and RAASP, and resorting by electrostatic + RAASP score, the e6d file “statmov.top2000.ace6.top5.RAASP1.4.e6d” might begin as:


## fields Energy X Y Z Phi Theta Psi Rank E_elec E_vdw Nbumps ACE6 E_elec+ACE6 RAASP1.4 E_elec+RAASP1.4
## types float float float float float float float int float float int float float float float
## units kcal/mol ang ang ang radian radian radian 1-origin kcal/mol kcal/mol 0-origin kcal/mol kcal/mol kcal/mol kcal/mol
-23.8236 -15 17 10 0.314 0.471 0.014 767 -12.0236 -11.8000 9 2.8480 -9.1756 -130.3064 -142.3300
-25.8742 -14 18 9 0.419 0.551 -0.181 59 -13.3742 -12.5000 9 4.6193 -8.7549 -128.8119 -142.1861
-22.9091 -12 19 8 -0.105 0.934 -3.016 1924 -13.2091 -9.7000 10 0.8951 -12.3140 -127.9991 -141.2082
-23.9238 -15 17 9 0.314 0.340 -0.014 695 -11.5238 -12.4000 9 2.3483 -9.1755 -129.6000 -141.1238
-23.0095 -4 22 -1 -0.628 1.165 3.093 1744 -10.1095 -12.9000 7 -0.1107 -10.2202 -128.1367 -138.2462
...

Note the lines now include the ACE and RAASP scores (with and without adding DOT electrostatics) for each placement, as identified in the header lines. The output has been sorted by the last field, but the original DOT rankings are available in field 8, Rank (767, 59, 1924, ...).
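In the sample above, each combined column is the sum of its two named parts (for the first record, -12.0236 + 2.8480 = -9.1756). A quick consistency check along these lines can be sketched as follows; e6d_check_sum is a hypothetical helper, and the 0.001 tolerance is an assumption chosen to allow for four-decimal rounding.

```shell
# Sketch only: verify that field $3 equals field $1 + field $2 on every record
# of an e6d file read from standard input. Returns non-zero on any mismatch.
# e6d_check_sum is a hypothetical name, not part of the DOT distribution.
e6d_check_sum() {
    awk -v a="$1" -v b="$2" -v s="$3" '
        $1 == "##" && $2 == "fields" {
            # remember the data column number of every named field
            for (i = 3; i <= NF; i++) col[$i] = i - 2
            next
        }
        $1 == "##" { next }               # skip the types and units lines
        (a in col) && (b in col) && (s in col) && NF >= col[s] {
            d = $(col[a]) + $(col[b]) - $(col[s])
            if (d < 0) d = -d
            if (d > 0.001) { bad++; print "mismatch on input line " NR }
        }
        END { exit (bad > 0) }
    '
}
```

A call such as `e6d_check_sum E_elec ACE6 E_elec+ACE6 < file.e6d` reports any record whose combined score disagrees with its parts. Records are silently skipped if any of the three names is missing from the header.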

The DOT2 distribution includes utilities to select and sort e6d records, append and extract fields by name, and convert to CSV (comma-separated value) format for further analysis; see page 54.

6.F Quick comparison of center and orientation

Distance of the center and orientation from a reference value; useful for comparisons with the correct solution or a preferred solution. Needs only the DOT output file of solutions (translations and orientations), not the full coordinates.
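The translational part of such a comparison needs only fields 2 to 4 of each e6d record. The sketch below prepends the distance between each placement's center and a reference point; it is an illustration under that assumption, not the DOT implementation, and e6d_center_distance is a hypothetical name. The orientation comparison is omitted because it depends on DOT's Euler angle convention.

```shell
# Sketch only: prefix each e6d record with the distance (Angstroms) between
# its X, Y, Z translation (fields 2-4) and a reference point.
# Arguments: reference X Y Z. Input: e6d records on standard input.
# e6d_center_distance is a hypothetical name, not part of the DOT distribution.
e6d_center_distance() {
    awk -v rx="$1" -v ry="$2" -v rz="$3" '
        $1 == "##" { next }               # skip header lines
        NF >= 4 {
            dx = $2 - rx; dy = $3 - ry; dz = $4 - rz
            printf "%.3f %s\n", sqrt(dx*dx + dy*dy + dz*dz), $0
        }
    '
}
```

The output can be sorted numerically on its first column to list the placements closest to the reference first.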

6.G Creating PDB files

Run "pdbgen mov.cen.noh.pdb shortname-for-output-files file.e6d"

6.H RMSD values between PDB files

RMSD differences among a set of possible solutions. Useful for identifying similar solutions among the top few solutions. Requires making the PDB files, with at least C-alpha positions.

6.I Bump-checking PDB files

Requires making the PDB files.

6.J Residue-residue interactions

Useful for comparison with a known solution or a preferred solution. The method used to evaluate results in the CAPRI competition. Requires making the PDB files.

6.K Re-ranking by ACE scores

Example: re-rank the top 20,000 DOT results by the sum of the ACE potential (using a 6 Å cutoff) and DOT electrostatics, then generate PDB files of the top 20:

    e6d_first 20000 statmov.top20000.e6d | ace-desolvation-oo --dist 6 mov.cen.pdb stat.cen.pdb | e6d_sort_by E_elec+ACE6 | e6d_first 20 | pdbgen mov.cen.pdb mov

6.L Distance filtering

A distance filter is useful in applying known biochemical data, such as that a specific residue or set of residues lies in the interface.

The basic tool is the DOT utility dotxyzfilter, whose arguments are a distance, a count, a stationary xyz file, a moving molecule xyz file, and an input e6d file. It passes through (i.e., filters) any e6d placements for which at least count stationary atoms are within the specified distance of one or more moving molecule atoms.

Here is an example script that finds any placements where moving residue 228's CZ atom is within 6 Angstroms of stationary residue 96's CD atom. (Note that all these examples assume they are run in the directory that holds your results, such as runs/20080704.statmov.nb0.)

    #! /bin/bash
    statmol=stat.cen.noh.pdb
    statres=96
    statatom=CD
    movmol=mov.cen.noh.pdb
    movres=228
    movatom=CZ

    # create .xyz files
    # working file name, e.g. stat.cen.noh.96.CD.xyz
    statxyz=${statmol%.*}.$statres.$statatom.xyz
    pdb_resnum $statres $statmol | pdb_atomtype $statatom | pdb_to_xyz > $statxyz

    movxyz=${movmol%.*}.$movres.$movatom.xyz # working name
    pdb_resnum $movres $movmol | pdb_atomtype $movatom | pdb_to_xyz > $movxyz

    # run filter, input from statmov.top2000.e6d, output to statmov.filter.e6d
    # Filter will pass any e6d placements that have
    # one or more of the specified moving molecule atoms
    # within 6 Angstroms of any of the specified stationary molecule atoms
    count=1
    dmax=6
    dotxyzfilter $dmax $count $statxyz $movxyz < statmov.top2000.e6d > statmov.filter.e6d

    # generate PDB files of top 30 that pass filter
    e6d_first 30 statmov.filter.e6d | pdbgen $movmol filter

Here is a fancier example script that finds any placements where 5 or more C-alpha atoms in stationary residues 25 to 33 are within 6 Angstroms of any atom in moving residues 228, 240, or 293. Without the -r flag, it would report the "reverse": any placement where 5 or more atoms in moving residues 228 etc. were within 6 Angstroms of any of the C-alpha atoms in stationary residues 25 to 33.

    #! /bin/bash
    statmol=stat.cen.noh.pdb
    movmol=mov.cen.noh.pdb

    # create xyz files
    statxyz=${statmol%.*}.filter.xyz # working name
    pdb_resrange 25 33 $statmol | pdb_ca | pdb_to_xyz > $statxyz

    movxyz=${movmol%.*}.filter.xyz # working name
    pdb_resnum '228 240 293' $movmol | pdb_to_xyz > $movxyz

    # run filter, input from statmov.top2000.e6d, output to statmov.filter.e6d
    # filter will pass any e6d placements with
    # any five or more stationary atoms within 6 Angstroms of any moving atom
    count=5
    dmax=6
    dotxyzfilter -r $dmax $count $statxyz $movxyz < statmov.top2000.e6d > statmov.filter.e6d

    # generate PDB files of top 30 that pass filter
    e6d_first 30 statmov.filter.e6d | pdbgen $movmol filter

The output of one filter can be piped into another, since each writes to standard output and reads from standard input. You can also concatenate the output from filters; however, the incoming e6d files must have the same fields (a limitation of the e6d file format).
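One way to respect the same-fields limitation when concatenating is to compare the "## fields" header lines first. The following is a minimal sketch under the assumption that matching headers are textually identical; e6d_fields and e6d_cat_checked are hypothetical helpers, not DOT utilities.

```shell
# Sketch only: concatenate two e6d files after checking that their
# "## fields" header lines are identical.
# e6d_fields and e6d_cat_checked are hypothetical names.
e6d_fields() {
    # print the first "## fields" header line of the named file
    awk '/^## fields/ { print; exit }' "$1"
}

e6d_cat_checked() {
    # $1, $2: e6d files to join; the result goes to standard output
    if [ "$(e6d_fields "$1")" != "$(e6d_fields "$2")" ]; then
        echo "e6d_cat_checked: field lists differ, not concatenating" >&2
        return 1
    fi
    cat "$1"                    # header and records of the first file
    grep -v '^##' "$2"          # records only from the second file
    return 0
}
```

The joined records are not re-sorted, so a sort on the field of interest would normally follow.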

The dotxyzfilter utility has options to count the moving instead of the stationary atoms, or to report only boolean results; run

    dotxyzfilter --help

6.M Creating visualization input files

Often you just want to see where the centers of the moving molecule are. A simple way is to use e6d_to_pdbhoh to make a "fake" PDB file that has a water at each moving molecule center. For example, run

    e6d_first 200 statmov.top2000.e6d | e6d_to_pdbhoh > top200.pdb

You can then use a molecular graphics program to look at top200.pdb along with stat.cen.pdb to survey quickly what parts of the stationary molecule were favorable for docking.

Chapter 7

DOT Utility Programs

7.A Introduction

The DOT distribution includes a set of programs intended to help set up and evaluate DOT runs. About half of these are scripts (written in a blend of csh, bash, and awk), and about half are compiled programs (written in C or C++). The scripts are installed in $DOT_ROOT/bin/share, and the programs in $DOT_ROOT/bin/$ARCHOSV, where ARCHOSV is one of the supported platforms for DOT such as i86Linux2. As long as your $PATH includes both of these, you will not have to worry about which is which: just type the name (or write it into your own scripts) and it will work.

Most of these programs were written by the CCMS team, but we draw your attention to 'reduce', which is from the laboratory of David and Jane Richardson at Duke University, written by J. Michael Word. We encourage you to cite their work in any publications that result from your use of DOT, along with the work of the APBS, MSMS, FFTW, and MPICH teams (see 'Acknowledgements', page 6).

7.B Descriptions

The following is an admittedly telegraphic rundown of these programs; for most of them you can type their name followed by "--help". The scripts often have more information in them, which you can look at with any editor, and the source for the compiled programs will be found in the DOT installation directory $DOT_ROOT/src/util.

ace-desolvation-oo (compiled program) computes desolvation values for ACE, RAASP, and other published parameter sets.

acevalues-oo (compiled program) looks up ACE or RAASP atom pair coefficient

dotxyzfilter (compiled program) evaluation distance filter

bgrid_info (compiled program) inspects (seldom used) DOT binary grid

bgrid_minmax (compiled program) inspects (seldom used) DOT binary grid

bgrid_sort (compiled program) sorts (seldom used) DOT binary grid

convert-uhbd-grid_to_bgrid (compiled program) creates (seldom used) DOT binary grid

bgrid_bestenergy (compiled program) inspects (seldom used) DOT binary grid

analyze-triangles (compiled program) checks for degenerate triangles, used by prepscript

compare-edges (compiled program) checks triangle list edges, for data debugging

compare-faces (compiled program) checks triangle list faces, for data debugging

compare-verts (compiled program) checks triangle list vertices, for data debugging

create-triangles (compiled program) creates triangle list from MSMS output

expand-trianglesY (compiled program) enlarges triangles by normals

matrix_to_eul (compiled program) converts rotation matrices to DOT Euler angles

fill-double-hull (compiled program) fills region between polyhedra

fill-hull (compiled program) fills region within polyhedron, used by prepscript

print-half-edges (compiled program) Obsolete

6dtoxfm (compiled program) converts xyz-euler to 4x3 matrix

e6d_closeness (compiled program) reports angle and translation between xyz-euler values and specified target. See also e6d_closest and e6d_selectby_dist_angle.

e6d_closest (compiled program) finds xyz-euler values within specified tolerances of a specified target. See also e6d_closeness and e6d_selectby_dist_angle.

e6dexpand (compiled program) fills cubical grid from e6d file, for visualization

orient_survey (compiled program) checks Euler files for completeness and non-redundancy

ACE-script-oodot2 (script) runs ACE evaluation with specified options

Analyze-MSMS-script (script) used by gen_xyzcrvs to create triangles from vertex and face lists

Expand-tri-script (script) No longer used or included in DOT distribution

Fill-double-hull-script (script) runs fill-double-hull with specified options

Fill-hull-script (script) runs fill-hull with specified options

MSMS-exp-script (script) No longer used or included in DOT distribution

acenames (script) looks up PDB atom names in internal ACE or RAASP dictionary. Used by pdb_to_acenames.

apbsgrd_lookup_xyz (script) finds values in an apbs.dx grid, used by prepscript to compute electrostatic clamping values.

archosv (script) reports what computer platform it is run on, used by prepscript and run_dot

bgrid_to_avsfield (script) converts (seldom used) DOT binary grid for AVS visualization program

create_host_file (script) makes an MPICH p4pg file from simplified input, used by run_dot

create_p4pg_entry (script) makes an MPICH p4pg entry for a specified host, used by run_dot.

dot (script) currently out-of-date script to run DOT on supercomputers

dot0matrix (script) converts obsolete DOT 0 Euler angles to rotation matrix

dotmatrix (script) converts current DOT 2 Euler angles to rotation matrix

dot_pdb_e6d_eval_rmsd (Ruby script) computes RMSD in Angstroms between a target molecule and a moving molecule as positioned by xyz-euler values. Used by evaluate_dot_run if the file "mov.instatcen.noh.pdb" is present in the moving molecule coords directory. Requires the "ruby" language to be installed, see page 60.

dotpause (script) puts a DOT run to sleep on the user's computer so DOT will not interfere with the user's other work

dotresume (script) resumes a dotpause-ed DOT run

dot2-prep-gridsize (script) computes size of grid for a run, used by prepscript

dot2-prep-mol-common (script) computes files needed for both moving and stationary molecules, used by prepscript

dot2-prep-potgrid-apbs (script) computes electrostatic potential grid using APBS, used by prepscript

dot2-prep-potgrid-uhbd (script) computes electrostatic potential grid using UHBD, used by prepscript

dot2.setup.bash (script) when "source"ed, sets user's execution path to include DOT programs (bash shell version)

dot2.setup.csh (script) when "source"ed, sets user's execution path to include DOT programs (C-shell version)

e6d_append_axang (script) adds orientation axis & angle to e6d records

e6d_append_expression (script) adds metadata field to e6d header block

e6d_append_rmsd (script) adds alternate RMSD to e6d entries

e6d_append_values (script) adds arbitrary data to e6d header & entries

e6d_classify_by_xyz_distance (script) passes entries outside of specified sphere

e6d_common_lines (script) reports entries in two e6d files that are the same in fields 2-7 (placement: x, y, z, and orientation)

e6d_extract_fields (script) extracts key-named values from e6d records

e6d_first (script) selects first N entries from an e6d file

e6d_nonsimilar (script) quickly eliminates similar placements from an e6d file

e6d_selectby (script) selects entries that match a criterion from an e6d file

e6d_selectby_dist_angle (script) quickly selects placements from an e6d file that are within a distance and angular tolerance from a specified target. See also e6d_closeness and e6d_closest.

e6d_selectlessthan (script) selects entries that have key-named values numerically less than specified thresholds, such as RMSD or energy.

e6d_sort_by (script) sorts an e6d file by a specified field. Often used in a pipeline followed by e6d_first. For example, "e6d_first 2000 in.e6d | e6d_sort_by ACE6 | e6d_first 20" will report the top 20 "ACE6"-scored entries out of the first 2000 DOT entries.

e6d_to_cvs (script) converts e6d file to comma-separated-value format, with header line identifying the field names.

e6d_to_pdbhoh (script) converts e6d file to a PDB file of one-atom waters centered at each entry's x,y,z moving molecule center.

evaluate_dot_run (script) basic post-DOT-run evaluation. Rescores using ACE and RAASP, then creates PDB files of moving molecule as placed by DOT alone, by DOT + ACE, by DOT + RAASP, and by DOT + ACE + RAASP. Please see the beginning of the script for information, and note that the default number of RAASP evaluations is very small. Automatically invoked by run_dot, or can be run afterwards as many times as desired.

expand_environment_variables (script) expands $VAR if VAR is an environment variable, used by run_dot

eul_to_matrix (script) converts DOT Euler angles to a rotation matrix

gen_apbs_com (script) generates APBS input command file from template, used by prepscript

gen_uhbd_com (script) generates UHBD input command file from template, optionally used by prepscript

gen_dot_parm (script) generates a DOT parameter file from a template in $DOT_ROOT/data

gen_dot_prepscript (script) generates file 'prepscript' customized to user's moving and stationary molecule names

gen_xyzcrvs (script) makes stationary molecule shape description file, used by prepscript

gentestuhbdgrd (script) creates dummy UHBD grids for software testing

hostlist_to_mpichhosts (script) converts list of computers into form needed by MPICH, used by prepscript.

minmax (script) reports minimum and maximum values in a file, field by field

minmaxmean (script) reports minimum, maximum, and mean values in a file, field by field

pdb_atom (script) PDB file filter that passes only atoms, not headers

pdb_atomhetatm (script) PDB file filter that passes only atoms and hetatoms, not headers

pdb_ca (script) PDB file filter that passes only C-alpha atoms, used by evaluate_dot_run

pdb_cat_with_ter (script) concatenates specified PDB files, inserting a TER (chain termination) record in between each, used by evaluate_dot_run and pdbgen.

pdb_dealtloc (script) PDB file filter that passes only the first of any 'alternate locations' for an atom, used by prepscript.

pdb_dehydrogen (script) PDB file filter that removes all hydrogen atoms, used by prepscript.

pdb_dewater (script) PDB file filter that removes all water molecules, used by prepscript.

pdb_make_centered (script) centers a PDB file by geometric bounding box, used by prepscript.

pdb_renameres_by_hydrogens (script) renames residues in a PDB file according to their polar hydrogen pattern, used by prepscript.

pdb_replaceresnum (script) PDB file filter that renumbers residues consecutively.

pdb_replaceselenium (script) PDB file filter that renames selenium atoms to sulfurs.

pdb_rmsd_matrix (Ruby script) prints square matrix of RMSD values between all pairs in a specified set of PDB files, which must have the same number of atoms in the same order. Evaluation tool typically used to feed clustering programs. Requires the "ruby" language to be installed, see page 60.

pdb_rot (script) rotates a PDB file by specified 3 x 3 rotation matrix about the origin.

pdb_rottrans (script) does pdb_rot followed by pdb_trans, q.v.

pdb_to_boxcenter (script) computes bounding box of PDB file

pdb_to_acenames (script) converts PDB atom names to ACE or RAASP internal codes.

pdb_to_dot (script) prints the moving molecule x, y, z, charge of each atom in a PDB file. See pdb_to_xyzq for an improved version.

pdb_to_uhbd (script) renames PDB (pre-version-3) hydrogens to form acceptable to UHBD (essentially AMBER names), also clears chain-id column.

pdb_to_vol (script) used by pdb_to_xyzcrv

pdb_to_xyz (script) prints the x,y,z coordinates of each atom in a PDB file, used by prepscript for the moving molecule

pdb_to_xyzatomres (script) prints the x,y,z coordinates, atom types, and residue types of each atom in a PDB file.

pdb_to_xyzcrv (script) computes the DOT stationary molecule shape description, including forbidden interior and attractive vdw layer, used by prepscript.

pdb_to_xyzq (script) prints the x,y,z coordinate, charge [and optionally, radius] of each atom in a PDB file. The values come from the AMBER-style 'rlb' file in $DOT_ROOT/data/uhbd.amber84.prot.rlb unless a different library is specified.

pdb_to_xyzqr (script) prints the x,y,z coordinate, charge, and radius of each atom in a PDB file. The charges come from the AMBER-style 'rlb' file in $DOT_ROOT/data/uhbd.amber84.prot.rlb unless a different library is specified. The radii come from the MSMS-style file in $DOT_ROOT/data/atmtypenumbers. Not currently used in DOT distribution.

pdb_to_xyzr (script) prints the x,y,z coordinate and radius of each atom in a PDB file. The radii come from the MSMS-style file in $DOT_ROOT/data/atmtypenumbers. Used by prepscript for the MSMS calculations of the stationary molecule volume.

pdb_trans (script) translates a PDB file by specified 3-element translation vector, used by prepscript. See also pdb_rot and pdb_rottrans.

pdbchecksubunit (script) reports whether a PDB file contains varying chain ID codes

pdbdiameter (script) reports largest radial dimension of a PDB file, used by prepscript and dot2-prep-gridsize.

pdbfromdot (script) applies rotation and translation to PDB files, used by pdbgen.

pdbgen (script) makes a moving-molecule PDB file for each placement in an e6d file, assigning each a name derived from its original ranking in the DOT run that made the e6d file.

run_dot (script) user-level script to run DOT in an automatically created new subdirectory, with logging of the run including user commentary. Automatically runs evaluate_dot_run, from the DOT script library or from the project directory.

striph (script) PDB filter that removes non-polar hydrogens, retaining polar hydrogens (as defined in $DOT_ROOT/data/uhbd.amber84.prot.rlb unless a different library is specified).

uhbdgrd_limit (script) replaces out-of-bounds values in a UHBD electrostatic potential grid.

uhbdgrd_lookup_xyz (script) finds values in a uhbd.grd grid, used by prepscript to compute electrostatic clamping values.

uhbdlog_to_xyzq (script) reports atom-by-atom charges after UHBD program has looked them up in its library. Useful for debugging AMBER-style libraries.

vol_to_xyzcrv (script) used by pdb_to_xyzcrv

xyz_fill_sphere (script) converts xyz file to list of points within spheres centered at each xyz point. Useful for building excluded volumes into xyzcrv files, for example: xyz_fill_sphere -r 2 -s 1 inhib.instatcenter.cen.noh.xyz > inhib.instatcenter.cen.noh.fill_sphere.xyz

xyzq_check_ok (script) verifies that total charge is integral and within reasonable limits, used by prepscript.

xyzq_xyzr_to_xyzqrxml (script) merges separately-computed charge and radius files into input suitable for apbs program. Not currently used by DOT distribution.

xyzqr_to_xyzqrxml (script) converts x, y, z, charge, radius file into input suitable for apbs program.

Chapter 8

Installing DOT2

8.A Introduction

Welcome to DOT2! The following instructions provide a basic overview of a simple DOT installation on your system.

The DOT2 distribution consists of a single tarred and zipped file containing pre-compiled binaries and two external libraries as well as shell scripts, two of three needed external programs, user documentation, and the DOT2 license.

We currently offer command line binaries for the Red Hat Intel Linux platform (both 32-bit and 64-bit) as well as for Mac OS X (both Intel and PowerPC processors), and for SPARC Solaris 8 (SunOS 5). For all other systems you will need to compile and install from source; see Chapter 9. In particular, the DOT development team has not tested DOT on Windows; the port should be possible. There are versions of all required packages on Windows under Cygwin, but the work of integration and testing has not been done and is not planned for the near future due to lack of resources. Users are welcome to try it on their own.

Please feel free to contact the DOT helpline, [email protected], regarding any issues or requests.

8.B DOT Installation Quick Start Guide

In the instructions that follow we are assuming you will be using the bash shell.

You may install DOT in your home directory or, if you have root (superuser or administrator) access to your machine, you may install DOT in /usr/local/bin or an equally appropriate location.

We recommend you unpack the distribution in a directory that is mounted on all computer platforms on which you expect your users to run DOT, such as a globally exported directory on an NFS file server. If you do this, your users will be able to run DOT on the platforms included as binaries in our distribution with no additional installation.

After unpacking the distribution, you will need to set the DOT_ROOT environment variable to the directory of the unpacked distribution. In the following example, we install DOT into a directory off our home directory and rename it dot2.0:

    tar xfz DOT2.0.tar.gz
    mv DOT2.0 dot2.0
    export DOT_ROOT=$HOME/dot2.0
    source $DOT_ROOT/bin/share/dot2.setup.bash

You will also need to download the MSMS program from the Sanner lab; see below, page 60.

8.C What is in the DOT distribution

The distribution contains the DOT executables for both single and multi-processor runs, shell scripts necessary to generate DOT input and analyze DOT output, two pre-compiled external libraries:

1. the Message Passing Interface library (MPICH, version 1.2.7p1, ftp://ftp.mcs.anl.gov/pub/mpi/mpich.tar.gz), needed for multi-processor (parallel) runs.

2. the Fast Fourier Transform library (FFTW, version 3.1.2, http://www.fftw.org), needed for all DOT runs.

and two of the three external programs required to generate DOT input:

1. APBS, 1.4.0, 8 November 2012, from http://sourceforge.net/projects/apbs, supplied with the DOT distribution.

2. Reduce, version 3.14.080821, from http://kinemage.biochem.duke.edu/software, supplied with the DOT distribution (including source).

3. MSMS, not in the DOT distribution, see below.

Also included in the distribution are the $DOT_ROOT subdirectories "src" and "data". The src directory is for people who want to change DOT or build it to run on other computers; see Chapter 9. The data directory contains files for DOT users: the rotation sets, the atomic charge and desolvation libraries, and templates that "prepscript" uses.

DOT and DOT utilities are in $DOT_ROOT/bin. The bin subdirectory contains a "share" subdirectory, which contains platform-independent shell scripts. The bin subdirectory also contains platform-specific subdirectories, which each contain compiled DOT, DOT utilities, APBS, and Reduce. We provide executables for the following six platforms, with the indicated subdirectory names:

1. Mac OS X (Intel), subdirectory i86Darwin10

2. Mac OS X (PowerPC), subdirectory ppcDarwin9

3. Linux (Intel 32-bit), subdirectory i86Linux2

4. Linux2 (Intel 64-bit), subdirectory x86_64Linux2

5. Linux3 (Intel 64-bit), subdirectory x86_64Linux3

6. Solaris 8 (Sun SPARC), subdirectory sun4SunOS5

The directory names are generated by the script '$DOT_ROOT/bin/share/archosv'. This script is run when you run the script to create DOT input or the script to run DOT. It queries the computer you are on and returns a string $ARCHOSV accordingly, so that the correct executable is found for your machine architecture and operating system. If you want to check that executables are available for your computer, type:

    $DOT_ROOT/bin/share/archosv

The resulting string should match one of the directory names in $DOT_ROOT/bin listed above.
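The comparison can also be made mechanically. The sketch below only tests whether a directory with the reported name exists under the bin directory; check_dot_platform is a hypothetical helper written for illustration, not part of the DOT distribution.

```shell
# Sketch only: report whether a platform subdirectory exists under <root>/bin.
# $1: DOT root directory; $2: platform string, e.g. as printed by archosv.
# check_dot_platform is a hypothetical name.
check_dot_platform() {
    if [ -d "$1/bin/$2" ]; then
        echo "executables directory found: $1/bin/$2"
    else
        echo "no $2 directory under $1/bin" >&2
        return 1
    fi
}

# Typical use, assuming DOT_ROOT is set as described in section 8.B:
#   check_dot_platform "$DOT_ROOT" "$("$DOT_ROOT/bin/share/archosv")"
```

A failure here suggests that DOT would need to be compiled from source for that platform.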

Mac OS X versions: Note that the 2013 distribution's Mac OS X PowerPC (ppc) executables have been compiled on Mac OS X 10.5 Leopard (Darwin 9). The Mac OS X Intel (i86) executables have been compiled on Mac OS X 10.6 Snow Leopard (Darwin 10), but have been tested to work also on 10.7 Lion (Darwin 11) and on 10.8 Mountain Lion (Darwin 12); therefore we have set 'archosv' to report 'Darwin10' for Darwin 10, 11, and 12. We no longer support Tiger and Intel Leopard (Darwin 8 and 9).

8.D External programs needed to create DOT input files

8.D.1 General utilities: shell, awk/gawk, Ruby

DOT utilities are either compiled programs (which are installed in $DOT_ROOT/bin/$ARCHOSV), or are written as shell scripts (which are installed in $DOT_ROOT/bin/share). Many of these shell scripts invoke standard Unix/Linux utilities, including awk, bash, csh, head, tail, sort, m4, sed, grep, rm, and date. None of these should be a problem, but if you find incompatibilities, let us know. The 'm4' program (used by gen_dot_prepscript) comes with Mac OS X, but only if you install the Developer Tools, so we supply 'm4' in the distribution, in the appropriate $DOT_ROOT/bin/...Darwin.. subdirectory. Some of the scripts need a version of 'awk' that has certain capabilities and is called 'awk' on some platforms and 'nawk' on others; the scripts first try to use 'nawk' and if 'nawk' is not found, they use 'awk'. In all cases, GNU AWK 'gawk' can be used. A few of DOT's newest analysis tools use the "Ruby" programming language. If you need to run dot_pdb_e6d_eval_rmsd or pdb_rmsd_matrix and do not have Ruby installed, you may download it for free from http://ruby-lang.org/en/downloads; any Ruby version 1 (1.8.2 or later) is fine.
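The nawk-then-awk fallback described above can be sketched as follows. This is an illustration of the idea, not the code the DOT scripts use; pick_awk is a hypothetical name, and trying gawk last is an added assumption for systems that have only GNU AWK installed.

```shell
# Sketch only: print the name of the first awk implementation found,
# trying nawk first and then awk. pick_awk is a hypothetical name;
# the final gawk fallback is an assumption added for this example.
pick_awk() {
    for candidate in nawk awk gawk; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    echo "pick_awk: no awk implementation found" >&2
    return 1
}

# Typical use in a script:
#   AWK=$(pick_awk) || exit 1
#   "$AWK" '{ print $1 }' somefile
```

Storing the result in a variable once keeps the rest of a script independent of which name the platform uses.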

Creating the required input for DOT needs three specialized external programs: MSMS, Reduce, and APBS. MSMS can be downloaded, as described below. The DOT distribution includes pre-compiled binaries for Reduce and APBS. The DOT utilities invoke the programs by the lower-case names

    msms
    reduce
    apbs

These programs are needed only on the platform on which you will be preparing input for DOT. If you will be preparing input only on a particular platform, such as Mac OS X on Intel, you need install them for that platform only.

Detailed explanation of each program and library follows.

8.D.2 MSMS

The MSMS program [10] does molecular surface calculations needed by prepscript. MSMS is from the laboratory of Michel Sanner. Binary executables of MSMS for all of the platforms supported in the current DOT distribution are available from http://mgltools.scripps.edu/downloads. Be aware that you will need to scroll down this page to find the MSMS package. Download the appropriate "tar.gz" file, run 'gunzip ....tar.gz', then make a new directory (say, i86Darwin10 if that is your platform), cd into that directory, then run 'tar xvf ../....tar'. This will make about a dozen files, including documentation and release notes; the executable file is named 'msms.platform....versionnumber'. Copy that file to $DOT_ROOT/bin/$ARCHOSV/msms. For example, if for our Mac we ran

    tar xvf msms_MacOSX_2.6.1.tar

the resulting files include the file msms.MacOSX.2.6.1 and we put a copy in the appropriate $DOT_ROOT/bin directory:

    cp msms.MacOSX.2.6.1 $DOT_ROOT/bin/i86Darwin10/msms
    cp msms.MacOSX.2.6.1 $DOT_ROOT/bin/ppcDarwin9/msms

Note that the MSMS team is currently (mid-2013) distributing a Macintosh "universal binary" containing both a PowerPC (ppc) and an Intel-native (i86) program; copying the same binary to both ppcDarwin9 and i86Darwin10 is OK. Similarly, we find their Intel Linux binary works for both Linux2 and 64-bit Linux3 platforms, so you can copy the Intel Linux binary into any or all of i86Linux2, x86_64Linux2, or x86_64Linux3. If you are running a newer Intel Fedora Linux operating system and you get the error message "msms: command not found" or "/lib/ld-linux.so.2: bad ELF interpreter", this means you need to install a 32-bit compatibility package on that computer. Try the following:

sudo yum install glibc.i686

8.D.3 Reduce

The Reduce program [13] adds hydrogens to molecular models in an intelligent and flexible way, as needed by prepscript. Reduce is from the laboratory of David and Jane Richardson at Duke University, written by J. Michael Word. Reduce is free, open-source software available from http://kinemage.biochem.duke.edu/software. The DOT project is currently using reduce.3.14.080821. Note that you can rebuild Reduce if you need to; see 9.E, page 64.

Reduce needs to know where its heteroatom group dictionary is. Through the courtesy of the authors of Reduce, we distribute a copy of this dictionary in the $DOT_ROOT/data directory, file reduce_wwPDB_het_dict.txt.

8.D.4 APBS

The APBS (Adaptive Poisson-Boltzmann Solver) program [1] performs electrostatic potential calculations needed by prepscript. APBS is from the laboratory of Nathan Baker ([email protected]) of the Department of Biochemistry and Molecular Biophysics, Center for Computational Biology, Washington University in St. Louis. APBS is free, open-source software available from http://sourceforge.net/projects/apbs. After you have downloaded the executable binary for your platforms of interest, you may install apbs anywhere you like as long as it is in your $path when you run prepscript. We found that the Mac OS X (Darwin 8) version, a "universal binary" for both PPC and Intel Macintoshes, needs administrator privileges to install; this appears to be a mistake in the installation program. The DOT project is currently using APBS version 1.0.0, 21 April 2008.

8.E External libraries needed to run DOT

The DOT distribution includes two external libraries, MPICH and FFTW. Recompiling them is needed only if you are building DOT by compiling from source.

8.E.1 MPICH

You need MPICH only if you wish to run a DOT job on more than one processor or computer. MPICH software allows DOT calculations to be done in parallel on a multicore computer, a local-area network, or a cluster. MPICH is from the Argonne National Laboratory of the United States Department of Energy and is free, open-source software.

We supply for each platform a pre-compiled MPICH version of DOT, named dot2.mpich, in $DOT_ROOT/bin/$ARCHOSV. The "run_dot" script automatically invokes this version of DOT for parallel runs.

To run MPI programs on your local network, you must be able to use either 'ssh' or 'rsh' without typing your password; this can be thorny if security is a severe concern. See 5.D.2, page 43, for suggestions.

Note the DOT project currently uses MPICH 1 version 1.2.7p1. DOT does not use MPICH 2 because currently MPICH 2 does not let users mix different kinds of computers in a single parallel run, which is important in a typical heterogeneous computer network.

If you prefer OpenMPI, we have run DOT using OpenMPI; contact us if you have problems.

Chapter 9

Recompiling DOT2 and its components

You may want to change DOT or build it to run on a new kind of computer. To do this, you need to install and build the software libraries FFTW3 and MPICH that DOT uses, and then build DOT. If you want to prepare DOT input files or analyze DOT output files, you will also need to build the DOT2 utilities for the new kind of computer. This chapter offers instructions.

9.A Recommended directory layout

This is the file layout we suggest and that the distribution 'tar' file creates:

Under $DOT_ROOT/: bin, src, data, mpich, fftw3
Under $DOT_ROOT/bin: share and individual platform directories
In $DOT_ROOT/bin/share: non-compiled executable scripts DOT users need
In $DOT_ROOT/bin/$ARCHOSV: compiled executable programs DOT users need
Under $DOT_ROOT/src: dot, util, and share
In $DOT_ROOT/src/dot: dot C source files
Under $DOT_ROOT/src/dot/$ARCHOSV: individual platform build directories
In $DOT_ROOT/src/util: dot utility C and C++ source files
Under $DOT_ROOT/src/util/$ARCHOSV: individual platform build directories
In $DOT_ROOT/src/share: distribution copy of executable scripts
In $DOT_ROOT/data: non-compiled, non-executed resources DOT utilities need
Under $DOT_ROOT/mpich/$ARCHOSV: distribution and individual platform directories
Under $DOT_ROOT/fftw3/$ARCHOSV: distribution and individual platform directories

Note that in four cases (src/util, src/dot, mpich, and fftw3), you should not run configure/make in those directories but instead in a subdirectory you make that is named after a particular platform ($ARCHOSV). For example, on i86Linux2:

    cd $DOT_ROOT/mpich; mkdir i86Linux2
    cd i86Linux2; ../configure_for_dot2 ; make install
    cd $DOT_ROOT/fftw3; mkdir i86Linux2;
    cd i86Linux2 ; ../configure_for_dot2 ; make ; make install
    cd $DOT_ROOT/src/util; autoreconf -i; mkdir i86Linux2
    cd i86Linux2 ; ../configure ; make ; make install
    cd $DOT_ROOT/src/share; ./configure ; make install
    cd $DOT_ROOT/src/dot; mkdir i86Linux2
    cd i86Linux2 ; ../configure ; make ; make install

9.B GNU Autoconf/Automake

We strongly recommend you download the most current versions of

• The GNU Autoconf tools from http://www.gnu.org/software/autoconf/ (the DOT project is using version 2.61).

• The GNU Automake tools from http://sources.redhat.com/automake/ (the DOT project is using version 1.10).

9.C How to recompile FFTW

The FFTW Fourier transform library [3] is needed to compile DOT whether you want to run parallel or single-processor. FFTW is free software, available from http://www.fftw.org; see http://www.fftw.org/fftw3_doc/Installation-on-Unix.html#Installation-on-Unix. The DOT project is currently using version 3.1.2.

Download the tar file from fftw.org, copy or move it into the directory $DOT_ROOT/fftw3, and unpack it as $DOT_ROOT/fftw3/fftw-3.1.2. For each platform you wish to build DOT for ($ARCHOSV):

    mkdir $DOT_ROOT/fftw3/$ARCHOSV
    cd $DOT_ROOT/fftw3/$ARCHOSV
    ../configure_for_dot2

This runs ../fftw-3.1.2/configure --enable-portable-binary --enable-float --prefix=$DOT_ROOT/fftw3/$ARCHOSV. Then

    make
    make check # (optional but recommended)
    make install

9.D How to recompile MPICH

If you want to recompile the parallel version of DOT, you will need to download, configure, and make MPICH. MPICH is available from ftp://ftp.mcs.anl.gov/pub/mpi/mpich.tar.gz (about 16 MB). You can probably just click on that URL in a web browser and the file will be downloaded. If not, use “ftp” directly:

    ftp ftp.mcs.anl.gov
    ftp> binary
    ftp> cd pub
    ftp> cd mpi
    ftp> get mpich.tar.gz
    ftp> quit

If you get an error message in response to the “binary”, ignore it. After downloading mpich.tar.gz:

    mkdir $DOT_ROOT/mpich
    mv mpich.tar.gz $DOT_ROOT/mpich
    cd $DOT_ROOT/mpich
    gunzip mpich.tar.gz # unpack the zip file
    tar xf mpich.tar # unpack the tar file

This will create a directory named $DOT_ROOT/mpich/mpich-1.2.7p1.

We found at TSRI that the MPICH installation “configure” step appears to check for the ability of you, the installer, to run ssh without a password. Configurations done by people who could not do this were not usable by people who could, so our advice is that installers should make sure they can “ssh (localhost)” without typing a password; see page 43.
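One way to test this before configuring MPICH is sketched below. It relies on the standard OpenSSH options BatchMode (fail instead of prompting for a password) and ConnectTimeout. The function name and the SSH_CMD override are assumptions introduced for illustration; they are not part of DOT or MPICH.

```shell
# Hypothetical helper: succeed only if ssh to the host works without a
# password prompt. BatchMode=yes makes ssh fail rather than ask for a password.
# SSH_CMD is an illustrative override; by default plain "ssh" is used.
can_ssh_without_password() {
    host=${1:-localhost}
    "${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$host" true
}
```

For example: can_ssh_without_password localhost && echo "passwordless ssh works"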

Configure and make MPICH for each platform ($ARCHOSV), on a host that supports that platform (e.g., Intel Macintosh for i86Darwin):

    source $DOT_ROOT/bin/share/dot2.setup.bash # (or .csh)
    mkdir $DOT_ROOT/mpich/$ARCHOSV
    cd $DOT_ROOT/mpich/$ARCHOSV

Note: we found that on Linux and on Mac OS X we had to set the environment variable RSHCOMMAND at this point. On bash, type “export RSHCOMMAND=/usr/bin/ssh”. On csh, type “setenv RSHCOMMAND /usr/bin/ssh”.

Be sure that $DOT_ROOT is set now, since its value becomes coded into parts of MPICH when you configure and make, so check it:

    printenv DOT_ROOT

Note no dollar-sign in this; verify that its value is what you expect, such as /usr/local/dot2. Now run:

    ../configure_for_dot2

This runs ../mpich-1.2.7p1/configure --prefix=$DOT_ROOT/mpich-1.2.7p1/$ARCHOSV --with-device-name=ch_p4 --disable-f77 --disable-cxx --disable-f90modules --disable-short-longs. Then:

    make install
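Because the value of DOT_ROOT is baked into MPICH at configure time, a scripted build may want to stop early when the variable is unset or wrong. The following is a minimal sketch of such a guard; require_dot_root is a hypothetical helper name, not part of the DOT distribution.

```shell
# Hypothetical helper: fail unless DOT_ROOT is set and names an existing
# directory; on success, print its value so it can be checked by eye.
require_dot_root() {
    if [ -z "${DOT_ROOT:-}" ]; then
        echo "DOT_ROOT is not set" >&2
        return 1
    fi
    if [ ! -d "$DOT_ROOT" ]; then
        echo "DOT_ROOT=$DOT_ROOT is not a directory" >&2
        return 1
    fi
    echo "DOT_ROOT=$DOT_ROOT"
}
```

For example: require_dot_root && ../configure_for_dot2 && make install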

9.E How to recompile Reduce

You can rebuild Reduce from the source files provided by the Richardson lab, re-distributed by us in $DOT_ROOT/reduce/reduce.3.14.080821.src. Before running “make” there, “export CC=gcc” (for csh: “setenv CC gcc”). After running “make”, “mv reduce_src/reduce $DOT_ROOT/bin/$ARCHOSV”. If you are building for additional platforms, do “make clean” and “rm */*.a” in between each build.

9.F How to recompile DOT2 Utilities

These do not need FFTW3 or MPICH; just a C/C++ compiler (such as gcc). To rebuild and install in $DOT_ROOT/bin/share and $DOT_ROOT/bin/$ARCHOSV:

    cd $DOT_ROOT/src/share; ./configure; make install
    cd $DOT_ROOT/src/util; autoreconf -i; mkdir $ARCHOSV; cd $ARCHOSV; ../configure; make; make install

Compiling the utilities will yield some C++ “deprecated” messages you can ignore. If you are stopped by fatal errors, please let us know so we can help you work around them and then include corrections in a future DOT release.

9.G How to recompile DOT2

First make and install fftw3 and mpich. Note that you need to build both FFTW3 and MPICH as 64-bit before building DOT as 64-bit, and similarly for 32-bit. The DOT Makefile we provide builds two executables, the non-parallelized “dot2” and the parallelized “dot2.mpich”. To rebuild and install both into $DOT_ROOT/bin/$ARCHOSV:


    cd $DOT_ROOT/src/dot; mkdir $ARCHOSV; cd $ARCHOSV; ../configure; make; make install

If you don’t need the parallelized DOT, you don’t need mpich at all, so just

    cd $DOT_ROOT/src/dot; mkdir $ARCHOSV; cd $ARCHOSV; ../configure; make dot2; make install

Test by making a new directory, setting DOT_ROOT, running “dot_tutorial”, and trying the exercises described in the tutorial, Chapter 2. Email us with any questions and we will help as much as we are able. As always, thank you for your interest in DOT.


References

1. N. A. Baker, D. Sept, S. Joseph, M. J. Holst, and J. A. McCammon. Electrostatics of nanosystems: Application to microtubules and the ribosome. Proc. Natl. Acad. Sci. USA, 98:10037–10041, 2001.

2. J. Fernandez-Recio, M. Totrov, C. Skorodumov, and R. Abagyan. Optimal docking area: A new method for predicting protein-protein interaction sites. Proteins, 58:134–143, 2005.

3. Matteo Frigo and Steven G. Johnson. The design and implementation of FFTW3. Proc. IEEE, 93(2):216–231, 2005.

4. M. K. Gilson, M. E. Davis, B. A. Luty, and J. A. McCammon. Computation of electrostatic forces on solvated molecules using the Poisson-Boltzmann equation. J. Phys. Chem., 97:3591–3600, 1993.

5. N. Guex and M. C. Peitsch. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis, 18(15):2714–2723, Dec 1997.

6. D. S. Law, L. F. Ten Eyck, O. Katzenelson, I. Tsigelny, V. A. Roberts, M. E. Pique, and J. C. Mitchell. Finding needles in haystacks: Reranking DOT results by using shape complementarity, cluster analysis, and biological information. Proteins, 52:33–40, 2003.

7. V. A. Roberts, D. A. Case, and V. Tsui. Predicting interactions of winged-helix transcription factors with DNA. Proteins, 57:172–187, 2004.

8. Victoria A. Roberts, Michael E. Pique, Simon Hsu, Sheng Li, Geir Slupphaug, Robert P. Rambo, Jonathan Jamison, Tong Liu, Jun Ho Lee, John A. Tainer, Lynn F. Ten Eyck, and Virgil L. Woods, Jr. Combining H/D exchange mass spectroscopy and computational docking reveals extended DNA-binding surface on uracil-DNA-glycosylase. Nucl. Acids Res., 40(13):6070–6081, 2012.

9. Victoria A. Roberts, Elaine E. Thompson, Michael E. Pique, Martin S. Perez, and Lynn F. Ten Eyck. DOT2: Macromolecular docking with improved biophysical models. J. Comp. Chem., 2013.

10. M. F. Sanner, A. J. Olson, and J.-C. Spehner. Reduced surface: An efficient way to compute molecular surfaces. Biopolymers, 38:305–320, 1996.

11. M. H. Trinh, M. Odorico, M. E. Pique, J.-M. Teulon, V. A. Roberts, L. F. Ten Eyck, E. D. Getzoff, P. Parot, S. W. Chen, and J.-L. Pellequer. Computational reconstruction of multidomain proteins using atomic force microscopy data. Structure, 20:113–120, 2012.

12. S. J. Weiner, P. A. Kollman, D. A. Case, U. C. Singh, C. Ghio, G. Alagona, S. Profeta, Jr., and P. Weiner. A new force field for molecular mechanical simulation of nucleic acids and proteins. J. Am. Chem. Soc., 106:765–784, 1984.

13. J. Michael Word, Simon C. Lovell, Jane S. Richardson, and David C. Richardson. Asparagine and Glutamine: Using Hydrogen atom contacts in the choice of side-chain amide orientation. J. Mol. Biol., 285:1735–1747, 1999.

14. C. Zhang, G. Vasmatzis, J. L. Cornette, and C. DeLisi. Determination of atomic desolvation energies from the structures of crystallized proteins. J. Mol. Biol., 267:707–726, 1997.
