Running Applications On The Grid
Jon Bednasz & Steve Gallo
Center for Computational Research
University at Buffalo
Applications
Autodock/AutoGrid 4
• Suite of automated docking tools designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure
• http://autodock.scripps.edu/
• Autodock/AutoGrid 4 is Free Software
Applications
Who?
Dr. Barbara Poliks and Graduate Student Colby Chiauzzi
Research Associate Professor
Dept of Physics, Applied Physics and Astronomy
Binghamton University
Why?
Autodock/Autogrid was used to determine how the 6C-inhibitor would position itself in each of the five binding sites of pentameric Lumazine Synthase.
Flow Chart - Gather
Gather files and information
Push files to grid machine
Globus-job-submit
Pull results back
Assume we have a valid DOE certificate
What machine do we want to run on?
Need correct autodock/autogrid executables
Need input files
Discover VO's directory on grid machine
run_directory=`globus-job-run ${host} /bin/sh -c 'echo $OSG_DATA'`
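The discovery step can fail quietly (for example when the proxy has expired or the host is unreachable), leaving run_directory empty or holding an error message. The sketch below shows one way to check the value before using it. The function name is_usable_run_directory is hypothetical and is not part of Globus.

```shell
# Hypothetical helper (not part of Globus): succeed only if the value looks
# like one absolute path, e.g. the output of `echo $OSG_DATA` on the grid host.
is_usable_run_directory() {
    case "$1" in
        "")            return 1 ;;   # empty: the remote command printed nothing
        *[[:space:]]*) return 1 ;;   # whitespace or several lines: probably an error message
        /*)            return 0 ;;   # a single absolute path
        *)             return 1 ;;   # relative path or other unexpected text
    esac
}

# Possible use, following the discovery command on the slide:
#   run_directory=`globus-job-run ${host} /bin/sh -c 'echo $OSG_DATA'`
#   is_usable_run_directory "$run_directory" || exit 1
```

Stopping here is cheaper than finding out later that files were staged to the wrong place.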
Create working directories within $OSG_DATA
globus-job-run ${host} -dir ${run_directory} /bin/sh -c "mkdir binghamton"
globus-job-run ${host} -dir ${run_directory} /bin/sh -c "mkdir binghamton/${experiment}"
Stage executables to grid machine
globus-url-copy file://`pwd`/${experiment}/autodock4 gsiftp://${host}/${run_directory}/binghamton/${experiment}/autodock4
globus-url-copy file://`pwd`/${experiment}/autogrid4 gsiftp://${host}/${run_directory}/binghamton/${experiment}/autogrid4
globus-url-copy file://`pwd`/${experiment}/grid.sh gsiftp://${host}/${run_directory}/binghamton/${experiment}/grid.sh
Change perms on the files
globus-job-run ${host} -dir ${run_directory}/binghamton/${experiment} /bin/sh -c "chmod u+x autodock4"
globus-job-run ${host} -dir ${run_directory}/binghamton/${experiment} /bin/sh -c "chmod u+x autogrid4"
globus-job-run ${host} -dir ${run_directory}/binghamton/${experiment} /bin/sh -c "chmod u+x grid.sh"
Stage the input files to grid machine
globus-url-copy file://`pwd`/${experiment}/${input_file1} gsiftp://${host}/${run_directory}/binghamton/${experiment}/${input_file1}
globus-url-copy file://`pwd`/${experiment}/${input_file2} gsiftp://${host}/${run_directory}/binghamton/${experiment}/${input_file2}
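The staging commands above repeat one pattern per file, so they could be folded into small functions. This is only a sketch: run_cmd, stage_file, make_executable and the DRY_RUN variable are hypothetical names, and ${host}, ${run_directory} and ${experiment} are assumed to be set as on the slides.

```shell
# Sketch only: these function names and DRY_RUN are hypothetical, not Globus features.
# Assumes ${host}, ${run_directory} and ${experiment} are already set.

# Print the command when DRY_RUN=1 (quoting is not preserved in the preview),
# otherwise run it.
run_cmd() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Copy one file from ./${experiment}/ to the matching directory on the grid host.
stage_file() {
    run_cmd globus-url-copy \
        "file://$(pwd)/${experiment}/$1" \
        "gsiftp://${host}/${run_directory}/binghamton/${experiment}/$1"
}

# Set the execute bit on the remote copy, as the slides do after staging.
make_executable() {
    run_cmd globus-job-run "${host}" \
        -dir "${run_directory}/binghamton/${experiment}" \
        /bin/sh -c "chmod u+x $1"
}

# Possible use:
#   for f in autodock4 autogrid4 grid.sh; do
#       stage_file "$f" && make_executable "$f"
#   done
```

Running once with DRY_RUN=1 prints the commands for review before anything is copied.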
Flow Chart - Push
Globus Command
globus-job-submit ${host}/jobmanager-pbs -dir ${run_directory}/binghamton/${experiment}/ -maxtime 4300 ${run_directory}/binghamton/${experiment}/grid.sh
What is grid.sh
cd $OSG_DATA/binghamton/3_sa0GTDPColc4/
./autogrid4 -p ./sa0GTDP4.gpf -l sa0GTDP4.glg && ./autodock4 -p sa0GTDP4.dpf -l sa0GTDP4.dlg
Flow Chart – Submit
globus-url-copy gsiftp://${host}/${run_directory}/binghamton/${experiment}/ file://`pwd`/${experiment}/
Flow Chart - Pull
Lessons Learned
• Write a wrapper script “run_grid_job.sh”
  • Checks inputs
  • Helps to keep experiments/files organized
• Use globus-job-status
• Find someone, like Barbara and Colby, who is in need and highly motivated
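The slides do not show run_grid_job.sh itself. As an illustration of the "checks inputs" point, a wrapper might begin with a check along these lines. The function name check_inputs and its argument layout are assumptions made for this sketch.

```shell
# Sketch of the input checks a wrapper such as run_grid_job.sh might perform.
# check_inputs is a hypothetical name; the real script is not shown on the slides.
# Usage: check_inputs <experiment directory> <file> [<file> ...]
check_inputs() {
    experiment_dir=$1
    shift
    if [ ! -d "$experiment_dir" ]; then
        echo "missing experiment directory: $experiment_dir" >&2
        return 1
    fi
    missing=0
    for f in "$@"; do
        # Report every missing file rather than stopping at the first one.
        if [ ! -f "$experiment_dir/$f" ]; then
            echo "missing input file: $experiment_dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Keeping each experiment in its own directory, as the staging commands do, is what makes a check like this possible.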
Results
Figure 2.4 Distances between the 6C-inhibitor (31P) and selected 15N atoms of Lumazine Synthase. The NZ of the lysine residue is located ~7.6Å away from the phosphorus. Three best docked inhibitors from AutoDock runs are shown. The green ribbon is traced through the backbone of the loop Ile 91-Lys 92-Gly 93-Ser 94-Thr 95-Met 96-His 97-Phe 98 in the vicinity of the phosphonate group of the inhibitor.
Results
Lumazine Synthase
Commercial Applications
COBALT CFD
• Computational Fluid Dynamics
• Licensed application
• License server resides at a remote location
• Source code not available
Commercial Challenges
• Source code typically not available
• License issues
  • Node-locked licensing model
  • Remote license servers
• Many use MPI
• Applications sensitive to various MPI versions
• Grid job managers handle MPI differently than traditional command line submissions
Basic Grid Job Workflow
• Identify required files and gather in one location
• Stage data and executables to grid resource
• Submit job
• Query resource for job completion
• Retrieve results
• Clean up
Running: Gather Your Files
• Gather your files into a single directory
  • Application files
  • Configuration files
  • Data files
• Why a single directory?
  • Easier to manage a single directory
  • Staging
  • Easier to retrieve results
  • Simpler cleanup
Running: Stage To Resource
• Select a working directory for your job ($OSG_DATA)
globus-job-run localhost /bin/env | fgrep OSG_DATA
OSG_DATA=/san/scratch/grid/grid-tmp/grid-data
• Stage files to the resource
  • Use globus-url-copy
  • file:// specifies a local file or directory
  • gsiftp:// specifies a GSI FTP server
• Gotchas
  • Local paths need 3 slashes ( file:///path/to/file )
  • Directories must end with a slash ( gsiftp://path/to/dir/ )
  • Data on the resource will be owned by the VO user and accessible to other VO users
  • Must use fully qualified paths
  • File permissions are NOT maintained on copy!
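Several of these gotchas concern how the URLs are written, so a script can build them in one place. The helpers below are a sketch; to_file_url and to_gsiftp_dir_url are hypothetical names, not part of globus-url-copy.

```shell
# Hypothetical helpers illustrating the URL rules listed above.

# Local path -> file:// URL. Only a fully qualified path is accepted, so the
# result always has three slashes (file:// followed by /path).
to_file_url() {
    case "$1" in
        /*) echo "file://$1" ;;
        *)  echo "not a fully qualified path: $1" >&2; return 1 ;;
    esac
}

# Host plus remote directory -> gsiftp:// URL that always ends with a slash.
to_gsiftp_dir_url() {
    case "$2" in
        /*) echo "gsiftp://$1${2%/}/" ;;   # drop one trailing slash, then add exactly one
        *)  echo "not a fully qualified path: $2" >&2; return 1 ;;
    esac
}
```

Because permissions are not kept by the copy, a script using these helpers still needs a separate chmod step on the resource for anything that must be executable.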
Running: Submit Job
• Submit using globus-job-submit
  • Use the correct job manager (pbs, sge, etc.)
  • Returns a job id URL
  • Stores standard output and standard error in your Globus cache
  • Use this URL to identify job for checking status and cleaning up
• Don’t use globus-job-run since it will wait for the job to complete
#> globus-job-submit u2-grid.ccr.buffalo.edu/jobmanager-pbs -np 1 -maxtime 5 \
-dir /san/scratch/grid/grid-tmp/grid-data/NYSGRID/gallo-1 \
aracne -i arraydata10x336.exp -j arraydata10x336.adj -o arraydata10x336.out
https://u2-grid.ccr.buffalo.edu:15600/5540/1181757403/
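Since the job id URL is needed later for status checks and cleanup, a script would normally capture it and confirm that it really is a URL and not an error message. A minimal sketch follows; is_job_contact is a hypothetical helper, and the pattern is only a loose check based on the URL shown above.

```shell
# Hypothetical helper: check that the text printed by globus-job-submit looks
# like a job id URL (https://host:port/...) rather than an error message.
is_job_contact() {
    case "$1" in
        *[[:space:]]*)      return 1 ;;   # error messages contain spaces
        https://*:[0-9]*/*) return 0 ;;   # https://host:port/path
        *)                  return 1 ;;   # empty or anything else
    esac
}

# Possible use (the submit arguments are elided here on purpose):
#   job_url=$(globus-job-submit ...)
#   is_job_contact "$job_url" && echo "$job_url" > job.url
```

Saving the URL to a file means the status and cleanup steps can run in a later session.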
Running: Get Job Status
• Query job status using globus-job-status
• Use URL from job submission
• Don’t query status constantly, this loads the gatekeeper
• Intermediate status is up to the application
• Valid job states: UNSUBMITTED, PENDING, ACTIVE, DONE, FAILED
#> globus-job-status https://u2-grid.ccr.buffalo.edu:15600/5540/1181757403/
PENDING
#> globus-job-status https://u2-grid.ccr.buffalo.edu:15600/5540/1181757403/
DONE
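The advice not to query constantly suggests a polling loop with a generous pause between checks. The following is a sketch under stated assumptions: wait_for_job, STATUS_CMD and POLL_SECONDS are hypothetical names, and the 300 second default is an arbitrary choice, not a recommendation from the slides.

```shell
# Sketch of a polling loop. wait_for_job, STATUS_CMD and POLL_SECONDS are
# hypothetical; STATUS_CMD defaults to the real globus-job-status command.
# Returns 0 for DONE, 1 for FAILED, 2 if the status could not be determined.
wait_for_job() {
    job_url=$1
    while :; do
        state=$(${STATUS_CMD:-globus-job-status} "$job_url") || return 2
        case "$state" in
            DONE)   return 0 ;;
            FAILED) return 1 ;;
            UNSUBMITTED|PENDING|ACTIVE)
                # Still waiting: pause so the gatekeeper is not queried constantly.
                sleep "${POLL_SECONDS:-300}" ;;
            *)
                echo "unexpected job state: $state" >&2
                return 2 ;;
        esac
    done
}
```

The status command is overridable only so the loop can be tried without a grid host, for example by pointing STATUS_CMD at a shell function that prints a fixed state.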
Running: Retrieve Results
• Retrieve results using globus-url-copy
  • Can retrieve entire directory or specific files
#> globus-url-copy -v -cd \
gsiftp://u2-grid.ccr.buffalo.edu/san/scratch/grid/grid-tmp/grid-data/NYSGRID/gallo-1/ \
file:///san/user/smgallo/u2/projects/nysgrid/gallo-1-results/
Source: gsiftp://u2-grid.ccr.buffalo.edu/san/scratch/grid/grid-tmp/grid-data/NYSGRID/gallo-1/
Dest: file:///san/user/smgallo/u2/projects/nysgrid/gallo-1-results/
aracne
arraydata10x336.adj
arraydata10x336.exp
arraydata10x336.out
Running: Cleanup
• Remove cached job output and any job files from resource
• Remove job output with globus-job-clean
• Manually remove job working directory
• Don’t use wildcards for filenames! You risk deleting everything if there is an error in your script.
#> globus-job-clean -force https://u2-grid.ccr.buffalo.edu:15600/5540/1181757403/
Cleanup successful.
#> globus-job-run u2-grid.ccr.buffalo.edu \
/bin/sh -c 'cd $OSG_DATA/GRASE; rm -rf gallo-1'
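The wildcard warning applies equally to variables: if a script builds the rm command from a variable that turns out to be empty, the result can be just as destructive. One possible guard is sketched below; is_safe_job_dir is a hypothetical name, and the allowed character set is an assumption chosen to fit names like gallo-1.

```shell
# Hypothetical guard: accept only a plain directory name, so that an empty
# variable, a path or a wildcard can never reach `rm -rf` on the grid host.
is_safe_job_dir() {
    case "$1" in
        ""|.|..)           return 1 ;;   # empty or a special directory
        *[!A-Za-z0-9._-]*) return 1 ;;   # slashes, spaces, wildcards, quotes
        *)                 return 0 ;;
    esac
}

# Possible use, with && so that rm does not run if the cd fails:
#   job_dir=gallo-1
#   is_safe_job_dir "$job_dir" &&
#       globus-job-run u2-grid.ccr.buffalo.edu \
#           /bin/sh -c "cd \$OSG_DATA/GRASE && rm -rf $job_dir"
```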
COBALT: Specific Challenges
• Licensed product
• MPI FORTRAN application
• Single submission script creates input files and runs MPI task launcher
COBALT: Licensing
• Licensed to Syracuse University for 16 nodes
• Needed permission from the vendor to run remotely
• Ensure there were no license domain restrictions
• Firewall rule modifications on license server
• Compute nodes require outbound internet connectivity
COBALT: Workflow
• Three files used to run a job
  • case.job – Creates an input file for Cobalt and runs the MPI wrapper. User modifies this file.
  • CoMPIRUN – Cobalt MPI wrapper installed with application
  • COBALT executable
[Diagram: COBALT workflow]
case.job: Create input file, Set environment, Set license server info, Run MPI wrapper
CoMPIRUN: Set up working directory, Prepare input files for COBALT, Execute COBALT using MPI task launcher, Clean up
License Server
COBALT: Modifications
Modifying COBALT for the Grid
• Create a “Job Package” to send to grid host
• Modify COBALT scripts
• MPI handled by grid job manager
• Cannot simply send your submission script
• Job submission and status polling
• Results retrieval and cleanup
Traditional MPI
#> qsub case.job
[Diagram: case.job runs CoMPIRUN, which generates configuration files, copies/generates input files, runs "mpirun cobalt.linux.dp" and cleans up. The resource manager allocates nodes (Node 1, Node 2, ... Node n).]
Grid MPI
#> globus-job-submit jobmanager-pbs -x '&(jobtype=mpi)' cobalt.linux.dp
[Diagram: the Grid Job Manager runs "mpirun cobalt.linux.dp". The resource manager allocates nodes (Node 1, Node 2, ... Node n).]
COBALT: Modified Workflow
• create_job_package – Create a “job package” including data files. Much of the work of the CoMPIRUN script is now done here.
• submit_job – Grid submission wrapper
[Diagram: modified COBALT workflow]
create_job_package: Create input file, Set up working directory, Prepare data files for COBALT, Job Package
submit_job: Stage Job Package, Execute COBALT using grid job manager, Poll for Status, Retrieve Results, Clean Up
License Server
Applications
Amber
• Amber is used for simulation of biomolecules
• http://ambermd.org
Applications
Who?
Dr. Barbara Poliks
Research Associate Professor
Dept of Physics, Applied Physics and Astronomy
Binghamton University
Why?
Needed to run long-term molecular dynamics (up to 20 ns) on large systems of protein + water (Lumazine Synthase + water and tubulin + water; both simulated systems had more than 100,000 atoms). 1 ns of dynamics took about 3.5 hours CPU with parallel processing using 32 nodes.
Flow Chart - Gather
Gather files and information
Push files to grid machine
Globus-job-submit
Pull results back
Assume we have a valid DOE certificate
What machine do we want to run on?
Need correct amber path
Need input files
Discover VO's directory on grid machine
run_directory=`globus-job-run ${host} /bin/sh -c 'echo $OSG_DATA'`
Create working directories within $OSG_DATA
globus-job-run ${host} -dir ${run_directory} /bin/sh -c "mkdir binghamton"
globus-job-run ${host} -dir ${run_directory} /bin/sh -c "mkdir binghamton/${experiment}"
Stage inputs to grid machine
globus-url-copy file://`pwd`/${experiment}/dyn.in gsiftp://${host}/${run_directory}/binghamton/${experiment}/dyn.in
globus-url-copy file://`pwd`/${experiment}/sa0_GGafdyn1.rst gsiftp://${host}/${run_directory}/binghamton/${experiment}/sa0_GGafdyn1.rst
globus-url-copy file://`pwd`/${experiment}/sa0_GG.prmtop gsiftp://${host}/${run_directory}/binghamton/${experiment}/sa0_GG.prmtop
Flow Chart - Push
Globus Command
globus-job-submit ${host}/jobmanager-pbs -dir ${run_directory}/binghamton/${experiment}/ -maxtime 4300 -np 4 -x '(jobType=mpi)' \
/util/amber/amber9/exe/pmemd -O -i dyn.in -o sa0_GGafdyn1.out -p sa0_GG.prmtop -c sa0_GGafdyn1.rst -r sa0_GGafdyn1.rst -x sa0_GGafdyn1.mdcrd
Flow Chart – Submit
globus-url-copy gsiftp://${host}/${run_directory}/binghamton/${experiment}/ file://`pwd`/${experiment}/
Flow Chart - Pull
Lessons Learned
• MPI jobs are significantly harder
• Needed to change pbs.pm
  • Define $mpirun
  • Change $remote_shell (rsh from ssh)
  • Execute submitted script directly
• Find someone, like Barbara and Colby, who is in need and highly motivated
Results - 1
Paper 1 (accepted to Biochemistry October 14 2008):
15N{31P} REDOR NMR Studies of the Binding of Phosphonate Reaction Intermediate Analogues to Saccharomyces cerevisiae Lumazine Synthase
Tsyr-Yan Yu,† Robert D. O'Connor,† Astrid C. Sivertsen,† Colby Chiauzzi,‡ Barbara Poliks,‡ Markus Fischer,§ Adelbert Bacher,¥ Ilka Haase,§ Mark Cushman,¶* and Jacob Schaefer†*
Department of Chemistry, Washington University, Saint Louis, Missouri 63130, Department of Physics, State University of New York at Binghamton, Binghamton, New York 13902, Universität Hamburg, Institute of Food Chemistry, Grindelallee 117, D-20146 Hamburg, Germany, Lehrstuhl für Organische Chemie und Biochemie, Technische Universität München, D-85747 Garching, Germany, and Department of Medicinal Chemistry and Molecular Pharmacology, School of Pharmacy, Purdue University, West Lafayette, IN 47907
ABSTRACT: Lumazine synthase catalyzes the reaction of 5-amino-6-D-ribitylamino-2,4(1H,3H)-pyrimidinedione (1) with (S)-3,4-dihydroxybutanone 4-phosphate (2) to afford 6,7-dimethyl-8-D-ribityllumazine (3), the immediate biosynthetic precursor of riboflavin. The overall reaction implies a series of intermediates that are incompletely understood. The 15N{31P} REDOR NMR spectra of three metabolically stable phosphonate reaction intermediate analogues complexed to Saccharomyces cerevisiae lumazine synthase have been obtained at 7 and 12 T. Distances from the phosphorus atoms of the ligands to the side chain nitrogens of Lys92, His97, Arg136, and His148 have been determined. These distances were used in combination with the X-ray crystal coordinates of one of the intermediate analogues complexed with the enzyme in a series of distance-restrained molecular dynamics simulations. The resulting models indicate mobility of the Lys92 side chain, which could facilitate the exchange of inorganic phosphate eliminated from the substrate in one reaction, with the organic phosphate-containing substrate necessary for the next reaction.
Results - 2
Paper 2 - to be submitted to Biochemistry Dec. 2008:
Dissecting the Paclitaxel-Microtubule Association: Quantitative Assessment of the 2’-OH Group
Shubhada Sharma,‡ Chandraiah Lagisetti,§ Colby Chiauzzi,║ Barbara Poliks,║ Robert M. Coates,§ and Susan Bane*,‡
Department of Chemistry, Binghamton University, State University of New York, Binghamton, New York 13902, §Department of Chemistry, University of Illinois, Urbana-Champaign, Illinois 61801, ║Department of Physics, Binghamton University, State University of New York, Binghamton, New York 13902.
ABSTRACT: Molecular interactions of paclitaxel and colchicine with tubulin have been studied in detail. The quantitative contribution of 2’-hydroxyl group and the role of N-benzoyl group in the association of paclitaxel with microtubules have been determined by studying the effect of selected paclitaxel derivatives on tubulin assembly. The affinities of these taxanes for microtubules and their cytotoxicities were also measured. The 2’-hydroxyl group was found to contribute a quarter of the total free energy change and more than three quarters of free energy change attributed to the C-13 side chain for the association of paclitaxel with microtubules. Molecular modeling analysis suggests that the 2’-hydroxyl group forms a hydrogen bond with the D26 of β-tubulin. The N-benzoyl group therefore may serve as an anchor for appropriate conformation of C-13 side chain, facilitating the formation of the hydrogen bonding interaction between the 2’-hydroxyl group and protein. These findings help define the structural requirements for high affinity binding to microtubules and are consequently critical towards the development of structurally simpler paclitaxel analogs.
Miscellany
• Standard grid job managers allow a single MPI task launcher to be specified.
  • Many applications require a specific flavour or version of MPI
  • Not all task launchers compatible
• TODO
  • Rewrite task launcher to accept generic scripts
  • Create a jobmanager-pbs-mpi that allows any installed MPI task launcher to be written
  • Allow users to submit their standard MPI submission scripts
Conclusion
Questions?