Running Applications On The Grid Jon Bednasz & Steve Gallo Center for Computational Research University at Buffalo


Post on 19-Jun-2018


Page 1: Running Applications On The Grid - University at … Autodock/AutoGrid 4 Suite of automated docking tools designed to predict how small molecules, such as substrates or drug candidates,

Running Applications On The Grid

Jon Bednasz & Steve Gallo

Center for Computational Research

University at Buffalo

Applications

Autodock/AutoGrid 4

• Suite of automated docking tools designed to predict how small molecules, such as substrates or drug candidates, bind to a receptor of known 3D structure

• http://autodock.scripps.edu/

• Autodock/AutoGrid 4 is Free Software


Applications

Who?

Dr. Barbara Poliks and Graduate Student Colby Chiauzzi

Research Associate Professor

Dept of Physics, Applied Physics and Astronomy

Binghamton University

Why?

Autodock/Autogrid was used to determine how the 6C-inhibitor would position itself in each of the five binding sites of pentameric Lumazine Synthase.

Flow Chart - Gather

Gather files and information

Push files to grid machine

Globus-job-submit

Pull results back

Assume we have a valid DOE certificate

What machine do we want to run on?

Need correct autodock/autogrid executables

Need input files

Discover VO's directory on grid machine

run_directory=`globus-job-run ${host} /bin/sh -c 'echo $OSG_DATA'`

Flow Chart - Push

Gather files and information

Push files to grid machine

Globus-job-submit

Pull results back

Create working directories within $OSG_DATA

globus-job-run ${host} -dir ${run_directory} /bin/sh -c "mkdir binghamton"
globus-job-run ${host} -dir ${run_directory} /bin/sh -c "mkdir binghamton/${experiment}"

Stage executables to grid machine

globus-url-copy file://`pwd`/${experiment}/autodock4 gsiftp://${host}/${run_directory}/binghamton/${experiment}/autodock4
globus-url-copy file://`pwd`/${experiment}/autogrid4 gsiftp://${host}/${run_directory}/binghamton/${experiment}/autogrid4
globus-url-copy file://`pwd`/${experiment}/grid.sh gsiftp://${host}/${run_directory}/binghamton/${experiment}/grid.sh

Change perms on the files

globus-job-run ${host} -dir ${run_directory}/binghamton/${experiment} /bin/sh -c "chmod u+x autodock4"
globus-job-run ${host} -dir ${run_directory}/binghamton/${experiment} /bin/sh -c "chmod u+x autogrid4"
globus-job-run ${host} -dir ${run_directory}/binghamton/${experiment} /bin/sh -c "chmod u+x grid.sh"

Stage the input files to grid machine

globus-url-copy file://`pwd`/${experiment}/${input_file1} gsiftp://${host}/${run_directory}/binghamton/${experiment}/${input_file1}
globus-url-copy file://`pwd`/${experiment}/${input_file2} gsiftp://${host}/${run_directory}/binghamton/${experiment}/${input_file2}

Flow Chart - Submit

Gather files and information

Push files to grid machine

Globus-job-submit

Pull results back

Globus Command

globus-job-submit ${host}/jobmanager-pbs -dir ${run_directory}/binghamton/${experiment}/ -maxtime 4300 ${run_directory}/binghamton/${experiment}/grid.sh

What is grid.sh?

cd $OSG_DATA/binghamton/3_sa0GTDPColc4/
./autogrid4 -p ./sa0GTDP4.gpf -l sa0GTDP4.glg && ./autodock4 -p sa0GTDP4.dpf -l sa0GTDP4.dlg

Flow Chart - Pull

Gather files and information

Push files to grid machine

Globus-job-submit

Pull results back

globus-url-copy gsiftp://${host}/${run_directory}/binghamton/${experiment}/ file://`pwd`/${experiment}/

Lessons Learned

• Write a wrapper script “run_grid_job.sh”
  - Checks inputs
  - Helps to keep experiments/files organized

• Use globus-job-status

• Find someone, like Barbara and Colby, who is in need and highly motivated
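The slides name a wrapper script, run_grid_job.sh, but do not show it. The sketch below illustrates only the "checks inputs" idea; the function name check_inputs and the file names in the usage note are hypothetical and are not taken from the authors' script.

```shell
#!/bin/sh
# Hypothetical helper for a wrapper such as run_grid_job.sh (not the authors' code).
# check_inputs DIR FILE...
# Succeeds only if DIR exists and every FILE is a non-empty regular file in DIR.
check_inputs() {
    dir=$1
    shift
    if [ ! -d "$dir" ]; then
        echo "missing experiment directory: $dir" >&2
        return 1
    fi
    status=0
    for f in "$@"; do
        # Report every bad file instead of stopping at the first one.
        if [ ! -f "$dir/$f" ] || [ ! -s "$dir/$f" ]; then
            echo "missing or empty input: $dir/$f" >&2
            status=1
        fi
    done
    return $status
}
```

A wrapper could call, for example, check_inputs "$experiment" autodock4 autogrid4 grid.sh before staging anything, so that a typo in a file name fails locally rather than after a queue wait on the grid resource.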


Results

Figure 2.4 Distances between the 6C-inhibitor (31P) and selected 15N atoms of Lumazine Synthase. The NZ of the lysine residue is located ~7.6 Å away from the phosphorus. The three best docked inhibitors from AutoDock runs are shown. The green ribbon is traced through the backbone of the loop Ile 91-Lys 92-Gly 93-Ser 94-Thr 95-Met 96-His 97-Phe 98 in the vicinity of the phosphonate group of the inhibitor.


Results

Lumazine Synthase

Commercial Applications

COBALT CFD

• Computational Fluid Dynamics

• Licensed application

• License server resides at a remote location

• Source code not available

Commercial Challenges

• Source code typically not available

• License issues
  - Node-locked licensing model
  - Remote license servers

• Many use MPI
  - Applications sensitive to various MPI versions
  - Grid job managers handle MPI differently than traditional command line submissions

Basic Grid Job Workflow

• Identify required files and gather in one location

• Stage data and executables to grid resource

• Submit job

• Query resource for job completion

• Retrieve results

• Clean up
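As a rough map from these steps to the Globus commands used elsewhere in the slides, the sketch below only prints one command per step and runs nothing. The function name print_plan, its arguments, and the <job-url> placeholder are illustrative assumptions; the jobmanager-pbs job manager is taken from the slides' examples and may differ on other resources.

```shell
#!/bin/sh
# print_plan HOST REMOTE_DIR LOCAL_DIR SCRIPT
# Prints (does not run) one command per workflow step. REMOTE_DIR and LOCAL_DIR
# are expected to be absolute paths; <job-url> stands for the URL that
# globus-job-submit returns.
print_plan() {
    host=$1; remote=$2; local_dir=$3; script=$4
    echo "stage:    globus-url-copy file://$local_dir/$script gsiftp://$host$remote/$script"
    echo "chmod:    globus-job-run $host -dir $remote /bin/sh -c \"chmod u+x $script\""
    echo "submit:   globus-job-submit $host/jobmanager-pbs -dir $remote $remote/$script"
    echo "status:   globus-job-status <job-url>"
    echo "retrieve: globus-url-copy gsiftp://$host$remote/ file://$local_dir/"
    echo "clean:    globus-job-clean <job-url>"
}
```

Printing the plan first is a cheap way to review paths and host names before any file is copied or any job is queued.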

Running: Gather Your Files

• Gather your files into a single directory
  - Application files
  - Configuration files
  - Data files

• Why a single directory?
  - Easier to manage a single directory
  - Staging
  - Easier to retrieve results
  - Simpler cleanup

Running: Stage To Resource

• Select a working directory for your job ($OSG_DATA)

#> globus-job-run localhost /bin/env | fgrep OSG_DATA
OSG_DATA=/san/scratch/grid/grid-tmp/grid-data

• Stage files to the resource
  - Use globus-url-copy
  - file:// specifies a local file or directory
  - gsiftp:// specifies a GSI FTP server

• Gotchas
  - Local paths need 3 slashes ( file:///path/to/file )
  - Directories must end with a slash ( gsiftp://path/to/dir/ )
  - Data on the resource will be owned by the VO user and accessible to other VO users
  - Must use fully qualified paths
  - File permissions are NOT maintained on copy!
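Several of these gotchas are about URL shape, so they can be enforced by building the URLs in one place. The helpers below are a minimal sketch; the names build_url, file_url, and gsiftp_url are made up for illustration and are not part of the Globus tools.

```shell
#!/bin/sh
# build_url PREFIX PATH [dir]
# Rejects relative paths (fully qualified paths are required) and, when the
# third argument is "dir", makes sure the result ends with a slash.
build_url() {
    case $2 in
        /*) ;;
        *)  echo "path must be fully qualified: $2" >&2; return 1 ;;
    esac
    url=$1$2
    if [ "${3:-}" = "dir" ]; then
        case $url in
            */) ;;
            *)  url=$url/ ;;
        esac
    fi
    printf '%s\n' "$url"
}

# file_url PATH [dir]: "file://" plus an absolute path gives the three slashes.
file_url()   { build_url "file://" "$1" "${2:-}"; }

# gsiftp_url HOST PATH [dir]
gsiftp_url() { build_url "gsiftp://$1" "$2" "${3:-}"; }
```

With these, a copy could be written as globus-url-copy "$(file_url "$PWD/exp/grid.sh")" "$(gsiftp_url "$host" "$run_directory/exp/grid.sh")". Because permissions are not preserved, executables still need a chmod on the resource afterwards, as in the AutoDock example.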

Running: Submit Job

• Submit using globus-job-submit
  - Use the correct job manager (pbs, sge, etc.)
  - Returns a job id URL
  - Stores standard output and standard error in your Globus cache
  - Use this URL to identify job for checking status and cleaning up
  - Don’t use globus-job-run since it will wait for the job to complete

#> globus-job-submit u2-grid.ccr.buffalo.edu/jobmanager-pbs -np 1 -maxtime 5 \
    -dir /san/scratch/grid/grid-tmp/grid-data/NYSGRID/gallo-1 \
    aracne -i arraydata10x336.exp -j arraydata10x336.adj -o arraydata10x336.out
https://u2-grid.ccr.buffalo.edu:15600/5540/1181757403/

Running: Get Job Status

• Query job status using globus-job-status

• Use URL from job submission

• Don’t query status constantly; this loads the gatekeeper

• Intermediate status is up to the application

• Valid job states: UNSUBMITTED, PENDING, ACTIVE, DONE, FAILED

#> globus-job-status https://u2-grid.ccr.buffalo.edu:15600/5540/1181757403/
PENDING

#> globus-job-status https://u2-grid.ccr.buffalo.edu:15600/5540/1181757403/
DONE
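One way to avoid hammering the gatekeeper is to poll on a fixed interval and stop at a terminal state. The loop below is a sketch under assumptions: wait_for_job, the STATUS_CMD override, and the 60-second default are all hypothetical choices, and the function relies only on the state names listed above, not on the exit status of globus-job-status.

```shell
#!/bin/sh
# wait_for_job JOB_URL [INTERVAL_SECONDS]
# Returns 0 on DONE, 1 on FAILED, 2 on any output that is not a known state.
# STATUS_CMD may name a replacement for globus-job-status (useful for testing).
wait_for_job() {
    job_url=$1
    interval=${2:-60}
    while :; do
        # Only the printed state is inspected; the exit status is ignored.
        state=$(${STATUS_CMD:-globus-job-status} "$job_url" 2>/dev/null) || :
        case $state in
            DONE)   return 0 ;;
            FAILED) return 1 ;;
            UNSUBMITTED|PENDING|ACTIVE) sleep "$interval" ;;
            *)      echo "unexpected job state: '$state'" >&2; return 2 ;;
        esac
    done
}
```

A wrapper might run wait_for_job "$job_url" 300 and retrieve results only when it returns 0. A suitable interval depends on the site and on expected job length; the slides give no number.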

Running: Retrieve Results

• Retrieve results using globus-url-copy
  - Can retrieve entire directory or specific files

#> globus-url-copy -v -cd \
    gsiftp://u2-grid.ccr.buffalo.edu/san/scratch/grid/grid-tmp/grid-data/NYSGRID/gallo-1/ \
    file:///san/user/smgallo/u2/projects/nysgrid/gallo-1-results/

Source: gsiftp://u2-grid.ccr.buffalo.edu/san/scratch/grid/grid-tmp/grid-data/NYSGRID/gallo-1/

Dest: file:///san/user/smgallo/u2/projects/nysgrid/gallo-1-results/

aracne

arraydata10x336.adj

arraydata10x336.exp

arraydata10x336.out

Running: Cleanup

• Remove cached job output and any job files from resource

• Remove job output with globus-job-clean

• Manually remove job working directory

• Don’t use wildcards for filenames! You risk deleting everything if there is an error in your script.

#> globus-job-clean -force https://u2-grid.ccr.buffalo.edu:15600/5540/1181757403/
Cleanup successful.

#> globus-job-run u2-grid.ccr.buffalo.edu \
    /bin/sh -c 'cd $OSG_DATA/GRASE && rm -rf gallo-1'
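The wildcard warning can be turned into a guard in a wrapper script. The sketch below refuses anything that is not a single plain directory name and only prints the remote command; is_safe_job_dir and cleanup_command are hypothetical names, and the allowed character set is a conservative assumption rather than a Globus rule.

```shell
#!/bin/sh
# is_safe_job_dir NAME
# Accepts a single directory name made of letters, digits, dot, underscore and
# hyphen. Rejects empty names, ".", "..", names starting with "-", and anything
# containing slashes, wildcards or whitespace.
is_safe_job_dir() {
    case $1 in
        ''|.|..|-*)        return 1 ;;
        *[!A-Za-z0-9._-]*) return 1 ;;
        *)                 return 0 ;;
    esac
}

# cleanup_command BASE NAME
# Prints the shell command to run on the resource. "cd ... &&" keeps rm from
# running in the wrong directory when BASE does not exist.
cleanup_command() {
    if ! is_safe_job_dir "$2"; then
        echo "refusing unsafe job directory name: '$2'" >&2
        return 1
    fi
    printf 'cd %s && rm -rf %s\n' "$1" "$2"
}
```

The printed string could then be passed to globus-job-run as the argument of /bin/sh -c. Passing BASE in single quotes, for example '$OSG_DATA/GRASE', leaves the variable to be expanded on the resource.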

COBALT: Specific Challenges

• Licensed product

• MPI FORTRAN application

• Single submission script creates input files and runs MPI task launcher

COBALT: Licensing

• Licensed to Syracuse University for 16 nodes

• Needed permission from the vendor to run remotely

• Ensure there were no license domain restrictions

• Firewall rule modifications on license server

• Compute nodes require outbound internet connectivity

COBALT: Workflow

• Three files used to run a job
  - case.job – Creates an input file for Cobalt and runs the MPI wrapper. User modifies this file.
  - CoMPIRUN – Cobalt MPI wrapper installed with application
  - COBALT executable

[Diagram: case.job (create input file, set environment, set license server info, run MPI wrapper) feeds CoMPIRUN (set up working directory, prepare input files for COBALT, execute COBALT using MPI task launcher, clean up). A License Server box is also shown.]

COBALT: Modifications

Modifying COBALT for the Grid

• Create a “Job Package” to send to grid host

• Modify COBALT scripts

• MPI handled by grid job manager

• Cannot simply send your submission script

• Job submission and status polling

• Results retrieval and cleanup

Traditional MPI

#> qsub case.job

[Diagram: the resource manager allocates nodes. case.job calls CoMPIRUN, which generates configuration files, copies/generates input files, runs "mpirun cobalt.linux.dp" on Node 1, Node 2, ... Node n, and then cleans up.]

Grid MPI

#> globus-job-submit jobmanager-pbs -x '&(jobtype=mpi)' cobalt.linux.dp

[Diagram: the Grid Job Manager hands the job to the resource manager, which allocates nodes; "mpirun cobalt.linux.dp" runs on Node 1, Node 2, ... Node n.]

COBALT: Modified Workflow

• create_job_package – Create a “job package” including data files. Much of the work of the CoMPIRUN script is now done here.

• submit_job – Grid submission wrapper

[Diagram: boxes under create_job_package are create input file, set up working directory, and prepare data files for COBALT, producing the Job Package. Boxes under submit_job are stage job package, execute COBALT using grid job manager, poll for status, retrieve results, and clean up. A License Server box is also shown.]

Applications

Amber

• Amber is used for simulation of biomolecules

• http://ambermd.org


Applications

Who?

Dr. Barbara Poliks

Research Associate Professor

Dept of Physics, Applied Physics and Astronomy

Binghamton University

Why?

Needed to run long-term molecular dynamics (up to 20 ns) on large systems of protein + water (Lumazine Synthase + water and tubulin + water; both simulated systems had more than 100,000 atoms). 1 ns of dynamics took about 3.5 hours of CPU time with parallel processing using 32 nodes.
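For budgeting a run, the quoted rate can be turned into a time estimate. The sketch below assumes the 3.5 hours per nanosecond figure scales linearly and can be read as elapsed time on 32 nodes, which the slides do not state explicitly; minutes_for_ns is a hypothetical helper name.

```shell
#!/bin/sh
# minutes_for_ns NS
# Estimated run time in minutes for NS nanoseconds of dynamics, assuming the
# quoted rate of 3.5 hours (210 minutes) per nanosecond holds throughout.
minutes_for_ns() {
    echo $(( $1 * 210 ))
}
```

Under that assumption a full 20 ns run comes to 20 × 210 = 4200 minutes, or 70 hours. If -maxtime is read in minutes (my understanding of the GRAM maxTime attribute, not something the slides say), the value 4300 used in the slides' submit commands would sit just above this estimate; whether it was chosen for that reason is not stated.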


Flow Chart - Gather

Gather files and information

Push files to grid machine

Globus-job-submit

Pull results back

Assume we have a valid DOE certificate

What machine do we want to run on?

Need correct amber path

Need input files

Discover VO's directory on grid machine

run_directory=`globus-job-run ${host} /bin/sh -c 'echo $OSG_DATA'`

Flow Chart - Push

Gather files and information

Push files to grid machine

Globus-job-submit

Pull results back

Create working directories within $OSG_DATA

globus-job-run ${host} -dir ${run_directory} /bin/sh -c "mkdir binghamton"
globus-job-run ${host} -dir ${run_directory} /bin/sh -c "mkdir binghamton/${experiment}"

Stage inputs to grid machine

globus-url-copy file://`pwd`/${experiment}/dyn.in gsiftp://${host}/${run_directory}/binghamton/${experiment}/dyn.in
globus-url-copy file://`pwd`/${experiment}/sa0_GGafdyn1.rst gsiftp://${host}/${run_directory}/binghamton/${experiment}/sa0_GGafdyn1.rst
globus-url-copy file://`pwd`/${experiment}/sa0_GG.prmtop gsiftp://${host}/${run_directory}/binghamton/${experiment}/sa0_GG.prmtop

Flow Chart - Submit

Gather files and information

Push files to grid machine

Globus-job-submit

Pull results back

Globus Command

globus-job-submit ${host}/jobmanager-pbs -dir ${run_directory}/binghamton/${experiment}/ -maxtime 4300 -np 4 \
    -x '(jobType=mpi)' /util/amber/amber9/exe/pmemd -O -i dyn.in -o sa0_GGafdyn1.out -p sa0_GG.prmtop -c sa0_GGafdyn1.rst -r sa0_GGafdyn1.rst -x sa0_GGafdyn1.mdcrd

Flow Chart - Pull

Gather files and information

Push files to grid machine

Globus-job-submit

Pull results back

globus-url-copy gsiftp://${host}/${run_directory}/binghamton/${experiment}/ file://`pwd`/${experiment}/

Lessons Learned

• MPI jobs are significantly harder

• Needed to change pbs.pm
  - Define $mpirun
  - Change $remote_shell (rsh from ssh)
  - Execute submitted script directly

• Find someone, like Barbara and Colby, who is in need and highly motivated

Results - 1

Paper 1 (accepted to Biochemistry October 14 2008):

15N{31P} REDOR NMR Studies of the Binding of Phosphonate Reaction Intermediate Analogues to Saccharomyces cerevisiae Lumazine Synthase

Tsyr-Yan Yu,† Robert D. O'Connor,† Astrid C. Sivertsen,† Colby Chiauzzi,‡ Barbara Poliks,‡ Markus Fischer,§ Adelbert Bacher,¥ Ilka Haase,§ Mark Cushman,¶* and Jacob Schaefer†*

Department of Chemistry, Washington University, Saint Louis, Missouri 63130, Department of Physics, State University of New York at Binghamton, Binghamton, New York 13902, Universität Hamburg, Institute of Food Chemistry, Grindelallee 117, D-20146 Hamburg, Germany, Lehrstuhl für Organische Chemie und Biochemie, Technische Universität München, D-85747 Garching, Germany, and Department of Medicinal Chemistry and Molecular Pharmacology, School of Pharmacy, Purdue University, West Lafayette, IN 47907

ABSTRACT: Lumazine synthase catalyzes the reaction of 5-amino-6-D-ribitylamino-2,4(1H,3H)-pyrimidinedione (1) with (S)-3,4-dihydroxybutanone 4-phosphate (2) to afford 6,7-dimethyl-8-D-ribityllumazine (3), the immediate biosynthetic precursor of riboflavin. The overall reaction implies a series of intermediates that are incompletely understood. The 15N{31P} REDOR NMR spectra of three metabolically stable phosphonate reaction intermediate analogues complexed to Saccharomyces cerevisiae lumazine synthase have been obtained at 7 and 12 T. Distances from the phosphorus atoms of the ligands to the side chain nitrogens of Lys92, His97, Arg136, and His148 have been determined. These distances were used in combination with the X-ray crystal coordinates of one of the intermediate analogues complexed with the enzyme in a series of distance-restrained molecular dynamics simulations. The resulting models indicate mobility of the Lys92 side chain, which could facilitate the exchange of inorganic phosphate eliminated from the substrate in one reaction, with the organic phosphate-containing substrate necessary for the next reaction.

Results - 2

Paper 2 - to be submitted to Biochemistry Dec. 2008:

Dissecting the Paclitaxel-Microtubule Association: Quantitative Assessment of the 2’-OH Group

Shubhada Sharma,‡ Chandraiah Lagisetti,§ Colby Chiauzzi,║ Barbara Poliks,║ Robert M. Coates,§ and Susan Bane*,‡

Department of Chemistry, Binghamton University, State University of New York, Binghamton, New York 13902, §Department of Chemistry, University of Illinois, Urbana-Champaign, Illinois 61801, ║Department of Physics, Binghamton University, State University of New York, Binghamton, New York 13902.

ABSTRACT: Molecular interactions of paclitaxel and colchicine with tubulin have been studied in detail. The quantitative contribution of the 2’-hydroxyl group and the role of the N-benzoyl group in the association of paclitaxel with microtubules have been determined by studying the effect of selected paclitaxel derivatives on tubulin assembly. The affinities of these taxanes for microtubules and their cytotoxicities were also measured. The 2’-hydroxyl group was found to contribute a quarter of the total free energy change and more than three quarters of the free energy change attributed to the C-13 side chain for the association of paclitaxel with microtubules. Molecular modeling analysis suggests that the 2’-hydroxyl group forms a hydrogen bond with the D26 of β-tubulin. The N-benzoyl group therefore may serve as an anchor for appropriate conformation of the C-13 side chain, facilitating the formation of the hydrogen binding interaction between the 2’-hydroxyl group and protein. These findings help define the structural requirements for high affinity binding to microtubules and are consequently critical towards the development of structurally simpler paclitaxel analogs.

Miscellany

• Standard grid job managers allow a single MPI task launcher to be specified.
  - Many applications require a specific flavour or version of MPI
  - Not all task launchers compatible

• TODO
  - Rewrite task launcher to accept generic scripts
  - Create a jobmanager-pbs-mpi that allows any installed MPI task launcher to be written
  - Allow users to submit their standard MPI submission scripts


Conclusion

Questions?