virtual laboratory: enabling distributed molecular modelling for drug discovery on the grid

Post on 01-Feb-2016


Virtual Laboratory: Enabling Distributed Molecular Modelling for Drug Discovery on the Grid

Rajkumar Buyya
Grid Computing and Distributed Systems (GRIDS) Lab.
The University of Melbourne
Melbourne, Australia
www.gridbus.org/vlab/

WW Grid

2

3

Agenda

Introduction
Molecular Docking Application Needs
Virtual Lab Architecture
Grid Enabling CDB (chemical databases)
Application Composition
Scheduling Experiments
Conclusions

4

Drug Design: Data Intensive Computing on Grid

It involves screening millions of chemical compounds (molecules) in the Chemical DataBase (CDB) to identify those having potential to serve as drug candidates.

Protein

Molecules

Chemical Databases (legacy, in .MOL2 format)

[Collaboration with WEHI for Medical Science, Melbourne]

5

Using Basic Job submission commands

Do it all yourself! (manually)

Total Cost: $???

6

Build Distributed Application & Scheduler

Build the application on a case-by-case basis
Complicated construction
E.g., MPI based
Total Cost: $???

7

Rapid Parameterisation and Deployment Using the Gridbus and Nimrod-G Tools

Compose, Submit, & Play!

8

Docking Application Requirements

It is compute intensive: each docking job can take from a few minutes to hours, depending on the structural complexity.

It is data intensive: the databases are huge (MBs to GBs) and each contains thousands of molecules. Screening all molecules in all databases is a real data challenge!

CDBs are distributed. It is a killer application for the Grid.

[Figure labels: Protein; Molecules; Chemical Databases (legacy, in .MOL2 format)]

9

DataGrid Brokering

[Figure: DataGrid brokering. Components shown: Nimrod/G Computational Grid Broker; CDB Broker (Algorithm1 . . . AlgorithmN); Data Replica Catalogue; Grid Info. Service; Grid Service Providers GSP1, GSP2, GSP3, GSP4, . . . GSPm, GSPn, some of which host a CDB Service. The diagram numbers its interaction steps 1 to 7.]

Messages shown in the figure:
“Screen 2K molecules in 30min. for $10”
“advise CDB source?”
“CDB replicas please?”
“Is GSP4 healthy?”
“selection & advise: use GSP4!”
“Screen mol.5 please?”
“mol.5 please?”
“process & send results”
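The brokering flow in the figure (look up the CDB replicas, check which providers are healthy, then advise one) can be sketched in Python as below. This is only an illustration, not the Data Grid Broker's implementation: the catalogue contents, the `is_healthy` callback, the `access_cost` table, and the cheapest-first rule are all assumptions introduced here. The slide itself only indicates that the selection algorithm is pluggable (Algorithm1 . . . AlgorithmN).

```python
from typing import Callable, Dict, List, Optional

# Hypothetical replica catalogue: CDB name -> Grid Service Providers holding a copy.
REPLICA_CATALOGUE: Dict[str, List[str]] = {
    "aldrich_300": ["GSP2", "GSP4"],
    "maybridge": ["GSP4"],
}


def select_replica(
    cdb_name: str,
    catalogue: Dict[str, List[str]],
    is_healthy: Callable[[str], bool],
    access_cost: Dict[str, float],
) -> Optional[str]:
    """Return the cheapest healthy provider holding cdb_name, or None."""
    # "CDB replicas please?" followed by "Is GSPx healthy?" for each candidate.
    candidates = [gsp for gsp in catalogue.get(cdb_name, []) if is_healthy(gsp)]
    if not candidates:
        return None
    # Assumed selection rule: lowest access cost; providers with unknown cost sort last.
    return min(candidates, key=lambda gsp: access_cost.get(gsp, float("inf")))


if __name__ == "__main__":
    health = {"GSP2": False, "GSP4": True}  # made-up health status
    advice = select_replica(
        "aldrich_300",
        REPLICA_CATALOGUE,
        is_healthy=lambda gsp: health.get(gsp, False),
        access_cost={"GSP2": 1.0, "GSP4": 2.0},
    )
    print("selection & advise: use", advice)
```

Passing the health check and the cost table in as arguments keeps the selection rule separate from the catalogue and information-service lookups, so a different algorithm can be swapped in without touching them.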

10

Software Tools

Molecular Modelling Application (DOCK)
Parameter Modelling Tools (Nimrod/enFusion)
Grid Resource Broker (Nimrod-G)
Data Grid Broker
Chemical DataBase (CDB) Management and Intelligent Access Tools:
    PDB database Lookup/Index Table Generation.
    PDB and associated index-table Replication.
    PDB Replica Catalogue (that helps in Resource Discovery).
    PDB Servers (that serve PDB client requests).
    PDB Brokering (Replica Selection).
    PDB Clients for fetching Molecule Record (Data Movement).
Grid Middleware (Globus and GrACE)
Grid Fabric Management (Fork/LSF/Condor/Codine/…)
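The index-table idea in the list above can be illustrated with a small Python sketch: scan a MOL2 database once, remember the byte offset of each molecule record, then seek straight to record n on request. It relies on the standard MOL2 record marker `@<TRIPOS>MOLECULE`. The function names and the in-memory sample are assumptions made for illustration; the slides do not describe the actual index layout or the server protocol.

```python
import io
from typing import BinaryIO, List

RECORD_TAG = b"@<TRIPOS>MOLECULE"  # start-of-record marker in MOL2 files


def build_index(cdb: BinaryIO) -> List[int]:
    """Return the byte offset at which each molecule record starts."""
    offsets = []
    pos = cdb.tell()
    for line in iter(cdb.readline, b""):
        if line.startswith(RECORD_TAG):
            offsets.append(pos)
        pos += len(line)
    return offsets


def fetch_record(cdb: BinaryIO, index: List[int], n: int) -> bytes:
    """Return molecule record n (1-based) by seeking to its indexed offset."""
    start = index[n - 1]
    cdb.seek(start)
    if n < len(index):
        return cdb.read(index[n] - start)  # stop where the next record begins
    return cdb.read()                      # last record runs to end of file


if __name__ == "__main__":
    # Tiny made-up database with two truncated records.
    sample = io.BytesIO(
        b"@<TRIPOS>MOLECULE\nmol_A\n@<TRIPOS>ATOM\n"
        b"@<TRIPOS>MOLECULE\nmol_B\n@<TRIPOS>ATOM\n"
    )
    index = build_index(sample)
    print(index)
    print(fetch_record(sample, index, 2).decode())
```

Because the index is small compared with the database, it can be replicated alongside each CDB copy, and a server only has to read the bytes of the one record a docking job asks for.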

11

The Virtual Lab. – Software Stack

APPLICATIONS: Molecular Modelling for Drug Design

PROGRAMMING TOOLS: Nimrod and Virtual Lab Tools [parametric programming language, GUI tools, and CDB indexer]

USER LEVEL MIDDLEWARE: Nimrod-G and CDB Data Broker [task farming engine, scheduler, dispatcher, agents, CDB (chemical database) server]

CORE MIDDLEWARE: Globus [security, information, job submission]

FABRIC: Worldwide Grid [Distributed computers and databases with different Arch, OS, and local resource management systems]

[Figure labels: PDB; CDB]

12

V-Lab Components Interaction

[Figure: interaction of V-Lab components across a User Node, a Grid Node, and a Compute Node.]

Components shown: Grid Tools and Applications; Nimrod-G Grid Broker (Task Farming Engine, Grid Scheduler, Grid Dispatcher); Grid Info Server; Grid Trade Server; Process Server; File Server (file access); Local Resource Manager; Nimrod Agent; User Process; Docking Process; CDB Client; CDB Server with Index and CDB1 . . . CDBm (CDB Service on Grid).

Messages shown:
Do this in 30 min. for $10?
Get molecule “n” record from “abc” CDB
Molecule “n” Location?
Get mol. record

13

DOCK code* (Enhanced by WEHI, U of Melbourne)

A program to evaluate the chemical and geometric complementarities between a small molecule and a macromolecular binding site.

It explores ways in which two molecules, such as a drug and an enzyme or protein receptor, might fit together.

Compounds which dock to each other well, like pieces of a three-dimensional jigsaw puzzle, have the potential to bind.

So, why is it important to be able to identify small molecules which may bind to a target macromolecule?

A compound which binds to a biological macromolecule may inhibit its function, and thus act as a drug.

E.g., disabling the ability of a virus (such as HIV) to attach itself to a molecule/protein!

With system-specific code changes, we have been able to compile it for Sun-Solaris, PC Linux, SGI IRIX, and Compaq Alpha/OSF1.

* Original Code: University of California, San Francisco: http://www.cmpharm.ucsf.edu/kuntz/

14

Dock input file

score_ligand yes
minimize_ligand yes
multiple_ligands no
random_seed 7
anchor_search no
torsion_drive yes
clash_overlap 0.5
conformation_cutoff_factor 3
torsion_minimize yes
match_receptor_sites no
random_search yes
. . . . . . . . . . . .
maximum_cycles 1
ligand_atom_file S_1.mol2
receptor_site_file ece.sph
score_grid_prefix ece
vdw_definition_file parameter/vdw.defn
chemical_definition_file parameter/chem.defn
chemical_score_file parameter/chem_score.tbl
flex_definition_file parameter/flex.defn
flex_drive_file parameter/flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2

Molecule to be screened

15

1. Parameterize Dock input file (use Nimrod Tools: GUI/language)

score_ligand $score_ligand
minimize_ligand $minimize_ligand
multiple_ligands $multiple_ligands
random_seed $random_seed
anchor_search $anchor_search
torsion_drive $torsion_drive
clash_overlap $clash_overlap
conformation_cutoff_factor $conformation_cutoff_factor
torsion_minimize $torsion_minimize
match_receptor_sites $match_receptor_sites
random_search $random_search
. . . . . . . . . . . .
maximum_cycles $maximum_cycles
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
score_grid_prefix $HOME/dock_inputs/${score_grid_prefix}
vdw_definition_file vdw.defn
chemical_definition_file chem.defn
chemical_score_file chem_score.tbl
flex_definition_file flex.defn
flex_drive_file flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2

Molecule to be screened
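What the parameterised file buys can be shown with a short Python imitation of the substitution step. This uses Python's `string.Template`, not Nimrod's own substitution tool. The four-line excerpt and the job values are chosen for illustration, and leaving `$HOME` untouched rests on the assumption that it is resolved later on the node.

```python
from string import Template

# Excerpt of the parameterised Dock input file shown on the slide.
DOCK_BASE = """\
score_ligand $score_ligand
random_seed $random_seed
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
"""


def substitute(template_text: str, values: dict) -> str:
    """Fill in the job's parameter values, one Dock input file per job."""
    # safe_substitute leaves placeholders with no supplied value ($HOME here) as they are.
    return Template(template_text).safe_substitute(values)


if __name__ == "__main__":
    job = {
        "score_ligand": "yes",
        "random_seed": 7,
        "ligand_number": 42,          # illustrative molecule number
        "receptor_site_file": "ece.sph",
    }
    print(substitute(DOCK_BASE, job))
```

Each job differs only in its value dictionary, which is why one template plus a parameter range is enough to describe thousands of docking runs.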

16

2. Create Docking Plan: Define Variables and their values

parameter database_name label "database_name" text select oneof
    "aldrich" "maybridge" "maybridge_300" "asinex_egc" "asinex_epc" "asinex_pre"
    "available_chemicals_directory" "inter_bioscreen_s" "inter_bioscreen_n"
    "inter_bioscreen_n_300" "inter_bioscreen_n_500" "biomolecular_research_institute"
    "molecular_science" "molecular_diversity_preservation" "national_cancer_institute"
    "IGF_HITS" "aldrich_300" "molecular_science_500" "APP" "ECE"
    default "aldrich_300";
parameter CDB_SERVER text default "bezek.dstc.monash.edu.au";
parameter CDB_PORT_NO text default "5001";
parameter score_ligand text default "yes";
parameter minimize_ligand text default "yes";
parameter multiple_ligands text default "no";
parameter random_seed integer default 7;
parameter anchor_search text default "no";
parameter torsion_drive text default "yes";
parameter clash_overlap float default 0.5;
parameter conformation_cutoff_factor integer default 5;
parameter torsion_minimize text default "yes";
parameter match_receptor_sites text default "no";
. . . . . . . . . . . .
parameter maximum_cycles integer default 1;
parameter receptor_site_file text default "ece.sph";
parameter score_grid_prefix text default "ece";
parameter ligand_number integer range from 1 to 2000 step 1;

Molecules to be screened
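The effect of the `ligand_number` range can be sketched in Python: every parameter with a default contributes a fixed value, while the ranged parameter is swept, giving one job per molecule. The helper below is hypothetical and is not how Nimrod generates jobs internally; it also assumes the range is inclusive at both ends.

```python
from itertools import product


def expand_jobs(fixed: dict, swept: dict) -> list:
    """Return one job dict per combination of swept values, merged with the fixed ones."""
    names = list(swept)
    jobs = []
    for combo in product(*(swept[name] for name in names)):
        job = dict(fixed)
        job.update(zip(names, combo))
        jobs.append(job)
    return jobs


if __name__ == "__main__":
    # A few of the defaults from the plan file.
    fixed = {
        "database_name": "aldrich_300",
        "random_seed": 7,
        "receptor_site_file": "ece.sph",
        "score_grid_prefix": "ece",
    }
    # "range from 1 to 2000 step 1", assumed inclusive, hence range(1, 2001, 1).
    swept = {"ligand_number": range(1, 2001, 1)}
    jobs = expand_jobs(fixed, swept)
    print(len(jobs), jobs[0]["ligand_number"], jobs[-1]["ligand_number"])
```

With a second swept parameter the job count would be the product of the two ranges, which is how parametric studies grow quickly enough to need Grid resources.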

17

3. Create Docking Plan File: Define Task that jobs need to do

task nodestart
    copy ./parameter/vdw.defn node:.
    copy ./parameter/chem.defn node:.
    copy ./parameter/chem_score.tbl node:.
    copy ./parameter/flex.defn node:.
    copy ./parameter/flex_drive.tbl node:.
    copy ./dock_inputs/get_molecule node:.
    copy ./dock_inputs/dock_base node:.
endtask
task main
    node:substitute dock_base dock_run
    node:substitute get_molecule get_molecule_fetch
    node:execute sh ./get_molecule_fetch
    node:execute $HOME/bin/dock.$OS -i dock_run -o dock_out
    copy node:dock_out ./results/dock_out.$jobname
    copy node:dock_cnt.mol2 ./results/dock_cnt.mol2.$jobname
    copy node:dock_chm.mol2 ./results/dock_chm.mol2.$jobname
    copy node:dock_nrg.mol2 ./results/dock_nrg.mol2.$jobname
endtask

18

Gridbus Visual Tool for Parametric Application Creation (e.g., Docking)

19

Chemical DataBase (CDB)

Databases consist of small molecules from commercially available organic synthesis libraries, and natural product databases.

There is also the ability to screen virtual combinatorial databases, in their entirety.

This methodology allows only the required compounds to be subjected to physical screening and/or synthesis, reducing both time and expense.

20

Target Testcase

The target for the test case: endothelin converting enzyme (ECE). This is involved in “heart stroke” and other transient ischemia.

Is·che·mi·a : A decrease in the blood supply to a bodily organ, tissue, or part caused by constriction or obstruction of the blood vessels.

Docking Deployment on The World Wide Grid

22

Scheduling Molecular Docking Application on Grid: Experiment

Workload – Docking 200 molecules with ECE
200 jobs, each needing in the order of 3 minutes depending on molecule weight.
Deadline: 60 min. and budget: 50,000 G$/tokens
Strategy: minimise time / cost
Execution cost by optimisation strategy:
    Optimise Cost: 14,277 (G$) (finished in 59.30 min.)
    Optimise Time: 17,702 (G$) (finished in 34 min.)
In this experiment, time-optimised scheduling costs an extra 3,425 G$ (about 3.4K G$) compared to cost-optimised scheduling.
Users can now trade off between Time vs. Cost.
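The trade-off can be checked with a few lines of arithmetic. The Python below uses only the figures reported on this slide. The sequential estimate treats "in the order of 3 minutes" as exactly 3 minutes per job, which is an approximation, so the speed-up is indicative only.

```python
jobs = 200
minutes_per_job = 3                               # approximation of "in the order of 3 minutes"
cost_opt_cost, time_opt_cost = 14_277, 17_702     # G$, as reported
time_opt_minutes = 34                             # as reported

extra_cost = time_opt_cost - cost_opt_cost        # price paid for finishing sooner
premium = extra_cost / cost_opt_cost              # relative to the cost-optimised run
sequential_minutes = jobs * minutes_per_job       # one machine, one job at a time
speedup = sequential_minutes / time_opt_minutes

print(f"extra cost: {extra_cost} G$ ({premium:.1%} over cost-optimised)")
print(f"sequential estimate: {sequential_minutes} min, "
      f"about {speedup:.1f}x the time-optimised run")
```

Both runs stay well inside the 50,000 G$ budget and the 60 min. deadline, so the choice between them is purely a matter of user preference.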

23

WWG Setup

[Figure: the World Wide Grid (WW Grid) testbed, with sites connected over the Internet.]

Tools shown: GMonitor; Grid Market Directory; MEG Visualisation

Australia: Melbourne + Monash U: VPAC, Physics; Solaris WS; Gridbus+Nimrod-G

Europe: ZIB: T3E/Onyx; AEI: Onyx; CNR: Cluster; CUNI/CZ: Onyx; Poznan: SGI/SP2; Vrije U: Cluster; Cardiff: Sun E6500; Portsmouth: Linux PC; Manchester: O3K; Cambridge: SGI; many others

Asia: AIST, Japan: Solaris Cluster; Osaka University: Cluster; Doshisha: Linux cluster; Korea: Linux cluster

North America: ANL: SGI/Sun/SP2; NCSA: Cluster; Wisc: PC/cluster; NRC, Canada; many others

24

Resources Selected & Price/CPU-sec.

Resource & Location | Grid services & Fabric | Cost/CPU sec. or unit | No. of Jobs Executed (Time_Opt) | No. of Jobs Executed (Cost_Opt)
Monash, Melbourne, Australia (Sun Ultra01) | Globus, Nimrod-G, GTS (master node) | -- | -- | --
AIST, Tokyo, Japan, Ultra-4 | Globus, GTS, Fork | 1 | 44 | 102
AIST, Tokyo, Japan, Ultra-4 | Globus, GTS, Fork | 2 | 41 | 41
AIST, Tokyo, Japan, Ultra-4 | Globus, GTS, Fork | 1 | 42 | 39
AIST, Tokyo, Japan, Ultra-2 | Globus, GTS, Fork | 3 | 11 | 4
Sun-ANL, Chicago, US, Ultra-8 | Globus, GTS, Fork | 1 | 62 | 14
Total Experiment Cost (G$) | | | 17,702 | 14,277
Time to Complete Exp. (Min.) | | | 34 | 59.30
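The two strategies in the table can be contrasted with a toy scheduler in Python. It is a deliberately simplified sketch and not Nimrod-G's deadline-and-budget-constrained (DBC) algorithm: it assumes a flat price per job, a fixed job duration per resource, and a fixed number of concurrent slots, and the two resources in the demo are invented.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Resource:
    name: str
    price: float            # G$ per job (simplification of cost per CPU-second)
    minutes_per_job: float
    slots: int              # jobs the resource runs concurrently


def finish_time(r: Resource, n_jobs: int) -> float:
    """Minutes until r completes n_jobs, running `slots` jobs at a time."""
    return math.ceil(n_jobs / r.slots) * r.minutes_per_job


def cost_optimise(resources: List[Resource], n_jobs: int, deadline: float) -> Dict[str, int]:
    """Fill the cheapest resources first, as far as the deadline allows."""
    plan = {r.name: 0 for r in resources}
    remaining = n_jobs
    for r in sorted(resources, key=lambda r: r.price):
        capacity = int(deadline // r.minutes_per_job) * r.slots
        plan[r.name] = min(remaining, capacity)
        remaining -= plan[r.name]
    if remaining:
        raise ValueError("deadline cannot be met")
    return plan


def time_optimise(resources: List[Resource], n_jobs: int, budget: float) -> Dict[str, int]:
    """Give each job to the resource that would finish it earliest, within budget."""
    plan = {r.name: 0 for r in resources}
    spent = 0.0
    for _ in range(n_jobs):
        affordable = [r for r in resources if spent + r.price <= budget]
        if not affordable:
            raise ValueError("budget cannot be met")
        # Ties on finish time go to the cheaper resource.
        best = min(affordable, key=lambda r: (finish_time(r, plan[r.name] + 1), r.price))
        plan[best.name] += 1
        spent += best.price
    return plan


def total_cost(resources: List[Resource], plan: Dict[str, int]) -> float:
    return sum(r.price * plan[r.name] for r in resources)


def makespan(resources: List[Resource], plan: Dict[str, int]) -> float:
    return max(finish_time(r, plan[r.name]) for r in resources)


if __name__ == "__main__":
    pool = [Resource("cheap", 1, 3, 2), Resource("pricey", 3, 3, 2)]  # invented resources
    for label, plan in [
        ("cost-opt", cost_optimise(pool, 20, deadline=60)),
        ("time-opt", time_optimise(pool, 20, budget=100)),
    ]:
        print(label, plan, total_cost(pool, plan), "G$", makespan(pool, plan), "min")
```

Even in this toy setting the pattern matches the table: cost optimisation concentrates jobs on the cheapest resource and uses most of the deadline, while time optimisation spreads jobs across all resources and pays more to finish sooner.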

25

DBC Scheduling for Time Optimization – No. of Jobs in Exec.

[Chart: No. of jobs in execution vs. time (in min.), one series per resource: AIST-Sun-hpc420.hpcc.jp, AIST-Sun-hpc420-1.hpcc.jp, AIST-Sun-hpc420-2.hpcc.jp, AIST-Sun-hpc220-2.hpcc.jp, ANL-Sun-pitcairn.mcs.anl.gov]

26

DBC Scheduling for Cost Optimization – No. of Jobs in Exec.

[Chart: No. of jobs in execution vs. time (in min.), one series per resource: AIST-Sun-hpc420.hpcc.jp, AIST-Sun-hpc420-1.hpcc.jp, AIST-Sun-hpc420-2.hpcc.jp, AIST-Sun-hpc220-2.hpcc.jp, ANL-Sun-pitcairn.mcs.anl.gov]

27

Summary and Conclusion

Applications can be Grid enabled and deployed on the Grid with minimal effort, but they need the right set of Grid tools.

Distributed Docking demonstrates that Nimrod-G and Gridbus tools:
    Enable rapid Grid application software engineering.
    Provide powerful runtime machinery for optimal deployment of applications on the Grid.

Easy-to-use tools for composing applications to run on the Grid are essential to attracting the application community and getting it on board.

Integrate with our Data Grid Broker to support selection of CDB nodes dynamically. (in progress)

28

Thanks

http://www.gridbus.org/vlab

29

DBC Time Opt. Scheduling

30

DBC Scheduling for Time Optimization – No. of Jobs Finished

[Chart: No. of jobs finished vs. time (in min.), one series per resource: AIST-Sun-hpc420.hpcc.jp, AIST-Sun-hpc420-1.hpcc.jp, AIST-Sun-hpc420-2.hpcc.jp, AIST-Sun-hpc220-2.hpcc.jp, ANL-Sun-pitcairn.mcs.anl.gov]

31

DBC Scheduling for Time Optimization – Budget Spent

[Chart: G$ spent for processing jobs vs. time (in min.), one series per resource: AIST-Sun-hpc420.hpcc.jp, AIST-Sun-hpc420-1.hpcc.jp, AIST-Sun-hpc420-2.hpcc.jp, AIST-Sun-hpc220-2.hpcc.jp, ANL-Sun-pitcairn.mcs.anl.gov]

32

DBC Cost Opt. Scheduling

33

DBC Scheduling for Cost Optimization – No. of Jobs Finished

[Chart: No. of jobs executed vs. time (in min.), one series per resource: AIST-Sun-hpc420.hpcc.jp, AIST-Sun-hpc420-1.hpcc.jp, AIST-Sun-hpc420-2.hpcc.jp, AIST-Sun-hpc220-2.hpcc.jp, ANL-Sun-pitcairn.mcs.anl.gov]

34

DBC Scheduling for Cost Optimization – Budget Spent

[Chart: G$ spent for job processing vs. time (in min.), one series per resource: AIST-Sun-hpc420.hpcc.jp, AIST-Sun-hpc420-1.hpcc.jp, AIST-Sun-hpc420-2.hpcc.jp, AIST-Sun-hpc220-2.hpcc.jp, ANL-Sun-pitcairn.mcs.anl.gov]

35

Parametric Processing

Multiple Runs
Same Program
Multiple Data
Killer Application for the Grid!

Parameters

Age | Hair
23 | Clean
23 | Beard
28 | Goatee
28 | Clean
19 | Moustache
10 | Clean
-4000000 | Too much

Magic Engine for Manufacturing Humans!

Courtesy: Anand Natrajan, University of Virginia
