Virtual Laboratory: Enabling Distributed Molecular Modelling for Drug Discovery on the Grid
Rajkumar Buyya
Grid Computing and Distributed Systems (GRIDS) Lab.
The University of Melbourne
Melbourne, Australia
www.gridbus.org/vlab/
Agenda
Introduction
Molecular Docking Application Needs
Virtual Lab Architecture
Grid Enabling CDB (chemical databases)
Application Composition
Scheduling Experiments
Conclusions
Drug Design: Data Intensive Computing on Grid
It involves screening millions of chemical compounds (molecules) in the Chemical DataBase (CDB) to identify those having potential to serve as drug candidates.
[Figure: a protein and candidate molecules drawn from Chemical Databases (legacy, in .MOL2 format)]
[Collaboration with WEHI for Medical Science, Melbourne]
Using Basic Job submission commands
Do it all yourself! (manually)
Total Cost: $???
Build Distributed Application & Scheduler
Build App on a case-by-case basis
Complicated Construction
E.g., MPI based
Total Cost: $???
Rapid Parameterisation and Deployment Using the Gridbus and Nimrod-G Tools
Compose, Submit, & Play!
Docking Application Requirements
It is compute intensive: each docking job can take a few minutes to hours, depending on the structural complexity.
It is data intensive: the databases are huge (MBs to GBs) and each contains thousands of molecules. Screening all molecules in all databases is a real data challenge!
CDBs are distributed. It is a killer application for the Grid.
[Figure: protein and molecules from Chemical Databases (legacy, in .MOL2 format)]
DataGrid Brokering
[Figure: DataGrid brokering architecture.
Components: Nimrod/G Computational Grid Broker; CDB Broker (Algorithm1 . . . AlgorithmN); Data Replica Catalogue; Grid Info. Service; Grid Service Providers GSP1, GSP2, GSP3, GSP4, GSPm, GSPn; CDB Service (two instances).
User request: “Screen 2K molecules in 30min. for $10”
Messages shown in the figure (steps 1 to 7): “Screen mol.5 please?”; “advise CDB source?”; “selection & advise: use GSP4!”; “Is GSP4 healthy?”; “mol.5 please?”; “CDB replicas please?”; “process & send results”]
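The replica selection step in the brokering figure can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Replica` record, the `cost` field, the catalogue layout, and the "cheapest healthy provider" rule are all hypothetical and are not taken from the CDB Broker itself.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    provider: str   # Grid Service Provider holding a copy of the database
    healthy: bool   # status as an information service might report it (assumed)
    cost: float     # hypothetical access cost; lower is better


def select_replica(catalogue, database):
    """Return the cheapest healthy provider that holds `database`, or None."""
    candidates = [r for r in catalogue.get(database, []) if r.healthy]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.cost).provider


# Hypothetical replica catalogue: database name -> known replicas.
catalogue = {
    "aldrich_300": [
        Replica("GSP2", healthy=False, cost=1.0),
        Replica("GSP4", healthy=True, cost=2.0),
        Replica("GSPn", healthy=True, cost=3.0),
    ]
}

print(select_replica(catalogue, "aldrich_300"))  # GSP4
```

Other selection algorithms (the figure shows Algorithm1 to AlgorithmN) could be swapped in by replacing the `min` rule.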
Software Tools
Molecular Modelling Application (DOCK)
Parameter Modelling Tools (Nimrod/enFusion)
Grid Resource Broker (Nimrod-G)
Data Grid Broker
Chemical DataBase (CDB) Management and Intelligent Access Tools
  PDB database Lookup/Index Table Generation.
  PDB and associated index-table Replication.
  PDB Replica Catalogue (that helps in Resource Discovery).
  PDB Servers (that serve PDB client requests).
  PDB Brokering (Replica Selection).
  PDB Clients for fetching Molecule Record (Data Movement).
Grid Middleware (Globus and GrACE)
Grid Fabric Management (Fork/LSF/Condor/Codine/…)
The Virtual Lab. – Software Stack
APPLICATIONS: Molecular Modelling for Drug Design
PROGRAMMING TOOLS: Nimrod and Virtual Lab Tools [parametric programming language, GUI tools, and CDB indexer]
USER LEVEL MIDDLEWARE: Nimrod-G and CDB Data Broker [task farming engine, scheduler, dispatcher, agents, CDB (chemical database) server]
CORE MIDDLEWARE: Globus [security, information, job submission]
FABRIC: Worldwide Grid, PDB, CDB [Distributed computers and databases with different Arch, OS, and local resource management systems]
V-Lab Components Interaction
[Figure: V-Lab components interaction.
User Node: Grid Tools and Applications; Nimrod-G Grid Broker (Task Farming Engine, Grid Scheduler, Grid Dispatcher). Request: “Do this in 30 min. for $10?”
Grid services: Grid Info Server; Grid Trade Server.
Grid Node: Process Server; Local Resource Manager; File Server (file access).
Compute Node: Nimrod Agent; User Process; Docking Process; CDB Client. Request: “Get molecule “n” record from “abc” CDB”.
CDB Service on Grid: CDB Server with Index and CDB1 . . . CDBm. Exchanges: “Molecule “n” Location ?”, “Get mol. record”.]
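The "Molecule n location?" and "Get mol. record" exchange suggests an index that maps a molecule number to its position inside a large database file. A minimal sketch of that idea follows, assuming a multi-molecule MOL2 file in which every record begins with the Tripos `@<TRIPOS>MOLECULE` tag. The function names `build_index` and `fetch_record` are hypothetical and do not describe the actual CDB server code.

```python
import io

RECORD_TAG = b"@<TRIPOS>MOLECULE"  # start-of-record marker in MOL2 files


def build_index(stream):
    """Return the byte offset at which each molecule record starts."""
    offsets = []
    stream.seek(0)
    while True:
        pos = stream.tell()
        line = stream.readline()
        if not line:
            break
        if line.startswith(RECORD_TAG):
            offsets.append(pos)
    return offsets


def fetch_record(stream, offsets, n):
    """Return molecule record n (1-based) without scanning the whole file."""
    start = offsets[n - 1]
    stream.seek(0, io.SEEK_END)
    end = offsets[n] if n < len(offsets) else stream.tell()
    stream.seek(start)
    return stream.read(end - start)


# Tiny stand-in for a chemical database file (record contents are made up).
cdb = io.BytesIO(
    b"@<TRIPOS>MOLECULE\nS_1\n@<TRIPOS>ATOM\n"
    b"@<TRIPOS>MOLECULE\nS_2\n@<TRIPOS>ATOM\n"
)
index = build_index(cdb)
print(index)
print(fetch_record(cdb, index, 2).decode())
```

Building the index once and replicating it alongside the database lets a server answer a request for a single molecule with one seek and one read.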
DOCK code* (Enhanced by WEHI, U of Melbourne)
A program to evaluate the chemical and geometric complementarities between a small molecule and a macromolecular binding site.
It explores ways in which two molecules, such as a drug and an enzyme or protein receptor, might fit together.
Compounds which dock to each other well, like pieces of a three-dimensional jigsaw puzzle, have the potential to bind.
So, why is it important to be able to identify small molecules which may bind to a target macromolecule?
A compound which binds to a biological macromolecule may inhibit its function, and thus act as a drug.
E.g., disabling the ability of a virus (such as HIV) to attach itself to a molecule/protein!
With system-specific code changed, we have been able to compile it for Sun-Solaris, PC Linux, SGI IRIX, and Compaq Alpha/OSF1.
* Original Code: University of California, San Francisco: http://www.cmpharm.ucsf.edu/kuntz/
Dock input file
score_ligand yes
minimize_ligand yes
multiple_ligands no
random_seed 7
anchor_search no
torsion_drive yes
clash_overlap 0.5
conformation_cutoff_factor 3
torsion_minimize yes
match_receptor_sites no
random_search yes
. . . . . . . . . . . .
maximum_cycles 1
ligand_atom_file S_1.mol2
receptor_site_file ece.sph
score_grid_prefix ece
vdw_definition_file parameter/vdw.defn
chemical_definition_file parameter/chem.defn
chemical_score_file parameter/chem_score.tbl
flex_definition_file parameter/flex.defn
flex_drive_file parameter/flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2
[Callout on ligand_atom_file: Molecule to be screened]
1. Parameterize Dock input file (use Nimrod Tools: GUI/language)
score_ligand $score_ligand
minimize_ligand $minimize_ligand
multiple_ligands $multiple_ligands
random_seed $random_seed
anchor_search $anchor_search
torsion_drive $torsion_drive
clash_overlap $clash_overlap
conformation_cutoff_factor $conformation_cutoff_factor
torsion_minimize $torsion_minimize
match_receptor_sites $match_receptor_sites
random_search $random_search
. . . . . . . . . . . .
maximum_cycles $maximum_cycles
ligand_atom_file ${ligand_number}.mol2
receptor_site_file $HOME/dock_inputs/${receptor_site_file}
score_grid_prefix $HOME/dock_inputs/${score_grid_prefix}
vdw_definition_file vdw.defn
chemical_definition_file chem.defn
chemical_score_file chem_score.tbl
flex_definition_file flex.defn
flex_drive_file flex_drive.tbl
ligand_contact_file dock_cnt.mol2
ligand_chemical_file dock_chm.mol2
ligand_energy_file dock_nrg.mol2
[Callout on ligand_atom_file: Molecule to be screened]
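The `$name` and `${name}` placeholders in the parameterised input file happen to match the syntax of Python's `string.Template`, so the substitution step can be imitated locally. The sketch below is not how Nimrod performs substitution; it only illustrates the idea on four lines of the file, and the helper name `make_dock_run` is hypothetical.

```python
from string import Template

# A four-line excerpt of the parameterised DOCK input file.
dock_base = Template(
    "score_ligand $score_ligand\n"
    "random_seed $random_seed\n"
    "ligand_atom_file ${ligand_number}.mol2\n"
    "receptor_site_file $HOME/dock_inputs/${receptor_site_file}\n"
)


def make_dock_run(values):
    """Fill in the placeholders for one job.

    safe_substitute leaves unknown placeholders such as $HOME untouched,
    so they can still be resolved later on the execution node.
    """
    return dock_base.safe_substitute(values)


dock_run = make_dock_run({
    "score_ligand": "yes",
    "random_seed": 7,
    "ligand_number": 5,
    "receptor_site_file": "ece.sph",
})
print(dock_run)
```

Each job receives its own `ligand_number`, so the same template yields a different input file for every molecule to be screened.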
2. Create Docking Plan: Define Variables and Their Values
parameter database_name label "database_name" text select oneof
    "aldrich" "maybridge" "maybridge_300" "asinex_egc" "asinex_epc" "asinex_pre"
    "available_chemicals_directory" "inter_bioscreen_s" "inter_bioscreen_n"
    "inter_bioscreen_n_300" "inter_bioscreen_n_500" "biomolecular_research_institute"
    "molecular_science" "molecular_diversity_preservation" "national_cancer_institute"
    "IGF_HITS" "aldrich_300" "molecular_science_500" "APP" "ECE"
    default "aldrich_300";
parameter CDB_SERVER text default "bezek.dstc.monash.edu.au";
parameter CDB_PORT_NO text default "5001";
parameter score_ligand text default "yes";
parameter minimize_ligand text default "yes";
parameter multiple_ligands text default "no";
parameter random_seed integer default 7;
parameter anchor_search text default "no";
parameter torsion_drive text default "yes";
parameter clash_overlap float default 0.5;
parameter conformation_cutoff_factor integer default 5;
parameter torsion_minimize text default "yes";
parameter match_receptor_sites text default "no";
. . . . . . . . . . . .
parameter maximum_cycles integer default 1;
parameter receptor_site_file text default "ece.sph";
parameter score_grid_prefix text default "ece";
parameter ligand_number integer range from 1 to 2000 step 1;
[Callout on ligand_number: Molecules to be screened]
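A parameter sweep of this kind turns the declared values into one job per combination. The following Python sketch shows that expansion for two of the parameters above; the `expand` helper and the dictionary layout are illustrative assumptions, not part of the Nimrod tools.

```python
from itertools import product


def expand(parameters):
    """Yield one dict per job: the cross product of all parameter values."""
    names = list(parameters)
    for combo in product(*(parameters[name] for name in names)):
        yield dict(zip(names, combo))


plan = {
    "database_name": ["aldrich_300"],     # a 'default' value gives a single choice
    "ligand_number": range(1, 2001, 1),   # range from 1 to 2000 step 1
}

jobs = list(expand(plan))
print(len(jobs))            # 2000
print(jobs[0], jobs[-1])
```

With every other parameter fixed at its default, the `ligand_number` range alone determines the number of jobs: one docking run per molecule.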
3. Create Docking Plan File: Define the Task that Jobs Need to Do
task nodestart
    copy ./parameter/vdw.defn node:.
    copy ./parameter/chem.defn node:.
    copy ./parameter/chem_score.tbl node:.
    copy ./parameter/flex.defn node:.
    copy ./parameter/flex_drive.tbl node:.
    copy ./dock_inputs/get_molecule node:.
    copy ./dock_inputs/dock_base node:.
endtask
task main
    node:substitute dock_base dock_run
    node:substitute get_molecule get_molecule_fetch
    node:execute sh ./get_molecule_fetch
    node:execute $HOME/bin/dock.$OS -i dock_run -o dock_out
    copy node:dock_out ./results/dock_out.$jobname
    copy node:dock_cnt.mol2 ./results/dock_cnt.mol2.$jobname
    copy node:dock_chm.mol2 ./results/dock_chm.mol2.$jobname
    copy node:dock_nrg.mol2 ./results/dock_nrg.mol2.$jobname
endtask
Gridbus Visual Tool for Parametric Application Creation (e.g., Docking)
Chemical DataBase (CDB)
Databases consist of small molecules from commercially available organic synthesis libraries, and natural product databases.
There is also the ability to screen virtual combinatorial databases, in their entirety.
This methodology allows only the required compounds to be subjected to physical screening and/or synthesis, reducing both time and expense.
Target Testcase
The target for the test case: endothelin converting enzyme (ECE). This is involved in “heart stroke” and other transient ischemia.
Is·che·mi·a : A decrease in the blood supply to a bodily organ, tissue, or part caused by constriction or obstruction of the blood vessels.
Docking Deployment on The World Wide Grid
Scheduling Molecular Docking Application on Grid: Experiment
Workload: docking 200 molecules with ECE
200 jobs, each needing on the order of 3 minutes, depending on molecule weight.
Deadline: 60 min. and budget: 50,000 G$/tokens
Strategy: minimise time / cost
Execution cost with each optimisation strategy:
Optimise Cost: 14,277 G$ (finished in 59.30 min.)
Optimise Time: 17,702 G$ (finished in 34 min.)
In this experiment, time-optimised scheduling costs about 3.4K G$ more (17,702 vs. 14,277 G$) than cost-optimised scheduling.
Users can now trade off between time and cost.
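The trade-off can be checked with a few lines of arithmetic. The sketch below uses only the figures reported on this slide (budget, deadline, and the cost and completion time of each run); the variable names are illustrative.

```python
BUDGET = 50_000      # G$
DEADLINE_MIN = 60    # minutes

# strategy -> (cost in G$, completion time in minutes as reported)
runs = {
    "cost_opt": (14_277, 59.30),
    "time_opt": (17_702, 34.0),
}

extra_cost = runs["time_opt"][0] - runs["cost_opt"][0]
print(extra_cost)  # 3425

for name, (cost, minutes) in runs.items():
    within_budget = cost <= BUDGET
    within_deadline = minutes <= DEADLINE_MIN
    share = round(100 * cost / BUDGET, 1)   # percentage of the budget spent
    print(name, within_budget, within_deadline, share)
```

Both runs stay within the deadline and spend well under the budget; the time-optimised run pays 3,425 G$ more to finish earlier.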
WWG Setup
[Figure: World Wide Grid (WW Grid) testbed connected over the Internet, with GMonitor, Grid Market Directory, and MEG Visualisation]
Australia: Melbourne + Monash U: VPAC, Physics; Solaris WS; Gridbus + Nimrod-G
Europe: ZIB: T3E/Onyx; AEI: Onyx; CNR: Cluster; CUNI/CZ: Onyx; Poznan: SGI/SP2; Vrije U: Cluster; Cardiff: Sun E6500; Portsmouth: Linux PC; Manchester: O3K; Cambridge: SGI; many others
Asia: AIST, Japan: Solaris Cluster; Osaka University: Cluster; Doshisha: Linux cluster; Korea: Linux cluster
North America: ANL: SGI/Sun/SP2; NCSA: Cluster; Wisc: PC/cluster; NRC, Canada; many others
Resources Selected & Price/CPU-sec.
Resource & Location                         | Grid services & Fabric              | Cost/CPU sec. or unit | Jobs Executed (Time_Opt) | Jobs Executed (Cost_Opt)
Monash, Melbourne, Australia (Sun Ultra01)  | Globus, Nimrod-G, GTS (master node) | --                    | --                       | --
AIST, Tokyo, Japan, Ultra-4                 | Globus, GTS, Fork                   | 1                     | 44                       | 102
AIST, Tokyo, Japan, Ultra-4                 | Globus, GTS, Fork                   | 2                     | 41                       | 41
AIST, Tokyo, Japan, Ultra-4                 | Globus, GTS, Fork                   | 1                     | 42                       | 39
AIST, Tokyo, Japan, Ultra-2                 | Globus, GTS, Fork                   | 3                     | 11                       | 4
Sun-ANL, Chicago, US, Ultra-8               | Globus, GTS, Fork                   | 1                     | 62                       | 14
Total Experiment Cost (G$)                  |                                     |                       | 17,702                   | 14,277
Time to Complete Exp. (Min.)                |                                     |                       | 34                       | 59.30
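As a quick consistency check on the table, the job counts in each column should add up to the 200 jobs of the experiment. The sketch below also groups the jobs by price to show how the two strategies spread work across cheaper and dearer resources; the row labels are placeholders, since the table does not distinguish the three Ultra-4 machines by name.

```python
from collections import defaultdict

# (row label, price per CPU-sec or unit, jobs under Time_Opt, jobs under Cost_Opt)
rows = [
    ("AIST Ultra-4, row 1", 1, 44, 102),
    ("AIST Ultra-4, row 2", 2, 41, 41),
    ("AIST Ultra-4, row 3", 1, 42, 39),
    ("AIST Ultra-2", 3, 11, 4),
    ("Sun-ANL Ultra-8", 1, 62, 14),
]

total_time_opt = sum(row[2] for row in rows)
total_cost_opt = sum(row[3] for row in rows)
print(total_time_opt, total_cost_opt)  # 200 200

# price -> [jobs under Time_Opt, jobs under Cost_Opt]
jobs_by_price = defaultdict(lambda: [0, 0])
for _, price, time_opt_jobs, cost_opt_jobs in rows:
    jobs_by_price[price][0] += time_opt_jobs
    jobs_by_price[price][1] += cost_opt_jobs
print(dict(jobs_by_price))
```

Cost optimisation sends fewer jobs to the most expensive resource (4 against 11) and concentrates work on a single cheap machine, which is consistent with its lower cost and longer completion time.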
DBC Scheduling for Time Optimization – No. of Jobs in Exec.
[Figure: number of jobs in execution (y-axis, 0 to 9) against time in minutes (x-axis), one series per resource: AIST-Sun-hpc420.hpcc.jp, AIST-Sun-hpc420-1.hpcc.jp, AIST-Sun-hpc420-2.hpcc.jp, AIST-Sun-hpc220-2.hpcc.jp, ANL-Sun-pitcairn.mcs.anl.gov]
DBC Scheduling for Cost Optimization – No. of Jobs in Exec.
[Figure: number of jobs in execution (y-axis, 0 to 10) against time in minutes (x-axis), one series per resource: AIST-Sun-hpc420.hpcc.jp, AIST-Sun-hpc420-1.hpcc.jp, AIST-Sun-hpc420-2.hpcc.jp, AIST-Sun-hpc220-2.hpcc.jp, ANL-Sun-pitcairn.mcs.anl.gov]
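To make the idea of cost optimisation under a deadline and a budget concrete, here is a deliberately simplified greedy sketch: fill the cheapest resources first, each up to the number of jobs it can finish before the deadline, then check the budget. This is not the Nimrod-G scheduling algorithm. The resource names, per-job prices, CPU counts, and the fixed job length are all hypothetical.

```python
import math


def cost_optimised_plan(resources, n_jobs, job_minutes, deadline_minutes, budget):
    """Greedy cheapest-first allocation.

    resources: list of (name, price_per_job, cpus). Returns ({name: jobs}, total cost).
    Raises ValueError if the deadline or the budget cannot be met.
    """
    per_cpu = math.floor(deadline_minutes / job_minutes)  # jobs one CPU finishes in time
    plan, remaining = {}, n_jobs
    for name, _price, cpus in sorted(resources, key=lambda r: r[1]):
        take = min(remaining, per_cpu * cpus)
        if take > 0:
            plan[name] = take
            remaining -= take
    if remaining > 0:
        raise ValueError("deadline cannot be met with these resources")
    prices = {name: price for name, price, _cpus in resources}
    cost = sum(prices[name] * jobs for name, jobs in plan.items())
    if cost > budget:
        raise ValueError("budget exceeded")
    return plan, cost


# Hypothetical resources: (name, price per job, number of CPUs).
resources = [("A", 1, 4), ("B", 2, 4), ("C", 3, 2)]
plan, cost = cost_optimised_plan(
    resources, n_jobs=200, job_minutes=3, deadline_minutes=60, budget=500
)
print(plan, cost)
```

A time-optimised variant would instead spread jobs over every affordable resource at once, which shortens completion time at a higher total cost.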
Summary and Conclusion
Applications can be Grid enabled and deployed on the Grid with minimal effort, but they need the right set of Grid tools.
Distributed Docking demonstrates that Nimrod-G and Gridbus tools:
  Enable rapid Grid application software engineering.
  Provide powerful runtime machinery for optimal deployment of applications on the Grid.
Easy-to-use tools for composing applications to run on the Grid are essential to attracting the application community and getting it on board.
Integrate with our Data Grid Broker to support selection of CDB nodes dynamically. (in progress)
Thanks
http://www.gridbus.org/vlab
DBC Time Opt. Scheduling
DBC Scheduling for Time Optimization – No. of Jobs Finished
[Figure: number of jobs finished (y-axis, 0 to 70) against time in minutes (x-axis), one series per resource: AIST-Sun-hpc420.hpcc.jp, AIST-Sun-hpc420-1.hpcc.jp, AIST-Sun-hpc420-2.hpcc.jp, AIST-Sun-hpc220-2.hpcc.jp, ANL-Sun-pitcairn.mcs.anl.gov]
DBC Scheduling for Time Optimization – Budget Spent
[Figure: G$ spent for processing jobs (y-axis, 0 to 7000) against time in minutes (x-axis), one series per resource: AIST-Sun-hpc420.hpcc.jp, AIST-Sun-hpc420-1.hpcc.jp, AIST-Sun-hpc420-2.hpcc.jp, AIST-Sun-hpc220-2.hpcc.jp, ANL-Sun-pitcairn.mcs.anl.gov]
DBC Cost Opt. Scheduling
DBC Scheduling for Cost Optimization – No. of Jobs Finished
[Figure: number of jobs executed (y-axis, 0 to 120) against time in minutes (x-axis), one series per resource: AIST-Sun-hpc420.hpcc.jp, AIST-Sun-hpc420-1.hpcc.jp, AIST-Sun-hpc420-2.hpcc.jp, AIST-Sun-hpc220-2.hpcc.jp, ANL-Sun-pitcairn.mcs.anl.gov]
DBC Scheduling for Cost Optimization – Budget Spent
[Figure: G$ spent for job processing (y-axis, 0 to 6000) against time in minutes (x-axis), one series per resource: AIST-Sun-hpc420.hpcc.jp, AIST-Sun-hpc420-1.hpcc.jp, AIST-Sun-hpc420-2.hpcc.jp, AIST-Sun-hpc220-2.hpcc.jp, ANL-Sun-pitcairn.mcs.anl.gov]
Parametric Processing
Multiple Runs
Same Program
Multiple Data
Killer Application for the Grid!
[Figure: a “Magic Engine for Manufacturing Humans!” takes the parameters Age and Hair and is run once per row:
Age        Hair
23         Clean
23         Beard
28         Goatee
28         Clean
19         Moustache
10         Clean
-4000000   Too much]
Courtesy: Anand Natrajan, University of Virginia