INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
Using Grid Computing to Accelerate Structure-Based Design Against Influenza A Neuraminidases
Hurng-Chun Lee, Li-Yung Ho, and Ying-Ta Wu*
*Genomics Research Center
Academia Sinica, Taiwan
EGEE User Forum
CERN, 01-03.03.2006
Outline
Influenza A Pandemic
[Figure: timeline of influenza A subtypes; labels: H1N1, H2N2, H3N2, H9N2, H7N7, H5N1, HA, NA, 2005, 2006]
92 deaths / 170 cases as of Feb 26, 2006
Source: http://www.who.int/csr/disease/avian_influenza
Neuraminidases cleave host receptors and help release new virions
Neuraminidase and Inhibitors
Zanamivir: R = guanidine
Oseltamivir: R = H, R’ = amine
Structure-Based Drug Design
binding pocket
Drug-resistant Variants and Point Mutations
Mutation      N1                        N2
R292K         oseltamivir, Zanamivir    oseltamivir, Zanamivir
H274Y(F)      oseltamivir               oseltamivir
N294S         oseltamivir?              oseltamivir
E119V         oseltamivir?              oseltamivir
E119(G;A;D)   oseltamivir?              Zanamivir
Legend (marker symbols not preserved in this transcript): predicted mutation site by structure overlay and sequence alignment; reported mutation site
Docking Engine: AutoDock 3.0.5
1. Prepare the Target Protein
-- add polar hydrogen atoms
-- assign charges to atoms
-- decide range of binding site
2. Run AutoGrid
3. Prepare the Ligand
-- assign charges to atoms
-- decide flexible bonds (run AutoTors)
4. Run AutoDock
5. Evaluate Results and Rank Score
AutoDock authors: Garrett M. Morris, David S. Goodsell, Ruth Huey, William E. Hart, Scott Halliday, Rik Belew, Arthur J. Olson
Reference: Morris et al. (1998), J. Computational Chemistry, 19: 1639-1662.
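A minimal sketch of how steps 2, 4 and 5 of the docking workflow might be driven from Python. It only builds the command lines and ranks scores; it does not run anything. The file-naming scheme (`<target>.gpf`, `<ligand>_<target>.dpf`) and both function names are assumptions for illustration, not part of the presentation. The `-p` (parameter file) and `-l` (log file) options follow the usual AutoGrid/AutoDock 3 invocation.

```python
from pathlib import Path


def docking_commands(target, ligand, workdir="."):
    """Build the AutoGrid and AutoDock command lines for one target-ligand pair.

    Assumes the grid parameter file <target>.gpf and the docking parameter
    file <ligand>_<target>.dpf were prepared in steps 1 and 3 (hypothetical
    naming scheme).
    """
    wd = Path(workdir)
    gpf = wd / f"{target}.gpf"
    dpf = wd / f"{ligand}_{target}.dpf"
    autogrid = ["autogrid3", "-p", str(gpf), "-l", str(gpf.with_suffix(".glg"))]
    autodock = ["autodock3", "-p", str(dpf), "-l", str(dpf.with_suffix(".dlg"))]
    return autogrid, autodock


def rank_by_energy(scores):
    """Step 5: sort compound IDs by docked energy, most negative first."""
    return sorted(scores, key=scores.get)
```

The command lists can be handed to `subprocess.run` on a machine where AutoDock 3 is installed; keeping command construction separate from execution makes the same code usable on a cluster node or a Grid worker.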
Application Characteristic
• Virtual screening based on molecular docking is the most time-consuming part of the structure-based drug design workflow
• Number of docking tasks = N x M
 – N: number of ligands
 – M: number of target structures
• CPU-bound application, huge amount of output, no communication between tasks
• Task complexity is unpredictable
 – difficult to apply a trivial domain decomposition method when splitting the tasks
The pitiful …
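The N x M task count can be illustrated with a short sketch that enumerates every ligand-target pair as one independent docking task. The ligand and conformation names below are placeholders, not identifiers from the study; the sizes (100 ligands, 5 target structures) match the test case used later in the talk.

```python
from itertools import product


def docking_tasks(ligands, targets):
    """Return one (ligand, target) tuple per independent docking task."""
    return list(product(ligands, targets))


# Placeholder names: 100 ligands docked against 5 target structures.
ligands = [f"ligand_{i:03d}" for i in range(100)]
targets = [f"conformation_{j}" for j in range(5)]
tasks = docking_tasks(ligands, targets)
print(len(tasks))  # 500
```

Because the tasks share no state, the list can be cut at any point, but since run times are unpredictable an even split by count does not guarantee an even split by work.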
Issues of the Grid applications
• Because of its loosely coupled nature, distributing application jobs on the Grid is not trivial
 – extra work is needed for efficient job handling and result gathering
 – effort is also needed to handle transient network or site problems
 – complexities should be hidden, and the interface to the end user should be application oriented
• The significant Grid system overhead means the Grid benefits only jobs with long computing times
 – not suitable for pilot jobs used for decision making
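The overhead argument can be made concrete with a small calculation: the fraction of wall-clock time spent on useful computing is t_compute / (t_compute + t_overhead). The 600-second overhead used below is a purely hypothetical figure chosen for illustration; the presentation does not quote a number.

```python
def grid_efficiency(compute_seconds, overhead_seconds):
    """Fraction of a job's wall-clock time spent on actual computation."""
    return compute_seconds / (compute_seconds + overhead_seconds)


OVERHEAD = 600.0  # hypothetical per-job submission and scheduling overhead, in seconds

short_job = grid_efficiency(60.0, OVERHEAD)         # a one-minute pilot job
long_job = grid_efficiency(6 * 3600.0, OVERHEAD)    # a six-hour production job
print(round(short_job, 2), round(long_job, 2))  # 0.09 0.97
```

Under this assumption a one-minute pilot job spends over 90% of its time waiting, while a six-hour job loses only a few percent.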
What is DIANE?
• DIANE = Distributed Analysis Environment
• A lightweight framework for parallel scientific applications in the master-worker model
 – ideal for applications without communication between parallel tasks (e.g. most Bioinformatics applications that analyze huge numbers of independent datasets)
• The framework takes care of all synchronization, communication and workflow management details on behalf of the application
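The master-worker model can be sketched in a few lines of standard-library Python. This is not DIANE's API; `run_master_worker` and its parameters are hypothetical names, and local threads stand in for remote worker agents. The point it shows is the pull pattern: each worker asks for the next task as soon as it is free, so no task-to-worker mapping has to be decided in advance.

```python
import queue
import threading


def run_master_worker(tasks, work, n_workers=4):
    """Run independent tasks with a pull-based master-worker scheme."""
    todo = queue.Queue()
    for task in tasks:
        todo.put(task)  # the master fills the queue before workers start
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = todo.get_nowait()
            except queue.Empty:
                return  # no tasks left, so this worker retires
            outcome = work(task)
            with lock:
                results[task] = outcome  # result gathering on the master side

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

For example, `run_master_worker(range(10), lambda x: x * x, n_workers=3)` returns a dictionary mapping each input to its square, regardless of which worker processed which task.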
Distributing AutoDock tasks on the Grid using DIANE
DIANE/AutoDock
A generic framework into which applications can easily plug in

autodock.job

    # -*- python -*-
    Application = 'Autodock'

    ## Application-specific job attributes
    JobInitData = {
        'macro_repos': '/home/hclee/diane_demo/autodock/macro',
        'ligand_repos': '/home/hclee/diane_demo/autodock/ligand',
        'ftprotocol': 'gass',
        'output_prefix': 'autodock_test',
    }

    ## The input files will be staged in to workers
    InputFiles = []

    ## Job-level failure recovery definition: report and skip failed tasks
    def failRecovery(self):
        print('*' * 30)
        for t in self.master.tasks.failed():
            print("ignoring failed task: %s" % (t,))
            t.ignore()
        print('*' * 30)
        return 1

% diane.startjob --job autodock.job -ganga -w 32@lcg,32@pbs

• Intuitive job execution command
• Possible to mix heterogeneous computing backends
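To make the mixed-backend request concrete, the sketch below parses a worker specification of the form shown on the slide (`32@lcg,32@pbs`) into a per-backend worker count. `parse_worker_spec` is a hypothetical helper written for illustration; it is not DIANE's own parser, whose behaviour the slide does not describe.

```python
def parse_worker_spec(spec):
    """Parse a string such as '32@lcg,32@pbs' into {'lcg': 32, 'pbs': 32}."""
    counts = {}
    for item in spec.split(","):
        number, _, backend = item.strip().partition("@")
        if not number.isdigit() or not backend:
            raise ValueError(f"malformed worker request: {item!r}")
        # Repeated backends are summed rather than overwritten.
        counts[backend] = counts.get(backend, 0) + int(number)
    return counts


print(parse_worker_spec("32@lcg,32@pbs"))  # {'lcg': 32, 'pbs': 32}
```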
DIANE/AutoDock – integrated user interface
Performance Evaluation
• Test case
 – target: 1 protein in 5 conformations
 – ligands: 100 small compounds (with 7 positives)
 – 500 docking tasks in total
• Test environment
 – DIANE backend handler: SSH
 – Hardware spec: traditional PC cluster with NFS (2 x Intel Xeon 2.8 GHz + 2 GB memory per node)
 – Grid: LCG
Test Results: DIANE/AutoDock framework on Cluster
[Figure: duration time in seconds (axis 0 to 18000) versus number of workers (10 to 70)]
Duration time: total elapsed time of a DIANE job
Each DIANE job contains 500 tasks (5 protein conformations x 100 compounds)
Test Results: DIANE/AutoDock framework on Cluster
Handling docking jobs on a traditional PC cluster
[Figure labels: good load balance; a DIANE/AutoDock task]
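One way to see why pull-based task assignment balances load even when task durations vary is a small simulation. It is a sketch under the assumption that each task goes to whichever worker becomes free first; the function name and the sample durations are hypothetical and are not measurements from the test.

```python
import heapq


def simulate_pull_scheduling(durations, n_workers):
    """Give each task, in order, to the worker that becomes free first.

    Returns (makespan, per-worker busy time).
    """
    free_at = [(0.0, w) for w in range(n_workers)]  # (time worker is free, worker id)
    heapq.heapify(free_at)
    busy = [0.0] * n_workers
    for d in durations:
        t, w = heapq.heappop(free_at)  # earliest-free worker pulls the next task
        busy[w] += d
        heapq.heappush(free_at, (t + d, w))
    makespan = max(t for t, _ in free_at)
    return makespan, busy


makespan, busy = simulate_pull_scheduling([5, 3, 2, 4], n_workers=2)
print(makespan, busy)  # 9.0 [9.0, 5.0]
```

With many short tasks per worker, as in the 500-task job, the per-worker busy times converge and the makespan approaches the total work divided by the number of workers.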
DIANE/AutoDock framework on LCG-GRID
terminated
Without redundant scheduling
With redundant scheduling
job was reassigned to other nodes
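The effect of redundant scheduling can be sketched with a toy model: if the first worker has not returned a result within a timeout, the master reassigns the task to another node and takes whichever result arrives first. The function and its parameters are hypothetical, and the actual reassignment policy used in the test is not described on the slides.

```python
def completion_time(primary_runtime, backup_runtime, timeout=None):
    """Time at which the first result for a task arrives.

    primary_runtime: seconds the first worker needs (float('inf') if it hangs).
    backup_runtime: seconds a second worker needs once the task is reassigned.
    timeout: seconds the master waits before reassigning; None disables
             redundant scheduling.
    """
    if timeout is None or primary_runtime <= timeout:
        return primary_runtime
    # The task is reassigned at `timeout`; the earlier of the two results wins.
    return min(primary_runtime, timeout + backup_runtime)


hung = float("inf")
print(completion_time(hung, 300, timeout=None))  # inf
print(completion_time(hung, 300, timeout=600))   # 900
```

In this model a single hung worker stalls the whole job without redundant scheduling, whereas with it the delay is bounded by the timeout plus one normal task run.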
Compound library enrichment

  #  CID      MW  dG_D*  dG_B*  Real Time
  1  1tmt  419.5  -16.2  -11.3   11:14.39
  2  1b3g  389.5  -16.2  -12.6   12:06.56
  3  1r1h  417.4  -15.7  -12.4    8:59.82
  4  1xka  389.5  -15.4  -13.9    9:40.48
  5  1b7h  389.5  -15.1   -8.7   12:07.42
  6  1tkb  421.3  -14.8  -12.1    7:00.34
  7  1oxn  387.4  -14.7  -10.4    8:53.67
  8  1fdq  327.5  -14.5   -9.0    7:04.14
  9  2qwf  341.4  -14.4  -11.5    7:28.40
 10  1byk  420.3  -14.3  -11.6    8:40.98
 11  1h1h  423.2  -14.2  -12.2    6:32.82
 12  1hi3  423.2  -13.8  -11.4    6:50.95
 13  2qwe  332.3  -13.7  -11.1    7:33.21
 14  1vyg  303.5  -13.3   -9.1    6:27.51
 15  1tyr  304.5  -13.2  -10.6    5:16.27
 16  1oe8  306.3  -12.9   -9.0    6:39.36
 17  1db1  416.6  -12.9  -11.1    8:21.08
 18  1b5h  363.5  -12.7  -11.6   12:13.84
 19  1f8e  290.3  -12.5  -10.4    6:33.14
 20  1tom  388.6  -12.5  -10.3    8:36.49
 21  1dud  385.1  -12.5  -10.4    5:55.13
 22  1cnw  389.5  -12.3   -8.0   10:01.82
 23  1nje  305.2  -12.2  -10.7    4:56.48
 24  1nc9  243.3  -12.2  -10.5    3:58.86
 25  1a0q  341.3  -12.2   -8.1    6:01.68
 26  1ur8  422.4  -12.1   -9.3    9:03.56
 27  1ndv  370.5  -12.0  -10.6    6:37.11
 28  1gzc  342.3  -12.0   -9.4    7:28.87
 29  1n5r  414.6  -11.9  -10.1    8:08.20
 30  1duv  289.2  -11.9   -8.9    6:06.62
 31  1njc  305.2  -11.8  -10.4    5:09.68
 32  1f8c  290.3  -11.8   -9.6    6:02.92
 33  1qi0  342.3  -11.7   -9.8    7:32.35
 34  2izl  243.3  -11.7  -10.3    3:59.74
 35  1nja  305.2  -11.6  -10.7    4:43.25
 36  1o2k  325.4  -11.6  -10.4    6:52.01
 37  2qwd  290.3  -11.5   -9.4    6:02.87
 38  3std  365.5  -11.5   -9.6    8:52.70
 39  1o2q  370.9  -11.4  -10.3    7:20.66
 40  1cru  371.2  -11.4   -8.9    5:15.40
 41  1fcy  385.5  -11.4  -10.5    5:46.09
 42  1v79  342.2  -11.4   -9.9    5:42.36
 43  1osv  419.6  -11.4  -10.5    6:12.88
 44  2qwb  308.3  -11.3   -8.9    6:07.39
 45  1bnq  370.5  -11.3   -9.2    5:30.19
 46  1o2n  363.8  -11.2  -10.4    7:02.67
 47  1af6  342.3  -11.1   -8.5    7:36.23
 48  1urg  342.3  -11.0   -8.7    7:23.54
 49  1q54  340.0  -11.0   -9.2    4:20.14
 50  1ejn  342.5  -11.0   -9.7    7:06.76
 51  1swr  243.3  -10.8   -9.3    3:45.87
 52  1n1t  290.3  -10.7   -9.1    5:39.78
 53  1rej  370.4  -10.7   -8.7    7:12.41
 54  1swp  243.3  -10.7   -9.0    3:44.56
 55  1sqo  264.3  -10.6   -9.9    4:38.45
 56  1f8b  290.3  -10.6   -8.7    5:37.73
 57  2csn  286.8  -10.6   -9.0    4:31.03
 58  2qwc  290.3  -10.6   -8.7    5:37.61
 59  1q63  288.3  -10.5   -9.0    4:45.63
 60  2sim  290.3  -10.5   -8.6    8:38.21
 61  1erb  327.5  -10.5   -8.1    5:57.82
 62  1c5c  343.3  -10.4   -8.8    4:15.87
 63  1fh8  250.3  -10.3   -9.1    4:46.51
 64  1pzp  305.3  -10.3   -9.0    5:50.08
 65  1cil  325.4  -10.2   -9.5    4:22.44
 66  1c83  246.2  -10.1   -8.5    3:42.38
 67  1n46  367.4  -10.1   -9.3    6:15.40
 68  1vpo  288.4  -10.0  -10.0    2:46.51
 69  1g52  326.3  -10.0   -8.1    5:26.02
 70  1qaw  204.2   -9.8   -9.5    3:47.66
 71  1gpk  243.3   -9.8   -9.7    3:00.52
 72  1g46  326.3   -9.8   -8.1    5:25.40
 73  1n3i  264.3   -9.7   -8.6    4:35.70
 74  2tmn  208.2   -9.7   -8.3    4:53.60
 75  1k4h  264.4   -9.6   -8.1    4:24.91
 76  1vot  243.3   -9.6   -9.5    3:00.35
 77  1bgq  364.8   -9.5   -9.4    3:30.90
 78  1j01  263.2   -9.4   -8.6    4:48.15
 79  1hp0  266.3   -9.3   -8.7    4:53.90
 80  1u0g  200.1   -9.3   -8.5    3:54.52
 81  1g48  326.3   -9.3   -7.6    5:27.96
 82  1ctu  246.2   -9.2   -8.1    4:43.17
 83  1nw4  266.3   -9.2   -8.5    4:50.00
 84  1v0m  265.3   -9.1   -8.3    5:07.88
 85  1i9n  326.3   -9.1   -7.7    5:13.61
 86  1e6q  203.2   -9.0   -8.3    3:35.04
 87  1dl7  304.2   -9.0   -8.3    4:42.56
 88  1pxp  325.4   -8.9   -7.9    4:03.97
 89  1cbx  206.2   -8.8   -7.6    3:26.71
 90  1wht  206.2   -8.5   -7.9    3:26.54
 91  1g53  326.3   -8.4   -6.8    5:12.84
 92  1n2v  208.2   -8.4   -7.3    3:21.49
 93  1bcu  209.3   -8.4   -8.7    2:47.08
 94  1add  266.3   -8.4   -8.3    4:59.01
 95  1pxo  207.3   -8.3   -7.8    3:18.22
 96  1fj4  210.3   -7.8   -7.5    3:00.09
 97  2pcp  243.4   -7.5   -7.7    3:20.61
 98  1sbr  265.4   -7.4   -8.6    4:25.05
 99  1o3l  388.2   -7.1  -11.7    6:39.41
100  1kv1  306.8   -7.1   -7.1    5:06.49

AutoDock parameters:
translation step = 2.0 Å
quaternion step = 20 degrees
torsion step = 20 degrees
number of energy evaluations = 1.5 x 10^6
max. number of generations = 2.7 x 10^4
number of runs = 10

red = positives (shown in red on the slide)
All positives were docked within RMSD < 1.5 Å
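Library enrichment is commonly summarized by an enrichment factor: the hit rate among the top-ranked compounds divided by the hit rate in the whole library. The sketch below uses that standard definition; the compound IDs and the positions of the positives are made-up placeholders, since the transcript does not record which entries were highlighted as positives.

```python
def enrichment_factor(ranked_ids, positives, top_n):
    """EF = (hits in top_n / top_n) / (total positives / library size)."""
    positives = set(positives)
    if not positives or top_n <= 0:
        raise ValueError("need at least one positive and top_n > 0")
    top_n = min(top_n, len(ranked_ids))
    hits = sum(1 for cid in ranked_ids[:top_n] if cid in positives)
    return (hits / top_n) / (len(positives) / len(ranked_ids))


# Placeholder library of 100 ranked compounds with 7 positives, 4 of them in the top 20.
ranked = [f"c{i}" for i in range(100)]
known_positives = ["c0", "c3", "c8", "c15", "c40", "c60", "c90"]
ef_top20 = enrichment_factor(ranked, known_positives, top_n=20)
```

An enrichment factor above 1 means the ranking concentrates positives near the top better than random selection would; here it is (4/20) / (7/100), roughly 2.9.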
Probe effects due to minor changes in target’s binding sites
Summary
• Modeling of compound-protein complexes can be sped up by distributing molecular docking processes on the Grid.
• With the DIANE framework, distributing molecular docking tasks on the Grid is easy to implement, with an intuitive interface for the end user.
• The DIANE framework also provides functionality for tuning the system to tackle the issues that arise when distributing molecular docking tasks on the loosely coupled Grid.
• This simple test case demonstrated that huge compound databases can be effectively enriched by executing docking tasks on the Grid. However, more resources are required to build a real HTP docking service for the life science community.
Acknowledgements
Li-Yung Ho
Hurng-Chun Lee
Hsing-Yen Chen
Dr. Simon Lin
Jakub Moscicki
Dr. Massimo Lamanna
Support from
Genomics Research Center, Academia Sinica
National Science Council, Taiwan
is highly appreciated
LCG-ARDA, CERN
Interacting Complexes
A key step to structure-based inhibitor design
PDB: 1F8B