INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org Using Grid Computing to Accelerate Structure-Based Design Against Influenza A Neuraminidases Hurng-Chun Lee, Li-Yung Ho, and Ying-Ta Wu * [email protected] *Genomics Research Center Academia Sinica, Taiwan EGEE User Forum CERN, 01-03.03.2006

Page 1: INFSO-RI-508833 Enabling Grids for E-sciencE  Using Grid Computing to Accelerate Structure-Based Design Against Influenza A Neuraminidases

INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

Using Grid Computing to Accelerate Structure-Based Design Against Influenza A Neuraminidases

Hurng-Chun Lee, Li-Yung Ho, and Ying-Ta Wu* [email protected]

*Genomics Research Center

Academia Sinica, Taiwan

EGEE User Forum

CERN, 01-03.03.2006

Page 2

Outline

Influenza A Pandemic

[Figure: influenza A subtypes H1N1, H2N2, H3N2, H9N2, H7N7 and H5N1, with labels NA and HA and year marks 2005 and 2006]

92 deaths / 170 cases (Feb 26, 2006)

Source: http://www.who.int/csr/disease/avian_influenza

Page 3

Neuraminidases cleave host receptors and help release new virions

Page 4

Neuraminidase and Inhibitors

Zanamivir: R = guanidine

Oseltamivir: R = H, R’ = amine

Structure-Based Drug Design

binding pocket

Page 5

Drug-resistant variants and Point Mutation

Mutation N1 N2

R292K oseltamivir Zanamivir

oseltamivir Zanamivir

H274Y(F) oseltamivir oseltamivir

N294S oseltamivir? oseltamivir

E119V oseltamivir? oseltamivir

E119(G;A;D) oseltamivir? Zanamivir

Legend: predicted mutation site by structure overlay and sequence alignment; reported mutation site

Page 6

Docking Engine: AutoDock 3.0.5

1. Prepare the Target Protein

-- add polar hydrogen atoms

-- assign charges to atoms

-- decide range of binding site

2. Run AutoGrid

3. Prepare the Ligand

-- assign charges to atoms

-- decide flexible bonds (run AutoTors)

4. Run AutoDock

5. Evaluate Results and Rank Score

[Figure labels: AutoGrid, AutoTors, AutoDock]

Garrett M. Morris, David S. Goodsell, Ruth Huey, William E. Hart, Scott Halliday, Rik Belew, Arthur J. Olson

Morris et al. (1998), J. Computational Chemistry, 19: 1639-1662.

Page 7

Application Characteristic

• Virtual screening based on molecular docking is the most time-consuming part of the structure-based drug design workflow

• Number of docking tasks = N x M

– N: number of ligands

– M: number of target structures

• CPU-bound application, huge amount of output, no communication between tasks

• Task complexity is unpredictable

– difficult to apply a trivial domain decomposition method when splitting the tasks

The pitiful …
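The N x M task count above can be sketched as a simple enumeration. This is only an illustration: the identifiers and the values N = 100 and M = 5 are assumptions chosen for the example, and the helper `enumerate_docking_tasks` is hypothetical, not part of AutoDock or DIANE.

```python
from itertools import product


def enumerate_docking_tasks(ligands, targets):
    """Return one (ligand, target) pair per docking task: N x M in total."""
    return list(product(ligands, targets))


# Hypothetical identifiers, used only to count tasks.
ligands = ["ligand_%03d" % i for i in range(100)]    # N = 100
targets = ["conformation_%d" % j for j in range(5)]  # M = 5

tasks = enumerate_docking_tasks(ligands, targets)
print(len(tasks))  # 500
```

Because every pair is independent of the others, each tuple can be handed to a different worker without any communication between tasks.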

Page 8

Issues of the Grid applications

• Due to the loosely coupled nature of the Grid, distributing application jobs on it is not trivial

– extra work is needed for efficient job handling and result gathering

– effort is also needed to handle transient network or site problems

– complexities should be hidden, and the interface to the end user should be application oriented

• The significant Grid system overhead means the Grid benefits only jobs with long computing times

– not suitable for pilot jobs used for decision making
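The overhead argument can be made concrete with a small calculation. The sketch below assumes a fixed per-job overhead; the 300-second figure and the function name `grid_efficiency` are hypothetical and are not measurements from this work.

```python
def grid_efficiency(compute_seconds, overhead_seconds):
    """Fraction of wall-clock time spent on useful computation."""
    if compute_seconds < 0 or overhead_seconds < 0:
        raise ValueError("times must be non-negative")
    total = compute_seconds + overhead_seconds
    return compute_seconds / total if total > 0 else 0.0


# Hypothetical numbers: a fixed 300 s of Grid overhead per job.
for compute in (60, 600, 6000):
    print(compute, round(grid_efficiency(compute, 300), 2))
```

With a fixed overhead, the efficiency approaches 1 only when the computing time per job is much longer than the overhead, which is why short pilot jobs gain little.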

Page 9

What is DIANE?

• A lightweight framework for parallel scientific applications in the master-worker model

– ideal for applications without communication between parallel tasks (e.g. most Bioinformatics applications that analyze huge numbers of independent datasets)

• The framework takes care of all synchronization, communication and workflow management details on behalf of the application

• DIANE = Distributed Analysis Environment
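The master-worker model can be illustrated with a minimal sketch in which workers pull tasks from a shared queue, so faster workers naturally take more tasks. This is not DIANE's implementation or API: `run_master_worker` and `fake_docking` are hypothetical names, and local threads stand in for remote worker agents.

```python
import queue
import threading


def run_master_worker(tasks, n_workers, work):
    """Workers pull tasks from a shared queue until it is empty."""
    pending = queue.Queue()
    for task in tasks:
        pending.put(task)
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return  # no work left for this worker
            outcome = work(task)
            with lock:
                results[task] = outcome

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


# Hypothetical stand-in for a docking run; the result varies per task.
def fake_docking(task):
    ligand, target = task
    return len(ligand) * (target + 1)


tasks = [("lig%d" % i, j) for i in range(20) for j in range(5)]
results = run_master_worker(tasks, n_workers=4, work=fake_docking)
```

Pulling work on demand, rather than splitting the task list up front, is one way to cope with task run times that cannot be predicted in advance.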

Page 10

Distributing AutoDock tasks on the Grid using DIANE

Page 11

DIANE/AutoDock

A generic framework into which applications can easily plug in

autodock.job (application-specific job attributes and job-level failure recovery definition):

# -*- python -*-

Application = 'Autodock'

JobInitData = {
    'macro_repos': '/home/hclee/diane_demo/autodock/macro',
    'ligand_repos': '/home/hclee/diane_demo/autodock/ligand',
    'ftprotocol': 'gass',
    'output_prefix': 'autodock_test',
}

## The input files will be staged in to workers
InputFiles = []

## The definition of failure recovery
def failRecovery(self):
    print('*' * 30)
    # Report each failed task and mark it as ignored
    for t in self.master.tasks.failed():
        print("ignoring failed task: %s" % t)
        t.ignore()
    print('*' * 30)
    return 1

% diane.startjob –-job autodock.job –ganga –w 32@lcg,32@pbs

• Intuitive job execution command

• Possible to mix heterogeneous computing backends
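The worker specification in the command above, `32@lcg,32@pbs`, reads as a list of COUNT@BACKEND pairs. The sketch below shows one way such a string could be interpreted; `parse_worker_spec` is a hypothetical helper written for illustration and is not DIANE's own parser.

```python
def parse_worker_spec(spec):
    """Parse a string such as '32@lcg,32@pbs' into {backend: worker_count}."""
    counts = {}
    for item in spec.split(","):
        number, sep, backend = item.strip().partition("@")
        if not sep or not backend or not number.isdigit():
            raise ValueError("expected COUNT@BACKEND, got %r" % item)
        counts[backend] = counts.get(backend, 0) + int(number)
    return counts


print(parse_worker_spec("32@lcg,32@pbs"))
```

Keeping the backend as a plain label is what allows a single job to request workers from different computing backends at once.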

Page 12

DIANE/AutoDock – integrated user interface

Page 13

Performance Evaluation

• Test case

– 5 target structures: 1 protein, 5 conformations

– ligand: 100 small compounds (with 7 positives)

– 500 docking tasks in total

• Test environment

– DIANE backend handler: SSH

– Hardware spec: traditional PC cluster with NFS (2 x Intel Xeon 2.8 GHz + 2 GB memory per node)

– Grid: LCG

Page 14

Test Results: DIANE/AutoDock framework on Cluster

[Figure: duration time in sec (axis 0 to 18000) versus number of workers (axis 10 to 70)]

Duration time: total elapsed time of a DIANE job

Each DIANE job contains 500 tasks (5 protein conformations x 100 compounds)

Page 15

Test Results: DIANE/AutoDock framework on Cluster

Handling docking jobs on traditional PC cluster

[Figure labels: good load balance; a DIANE/AutoDock task]

Page 16

DIANE/AutoDock framework on LCG-GRID

terminated

Page 17

Without redundant scheduling

Page 18

With redundant scheduling

job was reassigned to other nodes
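The effect of redundant scheduling can be illustrated with a small discrete-time simulation. This is a sketch under stated assumptions, not DIANE's scheduler: the function `simulate`, the task costs and the worker speeds are all hypothetical, and a speed of 0 models a worker that hangs.

```python
def simulate(task_costs, worker_speeds, redundant, max_ticks=1000):
    """Return the tick at which all tasks are done, or None if they never are.

    With redundant=True, an idle worker that finds nothing left to pull
    starts a duplicate of a task that is still running on another worker.
    """
    unassigned = list(range(len(task_costs)))
    done = set()
    running = {}  # worker index -> [task index, remaining work]
    for tick in range(1, max_ticks + 1):
        for w, speed in enumerate(worker_speeds):
            if w not in running:
                if unassigned:
                    task = unassigned.pop(0)
                    running[w] = [task, task_costs[task]]
                elif redundant:
                    # Nothing left to pull: duplicate an unfinished running task.
                    busy = sorted({job[0] for job in running.values()} - done)
                    if busy:
                        running[w] = [busy[0], task_costs[busy[0]]]
            if w in running:
                job = running[w]
                job[1] -= speed
                if job[1] <= 0:
                    done.add(job[0])
                    del running[w]
        # Release workers whose task was completed by another copy.
        running = {w: job for w, job in running.items() if job[0] not in done}
        if len(done) == len(task_costs):
            return tick
    return None


costs = [4] * 6        # six equal tasks (hypothetical)
speeds = [1, 1, 0]     # the third worker hangs
print(simulate(costs, speeds, redundant=False))  # None: one task never finishes
print(simulate(costs, speeds, redundant=True))   # 12
```

Without redundancy, the task held by the hung worker blocks the whole job; with it, an idle worker picks up a copy and the job completes.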

Page 19

Compound library enrichment

  #  CID   MW     dG_D*  dG_B*  Real Time
  1  1tmt  419.5  -16.2  -11.3  11:14.39
  2  1b3g  389.5  -16.2  -12.6  12:06.56
  3  1r1h  417.4  -15.7  -12.4   8:59.82
  4  1xka  389.5  -15.4  -13.9   9:40.48
  5  1b7h  389.5  -15.1   -8.7  12:07.42
  6  1tkb  421.3  -14.8  -12.1   7:00.34
  7  1oxn  387.4  -14.7  -10.4   8:53.67
  8  1fdq  327.5  -14.5   -9.0   7:04.14
  9  2qwf  341.4  -14.4  -11.5   7:28.40
 10  1byk  420.3  -14.3  -11.6   8:40.98
 11  1h1h  423.2  -14.2  -12.2   6:32.82
 12  1hi3  423.2  -13.8  -11.4   6:50.95
 13  2qwe  332.3  -13.7  -11.1   7:33.21
 14  1vyg  303.5  -13.3   -9.1   6:27.51
 15  1tyr  304.5  -13.2  -10.6   5:16.27
 16  1oe8  306.3  -12.9   -9.0   6:39.36
 17  1db1  416.6  -12.9  -11.1   8:21.08
 18  1b5h  363.5  -12.7  -11.6  12:13.84
 19  1f8e  290.3  -12.5  -10.4   6:33.14
 20  1tom  388.6  -12.5  -10.3   8:36.49
 21  1dud  385.1  -12.5  -10.4   5:55.13
 22  1cnw  389.5  -12.3   -8.0  10:01.82
 23  1nje  305.2  -12.2  -10.7   4:56.48
 24  1nc9  243.3  -12.2  -10.5   3:58.86
 25  1a0q  341.3  -12.2   -8.1   6:01.68
 26  1ur8  422.4  -12.1   -9.3   9:03.56
 27  1ndv  370.5  -12.0  -10.6   6:37.11
 28  1gzc  342.3  -12.0   -9.4   7:28.87
 29  1n5r  414.6  -11.9  -10.1   8:08.20
 30  1duv  289.2  -11.9   -8.9   6:06.62
 31  1njc  305.2  -11.8  -10.4   5:09.68
 32  1f8c  290.3  -11.8   -9.6   6:02.92
 33  1qi0  342.3  -11.7   -9.8   7:32.35
 34  2izl  243.3  -11.7  -10.3   3:59.74
 35  1nja  305.2  -11.6  -10.7   4:43.25
 36  1o2k  325.4  -11.6  -10.4   6:52.01
 37  2qwd  290.3  -11.5   -9.4   6:02.87
 38  3std  365.5  -11.5   -9.6   8:52.70
 39  1o2q  370.9  -11.4  -10.3   7:20.66
 40  1cru  371.2  -11.4   -8.9   5:15.40
 41  1fcy  385.5  -11.4  -10.5   5:46.09
 42  1v79  342.2  -11.4   -9.9   5:42.36
 43  1osv  419.6  -11.4  -10.5   6:12.88
 44  2qwb  308.3  -11.3   -8.9   6:07.39
 45  1bnq  370.5  -11.3   -9.2   5:30.19
 46  1o2n  363.8  -11.2  -10.4   7:02.67
 47  1af6  342.3  -11.1   -8.5   7:36.23
 48  1urg  342.3  -11.0   -8.7   7:23.54
 49  1q54  340.0  -11.0   -9.2   4:20.14
 50  1ejn  342.5  -11.0   -9.7   7:06.76
 51  1swr  243.3  -10.8   -9.3   3:45.87
 52  1n1t  290.3  -10.7   -9.1   5:39.78
 53  1rej  370.4  -10.7   -8.7   7:12.41
 54  1swp  243.3  -10.7   -9.0   3:44.56
 55  1sqo  264.3  -10.6   -9.9   4:38.45
 56  1f8b  290.3  -10.6   -8.7   5:37.73
 57  2csn  286.8  -10.6   -9.0   4:31.03
 58  2qwc  290.3  -10.6   -8.7   5:37.61
 59  1q63  288.3  -10.5   -9.0   4:45.63
 60  2sim  290.3  -10.5   -8.6   8:38.21
 61  1erb  327.5  -10.5   -8.1   5:57.82
 62  1c5c  343.3  -10.4   -8.8   4:15.87
 63  1fh8  250.3  -10.3   -9.1   4:46.51
 64  1pzp  305.3  -10.3   -9.0   5:50.08
 65  1cil  325.4  -10.2   -9.5   4:22.44
 66  1c83  246.2  -10.1   -8.5   3:42.38
 67  1n46  367.4  -10.1   -9.3   6:15.40
 68  1vpo  288.4  -10.0  -10.0   2:46.51
 69  1g52  326.3  -10.0   -8.1   5:26.02
 70  1qaw  204.2   -9.8   -9.5   3:47.66
 71  1gpk  243.3   -9.8   -9.7   3:00.52
 72  1g46  326.3   -9.8   -8.1   5:25.40
 73  1n3i  264.3   -9.7   -8.6   4:35.70
 74  2tmn  208.2   -9.7   -8.3   4:53.60
 75  1k4h  264.4   -9.6   -8.1   4:24.91
 76  1vot  243.3   -9.6   -9.5   3:00.35
 77  1bgq  364.8   -9.5   -9.4   3:30.90
 78  1j01  263.2   -9.4   -8.6   4:48.15
 79  1hp0  266.3   -9.3   -8.7   4:53.90
 80  1u0g  200.1   -9.3   -8.5   3:54.52
 81  1g48  326.3   -9.3   -7.6   5:27.96
 82  1ctu  246.2   -9.2   -8.1   4:43.17
 83  1nw4  266.3   -9.2   -8.5   4:50.00
 84  1v0m  265.3   -9.1   -8.3   5:07.88
 85  1i9n  326.3   -9.1   -7.7   5:13.61
 86  1e6q  203.2   -9.0   -8.3   3:35.04
 87  1dl7  304.2   -9.0   -8.3   4:42.56
 88  1pxp  325.4   -8.9   -7.9   4:03.97
 89  1cbx  206.2   -8.8   -7.6   3:26.71
 90  1wht  206.2   -8.5   -7.9   3:26.54
 91  1g53  326.3   -8.4   -6.8   5:12.84
 92  1n2v  208.2   -8.4   -7.3   3:21.49
 93  1bcu  209.3   -8.4   -8.7   2:47.08
 94  1add  266.3   -8.4   -8.3   4:59.01
 95  1pxo  207.3   -8.3   -7.8   3:18.22
 96  1fj4  210.3   -7.8   -7.5   3:00.09
 97  2pcp  243.4   -7.5   -7.7   3:20.61
 98  1sbr  265.4   -7.4   -8.6   4:25.05
 99  1o3l  388.2   -7.1  -11.7   6:39.41
100  1kv1  306.8   -7.1   -7.1   5:06.49

AutoDock parameters:

translation step = 2.0 Å

quaternion step = 20 degrees

torsion step = 20 degrees

number of energy evaluations = 1.5 x 10^6

max. number of generations = 2.7 x 10^4

run number = 10

red = positives

All positives were docked within RMSD < 1.5 Å
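Library enrichment is commonly summarized with an enrichment factor, which compares the hit rate among the top-ranked compounds with the hit rate in the whole library. The sketch below uses that standard definition. The compound identifiers and the ranks of the seven positives are hypothetical and are not taken from the table; `enrichment_factor` is an illustrative helper, not part of AutoDock.

```python
def enrichment_factor(ranked_ids, positives, top_fraction):
    """EF = (hit rate in the top fraction) / (hit rate in the whole library)."""
    n_total = len(ranked_ids)
    n_top = max(1, int(round(n_total * top_fraction)))
    hits_top = sum(1 for cid in ranked_ids[:n_top] if cid in positives)
    hits_all = sum(1 for cid in ranked_ids if cid in positives)
    if hits_all == 0:
        raise ValueError("no positives in the library")
    return (hits_top / n_top) / (hits_all / n_total)


# Hypothetical ranking: 100 compounds, 7 positives, 5 of them in the top 10.
ranked = ["c%03d" % i for i in range(100)]
positives = {"c000", "c002", "c003", "c006", "c009", "c040", "c075"}
ef = enrichment_factor(ranked, positives, top_fraction=0.10)
print(round(ef, 2))
```

An enrichment factor above 1 means the docking score ranks known positives ahead of random selection; a value of 1 means no enrichment.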

Page 20

Probe effects due to minor changes in the target’s binding sites

Page 21

Summary

• Modeling compound-protein complexes can be sped up by distributing molecular docking processes on the Grid.

• With the DIANE framework, distributing molecular docking tasks on the Grid can be easily implemented, with an intuitive interface for the end user.

• The DIANE framework also provides functionality for tuning the system to tackle the issues that arise when distributing molecular docking tasks on the loosely coupled Grid.

• This simple test case demonstrated that huge compound databases can be effectively enriched by executing docking tasks on the Grid. However, more resources are required to build a real HTP docking service for the life science community.

Page 22

Acknowledgements

Li-Yung Ho

Hurng-Chun Lee

Hsing-Yen Chen

Dr. Simon Lin

Jakub Moscicki

Dr. Massimo Lamanna

Support from the Genomics Research Center, Academia Sinica, and the National Science Council, Taiwan, is highly appreciated

LCG-ARDA, CERN

Page 23

Interacting Complexes

A key step to structure-based inhibitor design

PDB 1F8B