INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
In silico docking on EGEE infrastructure, the case of WISDOM
Nicolas Jacq
LPC of Clermont-Ferrand, CNRS/IN2P3
EGEE User Forum
CERN, 01-03.03.2006
Challenges of in silico drug discovery against neglected diseases
• New drugs are needed for the diseases of the developing world
– HIV/AIDS, malaria and tuberculosis account for 5.6 million deaths
– New drugs are permanently needed to fight emerging drug resistance (malaria)
– The pharmacopeia against trypanosomiasis, leishmaniasis, Chagas disease... has been unchanged for decades
• The WHO Tropical Disease Research program is preparing a list of recommended targets for drug discovery
• Millions of chemical compounds are available in laboratories and in 2D and 3D electronic databases
• Goal: set up a worldwide initiative to address in silico drug discovery against neglected diseases on grid infrastructures
Drug discovery workflow
[Diagram: drug discovery workflow on the grid]
• Teams: biology teams, bioinformatics teams, chemoinformatics teams, chemist/biologist teams
• Services: docking services, MD services, annotation services
• Flow labels: target, hits, selected hits, with three check points
• Layers: grid service customers, grid service providers, grid infrastructure
• Data access for expert teams around the world
Grid added value for a large scale in silico experimentation
• Key issues to promote the grid in the pharmaceutical community
– Cost and time reduction in drug discovery development
– Security and data protection
– Fault-tolerant and robust services and infrastructure
– Transparent and easy-to-use interfaces
• Grid added value of EGEE for WISDOM
– Large computing and storage resources
– Job Management Service
– Information and Monitoring Services
– Data Management Services
– Security (to be improved)
– Reliability of services (to be improved)
First biomedical data challenge: World-wide In Silico Docking On Malaria (WISDOM)
• Significant biological parameters
– Two different molecular docking applications (AutoDock and FlexX)
– About one million virtual ligands selected (ZINC)
– Target proteins from the parasite responsible for malaria
• Significant numbers
– Total of about 46 million ligands docked in 6 weeks
– 1 TB of data produced
– Up to 1700 computers in 15 countries used simultaneously, corresponding to about 80 CPU years
– Average crunching factor ~600
• Significant results
– Best hits to be reranked using molecular dynamics simulations
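The figures above can be cross-checked with a few lines of arithmetic. This is a minimal sketch, assuming that "crunching factor" means total CPU time divided by elapsed wall-clock time (the slide does not define it); the function names are illustrative only. With the rounded inputs (80 CPU years, 6 weeks) the estimate comes out near 700, the same order as the ~600 quoted; the gap presumably reflects rounding of the slide's inputs.

```python
DAYS_PER_YEAR = 365.25


def crunching_factor(cpu_years: float, elapsed_days: float) -> float:
    """Assumed definition: total CPU time divided by wall-clock time."""
    return cpu_years * DAYS_PER_YEAR / elapsed_days


def dockings_per_day(total_dockings: float, elapsed_days: float) -> float:
    """Average number of dockings completed per calendar day."""
    return total_dockings / elapsed_days


# Rounded figures taken from the slide: 80 CPU years, 46 million dockings, 6 weeks.
elapsed_days = 6 * 7
factor = crunching_factor(80, elapsed_days)
rate = dockings_per_day(46e6, elapsed_days)

print(f"crunching factor ~ {factor:.0f}")
print(f"dockings per day ~ {rate:,.0f}")
```

Read this way, the grid delivered on average the equivalent of several hundred CPUs working continuously for the whole six weeks.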
WISDOM deployment: wisdom.eu-egee.fr
[Chart: total amount of CPU provided by EGEE federation]
• UKI: 29%
• France: 18%
• Italy: 16%
• South Western Europe: 12%
• South Eastern Europe: 10%
• Northern Europe: 7%
• Central Europe: 4%
• Asia Pacific: 2%
• Germany/Switzerland: 1%
• Russia: 1%
Countries with nodes contributing to the WISDOM data challenge
Country      Sites   Country       Sites   Country   Sites
Bulgaria     3       Greece        3       Romania   1
Croatia      1       Israel        1       Russia    2
Cyprus       1       Italy         13      Spain     7
France       9       Netherlands   2       Taiwan    1
Germany      1       Poland        1       UK        10
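As a quick consistency check on the deployment figures, the sketch below tallies the per-country site counts and the per-federation CPU shares exactly as listed on this slide. The variable names are illustrative; only the numbers come from the slide.

```python
# Sites per country, as listed in the table on this slide.
sites_per_country = {
    "Bulgaria": 3, "Croatia": 1, "Cyprus": 1, "France": 9, "Germany": 1,
    "Greece": 3, "Israel": 1, "Italy": 13, "Netherlands": 2, "Poland": 1,
    "Romania": 1, "Russia": 2, "Spain": 7, "Taiwan": 1, "UK": 10,
}

# CPU share per EGEE federation, in percent, as listed in the chart.
cpu_share_percent = {
    "UKI": 29, "France": 18, "Italy": 16, "South Western Europe": 12,
    "South Eastern Europe": 10, "Northern Europe": 7, "Central Europe": 4,
    "Asia Pacific": 2, "Germany/Switzerland": 1, "Russia": 1,
}

n_countries = len(sites_per_country)
n_sites = sum(sites_per_country.values())
total_share = sum(cpu_share_percent.values())

print(f"{n_countries} countries, {n_sites} sites, CPU shares sum to {total_share}%")
```

The country count agrees with the 15 countries quoted for the data challenge, and the federation shares add up to 100%.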
Design of the WISDOM production system
[Diagram: WISDOM production system]
• Layers: biomedical VO, LCG components, EGEE resources, application components
• Named components: wisdom_install, wisdom_test, wisdom_execution, wisdom_collect, wisdom_site, wisdom_db
• Roles: user, installer, tester, supervisor
• Functions: workload definition, job submission, job monitoring, job bookkeeping, fault tracking, fault fixing, job resubmission
• Other elements: instance, accounting data, license server
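The submission, bookkeeping and resubmission functions named in the diagram can be illustrated with a small simulation. This is a sketch only, not the WISDOM scripts: `Job`, `run_campaign`, `flaky_executor` and the retry limit are hypothetical, and the grid is replaced by a plain Python callable that reports success or failure.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Job:
    job_id: int
    attempts: int = 0
    status: str = "pending"  # becomes "done" or "failed"


def run_campaign(
    jobs: List[Job],
    execute: Callable[[Job], bool],
    max_attempts: int = 3,
) -> List[Tuple[int, int, str]]:
    """Submit every job, record each attempt, resubmit failures up to max_attempts."""
    bookkeeping: List[Tuple[int, int, str]] = []
    queue = list(jobs)
    while queue:
        job = queue.pop(0)
        job.attempts += 1
        succeeded = execute(job)  # stands in for submission plus monitoring
        bookkeeping.append((job.job_id, job.attempts, "done" if succeeded else "failed"))
        if succeeded:
            job.status = "done"
        elif job.attempts < max_attempts:
            queue.append(job)  # resubmission
        else:
            job.status = "failed"  # left for manual fault fixing
    return bookkeeping


def flaky_executor(job: Job) -> bool:
    """Hypothetical resource: even-numbered jobs fail on their first attempt."""
    return job.attempts > 1 or job.job_id % 2 == 1


jobs = [Job(job_id=i) for i in range(4)]
log = run_campaign(jobs, flaky_executor)
print(log)
print([job.status for job in jobs])
```

Keeping one bookkeeping record per attempt, rather than per job, is what makes fault tracking possible afterwards.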
Preliminary results of the first data challenge
• Controlled conditions
– The score of an output is independent of the grid resource where the job runs
• 10% of the ChemBridge compounds (ZINC) may be hits
– Top-scoring compounds possess basic chemical groups such as thiourea, guanidino and amino-acrolein as core structure
– Identified compounds are non-peptidic, low molecular weight compounds
– The identified compounds look like thrombin inhibitors
[Figures: compounds WISDOM-375228 and WISDOM-113696]
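The "controlled conditions" point above, that a docking score should not depend on the grid resource that produced it, could be verified along the following lines. This is a minimal sketch under assumptions: the record layout (ligand, site, score), the tolerance and the sample values are all hypothetical and are not WISDOM data.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def inconsistent_ligands(
    records: Iterable[Tuple[str, str, float]],
    tolerance: float = 1e-6,
) -> List[str]:
    """Return ligands whose score differs between sites by more than tolerance."""
    scores_by_ligand: Dict[str, List[float]] = defaultdict(list)
    for ligand_id, _site, score in records:
        scores_by_ligand[ligand_id].append(score)
    return sorted(
        ligand_id
        for ligand_id, scores in scores_by_ligand.items()
        if max(scores) - min(scores) > tolerance
    )


# Hypothetical control runs: the same ligand docked on two different sites.
control_runs = [
    ("ligand-A", "site-1", -12.30),
    ("ligand-A", "site-2", -12.30),
    ("ligand-B", "site-1", -9.75),
    ("ligand-B", "site-2", -9.10),  # deliberately inconsistent
]

print(inconsistent_ligands(control_runs))
```

An empty result on a set of control ligands would support the claim that scores are reproducible across resources.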
Timescale
• Very short term (spring 2006): reranking of WISDOM hits by molecular dynamics simulations
– Approximately 100 CPU years needed
– Supported by the EGEE-II and BioinfoGrid European projects
– Need for resources on supercomputers (contact with DEISA)
• Short term (fall 2006): WISDOM2, second large-scale grid docking
– Several new targets foreseen on malaria, dengue and other neglected diseases
– Resources needed: up to 80 CPU years per target
– Supported by the EGEE-II and EELA European projects and the Swiss BioGrid initiative
• Mid term (summer 2007): reranking of WISDOM2 hits by MD simulations
Credits
• LPC (CNRS/IN2P3): V. Breton, N. Jacq, J. Salzemann, Y. Legré, M. Reichstadt, F. Jacq
• EGEE: Biomed Task Force, EIS team, JRA2 team
• Fraunhofer SCAI: M. Hofmann, M. Zimmermann, A. Maaß, M. Sridhar, K. Vinod-Kusam, H. Schwichtenberg