Milanesi Luciano
EGEE User Forum, Clermont-Ferrand, France, 11-14 February 2008
BioinfoGRID Project
Milanesi Luciano, National Research Council, Institute of Biomedical Technologies, Milan, Italy


Page 2

Milanesi Luciano, BioinfoGRID Symposium, Milan, 10-13 December 2007

Networks of resources

• New biological and biomedical technology platforms, combined with HPC and GRID technology, will be particularly useful for dealing with the increasing amount, complexity, and heterogeneity of biological and biomedical data.

• Bioinformatics applications for eHealth have become an ideal research area in which computer scientists can apply and further develop new intelligent computation methods, in both experimental and theoretical settings.

Page 3

BioinfoGRID Project

BioinfoGRID Project web site: www.bioinfogrid.eu

Page 4

Consortium

Page 5

BioinfoGRID Objectives

• Objective of the BioinfoGRID project

Page 6

Interaction with related projects

At present, the BioinfoGRID project has established cooperation with the following project initiatives:
• EGEE
• BELIEF
• EMBRACE
• EUCHINAGRID
• EUMEDGRID
• EELA
• DILIGENT
• ICEAGE
• LITBIO
• LIBI
• HEALTHGRID
• WISDOM

Page 7

BioinfoGRID Work Packages

Work package No. – Work package title
WP1 – Genomics Applications in GRID
WP2 – Proteomics Applications in GRID
WP3 – Transcriptomics Applications in GRID
WP4 – Database and Functional Genomics Applications
WP5 – Molecular Dynamics Applications
WP6 – Coordination of technical aspects and relation with Grid infrastructure projects, user training, application support and resources integration
WP7 – Dissemination and Outreach
WP8 – Project Management Office

Page 8

WP1 – Genomics Applications

HUSAR Program Package (diagram; components shown):
• GCG
• EMBOSS
• Third-party programs
• In-house developments: own programs, automated tasks
• DATABASES: SRS (Sequence Retrieval System), >300, prompt updates (daily, weekly)
Program counts shown on the slide: ~130 programs, ~150 programs

Page 9

WP1 – Genomics Applications

• Integrating W3H, SoapLab and the GRID

(Diagram: preliminary setup vs. target setup. Components shown: W2H HTML pages and W3H analysis tasks @dkfz-heidelberg.de on Solaris (OS); SoapLab Web Service interface; Grid API; Grid client toolkit on ScLinux (OS); Grid CE; the commands "% formatdb …" and "% blastall …", and, via ssh from "@dkfz or anywhere else", the commands "% submit_formatdb …" and "% submit_blastall …"; note: "anymore software??")
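The diagram shows wrapper commands (submit_formatdb …, submit_blastall …) that move local formatdb/blastall runs onto a Grid CE, but their implementation is not given. As a minimal sketch, assuming a gLite-style workload management system that accepts a JDL job description, a wrapper could build something like the following. The function names, the executable path, and the file names are hypothetical; Executable, Arguments, StdOutput, StdError, InputSandbox and OutputSandbox are standard JDL attributes, and -p/-d/-i/-o are the legacy NCBI blastall flags.

```python
PROTEIN_DB_PROGRAMS = {"blastp", "blastx"}  # programs that search a protein database


def jdl_list(items):
    """Render strings as a JDL list literal, e.g. {"a", "b"}."""
    return "{" + ", ".join(f'"{item}"' for item in items) + "}"


def make_blastall_jdl(program, database, query, output):
    """Build a JDL job description for one blastall run (hypothetical wrapper).

    Assumes the files written by formatdb are shipped in the input sandbox
    together with the query; a production setup would more likely fetch
    large databases from a Grid storage element instead.
    """
    kind = "p" if program in PROTEIN_DB_PROGRAMS else "n"
    db_files = [f"{database}.{kind}{suffix}" for suffix in ("in", "hr", "sq")]
    args = f"-p {program} -d {database} -i {query} -o {output}"
    attributes = [
        'Executable = "/usr/bin/blastall";',  # assumed install path on the worker node
        f'Arguments = "{args}";',
        'StdOutput = "blast.out";',
        'StdError = "blast.err";',
        f"InputSandbox = {jdl_list([query] + db_files)};",
        f"OutputSandbox = {jdl_list([output, 'blast.out', 'blast.err'])};",
    ]
    return "[\n" + "\n".join("  " + a for a in attributes) + "\n]"


if __name__ == "__main__":
    print(make_blastall_jdl("blastn", "est_human", "query.fasta", "hits.txt"))
```

The printed description could then be saved to a file and submitted, for example with gLite's glite-wms-job-submit -a job.jdl; a real wrapper would also poll the job status and retrieve the output sandbox.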

Page 10

WP2 – Proteomics Applications

• Perform functional protein analysis in GRID, using functional protein domain annotations on large protein families together with the related databases.

• All 518 human protein kinases and 5129 proteins from the non-redundant chain set of the Protein Data Bank were analyzed with InterProScan.
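Annotating thousands of sequences is embarrassingly parallel, because each protein can be scanned independently. The slide does not say how the input was split into Grid jobs; a minimal sketch, assuming FASTA input and a hypothetical fixed number of sequences per job, could partition it like this:

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) tuples."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def split_into_jobs(records, per_job):
    """Group records into batches of at most per_job sequences (one batch per Grid job)."""
    if per_job < 1:
        raise ValueError("per_job must be at least 1")
    return [records[i:i + per_job] for i in range(0, len(records), per_job)]


def to_fasta(batch, width=60):
    """Serialize one batch back to FASTA, wrapping sequences at `width` columns."""
    out = []
    for header, seq in batch:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"
```

For example, 5129 sequences at 100 per job give 52 jobs, the last one holding 29 sequences. Larger batches mean fewer submissions but longer jobs that lose more work when they fail.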

Page 11

WP2 – Proteomics Applications

• Protein surface calculation in GRID: the Grid was used to compute a volumetric description of the proteins, yielding a precise representation of the corresponding surfaces. Protein interactions could then be screened quickly by means of surface analysis.
– The ProSite domains were analyzed all-against-all
– ATP-E against its inhibitor
– Collagen against integrin
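The slide gives no details of the volumetric method. As an illustration only, one simple way to obtain a volumetric description and its surface is to voxelize the protein: treat each atom as a sphere, mark the grid cells whose centers fall inside any sphere, and call an occupied cell a surface cell when at least one of its six face neighbors is empty. The atom coordinates, radii, and grid spacing are hypothetical inputs; this is a sketch, not the project's algorithm.

```python
import math
from itertools import product

# Six face-adjacent offsets in a cubic grid.
NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def voxelize(atoms, spacing):
    """Return the set of occupied voxel indices (i, j, k).

    atoms: iterable of (x, y, z, radius). Voxel (i, j, k) has its center at
    ((i + 0.5) * spacing, ...) and is occupied when that center lies inside
    at least one atom sphere.
    """
    occupied = set()
    for x, y, z, r in atoms:
        # Only scan voxels inside the atom's bounding box.
        lo = [math.floor((c - r) / spacing) for c in (x, y, z)]
        hi = [math.ceil((c + r) / spacing) for c in (x, y, z)]
        for i, j, k in product(*(range(a, b + 1) for a, b in zip(lo, hi))):
            cx, cy, cz = ((n + 0.5) * spacing for n in (i, j, k))
            if (cx - x) ** 2 + (cy - y) ** 2 + (cz - z) ** 2 <= r * r:
                occupied.add((i, j, k))
    return occupied


def surface_voxels(occupied):
    """Occupied voxels with at least one empty face neighbor."""
    return {
        (i, j, k)
        for (i, j, k) in occupied
        if any((i + di, j + dj, k + dk) not in occupied for di, dj, dk in NEIGHBORS)
    }
```

Comparing the surface voxel sets of two molecules (for example, counting surface cells that come into contact after placing them side by side) is one cheap way to pre-screen candidate interactions before more expensive calculations.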

Page 12

WP3 – Transcriptomics applications

• Phylogenetics: reconstructing the evolutionary history of a group of taxa is a major research thrust in computational biology and a standard part of exploratory sequence analysis.

• An evolutionary history not only shows the relationships among taxa, but is also an important tool for inferring the structural, physiological, and biochemical properties of sequences from other, similar sequences, and for reconstructing tissue evolution.
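As an illustration of distance-based tree reconstruction, the sketch below implements UPGMA (average-linkage clustering), one of the simplest methods. The slide does not say which phylogenetic methods WP3 ran on the Grid, and the distance matrix in the test is a made-up example. Branch lengths are omitted to keep the sketch short, so only the tree topology is returned, in Newick format.

```python
def upgma(names, dist):
    """Build a rooted tree with UPGMA and return its topology as a Newick string.

    names: list of taxon labels; dist: symmetric distance matrix (list of lists).
    """
    # Each cluster id maps to (Newick fragment, number of taxa in the cluster).
    clusters = {i: (name, 1) for i, name in enumerate(names)}
    # Pairwise distances keyed by (smaller id, larger id).
    d = {(i, j): dist[i][j] for i in clusters for j in clusters if i < j}
    next_id = len(names)
    while len(clusters) > 1:
        # Merge the closest pair of clusters.
        a, b = min(d, key=d.get)
        label_a, size_a = clusters.pop(a)
        label_b, size_b = clusters.pop(b)
        # Distance from the merged cluster to each remaining one is the
        # size-weighted average, which is what defines UPGMA.
        merged = {}
        for c in clusters:
            d_ac = d[(min(a, c), max(a, c))]
            d_bc = d[(min(b, c), max(b, c))]
            merged[c] = (size_a * d_ac + size_b * d_bc) / (size_a + size_b)
        d = {pair: v for pair, v in d.items() if a not in pair and b not in pair}
        for c, value in merged.items():
            d[(c, next_id)] = value  # new id is always the larger one
        clusters[next_id] = (f"({label_a},{label_b})", size_a + size_b)
        next_id += 1
    label, _ = next(iter(clusters.values()))
    return label + ";"
```

Each merge rescans all pairs, so this version costs O(n³) overall. That is fine for a handful of taxa; for large families the cost grows quickly, which is one reason tree reconstruction (and bootstrap replicates in particular) is a natural Grid workload.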

Page 13

WP4 – Databases & Genomics Applications

• Work Package 4: Databases and Functional Genomics Applications
– Testing the main biological databases in the Grid environment: optimization of storage space, bandwidth, and download time
– Testing the performance and scalability of database-based applications according to various use cases and submission algorithms
– Challenge 1: Gene Analogous Finder, which requires 55+ years of computation on a single CPU and is not feasible in a local environment
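The 55+ CPU-year estimate is what makes the Grid necessary. Under the idealized assumption of perfect parallelism (no queue time, job failures, or database transfer overhead, which are exactly the effects the WP4 performance tests set out to measure), wall-clock time is the total CPU time divided by the number of concurrently running jobs. The CPU counts below are hypothetical:

```python
DAYS_PER_YEAR = 365.0


def ideal_wall_clock_days(cpu_years, concurrent_cpus):
    """Ideal wall-clock time in days for a perfectly parallel workload."""
    if concurrent_cpus < 1:
        raise ValueError("need at least one CPU")
    return cpu_years * DAYS_PER_YEAR / concurrent_cpus


# Hypothetical numbers of CPUs running jobs at the same time.
for cpus in (1, 100, 1000, 5000):
    print(f"{cpus:>5} CPUs: {ideal_wall_clock_days(55, cpus):10.1f} days")
```

With 1000 concurrent CPUs, 55 CPU-years shrink to roughly 20 days in this ideal model; real runs take longer because of the Grid, submission, and application variables listed for the DBApp tests.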

Page 14

WP4 – Databases handling

• GridDBManager
– Automatic Updater: timer-based monitoring and updating of Grid-ported databases
– Adaptive replica manager: constantly adapts the number of replicas to the usage of each database over the last 10 days
– Version Regression: keeps patches on the Grid so that each database can be rolled back to an earlier version
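The adaptive replica manager is described only as adapting the number of replicas to each database's usage over the last 10 days. A minimal sketch of one plausible policy follows; the accesses-per-replica threshold and the minimum/maximum replica counts are hypothetical parameters, not GridDBManager settings, and the database names in the usage example are illustrative.

```python
import math
from datetime import date, timedelta

WINDOW_DAYS = 10                    # usage window stated on the slide
ACCESSES_PER_REPLICA = 50           # hypothetical: load one replica is assumed to serve
MIN_REPLICAS, MAX_REPLICAS = 1, 10  # hypothetical bounds


def target_replicas(access_dates, today):
    """Replica count for one database, given the dates on which it was accessed."""
    start = today - timedelta(days=WINDOW_DAYS)
    # Count accesses in the last WINDOW_DAYS days, today included.
    recent = sum(1 for d in access_dates if start < d <= today)
    wanted = math.ceil(recent / ACCESSES_PER_REPLICA)
    return max(MIN_REPLICAS, min(MAX_REPLICAS, wanted))


def plan_changes(current, usage, today):
    """Return {database: delta}; positive adds replicas, negative removes them."""
    plan = {}
    for db, n_now in current.items():
        delta = target_replicas(usage.get(db, []), today) - n_now
        if delta:
            plan[db] = delta
    return plan
```

Keeping at least one replica means an unused database stays available, while the upper bound caps the storage a single popular database can consume.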

Page 15

WP4 – Methods - GridDBManager

Page 16

WP4 – Methods - DBApp Perf. Testing

• Testing the performance and scalability of Database-Oriented Bioinformatics Applications (DBApp) in the EGEE GRID
– Testing performance and scalability involves too many variables:
  Grid: queue time, database download time, queue failures, execution failures
  Submission mode: number of jobs, rate-limiting settings, resubmission algorithm
  Application: performance of the specific application, location of the database
– Probing of Grid performance
– Numeric simulation for all algorithms
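The slide lists the variables but not the simulation model. A minimal Monte Carlo sketch, assuming each job attempt waits an exponentially distributed queue time and then fails independently with a fixed probability, with failed jobs resubmitted up to a fixed number of times; the failure rate, retry limit, and mean queue time are hypothetical parameters, not measured EGEE values.

```python
import random


def simulate(n_jobs, p_fail, max_retries, mean_queue_min, seed=0):
    """Simulate a batch of Grid jobs with resubmission on failure.

    Returns (completed_jobs, total_attempts, mean_minutes_to_success).
    The mean is NaN when no job completed.
    """
    rng = random.Random(seed)
    completed, attempts, times = 0, 0, []
    for _ in range(n_jobs):
        elapsed = 0.0
        for _ in range(max_retries + 1):  # first submission plus retries
            attempts += 1
            elapsed += rng.expovariate(1.0 / mean_queue_min)  # queue wait
            if rng.random() >= p_fail:  # attempt succeeded
                completed += 1
                times.append(elapsed)
                break
    mean_time = sum(times) / len(times) if times else float("nan")
    return completed, attempts, mean_time
```

Running it with, say, max_retries=0 and max_retries=3 at the same failure rate shows how much a resubmission algorithm raises the completion rate, and how many extra attempts (and how much extra waiting) it costs.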

Page 17

WP4 – Methods - DBApp Perf. Testing

• Probing Grid performance (Example)
– Grid queue times and reliability
  Sent 150 jobs in 3 groups of 50 at different times

(Figure: "Grid queue times (normal load)", histogram of the percentage of jobs (axis 0-30) per queue-time bin: <1 minute, 1-2 min, 2-4 min, 4-10 min, 10-30 min, 30 min-1 h, 1 h-4 h, 4 h-8 h, >8 h, Time-out.)
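The histogram groups jobs into the queue-time bins shown on the slide. A minimal sketch of that binning, assuming queue times are recorded in minutes and a timed-out job is recorded as None; the bin edges come from the slide, while the function name and sample data are illustrative.

```python
from bisect import bisect_right

# Upper bin edges in minutes, matching the slide's bins (<1 min ... >8 h).
EDGES = [1, 2, 4, 10, 30, 60, 240, 480]
LABELS = ["<1 min", "1-2 min", "2-4 min", "4-10 min", "10-30 min",
          "30 min-1 h", "1 h-4 h", "4 h-8 h", ">8 h", "Time-out"]


def queue_time_histogram(queue_minutes):
    """Percentage of jobs per queue-time bin; None marks a timed-out job."""
    if not queue_minutes:
        raise ValueError("no jobs to summarize")
    counts = [0] * len(LABELS)
    for t in queue_minutes:
        if t is None:
            counts[-1] += 1
        else:
            # bisect_right puts a value equal to an edge into the higher bin,
            # e.g. exactly 1 minute goes to "1-2 min".
            counts[bisect_right(EDGES, t)] += 1
    total = len(queue_minutes)
    return {label: 100.0 * c / total for label, c in zip(LABELS, counts)}
```

With the 150 jobs of the example, each job contributes 1/150 of the total, about 0.67 percentage points.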

Page 18

WP5 – Molecular docking

Viral neuraminidase is considered a valid target for antiviral drugs.

Page 19

WP5 – Molecular docking

Docking: predict how small molecules bind to a receptor of known 3D structure.

(Diagram: starting compound database and starting target structure model → DOCKING → predicted binding models → post-analysis → compounds for assay)

There are successful examples:
– rapid
– cost-effective…

But there are limitations:
– CPU and storage needed

More specific talk by Ana Lucia Da Costa, Wednesday 13th, 11:15, Room: Bordeaux
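The docking workflow ends with a post-analysis step that turns predicted binding models into a short list of compounds for assay. The slide does not specify the selection criteria; a minimal sketch, assuming each compound has several docked poses scored in kcal/mol where lower means stronger predicted binding (as in AutoDock-style scoring), keeps the best pose per compound, applies a hypothetical score cutoff, and returns the top-k candidates. The compound identifiers in the usage example are made up.

```python
def select_for_assay(poses, cutoff=-8.0, top_k=10):
    """Pick compounds for assay from docking results.

    poses: iterable of (compound_id, score) pairs, one per predicted pose;
    lower scores mean stronger predicted binding. cutoff and top_k are
    hypothetical defaults.
    """
    # Keep only the best (lowest) score for each compound.
    best = {}
    for compound, score in poses:
        if compound not in best or score < best[compound]:
            best[compound] = score
    # Discard compounds above the cutoff, then rank the rest by score.
    passing = sorted((score, compound) for compound, score in best.items() if score <= cutoff)
    return [compound for _, compound in passing[:top_k]]
```

For example, select_for_assay([("cpd-001", -7.5), ("cpd-001", -9.1), ("cpd-002", -8.4), ("cpd-003", -6.0), ("cpd-004", -10.2)], top_k=2) keeps cpd-004 and cpd-001. In practice, post-analysis usually adds visual inspection or rescoring, since docking scores alone are a rough estimate of affinity.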

Page 20

WP7 – Dissemination

• The following events were specifically associated with or organized by the BioinfoGRID project:
– BioinfoGRID Symposium 2007: December 10th-13th 2007, Milan
– BioinfoGRID Session at EGEE '07: October 4th 2007, Budapest
– Biomed Grid School, Varenna, Italy, May 14th-19th 2007
– BioinfoGRID Workshop at Healthgrid 2007 Conference, Geneva, Switzerland, 24th April 2007
– NETTAB 2006 Workshop: Distributed Applications, Web Services, Tools and GRID Infrastructures for Bioinformatics, Santa Margherita di Pula, Sardinia, Italy, July 10th-13th 2006
– BioinfoGRID Initial Training Course, Bari, Italy, March 8th-10th 2006

• In addition, the BioinfoGRID project has been represented at 58 national and international conferences and workshops.

Page 21

WP7 – Dissemination

• 24 journal articles written within the framework of the BioinfoGRID project:
– 9 - BMC Bioinformatics
– 4 - IEEE Transactions on Nanobioscience
– 3 - Studies in Health Technology and Informatics
– 1 - Journal of Parallel and Distributed Computing
– 1 - Journal of Chemical Information and Modeling
– 1 - Parallel Computing
– 1 - Int. J. of Bioinformatics Research and Applications
– 1 - IEEE Transactions on Systems Science and Applications
– 1 - Nucleic Acids Research
– 1 - BMC Genetics
– 1 - Bioinformatics

Page 22

WP7 – Dissemination

• 19 conference proceedings produced within BioinfoGRID:
– 6 – NETTAB '06
– 2 – EGEE User Forum 06/07
– 2 – BITS '06
– 2 – HPDC '07
– 1 – EGEE 06/07
– 1 – CAPI 2006
– 1 – Bioinformatics of African Pathogens and Disease Vectors, Nairobi 2007
– 1 – MAS-BIOMED '06 Workshop
– 1 – CCGrid '07 Symposium
– 1 – EvoBIO '08
– 1 – CHEP '07

Page 23

People Acknowledgments

• Cristina Aiftimiei • Roberta Alfieri • Claudio Arlandini • Roberto Barbera • Endre Barta • Francesco Beltrame • Attila Bende • Chiara Bishop • Christophe Blanchet • Ignacio Blanquer • Vincent Bloch • Gianpaolo Bottoni • Vincent Breton • Andrea Calabria • Andrea Caprera • Tiziana Castrignanò • Federica Chiappori • Dario Corrada • Paolo Cozzi • Stefano Cozzini • Enza D’Alba • Pasqualina D’Ursi • Ana Da Costa • Paride Dagna • Giulia De Sario • Davide Di Pasquale • Giacinto Donvito • Vihang Dudhalkar • Peter Ernst

• David Fergusson • Geraldine Fettahi • Sandro Fiore • Riccardo Gervasoni • Karl-Heinz Glatting • John Hatton • Ally Hume • Nicolas Jacq • Atul Jain • Miklos Kozlovszky • Giuseppe La Rocca • Yannick Legré • Pietro Liò • Carles Loomis • Mario Marchisio • Hajnal Marton • Rafael Mayo Garcia • Mirco Mazzucato • Giovanni Meloni • Ivan Merelli • Emanuale Merelli • Luciano Milanesi • Elisa Molinari • Ettore Mosca • Georgina Moulton • Loukas Moutsianas • Tibor Nagy • Alessandro Negro • Laszlo Oroszi

• Alessandro Orro • Giovanni Paolella • Silvano Paoli • Antonio Pierro • Giorgio Pietro Maggi • Marco Pirola • Raffaele Ponzini • Ivan Porro • Paolo Ramieri • Paolo Romano • Ermanna Rovida • Erika Salvi • Jean Salzemann • Diego Sardaci • Salvatore Scifo • Martin Senger • Giuliano Taffoni • Livia Torterolo • Gabriele Trombetti • Angelica Tulipano • Vania Ugè • Elizabeth van der Wath • Richard van der Wath • Kasam Vinod • Federica Viti • Guy Warner • Ted Wen • Pierfrancesco Zuccato

Page 24

Projects Acknowledgements

EUGRIDGRIDISSeG

DILIGENT: A DIgital Library Infrastructure on Grid ENabled Technology