infso-ri-508833 enabling grids for e-science a grid approach to distributed image analysis for...
TRANSCRIPT
INFSO-RI-508833
Enabling Grids for E-sciencE
www.eu-egee.org
A Grid Approach to Distributed Image Analysis for Early Diagnosis of Alzheimer Disease
Livia Torterolo
University of Genoa / DIST
EGEE User Forum
CERN, 01-03.03.2006
EGEE User Forum, CERN, 01-03.03.2006 2
Enabling Grids for E-sciencE
INFSO-RI-508833
Agenda
• Introduction to SPM analysis and to development of a web-based SPM service
• GRID implementation of SPM service
• Current Status and Plans
EGEE User Forum, CERN, 01-03.03.2006 3
Enabling Grids for E-sciencE
INFSO-RI-508833
Partners
• The first part of the work has been done during the Italian Neuroinformatics OCSE Node project in collaboration with
– San Raffaele Hospital of Milano– University of Milano – Bicocca
• GRID implementation has been carried out by Bio-Lab at DIST, University of Genoa, during the Grid.it Italian project
• The application is a quite stable work in progress and served as a demostrator for the Grid.it project but further developments and extensions are in progress still inside the GILDA testbed
EGEE User Forum, CERN, 01-03.03.2006 4
Enabling Grids for E-sciencE
INFSO-RI-508833
Introduction to Alzheimer Disease
• AD is the most common form of dementia, accounting from 50% to 70% of all dementias in elderly people (around 3 million people in EU25)
• Clinically, AD is characterized by a progressive loss of cognitive abilities
• Memory loss is typically the earliest sign of AD
EGEE User Forum, CERN, 01-03.03.2006 5
Enabling Grids for E-sciencE
INFSO-RI-508833
Analysis of PET-SPECT images
EGEE User Forum, CERN, 01-03.03.2006 6
Enabling Grids for E-sciencE
INFSO-RI-508833
Results of a SPM analysis on a PET study of glucose metabolism in a patient with dementia. Ipometabolic pattern in the frontal cortex: design matrix (top right), statistically significant clusters on a glass brain in three orthogonal planes (top left) and on a 3D brain rendering (bottom)
SPM Results
EGEE User Forum, CERN, 01-03.03.2006 7
Enabling Grids for E-sciencE
INFSO-RI-508833
Why a portal?
• The remote access to SPM analysis could provide doctors from peripheral hospitals with an invaluable tool to run the analysis remotely only using a standard browser web
• No particular HW resources or computer knowledge are required
• In order to avoid errors during the analysis, only selected users should access and use the service
http://www.neuroinf.ithttp://www.neuroinf.it
http://www.neuroinf.ithttp://www.neuroinf.it
EGEE User Forum, CERN, 01-03.03.2006 8
Enabling Grids for E-sciencE
INFSO-RI-508833
SPM: Why GRID?
• A large set of images of normal patients is required to be used for comparison
• PET and SPECT studies on normal subjects are very rare
• Images of patients are covered by privacy and security issues and for this reason they cannot be freely moved on the net or published by the centre that made the analysis
EGEE User Forum, CERN, 01-03.03.2006 9
Enabling Grids for E-sciencE
INFSO-RI-508833
Contribution to GILDA
• LCG Grid Environment v. 2.6.0.
6 machines dedicated to LCG NODE:
1 CE, 1 SE, 3 WNs, 1 UI, 1 LCG install server (12 CPUs, 7Giga (RAM), ≈ 350Giga of storage)
EGEE User Forum, CERN, 01-03.03.2006 10
Enabling Grids for E-sciencE
INFSO-RI-508833
• Data Management and File Access tools:
to access data from User Interface using Logical File Names (LFN)
• LCG File Catalog (LFC):
to register data into the catalog
• AMGA Metadata Catalog:
to add related metadata to data
LCG tools used
EGEE User Forum, CERN, 01-03.03.2006 11
Enabling Grids for E-sciencE
INFSO-RI-508833
Data Management and File Access
EGEE User Forum, CERN, 01-03.03.2006 12
Enabling Grids for E-sciencE
INFSO-RI-508833
LCG File Catalog (LFC)
Several functionalities:
• Use of lcg_utils• Use of GFAL calls• Use of GSI certification
• Access to grid file in SEs from “anywhere”• Several replicas of files in different sites• Copy of data from/to local file system to GRID
EGEE User Forum, CERN, 01-03.03.2006 13
Enabling Grids for E-sciencE
INFSO-RI-508833
AMGA: ARDA Metadata Grid Application
Side-by-Side a File Catalogue (LFC): File Metadata
Access control to resources on the Grid is done via VOMS
Different front-end:– mdcli & mdclient and C++ API– Java Client API and command line
mdjavaclient.sh &– mdjavacli.sh (also under Windows !!)– Python Client API
Strong security requirements:– patient data is sensitive– metadata access must be
restricted to authorized users
EGEE User Forum, CERN, 01-03.03.2006 14
Enabling Grids for E-sciencE
INFSO-RI-508833
GRID-Design of the application
With reference to different LCG tools mentioned above, application code has been modified in the following way:
– Registration and storage of data files (PET/SPECT images) on SEs available using lcg_utils interacting with LFC and AMGA.
– Development of a C program with GFAL C API in order to access distributed images using their LFNs and to extract some information necessary to SPM analysis without copying them locally.
– Job Submission: creation of a JDL file to submit the executable (and not the images) with GFAL call to the GRID.
– Statistical Analysis: running of SPM analysis from results obtained from job submission. Statistical analysis is performed outside GRID environment.
EGEE User Forum, CERN, 01-03.03.2006 15
Enabling Grids for E-sciencE
INFSO-RI-508833
Final GRID structure
EGEE User Forum, CERN, 01-03.03.2006 16
Enabling Grids for E-sciencE
INFSO-RI-508833
Application Running: Setting Parameters
Setting SPM parameters Selection of normal subjects
QUERIES to AMGA !
EGEE User Forum, CERN, 01-03.03.2006 17
Enabling Grids for E-sciencE
INFSO-RI-508833
Application running: JOB SUBMISSION
SPM Analysis RESULTS
EGEE User Forum, CERN, 01-03.03.2006 18
Enabling Grids for E-sciencE
INFSO-RI-508833
Issues & Problems
• Pros:– Data distribution over different SEs accessed with GFAL APIs.– Integration of a Grid application into an existing environment with
hard constraints and several integration problems to solve– Application behaviour to the user unchanged
• Cons:– GFAL API not well documented & Compatibility issues and
conflicts on libraries (but good support from GILDA staff)
– Too much components (Zope + Plone, Python, shell scripts, Matlab, C/C++ compiled code, …& Grid!)
– Software maintenance is hard
EGEE User Forum, CERN, 01-03.03.2006 19
Enabling Grids for E-sciencE
INFSO-RI-508833
Current Status and Plans
• Application– pre-processing parallelization (further functions to port on Grid)
• Interface– move to Genius 3.1
• Overall framework– improve performance and scalability– A Grid environment devoted to the analysis of microarrays, derived
from this project, will be tested in the frame of European Project BioinfoGRID http://www.itb.cnr.it/bioinfogrid (Bioinformatics Application for Life Science)
– A bioinformatics data storage and analysis platform with metadata support will be developed under Italian MIUR-FIRB project LITBIO www.litbio.org (Laboratory for Interdisciplinary Technologies in Bioinformatics)
Thanks for your attention!