user communities and applications

47
INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org User communities and applications David Fergusson 28th February

Upload: licia

Post on 30-Jan-2016

13 views

Category:

Documents


0 download

DESCRIPTION

User communities and applications. David Fergusson 28th February. Enabling Grids for EsciencE. What is the EGEE community? Researchers in eScience (applications NA4) eResearch European community World grid community Industry (industry forum) What is not the EGEE community?. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: User communities and applications

INFSO-RI-508833

Enabling Grids for E-sciencE

www.eu-egee.org

User communities and applicationsDavid Fergusson

28th February

Page 2: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Enabling Grids for EsciencE

• What is the EGEE community?– Researchers in eScience (applications NA4)– eResearch– European community– World grid community– Industry (industry forum)

• What is not the EGEE community?

Page 3: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

eScience/eResearch

• EGEE’s initial focus is on specific scientific communities– High Energy Physics (Large Hadron Collider)– Biomedical – Geology– Chemistry– Astrophysics

• Collaborating with other EU projects in other areas– For example, digital libraries - DILIGENT

Page 4: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Applications in EGEE• Production service supporting multiple VOs

with different requirements– Data

Volume Location – distributed? Write Once or Update? Metadata archives? Controlled or open access?

– Computation High throughput (~ current LCG) High performance, supercomputing

– No. of sites, scientists,…• Establish viable general process to bring other scientific

communities on board

Page 5: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

An EGEE community

• EGEE communities are based around the idea of Virtual Organisations.

• A Virtual Organisation:– Owns shared computing resources– Authorises and authenticates its members access to resources– Manages its own resources

Page 6: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

EGEE: adding a VO

EGEE has a formal procedure for adding selected new user communities (Virtual Organisations):

• Negotiation with one of the Regional Operations Centres

• Seek balance between the resources contributed by a VO and those that they consume.

• Resource allocation will be made at the VO level. • Many resources need to be available to multiple VOs :

shared use of resources is fundamental to a Grid

Page 7: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

The role of the pilot applications – HEP and Biomedicine

• Initial area of focus to establish a strong user base on which to build a broad EGEE user community

• Provide early feedback to the infrastructure activities on their experience with application deployment and VO management

• Act as guinea pigs and provide early feedback to the middleware developers on their experience with new services

Page 8: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

EGEE pilot application: Large Hadron Collider

Mont Blanc(4810 m)

• Data Challenge:– 10 Petabytes/year of data !!!– 20 million CDs each year!

• Simulation, reconstruction, analysis:– LHC data handling requires computing

power equivalent to ~100,000 of today's fastest PC processors!

• Operational challenges– Reliable and scalable through project

lifetime of decades Downtown Geneva

Page 9: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

The characteristics of pilot HEP applications

• Very large scale from project day 1

• Virtual Organizations were already set up at project day 1

• Very centralized: jobs are sent in a very organized way

• Multi-grid: data challenges are deployed on several grids– ALICE LCG, Alien– ATLAS LCG, US Grid2003, Nordugrid– CMS LCG, US Grid2003– LHCb LCG, Dirac

Page 10: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

The Large Hadron Collider

http://www.cern.ch

~9 km

LHC

SPS

CERN

Page 11: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

The LHC Experiments

Page 12: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Overview of experiences with LHC data challenges

• There was continual evolution throughout 2004, with LCG and experiments gaining more experience in the development and use of an expanding LCG grid

• All experiments had excellent relations with LCG-EIS support – a model for the future support of VOs

• Global job efficiencies ranged from 60-80% as experience developed – must get up to 90+% for user analysis - look to new middleware developments and tighter operational procedures

• Sources of problems and losses– Site configuration, management and stability– Data Management (especially metadata handling)– Difficult to monitor job running and causes of failure

• D0 in early 2005 showed that one can run with good efficiency with a set of well controlled sites

Page 13: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

EGEE pilot application: BioMedical

• BioMedical– Bioinformatics (gene/proteome databases

distributions)– Medical applications (screening,

epidemiology, image databases distribution, etc.)

– Interactive application (human supervision or simulation)

– Security/privacy constraints Heterogeneous data formats - Frequent

data updates - Complex data sets - Long term archiving

http://egee-na4.ct.infn.it/biomed/applications.html

Page 14: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

The characteristics of biomedical pilot applications

• Prototype level at project day 1 • VO was created after the project kicked-off• Very decentralized: application developers use the grid

at their own pace• Very demanding on services

– Compute intensive applications– Applications requiring large amounts of short jobs – Need for interactivity or guaranteed response time

• Resources were focused on the deployment of large scale applications on LCG-2– Integration of Biomed VO used to identify issues relevant to all

VOs to be deployed during EGEE lifetime– Decentralized usage of the infrastructure highlights different

weaknesses from the more centralized HEP data challenges

Page 15: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Status of Biomedical VO

PADOVA

BARI

15 resource centres ( )17 CEs (>750 CPUs)16 SEs

4 RBs:

CNAF, IFAE,

LAPP, UPV

RLS, VO LDAP Server:

CC-IN2P3

4 RBs

1 RLS

1 LDAP Server

Page 16: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Biomedical VO: production jobs on EGEE

Page 17: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Biomedical applications

– 3 batch-oriented applications ported on LCG2 SiMRI3D: medical image simulation xmipp_MLRefine: molecular structure analysis GATE: radiotherapy planning

– 3 high throughput applications ported on LCG2 CDSS: clinical decision support system GPS@: bioinformatics portal (multiple short jobs) gPTM3D: radiology images analysis (interactivity)

– New applications to join in the near future Especially in the field of drug discovery

Page 18: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

EGEE pilot application: BioMedical• BioMedical

– Bioinformatics (gene/proteome databases distributions)– Medical applications (screening, epidemiology, image databases

distribution, etc.)– Interactive application (human supervision or simulation)– Security/privacy constraints

Heterogeneous data formats - Frequent data updates - Complex data sets - Long term archiving

• BioMed applications deployed

• GATE - Geant4 Application for Tomographic Emission– GPS@ - genomic web portal – CDSS - Clinical Decision Support System

Page 19: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

12 Biomed applications

• GATE: Geant4 Application for Tomographic Emission (LPC)

• Docking platform for tropical diseases: grid-enabled docking platform for in sillico drug discovery (LPC)

• CDSS: Clinical Decision Support System (UPV)

• GPS@: Grid genomic web portal (IBCP)

• SiMRI 3D: Magnetic Resonance Image simulator (CREATIS)

• gPTM 3D: Interactive radiological image visualization and processing tool (LRI)

• xmipp_ML_refine: Macromolecular 3D structure analysis (CNB)

• xmipp_multiple_CTFs : Electronmicroscopic images CTF calculation (CNB)

• GridGRAMM: Molecular Docking web (CNB)

• GROCK: Mass screenings of molecular interaction (CNB

• Mammogrid: Mammograms analysis (EU project)

• SPLATCHE: Genome evolution modeling (U. Berne/WHO)

Page 20: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

...and more to come

• SPLATCHE– first application being migrated from GILDA to biomed VO

• Pharmacokinetics in MRI (UPV)– MRI registration for contrast agent diffusion study

• Some progress on biological sequences analysis (M. Lexa)

• ...

Page 21: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

BLAST – comparing DNA or protein sequences• BLAST is the first step for analysing new

sequences: to compare DNA or protein sequences to other ones stored in personal or public databases. Ideal as a grid application.

– Requires resources to store databases and run algorithms

– Can compare one or several sequence against a database in parallel

– Large user community

Page 22: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Bio-medicine applications

Medical images

Exam image patient key ACL ...

1. Query the medical image database and retrieve a patient image

Metadata

3. Retrieve most similar cases

Similar images Low score images

2. Compute similarity measures over the database images

Submit 1 job per image

• Bio-informatics– Phylogenetics– Search for primers– Statistical genetics– Bio-informatics web portal– Parasitology– Data-mining on DNA chips– Geometrical protein comparison

• Medical imaging– MR image simulation– Medical data and metadata

management– Mammographies analysis– Simulation platform for

PET/SPECT

Applications deployedApplications tested

Applications under preparation

Page 23: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Bio-medicine applications

Page 24: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Bio-medicine applications

Page 25: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Bio-medicine applications

Page 26: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

gPTM3D : Grid-Enabling Interactive Medical Analysis

Interaction

RenderExplore Analyse InterpretAcquire

Page 27: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Use case

Planning percutaneous nephrolithotomy

Page 28: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Evolution of biomedical applications

• Growing interest of the biomedical community– Partners involved proposing new applications– New application proposals (in various health-related areas)– Enlargement of the biomedical community (drug discovery)

• Growing scale of the applications– Progressive migration from prototypes to pre-production services

for some applications– Increase in scale (volume of data and number of CPU hours)

• Towards pre-production– Several initiatives to build user-friendly portals and interfaces to

existing applications in order to open to an end-users community

Page 29: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

A look at the future: the HealthGrid vision

INDIVIDUALISED HEALTHCAREMOLECULAR MEDICINE

Databases

Association

Modelling

Computation

HealthGRID

Computational recommendation

Public Health

Patient

Tissue, organ

Cell

MoleculePatient

related data

Public Health

Patient

Tissue, organ

Cell

Molecule

In this context "Health" does not involve only clinical practice but covers the whole range of information from molecular level (genetic and proteomic information) over cells and tissues, to the individual and finally the population level (social healthcare).

Page 30: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Earth Sciences in EGEE

• Research– Earth observations by satellite

(ESA(IT), KNMI(NL), IPSL(FR), UTV(IT), RIVM(NL),SRON(NL))

– Climate : DKRZ(GE),IPSL(FR)

– Solid Earth Physics: IPGP (FR)

– Hydrology: Neuchâtel University (CH)

• Industry– CGG : Geophysics Company (FR)

Page 31: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Climate Applications in EGEE

Model: Atmosphere, Ocean, Hydrology, Atmospheric and Marine chemistry….

Goal: Comparison of model outputs from different runs and/or institutes

Large volume of data (TB) from different model outputs, and experimental data

Run made on supercomputer      => Link the EGEE infrastruture with supercomputer Grids (DEISA)

EXAMPLE: For the IPCC Assessment reports many experiment are performed with different models (different spatial resolution, different time-step, different "physics" ..) and various sites.

The generated data need to be compared in a comprehensive and "unified" way.

Page 32: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Geophysics Applications

Seismic processing Generic Platform:

- Based on Geocluster, an industrial application – to be a starter of the core member VO.

- Include several standard tools for signal processing, simulation and inversion.

- Opened: any user can write new algorithms in new modules (shared or not)

- Free for academic research

-Controlled by license keys (opportunity to explore license issue at a grid level)

- initial partners F, CH, UK, Russia, Norway

Page 33: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Flood simulation

SampleVah river

Geographical Information SystemsGeographical Information Systems

Computer visionComputer vision

Results: flow Results: flow + water depths+ water depths

Page 34: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Computational Chemistry: molecular simulator

SURFACESURFACEConstruction of the

Potential Energy Surface

DYNAMICSDYNAMICSDynamical properties

Calculation

no yes

end

PROPERTIESPROPERTIESCalculation of

Averaged quantities

Good

Results?

Ar - Benzene

Page 35: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

The MAGIC telescope

• Largest Imaging Air Cherenkov Telescope (17 m mirror dish)

• Located on Canary Island La Palma (@ 2200 m asl)

• Lowest energy threshold ever obtained with a Cherenkov telescope

Aim: detect –ray sources in the unexplored energy range: 30 (10)-> 300 GeV

Page 36: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

AGNsAGNs

SNRsSNRs Cold Dark MatterCold Dark Matter

PulsarsPulsars

GRBsGRBs

Tests of Tests of Quantum Quantum Gravity effectsGravity effects

Cosmological Cosmological

-Ray -Ray HorizonHorizon

The MAGIC Physics Program

Origin of Origin of Cosmic Cosmic RaysRays

Page 37: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Feedback to LCG-2 middleware developers and infrastructure

• From HEP applications– Experiment Integration Support group and Grid Applications

Group produced documents summarizing problems encountered in use of LCG-2

• From Biomed applications– Very significant exchanges related to the set-up of the biomed

VO and the deployment of relevant services– Request to use MPI

Page 38: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Engineering applications

Page 39: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Engineering applications

Page 40: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Grid Applications: art

the Thomson the Thomson flat scannerflat scannerdeveloped in 1990developed in 1990

140,000 photo-archives

digitised in

6.000 dots x 8.000 lines

in 5 years (1996-2001)

Books are being scanned in at 767 MB per page

1/2 Terabyte for Gutenberg Bible

Paintings are being scanned in at 30 GB each

in the EU CRISATEL Project

Museo Virtual de Artes El Pais (MUVA) http://www3.diarioelpais.com/muva/.

Page 41: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Who else can benefit from EGEE?

• EGEE Generic Applications Advisory Panel:

– For new applications

• EU projects: MammoGrid, Diligent, SEE-GRID …

• Expression of interest: Planck/Gaia (astroparticle), SimDat (drug discovery)

http://agenda.cern.ch/age?a042351 Next meeting at EGEE conference (November)

Page 42: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

New communities identification

• Through training, dissemination and outreach, communities already using advanced computing and keen to use EGEE infrastructure are identified

• These communities are encouraged to prepare a document describing their interest to use EGEE

• A scientific advisory panel (EGAAP) assesses and chooses among the interested communities the ones which seem the most mature to deploy their applications on EGEE

Page 43: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

GILDA, an infrastructure for dissemination and demonstration

• Goals– Demonstration of grid operation for tutorials and outreach – Initial deployment of new applications for testing purposes

• Key features– Initiative of the INFN Grid Project using LCG-2 middleware– On request, anyone can quickly receive a grid certificate and a

VO membership allowing them to use the infrastructure for 2 weeks

– Certificate expires after two weeks but can be renewed – Use of friendly interface: Genius grid portal

• Very important for the first steps of new user communities on to the grid infrastructure

Page 44: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

GILDA numbers

• 14 sites in 2 continents

• >1200 certificates issued, 10% renewed at least once

• >35 tutorials and demos performed in 10 months

• >25 jobs/day on the average

• Job success rate above 96%

• >320,000 hits on the web site from 10’s of different countries

• >200 copies of the UI live CD distributed in the world

Page 45: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

NA4 Applications and GILDA

• 7 Virtual Organizations supported: – Biomed– Earth Science Academy (ESR)– Earth Science Industry (CGG)– Astroparticle Physics (MAGIC)– Computational Chemistry (GEMS)– Grid Search Engines (GRACE)– Astrophysics (PLANCK)

• Development of complete interfaces with GENIUS for 3 Biomed Applications: GATE, hadronTherapy, and Friction/Arlecore

• Development of complete interfaces with GENIUS for 4 Generic Applications: EGEODE (CGG), MAGIC, GEMS, and CODESA-3D (ESR) (see demos!)

• Development of complete interfaces with GENIUS for 16 demonstrative applications available on the GILDA Grid Demonstrator (https://grid-demo.ct.infn.it)

Page 46: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Summary

• EGEE and grids – not just physics

• For communities to benefit they need to know what grids can do for them – dissemination

• Many communities are beginning to adopt the grid

• EGEE has a mechanism for assisting communities onto the grid

Page 47: User communities and applications

Enabling Grids for E-sciencE

INFSO-RI-508833

Practical URLs

• homepages.nesc.ac.uk/~gcw

• grid-demo.ct.infn.it