application use cases nikhef, amsterdam, december 12, 13

28
Application Use Cases NIKHEF, Amsterdam, December 12, 13

Upload: damian-haynes

Post on 15-Jan-2016

215 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Application Use Cases NIKHEF, Amsterdam, December 12, 13

Application Use Cases

NIKHEF, Amsterdam, December 12, 13

Page 2: Application Use Cases NIKHEF, Amsterdam, December 12, 13

Use Cases

• part of a development process– requirements gathering– use cases– architectural design– fast prototyping– implementation

• text describing a real case

Page 3: Application Use Cases NIKHEF, Amsterdam, December 12, 13

D0@Fermi

Page 4: Application Use Cases NIKHEF, Amsterdam, December 12, 13

A D0 use case

produce 1 million events using the Pythia event generator and the GEANT D0 detector simulation program. Add pile-up events during digitisation and before reconstruction. After reconstruction and analysis all raw and resulting data is conserved. Any production should be exactly reproducible.

Page 5: Application Use Cases NIKHEF, Amsterdam, December 12, 13

Pythia

GEANT-3

Simulation

Reconstruction

Analysis

cards

D0 MC Flow Chart

Min.bias

geometry

INPUT OUTPUT

Page 6: Application Use Cases NIKHEF, Amsterdam, December 12, 13

Requirements Gathering

• discussing use cases often delivers system requirements

Design & Architecture

• new components

• extending existing components (standards!)

• CRC’s

Page 7: Application Use Cases NIKHEF, Amsterdam, December 12, 13

Refinements

• # min.bias events depending on luminosity

• add 0< <10 min.bias events per event

• any event should be (exactly) reproducible

• min.bias events are also generated with Pythia

• min.bias events could also be measured data

• could be in a file of ## events

• could be stored one by one in a database

Page 8: Application Use Cases NIKHEF, Amsterdam, December 12, 13

Options

• min.bias events generated on the fly

• min.bias events from a file on the grid

• min.bias events from file on the CE

• min.bias events from generated file

• etc.

• and always (exactly) reproducable !

Page 9: Application Use Cases NIKHEF, Amsterdam, December 12, 13

HEPCAL Document

• requirements from all 4 LHC experiments

• requirements from EO and Bio

• ~50 use cases

• discussed within Architectural Task Force

• design is input for Global Grid Forum

• follow always standard protocols

Page 10: Application Use Cases NIKHEF, Amsterdam, December 12, 13

ATLAS

Page 11: Application Use Cases NIKHEF, Amsterdam, December 12, 13

An implementation of distributed analysis in ALICE using natural parallelism of processing

Local

Remote

Selection

Parameters

Procedure

Proc.C

Proc.C

Proc.C

Proc.C

Proc.C

PROOF

CPU

CPU

CPU

CPU

CPU

CPU

TagDB

RDB

DB1

DB4

DB5

DB6

DB3

DB2

Bring the job to the data and not the data to the job

Page 13: Application Use Cases NIKHEF, Amsterdam, December 12, 13

                                                                    

 

Page 14: Application Use Cases NIKHEF, Amsterdam, December 12, 13

ATLAS/LHCb Software Framework(Based on Services)

Converter

Algorithm

Event DataService

PersistencyService

DataFiles

AlgorithmAlgorithm

Transient Event Store

Detec. DataService

PersistencyService

DataFiles

Transient Detector

Store

MessageService

JobOptionsService

Particle Prop.Service

OtherServices

HistogramService

PersistencyService

DataFiles

TransientHistogram

Store

ApplicationManager

ConverterConverter

The Gaudi/Athena Framework – Services will

interface to Grid (e.g. Persistency)

Page 15: Application Use Cases NIKHEF, Amsterdam, December 12, 13
Page 16: Application Use Cases NIKHEF, Amsterdam, December 12, 13

A CMS Data Grid JobThe vision for 2003

Page 17: Application Use Cases NIKHEF, Amsterdam, December 12, 13

Common Applications Work

• Several discussions between application WPMs and technical coordination to consider the common needs of all applications

HEP EO Bio

Common applicative layer

EDG software

Globus

Page 18: Application Use Cases NIKHEF, Amsterdam, December 12, 13

reconstruction

simulation

analysis

interactivephysicsanalysis

batchphysicsanalysis

batchphysicsanalysis

detector

event summary data

rawdata

eventreprocessing

eventreprocessing

eventsimulation

eventsimulation

analysis objects(extracted by physics topic)

Data Handling and Computation for

Physics Analysisevent filter(selection &

reconstruction)

event filter(selection &

reconstruction)

processeddata

les.

rob

ert

son

@ce

rn.c

h

CERN

Page 19: Application Use Cases NIKHEF, Amsterdam, December 12, 13

LCG/Pool on the Grid

File Catalog

Collections

Replica Location Service

Grid Dataset Registry G

rid

Reso

urce

s

Exp

erim

en

t Fra

mew

ork

Use

r A

pplica

tion

LCG POOL Grid Middleware

RootI/O

Replica Manager

Page 20: Application Use Cases NIKHEF, Amsterdam, December 12, 13

Applications in DataGrid

•HEP

•Bio Informatics and Health

•Earth Observation

Page 21: Application Use Cases NIKHEF, Amsterdam, December 12, 13

Challenges for a biomedical grid

• The biomedical community has NO strong center of gravity in Europe

– No equivalent of CERN (High-Energy Physics) or ESA (Earth Observation)

– Many high-level laboratories of comparable size and influence without a practical activity backbone (EMB-net, national centers,…) leading to:

• Little awareness of common needs• Few common standards• Small common long-term investment

• The biomedical community is very large (tens of thousands of potential users)

• The biomedical community is often distant from computer science issues

Page 22: Application Use Cases NIKHEF, Amsterdam, December 12, 13

Biomedical requirements

• Large user community(thousands of users)– anonymous/group login

• Data management– data updates and data

versioning– Large volume management

(a hospital can accumulate TBs of images in a year)

• Security– disk / network encryption

• Limited response time– fast queues

• High priority jobs

– privileged users

• Interactivity

– communication between user interface and computation

• Parallelization

– MPI site-wide / grid-wide

– Thousands of images

– Operated on by 10’s of algorithms

• Pipeline processing

– pipeline description language / scheduling

Page 23: Application Use Cases NIKHEF, Amsterdam, December 12, 13

Biomedical projects in DataGrid

• Distributed Algorithms. New distributed "grid-aware" algorithms (bio-info algorithms, data mining, …)

• Grid Service Portals. Service providers taking advantage of the DataGrid computational power and storage capacity.

• Cooperative Framework. Use the DataGrid as a cooperative framework for sharing resources, algorithms, and organize experiments in a cooperative manner.

Cooperative FrameworkGrid Service Portals

Distributed Algorithms

EDG Middleware

WP10applications

Page 24: Application Use Cases NIKHEF, Amsterdam, December 12, 13

The grid impact on data handling• DataGrid will allow

mirroring of databases

– An alternative to the current costly replication mechanism

– Allowing web portals on the grid to access updated databases

BiomedicalBiomedicalReplica Replica CatalogCatalog

Trembl(EBI)

Swissprot(Geneva)

Page 25: Application Use Cases NIKHEF, Amsterdam, December 12, 13

Web portals for biologists

• Biologist enters sequences through web interface• Pipelined execution of bio-informatics algorithms

– Genomics comparative analysis (thousands of files of ~Gbyte)• Genome comparison takes days of CPU (~n**2)

– Phylogenetics– 2D, 3D molecular structure of proteins…

• The algorithms are currently executed on a local cluster– Big labs have big clusters …– But growing pressure on resources – Grid will help

• More and more biologists• compare larger and larger sequences (whole

genomes)…• to more and more genomes…• with fancier and fancier algorithms !!

Page 26: Application Use Cases NIKHEF, Amsterdam, December 12, 13

The Visual DataGrid Blast, a first genomics application on DataGrid

• A graphical interface to enter query sequences and select the reference database

• A script to execute the BLAST algorithm on the grid

• A graphical interface to analyze result

• Accessible from the web

portal genius.ct.infn.it

Page 27: Application Use Cases NIKHEF, Amsterdam, December 12, 13

Summary of added value provided by Grid for BioMed applications

• Data mining on genomics databases (exponential growth).

• Indexing of medical databases (Tb/hospital/year).

• Collaborative framework for large scale experiments (e.g. epidemiological studies).

• Parallel processing for– Databases analysis

– Complex 3D modelling

Page 28: Application Use Cases NIKHEF, Amsterdam, December 12, 13

Earth Observation (WP9)

Global Ozone (GOME) Satellite Data Processing and Validation by KNMI, IPSL and ESA

The DataGrid testbed provides a collaborative processing environment for 3 geographically distributed EO sites (Holland, France, Italy)

28