CERN IT Department CH-1211 Geneva 23 Switzerland www.cern.ch/ The Experiment Dashboard ISGC 2008 9-11 th April 2008 Pablo Saiz, Julia Andreeva, Benjamin Gaidioz, Anastasia Ivanchecnko, Gerhild Maier, Ricardo Rocha, Irina Sidirova IT-GS-MND




Overview

• Dashboard structure
• Dashboard in production
  – Job Monitoring
  – Grid reliability
  – Prodsys
  – Data Management
  – SAM
  – FTS monitoring
  – Site status board
• Future development
• Conclusions


Dashboard Framework

Web / HTTP Interface
– Multiple clients: cli, web
– Multiple output formats: plain text, csv, xml, xhtml

Data Access Layer (DAO)
– DB reading and writing via DAO layer
– Connection pooling
– Easy to add interface for a different backend

Agents
– Collectors of information
– Common configuration and management

Oracle DB
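The layering on this slide can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the names `JobDAO`, `SQLiteJobDAO` and `render` are hypothetical, SQLite stands in for the Oracle backend, and connection pooling is left out. It only shows the idea that clients go through a DAO interface and that the same data can be rendered in several output formats.

```python
import csv
import io
import sqlite3
from abc import ABC, abstractmethod
from xml.etree import ElementTree as ET

FIELDS = ("job_id", "site", "status")


class JobDAO(ABC):
    """Hypothetical data-access interface: clients never talk to the DB directly."""

    @abstractmethod
    def add_job(self, job_id, site, status): ...

    @abstractmethod
    def list_jobs(self): ...


class SQLiteJobDAO(JobDAO):
    """Stand-in backend (SQLite, not Oracle) to show that backends are swappable."""

    def __init__(self, path=":memory:"):
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS job (job_id TEXT, site TEXT, status TEXT)"
        )

    def add_job(self, job_id, site, status):
        self._conn.execute("INSERT INTO job VALUES (?, ?, ?)", (job_id, site, status))

    def list_jobs(self):
        rows = self._conn.execute("SELECT job_id, site, status FROM job ORDER BY job_id")
        return [dict(zip(FIELDS, row)) for row in rows]


def render(jobs, fmt):
    """Render the same rows as plain text, CSV or XML."""
    if fmt == "text":
        return "\n".join(" ".join(job[f] for f in FIELDS) for job in jobs)
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
        writer.writeheader()
        writer.writerows(jobs)
        return buf.getvalue()
    if fmt == "xml":
        root = ET.Element("jobs")
        for job in jobs:
            ET.SubElement(root, "job", attrib=job)
        return ET.tostring(root, encoding="unicode")
    raise ValueError(f"unsupported format: {fmt}")
```

A second backend would only need to implement `add_job` and `list_jobs`; `render` and any client code would stay unchanged.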


Dashboard activities

Experiment Dashboard

COMMON applications (ALICE, ATLAS, CMS, LHCb, VleMed)
– Job monitoring
– Site reliability

CMS integration and commissioning
– Site availability based on the results of SAM tests
– Job Robot monitoring

Experiment specific applications
– Transfer monitoring for ALICE
– Data management monitoring for ATLAS
– Production monitoring for ATLAS and CMS (prototypes)
– IO rate monitoring between WN and SE (prototype)
– Accounting information from Apel and Gratia for ATLAS (prototype)
– Task monitoring for CMS analysis users (ATLAS on the way)


Job Monitoring

Display all the jobs submitted by a VO
– Follow the status of the jobs

Collect information from different sources
– RGMA, IC Real Time Monitor, BDII, MonALISA, …

Very useful for VO managers, site admin, users

Possibility to get the output in different formats

Deployed for ALICE, ATLAS, CMS, LHCb and VleMed
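As a rough illustration of following job status with information from several sources, the sketch below merges status reports and keeps the most recent one per job. The `StatusReport` record, its fields and the "latest timestamp wins" rule are assumptions made for this example; the slides do not describe how the Dashboard actually reconciles its sources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatusReport:
    """Hypothetical record for one status message about one job."""
    job_id: str
    status: str
    timestamp: int  # seconds since the epoch
    source: str     # name of the reporting source, e.g. "MonALISA"


def latest_status(reports):
    """Return {job_id: StatusReport}, keeping the most recent report per job."""
    latest = {}
    for report in reports:
        current = latest.get(report.job_id)
        # Assumed rule: a strictly newer timestamp replaces the stored report.
        if current is None or report.timestamp > current.timestamp:
            latest[report.job_id] = report
    return latest


reports = [
    StatusReport("job-1", "submitted", 100, "RGMA"),
    StatusReport("job-1", "running", 160, "MonALISA"),
    StatusReport("job-2", "submitted", 120, "RGMA"),
    StatusReport("job-1", "scheduled", 130, "RGMA"),  # arrives late, is ignored
]
current = latest_status(reports)
```

Here `current["job-1"].status` is `"running"`, because the late report from the second source carries an older timestamp.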

[Job Monitoring, continued: two slides with no transcribed text]

Site Reliability

Efficiency of the different sites
– Jobs and Job Attempts

List of most common errors
– And recipes to the solutions!!

Generic application

Automatic generation of monthly reports
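The difference between job-level and attempt-level efficiency can be made concrete with a small sketch. The definitions used here are assumptions, since the slide does not spell them out: attempt efficiency is the fraction of successful attempts, and a job counts as successful if at least one of its attempts succeeded. The function name and the input tuple layout are hypothetical.

```python
from collections import defaultdict


def site_efficiency(attempts):
    """attempts: iterable of (site, job_id, succeeded) tuples, one per attempt."""
    attempt_ok = defaultdict(int)
    attempt_total = defaultdict(int)
    job_ok = defaultdict(dict)  # site -> {job_id: True if any attempt succeeded}
    for site, job_id, succeeded in attempts:
        attempt_total[site] += 1
        attempt_ok[site] += int(succeeded)
        job_ok[site][job_id] = job_ok[site].get(job_id, False) or succeeded
    return {
        site: {
            "attempt_efficiency": attempt_ok[site] / attempt_total[site],
            "job_efficiency": sum(job_ok[site].values()) / len(job_ok[site]),
        }
        for site in attempt_total
    }


attempts = [
    ("SiteA", "j1", False),  # first attempt fails
    ("SiteA", "j1", True),   # resubmission succeeds
    ("SiteA", "j2", False),
    ("SiteB", "j3", True),
]
efficiency = site_efficiency(attempts)
```

With this input, SiteA has a job efficiency of 1/2 but an attempt efficiency of only 1/3, which is why the two views are worth reporting separately.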

[Site reliability, continued: one slide with no transcribed text]

Production System

ATLAS Prodsys

Identify failing tasks and jobs

Evaluate the performance of the sites

Daily/weekly/monthly statistics

User guide

[Production System, continued: two slides with no transcribed text]

Data Management

Monitoring of T0 and the production system

Report of transfers to the different sites

Integrated with the ATLAS management system

Information on the clouds, sites, SEs and datasets

History of errors

[Data Management, continued: two slides with no transcribed text]

FTS reliability

Daily report on the success of transfers

Drill-down list of errors

Integrated in the ALICE environment

Extremely useful during the different ALICE challenges: PDC06, PDC07, CRC08

Working on making it generic
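A daily success report with a drill-down list of errors could be computed along the following lines. This is only a sketch under assumed inputs: each transfer is given as a `(day, error)` pair in which `error` is `None` for a successful transfer, and the function name `daily_report` is hypothetical. It does not reflect the real FTS data model.

```python
from collections import Counter, defaultdict
from datetime import date


def daily_report(transfers):
    """transfers: iterable of (day, error); error is None when the transfer succeeded."""
    per_day = defaultdict(lambda: {"total": 0, "ok": 0, "errors": Counter()})
    for day, error in transfers:
        entry = per_day[day]
        entry["total"] += 1
        if error is None:
            entry["ok"] += 1
        else:
            entry["errors"][error] += 1
    return {
        day: {
            "success_rate": entry["ok"] / entry["total"],
            # Most frequent errors first: the list a user would drill down into.
            "top_errors": entry["errors"].most_common(),
        }
        for day, entry in sorted(per_day.items())
    }


transfers = [
    (date(2008, 4, 9), None),
    (date(2008, 4, 9), "timeout"),
    (date(2008, 4, 9), "timeout"),
    (date(2008, 4, 9), "no space left"),
    (date(2008, 4, 10), None),
]
report = daily_report(transfers)
```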

[FTS reliability, continued: one slide with no transcribed text]

SAM monitoring

Service Availability Monitoring

Clickable plots to drill down:
– Site availability
– Service availability
– Service tests

Links to the SAM results

At the moment, only for CMS
– ATLAS requested a similar interface
– Ongoing work to make it generic
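The three drill-down levels (site, service, test) can be pictured as a nested structure that is aggregated upwards. The rules below are purely illustrative assumptions, not the SAM availability algorithm, which the slides do not describe: a service is treated as available when all of its tests pass, and site availability is the fraction of available services. All function names are hypothetical.

```python
def service_available(test_results):
    """test_results: {test_name: passed}. Assumed rule: every test must pass."""
    return all(test_results.values())


def site_availability(services):
    """services: {service_name: {test_name: passed}}. Fraction of available services."""
    if not services:
        return 0.0
    available = sum(service_available(tests) for tests in services.values())
    return available / len(services)


def drill_down(sites, site=None, service=None):
    """Return the view for the requested level: all sites, one site, or one service."""
    if site is None:
        return {name: site_availability(services) for name, services in sites.items()}
    if service is None:
        return {name: service_available(tests) for name, tests in sites[site].items()}
    return sites[site][service]


sites = {
    "SiteA": {
        "CE": {"job-submit": True, "software-area": True},
        "SE": {"put": True, "get": False},
    },
    "SiteB": {"CE": {"job-submit": True}},
}
overview = drill_down(sites)
```

Calling `drill_down(sites, "SiteA")` then shows which service is responsible for the lower availability, and `drill_down(sites, "SiteA", "SE")` shows the individual test results.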

[SAM monitoring, continued: two slides with no transcribed text]

Site Status Board

Table with status of the different sites for CMS

Easy definition of new ‘metrics’
– The ‘metrics’ can come from different sources

Links to more detailed information

At the moment, deployed for CMS
– It could be used by other VOs

Working on providing history
– And aggregation…
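One way to picture "easy definition of new metrics" is a table built from a registry of metric functions, where each function may read from a different source. The sketch below is an assumption about the general idea, not the Site Status Board implementation; `build_board`, the metric names and the sample values are all hypothetical.

```python
def build_board(sites, metrics):
    """metrics: {column_name: callable(site) -> value}. Returns one row per site."""
    rows = []
    for site in sites:
        row = {"site": site}
        for name, fetch in metrics.items():
            row[name] = fetch(site)
        rows.append(row)
    return rows


# Hypothetical data sources, standing in for real monitoring feeds.
sam_availability = {"T1_A": 0.98, "T2_B": 0.75}
running_jobs = {"T1_A": 1200, "T2_B": 40}

metrics = {
    "sam_availability": lambda site: sam_availability.get(site),
    "running_jobs": lambda site: running_jobs.get(site, 0),
}
board = build_board(["T1_A", "T2_B"], metrics)

# Defining a new metric is one more entry in the registry.
metrics["busy"] = lambda site: running_jobs.get(site, 0) > 100
board_with_new_metric = build_board(["T1_A", "T2_B"], metrics)
```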

[Site Status Board, continued: two slides with no transcribed text]

Experiment Dashboard plans

Include more data sources: condor_g, L&B

Security: X509 authentication

New applications:
– Pilot jobs
– Input collections

Improve existing applications
– Make the SAM interface generic
– More in-depth failure analysis
– User requests and suggestions

Integration with the GridMap technology

Conclusions

The Experiment Dashboard provides:
– Several monitoring applications
– Integration of information from different sources
– Multiple output formats: html, xml, csv, txt, …

Generic applications:
– Job Monitoring, Grid reliability

Experiment specific:
– DDM, ProdSys, Site Status Board, SAM, …

Used in production by multiple VOs

User, installation and developer guides

http://dashboard.cern.ch