CERN IT Department
CH-1211 Geneva 23
Switzerland
www.cern.ch/it
The Experiment Dashboard
ISGC 2008
9-11th April 2008
Pablo Saiz, Julia Andreeva, Benjamin Gaidioz, Anastasia Ivanchecnko,
Gerhild Maier, Ricardo Rocha, Irina Sidirova
IT-GS-MND
Overview
• Dashboard structure
• Dashboard in production
– Job Monitoring
– Grid reliability
– Prodsys
– Data Management
– SAM
– FTS monitoring
– Site status board
• Future development
• Conclusions
ISGC 2008 -- [email protected] 2
Dashboard Framework
Web / HTTP Interface
– Multiple clients: CLI, web
– Multiple output formats: plain text, CSV, XML, XHTML
Data Access Layer (DAO)
– DB reading and writing via DAO layer
– Connection pooling
– Easy to add interface for a different backend
Agents
– Collectors of information
– Common configuration and management
Oracle DB
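The layering listed above can be illustrated with a minimal Python sketch. Every name in it (`JobDAO`, `InMemoryJobDAO`, `render`, the site names) is hypothetical and not taken from the Dashboard code base. The in-memory backend only stands in for a real database backend, to show how a different backend could sit behind the same interface while clients choose an output format.

```python
import csv
import io
from abc import ABC, abstractmethod


class JobDAO(ABC):
    """Hypothetical data-access interface; clients never touch the DB directly."""

    @abstractmethod
    def jobs_by_site(self, site):
        """Return a list of dicts describing the jobs run at `site`."""


class InMemoryJobDAO(JobDAO):
    """Illustrative backend; a database-backed class would implement the same method."""

    def __init__(self, rows):
        self._rows = list(rows)

    def jobs_by_site(self, site):
        return [row for row in self._rows if row["site"] == site]


def render(rows, fmt="plain"):
    """Serialise rows as plain text or CSV (two of the formats listed above)."""
    fields = ["job_id", "site", "status"]
    if fmt == "csv":
        buf = io.StringIO()
        writer = csv.DictWriter(buf, fieldnames=fields, lineterminator="\n")
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue()
    if fmt == "plain":
        return "\n".join(" ".join(str(row[f]) for f in fields) for row in rows)
    raise ValueError(f"unsupported format: {fmt}")


dao = InMemoryJobDAO([
    {"job_id": 1, "site": "SITE_A", "status": "done"},
    {"job_id": 2, "site": "SITE_B", "status": "failed"},
    {"job_id": 3, "site": "SITE_A", "status": "running"},
])
print(render(dao.jobs_by_site("SITE_A"), fmt="csv"))
```

Because callers depend only on `JobDAO`, swapping the backend would not require changes in the web or command-line clients in this sketch.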
Dashboard activities
Experiment Dashboard
Common applications (ALICE, ATLAS, CMS, LHCb, VleMed)
– Job monitoring
– Site reliability
Experiment specific applications
– Transfer monitoring for ALICE
– Data management monitoring for ATLAS
– Production monitoring for ATLAS and CMS (prototypes)
– IO rate monitoring between WN and SE (prototype)
– Accounting information from APEL and Gratia for ATLAS (prototype)
– Task monitoring for CMS analysis users (ATLAS on the way)
Integration and commissioning (CMS)
– Site availability based on the results of SAM tests
– Job Robot monitoring
Job Monitoring
Display all the jobs submitted by a VO
– Follow the status of the jobs
Collect information from different sources
– R-GMA, IC Real Time Monitor, BDII, MonALISA, …
Very useful for VO managers, site admins, users
Possibility to get the output in different formats
Deployed for ALICE, ATLAS, CMS, LHCb and VleMed
Site Reliability
Efficiency of the different sites
– Jobs and job attempts
List of most common errors
– And recipes for the solutions!
Generic application
Automatic generation of monthly reports
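The distinction between job efficiency and job-attempt efficiency can be made concrete with a small sketch. The record layout and the rule used here (a job counts as successful if at least one of its attempts succeeded) are assumptions made for illustration, not the Dashboard's documented definitions, and the sample data is invented.

```python
from collections import Counter, defaultdict

# Hypothetical records: (site, job_id, attempt_succeeded, error_message_or_None)
attempts = [
    ("SITE_A", "j1", False, "stage-in failed"),
    ("SITE_A", "j1", True, None),
    ("SITE_A", "j2", True, None),
    ("SITE_B", "j3", False, "proxy expired"),
    ("SITE_B", "j3", False, "proxy expired"),
]


def site_efficiency(records):
    """Return {site: (job_efficiency, attempt_efficiency)}."""
    ok_attempts = Counter()
    all_attempts = Counter()
    job_ok = defaultdict(dict)  # site -> {job_id: succeeded at least once}
    for site, job_id, succeeded, _error in records:
        all_attempts[site] += 1
        ok_attempts[site] += int(succeeded)
        job_ok[site][job_id] = job_ok[site].get(job_id, False) or succeeded
    return {
        site: (
            sum(job_ok[site].values()) / len(job_ok[site]),
            ok_attempts[site] / all_attempts[site],
        )
        for site in all_attempts
    }


def most_common_errors(records, n=3):
    """Rank the error messages of failed attempts by frequency."""
    return Counter(err for _s, _j, ok, err in records if not ok).most_common(n)


print(site_efficiency(attempts))
print(most_common_errors(attempts))
```

In this toy data SITE_A finishes every job but needs a resubmission to do so, which is why the two numbers differ for it.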
Production System
ATLAS Prodsys
Identify failing tasks and jobs
Evaluate the performance of the sites
Daily/weekly/monthly statistics
User guide
Data Management
Monitoring of T0 and the production system
Report of transfers to the different sites
Integrated with the ATLAS management system
Information on the clouds, sites, SEs and datasets
History of errors
FTS reliability
Daily report on the success of transfers
Drill down list of errors
Integrated in the ALICE environment
Extremely useful during the different ALICE challenges: PDC06, PDC07, CRC08
Working on making it generic
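A daily report with a drill-down list of errors could be organised as below. This is only a sketch under assumed inputs: the log layout, the dates and the error text are invented for illustration and do not reflect the actual FTS log format.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transfer log: (day, succeeded, error_message_or_None)
transfers = [
    (date(2008, 2, 1), True, None),
    (date(2008, 2, 1), False, "destination unreachable"),
    (date(2008, 2, 1), False, "destination unreachable"),
    (date(2008, 2, 2), True, None),
]


def daily_report(log):
    """Return {day: {"success_rate": float, "errors": {message: count}}}."""
    report = defaultdict(lambda: {"ok": 0, "total": 0, "errors": defaultdict(int)})
    for day, succeeded, error in log:
        entry = report[day]
        entry["total"] += 1
        if succeeded:
            entry["ok"] += 1
        else:
            entry["errors"][error] += 1
    return {
        day: {
            "success_rate": entry["ok"] / entry["total"],
            "errors": dict(entry["errors"]),
        }
        for day, entry in sorted(report.items())
    }


for day, summary in daily_report(transfers).items():
    print(day.isoformat(), summary)
```

The top level gives the per-day success rate; the nested `errors` mapping is what a user would reach when drilling down into a given day.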
SAM monitoring
Service Availability Monitoring
Clickable plots to drill down:
– Site availability
– Service availability
– Service tests
Links to the SAM results
At the moment, only for CMS
– ATLAS requested a similar interface
– Ongoing work to make it generic
Site Status Board
Table with status of the different sites for CMS
Easy definition of new ‘metrics’
– The ‘metrics’ can come from different sources
Links to more detailed information
At the moment, deployed for CMS
– It could be used by other VOs
Working on providing history
– And aggregation…
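One way to picture "easy definition of new metrics from different sources" is a registry of named callables, one per source. The sketch below is purely illustrative: the metric names, the values and the `build_board` helper are hypothetical and are not the Site Status Board's actual interface.

```python
# Hypothetical metric sources: each is a callable mapping a site name to a value.
def sam_availability(site):
    return {"SITE_A": 0.98, "SITE_B": 0.71}.get(site)


def running_jobs(site):
    return {"SITE_A": 120, "SITE_B": 4}.get(site)


# Defining a new metric amounts to registering one more (name, source) entry.
METRICS = {
    "SAM availability": sam_availability,
    "Running jobs": running_jobs,
}


def build_board(sites, metrics):
    """Return one row per site with one column per registered metric."""
    return [
        {"site": site, **{name: source(site) for name, source in metrics.items()}}
        for site in sites
    ]


for row in build_board(["SITE_A", "SITE_B"], METRICS):
    print(row)
```

Since the board only sees callables, each source can be backed by a different system without changing the table-building code.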
Experiment Dashboard plans
Include more data sources: condor_g, L&B
Security: X509 authentication
New applications:
– Pilot jobs
– Input collections
Improve existing applications
– Make the SAM interface generic
– More in-depth failure analysis
– User requests and suggestions
Integration with the GridMap technology
Conclusions
The Experiment Dashboard provides:
– Several monitoring applications
– Integration of information from different sources
– Multiple output formats: html, xml, csv, txt, …
Generic applications:
– Job Monitoring, Grid reliability
Experiment specific:
– DDM, ProdSys, Site Status Board, SAM, …
Used in production by multiple VO
User, installation and developer guides
http://dashboard.cern.ch