Web application for detailed real-time database transaction monitoring for CMS condition data

ICCMSE 2009: The 7th International Conference of Computational Methods for Science and Engineering, Friday, October 2

Salvatore Di Guida, Michele de Gruttola, Vincenzo Innocente, Antonio Pierro




Outline

• What are CMS condition data?
• How are CMS condition data handled by PopCon?
• What is PopCon monitoring?
• Why PopCon monitoring?
• GUI:
  – PopCon from different users’ perspectives;
  – examples of different users’ perspectives and different reports (table, error, chart).
• Architecture.
• Results.
• Upgrades and improvements.


What are condition data?

• Configuration data: needed to bring CMS into running mode:
  – voltage settings of power supplies;
  – parameters for front-end electronics.
• Condition data: describing the state of any detector sub-system:
  – high and low voltages;
  – magnet currents;
  – needed online for post-mortem analysis of detector errors and for the HLT (High-Level Trigger), and offline for data quality monitoring and proper event reconstruction.
• Calibration data: describing the calibration of the different sub-detectors, mainly evaluated offline:
  – pedestal offsets;
  – drift velocities;
  – alignments;
  – needed online for the HLT, and offline for properly reconstructing the physical quantities of collision events.


What is PopCon?

• PopCon (Populator of Condition Objects tool):
  – is an application package fully integrated into the overall CMS software framework (CMSSW), intended to transfer, store, and retrieve condition data in the offline databases;
  – assigns metadata information (tag and IOV, Interval Of Validity).
• CMS relies on three ORACLE databases for the condition data:


  – OMDS (Online Master Database System);
  – ORCON (Offline Reconstruction Condition DB Online System);
  – ORCOFF (Offline Reconstruction Condition Database Offline System).

(Diagram: the OMDS, ORCON and ORCOFF databases linked by PopCon; CMS, Compact Muon Solenoid.)
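To make the tag and IOV metadata concrete, here is a minimal Python sketch; it is not the CMSSW or PopCon API. It assumes run-number IOVs, each identified only by its first valid run ("since") and lasting until the next IOV begins, with the last one open-ended. The `Tag` class, its methods and the payload names are hypothetical.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass
class Tag:
    """Hypothetical condition tag: an ordered list of run-based IOVs.

    IOV i starts at run sinces[i] and lasts until the next IOV starts;
    the last IOV is open-ended.
    """
    name: str
    sinces: list[int] = field(default_factory=list)
    payloads: list[str] = field(default_factory=list)

    def append_iov(self, since: int, payload: str) -> None:
        # Keep the IOV sequence strictly ordered: only append after the last "since".
        if self.sinces and since <= self.sinces[-1]:
            raise ValueError(f"since {since} does not follow {self.sinces[-1]}")
        self.sinces.append(since)
        self.payloads.append(payload)

    def payload_for_run(self, run: int) -> str:
        # Index of the last IOV whose "since" is <= run.
        i = bisect_right(self.sinces, run) - 1
        if i < 0:
            raise KeyError(f"no IOV of tag {self.name} covers run {run}")
        return self.payloads[i]


tag = Tag("SiPixelGainCalibrationHLT_2009runs_hlt")
tag.append_iov(1, "gains_v1")        # hypothetical payload identifiers
tag.append_iov(111740, "gains_v2")
print(tag.payload_for_run(111739))   # gains_v1
print(tag.payload_for_run(120000))   # gains_v2
```

A lookup is a binary search over the sorted "since" values, so finding the payload valid for a run costs O(log n) in the number of IOVs.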


PopCon UML Diagram


Central Population of Condition Databases

• Centralized procedure using an account and a dedicated machine in the online network, where a set of automatic jobs was deployed:


  – to populate the ORCON accounts of each sub-detector;
  – to monitor all transactions towards them.


Central Population of Condition Databases

• Two possibilities for each sub-detector:
  – automatically running the so-called O2O (online-to-offline) application, which reads from any online source, assigns tag and IOV, and uploads the data to the dedicated ORCON account (condition data);
  – the Dropbox (calibration data): users copy data in SQLite format into a dedicated folder, and these data are then automatically exported to the sub-detector’s ORCON account.
• PopCon transfers the data into the DB accounts:
  – it creates log information, which is stored in a DB account.
• A watchdog monitors the status of the automatic jobs:
  – the monitoring information is stored in the DB.
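The logging and watchdog duties above can be sketched as follows, under stated assumptions: an in-memory SQLite database stands in for the Oracle log account, and the `popcon_log` table, its columns and the job names are hypothetical. The watchdog flags every job whose most recent log entry is older than its expected period, including jobs that never logged anything.

```python
import sqlite3
from datetime import datetime, timedelta

# In-memory SQLite stands in for the Oracle log account (assumption);
# table and column names are hypothetical.
SCHEMA = """
CREATE TABLE popcon_log (
    job_name  TEXT NOT NULL,
    status    TEXT NOT NULL CHECK (status IN ('committed', 'aborted', 'pending')),
    logged_at TEXT NOT NULL
)
"""


def log_transaction(conn: sqlite3.Connection, job_name: str, status: str, when: datetime) -> None:
    """Record one transaction; all timestamps share one ISO format so MAX() orders them."""
    conn.execute(
        "INSERT INTO popcon_log (job_name, status, logged_at) VALUES (?, ?, ?)",
        (job_name, status, when.isoformat(timespec="seconds")),
    )


def overdue_jobs(conn: sqlite3.Connection, expected_period: dict, now: datetime) -> list:
    """Watchdog check: jobs with no log entry within their expected period.

    expected_period maps a job name to the timedelta between two runs of that job;
    jobs that never logged anything are reported too.
    """
    latest = dict(conn.execute(
        "SELECT job_name, MAX(logged_at) FROM popcon_log GROUP BY job_name"
    ))
    late = []
    for job, period in sorted(expected_period.items()):
        last = latest.get(job)
        if last is None or now - datetime.fromisoformat(last) > period:
            late.append(job)
    return late


conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)
now = datetime(2009, 10, 2, 12, 0)
log_transaction(conn, "ecal_o2o", "committed", now - timedelta(minutes=30))
log_transaction(conn, "rpc_o2o", "aborted", now - timedelta(hours=3))
hourly = timedelta(hours=1)
print(overdue_jobs(conn, {"ecal_o2o": hourly, "rpc_o2o": hourly, "dt_o2o": hourly}, now))
```

Here `dt_o2o` is reported because it never logged, and `rpc_o2o` because its last entry is three hours old.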


Offline Dropbox


An infrastructure that, using web applications running in virtual machines, allows the export of calibration data to the offline databases:
• The user uploads SQLite files containing calibration data and fills in the metadata information.
• An automatic HTTP request obtains the next run to be processed at PromptReco level.
• The calibration data are exported to ORCON through SSH tunnelling, then streamed offline to ORCOFF.
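A Dropbox front end has to reject incomplete metadata before anything is exported. The sketch below checks a metadata record using the field names that appear in the PopCon log excerpt shown later in this talk (destDB, inputtag, tag, and the starting run). The function name, the exact rules and, in particular, the check against the next PromptReco run are assumptions for illustration, not the actual Dropbox logic.

```python
REQUIRED_FIELDS = ("destDB", "tag", "inputtag", "since")


def validate_dropbox_request(metadata: dict, next_prompt_reco_run: int) -> list:
    """Return the problems found in a hypothetical Dropbox metadata record.

    An empty list means the SQLite file could be queued for export to ORCON.
    """
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS
                if metadata.get(name) in (None, "")]
    dest = metadata.get("destDB")
    if dest and not dest.startswith("oracle://"):
        problems.append("destDB is not an oracle:// connection string")
    since = metadata.get("since")
    if since not in (None, "") and not isinstance(since, int):
        problems.append("since must be an integer run number")
    elif isinstance(since, int) and since < next_prompt_reco_run:
        # Assumption: runs before the next PromptReco run are already reconstructed,
        # so an IOV starting earlier would not be applied to them.
        problems.append(f"since {since} precedes next PromptReco run {next_prompt_reco_run}")
    return problems


request = {
    "destDB": "oracle://cms_orcon_prod/CMS_COND_31X_PIXEL",
    "tag": "SiPixelGainCalibrationHLT_2009runs_hlt",
    "inputtag": "GainCalib_TEST_hlt",
    "since": 111740,
}
print(validate_dropbox_request(request, next_prompt_reco_run=111740))
```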


What is PopCon monitoring?

• An open-source, web-based service for heterogeneous DB servers performing large data transfers, providing hardware and software monitoring:
  – DB status and history of all DB transactions:
    • aborted, committed, pending;
  – error monitoring reports:
    • identify any mistakes made by users, application failures, unexpected network shutdowns, etc.;
  – reports from different users’ perspectives:
    • personal views for the Oracle database administrator, the CMS detector manager, the CMS sub-detector managers, and end users.
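Counting transactions by status for each DB account is the basic aggregation behind the status history described above. A minimal Python sketch, assuming each transaction is reduced to an (account, status) pair; the record shape and the `CMS_COND_31X_RPC` account name are hypothetical.

```python
from collections import Counter, defaultdict
from enum import Enum


class Status(Enum):
    COMMITTED = "committed"
    ABORTED = "aborted"
    PENDING = "pending"


def summarize(transactions):
    """Count transactions per DB account and status.

    transactions: iterable of (account, Status) pairs, a hypothetical record shape.
    Returns {account: Counter({Status: count})}; a missing status counts as 0.
    """
    summary = defaultdict(Counter)
    for account, status in transactions:
        summary[account][status] += 1
    return dict(summary)


history = [
    ("CMS_COND_31X_PIXEL", Status.COMMITTED),
    ("CMS_COND_31X_PIXEL", Status.ABORTED),
    ("CMS_COND_31X_PIXEL", Status.COMMITTED),
    ("CMS_COND_31X_RPC", Status.PENDING),  # hypothetical account name
]
for account, counts in sorted(summarize(history).items()):
    print(account, {s.value: counts[s] for s in Status})
```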


Why PopCon Monitoring?

• We could use an existing web monitoring tool for our purpose, but we need to fulfil the challenging requirements of the CMS experiment:
  – use of CMSSW standards:
    • a generic CMSSW component, so that developers and end users feel comfortable building and using new packages in CMSSW;
  – monitoring of the heterogeneous software environment:
    • Oracle DBs, the CMSSW framework, and other open-source packages;
  – an open-source product;
  – CERN participation in Oracle Technology Beta Programs:
    • we need a flexible architecture to handle unexpected errors;
  – maximizing performance:
    • stress tests of the CMSSW infrastructure and hardware components;
    • avoiding bottlenecks due to huge data access (historical and current data).


PopCon from different users’ perspectives


• The ORACLE DB administrator and PopCon developer: log inspection for deep scans, security checks, and performance issues.
• The central CMS detector manager: overview and full report for all detector subsystems.
• The CMS sub-detector manager: overview and full report of the sub-detector, to check all transactions done in a dedicated account.
• End users: personal reports and self-monitoring trends, to check the status of their own jobs.
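The four perspectives amount to different filters over the same transaction history. The sketch below encodes one plausible set of rules (end users see their own jobs, a sub-detector manager sees one dedicated account, the DBA and the central detector manager see everything); the classes, fields and rules are assumptions, not the tool's actual access control.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Role(Enum):
    DBA = auto()                  # ORACLE DB administrator and PopCon developer
    DETECTOR_MANAGER = auto()     # central CMS detector manager
    SUBDETECTOR_MANAGER = auto()  # CMS sub-detector manager
    END_USER = auto()


@dataclass(frozen=True)
class Transaction:
    user: str
    account: str
    status: str


@dataclass(frozen=True)
class Viewer:
    role: Role
    user: str = ""     # used by END_USER
    account: str = ""  # used by SUBDETECTOR_MANAGER


def visible(viewer: Viewer, transactions):
    """Transactions shown in a viewer's report (hypothetical access rules)."""
    if viewer.role is Role.END_USER:
        return [t for t in transactions if t.user == viewer.user]
    if viewer.role is Role.SUBDETECTOR_MANAGER:
        return [t for t in transactions if t.account == viewer.account]
    # The DBA and the central detector manager see every subsystem.
    return list(transactions)
```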


GUI (I) – Table Reporting

• Recent activity recorded from the point of view of the RPC (Resistive Plate Chambers) sub-detector manager:
  – a general view of the latest transactions towards a DB account, useful to keep track of all new data transfers for a specific sub-detector.
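A recent-activity table like the RPC view can be produced by filtering on the account, sorting by time and keeping the newest rows. This Python sketch assumes transactions are (timestamp, account, tag, status) tuples; the account and tag names are hypothetical.

```python
from datetime import datetime


def recent_activity(transactions, account, limit=5):
    """Latest `limit` transactions towards one DB account, newest first, as a text table.

    transactions: iterable of (timestamp, account, tag, status) tuples (hypothetical shape).
    """
    rows = sorted((t for t in transactions if t[1] == account),
                  key=lambda t: t[0], reverse=True)[:limit]
    lines = [f"{'time':<16}  {'tag':<24}  status"]
    for when, _account, tag, status in rows:
        lines.append(f"{when:%Y-%m-%d %H:%M}  {tag:<24}  {status}")
    return "\n".join(lines)


log = [
    (datetime(2009, 10, 2, 9, 0), "CMS_COND_31X_RPC", "tag_a", "committed"),
    (datetime(2009, 10, 2, 10, 0), "CMS_COND_31X_RPC", "tag_b", "aborted"),
    (datetime(2009, 10, 2, 11, 0), "CMS_COND_31X_PIXEL", "tag_c", "committed"),
    (datetime(2009, 10, 2, 8, 0), "CMS_COND_31X_RPC", "tag_d", "committed"),
]
print(recent_activity(log, "CMS_COND_31X_RPC", limit=2))
```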


GUI (II) – Error reporting

• The central CMS detector administrator’s error reporting view:
  – a general view of DB transaction status, useful to identify the different running jobs and to spot problems in DB transactions quickly.


(Screenshot annotation: the data transaction at 11:00 is missing.)
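Spotting a missing transaction, such as the one at 11:00, is a gap check against the expected schedule. A minimal sketch, assuming an hourly job and a 10-minute tolerance (both hypothetical parameters, not values from the tool):

```python
from datetime import datetime, timedelta


def missing_slots(timestamps, start, end,
                  period=timedelta(hours=1), tolerance=timedelta(minutes=10)):
    """Scheduled times in [start, end] with no transaction within `tolerance`.

    Assumes the job is expected every `period`, starting at `start`.
    """
    missing = []
    slot = start
    while slot <= end:
        if not any(abs(ts - slot) <= tolerance for ts in timestamps):
            missing.append(slot)
        slot += period
    return missing


day = datetime(2009, 10, 2)
seen = [day.replace(hour=9, minute=2), day.replace(hour=10, minute=1), day.replace(hour=12, minute=5)]
print(missing_slots(seen, day.replace(hour=9), day.replace(hour=12)))
```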


GUI (III) – Error reporting

• The ORACLE DB administrator’s error reporting view:
  – a log report displaying information about primary-key violations and inconsistencies in the mapping between data members of C++ objects and schema objects.


  destDB: oracle://cms_orcon_prod/CMS_COND_31X_PIXEL, inputtag: GainCalib_TEST_hlt, tag: SiPixelGainCalibrationHLT_2009runs_hlt, from 111740 to , user comment: craft09gains2
  logDB: oracle://cms_orcon_prod/CMS_COND_31X_POPCONLOG
  CORAL/RelationalPlugins/oracle Error ORA-00001: unique constraint (CMS_COND_31X_PIXEL.METADATA_PK) violated (Executing the statement)
  error ---- Conditions BEGIN
  addMapping: metadata entry "SiPixelGainCalibrationHLT_2009runs_hlt" already exists
  ---- Conditions END
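Because the error view is built from log text, a first triage step is to extract the transaction fields and the Oracle error code. The regular expressions below are inferred from this one excerpt only and are an assumption, not a documented PopCon log format; ORA-00001 is Oracle's standard code for a unique-constraint violation.

```python
import re

# Layout inferred from the single log excerpt above (assumption, not a PopCon specification).
HEADER = re.compile(
    r"destDB: (?P<destDB>[^,\s]+), inputtag: (?P<inputtag>[^,\s]+), "
    r"tag: (?P<tag>[^,\s]+), from (?P<since>\d+) to (?P<till>\d*)"
)
ORA_ERROR = re.compile(r"(?P<code>ORA-\d{5}): (?P<message>[^\n]*)")
KNOWN_ORA = {"ORA-00001": "unique constraint violated (duplicate key)"}


def parse_popcon_log(text: str) -> dict:
    """Extract the transaction fields and the Oracle error, if any, from a log entry."""
    info = {}
    header = HEADER.search(text)
    if header:
        info.update(header.groupdict())
    error = ORA_ERROR.search(text)
    if error:
        info["ora_code"] = error.group("code")
        info["ora_message"] = error.group("message").strip()
        info["diagnosis"] = KNOWN_ORA.get(error.group("code"), "unclassified Oracle error")
    return info


SAMPLE = (
    "destDB: oracle://cms_orcon_prod/CMS_COND_31X_PIXEL, inputtag: GainCalib_TEST_hlt, "
    "tag: SiPixelGainCalibrationHLT_2009runs_hlt, from 111740 to , user comment: craft09gains2\n"
    "CORAL/RelationalPlugins/oracle Error ORA-00001: unique constraint "
    "(CMS_COND_31X_PIXEL.METADATA_PK) violated (Executing the statement)\n"
)
print(parse_popcon_log(SAMPLE))
```

An empty "till" field corresponds to the open-ended IOV ("from 111740 to ,") in the excerpt.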


GUI (IV) – Chart reporting

• PopCon activity history:
  – a multi-line chart of DB transactions, giving the central CMS detector manager and the ORACLE DB administrator a general view of DB usage.


(Chart: number of DB transactions versus date.)
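The multi-line chart needs one series per DB account on a common date axis. A minimal sketch, assuming transactions arrive as (timestamp, account) pairs (a hypothetical shape); days without activity are filled with zero so the lines stay aligned.

```python
from collections import Counter
from datetime import datetime, timedelta


def daily_series(transactions):
    """Number of DB transactions per account and day, one series per chart line.

    transactions: iterable of (timestamp, account) pairs (hypothetical shape).
    Days without transactions are filled with 0 so every line shares the same x axis.
    """
    counts = Counter((ts.date(), account) for ts, account in transactions)
    if not counts:
        return {}
    first = min(day for day, _ in counts)
    last = max(day for day, _ in counts)
    days = [first + timedelta(days=i) for i in range((last - first).days + 1)]
    accounts = sorted({account for _, account in counts})
    return {a: [(d, counts[(d, a)]) for d in days] for a in accounts}


history = [
    (datetime(2009, 9, 1, 10), "CMS_COND_31X_PIXEL"),
    (datetime(2009, 9, 1, 12), "CMS_COND_31X_PIXEL"),
    (datetime(2009, 9, 3, 9), "CMS_COND_31X_RPC"),  # hypothetical account name
]
for account, points in daily_series(history).items():
    print(account, [count for _, count in points])
```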


Architecture


Results

• Since the transaction status web monitoring was introduced:
  – the percentage of failed transactions has decreased from 28.9% to 15.2%;
  – the peak in January is due to the introduction of the new tool, with which many users were not yet familiar.


(Chart: percentage of failed transactions per month, from August to the following August, on a 0 to 40% scale, with the introduction of transaction status monitoring marked.)
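The failure percentage can be computed per month from the transaction statuses; the drop from 28.9% to 15.2% corresponds to a relative reduction of (28.9 - 15.2) / 28.9, about 47%. In the sketch below, excluding pending transactions from the denominator is an assumption, since the slides do not say how they were counted; the record shape is hypothetical.

```python
from collections import defaultdict


def monthly_failure_percentage(transactions):
    """Percentage of aborted transactions per month.

    transactions: iterable of ((year, month), status) pairs with status in
    {"committed", "aborted", "pending"}; pending ones are skipped (assumption).
    """
    tally = defaultdict(lambda: [0, 0])  # month -> [aborted, finished]
    for month, status in transactions:
        if status == "pending":
            continue
        tally[month][1] += 1
        if status == "aborted":
            tally[month][0] += 1
    return {m: 100.0 * aborted / finished
            for m, (aborted, finished) in sorted(tally.items())}


def relative_reduction(before: float, after: float) -> float:
    """Relative drop between two failure percentages."""
    return (before - after) / before


print(round(relative_reduction(28.9, 15.2), 3))  # 0.474
```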


Upgrades and improvements

• Storing and monitoring logs of quota information for the online account:
  – DB back end set up; web interface ready to be deployed.
• An SMS/email alert system, notifying end users of transaction failures and the DBA/developers of hardware or network problems.
• Automatic error resolution in a heterogeneous software environment:
  – see Antonio’s talk.
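The planned alert system routes each event to a different audience. The sketch below only builds the e-mail with Python's standard `email.message.EmailMessage`; actually sending it (for example via `smtplib`) and any SMS gateway are left out, and the routing table, event kinds and addresses are hypothetical.

```python
from email.message import EmailMessage

# Hypothetical routing: which audience is alerted for each kind of event.
ROUTES = {
    "transaction_failure": "end_user",
    "hardware": "dba_developer",
    "network": "dba_developer",
}


def build_alert(kind: str, details: str, contacts: dict,
                sender: str = "popcon-monitoring@example.org") -> EmailMessage:
    """Build, but do not send, an alert e-mail for a monitoring event."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = contacts[ROUTES[kind]]
    msg["Subject"] = f"[PopCon monitoring] {kind.replace('_', ' ')}"
    msg.set_content(details)
    return msg


contacts = {"end_user": "user@example.org", "dba_developer": "dba@example.org"}
alert = build_alert("network", "No response from the online DB host.", contacts)
print(alert["To"], "|", alert["Subject"])
```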
