First year experience with the ATLAS online monitoring framework

Alina Corso-Radu, University of California Irvine
on behalf of the ATLAS TDAQ Collaboration

CHEP 2009, March 23rd-27th, Prague



2

Outline

ATLAS trigger and data acquisition system at a glance
Online monitoring framework
Readiness of the online monitoring for runs with cosmic rays and first LHC beam
Conclusions

33

High Level TriggersSoftware based

ATLAS Trigger/DAQATLAS Trigger/DAQ

Data Storage

Event Filter~200 Hz

Event Builder (EB)

LVL2 Trigger~3 kHz

Read Out Systems (ROSs)

Pixel TileCal LAr MDTSCT TRT

CalorimeterInner DetectorMuon

Spectrometer

RPC TGCCSC LVL1 Trigger<100 kHz

Interaction rate ~1 GHzBunch crossing rate 40 MHz

•Coarse granularity data•Calorimeter and Muon based•Identifies Regions of Interest

•Partial event reconstruction in Regions of Interest•Full granularity data•Trigger algorithms optimized for fast rejection

•Full event reconstruction seeded by LVL2•Trigger algorithms similar to offline

900 farm nodes1/3 of final system

150 PC

~100 PC

Hardware based

4

Online monitoring framework

Analyze event content and produce histograms
Analyze operational conditions of hardware and software detector elements, trigger and data acquisition systems
Automatic checks of histogram and operational data; visualize and save results; produce visual alerts
Set of tools to visualize information, aimed at the shifters
Automatic archiving of histograms; monitoring data available remotely

[Diagram: event samples and operational data flow from DataFlow (ROD/LVL1/HLT) into the Information Service, which feeds the Event Analysis Frameworks, Data Quality Analysis Framework, Data Monitoring Archiving Tools, Visualization Tools and Web Service]

About 35 dedicated machines
Complexity and diversity in terms of monitoring needs of the sub-systems
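
To make the flow concrete, here is a minimal sketch of a monitoring task filling a histogram and publishing updates into the chain above; InfoService and publishHistogram are hypothetical stand-ins for illustration, not the real TDAQ Information Service API.

```cpp
// Illustrative sketch only: a monitoring task fills a histogram from sampled
// events and pushes updates towards the visualization/archiving chain.
// InfoService and publishHistogram are hypothetical stand-ins.
#include <cstdio>
#include <string>

#include "TH1F.h"  // ROOT histogram

class InfoService {
public:
    void publishHistogram(const std::string& name, const TH1F& h) {
        // Real code would serialize the histogram into IS; here we just log.
        std::printf("publish %s (%.0f entries)\n", name.c_str(), h.GetEntries());
    }
};

void monitorEvents(InfoService& is) {
    TH1F clusterEnergy("clusterEnergy", "Cluster energy;E [GeV];entries",
                       100, 0.0, 500.0);
    clusterEnergy.Fill(42.0);  // placeholder for per-event analysis
    is.publishHistogram("Example/clusterEnergy", clusterEnergy);
}
```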

5

Data Quality Monitoring Framework

[Diagram: DQMF pulls histograms from the Information Service (fed by the Event Analysis Frameworks and event samplers on DataFlow: ROD/LVL1/HLT) through its Input Interface, reads its Configuration from the Configuration DB through the Configuration Interface, and sends DQResults through the Output Interface to the Data Quality monitoring display and the Conditions DB]

Distributed framework that provides the mechanism to execute automatic checks on histograms and to produce results according to a particular user configuration
Input and Output classes can be provided as plug-ins; custom plug-ins are supported
About 40 predefined algorithms exist (histogram empty, mean values, fits, reference comparison, etc.); custom algorithms are allowed
Writes DQResults automatically to the Conditions Database
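
A minimal sketch of what such a custom check algorithm can look like, assuming hypothetical DQStatus/DQResult types in place of the real DQMF interfaces:

```cpp
// Sketch of a custom DQ check; DQStatus and DQResult are hypothetical
// stand-ins for the framework's result types.
#include <cmath>
#include <string>

#include "TH1.h"  // ROOT histogram base class

enum class DQStatus { Green, Yellow, Red };

struct DQResult {
    DQStatus status;
    std::string message;
};

// Custom algorithm: flag a histogram whose mean drifts beyond thresholds.
class MeanWithinThresholds {
public:
    MeanWithinThresholds(double warn, double error)
        : warn_(warn), error_(error) {}

    DQResult check(const TH1& h) const {
        if (h.GetEntries() == 0)
            return {DQStatus::Red, "histogram is empty"};
        const double mean = std::abs(h.GetMean());
        if (mean > error_) return {DQStatus::Red, "mean outside error band"};
        if (mean > warn_)  return {DQStatus::Yellow, "mean outside warning band"};
        return {DQStatus::Green, "OK"};
    }

private:
    double warn_;
    double error_;
};
```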

66

DQM DisplayDQM Display Summary panel shows overall

color-coded DQ status produced by DQMF per sub-system

Run Control conditions Log information

Details panel offers access to the detailed monitored information Checked histograms and

their references Configuration information

(algorithms, thresholds, etc.)

History tab displays time evolution of DQResults.

Details panel offers access to the detailed monitored information Checked histograms and

their references Configuration information

(algorithms, thresholds, etc.)

History tab displays time evolution of DQResults.

About 17 thousands histograms checked

Shifter attention focused on bad histograms

7

DQM Display - layouts

DQM Display allows for a graphical representation of the sub-systems and their components using detector-like pictorial views
Bad histograms are spotted even faster
Expert tool DQM Configurator for editing the configuration, aimed at layouts and shapes:
  starting from an existing configuration, one can attach layouts and shapes
  these layouts are created and displayed online the same way they will show in the DQM Display
  experts can tune layout/shape parameters until they look as required

8

Online Histogram Presenter

Main shifter tool for checking histograms manually
Supports a hierarchy of tabs which contain predefined sets of histograms
Reference histograms can be displayed as well
Sub-systems normally have several tabs with the most important histograms that have to be watched

9

Trigger Presenter

Presents trigger-specific information in a user-friendly way: trigger rates, trigger chains information, HLT farms status
Reflects the status of HLT sub-farms using DQMF color codes
Implemented as an OHP plug-in

10

Histogram Archiving

Almost 100 thousand histograms are currently saved at the end of a run (~200 MB per run)
Reads histograms from IS according to the given configuration and saves them to ROOT files
Registers the ROOT files with the Collection and Cache service
Accumulates files into large (ZIP) archives and sends them to CDR
Archiving is done asynchronously with respect to the Run states/transitions
Archived histograms can be browsed as well by a dedicated tool (Archive Browser)

[Diagram: Information Service -> Monitoring Data Archiving -> ROOT files -> Collection and Cache -> ZIP -> CDR, with the Archive Browser reading back]
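
A minimal sketch of the save step, assuming plain ROOT I/O; the real archiving service additionally reads the histograms from IS, registers the files with the Collection and Cache service, and ships zipped archives to CDR.

```cpp
// End-of-run dump of a collection of histograms to a single ROOT file.
#include <vector>

#include "TFile.h"
#include "TH1.h"

void archiveHistograms(const std::vector<TH1*>& histograms,
                       const char* path) {  // e.g. one file per run
    TFile output(path, "RECREATE");
    for (TH1* h : histograms)
        h->Write();  // each histogram is stored under its own name
    output.Close();
}
```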

1111

Operational Monitoring DisplayOperational Monitoring DisplayOperational Monitoring DisplayOperational Monitoring Display Each process in the system publishes its status and running statistics

into Information Service => O(1)M objects Reads IS information with respect to user configuration and display it as

time series graphs, bar charts. Analyses distributions against thresholds Groups and highlights the information for the shifter

Each process in the system publishes its status and running statistics into Information Service => O(1)M objects

Reads IS information with respect to user configuration and display it as time series graphs, bar charts.

Analyses distributions against thresholds Groups and highlights the information for the shifter

Is being mostly used for the HLT farms status: CPU, memory, events distribution

Is being mostly used for the HLT farms status: CPU, memory, events distribution
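
A sketch of the threshold step with hypothetical types: flag every HLT node whose published CPU load exceeds a configured limit, so that only those nodes are highlighted for the shifter.

```cpp
// Hypothetical per-node statistics as published into the Information Service.
#include <map>
#include <string>
#include <vector>

struct NodeStats {
    double cpuLoad;       // fraction of CPU in use, 0.0 - 1.0
    double memoryUsedMB;  // resident memory
};

// Returns the names of the nodes to highlight for the shifter.
std::vector<std::string> overThreshold(
        const std::map<std::string, NodeStats>& farm, double cpuLimit) {
    std::vector<std::string> flagged;
    for (const auto& [node, stats] : farm)
        if (stats.cpuLoad > cpuLimit)
            flagged.push_back(node);
    return flagged;
}
```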

12

Event Displays

Atlantis: Java-based 2D event display
VP1: 3D event display running in the offline reconstruction framework
Both Atlantis and VP1 have been used during commissioning runs and the LHC start-up
Both can be used in remote monitoring mode, capable of browsing recent events via an HTTP server

13

Remote access to the monitoring information

Public - monitoring via Web Interface: information is updated periodically; no access restrictions
Expert and Shifter - monitoring via the mirror partition: quasi real-time information access; restricted access

14

Web Monitoring Interface

Generic framework which runs at P1 and publishes information periodically to the Web
The published information is provided by plug-ins; currently two:
  Run Status shows the status and basic parameters for all active partitions at P1
  Data Quality shows the same information as the DQM Display (histograms, results, references, etc.) with a few minutes' update interval

[Diagram: at Point 1 (ATCN), the Information Service fed by DataFlow (LVL1/HLT) serves the Event Analysis Frameworks, Data Quality Analysis Framework, Data Monitoring Archiving Tools and Visualization Tools; the Web Service exports to a Web Browser on the CERN GPN for remote monitoring]
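
A sketch of what such a plug-in boundary could look like; WMIPlugin, name() and publish() are hypothetical names for illustration, not the real WMI interface.

```cpp
// Hypothetical plug-in boundary for periodic web publishing.
#include <string>

class WMIPlugin {
public:
    virtual ~WMIPlugin() = default;
    virtual std::string name() const = 0;  // e.g. "RunStatus"
    virtual std::string publish() = 0;     // content pushed to the web server
};

// A plug-in in the spirit of the Run Status page described above.
class RunStatusPlugin : public WMIPlugin {
public:
    std::string name() const override { return "RunStatus"; }
    std::string publish() override {
        // The real plug-in would read this from the Information Service.
        return "<p>Partition ATLAS: state RUNNING</p>";
    }
};
```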

15

Remote Monitoring via mirror partition

Remote users are able to open a remote session on one of the dedicated machines located on the CERN GPN: the environment looks exactly like at P1, and all monitoring tool displays are available and work exactly as at P1
The production system setup supports up to 24 concurrent remote users
Almost all information from the Information Service is replicated to the mirror partition
The information is available in the mirror partition with O(1) ms delay

[Diagram: the Information Service at Point 1 (ATCN), fed by DataFlow (LVL1/HLT), is replicated to an Information Service Mirror on the CERN GPN, where the same Visualization Tools run for remote users]

16

Performance achieved

Online Monitoring Infrastructure is in place and is functioning reliably:
  More than 150 event monitoring tasks are started per run
  Handles more than 4 million histogram updates per minute
  Almost 100 thousand histograms are saved at the end of a run (~200 MB)
  Data Quality statuses are calculated online (about 10 thousand histograms checked per minute) and stored in the database
  Several Atlantis event displays are always running in the ATLAS Control Room and Satellite Control Rooms, showing events for several data streams
  Monitoring data is replicated in real time to the mirror partition running outside P1 (with a few msec delay)
  Remote monitoring pilot system deployed successfully

1717

ConclusionsConclusions

The tests performed on the system indicate that the online monitoring framework architecture meets ATLAS requirements.

The monitoring tools have been successfully used during data taking in detector commissioning runs and during LHC start-up.

Further details on DQM Display, Online Histogram Presenter and Gatherer on dedicated posters.

The tests performed on the system indicate that the online monitoring framework architecture meets ATLAS requirements.

The monitoring tools have been successfully used during data taking in detector commissioning runs and during LHC start-up.

Further details on DQM Display, Online Histogram Presenter and Gatherer on dedicated posters.

……

19

Framework components

Users have to provide: C = Configuration files, P = Plugins (C++ code), JO = Job Option files

[Diagram: monitoring components around the IS (Information Service), fed by event samples from DataFlow (ROD/LVL1/HLT) through EMON (Event Monitoring)]
GNAM - P
MonaIsa - P
Event Filter PT (Processing Task) - JO
Gatherer
OH (Online Histogramming)
OMD (Operational Monitoring Display) - C
OHP (Online Histogram Presenter) - C
DQMF (Data Quality Monitoring Framework) - C
MDA (Monitoring Data Archiving) - C
TriP (Trigger Presenter)
Event Display (ATLANTIS, VP1)
WMI (Web Monitoring Interface) - P