First year experience with the ATLAS online monitoring framework
Alina Corso-Radu University of California Irvine
on behalf of ATLAS TDAQ Collaboration
CHEP 2009, March 23rd-27th, Prague
Outline
• ATLAS trigger and data acquisition system at a glance
• Online monitoring framework
• Readiness of the online monitoring for runs with cosmic rays and first LHC beam
• Conclusions
ATLAS Trigger/DAQ
Interaction rate ~1 GHz; bunch crossing rate 40 MHz.
Detectors read out:
• Inner Detector: Pixel, SCT, TRT
• Calorimeter: LAr, TileCal
• Muon Spectrometer: MDT, RPC, TGC, CSC
LVL1 Trigger, <100 kHz (hardware based):
• Coarse granularity data
• Calorimeter and Muon based
• Identifies Regions of Interest
Read Out Systems (ROSs)
High Level Triggers (software based)
LVL2 Trigger, ~3 kHz:
• Partial event reconstruction in Regions of Interest
• Full granularity data
• Trigger algorithms optimized for fast rejection
Event Builder (EB)
Event Filter, ~200 Hz:
• Full event reconstruction seeded by LVL2
• Trigger algorithms similar to offline
Data Storage
Farm sizes labeled on the diagram: 900 farm nodes (1/3 of final system), 150 PC, ~100 PC.
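The rates on this slide imply a rejection factor at each trigger level. The short Python sketch below works them out, assuming the quoted figures are the output rates of each level (the <100 kHz bound is taken at its maximum and the "~" values are used as given); the names `RATES_HZ`, `rejection_factors` and `overall_rejection` are illustrative only.

```python
# Rates quoted on the slide, in Hz. Treating them as the output rate of each
# level is an interpretation; "<100 kHz" is taken at its upper bound.
RATES_HZ = [
    ("Bunch crossing", 40e6),
    ("LVL1", 100e3),
    ("LVL2", 3e3),
    ("Event Filter", 200.0),
]


def rejection_factors(rates):
    """Return (level, factor) pairs, where factor = input rate / output rate."""
    return [
        (name_out, rate_in / rate_out)
        for (_, rate_in), (name_out, rate_out) in zip(rates, rates[1:])
    ]


def overall_rejection(rates):
    """Ratio between the first and the last rate in the chain."""
    return rates[0][1] / rates[-1][1]


if __name__ == "__main__":
    for level, factor in rejection_factors(RATES_HZ):
        print(f"{level}: keeps about 1 event in {factor:.0f}")
    print(f"Overall: about 1 in {overall_rejection(RATES_HZ):.0f}")
```

With these inputs LVL1 keeps about 1 bunch crossing in 400, LVL2 about 1 in 33, and the Event Filter 1 in 15, for an overall reduction of about 1 in 200,000.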
Online monitoring framework
• Analyze event content and produce histograms.
• Analyze operational conditions of hardware and software detector elements, trigger and data acquisition systems.
• Automatic checks of histograms and operational data; visualize and save results; produce visual alerts.
• Set of tools, aimed at the shifters, to visualize information.
• Automatic archiving of histograms.
• Monitoring data available remotely.
[Diagram: the data flow (ROD/LVL1/HLT) supplies event samples and operational data. Components shown: Event Analysis Frameworks, Data Quality Analysis Framework, Data Monitoring Archiving Tools, Visualization Tools, Web Service, Information Service.]
About 35 dedicated machines.
Complexity and diversity in terms of monitoring needs of the sub-systems.
Data Quality Monitoring Framework
[Diagram: the Event Analysis Frameworks, fed by the data flow (ROD/LVL1/HLT), and the Information Service supply histograms. The Data Quality Monitoring Framework has an Input Interface (histograms), a Configuration Interface (configuration, Configuration DB) and an Output Interface (DQResults). Also shown: Conditions DB and the Data Quality monitoring display.]
• Distributed framework that provides the mechanism to execute automatic checks on histograms and to produce results according to a particular user configuration.
• Input and Output classes can be provided as plug-ins; custom plug-ins are supported.
• About 40 predefined algorithms exist (histogram empty, mean values, fits, reference comparison, etc.); custom algorithms are allowed.
• Writes DQ Results automatically to the Conditions Database.
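To make the idea of configuration-driven histogram checks concrete, here is a minimal Python sketch. It is not the DQMF API (the framework itself takes C++ plug-ins): the `Histogram` class, the algorithm names, the parameter names, and the Green/Yellow/Red/Undefined result values are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class Histogram:
    """Hypothetical stand-in for a monitored 1D histogram."""
    name: str
    bin_centers: Sequence[float]
    bin_contents: Sequence[float]

    def entries(self) -> float:
        return sum(self.bin_contents)

    def mean(self) -> float:
        total = self.entries()
        if total == 0:
            raise ValueError("empty histogram has no mean")
        weighted = sum(x * w for x, w in zip(self.bin_centers, self.bin_contents))
        return weighted / total


def check_not_empty(hist: Histogram, params: dict) -> str:
    """'Histogram empty' style check: Red when there are no entries."""
    return "Green" if hist.entries() > 0 else "Red"


def check_mean(hist: Histogram, params: dict) -> str:
    """'Mean values' style check against warning and error thresholds."""
    if hist.entries() == 0:
        return "Undefined"
    deviation = abs(hist.mean() - params["expected"])
    if deviation <= params["warning"]:
        return "Green"
    if deviation <= params["error"]:
        return "Yellow"
    return "Red"


# Registry of algorithms; a custom algorithm is just one more entry here.
ALGORITHMS: Dict[str, Callable[[Histogram, dict], str]] = {
    "not_empty": check_not_empty,
    "mean": check_mean,
}


def run_checks(histograms: Dict[str, Histogram], config: List[dict]) -> Dict[str, str]:
    """Apply each configured check; a missing histogram yields 'Undefined'."""
    results = {}
    for item in config:
        hist = histograms.get(item["histogram"])
        if hist is None:
            results[item["histogram"]] = "Undefined"
            continue
        algorithm = ALGORITHMS[item["algorithm"]]
        results[item["histogram"]] = algorithm(hist, item.get("params", {}))
    return results
```

A user configuration is then a list such as `[{"histogram": "h_time", "algorithm": "mean", "params": {"expected": 1.5, "warning": 0.2, "error": 0.5}}]`, where `h_time` is a made-up histogram name.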
DQM Display
Summary panel shows:
• the overall color-coded DQ status produced by DQMF per sub-system
• Run Control conditions
• log information
Details panel offers access to the detailed monitored information:
• checked histograms and their references
• configuration information (algorithms, thresholds, etc.)
History tab displays the time evolution of DQResults.
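One way a per-sub-system summary status could be derived from individual histogram results is sketched below in Python. The slides do not say how DQMF combines results, so the "worst status wins" rule, the severity ordering, and the status names are assumptions, and the sub-system and histogram names are invented for the example.

```python
# Assumed severity ordering; the real DQMF combination rules are not
# described in the slides.
SEVERITY = {"Green": 0, "Undefined": 1, "Yellow": 2, "Red": 3}


def summarize(results):
    """Map each sub-system to the worst status among its histograms.

    `results` is an iterable of (sub_system, histogram_name, status) tuples.
    """
    summary = {}
    for sub_system, _name, status in results:
        current = summary.get(sub_system, "Green")
        if SEVERITY[status] > SEVERITY[current]:
            current = status
        summary[sub_system] = current
    return summary


def bad_histograms(results):
    """List the (sub_system, histogram_name) pairs a shifter should look at first."""
    return [(sub, name) for sub, name, status in results if status == "Red"]


if __name__ == "__main__":
    example = [
        ("Pixel", "hit_occupancy", "Green"),
        ("Pixel", "cluster_size", "Yellow"),
        ("LAr", "cell_energy", "Red"),
        ("LAr", "timing", "Green"),
    ]
    print(summarize(example))
    print(bad_histograms(example))
```

Under this rule a single Red histogram turns its whole sub-system Red in the summary, which matches the stated aim of focusing the shifter on bad histograms.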
About 17 thousand histograms checked
Shifter attention focused on bad histograms
DQM Display - layouts
DQM Display allows for a graphical representation of the sub-systems and their components using detector-like pictorial views.
Bad histograms are spotted even faster.
Expert tool DQM Configurator for editing the configuration, aimed at layouts and shapes:
• from an existing configuration one can attach layouts and shapes
• these layouts are created and displayed online the same way they will show in the DQM Display
• experts can tune layout/shape parameters until they look as required
Online Histogram Presenter
• Supports a hierarchy of tabs, each containing a predefined set of histograms.
• Reference histograms can be displayed as well.
• Sub-systems normally have several tabs with the most important histograms, which have to be watched.
Main shifter tool for checking histograms manually
Trigger Presenter
Presents trigger-specific information in a user-friendly way:
• trigger rates
• trigger chains information
• HLT farms status
Reflects the status of HLT sub-farms using DQMF color codes.
Implemented as an OHP plug-in.
Histogram Archiving
Almost 100 thousand histograms are currently saved at the end of a run (~200 MB per run).
• Reads histograms from IS according to the given configuration and saves them to ROOT files.
• Registers ROOT files with the Collection and Cache service.
• Accumulates files into large archives and sends them to CDR.
• Archiving is done asynchronously with respect to the Run states/transitions.
• Archived histograms can also be browsed with a dedicated tool.
[Diagram: Information Service, Monitoring Data Archiving, Collection and Cache, CDR and Archive Browser, connected by flows labeled Histograms, ROOT files and ZIP.]
Operational Monitoring Display
• Each process in the system publishes its status and running statistics into the Information Service => O(1)M objects.
• Reads IS information according to the user configuration and displays it as time series graphs and bar charts.
• Analyses distributions against thresholds.
• Groups and highlights the information for the shifter.
Mostly used for the HLT farms status: CPU, memory, event distribution.
Event Displays
Atlantis: Java-based 2D event display.
VP1: 3D event display running in the offline reconstruction framework.
Both Atlantis and VP1 have been used during commissioning runs and LHC start-up.
Both Atlantis and VP1 can be used in remote monitoring mode, in which they can browse recent events via an HTTP server.
Remote access to the monitoring information
Public, monitoring via the Web Interface:
• Information is updated periodically
• No access restrictions
Expert and Shifter, monitoring via the mirror partition:
• Quasi real-time information access
• Restricted access
Web Monitoring Interface
Generic framework which runs at P1 and periodically publishes information to the Web.
The published information is provided by plug-ins, currently two:
• Run Status shows status and basic parameters for all active partitions at P1.
• Data Quality shows the same information as the DQM Display (histograms, results, references, etc.), with an update interval of a few minutes.
[Diagram: two areas, Monitoring at Point 1 (ATCN) and Remote Monitoring (CERN GPN). Components shown: data flow (LVL1/HLT), Event Analysis Frameworks, Data Quality Analysis Framework, Data Monitoring Archiving Tools, Visualization Tools, Information Service, Web Service and Web Browser.]
Remote Monitoring via mirror partition
Remote users are able to open a remote session on one of the dedicated machines located on the CERN GPN:
• The environment looks exactly like at P1
• All monitoring tool displays are available and work exactly as at P1
The production system setup supports up to 24 concurrent remote users.
[Diagram: two areas, Monitoring at Point 1 (ATCN) and Remote Monitoring (CERN GPN). Components shown: data flow (LVL1/HLT), Event Analysis Frameworks, Data Quality Analysis Framework, Data Monitoring Archiving Tools, Visualization Tools (appearing twice), Information Service, Information Service Mirror, Web Service and Web Browser.]
Almost all information from the Information Service is replicated to the mirror partition.
The information is available in the mirror partition with a delay of O(1) ms.
Performance achieved
Online Monitoring Infrastructure is in place and is functioning reliably:
• More than 150 event monitoring tasks are started per run.
• Handles more than 4 million histogram updates per minute.
• Almost 100 thousand histograms are saved at the end of a run (~200 MB).
• Data Quality statuses are calculated online (about 10 thousand histograms checked per minute) and stored in the database.
• Several Atlantis event displays are always running in the ATLAS Control Room and Satellite Control Rooms, showing events for several data streams.
• Monitoring data is replicated in real time to the mirror partition running outside P1 (with a delay of a few ms).
Remote monitoring pilot system deployed successfully.
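The per-minute and per-run figures above can be converted into per-second rates and an average archived size per histogram. The Python sketch below does that arithmetic, treating the quoted lower bounds and approximate values as exact and assuming 1 MB = 10^6 bytes; the constant and function names are illustrative.

```python
# Figures quoted on the slide, treated as exact for the arithmetic.
UPDATES_PER_MINUTE = 4_000_000      # "more than 4 million histogram updates"
CHECKS_PER_MINUTE = 10_000          # "about 10 thousand histograms checked"
HISTOGRAMS_PER_RUN = 100_000        # "almost 100 thousand histograms"
ARCHIVE_BYTES_PER_RUN = 200 * 10**6  # "~200 MB", assuming 1 MB = 10^6 bytes


def per_second(per_minute):
    """Convert a per-minute count into a per-second rate."""
    return per_minute / 60.0


def mean_histogram_size(total_bytes, n_histograms):
    """Average archived size per histogram, in bytes."""
    return total_bytes / n_histograms


if __name__ == "__main__":
    print(f"Histogram updates: {per_second(UPDATES_PER_MINUTE):.0f} per second")
    print(f"DQ checks: {per_second(CHECKS_PER_MINUTE):.0f} per second")
    size = mean_histogram_size(ARCHIVE_BYTES_PER_RUN, HISTOGRAMS_PER_RUN)
    print(f"Average archived histogram size: {size:.0f} bytes")
```

This gives roughly 67 thousand histogram updates and 167 data quality checks per second, and about 2 kB per archived histogram.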
Conclusions
The tests performed on the system indicate that the online monitoring framework architecture meets ATLAS requirements.
The monitoring tools have been successfully used during data taking in detector commissioning runs and during LHC start-up.
Further details on DQM Display, Online Histogram Presenter and Gatherer are given on dedicated posters.
Framework components
Users have to provide: configuration files (C), plugins in C++ code (P), job option files (JO).
[Diagram: the data flow (ROD/LVL1/HLT) feeds the components listed below. Letters mark what users provide for each component, matched by label adjacency in the extracted slide.]
EMON (Event Monitoring)
GNAM: P
MonaIsa: P
Event Filter PT (Processing Task): JO
IS (Information Service)
OH (Online Histogramming)
Gatherer
OMD (Operational Monitoring Display): C
OHP (Online Histogram Presenter): C
DQMF (Data Quality Monitoring Framework): C
MDA (Monitoring Data Archiving): C
TriP (Trigger Presenter)
Event Display (ATLANTIS, VP1)
WMI (Web Monitoring Interface): P, C