Complexity Management Solutions for High Energy Physics Control Systems: The CMS experiment
Talk by Ildefons Magrans de Abril, CMS Trigger Software Technical Coordinator, CERN, Geneva. Zurich (IBM Research Laboratory), 23rd January 2008.
Ildefons Magrans, CMS Trigger Software Technical Coordinator 1
Complexity Management Solutions for High Energy Physics Control Systems:
The CMS experiment
Zurich (IBM Research Laboratory), 23rd January 2008
Ildefons Magrans de Abril CMS Trigger Software Technical Coordinator, CERN, Geneva
Outline
1 CERN and the LHC
2 The CMS experiment
3 Enhancing complexity management with web services
3.1 Software environment model: XSEQ
3.2 Concrete architecture: The CMS Trigger Supervisor
1 CERN and the LHC
CERN, European Organization for Nuclear Research
Large Hadron Collider
ATLAS: A Toroidal LHC ApparatuS
CMS: Compact Muon Solenoid
CERN provides research facilities to particle physicists worldwide
Large Hadron Collider (LHC)
Largest superconducting installation:
• 27 km ring
• 3 billion euros
CMS and ATLAS detect collision information (events):
• 40 million events/second
2 The CMS experiment
Compact Muon Solenoid (CMS)
Human complexity:
39 countries
182 institutes (CERN is one of them)
3330 people
~800 students!
Numeric complexity:
21.6 m long
15 m diameter
12,500 tonnes
4 tesla solenoid (100,000 times the Earth's magnetic field)
1200 m³/hour of water for cooling (Geneva's Jet d'Eau: ~1800 m³/hour)
10 MW required for operation (~10,000 houses)
Time complexity:
Design started 20 years ago!
7-8 years for construction
15 years of expected operational life time
Already developing upgrades
The CMS “sensor”
Silicon Tracker:
Find charged particle tracks and momentum
Electromagnetic Calorimeter:
Measure energy of particles interacting electromagnetically
Hadronic Calorimeter:
Measure energy of particles interacting via the strong nuclear force (heavy neutral particles)
Muon detector:
Find muon tracks
?
Particle physicist
The CMS Trigger and Data Acquisition System
40 million events/second, ~55 million channels, ~1 Mbyte per event
We can store only 100 events per second
Solution based on two filter levels:
• Level-1 Trigger (HW)
• High Level Trigger (SW)
Control system coordinates experiment operation
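The figures on this slide fix the size of the filtering problem. The following back-of-the-envelope sketch works it out, assuming that 1 Mbyte means 10^6 bytes and that every event would be read out at the full size (both are simplifications made here, not statements from the talk):

```python
# Back-of-the-envelope estimate based on the figures quoted on the slide.
# Assumption: 1 Mbyte = 10**6 bytes and every event has the full size.
COLLISION_RATE_HZ = 40_000_000   # events per second delivered by the LHC
STORAGE_RATE_HZ = 100            # events per second that can be stored
EVENT_SIZE_BYTES = 1_000_000     # ~1 Mbyte per event


def rejection_factor(input_rate_hz: int, output_rate_hz: int) -> float:
    """Number of input events per event that is finally kept."""
    return input_rate_hz / output_rate_hz


def data_rate_bytes_per_s(rate_hz: int, event_size_bytes: int) -> int:
    """Raw data volume per second at a given event rate."""
    return rate_hz * event_size_bytes


unfiltered = data_rate_bytes_per_s(COLLISION_RATE_HZ, EVENT_SIZE_BYTES)
stored = data_rate_bytes_per_s(STORAGE_RATE_HZ, EVENT_SIZE_BYTES)

print(f"keep 1 event in {rejection_factor(COLLISION_RATE_HZ, STORAGE_RATE_HZ):,.0f}")
print(f"unfiltered: {unfiltered / 1e12:.0f} TB/s, stored: {stored / 1e6:.0f} MB/s")
```

Under these assumptions the two trigger levels together must keep about one event in 400,000, reducing a nominal 40 TB/s to roughly 100 MB/s.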
About this talk
CMS Control System: SOFTWARE
L1 Decision Loop and detector front-ends: HARDWARE
3 Enhancing complexity management with web services
Context complexity
Numeric dimension:
• Thousands of hardware modules and a similar number of electronic links
Time dimension:
• L1 Trigger development started in 2000
• L1 Trigger design for the SLHC has already started!
• CERN Linux platform upgrades every 2 years
→ Periodic software & hardware upgrades
Human & political dimension:
• Large number of independent research institutes with similar requirements using different technologies (e.g. FPGA vs ASIC, VME vs PCI vs tiny …)
• Most people are particle physicists with only a few % of their time dedicated to SW development. ~20% are students
3.1 Software environment model: XSEQ
XSEQ: A software environment model
[Diagram labels: XML documents (Control Sequence (XSEQ), Device Description, Device Data); Interpreter, which processes platform-independent control sequences; Devices]
1. XML as uniform data representation format for both data and code
• Long-term technology investment (XML is here to stay)
• Maximize usage of standard tools
• Simplify software configuration management
2. Interpreted approach for the code
• Execute code independently of the platform
XSEQ example 1: Hello world
<xseq xmlns="http://xdaq.cern.ch/xseq/basic"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      xsi:schemaLocation="http://xdaq.cern.ch/xseq/basic xseqschema.xsd">
  <extend …/>
  <secure>
    <out>
      <stdout/>
      <add><string>Hello </string><string>world!</string></add>
    </out>
    <rescue>
      …
      <retry/>
    </rescue>
  </secure>
</xseq>
Callouts: XSEQ syntax core definition; exception handling; every tag is a function.
XSEQ language (XML-based sequencer):
· Syntax specified in XSD documents, plus extensions: file system, SOAP, DOM, HW access (PCI & VME).
· Exception handling with error recovery mechanism
· Other: object-oriented and design-by-contract extensions.
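The "every tag is a function" idea can be illustrated with a very small interpreter. The sketch below is written in Python and is not the XSEQ interpreter: it handles only the tags of the Hello world example, leaves out `<extend>`, `<secure>` and `<rescue>`, and the semantics given to `<out>` and `<add>` are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical handler table: one Python function per supported tag.
# The tag names mirror the slide; the semantics are assumed.


def eval_node(node, output):
    """Evaluate one element by dispatching on its tag name."""
    tag = node.tag.split("}")[-1]          # drop the XML namespace, if any
    return HANDLERS[tag](node, output)


def do_string(node, output):
    return node.text or ""


def do_add(node, output):
    # Assumed semantics: concatenate the values of all children.
    return "".join(eval_node(child, output) for child in node)


def do_stdout(node, output):
    return None                             # marker child, produces no value


def do_out(node, output):
    # Assumed semantics: write every child value that is not None.
    for child in node:
        value = eval_node(child, output)
        if value is not None:
            output.append(value)
    return None


def do_xseq(node, output):
    for child in node:
        eval_node(child, output)
    return None


HANDLERS = {
    "xseq": do_xseq, "out": do_out, "stdout": do_stdout,
    "add": do_add, "string": do_string,
}


def run(program: str) -> list:
    """Run a program given as XML text and return what it wrote."""
    output = []
    eval_node(ET.fromstring(program), output)
    return output


PROGRAM = """
<xseq xmlns="http://xdaq.cern.ch/xseq/basic">
  <out><stdout/><add><string>Hello </string><string>world!</string></add></out>
</xseq>
"""
print(run(PROGRAM))
```

Running it prints `['Hello world!']`. Supporting a new tag means adding one entry to the handler table, which is the kind of extension point that `<extend>` appears to provide in XSEQ.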
XSEQ example 2: hardware access
<?xml version="1.0" encoding="UTF-8"?>
<xseq xmlns="http://xdaq.cern.ch/xseq/basic"
      xmlns:hwa="http://xdaq.cern.ch/xseq/hwaext" …>
  <extend ns="http://xdaq.cern.ch/xseq/hwaext" url="http://xdaq.cern.ch/xseq" module="halx86"/>
  <variable name="my_device">
    <hwa:pcidevice>
      <url>http://xseq.cern.ch/register_table.xml</url>
      <hwa:busadapter>PCIi386BusAdapter</hwa:busadapter>
      <hwa:vendorid format="hex">ecd6</hwa:vendorid>
      <hwa:deviceid format="hex">fd05</hwa:deviceid>
      <hwa:index format="dec">0</hwa:index>
    </hwa:pcidevice>
  </variable>
  <out>
    <var>my_device</var>
    <hwa:item>CTRL</hwa:item>
    <var>my_data</var>
  </out>
</xseq>
Device specifications:
<PciAddressTable …>
  <Item name="MMU" address="04040404" …/>
  <Item name="STAT" address="10100" …/>
  …
</PciAddressTable>
Callouts:
• Common tools for processing code and data simplify core development.
• Decoupling syntax and semantics enhances code sharing between sub-systems with similar requirements.
• <extend> extends the interpreter so that it can execute a new syntactic extension.
• Scoped variable: not accessible in upper hierarchical levels.
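Because device specifications are plain XML, standard tools can read them. As a minimal sketch (in Python, not the actual hardware access library), the address table above can be turned into a name-to-address map. The elided attributes are left out, the addresses are assumed to be hexadecimal, and `FakeDevice` is a hypothetical stand-in for real bus access.

```python
import xml.etree.ElementTree as ET

# Illustrative table only: names and addresses follow the slide, the elided
# attributes are simply left out. Addresses are assumed to be hexadecimal.
ADDRESS_TABLE = """
<PciAddressTable>
  <Item name="MMU" address="04040404"/>
  <Item name="STAT" address="10100"/>
</PciAddressTable>
"""


def load_address_table(xml_text: str) -> dict:
    """Map each register name to its numeric address."""
    root = ET.fromstring(xml_text)
    return {item.get("name"): int(item.get("address"), 16)
            for item in root.iter("Item")}


class FakeDevice:
    """Stand-in for a real PCI/VME device: stores writes in a dict."""

    def __init__(self, table: dict):
        self.table = table
        self.memory = {}

    def write(self, item: str, value: int) -> None:
        self.memory[self.table[item]] = value   # KeyError for unknown items

    def read(self, item: str) -> int:
        return self.memory.get(self.table[item], 0)


table = load_address_table(ADDRESS_TABLE)
device = FakeDevice(table)
device.write("STAT", 0x1)
print(hex(table["STAT"]), device.read("STAT"))
```

Control code then refers to registers by name only (`STAT`, `MMU`), so a change in the address map requires editing the table, not the sequence.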
Online software integration
XDAQ framework: CMS in-house developed C++ middleware
[Diagram labels: XSEQ program (URL); peer transport (SOAP); XDAQ executive (one per host computer); interpreter plug-in; XDAQ application; return message generated by the XSEQ program]
• The SOAP message specifies the URL of the XSEQ program or embeds the program itself
• The running XSEQ program can access the original SOAP message and retrieve parameters
XSEQ example 3: distributed system
[Diagram labels: client (HEPHY, Vienna); remote server (Global Trigger server, CERN, Geneva); standalone interpreter; SOAP; SOAP peer transport (pt); interpreter plug-in; XDAQ executive; bus protocol]
Web repository (http://cmsdoc.cern.ch/~ildefons): control sequence, device data, language extension, device description
Address table:
<VmeAddressTable …>
  <Item name="CTRL" address="10000" …/>
  <Item name="STAT" address="10100" …/>
  …
</VmeAddressTable>
Control sequences:
<xseq …>
  …
  <secure>
    <while>
      …
      <switch>
        <case value="1">
          <call url="http://…script1.xml"/>
        </case>
        …
      </switch>
    </while>
    <rescue>
      …
      <retry/>
    </rescue>
  </secure>
</xseq>

<xseq … rpc_msg="msg">
  <variable name="soapPart">
    <xoap:getSOAPPart>
      <var>msg</var>
    </xoap:getSOAPPart>
  </variable>
  …
  <gt:configure>
    <var>board</var>
    <var>fname</var>
    <var>chip</var>
  </gt:configure>
  …
  <return>
    <var>my_msg</var>
  </return>
</xseq>
Callouts: SOAP message sent by the client; SOAP extension; SOAP message returned to the client.
XSEQ conclusions
“Good”:
• Suitable technology investment (XML is here to stay)
• Reduces in-house development (large set of standard tools)
• Enhances code sharing among sub-systems (extension mechanism)
• Enhances platform evolution (interpreted approach)
• Simplifies software configuration management (uniform usage of XML for code/data)
“Bad”:
• XML is verbose (programming with XSEQ is not fun), but:
  – An XML editor could help
  – XSEQ could serve as the underlying syntax to store virtual instrumentation developed with graphical tools like LabVIEW
• Just a prototype. It is not being used for production
3.2 Concrete architecture: The CMS Trigger Supervisor
CMS Trigger Supervisor Context
~55 Million Channels, ~1 Mbyte per event
CMS Control System. SOFTWARE
L1 Decision Loop. HARDWARE
HW Context
Concept
Prototype
SW Context
Framework
System
Services
HW context: L1-Trigger Decision Loop
[Diagram: L1-Trigger decision loop, 3.2 µs. Muon side: muon detector front ends (DT, CSC, RPC), Drift Tube Sector Collector, Drift Tube Track Finder, CSC Track Finder, RPC Trigger, Global Muon Trigger. Calorimeter side: calorimeter front ends (ECAL, HCAL, HF), ECAL TPG, HCAL/HF TPG, Regional Calorimeter Trigger, Global Calorimeter Trigger. Global Trigger and Trigger Control System: partition controllers 0 to 7, local control 0 to 31, local trigger. Signals: L1A + TTC, back pressure, Slink to DAQ.]
192 L1A’s (128 algorithms + 64 technical triggers)
Configuration:
• 64 crates
• O(10³) boards
• Firmware ~15 MB/board
• O(10²) registers/board
Testing:
• O(10³) links
Integration coordination:
• 27 research institutes
Time:
• Research: 1992-2000
• Development: 2000-present
• Fully replaced by 2010
L1 decision loop operation ~ “business”
SW context: Experiment control system
[Diagram: Run Control sessions 1 to 8 with function managers (FM) for DAQ, Trigger and Subdetectors 1 to 8; DCS Supervisor with DCS panel, DCS servers (Srv1, Srv2) and subdetector DCS (SD1 to SD8); subsystem online software infrastructure (OSWI) for GT, GMT, RCT, GCT, CSCTF, …; XDAQ on front-end crates and trigger crates. Legend: project context / out of project context.]
Run Control and Monitoring System (RCMS):
• Overall experiment control and monitoring
• RCMS framework implemented in Java
Detector Control System (DCS):
• Detector safety, gas and fluid control, cooling system, rack and crate control, high and low voltage control, and detector calibration
• DCS is implemented with PVSSII
Cross-platform Data AcQuisition middleware (XDAQ):
• C++ component-based distributed programming framework
• Used to implement the distributed event builder
L1-Trigger Control and Hardware Monitoring System:
• Provides machine and human interfaces to operate, test and monitor the Level-1 decision loop hardware components
Experiment control system ~ “business” IT infrastructure
Project phases and terminology
• HW Context: the “business”, i.e. to filter the “best” events
• Concept: business needs, project team
• Prototype: proof of concept
• SW Context: “business” software infrastructure
• Framework: services and core developments
• System: architecture
• Services: new “business capabilities”, e.g. configuration
Business needs and project team
[Diagram elements: experiment control system; Trigger Supervisor; Trigger Supervisor GUI; GT/TCS; G. Muon Trigger; DT TF; CSC TF; G. Cal. Trigger; R. Cal. Trigger; pattern comp.; inputs: ECAL energy, HCAL energy, HF energy, DT hits, CSC hits, RPC hits. Associations carry multiplicities 1 and 0..n.]
Business need: coordinate the operation of CMS subsystems (e.g. configuration and test)
TS team (2 + 1 or 2 students):
• Services + core developments
• Architecture
• Business capabilities
• Sub-system developers coordination & support
• Communication
1 developer per subsystem:
• Uses services to develop the subsystem architecture
• Customizes subsystem architecture as required by TS team
Baseline service infrastructure
CMS official software frameworks to develop distributed systems: DCS, RCMS, XDAQ
• Subsystems' Online SoftWare Infrastructure (OSWI) needs to be integrated
• Infrastructure should be oriented to develop SCADA systems

Framework              | Subsystem OSWI integration effort (C++, Linux) | Supervisory and control infrastructure development effort
DCS (PVSSII, Windows)  | ++                                             | Ok
RCMS (Java)            | +                                              | +
XDAQ (C++, Linux)      | Ok                                             | +

→ XDAQ-based baseline solution + additional development to reach a SCADA framework
[Diagram: experiment control system (Run Control sessions 1 to 8, DCS Supervisor, function managers, subsystem OSWI for GT, GMT, RCT, GCT, CSCTF, …, XDAQ on front-end and trigger crates), plus RUs, BUs, FUs and EVMs (x8).]
Core development: The Cell
[Diagram: Cell internals. Interfaces: HTTP/CGI (GUI) and SOAP. Components: access control, response control, error management module, command factory, commands pool, operations factory, operations pool, monitoring data source, monitorable item handler. Xhannels: XDAQ, Cell, database, monitor. Plug-ins: command, operation, control panel. Subsystem hardware drivers.]
Synchronous and asynchronous SOAP API
HTTP/CGI: automatically generated (e.g. Cell FSM operation) + control panel plug-ins (e.g. GT panel, DTTF panel)
FSM plug-ins:
• States S1, S2, S3 and events e1 to e4
• ei: if (ci) then fi
Other plug-ins:
• Command: RPC method. SOAP API extensions
• Monitoring items
Xhannel infrastructure: designed to simplify access to web services (SOAP and HTTP/CGI) from operation transition methods:
• Tstore (DB)
• Monitor collector
• Cells
Cell plug-ins (FSM, commands, control panels) hide HW and SW platform evolution
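The rule "ei: if (ci) then fi" on this slide can be read as: when event ei arrives, evaluate a condition ci, and only if it holds run the transition function fi and change state. A minimal Python sketch of that reading follows. The class and attribute names are hypothetical and are not the Trigger Supervisor C++ API, and the wiring of events to states is invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class Transition:
    target: str                          # state reached when the event fires
    condition: Callable[[dict], bool]    # c_i: guard evaluated first
    function: Callable[[dict], None]     # f_i: work done during the transition


@dataclass
class Operation:
    state: str
    context: dict = field(default_factory=dict)
    transitions: Dict[Tuple[str, str], Transition] = field(default_factory=dict)

    def add(self, source, event, target, condition, function):
        self.transitions[(source, event)] = Transition(target, condition, function)

    def fire(self, event: str) -> bool:
        """Apply 'e_i: if (c_i) then f_i'. Return True if the state changed."""
        transition = self.transitions.get((self.state, event))
        if transition is None or not transition.condition(self.context):
            return False
        transition.function(self.context)
        self.state = transition.target
        return True


# Hypothetical plug-in: two events on three states.
op = Operation(state="S1")
op.add("S1", "e1", "S2",
       condition=lambda ctx: "key" in ctx,
       function=lambda ctx: ctx.update(configured=True))
op.add("S2", "e2", "S3",
       condition=lambda ctx: ctx.get("configured", False),
       function=lambda ctx: ctx.update(enabled=True))

print(op.fire("e1"), op.state)   # guard fails: no "key" in the context yet
op.context["key"] = "cosmics"
print(op.fire("e1"), op.state)
print(op.fire("e2"), op.state)
```

In this sketch a subsystem developer only supplies the conditions and functions. The state bookkeeping stays in the generic `Operation` class, which mirrors the claim that plug-ins hide platform evolution from the rest of the system.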
Service providers: building blocks
[Diagram: layered view with XDAQ middleware, XDAQ external libraries, new external libraries, XDAQ components and RCMS components. Blocks: Cell, Tstore, MonCollector, Mstore, MonSensor, XS, Job control, Log Collector, each annotated with interface codes.]
• Tstore: DB interface. Exposes SOAP. 1 per system.
• Mon. Collector: polls all cell sensors. 1 per system.
• Mstore: interfaces the Mon. Collector with Tstore. 1 per system.
• Job control: remote startup of XDAQ applications. 1 per host.
• XS: reads the logging database. 1 per cell.
• Monitor sensor: cell interface to poll monitoring information. 1 per cell.
• Cell: facilitates subsystem integration and operation (additional development). 1 per crate.
• Log Collector: collects log statements from cells and forwards them to consumers. 1 per system.
Architecture based uniquely on these components
Architecture
Building blocks
+ subsystem usage model proposal
+ user’s guide, workshops, support
= control system, monitoring system, logging system, start-up system
Control architecture
[Diagram: tree of cells with a Central Cell, Subsystem Central Cells and Crate Cells, plus Tstore and the configuration DB.]
• 1 crate ~ 1 cell
• Multi-crate subsystems ~ 2 levels of subsystem cells (1 subsystem central cell)
• Centralized access to DBs
• Hierarchical control system
Stable infrastructure on top of which new “business” capabilities can be defined
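The hierarchy described on this slide can be sketched as a tree in which each cell forwards a command to its sub-cells. The Python example below is only an illustration. The cell names are invented, and running the children before the parent is an assumption of this sketch, not a documented Trigger Supervisor rule.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Cell:
    name: str
    children: List["Cell"] = field(default_factory=list)

    def execute(self, command: str) -> List[str]:
        """Run a command on the sub-cells first, then on this cell.

        Returns the names of the cells that executed the command, in order.
        The child-first ordering is an assumption made for this sketch.
        """
        done = []
        for child in self.children:
            done.extend(child.execute(command))
        done.append(self.name)
        return done


# Hypothetical layout: one single-crate and one multi-crate subsystem.
central = Cell("central", [
    Cell("subsystem-A"),                       # 1 crate ~ 1 cell
    Cell("subsystem-B", [                      # subsystem central cell
        Cell("subsystem-B-crate-0"),
        Cell("subsystem-B-crate-1"),
    ]),
])

print(central.execute("configure"))
```

A client talks only to the central cell. Adding a crate to a subsystem changes one branch of the tree and nothing above it.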
Monitoring architecture
[Diagram: cells with their sensors, MonCollector, Mstore, Tstore, monitoring DB and an external system.]
• 1 cell ~ 1 sensor
• Centralized system: 1 Collector, 1 Mstore
• Centralized access to DBs
Infrastructure that facilitates the hardware monitoring
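The monitoring chain (one sensor per cell, a single collector that polls them, a single Mstore in front of the database) can be sketched in a few lines of Python. The class names echo the slide, but the methods, the item names and the in-memory list standing in for the monitoring DB are all assumptions made for illustration.

```python
from typing import Callable, Dict


class MonSensor:
    """One sensor per cell: exposes the cell's monitorable items."""

    def __init__(self, cell: str, items: Dict[str, Callable[[], float]]):
        self.cell = cell
        self.items = items

    def poll(self) -> Dict[str, float]:
        return {name: read() for name, read in self.items.items()}


class MonCollector:
    """Single collector for the whole system: polls every registered sensor."""

    def __init__(self):
        self.sensors = []

    def register(self, sensor: MonSensor) -> None:
        self.sensors.append(sensor)

    def collect(self) -> Dict[str, Dict[str, float]]:
        return {sensor.cell: sensor.poll() for sensor in self.sensors}


class Mstore:
    """Stand-in for the Mstore/Tstore path: keeps rows in a list."""

    def __init__(self):
        self.rows = []

    def store(self, snapshot) -> None:
        for cell, values in snapshot.items():
            for item, value in values.items():
                self.rows.append((cell, item, value))


collector, mstore = MonCollector(), Mstore()
collector.register(MonSensor("crate-0", {"temperature": lambda: 31.5}))
collector.register(MonSensor("crate-1", {"temperature": lambda: 29.0,
                                         "link_errors": lambda: 0.0}))
mstore.store(collector.collect())
print(mstore.rows)
```

Keeping the collector and the store as single instances matches the "centralized access to DBs" point: cells never open a database connection themselves.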
Logging and start-up architecture
[Diagram: start-up manager with one Job control per host; cells with their XS; Log Collector with logging DB, Chainsaw, XML file and console.]
• 1 cell ~ 1 XS
• Centralized system: 1 Collector
• 1 host ~ 1 JC
Auxiliary infrastructure
New business capabilities: “How to”
[Diagram: the control tree (Central Cell, Subsystem Central Cells, Crate Cells, Tstore, configuration DB). The entry cell and the lower-level cells each show SOAP or HTTP/CGI (GUI) access, operations pool, operations factory, Cell Xhannel, database Xhannel and CellContext. Entry cell operation: states S1 to S4 with transitions e12(), e23(), e34(), e43(). Lower-level operations: states S1 to S3 with transitions e12(), e23().]
Steps: entry cell; operation states; operation transitions; service test; operation transition methods
Roles: particle physicist manager; subsystem SW developer
New “business” capabilities can be coordinated by particle physicist managers without SW expertise
Trigger Supervisor conclusions
• Design: services, architecture and “business” capabilities
1 Services:
– Reduced number of building blocks, already developed in-house (except the Cell)
– Main building block: Cell
  • Isolates hardware/software evolution from architecture implementation
  • Adapts sub-system integration tasks to the academic background of the people involved (non-SW experts)
2 Architecture:
– Uniquely based on 7 building blocks
  • Simplifies sub-system integration coordination
– Stable infrastructure
  • Isolates services evolution from the implementation of business capabilities
3 New “business” capabilities:
– Coordination methodology associated with the architecture
  • Facilitates the implementation of new “business capabilities” taking into account the academic background of managers (non-SW experts)
Summary
Enhancing control systems design & development with web-services technologies:
1. XML-based programming language:
• Maximizes usage of existing XML standards and tools, good technology investment, maximum code sharing and platform evolution
2. Control system design example:
• Services: hide HW/SW evolution
• Architecture: hides services evolution, stable infrastructure
• Business capabilities: developed on top of the architecture
Thank you very much!
For more information:
http://triggersupervisor.cern.ch