clara gaspar, may 2010 the lhcb run control system an integrated and homogeneous control system
Post on 22-Dec-2015
214 views
TRANSCRIPT
Clara Gaspar, May 2010
The LHCb Run Control System
An Integrated and Homogeneous Control
System
22Clara Gaspar, May 2010
The Experiment Control System
Detector Channels
Front End Electronics
Readout Network
HLT Farm
Storage
L0
Experi
men
t C
on
trol S
yst
em
DAQ
DCS Devices (HV, LV, GAS, Cooling, etc.)
External Systems (LHC, Technical Services, Safety, etc)
TFC
Monitoring Farm
❚ Is in charge of the Control and Monitoring of all parts of the experiment
33Clara Gaspar, May 2010
Some Requirements❚ Large number of devices/IO channels
➨ Need for Distributed Hierarchical Control❘ De-composition in Systems, sub-systems, … , Devices
❘ Local decision capabilities in sub-systems
❚ Large number of independent teams and very different operation modes➨ Need for Partitioning Capabilities (concurrent
usage)
❚ High Complexity & Non-expert Operators➨ Need for Full Automation of:
❘ Standard Procedures
❘ Error Recovery Procedures
➨ And for Intuitive User Interfaces
44Clara Gaspar, May 2010
Design Steps❚ In order to achieve an integrated System:
❙Promoted HW Standardization (so that common components could be re-used)
❘Ex.: Mainly two control interfaces to all LHCb electronics〡Credit Card sized PCs (CCPC) for non-radiation zones
〡A serial protocol (SPECS) for electronics in radiation areas
❙Defined an Architecture❘That could fit all areas and all aspects of the monitoring and
control of the full experiment
❙Provided a Framework❘An integrated collection of guidelines, tools and components
that allowed the development of each sub-system coherently in view of its integration in the complete system
55Clara Gaspar, May 2010
Generic SW Architecture
LVDev1
LVDev2
LVDevN
DCS
SubDetNDCS
SubDet2DCS
SubDet1DCS
SubDet1LV
SubDet1TEMP
SubDet1GAS
…
…
Com
man
ds
DAQ
SubDetNDAQ
SubDet2DAQ
SubDet1DAQ
SubDet1FEE
SubDet1RO
FEEDev1
FEEDev2
FEEDevN
ControlUnit
DeviceUnit
…
…
Legend:
INFR. TFC LHC
ECS
HLT
Sta
tus
& A
larm
s
66Clara Gaspar, May 2010
❚The JCOP* Framework is based on:
❙SCADA System - PVSSII for:❘Device Description (Run-time Database)
❘Device Access (OPC, Profibus, drivers)
❘Alarm Handling (Generation, Filtering, Masking, etc)
❘Archiving, Logging, Scripting, Trending
❘User Interface Builder
❘Alarm Display, Access Control, etc.
❙SMI++ providing:❘Abstract behavior modeling (Finite State Machines)
❘Automation & Error Recovery (Rule based system)
* – The Joint COntrols Project (between the 4 LHC exp. and the CERN Control Group)
The Control FrameworkD
evic
e U
nit
s
Con
trol U
nit
s
77Clara Gaspar, May 2010
Device Units❚Provide access to “real” devices:
❙The Framework provides (among others):❘“Plug and play” modules for commonly used
equipment. For example: 〡CAEN or Wiener power supplies (via OPC)〡LHCb CCPC and SPECS based electronics (via DIM)
❘A protocol (DIM) for interfacing “home made” devices. For example:
〡Hardware devices like a calibration source 〡Software devices like the Trigger processes
(based on LHCb’s offline framework – GAUDI)
❘Each device is modeled as a Finite State Machine
DeviceUnit
88Clara Gaspar, May 2010
Hierarchical control❚Each Control Unit:
❙ Is defined as one or more Finite State Machines❙Can implement rules based on its children’s states❙ In general it is able to:
❘Summarize information (for the above levels)
❘“Expand” actions (to the lower levels)
❘Implement specific behaviour& Take local decisions
〡Sequence & Automate operations〡Recover errors
❘Include/Exclude children (i.e. partitioning)〡Excluded nodes can run is stand-alone
❘User Interfacing〡Present information and receive commands
DCS
MuonDCS
TrackerDCS
…
MuonLV
MuonGAS
ControlUnit
99Clara Gaspar, May 2010
Control Unit Run-Time
❚Dynamically generated operation panels(Uniform look and feel)
❚ Configurable User Panelsand Logos
❚ “Embedded” standard partitioning rules:❙ Take❙ Include❙ Exclude❙ Etc.
1010Clara Gaspar, May 2010
Operation Domains❚Three Domains have been defined:
❙DCS❘For equipment which operation and stability is
normally related to a complete running periodExample: GAS, Cooling, Low Voltages, etc.
❙HV❘For equipment which operation is normally related
to the Machine state. Example: High Voltages
❙DAQ❘For equipment which operation is related to a RUN
Example: Readout electronics, High Level Trigger processes, etc.
1111Clara Gaspar, May 2010
FSM Templates❚ DCS Domain ❚ HV Domain
❚ DAQ Domain READY
STANDBY1
OFF
ERRORRecover
STANDBY2
RAMPING_STANDBY1
RAMPING_STANDBY2
RAMPING_READY
NOT_READY
Go_STANDBY1
Go_STANDBY2
Go_READY
RUNNING
READY
NOT_READY
Start Stop
ERROR UNKNOWN
Configure
Reset
Recover
CONFIGURING
READY
OFF
ERROR NOT_READY
Switch_ON Switch_OFF
Recover Switch_OFF
❚ All Devices and Sub-Systems have been implemented using one of these templates
1212Clara Gaspar, May 2010
❚Size of the Control Tree:❙Distributed over ~150 PCs
❘~100 Linux(50 for the HLT)
❘~ 50 Windows
❙>2000 Control Units❙>30000 Device Units
❚The Run Control can be seen as:❙The Root node of the tree➨ If the tree is partitioned there can be
several Run Controls.
ECS: Run Control
DCS
SubDetNDCS
SubDet1DCS
…
DAQ
SubDetNDAQ
SubDet1DAQ
…
HV TFC LHCHLT
SubDet1
ECS
XX
1313Clara Gaspar, May 2010
DAQ Partitioning❚ ECS Domain
RUNNING
READY
NOT_READY
StartRun StopRun
ERROR
Configure
Reset
Recover
CONFIGURING
NOT_ALLOCATED
Allocate
Deallocate
ALLOCATING
❚Creating a Partition❙Allocate = Get a “slice” of:
❘Timing & Fast Control (TFC)❘High Level Trigger Farm (HLT)❘Storage System❘Monitoring Farm
ACTIVE
StartTrigger StopTrigger
SWITCH
HLT farm
Detector
TFC System
SWITCHSWITCH SWITCH SWITCH SWITCH SWITCH
READOUT NETWORK
L0 triggerLHC clock
MEP Request
Event building
Front-End
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
CPU
Readout Board
Expe
rimen
t Con
trol
Sys
tem
(EC
S)
VELO ST OT RICH ECal HCal MuonL0
Trigger
Event dataTiming and Fast Control SignalsControl and Monitoring data
SWITCH
MON farm
CPU
CPU
CPU
CPU
Readout Board
Readout Board
Readout Board
Readout Board
Readout Board
Readout Board
FEElectronics
FEElectronics
FEElectronics
FEElectronics
FEElectronics
FEElectronics
FEElectronics
1414Clara Gaspar, May 2010
DomainX
Sub-Detector
Run Control User Interface
❚Matrix
❚ActivityUsed for
Configuring all Sub-Systems
1515Clara Gaspar, May 2010
DAQ Configuration❚Cold Start to Running takes ~4 mins
❙Slowest Sub-systems:❘MUON FE Electronics
〡Did not use the recommended HW interface…
❘High Level Trigger〡Startup and Configuration of ~10000 processes〡Trigger Processes based on Offline Software
❚But a cold start is very rare:❙Run is started well before “Stable Beams”❙“Fast Run Change” mechanism
❘When “conditions” change❘Takes ~5 seconds
1616Clara Gaspar, May 2010
LHCb Operations❚Only 2 Operators on shift
❙The Shift Leader has two views of the system❘Run Control UI❘LHCb Status UI
❚Automation❙ In Sub-systems
❘HLT:Dead TriggerTasks/Nodes
❘HV:Tripped Channels
❙Centrally❘Voltages depending
on LHC State❘More will come…
1717Clara Gaspar, May 2010
Conclusions❚ LHCb has designed and implemented a coherent
and homogeneous control system
❚ The Run Control allows to:❙ Configure, Monitor and Operate the Full Experiment
❙ Run any combination of sub-detectors in parallel in standalone
❙ Can be completely automated (when we understand the machine)
❚ Some of its main features:❙ Sequencing, Automation, Error recovery, Partitioning
➨ Come from the usage of SMI++ (integrated with PVSS)
❚ It’s being used daily for Physics data taking and other global or sub-detector activities
1818Clara Gaspar, May 2010
Sub-Detector Run Control
❚“Scan” Run