
Hera-B DAQ System and its self-healing abilities

5 May 2003, V. Rybnikov, DESY, Hamburg

1. HERA-B experiment
2. DAQ architecture
   • Read-out
       - Self-healing tools
   • Switch
       - SLT nodes isolation
3. Run control system
4. Self-healing tools (software)
   • Releasing resources
   • Process recovery


HERA-B experiment (sub-detectors)


DAQ architecture

Diagram labels: Logging nodes (3); event rate 10 MHz; critical points

Rates (rate, unlabelled value, data volume):
50 kHz    < 30    13 Gb/s
500 Hz    200     165 MB/s
50 Hz     150     22 MB/s

~1100 SHARC nodes
240 SLT nodes
100 x 2 4LT CPUs
~2000 processes on ~1500 nodes


DAQ architecture (SHARC board)

• 6U VME card (MSC, Stutensee, Germany)
• 6 ADSP-21060 (Analog Devices), 40 MHz
• ADSP chip holds 512 KB on-chip memory
• global memory bus (240 MB/s in 48-bit words)
• external memory 256K x 32
• 10 DMA controllers per chip
    - 6 for 4-bit parallel links (40 MB/s)
    - 4 for global memory communication
• VME interface to write/read ADSP and global memory

Boards in use: 44 Switch, 1 FCS interface, 2 Event Controller, 140 SLBs
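This is a quick consistency check. It assumes that the figures 44 (Switch), 1 (FCS interface), 2 (Event Controller) and 140 (SLBs) are SHARC board counts. It also assumes that each board's 6 ADSP-21060 chips count as one SHARC node each. Under those assumptions, the total agrees with the "~1100 SHARC nodes" quoted on the DAQ architecture slide:

```latex
N_{\text{SHARC}} = 6 \times (44 + 1 + 2 + 140) = 6 \times 187 = 1122 \approx 1100
```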


DAQ architecture (read-out)

Diagram labels: SHARC interface, output FIFO, digital pipeline, ADC/TDC, control logic, piggyback, SHARC board; 0.5-60 m; 27-40 MHz

Total: ~2070 FEDs (32-1024 channels)

• Push-down system
• No missing clock allowed
• No hardware recovery


Self-healing tools (read-out recovery)

Diagram labels: DATA STREAM; monitor SVD, monitor ITR, common monitor; FED expert; ACTION

Monitors:
• check the event header information of every FED for errors

FED expert:
• FED error threshold
• min period between consecutive recoveries
• max number of consecutive recoveries
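The FED expert's three parameters interact, and the sketch below shows one way they could. It assumes a rule the slide does not state: act when any FED reaches the error threshold, refuse to act again within the minimum period, and stop acting after the maximum number of consecutive recoveries. `FedExpert`, `decide` and the return strings are hypothetical names.

```python
import time
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class FedExpert:
    """Hypothetical FED expert: turns per-FED error counts into a decision.

    The three knobs follow the slide; the decision rule itself is an assumption.
    """
    error_threshold: int   # FED error threshold (errors per check)
    min_period_s: float    # min period between consecutive recoveries
    max_consecutive: int   # max number of consecutive recoveries
    _last_recovery: float = field(default=float("-inf"), init=False)
    _consecutive: int = field(default=0, init=False)

    def decide(self, errors_per_fed: Dict[str, int], now: Optional[float] = None) -> str:
        """Return 'ok', 'recover', 'wait' or 'give_up'."""
        now = time.monotonic() if now is None else now
        if not any(n >= self.error_threshold for n in errors_per_fed.values()):
            self._consecutive = 0  # a clean check ends the series of recoveries
            return "ok"
        if now - self._last_recovery < self.min_period_s:
            return "wait"          # too soon after the previous recovery
        if self._consecutive >= self.max_consecutive:
            return "give_up"       # hypothetical: recoveries keep failing, hand over
        self._consecutive += 1
        self._last_recovery = now
        return "recover"
```

The minimum period stops the expert from resetting the read-out on every event while a previous recovery is still settling. The consecutive-recovery limit keeps a persistent hardware fault from being "healed" forever.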


Self-healing tools (read-out recovery)

ACTION
• stop triggers
• reset FEDs
• re-chain (initialize) event buffers and EventController
• start triggers

The action takes < 5 s; a run re-initialization takes ~2 min; a run re-start takes ~8-10 min.
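The slide gives the order of the recovery action and compares its cost with a run re-initialization and a run re-start. The sketch below runs the action as an ordered list of steps and falls back to the more expensive remedies only if a step fails. The escalation ladder is an assumption suggested by the timing figures, not something the slide states. `run_steps`, `recover` and `make_step` are hypothetical names, and each step is a stand-in for the real call.

```python
from typing import Callable, List, Sequence, Tuple

Step = Callable[[], bool]  # each step returns True on success


def run_steps(steps: Sequence[Step]) -> bool:
    """Run the steps in order; all() stops at the first step that returns False."""
    return all(step() for step in steps)


def recover(ladder: Sequence[Tuple[str, Sequence[Step]]]) -> str:
    """Try each remedy in turn (cheapest first); return the name of the one that worked."""
    for name, steps in ladder:
        if run_steps(steps):
            return name
    return "failed"


def make_step(log: List[str], name: str, ok: bool = True) -> Step:
    """Stand-in for a real hardware/software call: records its name and reports `ok`."""
    def _step() -> bool:
        log.append(name)
        return ok
    return _step
```

A ladder would be built as `[("action", [stop_triggers, reset_feds, rechain, start_triggers]), ("run re-initialization", [...]), ("run re-start", [...])]`. With the quoted timings, each rung that succeeds saves roughly one to two orders of magnitude of dead time compared with the next. A production version would also make sure the triggers are restarted or the run is stopped cleanly when a step fails midway.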


DAQ architecture (switch)

Diagram labels: from FEDs; Routing tables server

Routing tables server:
• reads the switch connection database
• creates routing tables in memory
• pushes the tables down into every SHARC node after boot-up

SHARC-to-PCI interface boards are used to connect the Second Level PCs to the SWITCH.
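The routing tables server turns the switch connection database into per-node routing tables. The sketch below does this with a breadth-first search over the connection graph, producing a "destination → next neighbour" table for every node. It assumes the database is a list of bidirectional node-to-node links and that shortest-hop routing is wanted. The real server's algorithm, database schema and push-down mechanism are not described on the slide; `build_routing_tables` is a hypothetical name.

```python
from collections import deque
from typing import Deque, Dict, Iterable, List, Tuple

Link = Tuple[str, str]  # hypothetical DB row: a bidirectional connection between two nodes


def build_routing_tables(links: Iterable[Link]) -> Dict[str, Dict[str, str]]:
    """For every node, map each reachable destination to the neighbour to forward to."""
    adj: Dict[str, List[str]] = {}
    for a, b in links:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)

    tables: Dict[str, Dict[str, str]] = {}
    for src in adj:
        table: Dict[str, str] = {}
        seen = {src}
        queue: Deque[str] = deque()
        for nb in adj[src]:            # direct neighbours are their own next hop
            if nb not in seen:
                seen.add(nb)
                table[nb] = nb
                queue.append(nb)
        while queue:                   # BFS: first discovery lies on a shortest path
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    table[nb] = table[node]  # inherit the first hop towards `node`
                    queue.append(nb)
        tables[src] = table
    return tables
```

The resulting dictionaries are what would then be pushed down into each SHARC node after boot-up. Computing them centrally from one database keeps every node's view of the switch consistent.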


Self-healing tools (SLT nodes isolation)

Diagram labels: from FEDs; DISTRIBUTOR

Distributor tasks:
• to send calibration constants to all Second Level Trigger (SLT) nodes
• to check the status of the SLT nodes (processes) via ping-pong messages

Problem: messages addressed to a dead node (process) accumulate and block the switch.
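The distributor's ping-pong check exists to catch the problem above: messages piling up for a node that no longer answers. The sketch below records the last "pong" from each SLT node and declares a node dead once it has been silent longer than a timeout. Dead nodes are then left out of the recipient list for further messages such as calibration constants. The timeout rule and the names `PingPongMonitor`, `on_pong`, `check` and `recipients` are assumptions for illustration.

```python
from typing import Dict, Iterable, List, Set


class PingPongMonitor:
    """Tracks the last 'pong' from each SLT node and flags silent ones as dead."""

    def __init__(self, nodes: Iterable[str], timeout_s: float, now: float) -> None:
        self.timeout_s = timeout_s
        self.last_pong: Dict[str, float] = {n: now for n in nodes}  # grace period from start
        self.dead: Set[str] = set()

    def on_pong(self, node: str, now: float) -> None:
        """Record a reply; replies from nodes already declared dead are ignored."""
        if node not in self.dead:
            self.last_pong[node] = now

    def check(self, now: float) -> List[str]:
        """Return the nodes that have just been declared dead."""
        newly_dead = sorted(n for n, t in self.last_pong.items()
                            if n not in self.dead and now - t > self.timeout_s)
        self.dead.update(newly_dead)
        return newly_dead

    def recipients(self) -> List[str]:
        """Nodes that may still be sent messages (e.g. calibration constants)."""
        return sorted(n for n in self.last_pong if n not in self.dead)
```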


Self-healing tools (SLT nodes isolation)

Diagram components: SLT process, distributor, SLT process expert, process server, routing table server

Diagram messages: ping-pong; "SLT process died" (shown twice); "change routing"; "terminate SLT process"; routing information; interconnections
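The SLT process expert can be sketched as an object that reacts to an "SLT process died" report in two ways: it asks the routing table server to stop routing events to that node, and it asks the process server to terminate the process. This assumes the two "SLT process died" labels are independent reports of the same death, for example from the distributor and the process server, so the expert must act only once per node. The toy `RoutingTableServer` with round-robin event assignment, `SltProcessExpert` and the `terminate` callback are all hypothetical stand-ins.

```python
from typing import Callable, Iterable, List, Set


class RoutingTableServer:
    """Toy stand-in: keeps the list of SLT nodes that may receive events."""

    def __init__(self, slt_nodes: Iterable[str]) -> None:
        self.targets: List[str] = list(slt_nodes)

    def exclude(self, node: str) -> None:
        if node in self.targets:
            self.targets.remove(node)

    def route(self, event_number: int) -> str:
        """Round-robin assignment of events to the remaining SLT nodes (assumption)."""
        if not self.targets:
            raise RuntimeError("no SLT node available")
        return self.targets[event_number % len(self.targets)]


class SltProcessExpert:
    """Isolates a dead SLT node: change routing first, then terminate the process."""

    def __init__(self, routing: RoutingTableServer, terminate: Callable[[str], None]) -> None:
        self.routing = routing
        self.terminate = terminate
        self.isolated: Set[str] = set()

    def on_process_died(self, node: str) -> None:
        if node in self.isolated:
            return  # the same death may be reported more than once
        self.isolated.add(node)
        self.routing.exclude(node)  # stop new messages piling up for the node
        self.terminate(node)        # then clean up whatever is left of the process
```

Changing the routing before terminating the process means no new traffic is queued for the node while it is being cleaned up. Queued traffic for a dead node is exactly what blocks the switch.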


Run control system

BASICS
• the process information for all runs is stored in the DAQ database:
    - list of processes
    - how to start them (args, env, etc.)
    - where to start them
    - etc.
• all processes are started remotely by means of process servers and managers
• clean-up of shared resources (shared memory, semaphores, etc.) is carried out during the start-up and stop procedures
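The DAQ database stores, for each process, what to start, how (arguments, environment) and where. The sketch below models one such record and groups the records by host, since each host's process server is what actually starts them. The field names, `ProcessSpec` and `plan_start` are hypothetical. The real database schema is not given on the slides.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List


@dataclass
class ProcessSpec:
    """One hypothetical DAQ-database record: what to start, how, and where."""
    name: str
    host: str                                          # where to start it
    executable: str                                    # what to start
    args: List[str] = field(default_factory=list)      # how: command-line arguments
    env: Dict[str, str] = field(default_factory=dict)  # how: extra environment variables

    def argv(self) -> List[str]:
        return [self.executable, *self.args]


def plan_start(specs: Iterable[ProcessSpec]) -> Dict[str, List[ProcessSpec]]:
    """Group the records by host, i.e. by the process server that must start them."""
    by_host: Dict[str, List[ProcessSpec]] = {}
    for spec in specs:
        by_host.setdefault(spec.host, []).append(spec)
    return by_host
```

Keeping this information in the database rather than in start-up scripts means the same process definitions can be reused for every run type.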


Run control system (process service)

Features
• Process creation and termination on any 'ONLINE' machine
• Process status monitoring and notification about status changes
• Monitoring of node resource utilization (CPU, memory, etc.)

Implementation (diagram labels): start interface, proserv commands, proserv interface (start, stop, kill), inetd, process server
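The process server's start/stop/kill commands and status-change notification can be sketched on a single POSIX machine with Python's `subprocess` module. The sketch assumes that "stop" means SIGTERM with a fallback to SIGKILL after a timeout, and that listeners are notified only when a status actually changes. Remote access, the inetd integration and resource monitoring are not modelled. `ProcessServer` and its method names are hypothetical.

```python
import subprocess
from typing import Callable, Dict, List, Optional, Sequence

Listener = Callable[[str, str], None]  # called as listener(process_name, new_status)


class ProcessServer:
    """Local, single-machine sketch of a process server (POSIX)."""

    def __init__(self) -> None:
        self.procs: Dict[str, subprocess.Popen] = {}
        self.status: Dict[str, str] = {}
        self.listeners: List[Listener] = []

    def _set_status(self, name: str, status: str) -> None:
        if self.status.get(name) != status:  # notify only on a change
            self.status[name] = status
            for listener in self.listeners:
                listener(name, status)

    def start(self, name: str, argv: Sequence[str], env: Optional[Dict[str, str]] = None) -> None:
        self.procs[name] = subprocess.Popen(list(argv), env=env)
        self._set_status(name, "running")

    def stop(self, name: str, timeout_s: float = 5.0) -> None:
        """Polite termination (SIGTERM); fall back to kill if it does not exit in time."""
        proc = self.procs[name]
        proc.terminate()
        try:
            proc.wait(timeout=timeout_s)
        except subprocess.TimeoutExpired:
            proc.kill()
            proc.wait()
        self.poll()

    def kill(self, name: str) -> None:
        proc = self.procs[name]
        proc.kill()  # SIGKILL on POSIX
        proc.wait()
        self.poll()

    def poll(self) -> None:
        """Check every process once and report the ones that have exited."""
        for name, proc in self.procs.items():
            code = proc.poll()
            if code is not None:
                self._set_status(name, f"exited({code})")
```

On POSIX a negative exit code means the process was ended by that signal, so `exited(-15)` and `exited(-9)` distinguish a stop from a kill. A process manager subscribed as a listener learns about every termination without polling itself.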


Run control system (process management)

Diagram labels:
• Data Base (Data Taking, Slow Control, Standalone, Test, Reprocessing, MC)
• Boot-up procedure supporters
• "SYSTEM" Process Managers
• global processes
• Component FARM processes

Run Watch is the very first process for every run.


Run control system (DAQ data base)

process configuration

process template


Run control system (run boot-up)

Run Watch:
• checks the process servers on all machines
    - restarts them if required
• frees resources by launching 'fini' scripts

Diagram labels: Run Control GUI; "SYSTEM" Process Manager (global processes); Process manager COMP 1 ... Process manager COMP N (comp 1 processes ... comp N processes)
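The two boot-up steps listed for Run Watch can be sketched as a single function. It first makes sure every machine has a working process server, then runs every clean-up ('fini') script even if an earlier one fails. The checks and restarts are passed in as callbacks because the real mechanism is not described. For illustration the scripts run locally, whereas in the real system they presumably run on the machines that own the resources. `boot_up` and its parameters are hypothetical.

```python
import subprocess
from typing import Callable, Iterable, List, Sequence


def boot_up(hosts: Iterable[str],
            server_alive: Callable[[str], bool],
            restart_server: Callable[[str], None],
            fini_scripts: Iterable[Sequence[str]],
            timeout_s: float = 60.0) -> List[str]:
    """Run Watch boot-up steps (sketch); returns the fini scripts that failed."""
    # 1. Make sure every machine has a working process server.
    for host in hosts:
        if not server_alive(host):
            restart_server(host)
    # 2. Free resources left over from previous runs. A failing script must not
    #    prevent the remaining ones from running.
    failed: List[str] = []
    for argv in fini_scripts:
        try:
            ok = subprocess.run(list(argv), timeout=timeout_s).returncode == 0
        except (OSError, subprocess.TimeoutExpired):
            ok = False  # missing script or hung clean-up
        if not ok:
            failed.append(" ".join(argv))
    return failed
```

Returning the failed scripts instead of raising lets the run control decide whether a left-over resource is serious enough to block the run.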


Self-healing tools (process recovery)

Process server reports on process termination → Process Manager checks processes:

Can be restarted?
    Yes → restart
    No → Critical? (Yes / No → forget)
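The process manager's decision tree is small enough to write out directly. The sketch assumes that the "restartable" and "critical" flags are stored per process, for example in the DAQ database. The branch for a critical process that cannot be restarted is not readable in the transcript, so "escalate" below is a placeholder, not the author's choice. `ManagedProcess` and `on_termination` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class ManagedProcess:
    """Hypothetical per-process flags, e.g. taken from the DAQ database."""
    name: str
    restartable: bool
    critical: bool


def on_termination(proc: ManagedProcess) -> str:
    """Process manager's reaction when the process server reports a termination."""
    if proc.restartable:
        return "restart"
    if not proc.critical:
        return "forget"
    # Critical and not restartable: outcome not recoverable from the transcript;
    # "escalate" is a placeholder.
    return "escalate"
```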


Conclusions

HERA-B is a big, complex experiment developed and built by hundreds of scientists, engineers and technicians. The major developments are complete. Problems affecting data-taking efficiency are being fixed by introducing self-healing tools.


Appendix (ONLINE expert tools)


Switch performance


Switch performance


Switch routing