the lhcb event-builder

16
The LHCb Event-Builder Markus Frank, Jean-Christophe Garnier, Clara Gaspar, Richard Jacobson, Beat Jost, Guoming Liu, Niko Neufeld , CERN/PH 17 th Real-Time Conference Instituto Superior Tecnico Lisboa 2010

Upload: haruki

Post on 24-Feb-2016

31 views

Category:

Documents


0 download

DESCRIPTION

The LHCb Event-Builder. Markus Frank, Jean-Christophe Garnier , Clara Gaspar, Richard Jacobson, Beat Jost , Guoming Liu, Niko Neufeld , CERN/PH 17 th Real-Time Conference Instituto Superior Tecnico Lisboa 2010. Architecture. Detector. VELO. ECal. HCal. Muon. RICH. ST. OT. - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: The  LHCb  Event-Builder

The LHCb Event-BuilderMarkus Frank, Jean-Christophe Garnier, Clara Gaspar,

Richard Jacobson, Beat Jost, Guoming Liu, Niko Neufeld, CERN/PH

17th Real-Time ConferenceInstituto Superior Tecnico

Lisboa 2010

Page 2: The  LHCb  Event-Builder

Architecture

The LHCb Eventbuilder - RealTime 2010 - Niko Neufeld 2

SWITCH

HLT farm

Detector

TFC System

EdgeRouter

Edge Router

EdgeRouter

EdgeRouter

EdgeRouter

EdgeRouter

Main Router

L0 triggerLHC clock

MEP Request

Event building

Front-End

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

CPU

Readout Board

VELO ST OT RICH ECal HCal MuonL0

Trigger

Event dataTiming and Fast Control SignalsControl and Monitoring data

Readout Board

Readout Board

Readout Board

Readout Board

Readout Board

Readout Board

FEElectronics

FEElectronics

FEElectronics

FEElectronics

FEElectronics

FEElectronics

FEElectronics

CASTOR x 50

x 313

Page 3: The  LHCb  Event-Builder

Key Parameters

# links (UTP Cat 6) ~ 3000Event-size (total – zero-suppressed)

35 kB

Read-out rate 1 MHz# read-out boards 313output bandwidth / read-out board up to 4 Gigabit/s (4 Ethernet links) # farm-nodes 550 (will grow to 1500)input bandwidth / farm-node 1 Gigabit (1 dedicated Ethernet

link)# core-routers 1 (1260 ports)# edge routers 50 (48 ports)Event-rate to storage 2000 Hz (nominal)

The LHCb Eventbuilder - RealTime 2010 - Niko Neufeld 3

Page 4: The  LHCb  Event-Builder

Physical Installation

The LHCb Eventbuilder - RealTime 2010 - Niko Neufeld 4

• More than 4000 UTP Cat 6 patch connections• 3 floors of ~ 80 m2

• 3rd (top) read-out boards, • 2nd (middle): DAQ network, • 1st (bottom): event-filter

farm• Cat 6 patch-cords RJ45• High-density RJ21 6-port

connectors for main-router

3 readout boards2 DAQ network1 event-filter farm

Page 5: The  LHCb  Event-Builder

The Read-out Board (“TELL1”)

The LHCb Eventbuilder - RealTime 2010 - Niko Neufeld 5

12 * 1.6 Gbit/s

9.6 Gbit/s

9.6 Gbit/s

PreprocessingFeature ExtractionZero-suppression

Event fragment mergingFDAQ framing

Frame transmission4 x 1000 BaseT

To DAQ 4 Gbit/s

Page 6: The  LHCb  Event-Builder

The Protocol• Event fragments from up to 16

triggers packed into one Multi-Eventfragment Packet (MEP)

• A MEP must fit into one IPv4 packet (64 kB max!)

• IPv4 packets will be fragmented into Ethernet frames of MTU size

• MEP header is 12 bytes only (Event-number + length)

• IP address information is used in event-building

• Unreliable / no duplication / no re-transmission

The LHCb Eventbuilder - RealTime 2010 - Niko Neufeld 6

Header

Event 2

Event 1

Event n

MEPEther headerIP header

Payload

Ether header

IP header

Payload

Ether header

IP header

Payload

Page 7: The  LHCb  Event-Builder

Dataflow in LHCb

77

Switch Buffer

Data AcquisitionSwitch

2 Readout Supervisor tells readout boards where events must be sent (round-robin) 3 Readout boards do

not buffer, so switch must1 Event Builders

inform Readout supervisor about availability

“Send next event

to EB1”“Send

next event to EB2”

Event Builder 2

Event Builder 1

ReadoutSupervisor

EB2 available for events

EB1 available for events

The LHCb Eventbuilder - RealTime 2010 - Niko Neufeld

Page 8: The  LHCb  Event-Builder

The many sources of packet loss

• Packet corruption (normally < 10-12) – this usually is a badly handled overflow in the FPGAs (very rare now)– Depending on the corruption these events may be dropped already

in the network• Packet loss due to buffer overflow (ingress,

egress) in core-router, edge-switches or farm-node (between 10-9 and 10-11)– This packet loss often occurs in bursts

• Packet loss in core-router (silent!) < 10-13 – could only be fixed in session with company

engineers

The LHCb Eventbuilder - RealTime 2010 - Niko Neufeld 8

Page 9: The  LHCb  Event-Builder

Diagnostic: or “What the shifter saw”

The LHCb Eventbuilder - RealTime 2010 - Niko Neufeld 9

Page 10: The  LHCb  Event-Builder

Event assembly / buffering in farm -node

• IP packets re-assembled by OS

• Event-building process (MEPRx) puts data in shared buffer

• Monitoring in kernel difficult: need good (and generous) tuning!

The LHCb Eventbuilder - RealTime 2010 - Niko Neufeld 10

Frame buffer in NIC

DMA buffers(sk_buf)

socket-buffer

Shared Memory

MEPRx process

Main memory Network Interface

Card (NIC)

User-space OS-kernel

DMA

mem

cpy

Page 11: The  LHCb  Event-Builder

MEPRx resource usage

The LHCb Eventbuilder - RealTime 2010 - Niko Neufeld 11

0 500 1000 1500 2000 2500 3000 35000

10

20

30

40

50

60

0

20

40

60

80

100

120

140

CPU %MEM

Event-rate [Hz]

% s

ingl

e 54

20 c

ore

MB

(RAM

sha

red)

% CPU of one Intel 5420 core (a 4 core processor running at 2.5 GHzMEM is resident memory (i.e. pages locked in RAM)Precision of measurements is about 10% for CPU and 1% for RAM

Page 12: The  LHCb  Event-Builder

Buffer Usage

The LHCb Eventbuilder - RealTime 2010 - Niko Neufeld 12

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 170

500

1000

1500

2000

2500

3000

3500

Port (connected to edge-router)

Buff

er o

ccup

ancy

in k

BBuffer usage in core-router with a test using 270 sources @ 350 kHz event-rate

• 256 MB shared between 48 ports

• 17 ports used as “output”

• Non-uniformity under investigation

Page 13: The  LHCb  Event-Builder

Link Aggregation• IEEE 802.3ad (now actually

IEEE 802.1AX)• Does not define how traffic is

balanced over multiple links• Available balancing

algorithms are made to preserve frame-order

• Typically use a hash out of destination and source Ethernet and/or IP address– all links will be used when

different destinations sent to the same host

The LHCb Eventbuilder - RealTime 2010 - Niko Neufeld 13

Core-RouterEdge-Router

1 Gbit/s6 x 1 Gbit/s1 Gbit/s

Page 14: The  LHCb  Event-Builder

Clock Tolerance in IEEE 802.3ab• 40.6.1.2.6 Transmit

clock frequencyThe quinary symbol transmission rate on each pair of the master PHY shall be 125.00 MHz ± 0.01%.

• 40.6.1.3.2 Receiver frequency toleranceThe receive feature shall properly receive incoming data with a 5-level symbol rate within the range 125.00 MHz ± 0.01%.

The LHCb Eventbuilder - RealTime 2010 - Niko Neufeld 14

802.3ab transmits on 5 levels over4 twisted pairs @ 125 MHzInteframe gap (IFG) is 96 bits

Core-Router

Edge-Router

farm-node

Frame loss ~ 10-8

Frame 313 IFG Frame 312 IFG Frame 1 IFG…

Page 15: The  LHCb  Event-Builder

Not covered• Run-control (see talk by Clara

Gaspar)• Monitoring• Process Management in farm-nodes

(see poster of Juan Caicedo)• Event aggregation & storage (see

poster of Jean-Christophe Garnier)

The LHCb Eventbuilder - RealTime 2010 - Niko Neufeld 15

Page 16: The  LHCb  Event-Builder

Conclusions• A (almost) pure push protocol for a

40 Gigabyte/s DAQ works– With very good hardware– And a lot of hard work

• More than 3000 UTP links are no problem and you do not need TCP

• Commercial will most of time not work off the shelf

• We will now go on to see how it works at 40 MHz (1.2 Terabyte/s)

The LHCb Eventbuilder - RealTime 2010 - Niko Neufeld 16