CHEP06 - Mumbai / INDIA
Studies with the ATLAS Trigger & Data Acquisition "pre-series" setup
N. Gökhan Ünel (UCI & CERN)
on behalf of the ATLAS TDAQ community
ATLAS Trigger / DAQ Data Flow
[Diagram (UX15 / USA15 / SDX1): ATLAS detector → Read-Out Drivers (RODs) → 1600 Read-Out Links → Read-Out Subsystems (ROSs, ~150 PCs); dedicated links and Timing Trigger Control (TTC) from the first-level trigger; the RoI Builder passes Regions of Interest to the LVL2 Supervisor; Gigabit Ethernet connects to SDX1: second-level trigger farm (~500 dual-CPU nodes), DataFlow Manager, Event Builder SubFarm Inputs (SFIs, ~100), Event Filter (EF, ~1600 dual-CPU nodes), SubFarm Outputs (SFOs, ~30), network switches, pROS (stores the LVL2 output), data storage at the CERN computer centre. Arrow labels: data of events accepted by the first-level trigger, event data requests, requested event data, delete commands.]
Event data pushed @ ≤ 100 kHz, 1600 fragments of ~1 kByte each.
Event data pulled: partial events @ ≤ 100 kHz, full events @ ~3 kHz. Event rate ~200 Hz to data storage.
TDAQ details: B. Gorini's talk
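The rates and sizes quoted in the diagram can be combined into a rough bandwidth budget. The sketch below is a back-of-the-envelope calculation using only the approximate figures above; the byte conversions are the only additions.

```python
# Rough data-flow bandwidth implied by the rates quoted in the diagram above.
LVL1_RATE_HZ    = 100e3   # events pushed into the ROSs
N_FRAGMENTS     = 1600    # read-out links / fragments per event
FRAGMENT_KB     = 1.0     # ~1 kByte per fragment
EB_RATE_HZ      = 3e3     # full events pulled by the Event Builder
STORAGE_RATE_HZ = 200     # events written to data storage

event_size_mb = N_FRAGMENTS * FRAGMENT_KB / 1024        # ~1.6 MB per event
ros_input_gbs = LVL1_RATE_HZ * event_size_mb / 1024      # pushed into the ROSs
eb_gbs        = EB_RATE_HZ * event_size_mb / 1024        # assembled by the EB
storage_mbs   = STORAGE_RATE_HZ * event_size_mb          # written by the SFOs

print(f"event size   ~{event_size_mb:.1f} MB")
print(f"ROS input    ~{ros_input_gbs:.0f} GB/s aggregate")
print(f"event build  ~{eb_gbs:.1f} GB/s")
print(f"storage      ~{storage_mbs:.0f} MB/s")
```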
Pre-series of final system: 8 racks at Point-1 (10% of final dataflow)
• ROS, L2, EFIO and EF racks: one Local File Server, one or more Local Switches.
• Machine park: dual Opteron and Xeon nodes; ROS nodes are uniprocessor.
• OS: net-booted and diskless nodes (local disks used as scratch), running Scientific Linux CERN version 3.
• Trigger: free trigger from the L2SV, or frequency manually set using the LTP.
Rack layout (surface: SDX1; underground: USA15):
• One switch rack (TDAQ rack): 128-port GEth for L2+EB
• One ROS rack (TC rack + horizontal cooling): 12 ROS, 48 ROBINs
• One full L2 rack (TDAQ rack): 30 HLT PCs
• Partial Supervisor rack (TDAQ rack): 3 HE PCs
• Partial EFIO rack (TDAQ rack): 10 HE PCs (6 SFI, 2 SFO, 2 DFM)
• Partial EF rack (TDAQ rack): 12 HLT PCs
• Partial ONLINE rack (TDAQ rack): 4 HLT PCs (monitoring), 2 LE PCs (control), 2 Central File Servers
• RoIB rack (TC rack + horizontal cooling): 50% of RoIB
Farm management details: M. Dobson's talk
ROS & ROBIN basics
• A ROBIN card with 3 S-Link fibre inputs is the basic Read-Out System (ROS) unit. A typical ROS houses 4 such cards (4 × 3 = 12 input channels).
• ATLAS has ~1500 fibre links and ~150 ROS nodes (see the sketch below).
• The ROS software ensures parallel processing of the 12 input channels.
[Diagram: custom-design ROBIN card, 3 input channels, PCI readout.]
ROS internal parameters are adjusted to achieve the maximum LVL1 rate.
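A minimal sketch of the channel counting above; the link and card counts are the figures quoted on the slide, and the comment on the installed ROS count is an assumption.

```python
import math

# Channel counting for the read-out system, using the figures quoted above.
CHANNELS_PER_ROBIN = 3      # S-Link fibre inputs per ROBIN card
ROBINS_PER_ROS     = 4      # ROBIN cards housed in a typical ROS PC
N_READOUT_LINKS    = 1500   # approximate number of ATLAS fibre links

channels_per_ros = CHANNELS_PER_ROBIN * ROBINS_PER_ROS            # 4 x 3 = 12
min_ros_nodes    = math.ceil(N_READOUT_LINKS / channels_per_ros)  # 125 if every ROS were full

# The quoted ~150 ROS nodes is larger than this minimum, presumably because links
# are grouped by sub-detector and not every ROS is fully populated.
print(channels_per_ros, min_ros_nodes)
```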
ROS studies
1. A paper model is used to estimate ROS requirements (see the sketch below).
2. The maximum LVL1 rate is measured on the final hardware.
3. Zoom in to the "ATLAS region" (low-luminosity operating region).
[Plots: maximum LVL1 rate vs. Level 2 trigger acceptance (%), compared with the "hottest" ROS from the paper model; final ROS hardware used, UDP as network protocol, ROS with 12 ROLs, ROL size = 1 kB.]
Performance of the final ROS (PC + ROBIN) is already above requirements.
ROD-ROS mapping optimization would further reduce the ROS requirements.
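The exact form of the paper model is not given on the slide; the sketch below is only one plausible way such a per-ROS estimate can be written. The RoI-request fraction is an assumed number, while the LVL1 rate and accept ratio are taken from other slides.

```python
# Hypothetical sketch of a per-ROS "paper model": the request rate seen by one ROS
# is RoI-driven LVL2 requests plus event-building requests for accepted events.
def ros_request_rate_khz(lvl1_rate_khz, roi_fraction, lvl2_accept):
    # roi_fraction: fraction of LVL1 events whose RoIs touch this ROS (assumed)
    # lvl2_accept : LVL2 accept ratio -> every ROS is read out for these events
    return lvl1_rate_khz * (roi_fraction + lvl2_accept)

# Example: 100 kHz LVL1, a "hot" ROS hit by 20% of the RoI requests (assumption),
# 3.5% nominal LVL2 accept ratio.
print(ros_request_rate_khz(100.0, 0.20, 0.035))   # ~23.5 kHz of requests
```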
EB studies
• Pre-series setup: 11x6x0; FS = 1 kB. Software version: tdaq-01-02.
• EB performance is understood in terms of Gigabit line speed and SFI performance (no SFI output).
• These results are used to predict the number of SFIs needed in the final ATLAS.
• EB subsystem parameters are optimized using measurements on the hardware.
• Discrete event simulation modeling faithfully reproduces the hardware measurements.
[Plot: EB throughput (MB/s) vs. #SFI for 2, 4, 8 and 11 ROSs; linear behaviour up to the Gigabit limit.]
[Plot: EB rate (Hz) vs. TS credits, measurements and modeling; 11 ROSs, 12 ROLs, prototype ROBIN emulation (1 kB per ROL), 1 to 6 SFIs, TS = 11, network protocol: UDP.]
[Plot: extrapolated EB throughput (MB/s) vs. #SFIs up to final ATLAS size (P1); ATLAS operation point at 70% Gigabit.]
We estimate that the final ATLAS would need ~80 SFIs when 70% of the input bandwidth is utilized.
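A sketch of the arithmetic behind the ~80 SFI figure. The event size and accept ratio are approximate values taken from other slides, so this is a consistency check rather than the actual extrapolation (which uses the measurements and the discrete event simulation).

```python
# Back-of-the-envelope check of the ~80 SFI estimate at 70% Gigabit utilization.
GBE_MB_S      = 125.0   # Gigabit Ethernet line speed in MB/s
UTILIZATION   = 0.70    # assume each SFI input runs at 70% of line speed
LVL1_RATE_HZ  = 100e3
LVL2_ACCEPT   = 0.035   # nominal LVL2 accept ratio (quoted later)
EVENT_SIZE_MB = 2.0     # assumed full event size (~1-2 MB quoted on other slides)

eb_rate_hz  = LVL1_RATE_HZ * LVL2_ACCEPT              # ~3.5 kHz of built events
total_mb_s  = eb_rate_hz * EVENT_SIZE_MB              # ~7000 MB/s to assemble
sfis_needed = total_mb_s / (GBE_MB_S * UTILIZATION)   # ~80 SFIs

print(round(sfis_needed))
```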
L2 & ROIB studies
• We estimate that 4-5 L2SVs can handle the load of the full ATLAS (see the sketch below).
• The RoI data access time is measured as a function of the number of ROBs, to calculate the time budget left for the algorithms.
[Plot: access time (µs) vs. nROBs, for "ROS not busy" and "ROS busy".]
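As a rough illustration of the supervisor counting: the per-L2SV capacity below is an assumed number chosen only to be consistent with the 4-5 L2SV estimate, not a measurement from the slide.

```python
import math

# Hypothetical L2SV counting: the supervisors share the full LVL1 accept rate.
LVL1_RATE_HZ     = 100e3   # rate of LVL1-accepted events to be assigned to L2PUs
L2SV_CAPACITY_HZ = 25e3    # assumed assignment capacity of a single L2SV

print(math.ceil(LVL1_RATE_HZ / L2SV_CAPACITY_HZ))   # -> 4, i.e. 4-5 with headroom
```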
Combined system -1
• The triggers are generated by the L2PUs, driving the system as fast as possible. Stable operation is observed even in overdriven cases.
• L2PUs run without algorithms, with more parallel threads.
• Each L2PU represents an LVL2 subfarm.
• Measurements and their simulations are in good agreement, allowing the simulations to be scaled from the pre-series setup (~10%) to the final ATLAS size.
[Plot (to be updated): EB rate (Hz) vs. #L2PU for accept ratios 0.035, 0.05 and 0.10.]
ATLAS nominally runs at a 3.5% LVL2 accept ratio.
Combined system -2
• Performance analysis here
Simulations
• Scaling to full size, from Kris's simulations.
Adding Event Filter
• Output from the SFI to the EF nodes decreases the maximum SFI throughput.
• The throughput with EF saturates at about 80% of the maximum for large events (typical ATLAS event ~1 MB).
• 80 / 0.80 = 100 SFIs needed for the final ATLAS.
• 2 EF nodes (without algorithms) are enough to saturate 1 SFI.
• 6 × 2 = 12 EF nodes are needed to drive the pre-series.
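The SFI and EF node counting in the bullets above, written out as a small sketch; all figures are the ones quoted on the slide.

```python
import math

SFIS_AT_FULL_THROUGHPUT = 80     # SFIs needed if they only built events (previous slide)
SFI_EFFICIENCY_WITH_EF  = 0.80   # throughput saturates at ~80% when also serving the EF
EF_NODES_PER_SFI        = 2      # 2 EF nodes without algorithms saturate 1 SFI
PRESERIES_SFIS          = 6      # SFIs in the pre-series EFIO rack

sfis_final_atlas   = math.ceil(SFIS_AT_FULL_THROUGHPUT / SFI_EFFICIENCY_WITH_EF)  # 100
ef_nodes_preseries = PRESERIES_SFIS * EF_NODES_PER_SFI                            # 12

print(sfis_final_atlas, ef_nodes_preseries)
```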
[Plot: #EFDs vs. SFI efficiency (%), for event sizes of 137 kB and 268 kB.]
Event Filter details: K. Kordas' talk
Adding SFO
• The number of SFOs is determined by disk I/O capacity and by the requirement of disk space for ~24 hours of independent running.
• The TDR assumed 15 MB/s of throughput capacity on a single disk: 30 PCs therefore provide 450 MB/s (200 Hz × ~2 MB; see the sketch after the table below).
• Faster disk I/O and larger disks could mean fewer SFOs.
• RAID can bring protection against failures:
• Which RAID level (RAID1, RAID5, RAID50)? Which filesystem?
MB/s     sw    hw1   hw2
raid1    44    48    51
raid5    53    73    93
raid50   n/a   n/a
(hw1: cheap, hw2: expensive; 0.5 .. 1.5 MB events)
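The SFO throughput arithmetic referenced above, as a small sketch; the figures are the ones quoted on the slide.

```python
import math

EVENT_RATE_HZ = 200     # events written to mass storage
EVENT_SIZE_MB = 2.0     # ~2 MB per event
DISK_MB_S     = 15.0    # TDR assumption for a single disk
N_SFO_PCS     = 30

required_mb_s = EVENT_RATE_HZ * EVENT_SIZE_MB           # 400 MB/s to disk
min_sfos      = math.ceil(required_mb_s / DISK_MB_S)    # 27, rounded up to 30 PCs
capacity_mb_s = N_SFO_PCS * DISK_MB_S                   # 30 PCs -> 450 MB/s

print(required_mb_s, min_sfos, capacity_mb_s)
```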
Adding a detector
• Using the pre-series with the Tile Calorimeter Barrel setup at the pit (16 ROLs in 8 ROBINs, 2 ROSs).
• Possible setups:
  • Simple EB (DFM self-trigger)
  • Combined (RoIB trigger)
• Goals for this exercise:
  • Exercise the TDAQ chain (done)
  • Write data at the SFO level (done)
  • Use the HLT software to match μ tracks (prevented by lack of time)
[Diagram: data path RODs → ROSs → SFI → EF → SFO → disk; trigger path RoIB → L2SV → L2PU; the DFM starts the EB.]
Adding Monitoring
• Alina's work.
• Efficiency is obtained by comparing rates with and without monitoring (see the sketch below).
• The goal is to determine the maximum allowed monitoring frequency.
[Plot: SFI monitoring, efficiency (%) vs. monitoring rate (Hz).]
[Plot: ROS and SFI monitoring, efficiency vs. monitoring frequency, for the "SFI only" and "SFI & ROS" cases.]
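A minimal sketch of the efficiency definition used above; the rate values in the example are illustrative, not measurements from the slide.

```python
def efficiency_percent(rate_with_monitoring_hz, rate_without_monitoring_hz):
    """Fraction of the unmonitored event rate retained once monitoring is enabled."""
    return 100.0 * rate_with_monitoring_hz / rate_without_monitoring_hz

# Hypothetical example: an SFI building 95 Hz with monitoring vs. 100 Hz without.
print(efficiency_percent(95.0, 100.0))   # -> 95.0 %
```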
Stability & Recovery issues
• Force systematic failures in hardware and in software, and check the system recovery.
• Almost all infrastructure application failures can be recovered.
• Apart from the ROS, all DataFlow applications can be recovered (the ROS needs a hardware handshake with the detector side).
• For hardware failures, the Process Manager will be part of the Unix services, to reintegrate a dead node into the TDAQ chain.
• Fault tolerance and recovery tests.
[Plot: performance of 8x8x20, TS = 14, Acc = 10; EB rate (Hz) vs. time (min). Combined system stable run at 10% accept ratio for ~8.5 hours.]
Configuration Retrieval
• Measure the time it takes to dump data from large databases.
• Can be done in a day.
• Can we scale up to the final system?
Replaying physics data
• Event data files from simulation studies for different physics objects (muons, e, jet) were preloaded into the ROS & RoIB.
• The whole TDAQ chain was triggered by the RoIB.
• Various physics algorithms were run on the LVL2 farm nodes to select events matching the required triggers.
• Selected events were assembled from all ROSs and sent to the Event Filter farm.
[Plot: fraction of events passing LVL2 as a function of the decision latency (ms), for mu, jet and e samples.]
Algorithm details: K. Kordas' talk
For the first time we have realistic measurements of various algorithm execution times.
Conclusions
✓ The pre-series system has shown that ATLAS TDAQ operates reliably and up to the requirements.
✓ We will be ready in time to match the LHC challenge.
Next steps
• Exploitation with algorithms
• Garbage collection
• State transition timing measurements
• Pre-series as a release testing environment