CHEP06 - Mumbai / INDIA
Studies with the ATLAS Trigger & Data Acquisition "pre-series" setup
N. Gökhan Ünel (UCI & CERN)
on behalf of the ATLAS TDAQ community
ATLAS Trigger / DAQ Data Flow
[Diagram (UX15 / USA15 / SDX1): ATLAS detector → Read-Out Drivers (RODs) → 1600 Read-Out Links → Read-Out Subsystems (ROSs, ~150 PCs); dedicated links and Timing Trigger Control (TTC) from the first-level trigger; the RoI Builder passes Regions of Interest to the LVL2 Supervisor; Gigabit Ethernet connects to SDX1: second-level trigger farm (~500 dual-CPU nodes), DataFlow Manager, Event Builder SubFarm Inputs (SFIs, ~100), Event Filter (EF, ~1600 dual-CPU nodes), SubFarm Outputs (SFOs, ~30), network switches, pROS (stores the LVL2 output), data storage at the CERN computer centre. Arrow labels: data of events accepted by the first-level trigger, event data requests, requested event data, delete commands.]
Event data pushed @ ≤ 100 kHz, 1600 fragments of ~1 kByte each.
Event data pulled: partial events @ ≤ 100 kHz, full events @ ~3 kHz. Event rate ~200 Hz to data storage.
TDAQ details: B. Gorini's talk
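The rates and sizes quoted in the diagram can be combined into a rough bandwidth budget. The sketch below is a back-of-the-envelope calculation using only the approximate figures above; the byte conversions are the only additions.

```python
# Rough data-flow bandwidth implied by the rates quoted in the diagram above.
LVL1_RATE_HZ    = 100e3   # events pushed into the ROSs
N_FRAGMENTS     = 1600    # read-out links / fragments per event
FRAGMENT_KB     = 1.0     # ~1 kByte per fragment
EB_RATE_HZ      = 3e3     # full events pulled by the Event Builder
STORAGE_RATE_HZ = 200     # events written to data storage

event_size_mb = N_FRAGMENTS * FRAGMENT_KB / 1024        # ~1.6 MB per event
ros_input_gbs = LVL1_RATE_HZ * event_size_mb / 1024      # pushed into the ROSs
eb_gbs        = EB_RATE_HZ * event_size_mb / 1024        # assembled by the EB
storage_mbs   = STORAGE_RATE_HZ * event_size_mb          # written by the SFOs

print(f"event size   ~{event_size_mb:.1f} MB")
print(f"ROS input    ~{ros_input_gbs:.0f} GB/s aggregate")
print(f"event build  ~{eb_gbs:.1f} GB/s")
print(f"storage      ~{storage_mbs:.0f} MB/s")
```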
Pre-series of final system: 8 racks at Point-1 (10% of final dataflow)
• ROS, L2, EFIO and EF racks: one Local File Server, one or more Local Switches.
• Machine park: dual Opteron and Xeon nodes; ROS nodes are uniprocessor.
• OS: net-booted and diskless nodes (local disks used as scratch), running Scientific Linux CERN version 3.
• Trigger: free trigger from the L2SV, or frequency manually set using the LTP.
Rack layout (surface: SDX1; underground: USA15):
• One switch rack (TDAQ rack): 128-port GEth for L2+EB
• One ROS rack (TC rack + horizontal cooling): 12 ROS, 48 ROBINs
• One full L2 rack (TDAQ rack): 30 HLT PCs
• Partial Supervisor rack (TDAQ rack): 3 HE PCs
• Partial EFIO rack (TDAQ rack): 10 HE PCs (6 SFI, 2 SFO, 2 DFM)
• Partial EF rack (TDAQ rack): 12 HLT PCs
• Partial ONLINE rack (TDAQ rack): 4 HLT PCs (monitoring), 2 LE PCs (control), 2 Central File Servers
• RoIB rack (TC rack + horizontal cooling): 50% of RoIB
Farm management details: M. Dobson's talk
ROS & ROBIN basics
• A ROBIN card with 3 S-Link fibre inputs is the basic Read-Out System (ROS) unit. A typical ROS houses 4 such cards (4 × 3 = 12 input channels).
• ATLAS has ~1500 fibre links and ~150 ROS nodes (see the sketch below).
• The ROS software ensures parallel processing of the 12 input channels.
[Diagram: custom-design ROBIN card, 3 input channels, PCI readout.]
ROS internal parameters are adjusted to achieve the maximum LVL1 rate.
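A minimal sketch of the channel counting above; the link and card counts are the figures quoted on the slide, and the comment on the installed ROS count is an assumption.

```python
import math

# Channel counting for the read-out system, using the figures quoted above.
CHANNELS_PER_ROBIN = 3      # S-Link fibre inputs per ROBIN card
ROBINS_PER_ROS     = 4      # ROBIN cards housed in a typical ROS PC
N_READOUT_LINKS    = 1500   # approximate number of ATLAS fibre links

channels_per_ros = CHANNELS_PER_ROBIN * ROBINS_PER_ROS            # 4 x 3 = 12
min_ros_nodes    = math.ceil(N_READOUT_LINKS / channels_per_ros)  # 125 if every ROS were full

# The quoted ~150 ROS nodes is larger than this minimum, presumably because links
# are grouped by sub-detector and not every ROS is fully populated.
print(channels_per_ros, min_ros_nodes)
```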
ROS studies
1. A paper model is used to estimate ROS requirements (see the sketch below).
2. The maximum LVL1 rate is measured on the final hardware.
3. Zoom in to the "ATLAS region" (low-luminosity operating region).
[Plots: maximum LVL1 rate vs. Level 2 trigger acceptance (%), compared with the "hottest" ROS from the paper model; final ROS hardware used, UDP as network protocol, ROS with 12 ROLs, ROL size = 1 kB.]
Performance of the final ROS (PC + ROBIN) is already above requirements.
ROD-ROS mapping optimization would further reduce the ROS requirements.
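The exact form of the paper model is not given on the slide; the sketch below is only one plausible way such a per-ROS estimate can be written. The RoI-request fraction is an assumed number, while the LVL1 rate and accept ratio are taken from other slides.

```python
# Hypothetical sketch of a per-ROS "paper model": the request rate seen by one ROS
# is RoI-driven LVL2 requests plus event-building requests for accepted events.
def ros_request_rate_khz(lvl1_rate_khz, roi_fraction, lvl2_accept):
    # roi_fraction: fraction of LVL1 events whose RoIs touch this ROS (assumed)
    # lvl2_accept : LVL2 accept ratio -> every ROS is read out for these events
    return lvl1_rate_khz * (roi_fraction + lvl2_accept)

# Example: 100 kHz LVL1, a "hot" ROS hit by 20% of the RoI requests (assumption),
# 3.5% nominal LVL2 accept ratio.
print(ros_request_rate_khz(100.0, 0.20, 0.035))   # ~23.5 kHz of requests
```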
EB studies
• Pre-series setup: 11x6x0; FS = 1 kB. Software version: tdaq-01-02.
• EB performance is understood in terms of Gigabit line speed and SFI performance (no SFI output).
• These results are used to predict the number of SFIs needed in the final ATLAS.
• EB subsystem parameters are optimized using measurements on the hardware.
• Discrete event simulation modeling faithfully reproduces the hardware measurements.
[Plot: EB throughput (MB/s) vs. #SFI for 2, 4, 8 and 11 ROSs; linear behaviour up to the Gigabit limit.]
[Plot: EB rate (Hz) vs. TS credits, measurements and modeling; 11 ROSs, 12 ROLs, prototype ROBIN emulation (1 kB per ROL), 1 to 6 SFIs, TS = 11, network protocol: UDP.]
[Plot: extrapolated EB throughput (MB/s) vs. #SFIs up to final ATLAS size (P1); ATLAS operation point at 70% Gigabit.]
We estimate that the final ATLAS would need ~80 SFIs when 70% of the input bandwidth is utilized.
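A sketch of the arithmetic behind the ~80 SFI figure. The event size and accept ratio are approximate values taken from other slides, so this is a consistency check rather than the actual extrapolation (which uses the measurements and the discrete event simulation).

```python
# Back-of-the-envelope check of the ~80 SFI estimate at 70% Gigabit utilization.
GBE_MB_S      = 125.0   # Gigabit Ethernet line speed in MB/s
UTILIZATION   = 0.70    # assume each SFI input runs at 70% of line speed
LVL1_RATE_HZ  = 100e3
LVL2_ACCEPT   = 0.035   # nominal LVL2 accept ratio (quoted later)
EVENT_SIZE_MB = 2.0     # assumed full event size (~1-2 MB quoted on other slides)

eb_rate_hz  = LVL1_RATE_HZ * LVL2_ACCEPT              # ~3.5 kHz of built events
total_mb_s  = eb_rate_hz * EVENT_SIZE_MB              # ~7000 MB/s to assemble
sfis_needed = total_mb_s / (GBE_MB_S * UTILIZATION)   # ~80 SFIs

print(round(sfis_needed))
```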
L2 & ROIB studies
• We estimate that 4-5 L2SVs can handle the load of the full ATLAS (see the sketch below).
• The RoI data access time is measured as a function of the number of ROBs, to calculate the time budget left for the algorithms.
[Plot: access time (µs) vs. nROBs, for "ROS not busy" and "ROS busy".]
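As a rough illustration of the supervisor counting: the per-L2SV capacity below is an assumed number chosen only to be consistent with the 4-5 L2SV estimate, not a measurement from the slide.

```python
import math

# Hypothetical L2SV counting: the supervisors share the full LVL1 accept rate.
LVL1_RATE_HZ     = 100e3   # rate of LVL1-accepted events to be assigned to L2PUs
L2SV_CAPACITY_HZ = 25e3    # assumed assignment capacity of a single L2SV

print(math.ceil(LVL1_RATE_HZ / L2SV_CAPACITY_HZ))   # -> 4, i.e. 4-5 with headroom
```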
Combined system -1
• The triggers are generated by the L2PUs, driving the system as fast as possible. Stable operation is observed even in overdriven cases.
• L2PUs run without algorithms, with more parallel threads.
• Each L2PU represents an LVL2 subfarm.
• Measurements and their simulations are in good agreement, allowing the simulations to be scaled from the pre-series setup (~10%) to the final ATLAS size.
[Plot (to be updated): EB rate (Hz) vs. #L2PU for accept ratios 0.035, 0.05 and 0.10.]
ATLAS nominally runs at a 3.5% LVL2 accept ratio.
Combined system -2
• Performance analysis here
Simulations
• Scaling to full size, from Kris's simulations.
Adding Event Filter
• Output from the SFI to the EF nodes decreases the maximum SFI throughput.
• The throughput with EF saturates at about 80% of the maximum for large events (typical ATLAS event ~1 MB).
• 80 / 0.80 = 100 SFIs needed for the final ATLAS.
• 2 EF nodes (without algorithms) are enough to saturate 1 SFI.
• 6 × 2 = 12 EF nodes are needed to drive the pre-series.
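The SFI and EF node counting in the bullets above, written out as a small sketch; all figures are the ones quoted on the slide.

```python
import math

SFIS_AT_FULL_THROUGHPUT = 80     # SFIs needed if they only built events (previous slide)
SFI_EFFICIENCY_WITH_EF  = 0.80   # throughput saturates at ~80% when also serving the EF
EF_NODES_PER_SFI        = 2      # 2 EF nodes without algorithms saturate 1 SFI
PRESERIES_SFIS          = 6      # SFIs in the pre-series EFIO rack

sfis_final_atlas   = math.ceil(SFIS_AT_FULL_THROUGHPUT / SFI_EFFICIENCY_WITH_EF)  # 100
ef_nodes_preseries = PRESERIES_SFIS * EF_NODES_PER_SFI                            # 12

print(sfis_final_atlas, ef_nodes_preseries)
```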
[Plot: #EFDs vs. SFI efficiency (%), for event sizes of 137 kB and 268 kB.]
Event Filter details: K. Kordas' talk
Adding SFO
• The number of SFOs is determined by disk I/O capacity and by the requirement of disk space for ~24 hours of independent running.
• The TDR assumed 15 MB/s of throughput capacity on a single disk: 30 PCs therefore provide 450 MB/s (200 Hz × ~2 MB; see the sketch after the table below).
• Faster disk I/O and larger disks could mean fewer SFOs.
• RAID can bring protection against failures:
• Which RAID level (RAID1, RAID5, RAID50)? Which filesystem?
MB/s     sw    hw1   hw2
raid1    44    48    51
raid5    53    73    93
raid50   n/a   n/a
(hw1: cheap, hw2: expensive; 0.5 .. 1.5 MB events)
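The SFO throughput arithmetic referenced above, as a small sketch; the figures are the ones quoted on the slide.

```python
import math

EVENT_RATE_HZ = 200     # events written to mass storage
EVENT_SIZE_MB = 2.0     # ~2 MB per event
DISK_MB_S     = 15.0    # TDR assumption for a single disk
N_SFO_PCS     = 30

required_mb_s = EVENT_RATE_HZ * EVENT_SIZE_MB           # 400 MB/s to disk
min_sfos      = math.ceil(required_mb_s / DISK_MB_S)    # 27, rounded up to 30 PCs
capacity_mb_s = N_SFO_PCS * DISK_MB_S                   # 30 PCs -> 450 MB/s

print(required_mb_s, min_sfos, capacity_mb_s)
```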
Adding a detector
• Using the pre-series with the Tile Calorimeter Barrel setup at the pit (16 ROLs in 8 ROBINs, 2 ROSs).
• Possible setups:
  • Simple EB (DFM self-trigger)
  • Combined (RoIB trigger)
• Goals for this exercise:
  • Exercise the TDAQ chain (done)
  • Write data at the SFO level (done)
  • Use the HLT software to match μ tracks (prevented by lack of time)
[Diagram: data path RODs → ROSs → SFI → EF → SFO → disk; trigger path RoIB → L2SV → L2PU; the DFM starts the EB.]
Adding Monitoring
• Alina's work.
• Efficiency is obtained by comparing rates with and without monitoring (see the sketch below).
• The goal is to determine the maximum allowed monitoring frequency.
[Plot: SFI monitoring, efficiency (%) vs. monitoring rate (Hz).]
[Plot: ROS and SFI monitoring, efficiency vs. monitoring frequency, for the "SFI only" and "SFI & ROS" cases.]
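A minimal sketch of the efficiency definition used above; the rate values in the example are illustrative, not measurements from the slide.

```python
def efficiency_percent(rate_with_monitoring_hz, rate_without_monitoring_hz):
    """Fraction of the unmonitored event rate retained once monitoring is enabled."""
    return 100.0 * rate_with_monitoring_hz / rate_without_monitoring_hz

# Hypothetical example: an SFI building 95 Hz with monitoring vs. 100 Hz without.
print(efficiency_percent(95.0, 100.0))   # -> 95.0 %
```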
Stability & Recovery issues
• Force systematic failures in hardware and in software, and check the system recovery.
• Almost all infrastructure application failures can be recovered.
• Apart from the ROS, all DataFlow applications can be recovered (the ROS needs a hardware handshake with the detector side).
• For hardware failures, the Process Manager will be part of the Unix services, to reintegrate a dead node into the TDAQ chain.
• Fault tolerance and recovery tests.
[Plot: performance of 8x8x20, TS = 14, Acc = 10; EB rate (Hz) vs. time (min). Combined system stable run at 10% accept ratio for ~8.5 hours.]
Configuration Retrieval
• Measure the time it takes to dump data from large databases.
• Can be done in a day.
• Can we scale up to the final system?
Replaying physics data
• Event data files from simulation studies for different physics objects (muons, e, jet) were preloaded into the ROS & RoIB.
• The whole TDAQ chain was triggered by the RoIB.
• Various physics algorithms were run on the LVL2 farm nodes to select events matching the required triggers.
• Selected events were assembled from all ROSs and sent to the Event Filter farm.
[Plot: fraction of events passing LVL2 as a function of the decision latency (ms), for mu, jet and e samples.]
Algorithm details: K. Kordas' talk
For the first time we have realistic measurements of various algorithm execution times.
Conclusions
✓ The pre-series system has shown that ATLAS TDAQ operates reliably and up to the requirements.
✓ We will be ready in time to match the LHC challenge.
Next steps
• Exploitation with algorithms
• Garbage collection
• State transition timing measurements
• Pre-series as a release testing environment