
Modeling of the architectural studies for the PANDA DAT system

K. Korcyl 1,2, W. Kuehn 3, J. Otwinowski 1, P. Salabura 1, L. Schmitt 4

1 Jagiellonian University, Krakow, Poland; 2 Cracow University of Technology, Krakow, Poland

3 Justus-Liebig-Universität Giessen, Giessen, Germany; 4 GSI, Darmstadt, Germany

Outline

• The PANDA experiment and TDAQ system
• Architecture proposal and operation of basic components
  – Detector Concentrator Board organization
  – L1 processing node
• Model of the architecture
• Preliminary results
• Conclusions

Detector and DAT requirements

• interaction rate: 10 MHz

• raw data flow: 40 - 80 GB/s

• typ. event size: 4 – 8 kB

• lack of hardware trigger signal

• continuously sampling FE

• flexibility in the choice of triggering algorithms

• cost efficient (COTS components)

PANDA DAT architecture

[Diagram: detector front-end data flows into the Detector Concentrator Boards, then through a switch to the L1 farm (L1out); further switches connect the L2 farm (L2out) and the L3 farm (L3out).]

Detector Front End Electronics

• Receives a precise synchronous clock signal from the central distribution system
• Operates in continuously sampling mode, capable of autonomous hit detection
• Time stamps data with the interaction time based on the central clock and sends the message towards the DCB (a minimal sketch of such a message follows)
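As an illustration only, a hit message of the kind the front end could send towards the DCB might carry the following fields; the struct and its field names are hypothetical and do not describe the actual PANDA data format.

```cpp
#include <cstdint>

// Hypothetical hit message sent from the front-end electronics to the DCB.
// Field names and widths are illustrative, not the PANDA data format.
struct FeHitMessage {
    uint64_t timeStamp;   // interaction time derived from the central clock
    uint16_t channelId;   // front-end channel that detected the hit
    uint16_t payloadSize; // number of bytes of sampled data that follow
    // ... sampled ADC data would follow in the real message
};
```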

Detector Concentrator Board

[Diagram: detector data with time stamp is stored in a paged main memory fed from a free-pages FIFO (empty FIFO = busy); the central clock drives the time stamps; a feature extractor reads pages and sends messages via L1out/L2out/L3out; the L1/L2/L3 decisions (L1-YES, timeout/L2-NO, L3-release) move page addresses between the L1 and L2 structures or return them to the free-pages FIFO.]

Detector Concentrator Board (2)

The local filtering process (Feature Extraction) is started after DCB_INSPECT_LATENCY. It may result in the generation of a message to the LVL1 farm.

The L1 structure is cleared by the purge process, which runs at the TIME STAMP rate with DCB_PURGE_LATENCY. Addresses of pages found in the L1 structure at position DCB_PURGE_LATENCY are returned to the free-pages FIFO (the detector data will be overwritten).

Positive LVL1 decisions save the detector data by moving the page address from the L1 to the L2 structure; a sketch of this page bookkeeping follows.
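A minimal sketch, in plain C++, of the page bookkeeping described on this slide: pages are taken from the free-pages FIFO on arrival, purged back after DCB_PURGE_LATENCY, and moved to the L2 structure on a positive LVL1 decision. The container choices and method names are assumptions, not the DCB firmware design.

```cpp
#include <cstddef>
#include <cstdint>
#include <deque>
#include <map>

// Illustrative sketch of the DCB page bookkeeping. Names such as
// DCB_PURGE_LATENCY mirror the slides; everything else is assumed.
class DcbPageManager {
public:
    explicit DcbPageManager(std::size_t nPages) {
        for (std::size_t p = 0; p < nPages; ++p) freePages.push_back(p);
    }

    // Store incoming detector data: take a page from the free-pages FIFO
    // and index it by time stamp in the L1 structure.
    bool store(uint64_t timeStamp) {
        if (freePages.empty()) return false;          // empty FIFO = busy
        l1Pages[timeStamp] = freePages.front();
        freePages.pop_front();
        return true;
    }

    // Purge process: pages older than DCB_PURGE_LATENCY without a positive
    // LVL1 decision go back to the free-pages FIFO (data will be overwritten).
    void purge(uint64_t now, uint64_t purgeLatency) {
        while (!l1Pages.empty() && l1Pages.begin()->first + purgeLatency < now) {
            freePages.push_back(l1Pages.begin()->second);
            l1Pages.erase(l1Pages.begin());
        }
    }

    // Positive LVL1 decision: keep the data by moving the page address
    // from the L1 structure to the L2 structure.
    void onL1Accept(uint64_t timeStamp) {
        auto it = l1Pages.find(timeStamp);
        if (it == l1Pages.end()) return;              // already purged
        l2Pages[it->first] = it->second;
        l1Pages.erase(it);
    }

private:
    std::deque<std::size_t> freePages;        // free-pages FIFO
    std::map<uint64_t, std::size_t> l1Pages;  // L1 structure: time stamp -> page
    std::map<uint64_t, std::size_t> l2Pages;  // L2 structure
};
```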

Detector Concentrator Board (3)

The DCB uses the TIME STAMPs to calculate the address of the LVL1 processing node to send the features to.

• N consecutive TIME STAMPs are directed to the same L1 port, which neutralizes the quantization of time into TIME STAMPs.
• N, the number of TIME STAMPs sent to each destination, is a parameter (minimum: 3).
• The DCBs change the LVL1 destination port every (TIME STAMP width [ns]) * N nanoseconds, which allows the architecture to be rescaled and avoids switch output port overflow (see the sketch below).
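A sketch of the destination calculation implied above; assigning ports round-robin in blocks of N time stamps is an assumption consistent with the slide, not a confirmed design detail.

```cpp
#include <cstddef>
#include <cstdint>

// N consecutive TIME STAMPs map to the same LVL1 port, so all DCBs switch
// destination every (time stamp width) * N nanoseconds. Round-robin over
// the available ports is assumed for illustration.
std::size_t lvl1Port(uint64_t timeStamp, unsigned n, std::size_t numPorts) {
    return (timeStamp / n) % numPorts;
}
```

With, say, N = 8 and a 100 ns TIME STAMP width, every DCB would switch its LVL1 destination every 800 ns; increasing the number of ports or N rescales the per-node load.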

LVL1 processing node

Inputs: Detector Concentrator data and neighbor LVL1 data (a copy of the DCB data).

3 timeout-based processes run at the TIME STAMP rate:

• open latency: allows storage of the data
• neighbor latency: sends a message to the neighbor LVL1
• close latency: closes storage and starts filtering

[Diagram: a paged main memory fed from a free-pages FIFO (empty = busy); the central clock time stamp opens and closes buffer0, buffer1 and buffer2, which hold the page addresses; a sketch of this buffer lifecycle follows.]
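A minimal sketch, in plain C++, of how the three latencies above could drive the lifecycle of one time-slice buffer; the state names, latency arguments and single onTick() entry point are illustrative assumptions, not the authors' implementation.

```cpp
#include <cstdint>

// Illustrative state machine for one LVL1 time-slice buffer, driven by the
// open, neighbor and close latencies named on the slide.
enum class BufferState { Closed, Open, Filtering };

struct TimeSliceBuffer {
    uint64_t sliceStamp = 0;            // TIME STAMP this buffer collects
    BufferState state = BufferState::Closed;
    bool neighborSent = false;

    // Called once per TIME STAMP tick for every buffer.
    void onTick(uint64_t now, uint64_t openLat, uint64_t neighborLat, uint64_t closeLat) {
        if (state == BufferState::Closed && now >= sliceStamp + openLat)
            state = BufferState::Open;              // start accepting page addresses
        if (state == BufferState::Open && !neighborSent && now >= sliceStamp + neighborLat)
            neighborSent = true;                    // forward a copy to the neighbor LVL1
        if (state == BufferState::Open && now >= sliceStamp + closeLat)
            state = BufferState::Filtering;         // stop storage, start filtering
    }
};
```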

L1 operation

L1 uses a sliding window of 3 TIME STAMPs to concatenate data originating from the same interaction (example: N = 8).

[Diagram: time slices TIME STAMP + 0 through TIME STAMP + 7 handled by this L1 node, preceded on the time axis by TIME STAMP - 2 and TIME STAMP - 1 received from the neighbor node.]

• The close timeout analyses 3 adjacent TIMESLICES, synchronous with the TIMESLICE rate and a preprogrammed latency (see the sketch below).
• TIME STAMP - 2 and TIME STAMP - 1 are received from the neighbor L1 process port with "earlier times".
• (+) Copying data across L1 access points addresses the L1 segmentation and boundary problems.
• (-) It duplicates data.
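A rough illustration of the sliding-window concatenation in plain C++; the Message type, container choices and closeSlice() interface are assumptions made for the sketch.

```cpp
#include <cstdint>
#include <map>
#include <vector>

// Messages are kept per TIME STAMP; on the close timeout three adjacent
// TIMESLICES are analysed together, so an interaction that spills across a
// slice boundary is not lost.
struct Message { uint64_t timeStamp; /* detector payload ... */ };

class L1Window {
public:
    void add(const Message& m) { slices[m.timeStamp].push_back(m); }

    // Called by the close timeout for slice t: gather t-2, t-1 and t.
    std::vector<Message> closeSlice(uint64_t t) {
        std::vector<Message> window;
        const uint64_t first = (t >= 2) ? t - 2 : 0;   // guard against underflow
        for (uint64_t s = first; s <= t; ++s) {
            auto it = slices.find(s);
            if (it != slices.end())
                window.insert(window.end(), it->second.begin(), it->second.end());
        }
        slices.erase(first);   // the oldest slice will not be needed again
        return window;         // handed to the filtering algorithm
    }

private:
    std::map<uint64_t, std::vector<Message>> slices;  // time stamp -> messages
};
```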

L2 operation

On reception of the event info (plus results from processing) from the L1 processor, the L2 may request additional data from the detector concentrators, referring to the event number (unicast requests - PULL architecture).

L2 processing may become a sequential procedure, requesting more data from various detectors in the course of verifying some physics hypotheses. The L2 latency can therefore vary depending on the data.

L2 negative decisions are broadcast to the DCBs (illustrative message types below).
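Hypothetical message layouts for the PULL exchange described above; the field names are invented for illustration and do not correspond to a documented PANDA format.

```cpp
#include <cstdint>

// LVL2 asks an individual DCB for more data about one event (unicast);
// rejections go to all DCBs (broadcast) so they can free the pages.
struct L2DataRequest {
    uint64_t eventNumber;   // event the L2 node is currently verifying
    uint16_t detectorId;    // which detector's concentrators are addressed
};

struct L2Decision {
    uint64_t eventNumber;
    bool accepted;          // false -> broadcast, DCBs return pages to the FIFO
};
```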

EB operation

For positively annotated events, the L2 processor sends the event info plus the results from processing to the EB processor for the last stage of filtering and Event Building.

On reception of the event info (plus results from processing) from the L2 processor, the EB processor makes a series of unicast requests to ALL detector concentrators, requesting the event data (this avoids the overflow that spontaneous replies would cause); see the sketch below.

After collecting the replies from ALL detectors, the L3 processor sends the RELEASE event broadcast message to all DCBs.

The L3 processor performs Event Building and sends the data to permanent storage.
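A schematic of the EB request/collect/release ordering in plain C++, with stubbed transport hooks; the function names are placeholders and only the sequencing is taken from the slide.

```cpp
#include <cstddef>
#include <cstdint>
#include <iostream>

// Stand-in transport hooks: they only print, so the example is runnable.
static void sendUnicastRequest(std::size_t dcb, uint64_t evt) {
    std::cout << "request event " << evt << " from DCB " << dcb << "\n";
}
static std::size_t receiveReply(uint64_t) { return 1; }   // stub: one reply per call
static void broadcastRelease(uint64_t evt) {
    std::cout << "RELEASE event " << evt << " broadcast to all DCBs\n";
}

void buildEvent(uint64_t eventNumber, std::size_t numDcbs) {
    for (std::size_t dcb = 0; dcb < numDcbs; ++dcb)
        sendUnicastRequest(dcb, eventNumber);   // unicast; avoids spontaneous-reply overflow

    std::size_t replies = 0;
    while (replies < numDcbs)                   // wait for data from ALL detectors
        replies += receiveReply(eventNumber);

    broadcastRelease(eventNumber);              // DCBs may now recycle the pages
    // ...event building and writing to permanent storage would follow here
}
```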

The PANDA DAT model

Uses SystemC, a discrete-event simulation platform.

Model of the architecture:

• Creates a physics generator (inverse-exponential inter-event times, 100 ns mean; see the sketch below)
• Creates 5 detectors with various data sizes
• Creates 40 DCBs (8 DCBs per detector)
• Creates 40 LVL1 processing nodes, 40 LVL2 processors and 40 EB processors
• Creates 3 Ethernet switches
• Connects all the components with 1 Gbps Ethernet links
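The slides name SystemC as the simulation platform; as a stand-alone illustration of just the physics generator, exponentially distributed inter-event gaps with a 100 ns mean (a 10 MHz interaction rate) can be drawn as follows. This plain C++ sketch is not the authors' SystemC module.

```cpp
#include <cstddef>
#include <cstdint>
#include <random>
#include <vector>

// Draw interaction times with exponentially distributed gaps (mean 100 ns),
// i.e. a 10 MHz Poisson interaction rate as stated on the slide.
std::vector<double> generateInteractionTimesNs(std::size_t nEvents, uint32_t seed = 42) {
    std::mt19937_64 rng(seed);
    std::exponential_distribution<double> gap(1.0 / 100.0);  // rate = 1/(100 ns)

    std::vector<double> times;
    times.reserve(nEvents);
    double t = 0.0;
    for (std::size_t i = 0; i < nEvents; ++i) {
        t += gap(rng);           // inter-event spacing drawn anew for each event
        times.push_back(t);      // absolute interaction time in nanoseconds
    }
    return times;
}
```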

The model – DCB Buffer Occupancy

The model – Filtering Latency

Time measured at the DCB between the arrival of the detector data and the LVL1, LVL2 and EB decisions.

The model - Latency Variation

Sufficient processing resources (number of CPUs) installed at the LVL1 processing node guarantee deterministic latency and lossless operation.

The model – LVL1 Throughput Scaling

The number of time stamps per LVL1 processing node and the time stamp width are the key parameters which allow the architecture to scale with the amount of data produced by the DCBs.

What we did so far:

• We proposed an architecture for the continuously sampling data acquisition system for PANDA.
• We built a behavioral model to evaluate and understand the impact of the key architectural parameters on the performance.

The architecture meets the requirements:

• operates at a 10 MHz interaction rate
• allows for data correlation based on time stamps
• offers flexibility for a wide range of filtering algorithms
• scales with the increased amount of data to be transferred and processed; this we still have to prove, and it is now our main focus

Scalability studies

[Diagram: interactions numbered 1 through 7 along the time axis.]

• The generator assigns a number to each interaction.
• It introduces a random delay of 200 ns per detector and stamps the data with the delayed time.
• It delivers the delayed messages to the DCBs (each detector has 8 DCBs).
• Each DCB selects 10% of the messages at random and sends them to the LVL1 nodes.
• LVL1 sorts the data and stores it according to the time stamp.
• LVL1 analyses the messages from 3 adjacent time slices.
• If the sum of messages reaches the criteria number, the contents of the 3 time stamps are histogrammed against the interaction number.
• If the number of collected messages for an interaction number meets another limit, the event is considered interesting and, if not previously recognized, is assigned to a CPU for processing. The earliest time stamp with a message belonging to the given interaction defines the INTERACTION TIME STAMP. (A sketch of this selection step follows.)
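A minimal sketch of the two-threshold selection described in the list above; the StudyMessage type, the threshold names and the return convention are assumptions made for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <vector>

// Three adjacent time slices are analysed together: if the total message
// count passes a first threshold, the messages are histogrammed per
// interaction number, and any interaction whose entry count passes a second
// threshold is flagged for CPU processing.
struct StudyMessage { uint64_t timeStamp; uint64_t interactionNumber; };

std::vector<uint64_t> selectInteractions(
    const std::vector<StudyMessage>& threeSlices,   // messages of 3 adjacent slices
    std::size_t criteriaCount,                      // first limit: total messages
    std::size_t interactionCount)                   // second limit: per interaction
{
    std::vector<uint64_t> accepted;
    if (threeSlices.size() < criteriaCount) return accepted;

    std::map<uint64_t, std::size_t> histogram;      // interaction number -> entries
    for (const auto& m : threeSlices) ++histogram[m.interactionNumber];

    for (const auto& [interaction, entries] : histogram)
        if (entries >= interactionCount)
            accepted.push_back(interaction);        // considered interesting
    return accepted;
}
```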

Scalability studies

[Plot: LVL1 accepted events. Entries versus number of time slice (0 to 15), with legend entries 7 through 12 and reference curves "7 - no delay" through "10 - no delay".]
