TDAQ Workshop - Geneve, 11 July 2002
General description
Event storing mechanism
EF dataflow
Tasks
Workers
Works
Athena integration
EF Supervisor
The detailed analysis of the DAQ-1 EF prototypes led to the identification of some properties which have been the basis of the new design:
Decoupling of the dataflow task from the processing one (PTs implemented as independent processes), in order to maximise the robustness and reliability of the EF dataflow
Minimum number of control points, in order to simplify the monitoring
Reliable and simple event recovery mechanism
Maximum exploitation of the SMP architecture wherever possible (especially for communication), while maintaining good hardware architecture flexibility
Scalability in terms of number of PTs
Versatile and modular event dataflow architecture
Design basis
The Event Handler is composed of two
types of processes:
the EF Dataflow (EFD)
the Processing Task (PT)
Each Processing Host (PH) hosts a single
EFD and several PTs, which shall be Athena
programs with specific algorithms, services
and converters for the EF context
The EFD exchanges events with the Data
Collection via the SFI and SFO components
[Figure: the Event Handler within the Data Collection: Processing Hosts PH1-PH3, each running one EFD with several PTs; SFIs (on Data Collection channels DCH1-DCHn) feed the EFDs, which send accepted events to SFOs (DCH1-DCHm).]
the EFD is the client of the SFI and SFO processes, which run on remote hosts
the EFD can support multiple, concurrent SFI and SFO connections
the protocol (based on raw TCP) has been defined in collaboration with the DataCollection group
Event Handler design
Events are sent on demand from the SFI to the EFD: the EFD sends an event request to the SFI when it is ready to handle one
When the SFI has an event ready, it responds by sending it
Only after the successful reception of the event does the EFD send a new event request
There is no time limit for the SFI to respond to the EFD
The event transfer is fully data driven
The EFD does not send an explicit acknowledgment to the SFI when the event has been secured on disk; instead, the SFI keeps the sent event in memory until the EFD requests the next event
In case the connection to an EFD is lost, the last sent event is put back in the queue and will be sent to another connection on the next event request
The returned event is made of a header (check word + event-size word) followed by the sequence of bytes corresponding to the event
SFI - EFD protocol
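The request-driven exchange above implies a simple framing that the EFD must parse on its raw TCP connection. The following is an illustrative sketch only: the check-word value, field widths and names (kCheckWord, parseFrame) are assumptions, not the actual DataCollection protocol.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical check word; the real value is fixed by the DataCollection protocol.
constexpr uint32_t kCheckWord = 0xEF00EF00;

struct EventFrame {
    uint32_t size;              // event size in bytes, from the header
    std::vector<uint8_t> data;  // raw event bytes
};

// Parse one "check word + event-size word + payload" frame, as the EFD
// would after receiving bytes from the SFI.
EventFrame parseFrame(const uint8_t* buf, size_t len) {
    if (len < 8) throw std::runtime_error("short header");
    uint32_t check, size;
    std::memcpy(&check, buf, 4);
    std::memcpy(&size, buf + 4, 4);
    if (check != kCheckWord) throw std::runtime_error("bad check word");
    if (len < 8 + size) throw std::runtime_error("truncated event");
    return EventFrame{size, std::vector<uint8_t>(buf + 8, buf + 8 + size)};
}
```

Only after parseFrame() succeeds would the EFD issue its next event request, which is what makes the transfer data driven.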
The EFD is still the client; 3 messages must be exchanged to transfer the event
The EFD has to send a space request to the SFO
If space is available, the SFO returns an event request, to which the EFD responds by sending the event
There is an unlimited waiting time for the event request after sending the space request
But there should not be any waiting time between the event request and the sending of the event data
In case the connection from the EFD to the SFO is lost before the event transfer is completed, the SFO discards the event data received so far
The EFD will re-establish a connection and send a new space request
This protocol is less efficient than the SFI to EFD event transfer, but the bandwidth requirement is also expected to be 10 times smaller
SFO - EFD protocol
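The 3-message handshake above can be summarised as a small state machine; the state and message names and the throw-on-violation behaviour are illustrative assumptions, not the real implementation.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical encoding of the 3-message SFO handshake:
// EFD asks for space -> SFO answers with an event request -> EFD sends the event.
enum class SfoState { Idle, SpaceRequested, Sending, Done };
enum class Msg { SpaceRequest, EventRequest, EventData };

SfoState step(SfoState s, Msg m) {
    switch (s) {
        case SfoState::Idle:
            if (m == Msg::SpaceRequest) return SfoState::SpaceRequested;
            break;
        case SfoState::SpaceRequested:
            // an unlimited wait is allowed in this state
            if (m == Msg::EventRequest) return SfoState::Sending;
            break;
        case SfoState::Sending:
            // no waiting time allowed between event request and event data
            if (m == Msg::EventData) return SfoState::Done;
            break;
        default: break;
    }
    throw std::runtime_error("protocol violation");
}
```

On a lost connection the sketch maps to dropping the state machine entirely and starting again from Idle with a new space request.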
The EFD, which is in charge of all data management and security issues,
gets events from the SFI
sorts events according to information contained in the event header
makes the events of a given type available to the corresponding event-type specific PT
sends selected events to the SFO
To avoid unnecessary data copies, the events are exchanged between the EFD and the PTs via a shared memory mapped file
this solution also offers the advantage of providing data recovery in case of a PT/EFD process crash
The required control and synchronisation information for the access to the m-mapped file is exchanged via Unix domain sockets
Event Handler Design
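The shared-file exchange can be sketched with POSIX mmap: a writer mapping and a read-only mapping of the same file see the same bytes without any copy. mapSharedFile and the file name are hypothetical helpers; the real EFD manages this file through the SharedHeap class.

```cpp
#include <cassert>
#include <cstring>
#include <fcntl.h>
#include <stdexcept>
#include <string>
#include <sys/mman.h>
#include <unistd.h>

// Map a file of the given size with the given protection (PROT_READ for a
// PT, PROT_READ | PROT_WRITE for the EFD). MAP_SHARED makes writes visible
// to every process mapping the same file.
char* mapSharedFile(const std::string& path, size_t size, int prot) {
    int fd = ::open(path.c_str(), O_RDWR | O_CREAT, 0600);
    if (fd < 0) throw std::runtime_error("open failed");
    if (::ftruncate(fd, size) != 0) throw std::runtime_error("ftruncate failed");
    void* p = ::mmap(nullptr, size, prot, MAP_SHARED, fd, 0);
    ::close(fd);  // the mapping keeps the file contents reachable
    if (p == MAP_FAILED) throw std::runtime_error("mmap failed");
    return static_cast<char*>(p);
}
```

Because the file backs the pages, data written by the EFD survives a crash of the writing process, which is the recovery property the slide relies on.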
The EFD
stores incoming events in the m-mapped file
passes to the PT, via Unix domain sockets,
the offset and size of the file portion
containing the event to be processed
The PT
maps this portion of the file in read-only mode (to protect the data against corruption and to allow event recovery in case the PT crashes)
when it needs to extend the event by adding computed data, it requests from the EFD a writable zone in the shared m-mapped file, in which it writes the data extension
In the EFD, only the event reference will be passed along in the internal
event data flow; the event data is never copied
Event Storing Mechanism
[Figure: on the Processing Host, the SFI feeds the EFD, which stores events in a memory mapped file shared with the PTs (PTA, PTB) and forwards selected events to the SFO; legend: EFD-SFI connection, Unix domain socket, reference in memory, Supervisor communications.]
This solution minimises disk I/O operations
When writing to a memory mapped file, data stays in physical memory until the pages need to be swapped out or the process is terminated by a crash
The OS will take care of saving all dirty pages
It is only in case of an OS crash that recovery of the file content may be questionable
Extending the physical memory should enhance the performance
The shared m-mapped file must support the management of blocks of different sizes
We chose a simple and efficient technique that provides blocks of different dimensions, with the constraint that the size must be a power of 2
On 32-bit Linux machines
The smallest block size is fixed by the virtual memory page size: 4096 bytes (= 2^12)
The maximum file size is in theory 2^32, although it should be limited to 2^30; otherwise there is no virtual address space left for the EFD process
Event Storing Mechanism
The file is divided into blocks of different sizes; free blocks of the same size are chained, and a small table of 18 entries holds the entry points of the different lists
When a block of a given size is requested
The next free block in the corresponding free-block list is returned
Otherwise, recursively, a free block of twice the size is taken and split in two: one half is returned and the other is inserted in the corresponding free-block list
When deleting a block
Recursively, one checks its other half to see whether it is free
If it is, the blocks are merged into a bigger block
When the other half is in use, the freed block is inserted in the corresponding free-block list
The simplicity of the algorithm provides a very fast memory management system, although it wastes some memory because of the size restriction
m-mapped file management algorithm
[Figure: the free-block table, indexed by block-size exponent from 12 up to 30.]
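The scheme described above is essentially a binary buddy allocator. Below is a minimal sketch over a flat offset space; the class name BuddyHeap and the free-list containers are hypothetical (the real code chains free blocks inside the m-mapped file itself).

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <stdexcept>

// Power-of-2 block allocator: split bigger blocks on demand, merge freed
// blocks with their buddy (offset ^ size) whenever the buddy is also free.
class BuddyHeap {
public:
    BuddyHeap(unsigned minLog, unsigned maxLog) : m_min(minLog), m_max(maxLog) {
        m_free[maxLog].insert(0);  // one big free block spanning the whole file
    }
    // Allocate a block of 2^log bytes; returns its offset.
    uint64_t alloc(unsigned log) {
        if (log < m_min || log > m_max) throw std::runtime_error("bad size");
        unsigned l = log;
        while (l <= m_max && m_free[l].empty()) ++l;   // find a bigger free block
        if (l > m_max) throw std::runtime_error("out of space");
        uint64_t off = *m_free[l].begin();
        m_free[l].erase(m_free[l].begin());
        while (l > log) {                              // split down to the size
            --l;
            m_free[l].insert(off + (1ull << l));       // second half stays free
        }
        return off;
    }
    void free(uint64_t off, unsigned log) {
        while (log < m_max) {                          // merge while buddy is free
            uint64_t buddy = off ^ (1ull << log);
            auto it = m_free[log].find(buddy);
            if (it == m_free[log].end()) break;
            m_free[log].erase(it);
            off = off < buddy ? off : buddy;
            ++log;
        }
        m_free[log].insert(off);
    }
    size_t freeCount(unsigned log) { return m_free[log].size(); }
private:
    unsigned m_min, m_max;
    std::map<unsigned, std::set<uint64_t>> m_free;  // size exponent -> free offsets
};
```

With minLog = 12 and maxLog = 30 this reproduces the 4 KiB-to-1 GiB range discussed on the previous slide.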
The algorithm is encoded in the SharedHeap class: m_basePtr is the base address of the m-mapped file
m_freeBlock[18] stores the addresses of the first free blocks of dimension 2^(n+12)
The mutex m_m synchronizes concurrent access to the m_freeBlock index in the EFD multi-threaded environment. The index is also protected by a memory read-only protection that is removed only during index modification
newBlock(int sz) returns the pointer to a free Block of size 2^sz (taken from the free list with index sz - 12) or throws an exception if space is exhausted
deleteBlock(Block *b) releases the allocated Block
The constructor requires the name of the m-mapped file and the size sz (as a power-of-2 exponent, or in bytes if sz > PAGESIZE)
If sz is given, a new file is created; otherwise the default value (sz = -1) triggers the file recovery procedure: the existing SharedHeap file is opened and the contained events can be recovered
The SharedHeap class

struct Block
  m_check : ushort
  m_sz : char
  m_free : char
  m_prev : long
  m_offset : long
  m_next : long
  m_data : char[PAGESIZE-16]
  data() : void*

class SharedHeap
  m_fileName : string
  m_basePtr : Block*
  m_freeBlock : long*
  m_size : long
  m_m : mutex
  SharedHeap(string,int)
  newBlock(int) : Block*
  deleteBlock(Block*)
General capabilities of the EFD
Must allow event sorting (to different PT types) by a user-definable type-sorting rule
For each type of event or processing there can be one or more PTs
Must allow the addition of pre- and post-event processing and user-specified internal event monitoring tasks like histogramming, event counting or checking, etc.
Must allow dynamic reconfiguration with minimal interference with the dataflow activity
Must support the dynamic addition/removal of PTs, even if the removal is due to a PT crash
A PT can either be a process using the Athena framework and executing offline algorithms, or a user-specific process. The API and library supporting communication with the EFD must be simple, "light" and efficient.
Interaction with the Supervision should be loosely coupled; it must be possible to stop and restart the supervisor without interfering with the EFD activity
Events passed for processing to a PT must be recoverable in case the PT crashes
The events handled by the EFD should be recoverable in case the EFD process crashes
Event Filter Dataflow
The global EFD function is divided into different specific tasks that can be dynamically interconnected to form an EF dataflow network
The tasks are the basic processing units and can be
purely internal
sorting task, monitoring task, etc.
interfaces to external components
interface to the DataCollection (e.g. Input and Output Task)
interface with the PTs performing event reconstruction/selection or detector calibration
and therefore depend on the processing latency of these external components
Each task can be executed:
by its own thread (generally interfaces with external components)
by a global worker thread
Internal dataflow: the Tasks
[Figure: inside the EFD on a Processing Host, an InputTask (connected to the SFI) feeds a SortingTask, which routes events to internal tasks (CountingTask, Histogr.Task, PreProc.Task, PostProc.Task) and to ExtPTsTasks serving the external PTs; an OutputTask sends selected events to the SFO.]
Each task can be executed:
by its own thread (in case of an “external interface” Task: light blue in the figure)
by a global worker thread
The internal dataflow is based on reference passing
only the pointer to the event (stored in the memory mapped file) flows between the different Tasks
the event pointer (like all other object pointers used in the EFD) is a smart pointer and therefore provides garbage collection
Internal dataflow: the Tasks
The dataflow path is obtained by chaining the different tasks
Each task is derived from the Task base class, which provides a method named processEvent() that receives a reference to an event pointer and returns a pointer to the next Task to be executed
The implementation of the processEvent() method is task-specific and distinguishes the different types of tasks
The last task of the chain returns NULL and has to store the event pointer
The engine of the chaining mechanism is a Worker thread (class Worker), which obtains “work” from an associated Work Queue containing the pairs <Event pointer, Task pointer>: the pointer to the event to be processed and the first Task scheduled for the processing of this event
Internal dataflow: the Tasks
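The chaining contract can be sketched as follows. Event, CountingTask, EndTask and runChain are placeholder names, and the real Worker obtains its starting <event, task> pair from the Work Queue rather than from a direct call.

```cpp
#include <cassert>
#include <memory>

// Placeholder event: real events live in the shared m-mapped file.
struct Event { int count = 0; };

// The Task contract: processEvent() receives the event pointer and returns
// the next Task in the chain, or nullptr at the end of the chain.
class Task {
public:
    using Ptr = std::shared_ptr<Task>;
    virtual ~Task() = default;
    virtual Task::Ptr processEvent(std::shared_ptr<Event>& ev) = 0;
    void setNext(Ptr n) { m_next = n; }
protected:
    Ptr m_next;  // single successor; a sorting task would hold several
};

// A simple internal task: do a tiny piece of work, pass the event on.
class CountingTask : public Task {
public:
    Task::Ptr processEvent(std::shared_ptr<Event>& ev) override {
        ++ev->count;
        return m_next;
    }
};

// Path terminator: stores/consumes the event and ends the chain.
class EndTask : public Task {
public:
    Task::Ptr processEvent(std::shared_ptr<Event>&) override { return nullptr; }
};

// The worker-side loop: follow the chain until a task returns nullptr.
void runChain(Task::Ptr first, std::shared_ptr<Event> ev) {
    for (Task::Ptr t = first; t; t = t->processEvent(ev)) {}
}
```

Only the shared_ptr to the event travels through the chain, which is the reference-passing property the previous slides describe.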
The Worker Thread executes the loop:
it calls getWork(), which returns from the WorkQueue the pair <*Event, *Task>
it calls the processEvent() method of the Task, passing the *Event
this method returns the pointer to the next Task (and the loop continues), or the NULL pointer if it is the last task in the chain
Internal dataflow: the Worker
[Sequence diagram: the Worker thread calls getWork() on the Work queue and receives (task1, event); it then calls task1.processEvent(event), which returns task2, then task2.processEvent(event), which returns task3, and so on until a task returns NULL.]
With this mechanism:
Tasks that need only very simple processing (like event counting, event sorting, etc.) can be implemented very efficiently, with minimal overhead on the overall processing time
Event-type sorting is trivial, since the Task simply returns the pointer to one of its possible successor tasks
It is very easy to duplicate an event so that two or more processing paths may be executed in parallel. In that case, processEvent() returns a pointer to the first successor Task and entries are added to the Work Queue for the additional paths
The smart pointer to the event ensures that the event is only deleted when there are no more references to it
On the other hand, Tasks requiring long and variable processing time (InputTask, OutputTask, ExtPTsTask) are executed by their own thread
to compensate for processing/communication latency, they provide their own queue buffer, which is filled by a previous Task of the EF DataFlow chain
Internal dataflow: the Worker Thread
The basic dataflow elements are Tasks and Workers: they are managed by specific managers (TaskMgr, WorkerMgr) based on the NameSet class
NameSet is a container used to access instances by name, designed to support dynamic reconfiguration of global objects in an MT environment
In case of object deletion, the object must not be physically deleted immediately: if another thread is referencing the instance, it could end up with an invalid reference
If the object is a Task, the Work queue may contain references to it, and many predecessor tasks could hold a reference to the task one wants to delete
The chosen method is that a request to delete a contained instance results in moving the instance into a trash: another NamedSet
Instances moved into the trash are never deleted unless there is an explicit call to purge()
In this case only instances that are no longer referenced elsewhere are deleted
This ensures that there are never invalid references
Tasks and Workers Management: NameSet
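The trash-and-purge idea maps naturally onto std::shared_ptr reference counts. NamedSetSketch below is a hypothetical, simplified, non-thread-safe stand-in for NamedSet/NamedSetLow; it shows why purge() can never free an instance that is still referenced elsewhere.

```cpp
#include <cassert>
#include <map>
#include <memory>
#include <string>

// Erased objects move to a trash map instead of being deleted; purge()
// destroys only those whose sole remaining reference is the trash itself.
template <class Obj>
class NamedSetSketch {
public:
    using ObjPtr = std::shared_ptr<Obj>;
    bool insert(const std::string& name, ObjPtr o) {
        return m_obj.emplace(name, o).second;  // refuse duplicate names
    }
    ObjPtr get(const std::string& name) {
        auto it = m_obj.find(name);
        return it == m_obj.end() ? nullptr : it->second;
    }
    bool erase(const std::string& name) {
        auto it = m_obj.find(name);
        if (it == m_obj.end()) return false;
        m_trash[name] = it->second;            // move to trash, do not delete
        m_obj.erase(it);
        return true;
    }
    // Delete only trashed instances that nobody else references.
    int purge() {
        int n = 0;
        for (auto it = m_trash.begin(); it != m_trash.end();) {
            if (it->second.use_count() == 1) { it = m_trash.erase(it); ++n; }
            else ++it;
        }
        return n;
    }
    size_t trashSize() const { return m_trash.size(); }
private:
    std::map<std::string, ObjPtr> m_obj;
    std::map<std::string, ObjPtr> m_trash;
};
```

A thread that still holds an ObjPtr (for example from the Work queue) keeps the use count above one, so purge() skips the instance until that reference is released.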
The contained object must have name() and erase() methods
erase() is called by NamedSet::erase() to inform the instance that a request to erase it has been issued and to check whether it allows it. The instance can make the necessary provisions to clean up, plan thread termination, etc.
It is important that threaded objects are not deleted until their thread has terminated.
This is achieved by adding a reference to the instance in the thread main routine
As long as this reference holds the object, the object will not be deleted by a call to purge(). Once the thread terminates, the reference is released, and a call to purge() on the container will succeed.
getFirst() and getNext() are used to iterate over the contained objects
NameSet
class NamedSet<Obj>
  trash : NamedSetLow<Obj>
  erase(ObjPtr) : bool
  erase(string) : bool
  insert(ObjPtr) : bool
  insert(Obj*) : bool

class NamedSetLow<Obj>
  shared_ptr<Obj> = ObjPtr
  m_obj : map<string,ObjPtr>
  m_mutex : mutex
  size() : int
  get(string) : ObjPtr
  getFirst() : ObjPtr
  getNext(ObjPtr) : ObjPtr
  purgeable() : int
  purge()
  insert(ObjPtr) : bool
  insert(Obj*) : bool
Tasks (Workers) are managed by a TaskMgr (WorkerMgr) instance: a singleton that inherits from the class NamedSet<Task>
It has a static method getInstance() returning a reference to the singleton instance
Thread cancellation is not supported yet
a worker thread locked in processEvent() of a Task cannot be released by erasing it
TaskMgr and WorkerMgr
To insert a new Task in the TaskMgr:
    if (TaskMgr::getInstance().insert(new MyTask("my task")))
        // ... insertion succeeded
    else
        // ... insertion failed because a task with
        //     such a name already exists

To erase a Task in the TaskMgr:
    if (TaskMgr::getInstance().erase("my task"))
        // ... instance has been erased and moved to the trash
    else
        // ... erase refused

To obtain a reference to a Task in the TaskMgr:
    Task::Ptr mytask = TaskMgr::getInstance().get("my task");
Work is stored as a pair <Event::Ptr, Task::Ptr> in an STL deque
WorkMgr is a singleton that can be accessed by the static getInstance(), as for the TaskMgr and WorkerMgr
The access is thread safe
getWork() returns true if it succeeds in getting work within a given delay (5 s); otherwise it returns false
This allows the Worker thread to check whether a request to erase it was issued while it was waiting for some work to do
addWork() will always succeed and is called by Tasks that inject events into the dataflow, like the InputTask that receives events from the SFI (and also ExtPTsTask, DuplicatingTask, etc.)
Work Management
class WorkMgr
  m_work : deque<Task::Ptr,Event::Ptr>
  m_hasWork : condition
  m_mutex : mutex
  getInstance() : WorkMgr&
  addWork(Task::Ptr,Event::Ptr)
  getWork(Task::Ptr&,Event::Ptr&) : bool
  size() : int
  purge()
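The timed getWork() can be sketched with a condition variable. Work items are reduced to plain ints and the names are illustrative: the real WorkMgr stores <Task::Ptr, Event::Ptr> pairs, is a singleton, and uses a 5 s delay.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <deque>
#include <mutex>

// Thread-safe work deque with a timed getWork(), so a Worker can
// periodically wake up and check whether it was asked to terminate.
class WorkQueue {
public:
    void addWork(int w) {
        { std::lock_guard<std::mutex> g(m_mutex); m_work.push_back(w); }
        m_hasWork.notify_one();  // wake one waiting Worker
    }
    // Returns false if no work arrived within the delay.
    bool getWork(int& w, std::chrono::milliseconds delay) {
        std::unique_lock<std::mutex> g(m_mutex);
        if (!m_hasWork.wait_for(g, delay, [this] { return !m_work.empty(); }))
            return false;
        w = m_work.front();
        m_work.pop_front();
        return true;
    }
    size_t size() { std::lock_guard<std::mutex> g(m_mutex); return m_work.size(); }
private:
    std::deque<int> m_work;
    std::condition_variable m_hasWork;
    std::mutex m_mutex;
};
```

The predicate overload of wait_for handles spurious wakeups, and the timeout is what lets the Worker loop fall through to its erase-request check.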
The InputTask continuously injects events and the Workers dispatch them to the “internal” Tasks
The event rate of “interface” Tasks (OutputTask, ExtPTsTask) depends on external operations and can be slower than the input rate
A back-pressure mechanism is therefore necessary
Barrier: a 2-state (open/closed) object which, associated with a given Task, allows a downstream Task to temporarily interrupt the event flow of the previous one
The allowed barrier operations are:
open the barrier
close the barrier
test the barrier state and, if closed, wait for it to open
DataFlow control
[Figure: the EF dataflow with back-pressure: a downstream task (e.g. the OutputTask) calls Lock()/unLock() on a barrier, and the InputTask calls test() on it before fetching the next event from the SFI.]
Barriers are named and managed by the BarrierMgr
A Task that needs to lock a barrier or control its state needs a local BarrierCtl instance holding an internal reference to the barrier
The constructor receives a string that identifies the controlling entity (the Task/Supervisor name) so that, for debugging purposes, the barrier keeps track of who requested to close it
When instantiated, setBarrier(name) is required in order to specify the barrier to control: it asks the BarrierMgr for a pointer to the named Barrier instance
When a barrier has been set, the BarrierCtl is valid and pass(), pass(time), lock() and unlock() can be used
There may be more than one request to close the barrier, but there must be as many requests to reopen the barrier to change its state back to open
DataFlow control: Barrier

class Barrier
  Ptr = shared_ptr<Barrier>
  Barrier(string)
  name() : string
  erase() : bool

class BarrierCtl
  m_barrier : Barrier::Ptr
  BarrierCtl(string)
  setBarrier(string) : bool
  isValid() : bool
  pass()
  pass(time)
  isLocked() : bool
  Lock()
  unLock()
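The balanced close/reopen semantics above can be sketched as a counting barrier. BarrierSketch is a simplified, hypothetical stand-in: the real Barrier also records who requested the close and is reached through BarrierMgr/BarrierCtl.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// A barrier stays closed while at least one close request is outstanding;
// pass() blocks callers (e.g. the InputTask) until it is open again.
class BarrierSketch {
public:
    void lock() {                       // one more close request
        std::lock_guard<std::mutex> g(m_mutex);
        ++m_closed;
    }
    void unlock() {                     // one close request withdrawn
        { std::lock_guard<std::mutex> g(m_mutex); if (m_closed > 0) --m_closed; }
        m_open.notify_all();
    }
    bool isLocked() {
        std::lock_guard<std::mutex> g(m_mutex);
        return m_closed > 0;
    }
    // Test the barrier and wait for it to open (the upstream task's view).
    void pass() {
        std::unique_lock<std::mutex> g(m_mutex);
        m_open.wait(g, [this] { return m_closed == 0; });
    }
private:
    int m_closed = 0;
    std::condition_variable m_open;
    std::mutex m_mutex;
};
```

The counter is what makes "as many reopen requests as close requests" hold: two independent downstream tasks can each close the barrier, and it opens only when both have unlocked.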
The InputTask has an associated barrier that it checks on each loop iteration
The OutputTask (for example) can use the barrier to signal that the InputTask should suspend fetching more events
When the number of queued events passes a threshold limit, it locks the barrier
When the number of queued events drops below the threshold, the barrier is unlocked
Any task can thus control the barrier state and remotely control the InputTask
~BarrierCtl() unlocks the barrier as expected; thus when erasing a bogus Task, the barrier is properly unlocked
A barrier can be combined with a DroppingTask to silently drop events in a secondary event dataflow path when it saturates
For every incoming event, a DroppingTask checks a specific barrier: if it is closed, the task erases the event; otherwise it forwards the event to the next task
The downstream task then simply has to control the DroppingTask barrier to control the event flow in the branch
The supervision could use this mechanism to manually lock such paths
Example: the input barrier
Tasks need to have access to specific arguments
like the identity of their different successor tasks,
the name of the barrier to control, etc.
Other objects could also need parameters
It is also required that the supervisor has access
to these parameters to check and set their values
The currently proposed solution is to use a map where the key is the variable name and the value is a string holding the value
A variable name of the form "name1.name2.name3" allows the definition of a tree-like name space
The various provided methods allow the management of the map
Params has a singleton instance accessible by use of the static getInstance()
Parameters

class Params
  m_params : map<string,string>
  m_mutex : mutex
  getInstance() : Params&
  add(string,string) : bool
  del(string) : bool
  set(string,string) : bool
  mod(string,string) : bool
  get(string,string,string) : bool
  get(string,int,int) : bool
  hasKey(string) : bool
  hasAny(string) : bool
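The dotted name space can be sketched over an ordered map. ParamsSketch is a simplified, non-thread-safe stand-in, and its hasAny() assumes the prefix query means "any key below this node"; the real Params signatures differ.

```cpp
#include <cassert>
#include <map>
#include <string>

// string -> string parameters whose dotted keys ("name1.name2.name3")
// form a tree-like name space.
class ParamsSketch {
public:
    bool add(const std::string& key, const std::string& val) {
        return m_params.emplace(key, val).second;   // refuse an existing key
    }
    bool set(const std::string& key, const std::string& val) {
        auto it = m_params.find(key);
        if (it == m_params.end()) return false;
        it->second = val;
        return true;
    }
    bool get(const std::string& key, std::string& val) const {
        auto it = m_params.find(key);
        if (it == m_params.end()) return false;
        val = it->second;
        return true;
    }
    bool hasKey(const std::string& key) const { return m_params.count(key) != 0; }
    // True if any parameter lives under "prefix." in the name tree.
    bool hasAny(const std::string& prefix) const {
        const std::string p = prefix + ".";
        auto it = m_params.lower_bound(p);          // ordered map: first key >= p
        return it != m_params.end() && it->first.compare(0, p.size(), p) == 0;
    }
private:
    std::map<std::string, std::string> m_params;
};
```

The ordered map makes the subtree query cheap: all keys under one prefix are contiguous, so a single lower_bound() answers hasAny().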
The EFDThread class wraps threads
Worker threads and threads implementing external interfaces are derived from the EFDThread class
Thread is a very light thread-wrapper class that allows classes derived from it to start a thread running a method of the class
start() launches the thread, which executes the run() method
This method has access to the class member variables, which act like thread-private variables
There is currently no support for thread cancellation and interrupt handling
EFDThread adds the concepts of state and run phase
EFDThread class
class Runnable
  Run()

class Thread
  Start()
  Start(Runnable*)

class EFDThread
  State = {Unstarted, Initializing, Running, Terminating, Terminated, Aborted}
  m_state : State
  initialise() : bool
  execute() = 0
  finalize()
  terminate()
Thread::start(): without arguments it starts a thread running the instance's run() method; it can also start a thread executing the run() method of a class derived from Runnable
EFDThread::initialise(): if initialisation succeeds, execute() is called; otherwise finalize() is called directly, where one can do the necessary clean-up
When finalize() returns, the thread terminates
EFDThread::terminate(): changes the state to Terminating; each derived class can define its specific operations
It is the responsibility of execute() to poll the state to detect its change from Running to Terminating
If an exception occurs, the state is changed to Aborted, the thread is terminated and the exception is rethrown
EFDThread class
processEvent() executes the Task-specific event processing
nbrNext(), setNext() and getNext() deal with the Task interconnection network
nbrNext() returns the number of successors
setNext() allows changing a successor
getNext() returns the specified successor
an event-type sorting Task has more than one successor, and processEvent() chooses one according to some event header values
isValid() returns true if the Task is fully operational and all successors are defined
it is used by the supervisor to check that the EFD has reached a runnable state
erase() and start() are used by the TaskMgr only
erase() allows a Task to be informed of the erase request (from the TaskMgr)
The Task instance will be purgeable only when there are no more references to it
Task base class

class Task
  Ptr = shared_ptr<Task>
  m_name : string
  Task(string)
  processEvent(Event::Ptr) : Task::Ptr
  nbrNext() : int
  setNext(Task::Ptr,int=0)
  getNext(int=0) : Task::Ptr
  isValid() : bool
  name() : string
  type() : string
  erase() : bool
  start()
EndTask has no successor task and its processEvent() always returns NULL; it is a sort of event dataflow path terminator
ForwardingTask always returns the pointer to its successor task (to be used as a “template”)
ThreadedTask
is derived from EFDThread
start() starts the thread, which calls initialise(), then execute() if initialise() returned true
execute() should check the state periodically in order to terminate itself properly
It holds a pointer to itself to control its purgeable state: only once the thread terminates is the pointer set to NULL, and the ThreadedTask may then become purgeable if there are no more references to it
erase() calls EFDThread::terminate() so that the state is set to Terminating
Task inheritance tree

Task
  EndTask
  ForwardingTask
  ThreadedTask
    InputTask
    QueuedTask
      OutputTask
      ExtPTsTask
InputTask implements the SFI interface
It requires some parameters, obtained through the Params singleton instance: the SFI address, the name of the barrier to monitor to detect back-pressure, and a reference to the SharedHeap
QueuedTask adds a thread-safe event queue that may lock a barrier if the number of queued events passes a threshold
processEvent() adds the event to the queue, changes the barrier state if required and returns NULL: a queued task is always the final task for a worker thread
getEvent() attempts to get an event from the queue. It waits only a finite amount of time, since it must poll the EFDThread state in order to return from execute() if the state is no longer Running
OutputTask
Implements the SFO interface
The OutputTask has no successor task:
The getNext() method returns a NULL pointer
ExtPTsTask
is in charge of providing events to the external PTs
It implements a UNIX domain socket server, based on the poll() system call, that manages the different PT connections
Socket hang-ups are detected and correctly handled
The events are taken from the class queue and, if accepted, are inserted back into the Worker queue (reinserted into the internal EF dataflow)
Based on UNIX domain sockets: the EFD implements the server (ExtPTsTask)
the PTs are the clients
A PT requests an event from the EFD, which returns the event's offset in the SharedHeap
The communication protocol is based on the exchange of packets encapsulated in the structure UnixMessage
header is the check word
id is used to identify the PT
command contains the exchanged commands: the command enum will be extended in future implementations in order to add more functionality
The last 3 fields are used in association with cmd_evSent: the PT maps the portion of the SharedHeap file of size dimension starting at offset
send() and recv() are wrappers for the namesake system calls and are used to exchange the message over the socket
PTs-EFD communication protocol

struct UnixMessage
  header : Nat16
  id : Nat16
  command : Nat8
  offset : Nat32
  dimension : Nat32
  eventID : Nat32
  cmd : enum {cmd_ping, cmd_EvAccepted, cmd_evRequest, cmd_evRejected, cmd_evSent}
  send(int,cmd,Nat32,Nat32,Nat32)
  recv(int,cmd,Nat32,Nat32,Nat32)
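The fixed-layout message can be sketched as explicit packing/unpacking. The byte layout, the check-word value and the name UnixMessageSketch are assumptions; the real send()/recv() wrappers exchange the structure over the socket.

```cpp
#include <array>
#include <cassert>
#include <cstdint>
#include <cstring>

// Field widths as listed on the slide.
using Nat8 = uint8_t; using Nat16 = uint16_t; using Nat32 = uint32_t;
enum Cmd : Nat8 { cmd_ping, cmd_EvAccepted, cmd_evRequest, cmd_evRejected, cmd_evSent };

struct UnixMessageSketch {
    Nat16 header;     // check word
    Nat16 id;         // identifies the PT
    Nat8  command;
    Nat32 offset;     // with cmd_evSent: where the event starts in the heap
    Nat32 dimension;  // with cmd_evSent: size of the portion to map
    Nat32 eventID;

    // Serialise field by field (2+2+1+4+4+4 = 17 bytes), avoiding any
    // struct-padding dependence on the wire.
    std::array<uint8_t, 17> pack() const {
        std::array<uint8_t, 17> b{};
        std::memcpy(&b[0], &header, 2);
        std::memcpy(&b[2], &id, 2);
        b[4] = command;
        std::memcpy(&b[5], &offset, 4);
        std::memcpy(&b[9], &dimension, 4);
        std::memcpy(&b[13], &eventID, 4);
        return b;
    }
    static UnixMessageSketch unpack(const std::array<uint8_t, 17>& b) {
        UnixMessageSketch m{};
        std::memcpy(&m.header, &b[0], 2);
        std::memcpy(&m.id, &b[2], 2);
        m.command = b[4];
        std::memcpy(&m.offset, &b[5], 4);
        std::memcpy(&m.dimension, &b[9], 4);
        std::memcpy(&m.eventID, &b[13], 4);
        return m;
    }
};
```

On a cmd_evSent message the PT would use offset and dimension directly as arguments to its mmap of the SharedHeap file.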
Implements the UNIX domain socket client
It has been kept as simple as possible, in order to be easily integrated into the PT code (Athena)
A “stand-alone” class (it requires only UnixMessage)
The constructor requires the file name associated with the UNIX socket and the name of the SharedHeap file containing the events
connect() establishes the connection with the EFD (to be called at the initialisation step)
To obtain a pointer to a new event, the PT only has to call getEvent()
This method requests a new event from the EFD via the socket, obtains the event's offset and dimension, maps it in memory and returns the memory address (m_addr)
The mapping is read-only, therefore the PT cannot modify the data
answer(UnixMessage::cmd com) returns the filtering decision to the EFD
In the next implementations the PT will be able to request additional portions of the SharedHeap file in which to write the processing results (event extensions)
The PTclient class

class PTclient
  m_name : sockaddr_un
  m_fd : int
  m_msgIn : UnixMessage*
  m_msgOut : UnixMessage*
  m_heap : string
  PTclient(string,string)
  connect()
  getEvent() : char*
  answer(UnixMessage::cmd)
  recv(int,cmd,Nat32,Nat32,Nat32)
In Athena events are accessed via the Byte Stream Conversion Service (Event/ByteStreamCnvSrv) and a converter for each detector system AthenaEventLoopMgr::nextEvent() loops through the events by use of
EventIteratorByteStream and EventSelectorByteStream
The iterator increments events by calling EventSelectorByteStream::next() which in turn calls PackedRawEFHandlerSvc::nextPackedRawEvent()
The interface of Athena to the EFD occurs in the nextPackeRawEvent() method of PackedRawEFHandlerSvc PackedRawEFHandlerSvc is derived from a class in the ByteStreamCnvSvc called
PackedRawEventSrcSvc and specifies the event source as coming from the EFD
PackedRawEFHandlerSvc::nextPackedRawEvent() returns PackedRawEvent, which contains the event information in byte stream persistent form and which the ByteStreamCnvSvc and det. subsyst. converters know how to unpack into the TES
the PackedRawEvent constructor takes as input a pointer to the event data, which is obtained as the return value of getEvent() on a local instance of the EFD PTclient class
Interface of Athena to the EFD
The design of the EF Supervision system is intended to make use of OnlineSW
components as much as possible
However, the requirements of the OnlineSW are currently under review and at this stage it
is only feasible to present possibilities for the design
Assuming that the additional requirements are implemented in future versions of the OnlineSW, it should be possible to interface the EFD directly to the OnlineSW without the need for an intermediate EF Supervisor
In this case the OnlineSW system “becomes” the EF Supervisor
Process monitoring and control within the EF would be carried out by the DSA_Supervisor
and activities to prepare the EF for data-taking by the RC system
It is still foreseen that some sort of EF supervision process implementing an “expert”
interface would be required to allow an expert to perform control and monitoring activities
without affecting the overall data-taking status of the EF
The EF Supervision interface to the PTs has still to be addressed
Interface to the EF Supervisor
The EFD must be capable of receiving commands from both the OnlineSW RC system and the EF Supervision process implementing the expert interface
Currently it is intended to implement the interface using TCP sockets, with the EFD task as server and the RC and EF Supervisor as clients
We hope to reuse the Run Controller and communication protocol developed by the DataCollection group
We are currently reviewing which actions should be carried out at each transition of the Run Control FSM
It is already clear that before a run/data-taking period can start, the Run Control must know whether the EFD task is ready to receive events
One possible implementation is: on reception of the relevant command from the RC, the EFD Supervision interface iterates over the critical tasks, checking the return value of isValid()
Once isValid() returns true for each critical task, the OK reply is sent back to the RC
If, after a timeout, not all critical tasks are returning true, the Bad reply is sent back to the RC
Supervisor: command interface
On request from either the RC or the EF expert, the EFD supervision interface must be able to initiate the internal configuration or reconfiguration of the EFD
Commands asking the EFD to alter its critical configuration should only be sent when not in an active data-taking state
Commands asking the EFD to alter its non-critical configuration (insertion or removal of monitoring tasks) could in principle be sent at any time (dynamic reconfiguration)
On receiving the command to configure, the EFD supervision interface must read the new task configuration from a file
It is possible that this file would be part of the Configuration database component of the OnlineSW, currently under review
The tasks are instantiated and started by inserting them into the TaskMgr
The dataflow path is defined by calling Task::setNext()
Reconfiguration of the non-critical tasks would consist of reading the new configuration, inserting the monitoring tasks into the TaskMgr, and modifying the destinations of upstream tasks, using setNext() to direct events to them
Supervisor: configuration
Statistics
It must be possible for the EFD to record dataflow statistics and make these available to the EF Supervision system
It is currently intended to use the IS OnlineSW component
As statistics become available, the EFD will write them directly to the IS
At periodic intervals, the information from the IS will be read and displayed on the central display console for the DAQ and/or on the EF expert console
Error messages
It must be possible for the EFD to generate messages indicating abnormal situations
In the current version of the OnlineSW this can be implemented using the MRS; the EFD will send messages directly to the MRS
The situations under which error/warning messages are to be sent are still to be defined
Histograms
Monitoring tasks in the EFD will create histograms
The use of the ROOT package for their creation is currently under investigation
The OH OnlineSW component could be used to transport the histograms for display on either the central console or the EF expert console
Supervisor: other services
Status and Workplan
A first prototype implementation is running
SFI and SFO are emulated with dummy random-sized events
the PTs are dummy processes containing only a PTclient instance
there are no Supervisor interface components
Set-up of an integration test for the beginning of September
Sarah is working on the integration with the current version of the OnlineSW
Kristo is working on the Athena integration
Continue the design of the EFD and progress the prototype implementation; activity should now focus on the Supervisor interaction
Other objectives are:
integration of the Gaudi framework/services according to Saul & Werner’s conclusions
implementation, validation and integration of the Event Format library
setup and configuration of a workplace environment
setup of execution environments to ease installation and testing of the whole system