M. Richter, University of Bergen & S. Kalcher, J. Thäder, T. M. Steinbeck, University of Heidelberg 1
AliRoot – Pub/Sub Framework Analysis Component Interface
Matthias Richter, Sebastian Kalcher, Jochen Thäder & Timm M. Steinbeck
Overview
● System consists of three main parts
– A C++ shared library with a component handler class
  ● Compiled and callable directly from AliRoot
– A number of C++ shared libraries with the actual reconstruction components themselves
  ● Compiled as part of AliRoot and directly callable from it
– A C wrapper API that provides access to the component handler and reconstruction components
  ● Contained in the component handler shared library
  ● Called by the Pub/Sub wrapper component
  ● Makes Pub/Sub and AliRoot compiler independent
    – Binary compatibility
    – No recompile of reconstruction code
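The compiler independence comes from exposing only plain C functions and an opaque handle, so the Pub/Sub framework never touches C++ types across the library boundary. A minimal sketch of the idea, using hypothetical hlt_* names (not the actual AliHLT API):

```cpp
// Sketch of a C wrapper over a C++ component handler.
// All names here are illustrative, not the real AliHLT interfaces.
#include <map>
#include <string>

// C++ side: the handler class that AliRoot uses directly.
class ComponentHandler {
public:
    int LoadLibrary(const std::string& path) { libs_[path] = true; return 0; }
    bool IsLoaded(const std::string& path) const { return libs_.count(path) > 0; }
private:
    std::map<std::string, bool> libs_;
};

// C side: opaque handle plus plain functions. Since no C++ types cross
// the boundary, the caller and the library may use different compilers
// (binary compatibility, no recompile of reconstruction code).
extern "C" {
    typedef void* hlt_handler_t;

    hlt_handler_t hlt_create_handler() { return new ComponentHandler(); }

    int hlt_load_library(hlt_handler_t h, const char* path) {
        return static_cast<ComponentHandler*>(h)->LoadLibrary(path);
    }

    void hlt_destroy_handler(hlt_handler_t h) {
        delete static_cast<ComponentHandler*>(h);
    }
}
```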
Overview
[Diagram: The HLT TPC shared library (Clusterfinder and Tracker C++ classes, with global Clusterfinder and Tracker objects) registers its components via RegisterComponent with the ComponentHandler C++ class in the component handler shared library (LoadLibrary, Initialize). The Pub/Sub framework wrapper processing component goes through the ComponentHandler C wrapper functions ((Load Component Library;) Get Component; Initialize Component; Process Event), while AliRoot uses the ComponentHandler class directly ((Load Component Library;) Initialize Components; Get Components) and calls ProcessEvent on the component objects.]
Components
● Components have to implement a set of abstract functions
– Return ID (string)
– Return set of required input data types
– Return produced output data type(s)
– Process one event
– Close to what is needed for Pub/Sub components
  ● But simplified
● One global instance of each component has to be present in the shared component library
– Automagic registration with the global component handler object
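The scheme above can be sketched as follows; class and function names are illustrative stand-ins, not the actual AliHLT interfaces. The key point is the global instance whose constructor registers the component at library load time:

```cpp
// Illustrative sketch: component base class with the abstract functions
// listed above, plus "automagic" registration via a global instance.
// Names are invented for the sketch, not taken from AliRoot.
#include <string>
#include <vector>

class Component;

// Minimal global registry standing in for the component handler object.
std::vector<Component*>& Registry() {
    static std::vector<Component*> r;
    return r;
}

class Component {
public:
    Component() { Registry().push_back(this); }  // registers on construction
    virtual ~Component() {}
    virtual std::string GetComponentID() const = 0;                    // ID (string)
    virtual std::vector<std::string> GetInputDataTypes() const = 0;    // required inputs
    virtual std::string GetOutputDataType() const = 0;                 // produced output
    virtual int ProcessEvent(const std::vector<int>& in,
                             std::vector<int>& out) = 0;               // one event
};

// Example component: trivially doubles every input word.
class DummyClusterFinder : public Component {
public:
    std::string GetComponentID() const { return "TPCClusterFinder"; }
    std::vector<std::string> GetInputDataTypes() const { return {"DDL_RAW"}; }
    std::string GetOutputDataType() const { return "CLUSTERS"; }
    int ProcessEvent(const std::vector<int>& in, std::vector<int>& out) {
        for (int v : in) out.push_back(2 * v);
        return 0;
    }
};

// One global instance per component: its constructor runs when the
// shared library is loaded and registers with the global handler.
DummyClusterFinder gDummyClusterFinder;
```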
AliRoot
● AliRoot code accesses classes in the component handler shared library
● Obtains component objects from the handler class
● Accesses component objects directly
[Diagram: AliRoot loads the component handler shared library (ComponentHandler C++ class: LoadLibrary, Initialize), which loads the HLT TPC shared library; the global Clusterfinder and Tracker objects register via RegisterComponent, and AliRoot obtains them ((Load Component Library;) Initialize Components; Get Components) and calls ProcessEvent on them directly.]
Publisher/Subscriber
● Pub/Sub framework uses ONE wrapper component
● Accesses handler and components via the C wrapper API
● Can call multiple components in different libraries
– One component per wrapper instance
[Diagram: The Pub/Sub framework wrapper processing component uses the ComponentHandler C wrapper functions ((Load Component Library;) Get Component; Initialize Component; Process Event) to reach the ComponentHandler C++ class, which loads the HLT TPC shared library; the global Clusterfinder and Tracker objects register via RegisterComponent, and ProcessEvent is invoked on them per event.]
Publisher/Subscriber
[Sequence diagram: Initialization Sequence between AliRootWrapperSubscriber, C Wrapper, AliHLTComponent and AliHLTComponentHandler]
Publisher/Subscriber
[Sequence diagram: Processing Sequence between AliRootWrapperSubscriber, C Wrapper, AliHLTComponent and AliHLTComponentHandler]
Publisher/Subscriber
[Sequence diagram: Termination Sequence between AliRootWrapperSubscriber, C Wrapper, AliHLTComponent and AliHLTComponentHandler]
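The three sequences can be sketched from the wrapper's point of view. This is an illustrative mock, not the actual AliRootWrapperSubscriber or AliHLTComponent code; all class and function names are invented, and the call order simply follows the Overview slide (initialize component, process events, terminate):

```cpp
// Mock of the wrapper-driven component lifecycle: initialization
// sequence, processing sequence, termination sequence.
#include <vector>

struct Event { std::vector<int> data; };

class MockComponent {  // stands in for the component reached via the C wrapper
public:
    int Init() { initialized_ = true; return 0; }
    int ProcessEvent(const Event& e, std::vector<int>& out) {
        if (!initialized_) return -1;  // must be initialized first
        out = e.data;                  // trivial pass-through "processing"
        ++processed_;
        return 0;
    }
    int Deinit() { initialized_ = false; return 0; }
    int processed_ = 0;
private:
    bool initialized_ = false;
};

// The lifecycle as driven by one Pub/Sub wrapper instance:
int RunLifecycle(MockComponent& c, const std::vector<Event>& events) {
    if (c.Init() != 0) return -1;        // initialization sequence
    for (const Event& e : events) {      // processing sequence, one call per event
        std::vector<int> out;
        if (c.ProcessEvent(e, out) != 0) return -1;
    }
    return c.Deinit();                   // termination sequence
}
```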
Current Status
● Basic implementation done
● Base library with ComponentHandler and Component base class implemented
● Pub/Sub wrapper component done and working
● HLT TPC reconstruction code ported and working
● Basic AliRoot HLT configuration scheme implemented
● Ongoing work on integration of the ComponentHandler into the data processing scheme of AliRoot
ClusterFinder Benchmarks
● pp events
● 14 TeV, 0.5 T
● Number of events: 1200
● Iterations: 100
● TestBench: SimpleComponentWrapper
● TestNodes:
– HD ClusterNodes e304, e307 (PIII, 733 MHz)
– HD ClusterNodes e106, e107 (PIII, 800 MHz)
– HD GatewayNode alfa (PIII, 1.0 GHz)
– HD ClusterNode eh001 (Opteron, 1.6 GHz)
– CERN ClusterNode eh000 (Opteron, 1.8 GHz)
Cluster Distribution
Signal Distribution
File Size Distribution
Total Distributions
Padrows & Pads per Patch
Basic Results

Per patch and event:

Patch     # Clusters   # Signals   Filesize [Byte]
0         60           4313        12892
1         61           5683        17525
2         44           3985        12233
3         35           4264        13437
4         29           4210        13384
5         23           4149        13312
Average   42           4434        13797
Timing Results

ClusterFinder times per patch (all values in ms):

CPU                Patch 0   Patch 1   Patch 2   Patch 3   Patch 4   Patch 5   Average
PIII 733 MHz       6.57      8.82      6.14      6.67      6.61      6.54      6.90
PIII 800 MHz       6.04      8.10      5.64      6.12      6.06      6.01      6.33
PIII 1.0 GHz       4.95      6.65      4.51      4.90      4.87      4.81      5.11
Opteron 1.6 GHz    2.93      3.92      2.73      2.96      2.93      2.90      3.06
Opteron 1.8 GHz    3.96      5.32      3.66      3.98      3.94      3.99      4.13
Xeon IV 3.2 GHz    2.11      2.79      1.98      2.14      2.13      2.11      2.21
Timing Results
Timing Results
● Memory streaming benchmarks:
– 1.6 GHz Opteron system: ca. 4.3 GB/s
– 1.8 GHz Opteron system: ca. 3 GB/s
● Likely the reason for the performance drop of the 1.8 GHz system compared to the 1.6 GHz system
● Cause of the memory performance difference unknown, currently being investigated
– Maybe related to NUMA parameter (cf. slide 23)
Tracker Timing Results
● Slice tracker, average times per slice
● Opteron 1.8 GHz (Dual MP, Dual Core):
– 1 process: ca. 3.6 ms/slice
  ● Independent of CPU
– 2 processes, different chips: ca. factor 1
– 2 processes, same chip, different cores: ca. factor 1.75
– 4 processes, all cores: ca. factor 1.83
● Xeon 3.2 GHz (Dual MP, HyperThreading):
– Mapping to CPUs unknown for more than 1 process
– 1 process: ca. 7.74 ms/slice
– 2 processes: ca. factor 2 slower
– 3 processes: ca. factor 3.5 slower
Timing Results – Opteron Memory
● Floating point/memory microbenchmarks
● CPU loop: no effect with multiple processes
● Linear memory read: almost no effect
● Random memory read:
– Runtime factors: 1.33, 1.01, 1.43
● Linear memory read and write:
– Runtime factors: 1.57, 1.12, 2.31
● Random memory read and write:
– Runtime factors: 1.91, 1.92, 2.78
● Linear memory write:
– Runtime factors: 1.71, 1.72, 3.48
● Random memory write:
– Runtime factors: 1.97, 1.90, 3.76

Runtime factors are for two processes on the same chip, two processes on different chips, and four processes on all cores, each relative to a single process.
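The distinction driving these microbenchmarks is the access pattern: the same number of writes over the same array, in linear versus shuffled order. A minimal sketch of such a write-pattern microbenchmark, with assumed helper names (TimeWrites, LinearOrder, RandomOrder); this is not the actual benchmark code used for the results above:

```cpp
// Sketch: time linear vs. random writes over a buffer. The array size
// and the timing harness are assumptions; only the access-pattern idea
// corresponds to the benchmarks in the slides.
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Write buf[i] = i for every index in 'order' and return elapsed seconds.
double TimeWrites(const std::vector<std::size_t>& order, std::vector<int>& buf) {
    auto t0 = std::chrono::steady_clock::now();
    for (std::size_t i : order) buf[i] = static_cast<int>(i);
    auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(t1 - t0).count();
}

// Indices 0..n-1 in order: sequential, cache- and prefetch-friendly.
std::vector<std::size_t> LinearOrder(std::size_t n) {
    std::vector<std::size_t> o(n);
    std::iota(o.begin(), o.end(), 0);
    return o;
}

// Same indices shuffled: defeats prefetching, stresses the memory system.
std::vector<std::size_t> RandomOrder(std::size_t n) {
    std::vector<std::size_t> o = LinearOrder(n);
    std::mt19937 rng(42);  // fixed seed for reproducibility
    std::shuffle(o.begin(), o.end(), rng);
    return o;
}
```

Running each pattern from one, two and four concurrent processes and dividing by the single-process time gives runtime factors of the kind tabulated above.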
Timing Results – Opteron Memory
● Floating point/memory microbenchmarks
– Memory results, in particular memory writes, are the likely explanation for the tracker behaviour
– Tasks
  ● Examine system memory parameters (e.g. BIOS, Linux kernel)
    – One critical parameter: kernel NUMA awareness found not activated
  ● Re-evaluate/optimize tracker code wrt. memory writes
    – Likely problem already found: Conformal Mapping uses a large memory array with widely (quasi-)randomly distributed write and read accesses
– Lesson for system procurement
  ● If possible, evaluate systems/architectures wrt. pure performance AND scalability
Price Comparison
● Opteron 1.8 GHz:
– Single core: ca. 180,- €
– Dual core: ca. 350,- €
● Xeon 3.2 GHz:
– Single core: ca. 330,- €
– Dual core: ca. 350,- €
● Mainboard prices comparable
– ca. 350-450,- € for dual MP, dual core capable boards
● For Opterons, per-core prices for full systems:
– Assumption: 1 GB memory per core
– 1.8 GHz single/dual core, dual MP: ca. 800/600,- €
– 2.4 GHz single/dual core, dual MP: ca. 1000/880,- €
– 2.4 GHz single/dual core, quad MP: ca. 1700/1250,- €