Post on 30-May-2020
KIT – University of the State of Baden-Wuerttemberg and National Research Center of the Helmholtz Association (www.kit.edu)
Computing Infrastructure for Online Monitoring and Control of High-throughput DAQ Electronics
S. Chilingaryan, M. Caselle, T. Dritschler, T. Farago, A. Kopmann, U. Stevanovic, M. Vogelgesang
Hardware, Software, and Network Organization
Picosecond Sampling Electronics for Terahertz Synchrotron Radiation
Prototype of Streaming PCIe Camera Developed in house
S. Chilingaryan et al., Institute for Data Processing and Electronics, Karlsruhe Institute of Technology
Requirements
• Handling of sensors with data rates up to ~8 GB/s (8-12 bit)
• Real-time control loop based on 2D images + online compression
  – In-flow 8 GB/s, unpacked up to 16 GB/s (16 bit)
• Slow control loop based on 3D tomographic images
  – In-flow 4 GB/s, unpacked 32 GB/s (single-precision floating-point)
• Raw data storage at full speed, i.e. 4 GB/s
• Long-term storage at 1 GB/s
• Integration with Tango Control System
• Low administrative effort
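The gap between in-flow and unpacked rates comes from widening each sample: 8-bit samples stored as 16-bit words double the data rate, so 8 GB/s becomes 16 GB/s. Below is a minimal sketch of such unpacking for 12-bit data. The packing layout (two pixels in three bytes) and the function names are assumptions made for illustration; the actual sensor format is not described in the slides.

```python
def unpack12(packed):
    """Unpack 12-bit pixels, stored two per three bytes, into plain integers.

    Assumed (hypothetical) layout per 3-byte group:
      byte0 = A[11:4], byte1 = A[3:0] << 4 | B[11:8], byte2 = B[7:0]
    """
    if len(packed) % 3:
        raise ValueError("packed length must be a multiple of 3 bytes")
    pixels = []
    for i in range(0, len(packed), 3):
        b0, b1, b2 = packed[i], packed[i + 1], packed[i + 2]
        pixels.append((b0 << 4) | (b1 >> 4))          # pixel A
        pixels.append(((b1 & 0x0F) << 8) | b2)        # pixel B
    return pixels


def unpacked_rate(in_rate_gbs, in_bits, out_bits):
    """Data rate after widening every sample from in_bits to out_bits."""
    return in_rate_gbs * out_bits / in_bits
```

For example, `unpacked_rate(8, 8, 16)` reproduces the 16 GB/s figure for the real-time loop. Under the same formula, 12-bit samples widened to 16 bits grow by a factor of 4/3.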
[Diagram: data flow from the camera to archiving]
Stages: Camera, Camera PC (real-time control loop), UFO Control System (outer control loop), Real-Time Storage, Long-term Storage, Archiving
Transfer rates: up to 8 GB/s, 4 GB/s, 1 GB/s, 0.25 GB/s
Links: PCIe + GPUDirect, Infiniband, iSER, GlusterFS, NFS
Concepts
• Programmable DAQ electronics with PCI-express interface
• Distributed control system based on Infiniband interconnects
• GPU-based computing
• Multiple levels of scalability
• Cheap off-the-shelf components
Poster FPI02
[Chart: Historical trends of CPU and GPU performance, 2007 to 2014, in GFlops (log scale, 10 to 10000). Series: Xeon/SP, Xeon/DP, Tesla/SP, Tesla/DP, GeForce/SP, GeForce/DP]
Prototype of Streaming PCIe Camera
Developed in house
Easily scalable: Quad-SLI, 35 Tflops for ~5000 EUR
UFO Control Network
[Diagram: Scalable Control Network]
Components: Camera Station, Master Server (lots of memory), OpenCL Node (x2), Storage Node (x2), IB router, Archive (Large Scale Data Facility)
Locations: Beam-line, Control Room, Compute Center
Links: PCIe x8 gen3, Infiniband (Electrical), Infiniband (Optical), Ethernet, 10 Gig Ethernet
Talks on Data Analysis Lab: FCO202, WCO202
Master Server
[Diagram: Master Server connections]
LSDF (Large Scale Data Facility): FDR Infiniband, 56 Gbit/s
Ethernet: 10 Gb/s
Cameras: Internal, PCO.edge, PCO.dimax, ….
Storage: SFF8088 (2.4 GB/s)
External PCIe x16 (16 GB/s)

SuperMicro 7047GR-TRF (Intel C602 Chipset)
CPU: 2 x Xeon E5-2680v2 (total 20 cores at 2.8 GHz)
GPUs: 7 x NVIDIA GTX Titan
Memory: 256 GB (512 GB max)
Network: Intel 82598EB (10 Gb/s)
Infiniband: 2 x Mellanox ConnectX-3 VPI
Storage: Areca ARC-1880-ix-12 SAS Raid
  8 x Samsung 840 Pro 510 (Raid0)
  16 x Hitachi A7K2000 (Raid6)

• High amount of memory
• Fast SSD-based Raid for overflow data
• Easy scalability with external PCI express and SAS
Caching large data sets
[Chart: throughput in MB/s, log scale from 1 to 10000; number of drives in parentheses]
A7K2000 (1): 28.71
A7K2000 (16): 29.74
Intel X25E (1): 140
Samsung 840 (8): 1575.85
RamDisk: 2859.88
Using SSD drives may significantly increase random-access performance for data sets that do not fit completely in memory. Big arrays of magnetic hard drives will not help unless multiple readers are involved.
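A single-reader random-access measurement of this kind can be sketched as follows. This is an illustrative benchmark, not the one used for the chart; the function name, block size, and read count are assumptions. Note that repeated reads of a small file are served from the page cache, so a meaningful run needs a data set larger than memory.

```python
import os
import random
import time


def random_read_mbps(path, block_size=4096, n_reads=1000, seed=0):
    """Read n_reads blocks at random block-aligned offsets; return MB/s."""
    n_blocks = os.path.getsize(path) // block_size
    if n_blocks == 0:
        raise ValueError("file is smaller than one block")
    rng = random.Random(seed)
    fd = os.open(path, os.O_RDONLY)
    try:
        total = 0
        start = time.perf_counter()
        for _ in range(n_reads):
            offset = rng.randrange(n_blocks) * block_size
            # pread reads at an absolute offset without moving the file position
            total += len(os.pread(fd, block_size, offset))
        elapsed = max(time.perf_counter() - start, 1e-9)
    finally:
        os.close(fd)
    return total / elapsed / 1e6
```

Because only one request is outstanding at a time, an array of hard drives cannot overlap seeks here, which is consistent with the nearly identical figures for 1 and 16 A7K2000 drives.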
Camera Station
• High-speed 4-channel memory
• IPMI-based remote control
• Optional fast SSD-based storage
• 4x high speed PCI express slots

[Diagram links: Camera Link, 850 MB/s; PCI express (x8), up to 8 GB/s; Infiniband, 56 Gbit/s]

Asus Z9PA-U8 (Intel C602 Chipset)
CPU: Xeon E5-1620v2 (total 4 cores at 3.7 GHz)
GPUs: NVIDIA GTX Titan
Memory: 32 GB (128 GB max)
Infiniband: Mellanox ConnectX-3 VPI
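The link figures quoted for the camera station are nominal. A small sketch of the underlying arithmetic, assuming PCIe gen3 signaling (8 GT/s per lane, 128b/130b encoding) and a 4x FDR Infiniband link (14.0625 Gbit/s per lane, 64b/66b encoding), gives the raw payload rates before any protocol overhead. The function names are illustrative.

```python
def pcie_gen3_gbytes(lanes):
    """Raw PCIe gen3 bandwidth in GB/s: 8 GT/s per lane, 128b/130b encoding."""
    return lanes * 8e9 * (128 / 130) / 8 / 1e9


def ib_fdr_4x_gbytes():
    """Raw 4x FDR Infiniband bandwidth in GB/s: 14.0625 Gbit/s per lane, 64b/66b."""
    return 4 * 14.0625e9 * (64 / 66) / 8 / 1e9


if __name__ == "__main__":
    print(f"PCIe gen3 x8:  {pcie_gen3_gbytes(8):.2f} GB/s")
    print(f"PCIe gen3 x16: {pcie_gen3_gbytes(16):.2f} GB/s")
    print(f"FDR 4x:        {ib_fdr_4x_gbytes():.2f} GB/s")
```

Under these assumptions an x8 link carries slightly less than 8 GB/s and a single FDR link slightly less than 7 GB/s, so the "up to" in the slide figures should be read literally.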
NVIDIA GPUDirect
Direct communication between GPUs, network, and other devices on the PCI express bus
[Chart: memcpy performance in GB/s (axis 0 to 35), 1 core vs. all cores, for Core i7 950, Core i7-980X, Core i7 3820, 2 x E5-2640]
[Chart: MVAPICH2 1.9b throughput for H2H, D2D/GPUDirect, D2D]
Cluster Node
• 4-Way SLI
• Low Price

Asus Z87-WS (Intel Z87 PCH Chipset)
CPU: Core i5-4670 (total 4 cores at 3.4 GHz)
GPUs: 3 x NVIDIA GTX Titan
Infiniband: Mellanox ConnectX-3 VPI
Memory: 16 GB (32 GB max)

NVIDIA GTX Titan
Memory: 6 GB at 288 GB/s
Single-precision Gflops: 4500
Double-precision Gflops: 1500
Storage Protocols
[Chart: write and read throughput in MB/s (axis 0 to 4500) for Raw, iSER, XFS/Local, XFS/iSER, OCFS2/Local, OCFS2/iSER, FhGFS (over XFS), Gluster (over XFS)]
Network FS: NFS, Samba, SSHFS. Slow
Cluster FS: Lustre (patched kernel), Gluster, FhGFS (closed-source). Slow if few nodes
Network Devices: iSCSI (slow), iSER
OCFS2
Handling High-speed Storage
[Diagram: default data flow in Linux: Data, Buffer Cache, Storage]
The buffer cache significantly limits maximum write performance
Kernel AIO may be used to program the I/O scheduler to issue read requests without delays
Optimizing I/O for maximum streaming performance using a single data source/receiver
[Chart: read and write throughput in MB/s (axis 0 to 4500) for Buffered, Direct, and AIO]
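Bypassing the buffer cache on Linux is done by opening the file with `O_DIRECT` and writing from suitably aligned memory. The following is a minimal sketch under stated assumptions, not the FastWriter implementation: the function name and block size are illustrative, an anonymous `mmap` is used only because it yields page-aligned memory, and the code falls back to buffered writes where the file system rejects `O_DIRECT` (tmpfs, for example).

```python
import mmap
import os

BLOCK = 1 << 20  # 1 MiB; a multiple of common sector and page sizes


def _write_blocks(fd, buf, n_blocks):
    for _ in range(n_blocks):
        if os.write(fd, buf) != len(buf):
            raise OSError("short write")


def stream_write(path, n_blocks):
    """Write n_blocks of BLOCK bytes; return True if O_DIRECT was used."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    o_direct = getattr(os, "O_DIRECT", 0)  # 0 on platforms without O_DIRECT
    buf = mmap.mmap(-1, BLOCK)             # anonymous mapping, page-aligned
    buf.write(b"\xab" * BLOCK)
    try:
        if o_direct:
            try:
                fd = os.open(path, flags | o_direct, 0o644)
                try:
                    _write_blocks(fd, buf, n_blocks)
                    return True
                finally:
                    os.close(fd)
            except OSError:
                pass  # file system rejected direct I/O; retry buffered
        fd = os.open(path, flags, 0o644)
        try:
            _write_blocks(fd, buf, n_blocks)
            return False
        finally:
            os.close(fd)
    finally:
        buf.close()
```

Direct I/O requires the buffer address, the transfer size, and the file offset to be aligned to the device block size, which is why the sketch writes whole page-aligned blocks sequentially.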
UFO Storage Subsystem
[Diagram: storage layout]
Storage Node 1: Raid6, 16 Hitachi 7K300, 28 TB, split into 20 TB and 8 TB
Storage Node 2: Raid6, 16 Hitachi 7K300, 28 TB, split into 20 TB and 8 TB
Master Server: 8 TB + 8 TB via iSER, SoftRaid Level 0, 16 TB, XFS
High-speed streaming storage: Read 2.3 GB/s, Write 2.5 GB/s
GlusterFS: Compute Node 1, Compute Node 2, Compute Node 3
NFS: Client 1, Client 2
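Striping with SoftRaid Level 0 spreads consecutive chunks of the logical volume across the member devices in turn, so a sequential stream keeps both iSER targets busy. A minimal sketch of the address mapping, assuming equal-sized devices and a hypothetical 512 KiB chunk size (the actual chunk size is not given in the slides):

```python
def raid0_map(offset, n_devices=2, chunk=512 * 1024):
    """Map a logical byte offset to (device index, byte offset on that device)."""
    stripe_unit, within = divmod(offset, chunk)
    device = stripe_unit % n_devices                  # chunks rotate over devices
    device_offset = (stripe_unit // n_devices) * chunk + within
    return device, device_offset
```

With two devices, the first chunk lands on device 0, the second on device 1, and the third again on device 0 one chunk further in.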
Software Stack
[Diagram: software components]
ALPS: Advanced Linux PCI Services
LibUCA: Unified Camera Access
LibPCO: PCO Drivers
UFO: GPU Image Processing Framework
FastWriter: Streaming Library
Concert: Control System
Tango Control: Device (Motors), Device (Other slow device)
KIRO
Python, GObject-Introspection
Camera Station
UFO Master Server: UFO, OpenCL, MPI
Computing Cluster
Storage Cluster: XFS, iSER, SoftRaid
KIRO: Integration with Tango
Poster FPI01: Using InfiniBand for High-Throughput Data Acquisition in a TANGO Environment
[Figure: KIRO Architecture]
[Chart: throughput in MB/s (axis 0 to 4500) for Tango/Standard and Tango/KIRO]
Tango over CORBA over TCP over Infiniband is slow
Summary
• Only easy-to-get off-the-shelf components are used
• Our architecture can be easily scaled from a single PC to a cluster with performance in the hundreds of teraflops
• Reliable storage for data streaming at rates over 3 GB/s can easily be built from 1 – 2 low-end servers
• The electronic components can be distributed over a large area and connected at high speed using optical Infiniband links
• The provided software stack allows easy integration of new devices and processing algorithms
• The Tango control system is extended to support high-speed communication over Infiniband

Check related talks and posters:
Poster FPI02: Picosecond Sampling Electronics for Terahertz Synchrotron Radiation
Poster FPI01: Using InfiniBand for High-Throughput Data Acquisition in a TANGO Environment
WCO202: Data Management at the Synchrotron Radiation Facility ANKA
FCO202: OpenGL-Based Data Analysis in Virtualized Self-Service Environments
Extra Slides
Ultra Fast X-ray Imaging of Scientific Processes with On-Line Assessment and Data-Driven Process Control
ANKA beam line
Optics and sample manipulators
Smart high-speed camera
Online monitoring and evaluation
Offline storage
UFO
Goals
• High speed tomography
• Increase sample throughput
• Tomography of temporal processes
• Allow interactive quality assessment
• Enable data driven control
  – Auto-tuning optical system
  – Tracking dynamic processes
  – Finding area of interest
UFO Image Processing Framework
[Diagram: processing pipeline]
Preprocessing (executed on CPUs): acquisition, flat field correction, noise reduction
Reconstruction (executed on GPUs): sinogram generation, FFT, filter, iFFT, backprojection
Storage: segmentation / meshing, storage of raw data

Open Source: http://ufo.kit.edu

Features
➢ Easy Algorithm Exchange
➢ Camera Abstraction
➢ Pipelined Processing
➢ Glib/GObject, scripting language support with introspection
➢ OpenCL + automated management of OpenCL buffers

Filters
➢ Tomography & Laminography: FBP, DFI, Algebraic Methods
➢ Non Local Means for Noise Reduction
➢ Optical Flow
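The pipelined structure can be illustrated with plain Python generators, where each stage consumes frames from the previous one as they arrive. This is a sketch of the idea only, not the UFO API: the function names are hypothetical, and frames are nested lists rather than OpenCL buffers. The flat-field step uses the standard normalization (projection - dark) / (flat - dark).

```python
def flat_field_correct(projection, dark, flat, eps=1e-6):
    """Return (projection - dark) / (flat - dark), element-wise over 2D lists."""
    return [
        [(p - d) / max(f - d, eps) for p, d, f in zip(prow, drow, frow)]
        for prow, drow, frow in zip(projection, dark, flat)
    ]


def acquire(frames):
    """Stand-in for a camera: yields frames one at a time."""
    for frame in frames:
        yield frame


def correct(stream, dark, flat):
    """Pipeline stage: flat-field correct each frame as it arrives."""
    for frame in stream:
        yield flat_field_correct(frame, dark, flat)


if __name__ == "__main__":
    dark = [[10, 10]]
    flat = [[110, 210]]
    frames = [[[60, 110]], [[110, 210]]]
    for corrected in correct(acquire(frames), dark, flat):
        print(corrected)
```

Because generators are lazy, a frame passes through every stage before the next one is acquired, which keeps memory use bounded regardless of the length of the stream.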
ALPS
[Diagram: ALPS layers]
Driver Access Layer (Small & Easy to Maintain): DMA Memory Mapping, IRQ Handling
Event Engine: IPE Camera
DMA Engine: Northwest Logic DMA
Register Access: XML Register List + Dynamic Registration API
PCI Memory Access: Plain and FIFO
Access Serialization, Software Registers
pcitool: Command-line tool
GUI: Python/GTK User Interface
Web Service: Remote Programming API
Scripting: Bash, Perl, Ruby, Python
LabVIEW: Control System Integration
PCI / PCI Express Board (Variety of FPGA Boards: KIT Camera, etc)