Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam
DESCRIPTION
Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam. MINEO Sogo (Univ. Tokyo), ITOH Ryosuke, KATAYAMA Nobu (KEK), LEE Soohyung (Korea Univ.). Distributed parallel framework; analysis framework ROOBASF, extended from BASF (Belle’s framework).
TRANSCRIPT
Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam
MINEO Sogo (Univ. Tokyo), ITOH Ryosuke, KATAYAMA Nobu (KEK), LEE Soohyung (Korea Univ.)
Distributed parallel framework
• Analysis framework: ROOBASF
  – Extended from BASF (Belle’s framework)
  – Controls the analysis workflow
  – For MPI distributed-memory systems *
  – With a Python interface *
  – ROOT embedded *
• For the use of:
  – Belle II (high-energy physics)
  – Hyper Suprime-Cam (astrophysics)

* Newly added features
Table of contents
• Motivation: Hyper Suprime-Cam & Belle II
• Distributed parallel framework: MPI & Python
• Test pipeline
• Summary
MOTIVATION
Hyper Suprime-Cam (HSC) & Belle II
• Hyper Suprime-Cam (HSC)
  – Next-generation camera aiming at dark energy
  – Mounted on the prime focus of the Subaru Telescope
  – Data rate: 2 GB/shot, 10 times larger than the current camera’s
• Belle II
  – Next-generation B factory
  – With SuperKEKB: a new high-luminosity e⁻e⁺ collider at KEK
  – Data rate: 600 MB/s, more than 40 times larger than the current Belle detector’s
An efficient, distributed parallel analysis system is necessary.
Analyses on HSC images: chip-by-chip correction
• 116 CCD sensors cover the focal plane.
• Easily data-parallelized: chips are assigned to processes one by one.
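The one-by-one assignment of the 116 chips can be sketched in plain Python. Round-robin order and the 9-process layout below are illustrative assumptions; the slides only say that chips are assigned to processes one by one.

```python
# Illustrative sketch: distribute the 116 CCD chips over worker
# processes one by one (round-robin). The process count is arbitrary.

def assign_chips(n_chips, n_procs):
    """Return, for each process rank, the list of chip indices it handles."""
    per_rank = [[] for _ in range(n_procs)]
    for chip in range(n_chips):
        per_rank[chip % n_procs].append(chip)
    return per_rank

assignment = assign_chips(116, 9)
# every chip is assigned exactly once; the load is balanced
# to within one chip per process
```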
• Per-chip steps: pedestal correction → gain correction → determine positions by matching celestial objects → superpose chips.
• “Mosaicking”: parallelization is not trivial; processes must exchange object-position information, pixel information, etc., so processes need communication.
Use case in Belle II
• ROOT-based data format.
• The DAQ cluster needs cooperation.
Existing framework
• BASF: the framework for the Belle experiment
  – Successfully used for 10 years; involved in nearly all parts of the experiment: data acquisition, simulation, and users’ analyses
  – Software pipeline architecture: enables a modular structure of analysis paths, with flexible and dynamic module linking
  – Event-by-event parallel analysis
• Issues to be improved:
  – Large data rate: distributed parallelization with inter-process communication
  – ROOT support / object-oriented data flow
(Figure: analysis modules chained into a path.)
Upgrade BASF for Belle II, and also for HSC.
DISTRIBUTED PARALLEL FRAMEWORK
Parallel framework (ROOBASF)
• Controls analysis paths, like BASF in Belle.
• Data parallelism, with inter-process communication.
• Program parallelism.
• Python user interface.
• ROOT utilization.
Parallelization
• ROOBASF uses the Message Passing Interface (MPI).
  – The de-facto standard for distributed parallel computing.
  – Expected to run in various environments.
• Analysis modules use MPI to perform data-parallel algorithms.
  – Each pipeline stage is given an MPI group (communicator).
  – Modules perform parallel processing within the given group, just like stand-alone MPI programs.
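The stage-per-communicator layout can be sketched by computing, for each global rank, the "color" a process would pass to an MPI communicator split (e.g. `MPI.COMM_WORLD.Split(color, rank)` in mpi4py; the concrete binding and the contiguous rank layout are assumptions, not stated in the slides).

```python
# Sketch: map each global rank to its pipeline stage, assuming ranks
# are laid out contiguously stage by stage. The stage index is the
# "color" each process would pass to a communicator split.

def stage_color(rank, stage_sizes):
    """Return the pipeline-stage index that owns the given global rank."""
    offset = 0
    for color, size in enumerate(stage_sizes):
        if rank < offset + size:
            return color
        offset += size
    raise ValueError("rank outside the process layout")

# two stages of 3 processes each (as in the 3x2 test layout):
colors = [stage_color(r, [3, 3]) for r in range(6)]
```

After the split, each module sees only its own communicator and can run its collective operations without touching the other stages.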
Two layers of analysis paths
• Sequential paths
  – A sequence of analysis modules, with conditional branches.
  – All executed in one process.
• Parallel paths
  – A sequence of processes and conditional branches.
  – Each process executes a “sequential path”.
  – Program parallelization: multiple copies run simultaneously (data parallelization).
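A sequential path (modules plus conditional branches, all in one process) can be sketched with plain functions. The module names and the jump-index convention for branches are assumptions for illustration, not the ROOBASF API.

```python
# Sketch: run a sequential path in a single process. Each module takes
# the event and returns (event, jump); jump=None continues with the
# next module, while an integer jumps to that position in the path
# (a conditional branch).

def run_sequential_path(modules, event):
    i = 0
    while i < len(modules):
        event, jump = modules[i](event)
        i = i + 1 if jump is None else jump
    return event

def pedestal(ev):
    return ev + ["pedestal"], None

def quality_check(ev):
    # conditional branch: skip astrometry when the data are bad
    return ev + ["quality"], (3 if "bad" in ev else None)

def astrometry(ev):
    return ev + ["astrometry"], None

path = [pedestal, quality_check, astrometry]
good = run_sequential_path(path, [])        # runs all three modules
bad = run_sequential_path(path, ["bad"])    # branch skips astrometry
```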
Data flow
• Events
  – Event or image data to be analyzed.
• Broadcast messages
  – Experiment parameters, observation parameters, etc.
  – Have to be sent to all modules.
  – Must not switch order with events.
(Figure: at a conditional branch, a broadcast message could overtake an event; the broadcast is therefore suspended until it has arrived from all branches.)
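This merge rule can be sketched as follows: events pass straight through, while a broadcast is held back until it has arrived from every branch, so it can never overtake events still queued on a slower branch. The message tuples and branch ids are illustrative, not the ROOBASF wire format.

```python
# Sketch: merging branch outputs while keeping broadcasts behind the
# events that preceded them. `arrivals` is the interleaved arrival
# sequence of (branch_id, message), where each message is
# ('event', data) or ('bcast', data).

def merge_branches(arrivals, n_branches):
    out = []
    seen = {}  # broadcast payload -> set of branches it has arrived from
    for branch, (kind, data) in arrivals:
        if kind == "event":
            out.append((kind, data))      # events flow through immediately
        else:
            seen.setdefault(data, set()).add(branch)
            if len(seen[data]) == n_branches:
                out.append((kind, data))  # forward once all branches delivered it
    return out

arrivals = [
    (0, ("event", "img1")),
    (0, ("bcast", "params")),  # branch 0 is done, branch 1 still has an event
    (1, ("event", "img2")),
    (1, ("bcast", "params")),  # now every branch has delivered it
]
merged = merge_branches(arrivals, 2)
```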
Utilization of Python
• Analysis paths are described in the Python language.
  – Modules can also be described inline in the script.
• Modules can be developed quickly in Python; CPU-costly ones can later be rewritten in C++.
  → Efficient development of analysis modules.
• Implemented with the boost.python library.
  – Python scripts can call native code, and native code can call Python scripts.
  – This bidirectional calling is a feature of boost.python that is absent from SWIG.
(Figure: a Python script holds the path description; ROOBASF (native, C++ etc.) and the analysis code call each other through it.)
Python script:

    import boostpbasf as basf

    # Create an instance of the ROOBASF framework
    f = basf.CFrame()

    # dlopen() "Astr1Chip.so", link the plugin code, and set its parameter
    f.Plug_Module("Astr1Chip").SetParam("config", "matching.scamp")

    # Define a Python module
    class Load(basf.CModule):
        def __init__(self, namefmt):
            basf.CModule.__init__(self)
            self.namefmt = namefmt
            self.count = 0

        def event(self, status, ev, comm):
            if status == 0:
                ev.SetFile(self.namefmt % self.count)
                (……)

    load = Load("/data/img%03d.fits")

    # Create a sequential path "main"
    f.Seq_Add("main", load)
    f.Seq_Add("main", "Astr1Chip")
(Figure: the Python-side Load module and the native Astr1Chip.so linked into the “main” path of ROOBASF.)
TEST PIPELINE
Pipeline for the test
• Data-parallel analysis path (for on-line monitoring):
  – Performs pedestal/gain correction
  – Checks data quality
  – Performs 1-chip astrometry
  – Tiny modules in Python: error detector, time watch, etc.
(Figure: three parallel ROOBASF chains, each OSS → FLAT → AGP → STAT → SEXT → ASTR, processing CCD images: correction, a data-quality check, and multi-threaded 1-chip astrometry.)
Test environment
• 3 PCs only: x64, 4-core, Gigabit-Ethernet-linked.
• Number of processes: 1, 3x1, 3x2, 3x3.
• Parallelization will not scale linearly (though each CPU has 4 cores) because of the multi-threaded modules.
(Figure: the 1, 3x1, 3x2, and 3x3 process layouts over the three 4-core PCs; programs and input/output images on local HDDs, shared via NFS.)
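Why the scaling cannot stay linear: with modules that are themselves multi-threaded, several processes on the same 4-core node oversubscribe its cores. A toy model (the two-threads-per-process figure is an assumption, not a measured value) caps the usable parallelism at the cores a node actually has:

```python
# Toy model: speedup over 1 process when each process runs `threads`
# threads and each node has `cores` cores. Oversubscribed nodes cannot
# deliver more parallel work than their core count.

def node_limited_speedup(procs_per_node, n_nodes, threads, cores=4):
    wanted = procs_per_node * threads  # cores each node would like to use
    usable = min(wanted, cores)        # cores it actually has
    return n_nodes * usable / float(threads)

# 3x1 layout, 2 threads per process: each node uses 2 of its 4 cores
s1 = node_limited_speedup(1, 3, 2)   # matches the ideal speedup of 3
# 3x3 layout: each node wants 6 cores but has only 4
s3 = node_limited_speedup(3, 3, 2)   # falls below the ideal speedup of 9
```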
Parallelization efficiency
(Figure: measured speedup for 1, 3, 6, and 9 processes against the ideal linear speedup, with a separate curve for processes with threads; second panel: analysis time per image in seconds, inverted.)
SUMMARY
Summary
• Analysis framework: ROOBASF
  – Distributed memory (MPI)
  – Python scripting
  – ROOT I/O
• We built a parallel analysis path for astronomical images.
• Feasibility for Belle II is yet to be confirmed.