Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam MINEO Sogo (Univ. Tokyo), ITOH Ryosuke, KATAYAMA Nobu (KEK), LEE Soohyung (Korea Univ.)


Post on 02-Jan-2016



TRANSCRIPT

Page 1: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

MINEO Sogo (Univ. Tokyo), ITOH Ryosuke, KATAYAMA Nobu (KEK), LEE Soohyung (Korea Univ.)

Page 2: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Distributed parallel framework

• Analysis framework: ROOBASF
  – Extended from BASF (Belle's framework)
  – Controls analysis workflow
  – For MPI distributed-memory systems *
  – With a Python interface *
  – ROOT embedded *
• For use in:
  – Belle II (high energy physics)
  – Hyper Suprime-Cam (astrophysics)

* Newly added features

Page 3: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Table of contents

• Motivation
  – Hyper Suprime-Cam & Belle II
• Distributed parallel framework
  – MPI & Python
• Test pipeline
• Summary

Page 4: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

MOTIVATION


Page 5: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Hyper Suprime-Cam (HSC) & Belle II

• Hyper Suprime-Cam (HSC)
  – Next-generation camera aimed at studying dark energy
    • Mounted at the prime focus of the Subaru Telescope.
    • Data rate: 2 GB/shot.
      – 10 times larger than the current camera's.
• Belle II
  – Next-generation B factory
    • With SuperKEKB: a new high-luminosity e⁻e⁺ collider at KEK.
    • Data rate: 600 MB/sec.
      – More than 40 times larger than the current Belle detector's.

An efficient, distributed parallel analysis system is necessary.

Page 6: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Analyses on HSC images

Chip-by-chip correction
• 116 CCD sensors cover the focal plane.
• Pedestal correction, then gain correction.
• Easily data-parallelized: chips are assigned to processes one by one.

"Mosaicking"
• Determine positions by matching celestial objects, then superpose the chips.
• Parallelization is not trivial. Processes must exchange:
  – object position information
  – pixel information
  – etc.
• Processes need communication.
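The chip-by-chip stage can be parallelized by handing each CCD to a process. The following is a minimal sketch of such an assignment, assuming a simple round-robin policy (the slides do not say which scheme ROOBASF uses); `assign_chips` is a hypothetical helper name, not part of the framework.

```python
def assign_chips(n_chips, n_procs):
    """Return a list where entry p holds the chip indices handled by process p."""
    # Round-robin (an assumed policy): chip i goes to process i % n_procs.
    return [list(range(p, n_chips, n_procs)) for p in range(n_procs)]


# 116 CCDs (from the slide) spread over 9 processes (the largest test layout).
plan = assign_chips(116, 9)
print([len(chips) for chips in plan])
```

Every chip is processed independently in this stage, so no communication between processes is required until mosaicking.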

Page 7: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Use case in Belle II

• ROOT-based data format.
• The DAQ cluster needs cooperation between its nodes.

Page 8: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Existing framework

• BASF: the framework for the Belle experiment
  – Successfully used for 10 years.
  – Involved in nearly all parts of the experiment.
    • Data acquisition, simulation, users' analysis
  – Software pipeline architecture
    • Enables a modular structure of analysis paths.
    • Flexible and dynamic module linking.
  – Event-by-event parallel analysis
• Issues to be improved:
  – Large data rate: distributed parallelization
  – with inter-process communication.
  – ROOT support / object-oriented data flow.

[Figure: a path drawn as a chain of analysis modules.]

Upgrade BASF for Belle II and also for HSC

Page 9: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

DISTRIBUTED PARALLEL FRAMEWORK


Page 10: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Parallel framework (ROOBASF)

• Controls analysis paths.
  – Like BASF in Belle.
• Data-parallel.
  – Inter-process communication.
• Program-parallel.
• Python user interface.
• ROOT utilization.

[Figure: analysis modules arranged along a path and distributed over several processes (labelled Process 1 to Process 4 and Process 1 to Process 2).]

Page 11: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Parallelization

• ROOBASF uses the Message Passing Interface (MPI).
  – De facto standard for distributed parallel computing.
  – Expected to run in various environments.
• Analysis modules use MPI to perform data-parallel algorithms.
  – Each pipeline stage is given an MPI group (communicator).
  – Modules perform parallel processing just like stand-alone MPI programs within the given group.

[Figure: pipeline stages mapped to Process group 1 and Process group 2.]
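One way to give each pipeline stage its own communicator is to compute a "color" per rank and split the world communicator on it. The sketch below only computes the color and the rank within the group, in plain Python, so it runs without an MPI installation. The stage sizes and the name `stage_of_rank` are assumptions for illustration; the slides do not show how ROOBASF builds its groups.

```python
def stage_of_rank(rank, stage_sizes):
    """Map a global rank to (stage index, rank within that stage's group)."""
    first = 0
    for stage, size in enumerate(stage_sizes):
        if rank < first + size:
            return stage, rank - first
        first += size
    raise ValueError("rank %d exceeds the %d available processes" % (rank, first))


# Hypothetical layout: stage 0 uses 4 processes, stage 1 uses 2.
sizes = [4, 2]
for rank in range(sum(sizes)):
    color, key = stage_of_rank(rank, sizes)
    # In a real MPI program each rank would now split the world communicator,
    # e.g. with mpi4py: group_comm = MPI.COMM_WORLD.Split(color, key)
    # and a module would run its data-parallel algorithm on group_comm.
    print(rank, color, key)
```

With a per-stage communicator, a module can use collective operations inside its own group without interfering with other stages.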

Page 12: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Two layers of analysis paths

• Sequential paths
  – Sequence of analysis modules.
  – Conditional branches.
  → All executed in one process.
• Parallel paths
  – Sequence of processes and conditional branches.
    • Each process executes a "sequential path."
    • Program-parallelization.
  – Multiple copies run simultaneously.
    • Data-parallelization.

[Figure: a sequential path chains analysis modules with a conditional branch; a parallel path chains processes.]
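The idea of a sequential path with a conditional branch can be sketched in a few lines of Python. This is an illustration under assumptions, not ROOBASF's implementation: the `Path` class, its `add`, `branch`, and `run` methods, and the dictionary-based events are all hypothetical.

```python
class Path:
    """A sequential path: modules run in order inside one process."""

    def __init__(self):
        self.steps = []

    def add(self, module):
        self.steps.append(("module", module))
        return self

    def branch(self, condition, other_path):
        # If condition(event) is true, the event continues on other_path.
        self.steps.append(("branch", (condition, other_path)))
        return self

    def run(self, event):
        for kind, item in self.steps:
            if kind == "module":
                event = item(event)
            else:
                condition, other_path = item
                if condition(event):
                    return other_path.run(event)
        return event


# Hypothetical modules: each takes an event (a dict) and returns a new one.
reject = Path().add(lambda ev: dict(ev, tag="rejected"))
main = (Path()
        .add(lambda ev: dict(ev, flux=ev["raw"] - 100))
        .branch(lambda ev: ev["flux"] < 0, reject)
        .add(lambda ev: dict(ev, tag="accepted")))

print(main.run({"raw": 150}))
print(main.run({"raw": 50}))
```

A parallel path would then be a chain of processes, each of which executes one such sequential path.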

Page 13: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Data flow

• Events
  – Event or image data to be analyzed.
• Broadcast messages
  – Experiment parameters, observation parameters, etc.
  – Have to be sent to all modules.
  – Must not switch order with events.

[Figure: after a conditional branch, a broadcast travelling on one branch could overtake an event travelling on another. Where the branches rejoin, the broadcast is suspended until it has arrived from all branches.]
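The ordering rule for broadcasts can be sketched as a merge point that holds a broadcast until every branch has delivered it. The code below is a plain-Python illustration under assumptions: the `Merge` class is hypothetical, and it assumes that broadcasts reach every branch in the same order. The slides do not describe ROOBASF's actual mechanism beyond the suspend rule.

```python
from collections import deque


class Merge:
    """Merge point where the branches of a conditional branch rejoin."""

    def __init__(self, n_branches):
        self.n = n_branches
        self.waiting = [False] * n_branches   # branch delivered the pending broadcast
        self.held = [deque() for _ in range(n_branches)]  # items queued behind it
        self.pending = None
        self.out = []                         # merged output stream

    def receive(self, branch, kind, payload):
        if self.waiting[branch]:
            # Items that follow the broadcast on this branch must not pass it.
            self.held[branch].append((kind, payload))
        elif kind == "event":
            self.out.append((kind, payload))
        else:
            self.pending = payload
            self.waiting[branch] = True
            if all(self.waiting):
                self._release()

    def _release(self):
        # The broadcast has arrived from all branches: emit it once, then
        # replay whatever was queued behind it.
        self.out.append(("bcast", self.pending))
        self.pending = None
        self.waiting = [False] * self.n
        held, self.held = self.held, [deque() for _ in range(self.n)]
        for branch, queue in enumerate(held):
            for kind, payload in queue:
                self.receive(branch, kind, payload)


m = Merge(2)
m.receive(0, "event", "e1")
m.receive(0, "bcast", "B")    # branch 0 is fast: the broadcast arrives early
m.receive(0, "event", "e3")   # sent after B, so it must stay behind B
m.receive(1, "event", "e2")   # sent before B on the slower branch
m.receive(1, "bcast", "B")
print(m.out)
```

In this run the broadcast `B` leaves the merge point only after `e2`, so it cannot overtake an event that was sent before it.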

Page 14: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Utilization of Python

• Analysis paths are described in the Python language.
  – Modules can also be described inline in the script.
    • Modules can be quickly developed in Python.
    • If a module turns out to be CPU-intensive, it can then be rewritten in C++.
  → Efficient development of analysis modules.
• Implemented with the Boost.Python library.
  – Python scripts can call native code.
  – Native code can call Python scripts.
    • Unique feature of Boost.Python, absent from SWIG.

[Figure: ROOBASF (native, C++ etc.) calls analysis code in the Python script that holds the path description, and the Python script calls native analysis code.]

Page 15: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Python script

    import boostpbasf as basf

    # Create an instance of the ROOBASF framework.
    f = basf.CFrame()

    # dlopen() "Astr1Chip.so", link the plugin code, and set its parameter.
    f.Plug_Module("Astr1Chip").SetParam("config", "matching.scamp")

    # Define a Python module.
    class Load(basf.CModule):
        def __init__(self, namefmt):
            basf.CModule.__init__(self)
            self.namefmt = namefmt
            self.count = 0

        def event(self, status, ev, comm):
            if status == 0:
                ev.SetFile(self.namefmt % self.count)
            # (……)

    load = Load("/data/img%03d.fits")

    # Create a sequential path "main".
    f.Seq_Add("main", load)
    f.Seq_Add("main", "Astr1Chip")

[Figure: the Python script builds the "main" path inside ROOBASF (native); the path chains the Python module Load and the native plugin Astr1Chip.so.]

Page 16: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

TEST PIPELINE


Page 17: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Pipeline for the test

• Data-parallel analysis path (for on-line monitoring):
  – Performs pedestal/gain correction
  – Checks data quality
  – Performs 1-chip astrometry
  – Tiny modules in Python: error detector, time watch, etc.

[Figure: ROOBASF runs three identical pipelines in parallel on CCD images. Each pipeline chains the modules OSS, FLAT, AGP, STAT, SEXT, ASTR, grouped as correction, data-quality check, and 1-chip astrometry (multi-threaded).]
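As an illustration of what a correction step does to a single chip, the following sketch subtracts a pedestal level and applies a gain factor pixel by pixel. The formula (subtract, then scale), the scalar pedestal and gain, and the name `correct_chip` are assumptions; the slides do not spell out the algorithm used by the test modules.

```python
def correct_chip(raw, pedestal, gain):
    """Subtract a pedestal level and scale by a gain factor, pixel by pixel."""
    # Assumed form of the correction: (pixel - pedestal) * gain.
    return [[(pixel - pedestal) * gain for pixel in row] for row in raw]


# A tiny 2x2 "chip" with made-up pixel values.
chip = [[110, 120], [130, 140]]
print(correct_chip(chip, pedestal=100, gain=0.5))
```

Because the function touches only one chip, it fits the data-parallel path: each process can apply it to its own chips without communication.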

Page 18: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Test environment

• 3 PCs only
  – x64, 4 cores each
  – Linked by Gigabit Ethernet
• Number of processes
  – 1, 3x1, 3x2, 3x3
• Speedup is not expected to scale linearly with the number of processes (even though each CPU has 4 cores) because some modules are multi-threaded.

[Figure: three PCs, each with a 4-core CPU and an HDD holding input and output images; one HDD also holds the programs, and the PCs are connected over NFS. Process layouts shown: 1 process, 3x1 process, 3x2 processes, 3x3 processes. Legend: process with threads.]

Page 19: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Parallelization efficiency

[Figure: plot with axes labelled "Analysis time per image / sec (inverted)" and "Parallelization efficiency", with points at 1, 3, 6, and 9 processes. Legend: "Ideal speedup", "Process with threads".]
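Speedup and parallelization efficiency are commonly derived from per-image timings as shown in the sketch below. The timings are hypothetical placeholders, NOT the measured values from the plot (which did not survive in this transcript), and `speedup_and_efficiency` is an illustrative helper name.

```python
def speedup_and_efficiency(t_serial, t_parallel, n_procs):
    """Return (speedup, efficiency) for a run with n_procs processes."""
    speedup = t_serial / t_parallel
    return speedup, speedup / n_procs


# Hypothetical analysis times in seconds per image, keyed by process count.
timings = {1: 18.0, 3: 7.5, 6: 4.5, 9: 3.6}
for n, t in timings.items():
    s, e = speedup_and_efficiency(timings[1], t, n)
    print(n, round(s, 2), round(e, 2))
```

An efficiency of 1.0 corresponds to the ideal-speedup line; values below 1.0 are expected here because the multi-threaded modules already use several cores per process.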

Page 20: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

SUMMARY


Page 21: Distributed parallel processing analysis framework for Belle II and Hyper Suprime-Cam

Summary

• Analysis framework: ROOBASF
  – Distributed memory (MPI)
  – Python script
  – ROOT I/O
• We built a parallel analysis path for astronomical images.
• Feasibility for Belle II is yet to be confirmed.