Wire-Cell Software Brett Viren Physics Department DUNE S&C @ BNL – Jan 2019



Outline

Wire-Cell Software Overview

Wire-Cell Toolkit

WC/LS/art Integration

Strategy and Discussion

Brett Viren (BNL) wc s/w Jan 2019 2 / 22


Wire-Cell Software Overview

Wire-Cell Algorithms and Processing Chain

Xin Qian


Wire-Cell Software Algorithm Scope

[Figure: scope diagram. Group headings: Wire-Cell Prototype, Wire-Cell Toolkit, Interactive Visualization, Future.]

Items shown:
• 3D Imaging
• 3D Clustering
• Noise Filtering
• Ionization-Scintillation “flash” matching
• Signal Processing (inc. L1)
• Pattern Recognition
• Drift and Field Response Simulation
• “Bee” 3D (WebGL)
• “Magnify” tools: charge + light, sigproc/tracking (ROOT/GUI)
• 3D Machine Learning
• Parallel processing (grid + HPC)

Currently out of scope:
• Particle-tracking simulation (Geant4)
• Raw data file interface (art)
• Event-processing framework (art)


Prototype and Toolkit Code Bases

Common:
• Similar build systems and external dependencies.
• Open source repositories on GitHub.

Prototype:
• Lightly structured code base, various main() programs, freedom to experiment without many “rules”, comes with “no” user support, no releases. Initial proving ground for eventual toolkit code.
• Used by MicroBooNE results.
• Top package: https://github.com/BNLIF/wire-cell

Toolkit:
• Structured/designed code base, careful dependency control, toolkit-style integration to user’s app, shared library plugins, configuration subsystem, optimized code, production releases and support.
• Used by MicroBooNE and ProtoDUNE.
• Top package: https://github.com/wirecell/wire-cell-build


Wire-Cell Toolkit


Wire-Cell Toolkit Packages

util: fundamental data types, operations, toolkit infrastructure.
iface: abstract “interface” base classes for WCT components.
cfg: reference configuration files (.jsonnet and .fcl).
data: larger config “data” files (.json.bz2).
gen: components for electron drift and field response signal and noise simulation.
sigproc: components for noise filtering and signal processing (field response deconvolution and L1 regularization).
sio: components to provide various I/O (depends on ROOT).
python: utility, debugging, analysis, config data file prep.
pgraph: single-thread, low-memory execution model implementation.
tbb: experimental multi-threaded execution model implementation.
sing: Singularity image creation and user scripts.
docs: news blog, manual, presentations and other documentation.
tests: larger-than-unit tests.
waftools: build system support.


Layered stack of user-code entry points

wire-cell CLI art CLI

Wire-Cell/LArSoft integration

Wire-Cell plugin/shared libraries

construct data flow processing graph via user configuration

use abstract component interfaces via factory lookup

#include "implementation.h" / use concrete WC classes

implement new concrete component classes

design new interface base classes


Aside: Data Flow Programming Paradigm

Programming = drawing: construct a directed graph of processing nodes with labeled ports connected by edges which transfer data objects.

[Figure: example data flow graph. Node labels: TrackDepos, NumpyDepoSaver, DepoFanout (output ports 0 to 5), Ductor ductor0 to ductor5, Digitizer digitizer0 to digitizer5, FrameFanin (input ports 0 to 5), NumpyFrameSaver, DumpFrames, VagabondX.]

• Nodes may contain state, edges may buffer intermediate results.

• Stateless nodes and thread-safe edges allow for multi-threading.

• Edges may be generalized to allow multi-node communication.

• Edge data may be fine-grained (ie, objects smaller than one “event”).

• Different graph execution strategies may be employed.
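A minimal sketch of the paradigm, assuming a toy single-threaded engine written in Python rather than anything from WCT. Nodes are callables with numbered input and output ports, edges connect an output port to an input port, and the hypothetical `Graph.run` executes nodes in dependency order. The node names below are made up for illustration.

```python
from collections import defaultdict, deque


class Graph:
    """Toy data flow graph: each node maps a list of inputs (one per input
    port) to a list of outputs (one per output port)."""

    def __init__(self):
        self.nodes = {}   # name -> (callable, number of input ports)
        self.edges = []   # ((tail, out_port), (head, in_port))

    def add_node(self, name, func, n_in):
        self.nodes[name] = (func, n_in)

    def connect(self, tail, out_port, head, in_port):
        self.edges.append(((tail, out_port), (head, in_port)))

    def run(self):
        """Execute every node once, in topological order (Kahn's algorithm)."""
        indeg = {name: 0 for name in self.nodes}
        downstream = defaultdict(list)
        for (tail, out_port), (head, in_port) in self.edges:
            indeg[head] += 1
            downstream[tail].append((out_port, head, in_port))
        inbox = {name: [None] * n_in for name, (_, n_in) in self.nodes.items()}
        ready = deque(name for name, d in indeg.items() if d == 0)
        results = {}
        while ready:
            name = ready.popleft()
            func, _ = self.nodes[name]
            outputs = func(inbox[name])
            results[name] = outputs
            # Each edge transfers one output object to a downstream input port.
            for out_port, head, in_port in downstream[name]:
                inbox[head][in_port] = outputs[out_port]
                indeg[head] -= 1
                if indeg[head] == 0:
                    ready.append(head)
        return results


g = Graph()
g.add_node("source", lambda ins: [[1, 2, 3]], 0)
g.add_node("fanout", lambda ins: [ins[0], ins[0]], 1)
g.add_node("worker0", lambda ins: [[2 * x for x in ins[0]]], 1)
g.add_node("worker1", lambda ins: [[x + 1 for x in ins[0]]], 1)
g.add_node("fanin", lambda ins: [ins[0] + ins[1]], 2)
g.connect("source", 0, "fanout", 0)
g.connect("fanout", 0, "worker0", 0)
g.connect("fanout", 1, "worker1", 0)
g.connect("worker0", 0, "fanin", 0)
g.connect("worker1", 0, "fanin", 1)
results = g.run()
```

Because the graph description is separate from `run`, a different execution strategy could replace `run` without touching the nodes.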


WCT Implements DFP Paradigm

• Technically optional but all “real” jobs defined in terms of a DFP graph.

• WCT Jsonnet configuration supports scale invariant graph construction.
  o Complex subgraph description can be encapsulated into a single node object, parameterized and re-used.

• Abstracted graph execution supports different execution strategies:
  o Primary engine: single-thread with memory-minimizing.
  o Experimental: TBB-based multi-threaded, CPU-maximizing.
  o Possible future: multi-thread+multi-node with ZeroMQ or MPI?
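The idea of a parameterized, reusable subgraph can be sketched as follows. WCT expresses this in Jsonnet; the sketch uses Python functions returning plain dictionaries instead, and the functions `apa_pipeline` and `fan_pipelines` as well as the dictionary layout are assumptions made for illustration. The node labels borrow the "Ductor" and "Digitizer" names from the example graph figure.

```python
def apa_pipeline(index):
    """Describe one hypothetical per-APA chain as a self-contained subgraph."""
    ductor = f"Ductor:ductor{index}"
    digitizer = f"Digitizer:digitizer{index}"
    return {
        "nodes": [ductor, digitizer],
        "edges": [(ductor, digitizer)],
        "input": ductor,      # where the enclosing graph attaches upstream
        "output": digitizer,  # where the enclosing graph attaches downstream
    }


def fan_pipelines(n_apas):
    """Build fanout -> n_apas parallel subgraphs -> fanin from one description."""
    graph = {"nodes": ["DepoFanout", "FrameFanin"], "edges": []}
    for i in range(n_apas):
        sub = apa_pipeline(i)
        graph["nodes"] += sub["nodes"]
        graph["edges"] += sub["edges"]
        graph["edges"].append(("DepoFanout", sub["input"]))
        graph["edges"].append((sub["output"], "FrameFanin"))
    return graph


# The same description scales from one APA to many by changing one number.
graph = fan_pipelines(6)
```

The enclosing function only sees each subgraph's "input" and "output", so the inside of `apa_pipeline` can grow without changing `fan_pipelines`.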


WC/LS/art Integration


WC/LS Integration Design


WC/LS Design In Words

• WireCellToolkit module provides reference art module.
• WCLS tool provides art Tool interface almost exactly like wire-cell command line interface.
• A pure-WCT DFP graph may be rewritten to a WC/LS graph simply by replacing its data sources/sinks with corresponding WC/LS converter components.
  o “visit” the art::Event before or after module execution.
  o convert data or provide interface to art services.
• High-level WCT configuration specified in FHiCL
  o Corresponds to what may otherwise be specified to wire-cell CLI
  o Typically, a few select “external variable” parameters set in FHiCL
  o Additionally, must specify list of converter components.
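The source/sink replacement described above amounts to a renaming pass over the graph description. The sketch below assumes a graph stored as a Python dictionary of node names and edges; every component name in it (`FileDepoSource`, `ArtDepoSource`, and so on) is hypothetical and does not refer to actual WCT or WC/LS classes.

```python
def to_wcls(graph, converters):
    """Return a copy of the graph with sources/sinks swapped for converters."""
    def swap(name):
        return converters.get(name, name)

    return {
        "nodes": [swap(n) for n in graph["nodes"]],
        "edges": [(swap(tail), swap(head)) for tail, head in graph["edges"]],
    }


# Hypothetical pure-WCT job: file source -> processing -> file sink.
pure_wct = {
    "nodes": ["FileDepoSource", "Simulation", "FileFrameSink"],
    "edges": [("FileDepoSource", "Simulation"), ("Simulation", "FileFrameSink")],
}

# Hypothetical mapping from file-based endpoints to art::Event converters.
converters = {
    "FileDepoSource": "ArtDepoSource",
    "FileFrameSink": "ArtFrameSink",
}

wcls = to_wcls(pure_wct, converters)

# The converter components that would also need listing in the job config.
converter_list = sorted(converters.values())
```

Only the endpoints change; the interior of the graph is carried over untouched, which is the point of the design.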


WC/LS Status

• WCT is now standard for MicroBooNE noise filtering and signal processing.

• WCT simulation integrated and tested to consume LArG4 energy depositions, produce raw ADC waveforms.

• Multi-APA support added to above to support ProtoDUNE-SP.


Strategy and Discussion


WCT Strategy For DUNE

1 Continue to advance algorithms and port to toolkit
2 Add support for per-APA processing
  • Relatively straight-forward in WCT
  • But, need a per-APA “loop” all the way to the input file (ie, art support).
3 Leverage WCT design to exploit multi-core platforms and reduce RAM/core.
4 Understand if multi-core GRID jobs are sufficient for DUNE production data and simulation processing and push into HPC space if not.

Some more on each point on following slides.


Wire-Cell Algorithm Advancement

Xin Qian


Porting Prototype Algorithms - 3D Imaging

• Existing 3D prototype code needs understanding by others (started).
• New, more general data model needed (conceptual).
• Develop optimized primitive operations (conceptual).
• Initiate new eg wire-cell-img package.
• Initial port of algorithms should generically support MB, PDSP and others.
• Benchmarking, validation, tuning....


Per-APA execution model

• Full “event” in memory already requires substantial RAM
  o MicroBooNE/ProtoDUNE-SP easily break the 2GB/core Grid limit.
  o WCT recently reduced its footprint but still processes whole-event
  o art’s ROOT I/O overhead tends to dominate
  o 150 DUNE APAs in memory at once is untenable

• SigProc output is sparse, expect ≈ 10^4 reduction for DUNE
• Input is dense: break processing down to per-APA units.
  o WCT can define per-APA pipelines, some work needed.
  o Ultimately, only matters if art loads in N APAs at once

Discussion:
→ DUNE needs to discuss with art/LArSoft experts how to achieve a per-APA “event” loop.
→ DUNE should configure production jobs to be more selective as to what data types are “shunted” input→output.
  • Don’t carry forward “dense” RawDigits.
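The memory argument for per-APA processing can be made concrete with a toy sketch. It assumes a Python generator as a stand-in for a per-APA input loop; the function names are hypothetical and the sample counts and sparsification factor are arbitrary toy values, not DUNE numbers. Only the APA count of 150 comes from the slide.

```python
def read_apas(n_apas, samples_per_apa):
    """Yield one dense block per APA instead of building the whole event."""
    for apa in range(n_apas):
        yield apa, [0] * samples_per_apa  # stand-in for dense raw waveforms


def sigproc(dense, keep_every):
    """Stand-in for signal processing: keep a sparse subset of the samples."""
    return dense[::keep_every]


def process_event(n_apas, samples_per_apa, keep_every):
    """Stream APAs one at a time; track peak dense size and total sparse size."""
    peak_dense = 0
    sparse_total = 0
    for _apa, dense in read_apas(n_apas, samples_per_apa):
        peak_dense = max(peak_dense, len(dense))
        sparse_total += len(sigproc(dense, keep_every))
        # 'dense' is released before the next APA is read.
    return peak_dense, sparse_total


peak, sparse = process_event(n_apas=150, samples_per_apa=10_000, keep_every=1_000)
whole_event_dense = 150 * 10_000
```

In this toy the dense footprint is bounded by a single APA (`peak`) rather than by `whole_event_dense`, and only the sparse output accumulates. The benefit disappears if the input layer materializes all APAs before the loop starts, which is the concern raised in the discussion points.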


Multi-threading

• WCT has existing TBB-based multi-threaded DFP graph execution engine.
  o May be “trivial” or maybe surprises.
  + No mutable “globals”, const data objects so should be in good shape.
  ? But, many node components exist now and with little care for thread-safety.

• Evaluate TBB execution performance and memory usage
• Consider adoption/development of alternative engines.
• Understand how far we can go on multi-core grid allocation alone.
• Test the HPC waters.
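Why const data objects and the absence of mutable globals matter can be shown with a small sketch. It substitutes Python's standard `ThreadPoolExecutor` for TBB, and the `digitize` function with its `gain` and `max_adc` parameters is a hypothetical stand-in for a node component. It demonstrates correctness under concurrent calls only; it says nothing about speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def digitize(samples, gain=2, max_adc=4095):
    """Stateless node: the result depends only on the arguments, and the
    input tuple is immutable, so concurrent calls cannot interfere."""
    return tuple(min(max_adc, int(gain * s)) for s in samples)


# Six immutable inputs, e.g. one per parallel branch of the graph.
inputs = [tuple(range(i, i + 4)) for i in range(6)]

# Reference result from plain serial execution.
serial = [digitize(s) for s in inputs]

# Same node function run from several threads; map() preserves input order.
with ThreadPoolExecutor(max_workers=3) as pool:
    threaded = list(pool.map(digitize, inputs))
```

A node that cached intermediate results in a shared attribute would not pass this kind of serial-versus-threaded comparison reliably, which is the risk flagged for existing components.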


Thoughts on art Multi-threading

• art is multi-threaded but “only” at the module/path level,
  o This useful feature seems not yet exploited by the experiments.
  → DUNE should try it!
  ? Only useful to the extent that module paths are actually parallel.

• MT success and per-APA execution are linked.
  o Ideally, we want concurrent, per-APA pipelines.
  o These will require “event” level synchronization at least at job output.

• Must treat parallelism “holistically”
  o A highly parallel module (eg WCT) in an otherwise roughly serial art job wastes CPU.
  → See if art team can add pipelining (multiple events “in flight”).
  o WCT components may be pipelined “for free” depending on the execution engine.
  o BNL PAS group’s “event server” technique may also be useful?
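Pipelining with multiple events "in flight" can be sketched with one thread per stage and bounded queues between stages. This is a generic Python illustration, not how art or WCT implement it; `run_pipeline`, `stage` and the `depth` parameter are hypothetical names.

```python
import queue
import threading

DONE = object()  # sentinel marking the end of the event stream


def stage(func, inbox, outbox):
    """Run one pipeline stage: apply func to each item until DONE arrives."""
    while True:
        item = inbox.get()
        if item is DONE:
            outbox.put(DONE)
            return
        outbox.put(func(item))


def run_pipeline(events, funcs, depth=2):
    """Push events through funcs in order; up to 'depth' events may wait
    between any two stages, so stages work on different events at once."""
    queues = [queue.Queue(maxsize=depth) for _ in range(len(funcs) + 1)]
    threads = [
        threading.Thread(target=stage, args=(f, queues[i], queues[i + 1]))
        for i, f in enumerate(funcs)
    ]
    for t in threads:
        t.start()

    def feed():
        for ev in events:
            queues[0].put(ev)
        queues[0].put(DONE)

    # Feed from a separate thread so the caller can drain the output
    # concurrently; otherwise the bounded queues could fill up and block.
    feeder = threading.Thread(target=feed)
    feeder.start()

    results = []
    while True:
        item = queues[-1].get()
        if item is DONE:
            break
        results.append(item)

    feeder.join()
    for t in threads:
        t.join()
    return results


results = run_pipeline(range(5), [lambda x: x + 1, lambda x: x * 10])
```

With a single thread per stage and FIFO queues, output order matches input order, which gives the "event" level synchronization at job output without extra bookkeeping.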


Multi-Core Grid vs HPC

It’s not yet clear to me if DUNE really needs HPC.
? Would achieving good CPU efficiency on Grid be enough?
  • How many CPU-years does DUNE need?

- Supporting HPC will take work:
  o must greatly reduce RAM/core, support fine(ish) grain MT,
  o understand and obey special software environments (I’m told: “no ROOT”)
  o infrastructure, data ingest/egress, databases, security issues.
  o in some cases deal with “unusual” CPU architecture and OS.

+/- We would be minor player on HPC
++ HPC power could open up new algorithms

? do we make the support effort just “in case”?
