Matti Kortelainen
13 June 2019
Software Framework and HPC
Short biography
● 2013 PhD from University of Helsinki (Finland)
● 2014-2017 CERN Fellow working on CMS track reconstruction for phase 1 and phase 2 upgrades
● 2018-> FNAL Computational Physics Developer to bridge science community with computer science side of software
• CMS was the first LHC experiment to achieve a multi-threaded framework
  – In production use since 2015
• Successfully engaged the collaboration in the code modernization
• Running 8-thread jobs for over a year
  – 15-thread jobs at NERSC
• Lessons learned documented in publications
  – C D Jones et al 2014, J. Phys. Conf. Ser. 513 022034
  – C D Jones et al 2015, J. Phys. Conf. Ser. 664 072026
  – C D Jones et al 2017, J. Phys. Conf. Ser. 898 042008
Multi-threading success
2 6/13/19 Matti Kortelainen | Software Framework and HPC
256-thread Pythia+Geant4 simulation job on Intel Xeon Phi (Knights Landing)
Multi-threading transition
• In wall clock time the multithreading transition took
  – Initial R&D: 1 year
  – Framework: 1 year
  – Reconstruction code running: ½ year
  – Gradual improvement since then
• Reconstruction efficient on 8 threads: 2-3 years
• Involved a core engineering team
  – Christopher Jones, David Dagenhart, William Tanenbaum, Patrick Gartung
• Involved a few tens of domain experts and detector physicists
  – People who built the detectors and know how to reconstruct the data
The challenge (example from HLT)
• Efficiently orchestrate data flow through hundreds of modules
Multi-threaded orchestration of modules
• Framework uses Intel Threading Building Blocks (TBB)
  – C++ template library where computations are broken into tasks that can be run in parallel
• Each module is expressed as a task; build a dependence chain for the tasks and let the TBB scheduler deal with the execution
• TBB's work-stealing model and its mechanisms to prevent stealing when needed have proven to be vital for good scaling
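The "module as a task with a dependence chain" idea can be sketched without TBB. The example below is only an illustration in standard C++, not the CMS framework code and not the TBB API: `Module`, `runGraph`, and the shared ready queue are hypothetical names, and a plain queue stands in for TBB's work-stealing scheduler. It assumes the dependence graph has no cycles.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <vector>

// Hypothetical stand-in for a framework module: some work plus the
// indices of the modules that consume its output.
struct Module {
  std::function<void()> work;
  std::vector<std::size_t> dependents;
};

// Runs every module exactly once and never starts a module before all
// of its inputs have finished. Assumes the graph is acyclic.
void runGraph(const std::vector<Module>& modules, unsigned numThreads) {
  assert(numThreads > 0);

  // For each module, count how many inputs must finish first.
  std::vector<std::size_t> pending(modules.size(), 0);
  for (const Module& mod : modules)
    for (std::size_t d : mod.dependents) ++pending[d];

  // Modules without inputs are ready immediately.
  std::queue<std::size_t> ready;
  for (std::size_t i = 0; i < modules.size(); ++i)
    if (pending[i] == 0) ready.push(i);

  std::mutex mutex;
  std::condition_variable cv;
  std::size_t finished = 0;

  auto worker = [&] {
    for (;;) {
      std::size_t id;
      {
        std::unique_lock<std::mutex> lock(mutex);
        cv.wait(lock, [&] {
          return !ready.empty() || finished == modules.size();
        });
        if (ready.empty()) return;  // all modules have finished
        id = ready.front();
        ready.pop();
      }
      modules[id].work();  // run the module outside the lock
      {
        std::lock_guard<std::mutex> lock(mutex);
        ++finished;
        // Release every dependent whose last input just completed.
        for (std::size_t d : modules[id].dependents)
          if (--pending[d] == 0) ready.push(d);
      }
      cv.notify_all();
    }
  };

  std::vector<std::thread> threads;
  for (unsigned t = 0; t < numThreads; ++t) threads.emplace_back(worker);
  for (auto& t : threads) t.join();
}
```

Calling `runGraph(modules, 3)` on a diamond-shaped graph (module 0 feeds 1 and 2, which both feed 3) runs module 0 first and module 3 last, while 1 and 2 may run concurrently. A real scheduler such as TBB replaces the single locked queue with per-thread task pools and work stealing.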
• The challenge extends to efficiently orchestrating hundreds of algorithms to utilize computing resources optimally
• Fruitful collaboration with R&D projects to gain experience
• Framework gained generic ability to run external work
  – Do other work on CPU while waiting for external processing
• Also developed a mechanism for portable configuration, and patterns for using CUDA directly from modules
Supporting accelerators
[Figure: timeline of external work. CPU row: acquire(), other work, produce(). Accelerator row: GPU, FPGA, etc. Arrows between the rows: Event data, Callback.]
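The external-work pattern (hand the event data off in acquire(), do other work on the CPU, then run produce() once a callback signals completion) might look roughly like the sketch below. Only the names acquire() and produce() come from the slide; `FakeAccelerator`, `SumOfSquaresModule`, `processEvent`, and all signatures are hypothetical, and a plain thread stands in for the GPU or FPGA. This is not the CMS framework interface.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for an accelerator: runs the work on a separate
// thread and invokes the callback when the work has finished.
class FakeAccelerator {
public:
  ~FakeAccelerator() {
    for (auto& t : threads_) t.join();
  }
  void launch(std::function<void()> work, std::function<void()> onDone) {
    threads_.emplace_back(
        [work = std::move(work), onDone = std::move(onDone)] {
          work();
          onDone();
        });
  }

private:
  std::vector<std::thread> threads_;
};

// Hypothetical module split into the two phases named on the slide.
class SumOfSquaresModule {
public:
  // Phase 1: hand the event data to the accelerator and return at once.
  void acquire(const std::vector<int>& eventData, FakeAccelerator& acc,
               std::function<void()> callback) {
    assert(callback);  // a completion callback is required
    acc.launch(
        [this, eventData] {
          result_ = 0;
          for (int x : eventData) result_ += x * x;
        },
        std::move(callback));
  }
  // Phase 2: runs after the callback, when the external result is ready.
  int produce() const { return result_; }

private:
  int result_ = 0;
};

// Drives one module: acquire, other CPU work while waiting, produce.
int processEvent(const std::vector<int>& eventData, int& otherWorkDone) {
  SumOfSquaresModule module;
  std::promise<void> ready;
  std::future<void> readyFuture = ready.get_future();
  // Declared last so its destructor joins the worker thread before the
  // promise and the module are destroyed.
  FakeAccelerator acc;

  module.acquire(eventData, acc, [&ready] { ready.set_value(); });
  ++otherWorkDone;  // the CPU is free for other work here
  // A real framework would schedule produce() as a new task from the
  // callback instead of blocking a thread as this sketch does.
  readyFuture.wait();
  return module.produce();
}
```

For the input `{1, 2, 3}`, `processEvent` returns 14. The point of the split is that the thread that called acquire() is not held while the external processing runs.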
• Geant V simulation toolkit
  – Has its own threadpool, external to CMS software
  – Eventually GPU support
• Pixel local reconstruction and tracking on GPU
  – Advanced prototype within CMS framework
• Also first prototypes for ECAL and HCAL reco
  – Collaborating with NERSC within NESAP program
• Parallelized Kalman Filter tracking
  – Full tracking on GPU
• ML inference on FPGA as a Service
Examples of accelerator use cases
• We have laid the groundwork for using accelerators
• Challenge: portable algorithm code
  – Want to maintain only one version of each algorithm
• Next steps
  – Investigate OpenMP
    • I will participate in ATPESC in July-August
      – Argonne Training Program on Extreme-Scale Computing
  – Investigate technologies like Kokkos and RAJA
    • Within the CCE
Looking ahead
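The goal of maintaining only one version of each algorithm can be illustrated with a small backend-dispatch sketch. It follows the general idea behind portability layers but is not the Kokkos, RAJA, or OpenMP API: `SerialBackend`, `ThreadsBackend`, `parallelFor`, and `calibrate` are hypothetical names, and a CPU thread backend stands in for an accelerator backend.

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical execution backends, selected at compile time by tag type.
struct SerialBackend {};
struct ThreadsBackend {
  unsigned numThreads = 4;
};

// Serial implementation of the loop abstraction.
template <typename F>
void parallelFor(SerialBackend, std::size_t n, F body) {
  for (std::size_t i = 0; i < n; ++i) body(i);
}

// Threaded implementation: indices are dealt out round-robin.
template <typename F>
void parallelFor(ThreadsBackend backend, std::size_t n, F body) {
  assert(backend.numThreads > 0);
  std::vector<std::thread> workers;
  for (unsigned t = 0; t < backend.numThreads; ++t) {
    workers.emplace_back([=] {
      for (std::size_t i = t; i < n; i += backend.numThreads) body(i);
    });
  }
  for (auto& w : workers) w.join();
}

// The single version of the algorithm: written once against
// parallelFor, independent of where it runs.
template <typename Backend>
std::vector<float> calibrate(Backend backend, const std::vector<float>& raw,
                             float gain) {
  std::vector<float> out(raw.size());
  float* outPtr = out.data();
  const float* rawPtr = raw.data();
  parallelFor(backend, raw.size(),
              [=](std::size_t i) { outPtr[i] = gain * rawPtr[i]; });
  return out;
}
```

`calibrate(SerialBackend{}, raw, 2.f)` and `calibrate(ThreadsBackend{}, raw, 2.f)` give the same result from the same algorithm body. Supporting another device would mean adding a `parallelFor` overload, with no change to the algorithm.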