instead of the screen while yourstorageconference.us/2019/invited/settlemyer.slides.pdflos alamos...
TRANSCRIPT
![Page 1: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/1.jpg)
Instead of the
screen while your
Operated by Triad National Security, LLC for the U.S. Department of Energy's NNSA
![Page 2: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/2.jpg)
in Slide, you
logo/management
use one of the two
Los Alamos National Laboratory
Understanding Storage System Challenges for Parallel Scientific Simulations
Brad SettlemyerLos Alamos National Laboratory
LA-UR-19-24811
![Page 3: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/3.jpg)
Los Alamos National Laboratory
4/23/19 | 3
Outline
• Intro to Computational Science
• VPIC Overview• PIC Introduction• VPIC Scientific Workflow• VPIC I/O Workloads
• Real VPIC I/O Challenges
![Page 4: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/4.jpg)
Los Alamos National Laboratory
4/23/19 | 4
A Brief Introduction to Computational Science
![Page 5: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/5.jpg)
Los Alamos National Laboratory
4/23/19 | 5
The Traditional Scientific Method
• A method for understanding the physical world
• Begins with observation• Some parts of the physical
world are not well suited to observation• Galaxy formations/collisions• Climate models• Asteroid collisions• Fluid dynamics
![Page 6: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/6.jpg)
Los Alamos National Laboratory
4/23/19 | 6
Incorporating Simulation into The Scientific Method
• Computer-based simulation enables new scientific inquiry
• Long time-scales• Complex interactions• Dangerous interactions
• Computational Challenges
![Page 7: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/7.jpg)
Los Alamos National Laboratory
4/23/19 | 7
Incorporating Simulation into The Scientific Method
• Computer-based simulation enables new scientific inquiry
• Long time-scales• Complex interactions• Dangerous interactions
• Computational Challenges• Tightly-coupled simulations
imply bulk-synchronous I/O• A single job may require
months of compute time
![Page 8: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/8.jpg)
Los Alamos National Laboratory
4/23/19 | 8
1. Create Mesh (Computational Science Workflow)
Fixed Mesh(Valves, cylinders)
Adaptive Mesh(Turbulent combustion)
Mesh deformation(Shock propagating in fluid)
![Page 9: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/9.jpg)
Los Alamos National Laboratory
4/23/19 | 9
2. Calculate Physics (Computational Science Workflow)
• Often takes weeks or months• Figure shows particle-in-cell
(PIC) method• Many other methods• Finite Element Methods• Finite Difference Methods• Monte Carlo Methods
• The actual scientific question being answered typically favors one method or another
![Page 10: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/10.jpg)
Los Alamos National Laboratory
4/23/19 | 10
3. Generate Data (Computational Science Workflow)
• Simulation pauses when all processes reach some interesting point in the simulation• Save state to protect against a
failure (checkpoint/restart)• Save state for later analysis• Machine failures and scientific
insight occur at different frequencies L
• Once I/O is complete, simulation resumes
Compute/Clients Lustre
OSS
LustreMDS
LustreOST
PFSRouters
IOBB(Infiniband)
![Page 11: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/11.jpg)
Los Alamos National Laboratory
4/23/19 | 11
3. Generate Data (Computational Science Workflow)
• Simulation pauses when all processes reach some interesting point in the simulation• Save state to protect against a
failure (checkpoint/restart)• Save state for later analysis• Machine failures and scientific
insight occur at different frequencies L
• Once I/O is complete, simulation resumes
Compute/Clients Lustre
OSS
LustreMDS
LustreOST
PFSRouters
IOBB(Infiniband)
![Page 12: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/12.jpg)
Los Alamos National Laboratory
4/23/19 | 12
4. Analyze Data (Computational Science Workflow)
• Scientists analyze/visualize simulation output• Test and validate hypotheses• Source of new phenomena
observations!
• Automatic and in-situ analysis emerging as relevant to some scientific fields
![Page 13: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/13.jpg)
Los Alamos National Laboratory
4/23/19 | 13
What makes HPC computing unique and difficult?
• Simulation Scale• Frequently billions or trillions of mesh cells (1.5PB simulations on Trinity)• Simulations run for weeks or months
• Longest simulation on Trinity: 7 months• Longest I’ve heard of: 18 months
• Universe tends toward disorder (entropy increases)• As simulation progresses, high % of memory is frequently modified• Tight-coupling, frequent communication due to boundary condition
exchanges and load balancing over time• Large storage system requirements
• Checkpoint/restart bursts to support long running jobs• Capacity to store large quantities of restart dumps and analysis data
![Page 14: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/14.jpg)
Los Alamos National Laboratory
4/23/19 | 14
An Overview of VPIC
![Page 15: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/15.jpg)
Los Alamos National Laboratory
4/23/19 | 15
Quick Particle-In-Cell (PIC) Overview
Particles model material• Millions of particles per process• Trillions of particles per simulation
Fixed Mesh• Method extends to 3D well• Each process maintains a
contiguous chunk of the mesh• Updates fields and materials
• Solves the Maxwell-Boltzmann kinetic equations
• Applications in astrophysics, fusion, plasma interactions
Node 0
Node 1
Node2
Node3t0 t1
PIC Introduction: https://www.youtube.com/watch?v=CmhSWPpa_6w
![Page 16: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/16.jpg)
Los Alamos National Laboratory
4/23/19 | 16
Why do I/O researchers use VPIC?
• Excellent scaling• Demonstrated across 4096 Trinity nodes (32k
processes)• Flexible code
• Popular CS languages (engine is 16k sloc C/C++)• Supports MPI, OpenMP, and Pthreads• Can be field dominant or particle dominant• Can be compute/comm/memory intensive
![Page 17: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/17.jpg)
Los Alamos National Laboratory
4/23/19 | 17
VPIC’s Simulation Science WorkflowD
ata
Ret
entio
n Ti
me
Forever
Temporary
Setup/Parameterize/Create Mesh
SimulatePhysics Viz
Initial Mesh Setup
CheckpointDump
JobBegin
JobEnd
Campaign
CheckpointDump
Time-stepData Set
SampledData Set
Down-Sample
Post-Process
AnalysisData Set
SimInputDeck
Phase S1 Phase S2 Phase S3 Phase S4 Phase S5
CheckpointDump
4 – 8x per week
5 - 15x per pipeline
Time-stepData Set
5 – 10x per week
Simulation Science Pipeline
![Page 18: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/18.jpg)
Los Alamos National Laboratory
4/23/19 | 18
VPIC Checkpoint/Restart
• Essential for simulations running for long duration over thousands of nodes• Basic paradigms: N-N, N-M, N-1• Typically the largest consumer of bandwidth/capacity
• In general must store both the particles and the fields• Why?! Performance!• Approximately 80% of system memory• VPIC uses N-N file organization for checkpoint/restart
![Page 19: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/19.jpg)
Los Alamos National Laboratory
4/23/19 | 19
LANL’sTrinityComputer
Burst Buffer (3.5 PB, 2.5 TB/s)
Platform Memory (2PiB, PiB/s)
Lustre PFS (78PB, 1.1 TB/s)
Campaign/Archive
HPC Checkpoint Workload
![Page 20: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/20.jpg)
Los Alamos National Laboratory
4/23/19 | 20
VPIC Time Step Data Sets
• Types of data• Particles (32 – 48 bytes each)• Fields (typically <1k, but could be much more)• Cell Materials (often 0 bytes)
• 2 primary methods for data reduction• Sampling (mean, spatial average, etc.)• Decimation
• Scientist typically determines the processing methods needed• Frequently not well optimized• Bound on bandwidth and performed on front ends
![Page 21: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/21.jpg)
Los Alamos National Laboratory
4/23/19 | 21
VPIC Visualization
• Format the data into a parallel visualization format
• Paraview, Ensight, VisIt, etc• Visualization workflows are
typically bound on read performance
• Interactivity defeats pre-fetching algorithms
• Viewing doesn’t always occur along the contiguous dimension
![Page 22: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/22.jpg)
Los Alamos National Laboratory
4/23/19 | 22
Real VPIC I/O Challenges
![Page 23: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/23.jpg)
Los Alamos National Laboratory
4/23/19 | 23
Tracking the Trajectory of High Energy particles
Assumptions:• Simulation has trillions of
particles• Highest energy particles only
known at simulation end• Insufficient memory to track the
history of each particleGoal:• Determine if the trajectory of
the high-energy particles follows Fermi acceleration between magnetic islands
• Highly selective queries
![Page 24: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/24.jpg)
Los Alamos National Laboratory
4/23/19 | 24
Spatial distribution of particles within energy band
Assumptions• Simulation has trillions of
particles• Energy distribution changing
over timeGoals• Filter particles by energy band to
examine the spatial location of energy bands
• Scan intensive workload
Image and problem from “Parallel I/O, Analysis, and Visualization of a Trillion Particle Simulation,” Byna, et al.
![Page 25: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/25.jpg)
Los Alamos National Laboratory
4/23/19 | 25
The tip of the iceberg …
• Where are the largest clusters of similarly charged particles (i.e. magnetic islands)?
• Which particles have most recently moved between magnetic islands?
• Which particles are moving as groups and how are they moving?
• Is it possible to develop a taxonomy of formations that occur during a magnetic reconnection?
• And more …
![Page 26: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/26.jpg)
Los Alamos National Laboratory
4/23/19 | 26
Conclusions
• VPIC is an excellent resource for I/O researchers• Open source• Popular programming languages (subsets)• Doesn’t require exotic compilers• Highly scalable• Important scientific problems
• VPIC scientists have real I/O problems• A VPIC researcher has consumed all of the Trinity storage systems inodes• Extremely small writes are an unsolved problem• Data analysis performance severely limits current insight
![Page 27: Instead of the screen while yourstorageconference.us/2019/Invited/Settlemyer.slides.pdfLos Alamos National Laboratory 4/23/19 | 3 Outline • Intro to Computational Science • VPIC](https://reader033.vdocument.in/reader033/viewer/2022060916/60a91f945ce4ee40134d0636/html5/thumbnails/27.jpg)
Los Alamos National Laboratory
4/23/19 | 27
Thanks!