Parallel I/O Performance: From Events to Ensembles
Andrew Uselton, National Energy Research Scientific Computing Center, Lawrence Berkeley National Laboratory. In collaboration with: Lenny Oliker, David Skinner, Mark Howison, Nick Wright, Noel Keen, John Shalf, Karen Karavanic.
![Page 1: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/1.jpg)
Parallel I/O Performance: From Events to Ensembles
Andrew Uselton
National Energy Research Scientific Computing Center
Lawrence Berkeley National Laboratory
In collaboration with:
• Lenny Oliker
• David Skinner
• Mark Howison
• Nick Wright
• Noel Keen
• John Shalf
• Karen Karavanic
![Page 2: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/2.jpg)
Parallel I/O Evaluation and Analysis
• Explosion of sensor & simulation data makes I/O a critical component
• Petascale I/O requires new techniques: analysis, visualization, diagnosis
• Statistical methods can be revealing
• Present case studies and optimization results for:
  • MADbench – A cosmology application
  • GCRM – A climate simulation
![Page 3: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/3.jpg)
IPM-I/O is an interposition library that wraps I/O calls with tracing instructions
[Figure: diagram with labels job, input, output, IPM-I/O, trace. Job trace legend: Read I/O, Barrier, Write I/O]
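The interposition idea on this slide can be illustrated with a small sketch. It is only an analogy in Python, not IPM-I/O itself: the names `TRACE`, `traced`, and `run_job` are hypothetical, and the record layout (call name, start time, duration, byte count) is an assumption about what a trace event might hold.

```python
import os
import tempfile
import time
from functools import wraps

# Hypothetical in-memory trace: one (call name, start, duration, bytes) record per event.
TRACE = []

def traced(func):
    """Wrap an I/O function so every call appends a timing record to TRACE."""
    @wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        duration = time.perf_counter() - start
        # os.write returns a byte count; os.read returns the bytes that were read.
        nbytes = result if isinstance(result, int) else len(result)
        TRACE.append((func.__name__, start, duration, nbytes))
        return result
    return wrapper

# Interpose: the "job" calls these wrappers instead of the raw functions.
traced_write = traced(os.write)
traced_read = traced(os.read)

def run_job(payload: bytes) -> bytes:
    """A toy job: write the payload to a temporary file, then read it back."""
    fd, path = tempfile.mkstemp()
    try:
        traced_write(fd, payload)
        os.lseek(fd, 0, os.SEEK_SET)
        return traced_read(fd, len(payload))
    finally:
        os.close(fd)
        os.remove(path)
```

The job's own logic is unchanged; only the functions it calls are swapped for wrapped versions, which is the property that makes interposition attractive for tracing.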
![Page 4: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/4.jpg)
Events to Ensembles
The details of a trace can obscure as much as they reveal. And it does not scale.
[Figure: trace view; axes: task (Task 0 to Task 10,000) vs. wall clock time]
Statistical methods reveal what the trace obscures. And it does scale.
[Figure: vertical axis: count]
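Moving from events to an ensemble can be as simple as pooling the event durations from every task and counting them per bin. The sketch below assumes fixed-width bins and uses made-up durations; `duration_histogram` is a hypothetical helper, not an IPM function.

```python
from collections import Counter

def duration_histogram(durations, bin_width):
    """Count events per duration bin; bin k covers [k*bin_width, (k+1)*bin_width)."""
    counts = Counter(int(d // bin_width) for d in durations)
    return dict(sorted(counts.items()))

# Hypothetical event durations in seconds, one per I/O call, pooled over all tasks.
durations = [0.2, 0.3, 0.4, 0.6, 0.7, 3.1, 3.4, 9.8]
histogram = duration_histogram(durations, bin_width=1.0)
for k, count in histogram.items():
    print(f"bin {k}: {count} events")
```

The histogram has one entry per occupied bin regardless of how many tasks contributed events, which is why this view keeps working at task counts where a per-task trace plot becomes unreadable.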
![Page 5: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/5.jpg)
Case Study #1:
• MADCAP analyzes the Cosmic Microwave Background radiation.
• MADbench – An out-of-core matrix solver that writes and reads all of memory multiple times.
![Page 6: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/6.jpg)
CMB Data Analysis
time domain - O(10^12)
pixel sky map - O(10^8)
angular power spectrum - O(10^4)
![Page 7: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/7.jpg)
MADbench Overview
MADCAP is the maximum likelihood CMB angular power spectrum estimation code
MADbench is a lightweight version of MADCAP
Out-of-core calculation due to large size and number of pix-pix matrices
![Page 8: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/8.jpg)
Computational Structure
I. Compute, Write (Loop)
II. Compute/Communicate (no I/O)
III. Read, Compute, Write (Loop)
IV. Read, Compute/Communicate (Loop)
The compute intensity can be tuned down to emphasize I/O
[Figure: trace view; axes: task vs. wall clock time]
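The phase structure on this slide can be sketched as an out-of-core loop in which only one matrix is held in memory at a time. This is a single-process illustration under stated assumptions, not MADbench code: `compute`, `run_phases`, the scratch file names, and the doubling and summing steps are all hypothetical stand-ins for the real computation.

```python
import os
import tempfile
from array import array

def compute(i, n):
    """Stand-in for building matrix i: n doubles (hypothetical workload)."""
    return array("d", (float(i + j) for j in range(n)))

def run_phases(num_matrices=4, n=1024):
    """Follow the I/III/IV loop structure with one scratch file per matrix."""
    workdir = tempfile.mkdtemp()
    paths = [os.path.join(workdir, f"matrix_{i}.bin") for i in range(num_matrices)]
    # Phase I: compute each matrix and write it out, keeping only one in memory.
    for i, path in enumerate(paths):
        with open(path, "wb") as f:
            compute(i, n).tofile(f)
    # Phase II (compute/communicate, no I/O) is omitted from this sketch.
    # Phase III: read each matrix back, transform it, and write it again.
    for path in paths:
        m = array("d")
        with open(path, "rb") as f:
            m.fromfile(f, n)
        m = array("d", (2.0 * x for x in m))
        with open(path, "wb") as f:
            m.tofile(f)
    # Phase IV: read each matrix and reduce it; the sum stands in for the
    # compute/communicate step.
    totals = []
    for path in paths:
        m = array("d")
        with open(path, "rb") as f:
            m.fromfile(f, n)
        totals.append(sum(m))
        os.remove(path)
    os.rmdir(workdir)
    return totals
```

Shrinking the work done in `compute` relative to `n` is the analogue of tuning the compute intensity down so that the run is dominated by reads and writes.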
![Page 9: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/9.jpg)
MADbench I/O Optimization
[Figure: trace view; axes: task vs. wall clock time; annotation: Phase II. Read # 4 5 6 7 8]
![Page 10: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/10.jpg)
MADbench I/O Optimization
[Figure: axes: count vs. duration (seconds)]
![Page 11: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/11.jpg)
MADbench I/O Optimization
[Figure: axes: cumulative probability vs. duration (seconds)]
A statistical approach revealed a systematic pattern
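A cumulative probability curve of this kind can be built directly from the pooled event durations. The sketch below uses made-up read times, and `empirical_cdf` and `fraction_slower_than` are hypothetical helper names.

```python
def empirical_cdf(durations):
    """Return sorted (duration, cumulative probability) pairs."""
    ordered = sorted(durations)
    n = len(ordered)
    return [(d, (i + 1) / n) for i, d in enumerate(ordered)]

def fraction_slower_than(durations, threshold):
    """Fraction of events whose duration exceeds the threshold."""
    return sum(1 for d in durations if d > threshold) / len(durations)

durations = [0.2, 0.3, 0.4, 0.6, 0.7, 3.1, 3.4, 9.8]  # hypothetical read times (s)
cdf = empirical_cdf(durations)
slow_fraction = fraction_slower_than(durations, 1.0)
```

A long flat stretch followed by a late step in such a curve indicates a distinct population of slow events, which is easier to spot here than in a plot of individual events.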
![Page 12: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/12.jpg)
MADbench I/O Optimization
Lustre patch eliminated slow reads
[Figure: Before and After panels; axes: Process # vs. Time]
![Page 13: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/13.jpg)
Case Study #2:
• Global Cloud Resolving Model (GCRM) developed by scientists at CSU
• Runs at resolutions fine enough to simulate cloud formation and dynamics
• Mark Howison’s analysis fixed its I/O performance problems
![Page 14: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/14.jpg)
GCRM I/O Optimization
At 4km resolution GCRM is dealing with a lot of data. The goal is to work at 1km and 40k tasks, which will require 16x as much data.
[Figure: trace view; axes: task (Task 0 to Task 10,000) vs. wall clock time; marker: desired checkpoint time]
![Page 15: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/15.jpg)
GCRM I/O Optimization
Worst case 20 sec
Insight: all 10,000 tasks are doing I/O at once
![Page 16: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/16.jpg)
GCRM I/O Optimization
Worst case 3 sec
Collective buffering reduces concurrency
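A minimal sketch of why collective buffering reduces concurrency: instead of every task issuing its own write, groups of tasks hand their data to a few aggregators, and only the aggregators touch the file system. The simulation below only counts requests. The buffer size and the choice of 80 aggregators are assumptions for illustration; the slides do not state the settings used for GCRM.

```python
def direct_writes(task_buffers):
    """Every task issues its own write: one (offset, length) request per task."""
    requests, offset = [], 0
    for buf in task_buffers:
        requests.append((offset, len(buf)))
        offset += len(buf)
    return requests

def collective_writes(task_buffers, num_aggregators):
    """Gather contiguous groups of task buffers onto a few aggregators,
    each of which issues one larger write."""
    n = len(task_buffers)
    group = -(-n // num_aggregators)  # ceiling division: tasks per aggregator
    requests, offset = [], 0
    for start in range(0, n, group):
        merged = b"".join(task_buffers[start:start + group])
        requests.append((offset, len(merged)))
        offset += len(merged)
    return requests

# Hypothetical sizes: 10,000 tasks with 1 KiB each, funneled through 80 aggregators.
buffers = [bytes(1024)] * 10_000
before = direct_writes(buffers)
after = collective_writes(buffers, 80)
```

The same bytes reach the file in both cases; what changes is that the file system sees 80 large requests rather than 10,000 small ones arriving at once.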
![Page 17: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/17.jpg)
GCRM I/O Optimization
[Figure: Before and After panels; marker: desired checkpoint time]
![Page 18: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/18.jpg)
GCRM I/O Optimization
Still need better worst case behavior
Insight: Aligned I/O
Worst case 1 sec
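The slide says only "Aligned I/O". One common reading, assumed here, is that each write is arranged so it does not straddle a file-system stripe boundary. The sketch shows the arithmetic under that assumption; the 1 MiB stripe size and the helper names `split_on_stripes` and `align_up` are hypothetical.

```python
def split_on_stripes(offset, length, stripe_size):
    """Split a byte range into pieces that never cross a stripe boundary."""
    pieces = []
    end = offset + length
    while offset < end:
        boundary = (offset // stripe_size + 1) * stripe_size
        piece_end = min(boundary, end)
        pieces.append((offset, piece_end - offset))
        offset = piece_end
    return pieces

def align_up(offset, stripe_size):
    """Round an offset up to the next stripe boundary (no-op if already aligned)."""
    return -(-offset // stripe_size) * stripe_size

STRIPE = 1 << 20  # assumed 1 MiB stripe size, for illustration only

# An unaligned 300-byte write starting 100 bytes before a boundary touches two stripes.
unaligned = split_on_stripes(STRIPE - 100, 300, STRIPE)
# Starting the same write at the next boundary keeps it inside one stripe.
aligned = split_on_stripes(align_up(STRIPE - 100, STRIPE), 300, STRIPE)
```

A request that touches two stripes may involve two storage targets, so keeping requests inside stripe boundaries reduces the number of targets each write has to wait on.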
![Page 19: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/19.jpg)
GCRM I/O Optimization
[Figure: Before and After panels; marker: desired checkpoint time]
![Page 20: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/20.jpg)
GCRM I/O Optimization
Sometimes the trace view is the right way to look at it
Metadata is being serialized through task 0
![Page 21: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/21.jpg)
GCRM I/O Optimization
Defer metadata ops so there are fewer and they are larger
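Deferring metadata operations can be sketched as a small batching layer: updates are queued in memory and issued together. The class `DeferredMetadata`, its `sink` callback, and the key names are hypothetical; the sketch shows the pattern, not the mechanism used in the GCRM I/O stack.

```python
class DeferredMetadata:
    """Queue small metadata updates and flush them as one larger operation."""

    def __init__(self, sink):
        self.sink = sink      # callable that performs one metadata write
        self.pending = {}     # later updates to the same key overwrite earlier ones

    def set(self, key, value):
        self.pending[key] = value   # no I/O happens here

    def flush(self):
        if self.pending:
            self.sink(dict(self.pending))  # one operation carrying every update
            self.pending.clear()

operations = []                      # records each metadata operation issued
meta = DeferredMetadata(operations.append)
for step in range(100):
    meta.set(f"var_{step % 10}/last_step", step)
meta.flush()
```

Here 100 small updates collapse into a single operation carrying 10 entries, so a serialization point such as task 0 is visited once instead of 100 times.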
![Page 22: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/22.jpg)
GCRM I/O Optimization
[Figure: Before and After panels; marker: desired checkpoint time]
![Page 23: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/23.jpg)
Conclusions and Future Work
• Traces do not scale, can obscure underlying features
• Statistical methods scale, give useful diagnostic insights into large datasets
• Future work: gather statistical info directly in IPM
• Future work: Automatic recognition of model and moments within IPM
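Gathering statistics directly, rather than storing every event, could look like the following sketch. It uses Welford's online method to maintain the count, mean, and variance as events arrive. This is one possible approach under stated assumptions, not a description of IPM; `RunningMoments` is a hypothetical name and the durations are made up.

```python
class RunningMoments:
    """Accumulate count, mean, and variance one event at a time (Welford's method)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0   # sum of squared deviations from the running mean

    def add(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def variance(self):
        """Sample variance; zero until at least two events have been seen."""
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

stats = RunningMoments()
for duration in [0.2, 0.3, 0.4, 0.6, 0.7, 3.1, 3.4, 9.8]:  # hypothetical durations (s)
    stats.add(duration)
```

Memory use stays constant no matter how many events are observed, which is the property a tracing library needs if it is to report moments without keeping the full trace.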
![Page 24: Parallel I/O Performance: From Events to Ensembles](https://reader036.vdocument.in/reader036/viewer/2022081520/568160fa550346895dd03871/html5/thumbnails/24.jpg)
Acknowledgements
• Julian Borrill wrote MADCAP/MADbench
• Mark Howison performed the GCRM optimizations
• Noel Keen wrote the I/O extensions for IPM
• Kitrick Sheets (Cray) and Tom Wang (SUN/Oracle) assisted with the diagnosis of the Lustre bug
• This work was funded in part by the DOE Office of Advanced Scientific Computing Research (ASCR) under contract number DE-AC02-05CH11231