![Page 1: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/1.jpg)
Accelerating Lustre! with Cray DataWarp
Steve Woods, Solutions Architect
![Page 2: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/2.jpg)
Accelerate Your Storage!
● The Problem
● A new storage hierarchy
● DataWarp overview
● End user Perspectives
● Use cases
● Features
● Examples
● Configuration Considerations
● Summary
![Page 3: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/3.jpg)
The Problem
Buying Disk for Bandwidth is Expensive
HPC Wire, May 1, 2014
Attributed to Gary Grider, LANL
![Page 4: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/4.jpg)
New Storage Hierarchy
Traditional: CPU, Memory (DRAM), Storage (HDD)
Today: CPU, Near Memory (HBM/HMC), Far Memory (DRAM/NVDIMM), Near Storage (SSD), Far Storage (HDD)
[Figure: both hierarchies are split into On Node and Off Node tiers and run from highest effective cost and lowest latency at the CPU to lowest effective cost and highest latency at HDD.]
Cray Storage and Data Management - 2015
![Page 5: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/5.jpg)
New Storage Hierarchy
● DataWarp
  ● Software defined storage
  ● High performance storage pool
● Sonexion
  ● Scalable file system
  ● Resilient storage
● Problem solved!
  ● Scale bandwidth separately from capacity
  ● Reduce overall solution cost
  ● Improve application run time
[Figure: "Cray Today" hierarchy of CPU, Near Memory (HBM/HMC), Far Memory (DRAM/NVDIMM), Near Storage (SSD), and Far Storage (HDD), with check marks for "Bandwidth needed" and "Capacity needed".]
![Page 6: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/6.jpg)
Blending Flash with Disk for High Performance Lustre
Sonexion-only Solution
● Lots of SSUs for bandwidth
● Drives up the cost of bandwidth ($/GB/s)
Blended Solution
● DataWarp to satisfy the bandwidth needs
● Sonexion to satisfy the capacity needs
● Drives down the cost of bandwidth ($/GB/s)
![Page 7: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/7.jpg)
DataWarp Overview
● Software
  ● Virtualizes the underlying HW
  ● Single solution of flash & HDD
  ● Automation via policy
  ● Intuitive interface
  = Harnesses the performance
● Hardware
  ● Intel Server
  ● Block-based SSD
  ● Aries I/O blade
  = Raw performance
![Page 8: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/8.jpg)
Software Phases of DataWarp
9/12/2016 Copyright 2015 Cray Inc
● Phase 0 (available 2014)
  ● Statically configured compute node swap
  ● Single server file systems, /flash/
● Phase 1 (fall 2015) [CLE 5.2UP04 + patches]
  ● Dynamic allocation and configuration of DataWarp storage to jobs (WLM support)
  ● Application controlled explicit movement of data between DataWarp and parallel file system (stage_in and stage_out)
  ● DVS striping across DataWarp nodes
● Phase 2 (late 2016) [CLE 6.0UP02]
  ● DVS client caching
  ● Implicit movement of data between DataWarp and PFS storage (cache)
  ● No application changes required
![Page 9: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/9.jpg)
DataWarp Hardware
● Package
  ● Standard XC I/O blade
  ● SSDs instead of PCIe cables
  = Plugs right into the Aries network
● Capacity
  ● 2 nodes per blade
  ● 2 SSDs per node
  = 12.6 TB per blade (shown)
● Performance
  = Node processors are already optimized for I/O and the Cray Aries network
[Figure: blades of nodes labeled A and C, a DW blade with two SSDs per node (3.2 TB each, shown as 12.6 TB in total), and LN nodes with HCAs attached to Lustre storage.]
![Page 10: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/10.jpg)
DataWarp Software
[Figure: software stack linking the User, WLM, and Application to the Devices and the PFS: DataWarp Service, Data Virtualization Service, DWFS, Open Source FS, Logical Volume Manager. The label "File presentation" appears three times in the stack.]
● Service layer (DWS): defines the user experience
● Service layer (DVS): virtualizes I/O
● Distributed file system layer: virtualizes the pool of flash
![Page 11: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/11.jpg)
DataWarp User Perspectives
Transparent
• New user
• No change to their experience
• e.g. PFS Cache
Active
• Experienced user
• WLM script cmds
• Common for most use cases
Optimized
• Power user
• Control via Lib/CLI
• e.g. async workflow
![Page 12: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/12.jpg)
DataWarp User Perspectives
● Workload Manager Integration (WLM)
  ● Researcher/engineer inserts DataWarp commands into the job script
    ● “I need this much space in the DataWarp pool”
    ● “I need the space in DataWarp to be shared”
    ● “I need the results saved out to the Parallel File System”
● Job script requests resources via WLM
  ● DataWarp capacity
  ● Compute nodes, files, file locations
● WLM automates clean up after the application completes
WLM integration is the key: ease of use, dynamic provisioning
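The three requests quoted above map onto `#DW` directives in the job script. The following is a minimal sketch, assuming a Moab/Torque style header like the one in the job examples in this deck. The paths under `/lus/scratch/example` and the file names are hypothetical, and the fallback to a temporary directory exists only so the sketch can be run outside a DataWarp job.

```shell
#!/bin/bash
#PBS -l walltime=10:00 -joe -l nodes=2
# The directives below ask for, in order:
#   1. 200 GiB of shared (striped) DataWarp scratch space
#   2. an input file copied in from the parallel file system (hypothetical path)
#   3. the results directory copied back out to the parallel file system (hypothetical path)
#DW jobdw type=scratch access_mode=striped capacity=200GiB
#DW stage_in type=file source=/lus/scratch/example/input.dat destination=$DW_JOB_STRIPED/input.dat
#DW stage_out type=directory source=$DW_JOB_STRIPED/results destination=/lus/scratch/example/results

# Inside a DataWarp job, DW_JOB_STRIPED points at the striped mount.
# The temp-dir fallback is only there so this sketch runs elsewhere.
dw_dir="${DW_JOB_STRIPED:-$(mktemp -d)}"
mkdir -p "$dw_dir/results"

# Stand-in for the real application: write a result file into the directory
# that the stage_out directive names as its source.
echo "simulation finished" > "$dw_dir/results/status.txt"
```

The application only ever writes under the DataWarp mount; moving the results to the parallel file system and releasing the space is left to the WLM and the DataWarp service.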
![Page 13: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/13.jpg)
DataWarp User Perspectives
[Figure: DataWarp software stack with User, WLM, Application, PFS, DataWarp Service, Data Virtualization Service, DWFS, XFS, Logical Volume Manager, and Devices.]
● Supported Workload Managers
  ● SLURM
  ● Moab/Torque
  ● PBS-Pro
![Page 14: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/14.jpg)
Use Cases for DataWarp
Shared Storage (we’ll focus here)
• Reference files
• File interchange
• High performance scratch
Local Storage
• Private scratch space
• Swap space
Burst Buffer
• Checkpoint Restart
PFS Cache
• Local Cache for the PFS
• Transparent user model
![Page 15: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/15.jpg)
Use Cases for DataWarp
ISC 2016 Copyright 2016 Cray Inc.
● Reference files
  ● Read intensive
  ● Commonly used by multiple compute nodes
● DataWarp
  ● User directed behavior
  ● Automated provisioning of resources
[Figure: Cray HPC compute nodes connected to DataWarp nodes used as shared storage.]
![Page 16: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/16.jpg)
Use Cases for DataWarp
● File interchange
  ● Sharing intermediate work
● DataWarp
  ● User directed behavior
  ● Automated provisioning of resources
[Figure: Cray HPC compute nodes connected to DataWarp nodes used as shared storage.]
![Page 17: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/17.jpg)
Use Cases for DataWarp
● High performance scratch
  ● Files are striped across the pool
● DataWarp
  ● User directed behavior
  ● Automated provisioning of resources
[Figure: Cray HPC compute nodes connected to DataWarp nodes used as shared storage.]
![Page 18: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/18.jpg)
Use Cases for DataWarp
Shared Storage
• Reference files
• File interchange
• High performance scratch
Local Storage
• Private scratch space
• Swap space
Burst Buffer
• Checkpoint Restart
PFS Cache
• Local Cache for the PFS
• Transparent user model
![Page 19: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/19.jpg)
DataWarp Application Flexibility
[Figure: four configurations, each pairing Cray HPC compute nodes with DataWarp nodes.]
● Local Storage
● Shared Storage
● Burst Buffer: burst to the DataWarp nodes, trickle to Sonexion Lustre
● PFS Cache: DataWarp nodes in front of Sonexion Lustre
![Page 20: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/20.jpg)
#DW jobdw ...
● Requests a job DataWarp instance
  ● Lifetime the same as batch job
  ● Only usable by that batch job
● capacity=<size>
  ● Indirect control over server count based on granularity
  ● Might help to request more space than you need
● type=scratch
  ● Selects use of DWFS file system
● type=cache
  ● Selects use of DWCFS file system
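The link between capacity and granularity can be illustrated with a little arithmetic. This is a minimal sketch, assuming a capacity request is rounded up to a whole number of allocation granules; the function names are made up for illustration, and the 200 GiB granularity in the usage line is hypothetical, since the real value is site-specific.

```shell
#!/bin/bash
# Ceiling division: number of granules needed to cover a capacity request.
# Both arguments are whole GiB values.
dw_granules() {
    local request_gib=$1 granularity_gib=$2
    echo $(( (request_gib + granularity_gib - 1) / granularity_gib ))
}

# Capacity actually reserved once the request is rounded up to whole granules.
dw_allocated_gib() {
    local request_gib=$1 granularity_gib=$2
    echo $(( $(dw_granules "$request_gib" "$granularity_gib") * granularity_gib ))
}

# Hypothetical 200 GiB granularity applied to a 790 GiB request.
echo "granules: $(dw_granules 790 200), allocated: $(dw_allocated_gib 790 200) GiB"
# prints "granules: 4, allocated: 800 GiB"
```

If granules are spread across DataWarp servers, a larger request can reach more servers, which would explain why asking for more space than the data strictly needs might help.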
![Page 21: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/21.jpg)
#DW jobdw ... (continued)
● access_mode=striped
  ● All compute nodes see the same filesystem
  ● Files are striped across all allocated DW server nodes
  ● Files are visible to all compute nodes using the instance
  ● Aggregates both capacity and bandwidth per file
● access_mode=private
  ● Each compute node sees a different filesystem
  ● Files only go to a single DW server node
  ● A compute node always uses the same DW node, and its files are seen only by that compute node
● access_mode=striped,private
  ● Two mount points created on each compute node
  ● Share the same space
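A job that requests both access modes can send shared data and node-local temporaries to different mount points. The sketch below assumes the two mounts are exposed through `$DW_JOB_STRIPED` and `$DW_JOB_PRIVATE`, the variables used in the job scripts in this deck. The file names are hypothetical, and the temporary-directory fallbacks exist only so the sketch runs outside a DataWarp job.

```shell
#!/bin/bash
#DW jobdw type=scratch access_mode=striped,private capacity=200GiB

# In a DataWarp job these variables name the two mount points.
# The temp-dir fallbacks exist only so the sketch runs outside such a job.
shared_dir="${DW_JOB_STRIPED:-$(mktemp -d)}"
private_dir="${DW_JOB_PRIVATE:-$(mktemp -d)}"

# Data that every compute node must see goes to the striped mount.
shared_file="$shared_dir/mesh_partition.dat"
echo "shared input" > "$shared_file"

# Node-local temporaries go to the private mount.
private_file="$private_dir/solver_scratch.tmp"
echo "node-local temporary" > "$private_file"

echo "striped: $shared_file"
echo "private: $private_file"
```

Because both mount points share the same space, the capacity in the `jobdw` line has to cover the striped and private data together.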
![Page 22: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/22.jpg)
Simple DataWarp job with Moab
#!/bin/bash
#PBS -l walltime=2:00 -joe -l nodes=8
#DW jobdw type=scratch access_mode=striped capacity=790GiB
. /opt/modules/default/init/bash
module load dws
dwstat most # show DW space available and allocated
cd $PBS_O_WORKDIR
aprun -n 1 df -h $DW_JOB_STRIPED # only visible on compute nodes
IOR=/home/users/dpetesch/bin/IOR.X
aprun -n 32 -N 4 $IOR -F -t 1m -b 2g -o $DW_JOB_STRIPED/IOR_file
![Page 23: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/23.jpg)
DataWarp scratch vs. cache
● Scratch (phase 1)
#!/bin/bash
#PBS -l walltime=4:00:00 -joe -l nodes=1
#DW jobdw type=scratch access_mode=striped capacity=200GiB
cd $PBS_O_WORKDIR
# access_mode=striped provides $DW_JOB_STRIPED, so TMPDIR points there
export TMPDIR=$DW_JOB_STRIPED
NAST="/msc/nast20131/bin/nast20131 scr=yes bat=no sdir=$TMPDIR"
ccmrun ${NAST} input.dat mem=16gb mode=i8 out=dw_out

● Cache (phase 2)
#!/bin/bash
#PBS -l walltime=4:00:00 -joe -l nodes=1
#DW jobdw type=cache access_mode=striped pfs=/lus/scratch/dw_cache capacity=200GiB
cd $PBS_O_WORKDIR
export TMPDIR=$DW_JOB_STRIPED_CACHE
NAST="/msc/nast20131/bin/nast20131 scr=yes bat=no sdir=$TMPDIR"
ccmrun ${NAST} input.dat mem=16gb mode=i8 out=dw_cache_out
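The scratch and cache scripts differ mainly in which environment variable supplies the scratch directory. One way to keep a single script body for both cases is to pick whichever DataWarp mount the job was given. The helper below is a sketch: the name `pick_dw_tmpdir` and the order of preference are assumptions rather than part of DataWarp.

```shell
#!/bin/bash
# Print the directory to use for temporary files, preferring a DataWarp
# cache mount, then a striped scratch mount, then a private scratch mount,
# and finally the existing TMPDIR (or /tmp) when no DataWarp mount is set.
pick_dw_tmpdir() {
    if [ -n "${DW_JOB_STRIPED_CACHE:-}" ]; then
        echo "$DW_JOB_STRIPED_CACHE"
    elif [ -n "${DW_JOB_STRIPED:-}" ]; then
        echo "$DW_JOB_STRIPED"
    elif [ -n "${DW_JOB_PRIVATE:-}" ]; then
        echo "$DW_JOB_PRIVATE"
    else
        echo "${TMPDIR:-/tmp}"
    fi
}

TMPDIR="$(pick_dw_tmpdir)"
export TMPDIR
echo "scratch directory: $TMPDIR"
```

With this in place only the `#DW jobdw` line changes between a scratch run, a cache run, and a run without DataWarp.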
![Page 24: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/24.jpg)
DataWarp Bandwidth
The DataWarp bandwidth seen by an application depends on multiple factors:
● Transfer size of the I/O requests
● Number of active streams (files) per DataWarp server (for file-per-process I/O, equals number of processes)
● Number of DataWarp server nodes (which is related to capacity requested)
● Other activity on the DW server nodes: administrative and other user jobs. It is a shared resource.
![Page 25: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/25.jpg)
Minimize Compute Residence Time with DataWarp
[Figure: node count versus wall time for the same job run on Lustre and on DataWarp. Phases: Initial Data Load (DW Preload), Compute with Timestep Writes, Final Data Writes (DW Post Dump). Key: Compute Nodes, Compute Nodes - Idle, I/O Time Lustre, I/O Time DW, DW Nodes.]
![Page 26: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/26.jpg)
DataWarp with MSC NASTRAN
[Figure: job wall clock comparison, DataWarp versus Lustre only.]
Cray blog reference: http://www.cray.com/blog/io-accelerator-boosts-msc-nastran-simulations/
Job wall clock reduced by 2x with DataWarp
![Page 27: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/27.jpg)
Abaqus 2016 s4e, 24M elements, 2 ranks per node, 16-core 2.3 GHz Haswell, 128 GB nodes
[Figure: elapsed seconds for Standard (y-axis, 0 to 3500) at cpus=128 (4 nodes), 256 (8 nodes), 384 (12 nodes), 512 (16 nodes), 640 (20 nodes), 768 (24 nodes), 1024 (32 nodes), and 1536 (48 nodes). Series: XC40 ABI lustre, XC40 ABI DW, CS400 lustre, CS400 /tmp.]
![Page 28: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/28.jpg)
DataWarp Considerations
• Know your workload
  • Capacity requirement
  • Bandwidth requirement
  • Iteration interval
• Calculate ratio of DataWarp to spinning disk
  • % of calculated bandwidth needed by DW vs HDD
  • Is excess bandwidth needed to sync to HDD?
  • % of storage capacity needed by DW to maintain performance (capacity for multiple iterations)
• Budget
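These questions can be turned into rough numbers. The sketch below uses a simple sizing model that is an assumption here, not something taken from the slides: DataWarp has to absorb each iteration's output within the time the job can afford to spend writing, and spinning disk has to drain that output before the next iteration arrives. All workload figures are hypothetical placeholders.

```shell
#!/bin/bash
# Ceiling division on whole numbers.
ceil_div() { echo $(( ($1 + $2 - 1) / $2 )); }

# Hypothetical workload figures; substitute measurements from your own jobs.
checkpoint_gib=4096      # data written per iteration
burst_seconds=120        # time the job may spend writing each iteration
interval_seconds=3600    # time between iterations
iterations_kept=3        # iterations held in DataWarp at once

# Bandwidth DataWarp needs to absorb one burst, in GiB/s.
dw_bw=$(ceil_div "$checkpoint_gib" "$burst_seconds")
# Bandwidth spinning disk needs to drain one burst before the next, in GiB/s.
hdd_bw=$(ceil_div "$checkpoint_gib" "$interval_seconds")
# DataWarp capacity needed to hold several iterations, in GiB.
dw_capacity=$(( checkpoint_gib * iterations_kept ))

echo "DataWarp bandwidth:  ${dw_bw} GiB/s"
echo "HDD drain bandwidth: ${hdd_bw} GiB/s"
echo "DataWarp capacity:   ${dw_capacity} GiB"
```

With these placeholder figures the flash tier carries the burst bandwidth (35 GiB/s) while the disk tier only needs about 2 GiB/s. The model ignores any excess bandwidth needed to sync to HDD while the next burst is being written.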
![Page 29: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/29.jpg)
DataWarp Bottom Line
• It is about reducing “Time to Solution”
  • Returning control back to compute
• Reducing the cost of “Time to Solution”
![Page 30: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/30.jpg)
DataWarp Summary
Faster time to insight
• Easy to use
• Accelerates performance
• Dynamic, flexible
![Page 31: with Cray DataWarp Steve Woods, Solutions Architect · •Local Cache for •Checkpoint Restart the PFS •Transparent user model •Private scratch ... -accelerator-boosts-msc-nastran](https://reader030.vdocument.in/reader030/viewer/2022021802/5b8221bc7f8b9a466b8dd7be/html5/thumbnails/31.jpg)
Questions?