I/O Profiling Towards the Exascale
![Page 1: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/1.jpg)
I/O Profiling Towards the Exascale
[email protected], ZIH, Technische Universität Dresden
NEXTGenIO & SAGE: Working towards Exascale I/O
Barcelona, May 19th, 2017
![Page 2: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/2.jpg)
NEXTGenIO facts
Project
• Research & Innovation Action
• 36 month duration
• €8.1 million
Partners
• EPCC
• INTEL
• FUJITSU
• BSC
• TUD
• ALLINEA
• ECMWF
• ARCTUR
May 19th, 2017 NextGenIO/SAGE workshop
![Page 3: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/3.jpg)
Approx. 50% committed to hardware development
• Note: final configuration may differ
![Page 4: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/4.jpg)
Intel™ DIMMs are a key feature
• Non-volatile RAM
  • 3D XPoint technology
• Much larger capacity than DRAM
• Slower than DRAM
  • By a certain factor
• Significantly faster than SSDs
• 12 DIMM slots per socket
  • Combination of DDR4 and Intel™ DIMMs
![Page 5: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/5.jpg)
Three usage models
• The “memory” usage model
  • Extension of the main memory
  • Data is volatile like normal main memory
• The “storage” usage model
  • Classic persistent block device
  • Like a very fast SSD
• The “application direct” usage model
  • Maps persistent storage into address space
  • Direct CPU load/store instructions
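The "application direct" model can be approximated on any machine by memory-mapping a file and accessing it with ordinary loads and stores. The following is a minimal sketch using Python's standard `mmap` module; the helper name `store_and_load` and the temporary file are assumptions for illustration, with a regular file standing in for a file that would live on persistent memory.

```python
import mmap
import os
import tempfile


def store_and_load(path, payload, size=4096):
    """Write payload through a memory mapping, then read it back from the file."""
    # Pre-size the file so the mapping has backing storage.
    with open(path, "wb") as f:
        f.truncate(size)
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), size) as region:
            region[0:len(payload)] = payload  # plain store into the mapped region
            region.flush()                    # ask the OS to write the range back
    # Re-open the file to show the stored bytes outlive the mapping.
    with open(path, "rb") as f:
        return f.read(len(payload))


if __name__ == "__main__":
    # A temporary file stands in for a file on persistent memory (assumption).
    with tempfile.TemporaryDirectory() as tmp:
        print(store_and_load(os.path.join(tmp, "pmem_standin.bin"), b"checkpoint-0001"))
```

No `read()` or `write()` call is issued for the data itself, which is why accesses of this kind are hard to observe with tools that instrument I/O calls.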
![Page 6: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/6.jpg)
New members in memory hierarchy
• New memory technology
• Changes the memory hierarchy we have
• Impact on applications, e.g. simulations?
• I/O performance is one of the critical components for scaling up HPC applications and enabling HPDA applications at scale
[Figure: Memory & Storage Latency Gaps, comparing "HPC systems today" with "HPC systems of the future". Tiers named in the figure: CPU register, cache, DRAM, NVRAM memory, SSD storage, spinning storage disk, MAID storage disk, storage tape. Latency gaps between tiers are labelled with factors from 1x to 100,000x. Node diagrams show sockets, DIMMs, I/O, and backup.]
![Page 7: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/7.jpg)
Remote memory access on top
• Network hardware will support remote access
  • Data in NVDIMMs
  • To be shared between nodes
• Systemware
  • Support remote access
  • Data partitioning and replication
![Page 8: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/8.jpg)
[Figure: diagram of nodes with memory, connected via a network, with filesystems]
Using distributed storage
• Global file system
  • No changes to apps
• Required functionality
  • Create and tear down file systems for jobs
  • Works across nodes
  • Preload and postmove filesystems
  • Support multiple filesystems across system
• I/O Performance
  • Sum of many layers
![Page 9: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/9.jpg)
[Figure: diagram of nodes with memory, connected via a network, with a filesystem and an object store]
Using an object store
• Needs changes in apps
• Needs same functionality as global filesystem
• Removes need for POSIX functionality
• I/O Performance
  • Different type of abstraction
  • Mapping to objects
  • Different kind of instrumentation
![Page 10: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/10.jpg)
[Figure: diagram of jobs (Job1 to Job4) and a filesystem]
Towards workflows
• Resident data sets
  • Sharing preloaded data across a range of jobs
  • Data analytic workflows
  • How to control access/authorisation/security/etc.?
• Workflows
  • Producer-consumer model
  • Remove file system from intermediate stages
• I/O Performance
  • Data merging/integration?
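The producer-consumer idea, with no file system between intermediate stages, can be illustrated with two stages that exchange data through a bounded in-memory buffer. This is only a sketch under stated assumptions: the function name `run_pipeline`, the placeholder "simulation" values, and the use of threads within one process are all hypothetical stand-ins for jobs that would share node-local NVRAM.

```python
import queue
import threading


def run_pipeline(n_steps):
    """Pass producer output to a consumer stage through memory, not files."""
    handoff = queue.Queue(maxsize=4)  # bounded buffer between the two stages
    results = []

    def producer():
        for step in range(n_steps):
            handoff.put([step * 1.0, step * 2.0])  # stand-in for a simulation result
        handoff.put(None)                          # sentinel: no more data

    def consumer():
        while True:
            block = handoff.get()
            if block is None:
                break
            results.append(sum(block))             # stand-in for the analysis step

    threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(3))  # [0.0, 3.0, 6.0]
```

Because no intermediate file is opened, a tool that only watches file I/O sees nothing of this exchange, which is one reason workflow-aware instrumentation is needed.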
![Page 11: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/11.jpg)
Tools have three key objectives
• Analysis tools need to
  • Reveal performance interdependencies in I/O and memory hierarchy
  • Support workflow visualization
  • Exploit NVRAM to store data themselves
• (Workload modelling)
![Page 12: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/12.jpg)
Vampir & Score-P
June 2nd, 2017 LUG17 18
![Page 13: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/13.jpg)
How to meet the objectives?
• File I/O, NVRAM performance
  • Monitoring (data acquisition)
    • Sampling
    • Tracing
  • Statistical analysis (profiles)
  • Time series analysis
• Multiple layers
  • Simultaneously
  • Topology context
• Workflow support
  • Merge and relate performance data
  • Data sources
![Page 14: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/14.jpg)
Tapping the I/O layers
• I/O layers
  • POSIX
  • MPI-I/O
  • HDF5
  • NetCDF
  • PNetCDF
  • File system (Lustre, Adios)
• Data of interest
  • Open/Create/Close operations (meta data)
  • Data transfer operations
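To make the "data of interest" concrete, the sketch below wraps a few POSIX-style calls from Python's `os` module and records one event per operation, covering both metadata operations (open, close) and data transfers (read, write). The class name `TracedFile` and the event fields are assumptions for illustration; this is not how Score-P or Vampir instrument applications.

```python
import os
import tempfile
import time


class TracedFile:
    """Wrap POSIX-style calls and log one event per operation (illustrative only)."""

    def __init__(self, path, flags, events):
        self.path = path
        self.events = events
        self.fd = self._timed("open", os.open, path, flags, 0o600)

    def _timed(self, op, func, *args):
        start = time.perf_counter()
        result = func(*args)
        end = time.perf_counter()
        # For transfers, record the byte count actually moved.
        if op == "write":
            nbytes = result
        elif op == "read":
            nbytes = len(result)
        else:
            nbytes = 0
        self.events.append({"op": op, "file": self.path, "bytes": nbytes,
                            "start": start, "end": end})
        return result

    def write(self, data):
        return self._timed("write", os.write, self.fd, data)

    def read(self, count):
        return self._timed("read", os.read, self.fd, count)

    def close(self):
        self._timed("close", os.close, self.fd)


if __name__ == "__main__":
    events = []
    with tempfile.TemporaryDirectory() as tmp:
        f = TracedFile(os.path.join(tmp, "demo.dat"), os.O_CREAT | os.O_RDWR, events)
        f.write(b"x" * 1024)
        f.close()
    print([(e["op"], e["bytes"]) for e in events])
```

Recording the same kind of event at every layer (POSIX, MPI-I/O, HDF5, NetCDF) is what allows a single high-level call to be related to the low-level operations it triggers.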
![Page 15: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/15.jpg)
What the NVM library tells us
• Allocation and free events
• Information
  • Memory size (requested, usable)
  • High Water Mark metric
  • Size and number of elements in memory
• NVRAM health status
  • Not measurable at high frequencies
• Individual NVRAM load/stores
  • Remain out of scope (e.g. memory mapped files)
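A High Water Mark metric can be derived from a stream of allocation and free events by tracking the currently allocated size and remembering its maximum. The sketch below assumes a hypothetical event format of `(kind, allocation_id, size_in_bytes)` tuples; an actual NVM library would report these events in its own form.

```python
def high_water_mark(events):
    """Return the peak number of bytes allocated at any point in the event stream."""
    live = {}     # allocation id -> size in bytes
    current = 0   # bytes allocated right now
    peak = 0      # largest value of current seen so far
    for kind, alloc_id, size in events:
        if kind == "alloc":
            live[alloc_id] = size
            current += size
        elif kind == "free":
            current -= live.pop(alloc_id)  # size is taken from the matching alloc
        peak = max(peak, current)
    return peak


if __name__ == "__main__":
    trace = [("alloc", "a", 100), ("alloc", "b", 50), ("free", "a", 0), ("alloc", "c", 30)]
    print(high_water_mark(trace))  # 150
```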
![Page 16: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/16.jpg)
Memory Access Statistics
• Memory access hotspots for using DRAM and NVRAM?
  • Where? When? Type of memory?
• Metric collection needs to be extended
  1. DRAM local access
  2. DRAM remote access (on a different socket)
  3. NVRAM local access
  4. NVRAM remote access (on a different socket)
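The four access categories amount to a two-way classification by memory type and by whether the accessing socket matches the socket that owns the memory. A minimal sketch, assuming a hypothetical sample format of `(accessing_socket, memory_socket, memory_type)`:

```python
from collections import Counter


def classify(access_socket, memory_socket, memory_type):
    """Name the category of one access: memory type plus local or remote."""
    locality = "local" if access_socket == memory_socket else "remote"
    return f"{memory_type} {locality} access"


def access_statistics(samples):
    """Count accesses per category for an iterable of sample tuples."""
    return Counter(classify(*sample) for sample in samples)


if __name__ == "__main__":
    samples = [(0, 0, "DRAM"), (0, 1, "DRAM"), (1, 1, "NVRAM"), (0, 1, "NVRAM"), (0, 0, "DRAM")]
    for category, count in sorted(access_statistics(samples).items()):
        print(category, count)
```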
![Page 17: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/17.jpg)
Access to PMU using perf
• Architecture-independent counters
  • May introduce some overhead
  • MEM_TRANS_RETIRED.LOAD_LATENCY
  • MEM_TRANS_RETIRED.PRECISE_STORE
  • Guess: It will also work for NVRAM?
• Architecture-dependent counters
  • Counter for DRAM
    • MEM_LOAD_UOPS_L3_MISS_RETIRED.REMOTE_DRAM
    • MEM_LOAD_UOPS_L3_MISS_RETIRED.LOCAL_DRAM
    • MEM_LOAD_UOPS_*.REMOTE_NVRAM ?
    • MEM_LOAD_UOPS_*.LOCAL_NVRAM ?
![Page 18: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/18.jpg)
I/O operations over time
Individual I/O operation
I/O runtime contribution
![Page 19: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/19.jpg)
I/O data rate over time
I/O data rate of a single thread
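A data rate over time can be derived from traced transfer events by summing the bytes that fall into fixed time bins. The sketch below assumes events given as hypothetical `(timestamp_seconds, nbytes)` pairs and attributes each transfer entirely to the bin of its timestamp, which is a simplification.

```python
def data_rate_series(events, bin_width):
    """Return bytes per second for consecutive time bins of bin_width seconds."""
    if not events:
        return []
    n_bins = int(max(t for t, _ in events) // bin_width) + 1
    totals = [0] * n_bins
    for timestamp, nbytes in events:
        totals[int(timestamp // bin_width)] += nbytes  # bin chosen by timestamp
    return [total / bin_width for total in totals]


if __name__ == "__main__":
    trace = [(0.1, 1000), (0.4, 3000), (1.2, 2000), (2.9, 500)]
    print(data_rate_series(trace, 1.0))  # [4000.0, 2000.0, 500.0]
```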
![Page 20: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/20.jpg)
I/O summaries with totals
Other metrics:
• IOPS
• I/O time
• I/O size
• I/O bandwidth
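These summary metrics can all be computed from the same list of traced operations. The sketch below uses one plausible set of definitions, which are assumptions and may differ from what a given tool reports: IOPS as operations per second of wall time, I/O time as the sum of operation durations, and bandwidth as bytes moved per second of I/O time.

```python
def io_summary(events, wall_time):
    """Summarise I/O events given as (duration_seconds, nbytes) pairs.

    wall_time is the observed run time in seconds (must be > 0).
    """
    n_ops = len(events)
    io_time = sum(duration for duration, _ in events)
    io_size = sum(nbytes for _, nbytes in events)
    return {
        "iops": n_ops / wall_time,
        "io_time": io_time,
        "io_size": io_size,
        "bandwidth": io_size / io_time if io_time > 0 else 0.0,
    }


if __name__ == "__main__":
    print(io_summary([(0.5, 1000), (0.25, 3000), (0.25, 4000)], wall_time=2.0))
```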
![Page 21: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/21.jpg)
I/O summaries per file
![Page 22: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/22.jpg)
I/O operations per file
Focus on specific resource
Show all resources
![Page 23: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/23.jpg)
Taken from my daily work...
• Bringing the system I/O down
  • with a single (serial) application
  • Higher I/O demand than IOR benchmark
  • Why?
![Page 24: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/24.jpg)
Coarse-grained time series reveal some clues, but...
![Page 25: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/25.jpg)
Details make a difference
A single NetCDF get_vara_float triggers...
...15 (!) POSIX read operations
![Page 26: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/26.jpg)
Approaching the real cause
A single NetCDF get_vara_float triggers...
...15 (!) POSIX read operations
Even worse: NetCDF reads 136 kb to provide just 2 kb
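The figures on this slide translate directly into a read amplification factor, i.e. how much more data is fetched than the application asked for. A small worked example using only the numbers from the slide (the function name is an assumption for illustration):

```python
def read_amplification(units_read, units_delivered):
    """Ratio of data read from storage to data handed to the application."""
    return units_read / units_delivered


if __name__ == "__main__":
    # 136 units read to deliver 2 units, in the unit given on the slide.
    print(read_amplification(136, 2))  # 68.0
```

So each get_vara_float call costs 15 POSIX reads and moves 68 times the requested volume, which only becomes visible when the NetCDF and POSIX layers are traced together.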
![Page 27: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/27.jpg)
Before and after…
![Page 28: I/O Profiling Towards the Exascale](https://reader031.vdocument.in/reader031/viewer/2022020622/61ee5504d0aa744c705383b0/html5/thumbnails/28.jpg)
Summary
• NEXTGenIO developing a full hardware and software solution
• Performance focus
  • Consider complete I/O stack
  • Incorporate new I/O paradigms
  • Study implications of NVRAM
• Reduce I/O costs
• New usage models for HPC and HPDA