persistent memory: the value to hpc and the challenges · 3 optimized system interconnect reach...
TRANSCRIPT
![Page 1: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/1.jpg)
DCG Data Center Group
Persistent Memory:The Value to HPC and the Challenges
November 12, 2017
Andy Rudoff
Principal Engineer, NVM Software
Intel Corporation
![Page 2: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/2.jpg)
DCG Data Center Group 2
New Type of Memory
• Persistent, Large Capacity & Byte Addressable• 6 TB per two-socket system
• DDR4 Socket Compatible • Can Co-exist with Conventional DDR4 DRAM DIMMs
• Demonstrated at SAP Sapphire and Oracle Open World 2017
• Cheaper than DRAM
• Availability • 2018
Intel Persistent Memory
![Page 3: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/3.jpg)
3
Optimized System Interconnect
Reach full potential of 3D XPoint™ Technology by connecting it as Memory
NVM Express* (NVMe)
Sources: “Storage as Fast as the rest of the system” 2016 IEEE 8th International Memory Workshop and measurement, Intel® Optane™ SSD measurements and Intel P3700 measurements, and technology projections
(4k)
![Page 4: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/4.jpg)
DCG Data Center Group 4
Definition of Persistent Memory
Byte-addressable§ As far as the programmer is concernedLoad/Store access§ Not demand-pagedMemory-like performance§ Would reasonably stall a CPU load waiting for pmemProbably DMA-able§ Including RDMA
For modeling, think: Battery-backed DRAM
![Page 5: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/5.jpg)
DCG Data Center Group 5
The Value of Persistent Memory
Data sets addressable with no DRAM footprint§ At least, up to application if data copied to DRAM
Typically DMA (and RDMA) to pmem works as expected§ RDMA directly to persistence – no buffer copy required!
The “Warm Cache” effect
§ No time spend loading up memoryByte addressableDirect user-mode access
§ No kernel code in data path
![Page 6: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/6.jpg)
DCG Data Center Group 66
Programming Models
![Page 7: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/7.jpg)
DCG Data Center Group 7
Programming Model:At least four meanings…
1. Interface between HW and SW
2. Instruction Set Architecture (ISA)
3. Exposed to Applications (by the OS)
4. The Programmer Experience
![Page 8: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/8.jpg)
DCG Data Center Group 8
Programming Model:At least four meanings…
1. Interface between HW and SW
2. Instruction Set Architecture (ISA)
3. Exposed to Applications (by the OS)
4. The Programmer Experience
![Page 9: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/9.jpg)
DCG Data Center Group 99
Core
L1 L1
L2
L3
Core
L1 L1
L2
NVDIMMNVDIMM
Memory Subsystem
…
Memory bus
Programming Model (meaning 1):HW to SW Interface
![Page 10: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/10.jpg)
DCG Data Center Group 1010
Core
L1 L1
L2
L3
Core
L1 L1
L2
NVDIMMNVDIMM
Memory Subsystem
…
Memory bus
Result:Persistent Memory hardware
accessed like memory(cache coherent).
Described by ACPI on x86.
Programming Model (meaning 1):HW to SW Interface
![Page 11: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/11.jpg)
DCG Data Center Group 11
NFIT(NVDIMM Firmware Interface Table)
First introduced in ACPI 6.0§ Describes the NVDIMMs in a system§ Separates NVDIMM memory from system main memory§ Describes persistency model§ Describes interleaving
§ Lots of interesting considerations here§ Who should see this (which layers of SW?)
§ Continuing to evolve to cover more use casesNFIT is created by the BIOS§ BIOS must understand each supported NVDIMM type
![Page 12: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/12.jpg)
DCG Data Center Group 12
JEDEC NVDIMM Types
Source: http://www.snia.org/sites/default/files/SSSI/NVDIMM%20-%20Changes%20are%20Here%20So%20What's%20Next%20-%20final.pdfAuthor: Arthur Sainio
![Page 13: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/13.jpg)
DCG Data Center Group 13
Programming Model:At least four meanings…
1. Interface between HW and SW
2. Instruction Set Architecture (ISA)
3. Exposed to Applications (by the OS)
4. The Programmer Experience
![Page 14: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/14.jpg)
DCG Data Center Group 1414
Core
L1 L1
L2
L3
Core
L1 L1
L2
NVDIMMNVDIMM
Memory Subsystem
…
persistentdomain
MOV
CLWB
Programming Model (meaning 2):Instruction Set Architecture (ISA)
![Page 15: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/15.jpg)
DCG Data Center Group 1515
Core
L1 L1
L2
L3
Core
L1 L1
L2
NVDIMMNVDIMM
Memory Subsystem
…
persistentdomain
MOV
CLWB
Result:Stores flushed from CPU cache,
globally-visible è Persistent (on x86)
Programming Model (meaning 2):Instruction Set Architecture (ISA)
![Page 16: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/16.jpg)
DCG Data Center Group 16
The Data Path
Core
L1 L1
L2
L3
Core
L1 L1
L2
Core
L1 L1
L2
Core
L1 L1
L2
NVDIMMNVDIMM
NVDIMMNVDIMM
Memory Controller Memory Controller
MOV
![Page 17: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/17.jpg)
DCG Data Center Group 17
Hiding Places
Core
L1 L1
L2
L3
Core
L1 L1
L2
Core
L1 L1
L2
Core
L1 L1
L2
NVDIMMNVDIMM
NVDIMMNVDIMM
Memory Controller Memory Controller
MOV
![Page 18: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/18.jpg)
DCG Data Center Group 18
How the Hardware Works
WPQ
ADR-or-
WPQ Flush (kernel only)
Core
L1 L1
L2
L3
WPQ
MOV
DIMM
CPU
CA
CHES
CLWB + fence-or-
CLFLUSHOPT + fence-or-
CLFLUSH-or-
NT stores + fence-or-
WBINVD (kernel only)
Minimum RequiredPower fail protected domain:
Memory subsystem
CustomPower fail protected domainindicated by ACPI property:
CPU Cache Hierarchy
![Page 19: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/19.jpg)
DCG Data Center Group 19
Visibility versus Power Fail AtomicityFeature Atomicity
Atomic Store8 byte powerfail atomicity
Much larger visibility atomicity
TSXProgrammer must comprehend XABORT,
cache flush can abort
LOCK CMPXCHGnon-blocking algorithms depend on CAS,
but CAS doesn’t include flush to persistence
• Software must implement all atomicity beyond 8-bytes for pmem• Transactions are fully up to software
![Page 20: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/20.jpg)
DCG Data Center Group 20
Programming Model:At least four meanings…
1. Interface between HW and SW
2. Instruction Set Architecture (ISA)
3. Exposed to Applications (by the OS)
4. The Programmer Experience
![Page 21: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/21.jpg)
DCG Data Center Group 21
The Storage Stack (50,000ft view…)
21
UserSpace
KernelSpace
StandardFile API
Driver
Application
File System
Application
StandardRaw Device
Access
Management Library
Management UI
Storage
![Page 22: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/22.jpg)
DCG Data Center Group 22
A Programmer’s View(not just C programmers!)
fd = open(“/my/file”, O_RDWR);
…
count = read(fd, buf, bufsize);
…
count = write(fd, buf, bufsize);
…
close(fd);
“Buffer-Based”
![Page 23: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/23.jpg)
DCG Data Center Group 23
A Programmer’s View (mapped files)
fd = open(“/my/file”, O_RDWR);
…
base = mmap(NULL, filesize, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
close(fd);
…
base[100] = ‘X’;
strcpy(base, “hello there”);
*structp = *base_structp;
… “Load/Store”
![Page 24: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/24.jpg)
DCG Data Center Group 24
Memory-Mapped Files
What are memory-mapped files really?§ Direct access to the page cache
§ Storage only supports block access (paging)
With load/store access, when does I/O happen?§ Read faults/Write faults
§ Flush to persistence
Not that commonly used or understood§ Quite powerful
§ Sometimes used without realizing it
24
![Page 25: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/25.jpg)
DCG Data Center Group 25
OS Paging
UserSpace
KernelSpace
Application ApplicationApplication
NVDIMMNVDIMM
DRAM
… …
load/storeaccess
page faultaccess
![Page 26: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/26.jpg)
DCG Data Center Group 2626
NVDIMM
UserSpace
KernelSpace
StandardFile API
NVDIMM Driver
Application
File System
ApplicationApplication
StandardRaw Device
Access
Load/Store
Management Library
Management UI
StandardFile API
pmem-AwareFile System
MMUMappings
Programming Model (meaning 3):Exposing to Applications
“DAX”
![Page 27: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/27.jpg)
DCG Data Center Group 2727
The SNIA NVM Programming Model
NVDIMMs
UserSpace
KernelSpace
StandardFile API
NVDIMM Driver
Application
File System
ApplicationApplication
StandardRaw Device
Access
Storage File Memory
Load/Store
Management Library
Management UI
StandardFile API
Mgmt.
pmem-AwareFile System
MMUMappings
![Page 28: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/28.jpg)
DCG Data Center Group 28
Defining The Programming Model
40+ Member Companies
http://snia.org/sites/default/files/NVMProgrammingModel_v1.pdf
SNIA Technical Working GroupDefined 4 programming modes required by developers
Spec 1.0 developed, approved by SNIA voting members and published
Interfaces for PM-aware file system accessing
kernel PM support
interfaces for application accessing a PM-aware file
system
Kernel support for block NVM
extensions
Interfaces for legacy applications to access block NVM extensions
![Page 29: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/29.jpg)
DCG Data Center Group 29
Programming Model:At least four meanings…
1. Interface between HW and SW
2. Instruction Set Architecture (ISA)
3. Exposed to Applications (by the OS)
4. The Programmer Experience
![Page 30: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/30.jpg)
DCG Data Center Group 3030
NVDIMM
KernelSpace
Application
Load/StoreStandardFile API
PM-AwareFile System
MMUMappings
Result:Safer, less error-prone,
idiomatic in common languages
Language Runtime
Libraries
Tools
Programming Model (meaning 4):The Programmer Experience
![Page 31: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/31.jpg)
DCG Data Center Group 31
Java PersistentSortedMap
31
PersistentSortedMap employees = new PersistentSortedMap();
…employees.put(id, data);
See “pilot” project at: https://github.com/pmem/pcj
No flush calls.Transactional.
Java library handles it all.
![Page 32: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/32.jpg)
DCG Data Center Group 32
What Lies Between:
Memory-Mapped Files
High-Level Language Support
32
Standard Flush
msync()
FlushViewOfFile()FlushFileBuffers()
PersistentSortedMap
employees.put(id, data);
![Page 33: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/33.jpg)
DCG Data Center Group 33
NVM Libraries: pmem.ioC/C++ on Linux and Windows
33
NVDIMM
UserSpace
KernelSpace
Application
Load/StoreStandardFile API
PM-AwareFile System
MMUMappings
NVM Libraries
• Open Source• http://pmem.io
• libpmem• libpmemobj• libpmemblk• libpmemlog• libvmem
Transactional
![Page 34: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/34.jpg)
DCG Data Center Group 34
Summary: What the Basic Model Provides(and what it does not provide)
• Raw user space access to a mapped area of persistent memory• “Your terabytes start at address X, have fun!”
• Hard problems programmers have to solve:• Allocation: something like malloc/free, new/delete• Transactions: prevent torn updates• Type safety:
• Disallow pointing from pmem to DRAM• Remember type of thing in pmem
• Position independence: allow mapping address to change• Memory error handling• Simpler, less error-prone programming
• With tools to help find errors
![Page 35: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/35.jpg)
DCG Data Center Group 3535
Persistent Memory and HPC
![Page 36: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/36.jpg)
DCG Data Center Group 36
Volatile Usages
Use pmem for its capacity, not for its persistence
§ pmem will be larger than DRAM
§ pmem will be cheaper than DRAM
§ But…
§ pmem will be slower than DRAM
HPC applications can manage data placement
§ “Hot” data structures in DRAM
§ Others in pmem (still faster than paging from storage)
![Page 37: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/37.jpg)
DCG Data Center Group 37
libmemkind
https://github.com/memkind/memkind
§ Provides familiar “malloc” interface
§ For different “kinds” of memory§ Includes NUMA nodes, HBM, pmem
§ No need to track type, the “free” interface figures it out
![Page 38: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/38.jpg)
DCG Data Center Group 38
Persistent Usages
Synchronous persistence§ Changes are persisted as they are made (transactionally)§ Best recovery semantics, worst performance overheadEventually persistent§ More complex programming (“sync points”)§ Better performanceSnapshots§ Really more of a block/sequential use case§ Holds state locally while sending it to a server§ Performs well
![Page 39: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/39.jpg)
DCG Data Center Group 39
libpmemcto: “close-to-open” Consistency
Behaves like malloc/free at run time
§ No need for transactions
§ No need for flushing
§ Normal pointers
Clean shutdown saves heap state
Open fails unless shutdown was clean
§ Or if can’t map at same place
Optimizes for most common case
![Page 40: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/40.jpg)
DCG Data Center Group 4040
Examining Challenges of Load/Store
![Page 41: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/41.jpg)
DCG Data Center Group 41
Application Memory Allocation
UserSpace
KernelSpace
Application
RAM
• Well-worn interface, around for decades• Memory is gone when application exits
– Or machine goes down
RAM
RAM RAM
MemoryManagement
ptr = malloc(len)
![Page 42: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/42.jpg)
DCG Data Center Group 42
Application NVM Allocation
UserSpace
KernelSpace
Application
NVM
• Simple, familiar interface, but then what?– Persistent, so apps want to “attach” to regions– Need to manage permissions for regions– Need to resize, remove, …, backup the data
NVM
NVM NVM
MemoryManagement
ptr = pm_malloc(len)
![Page 43: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/43.jpg)
DCG Data Center Group 43
Visibility versus PersistentIt has always been thus:
§ open()
§ mmap()
§ store...
§ msync()
pmem just follows this decades-old model
§ But the stores are cached in a different spot
43
visible
persistent
![Page 44: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/44.jpg)
DCG Data Center Group 44
pmem Programming Challenges:Flushing, Atomicity, Transactions
NVDIMM
UserSpace
KernelSpace
Application
Load/StoreStandardFile API
pmem-AwareFile System
MMUMappings
No Page Cache to FlushCPU Caches still need flushing
msync()FlushViewOfFile()FlushFileBuffers()Work as expected
pmem_persist()User-Mode Instructions
![Page 45: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/45.jpg)
DCG Data Center Group 45
POSIX Load/Store Persistence
45
open(…);mmap(…);
strcpy(pmem, "andy");
msync(pmem, 5, MS_SYNC);
![Page 46: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/46.jpg)
DCG Data Center Group 46
Optimized Flush (only use when safe!)
46
open(…);mmap(…);
strcpy(pmem, "andy");
msync(pmem, 5, MS_SYNC);pmem_persist(pmem, 5);
![Page 47: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/47.jpg)
DCG Data Center Group 47
libpmem Load/Store Persistence
47
open(…);mmap(…);
strcpy(pmem, "andy");
msync(pmem, 5, MS_SYNC);pmem_persist(pmem, 5);
Application
libpmem
memory-mapped pmem
![Page 48: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/48.jpg)
DCG Data Center Group 48
Crossing the 8-byte Store
48
open(…);mmap(…);
strcpy(pmem, "andy rudoff");
pmem_persist(pmem, 12); *crash*
1. "\0\0\0\0\0\0\0\0\0…"2. "andy\0\0\0\0\0\0…"3. "andy rud\0\0\0\0\0…"4. "\0\0\0\0\0\0\0\0off\0\0..."5. "andy rudoff\0"
Which Result?
![Page 49: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/49.jpg)
DCG Data Center Group 49
Position DependenceCan pointers work across sessions?
Will a pmem file be mapped at the same address every time?
text
data
BSS
heap
memory
mapped
stack
kernel
persistent
memory
element
element
random
![Page 50: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/50.jpg)
DCG Data Center Group 50
What is libpmemobj?
General purpose pmem transactionspmem-aware memory allocatorSome common operations made atomicApplication doesn’t worry about HW details§ Library handles CPU cache flushing
§ Library comprehends size of atomic stores
Lots of examples in the pmem.io source treeThis is probably the library you want
50
![Page 51: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/51.jpg)
DCG Data Center Group 51
libpmemobj Data Path
51
NVDIMM
Application
Load/StoreStandardFile API
pmem-AwareFile System
MMUMappings
obj
r Application uses libpmemobj API
r Transactions entirely memory-centric user space code
r Much faster, but…r App had to change
pmem
![Page 52: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/52.jpg)
DCG Data Center Group 52
Libpmemobj Replication: Application Transparent(except for performance overhead)
52
NVDIMM
Application
Load/StoreStandardFile API
pmem-AwareFile System
MMUMappings
objpmem
NVDIMM
pmem-AwareFile System
Remote MachineSame APIwith and without
replication
![Page 53: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/53.jpg)
DCG Data Center Group 53
C Programming with libpmemobj
![Page 54: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/54.jpg)
DCG Data Center Group 54
Transaction Syntax
54
TX_BEGIN(Pop) {/* the actual transaction code goes here... */
} TX_ONCOMMIT {/** optional − executed only if the above block* successfully completes*/
} TX_ONABORT {/** optional − executed if starting the transaction fails* or if transaction is aborted by an error or a call to* pmemobj_tx_abort()*/
} TX_FINALLY {/** optional − if exists, it is executed after* TX_ONCOMMIT or TX_ONABORT block*/
} TX_END /* mandatory */
![Page 55: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/55.jpg)
DCG Data Center Group 55
Properties of Transactions
55
TX_BEGIN_PARAM(Pop, TX_PARAM_MUTEX, &D_RW(ep)->mtx, TX_PARAM_NONE) { TX_ADD(ep);D_RW(ep)->count++;
} TX_END
PowerfailAtomicity
Multi-ThreadAtomicity
Caller mustinstrument codefor undo logging
![Page 56: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/56.jpg)
DCG Data Center Group 56
Persistent Memory Locks
§ Want locks to live near the data they protect (i.e. inside structs)
§ Does the state of locks get stored persistently?§ Would have to flush to persistence when used§ Would have to recover locked locks on start-up
– Might be a different program accessing the file
§ Would run at pmem speeds
§ PMEMmutex§ Runs at DRAM speeds§ Automatically initialized on pool open
![Page 57: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/57.jpg)
DCG Data Center Group 57
C++ Programming with libpmemobj
![Page 58: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/58.jpg)
DCG Data Center Group 58
libpmemobj
Application
libpmem
Load/Store
memory-mapped pmem
libpmemobj
transactionsatomic operations
allocatorlistslocks
API API API
![Page 59: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/59.jpg)
DCG Data Center Group 59
C++ Queue Example
Application
pmem pool “myfile”
root object:class pmem_queue
struct pmem_entry
![Page 60: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/60.jpg)
DCG Data Center Group 60
C++ Queue Example: Declarations
/* entry in the queue */struct pmem_entry {
persistent_ptr<pmem_entry> next;p<uint64_t> value;
};
persistent_ptr<T>Pointer is really a position-independentObject ID in pmem.Gets rid of need to use C macros like D_RW()
p<T>Field is pmem-resident and needs to bemaintained persistently.Gets rid of need to use C macros like TX_ADD()
![Page 61: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/61.jpg)
DCG Data Center Group 61
C++ Queue Example: Transactionvoid push(pool_base &pop, uint64_t value) {
transaction::exec_tx(pop, [&] {auto n = make_persistent<pmem_entry>();
n->value = value;n->next = nullptr;if (head == nullptr) {
head = tail = n;} else {
tail->next = n;tail = n;
}});
}
Transactional(including allocations & frees)
![Page 62: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/62.jpg)
DCG Data Center Group 6262
Summary
![Page 63: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/63.jpg)
DCG Data Center Group 63
OS Detection of NVDIMMs ACPI 6.0+
OS Exposes pmem to apps
DAX provides SNIA Programming ModelFully supported:
• Linux (ext4, XFS)• Windows (NTFS)
OS Supports Optimized Flush
Specified, but evolving (ask when safe)• Linux: unsafe except Device DAX
• MAP_SYNC fixes this• (safe in NOVA)
• Windows: safe
Remote Flush Proposals under discussion(works today with extra round trip)
Deep Flush In latest specification (SNIA NVMP and ACPI)
Transactions, AllocatorsBuilt on above via libraries and languages:
• http://pmem.ioMuch more language support to do
Virtualization All VMMs planning to support PM in guest(KVM changes upstream, Xen coming, others too…)
63
State of Ecosystem Today
![Page 64: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/64.jpg)
DCG Data Center Group 64
Summary
Persistent Memory§ Emerging technology, game changing, large capacity§ New programming models allow greater leverageNVM Libraries§ http://pmem.io§ Convenience, not a requirement§ Transactions, memory allocation, language support§ More comingWe don’t know all the answers yet§ The next few years are going to be pretty exciting!
![Page 65: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/65.jpg)
![Page 66: Persistent Memory: The Value to HPC and the Challenges · 3 Optimized System Interconnect Reach full potential of 3D XPoint™ Technology by connecting it as Memory NVM Express* (NVMe)](https://reader035.vdocument.in/reader035/viewer/2022062922/5f074add7e708231d41c43af/html5/thumbnails/66.jpg)
DCG Data Center Group 66
References
§ SNIA NVM Programming Model§ http://www.snia.org/forums/sssi/nvmp
§ Open Source NVM Libraries§ http://pmem.io