achieving dual benefits of pcm by reducing persistence overheads
TRANSCRIPT
![Page 1: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/1.jpg)
pMem- Achieving Dual Benefits of PCM/NVM by Reducing Persistence Overheads
Sudarsun Kannan, Ada Gavrilovska, Karsten Schwan
![Page 2: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/2.jpg)
Motivation
Growing number of end client appsE.g., Webstore -33 million users. ~1 Million apps
Lots of Data-intensive applications Picasa, Digikam, Facebook, Face/Voice recognition etc.
Increasing number of cores and multi-threaded applications
Effective memory capacity + persistent storage bottlenecks - MDRAM has limited scalability- External flash ~4- 16 MB/Sec (FAST' 11, Kim et al.)
![Page 3: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/3.jpg)
Motivation - Memory Capacity
![Page 4: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/4.jpg)
MotivationMotivation - Memory Capacity
![Page 5: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/5.jpg)
Byte addressable storage?NVM technologies like PCM
Byte addressable and persistent 2X-4X higher density compared to DRAMs 100X faster compared to SSDsLess power due to absence of refreshByte addressability - (Can be connected across memory bus and accessed with load/stores)
Limitations:Hight write latencies compared to DRAMS
(4X - 10X slower around a microsec)Limited endurance (approx. 10^8 writes/cell)Limited bandwidth: interface and device bottlenecks
![Page 6: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/6.jpg)
Prior Work: DRAM as Cache
Processor Cache
PCM Volatile DRAM Page Cache
APP
Good for high end server
![Page 7: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/7.jpg)
Prior Work: Fast Non Volatile Heap
Processor Cache
PCM Persistent
Heap
APP
High Persistence Guarantees:● Frequent cache flushing, memory fencing, writes to
PCM● High persistent management overhead
○ (user + kernel layers)
![Page 8: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/8.jpg)
Proposed: Capacity + Persistence
Processor cache plays crucial role in reducing write latency
![Page 9: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/9.jpg)
● Advantages○ Dual benefits: Capacity + fast persistence
● Key Idea○ Use PCM as NUMA node○ PCM 'Node' partitioned to volatile + persistent heap ○ Applications are provided with suitable interfaces
■ Application control persistent/non persistent data○ Throw/ stay way from traditional I/O calls
■ Goal: Reduce software interaction (includes OS)
Proposed: Dual Use using pMem
![Page 10: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/10.jpg)
Proposed: Dual Use using pMem
User level NVM Library
NVM Volatile Manager
-Similar to DRAM manager -Almost no application state maintenance
NVM Persistent Manager
APP1 APP2
Per process state across session
Kernel Layer
![Page 11: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/11.jpg)
User NVM Library
NVM Volatile Manager NVM Persistent Manager
APP1 APP2
npmalloc()
NUMA NODE
Proposed: Dual Use using pMem
![Page 12: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/12.jpg)
User NVM Library
NVM Volatile Manager NVM Persistent Manager
APP1 APP2
pmalloc(ID, size)
NUMA NODE
Proposed: Dual Use using pMem
![Page 13: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/13.jpg)
Impact on volatile applications
pMem System Structure
![Page 14: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/14.jpg)
pMem Experimental Results
Experimental Method:
● DRAM as NVM with a NUMA node as PCM● Persistence across sessions avoiding OS to reclaim
pages● Accounting for NVM read/writes using PIN based
instrumentation● Hardware counters to understand cache misses● Also architectural simulations (MACSim)
![Page 15: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/15.jpg)
Experimental Results
Experimental Use cases
Scalability: Linux Scalability benchmark for paging/allocation
Memory Capacity: Face recognition, Compression, Crime
Persistence: Machine learning application to load user
preferences during browser page time
![Page 16: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/16.jpg)
pMem Paging Performance
![Page 17: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/17.jpg)
pMem Memory UsagePerformance 4%-6% overhead
![Page 18: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/18.jpg)
pMem Persistent Storage
45% Improved I/O compared to SSDWith increasing data, cost of persistence increases~62% improvement in persistent hashtables
![Page 19: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/19.jpg)
Summary
● Volatile-Persistent heap partitioning ● Idea: Use PCM as persistent NUMA node● Upto 91% memory capacity benefits● ~45% faster I/O for end client apps.● Less that 6%-7% runtime overhead on some apps
But PCM/NVMs are theoretically 100x faster :-)
![Page 20: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/20.jpg)
Persistence Overheads
Processor Cache
PCM Persistent Heap
APPPCM volatile Heap
APP
Persistence requires constant barrier, cache line flushing
Is sharing cache a problem?
![Page 21: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/21.jpg)
Effects of Persistence
Persistent Application: Hashtable with 1M Operations (puts and gets)Intel Atom : Dual core, 1MB LLC, (8 way, Write Back, Shared LLC)Persistent and volatile applications pinned to their cores
![Page 22: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/22.jpg)
AddHash_Entry() {//Fence and Flush log (in PCM). BEGINTRANS((void *)table,0); ++(table->entrycount);
//Fence and flushe = nvalloc(sizeof(struct entry));
//Fence and flushBEGINTRANS((void *)e,0); e->h = hash(h,k); e->k = k; e->v = v; table->table[index] = e;//Fence and flush COMMIT((void *)e, (void *)table, 0);
Effects of Persistence
![Page 23: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/23.jpg)
AddHash_Entry() {//Fence and Flush log (in PCM). BEGINTRANS((void *)table,0); ++(table->entrycount);
//Fence and flushe = nvalloc(sizeof(struct entry));
//Fence and flushBEGINTRANS((void *)e,0); e->h = hash(h,k); e->k = k; e->v = v; table->table[index] = e;//Fence and flush COMMIT((void *)e, (void *)table, 0);
Effects of Persistence
Transactional overhead
Transactional overhead
Allocator overhead
![Page 24: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/24.jpg)
Cost of Persistence
● User level Overheads○ Allocator metadata maintenance○ Restart/ Recovery Swizzling
● Transactional (Durability) Overheads ○ Logging○ Substantial code changes○
● Kernel level Overheads○ Kernel metadata maintenance○ Kernel metadata swizzling
![Page 25: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/25.jpg)
Allocator Overhead
Problem: Complex allocator metadata in PCM, High random writes, High Cache miss rate
![Page 26: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/26.jpg)
Proposed AllocationComplex allocator state in DRAM 2 level allocator
Metadata log in PCMApp1
MMAP log(sequential)
C1 C2 C3 ..... ....
Chunk log - sequential
Insert - O(n)Lookup- O(log n) + C
baseaddrlength*dataptrcompartment ID
![Page 27: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/27.jpg)
Proposed Allocation
Reduction in Cache Flush: 8X
![Page 28: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/28.jpg)
Cost of Persistence
● User level Overheads○ Allocator metadata maintenance○ Restart/ Recovery Swizzling
● Transactional (Durability) Overheads ○ Logging○ Substantial code changes○
● Kernel level Overheads○ Kernel metadata maintenance○ Kernel metadata swizzling
![Page 29: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/29.jpg)
Swizzling - Recovery overheads
During Reboot,
Lets say process heap starting address is 2000
hash_s *hashtable = load_entire_hashtable("hashtable_root")
cout << "hashtable ptr << endl;
prints incorrectly 1000, should be >= 2000
![Page 30: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/30.jpg)
Swizzling - Recovery overheadsNormal Execution:
hash_s *hashtable = nvmalloc( size, "hashtable_root");for each new entry:
entry_s *entry = nvmalloc( size);hashtable[count] = entry;count++
cout << "hashtable ptr << endl; prints 1000
SYSTEM CRASH
![Page 31: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/31.jpg)
Traditional Recovery - Serialization
Requires extensive modification of datastructures
Substantial I/O calls, and more OS interaction
Two phase overhead:
1. serialization when saving data2. deserializationfor recovery3. kills byte addressability4. Can increase overhead upto 20% each phase
Prior Work: Swizzling during application execution
![Page 32: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/32.jpg)
Proposed Solution - Lazy Swizzling○ Lazy/ On demand pointer swizzling
○ Use allocator metadata as history of previous allocation
○ On restart, when a chunk is accessed, get its stale pointer value.
○ See if stale pointer is in history (allocator log)
○ If yes, map the state pointers to get new virtual address
○ Convert the old state pointer to new pointer
![Page 33: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/33.jpg)
Proposed Solution - Lazy Swizzling
h = (struct hashtable *)nvalloc_("root_hash");
for each entry in hash:
LOADNVPTR(&key); LOADNVPTR(&value);
Benefits: ● No serialization of pointers required during commit● Application decides what to load during restart● Multiple level of pointer can be recovered● Less than 10 % performance overhead during restart
![Page 34: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/34.jpg)
Constant Virtual address
○ Use same virtual address across sessions
○ No requirement of pointer swizzling
○ Requires static partitioning of NVM/PCM
![Page 35: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/35.jpg)
Cost of Persistence
● User level Overheads○ Allocator metadata maintenance○ Restart/ Recovery Swizzling
● Transactional (Durability) Overheads ○ Logging○ Substantial code changes○
● Kernel level Overheads○ Kernel metadata maintenance○ Kernel metadata swizzling
![Page 36: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/36.jpg)
Durability overheads - Logging typesLog every write (in PCM) to overcome failures
Undo Logging
● Create a log, and copy the original data to log● Modify the data in-place● Upon failure before commit, restore stable log version ● Problems
○ Two writes for every single write ○ Random Writes
![Page 37: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/37.jpg)
Durability overheads - Logging typesWrite Ahead logging ( most favoured and widely used )
● Create log and write sequentially to log● When log fills up, log committed to original data
● Problems○ Usually for heaps, every word is logged○ High Log Metadata/ Log Data overhead○ Metadata: 24bytes even for 8 bytes○ Substantial Code changes
Prior Work: Word based or Object based logging
![Page 38: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/38.jpg)
Write Ahead logging (WAL) in Heap
i = (unsigned int)LOAD(&h->entrycount); STORE(&h->entrycount, i++);
if (LOAD(&h->entrycount) > h->loadlimit) { hashtable_expand(h);
} e = (struct entry *)nvmalloc(sizeof(struct entry));
STORE(&e->h, hash(h,k)); STORE(&e->v, v); STORE(&e->next, h->table[index]); STORE(&h->table[index], e);
COMMIT;
![Page 39: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/39.jpg)
Proposed: Hybrid logging Heap● Using only Word or Object based logging granularity not
optimal (Why?)
● Combine Object and Word based logging with Undo Logging
● Maintain separate Object and Word based logs
● Object based log: Less Log Metadata/ Log Data ratio
● Word based log: Convenient for small changes (e.g., hash entry count)
![Page 40: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/40.jpg)
Benefits: Hybrid logging Heap
For Object based undo logging, easy dirt checking● e.g, first time inserts
Object based allocator metadata used also for logging
No separate log metadata is required
![Page 41: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/41.jpg)
Benefits: Hybrid logging Heap
![Page 42: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/42.jpg)
Summary
Goal to reduce persistence overheadsCache efficient NVM allocatorLazy pointer swizzling to reduce serialization costLess than 10% swizzling overheadNovel hybrid logging (Object + Word)Improved I/O performance by 63%
More opportunities:Reducing Kernel OverheadsCompiler optimizations
![Page 43: Achieving Dual Benefits of PCM by Reducing Persistence Overheads](https://reader034.vdocument.in/reader034/viewer/2022051521/586cc33c1a28ab3f528b6d78/html5/thumbnails/43.jpg)
Questions / Comments
Thanks!