University of Massachusetts, Amherst • Department of Computer Science
Memory Management for High-Performance Applications
Emery Berger
University of Massachusetts, Amherst
High-Performance Applications
Web servers, search engines, scientific codes
  C or C++ (still…)
  Run on one server box or a cluster of them
[Diagram: three server boxes, each with four CPUs, RAM, and a RAID drive]
[Diagram: system stack: software, compiler, runtime system, operating system, hardware]
Needs support at every level
New Applications, Old Memory Managers
Applications and hardware have changed
  Multiprocessors now commonplace
  Object-oriented, multithreaded
  Increased pressure on memory manager (malloc, free)
But memory managers have not kept up
  Inadequate support for modern applications
Current Memory Managers Limit Scalability
[Figure: Runtime Performance. Speedup (0 to 14) vs. number of processors (1 to 14); series: Ideal, Actual]
As we add processors, program slows down
Caused by heap contention
Larson server benchmark on 14-processor Sun
The Problem
Current memory managers inadequate for high-performance applications on modern architectures
  Limit scalability, application performance, and robustness
This Talk
Building memory managers
  Heap Layers framework [PLDI 2001]
Problems with current memory managers
  Contention, false sharing, space
Solution: provably scalable memory manager
  Hoard [ASPLOS-IX]
Extended memory manager for servers
  Reap [OOPSLA 2002]
Implementing Memory Managers
Memory managers must be
  Space efficient
  Very fast
Heavily-optimized code
  Hand-unrolled loops
  Macros
  Monolithic functions
Hard to write, reuse, or extend
Building Modular Memory Managers
Classes
  Overhead
  Rigid hierarchy
Mixins
  No overhead
  Flexible hierarchy
A Heap Layer
template <class SuperHeap>
class GreenHeapLayer : public SuperHeap {…};
[Diagram: GreenHeapLayer, RedHeapLayer]
Mixin with malloc & free methods
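A minimal sketch of how such a layer might compose, assuming only what the slide shows: a template that inherits from its SuperHeap and provides malloc and free. The names MallocHeap and CountingHeap are hypothetical and are not taken from the Heap Layers sources.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Bottom layer (hypothetical name): forwards to the C library allocator.
class MallocHeap {
public:
    void* malloc(std::size_t sz) { return std::malloc(sz); }
    void free(void* p) { std::free(p); }
};

// A heap layer: a mixin that wraps whatever SuperHeap it is given.
// This one counts live objects. Calls to the superheap bind at compile
// time, so there is no virtual dispatch.
template <class SuperHeap>
class CountingHeap : public SuperHeap {
public:
    void* malloc(std::size_t sz) {
        void* p = SuperHeap::malloc(sz);
        if (p != nullptr) ++live_;
        return p;
    }
    void free(void* p) {
        if (p != nullptr) {
            assert(live_ > 0);
            --live_;
        }
        SuperHeap::free(p);
    }
    std::size_t live() const { return live_; }
private:
    std::size_t live_ = 0;
};

// Composition is a type expression: layers stack by nesting templates.
using CountedMallocHeap = CountingHeap<MallocHeap>;
```

Declaring `CountedMallocHeap h;` and calling `h.malloc(32)` then `h.free(p)` leaves `h.live()` at zero. Swapping the bottom layer only changes the template argument.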
Example: Thread-Safe Heap Layer
LockedHeap: protects the superheap with a lock
[Diagram: LockedMallocHeap = LockedHeap layered over mallocHeap]
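A sketch of a locking layer in the same mixin style. It assumes std::mutex as the lock, which the slide does not specify, and uses a hypothetical unsynchronized TallyHeap as the superheap so that the lock has something to protect.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <mutex>

// Hypothetical unsynchronized heap: its counters would race if several
// threads used it without protection.
class TallyHeap {
public:
    void* malloc(std::size_t sz) {
        assert(sz > 0);
        ++mallocs_;
        return std::malloc(sz);
    }
    void free(void* p) {
        ++frees_;
        std::free(p);
    }
    long mallocs() const { return mallocs_; }
    long frees() const { return frees_; }
private:
    long mallocs_ = 0;
    long frees_ = 0;
};

// Locking layer: every call into the superheap runs under one mutex.
template <class SuperHeap>
class LockedHeap : public SuperHeap {
public:
    void* malloc(std::size_t sz) {
        std::lock_guard<std::mutex> guard(lock_);
        return SuperHeap::malloc(sz);
    }
    void free(void* p) {
        std::lock_guard<std::mutex> guard(lock_);
        SuperHeap::free(p);
    }
private:
    std::mutex lock_;
};

using LockedTallyHeap = LockedHeap<TallyHeap>;
```

Threads sharing one `LockedTallyHeap` serialize on the mutex. That is the heap contention the talk goes on to discuss. The counter accessors are not locked in this sketch, so read them only after the worker threads have finished.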
Empirical Results
Heap Layers vs. originals:
  KingsleyHeap vs. BSD allocator
  LeaHeap vs. DLmalloc 2.7
Competitive runtime and memory efficiency
[Figure: Runtime (normalized to Lea allocator). Normalized runtime by benchmark (cfrac, espresso, lindsay, LRUsim, perl, roboop, Average); series: Kingsley, KingsleyHeap, Lea, LeaHeap]
[Figure: Space (normalized to Lea allocator). Normalized space by benchmark (cfrac, espresso, lindsay, LRUsim, perl, roboop, Average); series: Kingsley, KingsleyHeap, Lea, LeaHeap]
Overview
Building memory managers
  Heap Layers framework
Problems with memory managers
  Contention, space, false sharing
Solution: provably scalable allocator
  Hoard
Extended memory manager for servers
  Reap
Problems with General-Purpose Memory Managers
Previous work for multiprocessors
  Concurrent single heap [Bigler et al. 85, Johnson 91, Iyengar 92]
    Impractical
  Multiple heaps [Larson 98, Gloger 99]
    Reduce contention but, as we show, cause other problems:
      P-fold or even unbounded increase in space
      Allocator-induced false sharing
Multiple Heap Allocator: Pure Private Heaps
One heap per processor:
  malloc gets memory from its local heap
  free puts memory on its local heap
STL, Cilk, ad hoc
[Diagram: processor 0 and processor 1, each with its own heap. Operations shown: x1 = malloc(1), x2 = malloc(1), x3 = malloc(1), x4 = malloc(1), free(x1), free(x2), free(x3), free(x4). Key: in use, processor 0; free, on heap 1]
Problem: Unbounded Memory Consumption
Producer-consumer:
  Processor 0 allocates
  Processor 1 frees
Unbounded memory consumption
  Crash!
[Diagram: processor 0 runs x1 = malloc(1), x2 = malloc(1), x3 = malloc(1); processor 1 runs free(x1), free(x2), free(x3)]
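The growth can be reproduced with a bookkeeping-only model. This is a sketch, not any real allocator: each object counts as one unit, no memory is actually allocated, and the class and function names are made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of pure private heaps: a freed block stays on the heap of the
// processor that freed it, whoever allocated it.
class PurePrivateHeaps {
public:
    explicit PurePrivateHeaps(std::size_t processors) : free_(processors, 0) {}

    // Processor p allocates: reuse a block from p's own heap, else ask the OS.
    void malloc_on(std::size_t p) {
        assert(p < free_.size());
        if (free_[p] > 0) {
            --free_[p];
        } else {
            ++from_os_;
        }
    }

    // Processor p frees: the block lands on p's heap.
    void free_on(std::size_t p) {
        assert(p < free_.size());
        ++free_[p];
    }

    std::size_t from_os() const { return from_os_; }

private:
    std::vector<std::size_t> free_;  // free blocks held by each heap
    std::size_t from_os_ = 0;        // units ever requested from the OS
};

// Producer-consumer: processor 0 allocates, processor 1 frees,
// with at most one object live at any time.
inline std::size_t producer_consumer(std::size_t rounds) {
    PurePrivateHeaps heaps(2);
    for (std::size_t i = 0; i < rounds; ++i) {
        heaps.malloc_on(0);
        heaps.free_on(1);
    }
    return heaps.from_os();
}
```

In this model `producer_consumer(n)` returns `n`: memory taken from the OS grows with the number of rounds even though only one object is ever live, because every freed block is stranded on heap 1.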
Multiple Heap Allocator: Private Heaps with Ownership
free returns memory to original heap
Bounded memory consumption
  No crash!
“Ptmalloc” (Linux), LKmalloc
[Diagram: processor 0 and processor 1. Operations shown: x1 = malloc(1), free(x1), x2 = malloc(1), free(x2)]
Problem: P-fold Memory Blowup
Occurs in practice
Round-robin producer-consumer
  processor i mod P allocates
  processor (i+1) mod P frees
Footprint = 1 (2GB), but space = 3 (6GB)
  Exceeds 32-bit address space: Crash!
[Diagram: processors 0, 1, and 2. Operations shown: x1 = malloc(1), free(x1), x2 = malloc(1), free(x2), x3 = malloc(1), free(x3)]
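The P-fold figure can be checked with a small bookkeeping model of private heaps with ownership. It is a sketch under simplifying assumptions: objects are unit-sized, nothing is really allocated, and the names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Model of private heaps with ownership: a freed block always returns
// to the heap it was allocated from.
class OwnershipHeaps {
public:
    explicit OwnershipHeaps(std::size_t processors) : free_(processors, 0) {}

    // Processor p allocates from its own heap; returns the owner's index.
    std::size_t malloc_on(std::size_t p) {
        assert(p < free_.size());
        if (free_[p] > 0) {
            --free_[p];
        } else {
            ++from_os_;
        }
        return p;
    }

    // Any processor may free; the block goes back to its owner.
    void free_to_owner(std::size_t owner) {
        assert(owner < free_.size());
        ++free_[owner];
    }

    std::size_t from_os() const { return from_os_; }

private:
    std::vector<std::size_t> free_;
    std::size_t from_os_ = 0;
};

// Round-robin producer-consumer with a footprint of one object.
inline std::size_t round_robin(std::size_t P, std::size_t rounds) {
    OwnershipHeaps heaps(P);
    for (std::size_t i = 0; i < rounds; ++i) {
        std::size_t owner = heaps.malloc_on(i % P);  // processor i mod P allocates
        heaps.free_to_owner(owner);                  // processor (i+1) mod P frees
    }
    return heaps.from_os();
}
```

With three processors the model settles at three units from the OS for a footprint of one, which matches the slide's 1 versus 3. The total is bounded, but it scales with P.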
Problem: Allocator-Induced False Sharing
False sharing
  Non-shared objects on same cache line
  Bane of parallel applications
  Extensively studied
All these allocators cause false sharing!
[Diagram: CPU 0 and CPU 1, each with a cache, connected by a bus. x1 = malloc(1) and x2 = malloc(1) are issued on different processors and land on the same cache line, which thrashes between the two caches]
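A small sketch of what "same cache line" means in address terms. The 64-byte line size is an assumption (common, but not universal), and the struct names are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed cache line size in bytes.
constexpr std::size_t kCacheLine = 64;

// True when two addresses fall within the same cache line.
inline bool same_cache_line(const void* a, const void* b) {
    assert(a != nullptr && b != nullptr);
    auto ia = reinterpret_cast<std::uintptr_t>(a);
    auto ib = reinterpret_cast<std::uintptr_t>(b);
    return ia / kCacheLine == ib / kCacheLine;
}

// Two one-byte objects carved from one line, as an allocator that
// splits a line between threads might hand out.
struct alignas(kCacheLine) Packed {
    char x1;
    char x2;
};

// Padding each object out to a full line removes the sharing.
struct alignas(kCacheLine) Padded {
    char value;
};
static_assert(sizeof(Padded) == kCacheLine, "each Padded fills one line");

inline bool packed_shares_line() {
    Packed p{};
    return same_cache_line(&p.x1, &p.x2);
}

inline bool padded_shares_line() {
    Padded a[2] = {};
    return same_cache_line(&a[0].value, &a[1].value);
}
```

If two threads each write only their own byte of `Packed`, the line still bounces between their caches. With `Padded`, each write stays local.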
So What Do We Do Now?
Where do we put free memory?
  on central heap: heap contention
  on our own heap (pure private heaps): unbounded memory consumption
  on the original heap (private heaps with ownership): P-fold blowup
How do we avoid false sharing?
Overview
Building memory managers
  Heap Layers framework
Problems with memory managers
  Contention, space, false sharing
Solution: provably scalable allocator
  Hoard
Extended memory manager for servers
  Reap
Hoard: Key Insights
Bound local memory consumption
  Explicitly track utilization
  Move free memory to a global heap
  Provably bounds memory consumption
Manage memory in large chunks
  Avoids false sharing
  Reduces heap contention
Overview of Hoard
Manage memory in heap blocks
  Page-sized
  Avoids false sharing
Allocate from local heap block
  Avoids heap contention
Low utilization
  Move heap block to global heap
  Avoids space blowup
[Diagram: global heap shared by the local heaps of processor 0 … processor P-1]
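The flow above can be sketched as a bookkeeping model. This is a deliberate simplification and not Hoard's implementation: a heap block moves to the global heap only once it is completely empty, which stands in for the low-utilization test, and the block capacity and all names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

constexpr std::size_t kBlockCapacity = 4;  // objects per heap block (made up)

struct HeapBlock {
    std::size_t used = 0;    // live objects in this block
    std::size_t owner = 0;   // local heap currently holding the block
    bool on_global = false;  // true while parked on the global heap
};

class HoardModel {
public:
    explicit HoardModel(std::size_t processors) : local_(processors) {}

    // Allocate on processor p: local block first, then global heap, then OS.
    HeapBlock* malloc_on(std::size_t p) {
        assert(p < local_.size());
        for (HeapBlock* b : local_[p]) {
            if (b->used < kBlockCapacity) {
                ++b->used;
                return b;
            }
        }
        HeapBlock* b = nullptr;
        if (!global_.empty()) {
            b = global_.back();  // reuse a block another heap gave up
            global_.pop_back();
        } else {
            blocks_.push_back(std::make_unique<HeapBlock>());
            b = blocks_.back().get();
        }
        b->owner = p;
        b->on_global = false;
        local_[p].push_back(b);
        ++b->used;
        return b;
    }

    // Free one object from block b; any processor may call this.
    void free_from(HeapBlock* b) {
        assert(b != nullptr && b->used > 0 && !b->on_global);
        if (--b->used == 0) {
            auto& heap = local_[b->owner];
            heap.erase(std::find(heap.begin(), heap.end(), b));
            b->on_global = true;
            global_.push_back(b);  // empty block becomes available to all
        }
    }

    std::size_t blocks_from_os() const { return blocks_.size(); }

private:
    std::vector<std::unique_ptr<HeapBlock>> blocks_;  // owns every block
    std::vector<std::vector<HeapBlock*>> local_;      // per-processor heaps
    std::vector<HeapBlock*> global_;                  // shared pool of blocks
};

// Round-robin producer-consumer across P processors.
inline std::size_t round_robin_blocks(std::size_t P, std::size_t rounds) {
    HoardModel hoard(P);
    for (std::size_t i = 0; i < rounds; ++i) {
        HeapBlock* b = hoard.malloc_on(i % P);
        hoard.free_from(b);
    }
    return hoard.blocks_from_os();
}
```

In this model the round-robin pattern needs a single heap block however many processors take part, because the emptied block circulates through the global heap instead of being stranded on one processor's heap.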
Summary of Analytical Results
Space consumption: near optimal worst-case
  Hoard: O(n log(M/m) + P) {P « n}
  Optimal: O(n log(M/m)) [Robson 70]: ≈ bin-packing
  Private heaps with ownership: O(P n log(M/m))
Provably low synchronization
n = memory required
M = biggest object size
m = smallest object size
P = processors
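One way to read these bounds is to divide each by the optimal term. Write R = n log(M/m); the constants c_1 and c_2 below are placeholders introduced for this sketch and do not come from the talk.

```latex
\begin{align*}
\text{Hoard:} \quad
  & \frac{c_1 (R + P)}{R} = c_1 \left(1 + \frac{P}{R}\right) \approx c_1
  \quad \text{when } P \ll n, \\
\text{Private heaps with ownership:} \quad
  & \frac{c_2\, P\, R}{R} = c_2\, P .
\end{align*}
```

Hoard's overhead relative to the optimum is additive in P and fades as n grows, while the ownership scheme's overhead is a factor of P regardless of n.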
Empirical Results
Measure runtime on 14-processor Sun
Allocators
  Solaris (system allocator)
  Ptmalloc (GNU libc)
  mtmalloc (Sun’s “MT-hot” allocator)
Micro-benchmarks
  Threadtest: no sharing
  Larson: sharing (server-style)
  Cache-scratch: mostly reads & writes (tests for false sharing)
Real application experience similar
Runtime Performance: threadtest
speedup(x,P) = runtime(Solaris allocator, one processor) / runtime(x on P processors)
Many threads, no sharing
Hoard achieves linear speedup
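The speedup metric defined above is a plain ratio. A sketch, with parameter names chosen here for illustration:

```cpp
#include <cassert>

// speedup(x, P) = runtime(Solaris allocator, one processor)
//               / runtime(x on P processors)
inline double speedup(double baseline_seconds, double runtime_seconds) {
    assert(baseline_seconds > 0.0 && runtime_seconds > 0.0);
    return baseline_seconds / runtime_seconds;
}
```

With made-up timings, a 28-second single-processor Solaris baseline and a 2-second run of allocator x on 14 processors give `speedup(28.0, 2.0)`, which is 14. Because the baseline is always the Solaris allocator, an allocator that beats it on one processor already shows a speedup above 1 at P = 1.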
Runtime Performance: Larson
Many threads, sharing (server-style)
Hoard achieves linear speedup
Runtime Performance: false sharing
Many threads, mostly reads & writes of heap data
Hoard achieves linear speedup
Hoard in the “Real World”
Open source code
  www.hoard.org
  13,000 downloads
  Solaris, Linux, Windows, IRIX, …
Widely used in industry
  AOL, British Telecom, Novell, Philips
  Reports: 2x-10x, “impressive” improvement in performance
  Search server, telecom billing systems, scene rendering, real-time messaging middleware, text-to-speech engine, telephony, JVM
Scalable general-purpose memory manager
Overview
Building memory managers
  Heap Layers framework
Problems with memory managers
  Contention, space, false sharing
Solution: provably scalable allocator
  Hoard
Extended memory manager for servers
  Reap
Custom Memory Allocation
Programmers often replace malloc/free
  Attempt to increase performance
  Provide extra functionality (e.g., for servers)
  Reduce space (rarely)
Empirical study of custom allocators
  Lea allocator often as fast or faster
  Custom allocation ineffective, except for regions. [OOPSLA 2002]
Overview of Regions
Separate areas, deletion only en masse
regioncreate(r)
regionmalloc(r, sz)
regiondelete(r)
+ Fast
  + Pointer-bumping allocation
  + Deletion of chunks
+ Convenient
  + One call frees all memory
- Risky
  - Accidental deletion
  - Too much space
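A minimal region sketch in C++, assuming a chunked bump-pointer design. The slide's regioncreate, regionmalloc, and regiondelete map here onto the constructor, `malloc`, and the destructor. The class itself is illustrative and is not taken from any particular region library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

class Region {
public:
    // "regioncreate": nothing is allocated until the first request.
    explicit Region(std::size_t chunk_size = 4096) : chunk_size_(chunk_size) {}
    Region(const Region&) = delete;
    Region& operator=(const Region&) = delete;

    // "regiondelete": one pass frees every chunk, and so every object.
    ~Region() {
        for (void* chunk : chunks_) std::free(chunk);
    }

    // "regionmalloc": bump a pointer; grab a new chunk when the current
    // one cannot hold the request.
    void* malloc(std::size_t sz) {
        assert(sz > 0);
        sz = (sz + kAlign - 1) / kAlign * kAlign;  // keep results aligned
        if (sz > remaining_) {
            std::size_t size = sz > chunk_size_ ? sz : chunk_size_;
            void* chunk = std::malloc(size);
            if (chunk == nullptr) return nullptr;
            chunks_.push_back(chunk);
            bump_ = static_cast<char*>(chunk);
            remaining_ = size;
        }
        void* p = bump_;
        bump_ += sz;
        remaining_ -= sz;
        return p;
    }

    std::size_t chunks() const { return chunks_.size(); }

private:
    static constexpr std::size_t kAlign = alignof(std::max_align_t);
    std::size_t chunk_size_;
    std::vector<void*> chunks_;
    char* bump_ = nullptr;
    std::size_t remaining_ = 0;
};
```

There is no per-object free. Memory comes back only when the whole region is destroyed, which is both the convenience and the risk listed on the slide.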
Why Regions?
Apparently faster, more space-efficient
Servers need memory management support:
  Avoid resource leaks
    Tear down memory associated with terminated connections or transactions
Current approach (e.g., Apache): regions
Drawbacks of Regions
Can’t reclaim memory within regions
  Problem for long-running computations, producer-consumer patterns, off-the-shelf “malloc/free” programs
  unbounded memory consumption
Current situation for Apache:
  vulnerable to denial-of-service
  limits runtime of connections
  limits module programming
Reap Hybrid Allocator
Reap = region + heap
  Adds individual object deletion & heap
reapcreate(r)
reapmalloc(r, sz)
reapfree(r, p)
reapdelete(r)
Can reduce memory consumption
Fast
  Adapts to use (region or heap style)
  Cheap deletion
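A rough sketch of the hybrid idea: bump-pointer allocation as in a region, plus per-object free. Recycling freed objects through exact-size free lists and keeping the size in a small header are choices made for this sketch, not a description of how Reap itself manages freed memory. The class and its members are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <map>
#include <vector>

class Reap {
public:
    explicit Reap(std::size_t chunk_size = 4096) : chunk_size_(chunk_size) {}
    Reap(const Reap&) = delete;
    Reap& operator=(const Reap&) = delete;

    // "reapdelete": region-style teardown of everything at once.
    ~Reap() {
        for (void* chunk : chunks_) std::free(chunk);
    }

    // "reapmalloc": recycle a freed object of the same size if one
    // exists, otherwise bump-allocate like a region.
    void* malloc(std::size_t sz) {
        assert(sz > 0);
        sz = (sz + kHeader - 1) / kHeader * kHeader;
        std::vector<void*>& list = freed_[sz];
        if (!list.empty()) {
            void* p = list.back();
            list.pop_back();
            return p;
        }
        char* raw = bump(sz + kHeader);
        if (raw == nullptr) return nullptr;
        std::memcpy(raw, &sz, sizeof sz);  // remember the size for free()
        return raw + kHeader;
    }

    // "reapfree": individual deletion, which plain regions lack.
    void free(void* p) {
        if (p == nullptr) return;
        std::size_t sz = 0;
        std::memcpy(&sz, static_cast<char*>(p) - kHeader, sizeof sz);
        freed_[sz].push_back(p);
    }

private:
    static constexpr std::size_t kHeader = alignof(std::max_align_t);
    static_assert(kHeader >= sizeof(std::size_t), "header must hold a size");

    char* bump(std::size_t n) {
        if (n > remaining_) {
            std::size_t size = n > chunk_size_ ? n : chunk_size_;
            void* chunk = std::malloc(size);
            if (chunk == nullptr) return nullptr;
            chunks_.push_back(chunk);
            bump_ = static_cast<char*>(chunk);
            remaining_ = size;
        }
        char* p = bump_;
        bump_ += n;
        remaining_ -= n;
        return p;
    }

    std::size_t chunk_size_;
    std::vector<void*> chunks_;
    char* bump_ = nullptr;
    std::size_t remaining_ = 0;
    std::map<std::size_t, std::vector<void*>> freed_;  // size -> free objects
};
```

A program that never calls `free` gets region behavior. One that does can reuse memory inside the region, which is the gap regions leave for long-running or producer-consumer code.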
Using Reap as Regions
[Figure: Runtime - Region-Based Benchmarks. Normalized runtime (0 to 2.5) for lcc and mudlle; series: Original, Win32, DLmalloc, WinHeap, Vmalloc, Reap; one off-scale bar is labeled 4.08]
Reap performance nearly matches regions
Reap: Best of Both Worlds
Combining new/delete with regions usually impossible:
  Incompatible APIs
  Hard to rewrite code
Use Reap: Incorporate new/delete code into Apache
  “mod_bc” (arbitrary-precision calculator)
    Changed 20 lines (out of 8000)
  Benchmark: compute 1000th prime
    With Reap: 240K
    Without Reap: 7.4MB
Open Questions
Grand Unified Memory Manager?
  Hoard + Reap
  Integration with garbage collection
Effective Custom Allocators?
  Exploit sizes, lifetimes, locality and sharing
Challenges of newer architectures
  NUMA, SMT/CMP, 64-bit, predication
Current Work: Robust Performance
Currently: no VM-GC communication
  BAD interactions under memory pressure
Our approach (with Eliot Moss, Scott Kaplan): Cooperative Robust Automatic Memory Management
[Diagram: garbage collector / allocator and virtual memory manager (LRU queue), with arrows labeled memory pressure, empty pages, and reduced impact]
Current Work: Predictable VMM
Recent work on scheduling for QoS
  E.g., proportional-share
  Under memory pressure, VMM is scheduler
    Paged-out processes may never recover
    Intermittent processes may wait long time
Scheduler-faithful virtual memory (with Scott Kaplan, Prashant Shenoy)
  Based on page value rather than order
Conclusion
Memory management for high-performance applications
  Heap Layers framework [PLDI 2001]
    Reusable components, no runtime cost
  Hoard scalable memory manager [ASPLOS-IX]
    High-performance, provably scalable & space-efficient
  Reap hybrid memory manager [OOPSLA 2002]
    Provides speed & robustness for server applications
Current work: robust memory management for multiprogramming
The Obligatory URL Slide
http://www.cs.umass.edu/~emery
If You Can Read This, I Went Too Far
Hoard: Under the Hood
[Diagram: Hoard's heap layers. Layers shown: MallocOrFreeHeap, PerProcessorHeap, SelectSizeHeap, SuperblockHeap, FreeToHeapBlock, SystemHeap, with LockedHeap and HeapBlockManager repeated at several levels; side paths for large objects (> 4K) and empty heap blocks]
select heap based on size
malloc from local heap, free to heap block
get or return memory to global heap
Custom Memory Allocation
Very common practice
  Apache, gcc, lcc, STL, database servers…
  Language-level support in C++
Replace new/delete, bypassing general-purpose allocator
  Reduce runtime – often
  Expand functionality – sometimes
  Reduce space – rarely
“Use custom allocators”
Drawbacks of Custom Allocators
Avoiding memory manager means:
  More code to maintain & debug
  Can’t use memory debuggers
  Not modular or robust:
    Mix memory from custom and general-purpose allocators → crash!
Increased burden on programmers
Overview
Introduction
Perceived benefits and drawbacks
Three main kinds of custom allocators
Comparison with general-purpose allocators
Advantages and drawbacks of regions
Reaps – generalization of regions & heaps
(I) Per-Class Allocators
Recycle freed objects from a free list
a = new Class1;
b = new Class1;
c = new Class1;
delete a;
delete b;
delete c;
a = new Class1;
b = new Class1;
c = new Class1;
[Diagram: Class1 free list holding a, b, c]
+ Fast
  + Linked list operations
+ Simple
  + Identical semantics
  + C++ language support
- Possibly space-inefficient
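A sketch of a per-class allocator using C++'s class-level operator new and operator delete. Class1 is the slide's name. The free-list layout and the single `value` member are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

class Class1 {
public:
    // Serve requests from the free list when possible.
    static void* operator new(std::size_t sz) {
        assert(sz == sizeof(Class1));
        if (free_list_ != nullptr) {
            FreeNode* node = free_list_;
            free_list_ = node->next;
            return node;
        }
        return ::operator new(sz);
    }

    // Never hand memory back to the general-purpose allocator: push the
    // dead object's storage onto the free list instead.
    static void operator delete(void* p) noexcept {
        if (p == nullptr) return;
        free_list_ = ::new (p) FreeNode{free_list_};
    }

    double value = 0.0;

private:
    struct FreeNode {
        FreeNode* next;
    };
    static FreeNode* free_list_;
};

Class1::FreeNode* Class1::free_list_ = nullptr;

static_assert(sizeof(Class1) >= sizeof(void*),
              "a freed object must be able to hold the list link");
```

Client code keeps writing `new Class1` and `delete a` unchanged, which is the "identical semantics" point. The list only grows, so memory freed by a burst of Class1 objects is never available to other types, hence "possibly space-inefficient".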
(II) Custom Patterns
Tailor-made to fit allocation patterns
  Example: 197.parser (natural language parser)

char[MEMORY_LIMIT]

a = xalloc(8);
b = xalloc(16);
c = xalloc(8);
xfree(b);
xfree(c);
d = xalloc(8);

[Diagram: the array holding blocks a, b, c, then d, with end_of_array marking the current end of the used space after each call]

+ Fast
  + Pointer-bumping allocation
- Brittle
  - Fixed memory size
  - Requires stack-like lifetimes
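A stack-like allocator of this kind might look roughly like the sketch below. It is not the 197.parser source: the `Header` layout, the `arena` name, the value of `MEMORY_LIMIT`, and the rule that `xfree` only rolls `end_of_array` back across freed blocks at the top are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

constexpr std::size_t MEMORY_LIMIT = 1 << 16;             // fixed size (assumed value)
constexpr std::size_t NONE = static_cast<std::size_t>(-1);
constexpr std::size_t ALIGN = alignof(std::max_align_t);

// Bookkeeping placed in front of every block (hypothetical layout).
struct alignas(std::max_align_t) Header {
    std::size_t size;   // payload size, rounded up to ALIGN
    std::size_t prev;   // offset of the previous block's header, or NONE
    bool free;          // set by xfree
};

alignas(std::max_align_t) unsigned char arena[MEMORY_LIMIT];
std::size_t end_of_array = 0;   // first unused byte
std::size_t last_block = NONE;  // offset of the topmost block's header

// Pointer-bumping allocation: carve the block off the end of the array.
void* xalloc(std::size_t n) {
    if (n > MEMORY_LIMIT) return nullptr;
    std::size_t rounded = ((n ? n : 1) + ALIGN - 1) / ALIGN * ALIGN;
    if (sizeof(Header) + rounded > MEMORY_LIMIT - end_of_array) return nullptr;
    Header* h = new (arena + end_of_array) Header{rounded, last_block, false};
    last_block = end_of_array;
    end_of_array += sizeof(Header) + rounded;
    return h + 1;
}

// Mark the block free, then roll end_of_array back over every freed
// block at the top. Blocks freed out of stack order stay allocated
// until everything above them has been freed too.
void xfree(void* p) {
    if (p == nullptr) return;
    (static_cast<Header*>(p) - 1)->free = true;
    while (last_block != NONE) {
        Header* top = reinterpret_cast<Header*>(arena + last_block);
        if (!top->free) break;
        end_of_array = last_block;
        last_block = top->prev;
    }
}
```

Running the slide's sequence through this sketch, `xfree(b)` reclaims nothing because `c` is still above it. `xfree(c)` then releases both blocks, and `d` lands where `b` used to be. The brittleness follows directly: memory is only recovered when lifetimes are stack-like, and the array cannot grow.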
![Page 49: U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Memory Management for High-Performance Applications Emery Berger University of Massachusetts,](https://reader035.vdocument.in/reader035/viewer/2022070307/551ac7e45503466b6a8b503e/html5/thumbnails/49.jpg)
(III) Regions
Separate areas, deletion only en masse

regioncreate(r)
regionmalloc(r, sz)
regiondelete(r)

+ Fast
  + Pointer-bumping allocation
  + Deletion of chunks
+ Convenient
  + One call frees all memory
- Risky
  - Accidental deletion
  - Too much space
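The three region calls named on this slide can be sketched as follows. Only the function names come from the slide. The signatures, the `Region` struct, and the 4096-byte `CHUNK_SIZE` are assumptions, and real region libraries differ in detail.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

constexpr std::size_t CHUNK_SIZE = 4096;  // assumed chunk size

// Hypothetical region: a list of chunks plus a bump pointer into the newest one.
struct Region {
    std::vector<void*> chunks;
    unsigned char* bump = nullptr;
    std::size_t remaining = 0;
};

void regioncreate(Region& r) { r = Region{}; }

// Pointer-bumping allocation; a new chunk is fetched when the current one is full.
void* regionmalloc(Region& r, std::size_t sz) {
    constexpr std::size_t align = alignof(std::max_align_t);
    sz = ((sz ? sz : 1) + align - 1) / align * align;
    if (sz > r.remaining) {
        std::size_t chunk = sz > CHUNK_SIZE ? sz : CHUNK_SIZE;
        void* mem = std::malloc(chunk);
        if (mem == nullptr) return nullptr;
        r.chunks.push_back(mem);
        r.bump = static_cast<unsigned char*>(mem);
        r.remaining = chunk;
    }
    void* p = r.bump;
    r.bump += sz;
    r.remaining -= sz;
    return p;
}

// Deletion en masse: every object in the region dies with this one call.
void regiondelete(Region& r) {
    for (void* chunk : r.chunks) std::free(chunk);
    r = Region{};
}
```

There is deliberately no per-object free in this interface. That is what makes deletion cheap, and it is also why a region keeps every object alive until `regiondelete` runs.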
![Page 50: U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Memory Management for High-Performance Applications Emery Berger University of Massachusetts,](https://reader035.vdocument.in/reader035/viewer/2022070307/551ac7e45503466b6a8b503e/html5/thumbnails/50.jpg)
Overview
  Introduction
  Perceived benefits and drawbacks
  Three main kinds of custom allocators
  Comparison with general-purpose allocators
  Advantages and drawbacks of regions
  Reaps – generalization of regions & heaps
![Page 51: U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Memory Management for High-Performance Applications Emery Berger University of Massachusetts,](https://reader035.vdocument.in/reader035/viewer/2022070307/551ac7e45503466b6a8b503e/html5/thumbnails/51.jpg)
Custom Allocators Are Faster…
[Figure: bar chart "Runtime - Custom Allocator Benchmarks". Y-axis: normalized runtime, 0 to 1.75. X-axis: 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, followed by the Non-regions, Regions, and Overall averages (axis groups: non-regions, regions, averages). Series: Custom, Win32.]
![Page 52: U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Memory Management for High-Performance Applications Emery Berger University of Massachusetts,](https://reader035.vdocument.in/reader035/viewer/2022070307/551ac7e45503466b6a8b503e/html5/thumbnails/52.jpg)
Not So Fast…
[Figure: bar chart "Runtime - Custom Allocator Benchmarks". Y-axis: normalized runtime, 0 to 1.75. X-axis: 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, followed by the Non-regions, Regions, and Overall averages (axis groups: non-regions, regions, averages). Series: Custom, Win32, DLmalloc.]
![Page 53: U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Memory Management for High-Performance Applications Emery Berger University of Massachusetts,](https://reader035.vdocument.in/reader035/viewer/2022070307/551ac7e45503466b6a8b503e/html5/thumbnails/53.jpg)
The Lea Allocator (DLmalloc 2.7.0)
Optimized for common allocation patterns
  Per-size quicklists ≈ per-class allocation
  Deferred coalescing (combining adjacent free objects)
Highly-optimized fastpath
Space-efficient
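To illustrate why per-size quicklists behave much like per-class allocation, here is a small sketch of a size-segregated front end. It is not DLmalloc's code: the `quick_alloc` and `quick_free` names, the 16-byte `GRANULE`, the 256-byte cutoff, and the sized-free interface are all assumptions, and the sketch omits coalescing and thread safety entirely.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

constexpr std::size_t GRANULE = 16;     // size-class spacing (assumed)
constexpr std::size_t MAX_SMALL = 256;  // larger requests bypass the quicklists (assumed)

struct FreeNode { FreeNode* next; };

// One LIFO free list per size class.
FreeNode* quicklists[MAX_SMALL / GRANULE + 1] = {};

std::size_t size_class(std::size_t sz) { return (sz + GRANULE - 1) / GRANULE; }

void* quick_alloc(std::size_t sz) {
    if (sz == 0) sz = 1;
    if (sz > MAX_SMALL) return std::malloc(sz);
    std::size_t cls = size_class(sz);
    if (FreeNode* node = quicklists[cls]) {  // fast path: pop a recycled block
        quicklists[cls] = node->next;
        return node;
    }
    return std::malloc(cls * GRANULE);       // slow path: general allocator
}

// The caller passes the size back so the block can be filed under its class.
void quick_free(void* p, std::size_t sz) {
    if (p == nullptr) return;
    if (sz == 0) sz = 1;
    if (sz > MAX_SMALL) { std::free(p); return; }
    std::size_t cls = size_class(sz);
    quicklists[cls] = new (p) FreeNode{quicklists[cls]};
}
```

Objects of one class all have the same size, so they land in the same quicklist and are recycled with the same few pointer operations a hand-written per-class free list would use. Unlike a per-class list, the recycled blocks remain available to any other request of that size.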
![Page 54: U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Memory Management for High-Performance Applications Emery Berger University of Massachusetts,](https://reader035.vdocument.in/reader035/viewer/2022070307/551ac7e45503466b6a8b503e/html5/thumbnails/54.jpg)
Space Consumption Results
[Figure: bar chart "Space - Custom Allocator Benchmarks". Y-axis: normalized space, 0 to 1.75. X-axis groups: non-regions, regions, averages. Series: Original, DLmalloc.]
![Page 55: U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Memory Management for High-Performance Applications Emery Berger University of Massachusetts,](https://reader035.vdocument.in/reader035/viewer/2022070307/551ac7e45503466b6a8b503e/html5/thumbnails/55.jpg)
Overview
  Introduction
  Perceived benefits and drawbacks
  Three main kinds of custom allocators
  Comparison with general-purpose allocators
  Advantages and drawbacks of regions
  Reaps – generalization of regions & heaps
![Page 56: U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Memory Management for High-Performance Applications Emery Berger University of Massachusetts,](https://reader035.vdocument.in/reader035/viewer/2022070307/551ac7e45503466b6a8b503e/html5/thumbnails/56.jpg)
Why Regions?
Apparently faster, more space-efficient
Servers need memory management support:
  Avoid resource leaks
  Tear down memory associated with terminated connections or transactions
Current approach (e.g., Apache): regions
![Page 57: U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Memory Management for High-Performance Applications Emery Berger University of Massachusetts,](https://reader035.vdocument.in/reader035/viewer/2022070307/551ac7e45503466b6a8b503e/html5/thumbnails/57.jpg)
Drawbacks of Regions
Can’t reclaim memory within regions
  Problem for long-running computations, producer-consumer patterns, off-the-shelf “malloc/free” programs
  → unbounded memory consumption
Current situation for Apache:
  vulnerable to denial-of-service
  limits runtime of connections
  limits module programming
![Page 58: U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Memory Management for High-Performance Applications Emery Berger University of Massachusetts,](https://reader035.vdocument.in/reader035/viewer/2022070307/551ac7e45503466b6a8b503e/html5/thumbnails/58.jpg)
Reap Hybrid Allocator
Reap = region + heap
  Adds individual object deletion & heap

reapcreate(r)
reapmalloc(r, sz)
reapfree(r, p)
reapdelete(r)

+ Can reduce memory consumption
+ Fast
+ Adapts to use (region or heap style)
+ Cheap deletion
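One way to picture the region-plus-heap combination is a region that also remembers individually freed objects and hands them out again. The sketch below is an illustration under stated assumptions, not the Reap implementation from the Heap Layers distribution. Only the four function names come from the slide. The `Reap` struct, the per-block size header, and the size-indexed `freed` map are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <map>
#include <new>
#include <vector>

constexpr std::size_t ALIGN = alignof(std::max_align_t);
constexpr std::size_t HEADER = ALIGN;     // holds the block size, keeps payloads aligned
constexpr std::size_t CHUNK_SIZE = 4096;  // assumed chunk size
static_assert(HEADER >= sizeof(std::size_t), "header must hold a size");

struct Reap {
    std::vector<void*> chunks;                         // region part
    unsigned char* bump = nullptr;
    std::size_t remaining = 0;
    std::map<std::size_t, std::vector<void*>> freed;   // heap part: size -> freed blocks
};

void reapcreate(Reap& r) { r = Reap{}; }

void* reapmalloc(Reap& r, std::size_t sz) {
    std::size_t rounded = ((sz ? sz : 1) + ALIGN - 1) / ALIGN * ALIGN;
    // Heap style: reuse an individually freed block of this size if there is one.
    auto it = r.freed.find(rounded);
    if (it != r.freed.end() && !it->second.empty()) {
        void* p = it->second.back();
        it->second.pop_back();
        return p;
    }
    // Region style: bump-allocate, fetching a new chunk when needed.
    std::size_t need = HEADER + rounded;
    if (need > r.remaining) {
        std::size_t chunk = need > CHUNK_SIZE ? need : CHUNK_SIZE;
        void* mem = std::malloc(chunk);
        if (mem == nullptr) return nullptr;
        r.chunks.push_back(mem);
        r.bump = static_cast<unsigned char*>(mem);
        r.remaining = chunk;
    }
    new (r.bump) std::size_t(rounded);  // record the size for reapfree
    void* p = r.bump + HEADER;
    r.bump += need;
    r.remaining -= need;
    return p;
}

// Individual deletion: the block becomes reusable within the same reap.
void reapfree(Reap& r, void* p) {
    if (p == nullptr) return;
    std::size_t size =
        *reinterpret_cast<std::size_t*>(static_cast<unsigned char*>(p) - HEADER);
    r.freed[size].push_back(p);
}

// Region-style teardown: everything goes at once, freed or not.
void reapdelete(Reap& r) {
    for (void* chunk : r.chunks) std::free(chunk);
    r = Reap{};
}
```

A program that never calls `reapfree` gets plain region behavior, while one that frees as it goes keeps its footprint bounded. That is the "adapts to use" point on the slide.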
![Page 59: U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Memory Management for High-Performance Applications Emery Berger University of Massachusetts,](https://reader035.vdocument.in/reader035/viewer/2022070307/551ac7e45503466b6a8b503e/html5/thumbnails/59.jpg)
Using Reap as Regions
[Figure: bar chart "Runtime - Region-Based Benchmarks". Y-axis: normalized runtime, 0 to 2.5. Benchmarks: lcc, mudlle. Series: Original, Win32, DLmalloc, WinHeap, Vmalloc, Reap. One off-scale bar is labeled 4.08.]
Reap performance nearly matches regions
![Page 60: U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Memory Management for High-Performance Applications Emery Berger University of Massachusetts,](https://reader035.vdocument.in/reader035/viewer/2022070307/551ac7e45503466b6a8b503e/html5/thumbnails/60.jpg)
Reap: Best of Both Worlds
Combining new/delete with regions usually impossible:
  Incompatible APIs
  Hard to rewrite code
Use Reap: incorporate new/delete code into Apache
  “mod_bc” (arbitrary-precision calculator)
  Changed 20 lines (out of 8000)
  Benchmark: compute 1000th prime
    With Reap: 240K
    Without Reap: 7.4MB
![Page 61: U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Memory Management for High-Performance Applications Emery Berger University of Massachusetts,](https://reader035.vdocument.in/reader035/viewer/2022070307/551ac7e45503466b6a8b503e/html5/thumbnails/61.jpg)
Conclusion
Empirical study of custom allocators
  Lea allocator often as fast or faster
  Custom allocation ineffective, except for regions
Reaps:
  Nearly match region performance without other drawbacks
Take-home message: Stop using custom memory allocators!
![Page 62: U NIVERSITY OF M ASSACHUSETTS, A MHERST Department of Computer Science Memory Management for High-Performance Applications Emery Berger University of Massachusetts,](https://reader035.vdocument.in/reader035/viewer/2022070307/551ac7e45503466b6a8b503e/html5/thumbnails/62.jpg)
Software
http://www.cs.umass.edu/~emery
(part of Heap Layers distribution)