Reconsidering Custom Memory Allocation


Upload: emery-berger

Post on 04-Dec-2014


DESCRIPTION

This talk presents an extensive experimental study showing that a good general-purpose allocator is better than almost all commonly used custom allocators, with one exception: regions (a.k.a. pools, arenas, zones). However, it also shows that regions consume much more memory than necessary. The talk then introduces reaps (regions + heaps), which combine the flexibility and space efficiency of heaps with the performance of regions.

TRANSCRIPT

Page 1: Reconsidering Custom Memory Allocation

UNIVERSITY OF MASSACHUSETTS • DEPARTMENT OF COMPUTER SCIENCE

Reconsidering

Custom Memory Allocation

Emery Berger, Ben Zorn, Kathryn McKinley

Page 2: Reconsidering Custom Memory Allocation

Custom Memory Allocation

Very common practice

Apache, gcc, lcc, STL, database servers…

Language-level support in C++

Widely recommended: “Use custom allocators”

Programmers replace new/delete, bypassing system allocator

Reduce runtime – often

Expand functionality – sometimes

Reduce space – rarely

Page 3: Reconsidering Custom Memory Allocation

Drawbacks of Custom Allocators

Avoiding system allocator:

More code to maintain & debug

Can’t use memory debuggers

Not modular or robust:

Mix memory from custom and general-purpose allocators → crash!

Increased burden on programmers

Are custom allocators really a win?

Page 4: Reconsidering Custom Memory Allocation

Overview

Introduction

Perceived benefits and drawbacks

Three main kinds of custom allocators

Comparison with general-purpose allocators

Advantages and drawbacks of regions

Reaps – generalization of regions & heaps

Page 5: Reconsidering Custom Memory Allocation

(I) Per-Class Allocators

[Diagram: Class1 free list with objects a, b, c]

a = new Class1;

b = new Class1;

c = new Class1;

delete a;

delete b;

delete c;

a = new Class1;

b = new Class1;

c = new Class1;

Recycle freed objects from a free list

+ Fast

+ Linked list operations

+ Simple

+ Identical semantics

+ C++ language support

- Possibly space-inefficient
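The per-class scheme on this slide can be sketched as follows. This is a minimal illustration, not code from the talk: the payload of Class1, the FreeNode type, and the fallback to the global allocator are all assumptions. It uses the C++ language support the slide mentions (class-level operator new and operator delete) to recycle freed objects through a free list.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Hypothetical class with a per-class free list. The payload and the
// FreeNode type are assumptions made for illustration.
class Class1 {
public:
    static void* operator new(std::size_t sz) {
        static_assert(sizeof(Class1) >= sizeof(FreeNode),
                      "a freed object must be able to hold the list link");
        // Fast path: pop a recycled object off the free list.
        if (sz == sizeof(Class1) && freeList != nullptr) {
            FreeNode* node = freeList;
            freeList = node->next;
            return node;
        }
        // Slow path: fall back to the general-purpose allocator.
        return ::operator new(sz);
    }

    static void operator delete(void* p, std::size_t sz) noexcept {
        if (p == nullptr) return;
        if (sz != sizeof(Class1)) {   // e.g. a derived class: do not recycle
            ::operator delete(p);
            return;
        }
        // Push the object onto the free list instead of releasing it.
        FreeNode* node = static_cast<FreeNode*>(p);
        node->next = freeList;
        freeList = node;
    }

private:
    struct FreeNode { FreeNode* next; };
    static FreeNode* freeList;
    double payload[2];                // placeholder data members
};

Class1::FreeNode* Class1::freeList = nullptr;
```

With this sketch, the second round of "new Class1" calls on the slide is served from the free list. Memory on the free list is never returned to the system, which is one way the "possibly space-inefficient" drawback can show up.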

Page 6: Reconsidering Custom Memory Allocation

(II) Custom Patterns

Tailor-made to fit allocation patterns

Example: 197.parser (natural language parser)

char[MEMORY_LIMIT]

a = xalloc(8);

b = xalloc(16);

c = xalloc(8);

xfree(b);

xfree(c);

d = xalloc(8);

[Diagram: blocks a, b, c, d laid out in the array, with the end_of_array pointer moving after each call]

+ Fast

+ Pointer-bumping allocation

- Brittle

- Fixed memory size

- Requires stack-like lifetimes
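A stack-like allocator of this kind might look roughly like the sketch below. It is not the 197.parser source: the block header, the top offset, and the buffer size are assumptions chosen to show pointer-bumping allocation over a fixed array, where space is only reclaimed once the most recently allocated blocks have been freed.

```cpp
#include <cassert>
#include <cstddef>
#include <new>

// Illustrative sketch only: NOT the 197.parser source. The header layout and
// the `top` bookkeeping are assumptions made for this example.
namespace sketch {

constexpr std::size_t MEMORY_LIMIT = 1 << 16;
constexpr std::size_t ALIGN = alignof(std::max_align_t);
constexpr std::size_t NONE = static_cast<std::size_t>(-1);

struct Header {
    std::size_t prev;   // offset of the previous block's header, or NONE
    bool freed;         // set by xfree; space is reclaimed only from the top
};

constexpr std::size_t roundUp(std::size_t n) {
    return (n + ALIGN - 1) / ALIGN * ALIGN;
}
constexpr std::size_t HEADER = roundUp(sizeof(Header));

alignas(std::max_align_t) char memory[MEMORY_LIMIT];
std::size_t end_of_array = 0;   // bump pointer: first unused byte
std::size_t top = NONE;         // offset of the most recent block's header

Header* headerAt(std::size_t off) {
    return reinterpret_cast<Header*>(memory + off);
}

void* xalloc(std::size_t sz) {
    const std::size_t need = HEADER + roundUp(sz);
    if (need > MEMORY_LIMIT - end_of_array) return nullptr;  // fixed memory size
    new (memory + end_of_array) Header{top, false};
    top = end_of_array;
    end_of_array += need;                                    // bump the pointer
    return memory + top + HEADER;
}

void xfree(void* p) {
    if (p == nullptr) return;
    char* raw = static_cast<char*>(p) - HEADER;
    assert(raw >= memory && raw < memory + end_of_array);
    reinterpret_cast<Header*>(raw)->freed = true;
    // Pop every freed block sitting at the top of the stack.
    while (top != NONE && headerAt(top)->freed) {
        end_of_array = top;
        top = headerAt(top)->prev;
    }
}

}  // namespace sketch
```

Replaying the slide's sequence against this sketch, xfree(b) reclaims nothing because c is still live above it. xfree(c) then pops both c and b, so d lands where b used to be. A block freed out of stack order stays allocated until everything above it is freed, which is why the pattern needs stack-like lifetimes.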

Page 7: Reconsidering Custom Memory Allocation

(III) Regions

+ Fast

+ Pointer-bumping allocation

+ Deletion of chunks

+ Convenient

+ One call frees all memory

Separate areas, deletion only en masse

regioncreate(r)

regionmalloc(r, sz)

regiondelete(r)

- Risky

- Dangling references

- Too much space

Increasingly popular custom allocator
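The three region calls named on this slide could be implemented along the following lines. This is a hedged sketch, not the allocator measured in the talk: the Chunk layout, the 4096-byte chunk size, and the use of malloc for chunks are assumptions. Allocation bumps a pointer inside the current chunk, and regiondelete releases every chunk in one pass.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical region sketch. Function names mirror the slide; the chunk
// layout and chunk size are assumptions.
struct Chunk {
    Chunk* next;            // previously filled chunk
    std::size_t used;       // bytes consumed, including this header
    std::size_t capacity;   // total bytes in the chunk
};

struct Region {
    Chunk* chunks = nullptr;
};

constexpr std::size_t kChunkBytes = 4096;
constexpr std::size_t kAlign = alignof(std::max_align_t);

inline std::size_t alignUp(std::size_t n) {
    return (n + kAlign - 1) & ~(kAlign - 1);
}

void regioncreate(Region& r) { r.chunks = nullptr; }

void* regionmalloc(Region& r, std::size_t sz) {
    sz = alignUp(sz == 0 ? 1 : sz);
    const std::size_t header = alignUp(sizeof(Chunk));
    if (r.chunks == nullptr || r.chunks->used + sz > r.chunks->capacity) {
        // Current chunk is full: grab a new one from the system allocator.
        const std::size_t cap = header + (sz > kChunkBytes ? sz : kChunkBytes);
        Chunk* c = static_cast<Chunk*>(std::malloc(cap));
        if (c == nullptr) return nullptr;
        c->next = r.chunks;
        c->used = header;
        c->capacity = cap;
        r.chunks = c;
    }
    // Pointer-bumping allocation: no per-object header, no free list.
    void* p = reinterpret_cast<char*>(r.chunks) + r.chunks->used;
    r.chunks->used += sz;
    assert(r.chunks->used <= r.chunks->capacity);
    return p;
}

void regiondelete(Region& r) {
    // One call frees all memory; any pointer into the region now dangles.
    while (r.chunks != nullptr) {
        Chunk* next = r.chunks->next;
        std::free(r.chunks);
        r.chunks = next;
    }
}
```

There is deliberately no regionfree(r, p) here: an object's space cannot be reused until the whole region is deleted, which is the source of the "too much space" drawback.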

Page 8: Reconsidering Custom Memory Allocation

Overview

Introduction

Perceived benefits and drawbacks

Three main kinds of custom allocators

Comparison with general-purpose allocators

Advantages and drawbacks of regions

Reaps – generalization of regions & heaps

Page 9: Reconsidering Custom Memory Allocation

Custom Allocators Are Faster…

[Figure: Runtime - Custom Allocator Benchmarks. Y-axis: normalized runtime (0 to 1.75). Benchmarks: 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, grouped into non-regions and regions. Series: Custom, Win32.]

As good as and sometimes much faster than Win32

Page 10: Reconsidering Custom Memory Allocation

Not So Fast…

[Figure: Runtime - Custom Allocator Benchmarks. Y-axis: normalized runtime (0 to 1.75). Benchmarks: 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, grouped into non-regions and regions. Series: Custom, Win32, DLmalloc.]

DLmalloc: as fast or faster for most benchmarks

Page 11: Reconsidering Custom Memory Allocation

The Lea Allocator (DLmalloc 2.7.0)

Mature public-domain general-purpose allocator

Optimized for common allocation patterns

Per-size quicklists ≈ per-class allocation

Deferred coalescing (combining adjacent free objects)

Highly-optimized fastpath

Space-efficient
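To make the "per-size quicklists ≈ per-class allocation" point concrete, here is a small sketch of a quicklist front end layered over malloc. It is not DLmalloc's implementation: the 16-byte granule, the eight size classes, the Header layout, and the names qmalloc and qfree are all assumptions. Freed small blocks go onto a list for their size class and are handed back whole on the next request of that class, with no coalescing on the fast path.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Illustrative quicklist front end. NOT DLmalloc's code: granule, class
// count, and header layout are assumptions.
namespace quick {

constexpr std::size_t kGranule = 16;
constexpr std::size_t kClasses = 8;    // quicklists cover requests up to 128 bytes

struct alignas(std::max_align_t) Header {
    std::size_t cls;   // size-class index, or kClasses for large blocks
    Header* next;      // link, meaningful only while on a quicklist
};

Header* quicklist[kClasses] = {};

void* qmalloc(std::size_t sz) {
    const std::size_t cls = (sz == 0) ? 0 : (sz - 1) / kGranule;
    if (cls < kClasses && quicklist[cls] != nullptr) {
        // Fast path: pop a same-sized block, like a per-class free list.
        Header* h = quicklist[cls];
        quicklist[cls] = h->next;
        return h + 1;
    }
    const std::size_t payload = (cls < kClasses) ? (cls + 1) * kGranule : sz;
    Header* h = static_cast<Header*>(std::malloc(sizeof(Header) + payload));
    if (h == nullptr) return nullptr;
    h->cls = (cls < kClasses) ? cls : kClasses;
    h->next = nullptr;
    return h + 1;
}

void qfree(void* p) {
    if (p == nullptr) return;
    Header* h = static_cast<Header*>(p) - 1;
    assert(h->cls <= kClasses);
    if (h->cls == kClasses) {          // large block: hand straight back
        std::free(h);
        return;
    }
    // Small block: keep it whole on its quicklist. A real allocator would
    // coalesce such blocks later; this sketch never does.
    h->next = quicklist[h->cls];
    quicklist[h->cls] = h;
}

}  // namespace quick
```

Because the general-purpose allocator already recycles same-sized objects this way, a hand-written per-class free list has little left to gain, which is consistent with the runtime results on the previous slide.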

Page 12: Reconsidering Custom Memory Allocation

Space Consumption: Mixed Results

[Figure: Space - Custom Allocator Benchmarks. Y-axis: normalized space (0 to 1.75). Benchmarks: 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, grouped into non-regions and regions. Series: Custom, DLmalloc.]

Page 13: Reconsidering Custom Memory Allocation

Overview

Introduction

Perceived benefits and drawbacks

Three main kinds of custom allocators

Comparison with general-purpose allocators

Advantages and drawbacks of regions

Reaps – generalization of regions & heaps

Page 14: Reconsidering Custom Memory Allocation

Regions – Pros and Cons

+ Fast, convenient, etc.

+ Avoid resource leaks (e.g., Apache)

Tear down memory for terminated connections

- No individual object deletion

Unbounded memory consumption (producer-consumer, long-running computations, off-the-shelf programs)

Apache: vulnerable to DoS, memory leaks
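The unbounded-consumption problem can be illustrated with a toy accounting model. The sketch below is an idealized assumption, not a measurement: it ignores headers, fragmentation, and chunk rounding, and the names Footprint and simulate are hypothetical. It compares how many bytes a region holds with the peak live bytes of a heap when a producer allocates one message at a time and a consumer releases it before the next one arrives.

```cpp
#include <cassert>
#include <cstddef>

// Idealized accounting model (an assumption for illustration): no headers,
// no fragmentation, one message in flight at a time.
struct Footprint {
    std::size_t region_bytes;      // bytes held by a region at the end
    std::size_t heap_peak_bytes;   // peak live bytes with individual free
};

Footprint simulate(std::size_t messages, std::size_t msg_bytes) {
    std::size_t region = 0;      // bump pointer never moves backwards
    std::size_t heap_live = 0;
    std::size_t heap_peak = 0;
    for (std::size_t i = 0; i < messages; ++i) {
        // Producer allocates a message.
        region += msg_bytes;
        heap_live += msg_bytes;
        if (heap_live > heap_peak) heap_peak = heap_live;
        // Consumer is done with it. The heap can reuse the space;
        // the region has no per-object deletion, so nothing is returned.
        assert(heap_live >= msg_bytes);
        heap_live -= msg_bytes;
    }
    return Footprint{region, heap_peak};
}
```

Under these assumptions the region's footprint grows linearly with the number of messages while the heap's stays at one message, so a long-lived connection that keeps allocating from its region keeps consuming memory until the region is torn down.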

Page 15: Reconsidering Custom Memory Allocation

Reap Hybrid Allocator

Reap = region + heap

Adds individual object deletion & heap

reapcreate(r)

reapmalloc(r, sz)

reapfree(r, p)

reapdelete(r)

+ Can reduce memory consumption

+ Fast

+ Adapts to use (region or heap style)

+ Cheap deletion
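The four reap calls above can be sketched as a region that also keeps a list of individually freed objects. This is a simplified illustration, not the Heap Layers implementation: the per-object header, the first-fit search, and the chunk size are assumptions. While nothing has been freed, reapmalloc behaves like a region and bumps a pointer. After a reapfree, it behaves like a heap and reuses freed objects first.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Simplified reap sketch. NOT the Heap Layers code: the object header,
// first-fit reuse, and chunk size are assumptions.
struct alignas(std::max_align_t) Obj {
    std::size_t size;   // usable bytes following this header
    Obj* next;          // link while the object sits on the freed list
};

struct ReapChunk { ReapChunk* next; };

struct Reap {
    ReapChunk* chunks = nullptr;   // all memory owned by the reap
    char* bump = nullptr;          // next unused byte in the current chunk
    char* limit = nullptr;         // end of the current chunk
    Obj* freed = nullptr;          // individually freed objects
};

constexpr std::size_t kReapChunk = 4096;
constexpr std::size_t kReapAlign = alignof(std::max_align_t);

inline std::size_t reapAlign(std::size_t n) {
    return (n + kReapAlign - 1) & ~(kReapAlign - 1);
}

void reapcreate(Reap& r) { r = Reap{}; }

void* reapmalloc(Reap& r, std::size_t sz) {
    sz = reapAlign(sz == 0 ? 1 : sz);
    // Heap-style path: first fit over individually freed objects.
    for (Obj** link = &r.freed; *link != nullptr; link = &(*link)->next) {
        if ((*link)->size >= sz) {
            Obj* o = *link;
            *link = o->next;
            return o + 1;
        }
    }
    // Region-style path: bump allocation, adding a chunk when needed.
    const std::size_t need = sizeof(Obj) + sz;
    if (r.bump == nullptr || static_cast<std::size_t>(r.limit - r.bump) < need) {
        const std::size_t hdr = reapAlign(sizeof(ReapChunk));
        const std::size_t cap = hdr + (need > kReapChunk ? need : kReapChunk);
        ReapChunk* c = static_cast<ReapChunk*>(std::malloc(cap));
        if (c == nullptr) return nullptr;
        c->next = r.chunks;
        r.chunks = c;
        r.bump = reinterpret_cast<char*>(c) + hdr;
        r.limit = reinterpret_cast<char*>(c) + cap;
    }
    Obj* o = reinterpret_cast<Obj*>(r.bump);
    o->size = sz;
    o->next = nullptr;
    r.bump += need;
    return o + 1;
}

void reapfree(Reap& r, void* p) {
    if (p == nullptr) return;
    Obj* o = static_cast<Obj*>(p) - 1;
    assert(o->size > 0);
    o->next = r.freed;             // cheap deletion: one list push
    r.freed = o;
}

void reapdelete(Reap& r) {
    // Region-style teardown: release every chunk, freed or not.
    while (r.chunks != nullptr) {
        ReapChunk* next = r.chunks->next;
        std::free(r.chunks);
        r.chunks = next;
    }
    r = Reap{};
}
```

The per-object header is the price of individual deletion in this sketch. A program that never calls reapfree pays that space cost but otherwise follows the same bump-pointer path as a plain region.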

Page 16: Reconsidering Custom Memory Allocation

Reap Runtime

[Figure: Runtime - Custom Allocation Benchmarks. Y-axis: normalized runtime (0 to 1.75). Benchmarks: 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, grouped into non-regions and regions. Series: Custom, Win32, DLmalloc, Reap.]

Page 17: Reconsidering Custom Memory Allocation

Reap Space

[Figure: Space - Custom Allocator Benchmarks. Y-axis: normalized space (0 to 1.75). Benchmarks: 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, grouped into non-regions and regions. Series: Custom, DLmalloc, Reap.]

Page 18: Reconsidering Custom Memory Allocation

Reap: Best of Both Worlds

Allows mixing of regions and new/delete

Case study:

New Apache module “mod_bc”

bc: C-based arbitrary-precision calculator

Changed 20 lines out of 8000

Benchmark: compute 1000th prime

With Reap: 240K

Without Reap: 7.4MB

Page 19: Reconsidering Custom Memory Allocation

Conclusions and Future Work

Empirical study of custom allocators

Lea allocator often as fast or faster

Non-region custom allocation ineffective

Reap: region performance without drawbacks

Future work:

Reduce space with per-page bitmaps

Combine with scalable general-purpose allocator (e.g., Hoard)

Page 20: Reconsidering Custom Memory Allocation

Software

http://www.cs.umass.edu/~emery

(Reap: part of Heap Layers distribution)

http://g.oswego.edu

(DLmalloc 2.7.0)

Page 21: Reconsidering Custom Memory Allocation

If You Can Read This, I Went Too Far

Page 22: Reconsidering Custom Memory Allocation

Backup Slides

Page 23: Reconsidering Custom Memory Allocation

Experimental Methodology

Comparing to general-purpose allocators

Same semantics: no problem

E.g., disable per-class allocators

Different semantics: use emulator

Uses general-purpose allocator

Adds bookkeeping to support region semantics
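An emulator of the kind described on this slide can be sketched as a thin layer over malloc. This is an assumption-laden illustration, not the emulator used in the study: the Node header, the singly linked list, and the emu_ function names are hypothetical. Every object comes from the general-purpose allocator, and the added bookkeeping is a per-region list that lets one call free them all.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical region emulator. The Node header and the singly linked list
// are assumptions; the slide only says bookkeeping is added.
struct alignas(std::max_align_t) Node {
    Node* next;   // next object allocated in the same region
};

struct EmulatedRegion {
    Node* objects = nullptr;
    std::size_t count = 0;
};

void* emu_regionmalloc(EmulatedRegion& r, std::size_t sz) {
    // Every object comes from the general-purpose allocator...
    Node* n = static_cast<Node*>(std::malloc(sizeof(Node) + sz));
    if (n == nullptr) return nullptr;
    // ...plus one link of bookkeeping so the region can find it later.
    n->next = r.objects;
    r.objects = n;
    ++r.count;
    return n + 1;
}

// Frees every object in the region and returns how many were released.
std::size_t emu_regiondelete(EmulatedRegion& r) {
    std::size_t released = 0;
    while (r.objects != nullptr) {
        Node* next = r.objects->next;
        std::free(r.objects);
        r.objects = next;
        ++released;
    }
    assert(released == r.count);
    r.count = 0;
    return released;
}
```

Swapping a program's region calls for an emulator like this lets the same program run on top of any general-purpose allocator, so the two can be compared on equal terms.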

Page 24: Reconsidering Custom Memory Allocation

Why Did They Do That?

Recommended practice

Premature optimization

Microbenchmarks vs. actual performance

Drift

Not bottleneck anymore

Improved competition

Modern allocators are better

Page 25: Reconsidering Custom Memory Allocation

Reaps as Regions: Runtime

[Figure: Runtime - Region-Based Benchmarks. Y-axis: normalized runtime (0 to 1.75). Benchmarks: lcc, mudlle. Series: Custom, Win32, DLmalloc, Reap.]

Reap performance nearly matches regions

Page 26: Reconsidering Custom Memory Allocation

Using Reap as Regions

[Figure: Runtime - Region-Based Benchmarks. Y-axis: normalized runtime (0 to 2.5). Benchmarks: lcc, mudlle. Series: Original, Win32, DLmalloc, WinHeap, Vmalloc, Reap. One bar runs off the scale and is labeled 4.08.]

Reap performance nearly matches regions

Page 27: Reconsidering Custom Memory Allocation

Drawbacks of Regions

Can’t reclaim memory within regions

Bad for long-running computations, producer-consumer patterns, “malloc/free” programs

Unbounded memory consumption

Current situation for Apache:

vulnerable to denial-of-service

limits runtime of connections

limits module programming

Page 28: Reconsidering Custom Memory Allocation

Use Custom Allocators?

Strongly recommended by practitioners

Little hard data on performance/space improvements

Only one previous study [Zorn 1992]

Focused on just one type of allocator

Custom allocators: waste of time

Small gains, bad allocators

Different allocators better? Trade-offs?

Page 29: Reconsidering Custom Memory Allocation

Kinds of Custom Allocators

Three basic types of custom allocators

Per-class

Fast

Custom patterns

Fast, but very special-purpose

Regions

Fast, possibly more space-efficient

Convenient

Variants: nested, obstacks

Page 30: Reconsidering Custom Memory Allocation

Optimization Opportunity

[Figure: Time Spent in Memory Operations. Y-axis: % of runtime (0 to 100). Benchmarks: 197.parser, boxed-sim, c-breeze, 175.vpr, 176.gcc, apache, lcc, mudlle, plus Average. Series: Memory Operations, Other.]