Comparing GCs and Allocation
Richard Jones, Antony Hosking and Eliot Moss, 2012
Presented by Yarden Marton, 18.11.14
• Comparing different garbage collectors.
• Allocation – methods and considerations.
Outline
Comparing GCs
• What is the best GC?
• When we say “best”, do we mean:
- Best throughput?
- Shortest pause times?
- Good space utilization?
- A compromise combination?
Comparing GCs
• More to consider:
- Application dependency
- Heap space availability
- Heap size
• Throughput
• Pause time
• Space
• Implementation
Comparing GCs - Aspects
• Primary goal for ‘batch’ applications or for systems experiencing delays.
• Does a faster collector mean a faster application? Not necessarily:
- Mutators pay the cost
Throughput
• Algorithmic complexity
• Mark-sweep:
- Cost of tracing and sweeping phases
- Requires visiting every object
• Copying:
- Cost of tracing phase only
- Requires visiting only live objects
Throughput
• Is copying collection faster?
• Not necessarily:
- Number of instructions executed to visit an object
- Locality
- Lazy sweeping
• Throughput
• Pause time
• Space
• Implementation
Comparing GCs - Aspects
Pause Time
• Important for interactive applications, transaction processors and more.
• ‘Stop-the-world’ collectors pause all mutators.
• Immediate attraction to reference counting.
• However:
- Recursive reference-count deletion is costly
- Both improvements of reference counting reintroduce a stop-the-world pause
• Throughput
• Pause time
• Space
• Implementation
Comparing GCs - Aspects
Space
• Important for:
- Tight physical constraints on memory
- Large applications
• All collectors incur space overhead:
- Reference count fields
- Additional heap space
- Heap fragmentation
- Auxiliary data structures
- Room for garbage
Space
• Completeness – eventually reclaiming all dead objects.
- Basic reference counting is incomplete (it cannot reclaim cycles)
• Promptness – reclaiming all dead objects at each collection cycle.
- Basic tracing collectors are prompt (but at a cost)
• Modern high-performance collectors typically trade immediacy for performance.
• Throughput
• Pause time
• Space
• Implementation
Comparing GCs - Aspects
Implementation
• GC algorithms are difficult to implement, especially concurrent ones.
• Errors can manifest themselves long after the mistake was made.
• Tracing:
- Advantage: simple collector–mutator interface
- Disadvantage: determining roots is complicated
• Reference counting:
- Advantage: can be implemented in a library
- Disadvantage: processing overheads, and correctness requires that every reference-count manipulation be performed
• In general, copying and compacting collectors are more complex than non-moving collectors.
Adaptive Systems
• Commercial systems often offer a choice between GCs, with a large number of tuning options.
• Researchers have developed systems that adapt to the environment:
- Java run-time (Soman et al [2004])
- Singer et al [2007a]
- Sun’s Ergonomic tuning
Advice For Developers
• Know your application:
- Measure its behavior
- Track the size and lifetime distributions of the objects it uses
• Experiment with the different collector configurations on offer.
• We considered two styles of collection:
- Direct: reference counting
- Indirect: tracing collection
• Next: an abstract framework for a wide variety of collectors.
A Unified Theory of GC
• GC can be expressed as a fixed-point computation that assigns reference counts ρ(n) to nodes n ∈ Nodes.
• Nodes with a non-zero count are retained; the rest should be reclaimed.
• Use of abstract data structures whose implementations can vary.
• W – a work list of objects to be processed. When it is empty, the algorithm terminates.
Abstract GC
atomic collectTracing():
    rootsTracing(W)    // find root objects
    scanTracing(W)     // mark reachable objects
    sweepTracing()     // free dead objects

rootsTracing(R):
    for each fld in Roots
        ref ← *fld
        if ref ≠ null
            R ← R + [ref]

scanTracing(W):
    while not isEmpty(W)
        src ← remove(W)
        ρ(src) ← ρ(src) + 1
        if ρ(src) = 1
            for each fld in Pointers(src)
                ref ← *fld
                if ref ≠ null
                    W ← W + [ref]
Abstract Tracing GC Algorithm
sweepTracing():
    for each node in Nodes
        if ρ(node) = 0
            free(node)
        else
            ρ(node) ← 0

New():
    ref ← allocate()
    if ref = null
        collectTracing()
        ref ← allocate()
        if ref = null
            error “Out of memory”
    ρ(ref) ← 0
    return ref
Abstract Tracing GC Algorithm (Continued)
[Figure: step-by-step run of the abstract tracing collector on a four-object heap A, B, C, D, with roots referring to B and C. rootsTracing places B and C on the work list W; scanTracing pops objects one by one, increments ρ, and traces fields on the first visit, ending with ρ(A)=1, ρ(B)=2, ρ(C)=1, ρ(D)=0; sweepTracing frees the unreached D and resets the survivors’ counts to 0.]
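The abstract tracing collector above can be condensed into a short runnable sketch. This is a toy simulation, assuming a Node class with explicit field lists in place of a real heap; rho stands for the abstract count ρ(n).

```python
# Toy simulation of the abstract tracing collector (assumption: objects are
# Node instances with explicit field lists, not a real runtime heap).

class Node:
    def __init__(self, name):
        self.name = name
        self.rho = 0        # the abstract reference count rho(n)
        self.fields = []    # outgoing pointers

def collect_tracing(roots, nodes):
    # rootsTracing: seed the work list W with the root objects
    work = [ref for ref in roots if ref is not None]
    # scanTracing: pop, count, and trace each object on its first visit
    while work:
        src = work.pop()
        src.rho += 1
        if src.rho == 1:
            for ref in src.fields:
                if ref is not None:
                    work.append(ref)
    # sweepTracing: free unreached nodes, reset the counts of survivors
    live = []
    for node in nodes:
        if node.rho == 0:
            pass            # free(node) in a real collector
        else:
            node.rho = 0
            live.append(node)
    return live
```

For instance, with roots referring to B and C, and edges C→A, A→B and D→B, the collector retains A, B and C and frees the unreached D.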
atomic collectCounting(I, D):
    applyIncrements(I)    // apply buffered increments
    scanCounting(D)       // apply decrements recursively
    sweepCounting()       // free dead objects

applyIncrements(I):
    while not isEmpty(I)
        ref ← remove(I)
        ρ(ref) ← ρ(ref) + 1

scanCounting(W):
    while not isEmpty(W)
        src ← remove(W)
        ρ(src) ← ρ(src) - 1
        if ρ(src) = 0
            for each fld in Pointers(src)
                ref ← *fld
                if ref ≠ null
                    W ← W + [ref]
Abstract Reference Counting GC Algorithm
sweepCounting():
    for each node in Nodes
        if ρ(node) = 0
            free(node)

New():
    ref ← allocate()
    if ref = null
        collectCounting(I, D)
        ref ← allocate()
        if ref = null
            error “Out of memory”
    ρ(ref) ← 0
    return ref
Abstract Reference Counting GC Algorithm (Continued)
inc(ref):
    if ref ≠ null
        I ← I + [ref]

dec(ref):
    if ref ≠ null
        D ← D + [ref]

atomic Write(src, i, dst):
    inc(dst)
    dec(src[i])
    src[i] ← dst
Abstract reference counting GC Algorithm (Continued)
[Figure: step-by-step run of the abstract reference-counting collector on the same four-object heap, starting with buffered increments I = [A, B, A, D, B, C, B] and decrements D = [A, D]. applyIncrements raises the counts to ρ(A)=2, ρ(B)=3, ρ(C)=1, ρ(D)=1; scanCounting applies the decrements, bringing ρ(A) to 1 and ρ(D) to 0 and recursively decrementing D’s child B to 2; sweepCounting then frees object D.]
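The buffered scheme above can also be sketched as runnable code. The Obj class and the explicit I/D lists are toy assumptions standing in for a real heap and mutator log; root references are modeled by appending to I by hand.

```python
# Toy simulation of the abstract reference-counting collector with buffered
# increments (I) and decrements (D). All structures are assumptions, not a
# real runtime.

class Obj:
    def __init__(self, name, nfields=2):
        self.name = name
        self.rc = 0
        self.fields = [None] * nfields

class CountingCollector:
    def __init__(self):
        self.I = []      # buffered increments
        self.D = []      # buffered decrements
        self.freed = []  # names of reclaimed objects (for inspection)

    def write(self, src, i, dst):
        # Write(src, i, dst): buffer inc(dst) and dec(old value), then store
        if dst is not None:
            self.I.append(dst)
        if src.fields[i] is not None:
            self.D.append(src.fields[i])
        src.fields[i] = dst

    def collect(self, nodes):
        for ref in self.I:           # applyIncrements
            ref.rc += 1
        self.I = []
        work, self.D = self.D, []    # scanCounting: recursive decrements
        while work:
            src = work.pop()
            src.rc -= 1
            if src.rc == 0:
                work.extend(f for f in src.fields if f is not None)
        for node in nodes:           # sweepCounting
            if node.rc == 0:
                self.freed.append(node.name)
```

Deleting the last (modeled) root reference to an object lets the next collection recursively decrement and free everything reachable only from it.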
atomic collectDrc(I, D):
    rootsTracing(I)       // add root objects to I
    applyIncrements(I)    // apply buffered increments
    scanCounting(D)       // apply decrements recursively
    sweepCounting()       // free dead objects
    rootsTracing(D)       // keep the invariant
    applyDecrements(D)

New():
    ref ← allocate()
    if ref = null
        collectDrc(I, D)
        ref ← allocate()
        if ref = null
            error “Out of memory”
    ρ(ref) ← 0
    return ref
Abstract deferred reference counting GC Algorithm
atomic Write(src, i, dst):
    if src ≠ Roots
        inc(dst)
        dec(src[i])
    src[i] ← dst

applyDecrements(D):
    while not isEmpty(D)
        ref ← remove(D)
        ρ(ref) ← ρ(ref) - 1
Abstract deferred reference counting GC Algorithm (Continued)
[Figure: step-by-step run of the abstract deferred reference-counting collector, starting with I = [A, B, A, D, B] and D = [A, D]. rootsTracing adds the roots B and C to I; applyIncrements raises the counts to ρ(A)=2, ρ(B)=3, ρ(C)=1, ρ(D)=1; scanCounting brings ρ(A) to 1 and ρ(D) to 0, recursively decrementing B to 2; sweepCounting frees object D; finally rootsTracing(D) and applyDecrements lower B to 1 and C to 0, restoring the invariant that counts exclude root references.]
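Deferral can be layered onto the same toy simulation: stores into roots skip the write barrier, and each collection counts the roots in at the start and back out at the end. Obj and the roots list are illustrative assumptions, not a real runtime.

```python
# Toy simulation of deferred reference counting: only heap stores go through
# the barrier; root references are counted in (and back out) per collection.

class Obj:
    def __init__(self, name, nfields=2):
        self.name = name
        self.rc = 0
        self.fields = [None] * nfields

class DeferredRC:
    def __init__(self):
        self.roots = []   # current root set (updated without a barrier)
        self.I, self.D = [], []
        self.freed = []

    def heap_write(self, src, i, dst):
        # barrier for heap objects only (Write with src != Roots)
        if dst is not None:
            self.I.append(dst)
        if src.fields[i] is not None:
            self.D.append(src.fields[i])
        src.fields[i] = dst

    def collect(self, nodes):
        self.I.extend(r for r in self.roots if r is not None)  # rootsTracing(I)
        for ref in self.I:                                     # applyIncrements
            ref.rc += 1
        self.I = []
        work, self.D = self.D, []                              # scanCounting
        while work:
            src = work.pop()
            src.rc -= 1
            if src.rc == 0:
                work.extend(f for f in src.fields if f is not None)
        for node in nodes:                                     # sweepCounting
            if node.rc == 0:
                self.freed.append(node.name)
        for r in self.roots:              # rootsTracing(D) + applyDecrements:
            if r is not None:             # non-recursive, restores invariant
                r.rc -= 1
```

Between collections a rooted object can sit at count 0; it is only during collectDrc, after the roots are counted in, that a zero count means dead.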
Comparing GCs Summary
• GC performance depends on various aspects.
- Therefore, no GC has an absolute advantage over the others.
• Garbage collection can be expressed in an abstract way.
- This highlights similarities and differences.
Allocation
• Three aspects to memory management:
- Allocation of memory in the first place
- Identification of live data
- Reclamation for future use
• Allocation and reclamation of memory are tightly linked.
• Several key differences between automatic and explicit memory management, in terms of allocating and freeing:
- GC frees space all at once
- A system with GC has more information when allocating
- With GC, users tend to write programs in a different style
• Uses a large free chunk of memory.
• Given a request for n bytes, it allocates that much from one end of the free chunk.
sequentialAllocate(n):
    result ← free
    newFree ← result + n
    if newFree > limit
        return null
    free ← newFree
    return result
Sequential Allocation
[Figure: sequential allocation. The free pointer separates allocated space from available space; a request for n bytes (plus any alignment padding) advances free toward limit, and the old value of free is returned as result.]
• Properties:
- Simple
- Efficient
- Better cache locality
- May be less suitable for non-moving collectors
Sequential Allocation
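The bump-pointer scheme above fits in a few lines of runnable code. The address range is a toy stand-in for a real chunk; alignment padding is omitted.

```python
# Minimal sketch of sequential ("bump pointer") allocation over a toy
# address range; alignment handling is omitted for brevity.

class BumpAllocator:
    def __init__(self, start, limit):
        self.free = start    # next free address
        self.limit = limit   # end of the free chunk

    def allocate(self, n):
        result = self.free
        new_free = result + n
        if new_free > self.limit:
            return None      # out of space: caller must collect or fail
        self.free = new_free
        return result
```

For example, BumpAllocator(0, 100) hands out addresses 0 and 40 for two 40-byte requests, then returns None once fewer than n bytes remain, at which point a real system would collect.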
• A data structure records the location and size of free cells of memory.
• The allocator considers each free cell in turn, and according to some policy, chooses one to allocate.
• Three basic types of free-list allocation:
- First-fit
- Next-fit
- Best-fit
Free-list Allocation
First-fit Allocation
• Use the first cell that can satisfy the allocation request.
• The cell may be split, unless the remainder would be too small.
firstFitAllocate(n):
    prev ← addressOf(head)
    loop
        curr ← next(prev)
        if curr = null
            return null
        else if size(curr) < n
            prev ← curr
        else
            return listAllocate(prev, curr, n)

listAllocate(prev, curr, n):
    result ← curr
    if shouldSplit(size(curr), n)
        remainder ← result + n
        next(remainder) ← next(curr)
        size(remainder) ← size(curr) - n
        next(prev) ← remainder
    else
        next(prev) ← next(curr)
    return result

listAllocateAlt(prev, curr, n):
    // variant: allocate from the end of the cell
    if shouldSplit(size(curr), n)
        size(curr) ← size(curr) - n
        result ← curr + size(curr)
    else
        next(prev) ← next(curr)
        result ← curr
    return result
First-fit Allocation
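A runnable sketch of first-fit over a Python list of (address, size) cells kept in address order; a real allocator would thread the list through the free cells themselves, and the MIN_REMAINDER split threshold is an assumption standing in for shouldSplit.

```python
# First-fit over a toy free list of (addr, size) tuples in address order.

MIN_REMAINDER = 16   # below this, don't split the cell (assumption)

def first_fit_allocate(free_list, n):
    for i, (addr, size) in enumerate(free_list):
        if size >= n:
            if size - n >= MIN_REMAINDER:      # split: keep the remainder
                free_list[i] = (addr + n, size - n)
            else:                              # use the whole cell
                del free_list[i]
            return addr
    return None                                # no cell is large enough
```

On the example that follows (cells of 150, 100, 170, 300 and 50 KB), a 120 KB request takes the first cell and leaves a 30 KB remainder, and a 50 KB request then skips that remainder and splits the 100 KB cell.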
First-fit
[Figure: free cells of 150, 100, 170, 300 and 50 KB. A 120 KB request splits the first (150 KB) cell, leaving 30 KB; a 50 KB request splits the 100 KB cell, leaving 50 KB; a 200 KB request splits the 300 KB cell, leaving 100 KB.]
• Small remainder cells accumulate near the front of the list, slowing down allocation.
• In terms of space utilization, first-fit may behave similarly to best-fit.
• One issue is where in the list to insert a newly freed cell.
• It is usually most natural to build the list in address order, as mark-sweep does.
First-fit Allocation
• A variation of first-fit.
• Method: start the search for a suitably sized cell from the point in the list where the last search succeeded.
• When reaching the end of the list, start over from the beginning.
• Idea: reduce the need to iterate repeatedly past the small cells at the head of the list.
• Drawbacks:
- Fragmentation
- Poor locality when accessing the list
- Poor locality of the allocated objects
Next-fit Allocation
nextFitAllocate(n):
    start ← prev
    loop
        curr ← next(prev)
        if curr = null
            prev ← addressOf(head)    // wrap around
            curr ← next(prev)
        if prev = start
            return null
        else if size(curr) < n
            prev ← curr
        else
            return listAllocate(prev, curr, n)
Next-fit Allocation Algorithm
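The roving-pointer idea can be sketched with an index into the same toy list of (address, size) cells. Exact resumption policies vary (some implementations resume at, some just past, the last cell); this sketch resumes at it, and MIN_REMAINDER is again an assumed split threshold.

```python
# Next-fit over a toy free list: the search resumes from a roving index
# where the last allocation succeeded, wrapping at the end of the list.

MIN_REMAINDER = 16   # below this, don't split the cell (assumption)

class NextFit:
    def __init__(self, free_list):
        self.cells = free_list        # (addr, size) tuples
        self.cursor = 0               # roving pointer

    def allocate(self, n):
        if not self.cells:
            return None
        start = self.cursor % len(self.cells)
        i = start
        while True:
            addr, size = self.cells[i]
            if size >= n:
                if size - n >= MIN_REMAINDER:
                    self.cells[i] = (addr + n, size - n)
                else:
                    del self.cells[i]
                self.cursor = i       # remember where we succeeded
                return addr
            i = (i + 1) % len(self.cells)
            if i == start:            # searched the whole list
                return None
```

Because the cursor does not reset, a large request that succeeds late in the list lets subsequent requests skip the small cells near the head.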
Next-fit
[Figure: the same free cells of 150, 100, 170, 300 and 50 KB. A 120 KB request splits the 150 KB cell, leaving 30 KB; a 20 KB request resumes from there and splits the 100 KB cell, leaving 80 KB; a 50 KB request then splits the 170 KB cell, leaving 120 KB.]
• Method: find the cell whose size most closely matches the allocation request.
• Idea:
- Minimize waste
- Avoid splitting large cells unnecessarily
• Bad worst case: the whole list may be searched.
Best-fit Allocation
bestFitAllocate(n):
    best ← null
    bestSize ← ∞
    prev ← addressOf(head)
    loop
        curr ← next(prev)
        if curr = null || size(curr) = n
            if curr ≠ null
                bestPrev ← prev
                best ← curr
            else if best = null
                return null
            return listAllocate(bestPrev, best, n)
        else if size(curr) < n || bestSize < size(curr)
            prev ← curr
        else
            best ← curr
            bestPrev ← prev
            bestSize ← size(curr)
            prev ← curr

Best-fit Allocation Algorithm
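The same toy list can illustrate best-fit: scan the whole list, remember the tightest cell that still fits, and stop early on an exact fit. MIN_REMAINDER is an assumed split threshold.

```python
# Best-fit over a toy free list of (addr, size) tuples: take the cell whose
# size most closely matches the request, stopping early on an exact fit.

MIN_REMAINDER = 16   # below this, don't split the cell (assumption)

def best_fit_allocate(free_list, n):
    best = None
    for i, (addr, size) in enumerate(free_list):
        if size == n:
            best = i
            break                     # exact fit: no better cell exists
        if size >= n and (best is None or size < free_list[best][1]):
            best = i                  # tightest fit seen so far
    if best is None:
        return None
    addr, size = free_list[best]
    if size - n >= MIN_REMAINDER:
        free_list[best] = (addr + n, size - n)
    else:
        del free_list[best]
    return addr
```

Note the worst case: unless an exact fit appears, every cell is examined before a choice is made.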
Best-fit
[Figure: the same free cells of 150, 100, 170, 300 and 50 KB. A 90 KB request picks the closest fit, the 100 KB cell, leaving 10 KB; a 50 KB request consumes the 50 KB cell exactly; a 100 KB request then splits the 150 KB cell, leaving 50 KB.]
• Use of a balanced binary tree.
• Sorted by size (for best-fit) or by address (for first-fit or next-fit).
• If sorted by size, only one cell of each size need be entered.
• Example: a Cartesian tree for first/next-fit:
- Indexed by address (primary key) and size (secondary key)
- Totally ordered by address
- Organized as a heap for the sizes
Speeding Free-list Allocation
• Searching the Cartesian tree under a first-fit policy:

firstFitAllocateCartesian(n):
    parent ← null
    curr ← root
    loop
        if left(curr) ≠ null && max(left(curr)) ≥ n
            parent ← curr
            curr ← left(curr)
        else if prev < curr && size(curr) ≥ n
            prev ← curr
            return treeAllocate(curr, parent, n)
        else if right(curr) ≠ null && max(right(curr)) ≥ n
            parent ← curr
            curr ← right(curr)
        else
            return null
Speeding Free-list Allocation
• Fragmentation: the dispersal of free memory across a possibly large number of small free cells.
• Negative effects:
- Can prevent allocation from succeeding
- May cause a program to use more address space, more resident pages and more cache lines
• Fragmentation is impractical to avoid:
- Usually the allocator cannot know what the future request sequence will be
- Even given a known request sequence, optimal allocation is NP-hard
• There is usually a trade-off between allocation speed and fragmentation.
Fragmentation
• Idea – use multiple free-lists whose members are segregated by size, in order to speed allocation.
• Usually a fixed number k of size values s0 < s1 < … < sk-1.
• k+1 free lists f0, …, fk.
• For a free cell b on list fi: size(b) = si if i < k, and size(b) > sk-1 if i = k.
• When requesting a cell of size b ≤ sk-1, the allocator rounds the request size up to the smallest si such that b ≤ si.
• si is called a size class.
segregatedFitAllocate(j):
    result ← remove(freeLists[j])
    if result = null
        large ← allocateBlock()
        if large = null
            return null
        initialize(large, sizes[j])
        result ← remove(freeLists[j])
    return result
• List fk, for cells larger than sk-1, is organized using one of the basic single-list algorithms.
• Per-cell overheads for large cells are a bit higher, but in total this is negligible.
• The main advantage: for every size class other than fk, allocation typically requires only constant time.
Segregated-fits Allocation
[Figure: segregated free lists f0 … fk-1, one per size class s0 … sk-1, plus list fk holding cells larger than sk-1.]
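The rounding and block-refilling behavior can be sketched in runnable form. BLOCK, the SIZES classes and the bump-style block source are illustrative assumptions; the fall-back single-list allocator for oversized requests is not shown.

```python
# Segregated-fits sketch: one free list per size class, refilled a block at
# a time. Block size, size classes and the block source are assumptions.

BLOCK = 4096
SIZES = [16, 32, 64, 128]           # size classes s0 < s1 < ... < s(k-1)

class SegregatedFits:
    def __init__(self):
        self.free_lists = {s: [] for s in SIZES}
        self.next_block = 0         # toy block allocator: bump addresses

    def _refill(self, s):
        # dedicate a whole block to class s and slice it into cells
        base = self.next_block
        self.next_block += BLOCK
        self.free_lists[s] = [base + off for off in range(0, BLOCK, s)]

    def allocate(self, n):
        for s in SIZES:             # round n up to the smallest class
            if n <= s:
                if not self.free_lists[s]:
                    self._refill(s)
                return self.free_lists[s].pop()
        return None                 # larger than s(k-1): would fall back
                                    # to a single-list allocator (not shown)
```

Once a class's list is populated, allocation is a constant-time pop; the rounding from n up to the class size is exactly the internal fragmentation discussed below.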
• In simple free-list allocators: free cells that are too small to satisfy a request. Called external fragmentation.
• In segregated-fits allocation: space wasted inside an individual cell because the requested size was rounded up. Called internal fragmentation.
More on Fragmentation
• Important consideration – how to populate each free-list of segregated-fits.
• Two approaches:
- Dedicating whole blocks to particular sizes
- Splitting
Populating size classes
• Choose some block size B, a power of two.
• The allocator is provided with blocks.
• If the request is larger than one block, multiple contiguous blocks are allocated.
• For a size class s < B, we populate the free-list fs by allocating a block and immediately slicing it into cells of size s.
• The cells’ metadata is stored on the block.
Big Bag of Pages (Block-based Allocation)
• Disadvantage:
- Fragmentation: average waste of half a block (worst case (B-s)/B)
• Advantages:
- Reduced per-cell metadata
- Simple and efficient for the common case
Big Bag of Pages (Block-based Allocation)
• As in simple free-list schemes, split a cell if that is the only way to satisfy a request.
• Improvement: return the remaining portion to a suitable free-list (if possible).
• For example, the buddy system:
- Size classes are powers of two
- A cell of size 2^(i+1) can be split into two cells of size 2^i
- Cells can be combined in the opposite direction (only if the two small cells were split from the same large cell)
Splitting
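The split-and-recombine behavior of a binary buddy system can be sketched as follows. Sizes are in KB to match the example, the 16/128 KB limits are assumptions, and the XOR trick for finding a buddy assumes the region starts at an address aligned to the maximum cell size.

```python
# Binary buddy system sketch: power-of-two cells, splitting on allocation
# and coalescing with the buddy on free. Limits (in KB) are assumptions.

MIN, MAX = 16, 128                 # min/max cell sizes

class Buddy:
    def __init__(self):
        self.free = {s: [] for s in (16, 32, 64, 128)}
        self.free[MAX] = [0]       # one free 128 KB cell at address 0

    def allocate(self, n):
        size = MIN
        while size < n:            # round up to a power-of-two class
            size *= 2
        if size > MAX:
            return None
        s = size
        while s <= MAX and not self.free[s]:
            s *= 2                 # find a larger cell to split
        if s > MAX:
            return None
        addr = self.free[s].pop()
        while s > size:            # split down, freeing each upper buddy
            s //= 2
            self.free[s].append(addr + s)
        return addr, size

    def free_cell(self, addr, size):
        buddy = addr ^ size        # buddy address differs in exactly one bit
        while size < MAX and buddy in self.free[size]:
            self.free[size].remove(buddy)   # coalesce with the free buddy
            addr = min(addr, buddy)
            size *= 2
            buddy = addr ^ size
        self.free[size].append(addr)
```

A 20 KB request is rounded up to 32 KB and carved out of the 128 KB cell by two splits; freeing everything recombines the buddies back into the original 128 KB cell, mirroring the example below.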
The Buddy System
[Figure: buddy-system example on a 128 KB region with a 16 KB minimum cell size. A 20 KB request splits 128 KB into two 64 KB buddies and one of them into two 32 KB buddies, allocating a 32 KB cell (12 KB wasted); a 10 KB request splits the remaining 32 KB cell into two 16 KB buddies, allocating one (6 KB wasted). Freeing the 10 KB cell recombines the 16 KB buddies into 32 KB; freeing the 20 KB cell then recombines 32+32 into 64 KB and 64+64 back into the original 128 KB cell.]
• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality
Allocation’s Additional Considerations
• Allocated objects may require special alignment.
• For example, double-word alignment for floating-point values:
- Making the granule a double-word is wasteful
- The header of an array in Java takes 3 words, so one word is wasted or skipped
Alignment
• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality
Allocation’s Additional Considerations
• Some collection schemes require a minimum amount of space in each cell:
- Forwarding address
- Lock/status word
• In that case, the allocator will allocate more words than requested.
Size Constraints
• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality
Allocation’s Additional Considerations
• An additional header or boundary tag is associated with each cell.
• It is found outside the storage available to the program.
• It indicates the cell’s size and its allocated/free status.
• It is one or two words long.
• A bitmap may be used instead.
Boundary Tags
• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality
Allocation’s Additional Considerations
• The ability to advance from cell to cell in the heap.
• An object’s header (one or two words) may hold:
- Type
- Hash code
- Synchronization information
- Mark bit
• The header comes before the data.
• The reference refers to the first element/field.
Heap Parsability
• How to handle alignment gaps?
- Zero all free space in advance
- Devise a distinct range of values to write at the start of the gap
• Parsing is easier with a bitmap indicating where each object starts:
- Requires additional space and time
Heap Parsability
• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality
Allocation’s Additional Considerations
• During allocation:
- Address-ordered free-lists and sequential allocation exhibit good locality
• During freeing:
- Goal: objects freed together should end up near each other
- Empirically, objects allocated at the same time often become unreachable at about the same time
Locality
• Multiple threads allocating concurrently.
• Most steps in allocation need to be atomic.
• This can become a bottleneck.
• Basic solution: each thread has its own allocation area.
• Use of a global pool and smart chunk handling.
Allocation in Concurrent Systems
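The per-thread-area idea can be sketched as follows: each thread bump-allocates from its own chunk and takes the shared lock only to grab a fresh chunk from the global pool. The chunk size and the pool structure are toy assumptions.

```python
# Sketch of per-thread allocation: bump allocation in a thread-local chunk,
# with a lock taken only when refilling from the global pool (assumptions).

import threading

CHUNK = 1024

class ChunkPool:
    def __init__(self, heap_size):
        self.lock = threading.Lock()
        self.next = 0
        self.limit = heap_size

    def get_chunk(self):
        with self.lock:               # the only synchronized step
            if self.next + CHUNK > self.limit:
                return None
            base = self.next
            self.next += CHUNK
            return base

class ThreadAllocator:
    def __init__(self, pool):
        self.pool = pool
        self.free = self.limit = 0    # empty local buffer

    def allocate(self, n):
        if self.free + n > self.limit:   # local buffer exhausted: refill
            base = self.pool.get_chunk()
            if base is None:
                return None              # pool empty: collect or fail
            self.free, self.limit = base, base + CHUNK
        result = self.free               # bump within the local chunk,
        self.free += n                   # no synchronization needed
        return result
```

Because the fast path touches only thread-local state, the lock is contended only once per chunk rather than once per object.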
Allocation Summary
• Methods:
- Sequential
- Free-list: first-fit, next-fit and best-fit
- Segregated-fits
• Various considerations to note.