
Comparing GCs and Allocation

Richard Jones, Antony Hosking and Eliot Moss, 2012

Presented by Yarden Marton, 18.11.14

• Comparing different garbage collectors.

• Allocation – methods and considerations.

Outline

Comparing GCs

• What is the best GC?
• When we say "best", do we mean:

- Best throughput?
- Shortest pause times?
- Good space utilization?
- Some compromise combining these?

Comparing GCs

• More to consider:
- Application dependency
- Heap space availability
- Heap size

• Throughput
• Pause time
• Space
• Implementation

Comparing GCs - Aspects

• Throughput
• Pause time
• Space
• Implementation

Comparing GCs - Aspects

• Primary goal for ‘batch’ applications or for systems experiencing delays.

• Does a faster collector mean a faster application? Not necessarily.
– Mutators pay the cost

Throughput

• Algorithmic complexity
• Mark-sweep:
- Cost of tracing and sweeping phases
- Requires visiting every object

• Copying:
- Cost of tracing phase only
- Requires visiting only live objects

Throughput

• Is copying collection faster?
• Not necessarily:

- Number of instructions executed to visit an object
- Locality
- Lazy sweeping

• Throughput
• Pause time
• Space
• Implementation

Comparing GCs - Aspects

Pause Time

• Important for interactive applications, transaction processors and more.

• ‘Stop-the-world’ collectors
• Immediate attraction to reference counting
• However:

- Recursive freeing when a count drops to zero is costly
- Both improvements of reference counting reintroduce a stop-the-world pause

• Throughput
• Pause time
• Space
• Implementation

Comparing GCs - Aspects

Space

• Important for:
- Tight physical constraints on memory
- Large applications

• All collectors incur space overhead:
- Reference count fields
- Additional heap space
- Heap fragmentation
- Auxiliary data structures
- Room for garbage

Space

• Completeness – reclaiming all dead objects eventually.
- Basic reference counting is incomplete (it cannot reclaim cycles)

• Promptness – reclaiming all dead objects at each collection cycle.
- Basic tracing collectors are prompt (but at a cost)

• Modern high-performance collectors typically trade immediacy for performance.

• Throughput
• Pause time
• Space
• Implementation

Comparing GCs - Aspects

Implementation

• GC algorithms are difficult to implement, especially concurrent algorithms.

• Errors can manifest themselves long afterwards
• Tracing:
- Advantage: simple collector-mutator interface
- Disadvantage: determining roots is complicated

• Reference counting:
- Advantage: can be implemented in a library
- Disadvantage: processing overheads, and correctness requires that every reference count manipulation be performed

• In general, copying and compacting collectors are more complex than non-moving collectors.

Adaptive Systems

• Commercial systems often offer a choice between GCs, with a large number of tuning options.

• Researchers have developed systems that adapt to the environment:
- Java run-time (Soman et al [2004])
- Singer et al [2007a]
- Sun’s Ergonomic tuning

Advice For Developers

• Know your application:
- Measure its behavior
- Track the size and lifetime distributions of the objects it uses.

• Experiment with the different collector configurations on offer.

• We have considered two styles of collection:
– Direct: reference counting.
– Indirect: tracing collection.

• Next: An abstract framework for a wide variety of collectors.

A Unified Theory of GC

• GC can be expressed as a fixed-point computation that assigns reference counts ρ(n) to nodes n ∈ Nodes.

• Nodes with non-zero count are retained and the rest should be reclaimed.

• Use of abstract data structures whose implementations can vary.
• W – a work list of objects to be processed. When it is empty, the algorithm terminates.

Abstract GC

atomic collectTracing():
    rootsTracing(W)    // find root objects
    scanTracing(W)     // mark reachable objects
    sweepTracing()     // free dead objects

rootsTracing(R):
    for each fld in Roots
        ref ← *fld
        if ref ≠ null
            R ← R + [ref]

scanTracing(W):
    while not isEmpty(W)
        src ← remove(W)
        ρ(src) ← ρ(src) + 1
        if ρ(src) = 1
            for each fld in Pointers(src)
                ref ← *fld
                if ref ≠ null
                    W ← W + [ref]

Abstract Tracing GC Algorithm

sweepTracing():
    for each node in Nodes
        if ρ(node) = 0
            free(node)
        else
            ρ(node) ← 0

New():
    ref ← allocate()
    if ref = null
        collectTracing()
        ref ← allocate()
        if ref = null
            error "Out of memory"
    ρ(ref) ← 0
    return ref

Abstract Tracing GC Algorithm (Continued)
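The abstract tracing collector above can be sketched as a short runnable program. This is an illustrative Python model (the names collect_tracing, heap and rho are mine, not from the pseudocode), representing the heap as a dictionary from node names to lists of out-pointers:

```python
def collect_tracing(heap, roots):
    """Fixed-point computation of abstract reference counts (rho)."""
    rho = {n: 0 for n in heap}                  # rho(n) = 0 for every node
    work = [r for r in roots if r is not None]  # rootsTracing(W)
    while work:                                 # scanTracing(W)
        src = work.pop()
        rho[src] += 1
        if rho[src] == 1:                       # first visit: trace children
            for ref in heap[src]:
                if ref is not None:
                    work.append(ref)
    # sweepTracing: nodes left with rho = 0 are dead; survivors' counts
    # would then be reset to 0 for the next cycle
    return {n for n in heap if rho[n] > 0}

# Roots reach B and C; A and B point at each other; D is unreachable garbage.
heap = {'A': ['B'], 'B': ['A'], 'C': [], 'D': ['A']}
live = collect_tracing(heap, ['B', 'C'])        # {'A', 'B', 'C'}; D is freed
```

Any node pushed at least once from a root or a traced edge ends with a non-zero count, which is exactly the abstract algorithm's liveness criterion.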

[Worked example (figure): a heap with objects A–D, where the roots reference B and C, C references A and B, and A references B; D is garbage. scanTracing pops B (ρ(B)=1), then C (ρ(C)=1, pushing A and B), then A (ρ(A)=1, pushing B), then B again (ρ(B)=2), leaving W empty. sweepTracing frees D, whose count is still 0, and resets the survivors' counts to 0.]

atomic collectCounting(I, D):
    applyIncrements(I)    // apply buffered increments
    scanCounting(D)       // apply decrements recursively
    sweepCounting()       // free dead objects

applyIncrements(I):
    while not isEmpty(I)
        ref ← remove(I)
        ρ(ref) ← ρ(ref) + 1

scanCounting(W):
    while not isEmpty(W)
        src ← remove(W)
        ρ(src) ← ρ(src) − 1
        if ρ(src) = 0
            for each fld in Pointers(src)
                ref ← *fld
                if ref ≠ null
                    W ← W + [ref]

Abstract reference counting GC Algorithm

sweepCounting():
    for each node in Nodes
        if ρ(node) = 0
            free(node)

New():
    ref ← allocate()
    if ref = null
        collectCounting(I, D)
        ref ← allocate()
        if ref = null
            error "Out of memory"
    ρ(ref) ← 0
    return ref

Abstract reference counting GC Algorithm (Continued)

inc(ref):
    if ref ≠ null
        I ← I + [ref]

dec(ref):
    if ref ≠ null
        D ← D + [ref]

atomic Write(src, i, dst):
    inc(dst)
    dec(src[i])
    src[i] ← dst

Abstract reference counting GC Algorithm (Continued)
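To make the buffered-increment scheme concrete, here is a small self-contained Python model (the class name RCHeap and its fields are my own; each object is simplified to a map of pointer fields, and root references are not modeled):

```python
class RCHeap:
    """Abstract reference counting with buffered increments (I) and decrements (D)."""
    def __init__(self, nodes):
        self.ptr = {n: {} for n in nodes}   # src -> field index -> dst
        self.rho = {n: 0 for n in nodes}    # reference counts
        self.I, self.D = [], []             # buffered inc/dec operations

    def write(self, src, i, dst):           # atomic Write(src, i, dst)
        if dst is not None:
            self.I.append(dst)              # inc(dst)
        old = self.ptr[src].get(i)
        if old is not None:
            self.D.append(old)              # dec(src[i])
        self.ptr[src][i] = dst

    def collect_counting(self):             # atomic collectCounting(I, D)
        for ref in self.I:                  # applyIncrements(I)
            self.rho[ref] += 1
        self.I = []
        work, self.D = self.D, []
        while work:                         # scanCounting(D): recursive decrement
            src = work.pop()
            self.rho[src] -= 1
            if self.rho[src] == 0:
                work.extend(r for r in self.ptr[src].values() if r is not None)
        # sweepCounting: anything left with rho = 0 is dead
        return {n for n, c in self.rho.items() if c > 0}

h = RCHeap(['A', 'B', 'C'])
h.write('A', 0, 'B')          # buffers an increment for B
h.write('A', 0, 'C')          # buffers an increment for C, a decrement for B
live = h.collect_counting()   # {'C'}: B's count goes +1 then -1
```

Since root slots are not modeled, A itself ends with count 0 in this toy; the deferred variant below is where root handling is made explicit.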

[Worked example (figure): objects A–D, all counts initially 0, with increment buffer I = [A, B, A, D, B, C, B] and decrement buffer D = [A, D]. applyIncrements raises the counts to ρ(A)=2, ρ(B)=3, ρ(C)=1, ρ(D)=1. scanCounting then applies the decrements: ρ(A) drops to 1, ρ(D) drops to 0, so D's target B is decremented recursively to 2, leaving counts (1, 2, 1, 0). sweepCounting frees D.]

atomic collectDrc(I, D):
    rootsTracing(I)       // add root objects to I
    applyIncrements(I)    // apply buffered increments
    scanCounting(D)       // apply decrements recursively
    sweepCounting()       // free dead objects
    rootsTracing(D)       // keep the invariant
    applyDecrements(D)

New():
    ref ← allocate()
    if ref = null
        collectDrc(I, D)
        ref ← allocate()
        if ref = null
            error "Out of memory"
    ρ(ref) ← 0
    return ref

Abstract deferred reference counting GC Algorithm

atomic Write(src, i, dst):
    if src ≠ Roots
        inc(dst)
        dec(src[i])
    src[i] ← dst

applyDecrements(D):
    while not isEmpty(D)
        ref ← remove(D)
        ρ(ref) ← ρ(ref) − 1

Abstract deferred reference counting GC Algorithm (Continued)
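A compact Python sketch of one deferred-RC cycle (the function name collect_drc is mine, and root slots are passed in explicitly rather than scanned from thread stacks):

```python
def collect_drc(heap, roots, rho, I, D):
    """One deferred reference-counting cycle: root references are counted
    only during collection, so mutator writes to root slots need no barrier."""
    I = list(I) + list(roots)           # rootsTracing(I): add root objects to I
    for ref in I:                       # applyIncrements(I)
        rho[ref] += 1
    work = list(D)
    while work:                         # scanCounting(D): recursive decrement
        src = work.pop()
        rho[src] -= 1
        if rho[src] == 0:
            work.extend(heap[src])
    live = {n for n, c in rho.items() if c > 0}   # sweepCounting()
    for r in roots:                     # rootsTracing(D) + applyDecrements(D):
        rho[r] -= 1                     # un-count the roots, restoring the invariant
    return live, rho

heap = {'A': ['B'], 'B': [], 'C': []}
rho = {'A': 0, 'B': 0, 'C': 0}
live, rho = collect_drc(heap, ['A'], rho, I=['B'], D=[])
# live == {'A', 'B'}; rho['A'] is back to 0 until the next cycle counts roots again
```

Note that an object kept alive only by roots ends the cycle with count 0 but is not freed; it would only be reclaimed by a later cycle in which no root references it.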

[Worked example (figure): objects A–D with roots referencing B and C, increment buffer I = [A, B, A, D, B], decrement buffer D = [A, D], and all counts 0. rootsTracing(I) appends the roots B and C to I; applyIncrements yields ρ = (2, 3, 1, 1) for (A, B, C, D). scanCounting drops ρ(A) to 1 and ρ(D) to 0, recursively decrementing D's target B to 2; sweepCounting frees D. Finally rootsTracing(D) buffers the roots B and C, and applyDecrements leaves ρ = (1, 1, 0) for (A, B, C): C's zero count is only acted on at the next cycle.]

Comparing GCs Summary

• GC performance depends on various aspects
- Therefore, no GC has an absolute advantage over the others.

• Garbage collection can be expressed in an abstract way.
- This highlights similarities and differences

Allocation

• Three aspects to memory management:
- Allocation of memory in the first place
- Identification of live data
- Reclamation for future use

• Allocation and reclamation of memory are tightly linked
• Several key differences between automatic and explicit memory management, in terms of allocating and freeing:
- GCs free space all at once
- A system with GC has more information when allocating
- With GC, users tend to write programs in a different style.

• Uses a large free chunk of memory
• Given a request for n bytes, it allocates that much from one end of the free chunk.

sequentialAllocate(n):
    result ← free
    newFree ← result + n
    if newFree > limit
        return null
    free ← newFree
    return result

Sequential Allocation
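A direct Python rendering of sequentialAllocate, assuming the free region is the half-open range [free, limit) and ignoring alignment:

```python
class SequentialAllocator:
    """Bump-pointer allocation over [free, limit); no alignment handling."""
    def __init__(self, free, limit):
        self.free, self.limit = free, limit

    def allocate(self, n):
        result = self.free
        new_free = result + n
        if new_free > self.limit:
            return None          # out of space: caller must collect or fail
        self.free = new_free
        return result

a = SequentialAllocator(0, 100)
a.allocate(60)   # returns 0; free is now 60
a.allocate(40)   # returns 60; free is now 100 and the region is exhausted
```

Allocation is a single compare and add, which is why sequential allocation pairs so well with copying and compacting collectors that can regenerate a contiguous free region.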

[Figure: sequential allocation. The free region lies between the free and limit pointers. A request for n bytes (plus any alignment padding) returns the old free pointer as result and advances free past the newly allocated cell; allocation fails when free would pass limit.]

• Properties:
– Simple
– Efficient
– Better cache locality
– May be less suitable for non-moving collectors

Sequential Allocation

• A data structure records the location and size of free cells of memory.

• The allocator considers each free cell in turn, and according to some policy, chooses one to allocate.

• Three basic types of free-list allocation:
– First-fit
– Next-fit
– Best-fit

Free-list Allocation

First-fit Allocation

• Use the first cell that can satisfy the allocation request.
• The cell may be split, unless the remainder would be too small.

firstFitAllocate(n):
    prev ← addressOf(head)
    loop
        curr ← next(prev)
        if curr = null
            return null
        else if size(curr) < n
            prev ← curr
        else
            return listAllocate(prev, curr, n)

listAllocate(prev, curr, n):
    result ← curr
    if shouldSplit(size(curr), n)
        remainder ← result + n
        next(remainder) ← next(curr)
        size(remainder) ← size(curr) − n
        next(prev) ← remainder
    else
        next(prev) ← next(curr)
    return result

listAllocateAlt(prev, curr, n):
    if shouldSplit(size(curr), n)
        size(curr) ← size(curr) − n    // allocate from the back of the cell
        result ← curr + size(curr)
    else
        next(prev) ← next(curr)
        result ← curr
    return result

First-fit Allocation
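As an executable counterpart (using my own list-of-(addr, size)-tuples representation rather than threaded next/size fields), first-fit can be sketched as:

```python
def first_fit_allocate(free_cells, n, min_size=1):
    """Take the first cell with size >= n; split it unless the remainder
    would be smaller than min_size (playing the role of shouldSplit)."""
    for i, (addr, size) in enumerate(free_cells):
        if size >= n:
            if size - n >= min_size:
                free_cells[i] = (addr + n, size - n)   # keep the remainder
            else:
                del free_cells[i]                      # use the whole cell
            return addr
    return None   # no cell is large enough

cells = [(0, 150), (300, 100), (500, 170)]
first_fit_allocate(cells, 120)   # returns 0; cells[0] becomes (120, 30)
```

The remainder stays at the front of the list, which is exactly how the small cells that slow later searches accumulate.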

[Figure: a free list of cells 150KB, 100KB, 170KB, 300KB, 50KB.
- 120KB request: taken from the first (150KB) cell, leaving 30KB → 30, 100, 170, 300, 50.
- 50KB request: the 30KB cell is too small; taken from the 100KB cell → 30, 50, 170, 300, 50.
- 200KB request: only the 300KB cell fits → 30, 50, 170, 100, 50.]

• Small remainder cells accumulate near the front of the list, slowing down allocation.

• In terms of space utilization, first-fit may behave similarly to best-fit.

• An issue is where in the list to enter a newly freed cell
• It is usually more natural to build the list in address order, as mark-sweep does.

First-fit Allocation

• A variation of first-fit
• Method – start the search for a cell of suitable size from the point in the list where the last search succeeded.

• When reaching the end of the list, start over from the beginning.

• Idea - reduce the need to iterate repeatedly past the small cells at the head of the list.

• Drawbacks:
– Fragmentation
– Poor locality on accessing the list
– Poor locality of the allocated objects

Next-fit Allocation

nextFitAllocate(n):
    start ← prev
    loop
        curr ← next(prev)
        if curr = null
            prev ← addressOf(head)    // wrap around to the head
            curr ← next(prev)
        if prev = start
            return null
        else if size(curr) < n
            prev ← curr
        else
            return listAllocate(prev, curr, n)

Next-fit Allocation Algorithm
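The same idea in runnable form (the class NextFitAllocator and its roving start index are my own naming), cycling through (addr, size) cells at most once per request:

```python
class NextFitAllocator:
    """First-fit with a roving pointer: each search resumes where the
    previous successful search left off, wrapping at the end of the list."""
    def __init__(self, cells):
        self.cells = cells   # list of (addr, size) in address order
        self.start = 0       # index where the next search begins

    def allocate(self, n):
        k = len(self.cells)
        for step in range(k):
            i = (self.start + step) % k
            addr, size = self.cells[i]
            if size >= n:
                if size > n:
                    self.cells[i] = (addr + n, size - n)   # split the cell
                else:
                    del self.cells[i]                      # exact fit
                self.start = i if i < len(self.cells) else 0
                return addr
        return None          # scanned the whole list without success
```

Because each allocation resumes after the previous one, the small remainders near the head of the list are skipped, at the cost of the fragmentation and locality drawbacks noted above.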

[Figure: a free list of cells 150KB, 100KB, 170KB, 300KB, 50KB.
- 120KB request: taken from the 150KB cell → 30, 100, 170, 300, 50.
- 20KB request: the search resumes at the following cell, taking from the 100KB cell → 30, 80, 170, 300, 50.
- 50KB request: the search resumes again, taking from the 170KB cell → 30, 80, 120, 300, 50.]

• Method - find the cell whose size most closely matches the allocation request.

• Idea:
– Minimize waste
– Avoid splitting large cells unnecessarily

• Bad worst case

Best-fit Allocation

bestFitAllocate(n):
    best ← null
    bestSize ← ∞
    prev ← addressOf(head)
    loop
        curr ← next(prev)
        if curr = null || size(curr) = n
            if curr ≠ null
                bestPrev ← prev
                best ← curr
            else if best = null
                return null
            return listAllocate(bestPrev, best, n)
        else if size(curr) < n || bestSize < size(curr)
            prev ← curr
        else
            best ← curr
            bestPrev ← prev
            bestSize ← size(curr)
            prev ← curr

Best-fit Allocation Algorithm
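A runnable equivalent over (addr, size) cells (the helper name best_fit_allocate is mine); like the pseudocode, an exact match ends the search early:

```python
def best_fit_allocate(free_cells, n):
    """Choose the smallest cell with size >= n; stop early on an exact fit."""
    best_i, best_size = None, None
    for i, (addr, size) in enumerate(free_cells):
        if size == n:                 # exact fit: cannot do better
            best_i = i
            break
        if size >= n and (best_size is None or size < best_size):
            best_i, best_size = i, size
    if best_i is None:
        return None
    addr, size = free_cells[best_i]
    if size > n:
        free_cells[best_i] = (addr + n, size - n)   # split off the remainder
    else:
        del free_cells[best_i]                      # exact fit: remove whole cell
    return addr

cells = [(0, 150), (200, 100), (400, 170)]
best_fit_allocate(cells, 90)    # returns 200: the 100-unit cell is closest
```

The full scan on every request is the "bad worst case" mentioned above; balanced-tree indexes (next slide) exist precisely to avoid it.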

[Figure: a free list of cells 150KB, 100KB, 170KB, 300KB, 50KB.
- 90KB request: the 100KB cell is the closest fit → 150, 10, 170, 300, 50.
- 50KB request: the 50KB cell matches exactly and is used whole → 150, 10, 170, 300.
- 100KB request: the 150KB cell is now the closest fit → 50, 10, 170, 300.]

• Use of a balanced binary tree
• Sorted by size (for best-fit) or by address (for first-fit or next-fit).
• If sorted by size, only one cell of each size need be entered.
• Example: Cartesian tree for first/next-fit:
– Indexed by address (primary key) and size (secondary key)
– Total order by address
– Organized as a heap for the sizes

Speeding Free-list Allocation

• Searching the Cartesian tree under a first-fit policy:

firstFitAllocateCartesian(n):
    parent ← null
    curr ← root
    loop
        if left(curr) ≠ null && max(left(curr)) ≥ n
            parent ← curr
            curr ← left(curr)
        else if prev < curr && size(curr) ≥ n
            prev ← curr
            return treeAllocate(curr, parent, n)
        else if right(curr) ≠ null && max(right(curr)) ≥ n
            parent ← curr
            curr ← right(curr)
        else
            return null

Speeding Free-list Allocation

• Dispersal of free memory across a possibly large number of small free cells.

• Negative effects:
– Can prevent allocation from succeeding
– May cause a program to use more address space, more resident pages and more cache lines.

• Fragmentation is impractical to avoid:
– Usually the allocator cannot know what the future request sequence will be.
– Even given a known request sequence, doing an optimal allocation is NP-hard.

• Usually there is a trade-off between allocation speed and fragmentation.

Fragmentation

• Idea – use multiple free-lists whose members are segregated by size in order to speed allocation.

• Usually a fixed number k of size values s0 < s1 < … < sk−1
• k+1 free-lists f0, …, fk
• For a free cell b on list fi: size(b) = si if i < k, and size(b) > sk−1 if i = k
• When requesting a cell of size b ≤ sk−1, the allocator rounds the request size up to the smallest si such that b ≤ si.

• si is called a size class

Segregated-fits Allocation

segregatedFitAllocate(j):
    result ← remove(freeLists[j])
    if result = null
        large ← allocateBlock()
        if large = null
            return null
        initialize(large, sizes[j])
        result ← remove(freeLists[j])
    return result

• List fk, for cells larger than sk−1, is organized to use one of the basic single-list algorithms.

• Per-cell overheads for large cells are a bit higher, but in total this is negligible.

• The main advantage: for size classes other than fk, allocation typically requires constant time.

Segregated-fits Allocation
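A sketch of segregated fits in Python (class and field names are mine; fresh blocks come from a simple bump source rather than a real page allocator, and the large-object list fk is left out):

```python
import bisect

class SegregatedFits:
    """Free-lists segregated by size class; a request is rounded up to the
    smallest size class that can hold it."""
    def __init__(self, sizes, block_size=4096):
        self.sizes = sorted(sizes)              # s0 < s1 < ... < s(k-1)
        self.free_lists = {s: [] for s in self.sizes}
        self.block_size = block_size
        self.next_block = 0                     # bump source of fresh blocks

    def _size_class(self, n):
        i = bisect.bisect_left(self.sizes, n)   # smallest si with n <= si
        return self.sizes[i] if i < len(self.sizes) else None

    def allocate(self, n):
        s = self._size_class(n)
        if s is None:
            return None    # "large" request: would go to the single-list scheme
        if not self.free_lists[s]:
            # allocateBlock + initialize: carve a fresh block into size-s cells
            base, self.next_block = self.next_block, self.next_block + self.block_size
            usable = self.block_size - self.block_size % s
            self.free_lists[s] = [base + off for off in range(0, usable, s)]
        return self.free_lists[s].pop()         # constant time in the common case

sf = SegregatedFits([16, 32, 64], block_size=128)
sf.allocate(20)    # rounded up to the 32-byte class
```

The pop from a per-class list is what makes the common allocation path constant-time; the rounding in _size_class is the source of the internal fragmentation discussed next.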

[Figure: the segregated free-lists. Lists f0, f1, …, fk−1 hold cells of sizes s0, s1, …, sk−1 respectively; list fk holds cells larger than sk−1.]

• With simple free-list allocators, waste consists of free cells too small to satisfy a request. This is called external fragmentation.

• With segregated-fits allocation, space is wasted inside an individual cell because the requested size was rounded up. This is called internal fragmentation.

More on Fragmentation

• Important consideration – how to populate each free-list of segregated-fits.

• Two approaches:
– Dedicating whole blocks to particular sizes
– Splitting

Populating size classes

• Choose some block size B, a power of two.
• The allocator is provided with blocks.
• If the request is larger than one block, multiple contiguous blocks are allocated.
• For a size class s < B, we populate the free-list fs by allocating a block and immediately slicing it into cells of size s.

• Metadata of the cells is stored on the block.

Big Bag of Pages (Block-based allocation)

• Disadvantage:
– Fragmentation: average waste of half a block (worst case (B−s)/B).

• Advantages:
– Reduced per-cell metadata
– Simple and efficient for the common case

Big Bag of Pages (Block-based allocation)

• Like simple free-list schemes, split a cell if that is the only way to satisfy a request.

• Improvement: return the remaining portion to a suitable free-list (if possible).

• For example – the buddy system:
– Size classes are powers of two
– Can split a cell of size 2^(i+1) into two cells of size 2^i
– Can combine in the opposite direction (only if the two small cells were split from the same large cell)

Splitting
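A small Python model of a binary buddy allocator (the class name and the pair-returning allocate are my own choices), using the fact that a cell's buddy address differs from it in exactly one bit:

```python
class BuddyAllocator:
    """Binary buddy system: cell sizes are powers of two; freeing a cell
    recombines it with its buddy whenever the buddy is also free."""
    def __init__(self, total, min_size):
        self.total, self.min_size = total, min_size
        self.free = {total: [0]}        # size -> addresses of free cells

    def allocate(self, n):
        size = self.min_size
        while size < n:                 # round up to a power-of-two class
            size *= 2
        s = size
        while s <= self.total and not self.free.get(s):
            s *= 2                      # find the smallest free cell to split
        if s > self.total:
            return None
        addr = self.free[s].pop()
        while s > size:                 # split down, freeing each buddy
            s //= 2
            self.free.setdefault(s, []).append(addr + s)
        return addr, size               # (address, rounded-up cell size)

    def free_cell(self, addr, size):
        while size < self.total:
            buddy = addr ^ size         # buddy address: flip the size bit
            peers = self.free.get(size, [])
            if buddy in peers:          # buddy free too: coalesce upwards
                peers.remove(buddy)
                addr, size = min(addr, buddy), size * 2
            else:
                break
        self.free.setdefault(size, []).append(addr)

b = BuddyAllocator(128, 16)
b.allocate(20)   # returns (0, 32) after splitting 128 -> 64+64 -> 32+32
```

The XOR trick (addr ^ size) is what makes the "same large cell" check cheap: two cells are buddies exactly when their addresses differ only in the bit corresponding to their size.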

[Figure: the buddy system on a 128KB region (minimum cell size 16KB). A 20KB request splits 128KB into two 64KB buddies and one 64KB into two 32KB buddies, then allocates a 32KB cell (20KB used, 12KB internal waste). A 10KB request splits the free 32KB buddy into two 16KB cells and allocates one (10KB used, 6KB waste). Freeing the 10KB allocation recombines the 16KB buddies into 32KB; freeing the 20KB allocation then recombines 32+32 into 64KB and 64+64 back into the original 128KB cell.]

• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality

Allocation’s Additional Considerations

• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality

Allocation’s Additional Considerations

• Allocated objects may require special alignment

• For example: double-word alignment for floating point values
– Making the granule a double-word is wasteful
– The header of an array in Java takes 3 words – one word is wasted or skipped.

Alignment

• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality

Allocation’s Additional Considerations

• Some collection schemes require a minimum amount of space in each cell.
– Forwarding address
– Lock/status

• In that case, the allocator will allocate more words than requested.

Size Constraints

• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality

Allocation’s Additional Considerations

• An additional header or boundary tag is associated with each cell.

• Found outside the storage available to the program.

• Indicates size and allocated/free status
• Is one or two words long
• A bitmap may be used instead

Boundary Tags

• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality

Allocation’s Additional Considerations

• The ability to advance cell by cell through the heap
• An object’s header (one or two words):
– Type
– Hash code
– Synchronization information
– Mark bit

• The header comes before the data
• The reference refers to the first element/field

Heap Parsability

• How to handle alignment gaps?
– Zero all free space in advance
– Devise a distinct range of values to write at the start of the gap

• Parsing is easier with a bitmap indicating where each object starts.
– Requires additional space and time

Heap Parsability

• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality

Allocation’s Additional Considerations

• During allocation
– Address-ordered free-lists and sequential allocation present good locality.

• During freeing
– Goal: objects being freed together will be near each other.
– Empirically, objects allocated at the same time often become unreachable at about the same time.

Locality

• Multiple threads allocating
• Most steps in allocation need to be atomic
• Can result in a bottleneck
• Basic solution – each thread has its own allocation area.

• Use of a global pool and smart chunk handling

Allocation in Concurrent Systems

Allocation Summary

• Methods:
- Sequential
- Free-list: first-fit, next-fit and best-fit
- Segregated-fits

• Various considerations to notice
