
Comparing GCs and Allocation

Richard Jones, Antony Hosking and Eliot Moss, 2012

Presented by Yarden Marton, 18.11.14

• Comparing different garbage collectors.

• Allocation – methods and considerations.

Outline

Comparing GCs

• What is the best GC?
• When we say "best", do we mean:

- Best throughput?
- Shortest pause times?
- Good space utilization?
- Some compromise combining these?

Comparing GCs

• More to consider:
- Application dependency
- Heap space availability
- Heap size

• Throughput
• Pause time
• Space
• Implementation

Comparing GCs - Aspects

• Throughput
• Pause time
• Space
• Implementation

Comparing GCs - Aspects

• Primary goal for ‘batch’ applications or for systems experiencing delays.

• Does a faster collector mean a faster application? Not necessarily.
– Mutators pay the cost

Throughput

• Algorithmic complexity
• Mark-sweep:
- Cost of tracing and sweeping phases
- Requires visiting every object

• Copying:
- Cost of tracing phase only
- Requires visiting only live objects

Throughput

• Is copying collection faster?
• Not necessarily:

- Number of instructions executed to visit an object
- Locality
- Lazy sweeping

• Throughput
• Pause time
• Space
• Implementation

Comparing GCs - Aspects

Pause Time

• Important for interactive applications, transaction processors and more.

• ‘Stop-the-world’ collectors
• Immediate attraction to reference counting
• However:

- Recursive freeing when a count drops to zero is costly
- Both improvements of reference counting reintroduce a stop-the-world pause

• Throughput
• Pause time
• Space
• Implementation

Comparing GCs - Aspects

Space

• Important for:
- Tight physical constraints on memory
- Large applications

• All collectors incur space overhead:
- Reference count fields
- Additional heap space
- Heap fragmentation
- Auxiliary data structures
- Room for garbage

Space

• Completeness – reclaiming all dead objects eventually.
- Basic reference counting is incomplete (it cannot reclaim cycles)

• Promptness – reclaiming all dead objects at each collection cycle.
- Basic tracing collectors are prompt (but at a cost)

• Modern high-performance collectors typically trade immediacy for performance.

• Throughput
• Pause time
• Space
• Implementation

Comparing GCs - Aspects

Implementation

• GC algorithms are difficult to implement, especially concurrent algorithms.

• Errors can manifest themselves long afterwards
• Tracing:
- Advantage: simple collector-mutator interface
- Disadvantage: determining roots is complicated

• Reference counting:
- Advantage: can be implemented in a library
- Disadvantage: processing overheads, and correctness requires that every reference count manipulation be performed

• In general, copying and compacting collectors are more complex than non-moving collectors.

Adaptive Systems

• Commercial systems often offer a choice between GCs, with a large number of tuning options.

• Researchers have developed systems that adapt to the environment:
- Java run-time (Soman et al [2004])
- Singer et al [2007a]
- Sun’s Ergonomic tuning

Advice For Developers

• Know your application:
- Measure its behavior
- Track the size and lifetime distributions of the objects it uses.

• Experiment with the different collector configurations on offer.

• We have considered two styles of collection:
– Direct: reference counting.
– Indirect: tracing collection.

• Next: An abstract framework for a wide variety of collectors.

A Unified Theory of GC

• GC can be expressed as a fixed-point computation that assigns reference counts ρ(n) to nodes n ∈ Nodes.

• Nodes with non-zero count are retained and the rest should be reclaimed.

• Use of abstract data structures whose implementations can vary.
• W – a work list of objects to be processed. When it is empty, the algorithm terminates.

Abstract GC

atomic collectTracing():
    rootsTracing(W)    // find root objects
    scanTracing(W)     // mark reachable objects
    sweepTracing()     // free dead objects

rootsTracing(R):
    for each fld in Roots
        ref ← *fld
        if ref ≠ null
            R ← R + [ref]

scanTracing(W):
    while not isEmpty(W)
        src ← remove(W)
        ρ(src) ← ρ(src) + 1
        if ρ(src) = 1
            for each fld in Pointers(src)
                ref ← *fld
                if ref ≠ null
                    W ← W + [ref]

Abstract Tracing GC Algorithm

sweepTracing():
    for each node in Nodes
        if ρ(node) = 0
            free(node)
        else
            ρ(node) ← 0

New():
    ref ← allocate()
    if ref = null
        collectTracing()
        ref ← allocate()
        if ref = null
            error "Out of memory"
    ρ(ref) ← 0
    return ref

Abstract Tracing GC Algorithm (Continued)
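The abstract tracing collector above can be sketched as a short runnable program. This is an illustrative Python model (the names collect_tracing, heap and rho are mine, not from the pseudocode), representing the heap as a dictionary from node names to lists of out-pointers:

```python
def collect_tracing(heap, roots):
    """Fixed-point computation of abstract reference counts (rho)."""
    rho = {n: 0 for n in heap}                  # rho(n) = 0 for every node
    work = [r for r in roots if r is not None]  # rootsTracing(W)
    while work:                                 # scanTracing(W)
        src = work.pop()
        rho[src] += 1
        if rho[src] == 1:                       # first visit: trace children
            for ref in heap[src]:
                if ref is not None:
                    work.append(ref)
    # sweepTracing: nodes left with rho = 0 are dead; survivors' counts
    # would then be reset to 0 for the next cycle
    return {n for n in heap if rho[n] > 0}

# Roots reach B and C; A and B point at each other; D is unreachable garbage.
heap = {'A': ['B'], 'B': ['A'], 'C': [], 'D': ['A']}
live = collect_tracing(heap, ['B', 'C'])        # {'A', 'B', 'C'}; D is freed
```

Any node pushed at least once from a root or a traced edge ends with a non-zero count, which is exactly the abstract algorithm's liveness criterion.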

[Worked example (figure): a heap with objects A–D, where the roots reference B and C, C references A and B, and A references B; D is garbage. scanTracing pops B (ρ(B)=1), then C (ρ(C)=1, pushing A and B), then A (ρ(A)=1, pushing B), then B again (ρ(B)=2), leaving W empty. sweepTracing frees D, whose count is still 0, and resets the survivors' counts to 0.]

atomic collectCounting(I, D):
    applyIncrements(I)    // apply buffered increments
    scanCounting(D)       // apply decrements recursively
    sweepCounting()       // free dead objects

applyIncrements(I):
    while not isEmpty(I)
        ref ← remove(I)
        ρ(ref) ← ρ(ref) + 1

scanCounting(W):
    while not isEmpty(W)
        src ← remove(W)
        ρ(src) ← ρ(src) − 1
        if ρ(src) = 0
            for each fld in Pointers(src)
                ref ← *fld
                if ref ≠ null
                    W ← W + [ref]

Abstract reference counting GC Algorithm

sweepCounting():
    for each node in Nodes
        if ρ(node) = 0
            free(node)

New():
    ref ← allocate()
    if ref = null
        collectCounting(I, D)
        ref ← allocate()
        if ref = null
            error "Out of memory"
    ρ(ref) ← 0
    return ref

Abstract reference counting GC Algorithm (Continued)

inc(ref):
    if ref ≠ null
        I ← I + [ref]

dec(ref):
    if ref ≠ null
        D ← D + [ref]

atomic Write(src, i, dst):
    inc(dst)
    dec(src[i])
    src[i] ← dst

Abstract reference counting GC Algorithm (Continued)
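To make the buffered-increment scheme concrete, here is a small self-contained Python model (the class name RCHeap and its fields are my own; each object is simplified to a map of pointer fields, and root references are not modeled):

```python
class RCHeap:
    """Abstract reference counting with buffered increments (I) and decrements (D)."""
    def __init__(self, nodes):
        self.ptr = {n: {} for n in nodes}   # src -> field index -> dst
        self.rho = {n: 0 for n in nodes}    # reference counts
        self.I, self.D = [], []             # buffered inc/dec operations

    def write(self, src, i, dst):           # atomic Write(src, i, dst)
        if dst is not None:
            self.I.append(dst)              # inc(dst)
        old = self.ptr[src].get(i)
        if old is not None:
            self.D.append(old)              # dec(src[i])
        self.ptr[src][i] = dst

    def collect_counting(self):             # atomic collectCounting(I, D)
        for ref in self.I:                  # applyIncrements(I)
            self.rho[ref] += 1
        self.I = []
        work, self.D = self.D, []
        while work:                         # scanCounting(D): recursive decrement
            src = work.pop()
            self.rho[src] -= 1
            if self.rho[src] == 0:
                work.extend(r for r in self.ptr[src].values() if r is not None)
        # sweepCounting: anything left with rho = 0 is dead
        return {n for n, c in self.rho.items() if c > 0}

h = RCHeap(['A', 'B', 'C'])
h.write('A', 0, 'B')          # buffers an increment for B
h.write('A', 0, 'C')          # buffers an increment for C, a decrement for B
live = h.collect_counting()   # {'C'}: B's count goes +1 then -1
```

Since root slots are not modeled, A itself ends with count 0 in this toy; the deferred variant below is where root handling is made explicit.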

[Worked example (figure): objects A–D, all counts initially 0, with increment buffer I = [A, B, A, D, B, C, B] and decrement buffer D = [A, D]. applyIncrements raises the counts to ρ(A)=2, ρ(B)=3, ρ(C)=1, ρ(D)=1. scanCounting then applies the decrements: ρ(A) drops to 1, ρ(D) drops to 0, so D's target B is decremented recursively to 2, leaving counts (1, 2, 1, 0). sweepCounting frees D.]

atomic collectDrc(I, D):
    rootsTracing(I)       // add root objects to I
    applyIncrements(I)    // apply buffered increments
    scanCounting(D)       // apply decrements recursively
    sweepCounting()       // free dead objects
    rootsTracing(D)       // keep the invariant
    applyDecrements(D)

New():
    ref ← allocate()
    if ref = null
        collectDrc(I, D)
        ref ← allocate()
        if ref = null
            error "Out of memory"
    ρ(ref) ← 0
    return ref

Abstract deferred reference counting GC Algorithm

atomic Write(src, i, dst):
    if src ≠ Roots
        inc(dst)
        dec(src[i])
    src[i] ← dst

applyDecrements(D):
    while not isEmpty(D)
        ref ← remove(D)
        ρ(ref) ← ρ(ref) − 1

Abstract deferred reference counting GC Algorithm (Continued)
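A compact Python sketch of one deferred-RC cycle (the function name collect_drc is mine, and root slots are passed in explicitly rather than scanned from thread stacks):

```python
def collect_drc(heap, roots, rho, I, D):
    """One deferred reference-counting cycle: root references are counted
    only during collection, so mutator writes to root slots need no barrier."""
    I = list(I) + list(roots)           # rootsTracing(I): add root objects to I
    for ref in I:                       # applyIncrements(I)
        rho[ref] += 1
    work = list(D)
    while work:                         # scanCounting(D): recursive decrement
        src = work.pop()
        rho[src] -= 1
        if rho[src] == 0:
            work.extend(heap[src])
    live = {n for n, c in rho.items() if c > 0}   # sweepCounting()
    for r in roots:                     # rootsTracing(D) + applyDecrements(D):
        rho[r] -= 1                     # un-count the roots, restoring the invariant
    return live, rho

heap = {'A': ['B'], 'B': [], 'C': []}
rho = {'A': 0, 'B': 0, 'C': 0}
live, rho = collect_drc(heap, ['A'], rho, I=['B'], D=[])
# live == {'A', 'B'}; rho['A'] is back to 0 until the next cycle counts roots again
```

Note that an object kept alive only by roots ends the cycle with count 0 but is not freed; it would only be reclaimed by a later cycle in which no root references it.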

[Worked example (figure): objects A–D with roots referencing B and C, increment buffer I = [A, B, A, D, B], decrement buffer D = [A, D], and all counts 0. rootsTracing(I) appends the roots B and C to I; applyIncrements yields ρ = (2, 3, 1, 1) for (A, B, C, D). scanCounting drops ρ(A) to 1 and ρ(D) to 0, recursively decrementing D's target B to 2; sweepCounting frees D. Finally rootsTracing(D) buffers the roots B and C, and applyDecrements leaves ρ = (1, 1, 0) for (A, B, C): C's zero count is only acted on at the next cycle.]

Comparing GCs Summary

• GC performance depends on various aspects
- Therefore, no GC has an absolute advantage over the others.

• Garbage collection can be expressed in an abstract way.
- This highlights similarities and differences

Allocation

• Three aspects to memory management:
- Allocation of memory in the first place
- Identification of live data
- Reclamation for future use

• Allocation and reclamation of memory are tightly linked
• Several key differences between automatic and explicit memory management, in terms of allocating and freeing:
- GCs free space all at once
- A system with GC has more information when allocating
- With GC, users tend to write programs in a different style.

• Uses a large free chunk of memory
• Given a request for n bytes, it allocates that much from one end of the free chunk.

sequentialAllocate(n):
    result ← free
    newFree ← result + n
    if newFree > limit
        return null
    free ← newFree
    return result

Sequential Allocation
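A direct Python rendering of sequentialAllocate, assuming the free region is the half-open range [free, limit) and ignoring alignment:

```python
class SequentialAllocator:
    """Bump-pointer allocation over [free, limit); no alignment handling."""
    def __init__(self, free, limit):
        self.free, self.limit = free, limit

    def allocate(self, n):
        result = self.free
        new_free = result + n
        if new_free > self.limit:
            return None          # out of space: caller must collect or fail
        self.free = new_free
        return result

a = SequentialAllocator(0, 100)
a.allocate(60)   # returns 0; free is now 60
a.allocate(40)   # returns 60; free is now 100 and the region is exhausted
```

Allocation is a single compare and add, which is why sequential allocation pairs so well with copying and compacting collectors that can regenerate a contiguous free region.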

[Figure: sequential allocation. The free region lies between the free and limit pointers. A request for n bytes (plus any alignment padding) returns the old free pointer as result and advances free past the newly allocated cell; allocation fails when free would pass limit.]

• Properties:
– Simple
– Efficient
– Better cache locality
– May be less suitable for non-moving collectors

Sequential Allocation

• A data structure records the location and size of free cells of memory.

• The allocator considers each free cell in turn, and according to some policy, chooses one to allocate.

• Three basic types of free-list allocation:
– First-fit
– Next-fit
– Best-fit

Free-list Allocation

First-fit Allocation

• Use the first cell that can satisfy the allocation request.
• The cell may be split, unless the remainder would be too small.

firstFitAllocate(n):
    prev ← addressOf(head)
    loop
        curr ← next(prev)
        if curr = null
            return null
        else if size(curr) < n
            prev ← curr
        else
            return listAllocate(prev, curr, n)

listAllocate(prev, curr, n):
    result ← curr
    if shouldSplit(size(curr), n)
        remainder ← result + n
        next(remainder) ← next(curr)
        size(remainder) ← size(curr) − n
        next(prev) ← remainder
    else
        next(prev) ← next(curr)
    return result

listAllocateAlt(prev, curr, n):
    if shouldSplit(size(curr), n)
        size(curr) ← size(curr) − n    // allocate from the back of the cell
        result ← curr + size(curr)
    else
        next(prev) ← next(curr)
        result ← curr
    return result

First-fit Allocation
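As an executable counterpart (using my own list-of-(addr, size)-tuples representation rather than threaded next/size fields), first-fit can be sketched as:

```python
def first_fit_allocate(free_cells, n, min_size=1):
    """Take the first cell with size >= n; split it unless the remainder
    would be smaller than min_size (playing the role of shouldSplit)."""
    for i, (addr, size) in enumerate(free_cells):
        if size >= n:
            if size - n >= min_size:
                free_cells[i] = (addr + n, size - n)   # keep the remainder
            else:
                del free_cells[i]                      # use the whole cell
            return addr
    return None   # no cell is large enough

cells = [(0, 150), (300, 100), (500, 170)]
first_fit_allocate(cells, 120)   # returns 0; cells[0] becomes (120, 30)
```

The remainder stays at the front of the list, which is exactly how the small cells that slow later searches accumulate.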

[Figure: a free list of cells 150KB, 100KB, 170KB, 300KB, 50KB.
- 120KB request: taken from the first (150KB) cell, leaving 30KB → 30, 100, 170, 300, 50.
- 50KB request: the 30KB cell is too small; taken from the 100KB cell → 30, 50, 170, 300, 50.
- 200KB request: only the 300KB cell fits → 30, 50, 170, 100, 50.]

• Small remainder cells accumulate near the front of the list, slowing down allocation.

• In terms of space utilization, first-fit may behave similarly to best-fit.

• An issue is where in the list to enter a newly freed cell
• It is usually more natural to build the list in address order, as mark-sweep does.

First-fit Allocation

• A variation of first-fit
• Method – start the search for a cell of suitable size from the point in the list where the last search succeeded.

• When reaching the end of the list, start over from the beginning.

• Idea - reduce the need to iterate repeatedly past the small cells at the head of the list.

• Drawbacks:
– Fragmentation
– Poor locality on accessing the list
– Poor locality of the allocated objects

Next-fit Allocation

nextFitAllocate(n):
    start ← prev
    loop
        curr ← next(prev)
        if curr = null
            prev ← addressOf(head)    // wrap around to the head
            curr ← next(prev)
        if prev = start
            return null
        else if size(curr) < n
            prev ← curr
        else
            return listAllocate(prev, curr, n)

Next-fit Allocation Algorithm
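The same idea in runnable form (the class NextFitAllocator and its roving start index are my own naming), cycling through (addr, size) cells at most once per request:

```python
class NextFitAllocator:
    """First-fit with a roving pointer: each search resumes where the
    previous successful search left off, wrapping at the end of the list."""
    def __init__(self, cells):
        self.cells = cells   # list of (addr, size) in address order
        self.start = 0       # index where the next search begins

    def allocate(self, n):
        k = len(self.cells)
        for step in range(k):
            i = (self.start + step) % k
            addr, size = self.cells[i]
            if size >= n:
                if size > n:
                    self.cells[i] = (addr + n, size - n)   # split the cell
                else:
                    del self.cells[i]                      # exact fit
                self.start = i if i < len(self.cells) else 0
                return addr
        return None          # scanned the whole list without success
```

Because each allocation resumes after the previous one, the small remainders near the head of the list are skipped, at the cost of the fragmentation and locality drawbacks noted above.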

[Figure: a free list of cells 150KB, 100KB, 170KB, 300KB, 50KB.
- 120KB request: taken from the 150KB cell → 30, 100, 170, 300, 50.
- 20KB request: the search resumes at the following cell, taking from the 100KB cell → 30, 80, 170, 300, 50.
- 50KB request: the search resumes again, taking from the 170KB cell → 30, 80, 120, 300, 50.]

• Method - find the cell whose size most closely matches the allocation request.

• Idea:
– Minimize waste
– Avoid splitting large cells unnecessarily

• Bad worst case

Best-fit Allocation

bestFitAllocate(n):
    best ← null
    bestSize ← ∞
    prev ← addressOf(head)
    loop
        curr ← next(prev)
        if curr = null || size(curr) = n
            if curr ≠ null
                bestPrev ← prev
                best ← curr
            else if best = null
                return null
            return listAllocate(bestPrev, best, n)
        else if size(curr) < n || bestSize < size(curr)
            prev ← curr
        else
            best ← curr
            bestPrev ← prev
            bestSize ← size(curr)
            prev ← curr

Best-fit Allocation Algorithm
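A runnable equivalent over (addr, size) cells (the helper name best_fit_allocate is mine); like the pseudocode, an exact match ends the search early:

```python
def best_fit_allocate(free_cells, n):
    """Choose the smallest cell with size >= n; stop early on an exact fit."""
    best_i, best_size = None, None
    for i, (addr, size) in enumerate(free_cells):
        if size == n:                 # exact fit: cannot do better
            best_i = i
            break
        if size >= n and (best_size is None or size < best_size):
            best_i, best_size = i, size
    if best_i is None:
        return None
    addr, size = free_cells[best_i]
    if size > n:
        free_cells[best_i] = (addr + n, size - n)   # split off the remainder
    else:
        del free_cells[best_i]                      # exact fit: remove whole cell
    return addr

cells = [(0, 150), (200, 100), (400, 170)]
best_fit_allocate(cells, 90)    # returns 200: the 100-unit cell is closest
```

The full scan on every request is the "bad worst case" mentioned above; balanced-tree indexes (next slide) exist precisely to avoid it.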

[Figure: a free list of cells 150KB, 100KB, 170KB, 300KB, 50KB.
- 90KB request: the 100KB cell is the closest fit → 150, 10, 170, 300, 50.
- 50KB request: the 50KB cell matches exactly and is used whole → 150, 10, 170, 300.
- 100KB request: the 150KB cell is now the closest fit → 50, 10, 170, 300.]

• Use of a balanced binary tree
• Sorted by size (for best-fit) or by address (for first-fit or next-fit).
• If sorted by size, only one cell of each size need be entered.
• Example: Cartesian tree for first/next-fit:
– Indexed by address (primary key) and size (secondary key)
– Total order by address
– Organized as a heap for the sizes

Speeding Free-list Allocation

• Searching the Cartesian tree under a first-fit policy:

firstFitAllocateCartesian(n):
    parent ← null
    curr ← root
    loop
        if left(curr) ≠ null && max(left(curr)) ≥ n
            parent ← curr
            curr ← left(curr)
        else if prev < curr && size(curr) ≥ n
            prev ← curr
            return treeAllocate(curr, parent, n)
        else if right(curr) ≠ null && max(right(curr)) ≥ n
            parent ← curr
            curr ← right(curr)
        else
            return null

Speeding Free-list Allocation

• Dispersal of free memory across a possibly large number of small free cells.

• Negative effects:
– Can prevent allocation from succeeding
– May cause a program to use more address space, more resident pages and more cache lines.

• Fragmentation is impractical to avoid:
– Usually the allocator cannot know what the future request sequence will be.
– Even given a known request sequence, doing an optimal allocation is NP-hard.

• Usually there is a trade-off between allocation speed and fragmentation.

Fragmentation

• Idea – use multiple free-lists whose members are segregated by size in order to speed allocation.

• Usually a fixed number k of size values s0 < s1 < … < sk−1
• k+1 free-lists f0, …, fk
• For a free cell b on list fi: size(b) = si if i < k, and size(b) > sk−1 if i = k
• When requesting a cell of size b ≤ sk−1, the allocator rounds the request size up to the smallest si such that b ≤ si.

• si is called a size class

Segregated-fits Allocation

segregatedFitAllocate(j):
    result ← remove(freeLists[j])
    if result = null
        large ← allocateBlock()
        if large = null
            return null
        initialize(large, sizes[j])
        result ← remove(freeLists[j])
    return result

• List fk, for cells larger than sk−1, is organized to use one of the basic single-list algorithms.

• Per-cell overheads for large cells are a bit higher, but in total this is negligible.

• The main advantage: for size classes other than fk, allocation typically requires constant time.

Segregated-fits Allocation
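A sketch of segregated fits in Python (class and field names are mine; fresh blocks come from a simple bump source rather than a real page allocator, and the large-object list fk is left out):

```python
import bisect

class SegregatedFits:
    """Free-lists segregated by size class; a request is rounded up to the
    smallest size class that can hold it."""
    def __init__(self, sizes, block_size=4096):
        self.sizes = sorted(sizes)              # s0 < s1 < ... < s(k-1)
        self.free_lists = {s: [] for s in self.sizes}
        self.block_size = block_size
        self.next_block = 0                     # bump source of fresh blocks

    def _size_class(self, n):
        i = bisect.bisect_left(self.sizes, n)   # smallest si with n <= si
        return self.sizes[i] if i < len(self.sizes) else None

    def allocate(self, n):
        s = self._size_class(n)
        if s is None:
            return None    # "large" request: would go to the single-list scheme
        if not self.free_lists[s]:
            # allocateBlock + initialize: carve a fresh block into size-s cells
            base, self.next_block = self.next_block, self.next_block + self.block_size
            usable = self.block_size - self.block_size % s
            self.free_lists[s] = [base + off for off in range(0, usable, s)]
        return self.free_lists[s].pop()         # constant time in the common case

sf = SegregatedFits([16, 32, 64], block_size=128)
sf.allocate(20)    # rounded up to the 32-byte class
```

The pop from a per-class list is what makes the common allocation path constant-time; the rounding in _size_class is the source of the internal fragmentation discussed next.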

[Figure: the segregated free-lists. Lists f0, f1, …, fk−1 hold cells of sizes s0, s1, …, sk−1 respectively; list fk holds cells larger than sk−1.]

• With simple free-list allocators, waste consists of free cells too small to satisfy a request. This is called external fragmentation.

• With segregated-fits allocation, space is wasted inside an individual cell because the requested size was rounded up. This is called internal fragmentation.

More on Fragmentation

• Important consideration – how to populate each free-list of segregated-fits.

• Two approaches:
– Dedicating whole blocks to particular sizes
– Splitting

Populating size classes

• Choose some block size B, a power of two.
• The allocator is provided with blocks.
• If the request is larger than one block, multiple contiguous blocks are allocated.
• For a size class s < B, we populate the free-list fs by allocating a block and immediately slicing it into cells of size s.

• Metadata of the cells is stored on the block.

Big Bag of Pages (Block-based allocation)

• Disadvantage:
– Fragmentation: average waste of half a block (worst case (B−s)/B).

• Advantages:
– Reduced per-cell metadata
– Simple and efficient for the common case

Big Bag of Pages (Block-based allocation)

• Like simple free-list schemes, split a cell if that is the only way to satisfy a request.

• Improvement: return the remaining portion to a suitable free-list (if possible).

• For example – the buddy system:
– Size classes are powers of two
– Can split a cell of size 2^(i+1) into two cells of size 2^i
– Can combine in the opposite direction (only if the two small cells were split from the same large cell)

Splitting
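A small Python model of a binary buddy allocator (the class name and the pair-returning allocate are my own choices), using the fact that a cell's buddy address differs from it in exactly one bit:

```python
class BuddyAllocator:
    """Binary buddy system: cell sizes are powers of two; freeing a cell
    recombines it with its buddy whenever the buddy is also free."""
    def __init__(self, total, min_size):
        self.total, self.min_size = total, min_size
        self.free = {total: [0]}        # size -> addresses of free cells

    def allocate(self, n):
        size = self.min_size
        while size < n:                 # round up to a power-of-two class
            size *= 2
        s = size
        while s <= self.total and not self.free.get(s):
            s *= 2                      # find the smallest free cell to split
        if s > self.total:
            return None
        addr = self.free[s].pop()
        while s > size:                 # split down, freeing each buddy
            s //= 2
            self.free.setdefault(s, []).append(addr + s)
        return addr, size               # (address, rounded-up cell size)

    def free_cell(self, addr, size):
        while size < self.total:
            buddy = addr ^ size         # buddy address: flip the size bit
            peers = self.free.get(size, [])
            if buddy in peers:          # buddy free too: coalesce upwards
                peers.remove(buddy)
                addr, size = min(addr, buddy), size * 2
            else:
                break
        self.free.setdefault(size, []).append(addr)

b = BuddyAllocator(128, 16)
b.allocate(20)   # returns (0, 32) after splitting 128 -> 64+64 -> 32+32
```

The XOR trick (addr ^ size) is what makes the "same large cell" check cheap: two cells are buddies exactly when their addresses differ only in the bit corresponding to their size.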

[Figure: the buddy system on a 128KB region (minimum cell size 16KB). A 20KB request splits 128KB into two 64KB buddies and one 64KB into two 32KB buddies, then allocates a 32KB cell (20KB used, 12KB internal waste). A 10KB request splits the free 32KB buddy into two 16KB cells and allocates one (10KB used, 6KB waste). Freeing the 10KB allocation recombines the 16KB buddies into 32KB; freeing the 20KB allocation then recombines 32+32 into 64KB and 64+64 back into the original 128KB cell.]

• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality

Allocation’s Additional Considerations

• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality

Allocation’s Additional Considerations

• Allocated objects may require special alignment

• For example: double-word alignment for floating point values
– Making the granule a double-word is wasteful
– The header of an array in Java takes 3 words – one word is wasted or skipped.

Alignment

• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality

Allocation’s Additional Considerations

• Some collection schemes require a minimum amount of space in each cell.
– Forwarding address
– Lock/status

• In that case, the allocator will allocate more words than requested.

Size Constraints

• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality

Allocation’s Additional Considerations

• An additional header or boundary tag is associated with each cell.

• Found outside the storage available to the program.

• Indicates size and allocated/free status
• Is one or two words long
• A bitmap may be used instead

Boundary Tags

• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality

Allocation’s Additional Considerations

• The ability to advance cell by cell through the heap
• An object’s header (one or two words):
– Type
– Hash code
– Synchronization information
– Mark bit

• The header comes before the data
• The reference refers to the first element/field

Heap Parsability

• How to handle alignment gaps?
– Zero all free space in advance
– Devise a distinct range of values to write at the start of the gap

• Parsing is easier with a bitmap indicating where each object starts.
– Requires additional space and time

Heap Parsability

• Alignment
• Size constraints
• Boundary tags
• Heap parsability
• Locality

Allocation’s Additional Considerations

• During allocation
– Address-ordered free-lists and sequential allocation present good locality.

• During freeing
– Goal: objects being freed together will be near each other.
– Empirically, objects allocated at the same time often become unreachable at about the same time.

Locality

• Multiple threads allocating
• Most steps in allocation need to be atomic
• Can result in a bottleneck
• Basic solution – each thread has its own allocation area.

• Use of a global pool and smart chunk handling

Allocation in Concurrent Systems

Allocation Summary

• Methods:
- Sequential
- Free-list: first-fit, next-fit and best-fit
- Segregated-fits

• Various considerations to notice
