Post on 14-Dec-2015


Automatic Storage Management

Patrick Earl

Simon Leonard

Jack Newton

CMPUT 425/525: Automatic Storage Management

Overview

Terminology
Why use Automatic Storage Management?
Comparing garbage collection algorithms
The “Classic” algorithms
Copying garbage collection
Incremental Tracing garbage collection
Generational garbage collection
Conclusions

Terminology

Stack: a memory area onto which activation records (frames) are pushed when a procedure is called and from which they are popped when it returns.

Heap: a memory area where data structures can be allocated and deallocated in any order.

Terminology (Continued)

Roots: values that a program can manipulate directly (i.e. values held in registers, on the program stack, and in global variables).

Node/Cell/Object: an individually allocated piece of data in the heap.

Children Nodes: the list of pointers that a given node contains.

Live Node: a node whose address is held in a root, or that is the child of a live node.

Terminology (Continued)

Garbage: nodes that are not live, but are not free either.

Garbage collection: the task of recovering (freeing) garbage nodes.

Mutator: The program running alongside the garbage collection system.
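The definitions of live and garbage nodes amount to graph reachability from the roots. The following is a minimal sketch, assuming a toy heap modelled as a dictionary from node names to lists of child names; `heap`, `roots`, `live_nodes` and `garbage_nodes` are illustrative names, not part of the slides.

```python
def live_nodes(heap, roots):
    """Return the set of nodes reachable from the roots."""
    live = set()
    worklist = list(roots)
    while worklist:
        node = worklist.pop()
        if node in live:
            continue
        live.add(node)
        worklist.extend(heap[node])   # children of a live node are live
    return live


def garbage_nodes(heap, roots):
    """Allocated nodes that are not live (and not yet freed)."""
    return set(heap) - live_nodes(heap, roots)


# Toy heap (assumption): keys are allocated nodes, values list their children.
heap = {"A": ["B"], "B": [], "C": ["D"], "D": ["C"]}
roots = ["A"]
```

Here C and D point at each other but are unreachable from the root, so both count as garbage.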

Why Garbage Collect?

Language requirements
– In some situations it may be impossible to know when a shared data structure is no longer in use.

Why Garbage Collect? (Continued)

Software Engineering
– Garbage collection increases the abstraction level of software development.
– It simplifies interfaces and decreases coupling between modules.
– Studies have shown that a significant amount of development time is spent on memory management bugs [Rovner, 1985].

Comparing Garbage Collection Algorithms

Directly comparing garbage collection algorithms is difficult because there are many factors to consider.

Some factors to consider:
– Cost of reclaiming cells
– Cost of allocating cells
– Storage overhead
– How does the algorithm scale with residency?
– Will the user program be suspended during garbage collection?
– Does an upper bound exist on the pause time?
– Is locality of data structures maintained (or maybe even improved)?

Classes of Garbage Collection Algorithms

Direct Garbage Collectors: a record is associated with each node in the heap. The record for node N indicates how many other nodes or roots point to N.

Indirect/Tracing Garbage Collectors: usually invoked when a user’s request for memory fails because the free list is exhausted. The garbage collector visits all live nodes, and returns all other memory to the free list. If sufficient memory has been recovered from this process, the user’s request for memory is satisfied.

Quick Review: Reference Counting

Every cell has an additional field: the reference count. This field represents the number of pointers to that cell from roots or heap cells.

Initially, all cells in the heap are placed in a pool of free cells, the free list.

Reference Counting (Continued)

When a cell is allocated from the free list, its reference count is set to one.

When a pointer is set to reference a cell, the cell’s reference count is incremented by 1; if a pointer to the cell is deleted, its reference count is decremented by 1.

When a cell’s reference count reaches 0, its pointers to its children are deleted and it is returned to the free list.
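The three rules above can be sketched as follows. This is a minimal illustration, assuming fixed-size cells kept in a Python list acting as the free list; the class and method names (`Cell`, `Heap`, `add_pointer`, `delete_pointer`) are hypothetical.

```python
class Cell:
    def __init__(self):
        self.rc = 0            # the reference count field
        self.children = []     # pointers held by this cell


class Heap:
    def __init__(self, size):
        # Initially every cell is in the pool of free cells.
        self.free_list = [Cell() for _ in range(size)]

    def allocate(self):
        cell = self.free_list.pop()
        cell.rc = 1            # counts the pointer handed to the caller
        return cell

    def add_pointer(self, source, target):
        source.children.append(target)
        target.rc += 1

    def delete_pointer(self, target):
        """Delete one pointer to target; free it if the count reaches 0."""
        target.rc -= 1
        if target.rc == 0:
            for child in target.children:
                self.delete_pointer(child)   # its pointers are deleted too
            target.children = []
            self.free_list.append(target)


heap = Heap(4)
a = heap.allocate()
b = heap.allocate()
heap.add_pointer(a, b)     # b is now referenced by the program and by a
heap.delete_pointer(b)     # the program drops its own pointer to b
heap.delete_pointer(a)     # a reaches 0, which in turn frees b
```

Note that freeing one cell can trigger a cascade of further frees, which is why the cost of a single pointer deletion is not bounded in this simple form.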

Reference Counting Example

[Figure, four slides: a graph of heap cells labelled with their reference counts. In the final slide, cells whose counts have dropped to 0 are marked "Returned to free list".]

Reference Counting: Advantages and Disadvantages

Advantages:
– Garbage collection overhead is distributed.
– Locality of reference is no worse than that of the mutator.
– Free memory is returned to the free list quickly.

Reference Counting: Advantages and Disadvantages (Continued)

Disadvantages:
– High time cost (every time a pointer is changed, reference counts must be updated).
– Storage overhead for the reference counter can be high.
– Unable to reclaim cyclic data structures.
– If the reference counter overflows, the object becomes permanent.

Reference Counting: Cyclic Data Structure – Before and After

[Figure, two slides: a cyclic data structure with per-cell reference counts, shown before and after.]
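The cyclic case can be reproduced in a few lines. This is a sketch under the same assumptions as plain reference counting; `Cell`, `add_pointer`, `delete_pointer` and the `freed` list are illustrative names, and `None` stands in for a root.

```python
class Cell:
    def __init__(self, name):
        self.name = name
        self.rc = 0
        self.children = []


freed = []                      # names of cells returned to the free list


def add_pointer(source, target):
    if source is not None:      # None stands for a root (assumption)
        source.children.append(target)
    target.rc += 1


def delete_pointer(target):
    target.rc -= 1
    if target.rc == 0:
        for child in target.children:
            delete_pointer(child)
        target.children = []
        freed.append(target.name)


a, b = Cell("A"), Cell("B")
add_pointer(None, a)    # root -> A
add_pointer(a, b)       # A -> B
add_pointer(b, a)       # B -> A closes the cycle
delete_pointer(a)       # the root drops its only pointer into the cycle
```

After the last call both cells are unreachable, yet each still has a count of 1 because of the other, so neither is ever freed.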

Deferred Reference Counting

Optimisation
– Cost can be improved by special treatment of local variables.
– Reference counts for objects referenced from the stack are only updated at fixed intervals.
– Reference counts are still updated for pointers from one heap object to another.

Quick Review: Mark-Sweep

The first tracing garbage collection algorithm.

Garbage cells are allowed to build up until heap space is exhausted (i.e. a user program requests a memory allocation, but there is insufficient free space on the heap to satisfy the request).

At this point, the mark-sweep algorithm is invoked, and garbage cells are returned to the free list.

Mark-Sweep (Continued)

Performed in two phases:
– Mark phase: identifies all live cells by setting a mark bit. Live cells are cells reachable from a root.
– Sweep phase: returns garbage cells to the free list.
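The two phases can be sketched as follows. This is a minimal illustration, assuming cells are Python objects with a `marked` flag and that `allocated` lists every cell currently handed out; all names are hypothetical.

```python
class Cell:
    def __init__(self, name):
        self.name = name
        self.children = []
        self.marked = False       # the mark bit


def mark(roots):
    """Mark phase: set the mark bit on every cell reachable from a root."""
    stack = list(roots)
    while stack:
        cell = stack.pop()
        if not cell.marked:
            cell.marked = True
            stack.extend(cell.children)


def sweep(allocated, free_list):
    """Sweep phase: visit every allocated cell and free the unmarked ones."""
    survivors = []
    for cell in allocated:
        if cell.marked:
            cell.marked = False   # reset for the next collection
            survivors.append(cell)
        else:
            cell.children = []
            free_list.append(cell)
    return survivors


a, b, c, d = (Cell(n) for n in "ABCD")
a.children = [b]
c.children = [d]
d.children = [c]                  # an unreachable cycle
free_list = []
mark([a])
allocated = sweep([a, b, c, d], free_list)
```

Unlike reference counting, the unreachable cycle C, D is reclaimed, because liveness is decided by reachability rather than by counts.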

Mark-Sweep Example

[Figure: a heap in which the unmarked cells are labelled "Returned to free list".]

Mark-Sweep: Advantages and Disadvantages

Advantages:
– Cyclic data structures can be recovered.
– Tends to be faster than reference counting.

Mark-Sweep: Advantages and Disadvantages (Continued)

Disadvantages:
– Computation must be halted while garbage collection is being performed.
– Every live cell must be visited in the mark phase, and every cell in the heap must be visited in the sweep phase.
– Garbage collection becomes more frequent as residency of a program increases.
– May fragment memory.

Mark-Sweep: Advantages and Disadvantages (Continued)

Disadvantages:
– Has negative implications for locality of reference. Old objects get surrounded by new ones (not suited for virtual memory applications).

However, if objects tend to survive in clusters in memory, as they apparently often do, this can greatly reduce the cost of the sweep phase.

Mark-Compact Collection

Remedies the fragmentation and allocation problems of mark-sweep collectors.

Two phases:
– Mark phase: identical to mark-sweep.
– Compaction phase: marked objects are compacted, moving most of the live objects until all the live objects are contiguous.

Mark-Compact: Advantages and Disadvantages

Advantages:
– The contiguous free area eliminates the fragmentation problem. Allocating objects of various sizes is simple.
– The garbage space is "squeezed out" without disturbing the original ordering of objects. This improves locality.

Mark-Compact: Advantages and Disadvantages (Continued)

Disadvantages:
– Several passes over the data are required. "Sliding compactors" take two, three or more passes over the live objects.
– One pass computes the new locations.
– Subsequent passes update the pointers to refer to the new locations, and actually move the objects.
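The passes of a sliding compactor can be sketched as follows. This is a simplified illustration, assuming one object per heap slot, pointers represented as slot indices, and a `live` set produced by an earlier mark phase; the function and variable names are hypothetical.

```python
def compact(heap, roots, live):
    """Slide live objects to the start of the heap, preserving their order.

    heap  : list of objects (dicts with a 'fields' list of slot indices)
    roots : list of slot indices
    live  : set of slot indices marked live
    Returns (new_roots, first_free_slot).
    """
    # Pass 1: compute the new location of each live object.
    new_address = {}
    next_free = 0
    for address in range(len(heap)):
        if address in live:
            new_address[address] = next_free
            next_free += 1

    # Pass 2: update pointers to refer to the new locations.
    for address in live:
        obj = heap[address]
        obj["fields"] = [new_address[p] for p in obj["fields"]]
    new_roots = [new_address[r] for r in roots]

    # Pass 3: move the objects; ascending order never overwrites a live one.
    for address in sorted(live):
        heap[new_address[address]] = heap[address]
    for address in range(next_free, len(heap)):
        heap[address] = None          # one contiguous free area
    return new_roots, next_free


heap = [
    {"name": "A", "fields": [2]},
    {"name": "X", "fields": []},      # garbage
    {"name": "B", "fields": [4]},
    {"name": "Y", "fields": []},      # garbage
    {"name": "C", "fields": []},
]
new_roots, first_free = compact(heap, roots=[0], live={0, 2, 4})
```

Because objects keep their relative order, allocation afterwards only needs to bump `first_free`.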

Copying Garbage Collection

Like mark-compact, copying garbage collection does not really "collect" garbage.

Rather, it moves all the live objects into one area, and the rest of the heap is known to be available.

Copying collectors integrate the traversal and the copying process, so that objects need only be traversed once.

The work needed is proportional to the amount of live data (all of which must be copied).

Semispace Collector Using the Cheney Algorithm

The heap is subdivided into two contiguous subspaces (FromSpace and ToSpace).

During normal program execution, only one of these semispaces is in use.

When the garbage collector is called, all the live data are copied from the current semispace (FromSpace) to the other semispace (ToSpace).

[Figure, two slides: FromSpace and ToSpace. The linked objects A, B, C and D in FromSpace are copied into ToSpace, where they are laid out as A B C D.]

Semispace Collector Using the Cheney Algorithm (Continued)

Once the copying is completed, the ToSpace is made the "current" semispace.

A simple form of copying traversal is the Cheney algorithm.

The immediately reachable objects form the initial queue of objects for a breadth-first traversal.

A scan pointer is advanced through the first object location by location.

Each time a pointer into FromSpace is encountered, the referred-to object is transported to the end of the queue and the pointer to the object is updated.

Cheney Algorithm: Example

[Figure: root nodes pointing into a graph of objects A to F in FromSpace, followed by successive snapshots of ToSpace as objects are copied (ending with A B C D E F), with the scan and free pointers advancing at each step.]

Semispace Collector Using the Cheney Algorithm (Continued)

Objects reachable by multiple paths must not be copied to ToSpace multiple times.

When an object is transported to ToSpace, a forwarding pointer is installed in the old version of the object.

The forwarding pointer signifies that the old object is obsolete and indicates where to find the new copy.
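The traversal and the forwarding pointers can be sketched together. This is a minimal illustration, assuming objects are Python instances and ToSpace is a list whose length plays the role of the free pointer; `Obj`, `cheney_collect` and `transport` are hypothetical names.

```python
class Obj:
    def __init__(self, name, fields=None):
        self.name = name
        self.fields = fields or []   # pointers to other objects
        self.forward = None          # forwarding pointer, set once copied


def cheney_collect(roots):
    """Copy everything reachable from roots; return (new_roots, tospace)."""
    tospace = []                     # also serves as the breadth-first queue

    def transport(old):
        if old.forward is None:      # first time this object is reached
            new = Obj(old.name, list(old.fields))
            tospace.append(new)      # the free pointer advances
            old.forward = new        # leave a forwarding pointer behind
        return old.forward

    new_roots = [transport(r) for r in roots]
    scan = 0
    while scan < len(tospace):       # done when scan catches up with free
        obj = tospace[scan]
        obj.fields = [transport(child) for child in obj.fields]
        scan += 1
    return new_roots, tospace


d = Obj("D")
b = Obj("B", [d])
c = Obj("C", [d])                    # D is reachable by two paths
a = Obj("A", [b, c])
e = Obj("E", [a])                    # unreachable, so never copied
new_roots, tospace = cheney_collect([a])
```

D is copied once even though both B and C point to it, because the second visit finds the forwarding pointer.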

Copying Garbage Collection: Advantages and Disadvantages

Advantages:
– Allocation is extremely cheap.
– Excellent asymptotic complexity.
– Fragmentation is eliminated.
– Only one pass through the data is required.

Copying Garbage Collection: Advantages and Disadvantages (Continued)

Disadvantages:
– The use of two semispaces doubles the memory requirement.
– Poor locality. Using virtual memory will cause excessive paging.

Problems with Simple Tracing Collectors

It is difficult to achieve high efficiency in a simple garbage collector, because large amounts of memory are expensive.

If virtual memory is used, the poor locality of the allocation/reclamation cycle will cause excessive paging.

Even as main memory becomes steadily cheaper, locality within cache memory becomes increasingly important.

Problems with Simple Tracing Collectors (Continued)

With a simple semispace copy collector, locality is likely to be worse than mark-sweep.

The memory issue is not unique to copying collectors.

Any efficient garbage collection involves a trade-off between space and time.

The problem of locality is an indirect result of the use of garbage collection.

Incremental Tracing Collectors Overview

Introduction to Incremental Collectors
Coherence and Conservatism
Tricolor Marking
Write Barrier Algorithms
Baker’s Read Barrier Algorithm

Incremental Tracing Collectors

Program (Mutator) and Garbage Collector run concurrently.
– Can think of the system as similar to two threads. One performs collection, and the other represents the regular program in execution.

Can be used in systems with real-time requirements. For example, process control systems.

Coherence & Conservatism

Coherence: A proper state must be maintained between the mutator and the collector.

Conservatism: How aggressive the garbage collector is at finding objects to be deallocated.

Tricoloring

White – Not yet traversed. A candidate for collection.

Black – Already traversed and found to be live. Will not be reclaimed.

Grey – In traversal process. Defining characteristic is that its children have not necessarily been explored.

The Tricolor Abstraction

[Figure: the tricolor abstraction.]

Tricoloring Invariant

There must not be a pointer from a black object to a white object.
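The invariant, and how an unprotected mutator can break it, can be sketched as follows. This is an illustrative model only: the node layout, `mark_step` and `invariant_holds` are hypothetical names, and the mutator here deliberately runs without any barrier.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Node:
    def __init__(self, name):
        self.name = name
        self.children = []
        self.colour = WHITE


def mark_step(grey_list):
    """One increment of collector work: blacken one grey node."""
    node = grey_list.pop()
    for child in node.children:
        if child.colour == WHITE:
            child.colour = GREY
            grey_list.append(child)
    node.colour = BLACK


def invariant_holds(nodes):
    """True if no black node points directly at a white node."""
    return all(child.colour != WHITE
               for node in nodes if node.colour == BLACK
               for child in node.children)


a, b, c, d = (Node(n) for n in "ABCD")
a.children = [b, c]
b.children = [d]
nodes = [a, b, c, d]

a.colour = GREY                  # A is referenced from a root
grey_list = [a]
mark_step(grey_list)             # A becomes black; B and C become grey
ok_before = invariant_holds(nodes)

# The mutator, with no barrier, moves the only pointer to D into A.
a.children.append(d)
b.children.remove(d)
ok_after = invariant_holds(nodes)

while grey_list:                 # the collector finishes marking
    mark_step(grey_list)
```

At the end D is still white although it is reachable from A, so a collector that trusted the colours would reclaim a live object.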

Violation of Coloring Invariant

[Figure: objects A, B, C and D, shown before and after a mutator update that violates the invariant.]

Steps in Violation

Read a pointer to a white object.
Assign that pointer to a black object.
The original pointer must be destroyed without the collection system noticing.

Read Barrier

Barriers are essentially memory access detection systems.

We detect when any pointers to any white objects are read.

If such a pointer is read, we conceptually color the object it refers to grey.

Write Barrier

When a pointer is written to an object, we record the write somehow.

The recorded write is dealt with at a later point.

Read vs. Write efficiency considerations.

Write Barrier Algorithms

Snapshot-at-beginning
Incremental update

Snapshot-at-beginning

Conceptually makes a copy-on-write duplication of the pointer graph.

Can be implemented with a simple write barrier that records pointer writes and adds the old addresses to a stack to be traversed later.
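A write barrier of this kind might look like the sketch below. It assumes objects with named pointer fields and a global `mark_stack`; `write_field`, `shade` and `mark_step` are hypothetical names, and the scenario is the black-to-white pointer move used in the tricolor discussion.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Node:
    def __init__(self, name):
        self.name = name
        self.fields = {}              # field name -> Node
        self.colour = WHITE


mark_stack = []                       # grey nodes waiting to be traversed


def shade(node):
    if node is not None and node.colour == WHITE:
        node.colour = GREY
        mark_stack.append(node)


def write_field(obj, field, new_target):
    """Snapshot-at-beginning barrier: save the pointer being overwritten."""
    shade(obj.fields.get(field))      # the old address goes on the stack
    if new_target is None:
        obj.fields.pop(field, None)
    else:
        obj.fields[field] = new_target


def mark_step():
    node = mark_stack.pop()
    for child in node.fields.values():
        shade(child)
    node.colour = BLACK


a, b, c, d = (Node(n) for n in "ABCD")
a.fields = {"b": b, "c": c}
b.fields = {"d": d}

shade(a)                              # A is referenced from a root
mark_step()                           # A is black; B and C are grey

write_field(a, "d", d)                # mutator copies the pointer into A
write_field(b, "d", None)             # and destroys the original

while mark_stack:
    mark_step()
```

The barrier fires on the second write, so D is traversed and survives. The same barrier would also retain D if the mutator had simply deleted the pointer, which is the source of the algorithm's conservatism.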

Snapshot-at-beginning Example

[Figure: objects A, B, C and D before and after a pointer write, together with the stack. Afterwards the pointer to D is on the stack.]

Comments on Snapshot-at-beginning

Very conservative.
All overwritten pointer values are saved and traversed.
No objects can be freed while the collection process is occurring.

Incremental Update Write-Barrier Algorithm

No copy of the tree is made.
Catches overwrites of pointers that have been copied.
– If a pointer is not copied before being overwritten, the object it referred to will be freed.
The object with the overwritten pointer is colored grey and the algorithm must search that node again at the end.
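One way to realise this barrier is to turn a black object grey again whenever one of its pointers is written, so that it is rescanned. The sketch below follows that reading of the slide; the names `write_field`, `shade` and `mark_step` are hypothetical.

```python
WHITE, GREY, BLACK = "white", "grey", "black"


class Node:
    def __init__(self, name):
        self.name = name
        self.fields = {}              # field name -> Node
        self.colour = WHITE


mark_stack = []


def shade(node):
    if node.colour == WHITE:
        node.colour = GREY
        mark_stack.append(node)


def write_field(obj, field, new_target):
    """Incremental-update barrier: a written-to black object is rescanned."""
    if new_target is None:
        obj.fields.pop(field, None)
    else:
        obj.fields[field] = new_target
    if obj.colour == BLACK:
        obj.colour = GREY
        mark_stack.append(obj)


def mark_step():
    node = mark_stack.pop()
    for child in node.fields.values():
        shade(child)
    node.colour = BLACK


a, b, c, d, e = (Node(n) for n in "ABCDE")
a.fields = {"b": b, "c": c}
b.fields = {"d": d}
c.fields = {"e": e}

shade(a)
mark_step()                           # A is black; B and C are grey

write_field(a, "d", d)                # pointer to D copied into black A
write_field(b, "d", None)             # original pointer to D destroyed
write_field(c, "e", None)             # E becomes garbage mid-collection

while mark_stack:
    mark_step()
```

D is found when A is rescanned, while E, whose only pointer was overwritten before C was scanned, stays white and can be reclaimed in this cycle. A snapshot-at-beginning barrier would have retained E.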

Incremental Update Example

[Figure: objects A, B, C and D, shown before and after.]

Comments on Incremental Update

Objects that become garbage during collection are far more likely to be reclaimed than with the snapshot algorithm (less conservative).

Although the collector restarts the traversal in some places, it is guaranteed to do a full search and will eventually terminate.

Baker’s Read Barrier Algorithms

Incremental Copying
Non-copying Algorithm (The Treadmill)

Incremental Copying

Variation of the copying collector.
“Garbage collection cycle begins with an atomic flip.”
All objects directly pointed to by the root are copied into ToSpace.

Read Barrier in Incremental Copying

Whenever an object is read that is not already in ToSpace, the read barrier catches that and copies the object over to ToSpace at that point.

Normal “background scavenging” occurs simultaneously to ensure that all objects are traversed and reclamation can occur.
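The interplay of the flip, the read barrier and background scavenging can be sketched as follows. This is a simplified model, assuming objects with named fields and an `in_tospace` flag in place of an address-range check; `IncrementalCopier` and its methods are hypothetical names.

```python
class Obj:
    def __init__(self, name, fields=None):
        self.name = name
        self.fields = dict(fields or {})   # field name -> Obj
        self.forward = None                # set once the object is copied
        self.in_tospace = False


class IncrementalCopier:
    def __init__(self):
        self.tospace = []
        self.scan = 0

    def transport(self, old):
        if old.in_tospace:
            return old
        if old.forward is None:
            new = Obj(old.name, old.fields)
            new.in_tospace = True
            self.tospace.append(new)
            old.forward = new
        return old.forward

    def flip(self, roots):
        """Atomic flip: copy the objects directly pointed to by the roots."""
        self.tospace, self.scan = [], 0
        return [self.transport(r) for r in roots]

    def read(self, obj, field):
        """Read barrier: the mutator never obtains a FromSpace pointer."""
        obj.fields[field] = self.transport(obj.fields[field])
        return obj.fields[field]

    def scavenge_step(self):
        """Background scavenging of one ToSpace object; False when done."""
        if self.scan == len(self.tospace):
            return False
        obj = self.tospace[self.scan]
        for field, child in obj.fields.items():
            obj.fields[field] = self.transport(child)
        self.scan += 1
        return True


d = Obj("D")
b = Obj("B", {"d": d})
c = Obj("C")
a = Obj("A", {"b": b, "c": c})

gc = IncrementalCopier()
(root,) = gc.flip([a])             # ToSpace holds only A
b_new = gc.read(root, "b")         # the barrier copies B on first read
d_new = gc.read(b_new, "d")        # and then D
while gc.scavenge_step():          # scavenging picks up the remaining C
    pass
```

The order of objects in ToSpace reflects the mutator's access pattern as well as the collector's scan, since reads pull objects across ahead of the scavenger.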

Incremental Copying Example

[Figure: FromSpace and ToSpace with objects A to E. Caption: atomic flip, then a read to D occurs…]

Comments on Read Barrier

If implemented in software, the read barrier can be quite slow because reads from the heap are so numerous.

Specialized hardware on some machines allows this type of tracing to be done quickly.

Baker’s Incremental Non-Copying Algorithm

Doubly linked lists
New area for allocations made since collection started
To/From spaces
Free list

[Figure: the treadmill, divided into New, Free, From and To segments, with Allocation and Scanning pointers.]

Example - Allocation

Take an object from the free list and move it to the new list.

[Figure: the treadmill (New, Free, From, To) with the Allocation and Scanning pointers.]

Example - Scanning

Search the nodes in ToSpace for references to objects in FromSpace.

When one is found, the object is unlinked from FromSpace and linked into ToSpace.

[Figure: the treadmill (New, Free, From, To) with the Allocation and Scanning pointers.]

Treadmill Workings

When starting a collection cycle:
– The New list is empty.
– The From list contains all New and To objects from the last cycle.
Collection proceeds, and scanning and allocation are performed.
When finished:
– The From list is merged with the Free list.
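The cycle described above can be modelled roughly as follows. This sketch uses Python sets in place of the doubly linked list segments, so it shows which list each object belongs to but not the constant-time relinking of the real structure; `Treadmill` and its methods are hypothetical names.

```python
class Cell:
    def __init__(self):
        self.children = []


class Treadmill:
    def __init__(self, cells):
        self.free = set(cells)
        self.new = set()
        self.from_ = set()
        self.to = set()
        self.grey = []                  # To objects not yet scanned

    def allocate(self):
        cell = self.free.pop()          # take an object from the free list
        self.new.add(cell)              # and move it to the new list
        return cell

    def start_cycle(self, roots):
        self.from_ = self.new | self.to # New and To objects of last cycle
        self.new, self.to, self.grey = set(), set(), []
        for root in roots:
            self._reach(root)

    def _reach(self, cell):
        if cell in self.from_:          # unlink from From, link into To
            self.from_.remove(cell)
            self.to.add(cell)
            self.grey.append(cell)

    def scan_step(self):
        if not self.grey:
            return False
        for child in self.grey.pop().children:
            self._reach(child)
        return True

    def finish_cycle(self):
        for cell in self.from_:
            cell.children = []
        self.free |= self.from_         # From is merged with Free
        self.from_ = set()


tm = Treadmill([Cell() for _ in range(4)])
a, b, c = tm.allocate(), tm.allocate(), tm.allocate()
a.children = [b]                        # c is garbage
tm.start_cycle([a])
d = tm.allocate()                       # allocation during collection
while tm.scan_step():
    pass
tm.finish_cycle()
```

Objects allocated during the cycle, such as `d`, sit in New and are not examined until the next cycle.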

Comments on Treadmill

As in Incremental Copying, the garbage found in the FromSpace is reclaimed in constant time.

Conservative with new objects.
Conservative also in that reached objects will not be removed even if they become garbage before the scan ends.

Incremental Collectors Summary

Incremental Tracing Collectors
Tricolor Marking and Invariant
Read and Write Barriers
Snapshot-at-beginning
Incremental Update
Baker’s Incremental Copying
Baker’s Non-copying (Treadmill)

Generational Garbage Collection

Attempts to address weaknesses of simple tracing collectors such as mark-sweep and copying collectors:
– All active data must be marked or copied.
– For copying collectors, each page of the heap is touched every two collection cycles, even though the user program is only using half the heap, leading to poor cache behavior and page faults.
– Long-lived objects are handled inefficiently.

Generational Garbage Collection (Continued)

Generational garbage collection is based on the generational hypothesis:

Most objects die young.

As such, concentrate garbage collection efforts on objects likely to be garbage: young objects.

Generational Garbage Collection: Object Lifetimes

When we discuss object lifetimes, the amount of heap allocation that occurs between the object’s birth and death is used rather than wall-clock time.

For example, an object created when 1 KB of heap had been allocated and no longer referenced once 4 KB of heap data had been allocated would have lived for 3 KB.

Generational Garbage Collection: Object Lifetimes (Continued)

Typically, between 80 and 98 percent of all newly-allocated heap objects die before another megabyte has been allocated.

Generational Garbage Collection (Continued)

Objects are segregated into different areas of memory based on their age.

Areas containing newer objects are garbage collected more frequently.

After an object has survived a given number of collections, it is promoted to a less frequently collected area.

Generational Garbage Collection: Example

[Figure, three slides: an Old Generation and a New Generation, each with its memory usage, a root set, and objects S, A, B and C. Later slides add R and then D.]

Generational Garbage Collection: Example (Continued)

This example demonstrates several interesting characteristics of generational garbage collection:
– The young generation can be collected independently of the older generations (resulting in shorter pause times).
– An intergenerational pointer was created from R to D. These pointers must be treated as part of the root set of the New Generation.
– Garbage collection in the new generation results in S becoming unreachable, and thus garbage. Garbage in older generations (sometimes called tenured garbage) cannot be reclaimed via garbage collections in younger generations.
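Treating intergenerational pointers as roots can be sketched with a remembered set maintained by a write barrier. This is one common approach, shown here as an assumption rather than as the method the slides use; `remembered`, `write_field` and `minor_collect` are hypothetical names, and the object graph is a made-up example, not the one in the figure.

```python
class Obj:
    def __init__(self, name, generation="new"):
        self.name = name
        self.fields = []
        self.generation = generation


remembered = set()   # old objects that hold pointers into the new generation


def write_field(source, target):
    """Write barrier that records old-to-new pointers."""
    source.fields.append(target)
    if source.generation == "old" and target.generation == "new":
        remembered.add(source)


def minor_collect(new_generation, roots):
    """Return the survivors of the new generation without tracing old objects."""
    worklist = [r for r in roots if r.generation == "new"]
    for old in remembered:          # intergenerational pointers act as roots
        worklist.extend(f for f in old.fields if f.generation == "new")
    live = set()
    while worklist:
        obj = worklist.pop()
        if obj in live:
            continue
        live.add(obj)
        worklist.extend(f for f in obj.fields if f.generation == "new")
    return [o for o in new_generation if o in live]


r = Obj("R", generation="old")
a, b, d = Obj("A"), Obj("B"), Obj("D")
write_field(r, d)                   # old-to-new pointer, recorded by the barrier
survivors = minor_collect([a, b, d], roots=[r, a])
```

D survives only because R is in the remembered set; the old generation itself is never traversed during the minor collection.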

Generational Garbage Collection: Implementation

Usually implemented as a copying collector, where each generation has its own semispaces:

[Figure: the Old Generation and the New Generation, each with its own FromSpace and ToSpace.]

Generational Garbage Collection: Issues

Choosing an appropriate number of generations:
– If we benefit from dividing the heap into two generations, can we further benefit by using more than two generations?

Choosing a promotion policy:
– How many garbage collections should an object survive before being moved to an older generation?

Generational Garbage Collection: Issues (Continued)

Tracking intergenerational pointers:
– Inter-generational pointers need to be tracked, since they form part of the root set for younger generations.

Collection scheduling:
– Can we attempt to schedule garbage collection in such a way that we minimize disruptive pauses?

Generational Garbage Collection: Multiple Generations

[Figure: Generation 1, Generation 2, Generation 3, Generation 4.]

Generational Garbage Collection: Multiple Generations (Continued)

Advantages:
– Keeps the youngest generation’s size small.
– Helps address mistakes made by the promotion policy by creating more intermediate generations that still get garbage collected fairly frequently.

Disadvantages:
– Collections for intermediate generations may be disruptive.
– Tends to increase the number of inter-generational pointers, increasing the size of the root set for younger generations.

Most generational collectors are limited to just two or three generations.

Generational Garbage Collection: Promotion Policies

A promotion policy determines how many garbage collection cycles (the cycle count) an object must survive before being advanced to the next generation.

If the cycle count is too low, objects may be advanced too fast and die shortly afterwards in an older generation (becoming tenured garbage); if it is too high, the benefits of generational garbage collection are not realized.

Generational Garbage Collection: Promotion Policies (Continued)

With a cycle count of just one, objects created just before the garbage collection will be advanced, even though the generational hypothesis states they are likely to die soon.

Increasing the cycle count to two denies advancement to recently created objects.

Under most conditions, increasing the cycle count beyond two does not significantly reduce the amount of data advanced.
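The effect of the cycle count can be illustrated with a minimal sketch. The names `YoungObject`, `minor_collection`, and `live_names` are hypothetical, and liveness is simply passed in rather than computed by tracing; this is an illustration of the policy only, not any particular collector's implementation.

```python
from dataclasses import dataclass


@dataclass
class YoungObject:
    name: str
    survived: int = 0  # number of minor collections survived so far


def minor_collection(young, live_names, cycle_count):
    """Return (still_young, promoted) after one minor collection.

    `live_names` stands in for the result of tracing (an assumption
    made to keep the sketch short).
    """
    still_young, promoted = [], []
    for obj in young:
        if obj.name not in live_names:
            continue  # unreachable: reclaimed by this collection
        obj.survived += 1
        if obj.survived >= cycle_count:
            promoted.append(obj)      # advanced to the next generation
        else:
            still_young.append(obj)   # must survive further collections
    return still_young, promoted
```

With `cycle_count=1` every survivor is advanced at once, including objects allocated just before the collection; with `cycle_count=2` such objects stay in the young generation for one more cycle.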

Generational Garbage Collection: Inter-generational Pointers

Inter-generational pointers can be created in two ways:
– When an object containing pointers is promoted to an older generation.
– When a pointer to an object in a newer generation is stored in an object in an older generation.

The garbage collector can easily detect promotion-caused inter-generational pointers, but handling pointer stores is a more complicated task.

Generational Garbage Collection: Inter-generational Pointers

Pointer stores can be tracked via the use of a write barrier:
– Pointer stores must be accompanied by extra bookkeeping instructions that let the garbage collector know of pointers that have been updated.

Write barriers are often implemented at the compiler level.
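As a rough sketch of the idea, the store and its bookkeeping can be bundled into one operation. The `Cell` and `Mutator` classes and the `updated` log below are hypothetical names chosen for illustration; in a compiled language the compiler would emit the bookkeeping inline at each pointer store instead of calling a method.

```python
class Cell:
    """A heap object with named pointer fields (illustrative)."""

    def __init__(self):
        self.fields = {}


class Mutator:
    """Performs pointer stores on behalf of the program."""

    def __init__(self):
        self.updated = []  # log of updated slots, read later by the collector

    def store(self, obj, field, value):
        obj.fields[field] = value           # the pointer store itself
        self.updated.append((obj, field))   # write barrier: extra bookkeeping
```

This simplest form records every store unconditionally; a real barrier would usually filter or compact what it records.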

Generational Garbage Collection: Inter-generational Pointers (Continued)

However, write barriers provide only a conservative estimate of live inter-generational pointers: a recorded pointer is treated as a root even if the old object holding it is itself garbage, so the young object it points to survives until the old generation is collected.

[Figure: the root set, the old generation, and the new generation.]

Generational Garbage Collection: Inter-generational Pointers (Continued)

Tracking inter-generational pointers is often the largest cost of generational garbage collection.

About 1 percent of a typical Lisp program’s total instruction count is pointer stores. If a write barrier adds 10 instructions to each pointer store, overall performance will drop by roughly 10 percent.
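The arithmetic behind this estimate can be written out as a small sketch. The function name `barrier_overhead` is hypothetical, and the calculation assumes every instruction costs the same, which real hardware does not guarantee.

```python
def barrier_overhead(store_fraction, extra_instructions):
    """Extra instructions executed, as a fraction of the original count.

    store_fraction: share of all instructions that are pointer stores.
    extra_instructions: instructions the write barrier adds per store.
    """
    return store_fraction * extra_instructions


# 1 percent pointer stores, 10 added instructions per store:
# the program executes about 10 percent more instructions.
overhead = barrier_overhead(0.01, 10)
```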

Generational Garbage Collection: Inter-generational Pointers (Continued)

Entry Tables
– Pointers from older generations point indirectly to younger generations via an entry table:

[Figure: Generation 3, Generation 2, and Generation 1, with two entry tables.]

Generational Garbage Collection: Inter-generational Pointers (Continued)

Entry Table: Advantages
– When a younger generation is collected, only the entry table for that generation needs to be scanned.

Entry Table: Disadvantages
– The entry table may contain several entries to the same object, making scans of the entry table proportional to the number of pointer stores rather than to the number of inter-generational pointers.
– High overhead because of the extra level of indirection.
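The trade-off can be seen in a minimal sketch of an entry table. The `EntryTable` class and its method names are hypothetical; the sketch assumes each pointer store from an older generation simply appends a new entry, which is what produces duplicate entries for the same object.

```python
class EntryTable:
    """Entry table belonging to one younger generation (illustrative)."""

    def __init__(self):
        self.entries = []

    def add(self, young_obj):
        """Record a store; the old object keeps the returned index."""
        self.entries.append(young_obj)
        return len(self.entries) - 1

    def deref(self, index):
        """Follow an old-to-young pointer: one extra level of indirection."""
        return self.entries[index]

    def roots(self):
        """Everything scanned when this younger generation is collected."""
        return list(self.entries)


table = EntryTable()
young = {"name": "y"}
old_a = {"ref": table.add(young)}  # old objects hold indices, not addresses
old_b = {"ref": table.add(young)}  # a second store creates a second entry
```

Here `table.roots()` has two entries although only one young object is referenced, so the scan grows with the number of stores.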

Generational Garbage Collection: Inter-generational Pointers (Continued)

Remembered Sets
– The write barrier checks to see if a pointer being stored in an old object points to an object in a newer generation. If so, the address of the old object is added to the remembered set (if that object is not already in the set).
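A minimal sketch of such a barrier follows. The `HeapObject` class, the numeric `generation` field (0 for the newest generation, larger for older ones), and the function name `write_barrier` are assumptions made for illustration; a Python `set` stands in for the remembered set, so the check for an object already being present is implicit.

```python
class HeapObject:
    def __init__(self, generation):
        self.generation = generation  # 0 = newest; larger numbers are older
        self.fields = {}


def write_barrier(obj, field, value, remembered):
    """Store `value` into `obj.fields[field]`, recording old-to-young stores."""
    obj.fields[field] = value
    if isinstance(value, HeapObject) and value.generation < obj.generation:
        remembered.add(obj)  # adding an object twice leaves the set unchanged


remembered = set()
old = HeapObject(generation=1)
young_1, young_2 = HeapObject(0), HeapObject(0)
write_barrier(old, "left", young_1, remembered)
write_barrier(old, "right", young_2, remembered)
write_barrier(old, "left", young_2, remembered)
```

After three stores into the same old object, the remembered set still holds a single entry.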

Generational Garbage Collection: Inter-generational Pointers (Continued)

Remembered Sets (Continued)

[Figure: the old generation, the new generation, and the remembered set.]

Generational Garbage Collection: Inter-generational Pointers (Continued)

Remembered Sets: Advantages
– Scanning is proportional to the number of stored-into objects, not the number of store operations.

Remembered Sets: Disadvantages
– Pointer store checking can be expensive.
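How a remembered set is used during a minor collection can be sketched as follows. The `Node` class and the function `live_young` are hypothetical, and the sketch assumes the write barrier has already placed every old node that points into the new generation in `remembered_set`. Only the liveness computation is shown; copying survivors is omitted.

```python
class Node:
    def __init__(self, name, old=False):
        self.name = name
        self.old = old        # True if the node lives in the old generation
        self.children = []


def live_young(roots, remembered_set):
    """Names of new-generation nodes found live, without tracing old nodes."""
    # Start from young roots plus the young children of remembered old nodes.
    worklist = [n for n in roots if not n.old]
    for old_node in remembered_set:
        worklist.extend(c for c in old_node.children if not c.old)
    live = set()
    while worklist:
        node = worklist.pop()
        if node.name in live:
            continue
        live.add(node.name)
        worklist.extend(c for c in node.children if not c.old)
    return live
```

The old generation is never traversed: each remembered old node is visited once, regardless of how many stores were made into it. The result is conservative, since a remembered old node is treated as a root even if it is itself garbage.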

Generational Garbage Collection: Collection Scheduling

Generational garbage collection aims to reduce pause times. When should these (hopefully short) pauses occur?

Two strategies exist:
– Hide collections when the user is least likely to notice a pause, or
– Trigger efficient collections when there is likely to be lots of garbage to collect.

Generational Garbage Collection: Advantages

In practice it has proven to be an effective garbage collection technique.

Minor garbage collections are performed quickly.

Good cache and virtual memory behavior.

Generational Garbage Collection: Disadvantages

Performs poorly if either of the main assumptions is false:
– That objects tend to die young.
– That there are relatively few pointers from old objects to young ones.

Frequent pointer writes to older generations will increase the cost of the write barrier, and possibly increase the size of the root set for younger generations.

Garbage Collection: Summary

Method | Conservatism | Space | Time | Fragmentation | Locality
Mark Sweep | Major | Basic | 1 traversal + heap scan | Yes | Fair
Mark Compact | Major | Basic | Many passes of heap | No | Good
Copying | Major | Two semispaces | 1 traversal | No | Poor
Reference Counting | No | Reference count field | Constant per assignment | Yes | Very good
Deferred Reference Counting | Only for stack variables | Reference count field | Constant per assignment | Yes | Very good
Incremental | Varies depending on algorithm | Varies | Can be guaranteed real-time | Varies | Varies
Generational | Variable | Segregated areas | Varies with number of live objects in new generation | Yes (non-copying), No (copying) | Good

[Side labels on the slide: Tracing; Incremental.]

Garbage Collection: Conclusions

Relieves the burden of explicit memory allocation and deallocation.

Software module coupling related to memory management issues is eliminated.

An extremely dangerous class of bugs is eliminated.

Garbage Collection: Conclusions (Continued)

Zorn’s study in 1989/93 compared garbage collection to explicit deallocation:
– Non-generational garbage collection: between 0% and 36% more CPU time; between 40% and 280% more memory.
– Generational garbage collection: between 5% and 20% more CPU time; between 30% and 150% more memory.

Wilson feels these numbers can be improved, and they are also out of date.

A well-implemented garbage collector will slow a program down by approximately 10 percent relative to explicit heap deallocation.

Garbage Collection: Conclusions (Continued)

Despite this cost, garbage collection is a feature of many widely used languages and runtimes:
– Lisp (1959)
– Perl (1987)
– Java (1995)
– C# (2001)
– Microsoft’s Common Language Runtime (2002)

Garbage Collection: Pointers

Heap of Fish applet (mark and sweep garbage collection example)
http://www.artima.com/insidejvm/applets/HeapOfFish.html

Java HotSpot Garbage Collection Strategies
http://developer.java.sun.com/developer/technicalArticles/Networking/HotSpot/

The Memory Management Reference
http://www.memorymanagement.org/

Uniprocessor Garbage Collection Techniques (Wilson)
http://www.cs.ualberta.ca/~duane/courses/425-525/WilsonACMDraft.pdf

Garbage Collection: Algorithms for Automatic Dynamic Memory Management (Richard Jones and Rafael Lins)

Questions?

If you have any questions, please feel free to e-mail one of us:
Patrick Earl [email protected]
Simon Leonard [email protected]
Jack Newton [email protected]