
A Parallel, Real-Time Garbage Collector

Authors: Perry Cheng, Guy E. Blelloch

Presenter: Jun Tao

Outline

• Introduction
• Background and Definitions
• Theoretical Algorithm
• Extended Algorithm
• Evaluation
• Conclusion

Introduction

• First garbage collectors:
– Non-incremental, non-parallel

• Recent collectors
– Incremental
– Concurrent
– Parallel

Introduction

• Scalably parallel and real-time collector
– All aspects of the collector are incremental
– Parallel
• Arbitrary number of application and collector threads
– Tight theoretical bounds on
• Pause time for any application
• Total memory usage
– Asymptotically but not practically efficient

Introduction

• Extended collector algorithm
– Work with generations
– Increase the granularity of the incremental steps
– Separately handle global variables
– Delay the copy on write
– Reduce the synchronization cost of copying small objects
– Parallelize the processing of large objects
– Reduce double allocation during collection
– Allow program stacks

Background and Definitions

• A Semispace Stop-and-Copy Collector
– Divide heap memory into two equally sized halves
• From-space and to-space
– Suspend the mutator and copy reachable objects to to-space when from-space is full
– Update root values and reverse the roles of from-space and to-space
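A minimal, single-threaded sketch of this stop-and-copy scheme in C (my own illustration, assuming a toy two-field object layout with a spare forwarding word; not the paper's code). This is the non-incremental baseline the paper starts from: everything reachable is copied in one pause.

#include <stddef.h>
#include <string.h>

#define TO_SPACE_OBJS 4096

/* Toy object: one forwarding word plus two pointer fields. */
typedef struct Obj { struct Obj *forward; struct Obj *fields[2]; } Obj;

static Obj to_space[TO_SPACE_OBJS];
static Obj *alloc_ptr;   /* next free slot in to-space          */
static Obj *scan_ptr;    /* Cheney scan pointer (gray frontier) */

/* Copy one from-space object to to-space, or return its existing replica. */
static Obj *forward_obj(Obj *o) {
    if (o == NULL) return NULL;
    if (o->forward != NULL) return o->forward;   /* already copied */
    Obj *copy = alloc_ptr++;                     /* bump-allocate in to-space */
    memcpy(copy, o, sizeof *copy);
    copy->forward = NULL;
    o->forward = copy;                           /* leave a forwarding pointer */
    return copy;
}

/* Stop the mutator, copy everything reachable from the roots, breadth-first. */
void collect(Obj **roots, size_t nroots) {
    alloc_ptr = scan_ptr = to_space;
    for (size_t i = 0; i < nroots; i++)
        roots[i] = forward_obj(roots[i]);
    while (scan_ptr < alloc_ptr) {               /* scan gray objects in order */
        for (int f = 0; f < 2; f++)
            scan_ptr->fields[f] = forward_obj(scan_ptr->fields[f]);
        scan_ptr++;
    }
    /* Finally the roles of from-space and to-space would be swapped. */
}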

Background and Definitions

• Types of Garbage Collectors

Background and Definitions

• Types of Garbage Collectors (continued)

Background and Definitions

• Real-time Collector

– Maximum pause time
– Utilization
• The fraction of time that the mutator executes
– Minimum Mutator Utilization (MMU)
• A function of the window size
• The minimum utilization over all windows of that size
• Equals 0 when the window size <= the maximum pause time
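In symbols (my notation, not the slides'): writing mutator(t, t + w) for the amount of time in the window [t, t + w] during which the mutator runs,

\[ \mathrm{MMU}(w) = \min_{t} \frac{\mathrm{mutator}(t,\, t+w)}{w} \]

so any window fully covered by a pause drives the minimum to 0, which is why MMU(w) = 0 whenever w is no larger than the maximum pause time.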

Theoretical Algorithm

• A parallel, incremental and concurrent collector
– Based on Cheney's simple copying collector
– All objects are stored in a shared global pool of memory
– Two atomic instructions
• FetchAndAdd
• CompareAndSwap
– Collector interfaces with the application when
• Allocating space for a new object
• Initializing the fields of a new object
• Modifying a field of an existing object
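A rough sketch of the two primitives and of FetchAndAdd-based allocation from the shared pool (C11 atomics; the pool layout, sizes, and names are my assumptions, not the paper's runtime interface):

#include <stdatomic.h>
#include <stddef.h>

/* The two atomic primitives, expressed with C11 atomics. */
static inline size_t FetchAndAdd(_Atomic size_t *loc, size_t n) {
    return atomic_fetch_add(loc, n);                       /* returns the old value */
}
static inline int CompareAndSwap(_Atomic(void *) *loc, void *expected, void *desired) {
    return atomic_compare_exchange_strong(loc, &expected, desired);  /* 1 on success */
}

/* Allocation from the shared global pool: each request claims nwords with a
 * single FetchAndAdd on the shared cursor. */
#define POOL_WORDS (1 << 20)
static long pool[POOL_WORDS];
static _Atomic size_t pool_cursor;

void *alloc(size_t nwords) {
    size_t idx = FetchAndAdd(&pool_cursor, nwords);
    if (idx + nwords > POOL_WORDS) return NULL;            /* pool exhausted: start a GC */
    return &pool[idx];
}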

Theoretical Algorithm

• Scalable Parallelism
– Maintain the set of gray objects
– Cheney's technique
• Keeping them in contiguous locations in to-space
• Pros
– Simple
• Cons
– Restricts the traversal order to breadth-first
– Difficult to implement in a parallel setting

Theoretical Algorithm

• Scalable Parallelism (continued)
– Explicitly managed local stacks
• Each processor maintains a local stack
• A shared stack of gray objects
• Periodically transfer gray objects between the local and shared stacks
• Avoids idleness
– Pushes (or pops) can proceed in parallel
• Reserve a target region before the transfer
• Pushes and pops are not concurrent
• Room synchronization (sketched below)
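A simplified, lock-based sketch of room synchronization (the paper's rooms are built from atomic counters and also assist termination detection; this version only enforces the core invariant that the pop room and the push room are never occupied at the same time):

#include <pthread.h>

/* Two-room synchronizer: any number of threads may be in the same room at
 * once, but the two rooms are never occupied simultaneously. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  empty;
    int occupants;     /* threads inside the currently open room */
    int open_room;     /* which room (0 = pop, 1 = push) is open, -1 = none */
} Rooms;

void rooms_init(Rooms *r) {
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->empty, NULL);
    r->occupants = 0;
    r->open_room = -1;
}

void room_enter(Rooms *r, int room) {
    pthread_mutex_lock(&r->lock);
    /* Wait until no room is open, or the requested room is already open. */
    while (r->open_room != -1 && r->open_room != room)
        pthread_cond_wait(&r->empty, &r->lock);
    r->open_room = room;
    r->occupants++;
    pthread_mutex_unlock(&r->lock);
}

void room_exit(Rooms *r) {
    pthread_mutex_lock(&r->lock);
    if (--r->occupants == 0) {           /* last thread out closes the room */
        r->open_room = -1;
        pthread_cond_broadcast(&r->empty);
    }
    pthread_mutex_unlock(&r->lock);
}

A collector thread would enter the pop room, fetch gray objects from the shared stack with a FetchAndAdd on its cursor, exit, gray the fetched objects locally, then enter the push room to return new gray objects to the shared stack.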

Theoretical Algorithm

• Scalable Parallelism (continued)
– Avoid white objects being copied twice
• Exclusive access by atomic instructions
• Copy-copy synchronization (sketched below)
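A sketch of that copy-copy synchronization: collector threads race with CompareAndSwap on the object's forwarding word, and only the winner performs the copy (the BUSY marker, object layout, and allocator are illustrative assumptions, not the paper's tag encoding):

#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define BUSY ((void *)1)   /* forwarding word while a copy is in flight */

typedef struct Obj {
    _Atomic(void *) forward;   /* NULL = white, BUSY = being copied, else replica */
    long fields[2];
} Obj;

static Obj to_space[1024];
static _Atomic size_t to_cursor;

static Obj *to_space_alloc(void) {            /* bump allocation in to-space */
    return &to_space[atomic_fetch_add(&to_cursor, 1)];
}

/* Returns the to-space replica of o, copying it if we win the race. */
Obj *copy_object(Obj *o) {
    void *expected = NULL;
    if (atomic_compare_exchange_strong(&o->forward, &expected, BUSY)) {
        Obj *replica = to_space_alloc();                 /* we won: do the copy */
        memcpy(replica->fields, o->fields, sizeof o->fields);
        atomic_store(&replica->forward, (void *)NULL);
        atomic_store(&o->forward, (void *)replica);      /* publish the replica */
        return replica;
    }
    void *f;                                   /* lost the race: wait for the winner */
    while ((f = atomic_load(&o->forward)) == BUSY)
        ;                                      /* spin until the replica is published */
    return (Obj *)f;
}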

Theoretical Algorithm

• Incremental and Replicating Collection
– Baker's incremental collector
• Copy k units of data when allocating a unit of data
– Bounds the pause time
• Mutator can only see copied objects in to-space
– A read barrier is needed
– Modification to avoid the read barrier
• Mutator can only see the original objects in from-space
– A write barrier is needed

Theoretical Algorithm

• Concurrency
– Program and collector execute simultaneously
– Program manipulates the primary memory graph
– Collector manipulates the replica graph
– A copy-write synchronization is needed (sketched below)
• Replica objects should be modified correspondingly
• Avoids race conditions
– Mark objects being copied
– The mutator's update to the replica should be delayed
– A write-write synchronization is needed
• Prohibits different mutator threads from modifying the same memory location concurrently
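A sketch of the mutator's write barrier with that copy-write synchronization (the COPYING marker, field layout, and deferred-update log are illustrative assumptions): the write always goes to the from-space original; if a replica already exists it is mirrored, and if the copy is still in flight the update is logged and replayed against the replica later.

#include <stdatomic.h>
#include <stddef.h>

#define COPYING ((void *)1)    /* forwarding word while a copy is in flight */

typedef struct Obj {
    _Atomic(void *) forward;              /* NULL, COPYING, or to-space replica */
    _Atomic(struct Obj *) fields[4];
} Obj;

/* Deferred updates, replayed against the replica by the collector later. */
typedef struct { Obj *x; int i; Obj *y; } LogEntry;
static LogEntry write_log[1024];
static _Atomic size_t log_cursor;

static void log_update(Obj *x, int i, Obj *y) {
    size_t n = atomic_fetch_add(&log_cursor, 1);
    write_log[n % 1024] = (LogEntry){ x, i, y };   /* overflow handling omitted */
}

void write_field(Obj *x, int i, Obj *y) {
    atomic_store(&x->fields[i], y);                /* primary (from-space) write */
    void *f = atomic_load(&x->forward);
    if (f == NULL)    return;                      /* not copied yet: nothing to fix */
    if (f == COPYING) { log_update(x, i, y); return; }   /* copy in flight: defer */
    Obj *replica = f;                              /* replica exists: mirror the write */
    atomic_store(&replica->fields[i], y);
}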

Theoretical Algorithm

• Space and Time Bounds
– Time bound on each memory operation
• c·k
– c: a constant
– k: the number of words collected per word allocated
– Space bound
• 2(R(1 + 1.5/k) + N + 5PD) ≈ 2R(1 + 1.5/k)
– R: reachable space
– N: maximum object count
– P: number of processors (P-way multiprocessor)
– D: maximum depth of the memory graph
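For a feel of the space bound (my arithmetic, dropping the small N and 5PD terms as the slide's approximation does): with k = 2 words collected per word allocated, the bound is 2R(1 + 1.5/2) = 2R · 1.75 = 3.5R, i.e. roughly 3.5 times the reachable space. A larger k tightens the space bound toward 2R but raises the c·k cost of each memory operation.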

Extended Algorithm

• Globals, Stacks and Stacklets
– Globals
• Updated when collection ends
• Arbitrarily many -> unbounded time
• Replicate globals like other heap objects
• Every global has two locations
• A single flag is used for all globals
– Stacks and Stacklets
• Divide stacks into fixed-size stacklets
• At most one stacklet is active at a time; the others can be replicated safely
• Also bounds the wasted space per stack
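A sketch of that globals scheme (names are illustrative, and the mutator-side barrier that keeps the inactive slot current is omitted): every global carries two slots, and one collector-wide flag selecting the current slot is flipped once when the collection ends.

#include <stdatomic.h>

typedef struct { void *slot[2]; } Global;

static _Atomic int current_slot = 0;     /* one flag shared by all globals */

void *global_read(Global *g)            { return g->slot[atomic_load(&current_slot)]; }
void  global_write(Global *g, void *v)  { g->slot[atomic_load(&current_slot)] = v; }

/* Collector side: after replicating, write forwarded values into the inactive
 * slot of every global, then flip the single flag to switch them all at once. */
void globals_flip(void) { atomic_fetch_xor(&current_slot, 1); }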

Extended Algorithm

• Granularity
– Block Allocation and Free Initialization (sketched below)
• Avoid calling FetchAndAdd for every memory allocation
• Each processor maintains a local pool in from-space, and a local pool in to-space when the collector is on
• Use a single FetchAndAdd when allocating a local pool
– Write Barrier
• Avoid updating copied objects on every write
• Record a triple <x, i, y> in a write log and defer the update
• Invoke the collector when the write log is full
• Eliminates frequent context switches
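A sketch of the block allocation scheme referenced above (block size and names are my assumptions): the shared heap cursor is bumped with one FetchAndAdd per block, and the common-case allocation is a purely local pointer bump.

#include <stdatomic.h>
#include <stddef.h>

#define HEAP_WORDS  (1 << 22)
#define BLOCK_WORDS (1 << 10)

static long heap[HEAP_WORDS];
static _Atomic size_t heap_cursor;

/* Per-processor pool; initialize both pointers equal (e.g. to heap) so the
 * first allocation triggers a refill:  LocalPool pool = { heap, heap };     */
typedef struct { long *next; long *limit; } LocalPool;

/* Claim a fresh block from the shared heap with a single FetchAndAdd. */
static int pool_refill(LocalPool *p) {
    size_t base = atomic_fetch_add(&heap_cursor, BLOCK_WORDS);
    if (base + BLOCK_WORDS > HEAP_WORDS) return 0;   /* heap exhausted: trigger GC */
    p->next  = &heap[base];
    p->limit = &heap[base + BLOCK_WORDS];
    return 1;
}

/* Fast path: no atomics, just a local bump (assumes nwords <= BLOCK_WORDS). */
void *local_alloc(LocalPool *p, size_t nwords) {
    if ((size_t)(p->limit - p->next) < nwords && !pool_refill(p))
        return NULL;
    void *obj = p->next;
    p->next += nwords;
    return obj;
}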

Extended Algorithm

• Small and Large Objects
– Original algorithm
• One field at a time
– Reinterpretation of the tag word
– Transferring the object from and to the local stack
– Extended algorithm
• Small objects
– Locked down and copied all at once
• Large objects
– Divided into segments
– Copied one segment at a time
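A sketch of segmented copying for large objects (segment size and names are my assumptions): collector threads claim segments of one large object with FetchAndAdd, so several threads can copy it in parallel and each incremental step touches at most one segment.

#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define SEG_WORDS 256

typedef struct {
    long *from, *to;            /* original and replica payloads */
    size_t nwords;
    _Atomic size_t next_seg;    /* next unclaimed segment index */
    _Atomic size_t done_segs;   /* segments fully copied */
} LargeObj;

/* Copy one segment; returns 0 when there is no segment left to claim. */
int copy_one_segment(LargeObj *o) {
    size_t nsegs = (o->nwords + SEG_WORDS - 1) / SEG_WORDS;
    size_t s = atomic_fetch_add(&o->next_seg, 1);
    if (s >= nsegs) return 0;
    size_t lo = s * SEG_WORDS;
    size_t hi = lo + SEG_WORDS < o->nwords ? lo + SEG_WORDS : o->nwords;
    memcpy(o->to + lo, o->from + lo, (hi - lo) * sizeof(long));
    atomic_fetch_add(&o->done_segs, 1);   /* object is fully copied once done_segs == nsegs */
    return 1;
}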

Extended Algorithm

• Algorithmic Modifications
– Reducing double allocation
• One allocation by the mutator and one by the collector
• Defer the double allocation
– Rooms and Better Rooms
• A push room and a pop room
• Only one room can be non-empty at a time
• Rooms
– Enter the pop room, fetch work and perform it, transition to the push room, push objects back onto the shared stack
– Graying objects is time-consuming
– Others must wait to enter the push room

Extended Algorithm

• Algorithmic Modifications (continued)
– Rooms and Better Rooms (continued)
• Better rooms
– Leave the pop room right after fetching work from the shared stack
– Detect that the shared stack is empty by maintaining a borrow counter
– Generational Collection
• Nursery and tenured space
• Trigger a minor collection when the nursery is full
• Trigger a major collection when the tenured space is full
• Tenured references might not be modified during collection
• Hold two fields for each mutable pointer
– One for the mutator to use, the other for the collector to update

Evaluation


Conclusion

• Implements a scalably parallel, concurrent, real-time garbage collector

• Thread synchronization is minimized
