Garbage Collecting the Internet: A Survey of Distributed Garbage Collection

SALEH E. ABDULLAHI AND GRAEM A. RINGWOOD
Queen Mary and Westfield College, University of London

Internet programming languages such as Java present new challenges to garbage-collection design. The spectrum of garbage-collection schema for linked structures distributed over a network is reviewed here. Distributed garbage collectors are classified first because they evolved from single-address-space collectors. This taxonomy is used as a framework to explore distribution issues: locality of action, communication overhead and indeterministic communication latency.

Categories and Subject Descriptors: C.2.4 [Computer-Communications Networks]: Distributed Systems; D.1.3 [Programming Techniques]: Concurrent Programming, Distributed Programming, Parallel Programming; D.4.2 [Operating Systems]: Storage Management; D.4.3 [File Systems Management]

General Terms: Languages, Management, Performance, Reliability

Additional Key Words and Phrases: Automatic storage reclamation, distributed object-oriented management, distributed file systems, distributed memories, memory management, network communication, distributed, object-oriented databases, reference counting

1. INTRODUCTION

Efficient automatic garbage collection is so useful and so difficult to make unobtrusive that it has been a field of active research for over three decades. The problem was considered solved in the late 80s with state-of-the-art generation scavengers employed by Smalltalk systems. Smalltalk was the major driving force in the development of garbage collectors in the 1980s. The 1990s saw a significant technology shift to distributed systems, which provide a further challenge to language and garbage-collection design.

Unfortunately, the term distributed system has been applied to so wide a range of computing systems (loosely coupled, closely coupled, tightly coupled, array processors, dataflow, neural nets, etc. [Booth 1981]) as to become almost without meaning. For this review, following Lamport [1978], a system is classified as distributed if the end-to-end message transmission delay is significantly greater than the time between consecutive events of a process.

Storage reclamation became a necessity when LISP pairs were introduced in the early 1960s [McCarthy 1981]. The lifetimes of such structures generally exceed the lifetime of modules that access them. Indeed, for Smalltalk the data structures are persistent. Moss [1990] defines a persistent store as one that outlives the execution of any process. The definition includes file systems and databases. Atkinson et al.

Permission to make digital/hard copy of part or all of this work for personal or classroom use is granted without fee provided that the copies are not made or distributed for profit or commercial advantage, the copyright notice, the title of the publication, and its date appear, and notice is given that copying is by permission of the ACM, Inc. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or a fee. 1998 ACM 0360-0300/98/03000330 $5.00

ACM Computing Surveys, Vol. 30, No. 3, September 1998



protocol, the title of this paper paraphrases the title of Lang et al. [1992]. Empirical evidence [Abdullahi 1995] suggests this collector is the state of the art. Knuth [1973] reviewed (nondistributed) collectors that appeared before 1968, and Cohen [1981] brought the survey of papers up to 1981. The most recent review of (nondistributed) garbage collection is Wilson [1992], presented at the International Workshop on Memory Management. A preliminary version of the present review [Abdullahi et al. 1992] appeared in this same workshop. A second review of distributed garbage collectors [Plainfosse and Shapiro 1995] was presented at a subsequent International Workshop on Memory Management. Jones and Lins [1996] published the first textbook on garbage collection, but it is mainly concerned with nondistributed garbage collection; Jones [1996] maintains a webliography on garbage collection.

Research on distributed garbage collection can be better understood as trial-and-error adaptation of ideas developed for single-processor, single-address-space collectors. For this reason, Section 2 surveys and classifies nondistributed garbage collectors. Old hands are warned that some attempt is made to rationalize the ontology of the subject. Section headings reflect the rationalized rather than historical nomenclature. Using this classification, Section 3 gradually uncovers the issues of distribution. Section 4 summarizes the problems and proposed solutions.

2. TAXONOMY OF SINGLE-ADDRESS-SPACE GARBAGE COLLECTORS

2.1 The Single-Address-Space Garbage-Collection Problem

Garbage collection, GC, is an inevitable consequence of programming languages that employ dynamic data structures. With dynamic data structures, the state of a computation at any instant can be considered as a many-rooted, directed graph called the computation graph (Figure 1). The roots are distinguished vertices that provide entry points to the graph. The internal vertices of the computation graph are realized as cells, contiguous segments of memory. A cell is effectively a base address from which offsets may be legitimately accessed. Due to concern for garbage collection in object-oriented languages, such as Smalltalk, cells (as everything in object-oriented languages) are called objects. If the cells are not of fixed size, they commonly have a terminator or a header field containing some indicator of the size of the cell. The indicator may be the number of bytes in the cell or a pointer to the last byte of the cell. Runtime-typed languages such as Prolog and Smalltalk often have a header field describing the data type of the cell. Edges of the graph are realized by store address fields within cells.

Cells referenced directly or indirectly from a root are said to be reachable, accessible, or live. As a computation progresses, addition and deletion of roots, vertices, and edges modify the graph. As a result, some portions of the graph become disconnected. Such cells are said to be unreachable, inaccessible, or dead. These disconnected subgraphs make no contribution to the result of the computation and are known as garbage. In Figure 1, a garbage cell is denoted by a filled circle and an accessible cell by an unfilled circle. Without reuse, the finite store available for allocating new vertices diminishes to zero. Garbage collection is the process by which the store occupied by garbage can be reused.
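The definitions of live and garbage cells above can be illustrated with a minimal sketch. It assumes the computation graph is given as a Python dictionary from cell names to the names they reference; the function name `reachable` and the sample graph are hypothetical and serve only as illustration.

```python
def reachable(roots, edges):
    """Return the set of cells reachable from any root.

    roots: iterable of cell names that act as entry points (assumed input).
    edges: dict mapping a cell name to the list of cells it references.
    """
    live = set()
    pending = list(roots)
    while pending:
        cell = pending.pop()
        if cell in live:
            continue                      # already visited
        live.add(cell)
        pending.extend(edges.get(cell, []))
    return live


# Hypothetical computation graph: "d" and "e" reference each other
# but are disconnected from the single root "a".
edges = {"a": ["b"], "b": ["c"], "c": [], "d": ["e"], "e": ["d"]}
live = reachable(["a"], edges)
garbage = set(edges) - live               # everything not live is garbage
```

Here the garbage set is simply the complement of the live set within the allocated cells, which is the view taken by the indirect (tracing) collectors discussed later.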

Such graph structures also occur in caches, file systems, and databases. The storage controlled by a file system appears as a directed graph of blocks. Some blocks are indices that contain references to other blocks. A file system maintains a file as long as its blocks can be traced from the root block of the disk.

The idea of collecting processes was first addressed in the context of the Actor language [Agha 1986; Kafura et al. 1990]. Each process, or actor, is referenced by its mailbox. Root actors are those that interface directly with devices. One actor can communicate with another if its mailbox is known. An actor is garbage if it cannot be communicated with, indirectly, from a root actor.

Most imperative languages (e.g., Pascal, C, C++, etc.) place the responsibility for cell allocation and reclamation on the programmer. With such languages, the programmer must explicitly reclaim garbage cells by using dispose, delete, or free system calls. Explicit store management leads to two very common bugs: incompleteness (also called memory leaks), a failure to reclaim all extant garbage, and unsoundness (also called the dangling reference problem), the premature reclaiming of accessible cells. Memory leaks can gradually accumulate until the computation fails because no space is left to allocate a new cell needed to complete the computation or store a file. Even if the computation does not run out of space, a computation may make little progress because of thrashing. Following Denning [1968], a computation with page faults every few instructions is said to thrash. Memory leaks can cause extreme nonlocality of reference as live cells are dispersed over a large virtual address space.

Reclaiming cells prematurely by a careless use of delete or free can cause the other even more insidious bug, unsoundness. When memory is reclaimed too soon, the space occupied by a cell may be reused while the cell is still reachable from a root. The same store location may, therefore, be simultaneously interpreted in different ways. Updates to one interpretation will cause unpredictable effects on the other. Unsoundness can be difficult to detect and debug because computations tend to fail long after the event that caused the problem.

The ontology (soundness and completeness) derives from theorem proving. What is different here is that the transitive closure of the references is dynamic. Because the two problems are difficult for programmers to address, a wide variety of languages provide automatic allocation and reclamation as part of their runtime system. In such systems, the allocation routines (e.g., CONS in Lisp) perform special actions to reclaim space as necessary, often when a memory request cannot be satisfied from the available free store. Calls to the deallocator (free or dispose) become unnecessary, as they are implicit in calls to the allocator. Dijkstra et al. [1978] introduce two useful abstractions to the study of garbage collection. The mutator, M, abstracts the process that performs the computation, including allocation of a new cell. The process that automatically reclaims garbage is called the collector, C. Three kinds of mutator operations that affect the garbage collector can be distinguished:

- creation of an edge to a new vertex,
- creation of a new edge between existing vertices, and
- destruction of an edge.

Figure 1. A representative, though small, state of a computation.

Historically, the major drawback of automatic garbage collection was that it significantly detracted from the performance of the mutator, both by introducing unpredictably long pauses in computation and by using large proportions of available processing cycles. Measurements of early Smalltalk-80 implementations [Krasner 1983] indicate 20% to 70% of runtime spent garbage collecting. Steele [1975] and Wadler [1976] reported collection overheads of between 10% and 40% for Lisp, with pause times between mutation of 4.5 seconds every 79 seconds [Foderaro 1981]. In the 80s, state-of-the-art collectors for Smalltalk-80 achieved less than 5% collector overhead, with typically less than 100-millisecond pause times [Ungar 1992]. Assuming two orders of magnitude increase in processor speed over the decade, this represents an order of magnitude reduction in pause time.

Very few papers have considered file management from the point of view of garbage collection [Rosenblum and Ousterhout 1992]. There are, however, many tools that use garbage-collection techniques for optimizing and repairing hard disks, Unix's fsck for example. Many object-oriented databases, object servers and persistent stores such as O2 [Delobel et al. 1995] and Object-Store [Lamb et al. 1991] have garbage collectors.

While there have been numerous papers on garbage collection of data structures, they tend to be language-specific, which is problematic because some languages allow optimizations that are not generally applicable. Language semantics can restrict the topology of the computation graph, which may be cyclic, acyclic, or a polytree. (A polytree is a singly connected graph.) In unoptimized functional languages, the graph is predominantly treelike [Clarke 1977]. This is not the case for the purer object-oriented languages such as Smalltalk, which make extensive use of cyclic data structures. The topology of the computation graph in turn (as will become clear) constrains the type of collector that can be effective.

Cohen [1981] refined the collector process, C [Dijkstra et al. 1978], to two subprocesses: identification, I, and reclamation, R. This can be extended into a taxonomy of collectors by distinguishing two classes of identification, direct and indirect (Figure 2). Direct identification of garbage (also called reference counting) identifies cells that have no references to them. Indirect identification identifies live cells; what remains of the total store must be unallocated store and garbage. While direct identification is local in nature, indirect identification is global: it requires tracing live cells from the roots. The latter is called the tracing family in Lang and Dupont [1987].

The form of reclamation depends on how free-store is managed. It can be managed as a free-list (equally well as a bitmap or buddy system) or a heap (linear allocation) [Wilson et al. 1995]. If managed as a free-list, contiguous garbage can be coalesced to provide larger cells. If managed as a heap, a single reference, the top of heap, indicates the division between allocated and unallocated store. Reclamation can be performed either by compacting or scavenging.

Various collectors have been proposed that seek to optimize different criteria. Some aim to minimize the ratio of time spent collecting to the time spent mutating; some aim to minimize the time taken in any one invocation of the collector to provide predictable performance for reactive (interactive or real-time) systems; some aim to minimize the space overhead (the additional memory required to identify and reclaim garbage); and some are concerned with localization to prevent thrashing in virtual memory systems. The following sections further refine the taxonomy, outlining the advantages and disadvantages of different species.

2.2 Direct Identification of Garbage (Reference Counting)

Direct identification of garbage is effected by a reference count: a record of the number of references to a cell [Collins 1960; Weizenbaum 1963]. Each cell carries the space overhead of an extra header field to hold the count.

2.2.1 Immediate Identification and Reclamation

In the simplest form, garbage identification is coupled to mutation: after every mutation the count is adjusted. If as a result of mutation a cell count falls to zero, the cell is garbage and is reclaimed immediately [Collins 1960]. The collector reclaims cells recursively, decrementing the counts of referents and reclaiming as appropriate. This process is known, naturally enough, as recursive freeing. It is illustrated using a slightly elaborated informal notation introduced by Watson [1986]:

(M I) (M I) (M I) ... (R I) (R I) (R I) ... (M I) (M I) (M I) ... (R I) (R I) ...

The parentheses indicate atomic actions. The notation illustrates that recursive freeing can cause unbounded mutator delays.

A problem with immediate reference counting is that one component of the time spent in identification is proportional to the number of mutation operations [Steele 1975; Ungar 1984]. Significant time is also spent in recursive freeing: 5% on Berkeley Smalltalk and 1.9% on Dorado Smalltalk [Ungar 1984]. Because recursive freeing is unbounded, immediate reference counting can cause indeterministically long pauses in mutation, and so is unsuitable for reactive applications.
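Immediate reference counting with recursive freeing can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of any cited system: the `Cell` class, the helper names `add_ref`, `release` and `delete_ref`, and the `freed` log are all hypothetical.

```python
class Cell:
    def __init__(self, name):
        self.name = name
        self.count = 0      # number of references to this cell
        self.refs = []      # outgoing references (edges of the graph)


freed = []                  # names of reclaimed cells, in reclamation order


def add_ref(source, target):
    """Mutation plus identification: create an edge and adjust the count."""
    source.refs.append(target)
    target.count += 1


def release(cell):
    """Drop one reference; reclaim recursively if the count falls to zero."""
    cell.count -= 1
    if cell.count == 0:
        freed.append(cell.name)          # reclamation of this cell
        for referent in cell.refs:       # recursive freeing of its referents
            release(referent)
        cell.refs = []


def delete_ref(source, target):
    """Mutation: destroy the edge from source to target."""
    source.refs.remove(target)
    release(target)


# A chain root -> a -> b -> c; cutting the first edge frees the whole chain.
root, a, b, c = Cell("root"), Cell("a"), Cell("b"), Cell("c")
add_ref(root, a)
add_ref(a, b)
add_ref(b, c)
delete_ref(root, a)
```

A single call to `delete_ref` here reclaims three cells; with a longer chain the work done inside one mutator operation grows with the size of the structure released, which is the unbounded delay described above.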

2.2.2 Deferred Reclamation

The problem of recursive freeing can be alleviated by deferring reclamation. Using doubly linked free-list store management [Weizenbaum 1963], a newly deallocated cell is placed on the end of the free-list, but its referents are not immediately reclaimed. The referents are reclaimed when the cell advances to the head of the free-list. Only when the cell is reallocated are the counts of its immediate referents decremented, and any referent whose count falls to zero is added to the free-list. In this regime, garbage collection is local, fine-grained and interleaved with mutation:

(M I) (M I) (R I) (M I) (M I) (R I) ...

Figure 2. Garbage collector taxonomy.

Collectors that underestimate the extent of garbage in a single invocation of the collector are said to be incomplete. Such temporarily (or permanently) unidentified or unreclaimed garbage is known as floating garbage.

Deferred reclamation provides a smoother collection policy, one not so vulnerable to unbounded mutator delays. A scheme similar to Weizenbaum's but more suitable for arbitrary-size cells is that of Glaser and Thomson [1985]. It uses a to-be-decremented stack, TBD, instead of a doubly-linked list. In this scheme, references to cells are pushed onto the TBD stack if they have a count of one that is due to be decremented. When cells are allocated from the stack their count is already one, so the scheme also manages to elide clearing and setting the count.
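The free-list variant of deferred reclamation described in this section can be sketched as below. The sketch assumes a `collections.deque` stands in for the doubly linked free-list; the `Cell` class and the functions `release` and `allocate` are hypothetical names chosen for illustration.

```python
from collections import deque


class Cell:
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.refs = []


free_list = deque()         # stands in for the doubly linked free-list


def release(cell):
    """Drop one reference; a dead cell is queued with its referents untouched."""
    cell.count -= 1
    if cell.count == 0:
        free_list.append(cell)           # placed on the end of the free-list


def allocate():
    """Reuse the cell at the head of the free-list.

    Only now are the counts of its immediate referents decremented.
    """
    if not free_list:
        raise MemoryError("free-list exhausted")
    cell = free_list.popleft()
    for referent in cell.refs:
        release(referent)                # may queue the referent in turn
    cell.refs = []
    return cell


# A chain a -> b -> c, with a held by a single root reference.
a, b, c = Cell("a"), Cell("b"), Cell("c")
a.count = 1
a.refs.append(b); b.count = 1
b.refs.append(c); c.count = 1

release(a)                               # the root drops its reference to a
queued_after_release = [cell.name for cell in free_list]
reused = allocate()                      # reallocating a releases b, not c
queued_after_allocate = [cell.name for cell in free_list]
```

Each allocation does work proportional to the number of references in one cell, so the cost of freeing a large structure is spread over later allocations. Until those allocations happen, `b` and `c` remain as floating garbage.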

2.2.3 Deferred Identification

Deutsch and Bobrow [Deutsch 1976] observe that frequently, over a series of reference-counting operations, the net change in a cell's reference count is small, if not nil. For example, when duplicating a cell reference as a stack parameter to a procedure call, the cell acquires a reference that is lost once the procedure call returns. If adjusting such volatile references can be deferred, many garbage-identification operations can be elided.

Baden [1983] proposes such a scheme for Smalltalk-80, which was implemented by Miranda [1987]. References to cells from roots, such as the stack, are not included in a cell's count. Instead, root-only referenced cells are recorded in a Zero Count Table (ZCT). If a reference to a new cell is pushed on the call stack (the typical way new cells join the computation graph), a reference is placed in the ZCT. When a nonroot reference-counting operation causes a cell's count to fall to zero, a reference is also placed in the ZCT because it might still be referenced from a root. When the ZCT fills up or when no free store is available for cell allocation, the collector preferentially reclaims cells referenced by the ZCT. Reference counts are first stabilized (made consistent) by scanning the roots and increasing the count of all referenced cells. The ZCT is emptied by scanning the table and any cell with a zero count is freed. Finally, the roots are scanned again and the counts of cells referred to from roots are decremented. During this process any cells whose counts return to zero are placed in the ZCT because they are now only referenced by roots.

With this technique, stack pushes and pops are made without identification operations. Baden's measurements of a Smalltalk-80 system suggest that this method eliminates 90% of the reference-count operations and reduces the total time spent on garbage collection by half [Baden 1983]. A potential disadvantage is that scanning the ZCT causes a pause in mutation. However, typical pause times are of a few milliseconds [Miranda 1987]; a further disadvantage is the extra storage required by the ZCT.
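The three steps of a ZCT collection (stabilize, empty the table, undo the stabilization) can be sketched as follows. This is a simplified model under stated assumptions: the stack is a Python list of cell references, the ZCT is an unbounded set, and cells freed while emptying the table have their referents processed in the same scan. All names (`new_cell`, `add_heap_ref`, `collect`, and so on) are hypothetical.

```python
class Cell:
    def __init__(self, name):
        self.name = name
        self.count = 0      # heap references only; stack references are not counted
        self.refs = []


stack = []                  # roots: references held by the call stack
zct = set()                 # cells whose heap count is zero
freed = []


def new_cell(name):
    cell = Cell(name)
    stack.append(cell)      # pushed on the stack with no count update
    zct.add(cell)
    return cell


def add_heap_ref(source, target):
    source.refs.append(target)
    target.count += 1
    zct.discard(target)


def collect():
    for cell in stack:                   # 1. stabilize: count the root references
        cell.count += 1
    pending = list(zct)
    zct.clear()
    while pending:                       # 2. empty the ZCT
        cell = pending.pop()
        if cell.count == 0:
            freed.append(cell.name)
            for referent in cell.refs:
                referent.count -= 1
                if referent.count == 0:
                    pending.append(referent)
            cell.refs = []
    for cell in stack:                   # 3. undo the stabilization
        cell.count -= 1
        if cell.count == 0:
            zct.add(cell)                # now referenced only from roots


a = new_cell("a")
b = new_cell("b")
add_heap_ref(a, b)
stack.pop()                 # b leaves the stack: no counting operation
collect()
freed_after_first = list(freed)
stack.pop()                 # a leaves the stack as well
collect()
```

The stack pushes and pops in this sketch touch no counts at all; the only counting work happens on heap references and inside `collect`.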

2.2.4 Space Overhead and Overflow

Another drawback of reference counting is the space overhead of the count field. It has been observed [Krasner 1983] that the majority of cells have a small count. To reduce the space overhead, the size of the count field of a cell is often chosen smaller than needed. Typically, systems allocate one byte to hold the count. To prevent overflow once a count becomes saturated, it is not altered and no longer accurately reflects the number of references to the cell. To minimize the test for saturation, a signed byte is used to hold the count; a count is saturated if the byte is negative. This only allows a count to record accurately up to 127 references.

Clark's measurements of Lisp programs [Clark 1979] show that about 97% of list cells have a reference count of 1. This suggests an extreme form of saturation using a single-bit count [Friedman and Wise 1977; Chikayama and Kimura 1987]. A clear bit is used to indicate a single reference to a cell. When a second reference to the cell is created the bit is set. Once set, the bit cannot be cleared because, without tracing from the root, it cannot be determined if the cell has more than one reference. To reclaim cells that acquire more than one reference during their lifetime, it is necessary to use a second collector that uses indirect identification. Because of the predominance of single references, the indirect collector will be invoked considerably less often than if it were used alone.
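The saturating signed-byte count described in this section can be sketched as below. The sketch models the byte as a Python integer and assumes that incrementing past 127 wraps to -128, as it would in a two's-complement signed byte; the function names are hypothetical.

```python
MAX_COUNT = 127             # largest value a signed byte can hold
SATURATED = -128            # assumed wrap-around value; any negative byte means saturated


def increment(count):
    """Return the count after one more reference is created."""
    if count < 0:           # the single sign test detects saturation
        return count        # saturated counts are never altered
    if count == MAX_COUNT:
        return SATURATED    # the next increment overflows into the sign bit
    return count + 1


def decrement(count):
    """Return the count after one reference is destroyed."""
    if count < 0:
        return count        # inaccurate from now on: only tracing can reclaim the cell
    return count - 1


count = 0
for _ in range(200):        # 200 references created to the same cell
    count = increment(count)
count_after_increments = count
for _ in range(200):        # all 200 references destroyed again
    count = decrement(count)
```

After the final loop the cell has no references, yet its count is still saturated. This is why a saturating scheme needs a backup collector that uses indirect identification.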

2.2.5 Memory Leaks

An important aspect of direct identification is its dependence on local information (the count). As a result of mutation, a subgraph may become detached from the computation graph, yet the reference count of none of its cells is zero. This will happen if the mutator can generate cycles. Reference-counting schemes do exist that attempt to collect cycles of garbage, but they are complex [Friedman and Wise 1979], have significant computational overheads and lack generality. Bobrow's [1980] concern is functional languages while Brownbridge [1985] specializes in combinator graph-reduction machines.
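A small sketch shows how a cycle defeats plain reference counting. It assumes the simplest immediate scheme; the `Cell` class and the helpers `add_ref` and `delete_ref` are hypothetical names used only for illustration.

```python
class Cell:
    def __init__(self, name):
        self.name = name
        self.count = 0
        self.refs = []


def add_ref(source, target):
    source.refs.append(target)
    target.count += 1


def delete_ref(source, target, freed):
    """Destroy an edge; free the target recursively if its count reaches zero."""
    source.refs.remove(target)
    target.count -= 1
    if target.count == 0:
        freed.append(target.name)
        for referent in list(target.refs):
            delete_ref(target, referent, freed)


root, a, b = Cell("root"), Cell("a"), Cell("b")
add_ref(root, a)
add_ref(a, b)
add_ref(b, a)               # closes the cycle a -> b -> a
freed = []
delete_ref(root, a, freed)  # the cycle is now detached from the root
```

After the last call neither `a` nor `b` is reachable from `root`, but each still holds a count of one because of the other, so nothing is reclaimed.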

Brownbridge [1985] uses two types of reference. Strong references form an acyclic graph of accessible cells, while weak references complete cycles (Figure 3). A weak reference is intended to reference a cell with a strong reference. A cell holds two reference counts: one for strong references, SC, and the other, WC, for weak. When the deletion of a strong reference results in a zero SC and nonzero WC, there is a possible cycle of garbage. To determine if the cell is garbage, Brownbridge's scheme recursively traces the cell's referents. If a cell is located with zero SC, the weak reference is made strong. If a cell is located with SC greater than one, the reference is made weak and the search terminated. If the trace returns to the starting point without having located a cell with SC greater than one, the cycle is garbage and can be reclaimed.

Tracing can spread over arbitrarily large parts of the computation graph. The algorithm fails when there are intersecting cycles (e.g. Figure 4). With two mutually referencing cycles, each will regard the other as an external reference.

Hughes [1984] gives a scheme based on identifying strongly connected components, SCCs, of a graph. SCCs are those subgraphs for which there is a cycle at each vertex. Such SCCs have their own reference count. SCCs are merged to produce larger SCCs so that no cycles of SCCs are formed. During mutation, SCCs can be created, destroyed, split, or amalgamated. The major complication is splitting. A split requires tracing the graph looking for

Figure 3. Weak and strong references.





(assuming the cells are the same size), the mark-bit is cleared and a forwarding reference to the new cell placed in its previous position. When the two references meet, all marked cells have been unmarked and compacted in the lower part of the heap. A second scan is needed to readjust references to moved cells. Indirect references from cells in the compacted area, by way of the cleared area, are made direct.

In some schemes compaction is controlled by the mutator. The allocator in BrouHaHa Smalltalk [Miranda 1987] checks whether the total size of free cells is insufficient, and if so, invokes a compactor. If compaction proves futile, the collector is invoked. Martin [1982] combines the marking phase with a rearrangement of the references so that cells can be moved more readily. Carlsson et al. [1990] present a variation suitable for cells of varying sizes. During the mark phase, the reference fields of the accessible cells (not the data) are copied to a table. After sorting the addresses, the reachable cells are compacted by sliding the cells to one end of the store.

Mark-scan has large pause times. The scan time is proportional to the size of the store. In virtual memory systems, the collector may access numerous pages on secondary store, an inherently slow process. As such, mark-scan is unsuitable for reactive applications. Even if the garbage collector goes into action infrequently, when it does, no reaction is possible.

2.3.2 Concurrent Mark-Scan

A major advantage of deferred direct identification is that identification and reclamation have a fine grain size. This makes it suitable for interactive and real-time applications [Goldberg 1983]. Dijkstra et al. [1978] describe a variation of mark-scan in which the mutator and the collector operate concurrently, called on-the-fly garbage collection. The concepts of mutator and collector were coined in this context.

In the simple mark-scan scheme of Section 2.3.1, concurrency is not possible because of possible interference of identification with mutation. If a reference to a new cell is added after the identifier has passed over its referents, the new cell is not recognized as part of the computation graph and so is collected. Dijkstra et al. achieve a decoupling of the mutator from the collector by introducing a third state for a cell. The three states, referred to, perhaps inappropriately, as colors, are white (unreachable); black (reachable); and gray (possibly reachable). They can be realized by two mark bits.

Two or more processes, one or more responsible for mutation and exactly one for collection, run concurrently:

M M (M I) M M ...
(I I I ...) (R R R ...) (I I I ...) ...

The mutator aids the marker by setting a cell gray at the point of allocation. In the marking phase, the roots of the graph are initially marked gray. The identification process scans the heap graying all descendants of a gray cell and then blackening the cell. As previously, white cells are unreachable from the roots. In the scan phase, white cells are reclaimed and black cells are whitened.

The collector is incomplete because it may take two cycles to reclaim a dead cell. Dead gray cells are blackened in one invocation of the marker and then whitened in the next. With direct identification, an incomplete collector underestimates the amount of garbage. With indirect identification an incomplete collector overestimates the live cells. Dijkstra et al.'s scheme allows concurrency of mutation and collection, but the phases of identification and reclamation are strictly serialized. This has the effect that a mutator may still have to wait until a collection finishes if there is no free store to make an allocation. Starvation of the mutator is avoided by Queinnec et al. [1985].
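The three-color marking and scan phases can be sketched sequentially as below. The sketch deliberately ignores concurrency and the atomic actions it requires, and only illustrates the color transitions; the heap layout (a dictionary of cells with `color` and `refs` fields) and the function names are assumptions made for illustration.

```python
WHITE, GRAY, BLACK = "white", "gray", "black"


def mark(heap, roots):
    """Gray the roots, then repeatedly scan the heap until no gray cell remains."""
    for name in roots:
        if heap[name]["color"] == WHITE:
            heap[name]["color"] = GRAY
    found_gray = True
    while found_gray:
        found_gray = False
        for cell in heap.values():
            if cell["color"] == GRAY:
                found_gray = True
                for child in cell["refs"]:       # gray all white descendants
                    if heap[child]["color"] == WHITE:
                        heap[child]["color"] = GRAY
                cell["color"] = BLACK            # then blacken the cell


def scan(heap):
    """Reclaim white cells and whiten black ones; return the reclaimed names."""
    reclaimed = []
    for name in list(heap):
        if heap[name]["color"] == WHITE:
            reclaimed.append(name)
            del heap[name]
        else:
            heap[name]["color"] = WHITE
    return reclaimed


heap = {
    "a": {"color": WHITE, "refs": ["b"]},
    "b": {"color": WHITE, "refs": []},
    "c": {"color": WHITE, "refs": ["a"]},    # dead: nothing references c
    "n": {"color": GRAY, "refs": []},        # newly allocated (gray) but already dead
}
mark(heap, ["a"])
first_cycle = scan(heap)
mark(heap, ["a"])
second_cycle = scan(heap)
```

The cell `n` was allocated gray and died before the first cycle, so it is blackened and whitened in the first cycle and only reclaimed in the second. This is the floating garbage that makes the collector incomplete.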

Despite the decoupling the extra color gives, concurrency of the mutator and collector has to be carefully controlled to prevent unsoundness. Several purported implementations have contained subtle synchronization problems [Andrews 1991]. A sound solution requires the mutator and collector actions of testing and changing colors to be atomic. Without atomic actions, concurrency leads to lost updates, sometimes called the test-and-set problem.

Wadler [1976] has shown that a concurrent mark-scan collector uses a greater proportion of the computation time than the sequential scheme. This might be expected because of the overhead of atomic operations and because the collector runs even when there is no garbage to collect.

2.3.3 Scavenging Collectors

The generality and modularity of compacting mark-scan account for the attention it has received in the past three decades. The language implementations of the 1960s for which mark-scan collection was originally intended had small physical memories (by current standards). For small address spaces, the execution cost of scanning the entire store is negligible. With large modern systems, compacting mark-scan is inefficient because of its global nature. The marking phase inspects all accessible cells while the sweep phase traverses the whole store twice. Ungar [1984] reported that Fateman found mark-scan to take 25% to 40% of the mutator time of Franz-Lisp programs. Wadler [1976] reported that typical Lisp programs spend from 10% to 30% of their time collecting.

The cost of the scan phase of mark-scan is proportional to the total size of store. This phase can be eliminated if, rather than scanning, live cells are relocated as they are identified. The store is managed as two heaps, historically called semi-spaces [Fenichel and Yochelson 1969]. The mutator begins allocating in from-space. When the heap is exhausted, the collector scavenges. A scavenge is a simultaneous traversal and copy of the computation graph from from-space to the second heap, to-space:

M M M ... (I R I R I R ...) (R R R ...) M M M ...

Multiple mutations are followed by combined identification and reclamation. In Fenichel and Yochelson [1969], when each cell is moved to to-space, a forwarding reference is left behind. (This can be compared with the forwarding reference of mark-scan collectors, Section 2.3.1.) After a scavenge, a scan of to-space is needed to redirect references to from-space. From-space then becomes free and can be reused. The two semi-spaces are flipped and the mutator continues allocating in the new from-space. This combination of tree traversal and copying also has the advantage of improving locality, which is beneficial in virtual address spaces.
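A scavenge with forwarding references can be sketched as below. Two simplifications are assumed and should not be read as the cited authors' implementation: the traversal is iterative and breadth-first rather than recursive, and a dictionary stands in for the forwarding references that would be stored in the vacated from-space cells. Cells are modeled as lists of from-space indices; the name `scavenge` is hypothetical.

```python
def scavenge(from_space, roots):
    """Copy every cell reachable from the roots into a fresh to-space.

    from_space: list of cells; each cell is a list of from-space indices.
    roots: list of from-space indices.
    Returns (to_space, new_roots) with all references redirected.
    """
    to_space = []
    forwarding = {}                      # from-space index -> to-space index

    def copy(index):
        if index not in forwarding:      # first visit: move the cell
            forwarding[index] = len(to_space)
            to_space.append(list(from_space[index]))
        return forwarding[index]         # later visits follow the forwarding reference

    new_roots = [copy(root) for root in roots]
    scanned = 0
    while scanned < len(to_space):       # scan to-space, redirecting references
        cell = to_space[scanned]
        for i, ref in enumerate(cell):
            cell[i] = copy(ref)
        scanned += 1
    return to_space, new_roots


# Cell 1 is the root; cell 3 references only itself and is unreachable.
from_space = [[1], [2, 0], [], [3]]
to_space, new_roots = scavenge(from_space, [1])
```

Only the three live cells are visited, so the cost is proportional to the live data rather than to the size of the store, and the survivors end up contiguous in to-space.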

2.3.4 Incremental Scavenging

The Fenichel-Yochelson scheme appeared in the late 1960s, but only in the late 1970s had technology changed sufficiently that new algorithms for garbage collection were required. Processors became faster, memories became larger, and programs became significantly larger. There is a Parkinson's law in operation: programs expand to fill the memory available. As program data increased from tens of kilobytes to megabytes, the time required to collect garbage increased dramatically. By the late 1970s, pauses resulting from garbage collection could last tens of seconds or more. At this time, Baker [1978] proposed a modification of Cheney's [1970] compacting algorithm that avoided substantial collector interruptions.

In Baker's incremental scavenger, the mutator is given some responsibility for reclamation. After the from-space becomes full, the mutator allocates new cells in the to-space. Each time the mutator allocates a cell in to-space, a number of live cells are traced and copied from from-space to to-space. This means that the two semi-spaces are simultaneously active. A consequence of this is that mutation and collection are interleaved:

(M I R) (M I R I R) ...

By distributing mutation through collection, the conservative scavenging collector gives bounded collection when cell size is bounded. Baker examined the effect of varying the number of cells traced at each invocation on the memory requirements. He concluded that the maximum memory requirement of conservative scavenging is similar to the use of two mutators running concurrently. Scavenging schemes trade space for time because they require two heaps. Consequently, they have much higher space overhead than either mark-scan or reference-counting algorithms. A major reason for their success is that virtual memory appears cheap, so flagrant use of address space becomes acceptable. It is, of course, paid for by I/O costs.

Baker [1992] recently described a collector that is isomorphic to his original conservative copying algorithm [Baker 1978], but does not require relocation. Baker recognized that the spaces of a scavenging collector are just manifestations of sets of cells. Any other reification of sets would do just as well. All that is necessary for any cell is to identify which set (from-space or to-space) it belongs to. It need not be copied if its allegiance can be transferred from one set to another. Baker requires two reference fields and a color field for each cell. The reference fields link each cell into doubly-linked lists that implement sets. The color field indicates which set a cell belongs to. The colors of Baker can be compared with the colors of Dijkstra et al.'s [1978] concurrent mark-scan collector. The colors gray and black serve to distinguish alternate collections.

In Baker [1992], all free store is initially in from-set. An allocation reference serves to divide the list into the part that has been allocated and the remaining free part. Allocation is as efficient as heap store management because it only requires advancing the pointer by the size of the new cell. When the free space is exhausted, the collector traverses the reachable cells and moves them from the allocated from-set to to-set by unlinking the cell from from-set, toggling its color field, and linking it into to-set. When all the reachable cells have been traversed and reassigned from from-set to to-set, from-set is known to contain only garbage, and is therefore a list of free store.
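The reassignment step of the non-copying collector can be illustrated with a minimal sketch. It assumes Python sets in place of the doubly-linked lists, a 0/1 integer for the color field, and hypothetical names (`Cell`, `collect`); it shows only the traversal that transfers allegiance, not allocation.

```python
class Cell:
    def __init__(self, name):
        self.name = name
        self.refs = []    # outgoing references to other cells
        self.color = 0    # identifies the set the cell currently belongs to


def collect(roots, from_set, to_set, from_color):
    """Reassign every reachable cell from from_set to to_set without copying."""
    to_color = 1 - from_color
    stack = list(roots)
    while stack:
        cell = stack.pop()
        if cell.color == to_color:
            continue                # already reassigned
        from_set.discard(cell)      # unlink from from-set
        cell.color = to_color       # toggle the color field
        to_set.add(cell)            # link into to-set
        stack.extend(cell.refs)
    return from_set                 # whatever remains is garbage, i.e. free store
```

For example, with cells `a -> b -> a` and an unreferenced cell `c`, calling `collect([a], {a, b, c}, set(), 0)` leaves only `c` in the from-set.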

Free-list store management is best suited to languages that use equal-size cells. If cells of different sizes are managed, the free-list must be searched to find a cell of appropriate size. This can lead to fragmentation, poor locality of reference, and thrashing. These are just the problems that copying solves.

2.3.5 Generation Scavenging

Lieberman and Hewitt [1983] observed that cells tend to die young and that long-lived cells are typically very long-lived. Having to copy long-lived cells for every invocation of the collector seems extravagant. Baker [1992] and Dijkstra et al.'s [1978] collectors are incomplete and effectively distinguish new cells from old cells with colors. Lieberman and Hewitt's collector segregates cells into generations, each with its own pair of semi-spaces. Each generation may be scavenged without disturbing older ones. Younger generations are scavenged more frequently. The youngest generation will be filled most rapidly, but on flipping very few cells survive. This drastically reduces the amount of copying. Generations can be created dynamically when the youngest generation fills up with cells that survive several flips.

Ungar [1984] presents a simpler, more efficient generation scavenger. This collector classifies cells as either new or old. Old cells live in a region of store called Old-Space, OS. Old cells that reference new ones are members of the Remembered-Set, RS. Cells are added to RS as a side effect of the mutator. Cells that no longer refer to new cells are removed from RS when scavenging.

All new cells must be reachable from RS, and so RS behaves as roots for the new cells. Any traversal of new cells only needs to start from RS.

Three heaps are used for the new cells: New-Space, NS, a large nursery heap where new cells are spawned; Past-Survivor space, PS, which holds new cells that have survived previous scavenges; and Future-Survivor space, FS, which remains empty while the mutator is in operation. A scavenge copies live cells from NS and PS to FS space and then flips PS and FS. Cells that have survived more than a prescribed number of flips are copied to OS, a process called tenuring. With Ungar's collector, the mutator is stopped during scavenging. This elides forwarding references and achieves some performance gains. While explicitly not concurrent, pause times are short because generations are small. By carefully tailoring the size of NS, FS, and PS, an implementation of Ungar's scheme for Smalltalk manages to keep scavenge times to a median of 150 milliseconds occurring every 16 seconds [Ungar 1984] on a Sun workstation.
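The interplay of NS, PS, FS, OS, and RS can be sketched as follows. This is an illustrative model only: the tenuring threshold `TENURE_AGE`, the class names, the explicit `roots` argument (standing in for the mutator's stack), and the recomputation of RS after each scavenge are assumptions made for brevity; a real implementation maintains RS with a write barrier and copies cells between address ranges.

```python
TENURE_AGE = 2  # hypothetical threshold: tenure after more than 2 survived flips


class Cell:
    def __init__(self, name):
        self.name = name
        self.refs = []
        self.age = 0      # number of scavenges survived


class Heap:
    def __init__(self):
        self.ns, self.ps, self.os = set(), set(), set()
        self.rs = set()   # old cells that reference new cells

    def allocate(self, name):
        cell = Cell(name)
        self.ns.add(cell)             # new cells are spawned in New-Space
        return cell

    def scavenge(self, roots):
        new = self.ns | self.ps
        stack = [c for c in roots if c in new]
        for old in self.rs:           # RS acts as additional roots
            stack.extend(c for c in old.refs if c in new)
        fs, tenured, seen = set(), set(), set()   # FS is empty until now
        while stack:
            cell = stack.pop()
            if cell in seen:
                continue
            seen.add(cell)
            cell.age += 1
            (tenured if cell.age > TENURE_AGE else fs).add(cell)
            stack.extend(c for c in cell.refs if c in new)
        self.os |= tenured            # tenuring
        self.ns = set()               # everything left in NS and PS is garbage
        self.ps = fs                  # flip PS and FS
        self.rs = {o for o in self.os if any(c in self.ps for c in o.refs)}
```

Old cells are never traversed during a scavenge; they contribute only through RS, which is what keeps the pause proportional to the size of the new generation.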

Other generation-based collectors include: opportunistic collectors [Wilson and Moher 1989]; ephemeral collectors used in Symbolics machines [Moon 1984]; and the Tektronix Smalltalk collector [McCullough 1983]. All three commercial Smalltalk systems, Digital, Tektronix, and ParcPlace, adopted generation scavengers [Ungar and Jackson 1988]. The New Jersey SML compiler [Wilson 1992a] also includes a generation collector. Demers et al. [1990] have investigated a generation scheme combined with a mark-scan garbage collector for use with Scheme, Mesa, and C intermixed in one virtual memory. Before Demers et al. [1990], many believed that only scavenging collectors could be made generational.

Wilson et al. [1990] show that generation scavengers typically have poor locality of reference, but careful attention to memory hierarchy issues greatly improves performance. They attributed the small success recorded by several researchers in their attempts to improve locality to two flaws in traversal algorithms. They failed to group data structures in a manner reflecting their hierarchical organization. What is more important, they ignored the disastrous grouping effects caused by reaching cells by linear traversal of hash tables (i.e., in pseudo-random order).

A generation scavenger that adapts to the allocation patterns of applications was presented by Hudson and Diwan [1990]. This collector has a variable number of fixed-size (power of 2) generations. The generations are placed in store at contiguous addresses. The generation is apparent from the most significant bits of the address. Each generation has its own to-space, from-space, and RS (remembered-set). RS is fed indirectly through a buffer containing addresses of possible intergeneration references. The feeder may filter out duplicates, intrageneration references and nonreferences. When scavenging more cells than a generation can accommodate, a new generation is inserted. To retain the ordering, the younger generations are shuffled backwards during scavenging. Conservative collectors that copy cells when the mutator addresses them have also been looked at by White [1980] and Kolodner [Kolodner et al. 1989; 1991]. These reorder cells in the order they are likely to be accessed in the future, giving improved locality. However, the technique requires special hardware. Other reordering optimizations that don't require special hardware work by reordering pages within larger units of disk transfer [Wilson 1990].

Although generation collectors are the most complex single-processor collection schemes, they suffer poor performance if many cells live just long enough to be promoted before dying, the so-called premature tenuring problem. Ungar and Jackson propose an adaptive tenuring scheme based on extensive measurements of real Smalltalk runs [Ungar 1988; 1992]. This scheme varies the tenuring threshold depending on dynamically measured cell lifetimes. It also proposes a refinement that has been included in the ParcPlace [1991] collector. With languages like Smalltalk, interactive response is at a premium and many large cells, mainly bit-maps and strings, don't contain references to other cells. To avoid copying these cells, they are segregated in a large-cell space and tenured to OS (old space) when opportune.

Multigenerational collectors have to cope with the waterfall problem [McCullough 1983]: collecting a particular generation requires collection of all younger generations. The result is that pause times will be longer for older generations. While generation collectors collect intrageneration cycles, they cannot collect intergeneration cycles of references that cross more than one generation.

Some schemes do not attempt to scavenge old generations. In persistent stores, reclamation of such garbage is often left to off-line reorganization [Ungar 1984], where a full garbage collection is done after the system has been stopped. The ParcPlace [1991] Smalltalk-80 generation garbage collector is backed up by a mark-scan compactor that collects OS.

2.4 Hybrid Collectors

To tackle the memory leaks of cyclic structures, Martinez et al. [1990] combine simple reference counting with a local mark-scan. Besides the reference count, an extra field holds the color of the cell: green, red, or blue. The general idea is to perform a local mark-scan whenever a reference to a shared subgraph is deleted. That is, local mark-scan is initiated each time a reference to a cell with a count greater than one is deleted. Marking starts from the deleted reference, decrements the counter, and sets the color red (possible garbage). The subgraph is then rescanned; any subgraphs with external references (nonzero count) are re-marked green (accessible) and their counts restored. All other cells are marked blue (garbage). At the end of the cycle, all blue cells are part of a dead cycle and may safely be reclaimed.

The algorithm has one major problem: the need to perform a local mark-scan every time a reference to a shared cell is deleted. This increases the complexity of the local mark-scan to O(n), where n is the size of the shared subgraph. In unoptimized functional languages, most structures have a reference count of one [Clarke 1977], and the cost of the algorithm is exactly the same as the standard reference count. This is not the case for purer object-oriented languages like Smalltalk, which make extensive use of sharing and cyclic data structures, making the overhead of this scheme high.

Lins [1990] addresses the problem by introducing extra state information in the form of a fourth color, black, and a control queue. This data allows the mark-scan to be done lazily. As in Weizenbaum [1963] and Glaser and Thomson [1985], subgraphs are not scanned immediately, but are queued in a special list, the control queue, and the root cells set black. When the allocator is unable to supply memory, the control queue is scanned to reclaim possible garbage cycles. Lins and Vasques [1991] found that, with appropriate management of the control queue, no unnecessary calls to mark-scan are made.

Lins [1992] applies the concept of the cell age from generation scavenging to the problem of cyclic reference counting. A second counter records the age of cells. A global time counter is initialized to zero and is incremented every time a cell is allocated from the free-list. Lins profits from the age information in two ways. First, as most cells die young, mark-scan is initiated from the youngest cell in the control queue. Second, the age information gives a check on the absence of cycles. A sufficient but not necessary condition for the absence of cycles is that younger cells do not reference older cells. During cycle detection (red marking phase), a check for the condition that the parent cells are older than their offspring is made. If at the end of the mark phase the condition is true for all traced cells, the graph is acyclic. As a result, cells can be put directly into the free-list or restored to their original status without having to be set blue.

3. DISTRIBUTED GARBAGE COLLECTORS

For the purposes of this paper, a distributed system means a collection of autonomous sites that share a communication facility for exchanging messages. Each site has its own store, at least one mutator, and at least one stack. The computation graph and roots are distributed over a number of sites (Figure 5). A similar structure is exhibited by distributed file systems [Garnett and Needham 1980], distributed object-oriented databases [Delobel et al. 1995] and web pages.

A reference to a cell in the same site is said to be local. A reference to a cell on another site is said to be remote. Four classes of garbage subgraphs are exemplified in Figure 5: intrasite acyclic garbage (e.g., v, w, x); intrasite cyclic garbage (e.g., i, j, k, m); intersite acyclic garbage (e.g., n, r, t); and intersite cyclic garbage (e.g., c, d, h, l, o).

Processing power is necessarily localized in sites. Each site has direct access only to those cells that live in its local store. Access to a remote cell is achieved by sending a message to the site on which it lives to spawn a task to perform the required operation. Because there is no global address space, reference to a remote cell is necessarily indirect. A cell references a remote import record and the import record references a local cell. To simplify remote references, a cell can reference a local export record that in turn references the remote import record (Figure 6).

Indirection due to the absence of a homogeneous address space is compounded because message transmission between sites is unreliable: messages may be duplicated, delivered out of order or lost, and sites may temporarily be incommunicado. Establishing reliable message transmission between computers is a complex problem. This complexity is handled in communication systems by the principle of division of concerns. The communication system is layered, lower layers guaranteeing service properties to higher layers. Typical properties offered to the application layer are that messages are not lost, messages are not duplicated, and messages are delivered in mutual-causal order (FOFI¹) between pairs of sites.

Figure 5. An illustrative, but small, distributed computation graph.

The Internet TCP/IP protocols have emerged as the de facto open system interconnection. The main tasks of the Internet Protocol (IP) layer are the fragmentation of messages into packets and the routing of packets to destination machines. The size of a packet is determined dynamically by a number of factors that include network loading. IP makes a best effort to forward packets to the next destination, but forwarding is not guaranteed. If a router is overrun with packets, it discards them. If a router fails, other routers send packets along alternative paths. Thus, packets may be duplicated, arrive out of sequence, and take a relatively long time to arrive intact at their destination.

Above the IP layer, TCP eliminates duplicates and reassembles the packets in their correct order. In more detail, TCP, like Unix, is byte-oriented. A sequence number gives a position in the byte stream of data so far exchanged. A checksum is applied to each packet. A number of packets received intact (checksum agrees) can be acknowledged with a byte position. A packet is retransmitted if it has not been acknowledged after a certain time, the time-out. The time-out is determined by network loading. Each message is received once and messages between any two sites are received in the order in which they were sent, that is, in mutual causal order.
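The way byte-position sequence numbers let a receiver discard duplicates and restore order can be sketched in a few lines. This is a simplified model, not TCP itself: the class name and methods are hypothetical, segments are assumed never to overlap partially, and checksums and retransmission timers are omitted.

```python
class InOrderReceiver:
    def __init__(self):
        self.next_seq = 0     # next byte position expected in the stream
        self.pending = {}     # seq -> payload, held until the gap before it is filled
        self.delivered = []   # payloads handed to the application, in order

    def receive(self, seq, payload):
        """Accept a segment that starts at byte position seq.

        Returns the cumulative acknowledgment: the next byte position expected.
        """
        # Segments already delivered or already buffered are duplicates.
        if seq >= self.next_seq and seq not in self.pending:
            self.pending[seq] = payload
        # Deliver every buffered segment that is now contiguous.
        while self.next_seq in self.pending:
            data = self.pending.pop(self.next_seq)
            self.delivered.append(data)
            self.next_seq += len(data)
        return self.next_seq
```

A sender that sees the acknowledged position fail to advance before its time-out expires would retransmit from that position.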

To add to the problem of reliable message transmission, a remotely spawned task is not acted upon immediately. Once accepted, the task is added to the task queue and must wait its turn. This means that remote tasks may take a considerable time (relative to machine instruction execution) to be acted upon. Latency is the elapsed time between the issue of a remote task-request message and when it is executed. The latency is typically orders of magnitude greater than an instruction cycle, particularly for RISC processors. For efficiency, avoiding processor idling while waiting for a response is vital. In most evaluations of distributed garbage collection, communication overhead is the principal metric. Published measurements [Bennett 1987; Schelvis and Bledoeg 1988] indicate remote cell access to be slower than local by three to four orders of magnitude.

Early distributed garbage collectors were, naturally enough, based on single-address-space collectors. Developments in distributed garbage collection can well be understood as the trial-and-error adaptation of the ideas developed for single-processor, single-address-space collectors to the distributed environment. Garbage collection processes, mutation, identification, and reclamation, are necessarily decomposed according to site boundaries. The high cost of communication relative to local computation makes efficient distributed garbage collection difficult enough. This problem, as will be illustrated, is compounded by the problem of indeterministic latency.

¹ First Out First In.

Figure 6. A remote reference.

3.1 Direct Identification of Distributed Garbage

It was noted in Section 2.2 that, with direct identification, the processes of identification and mutation are localized. This initially seems ideally suited to distribution, as (M I) phases can run concurrently on different sites. Only small portions of the graph that have been affected by a mutation need be considered for reclamation. These can be reclaimed concurrently on different sites. A straightforward attempt to use reference counting in distributed environments exposes the problem of indeterministic latency. A succession of improvements elide the problem by transferring successively more state information from the cell to the reference. At the same time, these solutions reduce the communication overhead.

3.1.1 Distributed Reference Counting

One of the earliest distributed reference-counting collectors was described by Nori [1979]. To preserve the reference count invariant, it is essential that messages are not duplicated. Even if the communication system does filter duplicate messages, counts can become unsound if a decrement count task is acted on before a corresponding increment task, as illustrated in Figures 7 and 8. In these timing diagrams, the horizontal axes indicate spatial distribution and the vertical axes increasing time. A message between sites is represented by an arrow; the tail of the arrow below the arrowhead reflects the latency. Suppose site A duplicates a reference to a cell b on site B by sending a cp(@b) message to site C. (With a C-like notation, @n denotes the reference to an entity denoted by n.) If either A or C is responsible for incrementing the count, premature reception of a decrement count task could lead to a dangling reference.
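The race can be reproduced with a small simulation of site B alone. The sketch assumes a hypothetical scenario consistent with the figures: A's copy to C generates an increment for b, and A then deletes its own reference, generating a decrement. The class and message names are illustrative, and only the order in which B acts on the two tasks is varied.

```python
class CellB:
    def __init__(self):
        self.count = 1           # the reference initially held by site A
        self.reclaimed = False

    def handle(self, msg):
        if self.reclaimed:
            return "dangling"    # a task arrived for a cell that no longer exists
        if msg == "inc":
            self.count += 1
        elif msg == "dec":
            self.count -= 1
            if self.count == 0:
                self.reclaimed = True
        return "ok"


def run(arrival_order):
    """Apply the tasks to b in the order in which site B acts on them."""
    b = CellB()
    outcomes = [b.handle(msg) for msg in arrival_order]
    return outcomes, b.reclaimed
```

With `run(["inc", "dec"])` the cell survives with a count of one. With `run(["dec", "inc"])` the count reaches zero first, the cell is reclaimed, and the reference now held by C dangles.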

Figure 7. Site A increments the count.





To prevent these race conditions, Lermen and Maurer [1986] impose a protocol (Figure 9) on remote duplication and decrement tasks. When a site A that holds a reference to a cell b on site B wants to duplicate the reference on site C, it sends an acknowledge-request message, ack_req(@b, @C), to site B in addition to the copy message, cp(@b), to site C. The copy message provides C with a reference to b. The ack_req(@b, @C) message informs B about the creation of the new reference. When site B acts on the ack_req(@b, @C) task, it increments b's reference count and sends an acknowledgment ack(@b) message to site C. On arrival, this informs site C that cell b knows about the additional reference to it. Lermen and Maurer's [1986] protocol ensures that site C cannot send a delete message, del(@b), to site B before it accepts the ack(@b) message from B.
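The rule that C must hold back its delete until the acknowledgment arrives can be sketched as below. This is a minimal model of the behavior described above for a single reference to b, not the authors' full protocol; the class names, the `sent` and `outbox` lists standing in for the network, and the deferral flag are assumptions.

```python
class SiteB:
    def __init__(self):
        self.count = 1           # A's reference to b
        self.outbox = []         # (destination, message) pairs

    def on_ack_req(self, target):
        self.count += 1          # b now knows about the new reference
        self.outbox.append((target, "ack"))

    def on_del(self):
        self.count -= 1
        return self.count == 0   # True when b can be reclaimed


class SiteC:
    def __init__(self):
        self.holds_ref = False
        self.acked = False
        self.delete_pending = False
        self.sent = []           # messages sent to site B

    def on_cp(self):
        self.holds_ref = True    # cp(@b) delivers the reference

    def on_ack(self):
        self.acked = True
        if self.delete_pending:
            self._send_del()     # release the deferred delete

    def delete(self):
        self.holds_ref = False
        if self.acked:
            self._send_del()
        else:
            self.delete_pending = True   # must wait for ack(@b)

    def _send_del(self):
        self.delete_pending = False
        self.sent.append("del")
```

Because the delete leaves C only after B has sent the acknowledgment, and B sends it only after incrementing, the decrement from C can never be acted on before the matching increment.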

Figure 9. LM protocol.

Figure 8. Site C increments the count.





It is essential for soundness of the protocol that the messages be delivered in mutual causal order (FOFI) (Figure 10). The crossing of message arrows reflects the nonmutual causal ordering. Suppose site A sends a del(@b) message to site B soon after sending ack_req(@b, @C). If these messages do not arrive at B in the order they were sent from A, there can be premature reclamation of cell b.

3.1.2 Weighted Reference Count

An extension of reference counting that elides the problem of nonmutual causal order is WRC, weighted reference counting. There is some controversy as to its origin. The scheme was published at the same conference by Watson and Watson [1987] and Bevan [1987]. Watson and Watson attribute the algorithm to Weng [1979], but Thomas [1981] credits Arvind. The idea is to associate a weight with each reference. The count is only decremented, and so there can be no race conditions. The protocol guarantees preserving an invariant: the sum of weights of all the references to a cell is equal to the count of the cell.

To illustrate how weighted reference counting works, a cell is represented by a triple (Figure 11). (In general, a cell will have any number of reference fields, but one is sufficient to illustrate the mechanisms.) When a cell is allocated (Figure 12), its count is set to the maximum the field can hold and the weight of the reference set equal to it.

When a reference is duplicated (Figure 13), the weight of the reference is divided between itself and the copy; it is not necessary to access the cell. Only one message is required to duplicate a remote reference. The sum of the weights of references pointing to the cell remains unchanged.

Figure 11. A cell with a single reference in WRC.

Figure 12. Creation of a new cell in WRC.

Figure 10. Non-FOFI order.





When a reference is destroyed (Figure 14), its weight must be subtracted from the count to preserve the invariant. If this involves remote cells, a delete reference message, del(@cell, weight), is sent to the remote site hosting the cell. If a cell's count falls to zero, it is garbage and can be reclaimed.
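Allocation, duplication, and deletion under WRC can be sketched together. The field capacity `MAX_WEIGHT`, the class and function names, and the direct method call standing in for the del(@cell, weight) message are assumptions for illustration.

```python
MAX_WEIGHT = 1 << 16   # assumed maximum the count field can hold


class Cell:
    def __init__(self):
        self.count = MAX_WEIGHT
        self.reclaimed = False

    def on_del(self, weight):
        """Act on a del(@cell, weight) message: the count is only ever decremented."""
        self.count -= weight
        if self.count == 0:
            self.reclaimed = True


class Ref:
    def __init__(self, cell, weight):
        self.cell = cell
        self.weight = weight

    def duplicate(self):
        """Split the weight between this reference and the copy; the cell is not touched."""
        if self.weight < 2:
            raise ValueError("underflow: a weight of 1 cannot be divided")
        half = self.weight // 2
        self.weight -= half
        return Ref(self.cell, half)

    def delete(self):
        self.cell.on_del(self.weight)
        self.weight = 0


def allocate():
    cell = Cell()
    return cell, Ref(cell, MAX_WEIGHT)
```

At every point the weights of the live references sum to `cell.count`, regardless of the order in which the deletions are acted upon.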

Besides eliding race conditions, WRC reduces the communication overhead by eliminating the need for an increment message when duplicating a reference. This is achieved at the cost of space for storing a weight for each reference. If the weight is always a power of two, to allow for equal division, the log2 of the weight can be stored. This provides an important reduction in the space requirement. However, when a reference is deleted, the weight must be converted (by shifting) to effect subtraction, so increasing the overhead of identification.
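The logarithmic encoding can be illustrated with two small functions. The function names are hypothetical; the sketch assumes each reference stores only the exponent e of a weight 2^e.

```python
def duplicate(exponent):
    """Halve a weight of 2**exponent; both references end up with exponent - 1."""
    if exponent == 0:
        raise ValueError("underflow: a weight of 1 cannot be divided")
    return exponent - 1, exponent - 1


def apply_delete(count, exponent):
    """Convert the stored exponent back to a weight by shifting, then subtract."""
    return count - (1 << exponent)
```

Halving costs a single decrement of the stored exponent, while deletion pays for the shift that recovers the actual weight.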

A problem occurs, underflow, when a reference weight of 1 needs to be duplicated. A reference with a total weight W can have at most W references, each of weight 1. This could be overcome by adding a fixed number to the weight and the count of the reference. But this proposal is essentially the same as incrementing the count, and so suffers from the same race conditions as naive reference counting (Section 3.1).

A sound solution is the use of indirection illustrated in Figure 15. When a weight falls to one, an indirection allows further duplication. This has the disadvantage of requiring two messages to access a cell if a reference and its indirection live on different sites: one to the site hosting the indirection and one to the site hosting the cell. In a worst-case scenario, a long chain of indirections can be created. Once an indirection is created, it remains forever. Rudalics [1990] calls this the domino problem. A reference consisting of a long chain of remote indirections may even loop back to a local cell a number of times.

3.1.3 Generation Reference Count

Generation reference counting, GRC [Goldberg 1989], provides another solution to the problem of duplicating a unit weight reference in WRC. This is achieved by replacing the weight by a generation and a copy count. Each newly created reference is a zero generation reference. A copy of an ith-generation reference is an (i + 1)th-generation reference. The reference count is replaced by a table, called a ledger, which counts the references from each generation.

Figure 13. Duplication of a cell in WRC.

Figure 14. Deletion of a reference in WRC.

To illustrate, a cell is represented by a quadruplet (Figure 16). (As previously, a cell may have any number of references but one again is sufficient to illustrate the mechanisms.) The ith element of the ledger contains a count of ith-generation references.

When a new cell is created (Figure 17), the generation and count fields of the reference are cleared and the ledger is initialized. (When, as in Figure 17, the ledger contains no relevant information it is omitted.) When a reference is duplicated (Figure 18), a new first-generation reference is allocated.

Figure 15. Indirection to duplicate a reference of weight 1.

Figure 16. A single reference cell in GRC.

Figure 17. Allocating a new cell in GRC.

Figure 18. Duplicating a reference in GRC.

Figure 19. Reference deletion in GRC: (a) deleting a reference; (b) deleting a duplicated reference.

When a reference is deleted (Figure 19a), a delete message del(@cell, n, copies) is sent to the host site. On receipt, the host decrements the copy count for the nth generation and increments the count for the (n + 1)st generation by the number of copies made. As a result, some elements of the ledger may hold negative values. This can occur when delete messages for (n + 1)th-generation references are acted upon before delete messages for the nth-generation references. For example, in Figure 19b, if the copied reference is deleted, rather than the original, a ledger value will be negative. A ledger correctly indicates outstanding references to a cell for any order in which delete messages are received. A cell is only reclaimed if all the ledger entries are zero. This is only the case if a delete message for every reference has been received.
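The ledger arithmetic can be made concrete with a short sketch. The class names are hypothetical, the ledger is assumed to start with a single zero-generation entry for the initial reference, and a direct method call stands in for the del(@cell, n, copies) message.

```python
from collections import defaultdict


class Cell:
    def __init__(self):
        self.ledger = defaultdict(int)
        self.ledger[0] = 1    # assumed initial state: one zero-generation reference

    def on_del(self, generation, copies):
        """Act on del(@cell, n, copies)."""
        self.ledger[generation] -= 1          # one nth-generation reference is gone
        self.ledger[generation + 1] += copies # its copies are now accounted for

    def is_garbage(self):
        return all(n == 0 for n in self.ledger.values())


class Ref:
    def __init__(self, cell, generation=0):
        self.cell = cell
        self.generation = generation
        self.copies = 0       # copy count: duplicates made from this reference

    def duplicate(self):
        self.copies += 1      # local update only; the cell is not contacted
        return Ref(self.cell, self.generation + 1)

    def delete(self):
        self.cell.on_del(self.generation, self.copies)
```

Deleting the copy before the original drives the first-generation entry to -1; the later deletion of the original adds its copy count back and all entries return to zero.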

While GRC may have lower communication overhead than WRC, it has greater computational and space requirements. If no underflow indirection is needed, its communication overhead is the same as WRC, namely, one acknowledged message for each copy of a remote reference. Just as WRC is susceptible to underflow, GRC ledgers can overflow. Goldberg [1989] suggests using indirection to solve this problem. Unlike WRC, the indirection will always be on the same site as the reference, thus adding no extra communication overhead.

3.1.4 Indirect Reference Count

Indirect reference counting, IRC [Ichisugi and Yonezawa 1990; Rudalics 1990; Piquer 1991], provides a solution to the problem of underflow in GRC. IRC replaces the generation by a reference, parent, to the source of the copy. Each reference is a triplet (Figure 20).

The parent field is used to maintain an inverted diffusion tree of duplicated references (Figure 21). The depth of a reference in the tree is the generation. The number of vertices in the diffusion tree equals the total number of references to the root. The indirect reference count counts the number of children of each vertex in the diffusion tree and corresponds to the copy count of GRC. While WRC and GRC can be seen to divide state information between the cell and the reference, for IRC all state information is maintained by the reference.

When a new cell is created (Figure 22) the initial reference becomes the root of a diffusion tree. When a reference is duplicated, the copy count is incremented and the new reference linked into the diffusion tree (Figure 23). Remote duplication requires just one message.

If a copy is deleted, the parent copy count is decremented, resulting in (again) Figure 22. For a remote reference this requires one message. While references in the body of the diffusion tree can be excised, only cells that are leaves of the diffusion tree (zero copy count) can be reclaimed (Figure 24). Thus, like WRC, IRC can accumulate large amounts of floating garbage. However, an excised reference can be restored before its copy count reaches zero. This can be compared with deferred reclamation (Sect. 2.2.2).
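A diffusion tree with parent links and copy counts can be sketched as follows. The names are hypothetical, the cell is modeled as a dictionary with a `live` flag, and the message to a remote parent is replaced by a direct call; an excised interior reference is retained as a placeholder until its last child is reclaimed.

```python
class Ref:
    def __init__(self, cell, parent=None):
        self.cell = cell
        self.parent = parent      # source of the copy; None for the root reference
        self.copy_count = 0       # number of children in the diffusion tree
        self.excised = False      # deleted by the mutator but not yet reclaimable

    def duplicate(self):
        self.copy_count += 1
        return Ref(self.cell, parent=self)

    def delete(self):
        self.excised = True
        self._try_reclaim()

    def _try_reclaim(self):
        # Only an excised leaf (zero copy count) can be reclaimed.
        if not self.excised or self.copy_count > 0:
            return
        if self.parent is None:
            self.cell["live"] = False      # the root has gone: the cell is garbage
        else:
            self.parent.copy_count -= 1    # one message to the parent's site
            self.parent._try_reclaim()
```

Deleting the root and an interior reference first leaves both in place as floating garbage; only when the last leaf is deleted does reclamation propagate up the tree to the cell.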

3.1.5 Indirect Reference Listing (IRL)

Piquer [1991] suggests that the space overhead of IRC is acceptable if it is used only for remote references. Following Fowler [1986], Shapiro et al. [1990] replace the copy count of IRC by a list of references where duplicates have been diffused. A reference is interpreted as a shortcut to the root of the diffusion tree. The additional space overhead of indirect reference listing, IRL, is justified by simpler management of message loss, duplication, and latency. Plainfosse and Shapiro [1992] describe a prototype implementation for Lisp; Birrell et al. [1993] describe an implementation for remote objects in Modula-3. Site failure is detected by regular pinging. The import records of sites that do not promptly acknowledge a ping are unsoundly deleted.

3.1.6 Trial Deletion

Figure 20. A reference in IRC.

Vestal [1987] proposes trial deletion to remove cycles of garbage with indirect identification. The algorithm is seeded with some cell suspected of being part of a dead cycle. The method consists of hypothetical recursive deletion of the seed and its referents and checking if this brings all the counts in the subgraph to zero. A drawback of trial deletion is that, like recursive freeing, it is unbounded. Furthermore, seeds are chosen heuristically, so a bad choice can lead to wasted effort. The scheme can be seen as a generalization of Brownbridge [1985] (Section 2.2.5), where the strong counter is used as a heuristic. Trial deletion has the same problem as Brownbridge's scheme, mutually referencing cycles.
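The hypothetical deletion can be sketched on a single-site graph. This is a simplified reading of the idea rather than Vestal's algorithm: the function name and the dictionary representation are assumptions, and the sketch only answers whether the subgraph reachable from the seed is held alive solely by its own internal references.

```python
def trial_delete(seed, refs, counts):
    """Return True if hypothetically deleting the subgraph zeroes every count.

    refs maps each cell to the list of cells it references;
    counts maps each cell to its current reference count.
    """
    trial = {}                      # hypothetical counts; real counts are untouched
    visited = {seed}
    stack = [seed]
    while stack:
        cell = stack.pop()
        for target in refs.get(cell, []):
            # Hypothetically delete the reference cell -> target.
            trial[target] = trial.get(target, counts[target]) - 1
            if target not in visited:
                visited.add(target)
                stack.append(target)
    # Any cell left with a nonzero count has a reference from outside the subgraph.
    return all(trial.get(cell, counts[cell]) == 0 for cell in visited)
```

The traversal visits the whole subgraph reachable from the seed, which is why the cost is unbounded and a poorly chosen seed wastes the entire effort.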

3.2 Indirect Identification of Distributed Garbage

Mohammed-Ali [1984] describes a number of variations of distributed mark-scan collectors. The simplest, sequential mark-scan (referred to as [Mohammed-Ali 1984a] in Table I), is not a serious contender but provides a straw man to compare improved versions.

3.2.1 Distributed Mark-Scan

Mohammed-Ali's [1984a] sequential mark-scan requires mutation to be suspended during garbage collection on all sites. Across the sites, the processes have the following synchronous behavior:

site A:  M M M ... | I I I ... | R R R ... | M M M ...
  :                |           |           |
site Z:  M M M ... | I I I ... | R R R ... | M M M ...

The vertical bars indicate global synchronization points.

Any site that has exhausted its free store can initiate garbage collection by sending a request to some master site. This master may be designated statically or determined dynamically. If dynamic, the initiating site can be the master but arbitration is necessary if more than one site simultaneously needs to collect garbage. The master sends a command to each site to suspend mutation. The master waits for each site to report all messages in transit have been received and acted upon. The master then directs each site to start the identification (marking) process. Mohammed-Ali remarks that although fast, parallel breadth-first marking has high and unpredictable space requirements that make it impractical. The alternative, sequential depth-first marking, requires much less space. The master waits until each site reports all messages in transit have been received and acted upon and local marking is complete. The master then directs each site to perform a local reclamation (scan). When all sites report messages in transit and reclamation is complete, the master directs each to resume mutation.

Figure 21. An inverted diffusion tree.

Figure 22. Allocating a new cell in IRC.

Similar schemes have been implemented in Berkeley Smalltalk [Schelvis and Bledoeg 1988] and the Emerald object system [Black et al. 1987; Jul et al. 1988]. The problem with such schemes is that without global termination there can be interference between mutation, identification, and reclamation on different sites. Synchronization is achieved by the master waiting for all sites to report phase completion. A major problem is that a slow site cannot be distinguished from a failed site. Mohammed-Ali [1984] observes that while only one site needs to collect garbage, the others are compelled to do so. Furthermore, forcing sites to synchronize requires all but one site to be idle waiting for the last to complete (usually the one that initiates the collection).

3.2.2 Distributed Concurrent Mark-Scan

Dijkstra et al.'s [1978] concurrent mark-scan, which allows mutation to continue while collecting garbage, seems better suited to multiple mutators. One of the first distributed adaptations was the marking-tree collector [Hudak and Keller 1982]. In this variation, there is assumed to be a single root of the whole distributed computation graph. (This is the case for graph reduction of functional languages.) Identification and mutation take place concurrently across sites:

Each recursive mark step in Dijkstra et al.'s scheme is replaced by a mark task. Each site maintains two task queues: one for mutation operations and one for collection operations. Termination of the mark phase is detected by each mark task of a leaf node spawning a task that is propagated upward in the mark tree. Tricolor marking, as in [Dijkstra et al. 1978], is used to record the identification state of a cell but the interpretation of the colors is subtly different. A white cell is one to which identification has not yet propagated. Initially, all cells are white and after marking is complete, white cells identify garbage. A gray cell is one to which marking has propagated and from which a mark task has been spawned for each of its referents. A black cell is of one of two types: a newly allocated cell or a previously gray cell for which all of its spawned marking tasks have terminated.

Mutator tasks and identifier tasks compete to modify cells. Each task has to lock all cells it intends to modify to prevent lost updates. At the end of the marking (identification) phase, white cells are garbage and all tasks referencing white cells are garbage. The scan (reclamation) phase first terminates all redundant tasks and then collects all white cells. No locks are necessary in the reclamation phase because there is no contention with the mutator. The algorithm is concurrent, but the phases of identification and reclamation must be globally synchronized across sites.
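The mark-task scheme described above can be illustrated with a small single-process simulation. This is only a sketch under simplifying assumptions, not Hudak and Keller's algorithm as published: one task queue stands in for all sites, there are no mutator tasks and no locks, and the names `mark_tree`, `pending`, and `parent` are hypothetical. It shows how a cell turns gray when mark tasks are spawned for its referents, turns black when all of those tasks have reported back, and how termination is detected when the report reaches the root.

```python
from collections import deque

WHITE, GRAY, BLACK = "white", "gray", "black"


def mark_tree(graph, root):
    """Mark `graph` (cell -> list of referents) from `root` using explicit tasks."""
    color = {cell: WHITE for cell in graph}
    pending = {}   # gray cell -> number of spawned mark tasks not yet terminated
    parent = {}    # gray cell -> cell whose mark task reached it first
    tasks = deque([("mark", root, None)])
    terminated = False

    while tasks:
        kind, cell, sender = tasks.popleft()
        if kind == "mark":
            if color[cell] != WHITE:
                # Already reached: this task is a leaf of the mark tree.
                tasks.append(("done", sender, None))
                continue
            color[cell] = GRAY
            parent[cell] = sender
            pending[cell] = len(graph[cell])
            for referent in graph[cell]:
                tasks.append(("mark", referent, cell))
            if pending[cell] == 0:
                color[cell] = BLACK
                tasks.append(("done", sender, None))
        else:
            # "done": one mark task spawned from `cell` has terminated.
            if cell is None:
                terminated = True      # the report has passed the root
                continue
            pending[cell] -= 1
            if pending[cell] == 0:
                color[cell] = BLACK
                tasks.append(("done", parent[cell], None))
    return color, terminated


graph = {"r": ["a", "b"], "a": ["b"], "b": ["a"], "g1": ["g2"], "g2": ["g1"]}
color, terminated = mark_tree(graph, "r")
garbage = sorted(cell for cell, c in color.items() if c == WHITE)
```

In this example the cycle `g1`, `g2` is never reached from the root, so both cells stay white and would be collected in the scan phase.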

Similar mark-scan collectors are described by Augusteijn [1987], Vestal [1987], and Derbyshire [1990]. Augusteijn [1987] describes the collector for the object-oriented language POOL-T. Communication between objects is made using a rendezvous protocol with a sender suspending until it receives a reply. A central synchronization object is introduced to establish and maintain global invariants.

As in the nondistributed version of concurrent mark-scan, the collector operates even when there is no garbage to collect. Propagating gray marks causes a combinatorial avalanche of marking tasks. Because collectors generally do not batch remote tasks, this imposes high message traffic. If tasks are batched, the space needed for storing these requests cannot be determined in advance.

site A (mutator):    M M M . . .
site A (collector):  I I I . . . R R R . . . I I I . . .
  :                    :           :           :
site Z (mutator):    M M M . . .
site Z (collector):  I I I . . . R R R . . . I I I . . .

3.2.3 Central Coordination of Local Collection

Mohammed-Ali [1984b] proposes that local garbage collection might free enough space for a site to continue without requiring a global collection. Adapting the area concept of Bishop [1977] developed for large (virtual) address spaces, each site is provided with an Import Record Table, IRT, which holds all import records. The IRT is used as additional roots for local garbage collection. Grouping the export records in a table, the Export Record Table, ERT (Figure 25), restores the symmetry.
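As a minimal sketch of a local collection that treats the IRT as additional roots, consider the following. The representation is an assumption made for illustration: a site is a dictionary from local cell names to their referents, any referent that is not a local cell is taken to be a remote reference (and so corresponds to an export record), and the function name `local_collect` is hypothetical.

```python
def local_collect(cells, local_roots, irt):
    """Mark one site. Returns (live cells, garbage cells, export records reached)."""
    live, exports = set(), set()
    stack = list(local_roots) + list(irt)   # IRT entries act as additional roots
    while stack:
        ref = stack.pop()
        if ref not in cells:
            exports.add(ref)                # reference leaves the site
            continue
        if ref in live:
            continue
        live.add(ref)
        stack.extend(cells[ref])
    return live, set(cells) - live, exports


cells = {"a": ["b"], "b": [], "c": ["d"], "d": ["c"], "e": ["B:x"]}
live, garbage, exports = local_collect(cells, local_roots=["a"], irt=["e"])
```

Here the local cycle `c`, `d` is reclaimed, while `e` is retained solely because it has an import record. The local collector cannot tell whether the remote referent of that import record is itself live, which is why the scheme is conservative.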

Liskov and Ladin [1986] use the client-server model to extend local mark-scan with centralized identification of parts of the graph between import and export records. Each local collector informs a server about the paths it knows of. Local collectors query the centralized service for the current IRT. Dead intersite cycles are detected by the centralized service from the paths advised by the local collectors. The centralized service builds a graph of intersite references and detects dead cycles with a standard collector. While logically centralized, Liskov and Ladin's [1986] scheme is physically replicated to achieve high availability. A client communicates with a single replica; replicas stay up-to-date by exchanging background gossip messages.

By means of a counterexample, Rudalics [1990] demonstrates that the Liskov and Ladin [1986] scheme is unsound. A scenario can occur when a cell, such as b in Figure 26, has more than one reference to it. If the local marker on site C traverses cell d before a, cell b will only be traversed once. At the end of collection, site C only informs the server of the path between c and d and not the one between a and c. The central server unsoundly concludes that d and c are garbage. Rudalics proposes two computationally expensive solutions to overcome the problem.

Figure 23. Duplicating a reference in IRC.

Figure 24. Deleting a reference in IRC.





3.2.4 Cell Migration

As with generation scavengers (Section 2.3.5), local collection does not remove garbage subgraphs that cross site boundaries. While generation scavengers give a temporal segregation of cells, distributed systems have a spatial segregation of cells. Following Bishop [1977], El-Habbash et al. [1990] propose migrating cells so that intersite cycles can be reclaimed by local collection (Figure 27).

El-Habbash et al. introduce a Private Table, PT, to provide complete location-independent addressing. Cells are partitioned into locality clusters, each with its own IRT, ERT, and PT. A cluster is a logical partition of cells in contrast to a physical partition, a site. Ideally, a cluster has many more intracluster references than intercluster references. The division of cells into locality clusters can be compared with generation scavengers (Section 2.3.5), a division of cells into temporal clusters. Remotely referenced cells in a locality cluster are given unique public identifiers, PIDs. Cells that are only referenced locally are not known outside the cluster and are given local identifiers, LIDs. The LIDs comprise entries in PT. A major problem with this scheme is generating unique PIDs, particularly in a very large network.

Clusters are the unit of management for El-Habbash et al. [1990]. The objective of management is to increase the locality of reference of a cluster. Garbage collection is a by-product of increasing locality. To increase locality, cells may migrate from cluster to cluster via archive clusters. Subgraphs that are only reachable from IRT are transferred to an archive cluster. When an archived cell is accessed from another cluster, that cell and its subgraph are moved to the referencing cluster. Cells that are not accessed remain in the archive. Starting from the roots of a cluster and traversing the subgraphs rooted in them, any cells encountered remain in the cluster. Cells that are not reachable from the roots are moved to an archive. The cells that are not reachable from any remote cells (roots or nonroots) in the cluster are garbage.

The El-Habbash et al. [1990] collector is intended for use in persistent environments such as Smalltalk. A similar scheme for persistent store is described by Moss [1990] for the Mneme project. Moss equates a persistent store with a database, but cell retention is based on reachability (in garbage collection) as opposed to explicit deletion (in the database sense).

Figure 25. Distributed GC by local collection.

One problem with cell migration as a means of collecting intercluster cycles is thrashing. Migration can lead to a scenario (Figure 27a) where a is migrated to D, d to C and c to A. El-Habbash et al. [1990] propose a total ordering on clusters (such as name ordering) to avoid thrashing. A cell can only migrate to an inferior cluster. A more serious problem is that archival garbage collection is controlled by setting time limits on access. With slow sites this will lead to unsoundness.

3.2.5 Pipelined Local Collections

As with generation scavenging, copying large cells is expensive. Mohammed-Ali [1984c] suggests that garbage that crosses site boundaries can be collected if at the end of a local collection a site informs other sites of the export records it holds. A message containing a reference may be in transit when a local collection is invoked. This can lead to a cell not being identified as live. Mohammed-Ali proposes each site be provided with a temporary Transport Table, TT, which records in-transit references. These are moved to IRT or ERT when they are acknowledged.

Rudalics [1986] describes a distributed collector adapted from Baker's [1978] incremental scavenger. Each site has two semi-spaces used for garbage-collecting local cells. The upper part of each semi-space is used for export records. The import records are linked in one of three lists. The first two act as semi-spaces for external references, while the third corresponds to Mohammed-Ali's TT. As with single-address-space generation scavengers, neither Rudalics' nor Mohammed-Ali's scheme is able to identify cycles of garbage that span more than one site.

Hughes [1985] describes a way of pipelining local collections that can detect intersite cycles of garbage. This is achieved by propagating timestamps in place of marks. Import and export records are initialized with a global clock [Lamport 1978]. Necessary conditions for a global clock are that the underlying message-passing system guarantees that messages are neither lost nor duplicated and arrive in mutual causal order (FIFO). An export record reachable from a local root is marked with the time at which the local marking phase started. An export record traced from an import record adopts the timestamp of the traced import record.

Figure 26. Rudalics' counterexample.

At the end of a local collection, export record timestamps are sent to corresponding import records. If the timestamp of the export record is greater than that of the import record, the import record timestamp is updated. When receipt of all such messages has been acknowledged, the local clock is incremented to the greatest propagated timestamp. In this way, the timestamp of a dead import or export record remains constant while live ones increase.
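The propagation rule can be made concrete with a small simulation. This is a sketch under strong assumptions rather than Hughes' algorithm in full: all sites collect in lock-step rounds with a shared round number standing in for the global clock, messages are delivered instantly and reliably, and threshold computation and reclamation are left out. The names `local_collection`, `run_round`, `sites`, and `owner` are hypothetical.

```python
def local_collection(site, now):
    """One local marking pass: return the timestamp of each export record reached."""
    exports = {}

    def trace(start, stamp):
        stack, seen = [start], set()
        while stack:
            ref = stack.pop()
            if ref in seen:
                continue
            seen.add(ref)
            if ref in site["cells"]:
                stack.extend(site["cells"][ref])   # local cell: keep tracing
            else:
                # Remote reference: the export record keeps the largest stamp seen.
                exports[ref] = max(exports.get(ref, 0), stamp)

    for root in site["roots"]:
        trace(root, now)           # reachable from a local root: current time
    for cell, stamp in site["imports"].items():
        trace(cell, stamp)         # reachable from an import record: its stamp
    return exports


def run_round(sites, owner, now):
    """All sites collect, then export stamps are sent to the import records."""
    outgoing = [local_collection(site, now) for site in sites.values()]
    for exports in outgoing:
        for cell, stamp in exports.items():
            imports = sites[owner[cell]]["imports"]
            if stamp > imports[cell]:
                imports[cell] = stamp


# Site A holds a root that references x on B; p (on B) and q (on C) form a dead cycle.
sites = {
    "A": {"cells": {"r": ["x"]}, "roots": ["r"], "imports": {}},
    "B": {"cells": {"x": [], "p": ["q"]}, "roots": [], "imports": {"x": 0, "p": 0}},
    "C": {"cells": {"q": ["p"]}, "roots": [], "imports": {"q": 0}},
}
owner = {"x": "B", "p": "B", "q": "C"}
for now in (1, 2, 3):
    run_round(sites, owner, now)
```

After three rounds the import record for `x` carries timestamp 3, while the import records for `p` and `q` still carry their initial timestamp 0, since the only stamps that reach them are the ones circulating around the dead cycle.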

Import records that carry a timestamp less than some threshold are reclaimed. The threshold is the least local timestamp. Hughes determines the threshold using Rana's [1983] termination algorithm. A problem is that a slow site unwilling to initiate a local collection will leave the threshold at the initial value. This is the case even when a slow site does not hold any remote references.

Figure 27. Cell migration: (a) intersite cycle of garbage; (b) migration of two cells.

3.3 Distributed Hybrid Collectors

While WRC, GRC, and IRC elide race conditions (and at the same time reduce the communication overhead), they suffer the same problem as their single-address-space progenitors: memory leaks due to cycles of garbage. Worse still, the cycles may be intersite, such as o-c-h in Figure 5.

Lins and Jones [1991] give an adaptation of the cyclic reference counting schemes of Martinez et al. [1990] and Lins [1990] to the distributed environment. The algorithm combines WRC (Section 3.1.2) with Lins' [1990] local mark-scan (Section 2.4). The algorithm has the same problem as its progenitor: the need to perform a local mark-scan every time a reference to a shared subgraph is deleted. Successive attempts to address this problem are presented by Jones and Lins [1992, 1993]. As admitted by the authors, the scheme has four deficiencies. The first is that reclamation of garbage cycles may be delayed indefinitely. Second, the scheme has higher storage overheads than WRC. Third, the three phases of garbage collection require termination detection. Last, unlike Hudak and Keller [1982], the scheme can neither detect nor remove tasks that become redundant due to garbage collection. Dehne and Lins [1994] attempt to address these problems. The scheme allows sites to perform local mark-scan without the need to synchronize the phases either on a single site or across sites. This requires six colors.

Piquer [1991] suggests that the space overhead of IRC (Section 3.1.4) is acceptable if it is only used for remote references. Intersite cycles can be collected by cell migration and local direct identification collectors. Inverted diffusion trees (Section 3.1.4) can easily support cell migration with an overhead of only one decrement message between source and destination sites. The migration of a cell requires a change of the root in the diffusion tree (Figure 28). This operation is trivial, as the old root is known: the new root is extracted from the tree and the old root added as a child of the new root. The extraction costs one decrement message and the addition is done locally at the respective sites (the new and old roots). As in El-Habbash [1992], a total ordering on sites will avoid thrashing of migration. Cell migration can, however, lead to unsoundness if references to the old location of a cell are in transit while the cell migrates.

Shapiro et al. [1992a] describe an RPC (remote procedure call) implementation of a hybrid collector that uses IRL (Section 3.1.5) for remote references and local tracing collectors. The garbage collector is tightly coupled with an object management system. The cell finder RPC handles cell deletion and site crashes. When given an indirect (parent) reference, the procedure locates the cell referred to. In this way, the reference field is completed lazily. Other RPCs include reference-sending, cell migration, cycle-detection, and abnormal termination.

As in Hughes [1985] (Section 3.2.5), messages in Shapiro et al. [1992a] are time-stamped by a local monotonic (increasing) clock. Each IRT entry is stamped with the clock value of the last corresponding message sent. Unlike IRC (Section 3.1.4), remote references to the same cell each have separate import records. Each site maintains a vector of the highest timestamps of messages received from other sites. Unlike Hughes [1990], clocks on different sites need not be synchronized; a total count of transmitted (mutator and control) messages is sufficient for the purpose. To detect duplicated or lost messages, a list of export records is sent to the site referencing them.

When a mutator exports a reference to another site, it is first added to the local IRT. Both the IRT and the ERT are incomplete (overestimates). Local garbage collection proceeds from both local roots and the IRT. Shapiro et al. use two colors in local marking. A cell accessible from the local root is marked green. A cell accessible only from the IRT is marked red. The collector removes garbage entries in the ERT, sending update information to the IRT entries in appropriate sites. This, in turn, allows previously referenced IRT entries to be collected. Unlike the distributed concurrent mark-scan collectors (Section 3.2.2), the interface between the global collector and other components (i.e., the mutator and the cell finder) is limited to just the IRT and ERT. Updates to IRT and ERT can occur in parallel with other activities.

In a prototype distributed Smalltalk-80 system, Bennett [1987] describes a scheme that pipelines local deferred-reference-counting collectors through global-reference-counting and mark-scan collectors. The global collectors rely on the local collection to enumerate the export records, called proxy cells. Bennett's fast global reference counter relies on cells in alternate collection cycles being distinguishable. Each IRT entry has a flag that identifies import records created since the start of a collection. This is similar to the gray color of Dijkstra et al. [1978]. During a local collection, each site enumerates its export records and for each sends a message that increases the external reference count in its corresponding IRT entry. After this marking phase, live remotely referenced cells have a nonzero external reference count. Each site then scans its IRT and removes those cells with a zero external reference count that were in existence before the start of the cycle (i.e., not gray). Any referents not referenced locally are reclaimed by the site's local collector.
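One cycle of the fast global reference counter might be sketched as follows. The data layout and the name `fast_global_cycle` are hypothetical, and the sketch assumes that external counts are recomputed from zero at the start of each cycle, which the description above suggests but does not state. Import records created while the cycle is running are passed in explicitly so that the role of the flag can be seen.

```python
def fast_global_cycle(irts, export_records, created_during_cycle=()):
    """Run one counting cycle and return the (site, cell) import records removed.

    irts: site -> {cell: {"count": int, "new": bool}}
    export_records: site -> list of (target_site, cell) enumerated by local collection
    created_during_cycle: (site, cell) import records created while the cycle runs
    """
    # Assumption: external counts are recomputed from zero in every cycle.
    for irt in irts.values():
        for entry in irt.values():
            entry["count"], entry["new"] = 0, False
    # Import records created after the start of the cycle carry the flag.
    for site, cell in created_during_cycle:
        irts[site][cell] = {"count": 0, "new": True}
    # Marking phase: one increment message per enumerated export record.
    for records in export_records.values():
        for target_site, cell in records:
            irts[target_site][cell]["count"] += 1
    # Scan phase: drop unflagged entries whose external count stayed at zero.
    removed = []
    for site, irt in irts.items():
        dead = [c for c, e in irt.items() if e["count"] == 0 and not e["new"]]
        for cell in dead:
            del irt[cell]
            removed.append((site, cell))
    return removed


irts = {
    "A": {"a1": {"count": 1, "new": False}},
    "B": {"b1": {"count": 1, "new": False}, "b2": {"count": 1, "new": False}},
}
export_records = {"A": [("B", "b1")], "B": []}
removed = fast_global_cycle(irts, export_records, created_during_cycle=[("B", "b3")])
```

In this example only `b1` receives an increment, so `a1` and `b2` are removed, while `b3` survives with a zero count because it was created during the cycle.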

Bennett's fast collector cannot detect intersite cycles. The second, slower collector is a mark-scan algorithm. The marking phase proceeds from those cells in the IRT that also have local references (determined by the local reference count). References are followed to export records and messages are sent to the remote sites to continue the trace remotely. At the end of the phase, intersite cycles have not been marked and can be removed from the IRT.

Lang et al. [1992] describe a scheme that pipelines local tracing collectors through a global reference-counting collector. When an ERT entry is reclaimed, a decrement message is sent to the site hosting the corresponding import record. If the decrement action brings its counter to zero, the IRT entry is reclaimed. This is the only mechanism for reclaiming IRT entries. It is sound, since sites that are down do not send decrement messages.

Lang et al.'s [1992] sites are organized into groups that cooperate to remove garbage cycles that span their members. The groups can be hierarchical with the largest containing all the sites. Collection begins with group establishment. The composition of a group can be determined statically or dynamically, but is independent of collection. A site would contemplate group collection only if local collection does not free enough space for the mutator to continue. When a site fails to cooperate, the group is reorganized to exclude it and collection continues without losing work already done. Messages with acknowledgments and time-outs are used to detect noncooperating sites. Multiple overlapping group collections can be simultaneously active if each group associates a unique identifier to a collection.

Figure 28. Cell migration in an inverted diffusion tree.

A group cooperates to collect their ERTs by direct identification. Local garbage collection is used to transmit marks from IRTs to ERTs. For each group collection, IRT and ERT entries have a mark that is local to the group. IRT entries may be marked soft or hard. The ERT entries may be marked hard, soft, or none. Effectively, an IRT entry is marked hard if it is needed outside the group or is accessible from a root of a site in the group. It is marked soft if it is referenced only from another member of the group.

Local garbage collection has two marking phases. In the first phase, the initial marks of IRT entries are determined from the reference count and references from members of the group (after Christopher [1984]). All marks on ERT entries are reset to none. Marking proceeds from both local roots and hard IRT entries. Any ERT entry reached by this tracing is marked hard. In the second phase, tracing starts from the soft IRT entries. Any ERT entry reached is marked soft if it is not already marked hard.
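The two marking phases can be sketched for a single site as follows. The sketch assumes that the initial hard and soft marks of the IRT entries have already been determined, and it only computes the resulting ERT marks. The function name `mark_exports` and the dictionary representation are hypothetical.

```python
def mark_exports(cells, local_roots, irt_marks, ert_entries):
    """Return the mark ("hard", "soft", or "none") of each ERT entry on one site.

    cells: local cell -> list of referents (local cells or ERT entries)
    irt_marks: local cell with an import record -> "hard" or "soft"
    ert_entries: remote references held by this site
    """
    ert_marks = {ref: "none" for ref in ert_entries}   # reset all ERT marks

    def trace(starts, mark):
        stack, seen = list(starts), set()
        while stack:
            ref = stack.pop()
            if ref in seen:
                continue
            seen.add(ref)
            if ref in ert_marks:
                if ert_marks[ref] != "hard":           # hard is never downgraded
                    ert_marks[ref] = mark
            else:
                stack.extend(cells.get(ref, ()))

    hard_starts = list(local_roots) + [c for c, m in irt_marks.items() if m == "hard"]
    trace(hard_starts, "hard")                         # first phase
    trace([c for c, m in irt_marks.items() if m == "soft"], "soft")  # second phase
    return ert_marks


cells = {"r": ["a"], "a": ["X"], "s": ["Y", "a"], "t": ["Z"]}
marks = mark_exports(cells, local_roots=["r"], irt_marks={"s": "soft"},
                     ert_entries={"X", "Y", "Z"})
```

Here `X` is reached from the local root and is marked hard, `Y` is reached only from the soft IRT entry `s` and is marked soft, and `Z` is reached from neither and keeps the mark none.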

After a local garbage collection, the ERT entries that are marked none are garbage. They can be reclaimed while sending decrement messages to the IRT entries they reference. ERT entries marked hard (and the IRT they reference) are reachable either from a hard IRT entry or from a local root. When an ERT entry is known to be hard, its mark has to be propagated to the IRT entry it references (if it is in the group). When a new remote reference is created, the associated IRT entry is marked hard (as in the distributed version of concurrent mark-scan (Section 3.2.2)) since it is necessarily accessible from a root.

After n such marking cycles, where n is the number of sites a cycle spans, all hard IRT entries are directly or indirectly accessible from a root or from a site outside the group. IRT entries marked soft are inaccessible, and can thus be safely reclaimed. Such soft IRT entries are set to reference nil rather than a local cell. The unreachable offspring of these IRT entries will be reclaimed by the next local GC. Similarly, the ERT entries that were kept alive exclusively by these entries will be reclaimed by the next local GC. The reclamation of such an ERT entry causes the sending of a decrement message to the IRT entry it references. In the case of dead cycles, dead IRT entries in the cycle eventually receive decrement messages from all the dead ERT entries that reference them. Hence their reference counts decrease to zero and they are eventually reclaimed by the reference-counting mechanism. This protocol is conservative as it achieves a deferred reclamation instead of a synchronized
