Remote Reference Counting: Distributed Garbage Collection with Low Communication and Computation Overhead

www.cs.technion.ac.il/~assaf/publications/gc.ps

Upload: wanda-pearson

Post on 30-Dec-2015



Page 2: cs.technion.ac.il/~assaf/publications/gc.ps

Distributed Systems

• Consist of nodes:
– Lowest level: local address space
– Next level: disk partition, processor
– Top level: local net
• Interaction through message passing
• Failures:
– Due to hardware or software problems
– Disconnection: due to network overload, reboot...

Page 3: cs.technion.ac.il/~assaf/publications/gc.ps

Distributed GC

• Motivations:
– Transparent object management
– Storage management is complex - not to be handled by users
• Goals:
– Efficiency
– Scalability
– Fault tolerance

Page 4: cs.technion.ac.il/~assaf/publications/gc.ps

Distributed GC

• The main problem:
– A section of GC code running on one node must verify that no other node needs an object before collecting it
• Result:
– Many modules must cooperate closely, leading to a tight binding between supposedly independent modules

Page 5: cs.technion.ac.il/~assaf/publications/gc.ps

Distributed GC

• Problems with simple approaches:
– Determining the status of a remote node is costly
– Asynchronous systems ⇒ inconsistent data
– Failures

Page 6: cs.technion.ac.il/~assaf/publications/gc.ps

Remote References

• Terminology:
– Owner - node which contains the object
– Client - node which has a reference to the object
• Creation:
– A reference to an object crosses node boundaries
– Side effect of message passing
• Duplication:
– Client of a remote object sends to a receiver node a reference to that object

Page 7: cs.technion.ac.il/~assaf/publications/gc.ps

Naive Reference Counting

• Keep a reference count for each object
• Upon duplication or creation, send the owner a control message telling it to update the counter
• Problems:
– Increases communication overhead
– Loss or duplication of messages
– Race between decrement/increment messages
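The decrement/increment race can be illustrated with a minimal sketch. The `Owner` class and `run` helper are hypothetical names introduced here, and the model assumes a single object whose counter starts at 1 and whose owner applies control messages in arrival order.

```python
from dataclasses import dataclass


@dataclass
class Owner:
    """Owner node holding the reference counter of one object (hypothetical model)."""
    count: int = 1            # one remote reference exists initially
    collected: bool = False

    def deliver(self, delta: int) -> None:
        # Apply a +1 / -1 control message in the order it arrives.
        if self.collected:
            return            # too late: the object was already reclaimed
        self.count += delta
        if self.count == 0:
            self.collected = True


def run(arrival_order):
    """Deliver the control messages to a fresh owner in the given order."""
    owner = Owner()
    for delta in arrival_order:
        owner.deliver(delta)
    return owner


# The increment caused by a duplication arrives before the decrement: object survives.
safe = run([+1, -1])
# The decrement overtakes the increment: the counter hits 0 while a reference is live.
raced = run([-1, +1])
```

In the second run the object is reclaimed even though the duplicated reference still exists, which is the premature-collection hazard the slide lists.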

Page 8: cs.technion.ac.il/~assaf/publications/gc.ps

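Race Conditions in Naive Reference Counting: Decrement/Increment

[Figure: nodes RA, RB, RC holding objects X, V, U; a reference &V in transit; control messages +1 and -1 sent to the owner of V; Counter_V = 1]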

Page 9: cs.technion.ac.il/~assaf/publications/gc.ps

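Race Conditions in Naive Reference Counting: Increment/Decrement

[Figure: nodes RA, RB, RC holding objects X, V, U; a reference &V in transit; control messages +1 and -1 sent to the owner of V; Counter_V = 1]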

Page 10: cs.technion.ac.il/~assaf/publications/gc.ps

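Avoiding Race by Acknowledge Messages

[Figure: nodes RA, RB, RC holding objects X, V, U; a reference &V in transit; a +1 control message answered by an ack; Counter_V goes from 1 to 2]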

Page 11: cs.technion.ac.il/~assaf/publications/gc.ps

Weighted Reference Counting

• Each object referenced has a partial weight and a total weight
• Object creation:
– total weight = partial weight = even value > 0

[Figure: node RB creates object V with Total = 64 and Partial = 64]

Page 12: cs.technion.ac.il/~assaf/publications/gc.ps

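Weighted Reference Counting: Reference Duplication

[Figure: nodes RA, RB, RC holding objects X, V, U. The owner RB has Total_V = 64 and Partial_V = 32. RA's Partial_V goes from 32 to 16 as it sends &V/16; the receiver gets Partial_V = 16]

Partial weight is halved and sent with the reference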

Page 13: cs.technion.ac.il/~assaf/publications/gc.ps

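Weighted Reference Counting: Reference Deletion

[Figure: nodes RA, RB, RC holding objects X, V, U. A reference with Partial_V = 16 is deleted (16 - 16); the owner RB keeps Partial_V = 32 and its Total_V drops from 64 to 48; the other reference keeps Partial_V = 16]

Partial weight is sent to the owner and subtracted from the total weight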

Page 14: cs.technion.ac.il/~assaf/publications/gc.ps

Weighted Reference Counting

• Invariant: total weight_V = Σ partial weight_V (summed over all existing references to V)

• When the total weight equals the owner’s own partial weight, there are no remote references

• Advantage: Eliminates increment messages, and therefore race conditions
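The duplication and deletion steps can be replayed with the numbers used on the slides (64, 32, 16, 48). This is a minimal sketch: the class names `WeightedObject` and `WeightedRef` are made up for illustration, and the decrement message to the owner is modeled as a direct update.

```python
class WeightedObject:
    """Object record kept at the owner: only the total weight is stored."""

    def __init__(self, weight=64):
        self.total = weight


class WeightedRef:
    """A reference carrying a partial weight (hypothetical helper class)."""

    def __init__(self, target, partial):
        self.target = target
        self.partial = partial

    def duplicate(self):
        # Halve the partial weight locally; no message goes to the owner.
        if self.partial < 2:
            raise ArithmeticError("weight underflow: cannot split a weight of 1")
        half = self.partial // 2
        self.partial -= half
        return WeightedRef(self.target, half)

    def delete(self):
        # Models the single decrement message: the owner subtracts the weight.
        self.target.total -= self.partial
        self.partial = 0


v = WeightedObject(64)
owner_ref = WeightedRef(v, 64)   # creation: total = partial = 64
ra = owner_ref.duplicate()       # owner keeps 32, RA receives 32
rc = ra.duplicate()              # RA keeps 16, RC receives 16
rc.delete()                      # RC returns 16, total drops to 48
```

After `rc.delete()` the total (48) again equals the sum of the remaining partial weights (32 + 16), which is the invariant stated above.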

Page 15: cs.technion.ac.il/~assaf/publications/gc.ps

Weighted Reference Counting

• Shortcomings:
– Weight underflow
• Possible solutions:
– Use partial weights which are powers of 2, keep only the exponent
– [Yu-Cox] “Stop the world”, last resort global trace
– Not resilient to message loss or duplication:
• Loss may cause garbage objects to remain uncollected
• Duplication may cause an object to be prematurely collected

Page 16: cs.technion.ac.il/~assaf/publications/gc.ps

Indirect Reference Counting

• Stub contains strong and weak locators
– Strong: refers to a scion in the sender node; used only for distributed GC
– Weak: refers to the node where the target object is located; used to invoke the target object in a single hop
• Duplication performed locally without informing the owner node
– The weak reference is sent along with the message containing the reference

Page 17: cs.technion.ac.il/~assaf/publications/gc.ps

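Indirect Reference Duplication

[Figure: nodes RA, RB, RC holding objects X, V, U; the message carries &scionB, &scionA; each stub for V has a strong locator to a scion (count 1) and a weak locator to the node holding V]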

Page 18: cs.technion.ac.il/~assaf/publications/gc.ps

Indirect Reference Deletion

[Figure: nodes RA, RB, RC holding objects X, V, U; stubs for V with strong and weak locators; two scions, each with count 1]

Page 19: cs.technion.ac.il/~assaf/publications/gc.ps

Indirect Reference Deletion

[Figure: nodes RA, RB, RC holding objects X, V, U; a -1 message is sent toward a scion; two scions, each with count 1]

Page 20: cs.technion.ac.il/~assaf/publications/gc.ps

Indirect Reference Deletion

[Figure: nodes RA, RB, RC holding objects X, V, U; one stub and one scion with count 1 remain]

Page 21: cs.technion.ac.il/~assaf/publications/gc.ps

Indirect Reference Counting

• Advantages:
– Unlimited number of duplications
– Access to object in one hop through weak locator
• Disadvantages:
– Not resilient to message failures
– Messages are sent whenever an object is deleted
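A rough sketch of the indirect scheme follows, under the assumption that each holder remembers the node it received the reference from (its strong locator) and counts the copies it has handed out itself. The `Holder` class and its method names are hypothetical and do not come from the slides.

```python
class Holder:
    """A node holding a reference (or, at the owner, the object itself)."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent      # strong locator: who gave us the reference
        self.handed_out = 0       # scion counter: copies this node passed on
        self.in_use = True        # local stub (or object) still referenced
        self.released = False

    def duplicate_to(self, name):
        # Purely local bookkeeping: the owner is not informed.
        self.handed_out += 1
        return Holder(name, parent=self)

    def delete_local(self):
        self.in_use = False
        self._maybe_release()

    def _on_decrement(self):
        # A node we gave the reference to has released it.
        self.handed_out -= 1
        self._maybe_release()

    def _maybe_release(self):
        # Release only when unused locally AND no copies we handed out remain.
        if not self.in_use and self.handed_out == 0 and not self.released:
            self.released = True
            if self.parent is not None:
                self.parent._on_decrement()   # models the -1 message


owner = Holder("RB")               # owner of V
ra = owner.duplicate_to("RA")      # RB -> RA
rc = ra.duplicate_to("RC")         # RA -> RC, RB is not informed
ra.delete_local()                  # RA cannot release yet: RC depends on it
released_early = ra.released
rc.delete_local()                  # -1 travels RC -> RA -> RB
```

The sketch shows why one deletion can trigger a chain of decrement messages up the duplication tree.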

Page 22: cs.technion.ac.il/~assaf/publications/gc.ps

Reference Listing

• The object’s owner allocates a table of outgoing pointers (scions), one for each client that owns a reference to the object

• Client nodes hold tables of incoming pointers (stubs)

[Figure: nodes RA, RB, RC; object X on RB with scions labeled A x and C x; stubs labeled X B on the other nodes; objects Y and Z]

Page 23: cs.technion.ac.il/~assaf/publications/gc.ps

Use of Timestamps

[Figure: node RB with object X and a scion for C; node RC with a stub for X at B and object Y; event labels: Sent &X/1, Received delete X/1, Sent delete X/1, Sent &X/2, Ignored]

Page 24: cs.technion.ac.il/~assaf/publications/gc.ps

Reference Listing

• Advantages:
– Resilience to message duplication when timestamps are used
– Resilience to node failure: Owner can prompt client to send a live/delete message
– Owner may explicitly query about a reference that is suspected to be part of a distributed garbage cycle
– Owner can decide whether to keep objects referred to by a crashed client node until it recovers or not
• Disadvantages:
– Memory overhead
– Doesn’t collect cycles of garbage
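One plausible reading of the timestamp mechanism is sketched below: the owner stamps every reference it sends, and a delete message carrying an older stamp than the latest one sent to that client is ignored. The class `OwnerScionTable` and its methods are illustrative assumptions, not an API from the presentation.

```python
class OwnerScionTable:
    """Owner-side table: one scion per client, tagged with the latest timestamp."""

    def __init__(self):
        self.scions = {}     # client id -> timestamp of the latest reference sent
        self.clock = 0

    def send_reference(self, client):
        # Stamp the outgoing reference, e.g. &X/1, &X/2, ...
        self.clock += 1
        self.scions[client] = self.clock
        return self.clock

    def receive_delete(self, client, stamp):
        latest = self.scions.get(client)
        if latest is None or stamp < latest:
            return "ignored"          # duplicate or stale delete message
        del self.scions[client]
        return "removed"

    def has_remote_refs(self):
        return bool(self.scions)


table = OwnerScionTable()
first = table.send_reference("C")                 # Sent &X/1
second = table.send_reference("C")                # Sent &X/2
stale = table.receive_delete("C", first)          # delete X/1 arrives late
fresh = table.receive_delete("C", second)         # delete X/2
repeat = table.receive_delete("C", second)        # duplicated delete X/2
```

Because a stale or repeated delete is dropped, message duplication cannot remove a scion that a newer reference still needs.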

Page 25: cs.technion.ac.il/~assaf/publications/gc.ps

Remote Reference Counting

• Advantages:
– Depends only on the number of nodes in the system
• Independent of pointer operations
• Independent of heap size
– Messages are sent only during GC, when the chance of collecting an object is very high
– Independent of consistency protocols and global order of operations

Page 26: cs.technion.ac.il/~assaf/publications/gc.ps

Remote Reference Counting

• Disadvantages:
– Doesn’t collect cycles of garbage
– Dependent on the number of nodes in the system

Page 27: cs.technion.ac.il/~assaf/publications/gc.ps

The System Model

• Communication through a reliable asynchronous message-passing system
– Messages are never lost, duplicated or altered
– Messages can be delayed or arrive out of order
• Processors can share objects
• Objects can be replicated

Page 28: cs.technion.ac.il/~assaf/publications/gc.ps

Local and Remote Counters

• Local and remote counters are attached to every shared object
• Local_i(X)
– Increased by m when node i receives a message containing m pointers to X
– Otherwise maintained as in traditional reference counting
– When Local_i(X) = 0, i is clean - has no references to X

Page 29: cs.technion.ac.il/~assaf/publications/gc.ps

Local and Remote Counters

• Remote_i(X)
– Increased by m when some object Y containing m pointers to X is sent from node i
– Decreased by m when some object Y containing m pointers to X is received at node i
– The sum of Remote_i(X) over all nodes i is the number of pointers to X in transit in the system
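The counter rules can be checked with a small sketch. The `Node` class and the `in_transit` helper are hypothetical names, and a message is modeled as a pair (target object, number of pointers).

```python
from collections import defaultdict


class Node:
    """Per-node Local and Remote counters, keyed by object name (illustrative)."""

    def __init__(self, name):
        self.name = name
        self.local = defaultdict(int)
        self.remote = defaultdict(int)

    def send(self, target, m):
        # Ship an object that contains m pointers to `target`.
        self.remote[target] += m
        return (target, m)

    def receive(self, message):
        target, m = message
        self.local[target] += m      # the node now holds m more pointers
        self.remote[target] -= m     # and accounts for m received pointers


def in_transit(nodes, target):
    """Sum of the remote counters = number of pointers still travelling."""
    return sum(node.remote[target] for node in nodes)


i, j = Node("i"), Node("j")
i.local["X"] = 1
j.local["X"] = 1
message = i.send("X", 1)                 # one pointer to X leaves node i
while_travelling = in_transit([i, j], "X")
j.receive(message)
after_arrival = in_transit([i, j], "X")
```

After the message arrives, the counters are Local_i(X) = 1, Remote_i(X) = 1, Local_j(X) = 2, Remote_j(X) = -1, and the remote counters sum to zero.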

Page 30: cs.technion.ac.il/~assaf/publications/gc.ps

The Algorithm - Layout

• Build a spanning tree covering all the nodes
• Collection of object X:
– The root sends signals to all its children
– Inner nodes pass the signal down
– When a leaf is clean it sends up a token
– An inner node sends up a token when it has received tokens from all its children and is clean
– When the root has received tokens from all its children it checks a condition C:
• If C = true ⇒ X is garbage
• Otherwise - another wave begins
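The slides do not say how the spanning tree is built. One common choice, shown here purely as an assumption, is a breadth-first traversal from the root over the node connectivity graph; `build_spanning_tree` and the example graph are made up for illustration.

```python
from collections import deque


def build_spanning_tree(adjacency, root):
    """Return (parent, children) maps of a BFS spanning tree rooted at `root`."""
    parent = {root: None}
    children = {node: [] for node in adjacency}
    queue = deque([root])
    while queue:
        current = queue.popleft()
        for neighbour in adjacency[current]:
            if neighbour not in parent:       # first time this node is reached
                parent[neighbour] = current
                children[current].append(neighbour)
                queue.append(neighbour)
    return parent, children


# Hypothetical 4-node network.
network = {
    "r": ["a", "b"],
    "a": ["r", "b", "c"],
    "b": ["r", "a"],
    "c": ["a"],
}
parent, children = build_spanning_tree(network, "r")
```

Signals then follow the `children` map downward and tokens follow the `parent` map upward.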

Page 31: cs.technion.ac.il/~assaf/publications/gc.ps

The Algorithm

[Figure: spanning tree; signals travel down from the root. Legend: a node labeled a has local(X) = a]

Page 32: cs.technion.ac.il/~assaf/publications/gc.ps

The Algorithm

[Figure: spanning tree whose nodes are labeled 0 or 1; tokens travel up toward the root. Legend: a node labeled a has local(X) = a]

Page 33: cs.technion.ac.il/~assaf/publications/gc.ps

The Algorithm

[Figure: spanning tree whose nodes are labeled 0 or 1; tokens travel up toward the root; the set S is marked. Legend: a node in S hasn’t sent a token]

R = R0 ≡ all the nodes outside S are clean

Page 34: cs.technion.ac.il/~assaf/publications/gc.ps

Example: R0 falsification

[Figure: spanning tree with the set S marked (a node in S hasn’t sent a token); node j; the assignment Y := Z is executed]

Page 35: cs.technion.ac.il/~assaf/publications/gc.ps

Example: R0 falsification

[Figure: spanning tree with the set S marked (a node in S hasn’t sent a token); nodes i and j; objects X and Z; the assignment Y := Z. Local_i(X) = 1, Remote_i(X) = 1; Local_j(X) = 2, Remote_j(X) = -1]

Page 36: cs.technion.ac.il/~assaf/publications/gc.ps

The Algorithm

• Use the remote counter to count pointers sent and received
• δ_i definition:
– for a node i outside S, δ_i is the value held at remote_i(X) when i sent its token
– for a node i in S, δ_i is the value held at remote_i(X)
• Δ = Σ_i δ_i
• Δ_fin = Δ at the end of the wave

Page 37: cs.technion.ac.il/~assaf/publications/gc.ps

The Algorithm

• A leaf sends in the token the value of its remote counter

• An inner node sends up the sum of its remote counter and those of its descendants

• R1 ≡ Δ > 0

• R = R0 ∨ R1

Page 38: cs.technion.ac.il/~assaf/publications/gc.ps

Example (cont.)

[Figure: spanning tree with the set S marked; nodes i and j; objects X and Z; the assignment Y := Z. Local_i(X) = 1, Remote_i(X) = 1; Local_j(X) = 2, Remote_j(X) = -1]

Δ = 1 ⇒ R1 is true

Page 39: cs.technion.ac.il/~assaf/publications/gc.ps

Example: R1 Falsification

[Figure: spanning tree with the set S marked; nodes i, j, k; objects X and Z; the assignments Y := Z and W := Y. Local_i(X) = 1, Remote_i(X) = 1; Local_j(X) = 2, Remote_j(X) = -1]

Page 40: cs.technion.ac.il/~assaf/publications/gc.ps

Example: R1 Falsification

[Figure: spanning tree with the set S marked; nodes i, j, k; objects X, Z and Y; the assignments Y := Z and W := Y. Local_i(X) = 1, Remote_i(X) = 1; Local_j(X) = 2, Remote_j(X) = 0; Local_k(X) = 2, Remote_k(X) = -1]

Δ = 0 ⇒ R1 is false

Page 41: cs.technion.ac.il/~assaf/publications/gc.ps

The Algorithm

• Detect if Δ may have decreased due to a node in S:
– Initially paint all nodes white
– A node that decreases remote(X) turns black
• R2 ≡ at least one node in S is black
• R = R0 ∨ R1 ∨ R2

Page 42: cs.technion.ac.il/~assaf/publications/gc.ps

Example: R2 Falsification

[Figure: spanning tree with the set S marked; nodes i, j, k; objects X, Z and Y; the assignments Y := Z and W := Y. Local_i(X) = 1, Remote_i(X) = 1; Local_j(X) = 2, Remote_j(X) = 0; Local_k(X) = 2, Remote_k(X) = -1]

Page 43: cs.technion.ac.il/~assaf/publications/gc.ps

Example: R2 Falsification

[Figure: spanning tree with the set S marked; nodes i, j, k; objects X and Z; the assignments Y := Z and W := Y. Local_i(X) = 1, Remote_i(X) = 1; Local_j(X) = 2, Remote_j(X) = 0; Local_k(X) = 2, Remote_k(X) = -1]

Page 44: cs.technion.ac.il/~assaf/publications/gc.ps

Example: R2 Falsification

[Figure: spanning tree with the set S marked; nodes i, j, k; objects X and Z; the assignment Y := Z; node k sends its token. Local_i(X) = 1, Remote_i(X) = 1; Local_j(X) = 2, Remote_j(X) = 0; Local_k(X) = 0, Remote_k(X) = -1]

No node in S is black ⇒ R2 is false

Page 45: cs.technion.ac.il/~assaf/publications/gc.ps

The Algorithm

• Propagate the color information:
– A node that is black or has received a black token transmits a black token
– Otherwise, it transmits a white token
– A node that transmits a black token becomes white
• R3 ≡ some node in S has a black token
• R = R0 ∨ R1 ∨ R2 ∨ R3

Page 46: cs.technion.ac.il/~assaf/publications/gc.ps

Example (cont.)

[Figure: spanning tree with the set S marked; nodes i, j, k; objects X and Z; the assignment Y := Z; node k sends its token. Local_i(X) = 1, Remote_i(X) = 1; Local_j(X) = 2, Remote_j(X) = 0; Local_k(X) = 0, Remote_k(X) = -1]

Page 47: cs.technion.ac.il/~assaf/publications/gc.ps

The Algorithm

• C = [S = {root} ∧ root is white and local_root(X) = 0 ∧ all tokens at the root are white ∧ Δ_fin = 0]
• Once the root has received tokens from all its children and local_root(X) = 0, it checks C:
– C = true ⇒ object X is garbage
– Otherwise - the root becomes white and initiates another wave
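The check performed at the root can be sketched on a static snapshot of the tree. This is a simplified model, not the authors' implementation: `NodeState`, `subtree_token` and `condition_c` are hypothetical names, the sum of remote counters collected at the root is called `delta_fin` here, and the sketch neither re-whitens nodes nor starts a second wave.

```python
from dataclasses import dataclass, field


@dataclass
class NodeState:
    local: int = 0                  # Local(X) at this node
    remote: int = 0                 # Remote(X) at this node
    black: bool = False             # True if the node decreased Remote(X)
    children: list = field(default_factory=list)


def subtree_token(nodes, name):
    """Token sent up by `name` as (sum of remote counters, is_black).

    Returns None while the node or one of its descendants is dirty,
    i.e. while that part of the tree still belongs to S.
    """
    node = nodes[name]
    tokens = [subtree_token(nodes, child) for child in node.children]
    if node.local != 0 or any(token is None for token in tokens):
        return None
    black = node.black or any(token[1] for token in tokens)
    return node.remote + sum(token[0] for token in tokens), black


def condition_c(nodes, root):
    """Evaluate condition C at the root for the current snapshot."""
    state = nodes[root]
    tokens = [subtree_token(nodes, child) for child in state.children]
    if any(token is None for token in tokens):
        return False                                  # S is larger than {root}
    delta_fin = state.remote + sum(token[0] for token in tokens)
    all_white = not any(token[1] for token in tokens)
    return (not state.black) and state.local == 0 and all_white and delta_fin == 0
```

A single dirty node, one black token, or a non-zero sum of remote counters is enough to make `condition_c` return `False`, in which case the root would start another wave.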

Page 48: cs.technion.ac.il/~assaf/publications/gc.ps

Correctness Proof

• Layout:
– Show that R = (R0 ∨ R1 ∨ R2 ∨ R3) is invariant
– C = true ⇒ (R1 ∨ R2 ∨ R3) = false ⇒ R0 = true ⇒ object X is garbage

Page 49: cs.technion.ac.il/~assaf/publications/gc.ps

R0 ∨ R1 ∨ R2 ∨ R3 is invariant

• Assume, by way of contradiction, that R is false
• Look at the wave in which R first became false:
– R = false ⇒ R0 = false ⇒ some node outside S was dirty
– i = the first node outside S to become dirty
• Case 1: R became false before i first became dirty
– Implies that some node became dirty before i - impossible by definition of i

Page 50: cs.technion.ac.il/~assaf/publications/gc.ps

R0 ∨ R1 ∨ R2 ∨ R3 is invariant

• Case 2: R became false after i first became dirty
– i received a message containing a pointer to X after sending its token
– case 2.1: the message was sent in a previous wave
• More pointers sent than received ⇒ Δ > 0 at the beginning of the wave
– If Δ doesn’t decrease ⇒ R1 = true
– Otherwise: some node becomes black ⇒ R2 ∨ R3 = true

Page 51: cs.technion.ac.il/~assaf/publications/gc.ps

R0 ∨ R1 ∨ R2 ∨ R3 is invariant

– case 2.2: the message was sent in the current wave
• The message could have been sent only by a node j with local(X) > 0 inside S
• j increased Δ after sending the message:
– If Δ was < 0 before, then some node turned black before i became dirty ⇒ R2 ∨ R3 = true until the end of the wave
– Otherwise Δ > 0 after j increased it ⇒ R1 = true till the end of the wave or until some node becomes black

Page 52: cs.technion.ac.il/~assaf/publications/gc.ps

Correctness Proof (cont.)

• If the root hasn’t received a black token, S = {root}, the root is white and Δ_fin = 0, then there are no messages in transit with pointers to object X
– No node became black during the wave ⇒ Δ didn’t decrease ⇒ no messages were sent during the wave by nodes in S
– Δ ≥ 0 at the beginning, Δ_fin = 0 ⇒ Δ = 0 for the duration of the wave ⇒ no message in transit at the beginning of the wave
– No node outside S can receive a message and become dirty

Page 53: cs.technion.ac.il/~assaf/publications/gc.ps

Correctness Proof (cont.)

• If the root hasn’t received a black token, S = {root}, the root is white and Δ_fin = 0, then R0 = true
– R2 ∨ R3 = false
– R1 = false
– R is invariant
• If the root hasn’t received a black token, S = {root}, Δ_fin = 0 and the root is white and clean, then object X can be safely reclaimed
– R0 = true - all nodes outside S are clean
– The root is clean
– There are no pointers in transit

Page 54: cs.technion.ac.il/~assaf/publications/gc.ps

Liveness Proof

• RRC doesn’t reclaim cycles
• Unreferenced object - referenced neither from the local memory of any node nor from any message in transit

Page 55: cs.technion.ac.il/~assaf/publications/gc.ps

Liveness Proof

• If an object is unreferenced, it will eventually be reclaimed by RRC
– For all nodes local(X) = 0
– All nodes will eventually send a token
– If at the root C = false, another wave begins:
• No messages with pointers to X exist ⇒ no node will turn black
• No pointers to X exist ⇒ none will be sent ⇒ Δ = 0
• C = true at the end of the wave

Page 56: cs.technion.ac.il/~assaf/publications/gc.ps

Liveness Proof

• If a garbage object is not reachable from any garbage cycle, it will eventually be reclaimed by RRC

[Figure: objects X, X1, X2, X3]

Page 57: cs.technion.ac.il/~assaf/publications/gc.ps

Liveness Proof

• If a garbage object is not reachable from any garbage cycle, it will eventually be reclaimed by RRC

[Figure: objects X, X1, X2]

Page 58: cs.technion.ac.il/~assaf/publications/gc.ps

Liveness Proof

• If a garbage object is not reachable from any garbage cycle, it will eventually be reclaimed by RRC

[Figure: objects X, X1]

Page 59: cs.technion.ac.il/~assaf/publications/gc.ps

Liveness Proof

• If a garbage object is not reachable from any garbage cycle, it will eventually be reclaimed by RRC

[Figure: object X]

Page 60: cs.technion.ac.il/~assaf/publications/gc.ps

Distributed Shared Memory (DSM)

• Software providing an abstraction of shared memory, running on networked workstations

• Each workstation’s memory acts as a cache

• No messages exchanged by the application - data is shared through virtual shared memory

Page 61: cs.technion.ac.il/~assaf/publications/gc.ps

Millipage DSM

• Implements MULTIVIEW
– Enables fine-grained sharing in page-based DSMs
– Eliminates false sharing
• Each object is mapped to a different virtual page, called minipage
– One node is the manager of the minipage
• handles page faults - read/write requests
• invalidation of a minipage = discarding it from local memory
– Current version implements sequential consistency

Page 62: cs.technion.ac.il/~assaf/publications/gc.ps

RRC Message Waves

• A global tree is built during initialization
• A wave begins when the local counter at the root becomes 0
• Communication may be asynchronous - an RRC message can be delayed and sent with other RRC or DSM messages
• Discard messages are sent only on memory reuse

Page 63: cs.technion.ac.il/~assaf/publications/gc.ps

Example

[Figure: nodes i, j, k; labels P, X, Y, p1; Read(X) requests. Local_i(X) = 1, Remote_i(X) = 2; Local_j(X) = 1, Remote_j(X) = -1; Local_k(X) = 1, Remote_k(X) = -1; Local_i(Y) = 1, Remote_i(Y) = 1]

Page 64: cs.technion.ac.il/~assaf/publications/gc.ps

Example

[Figure: nodes i, j, k; labels P, X, Y, p1; minipage X. Local_i(X) = 1, Remote_i(X) = 2; Local_j(X) = 1, Remote_j(X) = -1; Local_k(X) = 1, Remote_k(X) = -1; Local_i(Y) = 1, Remote_i(Y) = 1]

Page 65: cs.technion.ac.il/~assaf/publications/gc.ps

Example

[Figure: nodes i, j, k; labels P, X, Y, p1. Local_i(X) = 1, Remote_i(X) = 2; Local_j(X) = 1, Remote_j(X) = -1; Local_k(X) = 1, Remote_k(X) = -1; Local_i(Y) = 1, Remote_i(Y) = 1; Local_j(Y) = 1, Remote_j(Y) = -1]

Page 66: cs.technion.ac.il/~assaf/publications/gc.ps

Example

[Figure: nodes i, j, k; labels P, X, Y, Z, p1. Local_i(X) = 1, Remote_i(X) = 2; Local_j(X) = 1, Remote_j(X) = -1; Local_k(X) = 1, Remote_k(X) = -1; Local_i(Y) = 1, Remote_i(Y) = 2; Local_j(Y) = 1, Remote_j(Y) = -1; Local_k(Y) = 1, Remote_k(Y) = -1]

Page 67: cs.technion.ac.il/~assaf/publications/gc.ps

Example

[Figure: nodes i, j, k; labels P, X, Y, Z, p1; a PageInvalide(X) message. Local_i(X) = 1, Remote_i(X) = 2; Local_j(X) = 1, Remote_j(X) = -1; Local_k(X) = 1, Remote_k(X) = -1; Local_i(Y) = 1, Remote_i(Y) = 2; Local_j(Y) = 1, Remote_j(Y) = -1; Local_k(Y) = 1, Remote_k(Y) = -1]

Page 68: cs.technion.ac.il/~assaf/publications/gc.ps

Example

[Figure: nodes i, j, k; labels P, X, Y, Z, p1. Local_i(X) = 1, Remote_i(X) = 2; Local_j(X) = 1, Remote_j(X) = -1; Local_k(X) = 1, Remote_k(X) = -1; Local_i(Y) = 1, Remote_i(Y) = 2; Local_j(Y) = 1, Remote_j(Y) = -1; Local_k(Y) = 1, Remote_k(Y) = -1]

Page 69: cs.technion.ac.il/~assaf/publications/gc.ps

Example

[Figure: nodes i, j, k; labels P, X, Y, Z, p1; signals are sent. Local_i(X) = 1, Remote_i(X) = 2; Local_j(X) = 1, Remote_j(X) = -1; Local_k(X) = 1, Remote_k(X) = -1; Local_i(Y) = 0, Remote_i(Y) = 2; Local_j(Y) = 0, Remote_j(Y) = -1; Local_k(Y) = 0, Remote_k(Y) = -1]

Page 70: cs.technion.ac.il/~assaf/publications/gc.ps

Example

[Figure: nodes i, j, k; labels P, X, Y, Z, p1; tokens for Y are sent. Local_i(X) = 1, Remote_i(X) = 2; Local_j(X) = 1, Remote_j(X) = -1; Local_k(X) = 1, Remote_k(X) = -1; Local_i(Y) = 0, Remote_i(Y) = 2; Local_j(Y) = 0, Remote_j(Y) = -1; Local_k(Y) = 0, Remote_k(Y) = -1]

Page 71: cs.technion.ac.il/~assaf/publications/gc.ps

Performance Evaluation

• The system:
– 8 Pentium II 300 MHz
– Windows NT Workstation 4.0 SP3
– 128 Mbytes RAM
– Workstations interconnected by a switched Myrinet LAN
• Benchmarks:
– Allocate objects and don’t free them
– Executed a number of times in a non-stop manner

Page 72: cs.technion.ac.il/~assaf/publications/gc.ps

Benchmarks

• Water - a parallel application from the field of molecular dynamics
• LU Decomposition - factors a dense matrix A into the product of a lower triangular matrix L and an upper triangular matrix U
• Integer Sort - sorts N integer values in parallel
• Successive Over-Relaxation - input: a two dimensional grid. In each iteration every grid element is updated to the average of its four neighboring elements
• Traveling Salesman Problem - find the minimum-cost, simple, cyclic tour in a weighted graph.
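As an illustration of the SOR benchmark's inner step, the sketch below updates every interior grid element to the average of its four neighbors. It is a sequential toy version, not the benchmark code: the function name `sor_iteration` is made up, and keeping the boundary fixed and writing into a fresh grid are assumptions.

```python
def sor_iteration(grid):
    """Return a new grid in which each interior cell is the 4-neighbor average."""
    rows, cols = len(grid), len(grid[0])
    new_grid = [row[:] for row in grid]          # boundary values are kept
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            new_grid[r][c] = (
                grid[r - 1][c] + grid[r + 1][c] + grid[r][c - 1] + grid[r][c + 1]
            ) / 4.0
    return new_grid


grid = [
    [4, 4, 4],
    [4, 0, 4],
    [4, 4, 4],
]
updated = sor_iteration(grid)
```

In a DSM setting each matrix row would typically live in shared memory, which is why rows per minipage matters for the granularity experiment later in the deck.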

Page 73: cs.technion.ac.il/~assaf/publications/gc.ps

Application Suite

                                   Water    LU       TSP      IS       SOR
No. of Runs                        3        30       5        10       3
Shared Memory                      1MB      240MB    4MB      160KB    24.6MB
No. of Objects                     1542     510      2502580*          6150
Garbage Creation Rate (obj/sec)    27       57       1321     2        5082
Speedup on 8 Nodes                 6.5      4.9      5.5      7.4      7.0

* The TSP and IS values are fused in the transcript and cannot be separated reliably.

Page 74: cs.technion.ac.il/~assaf/publications/gc.ps

RRC Communication Cost

1-2 waves are enough to detect an object as garbage

[Chart: number of RRC messages per object (y-axis, 0.00 to 3.50) for IS, Water, LU, TSP, SOR]

Page 75: cs.technion.ac.il/~assaf/publications/gc.ps

RRC Communication Cost

• Communication complexity is independent of the number of pointer operations
– Simulation of different rates of pointer operations showed no change in the number of GC messages
• Efficiency relies on 2 observations:
– Object use is usually localized in time
– The node that created the object is usually the last to use it

Page 76: cs.technion.ac.il/~assaf/publications/gc.ps

RRC Communication Cost

• To improve performance:
– Tokens and signals can be combined or piggybacked on other messages
– RRC can be turned off or delayed when best performance is desired

Page 77: cs.technion.ac.il/~assaf/publications/gc.ps

Scalability

• Problem: GC waves span all the processes in the system
• The overhead increase is less than linear, and is also due to:
– Increased garbage creation rate
– Increased number of page faults
– Increased number of “discard” messages
• GC overhead in a single node is independent of the number of signals and tokens sent

Page 78: cs.technion.ac.il/~assaf/publications/gc.ps

Scalability

[Chart: overhead in % (y-axis, 0.0 to 4.5) versus number of processors (x-axis, 1 to 8) for IS (2 obj/sec), Water (27 obj/sec), LU (57 obj/sec), TSP (1321 obj/sec)]

Page 79: cs.technion.ac.il/~assaf/publications/gc.ps

Scalability

[Chart: speedup (y-axis, 0 to 8) versus number of processors (x-axis, 1 to 8) for SOR, LU, IS, WATER, TSP, with a linear reference line]

Page 80: cs.technion.ac.il/~assaf/publications/gc.ps

Collection in Granularity Larger than Objects

• Expected to decrease the number of GC messages
• Tested on SOR:
– A single minipage contains several objects instead of only one

Page 81: cs.technion.ac.il/~assaf/publications/gc.ps

Collection in Granularity of Pages

[Chart: overhead (y-axis, 0% to 50%) versus matrix rows per minipage (x-axis: 1, 2, 4, 8, 16, 32); series: total RRC overhead, RRC message processing overhead, RRC overhead other than message processing]

Page 82: cs.technion.ac.il/~assaf/publications/gc.ps

Collection in Granularity of Pages

• Advantages:
– Reduction in memory overhead
– Easier organization of the free list
– Cycles contained entirely within the page are collected
• Disadvantages:
– Delay in reclamation
• Not a significant problem according to the memory locality principle
– Creation of false cycles

Page 83: cs.technion.ac.il/~assaf/publications/gc.ps

CPU Time - Root

[Chart: breakdown of CPU time (0% to 100%) for IS, Water, LU, TSP, SOR; categories: invalidation, on DSM page receive, on DSM page send, pointer operations, allocation, GC message processing]

Page 84: cs.technion.ac.il/~assaf/publications/gc.ps

CPU Time - Inner Node

[Chart: breakdown of CPU time (0% to 100%) for IS, Water, LU, TSP, SOR; categories: invalidation, on DSM page receive, on DSM page send, pointer operations, allocation, GC message processing]

Page 85: cs.technion.ac.il/~assaf/publications/gc.ps

RRC - Conclusions

• A GC algorithm that works correctly in a reliable asynchronous message-passing distributed system

• Successfully implemented as a WIN32 library on Windows-NT on top of MILLIPAGE

• 2-3 messages to identify a garbage object, independent of reference graph mutations

• Use of a reference counting technique ensures low computational overhead

Page 86: cs.technion.ac.il/~assaf/publications/gc.ps

RRC - Conclusions (cont.)

• Scalable - the number of GC messages sent by a single node is independent of the number of nodes

• Improvement in communication overhead with increase in collection granularity

• Unable to collect cycles