Portable, mostly-concurrent, mostly-copying GC for multi-processors
Posted 17-Jan-2016
Portable, mostly-concurrent, mostly-copying GC for multi-processors
Tony Hosking
Secure Software Systems Lab, Purdue University
Platform assumptions
• Symmetric multi-processor (SMP/CMP)
• Multiple mutator threads
• (Large heaps)
Desirable properties
• Maximize throughput
• Minimize collector pauses
• Scalability
Exploiting parallelism
• Avoid contention
• (Mostly-)Concurrent allocation
• (Mostly-)Concurrent collection
Concurrent allocation
• Use thread-private allocation “pages”
• Threads contend for free pages
• Each thread allocates from its own page
  • multiple small objects per page, or
  • multiple pages per large object
Concurrent collection: the tricolour abstraction
• Black
  • “live”
  • scanned
  • cannot refer to white
• Grey
  • “live” wavefront
  • still to be scanned
  • may refer to any colour
• White
  • hypothetical garbage
Garbage collection
• White = whole heap
• Shade root targets grey
• While grey nonempty:
  • Shade one grey object black
  • Shade its white children grey
• At end, white objects are garbage
Copying collection
• Partition white from black by copying
• Reclaim white partition wholesale
• At next GC, “flip” black to white
[Figure: timeline diagrams contrasting incremental collection (GC work interleaved with the mutator threads) with concurrent collection (a background GC thread running alongside the mutators)]
Concurrent mutators
• Mutation changes reachability during GC
• Loss of a black/grey reference is safe
  • A non-white object losing its last reference will be garbage at the next GC
• A new reference from black to white is not
  • The new reference may make its target live
  • The collector may never see the new reference
• Mutations may require compensation
Compensation options
• Prevent the mutator from creating black-to-white references
  • write barrier on black, or
  • read barrier on grey, to prevent the mutator obtaining white refs
• Prevent destruction of any path from a grey object to a white object without telling the GC
  • write barrier on grey
Mostly-copying GC [Bartlett]
• Copying collection with ambiguous roots
  • Uncooperative compilers
  • Untidy references
  • Explicit pinning
• Pin ambiguously-referenced objects
  • Shade their page grey without copying
• Assume heap accuracy
  • Copy remaining heap-referenced objects
Incremental MCGC [DeTreville]
• Enforce grey mutator invariant
  • STW greys ambiguously-referenced pages
  • Read barrier on grey using VM page protection
• Read barrier:
  • Stop mutator threads
  • Unprotect page
  • Copy white targets to grey
  • Shade page black
  • Restart threads
• Atomic system-call wrappers unprotect parameter targets (otherwise traps in the OS return an error)
Concurrent MCGC?
• Stopping all threads at each increment is prohibitive on an SMP and impedes concurrency
• BUT barriers are difficult to place on ambiguous references with uncooperative compilers
• ALSO preemptive scheduling may break wrapper atomicity
Mostly-concurrent MCGC
• Enforce black mutator invariant
  • STW blackens ambiguously-referenced pages
• Read barrier on load of an accurate (tidy) grey reference:
  • Blacken grey references as they are loaded
• No system call wrappers: arguments are always black
Read barrier on load of grey
• Object header bit marks grey objects
• Inline fast path checks the grey bit in the target header, calls out to the slow path if set
• Out-of-line slow path:
  • Lock heap meta-data
  • For each (grey) source object in the target page:
    • Copy white targets to grey
    • Clear grey header bit
  • Shade target page black
  • Unlock heap meta-data
Coherence for fast path
• STW phase synchronizes mutators’ views of heap state
• Grey bits are set only in newly-copied objects (i.e., in newly-allocated grey pages) since the most recent STW
• Mutators can never see a cleared grey header unless the page is also black
• Seeing a spurious grey header due to weak ordering is benign: slow path will synchronize
Implementation
• Modula-3:
  • gcc-based compiler back-end
  • No tricky target-specific stack-maps
  • Compiler front-end emits barriers
  • M3 threads map to preemptively-scheduled POSIX pthreads
• Stop/start threads: signals + semaphores, or OS primitives if available
• Simple to port: Darwin (OS X), Linux, Solaris, Alpha/OSF
Experiments
• Parallelized the GCOld benchmark to permit throughput measurements with multiple mutators
• Measures steady-state GC throughput
• 2 platforms:
  • 2 × 2.3 GHz PowerPC Macintosh Xserve running OS X 10.4.4
  • 8 × 700 MHz Intel Pentium 3 SMP running Linux 2.6
[Figure: Read barriers, STW; 1 user-level mutator thread, work=1. Elapsed time (s) vs GC ratio (0.1–8), hardware vs software barriers.]
[Figure: Elapsed time (s); 1 system-level mutator thread, work=1. Elapsed time vs GC ratio (0.1–8), STW vs INC.]
[Figure: Heap size; 1 system-level mutator thread. Maximum heap (MB) vs GC ratio (0.1–8), STW vs INC.]
[Figure: BMU; 1 system-level mutator thread, work=1000, ratio=1.]
[Figure: Scalability; work=1000, ratio=1, 8×P3. Elapsed time (s) vs mutator threads (1–16), STW vs INC.]
[Figure: Java HotSpot server; work=1000, 8×P3. Elapsed time (s) vs mutator threads (1–8), Serial vs Concurrent MS.]
Conclusions
• Mostly-concurrent, mostly-copying collection is feasible for multi-processors (proof of existence)
• Performance is good (scalable)
• Portable: changes only to the compiler front-end to introduce barriers, and to the GC run-time system
• Compiler back-end unchanged: full-blown optimizations enabled, no stack-map overheads
Future work
• Convert read barrier to “clean” only target object instead of whole page
[Figure: Scalability; work=10, ratio=1, 8×P3. Elapsed time (s) vs mutator threads (1–16), STW vs INC.]
[Figure: Java HotSpot server; work=10, 8×P3. Elapsed time (s) vs mutator threads (1–8), Serial vs Concurrent MS.]