ISMM 2004
Mostly Concurrent Compaction for Mark-Sweep GC
Yoav Ossia, Ori Ben-Yitzhak, Marc Segal
IBM Haifa Research Lab, Israel
IBM Labs in Haifa
ISMM 2004
Prologue: Commercial Multi-tier Applications
Clients (or load injectors)
  Sending requests to Server
Web Server – using an application and a database
Transaction – a (set of) request cycle(s)
Performance requirements
  Throughput – Transactions per second
  Average Transaction Response Time
  At restricted CPU utilization (e.g., below 50%)
[Diagram: three Client boxes, a Server, an Application, and a DB]
Prologue: Observations
GC share is negligible (every 20 sec.) in all examples
1. Long compaction occurs
2. Switch from 500 ms Stop-The-World (STW) GC, to 250 ms mostly concurrent GC
[Figure: two plots of average response time (y-axis, ticks at 1.0 and 2.0) over time (x-axis), with callouts 1 and 2 for the events listed above]
Prologue: Insights
Average response time overreacts
  To shorter GC pause time
  To occasional compaction
Why?
  Longer GC pause times create a queue of transactions
  The queue persists long after the GC
  Transaction timeouts create additional work
Conclusion: “some” pause time is acceptable, but extras should be avoided
Prologue: The Clinic Analogy
Receptionist handles the incoming patient in 5 minutes, the physician in 10 minutes
Appointments are scheduled every 10 minutes
  An appointment lasts 15 minutes :=)
But if the receptionist takes a long break…
  When he returns, appointments last ~50 minutes :=(
Only after a while, with hard work (of both receptionist and physician), QoS may be restored
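The analogy can be replayed as a small two-stage queue simulation. This is only an illustrative sketch: the function name, the 35-minute break length, and the assumption that nobody speeds up afterwards are choices made here, not figures from the talk. The break length was picked because it reproduces the ~50 minute appointments quoted above.

```python
def simulate(arrivals, reception_time, physician_time, break_start=None, break_len=0):
    """Return the total time each patient spends in the clinic (minutes)."""
    recep_free = 0   # time at which the receptionist is next available
    doc_free = 0     # time at which the physician is next available
    durations = []
    for t in arrivals:
        start = max(t, recep_free)
        # If the receptionist is on a break, the patient waits until it ends.
        if break_start is not None and break_start <= start < break_start + break_len:
            start = break_start + break_len
        recep_free = start + reception_time
        doc_start = max(recep_free, doc_free)
        doc_free = doc_start + physician_time
        durations.append(doc_free - t)
    return durations

normal = simulate(range(0, 300, 10), 5, 10)
with_break = simulate(range(0, 300, 10), 5, 10, break_start=100, break_len=35)
print(normal[-1], with_break[9], with_break[10], with_break[-1])
```

Because the physician is already fully booked (one 10-minute visit every 10 minutes), the backlog created by the break never drains on its own in this model, which matches the point that QoS is restored only with extra work.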
Outline
Prologue – Commercial applications
Mark Sweep (and Compact) GC
Mostly Concurrent Compaction
  Overview
  The generic algorithm
  Our implementation
Results
Related work, conclusions and future directions
Mark-Sweep (and Compact) GC
Used by many modern memory management systems
  Either for the entire heap, or for parts (e.g., the old objects area of generational GC)
  Good performance on large server heaps
Usually activated by an allocation request, when the heap is full
  Mark – tags all objects that are reachable from roots
  Sweep – reclaims unmarked objects into a list of free chunks
Result may be unsatisfactory (fragmentation)
  Compact – packs together all live objects, creating a large free chunk
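A toy version of the mark and sweep steps, to make the fragmentation point concrete. The heap layout (a dict of named objects with an address, a size, and outgoing references) is a hypothetical model chosen for illustration, not the layout of any real collector.

```python
def mark(heap, roots):
    """Return the set of object names reachable from the roots."""
    marked = set()
    stack = list(roots)
    while stack:
        obj = stack.pop()
        if obj in marked:
            continue
        marked.add(obj)
        stack.extend(heap[obj]["refs"])
    return marked

def sweep(heap, marked):
    """Reclaim unmarked objects; return free chunks as (address, size)."""
    free = []
    for obj in sorted(heap, key=lambda o: heap[o]["addr"]):
        if obj in marked:
            continue
        addr, size = heap[obj]["addr"], heap[obj]["size"]
        if free and free[-1][0] + free[-1][1] == addr:
            # Adjacent dead objects merge into one free chunk.
            free[-1] = (free[-1][0], free[-1][1] + size)
        else:
            free.append((addr, size))
    for obj in [o for o in heap if o not in marked]:
        del heap[obj]
    return free

heap = {
    "a": {"addr": 0,  "size": 16, "refs": ["c"]},
    "b": {"addr": 16, "size": 16, "refs": []},
    "c": {"addr": 32, "size": 16, "refs": []},
    "d": {"addr": 48, "size": 32, "refs": ["b"]},
    "e": {"addr": 80, "size": 16, "refs": []},
}
marked = mark(heap, roots=["a", "e"])
free = sweep(heap, marked)
print(free)
```

In this example 48 bytes are free after the sweep, but the largest single chunk is 32 bytes, so a 48-byte allocation would still fail without compaction.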
Characteristics of Compaction
Includes two activities
  Move of live objects
  Fix-up of all references (in objects and roots) to new locations
Advantages
  Eliminates fragmentation and enables (better, faster) allocation
  Better cache locality
Disadvantages
  Very expensive. Typically takes much more time than Mark-Sweep
  Done in Stop-The-World (STW) mode
  Severe impact on pause time
Avoided as much as possible, but is occasionally inevitable
Compaction is the weak point of Mark Sweep GC (pause time)
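The two activities can be sketched on a toy heap as follows. This is a simple sliding compaction with an explicit forwarding table, used here only to separate the Move step from the Fix-up step; it is an assumption for illustration and not the J9 compactor. The heap is a hypothetical list of live objects, with references stored as addresses.

```python
def compact(heap, roots):
    """Slide live objects to the low end of the heap, then fix up references."""
    # Move: assign new addresses in address order and remember old -> new.
    forwarding = {}
    next_free = 0
    for obj in sorted(heap, key=lambda o: o["addr"]):
        forwarding[obj["addr"]] = next_free
        obj["addr"] = next_free
        next_free += obj["size"]
    # Fix-up: rewrite every reference, in objects and in roots.
    for obj in heap:
        obj["refs"] = [forwarding[r] for r in obj["refs"]]
    new_roots = [forwarding[r] for r in roots]
    return new_roots, next_free

heap = [
    {"addr": 0,  "size": 16, "refs": [32]},
    {"addr": 32, "size": 16, "refs": []},
    {"addr": 80, "size": 16, "refs": [0]},
]
roots, first_free = compact(heap, roots=[0, 80])
print([o["addr"] for o in heap], roots, first_free)
```

Note that the fix-up loop touches every live object, not only the moved ones, which is why the talk treats fix-up as the part worth taking out of the pause.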
Mostly Concurrent Compaction - Overview
Our Goal: restrict the effect of compaction on pause time
  Typically to less than mark time
  For average response time QoS, critical code (e.g., heartbeat)
Method – partial Move in STW, concurrent Fix-up
  Reduce the pause time of the Move phase, by using incremental compaction
    Select the compacted part according to sweep results
    To optimize compaction impact and control pause time effect
  Execute the fix-up phase after the move, when application threads are resumed
    Correctness preserved by page-protecting the unfixed objects from application threads' access
Assumptions About the Environment
Memory management module
  Uses Mark Sweep GC
  Has a move operation – able to pack objects in the heap
  Supplies fix-up logic – knows the new location of an object by the original address
Operating system services
  Map2 – maps physical memory into two virtual address ranges, or views
  ProtN – protects a virtual address range of pages from read and write access
  Unprot – removes the protection from specified page(s)
  Execute a Trap routine upon page access violation
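A rough illustration of the Map2 service only, using Python's standard `mmap` module to map the same backing file twice. The helper name `map2` and the use of a temporary file are assumptions made here. ProtN, Unprot, and the trap routine are not shown, because the Python standard library does not expose a way to change the protection of an existing mapping; a native implementation would typically rely on something like POSIX `mprotect` and a fault handler.

```python
import mmap
import tempfile

PAGE = mmap.PAGESIZE

def map2(size):
    """Return the backing file and two independent views of the same memory."""
    backing = tempfile.TemporaryFile()
    backing.truncate(size)                            # reserve the backing store
    app_view = mmap.mmap(backing.fileno(), size)      # view used by the application
    fixup_view = mmap.mmap(backing.fileno(), size)    # view used by fixers
    return backing, app_view, fixup_view

backing, app_view, fixup_view = map2(4 * PAGE)
fixup_view[0:4] = b"\x2a\x00\x00\x00"     # a fixer writes through its own view
print(app_view[0:4] == fixup_view[0:4])   # the application view sees the same bytes
```

The point of the second view is that a fixer can update a page through it while the application view of that page is still protected.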
The Generic Algorithm - Details
At Application initialization
  Use Map2 to create the application view and the fix-up view of the heap
Calculating the areas to compact
  Motivation: optimal quality at restricted move time
  Heap is divided into small sections (e.g., 100 sections)
  Gather object layout information during sweep
    Per section: free space, number of small free chunks, etc.
  Select the optimal set of sections for compaction
    Using configurable policy/heuristic
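One possible shape for such a selection policy, as a sketch. The `Section` fields, the greedy ranking (small free chunks removed per live byte moved), and the byte budget standing in for "restricted move time" are all assumptions made for illustration; the talk only says that the policy is configurable.

```python
from dataclasses import dataclass

@dataclass
class Section:
    index: int
    live_bytes: int     # bytes that would have to be moved
    free_bytes: int     # free space found by sweep
    small_chunks: int   # number of small free chunks found by sweep

def select_sections(sections, move_budget_bytes):
    """Greedily pick sections with the best benefit per byte moved, within budget."""
    ranked = sorted(
        sections,
        key=lambda s: s.small_chunks / max(s.live_bytes, 1),
        reverse=True,
    )
    chosen, moved = [], 0
    for s in ranked:
        if s.small_chunks == 0:
            continue                      # nothing to gain from this section
        if moved + s.live_bytes <= move_budget_bytes:
            chosen.append(s.index)
            moved += s.live_bytes
    return sorted(chosen)

sections = [
    Section(0, live_bytes=900, free_bytes=100, small_chunks=10),
    Section(1, live_bytes=200, free_bytes=800, small_chunks=40),
    Section(2, live_bytes=500, free_bytes=500, small_chunks=0),
    Section(3, live_bytes=300, free_bytes=700, small_chunks=30),
]
selected = select_sections(sections, move_budget_bytes=600)
print(selected)
```

Swapping the `key` function is enough to express a different heuristic, which is the sense in which the policy stays configurable in this sketch.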
The Generic Algorithm - Details (cont.)
Move phase
  Objects are compacted within the selected areas
Fix-up of root references
Prepare the heap pages
  Page protect all heap pages that contain objects
  Reset state of all pages (that contain objects) to Unfixed
  Rest of (“free”) pages are set to Fixed
Resume execution of application threads
The Generic Algorithm - Concurrent Fix-up (method)
Constraints
  All Unfixed pages are fixed, and only once
  A page starts as Unfixed (and protected), then Busy, and finally Fixed (and unprotected)
  Application threads access only Fixed pages
Fix-up of page (Exclusive Fix)
  Done only by a thread that managed to change the page’s state from Unfixed to Busy
  All the (protected) page’s references are fixed. Page is accessed through the (unprotected) Fix-up view
  Protection is lifted
  Page state is set to Fixed
The Generic Algorithm - Concurrent fix-up (who/how)
Concurrent Fixing – fix-up that is initiated by the collector
  All concurrency flavors are possible
  Concurrent Fixers scan the heap, and try to Exclusively Fix each page
  Failure is OK; someone else did (or is doing) the fix-up
Trapped Fixing – forced fix-up
  Access violating application thread becomes a Trapped Fixer
  Executes a trap routine that attempts to Exclusively Fix the accessed page
  If it fails, the thread must wait till the page becomes Fixed
Completed when Concurrent Fixing exhausts the heap
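The Unfixed, Busy, Fixed protocol can be modeled with ordinary threads. In this sketch a lock-protected check stands in for the atomic Unfixed-to-Busy transition, a counter stands in for the real reference fix-up, and the trapped fixers are started by hand rather than by an access violation; the class and method names are hypothetical.

```python
import threading

UNFIXED, BUSY, FIXED = 0, 1, 2

class Pages:
    def __init__(self, n):
        self.state = [UNFIXED] * n
        self.fix_count = [0] * n          # how many times each page was fixed
        self.cond = threading.Condition()

    def _try_claim(self, page):
        """Atomically change Unfixed -> Busy (stands in for a compare-and-swap)."""
        with self.cond:
            if self.state[page] == UNFIXED:
                self.state[page] = BUSY
                return True
            return False

    def exclusive_fix(self, page):
        if not self._try_claim(page):
            return False                  # someone else did, or is doing, the fix-up
        self.fix_count[page] += 1         # placeholder for fixing the page's references
        with self.cond:
            self.state[page] = FIXED      # protection would be lifted before this point
            self.cond.notify_all()
        return True

    def concurrent_fixer(self):
        for page in range(len(self.state)):
            self.exclusive_fix(page)      # failure is OK

    def trapped_fix(self, page):
        if not self.exclusive_fix(page):
            with self.cond:               # wait until the other fixer is done
                self.cond.wait_for(lambda: self.state[page] == FIXED)

pages = Pages(64)
threads = [threading.Thread(target=pages.concurrent_fixer)]
threads += [threading.Thread(target=pages.trapped_fix, args=(p,)) for p in (3, 3, 17, 40)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(all(s == FIXED for s in pages.state), max(pages.fix_count))
```

Whatever the interleaving, only the thread that wins the Unfixed-to-Busy transition touches a page, so every page is fixed exactly once.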
Our Implementation
Implemented for Java, on top of the IBM J9 JVM
  Using Mark Sweep GC on the entire heap
  Reusing parallel move code and fix-up logic of J9’s compactor
Configurable fix-up unit, bigger than the OS page size
  Fix-up more than an OS page on each trap
  Fewer access violations (more “hot” memory fixed each time)
    Reduces the relative cost of traps
  Longer trapped fixing
  We found that a significant unit size increase can be tolerated
Concurrent fixing by incremental work of the Java threads
  For each X KB of allocation, fix-up X*F KB of heap space
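The allocation-driven pacing rule can be written down directly. The class below is a hypothetical sketch (names and the 1024 KB heap, F = 4 example are made up here); it only tracks how much fix-up work each allocation triggers.

```python
class FixupPacer:
    """For each X KB allocated, schedule X * F KB of heap fix-up."""

    def __init__(self, heap_kb, factor):
        self.remaining_kb = heap_kb   # heap space still waiting for fix-up
        self.factor = factor          # the tuning factor F

    def on_allocate(self, alloc_kb):
        """Return how many KB of heap the allocating thread should fix up now."""
        work = min(alloc_kb * self.factor, self.remaining_kb)
        self.remaining_kb -= work
        return work

pacer = FixupPacer(heap_kb=1024, factor=4)
work = [pacer.on_allocate(64) for _ in range(5)]
print(work)
```

Under this rule the whole heap is fixed after roughly heap size divided by F worth of allocation, so a larger F finishes fix-up sooner at the price of more work per allocation.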
Testing Environment
Red Hat Linux OS
Intel Pentium 4 uniprocessor, and a 4-way Intel Xeon MP server
Benchmarks: SPECjbb2000, Health (from Java-olden suite) and SPECjvm98
Compaction triggered every N GCs
  N=10 for SPECjvm98, 15 for SPECjbb, and 1 for Health
No compact (Base) compared to compact with three area selection heuristics:
  Dark Matter reduction (DM)
  Creating Bigger Free chunks (BF)
  Round-Robin (RR)
Results : Throughput and Pause Time (Highlights)
Minor effect on pause time
Area selection heuristics matter, and should not be hard-coded

Benchmark | Pause time contribution relative to mark | Throughput improvement | Best area selection heuristic
SPECjvm | up to 30% | Improves most benchmarks (2% to 10%) | Varies (round-robin usually inferior)
Health | up to 15% | Base needs twice the heap to complete | Dark matter reduction
SPECjbb | up to 15% | No improvement | Round-robin is inferior
Results: Overall Costs of Concurrent Fix-up
Comparing INCR-C (our Mostly Concurrent incremental compactor) to
  INCR-STW – same incremental move with STW fix-up
  FULL-STW – full heap move with STW fix-up
INCR-STW pause time contribution is 3 times the move time
  No throughput gain over our compactor
FULL-STW has a very large pause time increase
  Compaction time is up to ten times the mark time
  Significant throughput gain with Health, some gain with SPECjvm
Concurrent fix-up is better than STW fix-up, for incremental compaction
Partial (but “smart”) compaction may be more effective than full compaction

Collector | Pause time addition relative to mark | Throughput gain over INCR-C
INCR-C | up to 1/3 | n/a
INCR-STW | up to 3 times | No throughput gain
FULL-STW | up to 10 times | Health: significant, SPECjvm: some
Results: Cost of Access Violations
Concern: recently, page protection techniques became relatively inefficient, due to increase in computational speed
SPECjbb costs of Trapped fix-up
[Figure: two charts over fix-up unit size (32 KB to 256 KB): total cost of Trapped fix-up (ms, axis 0 to 0.2) and relative cost of access violation (%, axis 0 to 100)]
Conclusion: For concurrent fix-up, bigger fix-up units (64 KB-256 KB) are acceptable, and justify the use of page protection techniques
Results: Java Mutator Utilization
Concern: Trapped fix-up cannot be controlled
  If most pages are accessed all the time, the Java application, right after STW, will practically do nothing but Trapped fix-up
We measured the portion of time spent on trapped fix-up in the first 450 ms
  Acceptable Java utilization
  Reasonable Java utilization after 50..100 ms
With 256 KB fix-up unit results are even better
  SPECjbb’s Java utilization in first 100 ms improves from 16% to 48%
[Figure: Trapped fix-up part (%, axis 0 to 100) vs. time from resuming the application (ms, 10 to 450); series: jess, mtrt, javac, db, Jack, health, SPECjbb, SPEC256]
Related Work
Compaction techniques
  Jonkers, Morris - The threaded algorithm. 1978, 1979
  Flood et al. - Parallel garbage collection for shared memory multiprocessors. 2001
  Sachindran and Moss - Mark Copy: Fast copying GC with less space overhead. 2003
  Abuaiadh et al. - An efficient parallel heap compaction algorithm. 2004
Incremental compaction
  Lang and Dupont - Incremental incrementally compacting garbage collection. 1987
  Ben-Yitzhak et al. - An algorithm for parallel incremental compaction. 2002
Related Work (cont.)
Concurrent Copying collectors
  Baker - List processing in real-time on a serial computer. 1978
  Brooks - Trading data space for reduced time and code space. 1984
  Appel et al. - Real-time concurrent collection on stock multiprocessors. 1988
Fully concurrent compaction
  Larose and Feeley - A compacting incremental collector and its performance... 1998
  Bacon et al. - Controlling fragmentation and space consumption in the metronome. 2003
Use of page protection
  Appel et al. - Virtual memory primitives for user programs. 1991
Conclusions
A solution is proposed for bounding the pause time effect of compaction
Mostly concurrent compaction:
  A generic solution suitable for Mark Sweep, and other GCs
  Method – partial Move in STW, concurrent Fix-up
A Java implementation is presented, on top of IBM J9 JVM
  Minor pause time hit (less than 1/3 of the mark time)
  Highly efficient - No significant hit due to concurrent fix-up
  Improved performance with most benchmarks
Future Directions
Explore adaptive and sophisticated methods for:
  Triggering of the mostly concurrent compaction
  Choosing an optimal policy for selecting the parts to compact
Minimize the costs of Trapped Fix-up, by performing “proactive” concurrent fix-up
  Fix the predicted next locations of access violations, rather than performing a sequential pass of the heap
End
Java Mutator Utilization – The Mark Perspective
[Figure: Trapped fix-up part (%, axis 0 to 100) vs. time from resuming the application (ms, 10 to 450); series: jess, mtrt, javac, db, SPEC256, health, SPECjbb]
2000…